openldap/servers/slapd/back-sql/docs/concept

CONTENT
1. Purpose
2. Metainformation used
3. Typical back-sql operation
4. Several important common techniques (referrals, multiclassing)

1. Purpose
Primary purpose of this backend is to PRESENT information stored in some RDBMS
as an LDAP subtree without any programming (some SQL and maybe stored
procedures can't be considered programming, anyway ;).

That is, for example, when you (some ISP) have account
information you use in RDBMS, and want to use modern solutions that expect such
information in LDAP (to authenticate users, make email lookups etc.).
Or you want to syncronize or distribute information between different
sites/applications that use RDBMSes and/or LDAP. Or whatever else...

It is NOT designed as general-purpose backend that uses RDBMS instead of
BerkeleyDB (as standard back-ldbm does), though it can be used as such
with several limitations. You can take a look at
http://www.openldap.org/faq/index.cgi?file=378
(OpenLDAP FAQ-O-Matic/General LDAP FAQ/Directories vs. conventional databases)
to find out more on this point.

The idea (detailed below) is to use some metainformation to translate
LDAP queries to SQL queries, leaving relational schema untouched, so that
old applications can continue using it without any modifications. This allows
SQL and LDAP applications to interoperate without replication, and exchange
data as needed.

Back-sql is designed to be tunable to virtually any relational schema without
having to change source (through that metainformation mentioned).
Also, it uses ODBC to connect to RDBMSes, and is highly configurable for
SQL dialects RDBMSes may use, so it may be used for integration and
distribution of data on different RDBMSes, OSes, hosts etc., in other words,
in highly heterogeneous environment.

2. Metainformation used
***
Almost everything mentioned later is illustrated in example, which is located
in backsql/rdbms_depend directory, and contains scripts for generating sample
database for Oracle,MS SQL Server and mySQL.
***
First thing that one must arrange for himself is what set of LDAP
objectclasses can present your RDBMS information.

The easiest way is to create objectclass for each entity you had
in ER-diagram when designing your relational schema.
Any relational schema, no matter how normalized it is, was designed after
some model of your applications domain (for instance, accounts, services etc.
in ISP), and is used in terms of its entities, not just tables of normalized
schema.
It means that for every attribute of every such instance there is an effective
SQL query that loads it's values.

Also you might want your objectclasses to conform to some of standard schemas
like inetOrgPerson etc..

Nevertheless, when you think it out, we must define a way to translate LDAP
operation requests to (series of) SQL queries. Let us deal with SEARCH
operation.

Example:
Lets suppose that we store information about persons working in our
organization in two tables:

PERSONS   		PHONES
----------           -------------
id integer		id integer
first_name varchar      pers_id integer references persons(id)
last_name varchar       phone
middle_name varchar
...

(PHONES contains telephone numbers associated with persons). A person can have
several numbers, then PHONES contains several records with corresponding
pers_id, or no numbers (and no records in PHONES with such pers_id). LDAP
objectclass to present such information could look like this:
person
-------
MUST cn
MAY telephoneNumber
MAY firstName
MAY lastName
...

To fetch all values for cn attribute given person ID, we construct the query:
SELECT CONCAT(persons.first_name,' ',persons.last_name) as cn FROM persons WHERE persons.id=?

for telephoneNumber we can use:
SELECT phones.phone as telephoneNumber FROM persons,phones WHERE persons.id=phones.pers.id and persons.id=?

If we wanted to service LDAP request with filter like (telephoneNumber=123*),
we would construct something like:
SELECT ... FROM persons,phones WHERE persons.id=phones.pers.id and persons.id=? and phones.phone like '123%'

So, if we had information about what tables contain values for each
attribute, how to join this tables and arrange these values, we could try
to automatically generate such statements, and translate search filters
to SQL WHERE clauses.

To store such information, we add three more tables to our schema, so that
 and fill it with data (see samples):

ldap_oc_mappings (some columns are not listed for clarity)
---------------
id=1
name="person"
keytbl="persons"
keycol="id"

This table defines mapping between objectclass (its name held in "name" column),
and table that holds primary key for corresponding entities. For instance,
in our example, the person entity, which we are trying to present as "person"
objectclass, resides in two tables (persons and phones), and is identified
by persons.id column (that we will call primary key for this entity).
keytbl and keycol thus contain "persons" (name of the table), and "id" (name
of the column).

ldap_attr_mappings (some columns are not listed for clarity)
-----------
id=1
oc_id=1
name="cn"
sel_expr="CONCAT(persons.first_name,' ',persons.last_name)"
from_tbls="persons"
join_where=NULL
************
id=<n>
oc_map_id=1
name="telephoneNumber"
sel_expr="phones.phone"
from_tbls="persons,phones"
join_where="phones.pers_id=persons.id"

This table defines mappings between LDAP attributes and SQL queries that
load their values. Note that, unlike LDAP schema, these are not *attribute types*
- attribute "cn" for "person" objectclass can well have it's values in different
table than "cn" for other objectclass, so attribute mappings depend on
objectclass mappings (unlike attribute types in LDAP schema, which are
indifferent to objectclasses). Thus, we have oc_map_id column with link to
oc_mappings table.
Now we cut the SQL query that loads values for given attribute into 3 parts.
First goes into sel_expr column - this is the expression we had between
SELECT and FROM keywords, which defines WHAT to load.
Next is table list - text between FROM and WHERE keywords. It may contain
aliases for convenience (see exapmles).
The last is part of where clause, which (if exists at all) express the
condition for joining the table containing values wich table containing
primary key (foreign key equality and such). If values are in the same table
with primary key, then this column is left NULL (as for cn attribute above).

Having this information in parts, we are able to not only construct queries
that load attribute values by id of entry (for this we could store SQL query
as a whole), but to construct queries that load id's of objects that
correspond to given search filter (or at least part of it).
See below for examples.

ldap_entries
------------
id=1
dn=<dn you choose>
oc_map_id=...
parent=<parent record id>
keyval=<value of primary key>

This table defines mappings between DNs of entries in your LDAP tree,
and values of primary keys for corresponding relational data. It has
recursive structure (parent column references id column of the same table),
which allows you to add any tree structure(s) to your flat relational data.
Having id of objectclass mapping, we can determine table and column for
primary key, and keyval stores value of it, thus defining exact tuple
corresponding to LDAP entry with this DN.

Note that such design (see exact SQL table creation query) implies one
important constraint - the key must be integer. But all that I know about
well-designed schemas makes me think that it s not very narrow ;)
If anyone needs support for different types for keys - he may want to write
a patch, and submit it to OpenLDAP ITS, then I'll include it.

Also, several people complained that they don't really need very structured
tree, and they don't want to update one more table every time they add or
delete instance in relational schema. Those can use a view instead of real
table, something like this:


Robin Elfrink wrote:

> About using a view for ldap_entries...
>
> This is what I came up with this morning:
>
> CREATE VIEW ldap_entries (id, dn, oc_map_id, parent, keyval) AS
>         SELECT (1000000000+userid),
> UPPER(CONCAT(CONCAT('cn=',gecos),',o=MyCompany,c=NL'))
> , 1, 0, userid FROM unixusers UNION
>         SELECT (2000000000+groupnummer),
> UPPER(CONCAT(CONCAT('cn=',groupnaam),',o=MyCompany,c=NL')), 2, 0,
> groupnummer FROM groups;
>


3. Typical back-sql operation
Having metainformation loaded, back-sql uses these tables to determine a set
of primary keys of candidates (depending on search scope and filter). It tries
to do it for each objectclass registered in ldap_objclasses.
Exapmle:
for our query with filter (telephoneNumber=123*) we would get following
query generated (which loads candidate IDs)
SELECT ldap_entries.id,persons.id, 'person' AS objectClass, ldap_entries.dn AS dn FROM ldap_entries,persons,phones WHERE persons.id=ldap_entries.keyval AND ldap_entries.objclass=? AND ldap_entries.parent=? AND phones.pers_id=persons.id AND (phones.phone LIKE '123%')
(for ONELEVEL search)
or "... AND dn=?" (for BASE search)
or "... AND dn LIKE '%?'" (for SUBTREE)

Then, for each candidate, we load attributes requested using per-attribute queries
like

SELECT phones.phone AS telephoneNumber FROM persons,phones WHERE persons.id=? AND phones.pers_id=persons.id

Then, we use test_filter() from frontend API to test entry for full
LDAP search filter match (since we cannot effectively make sense of SYNTAX
of corresponding LDAP schema attribute, we translate the filter into most relaxed
SQL condition to filter candidates), and send it to user.

ADD,DELETE,MODIFY operations also performed on per-attribute metainformation
(add_proc etc.). In those fields one can specify an SQL statement or stored procedure
call which can add, or delete given value of given attribute, using given entry
keyval (see examples -- mostly ORACLE and MSSQL - since there're no stored procs in mySQL).

We just add more columns to oc_m,appings and attr_mappings, holding statements
to execute (like create_proc, add_proc, del_proc etc.), and flags governing
order of parameters passed to those statements.
Please see samples to find out what are the parameters passed, and other
information on this matter - they are self-explanatory for those familiar
with concept expressed above.

4. Several common techniques (referrals, multiclassing etc.)
First of all, lets remember that among other major differences to complete
LDAP data model, the concept above does not directly support such things
as multiple objectclasses for entry, and referrals.
Fortunately, they are easy to adopt in this scheme. Back-sql suggests two more
tables being added to schema -
ldap_entry_objectclasses(entry_id, oc_name), and ldap_referrals (entry_id,url).

First contains any number of objectclass names that corresponding entries
will be found by, in addition to that mentioned in mapping. Back-sql
automatically adds attribute mapping for "objectclass" attribute to each objectclass
mapping, that loads values from this table. So, you may, for instance, have
mapping for inetOrgPerson, and use it for queries for "person" objectclass...

Second table contains any number of referrals associated with given entry.
Back-sql automatically adds attribute mapping for "ref" attribute to each
objectclass mapping, that loads values from this table.
So, if you add objectclass "referral" to this entry, and make one or more
tuples in ldap_referrals for this entry (they will be seen as values of "ref"
attribute), you will have slapd return referral, as described in Administrators
Guide.