Initial proxy cache and syncrepl chapters

Kurt Zeilenga 2003-09-16 05:16:33 +00:00
parent 6489aaa08b
commit 34d6b50e45
4 changed files with 465 additions and 0 deletions


@@ -69,6 +69,12 @@ PB:
!include "replication.sdf"; chapter
PB:
!include "syncrepl.sdf"; chapter
PB:
!include "proxycache.sdf"; chapter
PB:
# Appendices
!include "../release/autoconf.sdf"; appendix
PB:


@@ -0,0 +1,133 @@
# $OpenLDAP$
# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.
H1: The Proxy Cache Engine
LDAP servers typically hold one or more subtrees of a DIT. Replica
(or shadow) servers hold shadow copies of entries held by one or
more master servers. Changes are propagated from the master server
to replica (slave) servers using LDAP Sync or {{slurpd}}(8). An
LDAP cache is a special type of replica which holds entries
corresponding to search filters instead of subtrees.
H2: Overview
The proxy cache extension of slapd handles a search request (query)
by first determining whether it is contained in any cached search
filter. Contained requests are answered from the proxy cache's local
database.
For example, {{EX:(shoesize>=9)}} is contained in {{EX:(shoesize>=8)}},
and {{EX:(sn=Richardson)}} is contained in {{EX:(sn=Richards*)}}.
Correct matching rules and syntaxes are used while comparing
assertions for query containment. To simplify the query containment
problem, a list of cacheable "templates" (defined below) is specified
at configuration time. A query is cached or answered only if it
belongs to one of these templates. The entries corresponding to
cached queries are stored in the proxy cache's local database, while
the associated meta information (filter, scope, base, attributes)
is kept in main memory. For requests which are not contained in any
cached query, the proxy cache does not send a referral; instead it
acts as a proxy and obtains the result by querying one or more
target servers. The proxy cache extends the meta backend and uses
it to connect to the target servers.
A template is a prototype for generating LDAP search requests.
Templates are described by a prototype search filter and a list of
attributes which are required in queries generated from the template.
The representation of a prototype filter is similar to RFC 2254,
except that the assertion values are missing. For example, the
prototype filters {{EX:(sn=)}} and {{EX:(&(sn=)(givenname=))}} are
instantiated by the search filters {{EX:(sn=Doe)}} and
{{EX:(&(sn=Doe)(givenname=John))}} respectively.
The cache replacement policy removes the least recently used (LRU)
query and entries belonging to only that query. Queries are allowed
a maximum time to live (TTL) in the cache thus providing weak
consistency. A background thread periodically checks the cache for
expired queries and removes them.
The Proxy Cache paper
({{URL:http://www.openldap.org/pub/kapurva/proxycaching.pdf}}) provides
design/implementation details.
H2: Proxy Cache Configuration
The cache configuration specific directives described below must
appear after the {{EX:"database meta"}} directive and before any other
{{EX:"database"}} declaration in {{slapd.conf}}(5).
H3: Setting cache parameters
> cacheparams <lo_thresh> <hi_thresh> <numattrsets> <max_entries> <cc_period>
This directive enables proxy caching and sets general cache
parameters. Cache replacement is invoked when the cache size exceeds
<hi_thresh> bytes and continues until the cache size drops below
<lo_thresh> bytes. The total number of attribute sets (as specified
by the {{EX:attrset}} directive) is given by <numattrsets>. Queries
which return more than <max_entries> entries are not cached. A
consistency check is performed every <cc_period> seconds; in each
cycle, queries with expired TTLs are removed.
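For example, the following setting (the values, in bytes and seconds,
are purely illustrative) starts replacement when the cache exceeds
150000 bytes and shrinks it below 100000 bytes, declares one attribute
set, refuses to cache queries returning more than 50 entries, and
checks consistency every 100 seconds:
> cacheparams 100000 150000 1 50 100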
H3: Defining attribute sets
> attrset <index> <attrs...>
Associates a set of attributes with an index. Each attribute set
is associated with an index number from 0 to <numattrsets>-1. These
indices are used by the {{EX:addtemplate}} directive to define
cacheable templates.
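For instance, the following directive (the attribute choice is
illustrative) defines attribute set 0 as mail, postaladdress, and
telephonenumber:
> attrset 0 mail postaladdress telephonenumber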
H3: Specifying cacheable templates
> addtemplate <prototype_string> <attrset_index> <TTL>
Specifies a cacheable template and the "time to live" (in seconds)
<TTL> for queries belonging to the template. A template is described
by its prototype filter string and a set of required attributes
identified by <attrset_index>.
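For instance, the following directive (the one-hour TTL is
illustrative) makes queries instantiated from the prototype
{{EX:(&(sn=)(givenName=))}}, such as {{EX:(&(sn=Doe)(givenName=John))}},
cacheable with attribute set 0:
> addtemplate (&(sn=)(givenName=)) 0 3600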
H3: Example
An example {{slapd.conf}}(5) for a caching server which proxies for
the backend server {{EX:ldap://server.mydomain.com}} and caches
queries with base object in the {{EX:"dc=example,dc=com"}} subtree
is described below:
> database meta
> suffix "dc=example,dc=com"
> uri ldap://server.mydomain.com/dc=example,dc=com
> cacheparams 100000 150000 1 50 100
> attrset 0 mail postaladdress telephonenumber
> addtemplate (sn=) 0 3600
> addtemplate (&(sn=)(givenName=)) 0 3600
> addtemplate (&(departmentNumber=)(secretary=*)) 0 3600
A different namespace is associated with the local cache database.
For example, if the local database suffix is
{{EX:"dc=example,dc=com,cn=cache"}}, then the following rewrite
rules need to be defined to translate between the master and cache
database naming contexts:
> rewriteEngine on
> rewriteContext cacheResult
> rewriteRule "(.*)dc=example,dc=com" "%1dc=example,dc=com,cn=cache" ":"
> rewriteContext cacheBase
> rewriteRule "(.*)dc=example,dc=com" "%1dc=example,dc=com,cn=cache" ":"
> rewriteContext cacheReturn
> rewriteRule "(.*)dc=example,dc=com,cn=cache" "%1dc=example,dc=com" ":"
Finally, the local database for storing cached entries can be declared
as follows:
> database ldbm
> suffix "dc=example,dc=com,cn=cache"
> #other database specific directives
The proxy cache database instance could be either {{TERM:BDB}} or
{{TERM:LDBM}}. A script demonstrating the proxy cache functionality
({{FILE:test019-proxycaching}}) is provided in the tests/scripts
directory of the distribution.


@@ -405,6 +405,79 @@ looks at the suffix line(s) in each database definition in the
order they appear in the file. Thus, if one database suffix is a
prefix of another, it must appear after it in the config file.
H4: syncrepl
> syncrepl id=<replica ID>
> provider=ldap[s]://<hostname>[:port]
> [updatedn=<dn>]
> [bindmethod=simple|sasl]
> [binddn=<dn>]
> [credentials=<simple passwd>]
> [saslmech=<SASL mech>]
> [secprops=<properties>]
> [realm=<realm>]
> [authcId=<authentication ID>]
> [authzId=<authorization ID>]
> [searchbase=<base DN>]
> [filter=<filter str>]
> [attrs=<attr list>]
> [scope=sub|one|base]
> [schemachecking=on|off]
> [type=refreshOnly|refreshAndPersist]
> [interval=dd:hh:mm]
This directive specifies an LDAP Sync replication between this
database and the specified replication provider site. The {{EX:id=}}
parameter identifies the LDAP Sync specification in the database.
The {{EX:provider=}} parameter specifies a replication provider site as
an LDAP URI.
The LDAP Sync replication specification is based on the search
specification which defines the content of the replica. The replica
consists of the entries matching the search specification. As with
normal searches, the search specification consists of the
{{EX:searchbase}}, {{EX:scope}}, {{EX:filter}}, and {{EX:attrs}}
parameters.
The LDAP Sync replication has two operating modes. In the
{{EX:refreshOnly}} mode, the next synchronization session is
rescheduled to run at the interval time after the current session
finishes. The default interval is one day. In the
{{EX:refreshAndPersist}} mode, the LDAP Sync search remains persistent
in the provider LDAP server. Further updates to the provider replica
will generate searchResultEntry messages to the consumer.
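For instance, a consumer could poll for changes every twelve hours
with the following fragment (the value is illustrative; the interval
format is dd:hh:mm):
> type=refreshOnly
> interval=00:12:00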
Schema checking can be enforced at the LDAP Sync consumer site by
turning on the {{EX:schemachecking}} parameter. The default is off.
The {{EX:binddn=}} parameter gives the DN to bind as for the LDAP
Sync search to the provider slapd. The content of the replica will
be subject to the access control privileges of that DN.
The {{EX:bindmethod}} is {{EX:simple}} or {{EX:sasl}}, depending
on whether simple password-based authentication or SASL authentication
is to be used when connecting to the provider slapd.
Simple authentication should not be used unless adequate integrity
and data confidentiality protections are in place (e.g. TLS or
IPsec). Simple authentication requires specification of the
{{EX:binddn}} and {{EX:credentials}} parameters.
SASL authentication is generally recommended. SASL authentication
requires specification of a mechanism using the {{EX:saslmech}}
parameter. Depending on the mechanism, an authentication identity
and/or credentials can be specified using {{EX:authcid}} and
{{EX:credentials}} respectively. The {{EX:authzid}} parameter may
be used to specify a proxy authorization identity.
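As a sketch, a SASL-based syncrepl fragment might look like the
following (the host name, mechanism, and identity are illustrative
assumptions, not values mandated by the directive):
> syncrepl id=2
> provider=ldaps://provider.example.com:636
> bindmethod=sasl
> saslmech=GSSAPI
> authcid=replica@EXAMPLE.COM
> searchbase="dc=example,dc=com"
> type=refreshAndPersist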
The LDAP Sync replication is supported in three native backends:
back-bdb, back-hdb, and back-ldbm.
See the {{SECT:LDAP Sync Replication}} chapter for more information
on how to use this directive.
H4: updatedn <dn>
This directive is only applicable in a slave slapd. It specifies


@@ -0,0 +1,253 @@
# $OpenLDAP$
# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.
H1: LDAP Sync Replication
The LDAP Sync replication engine is designed to function as an
improved alternative to {{slurpd}}(8). While replication with
{{slurpd}}(8) provides replication capability for improved capacity,
availability, and reliability, it has some drawbacks:
^ It is not stateful and hence lacks a resynchronization capability.
Because replication with {{slurpd}}(8) has no representation of
replica state, there is no efficient mechanism to make a slave
replica consistent with the master replica once they become out of
sync. For instance, if the slave database content is damaged, the
slave replica must be re-primed from the master replica. With
state-based replication, it would be possible to recover the slave
replica from a local backup; the slave replica would then be
synchronized by calculating and transmitting the diffs between the
slave replica and the master replica based on their states. The
LDAP Sync replication is stateful.
+ It is history-based, not state-based. Replication with
{{slurpd}}(8) relies on the history information in the replication
log file generated by {{slapd}}(8). If a portion of the log file
containing updates yet to be synchronized to the slave is truncated
or damaged, a full reload is required. State-based replication, on
the other hand, does not rely on a separate history store. In the
LDAP Sync replication, every directory entry carries its state
information in the entryCSN operational attribute. The replica
contents are calculated from the consumer cookie and the entryCSN
of the directory entries.
+ It is push-based, not pull-based. In replication with
{{slurpd}}(8), it is the master which decides when to synchronize
the replica; pull-based polling replication is not possible with
{{slurpd}}(8). For example, in order to make a daily directory
backup which is an exact image at a point in time, the slave replica
must be made read-only by stopping {{slurpd}}(8) during the backup.
Afterwards, {{slurpd}}(8) can be run in one-shot mode to resynchronize
the slave replica with the updates made during the backup. In a
pull-based, polling replication, the replica is guaranteed to be
read-only between two polling points. The LDAP Sync replication
supports both push-based and pull-based replication.
+ It supports only fractional replication and does not support
sparse replication. The LDAP Sync replication supports both fractional
and sparse replication: a general search specification can be used
to initiate a synchronization session for only the interesting
subset of the context.
H2: LDAP Content Sync Protocol Description
The LDAP Sync replication uses the LDAP Content Sync protocol (refer
to the Internet Draft entitled "The LDAP Content Synchronization
Operation") for replica synchronization. The LDAP Content Sync
protocol operation is based on the replica state, which is transmitted
between replicas as synchronization cookies. There are two operating
modes: refreshOnly and refreshAndPersist. In both modes, a consumer
{{slapd}}(8) connects to a provider {{slapd}}(8) with a cookie value
representing the state of the consumer replica. The non-persistent
part of the synchronization consists of two phases.
The first is the state-based phase. The entries updated after the
point in time the consumer cookie represents are transmitted to the
consumer. Because the unit of synchronization is the entry, all the
requested attributes are transmitted even if only some of them have
changed. For the rest of the entries, present messages consisting
only of the entry name and the synchronization control are sent to
the consumer. After the consumer receives all the updated and present
entries, it can reliably make its replica consistent with the
provider replica: it adds all newly added entries, replaces existing
entries for which updates were received, and deletes entries in the
local replica which were neither updated nor reported as present.
The second is the log-based phase. This phase is incorporated to
optimize the protocol with respect to the volume of present traffic.
If the provider maintains a history store from which the content
to be synchronized can be reliably calculated, this log-based phase
follows the state-based phase. In this mode, the actual directory
update operations such as delete, modify, and add are transmitted,
and there is no need to send present messages.
If the protocol operates in the refreshOnly mode, the synchronization
will then terminate. The provider will send a synchronization cookie
which reflects the new state to the consumer. The consumer will
present the new cookie the next time it requests synchronization.
If the protocol operates in the refreshAndPersist mode, the
synchronization operation remains persistent in the provider. Every
update made to the provider replica will be transmitted to the
consumer. Cookies can be sent to the consumer at any time by using
the SyncInfo intermediate response, and at the end of the
synchronization by using the SyncDone control attached to the
SearchResultDone message.
Entries are uniquely identified by the entryUUID attribute value
in the LDAP Content Sync protocol. It can serve as a reliable entry
identifier, whereas the DN of an entry can change through modrdn
operations. The entryUUID is attached to each SearchResultEntry or
SearchResultReference as part of the Sync State control.
H2: LDAP Sync Replication Details
The LDAP Sync replication uses both the refreshOnly and the
refreshAndPersist modes of synchronization. If an LDAP Sync
replication is specified in a database definition, {{slapd}}(8)
schedules an execution of the LDAP Sync replication engine. In the
refreshOnly mode, the engine is rescheduled to run at the interval
time after a replication session ends. In the refreshAndPersist
mode, the engine remains active to process the SearchResultEntry
messages from the provider.
The LDAP Sync replication uses only the state-based synchronization
phase. Because {{slapd}}(8) does not currently implement a history
store such as a changelog or tombstones, it depends only on the
state-based phase; a null log-based phase follows the state-based
phase.
As an optimization, no entries are transmitted to a consumer if
there have been no updates in the master replica since the last
synchronization with the consumer; not even present messages for
the unchanged entries are sent. The consumer simply retains its
replica contents.
H3: entryCSN
The LDAP Sync replication implemented in OpenLDAP stores state
information in every entry's entryCSN attribute. The entryCSN of
an entry is the CSN (change sequence number), a fine-grained
timestamp of the most recent update to the entry. The CSN consists
of three parts: the time, a replica ID, and a change count within
a single second.
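Because entryCSN is an operational attribute, it is returned only
when explicitly requested. A quick way to inspect the state of an
entry (the base DN is illustrative) is:
> ldapsearch -x -s base -b "cn=John Doe,dc=example,dc=com" "(objectclass=*)" entryCSN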
H3: contextCSN
contextCSN represents the current state of the provider replica.
It is the largest entryCSN of all entries in the context such that
no transaction having a smaller entryCSN value remains outstanding.
Because the entryCSN value is obtained before a transaction starts
and transactions are not committed in entryCSN order, special care
must be taken to manage the proper contextCSN value in a transactional
environment. Also, the state of the search result set must correspond
to the contextCSN value returned to the consumer as a sync cookie.
contextCSN, the provider replica state, is stored in the
syncProviderSubentry. The value of the contextCSN is transmitted
to the consumer replica as a sync cookie. The cookie is stored in
the syncreplCookie attribute of the syncConsumerSubentry subentry.
The consumer uses the stored cookie value to represent its replica
state when it next connects to the provider.
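As a sketch, the provider's state could be inspected by reading the
syncProviderSubentry; here it is assumed, purely for illustration,
to be named {{EX:cn=ldapsync}} under the database suffix:
> ldapsearch -x -s base -b "cn=ldapsync,dc=example,dc=com" "(objectclass=*)" contextCSN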
H3: Glue Entry
Because a general search filter can be used in the LDAP Sync
replication, an entry might be created without a parent if the
parent entry was filtered out. The LDAP Sync replication engine
creates glue entries for such holes in the replica. The glue entries
are not returned in response to a search at the consumer {{slapd}}(8)
unless the manageDSAit control is set, in which case they are
returned.
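For example, OpenLDAP's {{ldapsearch}}(1) attaches the manageDSAit
control with the -M flag, which would make such glue entries visible
(the base DN is illustrative):
> ldapsearch -x -M -b "dc=example,dc=com" "(objectclass=*)"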
H2: Configuring slapd for LDAP Sync Replication
Compared to replication with {{slurpd}}(8), it is relatively simple
to start servicing a replicated OpenLDAP environment with the LDAP
Sync replication. First, configure both the provider and the consumer
{{slapd}}(8) servers appropriately. Then start the provider slapd
instance first, and the consumer slapd instance next. Administrative
tasks such as copying the database and temporarily shutting down
(or demoting to read-only) the provider are not required.
H3: Set up the provider slapd
There is no special {{slapd.conf}}(5) directive for the provider
{{slapd}}(8). Because the LDAP Sync searches are subject to access
control, proper access control privileges should be set up for the
replicated content.
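For example, a minimal sketch of such an access control setup,
assuming the consumer binds as {{EX:cn=syncuser,dc=example,dc=com}}
(the DN is illustrative), might grant that identity read access to
the replicated subtree:
> access to dn.subtree="dc=example,dc=com"
>     by dn.base="cn=syncuser,dc=example,dc=com" read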
When creating a provider database from an LDIF file using
{{slapadd}}(8), you should ensure that the state indicator of the
database context is up to date. {{slapadd}}(8) will store a new
contextCSN, based on the current time, in the syncProviderSubentry
if it is given the -w flag. It is also possible to create the
syncProviderSubentry with an appropriate contextCSN value by including
it directly in the LDIF file: if {{slapadd}}(8) runs without the
-w flag, the provided contextCSN is stored; with the -w flag, a new
value based on the current time is stored instead. {{slapcat}}(8)
can be used to retrieve the directory contents together with the
contextCSN when it is run with the -m flag.
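For instance, the following commands (the file names are illustrative)
load a provider database while stamping a fresh contextCSN, and
later dump it together with that state:
> slapadd -w -l entries.ldif
> slapcat -m -l backup.ldif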
Only the back-bdb and the back-hdb backends can perform as the LDAP
Sync replication provider. Back-ldbm currently does not have the
LDAP Content Sync protocol functionality.
H3: Set up the consumer slapd
The consumer slapd is configured by slapd.conf(5) configuration
file. For the configuration directives, see syncrepl section of the
slapd Configuration File chapter. In the configuration file, make
sure the DN given in the updatedn= directive of the syncrepl
specification has permission to write to the database. Below is an
example syncrepl specification at the consumer replica :
> syncrepl id=1
> provider=ldap://provider.example.com:389
> updatedn="cn=replica,dc=example,dc=com"
> binddn="cn=syncuser,dc=example,dc=com"
> bindmethod=simple
> credentials=secret
> searchbase="dc=example,dc=com"
> filter="(objectClass=organizationalPerson)"
> attrs="cn,sn,ou,telephoneNumber,title,l"
> schemachecking=on
> scope=sub
> type=refreshOnly
> interval=01:00:00
In this example, the consumer will connect to the provider slapd
at port 389 of ldap://provider.example.com to perform a polling
(refreshOnly) mode of synchronization once a day. It will bind as
"cn=syncuser,dc=example,dc=com" using simple authentication with
the password "secret". Note that the DN specified by the binddn=
directive must exist in the slave slapd's database or be the rootdn.
Also note that the access control privileges of that DN should be
set properly to synchronize the desired replica content. The consumer
will write to its database as "cn=replica,dc=example,dc=com", which
should have write permission to the database.
The synchronization search in this example will search for entries
whose objectClass is organizationalPerson in the entire subtree
under the "dc=example,dc=com" search base, inclusively. The requested
attributes are cn, sn, ou, telephoneNumber, title, and l. Schema
checking is turned on, so the consumer {{slapd}}(8) will enforce
entry schema checking when it processes updates from the provider
{{slapd}}(8).
The LDAP Sync replication engine is backend independent. All three
native backends can perform as the LDAP Sync replication consumer.
H3: Start the provider and the consumer slapd
If the currently running provider {{slapd}}(8) already has the
syncProviderSubentry in its database, it does not need to be
restarted when you start a replicated LDAP service. When you start
a consumer {{slapd}}(8), it will immediately perform either an
initial full reload, if its cookie is NULL or too far out of date,
or an incremental synchronization, if an effective cookie is
provided. In the refreshOnly mode, the next synchronization session
is scheduled to run an interval time after the completion of the
current session. In the refreshAndPersist mode, a synchronization
session remains open between the consumer and the provider, and the
provider sends update messages to the consumer whenever there are
updates in the provider replica.
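A minimal start sequence, assuming each host uses its own
configuration file (the paths and listener URLs are illustrative),
might therefore be:
> provider# slapd -f /etc/openldap/slapd.conf -h ldap://provider.example.com/
> consumer# slapd -f /etc/openldap/slapd.conf -h ldap://consumer.example.com/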