# $OpenLDAP$
# Copyright 2003-2005, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.

H1: LDAP Sync Replication

The LDAP Sync replication engine, syncrepl for short, is a consumer-side
replication engine that enables the consumer LDAP server to maintain
a shadow copy of a DIT fragment. A syncrepl engine resides on the
consumer side as one of the {{slapd}} (8) threads. It creates and
maintains a consumer replica by connecting to the replication
provider to perform the initial DIT content load, followed either
by periodic content polling or by timely updates upon content
changes.

Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for
short) protocol as the replica synchronization protocol. It provides
stateful replication which supports both pull-based and push-based
synchronization and does not mandate the use of a history store.

Syncrepl keeps track of the status of the replication content by
maintaining and exchanging synchronization cookies. Because the
syncrepl consumer and provider maintain their content status, the
consumer can poll the provider content to perform incremental
synchronization by asking for the entries required to make the
consumer replica up to date with the provider content. Syncrepl
also enables convenient management of replicas by maintaining replica
status. The consumer replica can be constructed from a consumer-side
or a provider-side backup at any synchronization status. Syncrepl
can then automatically bring the consumer replica up to date with
the current provider content.

Syncrepl supports both pull-based and push-based synchronization.
In its basic refreshOnly synchronization mode, the provider uses
pull-based synchronization where the consumer servers need not be
tracked and no history information is maintained. The information
required for the provider to process periodic polling requests is
contained in the synchronization cookie of the request itself. To
optimize the pull-based synchronization, syncrepl utilizes the
present phase of the LDAP Sync protocol as well as its delete phase,
instead of falling back on frequent full reloads. To further optimize
the pull-based synchronization, the provider can maintain a per-scope
session log as a history store. In its refreshAndPersist mode of
synchronization, the provider uses push-based synchronization.
The provider keeps track of the consumer servers that have requested
a persistent search and sends them the necessary updates as the
provider replication content gets modified.

With syncrepl, a consumer server can create a replica without
changing the provider's configuration and without restarting the
provider server, if the consumer server has appropriate access
privileges for the DIT fragment to be replicated. Likewise, the
consumer server can stop replicating without any provider-side
configuration changes and without a provider restart.

Syncrepl supports both partial and sparse replication. The shadow
DIT fragment is defined by general search criteria consisting of
base, scope, filter, and attribute list. The replica content is
also subject to the access privileges of the bind identity of the
syncrepl replication connection.

H2: The LDAP Content Synchronization Protocol

The LDAP Sync protocol allows a client to maintain a synchronized
copy of a DIT fragment. The LDAP Sync operation is defined as a set
of controls and other protocol elements which extend the LDAP search
operation. This section introduces the LDAP Content Sync protocol
only briefly. For more information, refer to the Internet Draft
{{The LDAP Content Synchronization Operation
<draft-zeilenga-ldup-sync-05.txt>}}.

The LDAP Sync protocol supports both polling and listening for
changes by defining two respective synchronization operations:
{{refreshOnly}} and {{refreshAndPersist}}. Polling is implemented
by the {{refreshOnly}} operation. The client copy is synchronized
to the server copy at the time of polling. The server finishes the
search operation by returning {{SearchResultDone}}, as in a normal
search. Listening is implemented by the {{refreshAndPersist}}
operation. Instead of finishing the search after returning all
entries currently matching the search criteria, the synchronization
search remains persistent in the server. Subsequent updates to the
synchronization content in the server cause additional entry updates
to be sent to the client.

The {{refreshOnly}} operation and the refresh stage of the
{{refreshAndPersist}} operation can be performed with a present
phase or a delete phase.

In the present phase, the server sends the client the entries updated
within the search scope since the last synchronization. The server
sends all requested attributes of the updated entries, whether they
changed or not. For each unchanged entry which remains in the scope,
the server sends a present message consisting only of the name of the
entry and the synchronization control representing state present.
The present message does not contain any attributes of the entry.
After the client receives all update and present entries, it can
reliably determine the new client copy by adding the entries added
to the server, by replacing the entries modified at the server, and
by deleting entries in the client copy which have been neither updated
nor specified as being present at the server.
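The present-phase reconciliation rule can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not slapd's implementation. The client copy is assumed to be a dictionary keyed by {{EX:entryUUID}}, updated entries are assumed to arrive as (entryUUID, attributes) pairs, and present messages are assumed to carry only an entryUUID. The function name {{EX:apply_present_phase}} is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical entry model: attribute name -> list of values.
Entry = Dict[str, List[str]]


def apply_present_phase(
    client_copy: Dict[str, Entry],
    updates: Iterable[Tuple[str, Entry]],
    present: Iterable[str],
) -> Dict[str, Entry]:
    """Rebuild a client copy after a present-phase refresh.

    client_copy: the current replica, keyed by entryUUID
    updates:     (entryUUID, attributes) for entries added or modified
    present:     entryUUIDs of unchanged entries still in scope
    """
    new_copy: Dict[str, Entry] = {}
    for uuid, attrs in updates:
        # Updated entries carry all requested attributes, so they
        # replace whatever the client held, or are added if new.
        new_copy[uuid] = attrs
    for uuid in present:
        # Present messages carry no attributes: keep the local copy.
        if uuid in client_copy and uuid not in new_copy:
            new_copy[uuid] = client_copy[uuid]
    # Entries that were neither updated nor present are left out,
    # which is how the client learns they were deleted.
    return new_copy
```

Suppose the client copy holds u1, u2 and u3, the server sends updates for u2 and u4, and it sends a present message for u1. The result keeps u1, replaces u2, adds u4, and drops u3. The client never receives an explicit delete for u3.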
The transmission of the updated entries in the delete phase is the
same as in the present phase. The server sends all the requested
attributes of the entries updated within the search scope since the
last synchronization to the client. In the delete phase, however,
the server sends a delete message for each entry deleted from the
search scope, instead of sending present messages. The delete
message consists only of the name of the entry and the synchronization
control representing state delete. The new client copy can be
determined by adding, modifying, and removing entries according to
the synchronization control attached to each {{SearchResultEntry}}
message.
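The delete phase has the opposite default: unchanged entries are kept unless the server names them. The Python sketch below uses the same assumed model as a present-phase sketch would, with a replica keyed by {{EX:entryUUID}}. The function name {{EX:apply_delete_phase}} is hypothetical. It applies all updates before all deletes, which is a simplification of the interleaved message stream.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical entry model: attribute name -> list of values.
Entry = Dict[str, List[str]]


def apply_delete_phase(
    client_copy: Dict[str, Entry],
    updates: Iterable[Tuple[str, Entry]],
    deleted: Iterable[str],
) -> Dict[str, Entry]:
    """Apply a delete-phase refresh to a replica keyed by entryUUID."""
    # Unlike the present phase, unchanged entries are kept by default.
    new_copy = dict(client_copy)
    for uuid, attrs in updates:
        new_copy[uuid] = attrs  # add a new entry or replace a modified one
    for uuid in deleted:
        new_copy.pop(uuid, None)  # remove; ignore UUIDs the client lacks
    return new_copy
```

The server does not need to enumerate unchanged entries in this phase. That is why the delete phase is cheaper whenever a history store can supply the deletions.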
If the LDAP Sync server maintains a history store and can determine
which entries have been scoped out of the client copy since the last
synchronization, the server can use the delete phase. The server
should use the present phase in any of these cases:

* it does not maintain any history store;
* it cannot determine the scoped-out entries from the history store;
* the history store does not reach back to the client's outdated
synchronization state.

The present phase is much more efficient than a full content reload
in terms of synchronization traffic. To reduce the synchronization
traffic further, the LDAP Sync protocol also provides several
optimizations. These include the transmission of normalized
{{EX:entryUUID}}s and the transmission of multiple {{EX:entryUUID}}s
in a single {{syncIdSet}} message.
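The server-side choice between the two phases, and the idea behind {{syncIdSet}}, can be sketched in Python. This is a sketch under stated assumptions, not the protocol's normative algorithm:

* CSN values are assumed to sort correctly as plain strings.
* {{EX:history_start_csn}} is a hypothetical name for the point from which the history store is complete.
* The {{EX:batch_size}} parameter is an illustrative choice, not a protocol limit.

```python
from typing import List, Optional, Sequence


def choose_refresh_phase(client_csn: Optional[str],
                         history_start_csn: Optional[str]) -> str:
    """Return "delete" or "present" for the next refresh.

    client_csn:        state from the client's cookie (None: no cookie)
    history_start_csn: CSN from which the history store is complete
                       (None: no history store is kept)
    """
    if client_csn is None or history_start_csn is None:
        # Initial load, or no history to consult.
        return "present"
    if client_csn >= history_start_csn:
        # Every deletion after the client's state has been recorded.
        return "delete"
    # The history does not reach back to the client's state.
    return "present"


def batch_uuids(uuids: Sequence[str], batch_size: int) -> List[List[str]]:
    """Group entryUUIDs so that several share one syncIdSet message."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(uuids[i:i + batch_size])
            for i in range(0, len(uuids), batch_size)]
```

Batching matters most in the present phase. There, a large, mostly unchanged replica would otherwise cost one present message per entry.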
At the end of the {{refreshOnly}} synchronization, the server sends
a synchronization cookie to the client as a state indicator of the
client copy after the synchronization is completed. The client
will present the received cookie when it requests the next incremental
synchronization from the server.

When {{refreshAndPersist}} synchronization is used, the server sends
a synchronization cookie at the end of the refresh stage by sending
a Sync Info message with refreshDone set to TRUE. It also sends a
synchronization cookie by attaching it to the {{SearchResultEntry}}
messages generated in the persist stage of the synchronization search.
During the persist stage, the server can also send a Sync Info message
containing the synchronization cookie at any time the server wants
to update the client-side state indicator. The server also updates
a synchronization indicator of the client at the end of the persist
stage.

In the LDAP Sync protocol, entries are uniquely identified by the
{{EX:entryUUID}} attribute value, which functions as a reliable
identifier of the entry. The DN of the entry, on the other hand,
can change over time and hence cannot be considered a reliable
identifier. The {{EX:entryUUID}} is attached to each
{{SearchResultEntry}} or {{SearchResultReference}} as a part of the
synchronization control.
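Why the replica is keyed by {{EX:entryUUID}} rather than by DN can be shown with a small Python sketch. The replica model, the function names {{EX:apply_update}} and {{EX:find_by_dn}}, and the UUID value "u1" are all hypothetical. Real DN comparison would also require normalization, which this sketch skips.

```python
from typing import Dict, Tuple

# Hypothetical replica model: entryUUID -> (DN, attributes).
Replica = Dict[str, Tuple[str, Dict[str, list]]]


def apply_update(replica: Replica, uuid: str, dn: str,
                 attrs: Dict[str, list]) -> None:
    """Add, modify, or rename the entry identified by uuid."""
    # Because the key is the entryUUID, a rename overwrites the old
    # record instead of leaving a stale entry under the previous DN.
    replica[uuid] = (dn, attrs)


def find_by_dn(replica: Replica, dn: str) -> str:
    """Return the entryUUID currently holding dn (KeyError if none)."""
    for uuid, (entry_dn, _) in replica.items():
        if entry_dn == dn:
            return uuid
    raise KeyError(dn)
```

After the provider renames an entry, the update arrives with the same entryUUID and the new DN. The replica still holds exactly one copy of that entry.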
H2: Syncrepl Details

The syncrepl engine utilizes both the {{refreshOnly}} and the
{{refreshAndPersist}} operations of the LDAP Sync protocol. If a
syncrepl specification is included in a database definition, {{slapd}}
(8) launches a syncrepl engine as a {{slapd}} (8) thread and schedules
its execution. If the {{refreshOnly}} operation is specified, the
syncrepl engine is rescheduled to run again one interval after each
synchronization operation completes. If the {{refreshAndPersist}}
operation is specified, the engine will remain active and process
the persistent synchronization messages from the provider.
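The {{refreshOnly}} scheduling and cookie exchange can be modelled as a simple loop. This Python sketch is an assumption-laden illustration, not slapd's scheduler:

* {{EX:refresh}} is a hypothetical stand-in for one LDAP Sync search. It takes the previous cookie, or None on the first run, and returns the cookie the provider sends at the end of the refresh.
* {{EX:sleep}} is injected so the loop can be tested.
* {{EX:rounds}} exists only to make the loop finite. The real engine keeps running.

```python
from typing import Callable, List, Optional


def run_refresh_only(
    refresh: Callable[[Optional[str]], str],
    sleep: Callable[[float], None],
    interval_seconds: float,
    rounds: int,
) -> List[str]:
    """Poll, keep the returned cookie, wait one interval, repeat."""
    cookie: Optional[str] = None  # no cookie: initial full load
    cookies: List[str] = []
    for i in range(rounds):
        # Present the previous cookie so the provider only sends
        # what is needed to bring the replica up to date.
        cookie = refresh(cookie)
        cookies.append(cookie)
        if i + 1 < rounds:
            # The interval is measured from completion of the refresh.
            sleep(interval_seconds)
    return cookies
```

Passing {{EX:time.sleep}} as {{EX:sleep}} gives real waiting. In a {{refreshAndPersist}} engine, the loop would instead stay inside one search and update the cookie whenever an entry or Sync Info message carries one.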
The syncrepl engine utilizes both the present phase and the delete
phase of the refresh synchronization. It is possible to configure
a per-scope session log in the provider server which stores the
{{EX:entryUUID}}s of a finite number of entries deleted from the
replication content. Multiple replicas of a single provider's content
share the same per-scope session log. The syncrepl engine uses the
delete phase if the session log is present and the state of the
consumer server is recent enough that no session log entries have
been truncated since the client's last synchronization. The syncrepl
engine uses the present phase if no session log is configured for
the replication content or if the consumer replica is too outdated
to be covered by the session log. The current design of the session
log store is memory based, so the information contained in the
session log does not persist across provider restarts. The session
log store cannot currently be accessed using LDAP operations, and
access control cannot currently be imposed on the session log.
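The coverage test described above can be sketched with a bounded in-memory queue. This Python sketch is illustrative only. The class name {{EX:SessionLog}} and its methods are hypothetical, and CSN values are assumed to sort correctly as plain strings. Like the real session log, it lives only in memory.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class SessionLog:
    """Toy in-memory session log holding the last `size` deletions."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # (CSN of the delete, entryUUID), oldest first.
        self.entries: Deque[Tuple[str, str]] = deque(maxlen=size)
        # Newest CSN that has been pushed out of the log, if any.
        self.truncated_csn: Optional[str] = None

    def record_delete(self, csn: str, uuid: str) -> None:
        if len(self.entries) == self.entries.maxlen:
            # deque(maxlen=...) drops the oldest record on append.
            self.truncated_csn = self.entries[0][0]
        self.entries.append((csn, uuid))

    def deletes_since(self, client_csn: str) -> Optional[List[str]]:
        """entryUUIDs deleted after client_csn, or None if not covered."""
        if self.truncated_csn is not None and client_csn < self.truncated_csn:
            return None  # a needed record was truncated: use present phase
        return [uuid for csn, uuid in self.entries if csn > client_csn]
```

A {{EX:None}} result corresponds to the case where the consumer is too outdated, and the engine falls back to the present phase.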
As a further optimization, even when the synchronization search is
not associated with any session log, no entries will be transmitted
to the consumer server when there has been no update in the
replication context.

The syncrepl engine, which is a consumer-side replication engine,
can work with any backend. The LDAP Sync provider can be configured
as an overlay on any backend, but works best with the {{back-bdb}}
or {{back-hdb}} backend. The provider cannot support refreshAndPersist
mode on {{back-ldbm}} due to limits in that backend's locking
architecture.

The LDAP Sync provider maintains a {{EX:contextCSN}} for each
database as the current synchronization state indicator of the
provider content. It is the largest {{EX:entryCSN}} in the provider
context such that no transactions for an entry having a smaller
{{EX:entryCSN}} value remain outstanding. The {{EX:contextCSN}}
cannot simply be set to the largest issued {{EX:entryCSN}}, because
an {{EX:entryCSN}} is obtained before a transaction starts and
transactions are not committed in the order their CSNs were issued.
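The rule for the {{EX:contextCSN}} can be written as a short Python function. This is a sketch, not slapd's code. It assumes every issued {{EX:entryCSN}} eventually commits, it ignores aborted transactions, and it assumes CSN values sort correctly as plain strings. The function name is hypothetical.

```python
from typing import Iterable, Optional, Set


def context_csn(issued: Iterable[str], committed: Set[str]) -> Optional[str]:
    """Largest committed entryCSN with no smaller CSN still outstanding.

    issued:    every entryCSN handed out (before its transaction starts)
    committed: the subset whose transactions have committed
    """
    result: Optional[str] = None
    for csn in sorted(issued):
        if csn not in committed:
            # An older transaction is still open; stop here even if
            # newer transactions have already committed.
            break
        result = csn
    return result
```

Suppose CSNs 001, 002 and 004 have committed but 003 is still open. The function returns 002, not 004. Advertising 004 would let a consumer skip the change carrying 003 once it commits.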
The provider stores the {{EX:contextCSN}} of a context in the
{{EX:contextCSN}} attribute of the context suffix entry. The attribute
is not written to the database after every update operation, though;
instead it is maintained primarily in memory. At database start
time the provider reads the last saved {{EX:contextCSN}} into memory
and uses the in-memory copy exclusively thereafter. By default,
changes to the {{EX:contextCSN}} as a result of database updates
will not be written to the database until the server is cleanly
shut down. A checkpoint facility exists to cause the {{EX:contextCSN}}
to be written out more frequently if desired.

Note that at startup time, if the provider is unable to read a
{{EX:contextCSN}} from the suffix entry, it will scan the entire
database to determine the value, and this scan may take quite a
long time on a large database. When a {{EX:contextCSN}} value is
read, the database will still be scanned for any {{EX:entryCSN}}
values greater than it, to make sure the {{EX:contextCSN}} value
truly reflects the greatest committed {{EX:entryCSN}} in the database.
On databases which support inequality indexing, setting an eq index
on the {{EX:entryCSN}} attribute and configuring {{EX:contextCSN}}
checkpoints will greatly speed up this scanning step.

If no {{EX:contextCSN}} can be determined by reading and scanning
the database, a new value will be generated. Also, if scanning the
database yields a greater {{EX:entryCSN}} than was previously
recorded in the suffix entry's {{EX:contextCSN}} attribute, a
checkpoint will be immediately written with the new value.
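The startup logic described in the last two paragraphs can be sketched in Python. Several assumptions apply:

* {{EX:entry_csns}} stands in for the database scan. A real server would use an eq index on {{EX:entryCSN}} to find only the values above the stored one.
* {{EX:new_csn}} is a hypothetical generator for a fresh value.
* CSNs are assumed to sort correctly as plain strings.

The text above only states that a checkpoint is written immediately when a stored value is superseded. The sketch therefore reports {{EX:True}} only in that case.

```python
from typing import Callable, Iterable, Optional, Tuple


def startup_context_csn(
    stored: Optional[str],
    entry_csns: Iterable[str],
    new_csn: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (contextCSN, write_checkpoint_now) at database start.

    stored:     contextCSN read from the suffix entry, or None
    entry_csns: entryCSN values found by scanning the database
    new_csn:    used when no value can be determined at all
    """
    newer = [c for c in entry_csns if stored is None or c > stored]
    if newer:
        # The scan found committed changes beyond the stored value.
        # Only a superseded stored value triggers an immediate write.
        return max(newer), stored is not None
    if stored is not None:
        return stored, False  # the stored value is already current
    return new_csn(), False  # nothing found: generate a fresh value
```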
The consumer also stores its replica state, which is the provider's
{{EX:contextCSN}} received as a synchronization cookie, in the
{{EX:contextCSN}} attribute of the suffix entry. The replica state
maintained by a consumer server is used as the synchronization state
indicator when it performs subsequent incremental synchronization
with the provider server. It is also used as a provider-side
synchronization state indicator when it functions as a secondary
provider server in a cascading replication configuration. Since
the consumer and provider state information are maintained in the
same location within their respective databases, any consumer can
be promoted to a provider (and vice versa) without any special
actions.

Because a general search filter can be used in the syncrepl
specification, some entries in the context may be omitted from the
synchronization content. The syncrepl engine creates glue entries
to fill in the holes in the replica context if any part of the
replica content is subordinate to the holes. The glue entries will
not be returned in the search result unless the {{ManageDsaIT}}
control is provided.
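Which glue entries are needed follows from the DN tree. This Python sketch computes the ancestors of replicated entries, up to and including the suffix, that the filter left out. Assumptions: DNs are already normalized, and no RDN value contains an escaped comma, since the naive {{EX:parent_dn}} split ignores escaping. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def parent_dn(dn: str) -> str:
    """Drop the leftmost RDN (naive: assumes no escaped commas)."""
    return dn.split(",", 1)[1] if "," in dn else ""


def glue_entries_needed(suffix: str, replicated: Iterable[str]) -> List[str]:
    """Ancestors, up to and including suffix, missing from the replica."""
    present: Set[str] = set(replicated)
    glue: Set[str] = set()
    for dn in present:
        parent = parent_dn(dn)
        # Walk upwards while still inside the replica context.
        while parent == suffix or parent.endswith("," + suffix):
            if parent not in present:
                glue.add(parent)
            parent = parent_dn(parent)
    return sorted(glue)
```

Consider a filter such as {{EX:(objectClass=organizationalPerson)}}. It replicates the people but neither their {{EX:ou}} container nor the suffix entry, so both would need glue.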
Also as a consequence of the search filter used in the syncrepl
specification, a modification may remove an entry from the
replication scope even though the entry has not been deleted
on the provider. Logically the entry must be deleted on the consumer,
but in {{refreshOnly}} mode the provider cannot detect and propagate
this change without the use of the session log.

H2: Configuring Syncrepl

Because syncrepl is a consumer-side replication engine, the syncrepl
specification is defined in {{slapd.conf}} (5) of the consumer
server, not in the provider server's configuration file. The initial
loading of the replica content can be performed either by starting
the syncrepl engine with no synchronization cookie or by populating
the consumer replica by adding an {{TERM:LDIF}} file dumped as a
backup at the provider.

When loading from a backup, the backup does not need to be an
up-to-date copy of the provider content. The syncrepl engine will
automatically synchronize the initial consumer replica to the current
provider content. As a result, it is not necessary to stop the
provider server to avoid replica inconsistency caused by updates to
the provider content during the backup and loading process.

When replicating a large-scale directory, especially in a
bandwidth-constrained environment, it is advisable to load the
consumer replica from a backup instead of performing a full initial
load using syncrepl.

H3: Set up the provider slapd

The provider is implemented as an overlay, so the overlay itself
must first be configured in {{slapd.conf}} (5) before it can be
used. The provider has only two configuration directives: one for
setting checkpoints on the {{EX:contextCSN}} and one for configuring
the session log. Because the LDAP Sync search is subject to access
control, proper access control privileges should be set up for the
replicated content.

The {{EX:contextCSN}} checkpoint is configured by the

> syncprov-checkpoint <ops> <minutes>

directive. Checkpoints are only tested after successful write
operations. If {{<ops>}} or more write operations have been performed,
or more than {{<minutes>}} minutes have passed, since the last
checkpoint, a new checkpoint is performed.
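The checkpoint test can be modelled in Python as below. The class name {{EX:CheckpointPolicy}} is hypothetical. The sketch reads the rule literally and gives no special meaning to zero values. It only runs the test after a successful write, so a quiet server never checkpoints on a timer alone.

```python
import time
from typing import Optional


class CheckpointPolicy:
    """Model of the test behind syncprov-checkpoint <ops> <minutes>."""

    def __init__(self, ops: int, minutes: int,
                 now: Optional[float] = None) -> None:
        self.ops = ops
        self.minutes = minutes
        self.writes_since = 0  # successful writes since last checkpoint
        self.last = time.time() if now is None else now

    def after_successful_write(self, now: Optional[float] = None) -> bool:
        """Record one write; return True if a checkpoint is due."""
        now = time.time() if now is None else now
        self.writes_since += 1
        due = (self.writes_since >= self.ops
               or now - self.last > self.minutes * 60)
        if due:
            # Checkpoint written: restart both counters.
            self.writes_since = 0
            self.last = now
        return due
```

Take the settings {{EX:syncprov-checkpoint 100 10}}. The 100th write triggers a checkpoint. So does the first write made more than ten minutes after the previous checkpoint.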
The session log is configured by the

> syncprov-sessionlog <size>

directive, where {{<size>}} is the maximum number of session log
entries the session log can record. When a session log is configured,
it is automatically used for all LDAP Sync searches within the
database.

Note that using the session log requires searching on the
{{EX:entryUUID}} attribute. Setting an eq index on this attribute
will greatly benefit the performance of the session log on the
provider.

A more complete example of the {{slapd.conf}} content is thus:

> database bdb
> suffix dc=Example,dc=com
> rootdn dc=Example,dc=com
> directory /var/ldap/db
> index objectclass,entryCSN,entryUUID eq
>
> overlay syncprov
> syncprov-checkpoint 100 10
> syncprov-sessionlog 100

H3: Set up the consumer slapd

The syncrepl replication is specified in the database section of
{{slapd.conf}} (5) for the replica context. The syncrepl engine
is backend independent and the directive can be defined with any
database type. Continuation lines of the {{EX:syncrepl}} directive
must begin with white space, as shown below.

> database hdb
> suffix dc=Example,dc=com
> rootdn dc=Example,dc=com
> directory /var/ldap/db
> index objectclass,entryCSN,entryUUID eq
>
> syncrepl rid=123
>         provider=ldap://provider.example.com:389
>         type=refreshOnly
>         interval=01:00:00:00
>         searchbase="dc=example,dc=com"
>         filter="(objectClass=organizationalPerson)"
>         scope=sub
>         attrs="cn,sn,ou,telephoneNumber,title,l"
>         schemachecking=off
>         bindmethod=simple
>         binddn="cn=syncuser,dc=example,dc=com"
>         credentials=secret

In this example, the consumer will connect to the provider slapd
at port 389 of {{FILE:ldap://provider.example.com}} to perform a
polling ({{refreshOnly}}) mode of synchronization once a day. It
will bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple
authentication with password "secret". Note that the access control
privilege of {{EX:cn=syncuser,dc=example,dc=com}} should be set
appropriately in the provider to retrieve the desired replication
content. Also, the search limits on the provider must be high enough
to allow the syncuser to retrieve a complete copy of the requested
content. The consumer uses the rootdn to write to its database, so
it always has full permissions to write all content.
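The {{EX:interval=01:00:00:00}} value in the example is read as days, hours, minutes and seconds, which is why it means once a day. The Python helper below shows the conversion. Its name is hypothetical, and its range checks are this sketch's own choices, not documented slapd behaviour.

```python
def interval_seconds(spec: str) -> int:
    """Convert an interval written as dd:hh:mm:ss into seconds."""
    parts = spec.split(":")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected dd:hh:mm:ss, got %r" % spec)
    days, hours, minutes, seconds = (int(p) for p in parts)
    # Range checks added by this sketch for readability.
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError("field out of range in %r" % spec)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With this helper, {{EX:interval_seconds("01:00:00:00")}} gives 86400 seconds, and a five-minute poll written as {{EX:00:00:05:00}} gives 300.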
The synchronization search in the above example will search for the
entries whose objectClass is organizationalPerson in the entire
subtree rooted at {{EX:dc=example,dc=com}}. The requested attributes
are {{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
{{EX:title}}, and {{EX:l}}. Schema checking is turned off, so
that the consumer {{slapd}} (8) will not enforce entry schema
checking when it processes updates from the provider {{slapd}} (8).

For more detailed information on the syncrepl directive, see the
{{SECT:syncrepl}} section of the {{SECT:The slapd Configuration File}}
chapter of this admin guide.

H3: Start the provider and the consumer slapd

The provider {{slapd}} (8) is not required to be restarted.
The {{EX:contextCSN}} is automatically generated as needed. It might
be originally contained in the {{TERM:LDIF}} file, generated by
{{slapadd}} (8), generated upon changes in the context, or generated
when the first LDAP Sync search arrives at the provider. If an
LDIF file is being loaded which did not previously contain the
{{EX:contextCSN}}, the {{-w}} option should be used with {{slapadd}}
(8) to cause it to be generated. This will allow the server to
start up a little more quickly the first time it runs.

When starting a consumer {{slapd}} (8), it is possible to provide
a synchronization cookie as the {{-c cookie}} command line option
in order to start the synchronization from a specific state. The
cookie is a comma-separated list of name=value pairs. Currently
supported syncrepl cookie fields are {{csn=<csn>}} and {{rid=<rid>}}.
{{<csn>}} represents the current synchronization state of the
consumer replica. {{<rid>}} identifies a consumer replica locally
within the consumer server. It is used to relate the cookie to the
syncrepl definition in {{slapd.conf}} (5) which has the matching
replica identifier. The {{<rid>}} must have no more than 3 decimal
digits. The command line cookie overrides the synchronization
cookie stored in the consumer replica database.
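The cookie format described above can be checked before it is passed with {{-c}}. This Python sketch is a hypothetical validation helper, not part of {{slapd}} (8). It accepts only the two documented fields and enforces the three-digit limit on {{<rid>}}. The CSN value in the test is a made-up placeholder.

```python
from typing import Dict


def parse_cookie(cookie: str) -> Dict[str, str]:
    """Split a 'csn=<csn>,rid=<rid>' cookie into its fields."""
    fields: Dict[str, str] = {}
    for pair in cookie.split(","):
        name, sep, value = pair.partition("=")
        name = name.strip()
        if not sep or name not in ("csn", "rid"):
            raise ValueError("unsupported cookie field: %r" % pair)
        fields[name] = value.strip()
    rid = fields.get("rid")
    if rid is not None and not (rid.isdigit() and len(rid) <= 3):
        raise ValueError("rid must have at most 3 decimal digits")
    return fields
```

The {{EX:rid}} returned here must match the {{EX:rid}} of a {{EX:syncrepl}} directive in {{slapd.conf}} (5), such as {{EX:rid=123}} in the consumer example above.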