# $OpenLDAP$
# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.

H1: LDAP Sync Replication

The LDAP Sync replication engine, syncrepl for short, is a consumer-side
replication engine that enables the consumer LDAP server to maintain
a shadow copy of a DIT fragment. A syncrepl engine resides on the
consumer side as one of the {{slapd}} (8) threads. It creates and
maintains a consumer replica by connecting to the replication provider
to perform the initial DIT content load followed either by
periodic content polling or by timely updates upon content changes.

Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for short)
protocol as the replica synchronization protocol. It provides
stateful replication which supports both
pull-based and push-based synchronization and does not mandate
the use of a history store.

Syncrepl keeps track of the status of the replication content by
maintaining and exchanging synchronization cookies. Because the
syncrepl consumer and provider maintain their content status,
the consumer can poll the provider content to perform incremental
synchronization by asking for the entries required to bring the consumer
replica up to date with the provider content. Syncrepl also enables
convenient management of replicas by maintaining replica status.
The consumer replica can be constructed from a consumer-side or a
provider-side backup at any synchronization status. Syncrepl can
automatically bring the consumer replica up to date with the
current provider content.

Syncrepl supports both pull-based and
push-based synchronization. In its basic refreshOnly synchronization mode,
the provider uses pull-based synchronization where the consumer servers
need not be tracked and no history information is maintained.
The information required for the provider to process periodic polling
requests is contained in the synchronization cookie of the request itself.
To optimize the pull-based synchronization, syncrepl utilizes the present
phase of the LDAP Sync protocol as well as its delete phase, instead of
falling back on frequent full reloads. To further optimize the pull-based
synchronization, the provider can maintain a per-scope session log
as a history store. In its refreshAndPersist mode of synchronization,
the provider uses push-based synchronization. The provider keeps
track of the consumer servers that have requested a persistent search
and sends them necessary updates as the provider replication content
gets modified.

With syncrepl, a consumer server can create a replica without changing
the provider's configuration and without restarting the provider server,
if the consumer server has appropriate access privileges for the
DIT fragment to be replicated. The consumer server can likewise stop
replication without provider-side changes or a restart.

Syncrepl supports both partial and sparse replications.
The shadow DIT fragment is defined by general
search criteria consisting of base, scope, filter, and attribute list.
The replica content is also subject to the access privileges
of the bind identity of the syncrepl replication connection.
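
As an illustration of such search criteria, the following is a minimal
sketch of a consumer-side specification that replicates only one level
of a subtree and only a few attributes. The {{EX:rid}} value, provider
URL, search base, filter, attribute list, and bind identity are all
hypothetical placeholders chosen for this sketch.

> syncrepl rid=001
>     provider=ldap://provider.example.com:389
>     type=refreshOnly
>     interval=00:01:00:00
>     searchbase="ou=people,dc=example,dc=com"
>     scope=one
>     filter="(objectClass=organizationalPerson)"
>     attrs="cn,sn,telephoneNumber"
>     bindmethod=simple
>     binddn="cn=syncuser,dc=example,dc=com"
>     credentials=secret

Entries outside the search base or scope, entries that do not match the
filter, and attributes not listed in {{EX:attrs}} would not appear in
the replica built from this specification.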
|
|
|
|
|
|
H2: The LDAP Content Synchronization Protocol

The LDAP Sync protocol allows a client to maintain a synchronized copy
of a DIT fragment. The LDAP Sync operation is defined as a set of
controls and other protocol elements which extend the LDAP search
operation. This section introduces the LDAP Content Sync protocol
only briefly. For more information, refer to the Internet Draft
{{The LDAP Content Synchronization Operation
<draft-zeilenga-ldup-sync-05.txt>}}.

The LDAP Sync protocol supports both polling and listening for
changes by defining two respective synchronization operations:
{{refreshOnly}} and {{refreshAndPersist}}.
Polling is implemented by the {{refreshOnly}} operation.
The client copy is synchronized to the server copy at the time of polling.
The server finishes the search operation by returning {{SearchResultDone}},
as in a normal search.
Listening is implemented by the {{refreshAndPersist}} operation.
Instead of finishing the search after returning all entries currently
matching the search criteria, the synchronization search remains
persistent in the server. Subsequent updates to the synchronization content
in the server cause additional entry updates to be sent to the client.

The {{refreshOnly}} operation and the refresh stage of the
{{refreshAndPersist}} operation can be performed with
a present phase or a delete phase.

In the present phase, the server sends the client the entries updated
within the search scope since the last synchronization. The server sends
all requested attributes, whether changed or not, of the updated entries.
For each unchanged entry which remains in the scope,
the server sends a present message consisting only of the name of the
entry and the synchronization control representing state present.
The present message does not contain any attributes of the entry.
After the client receives all update and present entries,
it can reliably determine the new client copy by adding the entries
added to the server, by replacing the entries modified at the server,
and by deleting entries in the client copy which have been
neither updated nor specified as being present at the server.

The transmission of the updated entries in the delete phase is
the same as in the present phase. The server sends all the requested
attributes of the entries updated within the search scope since the
last synchronization to the client. In the delete phase, however,
the server sends a delete message for each entry deleted from the
search scope, instead of sending present messages.
The delete message consists only of the name of the entry
and the synchronization control representing state delete.
The new client copy can be determined by adding, modifying, and
removing entries according to the synchronization control
attached to the {{SearchResultEntry}} message.

If the LDAP Sync server maintains a history store
and can determine which entries are scoped out of the client
copy since the last synchronization time, the server can use
the delete phase. If the server does not maintain any history store,
cannot determine the scoped-out entries from the history store,
or the history store does not cover the outdated synchronization
state of the client, the server should use the present phase.
The use of the present phase is much more efficient than a full
content reload in terms of the synchronization traffic.
To reduce the synchronization traffic further,
the LDAP Sync protocol also provides several optimizations
such as the transmission of the normalized {{EX:entryUUID}}s and the
transmission of multiple {{EX:entryUUID}}s in a single
{{syncIdSet}} message.

At the end of the {{refreshOnly}} synchronization,
the server sends a synchronization cookie to the client as a state
indicator of the client copy after the synchronization is completed.
The client will present the received cookie when it requests
the next incremental synchronization from the server.

When {{refreshAndPersist}} synchronization is used,
the server sends a synchronization cookie at the end of the
refresh stage by sending a Sync Info message with TRUE refreshDone.
It also sends a synchronization cookie by attaching it to each
{{SearchResultEntry}} generated in the persist stage of the
synchronization search. During the persist stage, the server
can also send a Sync Info message containing the synchronization
cookie at any time the server wants to update the client-side state
indicator. The server also updates a synchronization indicator
of the client at the end of the persist stage.

In the LDAP Sync protocol, entries are uniquely identified by
the {{EX:entryUUID}} attribute value. It can function as a reliable
identifier of the entry. The DN of the entry, on the other hand,
can change over time and hence cannot be considered a reliable
identifier. The {{EX:entryUUID}} is attached to each {{SearchResultEntry}}
or {{SearchResultReference}} as a part of the synchronization control.


H2: Syncrepl Details

The syncrepl engine utilizes both the {{refreshOnly}} and the
{{refreshAndPersist}} operations of the LDAP Sync protocol.
If a syncrepl specification is included in a database definition,
{{slapd}} (8) launches a syncrepl engine as a {{slapd}} (8) thread
and schedules its execution. If the {{refreshOnly}} operation is
specified, the syncrepl engine will be rescheduled at the interval
time after a synchronization operation is completed.
If the {{refreshAndPersist}} operation is specified, the engine will
remain active and process the persistent synchronization messages
from the provider.
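
For illustration, a consumer that should remain active and receive
changes as they occur would select the {{refreshAndPersist}} operation.
The following is only a sketch: the {{EX:rid}} value, provider URL,
search base, and bind identity are hypothetical placeholders.

> syncrepl rid=123
>     provider=ldap://provider.example.com:389
>     type=refreshAndPersist
>     searchbase="dc=example,dc=com"
>     scope=sub
>     bindmethod=simple
>     binddn="cn=syncuser,dc=example,dc=com"
>     credentials=secret

No {{EX:interval}} is given in this sketch because the interval applies
to the rescheduling that takes place in the {{refreshOnly}} case.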
|
|
|
|
The syncrepl engine utilizes both the present phase and the
delete phase of the refresh synchronization. It is possible to
configure a per-scope session log in the provider server
which stores the {{EX:entryUUID}}s of a finite
number of entries deleted from a replication content.
Multiple replicas of a single provider content share the same
per-scope session log. The syncrepl engine uses the delete phase
if the session log is present and the state of the consumer
server is recent enough that no session log entries have been truncated
since the last synchronization of the client.
The syncrepl engine uses the present phase if no session log
is configured for the replication content or if the
consumer replica is too outdated to be covered by the session log.
The current design of the session log store is memory based, so
the information contained in the session log is not persistent
over multiple provider invocations. Accessing the session log store
through LDAP operations is not currently supported, nor is
imposing access control on the session log.

As a further optimization, even when the synchronization search
is not associated with any session log, no entries will be transmitted
to the consumer server when there has been no update in the replication
context.

The syncrepl engine, which is a consumer-side replication engine,
can work with any backend. The LDAP Sync provider can be configured
as an overlay on any backend, but works best with the {{back-bdb}} or
{{back-hdb}} backend. The provider cannot support refreshAndPersist
mode on {{back-ldbm}} due to limits in that backend's locking architecture.

The LDAP Sync provider maintains a {{EX:contextCSN}} for each
database as the current synchronization state indicator of the
provider content. It is the largest {{EX:entryCSN}} in the provider
context such that no transaction for an entry having a
smaller {{EX:entryCSN}} value remains outstanding.
The {{EX:contextCSN}} cannot simply be set to the largest issued
{{EX:entryCSN}} because the {{EX:entryCSN}} is obtained before
a transaction starts and transactions are not committed in the
issue order.

The provider stores the {{EX:contextCSN}} of a context in the
{{EX:contextCSN}} attribute of the context suffix entry. The attribute
is not written to the database after every update operation, though;
instead it is maintained primarily in memory. At database start time
the provider reads the last saved {{EX:contextCSN}} into memory and
uses the in-memory copy exclusively thereafter. By default, changes
to the {{EX:contextCSN}} as a result of database updates will not be
written to the database until the server is cleanly shut down. A
checkpoint facility exists to cause the {{EX:contextCSN}} to be written
out more frequently if desired.

Note that at startup time, if the
provider is unable to read a {{EX:contextCSN}} from the suffix entry,
it will scan the entire database to determine the value, and this
scan may take quite a long time on a large database. When a {{EX:contextCSN}}
value is read, the database will still be scanned for any {{EX:entryCSN}}
values greater than it, to make sure the {{EX:contextCSN}} value truly
reflects the greatest committed {{EX:entryCSN}} in the database. On
databases which support inequality indexing, setting an eq index
on the {{EX:entryCSN}} attribute and configuring {{EX:contextCSN}} checkpoints
will greatly speed up this scanning step.

If no {{EX:contextCSN}} can be determined by reading and scanning the
database, a new value will be generated. Also, if scanning the database
yields a greater {{EX:entryCSN}} than was previously recorded in the
suffix entry's {{EX:contextCSN}} attribute, a checkpoint will be immediately
written with the new value.

The consumer also stores its replica state, which is the provider's
{{EX:contextCSN}} received as a synchronization cookie,
in the {{EX:contextCSN}} attribute of the suffix entry.
The replica state maintained by a consumer server is used as the
synchronization state indicator when it performs subsequent incremental
synchronization with the provider server. It is also used as a
provider-side synchronization state indicator when it functions as
a secondary provider server in a cascading replication configuration.
Since the consumer and provider state information are maintained in
the same location within their respective databases, any consumer
can be promoted to a provider (and vice versa) without any special
actions.
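
A cascading arrangement can be sketched as a single database on the
intermediate server that is configured both as a consumer and as a
provider. This fragment is an assumption-laden illustration, not a
tested configuration: the upstream host name, {{EX:rid}} value,
directory path, and bind identity are hypothetical.

> database bdb
> suffix dc=example,dc=com
> rootdn dc=example,dc=com
> directory /var/ldap/db
> index objectclass,entryCSN,entryUUID eq
>
> # consumer side: replicate from the upstream provider
> syncrepl rid=001
>     provider=ldap://upstream.example.com:389
>     type=refreshOnly
>     interval=00:01:00:00
>     searchbase="dc=example,dc=com"
>     scope=sub
>     bindmethod=simple
>     binddn="cn=syncuser,dc=example,dc=com"
>     credentials=secret
>
> # provider side: serve the replicated content to downstream consumers
> overlay syncprov

Downstream consumers would then name this intermediate server, rather
than the upstream one, in their own syncrepl specifications.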
|
|
|
|
Because a general search filter can be used in the syncrepl specification,
some entries in the context may be omitted from the synchronization content.
The syncrepl engine creates a glue entry to fill in the holes
in the replica context if any part of the replica content is
subordinate to the holes. The glue entries will not be returned
in the search result unless the {{ManageDsaIT}} control is provided.

Also, as a consequence of the search filter used in the syncrepl
specification, it is possible for a modification to remove an
entry from the replication scope even though the entry has not
been deleted on the provider. Logically, the entry must be deleted on the
consumer, but in {{refreshOnly}} mode the provider cannot detect
and propagate this change without the use of the session log.

H2: Configuring Syncrepl

Because syncrepl is a consumer-side replication engine, the syncrepl
specification is defined in {{slapd.conf}} (5) of the consumer server,
not in the provider server's configuration file.
The initial loading of the replica content can be performed either
by starting the syncrepl engine with no synchronization cookie
or by populating the consumer replica by adding an
{{TERM:LDIF}} file dumped as a backup at the provider.

When loading from a backup, it is not required to perform the initial
loading from an up-to-date backup of the provider content. The syncrepl
engine will automatically synchronize the initial consumer replica to
the current provider content. As a result, it is not required
to stop the provider server in order to avoid the replica inconsistency
caused by updates to the provider content during the
content backup and loading process.

When replicating a large-scale directory, especially in a
bandwidth-constrained environment, it is advised to load the consumer replica
from a backup instead of performing a full initial load using syncrepl.

H3: Set up the provider slapd

The provider is implemented as an overlay, so the overlay itself must
first be configured in {{slapd.conf}} (5) before it can be used. The
provider has only two configuration directives, for setting checkpoints
on the {{EX:contextCSN}} and for configuring the session log.
Because the LDAP Sync search is subject to access control, proper
access control privileges should be set up for the replicated content.

The {{EX:contextCSN}} checkpoint is configured by the

> syncprov-checkpoint <ops> <minutes>

directive. Checkpoints are only tested after successful write operations.
If {{<ops>}} write operations or more than {{<minutes>}} minutes have passed
since the last checkpoint, a new checkpoint is performed.

The session log is configured by the

> syncprov-sessionlog <size>

directive, where {{<size>}} is the maximum number of
session log entries the session log can record. When a session log
is configured, it is automatically used for all LDAP Sync searches
within the database.

Note that using the session log requires searching on the {{EX:entryUUID}}
attribute. Setting an eq index on this attribute will greatly
benefit the performance of the session log on the provider.

A more complete example of the {{slapd.conf}} (5) content follows:

> database bdb
> suffix dc=Example,dc=com
> rootdn dc=Example,dc=com
> directory /var/ldap/db
> index objectclass,entryCSN,entryUUID eq
>
> overlay syncprov
> syncprov-checkpoint 100 10
> syncprov-sessionlog 100


H3: Set up the consumer slapd

The syncrepl replication is specified in the database section
of {{slapd.conf}} (5) for the replica context.
The syncrepl engine is backend independent and the directive
can be defined with any database type.

> database hdb
> suffix dc=Example,dc=com
> rootdn dc=Example,dc=com
> directory /var/ldap/db
> index objectclass,entryCSN,entryUUID eq
>
> syncrepl rid=123
>     provider=ldap://provider.example.com:389
>     type=refreshOnly
>     interval=01:00:00:00
>     searchbase="dc=example,dc=com"
>     filter="(objectClass=organizationalPerson)"
>     scope=sub
>     attrs="cn,sn,ou,telephoneNumber,title,l"
>     schemachecking=off
>     bindmethod=simple
>     binddn="cn=syncuser,dc=example,dc=com"
>     credentials=secret

In this example, the consumer will connect to the provider slapd
at port 389 of {{FILE:ldap://provider.example.com}} to perform a
polling ({{refreshOnly}}) mode of synchronization once a day. It will
bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple authentication
with password "secret". Note that the access control privilege of
{{EX:cn=syncuser,dc=example,dc=com}} should be set appropriately
in the provider to retrieve the desired replication content. Also,
the search limits must be high enough on the provider to allow the
syncuser to retrieve a complete copy of the requested content.
The consumer uses the rootdn to write to its database so it
always has full permissions to write all content.

The synchronization search in the above example will search for the
entries whose objectClass is organizationalPerson in the entire subtree
rooted at {{EX:dc=example,dc=com}}. The requested attributes are
{{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
{{EX:title}}, and {{EX:l}}. The schema checking is turned off, so
that the consumer {{slapd}} (8) will not enforce entry schema checking
when it processes updates from the provider {{slapd}} (8).
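
The access control and search limit requirements noted for the
replication identity might be met with directives along the following
lines in the provider's database section. This is a sketch rather than
a recommended policy: it assumes the {{EX:cn=syncuser,dc=example,dc=com}}
identity from the example and grants it read access to the whole
subtree with no size or time limit.

> access to dn.subtree="dc=example,dc=com"
>     by dn.exact="cn=syncuser,dc=example,dc=com" read
>     by * break
>
> limits dn.exact="cn=syncuser,dc=example,dc=com" size=unlimited time=unlimited

The {{EX:by * break}} clause lets every other identity fall through to
the access rules that follow, so a rule of this kind would normally be
placed before them.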
|
|
|
|
For more detailed information on the syncrepl directive,
see the {{SECT:syncrepl}} section of {{SECT:The slapd Configuration File}}
chapter of this admin guide.

H3: Start the provider and the consumer slapd

The provider {{slapd}} (8) does not need to be restarted.
{{EX:contextCSN}} is automatically generated as needed:
it might be originally contained in the {{TERM:LDIF}} file,
generated by {{slapadd}} (8), generated upon changes in the context,
or generated when the first LDAP Sync search arrives at the provider.
If an LDIF file is being loaded which did not previously contain the
{{EX:contextCSN}}, the {{-w}} option should be used with {{slapadd}} (8)
to cause it to be generated. This will allow the server to start up a
little quicker the first time it runs.

When starting a consumer {{slapd}} (8), it is possible to provide a
synchronization cookie as the {{-c cookie}} command line option
in order to start the synchronization from a specific state.
The cookie is a comma-separated list of name=value pairs. Currently
supported syncrepl cookie fields are {{csn=<csn>}} and
{{rid=<rid>}}. {{<csn>}} represents the current synchronization state
of the consumer replica. {{<rid>}} identifies
a consumer replica locally within the consumer server. It is used to relate
the cookie to the syncrepl definition in {{slapd.conf}} (5) which has
the matching replica identifier.
The {{<rid>}} must have no more than 3 decimal digits.
The command line cookie overrides the synchronization cookie
stored in the consumer replica database.