openldap/doc/guide/admin/syncrepl.sdf
2004-12-08 09:09:22 +00:00

403 lines
20 KiB
Plaintext

# $OpenLDAP$
# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.
H1: LDAP Sync Replication
The LDAP Sync replication engine, syncrepl for short, is a consumer-side
replication engine that enables the consumer LDAP server to maintain
a shadow copy of a DIT fragment. A syncrepl engine resides at the
consumer-side as one of the {{slapd}} (8) threads. It creates and
maintains a consumer replica by connecting to the replication provider
to perform the initial DIT content load followed either by
periodic content polling or by timely updates upon content changes.
Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for short)
protocol as the replica synchronization protocol. It provides a stateful
replication which supports both
pull-based and push-based synchronization and does not mandate
the use of a history store.
Syncrepl keeps track of the status of the replication content by
maintaining and exchanging synchronization cookies. Because the
syncrepl consumer and provider maintain their content status,
the consumer can poll the provider content to perform incremental
synchronization by asking for the entries required to make the consumer
replica up-to-date with the provider content. Syncrepl also enables
convenient management of replicas by maintaining replica status.
The consumer replica can be constructed from a consumer-side or a
provider-side backup at any synchronization status. Syncrepl can
automatically resynchronize the consumer replica up-to-date with the
current provider content.
Syncrepl supports both pull-based and
push-based synchronization. In its basic refreshOnly synchronization mode,
the provider uses pull-based synchronization where the consumer servers
need not be tracked and no history information is maintained.
The information required for the provider to process periodic polling
requests is contained in the synchronization cookie of the request itself.
To optimize the pull-based synchronization, syncrepl utilizes the present
phase of the LDAP Sync protocol as well as its delete phase, instead of
falling back on frequent full reloads. To further optimize the pull-based
synchronization, the provider can maintain a per-scope session log
as a history store. In its refreshAndPersist mode of synchronization,
the provider uses a push-based synchronization. The provider keeps
track of the consumer servers that have requested a persistent search
and sends them necessary updates as the provider replication content
gets modified.
With syncrepl, a consumer server can create a replica without changing
the provider's configurations and without restarting the provider server,
if the consumer server has appropriate access privileges for the
DIT fragment to be replicated. The consumer server can stop the
replication also without the need for provider-side changes and restart.
Syncrepl supports both partial and sparse replications.
The shadow DIT fragment is defined by a general
search criteria consisting of base, scope, filter, and attribute list.
The replica content is also subject to the access privileges
of the bind identity of the syncrepl replication connection.
H2: The LDAP Content Synchronization Protocol
The LDAP Sync protocol allows a client to maintain a synchronized copy
of a DIT fragment. The LDAP Sync operation is defined as a set of
controls and other protocol elements which extend the LDAP search
operation. This section introduces the LDAP Content Sync protocol
only briefly. For more information, refer to the Internet Draft
{{The LDAP Content Synchronization Operation
<draft-zeilenga-ldup-sync-05.txt>}}.
The LDAP Sync protocol supports both polling and listening for
changes by defining two respective synchronization operations:
{{refreshOnly}} and {{refreshAndPersist}}.
Polling is implemented by the {{refreshOnly}} operation.
The client copy is synchronized to the server copy at the time of polling.
The server finishes the search operation by returning {{SearchResultDone}}
at the end of the search operation as in the normal search.
The listening is implemented by the {{refreshAndPersist}} operation.
Instead of finishing the search after returning all entries currently
matching the search criteria, the synchronization search remains
persistent in the server. Subsequent updates to the synchronization content
in the server cause additional entry updates to be sent to the client.
The {{refreshOnly}} operation and the refresh stage of the
{{refreshAndPersist}} operation can be performed with
a present phase or a delete phase.
In the present phase, the server sends the client the entries updated
within the search scope since the last synchronization. The server sends
all requested attributes, be it changed or not, of the updated entries.
For each unchanged entry which remains in the scope,
the server sends a present message consisting only of the name of the
entry and the synchronization control representing state present.
The present message does not contain any attributes of the entry.
After the client receives all update and present entries,
it can reliably determine the new client copy by adding the entries
added to the server, by replacing the entries modified at the server,
and by deleting entries in the client copy which have not
been updated nor specified as being present at the server.
The transmission of the updated entries in the delete phase is
the same as in the present phase. The server sends all the requested
attributes of the entries updated within the search scope since the
last synchronization to the client. In the delete phase, however,
the server sends a delete message for each entry deleted from the
search scope, instead of sending present messages.
The delete message consists only of the name of the entry
and the synchronization control representing state delete.
The new client copy can be determined by adding, modifying, and
removing entries according to the synchronization control
attached to the {{SearchResultEntry}} message.
In the case that the LDAP Sync server maintains a history store
and can determine which entries are scoped out of the client
copy since the last synchronization time, the server can use
the delete phase. If the server does not maintain any history store,
cannot determine the scoped-out entries from the history store,
or the history store does not cover the outdated synchronization
state of the client, the server should use the present phase.
The use of the present phase is much more efficient than a full
content reload in terms of the synchronization traffic.
To reduce the synchronization traffic further,
the LDAP Sync protocol also provides several optimizations
such as the transmission of the normalized {{EX:entryUUID}}s and the
transmission of multiple {{EX:entryUUIDs}} in a single
{{syncIdSet}} message.
At the end of the {{refreshOnly}} synchronization,
the server sends a synchronization cookie to the client as a state
indicator of the client copy after the synchronization is completed.
The client will present the received cookie when it requests
the next incremental synchronization to the server.
When {{refreshAndPersist}} synchronization is used,
the server sends a synchronization cookie at the end of the
refresh stage by sending a Sync Info message with TRUE refreshDone.
It also sends a synchronization cookie by attaching it to
{{SearchResultEntry}} generated in the persist stage of the
synchronization search. During the persist stage, the server
can also send a Sync Info message containing the synchronization
cookie at any time the server wants to update the client-side state
indicator. The server also updates a synchronization indicator
of the client at the end of the persist stage.
In the LDAP Sync protocol, entries are uniquely identified by
the {{EX:entryUUID}} attribute value. It can function as a reliable
identifier of the entry. The DN of the entry, on the other hand,
can be changed over time and hence cannot be considered as the reliable
identifier. The {{EX:entryUUID}} is attached to each {{SearchResultEntry}}
or {{SearchResultReference}} as a part of the synchronization control.
H2: Syncrepl Details
The syncrepl engine utilizes both the {{refreshOnly}} and the
{{refreshAndPersist}} operations of the LDAP Sync protocol.
If a syncrepl specification is included in a database definition,
{{slapd}} (8) launches a syncrepl engine as a {{slapd}} (8) thread
and schedules its execution. If the {{refreshOnly}} operation is
specified, the syncrepl engine will be rescheduled at the interval
time after a synchronization operation is completed.
If the {{refreshAndPersist}} operation is specified, the engine will
remain active and process the persistent synchronization messages
from the provider.
The syncrepl engine utilizes both the present phase and the
delete phase of the refresh synchronization. It is possible to
configure a per-scope session log in the provider server
which stores the {{EX:entryUUID}}s of a finite
number of entries deleted from a replication content.
Multiple replicas of single provider content share the same
per-scope session log. The syncrepl engine uses the delete phase
if the session log is present and the state of the consumer
server is recent enough that no session log entries are truncated
after the last synchronization of the client.
The syncrepl engine uses the present phase if no session log
is configured for the replication content or if the
consumer replica is too outdated to be covered by the session log.
The current design of the session log store is memory based, so
the information contained in the session log is not persistent
over multiple provider invocations. It is not currently supported
to access the session log store by using LDAP operations. It is
also not currently supported to impose access control to the session log.
As a further optimization, even in the case the synchronization search
is not associated with any session log, no entries will be transmitted
to the consumer server when there has been no update in the replication
context.
The syncrepl engine, which is a consumer-side replication engine,
can work with any backends. The LDAP Sync provider can be configured
as an overlay on any backend, but works best with the {{back-bdb}} or
{{back-hdb}} backend. The provider can not support refreshAndPersist
mode on {{back-ldbm}} due to limits in that backend's locking architecture.
The LDAP Sync provider maintains a {{EX:contextCSN}} for each
database as the current synchronization state indicator of the
provider content. It is the largest {{EX:entryCSN}} in the provider
context such that no transactions for an entry having
smaller {{EX:entryCSN}} value remains outstanding.
The {{EX:contextCSN}} could not just be set to the largest issued
{{EX:entryCSN}} because {{EX:entryCSN}} is obtained before
a transaction starts and transactions are not committed in the
issue order.
The provider stores the {{EX:contextCSN}} of a context in the
{{EX:contextCSN}} attribute of the context suffix entry. The attribute
is not written to the database after every update operation though;
instead it is maintained primarily in memory. At database start time
the provider reads the last saved {{EX:contextCSN}} into memory and
uses the in-memory copy exclusively thereafter. By default, changes
to the {{EX:contextCSN}} as a result of database updates will not be
written to the database until the server is cleanly shut down. A
checkpoint facility exists to cause the contextCSN to be written
out more frequently if desired.
Note that at startup time, if the
provider is unable to read a {{EX:contextCSN}} from the suffix entry,
it will scan the entire database to determine the value, and this
scan may take quite a long time on a large database. When a {{EX:contextCSN}}
value is read, the database will still be scanned for any {{EX:entryCSN}}
values greater than it, to make sure the {{EX:contextCSN}} value truly
reflects the greatest committed {{EX:entryCSN}} in the database. On
databases which support inequality indexing, setting an eq index
on the {{EX:entryCSN}} attribute and configuring {{contextCSN}} checkpoints
will greatly speed up this scanning step.
If no {{EX:contextCSN}} can be determined by reading and scanning the
database, a new value will be generated. Also, if scanning the database
yielded a greater {{EX:entryCSN}} than was previously recorded in the
suffix entry's {{EX:contextCSN}} attribute, a checkpoint will be immediately
written with the new value.
The consumer stores its replica state, which is the provider's
{{EX:contextCSN}} received as a synchronization cookie,
in the {{EX:syncreplCookie}} attribute of the immediate child
of the context suffix whose DN is {{cn=syncrepl<rid>,<suffix>}}
and object class is {{EX:syncConsumerSubentry}}.
The replica state maintained by a consumer server is used as the
synchronization state indicator when it performs subsequent incremental
synchronization with the provider server. It is also used as a
provider-side synchronization state indicator when it functions as
a secondary provider server in a cascading replication configuration.
<rid> is the replica ID uniquely identifying the replica locally in the
syncrepl consumer server. <rid> is an integer which has no more than
three decimal digits.
It is possible to retrieve the
{{EX:syncConsumerSubentry}} by performing an LDAP search with
the respective entry as the base object and with the base scope.
Because a general search filter can be used in the syncrepl specification,
some entries in the context may be omitted from the synchronization content.
The syncrepl engine creates a glue entry to fill in the holes
in the replica context if any part of the replica content is
subordinate to the holes. The glue entries will not be returned
as the search result unless {{ManageDsaIT}} control is provided.
Also as a consequence of the search filter used in the syncrepl
specification, it is possible for a modification to remove an
entry from the replication scope even though the entry has not
been deleted on the provider. Logically the entry must be deleted on the
consumer but in {{refreshOnly}} mode the provider cannot detect
and propagate this change without the use of the session log.
H2: Configuring Syncrepl
Because syncrepl is a consumer-side replication engine, the syncrepl
specification is defined in {{slapd.conf}} (5) of the consumer server,
not in the provider server's configuration file.
The initial loading of the replica content can be performed either
by starting the syncrepl engine with no synchronization cookie
or by populating the consumer replica by adding and demoting an
{{TERM:LDIF}} file dumped as a backup at the provider.
{{slapadd}} (8) supports the replica promotion and demotion.
When loading from a backup, it is not required to perform the initial
loading from the up-to-date backup of the provider content. The syncrepl
engine will automatically synchronize the initial consumer replica to
the current provider content. As a result, it is not required
to stop the provider server in order to avoid the replica inconsistency
caused by the updates to the provider content during the
content backup and loading process.
When replicating a large scale directory, especially in a bandwidth
constrained environment, it is advised to load the consumer replica
from a backup instead of performing a full initial load using syncrepl.
H3: Set up the provider slapd
The provider is implemented as an overlay, so the overlay itself must
first be configured in {{slapd.conf}} (5) before it can be used. The
provider has only two configuration directives, for setting checkpoints
on the {{EX:contextCSN}} and for configuring the session log.
Because the
LDAP Sync search is subject to access control, proper access control
privileges should be set up for the replicated content.
The {{EX:contextCSN}} checkpoint is configured by the
> syncprov-checkpoint <ops> <minutes>
directive. Checkpoints are only tested after successful write operations.
If {{<ops>}} operations or more than {{<minutes>}} time has passed
since the last checkpoint, a new checkpoint is performed.
The session log is configured by the
> syncprov-sessionlog <sid> <size>
directive, where {{<sid>}} is the ID of the per-scope session log
in the provider server and {{<size>}} is the maximum number of
session log entries the session log can record. {{<sid>}}
is an integer no longer than 3 decimal digits. If the consumer
server sends a synchronization cookie containing {{sid=<sid>}}
where {{<sid>}} matches the session log ID specified in the directive,
the LDAP Sync search is to utilize the session log.
Note that using the session log requires searching on the {{entryUUID}}
attribute. Setting an eq index on this attribute will greatly
benefit the performance of the session log on the provider.
A more complete example of the {{slapd.conf}} content is thus:
> database bdb
> suffix dc=Example,dc=com
> directory /var/ldap/db
> index objectclass,entryCSN,entryUUID eq
>
> overlay syncprov
> syncprov-checkpoint 100 10
> syncprov-sessionlog 0 100
H3: Set up the consumer slapd
The syncrepl replication is specified in the database section
of {{slapd.conf}} (5) for the replica context.
The syncrepl engine is backend independent and the directive
can be defined with any database type.
> syncrepl rid=123
> provider=ldap://provider.example.com:389
> type=refreshOnly
> interval=01:00:00:00
> searchbase="dc=example,dc=com"
> filter="(objectClass=organizationalPerson)"
> scope=sub
> attrs="cn,sn,ou,telephoneNumber,title,l"
> schemachecking=off
> updatedn="cn=replica,dc=example,dc=com"
> bindmethod=simple
> binddn="cn=syncuser,dc=example,dc=com"
> credentials=secret
In this example, the consumer will connect to the provider slapd
at port 389 of {{FILE:ldap://provider.example.com}} to perform a
polling ({{refreshOnly}}) mode of synchronization once a day. It will
bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple authentication
with password "secret". Note that the access control privilege of
{{EX:cn=syncuser,dc=example,dc=com}} should be set appropriately
in the provider to retrieve the desired replication content.
The consumer will write to its database with the privilege of the
{{EX:cn=replica,dc=example,dc=com}} entry as specified in the
{{EX:updatedn=}} directive. The {{EX:updatedn}} entry should have
write permission to the replica content.
The synchronization search in the above example will search for the
entries whose objectClass is organizationalPerson in the entire subtree
rooted at {{EX:dc=example,dc=com}}. The requested attributes are
{{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
{{EX:title}}, and {{EX:l}}. The schema checking is turned off, so
that the consumer {{slapd}} (8) will not enforce entry schema checking
when it process updates from the provider {{slapd}} (8).
For more detailed information on the syncrepl directive,
see the {{SECT:syncrepl}} section of {{SECT:The slapd Configuration File}}
chapter of this admin guide.
H3: Start the provider and the consumer slapd
The provider {{slapd}} (8) is not required to be restarted.
{{contextCSN}} is automatically generated as needed:
it might originally contained in the {{TERM:LDIF}} file,
generated by {{slapadd}} (8), generated upon changes in the context,
or generated when the first LDAP Sync search arrived at the provider.
When starting a consumer {{slapd}} (8), it is possible to provide a
synchronization cookie as the {{-c cookie}} command line option
in order to start the synchronization from a specific state.
The cookie is a comma separated list of name=value pairs. Currently
supported syncrepl cookie fields are {{csn=<csn>}}, {{sid=<sid>}}, and
{{rid=<rid>}}. {{<csn>}} represents the current synchronization state
of the consumer replica. {{<sid>}} is the identity of the per-scope
session log to which this consumer will be associated. {{<rid>}} identifies
a consumer replica locally within the consumer server. It is used to relate
the cookie to the syncrepl definition in {{slapd.conf}} (5) which has
the matching replica identifier.
Both {{<sid>}} and {{<rid>}} have no more than 3 decimal digits.
The command line cookie overrides the synchronization cookie
stored in the consumer replica database.