openldap/doc/guide/admin/syncrepl.sdf

# $OpenLDAP$
# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.

H1: LDAP Sync Replication

The LDAP Sync replication engine is designed to function as an
improved alternative to {{slurpd}}(8).  While the replication with
{{slurpd}}(8) provides the replication capability for improved capacity,
availability, and reliability, it has some drawbacks:

^ It is {{not stateful}}, hence lacks the resynchronization capability.
Because there is no representation of replica state in the replication
with {{slurpd}}(8), it is not possible to provide an efficient
mechanism to make the slave replica consistent with the master replica
once they become out of sync. For instance, if the slave database
content is damaged, the slave replica must be re-primed from the
master replica. With a state-based replication, it would be
possible to recover the slave replica from a local backup. The slave
replica would then be synchronized by calculating and transmitting
the differences between the slave replica and the master replica based
on their states. The LDAP Sync replication is {{stateful}}.

+ It is {{history-based}}, not {{state-based}}. The replication with
{{slurpd}}(8) relies on the history information in the replication
log file generated by {{slapd}}(8). If a portion of the log file
that contains updates yet to be synchronized to the slave is truncated
or damaged, a full reload is required. A state-based replication,
on the other hand, does not rely on a separate history store.
In the LDAP Sync replication, every directory entry has its state
information in the {{EX:entryCSN}} operational attribute. The replica
contents are calculated based on the consumer cookie and the
{{EX:entryCSN}} of the directory entries.

+ It is {{push-based}}, not {{pull-based}}. In the replication with
{{slurpd}}(8), it is the master who decides when to synchronize the
replica. The pull-based polling replication is not possible with
{{slurpd}}(8). For example, in order to make a daily directory backup
that is an exact image of the directory at a point in time, the slave
replica must be made read-only by stopping {{slurpd}}(8) during the
backup. After the backup, {{slurpd}}(8) can be run in one-shot mode to
resynchronize the slave replica with the updates made during the
backup. In a pull-based, polling replication, the replica is guaranteed
to be read-only between two polling
points. The LDAP Sync replication supports both {{push-based}}
and {{pull-based}} replication.

+ It only supports the fractional replication and does not support
the sparse replication. The LDAP Sync replication supports both the
fractional and sparse replication. It is possible to use a general
search specification to initiate a synchronization session only for
the subset of the context that is of interest.


H2: The LDAP Content Synchronization Operation

The LDAP Sync replication uses the LDAP Content Synchronization (or
LDAP Sync) protocol (refer to the Internet Draft titled {{The LDAP
Content Synchronization Operation}}) for replica synchronization.

The LDAP Sync operation is based on the replica state which is
transmitted between replicas as the synchronization cookies.  There
are two operating modes: {{refreshOnly}} and {{refreshAndPersist}}.
In both modes, a consumer {{slapd}}(8) connects to a provider
{{slapd}}(8) with a cookie value representing the state of the
consumer replica.  The non-persistent part of the synchronization
consists of two phases.

The first is the {{state-based}} phase. The entries updated after
the point in time the consumer cookie represents will be transmitted
to the consumer. Because the unit of synchronization is the entry, all
the requested attributes will be transmitted even if only some
of them have changed. For the rest of the entries, present
messages consisting only of the name and the synchronization control
will be sent to the consumer. After the consumer receives all the
updated and present entries, it can reliably make its replica
consistent with the provider replica. The consumer will add all the
newly added entries, replace any existing entries for which updated
entries were received, and delete entries in the local replica if they
are neither updated nor specified as present.
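
The reconciliation described above can be sketched as follows. This is
only an illustration under stated assumptions, not the {{slapd}}(8)
implementation: the function name {{EX:apply_refresh}}, the
dictionary-based replica, and the sample entries are all hypothetical,
and entries are keyed by their {{EX:entryUUID}}.

```python
def apply_refresh(local, updated, present):
    """Return the replica content after a state-based refresh.

    local   -- dict mapping entryUUID -> entry held by the consumer
    updated -- dict mapping entryUUID -> full entry sent by the provider
    present -- set of entryUUIDs the provider reported as unchanged
    """
    result = {}
    for uuid, entry in local.items():
        # Keep a local entry only if the provider reported it as present.
        # Entries that are neither updated nor present are dropped here.
        if uuid in present and uuid not in updated:
            result[uuid] = entry
    # Newly added and changed entries both arrive as complete entries.
    result.update(updated)
    return result


# Hypothetical sample data.
local = {
    "u1": {"dn": "cn=a,dc=example,dc=com", "sn": "Old"},
    "u2": {"dn": "cn=b,dc=example,dc=com", "sn": "Same"},
    "u3": {"dn": "cn=c,dc=example,dc=com", "sn": "Gone"},
}
updated = {
    "u1": {"dn": "cn=a,dc=example,dc=com", "sn": "New"},
    "u4": {"dn": "cn=d,dc=example,dc=com", "sn": "Added"},
}
present = {"u2"}

replica = apply_refresh(local, updated, present)
print(sorted(replica))  # ['u1', 'u2', 'u4']
```

In this sketch {{EX:u1}} is replaced, {{EX:u2}} is retained, {{EX:u3}}
is deleted, and {{EX:u4}} is added.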

The second is the {{log-based}} phase. This phase is incorporated
to optimize the protocol with respect to the volume of the present
traffic. If the provider maintains a history store from which the
content to be synchronized can be reliably calculated, this log-based
phase follows the state-based phase. In this phase, the actual directory
update operations such as delete, modify, and add are transmitted.
There is no need to send present messages in this log-based phase.

If the protocol operates in the {{refreshOnly}} mode, the synchronization
will terminate. The provider will send a synchronization cookie
which reflects the new state to the consumer. The consumer will
present the new cookie the next time it requests a synchronization.
If the protocol operates in the {{refreshAndPersist}} mode, the
synchronization operation remains persistent in the provider. Every
update made to the provider replica will be transmitted to the
consumer. Cookies can be sent to the consumer at any time by using
the SyncInfo intermediate response and at the end of the synchronization
by using the SyncDone control attached to the SearchResultDone
message.

Entries are uniquely identified by the {{EX:entryUUID}} attribute
value in the LDAP Content Sync protocol. It can serve as a reliable
entry identifier, whereas the DN of an entry can be changed by modrdn
operations.
The {{EX:entryUUID}} is attached to each SearchResultEntry or
SearchResultReference as a part of the Sync State control.


H2: LDAP Sync Replication Details

The LDAP Sync replication uses both the {{refreshOnly}} and the
{{refreshAndPersist}} modes of synchronization.  If an LDAP Sync
replication is specified in a database definition, the {{slapd}}(8)
schedules an execution of the LDAP Sync replication engine. In the
{{refreshOnly}} mode, the engine will be rescheduled at the interval
time after a replication session ends. In the {{refreshAndPersist}}
mode, the engine will remain active to process the SearchResultEntry
messages from the provider.

The LDAP Sync replication uses only the state-based synchronization
phase.  Because {{slapd}}(8) does not currently implement a history store
such as a changelog or tombstones, it depends only on the state-based
phase. A null log-based phase follows the state-based phase.

As an optimization, no entries will be transmitted to a consumer
if there has been no update in the master replica after the last
synchronization with the consumer. Even present messages for the
unchanged entries are not transmitted. The consumer retains its
replica contents.
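
A toy model of the provider side of one {{refreshOnly}} session,
including the optimization described above, might look like the
following. It is a sketch under stated assumptions: the function
{{EX:refresh}}, the use of plain integers for CSN values, and the
in-memory entry table are hypothetical and do not reflect the
{{slapd}}(8) internals.

```python
def refresh(entries, context_csn, cookie):
    """Toy provider side of one refreshOnly session.

    entries     -- dict mapping entryUUID -> (entryCSN, entry)
    context_csn -- current state of the provider
    cookie      -- state the consumer last saw, or None for a first sync
    Returns (updated, present, new_cookie). present is None when the
    provider sends nothing because the consumer is already current.
    """
    if cookie is not None and cookie == context_csn:
        # No update since the last synchronization: no entries and no
        # present messages are sent.
        return {}, None, context_csn
    updated, present = {}, set()
    for uuid, (csn, entry) in entries.items():
        if cookie is None or csn > cookie:
            updated[uuid] = entry   # changed after the cookie: full entry
        else:
            present.add(uuid)       # unchanged: present message only
    return updated, present, context_csn


# Hypothetical provider content; CSNs are shown as integers for brevity.
entries = {
    "u1": (5, {"cn": "a"}),
    "u2": (9, {"cn": "b"}),
}

first = refresh(entries, context_csn=9, cookie=None)
later = refresh(entries, context_csn=9, cookie=5)
idle = refresh(entries, context_csn=9, cookie=9)
print(sorted(first[0]), sorted(later[0]), idle[1])
```

The consumer would store the returned cookie and present it at its next
poll, so that a poll with no intervening update costs no entry traffic.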

H3: entryCSN

The LDAP Sync replication implemented in {{slapd}}(8) stores state
information for every entry in the {{EX:entryCSN}} attribute.
The {{EX:entryCSN}} of an entry is the CSN, or {{change sequence number}},
a refined timestamp that records when the entry was most recently
updated.  The CSN consists of three parts: the time, a replica ID,
and a change count within a single second.
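
To illustrate why the change count is needed, the following sketch
models a CSN as a three-part tuple. The class names, the integer time
encoding, and the field order used for comparison are assumptions made
for this example. They are not the attribute's actual string format.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    # Field order is an assumption chosen so that tuple comparison
    # sorts by time first, then by change count, then by replica ID.
    time: int        # seconds since the epoch (hypothetical encoding)
    count: int       # change count within that second
    replica_id: int  # ID of the replica that made the change


class CSNGenerator:
    """Hands out increasing CSNs, assuming the clock never moves back."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.last_time = None
        self.count = 0

    def next(self, now):
        if now == self.last_time:
            self.count += 1      # another change within the same second
        else:
            self.last_time = now
            self.count = 0
        return CSN(now, self.count, self.replica_id)


gen = CSNGenerator(replica_id=1)
a = gen.next(now=1000)
b = gen.next(now=1000)   # same second as a, so the count distinguishes it
c = gen.next(now=1001)
print(a < b < c)  # True
```

Without the count, two updates made in the same second would carry
identical state values and could not be ordered.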

H3: contextCSN

{{EX:contextCSN}} represents the current state of the provider
replica.  It is the largest {{EX:entryCSN}} of all entries in the
context such that no transaction having smaller {{EX:entryCSN}}
value remains outstanding.  Because the {{EX:entryCSN}} value is
obtained before the transaction starts and transactions are not committed
in {{EX:entryCSN}} order, special care must be taken to
manage the proper {{EX:contextCSN}} value in the transactional
environment.  Also, the state of the search result set is required
to correspond to the {{EX:contextCSN}} value returned to the consumer
as a sync cookie.
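
The definition above can be made concrete with a small sketch. It only
restates the definition in code and is not how {{slapd}}(8) tracks
pending transactions. The function name {{EX:context_csn}} and the use
of plain integers as CSN values are assumptions of the example.

```python
def context_csn(committed, pending):
    """Return the largest committed CSN below every pending CSN.

    committed -- CSNs of transactions that have committed
    pending   -- CSNs handed out to transactions still in progress
    Returns None when no committed CSN qualifies.
    """
    if pending:
        limit = min(pending)
        eligible = [csn for csn in committed if csn < limit]
    else:
        eligible = list(committed)
    return max(eligible, default=None)


# CSN 3 was obtained before 4 and 5, but its transaction has not
# committed yet, so the context state cannot advance past 2.
print(context_csn(committed={1, 2, 4, 5}, pending={3}))        # 2
# Once 3 commits, the context state can advance to 5.
print(context_csn(committed={1, 2, 3, 4, 5}, pending=set()))   # 5
```

If the provider reported 5 as the cookie while the transaction holding
CSN 3 was still outstanding, a consumer presenting that cookie later
would never receive the entry written by that transaction.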

{{EX:contextCSN}}, the provider replica state, is stored in the
{{EX:syncProviderSubentry}}. The value of the {{EX:contextCSN}} is
transmitted to the consumer replica as a Sync Cookie. The cookie
is stored in the {{EX:syncreplCookie}} attribute of
{{EX:syncConsumerSubentry}} subentry. The consumer will use the
stored cookie value to represent its replica state when it connects
to the provider in the future.

H3: Glue Entry

Because a general search filter can be used in the LDAP Sync replication,
an entry might be created without a parent if the parent entry was
filtered out.  The LDAP Sync replication engine creates glue
entries to fill such holes in the replica.  The glue entries will not
be returned in response to a search of the consumer {{slapd}}(8)
unless the manageDSAit control is set.
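
The following sketch shows one way the holes could be identified. It is
a simplified illustration: the helper names are hypothetical, DNs are
assumed to be already normalized, and the comma splitting does not
handle escaped commas inside RDN values.

```python
def in_subtree(dn, base):
    """True if dn is the base entry or lies below it."""
    return dn == base or dn.endswith("," + base)


def missing_parents(dns, base):
    """Return ancestor DNs within base that are absent from dns."""
    have = set(dns)
    glue = set()
    for dn in dns:
        # Naive parent computation: drop the leftmost RDN.
        parent = dn.partition(",")[2]
        while (in_subtree(parent, base)
               and parent not in have
               and parent not in glue):
            glue.add(parent)
            parent = parent.partition(",")[2]
    # Shorter DNs are closer to the base, so create them first.
    return sorted(glue, key=len)


# Only person entries matched the filter; their containers did not.
dns = [
    "cn=Ann,ou=People,dc=example,dc=com",
    "cn=Bob,ou=Staff,ou=People,dc=example,dc=com",
]
for dn in missing_parents(dns, "dc=example,dc=com"):
    print(dn)
```

In this example the three container entries above the two person
entries, including the search base itself, would need glue entries.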

H2: Configuring slapd for LDAP Sync Replication

It is relatively simple to start providing a replicated directory
service with LDAP Sync replication, compared to the replication
with {{slurpd}}(8).  First, we should configure both the provider
and the consumer {{slapd}}(8) servers appropriately.  Then, start
the provider slapd instance first, and the consumer slapd instance
next.  Administrative tasks such as database copy and temporary
shutdown (or read-only demotion) of the provider are not required.

H3: Set up the provider slapd

There is no special {{slapd.conf}}(5) directive for the provider
{{slapd}}(8).  Because the LDAP Sync searches are subject to access
control, proper access control privileges should be set up for the
replicated content.

When creating a provider database from an {{TERM:LDIF}} file using
{{slapadd}}(8), you must make sure that the state indicator of the
database context is created and up to date. {{slapadd}}(8) will store the
{{EX:contextCSN}} in the {{EX:syncProviderSubentry}} if it is given
the {{EX:-w}} flag. It is also possible to create the
{{EX:syncProviderSubentry}} with an appropriate {{EX:contextCSN}}
value by directly including it in the LDIF file. If {{slapadd}}(8)
runs without the {{EX:-w}} flag, the provided {{EX:contextCSN}}
will be stored. With the {{EX:-w}} flag, a new value based on the
current time will be stored as {{EX:contextCSN}}. {{slapcat}}(8)
can be used to retrieve the directory with the {{EX:contextCSN}}
when it is run with the {{EX:-m}} flag.

Only the BDB (back-bdb) and HDB (back-hdb) backends can perform as
the LDAP Sync replication provider.  LDBM (back-ldbm) currently
does not have the LDAP Sync protocol functionality.

H3: Set up the consumer slapd

The consumer slapd is configured through the {{slapd.conf}}(5) configuration
file. For the configuration directives, see the {{SECT:syncrepl}}
section of the {{SECT:The slapd Configuration File}} chapter. In the
configuration file, make sure the DN given in the {{EX:updatedn=}}
directive of the {{EX:syncrepl}} specification has permission to
write to the database. Below is an example {{EX:syncrepl}} specification
at the consumer replica:

>	syncrepl id = 1
>		provider=ldap://provider.example.com:389
>		updatedn="cn=replica,dc=example,dc=com"
>		binddn="cn=syncuser,dc=example,dc=com"
>		bindmethod=simple
>		credentials=secret
>		searchbase="dc=example,dc=com"
>		filter="(objectClass=organizationalPerson)"
>		attrs="cn,sn,ou,telephoneNumber,title,l"
>		schemachecking=on
>		scope=sub
>		type=refreshOnly
>		interval=01:00:00

In this example, the consumer will connect to the provider slapd
at port 389 of {{FILE:ldap://provider.example.com}} to perform a
polling (refreshOnly) mode of synchronization once a day.  It will
bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple authentication
with password "secret".  Note that the access control privilege of
the DN specified by the {{EX:binddn=}} directive should be set
properly to synchronize the desired replica content.  The consumer
will write to its database with the privilege of the
{{EX:cn=replica,dc=example,dc=com}} entry as specified by the
{{EX:updatedn=}} directive.  The {{EX:updatedn}} entry should have
write permission to the database.

The synchronization search in the example will search for entries
whose objectClass is organizationalPerson in the entire subtree
rooted at the {{EX:dc=example,dc=com}} search base, including the base
entry itself. The requested
attributes are {{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
{{EX:title}}, and {{EX:l}}. The schema checking is turned on, so
that the consumer {{slapd}}(8) will enforce entry schema checking
when it processes updates from the provider {{slapd}}(8).

The LDAP Sync replication engine is backend independent. All three
native backends can perform as the LDAP Sync replication consumer.

H3: Start the provider and the consumer slapd

If the currently running provider {{slapd}}(8) already has the
{{EX:syncProviderSubentry}} in its database, there is no need to
restart the provider {{slapd}}(8) when you start a replicated LDAP
service. When you run
a consumer {{slapd}}(8), it will immediately perform either an
initial full reload, if its cookie is NULL or too far out of date, or
an incremental synchronization, if a usable cookie is provided.  In
the {{refreshOnly}} mode, the next synchronization session is
scheduled to run one interval after the completion of the current
session. In the {{refreshAndPersist}} mode, the synchronization
session stays open between the consumer and the provider.  The provider
will send an update message whenever there is an update in the provider
replica.