ITS#9270 Additional information on indexing

2024-12-27 03:20:22 +08:00 · 2021-03-02 16:22:39 +00:00 · 2021-03-02 16:22:39 +00:00 · bab02ea40b
commit bab02ea40b
parent 05b1b4688c
1 changed files with 40 additions and 2 deletions
--- a/doc/guide/admin/tuning.sdf
+++ b/doc/guide/admin/tuning.sdf
@@ -64,9 +64,20 @@ If the filter term has not been indexed, then the search must read every single
 entry in the target scope and test to see if each entry matches the filter. 
 Obviously indexing can save a lot of work when it's used correctly.

+In back-mdb, indexes can only track a certain number of entries per key (by
+default that number is 2^16 = 65536). If more entries' values hash to the
+same key, some or all of them will have to be represented by a range of
+candidates, making the index less useful over time as deletions cannot
+usually be tracked accurately.
+
 H3: What to index

-You should create indices to match the actual filter terms used in
+As a general rule, to make any use of indexes, you must set up an equality
+index on objectClass:
+
+>        index objectClass eq
+
+Then you should create indices to match the actual filter terms used in
 search queries. 

 >        index cn,sn,givenname,mail eq
@@ -86,7 +97,8 @@ all of those entries are going to be read anyway, because they are valid
 members of the result set. In a subtree where 100% of the
 entries are going to contain the same attributes, the presence index does
 absolutely NOTHING to benefit the search, because 100% of the entries match
-that presence filter. 
+that presence filter. As an example, setting a presence index on objectClass
+provides no benefit since it is present on every entry.

 So the resource cost of generating the index is a
 complete waste of CPU time, disk, and memory. Don't do it unless you know
@@ -101,6 +113,32 @@ not be done, it's just wasted overhead.
 See the {{Logging}} section below on what to watch out for if you have a frequently searched
 for attribute that is unindexed.

+H3: Equality indexing
+
+Similarly to presence indexes, equality indexes are most useful if the
+values searched for are uncommon. Most OpenLDAP indexes work by hashing
+the normalised value and using the hash as the key. Hashing behaviour
+depends on the matching rule syntax; some matching rules also implement
+indexers that help speed up inequality (less than, ...) queries.
+
+Check the documentation and other parts of this guide to see whether any
+indexes are mandatory, e.g. to enable replication you are expected to index
+certain operational attributes; likewise if you rely on filters in ACL processing.
+
+Approximate indexes are usually identical to equality indexes unless a
+matching rule explicitly implements approximate indexing. As of OpenLDAP
+2.5, only the directoryStringApproxMatch and IA5StringApproxMatch matchers
+and indexers are implemented, currently using soundex or metaphone, with
+metaphone being the default.
+
+H3: Substring indexing
+
+Substring indexes work by splitting the value into short chunks and then
+indexing those in much the same way as an equality index does. The storage
+space needed to store all of this data is comparable to the amount of data
+being indexed, which makes these indexes extremely expensive in most
+scenarios.
+

 H2: Logging