Add external documentation for KNNGIST.
parent
04910a3ad5
commit
b576757d7e
@ -78,7 +78,7 @@
|
||||
|
||||
<para>
|
||||
All it takes to get a <acronym>GiST</acronym> access method up and running
|
||||
is to implement seven user-defined methods, which define the behavior of
|
||||
is to implement several user-defined methods, which define the behavior of
|
||||
keys in the tree. Of course these methods have to be pretty fancy to
|
||||
support fancy queries, but for all the standard queries (B-trees,
|
||||
R-trees, etc.) they're relatively straightforward. In short,
|
||||
@ -93,12 +93,13 @@
|
||||
|
||||
<para>
|
||||
There are seven methods that an index operator class for
|
||||
<acronym>GiST</acronym> must provide. Correctness of the index is ensured
|
||||
<acronym>GiST</acronym> must provide, and an eighth that is optional.
|
||||
Correctness of the index is ensured
|
||||
by proper implementation of the <function>same</>, <function>consistent</>
|
||||
and <function>union</> methods, while efficiency (size and speed) of the
|
||||
index will depend on the <function>penalty</> and <function>picksplit</>
|
||||
methods.
|
||||
The remaining two methods are <function>compress</> and
|
||||
The remaining two basic methods are <function>compress</> and
|
||||
<function>decompress</>, which allow an index to have internal tree data of
|
||||
a different type than the data it indexes. The leaves are to be of the
|
||||
indexed data type, while the other tree nodes can be of any C struct (but
|
||||
@ -106,6 +107,9 @@
|
||||
see about <literal>varlena</> for variable sized data). If the tree's
|
||||
internal data type exists at the SQL level, the <literal>STORAGE</> option
|
||||
of the <command>CREATE OPERATOR CLASS</> command can be used.
|
||||
The optional eighth method is <function>distance</>, which is needed
|
||||
if the operator class wishes to support ordered scans (nearest-neighbor
|
||||
searches).
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
@ -567,6 +571,73 @@ my_same(PG_FUNCTION_ARGS)
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><function>distance</></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Given an index entry <literal>p</> and a query value <literal>q</>,
|
||||
this function determines the index entry's
|
||||
<quote>distance</> from the query value. This function must be
|
||||
supplied if the operator class contains any ordering operators.
|
||||
A query using the ordering operator will be implemented by returning
|
||||
index entries with the smallest <quote>distance</> values first,
|
||||
so the results must be consistent with the operator's semantics.
|
||||
For a leaf index entry the result just represents the distance to
|
||||
the index entry; for an internal tree node, the result must be the
|
||||
smallest distance that any child entry could have.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <acronym>SQL</> declaration of the function must look like this:
|
||||
|
||||
<programlisting>
CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid)
RETURNS float8
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;
</programlisting>
|
||||
|
||||
And the matching code in the C module could then follow this skeleton:
|
||||
|
||||
<programlisting>
Datum       my_distance(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(my_distance);

Datum
my_distance(PG_FUNCTION_ARGS)
{
    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    data_type  *query = PG_GETARG_DATA_TYPE_P(1);
    StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
    /* Oid subtype = PG_GETARG_OID(3); */
    data_type  *key = DatumGetDataType(entry->key);
    double      retval;

    /*
     * determine return value as a function of strategy, key and query.
     */

    PG_RETURN_FLOAT8(retval);
}
</programlisting>
|
||||
|
||||
The arguments to the <function>distance</> function are identical to
|
||||
the arguments of the <function>consistent</> function, except that no
|
||||
recheck flag is used. The distance to a leaf index entry must always
|
||||
be determined exactly, since there is no way to re-order the tuples
|
||||
once they are returned. Some approximation is allowed when determining
|
||||
the distance to an internal tree node, so long as the result is never
|
||||
greater than any child's actual distance. Thus, for example, distance
|
||||
to a bounding box is usually sufficient in geometric applications. The
|
||||
result value can be any finite <type>float8</> value. (Infinity and
|
||||
minus infinity are used internally to handle cases such as nulls, so it
|
||||
is not recommended that <function>distance</> functions return these
|
||||
values.)
|
||||
</para>
|
||||
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
</variablelist>
|
||||
|
||||
</sect1>
|
||||
|
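The rule stated for the distance method in the hunk above (exact distance for a leaf entry, a lower bound for an internal tree node) can be illustrated with a minimal one-dimensional sketch. It is not taken from the commit: the Interval type and both function names are hypothetical, and an interval stands in for the bounding box that a geometric operator class would use.

```c
#include <assert.h>

/* Hypothetical internal-node key: the range covering all child keys. */
typedef struct
{
    double lo;
    double hi;
} Interval;

/* Leaf entry: the distance must be exact, here |key - query|. */
static double
leaf_distance(double key, double query)
{
    return key > query ? key - query : query - key;
}

/*
 * Internal node: return the smallest distance any child could have.
 * A query inside the interval may coincide with a child, so the bound is 0;
 * otherwise the nearest endpoint gives the bound.
 */
static double
internal_distance(Interval key, double query)
{
    assert(key.lo <= key.hi);
    if (query < key.lo)
        return key.lo - query;
    if (query > key.hi)
        return query - key.hi;
    return 0.0;
}
```

For any child key k inside the interval, internal_distance() never exceeds leaf_distance(k, query), which is the property the documentation text asks for.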
@ -505,11 +505,31 @@ amrestrpos (IndexScanDesc scan);
|
||||
|
||||
<para>
|
||||
Some access methods return index entries in a well-defined order, others
|
||||
do not. If entries are returned in sorted order, the access method should
|
||||
set <structname>pg_am</>.<structfield>amcanorder</> true to indicate that
|
||||
it supports ordered scans.
|
||||
All such access methods must use btree-compatible strategy numbers for
|
||||
their equality and ordering operators.
|
||||
do not. There are actually two different ways that an access method can
|
||||
support sorted output:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
Access methods that always return entries in the natural ordering
|
||||
of their data (such as btree) should set
|
||||
<structname>pg_am</>.<structfield>amcanorder</> to true.
|
||||
Currently, such access methods must use btree-compatible strategy
|
||||
numbers for their equality and ordering operators.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Access methods that support ordering operators should set
|
||||
<structname>pg_am</>.<structfield>amcanorderbyop</> to true.
|
||||
This indicates that the index is capable of returning entries in
|
||||
an order satisfying <literal>ORDER BY</> <replaceable>index_key</>
|
||||
<replaceable>operator</> <replaceable>constant</>. Scan modifiers
|
||||
of that form can be passed to <function>amrescan</> as described
|
||||
previously.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
@ -521,7 +541,7 @@ amrestrpos (IndexScanDesc scan);
|
||||
the normal front-to-back direction, so <function>amgettuple</> must return
|
||||
the last matching tuple in the index, rather than the first one as it
|
||||
normally would. (This will only occur for access
|
||||
methods that advertise they support ordered scans.) After the
|
||||
methods that set <structfield>amcanorder</> to true.) After the
|
||||
first call, <function>amgettuple</> must be prepared to advance the scan in
|
||||
either direction from the most recently returned entry. (But if
|
||||
<structname>pg_am</>.<structfield>amcanbackward</> is false, all subsequent
|
||||
@ -563,7 +583,8 @@ amrestrpos (IndexScanDesc scan);
|
||||
tuples at once and marking or restoring scan positions isn't
|
||||
supported. Secondly, the tuples are returned in a bitmap which doesn't
|
||||
have any specific ordering, which is why <function>amgetbitmap</> doesn't
|
||||
take a <literal>direction</> argument. Finally, <function>amgetbitmap</>
|
||||
take a <literal>direction</> argument. (Ordering operators will never be
|
||||
supplied for such a scan, either.) Finally, <function>amgetbitmap</>
|
||||
does not guarantee any locking of the returned tuples, with implications
|
||||
spelled out in <xref linkend="index-locking">.
|
||||
</para>
|
||||
|
@ -167,6 +167,11 @@ CREATE INDEX test1_id_index ON test1 (id);
|
||||
upper/lower case conversion.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
B-tree indexes can also be used to retrieve data in sorted order.
|
||||
This is not always faster than a simple scan and sort, but it is
|
||||
often helpful.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<indexterm>
|
||||
@ -236,6 +241,18 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
|
||||
classes are available in the <literal>contrib</> collection or as separate
|
||||
projects. For more information see <xref linkend="GiST">.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
GiST indexes are also capable of optimizing <quote>nearest-neighbor</>
|
||||
searches, such as
|
||||
<programlisting><![CDATA[
|
||||
SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10;
|
||||
]]>
|
||||
</programlisting>
|
||||
which finds the ten places closest to a given target point. The ability
|
||||
to do this is again dependent on the particular operator class being used.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<indexterm>
|
||||
<primary>index</primary>
|
||||
|
@ -361,59 +361,74 @@
|
||||
</table>
|
||||
|
||||
<para>
|
||||
GiST indexes require seven support functions,
|
||||
GiST indexes require seven support functions, with an optional eighth, as
|
||||
shown in <xref linkend="xindex-gist-support-table">.
|
||||
</para>
|
||||
|
||||
<table tocentry="1" id="xindex-gist-support-table">
|
||||
<title>GiST Support Functions</title>
|
||||
<tgroup cols="2">
|
||||
<tgroup cols="3">
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Function</entry>
|
||||
<entry>Description</entry>
|
||||
<entry>Support Number</entry>
|
||||
</row>
|
||||
</thead>
|
||||
<tbody>
|
||||
<row>
|
||||
<entry>consistent - determine whether key satisfies the
|
||||
<entry><function>consistent</></entry>
|
||||
<entry>determine whether key satisfies the
|
||||
query qualifier</entry>
|
||||
<entry>1</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>union - compute union of a set of keys</entry>
|
||||
<entry><function>union</></entry>
|
||||
<entry>compute union of a set of keys</entry>
|
||||
<entry>2</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>compress - compute a compressed representation of a key or value
|
||||
<entry><function>compress</></entry>
|
||||
<entry>compute a compressed representation of a key or value
|
||||
to be indexed</entry>
|
||||
<entry>3</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>decompress - compute a decompressed representation of a
|
||||
<entry><function>decompress</></entry>
|
||||
<entry>compute a decompressed representation of a
|
||||
compressed key</entry>
|
||||
<entry>4</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>penalty - compute penalty for inserting new key into subtree
|
||||
<entry><function>penalty</></entry>
|
||||
<entry>compute penalty for inserting new key into subtree
|
||||
with given subtree's key</entry>
|
||||
<entry>5</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>picksplit - determine which entries of a page are to be moved
|
||||
<entry><function>picksplit</></entry>
|
||||
<entry>determine which entries of a page are to be moved
|
||||
to the new page and compute the union keys for resulting pages</entry>
|
||||
<entry>6</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>equal - compare two keys and return true if they are equal</entry>
|
||||
<entry><function>equal</></entry>
|
||||
<entry>compare two keys and return true if they are equal</entry>
|
||||
<entry>7</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><function>distance</></entry>
|
||||
<entry>
|
||||
(optional method) determine distance from key to query value
|
||||
</entry>
|
||||
<entry>8</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
GIN indexes require four support functions,
|
||||
GIN indexes require four support functions, with an optional fifth, as
|
||||
shown in <xref linkend="xindex-gin-support-table">.
|
||||
</para>
|
||||
|
||||
|
@ -20,6 +20,7 @@ The current implementation of GiST supports:
|
||||
|
||||
* Variable length keys
|
||||
* Composite keys (multi-key)
|
||||
* Ordered search (nearest-neighbor search)
|
||||
* provides NULL-safe interface to GiST core
|
||||
* Concurrency
|
||||
* Recovery support via WAL logging
|
||||
@ -32,8 +33,8 @@ Marcel Kornaker:
|
||||
|
||||
The original algorithms were modified in several ways:
|
||||
|
||||
* They should be adapted to PostgreSQL conventions. For example, the SEARCH
|
||||
algorithm was considerably changed, because in PostgreSQL function search
|
||||
* They had to be adapted to PostgreSQL conventions. For example, the SEARCH
|
||||
algorithm was considerably changed, because in PostgreSQL the search function
|
||||
should return one tuple (next), not all tuples at once. Also, it should
|
||||
release page locks between calls.
|
||||
* Since we added support for variable length keys, it's not possible to
|
||||
@ -41,12 +42,12 @@ The original algorithms were modified in several ways:
|
||||
defined function picksplit doesn't have information about size of tuples
|
||||
(each tuple may contain several keys as in multicolumn index while picksplit
|
||||
could work with only one key) and pages.
|
||||
* We modified original INSERT algorithm for performance reason. In particular,
|
||||
* We modified original INSERT algorithm for performance reasons. In particular,
|
||||
it is now a single-pass algorithm.
|
||||
* Since the papers were theoretical, some details were omitted and we
|
||||
have to find out ourself how to solve some specific problems.
|
||||
had to find out ourselves how to solve some specific problems.
|
||||
|
||||
Because of the above reasons, we have to revised interaction of GiST
|
||||
Because of the above reasons, we have revised the interaction of GiST
|
||||
core and PostgreSQL WAL system. Moreover, we encountered (and solved)
|
||||
a problem of uncompleted insertions when recovering after crash, which
|
||||
was not touched in the paper.
|
||||
@ -54,46 +55,49 @@ was not touched in the paper.
|
||||
Search Algorithm
|
||||
----------------
|
||||
|
||||
Function gettuple finds a tuple which satisfies the search
|
||||
predicate. It store their state and returns next tuple under
|
||||
subsequent calls. Stack contains page, its LSN and LSN of parent page
|
||||
and currentposition is saved between calls.
|
||||
The search code maintains a queue of unvisited items, where an "item" is
|
||||
either a heap tuple known to satisfy the search conditions, or an index
|
||||
page that is consistent with the search conditions according to inspection
|
||||
of its parent page's downlink item. Initially the root page is searched
|
||||
to find unvisited items in it. Then we pull items from the queue. A
|
||||
heap tuple pointer is just returned immediately; an index page entry
|
||||
causes that page to be searched, generating more queue entries.
|
||||
|
||||
gettuple(search-pred)
|
||||
if ( firsttime )
|
||||
push(stack, [root, 0, 0]) // page, LSN, parentLSN
|
||||
currentposition=0
|
||||
end
|
||||
ptr = top of stack
|
||||
while(true)
|
||||
latch( ptr->page, S-mode )
|
||||
if ( ptr->page->lsn != ptr->lsn )
|
||||
ptr->lsn = ptr->page->lsn
|
||||
currentposition=0
|
||||
if ( ptr->parentlsn < ptr->page->nsn )
|
||||
add to stack rightlink
|
||||
else
|
||||
currentposition++
|
||||
end
|
||||
The queue is kept ordered with heap tuple items at the front, then
|
||||
index page entries, with any newly-added index page entry inserted
|
||||
before existing index page entries. This ensures depth-first traversal
|
||||
of the index, and in particular causes the first few heap tuples to be
|
||||
returned as soon as possible. That is helpful in case there is a LIMIT
|
||||
that requires only a few tuples to be produced.
|
||||
|
||||
while(true)
|
||||
currentposition = find_first_match( currentposition )
|
||||
if ( currentposition is invalid )
|
||||
unlatch( ptr->page )
|
||||
pop stack
|
||||
ptr = top of stack
|
||||
if (ptr is NULL)
|
||||
return NULL
|
||||
break loop
|
||||
else if ( ptr->page is leaf )
|
||||
unlatch( ptr->page )
|
||||
return tuple
|
||||
else
|
||||
add to stack child page
|
||||
end
|
||||
currentposition++
|
||||
end
|
||||
end
|
||||
To implement nearest-neighbor search, the queue entries are augmented
|
||||
with distance data: heap tuple entries are labeled with exact distance
|
||||
from the search argument, while index-page entries must be labeled with
|
||||
the minimum distance that any of their children could have. Then,
|
||||
queue entries are retrieved in smallest-distance-first order, with
|
||||
entries having identical distances managed as stated in the previous
|
||||
paragraph.
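
The queue discipline described above can be sketched as follows. This is only an illustration under stated assumptions, not the code added by this commit: the Node, QueueItem and Queue types, the fixed-size sorted array, and the one-dimensional keys are all hypothetical simplifications. Items are ordered by distance; among equal distances, heap tuples come before index pages and a newly added index page goes before existing ones.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_ITEMS = 64 };

typedef struct Node Node;
struct Node
{
    int         is_tuple;   /* 1 = heap tuple item, 0 = index page */
    double      lo, hi;     /* key range; lo == hi for a heap tuple */
    int         tuple_id;   /* meaningful only for heap tuples */
    const Node *children;   /* entries of an index page */
    size_t      nchildren;
};

typedef struct { const Node *node; double dist; } QueueItem;
typedef struct { QueueItem items[MAX_ITEMS]; size_t n; } Queue;

/* Exact distance for a tuple, lower bound for a page. */
static double
min_distance(const Node *node, double query)
{
    if (query < node->lo)
        return node->lo - query;
    if (query > node->hi)
        return query - node->hi;
    return 0.0;
}

/* Insert after smaller distances and after equal-distance heap tuples. */
static void
queue_push(Queue *q, const Node *node, double dist)
{
    size_t pos = 0;

    assert(q->n < MAX_ITEMS);
    while (pos < q->n &&
           (q->items[pos].dist < dist ||
            (q->items[pos].dist == dist && q->items[pos].node->is_tuple)))
        pos++;
    for (size_t i = q->n; i > pos; i--)
        q->items[i] = q->items[i - 1];
    q->items[pos].node = node;
    q->items[pos].dist = dist;
    q->n++;
}

/* Return up to 'limit' tuple ids, nearest first. */
static size_t
knn_search(const Node *root, double query, int *out, size_t limit)
{
    Queue  q = { .n = 0 };
    size_t found = 0;

    queue_push(&q, root, min_distance(root, query));
    while (q.n > 0 && found < limit)
    {
        QueueItem item = q.items[0];

        for (size_t i = 1; i < q.n; i++)    /* pop the front item */
            q.items[i - 1] = q.items[i];
        q.n--;

        if (item.node->is_tuple)
            out[found++] = item.node->tuple_id;
        else
            for (size_t i = 0; i < item.node->nchildren; i++)
                queue_push(&q, &item.node->children[i],
                           min_distance(&item.node->children[i], query));
    }
    return found;
}
```

Because a page's distance is a lower bound on its children, no tuple is returned before every page that could contain a closer one has been expanded. A real implementation would likely use a more efficient ordered structure than a shifted array.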
|
||||
|
||||
The search algorithm keeps an index page locked only long enough to scan
|
||||
its entries and queue those that satisfy the search conditions. Since
|
||||
insertions can occur concurrently with searches, it is possible for an
|
||||
index child page to be split between the time we make a queue entry for it
|
||||
(while visiting its parent page) and the time we actually reach and scan
|
||||
the child page. To avoid missing the entries that were moved to the right
|
||||
sibling, we detect whether a split has occurred by comparing the child
|
||||
page's NSN to the LSN that the parent had when visited. If it did, the
|
||||
sibling page is immediately added to the front of the queue, ensuring that
|
||||
its items will be scanned in the same order as if they were still on the
|
||||
original child page.
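
The split check described above reduces to one comparison, sketched here under stated assumptions. The PageHeader layout, the Lsn typedef, INVALID_BLOCK and both function names are hypothetical stand-ins rather than PostgreSQL's actual definitions; the direction of the comparison follows the pseudocode removed by this commit (parent LSN less than the child's NSN means the child was split after the parent was read).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;               /* hypothetical log sequence number type */

#define INVALID_BLOCK 0xFFFFFFFFu   /* hypothetical "no right sibling" marker */

typedef struct
{
    Lsn      nsn;        /* set when the page was last split */
    uint32_t rightlink;  /* block number of the right sibling, if any */
} PageHeader;

/* True if the child was split after the parent's downlink was examined. */
static bool
child_was_split(Lsn parent_lsn_when_visited, const PageHeader *child)
{
    assert(child != NULL);
    return parent_lsn_when_visited < child->nsn;
}

/*
 * Block to add to the front of the queue so that entries moved to the
 * right sibling are not missed, or INVALID_BLOCK if nothing must be added.
 */
static uint32_t
sibling_to_queue(Lsn parent_lsn_when_visited, const PageHeader *child)
{
    if (child_was_split(parent_lsn_when_visited, child))
        return child->rightlink;
    return INVALID_BLOCK;
}
```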
|
||||
|
||||
As is usual in Postgres, the search algorithm only guarantees to find index
|
||||
entries that existed before the scan started; index entries added during
|
||||
the scan might or might not be visited. This is okay as long as all
|
||||
searches use MVCC snapshot rules to reject heap tuples newer than the time
|
||||
of scan start. In particular, this means that we need not worry about
|
||||
cases where a parent page's downlink key is "enlarged" after we look at it.
|
||||
Any such enlargement would be to add child items that we aren't interested
|
||||
in returning anyway.
|
||||
|
||||
|
||||
Insert Algorithm
|
||||
|