From 3a65e4fca66daaa585789462081f4f054b552ad1 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 20 Feb 2001 22:27:56 +0000 Subject: [PATCH] Add more details. --- doc/src/sgml/indices.sgml | 437 ++++++++++++++++++++++++++++---------- 1 file changed, 328 insertions(+), 109 deletions(-) diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index db986593184..32ecd9e6695 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -1,132 +1,353 @@ - - Indices and Keys + + + + Indices + + + Indices are a common way to enhance database performance. An index + allows the database server to find and retrieve specific rows much + faster than it could do without an index. But indices also add + overhead to the database system as a whole, so they should be used + sensibly. + + + + + Introduction - Indexes are commonly used to enhance database - performance. They should be defined on table columns (or class - attributes) that are used as qualifications in repetitive queries. - Inappropriate use will result in slower performance, since update - and insertion times are increased in the presence of indices. + The classical example for the need of an index is if there is a + table similar to this: + +CREATE TABLE test1 ( + id integer, + content varchar +); + + and the application requires a lot of queries of the form + +SELECT content FROM test1 WHERE id = constant; + + Ordinarily, the system would have to scan the entire + test1 table row by row to find all + matching entries. If there are a lot of rows in + test1 and only a few rows (possibly zero + or one) returned by the query, then this is clearly an inefficient + method. If the system were instructed to maintain an index on the + id column, then it could use a more + efficient method for locating matching rows. For instance, it + might only have to walk a few levels deep into a search tree. - Indexes may also be used to enforce uniqueness of a table's primary key. - When an index is declared UNIQUE, multiple table rows with identical - index entries won't be allowed. - For this purpose, the goal is ensuring data consistency, not improving - performance, so the above caution about inappropriate use doesn't apply. + A similar approach is used in most books of non-fiction: Terms and + concepts that are frequently looked up by readers are collected in + an alphabetic index at the end of the book. The interested reader + can scan the index relatively quickly and flip to the appropriate + page, and would not have to read the entire book to find the + interesting location. As it is the task of the author to + anticipate the items that the readers are most likely to look up, + it is the task of the database programmer to foresee which indexes + would be of advantage. - Two forms of indices may be defined: - - - - - For a value index, - the key fields for the - index are specified as column names; multiple columns - can be specified if the index access method supports - multi-column indexes. - - - - - - For a functional index, an index is defined - on the result of a function applied - to one or more columns of a single table. - This is a single-column index (namely, the function result) - even if the function uses more than one input field. - Functional indices can be used to obtain fast access to data - based on operators that would normally require some - transformation to apply them to the base data. - - - + The following command would be used to create the index on the + id column, as discussed: + +CREATE INDEX test1_id_index ON test1 (id); + + The name test1_id_index can be chosen + freely, but you should pick something that enables you to remember + later what the index was for. - Postgres provides btree, rtree and hash access methods for - indices. The btree access method is an implementation of - Lehman-Yao high-concurrency btrees. The rtree access method - implements standard rtrees using Guttman's quadratic split algorithm. - The hash access method is an implementation of Litwin's linear - hashing. We mention the algorithms used solely to indicate that all - of these access methods are fully dynamic and do not have to be - optimized periodically (as is the case with, for example, static hash - access methods). + To remove an index, use the DROP INDEX command. + Indices can be added and removed from tables at any time. - The Postgres - query optimizer will consider using a btree index whenever - an indexed attribute is involved in a comparison using one of: + Once the index is created, no further intervention is required: the + system will use the index when it thinks it would be more efficient + than a sequential table scan. But you may have to run the + VACUUM ANALYZE command regularly to update + statistics to allow the query planner to make educated decisions. + Also read for information about + how to find out whether an index is used and when and why the + planner may choose to not use an index. + + + + Indices can also benefit UPDATEs and + DELETEs with search conditions. Note that a + query or data manipulation commands can only use at most one index + per table. Indices can also be used in table join methods. Thus, + an index defined on a column that is part of a join condition can + significantly speed up queries with joins. + + + + When an index is created, it has to be kept synchronized with the + table. This adds overhead to data manipulation operations. + Therefore indices that are non-essential or do not get used at all + should be removed. + + + + + + Index Types + + + Postgres provides several index types: + B-tree, R-tree, and Hash. Each index type is more appropriate for + a particular query type because of the algorithm it uses. By + default, the CREATE INDEX command will create a + B-tree index, which fits the most common situations. In + particular, the Postgres query optimizer + will consider using a B-tree index whenever an indexed column is + involved in a comparison using one of these operators: - < - <= - = - >= - > + < + <= + = + >= + > - The Postgres - query optimizer will consider using an rtree index whenever - an indexed attribute is involved in a comparison using one of: + R-tree indices are especially suited for spacial data. To create + an R-tree index, use a command of the form + +CREATE INDEX name ON table USING RTREE (column); + + The Postgres query optimizer will + consider using an R-tree index whenever an indexed column is + involved in a comparison using one of these operators: - << - &< - &> - >> - @ - ~= - && + << + &< + &> + >> + @ + ~= + && + (Refer to about the meaning of + these operators.) - The Postgres - query optimizer will consider using a hash index whenever - an indexed attribute is involved in a comparison using - the = operator. + The query optimizer will consider using a hash index whenever an + indexed column is involved in a comparison using the + = operator. The following command is used to + create a hash index: + +CREATE INDEX name ON table USING HASH (column); + + + + Because of the limited utility of hash indices, a B-tree index + should generally be preferred over a hash index. We do not have + sufficient evidence that hash indices are actually faster than + B-trees even for = comparisons. Moreover, + hash indices require coarser locks; see . + + - Currently, only the btree access method supports multi-column - indexes. Up to 16 keys may be specified by default (this limit - can be altered when building Postgres). + The B-tree index is an implementation of Lehman-Yao + high-concurrency B-trees. The R-tree index method implements + standard R-trees using Guttman's quadratic split algorithm. The + hash index is an implementation of Litwin's linear hashing. We + mention the algorithms used solely to indicate that all of these + access methods are fully dynamic and do not have to be optimized + periodically (as is the case with, for example, static hash access + methods). + + + + + + Multi-Column Indices + + + An index can be defined on more than one column. For example, if + you have a table of this form: + +CREATE TABLE test2 ( + major int, + minor int, + name varchar +); + + (Say, you keep you your /dev + directory in a database...) and you frequently make queries like + +SELECT name FROM test2 WHERE major = constant AND minor = constant; + + then it may be appropriate to define an index on the columns + major and + minor together, e.g., + +CREATE INDEX test2_mm_idx ON test2 (major, minor); + - An operator class can be specified for each - column of an index. The operator class identifies the operators to - be used by the index for that column. For example, a btree index on - four-byte integers would use the int4_ops class; - this operator class includes comparison functions for four-byte - integers. In practice the default operator class for the field's - datatype is usually sufficient. The main point of having operator classes - is that for some datatypes, there could be more than one meaningful - ordering. For example, we might want to sort a complex-number datatype - either by absolute value or by real part. We could do this by defining - two operator classes for the datatype and then selecting the proper - class when making an index. There are also some operator classes with - special purposes: + Currently, only the B-tree implementation supports multi-column + indices. Up to 16 columns may be specified. (This limit can be + altered when building Postgres; see the + file config.h.) + + + + The query optimizer can use a multi-column index for queries that + involve the first n consecutive columns in + the index (when used with appropriate operators), up to the total + number of columns specified in the index definition. For example, + an index on (a, b, c) can be used in queries + involving all of a, b, and + c, or in queries involving both + a and b, or in queries + involving only a, but not in other combinations. + (In a query involving a and c + the optimizer might choose to use the index for + a only and treat c like an + ordinary unindexed column.) + + + + Multi-column indexes can only be used if the clauses involving the + indexed columns are joined with AND. For instance, + +SELECT name FROM test2 WHERE major = constant OR minor = constant; + + cannot make use of the index test2_mm_idx + defined above to look up both columns. (It can be used to look up + only the major column, however.) + + + + Multi-column indices should be used sparingly. Most of the time, + an index on a single column is sufficient and saves space and time. + Indexes with more than three columns are almost certainly + inappropriate. + + + + + + Unique Indices + + + Indexes may also be used to enforce uniqueness of a column's value, + or the uniqueness of the combined values of more than one column. + +CREATE UNIQUE INDEX name ON table (column , ...); + + Only B-tree indices can be declared unique. + + + + When an index is declared unique, multiple table rows with equal + indexed values will not be allowed. NULL values are not considered + equal. + + + + PostgreSQL automatically creates unique + indices when a table is declared with a unique constraint or a + primary key, on the columns that make up the primary key or unique + columns (a multi-column index, if appropriate), to enforce that + constraint. A unique index can be added to a table at any later + time, to add a unique constraint. (But a primary key cannot be + added after table creation.) + + + + + + Functional Indices + + + For a functional index, an index is defined + on the result of a function applied to one or more columns of a + single table. Functional indices can be used to obtain fast access + to data based on the result of function calls. + + + + For example, a common way to do case-insensitive comparisons is to + use the lower: + +SELECT * FROM test1 WHERE lower(col1) = 'value'; + + In order for that query to be able to use an index, it has to be + defined on the result of the lower(column) + operation: + +CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1)); + + + + + The function in the index definition can take more than one + argument, but they must be table columns, not constants. + Functional indices are always single-column (namely, the function + result) even if the function uses more than one input field; there + cannot be multi-column indices that contain function calls. + + + + + The restrictions mentioned in the previous paragraph can easily be + worked around by defining custom functions to use in the index + definition that call the desired function(s) internally. + + + + + + + Operator Classes + + + An index definition may specify an operator + class for each column of an index. + +CREATE INDEX name ON table (column opclass , ...); + + The operator class identifies the operators to be used by the index + for that column. For example, a B-tree index on four-byte integers + would use the int4_ops class; this operator + class includes comparison functions for four-byte integers. In + practice the default operator class for the column's data type is + usually sufficient. The main point of having operator classes is + that for some data types, there could be more than one meaningful + ordering. For example, we might want to sort a complex-number data + type either by absolute value or by real part. We could do this by + defining two operator classes for the data type and then selecting + the proper class when making an index. There are also some + operator classes with special purposes: The operator classes box_ops and - bigbox_ops both support rtree indices on the - box datatype. - The difference between them is that bigbox_ops - scales box coordinates down, to avoid floating point exceptions from - doing multiplication, addition, and subtraction on very large - floating-point coordinates. If the field on which your rectangles lie - is about 20,000 units square or larger, you should use + bigbox_ops both support R-tree indices on the + box data type. The difference between them is + that bigbox_ops scales box coordinates down, + to avoid floating point exceptions from doing multiplication, + addition, and subtraction on very large floating point + coordinates. If the field on which your rectangles lie is about + 20 000 units square or larger, you should use bigbox_ops. @@ -135,8 +356,7 @@ The following query shows all defined operator classes: - - + SELECT am.amname AS acc_name, opc.opcname AS ops_name, opr.oprname AS ops_comp @@ -145,13 +365,11 @@ SELECT am.amname AS acc_name, WHERE amop.amopid = am.oid AND amop.amopclaid = opc.oid AND amop.amopopr = opr.oid - ORDER BY acc_name, ops_name, ops_comp - + ORDER BY acc_name, ops_name, ops_comp; + + - - Use DROP INDEX to remove an index. - Keys @@ -169,7 +387,7 @@ SELECT am.amname AS acc_name, - + Subject: Re: [QUESTIONS] PRIMARY KEY | UNIQUE What's the difference between: @@ -180,7 +398,7 @@ Subject: Re: [QUESTIONS] PRIMARY KEY | UNIQUE - Is this an alias? - If PRIMARY KEY is already unique, then why is there another kind of key named UNIQUE? - + @@ -199,7 +417,7 @@ Subject: Re: [QUESTIONS] PRIMARY KEY | UNIQUE However, my application requires that each collection will also have a unique name. Why? So that a human being who wants to modify a collection will be able to identify it. It's much harder to know, if you have two - collections named "Life Science", the the one tagged 24433 is the one you + collections named Life Science, the the one tagged 24433 is the one you need, and the one tagged 29882 is not. @@ -213,7 +431,7 @@ Subject: Re: [QUESTIONS] PRIMARY KEY | UNIQUE Moreover, despite being unique, the collection name does not actually define the collection! For example, if somebody decided to change the name - of the collection from "Life Science" to "Biology", it will still be the + of the collection from Life Science to Biology, it will still be the same collection, only with a different name. As long as the name is unique, that's OK. @@ -256,7 +474,7 @@ Subject: Re: [QUESTIONS] PRIMARY KEY | UNIQUE - Are updateable, so long as they are kept unique. + Are updatable, so long as they are kept unique. @@ -284,16 +502,16 @@ Subject: Re: [QUESTIONS] PRIMARY KEY | UNIQUE Thus, you may query a table by any combination of its columns, despite the fact that you don't have an index on these columns. The indexes are merely - an implementational aid that each RDBMS offers + an implementation aid that each RDBMS offers you, in order to cause commonly used queries to be done more efficiently. Some RDBMS may give you additional measures, such as keeping a key stored in main memory. They will have a special command, for example - -CREATE MEMSTORE ON <table> COLUMNS <cols> - - (this is not an existing command, just an example). + +CREATE MEMSTORE ON table COLUMNS cols + + (This is not an existing command, just an example.) @@ -318,6 +536,7 @@ CREATE MEMSTORE ON <table> COLUMNS <cols> + Partial Indices