postgresql/doc/manual/xindex.html
1997-01-15 15:16:25 +00:00

431 lines
17 KiB
HTML

<HTML>
<HEAD>
<TITLE>The POSTGRES95 User Manual - THE QUERY LANGUAGE</TITLE>
</HEAD>
<BODY>
<font size=-1>
<A HREF="pg95user.html">[ TOC ]</A>
<A HREF="xaggr.html">[ Previous ]</A>
<A HREF="libpq.html">[ Next ]</A>
</font>
<HR>
<H1>11. INTERFACING EXTENSIONS TO INDICES</H1>
<HR>
The procedures described thus far let you define a new
type, new functions and new operators. However, we
cannot yet define a secondary index (such as a <B>B-tree</B>,
<B>R-tree</B> or hash access method) over a new type or its
operators.<p>
<A HREF="extend.html#about-the-postgres-system-catalogs">Look back at Figure 3</A>.
The right half shows the catalogs
that we must modify in order to tell POSTGRES how
to use a user-defined type and/or user-defined operators
with an index (i.e., <CODE>pg_am, pg_amop, pg_amproc</CODE> and
<CODE>pg_opclass</CODE>). Unfortunately, there is no simple command
to do this. We will demonstrate how to modify these
catalogs through a running example: a new operator
class for the <B>B-tree</B> access method that sorts integers
in ascending absolute value order.<p>
The <CODE>pg_am</CODE> class contains one instance for every user
defined access method. Support for the heap access
method is built into POSTGRES, but every other access
method is described here. The schema is
<p>
<center>
<table border=1>
<tr>
<td>amname </td><td> name of the access method </td>
</tr>
<td>amowner </td><td> object id of the owner's instance in pg_user </td>
</tr>
<tr>
<td>amkind </td><td> not used at present, but set to 'o' as a place holder </td>
</tr>
<tr>
<td>amstrategies </td><td> number of strategies for this access method (see below) </td>
</tr>
<tr>
<td>amsupport </td><td> number of support routines for this access method (see below) </td>
</tr>
<tr>
<td>amgettuple<br>
aminsert<br>
...</td>
<td>procedure identifiers for interface routines to the access
method. For example, regproc ids for opening, closing, and
getting instances from the access method appear here. </td>
</tr>
</table>
</center>
<p>
The <B>object ID</B> of the instance in <CODE>pg_am</CODE> is used as a
foreign key in lots of other classes. You don't need
to add a new instance to this class; all you're interested in
is the <B>object ID</B> of the access method instance
you want to extend:
<pre> SELECT oid FROM pg_am WHERE amname = 'btree'
+----+
|oid |
+----+
|403 |
+----+
</pre>
The <CODE>amstrategies</CODE> attribute exists to standardize
comparisons across data types. For example, <B>B-tree</B>s
impose a strict ordering on keys, lesser to greater.
Since POSTGRES allows the user to define operators,
POSTGRES cannot look at the name of an operator (eg, &gt;
or &lt;) and tell what kind of comparison it is. In fact,
some access methods don't impose any ordering at all.
For example, <B>R-tree</B>s express a rectangle-containment
relationship, whereas a hashed data structure expresses
only bitwise similarity based on the value of a hash
function. POSTGRES needs some consistent way of taking
a qualification in your query, looking at the operator
and then deciding if a usable index exists. This
implies that POSTGRES needs to know, for example, that
the &lt;= and &gt; operators partition a <B>B-tree</B>. POSTGRES
uses strategies to express these relationships between
operators and the way they can be used to scan indices.<p>
Defining a new set of strategies is beyond the scope of
this discussion, but we'll explain how <B>B-tree</B> strategies
work because you'll need to know that to add a new
operator class. In the <CODE>pg_am</CODE> class, the amstrategies
attribute is the number of strategies defined for this
access method. For <B>B-tree</B>s, this number is 5. These
strategies correspond to
<p>
<center>
<table border=1>
<tr>
<td>less than </td><td> 1 </td>
</tr>
<tr>
<td>less than or equal </td><td> 2 </td>
</tr>
<tr>
<td>equal </td><td> 3 </td>
</tr>
<tr>
<td>greater than or equal </td><td> 4 </td>
</tr>
<tr>
<td>greater than </td><td> 5 </td>
</tr>
</table>
</center>
<p>
The idea is that you'll need to add procedures corresponding
to the comparisons above to the <CODE>pg_amop</CODE> relation
(see below). The access method code can use these
strategy numbers, regardless of data type, to figure
out how to partition the <B>B-tree</B>, compute selectivity,
and so on. Don't worry about the details of adding
procedures yet; just understand that there must be a
set of these procedures for <CODE>int2, int4, oid,</CODE> and every
other data type on which a <B>B-tree</B> can operate.
<p>
Sometimes, strategies aren't enough information for the
system to figure out how to use an index. Some access
methods require other support routines in order to
work. For example, the <B>B-tree</B> access method must be
able to compare two keys and determine whether one is
greater than, equal to, or less than the other.
Similarly, the <B>R-tree</B> access method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to user qualifications in
SQL queries; they are administrative routines used by
the access methods, internally.<p>
In order to manage diverse support routines
consistently across all POSTGRES access methods, <CODE>pg_am</CODE>
includes an attribute called <CODE>amsupport</CODE>. This attribute
records the number of support routines used by an
access method. For <B>B-tree</B>s, this number is one -- the
routine to take two keys and return -1, 0, or +1,
depending on whether the first key is less than, equal
to, or greater than the second.<A HREF="#8"><font size=-1>[8]</font></A><p>
The <CODE>amstrategies</CODE> entry in pg_am is just the number of
strategies defined for the access method in question.
The procedures for less than, less equal, and so on
don't appear in <CODE>pg_am</CODE>. Similarly, <CODE>amsupport</CODE> is just
the number of support routines required by the access
method. The actual routines are listed elsewhere.<p>
The next class of interest is pg_opclass. This class
exists only to associate a name with an oid. In
pg_amop, every <B>B-tree</B> operator class has a set of
procedures, one through five, above. Some existing
opclasses are <CODE>int2_ops, int4_ops, and oid_ops</CODE>. You
need to add an instance with your opclass name (for
example, <CODE>complex_abs_ops</CODE>) to <CODE>pg_opclass</CODE>. The <CODE>oid</CODE> of
this instance is a foreign key in other classes.
<pre> INSERT INTO pg_opclass (opcname) VALUES ('complex_abs_ops');
SELECT oid, opcname
FROM pg_opclass
WHERE opcname = 'complex_abs_ops';
+------+--------------+
|oid | opcname |
+------+--------------+
|17314 | int4_abs_ops |
+------+--------------+
</pre>
Note that the oid for your <CODE>pg_opclass</CODE> instance will be
different! You should substitute your value for 17314
wherever it appears in this discussion.<p>
So now we have an access method and an operator class.
We still need a set of operators; the procedure for
defining operators was discussed earlier in this manual.
For the complex_abs_ops operator class on Btrees,
the operators we require are:
<pre> absolute value less-than
absolute value less-than-or-equal
absolute value equal
absolute value greater-than-or-equal
absolute value greater-than
</pre>
Suppose the code that implements the functions defined
is stored in the file
<pre>
/usr/local/postgres95/src/tutorial/complex.c
</pre>
Part of the code look like this: (note that we will
only show the equality operator for the rest of the
examples. The other four operators are very similar.
Refer to <CODE>complex.c</CODE> or <CODE>complex.sql</CODE> for the details.)
<pre> #define Mag(c) ((c)-&gt;x&#42;(c)-&gt;x + (c)-&gt;y&#42;(c)-&gt;y)
bool
complex_abs_eq(Complex &#42;a, Complex &#42;b)
{
double amag = Mag(a), bmag = Mag(b);
return (amag==bmag);
}
</pre>
There are a couple of important things that are happening below.<p>
First, note that operators for less-than, less-than-or
equal, equal, greater-than-or-equal, and greater-than
for <CODE>int4</CODE> are being defined. All of these operators are
already defined for <CODE>int4</CODE> under the names &lt;, &lt;=, =, &gt;=,
and &gt;. The new operators behave differently, of
course. In order to guarantee that POSTGRES uses these
new operators rather than the old ones, they need to be
named differently from the old ones. This is a key
point: you can overload operators in POSTGRES, but only
if the operator isn't already defined for the argument
types. That is, if you have &lt; defined for (int4,
int4), you can't define it again. POSTGRES does not
check this when you define your operator, so be careful.
To avoid this problem, odd names will be used for
the operators. If you get this wrong, the access methods
are likely to crash when you try to do scans.<p>
The other important point is that all the operator
functions return Boolean values. The access methods
rely on this fact. (On the other hand, the support
function returns whatever the particular access method
expects -- in this case, a signed integer.)
The final routine in the file is the "support routine"
mentioned when we discussed the amsupport attribute of
the <CODE>pg_am</CODE> class. We will use this later on. For now,
ignore it.
<pre> CREATE FUNCTION complex_abs_eq(complex, complex)
RETURNS bool
AS '/usr/local/postgres95/tutorial/obj/complex.so'
LANGUAGE 'c';
</pre>
Now define the operators that use them. As noted, the
operator names must be unique among all operators that
take two <CODE>int4</CODE> operands. In order to see if the
operator names listed below are taken, we can do a query on
<CODE>pg_operator</CODE>:
<pre> /&#42;
&#42; this query uses the regular expression operator (~)
&#42; to find three-character operator names that end in
&#42; the character &amp;
&#42;/
SELECT &#42;
FROM pg_operator
WHERE oprname ~ '^..&amp;&#36;'::text;
</pre>
to see if your name is taken for the types you want.
The important things here are the procedure (which are
the <B>C</B> functions defined above) and the restriction and
join selectivity functions. You should just use the
ones used below--note that there are different such
functions for the less-than, equal, and greater-than
cases. These must be supplied, or the access method
will crash when it tries to use the operator. You
should copy the names for restrict and join, but use
the procedure names you defined in the last step.
<pre> CREATE OPERATOR = (
leftarg = complex, rightarg = complex, procedure = complex_abs_eq,
restrict = eqsel, join = eqjoinsel
)
</pre>
Notice that five operators corresponding to less, less
equal, equal, greater, and greater equal are defined.<p>
We're just about finished. the last thing we need to do
is to update the <CODE>pg_amop</CODE> relation. To do this, we need
the following attributes:
<p>
<center>
<table border=1>
<td>amopid </td><td> the <CODE>oid</CODE> of the <CODE>pg_am</CODE> instance for B-tree
(== 403, see above) </td>
<tr>
</tr>
<td>amopclaid </td><td> the <CODE>oid</CODE> of the
<CODE>pg_opclass</CODE> instance for <CODE>int4_abs_ops</CODE> (==
whatever you got instead of <CODE>17314</CODE>, see above)</td>
<tr>
</tr>
<td>amopopr </td><td> the <CODE>oid</CODE>s of the operators for the opclass (which we'll
get in just a minute) </td>
<tr>
</tr>
<td>amopselect, amopnpages </td><td> cost functions.</td>
</tr>
</table>
</center>
<p>
The cost functions are used by the query optimizer to
decide whether or not to use a given index in a scan.
Fortunately, these already exist. The two functions
we'll use are <CODE>btreesel</CODE>, which estimates the selectivity
of the <B>B-tree</B>, and <CODE>btreenpage</CODE>, which estimates the
number of pages a search will touch in the tree.<p>
So we need the <CODE>oid</CODE>s of the operators we just defined.
We'll look up the names of all the operators that take
two <CODE>int4</CODE>s, and pick ours out:
<pre> SELECT o.oid AS opoid, o.oprname
INTO TABLE complex_ops_tmp
FROM pg_operator o, pg_type t
WHERE o.oprleft = t.oid and o.oprright = t.oid
and t.typname = 'complex';
which returns:
+------+---------+
|oid | oprname |
+------+---------+
|17321 | &lt; |
+------+---------+
|17322 | &lt;= |
+------+---------+
|17323 | = |
+------+---------+
|17324 | &gt;= |
+------+---------+
|17325 | &gt; |
+------+---------+
</pre>
(Again, some of your <CODE>oid</CODE> numbers will almost certainly
be different.) The operators we are interested in are
those with <CODE>oid</CODE>s 17321 through 17325. The values you
get will probably be different, and you should
substitute them for the values below. We can look at the
operator names and pick out the ones we just added.<p>
Now we're ready to update <CODE>pg_amop</CODE> with our new operator
class. The most important thing in this entire
discussion is that the operators are ordered, from less equal
through greater equal, in <CODE>pg_amop</CODE>. We add the
instances we need:
<pre> INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,
amopselect, amopnpages)
SELECT am.oid, opcl.oid, c.opoid, 3,
'btreesel'::regproc, 'btreenpage'::regproc
FROM pg_am am, pg_opclass opcl, complex_ops_tmp c
WHERE amname = 'btree' and opcname = 'complex_abs_ops'
and c.oprname = '=';
</pre>
Note the order: "less than" is 1, "less than or equal"
is 2, "equal" is 3, "greater than or equal" is 4, and
"greater than" is 5.<p>
The last step (finally!) is registration of the
"support routine" previously described in our discussion of
<CODE>pg_am</CODE>. The <CODE>oid</CODE> of this support routine is stored in
the <CODE>pg_amproc</CODE> class, keyed by the access method <CODE>oid</CODE> and
the operator class <CODE>oid</CODE>. First, we need to register the
function in POSTGRES (recall that we put the <B>C</B> code
that implements this routine in the bottom of the file
in which we implemented the operator routines):
<pre> CREATE FUNCTION int4_abs_cmp(int4, int4)
RETURNS int4
AS '/usr/local/postgres95/tutorial/obj/complex.so'
LANGUAGE 'c';
SELECT oid, proname FROM pg_proc WHERE prname = 'int4_abs_cmp';
+------+--------------+
|oid | proname |
+------+--------------+
|17328 | int4_abs_cmp |
+------+--------------+
</pre>
(Again, your <CODE>oid</CODE> number will probably be different and
you should substitute the value you see for the value
below.) Recalling that the <B>B-tree</B> instance's oid is
403 and that of <CODE>int4_abs_ops</CODE> is 17314, we can add the
new instance as follows:
<pre> INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
VALUES ('403'::oid, -- btree oid
'17314'::oid, -- pg_opclass tuple
'17328'::oid, -- new pg_proc oid
'1'::int2);
</pre>
<p>
<HR>
<A NAME="8"><B>[8]</B></A> Strictly speaking, this routine can return a negative
number (&lt; 0), 0, or a non-zero positive number (&gt; 0).
<HR>
<font size=-1>
<A HREF="pg95user.html">[ TOC ]</A>
<A HREF="xaggr.html">[ Previous ]</A>
<A HREF="libpq.html">[ Next ]</A>
</font>
</BODY>
</HTML>