Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration. This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.
We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.
Remove now redundant prototypes from contrib and other modules.
The initial coding just descended the index if any of the target trigrams
were possibly present at the next level down. But actually we can apply
trigramsMatchGraph() so as to take advantage of AND requirements when there
are some. The input data might contain false positive matches, but that
can only result in a false positive result, not false negative, so it's
safe to do it this way.
Alexander Korotkov
This wasn't addressed in the original patch, but it doesn't take very
much additional code to cover the case, so let's get it done.
Since pg_trgm 1.1 hasn't been released yet, I just changed the definition
of what's in it, rather than inventing a 1.2.
The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits. Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.
Remove the typedefs for int2 and int4 for now. They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.
Since gtrgm_penalty() is usually called many times in a row with the same
"newval" (to determine which item on an index page newval fits into best),
the makesign() calculation is repetitious. It's expensive enough to make
it worth caching the result, so do so. On my machine this is good for
more than a 40% savings in the time needed to build a trigram index on
/usr/share/dict/words. This is all per a suggestion of Heikki's.
In passing, make some mostly-cosmetic improvements in the caching logic in
the other functions in this file that rely on caching info in fn_extra.
This addresses only those cases that are easy to fix by adding or
moving a const qualifier or removing an unnecessary cast. There are
many more complicated cases remaining.
Unlike Btree-based LIKE optimization, this works for non-left-anchored
search patterns. The effectiveness of the search depends on how many
trigrams can be extracted from the pattern. (The worst case, with no
trigrams, degrades to a full-table scan, so this isn't a panacea. But
it can be very useful.)
Alexander Korotkov, reviewed by Jan Urbanski
"consistent" functions, and remove pg_amop.opreqcheck, as per recent
discussion. The main immediate benefit of this is that we no longer need
8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery
searches on GIN indexes. In future it should be possible to optimize some
other queries better than is done now, by detecting at runtime whether the
index match is exact or not.
Tom Lane, after an idea of Heikki's, and with some help from Teodor.
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
ways. I'm not totally sure that I caught everything, but at least now they pass
their regression tests with VARSIZE/SET_VARSIZE defined to reverse byte order.
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
* possible call pickSplit() for second and below columns
* add spl_(l|r)datum_exists to GIST_SPLITVEC -
pickSplit should check its values to use already defined
spl_(l|r)datum for splitting. pickSplit should set
spl_(l|r)datum_exists to 'false' (if they was 'true') to
signal to caller about using spl_(l|r)datum.
* support for old pickSplit(): not very optimal
but correct split
* remove 'bytes' field from GISTENTRY: in any case size of
value is defined by it's type.
* split GIST_SPLITVEC to two structures: one for using in picksplit
and second - for internal use.
* some code refactoring
* support of subsplit to rtree opclasses
TODO: add support of subsplit to contrib modules
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
sizebitvec of tsearch2, as well as identical code in several other
contrib modules. This provided about a 20X speedup in building a
large tsearch2 index ... didn't try to measure its effects for other
operations. Thanks to Stephan Vollmer for providing a test case.
--------------------------------------
The pg_trgm contrib module provides functions and index classes
for determining the similarity of text based on trigram
matching.