The previous coding supposed that the first differing bytes in two varlena
datums must have the same sign difference as their overall comparison
result. This is obviously bogus for text strings in non-C locales, and
probably wrong for numeric, and even for bytea I think it was wrong on
machines where char is signed. When the assumption failed, the function
could deliver a zero or negative penalty in situations where such a result
is quite ridiculous, leading the core GiST code to make very bad page-split
decisions.
To fix, take the absolute values of the byte-level differences. Also,
switch the code to using unsigned char not just char, so that the behavior
will be consistent whether char is signed or not.
Per investigation of a trouble report from Tomas Vondra. Back-patch to all
supported branches.
gbt_var_bin_union() failed to do the right thing when the existing range
needed to be widened at both ends rather than just one end. This could
result in an invalid index in which keys that are present would not be
found by searches, because the searches would not think they need to
descend to the relevant leaf pages. This error affected all the varlena
datatypes supported by btree_gist (text, bytea, bit, numeric).
Per investigation of a trouble report from Tomas Vondra. (There is also
an issue in gbt_var_penalty(), but that should only result in inefficiency
not wrong answers. I'm committing this separately so that we have a git
state in which it can be tested that bad penalty results don't produce
invalid indexes.) Back-patch to all supported branches.
Make use of the collation attached to the index column, instead of
hard-wiring DEFAULT_COLLATION_OID. (Note: in theory this could require
reindexing btree_gist indexes on textual columns, but I rather doubt anyone
has one with a non-default declared collation as yet.)
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
static variables. This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
* possible call pickSplit() for second and below columns
* add spl_(l|r)datum_exists to GIST_SPLITVEC -
pickSplit should check its values to use already defined
spl_(l|r)datum for splitting. pickSplit should set
spl_(l|r)datum_exists to 'false' (if they was 'true') to
signal to caller about using spl_(l|r)datum.
* support for old pickSplit(): not very optimal
but correct split
* remove 'bytes' field from GISTENTRY: in any case size of
value is defined by it's type.
* split GIST_SPLITVEC to two structures: one for using in picksplit
and second - for internal use.
* some code refactoring
* support of subsplit to rtree opclasses
TODO: add support of subsplit to contrib modules
- Fix wrong index results on text, char, varchar for multibyte strings
- Fix some SIGFPE signals
- Add support for infinite timestamps
- Because of locale settings, btree_gist can not be a prefix index anymore (for text).
Each node holds now just the lower and upper boundary.