compatibility package. This supports importing dumps from past versions
using tsearch2, and provides the old names and API for most functions
that were changed. (rewrite(ARRAY[...]) is a glaring omission, though.)
Pavel Stehule and Tom Lane
This required some changes to the lexize algorithm, but the interface
with dictionaries stays compatible with old dictionaries.
Funded by Georgia Public Library Service and LibLime, Inc.
1) rank_cd now uses the weights of lexemes
2) rank_cd and rank can use any combination of normalization methods:
no normalization
normalization by log(length of document)
normalization by length of document
normalization by number of unique words in document
normalization by log(number of unique words in document)
normalization by number of covers (rank_cd only)
Improve the cover search.
TODO: changes in documentation
single-byte encodings, so we should have a Snowball stemmer for every encoding.
I hope this finalizes the multibyte support work in tsearch2, but testing is needed...
- supports multibyte encodings
- more strict rules for lexemes
- flex isn't used
Add:
- tsquery plainto_tsquery(text)
Makes a tsquery from plain text.
- &&, ||, !! operations on tsquery, for building a
tsquery from its parts: 'foo & bar' || 'asd' => 'foo & bar | asd'
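
The idea behind plainto_tsquery(text) can be sketched roughly as below.
This is a simplified illustration, not the tsearch2 code: the helper name
plain_to_query_sketch is hypothetical, words are taken to be ASCII
alphanumeric runs, and no dictionaries, stop words or multibyte handling
are involved. It only shows plain text being turned into lexemes joined
with '&'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not part of tsearch2): lowercases each
 * alphanumeric run in text and joins the runs with " & ", writing a
 * NUL-terminated result into out (capacity outsz).
 * Returns 0 on success, -1 if out is too small.
 */
int
plain_to_query_sketch(const char *text, char *out, size_t outsz)
{
    size_t      n = 0;
    int         in_word = 0;    /* currently inside a word? */
    int         have_word = 0;  /* has at least one word been emitted? */

    assert(text != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (const char *p = text; *p != '\0'; p++)
    {
        unsigned char c = (unsigned char) *p;

        if (isalnum(c))
        {
            /* starting a second or later word: emit the AND operator */
            if (!in_word && have_word)
            {
                if (n + 3 >= outsz)
                    return -1;
                memcpy(out + n, " & ", 3);
                n += 3;
            }
            if (n + 1 >= outsz)
                return -1;
            out[n++] = (char) tolower(c);
            in_word = 1;
            have_word = 1;
        }
        else
            in_word = 0;        /* any other character ends the word */
    }
    out[n] = '\0';
    return 0;
}
```

With this sketch, the input "Fat  cats, rats" yields "fat & cats & rats".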
1 Comparison operation for tsquery
2 Btree index on tsquery
3 numnode(tsquery) - returns 'length' of tsquery
4 tsquery @ tsquery, tsquery ~ tsquery - contains, contained for tsquery.
Note: they don't guarantee an exact result, only a possible match (MAY BE),
so they are useful only for speeding up rewrite functions
5 GiST index support for @,~
6 rewrite():
select rewrite(orig, what, to);
select rewrite(ARRAY[orig, what, to]) from tsquery_table;
select rewrite(orig, 'select what, to from tsquery_table;');
7 significantly improve cover algorithm
literally.
Add GUC variables:
"escape_string_warning" - warn about backslashes in non-E strings
"escape_string_syntax" - supports E'' syntax?
"standard_compliant_strings" - treats backslashes literally in ''
Update code to use E'' when escapes are used.
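
A rough sketch of what "use E'' when escapes are used" means for code
that builds string literals. The function name quote_literal_sketch is
hypothetical and this is not the code touched by the change; it assumes
backslashes are being treated as escape characters, so they are doubled
and the literal gets an E prefix only when a backslash is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not PostgreSQL source): writes src as
 * a quoted SQL string literal into out (capacity outsz). Single quotes
 * and backslashes are doubled; the E prefix is added only if src
 * contains a backslash. Returns 0 on success, -1 if out is too small.
 */
int
quote_literal_sketch(const char *src, char *out, size_t outsz)
{
    size_t      n = 0;
    int         has_backslash;

    assert(src != NULL && out != NULL);

    /* worst case: every char doubled, plus E, two quotes and the NUL */
    if (outsz < 2 * strlen(src) + 4)
        return -1;

    has_backslash = (strchr(src, '\\') != NULL);
    if (has_backslash)
        out[n++] = 'E';
    out[n++] = '\'';
    for (const char *p = src; *p != '\0'; p++)
    {
        if (*p == '\'' || *p == '\\')
            out[n++] = *p;      /* double the special character */
        out[n++] = *p;
    }
    out[n++] = '\'';
    out[n] = '\0';
    return 0;
}
```

A literal without backslashes, such as it's, comes out as 'it''s' with no
prefix, so strings that use no escapes are left in plain '' form.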
The 'word' variable there is initialised from
the prs->words array, but immediately after,
that array may be reallocated, thus leaving
word pointing to unallocated memory.
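
The hazard described above can be illustrated with a minimal sketch. The
Word and Parser types and the add_word and len_after_growth functions are
hypothetical stand-ins, not the actual tsearch2 structures. The point is
that a pointer into a growable array must be taken (or re-taken) after any
call that may realloc the array; keeping an index instead is one safe
pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the parser state (assumption). */
typedef struct
{
    int         len;
} Word;

typedef struct
{
    Word       *words;          /* growable array */
    size_t      curwords;       /* entries in use */
    size_t      lenwords;       /* entries allocated */
} Parser;

/*
 * Appends a word, growing the array when it is full. realloc may move
 * the array, which invalidates every pointer previously taken into it.
 * Returns the index of the new word, or (size_t) -1 on out of memory.
 */
size_t
add_word(Parser *prs, int len)
{
    assert(prs != NULL);
    if (prs->curwords == prs->lenwords)
    {
        size_t      newlen = prs->lenwords ? prs->lenwords * 2 : 4;
        Word       *tmp = realloc(prs->words, newlen * sizeof(Word));

        if (tmp == NULL)
            return (size_t) -1;
        prs->words = tmp;
        prs->lenwords = newlen;
    }
    prs->words[prs->curwords].len = len;
    return prs->curwords++;
}

/*
 * Safe pattern: remember the index idx, let the array grow, and only
 * then derive the pointer. The buggy pattern would take
 * "Word *word = &prs->words[idx];" before the loop and use it afterwards.
 * Returns the word's len, or -1 on out of memory.
 */
int
len_after_growth(Parser *prs, size_t idx, size_t extra)
{
    Word       *word;

    for (size_t i = 0; i < extra; i++)
        if (add_word(prs, 0) == (size_t) -1)
            return -1;

    word = &prs->words[idx];    /* taken after all reallocations */
    return word->len;
}
```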