19 Commits

Author SHA1 Message Date
Teodor Sigaev
22505f4703 Add thesaurus dictionary, which can replace N>0 lexemes with M>0 lexemes.
This required some changes to the lexize algorithm, but the dictionary
interface stays compatible with old dictionaries.

Funded by Georgia Public Library Service and LibLime, Inc.
2006-05-31 14:05:31 +00:00
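
The N-to-M replacement described in commit 22505f4703 can be pictured with a small sketch. This is not the tsearch2 implementation: the function name `apply_thesaurus`, the rule format (a dict from lexeme tuples to replacement lists), and the greedy longest-match policy are all assumptions made for illustration.

```python
def apply_thesaurus(lexemes, rules):
    """Replace runs of N>0 lexemes with M>0 lexemes.

    `rules` maps a tuple of lexemes to a list of replacement lexemes.
    Hypothetical format; longest match wins (an assumption).
    """
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(lexemes):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(lexemes) - i), 0, -1):
            key = tuple(lexemes[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the lexeme as-is.
            out.append(lexemes[i])
            i += 1
    return out


rules = {("supernovae", "stars"): ["sn"]}
print(apply_thesaurus(["bright", "supernovae", "stars", "shine"], rules))
```

Because a rule consumes several input lexemes at once, a dictionary of this kind has to see more than one lexeme before it can answer, which is presumably why the lexize algorithm needed changes.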
Teodor Sigaev
8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Teodor Sigaev
38c4fe87ac Significantly improve ranking:
1) rank_cd now uses the weights of lexemes
2) rank_cd and rank can use any combination of normalization methods:
        no normalization
        normalization by log(length of document)
        normalization by length of document
        normalization by number of unique words in document
        normalization by log(number of unique words in document)
        normalization by number of covers (only rank_cd)

Improve cover search.

TODO: changes in documentation
2006-03-02 19:07:19 +00:00
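
"Any combination of normalization methods" in commit 38c4fe87ac suggests a set of independent switches. A minimal sketch of that idea follows, assuming a bitmask; the flag names and numeric values here are hypothetical and are not taken from the tsearch2 source, and the `1 + log(...)` form is chosen only to avoid dividing by zero.

```python
import math
from enum import IntFlag


class Norm(IntFlag):
    # Hypothetical flag values, for illustration only.
    NONE = 0
    LOG_LENGTH = 1   # divide by 1 + log(document length)
    LENGTH = 2       # divide by document length
    UNIQ = 4         # divide by number of unique words
    LOG_UNIQ = 8     # divide by 1 + log(number of unique words)
    COVERS = 16      # divide by number of covers (rank_cd only)


def normalize(rank, flags, length, unique_words, covers=1):
    """Apply every normalization selected in `flags` to a raw rank."""
    if flags & Norm.LOG_LENGTH:
        rank /= 1.0 + math.log(max(length, 1))
    if flags & Norm.LENGTH:
        rank /= max(length, 1)
    if flags & Norm.UNIQ:
        rank /= max(unique_words, 1)
    if flags & Norm.LOG_UNIQ:
        rank /= 1.0 + math.log(max(unique_words, 1))
    if flags & Norm.COVERS:
        rank /= max(covers, 1)
    return rank


# Combine two methods by OR-ing their flags.
print(normalize(1.0, Norm.LENGTH | Norm.UNIQ, length=4, unique_words=2))
```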
Peter Eisentraut
7f4f42fa10 Clean up CREATE FUNCTION syntax usage in contrib and elsewhere, in
particular get rid of single quotes around language names and old WITH ()
construct.
2006-02-27 16:09:50 +00:00
Teodor Sigaev
5e2707c45f Snowball multibyte. It's a pity, but the Snowball sources are very different for multibyte and
single-byte encodings, so we need a Snowball build for every encoding.

I hope this finalizes the multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev
c52795d18a Text parser rewritten:
        - supports multibyte encodings
        - stricter rules for lexemes
        - flex isn't used
Add:
        - tsquery plainto_tsquery(text)
          Function makes a tsquery from plain text.
        - &&, ||, !! operations on tsquery for combining
          a tsquery from its parts:  'foo & bar' || 'asd' => 'foo & bar | asd'
2005-11-21 12:27:57 +00:00
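
The combining operators from commit c52795d18a can be sketched as tree construction plus precedence-aware printing. This is an illustration only: the `Q` class, `word`, and `render` are hypothetical names, and Python's `&`, `|`, and `~` stand in for the SQL operators `&&`, `||`, and `!!`.

```python
from dataclasses import dataclass

# Binding strength: NOT binds tighter than AND, AND tighter than OR.
PREC = {"|": 1, "&": 2, "!": 3}


@dataclass(frozen=True)
class Q:
    op: str            # "val" for a lexeme, else "&", "|" or "!"
    args: tuple = ()
    text: str = ""

    def __and__(self, other):   # stands in for tsquery && tsquery
        return Q("&", (self, other))

    def __or__(self, other):    # stands in for tsquery || tsquery
        return Q("|", (self, other))

    def __invert__(self):       # stands in for !! tsquery
        return Q("!", (self,))


def word(text):
    return Q("val", text=text)


def render(q, parent=0):
    """Print a query, adding parentheses only where precedence needs them."""
    if q.op == "val":
        return q.text
    prec = PREC[q.op]
    if q.op == "!":
        body = "!" + render(q.args[0], prec)
    else:
        body = f" {q.op} ".join(render(a, prec) for a in q.args)
    return f"( {body} )" if prec < parent else body


left = word("foo") & word("bar")
print(render(left | word("asd")))   # mirrors 'foo & bar' || 'asd'
```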
Teodor Sigaev
134bed8089 Fix rewrite(ARRAY) on 64-bit boxes:
Instead of extracting the array elements manually, call deconstruct_array
2005-11-09 09:26:04 +00:00
Teodor Sigaev
0645663e6c New features for tsearch2:
1 Comparison operation for tsquery
2 Btree index on tsquery
3 numnode(tsquery) - returns 'length' of tsquery
4 tsquery @ tsquery, tsquery ~ tsquery - contains, contained for tsquery.
  Note: They don't guarantee an exact result, only MAY BE, so they are
  useful only for speeding up the rewrite functions
5 GiST index support for @,~
6 rewrite():
        select rewrite(orig, what, to);
        select rewrite(ARRAY[orig, what, to]) from tsquery_table;
        select rewrite(orig, 'select what, to from tsquery_table;');
7 significantly improve cover algorithm
2005-11-08 17:08:46 +00:00
Tom Lane
177af51c04 Change tsearch2 to not use the unsafe practice of creating functions
that return INTERNAL without also having INTERNAL arguments.  Since the
functions in question aren't meant to be called by hand anyway, I just
redeclared them to take 'internal' instead of 'text'.  Also add code
to ProcedureCreate() to enforce the restriction, as I should have done
to start with :-(
2005-05-03 16:51:00 +00:00
Tom Lane
d3e36da789 Make the standard stopword files be sought relative to share_dir, so
that a tsearch2 installation can be relocatable.
2004-10-17 23:09:31 +00:00
Tom Lane
b04e70b11e Adjust tsearch2.sql to avoid use of COPY FROM STDIN, so as to
simplify life for the Win32 installer.  Per Dave Page.
2004-09-14 03:58:54 +00:00
Teodor Sigaev
de55c0cef6 1 Fix affixes with void replacement (AFAIK, it's only Russian)
2 Optimize regex execution
2004-06-23 11:06:11 +00:00
Teodor Sigaev
553bc41633 1 add namespaces as Tom suggested http://www.pgsql.ru/db/mw/msg.html?mid=1987703
2 remove select query in inserts
2004-05-31 16:51:56 +00:00
Teodor Sigaev
a6ea6457fa Stat function now can show statistics per weight of lexemes
2004-05-28 15:36:49 +00:00
Teodor Sigaev
d1eb9fede5 Use regprocedure type instead of oid. Useful for human readability and dump/restore
2004-05-07 11:19:06 +00:00
Teodor Sigaev
f2c064afcb Clean up vectors of GISTENTRY and eliminate a problem with 64-bit strict-aligned
boxes. Change the interface to the user-defined GiST support methods union and
picksplit. Instead of a bytea struct, they now use a special GistEntryVector
structure.
2004-03-30 15:45:33 +00:00
Teodor Sigaev
eebdfcdbe6 1 Minimize memory allocation for void (but not null) value.
2 Add silly ordering for ts_vector to aid grouping, union, except etc. Don't use BTree opclass (tsvector_ops).
2004-03-25 16:56:10 +00:00
Bruce Momjian
1d567aee07 The following bug has been logged online:

Bug reference:      1081
Logged by:          Aarjav Trivedi
Email address:      aarjav@cc.gatech.edu
PostgreSQL version: 7.4
Operating system:   Linux
Description:        Spelling error in tsearch2.sql leading to problems with tsearch

Details:

On line 620 of tsearch2.sql which is required to install and run TSEARCH,

REATE FUNCTION tsstat_in(cstring)

should be

CREATE FUNCTION tsstat_in(cstring)

because of this error, TSEARCH fails to work as specified,
2004-02-20 20:42:29 +00:00
Peter Eisentraut
3d0d78ce2f Bring the makefiles up to our conventions.
2003-08-23 04:25:29 +00:00