Commit Graph

13 Commits

Author SHA1 Message Date
Teodor Sigaev
419fe7cd1b Fix bug http://archives.postgresql.org/pgsql-bugs/2006-10/msg00258.php.
Fix string's length calculation for recoding, fix strlower() to avoid wrong
assumption about length of recoded string (was: recoded string is no greater
that source, it may not true for multibyte encodings)
Thanks to Thomas H. <me@alternize.com> and Magnus Hagander <mha@sollentuna.net>
2006-11-20 14:03:30 +00:00
Bruce Momjian
fa601357fb Sort reference of include files, "A" - "F". 2006-07-11 16:35:33 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Teodor Sigaev
5e2707c45f Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.

I hope that finalize multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev
cb4ea994c6 Improve support of multibyte encoding:
- tsvector_(in|out)
- tsquery_(in|out)
- to_tsvector
- to_tsquery, plainto_tsquery
- 'simple' dictionary
2005-12-12 11:10:12 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
8a65b820e2 Suppress signed-vs-unsigned-char warnings in contrib. 2005-09-24 19:14:05 +00:00
Teodor Sigaev
f82b853b47 1 Update Snowball sources
2 Makefile fixes
2005-09-15 11:14:18 +00:00
Tom Lane
278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Teodor Sigaev
324300bc7c improve support of agglutinative languages (query with compound words).
regression=# select to_tsquery( '\'fotballklubber\'');
                   to_tsquery
------------------------------------------------
 'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'
(1 row)

So, changed interface to dictionaries, lexize method of dictionary shoud return
pointer to aray of TSLexeme structs instead of char**. Last element should
have TSLexeme->lexeme == NULL.

typedef struct {
        /* number of variant of split word , for example
                Word 'fotballklubber' (norwegian) has two varian to split:
                ( fotball, klubb ) and ( fot, ball, klubb ). So, dictionary
                should return:
                nvariant        lexeme
                1               fotball
                1               klubb
                2               fot
                2               ball
                2               klubb

        */
        uint16  nvariant;

        /* currently unused */
        uint16  flags;

        /* C-string */
        char    *lexeme;
} TSLexeme;
2005-01-25 15:24:38 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
8fd5b3ed67 Error message editing in contrib (mostly by Joe Conway --- thanks Joe!) 2003-07-24 17:52:50 +00:00
Teodor Sigaev
b88605337e tsearch2 module 2003-07-21 10:27:44 +00:00