Commit Graph

220 Commits

Author SHA1 Message Date
Teodor Sigaev
419fe7cd1b Fix bug http://archives.postgresql.org/pgsql-bugs/2006-10/msg00258.php.
Fix string's length calculation for recoding, fix strlower() to avoid wrong
assumption about length of recoded string (was: recoded string is no greater
that source, it may not true for multibyte encodings)
Thanks to Thomas H. <me@alternize.com> and Magnus Hagander <mha@sollentuna.net>
2006-11-20 14:03:30 +00:00
Neil Conway
9d6f26325f Fix two typos. 2006-11-08 19:06:15 +00:00
Teodor Sigaev
092ed294fc New README, forgotten when docs was updated 2006-11-08 16:00:29 +00:00
Teodor Sigaev
bf028fa8a6 Add description of new features 2006-10-31 16:23:05 +00:00
Tom Lane
e378f82e00 Make use of qsort_arg in several places that were formerly using klugy
static variables.  This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
2006-10-05 17:57:40 +00:00
Tom Lane
c48f2e3124 Improve error messages from to_tsquery per yesterday's discussion:
provide the bad input, and be sure to mention that we are talking about
a tsearch query.
2006-10-04 17:52:52 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian
26ffa627ac Update tsearch2 README.
Robert Treat
2006-10-02 22:32:10 +00:00
Tom Lane
41dcc65c0e Rename the uninstall scripts for contrib/lo and contrib/tsearch2 to
match the convention that foo's uninstall script is uninstall_foo.sql.
Also, stop installing lo_test.sql, which really ought to be made into
a regression test anyway (though it's unclear how to avoid a dependency
on the current OID counter...)
2006-09-11 15:14:46 +00:00
Tom Lane
684ad6a92f Rename contrib contains/contained-by operators to @> and <@, per discussion. 2006-09-10 17:36:52 +00:00
Bruce Momjian
0c4f2894f9 Use '' rather than \' for literal single quotes in strings in
/contrib/tsearch2.

Teodor Sigaev
2006-09-02 22:03:30 +00:00
Teodor Sigaev
72a3582522 Add description of tsvector type layout 2006-08-29 13:57:34 +00:00
Teodor Sigaev
3a214ab0f1 Remove pos comparison in silly_cmp_tsvector(): it is not a semantically significant 2006-08-29 13:39:20 +00:00
Teodor Sigaev
9711782628 Fix incorrect length of lexemes in silly_cmp_tsvector() 2006-08-29 13:31:54 +00:00
Teodor Sigaev
74dbba701f Fix regression tests: after changing comparing function
order is changed.
2006-08-25 07:39:08 +00:00
Teodor Sigaev
8f91e2b607 Fix compare bug for tsvector: problem was in aligment. Per Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> and Phil Frost <indigo@bitglue.com> 2006-08-24 17:37:34 +00:00
Tom Lane
a7143b3088 Fix some makefiles that fail to yield good results from 'make -qp'.
This doesn't really matter for ordinary building of Postgres, but it's
useful for automated checks, such as my just-committed pgcheckdefines.
2006-07-15 03:33:14 +00:00
Tom Lane
ae643747b1 Fix a passel of recently-committed violations of the rule 'thou shalt
have no other gods before c.h'.  Also remove some demonstrably redundant
#include lines, mostly of <errno.h> which was added to c.h years ago.
2006-07-14 05:28:29 +00:00
Bruce Momjian
66c15dfda1 Adjust /contrib for new include file contents. 2006-07-13 16:57:31 +00:00
Bruce Momjian
ac230e7431 Alphabetically order reference to include files, "S"-"Z". 2006-07-11 18:26:11 +00:00
Bruce Momjian
0ff3461bcc Alphabetically order reference to include files, "N" - "S". 2006-07-11 17:26:59 +00:00
Teodor Sigaev
234163649e GIN improvements
- Replace sorted array of entries in maintenance_work_mem to binary tree,
  this should improve create performance.
- More precisely calculate allocated memory, eliminate leaks
  with user-defined extractValue()
- Improve wordings in tsearch2
2006-07-11 16:55:34 +00:00
Bruce Momjian
fa601357fb Sort reference of include files, "A" - "F". 2006-07-11 16:35:33 +00:00
Bruce Momjian
c5133e5920 Allow /contrib include files to compile on their own. 2006-07-10 22:06:11 +00:00
Teodor Sigaev
1f7ef548ec Changes
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
  * possible call pickSplit() for second and below columns
  * add spl_(l|r)datum_exists to GIST_SPLITVEC -
    pickSplit should check its values to use already defined
    spl_(l|r)datum for splitting. pickSplit should set
    spl_(l|r)datum_exists to 'false' (if they was 'true') to
    signal to caller about using spl_(l|r)datum.
  * support for old pickSplit(): not very optimal
    but correct split
* remove 'bytes' field from GISTENTRY: in any case size of
  value is defined by it's type.
* split GIST_SPLITVEC to two structures: one for using in picksplit
  and second - for internal use.
* some code refactoring
* support of subsplit to rtree opclasses

TODO: add support of subsplit to contrib modules
2006-06-28 12:00:14 +00:00
Teodor Sigaev
04e9704b9e Now ispell dictionary can eat dictionaries in MySpell format,
used by OpenOffice. Dictionaries are placed at
http://lingucomponent.openoffice.org/spell_dic.html
Dictionary automatically recognizes format of files.

Warning. MySpell's format has limitation with compound
word support: it's impossible to mark affix as
compound-only affix. So for norwegian, german etc
languages it's recommended to use original ispell format.
For that reason I don't want to remove my2ispell
scripts, it's has workaround at least for norwegian language.
2006-06-09 13:25:59 +00:00
Teodor Sigaev
92bcb5abe0 Allow do not lexize words in substitution.
Docs will be submitted some later, now it's at
 http://www.sai.msu.su/~megera/oddmuse/index.cgi/Thesaurus_dictionary
2006-06-06 16:25:55 +00:00
Teodor Sigaev
a513ce2dff Fix wrong NOTICE/ERROR levels 2006-06-02 18:03:06 +00:00
Teodor Sigaev
efe1d427da Distinguish between stop-word recognized in thesaurus_lexize() 2006-06-02 17:55:40 +00:00
Teodor Sigaev
c7faf45160 Add more strict check of stop and non-recognized words,
allow only recognized words in thezaurus configuration file.
2006-06-02 15:35:42 +00:00
Teodor Sigaev
c269f0f1e2 fix comparison with SPI_processed 2006-05-31 14:53:41 +00:00
Teodor Sigaev
22505f4703 Add thesaurus dictionary which can replace N>0 lexemes by M>0 lexemes.
It required some changes in lexize algorithm, but interface with
dictionaries stays compatible with old dictionaries.

Funded by Georgia Public Library Service and LibLime, Inc.
2006-05-31 14:05:31 +00:00
Tom Lane
a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Bruce Momjian
19892feb3c Back out \' change for tsearch2, broke regression tests. 2006-05-19 04:39:47 +00:00
Bruce Momjian
cc84163fa9 Use SQL standard '' rather than \' in /contrib. Backpatch to 8.1.X. 2006-05-19 02:38:47 +00:00
Teodor Sigaev
8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Teodor Sigaev
e30df619cd Fix stupid mistake in rank_cd_def cleanup 2006-04-10 09:56:52 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Teodor Sigaev
38c4fe87ac Significantly improve ranking:
1) rank_cd now use weight of lexemes
2) rank_cd and rank can use any combination of normalization methods:
        no normalization
        normalization by log(length of document)
        -----/------- by length of document
        -----/------- by number of unique word in document
        -----/------- by log(number of unique word in document)
        -----/------- by number of covers (only rank_cd)

Improve cover's search.

TODO: changes in documentation
2006-03-02 19:07:19 +00:00
Neil Conway
8e5a10d46c This patch makes the error message strings throughout the backend
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
2006-03-01 06:30:32 +00:00
Peter Eisentraut
7f4f42fa10 Clean up CREATE FUNCTION syntax usage in contrib and elsewhere, in
particular get rid of single quotes around language names and old WITH ()
construct.
2006-02-27 16:09:50 +00:00
Teodor Sigaev
dde9457294 Fixing and improve compound word support. This changes cannot be applied to
previous version iwthout recreating tsvector fields...

Thanks to Alexander Presber <aljoscha@weisshuhn.de> to discover a problem.
2006-02-20 17:51:05 +00:00
Tom Lane
b35fdaaa1a Clean up some signedness warnings. 2006-02-10 15:57:58 +00:00
Teodor Sigaev
01f2172ec1 Allow "'" symbol in affixes ("'s" affix in english): it was diallowed during
multibyte support work.
Add line number to error output during affix file parsing.
2006-02-10 12:56:14 +00:00
Teodor Sigaev
011c520cb6 renew output of regression test accordingly to
http://archives.postgresql.org/pgsql-committers/2006-02/msg00089.php
2006-02-10 11:18:40 +00:00
Teodor Sigaev
46a25ce6a9 1 Fix bug with very short word: prefix and suffix might be overlapped,
sorry but fix can't be applyed to previous version: it's require
  refill tsvector...
2 Small optimize of load time for huge dictionaries
3 use palloc instead of malloc during load dict file
2006-02-09 18:04:20 +00:00
Teodor Sigaev
a6fefc866c Check number of affixes to prevent core dump with zero number of affixes 2006-02-06 15:45:34 +00:00
Teodor Sigaev
5e2707c45f Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.

I hope that finalize multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev
80324fb1e3 Fix typeing as Tom suggest 2006-01-23 14:24:06 +00:00
Tom Lane
33feb55c47 Replace bitwise looping with bytewise looping in hemdistsign and
sizebitvec of tsearch2, as well as identical code in several other
contrib modules.  This provided about a 20X speedup in building a
large tsearch2 index ... didn't try to measure its effects for other
operations.  Thanks to Stephan Vollmer for providing a test case.
2006-01-20 22:46:16 +00:00
Teodor Sigaev
7ac8a4be89 Multibyte encodings support for ISpell dictionary 2005-12-21 13:05:49 +00:00
Teodor Sigaev
cb4ea994c6 Improve support of multibyte encoding:
- tsvector_(in|out)
- tsquery_(in|out)
- to_tsvector
- to_tsquery, plainto_tsquery
- 'simple' dictionary
2005-12-12 11:10:12 +00:00
Teodor Sigaev
faacdab101 Improve tag recognizing 2005-12-08 09:11:19 +00:00
Teodor Sigaev
9551ab2fe9 Fix small memory leak 2005-12-07 13:30:15 +00:00
Teodor Sigaev
4f94b49a31 Improve word parser.
- allow ~ in filenames
 - -8.2.1 now is '-' and '8.2.1' instead of '-8.2' '.' '3'
 - '.text' now is not a file
2005-12-07 13:12:54 +00:00
Teodor Sigaev
e8c81e179e Improve word parser.
- improve file and path recognition
 - fix misspeling
 - improve tag recognition
2005-12-05 18:13:22 +00:00
Bruce Momjian
436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Teodor Sigaev
3c6cd8a113 Fixes motivated by snake and spoonbill pgbuildfarm members 2005-11-22 09:01:35 +00:00
Teodor Sigaev
62699337bc remove forgotten // comments 2005-11-21 18:00:52 +00:00
Teodor Sigaev
c52795d18a Text parser rewritten:
- supports multibyte encodings
        - more strict rules for lexemes
        - flex isn't used
Add:
        - tsquery plainto_tsquery(text)
          Function makes tsquery from plain text.
        - &&, ||, !! operation for tsquery for combining
          tsquery from it's parts:  'foo & bar' || 'asd' => 'foo & bar | asd'
2005-11-21 12:27:57 +00:00
Tom Lane
1d0d8d3c38 Mop-up for nulls-in-arrays patch: fix some places that access array
contents directly.
2005-11-18 02:38:24 +00:00
Tom Lane
cecb607559 Make SQL arrays support null elements. This commit fixes the core array
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe.  Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
2005-11-17 22:14:56 +00:00
Teodor Sigaev
bad1a5c217 Use postgres-wide macros BITS_PER_BYTE instead self-definenig macros, also use it for calculating bit length of TPQTGist 2005-11-14 14:44:06 +00:00
Teodor Sigaev
34b934f658 fix returning value 2005-11-14 09:59:13 +00:00
Teodor Sigaev
134bed8089 Fix rwrite(ARRAY) on 64-bit boxes:
Instead of getting elements of array manually call deconstruct_array
2005-11-09 09:26:04 +00:00
Teodor Sigaev
0645663e6c New features for tsearch2:
1 Comparison operation for tsquery
2 Btree index on tsquery
3 numnode(tsquery) - returns 'length' of tsquery
4 tsquery @ tsquery, tsquery ~ tsquery - contains, contained for tsquery.
  Note: They don't gurantee exact result, only MAY BE, so it
  useful only for speed up rewrite functions
5 GiST index support for @,~
6 rewrite():
        select rewrite(orig, what, to);
        select rewrite(ARRAY[orig, what, to]) from tsquery_table;
        select rewrite(orig, 'select what, to from tsquery_table;');
7 significantly improve cover algorithm
2005-11-08 17:08:46 +00:00
Tom Lane
2a8d3d83ef R-tree is dead ... long live GiST. 2005-11-07 17:36:47 +00:00
Teodor Sigaev
6812abb673 Fix incorrect header size macros 2005-11-03 18:16:31 +00:00
Teodor Sigaev
1dd6bd19fa Add sanity check of query 2005-10-31 13:47:09 +00:00
Teodor Sigaev
21b748e76a 1 Fix problem with lost precision in rank with OR-ed lexemes
2 Allow tsquery_in to input void tsquery: resolve dump/restore problem with tsquery
2005-10-28 13:05:06 +00:00
Tom Lane
c62b29a603 Fix several contrib makefiles that failed in VPATH builds, particularly
when not using gcc (which has slightly nonstandard inclusion rules).
2005-10-18 01:30:49 +00:00
Tom Lane
ad148c4154 Suppress warnings on platforms where fprintf is a macro (eg, recent
Fedora).  This was already done by somebody for the core flex files,
but these contrib files seem to have been missed.
2005-10-15 20:37:36 +00:00
Tom Lane
b562639561 Fix bogus error test in get_ti_Oid(). 2005-10-15 20:28:59 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
0b36cb83dc PGXS should be set with := not =, as specified in the documentation,
to avoid useless multiple executions of pg_config.
2005-09-27 17:13:14 +00:00
Tom Lane
0df7f493f8 Clean up possibly-uninitialized-variable warnings reported by gcc 4.x. 2005-09-24 23:07:18 +00:00
Tom Lane
8a65b820e2 Suppress signed-vs-unsigned-char warnings in contrib. 2005-09-24 19:14:05 +00:00
Teodor Sigaev
f82b853b47 1 Update Snowball sources
2 Makefile fixes
2005-09-15 11:14:18 +00:00
Tom Lane
ac652466ec Partial fixes for contrib build on AIX: include -lm where needed.
Per Rocco Altier.
2005-07-24 23:30:10 +00:00
Bruce Momjian
21634e513f Add extra argument for new pg_regexec API. 2005-07-10 18:31:59 +00:00
Bruce Momjian
bb3cce4ec9 Add E'' syntax so eventually normal strings can treat backslashes
literally.

Add GUC variables:

        "escape_string_warning" - warn about backslashes in non-E strings
        "escape_string_syntax" - supports E'' syntax?
        "standard_compliant_strings" - treats backslashes literally in ''

Update code to use E'' when escapes are used.
2005-06-26 03:04:37 +00:00
Tom Lane
368739dca8 Fix bogus assumption that sizeof() produces an int-sized result. 2005-06-20 00:32:22 +00:00
Teodor Sigaev
7148de1fa8 Prevent to divide by zero and range out of 0..1 2005-06-01 11:45:03 +00:00
Tom Lane
978129f28e Document get_call_result_type() and friends; mark TypeGetTupleDesc()
and RelationNameGetTupleDesc() as deprecated; remove uses of the
latter in the contrib library.  Along the way, clean up crosstab()
code and documentation a little.
2005-05-30 23:09:07 +00:00
Bruce Momjian
b492c3accc Add parentheses to macros when args are used in computations. Without
them, the executation behavior could be unexpected.
2005-05-25 21:40:43 +00:00
Neil Conway
36ab600511 Cleanup of GiST extensions in contrib/: now that we always invoke GiST
methods in a short-lived memory context, there is no need for GiST methods
to do their own manual (and error-prone) memory management.
2005-05-21 12:08:06 +00:00
Tom Lane
278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane
177af51c04 Change tsearch2 to not use the unsafe practice of creating functions
that return INTERNAL without also having INTERNAL arguments.  Since the
functions in question aren't meant to be called by hand anyway, I just
redeclared them to take 'internal' instead of 'text'.  Also add code
to ProcedureCreate() to enforce the restriction, as I should have done
to start with :-(
2005-05-03 16:51:00 +00:00
Teodor Sigaev
04ce41ca62 Add comment about permissions on pg_ts* tables 2005-04-19 13:58:48 +00:00
Teodor Sigaev
fb13881f42 1 fix various comparing functions
2 implement gtsvector_out for use with gevel module (debug GiST indexes, http://www.sai.msu.su/~megera/postgres/gist/gevel/)
2005-03-31 15:08:08 +00:00
Teodor Sigaev
31b6d840f6 Prevent rank change in case of duplicate search terms 2005-03-05 15:48:32 +00:00
Tom Lane
c0e0d3e2e9 Avoid unnecessary dependence on u_int16_t, per buildfarm failure.
(It doesn't compile on HPUX either...)
2005-01-26 18:49:39 +00:00
Teodor Sigaev
324300bc7c improve support of agglutinative languages (query with compound words).
regression=# select to_tsquery( '\'fotballklubber\'');
                   to_tsquery
------------------------------------------------
 'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'
(1 row)

So, changed interface to dictionaries, lexize method of dictionary shoud return
pointer to aray of TSLexeme structs instead of char**. Last element should
have TSLexeme->lexeme == NULL.

typedef struct {
        /* number of variant of split word , for example
                Word 'fotballklubber' (norwegian) has two varian to split:
                ( fotball, klubb ) and ( fot, ball, klubb ). So, dictionary
                should return:
                nvariant        lexeme
                1               fotball
                1               klubb
                2               fot
                2               ball
                2               klubb

        */
        uint16  nvariant;

        /* currently unused */
        uint16  flags;

        /* C-string */
        char    *lexeme;
} TSLexeme;
2005-01-25 15:24:38 +00:00
Teodor Sigaev
fe30edbab8 Change
typedef struct {} WordEntryPos;
to
typedef uint16 WordEntryPos
according to http://www.pgsql.ru/db/mw/msg.html?mid=2035188

Require re-fill all tsvector fields and reindex tsvector indexes.
2005-01-25 12:36:25 +00:00
Teodor Sigaev
5b354d2c7e Fixes:
1 Report error message instead of do nothing in case of error in regex
2 Malloced storage for mask, find and repl part of Affix. This parts may be
  large enough in real life (for example in czech, thanks to moje <moje@kalhotky.net>)
2005-01-11 16:07:55 +00:00
Neil Conway
1b3e769682 This patch makes some cleanups to contrib/ to silence some sparse
warnings:

- remove pointless "extern" keyword from some function definitions in
contrib/tsearch2

- use "NULL" not "0" as NULL pointer in contrib/tsearch,
contrib/tsearch2, contrib/pgbench, and contrib/vacuumlo
2004-11-09 06:09:40 +00:00
Tom Lane
0636d55843 Fix some more 'old-style parameter declaration' warnings. 2004-10-25 02:30:29 +00:00
Tom Lane
3fdd33ab99 Avoid macro-redefinition warnings on Windows, per Andrew Dunstan. 2004-10-21 19:49:27 +00:00
Tom Lane
380bd04c16 Standardize on using the Min, Max, and Abs macros that are in our c.h file,
getting rid of numerous ad-hoc versions that have popped up in various
places.  Shortens code and avoids conflict with Windows min() and max()
macros.
2004-10-21 19:28:36 +00:00
Tom Lane
d3e36da789 Make the standard stopword files be sought relative to share_dir, so
that a tsearch2 installation can be relocatable.
2004-10-17 23:09:31 +00:00