Commit Graph

192 Commits

Author SHA1 Message Date
Teodor Sigaev
efe1d427da Distinguish between stop-word recognized in thesaurus_lexize() 2006-06-02 17:55:40 +00:00
Teodor Sigaev
c7faf45160 Add more strict check of stop and non-recognized words,
allow only recognized words in thezaurus configuration file.
2006-06-02 15:35:42 +00:00
Teodor Sigaev
c269f0f1e2 fix comparison with SPI_processed 2006-05-31 14:53:41 +00:00
Teodor Sigaev
22505f4703 Add thesaurus dictionary which can replace N>0 lexemes by M>0 lexemes.
It required some changes in lexize algorithm, but interface with
dictionaries stays compatible with old dictionaries.

Funded by Georgia Public Library Service and LibLime, Inc.
2006-05-31 14:05:31 +00:00
Tom Lane
a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Bruce Momjian
19892feb3c Back out \' change for tsearch2, broke regression tests. 2006-05-19 04:39:47 +00:00
Bruce Momjian
cc84163fa9 Use SQL standard '' rather than \' in /contrib. Backpatch to 8.1.X. 2006-05-19 02:38:47 +00:00
Teodor Sigaev
8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Teodor Sigaev
e30df619cd Fix stupid mistake in rank_cd_def cleanup 2006-04-10 09:56:52 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Teodor Sigaev
38c4fe87ac Significantly improve ranking:
1) rank_cd now use weight of lexemes
2) rank_cd and rank can use any combination of normalization methods:
        no normalization
        normalization by log(length of document)
        -----/------- by length of document
        -----/------- by number of unique word in document
        -----/------- by log(number of unique word in document)
        -----/------- by number of covers (only rank_cd)

Improve cover's search.

TODO: changes in documentation
2006-03-02 19:07:19 +00:00
Neil Conway
8e5a10d46c This patch makes the error message strings throughout the backend
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
2006-03-01 06:30:32 +00:00
Peter Eisentraut
7f4f42fa10 Clean up CREATE FUNCTION syntax usage in contrib and elsewhere, in
particular get rid of single quotes around language names and old WITH ()
construct.
2006-02-27 16:09:50 +00:00
Teodor Sigaev
dde9457294 Fixing and improve compound word support. This changes cannot be applied to
previous version iwthout recreating tsvector fields...

Thanks to Alexander Presber <aljoscha@weisshuhn.de> to discover a problem.
2006-02-20 17:51:05 +00:00
Tom Lane
b35fdaaa1a Clean up some signedness warnings. 2006-02-10 15:57:58 +00:00
Teodor Sigaev
01f2172ec1 Allow "'" symbol in affixes ("'s" affix in english): it was diallowed during
multibyte support work.
Add line number to error output during affix file parsing.
2006-02-10 12:56:14 +00:00
Teodor Sigaev
011c520cb6 renew output of regression test accordingly to
http://archives.postgresql.org/pgsql-committers/2006-02/msg00089.php
2006-02-10 11:18:40 +00:00
Teodor Sigaev
46a25ce6a9 1 Fix bug with very short word: prefix and suffix might be overlapped,
sorry but fix can't be applyed to previous version: it's require
  refill tsvector...
2 Small optimize of load time for huge dictionaries
3 use palloc instead of malloc during load dict file
2006-02-09 18:04:20 +00:00
Teodor Sigaev
a6fefc866c Check number of affixes to prevent core dump with zero number of affixes 2006-02-06 15:45:34 +00:00
Teodor Sigaev
5e2707c45f Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.

I hope that finalize multibyte support work in tsearch2, but testing is needed...
2006-01-27 16:32:31 +00:00
Teodor Sigaev
80324fb1e3 Fix typeing as Tom suggest 2006-01-23 14:24:06 +00:00
Tom Lane
33feb55c47 Replace bitwise looping with bytewise looping in hemdistsign and
sizebitvec of tsearch2, as well as identical code in several other
contrib modules.  This provided about a 20X speedup in building a
large tsearch2 index ... didn't try to measure its effects for other
operations.  Thanks to Stephan Vollmer for providing a test case.
2006-01-20 22:46:16 +00:00
Teodor Sigaev
7ac8a4be89 Multibyte encodings support for ISpell dictionary 2005-12-21 13:05:49 +00:00
Teodor Sigaev
cb4ea994c6 Improve support of multibyte encoding:
- tsvector_(in|out)
- tsquery_(in|out)
- to_tsvector
- to_tsquery, plainto_tsquery
- 'simple' dictionary
2005-12-12 11:10:12 +00:00
Teodor Sigaev
faacdab101 Improve tag recognizing 2005-12-08 09:11:19 +00:00
Teodor Sigaev
9551ab2fe9 Fix small memory leak 2005-12-07 13:30:15 +00:00
Teodor Sigaev
4f94b49a31 Improve word parser.
- allow ~ in filenames
 - -8.2.1 now is '-' and '8.2.1' instead of '-8.2' '.' '3'
 - '.text' now is not a file
2005-12-07 13:12:54 +00:00
Teodor Sigaev
e8c81e179e Improve word parser.
- improve file and path recognition
 - fix misspeling
 - improve tag recognition
2005-12-05 18:13:22 +00:00
Bruce Momjian
436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Teodor Sigaev
3c6cd8a113 Fixes motivated by snake and spoonbill pgbuildfarm members 2005-11-22 09:01:35 +00:00
Teodor Sigaev
62699337bc remove forgotten // comments 2005-11-21 18:00:52 +00:00
Teodor Sigaev
c52795d18a Text parser rewritten:
- supports multibyte encodings
        - more strict rules for lexemes
        - flex isn't used
Add:
        - tsquery plainto_tsquery(text)
          Function makes tsquery from plain text.
        - &&, ||, !! operation for tsquery for combining
          tsquery from it's parts:  'foo & bar' || 'asd' => 'foo & bar | asd'
2005-11-21 12:27:57 +00:00
Tom Lane
1d0d8d3c38 Mop-up for nulls-in-arrays patch: fix some places that access array
contents directly.
2005-11-18 02:38:24 +00:00
Tom Lane
cecb607559 Make SQL arrays support null elements. This commit fixes the core array
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe.  Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
2005-11-17 22:14:56 +00:00
Teodor Sigaev
bad1a5c217 Use postgres-wide macros BITS_PER_BYTE instead self-definenig macros, also use it for calculating bit length of TPQTGist 2005-11-14 14:44:06 +00:00
Teodor Sigaev
34b934f658 fix returning value 2005-11-14 09:59:13 +00:00
Teodor Sigaev
134bed8089 Fix rwrite(ARRAY) on 64-bit boxes:
Instead of getting elements of array manually call deconstruct_array
2005-11-09 09:26:04 +00:00
Teodor Sigaev
0645663e6c New features for tsearch2:
1 Comparison operation for tsquery
2 Btree index on tsquery
3 numnode(tsquery) - returns 'length' of tsquery
4 tsquery @ tsquery, tsquery ~ tsquery - contains, contained for tsquery.
  Note: They don't gurantee exact result, only MAY BE, so it
  useful only for speed up rewrite functions
5 GiST index support for @,~
6 rewrite():
        select rewrite(orig, what, to);
        select rewrite(ARRAY[orig, what, to]) from tsquery_table;
        select rewrite(orig, 'select what, to from tsquery_table;');
7 significantly improve cover algorithm
2005-11-08 17:08:46 +00:00
Tom Lane
2a8d3d83ef R-tree is dead ... long live GiST. 2005-11-07 17:36:47 +00:00
Teodor Sigaev
6812abb673 Fix incorrect header size macros 2005-11-03 18:16:31 +00:00
Teodor Sigaev
1dd6bd19fa Add sanity check of query 2005-10-31 13:47:09 +00:00
Teodor Sigaev
21b748e76a 1 Fix problem with lost precision in rank with OR-ed lexemes
2 Allow tsquery_in to input void tsquery: resolve dump/restore problem with tsquery
2005-10-28 13:05:06 +00:00
Tom Lane
c62b29a603 Fix several contrib makefiles that failed in VPATH builds, particularly
when not using gcc (which has slightly nonstandard inclusion rules).
2005-10-18 01:30:49 +00:00
Tom Lane
ad148c4154 Suppress warnings on platforms where fprintf is a macro (eg, recent
Fedora).  This was already done by somebody for the core flex files,
but these contrib files seem to have been missed.
2005-10-15 20:37:36 +00:00
Tom Lane
b562639561 Fix bogus error test in get_ti_Oid(). 2005-10-15 20:28:59 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
0b36cb83dc PGXS should be set with := not =, as specified in the documentation,
to avoid useless multiple executions of pg_config.
2005-09-27 17:13:14 +00:00
Tom Lane
0df7f493f8 Clean up possibly-uninitialized-variable warnings reported by gcc 4.x. 2005-09-24 23:07:18 +00:00
Tom Lane
8a65b820e2 Suppress signed-vs-unsigned-char warnings in contrib. 2005-09-24 19:14:05 +00:00
Teodor Sigaev
f82b853b47 1 Update Snowball sources
2 Makefile fixes
2005-09-15 11:14:18 +00:00
Tom Lane
ac652466ec Partial fixes for contrib build on AIX: include -lm where needed.
Per Rocco Altier.
2005-07-24 23:30:10 +00:00
Bruce Momjian
21634e513f Add extra argument for new pg_regexec API. 2005-07-10 18:31:59 +00:00
Bruce Momjian
bb3cce4ec9 Add E'' syntax so eventually normal strings can treat backslashes
literally.

Add GUC variables:

        "escape_string_warning" - warn about backslashes in non-E strings
        "escape_string_syntax" - supports E'' syntax?
        "standard_compliant_strings" - treats backslashes literally in ''

Update code to use E'' when escapes are used.
2005-06-26 03:04:37 +00:00
Tom Lane
368739dca8 Fix bogus assumption that sizeof() produces an int-sized result. 2005-06-20 00:32:22 +00:00
Teodor Sigaev
7148de1fa8 Prevent to divide by zero and range out of 0..1 2005-06-01 11:45:03 +00:00
Tom Lane
978129f28e Document get_call_result_type() and friends; mark TypeGetTupleDesc()
and RelationNameGetTupleDesc() as deprecated; remove uses of the
latter in the contrib library.  Along the way, clean up crosstab()
code and documentation a little.
2005-05-30 23:09:07 +00:00
Bruce Momjian
b492c3accc Add parentheses to macros when args are used in computations. Without
them, the executation behavior could be unexpected.
2005-05-25 21:40:43 +00:00
Neil Conway
36ab600511 Cleanup of GiST extensions in contrib/: now that we always invoke GiST
methods in a short-lived memory context, there is no need for GiST methods
to do their own manual (and error-prone) memory management.
2005-05-21 12:08:06 +00:00
Tom Lane
278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane
177af51c04 Change tsearch2 to not use the unsafe practice of creating functions
that return INTERNAL without also having INTERNAL arguments.  Since the
functions in question aren't meant to be called by hand anyway, I just
redeclared them to take 'internal' instead of 'text'.  Also add code
to ProcedureCreate() to enforce the restriction, as I should have done
to start with :-(
2005-05-03 16:51:00 +00:00
Teodor Sigaev
04ce41ca62 Add comment about permissions on pg_ts* tables 2005-04-19 13:58:48 +00:00
Teodor Sigaev
fb13881f42 1 fix various comparing functions
2 implement gtsvector_out for use with gevel module (debug GiST indexes, http://www.sai.msu.su/~megera/postgres/gist/gevel/)
2005-03-31 15:08:08 +00:00
Teodor Sigaev
31b6d840f6 Prevent rank change in case of duplicate search terms 2005-03-05 15:48:32 +00:00
Tom Lane
c0e0d3e2e9 Avoid unnecessary dependence on u_int16_t, per buildfarm failure.
(It doesn't compile on HPUX either...)
2005-01-26 18:49:39 +00:00
Teodor Sigaev
324300bc7c improve support of agglutinative languages (query with compound words).
regression=# select to_tsquery( '\'fotballklubber\'');
                   to_tsquery
------------------------------------------------
 'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'
(1 row)

So, changed interface to dictionaries, lexize method of dictionary shoud return
pointer to aray of TSLexeme structs instead of char**. Last element should
have TSLexeme->lexeme == NULL.

typedef struct {
        /* number of variant of split word , for example
                Word 'fotballklubber' (norwegian) has two varian to split:
                ( fotball, klubb ) and ( fot, ball, klubb ). So, dictionary
                should return:
                nvariant        lexeme
                1               fotball
                1               klubb
                2               fot
                2               ball
                2               klubb

        */
        uint16  nvariant;

        /* currently unused */
        uint16  flags;

        /* C-string */
        char    *lexeme;
} TSLexeme;
2005-01-25 15:24:38 +00:00
Teodor Sigaev
fe30edbab8 Change
typedef struct {} WordEntryPos;
to
typedef uint16 WordEntryPos
according to http://www.pgsql.ru/db/mw/msg.html?mid=2035188

Require re-fill all tsvector fields and reindex tsvector indexes.
2005-01-25 12:36:25 +00:00
Teodor Sigaev
5b354d2c7e Fixes:
1 Report error message instead of do nothing in case of error in regex
2 Malloced storage for mask, find and repl part of Affix. This parts may be
  large enough in real life (for example in czech, thanks to moje <moje@kalhotky.net>)
2005-01-11 16:07:55 +00:00
Neil Conway
1b3e769682 This patch makes some cleanups to contrib/ to silence some sparse
warnings:

- remove pointless "extern" keyword from some function definitions in
contrib/tsearch2

- use "NULL" not "0" as NULL pointer in contrib/tsearch,
contrib/tsearch2, contrib/pgbench, and contrib/vacuumlo
2004-11-09 06:09:40 +00:00
Tom Lane
0636d55843 Fix some more 'old-style parameter declaration' warnings. 2004-10-25 02:30:29 +00:00
Tom Lane
3fdd33ab99 Avoid macro-redefinition warnings on Windows, per Andrew Dunstan. 2004-10-21 19:49:27 +00:00
Tom Lane
380bd04c16 Standardize on using the Min, Max, and Abs macros that are in our c.h file,
getting rid of numerous ad-hoc versions that have popped up in various
places.  Shortens code and avoids conflict with Windows min() and max()
macros.
2004-10-21 19:28:36 +00:00
Tom Lane
d3e36da789 Make the standard stopword files be sought relative to share_dir, so
that a tsearch2 installation can be relocatable.
2004-10-17 23:09:31 +00:00
Tom Lane
b04e70b11e Adjust tsearch2.sql to avoid use of COPY FROM STDIN, so as to
simplify life for the Win32 installer.  Per Dave Page.
2004-09-14 03:58:54 +00:00
Tom Lane
b2c4071299 Redesign query-snapshot timing so that volatile functions in READ COMMITTED
mode see a fresh snapshot for each command in the function, rather than
using the latest interactive command's snapshot.  Also, suppress fresh
snapshots as well as CommandCounterIncrement inside STABLE and IMMUTABLE
functions, instead using the snapshot taken for the most closely nested
regular query.  (This behavior is only sane for read-only functions, so
the patch also enforces that such functions contain only SELECT commands.)
As per my proposal of 6-Sep-2004; I note that I floated essentially the
same proposal on 19-Jun-2002, but that discussion tailed off without any
action.  Since 8.0 seems like the right place to be taking possibly
nontrivial backwards compatibility hits, let's get it done now.
2004-09-13 20:10:13 +00:00
Bruce Momjian
15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian
ee85595d46 > Please find enclose a submission to fix these problems.
>
> The patch adds missing the "libpgport.a" file to the installation under
> "install-all-headers". It is needed by some contribs. I install the
> library in "pkglibdir", but I was wondering whether it should be "libdir"?
> I was wondering also whether it would make sense to have a "libpgport.so"?
>
> It fixes various macros which are used by contrib makefiles, especially
> libpq_*dir and LDFLAGS when used under PGXS. It seems to me that they are
> needed to
>
> It adds the ability to test and use PGXS with contribs, with "make
> USE_PGXS=1". Without the macro, this is exactly as before, there should be
> no difference, esp. wrt the vpath feature that seemed broken by previous
> submission. So it should not harm anybody, and it is useful at least to me.
>
> It fixes some inconsistencies in various contrib makefiles
> (useless override, ":=" instead of "=").

Fabien COELHO
2004-08-20 20:13:10 +00:00
Tom Lane
fcbc438727 Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments
and documentation to reference 8.0 instead of 7.5.
2004-08-04 21:34:35 +00:00
Tom Lane
959b353db2 Fix misspellings: langauge -> language. 2004-07-04 23:34:24 +00:00
Teodor Sigaev
bb89237531 1 Eliminate duplicate field HLWORD->skip
2 Rework support for html tags in parser
3 add HighlightAll to headline function for generating highlighted
  whole text with saved html tags
2004-06-28 16:19:09 +00:00
Teodor Sigaev
df9d87f608 Previous commit wasnt full... 2004-06-23 11:29:58 +00:00
Teodor Sigaev
de55c0cef6 1 Fix affixes with void replacement (AFAIK, it's only russian)
2 Optimize regex execution
2004-06-23 11:06:11 +00:00
Teodor Sigaev
09bc52fe73 Fix stupid bug in installcheck 2004-06-23 09:43:43 +00:00
Teodor Sigaev
f35e8d8431 Fix silly bug 2004-06-01 10:24:25 +00:00
Teodor Sigaev
553bc41633 1 add namespaces as Tom suggest http://www.pgsql.ru/db/mw/msg.html?mid=1987703
2 remove select qeury in inserts
2004-05-31 16:51:56 +00:00
Teodor Sigaev
7cb55d21ed Fix memory leak with pg_regexec 2004-05-31 13:55:19 +00:00
Teodor Sigaev
d222bb4d5e Fix memory leak with pg_regcomp 2004-05-31 13:52:57 +00:00
Teodor Sigaev
11864ab657 Win32 related patch by Darko Prenosil. Small correct by teodor 2004-05-31 13:29:43 +00:00
Neil Conway
72b6ad6313 Use the new List API function names throughout the backend, and disable the
list compatibility API by default. While doing this, I decided to keep
the llast() macro around and introduce llast_int() and llast_oid() variants.
2004-05-30 23:40:41 +00:00
Teodor Sigaev
a6ea6457fa Stat function now can show statistics per weight of lexemes 2004-05-28 15:36:49 +00:00
Teodor Sigaev
89a8e15671 Update docs from Andrew J. Kopciuch <akopciuch@bddf.ca> 2004-05-14 10:39:25 +00:00
Tom Lane
a90b2a035f Suppress 'uninitialized variable' warning emitted by some (not all)
versions of gcc.  The code is correct AFAICS, but it requires slightly
more analysis than usual to see that the variable can't be used uninitialized.
2004-05-07 13:09:12 +00:00
Teodor Sigaev
d1eb9fede5 Use regprocedure type instead of oid. Usefull for human read and dump/restore 2004-05-07 11:19:06 +00:00
Tom Lane
0bd61548ab Solve the 'Turkish problem' with undesirable locale behavior for case
conversion of basic ASCII letters.  Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower.  These functions use the same notions of
case folding already developed for identifier case conversion.  I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent.  Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.
2004-05-07 00:24:59 +00:00
Tom Lane
89ee5b89a6 Fix some more compatibility issues (ctype.h macros must never be passed
signed chars...)
2004-04-02 00:41:18 +00:00
Tom Lane
47fe0517fc Fix some portability issues (reliance on gcc-isms). 2004-04-01 23:44:38 +00:00
Tom Lane
375369acd1 Replace TupleTableSlot convention for whole-row variables and function
results with tuples as ordinary varlena Datums.  This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables.  However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well.  Per my proposal of a few days ago.
2004-04-01 21:28:47 +00:00
Teodor Sigaev
f2c064afcb Cleanup vectors of GISTENTRY and eliminate problem with 64-bit strict-aligned
boxes. Change interface to user-defined GiST support methods union and
picksplit. Now instead of bytea struct it used special GistEntryVector
structure.
2004-03-30 15:45:33 +00:00
Teodor Sigaev
eebdfcdbe6 1 Minimize memory allocation for void (but not null) value.
2 Add silly ordering for ts_vector to aim grouping, union, except etc. Don't use BTree opclass (tsvector_ops).
2004-03-25 16:56:10 +00:00
Tom Lane
fa96a5e15b Add %option nodefault to all our flex lexers. Fix a couple of rule gaps
exposed thereby.  AFAICT these would not lead to any worse problems than
junk emitted on the backend's stdout, but we should have the option to
catch possible worse errors in future.
2004-02-24 22:06:32 +00:00