postgresql/contrib/tsearch2/gendict/README.gendict

2003-07-21 18:27:44 +08:00
Gendict - generate dictionary templates for contrib/tsearch2 module.
This utility helps people create dictionaries for the contrib/tsearch2
module. In particular, it has built-in support for Snowball stemmers.
The programming API for tsearch2 dictionaries is described in the tsearch2
documentation.
Prerequisites:
* PostgreSQL 7.3 or later
* tsearch2 module sources, already compiled
* Rights to install contrib modules
Usage:
Run config.sh without parameters to see its options and arguments.
Usage:
./config.sh -n DICTNAME ( [ -s [ -p PREFIX ] ] | [ -c CFILES ] [ -h HFILES ] [ -i ] ) [ -v ] [ -d DIR ] [ -C COMMENT ]
-v - be verbose
-d DIR - name of directory in PGSQL_SRC/contrib (default dict_DICTNAME)
-C COMMENT - dictionary comment
Generate Snowball stemmer:
./config.sh -n DICTNAME -s [ -p PREFIX ] [ -v ] [ -d DIR ] [ -C COMMENT ]
-s - generate Snowball wrapper
-p PREFIX - prefix of Snowball's functions (default DICTNAME)
Generate template dictionary:
./config.sh -n DICTNAME [ -c CFILES ] [ -h HFILES ] [ -i ] [ -v ] [ -d DIR ] [ -C COMMENT ]
-c CFILES - source files, must be placed in contrib/tsearch2/gendict directory.
These files will be used in Makefile.
-h HFILES - header files, must be placed in contrib/tsearch2/gendict directory.
These files will be used in Makefile and subinclude.h
-i - dictionary has init method
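The requirement that -c and -h files live in the gendict directory can be
checked before running config.sh. The following is a minimal sketch and not
part of gendict: check_sources is a hypothetical helper name, and it assumes
the file names are passed as separate arguments.

```shell
# Hypothetical helper (not shipped with gendict): verify that every file
# named for -c / -h exists in the given directory.
# Usage: check_sources DIR FILE...
check_sources() {
    dir=$1; shift
    missing=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 if all files were found, 1 otherwise.
    return "$missing"
}
```

From the gendict directory it could be used as
check_sources . a.c a.h && ./config.sh -n wow -v -i -c a.c -h a.h -C WOW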
Example 1:
Create Portuguese stemmer
0. cd PGSQL_SRC/contrib/tsearch2/gendict
1. Obtain stem.{c,h} files for Portuguese
wget http://snowball.tartarus.org/portuguese/stem.c
wget http://snowball.tartarus.org/portuguese/stem.h
2. Create template files for Portuguese
./config.sh -n pt -s -p portuguese_ISO_8859_1 -v -C'Snowball stemmer for Portuguese'
Note that the argument for the -p option must be *the same* as the name of
the stemming function in stem.c (without the _stem suffix).
A set of files will be generated and placed in the PGSQL_SRC/contrib/dict_pt
directory.
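One way to find the right value for -p is to look for the function whose name
ends in _stem in the downloaded source. The following is a minimal sketch:
stem_prefix is a hypothetical helper name, and it assumes the function appears
in the file as NAME_stem( and that grep supports the -o option.

```shell
# Hypothetical helper (not shipped with gendict): print candidate -p prefixes
# found in a Snowball source file.
# Usage: stem_prefix FILE
stem_prefix() {
    # Find identifiers of the form NAME_stem( and strip the "_stem(" tail,
    # leaving NAME, which is what -p expects according to the note above.
    grep -o '[A-Za-z_][A-Za-z0-9_]*_stem *(' "$1" \
        | sed 's/_stem *($//' \
        | sort -u
}
```

Running stem_prefix stem.c on the Portuguese stemmer used in this example
should print the prefix that step 2 passes to -p.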
3. Compile and install dictionary
cd PGSQL_SRC/contrib/dict_pt
make
make install
4. Test it
Sample Portuguese words with their stemmed forms are available
from http://snowball.tartarus.org/portuguese/stemmer.html
createdb testdict
psql testdict < /usr/local/pgsql/share/contrib/tsearch2.sql
psql testdict < /usr/local/pgsql/share/contrib/dict_pt.sql
psql -d testdict -c "select lexize('pt','bobagem');"
 lexize
---------
 {bobag}
(1 row)
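To try several sample words at once, the lexize query above can be generated
for each word. The following is a minimal sketch: lexize_sql is a hypothetical
helper name, and it assumes the dictionary name and the words contain no
single quotes.

```shell
# Hypothetical helper: print one lexize() query per word for a dictionary.
# Usage: lexize_sql DICTNAME WORD...
lexize_sql() {
    dict=$1; shift
    for word in "$@"; do
        # Same query form as in the example above; no SQL quoting is done,
        # so the arguments must not contain single quotes.
        printf "select lexize('%s','%s');\n" "$dict" "$word"
    done
}
```

The generated statements can be piped to psql, for example
lexize_sql pt bobagem | psql testdict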
Here is the corresponding entry in the pg_ts_dict table:
psql -d testdict -c "select * from pg_ts_dict where dict_name='pt';"
 dict_name |     dict_init      | dict_initoption |              dict_lexize              |          dict_comment
-----------+--------------------+-----------------+---------------------------------------+---------------------------------
 pt        | dinit_pt(internal) |                 | snb_lexize(internal,internal,integer) | Snowball stemmer for Portuguese
(1 row)
Note that the dictionary and the corresponding entry in the tsearch2
configuration are now installed. You may modify them using plain SQL
commands, for example to specify stop words.
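As an illustration of such a modification, the sketch below generates an
UPDATE statement for the pt row shown above. It rests on an assumption that
this README does not confirm: that the generated init function reads
dict_initoption as the path to a stop-word file. stopwords_sql is a
hypothetical helper name and the path in the usage line is a placeholder.

```shell
# Hypothetical helper: print an UPDATE that stores a stop-word file path in
# dict_initoption. ASSUMPTION: the dictionary's init function treats
# dict_initoption as a stop-word file path; check the generated sources.
# Usage: stopwords_sql DICTNAME STOPFILE
stopwords_sql() {
    # No SQL quoting is done; arguments must not contain single quotes.
    printf "update pg_ts_dict set dict_initoption='%s' where dict_name='%s';\n" "$2" "$1"
}
```

For example: stopwords_sql pt /path/to/portuguese.stop | psql testdict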
Example 2:
a) Simple template dictionary with init method:
./config.sh -n wow -v -i -C WOW
b) Simple template dictionary without init method:
./config.sh -n wow -v -C WOW
The same as above, but the dictionary will have no init method.
Dictionaries obtained in a) and b) are fully working and ready
for use:
a) lowercases the input word and removes it if it is a stop word
b) recognizes any word
c) Simple template dictionary with source files (with init method):
./config.sh -n wow -v -i -c a.c -h a.h -C WOW
Source files (a.c) must be placed in the contrib/tsearch2/gendict directory.
These files will be used in the Makefile.
Header files (a.h) must be placed in the contrib/tsearch2/gendict directory.
These files will be used in the Makefile and subinclude.h.
d) Simple template dictionary with source files (without init method):
./config.sh -n wow -v -c a.c -h a.h -C WOW
The same as above, but the dictionary will have no init method.
After that, the sources are in PGSQL_SRC/contrib/dict_wow and
you may edit them to create the actual dictionary.
Please check the Tsearch2 home page (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/)
for the "Gendict tutorial" and additional information about dictionaries.