2003-07-21 18:27:44 +08:00
|
|
|
Gendict - generate dictionary templates for contrib/tsearch2 module.
|
|
|
|
|
|
|
|
This utility aims to help people creating dictionary for contrib/tsearch v2
|
|
|
|
module. Particularly, it has built-in support for snowball stemmers.
|
|
|
|
|
|
|
|
Programming API to tsearch2 dictionaries is described in tsearch v2
|
|
|
|
documentation.
|
|
|
|
|
|
|
|
|
|
|
|
Prerequisities:
|
|
|
|
|
|
|
|
* PostgreSQL 7.3 and above.
|
|
|
|
|
|
|
|
* You need tsearch2 module sources already compiled
|
|
|
|
|
|
|
|
* Rights to install contrib modules
|
|
|
|
|
|
|
|
Usage:
|
|
|
|
|
|
|
|
run config.sh without parameters to see options and arguments
|
|
|
|
|
|
|
|
Usage:
|
|
|
|
./config.sh -n DICTNAME ( [ -s [ -p PREFIX ] ] | [ -c CFILES ] [ -h HFILES ] [ -i ] ) [ -v ] [ -d DIR ] [ -C COMMENT ]
|
|
|
|
-v - be verbose
|
|
|
|
-d DIR - name of directory in PGSQL_SRC/contrib (default dict_DICTNAME)
|
|
|
|
-C COMMENT - dictionary comment
|
|
|
|
Generate Snowball stemmer:
|
|
|
|
./config.sh -n DICTNAME -s [ -p PREFIX ] [ -v ] [ -d DIR ] [ -C COMMENT ]
|
|
|
|
-s - generate Snowball wrapper
|
|
|
|
-p - prefix of Snowball's function, (default DICTNAME)
|
|
|
|
Generate template dictionary:
|
|
|
|
./config.sh -n DICTNAME [ -c CFILES ] [ -h HFILES ] [ -i ] [ -v ] [ -d DIR ] [ -C COMMENT ]
|
|
|
|
-c CFILES - source files, must be placed in contrib/tsearch2/gendict directory.
|
|
|
|
These files will be used in Makefile.
|
|
|
|
-h HFILES - header files, must be placed in contrib/tsearch2/gendict directory.
|
|
|
|
These files will be used in Makefile and subinclude.h
|
|
|
|
-i - dictionary has init method
|
|
|
|
|
|
|
|
|
|
|
|
Example 1:
|
|
|
|
|
|
|
|
Create Portuguese stemmer
|
|
|
|
|
|
|
|
0. cd PGSQL_SRC/contrib/tsearch2/gendict
|
|
|
|
|
|
|
|
1. Obtain stem.{c,h} files for Portuguese
|
|
|
|
|
|
|
|
wget http://snowball.tartarus.org/portuguese/stem.c
|
|
|
|
wget http://snowball.tartarus.org/portuguese/stem.h
|
|
|
|
|
|
|
|
2. Create template files for Portuguese
|
|
|
|
|
2005-09-15 19:14:18 +08:00
|
|
|
./config.sh -n pt -s -p portuguese_ISO_8859_1 -v -C'Snowball stemmer for Portuguese'
|
2003-07-21 18:27:44 +08:00
|
|
|
|
|
|
|
Note, that argument for -p option should be *the same* as name of stemming
|
|
|
|
function in stem.c (without _stem)
|
|
|
|
|
|
|
|
A bunch of files will be generated and placed in PGSQL_SRC/contrib/dict_pt
|
|
|
|
directory.
|
|
|
|
|
|
|
|
3. Compile and install dictionary
|
|
|
|
|
|
|
|
cd PGSQL_SRC/contrib/dict_pt
|
|
|
|
make
|
|
|
|
make install
|
|
|
|
|
|
|
|
4. Test it
|
|
|
|
|
|
|
|
Sample portuguese words with the stemmed forms are available
|
|
|
|
from http://snowball.tartarus.org/portuguese/stemmer.html
|
|
|
|
|
|
|
|
createdb testdict
|
|
|
|
psql testdict < /usr/local/pgsql/share/contrib/tsearch2.sql
|
|
|
|
psql testdict < /usr/local/pgsql/share/contrib/dict_pt.sql
|
|
|
|
psql -d testdict -c "select lexize('pt','bobagem');"
|
|
|
|
lexize
|
|
|
|
---------
|
|
|
|
{bobag}
|
|
|
|
(1 row)
|
|
|
|
|
|
|
|
Here is what I have in pg_ts_dict table
|
|
|
|
|
|
|
|
psql -d testdict -c "select * from pg_ts_dict where dict_name='pt';"
|
2005-09-15 19:14:18 +08:00
|
|
|
dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
|
|
|
|
-----------+--------------------+-----------------+---------------------------------------+---------------------------------
|
|
|
|
pt | dinit_pt(internal) | | snb_lexize(internal,internal,integer) | Snowball stemmer for Portuguese
|
|
|
|
|
2003-07-21 18:27:44 +08:00
|
|
|
(1 row)
|
|
|
|
|
|
|
|
|
|
|
|
Note, that you have already installed dictionary and corresponding
|
|
|
|
entry in tsearch configuration and you may modify it using
|
|
|
|
plain SQL commands, for example, specify stop words.
|
|
|
|
|
|
|
|
Example 2:
|
|
|
|
|
|
|
|
a) Simple template dictionary with init method
|
|
|
|
|
|
|
|
./config.sh -n wow -v -i -C WOW
|
|
|
|
|
|
|
|
b) Create simple template dict (without init method):
|
|
|
|
./config.sh -n wow -v -C WOW
|
|
|
|
|
|
|
|
The same as above, but dictionary will have not init method
|
|
|
|
|
|
|
|
Dictionaries obtained in a) and b) are fully working and ready
|
|
|
|
for use:
|
|
|
|
a) lowercase input word and remove it if it is a stop word
|
|
|
|
b) recognizes any word
|
|
|
|
|
|
|
|
c) Simple template dictionary with source files (with init method):
|
|
|
|
|
|
|
|
./config.sh -n wow -v -i -c a.c -h a.h -C WOW
|
|
|
|
|
|
|
|
Source files ( a.c ) must be placed in contrib/tsearch2/gendict directory.
|
|
|
|
These files will be used in Makefile.
|
|
|
|
|
|
|
|
Header files ( a.h ), must be placed in contrib/tsearch2/gendict directory.
|
|
|
|
These files will be used in Makefile and subinclude.h
|
|
|
|
|
|
|
|
d) Simple template dictionary with source files (without init method):
|
|
|
|
|
|
|
|
./config.sh -n wow -v -c a.c -h a.h -C WOW
|
|
|
|
|
|
|
|
The same as above, but dictionary will have not init method
|
|
|
|
|
|
|
|
After that you have sources in PGSQL_SRC/contrib/dict_wow and
|
|
|
|
you may edit them to create actual dictionary.
|
|
|
|
|
|
|
|
Please, check Tsearch2 home page (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/)
|
2005-09-15 19:14:18 +08:00
|
|
|
for additional information about "Gendict tutorial" and dictionaries.
|