mirror of
https://git.postgresql.org/git/postgresql.git
synced 2024-12-09 08:10:09 +08:00
b4d107a777
old version doesn't available on Snowball's site and new version of stemmers can't be compiled with old interface. |
||
---|---|---|
.. | ||
config.sh | ||
dict_snowball.c.IN | ||
dict_tmpl.c.IN | ||
Makefile.IN | ||
README.gendict | ||
sql.IN |
Gendict - generate dictionary templates for contrib/tsearch2 module. This utility aims to help people creating dictionary for contrib/tsearch v2 module. Particularly, it has built-in support for snowball stemmers. Programming API to tsearch2 dictionaries is described in tsearch v2 documentation. Prerequisities: * PostgreSQL 7.3 and above. * You need tsearch2 module sources already compiled * Rights to install contrib modules Usage: run config.sh without parameters to see options and arguments Usage: ./config.sh -n DICTNAME ( [ -s [ -p PREFIX ] ] | [ -c CFILES ] [ -h HFILES ] [ -i ] ) [ -v ] [ -d DIR ] [ -C COMMENT ] -v - be verbose -d DIR - name of directory in PGSQL_SRC/contrib (default dict_DICTNAME) -C COMMENT - dictionary comment Generate Snowball stemmer: ./config.sh -n DICTNAME -s [ -p PREFIX ] [ -v ] [ -d DIR ] [ -C COMMENT ] -s - generate Snowball wrapper -p - prefix of Snowball's function, (default DICTNAME) Generate template dictionary: ./config.sh -n DICTNAME [ -c CFILES ] [ -h HFILES ] [ -i ] [ -v ] [ -d DIR ] [ -C COMMENT ] -c CFILES - source files, must be placed in contrib/tsearch2/gendict directory. These files will be used in Makefile. -h HFILES - header files, must be placed in contrib/tsearch2/gendict directory. These files will be used in Makefile and subinclude.h -i - dictionary has init method Example 1: Create Portuguese stemmer 0. cd PGSQL_SRC/contrib/tsearch2/gendict 1. Obtain stem.{c,h} files for Portuguese wget http://snowball.tartarus.org/portuguese/stem.c wget http://snowball.tartarus.org/portuguese/stem.h 2. Create template files for Portuguese ./config.sh -n pt -s -p portuguese_ISO_8859_1 -v -C'Snowball stemmer for Portuguese' Note, that argument for -p option should be *the same* as name of stemming function in stem.c (without _stem) A bunch of files will be generated and placed in PGSQL_SRC/contrib/dict_pt directory. 3. Compile and install dictionary cd PGSQL_SRC/contrib/dict_pt make make install 4. Test it Sample portuguese words with the stemmed forms are available from http://snowball.tartarus.org/portuguese/stemmer.html createdb testdict psql testdict < /usr/local/pgsql/share/contrib/tsearch2.sql psql testdict < /usr/local/pgsql/share/contrib/dict_pt.sql psql -d testdict -c "select lexize('pt','bobagem');" lexize --------- {bobag} (1 row) Here is what I have in pg_ts_dict table psql -d testdict -c "select * from pg_ts_dict where dict_name='pt';" dict_name | dict_init | dict_initoption | dict_lexize | dict_comment -----------+--------------------+-----------------+---------------------------------------+--------------------------------- pt | dinit_pt(internal) | | snb_lexize(internal,internal,integer) | Snowball stemmer for Portuguese (1 row) Note, that you have already installed dictionary and corresponding entry in tsearch configuration and you may modify it using plain SQL commands, for example, specify stop words. Example 2: a) Simple template dictionary with init method ./config.sh -n wow -v -i -C WOW b) Create simple template dict (without init method): ./config.sh -n wow -v -C WOW The same as above, but dictionary will have not init method Dictionaries obtained in a) and b) are fully working and ready for use: a) lowercase input word and remove it if it is a stop word b) recognizes any word c) Simple template dictionary with source files (with init method): ./config.sh -n wow -v -i -c a.c -h a.h -C WOW Source files ( a.c ) must be placed in contrib/tsearch2/gendict directory. These files will be used in Makefile. Header files ( a.h ), must be placed in contrib/tsearch2/gendict directory. These files will be used in Makefile and subinclude.h d) Simple template dictionary with source files (without init method): ./config.sh -n wow -v -c a.c -h a.h -C WOW The same as above, but dictionary will have not init method After that you have sources in PGSQL_SRC/contrib/dict_wow and you may edit them to create actual dictionary. Please, check Tsearch2 home page (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/) for additional information about "Gendict tutorial" and dictionaries.