used by OpenOffice. Dictionaries are placed at
http://lingucomponent.openoffice.org/spell_dic.html
Dictionary automatically recognizes format of files.
Warning. MySpell's format has limitation with compound
word support: it's impossible to mark affix as
compound-only affix. So for norwegian, german etc
languages it's recommended to use original ispell format.
For that reason I don't want to remove my2ispell
scripts, it's has workaround at least for norwegian language.
sorry but fix can't be applyed to previous version: it's require
refill tsvector...
2 Small optimize of load time for huge dictionaries
3 use palloc instead of malloc during load dict file
regression=# select to_tsquery( '\'fotballklubber\'');
to_tsquery
------------------------------------------------
'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'
(1 row)
So, changed interface to dictionaries, lexize method of dictionary shoud return
pointer to aray of TSLexeme structs instead of char**. Last element should
have TSLexeme->lexeme == NULL.
typedef struct {
/* number of variant of split word , for example
Word 'fotballklubber' (norwegian) has two varian to split:
( fotball, klubb ) and ( fot, ball, klubb ). So, dictionary
should return:
nvariant lexeme
1 fotball
1 klubb
2 fot
2 ball
2 klubb
*/
uint16 nvariant;
/* currently unused */
uint16 flags;
/* C-string */
char *lexeme;
} TSLexeme;
1 Report error message instead of do nothing in case of error in regex
2 Malloced storage for mask, find and repl part of Affix. This parts may be
large enough in real life (for example in czech, thanks to moje <moje@kalhotky.net>)