3a54211bd6c4dc3f8687c16020770551cf83a548 |
|
17-Aug-2015 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Add Unicode TR29 rule WB5a setting to tokenizer.
Splits a prefixed contraction from its base word,
e.g. "l'homme" -> "l" "homme". Combined with a language-specific stopword list,
unnecessary contractions can then be filtered out.
This is disabled by default and only works with the TR29 algorithm.
Enable with "fts_tokenizer_generic = algorithm=tr29 wb5a=yes" |
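
The effect described in this entry can be illustrated with a small sketch. This is not the lib-fts implementation: the function names, the character class after the apostrophe (vowels plus "h"), and the stopword subset are all assumptions chosen only to reproduce the "l'homme" example from the commit message.

```python
import re

# Hypothetical, simplified stand-in for the WB5a behaviour described above.
# Assumption: a break is placed at an apostrophe that follows a word character
# and precedes a vowel or "h"; the real tokenizer's character classes may differ.
_WB5A_BREAK = re.compile(
    r"(?<=\w)['\u2019](?=[aeiouhàâéèêëîïôùûü])", re.IGNORECASE
)


def split_contraction(word):
    """Split a prefixed contraction such as "l'homme" into its parts."""
    return [part for part in _WB5A_BREAK.split(word) if part]


def filter_stopwords(tokens, stopwords):
    """Drop tokens that appear in the given stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


# Illustrative subset only, not Dovecot's French stopword list.
FRENCH_STOPWORDS = {"l", "d", "qu"}

tokens = split_contraction("l'homme")
print(tokens)
print(filter_stopwords(tokens, FRENCH_STOPWORDS))
```

In this sketch an English contraction such as "don't" is left whole, because the apostrophe is not followed by a vowel or "h"; only the prefix form is split, so the stopword filter can then remove the leftover "l".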