149299c7d5136a8fb425ef3cf8953026a1358002 |
|
11-Oct-2017 |
Timo Sirainen <timo.sirainen@dovecot.fi> |
global: Use check-local in Makefile.am instead of overriding check directly
This helps with dependency problems. For example, running "make check" in
lib-storage without running "make" first would try to compile the test
programs too early and fail. |
f08b05d41b66e6f52daf6e8b40c1612617e84c79 |
|
09-May-2017 |
Josef 'Jeff' Sipek <jeff.sipek@dovecot.fi> |
{lib,lib-fts}: fix builds with BSD make
Without this change, BSD make doesn't know how to make a couple of the
generated files because the BUILT_SOURCES file names don't exactly match
the left-hand sides of the rules. (GNU make somehow manages to match the
rule even though it is not an exact match.) |
58f9b440f44eef4348a9043e3cef477a9733cb10 |
|
09-May-2017 |
Josef 'Jeff' Sipek <jeff.sipek@dovecot.fi> |
lib-fts: use full path to word-properties script
This is a step toward fixing builds where object dir != source dir. |
824107247fcaa05c081f32bffd2cdecea8ec557a |
|
09-May-2017 |
Josef 'Jeff' Sipek <jeff.sipek@dovecot.fi> |
lib-fts: download data files into srcdir
This is a step toward fixing builds where object dir != source dir. |
4e64ac91c5a3eb2a55e0b18d8da832b29ec08289 |
|
23-Mar-2017 |
Martti Rannanjärvi <martti.rannanjarvi@dovecot.fi> |
lib: Download unicode.org files from dovecot.org |
5fcd30add8dcf4d883978cce3e39f3a89184f1e5 |
|
23-Aug-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Cut overlong strings in lowercase filter.
Added a new common truncation function for filters. It also removes any
partial characters that plain truncation would otherwise leave behind. |
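A minimal C sketch of the "remove partial characters" idea, assuming the input is valid UTF-8 and the limit is in bytes. The helper name utf8_truncate_len is hypothetical and is not the actual lib-fts function; it only shows how a byte-length cut can be moved back to a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the largest length <= max_len at which the
   UTF-8 string s (byte length len) can be cut without leaving a partial
   multibyte character at the end. Assumes s is valid UTF-8. */
static size_t utf8_truncate_len(const unsigned char *s, size_t len,
				size_t max_len)
{
	assert(s != NULL || len == 0);
	if (len <= max_len)
		return len;
	/* s[max_len] is the first byte that would be cut off. If it is a
	   continuation byte (10xxxxxx), the character it belongs to started
	   earlier, so move the cut point back to that character's lead byte. */
	size_t pos = max_len;
	while (pos > 0 && (s[pos] & 0xC0) == 0x80)
		pos--;
	return pos;
}
```

Plain truncation to 2 bytes of "aä" (61 C3 A4) would keep a dangling C3 lead byte; the sketch returns 1 instead, keeping only "a".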
3f3c1b629196bc8491f146705b6f8ddadfcde1c8 |
|
02-Jun-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Improved stopword file reading.
Reading is now a little stricter: only stopwords at the start of a line
are accepted. Changed the fi stopwords accordingly.
Also removed a superfluous stack allocation in parsing. |
0605ff6f25783f7c69c1148f9f3a7bd4c34c098f |
|
02-Jun-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Add stopword files for more languages. |
abfc91b502618e387a5c9c87bcf658b341735947 |
|
02-Jun-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Move stopwords to subdirectory.
All files included in dist are listed explicitly. The whole
'stopwords' subdirectory could also be distributed, but that is
more error-prone. |
00544ad37ece26b2c4f2210ed5e5295241d0db19 |
|
16-Mar-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Lift helper function out of generic tokenizer. |
7d4c8041ab63e6a1bf17a9b2bb11dd18634971e2 |
|
15-Jan-2016 |
Aki Tuomi <aki.tuomi@dovecot.fi> |
lib-fts: Add lib-fts to CPPFLAGS as include dir
Without this, VPATH builds fail because the includes cannot be
found, since they are not in the same directory. |
40bdcc2e50b6969596b10f848d1fbe23820666f9 |
|
12-Jan-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Create library for development packages. |
6dd785e6857866657d6ef7a88af6d46ed0133801 |
|
17-Nov-2015 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
fts: Added fts_library_init() and _deinit()
These replace calling three different functions on init and deinit. |
3ec8b0d282d46d1f698b1f2aa27922cb8f26cb97 |
|
17-Nov-2015 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Add Norwegian.
Norwegian has two main written forms, Bokmål (nb) and Nynorsk (nn). They
are detected separately by libexttextcat, but the stemmer only
knows Norwegian, so they are treated as a single language,
Norwegian (no). This may also suit everyday Norwegian written in a
mix of both styles better.
Caveat: The default normalizer filter does not modify U+00F8
(Latin Small Letter O with Stroke). In some configurations it
might be desirable to rewrite it to, e.g., "o". The same goes for the
uppercase version (U+00D8). This can be done by passing a modified "id"
setting to the normalizer filter. |
c5effa0f13da8f45991c89a9d8c9d2109db66039 |
|
17-Nov-2015 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Add Swedish (sv) to supported languages. |
440b625484f3cc9d3ec0a7ba36fe3583aa90172d |
|
31-Aug-2015 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Add prefixing contraction filter.
Strips contracted prefixes from words, e.g. "l'homme" -> "homme".
Tokens to be filtered must be lowercase. Only supports French in
this initial version. |
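A rough C sketch of what a prefix-contraction filter does with an already-lowercased token. The prefix list, the function name strip_contraction, and the restriction to the ASCII apostrophe are all illustrative assumptions, not the actual lib-fts filter or its word list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of French contracted prefixes (assumption, not the
   list lib-fts ships). Tokens must already be lowercase. */
static const char *const contraction_prefixes[] = {
	"c", "d", "j", "l", "m", "n", "qu", "s", "t"
};

/* Return a pointer to the part of token after a known contracted prefix
   and ASCII apostrophe, or token itself if it has no such prefix. */
static const char *strip_contraction(const char *token)
{
	assert(token != NULL);
	const char *apos = strchr(token, '\'');
	/* No apostrophe, or nothing after it: leave the token alone. */
	if (apos == NULL || apos[1] == '\0')
		return token;
	size_t prefix_len = (size_t)(apos - token);
	size_t n = sizeof(contraction_prefixes) / sizeof(contraction_prefixes[0]);
	for (size_t i = 0; i < n; i++) {
		if (strlen(contraction_prefixes[i]) == prefix_len &&
		    strncmp(token, contraction_prefixes[i], prefix_len) == 0)
			return apos + 1;
	}
	return token;
}
```

A word like "aujourd'hui" passes through unchanged because "aujourd" is not a listed prefix, which is why matching against a fixed prefix list is safer than cutting at any apostrophe.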
f1306b3d242963588c97b35d16973c4198bcae7e |
|
11-Aug-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Install headers on make install. |
471167b9701fcc99b66f7a8bcae07bc4ac0dbbd4 |
|
03-Jun-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Added "english-possessive" filter. |
5a2910119ec0b878a0d7ca91918b97e9d40a936d |
|
02-Jun-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Moved IS_APOSTROPHE() to fts-common.h |
bf698b98d3a3a1eced66cc682c449f23bf2b67d0 |
|
16-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Rewrite ICU handling functions.
Some of the changes:
- Use buffers instead of allocating everything from data stack.
- Optimistically attempt to write the data directly to the buffers without
first calculating their size. Grow the buffer only if the data doesn't fit.
- Use u_strFromUTF8Lenient() instead of u_strFromUTF8(). Our input is
already supposed to be valid UTF-8, although we don't check whether all code
points are valid. u_strFromUTF8() does check them and returns failures, while
u_strFromUTF8Lenient() passes everything through. We don't really care
whether code points are valid or not.
Added unit tests to make sure all the functions work as intended and all the
UTF-8 input passes through them successfully. |
a5563dc790a44bb58860d74479a24349f593d68f |
|
14-May-2015 |
Timo Sirainen <tss@iki.fi> |
Reverted d592417ec815 which added unnecessary code to Makefiles.
The original problem it tried to solve was properly fixed by 46969c4cc57e.
make does actually wait for a process to finish creating a file before it
continues to the next program that wants to access that file, as long as the
dependencies are correct. |
b9495c944b49d71e8235c772c2dc035fdab282cd |
|
13-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Makefile compiling dependency fix |
91d2e560eb95a9ab7f2c194d5bf14179aff6023b |
|
12-May-2015 |
Phil Carmody <phil@dovecot.fi> |
lib-fts: autogenerate C arrays using perl
The sh script had bashisms and the awk script crashed mawk, so let's try perl...
Signed-off-by: Phil Carmody <phil@dovecot.fi> |
3756060476f110e7a8cb7069ea1319665815e845 |
|
11-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: .sh scripts weren't executable - changed them to be run via bash directly.
Better to avoid relying on the executable bit. |
5d8dad014bc0a18e79286953a92f7fae7684ee9b |
|
11-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Reverted e80969ea8684 which replaced .sh scripts with awk
Bugs in older awk versions (used at least by Debian squeeze & wheezy) caused
awk to crash while processing the script. |
412bd45e0cabee1284a56482578eb347d626bd4d |
|
11-May-2015 |
Timo Sirainen <tss@iki.fi> |
Makefile: Fixed build concurrency issues with lib-fts |
644e991973c99703e9994851fe365960ab1bc089 |
|
11-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Replaced word-boundary/break-data.sh with more portable awk scripts
Patch by Michael Grimm. |
acfcf88e4dd529e4b2409f43bc9713cbc0169347 |
|
09-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Added "lowercase" filter.
For now it handles only ASCII characters, but that's enough for our use. |
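An ASCII-only lowercase pass is small enough to sketch directly. This is a hypothetical illustration (the name ascii_lowercase_inplace is not from lib-fts): it maps only A-Z and deliberately avoids locale-dependent tolower(), so every byte of a multibyte UTF-8 sequence (all >= 0x80) is left untouched and valid UTF-8 stays valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: lowercase ASCII letters in place and leave all
   other bytes, including non-ASCII UTF-8 sequences, unchanged. */
static void ascii_lowercase_inplace(char *s, size_t len)
{
	assert(s != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		if (s[i] >= 'A' && s[i] <= 'Z')
			s[i] = (char)(s[i] - 'A' + 'a');
	}
}
```

The trade-off is the one the commit message accepts: "É" (C3 89) stays uppercase, which is fine as long as later stages do not rely on Unicode-aware case folding.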
eac88e31b791d6a099e0e497ac2a29aa041f05b2 |
|
09-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Removed "simple" normalizer.
It translated input to titlecase, which wasn't suitable for snowball
stemming, which wanted lowercase input. Since that doesn't work, there's
probably no good reason for this to exist (perhaps in the future it will be
replaced by a Unicode-aware lowercaser). |
12bc47bcae87a1f954b98420929eaf90922aa605 |
|
08-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Use rfc822-parser in fts-tokenizer-address instead of duplicating its code. |
ec930ce90b17fb63ff035c1c87d994800de092f1 |
|
21-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Added normalizer-simple for doing normalization without libicu. |
63713f16bad8b55e74c479adb6b47965b519c29b |
|
21-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Renamed normalizer to icu-normalizer, including the source code. |
cb6f6ef5044a559fb285e2f7d3fe12b4751ea708 |
|
21-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
configure: s/normalizer/libicu/ since it could be used for something else as well. |
e162baa2d2ce41a009988e86636a5c77a2725477 |
|
21-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Added udhr_fra.txt to EXTRA_DIST |
4bf6941ccdfb27c99e15ab32e5299e25cd2855c6 |
|
20-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Added PropList.txt to EXTRA_DIST |
556c189ce6b6de3c8b4a3fc38b7c61bef800d012 |
|
20-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Fixed using FTS_NORMALIZER_CFLAGS/LIBS. |
9cff78f3cc4830cce2183f630ec671a98087e4d1 |
|
20-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Added missing stopwords_fi.txt |
4e07da7f29d35d1517fce9b7300c6c19f804325b |
|
20-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: Fixed test-fts-language to use TEXTCAT_DATADIR
This may still make too many assumptions about what data exists where, so
we may need to remove this test from "make check". But for now leave it
there. |
c865b0e9c65fd77f7b2ab6f8616d3def5501ecb3 |
|
20-Apr-2015 |
Timo Sirainen <tss@iki.fi> |
Initial import for lib-fts.
Parts of what this code does were already implemented internally by
fts-lucene. lib-fts is intended to be usable by all the FTS backends. The
APIs are still going to change a bit, but hopefully not after the v2.2.17
release.
Mostly written by Teemu Huovila. |