b1b0b2b543dc1a10015272fc970ad7534f84e0c5 |
|
22-Nov-2016 |
Timo Sirainen <timo.sirainen@dovecot.fi> |
lib-fts: Make sure address tokenizer can't return empty tokens.
This happened when a token first looked like it could be a valid address,
but was then truncated on reaching maxlen. Next, the partial UTF-8 sequence
left at the cut was removed, and finally
fts_tokenizer_delete_trailing_invalid_char() stripped the remaining '-' and
'.' chars that had been valid at the beginning of the address, leaving nothing.
Fixes:
Panic: file fts-tokenizer.c: line 206 (fts_tokenizer_next): assertion failed: (ret <= 0 || (*token_r)[0] != '\0') |
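The failure sequence described above can be sketched as follows. This is a minimal illustration, not the lib-fts code: the helper names trim_partial_utf8(), strip_trailing_invalid() and finish_truncated_token() are hypothetical, and returning 0 to mean "no token" is an assumption about how a caller might avoid emitting an empty token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: drop a UTF-8 sequence that was cut off at the end. */
static size_t trim_partial_utf8(const unsigned char *buf, size_t len)
{
	size_t i = len, need;
	unsigned char lead;

	/* Walk back over continuation bytes (10xxxxxx). */
	while (i > 0 && (buf[i - 1] & 0xC0) == 0x80)
		i--;
	if (i == 0)
		return 0; /* empty, or continuation bytes without a lead byte */
	lead = buf[i - 1];
	if ((lead & 0xE0) == 0xC0)
		need = 2;
	else if ((lead & 0xF0) == 0xE0)
		need = 3;
	else if ((lead & 0xF8) == 0xF0)
		need = 4;
	else
		need = 1; /* ASCII, or an invalid lead byte treated as one byte */
	/* Bytes present for the last sequence: len - (i - 1). */
	return (len - (i - 1) < need) ? i - 1 : len;
}

/* Hypothetical helper: strip trailing '-' and '.' characters. */
static size_t strip_trailing_invalid(const unsigned char *buf, size_t len)
{
	while (len > 0 && (buf[len - 1] == '-' || buf[len - 1] == '.'))
		len--;
	return len;
}

/* Returns the usable token length in bytes. A result of 0 means nothing
   is left and the caller should skip the token instead of returning it. */
static size_t finish_truncated_token(const char *token, size_t max_len)
{
	const unsigned char *buf = (const unsigned char *)token;
	size_t len;

	assert(token != NULL);
	len = strlen(token);
	if (len > max_len)
		len = max_len;
	len = trim_partial_utf8(buf, len);
	return strip_trailing_invalid(buf, len);
}
```

With a max_len of 4, the input ".-.\xC3\xA4@example.com" is first cut to ".-.\xC3", then loses the dangling 0xC3 byte, and finally loses all three leading punctuation characters, so the function returns 0.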
af177be2664018e8074d69449b9c6a2d9741ec25 |
|
16-Mar-2016 |
Teemu Huovila <teemu.huovila@dovecot.fi> |
lib-fts: Limit maximum length of addresses found.
The address tokenizer now takes a "maxlen" parameter, which
defaults to 254 bytes.
Previously an address, or anything that looked like one, could
be of any length. This could cause trouble in FTS backends. |
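As a rough sketch of how such a parameter might be read, the function below takes settings as a NULL-terminated array of key/value strings and falls back to 254 when "maxlen" is absent. The function name, the settings layout and the error convention are assumptions made for illustration. Only the "maxlen" key and the 254-byte default come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ADDRESS_MAX_LENGTH_DEFAULT 254 /* default named in the commit message */

/* Hypothetical parser: settings = { key, value, key, value, ..., NULL }.
   Returns 0 on success, -1 if the "maxlen" value is not a positive number. */
static int address_maxlen_from_settings(const char *const *settings,
					size_t *max_len_r)
{
	size_t i;

	assert(max_len_r != NULL);
	*max_len_r = ADDRESS_MAX_LENGTH_DEFAULT;
	if (settings == NULL)
		return 0;
	for (i = 0; settings[i] != NULL && settings[i + 1] != NULL; i += 2) {
		const char *value = settings[i + 1];
		char *end;
		unsigned long v;

		if (strcmp(settings[i], "maxlen") != 0)
			continue;
		/* Reject signs and whitespace that strtoul() would accept. */
		if (value[0] < '0' || value[0] > '9')
			return -1;
		v = strtoul(value, &end, 10);
		if (*end != '\0' || v == 0)
			return -1;
		*max_len_r = (size_t)v;
	}
	return 0;
}
```

Passing NULL leaves the limit at 254, while a pair such as "maxlen", "100" lowers it to 100 bytes.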
19ed8f08b23d6ed204e6b27e5d1c0c6fe6bb11dd |
|
15-Nov-2015 |
Phil Carmody <phil@dovecot.fi> |
various - remove 8-bit characters from literal strings in test cases
C has a portable way of expressing characters not in the basic character
set, namely \xNN escaping. Otherwise, the interpretation of the raw UTF-8
is implementation-dependent. This has the benefit of making some tests'
expected output more obvious, such as "=c3=a4" matching "\xC3\xA4", even
if it hinders the readability of some natural-language-based tests.
Signed-off-by: Phil Carmody <phil@dovecot.fi> |
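A small illustration of the escaping style this commit describes, using the "\xC3\xA4" bytes from the message. The variable and function names are made up for the example. It also shows one pitfall of \xNN escapes: a hex escape consumes every hex digit that follows it, so a literal that continues with a hex-digit letter has to be split.

```c
#include <assert.h>
#include <string.h>

/* U+00E4 written as its two UTF-8 bytes instead of a raw 8-bit character. */
static const char a_umlaut[] = "\xC3\xA4";

/* "\xA4b" would be parsed as one over-long hex escape, so the literal is
   split. Adjacent string literals are joined after escapes are processed. */
static const char a_umlaut_then_b[] = "\xC3\xA4" "b";

/* Hypothetical self-check that documents the resulting byte values. */
static void check_literals(void)
{
	assert(sizeof(a_umlaut) == 3); /* two bytes plus the terminating NUL */
	assert((unsigned char)a_umlaut[0] == 0xC3);
	assert((unsigned char)a_umlaut[1] == 0xA4);
	assert(strlen(a_umlaut_then_b) == 3);
	assert(a_umlaut_then_b[2] == 'b');
}
```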
b6b06530d654f0436bfbaefc1e988d53fff0cbee |
|
01-Jun-2015 |
Timo Sirainen <tss@iki.fi> |
lib-fts: tokenizers - Fixed removal of trailing character in truncated tokens.
If the token is truncated, we don't want to remove the trailing character
since it's not actually there.
Also we don't want to remove trailing apostrophes from a truncated word,
because they're not actually at the end of the untruncated token.
This doesn't make a big difference, but it's slightly more correct. |
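The rule can be sketched for a word token as below. This is an assumption-laden illustration rather than the tokenizer's code: word_token_length() is a hypothetical name, only apostrophes are treated as strippable, and lengths are counted in bytes without regard to UTF-8 boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many bytes of word to keep as the token. */
static size_t word_token_length(const char *word, size_t max_len)
{
	size_t len;

	assert(word != NULL);
	len = strlen(word);
	if (len > max_len) {
		/* Truncated: whatever byte is last now was not the real end
		   of the word, so nothing is stripped. */
		return max_len;
	}
	/* Not truncated: trailing apostrophes really are at the end. */
	while (len > 0 && word[len - 1] == '\'')
		len--;
	return len;
}
```

For example, "dogs'" with a limit of 30 yields 4 bytes ("dogs"), whereas "rock'n'roll" with a limit of 5 keeps all 5 bytes ("rock'"), because the apostrophe sits in the middle of the original word.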