c47167da2f6443479e5c44ab31d2974252abdfd8 |
|
23-Oct-2017 |
Aki Tuomi <aki.tuomi@dovecot.fi> |
unichar: Use surrogate macros in ucs4 validity check |
2bb5b6721e9971b3bcbb2da48eebead7fd9488ee |
|
23-Oct-2017 |
Aki Tuomi <aki.tuomi@dovecot.fi> |
unichar: Add surrogate handling |
5fc1d7c7caffa7e5616a1681503dfea0fc582aae |
|
23-Oct-2017 |
Aki Tuomi <aki.tuomi@dovecot.fi> |
unichar: Add uni_is_valid_ucs4 |
3b02371c120dcd455a09148abc5a3f520880fef2 |
|
03-Jun-2015 |
Timo Sirainen <tss@iki.fi> |
lib: Added UTF8_IS_START_SEQ() helper macro |
304a9d2db2669ad910577e00dce2f81bfd0d5d39 |
|
01-Jun-2015 |
Phil Carmody <phil@dovecot.fi> |
lib: API change - have uni_utf8_get_char*() return _char_bytes
Often the two functions are called in close proximity (both ways round). As
_get_char*() calls _char_bytes() early on the success path, we may as well
return that value to the caller for immediate use.
The callers which call _char_bytes() first are simply rejecting the truncated
case quickly - all other invalid cases still call both functions, and all
other valid cases (which should be the fast path) likewise call both.
Signed-off-by: Phil Carmody <phil@dovecot.fi> |
c8b84f03c71e18f07940d1b60a77a4caf5e7c23b |
|
16-May-2015 |
Timo Sirainen <tss@iki.fi> |
lib: Added UNICODE_REPLACEMENT_CHAR_UTF8 |
32ae620015da6ab2ec28e04d3cdcdb4420f1fa6b |
|
15-Jan-2015 |
Timo Sirainen <tss@iki.fi> |
lib: Fixed NUL-handling in uni_utf8_*strlen*()
uni_utf8_strlen() could have skipped over the ending NUL byte and caused
read buffer overflows with invalid input.
uni_utf8_strlen_n() and uni_utf8_partial_strlen_n() now allow NUL bytes in
the input and they're treated as regular control characters. Previously the
size was actually treated as max_size with early NUL byte termination.
Technically this is an API change, but I'm not aware of anything using these
functions in an incompatible way. |
f66c8939c39e6bcd9dd5482bfd9689bd177ce0d4 |
|
10-Jan-2015 |
Timo Sirainen <tss@iki.fi> |
lib: Added uni_utf8_partial_strlen_n() |
c51afc0ab251923fbfcad5059af27a7fefab3502 |
|
27-Nov-2012 |
Timo Sirainen <tss@iki.fi> |
Reversed recent "short utf8" changes.
Solr code needs to parse the UTF8 input explicitly anyway to encode the XML
characters. And all the character checks were already done in it. |
cb0b320bea887d305e0e283c6bc74677d51a785e |
|
27-Nov-2012 |
Timo Sirainen <tss@iki.fi> |
liblib: Added uni_utf8_short_*() for handling UTF8 data where [56]-byte sequences are invalid. |
d74c9540cd64888055c4840a4544b1de4248e584 |
|
18-Sep-2012 |
Timo Sirainen <tss@iki.fi> |
Backported parts of normalizer_func_t changes from v2.2 tree. |
d9076f5939edf5d20a261494b1a861dcbb0d32e2 |
|
15-Sep-2012 |
Timo Sirainen <tss@iki.fi> |
Replaced "decomposed titlecase" conversions with more generic normalizer function.
Plugins can now change mail_user.default_normalizer. Specific searches can
also use different normalizers by changing mail_search_context.normalizer. |
3412b625dd238cc0774db968e6c351b007a98e25 |
|
15-Sep-2012 |
Timo Sirainen <tss@iki.fi> |
uni_utf8_to_decomposed_titlecase(): Require input length to be exact now.
Most of the callers did that already anyway |
88311240b8db117b120171a861a64e399dab57af |
|
22-Jul-2011 |
Timo Sirainen <tss@iki.fi> |
Added uni_utf8_strlen(). |
c6ead31ba07401556abe0c69374d7fbed99844e7 |
|
31-May-2011 |
Timo Sirainen <tss@iki.fi> |
liblib: Added uni_utf8_to_ucs4_n(). |
296857bde8dbe965bcfe5e96cf06d37c297d9315 |
|
18-Feb-2011 |
Timo Sirainen <tss@iki.fi> |
Added uni_utf8_data_is_valid(). |
f2de6ecc4424533633aea705f12d0f691d7ddf81 |
|
20-Aug-2010 |
Timo Sirainen <tss@iki.fi> |
Added a global utf8_replacement_char variable. |
a0044466cc46baf25a316ea63781c60aa52b58ca |
|
19-Aug-2010 |
Timo Sirainen <tss@iki.fi> |
UTF-8 string validity was still checked incorrectly. |
ef1d718c6a3a3a48b9835b004b8496de9dc4bec5 |
|
10-Nov-2009 |
Timo Sirainen <tss@iki.fi> |
Added uni_utf8_str_is_valid().
--HG--
branch : HEAD |
0b2b090cdc3d36f30d6d2ec99b35ac0b7657d538 |
|
01-Nov-2008 |
Timo Sirainen <tss@iki.fi> |
Added some UTF16_ macros for helping UTF-16 conversions.
--HG--
branch : HEAD |
68a4946b12583b88fa802e52ebee45cd96056772 |
|
20-Jun-2008 |
Timo Sirainen <tss@iki.fi> |
Added more consts, ATTR_CONSTs and ATTR_PUREs.
--HG--
branch : HEAD |
8e9666f46faceeef0f3c6f706f10f3a873e4b0eb |
|
22-Jan-2008 |
Timo Sirainen <tss@iki.fi> |
Replace invalid UTF8 input with a replacement character.
--HG--
branch : HEAD |
54df49100a0111a956662cb8a327969badd2d72d |
|
27-Dec-2007 |
Timo Sirainen <tss@iki.fi> |
Define unichars array type and use it for uni_utf8_to_ucs4() output.
--HG--
branch : HEAD |
7aa59f55d8a4e02c7039fbd22660c4055bfc8393 |
|
08-Dec-2007 |
Timo Sirainen <tss@iki.fi> |
uni_utf8_get_valid_data() API changed.
--HG--
branch : HEAD |
511ba4416aafb9f9ba1a4193703b95a033267068 |
|
08-Dec-2007 |
Timo Sirainen <tss@iki.fi> |
Moved uni_utf8_get_valid_data() to lib/
--HG--
branch : HEAD |
c25356d5978632df6203437e1953bcb29e0c736f |
|
16-Sep-2007 |
Timo Sirainen <tss@iki.fi> |
Changed .h ifdef/defines to use <NAME>_H format.
--HG--
branch : HEAD |
0ddb604f911b908085ef787455c015a91dc9c365 |
|
20-Jul-2007 |
Timo Sirainen <tss@iki.fi> |
Added uni_ucs4_to_titlecase() and uni_utf8_to_decomposed_titlecase(). They
use a unicharmap.c file generated from UnicodeData.txt.
--HG--
branch : HEAD |
2a7605bb97dc9ed8accf2537fad1073a5fc5ff48 |
|
11-Jun-2007 |
Timo Sirainen <tss@iki.fi> |
Rewrote some code and cleaned up the API
--HG--
branch : HEAD |
aa883f5fbc68920c48c4f52919e8a5bb9611e678 |
|
13-Dec-2006 |
Timo Sirainen <tss@iki.fi> |
Added unichar_t UCS-4 type and some ucs4/utf8 functions.
--HG--
branch : HEAD |