Hi,
I found a bug in the search module. If the pattern contains a non-basic
Latin letter (code point > 127), the search ignores case, and the match
occurs at the end of the text, usearch_next does not find it. I demonstrate
this with a Hungarian character pair (small/capital letter).
Example:
UErrorCode uerr = U_ZERO_ERROR; /* must be initialized before the first ICU call */
UStringSearch* usearch;
/* 0x31 and 0x32 are just two dummy characters for demonstration */
/* this will fail, but see below */
UChar text[] = { 0x31, 0x32, 0x171 };
UChar pattern[] = { 0x170 };
/* NULL break iterator: no restriction on match boundaries */
usearch = usearch_open(pattern, sizeof(pattern) / sizeof(UChar),
    text, sizeof(text) / sizeof(UChar), "hu_HU", NULL, &uerr);
if (U_SUCCESS(uerr))
{
    /* ignore case */
    ucol_setStrength(usearch_getCollator(usearch), UCOL_SECONDARY);
    usearch_reset(usearch);
    int32_t match_start = usearch_next(usearch, &uerr);
    if (U_SUCCESS(uerr))
    {
        if (match_start == USEARCH_DONE)
            printf("not found!\n");
        else
            printf("found at: %d\n", match_start); /* int32_t is signed */
    }
    usearch_close(usearch);
}
0x171 is the Hungarian small letter 'ű'
0x170 is the Hungarian capital letter 'Ű'
If you use this:
UChar text[] = { 0x171, 0x31, 0x32 };
UChar pattern[] = { 0x170 };
The result is: "found at: 0"
If you use this:
UChar text[] = { 0x31, 0x171, 0x32 };
UChar pattern[] = { 0x170 };
The result is: "found at: 1"
If you use this:
UChar text[] = { 0x31, 0x32, 0x171 };
UChar pattern[] = { 0x170 };
The result is: "not found!" (expected: "found at: 2")
If you use this (with Basic Latin letters):
UChar text[] = { 0x31, 0x32, 0x41 };
UChar pattern[] = { 0x61 };
The result is: "found at: 2"
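As a cross-check of the expected offsets, the four cases above can be run through a naive scan. This is only a sketch: fold_unit and naive_find are hypothetical helpers, not ICU functions, and the folding covers just ASCII letters and the U+0170/U+0171 pair. It does not model ICU's collation-based matching; it only shows where a case-insensitive match should be reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ICU): lower-cases only ASCII A-Z and
   the U+0170 -> U+0171 pair used in this report. */
uint16_t fold_unit(uint16_t c)
{
    if (c >= 0x41 && c <= 0x5A)
        return (uint16_t)(c + 0x20);
    if (c == 0x170)
        return 0x171;
    return c;
}

/* Hypothetical helper (not part of ICU): naive case-insensitive scan over
   UTF-16 code units. Returns the offset of the first match, or -1 if the
   pattern does not occur. */
int32_t naive_find(const uint16_t *text, size_t text_len,
                   const uint16_t *pat, size_t pat_len)
{
    size_t i, j;

    assert(text != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > text_len)
        return -1;

    for (i = 0; i + pat_len <= text_len; i++) {
        for (j = 0; j < pat_len; j++) {
            if (fold_unit(text[i + j]) != fold_unit(pat[j]))
                break;
        }
        if (j == pat_len)
            return (int32_t)i; /* every code unit of the pattern matched */
    }
    return -1;
}
```

For the four text/pattern pairs above, naive_find returns 0, 1, 2 and 2, so the third case would be expected to print "found at: 2" just like the Basic Latin one.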
OS: Windows
Tool: Visual Studio 2005
Both ICU 3.4.1 and 3.6 have this bug.
Regards,
Zoltan.