Summary: The StringSearch class (and the underlying C APIs) fails to find a match
in NFD text, but finds one in the canonically equivalent NFC text.
The pattern being searched for is 03BA 03B1 03B9 (kai).
The text being searched is 03BA 03B1 03B9 0300 (NFD) or 03BA 03B1 1F76 (NFC).
The locale used for the search is "el".
A collator for the locale is created, and set to Primary strength (so that
accents will be ignored).
A standard character break iterator for the "el" locale is used.
The StringSearch object is constructed using the primary-strength rules-based
collator and the character break iterator. When run on the NFD text, it finds no
matches. When run on the NFC text, it finds a match of length 3.
The problem seems to occur only with certain combining characters. If we
replace the 0300 with 0301 (and the 1F76 with 03AF, the corresponding
precomposed character), the StringSearch finds matches of length 4 and 3 in the
NFD and NFC text respectively. But if we use 0313 or 0314 (1F30 and 1F31 in NFC
respectively), the search again succeeds only on the NFC text.
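For reference, the four combining-mark cases above can be laid out as a small test matrix. The following is only a sketch using the standard library: the names MarkCase, markCases, nfdText and nfcText are hypothetical and not part of ICU, and the nfdSearchSucceeded field simply records what was observed in this report. The resulting UTF-16 strings could then be wrapped in UnicodeString objects and fed to StringSearch.

```cpp
#include <string>
#include <vector>

// One row of the test matrix: a combining mark, the precomposed iota that
// carries it, and whether the NFD search succeeded (as observed in this report).
struct MarkCase {
    char16_t combiningMark;
    char16_t precomposedIota;
    bool nfdSearchSucceeded;
};

// The four marks discussed in the report.
inline const std::vector<MarkCase>& markCases() {
    static const std::vector<MarkCase> cases = {
        {0x0300, 0x1F76, false},  // combining grave accent
        {0x0301, 0x03AF, true},   // combining acute accent
        {0x0313, 0x1F30, false},  // combining comma above
        {0x0314, 0x1F31, false},  // combining reversed comma above
    };
    return cases;
}

// NFD text: kappa, alpha, iota, then the combining mark.
inline std::u16string nfdText(const MarkCase& c) {
    std::u16string text = u"\u03BA\u03B1\u03B9";
    text += c.combiningMark;
    return text;
}

// NFC text: kappa, alpha, then the precomposed iota.
inline std::u16string nfcText(const MarkCase& c) {
    std::u16string text = u"\u03BA\u03B1";
    text += c.precomposedIota;
    return text;
}
```

Looping over markCases() and searching both nfdText(c) and nfcText(c) for the pattern 03BA 03B1 03B9 would cover every combination described above.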
A concise sample program that reproduces the problem can be provided if desired,
but essentially the code (without error checking) is:
UErrorCode nErrorCode = U_ZERO_ERROR;
// The L"..." literals assume a 16-bit wchar_t, as on Windows.
UnicodeString pattern(L"\x03BA\x03B1\x03B9");
UnicodeString target1D(L"\x03BA\x03B1\x03B9\x0300"); // NFD text
Locale locale("el");
BreakIterator * pBreakIterator = BreakIterator::createCharacterInstance(locale, nErrorCode);
Collator * pCollator = Collator::createInstance(locale, nErrorCode);
pCollator->setStrength(Collator::PRIMARY); // ignore accents
RuleBasedCollator * pRuleBasedCollator = static_cast<RuleBasedCollator *>(pCollator);
StringSearch ss(pattern, target1D, pRuleBasedCollator, pBreakIterator, nErrorCode);
int pos = ss.first(nErrorCode);
/* pos = USEARCH_DONE; expected 0 */