Ticket #4279 (closed defect: fixed)

Bug contains 2 commit(s) | SVN Diffs for #4279

 

Opened 5 years ago

Last modified 2 years ago

StringSearch succeeds on NFC text but not NFD

Reported by: bgrainger(at)hotmail.com Assigned to: michaelow
Priority: critical Milestone: 4.0
Component: collation Version: 3.2
Keywords: Cc:
Load: Xref: 5024
Java Version: Operating System: win32
Project (C/J): ICU4C Weeks: 1
Review: srl

Description (Last modified by srl)

Summary: The StringSearch class (and underlying C APIs) fails to find a match in NFD text, but will find a match in the equivalent NFC text.

The pattern being searched for is 03BA 03B1 03B9 (kai). The text being searched is 03BA 03B1 03B9 0300 (NFD) or 03BA 03B1 1F76 (NFC). The locale used for the search is "el". A collator for the locale is created, and set to Primary strength (so that accents will be ignored). A standard character break iterator for the "el" locale is being used.

The StringSearch object is constructed using the primary-strength rules-based collator and the character break iterator. When run on the NFD text, it finds no matches. When run on the NFC text, it finds a match of length 3.

This problem only seems to occur with certain combining characters, however. If we replace the 0300 with 0301 (and the 1F76 with 03AF, the corresponding precomposed character), the StringSearch finds matches of length 4 and 3 in NFD and NFC respectively. But if we use 0313 or 0314 (1F30 and 1F31 in NFC respectively), the search again only succeeds on the NFC text.
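
The four cases described above can be written down as a small table. What follows is only a sketch in standard C++ with no ICU calls: the names IotaCase, kCases and makeTargets are hypothetical helpers introduced here for illustration, and the nfdMatchReported values simply restate the results reported in this ticket rather than anything re-measured.

```cpp
#include <array>
#include <string>
#include <utility>

// One row per combining mark discussed in this ticket.
// All names in this block are illustrative; none of them are ICU APIs.
struct IotaCase {
    char16_t combiningMark;  // mark appended to iota (U+03B9) in the NFD text
    char16_t precomposed;    // precomposed iota used in the NFC text
    bool nfdMatchReported;   // does the ticket report a match on the NFD text?
};

const std::array<IotaCase, 4> kCases = {{
    {0x0300, 0x1F76, false},  // NFD search finds nothing
    {0x0301, 0x03AF, true},   // NFD search finds a match of length 4
    {0x0313, 0x1F30, false},  // NFD search finds nothing
    {0x0314, 0x1F31, false},  // NFD search finds nothing
}};

// Builds the (NFD, NFC) target pair for one case:
// kappa, alpha, then iota carrying the mark in decomposed or precomposed form.
inline std::pair<std::u16string, std::u16string> makeTargets(const IotaCase& c) {
    std::u16string nfd = u"\u03BA\u03B1\u03B9";
    nfd.push_back(c.combiningMark);
    std::u16string nfc = u"\u03BA\u03B1";
    nfc.push_back(c.precomposed);
    return {nfd, nfc};
}
```

Each pair produced by makeTargets could then be run through the same StringSearch setup to check all four marks, with the pattern 03BA 03B1 03B9 expected at position 0 in both forms.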

A concise sample program that reproduces the problem can be provided if desired, but essentially the code (without error checking) is:

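#include <unicode/brkiter.h>
#include <unicode/coll.h>
#include <unicode/stsearch.h>
#include <unicode/tblcoll.h>
#include <unicode/unistr.h>

UErrorCode nErrorCode = U_ZERO_ERROR;
// unescape() avoids relying on wchar_t being 16 bits wide, as L"..." does.
UnicodeString pattern = UnicodeString("\\u03BA\\u03B1\\u03B9").unescape();
UnicodeString target1D = UnicodeString("\\u03BA\\u03B1\\u03B9\\u0300").unescape();  // NFD text
Locale locale("el");
BreakIterator * pBreakIterator =
    BreakIterator::createCharacterInstance(locale, nErrorCode);
Collator * pCollator = Collator::createInstance(locale, nErrorCode);
pCollator->setStrength(Collator::PRIMARY);  // primary strength ignores accents
// The cast assumes createInstance returned a RuleBasedCollator.
RuleBasedCollator * pRuleBasedCollator =
    static_cast<RuleBasedCollator *>(pCollator);
StringSearch ss(pattern, target1D, pRuleBasedCollator, pBreakIterator,
                nErrorCode);
int32_t pos = ss.first(nErrorCode);
/* pos = USEARCH_DONE; expected 0 */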

Change History

12/31/69 17:29:15 changed by auditor

  • Mon Dec 6 14:29:39 2004 weiv changed notes2: assign: "" to "weiv", priority: "" to "critical", target: "UNSCH" to "3.4",
  • Mon Dec 6 14:29:39 2004 weiv moved from incoming to collation
  • Fri Jan 7 02:02:31 2005 weiv changed notes2: weeks: "" to "1",
  • Wed Jul 13 15:04:47 2005 weiv changed notes2: target: "3.4" to "3.6",
  • Tue Jan 31 12:13:15 2006 weiv changed notes2: xref: "" to "5024",
  • Fri Mar 31 14:23:40 2006 ram changed notes2: target: "3.6" to "3.8",
  • Fri Oct 13 18:17:03 2006 andy changed notes2: target: "3.8" to "3.8 Candidate",

04/25/07 11:27:17 changed by srl

  • load changed.
  • java changed.
  • description changed.
  • revw changed.

10/04/07 12:09:00 changed by grhoten

  • keywords deleted.
  • owner changed from weiv to michaelow.
  • milestone changed from 3.8 candidate to 4.0.

10/18/07 16:31:57 changed by michaelow

  • revw set to srl.

This bug is fixed by the fix for ticket 5950. When trying to reproduce this problem, the pattern is now found in both the NFD and the NFC text.

11/06/07 15:43:49 changed by michaelow

  • status changed from new to assigned.

12/12/07 10:44:18 changed by srl

  • status changed from assigned to closed.
  • resolution set to fixed.
