Ticket #5024 (closed defect: fixed)

Bug contains 6 commit(s) | SVN Diffs for #5024

 

Opened 3 years ago

Last modified 1 year ago

StringSearch with Secondary strength collator fails to match diacritic at end of input

Reported by: john_thomson(at)sil.org Assigned to: michaelow
Priority: critical Milestone: 4.0
Component: collation Version: 3.4
Keywords: Cc:
Load: Xref: 4279, 5382
Java Version: Operating System: win32
Project (C/J): ICU4C and ICU4J Weeks:
Review: srl

Description (Last modified by srl)

A StringSearch object with its default collator's strength set to Collator::SECONDARY fails to match a pattern which occurs at the end of the input string, even if it matches exactly.

For example, the following code sets m_ichMinFound to -1:

	StringSearch * m_piter;
	UErrorCode m_error = U_ZERO_ERROR;
	int m_ichMinFound;

	u_setDataDirectory("c:\\fw\\distfiles\\Icu34\\icudt34l"); // Happens to be where my data lives :-)
	u_init(&m_error);
	wchar_t * pattern = L"a\xe1"; // L"aá" NFC
	wchar_t * searchData = L"aa\xe1"; // L"aaá" NFC
	m_piter = new StringSearch(pattern, searchData, Locale::createFromName("en"), 0, m_error);
	m_piter->getCollator()->setStrength(Collator::SECONDARY);
	m_piter->reset(); // seems to be needed to make MOST cases work after setting strength

	m_ichMinFound = m_piter->first(m_error);

On the other hand, the code works as expected if an additional 'a' is appended to both strings, setting m_ichMinFound to 1.

It may be that there is a more appropriate way to request diacritic-sensitive matching when initializing a StringSearch using a Locale. If so, I failed to find it (when searching the 2.8 docs some time ago). But this approach works for almost all cases; it seems to me it should either work for all cases or clearly return an error condition.

Attachments

Change History

12/31/69 17:29:27 changed by auditor

  • Tue Jan 31 12:13:23 2006 weiv changed notes2: assign: "" to "weiv", priority: "" to "critical", target: "UNSCH" to "", xref: "" to "4279",
  • Tue Jan 31 12:13:23 2006 weiv moved from incoming to collation
  • Fri Oct 13 18:39:47 2006 andy changed notes2: target: "UNSCH" to "3.8 Candidate",

04/25/07 11:34:24 changed by srl

  • load changed.
  • weeks changed.
  • java changed.
  • description changed.
  • revw changed.

04/26/07 10:15:19 changed by andy

  • xref changed from 4279 to 4279, 5382.

10/04/07 12:08:38 changed by grhoten

  • keywords deleted.
  • owner changed from weiv to michaelow.
  • milestone changed from 3.8 candidate to 4.0.

10/19/07 15:19:21 changed by michaelow

  • revw set to srl.

The failure to match a diacritic at the end of the input is caused by the call to getCE in hasAccentsAfterMatch. The search itself is done correctly; however, when hasAccentsAfterMatch is called, the last collation element checked is masked by getCE, which changes it from UCOL_NULLORDER to some other value and causes the match to fail. After removing this line of code, the search returns the correct value.
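
The masking effect described above can be illustrated with a small standalone sketch. It does not use ICU: the mask constants, the sentinel value, and the helpers maskFor, applyMask and isEndOfInput are local stand-ins, assumed to mirror a 32-bit collation element with 16 primary, 8 secondary and 8 tertiary bits and an all-bits-set UCOL_NULLORDER. The real getCE does more than apply a mask, so this only shows why an end-of-input sentinel might stop comparing equal once a secondary-strength mask has been applied.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins, NOT taken from ICU headers. Assumed layout of a 32-bit
// collation element: 16-bit primary, 8-bit secondary, 8-bit tertiary weight.
constexpr std::uint32_t kPrimaryMask   = 0xFFFF0000u;
constexpr std::uint32_t kSecondaryMask = 0x0000FF00u;
constexpr std::uint32_t kTertiaryMask  = 0x000000FFu;

// Stand-in for UCOL_NULLORDER, assumed to be the all-bits-set sentinel
// returned when there are no more collation elements.
constexpr std::uint32_t kNullOrder = 0xFFFFFFFFu;

enum class Strength { Primary, Secondary, Tertiary };

// Hypothetical helper: the weight bits that are compared at a given strength.
constexpr std::uint32_t maskFor(Strength s) {
    return s == Strength::Primary   ? kPrimaryMask
         : s == Strength::Secondary ? (kPrimaryMask | kSecondaryMask)
                                    : (kPrimaryMask | kSecondaryMask | kTertiaryMask);
}

// Hypothetical helper: masks a collation element unconditionally, without
// checking whether it is the sentinel first.
constexpr std::uint32_t applyMask(std::uint32_t ce, Strength s) {
    return ce & maskFor(s);
}

// Hypothetical helper: the end-of-input test a caller would perform.
constexpr bool isEndOfInput(std::uint32_t ce) {
    return ce == kNullOrder;
}

void demonstrateSentinelLoss() {
    // The tertiary mask keeps every bit, so the sentinel survives masking.
    assert(isEndOfInput(applyMask(kNullOrder, Strength::Tertiary)));

    // The secondary mask clears the low 8 bits, turning the sentinel into
    // 0xFFFFFF00, which no longer compares equal to the sentinel.
    assert(applyMask(kNullOrder, Strength::Secondary) == 0xFFFFFF00u);
    assert(!isEndOfInput(applyMask(kNullOrder, Strength::Secondary)));
}
```

Under these assumptions the sentinel is only damaged when the mask drops some bits, which would be consistent with the problem showing up at SECONDARY strength and only when the match ends exactly at the end of the input.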

11/06/07 15:44:25 changed by michaelow

  • status changed from new to assigned.

11/30/07 14:23:17 changed by srl

  • status changed from assigned to closed.
  • resolution set to fixed.
