Ticket #5420 (assigned defect)

Bug contains 95 commit(s) | SVN Diffs for #5420

 

Opened 2 years ago

Last modified 6 months ago

String search matching problem

Reported by: andrei(at)yahoo-inc.com Assigned to: eric (accepted)
Priority: major Milestone: 4.0
Component: collation Version: 3.4
Keywords: collation Cc:
Load: Xref: 5382 3315 5382 5959
Java Version: Operating System: unix
Project (C/J): ICU4C Weeks: 2
Review: andy

Description (Last modified by srl)

I am on FreeBSD 4.

Using the samples/strsrch/strsrch.cpp program, I ran the following tests:

% ./strsrch -level 1 -source fu\\u00dfball -pattern fu\\u00df
Finding pattern fu\u00df in source fu\u00dfball
Pattern found at offset 0 size 3
 
% ./strsrch -level 1 -source fu\\u00dfball -pattern fuss
Finding pattern fuss in source fu\u00dfball
Pattern not found in source
 
% ./strsrch -level 1 -source fu\\u00dfball -pattern uss
Finding pattern uss in source fu\u00dfball
Pattern found at offset 1 size 2

My question is: why is "uss" found in "fußball", but "fuss" is not?

Attachments

Change History

12/31/69 17:29:34 changed by auditor

  • Mon Oct 2 09:01:05 2006 guest sent reply 1
  • Sat Oct 7 09:36:53 2006 guest sent reply 2
  • Sat Oct 7 09:43:37 2006 guest sent reply 3
  • Mon Oct 16 18:42:41 2006 grhoten changed notes2: assign: "" to "andy", priority: "" to "high", target: "UNSCH" to "3.8", weeks: "" to "0.5",
  • Mon Oct 16 18:42:41 2006 grhoten moved from incoming to collation
  • Fri Oct 20 23:02:57 2006 andy changed notes2: priority: "high" to "assess",

10/02/06 09:01:04 changed by hbrands(at)web.de

(Guest Reply)

I just wanted to add that this problem is also present in the latest ICU4J version (3.6).

10/07/06 09:36:53 changed by jplemieux(at)gmail.com

(Guest Reply)

I am also using the ICU4J library and my users are experiencing this bug. +1 from me for its priority...

10/07/06 09:43:37 changed by jplemieux(at)gmail.com

(Guest Reply)

I am also using the ICU4J library and my users are experiencing this bug. +1 from me for its priority...

04/25/07 11:35:56 changed by srl

  • load changed.
  • xref changed.
  • java changed.
  • description changed.
  • revw changed.

08/31/07 10:58:07 changed by andy

  • xref set to 5382.
  • milestone changed from 3.8 to 4.0.

The problem is confirmed, but the fix will not be easy. Ticket 5382 probably has the same cause.

09/28/07 13:43:17 changed by andy

  • priority changed from assess to major.

01/08/08 10:50:30 changed by eric

  • owner changed from andy to eric.
  • status changed from new to assigned.
  • weeks changed from 0.5 to 2.

05/28/08 11:26:40 changed by eric

  • xref changed from 5382 to 5382 3315 5382 5959.
  • revw set to andy.

This is a fundamental problem with the Boyer-Moore implementation. The value returned by getMaxExpansion is not sufficient in all cases for computing skip distances.

As an interim fix, the Boyer-Moore implementation was replaced by a linear search, which is slower but accurate.

The problems reported in tickets 3315, 5382 and 5959 have the same root cause.
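
For illustration only, the idea of a linear search over collation elements can be sketched as below. This is not ICU's code or the actual interim fix. The names toElements, findLinear and reproduceTicketCases are hypothetical, and the "collator" is a toy that assumes U+00DF expands to two "s" weights at primary strength while every other code unit maps to itself. Each element remembers which source code unit produced it, so a match is reported in source offsets and may not start or end in the middle of an expansion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One toy "collation element": a primary weight plus the index of the
// source code unit that produced it.
struct Element {
    char16_t weight;
    std::size_t sourceIndex;
};

struct Match {
    bool found;
    std::size_t offset;  // first matched code unit in the source
    std::size_t size;    // number of matched code units
};

// Hypothetical stand-in for a primary-strength collator (an assumption,
// not ICU data): U+00DF expands to two 's' weights.
std::vector<Element> toElements(const std::u16string& text) {
    std::vector<Element> out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == u'\u00df') {
            out.push_back({u's', i});
            out.push_back({u's', i});
        } else {
            out.push_back({text[i], i});
        }
    }
    return out;
}

// Linear scan: try every element position, with no skip table.
Match findLinear(const std::u16string& source, const std::u16string& pattern) {
    const std::vector<Element> src = toElements(source);
    const std::vector<Element> pat = toElements(pattern);
    const Match none = {false, 0, 0};
    if (pat.empty() || pat.size() > src.size()) return none;

    for (std::size_t start = 0; start + pat.size() <= src.size(); ++start) {
        const std::size_t end = start + pat.size();  // one past the last element
        // Reject candidates that begin or end inside an expansion.
        if (start > 0 && src[start - 1].sourceIndex == src[start].sourceIndex) continue;
        if (end < src.size() && src[end].sourceIndex == src[end - 1].sourceIndex) continue;

        bool equal = true;
        for (std::size_t k = 0; k < pat.size(); ++k) {
            if (src[start + k].weight != pat[k].weight) { equal = false; break; }
        }
        if (equal) {
            const std::size_t first = src[start].sourceIndex;
            const std::size_t last = src[end - 1].sourceIndex;
            return Match{true, first, last - first + 1};
        }
    }
    return none;
}

// The three searches from the ticket description, under the toy mapping.
void reproduceTicketCases() {
    const std::u16string source = u"fu\u00dfball";
    const Match a = findLinear(source, u"fu\u00df");
    assert(a.found && a.offset == 0 && a.size == 3);
    const Match b = findLinear(source, u"fuss");
    assert(b.found && b.offset == 0 && b.size == 3);
    const Match c = findLinear(source, u"uss");
    assert(c.found && c.offset == 1 && c.size == 2);
}
```

Under these assumptions "fuss" and "uss" are both found in "fu\u00dfball", which is the consistent behaviour the reporter expected. The scan costs time proportional to source length times pattern length, in line with the note above that the linear search is slower but accurate.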

