Ticket #5382 (closed defect: fixed)


Opened 2 years ago

Last modified 1 month ago

Bug in the search module

Reported by: nzoltan(at)freemail.hu
Assigned to: eric
Priority: major
Milestone: 4.0
Component: collation
Version: 3.6
Keywords: collation
Cc:
Load:
Xref: 5420
Java Version:
Operating System: all
Project (C/J): ICU4C
Weeks: 2
Review: andy

Description

Hi,

I found a bug in the search module. If the pattern contains non-basic Latin letters (code points > 127), case is ignored, and the pattern occurs at the end of the text, usearch_next does not find the match. I demonstrate this with a Hungarian character pair (small/capital letter).

Example:


    /* needs <stdio.h>, <unicode/ucol.h> and <unicode/usearch.h> */

    UErrorCode uerr = U_ZERO_ERROR;   /* status must start out as U_ZERO_ERROR */
    UStringSearch *usearch;

    /* 0x31 and 0x32 are just two dummy characters for demonstration */

    /* this will fail, but see below */
    UChar text[] = { 0x31, 0x32, 0x171 };
    UChar pattern[] = { 0x170 };

    usearch = usearch_open(pattern, sizeof(pattern) / sizeof(UChar),
                           text, sizeof(text) / sizeof(UChar),
                           "hu_HU", NULL /* no break iterator */, &uerr);

    if (U_SUCCESS(uerr)) {
        /* ignore case */
        ucol_setStrength(usearch_getCollator(usearch), UCOL_SECONDARY);
        usearch_reset(usearch);

        int32_t match_start = usearch_next(usearch, &uerr);

        if (U_SUCCESS(uerr)) {
            if (match_start == USEARCH_DONE)
                printf("not found!\n");
            else
                printf("found at: %d\n", match_start);
        }

        usearch_close(usearch);
    }

0x171 is the small Hungarian letter 'ű'.
0x170 is the capital Hungarian letter 'Ű'.

If you use this:

UChar text[] = { 0x171, 0x31, 0x32 }; UChar pattern[] = { 0x170 };

The result is: "found at: 0"

If you use this:

UChar text[] = { 0x31, 0x171, 0x32 }; UChar pattern[] = { 0x170 };

The result is: "found at: 1"

If you use this:

UChar text[] = { 0x31, 0x32, 0x171 }; UChar pattern[] = { 0x170 };

The result is: "not found!"

If you use this (with Basic Latin letters):

UChar text[] = { 0x31, 0x32, 0x41 }; UChar pattern[] = { 0x61 };

The result is: "found at: 2"

OS: Windows
Tool: Visual Studio 2005

Both ICU 3.4.1 and 3.6 have this bug.

Regards, Zoltan.

Attachments

Change History

12/31/69 17:29:33 changed by auditor

  • Mon Oct 16 18:42:20 2006 grhoten changed notes2: assign: "" to "andy", priority: "" to "high", target: "UNSCH" to "3.8", weeks: "" to "0.5",
  • Mon Oct 16 18:42:20 2006 grhoten moved from incoming to collation

02/02/07 14:53:09 changed by andy

  • xref changed.
  • java changed.
  • revw changed.
  • milestone changed from 3.8 to 3.8 M2.

02/05/07 12:31:56 changed by andy

  • weeks changed from 0.5 to 2.

08/31/07 10:56:04 changed by andy

  • load changed.
  • xref set to 5420.
  • milestone changed from 3.8 M2 to 4.0.

01/08/08 10:50:49 changed by eric

  • owner changed from andy to eric.
  • status changed from new to assigned.

05/28/08 11:30:11 changed by eric

  • revw set to andy.

This is a fundamental problem with the Boyer-Moore implementation. The value returned by getMaxExpansion is not sufficient in all cases for computing skip distances. Fixes for this are checked into bug 5420.
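For readers unfamiliar with the term "skip distance", the following is a minimal, generic sketch of a Boyer-Moore-Horspool search over plain bytes. It is not ICU's implementation: ICU's string search works on collation elements rather than bytes, and the function name horspool_search is made up for this illustration. The sketch only shows that the skip table alone decides which text positions are ever examined, so a shift that is larger than it should be can step over a match, including one at the very end of the text.

```c
#include <stddef.h>

#define ALPHABET_SIZE 256

/*
 * Illustrative Boyer-Moore-Horspool search (hypothetical helper, not ICU code).
 * Returns the index of the first occurrence of pat in text, or -1 if none.
 */
long horspool_search(const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m)
{
    size_t skip[ALPHABET_SIZE];
    size_t i, pos;

    if (m == 0 || m > n)
        return -1;

    /* A character that does not occur in the pattern allows a full-length shift. */
    for (i = 0; i < ALPHABET_SIZE; i++)
        skip[i] = m;

    /* Characters in the pattern (last position excluded) allow a smaller shift. */
    for (i = 0; i + 1 < m; i++)
        skip[pat[i]] = m - 1 - i;

    pos = 0;
    while (pos + m <= n) {
        size_t j = m;

        /* Compare right to left at the current alignment. */
        while (j > 0 && text[pos + j - 1] == pat[j - 1])
            j--;
        if (j == 0)
            return (long)pos;

        /* Shift by the skip distance of the text character under the
         * pattern's last position. If this value were too large, the
         * alignments in between would never be tested. */
        pos += skip[text[pos + m - 1]];
    }
    return -1;
}
```

With the text "12ab" and the pattern "ab", this returns 2: the first alignment mismatches on '2', the table allows a shift of 2, and the second alignment matches at the end of the text.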

10/22/08 15:03:34 changed by andy

  • status changed from assigned to closed.
  • resolution set to fixed.

Changes are under ticket 5420

