Ticket #5993 (closed defect: fixed)

Bug contains 1 commit(s) | SVN Diffs for #5993

 

Opened 2 years ago

Last modified 2 years ago

hang in string search function usearch_first() when search string is single combining diacritical

Reported by: dmso@...
Assigned to: michaelow
Priority: major
Milestone: 4.0
Component: collation
Version: 3.8
Keywords:
Cc:
Load:
Xref:
Java Version:
Operating System:
Project (C/J): all
Weeks: 0.5
Review: srl

Description

usearch_first() never returns when the search string is a single combining diacritical. Even though this kind of search might not be logical, usearch_first() should not hang. I have tested on ICU 3.2.1 and ICU 3.8 and verified this issue exists on both. Here is a small program that demonstrates this.

#include <stdio.h>
#include "unicode/ucol.h"
#include "unicode/ubrk.h"
#include "unicode/usearch.h"

int main() {
    UChar search[] = { 0x0300 };
    UChar source[] = { 0x0020,
        0xDD3D, 0x0020,
        0xDD3D, 0x0055,
        0xDD3D, 0x0075,
        0xDD3D, 0x00D9,
        0xDD3D, 0x0055, 0x0300,
        0xDD3D, 0x0055, 0x0340,
        0xDD3D, 0x00F9,
        0xDD3D, 0x0075, 0x0300,
        0xDD3D, 0x0075, 0x0340,
        0xDD3D, 0xD978, 0xDCF9,
        0xDD3D, 0x0306,
        0x0020 };

    int32_t searchLen;
    int32_t sourceLen;
    UErrorCode icuStatus = U_ZERO_ERROR;
    UCollator *coll;
    const char *locale;
    UBreakIterator *ubrk;
    UStringSearch *usearch;
    int32_t match = 0;

    searchLen = sizeof(search)/sizeof(UChar);
    sourceLen = sizeof(source)/sizeof(UChar);

    coll = ucol_openFromShortString( "LDE_AN_CX_EX_FX_HX_NX_S2",
                                     false,
                                     NULL,
                                     &icuStatus );
    if ( U_FAILURE(icuStatus) ) {
        printf( "ucol_openFromShortString error\n" );
        goto exit;
    }

    locale = ucol_getLocaleByType( coll,
                                   ULOC_VALID_LOCALE,
                                   &icuStatus );
    if ( U_FAILURE(icuStatus) ) {
        printf( "ucol_getLocaleByType error\n" );
        goto exit;
    }

    ubrk = ubrk_open( UBRK_CHARACTER,
                      locale,
                      source,
                      sourceLen,
                      &icuStatus );
    if ( U_FAILURE(icuStatus) ) {
        printf( "ubrk_open error\n" );
        goto exit;
    }

    usearch = usearch_openFromCollator( search,
                                        searchLen,
                                        source,
                                        sourceLen,
                                        coll,
                                        NULL,
                                        &icuStatus );
    if ( U_FAILURE(icuStatus) ) {
        printf( "usearch_openFromCollator error\n" );
        goto exit;
    }

    usearch_setAttribute( usearch,
                          USEARCH_OVERLAP,
                          USEARCH_ON,
                          &icuStatus );
    if ( U_FAILURE(icuStatus) ) {
        printf( "usearch_setAttribute error\n" );
        goto exit;
    }

    match = usearch_first( usearch,
                           &icuStatus );
    if ( U_FAILURE(icuStatus) ) {
        printf( "usearch_first error\n" );
        goto exit;
    }

    printf( "match=%d\n", match );

exit:
    return 0;
}

Attachments

Change History

10/16/07 12:12:15 changed by dmso@...

More info: the repro program can be simplified by shortening the source string to contain only the character U+00D9 (LATIN CAPITAL LETTER U WITH GRAVE). The search string is still U+0300 (COMBINING GRAVE ACCENT). The hang occurs only with Strength 2 collations; it does not seem to occur with Strength 1, 3, or 4 collations.

10/24/07 11:53:25 changed by michaelow

  • owner changed from somebody to michaelow.
  • weeks changed.
  • xref changed.
  • revw changed.

10/24/07 11:54:16 changed by grhoten

  • weeks set to 0.5.
  • milestone changed from UNSCH to 4.0.

11/06/07 12:05:26 changed by michaelow

  • status changed from new to assigned.
  • revw set to srl.

The search function hangs because of an infinite loop: the collation element iterator returns UCOL_NULLORDER, which the loop condition did not check for. The loop is in the internal function checkExtraMatchAccents.
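For illustration only, here is a minimal sketch of the kind of guard described above. It is not the actual ICU patch and does not reproduce checkExtraMatchAccents. The iterator type MockCEIter, the functions mock_next and skip_to, and the constant SKETCH_NULLORDER are all hypothetical stand-ins for a collation element iterator and ICU's UCOL_NULLORDER end-of-input marker.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's UCOL_NULLORDER end-of-input marker. */
#define SKETCH_NULLORDER ((int32_t)-1)

/* Hypothetical collation-element iterator over a fixed array. */
typedef struct {
    const int32_t *ces;   /* elements to iterate over */
    size_t         length; /* number of elements */
    size_t         pos;    /* index of the next element to return */
} MockCEIter;

/* Returns the next element, or SKETCH_NULLORDER once the input is
 * exhausted. Every later call keeps returning SKETCH_NULLORDER. */
int32_t mock_next(MockCEIter *it) {
    if (it->pos >= it->length) {
        return SKETCH_NULLORDER;
    }
    return it->ces[it->pos++];
}

/* Advances the iterator until `target` is seen.
 * Returns 1 if the target was found, 0 if the iterator ran out first.
 *
 * A loop written as `while (mock_next(it) != target) {}` would spin
 * forever when the target is absent, because the exhausted iterator
 * keeps returning SKETCH_NULLORDER. Checking for the marker in the
 * loop condition is what lets the loop terminate. */
int skip_to(MockCEIter *it, int32_t target) {
    int32_t ce;
    while ((ce = mock_next(it)) != SKETCH_NULLORDER) {
        if (ce == target) {
            return 1;
        }
    }
    return 0;
}
```

With an iterator over three elements, calling skip_to for a value that is present stops just past it, and calling it for a value that is absent returns 0 once the input runs out instead of looping.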

06/27/08 13:02:16 changed by srl

  • status changed from assigned to closed.
  • resolution set to fixed.
