Ticket #2934 (closed enhancement: wontfix)


Opened 6 years ago

Last modified 9 months ago

RFE: Thai-Latin transform in ICU4C

Reported by: ram(at)jtcsv.com Assigned to: srl
Priority: minor Milestone: 3.0
Component: unknown Version:
Keywords: returned Cc:
Load: Xref: 2034, 4009
Java Version: Operating System: all
Project (C/J): ICU4C,ICU4J and ICU4JNI Weeks: 0.2
Review:

Description (Last modified by srl)

ICU4J has a Thai-Latin transform. It should either be ported to ICU4C, or special syntax needs to be added to Transform to invoke a break iterator.

<quote author="Alan"> The Thai transliterator uses a temporary break-iterator solution that Mark put together. My guess is that we never ported it to C, since the "real" solution was to implement "\b" syntax in the transliterator (which would invoke a break iterator). So this remained a Java-only implementation. </quote>

=======Test Code====================

class LegalThai : public Legal {
private:
    BreakIterator* thaiBreak;
    // anything is legal except word ending with Logical-order-exception

public:
    // Initialize the pointer so the destructor is safe if we return early.
    LegalThai(UErrorCode& status) : thaiBreak(NULL) {
        if (U_FAILURE(status)) {
            return;
        }
        thaiBreak = BreakIterator::createWordInstance(Locale("th", "TH"), status);
    }

    ~LegalThai() {
        delete thaiBreak;
    }

    UBool is(UnicodeString sourceString) {
        if (sourceString.length() == 0) return TRUE;
        // don't worry about surrogates.
        UChar32 ch = sourceString.charAt(sourceString.length() - 1);
        if (u_hasBinaryProperty(ch, UCHAR_LOGICAL_ORDER_EXCEPTION)) return FALSE;
        return TRUE;
    }
};

void TransliteratorRoundTripTest::TestThai() {
    RTTest test("Latin-Thai");
    UErrorCode status = U_ZERO_ERROR;
    LegalThai lt(status);
    if (U_FAILURE(status)) {
        errln("Could not create a LegalThai object. Error : %s", u_errorName(status));
        return;
    }
    test.test("[a-zA-Z\\u0142\\u1ECD\\u00E6\\u0131\\u0268\\u02CC]",
              "[\\u0E01-\\u0E3A\\u0E40-\\u0E5B]",
              "[a-zA-Z\\u0142\\u1ECD\\u00E6\\u0131\\u0268\\u02B9\\u02CC]",
              this, quick, &lt, 50);
}
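The legality rule used by LegalThai::is() can be tried without an ICU build. The following is a minimal standalone sketch, not the ICU test code. It assumes the only Logical_Order_Exception characters of interest are the Thai leading vowels U+0E40 to U+0E44, and it hard-codes that range instead of calling u_hasBinaryProperty. The names isThaiLeadingVowel and isLegalThaiSource are hypothetical.

```cpp
#include <string>

// Assumption: the Thai leading vowels U+0E40..U+0E44 stand in for ICU's
// u_hasBinaryProperty(ch, UCHAR_LOGICAL_ORDER_EXCEPTION). Other scripts
// with this property (for example Lao) are deliberately ignored here.
static bool isThaiLeadingVowel(char16_t ch) {
    return ch >= 0x0E40 && ch <= 0x0E44;
}

// Same rule as LegalThai::is(): an empty string is legal, and any other
// string is legal unless its last code unit is a leading vowel.
// Surrogates are not handled, as in the ticket's test code.
static bool isLegalThaiSource(const std::u16string& source) {
    if (source.empty()) {
        return true;
    }
    return !isThaiLeadingVowel(source.back());
}
```

Under these assumptions, a string ending in U+0E40 (SARA E) is rejected because a leading vowel with no following consonant cannot round-trip, while the same vowel followed by a consonant is accepted.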

Attachments

Change History

12/31/69 18:06:36 changed by notes2

12/31/69 18:06:37 changed by auditor

  • 06/05/03 18:17:48 mark changed notes2
  • 06/07/03 11:35:15 schererm moved from incoming to transliterate
  • 08/11/03 20:51:16 hshih changed notes2
  • 02/04/04 17:15:25 alan changed notes2
  • 02/04/04 17:15:44 alan changed notes2
  • 02/17/04 17:47:01 weiv changed notes2
  • 02/26/04 13:36:08 weiv changed notes2
  • 02/26/04 13:36:08 weiv moved from transliterate to returned

04/09/08 13:40:28 changed by srl

  • load changed.
  • java changed.
  • description changed.
  • revw changed.

NB the "private comments" say: "will be fixed with #2034, return"

Internal documents from 20040225 say: 2934 Thai-Latin transform, Alan: this bug [adding a new Thai-Latin transliterator to ICU4C] is not the right approach. Jitterbug #2034, adding a Unicode Word Boundary construct ( \b ) to transliteration is the correct approach, and also fixes #2934.

04/09/08 13:41:49 changed by srl

  • xref changed from 2034 to 2034, 4009.

Bug #4009 seems to have picked up the 'break transliterator'

