Ticket #5589 (assigned defect)

Opened 2 years ago

Last modified 6 months ago

Thai layout broken for <0E25, 0E37, 0E4C>?

Reported by: markus
Assigned to: eric (accepted)
Priority: minor
Milestone: 4.2
Component: layout
Version:
Keywords:
Cc: lokeshjoshi@gmail.com
Load:
Xref: 2382, 2386, 4740
Java Version:
Operating System:
Project (C/J): ICU4C
Weeks: 1
Review:

Description

Thread "Query for Validity of Thai Sequence" on the Unicode mailing list, 2007-02-07 through 2007-02-09.

Lokesh Joshi asks "...can anyone pls confirm that the thai unicode sequence:

U+0E25 (THAI CHARACTER LO LING) U+0E37 (THAI CHARACTER SARA UEE) U+0E4C (THAI CHARACTER THANTHAKHAT)

is a valid sequence..."

and says "...ICU shows the sequence as illegal, showes CIRCLE below the THANTHAKHAT."

John Hudson replies: "I tested this sequence with the version of the Uniscribe Thai engine that ships with Vista, and it works fine, i.e. no dotted cirles."

Change History

02/09/07 14:07:33 changed by markus

  • cc set to Lokesh, Joshi, <lokeshjoshi@gmail.com>.

02/09/07 14:07:48 changed by markus

  • cc changed from Lokesh, Joshi, <lokeshjoshi@gmail.com> to lokeshjoshi@gmail.com.

02/15/07 10:57:42 changed by eric

According to http://www.nectec.or.th/it-standards/keyboard_layout/thai-key.htm and http://www.inet.co.th/cyberclub/trin/thairef/wtt2/char-class.pdf, the ICU implementation is correct. They say that U+0E37 has class AV3 and that U+0E4C has class AD1, and that AV1 does not compose with AV3.

02/15/07 11:01:40 changed by eric

Sorry, make that "*AD1* does not compose with AV3." I'm checking w/ IBM Thailand to confirm whether or not the sequence is legal.
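
To illustrate the class-pair rule described in the two comments above, here is a minimal Python sketch. It is not ICU's Thai engine; the names `WTT_CLASS`, `ILLEGAL_PAIRS`, and `strict_render` are hypothetical, the table covers only the three code points in this ticket, and treating U+0E25 as class CONS is an assumption (the ticket only gives the classes for U+0E37 and U+0E4C).

```python
# Illustrative subset only. AV3 and AD1 come from the ticket comments;
# the CONS class for U+0E25 is an assumption.
WTT_CLASS = {
    0x0E25: "CONS",  # THAI CHARACTER LO LING (assumed class)
    0x0E37: "AV3",   # THAI CHARACTER SARA UEE
    0x0E4C: "AD1",   # THAI CHARACTER THANTHAKHAT
}

# (previous class, next class) pairs that strict checking rejects.
# Only the single pair named in this ticket is listed here.
ILLEGAL_PAIRS = {("AV3", "AD1")}

DOTTED_CIRCLE = "\u25CC"


def strict_render(text):
    """Insert a dotted circle before any mark that forms an illegal pair."""
    out = []
    prev_class = None
    for ch in text:
        cls = WTT_CLASS.get(ord(ch))
        if (prev_class, cls) in ILLEGAL_PAIRS:
            # The mark cannot attach to the preceding character, so give it
            # a dotted circle to sit on instead.
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev_class = cls
    return "".join(out)
```

With this table, `strict_render("\u0E25\u0E37\u0E4C")` puts a dotted circle under the THANTHAKHAT, which matches the behavior reported in the description.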

02/16/07 10:17:32 changed by eric

  • status changed from new to assigned.
  • weeks set to 1.
  • xref set to 2382, 2386, 4740.

Suwit Srivilairith, my contact at IBM Thailand, confirms that the sequence is illegal according to WTT 2.0, which is a Thai national standard. Other posters on the Unicode list thread point out, however, that the strict checking of WTT 2.0 makes it impossible to write Pali and Sanskrit using the Thai script. Another poster suggested that the strict checking of WTT 2.0 should more correctly be done in a spell checker rather than in the rendering engine, which makes sense to me.

Here's the Microsoft Thai OT spec. section about invalid combining marks: http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx

Therefore, I'm inclined to think that more relaxed checking is in order. This, unfortunately, will require a redesign of the Thai engine. Perhaps this should be done in conjunction w/ the work for Thai and Lao OpenType processing, and perhaps the generic mark checking. (Tickets 2382, 2386, 4740)
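
As one possible reading of "more relaxed checking", the sketch below accepts any run of combining marks after a letter and inserts a dotted circle only when a mark has no base at all. This is an assumption for illustration, not the redesign proposed here and not what any shipping engine does; `relaxed_render` is a hypothetical name, and the base/mark test uses Unicode general categories from Python's `unicodedata` module rather than WTT 2.0 classes.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def relaxed_render(text):
    """Insert a dotted circle only before marks that have no base letter."""
    out = []
    have_base = False
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Combining mark: acceptable as long as something precedes it.
            if not have_base:
                out.append(DOTTED_CIRCLE)
                have_base = True  # the circle now acts as the base
        else:
            # Only letters count as a base in this sketch.
            have_base = category.startswith("L")
        out.append(ch)
    return "".join(out)
```

Under this rule the sequence <0E25, 0E37, 0E4C> passes through unchanged, while a lone U+0E4C still gets a dotted circle. Whether a given stack of marks is orthographically valid would then be left to a spell checker, as suggested on the list.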

07/10/08 10:40:53 changed by yoshito

  • load changed.
  • priority changed from major to assess.

07/21/08 09:07:11 changed by hchapman

  • priority changed from assess to minor.
  • milestone changed from UNSCH to 4.2.
