Ticket #5880 (assigned defect)

Opened 1 year ago

Last modified 1 year ago

narrow (and hardcode?) JIS X 0201 for ISO-2022-JP

Reported by: markus
Assigned to: markus (accepted)
Priority: minor
Milestone: 4.0
Component: conversion
Version: Current
Keywords:
Cc:
Load:
Xref: 5797
Java Version:
Operating System:
Project (C/J): ICU4C
Weeks: 0.2
Review: jungshik

Description

It looks like the ISO-2022-JP converters (all versions) will output C0 control code bytes when using fallbacks from the JIS X 0201 converter. That converter currently uses the ibm-897_P100-1995.ucm table, which has fallbacks for PC control image look-alikes. This could be particularly problematic for the fallbacks to bytes that also encode SO/SI/ESC:

<UFFED> \x0E |1
<U263C> \x0F |1
<U21B5> \x1B |1

The best solution seems to be to hardcode the JIS X 0201 conversion in ucnv2022.c, omitting the fallbacks to C0 control code bytes. Otherwise, it's mostly the same as ASCII (except for 5C and 7E) and halfwidth Katakana, plus fallbacks for fullwidth ASCII. This is very easy to do in a few lines of code.
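A minimal sketch of what such a hardcoded mapping might look like, for illustration only. The function name, the sentinel value, and the fallback flag are hypothetical and are not taken from ucnv2022.c. Excluding the fullwidth backslash and tilde from the fallbacks is likewise an assumption, made because 5C and 7E are not backslash and tilde in JIS X 0201. Special handling of ESC/SO/SI as input characters is assumed to stay with the ISO-2022 converter itself.

```c
#include <stdint.h>

/* Hypothetical sentinel for "no mapping"; not an ICU constant. */
#define JISX201_UNMAPPED (-1)

/*
 * Sketch: map a Unicode code point to a single JIS X 0201 byte.
 * Sets *isFallback to 1 for one-way (fallback) mappings, else 0.
 * There are no fallbacks to C0 control bytes, so U+FFED, U+263C
 * and U+21B5 are simply unmapped.
 */
static int jisx201_from_unicode(uint32_t c, int *isFallback) {
    *isFallback = 0;
    if (c <= 0x7F) {
        /* Same as ASCII, except that 5C and 7E are taken by yen and overline. */
        return (c != 0x5C && c != 0x7E) ? (int)c : JISX201_UNMAPPED;
    }
    if (c == 0x00A5) {          /* YEN SIGN */
        return 0x5C;
    }
    if (c == 0x203E) {          /* OVERLINE */
        return 0x7E;
    }
    if (c >= 0xFF61 && c <= 0xFF9F) {
        /* Halfwidth Katakana block maps linearly to A1..DF. */
        return (int)(c - 0xFF61 + 0xA1);
    }
    if (c >= 0xFF01 && c <= 0xFF5E && c != 0xFF3C && c != 0xFF5E) {
        /* Fallback: fullwidth ASCII to its ASCII counterpart
         * (assumption: fullwidth backslash and tilde are excluded). */
        *isFallback = 1;
        return (int)(c - 0xFEE0);
    }
    return JISX201_UNMAPPED;
}
```

With a mapping like this, the three look-alike characters listed above yield no byte at all, so they can no longer produce a stray SO, SI, or ESC in the output.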

Then if JIS X 0201 is not needed as a separate converter, this .cnv table file can be removed.

Change History

08/25/07 20:33:03 changed by grhoten

The table can be removed, but when the old jisx201 table is resurrected from ICU 3.0, the table can be customized as needed (if at all).

There is no need to hard code the table into the source code. I'd like to share this data between ICU4J and ICU4C, and keeping hard coded data in sync is problematic.

10/05/07 10:08:12 changed by markus

  • status changed from new to assigned.

10/11/07 15:15:38 changed by markus

  • xref set to 5797.
  • revw set to jungshik.

implemented as part of #5797

