Ticket #5043 (new enhancement)

Bug contains 1 commit(s) | SVN Diffs for #5043

 

Opened 3 years ago

Last modified 1 year ago

UText provider for lenient-UTF-8 strings

Reported by: markus.scherer(at)us.ibm.com
Assigned to: markus
Priority: minor
Milestone: UNSCH
Component: strings
Version:
Keywords:
Cc:
Load:
Xref: 4776
Java Version:
Operating System: all
Project (C/J): ICU4C
Weeks: 1.5
Review:

Description

Add a UText provider implementation for lenient-UTF-8 strings in the sense described in the uciter8 sample. There, the uit_len8.c file describes this as follows:

  This code leniently reads 8-bit Unicode strings,
  which could contain a mix of UTF-8 and CESU-8.
  More precisely:
  - supplementary code points may be encoded with dedicated 4-byte sequences
    (UTF-8 style)
  - supplementary code points may be encoded with
    pairs of 3-byte sequences, one for each surrogate of the UTF-16 form
    (CESU-8 style)
  - single surrogates are allowed, encoded with their "natural" 3-byte
    sequences
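The three rules quoted above can be illustrated with a minimal C sketch. This is not the uciter8 sample code and not a UText provider; it only shows how a forward "next code point" step might apply the lenient rules. The names lenient8_next, decode3 and is_trail are hypothetical, and the rejection of overlong or truncated sequences with -1 is an assumption of this sketch rather than something the ticket specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if b is a UTF-8 trail byte (10xxxxxx). */
static int is_trail(uint8_t b) { return (b & 0xC0) == 0x80; }

/* Hypothetical helper: decode one 3-byte sequence at s[i..i+3).
 * Surrogate code points (U+D800..U+DFFF) are accepted on purpose.
 * Returns -1 if the sequence is truncated, malformed, or overlong. */
static int32_t decode3(const uint8_t *s, size_t length, size_t i) {
    if (length - i < 3 || (s[i] & 0xF0) != 0xE0 ||
        !is_trail(s[i + 1]) || !is_trail(s[i + 2])) {
        return -1;
    }
    int32_t c = ((int32_t)(s[i] & 0x0F) << 12) |
                ((int32_t)(s[i + 1] & 0x3F) << 6) |
                (int32_t)(s[i + 2] & 0x3F);
    return c >= 0x800 ? c : -1;
}

/* Hypothetical function: read one code point from the lenient-UTF-8
 * buffer s[0..length) at *pos and advance *pos past it.
 * Returns -1 and advances by one byte on an ill-formed sequence
 * (an assumption of this sketch). */
int32_t lenient8_next(const uint8_t *s, size_t length, size_t *pos) {
    assert(s != NULL && pos != NULL && *pos < length);
    size_t i = *pos;
    uint8_t b = s[i];

    if (b < 0x80) {                         /* ASCII */
        *pos = i + 1;
        return b;
    }
    if ((b & 0xE0) == 0xC0) {               /* 2-byte sequence */
        if (b >= 0xC2 && length - i >= 2 && is_trail(s[i + 1])) {
            *pos = i + 2;
            return ((int32_t)(b & 0x1F) << 6) | (int32_t)(s[i + 1] & 0x3F);
        }
    } else if ((b & 0xF0) == 0xE0) {        /* 3-byte sequence */
        int32_t c = decode3(s, length, i);
        if (c >= 0) {
            *pos = i + 3;
            if (c >= 0xD800 && c <= 0xDBFF) {
                /* CESU-8 style: a lead surrogate followed by a trail
                 * surrogate combines into one supplementary code point. */
                int32_t c2 = decode3(s, length, i + 3);
                if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
                    *pos = i + 6;
                    return 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
                }
            }
            return c;                       /* BMP code point or single surrogate */
        }
    } else if ((b & 0xF8) == 0xF0) {        /* UTF-8 style 4-byte sequence */
        if (length - i >= 4 && is_trail(s[i + 1]) &&
            is_trail(s[i + 2]) && is_trail(s[i + 3])) {
            int32_t c = ((int32_t)(b & 0x07) << 18) |
                        ((int32_t)(s[i + 1] & 0x3F) << 12) |
                        ((int32_t)(s[i + 2] & 0x3F) << 6) |
                        (int32_t)(s[i + 3] & 0x3F);
            if (c >= 0x10000 && c <= 0x10FFFF) {
                *pos = i + 4;
                return c;
            }
        }
    }
    *pos = i + 1;                           /* ill-formed: skip one byte */
    return -1;
}
```

With this sketch, U+10400 would be read identically from the UTF-8 bytes F0 90 90 80 and from the CESU-8 bytes ED A0 81 ED B0 80, while ED A0 81 on its own would yield the single surrogate U+D801. A real UText provider would also need backward iteration and mapping between native byte indexes and UTF-16 chunk offsets, which this sketch does not attempt.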

Attachments

Change History

12/31/69 18:23:38 changed by notes

postponed from 3.6

12/31/69 18:23:39 changed by auditor

  • Fri Mar 31 14:01:51 2006 ram changed notes2: priority: "committed" to "expected"
  • Sat Aug 5 06:41:41 2006 schererm changed notes2: target: "3.6" to "3.8"
  • Sat Aug 5 06:41:41 2006 schererm changed notes
  • Mon Nov 6 22:23:50 2006 grhoten changed notes2: priority: "expected" to "high"

08/29/07 20:58:23 changed by markus

  • load changed.
  • java changed.
  • priority changed from major to minor.
  • milestone changed from 3.8 to UNSCH.
  • keywords deleted.
  • revw changed.

09/26/07 16:15:47 changed by markus

  • summary changed from RFE: UText provider for lenient-UTF-8 strings to UText provider for lenient-UTF-8 strings.
