Ticket #1473 (new enhancement)


Opened 7 years ago

Last modified 2 years ago

RFE: need spellout rules for Mandarin

Reported by: gclprjg(at)cn.ibm.com
Assigned to: doug
Priority: minor
Milestone: UNSCH
Component: formatting
Version: 2.0
Keywords: formatting
Cc:
Load:
Xref: 3658
Java Version:
Operating System: win32
Project (C/J): all
Weeks: 1
Review:

Description

I am looking into the spell-out feature of icu4j 1.3.1. While debugging my program, I want "23" to be spelled out in Chinese, as "二三", when the locale is CHINA. Instead I do not see any Chinese words, only the same words as when the locale is US: "twenty three". Why can't I see Chinese words when the locale is CHINA? My program is below. Could you have a look and check it for me, or tell me where to find some sample programs? Thanks a lot!

import com.ibm.text.*;
import com.ibm.text.NumberFormat;
import com.ibm.util.*;
import java.util.Locale;

public class NumberSpellOut {

    public static void main(String args[]) {
        try {
            // Build the locale from the command line, e.g. "zh CN".
            //Locale lc = Locale.getDefault();
            String language = args[0];
            String country = args[1];
            Locale lc = new Locale(language, country);
            String lg = lc.getDisplayLanguage();
            System.out.println(lg);
            String co = lc.getDisplayCountry();
            System.out.println(co);

            //com.ibm.text.NumberFormat nf = com.ibm.text.NumberFormat.getInstance(lc);
            //com.ibm.text.NumberFormat nf =
            //    com.ibm.text.NumberFormat.getNumberInstance(Locale.CHINESE);

            //1---SPELLOUT, 2---ORDINAL, 3---DURATION
            com.ibm.text.RuleBasedNumberFormat rbnf =
                new com.ibm.text.RuleBasedNumberFormat(lc, RuleBasedNumberFormat.SPELLOUT);

            // Spell out an integer and a float.
            int num1 = 12;
            String numberMessage1 = rbnf.format(num1);
            System.out.println(numberMessage1);

            float num2 = 23467.234F;
            String numberMessage2 = rbnf.format(num2);
            System.out.println(numberMessage2);

            // Date and time formatting for the same locale, for comparison.
            com.ibm.util.Calendar cal = new GregorianCalendar(lc);
            int style = DateFormat.LONG;
            com.ibm.text.DateFormat df = DateFormat.getDateInstance(cal, style, lc);
            com.ibm.text.DateFormat tf = DateFormat.getTimeInstance(cal, style, lc);

            String dateMessage = df.format(cal.getTime());
            String timeMessage = tf.format(cal.getTime());
            System.out.println(dateMessage);
            System.out.println(timeMessage);

            System.out.println(com.ibm.text.Transliterator.getDisplayName("AB-B", lc));

            // Print the first two rule set names.
            //for (int i=0;rbnf.getRuleSetNames();i++) {
            System.out.println(rbnf.getRuleSetNames()[0]);
            System.out.println(rbnf.getRuleSetNames()[1]);
            //}
        } catch (Exception e) {
            // Report failures (e.g. missing arguments) instead of silently swallowing them.
            e.printStackTrace();
        }
    }
}

When I run "java NumberSpellOut zh CN", I only see words spelled out in English, not in Chinese. I do see correct Russian words with "java NumberSpellOut ru RU", and the same goes for French and Japanese. Can you tell me the reason?

Attachments

Change History

12/31/69 17:36:24 changed by notes2

Have data, just need to add in and make it work.

12/31/69 17:36:25 changed by notes

RBNF for Chinese assigned to doug target: unscheduled

12/31/69 17:36:26 changed by auditor

  • 11/12/01 20:50:23 schererm changed notes
  • 11/12/01 20:50:25 schererm moved from incoming to formatting
  • 11/13/01 13:48:44 dougfelt sent reply 1
  • 11/13/01 13:49:10 dougfelt changed notes
  • 07/31/02 13:30:26 schererm changed notes
  • 07/31/02 19:21:44 schererm changed notes
  • 08/29/02 12:47:23 dougfelt changed notes2
  • 08/29/02 12:47:23 dougfelt changed notes
  • 08/29/02 12:52:19 dougfelt changed notes2
  • 08/29/02 12:52:19 dougfelt changed notes
  • 01/03/03 15:01:35 hshih changed notes2
  • 01/03/03 15:01:35 hshih changed notes
  • 05/29/03 16:12:37 hshih changed notes2
  • 05/29/03 16:12:37 hshih changed notes
  • 02/06/04 15:47:08 dougfelt changed notes2
  • 02/06/04 15:47:08 dougfelt changed notes
  • 02/06/04 15:59:16 dougfelt changed notes2
  • 02/06/04 15:59:16 dougfelt changed notes
  • 07/13/04 17:48:03 srl changed notes2
  • 07/13/04 17:48:03 srl changed notes
  • Fri Dec 3 18:17:02 2004 dougfelt changed notes2: target: "UNSCH" to "3.4", comments: "" to "Have data, just need to add in and make it work.",
  • Fri Dec 3 18:17:02 2004 dougfelt changed notes
  • Tue Sep 27 14:12:18 2005 weiv changed notes2: (via expression '$PglTl3.5') target: "3.4" to "",

11/13/01 12:48:44 changed by Doug Felt <jtcsv(at)jtcsv.com>

Unfortunately, we don't have number-spellout rules for Chinese. When you ask for the Chinese rules, since we don't have resources for it, the SpelloutNumberFormat is defaulting to use the resources from the root. We recently changed (for 2.0) the root rule set to format numbers similarly to DecimalNumberFormat, to make it a bit clearer that you aren't getting 'real' spellout rules. Unfortunately in 1.3.1 you are getting the English rules, which aren't much good if you are trying to format Chinese.
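The fallback described above can be pictured with a small sketch. This is not ICU's resource lookup: the class name FallbackSketch, the AVAILABLE set, and its contents are assumptions made purely for illustration. It only shows why a request for zh_CN ends up at the root rules when neither zh_CN nor zh has data, while ru_RU stops at ru.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class FallbackSketch {
    // Hypothetical list of locales that ship spellout rules ("" stands for root).
    // This is NOT ICU's real data set.
    static final Set<String> AVAILABLE =
        new HashSet<>(Arrays.asList("", "en", "fr", "ja", "ru"));

    // Strip trailing locale fields until a locale with rules is found.
    static String resolve(Locale requested) {
        String name = requested.toString();          // e.g. "zh_CN"
        while (!AVAILABLE.contains(name)) {
            int cut = name.lastIndexOf('_');
            name = cut < 0 ? "" : name.substring(0, cut);
        }
        return name.isEmpty() ? "root" : name;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Locale.CHINA));                  // no zh data: falls to root
        System.out.println(resolve(Locale.forLanguageTag("ru-RU"))); // stops at ru
    }
}
```

What the root rules actually produce depends on the release, as noted above: decimal-style output in 2.0, English words in 1.3.1.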

The main problem holding up the spellout rules for Chinese is the handling of 'ling' (zero) in large numbers. Typically 'ling' gets elided (is not used) in alternating groups of numbers. The pattern language for RuleBasedNumberFormat isn't adequate to handle this, so I would have to either create a custom class, or develop some post-processing rules. I wasn't able to do this for the upcoming release.

Here's an attempt to state the rules for 'ling2' (zero):

  • 1) numbers are divided into clusters of 10000 (yi wan)
  • 2) there is at most one ling2 per cluster
  • 3) if there is a zero in the tens place of the cluster followed by non-zero, ling2 always appears there and nowhere else in the cluster
  • 4) otherwise there might be an 'optional' ling2 if there are any higher non-zero digits in the number and lower non-zero digits in the cluster. Optional lings are used only if there are no lings in adjacent clusters. They are assigned starting with the lowest possible cluster, working up.

So the algorithm is:

  • a) divide into clusters
  • b) assign required lings
  • c) assign optional lings, working from lowest to highest cluster
examples:

  • 101 - yi bai ling yi
  • 1001 - yi qian ling yi
  • 1010 - yi qian ling yi shi

  • 1 0001 - yi wan ling yi
  • 1 0010 - yi wan ling yi shi
  • 1 0100 - yi wan ling yi bai
  • 1 0101 - yi wan yi bai ling yi
  • 1 1010 - yi wan yi qian ling yi shi

  • 1000 0001 - yi qian wan ling yi
  • 1000 0010 - yi qian wan ling yi shi
  • 1000 1000 - yi qian wan yi qian
  • 1001 0001 - yi qian ling yi wan ling yi
  • 1001 0010 - yi qian ling yi wan yi shi
  • 1010 0000 - yi qian ling yi shi wan
  • 1010 0001 - yi qian yi shi wan ling yi
  • 1000 0000 0001 - yi qian yi4 ling yi
  • 1000 0100 0000 - yi qian yi4 ling yi bai wan
  • 1000 0100 0001 - yi qian yi4 yi bai wan ling yi
  • 1001 1010 0010 - yi qian ling yi yi4 yi qian yi shi wan ling yi shi
  • 1010 1010 0010 - yi qian ling yi shi yi4 yi qian yi shi wan ling yi shi
  • 1010 1001 0010 - yi qian yi shi yi4 yi qian ling yi wan yi shi

  • 1001 1010 1010 1010 - yi qian ling yi zhao yi qian yi shi yi4 yi qian yi shi wan yi qian ling yi shi
  • 1010 1010 1010 1010 - yi qian yi shi zhao yi qian ling yi shi yi4 yi qian yi shi wan yi qian ling yi shi
  • 1010 1010 1001 0010 - yi qian ling yi shi zhao yi qian yi shi yi4 yi qian ling yi wan yi shi
  • 1000 0100 1010 0010 - yi qian zhao ling yi bai yi4 yi qian yi shi wan ling yi shi
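The three steps (divide into clusters, assign required lings, assign optional lings from the lowest cluster up) can be sketched as a small standalone program. This is only one reading of the rules stated in this comment, not ICU code: the class LingSketch and its method spell are hypothetical names, the romanized words and the cluster units (wan, yi4, zhao) are taken from the examples, and it assumes a required ling also needs some non-zero digit above it, so that 1 spells as "yi" rather than "ling yi".

```java
import java.util.ArrayList;
import java.util.List;

class LingSketch {
    static final String[] DIGITS = {"ling", "yi", "er", "san", "si", "wu", "liu", "qi", "ba", "jiu"};
    static final String[] PLACES = {"", "shi", "bai", "qian"};    // units inside a cluster
    static final String[] CLUSTERS = {"", "wan", "yi4", "zhao"};  // cluster units, as in the examples

    static String spell(long n) {
        if (n < 0 || n >= 10_000_000_000_000_000L) {
            throw new IllegalArgumentException("sketch covers 0 .. 10^16 - 1");
        }
        if (n == 0) return DIGITS[0];

        // a) divide into clusters of four digits, lowest cluster first
        List<Integer> clusters = new ArrayList<>();
        for (long rest = n; rest > 0; rest /= 10000) {
            clusters.add((int) (rest % 10000));
        }
        int count = clusters.size();
        int[] lingPos = new int[count];      // place (3..0) the ling would precede, or -1
        boolean[] ling = new boolean[count]; // true once a ling is assigned to the cluster

        // b) find the candidate ling per cluster; one before the ones digit is required
        for (int i = 0; i < count; i++) {
            lingPos[i] = -1;
            boolean seenNonZero = i < count - 1; // the top cluster is never zero
            boolean pendingZero = false;
            for (int p = 3, div = 1000; p >= 0; p--, div /= 10) {
                int d = clusters.get(i) / div % 10;
                if (d == 0) {
                    pendingZero = seenNonZero;   // a zero counts only below a non-zero digit
                } else {
                    if (pendingZero) lingPos[i] = p; // the lowest gap in the cluster wins
                    seenNonZero = true;
                    pendingZero = false;
                }
            }
            ling[i] = lingPos[i] == 0;
        }

        // c) assign optional lings from the lowest cluster up,
        //    skipping clusters whose neighbour already has a ling
        for (int i = 0; i < count; i++) {
            boolean optional = lingPos[i] > 0;
            boolean below = i > 0 && ling[i - 1];
            boolean above = i < count - 1 && ling[i + 1];
            if (optional && !below && !above) ling[i] = true;
        }

        // Emit the words, highest cluster first.
        StringBuilder out = new StringBuilder();
        for (int i = count - 1; i >= 0; i--) {
            int c = clusters.get(i);
            if (c == 0) continue;
            for (int p = 3, div = 1000; p >= 0; p--, div /= 10) {
                int d = c / div % 10;
                if (d == 0) continue;
                if (ling[i] && lingPos[i] == p) out.append("ling ");
                out.append(DIGITS[d]).append(' ');
                if (p > 0) out.append(PLACES[p]).append(' ');
            }
            if (i > 0) out.append(CLUSTERS[i]).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        long[] samples = {101L, 10100L, 10010010L, 100110100010L, 1000010010100010L};
        for (long n : samples) {
            System.out.println(n + " - " + spell(n));
        }
    }
}
```

The sketch keeps ling placement (steps b and c) separate from word emission, so the neighbour check in step c sees required lings in both directions but optional lings only from lower clusters, which is what "working from lowest to highest" implies. Like the examples, it always writes "yi shi" for a ten.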

01/17/07 09:25:01 changed by anonymous

So, what would 101 0123 be? Would it be:

yi bai ling yi wan yi bai er shi san

or would it be:

yi bai ling yi wan *ling* yi bai er shi san ?

03/11/07 21:06:26 changed by andy

  • project set to all.
  • type changed from defect to enhancement.
  • java changed.
  • revw changed.
