We used com.ibm.icu.lang.UCharacter.toUpperCase to uppercase a Greek
string, and the result is wrong: it keeps the accents, but Greek words written
entirely in capital letters are not accented.
Consider the following Greek words, written in lowercase letters, as examples:
άδικος, κείμενο, ίριδα
In Greek, the acute accent (') is placed on top of the stressed vowel, i.e. the
vowel of the syllable that is pronounced the loudest:
ά-δικος,
κεί-μενο, ί-ριδα
1) If the initial vowel of a word is capitalised and stressed, then the acute
accent (') should be placed on the upper left corner of the vowel, e.g.
Άδικος,
Ίριδα.
For instance in ISO 8859-7 encoding:
άδικος->Άδικος
ά (hex value: DC) should be replaced with Ά (hex value: B6)
and
ίριδα->Ίριδα
ί (hex value: DF) should be replaced with Ί (hex value: BA)
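As a minimal sketch of rule 1, using only the standard Java library rather than ICU: uppercasing just the first code point keeps the accent on the capital. The class and method names (GreekInitialCap, capitalizeFirst) are hypothetical and only for illustration.

```java
public class GreekInitialCap {
    // Hypothetical helper: uppercase only the first code point of the word.
    // The simple case mapping keeps the tonos: U+03AC maps to U+0386.
    static String capitalizeFirst(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        String head = new String(Character.toChars(Character.toUpperCase(first)));
        return head + word.substring(Character.charCount(first));
    }

    public static void main(String[] args) {
        // The escaped literal spells the first example word (U+03AC is the accented alpha).
        System.out.println(capitalizeFirst("\u03AC\u03B4\u03B9\u03BA\u03BF\u03C2"));
    }
}
```

The input is written with Unicode escapes so that the example does not depend on the source file encoding.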
2) If the whole word is capitalised, then the acute accent SHOULD NOT be used,
e.g. ΑΔΙΚΟΣ, ΙΡΙΔΑ, ΚΕΙΜΕΝΟ.
For instance in ISO 8859-7 encoding:
άδικος->ΑΔΙΚΟΣ
ά (hex value: DC) should be replaced with Α (hex value: C1)
δ (hex value: E4) should be replaced with Δ (hex value: C4)
ι (hex value: E9) should be replaced with Ι (hex value: C9)
κ (hex value: EA) should be replaced with Κ (hex value: CA)
ο (hex value: EF) should be replaced with Ο (hex value: CF)
ς (hex value: F2) should be replaced with Σ (hex value: D3)
κείμενο->ΚΕΙΜΕΝΟ
κ (hex value: EA) should be replaced with Κ (hex value: CA)
ε (hex value: E5) should be replaced with Ε (hex value: C5)
ί (hex value: DF) should be replaced with Ι (hex value: C9)
μ (hex value: EC) should be replaced with Μ (hex value: CC)
ε (hex value: E5) should be replaced with Ε (hex value: C5)
ν (hex value: ED) should be replaced with Ν (hex value: CD)
ο (hex value: EF) should be replaced with Ο (hex value: CF)
ίριδα->ΙΡΙΔΑ
ί (hex value: DF) should be replaced with Ι (hex value: C9)
ρ (hex value: F1) should be replaced with Ρ (hex value: D1)
ι (hex value: E9) should be replaced with Ι (hex value: C9)
δ (hex value: E4) should be replaced with Δ (hex value: C4)
α (hex value: E1) should be replaced with Α (hex value: C1)
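Rule 2 could be approximated along the following lines. This is only a sketch with the standard Java library, not a description of what ICU does or should do internally: it uppercases the string, decomposes it, drops the combining acute accent (U+0301) and recomposes. The names GreekAllCaps and toUpperNoTonos are hypothetical, and the sketch ignores refinements such as the dialytika.

```java
import java.text.Normalizer;
import java.util.Locale;

public class GreekAllCaps {
    // Hypothetical helper: uppercase a word and remove the acute accent (tonos).
    static String toUpperNoTonos(String s) {
        // Plain uppercasing keeps the accent (U+03AC becomes U+0386).
        String upper = s.toUpperCase(Locale.ROOT);
        // NFD splits an accented capital into base letter + U+0301.
        String decomposed = Normalizer.normalize(upper, Normalizer.Form.NFD);
        // Drop the combining acute accent, then recompose what is left.
        String stripped = decomposed.replace("\u0301", "");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // The escaped literal spells the second example word (U+03AF is the accented iota).
        System.out.println(toUpperNoTonos("\u03BA\u03B5\u03AF\u03BC\u03B5\u03BD\u03BF"));
    }
}
```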
There is only one exception to the second rule.
Before getting into this, allow me to mention another rule which relates to our
issue.
In Greek, monosyllabic words are normally not accented, because there is only
one syllable. There are exceptions to this rule. One of them is the word 'ή'
(the equivalent of 'or' in English), which SHOULD be accented when written in
lowercase letters, in order to distinguish it from the article 'η', which is
not accented.
In addition, it is the only word that SHOULD be accented when written in
capital letters, 'Ή' (again to distinguish it from the article when written in
capitals).
For instance in ISO 8859-7 encoding:
ή->Ή
ή (hex value: DE) should be replaced with Ή (hex value: B9)
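The exception can be layered on top of the accent stripping from rule 2. The sketch below, again using only the standard Java library, treats its argument as a single word and keeps the accent only when that word is exactly the accented eta (U+03AE). The names GreekUpperWithOr and upperWord are hypothetical; splitting running text into words is left out.

```java
import java.text.Normalizer;
import java.util.Locale;

public class GreekUpperWithOr {
    // Hypothetical helper: uppercase one word, dropping the acute accent
    // unless the word is the disjunction "or" (accented eta, U+03AE).
    static String upperWord(String word) {
        String nfc = Normalizer.normalize(word, Normalizer.Form.NFC);
        if (nfc.equals("\u03AE")) {
            return "\u0389"; // capital eta with tonos
        }
        String upper = nfc.toUpperCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(upper, Normalizer.Form.NFD);
        // Remove the combining acute accent and recompose.
        return Normalizer.normalize(decomposed.replace("\u0301", ""), Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(upperWord("\u03AE")); // the disjunction keeps its accent
        System.out.println(upperWord("\u03B7")); // the article has none to begin with
    }
}
```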