This problem was reported by Oleg Sukhodolsky to the ICU support mailing list. Below is the original description.
Hi,
I've tried to use ICU4J with the JavaMail API to get a converter for
x-mac-cyrillic (which the JDK does not provide), and found that the
simple call
MimeUtility.decodeWord("=?x-mac-cyrillic?B?ODU4OV0=?=");
throws an exception:
java.nio.BufferOverflowException
at java.nio.charset.CoderResult.throwException(CoderResult.java:259)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:142)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.<init>(String.java:444)
at javax.mail.internet.MimeUtility.decodeWord(MimeUtility.java:834)
After investigating, I found that the problem is caused by the capacity
of the byte array that JavaMail passes to the String constructor (and
thus by the capacity of the ByteBuffer that the String constructor
passes to the decode() method). It is a little bit bigger than the size
of the text, i.e. in ByteBuffer terms, capacity() > limit(). So I looked
at the ICU code and (I think) identified the cause of the problem in
CharsetMBCS.cnvMBCSSingleToBMPWithOffsets():
if (!cr[0].isError() && sourceArrayIndex < source.capacity() &&
        !target.hasRemaining()) {
    /* target is full */
    cr[0] = CoderResult.OVERFLOW;
}
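I believe source.limit() should be used instead of source.capacity() here:
the readable input ends at limit(), so when capacity() > limit() this test
reports OVERFLOW even after all input has been consumed, whenever the
output has exactly filled the target.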
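To make the failure mode concrete, here is a simplified, self-contained model of that overflow test. It is not the actual ICU4J source: the class, `decode`, `run` and the identity mapping table are hypothetical. It puts the five payload bytes of the encoded word ("8589]") into an eight-byte array (the slack of three is arbitrary), wraps them the way the String constructor does, and decodes into a target with exactly enough room.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

public class OverflowCheckSketch {
    /**
     * Hypothetical, simplified single-byte-to-BMP loop (NOT the ICU4J code).
     * Each byte is mapped through 'table'; 'checkCapacity' selects the
     * capacity()-based end-of-input test instead of the limit()-based one.
     */
    static CoderResult decode(ByteBuffer source, CharBuffer target,
                              char[] table, boolean checkCapacity) {
        CoderResult cr = CoderResult.UNDERFLOW;
        int sourceArrayIndex = source.position();
        // Convert while input remains (up to limit) and the target has room.
        while (sourceArrayIndex < source.limit() && target.hasRemaining()) {
            target.put(table[source.get(sourceArrayIndex) & 0xff]);
            sourceArrayIndex++;
        }
        source.position(sourceArrayIndex);
        // Post-loop test from the report; only the upper bound differs.
        int end = checkCapacity ? source.capacity() : source.limit();
        if (!cr.isError() && sourceArrayIndex < end && !target.hasRemaining()) {
            /* target is full */
            cr = CoderResult.OVERFLOW;
        }
        return cr;
    }

    /** 5 bytes of text in an 8-byte array, decoded into a 5-char target. */
    static CoderResult run(boolean checkCapacity) {
        char[] identity = new char[256]; // illustrative identity table only
        for (int i = 0; i < identity.length; i++) {
            identity[i] = (char) i;
        }
        byte[] buf = new byte[8];
        byte[] text = {'8', '5', '8', '9', ']'};
        System.arraycopy(text, 0, buf, 0, text.length);
        ByteBuffer source = ByteBuffer.wrap(buf, 0, text.length); // capacity 8, limit 5
        CharBuffer target = CharBuffer.allocate(text.length);     // exactly enough room
        return decode(source, target, identity, checkCapacity);
    }

    public static void main(String[] args) {
        System.out.println("capacity() check: " + run(true));  // OVERFLOW
        System.out.println("limit() check:    " + run(false)); // UNDERFLOW
    }
}
```

Every input byte is consumed in both runs; only the capacity()-based bound turns that into a spurious OVERFLOW, which CoderResult.throwException() then surfaces as the BufferOverflowException in the trace.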
So, here is a fix:
Index: src/com/ibm/icu/charset/CharsetMBCS.java
===================================================================
--- src/com/ibm/icu/charset/CharsetMBCS.java (revision 24923)
+++ src/com/ibm/icu/charset/CharsetMBCS.java (working copy)
@@ -2684,7 +2684,7 @@
}
}
- if (!cr[0].isError() && sourceArrayIndex < source.capacity() && !target.hasRemaining()) {
+ if (!cr[0].isError() && sourceArrayIndex < source.limit() && !target.hasRemaining()) {
/* target is full */
cr[0] = CoderResult.OVERFLOW;
}
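A regression check could exercise the same path through the public java.nio API. The sketch below (class and method names are hypothetical) mimics the String constructor: it wraps the text in a slightly larger array, so capacity() > limit(), and decodes into a CharBuffer sized by maxCharsPerByte(). It runs here against the JDK's ISO-8859-1 decoder. Assuming ICU4J's charset provider is on the classpath and resolves "x-mac-cyrillic", passing that Charset to an unpatched build is expected to throw the reported BufferOverflowException.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SlackBufferCheck {
    /**
     * Decodes 'text' after copying it into an array 'slack' bytes longer,
     * so the wrapped ByteBuffer has capacity() > limit().
     */
    static String decodeWithSlack(Charset cs, byte[] text, int slack) {
        byte[] buf = new byte[text.length + slack];
        System.arraycopy(text, 0, buf, 0, text.length);
        ByteBuffer in = ByteBuffer.wrap(buf, 0, text.length);
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Size the output from maxCharsPerByte(), like the String constructor.
        CharBuffer out = CharBuffer.allocate(
                (int) (text.length * (double) dec.maxCharsPerByte()));
        try {
            CoderResult cr = dec.decode(in, out, true);
            // A correct decoder reports UNDERFLOW once all input is consumed;
            // OVERFLOW here would throw java.nio.BufferOverflowException.
            if (!cr.isUnderflow()) cr.throwException();
            cr = dec.flush(out);
            if (!cr.isUnderflow()) cr.throwException();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] text = {'8', '5', '8', '9', ']'}; // payload of the encoded word above
        System.out.println(decodeWithSlack(StandardCharsets.ISO_8859_1, text, 3));
    }
}
```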
Should I file a separate bug about it, or is it a known problem?
With best regards, Oleg.