In ICU4J, case-insensitive comparison is slower than it should be.
In UTF16.StringComparator, the comparison calls:
if (m_ignoreCase_) {
return compareCaseInsensitive(str1, str2);
}
which calls
return NormalizerImpl.cmpEquivFold(s1, s2,
m_foldCase_ | m_codePointCompare_ | Normalizer.COMPARE_IGNORE_CASE);
which calls
public static int cmpEquivFold(String s1, String s2,int options){
return cmpEquivFold(s1.toCharArray(),0,s1.length(),
s2.toCharArray(),0,s2.length(),
options);
}
// note the often unnecessary extraction of a char array
which calls:
// internal function
public static int cmpEquivFold(char[] s1, int s1Start,int s1Limit,
char[] s2, int s2Start,int s2Limit,
int options) {
// current-level start/limit - s1/s2 as current
int start1, start2, limit1, limit2;
char[] cSource1, cSource2;
cSource1 = s1;
cSource2 = s2;
// decomposition variables
int length;
// stacks of previous-level start/current/limit
CmpEquivLevel[] stack1 = new CmpEquivLevel[]{
new CmpEquivLevel(),
new CmpEquivLevel()
};
CmpEquivLevel[] stack2 = new CmpEquivLevel[]{
new CmpEquivLevel(),
new CmpEquivLevel()
};
// decomposition buffers for Hangul
char[] decomp1 = new char[8];
char[] decomp2 = new char[8];
// case folding buffers, only use current-level start/limit
char[] fold1 = new char[32];
char[] fold2 = new char[32];
...
All of this setup is fairly expensive, and most of the time it is not needed
(e.g. when comparing "mark" to "Mark" or to "fred").
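
One possible way to skip that setup in the common case is an ASCII-only fast path that walks both strings directly and hands off to the full comparison only when it meets a non-ASCII char. The sketch below is not ICU4J code: the class name AsciiFastPathComparator and the slowPath parameter are hypothetical, and slowPath merely stands in for the existing cmpEquivFold call. It also assumes default case folding (no Turkic special-I option) and no canonical-equivalence comparison; with either of those in play, the ASCII shortcut would not be safe.

```java
import java.util.Comparator;

// Hypothetical sketch: an ASCII fast path in front of an expensive comparator.
final class AsciiFastPathComparator implements Comparator<String> {
    // Stand-in for the full folding comparison (e.g. the cmpEquivFold path).
    private final Comparator<String> slowPath;

    AsciiFastPathComparator(Comparator<String> slowPath) {
        this.slowPath = slowPath;
    }

    // Default case folding restricted to ASCII: A-Z map to a-z, rest unchanged.
    private static int foldAscii(char c) {
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    }

    @Override
    public int compare(String s1, String s2) {
        int n = Math.min(s1.length(), s2.length());
        for (int i = 0; i < n; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if ((c1 | c2) >= 0x80) {
                // Non-ASCII seen: give up and let the full algorithm decide.
                return slowPath.compare(s1, s2);
            }
            int diff = foldAscii(c1) - foldAscii(c2);
            if (diff != 0) {
                return diff; // no arrays, stacks, or buffers were allocated
            }
        }
        // The common prefix is ASCII and fold-equal, so the shorter string sorts first.
        return Integer.compare(s1.length(), s2.length());
    }
}
```

With this shape, comparing "mark" to "Mark" or to "fred" never reaches the slow path, so none of the stacks or fold buffers would be created for such inputs.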
Internally, cmpEquivFold then calls foldCase, which turns each code point back
into temporary strings, many of them:
private static int foldCase(int c, char[] dest, int destStart, int destLimit,
int options){
String src = UTF16.valueOf(c);
String foldedStr = UCharacter.foldCase(src,options);
char[] foldedC = foldedStr.toCharArray();
...
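
The temporary strings in foldCase could in principle be avoided by writing the folded code point straight into the destination buffer. The following is only a rough sketch using JDK methods, not the ICU4J implementation: foldCaseSimple is a hypothetical name, and Character.toLowerCase(Character.toUpperCase(c)) is used as a stand-in that approximates simple one-to-one folding. Full case folding, where one code point can expand to several chars, is not handled here.

```java
// Hypothetical sketch: fold one code point into a caller-supplied buffer
// without creating any intermediate String or char[] objects.
final class FoldIntoBuffer {
    private FoldIntoBuffer() {}

    /**
     * Writes an approximate simple fold of code point c into dest at destStart.
     * Returns the number of chars written (1 or 2), or -1 if the result does
     * not fit before destLimit.
     */
    static int foldCaseSimple(int c, char[] dest, int destStart, int destLimit) {
        // Stand-in for a real simple case-folding lookup (assumption).
        int folded = Character.toLowerCase(Character.toUpperCase(c));
        int needed = Character.charCount(folded);
        if (destLimit - destStart < needed) {
            return -1;
        }
        // Encodes the code point as UTF-16 directly into dest.
        return Character.toChars(folded, dest, destStart);
    }
}
```

A caller could reuse a single buffer across the whole comparison, for example `int n = FoldIntoBuffer.foldCaseSimple('M', buf, 0, buf.length);`, so that no allocation happens per code point.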