Using RegexMatcher, I'm searching with the find pattern "\bते" (\u005C\u0062\u0924\u0947). That is the '\b' word-boundary assertion followed by the two-character Devanagari word ते. The string I am searching is वक्ते (\u0935\u0915\u094D\u200D\u0924\u0947). The \u200D is the Zero Width Joiner (ZWJ), which comes just before the last two Devanagari characters of the search string. The ZWJ is there because the first of those two characters is actually part of a consonant cluster. We use the halant (\u094D) followed by the ZWJ to signal exactly that, i.e. that we are still within a word.
In any case, the word-boundary assertion in the find pattern ('\b') appears to treat the ZWJ as signalling a word break, so it matches right after the ZWJ. The ZWJ does not signal a break there: that position is inside the word.
I also checked BreakIterator to make sure, and it correctly treats that position as within a word rather than as a break, which is how RegexMatcher treats it. So at the very least, BreakIterator and RegexMatcher's '\b' do not behave the same way.
Bob Eaton
P.S. Attached is a short C++ program that demonstrates this error.