Add Transliterator rule syntax for recognizing word boundaries.
Example: the rule
[:Thai:] {([:break:])} [:Thai:] > ' ';
inserts a space between Thai words. This would simplify the code for
transliterating Thai, and it is general enough to apply to other cases.
[:break:] consumes no characters: it matches iff there is a word break at that
position.
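A minimal sketch of the intended semantics, using the JDK's java.text.BreakIterator as a stand-in for the transliterator's break detection (an assumption; the real implementation would sit inside the rule engine). The names BreakRuleSketch, isWordBreak and insertAtBreaks are hypothetical. isWordBreak is the zero-width test; insertAtBreaks mimics a rule of the form class {([:break:])} class > sep;

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.function.IntPredicate;

final class BreakRuleSketch {
    /** True iff a word break lies at UTF-16 offset pos; consumes no characters. */
    static boolean isWordBreak(String text, int pos, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        return words.isBoundary(pos);
    }

    /**
     * Inserts sep at every interior word break whose neighbours on both
     * sides satisfy inClass (the stand-in for a set such as [:Thai:]).
     */
    static String insertAtBreaks(String text, IntPredicate inClass,
                                 String sep, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        StringBuilder out = new StringBuilder();
        int prev = 0;
        for (int pos = words.first(); pos != BreakIterator.DONE; pos = words.next()) {
            // Skip the boundaries at the very start and end of the text.
            if (pos > 0 && pos < text.length()
                    && inClass.test(text.codePointBefore(pos))
                    && inClass.test(text.codePointAt(pos))) {
                out.append(text, prev, pos).append(sep);
                prev = pos;
            }
        }
        return out.append(text, prev, text.length()).toString();
    }
}
```

With a class that accepts every character, insertAtBreaks("hello world", c -> true, "|", Locale.ENGLISH) yields "hello| |world". With Character::isLetter as the class the text is unchanged, because no break falls between two letters.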
Additional syntax:
\b for matching a word boundary
\B for matching a position that is not a word boundary
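To illustrate that \B is simply the complement of \b, here is a small sketch that lists the offsets where each would match. It again assumes java.text.BreakIterator as the source of word breaks; BoundaryAnchors and matches are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BoundaryAnchors {
    /**
     * Returns the UTF-16 offsets in [0, text.length()] where \b would match
     * (atBoundary == true) or where \B would match (atBoundary == false).
     */
    static List<Integer> matches(String text, boolean atBoundary) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int pos = 0; pos <= text.length(); pos++) {
            if (words.isBoundary(pos) == atBoundary) {
                out.add(pos);
            }
        }
        return out;
    }
}
```

For the text "a bc", matches(text, true) gives [0, 1, 2, 4] and matches(text, false) gives [3]; every offset falls in exactly one of the two lists.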
Note: There are some possible ways to generalize this that we should consider:
1. Allow any break iterator: character, word, line, sentence (word is default).
2. Also allow testing the break status (applies only to \b).
3. Allow specifying a locale.
Example: \b{ja word=3} (Japanese locale, word break iterator, status 3)
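A sketch of generalizations 1 and 3, assuming the JDK's java.text.BreakIterator factories stand in for whatever iterators the transliterator would use. BreakKinds, Kind, create and boundaries are hypothetical names. The JDK class does not expose a break status, so generalization 2 is not covered here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BreakKinds {
    enum Kind { CHARACTER, WORD, LINE, SENTENCE }

    /** Picks the break iterator for the given kind and locale. */
    static BreakIterator create(Kind kind, Locale locale) {
        switch (kind) {
            case CHARACTER: return BreakIterator.getCharacterInstance(locale);
            case LINE:      return BreakIterator.getLineInstance(locale);
            case SENTENCE:  return BreakIterator.getSentenceInstance(locale);
            default:        return BreakIterator.getWordInstance(locale); // word is the default
        }
    }

    /** Lists every boundary offset the chosen iterator reports for text. */
    static List<Integer> boundaries(String text, Kind kind, Locale locale) {
        BreakIterator it = create(kind, locale);
        it.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            out.add(pos);
        }
        return out;
    }
}
```

For example, boundaries("a b", Kind.WORD, Locale.ENGLISH) gives [0, 1, 2, 3], and boundaries("ab", Kind.CHARACTER, Locale.ENGLISH) gives [0, 1, 2].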
Syntax:
"[:" "^"? locale? ("character" | "word" | "line" | "sentence")? "break"
    ("=" status)? ":]"
("\b" | "\B") ("{" locale? ("character" | "word" | "line" | "sentence")?
    ("=" status)? "}")?
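A rough sketch of how the \b / \B form might be parsed. Several details are assumptions not fixed by the grammar above: the locale is taken to be a two- or three-letter language code with optional "_" subtags, the status is taken to be a decimal integer, and whitespace is allowed between the parts. BreakSpec and parse are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BreakSpec {
    final boolean negated;  // true for \B
    final String locale;    // null when absent
    final String type;      // "word" when absent (the default)
    final Integer status;   // null when absent

    private static final Pattern SPEC = Pattern.compile(
        "\\\\([bB])"                                                  // \b or \B
        + "(?:\\{\\s*"
        + "(?:([A-Za-z]{2,3}(?:_[A-Za-z0-9]+)*)(?=[\\s=}])\\s*)?"     // locale (assumed shape)
        + "(character|word|line|sentence)?\\s*"                       // break type
        + "(?:=\\s*(\\d+))?\\s*"                                      // status (assumed integer)
        + "\\})?");

    private BreakSpec(boolean negated, String locale, String type, Integer status) {
        this.negated = negated;
        this.locale = locale;
        this.type = type;
        this.status = status;
    }

    /** Parses a complete \b or \B specifier; rejects anything else. */
    static BreakSpec parse(String s) {
        Matcher m = SPEC.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a break specifier: " + s);
        }
        return new BreakSpec(
            m.group(1).equals("B"),
            m.group(2),
            m.group(3) == null ? "word" : m.group(3),
            m.group(4) == null ? null : Integer.valueOf(m.group(4)));
    }
}
```

Parsing the example \b{ja word=3} gives negated = false, locale = "ja", type = "word", status = 3. A bare \B gives negated = true with the default type "word" and no locale or status.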
Note: For 2.4 the plan is for Thai word breaking to be 'always on' (whenever we
encounter Thai characters). For some locales (e.g. Japanese), however, these
options could make a difference, and for other break iterators they will.