Ticket #3321 (new enhancement)

Opened 5 years ago

Last modified 5 months ago

Standardize rule/pattern quoting

Reported by: liuas(at)us.ibm.com
Assigned to: andy
Priority: assess
Milestone: UNSCH
Component: formatting
Version: 2.8
Keywords: formatting
Cc:
Load:
Xref: 2920, 901
Java Version:
Operating System: all
Project (C/J): ICU4C and ICU4J
Weeks: 1
Review:

Description (Last modified by srl)

Classes using rules and patterns should standardize the way they do quoting, and deviations should be documented.

Quoting mechanisms in use include:

  1. Single quotes. Inside OR OUTSIDE of single quotes, two adjacent single quotes represent a literal single quote.
  2. Backslash. This escapes the next character, or initiates an escape sequence (see unescape(), unescapeAt()).

ICU parsers use zero, one, or two of these mechanisms.
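As an illustration of mechanism 1 only, here is a minimal sketch of how a parser might strip single-quote quoting. The class and method names (SingleQuoteSketch, unquote) are hypothetical and not part of ICU, and real ICU parsers also interpret the unquoted characters as syntax, which this sketch does not attempt.

```java
final class SingleQuoteSketch {
    /**
     * Removes single-quote quoting from a pattern (hypothetical helper).
     * Two adjacent single quotes yield one literal quote, inside or
     * outside a quoted run; a lone quote opens or closes a quoted run.
     */
    static String unquote(String pattern) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c != '\'') {
                out.append(c);
            } else if (i + 1 < pattern.length() && pattern.charAt(i + 1) == '\'') {
                out.append('\'');   // doubled quote: literal single quote
                i++;                // skip the second quote of the pair
            } else {
                inQuote = !inQuote; // lone quote: toggle quoted state
            }
        }
        if (inQuote) {
            throw new IllegalArgumentException("unterminated single quote");
        }
        return out.toString();
    }
}
```

For example, unquote("h 'o''clock'") returns the text h o'clock.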

When a parser GENERATES a pattern (toPattern(), etc.) it will need to quote characters that otherwise would be taken for syntax characters.

The general rule for doing so:

*** Quote all LITERAL characters in a rule/pattern for which isIdContinue(c) == FALSE.

This was decided in a staff meeting technical discussion 9/8/2003.
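The general rule above could be sketched roughly as follows for the single-quote mechanism. This is an assumption-laden illustration, not ICU code: the class and method names are hypothetical, and java.lang.Character.isUnicodeIdentifierPart is used as an approximate stand-in for the isIdContinue(c) test named in the rule (it also accepts some ignorable control characters).

```java
final class LiteralQuotingSketch {
    /** Approximate stand-in for isIdContinue(c); an assumption, not the ICU property test. */
    static boolean isIdContinue(int cp) {
        return Character.isUnicodeIdentifierPart(cp);
    }

    /**
     * Quotes a literal string for embedding in a rule/pattern: runs of
     * characters that are not ID_Continue are wrapped in single quotes,
     * and every literal single quote is doubled.
     */
    static String quoteLiteral(String literal) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < literal.length(); ) {
            int cp = literal.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\'') {
                out.append("''");            // valid inside or outside quotes
            } else if (isIdContinue(cp)) {
                if (inQuote) {               // close the quoted run first
                    out.append('\'');
                    inQuote = false;
                }
                out.appendCodePoint(cp);
            } else {
                if (!inQuote) {              // open a quoted run
                    out.append('\'');
                    inQuote = true;
                }
                out.appendCodePoint(cp);
            }
        }
        if (inQuote) {
            out.append('\'');
        }
        return out.toString();
    }
}
```

With this sketch, quoteLiteral("a b") produces a' 'b and quoteLiteral("o'clock") produces o''clock.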

Attachments

Change History

12/31/69 17:37:07 changed by auditor

  • 12/09/03 13:42:27 schererm changed notes2
  • 12/09/03 13:42:27 schererm moved from incoming to formatting
  • 02/06/04 12:59:12 andy sent reply 1
  • 02/06/04 13:02:31 andy changed notes2
  • 02/17/04 03:09:09 weiv changed notes2
  • 02/17/04 20:06:17 weiv changed notes2
  • 02/18/04 13:59:43 schererm changed notes2
  • 07/19/04 18:39:03 schererm changed notes2

02/06/04 11:59:12 changed by Andy Heninger <heninger(at)us.ibm.com>

Consolidating with bug 901, which was...

BACKGROUND

Several components of ICU4C/J are data driven. They take patterns (short strings, less than a line of text) or rules (typically one or more lines of text) that parameterize them. Examples are SimpleDateFormat, DecimalFormat, RuleBasedCollator, RuleBasedTransliterator, RuleBasedBreakIterator, and other classes. In these rules, a class of characters ('specials') must be escaped or quoted if they are to be used as literals. Quoting mechanisms include single quotes and backslash escapes (including \uXXXX).

Rule-based components typically return their rule string via methods such as toPattern(), getRules(), etc. The returned string should be directly usable to create new objects equivalent to the originals. For this to work, special characters must be escaped in the returned rule string.

RFE

Add a quoting function that escapes and/or quotes special characters. This function should take a string and a set of characters that need quoting, and return the quoted string.

C

/**
 * Quote specials in-place. Return actual length (may be > capacity).
 * @param specials set of characters that need quoting
 */
int32_t u_quoteSpecials(UChar* string, int32_t length, int32_t capacity,
                        UUnicodeSet* specials);

This might be a little cleaner:

/**
 * Quote specials. Return result length (may be > capacity).
 */
int32_t u_quoteSpecials(const UChar* string, int32_t length,
                        UChar* result, int32_t resultCapacity,
                        UUnicodeSet* specials);

Java

/**
 * Quote specials.
 */
UnicodeString quoteSpecials(UnicodeString string,
                            UnicodeSet specials);
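A rough sketch of what such a function might look like in plain Java, using the backslash mechanism. Everything here is an assumption for illustration: java.lang.String and java.util.Set<Integer> (of code points) stand in for the UnicodeString and UnicodeSet types in the proposed signature, and the choice to always escape the backslash itself is not specified by the ticket.

```java
import java.util.Set;

final class QuoteSpecialsSketch {
    /**
     * Returns a copy of the input in which every code point found in
     * specials, and the backslash itself, is preceded by a backslash.
     */
    static String quoteSpecials(String string, Set<Integer> specials) {
        StringBuilder out = new StringBuilder(string.length() + 8);
        for (int i = 0; i < string.length(); ) {
            int cp = string.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\\' || specials.contains(cp)) {
                out.append('\\');   // escape the next character
            }
            out.appendCodePoint(cp);
        }
        return out.toString();
    }
}
```

Called with the string a[b] and a set containing the code points for [ and ], the sketch returns a\[b\]. A production version would also need to decide when single quotes are preferable to backslashes for a given parser.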

09/28/07 13:14:19 changed by andy

  • load changed.
  • java changed.
  • revw changed.
  • summary changed from RFE: Standardize rule/pattern quoting to Standardize rule/pattern quoting.

07/07/08 12:16:03 changed by srl

  • priority changed from major to assess.
  • description changed.
