Ticket #824 (new enhancement)

Opened 8 years ago

Last modified 1 year ago

Add ucnv_getToUnicodeLength() and ucnv_getFromUnicodeLength() functions

Reported by: bumgard(at)roguewave.com
Assigned to: markus
Priority: minor
Milestone: UNSCH
Component: conversion
Version: 1.7
Keywords: conversion
Cc:
Load:
Xref:
Java Version:
Operating System: all
Project (C/J): all
Weeks: 0.5
Review:

Description

I propose to add a pair of functions that calculate the target sequence length for a given sequence of source characters. This functionality would be useful in applications that need to calculate the destination buffer size prior to performing translation (even though these functions must perform the translation anyway). The Standard C++ Library std::codecvt<> class requires a do_length() member function that provides the semantics offered by this proposal.

Since these functions might be called at any point within an ongoing conversion process, they need to start with the current conversion state, but also need to perform their function without changing the original conversion state.
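
The constraint above (start from the current state, leave the caller's state alone) can be illustrated without ICU at all. The following is only a minimal sketch: `ToyState`, `toy_to_utf16_units()` and `toy_get_length()` are hypothetical names invented for this illustration, and the decoder is a deliberately simplified UTF-8 reader, not ICU's converter. The point is the last function, which runs the conversion on a copy of the state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a converter's internal state:
   a UTF-8 sequence that may be only partly read. */
typedef struct {
    uint32_t codePoint;  /* bits collected so far */
    int remaining;       /* continuation bytes still expected */
} ToyState;

/* Consumes n bytes of UTF-8, updating *st, and returns the number of
   UTF-16 code units produced. Toy decoder: it stops at the first byte
   that cannot appear at that position, and it does not reject
   overlong forms or encoded surrogates. */
static size_t toy_to_utf16_units(ToyState *st, const unsigned char *src, size_t n)
{
    size_t units = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        unsigned char b = src[i];
        if (st->remaining > 0) {
            if ((b & 0xC0) != 0x80) {
                break;                  /* expected a continuation byte */
            }
            st->codePoint = (st->codePoint << 6) | (b & 0x3F);
            if (--st->remaining > 0) {
                continue;               /* sequence still incomplete */
            }
        } else if (b < 0x80) {
            st->codePoint = b;
        } else if ((b & 0xE0) == 0xC0) {
            st->codePoint = b & 0x1F; st->remaining = 1; continue;
        } else if ((b & 0xF0) == 0xE0) {
            st->codePoint = b & 0x0F; st->remaining = 2; continue;
        } else if ((b & 0xF8) == 0xF0) {
            st->codePoint = b & 0x07; st->remaining = 3; continue;
        } else {
            break;                      /* malformed lead byte */
        }
        /* Complete code point: one unit in the BMP, two (a surrogate pair) above it. */
        units += (st->codePoint > 0xFFFF) ? 2 : 1;
    }
    return units;
}

/* Length query: converts with a copy of the state, so *st is never modified. */
static size_t toy_get_length(const ToyState *st, const unsigned char *src, size_t n)
{
    ToyState scratch = *st;
    return toy_to_utf16_units(&scratch, src, n);
}
```

If the state already holds the lead byte of a two-byte sequence, `toy_get_length()` counts the character completed by the next byte, and the caller's `ToyState` still shows one pending continuation byte afterwards. The proposal asks for the same behaviour from a real converter, where the state is cloned rather than copied by value.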

The following code illustrates how these functions might be implemented. Please note: this code was not compiled!

#define U_TEMP_BUFFER_SIZE 32

/*!
 * Counts the number of complete and valid UChar values that would be
 * produced as a result of translating the source sequence of characters.
 * This function attempts to translate the entire source, until the
 * source limit is reached, or until a translation error occurs.
 *
 * @param converter
 *      A Unicode converter.
 * @param source
 *      Input: Points to the first character in the
 *      source sequence.
 *      Output: Points to the character following the
 *      last character translated.
 * @param sourceLimit
 *      Points to the character location that follows
 *      the last character in the source sequence.
 * @param flush
 *      A value of true indicates that any characters
 *      that would be produced as a result of flushing
 *      the converter's internal state should also be counted.
 * @param err
 *      Input: If set to an error value, causes the
 *      function to immediately return.
 *      Output: An indication of success or failure.
 *      This function may return:
 *      \li \c U_INVALID_CHAR_FOUND
 *          The source contains a character that
 *          could not be translated into Unicode.
 *      \li \c U_ILLEGAL_CHAR_FOUND
 *          The source contains a character
 *          sequence that violates the external
 *          encoding scheme.
 *      \li \c U_TRUNCATED_CHAR_FOUND
 *          The source contains a truncated
 *          character sequence.
 * @return The number of complete and valid UChar values that
 *      would be produced by translating the source
 *      sequence.
 */

int32_t getToUnicodeLength(const UConverter* converter,
                           const char** source, const char* sourceLimit,
                           UBool flush, UErrorCode* err) {
    UChar buffer[U_TEMP_BUFFER_SIZE];
    UChar* target;
    const UChar* targetLimit = buffer + U_TEMP_BUFFER_SIZE;
    UConverter* localConverter;
    int32_t count = 0;

    if (U_FAILURE(*err)) return 0;
    if (sourceLimit < *source) {
        *err = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    /* Convert with a clone so the caller's conversion state is untouched. */
    localConverter = ucnv_safeClone(converter, NULL, NULL, err);
    if (U_FAILURE(*err)) return 0;

    do {
        target = buffer;
        *err = U_ZERO_ERROR;
        ucnv_toUnicode(localConverter, &target, targetLimit,
                       source, sourceLimit, NULL, flush, err);
        if (U_SUCCESS(*err) || *err == U_BUFFER_OVERFLOW_ERROR)
            count += (int32_t)(target - buffer);
        /* Overflow only means the scratch buffer filled up; go around again. */
    } while (*err == U_BUFFER_OVERFLOW_ERROR);

    ucnv_close(localConverter);
    return count;
}

/*!
 * Counts the number of complete and valid char values that would be
 * produced as a result of translating the source sequence of UChar
 * characters.
 * This function attempts to translate the entire source, until the
 * source limit is reached, or until a translation error occurs.
 *
 * @param converter
 *      A Unicode converter.
 * @param source
 *      Input: Points to the first character in the
 *      source sequence.
 *      Output: Points to the character following the
 *      last character translated.
 * @param sourceLimit
 *      Points to the character location that follows
 *      the last character in the source sequence.
 * @param flush
 *      A value of true indicates that any characters
 *      that would be produced as a result of flushing
 *      the converter's internal state should also be counted.
 * @param err
 *      Input: If set to an error value, causes the
 *      function to immediately return.
 *      Output: An indication of success or failure.
 *      This function may return:
 *      \li \c U_INVALID_CHAR_FOUND
 *          The source contains a character that
 *          could not be translated into the
 *          external encoding.
 *      \li \c U_ILLEGAL_CHAR_FOUND
 *          The source contains an invalid Unicode
 *          value or malformed surrogate pair.
 *      \li \c U_TRUNCATED_CHAR_FOUND
 *          The source contains or ends with a
 *          high-surrogate value that is not followed
 *          by a low-surrogate value.
 * @return The number of complete and valid char values that
 *      would be produced by translating the source
 *      sequence. For fixed-size multibyte encodings,
 *      this value may be divided by the value returned
 *      from getMaxCharSize() to calculate the number of
 *      external characters.
 */

int32_t getFromUnicodeLength(const UConverter* converter,
                             const UChar** source, const UChar* sourceLimit,
                             UBool flush, UErrorCode* err) {
    char buffer[U_TEMP_BUFFER_SIZE];
    char* target;
    const char* targetLimit = buffer + U_TEMP_BUFFER_SIZE;
    UConverter* localConverter;
    int32_t count = 0;

    if (U_FAILURE(*err)) return 0;
    if (sourceLimit < *source) {
        *err = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    /* Convert with a clone so the caller's conversion state is untouched. */
    localConverter = ucnv_safeClone(converter, NULL, NULL, err);
    if (U_FAILURE(*err)) return 0;

    do {
        target = buffer;
        *err = U_ZERO_ERROR;
        ucnv_fromUnicode(localConverter, &target, targetLimit,
                         source, sourceLimit, NULL, flush, err);
        if (U_SUCCESS(*err) || *err == U_BUFFER_OVERFLOW_ERROR)
            count += (int32_t)(target - buffer);
        /* Overflow only means the scratch buffer filled up; go around again. */
    } while (*err == U_BUFFER_OVERFLOW_ERROR);

    ucnv_close(localConverter);
    return count;
}

Attachments

Change History

12/31/69 17:31:54 changed by notes

Can pre-flighting be used instead?

12/31/69 17:31:55 changed by auditor

  • Wed Mar 7 21:35:49 2001 schererm moved from incoming to feature
  • 03/19/02 20:38:00 mark moved from feature to conversion
  • 10/29/02 15:04:46 hshih changed notes2
  • 05/29/03 16:03:05 hshih changed notes2
  • 02/06/04 19:09:00 schererm changed notes2
  • Fri Oct 13 23:18:19 2006 grhoten changed notes2: comments: "\n" to "",
  • Fri Oct 13 23:18:19 2006 grhoten changed notes

09/26/07 16:11:50 changed by markus

  • load changed.
  • xref changed.
  • java changed.
  • summary changed from RFE: Add ucnv_getToUnicodeLength() and ucnv_getFromUnicodeLength() functions to Add ucnv_getToUnicodeLength() and ucnv_getFromUnicodeLength() functions.
  • project set to all.
  • revw changed.
