In some circumstances uregex_replaceAll does not report the buffer size needed to hold the entire replaced string: when the destination buffer overflows, the returned length can be too small.
The bug is in uregex_replaceAll itself. The relevant lines of the original code:
    int32_t len = 0;
    uregex_reset(regexp, 0, status);
    while (uregex_findNext(regexp, status)) {
        len += uregex_appendReplacement(regexp, replacementText, replacementLength,
                                        &destBuf, &destCapacity, status);
    }
    len += uregex_appendTail(regexp, &destBuf, &destCapacity, status);
Proposed fix:
    int32_t len = 0, overflowed = 0;
    uregex_reset(regexp, 0, status);
    while (uregex_findNext(regexp, status)) {
        len += uregex_appendReplacement(regexp, replacementText, replacementLength,
                                        &destBuf, &destCapacity, status);
        if (*status == U_BUFFER_OVERFLOW_ERROR) { overflowed = 1; *status = U_ZERO_ERROR; }
    }
    if ((*status == U_ZERO_ERROR) && (overflowed == 1)) { *status = U_BUFFER_OVERFLOW_ERROR; }
    len += uregex_appendTail(regexp, &destBuf, &destCapacity, status);
The bug arises because once uregex_appendReplacement sets status to U_BUFFER_OVERFLOW_ERROR, the next call to uregex_findNext() 'fails' because status != 0, which ends the loop early. The sizes of any remaining replacements are therefore never added, so the final calculated buffer size comes up short.
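The mechanism can be reproduced without ICU in a small model. The sketch below is only an illustration: the Sim* names, sim_find_next, sim_append and the two replace_all_* functions are hypothetical stand-ins, and the model assumes only what is described above (the find step refuses to run once status holds an error, while each append always returns the length it needed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for U_ZERO_ERROR / U_BUFFER_OVERFLOW_ERROR. */
typedef enum { SIM_ZERO_ERROR = 0, SIM_BUFFER_OVERFLOW_ERROR = 15 } SimStatus;

/* Model of the find step: there is a "match" while pieces remain,
 * but only if no error is pending in *status. */
static int sim_find_next(int i, int n, const SimStatus *status) {
    return i < n && *status == SIM_ZERO_ERROR;
}

/* Model of an append step: always returns the length it needed;
 * flags an overflow when the piece does not fit in what is left. */
static int sim_append(int pieceLen, int *capacity, SimStatus *status) {
    assert(capacity != NULL && status != NULL);
    if (pieceLen > *capacity) {
        *capacity = 0;
        *status = SIM_BUFFER_OVERFLOW_ERROR;
    } else {
        *capacity -= pieceLen;
    }
    return pieceLen;
}

/* Original loop shape: the first overflow stops the loop, so the
 * lengths of later pieces are never counted. */
static int replace_all_buggy(const int *pieces, int n, int tailLen,
                             int capacity, SimStatus *status) {
    int len = 0;
    for (int i = 0; sim_find_next(i, n, status); i++) {
        len += sim_append(pieces[i], &capacity, status);
    }
    len += sim_append(tailLen, &capacity, status);
    return len;
}

/* Proposed loop shape: remember the overflow, clear it so the loop
 * keeps counting, and restore it before the tail is appended. */
static int replace_all_fixed(const int *pieces, int n, int tailLen,
                             int capacity, SimStatus *status) {
    int len = 0, overflowed = 0;
    for (int i = 0; sim_find_next(i, n, status); i++) {
        len += sim_append(pieces[i], &capacity, status);
        if (*status == SIM_BUFFER_OVERFLOW_ERROR) {
            overflowed = 1;
            *status = SIM_ZERO_ERROR;
        }
    }
    if (*status == SIM_ZERO_ERROR && overflowed) {
        *status = SIM_BUFFER_OVERFLOW_ERROR;
    }
    len += sim_append(tailLen, &capacity, status);
    return len;
}
```

With three replacement pieces of lengths 5, 7 and 3, a tail of length 2 and a capacity of 4, the buggy shape counts only the first piece plus the tail, while the fixed shape counts all of them; both leave the overflow status set.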
When uregex_replaceAll returns U_BUFFER_OVERFLOW_ERROR, the returned length should be sufficient to hold the fully replaced string. A second call with a buffer of that size should not return U_BUFFER_OVERFLOW_ERROR again, since the first, failed call should already have calculated the correct size.
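The calling pattern that depends on this behavior looks roughly like the sketch below. It does not use ICU: fill() is a hypothetical stand-in for any function that follows this size-reporting contract, and fill_exact() is an assumed helper showing a caller that probes with a small buffer and then allocates exactly the reported size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size-reporting function: copies src into dest when it
 * fits, always returns the full length needed, and sets *overflow
 * when the buffer was too small. */
static int fill(const char *src, char *dest, int capacity, int *overflow) {
    int needed = (int)strlen(src);
    assert(overflow != NULL);
    *overflow = needed > capacity;
    if (!*overflow) {
        memcpy(dest, src, (size_t)needed);
    }
    return needed;
}

/* Two-call pattern: probe with a deliberately small buffer, then
 * allocate exactly the reported size and call again. Returns a
 * malloc'd NUL-terminated string, or NULL on failure. */
static char *fill_exact(const char *src) {
    char probe[4];
    int overflow = 0;
    int needed = fill(src, probe, (int)sizeof probe, &overflow);
    char *out = malloc((size_t)needed + 1);
    if (out == NULL) {
        return NULL;
    }
    if (overflow) {
        needed = fill(src, out, needed, &overflow);
        if (overflow) {          /* the contract was violated */
            free(out);
            return NULL;
        }
    } else {
        memcpy(out, probe, (size_t)needed);
    }
    out[needed] = '\0';
    return out;
}
```

A caller written this way has no retry loop, which is why an undersized length from the first call surfaces as a second overflow rather than being silently absorbed.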
The following demonstrates the bug. I wrote it in the style of 'source/test/cintltst/reapits.c', and it can be placed in the replaceAll() section: I duplicated the first replaceAll test and placed this one right after it. However, I was working with only the uregex_openC call and the first uregex_replaceAll test in a standalone .c file, so I'm not sure whether it can be dropped into the full test suite completely unmodified. It should be pretty close, though.
As it stands, this test fails assertions 2, 3 and 4 and passes assertions 1 and 5. Assertion 5 passes only by a fluke: with these particular strings, two buffer overflow iterations happen to be enough to arrive at the correct size. Other replacement texts can cause additional buffer overflow errors until a large enough buffer size is finally calculated.
status = U_ZERO_ERROR;
u_uastrncpy(replText, "<$1>[$1]", sizeof(replText)/2);
resultSz = uregex_replaceAll(re, replText, -1, buf, 4, &status); // Buffer size of 4 is deliberate.
TEST_ASSERT(status == U_BUFFER_OVERFLOW_ERROR);
TEST_ASSERT(resultSz == (int32_t)strlen("Replace <aa>[aa] <1>[1] <...>...."));
status = U_ZERO_ERROR; // Clear U_BUFFER_OVERFLOW_ERROR error
resultSz = uregex_replaceAll(re, replText, -1, buf, resultSz, &status); // previous replaceAll buffer size to hold all of the replaced string
TEST_ASSERT_SUCCESS(status);
TEST_ASSERT_STRING("Replace <aa>[aa] <1>[1] <...>....", buf, TRUE);
TEST_ASSERT(resultSz == (int32_t)strlen("Replace <aa>[aa] <1>[1] <...>...."));
u_uastrncpy(replText, "<$1>", sizeof(replText)/2); // Reset replText