regcomp: Fix off-by-one bug in build_equiv_class [BZ #23396]
Commit Message
This bug is very similar to bug 23036: The existing code assumed that
the length count included the length byte itself.
2018-07-13 Florian Weimer <fweimer@redhat.com>
[BZ #23396]
* posix/regcomp.c (build_equiv_class): When comparing weights, do
not compare an extra byte after the end of the weights.
Comments
On 07/13/2018 11:35 AM, Florian Weimer wrote:
> This bug is very similar to bug 23036: The existing code assumed that
> the length count included the length byte itself.
>
> 2018-07-13 Florian Weimer <fweimer@redhat.com>
>
> [BZ #23396]
> * posix/regcomp.c (build_equiv_class): When comparing weights, do
> not compare an extra byte after the end of the weights.
There is another loop similar to this in fnmatch_loop.c, could you please
have a look at that one too. It appears to be correct, you pointed out to
me that it does 'cnt == len', but it would be good to have a definitive
answer there.
OK for 2.28.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> diff --git a/posix/regcomp.c b/posix/regcomp.c
> index 7b5ddaad0c..545d188468 100644
> --- a/posix/regcomp.c
> +++ b/posix/regcomp.c
> @@ -3531,18 +3531,10 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
> continue;
> /* Compare only if the length matches and the collation rule
> index is the same. */
> - if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24))
OK. len is weights[idx1 & 0xffffff]; Compare length the same and rule index the same.
> - {
> - int cnt = 0;
> -
> - while (cnt <= len &&
> - weights[(idx1 & 0xffffff) + 1 + cnt]
> - == weights[(idx2 & 0xffffff) + 1 + cnt])
> - ++cnt;
> -
> - if (cnt > len)
> - bitset_set (sbcset, ch);
> - }
OK. Compares 1 byte past the end.
> + if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24)
> + && memcmp (weights + (idx1 & 0xffffff) + 1,
> + weights + (idx2 & 0xffffff) + 1, len) == 0)
> + bitset_set (sbcset, ch);
OK. Compares only len via memcmp.
> }
> /* Check whether the array has enough space. */
> if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0))
>
Cheers,
Carlos.
On 13/07/2018 15:30, Carlos O'Donell wrote:
> On 07/13/2018 11:35 AM, Florian Weimer wrote:
>> This bug is very similar to bug 23036: The existing code assumed that
>> the length count included the length byte itself.
>>
>> 2018-07-13 Florian Weimer <fweimer@redhat.com>
>>
>> [BZ #23396]
>> * posix/regcomp.c (build_equiv_class): When comparing weights, do
>> not compare an extra byte after the end of the weights.
>
> There is another loop similar to this in fnmatch_loop.c, could you please
> have a look at that one too. It appears to be correct, you pointed out to
> me that it does 'cnt == len', but it would be good to have a definitive
> answer there.
Maybe is it related to this patch [1]? This is on my radar to get reviewed,
however I would need some time to dig into the locale internal layout and
organizations to get this done.
[1] https://sourceware.org/ml/libc-alpha/2018-05/msg00794.html
>
> OK for 2.28.
>
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
>
>> diff --git a/posix/regcomp.c b/posix/regcomp.c
>> index 7b5ddaad0c..545d188468 100644
>> --- a/posix/regcomp.c
>> +++ b/posix/regcomp.c
>> @@ -3531,18 +3531,10 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
>> continue;
>> /* Compare only if the length matches and the collation rule
>> index is the same. */
>> - if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24))
>
> OK. len is weights[idx1 & 0xffffff]; Compare length the same and rule index the same.
>
>> - {
>> - int cnt = 0;
>> -
>> - while (cnt <= len &&
>> - weights[(idx1 & 0xffffff) + 1 + cnt]
>> - == weights[(idx2 & 0xffffff) + 1 + cnt])
>> - ++cnt;
>> -
>> - if (cnt > len)
>> - bitset_set (sbcset, ch);
>> - }
>
> OK. Compares 1 byte past the end.
>
>> + if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24)
>> + && memcmp (weights + (idx1 & 0xffffff) + 1,
>> + weights + (idx2 & 0xffffff) + 1, len) == 0)
>> + bitset_set (sbcset, ch);
>
> OK. Compares only len via memcmp.
>
>> }
>> /* Check whether the array has enough space. */
>> if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0))
>>
>
> Cheers,
> Carlos.
>
On 07/13/2018 08:30 PM, Carlos O'Donell wrote:
> On 07/13/2018 11:35 AM, Florian Weimer wrote:
>> This bug is very similar to bug 23036: The existing code assumed that
>> the length count included the length byte itself.
>>
>> 2018-07-13 Florian Weimer<fweimer@redhat.com>
>>
>> [BZ #23396]
>> * posix/regcomp.c (build_equiv_class): When comparing weights, do
>> not compare an extra byte after the end of the weights.
> There is another loop similar to this in fnmatch_loop.c, could you please
> have a look at that one too. It appears to be correct, you pointed out to
> me that it does 'cnt == len', but it would be good to have a definitive
> answer there.
I enhanced my enumeration tester to cover fnmatch too, and it produces
the expected result for [[=a=]] (matching the fixed regex result),
without further changes.
Thanks,
Florian
On 07/24/2018 10:35 AM, Florian Weimer wrote:> I enhanced my enumeration tester to cover fnmatch too, and it
> produces the expected result for [[=a=]] (matching the fixed regex
> result), without further changes.
Thanks for checking.
Cheers,
Carlos.
@@ -3531,18 +3531,10 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
continue;
/* Compare only if the length matches and the collation rule
index is the same. */
- if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24))
- {
- int cnt = 0;
-
- while (cnt <= len &&
- weights[(idx1 & 0xffffff) + 1 + cnt]
- == weights[(idx2 & 0xffffff) + 1 + cnt])
- ++cnt;
-
- if (cnt > len)
- bitset_set (sbcset, ch);
- }
+ if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24)
+ && memcmp (weights + (idx1 & 0xffffff) + 1,
+ weights + (idx2 & 0xffffff) + 1, len) == 0)
+ bitset_set (sbcset, ch);
}
/* Check whether the array has enough space. */
if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0))