regcomp: Fix off-by-one bug in build_equiv_class [BZ #23396]

Message ID 20180713153540.2B5BB43994575@oldenburg.str.redhat.com
State Committed
Headers

Commit Message

Florian Weimer July 13, 2018, 3:35 p.m. UTC
  This bug is very similar to bug 23036: The existing code assumed that
the length count included the length byte itself.

2018-07-13  Florian Weimer  <fweimer@redhat.com>

	[BZ #23396]
	* posix/regcomp.c (build_equiv_class): When comparing weights, do
	not compare an extra byte after the end of the weights.
  

Comments

Carlos O'Donell July 13, 2018, 6:30 p.m. UTC | #1
On 07/13/2018 11:35 AM, Florian Weimer wrote:
> This bug is very similar to bug 23036: The existing code assumed that
> the length count included the length byte itself.
> 
> 2018-07-13  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #23396]
> 	* posix/regcomp.c (build_equiv_class): When comparing weights, do
> 	not compare an extra byte after the end of the weights.

There is another loop similar to this in fnmatch_loop.c, could you please
have a look at that one too. It appears to be correct, you pointed out to
me that it does 'cnt == len', but it would be good to have a definitive
answer there.

OK for 2.28.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> diff --git a/posix/regcomp.c b/posix/regcomp.c
> index 7b5ddaad0c..545d188468 100644
> --- a/posix/regcomp.c
> +++ b/posix/regcomp.c
> @@ -3531,18 +3531,10 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
>  	    continue;
>  	  /* Compare only if the length matches and the collation rule
>  	     index is the same.  */
> -	  if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24))

OK. len is weights[idx1 & 0xffffff]; Compare length the same and rule index the same.

> -	    {
> -	      int cnt = 0;
> -
> -	      while (cnt <= len &&
> -		     weights[(idx1 & 0xffffff) + 1 + cnt]
> -		     == weights[(idx2 & 0xffffff) + 1 + cnt])
> -		++cnt;
> -
> -	      if (cnt > len)
> -		bitset_set (sbcset, ch);
> -	    }

OK. Compares 1 byte past the end.

> +	  if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24)
> +	      && memcmp (weights + (idx1 & 0xffffff) + 1,
> +			 weights + (idx2 & 0xffffff) + 1, len) == 0)
> +	    bitset_set (sbcset, ch);

OK. Compares only len via memcmp.

>  	}
>        /* Check whether the array has enough space.  */
>        if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0))
> 

Cheers,
Carlos.
  
Adhemerval Zanella Netto July 13, 2018, 6:46 p.m. UTC | #2
On 13/07/2018 15:30, Carlos O'Donell wrote:
> On 07/13/2018 11:35 AM, Florian Weimer wrote:
>> This bug is very similar to bug 23036: The existing code assumed that
>> the length count included the length byte itself.
>>
>> 2018-07-13  Florian Weimer  <fweimer@redhat.com>
>>
>> 	[BZ #23396]
>> 	* posix/regcomp.c (build_equiv_class): When comparing weights, do
>> 	not compare an extra byte after the end of the weights.
> 
> There is another loop similar to this in fnmatch_loop.c, could you please
> have a look at that one too. It appears to be correct, you pointed out to
> me that it does 'cnt == len', but it would be good to have a definitive
> answer there.

Maybe is it related to this patch [1]? This is on my radar to get reviewed,
however I would need some time to dig into the locale internal layout and
organizations to get this done.

[1] https://sourceware.org/ml/libc-alpha/2018-05/msg00794.html

> 
> OK for 2.28.
> 
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> 
>> diff --git a/posix/regcomp.c b/posix/regcomp.c
>> index 7b5ddaad0c..545d188468 100644
>> --- a/posix/regcomp.c
>> +++ b/posix/regcomp.c
>> @@ -3531,18 +3531,10 @@ build_equiv_class (bitset_t sbcset, const unsigned char *name)
>>  	    continue;
>>  	  /* Compare only if the length matches and the collation rule
>>  	     index is the same.  */
>> -	  if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24))
> 
> OK. len is weights[idx1 & 0xffffff]; Compare length the same and rule index the same.
> 
>> -	    {
>> -	      int cnt = 0;
>> -
>> -	      while (cnt <= len &&
>> -		     weights[(idx1 & 0xffffff) + 1 + cnt]
>> -		     == weights[(idx2 & 0xffffff) + 1 + cnt])
>> -		++cnt;
>> -
>> -	      if (cnt > len)
>> -		bitset_set (sbcset, ch);
>> -	    }
> 
> OK. Compares 1 byte past the end.
> 
>> +	  if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24)
>> +	      && memcmp (weights + (idx1 & 0xffffff) + 1,
>> +			 weights + (idx2 & 0xffffff) + 1, len) == 0)
>> +	    bitset_set (sbcset, ch);
> 
> OK. Compares only len via memcmp.
> 
>>  	}
>>        /* Check whether the array has enough space.  */
>>        if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0))
>>
> 
> Cheers,
> Carlos.
>
  
Florian Weimer July 24, 2018, 2:35 p.m. UTC | #3
On 07/13/2018 08:30 PM, Carlos O'Donell wrote:
> On 07/13/2018 11:35 AM, Florian Weimer wrote:
>> This bug is very similar to bug 23036: The existing code assumed that
>> the length count included the length byte itself.
>>
>> 2018-07-13  Florian Weimer<fweimer@redhat.com>
>>
>> 	[BZ #23396]
>> 	* posix/regcomp.c (build_equiv_class): When comparing weights, do
>> 	not compare an extra byte after the end of the weights.
> There is another loop similar to this in fnmatch_loop.c, could you please
> have a look at that one too. It appears to be correct, you pointed out to
> me that it does 'cnt == len', but it would be good to have a definitive
> answer there.

I enhanced my enumeration tester to cover fnmatch too, and it produces 
the expected result for [[=a=]] (matching the fixed regex result), 
without further changes.

Thanks,
Florian
  
Carlos O'Donell July 24, 2018, 3:32 p.m. UTC | #4
On 07/24/2018 10:35 AM, Florian Weimer wrote:> I enhanced my enumeration tester to cover fnmatch too, and it
> produces the expected result for [[=a=]] (matching the fixed regex
> result), without further changes.

Thanks for checking.

Cheers,
Carlos.
  

Patch

diff --git a/posix/regcomp.c b/posix/regcomp.c
index 7b5ddaad0c..545d188468 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -3531,18 +3531,10 @@  build_equiv_class (bitset_t sbcset, const unsigned char *name)
 	    continue;
 	  /* Compare only if the length matches and the collation rule
 	     index is the same.  */
-	  if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24))
-	    {
-	      int cnt = 0;
-
-	      while (cnt <= len &&
-		     weights[(idx1 & 0xffffff) + 1 + cnt]
-		     == weights[(idx2 & 0xffffff) + 1 + cnt])
-		++cnt;
-
-	      if (cnt > len)
-		bitset_set (sbcset, ch);
-	    }
+	  if (len == weights[idx2 & 0xffffff] && (idx1 >> 24) == (idx2 >> 24)
+	      && memcmp (weights + (idx1 & 0xffffff) + 1,
+			 weights + (idx2 & 0xffffff) + 1, len) == 0)
+	    bitset_set (sbcset, ch);
 	}
       /* Check whether the array has enough space.  */
       if (BE (*equiv_class_alloc == mbcset->nequiv_classes, 0))