regex: fix buffer read overrun in search [BZ#28470]

Message ID 20211018221548.76024-1-eggert@cs.ucla.edu
State Changes Requested, archived
Series regex: fix buffer read overrun in search [BZ#28470]

Checks

Context                 Check    Description
dj/TryBot-apply_patch   success  Patch applied to master at the time it was sent
dj/TryBot-32bit         success  Build for i686

Commit Message

Paul Eggert Oct. 18, 2021, 10:15 p.m. UTC
  Problem reported by Benno Schulenberg in:
https://lists.gnu.org/r/bug-gnulib/2021-10/msg00035.html
* posix/regexec.c (re_search_internal): Use better bounds check.
---
 posix/regexec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Andreas Schwab Oct. 19, 2021, 7:17 a.m. UTC | #1
On Oct 18 2021, Paul Eggert wrote:

>  	      /* If MATCH_FIRST is out of the buffer, leave it as '\0'.
>  		 Note that MATCH_FIRST must not be smaller than 0.  */
> -	      ch = (match_first >= length
> +	      ch = (mctx.input.valid_len <= offset

That needs to update the comment.

Andreas.
  
Paul Eggert Oct. 19, 2021, 8:13 a.m. UTC | #2
On 10/19/21 00:17, Andreas Schwab wrote:
> That needs to update the comment.

Thanks, revised patch attached.
  
Andreas Schwab Oct. 19, 2021, 8:25 a.m. UTC | #3
On Oct 19 2021, Paul Eggert wrote:

> +	      ch = (mctx.input.valid_len <= offset

This is backwards.

Andreas.
  
Paul Eggert Oct. 19, 2021, 8:57 a.m. UTC | #4
On 10/19/21 01:25, Andreas Schwab wrote:
> On Oct 19 2021, Paul Eggert wrote:
> 
>> +	      ch = (mctx.input.valid_len <= offset
> 
> This is backwards.

It's correct as-is, so that comment is merely about style. I revamped 
the patch to turn the comparison around; see attached. Let's not have 
our longstanding style disagreement distract us from the fix.
  
Andreas Schwab Oct. 19, 2021, 3:09 p.m. UTC | #5
On Oct 19 2021, Paul Eggert wrote:

> diff --git a/posix/regexec.c b/posix/regexec.c
> index 83e9aaf8ca..6aeba3c0b4 100644
> --- a/posix/regexec.c
> +++ b/posix/regexec.c
> @@ -758,10 +758,9 @@ re_search_internal (const regex_t *preg, const char *string, Idx length,
>  
>  		  offset = match_first - mctx.input.raw_mbs_idx;
>  		}
> -	      /* If MATCH_FIRST is out of the buffer, leave it as '\0'.
> -		 Note that MATCH_FIRST must not be smaller than 0.  */
> -	      ch = (match_first >= length
> -		    ? 0 : re_string_byte_at (&mctx.input, offset));
> +	      /* Use buffer byte if OFFSET is in buffer, otherwise '\0'.  */
> +	      ch = (offset < mctx.input.valid_len
> +		    ? re_string_byte_at (&mctx.input, offset) : 0);

Why is the bug not in re_string_reconstruct?  Since string[match_first]
exists, so should re_string_byte_at (&mctx.input, offset).

Andreas.
  
Paul Eggert Oct. 19, 2021, 6:14 p.m. UTC | #6
On 10/19/21 08:09, Andreas Schwab wrote:
> Why is the bug not in re_string_reconstruct?  Since string[match_first]
> exists, so should re_string_byte_at (&mctx.input, offset).

I don't know, as I lacked the time to investigate re_string_reconstruct. 
Although the patch I proposed fixes the test case that prompted it, 
possibly it is only a partial fix for a more-general problem.
  
Paul Eggert Nov. 24, 2021, 10:27 p.m. UTC | #7
No further comment, and the patch is safe and has been used in Gnulib 
for some time even if it doesn't necessarily fix the whole underlying 
problem, so I installed it. Tests pass on x86-64.
  
Andreas Schwab Nov. 24, 2021, 10:45 p.m. UTC | #8
On Nov 24 2021, Paul Eggert wrote:

> the patch is safe

Is it?  Why?

Andreas.
  
Paul Eggert Nov. 24, 2021, 11:50 p.m. UTC | #9
On 11/24/21 14:45, Andreas Schwab wrote:
> Is it?  Why?

Partly because it refuses to read past the bounds of an array, where the 
old code would. And partly because it's been run through several tests 
- not just glibc tests, but also grep and coreutils and probably some 
others by now.

Of course this is not a 100% guarantee of safety, but it's close enough.
  
Andreas Schwab Nov. 25, 2021, 9:01 a.m. UTC | #10
On Nov 24 2021, Paul Eggert wrote:

> On 11/24/21 14:45, Andreas Schwab wrote:
>> Is it?  Why?
>
> Partly because it refuses to read past the bounds of an array, where the
> old code would.

That's just papering over a bug, not fixing it.

> And partly because it's been run through several tests - not just
> glibc tests, but also grep and coreutils and probably some others by
> now.

How much coverage do they provide?

Also, you failed to add a test.

Andreas.
  
Paul Eggert Nov. 26, 2021, 6:35 p.m. UTC | #11
On 11/25/21 01:01, Andreas Schwab wrote:

>> Partly because it refuses to read past the bounds of an array, where the
>> old code would.
> 
> That's just papering over a bug, not fixing it.

That's not clear to me. Perhaps you're right, but perhaps it really does 
fix the bug.

>> And partly because it's been run through several tests - not just
>> glibc tests, but also grep and coreutils and probably some others by
>> now.
> 
> How much coverage do they provide?

Someone who has more time could presumably determine this by looking at 
the respective test suites. I forgot to mention, Gnulib also has its own 
regex tests (which also pass).

> Also, you failed to add a test.

Yes, that's correct. It would be nice if someone could do that. However, 
it'd be some work and like you I'm pressed for time.
  
Andreas Schwab Nov. 26, 2021, 6:39 p.m. UTC | #12
On Nov 26 2021, Paul Eggert wrote:

> On 11/25/21 01:01, Andreas Schwab wrote:
>
>>> Partly because it refuses to read past the bounds of an array, where the
>>> old code would.
>> That's just papering over a bug, not fixing it.
>
> That's not clear to me. Perhaps you're right, but perhaps it really does
> fix the bug.

That's why we need a proper test case.  Not voodoo programming.

Andreas.
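
A regression test along the lines requested above might start from a small harness like the following sketch. It uses only the POSIX regcomp/regexec/regfree interface. The helper name first_match_offset is hypothetical, and the pattern and input in the usage note are placeholders that do not reproduce BZ#28470. Since an out-of-bounds read need not change the match result, such a test would presumably have to run under a memory checker to be meaningful.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of glibc.
   Return the start offset of the first match of PATTERN in TEXT,
   -1 if there is no match, or -2 if PATTERN fails to compile.  */
static long
first_match_offset (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m[1];
  long result;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -2;

  /* regexec returns 0 on a match and fills in m[0] with its bounds.  */
  result = regexec (&re, text, 1, m, 0) == 0 ? (long) m[0].rm_so : -1;

  regfree (&re);
  return result;
}
```

For example, first_match_offset ("b+", "aabbb") returns 2, and a pattern with no match in the text yields -1.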
  

Patch

diff --git a/posix/regexec.c b/posix/regexec.c
index 83e9aaf8ca..a955aa2182 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -760,7 +760,7 @@  re_search_internal (const regex_t *preg, const char *string, Idx length,
 		}
 	      /* If MATCH_FIRST is out of the buffer, leave it as '\0'.
 		 Note that MATCH_FIRST must not be smaller than 0.  */
-	      ch = (match_first >= length
+	      ch = (mctx.input.valid_len <= offset
 		    ? 0 : re_string_byte_at (&mctx.input, offset));
 	      if (fastmap[ch])
 		break;
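
The bounds check in its revised form (comment #5's quoted hunk) can be illustrated in isolation. The following is a minimal sketch, not glibc code: struct input_buf and byte_at_or_nul are hypothetical stand-ins for re_string_t and re_string_byte_at, and the lower-bound test on the offset is an extra assumption that the patch itself does not add.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of glibc's re_string_t.  */
struct input_buf
{
  const unsigned char *mbs;   /* buffered bytes */
  ptrdiff_t valid_len;        /* number of bytes in MBS that may be read */
};

/* Use the buffer byte if OFFSET is in the valid part of the buffer,
   otherwise '\0'.  The "0 <= offset" test is an assumption added for
   this sketch; the patch compares only against valid_len.  */
static unsigned char
byte_at_or_nul (const struct input_buf *in, ptrdiff_t offset)
{
  return (0 <= offset && offset < in->valid_len) ? in->mbs[offset] : 0;
}
```

The point of the sketch is that the comparison uses the length of the buffer actually being indexed (valid_len), rather than the length of the original string, which may differ from what the buffer currently holds.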