[2/3] ungetc: Fix uninitialized read when putting into unused streams [BZ #27821]

Message ID 20240814233534.1469084-3-siddhesh@sourceware.org
State Superseded
Headers
Series ungetc fixes for BZ #27821 |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed

Commit Message

Siddhesh Poyarekar Aug. 14, 2024, 11:35 p.m. UTC
  When ungetc is called on an unused stream, the backup buffer is
allocated without the main get area being present.  This results in two
every subsequent ungetc (as the stream remains in the backup area)
checking uninitialized memory in the backup buffer when trying to put a
character back into the stream.

Avoid comparing the input character with buffer contents when in backup
to avoid this uninitialized read.  The uninitialized read is harmless in
this context since the location is promptly overwritten with the input
character, thus fulfilling ungetc functionality.

Also adjust wording in the manual to drop the paragraph that says glibc
cannot do multiple ungetc back to back since with this change, ungetc
can actually do this.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
 libio/genops.c            | 2 +-
 manual/stdio.texi         | 8 +++-----
 stdio-common/tst-ungetc.c | 2 ++
 3 files changed, 6 insertions(+), 6 deletions(-)
  

Comments

Andreas Schwab Aug. 15, 2024, 7:15 a.m. UTC | #1
On Aug 14 2024, Siddhesh Poyarekar wrote:

> When ungetc is called on an unused stream, the backup buffer is
> allocated without the main get area being present.  This results in two
> every subsequent ungetc (as the stream remains in the backup area)

I think you want to remove "two" here.
  
Siddhesh Poyarekar Aug. 15, 2024, 1 p.m. UTC | #2
On 2024-08-15 03:15, Andreas Schwab wrote:
> On Aug 14 2024, Siddhesh Poyarekar wrote:
> 
>> When ungetc is called on an unused stream, the backup buffer is
>> allocated without the main get area being present.  This results in two
>> every subsequent ungetc (as the stream remains in the backup area)
> 
> I think you want to remove "two" here.
> 

Fixed commit message, is the change OK otherwise?

Thanks,
Sid
  
Carlos O'Donell Aug. 15, 2024, 5:23 p.m. UTC | #3
On 8/14/24 7:35 PM, Siddhesh Poyarekar wrote:
> When ungetc is called on an unused stream, the backup buffer is
> allocated without the main get area being present.  This results in two
> every subsequent ungetc (as the stream remains in the backup area)
> checking uninitialized memory in the backup buffer when trying to put a
> character back into the stream.
> 
> Avoid comparing the input character with buffer contents when in backup
> to avoid this uninitialized read.  The uninitialized read is harmless in
> this context since the location is promptly overwritten with the input
> character, thus fulfilling ungetc functionality.
> 
> Also adjust wording in the manual to drop the paragraph that says glibc
> cannot do multiple ungetc back to back since with this change, ungetc
> can actually do this.
> 
> Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

LGTM.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  libio/genops.c            | 2 +-
>  manual/stdio.texi         | 8 +++-----
>  stdio-common/tst-ungetc.c | 2 ++
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/libio/genops.c b/libio/genops.c
> index 99f5e80f20..b012fa33d2 100644
> --- a/libio/genops.c
> +++ b/libio/genops.c
> @@ -662,7 +662,7 @@ _IO_sputbackc (FILE *fp, int c)
>  {
>    int result;
>  
> -  if (fp->_IO_read_ptr > fp->_IO_read_base
> +  if (fp->_IO_read_ptr > fp->_IO_read_base && !_IO_in_backup (fp)

OK. This now *MATCHES* the conditional in _IO_default_pbackfail.

We avoid reading from _IO_read_ptr[-1] if _IO_in_backup (fp).

If _IO_in_backup (fp) then we should be looking at writing the character directly
into the backup area rather than reading from it.

>        && (unsigned char)fp->_IO_read_ptr[-1] == (unsigned char)c)
>      {
>        fp->_IO_read_ptr--;
> diff --git a/manual/stdio.texi b/manual/stdio.texi
> index 8517653507..92614775fa 100644
> --- a/manual/stdio.texi
> +++ b/manual/stdio.texi
> @@ -1467,11 +1467,9 @@ program; usually @code{ungetc} is used only to unread a character that
>  was just read from the same stream.  @Theglibc{} supports this
>  even on files opened in binary mode, but other systems might not.
>  
> -@Theglibc{} only supports one character of pushback---in other
> -words, it does not work to call @code{ungetc} twice without doing input
> -in between.  Other systems might let you push back multiple characters;
> -then reading from the stream retrieves the characters in the reverse
> -order that they were pushed.
> +@Theglibc{} supports pushing back multiple characters; subsequently
> +reading from the stream retrieves the characters in the reverse order
> +that they were pushed.
>  
>  Pushing back characters doesn't alter the file; only the internal
>  buffering for the stream is affected.  If a file positioning function
> diff --git a/stdio-common/tst-ungetc.c b/stdio-common/tst-ungetc.c
> index 5c808f0734..388b202493 100644
> --- a/stdio-common/tst-ungetc.c
> +++ b/stdio-common/tst-ungetc.c
> @@ -48,6 +48,8 @@ do_test (void)
>    TEST_VERIFY_EXIT (getc (fp) == 'b');
>    TEST_VERIFY_EXIT (getc (fp) == 'l');
>    TEST_VERIFY_EXIT (ungetc ('m', fp) == 'm');
> +  TEST_VERIFY_EXIT (ungetc ('n', fp) == 'n');
> +  TEST_VERIFY_EXIT (getc (fp) == 'n');
>    TEST_VERIFY_EXIT (getc (fp) == 'm');

OK. Adds test case for 2 character ungetc and read back via getc.

>    TEST_VERIFY_EXIT ((c = getc (fp)) == 'a');
>    TEST_VERIFY_EXIT (getc (fp) == EOF);
  

Patch

diff --git a/libio/genops.c b/libio/genops.c
index 99f5e80f20..b012fa33d2 100644
--- a/libio/genops.c
+++ b/libio/genops.c
@@ -662,7 +662,7 @@  _IO_sputbackc (FILE *fp, int c)
 {
   int result;
 
-  if (fp->_IO_read_ptr > fp->_IO_read_base
+  if (fp->_IO_read_ptr > fp->_IO_read_base && !_IO_in_backup (fp)
       && (unsigned char)fp->_IO_read_ptr[-1] == (unsigned char)c)
     {
       fp->_IO_read_ptr--;
diff --git a/manual/stdio.texi b/manual/stdio.texi
index 8517653507..92614775fa 100644
--- a/manual/stdio.texi
+++ b/manual/stdio.texi
@@ -1467,11 +1467,9 @@  program; usually @code{ungetc} is used only to unread a character that
 was just read from the same stream.  @Theglibc{} supports this
 even on files opened in binary mode, but other systems might not.
 
-@Theglibc{} only supports one character of pushback---in other
-words, it does not work to call @code{ungetc} twice without doing input
-in between.  Other systems might let you push back multiple characters;
-then reading from the stream retrieves the characters in the reverse
-order that they were pushed.
+@Theglibc{} supports pushing back multiple characters; subsequently
+reading from the stream retrieves the characters in the reverse order
+that they were pushed.
 
 Pushing back characters doesn't alter the file; only the internal
 buffering for the stream is affected.  If a file positioning function
diff --git a/stdio-common/tst-ungetc.c b/stdio-common/tst-ungetc.c
index 5c808f0734..388b202493 100644
--- a/stdio-common/tst-ungetc.c
+++ b/stdio-common/tst-ungetc.c
@@ -48,6 +48,8 @@  do_test (void)
   TEST_VERIFY_EXIT (getc (fp) == 'b');
   TEST_VERIFY_EXIT (getc (fp) == 'l');
   TEST_VERIFY_EXIT (ungetc ('m', fp) == 'm');
+  TEST_VERIFY_EXIT (ungetc ('n', fp) == 'n');
+  TEST_VERIFY_EXIT (getc (fp) == 'n');
   TEST_VERIFY_EXIT (getc (fp) == 'm');
   TEST_VERIFY_EXIT ((c = getc (fp)) == 'a');
   TEST_VERIFY_EXIT (getc (fp) == EOF);