[1/3] : C++20 P0482R6 and C2X N2653: Fix for bug 25744, mbrtowc with Big5-HKSCS

Message ID 9de491d0-5f3f-518e-47d8-1f25123027bf@honermann.net
State Superseded
Headers
Series : C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb(). |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Tom Honermann Jan. 8, 2022, 12:39 a.m. UTC
  This patch for bug 25744 [1] updates the Big5-HKSCS converter to 
properly maintain the lowest 3 bits of the mbstate_t __count data 
member.  This change is necessary to ensure that state is correctly 
preserved when the converter encounters an incomplete multibyte 
character.  More details are available in bug 25744 [1].

The code changes are styled to match how these bits are maintained by 
converters such as iso-2022-jp.c, ibm930.c, and others.

Running 'grep __count' in the 'iconvdata' directory suggests that a 
number of other converters, euc-jisx0213.c for example, also fail to 
preserve these bits in some cases, though it may be that negative 
effects are not observed for those converters.  This patch does not 
attempt to address such issues with other converters.

This fix was previously posted to this mailing list on April 7th, 2020 
[2], and again on June 7th, 2021 [3], but was not followed up on.

Tested on Linux x86_64.

Tom.

[1]: Bug 25744
      "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
      second byte of certain double byte characters"
      https://sourceware.org/bugzilla/show_bug.cgi?id=25744

[2]: "[PATCH] Correct the Big5-HKSCS converter to preserve low order
      state bits (bug 25744)"
      https://sourceware.org/pipermail/libc-alpha/2020-April/112595.html

[3]: "[PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744,
      mbrtowc with Big5-HKSCS"
      https://sourceware.org/pipermail/libc-alpha/2021-June/127231.html
  

Comments

Adhemerval Zanella May 16, 2022, 6:37 p.m. UTC | #1
On 07/01/2022 21:39, Tom Honermann via Libc-alpha wrote:
> Subject:
> [PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744, mbrtowc with Big5-HKSCS
> From:
> Tom Honermann via Libc-alpha <libc-alpha@sourceware.org>
> Date:
> 07/01/2022 21:39
> 
> To:
> libc-alpha <libc-alpha@sourceware.org>
> 
> 
> This patch for bug 25744 [1] updates the Big5-HKSCS converter to properly maintain the lowest 3 bits of the mbstate_t __count data member.  This change is necessary to ensure that state is correctly preserved when the converter encounters an incomplete multibyte character.  More details are available in bug 25744 [1].
> 
> The code changes are styled to match how these bits are maintained by converters such as iso-2022-jp.c, ibm930.c, and others.
> 
> Running 'grep __count' in the 'iconvdata' directory suggests that a number of other converters, euc-jisx0213.c for example, also fail to preserve these bits in some cases, though it may be that negative effects are not observed for those converters.  This patch does not attempt to address such issues with other converters.
> 
> This fix was previously posted to this mailing list on April 7th, 2020 [2], and again on June 7th, 2021 [3], but was not followed up on.
> 
> Tested on Linux x86_64.
> 
> Tom.
> 
> [1]: Bug 25744
>      "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
>      second byte of certain double byte characters"
>      https://sourceware.org/bugzilla/show_bug.cgi?id=25744
> 
> [2]: "[PATCH] Correct the Big5-HKSCS converter to preserve low order
>      state bits (bug 25744)"
>      https://sourceware.org/pipermail/libc-alpha/2020-April/112595.html
> 
> [3]: "[PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744,
>      mbrtowc with Big5-HKSCS"
>      https://sourceware.org/pipermail/libc-alpha/2021-June/127231.html
> 
> n2653-1.patch

Next time please send the patch inline instead of as attachment (as git-send
does).  It is usually easier to reply with most email clients.

> 
> commit 900d307f4bfc5016e1a18711e8909dbce3488d3e
> Author: Tom Honermann <tom@honermann.net>
> Date:   Wed Jan 5 18:02:24 2022 -0500
> 
>     Correct the Big5-HKSCS converter to preserve low order state bits.
>     
>     BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=25744

The fix make sense, although I am not familiar with the locate.  Some comments
on testcase.

> 
> diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
> index ed51412bb9..2455c9db53 100644
> --- a/iconvdata/big5hkscs.c
> +++ b/iconvdata/big5hkscs.c
> @@ -17769,7 +17769,7 @@ static struct
>     the output state to the initial state.  This has to be done during the
>     flushing.  */
>  #define EMIT_SHIFT_TO_INIT \
> -  if (data->__statep->__count != 0)					      \
> +  if ((data->__statep->__count >> 3) != 0)				      \
>      {									      \
>        if (FROM_DIRECTION)						      \
>  	{								      \
> @@ -17778,7 +17778,7 @@ static struct
>  	      /* Write out the last character.  */			      \
>  	      *((uint32_t *) outbuf) = data->__statep->__count >> 3;	      \
>  	      outbuf += sizeof (uint32_t);				      \
> -	      data->__statep->__count = 0;				      \
> +	      data->__statep->__count &= 7;				      \
>  	    }								      \
>  	  else								      \
>  	    /* We don't have enough room in the output buffer.  */	      \
> @@ -17792,7 +17792,7 @@ static struct
>  	      uint32_t lasttwo = data->__statep->__count >> 3;		      \
>  	      *outbuf++ = (lasttwo >> 8) & 0xff;			      \
>  	      *outbuf++ = lasttwo & 0xff;				      \
> -	      data->__statep->__count = 0;				      \
> +	      data->__statep->__count &= 7;				      \
>  	    }								      \
>  	  else								      \
>  	    /* We don't have enough room in the output buffer.  */	      \
> @@ -17878,7 +17878,7 @@ static struct
>  									      \
>  		/* Otherwise store only the first character now, and	      \
>  		   put the second one into the queue.  */		      \
> -		*statep = ch2 << 3;					      \
> +		*statep = (ch2 << 3) | (*statep & 7);			      \
>  		/* Tell the caller why we terminate the loop.  */	      \
>  		result = __GCONV_FULL_OUTPUT;				      \
>  		break;							      \
> @@ -17895,7 +17895,7 @@ static struct
>        }									      \
>      else								      \
>        /* Clear the queue and proceed to output the saved character.  */	      \
> -      *statep = 0;							      \
> +      *statep &= 7;							      \
>  									      \
>      put32 (outptr, ch);							      \
>      outptr += 4;							      \
> @@ -17946,7 +17946,7 @@ static struct
>  	  }								      \
>  	*outptr++ = (ch >> 8) & 0xff;					      \
>  	*outptr++ = ch & 0xff;						      \
> -	*statep = 0;							      \
> +	*statep &= 7;							      \
>  	inptr += 4;							      \
>  	continue;							      \
>  									      \
> @@ -17959,7 +17959,7 @@ static struct
>  	  }								      \
>  	*outptr++ = (lasttwo >> 8) & 0xff;				      \
>  	*outptr++ = lasttwo & 0xff;					      \
> -	*statep = 0;							      \
> +	*statep &= 7;							      \
>  	continue;							      \
>        }									      \
>  									      \
> @@ -17996,7 +17996,7 @@ static struct
>  	   /* Check for possible combining character.  */		      \
>  	    if (__glibc_unlikely (ch == 0xca || ch == 0xea))		      \
>  	      {								      \
> -		*statep = ((cp[0] << 8) | cp[1]) << 3;			      \
> +		*statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7);      \
>  		inptr += 4;						      \
>  		continue;						      \
>  	      }								      \
> diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> index 91d7d3a552..12697ebe23 100644
> --- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> +++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> @@ -128,6 +128,71 @@ check_conversion (struct testdata test)
>        printf ("error: Result of third conversion was wrong.\n");
>        err++;
>      }
> +
> +  /* Now perform the same test as above consuming one byte at a time.  */
> +  mbs = test.input;
> +  memset (&st, 0, sizeof (st));
> +
> +  /* Consume the first byte; expect an incomplete multibyte character.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != -2)
> +    {
> +      printf ("error: First byte conversion returned %zd.\n", ret);
> +      err++;
> +    }

Although rest of the file does not use the libsupport checks, I think it is
expected the newer code does.  So change to:

  TEST_COMPARE (ret, -2);

Some for the rest of tests (there is no need to update err since TEST_COMPARE
will take care of setting the tests a failure for its parent).  If you want
you may also adapt all the previous tests to use TEST_COMPARE as well.

> +  /* Advance past the first consumed byte.  */
> +  mbs += 1;
> +  /* Consume the second byte; expect the first wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Second byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the second consumed byte.  */
> +  mbs += 1;
> +  if (wc != test.expected[0])
> +    {
> +      printf ("error: Result of first wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +  /* Consume no bytes; expect the second wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 0)
> +    {
> +      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Do not advance past the third byte.  */
> +  mbs += 0;
> +  if (wc != test.expected[1])
> +    {
> +      printf ("error: Result of second wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +  /* After the second wchar_t conversion, the converter should be in
> +     the initial state since the two input BIG5-HKSCS bytes have been
> +     consumed and the two wchar_t's have been output.  */
> +  if (mbsinit (&st) == 0)
> +    {
> +      printf ("error: Converter not in initial state.\n");
> +      err++;
> +    }
> +  /* Consume the third byte; expect the third wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Third byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the third consumed byte.  */
> +  mbs += 1;
> +  if (wc != test.expected[2])
> +    {
> +      printf ("error: Result of third wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +
>    /* Return 0 if we saw no errors.  */
>    return err;
>  }
> 
> Attachments:
> 
> n2653-1.patch	5,5 KB
>
  
Tom Honermann May 18, 2022, 3:29 p.m. UTC | #2
Thank you, Adhemerval, Carlos, Joseph, and Florian for your review 
comments in this and other email threads. I'll follow up with a revised 
set of patches soon.

Tom.

On 5/16/22 2:37 PM, Adhemerval Zanella wrote:
>
> On 07/01/2022 21:39, Tom Honermann via Libc-alpha wrote:
>> Subject:
>> [PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744, mbrtowc with Big5-HKSCS
>> From:
>> Tom Honermann via Libc-alpha <libc-alAdhemervalpha@sourceware.org>
>> Date:
>> 07/01/2022 21:39
>>
>> To:
>> libc-alpha <libc-alpha@sourceware.org>
>>
>>
>> This patch for bug 25744 [1] updates the Big5-HKSCS converter to properly maintain the lowest 3 bits of the mbstate_t __count data member.  This change is necessary to ensure that state is correctly preserved when the converter encounters an incomplete multibyte character.  More details are available in bug 25744 [1].
>>
>> The code changes are styled to match how these bits are maintained by converters such as iso-2022-jp.c, ibm930.c, and others.
>>
>> Running 'grep __count' in the 'iconvdata' directory suggests that a number of other converters, euc-jisx0213.c for example, also fail to preserve these bits in some cases, though it may be that negative effects are not observed for those converters.  This patch does not attempt to address such issues with other converters.
>>
>> This fix was previously posted to this mailing list on April 7th, 2020 [2], and again on June 7th, 2021 [3], but was not followed up on.
>>
>> Tested on Linux x86_64.
>>
>> Tom.
>>
>> [1]: Bug 25744
>>       "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
>>       second byte of certain double byte characters"
>>       https://sourceware.org/bugzilla/show_bug.cgi?id=25744
>>
>> [2]: "[PATCH] Correct the Big5-HKSCS converter to preserve low order
>>       state bits (bug 25744)"
>>       https://sourceware.org/pipermail/libc-alpha/2020-April/112595.html
>>
>> [3]: "[PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744,
>>       mbrtowc with Big5-HKSCS"
>>       https://sourceware.org/pipermail/libc-alpha/2021-June/127231.html
>>
>> n2653-1.patch
> Next time please send the patch inline instead of as attachment (as git-send
> does).  It is usually easier to reply with most email clients.
>
>> commit 900d307f4bfc5016e1a18711e8909dbce3488d3e
>> Author: Tom Honermann <tom@honermann.net>
>> Date:   Wed Jan 5 18:02:24 2022 -0500
>>
>>      Correct the Big5-HKSCS converter to preserve low order state bits.
>>      
>>      BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=25744
> The fix make sense, although I am not familiar with the locate.  Some comments
> on testcase.
>
>> diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
>> index ed51412bb9..2455c9db53 100644
>> --- a/iconvdata/big5hkscs.c
>> +++ b/iconvdata/big5hkscs.c
>> @@ -17769,7 +17769,7 @@ static struct
>>      the output state to the initial state.  This has to be done during the
>>      flushing.  */
>>   #define EMIT_SHIFT_TO_INIT \
>> -  if (data->__statep->__count != 0)					      \
>> +  if ((data->__statep->__count >> 3) != 0)				      \
>>       {									      \
>>         if (FROM_DIRECTION)						      \
>>   	{								      \
>> @@ -17778,7 +17778,7 @@ static struct
>>   	      /* Write out the last character.  */			      \
>>   	      *((uint32_t *) outbuf) = data->__statep->__count >> 3;	      \
>>   	      outbuf += sizeof (uint32_t);				      \
>> -	      data->__statep->__count = 0;				      \
>> +	      data->__statep->__count &= 7;				      \
>>   	    }								      \
>>   	  else								      \
>>   	    /* We don't have enough room in the output buffer.  */	      \
>> @@ -17792,7 +17792,7 @@ static struct
>>   	      uint32_t lasttwo = data->__statep->__count >> 3;		      \
>>   	      *outbuf++ = (lasttwo >> 8) & 0xff;			      \
>>   	      *outbuf++ = lasttwo & 0xff;				      \
>> -	      data->__statep->__count = 0;				      \
>> +	      data->__statep->__count &= 7;				      \
>>   	    }								      \
>>   	  else								      \
>>   	    /* We don't have enough room in the output buffer.  */	      \
>> @@ -17878,7 +17878,7 @@ static struct
>>   									      \
>>   		/* Otherwise store only the first character now, and	      \
>>   		   put the second one into the queue.  */		      \
>> -		*statep = ch2 << 3;					      \
>> +		*statep = (ch2 << 3) | (*statep & 7);			      \
>>   		/* Tell the caller why we terminate the loop.  */	      \
>>   		result = __GCONV_FULL_OUTPUT;				      \
>>   		break;							      \
>> @@ -17895,7 +17895,7 @@ static struct
>>         }									      \
>>       else								      \
>>         /* Clear the queue and proceed to output the saved character.  */	      \
>> -      *statep = 0;							      \
>> +      *statep &= 7;							      \
>>   									      \
>>       put32 (outptr, ch);							      \
>>       outptr += 4;							      \
>> @@ -17946,7 +17946,7 @@ static struct
>>   	  }								      \
>>   	*outptr++ = (ch >> 8) & 0xff;					      \
>>   	*outptr++ = ch & 0xff;						      \
>> -	*statep = 0;							      \
>> +	*statep &= 7;							      \
>>   	inptr += 4;							      \
>>   	continue;							      \
>>   									      \
>> @@ -17959,7 +17959,7 @@ static struct
>>   	  }								      \
>>   	*outptr++ = (lasttwo >> 8) & 0xff;				      \
>>   	*outptr++ = lasttwo & 0xff;					      \
>> -	*statep = 0;							      \
>> +	*statep &= 7;							      \
>>   	continue;							      \
>>         }									      \
>>   									      \
>> @@ -17996,7 +17996,7 @@ static struct
>>   	   /* Check for possible combining character.  */		      \
>>   	    if (__glibc_unlikely (ch == 0xca || ch == 0xea))		      \
>>   	      {								      \
>> -		*statep = ((cp[0] << 8) | cp[1]) << 3;			      \
>> +		*statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7);      \
>>   		inptr += 4;						      \
>>   		continue;						      \
>>   	      }								      \
>> diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
>> index 91d7d3a552..12697ebe23 100644
>> --- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
>> +++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
>> @@ -128,6 +128,71 @@ check_conversion (struct testdata test)
>>         printf ("error: Result of third conversion was wrong.\n");
>>         err++;
>>       }
>> +
>> +  /* Now perform the same test as above consuming one byte at a time.  */
>> +  mbs = test.input;
>> +  memset (&st, 0, sizeof (st));
>> +
>> +  /* Consume the first byte; expect an incomplete multibyte character.  */
>> +  ret = mbrtowc (&wc, mbs, 1, &st);
>> +  if (ret != -2)
>> +    {
>> +      printf ("error: First byte conversion returned %zd.\n", ret);
>> +      err++;
>> +    }
> Although rest of the file does not use the libsupport checks, I think it is
> expected the newer code does.  So change to:
>
>    TEST_COMPARE (ret, -2);
>
> Some for the rest of tests (there is no need to update err since TEST_COMPARE
> will take care of setting the tests a failure for its parent).  If you want
> you may also adapt all the previous tests to use TEST_COMPARE as well.
>
>> +  /* Advance past the first consumed byte.  */
>> +  mbs += 1;
>> +  /* Consume the second byte; expect the first wchar_t.  */
>> +  ret = mbrtowc (&wc, mbs, 1, &st);
>> +  if (ret != 1)
>> +    {
>> +      printf ("error: Second byte conversion returned %zd.\n", ret);
>> +      err++;
>> +    }
>> +  /* Advance past the second consumed byte.  */
>> +  mbs += 1;
>> +  if (wc != test.expected[0])
>> +    {
>> +      printf ("error: Result of first wchar_t conversion was wrong.\n");
>> +      err++;
>> +    }
>> +  /* Consume no bytes; expect the second wchar_t.  */
>> +  ret = mbrtowc (&wc, mbs, 1, &st);
>> +  if (ret != 0)
>> +    {
>> +      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
>> +      err++;
>> +    }
>> +  /* Do not advance past the third byte.  */
>> +  mbs += 0;
>> +  if (wc != test.expected[1])
>> +    {
>> +      printf ("error: Result of second wchar_t conversion was wrong.\n");
>> +      err++;
>> +    }
>> +  /* After the second wchar_t conversion, the converter should be in
>> +     the initial state since the two input BIG5-HKSCS bytes have been
>> +     consumed and the two wchar_t's have been output.  */
>> +  if (mbsinit (&st) == 0)
>> +    {
>> +      printf ("error: Converter not in initial state.\n");
>> +      err++;
>> +    }
>> +  /* Consume the third byte; expect the third wchar_t.  */
>> +  ret = mbrtowc (&wc, mbs, 1, &st);
>> +  if (ret != 1)
>> +    {
>> +      printf ("error: Third byte conversion returned %zd.\n", ret);
>> +      err++;
>> +    }
>> +  /* Advance past the third consumed byte.  */
>> +  mbs += 1;
>> +  if (wc != test.expected[2])
>> +    {
>> +      printf ("error: Result of third wchar_t conversion was wrong.\n");
>> +      err++;
>> +    }
>> +
>>     /* Return 0 if we saw no errors.  */
>>     return err;
>>   }
>>
>> Attachments:
>>
>> n2653-1.patch	5,5 KB
>>
  

Patch

commit 900d307f4bfc5016e1a18711e8909dbce3488d3e
Author: Tom Honermann <tom@honermann.net>
Date:   Wed Jan 5 18:02:24 2022 -0500

    Correct the Big5-HKSCS converter to preserve low order state bits.
    
    BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=25744

diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
index ed51412bb9..2455c9db53 100644
--- a/iconvdata/big5hkscs.c
+++ b/iconvdata/big5hkscs.c
@@ -17769,7 +17769,7 @@  static struct
    the output state to the initial state.  This has to be done during the
    flushing.  */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != 0)					      \
+  if ((data->__statep->__count >> 3) != 0)				      \
     {									      \
       if (FROM_DIRECTION)						      \
 	{								      \
@@ -17778,7 +17778,7 @@  static struct
 	      /* Write out the last character.  */			      \
 	      *((uint32_t *) outbuf) = data->__statep->__count >> 3;	      \
 	      outbuf += sizeof (uint32_t);				      \
-	      data->__statep->__count = 0;				      \
+	      data->__statep->__count &= 7;				      \
 	    }								      \
 	  else								      \
 	    /* We don't have enough room in the output buffer.  */	      \
@@ -17792,7 +17792,7 @@  static struct
 	      uint32_t lasttwo = data->__statep->__count >> 3;		      \
 	      *outbuf++ = (lasttwo >> 8) & 0xff;			      \
 	      *outbuf++ = lasttwo & 0xff;				      \
-	      data->__statep->__count = 0;				      \
+	      data->__statep->__count &= 7;				      \
 	    }								      \
 	  else								      \
 	    /* We don't have enough room in the output buffer.  */	      \
@@ -17878,7 +17878,7 @@  static struct
 									      \
 		/* Otherwise store only the first character now, and	      \
 		   put the second one into the queue.  */		      \
-		*statep = ch2 << 3;					      \
+		*statep = (ch2 << 3) | (*statep & 7);			      \
 		/* Tell the caller why we terminate the loop.  */	      \
 		result = __GCONV_FULL_OUTPUT;				      \
 		break;							      \
@@ -17895,7 +17895,7 @@  static struct
       }									      \
     else								      \
       /* Clear the queue and proceed to output the saved character.  */	      \
-      *statep = 0;							      \
+      *statep &= 7;							      \
 									      \
     put32 (outptr, ch);							      \
     outptr += 4;							      \
@@ -17946,7 +17946,7 @@  static struct
 	  }								      \
 	*outptr++ = (ch >> 8) & 0xff;					      \
 	*outptr++ = ch & 0xff;						      \
-	*statep = 0;							      \
+	*statep &= 7;							      \
 	inptr += 4;							      \
 	continue;							      \
 									      \
@@ -17959,7 +17959,7 @@  static struct
 	  }								      \
 	*outptr++ = (lasttwo >> 8) & 0xff;				      \
 	*outptr++ = lasttwo & 0xff;					      \
-	*statep = 0;							      \
+	*statep &= 7;							      \
 	continue;							      \
       }									      \
 									      \
@@ -17996,7 +17996,7 @@  static struct
 	   /* Check for possible combining character.  */		      \
 	    if (__glibc_unlikely (ch == 0xca || ch == 0xea))		      \
 	      {								      \
-		*statep = ((cp[0] << 8) | cp[1]) << 3;			      \
+		*statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7);      \
 		inptr += 4;						      \
 		continue;						      \
 	      }								      \
diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
index 91d7d3a552..12697ebe23 100644
--- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
+++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
@@ -128,6 +128,71 @@  check_conversion (struct testdata test)
       printf ("error: Result of third conversion was wrong.\n");
       err++;
     }
+
+  /* Now perform the same test as above consuming one byte at a time.  */
+  mbs = test.input;
+  memset (&st, 0, sizeof (st));
+
+  /* Consume the first byte; expect an incomplete multibyte character.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != -2)
+    {
+      printf ("error: First byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the first consumed byte.  */
+  mbs += 1;
+  /* Consume the second byte; expect the first wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Second byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the second consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[0])
+    {
+      printf ("error: Result of first wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* Consume no bytes; expect the second wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 0)
+    {
+      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Do not advance past the third byte.  */
+  mbs += 0;
+  if (wc != test.expected[1])
+    {
+      printf ("error: Result of second wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* After the second wchar_t conversion, the converter should be in
+     the initial state since the two input BIG5-HKSCS bytes have been
+     consumed and the two wchar_t's have been output.  */
+  if (mbsinit (&st) == 0)
+    {
+      printf ("error: Converter not in initial state.\n");
+      err++;
+    }
+  /* Consume the third byte; expect the third wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the third consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[2])
+    {
+      printf ("error: Result of third wchar_t conversion was wrong.\n");
+      err++;
+    }
+
   /* Return 0 if we saw no errors.  */
   return err;
 }