[committed,PR,rtl-optimization/116199] Fix latent bug in reload's SUBREG handling

Message ID 3329c491-59e9-4d18-b1ba-901610e50c94@gmail.com
State Committed
Commit cfeb994d64743691d4a787f8eef7ce52d8711756
Headers
Series [committed,PR,rtl-optimization/116199] Fix latent bug in reload's SUBREG handling |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm warning Patch is already merged
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 warning Patch is already merged

Commit Message

Jeff Law Aug. 4, 2024, 7:12 p.m. UTC
  Building glibc on the m68k has exposed a long standing latent bug in reload.

Basically ext-dce replaced an extension with a subreg expression (good) 
resulting in this pair of insns:

> (insn 7 4 8 2 (set (reg:DI 31 [ _1 ])
>         (subreg:DI (reg/v:SI 37 [ __major ]) 0)) "j.c":7:32 75 {*m68k.md:1568}
>      (nil))
> (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
>         (ashift:DI (reg:DI 31 [ _1 ])
>             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>      (expr_list:REG_DEAD (reg:DI 31 [ _1 ])
>         (nil)))


insn 7 was optimized to the simple copy by ext-dce.  That looks fine. 
Combine comes along and squashes them together resulting in:

> (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
>         (ashift:DI (subreg:DI (reg/v:SI 37 [ __major ]) 0)
>             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>      (nil))

Which also looks good.

After IRA's allocation, in the middle of reload we have:

> (insn 8 7 10 2 (set (reg:DI 8 %a0 [orig:39 _2 ] [39])
>         (ashift:DI (subreg:DI (reg/v:SI 0 %d0 [orig:37 __major ] [37]) 0)
>             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>      (nil))


Again, sensible.  The pattern requires op0 and op1 to match, so we try 
to figure out if d0 & a0 are the same underlying register.  So we get 
into this code in operands_match_p:



>       if (code == SUBREG)
>         {
>           i = REGNO (SUBREG_REG (x)); 
>           if (i >= FIRST_PSEUDO_REGISTER)
>             goto slow;
>           i += subreg_regno_offset (REGNO (SUBREG_REG (x)), 
>                                     GET_MODE (SUBREG_REG (x)),
>                                     SUBREG_BYTE (x),
>                                     GET_MODE (x));
>         }
>       else
>         i = REGNO (x);

There's a similar fragment for the other operand.  The key is that 
subreg_regno_offset call.  That call assumes the subreg is 
representable.  But in the case of (subreg:DI (reg:SI d0)) we're going 
to get -1 (remember, m68k is a big endian target).  That -1 gets passed 
to hard_regno_regs via this code (again, just showing one of the two 
copies of this fragment):

>      if (REG_WORDS_BIG_ENDIAN
>           && is_a <scalar_int_mode> (GET_MODE (x), &xmode)
>           && GET_MODE_SIZE (xmode) > UNITS_PER_WORD
>           && i < FIRST_PSEUDO_REGISTER)
>         i += hard_regno_nregs (i, xmode) - 1;


That triggers the reported ICE.  It appears this has been broken since 
the conversion to SUBREG_BYTE way back in 2001, though possibly it could 
have been some minor changes around this code circa 2005 as well, it 
didn't seem worth putting under the debugger to be sure.  Certainly the 
code from 2001 looks suspicious to me.

Anyway, the fix here is pretty simple.  The routine 
"simplify_subreg_regno" is meant to be used to determine if we can 
simplify the subreg expression and will explicitly return -1 if it can't 
be represented for one reason or another.  It checks a variety of 
conditions that aren't worth listing here.


Bootstrapped and regression tested on x86 (after reverting an unrelated 
patch from Richard S that's causing multiple unrelated failures), which 
of course doesn't really test the code as x86 is an LRA target.  Also 
built & tested the crosses, none of which show issues (and some of which 
are reload targets).  m68k will bootstrap & regression test tomorrow, 
but I don't think there's any point in waiting for that.

Pushing to the trunk.

jeff
commit cfeb994d64743691d4a787f8eef7ce52d8711756
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Sun Aug 4 13:09:02 2024 -0600

    [committed][PR rtl-optimization/116199] Fix latent bug in reload's SUBREG handling
    
    Building glibc on the m68k has exposed a long standing latent bug in reload.
    
    Basically ext-dce replaced an extension with a subreg expression (good)
    resulting in this pair of insns:
    
    > (insn 7 4 8 2 (set (reg:DI 31 [ _1 ])
    >         (subreg:DI (reg/v:SI 37 [ __major ]) 0)) "j.c":7:32 75 {*m68k.md:1568}
    >      (nil))
    > (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
    >         (ashift:DI (reg:DI 31 [ _1 ])
    >             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
    >      (expr_list:REG_DEAD (reg:DI 31 [ _1 ])
    >         (nil)))
    
    insn 7 was optimized to the simple copy by ext-dce.  That looks fine.  Combine
    comes along and squashes them together resulting in:
    
    > (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
    >         (ashift:DI (subreg:DI (reg/v:SI 37 [ __major ]) 0)
    >             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
    >      (nil))
    
    Which also looks good.
    
    After IRA's allocation, in the middle of reload we have:
    
    > (insn 8 7 10 2 (set (reg:DI 8 %a0 [orig:39 _2 ] [39])
    >         (ashift:DI (subreg:DI (reg/v:SI 0 %d0 [orig:37 __major ] [37]) 0)
    >             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
    >      (nil))
    
    Again, sensible.  The pattern requires op0 and op1 to match, so we try to
    figure out if d0 & a0 are the same underlying register.  So we get into this
    code in operands_match_p:
    
    >       if (code == SUBREG)
    >         {
    >           i = REGNO (SUBREG_REG (x));
    >           if (i >= FIRST_PSEUDO_REGISTER)
    >             goto slow;
    >           i += subreg_regno_offset (REGNO (SUBREG_REG (x)),
    >                                     GET_MODE (SUBREG_REG (x)),
    >                                     SUBREG_BYTE (x),
    >                                     GET_MODE (x));
    >         }
    >       else
    >         i = REGNO (x);
    There's a similar fragment for the other operand.  The key is that
    subreg_regno_offset call.  That call assumes the subreg is representable.  But
    in the case of (subreg:DI (reg:SI d0)) we're going to get -1 (remember, m68k is
    a big endian target).  That -1 gets passed to hard_regno_regs via this code
    (again, just showing one of the two copies of this fragment):
    
    >      if (REG_WORDS_BIG_ENDIAN
    >           && is_a <scalar_int_mode> (GET_MODE (x), &xmode)
    >           && GET_MODE_SIZE (xmode) > UNITS_PER_WORD
    >           && i < FIRST_PSEUDO_REGISTER)
    >         i += hard_regno_nregs (i, xmode) - 1;
    
    That triggers the reported ICE.  It appears this has been broken since the
    conversion to SUBREG_BYTE way back in 2001, though possibly it could have been
    some minor changes around this code circa 2005 as well, it didn't seem worth
    putting under the debugger to be sure.  Certainly the code from 2001 looks
    suspicious to me.
    
    Anyway, the fix here is pretty simple.  The routine "simplify_subreg_regno" is
    meant to be used to determine if we can simplify the subreg expression and will
    explicitly return -1 if it can't be represented for one reason or another.  It
    checks a variety of conditions that aren't worth listing here.
    
    Bootstrapped and regression tested on x86 (after reverting an unrelated patch
    from Richard S that's causing multiple unrelated failures), which of course
    doesn't really test the code as x86 is an LRA target.  Also built & tested the
    crosses, none of which show issues (and some of which are reload targets).
    m68k will bootstrap & regression test tomorrow, but I don't think there's any
    point in waiting for that.
    
    Pushing to the trunk.
    
            PR rtl-optimization/116199
    gcc/
            * reload.cc (operands_match_p): Verify subreg is expressable before
            trying to simplify and match it to another operand.
    
    gcc/testsuite/
            * gcc.dg/torture/pr116199.c: New test.
  

Comments

Georg-Johann Lay Aug. 4, 2024, 8:47 p.m. UTC | #1
> 
> 
> Building glibc on the m68k has exposed a long standing latent bug in reload.
> 
> Basically ext-dce replaced an extension with a subreg expression (good) resulting in this pair of insns:
> 
>> (insn 7 4 8 2 (set (reg:DI 31 [ _1 ])
>>         (subreg:DI (reg/v:SI 37 [ __major ]) 0)) "j.c":7:32 75 {*m68k.md:1568}
>>      (nil))
>> (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
>>         (ashift:DI (reg:DI 31 [ _1 ])
>>             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>>      (expr_list:REG_DEAD (reg:DI 31 [ _1 ])
>>         (nil)))
> 
> 
> insn 7 was optimized to the simple copy by ext-dce.  That looks fine. Combine comes along and squashes them together resulting in:
> 
>> (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
>>         (ashift:DI (subreg:DI (reg/v:SI 37 [ __major ]) 0)
>>             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>>      (nil))
> 
> Which also looks good.
> 
> After IRA's allocation, in the middle of reload we have:
> 
>> (insn 8 7 10 2 (set (reg:DI 8 %a0 [orig:39 _2 ] [39])
>>         (ashift:DI (subreg:DI (reg/v:SI 0 %d0 [orig:37 __major ] [37]) 0)
>>             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>>      (nil))
> 
> 
> Again, sensible.  The pattern requires op0 and op1 to match, so we try to figure out if d0 & a0 are the same underlying register.  So we get into this code in operands_match_p:
> 
> 
> 
>>       if (code == SUBREG)
>>         {
>>           i = REGNO (SUBREG_REG (x));           if (i >= FIRST_PSEUDO_REGISTER)
>>             goto slow;
>>           i += subreg_regno_offset (REGNO (SUBREG_REG (x)),                                     GET_MODE (SUBREG_REG (x)),
>>                                     SUBREG_BYTE (x),
>>                                     GET_MODE (x));
>>         }
>>       else
>>         i = REGNO (x);
> 
> There's a similar fragment for the other operand.  The key is that subreg_regno_offset call.  That call assumes the subreg is representable.  But in the case of (subreg:DI (reg:SI d0)) we're going to get -1 (remember, m68k is a big endian target).  That -1 gets passed to hard_regno_regs via this code (again, just showing one of the two copies of this fragment):
> 
>>      if (REG_WORDS_BIG_ENDIAN
>>           && is_a <scalar_int_mode> (GET_MODE (x), &xmode)
>>           && GET_MODE_SIZE (xmode) > UNITS_PER_WORD
>>           && i < FIRST_PSEUDO_REGISTER)
>>         i += hard_regno_nregs (i, xmode) - 1;
> 
> 
> That triggers the reported ICE.  It appears this has been broken since the conversion to SUBREG_BYTE way back in 2001, though possibly it could have been some minor changes around this code circa 2005 as well, it didn't seem worth putting under the debugger to be sure.  Certainly the code from 2001 looks suspicious to me.
> 
> Anyway, the fix here is pretty simple.  The routine "simplify_subreg_regno" is meant to be used to determine if we can simplify the subreg expression and will explicitly return -1 if it can't be represented for one reason or another.  It checks a variety of conditions that aren't worth listing here.
> 
> 
> Bootstrapped and regression tested on x86 (after reverting an unrelated patch from Richard S that's causing multiple unrelated failures), which of course doesn't really test the code as x86 is an LRA target.  Also built & tested the crosses, none of which show issues (and some of which are reload targets).  m68k will bootstrap & regression test tomorrow, but I don't think there's any point in waiting for that.
> 
> Pushing to the trunk.
> 
> jeff
> 
> P
> 
> commit cfeb994d64743691d4a787f8eef7ce52d8711756
> Author: Jeff Law <jlaw@ventanamicro.com>
> Date:   Sun Aug 4 13:09:02 2024 -0600
> 
>     [committed][PR rtl-optimization/116199] Fix latent bug in reload's SUBREG handling
>     
>     Building glibc on the m68k has exposed a long standing latent bug in reload.
>     
>     Basically ext-dce replaced an extension with a subreg expression (good)
>     resulting in this pair of insns:
>     
>     > (insn 7 4 8 2 (set (reg:DI 31 [ _1 ])
>     >         (subreg:DI (reg/v:SI 37 [ __major ]) 0)) "j.c":7:32 75 {*m68k.md:1568}
>     >      (nil))
>     > (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
>     >         (ashift:DI (reg:DI 31 [ _1 ])
>     >             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>     >      (expr_list:REG_DEAD (reg:DI 31 [ _1 ])
>     >         (nil)))
>     
>     insn 7 was optimized to the simple copy by ext-dce.  That looks fine.  Combine
>     comes along and squashes them together resulting in:
>     
>     > (insn 8 7 10 2 (set (reg:DI 39 [ _2 ])
>     >         (ashift:DI (subreg:DI (reg/v:SI 37 [ __major ]) 0)
>     >             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>     >      (nil))
>     
>     Which also looks good.
>     
>     After IRA's allocation, in the middle of reload we have:
>     
>     > (insn 8 7 10 2 (set (reg:DI 8 %a0 [orig:39 _2 ] [39])
>     >         (ashift:DI (subreg:DI (reg/v:SI 0 %d0 [orig:37 __major ] [37]) 0)
>     >             (const_int 8 [0x8]))) "j.c":7:48 322 {*ashldi3}
>     >      (nil))
>     
>     Again, sensible.  The pattern requires op0 and op1 to match, so we try to
>     figure out if d0 & a0 are the same underlying register.  So we get into this
>     code in operands_match_p:
>     
>     >       if (code == SUBREG)
>     >         {
>     >           i = REGNO (SUBREG_REG (x));
>     >           if (i >= FIRST_PSEUDO_REGISTER)
>     >             goto slow;
>     >           i += subreg_regno_offset (REGNO (SUBREG_REG (x)),
>     >                                     GET_MODE (SUBREG_REG (x)),
>     >                                     SUBREG_BYTE (x),
>     >                                     GET_MODE (x));
>     >         }
>     >       else
>     >         i = REGNO (x);
>     There's a similar fragment for the other operand.  The key is that
>     subreg_regno_offset call.  That call assumes the subreg is representable.  But
>     in the case of (subreg:DI (reg:SI d0)) we're going to get -1 (remember, m68k is
>     a big endian target).  That -1 gets passed to hard_regno_regs via this code
>     (again, just showing one of the two copies of this fragment):
>     
>     >      if (REG_WORDS_BIG_ENDIAN
>     >           && is_a <scalar_int_mode> (GET_MODE (x), &xmode)
>     >           && GET_MODE_SIZE (xmode) > UNITS_PER_WORD
>     >           && i < FIRST_PSEUDO_REGISTER)
>     >         i += hard_regno_nregs (i, xmode) - 1;
>     
>     That triggers the reported ICE.  It appears this has been broken since the
>     conversion to SUBREG_BYTE way back in 2001, though possibly it could have been
>     some minor changes around this code circa 2005 as well, it didn't seem worth
>     putting under the debugger to be sure.  Certainly the code from 2001 looks
>     suspicious to me.
>     
>     Anyway, the fix here is pretty simple.  The routine "simplify_subreg_regno" is
>     meant to be used to determine if we can simplify the subreg expression and will
>     explicitly return -1 if it can't be represented for one reason or another.  It
>     checks a variety of conditions that aren't worth listing here.
>     
>     Bootstrapped and regression tested on x86 (after reverting an unrelated patch
>     from Richard S that's causing multiple unrelated failures), which of course
>     doesn't really test the code as x86 is an LRA target.  Also built & tested the
>     crosses, none of which show issues (and some of which are reload targets).
>     m68k will bootstrap & regression test tomorrow, but I don't think there's any
>     point in waiting for that.
>     
>     Pushing to the trunk.
>     
>             PR rtl-optimization/116199
>     gcc/
>             * reload.cc (operands_match_p): Verify subreg is expressable before
>             trying to simplify and match it to another operand.
>     
>     gcc/testsuite/
>             * gcc.dg/torture/pr116199.c: New test.
> 
> diff --git a/gcc/reload.cc b/gcc/reload.cc
> index fecf245c827..f849f1559dc 100644
> --- a/gcc/reload.cc
> +++ b/gcc/reload.cc
> @@ -2211,7 +2211,11 @@ operands_match_p (rtx x, rtx y)
>        if (code == SUBREG)
>  	{
>  	  i = REGNO (SUBREG_REG (x));
> -	  if (i >= FIRST_PSEUDO_REGISTER)
> +	  if (i >= FIRST_PSEUDO_REGISTER
> +	      || simplify_subreg_regno (REGNO (SUBREG_REG (x)),
> +					 GET_MODE (SUBREG_REG (x)),
> +					 SUBREG_BYTE (x),
> +					 GET_MODE (x)) == -1)
>  	    goto slow;
>  	  i += subreg_regno_offset (REGNO (SUBREG_REG (x)),
>  				    GET_MODE (SUBREG_REG (x)),
> @@ -2224,7 +2228,11 @@ operands_match_p (rtx x, rtx y)
>        if (GET_CODE (y) == SUBREG)
>  	{
>  	  j = REGNO (SUBREG_REG (y));
> -	  if (j >= FIRST_PSEUDO_REGISTER)
> +	  if (j >= FIRST_PSEUDO_REGISTER
> +	      || simplify_subreg_regno (REGNO (SUBREG_REG (y)),
> +					 GET_MODE (SUBREG_REG (y)),
> +					 SUBREG_BYTE (y),
> +					 GET_MODE (y)) == -1)
>  	    goto slow;
>  	  j += subreg_regno_offset (REGNO (SUBREG_REG (y)),
>  				    GET_MODE (SUBREG_REG (y)),
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116199.c b/gcc/testsuite/gcc.dg/torture/pr116199.c
> new file mode 100644
> index 00000000000..1012816180e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116199.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int32 } */
> +
> +__extension__ typedef unsigned long long int __uint64_t;

In almost all cases, __UINT64_TYPE__ etc. is more robust in test
cases than using long or short etc.

Johann

> +__extension__ typedef __uint64_t __dev_t;
> +
> +__dev_t __gnu_dev_makedev (unsigned int __major, unsigned int __minor)
> +{
> +  __dev_t __dev;
> +  __dev = (((__dev_t) (__major & 0x00000fffu)) << 8);
> +  __dev |= (((__dev_t) (__major & 0xfffff000u)) << 32);
> +  return __dev;
> +}
>
  
Jakub Jelinek Aug. 4, 2024, 8:57 p.m. UTC | #2
On Sun, Aug 04, 2024 at 10:47:57PM +0200, Georg-Johann Lay wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/pr116199.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target int32 } */
> > +
> > +__extension__ typedef unsigned long long int __uint64_t;
> 
> In almost all cases, __UINT64_TYPE__ etc. is more robust in test
> cases than using long or short etc.

unsigned long long is just fine.  The C standard requires it is at least
64-bits and GCC doesn't support it larger than that on any arch.

	Jakub
  

Patch

diff --git a/gcc/reload.cc b/gcc/reload.cc
index fecf245c827..f849f1559dc 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -2211,7 +2211,11 @@  operands_match_p (rtx x, rtx y)
       if (code == SUBREG)
 	{
 	  i = REGNO (SUBREG_REG (x));
-	  if (i >= FIRST_PSEUDO_REGISTER)
+	  if (i >= FIRST_PSEUDO_REGISTER
+	      || simplify_subreg_regno (REGNO (SUBREG_REG (x)),
+					 GET_MODE (SUBREG_REG (x)),
+					 SUBREG_BYTE (x),
+					 GET_MODE (x)) == -1)
 	    goto slow;
 	  i += subreg_regno_offset (REGNO (SUBREG_REG (x)),
 				    GET_MODE (SUBREG_REG (x)),
@@ -2224,7 +2228,11 @@  operands_match_p (rtx x, rtx y)
       if (GET_CODE (y) == SUBREG)
 	{
 	  j = REGNO (SUBREG_REG (y));
-	  if (j >= FIRST_PSEUDO_REGISTER)
+	  if (j >= FIRST_PSEUDO_REGISTER
+	      || simplify_subreg_regno (REGNO (SUBREG_REG (y)),
+					 GET_MODE (SUBREG_REG (y)),
+					 SUBREG_BYTE (y),
+					 GET_MODE (y)) == -1)
 	    goto slow;
 	  j += subreg_regno_offset (REGNO (SUBREG_REG (y)),
 				    GET_MODE (SUBREG_REG (y)),
diff --git a/gcc/testsuite/gcc.dg/torture/pr116199.c b/gcc/testsuite/gcc.dg/torture/pr116199.c
new file mode 100644
index 00000000000..1012816180e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116199.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target int32 } */
+
+__extension__ typedef unsigned long long int __uint64_t;
+__extension__ typedef __uint64_t __dev_t;
+
+__dev_t __gnu_dev_makedev (unsigned int __major, unsigned int __minor)
+{
+  __dev_t __dev;
+  __dev = (((__dev_t) (__major & 0x00000fffu)) << 8);
+  __dev |= (((__dev_t) (__major & 0xfffff000u)) << 32);
+  return __dev;
+}