c++: Avoid incorrect shortening of divisions [PR108365]

Message ID Y7742V4Ipt6WxHyb@tucnak
State New
Series c++: Avoid incorrect shortening of divisions [PR108365]

Commit Message

Jakub Jelinek Jan. 11, 2023, 5:58 p.m. UTC
  Hi!

The following testcase is miscompiled because we shorten the division
in a case where it must not be shortened.
A division (or modulo) can be shortened if it is unsigned, or if it is
signed and we can prove that the dividend will not be the minimum signed
value or that the divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was the implicit integral promotions of {,signed,unsigned} {char,short},
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion of the dividend is from an unsigned type
or if the divisor is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a non-negative int, so never the most negative.
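
To make the UB argument above concrete, here is a minimal sketch, assuming
32-bit int and 64-bit long long.  The helper names (wide_div,
narrow_div_overflows, truncate_to_32) are hypothetical and only illustrate
the arithmetic; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Division carried out in 64 bits, as the source expression asks for.
std::int64_t wide_div(std::int64_t a, std::int64_t b) {
    assert(b != 0);
    // The only overflowing 64-bit case; unreachable for 32-bit-range dividends.
    assert(!(a == std::numeric_limits<std::int64_t>::min() && b == -1));
    return a / b;
}

// True when performing the same division in 32 bits would overflow,
// i.e. would be undefined behavior.  This is what shortening must avoid.
bool narrow_div_overflows(std::int32_t a, std::int32_t b) {
    return a == std::numeric_limits<std::int32_t>::min() && b == -1;
}

// Truncation of the wide quotient.  The conversion is modular since C++20
// and implementation-defined before that; GCC documents it as modular.
std::int32_t truncate_to_32(std::int64_t v) {
    return static_cast<std::int32_t>(v);
}
```

With these, wide_div (INT32_MIN, -1) is 2147483648 and truncating it gives
INT32_MIN again, while narrow_div_overflows (INT32_MIN, -1) is true, so the
narrow division cannot replace the wide one for these operands.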

Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED, where op0 is either
the same as orig_op0 or orig_op0 promoted to int.  I think that works fine.
If op0 isn't promoted, either the division/modulo common type has the
same precision as op0, in which case the division/modulo is unsigned and so
without UB, or it is done in wider precision (e.g. because op1 has
wider precision), in which case op0 can't be the minimum signed value.  If it
has been promoted to int, it was promoted from a narrower type and
so is never the minimum signed int.

But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, I'm not sure whether the operand of NOP_EXPR could be of
non-integral type, where TYPE_UNSIGNED wouldn't be meaningful.  More
importantly, even if it is a cast from an unsigned integral type, we only
know it can't be the minimum signed value if it is a widening cast; for a
same-precision or narrowing cast we know nothing.
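
The difference between widening and same-precision casts can be sketched as
follows, assuming 32-bit int and 64-bit long long.  The function names are
hypothetical; testcase_dividend mirrors the dividend
(long long) (unsigned long long) (-__INT_MAX__ - 1) from the new test.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Widening cast: every uint32_t value is representable in int64_t,
// so the result is always non-negative.
std::int64_t widen(std::uint32_t u) {
    return static_cast<std::int64_t>(u);
}

// Same-precision cast: values with the top bit set become negative
// (modular since C++20, implementation-defined before; GCC wraps).
std::int64_t same_width(std::uint64_t u) {
    return static_cast<std::int64_t>(u);
}

// Dividend of the testcase: INT_MIN -> unsigned long long -> long long.
std::int64_t testcase_dividend() {
    const std::int32_t int_min = std::numeric_limits<std::int32_t>::min();
    return same_width(static_cast<std::uint64_t>(int_min));
}

void demo() {
    // Widening from unsigned never yields a negative value.
    assert(widen(0xFFFFFFFFu) == 4294967295LL);
    // The same-precision cast hands back exactly the minimum int value.
    assert(testcase_dividend() == std::numeric_limits<std::int32_t>::min());
}
```

So a cast from unsigned long long to long long says nothing about whether the
dividend fits the "not minimum signed int" requirement, while a cast from
unsigned int to long long does.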

So, for the NOP_EXPR cases the following patch checks, just in case, that
the cast is from an integral type and, more importantly, that it is a
widening conversion.  Next to it, it also allows op0 to be just unsigned,
promoted or not, as that is what the C FE does for those cases too
and I believe it must work - either the division/modulo common type
will be that unsigned type, in which case we can shorten and don't need to
worry about UB, or it will be some wider signed type, in which case op0
can't be the most negative value of the wider type.
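
The patched condition can be modelled outside the compiler roughly like this.
It is only a simplified sketch: IntType, Dividend, Divisor and may_shorten are
hypothetical stand-ins for the tree accessors used in cp_build_binary_op, and
the "constant other than -1" test stands in for ! integer_all_onesp.

```cpp
#include <cassert>

// Hypothetical, simplified description of a front-end type.
struct IntType { bool integral; bool is_unsigned; unsigned precision; };

// op0: its type (type0) and, if it is a conversion, the type it was cast from.
struct Dividend { IntType type; bool is_cast; IntType cast_from; };

// op1: whether it is a known integer constant, and its value if so.
struct Divisor { bool is_constant; long long value; };

const IntType kUInt = {true, true, 32};
const IntType kLongLong = {true, false, 64};
const IntType kULongLong = {true, true, 64};

Dividend plain(IntType t) { return Dividend{t, false, t}; }
Dividend cast(IntType from, IntType to) { return Dividend{to, true, from}; }
Divisor unknown() { return Divisor{false, 0}; }
Divisor constant(long long v) { return Divisor{true, v}; }

bool may_shorten(const Dividend& op0, const Divisor& op1) {
    assert(op0.type.integral);
    // Unsigned division/modulo never overflows.
    if (op0.type.is_unsigned)
        return true;
    // A widening cast from an unsigned integral type is never negative.
    if (op0.is_cast && op0.cast_from.integral && op0.cast_from.is_unsigned
        && op0.cast_from.precision < op0.type.precision)
        return true;
    // Dividing by a known constant other than -1 cannot overflow either.
    return op1.is_constant && op1.value != -1;
}
```

In this model the testcase's dividend, cast (kULongLong, kLongLong) with an
unknown divisor, is no longer shortened, while cast (kUInt, kLongLong) and a
plain unsigned dividend still are.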

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-11  Jakub Jelinek  <jakub@redhat.com>

	PR c++/108365
	* typeck.cc (cp_build_binary_op): For integral division or modulo,
	shorten if type0 is unsigned, or op0 is cast from narrower unsigned
	integral type or stripped_op1 is INTEGER_CST other than -1.

	* g++.dg/opt/pr108365.C: New test.
	* g++.dg/warn/pr108365.C: New test.


	Jakub
  

Comments

Jason Merrill Jan. 12, 2023, 7:37 p.m. UTC | #1
On 1/11/23 12:58, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase is miscompiled, because we shorten the division
> in a case where it should not be shortened.
> Divisions (and modulos) can be shortened if it is unsigned division/modulo,
> or if it is signed division/modulo where we can prove the dividend will
> not be the minimum signed value or divisor will not be -1, because e.g.
> on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
> (-2147483647 - 1) / -1 is UB
> but
> (int) (-2147483648LL / -1LL) is not, it is -2147483648.
> The primary aim of both the C and C++ FE division/modulo shortening I assume
> was for the implicit integral promotions of {,signed,unsigned} {char,short}
> and because at this point we have no VRP information etc., the shortening
> is done if the integral promotion is from unsigned type for the divisor
> or if the dividend is an integer constant other than -1.
> This works fine for char/short -> int promotions when char/short have
> smaller precision than int - unsigned char -> int or unsigned short -> int
> will always be a positive int, so never the most negative.
> 
> Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
> the same as orig_op0 or that promoted to int, I think that works fine,
> if it isn't promoted, either the division/modulo common type will have the
> same precision as op0 but then the division/modulo is unsigned and so
> without UB, or it will be done in wider precision (e.g. because op1 has
> wider precision), but then op0 can't be minimum signed value.  Or it has
> been promoted to int, but in that case it was again from narrower type and
> so never minimum signed int.
> 
> But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
> First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
> type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
> even if it is a cast from unsigned integral type, we only know it can't be
> minimum signed value if it is a widening cast, if it is same precision or
> narrowing cast, we know nothing.

Curious, this divergence goes back to 1994, when the C++ front-end was 
merged and tege changed the condition in the C front-end.

> So, the following patch for the NOP_EXPR cases checks just in case that
> it is from integral type and more importantly checks it is a widening
> conversion, and then next to it also allows op0 to be just unsigned,
> promoted or not, as that is what the C FE will do for those cases too
> and I believe it must work - either the division/modulo common type
> will be that unsigned type, then we can shorten and don't need to worry
> about UB, or it will be some wider signed type but then it can't be most
> negative value of the wider type.

Why not use the same condition in C and C++?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-01-11  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/108365
> 	* typeck.cc (cp_build_binary_op): For integral division or modulo,
> 	shorten if type0 is unsigned, or op0 is cast from narrower unsigned
> 	integral type or stripped_op1 is INTEGER_CST other than -1.
> 
> 	* g++.dg/opt/pr108365.C: New test.
> 	* g++.dg/warn/pr108365.C: New test.
> 
> --- gcc/cp/typeck.cc.jj	2022-12-15 19:17:37.828072458 +0100
> +++ gcc/cp/typeck.cc	2023-01-11 12:15:25.195284107 +0100
> @@ -5455,8 +5455,15 @@ cp_build_binary_op (const op_location_t
>   		 point, so we have to dig out the original type to find out if
>   		 it was unsigned.  */
>   	      tree stripped_op1 = tree_strip_any_location_wrapper (op1);
> -	      shorten = ((TREE_CODE (op0) == NOP_EXPR
> -			  && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
> +	      shorten = (TYPE_UNSIGNED (type0)
> +			 || (TREE_CODE (op0) == NOP_EXPR
> +			     && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0,
> +									  0)))
> +			     && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0,
> +									0)))
> +			     && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0,
> +									  0)))
> +				 < TYPE_PRECISION (type0)))
>   			 || (TREE_CODE (stripped_op1) == INTEGER_CST
>   			     && ! integer_all_onesp (stripped_op1)));
>   	    }
> @@ -5491,8 +5498,12 @@ cp_build_binary_op (const op_location_t
>   	     quotient can't be represented in the computation mode.  We shorten
>   	     only if unsigned or if dividing by something we know != -1.  */
>   	  tree stripped_op1 = tree_strip_any_location_wrapper (op1);
> -	  shorten = ((TREE_CODE (op0) == NOP_EXPR
> -		      && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
> +	  shorten = (TYPE_UNSIGNED (type0)
> +		     || (TREE_CODE (op0) == NOP_EXPR
> +			 && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0, 0)))
> +			 && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0)))
> +			 && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0, 0)))
> +			     < TYPE_PRECISION (type0)))
>   		     || (TREE_CODE (stripped_op1) == INTEGER_CST
>   			 && ! integer_all_onesp (stripped_op1)));
>   	  common = 1;
> --- gcc/testsuite/g++.dg/opt/pr108365.C.jj	2023-01-11 12:19:03.322086288 +0100
> +++ gcc/testsuite/g++.dg/opt/pr108365.C	2023-01-11 12:18:39.811430975 +0100
> @@ -0,0 +1,13 @@
> +// PR c++/108365
> +// { dg-do run }
> +
> +char b = 1;
> +
> +int
> +main ()
> +{
> +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> +  while ((short) ((long long) (unsigned long long) (-__INT_MAX__ - 1) / (long long) (b ? -1 : 0)))
> +    ;
> +#endif
> +}
> --- gcc/testsuite/g++.dg/warn/pr108365.C.jj	2023-01-11 12:32:55.952875172 +0100
> +++ gcc/testsuite/g++.dg/warn/pr108365.C	2023-01-11 12:32:37.345148131 +0100
> @@ -0,0 +1,5 @@
> +// PR c++/108365
> +// { dg-do compile { target { { { ilp32 || lp64 } || llp64 } && c++11 } } }
> +
> +constexpr char b = 1;
> +long t = (short) ((long long) (unsigned long long) (-__INT_MAX__ - 1) / (long long) (b ? -1 : 0)); // { dg-bogus "integer overflow in expression of type" }
> 
> 	Jakub
>
  
Jakub Jelinek Jan. 12, 2023, 7:55 p.m. UTC | #2
On Thu, Jan 12, 2023 at 02:37:13PM -0500, Jason Merrill wrote:
> > But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
> > First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
> > type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
> > even if it is a cast from unsigned integral type, we only know it can't be
> > minimum signed value if it is a widening cast, if it is same precision or
> > narrowing cast, we know nothing.
> 
> Curious, this divergence goes back to 1994, when the C++ front-end was
> merged and tege changed the condition in the C front-end.

And it was changed to match the modulo condition adjusted by rms in 1993.

> > So, the following patch for the NOP_EXPR cases checks just in case that
> > it is from integral type and more importantly checks it is a widening
> > conversion, and then next to it also allows op0 to be just unsigned,
> > promoted or not, as that is what the C FE will do for those cases too
> > and I believe it must work - either the division/modulo common type
> > will be that unsigned type, then we can shorten and don't need to worry
> > about UB, or it will be some wider signed type but then it can't be most
> > negative value of the wider type.
> 
> Why not use the same condition in C and C++?

I can test that.  Do you mean change the C FE to match the patched C++ FE,
or change the C++ FE to just test TYPE_UNSIGNED (orig_op0)?
I think both should work, though what I wrote can perhaps shorten in more
cases.  I can try to construct testcases where they differ...

	Jakub
  

Patch

--- gcc/cp/typeck.cc.jj	2022-12-15 19:17:37.828072458 +0100
+++ gcc/cp/typeck.cc	2023-01-11 12:15:25.195284107 +0100
@@ -5455,8 +5455,15 @@  cp_build_binary_op (const op_location_t
 		 point, so we have to dig out the original type to find out if
 		 it was unsigned.  */
 	      tree stripped_op1 = tree_strip_any_location_wrapper (op1);
-	      shorten = ((TREE_CODE (op0) == NOP_EXPR
-			  && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
+	      shorten = (TYPE_UNSIGNED (type0)
+			 || (TREE_CODE (op0) == NOP_EXPR
+			     && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0,
+									  0)))
+			     && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0,
+									0)))
+			     && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0,
+									  0)))
+				 < TYPE_PRECISION (type0)))
 			 || (TREE_CODE (stripped_op1) == INTEGER_CST
 			     && ! integer_all_onesp (stripped_op1)));
 	    }
@@ -5491,8 +5498,12 @@  cp_build_binary_op (const op_location_t
 	     quotient can't be represented in the computation mode.  We shorten
 	     only if unsigned or if dividing by something we know != -1.  */
 	  tree stripped_op1 = tree_strip_any_location_wrapper (op1);
-	  shorten = ((TREE_CODE (op0) == NOP_EXPR
-		      && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
+	  shorten = (TYPE_UNSIGNED (type0)
+		     || (TREE_CODE (op0) == NOP_EXPR
+			 && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0, 0)))
+			 && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0)))
+			 && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0, 0)))
+			     < TYPE_PRECISION (type0)))
 		     || (TREE_CODE (stripped_op1) == INTEGER_CST
 			 && ! integer_all_onesp (stripped_op1)));
 	  common = 1;
--- gcc/testsuite/g++.dg/opt/pr108365.C.jj	2023-01-11 12:19:03.322086288 +0100
+++ gcc/testsuite/g++.dg/opt/pr108365.C	2023-01-11 12:18:39.811430975 +0100
@@ -0,0 +1,13 @@ 
+// PR c++/108365
+// { dg-do run }
+
+char b = 1;
+
+int
+main ()
+{
+#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
+  while ((short) ((long long) (unsigned long long) (-__INT_MAX__ - 1) / (long long) (b ? -1 : 0)))
+    ;
+#endif
+}
--- gcc/testsuite/g++.dg/warn/pr108365.C.jj	2023-01-11 12:32:55.952875172 +0100
+++ gcc/testsuite/g++.dg/warn/pr108365.C	2023-01-11 12:32:37.345148131 +0100
@@ -0,0 +1,5 @@ 
+// PR c++/108365
+// { dg-do compile { target { { { ilp32 || lp64 } || llp64 } && c++11 } } }
+
+constexpr char b = 1;
+long t = (short) ((long long) (unsigned long long) (-__INT_MAX__ - 1) / (long long) (b ? -1 : 0)); // { dg-bogus "integer overflow in expression of type" }