Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)

Message ID 878rlej3o6.fsf@euler.schwinge.homeip.net
State Superseded
Headers
Series Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.) |

Commit Message

Thomas Schwinge Oct. 17, 2022, 7:43 a.m. UTC
  Hi!

On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> When solving 0 = _15 & 1, we calculate _15 as:
>
>       [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe
>
> The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
> the above, yielding:
>
>       [0, 1] NONZERO 0x0
>
> This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>
> This is problematic because here we have a bool which is zero, but
> returns false for irange::zero_p, since the latter does not look at
> nonzero bits.  This causes logical_combine to assume the range is
> not-zero, and all hell breaks loose.
>
> I think we should just normalize a nonzero mask of 0 to [0, 0] at
> creation, thus avoiding all this.

1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
offloading test case:

    UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
    [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"

Same for C++.

I'll later send a patch (for the test case!) to fix that up.

2. Looking into this, I found that this
commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0" actually enables a
code transformation/optimization that GCC apparently has not been doing
before!  I've tried to capture that in the attached
"Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".

Will you please verify that one?  In its current '#if 1' configuration,
it's all-PASS after commit
r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
get two calls to 'foo', because GCC apparently didnn't understand the
relation (optimization opportunity) between 'r *= 2;' and the subsequent
'if (r & 1)'.

I've left in the other '#if' variants in case you'd like to experiment
with these, but would otherwise clean that up before pushing.

Where does one put such a test case?

Should the file be named 'pr107195' or something else?

Do we scan 'optimized', or an earlier dump?

At '-O1', the actual code transformation is visible already in the 'dom2'
dump:

       <bb 3> [local count: 536870913]:
       gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
    +  gimple_assign <bit_and_expr, _11, r_7, 1, NULL>
    +  goto <bb 6>; [100.00%]

    -  <bb 4> [local count: 1073741824]:
    -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
    +  <bb 4> [local count: 536870912]:
    +  # gimple_phi <r_4, r_6(D)(2)>
       gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
       gimple_cond <ne_expr, _2, 0, NULL, NULL>
    -    goto <bb 5>; [50.00%]
    +    goto <bb 5>; [100.00%]
       else
    -    goto <bb 6>; [50.00%]
    +    goto <bb 6>; [0.00%]

       <bb 5> [local count: 536870913]:
       gimple_call <foo, _3, r_4>
       gimple_assign <plus_expr, r_8, _3, r_4, NULL>

       <bb 6> [local count: 1073741824]:
    -  # gimple_phi <r_5, r_4(4), r_8(5)>
    +  # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)>
       gimple_return <r_5>

And, the actual "avoid second call 'foo'" optimization is visiable
starting 'dom3':

       <bb 3> [local count: 536870913]:
       gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
    +  goto <bb 6>; [100.00%]

    -  <bb 4> [local count: 1073741824]:
    -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
    -  gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
    +  <bb 4> [local count: 536870912]:
    +  gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL>
       gimple_cond <ne_expr, _2, 0, NULL, NULL>
    -    goto <bb 5>; [50.00%]
    +    goto <bb 5>; [100.00%]
       else
    -    goto <bb 6>; [50.00%]
    +    goto <bb 6>; [0.00%]

       <bb 5> [local count: 536870913]:
    -  gimple_call <foo, _3, r_4>
    -  gimple_assign <plus_expr, r_8, _3, r_4, NULL>
    +  gimple_assign <integer_cst, _3, 0, NULL, NULL>
    +  gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL>

       <bb 6> [local count: 1073741824]:
    -  # gimple_phi <r_5, r_4(4), r_8(5)>
    +  # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)>
       gimple_return <r_5>

..., but I don't know if either of those would be stable/appropriate to
scan instead of 'optimized'?


Grüße
 Thomas

>       PR tree-optimization/107195
>
> gcc/ChangeLog:
>
>       * value-range.cc (irange::set_range_from_nonzero_bits): Set range
>       to [0,0] when nonzero mask is 0.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.dg/tree-ssa/pr107195-1.c: New test.
>       * gcc.dg/tree-ssa/pr107195-2.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c | 15 +++++++++++++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c | 16 ++++++++++++++++
>  gcc/value-range.cc                         |  5 +++++
>  3 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
> new file mode 100644
> index 00000000000..a0c20dbd4b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
> @@ -0,0 +1,15 @@
> +// { dg-do run }
> +// { dg-options "-O1 -fno-tree-ccp" }
> +
> +int a, b;
> +int main() {
> +  int c = 0;
> +  if (a)
> +    c = 1;
> +  c = 1 & (a && c) && b;
> +  if (a) {
> +    b = c;
> +    __builtin_abort ();
> +  }
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
> new file mode 100644
> index 00000000000..d447c78bdd3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
> @@ -0,0 +1,16 @@
> +// { dg-do run }
> +// { dg-options "-O1" }
> +
> +int a, b;
> +int main() {
> +  int c = 0;
> +  long d;
> +  for (; b < 1; b++) {
> +    (c && d) & 3 || a;
> +    d = c;
> +    c = -1;
> +    if (d)
> +      __builtin_abort();
> +  }
> +  return 0;
> +}
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index a14f9bc4394..e07d2aa9a5b 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -2903,6 +2903,11 @@ irange::set_range_from_nonzero_bits ()
>       }
>        return true;
>      }
> +  else if (popcount == 0)
> +    {
> +      set_zero (type ());
> +      return true;
> +    }
>    return false;
>  }
>
> --
> 2.37.3


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Comments

Aldy Hernandez Oct. 17, 2022, 1:58 p.m. UTC | #1
On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
>
> Hi!
>
> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > When solving 0 = _15 & 1, we calculate _15 as:
> >
> >       [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe
> >
> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
> > the above, yielding:
> >
> >       [0, 1] NONZERO 0x0
> >
> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
> >
> > This is problematic because here we have a bool which is zero, but
> > returns false for irange::zero_p, since the latter does not look at
> > nonzero bits.  This causes logical_combine to assume the range is
> > not-zero, and all hell breaks loose.
> >
> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
> > creation, thus avoiding all this.
>
> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
> offloading test case:
>
>     UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>
> Same for C++.
>
> I'll later send a patch (for the test case!) to fix that up.
>
> 2. Looking into this, I found that this
> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
> code transformation/optimization that GCC apparently has not been doing
> before!  I've tried to capture that in the attached
> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".

Nice.

>
> Will you please verify that one?  In its current '#if 1' configuration,
> it's all-PASS after commit
> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
> get two calls to 'foo', because GCC apparently didnn't understand the
> relation (optimization opportunity) between 'r *= 2;' and the subsequent
> 'if (r & 1)'.

Yeah, that looks correct.  We keep better track of nonzero masks.

>
> I've left in the other '#if' variants in case you'd like to experiment
> with these, but would otherwise clean that up before pushing.
>
> Where does one put such a test case?
>
> Should the file be named 'pr107195' or something else?

The aforementioned patch already has:

            * gcc.dg/tree-ssa/pr107195-1.c: New test.
            * gcc.dg/tree-ssa/pr107195-2.c: New test.

So I would just add a pr107195-3.c test.

>
> Do we scan 'optimized', or an earlier dump?
>
> At '-O1', the actual code transformation is visible already in the 'dom2'
> dump:
>
>        <bb 3> [local count: 536870913]:
>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
>     +  gimple_assign <bit_and_expr, _11, r_7, 1, NULL>
>     +  goto <bb 6>; [100.00%]
>
>     -  <bb 4> [local count: 1073741824]:
>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
>     +  <bb 4> [local count: 536870912]:
>     +  # gimple_phi <r_4, r_6(D)(2)>
>        gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
>     -    goto <bb 5>; [50.00%]
>     +    goto <bb 5>; [100.00%]
>        else
>     -    goto <bb 6>; [50.00%]
>     +    goto <bb 6>; [0.00%]
>
>        <bb 5> [local count: 536870913]:
>        gimple_call <foo, _3, r_4>
>        gimple_assign <plus_expr, r_8, _3, r_4, NULL>
>
>        <bb 6> [local count: 1073741824]:
>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
>     +  # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)>
>        gimple_return <r_5>
>
> And, the actual "avoid second call 'foo'" optimization is visiable
> starting 'dom3':
>
>        <bb 3> [local count: 536870913]:
>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
>     +  goto <bb 6>; [100.00%]
>
>     -  <bb 4> [local count: 1073741824]:
>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
>     -  gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
>     +  <bb 4> [local count: 536870912]:
>     +  gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL>
>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
>     -    goto <bb 5>; [50.00%]
>     +    goto <bb 5>; [100.00%]
>        else
>     -    goto <bb 6>; [50.00%]
>     +    goto <bb 6>; [0.00%]
>
>        <bb 5> [local count: 536870913]:
>     -  gimple_call <foo, _3, r_4>
>     -  gimple_assign <plus_expr, r_8, _3, r_4, NULL>
>     +  gimple_assign <integer_cst, _3, 0, NULL, NULL>
>     +  gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL>
>
>        <bb 6> [local count: 1073741824]:
>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
>     +  # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)>
>        gimple_return <r_5>
>
> ..., but I don't know if either of those would be stable/appropriate to
> scan instead of 'optimized'?

IMO, either dom3 or optimized is fine.

Thanks.
Aldy
  
Thomas Schwinge Oct. 17, 2022, 2:46 p.m. UTC | #2
Hi!

On 2022-10-17T15:58:47+0200, Aldy Hernandez <aldyh@redhat.com> wrote:
> On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
>> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> > When solving 0 = _15 & 1, we calculate _15 as:
>> >
>> >       [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe
>> >
>> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
>> > the above, yielding:
>> >
>> >       [0, 1] NONZERO 0x0
>> >
>> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>> >
>> > This is problematic because here we have a bool which is zero, but
>> > returns false for irange::zero_p, since the latter does not look at
>> > nonzero bits.  This causes logical_combine to assume the range is
>> > not-zero, and all hell breaks loose.
>> >
>> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
>> > creation, thus avoiding all this.
>>
>> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
>> offloading test case:
>>
>>     UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
>>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
>>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
>>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>>
>> Same for C++.
>>
>> I'll later send a patch (for the test case!) to fix that up.
>>
>> 2. Looking into this, I found that this
>> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
>> code transformation/optimization that GCC apparently has not been doing
>> before!  I've tried to capture that in the attached
>> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".
>
> Nice.
>
>> Will you please verify that one?  In its current '#if 1' configuration,
>> it's all-PASS after commit
>> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
>> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
>> get two calls to 'foo', because GCC apparently didnn't understand the
>> relation (optimization opportunity) between 'r *= 2;' and the subsequent
>> 'if (r & 1)'.
>
> Yeah, that looks correct.  We keep better track of nonzero masks.

OK, next observation: this also works for split-up expressions
'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone).
I've added such a variant in my test case.

But: it doesn't work for logically equal 'if (r & 3)'.  I've added such
an XFAILed variant in my test case.  Do you have guidance what needs to
be done to make such cases work, too?

>> I've left in the other '#if' variants in case you'd like to experiment
>> with these, but would otherwise clean that up before pushing.
>>
>> Where does one put such a test case?
>>
>> Should the file be named 'pr107195' or something else?
>
> The aforementioned patch already has:
>
>             * gcc.dg/tree-ssa/pr107195-1.c: New test.
>             * gcc.dg/tree-ssa/pr107195-2.c: New test.
>
> So I would just add a pr107195-3.c test.

But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into
'c-c++-common/torture/'.  That's so that we get C and C++ testing, and
all torture testing flag variants.  (... where we do see the optimization
happen starting at '-O1'.)  Do you think that is excessive, and a single
'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this?
(I don't have much experience with test cases in such regions of GCC,
hence these questions.)

>> Do we scan 'optimized', or an earlier dump?
>>
>> At '-O1', the actual code transformation is visible already in the 'dom2'
>> dump:
>>
>>        <bb 3> [local count: 536870913]:
>>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
>>     +  gimple_assign <bit_and_expr, _11, r_7, 1, NULL>
>>     +  goto <bb 6>; [100.00%]
>>
>>     -  <bb 4> [local count: 1073741824]:
>>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
>>     +  <bb 4> [local count: 536870912]:
>>     +  # gimple_phi <r_4, r_6(D)(2)>
>>        gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
>>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
>>     -    goto <bb 5>; [50.00%]
>>     +    goto <bb 5>; [100.00%]
>>        else
>>     -    goto <bb 6>; [50.00%]
>>     +    goto <bb 6>; [0.00%]
>>
>>        <bb 5> [local count: 536870913]:
>>        gimple_call <foo, _3, r_4>
>>        gimple_assign <plus_expr, r_8, _3, r_4, NULL>
>>
>>        <bb 6> [local count: 1073741824]:
>>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
>>     +  # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)>
>>        gimple_return <r_5>
>>
>> And, the actual "avoid second call 'foo'" optimization is visiable
>> starting 'dom3':
>>
>>        <bb 3> [local count: 536870913]:
>>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
>>     +  goto <bb 6>; [100.00%]
>>
>>     -  <bb 4> [local count: 1073741824]:
>>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
>>     -  gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
>>     +  <bb 4> [local count: 536870912]:
>>     +  gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL>
>>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
>>     -    goto <bb 5>; [50.00%]
>>     +    goto <bb 5>; [100.00%]
>>        else
>>     -    goto <bb 6>; [50.00%]
>>     +    goto <bb 6>; [0.00%]
>>
>>        <bb 5> [local count: 536870913]:
>>     -  gimple_call <foo, _3, r_4>
>>     -  gimple_assign <plus_expr, r_8, _3, r_4, NULL>
>>     +  gimple_assign <integer_cst, _3, 0, NULL, NULL>
>>     +  gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL>
>>
>>        <bb 6> [local count: 1073741824]:
>>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
>>     +  # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)>
>>        gimple_return <r_5>
>>
>> ..., but I don't know if either of those would be stable/appropriate to
>> scan instead of 'optimized'?
>
> IMO, either dom3 or optimized is fine.

OK, I left it at 'optimized', as I don't have any rationale why exactly
it should happen in 'dom3' already.  ;-)


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Aldy Hernandez Oct. 18, 2022, 5:41 a.m. UTC | #3
On Mon, Oct 17, 2022 at 4:47 PM Thomas Schwinge <thomas@codesourcery.com> wrote:
>
> Hi!
>
> On 2022-10-17T15:58:47+0200, Aldy Hernandez <aldyh@redhat.com> wrote:
> > On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge <thomas@codesourcery.com> wrote:
> >> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> > When solving 0 = _15 & 1, we calculate _15 as:
> >> >
> >> >       [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe
> >> >
> >> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
> >> > the above, yielding:
> >> >
> >> >       [0, 1] NONZERO 0x0
> >> >
> >> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
> >> >
> >> > This is problematic because here we have a bool which is zero, but
> >> > returns false for irange::zero_p, since the latter does not look at
> >> > nonzero bits.  This causes logical_combine to assume the range is
> >> > not-zero, and all hell breaks loose.
> >> >
> >> > I think we should just normalize a nonzero mask of 0 to [0, 0] at
> >> > creation, thus avoiding all this.
> >>
> >> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> >> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
> >> offloading test case:
> >>
> >>     UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
> >>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
> >>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> >>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
> >>
> >> Same for C++.
> >>
> >> I'll later send a patch (for the test case!) to fix that up.
> >>
> >> 2. Looking into this, I found that this
> >> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> >> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a
> >> code transformation/optimization that GCC apparently has not been doing
> >> before!  I've tried to capture that in the attached
> >> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]".
> >
> > Nice.
> >
> >> Will you please verify that one?  In its current '#if 1' configuration,
> >> it's all-PASS after commit
> >> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> >> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we
> >> get two calls to 'foo', because GCC apparently didnn't understand the
> >> relation (optimization opportunity) between 'r *= 2;' and the subsequent
> >> 'if (r & 1)'.
> >
> > Yeah, that looks correct.  We keep better track of nonzero masks.
>
> OK, next observation: this also works for split-up expressions
> 'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone).
> I've added such a variant in my test case.

Unless I'm missing something, your testcase doesn't have a body for
foo[123], so GCC has no way to know what any of those functions did or
what bits are set/unset.

>
> But: it doesn't work for logically equal 'if (r & 3)'.  I've added such
> an XFAILed variant in my test case.  Do you have guidance what needs to
> be done to make such cases work, too?
>
> >> I've left in the other '#if' variants in case you'd like to experiment
> >> with these, but would otherwise clean that up before pushing.
> >>
> >> Where does one put such a test case?
> >>
> >> Should the file be named 'pr107195' or something else?
> >
> > The aforementioned patch already has:
> >
> >             * gcc.dg/tree-ssa/pr107195-1.c: New test.
> >             * gcc.dg/tree-ssa/pr107195-2.c: New test.
> >
> > So I would just add a pr107195-3.c test.
>
> But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into
> 'c-c++-common/torture/'.  That's so that we get C and C++ testing, and
> all torture testing flag variants.  (... where we do see the optimization
> happen starting at '-O1'.)  Do you think that is excessive, and a single
> 'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this?
> (I don't have much experience with test cases in such regions of GCC,
> hence these questions.)

My personal preference is tree-ssa since they are middle end tests.
Also, since we're testing ranger, it primarily runs in DOM, VRP, evrp,
and the backward threader, so no need to run it at multiple
optimization levels.

I suggested DOM, because I know ranger runs within DOM, so if the
transformation is seen at -O1, it's likely to be done there.  Also,
evrp/VRP don't run at -O1, so that's another hint it happened in DOM.
This is a guess though, it could've been CCP setting a nonzero mask,
which then ranger/DOM picked up.

All in all, I'm in favor of putting tests as early as possible,
otherwise any number of passes could perform a transformation that
could lead to the same end result.  We are testing ranger, so the most
likely place to put this test is in DOM at -O1, or in evrp/VRP[12] for
-O2.

Of course, this is my personal preference, and these are just general
guidelines.  Perhaps others can opine.

Aldy

>
> >> Do we scan 'optimized', or an earlier dump?
> >>
> >> At '-O1', the actual code transformation is visible already in the 'dom2'
> >> dump:
> >>
> >>        <bb 3> [local count: 536870913]:
> >>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
> >>     +  gimple_assign <bit_and_expr, _11, r_7, 1, NULL>
> >>     +  goto <bb 6>; [100.00%]
> >>
> >>     -  <bb 4> [local count: 1073741824]:
> >>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
> >>     +  <bb 4> [local count: 536870912]:
> >>     +  # gimple_phi <r_4, r_6(D)(2)>
> >>        gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
> >>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
> >>     -    goto <bb 5>; [50.00%]
> >>     +    goto <bb 5>; [100.00%]
> >>        else
> >>     -    goto <bb 6>; [50.00%]
> >>     +    goto <bb 6>; [0.00%]
> >>
> >>        <bb 5> [local count: 536870913]:
> >>        gimple_call <foo, _3, r_4>
> >>        gimple_assign <plus_expr, r_8, _3, r_4, NULL>
> >>
> >>        <bb 6> [local count: 1073741824]:
> >>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
> >>     +  # gimple_phi <r_5, r_4(4), r_8(5), r_7(3)>
> >>        gimple_return <r_5>
> >>
> >> And, the actual "avoid second call 'foo'" optimization is visiable
> >> starting 'dom3':
> >>
> >>        <bb 3> [local count: 536870913]:
> >>        gimple_assign <mult_expr, r_7, r_6(D), 2, NULL>
> >>     +  goto <bb 6>; [100.00%]
> >>
> >>     -  <bb 4> [local count: 1073741824]:
> >>     -  # gimple_phi <r_4, r_6(D)(2), r_7(3)>
> >>     -  gimple_assign <bit_and_expr, _2, r_4, 1, NULL>
> >>     +  <bb 4> [local count: 536870912]:
> >>     +  gimple_assign <bit_and_expr, _2, r_6(D), 1, NULL>
> >>        gimple_cond <ne_expr, _2, 0, NULL, NULL>
> >>     -    goto <bb 5>; [50.00%]
> >>     +    goto <bb 5>; [100.00%]
> >>        else
> >>     -    goto <bb 6>; [50.00%]
> >>     +    goto <bb 6>; [0.00%]
> >>
> >>        <bb 5> [local count: 536870913]:
> >>     -  gimple_call <foo, _3, r_4>
> >>     -  gimple_assign <plus_expr, r_8, _3, r_4, NULL>
> >>     +  gimple_assign <integer_cst, _3, 0, NULL, NULL>
> >>     +  gimple_assign <ssa_name, r_8, r_6(D), NULL, NULL>
> >>
> >>        <bb 6> [local count: 1073741824]:
> >>     -  # gimple_phi <r_5, r_4(4), r_8(5)>
> >>     +  # gimple_phi <r_5, r_6(D)(4), r_6(D)(5), r_7(3)>
> >>        gimple_return <r_5>
> >>
> >> ..., but I don't know if either of those would be stable/appropriate to
> >> scan instead of 'optimized'?
> >
> > IMO, either dom3 or optimized is fine.
>
> OK, I left it at 'optimized', as I don't have any rationale why exactly
> it should happen in 'dom3' already.  ;-)
>
>
> Grüße
>  Thomas
>
>
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Thomas Schwinge Oct. 21, 2022, 9:36 a.m. UTC | #4
Hi!

On 2022-10-17T09:43:37+0200, I wrote:
> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> When solving 0 = _15 & 1, we calculate _15 as:
>>
>>      [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe
>>
>> The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
>> the above, yielding:
>>
>>      [0, 1] NONZERO 0x0
>>
>> This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.
>>
>> This is problematic because here we have a bool which is zero, but
>> returns false for irange::zero_p, since the latter does not look at
>> nonzero bits.  This causes logical_combine to assume the range is
>> not-zero, and all hell breaks loose.
>>
>> I think we should just normalize a nonzero mask of 0 to [0, 0] at
>> creation, thus avoiding all this.
>
> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx
> offloading test case:
>
>     UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2   scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
>
> Same for C++.
>
> I'll later send a patch (for the test case!) to fix that up.

Pushed to master branch commit a9de836c2b22f878cff592b96e11c1b95d4d36ee
"Restore 'libgomp.oacc-c-c++-common/nvptx-sese-1.c' SESE regions checking [PR107195, PR107344]",
see attached.

That discussion I suppose is to be continued in
<https://gcc.gnu.org/PR107344> "GCC/nvptx SESE region optimization".


Grüße
 Thomas


>>      PR tree-optimization/107195
>>
>> gcc/ChangeLog:
>>
>>      * value-range.cc (irange::set_range_from_nonzero_bits): Set range
>>      to [0,0] when nonzero mask is 0.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      * gcc.dg/tree-ssa/pr107195-1.c: New test.
>>      * gcc.dg/tree-ssa/pr107195-2.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c | 15 +++++++++++++++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c | 16 ++++++++++++++++
>>  gcc/value-range.cc                         |  5 +++++
>>  3 files changed, 36 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>> new file mode 100644
>> index 00000000000..a0c20dbd4b1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
>> @@ -0,0 +1,15 @@
>> +// { dg-do run }
>> +// { dg-options "-O1 -fno-tree-ccp" }
>> +
>> +int a, b;
>> +int main() {
>> +  int c = 0;
>> +  if (a)
>> +    c = 1;
>> +  c = 1 & (a && c) && b;
>> +  if (a) {
>> +    b = c;
>> +    __builtin_abort ();
>> +  }
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>> new file mode 100644
>> index 00000000000..d447c78bdd3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
>> @@ -0,0 +1,16 @@
>> +// { dg-do run }
>> +// { dg-options "-O1" }
>> +
>> +int a, b;
>> +int main() {
>> +  int c = 0;
>> +  long d;
>> +  for (; b < 1; b++) {
>> +    (c && d) & 3 || a;
>> +    d = c;
>> +    c = -1;
>> +    if (d)
>> +      __builtin_abort();
>> +  }
>> +  return 0;
>> +}
>> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
>> index a14f9bc4394..e07d2aa9a5b 100644
>> --- a/gcc/value-range.cc
>> +++ b/gcc/value-range.cc
>> @@ -2903,6 +2903,11 @@ irange::set_range_from_nonzero_bits ()
>>      }
>>        return true;
>>      }
>> +  else if (popcount == 0)
>> +    {
>> +      set_zero (type ());
>> +      return true;
>> +    }
>>    return false;
>>  }
>>
>> --
>> 2.37.3
>
>
> From dc4644dcef05a1f21a9ebc194689f31412811387 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge <thomas@codesourcery.com>
> Date: Mon, 17 Oct 2022 09:10:03 +0200
> Subject: [PATCH] Add 'c-c++-common/torture/pr107195-1.c' [PR107195]
>
> ... to display optimization performed as of recent
> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
> "[PR107195] Set range to zero when nonzero mask is 0".
>
>       PR tree-optimization/107195
>       gcc/testsuite/
>       * c-c++-common/torture/pr107195-1.c: New.
> ---
>  .../c-c++-common/torture/pr107195-1.c         | 41 +++++++++++++++++++
>  1 file changed, 41 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/pr107195-1.c
>
> diff --git a/gcc/testsuite/c-c++-common/torture/pr107195-1.c b/gcc/testsuite/c-c++-common/torture/pr107195-1.c
> new file mode 100644
> index 000000000000..1e201c1f5e6c
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/torture/pr107195-1.c
> @@ -0,0 +1,41 @@
> +/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'.  */
> +
> +/* { dg-additional-options -fdump-tree-optimized-raw }
> +   { dg-skip-if {} { *-*-* } { {-flto -fno-fat-lto-objects} } { } } */
> +
> +#if 1
> +extern int
> +__attribute__((const))
> +foo (int);
> +#else
> +int
> +__attribute__((noinline))
> +foo (int x)
> +{
> +  return x & 2;
> +}
> +#endif
> +
> +int f (int r)
> +{
> +  if (foo (r)) /* If this first 'if' holds...  */
> +    r *= 2;  /* ..., 'r' now has a zero-value lower-most bit...  */
> +
> +  if (r & 1) /* ..., so this second 'if' can never hold...  */
> +    { /* ..., so this is unreachable.  */
> +#if 1
> +      /* In constrast, if the first 'if' does not hold ('foo (r) == 0'), the
> +      second 'if' may hold, but we know ('foo' being 'const') that
> +      'foo (r) == 0', so don't have to re-evaluate it here: */
> +      r += foo (r);
> +      /* Thus, if optimizing, we only ever expect one call of 'foo'.
> +      { dg-final { scan-tree-dump-times {gimple_call <foo,} 1 optimized { target __OPTIMIZE__ } } }
> +      { dg-final { scan-tree-dump-times {gimple_call <foo,} 2 optimized { target { ! __OPTIMIZE__ } } } }
> +      */
> +#else
> +      r += foo (-r);
> +#endif
> +    }
> +
> +  return r;
> +}
> --
> 2.35.1
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Patch

From dc4644dcef05a1f21a9ebc194689f31412811387 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 17 Oct 2022 09:10:03 +0200
Subject: [PATCH] Add 'c-c++-common/torture/pr107195-1.c' [PR107195]

... to display optimization performed as of recent
commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0".

	PR tree-optimization/107195
	gcc/testsuite/
	* c-c++-common/torture/pr107195-1.c: New.
---
 .../c-c++-common/torture/pr107195-1.c         | 41 +++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/torture/pr107195-1.c

diff --git a/gcc/testsuite/c-c++-common/torture/pr107195-1.c b/gcc/testsuite/c-c++-common/torture/pr107195-1.c
new file mode 100644
index 000000000000..1e201c1f5e6c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/pr107195-1.c
@@ -0,0 +1,41 @@ 
+/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'.  */
+
+/* { dg-additional-options -fdump-tree-optimized-raw }
+   { dg-skip-if {} { *-*-* } { {-flto -fno-fat-lto-objects} } { } } */
+
+#if 1
+extern int
+__attribute__((const))
+foo (int);
+#else
+int
+__attribute__((noinline))
+foo (int x)
+{
+  return x & 2;
+}
+#endif
+
+int f (int r)
+{
+  if (foo (r)) /* If this first 'if' holds...  */
+    r *= 2;  /* ..., 'r' now has a zero-value lower-most bit...  */
+
+  if (r & 1) /* ..., so this second 'if' can never hold...  */
+    { /* ..., so this is unreachable.  */
+#if 1
+      /* In constrast, if the first 'if' does not hold ('foo (r) == 0'), the
+	 second 'if' may hold, but we know ('foo' being 'const') that
+	 'foo (r) == 0', so don't have to re-evaluate it here: */
+      r += foo (r);
+      /* Thus, if optimizing, we only ever expect one call of 'foo'.
+	 { dg-final { scan-tree-dump-times {gimple_call <foo,} 1 optimized { target __OPTIMIZE__ } } }
+	 { dg-final { scan-tree-dump-times {gimple_call <foo,} 2 optimized { target { ! __OPTIMIZE__ } } } }
+      */
+#else
+      r += foo (-r);
+#endif
+    }
+
+  return r;
+}
-- 
2.35.1