diff mbox series

[match.pd] PR83750 - CSE erf/erfc pair

Message ID	CAAgBjMkadc11j998xBSOpNB8ES29KWN8=Td-dVf54U26rqabVg@mail.gmail.com
State	Changes Requested
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E48463858029 MIME-Version: 1.0 Date: Mon, 18 Oct 2021 15:35:24 +0530 Message-ID: <CAAgBjMkadc11j998xBSOpNB8ES29KWN8=Td-dVf54U26rqabVg@mail.gmail.com> Subject: [match.pd] PR83750 - CSE erf/erfc pair To: gcc Patches <gcc-patches@gcc.gnu.org>, Richard Biener <rguenther@suse.de> Content-Type: multipart/mixed; boundary="0000000000001f7b5f05ce9db199" Precedence: list From: Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
Series	[match.pd] PR83750 - CSE erf/erfc pair \| [match.pd] PR83750 - CSE erf/erfc pair

Commit Message

Prathamesh Kulkarni Oct. 18, 2021, 10:05 a.m. UTC

  Hi Richard,
As suggested in PR, I have attached WIP patch that adds two patterns
to match.pd:
erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
1 - erf(x) --> erfc(x) if !canonicalize_math_p().

This works to remove call to erfc for the following test:
double f(double x)
{
  double g(double, double);

  double t1 = __builtin_erf (x);
  double t2 = __builtin_erfc (x);
  return g(t1, t2);
}

with .optimized dump shows:
  t1_2 = __builtin_erf (x_1(D));
  t2_3 = 1.0e+0 - t1_2;

However, for the following test:
double f(double x)
{
  double g(double, double);

  double t1 = __builtin_erfc (x);
  return t1;
}

It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
erf(x) to erfc(x) again
post canonicalization.
-fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
but then it tries to
resimplify erfc(x), which fails post canonicalization. So we end up
with erfc(x) transformed to
1 - erf(x) in .optimized dump, which I suppose isn't ideal.
Could you suggest how to proceed ?

Thanks,
Prathamesh

Comments

Richard Biener Oct. 18, 2021, 10:48 a.m. UTC | #1

On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:

> Hi Richard,
> As suggested in PR, I have attached WIP patch that adds two patterns
> to match.pd:
> erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> 
> This works to remove call to erfc for the following test:
> double f(double x)
> {
>   double g(double, double);
> 
>   double t1 = __builtin_erf (x);
>   double t2 = __builtin_erfc (x);
>   return g(t1, t2);
> }
> 
> with .optimized dump shows:
>   t1_2 = __builtin_erf (x_1(D));
>   t2_3 = 1.0e+0 - t1_2;
> 
> However, for the following test:
> double f(double x)
> {
>   double g(double, double);
> 
>   double t1 = __builtin_erfc (x);
>   return t1;
> }
> 
> It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> erf(x) to erfc(x) again
> post canonicalization.
> -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> but then it tries to
> resimplify erfc(x), which fails post canonicalization. So we end up
> with erfc(x) transformed to
> 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> Could you suggest how to proceed ?

I applied your patch manually and it does the intended
simplifications so I wonder what I am missing?

Richard.

> Thanks,
> Prathamesh
>

Prathamesh Kulkarni Oct. 18, 2021, 11:30 a.m. UTC | #2

On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
>
> > Hi Richard,
> > As suggested in PR, I have attached WIP patch that adds two patterns
> > to match.pd:
> > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> >
> > This works to remove call to erfc for the following test:
> > double f(double x)
> > {
> >   double g(double, double);
> >
> >   double t1 = __builtin_erf (x);
> >   double t2 = __builtin_erfc (x);
> >   return g(t1, t2);
> > }
> >
> > with .optimized dump shows:
> >   t1_2 = __builtin_erf (x_1(D));
> >   t2_3 = 1.0e+0 - t1_2;
> >
> > However, for the following test:
> > double f(double x)
> > {
> >   double g(double, double);
> >
> >   double t1 = __builtin_erfc (x);
> >   return t1;
> > }
> >
> > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > erf(x) to erfc(x) again
> > post canonicalization.
> > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > but then it tries to
> > resimplify erfc(x), which fails post canonicalization. So we end up
> > with erfc(x) transformed to
> > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > Could you suggest how to proceed ?
>
> I applied your patch manually and it does the intended
> simplifications so I wonder what I am missing?
Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
no erf(x) in the source ?
So for the following test:
double f(double x)
{
  t1 = __builtin_erfc(x)
  return t1;
}

.optimized dump shows:
double f (double x)
{
  double t1;
  double _2;

  <bb 2> [local count: 1073741824]:
  _2 = __builtin_erf (x_1(D));
  t1_3 = 1.0e+0 - _2;
  return t1_3;
}

while before patch, it has:
  t1_4 = __builtin_erfc (x_2(D)); [tail call]
  return t1_4;

Thanks,
Prathamesh

>
> Richard.
>
> > Thanks,
> > Prathamesh
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Richard Biener Oct. 18, 2021, 11:40 a.m. UTC | #3

On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:

> On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > Hi Richard,
> > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > to match.pd:
> > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > >
> > > This works to remove call to erfc for the following test:
> > > double f(double x)
> > > {
> > >   double g(double, double);
> > >
> > >   double t1 = __builtin_erf (x);
> > >   double t2 = __builtin_erfc (x);
> > >   return g(t1, t2);
> > > }
> > >
> > > with .optimized dump shows:
> > >   t1_2 = __builtin_erf (x_1(D));
> > >   t2_3 = 1.0e+0 - t1_2;
> > >
> > > However, for the following test:
> > > double f(double x)
> > > {
> > >   double g(double, double);
> > >
> > >   double t1 = __builtin_erfc (x);
> > >   return t1;
> > > }
> > >
> > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > erf(x) to erfc(x) again
> > > post canonicalization.
> > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > but then it tries to
> > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > with erfc(x) transformed to
> > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > Could you suggest how to proceed ?
> >
> > I applied your patch manually and it does the intended
> > simplifications so I wonder what I am missing?
> Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> no erf(x) in the source ?

I do think it's reasonable to expect erfc to be available when erf
is and vice versa but note both are C99 specified functions (either
requires -lm).

Richard.

> So for the following test:
> double f(double x)
> {
>   t1 = __builtin_erfc(x)
>   return t1;
> }
> 
> .optimized dump shows:
> double f (double x)
> {
>   double t1;
>   double _2;
> 
>   <bb 2> [local count: 1073741824]:
>   _2 = __builtin_erf (x_1(D));
>   t1_3 = 1.0e+0 - _2;
>   return t1_3;
> }
> 
> while before patch, it has:
>   t1_4 = __builtin_erfc (x_2(D)); [tail call]
>   return t1_4;
> 
> Thanks,
> Prathamesh
> 
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

Prathamesh Kulkarni Oct. 18, 2021, 11:45 a.m. UTC | #4

On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > Hi Richard,
> > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > to match.pd:
> > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > >
> > > > This works to remove call to erfc for the following test:
> > > > double f(double x)
> > > > {
> > > >   double g(double, double);
> > > >
> > > >   double t1 = __builtin_erf (x);
> > > >   double t2 = __builtin_erfc (x);
> > > >   return g(t1, t2);
> > > > }
> > > >
> > > > with .optimized dump shows:
> > > >   t1_2 = __builtin_erf (x_1(D));
> > > >   t2_3 = 1.0e+0 - t1_2;
> > > >
> > > > However, for the following test:
> > > > double f(double x)
> > > > {
> > > >   double g(double, double);
> > > >
> > > >   double t1 = __builtin_erfc (x);
> > > >   return t1;
> > > > }
> > > >
> > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > erf(x) to erfc(x) again
> > > > post canonicalization.
> > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > but then it tries to
> > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > with erfc(x) transformed to
> > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > Could you suggest how to proceed ?
> > >
> > > I applied your patch manually and it does the intended
> > > simplifications so I wonder what I am missing?
> > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > no erf(x) in the source ?
>
> I do think it's reasonable to expect erfc to be available when erf
> is and vice versa but note both are C99 specified functions (either
> requires -lm).
OK, thanks. Would it be OK to commit the patch after bootstrap+test ?

Thanks,
Prathamesh

>
> Richard.
>
> > So for the following test:
> > double f(double x)
> > {
> >   t1 = __builtin_erfc(x)
> >   return t1;
> > }
> >
> > .optimized dump shows:
> > double f (double x)
> > {
> >   double t1;
> >   double _2;
> >
> >   <bb 2> [local count: 1073741824]:
> >   _2 = __builtin_erf (x_1(D));
> >   t1_3 = 1.0e+0 - _2;
> >   return t1_3;
> > }
> >
> > while before patch, it has:
> >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> >   return t1_4;
> >
> > Thanks,
> > Prathamesh
> >
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Richard Biener Oct. 18, 2021, 11:53 a.m. UTC | #5

On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:

> On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > Hi Richard,
> > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > to match.pd:
> > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > >
> > > > > This works to remove call to erfc for the following test:
> > > > > double f(double x)
> > > > > {
> > > > >   double g(double, double);
> > > > >
> > > > >   double t1 = __builtin_erf (x);
> > > > >   double t2 = __builtin_erfc (x);
> > > > >   return g(t1, t2);
> > > > > }
> > > > >
> > > > > with .optimized dump shows:
> > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > >
> > > > > However, for the following test:
> > > > > double f(double x)
> > > > > {
> > > > >   double g(double, double);
> > > > >
> > > > >   double t1 = __builtin_erfc (x);
> > > > >   return t1;
> > > > > }
> > > > >
> > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > erf(x) to erfc(x) again
> > > > > post canonicalization.
> > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > but then it tries to
> > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > with erfc(x) transformed to
> > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > Could you suggest how to proceed ?
> > > >
> > > > I applied your patch manually and it does the intended
> > > > simplifications so I wonder what I am missing?
> > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > no erf(x) in the source ?
> >
> > I do think it's reasonable to expect erfc to be available when erf
> > is and vice versa but note both are C99 specified functions (either
> > requires -lm).
> OK, thanks. Would it be OK to commit the patch after bootstrap+test ?

Yes, but I'm confused because you say the patch doesn't work for you?

Btw, please add the testcase from the PR and also a testcase that shows
the canonicalization is undone.  Maybe you can also double-check that
we handle x + erfc (x) because I see we associate that as
(x + 1) - erf (x) which is then not recognized back to erfc.

The less surprising (as to preserve the function called in the source)
variant for the PR would be to teach CSE to lookup erf(x) when
visiting erfc(x) and when found synthesize 1 - erf(x).

That said, a mathematician should chime in on how important it is
to preserve erfc vs. erf (precision or even speed).

Thanks,
Richard.

> Thanks,
> Prathamesh
> 
> >
> > Richard.
> >
> > > So for the following test:
> > > double f(double x)
> > > {
> > >   t1 = __builtin_erfc(x)
> > >   return t1;
> > > }
> > >
> > > .optimized dump shows:
> > > double f (double x)
> > > {
> > >   double t1;
> > >   double _2;
> > >
> > >   <bb 2> [local count: 1073741824]:
> > >   _2 = __builtin_erf (x_1(D));
> > >   t1_3 = 1.0e+0 - _2;
> > >   return t1_3;
> > > }
> > >
> > > while before patch, it has:
> > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > >   return t1_4;
> > >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

Prathamesh Kulkarni Oct. 19, 2021, 7:01 a.m. UTC | #6

On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > Hi Richard,
> > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > to match.pd:
> > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > >
> > > > > > This works to remove call to erfc for the following test:
> > > > > > double f(double x)
> > > > > > {
> > > > > >   double g(double, double);
> > > > > >
> > > > > >   double t1 = __builtin_erf (x);
> > > > > >   double t2 = __builtin_erfc (x);
> > > > > >   return g(t1, t2);
> > > > > > }
> > > > > >
> > > > > > with .optimized dump shows:
> > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > >
> > > > > > However, for the following test:
> > > > > > double f(double x)
> > > > > > {
> > > > > >   double g(double, double);
> > > > > >
> > > > > >   double t1 = __builtin_erfc (x);
> > > > > >   return t1;
> > > > > > }
> > > > > >
> > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > erf(x) to erfc(x) again
> > > > > > post canonicalization.
> > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > but then it tries to
> > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > with erfc(x) transformed to
> > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > Could you suggest how to proceed ?
> > > > >
> > > > > I applied your patch manually and it does the intended
> > > > > simplifications so I wonder what I am missing?
> > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > no erf(x) in the source ?
> > >
> > > I do think it's reasonable to expect erfc to be available when erf
> > > is and vice versa but note both are C99 specified functions (either
> > > requires -lm).
> > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
>
> Yes, but I'm confused because you say the patch doesn't work for you?
The patch works for me to CSE erf/erfc pair.
However when there's only erfc in the source, it canonicalizes erfc(x)
to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
erfc(x)
with -O3 -funsafe-math-optimizations.

For,
t1 = __builtin_erfc(x),

.optimized dump shows:
  _2 = __builtin_erf (x_1(D));
  t1_3 = 1.0e+0 - _2;

and for,
double t1 = x + __builtin_erfc(x);

.optimized dump shows:
  _3 = __builtin_erf (x_2(D));
  _7 = x_2(D) + 1.0e+0;
  t1_4 = _7 - _3;

I assume in both cases, we want erfc in the code-gen instead ?
I think the reason uncaonicalization fails is because the pattern 1 -
erf(x) to erfc(x)
gets applied, but then it fails in resimplifying erfc(x), and we end
up with 1 - erf(x) in code-gen.

From gimple-match.c, it hits the simplification:

                                gimple_seq *lseq = seq;
                                if (__builtin_expect (!dbg_cnt
(match), 0)) goto next_after_fail1172;
                                if (__builtin_expect (dump_file &&
(dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
%s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
                                {
                                  res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
                                  res_op->ops[0] = captures[0];
                                  res_op->resimplify (lseq, valueize);
                                  return true;
                                }

But res_op->resimplify returns false, and doesn't end up adding to lseq.

As you suggest, should we instead handle this in fre to transform
erfc(x) to 1 - erf(x), only when
there's a matching erf(x) in the source ?

Thanks,
Prathamesh
>
> Btw, please add the testcase from the PR and also a testcase that shows
> the canonicalization is undone.  Maybe you can also double-check that
> we handle x + erfc (x) because I see we associate that as
> (x + 1) - erf (x) which is then not recognized back to erfc.
>
> The less surprising (as to preserve the function called in the source)
> variant for the PR would be to teach CSE to lookup erf(x) when
> visiting erfc(x) and when found synthesize 1 - erf(x).
>
> That said, a mathematician should chime in on how important it is
> to preserve erfc vs. erf (precision or even speed).
>
> Thanks,
> Richard.
>
> > Thanks,
> > Prathamesh
> >
> > >
> > > Richard.
> > >
> > > > So for the following test:
> > > > double f(double x)
> > > > {
> > > >   t1 = __builtin_erfc(x)
> > > >   return t1;
> > > > }
> > > >
> > > > .optimized dump shows:
> > > > double f (double x)
> > > > {
> > > >   double t1;
> > > >   double _2;
> > > >
> > > >   <bb 2> [local count: 1073741824]:
> > > >   _2 = __builtin_erf (x_1(D));
> > > >   t1_3 = 1.0e+0 - _2;
> > > >   return t1_3;
> > > > }
> > > >
> > > > while before patch, it has:
> > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > >   return t1_4;
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Richard Biener Oct. 19, 2021, 7:32 a.m. UTC | #7

On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > Hi Richard,
> > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > to match.pd:
> > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > >
> > > > > > > This works to remove call to erfc for the following test:
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erf (x);
> > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > >   return g(t1, t2);
> > > > > > > }
> > > > > > >
> > > > > > > with .optimized dump shows:
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > >
> > > > > > > However, for the following test:
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > >   return t1;
> > > > > > > }
> > > > > > >
> > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > erf(x) to erfc(x) again
> > > > > > > post canonicalization.
> > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > but then it tries to
> > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > with erfc(x) transformed to
> > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > Could you suggest how to proceed ?
> > > > > >
> > > > > > I applied your patch manually and it does the intended
> > > > > > simplifications so I wonder what I am missing?
> > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > no erf(x) in the source ?
> > > >
> > > > I do think it's reasonable to expect erfc to be available when erf
> > > > is and vice versa but note both are C99 specified functions (either
> > > > requires -lm).
> > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> >
> > Yes, but I'm confused because you say the patch doesn't work for you?
> The patch works for me to CSE erf/erfc pair.
> However when there's only erfc in the source, it canonicalizes erfc(x)
> to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> erfc(x)
> with -O3 -funsafe-math-optimizations.
>
> For,
> t1 = __builtin_erfc(x),
>
> .optimized dump shows:
>   _2 = __builtin_erf (x_1(D));
>   t1_3 = 1.0e+0 - _2;
>
> and for,
> double t1 = x + __builtin_erfc(x);
>
> .optimized dump shows:
>   _3 = __builtin_erf (x_2(D));
>   _7 = x_2(D) + 1.0e+0;
>   t1_4 = _7 - _3;
>
> I assume in both cases, we want erfc in the code-gen instead ?
> I think the reason uncaonicalization fails is because the pattern 1 -
> erf(x) to erfc(x)
> gets applied, but then it fails in resimplifying erfc(x), and we end
> up with 1 - erf(x) in code-gen.
>
> From gimple-match.c, it hits the simplification:
>
>                                 gimple_seq *lseq = seq;
>                                 if (__builtin_expect (!dbg_cnt
> (match), 0)) goto next_after_fail1172;
>                                 if (__builtin_expect (dump_file &&
> (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
>                                 {
>                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
>                                   res_op->ops[0] = captures[0];
>                                   res_op->resimplify (lseq, valueize);
>                                   return true;
>                                 }
>
> But res_op->resimplify returns false, and doesn't end up adding to lseq.

There's nothing to add to lseq since there's also nothing to resimplify.
The only thing that could happen is that the replacement is not done
because replace_stmt_with_simplification via maybe_push_res_to_seq
doesn't pass the builtin_decl_implicit test:

          /* Find the function we want to call.  */
          tree decl = builtin_decl_implicit (as_builtin_fn (fn));
          if (!decl)
            return NULL;

btw, it did work for me since the call was present before and gimplification
should then mark the function eligible for implicit generation.

> As you suggest, should we instead handle this in fre to transform
> erfc(x) to 1 - erf(x), only when
> there's a matching erf(x) in the source ?

Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
to handle CSE in

  tem = erfc (x);
  tem2 = erf (x);

So no, I think the canonicalization is fine unless there's a compelling reason
for having both erfc and erf.

Can you debug why the reverse transform doesn't work for you?

Richard.

> Thanks,
> Prathamesh
> >
> > Btw, please add the testcase from the PR and also a testcase that shows
> > the canonicalization is undone.  Maybe you can also double-check that
> > we handle x + erfc (x) because I see we associate that as
> > (x + 1) - erf (x) which is then not recognized back to erfc.
> >
> > The less surprising (as to preserve the function called in the source)
> > variant for the PR would be to teach CSE to lookup erf(x) when
> > visiting erfc(x) and when found synthesize 1 - erf(x).
> >
> > That said, a mathematician should chime in on how important it is
> > to preserve erfc vs. erf (precision or even speed).
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > Richard.
> > > >
> > > > > So for the following test:
> > > > > double f(double x)
> > > > > {
> > > > >   t1 = __builtin_erfc(x)
> > > > >   return t1;
> > > > > }
> > > > >
> > > > > .optimized dump shows:
> > > > > double f (double x)
> > > > > {
> > > > >   double t1;
> > > > >   double _2;
> > > > >
> > > > >   <bb 2> [local count: 1073741824]:
> > > > >   _2 = __builtin_erf (x_1(D));
> > > > >   t1_3 = 1.0e+0 - _2;
> > > > >   return t1_3;
> > > > > }
> > > > >
> > > > > while before patch, it has:
> > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > >   return t1_4;
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Prathamesh Kulkarni Oct. 19, 2021, 11:09 a.m. UTC | #8

On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > Hi Richard,
> > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > to match.pd:
> > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > >
> > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > double f(double x)
> > > > > > > > {
> > > > > > > >   double g(double, double);
> > > > > > > >
> > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > >   return g(t1, t2);
> > > > > > > > }
> > > > > > > >
> > > > > > > > with .optimized dump shows:
> > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > >
> > > > > > > > However, for the following test:
> > > > > > > > double f(double x)
> > > > > > > > {
> > > > > > > >   double g(double, double);
> > > > > > > >
> > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > >   return t1;
> > > > > > > > }
> > > > > > > >
> > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > erf(x) to erfc(x) again
> > > > > > > > post canonicalization.
> > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > but then it tries to
> > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > with erfc(x) transformed to
> > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > Could you suggest how to proceed ?
> > > > > > >
> > > > > > > I applied your patch manually and it does the intended
> > > > > > > simplifications so I wonder what I am missing?
> > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > no erf(x) in the source ?
> > > > >
> > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > is and vice versa but note both are C99 specified functions (either
> > > > > requires -lm).
> > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > >
> > > Yes, but I'm confused because you say the patch doesn't work for you?
> > The patch works for me to CSE erf/erfc pair.
> > However when there's only erfc in the source, it canonicalizes erfc(x)
> > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > erfc(x)
> > with -O3 -funsafe-math-optimizations.
> >
> > For,
> > t1 = __builtin_erfc(x),
> >
> > .optimized dump shows:
> >   _2 = __builtin_erf (x_1(D));
> >   t1_3 = 1.0e+0 - _2;
> >
> > and for,
> > double t1 = x + __builtin_erfc(x);
> >
> > .optimized dump shows:
> >   _3 = __builtin_erf (x_2(D));
> >   _7 = x_2(D) + 1.0e+0;
> >   t1_4 = _7 - _3;
> >
> > I assume in both cases, we want erfc in the code-gen instead ?
> > I think the reason uncaonicalization fails is because the pattern 1 -
> > erf(x) to erfc(x)
> > gets applied, but then it fails in resimplifying erfc(x), and we end
> > up with 1 - erf(x) in code-gen.
> >
> > From gimple-match.c, it hits the simplification:
> >
> >                                 gimple_seq *lseq = seq;
> >                                 if (__builtin_expect (!dbg_cnt
> > (match), 0)) goto next_after_fail1172;
> >                                 if (__builtin_expect (dump_file &&
> > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> >                                 {
> >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> >                                   res_op->ops[0] = captures[0];
> >                                   res_op->resimplify (lseq, valueize);
> >                                   return true;
> >                                 }
> >
> > But res_op->resimplify returns false, and doesn't end up adding to lseq.
>
> There's nothing to add to lseq since there's also nothing to resimplify.
> The only thing that could happen is that the replacement is not done
> because replace_stmt_with_simplification via maybe_push_res_to_seq
> doesn't pass the builtin_decl_implicit test:
>
>           /* Find the function we want to call.  */
>           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
>           if (!decl)
>             return NULL;
>
> btw, it did work for me since the call was present before and gimplification
> should then mark the function eligible for implicit generation.
>
> > As you suggest, should we instead handle this in fre to transform
> > erfc(x) to 1 - erf(x), only when
> > there's a matching erf(x) in the source ?
>
> Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> to handle CSE in
>
>   tem = erfc (x);
>   tem2 = erf (x);
>
> So no, I think the canonicalization is fine unless there's a compelling reason
> for having both erfc and erf.
>
> Can you debug why the reverse transform doesn't work for you?
It seems the issue was that erfc wasn't getting marked with const
attribute, and failed the following test in
maybe_push_res_to_seq:
          /* We can't and should not emit calls to non-const functions.  */
          if (!(flags_from_decl_or_type (decl) & ECF_CONST))
            return NULL;

Passing -fno-math-errno seems to work for the reverse transform:

double f(double x)
{
  double g(double, double);

  double t1 = __builtin_erfc (x);
  return t1;
}

Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:

vrp2 dump shows:
Folding statement: _2 = __builtin_erf (x_1(D));
Not folded
Folding statement: t1_3 = 1.0e+0 - _2;
Applying pattern match.pd:6162, gimple-match.c:68450
gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
Folded into: t1_3 = __builtin_erfc (x_1(D));

and .optimized dump shows:
double f (double x)
{
  double t1;

  <bb 2> [local count: 1073741824]:
  t1_3 = __builtin_erfc (x_1(D)); [tail call]
  return t1_3;
}

Unfortunately, for the test-case involving erf/erfc pair, the reverse
transform seems to undo the CSE:

double f(double x)
{
  double g(double, double);

  double t1 = __builtin_erf (x);
  double t2 = __builtin_erfc (x);
  return g(t1, t2);
}

gimplification turns erfc to 1 - erf:
Applying pattern match.pd:6158, gimple-match.c:44479
gimple_simplified to D.1988 = __builtin_erf (x);
t2 = 1.0e+0 - D.1988;

  t1 = __builtin_erf (x);
  D.1988 = __builtin_erf (x);
  t2 = 1.0e+0 - D.1988;
  D.1987 = g (t1, t2);

fre1 does the CSE:
  t1_2 = __builtin_erf (x_1(D));
  t2_4 = 1.0e+0 - t1_2;
  _7 = g (t1_2, t2_4);

and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
Applying pattern match.pd:6162, gimple-match.c:68450
gimple_simplified to t2_3 = __builtin_erfc (x_1(D));

  t1_2 = __builtin_erf (x_1(D));
  t2_3 = __builtin_erfc (x_1(D));
  _6 = g (t1_2, t2_3);

and .optimized dump shows:
  t1_2 = __builtin_erf (x_1(D));
  t2_3 = __builtin_erfc (x_1(D));
  _6 = g (t1_2, t2_3); [tail call]

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Btw, please add the testcase from the PR and also a testcase that shows
> > > the canonicalization is undone.  Maybe you can also double-check that
> > > we handle x + erfc (x) because I see we associate that as
> > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > >
> > > The less surprising (as to preserve the function called in the source)
> > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > >
> > > That said, a mathematician should chime in on how important it is
> > > to preserve erfc vs. erf (precision or even speed).
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > So for the following test:
> > > > > > double f(double x)
> > > > > > {
> > > > > >   t1 = __builtin_erfc(x)
> > > > > >   return t1;
> > > > > > }
> > > > > >
> > > > > > .optimized dump shows:
> > > > > > double f (double x)
> > > > > > {
> > > > > >   double t1;
> > > > > >   double _2;
> > > > > >
> > > > > >   <bb 2> [local count: 1073741824]:
> > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > >   return t1_3;
> > > > > > }
> > > > > >
> > > > > > while before patch, it has:
> > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > >   return t1_4;
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Richard Biener Oct. 19, 2021, 11:25 a.m. UTC | #9

On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:

> On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > Hi Richard,
> > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > to match.pd:
> > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > >
> > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > double f(double x)
> > > > > > > > > {
> > > > > > > > >   double g(double, double);
> > > > > > > > >
> > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > >   return g(t1, t2);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > with .optimized dump shows:
> > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > >
> > > > > > > > > However, for the following test:
> > > > > > > > > double f(double x)
> > > > > > > > > {
> > > > > > > > >   double g(double, double);
> > > > > > > > >
> > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > >   return t1;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > post canonicalization.
> > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > but then it tries to
> > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > with erfc(x) transformed to
> > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > Could you suggest how to proceed ?
> > > > > > > >
> > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > no erf(x) in the source ?
> > > > > >
> > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > requires -lm).
> > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > >
> > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > The patch works for me to CSE erf/erfc pair.
> > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > erfc(x)
> > > with -O3 -funsafe-math-optimizations.
> > >
> > > For,
> > > t1 = __builtin_erfc(x),
> > >
> > > .optimized dump shows:
> > >   _2 = __builtin_erf (x_1(D));
> > >   t1_3 = 1.0e+0 - _2;
> > >
> > > and for,
> > > double t1 = x + __builtin_erfc(x);
> > >
> > > .optimized dump shows:
> > >   _3 = __builtin_erf (x_2(D));
> > >   _7 = x_2(D) + 1.0e+0;
> > >   t1_4 = _7 - _3;
> > >
> > > I assume in both cases, we want erfc in the code-gen instead ?
> > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > erf(x) to erfc(x)
> > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > up with 1 - erf(x) in code-gen.
> > >
> > > From gimple-match.c, it hits the simplification:
> > >
> > >                                 gimple_seq *lseq = seq;
> > >                                 if (__builtin_expect (!dbg_cnt
> > > (match), 0)) goto next_after_fail1172;
> > >                                 if (__builtin_expect (dump_file &&
> > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > >                                 {
> > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > >                                   res_op->ops[0] = captures[0];
> > >                                   res_op->resimplify (lseq, valueize);
> > >                                   return true;
> > >                                 }
> > >
> > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> >
> > There's nothing to add to lseq since there's also nothing to resimplify.
> > The only thing that could happen is that the replacement is not done
> > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > doesn't pass the builtin_decl_implicit test:
> >
> >           /* Find the function we want to call.  */
> >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> >           if (!decl)
> >             return NULL;
> >
> > btw, it did work for me since the call was present before and gimplification
> > should then mark the function eligible for implicit generation.
> >
> > > As you suggest, should we instead handle this in fre to transform
> > > erfc(x) to 1 - erf(x), only when
> > > there's a matching erf(x) in the source ?
> >
> > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > to handle CSE in
> >
> >   tem = erfc (x);
> >   tem2 = erf (x);
> >
> > So no, I think the canonicalization is fine unless there's a compelling reason
> > for having both erfc and erf.
> >
> > Can you debug why the reverse transform doesn't work for you?
> It seems the issue was that erfc wasn't getting marked with const
> attribute, and failed the following test in
> maybe_push_res_to_seq:
>           /* We can't and should not emit calls to non-const functions.  */
>           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
>             return NULL;
> 
> Passing -fno-math-errno seems to work for the reverse transform:
> 
> double f(double x)
> {
>   double g(double, double);
> 
>   double t1 = __builtin_erfc (x);
>   return t1;
> }
> 
> Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> 
> vrp2 dump shows:
> Folding statement: _2 = __builtin_erf (x_1(D));
> Not folded
> Folding statement: t1_3 = 1.0e+0 - _2;
> Applying pattern match.pd:6162, gimple-match.c:68450
> gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> Folded into: t1_3 = __builtin_erfc (x_1(D));
> 
> and .optimized dump shows:
> double f (double x)
> {
>   double t1;
> 
>   <bb 2> [local count: 1073741824]:
>   t1_3 = __builtin_erfc (x_1(D)); [tail call]
>   return t1_3;
> }
> 
> Unfortunately, for the test-case involving erf/erfc pair, the reverse
> transform seems to undo the CSE:
> 
> double f(double x)
> {
>   double g(double, double);
> 
>   double t1 = __builtin_erf (x);
>   double t2 = __builtin_erfc (x);
>   return g(t1, t2);
> }
> 
> gimplification turns erfc to 1 - erf:
> Applying pattern match.pd:6158, gimple-match.c:44479
> gimple_simplified to D.1988 = __builtin_erf (x);
> t2 = 1.0e+0 - D.1988;
> 
>   t1 = __builtin_erf (x);
>   D.1988 = __builtin_erf (x);
>   t2 = 1.0e+0 - D.1988;
>   D.1987 = g (t1, t2);
> 
> fre1 does the CSE:
>   t1_2 = __builtin_erf (x_1(D));
>   t2_4 = 1.0e+0 - t1_2;
>   _7 = g (t1_2, t2_4);
> 
> and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> Applying pattern match.pd:6162, gimple-match.c:68450
> gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> 
>   t1_2 = __builtin_erf (x_1(D));
>   t2_3 = __builtin_erfc (x_1(D));
>   _6 = g (t1_2, t2_3);
> 
> and .optimized dump shows:
>   t1_2 = __builtin_erf (x_1(D));
>   t2_3 = __builtin_erfc (x_1(D));
>   _6 = g (t1_2, t2_3); [tail call]

You probably want an explicit && single_use () check on the
1 - erf() -> erfc transform.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Btw, please add the testcase from the PR and also a testcase that shows
> > > > the canonicalization is undone.  Maybe you can also double-check that
> > > > we handle x + erfc (x) because I see we associate that as
> > > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > > >
> > > > The less surprising (as to preserve the function called in the source)
> > > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > > >
> > > > That said, a mathematician should chime in on how important it is
> > > > to preserve erfc vs. erf (precision or even speed).
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > So for the following test:
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   t1 = __builtin_erfc(x)
> > > > > > >   return t1;
> > > > > > > }
> > > > > > >
> > > > > > > .optimized dump shows:
> > > > > > > double f (double x)
> > > > > > > {
> > > > > > >   double t1;
> > > > > > >   double _2;
> > > > > > >
> > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > >   return t1_3;
> > > > > > > }
> > > > > > >
> > > > > > > while before patch, it has:
> > > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > > >   return t1_4;
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > >
> > > > > > > >
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

Prathamesh Kulkarni Oct. 20, 2021, 11:59 a.m. UTC | #10

On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > Hi Richard,
> > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > to match.pd:
> > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > >
> > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > double f(double x)
> > > > > > > > > > {
> > > > > > > > > >   double g(double, double);
> > > > > > > > > >
> > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > with .optimized dump shows:
> > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > >
> > > > > > > > > > However, for the following test:
> > > > > > > > > > double f(double x)
> > > > > > > > > > {
> > > > > > > > > >   double g(double, double);
> > > > > > > > > >
> > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > >   return t1;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > post canonicalization.
> > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > but then it tries to
> > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > >
> > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > no erf(x) in the source ?
> > > > > > >
> > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > requires -lm).
> > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > >
> > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > The patch works for me to CSE erf/erfc pair.
> > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > erfc(x)
> > > > with -O3 -funsafe-math-optimizations.
> > > >
> > > > For,
> > > > t1 = __builtin_erfc(x),
> > > >
> > > > .optimized dump shows:
> > > >   _2 = __builtin_erf (x_1(D));
> > > >   t1_3 = 1.0e+0 - _2;
> > > >
> > > > and for,
> > > > double t1 = x + __builtin_erfc(x);
> > > >
> > > > .optimized dump shows:
> > > >   _3 = __builtin_erf (x_2(D));
> > > >   _7 = x_2(D) + 1.0e+0;
> > > >   t1_4 = _7 - _3;
> > > >
> > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > erf(x) to erfc(x)
> > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > up with 1 - erf(x) in code-gen.
> > > >
> > > > From gimple-match.c, it hits the simplification:
> > > >
> > > >                                 gimple_seq *lseq = seq;
> > > >                                 if (__builtin_expect (!dbg_cnt
> > > > (match), 0)) goto next_after_fail1172;
> > > >                                 if (__builtin_expect (dump_file &&
> > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > >                                 {
> > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > >                                   res_op->ops[0] = captures[0];
> > > >                                   res_op->resimplify (lseq, valueize);
> > > >                                   return true;
> > > >                                 }
> > > >
> > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > >
> > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > The only thing that could happen is that the replacement is not done
> > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > doesn't pass the builtin_decl_implicit test:
> > >
> > >           /* Find the function we want to call.  */
> > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > >           if (!decl)
> > >             return NULL;
> > >
> > > btw, it did work for me since the call was present before and gimplification
> > > should then mark the function eligible for implicit generation.
> > >
> > > > As you suggest, should we instead handle this in fre to transform
> > > > erfc(x) to 1 - erf(x), only when
> > > > there's a matching erf(x) in the source ?
> > >
> > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > to handle CSE in
> > >
> > >   tem = erfc (x);
> > >   tem2 = erf (x);
> > >
> > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > for having both erfc and erf.
> > >
> > > Can you debug why the reverse transform doesn't work for you?
> > It seems the issue was that erfc wasn't getting marked with const
> > attribute, and failed the following test in
> > maybe_push_res_to_seq:
> >           /* We can't and should not emit calls to non-const functions.  */
> >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> >             return NULL;
> >
> > Passing -fno-math-errno seems to work for the reverse transform:
> >
> > double f(double x)
> > {
> >   double g(double, double);
> >
> >   double t1 = __builtin_erfc (x);
> >   return t1;
> > }
> >
> > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> >
> > vrp2 dump shows:
> > Folding statement: _2 = __builtin_erf (x_1(D));
> > Not folded
> > Folding statement: t1_3 = 1.0e+0 - _2;
> > Applying pattern match.pd:6162, gimple-match.c:68450
> > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > Folded into: t1_3 = __builtin_erfc (x_1(D));
> >
> > and .optimized dump shows:
> > double f (double x)
> > {
> >   double t1;
> >
> >   <bb 2> [local count: 1073741824]:
> >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> >   return t1_3;
> > }
> >
> > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > transform seems to undo the CSE:
> >
> > double f(double x)
> > {
> >   double g(double, double);
> >
> >   double t1 = __builtin_erf (x);
> >   double t2 = __builtin_erfc (x);
> >   return g(t1, t2);
> > }
> >
> > gimplification turns erfc to 1 - erf:
> > Applying pattern match.pd:6158, gimple-match.c:44479
> > gimple_simplified to D.1988 = __builtin_erf (x);
> > t2 = 1.0e+0 - D.1988;
> >
> >   t1 = __builtin_erf (x);
> >   D.1988 = __builtin_erf (x);
> >   t2 = 1.0e+0 - D.1988;
> >   D.1987 = g (t1, t2);
> >
> > fre1 does the CSE:
> >   t1_2 = __builtin_erf (x_1(D));
> >   t2_4 = 1.0e+0 - t1_2;
> >   _7 = g (t1_2, t2_4);
> >
> > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > Applying pattern match.pd:6162, gimple-match.c:68450
> > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> >
> >   t1_2 = __builtin_erf (x_1(D));
> >   t2_3 = __builtin_erfc (x_1(D));
> >   _6 = g (t1_2, t2_3);
> >
> > and .optimized dump shows:
> >   t1_2 = __builtin_erf (x_1(D));
> >   t2_3 = __builtin_erfc (x_1(D));
> >   _6 = g (t1_2, t2_3); [tail call]
>
> You probably want an explicit && single_use () check on the
> 1 - erf() -> erfc transform.
single_use worked, thanks!
As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
don't uncanonicalize back.
I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
erf(x)), after which,
it gets uncanonicalized back to x + erfc(x) in .optimized dump.
Does the attached patch look OK after bootstrap+test ?

Thanks,
Prathamesh
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Btw, please add the testcase from the PR and also a testcase that shows
> > > > > the canonicalization is undone.  Maybe you can also double-check that
> > > > > we handle x + erfc (x) because I see we associate that as
> > > > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > > > >
> > > > > The less surprising (as to preserve the function called in the source)
> > > > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > > > >
> > > > > That said, a mathematician should chime in on how important it is
> > > > > to preserve erfc vs. erf (precision or even speed).
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > > So for the following test:
> > > > > > > > double f(double x)
> > > > > > > > {
> > > > > > > >   t1 = __builtin_erfc(x)
> > > > > > > >   return t1;
> > > > > > > > }
> > > > > > > >
> > > > > > > > .optimized dump shows:
> > > > > > > > double f (double x)
> > > > > > > > {
> > > > > > > >   double t1;
> > > > > > > >   double _2;
> > > > > > > >
> > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > >   return t1_3;
> > > > > > > > }
> > > > > > > >
> > > > > > > > while before patch, it has:
> > > > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > > > >   return t1_4;
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
diff --git a/gcc/match.pd b/gcc/match.pd
index a9791ceb74a..827670fb59b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6147,6 +6147,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (floors tree_expr_nonnegative_p@0)
    (truncs @0))))
 
+/* Simplify,
+   erfc(x) -> 1 - erf(x) if canonicalize_math_p().
+   1 - erf(x) -> erfc(x) if !canonicalize_math_p().  */
+
+(if (flag_unsafe_math_optimizations)
+ (simplify
+  (ERFC @0)
+   (if (canonicalize_math_p ())
+    (minus { build_one_cst (TREE_TYPE (@0)); } (ERF @0))))
+ (simplify
+  (minus real_onep (ERF@1 @0))
+  (if (!canonicalize_math_p () && single_use (@1))
+   (ERFC @0))))
+ /* (x + 1) - erf(x) -> x + (1 - erf(x)).  */
+ (simplify
+  (minus (plus @0 real_onep) (ERF @0))
+  (plus @0 (minus { build_one_cst (TREE_TYPE (@0)); } (ERF @0))))
+
 (match double_value_p
  @0
  (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == double_type_node)))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c
new file mode 100644
index 00000000000..a1d456c0119
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -funsafe-math-optimizations -fno-math-errno" } */
+
+double f(double x)
+{
+  double g (double, double);
+  double t1 = __builtin_erf (x);
+  double t2 = __builtin_erfc (x);
+  return g (t1, t2);
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_erfc" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c
new file mode 100644
index 00000000000..a55825bca47
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -funsafe-math-optimizations -fno-math-errno" } */
+
+double f(double x)
+{
+  return __builtin_erfc (x); 
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_erf \\(" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-3.c
new file mode 100644
index 00000000000..3305cdf125c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -funsafe-math-optimizations -fno-math-errno" } */
+
+double f1(double x)
+{
+  return x + __builtin_erfc (x); 
+}
+
+double f2(double x)
+{
+  return __builtin_erfc (x) + x;
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_erf \\(" "optimized"} } */

Richard Biener Oct. 20, 2021, 12:51 p.m. UTC | #11

On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:

> On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Richard,
> > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > > to match.pd:
> > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > >
> > > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > > double f(double x)
> > > > > > > > > > > {
> > > > > > > > > > >   double g(double, double);
> > > > > > > > > > >
> > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > >
> > > > > > > > > > > However, for the following test:
> > > > > > > > > > > double f(double x)
> > > > > > > > > > > {
> > > > > > > > > > >   double g(double, double);
> > > > > > > > > > >
> > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > >   return t1;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > post canonicalization.
> > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > > but then it tries to
> > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > >
> > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > > no erf(x) in the source ?
> > > > > > > >
> > > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > > requires -lm).
> > > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > > >
> > > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > > The patch works for me to CSE erf/erfc pair.
> > > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > erfc(x)
> > > > > with -O3 -funsafe-math-optimizations.
> > > > >
> > > > > For,
> > > > > t1 = __builtin_erfc(x),
> > > > >
> > > > > .optimized dump shows:
> > > > >   _2 = __builtin_erf (x_1(D));
> > > > >   t1_3 = 1.0e+0 - _2;
> > > > >
> > > > > and for,
> > > > > double t1 = x + __builtin_erfc(x);
> > > > >
> > > > > .optimized dump shows:
> > > > >   _3 = __builtin_erf (x_2(D));
> > > > >   _7 = x_2(D) + 1.0e+0;
> > > > >   t1_4 = _7 - _3;
> > > > >
> > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > > erf(x) to erfc(x)
> > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > up with 1 - erf(x) in code-gen.
> > > > >
> > > > > From gimple-match.c, it hits the simplification:
> > > > >
> > > > >                                 gimple_seq *lseq = seq;
> > > > >                                 if (__builtin_expect (!dbg_cnt
> > > > > (match), 0)) goto next_after_fail1172;
> > > > >                                 if (__builtin_expect (dump_file &&
> > > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > > >                                 {
> > > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > > >                                   res_op->ops[0] = captures[0];
> > > > >                                   res_op->resimplify (lseq, valueize);
> > > > >                                   return true;
> > > > >                                 }
> > > > >
> > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > > >
> > > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > > The only thing that could happen is that the replacement is not done
> > > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > > doesn't pass the builtin_decl_implicit test:
> > > >
> > > >           /* Find the function we want to call.  */
> > > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > > >           if (!decl)
> > > >             return NULL;
> > > >
> > > > btw, it did work for me since the call was present before and gimplification
> > > > should then mark the function eligible for implicit generation.
> > > >
> > > > > As you suggest, should we instead handle this in fre to transform
> > > > > erfc(x) to 1 - erf(x), only when
> > > > > there's a matching erf(x) in the source ?
> > > >
> > > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > > to handle CSE in
> > > >
> > > >   tem = erfc (x);
> > > >   tem2 = erf (x);
> > > >
> > > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > > for having both erfc and erf.
> > > >
> > > > Can you debug why the reverse transform doesn't work for you?
> > > It seems the issue was that erfc wasn't getting marked with const
> > > attribute, and failed the following test in
> > > maybe_push_res_to_seq:
> > >           /* We can't and should not emit calls to non-const functions.  */
> > >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> > >             return NULL;
> > >
> > > Passing -fno-math-errno seems to work for the reverse transform:
> > >
> > > double f(double x)
> > > {
> > >   double g(double, double);
> > >
> > >   double t1 = __builtin_erfc (x);
> > >   return t1;
> > > }
> > >
> > > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> > >
> > > vrp2 dump shows:
> > > Folding statement: _2 = __builtin_erf (x_1(D));
> > > Not folded
> > > Folding statement: t1_3 = 1.0e+0 - _2;
> > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > > Folded into: t1_3 = __builtin_erfc (x_1(D));
> > >
> > > and .optimized dump shows:
> > > double f (double x)
> > > {
> > >   double t1;
> > >
> > >   <bb 2> [local count: 1073741824]:
> > >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> > >   return t1_3;
> > > }
> > >
> > > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > > transform seems to undo the CSE:
> > >
> > > double f(double x)
> > > {
> > >   double g(double, double);
> > >
> > >   double t1 = __builtin_erf (x);
> > >   double t2 = __builtin_erfc (x);
> > >   return g(t1, t2);
> > > }
> > >
> > > gimplification turns erfc to 1 - erf:
> > > Applying pattern match.pd:6158, gimple-match.c:44479
> > > gimple_simplified to D.1988 = __builtin_erf (x);
> > > t2 = 1.0e+0 - D.1988;
> > >
> > >   t1 = __builtin_erf (x);
> > >   D.1988 = __builtin_erf (x);
> > >   t2 = 1.0e+0 - D.1988;
> > >   D.1987 = g (t1, t2);
> > >
> > > fre1 does the CSE:
> > >   t1_2 = __builtin_erf (x_1(D));
> > >   t2_4 = 1.0e+0 - t1_2;
> > >   _7 = g (t1_2, t2_4);
> > >
> > > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> > >
> > >   t1_2 = __builtin_erf (x_1(D));
> > >   t2_3 = __builtin_erfc (x_1(D));
> > >   _6 = g (t1_2, t2_3);
> > >
> > > and .optimized dump shows:
> > >   t1_2 = __builtin_erf (x_1(D));
> > >   t2_3 = __builtin_erfc (x_1(D));
> > >   _6 = g (t1_2, t2_3); [tail call]
> >
> > You probably want an explicit && single_use () check on the
> > 1 - erf() -> erfc transform.
> single_use worked, thanks!
> As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
> don't uncanonicalize back.
> I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
> erf(x)), after which,
> it gets uncanonicalized back to x + erfc(x) in .optimized dump.

I think that's not the way to "fix" this case, instead if we'd care
we should change reassoc to either associate a -erf() operand and
a 1 next to each other or recognize it there.  But I think that
can be done as followup.

> Does the attached patch look OK after bootstrap+test ?

It's OK with removing the extra (x+1) - erf pattern and the
associated testcase.

Richard.

> Thanks,
> Prathamesh
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Btw, please add the testcase from the PR and also a testcase that shows
> > > > > > the canonicalization is undone.  Maybe you can also double-check that
> > > > > > we handle x + erfc (x) because I see we associate that as
> > > > > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > > > > >
> > > > > > The less surprising (as to preserve the function called in the source)
> > > > > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > > > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > > > > >
> > > > > > That said, a mathematician should chime in on how important it is
> > > > > > to preserve erfc vs. erf (precision or even speed).
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > >
> > > > > > > >
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > > So for the following test:
> > > > > > > > > double f(double x)
> > > > > > > > > {
> > > > > > > > >   t1 = __builtin_erfc(x)
> > > > > > > > >   return t1;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > .optimized dump shows:
> > > > > > > > > double f (double x)
> > > > > > > > > {
> > > > > > > > >   double t1;
> > > > > > > > >   double _2;
> > > > > > > > >
> > > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > > >   return t1_3;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > while before patch, it has:
> > > > > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > > > > >   return t1_4;
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

Prathamesh Kulkarni Oct. 22, 2021, 7:47 a.m. UTC | #12

On Wed, 20 Oct 2021 at 18:21, Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > >
> > > > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > {
> > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > >
> > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > >
> > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > {
> > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > >
> > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > >   return t1;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > >
> > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > > > no erf(x) in the source ?
> > > > > > > > >
> > > > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > > > requires -lm).
> > > > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > > > >
> > > > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > erfc(x)
> > > > > > with -O3 -funsafe-math-optimizations.
> > > > > >
> > > > > > For,
> > > > > > t1 = __builtin_erfc(x),
> > > > > >
> > > > > > .optimized dump shows:
> > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > >
> > > > > > and for,
> > > > > > double t1 = x + __builtin_erfc(x);
> > > > > >
> > > > > > .optimized dump shows:
> > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > >   t1_4 = _7 - _3;
> > > > > >
> > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > > > erf(x) to erfc(x)
> > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > > up with 1 - erf(x) in code-gen.
> > > > > >
> > > > > > From gimple-match.c, it hits the simplification:
> > > > > >
> > > > > >                                 gimple_seq *lseq = seq;
> > > > > >                                 if (__builtin_expect (!dbg_cnt
> > > > > > (match), 0)) goto next_after_fail1172;
> > > > > >                                 if (__builtin_expect (dump_file &&
> > > > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > > > >                                 {
> > > > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > > > >                                   res_op->ops[0] = captures[0];
> > > > > >                                   res_op->resimplify (lseq, valueize);
> > > > > >                                   return true;
> > > > > >                                 }
> > > > > >
> > > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > > > >
> > > > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > > > The only thing that could happen is that the replacement is not done
> > > > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > > > doesn't pass the builtin_decl_implicit test:
> > > > >
> > > > >           /* Find the function we want to call.  */
> > > > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > > > >           if (!decl)
> > > > >             return NULL;
> > > > >
> > > > > btw, it did work for me since the call was present before and gimplification
> > > > > should then mark the function eligible for implicit generation.
> > > > >
> > > > > > As you suggest, should we instead handle this in fre to transform
> > > > > > erfc(x) to 1 - erf(x), only when
> > > > > > there's a matching erf(x) in the source ?
> > > > >
> > > > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > > > to handle CSE in
> > > > >
> > > > >   tem = erfc (x);
> > > > >   tem2 = erf (x);
> > > > >
> > > > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > > > for having both erfc and erf.
> > > > >
> > > > > Can you debug why the reverse transform doesn't work for you?
> > > > It seems the issue was that erfc wasn't getting marked with const
> > > > attribute, and failed the following test in
> > > > maybe_push_res_to_seq:
> > > >           /* We can't and should not emit calls to non-const functions.  */
> > > >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> > > >             return NULL;
> > > >
> > > > Passing -fno-math-errno seems to work for the reverse transform:
> > > >
> > > > double f(double x)
> > > > {
> > > >   double g(double, double);
> > > >
> > > >   double t1 = __builtin_erfc (x);
> > > >   return t1;
> > > > }
> > > >
> > > > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> > > >
> > > > vrp2 dump shows:
> > > > Folding statement: _2 = __builtin_erf (x_1(D));
> > > > Not folded
> > > > Folding statement: t1_3 = 1.0e+0 - _2;
> > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > > > Folded into: t1_3 = __builtin_erfc (x_1(D));
> > > >
> > > > and .optimized dump shows:
> > > > double f (double x)
> > > > {
> > > >   double t1;
> > > >
> > > >   <bb 2> [local count: 1073741824]:
> > > >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> > > >   return t1_3;
> > > > }
> > > >
> > > > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > > > transform seems to undo the CSE:
> > > >
> > > > double f(double x)
> > > > {
> > > >   double g(double, double);
> > > >
> > > >   double t1 = __builtin_erf (x);
> > > >   double t2 = __builtin_erfc (x);
> > > >   return g(t1, t2);
> > > > }
> > > >
> > > > gimplification turns erfc to 1 - erf:
> > > > Applying pattern match.pd:6158, gimple-match.c:44479
> > > > gimple_simplified to D.1988 = __builtin_erf (x);
> > > > t2 = 1.0e+0 - D.1988;
> > > >
> > > >   t1 = __builtin_erf (x);
> > > >   D.1988 = __builtin_erf (x);
> > > >   t2 = 1.0e+0 - D.1988;
> > > >   D.1987 = g (t1, t2);
> > > >
> > > > fre1 does the CSE:
> > > >   t1_2 = __builtin_erf (x_1(D));
> > > >   t2_4 = 1.0e+0 - t1_2;
> > > >   _7 = g (t1_2, t2_4);
> > > >
> > > > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> > > >
> > > >   t1_2 = __builtin_erf (x_1(D));
> > > >   t2_3 = __builtin_erfc (x_1(D));
> > > >   _6 = g (t1_2, t2_3);
> > > >
> > > > and .optimized dump shows:
> > > >   t1_2 = __builtin_erf (x_1(D));
> > > >   t2_3 = __builtin_erfc (x_1(D));
> > > >   _6 = g (t1_2, t2_3); [tail call]
> > >
> > > You probably want an explicit && single_use () check on the
> > > 1 - erf() -> erfc transform.
> > single_use worked, thanks!
> > As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
> > don't uncanonicalize back.
> > I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
> > erf(x)), after which,
> > it gets uncanonicalized back to x + erfc(x) in .optimized dump.
>
> I think that's not the way to "fix" this case, instead if we'd care
> we should change reassoc to either associate a -erf() operand and
> a 1 next to each other or recognize it there.  But I think that
> can be done as followup.
>
> > Does the attached patch look OK after bootstrap+test ?
>
> It's OK with removing the extra (x+1) - erf pattern and the
> associated testcase.
Thanks. It regressed builtin-nonneg-1.c because for following case:

void test(double d)
{
  if (signbit (__builtin_erfc (d)))
    link_failure_erfc ();
}

before patch signbit (erfc (d)) was transformed to 0, since erfc
returns a non-negative result.
However after patch, erfc (d), transformed to 1 - erf(d), and in
optimized dump we get:

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_erf (d_2(D));
  if (_3 > 1.0e+0)
    goto <bb 3>; [41.48%]
  else
    goto <bb 4>; [58.52%]

  <bb 3> [local count: 445388112]:
  link_failure_erfc (); [tail call]

  <bb 4> [local count: 1073741824]:
  return;

The attached patch adds transform erf(x) > 1 --> 0 which removes the
redundant test and call to link_failure_erfc.
I assume the transform is OK since erf's range is [-1, 1] ?

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Btw, please add the testcase from the PR and also a testcase that shows
> > > > > > > the canonicalization is undone.  Maybe you can also double-check that
> > > > > > > we handle x + erfc (x) because I see we associate that as
> > > > > > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > > > > > >
> > > > > > > The less surprising (as to preserve the function called in the source)
> > > > > > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > > > > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > > > > > >
> > > > > > > That said, a mathematician should chime in on how important it is
> > > > > > > to preserve erfc vs. erf (precision or even speed).
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > > > So for the following test:
> > > > > > > > > > double f(double x)
> > > > > > > > > > {
> > > > > > > > > >   t1 = __builtin_erfc(x)
> > > > > > > > > >   return t1;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > .optimized dump shows:
> > > > > > > > > > double f (double x)
> > > > > > > > > > {
> > > > > > > > > >   double t1;
> > > > > > > > > >   double _2;
> > > > > > > > > >
> > > > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > > > >   return t1_3;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > while before patch, it has:
> > > > > > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > > > > > >   return t1_4;
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Richard.
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
diff --git a/gcc/match.pd b/gcc/match.pd
index a9791ceb74a..80472631e00 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6147,6 +6147,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (floors tree_expr_nonnegative_p@0)
    (truncs @0))))
 
+/* Simplify,
+   erfc(x) -> 1 - erf(x) if canonicalize_math_p().
+   1 - erf(x) -> erfc(x) if !canonicalize_math_p().  */
+
+(if (flag_unsafe_math_optimizations)
+ (simplify
+  (ERFC @0)
+   (if (canonicalize_math_p ())
+    (minus { build_one_cst (TREE_TYPE (@0)); } (ERF @0))))
+ (simplify
+  (minus real_onep (ERF@1 @0))
+  (if (!canonicalize_math_p () && single_use (@1))
+   (ERFC @0)))
+ /* Simplify erf(x) > 1 to 0.  */
+ (simplify
+  (gt (ERF @0) real_onep)
+  { boolean_false_node; }))
+
 (match double_value_p
  @0
  (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == double_type_node)))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c
new file mode 100644
index 00000000000..a1d456c0119
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -funsafe-math-optimizations -fno-math-errno" } */
+
+double f(double x)
+{
+  double g (double, double);
+  double t1 = __builtin_erf (x);
+  double t2 = __builtin_erfc (x);
+  return g (t1, t2);
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_erfc" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c
new file mode 100644
index 00000000000..a55825bca47
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -funsafe-math-optimizations -fno-math-errno" } */
+
+double f(double x)
+{
+  return __builtin_erfc (x); 
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_erf \\(" "optimized"} } */

Richard Biener Oct. 22, 2021, 9:26 a.m. UTC | #13

On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:

> On Wed, 20 Oct 2021 at 18:21, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > >
> > > > > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > {
> > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > >
> > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > {
> > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > >
> > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > >
> > > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > >
> > > > > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > > > > requires -lm).
> > > > > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > > > > >
> > > > > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > > erfc(x)
> > > > > > > with -O3 -funsafe-math-optimizations.
> > > > > > >
> > > > > > > For,
> > > > > > > t1 = __builtin_erfc(x),
> > > > > > >
> > > > > > > .optimized dump shows:
> > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > >
> > > > > > > and for,
> > > > > > > double t1 = x + __builtin_erfc(x);
> > > > > > >
> > > > > > > .optimized dump shows:
> > > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > > >   t1_4 = _7 - _3;
> > > > > > >
> > > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > > > > erf(x) to erfc(x)
> > > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > > > up with 1 - erf(x) in code-gen.
> > > > > > >
> > > > > > > From gimple-match.c, it hits the simplification:
> > > > > > >
> > > > > > >                                 gimple_seq *lseq = seq;
> > > > > > >                                 if (__builtin_expect (!dbg_cnt
> > > > > > > (match), 0)) goto next_after_fail1172;
> > > > > > >                                 if (__builtin_expect (dump_file &&
> > > > > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > > > > >                                 {
> > > > > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > > > > >                                   res_op->ops[0] = captures[0];
> > > > > > >                                   res_op->resimplify (lseq, valueize);
> > > > > > >                                   return true;
> > > > > > >                                 }
> > > > > > >
> > > > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > > > > >
> > > > > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > > > > The only thing that could happen is that the replacement is not done
> > > > > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > > > > doesn't pass the builtin_decl_implicit test:
> > > > > >
> > > > > >           /* Find the function we want to call.  */
> > > > > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > > > > >           if (!decl)
> > > > > >             return NULL;
> > > > > >
> > > > > > btw, it did work for me since the call was present before and gimplification
> > > > > > should then mark the function eligible for implicit generation.
> > > > > >
> > > > > > > As you suggest, should we instead handle this in fre to transform
> > > > > > > erfc(x) to 1 - erf(x), only when
> > > > > > > there's a matching erf(x) in the source ?
> > > > > >
> > > > > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > > > > to handle CSE in
> > > > > >
> > > > > >   tem = erfc (x);
> > > > > >   tem2 = erf (x);
> > > > > >
> > > > > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > > > > for having both erfc and erf.
> > > > > >
> > > > > > Can you debug why the reverse transform doesn't work for you?
> > > > > It seems the issue was that erfc wasn't getting marked with const
> > > > > attribute, and failed the following test in
> > > > > maybe_push_res_to_seq:
> > > > >           /* We can't and should not emit calls to non-const functions.  */
> > > > >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> > > > >             return NULL;
> > > > >
> > > > > Passing -fno-math-errno seems to work for the reverse transform:
> > > > >
> > > > > double f(double x)
> > > > > {
> > > > >   double g(double, double);
> > > > >
> > > > >   double t1 = __builtin_erfc (x);
> > > > >   return t1;
> > > > > }
> > > > >
> > > > > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> > > > >
> > > > > vrp2 dump shows:
> > > > > Folding statement: _2 = __builtin_erf (x_1(D));
> > > > > Not folded
> > > > > Folding statement: t1_3 = 1.0e+0 - _2;
> > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > > > > Folded into: t1_3 = __builtin_erfc (x_1(D));
> > > > >
> > > > > and .optimized dump shows:
> > > > > double f (double x)
> > > > > {
> > > > >   double t1;
> > > > >
> > > > >   <bb 2> [local count: 1073741824]:
> > > > >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> > > > >   return t1_3;
> > > > > }
> > > > >
> > > > > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > > > > transform seems to undo the CSE:
> > > > >
> > > > > double f(double x)
> > > > > {
> > > > >   double g(double, double);
> > > > >
> > > > >   double t1 = __builtin_erf (x);
> > > > >   double t2 = __builtin_erfc (x);
> > > > >   return g(t1, t2);
> > > > > }
> > > > >
> > > > > gimplification turns erfc to 1 - erf:
> > > > > Applying pattern match.pd:6158, gimple-match.c:44479
> > > > > gimple_simplified to D.1988 = __builtin_erf (x);
> > > > > t2 = 1.0e+0 - D.1988;
> > > > >
> > > > >   t1 = __builtin_erf (x);
> > > > >   D.1988 = __builtin_erf (x);
> > > > >   t2 = 1.0e+0 - D.1988;
> > > > >   D.1987 = g (t1, t2);
> > > > >
> > > > > fre1 does the CSE:
> > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > >   t2_4 = 1.0e+0 - t1_2;
> > > > >   _7 = g (t1_2, t2_4);
> > > > >
> > > > > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> > > > >
> > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > >   _6 = g (t1_2, t2_3);
> > > > >
> > > > > and .optimized dump shows:
> > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > >   _6 = g (t1_2, t2_3); [tail call]
> > > >
> > > > You probably want an explicit && single_use () check on the
> > > > 1 - erf() -> erfc transform.
> > > single_use worked, thanks!
> > > As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
> > > don't uncanonicalize back.
> > > I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
> > > erf(x)), after which,
> > > it gets uncanonicalized back to x + erfc(x) in .optimized dump.
> >
> > I think that's not the way to "fix" this case, instead if we'd care
> > we should change reassoc to either associate a -erf() operand and
> > a 1 next to each other or recognize it there.  But I think that
> > can be done as followup.
> >
> > > Does the attached patch look OK after bootstrap+test ?
> >
> > It's OK with removing the extra (x+1) - erf pattern and the
> > associated testcase.
> Thanks. It regressed builtin-nonneg-1.c because for following case:
> 
> void test(double d)
> {
>   if (signbit (__builtin_erfc (d)))
>     link_failure_erfc ();
> }
> 
> before patch signbit (erfc (d)) was transformed to 0, since erfc
> returns a non-negative result.
> However after patch, erfc (d), transformed to 1 - erf(d), and in
> optimized dump we get:
> 
>   <bb 2> [local count: 1073741824]:
>   _3 = __builtin_erf (d_2(D));
>   if (_3 > 1.0e+0)
>     goto <bb 3>; [41.48%]
>   else
>     goto <bb 4>; [58.52%]
> 
>   <bb 3> [local count: 445388112]:
>   link_failure_erfc (); [tail call]
> 
>   <bb 4> [local count: 1073741824]:
>   return;
> 
> The attached patch adds transform erf(x) > 1 --> 0 which removes the
> redundant test and call to link_failure_erfc.
> I assume the transform is OK since erf's range is [-1, 1] ?

+ /* Simplify erf(x) > 1 to 0.  */
+ (simplify
+  (gt (ERF @0) real_onep)
+  { boolean_false_node; }))

most definitely this is too special - it doesn't work for erf(x) > 1.1
for example.  Ideally we'd have value-ranges on floats and that would
deal with such cases.  Note that erf can return NaN (for NaN input),
in which case gt would raise an exception but unge would not.
So the issue is that folding the gt to false would eventually remove
an exception.  Conditionalizing this on HONOR_NANS (@0) would work
around this.  And of course use sth like

 (gt (ERF @0) REAL_CST@1)
 (if (!HONOR_NANS (@0)
      && real_compare (GE_EXPR, @1, dconst1))
  { boolean_false_node; })

that said, I'm not convinced this is the very best idea - maybe
we should revisit the way do CSE erf and erfc (was there a
real-world usecase for this btw?)

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Btw, please add the testcase from the PR and also a testcase that shows
> > > > > > > > the canonicalization is undone.  Maybe you can also double-check that
> > > > > > > > we handle x + erfc (x) because I see we associate that as
> > > > > > > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > > > > > > >
> > > > > > > > The less surprising (as to preserve the function called in the source)
> > > > > > > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > > > > > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > > > > > > >
> > > > > > > > That said, a mathematician should chime in on how important it is
> > > > > > > > to preserve erfc vs. erf (precision or even speed).
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > > > So for the following test:
> > > > > > > > > > > double f(double x)
> > > > > > > > > > > {
> > > > > > > > > > >   t1 = __builtin_erfc(x)
> > > > > > > > > > >   return t1;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > .optimized dump shows:
> > > > > > > > > > > double f (double x)
> > > > > > > > > > > {
> > > > > > > > > > >   double t1;
> > > > > > > > > > >   double _2;
> > > > > > > > > > >
> > > > > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > > > > >   return t1_3;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > while before patch, it has:
> > > > > > > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > > > > > > >   return t1_4;
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Richard.
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

Prathamesh Kulkarni Oct. 22, 2021, 10:51 a.m. UTC | #14

On Fri, 22 Oct 2021 at 14:56, Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Wed, 20 Oct 2021 at 18:21, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > > >
> > > > > > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > > > > > requires -lm).
> > > > > > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > > > > > >
> > > > > > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > > > erfc(x)
> > > > > > > > with -O3 -funsafe-math-optimizations.
> > > > > > > >
> > > > > > > > For,
> > > > > > > > t1 = __builtin_erfc(x),
> > > > > > > >
> > > > > > > > .optimized dump shows:
> > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > >
> > > > > > > > and for,
> > > > > > > > double t1 = x + __builtin_erfc(x);
> > > > > > > >
> > > > > > > > .optimized dump shows:
> > > > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > > > >   t1_4 = _7 - _3;
> > > > > > > >
> > > > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > > > > > erf(x) to erfc(x)
> > > > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > > > > up with 1 - erf(x) in code-gen.
> > > > > > > >
> > > > > > > > From gimple-match.c, it hits the simplification:
> > > > > > > >
> > > > > > > >                                 gimple_seq *lseq = seq;
> > > > > > > >                                 if (__builtin_expect (!dbg_cnt
> > > > > > > > (match), 0)) goto next_after_fail1172;
> > > > > > > >                                 if (__builtin_expect (dump_file &&
> > > > > > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > > > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > > > > > >                                 {
> > > > > > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > > > > > >                                   res_op->ops[0] = captures[0];
> > > > > > > >                                   res_op->resimplify (lseq, valueize);
> > > > > > > >                                   return true;
> > > > > > > >                                 }
> > > > > > > >
> > > > > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > > > > > >
> > > > > > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > > > > > The only thing that could happen is that the replacement is not done
> > > > > > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > > > > > doesn't pass the builtin_decl_implicit test:
> > > > > > >
> > > > > > >           /* Find the function we want to call.  */
> > > > > > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > > > > > >           if (!decl)
> > > > > > >             return NULL;
> > > > > > >
> > > > > > > btw, it did work for me since the call was present before and gimplification
> > > > > > > should then mark the function eligible for implicit generation.
> > > > > > >
> > > > > > > > As you suggest, should we instead handle this in fre to transform
> > > > > > > > erfc(x) to 1 - erf(x), only when
> > > > > > > > there's a matching erf(x) in the source ?
> > > > > > >
> > > > > > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > > > > > to handle CSE in
> > > > > > >
> > > > > > >   tem = erfc (x);
> > > > > > >   tem2 = erf (x);
> > > > > > >
> > > > > > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > > > > > for having both erfc and erf.
> > > > > > >
> > > > > > > Can you debug why the reverse transform doesn't work for you?
> > > > > > It seems the issue was that erfc wasn't getting marked with const
> > > > > > attribute, and failed the following test in
> > > > > > maybe_push_res_to_seq:
> > > > > >           /* We can't and should not emit calls to non-const functions.  */
> > > > > >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> > > > > >             return NULL;
> > > > > >
> > > > > > Passing -fno-math-errno seems to work for the reverse transform:
> > > > > >
> > > > > > double f(double x)
> > > > > > {
> > > > > >   double g(double, double);
> > > > > >
> > > > > >   double t1 = __builtin_erfc (x);
> > > > > >   return t1;
> > > > > > }
> > > > > >
> > > > > > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> > > > > >
> > > > > > vrp2 dump shows:
> > > > > > Folding statement: _2 = __builtin_erf (x_1(D));
> > > > > > Not folded
> > > > > > Folding statement: t1_3 = 1.0e+0 - _2;
> > > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > > > > > Folded into: t1_3 = __builtin_erfc (x_1(D));
> > > > > >
> > > > > > and .optimized dump shows:
> > > > > > double f (double x)
> > > > > > {
> > > > > >   double t1;
> > > > > >
> > > > > >   <bb 2> [local count: 1073741824]:
> > > > > >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> > > > > >   return t1_3;
> > > > > > }
> > > > > >
> > > > > > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > > > > > transform seems to undo the CSE:
> > > > > >
> > > > > > double f(double x)
> > > > > > {
> > > > > >   double g(double, double);
> > > > > >
> > > > > >   double t1 = __builtin_erf (x);
> > > > > >   double t2 = __builtin_erfc (x);
> > > > > >   return g(t1, t2);
> > > > > > }
> > > > > >
> > > > > > gimplification turns erfc to 1 - erf:
> > > > > > Applying pattern match.pd:6158, gimple-match.c:44479
> > > > > > gimple_simplified to D.1988 = __builtin_erf (x);
> > > > > > t2 = 1.0e+0 - D.1988;
> > > > > >
> > > > > >   t1 = __builtin_erf (x);
> > > > > >   D.1988 = __builtin_erf (x);
> > > > > >   t2 = 1.0e+0 - D.1988;
> > > > > >   D.1987 = g (t1, t2);
> > > > > >
> > > > > > fre1 does the CSE:
> > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > >   t2_4 = 1.0e+0 - t1_2;
> > > > > >   _7 = g (t1_2, t2_4);
> > > > > >
> > > > > > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> > > > > >
> > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > > >   _6 = g (t1_2, t2_3);
> > > > > >
> > > > > > and .optimized dump shows:
> > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > > >   _6 = g (t1_2, t2_3); [tail call]
> > > > >
> > > > > You probably want an explicit && single_use () check on the
> > > > > 1 - erf() -> erfc transform.
> > > > single_use worked, thanks!
> > > > As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
> > > > don't uncanonicalize back.
> > > > I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
> > > > erf(x)), after which,
> > > > it gets uncanonicalized back to x + erfc(x) in .optimized dump.
> > >
> > > I think that's not the way to "fix" this case, instead if we'd care
> > > we should change reassoc to either associate a -erf() operand and
> > > a 1 next to each other or recognize it there.  But I think that
> > > can be done as followup.
> > >
> > > > Does the attached patch look OK after bootstrap+test ?
> > >
> > > It's OK with removing the extra (x+1) - erf pattern and the
> > > associated testcase.
> > Thanks. It regressed builtin-nonneg-1.c because for following case:
> >
> > void test(double d)
> > {
> >   if (signbit (__builtin_erfc (d)))
> >     link_failure_erfc ();
> > }
> >
> > before patch signbit (erfc (d)) was transformed to 0, since erfc
> > returns a non-negative result.
> > However after patch, erfc (d), transformed to 1 - erf(d), and in
> > optimized dump we get:
> >
> >   <bb 2> [local count: 1073741824]:
> >   _3 = __builtin_erf (d_2(D));
> >   if (_3 > 1.0e+0)
> >     goto <bb 3>; [41.48%]
> >   else
> >     goto <bb 4>; [58.52%]
> >
> >   <bb 3> [local count: 445388112]:
> >   link_failure_erfc (); [tail call]
> >
> >   <bb 4> [local count: 1073741824]:
> >   return;
> >
> > The attached patch adds transform erf(x) > 1 --> 0 which removes the
> > redundant test and call to link_failure_erfc.
> > I assume the transform is OK since erf's range is [-1, 1] ?
>
> + /* Simplify erf(x) > 1 to 0.  */
> + (simplify
> +  (gt (ERF @0) real_onep)
> +  { boolean_false_node; }))
>
> most definitely this is too special - it doesn't work for erf(x) > 1.1
> for example.  Ideally we'd have value-ranges on floats and that would
> deal with such cases.  Note that erf can return NaN (for NaN input),
> in which case gt would raise an exception but unge would not.
> So the issue is that folding the gt to false would eventually remove
> an exception.  Conditionalizing this on HONOR_NANS (@0) would work
> around this.  And of course use sth like
>
>  (gt (ERF @0) REAL_CST@1)
>  (if (!HONOR_NANS (@0)
>       && real_compare (GE_EXPR, @1, dconst1))
>   { boolean_false_node; })
>
> that said, I'm not convinced this is the very best idea - maybe
> we should revisit the way do CSE erf and erfc (was there a
> real-world usecase for this btw?)
Hmm indeed you're right, it's rather specialised to catch erf > 0
condition in patch.
I initially thought of adding this to
fold_using_range::range_of_builtin_call, but since
vrp doesn't propagate floats yet, I added it as a transform.
No I didn't have any particular real world use case for the transform.
How do you suggest to proceed ?

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > Btw, please add the testcase from the PR and also a testcase that shows
> > > > > > > > > the canonicalization is undone.  Maybe you can also double-check that
> > > > > > > > > we handle x + erfc (x) because I see we associate that as
> > > > > > > > > (x + 1) - erf (x) which is then not recognized back to erfc.
> > > > > > > > >
> > > > > > > > > The less surprising (as to preserve the function called in the source)
> > > > > > > > > variant for the PR would be to teach CSE to lookup erf(x) when
> > > > > > > > > visiting erfc(x) and when found synthesize 1 - erf(x).
> > > > > > > > >
> > > > > > > > > That said, a mathematician should chime in on how important it is
> > > > > > > > > to preserve erfc vs. erf (precision or even speed).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Richard.
> > > > > > > > > > >
> > > > > > > > > > > > So for the following test:
> > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > {
> > > > > > > > > > > >   t1 = __builtin_erfc(x)
> > > > > > > > > > > >   return t1;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > .optimized dump shows:
> > > > > > > > > > > > double f (double x)
> > > > > > > > > > > > {
> > > > > > > > > > > >   double t1;
> > > > > > > > > > > >   double _2;
> > > > > > > > > > > >
> > > > > > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > > > > > >   return t1_3;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > while before patch, it has:
> > > > > > > > > > > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > > > > > > > > > > >   return t1_4;
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Richard.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Richard Biener Oct. 25, 2021, 9:48 a.m. UTC | #15

On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:

> On Fri, 22 Oct 2021 at 14:56, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Wed, 20 Oct 2021 at 18:21, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > > > >
> > > > > > > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > > > > > > requires -lm).
> > > > > > > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > > > > > > >
> > > > > > > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > > > > erfc(x)
> > > > > > > > > with -O3 -funsafe-math-optimizations.
> > > > > > > > >
> > > > > > > > > For,
> > > > > > > > > t1 = __builtin_erfc(x),
> > > > > > > > >
> > > > > > > > > .optimized dump shows:
> > > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > > >
> > > > > > > > > and for,
> > > > > > > > > double t1 = x + __builtin_erfc(x);
> > > > > > > > >
> > > > > > > > > .optimized dump shows:
> > > > > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > > > > >   t1_4 = _7 - _3;
> > > > > > > > >
> > > > > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > > > > > > erf(x) to erfc(x)
> > > > > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > > > > > up with 1 - erf(x) in code-gen.
> > > > > > > > >
> > > > > > > > > From gimple-match.c, it hits the simplification:
> > > > > > > > >
> > > > > > > > >                                 gimple_seq *lseq = seq;
> > > > > > > > >                                 if (__builtin_expect (!dbg_cnt
> > > > > > > > > (match), 0)) goto next_after_fail1172;
> > > > > > > > >                                 if (__builtin_expect (dump_file &&
> > > > > > > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > > > > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > > > > > > >                                 {
> > > > > > > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > > > > > > >                                   res_op->ops[0] = captures[0];
> > > > > > > > >                                   res_op->resimplify (lseq, valueize);
> > > > > > > > >                                   return true;
> > > > > > > > >                                 }
> > > > > > > > >
> > > > > > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > > > > > > >
> > > > > > > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > > > > > > The only thing that could happen is that the replacement is not done
> > > > > > > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > > > > > > doesn't pass the builtin_decl_implicit test:
> > > > > > > >
> > > > > > > >           /* Find the function we want to call.  */
> > > > > > > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > > > > > > >           if (!decl)
> > > > > > > >             return NULL;
> > > > > > > >
> > > > > > > > btw, it did work for me since the call was present before and gimplification
> > > > > > > > should then mark the function eligible for implicit generation.
> > > > > > > >
> > > > > > > > > As you suggest, should we instead handle this in fre to transform
> > > > > > > > > erfc(x) to 1 - erf(x), only when
> > > > > > > > > there's a matching erf(x) in the source ?
> > > > > > > >
> > > > > > > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > > > > > > to handle CSE in
> > > > > > > >
> > > > > > > >   tem = erfc (x);
> > > > > > > >   tem2 = erf (x);
> > > > > > > >
> > > > > > > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > > > > > > for having both erfc and erf.
> > > > > > > >
> > > > > > > > Can you debug why the reverse transform doesn't work for you?
> > > > > > > It seems the issue was that erfc wasn't getting marked with const
> > > > > > > attribute, and failed the following test in
> > > > > > > maybe_push_res_to_seq:
> > > > > > >           /* We can't and should not emit calls to non-const functions.  */
> > > > > > >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> > > > > > >             return NULL;
> > > > > > >
> > > > > > > Passing -fno-math-errno seems to work for the reverse transform:
> > > > > > >
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > >   return t1;
> > > > > > > }
> > > > > > >
> > > > > > > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> > > > > > >
> > > > > > > vrp2 dump shows:
> > > > > > > Folding statement: _2 = __builtin_erf (x_1(D));
> > > > > > > Not folded
> > > > > > > Folding statement: t1_3 = 1.0e+0 - _2;
> > > > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > > > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > > > > > > Folded into: t1_3 = __builtin_erfc (x_1(D));
> > > > > > >
> > > > > > > and .optimized dump shows:
> > > > > > > double f (double x)
> > > > > > > {
> > > > > > >   double t1;
> > > > > > >
> > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> > > > > > >   return t1_3;
> > > > > > > }
> > > > > > >
> > > > > > > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > > > > > > transform seems to undo the CSE:
> > > > > > >
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erf (x);
> > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > >   return g(t1, t2);
> > > > > > > }
> > > > > > >
> > > > > > > gimplification turns erfc to 1 - erf:
> > > > > > > Applying pattern match.pd:6158, gimple-match.c:44479
> > > > > > > gimple_simplified to D.1988 = __builtin_erf (x);
> > > > > > > t2 = 1.0e+0 - D.1988;
> > > > > > >
> > > > > > >   t1 = __builtin_erf (x);
> > > > > > >   D.1988 = __builtin_erf (x);
> > > > > > >   t2 = 1.0e+0 - D.1988;
> > > > > > >   D.1987 = g (t1, t2);
> > > > > > >
> > > > > > > fre1 does the CSE:
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_4 = 1.0e+0 - t1_2;
> > > > > > >   _7 = g (t1_2, t2_4);
> > > > > > >
> > > > > > > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > > > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > > > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> > > > > > >
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > > > >   _6 = g (t1_2, t2_3);
> > > > > > >
> > > > > > > and .optimized dump shows:
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > > > >   _6 = g (t1_2, t2_3); [tail call]
> > > > > >
> > > > > > You probably want an explicit && single_use () check on the
> > > > > > 1 - erf() -> erfc transform.
> > > > > single_use worked, thanks!
> > > > > As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
> > > > > don't uncanonicalize back.
> > > > > I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
> > > > > erf(x)), after which,
> > > > > it gets uncanonicalized back to x + erfc(x) in .optimized dump.
> > > >
> > > > I think that's not the way to "fix" this case, instead if we'd care
> > > > we should change reassoc to either associate a -erf() operand and
> > > > a 1 next to each other or recognize it there.  But I think that
> > > > can be done as followup.
> > > >
> > > > > Does the attached patch look OK after bootstrap+test ?
> > > >
> > > > It's OK with removing the extra (x+1) - erf pattern and the
> > > > associated testcase.
> > > Thanks. It regressed builtin-nonneg-1.c because for following case:
> > >
> > > void test(double d)
> > > {
> > >   if (signbit (__builtin_erfc (d)))
> > >     link_failure_erfc ();
> > > }
> > >
> > > before patch signbit (erfc (d)) was transformed to 0, since erfc
> > > returns a non-negative result.
> > > However after patch, erfc (d), transformed to 1 - erf(d), and in
> > > optimized dump we get:
> > >
> > >   <bb 2> [local count: 1073741824]:
> > >   _3 = __builtin_erf (d_2(D));
> > >   if (_3 > 1.0e+0)
> > >     goto <bb 3>; [41.48%]
> > >   else
> > >     goto <bb 4>; [58.52%]
> > >
> > >   <bb 3> [local count: 445388112]:
> > >   link_failure_erfc (); [tail call]
> > >
> > >   <bb 4> [local count: 1073741824]:
> > >   return;
> > >
> > > The attached patch adds transform erf(x) > 1 --> 0 which removes the
> > > redundant test and call to link_failure_erfc.
> > > I assume the transform is OK since erf's range is [-1, 1] ?
> >
> > + /* Simplify erf(x) > 1 to 0.  */
> > + (simplify
> > +  (gt (ERF @0) real_onep)
> > +  { boolean_false_node; }))
> >
> > most definitely this is too special - it doesn't work for erf(x) > 1.1
> > for example.  Ideally we'd have value-ranges on floats and that would
> > deal with such cases.  Note that erf can return NaN (for NaN input),
> > in which case gt would raise an exception but unge would not.
> > So the issue is that folding the gt to false would eventually remove
> > an exception.  Conditionalizing this on HONOR_NANS (@0) would work
> > around this.  And of course use sth like
> >
> >  (gt (ERF @0) REAL_CST@1)
> >  (if (!HONOR_NANS (@0)
> >       && real_compare (GE_EXPR, @1, dconst1))
> >   { boolean_false_node; })
> >
> > that said, I'm not convinced this is the very best idea - maybe
> > we should revisit the way do CSE erf and erfc (was there a
> > real-world usecase for this btw?)
> Hmm indeed you're right, it's rather specialised to catch erf > 0
> condition in patch.
> I initially thought of adding this to
> fold_using_range::range_of_builtin_call, but since
> vrp doesn't propagate floats yet, I added it as a transform.
> No I didn't have any particular real world use case for the transform.
> How do you suggest to proceed ?

Put it on hold, I don't see a nice place to do this that wouldn't
be a very special case for the non-real-world testcase.  The VRP
folks say they'll work on float stuff for GCC 13 so maybe we can
revisit then.

Richard.

diff mbox series

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index a9791ceb74a..217e46ff12c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6147,6 +6147,20 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (floors tree_expr_nonnegative_p@0)
    (truncs @0))))
 
+/* Simplify,
+   erfc(x) -> 1 - erf(x) if canonicalize_math_p().
+   1 - erf(x) -> erfc(x) if !canonicalize_math_p().  */
+
+(if (flag_unsafe_math_optimizations)
+ (simplify
+  (ERFC @0)
+   (if (canonicalize_math_p ())
+    (minus { build_one_cst (TREE_TYPE (@0)); } (ERF @0))))
+ (simplify
+  (minus real_onep (ERF @0))
+  (if (!canonicalize_math_p ())
+   (ERFC @0))))
+
 (match double_value_p
  @0
  (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == double_type_node)))