middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

Message ID patch-15778-tamar@arm.com
State Committed
Headers
Series middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart. |

Commit Message

Tamar Christina June 9, 2022, 7:52 a.m. UTC
  Hi All,

When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One for the
lowpart and one for the highpart.

The problem with this is that in RTL the lvalue of the RTX is the only thing
tying the two instructions together.

This means that e.g. combine is unable to try to combine the two instructions
for setting the lowpart and highpart.

For ISAs that have bit extract instructions we can eliminate one of the extracts
if, and only if we're setting the entire complex number.

This change changes the expand code when we're setting the entire complex number
to generate a subreg for the lowpart instead of a vec_extract.

This allows us to optimize sequences such as:

_Complex int f(int a, int b) {
    _Complex int t = a + b * 1i;
    return t;
}

from:

f:
	bfi     x2, x0, 0, 32
	bfi     x2, x1, 32, 32
	mov     x0, x2
	ret

into:

f:
	bfi	x0, x1, 32, 32
	ret

I have also confirmed the codegen for x86_64 did not change.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
	* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/complex-init.C: New test.

--- inline copy of patch -- 
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644




--
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
 	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
     ;
   /* Subregs involving floating point modes are not allowed to
-     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
+     change size unless it's an insert into a complex mode.
+     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+	   && !COMPLEX_MODE_P (omode))
     {
       if (! (known_eq (isize, osize)
 	     /* LRA can use subreg to store a floating point value in
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..fce672c236fdbc4d40adb6e2614c234c02a61933 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3740,7 +3740,17 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false);
+  /* If we're writing the entire value using a concat into a register
+     then emit the lower part as a simple mov followed by an insert
+     into the top part.  */
+  if (GET_CODE (y) == CONCAT && !reload_completed && REG_P (x))
+    {
+      rtx val = XEXP (y, false);
+      rtx dest = lowpart_subreg (GET_MODE (val), x, GET_MODE (x));
+      emit_move_insn (dest, val);
+    }
+  else
+    write_complex_part (x, read_complex_part (y, false), false);
   write_complex_part (x, read_complex_part (y, true), true);
 
   return get_last_insn ();
diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
new file mode 100644
index 0000000000000000000000000000000000000000..497cc4bca3e2c59da95c871ceb5cc96216fc302d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** _Z1fii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f(int a, int b) {
+    _Complex int t = a + b * 1i;
+    return t;
+}
+
+/*
+** _Z2f2ii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f2(int a, int b) {
+    _Complex int t = {a, b};
+    return t;
+}
+
+/*
+** _Z12f_convolutedii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f_convoluted(int a, int b) {
+    _Complex int t = (_Complex int)a;
+    __imag__ t = b;
+    return t;
+}
  

Comments

Jeff Law June 12, 2022, 5:27 p.m. UTC | #1
On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:
> Hi All,
>
> When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One for the
> lowpart and one for the highpart.
>
> The problem with this is that in RTL the lvalue of the RTX is the only thing
> tying the two instructions together.
>
> This means that e.g. combine is unable to try to combine the two instructions
> for setting the lowpart and highpart.
>
> For ISAs that have bit extract instructions we can eliminate one of the extracts
> if, and only if we're setting the entire complex number.
>
> This change changes the expand code when we're setting the entire complex number
> to generate a subreg for the lowpart instead of a vec_extract.
>
> This allows us to optimize sequences such as:
Just a note.  I regularly see subregs significantly interfere with 
optimization, particularly register allocation.  So be aware that 
subregs can often get in the way of generating good code.  When changing 
something to use subregs I like to run real benchmarks rather than 
working with code snippets.


>
> _Complex int f(int a, int b) {
>      _Complex int t = a + b * 1i;
>      return t;
> }
>
> from:
>
> f:
> 	bfi     x2, x0, 0, 32
> 	bfi     x2, x1, 32, 32
> 	mov     x0, x2
> 	ret
>
> into:
>
> f:
> 	bfi	x0, x1, 32, 32
> 	ret
>
> I have also confirmed the codegen for x86_64 did not change.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
> 	* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/aarch64/complex-init.C: New test.
OK.

On a related topic, any thoughts on keeping complex objects as complex 
types/modes through gimple and into at least parts of the RTL pipeline?

The way complex arithmetic instructions work on our chip is going to be 
extremely tough to utilize in GCC -- we really need to the complex 
types/arithmetic up through RTL generation at the least. Ideally we'd 
even expose complex modes all the way to final.    Is that something 
y'all could benefit from as well?  Have y'all poked at this problem at all?

jeff
  
Richard Sandiford June 13, 2022, 8:40 a.m. UTC | #2
Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One for the
> lowpart and one for the highpart.
>
> The problem with this is that in RTL the lvalue of the RTX is the only thing
> tying the two instructions together.
>
> This means that e.g. combine is unable to try to combine the two instructions
> for setting the lowpart and highpart.
>
> For ISAs that have bit extract instructions we can eliminate one of the extracts
> if, and only if we're setting the entire complex number.
>
> This change changes the expand code when we're setting the entire complex number
> to generate a subreg for the lowpart instead of a vec_extract.
>
> This allows us to optimize sequences such as:
>
> _Complex int f(int a, int b) {
>     _Complex int t = a + b * 1i;
>     return t;
> }
>
> from:
>
> f:
> 	bfi     x2, x0, 0, 32
> 	bfi     x2, x1, 32, 32
> 	mov     x0, x2
> 	ret
>
> into:
>
> f:
> 	bfi	x0, x1, 32, 32
> 	ret
>
> I have also confirmed the codegen for x86_64 did not change.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?

I'm not sure this is endian-safe.  For big-endian it's the imaginary
part that can be written as a subreg.  The real part might be the
high part of a register.

Maybe a more general way to handle this would be to add (yet another)
parameter to store_bit_field that indicates that the current value
of the structure is undefined.  That would also be useful in at
least one other caller (from calls.cc).  write_complex_part could
then have a similar parameter, true for the first write and false
for the second.

Thanks,
Richard

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
> 	* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/aarch64/complex-init.C: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>      ;
>    /* Subregs involving floating point modes are not allowed to
> -     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> +     change size unless it's an insert into a complex mode.
> +     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
>       (subreg:SI (reg:DF) 0) isn't.  */
> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +	   && !COMPLEX_MODE_P (omode))
>      {
>        if (! (known_eq (isize, osize)
>  	     /* LRA can use subreg to store a floating point value in
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..fce672c236fdbc4d40adb6e2614c234c02a61933 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -3740,7 +3740,17 @@ emit_move_complex_parts (rtx x, rtx y)
>        && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>      emit_clobber (x);
>  
> -  write_complex_part (x, read_complex_part (y, false), false);
> +  /* If we're writing the entire value using a concat into a register
> +     then emit the lower part as a simple mov followed by an insert
> +     into the top part.  */
> +  if (GET_CODE (y) == CONCAT && !reload_completed && REG_P (x))
> +    {
> +      rtx val = XEXP (y, false);
> +      rtx dest = lowpart_subreg (GET_MODE (val), x, GET_MODE (x));
> +      emit_move_insn (dest, val);
> +    }
> +  else
> +    write_complex_part (x, read_complex_part (y, false), false);
>    write_complex_part (x, read_complex_part (y, true), true);
>  
>    return get_last_insn ();
> diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
> new file mode 100644
> index 0000000000000000000000000000000000000000..497cc4bca3e2c59da95c871ceb5cc96216fc302d
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** _Z1fii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f(int a, int b) {
> +    _Complex int t = a + b * 1i;
> +    return t;
> +}
> +
> +/*
> +** _Z2f2ii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f2(int a, int b) {
> +    _Complex int t = {a, b};
> +    return t;
> +}
> +
> +/*
> +** _Z12f_convolutedii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f_convoluted(int a, int b) {
> +    _Complex int t = (_Complex int)a;
> +    __imag__ t = b;
> +    return t;
> +}
  
Tamar Christina June 13, 2022, 10:19 a.m. UTC | #3
> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Jeff Law via
> Gcc-patches
> Sent: Sunday, June 12, 2022 6:27 PM
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
> set the lowpart.
> 
> 
> 
> On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:
> > Hi All,
> >
> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.
> One
> > for the lowpart and one for the highpart.
> >
> > The problem with this is that in RTL the lvalue of the RTX is the only
> > thing tying the two instructions together.
> >
> > This means that e.g. combine is unable to try to combine the two
> > instructions for setting the lowpart and highpart.
> >
> > For ISAs that have bit extract instructions we can eliminate one of
> > the extracts if, and only if we're setting the entire complex number.
> >
> > This change changes the expand code when we're setting the entire
> > complex number to generate a subreg for the lowpart instead of a
> vec_extract.
> >
> > This allows us to optimize sequences such as:
> Just a note.  I regularly see subregs significantly interfere with optimization,
> particularly register allocation.  So be aware that subregs can often get in the
> way of generating good code.  When changing something to use subregs I
> like to run real benchmarks rather than working with code snippets.
> 
> 
> >
> > _Complex int f(int a, int b) {
> >      _Complex int t = a + b * 1i;
> >      return t;
> > }
> >
> > from:
> >
> > f:
> > 	bfi     x2, x0, 0, 32
> > 	bfi     x2, x1, 32, 32
> > 	mov     x0, x2
> > 	ret
> >
> > into:
> >
> > f:
> > 	bfi	x0, x1, 32, 32
> > 	ret
> >
> > I have also confirmed the codegen for x86_64 did not change.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
> > 	* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if
> possible.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* g++.target/aarch64/complex-init.C: New test.
> OK.
> 
> On a related topic, any thoughts on keeping complex objects as complex
> types/modes through gimple and into at least parts of the RTL pipeline?
> 
> The way complex arithmetic instructions work on our chip is going to be
> extremely tough to utilize in GCC -- we really need to the complex
> types/arithmetic up through RTL generation at the least. Ideally we'd even
> expose complex modes all the way to final.    Is that something y'all could
> benefit from as well?  Have y'all poked at this problem at all?

Not extensively, but right now the big advantage of lowering them early is for
auto-vec.   Lowering them early allows you to e.g. realize that you only need the
imaginary part of the number etc.  For auto-vec it also means we treat them as
just any other loads/stores.

I think LLVM keeps them as complex expressions much longer and they've been
having a harder time implementing some of the complex arith stuff we did in GCC 11.

Regards,
Tamar

> 
> jeff
  
Richard Biener June 13, 2022, 11:54 a.m. UTC | #4
On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
[...]
> On a related topic, any thoughts on keeping complex objects as complex
> types/modes through gimple and into at least parts of the RTL pipeline?
>
> The way complex arithmetic instructions work on our chip is going to be
> extremely tough to utilize in GCC -- we really need to the complex
> types/arithmetic up through RTL generation at the least. Ideally we'd
> even expose complex modes all the way to final.    Is that something
> y'all could benefit from as well?  Have y'all poked at this problem at all?

Since you are going to need to "recover" complex operations from people
open-coding them (both fortran and C and also C++ with std::complex) it
should be less work to just do that ;)  I think that complex modes and types
exist solely for ABI purposes.

Richard.

> jeff
>
  
Jeff Law June 13, 2022, 5:34 p.m. UTC | #5
On 6/13/2022 5:54 AM, Richard Biener wrote:
> On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> [...]
>> On a related topic, any thoughts on keeping complex objects as complex
>> types/modes through gimple and into at least parts of the RTL pipeline?
>>
>> The way complex arithmetic instructions work on our chip is going to be
>> extremely tough to utilize in GCC -- we really need to the complex
>> types/arithmetic up through RTL generation at the least. Ideally we'd
>> even expose complex modes all the way to final.    Is that something
>> y'all could benefit from as well?  Have y'all poked at this problem at all?
> Since you are going to need to "recover" complex operations from people
> open-coding them (both fortran and C and also C++ with std::complex) it
> should be less work to just do that ;)  I think that complex modes and types
> exist solely for ABI purposes.
I don't see any reasonable way to do that.  Without going into all the 
details, our complex ops work on low elements within a vector 
register.   Trying to recover them after gimple->rtl expansion would be 
similar to trying to SLP vectorize on RTL, something I'm not keen to chase.

It would be a hell of a lot easier to leave the complex ops as complex 
ops to the expanders, then make the decision to use the complex 
instructions or decompose into components.

jeff
  
Jeff Law June 13, 2022, 5:42 p.m. UTC | #6
On 6/13/2022 4:19 AM, Tamar Christina wrote:
>> -----Original Message-----
>> From: Gcc-patches <gcc-patches-
>> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Jeff Law via
>> Gcc-patches
>> Sent: Sunday, June 12, 2022 6:27 PM
>> To: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
>> set the lowpart.
>>
>>
>>
>> On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:
>>> Hi All,
>>>
>>> When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.
>> One
>>> for the lowpart and one for the highpart.
>>>
>>> The problem with this is that in RTL the lvalue of the RTX is the only
>>> thing tying the two instructions together.
>>>
>>> This means that e.g. combine is unable to try to combine the two
>>> instructions for setting the lowpart and highpart.
>>>
>>> For ISAs that have bit extract instructions we can eliminate one of
>>> the extracts if, and only if we're setting the entire complex number.
>>>
>>> This change changes the expand code when we're setting the entire
>>> complex number to generate a subreg for the lowpart instead of a
>> vec_extract.
>>> This allows us to optimize sequences such as:
>> Just a note.  I regularly see subregs significantly interfere with optimization,
>> particularly register allocation.  So be aware that subregs can often get in the
>> way of generating good code.  When changing something to use subregs I
>> like to run real benchmarks rather than working with code snippets.
>>
>>
>>> _Complex int f(int a, int b) {
>>>       _Complex int t = a + b * 1i;
>>>       return t;
>>> }
>>>
>>> from:
>>>
>>> f:
>>> 	bfi     x2, x0, 0, 32
>>> 	bfi     x2, x1, 32, 32
>>> 	mov     x0, x2
>>> 	ret
>>>
>>> into:
>>>
>>> f:
>>> 	bfi	x0, x1, 32, 32
>>> 	ret
>>>
>>> I have also confirmed the codegen for x86_64 did not change.
>>>
>>> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>>> and no issues.
>>>
>>> Ok for master?
>>>
>>> Thanks,
>>> Tamar
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
>>> 	* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if
>> possible.
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* g++.target/aarch64/complex-init.C: New test.
>> OK.
>>
>> On a related topic, any thoughts on keeping complex objects as complex
>> types/modes through gimple and into at least parts of the RTL pipeline?
>>
>> The way complex arithmetic instructions work on our chip is going to be
>> extremely tough to utilize in GCC -- we really need to the complex
>> types/arithmetic up through RTL generation at the least. Ideally we'd even
>> expose complex modes all the way to final.    Is that something y'all could
>> benefit from as well?  Have y'all poked at this problem at all?
> Not extensively, but right now the big advantage of lowering them early is for
> auto-vec.   Lowering them early allows you to e.g. realize that you only need the
> imaginary part of the number etc.  For
In the case where you only operate on part of the object, then 
decomposing early and using standard vops and fops on the components 
absolutely makes sense.  That we already do.

The case I want to handle is when we're operating on both the complex 
and imaginary parts at the same time.  I've got instructions to do that, 
but where they find their operands is, umm, inconvenient -- a complex 
add ends up looking like a 2-element vector add.


> auto-vec it also means we treat them as
> just any other loads/stores.
>
> I think LLVM keeps them as complex expressions much longer and they've been
> having a harder time implementing some of the complex arith stuff we did in GCC 11.
Good to know.  THanks.

jeff
  
Richard Sandiford June 15, 2022, 11:36 a.m. UTC | #7
Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 6/13/2022 5:54 AM, Richard Biener wrote:
>> On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>> [...]
>>> On a related topic, any thoughts on keeping complex objects as complex
>>> types/modes through gimple and into at least parts of the RTL pipeline?
>>>
>>> The way complex arithmetic instructions work on our chip is going to be
>>> extremely tough to utilize in GCC -- we really need to the complex
>>> types/arithmetic up through RTL generation at the least. Ideally we'd
>>> even expose complex modes all the way to final.    Is that something
>>> y'all could benefit from as well?  Have y'all poked at this problem at all?
>> Since you are going to need to "recover" complex operations from people
>> open-coding them (both fortran and C and also C++ with std::complex) it
>> should be less work to just do that ;)  I think that complex modes and types
>> exist solely for ABI purposes.
> I don't see any reasonable way to do that.  Without going into all the 
> details, our complex ops work on low elements within a vector 
> register.   Trying to recover them after gimple->rtl expansion would be 
> similar to trying to SLP vectorize on RTL, something I'm not keen to chase.
>
> It would be a hell of a lot easier to leave the complex ops as complex 
> ops to the expanders, then make the decision to use the complex 
> instructions or decompose into components.

Realise you might not be in a position to answer this for confidentiality
reasons, but: would normal tree SLP not help here?  We already try to
recognise complex add & multiply, and in principle we could do the same
for other operations as well.  It shouldn't matter that a vector
multiply on 2 elements is really just a single-data operation.

Thanks,
Richard
  
Tamar Christina June 16, 2022, 11:22 a.m. UTC | #8
> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Richard
> Sandiford via Gcc-patches
> Sent: Wednesday, June 15, 2022 12:36 PM
> To: Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
> set the lowpart.
> 
> Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On 6/13/2022 5:54 AM, Richard Biener wrote:
> >> On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >> [...]
> >>> On a related topic, any thoughts on keeping complex objects as
> >>> complex types/modes through gimple and into at least parts of the RTL
> pipeline?
> >>>
> >>> The way complex arithmetic instructions work on our chip is going to
> >>> be extremely tough to utilize in GCC -- we really need to the
> >>> complex types/arithmetic up through RTL generation at the least. Ideally
> we'd
> >>> even expose complex modes all the way to final.    Is that something
> >>> y'all could benefit from as well?  Have y'all poked at this problem at all?
> >> Since you are going to need to "recover" complex operations from
> >> people open-coding them (both fortran and C and also C++ with
> >> std::complex) it should be less work to just do that ;)  I think that
> >> complex modes and types exist solely for ABI purposes.
> > I don't see any reasonable way to do that.  Without going into all the
> > details, our complex ops work on low elements within a vector
> > register.   Trying to recover them after gimple->rtl expansion would
> > be similar to trying to SLP vectorize on RTL, something I'm not keen to
> chase.
> >
> > It would be a hell of a lot easier to leave the complex ops as complex
> > ops to the expanders, then make the decision to use the complex
> > instructions or decompose into components.
> 
> Realise you might not be in a position to answer this for confidentiality
> reasons, but: would normal tree SLP not help here?  We already try to
> recognise complex add & multiply, and in principle we could do the same for
> other operations as well.  It shouldn't matter that a vector multiply on 2
> elements is really just a single-data operation.

I guess if we're talking about scalar we currently don't recognize TWO_OPERANDS
as seeds to SLP. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485  I did briefly
consider this but the problem is that if we do then we'd have to construct the scalar
complex value on the SIMD side, and that may negate any gains if the op being done is
simple. i.e. for a simple multiply the scalar code would win.  But I suppose that's
something the cost model should be able to decide.

Cheers,
Tamar

> 
> Thanks,
> Richard
  
Tamar Christina June 16, 2022, 11:28 a.m. UTC | #9
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, June 13, 2022 9:41 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
> set the lowpart.
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.
> One
> > for the lowpart and one for the highpart.
> >
> > The problem with this is that in RTL the lvalue of the RTX is the only
> > thing tying the two instructions together.
> >
> > This means that e.g. combine is unable to try to combine the two
> > instructions for setting the lowpart and highpart.
> >
> > For ISAs that have bit extract instructions we can eliminate one of
> > the extracts if, and only if we're setting the entire complex number.
> >
> > This change changes the expand code when we're setting the entire
> > complex number to generate a subreg for the lowpart instead of a
> vec_extract.
> >
> > This allows us to optimize sequences such as:
> >
> > _Complex int f(int a, int b) {
> >     _Complex int t = a + b * 1i;
> >     return t;
> > }
> >
> > from:
> >
> > f:
> > 	bfi     x2, x0, 0, 32
> > 	bfi     x2, x1, 32, 32
> > 	mov     x0, x2
> > 	ret
> >
> > into:
> >
> > f:
> > 	bfi	x0, x1, 32, 32
> > 	ret
> >
> > I have also confirmed the codegen for x86_64 did not change.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> 
> I'm not sure this is endian-safe.  For big-endian it's the imaginary part that can
> be written as a subreg.  The real part might be the high part of a register.
> 
> Maybe a more general way to handle this would be to add (yet another)
> parameter to store_bit_field that indicates that the current value of the
> structure is undefined.  That would also be useful in at least one other caller
> (from calls.cc).  write_complex_part could then have a similar parameter,
> true for the first write and false for the second.

Ohayou-gozaimasu!

I've rewritten it using the approach you requested. I attempted to set the flag
In the correct places as well.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* expmed.cc (store_bit_field): Add parameter that indicates if value is
	still undefined and if so emit a subreg move instead.
	* expr.h (write_complex_part): Likewise.
	* expmed.h (store_bit_field): Add new parameter.
	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
	parameter.
	(expand_ifn_atomic_compare_exchange): Likewise.
	* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
	* emit-rtl.cc (validate_subreg): Likewise.
	* expr.cc (emit_group_store): Likewise.
	(copy_blkmode_from_reg): Likewise.
	(copy_blkmode_to_reg): Likewise.
	(clear_storage_hints): Likewise.
	(write_complex_part):  Likewise.
	(emit_move_complex_parts): Likewise.
	(expand_assignment): Likewise.
	(store_expr): Likewise.
	(store_field): Likewise.
	(expand_expr_real_2): Likewise.
	* ifcvt.cc (noce_emit_move_insn): Likewise.
	* internal-fn.cc (expand_arith_set_overflow): Likewise.
	(expand_arith_overflow_result_store): Likewise.
	(expand_addsub_overflow): Likewise.
	(expand_neg_overflow): Likewise.
	(expand_mul_overflow): Likewise.
	(expand_arith_overflow): Likewise.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/complex-init.C: New test.

--- inline copy of patch ---

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4c6c29390531d8ae9765add598621727213b23ec..8c80e46d9c9c9c2a7e1ce0f8add86729fd542b16 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6014,8 +6014,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
       x = force_reg (mode, x);
-      write_complex_part (target, boolret, true);
-      write_complex_part (target, x, false);
+      write_complex_part (target, boolret, true, true);
+      write_complex_part (target, x, false, false);
     }
 }
 
@@ -6070,8 +6070,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
       rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-      write_complex_part (target, boolret, true);
-      write_complex_part (target, oldval, false);
+      write_complex_part (target, boolret, true, true);
+      write_complex_part (target, oldval, false, false);
     }
 }
 
diff --git a/gcc/calls.cc b/gcc/calls.cc
index e13469cfd43b5bdd4ca0d2b8458a9e4f996e36e9..586af170879ab0152e9c0634a4b8e0ce03ea8d6e 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1214,7 +1214,7 @@ store_unaligned_arguments_into_pseudos (struct arg_data *args, int num_actuals)
 
 	    bytes -= bitsize / BITS_PER_UNIT;
 	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
-			     word_mode, word, false);
+			     word_mode, word, false, false);
 	  }
       }
 }
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
 	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
     ;
   /* Subregs involving floating point modes are not allowed to
-     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
+     change size unless it's an insert into a complex mode.
+     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+	   && !COMPLEX_MODE_P (omode))
     {
       if (! (known_eq (isize, osize)
 	     /* LRA can use subreg to store a floating point value in
diff --git a/gcc/expmed.h b/gcc/expmed.h
index ee1ddc82b601ce02957c493dad0d70eee2784ed7..0b2538c4c6bd51dfdc772ef70bdf631c0bed8717 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -715,7 +715,7 @@ extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 
 extern void store_bit_field (rtx, poly_uint64, poly_uint64,
 			     poly_uint64, poly_uint64,
-			     machine_mode, rtx, bool);
+			     machine_mode, rtx, bool, bool);
 extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
 			      machine_mode, machine_mode, bool, rtx *);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index ed39c88bd044279113f46661608c53b8a69d81a1..9c66cc40f60aadf5472b25770cb16d6e9f85a7e2 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1112,13 +1112,15 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
 
-   If REVERSE is true, the store is to be done in reverse order.  */
+   If REVERSE is true, the store is to be done in reverse order.
+
+   If UNDEFINED_P is true then STR_RTX is currently undefined.  */
 
 void
 store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 		 poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		 machine_mode fieldmode,
-		 rtx value, bool reverse)
+		 rtx value, bool reverse, bool undefined_p)
 {
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   unsigned HOST_WIDE_INT ibitsize = 0, ibitnum = 0;
@@ -1160,6 +1162,18 @@ store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
       return;
     }
 
+  if (bitsize.is_constant (&ibitsize)
+      && bitnum.is_constant (&ibitnum)
+      && is_a <scalar_int_mode> (fieldmode, &int_mode)
+      && undefined_p
+      && ibitsize == GET_MODE_BITSIZE (int_mode))
+    {
+      gcc_assert (ibitnum % BITS_PER_UNIT == 0);
+      rtx dest = lowpart_subreg (GET_MODE (value), str_rtx, GET_MODE (str_rtx));
+      emit_move_insn (dest, value);
+      return;
+    }
+
   /* Under the C++0x memory model, we must not touch bits outside the
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
diff --git a/gcc/expr.h b/gcc/expr.h
index 7e5cf495a2bf12e15e2a00d293dfb54f830f38dd..41447e023c7218db195e8db3725714cffb10b3f3 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -253,7 +253,7 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
 extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
 extern rtx_insn *emit_move_complex_parts (rtx, rtx);
 extern rtx read_complex_part (rtx, bool);
-extern void write_complex_part (rtx, rtx, bool);
+extern void write_complex_part (rtx, rtx, bool, bool);
 extern rtx read_complex_part (rtx, bool);
 extern rtx emit_move_resolve_push (machine_mode, rtx);
 
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..c7f057a87fbc92d81388c87dd3754b4d21af6612 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2850,7 +2850,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 	  store_bit_field (dest,
 			   adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
 			   bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
-			   VOIDmode, tmps[i], false);
+			   VOIDmode, tmps[i], false, false);
 	}
 
       /* Optimize the access just a bit.  */
@@ -2864,7 +2864,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i], false);
+			 0, 0, mode, tmps[i], false, false);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2997,7 +2997,7 @@ copy_blkmode_from_reg (rtx target, rtx srcreg, tree type)
 					  xbitpos % BITS_PER_WORD, 1,
 					  NULL_RTX, copy_mode, copy_mode,
 					  false, NULL),
-		       false);
+		       false, false);
     }
 }
 
@@ -3099,7 +3099,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
 					  bitpos % BITS_PER_WORD, 1,
 					  NULL_RTX, word_mode, word_mode,
 					  false, NULL),
-		       false);
+		       false, false);
     }
 
   if (mode == BLKmode)
@@ -3267,8 +3267,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
 	  zero = CONST0_RTX (GET_MODE_INNER (mode));
 	  if (zero != NULL)
 	    {
-	      write_complex_part (object, zero, 0);
-	      write_complex_part (object, zero, 1);
+	      write_complex_part (object, zero, 0, true);
+	      write_complex_part (object, zero, 1, false);
 	      return NULL;
 	    }
 	}
@@ -3429,10 +3429,11 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
 
 

 /* Write to one of the components of the complex value CPLX.  Write VAL to
-   the real part if IMAG_P is false, and the imaginary part if its true.  */
+   the real part if IMAG_P is false, and the imaginary part if its true.
+   If UNDEFINED_P then the value in CPLX is currently undefined.  */
 
 void
-write_complex_part (rtx cplx, rtx val, bool imag_p)
+write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
 {
   machine_mode cmode;
   scalar_mode imode;
@@ -3487,7 +3488,7 @@ write_complex_part (rtx cplx, rtx val, bool imag_p)
     }
 
   store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
-		   false);
+		   false, undefined_p);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3740,8 +3741,8 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false);
-  write_complex_part (x, read_complex_part (y, true), true);
+  write_complex_part (x, read_complex_part (y, false), false, true);
+  write_complex_part (x, read_complex_part (y, true), true, false);
 
   return get_last_insn ();
 }
@@ -5385,7 +5386,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	}
       else
 	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
-			 false);
+			 false, false);
       return;
     }
 
@@ -5607,8 +5608,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	    concat_store_slow:;
 	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
 					    GET_MODE_SIZE (GET_MODE (to_rtx)));
-	      write_complex_part (temp, XEXP (to_rtx, 0), false);
-	      write_complex_part (temp, XEXP (to_rtx, 1), true);
+	      write_complex_part (temp, XEXP (to_rtx, 0), false, true);
+	      write_complex_part (temp, XEXP (to_rtx, 1), true, false);
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
 				    mode1, from, get_alias_set (to),
@@ -6166,7 +6167,8 @@ store_expr (tree exp, rtx target, int call_param_p,
 		store_bit_field (target,
 				 rtx_to_poly_int64 (expr_size (exp))
 				 * BITS_PER_UNIT,
-				 0, 0, 0, GET_MODE (temp), temp, reverse);
+				 0, 0, 0, GET_MODE (temp), temp, reverse,
+				 false);
 	    }
 	  else
 	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
@@ -7556,7 +7558,7 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
       gcc_checking_assert (known_ge (bitpos, 0));
       store_bit_field (target, bitsize, bitpos,
 		       bitregion_start, bitregion_end,
-		       mode, temp, reverse);
+		       mode, temp, reverse, false);
 
       return const0_rtx;
     }
@@ -10012,8 +10014,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	      complex_expr_swap_order:
 		/* Move the imaginary (op1) and real (op0) parts to their
 		   location.  */
-		write_complex_part (target, op1, true);
-		write_complex_part (target, op0, false);
+		write_complex_part (target, op1, true, true);
+		write_complex_part (target, op0, false, false);
 
 		return target;
 	      }
@@ -10042,8 +10044,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  }
 
       /* Move the real (op0) and imaginary (op1) parts to their location.  */
-      write_complex_part (target, op0, false);
-      write_complex_part (target, op1, true);
+      write_complex_part (target, op0, false, true);
+      write_complex_part (target, op1, true, false);
 
       return target;
 
@@ -10282,7 +10284,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	rtx dst = gen_reg_rtx (mode);
 	emit_move_insn (dst, op0);
 	store_bit_field (dst, bitsize, bitpos, 0, 0,
-			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false);
+			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false, false);
 	return dst;
       }
 
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 22960a67f893316563e46658fab8f06cf9cbeb80..149403586f4e220d27759a3c36cf5806d03f2040 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -999,7 +999,8 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false);
+	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false,
+			       false);
 	      return;
 	    }
 
@@ -1056,7 +1057,7 @@ noce_emit_move_insn (rtx x, rtx y)
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
   store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
-		   0, 0, outmode, y, false);
+		   0, 0, outmode, y, false, false);
 }
 
 /* Return the CC reg if it is used in COND.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8b1733e20c4455e4e8c383c92fe859f4256cae69..8baf52f01516a0e8294176af68e69ec67e43278f 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -735,9 +735,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
 {
   if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
       && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
-    write_complex_part (target, constm1_rtx, true);
+    write_complex_part (target, constm1_rtx, true, false);
   else
-    write_complex_part (target, const1_rtx, true);
+    write_complex_part (target, const1_rtx, true, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into the __real__ part
@@ -792,7 +792,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
       expand_arith_set_overflow (lhs, target);
       emit_label (done_label);
     }
-  write_complex_part (target, lres, false);
+  write_complex_part (target, lres, false, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into TARGET.  */
@@ -837,7 +837,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true);
+	write_complex_part (target, const0_rtx, true, false);
     }
 
   /* We assume both operands and result have the same precision
@@ -1282,7 +1282,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true);
+	write_complex_part (target, const0_rtx, true, false);
     }
 
   enum insn_code icode = optab_handler (negv3_optab, mode);
@@ -1407,7 +1407,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true);
+	write_complex_part (target, const0_rtx, true, false);
     }
 
   if (is_ubsan)
@@ -2224,7 +2224,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
       do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
 			       all_done_label, profile_probability::very_unlikely ());
       emit_label (set_noovf);
-      write_complex_part (target, const0_rtx, true);
+      write_complex_part (target, const0_rtx, true, false);
       emit_label (all_done_label);
     }
 
@@ -2493,7 +2493,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
 	{
 	  /* The infinity precision result will always fit into result.  */
 	  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-	  write_complex_part (target, const0_rtx, true);
+	  write_complex_part (target, const0_rtx, true, false);
 	  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
 	  struct separate_ops ops;
 	  ops.code = code;
diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
new file mode 100644
index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** _Z1fii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f(int a, int b) {
+    _Complex int t = a + b * 1i;
+    return t;
+}
+
+/*
+** _Z2f2ii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f2(int a, int b) {
+    _Complex int t = {a, b};
+    return t;
+}
+
+/* 
+** _Z12f_convolutedii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f_convoluted(int a, int b) {
+    _Complex int t = (_Complex int)a;
+    __imag__ t = b;
+    return t;
+}
  
Richard Sandiford June 17, 2022, 5:13 p.m. UTC | #10
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, June 13, 2022 9:41 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
>> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
>> set the lowpart.
>> 
>> Tamar Christina <tamar.christina@arm.com> writes:
>> > Hi All,
>> >
>> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.
>> One
>> > for the lowpart and one for the highpart.
>> >
>> > The problem with this is that in RTL the lvalue of the RTX is the only
>> > thing tying the two instructions together.
>> >
>> > This means that e.g. combine is unable to try to combine the two
>> > instructions for setting the lowpart and highpart.
>> >
>> > For ISAs that have bit extract instructions we can eliminate one of
>> > the extracts if, and only if we're setting the entire complex number.
>> >
>> > This change changes the expand code when we're setting the entire
>> > complex number to generate a subreg for the lowpart instead of a
>> vec_extract.
>> >
>> > This allows us to optimize sequences such as:
>> >
>> > _Complex int f(int a, int b) {
>> >     _Complex int t = a + b * 1i;
>> >     return t;
>> > }
>> >
>> > from:
>> >
>> > f:
>> > 	bfi     x2, x0, 0, 32
>> > 	bfi     x2, x1, 32, 32
>> > 	mov     x0, x2
>> > 	ret
>> >
>> > into:
>> >
>> > f:
>> > 	bfi	x0, x1, 32, 32
>> > 	ret
>> >
>> > I have also confirmed the codegen for x86_64 did not change.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>> > and no issues.
>> >
>> > Ok for master?
>> 
>> I'm not sure this is endian-safe.  For big-endian it's the imaginary part that can
>> be written as a subreg.  The real part might be the high part of a register.
>> 
>> Maybe a more general way to handle this would be to add (yet another)
>> parameter to store_bit_field that indicates that the current value of the
>> structure is undefined.  That would also be useful in at least one other caller
>> (from calls.cc).  write_complex_part could then have a similar parameter,
>> true for the first write and false for the second.
>
> Ohayou-gozaimasu!
>
> I've rewritten it using the approach you requested. I attempted to set the flag
> In the correct places as well.

Thanks, looks good.  But rather than treat this as a new case, I think
we can instead generalise this store_bit_field_1 code:

      else if (constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, &regnum)
	       && multiple_p (bitsize, regsize * BITS_PER_UNIT)
	       && known_ge (GET_MODE_BITSIZE (GET_MODE (op0)), bitsize))
	{
	  sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
				     regnum * regsize);
	  if (sub)
	    {
	      if (reverse)
		value = flip_storage_order (fieldmode, value);
	      emit_move_insn (sub, value);
	      return true;
	    }
	}

so that the multiple_p test is skipped if the structure is undefined.

Richard

> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* expmed.cc (store_bit_field): Add parameter that indicates if value is
> 	still undefined and if so emit a subreg move instead.
> 	* expr.h (write_complex_part): Likewise.
> 	* expmed.h (store_bit_field): Add new parameter.
> 	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
> 	parameter.
> 	(expand_ifn_atomic_compare_exchange): Likewise.
> 	* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
> 	* emit-rtl.cc (validate_subreg): Likewise.
> 	* expr.cc (emit_group_store): Likewise.
> 	(copy_blkmode_from_reg): Likewise.
> 	(copy_blkmode_to_reg): Likewise.
> 	(clear_storage_hints): Likewise.
> 	(write_complex_part):  Likewise.
> 	(emit_move_complex_parts): Likewise.
> 	(expand_assignment): Likewise.
> 	(store_expr): Likewise.
> 	(store_field): Likewise.
> 	(expand_expr_real_2): Likewise.
> 	* ifcvt.cc (noce_emit_move_insn): Likewise.
> 	* internal-fn.cc (expand_arith_set_overflow): Likewise.
> 	(expand_arith_overflow_result_store): Likewise.
> 	(expand_addsub_overflow): Likewise.
> 	(expand_neg_overflow): Likewise.
> 	(expand_mul_overflow): Likewise.
> 	(expand_arith_overflow): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/aarch64/complex-init.C: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 4c6c29390531d8ae9765add598621727213b23ec..8c80e46d9c9c9c2a7e1ce0f8add86729fd542b16 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -6014,8 +6014,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
>        if (GET_MODE (boolret) != mode)
>  	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
>        x = force_reg (mode, x);
> -      write_complex_part (target, boolret, true);
> -      write_complex_part (target, x, false);
> +      write_complex_part (target, boolret, true, true);
> +      write_complex_part (target, x, false, false);
>      }
>  }
>  
> @@ -6070,8 +6070,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
>        rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>        if (GET_MODE (boolret) != mode)
>  	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
> -      write_complex_part (target, boolret, true);
> -      write_complex_part (target, oldval, false);
> +      write_complex_part (target, boolret, true, true);
> +      write_complex_part (target, oldval, false, false);
>      }
>  }
>  
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index e13469cfd43b5bdd4ca0d2b8458a9e4f996e36e9..586af170879ab0152e9c0634a4b8e0ce03ea8d6e 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1214,7 +1214,7 @@ store_unaligned_arguments_into_pseudos (struct arg_data *args, int num_actuals)
>  
>  	    bytes -= bitsize / BITS_PER_UNIT;
>  	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
> -			     word_mode, word, false);
> +			     word_mode, word, false, false);
>  	  }
>        }
>  }
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>      ;
>    /* Subregs involving floating point modes are not allowed to
> -     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> +     change size unless it's an insert into a complex mode.
> +     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
>       (subreg:SI (reg:DF) 0) isn't.  */
> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +	   && !COMPLEX_MODE_P (omode))
>      {
>        if (! (known_eq (isize, osize)
>  	     /* LRA can use subreg to store a floating point value in
> diff --git a/gcc/expmed.h b/gcc/expmed.h
> index ee1ddc82b601ce02957c493dad0d70eee2784ed7..0b2538c4c6bd51dfdc772ef70bdf631c0bed8717 100644
> --- a/gcc/expmed.h
> +++ b/gcc/expmed.h
> @@ -715,7 +715,7 @@ extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>  
>  extern void store_bit_field (rtx, poly_uint64, poly_uint64,
>  			     poly_uint64, poly_uint64,
> -			     machine_mode, rtx, bool);
> +			     machine_mode, rtx, bool, bool);
>  extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
>  			      machine_mode, machine_mode, bool, rtx *);
>  extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index ed39c88bd044279113f46661608c53b8a69d81a1..9c66cc40f60aadf5472b25770cb16d6e9f85a7e2 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1112,13 +1112,15 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
>  
>     FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
>  
> -   If REVERSE is true, the store is to be done in reverse order.  */
> +   If REVERSE is true, the store is to be done in reverse order.
> +
> +   If UNDEFINED_P is true then STR_RTX is currently undefined.  */
>  
>  void
>  store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>  		 poly_uint64 bitregion_start, poly_uint64 bitregion_end,
>  		 machine_mode fieldmode,
> -		 rtx value, bool reverse)
> +		 rtx value, bool reverse, bool undefined_p)
>  {
>    /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
>    unsigned HOST_WIDE_INT ibitsize = 0, ibitnum = 0;
> @@ -1160,6 +1162,18 @@ store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>        return;
>      }
>  
> +  if (bitsize.is_constant (&ibitsize)
> +      && bitnum.is_constant (&ibitnum)
> +      && is_a <scalar_int_mode> (fieldmode, &int_mode)
> +      && undefined_p
> +      && ibitsize == GET_MODE_BITSIZE (int_mode))
> +    {
> +      gcc_assert (ibitnum % BITS_PER_UNIT == 0);
> +      rtx dest = lowpart_subreg (GET_MODE (value), str_rtx, GET_MODE (str_rtx));
> +      emit_move_insn (dest, value);
> +      return;
> +    }
> +
>    /* Under the C++0x memory model, we must not touch bits outside the
>       bit region.  Adjust the address to start at the beginning of the
>       bit region.  */
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 7e5cf495a2bf12e15e2a00d293dfb54f830f38dd..41447e023c7218db195e8db3725714cffb10b3f3 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -253,7 +253,7 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
>  extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
>  extern rtx_insn *emit_move_complex_parts (rtx, rtx);
>  extern rtx read_complex_part (rtx, bool);
> -extern void write_complex_part (rtx, rtx, bool);
> +extern void write_complex_part (rtx, rtx, bool, bool);
>  extern rtx read_complex_part (rtx, bool);
>  extern rtx emit_move_resolve_push (machine_mode, rtx);
>  
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..c7f057a87fbc92d81388c87dd3754b4d21af6612 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -2850,7 +2850,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
>  	  store_bit_field (dest,
>  			   adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
>  			   bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
> -			   VOIDmode, tmps[i], false);
> +			   VOIDmode, tmps[i], false, false);
>  	}
>  
>        /* Optimize the access just a bit.  */
> @@ -2864,7 +2864,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
>  
>        else
>  	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
> -			 0, 0, mode, tmps[i], false);
> +			 0, 0, mode, tmps[i], false, false);
>      }
>  
>    /* Copy from the pseudo into the (probable) hard reg.  */
> @@ -2997,7 +2997,7 @@ copy_blkmode_from_reg (rtx target, rtx srcreg, tree type)
>  					  xbitpos % BITS_PER_WORD, 1,
>  					  NULL_RTX, copy_mode, copy_mode,
>  					  false, NULL),
> -		       false);
> +		       false, false);
>      }
>  }
>  
> @@ -3099,7 +3099,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
>  					  bitpos % BITS_PER_WORD, 1,
>  					  NULL_RTX, word_mode, word_mode,
>  					  false, NULL),
> -		       false);
> +		       false, false);
>      }
>  
>    if (mode == BLKmode)
> @@ -3267,8 +3267,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
>  	  zero = CONST0_RTX (GET_MODE_INNER (mode));
>  	  if (zero != NULL)
>  	    {
> -	      write_complex_part (object, zero, 0);
> -	      write_complex_part (object, zero, 1);
> +	      write_complex_part (object, zero, 0, true);
> +	      write_complex_part (object, zero, 1, false);
>  	      return NULL;
>  	    }
>  	}
> @@ -3429,10 +3429,11 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
>  
>  
>
>  /* Write to one of the components of the complex value CPLX.  Write VAL to
> -   the real part if IMAG_P is false, and the imaginary part if its true.  */
> +   the real part if IMAG_P is false, and the imaginary part if its true.
> +   If UNDEFINED_P then the value in CPLX is currently undefined.  */
>  
>  void
> -write_complex_part (rtx cplx, rtx val, bool imag_p)
> +write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
>  {
>    machine_mode cmode;
>    scalar_mode imode;
> @@ -3487,7 +3488,7 @@ write_complex_part (rtx cplx, rtx val, bool imag_p)
>      }
>  
>    store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
> -		   false);
> +		   false, undefined_p);
>  }
>  
>  /* Extract one of the components of the complex value CPLX.  Extract the
> @@ -3740,8 +3741,8 @@ emit_move_complex_parts (rtx x, rtx y)
>        && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>      emit_clobber (x);
>  
> -  write_complex_part (x, read_complex_part (y, false), false);
> -  write_complex_part (x, read_complex_part (y, true), true);
> +  write_complex_part (x, read_complex_part (y, false), false, true);
> +  write_complex_part (x, read_complex_part (y, true), true, false);
>  
>    return get_last_insn ();
>  }
> @@ -5385,7 +5386,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>  	}
>        else
>  	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
> -			 false);
> +			 false, false);
>        return;
>      }
>  
> @@ -5607,8 +5608,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
>  	    concat_store_slow:;
>  	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
>  					    GET_MODE_SIZE (GET_MODE (to_rtx)));
> -	      write_complex_part (temp, XEXP (to_rtx, 0), false);
> -	      write_complex_part (temp, XEXP (to_rtx, 1), true);
> +	      write_complex_part (temp, XEXP (to_rtx, 0), false, true);
> +	      write_complex_part (temp, XEXP (to_rtx, 1), true, false);
>  	      result = store_field (temp, bitsize, bitpos,
>  				    bitregion_start, bitregion_end,
>  				    mode1, from, get_alias_set (to),
> @@ -6166,7 +6167,8 @@ store_expr (tree exp, rtx target, int call_param_p,
>  		store_bit_field (target,
>  				 rtx_to_poly_int64 (expr_size (exp))
>  				 * BITS_PER_UNIT,
> -				 0, 0, 0, GET_MODE (temp), temp, reverse);
> +				 0, 0, 0, GET_MODE (temp), temp, reverse,
> +				 false);
>  	    }
>  	  else
>  	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
> @@ -7556,7 +7558,7 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
>        gcc_checking_assert (known_ge (bitpos, 0));
>        store_bit_field (target, bitsize, bitpos,
>  		       bitregion_start, bitregion_end,
> -		       mode, temp, reverse);
> +		       mode, temp, reverse, false);
>  
>        return const0_rtx;
>      }
> @@ -10012,8 +10014,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>  	      complex_expr_swap_order:
>  		/* Move the imaginary (op1) and real (op0) parts to their
>  		   location.  */
> -		write_complex_part (target, op1, true);
> -		write_complex_part (target, op0, false);
> +		write_complex_part (target, op1, true, true);
> +		write_complex_part (target, op0, false, false);
>  
>  		return target;
>  	      }
> @@ -10042,8 +10044,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>  	  }
>  
>        /* Move the real (op0) and imaginary (op1) parts to their location.  */
> -      write_complex_part (target, op0, false);
> -      write_complex_part (target, op1, true);
> +      write_complex_part (target, op0, false, true);
> +      write_complex_part (target, op1, true, false);
>  
>        return target;
>  
> @@ -10282,7 +10284,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>  	rtx dst = gen_reg_rtx (mode);
>  	emit_move_insn (dst, op0);
>  	store_bit_field (dst, bitsize, bitpos, 0, 0,
> -			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false);
> +			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false, false);
>  	return dst;
>        }
>  
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 22960a67f893316563e46658fab8f06cf9cbeb80..149403586f4e220d27759a3c36cf5806d03f2040 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -999,7 +999,8 @@ noce_emit_move_insn (rtx x, rtx y)
>  		}
>  
>  	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
> -	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false);
> +	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false,
> +			       false);
>  	      return;
>  	    }
>  
> @@ -1056,7 +1057,7 @@ noce_emit_move_insn (rtx x, rtx y)
>    outmode = GET_MODE (outer);
>    bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
>    store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
> -		   0, 0, outmode, y, false);
> +		   0, 0, outmode, y, false, false);
>  }
>  
>  /* Return the CC reg if it is used in COND.  */
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 8b1733e20c4455e4e8c383c92fe859f4256cae69..8baf52f01516a0e8294176af68e69ec67e43278f 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -735,9 +735,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
>  {
>    if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
>        && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
> -    write_complex_part (target, constm1_rtx, true);
> +    write_complex_part (target, constm1_rtx, true, false);
>    else
> -    write_complex_part (target, const1_rtx, true);
> +    write_complex_part (target, const1_rtx, true, false);
>  }
>  
>  /* Helper for expand_*_overflow.  Store RES into the __real__ part
> @@ -792,7 +792,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
>        expand_arith_set_overflow (lhs, target);
>        emit_label (done_label);
>      }
> -  write_complex_part (target, lres, false);
> +  write_complex_part (target, lres, false, false);
>  }
>  
>  /* Helper for expand_*_overflow.  Store RES into TARGET.  */
> @@ -837,7 +837,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
>      {
>        target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>        if (!is_ubsan)
> -	write_complex_part (target, const0_rtx, true);
> +	write_complex_part (target, const0_rtx, true, false);
>      }
>  
>    /* We assume both operands and result have the same precision
> @@ -1282,7 +1282,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
>      {
>        target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>        if (!is_ubsan)
> -	write_complex_part (target, const0_rtx, true);
> +	write_complex_part (target, const0_rtx, true, false);
>      }
>  
>    enum insn_code icode = optab_handler (negv3_optab, mode);
> @@ -1407,7 +1407,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
>      {
>        target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>        if (!is_ubsan)
> -	write_complex_part (target, const0_rtx, true);
> +	write_complex_part (target, const0_rtx, true, false);
>      }
>  
>    if (is_ubsan)
> @@ -2224,7 +2224,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
>        do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
>  			       all_done_label, profile_probability::very_unlikely ());
>        emit_label (set_noovf);
> -      write_complex_part (target, const0_rtx, true);
> +      write_complex_part (target, const0_rtx, true, false);
>        emit_label (all_done_label);
>      }
>  
> @@ -2493,7 +2493,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
>  	{
>  	  /* The infinity precision result will always fit into result.  */
>  	  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> -	  write_complex_part (target, const0_rtx, true);
> +	  write_complex_part (target, const0_rtx, true, false);
>  	  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
>  	  struct separate_ops ops;
>  	  ops.code = code;
> diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
> new file mode 100644
> index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** _Z1fii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f(int a, int b) {
> +    _Complex int t = a + b * 1i;
> +    return t;
> +}
> +
> +/*
> +** _Z2f2ii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f2(int a, int b) {
> +    _Complex int t = {a, b};
> +    return t;
> +}
> +
> +/* 
> +** _Z12f_convolutedii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f_convoluted(int a, int b) {
> +    _Complex int t = (_Complex int)a;
> +    __imag__ t = b;
> +    return t;
> +}
  
Richard Sandiford June 20, 2022, 8 a.m. UTC | #11
Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Tamar Christina <Tamar.Christina@arm.com> writes:
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Monday, June 13, 2022 9:41 AM
>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
>>> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
>>> set the lowpart.
>>> 
>>> Tamar Christina <tamar.christina@arm.com> writes:
>>> > Hi All,
>>> >
>>> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.
>>> One
>>> > for the lowpart and one for the highpart.
>>> >
>>> > The problem with this is that in RTL the lvalue of the RTX is the only
>>> > thing tying the two instructions together.
>>> >
>>> > This means that e.g. combine is unable to try to combine the two
>>> > instructions for setting the lowpart and highpart.
>>> >
>>> > For ISAs that have bit extract instructions we can eliminate one of
>>> > the extracts if, and only if we're setting the entire complex number.
>>> >
>>> > This change changes the expand code when we're setting the entire
>>> > complex number to generate a subreg for the lowpart instead of a
>>> vec_extract.
>>> >
>>> > This allows us to optimize sequences such as:
>>> >
>>> > _Complex int f(int a, int b) {
>>> >     _Complex int t = a + b * 1i;
>>> >     return t;
>>> > }
>>> >
>>> > from:
>>> >
>>> > f:
>>> > 	bfi     x2, x0, 0, 32
>>> > 	bfi     x2, x1, 32, 32
>>> > 	mov     x0, x2
>>> > 	ret
>>> >
>>> > into:
>>> >
>>> > f:
>>> > 	bfi	x0, x1, 32, 32
>>> > 	ret
>>> >
>>> > I have also confirmed the codegen for x86_64 did not change.
>>> >
>>> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>>> > and no issues.
>>> >
>>> > Ok for master?
>>> 
>>> I'm not sure this is endian-safe.  For big-endian it's the imaginary part that can
>>> be written as a subreg.  The real part might be the high part of a register.
>>> 
>>> Maybe a more general way to handle this would be to add (yet another)
>>> parameter to store_bit_field that indicates that the current value of the
>>> structure is undefined.  That would also be useful in at least one other caller
>>> (from calls.cc).  write_complex_part could then have a similar parameter,
>>> true for the first write and false for the second.
>>
>> Ohayou-gozaimasu!
>>
>> I've rewritten it using the approach you requested. I attempted to set the flag
>> In the correct places as well.
>
> Thanks, looks good.  But rather than treat this as a new case, I think
> we can instead generalise this store_bit_field_1 code:
>
>       else if (constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, &regnum)
> 	       && multiple_p (bitsize, regsize * BITS_PER_UNIT)
> 	       && known_ge (GET_MODE_BITSIZE (GET_MODE (op0)), bitsize))



> 	{
> 	  sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
> 				     regnum * regsize);
> 	  if (sub)
> 	    {
> 	      if (reverse)
> 		value = flip_storage_order (fieldmode, value);
> 	      emit_move_insn (sub, value);
> 	      return true;
> 	    }
> 	}
>
> so that the multiple_p test is skipped if the structure is undefined.

Actually, we should probably skip the constant_multiple_p test as well.
Keeping it would only be meaningful for little-endian.

simplify_gen_subreg should alread do the necessary checks to make sure
that the subreg can be formed (via validate_subreg).

Thanks,
Richard

> Richard
>
>> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>> and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>> 	* expmed.cc (store_bit_field): Add parameter that indicates if value is
>> 	still undefined and if so emit a subreg move instead.
>> 	* expr.h (write_complex_part): Likewise.
>> 	* expmed.h (store_bit_field): Add new parameter.
>> 	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
>> 	parameter.
>> 	(expand_ifn_atomic_compare_exchange): Likewise.
>> 	* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
>> 	* emit-rtl.cc (validate_subreg): Likewise.
>> 	* expr.cc (emit_group_store): Likewise.
>> 	(copy_blkmode_from_reg): Likewise.
>> 	(copy_blkmode_to_reg): Likewise.
>> 	(clear_storage_hints): Likewise.
>> 	(write_complex_part):  Likewise.
>> 	(emit_move_complex_parts): Likewise.
>> 	(expand_assignment): Likewise.
>> 	(store_expr): Likewise.
>> 	(store_field): Likewise.
>> 	(expand_expr_real_2): Likewise.
>> 	* ifcvt.cc (noce_emit_move_insn): Likewise.
>> 	* internal-fn.cc (expand_arith_set_overflow): Likewise.
>> 	(expand_arith_overflow_result_store): Likewise.
>> 	(expand_addsub_overflow): Likewise.
>> 	(expand_neg_overflow): Likewise.
>> 	(expand_mul_overflow): Likewise.
>> 	(expand_arith_overflow): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* g++.target/aarch64/complex-init.C: New test.
>>
>> --- inline copy of patch ---
>>
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index 4c6c29390531d8ae9765add598621727213b23ec..8c80e46d9c9c9c2a7e1ce0f8add86729fd542b16 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -6014,8 +6014,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
>>        if (GET_MODE (boolret) != mode)
>>  	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
>>        x = force_reg (mode, x);
>> -      write_complex_part (target, boolret, true);
>> -      write_complex_part (target, x, false);
>> +      write_complex_part (target, boolret, true, true);
>> +      write_complex_part (target, x, false, false);
>>      }
>>  }
>>  
>> @@ -6070,8 +6070,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
>>        rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>        if (GET_MODE (boolret) != mode)
>>  	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
>> -      write_complex_part (target, boolret, true);
>> -      write_complex_part (target, oldval, false);
>> +      write_complex_part (target, boolret, true, true);
>> +      write_complex_part (target, oldval, false, false);
>>      }
>>  }
>>  
>> diff --git a/gcc/calls.cc b/gcc/calls.cc
>> index e13469cfd43b5bdd4ca0d2b8458a9e4f996e36e9..586af170879ab0152e9c0634a4b8e0ce03ea8d6e 100644
>> --- a/gcc/calls.cc
>> +++ b/gcc/calls.cc
>> @@ -1214,7 +1214,7 @@ store_unaligned_arguments_into_pseudos (struct arg_data *args, int num_actuals)
>>  
>>  	    bytes -= bitsize / BITS_PER_UNIT;
>>  	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
>> -			     word_mode, word, false);
>> +			     word_mode, word, false, false);
>>  	  }
>>        }
>>  }
>> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
>> index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
>> --- a/gcc/emit-rtl.cc
>> +++ b/gcc/emit-rtl.cc
>> @@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>>  	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>>      ;
>>    /* Subregs involving floating point modes are not allowed to
>> -     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
>> +     change size unless it's an insert into a complex mode.
>> +     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
>>       (subreg:SI (reg:DF) 0) isn't.  */
>> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
>> +  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
>> +	   && !COMPLEX_MODE_P (omode))
>>      {
>>        if (! (known_eq (isize, osize)
>>  	     /* LRA can use subreg to store a floating point value in
>> diff --git a/gcc/expmed.h b/gcc/expmed.h
>> index ee1ddc82b601ce02957c493dad0d70eee2784ed7..0b2538c4c6bd51dfdc772ef70bdf631c0bed8717 100644
>> --- a/gcc/expmed.h
>> +++ b/gcc/expmed.h
>> @@ -715,7 +715,7 @@ extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
>>  
>>  extern void store_bit_field (rtx, poly_uint64, poly_uint64,
>>  			     poly_uint64, poly_uint64,
>> -			     machine_mode, rtx, bool);
>> +			     machine_mode, rtx, bool, bool);
>>  extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
>>  			      machine_mode, machine_mode, bool, rtx *);
>>  extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
>> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
>> index ed39c88bd044279113f46661608c53b8a69d81a1..9c66cc40f60aadf5472b25770cb16d6e9f85a7e2 100644
>> --- a/gcc/expmed.cc
>> +++ b/gcc/expmed.cc
>> @@ -1112,13 +1112,15 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
>>  
>>     FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
>>  
>> -   If REVERSE is true, the store is to be done in reverse order.  */
>> +   If REVERSE is true, the store is to be done in reverse order.
>> +
>> +   If UNDEFINED_P is true then STR_RTX is currently undefined.  */
>>  
>>  void
>>  store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>>  		 poly_uint64 bitregion_start, poly_uint64 bitregion_end,
>>  		 machine_mode fieldmode,
>> -		 rtx value, bool reverse)
>> +		 rtx value, bool reverse, bool undefined_p)
>>  {
>>    /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
>>    unsigned HOST_WIDE_INT ibitsize = 0, ibitnum = 0;
>> @@ -1160,6 +1162,18 @@ store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>>        return;
>>      }
>>  
>> +  if (bitsize.is_constant (&ibitsize)
>> +      && bitnum.is_constant (&ibitnum)
>> +      && is_a <scalar_int_mode> (fieldmode, &int_mode)
>> +      && undefined_p
>> +      && ibitsize == GET_MODE_BITSIZE (int_mode))
>> +    {
>> +      gcc_assert (ibitnum % BITS_PER_UNIT == 0);
>> +      rtx dest = lowpart_subreg (GET_MODE (value), str_rtx, GET_MODE (str_rtx));
>> +      emit_move_insn (dest, value);
>> +      return;
>> +    }
>> +
>>    /* Under the C++0x memory model, we must not touch bits outside the
>>       bit region.  Adjust the address to start at the beginning of the
>>       bit region.  */
>> diff --git a/gcc/expr.h b/gcc/expr.h
>> index 7e5cf495a2bf12e15e2a00d293dfb54f830f38dd..41447e023c7218db195e8db3725714cffb10b3f3 100644
>> --- a/gcc/expr.h
>> +++ b/gcc/expr.h
>> @@ -253,7 +253,7 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
>>  extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
>>  extern rtx_insn *emit_move_complex_parts (rtx, rtx);
>>  extern rtx read_complex_part (rtx, bool);
>> -extern void write_complex_part (rtx, rtx, bool);
>> +extern void write_complex_part (rtx, rtx, bool, bool);
>>  extern rtx read_complex_part (rtx, bool);
>>  extern rtx emit_move_resolve_push (machine_mode, rtx);
>>  
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..c7f057a87fbc92d81388c87dd3754b4d21af6612 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -2850,7 +2850,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
>>  	  store_bit_field (dest,
>>  			   adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
>>  			   bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
>> -			   VOIDmode, tmps[i], false);
>> +			   VOIDmode, tmps[i], false, false);
>>  	}
>>  
>>        /* Optimize the access just a bit.  */
>> @@ -2864,7 +2864,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
>>  
>>        else
>>  	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
>> -			 0, 0, mode, tmps[i], false);
>> +			 0, 0, mode, tmps[i], false, false);
>>      }
>>  
>>    /* Copy from the pseudo into the (probable) hard reg.  */
>> @@ -2997,7 +2997,7 @@ copy_blkmode_from_reg (rtx target, rtx srcreg, tree type)
>>  					  xbitpos % BITS_PER_WORD, 1,
>>  					  NULL_RTX, copy_mode, copy_mode,
>>  					  false, NULL),
>> -		       false);
>> +		       false, false);
>>      }
>>  }
>>  
>> @@ -3099,7 +3099,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
>>  					  bitpos % BITS_PER_WORD, 1,
>>  					  NULL_RTX, word_mode, word_mode,
>>  					  false, NULL),
>> -		       false);
>> +		       false, false);
>>      }
>>  
>>    if (mode == BLKmode)
>> @@ -3267,8 +3267,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
>>  	  zero = CONST0_RTX (GET_MODE_INNER (mode));
>>  	  if (zero != NULL)
>>  	    {
>> -	      write_complex_part (object, zero, 0);
>> -	      write_complex_part (object, zero, 1);
>> +	      write_complex_part (object, zero, 0, true);
>> +	      write_complex_part (object, zero, 1, false);
>>  	      return NULL;
>>  	    }
>>  	}
>> @@ -3429,10 +3429,11 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
>>  
>>  
>>
>>  /* Write to one of the components of the complex value CPLX.  Write VAL to
>> -   the real part if IMAG_P is false, and the imaginary part if its true.  */
>> +   the real part if IMAG_P is false, and the imaginary part if its true.
>> +   If UNDEFINED_P then the value in CPLX is currently undefined.  */
>>  
>>  void
>> -write_complex_part (rtx cplx, rtx val, bool imag_p)
>> +write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
>>  {
>>    machine_mode cmode;
>>    scalar_mode imode;
>> @@ -3487,7 +3488,7 @@ write_complex_part (rtx cplx, rtx val, bool imag_p)
>>      }
>>  
>>    store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
>> -		   false);
>> +		   false, undefined_p);
>>  }
>>  
>>  /* Extract one of the components of the complex value CPLX.  Extract the
>> @@ -3740,8 +3741,8 @@ emit_move_complex_parts (rtx x, rtx y)
>>        && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>>      emit_clobber (x);
>>  
>> -  write_complex_part (x, read_complex_part (y, false), false);
>> -  write_complex_part (x, read_complex_part (y, true), true);
>> +  write_complex_part (x, read_complex_part (y, false), false, true);
>> +  write_complex_part (x, read_complex_part (y, true), true, false);
>>  
>>    return get_last_insn ();
>>  }
>> @@ -5385,7 +5386,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>>  	}
>>        else
>>  	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
>> -			 false);
>> +			 false, false);
>>        return;
>>      }
>>  
>> @@ -5607,8 +5608,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
>>  	    concat_store_slow:;
>>  	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
>>  					    GET_MODE_SIZE (GET_MODE (to_rtx)));
>> -	      write_complex_part (temp, XEXP (to_rtx, 0), false);
>> -	      write_complex_part (temp, XEXP (to_rtx, 1), true);
>> +	      write_complex_part (temp, XEXP (to_rtx, 0), false, true);
>> +	      write_complex_part (temp, XEXP (to_rtx, 1), true, false);
>>  	      result = store_field (temp, bitsize, bitpos,
>>  				    bitregion_start, bitregion_end,
>>  				    mode1, from, get_alias_set (to),
>> @@ -6166,7 +6167,8 @@ store_expr (tree exp, rtx target, int call_param_p,
>>  		store_bit_field (target,
>>  				 rtx_to_poly_int64 (expr_size (exp))
>>  				 * BITS_PER_UNIT,
>> -				 0, 0, 0, GET_MODE (temp), temp, reverse);
>> +				 0, 0, 0, GET_MODE (temp), temp, reverse,
>> +				 false);
>>  	    }
>>  	  else
>>  	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
>> @@ -7556,7 +7558,7 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
>>        gcc_checking_assert (known_ge (bitpos, 0));
>>        store_bit_field (target, bitsize, bitpos,
>>  		       bitregion_start, bitregion_end,
>> -		       mode, temp, reverse);
>> +		       mode, temp, reverse, false);
>>  
>>        return const0_rtx;
>>      }
>> @@ -10012,8 +10014,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>>  	      complex_expr_swap_order:
>>  		/* Move the imaginary (op1) and real (op0) parts to their
>>  		   location.  */
>> -		write_complex_part (target, op1, true);
>> -		write_complex_part (target, op0, false);
>> +		write_complex_part (target, op1, true, true);
>> +		write_complex_part (target, op0, false, false);
>>  
>>  		return target;
>>  	      }
>> @@ -10042,8 +10044,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>>  	  }
>>  
>>        /* Move the real (op0) and imaginary (op1) parts to their location.  */
>> -      write_complex_part (target, op0, false);
>> -      write_complex_part (target, op1, true);
>> +      write_complex_part (target, op0, false, true);
>> +      write_complex_part (target, op1, true, false);
>>  
>>        return target;
>>  
>> @@ -10282,7 +10284,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>>  	rtx dst = gen_reg_rtx (mode);
>>  	emit_move_insn (dst, op0);
>>  	store_bit_field (dst, bitsize, bitpos, 0, 0,
>> -			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false);
>> +			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false, false);
>>  	return dst;
>>        }
>>  
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index 22960a67f893316563e46658fab8f06cf9cbeb80..149403586f4e220d27759a3c36cf5806d03f2040 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -999,7 +999,8 @@ noce_emit_move_insn (rtx x, rtx y)
>>  		}
>>  
>>  	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
>> -	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false);
>> +	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false,
>> +			       false);
>>  	      return;
>>  	    }
>>  
>> @@ -1056,7 +1057,7 @@ noce_emit_move_insn (rtx x, rtx y)
>>    outmode = GET_MODE (outer);
>>    bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
>>    store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
>> -		   0, 0, outmode, y, false);
>> +		   0, 0, outmode, y, false, false);
>>  }
>>  
>>  /* Return the CC reg if it is used in COND.  */
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 8b1733e20c4455e4e8c383c92fe859f4256cae69..8baf52f01516a0e8294176af68e69ec67e43278f 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -735,9 +735,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
>>  {
>>    if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
>>        && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
>> -    write_complex_part (target, constm1_rtx, true);
>> +    write_complex_part (target, constm1_rtx, true, false);
>>    else
>> -    write_complex_part (target, const1_rtx, true);
>> +    write_complex_part (target, const1_rtx, true, false);
>>  }
>>  
>>  /* Helper for expand_*_overflow.  Store RES into the __real__ part
>> @@ -792,7 +792,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
>>        expand_arith_set_overflow (lhs, target);
>>        emit_label (done_label);
>>      }
>> -  write_complex_part (target, lres, false);
>> +  write_complex_part (target, lres, false, false);
>>  }
>>  
>>  /* Helper for expand_*_overflow.  Store RES into TARGET.  */
>> @@ -837,7 +837,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
>>      {
>>        target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>        if (!is_ubsan)
>> -	write_complex_part (target, const0_rtx, true);
>> +	write_complex_part (target, const0_rtx, true, false);
>>      }
>>  
>>    /* We assume both operands and result have the same precision
>> @@ -1282,7 +1282,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
>>      {
>>        target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>        if (!is_ubsan)
>> -	write_complex_part (target, const0_rtx, true);
>> +	write_complex_part (target, const0_rtx, true, false);
>>      }
>>  
>>    enum insn_code icode = optab_handler (negv3_optab, mode);
>> @@ -1407,7 +1407,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
>>      {
>>        target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>        if (!is_ubsan)
>> -	write_complex_part (target, const0_rtx, true);
>> +	write_complex_part (target, const0_rtx, true, false);
>>      }
>>  
>>    if (is_ubsan)
>> @@ -2224,7 +2224,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
>>        do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
>>  			       all_done_label, profile_probability::very_unlikely ());
>>        emit_label (set_noovf);
>> -      write_complex_part (target, const0_rtx, true);
>> +      write_complex_part (target, const0_rtx, true, false);
>>        emit_label (all_done_label);
>>      }
>>  
>> @@ -2493,7 +2493,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
>>  	{
>>  	  /* The infinity precision result will always fit into result.  */
>>  	  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> -	  write_complex_part (target, const0_rtx, true);
>> +	  write_complex_part (target, const0_rtx, true, false);
>>  	  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
>>  	  struct separate_ops ops;
>>  	  ops.code = code;
>> diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
>> @@ -0,0 +1,40 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
>> +
>> +/*
>> +** _Z1fii:
>> +** ...
>> +** 	bfi	x0, x1, 32, 32
>> +** 	ret
>> +** ...
>> +*/
>> +_Complex int f(int a, int b) {
>> +    _Complex int t = a + b * 1i;
>> +    return t;
>> +}
>> +
>> +/*
>> +** _Z2f2ii:
>> +** ...
>> +** 	bfi	x0, x1, 32, 32
>> +** 	ret
>> +** ...
>> +*/
>> +_Complex int f2(int a, int b) {
>> +    _Complex int t = {a, b};
>> +    return t;
>> +}
>> +
>> +/* 
>> +** _Z12f_convolutedii:
>> +** ...
>> +** 	bfi	x0, x1, 32, 32
>> +** 	ret
>> +** ...
>> +*/
>> +_Complex int f_convoluted(int a, int b) {
>> +    _Complex int t = (_Complex int)a;
>> +    __imag__ t = b;
>> +    return t;
>> +}
  
Jeff Law June 24, 2022, 9:54 p.m. UTC | #12
On 6/15/2022 5:36 AM, Richard Sandiford wrote:
> Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On 6/13/2022 5:54 AM, Richard Biener wrote:
>>> On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>> [...]
>>>> On a related topic, any thoughts on keeping complex objects as complex
>>>> types/modes through gimple and into at least parts of the RTL pipeline?
>>>>
>>>> The way complex arithmetic instructions work on our chip is going to be
>>>> extremely tough to utilize in GCC -- we really need to the complex
>>>> types/arithmetic up through RTL generation at the least. Ideally we'd
>>>> even expose complex modes all the way to final.    Is that something
>>>> y'all could benefit from as well?  Have y'all poked at this problem at all?
>>> Since you are going to need to "recover" complex operations from people
>>> open-coding them (both fortran and C and also C++ with std::complex) it
>>> should be less work to just do that ;)  I think that complex modes and types
>>> exist solely for ABI purposes.
>> I don't see any reasonable way to do that.  Without going into all the
>> details, our complex ops work on low elements within a vector
>> register.   Trying to recover them after gimple->rtl expansion would be
>> similar to trying to SLP vectorize on RTL, something I'm not keen to chase.
>>
>> It would be a hell of a lot easier to leave the complex ops as complex
>> ops to the expanders, then make the decision to use the complex
>> instructions or decompose into components.
> Realise you might not be in a position to answer this for confidentiality
> reasons, but: would normal tree SLP not help here?  We already try to
> recognise complex add & multiply, and in principle we could do the same
> for other operations as well.  It shouldn't matter that a vector
> multiply on 2 elements is really just a single-data operation.
That's the only viable option I see for our target if we keep the model 
where complex operations are broken down into their component operations 
in the gimple passes.

jeff
  
Tamar Christina July 5, 2022, 3:05 p.m. UTC | #13
> > so that the multiple_p test is skipped if the structure is undefined.
> 
> Actually, we should probably skip the constant_multiple_p test as well.
> Keeping it would only be meaningful for little-endian.
> 
> simplify_gen_subreg should alread do the necessary checks to make sure
> that the subreg can be formed (via validate_subreg).
> 

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* expmed.cc (store_bit_field):
        * expmed.cc (store_bit_field_1): Add parameter that indicates if value is
	still undefined and if so emit a subreg move instead.
	(store_integral_bit_field): Likewise.
	(store_bit_field): Likewise.
	* expr.h (write_complex_part): Likewise.
	* expmed.h (store_bit_field): Add new parameter.
	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
	parameter.
	(expand_ifn_atomic_compare_exchange): Likewise.
	* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
	* emit-rtl.cc (validate_subreg): Likewise.
	* expr.cc (emit_group_store): Likewise.
	(copy_blkmode_from_reg): Likewise.
	(copy_blkmode_to_reg): Likewise.
	(clear_storage_hints): Likewise.
	(write_complex_part):  Likewise.
	(emit_move_complex_parts): Likewise.
	(expand_assignment): Likewise.
	(store_expr): Likewise.
	(store_field): Likewise.
	(expand_expr_real_2): Likewise.
	* ifcvt.cc (noce_emit_move_insn): Likewise.
	* internal-fn.cc (expand_arith_set_overflow): Likewise.
	(expand_arith_overflow_result_store): Likewise.
	(expand_addsub_overflow): Likewise.
	(expand_neg_overflow): Likewise.
	(expand_mul_overflow): Likewise.
	(expand_arith_overflow): Likewise.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/complex-init.C: New test.

--- inline copy of patch ---

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index e6816d5c86550b724e89aad834ad3314d555a6b4..35b9197945fdc8ba44a8b02a871490ed384a4927 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6029,8 +6029,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
       x = force_reg (mode, x);
-      write_complex_part (target, boolret, true);
-      write_complex_part (target, x, false);
+      write_complex_part (target, boolret, true, true);
+      write_complex_part (target, x, false, false);
     }
 }
 
@@ -6085,8 +6085,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
       rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-      write_complex_part (target, boolret, true);
-      write_complex_part (target, oldval, false);
+      write_complex_part (target, boolret, true, true);
+      write_complex_part (target, oldval, false, false);
     }
 }
 
diff --git a/gcc/calls.cc b/gcc/calls.cc
index f4e1299505ed542f34a6873c3537b881ed288c98..4e61e558b27071ee0f2f0f4cff7c1d806741b069 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1216,7 +1216,7 @@ store_unaligned_arguments_into_pseudos (struct arg_data *args, int num_actuals)
 
 	    bytes -= bitsize / BITS_PER_UNIT;
 	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
-			     word_mode, word, false);
+			     word_mode, word, false, false);
 	  }
       }
 }
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1e02ae254d01293d03447bf686e7581d3758305f..3929ee08986cb9137dada62b0edcc30ed81cc1a4 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
 	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
     ;
   /* Subregs involving floating point modes are not allowed to
-     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
+     change size unless it's an insert into a complex mode.
+     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+	   && !COMPLEX_MODE_P (omode))
     {
       if (! (known_eq (isize, osize)
 	     /* LRA can use subreg to store a floating point value in
diff --git a/gcc/expmed.h b/gcc/expmed.h
index ee1ddc82b601ce02957c493dad0d70eee2784ed7..0b2538c4c6bd51dfdc772ef70bdf631c0bed8717 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -715,7 +715,7 @@ extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 
 extern void store_bit_field (rtx, poly_uint64, poly_uint64,
 			     poly_uint64, poly_uint64,
-			     machine_mode, rtx, bool);
+			     machine_mode, rtx, bool, bool);
 extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
 			      machine_mode, machine_mode, bool, rtx *);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index c3e4aa8019a69431641f8b1097e32e9d28892039..9b01b5a51e797e18daa2f312f9bdb8f74bcf1ed9 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -738,13 +738,16 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
 
    If FALLBACK_P is true, fall back to store_fixed_bit_field if we have
    no other way of implementing the operation.  If FALLBACK_P is false,
-   return false instead.  */
+   return false instead.
+
+   if UNDEFINED_P is true then STR_RTX is undefined and may be set using
+   a subreg instead.  */
 
 static bool
 store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 		   poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		   machine_mode fieldmode,
-		   rtx value, bool reverse, bool fallback_p)
+		   rtx value, bool reverse, bool fallback_p, bool undefined_p)
 {
   rtx op0 = str_rtx;
 
@@ -805,8 +808,9 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 	      return true;
 	    }
 	}
-      else if (constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, &regnum)
-	       && multiple_p (bitsize, regsize * BITS_PER_UNIT)
+      else if (((constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, &regnum)
+		 && multiple_p (bitsize, regsize * BITS_PER_UNIT))
+		|| undefined_p)
 	       && known_ge (GET_MODE_BITSIZE (GET_MODE (op0)), bitsize))
 	{
 	  sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
@@ -869,7 +873,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 					GET_MODE_SIZE (GET_MODE (op0)));
 	  emit_move_insn (temp, op0);
 	  store_bit_field_1 (temp, bitsize, bitnum, 0, 0, fieldmode, value,
-			     reverse, fallback_p);
+			     reverse, fallback_p, undefined_p);
 	  emit_move_insn (op0, temp);
 	  return true;
 	}
@@ -994,7 +998,7 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 				  bitnum + bit_offset,
 				  bitregion_start, bitregion_end,
 				  word_mode,
-				  value_word, reverse, fallback_p))
+				  value_word, reverse, fallback_p, false))
 	    {
 	      delete_insns_since (last);
 	      return false;
@@ -1084,7 +1088,7 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 	  rtx tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, bitpos,
 				 bitregion_start, bitregion_end,
-				 fieldmode, orig_value, reverse, false))
+				 fieldmode, orig_value, reverse, false, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
 	      return true;
@@ -1112,13 +1116,15 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.
 
-   If REVERSE is true, the store is to be done in reverse order.  */
+   If REVERSE is true, the store is to be done in reverse order.
+
+   If UNDEFINED_P is true then STR_RTX is currently undefined.  */
 
 void
 store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 		 poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		 machine_mode fieldmode,
-		 rtx value, bool reverse)
+		 rtx value, bool reverse, bool undefined_p)
 {
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   unsigned HOST_WIDE_INT ibitsize = 0, ibitnum = 0;
@@ -1151,7 +1157,7 @@ store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 	  gcc_assert (ibitnum + ibitsize <= GET_MODE_BITSIZE (int_mode));
 	  temp = copy_to_reg (str_rtx);
 	  if (!store_bit_field_1 (temp, ibitsize, ibitnum, 0, 0,
-				  int_mode, value, reverse, true))
+				  int_mode, value, reverse, true, undefined_p))
 	    gcc_unreachable ();
 
 	  emit_move_insn (str_rtx, temp);
@@ -1186,7 +1192,7 @@ store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 
   if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
 			  bitregion_start, bitregion_end,
-			  fieldmode, value, reverse, true))
+			  fieldmode, value, reverse, true, undefined_p))
     gcc_unreachable ();
 }
 

diff --git a/gcc/expr.h b/gcc/expr.h
index 035118324057e9dcb56d08a998aa9b469295ae3e..332e985856718b20594e5f14370119af0589ebbd 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -253,7 +253,7 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
 extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
 extern rtx_insn *emit_move_complex_parts (rtx, rtx);
 extern rtx read_complex_part (rtx, bool);
-extern void write_complex_part (rtx, rtx, bool);
+extern void write_complex_part (rtx, rtx, bool, bool);
 extern rtx read_complex_part (rtx, bool);
 extern rtx emit_move_resolve_push (machine_mode, rtx);
 
diff --git a/gcc/expr.cc b/gcc/expr.cc
index c90cde35006bf44b560cdb2ecb6cfc113bf3fb89..a10f8e3c4a958ab7fb1de70d20cf3d43c9bac461 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2865,7 +2865,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 	  store_bit_field (dest,
 			   adj_bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
 			   bytepos * BITS_PER_UNIT, ssize * BITS_PER_UNIT - 1,
-			   VOIDmode, tmps[i], false);
+			   VOIDmode, tmps[i], false, false);
 	}
 
       /* Optimize the access just a bit.  */
@@ -2879,7 +2879,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i], false);
+			 0, 0, mode, tmps[i], false, false);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -3012,7 +3012,7 @@ copy_blkmode_from_reg (rtx target, rtx srcreg, tree type)
 					  xbitpos % BITS_PER_WORD, 1,
 					  NULL_RTX, copy_mode, copy_mode,
 					  false, NULL),
-		       false);
+		       false, false);
     }
 }
 
@@ -3114,7 +3114,7 @@ copy_blkmode_to_reg (machine_mode mode_in, tree src)
 					  bitpos % BITS_PER_WORD, 1,
 					  NULL_RTX, word_mode, word_mode,
 					  false, NULL),
-		       false);
+		       false, false);
     }
 
   if (mode == BLKmode)
@@ -3282,8 +3282,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
 	  zero = CONST0_RTX (GET_MODE_INNER (mode));
 	  if (zero != NULL)
 	    {
-	      write_complex_part (object, zero, 0);
-	      write_complex_part (object, zero, 1);
+	      write_complex_part (object, zero, 0, true);
+	      write_complex_part (object, zero, 1, false);
 	      return NULL;
 	    }
 	}
@@ -3444,10 +3444,11 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
 
 

 /* Write to one of the components of the complex value CPLX.  Write VAL to
-   the real part if IMAG_P is false, and the imaginary part if its true.  */
+   the real part if IMAG_P is false, and the imaginary part if its true.
+   If UNDEFINED_P then the value in CPLX is currently undefined.  */
 
 void
-write_complex_part (rtx cplx, rtx val, bool imag_p)
+write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
 {
   machine_mode cmode;
   scalar_mode imode;
@@ -3502,7 +3503,7 @@ write_complex_part (rtx cplx, rtx val, bool imag_p)
     }
 
   store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
-		   false);
+		   false, undefined_p);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3755,8 +3756,8 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false);
-  write_complex_part (x, read_complex_part (y, true), true);
+  write_complex_part (x, read_complex_part (y, false), false, true);
+  write_complex_part (x, read_complex_part (y, true), true, false);
 
   return get_last_insn ();
 }
@@ -5415,7 +5416,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	}
       else
 	store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
-			 false);
+			 false, false);
       return;
     }
 
@@ -5637,8 +5638,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	    concat_store_slow:;
 	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
 					    GET_MODE_SIZE (GET_MODE (to_rtx)));
-	      write_complex_part (temp, XEXP (to_rtx, 0), false);
-	      write_complex_part (temp, XEXP (to_rtx, 1), true);
+	      write_complex_part (temp, XEXP (to_rtx, 0), false, true);
+	      write_complex_part (temp, XEXP (to_rtx, 1), true, false);
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
 				    mode1, from, get_alias_set (to),
@@ -6196,7 +6197,8 @@ store_expr (tree exp, rtx target, int call_param_p,
 		store_bit_field (target,
 				 rtx_to_poly_int64 (expr_size (exp))
 				 * BITS_PER_UNIT,
-				 0, 0, 0, GET_MODE (temp), temp, reverse);
+				 0, 0, 0, GET_MODE (temp), temp, reverse,
+				 false);
 	    }
 	  else
 	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
@@ -7605,7 +7607,7 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
       gcc_checking_assert (known_ge (bitpos, 0));
       store_bit_field (target, bitsize, bitpos,
 		       bitregion_start, bitregion_end,
-		       mode, temp, reverse);
+		       mode, temp, reverse, false);
 
       return const0_rtx;
     }
@@ -10094,8 +10096,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	      complex_expr_swap_order:
 		/* Move the imaginary (op1) and real (op0) parts to their
 		   location.  */
-		write_complex_part (target, op1, true);
-		write_complex_part (target, op0, false);
+		write_complex_part (target, op1, true, true);
+		write_complex_part (target, op0, false, false);
 
 		return target;
 	      }
@@ -10124,8 +10126,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  }
 
       /* Move the real (op0) and imaginary (op1) parts to their location.  */
-      write_complex_part (target, op0, false);
-      write_complex_part (target, op1, true);
+      write_complex_part (target, op0, false, true);
+      write_complex_part (target, op1, true, false);
 
       return target;
 
@@ -10364,7 +10366,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	rtx dst = gen_reg_rtx (mode);
 	emit_move_insn (dst, op0);
 	store_bit_field (dst, bitsize, bitpos, 0, 0,
-			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false);
+			 TYPE_MODE (TREE_TYPE (treeop1)), op1, false, false);
 	return dst;
       }
 
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index e007b17b793e7905b9ad7651e2dabdc867ea8e67..2f76afd8b0e4d5b8d0be57756dbffb48ccce5324 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -999,7 +999,8 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false);
+	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y, false,
+			       false);
 	      return;
 	    }
 
@@ -1056,7 +1057,7 @@ noce_emit_move_insn (rtx x, rtx y)
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
   store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
-		   0, 0, outmode, y, false);
+		   0, 0, outmode, y, false, false);
 }
 
 /* Return the CC reg if it is used in COND.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 91588f8bc9f7c3fe2bac17f3c4e6078cddb7b4d2..d666ccccf670b5c639c5ae21659aae4e3f789f6a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -815,9 +815,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
 {
   if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
       && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
-    write_complex_part (target, constm1_rtx, true);
+    write_complex_part (target, constm1_rtx, true, false);
   else
-    write_complex_part (target, const1_rtx, true);
+    write_complex_part (target, const1_rtx, true, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into the __real__ part
@@ -872,7 +872,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
       expand_arith_set_overflow (lhs, target);
       emit_label (done_label);
     }
-  write_complex_part (target, lres, false);
+  write_complex_part (target, lres, false, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into TARGET.  */
@@ -917,7 +917,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true);
+	write_complex_part (target, const0_rtx, true, false);
     }
 
   /* We assume both operands and result have the same precision
@@ -1362,7 +1362,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true);
+	write_complex_part (target, const0_rtx, true, false);
     }
 
   enum insn_code icode = optab_handler (negv3_optab, mode);
@@ -1487,7 +1487,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true);
+	write_complex_part (target, const0_rtx, true, false);
     }
 
   if (is_ubsan)
@@ -2304,7 +2304,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
       do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
 			       all_done_label, profile_probability::very_unlikely ());
       emit_label (set_noovf);
-      write_complex_part (target, const0_rtx, true);
+      write_complex_part (target, const0_rtx, true, false);
       emit_label (all_done_label);
     }
 
@@ -2573,7 +2573,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
 	{
 	  /* The infinity precision result will always fit into result.  */
 	  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-	  write_complex_part (target, const0_rtx, true);
+	  write_complex_part (target, const0_rtx, true, false);
 	  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
 	  struct separate_ops ops;
 	  ops.code = code;
diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
new file mode 100644
index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** _Z1fii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f(int a, int b) {
+    _Complex int t = a + b * 1i;
+    return t;
+}
+
+/*
+** _Z2f2ii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f2(int a, int b) {
+    _Complex int t = {a, b};
+    return t;
+}
+
+/* 
+** _Z12f_convolutedii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f_convoluted(int a, int b) {
+    _Complex int t = (_Complex int)a;
+    __imag__ t = b;
+    return t;
+}
  
Richard Sandiford July 5, 2022, 4:11 p.m. UTC | #14
Tamar Christina <Tamar.Christina@arm.com> writes:
>> > so that the multiple_p test is skipped if the structure is undefined.
>> 
>> Actually, we should probably skip the constant_multiple_p test as well.
>> Keeping it would only be meaningful for little-endian.
>> 
>> simplify_gen_subreg should alread do the necessary checks to make sure
>> that the subreg can be formed (via validate_subreg).
>> 
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* expmed.cc (store_bit_field):
>         * expmed.cc (store_bit_field_1): Add parameter that indicates if value is
> 	still undefined and if so emit a subreg move instead.
> 	(store_integral_bit_field): Likewise.
> 	(store_bit_field): Likewise.
> 	* expr.h (write_complex_part): Likewise.
> 	* expmed.h (store_bit_field): Add new parameter.
> 	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
> 	parameter.
> 	(expand_ifn_atomic_compare_exchange): Likewise.
> 	* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
> 	* emit-rtl.cc (validate_subreg): Likewise.
> 	* expr.cc (emit_group_store): Likewise.
> 	(copy_blkmode_from_reg): Likewise.
> 	(copy_blkmode_to_reg): Likewise.
> 	(clear_storage_hints): Likewise.
> 	(write_complex_part):  Likewise.
> 	(emit_move_complex_parts): Likewise.
> 	(expand_assignment): Likewise.
> 	(store_expr): Likewise.
> 	(store_field): Likewise.
> 	(expand_expr_real_2): Likewise.
> 	* ifcvt.cc (noce_emit_move_insn): Likewise.
> 	* internal-fn.cc (expand_arith_set_overflow): Likewise.
> 	(expand_arith_overflow_result_store): Likewise.
> 	(expand_addsub_overflow): Likewise.
> 	(expand_neg_overflow): Likewise.
> 	(expand_mul_overflow): Likewise.
> 	(expand_arith_overflow): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/aarch64/complex-init.C: New test.
>
> --- inline copy of patch ---
>
> […]
> diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
> new file mode 100644
> index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** _Z1fii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...

Sorry for the test nit, but: it shouldn't be necessary to add ...
after the ret.  Same for the other tests.

OK with that change, thanks.

Richard

> +*/
> +_Complex int f(int a, int b) {
> +    _Complex int t = a + b * 1i;
> +    return t;
> +}
> +
> +/*
> +** _Z2f2ii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f2(int a, int b) {
> +    _Complex int t = {a, b};
> +    return t;
> +}
> +
> +/* 
> +** _Z12f_convolutedii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f_convoluted(int a, int b) {
> +    _Complex int t = (_Complex int)a;
> +    __imag__ t = b;
> +    return t;
> +}
  
Richard Biener Aug. 29, 2022, 10:52 a.m. UTC | #15
On Tue, 5 Jul 2022, Richard Sandiford wrote:

> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> > so that the multiple_p test is skipped if the structure is undefined.
> >> 
> >> Actually, we should probably skip the constant_multiple_p test as well.
> >> Keeping it would only be meaningful for little-endian.
> >> 
> >> simplify_gen_subreg should alread do the necessary checks to make sure
> >> that the subreg can be formed (via validate_subreg).
> >> 
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* expmed.cc (store_bit_field):
> >         * expmed.cc (store_bit_field_1): Add parameter that indicates if value is
> > 	still undefined and if so emit a subreg move instead.
> > 	(store_integral_bit_field): Likewise.
> > 	(store_bit_field): Likewise.
> > 	* expr.h (write_complex_part): Likewise.
> > 	* expmed.h (store_bit_field): Add new parameter.
> > 	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
> > 	parameter.
> > 	(expand_ifn_atomic_compare_exchange): Likewise.
> > 	* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
> > 	* emit-rtl.cc (validate_subreg): Likewise.
> > 	* expr.cc (emit_group_store): Likewise.
> > 	(copy_blkmode_from_reg): Likewise.
> > 	(copy_blkmode_to_reg): Likewise.
> > 	(clear_storage_hints): Likewise.
> > 	(write_complex_part):  Likewise.
> > 	(emit_move_complex_parts): Likewise.
> > 	(expand_assignment): Likewise.
> > 	(store_expr): Likewise.
> > 	(store_field): Likewise.
> > 	(expand_expr_real_2): Likewise.
> > 	* ifcvt.cc (noce_emit_move_insn): Likewise.
> > 	* internal-fn.cc (expand_arith_set_overflow): Likewise.
> > 	(expand_arith_overflow_result_store): Likewise.
> > 	(expand_addsub_overflow): Likewise.
> > 	(expand_neg_overflow): Likewise.
> > 	(expand_mul_overflow): Likewise.
> > 	(expand_arith_overflow): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* g++.target/aarch64/complex-init.C: New test.
> >
> > --- inline copy of patch ---
> >
> > [?]
> > diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d3fd3e88d04a87bacf1c4ee74ce25282c6ff81e8
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
> > @@ -0,0 +1,40 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> > +
> > +/*
> > +** _Z1fii:
> > +** ...
> > +** 	bfi	x0, x1, 32, 32
> > +** 	ret
> > +** ...
> 
> Sorry for the test nit, but: it shouldn't be necessary to add ...
> after the ret.  Same for the other tests.
> 
> OK with that change, thanks.

This has

      HOST_WIDE_INT regnum;
...
      else if (((constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, 
&regnum)
                 && multiple_p (bitsize, regsize * BITS_PER_UNIT))
                || undefined_p)
               && known_ge (GET_MODE_BITSIZE (GET_MODE (op0)), bitsize))
        {
          sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
                                     regnum * regsize);

where regnum is used uninitialized when undefined_p.  An uninit
improvement of mine will diagnose this now.

What's the intended behavior here?  Use regnum = 0?

Please fix.

Thanks,
Richard.

> Richard
> 
> > +*/
> > +_Complex int f(int a, int b) {
> > +    _Complex int t = a + b * 1i;
> > +    return t;
> > +}
> > +
> > +/*
> > +** _Z2f2ii:
> > +** ...
> > +** 	bfi	x0, x1, 32, 32
> > +** 	ret
> > +** ...
> > +*/
> > +_Complex int f2(int a, int b) {
> > +    _Complex int t = {a, b};
> > +    return t;
> > +}
> > +
> > +/* 
> > +** _Z12f_convolutedii:
> > +** ...
> > +** 	bfi	x0, x1, 32, 32
> > +** 	ret
> > +** ...
> > +*/
> > +_Complex int f_convoluted(int a, int b) {
> > +    _Complex int t = (_Complex int)a;
> > +    __imag__ t = b;
> > +    return t;
> > +}
>
  

Patch

--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -947,9 +947,11 @@  validate_subreg (machine_mode omode, machine_mode imode,
 	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
     ;
   /* Subregs involving floating point modes are not allowed to
-     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
+     change size unless it's an insert into a complex mode.
+     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+	   && !COMPLEX_MODE_P (omode))
     {
       if (! (known_eq (isize, osize)
 	     /* LRA can use subreg to store a floating point value in
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..fce672c236fdbc4d40adb6e2614c234c02a61933 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3740,7 +3740,17 @@  emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false);
+  /* If we're writing the entire value using a concat into a register
+     then emit the lower part as a simple mov followed by an insert
+     into the top part.  */
+  if (GET_CODE (y) == CONCAT && !reload_completed && REG_P (x))
+    {
+      rtx val = XEXP (y, false);
+      rtx dest = lowpart_subreg (GET_MODE (val), x, GET_MODE (x));
+      emit_move_insn (dest, val);
+    }
+  else
+    write_complex_part (x, read_complex_part (y, false), false);
   write_complex_part (x, read_complex_part (y, true), true);
 
   return get_last_insn ();
diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
new file mode 100644
index 0000000000000000000000000000000000000000..497cc4bca3e2c59da95c871ceb5cc96216fc302d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
@@ -0,0 +1,40 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** _Z1fii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f(int a, int b) {
+    _Complex int t = a + b * 1i;
+    return t;
+}
+
+/*
+** _Z2f2ii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f2(int a, int b) {
+    _Complex int t = {a, b};
+    return t;
+}
+
+/*
+** _Z12f_convolutedii:
+** ...
+** 	bfi	x0, x1, 32, 32
+** 	ret
+** ...
+*/
+_Complex int f_convoluted(int a, int b) {
+    _Complex int t = (_Complex int)a;
+    __imag__ t = b;
+    return t;
+}