diff mbox series

[Arm] Fix PR 92999

Message ID	CAJA7tRZonrXGHcaqVLNduyoAXa8mT+5TiYk29PsXd4sBwfa2JA@mail.gmail.com
State	Changes Requested
Delegated to:	Richard Earnshaw
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 15AE53858D3C MIME-Version: 1.0 Date: Tue, 8 Nov 2022 18:20:20 +0000 Message-ID: <CAJA7tRZonrXGHcaqVLNduyoAXa8mT+5TiYk29PsXd4sBwfa2JA@mail.gmail.com> Subject: [Patch Arm] Fix PR 92999 To: gcc-patches <gcc-patches@gcc.gnu.org> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com> Content-Type: multipart/mixed; boundary="00000000000069b05305ecf998e6" Precedence: list From: Ramana Radhakrishnan via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Ramana Radhakrishnan <ramana.gcc@googlemail.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
Series	[Arm] Fix PR 92999 \| [Arm] Fix PR 92999

Commit Message

Ramana Radhakrishnan Nov. 8, 2022, 6:20 p.m. UTC

  PR92999 is a case where the VFP calling convention does not allocate
enough FP registers for a homogenous aggregate containing FP16 values.
I believe this is the complete fix but would appreciate another set of
eyes on this.

Could I get a hand with a regression test run on an armhf environment
while I fix my environment ?

gcc/ChangeLog:

PR target/92999
*  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
aggregates with elements smaller than SFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.


Thanks,
Ramana

Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
---
 gcc/config/arm/arm.cc                  |  6 ++++-
 gcc/testsuite/gcc.target/arm/pr92999.c | 31 ++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr92999.c

Comments

Alex Coplan Nov. 9, 2022, 6:42 p.m. UTC | #1

Hi Ramana,

On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> PR92999 is a case where the VFP calling convention does not allocate
> enough FP registers for a homogenous aggregate containing FP16 values.
> I believe this is the complete fix but would appreciate another set of
> eyes on this.
> 
> Could I get a hand with a regression test run on an armhf environment
> while I fix my environment ?

I ran a bootstrap/regtest on arm-linux-gnueabihf, which ran OK (no
regressions, new test passes).

FWIW, I noticed some minor style issues with the patch, though:

$ contrib/check_GNU_style.py Arm-Fix-PR-92999.diff
=== ERROR type #1: there should be exactly one space between function name and parenthesis (2 error(s)) ===
gcc/config/arm/arm.cc:6746:18:      shift = (MAX(GET_MODE_SIZE(ag_mode), GET_MODE_SIZE(SFmode))
gcc/config/arm/arm.cc:6747:23:         / GET_MODE_SIZE(SFmode));

Thanks,
Alex

> 
> gcc/ChangeLog:
> 
> PR target/92999
> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> aggregates with elements smaller than SFmode.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/pr92999.c: New test.
> 
> 
> Thanks,
> Ramana
> 
> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>

Richard Earnshaw Nov. 10, 2022, 5:21 p.m. UTC | #2

On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> PR92999 is a case where the VFP calling convention does not allocate
> enough FP registers for a homogenous aggregate containing FP16 values.
> I believe this is the complete fix but would appreciate another set of
> eyes on this.
> 
> Could I get a hand with a regression test run on an armhf environment
> while I fix my environment ?
> 
> gcc/ChangeLog:
> 
> PR target/92999
> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> aggregates with elements smaller than SFmode.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/pr92999.c: New test.
> 
> 
> Thanks,
> Ramana
> 
> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>

I'm not sure about this.  The AAPCS does not mention a base type of a 
half-precision FP type as an appropriate homogeneous aggregate for using 
VFP registers for either calling or returning.

So perhaps the bug is that we try to treat this as a homogeneous 
aggregate at all.

R.

Richard Earnshaw Nov. 10, 2022, 6:03 p.m. UTC | #3

On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> 
> 
> On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
>> PR92999 is a case where the VFP calling convention does not allocate
>> enough FP registers for a homogenous aggregate containing FP16 values.
>> I believe this is the complete fix but would appreciate another set of
>> eyes on this.
>>
>> Could I get a hand with a regression test run on an armhf environment
>> while I fix my environment ?
>>
>> gcc/ChangeLog:
>>
>> PR target/92999
>> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
>> aggregates with elements smaller than SFmode.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/arm/pr92999.c: New test.
>>
>>
>> Thanks,
>> Ramana
>>
>> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
> 
> I'm not sure about this.  The AAPCS does not mention a base type of a 
> half-precision FP type as an appropriate homogeneous aggregate for using 
> VFP registers for either calling or returning.
> 
> So perhaps the bug is that we try to treat this as a homogeneous 
> aggregate at all.
> 
> R.

And clang seems to agree with my opinion: https://godbolt.org/z/ncaYfzebM

R.

Ramana Radhakrishnan Nov. 10, 2022, 7:46 p.m. UTC | #4

On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
>
>
>
> On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> >
> >
> > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> >> PR92999 is a case where the VFP calling convention does not allocate
> >> enough FP registers for a homogenous aggregate containing FP16 values.
> >> I believe this is the complete fix but would appreciate another set of
> >> eyes on this.
> >>
> >> Could I get a hand with a regression test run on an armhf environment
> >> while I fix my environment ?
> >>
> >> gcc/ChangeLog:
> >>
> >> PR target/92999
> >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> >> aggregates with elements smaller than SFmode.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/arm/pr92999.c: New test.
> >>
> >>
> >> Thanks,
> >> Ramana
> >>
> >> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
> >
> > I'm not sure about this.  The AAPCS does not mention a base type of a
> > half-precision FP type as an appropriate homogeneous aggregate for using
> > VFP registers for either calling or returning.

Ooh interesting, thanks for taking a look and poking at the AAPCS and
that's a good catch. BF16 should also have the same behaviour as FP16
, I suspect ?

> >
> > So perhaps the bug is that we try to treat this as a homogeneous
> > aggregate at all.

Yep I agree - I'll take a look again tomorrow and see if I can get a fix.

(And thanks Alex for the test run, I might trouble you again while I
still (slowly) get some of my boards back up)

regards,
Ramana


>
> R.

Ramana Radhakrishnan Nov. 11, 2022, 9:50 p.m. UTC | #5

On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
>
> On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> <Richard.Earnshaw@foss.arm.com> wrote:
> >
> >
> >
> > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > >
> > >
> > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > >> PR92999 is a case where the VFP calling convention does not allocate
> > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > >> I believe this is the complete fix but would appreciate another set of
> > >> eyes on this.
> > >>
> > >> Could I get a hand with a regression test run on an armhf environment
> > >> while I fix my environment ?
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >> PR target/92999
> > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > >> aggregates with elements smaller than SFmode.
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>
> > >> * gcc.target/arm/pr92999.c: New test.
> > >>
> > >>
> > >> Thanks,
> > >> Ramana
> > >>
> > >> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
> > >
> > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > VFP registers for either calling or returning.
>
> Ooh interesting, thanks for taking a look and poking at the AAPCS and
> that's a good catch. BF16 should also have the same behaviour as FP16
> , I suspect ?

I suspect I got caught out by the definition of the Homogenous
aggregate from Section 5.3.5
((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
which simply suggests it's an aggregate of fundamental types which
lists half precision floating point .

FTR, ideally I should have read 7.1.2.1
https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
:)



>
> > >
> > > So perhaps the bug is that we try to treat this as a homogeneous
> > > aggregate at all.
>
> Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
>
> (And thanks Alex for the test run, I might trouble you again while I
> still (slowly) get some of my boards back up)


and as promised take 2. I'd really prefer another review on this one
to see if I've not missed anything in the cases below.

regards
Ramana


>
> regards,
> Ramana
>
>
> >
> > R.
commit c2ed018d10328c5cf93aa56b00ba4caf5dace539
Author: Ramana Radhakrishnan <ramana.gcc@gmail.com>
Date:   Fri Nov 11 21:39:22 2022 +0000

    [Patch Arm] Fix PR92999
    
    PR target/92999 is a case where the VFP PCS implementation is
    incorrectly considering homogenous floating point aggregates with FP16
    and BF16 values.
    
    Can someone help me with a bootstrap and regression test on an armhf
    environment ?
    
    Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
    Tested-by: Alex Coplan  <alex.coplan@arm.com>
    Reviewed-by: Richard Earnshaw  <rearnsha@arm.com>
    
            PR target/92999
    
    gcc/ChangeLog:
    
            * config/arm/arm.cc (aapcs_vfp_is_invalid_scalar_in_ha): New
            (aapcs_vfp_sub_candidate): Adjust.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/arm/pr92999.c: New test.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..cd3e1ffe777 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6281,6 +6281,31 @@ const unsigned int WARN_PSABI_EMPTY_CXX17_BASE = 1U << 0;
 const unsigned int WARN_PSABI_NO_UNIQUE_ADDRESS = 1U << 1;
 const unsigned int WARN_PSABI_ZERO_WIDTH_BITFIELD = 1U << 2;
 
+
+/* The AAPCS VFP ABI allows homogenous aggregates with scalar
+   FP32 and FP64 members.
+   Return
+     true if this is a scalar that is not a proper candidate
+     false if this is a scalar that is an acceptable scalar data
+     type in a homogenous aggregate or if this is not a scalar allowing
+     the tree walk in aapcs_vfp_sub_candidate to continue.
+ */
+static bool
+aapcs_vfp_is_invalid_scalar_in_ha (const_tree inner_type)
+{
+
+  machine_mode mode = TYPE_MODE (inner_type);
+  if (TREE_CODE (inner_type) == REAL_TYPE)
+    {
+      if (mode == DFmode && mode == SFmode)
+	return false;
+      else
+	return true;
+    }
+  else
+    return false;
+}
+
 /* Walk down the type tree of TYPE counting consecutive base elements.
    If *MODEP is VOIDmode, then set it to the first valid floating point
    type.  If a non-floating point type is found, or if a floating point
@@ -6372,6 +6397,10 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep,
 	    || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
 	  return -1;
 
+	/* We ignore HFA's of FP16 and BF16.  */
+	if (aapcs_vfp_is_invalid_scalar_in_ha (TREE_TYPE (type)))
+	  return -1;
+
 	count = aapcs_vfp_sub_candidate (TREE_TYPE (type), modep,
 					 warn_psabi_flags);
 	if (count == -1
@@ -6455,6 +6484,10 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep,
 		  }
 	      }
 
+	    /* We ignore HA's of FP16 and BF16.  */
+	    if (aapcs_vfp_is_invalid_scalar_in_ha (TREE_TYPE (field)))
+	      return -1;
+
 	    sub_count = aapcs_vfp_sub_candidate (TREE_TYPE (field), modep,
 						 warn_psabi_flags);
 	    if (sub_count < 0)
@@ -6489,6 +6522,10 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep,
 	    if (TREE_CODE (field) != FIELD_DECL)
 	      continue;
 
+	    /* We ignore HA's of FP16 and BF16.  */
+	    if (aapcs_vfp_is_invalid_scalar_in_ha (TREE_TYPE (field)))
+		return -1;
+
 	    sub_count = aapcs_vfp_sub_candidate (TREE_TYPE (field), modep,
 						 warn_psabi_flags);
 	    if (sub_count < 0)
diff --git a/gcc/testsuite/gcc.target/arm/pr92999.c b/gcc/testsuite/gcc.target/arm/pr92999.c
new file mode 100644
index 00000000000..faa21fdb7d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr92999.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-mfp16-format=ieee" } */
+
+//
+// Compile with gcc -mfp16-format=ieee
+// Any optimization level is fine.
+//
+// Correct output should be
+// "y.first = 1, y.second = -99"
+//
+// Buggy output is
+// "y.first = -99, y.second = -99"
+//
+#include <stdlib.h>
+struct phalf {
+    __fp16 first;
+    __fp16 second;
+};
+
+struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
+struct phalf phalf_copy(struct phalf* src) {
+    return *src;
+}
+
+int main() {
+    struct phalf x = { 1.0, -99.0};
+    struct phalf y = phalf_copy(&x);
+    if (y.first != 1.0 && y.second != -99.0)
+	abort();
+    return 0;
+}

Ramana Radhakrishnan Nov. 17, 2022, 8:15 p.m. UTC | #6

On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
>
> On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
> >
> > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> > <Richard.Earnshaw@foss.arm.com> wrote:
> > >
> > >
> > >
> > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > >
> > > >
> > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > >> PR92999 is a case where the VFP calling convention does not allocate
> > > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > > >> I believe this is the complete fix but would appreciate another set of
> > > >> eyes on this.
> > > >>
> > > >> Could I get a hand with a regression test run on an armhf environment
> > > >> while I fix my environment ?
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >> PR target/92999
> > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > > >> aggregates with elements smaller than SFmode.
> > > >>
> > > >> gcc/testsuite/ChangeLog:
> > > >>
> > > >> * gcc.target/arm/pr92999.c: New test.
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Ramana
> > > >>
> > > >> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
> > > >
> > > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > > VFP registers for either calling or returning.
> >
> > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > that's a good catch. BF16 should also have the same behaviour as FP16
> > , I suspect ?
>
> I suspect I got caught out by the definition of the Homogenous
> aggregate from Section 5.3.5
> ((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
> which simply suggests it's an aggregate of fundamental types which
> lists half precision floating point .
>
> FTR, ideally I should have read 7.1.2.1
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
> :)
>
>
>
> >
> > > >
> > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > aggregate at all.
> >
> > Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
> >
> > (And thanks Alex for the test run, I might trouble you again while I
> > still (slowly) get some of my boards back up)
>
>
> and as promised take 2. I'd really prefer another review on this one
> to see if I've not missed anything in the cases below.

Ping  ?

Ramana

>
> regards
> Ramana
>
>
> >
> > regards,
> > Ramana
> >
> >
> > >
> > > R.

Ramana Radhakrishnan Nov. 24, 2022, 12:32 p.m. UTC | #7

Ping x 2

Ramana

On Thu, 17 Nov 2022, 20:15 Ramana Radhakrishnan, <ramana.gcc@googlemail.com>
wrote:

> On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
> >
> > On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
> > <ramana.gcc@googlemail.com> wrote:
> > >
> > > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> > > <Richard.Earnshaw@foss.arm.com> wrote:
> > > >
> > > >
> > > >
> > > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > > >
> > > > >
> > > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > > >> PR92999 is a case where the VFP calling convention does not
> allocate
> > > > >> enough FP registers for a homogenous aggregate containing FP16
> values.
> > > > >> I believe this is the complete fix but would appreciate another
> set of
> > > > >> eyes on this.
> > > > >>
> > > > >> Could I get a hand with a regression test run on an armhf
> environment
> > > > >> while I fix my environment ?
> > > > >>
> > > > >> gcc/ChangeLog:
> > > > >>
> > > > >> PR target/92999
> > > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to
> handle
> > > > >> aggregates with elements smaller than SFmode.
> > > > >>
> > > > >> gcc/testsuite/ChangeLog:
> > > > >>
> > > > >> * gcc.target/arm/pr92999.c: New test.
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> Ramana
> > > > >>
> > > > >> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
> > > > >
> > > > > I'm not sure about this.  The AAPCS does not mention a base type
> of a
> > > > > half-precision FP type as an appropriate homogeneous aggregate for
> using
> > > > > VFP registers for either calling or returning.
> > >
> > > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > > that's a good catch. BF16 should also have the same behaviour as FP16
> > > , I suspect ?
> >
> > I suspect I got caught out by the definition of the Homogenous
> > aggregate from Section 5.3.5
> > ((
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates
> )
> > which simply suggests it's an aggregate of fundamental types which
> > lists half precision floating point .
> >
> > FTR, ideally I should have read 7.1.2.1
> >
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling
> )
> > :)
> >
> >
> >
> > >
> > > > >
> > > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > > aggregate at all.
> > >
> > > Yep I agree - I'll take a look again tomorrow and see if I can get a
> fix.
> > >
> > > (And thanks Alex for the test run, I might trouble you again while I
> > > still (slowly) get some of my boards back up)
> >
> >
> > and as promised take 2. I'd really prefer another review on this one
> > to see if I've not missed anything in the cases below.
>
> Ping  ?
>
> Ramana
>
> >
> > regards
> > Ramana
> >
> >
> > >
> > > regards,
> > > Ramana
> > >
> > >
> > > >
> > > > R.
>

Richard Earnshaw Nov. 24, 2022, 4:12 p.m. UTC | #8

On 11/11/2022 21:50, Ramana Radhakrishnan via Gcc-patches wrote:
> On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>>
>> On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
>> <Richard.Earnshaw@foss.arm.com> wrote:
>>>
>>>
>>>
>>> On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
>>>>
>>>>
>>>> On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
>>>>> PR92999 is a case where the VFP calling convention does not allocate
>>>>> enough FP registers for a homogenous aggregate containing FP16 values.
>>>>> I believe this is the complete fix but would appreciate another set of
>>>>> eyes on this.
>>>>>
>>>>> Could I get a hand with a regression test run on an armhf environment
>>>>> while I fix my environment ?
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> PR target/92999
>>>>> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
>>>>> aggregates with elements smaller than SFmode.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> * gcc.target/arm/pr92999.c: New test.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ramana
>>>>>
>>>>> Signed-off-by: Ramana Radhakrishnan <ramana.gcc@gmail.com>
>>>>
>>>> I'm not sure about this.  The AAPCS does not mention a base type of a
>>>> half-precision FP type as an appropriate homogeneous aggregate for using
>>>> VFP registers for either calling or returning.
>>
>> Ooh interesting, thanks for taking a look and poking at the AAPCS and
>> that's a good catch. BF16 should also have the same behaviour as FP16
>> , I suspect ?
> 
> I suspect I got caught out by the definition of the Homogenous
> aggregate from Section 5.3.5
> ((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
> which simply suggests it's an aggregate of fundamental types which
> lists half precision floating point .

A homogeneous aggregate is any aggregate that fits the general 
definition, but only HAs of specific types are of interest for the VFP 
PCS rules.

The problem we have is that when we added HFmode (and later BF16mode) 
support we didn't notice that the base types are VFP candidates, but the 
nested types (eg in records or arrays) are not.

The problems started around SVN r236269 (git:1b81a1c1bd53) when we added 
FP16 support.


> 
> FTR, ideally I should have read 7.1.2.1
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
> :)
> 
> 
> 
>>
>>>>
>>>> So perhaps the bug is that we try to treat this as a homogeneous
>>>> aggregate at all.
>>
>> Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
>>
>> (And thanks Alex for the test run, I might trouble you again while I
>> still (slowly) get some of my boards back up)
> 
> 
> and as promised take 2. I'd really prefer another review on this one
> to see if I've not missed anything in the cases below.

I think I'd prefer to try and fix this at the point where we accept the 
base types, ie around:

     case REAL_TYPE:
       mode = TYPE_MODE (type);
       if (mode != DFmode && mode != SFmode && mode != HFmode && mode != 
BFmode)
         return -1;

by changing this to something like

     /* HFmode and BFmode can be passed in registers, but are not valid
        base types for an HFA, so only accept these if we are at the top
        level.  */
     if (!(mode == DFmode || mode == SFmode
           || (depth == 0
               && (mode == HFmode || mode == BFmode)))
        return -1;

and we then pass depth into the recursion calls as an extra parameter, 
starting at 0 for the top level and incrementing it by 1 each time 
aapcs_vfp_sub_candidate recurses.

For the test, would it be possible to rewrite it in the style of 
gcc.target/arm/aapcs/* and put it there? That would ensure that not only 
are the caller and callee compatible, but that the values are passed in 
the correct location.

R.

diff mbox series

Patch

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..03f4057f717 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6740,7 +6740,11 @@  aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
 	      count *= 2;
 	    }
 	}
-      shift = GET_MODE_SIZE(ag_mode) / GET_MODE_SIZE(SFmode);
+
+      /* Aggregates can contain FP16 or BF16 values which would need to
+	 be passed in via FP registers.  */
+      shift = (MAX(GET_MODE_SIZE(ag_mode), GET_MODE_SIZE(SFmode))
+	       / GET_MODE_SIZE(SFmode));
       par = gen_rtx_PARALLEL (mode, rtvec_alloc (count));
       for (i = 0; i < count; i++)
 	{
diff --git a/gcc/testsuite/gcc.target/arm/pr92999.c b/gcc/testsuite/gcc.target/arm/pr92999.c
new file mode 100644
index 00000000000..faa21fdb7d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr92999.c
@@ -0,0 +1,31 @@ 
+/* { dg-do run } */
+/* { dg-options "-mfp16-format=ieee" } */
+
+//
+// Compile with gcc -mfp16-format=ieee
+// Any optimization level is fine.
+//
+// Correct output should be
+// "y.first = 1, y.second = -99"
+//
+// Buggy output is
+// "y.first = -99, y.second = -99"
+//
+#include <stdlib.h>
+struct phalf {
+    __fp16 first;
+    __fp16 second;
+};
+
+struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
+struct phalf phalf_copy(struct phalf* src) {
+    return *src;
+}
+
+int main() {
+    struct phalf x = { 1.0, -99.0};
+    struct phalf y = phalf_copy(&x);
+    if (y.first != 1.0 && y.second != -99.0)
+	abort();
+    return 0;
+}