[v3] x86-64: mingw: Pass and return _Float16 in vector registers [PR115054]

Message ID 20250914005020.29063-1-tmgross@umich.edu
State New

Commit Message

Trevor Gross Sept. 14, 2025, 12:50 a.m. UTC
For MinGW on x86-64, GCC currently passes and returns `_Float16` in
GPRs. Microsoft does not specify an ABI for the type, so this is purely
an extension; however, there are a few reasons the current ABI is not
ideal:

1. `float` and `double` are both passed and returned in xmm registers
   under the MSVC ABI; there is no reason for `_Float16` to deviate.
2. `_Float16` is returned in xmm0 on Windows x86-32 by both GCC and
   Clang.
3. There is a platform-natural ABI with AVX512-FP16, which requires
   half-precision operands to be in vector registers.
4. System V uses vector registers for `_Float16`.

Thus, update the `HFmode` ABI to both pass and return in vector
registers, making its ABI identical to that of `float` and `double`.
This is already Clang's behavior on both its x64 MSVC and MinGW targets,
so the change also resolves an ABI incompatibility between the two
compilers (originally reported in PR115054).
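As a rough illustration of the new argument-passing rule, the following C sketch models how a named scalar argument is assigned a register under the Windows x64 convention after this patch. It is not GCC code. The `sketch_mode` and `sketch_reg` enums and the `ms64_named_arg_reg` function are hypothetical stand-ins for GCC's machine modes and `function_arg_ms_64`. The sketch assumes SSE is enabled and ignores aggregates and unnamed (variadic) arguments.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC machine modes; not GCC's real enum.  */
enum sketch_mode { SK_HFmode, SK_SFmode, SK_DFmode, SK_SImode, SK_DImode };

/* Hypothetical register numbering used only by this sketch.  */
enum sketch_reg
{
  SK_RCX, SK_RDX, SK_R8, SK_R9,
  SK_XMM0, SK_XMM1, SK_XMM2, SK_XMM3,
  SK_STACK
};

/* Register for the named scalar argument at zero-based position POS.
   Windows x64 slots are positional: argument N uses GPR N or XMM N, and
   arguments after the fourth go on the stack.  */
enum sketch_reg
ms64_named_arg_reg (int pos, enum sketch_mode mode)
{
  static const enum sketch_reg gprs[4] = { SK_RCX, SK_RDX, SK_R8, SK_R9 };
  static const enum sketch_reg xmms[4] = { SK_XMM0, SK_XMM1, SK_XMM2, SK_XMM3 };

  assert (pos >= 0);
  if (pos >= 4)
    return SK_STACK;

  switch (mode)
    {
    case SK_HFmode: /* _Float16: moved from GPRs to XMM by this patch.  */
    case SK_SFmode: /* float */
    case SK_DFmode: /* double */
      return xmms[pos];
    default:        /* Integers and pointers stay in GPRs.  */
      return gprs[pos];
    }
}
```

Consider a function such as `pass_f16 (_Float16 x, _Float16 *dst)`. The model places `x` in `SK_XMM0` and `dst` in `SK_RDX`. Under the previous rule, `x` would have been assigned `SK_RCX`.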

The change can be verified by comparing the assembly output with and
without this patch for the following source:

    void pass_f16(_Float16 x, _Float16 *dst) {
        *dst = x;
    }

    void callee_f16(_Float16);
    void call_f16() {
        callee_f16(1.0);
    }

    _Float16 ret_f16(_Float16 *x) {
        return *x;
    }

    /* Check libcall ABI */

    void extend_f16(_Float16 *x, _Float32 *dst) {
        *dst = (_Float32)*x;
    }

    void trunc_f16(_Float32 *x, _Float16 *dst) {
        *dst = (_Float16)*x;
    }

    /* Float varargs should be in vregs with a zeroed shadow GPR */

    void va(_Float16, ...);
    void va_f16() {
        va(1.0f16, 2.0f16, 3.0f16, 4.0f16, 5.0f16);
    }

While modifying the `function_value_ms_64` `switch` statement, a
redundant condition and trailing whitespace in the 16-byte case are
also cleaned up.
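The size-based dispatch in `function_value_ms_64` after this patch can be sketched in C as follows. This is a simplified model rather than GCC's code. The enums, `sketch_mode_size`, and `ms64_return_reg` are hypothetical, SSE is assumed to be enabled, and the 16-byte vector case is omitted. The `case 2` label is what lets the 2-byte `HFmode` reach the xmm0 branch. Without it, `_Float16` falls to the default and stays in RAX.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC machine modes; not GCC's real enum.  */
enum sketch_mode { SK_HFmode, SK_HImode, SK_SFmode, SK_SImode, SK_DFmode, SK_DImode };

/* The two return registers this sketch distinguishes.  */
enum sketch_ret_reg { SK_RAX, SK_XMM0 };

/* Size in bytes of each modeled mode (GET_MODE_SIZE in GCC).  */
static int
sketch_mode_size (enum sketch_mode mode)
{
  switch (mode)
    {
    case SK_HFmode:
    case SK_HImode:
      return 2;
    case SK_SFmode:
    case SK_SImode:
      return 4;
    case SK_DFmode:
    case SK_DImode:
      return 8;
    }
  assert (0 && "unmodeled mode");
  return 0;
}

/* Pick the return register for a value of MODE.  IS_AGGREGATE plays the
   role of AGGREGATE_TYPE_P (valtype): small structs stay in RAX.  */
enum sketch_ret_reg
ms64_return_reg (enum sketch_mode mode, bool is_aggregate)
{
  enum sketch_ret_reg reg = SK_RAX; /* Default, like AX_REG in GCC.  */

  switch (sketch_mode_size (mode))
    {
    case 8:
    case 4:
    case 2: /* Added by this patch so that HFmode is considered.  */
      if (is_aggregate)
        break;
      if (mode == SK_HFmode || mode == SK_SFmode || mode == SK_DFmode)
        reg = SK_XMM0;
      break;
    default:
      break;
    }
  return reg;
}
```

With this model, `ms64_return_reg (SK_HFmode, false)` gives `SK_XMM0`, which matches `ret_f16` returning its `_Float16` in xmm0. A 2-byte aggregate still comes back in RAX.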

2025-09-13  Trevor Gross  <tmgross@umich.edu>

gcc/ChangeLog:

        PR target/115054
        * config/i386/i386.cc (function_arg_ms_64,
        function_value_ms_64): Pass and return _Float16 in vector
        registers on Windows.

Signed-off-by: Trevor Gross <tmgross@umich.edu>
---

* Changes v1->v2: no code changes; only the commit message was updated
* Changes v2->v3: fix a comment typo from the feedback in [1] that was missed in v2

v1: https://inbox.sourceware.org/gcc-patches/20250831215847.64430-1-tmgross@umich.edu/
v2: https://inbox.sourceware.org/gcc-patches/20250913175452.16582-1-tmgross@umich.edu/

I have sample output with and without this patch at [2] for reference.

[1]: https://inbox.sourceware.org/gcc-patches/8fc02832-aac1-4341-987b-849f17281839@gmail.com/
[2]: https://github.com/tgross35/gcc-build-output

 gcc/config/i386/i386.cc | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
  

Comments

LIU Hao Sept. 14, 2025, 4:53 a.m. UTC | #1
On 2025-9-14 08:50, Trevor Gross wrote:
> For MinGW on x86-64, GCC currently passes and returns `_Float16` in
> GPRs. Microsoft does not specify an ABI for the type so this is purely
> an extension; however, there are a few reasons the current ABI is not
> ideal:
> 
> 1. `float` and `double` are both passed and returned in xmm registers
>     under the MSVC ABI, there isn't any reason for `_Float16` to deviate.
> 2. `_Float16` is returned in xmm0 on Windows x86-32 by both GCC and
>     Clang.
> 2. There is a platform-natural ABI with AVX512-FP16, which requires
>     half-precision operands to be in vector registers.
> 3. System V uses vector registers for `_Float16`.
> 

These numbers should read 1, 2, 3, 4 instead of 1, 2, 2, 3.



-- 
Best regards,
LIU Hao
  
Trevor Gross Sept. 14, 2025, 5:35 a.m. UTC | #2
On Sat Sep 13, 2025 at 11:53 PM CDT, LIU Hao wrote:
> these numbers should read 1, 2, 3, 4 instead of  1, 2, 2, 3.

Good catch. Want me to resend or can you fix it before committing?

- Trevor
  
Jonathan Yong Sept. 28, 2025, 2:59 a.m. UTC | #3
On 9/14/25 5:35 AM, Trevor Gross wrote:
> Good catch. Want me to resend or can you fix it before committing?
> 
> - Trevor

Any updates on this?
  
Jonathan Yong Oct. 8, 2025, 12:20 p.m. UTC | #4
On 9/28/25 2:59 AM, Jonathan Yong wrote:
> Any updates on this?
> 

Fixed the minor numbering issue and spacing, bootstrapped on x86_64 Linux,
and pushed to the master branch.

Thanks.
  
Trevor Gross Oct. 8, 2025, 1:30 p.m. UTC | #5
On Wed Oct 8, 2025 at 7:20 AM CDT, Jonathan Yong wrote:
> On 9/28/25 2:59 AM, Jonathan Yong wrote:
>
> Fixed the minor numbering issue and spacing, bootstrapped on x86_64 Linux,
> and pushed to the master branch.
>
> Thanks.

Thank you for reviewing!
  

Patch

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5ef7c315091..d56094e1daa 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -3493,8 +3493,10 @@  function_arg_ms_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
 
   regno = x86_64_ms_abi_int_parameter_registers[cum->regno];
 
-  /* Only floating point modes are passed in anything but integer regs.  */
-  if (TARGET_SSE && (mode == SFmode || mode == DFmode))
+  /* Only floating point modes less than 64 bits are passed in anything but
+     integer regs.  Larger floating point types are excluded as the Windows
+     ABI requires vreg args can be shadowed in GPRs (for red zone / varargs). */
+  if (TARGET_SSE && (mode == HFmode || mode == SFmode || mode == DFmode))
     {
       if (named)
 	{
@@ -4314,9 +4316,8 @@  function_value_ms_64 (machine_mode orig_mode, machine_mode mode,
     {
       switch (GET_MODE_SIZE (mode))
 	{
-	case 16:
+	case 16:	  
 	  if (valtype != NULL_TREE
-	      && !VECTOR_INTEGER_TYPE_P (valtype)
 	      && !VECTOR_INTEGER_TYPE_P (valtype)
 	      && !INTEGRAL_TYPE_P (valtype)
 	      && !VECTOR_FLOAT_TYPE_P (valtype))
@@ -4327,9 +4328,10 @@  function_value_ms_64 (machine_mode orig_mode, machine_mode mode,
 	  break;
 	case 8:
 	case 4:
+	case 2:
 	  if (valtype != NULL_TREE && AGGREGATE_TYPE_P (valtype))
 	    break;
-	  if (mode == SFmode || mode == DFmode)
+	  if (mode == HFmode || mode == SFmode || mode == DFmode)
 	    regno = FIRST_SSE_REG;
 	  break;
 	default: