[v2] x86-64: Replace `and %ah` write with `test %ah` read

Message ID 20230313205056.419587-1-goldstein.w.n@gmail.com
State New
Series [v2] x86-64: Replace `and %ah` write with `test %ah` read

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Noah Goldstein March 13, 2023, 8:50 p.m. UTC
  High8 partial registers can incur a stall when modified (if not
renamed separately), or at the very least incur extra backend uops (if
renamed separately). Either way, the read-only `testb $0x4, %ah` is
preferable to `andb $0x4, %ah`.

Function size is unchanged when accounting for 16-byte padding.
---
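A minimal C sketch of why the swap is behavior-preserving here, under the
assumption that only the zero flag is consumed afterwards (the `jnz` in the
diff). The `struct high8_model` type and the `model_*` / `zf_agrees` names are
hypothetical and exist only for illustration; this models the architectural
result, not the microarchitectural cost the commit message is about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: just the register byte and the flag that matter here. */
struct high8_model {
    uint8_t ah; /* high byte of %ax */
    int zf;     /* zero flag */
};

/* and $imm, %ah: writes the result back to AH and sets ZF from it. */
void model_and(struct high8_model *m, uint8_t imm) {
    m->ah = (uint8_t)(m->ah & imm);
    m->zf = (m->ah == 0);
}

/* test $imm, %ah: sets ZF from the same AND but leaves AH untouched. */
void model_test(struct high8_model *m, uint8_t imm) {
    m->zf = ((m->ah & imm) == 0);
}

/* Returns 1 if both forms give the same ZF for every possible AH value
   and the test form never modifies AH. */
int zf_agrees(uint8_t imm) {
    for (unsigned v = 0; v < 256; v++) {
        struct high8_model a = { (uint8_t)v, 0 };
        struct high8_model t = { (uint8_t)v, 0 };
        model_and(&a, imm);
        model_test(&t, imm);
        if (a.zf != t.zf || t.ah != (uint8_t)v)
            return 0;
    }
    return 1;
}

/* Convenience wrapper: aborts if the model ever disagrees for this mask. */
void check_mask(uint8_t imm) {
    assert(zf_agrees(imm));
}
```

Calling `check_mask(0x4)` exercises the mask used in the patch. Note also that
the old immediate `$04` (octal) and the new `$0x4` are the same value, so only
the instruction changes, not the bit being tested.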
 sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

H.J. Lu March 14, 2023, 2 a.m. UTC | #1
On Mon, Mar 13, 2023 at 1:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> High8 partial registers can incur a stall when modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way, the read-only `testb $0x4, %ah` is
> preferable to `andb $0x4, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..e9a76178f9 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>         fldt    8(%rsp)
>  1:     fprem
>         fstsw   %ax
> -       and     $04,%ah
> +       testb   $0x4,%ah
>         jnz     1b
>         fstp    %st(1)
>         ret
> --
> 2.34.1
>

OK.

Thanks.
  

Patch

diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..e9a76178f9 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@  ENTRY(__ieee754_fmodl)
 	fldt	8(%rsp)
 1:	fprem
 	fstsw	%ax
-	and	$04,%ah
+	testb	$0x4,%ah
 	jnz	1b
 	fstp	%st(1)
 	ret
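
For context on the mask in the hunk above: `fprem` leaves the C2 condition
flag set while the reduction is still incomplete, and after `fstsw %ax` C2
(bit 10 of the x87 status word) sits in bit 2 of `%ah`, hence `$0x4`. The
sketch below checks that bit arithmetic in portable C; the macro and function
names are hypothetical, not taken from glibc.

```c
#include <stdint.h>

#define X87_SW_C2 0x0400u /* C2 is bit 10 of the x87 status word */

/* Full-width check: tests bit 10 of the 16-bit status word. */
int c2_set_word(uint16_t sw) {
    return (sw & X87_SW_C2) != 0;
}

/* High-byte check: the view that a test of 0x4 against %ah has. */
int c2_set_high_byte(uint16_t sw) {
    uint8_t ah = (uint8_t)(sw >> 8); /* %ah holds status-word bits 8..15 */
    return (ah & 0x4u) != 0;
}

/* Returns 1 if the two checks agree for every 16-bit status word. */
int c2_checks_agree(void) {
    for (uint32_t sw = 0; sw <= 0xFFFFu; sw++) {
        if (c2_set_word((uint16_t)sw) != c2_set_high_byte((uint16_t)sw))
            return 0;
    }
    return 1;
}
```

The loop in `__ieee754_fmodl` therefore repeats `fprem` until this bit reads
as clear, which is why only the flag result of the test is needed and the
write to `%ah` was never required.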