From patchwork Thu Sep 23 21:19:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joseph Myers <joseph@codesourcery.com>
X-Patchwork-Id: 45401
Date: Thu, 23 Sep 2021 21:19:10 +0000
From: Joseph Myers <joseph@codesourcery.com>
X-X-Sender: jsm28@digraph.polyomino.org.uk
To: <libc-alpha@sourceware.org>
Subject: Fix ffma use of round-to-odd on x86 [committed]
Message-ID: 
 <alpine.DEB.2.22.394.2109232118480.786440@digraph.polyomino.org.uk>

On 32-bit x86 with -mfpmath=sse, and on x86_64 with
--disable-multi-arch, the tests of ffma and its aliases (fma narrowing
from binary64 to binary32) fail.  This is probably the issue reported
by H.J. in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>.

The problem is the use of fenv_private.h macros in the round-to-odd
implementation.  Those macros are set up to manipulate only one of the
SSE and 387 floating-point states, whichever is relevant for the type
indicated by the suffix on the macro name.  But x86 configurations
sometimes use the ldbl-96 implementation of binary64 fma, contrary to
the expectations of those macros for functions operating on double
when __SSE2_MATH__ is defined.  That is where --disable-multi-arch is
relevant for x86_64: it causes the ldbl-96 implementation to be used,
instead of an IFUNC implementation that falls back to the dbl-64
version.

This can be addressed by giving x86 its own version of s_ffma.c that
uses the default versions of those macros.  The same is already done
for the *f128 macro variants, where which floating-point state is
affected by _Float128 arithmetic depends on the details of how GCC was
configured when building libgcc.  The issue only applies when
__SSE2_MATH__ is defined, and doesn't apply when __FP_FAST_FMA is
defined: in that case, fma will be inlined by the compiler, meaning
it's definitely an SSE operation.  For the same reason, this is not an
issue for narrowing sqrt, as hardware sqrt is always inlined in that
implementation for x86.  In other cases it's safest to use the default
versions of the fenv_private.h macros to ensure things work whichever
fma implementation is used.

Tested for x86_64 (with and without --disable-multi-arch) and x86
(with and without -mfpmath=sse).
---

Committed.

diff --git a/sysdeps/x86/fpu/s_ffma.c b/sysdeps/x86/fpu/s_ffma.c
new file mode 100644
index 0000000000..95c2dcd7b7
--- /dev/null
+++ b/sysdeps/x86/fpu/s_ffma.c
@@ -0,0 +1,46 @@
+/* Fused multiply-add of double value, narrowing the result to float.
+   x86 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define f32fmaf64 __hide_f32fmaf64
+#define f32fmaf32x __hide_f32fmaf32x
+#define ffmal __hide_ffmal
+#include <math.h>
+#undef f32fmaf64
+#undef f32fmaf32x
+#undef ffmal
+
+#include <math-narrow.h>
+
+#if defined __SSE2_MATH__ && !defined __FP_FAST_FMA
+/* Depending on the details of the glibc configuration, fma might use
+   either SSE or 387 arithmetic; ensure that both parts of the
+   floating-point state are handled in the round-to-odd code.  */
+# undef libc_feholdexcept_setround
+# define libc_feholdexcept_setround	default_libc_feholdexcept_setround
+# undef libc_feupdateenv_test
+# define libc_feupdateenv_test		default_libc_feupdateenv_test
+#endif
+
+float
+__ffma (double x, double y, double z)
+{
+  NARROW_FMA_ROUND_TO_ODD (x, y, z, float, union ieee754_double, , mantissa1,
+			   false);
+}
+libm_alias_float_double (fma)