powerpc: Add a POWER8 implementation for GET|SET_FLOAT_WORD
Commit Message
Provides a POWER8-specific implementation for GET_FLOAT_WORD and
SET_FLOAT_WORD that is able to extract or set a float using only VSR and
GPR.
2016-12-20 Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc64/power8/fpu/math_private.h (GET_FLOAT_WORD):
(SET_FLOAT_WORD): New macros.
---
.../powerpc/powerpc64/power8/fpu/math_private.h | 48 ++++++++++++++++++++++
1 file changed, 48 insertions(+)
create mode 100644 sysdeps/powerpc/powerpc64/power8/fpu/math_private.h
Comments
On Tue, 20 Dec 2016, Tulio Magno Quites Machado Filho wrote:
> Provides a POWER8-specific implementation for GET_FLOAT_WORD and
> SET_FLOAT_WORD that is able to extract or set a float using only VSR and
> GPR.
Why doesn't the compiler handle this automatically when building for
POWER8? My comments in
<https://sourceware.org/ml/libc-alpha/2016-06/msg01180.html> apply equally
here (but reinterpretation of a bit pattern between integer and float is
more generic than bitwise masking of a float, so a wider range of code is
likely to benefit from such a compiler optimization).
On 20/12/2016 16:46, Joseph Myers wrote:
> On Tue, 20 Dec 2016, Tulio Magno Quites Machado Filho wrote:
>
>> Provides a POWER8-specific implementation for GET_FLOAT_WORD and
>> SET_FLOAT_WORD that is able to extract or set a float using only VSR and
>> GPR.
>
> Why doesn't the compiler handle this automatically when building for
> POWER8? My comments in
> <https://sourceware.org/ml/libc-alpha/2016-06/msg01180.html> apply equally
> here (but reinterpretation of a bit pattern between integer and float is
> more generic than bitwise masking of a float, so a wider range of code is
> likely to benefit from such a compiler optimization).
>
It looks like GCC, at least on master (7.0.0), generates awful code for
this conversion. It does use xscvdpuxws+mfvsrd, but it still stores and
loads the float through memory, as expected for pre-ISA 2.07 targets, plus
a lot of superfluous precision instructions. Double precision to 64-bit
integer is somewhat better (mtvsrd plus 2 fcfidu), but the reverse (64-bit
integer to double precision) is also far from optimal.
And I agree with Joseph here: these optimizations should be handled
by the compiler.
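For reference, the generic reinterpretation of a bit pattern between float and integer that the comments above refer to can be sketched in portable C. This is only an illustration, not the generic glibc macro: the names float_to_word and word_to_float are hypothetical, and the code assumes that float is a 32-bit IEEE 754 binary32 value. Written this way, the choice between a store/load pair and a direct register move is left entirely to the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of F as a 32-bit word.
   Assumes float is IEEE 754 binary32 (4 bytes).  */
static uint32_t
float_to_word (float f)
{
  uint32_t w;
  memcpy (&w, &f, sizeof w);	/* Bitwise copy, no value conversion.  */
  return w;
}

/* Hypothetical helper: build a float from the 32-bit pattern W.  */
static float
word_to_float (uint32_t w)
{
  float f;
  memcpy (&f, &w, sizeof f);	/* Bitwise copy, no value conversion.  */
  return f;
}
```

With these helpers, float_to_word (1.0f) yields 0x3f800000 and word_to_float (0x3f000000) yields 0.5f on such a target.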
new file mode 100644
@@ -0,0 +1,48 @@
+/* Private inline math functions for POWER8.
+ Copyright (C) 2016 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Move a float to a GPR using VSX instructions available in the POWER
+ ISA 2.07. */
+#define GET_FLOAT_WORD(i, d) \
+  do { \
+    float tmpd = (d); \
+    double tmp; \
+    long tmpi; \
+    __asm__ ("xscvdpspn %x1, %x2\n\t" \
+             "mfvsrd %0, %x1\n\t" \
+             : "=wr" (tmpi), \
+               "=wa" (tmp) \
+             : "wa" (tmpd) ); \
+    (i) = tmpi >> 32; \
+  } while (0)
+
+/* Move a float from a GPR using VSX instructions available in the POWER
+   ISA 2.07.  */
+#define SET_FLOAT_WORD(d, i) \
+  do { \
+    long tmpi = (i); \
+    float tmpd; \
+    tmpi = tmpi << 32; \
+    __asm__ ("mtvsrd %x0, %1\n\t" \
+             "xscvspdpn %x0, %x0\n\t" \
+             : "=wa" (tmpd) \
+             : "wr" (tmpi) ); \
+    (d) = tmpd; \
+  } while (0)
+
+#include_next <math_private.h>
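
The 32-bit shifts in the two macros reflect where the single-precision pattern sits in the 64-bit doubleword moved between the VSR and the GPR: in its upper half. A rough portable model of that data flow, assuming float is IEEE 754 binary32, might look like the following. The names model_get_float_word and model_set_float_word are hypothetical, and the lower half of the doubleword is simply modeled as zero because the macros never use it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of GET_FLOAT_WORD.  After xscvdpspn the
   single-precision pattern is in the upper 32 bits of the doubleword
   that mfvsrd copies to the GPR; the lower 32 bits are modeled as 0.  */
static uint32_t
model_get_float_word (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  uint64_t gpr = (uint64_t) bits << 32;	/* Modeled GPR after mfvsrd.  */
  return (uint32_t) (gpr >> 32);	/* The macro's tmpi >> 32.  */
}

/* Hypothetical model of SET_FLOAT_WORD.  The word is shifted into the
   upper 32 bits before mtvsrd, which is where xscvspdpn expects its
   single-precision input.  */
static float
model_set_float_word (uint32_t word)
{
  uint64_t gpr = (uint64_t) word << 32;	/* The macro's tmpi << 32.  */
  uint32_t bits = (uint32_t) (gpr >> 32);	/* Word seen by xscvspdpn.  */
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

The model uses unsigned arithmetic throughout, so it does not depend on how the compiler treats shifts of negative signed values.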