[3/4] Add ILP32 support to aarch64

Message ID 1501888532.3962.92.camel@cavium.com
State New, archived
Headers

Commit Message

Steve Ellcey Aug. 4, 2017, 11:15 p.m. UTC
  On Fri, 2017-08-04 at 00:12 +0000, Joseph Myers wrote:
> On Thu, 3 Aug 2017, Wilco Dijkstra wrote:
> 
> > The generic implementation may well be faster... I'm not sure where the
> > requirement of not raising inexact comes from (I don't see it in the definition
> > of lrint, and we generally don't care since inexact is set by almost every FP
> > calculation), but if it is absolutely required you'd special case values larger
> > than LONG_MAX.
> The requirement comes from lrint being bound to IEEE 754 conversion 
> operations, so only raising inexact under the conditions specified and no 
> spurious inexact.


Here is a new version of this patch.  It (mostly) avoids fenv calls
when not needed and preserves any exceptions that may be set on entry
to the function.

Steve Ellcey
sellcey@cavium.com


2017-08-04  Steve Ellcey  <sellcey@cavium.com>

	* sysdeps/aarch64/fpu/s_llrint.c (OREG_SIZE): New macro.
	* sysdeps/aarch64/fpu/s_llround.c (OREG_SIZE): Likewise.
	* sysdeps/aarch64/fpu/s_llrintf.c (OREGS, IREGS): Remove.
	(IREG_SIZE, OREG_SIZE): New macros.
	* sysdeps/aarch64/fpu/s_llroundf.c: (OREGS, IREGS): Remove.
	(IREG_SIZE, OREG_SIZE): New macros.
	* sysdeps/aarch64/fpu/s_lrintf.c (IREGS): Remove.
	(IREG_SIZE): New macro.
	* sysdeps/aarch64/fpu/s_lroundf.c (IREGS): Remove.
	(IREG_SIZE): New macro.
	* sysdeps/aarch64/fpu/s_lrint.c (math_private.h, fenv.h, stdint.h):
	New includes.
	(IREG_SIZE, OREG_SIZE): Initialize if not already set.
	(OREGS, IREGS): Set based on IREG_SIZE and OREG_SIZE.
	(__CONCATX): Handle exceptions correctly on large values that may
	set FE_INVALID.
	* sysdeps/aarch64/fpu/s_lround.c (IREG_SIZE, OREG_SIZE):
	Initialize if not already set.
        (OREGS, IREGS): Set based on IREG_SIZE and OREG_SIZE.
  

Comments

Szabolcs Nagy Aug. 8, 2017, 3:01 p.m. UTC | #1
On 05/08/17 00:15, Steve Ellcey wrote:
> On Fri, 2017-08-04 at 00:12 +0000, Joseph Myers wrote:
>> > On Thu, 3 Aug 2017, Wilco Dijkstra wrote:
>> > 
>>> > > The generic implementation may well be faster... I'm not sure where the
>>> > > requirement of not raising inexact comes from (I don't see it in the definition
>>> > > of lrint, and we generally don't care since inexact is set by almost every FP
>>> > > calculation), but if it is absolutely required you'd special case values larger
>>> > > than LONG_MAX.
>> > The requirement comes from lrint being bound to IEEE 754 conversion 
>> > operations, so only raising inexact under the conditions specified and no 
>> > spurious inexact.
> 
> Here is a new version of this patch.  It (mostly) avoids fenv calls
> when not needed and preserves any exceptions that may be set on entry
> to the function.
> 
...
> +#if IREG_SIZE == 64 && OREG_SIZE == 32
> +  if (__builtin_fabs (x) > INT32_MAX - 2)

i don't understand the -2 here.

> +    {
> +      /* Converting large values to a 32 bit in may cause the frintx/fcvtza

s/in/int/

> +	 sequence to set both FE_INVALID and FE_INEXACT.  To avoid this
> +         we save and restore the FE and only set one or the other.  */
> +
> +      fenv_t env;
> +      bool invalid_p, inexact_p;
> +
> +      libc_feholdexcept (&env);
> +      asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
> +	    "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
> +	    : "=r" (result), "=w" (temp) : "w" (x) );
> +      invalid_p = libc_fetestexcept (FE_INVALID);
> +      inexact_p = libc_fetestexcept (FE_INEXACT);

multiple flags can be tested/raised in a single call.

> +      libc_fesetenv (&env);
> +
> +      if (invalid_p)
> +	feraiseexcept (FE_INVALID);
> +      else if (inexact_p)
> +	feraiseexcept (FE_INEXACT);
> +

i think correct trapping is not guaranteed by glibc,
only correct status flags when the function returns,
so spurious inexact is not a problem if it is already
raised, and then i expect better code gen for the
inexact clearing approach:

if (fabs (x) > INT32_MAX && fetestexcept (FE_INEXACT) == 0)
  {
    asm (...);
    if (fetestexcept (FE_INVALID|FE_INEXACT) == (FE_INVALID|FE_INEXACT))
      feclearexcept (FE_INEXACT);
  }
else
  asm (...);
  
Szabolcs Nagy Aug. 8, 2017, 3:22 p.m. UTC | #2
On 08/08/17 16:01, Szabolcs Nagy wrote:
> On 05/08/17 00:15, Steve Ellcey wrote:
>> On Fri, 2017-08-04 at 00:12 +0000, Joseph Myers wrote:
>>>> On Thu, 3 Aug 2017, Wilco Dijkstra wrote:
>>>>
>>>>>> The generic implementation may well be faster... I'm not sure where the
>>>>>> requirement of not raising inexact comes from (I don't see it in the definition
>>>>>> of lrint, and we generally don't care since inexact is set by almost every FP
>>>>>> calculation), but if it is absolutely required you'd special case values larger
>>>>>> than LONG_MAX.
>>>> The requirement comes from lrint being bound to IEEE 754 conversion 
>>>> operations, so only raising inexact under the conditions specified and no 
>>>> spurious inexact.
>>
>> Here is a new version of this patch.  It (mostly) avoids fenv calls
>> when not needed and preserves any exceptions that may be set on entry
>> to the function.
>>
> ...
>> +#if IREG_SIZE == 64 && OREG_SIZE == 32
>> +  if (__builtin_fabs (x) > INT32_MAX - 2)
> 
> i don't understand the -2 here.
> 
>> +    {
>> +      /* Converting large values to a 32 bit in may cause the frintx/fcvtza
> 
> s/in/int/
> 
>> +	 sequence to set both FE_INVALID and FE_INEXACT.  To avoid this
>> +         we save and restore the FE and only set one or the other.  */
>> +
>> +      fenv_t env;
>> +      bool invalid_p, inexact_p;
>> +
>> +      libc_feholdexcept (&env);
>> +      asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
>> +	    "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
>> +	    : "=r" (result), "=w" (temp) : "w" (x) );
>> +      invalid_p = libc_fetestexcept (FE_INVALID);
>> +      inexact_p = libc_fetestexcept (FE_INEXACT);
> 
> multiple flags can be tested/raised in a single call.
> 
>> +      libc_fesetenv (&env);
>> +
>> +      if (invalid_p)
>> +	feraiseexcept (FE_INVALID);
>> +      else if (inexact_p)
>> +	feraiseexcept (FE_INEXACT);
>> +
> 
> i think correct trapping is not guaranteed by glibc,
> only correct status flags when the function returns,
> so spurious inexact is not a problem if it is already
> raised, and then i expect better code gen for the
> inexact clearing approach:
> 
> if (fabs (x) > INT32_MAX && fetestexcept (FE_INEXACT) == 0)
>   {
>     asm (...);
>     if (fetestexcept (FE_INVALID|FE_INEXACT) == (FE_INVALID|FE_INEXACT))
>       feclearexcept (FE_INEXACT);

Wilco pointed out to me that this approach would be
more complicated because invalid may be already raised
so you need to check that too, clear it if it's set
and restore it at the end..

>   }
> else
>   asm (...);
> 
>
  
Joseph Myers Aug. 8, 2017, 5:21 p.m. UTC | #3
On Tue, 8 Aug 2017, Szabolcs Nagy wrote:

> i think correct trapping is not guaranteed by glibc,
> only correct status flags when the function returns,
> so spurious inexact is not a problem if it is already
> raised, and then i expect better code gen for the
> inexact clearing approach:

Since we have APIs for enabling / disabling exception traps, I think it's 
expected that any spurious exceptions raised internally will be raised 
inside an feholdexcept context so the user doesn't see traps.  (We do not 
claim anything about the number of times a given exception is raised 
within a function, beyond whether it's zero or nonzero, or about the order 
in which different exceptions are raised by a function raising multiple 
exceptions, or about which subexceptions are raised on architectures such 
as powerpc that support subexceptions.)
  

Patch

diff --git a/sysdeps/aarch64/fpu/s_llrint.c b/sysdeps/aarch64/fpu/s_llrint.c
index c0d0d0e..57821c0 100644
--- a/sysdeps/aarch64/fpu/s_llrint.c
+++ b/sysdeps/aarch64/fpu/s_llrint.c
@@ -18,4 +18,5 @@ 
 
 #define FUNC llrint
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include <s_lrint.c>
diff --git a/sysdeps/aarch64/fpu/s_llrintf.c b/sysdeps/aarch64/fpu/s_llrintf.c
index 67724c6..98ed4f8 100644
--- a/sysdeps/aarch64/fpu/s_llrintf.c
+++ b/sysdeps/aarch64/fpu/s_llrintf.c
@@ -18,6 +18,7 @@ 
 
 #define FUNC llrintf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include <s_lrint.c>
diff --git a/sysdeps/aarch64/fpu/s_llround.c b/sysdeps/aarch64/fpu/s_llround.c
index ed4b192..ef7aedf 100644
--- a/sysdeps/aarch64/fpu/s_llround.c
+++ b/sysdeps/aarch64/fpu/s_llround.c
@@ -18,4 +18,5 @@ 
 
 #define FUNC llround
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include <s_lround.c>
diff --git a/sysdeps/aarch64/fpu/s_llroundf.c b/sysdeps/aarch64/fpu/s_llroundf.c
index 360ce8b..294f0f4 100644
--- a/sysdeps/aarch64/fpu/s_llroundf.c
+++ b/sysdeps/aarch64/fpu/s_llroundf.c
@@ -18,6 +18,7 @@ 
 
 #define FUNC llroundf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #define OTYPE long long int
+#define OREG_SIZE 64
 #include <s_lround.c>
diff --git a/sysdeps/aarch64/fpu/s_lrint.c b/sysdeps/aarch64/fpu/s_lrint.c
index 8c61a03..19f9b5b 100644
--- a/sysdeps/aarch64/fpu/s_lrint.c
+++ b/sysdeps/aarch64/fpu/s_lrint.c
@@ -16,7 +16,10 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <math_private.h>
 #include <math.h>
+#include <fenv.h>
+#include <stdint.h>
 
 #ifndef FUNC
 # define FUNC lrint
@@ -24,18 +27,37 @@ 
 
 #ifndef ITYPE
 # define ITYPE double
-# define IREGS "d"
+# define IREG_SIZE 64
 #else
-# ifndef IREGS
-#  error IREGS not defined
+# ifndef IREG_SIZE
+#  error IREG_SIZE not defined
 # endif
 #endif
 
 #ifndef OTYPE
 # define OTYPE long int
+# ifdef __ILP32__
+#  define OREG_SIZE 32
+# else
+#  define OREG_SIZE 64
+# endif
+#else
+# ifndef OREG_SIZE
+#  error OREG_SIZE not defined
+# endif
+#endif
+
+#if IREG_SIZE == 32
+# define IREGS "s"
+#else
+# define IREGS "d"
 #endif
 
-#define OREGS "x"
+#if OREG_SIZE == 32
+# define OREGS "w"
+#else
+# define OREGS "x"
+#endif
 
 #define __CONCATX(a,b) __CONCAT(a,b)
 
@@ -44,6 +66,33 @@  __CONCATX(__,FUNC) (ITYPE x)
 {
   OTYPE result;
   ITYPE temp;
+
+#if IREG_SIZE == 64 && OREG_SIZE == 32
+  if (__builtin_fabs (x) > INT32_MAX - 2)
+    {
+      /* Converting large values to a 32 bit in may cause the frintx/fcvtza
+	 sequence to set both FE_INVALID and FE_INEXACT.  To avoid this
+         we save and restore the FE and only set one or the other.  */
+
+      fenv_t env;
+      bool invalid_p, inexact_p;
+
+      libc_feholdexcept (&env);
+      asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
+	    "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
+	    : "=r" (result), "=w" (temp) : "w" (x) );
+      invalid_p = libc_fetestexcept (FE_INVALID);
+      inexact_p = libc_fetestexcept (FE_INEXACT);
+      libc_fesetenv (&env);
+
+      if (invalid_p)
+	feraiseexcept (FE_INVALID);
+      else if (inexact_p)
+	feraiseexcept (FE_INEXACT);
+
+      return result;
+  }
+#endif
   asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
         "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
         : "=r" (result), "=w" (temp) : "w" (x) );
diff --git a/sysdeps/aarch64/fpu/s_lrintf.c b/sysdeps/aarch64/fpu/s_lrintf.c
index a995e4b..2e73271 100644
--- a/sysdeps/aarch64/fpu/s_lrintf.c
+++ b/sysdeps/aarch64/fpu/s_lrintf.c
@@ -18,5 +18,5 @@ 
 
 #define FUNC lrintf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #include <s_lrint.c>
diff --git a/sysdeps/aarch64/fpu/s_lround.c b/sysdeps/aarch64/fpu/s_lround.c
index 9be9e7f..1f77d82 100644
--- a/sysdeps/aarch64/fpu/s_lround.c
+++ b/sysdeps/aarch64/fpu/s_lround.c
@@ -24,18 +24,37 @@ 
 
 #ifndef ITYPE
 # define ITYPE double
-# define IREGS "d"
+# define IREG_SIZE 64
 #else
-# ifndef IREGS
-#  error IREGS not defined
+# ifndef IREG_SIZE
+#  error IREG_SIZE not defined
 # endif
 #endif
 
 #ifndef OTYPE
 # define OTYPE long int
+# ifdef __ILP32__
+#  define OREG_SIZE 32
+# else
+#  define OREG_SIZE 64
+# endif
+#else
+# ifndef OREG_SIZE
+#  error OREG_SIZE not defined
+# endif
+#endif
+
+#if IREG_SIZE == 32
+# define IREGS "s"
+#else
+# define IREGS "d"
 #endif
 
-#define OREGS "x"
+#if OREG_SIZE == 32
+# define OREGS "w"
+#else
+# define OREGS "x"
+#endif
 
 #define __CONCATX(a,b) __CONCAT(a,b)
 
diff --git a/sysdeps/aarch64/fpu/s_lroundf.c b/sysdeps/aarch64/fpu/s_lroundf.c
index 4a066d4..b30ddb6 100644
--- a/sysdeps/aarch64/fpu/s_lroundf.c
+++ b/sysdeps/aarch64/fpu/s_lroundf.c
@@ -18,5 +18,5 @@ 
 
 #define FUNC lroundf
 #define ITYPE float
-#define IREGS "s"
+#define IREG_SIZE 32
 #include <s_lround.c>