[2/2] soft-fp: Add the lack of implementation for 128 bit self-contained

Message ID 04da5a0bfe2816e47c7b52c7e82dc6ec87db7527.1539595555.git.zongbox@gmail.com
State New, archived
Headers

Commit Message

Zong Li Oct. 15, 2018, 11:52 a.m. UTC
  Here only add the lack of implementation when building the RISC-V 32-bit
port.

These marcos are used when the following situations occur at the same
time: soft-fp fma, ldbl-128 and 32-bit _FP_W_TYPE_SIZE. The RISC-V
32-bit port is the first port which use all three together.

This is the building flow about the situation:
When building soft-fp/s_fmal.c, there uses the FP_FMA_Q in __fmal.
The _FP_W_TYPE_SIZE is defined to 32-bit in sysdeps/riscv/sfp-machine.h,
so the FP_FMA_Q was defined to _FP_FMA (Q, 4, 8, R, X, Y, Z) in
soft-fp/quad.h.

Something in the soft-fp/quad.h:
 #if _FP_W_TYPE_SIZE < 64
    # define FP_FMA_Q(R, X, Y, Z)    _FP_FMA (Q, 4, 8, R, X, Y, Z)
 #else
    # define FP_FMA_Q(R, X, Y, Z)    _FP_FMA (Q, 2, 4, R, X, Y, Z)
 #endif

Finally, in _FP_FMA (fs, wc, dwc, R, X, Y, Z), it will use the
_FP_FRAC_HIGHBIT_DW_##dwc macro, and it will be expanded to
_FP_FRAC_HIGHBIT_DW_8, but the _FP_FRAC_HIGHBIT_DW_8 is not be
implemented in soft-fp/op-8.h. there is only _FP_FRAC_HIGHBIT_DW_1,
_FP_FRAC_HIGHBIT_DW_2 and _FP_FRAC_HIGHBIT_DW_4 in the
soft-fp/op-*.h.

After this modification, we can pass the soft floating testing of glibc
testsuites on RV32.

	* soft-fp/op-8.h: Add macros for RV32 use.
---
 ChangeLog      |   1 +
 soft-fp/op-8.h | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)
  

Comments

Joseph Myers Oct. 15, 2018, 3:10 p.m. UTC | #1
On Mon, 15 Oct 2018, Zong Li wrote:

> Here only add the lack of implementation when building the RISC-V 32-bit
> port.

You're adding an implementation, not adding a lack of implementation 
(you're adding the implementation of the pieces that are needed by the 
RISC-V 32-bit port).  The same point applies to the proposed summary line 
for the commit.

> These marcos are used when the following situations occur at the same

s/marcos/macros/

> +	* soft-fp/op-8.h: Add macros for RV32 use.

Please see the GNU Coding Standards for how to write ChangeLog entries.  
In this case, you need to list all the new macros individually.

> +#define _FP_FRAC_CLZ_8(R, X)                    \

A previous review comment suggested a loop here.  You expressed a 
performance concern in 
<https://sourceware.org/ml/libc-alpha/2018-07/msg00929.html>.  Have you 
done any benchmarking since then to justify the concern and not making the 
suggested change?
  
Zong Li Oct. 15, 2018, 4:47 p.m. UTC | #2
Joseph Myers <joseph@codesourcery.com> 於 2018年10月15日 週一 下午11:10寫道:
>
> On Mon, 15 Oct 2018, Zong Li wrote:
>
> > Here only add the lack of implementation when building the RISC-V 32-bit
> > port.
>
> You're adding an implementation, not adding a lack of implementation
> (you're adding the implementation of the pieces that are needed by the
> RISC-V 32-bit port).  The same point applies to the proposed summary line
> for the commit.

I see, I will change that. I use 'lack' word because there are the
implementations
for different size in other op-*.h files, but not in op-8.h

> > These marcos are used when the following situations occur at the same
>
> s/marcos/macros/
>
> > +     * soft-fp/op-8.h: Add macros for RV32 use.
>
> Please see the GNU Coding Standards for how to write ChangeLog entries.
> In this case, you need to list all the new macros individually.
>

Ok, I modify it together in next version.

> > +#define _FP_FRAC_CLZ_8(R, X)                    \
>
> A previous review comment suggested a loop here.  You expressed a
> performance concern in
> <https://sourceware.org/ml/libc-alpha/2018-07/msg00929.html>.  Have you
> done any benchmarking since then to justify the concern and not making the
> suggested change?
>

No, I didn't run some benchmarks. I worry about the performance
because I saw the
code generation got more one conditional branch for the looping for
every comparisons.
The two kinds of implementation are fine for me, but there are nobody
to discussion that,
so I retained the original one.
  
Jim Wilson Oct. 15, 2018, 5 p.m. UTC | #3
On Mon, Oct 15, 2018 at 9:48 AM Zong Li <zongbox@gmail.com> wrote:
> > > Here only add the lack of implementation when building the RISC-V 32-bit
> > > port.

"Add the missing implementation ..." would be one way to rewrite this
in better English.

Jim
  
Zong Li Oct. 16, 2018, 1:19 a.m. UTC | #4
Jim Wilson <jimw@sifive.com> 於 2018年10月16日 週二 上午1:00寫道:
>
> On Mon, Oct 15, 2018 at 9:48 AM Zong Li <zongbox@gmail.com> wrote:
> > > > Here only add the lack of implementation when building the RISC-V 32-bit
> > > > port.
>
> "Add the missing implementation ..." would be one way to rewrite this
> in better English.
>
Hi Jim,

Thanks, it is another good choice.
  
Zong Li Oct. 22, 2018, 7:48 a.m. UTC | #5
Zong Li <zongbox@gmail.com> 於 2018年10月16日 週二 上午12:47寫道:
>
> Joseph Myers <joseph@codesourcery.com> 於 2018年10月15日 週一 下午11:10寫道:
> >
> > On Mon, 15 Oct 2018, Zong Li wrote:
> >
> > > Here only add the lack of implementation when building the RISC-V 32-bit
> > > port.
> >
> > You're adding an implementation, not adding a lack of implementation
> > (you're adding the implementation of the pieces that are needed by the
> > RISC-V 32-bit port).  The same point applies to the proposed summary line
> > for the commit.
>
> I see, I will change that. I use 'lack' word because there are the
> implementations
> for different size in other op-*.h files, but not in op-8.h
>
> > > These marcos are used when the following situations occur at the same
> >
> > s/marcos/macros/
> >
> > > +     * soft-fp/op-8.h: Add macros for RV32 use.
> >
> > Please see the GNU Coding Standards for how to write ChangeLog entries.
> > In this case, you need to list all the new macros individually.
> >
>
> Ok, I modify it together in next version.
>
> > > +#define _FP_FRAC_CLZ_8(R, X)                    \
> >
> > A previous review comment suggested a loop here.  You expressed a
> > performance concern in
> > <https://sourceware.org/ml/libc-alpha/2018-07/msg00929.html>.  Have you
> > done any benchmarking since then to justify the concern and not making the
> > suggested change?
> >
>
> No, I didn't run some benchmarks. I worry about the performance
> because I saw the
> code generation got more one conditional branch for the looping for
> every comparisons.
> The two kinds of implementation are fine for me, but there are nobody
> to discussion that,
> so I retained the original one.

Has anyone had some suggestions? I want to ensure that we can get the
suitable implementation for glibc before I summit next version. It
seem that the looping form is more simplified and without critical
side effect, If there is no any comments, I will use the looping form
for next version.
  

Patch

diff --git a/ChangeLog b/ChangeLog
index bb30093..47d3cf2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,6 +1,7 @@ 
 2018-10-15  Zong Li  <zongbox@gmail.com>
 
 	* soft-fp/op-4.h: Fix wrong calculation of division.
+	* soft-fp/op-8.h: Add macros for RV32 use.
 
 2018-10-14  Paul Eggert  <eggert@cs.ucla.edu>
 
diff --git a/soft-fp/op-8.h b/soft-fp/op-8.h
index ffed258..45b019e 100644
--- a/soft-fp/op-8.h
+++ b/soft-fp/op-8.h
@@ -35,6 +35,7 @@ 
 /* We need just a few things from here for op-4, if we ever need some
    other macros, they can be added.  */
 #define _FP_FRAC_DECL_8(X)	_FP_W_TYPE X##_f[8]
+#define _FP_FRAC_SET_8(X, I)    __FP_FRAC_SET_8 (X, I)
 #define _FP_FRAC_HIGH_8(X)	(X##_f[7])
 #define _FP_FRAC_LOW_8(X)	(X##_f[0])
 #define _FP_FRAC_WORD_8(X, w)	(X##_f[w])
@@ -147,4 +148,110 @@ 
     }									\
   while (0)
 
+#define _FP_FRAC_ADD_8(R, X, Y)                                         \
+  do                                                                    \
+    {                                                                   \
+      _FP_W_TYPE fa8_c = 0;                                             \
+      for (int fa8_i = 0; fa8_i < 8; ++fa8_i)                           \
+        {                                                               \
+          R##_f[fa8_i] = X##_f[fa8_i] + Y##_f[fa8_i] + fa8_c;           \
+          fa8_c = (fa8_c                                                \
+                   ? R##_f[fa8_i] <= X##_f[fa8_i]                       \
+                   : R##_f[fa8_i] < X##_f[fa8_i]);                      \
+        }                                                               \
+    }                                                                   \
+  while (0)
+
+#define _FP_FRAC_SUB_8(R, X, Y)                                         \
+  do                                                                    \
+    {                                                                   \
+      _FP_W_TYPE fs8_c = 0;                                             \
+      for (int fs8_i = 0; fs8_i < 8; ++fs8_i)                           \
+        {                                                               \
+          R##_f[fs8_i] = X##_f[fs8_i] - Y##_f[fs8_i] - fs8_c;           \
+          fs8_c = (fs8_c                                                \
+                   ? R##_f[fs8_i] >= X##_f[fs8_i]                       \
+                   : R##_f[fs8_i] > X##_f[fs8_i]);                      \
+        }                                                               \
+    }                                                                   \
+  while (0)
+
+#define _FP_FRAC_CLZ_8(R, X)                    \
+  do                                            \
+    {                                           \
+      if (X##_f[7])                             \
+        __FP_CLZ ((R), X##_f[7]);               \
+      else if (X##_f[6])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[6]);             \
+          (R) += _FP_W_TYPE_SIZE;               \
+        }                                       \
+      else if (X##_f[5])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[5]);             \
+          (R) += _FP_W_TYPE_SIZE * 2;           \
+        }                                       \
+      else if (X##_f[4])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[4]);             \
+          (R) += _FP_W_TYPE_SIZE * 3;           \
+        }                                       \
+      else if (X##_f[3])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[3]);             \
+          (R) += _FP_W_TYPE_SIZE * 4;           \
+        }                                       \
+      else if (X##_f[2])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[2]);             \
+          (R) += _FP_W_TYPE_SIZE * 5;           \
+        }                                       \
+      else if (X##_f[1])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[1]);             \
+          (R) += _FP_W_TYPE_SIZE * 6;           \
+        }                                       \
+      else                                      \
+        {                                       \
+          __FP_CLZ ((R), X##_f[0]);             \
+          (R) += _FP_W_TYPE_SIZE * 7;           \
+        }                                       \
+    }                                           \
+  while (0)
+
+#define _FP_MINFRAC_8   0, 0, 0, 0, 0, 0, 0, 1
+
+#define _FP_FRAC_NEGP_8(X)      ((_FP_WS_TYPE) X##_f[7] < 0)
+#define _FP_FRAC_ZEROP_8(X)                                             \
+  ((X##_f[0] | X##_f[1] | X##_f[2] | X##_f[3]                           \
+    | X##_f[4] | X##_f[5] | X##_f[6] | X##_f[7]) == 0)
+#define _FP_FRAC_HIGHBIT_DW_8(fs, X)                                    \
+  (_FP_FRAC_HIGH_DW_##fs (X) & _FP_HIGHBIT_DW_##fs)
+
+
+#define _FP_FRAC_COPY_4_8(D, S)                           \
+  do                                                      \
+    {                                                     \
+      D##_f[0] = S##_f[0];                                \
+      D##_f[1] = S##_f[1];                                \
+      D##_f[2] = S##_f[2];                                \
+      D##_f[3] = S##_f[3];                                \
+    }                                                     \
+  while (0)
+
+#define _FP_FRAC_COPY_8_4(D, S)                           \
+  do                                                      \
+    {                                                     \
+      D##_f[0] = S##_f[0];                                \
+      D##_f[1] = S##_f[1];                                \
+      D##_f[2] = S##_f[2];                                \
+      D##_f[3] = S##_f[3];                                \
+      D##_f[4] = D##_f[5] = D##_f[6] = D##_f[7]= 0;       \
+    }                                                     \
+  while (0)
+
+#define __FP_FRAC_SET_8(X, I7, I6, I5, I4, I3, I2, I1, I0)             \
+  (X##_f[7] = I7, X##_f[6] = I6, X##_f[5] = I5, X##_f[4] = I4,         \
+   X##_f[3] = I3, X##_f[2] = I2, X##_f[1] = I1, X##_f[0] = I0)
+
 #endif /* !SOFT_FP_OP_8_H */