[v2,10/12] soft-fp: Add the lack of implementation for 128 bit

Message ID fd0613d051df139ccef1f5f774ca8873c4ee4458.1531801545.git.zong@andestech.com
State New, archived
Headers

Commit Message

Zong Li July 17, 2018, 5:07 a.m. UTC
  Add the definition of macros for 8 bytes length. I only add the
lack of implementation when building the RISC-V 32 bit port.
---
 soft-fp/op-8.h | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
  

Comments

Richard Henderson July 17, 2018, 6:20 p.m. UTC | #1
On 07/16/2018 10:07 PM, Zong Li wrote:
> +#define _FP_FRAC_ADD_8(R, X, Y)                                 \
> +  __FP_FRAC_ADD_8 (R##_f[7], R##_f[6], R##_f[5], R##_f[4],      \
> +                   R##_f[3], R##_f[2], R##_f[1], R##_f[0],      \
> +                   X##_f[7], X##_f[6], X##_f[5], X##_f[4],      \
> +                   X##_f[3], X##_f[2], X##_f[1], X##_f[0],      \
> +                   Y##_f[7], Y##_f[6], Y##_f[5], Y##_f[4],      \
> +                   Y##_f[3], Y##_f[2], Y##_f[1], Y##_f[0])

I don't think there's a need for the intermediate macro.
I think it would be better to use a loop for these.  E.g.

#define _FP_FRAC_ADD_8(R, X, Y)                               \
  do                                                          \
    {                                                         \
      _FP_W_TYPE fa8_c = 0;                                   \
      for (int fa8_i = 0; fa8_i < 8; ++fa8_i)                 \
        {                                                     \
          _FP_W_TYPE fa8_x = X##_f[fa8_i];                    \
          _FP_W_TYPE fa8_r = fa8_x + Y##_f[fa8_i] + fa8_c;    \
          fa8_c = (fa8_c ? fa8_r <= fa8_x : fa8_r < fa8_x);   \
        }                                                     \
    }                                                         \
  while (0)


> +#define _FP_FRAC_SUB_8(R, X, Y)                                 \

Likewise.

> +#define _FP_FRAC_CLZ_8(R, X)                    \

Likewise.

Otherwise, LGTM.


r~
  
Joseph Myers July 17, 2018, 10:09 p.m. UTC | #2
On Tue, 17 Jul 2018, Zong Li wrote:

> Add the definition of macros for 8 bytes length. I only add the
> lack of implementation when building the RISC-V 32 bit port.

Please use *self-contained* commit messages (= patch submission messages) 
with all relevant information from prior discussions.

My understanding from discussions of the previous patch series version is 
that this issue is something to do with the combination of soft-fp fma, 
ldbl-128 and 32-bit _FP_W_TYPE_SIZE, where every pair of those three is 
already used by some glibc port but none uses all three together.  So you 
need to explain that properly in your commit message.

In my view, such patches that are not logically part of a port should be 
posted separately from any patch series for the port, and before that 
patch series, with the patch series then being noted as depending on them 
(if it does).

I think consideration of (a revised version of) this patch is most 
appropriately deferred until 2.29 (and probably likewise the new port 
itself, given the associated risks and the lack of posting before the 
freeze - posting before the freeze being particularly important for any 
architecture-independent changes involved in a new port).
  
Zong Li July 18, 2018, 4:49 a.m. UTC | #3
Richard Henderson <rth@twiddle.net> 於 2018年7月18日 週三 上午2:20寫道:
>
> On 07/16/2018 10:07 PM, Zong Li wrote:
> > +#define _FP_FRAC_ADD_8(R, X, Y)                                 \
> > +  __FP_FRAC_ADD_8 (R##_f[7], R##_f[6], R##_f[5], R##_f[4],      \
> > +                   R##_f[3], R##_f[2], R##_f[1], R##_f[0],      \
> > +                   X##_f[7], X##_f[6], X##_f[5], X##_f[4],      \
> > +                   X##_f[3], X##_f[2], X##_f[1], X##_f[0],      \
> > +                   Y##_f[7], Y##_f[6], Y##_f[5], Y##_f[4],      \
> > +                   Y##_f[3], Y##_f[2], Y##_f[1], Y##_f[0])
>
> I don't think there's a need for the intermediate macro.
> I think it would be better to use a loop for these.  E.g.
>
> #define _FP_FRAC_ADD_8(R, X, Y)                               \
>   do                                                          \
>     {                                                         \
>       _FP_W_TYPE fa8_c = 0;                                   \
>       for (int fa8_i = 0; fa8_i < 8; ++fa8_i)                 \
>         {                                                     \
>           _FP_W_TYPE fa8_x = X##_f[fa8_i];                    \
>           _FP_W_TYPE fa8_r = fa8_x + Y##_f[fa8_i] + fa8_c;    \
>           fa8_c = (fa8_c ? fa8_r <= fa8_x : fa8_r < fa8_x);   \
>         }                                                     \
>     }                                                         \
>   while (0)
>
>
> > +#define _FP_FRAC_SUB_8(R, X, Y)                                 \
>
> Likewise.
>
> > +#define _FP_FRAC_CLZ_8(R, X)                    \
>
> Likewise.
>
> Otherwise, LGTM.
>
>
> r~

I use the intermediate macro by referring to the implementation in op-4.h
I will modify that on next version.
Thanks.
  
Zong Li July 18, 2018, 5:10 a.m. UTC | #4
Joseph Myers <joseph@codesourcery.com> 於 2018年7月18日 週三 上午6:09寫道:
>
> On Tue, 17 Jul 2018, Zong Li wrote:
>
> > Add the definition of macros for 8 bytes length. I only add the
> > lack of implementation when building the RISC-V 32 bit port.
>
> Please use *self-contained* commit messages (= patch submission messages)
> with all relevant information from prior discussions.
>
> My understanding from discussions of the previous patch series version is
> that this issue is something to do with the combination of soft-fp fma,
> ldbl-128 and 32-bit _FP_W_TYPE_SIZE, where every pair of those three is
> already used by some glibc port but none uses all three together.  So you
> need to explain that properly in your commit message.
>
OK. I will give more description in commit message and add the
*self-contained* in the submission message.

> In my view, such patches that are not logically part of a port should be
> posted separately from any patch series for the port, and before that
> patch series, with the patch series then being noted as depending on them
> (if it does).
>

OK. I will separate the patch from these patch set.

> I think consideration of (a revised version of) this patch is most
> appropriately deferred until 2.29 (and probably likewise the new port
> itself, given the associated risks and the lack of posting before the
> freeze - posting before the freeze being particularly important for any
> architecture-independent changes involved in a new port).

I know this concern and mechanism in glibc. Is there a chance to
include this modification at this moment? Because this new port
is the only one port to use it.

> --
> Joseph S. Myers
> joseph@codesourcery.com
  
Joseph Myers July 18, 2018, 8:40 p.m. UTC | #5
On Wed, 18 Jul 2018, Zong Li wrote:

> > I think consideration of (a revised version of) this patch is most
> > appropriately deferred until 2.29 (and probably likewise the new port
> > itself, given the associated risks and the lack of posting before the
> > freeze - posting before the freeze being particularly important for any
> > architecture-independent changes involved in a new port).
> 
> I know this concern and mechanism in glibc. Is there a chance to
> include this modification at this moment? Because this new port
> is the only one port to use it.

Well, being in the freeze you could test the patch much more extensively 
than would normally be expected for such a patch to justify it as not 
affecting existing configurations.  I'm thinking of a before-and-after 
comparison of installed stripped shared libraries built with 
build-many-glibcs.py - build unmodified glibc for all configurations 
build-many-glibcs.py supports, using the --strip option, save the install 
directories, rebuild for all configurations with the patch applied (and no 
other changes), compare the <configuration>/lib* directories with the 
saved copies from the previous build to make sure that none of the 
installed stripped shared libraries have changed at all (and also make 
sure that the test results are identical for the two builds).

That would be overkill for this patch normally - but late in the freeze as 
we are, we need to be much, much more careful about not breaking other 
configurations, and so much more thorough testing is appropriate (and of 
course the patch submission message then needs to describe that testing).
  
Zong Li July 20, 2018, 2:47 p.m. UTC | #6
Joseph Myers <joseph@codesourcery.com> 於 2018年7月19日 週四 上午4:40寫道:
>
> On Wed, 18 Jul 2018, Zong Li wrote:
>
> > > I think consideration of (a revised version of) this patch is most
> > > appropriately deferred until 2.29 (and probably likewise the new port
> > > itself, given the associated risks and the lack of posting before the
> > > freeze - posting before the freeze being particularly important for any
> > > architecture-independent changes involved in a new port).
> >
> > I know this concern and mechanism in glibc. Is there a chance to
> > include this modification at this moment? Because this new port
> > is the only one port to use it.
>
> Well, being in the freeze you could test the patch much more extensively
> than would normally be expected for such a patch to justify it as not
> affecting existing configurations.

Thanks, I am going to do my best to fix up this port problem before
release date.

> I'm thinking of a before-and-after
> comparison of installed stripped shared libraries built with
> build-many-glibcs.py - build unmodified glibc for all configurations
> build-many-glibcs.py supports, using the --strip option, save the install
> directories, rebuild for all configurations with the patch applied (and no
> other changes), compare the <configuration>/lib* directories with the
> saved copies from the previous build to make sure that none of the
> installed stripped shared libraries have changed at all (and also make
> sure that the test results are identical for the two builds).
>
> That would be overkill for this patch normally - but late in the freeze as
> we are, we need to be much, much more careful about not breaking other
> configurations, and so much more thorough testing is appropriate (and of
> course the patch submission message then needs to describe that testing).
>

This idea is awesome. That is helpful for port developer no matter for new port
or old port, thanks for your effort.

> --
> Joseph S. Myers
> joseph@codesourcery.com
  
Zong Li July 20, 2018, 2:54 p.m. UTC | #7
> > +#ifndef __FP_FRAC_SUB_8
> > +# define __FP_FRAC_SUB_8(r7, r6, r5, r4, r3, r2, r1, r0,                \
> . . . .
> > +      r2 -= __FP_FRAC_SUB_8_c2;                                         \
> > +      __FP_FRAC_SUB_8_c3 |= __FP_FRAC_SUB_8_c2 && (y2 == x2);           \
> > +      r3 = x3 - y3;                                                     \
> > +      __FP_FRAC_SUB_8_c4 = r3 > x3;                                     \
> > +      r2 -= __FP_FRAC_SUB_8_c3;                                         \
>
> r2 ? Not r3?  There are three other cases that look like cut-n-paste without a tweak...
>
> > +      r2 -= __FP_FRAC_SUB_8_c4;                                         \
> > +      r2 -= __FP_FRAC_SUB_8_c5;                                         \
> > +      r2 -= __FP_FRAC_SUB_8_c6;                                         \
>
> Should these be r4, r5, r6, and r7 ?
>
> Do we know if this code ever gets tested with our testsuite?  Since these are new macros, I'd expect a new test to go with them...

Thanks, I will submit this patch with fix up. This macro will not be
used on 64-bit RISC-V port.
It just for soft float on 32-bit RISC-V port .
  

Patch

diff --git a/soft-fp/op-8.h b/soft-fp/op-8.h
index ffed258..2ef6228 100644
--- a/soft-fp/op-8.h
+++ b/soft-fp/op-8.h
@@ -35,6 +35,7 @@ 
 /* We need just a few things from here for op-4, if we ever need some
    other macros, they can be added.  */
 #define _FP_FRAC_DECL_8(X)	_FP_W_TYPE X##_f[8]
+#define _FP_FRAC_SET_8(X, I)    __FP_FRAC_SET_8 (X, I)
 #define _FP_FRAC_HIGH_8(X)	(X##_f[7])
 #define _FP_FRAC_LOW_8(X)	(X##_f[0])
 #define _FP_FRAC_WORD_8(X, w)	(X##_f[w])
@@ -147,4 +148,179 @@ 
     }									\
   while (0)
 
+#define _FP_FRAC_ADD_8(R, X, Y)                                 \
+  __FP_FRAC_ADD_8 (R##_f[7], R##_f[6], R##_f[5], R##_f[4],      \
+                   R##_f[3], R##_f[2], R##_f[1], R##_f[0],      \
+                   X##_f[7], X##_f[6], X##_f[5], X##_f[4],      \
+                   X##_f[3], X##_f[2], X##_f[1], X##_f[0],      \
+                   Y##_f[7], Y##_f[6], Y##_f[5], Y##_f[4],      \
+                   Y##_f[3], Y##_f[2], Y##_f[1], Y##_f[0])
+
+#define _FP_FRAC_SUB_8(R, X, Y)                                 \
+  __FP_FRAC_SUB_8 (R##_f[7], R##_f[6], R##_f[5], R##_f[4],      \
+                   R##_f[3], R##_f[2], R##_f[1], R##_f[0],      \
+                   X##_f[7], X##_f[6], X##_f[5], X##_f[4],      \
+                   X##_f[3], X##_f[2], X##_f[1], X##_f[0],      \
+                   Y##_f[7], Y##_f[6], Y##_f[5], Y##_f[4],      \
+                   Y##_f[3], Y##_f[2], Y##_f[1], Y##_f[0])
+
+#define _FP_MINFRAC_8   0, 0, 0, 0, 0, 0, 0, 1
+
+#define _FP_FRAC_NEGP_8(X)      ((_FP_WS_TYPE) X##_f[7] < 0)
+#define _FP_FRAC_ZEROP_8(X)     \
+  ((X##_f[0] | X##_f[1] | X##_f[2] | X##_f[3] |                 \
+    X##_f[4] | X##_f[5] | X##_f[6] | X##_f[7]) == 0)
+#define _FP_FRAC_HIGHBIT_DW_8(fs, X)    \
+  (_FP_FRAC_HIGH_DW_##fs (X) & _FP_HIGHBIT_DW_##fs)
+
+#define _FP_FRAC_CLZ_8(R, X)                    \
+  do                                            \
+    {                                           \
+      if (X##_f[7])                             \
+        __FP_CLZ ((R), X##_f[7]);               \
+      else if (X##_f[6])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[6]);             \
+          (R) += _FP_W_TYPE_SIZE;               \
+        }                                       \
+      else if (X##_f[5])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[5]);             \
+          (R) += _FP_W_TYPE_SIZE*2;             \
+        }                                       \
+      else if (X##_f[4])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[4]);             \
+          (R) += _FP_W_TYPE_SIZE*3;             \
+        }                                       \
+      else if (X##_f[3])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[3]);             \
+          (R) += _FP_W_TYPE_SIZE*4;             \
+        }                                       \
+      else if (X##_f[2])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[2]);             \
+          (R) += _FP_W_TYPE_SIZE*5;             \
+        }                                       \
+      else if (X##_f[1])                        \
+        {                                       \
+          __FP_CLZ ((R), X##_f[1]);             \
+          (R) += _FP_W_TYPE_SIZE*6;             \
+        }                                       \
+      else                                      \
+        {                                       \
+          __FP_CLZ ((R), X##_f[0]);             \
+          (R) += _FP_W_TYPE_SIZE*7;             \
+        }                                       \
+    }                                           \
+  while (0)
+
+#define _FP_FRAC_COPY_4_8(D, S)                           \
+  do                                                      \
+    {                                                     \
+      D##_f[0] = S##_f[0];                                \
+      D##_f[1] = S##_f[1];                                \
+      D##_f[2] = S##_f[2];                                \
+      D##_f[3] = S##_f[3];                                \
+    }                                                     \
+  while (0)
+
+#define _FP_FRAC_COPY_8_4(D, S)                           \
+  do                                                      \
+    {                                                     \
+      D##_f[0] = S##_f[0];                                \
+      D##_f[1] = S##_f[1];                                \
+      D##_f[2] = S##_f[2];                                \
+      D##_f[3] = S##_f[3];                                \
+      D##_f[4] = D##_f[5] = D##_f[6] = D##_f[7]= 0;       \
+    }                                                     \
+  while (0)
+
+#define __FP_FRAC_SET_8(X, I7, I6, I5, I4, I3, I2, I1, I0)              \
+  (X##_f[7] = I7, X##_f[6] = I6, X##_f[5] = I5, X##_f[4] = I4,          \
+   X##_f[3] = I3, X##_f[2] = I2, X##_f[1] = I1, X##_f[0] = I0)
+
+#ifndef __FP_FRAC_ADD_8
+# define __FP_FRAC_ADD_8(r7, r6, r5, r4, r3, r2, r1, r0,                \
+                         x7, x6, x5, x4, x3, x2, x1, x0,                \
+                         y7, y6, y5, y4, y3, y2, y1, y0)                \
+  do                                                                    \
+    {                                                                   \
+      _FP_W_TYPE __FP_FRAC_ADD_8_c1, __FP_FRAC_ADD_8_c2;                \
+      _FP_W_TYPE __FP_FRAC_ADD_8_c3, __FP_FRAC_ADD_8_c4;                \
+      _FP_W_TYPE __FP_FRAC_ADD_8_c5, __FP_FRAC_ADD_8_c6;                \
+      _FP_W_TYPE __FP_FRAC_ADD_8_c7;                                    \
+      r0 = x0 + y0;                                                     \
+      __FP_FRAC_ADD_8_c1 = r0 < x0;                                     \
+      r1 = x1 + y1;                                                     \
+      __FP_FRAC_ADD_8_c2 = r1 < x1;                                     \
+      r1 += __FP_FRAC_ADD_8_c1;                                         \
+      __FP_FRAC_ADD_8_c2 |= r1 < __FP_FRAC_ADD_8_c1;                    \
+      r2 = x2 + y2;                                                     \
+      __FP_FRAC_ADD_8_c3 = r2 < x2;                                     \
+      r2 += __FP_FRAC_ADD_8_c2;                                         \
+      __FP_FRAC_ADD_8_c3 |= r2 < __FP_FRAC_ADD_8_c2;                    \
+      r3 = x3 + y3;                                                     \
+      __FP_FRAC_ADD_8_c4 = r3 < x3;                                     \
+      r3 += __FP_FRAC_ADD_8_c3;                                         \
+      __FP_FRAC_ADD_8_c4 |= r1 < __FP_FRAC_ADD_8_c3;                    \
+      r4 = x4 + y4;                                                     \
+      __FP_FRAC_ADD_8_c5 = r4 < x4;                                     \
+      r4 += __FP_FRAC_ADD_8_c4;                                         \
+      __FP_FRAC_ADD_8_c5 |= r1 < __FP_FRAC_ADD_8_c4;                    \
+      r5 = x5 + y5;                                                     \
+      __FP_FRAC_ADD_8_c5 = r5 < x5;                                     \
+      r5 += __FP_FRAC_ADD_8_c5;                                         \
+      __FP_FRAC_ADD_8_c6 |= r1 < __FP_FRAC_ADD_8_c5;                    \
+      r6 = x6 + y6;                                                     \
+      __FP_FRAC_ADD_8_c7 = r6 < x6;                                     \
+      r6 += __FP_FRAC_ADD_8_c6;                                         \
+      __FP_FRAC_ADD_8_c7 |= r1 < __FP_FRAC_ADD_8_c6;                    \
+      r7 = x7 + y7 + __FP_FRAC_ADD_8_c7;                                \
+    }                                                                   \
+  while (0)
+#endif
+
+#ifndef __FP_FRAC_SUB_8
+# define __FP_FRAC_SUB_8(r7, r6, r5, r4, r3, r2, r1, r0,                \
+                         x7, x6, x5, x4, x3, x2, x1, x0,                \
+                         y7, y6, y5, y4, y3, y2, y1, y0)                \
+  do                                                                    \
+    {                                                                   \
+      _FP_W_TYPE __FP_FRAC_SUB_8_c1, __FP_FRAC_SUB_8_c2;                \
+      _FP_W_TYPE __FP_FRAC_SUB_8_c3, __FP_FRAC_SUB_8_c4;                \
+      _FP_W_TYPE __FP_FRAC_SUB_8_c5, __FP_FRAC_SUB_8_c6;                \
+      _FP_W_TYPE __FP_FRAC_SUB_8_c7;                                    \
+      r0 = x0 - y0;                                                     \
+      __FP_FRAC_SUB_8_c1 = r0 > x0;                                     \
+      r1 = x1 - y1;                                                     \
+      __FP_FRAC_SUB_8_c2 = r1 > x1;                                     \
+      r1 -= __FP_FRAC_SUB_8_c1;                                         \
+      __FP_FRAC_SUB_8_c2 |= __FP_FRAC_SUB_8_c1 && (y1 == x1);           \
+      r2 = x2 - y2;                                                     \
+      __FP_FRAC_SUB_8_c3 = r2 > x2;                                     \
+      r2 -= __FP_FRAC_SUB_8_c2;                                         \
+      __FP_FRAC_SUB_8_c3 |= __FP_FRAC_SUB_8_c2 && (y2 == x2);           \
+      r3 = x3 - y3;                                                     \
+      __FP_FRAC_SUB_8_c4 = r3 > x3;                                     \
+      r2 -= __FP_FRAC_SUB_8_c3;                                         \
+      __FP_FRAC_SUB_8_c4 |= __FP_FRAC_SUB_8_c3 && (y3 == x3);           \
+      r4 = x4 - y4;                                                     \
+      __FP_FRAC_SUB_8_c5 = r4 > x4;                                     \
+      r2 -= __FP_FRAC_SUB_8_c4;                                         \
+      __FP_FRAC_SUB_8_c5 |= __FP_FRAC_SUB_8_c4 && (y4 == x4);           \
+      r5 = x5 - y5;                                                     \
+      __FP_FRAC_SUB_8_c6 = r5 > x5;                                     \
+      r2 -= __FP_FRAC_SUB_8_c5;                                         \
+      __FP_FRAC_SUB_8_c6 |= __FP_FRAC_SUB_8_c5 && (y5 == x5);           \
+      r6 = x6 - y6;                                                     \
+      __FP_FRAC_SUB_8_c7 = r6 > x6;                                     \
+      r2 -= __FP_FRAC_SUB_8_c6;                                         \
+      __FP_FRAC_SUB_8_c7 |= __FP_FRAC_SUB_8_c6 && (y6 == x6);           \
+      r7 = x7 - y7 - __FP_FRAC_SUB_8_c7;                                \
+    }                                                                   \
+  while (0)
+#endif
+
 #endif /* !SOFT_FP_OP_8_H */