[1/2] soft-fp: Fix overwritten issue for division in op-4.h

Message ID 5e51c4ea8cfe99f2faac5e206982658bd6c5d265.1539595555.git.zongbox@gmail.com
State New, archived
Headers

Commit Message

Zong Li Oct. 15, 2018, 11:52 a.m. UTC
  The original value of X##_f[0] is overwritten before the calculatation.

After this modification, we can pass the soft floating testing of glibc
testsuites on RV32.

	* soft-fp/op-4.h: Fix wrong calculation of division.
---
 ChangeLog      | 4 ++++
 soft-fp/op-4.h | 7 ++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)
  

Comments

Joseph Myers Oct. 15, 2018, 1 p.m. UTC | #1
On Mon, 15 Oct 2018, Zong Li wrote:

> diff --git a/soft-fp/op-4.h b/soft-fp/op-4.h
> index 01b87d0..a407168 100644
> --- a/soft-fp/op-4.h
> +++ b/soft-fp/op-4.h
> @@ -517,10 +517,15 @@
>  	      R##_f[_FP_DIV_MEAT_4_udiv_i] = -1;			\
>  	      if (!_FP_DIV_MEAT_4_udiv_i)				\
>  		break;							\
> -	      __FP_FRAC_SUB_4 (X##_f[3], X##_f[2], X##_f[1], X##_f[0],	\
> +	      UWtype __FP_FRAC_tmp;					\
> +	      __FP_FRAC_SUB_4 (X##_f[2], X##_f[1], X##_f[0], __FP_FRAC_tmp,	\
>  			       Y##_f[2], Y##_f[1], Y##_f[0], 0,		\
>  			       X##_f[2], X##_f[1], X##_f[0],		\
>  			       _FP_DIV_MEAT_4_udiv_n_f[_FP_DIV_MEAT_4_udiv_i]); \
> +	      X##_f[3] = X##_f[2];					\
> +	      X##_f[2] = X##_f[1];					\
> +	      X##_f[1] = X##_f[0];					\
> +	      X##_f[0] = __FP_FRAC_tmp;					\

This approach, using the X array in a shifted form to store the result 
with a single temporary, is not a clean solution.

First, we should work out what the interface *should* be to all the 
FRAC_ADD and FRAC_SUB macros, regarding permitted overlap between the 
output and the inputs.  This requires examining all those macros and their 
uses to determine what consistent or nearly consistent patterns there are 
in that interface.

Given an appropriate choice of interface, all the definitions that do not 
fit that choice, and all the uses that do not fit that choice, then need 
to be fixed.  If that choice says the overlap above it not OK, a whole 
four-word temporary should be defined using _FP_FRAC_DECL_4 for the 
result, and then copied with _FP_FRAC_COPY_4.  The commit message for any 
proposed change needs to describe the conclusions regarding the interface 
for permitted overlaps, and the analysis done to determine what needs to 
change.

I suspect that the intended interface is that *exact* overlap is permitted 
between the output and the *second* input, but not other overlap.  In 
which case at least the definitions of __FP_FRAC_SUB_3 and __FP_FRAC_SUB_4 
look buggy and need changing (they reference y1 and y2 after r1 and r2 
have been set), as well as the user you're changing above (which should 
use a full temporary).
  
Joseph Myers Oct. 15, 2018, 1:04 p.m. UTC | #2
On Mon, 15 Oct 2018, Joseph Myers wrote:

> to be fixed.  If that choice says the overlap above it not OK, a whole 
> four-word temporary should be defined using _FP_FRAC_DECL_4 for the 
> result, and then copied with _FP_FRAC_COPY_4.  The commit message for any 

In fact, you don't need that copy from the temporary back to X, because 
the result is immediately used in a call to _FP_FRAC_SUB_4 which can 
become _FP_FRAC_SUB_4 (X, Y, _FP_DIV_MEAT_4_tmp).
  
Zong Li Oct. 15, 2018, 5:28 p.m. UTC | #3
Joseph Myers <joseph@codesourcery.com> 於 2018年10月15日 週一 下午9:04寫道:
>
> On Mon, 15 Oct 2018, Joseph Myers wrote:
>
> > to be fixed.  If that choice says the overlap above it not OK, a whole
> > four-word temporary should be defined using _FP_FRAC_DECL_4 for the
> > result, and then copied with _FP_FRAC_COPY_4.  The commit message for any
>
> In fact, you don't need that copy from the temporary back to X, because
> the result is immediately used in a call to _FP_FRAC_SUB_4 which can
> become _FP_FRAC_SUB_4 (X, Y, _FP_DIV_MEAT_4_tmp).
>
Yes, I will use the full temporary replace with the shifted form if
overlay is not OK.
  
Zong Li Oct. 22, 2018, 9:55 a.m. UTC | #4
Joseph Myers <joseph@codesourcery.com> 於 2018年10月15日 週一 下午9:01寫道:
>
> On Mon, 15 Oct 2018, Zong Li wrote:
>
> > diff --git a/soft-fp/op-4.h b/soft-fp/op-4.h
> > index 01b87d0..a407168 100644
> > --- a/soft-fp/op-4.h
> > +++ b/soft-fp/op-4.h
> > @@ -517,10 +517,15 @@
> >             R##_f[_FP_DIV_MEAT_4_udiv_i] = -1;                        \
> >             if (!_FP_DIV_MEAT_4_udiv_i)                               \
> >               break;                                                  \
> > -           __FP_FRAC_SUB_4 (X##_f[3], X##_f[2], X##_f[1], X##_f[0],  \
> > +           UWtype __FP_FRAC_tmp;                                     \
> > +           __FP_FRAC_SUB_4 (X##_f[2], X##_f[1], X##_f[0], __FP_FRAC_tmp,     \
> >                              Y##_f[2], Y##_f[1], Y##_f[0], 0,         \
> >                              X##_f[2], X##_f[1], X##_f[0],            \
> >                              _FP_DIV_MEAT_4_udiv_n_f[_FP_DIV_MEAT_4_udiv_i]); \
> > +           X##_f[3] = X##_f[2];                                      \
> > +           X##_f[2] = X##_f[1];                                      \
> > +           X##_f[1] = X##_f[0];                                      \
> > +           X##_f[0] = __FP_FRAC_tmp;                                 \
>
> This approach, using the X array in a shifted form to store the result
> with a single temporary, is not a clean solution.
>
> First, we should work out what the interface *should* be to all the
> FRAC_ADD and FRAC_SUB macros, regarding permitted overlap between the
> output and the inputs.  This requires examining all those macros and their
> uses to determine what consistent or nearly consistent patterns there are
> in that interface.
>
> Given an appropriate choice of interface, all the definitions that do not
> fit that choice, and all the uses that do not fit that choice, then need
> to be fixed.  If that choice says the overlap above it not OK, a whole
> four-word temporary should be defined using _FP_FRAC_DECL_4 for the
> result, and then copied with _FP_FRAC_COPY_4.  The commit message for any
> proposed change needs to describe the conclusions regarding the interface
> for permitted overlaps, and the analysis done to determine what needs to
> change.
>
> I suspect that the intended interface is that *exact* overlap is permitted
> between the output and the *second* input, but not other overlap.  In
> which case at least the definitions of __FP_FRAC_SUB_3 and __FP_FRAC_SUB_4
> look buggy and need changing (they reference y1 and y2 after r1 and r2
> have been set), as well as the user you're changing above (which should
> use a full temporary).
>

I spend some time to check all the FRAC_ADD and FRAC_SUB use cases.

In FRAC_ADD_3 and FRAC_ADD_4(R, X, Y), it reference the x[N] after
r[N] have been set. If the x and r are the same address, the
expression r[N] < x[N] is always false(it always be equal), then the
carry bit is always zero. In glibc, all FRAC_ADD users aren't specify
the r and x with the same address, but there are some users specify
the r and y with the same address, and they are still work because
y[N] not be used again in FRAC_ADD.

In FRAC_SUB_3 and FRAC_SUB_4, it is more strict that it reference both
the x[N] and y[N] after r[N] have been set. If one of the x and y is
the same address with r, the result of the calculation is wrong
because the value of the original x and y are overwritten. In glibc,
there are two places use FRAC_SUB and occurs the overlap. The first is
_FP_DIV_MEAT_N_loop in op-common.h, it uses the source
_FP_DIV_MEAT_N_loop_u as the destination. This macro
only be used when N is one(_FP_DIV_MEAT_1_loop) and then the
_FP_FRAC_SUB_##wc extend to _FP_FRAC_SUB_1 in this macro. so it also
work because _FP_FRAC_SUB_1 has no overlap problem in its
implementation. The second places is I had modified in this patch, and
I would use the full temporary replace single temporary for next
version.

In FRAC_ADD_1, FRAC_ADD_2, FRAC_SUB_1 and FRAC_SUB_2, there don't
refer the source after destination have been set, so they have no
problem.

In conclusion, I inclines to keep the current implementation of
FRAC_SUB_3, FRAC_SUB_4, FRAC_ADD_3 and FRAC_ADD_4 instead of using the
temporary variables like FRAC_ADD_2 and FRAC_SUB_2, and fix the
_FP_DIV_MEAT_N_loop together in next version. Because using temporary
variables have to additional move the result from temporary to real
destination in the most situation. The macros user should be
responsible for ensuing no overlap
between sources and destination. I want to know whether anyone has
some suggestion, please let me know, thanks.

I test the macros by hand and glibc testsutie, and it seems that the
overlap is unreasonable. The overlap will cause the wrong result and
failed on glibc testsuite.
  
Joseph Myers Oct. 22, 2018, 12:10 p.m. UTC | #5
On Mon, 22 Oct 2018, Zong Li wrote:

> I spend some time to check all the FRAC_ADD and FRAC_SUB use cases.

Thanks for the analysis of the various definitions and users of these 
macros.

> In conclusion, I inclines to keep the current implementation of
> FRAC_SUB_3, FRAC_SUB_4, FRAC_ADD_3 and FRAC_ADD_4 instead of using the
> temporary variables like FRAC_ADD_2 and FRAC_SUB_2, and fix the
> _FP_DIV_MEAT_N_loop together in next version. Because using temporary

I think a basic principle here should be that all FRAC_ADD and FRAC_SUB 
macros have the same interface regarding what overlap is permitted between 
arguments and results.

That is, if that interface is defined not to allow any overlap, then there 
are several users to change to avoid overlap, even if they are only using 
FRAC_ADD macros with overlap between R and Y, or only FRAC_SUB_1 and 
FRAC_SUB_2, and so the overlap does not cause problems with the present 
implementations.

I think it would be less disruptive to allow exact overlap between 
corresponding elements of R and Y.  Then __FP_FRAC_SUB_3 and 
__FP_FRAC_SUB_4 would be fixed to use temporaries to follow that 
interface, while _FP_DIV_MEAT_4_udiv and _FP_DIV_MEAT_N_loop would also be 
made to use temporaries so their calls to the FRAC_SUB macros follow that 
restriction on overlap.  (It's true that the current sfp-machine.h files 
only use _FP_DIV_MEAT_1_loop, not _FP_DIV_MEAT_2_loop or 
_FP_DIV_MEAT_4_loop, but it's still appropriate to fix _FP_DIV_MEAT_N_loop 
to follow the FRAC_SUB interface properly.)

The compiler should have no difficulty optimizing out the temporaries in 
cases where they end up not being needed.

In all cases, names of temporary variables in soft-fp macros should start 
with the full name of the macro (so __FP_FRAC_SUB_3_* for temporaries 
inside __FP_FRAC_SUB_3, for example), to avoid the risk of bugs resulting 
from shadowing when one macro calls another macro with the same variable 
name.
  
Palmer Dabbelt Oct. 22, 2018, 7:31 p.m. UTC | #6
On Mon, 22 Oct 2018 05:10:59 PDT (-0700), joseph@codesourcery.com wrote:
> On Mon, 22 Oct 2018, Zong Li wrote:
>
>> I spend some time to check all the FRAC_ADD and FRAC_SUB use cases.
>
> Thanks for the analysis of the various definitions and users of these
> macros.
>
>> In conclusion, I inclines to keep the current implementation of
>> FRAC_SUB_3, FRAC_SUB_4, FRAC_ADD_3 and FRAC_ADD_4 instead of using the
>> temporary variables like FRAC_ADD_2 and FRAC_SUB_2, and fix the
>> _FP_DIV_MEAT_N_loop together in next version. Because using temporary
>
> I think a basic principle here should be that all FRAC_ADD and FRAC_SUB
> macros have the same interface regarding what overlap is permitted between
> arguments and results.
>
> That is, if that interface is defined not to allow any overlap, then there
> are several users to change to avoid overlap, even if they are only using
> FRAC_ADD macros with overlap between R and Y, or only FRAC_SUB_1 and
> FRAC_SUB_2, and so the overlap does not cause problems with the present
> implementations.
>
> I think it would be less disruptive to allow exact overlap between
> corresponding elements of R and Y.  Then __FP_FRAC_SUB_3 and
> __FP_FRAC_SUB_4 would be fixed to use temporaries to follow that
> interface, while _FP_DIV_MEAT_4_udiv and _FP_DIV_MEAT_N_loop would also be
> made to use temporaries so their calls to the FRAC_SUB macros follow that
> restriction on overlap.  (It's true that the current sfp-machine.h files
> only use _FP_DIV_MEAT_1_loop, not _FP_DIV_MEAT_2_loop or
> _FP_DIV_MEAT_4_loop, but it's still appropriate to fix _FP_DIV_MEAT_N_loop
> to follow the FRAC_SUB interface properly.)
>
> The compiler should have no difficulty optimizing out the temporaries in
> cases where they end up not being needed.
>
> In all cases, names of temporary variables in soft-fp macros should start
> with the full name of the macro (so __FP_FRAC_SUB_3_* for temporaries
> inside __FP_FRAC_SUB_3, for example), to avoid the risk of bugs resulting
> from shadowing when one macro calls another macro with the same variable
> name.

Thanks for digging in to this so deeply, I'm afraid I'm way over my head here.  
If you need any help feel free to ask, as I'm not really following any of this.
  
Zong Li Oct. 23, 2018, 1:19 p.m. UTC | #7
Joseph Myers <joseph@codesourcery.com> 於 2018年10月22日 週一 下午8:11寫道:
>
> On Mon, 22 Oct 2018, Zong Li wrote:
>
> > I spend some time to check all the FRAC_ADD and FRAC_SUB use cases.
>
> Thanks for the analysis of the various definitions and users of these
> macros.
>
> > In conclusion, I inclines to keep the current implementation of
> > FRAC_SUB_3, FRAC_SUB_4, FRAC_ADD_3 and FRAC_ADD_4 instead of using the
> > temporary variables like FRAC_ADD_2 and FRAC_SUB_2, and fix the
> > _FP_DIV_MEAT_N_loop together in next version. Because using temporary
>
> I think a basic principle here should be that all FRAC_ADD and FRAC_SUB
> macros have the same interface regarding what overlap is permitted between
> arguments and results.
>
> That is, if that interface is defined not to allow any overlap, then there
> are several users to change to avoid overlap, even if they are only using
> FRAC_ADD macros with overlap between R and Y, or only FRAC_SUB_1 and
> FRAC_SUB_2, and so the overlap does not cause problems with the present
> implementations.
>
> I think it would be less disruptive to allow exact overlap between
> corresponding elements of R and Y.  Then __FP_FRAC_SUB_3 and
> __FP_FRAC_SUB_4 would be fixed to use temporaries to follow that
> interface, while _FP_DIV_MEAT_4_udiv and _FP_DIV_MEAT_N_loop would also be
> made to use temporaries so their calls to the FRAC_SUB macros follow that
> restriction on overlap.  (It's true that the current sfp-machine.h files
> only use _FP_DIV_MEAT_1_loop, not _FP_DIV_MEAT_2_loop or
> _FP_DIV_MEAT_4_loop, but it's still appropriate to fix _FP_DIV_MEAT_N_loop
> to follow the FRAC_SUB interface properly.)
>
> The compiler should have no difficulty optimizing out the temporaries in
> cases where they end up not being needed.
>
> In all cases, names of temporary variables in soft-fp macros should start
> with the full name of the macro (so __FP_FRAC_SUB_3_* for temporaries
> inside __FP_FRAC_SUB_3, for example), to avoid the risk of bugs resulting
> from shadowing when one macro calls another macro with the same variable
> name.

I see, I would change the __FP_FRAC_SUB_3, __FP_FRAC_SUB_4,
__FP_FRAC_ADD_3 and __FP_FRAC_ADD_4 implementation, let the interface
is identical for users.
  
Zong Li Oct. 23, 2018, 1:24 p.m. UTC | #8
Palmer Dabbelt <palmer@dabbelt.com> 於 2018年10月23日 週二 上午3:31寫道:
>
> On Mon, 22 Oct 2018 05:10:59 PDT (-0700), joseph@codesourcery.com wrote:
> > On Mon, 22 Oct 2018, Zong Li wrote:
> >
> >> I spend some time to check all the FRAC_ADD and FRAC_SUB use cases.
> >
> > Thanks for the analysis of the various definitions and users of these
> > macros.
> >
> >> In conclusion, I inclines to keep the current implementation of
> >> FRAC_SUB_3, FRAC_SUB_4, FRAC_ADD_3 and FRAC_ADD_4 instead of using the
> >> temporary variables like FRAC_ADD_2 and FRAC_SUB_2, and fix the
> >> _FP_DIV_MEAT_N_loop together in next version. Because using temporary
> >
> > I think a basic principle here should be that all FRAC_ADD and FRAC_SUB
> > macros have the same interface regarding what overlap is permitted between
> > arguments and results.
> >
> > That is, if that interface is defined not to allow any overlap, then there
> > are several users to change to avoid overlap, even if they are only using
> > FRAC_ADD macros with overlap between R and Y, or only FRAC_SUB_1 and
> > FRAC_SUB_2, and so the overlap does not cause problems with the present
> > implementations.
> >
> > I think it would be less disruptive to allow exact overlap between
> > corresponding elements of R and Y.  Then __FP_FRAC_SUB_3 and
> > __FP_FRAC_SUB_4 would be fixed to use temporaries to follow that
> > interface, while _FP_DIV_MEAT_4_udiv and _FP_DIV_MEAT_N_loop would also be
> > made to use temporaries so their calls to the FRAC_SUB macros follow that
> > restriction on overlap.  (It's true that the current sfp-machine.h files
> > only use _FP_DIV_MEAT_1_loop, not _FP_DIV_MEAT_2_loop or
> > _FP_DIV_MEAT_4_loop, but it's still appropriate to fix _FP_DIV_MEAT_N_loop
> > to follow the FRAC_SUB interface properly.)
> >
> > The compiler should have no difficulty optimizing out the temporaries in
> > cases where they end up not being needed.
> >
> > In all cases, names of temporary variables in soft-fp macros should start
> > with the full name of the macro (so __FP_FRAC_SUB_3_* for temporaries
> > inside __FP_FRAC_SUB_3, for example), to avoid the risk of bugs resulting
> > from shadowing when one macro calls another macro with the same variable
> > name.
>
> Thanks for digging in to this so deeply, I'm afraid I'm way over my head here.
> If you need any help feel free to ask, as I'm not really following any of this.

OK, thanks!
  
Joseph Myers Oct. 23, 2018, 8:43 p.m. UTC | #9
On Tue, 23 Oct 2018, Zong Li wrote:

> > I think it would be less disruptive to allow exact overlap between
> > corresponding elements of R and Y.  Then __FP_FRAC_SUB_3 and
> > __FP_FRAC_SUB_4 would be fixed to use temporaries to follow that
> > interface, while _FP_DIV_MEAT_4_udiv and _FP_DIV_MEAT_N_loop would also be
> > made to use temporaries so their calls to the FRAC_SUB macros follow that
> > restriction on overlap.  (It's true that the current sfp-machine.h files
> > only use _FP_DIV_MEAT_1_loop, not _FP_DIV_MEAT_2_loop or
> > _FP_DIV_MEAT_4_loop, but it's still appropriate to fix _FP_DIV_MEAT_N_loop
> > to follow the FRAC_SUB interface properly.)
> >
> > The compiler should have no difficulty optimizing out the temporaries in
> > cases where they end up not being needed.
> >
> > In all cases, names of temporary variables in soft-fp macros should start
> > with the full name of the macro (so __FP_FRAC_SUB_3_* for temporaries
> > inside __FP_FRAC_SUB_3, for example), to avoid the risk of bugs resulting
> > from shadowing when one macro calls another macro with the same variable
> > name.
> 
> I see, I would change the __FP_FRAC_SUB_3, __FP_FRAC_SUB_4,
> __FP_FRAC_ADD_3 and __FP_FRAC_ADD_4 implementation, let the interface
> is identical for users.

My expectation is that __FP_FRAC_ADD_3 and __FP_FRAC_ADD_4 don't need to 
change (because they are already valid for exact overlap between the 
result and the second input).
  
Zong Li Oct. 24, 2018, 12:52 p.m. UTC | #10
Joseph Myers <joseph@codesourcery.com> 於 2018年10月24日 週三 上午4:44寫道:
>
> On Tue, 23 Oct 2018, Zong Li wrote:
>
> > > I think it would be less disruptive to allow exact overlap between
> > > corresponding elements of R and Y.  Then __FP_FRAC_SUB_3 and
> > > __FP_FRAC_SUB_4 would be fixed to use temporaries to follow that
> > > interface, while _FP_DIV_MEAT_4_udiv and _FP_DIV_MEAT_N_loop would also be
> > > made to use temporaries so their calls to the FRAC_SUB macros follow that
> > > restriction on overlap.  (It's true that the current sfp-machine.h files
> > > only use _FP_DIV_MEAT_1_loop, not _FP_DIV_MEAT_2_loop or
> > > _FP_DIV_MEAT_4_loop, but it's still appropriate to fix _FP_DIV_MEAT_N_loop
> > > to follow the FRAC_SUB interface properly.)
> > >
> > > The compiler should have no difficulty optimizing out the temporaries in
> > > cases where they end up not being needed.
> > >
> > > In all cases, names of temporary variables in soft-fp macros should start
> > > with the full name of the macro (so __FP_FRAC_SUB_3_* for temporaries
> > > inside __FP_FRAC_SUB_3, for example), to avoid the risk of bugs resulting
> > > from shadowing when one macro calls another macro with the same variable
> > > name.
> >
> > I see, I would change the __FP_FRAC_SUB_3, __FP_FRAC_SUB_4,
> > __FP_FRAC_ADD_3 and __FP_FRAC_ADD_4 implementation, let the interface
> > is identical for users.
>
> My expectation is that __FP_FRAC_ADD_3 and __FP_FRAC_ADD_4 don't need to
> change (because they are already valid for exact overlap between the
> result and the second input).
>
OK, I would change the __FP_FRAC_SUB_3 and __FP_FRAC_SUB_4 and submit
next version after I testing glibc testsuite.
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 2601765..bb30093 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@ 
+2018-10-15  Zong Li  <zongbox@gmail.com>
+
+	* soft-fp/op-4.h: Fix wrong calculation of division.
+
 2018-10-14  Paul Eggert  <eggert@cs.ucla.edu>
 
 	regex: simplify by using intprops.h
diff --git a/soft-fp/op-4.h b/soft-fp/op-4.h
index 01b87d0..a407168 100644
--- a/soft-fp/op-4.h
+++ b/soft-fp/op-4.h
@@ -517,10 +517,15 @@ 
 	      R##_f[_FP_DIV_MEAT_4_udiv_i] = -1;			\
 	      if (!_FP_DIV_MEAT_4_udiv_i)				\
 		break;							\
-	      __FP_FRAC_SUB_4 (X##_f[3], X##_f[2], X##_f[1], X##_f[0],	\
+	      UWtype __FP_FRAC_tmp;					\
+	      __FP_FRAC_SUB_4 (X##_f[2], X##_f[1], X##_f[0], __FP_FRAC_tmp,	\
 			       Y##_f[2], Y##_f[1], Y##_f[0], 0,		\
 			       X##_f[2], X##_f[1], X##_f[0],		\
 			       _FP_DIV_MEAT_4_udiv_n_f[_FP_DIV_MEAT_4_udiv_i]); \
+	      X##_f[3] = X##_f[2];					\
+	      X##_f[2] = X##_f[1];					\
+	      X##_f[1] = X##_f[0];					\
+	      X##_f[0] = __FP_FRAC_tmp;					\
 	      _FP_FRAC_SUB_4 (X, Y, X);					\
 	      if (X##_f[3] > Y##_f[3])					\
 		{							\