Improve gen-libm-test.pl LIT() application

Message ID 588e39fe-e762-8f54-d793-8ba2e06eb9a2@linux.vnet.ibm.com
State Superseded
Headers

Commit Message

Paul E. Murphy Aug. 4, 2016, 3:05 p.m. UTC
  When bootstrapping float128, this exposed a number of areas where
the L suffix is incorrectly applied to simple expressions when it
should be applied to each constant in the expression.

In order to stave off more macros in libm-test.inc, apply_lit is
made slightly more intelligent.  It will now split most basic
expressions and apply LIT() individually to each token.  As noted,
it is not a perfect solution, but compromises correctness,
readability, and simplicity.

The above is problematic when the L real suffix is not the most
expressive modifier, and the compiler complains (i.e ppc64) or
silently truncates a value (i.e ppc64).

	* math/gen-libm-test.pl (apply_lit): Rewrite to apply
	LIT() to individual constants in simple expressions.
	(_apply_lit): Rename replaced version, and use it to
	apply to what appears to be a token.
---
 math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)
  

Comments

Carlos O'Donell Aug. 4, 2016, 4:02 p.m. UTC | #1
On 08/04/2016 11:05 AM, Paul E. Murphy wrote:
> When bootstrapping float128, this exposed a number of areas where
> the L suffix is incorrectly applied to simple expressions when it
> should be applied to each constant in the expression.
> 
> In order to stave off more macros in libm-test.inc, apply_lit is
> made slightly more intelligent.  It will now split most basic
> expressions and apply LIT() individually to each token.  As noted,
> it is not a perfect solution, but compromises correctness,
> readability, and simplicity.
> 
> The above is problematic when the L real suffix is not the most
> expressive modifier, and the compiler complains (i.e ppc64) or
> silently truncates a value (i.e ppc64).
> 
> 	* math/gen-libm-test.pl (apply_lit): Rewrite to apply
> 	LIT() to individual constants in simple expressions.
> 	(_apply_lit): Rename replaced version, and use it to
> 	apply to what appears to be a token.

This looks like a good compromise to me.

Question below.

> ---
>  math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl
> index aa66e76..cb0a02b 100755
> --- a/math/gen-libm-test.pl
> +++ b/math/gen-libm-test.pl
> @@ -163,7 +163,7 @@ sub show_exceptions {
>  
>  # Apply the LIT(x) macro to a literal floating point constant
>  # and strip any existing suffix.
> -sub apply_lit {
> +sub _apply_lit {
>    my ($lit) = @_;
>    my $exp_re = "([+-])?[[:digit:]]+";
>    # Don't wrap something that does not look like a:
> @@ -180,6 +180,44 @@ sub apply_lit {
>    return "LIT (${lit})";
>  }
>  
> +
> +# Apply LIT macro to individual tokens within an
> +# expression.  This is only meant to work with
> +# the very simple expressions encountered like
> +# A (op B)*.  It will not split all literals
> +# correctly, but suffices for this usage without
> +# a substantially more complex tokenizer.

Is there any chance you can reject literals you won't correctly
split and raise an error?

> +sub apply_lit {
> +  my ($lit) = @_;
> +  my @toks = ();
> +
> +  # There are some constant expressions embedded in the
> +  # test cases.  To avoid writing a more complex lexer,
> +  # we apply some fixups, and split based on simple
> +  # operators, unfixup, then apply LIT as needed.
> +
> +  # This behaves poorly on inputs like 0x0e+1.0f,
> +  # or MAX_EXP+1, but ultimately doesn't break
> +  # anything... for now.

What do you mean by 'behaves poorly'? It looks like your
regexp's below handle it correctly.

> +  $lit =~ s/([[:xdigit:]])[pP]\+/$1,/g;
> +  $lit =~ s/([[:xdigit:]])[pP]\-/$1#/g;
> +  $lit =~ s/([0-9])[eE]\+/$1'/g;
> +  $lit =~ s/([0-9])[eE]\-/$1"/g;
> +
> +  # Split into tokenish looking things
> +  @toks = split (/([\*\-\+\/])/, $lit);
> +
> +  # Remove fixups and apply LIT() if needed.
> +  foreach (@toks) {
> +    $_ =~ s/,/p+/g;
> +    $_ =~ s/#/p-/g;
> +    $_ =~ s/'/e+/g;
> +    $_ =~ s/"/e-/g;
> +    $_ = _apply_lit ($_);
> +  }
> +  return join ('', @toks);
> +}
> +
>  # Parse the arguments to TEST_x_y
>  sub parse_args {
>    my ($file, $descr, $args) = @_;
>
  
Paul E. Murphy Aug. 4, 2016, 4:45 p.m. UTC | #2
On 08/04/2016 11:02 AM, Carlos O'Donell wrote:
> On 08/04/2016 11:05 AM, Paul E. Murphy wrote:
>> When bootstrapping float128, this exposed a number of areas where
>> the L suffix is incorrectly applied to simple expressions when it
>> should be applied to each constant in the expression.
>>
>> In order to stave off more macros in libm-test.inc, apply_lit is
>> made slightly more intelligent.  It will now split most basic
>> expressions and apply LIT() individually to each token.  As noted,
>> it is not a perfect solution, but compromises correctness,
>> readability, and simplicity.
>>
>> The above is problematic when the L real suffix is not the most
>> expressive modifier, and the compiler complains (i.e ppc64) or
>> silently truncates a value (i.e ppc64).
>>
>> 	* math/gen-libm-test.pl (apply_lit): Rewrite to apply
>> 	LIT() to individual constants in simple expressions.
>> 	(_apply_lit): Rename replaced version, and use it to
>> 	apply to what appears to be a token.
> 
> This looks like a good compromise to me.
> 
> Question below.
> 
>> ---
>>  math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl
>> index aa66e76..cb0a02b 100755
>> --- a/math/gen-libm-test.pl
>> +++ b/math/gen-libm-test.pl
>> @@ -163,7 +163,7 @@ sub show_exceptions {
>>  
>>  # Apply the LIT(x) macro to a literal floating point constant
>>  # and strip any existing suffix.
>> -sub apply_lit {
>> +sub _apply_lit {
>>    my ($lit) = @_;
>>    my $exp_re = "([+-])?[[:digit:]]+";
>>    # Don't wrap something that does not look like a:
>> @@ -180,6 +180,44 @@ sub apply_lit {
>>    return "LIT (${lit})";
>>  }
>>  
>> +
>> +# Apply LIT macro to individual tokens within an
>> +# expression.  This is only meant to work with
>> +# the very simple expressions encountered like
>> +# A (op B)*.  It will not split all literals
>> +# correctly, but suffices for this usage without
>> +# a substantially more complex tokenizer.
> 
> Is there any chance you can reject literals you won't correctly
> split and raise an error?

As I understand it, the only incorrect splitting occurs for
some inputs of the form:

{integer, identifier} op {integer,real}

Which, will ultimately only apply LIT() to the expressions
containing a real value as the second operand.  But, LIT is
applied to the entire expression.

So you might end up passing things like "MAX_EXP+1", "0xe+1.0f"
to _apply_lit.  The former does happen, the latter is a
constructed example.

If more complicated expressions are used in libm-test.inc, or
this workaround proven insufficient, we should refactor
libm-test.inc to remove the need for this hack.

>> +sub apply_lit {
>> +  my ($lit) = @_;
>> +  my @toks = ();
>> +
>> +  # There are some constant expressions embedded in the
>> +  # test cases.  To avoid writing a more complex lexer,
>> +  # we apply some fixups, and split based on simple
>> +  # operators, unfixup, then apply LIT as needed.
>> +
>> +  # This behaves poorly on inputs like 0x0e+1.0f,
>> +  # or MAX_EXP+1, but ultimately doesn't break
>> +  # anything... for now.
> 
> What do you mean by 'behaves poorly'? It looks like your
> regexp's below handle it correctly.

They are ultimately handled correctly, but you could still
conceivably end up applying LIT to more than necessary.
I.e LIT(0xe+1.0) instead of 0xe+LIT(1.0).  MAX_EXP+1 is
vacuously correct as it never needs modification. 

I've yet to convince myself this solution is foolproof,
but it suffices for the current usage in libm-test.inc.
  
Joseph Myers Aug. 4, 2016, 4:47 p.m. UTC | #3
On Thu, 4 Aug 2016, Paul E. Murphy wrote:

> +  # This behaves poorly on inputs like 0x0e+1.0f,

0x0e+1.0f is a pp-number which is not valid to convert from a pp-token to 
a token.  So it should never occur in libm-test.inc in the first place and 
is not a useful example to give here.
  
Carlos O'Donell Aug. 4, 2016, 4:51 p.m. UTC | #4
On 08/04/2016 12:45 PM, Paul E. Murphy wrote:
> 
> 
> On 08/04/2016 11:02 AM, Carlos O'Donell wrote:
>> On 08/04/2016 11:05 AM, Paul E. Murphy wrote:
>>> When bootstrapping float128, this exposed a number of areas where
>>> the L suffix is incorrectly applied to simple expressions when it
>>> should be applied to each constant in the expression.
>>>
>>> In order to stave off more macros in libm-test.inc, apply_lit is
>>> made slightly more intelligent.  It will now split most basic
>>> expressions and apply LIT() individually to each token.  As noted,
>>> it is not a perfect solution, but compromises correctness,
>>> readability, and simplicity.
>>>
>>> The above is problematic when the L real suffix is not the most
>>> expressive modifier, and the compiler complains (i.e ppc64) or
>>> silently truncates a value (i.e ppc64).
>>>
>>> 	* math/gen-libm-test.pl (apply_lit): Rewrite to apply
>>> 	LIT() to individual constants in simple expressions.
>>> 	(_apply_lit): Rename replaced version, and use it to
>>> 	apply to what appears to be a token.
>>
>> This looks like a good compromise to me.
>>
>> Question below.
>>
>>> ---
>>>  math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl
>>> index aa66e76..cb0a02b 100755
>>> --- a/math/gen-libm-test.pl
>>> +++ b/math/gen-libm-test.pl
>>> @@ -163,7 +163,7 @@ sub show_exceptions {
>>>  
>>>  # Apply the LIT(x) macro to a literal floating point constant
>>>  # and strip any existing suffix.
>>> -sub apply_lit {
>>> +sub _apply_lit {
>>>    my ($lit) = @_;
>>>    my $exp_re = "([+-])?[[:digit:]]+";
>>>    # Don't wrap something that does not look like a:
>>> @@ -180,6 +180,44 @@ sub apply_lit {
>>>    return "LIT (${lit})";
>>>  }
>>>  
>>> +
>>> +# Apply LIT macro to individual tokens within an
>>> +# expression.  This is only meant to work with
>>> +# the very simple expressions encountered like
>>> +# A (op B)*.  It will not split all literals
>>> +# correctly, but suffices for this usage without
>>> +# a substantially more complex tokenizer.
>>
>> Is there any chance you can reject literals you won't correctly
>> split and raise an error?
> 
> As I understand it, the only incorrect splitting occurs for
> some inputs of the form:
> 
> {integer, identifier} op {integer,real}
> 
> Which, will ultimately only apply LIT() to the expressions
> containing a real value as the second operand.  But, LIT is
> applied to the entire expression.
> 
> So you might end up passing things like "MAX_EXP+1", "0xe+1.0f"
> to _apply_lit.  The former does happen, the latter is a
> constructed example.
> 
> If more complicated expressions are used in libm-test.inc, or
> this workaround proven insufficient, we should refactor
> libm-test.inc to remove the need for this hack.

OK, agreed.

>>> +sub apply_lit {
>>> +  my ($lit) = @_;
>>> +  my @toks = ();
>>> +
>>> +  # There are some constant expressions embedded in the
>>> +  # test cases.  To avoid writing a more complex lexer,
>>> +  # we apply some fixups, and split based on simple
>>> +  # operators, unfixup, then apply LIT as needed.
>>> +
>>> +  # This behaves poorly on inputs like 0x0e+1.0f,
>>> +  # or MAX_EXP+1, but ultimately doesn't break
>>> +  # anything... for now.
>>
>> What do you mean by 'behaves poorly'? It looks like your
>> regexp's below handle it correctly.
> 
> They are ultimately handled correctly, but you could still
> conceivably end up applying LIT to more than necessary.
> I.e LIT(0xe+1.0) instead of 0xe+LIT(1.0).  MAX_EXP+1 is
> vacuously correct as it never needs modification. 
> 
> I've yet to convince myself this solution is foolproof,
> but it suffices for the current usage in libm-test.inc.
 
OK, please adjust the comment wording to say that instead
of "behaves poorly"

OK with that change, as long as you tested on x86_64 and
ppc64*.
  
Joseph Myers Aug. 4, 2016, 4:51 p.m. UTC | #5
On Thu, 4 Aug 2016, Paul E. Murphy wrote:

> As I understand it, the only incorrect splitting occurs for
> some inputs of the form:
> 
> {integer, identifier} op {integer,real}
> 
> Which, will ultimately only apply LIT() to the expressions
> containing a real value as the second operand.  But, LIT is
> applied to the entire expression.
> 
> So you might end up passing things like "MAX_EXP+1", "0xe+1.0f"
> to _apply_lit.  The former does happen, the latter is a
> constructed example.
> 
> If more complicated expressions are used in libm-test.inc, or
> this workaround proven insufficient, we should refactor
> libm-test.inc to remove the need for this hack.

How about putting spaces around the operators in libm-test.inc in all 
cases where you need to split on operators, and then making the code split 
on spaces rather than needing to do more complicated lexing and 
substitutions to identify tokens?  Spaces should be present anyway in 
accordance with the GNU Coding Standards.
  
Carlos O'Donell Aug. 4, 2016, 4:52 p.m. UTC | #6
On 08/04/2016 12:51 PM, Joseph Myers wrote:
> On Thu, 4 Aug 2016, Paul E. Murphy wrote:
> 
>> As I understand it, the only incorrect splitting occurs for
>> some inputs of the form:
>>
>> {integer, identifier} op {integer,real}
>>
>> Which, will ultimately only apply LIT() to the expressions
>> containing a real value as the second operand.  But, LIT is
>> applied to the entire expression.
>>
>> So you might end up passing things like "MAX_EXP+1", "0xe+1.0f"
>> to _apply_lit.  The former does happen, the latter is a
>> constructed example.
>>
>> If more complicated expressions are used in libm-test.inc, or
>> this workaround proven insufficient, we should refactor
>> libm-test.inc to remove the need for this hack.
> 
> How about putting spaces around the operators in libm-test.inc in all 
> cases where you need to split on operators, and then making the code split 
> on spaces rather than needing to do more complicated lexing and 
> substitutions to identify tokens?  Spaces should be present anyway in 
> accordance with the GNU Coding Standards.
 
+1

That's a good suggestion.
  

Patch

diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl
index aa66e76..cb0a02b 100755
--- a/math/gen-libm-test.pl
+++ b/math/gen-libm-test.pl
@@ -163,7 +163,7 @@  sub show_exceptions {
 
 # Apply the LIT(x) macro to a literal floating point constant
 # and strip any existing suffix.
-sub apply_lit {
+sub _apply_lit {
   my ($lit) = @_;
   my $exp_re = "([+-])?[[:digit:]]+";
   # Don't wrap something that does not look like a:
@@ -180,6 +180,44 @@  sub apply_lit {
   return "LIT (${lit})";
 }
 
+
+# Apply LIT macro to individual tokens within an
+# expression.  This is only meant to work with
+# the very simple expressions encountered like
+# A (op B)*.  It will not split all literals
+# correctly, but suffices for this usage without
+# a substantially more complex tokenizer.
+sub apply_lit {
+  my ($lit) = @_;
+  my @toks = ();
+
+  # There are some constant expressions embedded in the
+  # test cases.  To avoid writing a more complex lexer,
+  # we apply some fixups, and split based on simple
+  # operators, unfixup, then apply LIT as needed.
+
+  # This behaves poorly on inputs like 0x0e+1.0f,
+  # or MAX_EXP+1, but ultimately doesn't break
+  # anything... for now.
+  $lit =~ s/([[:xdigit:]])[pP]\+/$1,/g;
+  $lit =~ s/([[:xdigit:]])[pP]\-/$1#/g;
+  $lit =~ s/([0-9])[eE]\+/$1'/g;
+  $lit =~ s/([0-9])[eE]\-/$1"/g;
+
+  # Split into tokenish looking things
+  @toks = split (/([\*\-\+\/])/, $lit);
+
+  # Remove fixups and apply LIT() if needed.
+  foreach (@toks) {
+    $_ =~ s/,/p+/g;
+    $_ =~ s/#/p-/g;
+    $_ =~ s/'/e+/g;
+    $_ =~ s/"/e-/g;
+    $_ = _apply_lit ($_);
+  }
+  return join ('', @toks);
+}
+
 # Parse the arguments to TEST_x_y
 sub parse_args {
   my ($file, $descr, $args) = @_;