Message ID | 588e39fe-e762-8f54-d793-8ba2e06eb9a2@linux.vnet.ibm.com |
---|---|
State | Superseded |
Headers |
Received: (qmail 9030 invoked by alias); 4 Aug 2016 15:06:25 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 9018 invoked by uid 89); 4 Aug 2016 15:06:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW, UNSUBSCRIBE_BODY autolearn=no version=3.3.2 spammy=digit, xdigit, 2.4.11, libm-test.inc X-HELO: mx0a-001b2d01.pphosted.com X-IBM-Helo: d03dlp03.boulder.ibm.com X-IBM-MailFrom: murphyp@linux.vnet.ibm.com From: "Paul E. Murphy" <murphyp@linux.vnet.ibm.com> Subject: [PATCH] Improve gen-libm-test.pl LIT() application To: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org> Cc: Joseph Myers <joseph@codesourcery.com> Date: Thu, 4 Aug 2016 10:05:59 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.0 MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16080415-0024-0000-0000-000014421B1D X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00005548; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000178; SDB=6.00739925; UDB=6.00347947; IPR=6.00512523; BA=6.00004644; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00012156; XFM=3.00000011; UTC=2016-08-04 15:06:09 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16080415-0025-0000-0000-0000434764DB Message-Id: <588e39fe-e762-8f54-d793-8ba2e06eb9a2@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-08-04_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1608040166 |
Commit Message
Paul E. Murphy
Aug. 4, 2016, 3:05 p.m. UTC
When bootstrapping float128, this exposed a number of areas where the L suffix is incorrectly applied to simple expressions when it should be applied to each constant in the expression. In order to stave off more macros in libm-test.inc, apply_lit is made slightly more intelligent. It will now split most basic expressions and apply LIT() individually to each token. As noted, it is not a perfect solution, but compromises correctness, readability, and simplicity. The above is problematic when the L real suffix is not the most expressive modifier, and the compiler complains (i.e ppc64) or silently truncates a value (i.e ppc64). * math/gen-libm-test.pl (apply_lit): Rewrite to apply LIT() to individual constants in simple expressions. (_apply_lit): Rename replaced version, and use it to apply to what appears to be a token. --- math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-)
Comments
On 08/04/2016 11:05 AM, Paul E. Murphy wrote: > When bootstrapping float128, this exposed a number of areas where > the L suffix is incorrectly applied to simple expressions when it > should be applied to each constant in the expression. > > In order to stave off more macros in libm-test.inc, apply_lit is > made slightly more intelligent. It will now split most basic > expressions and apply LIT() individually to each token. As noted, > it is not a perfect solution, but compromises correctness, > readability, and simplicity. > > The above is problematic when the L real suffix is not the most > expressive modifier, and the compiler complains (i.e ppc64) or > silently truncates a value (i.e ppc64). > > * math/gen-libm-test.pl (apply_lit): Rewrite to apply > LIT() to individual constants in simple expressions. > (_apply_lit): Rename replaced version, and use it to > apply to what appears to be a token. This looks like a good compromise to me. Question below. > --- > math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++- > 1 file changed, 39 insertions(+), 1 deletion(-) > > diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl > index aa66e76..cb0a02b 100755 > --- a/math/gen-libm-test.pl > +++ b/math/gen-libm-test.pl > @@ -163,7 +163,7 @@ sub show_exceptions { > > # Apply the LIT(x) macro to a literal floating point constant > # and strip any existing suffix. > -sub apply_lit { > +sub _apply_lit { > my ($lit) = @_; > my $exp_re = "([+-])?[[:digit:]]+"; > # Don't wrap something that does not look like a: > @@ -180,6 +180,44 @@ sub apply_lit { > return "LIT (${lit})"; > } > > + > +# Apply LIT macro to individual tokens within an > +# expression. This is only meant to work with > +# the very simple expressions encountered like > +# A (op B)*. It will not split all literals > +# correctly, but suffices for this usage without > +# a substantially more complex tokenizer. Is there any chance you can reject literals you won't correctly split and raise an error? > +sub apply_lit { > + my ($lit) = @_; > + my @toks = (); > + > + # There are some constant expressions embedded in the > + # test cases. To avoid writing a more complex lexer, > + # we apply some fixups, and split based on simple > + # operators, unfixup, then apply LIT as needed. > + > + # This behaves poorly on inputs like 0x0e+1.0f, > + # or MAX_EXP+1, but ultimately doesn't break > + # anything... for now. What do you mean by 'behaves poorly'? It looks like your regexp's below handle it correctly. > + $lit =~ s/([[:xdigit:]])[pP]\+/$1,/g; > + $lit =~ s/([[:xdigit:]])[pP]\-/$1#/g; > + $lit =~ s/([0-9])[eE]\+/$1'/g; > + $lit =~ s/([0-9])[eE]\-/$1"/g; > + > + # Split into tokenish looking things > + @toks = split (/([\*\-\+\/])/, $lit); > + > + # Remove fixups and apply LIT() if needed. > + foreach (@toks) { > + $_ =~ s/,/p+/g; > + $_ =~ s/#/p-/g; > + $_ =~ s/'/e+/g; > + $_ =~ s/"/e-/g; > + $_ = _apply_lit ($_); > + } > + return join ('', @toks); > +} > + > # Parse the arguments to TEST_x_y > sub parse_args { > my ($file, $descr, $args) = @_; >
On 08/04/2016 11:02 AM, Carlos O'Donell wrote: > On 08/04/2016 11:05 AM, Paul E. Murphy wrote: >> When bootstrapping float128, this exposed a number of areas where >> the L suffix is incorrectly applied to simple expressions when it >> should be applied to each constant in the expression. >> >> In order to stave off more macros in libm-test.inc, apply_lit is >> made slightly more intelligent. It will now split most basic >> expressions and apply LIT() individually to each token. As noted, >> it is not a perfect solution, but compromises correctness, >> readability, and simplicity. >> >> The above is problematic when the L real suffix is not the most >> expressive modifier, and the compiler complains (i.e ppc64) or >> silently truncates a value (i.e ppc64). >> >> * math/gen-libm-test.pl (apply_lit): Rewrite to apply >> LIT() to individual constants in simple expressions. >> (_apply_lit): Rename replaced version, and use it to >> apply to what appears to be a token. > > This looks like a good compromise to me. > > Question below. > >> --- >> math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 39 insertions(+), 1 deletion(-) >> >> diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl >> index aa66e76..cb0a02b 100755 >> --- a/math/gen-libm-test.pl >> +++ b/math/gen-libm-test.pl >> @@ -163,7 +163,7 @@ sub show_exceptions { >> >> # Apply the LIT(x) macro to a literal floating point constant >> # and strip any existing suffix. >> -sub apply_lit { >> +sub _apply_lit { >> my ($lit) = @_; >> my $exp_re = "([+-])?[[:digit:]]+"; >> # Don't wrap something that does not look like a: >> @@ -180,6 +180,44 @@ sub apply_lit { >> return "LIT (${lit})"; >> } >> >> + >> +# Apply LIT macro to individual tokens within an >> +# expression. This is only meant to work with >> +# the very simple expressions encountered like >> +# A (op B)*. It will not split all literals >> +# correctly, but suffices for this usage without >> +# a substantially more complex tokenizer. > > Is there any chance you can reject literals you won't correctly > split and raise an error? As I understand it, the only incorrect splitting occurs for some inputs of the form: {integer, identifier} op {integer,real} Which, will ultimately only apply LIT() to the expressions containing a real value as the second operand. But, LIT is applied to the entire expression. So you might end up passing things like "MAX_EXP+1", "0xe+1.0f" to _apply_lit. The former does happen, the latter is a constructed example. If more complicated expressions are used in libm-test.inc, or this workaround proven insufficient, we should refactor libm-test.inc to remove the need for this hack. >> +sub apply_lit { >> + my ($lit) = @_; >> + my @toks = (); >> + >> + # There are some constant expressions embedded in the >> + # test cases. To avoid writing a more complex lexer, >> + # we apply some fixups, and split based on simple >> + # operators, unfixup, then apply LIT as needed. >> + >> + # This behaves poorly on inputs like 0x0e+1.0f, >> + # or MAX_EXP+1, but ultimately doesn't break >> + # anything... for now. > > What do you mean by 'behaves poorly'? It looks like your > regexp's below handle it correctly. They are ultimately handled correctly, but you could still conceivably end up applying LIT to more than necessary. I.e LIT(0xe+1.0) instead of 0xe+LIT(1.0). MAX_EXP+1 is vacuously correct as it never needs modification. I've yet to convince myself this solution is foolproof, but it suffices for the current usage in libm-test.inc.
On Thu, 4 Aug 2016, Paul E. Murphy wrote:
> + # This behaves poorly on inputs like 0x0e+1.0f,
0x0e+1.0f is a pp-number which is not valid to convert from a pp-token to
a token. So it should never occur in libm-test.inc in the first place and
is not a useful example to give here.
On 08/04/2016 12:45 PM, Paul E. Murphy wrote: > > > On 08/04/2016 11:02 AM, Carlos O'Donell wrote: >> On 08/04/2016 11:05 AM, Paul E. Murphy wrote: >>> When bootstrapping float128, this exposed a number of areas where >>> the L suffix is incorrectly applied to simple expressions when it >>> should be applied to each constant in the expression. >>> >>> In order to stave off more macros in libm-test.inc, apply_lit is >>> made slightly more intelligent. It will now split most basic >>> expressions and apply LIT() individually to each token. As noted, >>> it is not a perfect solution, but compromises correctness, >>> readability, and simplicity. >>> >>> The above is problematic when the L real suffix is not the most >>> expressive modifier, and the compiler complains (i.e ppc64) or >>> silently truncates a value (i.e ppc64). >>> >>> * math/gen-libm-test.pl (apply_lit): Rewrite to apply >>> LIT() to individual constants in simple expressions. >>> (_apply_lit): Rename replaced version, and use it to >>> apply to what appears to be a token. >> >> This looks like a good compromise to me. >> >> Question below. >> >>> --- >>> math/gen-libm-test.pl | 40 +++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 39 insertions(+), 1 deletion(-) >>> >>> diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl >>> index aa66e76..cb0a02b 100755 >>> --- a/math/gen-libm-test.pl >>> +++ b/math/gen-libm-test.pl >>> @@ -163,7 +163,7 @@ sub show_exceptions { >>> >>> # Apply the LIT(x) macro to a literal floating point constant >>> # and strip any existing suffix. >>> -sub apply_lit { >>> +sub _apply_lit { >>> my ($lit) = @_; >>> my $exp_re = "([+-])?[[:digit:]]+"; >>> # Don't wrap something that does not look like a: >>> @@ -180,6 +180,44 @@ sub apply_lit { >>> return "LIT (${lit})"; >>> } >>> >>> + >>> +# Apply LIT macro to individual tokens within an >>> +# expression. This is only meant to work with >>> +# the very simple expressions encountered like >>> +# A (op B)*. It will not split all literals >>> +# correctly, but suffices for this usage without >>> +# a substantially more complex tokenizer. >> >> Is there any chance you can reject literals you won't correctly >> split and raise an error? > > As I understand it, the only incorrect splitting occurs for > some inputs of the form: > > {integer, identifier} op {integer,real} > > Which, will ultimately only apply LIT() to the expressions > containing a real value as the second operand. But, LIT is > applied to the entire expression. > > So you might end up passing things like "MAX_EXP+1", "0xe+1.0f" > to _apply_lit. The former does happen, the latter is a > constructed example. > > If more complicated expressions are used in libm-test.inc, or > this workaround proven insufficient, we should refactor > libm-test.inc to remove the need for this hack. OK, agreed. >>> +sub apply_lit { >>> + my ($lit) = @_; >>> + my @toks = (); >>> + >>> + # There are some constant expressions embedded in the >>> + # test cases. To avoid writing a more complex lexer, >>> + # we apply some fixups, and split based on simple >>> + # operators, unfixup, then apply LIT as needed. >>> + >>> + # This behaves poorly on inputs like 0x0e+1.0f, >>> + # or MAX_EXP+1, but ultimately doesn't break >>> + # anything... for now. >> >> What do you mean by 'behaves poorly'? It looks like your >> regexp's below handle it correctly. > > They are ultimately handled correctly, but you could still > conceivably end up applying LIT to more than necessary. > I.e LIT(0xe+1.0) instead of 0xe+LIT(1.0). MAX_EXP+1 is > vacuously correct as it never needs modification. > > I've yet to convince myself this solution is foolproof, > but it suffices for the current usage in libm-test.inc. OK, please adjust the comment wording to say that instead of "behaves poorly" OK with that change, as long as you tested on x86_64 and ppc64*.
On Thu, 4 Aug 2016, Paul E. Murphy wrote: > As I understand it, the only incorrect splitting occurs for > some inputs of the form: > > {integer, identifier} op {integer,real} > > Which, will ultimately only apply LIT() to the expressions > containing a real value as the second operand. But, LIT is > applied to the entire expression. > > So you might end up passing things like "MAX_EXP+1", "0xe+1.0f" > to _apply_lit. The former does happen, the latter is a > constructed example. > > If more complicated expressions are used in libm-test.inc, or > this workaround proven insufficient, we should refactor > libm-test.inc to remove the need for this hack. How about putting spaces around the operators in libm-test.inc in all cases where you need to split on operators, and then making the code split on spaces rather than needing to do more complicated lexing and substitutions to identify tokens? Spaces should be present anyway in accordance with the GNU Coding Standards.
On 08/04/2016 12:51 PM, Joseph Myers wrote: > On Thu, 4 Aug 2016, Paul E. Murphy wrote: > >> As I understand it, the only incorrect splitting occurs for >> some inputs of the form: >> >> {integer, identifier} op {integer,real} >> >> Which, will ultimately only apply LIT() to the expressions >> containing a real value as the second operand. But, LIT is >> applied to the entire expression. >> >> So you might end up passing things like "MAX_EXP+1", "0xe+1.0f" >> to _apply_lit. The former does happen, the latter is a >> constructed example. >> >> If more complicated expressions are used in libm-test.inc, or >> this workaround proven insufficient, we should refactor >> libm-test.inc to remove the need for this hack. > > How about putting spaces around the operators in libm-test.inc in all > cases where you need to split on operators, and then making the code split > on spaces rather than needing to do more complicated lexing and > substitutions to identify tokens? Spaces should be present anyway in > accordance with the GNU Coding Standards. +1 That's a good suggestion.
diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl index aa66e76..cb0a02b 100755 --- a/math/gen-libm-test.pl +++ b/math/gen-libm-test.pl @@ -163,7 +163,7 @@ sub show_exceptions { # Apply the LIT(x) macro to a literal floating point constant # and strip any existing suffix. -sub apply_lit { +sub _apply_lit { my ($lit) = @_; my $exp_re = "([+-])?[[:digit:]]+"; # Don't wrap something that does not look like a: @@ -180,6 +180,44 @@ sub apply_lit { return "LIT (${lit})"; } + +# Apply LIT macro to individual tokens within an +# expression. This is only meant to work with +# the very simple expressions encountered like +# A (op B)*. It will not split all literals +# correctly, but suffices for this usage without +# a substantially more complex tokenizer. +sub apply_lit { + my ($lit) = @_; + my @toks = (); + + # There are some constant expressions embedded in the + # test cases. To avoid writing a more complex lexer, + # we apply some fixups, and split based on simple + # operators, unfixup, then apply LIT as needed. + + # This behaves poorly on inputs like 0x0e+1.0f, + # or MAX_EXP+1, but ultimately doesn't break + # anything... for now. + $lit =~ s/([[:xdigit:]])[pP]\+/$1,/g; + $lit =~ s/([[:xdigit:]])[pP]\-/$1#/g; + $lit =~ s/([0-9])[eE]\+/$1'/g; + $lit =~ s/([0-9])[eE]\-/$1"/g; + + # Split into tokenish looking things + @toks = split (/([\*\-\+\/])/, $lit); + + # Remove fixups and apply LIT() if needed. + foreach (@toks) { + $_ =~ s/,/p+/g; + $_ =~ s/#/p-/g; + $_ =~ s/'/e+/g; + $_ =~ s/"/e-/g; + $_ = _apply_lit ($_); + } + return join ('', @toks); +} + # Parse the arguments to TEST_x_y sub parse_args { my ($file, $descr, $args) = @_;