[libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Testing passed
|
Commit Message
Hi all,
The attached log entry and patch (git show) fixes this issue by adding
logic to handle spaces in eat_separators. One or more spaces by
themselves are a valid separator. So in this case we look at the
character following the spaces to see if it is a comma or semicolon.
If so, I change it to the valid separator for the given decimal mode,
point or comma. This allows the comma or semicolon to be interpreted as
a null read on the next effective item in the formatted read.
I chose a permissive approach here that allows reads to proceed when the
input line is mal-formed with an incorrect separator as long as there is
at least one space in front of it.
New test case included. Regression tested on X86-64.
OK for trunk? Backport to 13 after some time.
Regards,
Jerry
Comments
Hi Jerry,
It looks good to me. Noting that this is not a regression, OK for mainline
on condition that you keep a sharp eye out for any associated problems.
Likewise with backporting to 13-branch.
Thanks
Paul
On Thu, 4 Apr 2024 at 02:34, Jerry D <jvdelisle2@gmail.com> wrote:
> Hi all,
>
> The attached log entry and patch (git show) fixes this issue by adding
> logic to handle spaces in eat_separators. One or more spaces by
> themselves are a valid separator. So in this case we look at the
> character following the spaces to see if it is a comma or semicolon.
>
> If so, I change it to the valid separator for the given decimal mode,
> point or comma. This allows the comma or semicolon to be interpreted as
> a null read on the next effective item in the formatted read.
>
> I chose a permissive approach here that allows reads to proceed when the
> input line is mal-formed with an incorrect separator as long as there is
> at least one space in front of it.
>
> New test case included. Regression tested on X86-64.
>
> OK for trunk? Backport to 13 after some time.
>
> Regards,
>
> Jerry
Hi Jerry,
Jerry D wrote:
> The attached log entry and patch (git show) fixes this issue by adding
> logic to handle spaces in eat_separators. One or more spaces by
> themselves are a valid separator. So in this case we look at the
> character following the spaces to see if it is a comma or semicolon.
>
> If so, I change it to the valid separator for the given decimal mode,
> point or comma. This allows the comma or semicolon to be interpreted as
> a null read on the next effective item in the formatted read.
>
> I chose a permissive approach here that allows reads to proceed when the
> input line is mal-formed with an incorrect separator as long as there is
> at least one space in front of it.
First: Consider also adding 'PR fortran/105473' to the commit log
as the PRs are closely related, albeit this PR is different-
The patch looks mostly like I would expect, except for decimal='point'
and a ';' which is *not* preceded by a space.
Thanks for working on it.
Regarding the 'except' case:
* * *
If I try your patch with the testcase of at comment 19,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304#c19
→ https://gcc.gnu.org/bugzilla/attachment.cgi?id=57695 ,
I do note that with 'decimal=point', a tailing semicolon is silently
accepted – even if not proceeded by a space.
I think such code is invalid – and you could consider to reject it.
Otherwise, the handling all seems to be in line with the Fortran spec.
i.e. for the following string, I had *expected an error*:
point, isreal = F , testinput = ";"n= 42 ios= 0
point, isreal = F , testinput = "5;"n= 5 ios= 0
point, isreal = T , testinput = "8;"r= 8.00000000 ios= 0
point, isreal = T , testinput = "3.3;"r= 3.29999995 ios= 0
point, isreal = T , testinput = "3,3;"r= 3.00000000 ios= 0
while I think the following is OK (i.e. no error is what I expect) due
to the the space before the ';'.
point, isreal = F , testinput = "7 ;"n= 7 ios= 0
point, isreal = T , testinput = "9 ;"r= 9.00000000 ios= 0
point, isreal = T , testinput = "4.4 ;"r= 4.40000010 ios=0
point, isreal = T , testinput = "9 ;"r= 9.00000000 ios= 0
point, isreal = T , testinput = "4,4 ;"r= 4.00000000 ios= 0
* * *
Looking at the other compilers, ifort, ifx and Flang do issue an error
here. Likewise, g95 seems to yield an error in this case (see below).
I do note that the Lapack testcase that triggered this PR did have such
a code - but it was then changed because g95 did not like it:
https://github.com/Reference-LAPACK/lapack/commit/64e8a7500d817869e5fcde35afd39af8bc7a8086
In terms of gfortran: until recently did accept it (all versions,
including 13+14); it then rejected it due to the change in PR105473 (GCC
14/mainline, backported to 13)– but I now think it rightly did so. With
the current patch, it is accepted again.
* * *
I have attached the modified testcase linked above; consider adding it
as well. - Changes to the one of the attachment:
- I added a few additional (albeit boring) tests
- I added an expected output + error diagnostic.
The testcase assumes an error for ';' as separator (with 'point'),
unless there is a space before it.
[If we want to not diagnose this as vendor extension, we really need to
add a comment to that testcase besides changing valid = .false. to .true.]
Tobias
On 4/4/24 2:31 AM, Tobias Burnus wrote:
> Hi Jerry,
>
> Jerry D wrote:
>> The attached log entry and patch (git show) fixes this issue by adding
>> logic to handle spaces in eat_separators. One or more spaces by
>> themselves are a valid separator. So in this case we look at the
>> character following the spaces to see if it is a comma or semicolon.
>>
>> If so, I change it to the valid separator for the given decimal mode,
>> point or comma. This allows the comma or semicolon to be interpreted
>> as a null read on the next effective item in the formatted read.
>>
>> I chose a permissive approach here that allows reads to proceed when the
>> input line is mal-formed with an incorrect separator as long as there
>> is at least one space in front of it.
>
> First: Consider also adding 'PR fortran/105473' to the commit log
> as the PRs are closely related, albeit this PR is different-
>
> The patch looks mostly like I would expect, except for decimal='point'
> and a ';' which is *not* preceded by a space.
>
> Thanks for working on it.
>
> Regarding the 'except' case:
>
> * * *
>
> If I try your patch with the testcase of at comment 19,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304#c19
> → https://gcc.gnu.org/bugzilla/attachment.cgi?id=57695 ,
>
> I do note that with 'decimal=point', a tailing semicolon is silently
> accepted – even if not proceeded by a space.
I did that on purpose out of leniency and fear of breaking something. I
will add that in and do some testing.
>
> I think such code is invalid – and you could consider to reject it.
> Otherwise, the handling all seems to be in line with the Fortran spec.
>
> i.e. for the following string, I had *expected an error*:
I will see what it does. I agree in principle.
>
> point, isreal = F , testinput = ";"n= 42 ios= 0
> point, isreal = F , testinput = "5;"n= 5 ios= 0
> point, isreal = T , testinput = "8;"r= 8.00000000 ios= 0
> point, isreal = T , testinput = "3.3;"r= 3.29999995 ios= 0
> point, isreal = T , testinput = "3,3;"r= 3.00000000 ios= 0
>
> while I think the following is OK (i.e. no error is what I expect) due
> to the the space before the ';'.
Agree and this is what I was attempting.
>
> point, isreal = F , testinput = "7 ;"n= 7 ios= 0
> point, isreal = T , testinput = "9 ;"r= 9.00000000 ios= 0
> point, isreal = T , testinput = "4.4 ;"r= 4.40000010 ios=0
> point, isreal = T , testinput = "9 ;"r= 9.00000000 ios= 0
> point, isreal = T , testinput = "4,4 ;"r= 4.00000000 ios= 0
>
> * * *
>
> Looking at the other compilers, ifort, ifx and Flang do issue an error
> here. Likewise, g95 seems to yield an error in this case (see below).
>
> I do note that the Lapack testcase that triggered this PR did have such
> a code - but it was then changed because g95 did not like it:
>
> https://github.com/Reference-LAPACK/lapack/commit/64e8a7500d817869e5fcde35afd39af8bc7a8086
>
I am glad they fixed that, it triggered a whole lot of cycles here.
> In terms of gfortran: until recently did accept it (all versions,
> including 13+14); it then rejected it due to the change in PR105473 (GCC
> 14/mainline, backported to 13)– but I now think it rightly did so. With
> the current patch, it is accepted again.
>
> * * *
>
> I have attached the modified testcase linked above; consider adding it
> as well. - Changes to the one of the attachment:
> - I added a few additional (albeit boring) tests
> - I added an expected output + error diagnostic.
>
> The testcase assumes an error for ';' as separator (with 'point'),
> unless there is a space before it.
>
> [If we want to not diagnose this as vendor extension, we really need to
> add a comment to that testcase besides changing valid = .false. to .true.]
I have gone back and forth on this issue many times trying to find the
middle ground. The standard is fairly clear. To me it is on the user to
not use malformed input so allowing with the intervening space we are
simply ignoring the trailing ',' or ';' and calling the spaces
sufficient as a valid separator.
Regardless I now have your modified test case passing. Much appreciated.
Thanks for the reviews. I will resubmit when I can, hopefully today.
Cheers,
Jerry
On 4/4/24 2:31 AM, Tobias Burnus wrote:
> Hi Jerry,
>
--- snip ---
> The patch looks mostly like I would expect, except for decimal='point'
> and a ';' which is *not* preceded by a space.
>
> Thanks for working on it.
>
> Regarding the 'except' case:
>
--- snip ---
> i.e. for the following string, I had *expected an error*:
>
> point, isreal = F , testinput = ";"n= 42 ios= 0
> point, isreal = F , testinput = "5;"n= 5 ios= 0
> point, isreal = T , testinput = "8;"r= 8.00000000 ios= 0
> point, isreal = T , testinput = "3.3;"r= 3.29999995 ios= 0
> point, isreal = T , testinput = "3,3;"r= 3.00000000 ios= 0
>
--- snip ---
> I have attached the modified testcase linked above; consider adding it
> as well. - Changes to the one of the attachment:
> - I added a few additional (albeit boring) tests
> - I added an expected output + error diagnostic.
>
> The testcase assumes an error for ';' as separator (with 'point'),
> unless there is a space before it.
>
--- snip ---
Here attached is the revised patch. I replaced the new test case I had
with the one you provided modified and it now passes.
I modified the test case pr105473.f90 to expect the error on semicolon.
Regression tested on X86_64. OK for trunk and a bit later back port to 13?
Cheers,
Jerry
Hi Jerry,
I think for the current testcases, I like the patch – the question is
only what's about:
',3' as input for 'comma' (or '.3' as input for 'point')
For 'point' – 0.3 is read and ios = 0 (as expected)
But for 'comma':
* GCC 12 reads nothing and has ios = 0.
* GCC 13/mainline has an error (ios != 0 – and reads nothing)
* GCC with your patch: Same result: ios != 0 and nothing read.
Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
→ https://godbolt.org/z/4rc8fz4sT for the full example, which works with
ifort, ifx and flang
* * *
Can you check and fix this? It looks perfectly valid to me to have
remove the '0' in the floating point numbers '0.3' or '0,3' seems to be
permitted – and it works for '.' (with 'point') but not for ',' (with
'comma').
F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F
editing", which states:
"The standard form of the input field [...] The form of the mantissa is
an optional sign, followed by a string of one or more digits optionally
containing a decimal symbol."
The latter does not require that the digit has to be before the decimal
sign and as for output, it is optional, it is surely intended that ",3"
is a valid floating-point number for decimal='comma'.
* * *
I extended the testcase to check for this – see attached diff. All
'point' work, all 'comma' fail.
Thanks for working on this!
Tobias
On 4/4/24 2:41 PM, Tobias Burnus wrote:
> Hi Jerry,
>
> I think for the current testcases, I like the patch – the question is
> only what's about:
>
> ',3' as input for 'comma' (or '.3' as input for 'point')
>
> For 'point' – 0.3 is read and ios = 0 (as expected)
> But for 'comma':
> * GCC 12 reads nothing and has ios = 0.
> * GCC 13/mainline has an error (ios != 0 – and reads nothing)
> * GCC with your patch: Same result: ios != 0 and nothing read.
>
> Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
> → https://godbolt.org/z/4rc8fz4sT for the full example, which works with
> ifort, ifx and flang
>
> * * *
>
> Can you check and fix this? It looks perfectly valid to me to have
> remove the '0' in the floating point numbers '0.3' or '0,3' seems to be
> permitted – and it works for '.' (with 'point') but not for ',' (with
> 'comma').
>
Yes, I found the spot to fix this.
> F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F
> editing", which states:
>
> "The standard form of the input field [...] The form of the mantissa is
> an optional sign, followed by a string of one or more digits optionally
> containing a decimal symbol."
>
> The latter does not require that the digit has to be before the decimal
> sign and as for output, it is optional, it is surely intended that ",3"
> is a valid floating-point number for decimal='comma'.
>
Agree
> * * *
>
> I extended the testcase to check for this – see attached diff. All
> 'point' work, all 'comma' fail.
>
> Thanks for working on this!
>
> Tobias
Thanks much. After I fix it, I will use your extended test case in the
test suite.
Jerry -
On 4/5/24 10:47 AM, Jerry D wrote:
> On 4/4/24 2:41 PM, Tobias Burnus wrote:
>> Hi Jerry,
>>
>> I think for the current testcases, I like the patch – the question is
>> only what's about:
>>
>> ',3' as input for 'comma' (or '.3' as input for 'point')
>>
>> For 'point' – 0.3 is read and ios = 0 (as expected)
>> But for 'comma':
>> * GCC 12 reads nothing and has ios = 0.
>> * GCC 13/mainline has an error (ios != 0 – and reads nothing)
>> * GCC with your patch: Same result: ios != 0 and nothing read.
>>
>> Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
>> → https://godbolt.org/z/4rc8fz4sT for the full example, which works
>> with ifort, ifx and flang
>>
>> * * *
>>
>> Can you check and fix this? It looks perfectly valid to me to have
>> remove the '0' in the floating point numbers '0.3' or '0,3' seems to
>> be permitted – and it works for '.' (with 'point') but not for ','
>> (with 'comma').
>>
> Yes, I found the spot to fix this.
>
>> F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F
>> editing", which states:
>>
>> "The standard form of the input field [...] The form of the mantissa
>> is an optional sign, followed by a string of one or more digits
>> optionally containing a decimal symbol."
>>
>> The latter does not require that the digit has to be before the
>> decimal sign and as for output, it is optional, it is surely intended
>> that ",3" is a valid floating-point number for decimal='comma'.
>>
>
> Agree
>
>> * * *
>>
>> I extended the testcase to check for this – see attached diff. All
>> 'point' work, all 'comma' fail.
>>
>> Thanks for working on this!
>>
>> Tobias
>
> Thanks much. After I fix it, I will use your extended test case in the
> test suite.
>
> Jerry -
See attached updated patch.
Regressions tested on x86-64. OK for trunk and 13 after a bit.
Jerry -
Hi Jerry, hello world,
Jerry D wrote:
> On 4/5/24 10:47 AM, Jerry D wrote:
>> On 4/4/24 2:41 PM, Tobias Burnus wrote:
>>> I think for the current testcases, I like the patch – the question
>>> is only what's about:
>>> ',3' as input for 'comma' (or '.3' as input for 'point')
>>> [...]
>>> But for 'comma': [...]
>>> * GCC with your patch: Same result: ios != 0 and nothing read.
>>>
>>> Expected: [...] read-in value is 0.3. [...]
> See attached updated patch.
> Regressions tested on x86-64. OK for trunk and 13 after a bit.
OK. Thanks for the patch!
Tobias
commit 7d1a958d6b099ea88b6c51649baf5dbd5e598909
Author: Jerry DeLisle <jvdelisle@gcc.gnu.org>
Date: Wed Apr 3 18:07:30 2024 -0700
libfortran: Fix handling of formatted separators.
PR libfortran/114304
libgfortran/ChangeLog:
* io/list_read.c (eat_separator): Add logic to handle spaces
preceding a comma or semicolon such that that a 'null' read
occurs without error at the end of comma or semicolon
terminated input lines.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr114304.f90: New test.
new file mode 100644
@@ -0,0 +1,49 @@
+! { dg-do run }
+! pr114304
+real :: x(3)
+character(len=1) :: s
+integer :: ios
+
+s = 'x'
+
+open(99, decimal="comma", status='scratch')
+write(99, '(a)') '1,23435 1243,24 13,24 a'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'a') stop 1
+
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24;a'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'a') stop 2
+
+! Note: not reading 's'
+rewind(99)
+write(99, '(a)') '1,23435 1243,24 13,24 ,'
+rewind(99)
+read(99, *) x
+if (ios /= 0) stop 3
+
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x
+if (ios /= 0) stop 4
+
+! Now reading 's'
+s = 'w'
+rewind(99)
+write(99, '(a)') '1,23435 1243,24 13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'w') stop 5
+
+s = 'w'
+rewind(99)
+write(99, '(a)') '1,23435;1243,24;13,24 ,'
+rewind(99)
+read(99, *, iostat=ios) x, s
+if (ios /= 0 .or. s /= 'w') stop 6
+close(99)
+end
@@ -461,11 +461,30 @@ eat_separator (st_parameter_dt *dtp)
int c, n;
int err = 0;
- eat_spaces (dtp);
dtp->u.p.comma_flag = 0;
+ c = next_char (dtp);
+ if (c == ' ')
+ {
+ eat_spaces (dtp);
+ c = next_char (dtp);
+ if (c == ',')
+ {
+ if (dtp->u.p.current_unit->decimal_status == DECIMAL_COMMA)
+ unget_char (dtp, ';');
+ dtp->u.p.comma_flag = 1;
+ eat_spaces (dtp);
+ return err;
+ }
+ if (c == ';')
+ {
+ if (dtp->u.p.current_unit->decimal_status == DECIMAL_POINT)
+ unget_char (dtp, ',');
+ dtp->u.p.comma_flag = 1;
+ eat_spaces (dtp);
+ return err;
+ }
+ }
- if ((c = next_char (dtp)) == EOF)
- return LIBERROR_END;
switch (c)
{
case ',':
@@ -476,7 +495,10 @@ eat_separator (st_parameter_dt *dtp)
unget_char (dtp, c);
break;
}
- /* Fall through. */
+ dtp->u.p.comma_flag = 1;
+ eat_spaces (dtp);
+ break;
+
case ';':
dtp->u.p.comma_flag = 1;
eat_spaces (dtp);