Fix tst-cancel17/tst-cancelx17, which sometimes segfaults while exiting.
Commit Message
Hi,
The testcase tst-cancel[x]17 ends sometimes with a segmentation fault.
This happens in one of 10000 cases. Then the real testcase has already
exited with success and returned from do_test(). The segmentation fault
occurs after returning from main in _dl_fini().
In those cases, the aio_read(&a) was not canceled, because the read
request was already in progress. In the meanwhile aio_write(ap) wrote
something to the pipe and the read request is able to read the
requested byte.
The read request hasn't finished before returning from do_test().
After it finishes, it writes the return value and error code from the
read syscall to the struct aiocb a, which lays on the stack of do_test.
The stack of the subsequent function call of _dl_fini or _dl_sort_fini,
which is inlined in _dl_fini, is corrupted.
In case of S390, it reads a zero and decrements it by 1:
unsigned int k = nmaps - 1;
struct link_map **runp = maps[k]->l_initfini;
The subsequent load from unmapped memory leads to the segmentation
fault.
The stack corruption also happens on other architectures.
I saw them e.g. on x86 and ppc, too.
This patch adds an aio_suspend call to ensure, that the read request
is finished before returning from do_test().
Okay to commit?
Bye
Stefan
ChangeLog:
* nptl/tst-cancel17.c (do_test):
Wait for finishing aio_read(&a).
Comments
On 05/17/2016 08:57 AM, Stefan Liebler wrote:
> * nptl/tst-cancel17.c (do_test): Wait for finishing aio_read(&a).
Nice analysis. Looks good to me.
> diff --git a/nptl/tst-cancel17.c b/nptl/tst-cancel17.c
> index fb89292..9ff4e27 100644
> --- a/nptl/tst-cancel17.c
> +++ b/nptl/tst-cancel17.c
> @@ -333,6 +333,22 @@ do_test (void)
>
> puts ("early cancellation succeeded");
>
> + if (ap == &a2)
> + {
> + /* The aio_read(&a) was not canceled, because the read request was
> + already in progress. In the meanwhile aio_write(ap) wrote something
> + to the pipe and the read request either has already been finished or
> + is able to read the requested byte.
> + Wait for the read request before returning from this function, because
> + the return value and error code from the read syscall will be written
> + to the struct aiocb a, which lays on the stack of this function.
> + Otherwise the stack from subsequent function calls - e.g. _dl_fini -
> + will be corrupted, which can lead to undefined behaviour like a
> + segmentation fault. */
I think it should be “which *lies* on the stack” (the intransitive
variant), and a comma before “because“ is frowned upon in some circles.
Thanks,
Florian
On 05/17/2016 09:18 AM, Florian Weimer wrote:
> On 05/17/2016 08:57 AM, Stefan Liebler wrote:
>> * nptl/tst-cancel17.c (do_test): Wait for finishing aio_read(&a).
>
> Nice analysis. Looks good to me.
>
>> diff --git a/nptl/tst-cancel17.c b/nptl/tst-cancel17.c
>> index fb89292..9ff4e27 100644
>> --- a/nptl/tst-cancel17.c
>> +++ b/nptl/tst-cancel17.c
>> @@ -333,6 +333,22 @@ do_test (void)
>>
>> puts ("early cancellation succeeded");
>>
>> + if (ap == &a2)
>> + {
>> + /* The aio_read(&a) was not canceled, because the read request was
>> + already in progress. In the meanwhile aio_write(ap) wrote something
>> + to the pipe and the read request either has already been
>> finished or
>> + is able to read the requested byte.
>> + Wait for the read request before returning from this function,
>> because
>> + the return value and error code from the read syscall will be
>> written
>> + to the struct aiocb a, which lays on the stack of this function.
>> + Otherwise the stack from subsequent function calls - e.g.
>> _dl_fini -
>> + will be corrupted, which can lead to undefined behaviour like a
>> + segmentation fault. */
>
> I think it should be “which *lies* on the stack” (the intransitive
> variant), and a comma before “because“ is frowned upon in some circles.
>
> Thanks,
> Florian
>
Oops. Thanks.
Committed with your comments.
commit fd27163dc65df76132946c3eca29dfb82fea2fcc
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Fri May 13 15:10:30 2016 -0400
Fix tst-cancel17/tst-cancelx17, which sometimes segfaults while exiting.
The testcase tst-cancel[x]17 ends sometimes with a segmentation fault.
This happens in one of 10000 cases. Then the real testcase has already
exited with success and returned from do_test(). The segmentation fault
occurs after returning from main in _dl_fini().
In those cases, the aio_read(&a) was not canceled, because the read
request was already in progress. In the meanwhile aio_write(ap) wrote
something to the pipe and the read request is able to read the
requested byte.
The read request hasn't finished before returning from do_test().
After it finishes, it writes the return value and error code from the
read syscall to the struct aiocb a, which lays on the stack of do_test.
The stack of the subsequent function call of _dl_fini or _dl_sort_fini,
which is inlined in _dl_fini is corrupted.
In case of S390, it reads a zero and decrements it by 1:
unsigned int k = nmaps - 1;
struct link_map **runp = maps[k]->l_initfini;
The load from unmapped memory leads to the segmentation fault.
The stack corruption also happens on other architectures.
I saw them e.g. on x86 and ppc, too.
This patch adds an aio_suspend call to ensure, that the read request
is finished before returning from do_test().
ChangeLog:
* nptl/tst-cancel17.c (do_test): Wait for finishing aio_read(&a).
@@ -333,6 +333,22 @@ do_test (void)
puts ("early cancellation succeeded");
+ if (ap == &a2)
+ {
+ /* The aio_read(&a) was not canceled, because the read request was
+ already in progress. In the meanwhile aio_write(ap) wrote something
+ to the pipe and the read request either has already been finished or
+ is able to read the requested byte.
+ Wait for the read request before returning from this function, because
+ the return value and error code from the read syscall will be written
+ to the struct aiocb a, which lays on the stack of this function.
+ Otherwise the stack from subsequent function calls - e.g. _dl_fini -
+ will be corrupted, which can lead to undefined behaviour like a
+ segmentation fault. */
+ const struct aiocb *l[1] = { &a };
+ TEMP_FAILURE_RETRY (aio_suspend(l, 1, NULL));
+ }
+
return 0;
}