[v3] i386: Use pthread_barrier for synchronization on tst-bz21269

Message ID 20230224161528.997-1-adhemerval.zanella@linaro.org
State Superseded
Headers
Series [v3] i386: Use pthread_barrier for synchronization on tst-bz21269 |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Adhemerval Zanella Netto Feb. 24, 2023, 4:15 p.m. UTC
  To improve the false negative in patchwork buildbot.

Checked on i686-linux-gnu.
---
 sysdeps/unix/sysv/linux/i386/tst-bz21269.c | 69 ++++++++--------------
 1 file changed, 23 insertions(+), 46 deletions(-)
  

Comments

DJ Delorie Feb. 27, 2023, 10:21 p.m. UTC | #1
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

>  static void *
>  threadproc (void *ctx)
>  {
> -  while (1)
> +  for (int i = 0; i < NITER; i++)

Ok.

> -      futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
> -      while (atomic_load (&ftx) != 2)
> -	{
> -	  if (atomic_load (&ftx) >= 3)
> -	    return NULL;
> -	}
> +      xpthread_barrier_wait (&barrier);

- parent up to this point has set up the LDT
block 1 - thread waits for parent;

>        /* clear LDT entry 0.  */
>        const struct user_desc desc = { 0 };
>        xmodify_ldt (1, &desc, sizeof (desc));
>  
> -      /* If ftx == 2, set it to zero,  If ftx == 100, quit.  */
> -      if (atomic_fetch_add (&ftx, -2) != 2)
> -	return NULL;
> +      /* Wait for 'ss' set in main thread.  */
> +      xpthread_barrier_wait (&barrier);

block 2 - child sets ldt, parent does nothing

- post, parent sets SS.  This still isn't doing what the original test
case was doing.

The original code did this:


Parent changes LDT and sets SS
- force a task switch, which should segfault
  - sigaction's handler either works, or fails  <-- important part
- child resets LDT

Do you have a version of libc newer than the patch in BZ#21269 that you
can test the modified test against, to make sure it still detects the
failing case?

Thinking about it, I suspect this change shouldn't be done:

> -  TEST_VERIFY_EXIT (sigaction (sig, &sa, 0) == 0);
> +  xsigaction (sig, &sa, 0);

Although they do the same things, since it's sigaction we're actually
testing here, hiding it in an xfunction isn't appropriate.
  
Adhemerval Zanella Netto Feb. 28, 2023, 2:29 p.m. UTC | #2
On 27/02/23 19:21, DJ Delorie wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> 
>>  static void *
>>  threadproc (void *ctx)
>>  {
>> -  while (1)
>> +  for (int i = 0; i < NITER; i++)
> 
> Ok.
> 
>> -      futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
>> -      while (atomic_load (&ftx) != 2)
>> -	{
>> -	  if (atomic_load (&ftx) >= 3)
>> -	    return NULL;
>> -	}
>> +      xpthread_barrier_wait (&barrier);
> 
> - parent up to this point has set up the LDT
> block 1 - thread waits for parent;
> 
>>        /* clear LDT entry 0.  */
>>        const struct user_desc desc = { 0 };
>>        xmodify_ldt (1, &desc, sizeof (desc));
>>  
>> -      /* If ftx == 2, set it to zero,  If ftx == 100, quit.  */
>> -      if (atomic_fetch_add (&ftx, -2) != 2)
>> -	return NULL;
>> +      /* Wait for 'ss' set in main thread.  */
>> +      xpthread_barrier_wait (&barrier);
> 
> block 2 - child sets ldt, parent does nothing
> 
> - post, parent sets SS.  This still isn't doing what the original test
> case was doing.
> 
> The original code did this:
> 
> 
> Parent changes LDT and sets SS
> - force a task switch, which should segfault
>   - sigaction's handler either works, or fails  <-- important part
> - child resets LDT
> 
> Do you have a version of libc newer than the patch in BZ#21269 that you
> can test the modified test against, to make sure it still detects the
> failing case?

What I did was to actually revert the original fix and check if the test
in fact trigger the issue:

diff --git a/sysdeps/unix/sysv/linux/i386/libc_sigaction.c b/sysdeps/unix/sysv/linux/i386/libc_sigaction.c
index 0665b41bbc..b2454d8d55 100644
--- a/sysdeps/unix/sysv/linux/i386/libc_sigaction.c
+++ b/sysdeps/unix/sysv/linux/i386/libc_sigaction.c
@@ -33,7 +33,7 @@ extern void restore (void) asm ("__restore") attribute_hidden;
                               ? &restore_rt : &restore);       \
        }                                                       \
      else                                                      \
-       (kact)->sa_restorer = NULL;                             \
+       (kact)->sa_restorer = (void*)0xdeadbeef;                        \
   })

 #define RESET_SA_RESTORER(act, kact) \

And it does not, because as you pointed out the barrier are not fully
correct.  So I circle back and I think the v2 is actually correctly,
using a bogus sa_restore does trigger a failure and the test succeeds
with current code.

> 
> Thinking about it, I suspect this change shouldn't be done:
> 
>> -  TEST_VERIFY_EXIT (sigaction (sig, &sa, 0) == 0);
>> +  xsigaction (sig, &sa, 0);
> 
> Although they do the same things, since it's sigaction we're actually
> testing here, hiding it in an xfunction isn't appropriate.
> 
Alright.
  
DJ Delorie March 2, 2023, 5:10 a.m. UTC | #3
So I was able to reproduce the hangs in the original source, and debug
it, and fix it.  In doing so, I realized that we can't use anything
complex to trigger the thread because that "anything" might also cause
the expected segfault and force everything out of sync again.

Here's what I ended up with, and it doesn't seem to hang where the
original one hung quite often (in a tight while..end loop).  The key
changes are:

1. Calls to futex are error checked, with retries, to ensure that the
   futexes are actually doing what they're supposed to be doing.  In the
   original code, nearly every futex call returned an error.

2. The main loop has checks for whether the thread ran or not, and
   "unlocks" the thread if it didn't (this is how the original source
   hangs).

diff --git a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
index c95bfcb917..51d4a1b082 100644
--- a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
+++ b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
@@ -1,5 +1,5 @@
 /* Test for i386 sigaction sa_restorer handling (BZ#21269)
-   Copyright (C) 2017-2022 Free Software Foundation, Inc.
+   Copyright (C) 2017-2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -135,7 +135,14 @@ threadproc (void *ctx)
 {
   while (1)
     {
-      futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
+      /* Continue to wait here until we've successfully waited, unless
+	 we're supposed to be clearing the LDT already.  */
+      while (futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0) < 0)
+	if (atomic_load (&ftx) >= 2)
+	  break;
+
+      /* Normally there's time to hit this busy loop and wait for ftx
+	 to be set to 2.  */
       while (atomic_load (&ftx) != 2)
 	{
 	  if (atomic_load (&ftx) >= 3)
@@ -189,7 +196,14 @@ do_test (void)
       if (sigsetjmp (jmpbuf, 1) != 0)
 	continue;
 
-      /* Make sure the thread is ready after the last test. */
+      /* We may have longjmp'd before triggering the thread.  If so,
+	 trigger the thread now and wait for it.  */
+      if (atomic_load (&ftx) == 1)
+	atomic_store (&ftx, 2);
+
+      /* Make sure the thread is ready after the last test.  FTX is
+	 initially zero for the first loop, and set to zero each time
+	 the thread clears the LDT.  */
       while (atomic_load (&ftx) != 0)
 	;
 
@@ -207,15 +221,24 @@ do_test (void)
 
       xmodify_ldt (0x11, &desc, sizeof (desc));
 
-      /* Arm the thread.  */
-      ftx = 1;
-      futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+      /* Arm the thread.  We loop here until we've woken up one thread.  */
+      atomic_store (&ftx, 1);
+      while (futex ((int*) &ftx, FUTEX_WAKE, 1, NULL, NULL, 0) < 1)
+	;
+
+      /* Give the thread a chance to get into it's busy loop.  */
+      usleep (5);
 
+      /* At *ANY* point after this instruction, we may segfault and
+	 longjump back to the top of the loop.  The intention is to
+	 have this happen when the thread clears the LDT, but it could
+	 happen elsewhen.  */
       asm volatile ("mov %0, %%ss" : : "r" (0x7));
 
       /* Fire up thread modify_ldt call.  */
       atomic_store (&ftx, 2);
 
+      /* And wait for it.  */
       while (atomic_load (&ftx) != 0)
 	;
  
Adhemerval Zanella Netto March 2, 2023, 5 p.m. UTC | #4
On 02/03/23 02:10, DJ Delorie wrote:
> 
> So I was able to reproduce the hangs in the original source, and debug
> it, and fix it.  In doing so, I realized that we can't use anything
> complex to trigger the thread because that "anything" might also cause
> the expected segfault and force everything out of sync again.
> 
> Here's what I ended up with, and it doesn't seem to hang where the
> original one hung quite often (in a tight while..end loop).  The key
> changes are:
> 
> 1. Calls to futex are error checked, with retries, to ensure that the
>    futexes are actually doing what they're supposed to be doing.  In the
>    original code, nearly every futex call returned an error.
> 
> 2. The main loop has checks for whether the thread ran or not, and
>    "unlocks" the thread if it didn't (this is how the original source
>    hangs).

This makes sense but...

> 
> diff --git a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
> index c95bfcb917..51d4a1b082 100644
> --- a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
> +++ b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
> @@ -1,5 +1,5 @@
>  /* Test for i386 sigaction sa_restorer handling (BZ#21269)
> -   Copyright (C) 2017-2022 Free Software Foundation, Inc.
> +   Copyright (C) 2017-2023 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>  
>     The GNU C Library is free software; you can redistribute it and/or
> @@ -135,7 +135,14 @@ threadproc (void *ctx)
>  {
>    while (1)
>      {
> -      futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
> +      /* Continue to wait here until we've successfully waited, unless
> +	 we're supposed to be clearing the LDT already.  */
> +      while (futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0) < 0)
> +	if (atomic_load (&ftx) >= 2)
> +	  break;
> +
> +      /* Normally there's time to hit this busy loop and wait for ftx
> +	 to be set to 2.  */
>        while (atomic_load (&ftx) != 2)
>  	{
>  	  if (atomic_load (&ftx) >= 3)
> @@ -189,7 +196,14 @@ do_test (void)
>        if (sigsetjmp (jmpbuf, 1) != 0)
>  	continue;
>  
> -      /* Make sure the thread is ready after the last test. */
> +      /* We may have longjmp'd before triggering the thread.  If so,
> +	 trigger the thread now and wait for it.  */
> +      if (atomic_load (&ftx) == 1)
> +	atomic_store (&ftx, 2);
> +
> +      /* Make sure the thread is ready after the last test.  FTX is
> +	 initially zero for the first loop, and set to zero each time
> +	 the thread clears the LDT.  */
>        while (atomic_load (&ftx) != 0)
>  	;
>  
> @@ -207,15 +221,24 @@ do_test (void)
>  
>        xmodify_ldt (0x11, &desc, sizeof (desc));
>  
> -      /* Arm the thread.  */
> -      ftx = 1;
> -      futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
> +      /* Arm the thread.  We loop here until we've woken up one thread.  */
> +      atomic_store (&ftx, 1);
> +      while (futex ((int*) &ftx, FUTEX_WAKE, 1, NULL, NULL, 0) < 1)
> +	;
> +
> +      /* Give the thread a chance to get into it's busy loop.  */
> +      usleep (5);

... I shivers every time I see sleep used as synchronization mechanism, since
most likely in some environment the sleep won't work as expected due
scheduling pressuer and we will end up with a false positive.

I am wondering if it would be better to just remove this test saying we
can't really make it work reliable.

>  
> +      /* At *ANY* point after this instruction, we may segfault and
> +	 longjump back to the top of the loop.  The intention is to
> +	 have this happen when the thread clears the LDT, but it could
> +	 happen elsewhen.  */
>        asm volatile ("mov %0, %%ss" : : "r" (0x7));
>  
>        /* Fire up thread modify_ldt call.  */
>        atomic_store (&ftx, 2);
>  
> +      /* And wait for it.  */
>        while (atomic_load (&ftx) != 0)
>  	;
>  
>
  
DJ Delorie March 2, 2023, 8:21 p.m. UTC | #5
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
>> +      /* Give the thread a chance to get into it's busy loop.  */
>> +      usleep (5);
>
> ... I shivers every time I see sleep used as synchronization mechanism, since
> most likely in some environment the sleep won't work as expected due
> scheduling pressuer and we will end up with a false positive.

Yeah, I'm using as just a "yeild if you can" operation.  The code works
without it, but not in the way the test is intended.

> I am wondering if it would be better to just remove this test saying we
> can't really make it work reliable.

I'm not opposed to removing it.  Even with the fixed I put in, the test
is still more likely to fault "for some reason" than for the expected
reason.
  
Adhemerval Zanella Netto March 8, 2023, 6:01 p.m. UTC | #6
On 02/03/23 17:21, DJ Delorie wrote:
> Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
>>> +      /* Give the thread a chance to get into it's busy loop.  */
>>> +      usleep (5);
>>
>> ... I shivers every time I see sleep used as synchronization mechanism, since
>> most likely in some environment the sleep won't work as expected due
>> scheduling pressuer and we will end up with a false positive.
> 
> Yeah, I'm using as just a "yeild if you can" operation.  The code works
> without it, but not in the way the test is intended.
> 
>> I am wondering if it would be better to just remove this test saying we
>> can't really make it work reliable.
> 
> I'm not opposed to removing it.  Even with the fixed I put in, the test
> is still more likely to fault "for some reason" than for the expected
> reason.
> 

Ok, this LGTM then.  The patchwork false positive failures is being really
annoying.
  
DJ Delorie May 10, 2023, 9:31 p.m. UTC | #7
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> On 02/03/23 17:21, DJ Delorie wrote:
>> I'm not opposed to removing it.  Even with the fixed I put in, the test
>> is still more likely to fault "for some reason" than for the expected
>> reason.
>> 
>
> Ok, this LGTM then.  The patchwork false positive failures is being really
> annoying. 

We dropped the ball on this one.  We ended up with three options:

1> Adhemerval's v4 patch
	https://sourceware.org/pipermail/libc-alpha/2023-February/145934.html
2> DJ's v3 patch
	https://sourceware.org/pipermail/libc-alpha/2023-March/145999.html
3> remove the test

Which option were you agreeing to, and do you still agree to it?
  
Adhemerval Zanella Netto May 15, 2023, 6:33 p.m. UTC | #8
On 10/05/23 18:31, DJ Delorie wrote:
> 
> Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
>> On 02/03/23 17:21, DJ Delorie wrote:
>>> I'm not opposed to removing it.  Even with the fixed I put in, the test
>>> is still more likely to fault "for some reason" than for the expected
>>> reason.
>>>
>>
>> Ok, this LGTM then.  The patchwork false positive failures is being really
>> annoying. 
> 
> We dropped the ball on this one.  We ended up with three options:
> 
> 1> Adhemerval's v4 patch
> 	https://sourceware.org/pipermail/libc-alpha/2023-February/145934.html
> 2> DJ's v3 patch
> 	https://sourceware.org/pipermail/libc-alpha/2023-March/145999.html
> 3> remove the test
> 
> Which option were you agreeing to, and do you still agree to it?
> 

Either 2 or 3 is fine for me, since I acked your patch I think we can
go with 2 for now.
  
DJ Delorie May 16, 2023, 7:10 p.m. UTC | #9
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
>> 1> Adhemerval's v4 patch
>> 	https://sourceware.org/pipermail/libc-alpha/2023-February/145934.html
>> 2> DJ's v3 patch
>> 	https://sourceware.org/pipermail/libc-alpha/2023-March/145999.html
>> 3> remove the test
>> 
>> Which option were you agreeing to, and do you still agree to it?
>> 
>
> Either 2 or 3 is fine for me, since I acked your patch I think we can
> go with 2 for now.

I pushed #2.  Next time we have a problem with that test, we'll just
remove it.
  

Patch

diff --git a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
index 1850240c34..351a9187e7 100644
--- a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
+++ b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c
@@ -20,16 +20,13 @@ 
    more specifically in do_multicpu_tests function.  The main changes
    are:
 
-   - C11 atomics instead of plain access.
+   - Use pthread_barrier instead of atomic and futexes.
    - Remove x86_64 support which simplifies the syscall handling
      and fallbacks.
    - Replicate only the test required to trigger the issue for the
      BZ#21269.  */
 
-#include <stdatomic.h>
-
 #include <asm/ldt.h>
-#include <linux/futex.h>
 
 #include <setjmp.h>
 #include <signal.h>
@@ -40,6 +37,9 @@ 
 #include <support/xunistd.h>
 #include <support/check.h>
 #include <support/xthread.h>
+#include <support/xsignal.h>
+
+#define NITER 5
 
 static int
 xset_thread_area (struct user_desc *u_info)
@@ -55,13 +55,6 @@  xmodify_ldt (int func, const void *ptr, unsigned long bytecount)
   TEST_VERIFY_EXIT (syscall (SYS_modify_ldt, 1, ptr, bytecount) == 0);
 }
 
-static int
-futex (int *uaddr, int futex_op, int val, void *timeout, int *uaddr2,
-	int val3)
-{
-  return syscall (SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
-}
-
 static void
 xsethandler (int sig, void (*handler)(int, siginfo_t *, void *), int flags)
 {
@@ -69,7 +62,7 @@  xsethandler (int sig, void (*handler)(int, siginfo_t *, void *), int flags)
   sa.sa_sigaction = handler;
   sa.sa_flags = SA_SIGINFO | flags;
   TEST_VERIFY_EXIT (sigemptyset (&sa.sa_mask) == 0);
-  TEST_VERIFY_EXIT (sigaction (sig, &sa, 0) == 0);
+  xsigaction (sig, &sa, 0);
 }
 
 static jmp_buf jmpbuf;
@@ -123,33 +116,24 @@  setup_low_user_desc (void)
   low_user_desc_clear->seg_not_present = 1;
 }
 
-/* Possible values of futex:
-   0: thread is idle.
-   1: thread armed.
-   2: thread should clear LDT entry 0.
-   3: thread should exit.  */
-static atomic_uint ftx;
+static pthread_barrier_t barrier;
 
 static void *
 threadproc (void *ctx)
 {
-  while (1)
+  for (int i = 0; i < NITER; i++)
     {
-      futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
-      while (atomic_load (&ftx) != 2)
-	{
-	  if (atomic_load (&ftx) >= 3)
-	    return NULL;
-	}
+      xpthread_barrier_wait (&barrier);
 
       /* clear LDT entry 0.  */
       const struct user_desc desc = { 0 };
       xmodify_ldt (1, &desc, sizeof (desc));
 
-      /* If ftx == 2, set it to zero,  If ftx == 100, quit.  */
-      if (atomic_fetch_add (&ftx, -2) != 2)
-	return NULL;
+      /* Wait for 'ss' set in main thread.  */
+      xpthread_barrier_wait (&barrier);
     }
+
+  return NULL;
 }
 
 
@@ -180,20 +164,18 @@  do_test (void)
   /* Some kernels send SIGBUS instead.  */
   xsethandler (SIGBUS, sigsegv_handler, 0);
 
+  xpthread_barrier_init (&barrier, NULL, 2);
+
   thread = xpthread_create (0, threadproc, 0);
 
   asm volatile ("mov %%ss, %0" : "=rm" (orig_ss));
 
-  for (int i = 0; i < 5; i++)
+  for (int i = 0; i < NITER; i++)
     {
       if (sigsetjmp (jmpbuf, 1) != 0)
 	continue;
 
-      /* Make sure the thread is ready after the last test. */
-      while (atomic_load (&ftx) != 0)
-	;
-
-      struct user_desc desc = {
+      const struct user_desc desc = {
 	.entry_number       = 0,
 	.base_addr          = 0,
 	.limit              = 0xffff,
@@ -207,28 +189,23 @@  do_test (void)
 
       xmodify_ldt (0x11, &desc, sizeof (desc));
 
-      /* Arm the thread.  */
-      ftx = 1;
-      futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
-
-      asm volatile ("mov %0, %%ss" : : "r" (0x7));
+      /* Make sure the thread is ready after the last test. */
+      xpthread_barrier_wait (&barrier);
 
-      /* Fire up thread modify_ldt call.  */
-      atomic_store (&ftx, 2);
+      /* Wait thread clear LDT entry 0.  */
+      xpthread_barrier_wait (&barrier);
 
-      while (atomic_load (&ftx) != 0)
-	;
+      asm volatile ("mov %0, %%ss" : : "r" (0x7));
 
       /* On success, modify_ldt will segfault us synchronously and we will
 	 escape via siglongjmp.  */
       support_record_failure ();
     }
 
-  atomic_store (&ftx, 100);
-  futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
-
   xpthread_join (thread);
 
+  xpthread_barrier_destroy (&barrier);
+
   return 0;
 }