sysdeps: Clear O_CREAT|O_ACCMODE when trying again on sem_open

Message ID: 20230823042129.3955131-1-sergiodj@sergiodj.net
State: New
Series: sysdeps: Clear O_CREAT|O_ACCMODE when trying again on sem_open

Checks

Context                                             Check    Description
redhat-pt-bot/TryBot-apply_patch                    success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                          success  Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm        success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64    success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm        success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64    success  Testing passed
redhat-pt-bot/TryBot-still_applies                  warning  Patch no longer applies to master

Commit Message

Sergio Durigan Junior Aug. 23, 2023, 4:21 a.m. UTC
When sem_open is invoked with O_CREAT (but without O_EXCL) and the
semaphore file does not exist yet, the first open(2) attempt, in the
first part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag &
O_EXCL) == 0)", fails with ENOENT and we jump to the label
"try_create", which lives in the second part of that "if".

In that part, open_flags is overwritten with "O_RDWR | O_CREAT | O_EXCL
| O_CLOEXEC" and used to create a temporary file, which is then
link(2)ed to the semaphore's name.  If another thread or process
creates the same semaphore in the meantime, link(2) fails with EEXIST;
some cleanup is done and we go back to the label "try_again", which
lives in the first part of the aforementioned "if".

The problem is that, in that part of the code, we expect the semaphore
file to exist, but open_flags still contains the O_CREAT | O_EXCL flags
used for the temporary file.  Opening the now-existing file with those
flags fails again (this time unexpectedly) with EEXIST, which makes the
call to sem_open fail as well.

This can cause intermittent, hard-to-diagnose failures, especially with
OpenMPI, which makes extensive use of semaphores.

The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
are clear after we enter "try_again".

See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
---
 sysdeps/pthread/sem_open.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
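The unexpected failure on the retry comes down to open(2) semantics: opening a file that already exists with O_CREAT | O_EXCL fails with EEXIST, while the flags normally used to open an existing semaphore (O_RDWR | O_NOFOLLOW | O_CLOEXEC, as seen in the diffs in this thread) succeed. A minimal sketch, assuming a POSIX system with a writable /tmp; an ordinary temporary file stands in for the semaphore file that another process has just linked into place, and reopen_errno and make_existing_file are hypothetical helpers, not glibc code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>   /* mkstemp */
#include <unistd.h>   /* close, unlink */

/* Reopen PATH with FLAGS, the way the "try_again" path reopens the
   semaphore file.  Returns 0 on success or the errno of the failed
   open(2).  A mode is always passed so the call is well-defined even
   when FLAGS contains O_CREAT.  */
static int
reopen_errno (const char *path, int flags)
{
  int fd = open (path, flags, 0600);
  if (fd == -1)
    return errno;
  close (fd);
  return 0;
}

/* Create an empty file from TMPL (which must end in "XXXXXX") to play
   the role of a semaphore that another process has just created.
   Returns 0 on success, -1 on failure.  */
static int
make_existing_file (char *tmpl)
{
  int fd = mkstemp (tmpl);
  if (fd == -1)
    return -1;
  close (fd);
  return 0;
}
```

It is the combination O_CREAT | O_EXCL that yields EEXIST here; with O_CREAT alone, open(2) simply opens the existing file.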
  

Comments

Andreas Schwab Aug. 23, 2023, 7:40 a.m. UTC | #1
On Aug 23 2023, Sergio Durigan Junior via Libc-alpha wrote:

> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
> are clear after we enter "try_again".

That doesn't match what the patch does.  It only adds flags to
open_flags, and does not remove any.
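The objection can be checked with plain bit arithmetic: a compound `|=` can set bits but never clear them. The sketch below is a hypothetical model, not glibc code: CREATE_PATH_FLAGS stands for the value the create path leaves in open_flags before jumping back, v1_retry_flags replays the v1 statement on that stale value, and fresh_retry_flags assigns a newly computed value instead (the same expression the v2 patch in this thread uses).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* What open_flags holds after the create path, right before
   "goto try_again" (the temporary-file flags).  */
#define CREATE_PATH_FLAGS (O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC)

/* Models the v1 patch: the masking statement now runs after
   "try_again", but it is still an OR into the old value, so it can
   only add bits to STALE, never remove them.  */
static int
v1_retry_flags (int stale, int oflag)
{
  int open_flags = stale;
  open_flags |= (oflag & ~(O_CREAT | O_ACCMODE));
  return open_flags;
}

/* Assigning a freshly computed value instead discards whatever the
   create path left behind.  */
static int
fresh_retry_flags (int oflag)
{
  return O_RDWR | O_NOFOLLOW | O_CLOEXEC | (oflag & ~(O_CREAT | O_ACCMODE));
}
```

With oflag = O_CREAT, v1_retry_flags still carries O_CREAT and O_EXCL (and lacks O_NOFOLLOW), whereas fresh_retry_flags yields exactly O_RDWR | O_NOFOLLOW | O_CLOEXEC.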
  
Adhemerval Zanella Netto Aug. 23, 2023, 1:46 p.m. UTC | #2
On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
> When invoking sem_open with O_CREAT as one of its flags, we'll end up
> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
> & O_EXCL) == 0)", which means that we don't expect the semaphore file
> to exist.
> 
> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
> likely fail because it won't exist.  After that first (expected)
> failure, some cleanup is done and we go back to the label "try_again",
> which lives in the first part of the aforementioned "if".
> 
> The problem is that, in that part of the code, we expect the semaphore
> file to exist, and as such O_CREAT (this time the flag we pass to
> open(2)) needs to be cleaned from open_flags, otherwise we'll see
> another failure (this time unexpected) when trying to open the file,
> which will lead the call to sem_open to fail as well.
> 
> This can cause very strange bugs, especially with OpenMPI, which makes
> extensive use of semaphores.
> 
> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
> are clear after we enter "try_again".
> 
> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912

This needs a bug report and, if possible, a regression check (I grant
you that it might be tricky, since it is a race condition).

> ---
>  sysdeps/pthread/sem_open.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sysdeps/pthread/sem_open.c b/sysdeps/pthread/sem_open.c
> index e5db929d20..ba91f89d57 100644
> --- a/sysdeps/pthread/sem_open.c
> +++ b/sysdeps/pthread/sem_open.c
> @@ -66,8 +66,8 @@ __sem_open (const char *name, int oflag, ...)
>    if ((oflag & O_CREAT) == 0 || (oflag & O_EXCL) == 0)
>      {
>        open_flags = O_RDWR | O_NOFOLLOW | O_CLOEXEC;
> -      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
>      try_again:
> +      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
>        fd = __open (dirname.name, open_flags);
>  
>        if (fd == -1)

I still think this is not fully correct, because on the second try it
would not use O_NOFOLLOW.  Also, O_RDWR will always be set, since it now
always clears O_ACCMODE.

So I don't think we actually need to keep open_flags across iterations;
we can simplify it with something like this:

diff --git a/sysdeps/pthread/sem_open.c b/sysdeps/pthread/sem_open.c
index e5db929d20..7c189afbcf 100644
--- a/sysdeps/pthread/sem_open.c
+++ b/sysdeps/pthread/sem_open.c
@@ -32,11 +32,12 @@
 # define __unlink unlink
 #endif

+#define SEM_OPEN_FLAGS (O_RDWR | O_NOFOLLOW | O_CLOEXEC)
+
 sem_t *
 __sem_open (const char *name, int oflag, ...)
 {
   int fd;
-  int open_flags;
   sem_t *result;

   /* Check that shared futexes are supported.  */
@@ -65,10 +66,8 @@ __sem_open (const char *name, int oflag, ...)
   /* If the semaphore object has to exist simply open it.  */
   if ((oflag & O_CREAT) == 0 || (oflag & O_EXCL) == 0)
     {
-      open_flags = O_RDWR | O_NOFOLLOW | O_CLOEXEC;
-      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
     try_again:
-      fd = __open (dirname.name, open_flags);
+      fd = __open (dirname.name, (oflag & O_EXCL) | SEM_OPEN_FLAGS);

       if (fd == -1)
        {
@@ -135,8 +134,7 @@ __sem_open (const char *name, int oflag, ...)
            }

          /* Open the file.  Make sure we do not overwrite anything.  */
-         open_flags = O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC;
-         fd = __open (tmpfname, open_flags, mode);
+         fd = __open (tmpfname, O_CREAT | O_EXCL | SEM_OPEN_FLAGS, mode);
          if (fd == -1)
            {
              if (errno == EEXIST)
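The key property of this suggestion is that the flags for each open(2) are computed at the call site from a constant and the caller's oflag, so nothing survives from one attempt to the next. A hypothetical sketch of the two computations, reusing the SEM_OPEN_FLAGS definition from the diff; existing_sem_flags and tmp_sem_flags are illustrative names, not glibc functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Constant from the suggested diff: flags common to every open(2)
   of a semaphore file.  */
#define SEM_OPEN_FLAGS (O_RDWR | O_NOFOLLOW | O_CLOEXEC)

/* Flags for opening an existing semaphore ("try_again"), derived from
   the caller's OFLAG alone, so the result is the same no matter how
   many times the label is reached.  */
static int
existing_sem_flags (int oflag)
{
  return (oflag & O_EXCL) | SEM_OPEN_FLAGS;
}

/* Flags for creating the temporary file in the create path.  */
static int
tmp_sem_flags (void)
{
  return O_CREAT | O_EXCL | SEM_OPEN_FLAGS;
}
```

One side effect visible in the diff: the temporary-file open now also gets O_NOFOLLOW, which the previous O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC did not include.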
  
Sergio Durigan Junior Aug. 23, 2023, 2:10 p.m. UTC | #3
On Wednesday, August 23 2023, Andreas Schwab wrote:

> On Aug 23 2023, Sergio Durigan Junior via Libc-alpha wrote:
>
>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>> are clear after we enter "try_again".
>
> That doesn't match what the patch does.  It only adds flags to
> open_flags, and does not remove any.

You're correct.  Simon also spotted the same problem.  Here's an updated
version of the patch.
  
Sergio Durigan Junior Aug. 23, 2023, 2:16 p.m. UTC | #4
On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:

> On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
>> When invoking sem_open with O_CREAT as one of its flags, we'll end up
>> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
>> & O_EXCL) == 0)", which means that we don't expect the semaphore file
>> to exist.
>> 
>> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
>> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
>> likely fail because it won't exist.  After that first (expected)
>> failure, some cleanup is done and we go back to the label "try_again",
>> which lives in the first part of the aforementioned "if".
>> 
>> The problem is that, in that part of the code, we expect the semaphore
>> file to exist, and as such O_CREAT (this time the flag we pass to
>> open(2)) needs to be cleaned from open_flags, otherwise we'll see
>> another failure (this time unexpected) when trying to open the file,
>> which will lead the call to sem_open to fail as well.
>> 
>> This can cause very strange bugs, especially with OpenMPI, which makes
>> extensive use of semaphores.
>> 
>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>> are clear after we enter "try_again".
>> 
>> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
>
> This needs a bug report and, if possible, a regression check (I grant
> you that it might be tricky, since it is a race condition).

Sure thing.  I can provide a regression check using the openmpi bug
we've been investigating, if that's acceptable.

>> ---
>>  sysdeps/pthread/sem_open.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/sysdeps/pthread/sem_open.c b/sysdeps/pthread/sem_open.c
>> index e5db929d20..ba91f89d57 100644
>> --- a/sysdeps/pthread/sem_open.c
>> +++ b/sysdeps/pthread/sem_open.c
>> @@ -66,8 +66,8 @@ __sem_open (const char *name, int oflag, ...)
>>    if ((oflag & O_CREAT) == 0 || (oflag & O_EXCL) == 0)
>>      {
>>        open_flags = O_RDWR | O_NOFOLLOW | O_CLOEXEC;
>> -      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
>>      try_again:
>> +      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
>>        fd = __open (dirname.name, open_flags);
>>  
>>        if (fd == -1)
>
> I still think this is not fully correct, because on the second try it
> would not use O_NOFOLLOW.  Also, O_RDWR will always be set, since it now
> always clears O_ACCMODE.

Yeah, Andreas and Simon pointed this out too.

> So I don't think we actually need to keep open_flags across iterations;
> we can simplify it with something like this:
>
> diff --git a/sysdeps/pthread/sem_open.c b/sysdeps/pthread/sem_open.c
> index e5db929d20..7c189afbcf 100644
> --- a/sysdeps/pthread/sem_open.c
> +++ b/sysdeps/pthread/sem_open.c
> @@ -32,11 +32,12 @@
>  # define __unlink unlink
>  #endif
>
> +#define SEM_OPEN_FLAGS (O_RDWR | O_NOFOLLOW | O_CLOEXEC)
> +
>  sem_t *
>  __sem_open (const char *name, int oflag, ...)
>  {
>    int fd;
> -  int open_flags;
>    sem_t *result;
>
>    /* Check that shared futexes are supported.  */
> @@ -65,10 +66,8 @@ __sem_open (const char *name, int oflag, ...)
>    /* If the semaphore object has to exist simply open it.  */
>    if ((oflag & O_CREAT) == 0 || (oflag & O_EXCL) == 0)
>      {
> -      open_flags = O_RDWR | O_NOFOLLOW | O_CLOEXEC;
> -      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
>      try_again:
> -      fd = __open (dirname.name, open_flags);
> +      fd = __open (dirname.name, (oflag & O_EXCL) | SEM_OPEN_FLAGS);
>
>        if (fd == -1)
>         {
> @@ -135,8 +134,7 @@ __sem_open (const char *name, int oflag, ...)
>             }
>
>           /* Open the file.  Make sure we do not overwrite anything.  */
> -         open_flags = O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC;
> -         fd = __open (tmpfname, open_flags, mode);
> +         fd = __open (tmpfname, O_CREAT | O_EXCL | SEM_OPEN_FLAGS, mode);
>           if (fd == -1)
>             {
>               if (errno == EEXIST)

Thanks.  I've just sent a v2 of the patch (before I saw your message),
which is pretty much the same thing you're proposing here (modulo the
SEM_OPEN_FLAGS define).

I've checked and it does fix the problem we are seeing.
  
Sergio Durigan Junior Aug. 23, 2023, 2:23 p.m. UTC | #5
On Wednesday, August 23 2023, I wrote:

> On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:
>
>> On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
>>> When invoking sem_open with O_CREAT as one of its flags, we'll end up
>>> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
>>> & O_EXCL) == 0)", which means that we don't expect the semaphore file
>>> to exist.
>>> 
>>> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
>>> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
>>> likely fail because it won't exist.  After that first (expected)
>>> failure, some cleanup is done and we go back to the label "try_again",
>>> which lives in the first part of the aforementioned "if".
>>> 
>>> The problem is that, in that part of the code, we expect the semaphore
>>> file to exist, and as such O_CREAT (this time the flag we pass to
>>> open(2)) needs to be cleaned from open_flags, otherwise we'll see
>>> another failure (this time unexpected) when trying to open the file,
>>> which will lead the call to sem_open to fail as well.
>>> 
>>> This can cause very strange bugs, especially with OpenMPI, which makes
>>> extensive use of semaphores.
>>> 
>>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>>> are clear after we enter "try_again".
>>> 
>>> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
>>
>> This needs a bug report and, if possible, a regression check (I grant
>> you that it might be tricky, since it is a race condition).
>
> Sure thing.  I can provide a regression check using the openmpi bug
> we've been investigating, if that's acceptable.

https://sourceware.org/bugzilla/show_bug.cgi?id=30789

Let me know if that's enough.

Thanks,
  
Adhemerval Zanella Netto Aug. 23, 2023, 2:45 p.m. UTC | #6
On 23/08/23 11:10, Sergio Durigan Junior via Libc-alpha wrote:
> On Wednesday, August 23 2023, Andreas Schwab wrote:
> 
>> On Aug 23 2023, Sergio Durigan Junior via Libc-alpha wrote:
>>
>>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>>> are clear after we enter "try_again".
>>
>> That doesn't match what the patch does.  It only adds flags to
>> open_flags, and does not remove any.
> 
> You're correct.  Simon also spotted the same problem.  Here's an updated
> version of the patch.
> 

> From cb1a17590878955bcf6d7c2821ca95da3896608c Mon Sep 17 00:00:00 2001
> From: Sergio Durigan Junior <sergiodj@sergiodj.net>
> Date: Wed, 23 Aug 2023 00:10:44 -0400
> Subject: [PATCH] sysdeps: Clear O_CREAT|O_ACCMODE when trying again on
>  sem_open
> 

Please open a bug report, even without a reproducer, and reference it
in the subject.

> When invoking sem_open with O_CREAT as one of its flags, we'll end up
> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
> & O_EXCL) == 0)", which means that we don't expect the semaphore file
> to exist.
> 
> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
> likely fail because it won't exist.  After that first (expected)
> failure, some cleanup is done and we go back to the label "try_again",
> which lives in the first part of the aforementioned "if".
> 
> The problem is that, in that part of the code, we expect the semaphore
> file to exist, and as such O_CREAT (this time the flag we pass to
> open(2)) needs to be cleaned from open_flags, otherwise we'll see
> another failure (this time unexpected) when trying to open the file,
> which will lead the call to sem_open to fail as well.
> 
> This can cause very strange bugs, especially with OpenMPI, which makes
> extensive use of semaphores.
> 
> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
> are clear after we enter "try_again".
> 
> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
> 
> Signed-off-by: Sergio Durigan Junior <sergiodj@sergiodj.net>
> Co-Authored-By: Simon Chopin <simon.chopin@canonical.com>
> Fixes: 533deafbdf189f5fbb280c28562dd43ace2f4b0f ("Use O_CLOEXEC in more places (BZ #15722)")
> ---
>  sysdeps/pthread/sem_open.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/sysdeps/pthread/sem_open.c b/sysdeps/pthread/sem_open.c
> index e5db929d20..529c286541 100644
> --- a/sysdeps/pthread/sem_open.c
> +++ b/sysdeps/pthread/sem_open.c
> @@ -36,7 +36,6 @@ sem_t *
>  __sem_open (const char *name, int oflag, ...)
>  {
>    int fd;
> -  int open_flags;
>    sem_t *result;
>  
>    /* Check that shared futexes are supported.  */
> @@ -65,10 +64,9 @@ __sem_open (const char *name, int oflag, ...)
>    /* If the semaphore object has to exist simply open it.  */
>    if ((oflag & O_CREAT) == 0 || (oflag & O_EXCL) == 0)
>      {
> -      open_flags = O_RDWR | O_NOFOLLOW | O_CLOEXEC;
> -      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
>      try_again:
> -      fd = __open (dirname.name, open_flags);
> +      fd = __open (dirname.name,
> +		   (oflag & ~(O_CREAT|O_ACCMODE)) | O_NOFOLLOW | O_RDWR | O_CLOEXEC);

There is no need to support flags other than O_CREAT/O_EXCL; in fact, I
think it is a QoI improvement if sem_open ignores unsupported flags (since
POSIX does not specify an error for invalid flags).  That's why I used
'oflag & O_EXCL' in my suggestion: at this point oflag has either none of
the supported flags set or at most one of O_CREAT/O_EXCL, so only O_EXCL
needs to be checked.
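The two maskings differ only for bits that sem_open does not define. A sketch, assuming O_NONBLOCK as an arbitrary example of such an unsupported caller bit; v2_flags mirrors the expression in the v2 patch, suggested_flags mirrors the one from the review suggestion, and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Masking used by the v2 patch: every caller bit except O_CREAT and
   the access mode is passed through to open(2).  */
static int
v2_flags (int oflag)
{
  return (oflag & ~(O_CREAT | O_ACCMODE)) | O_NOFOLLOW | O_RDWR | O_CLOEXEC;
}

/* Masking from the review suggestion: only O_EXCL is taken from the
   caller; any other bit is ignored.  */
static int
suggested_flags (int oflag)
{
  return (oflag & O_EXCL) | O_RDWR | O_NOFOLLOW | O_CLOEXEC;
}
```

For the flag combinations that can reach this branch (none, O_CREAT alone, or O_EXCL alone), both expressions yield the same value; only an unsupported bit such as O_NONBLOCK is forwarded by one and dropped by the other.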

>  
>        if (fd == -1)
>  	{
> @@ -135,8 +133,7 @@ __sem_open (const char *name, int oflag, ...)
>  	    }
>  
>  	  /* Open the file.  Make sure we do not overwrite anything.  */
> -	  open_flags = O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC;
> -	  fd = __open (tmpfname, open_flags, mode);
> +	  fd = __open (tmpfname, O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, mode);
>  	  if (fd == -1)
>  	    {
>  	      if (errno == EEXIST)
  
Sergio Durigan Junior Aug. 23, 2023, 7:08 p.m. UTC | #7
On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:

> On 23/08/23 11:10, Sergio Durigan Junior via Libc-alpha wrote:
>> On Wednesday, August 23 2023, Andreas Schwab wrote:
>> 
>>> On Aug 23 2023, Sergio Durigan Junior via Libc-alpha wrote:
>>>
>>>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>>>> are clear after we enter "try_again".
>>>
>>> That doesn't match what the patch does.  It only adds flags to
>>> open_flags, and does not remove any.
>> 
>> You're correct.  Simon also spotted the same problem.  Here's an updated
>> version of the patch.
>> 
>
>> From cb1a17590878955bcf6d7c2821ca95da3896608c Mon Sep 17 00:00:00 2001
>> From: Sergio Durigan Junior <sergiodj@sergiodj.net>
>> Date: Wed, 23 Aug 2023 00:10:44 -0400
>> Subject: [PATCH] sysdeps: Clear O_CREAT|O_ACCMODE when trying again on
>>  sem_open
>> 
>
> Please open a bug report, even without a reproducer, and reference it
> in the subject.

Here it is.

Thanks,
  
Joseph Myers Aug. 23, 2023, 9:53 p.m. UTC | #8
I'd expect the patch to add a testcase to the glibc testsuite or explain 
why it's hard to test.
  
Sergio Durigan Junior Oct. 28, 2023, 8:30 p.m. UTC | #9
On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:

> On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
>> When invoking sem_open with O_CREAT as one of its flags, we'll end up
>> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
>> & O_EXCL) == 0)", which means that we don't expect the semaphore file
>> to exist.
>> 
>> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
>> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
>> likely fail because it won't exist.  After that first (expected)
>> failure, some cleanup is done and we go back to the label "try_again",
>> which lives in the first part of the aforementioned "if".
>> 
>> The problem is that, in that part of the code, we expect the semaphore
>> file to exist, and as such O_CREAT (this time the flag we pass to
>> open(2)) needs to be cleaned from open_flags, otherwise we'll see
>> another failure (this time unexpected) when trying to open the file,
>> which will lead the call to sem_open to fail as well.
>> 
>> This can cause very strange bugs, especially with OpenMPI, which makes
>> extensive use of semaphores.
>> 
>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>> are clear after we enter "try_again".
>> 
>> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
>
> This needs a bug report and, if possible, a regression check (I grant
> you that it might be tricky, since it is a race condition).

Hi folks,

It took me much longer than I intended to get back to this thread, so I
apologize.  I'm afraid I don't have very exciting news either: I still
don't have a testcase to exercise the fix.

After talking to Adhemerval during the last Cauldron, we have agreed
that (a) creating a testcase for this bug is indeed tricky (and may even
introduce false positives), and (b) we should likely move forward as is.

I would still like to point out that it is possible to have a reliable
reproducer if you follow the steps I outlined on
https://sourceware.org/bugzilla/show_bug.cgi?id=30789#c1, but
unfortunately this is not acceptable as a glibc test, so there's that.

Either way, I'd like to know if you consider it OK to proceed.  I can
replicate this explanation in the commit message if you think it's
necessary.

Thank you,
  
Adhemerval Zanella Netto Nov. 1, 2023, 1:27 p.m. UTC | #10
On 28/10/23 17:30, Sergio Durigan Junior wrote:
> On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:
> 
>> On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
>>> When invoking sem_open with O_CREAT as one of its flags, we'll end up
>>> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
>>> & O_EXCL) == 0)", which means that we don't expect the semaphore file
>>> to exist.
>>>
>>> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
>>> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
>>> likely fail because it won't exist.  After that first (expected)
>>> failure, some cleanup is done and we go back to the label "try_again",
>>> which lives in the first part of the aforementioned "if".
>>>
>>> The problem is that, in that part of the code, we expect the semaphore
>>> file to exist, and as such O_CREAT (this time the flag we pass to
>>> open(2)) needs to be cleaned from open_flags, otherwise we'll see
>>> another failure (this time unexpected) when trying to open the file,
>>> which will lead the call to sem_open to fail as well.
>>>
>>> This can cause very strange bugs, especially with OpenMPI, which makes
>>> extensive use of semaphores.
>>>
>>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>>> are clear after we enter "try_again".
>>>
>>> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
>>
>> This needs a bug report and, if possible, a regression check (I grant
>> you that it might be tricky, since it is a race condition).
> 
> Hi folks,
> 
> It took me much longer than I intended to get back to this thread, so I
> apologize.  I'm afraid I don't have very exciting news either: I still
> don't have a testcase to exercise the fix.
> 
> After talking to Adhemerval during the last Cauldron, we have agreed
> that (a) creating a testcase for this bug is indeed tricky (and may even
> introduce false positives), and (b) we should likely move forward as is.
> 
> I would still like to point out that it is possible to have a reliable
> reproducer if you follow the steps I outlined on
> https://sourceware.org/bugzilla/show_bug.cgi?id=30789#c1, but
> unfortunately this is not acceptable as a glibc test, so there's that.
> 
> Either way, I'd like to know if you consider it OK to proceed.  I can
> replicate this explanation in the commit message if you think it's
> necessary.

Joseph, would adding the following to the commit message be enough to
install this patch:

  A regression test for this issue would require complex and
  CPU-time-consuming logic, since triggering the faulty code path is
  not straightforward due to the race condition.

Sergio, could you resend the patch with either this or a more extended
explanation, along with BZ #30789 in the title?
  
Joseph Myers Nov. 1, 2023, 1:53 p.m. UTC | #11
On Wed, 1 Nov 2023, Adhemerval Zanella Netto wrote:

> Joseph, would adding the following to the commit message be enough to
> install this patch:
> 
>   A regression test for this issue would require complex and
>   CPU-time-consuming logic, since triggering the faulty code path is
>   not straightforward due to the race condition.

Yes, that seems reasonable as justification for not adding a test case.  
(Note: I have not reviewed the substance of the patch.)

The default assumption is that a test case is added for a bug fix (that 
isn't just fixing an issue shown by an existing testsuite failure, build 
failure, etc.), but this assumption may be overcome in a particular case 
by a reason the issue is hard to cover in the testsuite.
  
Sergio Durigan Junior Nov. 1, 2023, 10:14 p.m. UTC | #12
On Wednesday, November 01 2023, Adhemerval Zanella Netto wrote:

> On 28/10/23 17:30, Sergio Durigan Junior wrote:
>> On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:
>> 
>>> On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
>>>> When invoking sem_open with O_CREAT as one of its flags, we'll end up
>>>> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
>>>> & O_EXCL) == 0)", which means that we don't expect the semaphore file
>>>> to exist.
>>>>
>>>> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
>>>> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
>>>> likely fail because it won't exist.  After that first (expected)
>>>> failure, some cleanup is done and we go back to the label "try_again",
>>>> which lives in the first part of the aforementioned "if".
>>>>
>>>> The problem is that, in that part of the code, we expect the semaphore
>>>> file to exist, and as such O_CREAT (this time the flag we pass to
>>>> open(2)) needs to be cleaned from open_flags, otherwise we'll see
>>>> another failure (this time unexpected) when trying to open the file,
>>>> which will lead the call to sem_open to fail as well.
>>>>
>>>> This can cause very strange bugs, especially with OpenMPI, which makes
>>>> extensive use of semaphores.
>>>>
>>>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>>>> are clear after we enter "try_again".
>>>>
>>>> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
>>>
>>> This needs a bug report and, if possible, a regression check (I grant
>>> you that it might be tricky, since it is a race condition).
>> 
>> Hi folks,
>> 
>> It took me much longer than I intended to get back to this thread, so I
>> apologize.  I'm afraid I don't have very exciting news either: I still
>> don't have a testcase to exercise the fix.
>> 
>> After talking to Adhemerval during the last Cauldron, we have agreed
>> that (a) creating a testcase for this bug is indeed tricky (and may even
>> introduce false positives), and (b) we should likely move forward as is.
>> 
>> I would still like to point out that it is possible to have a reliable
>> reproducer if you follow the steps I outlined on
>> https://sourceware.org/bugzilla/show_bug.cgi?id=30789#c1, but
>> unfortunately this is not acceptable as a glibc test, so there's that.
>> 
>> Either way, I'd like to know if you consider it OK to proceed.  I can
>> replicate this explanation in the commit message if you think it's
>> necessary.
>
> Joseph, would adding the following to the commit message be enough to
> install this patch:
>
>   A regression test for this issue would require complex and
>   CPU-time-consuming logic, since triggering the faulty code path is
>   not straightforward due to the race condition.
>
> Sergio, could you resend the patch with either this or a more extended
> explanation, along with BZ #30789 in the title?

Sure.  I'll send it as a reply to your message.
  

Patch

diff --git a/sysdeps/pthread/sem_open.c b/sysdeps/pthread/sem_open.c
index e5db929d20..ba91f89d57 100644
--- a/sysdeps/pthread/sem_open.c
+++ b/sysdeps/pthread/sem_open.c
@@ -66,8 +66,8 @@  __sem_open (const char *name, int oflag, ...)
   if ((oflag & O_CREAT) == 0 || (oflag & O_EXCL) == 0)
     {
       open_flags = O_RDWR | O_NOFOLLOW | O_CLOEXEC;
-      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
     try_again:
+      open_flags |= (oflag & ~(O_CREAT|O_ACCMODE));
       fd = __open (dirname.name, open_flags);
 
       if (fd == -1)