[v2] handle sem_t with ILP32 and __HAVE_64B_ATOMICS

Message ID 54C6CF4B.1020600@ezchip.com
State New, archived
Headers

Commit Message

Chris Metcalf Jan. 26, 2015, 11:35 p.m. UTC
  This version reflects Torvald's recent comments.

2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>

         * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
         behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
         * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
         * nptl/sem_init.c (__new_sem_init): Likewise.
         * nptl/sem_open.c (sem_open): Likewise.
         * nptl/sem_post.c (__new_sem_post): Likewise.
         * nptl/sem_timedwait.c (sem_timedwait): Likewise.
         * nptl/sem_wait.c (__new_sem_wait): Likewise.
         (__new_sem_trywait): Likewise.
  

Comments

Carlos O'Donell Jan. 27, 2015, 4:31 a.m. UTC | #1
On 01/26/2015 06:35 PM, Chris Metcalf wrote:
> This version reflects Torvald's recent comments.
> 
> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
> 
>         * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
>         behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>         * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>         * nptl/sem_init.c (__new_sem_init): Likewise.
>         * nptl/sem_open.c (sem_open): Likewise.
>         * nptl/sem_post.c (__new_sem_post): Likewise.
>         * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>         * nptl/sem_wait.c (__new_sem_wait): Likewise.
>         (__new_sem_trywait): Likewise.

I'm not all that happy with this patch.

For example a statically compiled application sharing a semaphore
with a dynamically compiled application would appear to break
under your changes and nothing you can do would fix it. They each
have different layouts for the semaphore. This is a common problem
when switching internal layouts (like we did from Linuxthreads to
NPTL). I accept that you may want to make this change and that this
particular case might be unsupportable, but I want more discussion
on the topic.

Similarly H.J's comment to change the alignment of the type
is equally flawed as embedded sem_t's in other types would break
other ABIs down the line. I see he's commented on that in a later
down-thread post.

As far as I can tell the only immediately kosher solution is for
these machines, that can't use 64-bit atomics because the
alignment of their sem_t is not correct, is to use 32-bit atomics
(for now). The choice of 32-bit atomics in no way prevents you
from future 64-bit atomic uses. You can switch AFAICT to another
internal implementation at a later date.

Could you switch to 32-bit atomics for 2.21 and continue to
investigate this optimziation for 2.22?

Cheers,
Carlos.
  
H.J. Lu Jan. 27, 2015, 4:49 a.m. UTC | #2
On Mon, Jan 26, 2015 at 8:31 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 01/26/2015 06:35 PM, Chris Metcalf wrote:
>> This version reflects Torvald's recent comments.
>>
>> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
>>
>>         * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
>>         behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>>         * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>>         * nptl/sem_init.c (__new_sem_init): Likewise.
>>         * nptl/sem_open.c (sem_open): Likewise.
>>         * nptl/sem_post.c (__new_sem_post): Likewise.
>>         * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>>         * nptl/sem_wait.c (__new_sem_wait): Likewise.
>>         (__new_sem_trywait): Likewise.
>
> I'm not all that happy with this patch.
>
> For example a statically compiled application sharing a semaphore
> with a dynamically compiled application would appear to break
> under your changes and nothing you can do would fix it. They each
> have different layouts for the semaphore. This is a common problem
> when switching internal layouts (like we did from Linuxthreads to
> NPTL). I accept that you may want to make this change and that this
> particular case might be unsupportable, but I want more discussion
> on the topic.
>
> Similarly H.J's comment to change the alignment of the type
> is equally flawed as embedded sem_t's in other types would break
> other ABIs down the line. I see he's commented on that in a later
> down-thread post.
>
> As far as I can tell the only immediately kosher solution is for
> these machines, that can't use 64-bit atomics because the
> alignment of their sem_t is not correct, is to use 32-bit atomics
> (for now). The choice of 32-bit atomics in no way prevents you
> from future 64-bit atomic uses. You can switch AFAICT to another
> internal implementation at a later date.
>
> Could you switch to 32-bit atomics for 2.21 and continue to
> investigate this optimziation for 2.22?
>

What is the motivation to use 64-bit atomics for ILP32 here?
Performance or correctness?
  
Andrew Pinski Jan. 27, 2015, 4:50 a.m. UTC | #3
On Mon, Jan 26, 2015 at 8:31 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 01/26/2015 06:35 PM, Chris Metcalf wrote:
>> This version reflects Torvald's recent comments.
>>
>> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
>>
>>         * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
>>         behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>>         * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>>         * nptl/sem_init.c (__new_sem_init): Likewise.
>>         * nptl/sem_open.c (sem_open): Likewise.
>>         * nptl/sem_post.c (__new_sem_post): Likewise.
>>         * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>>         * nptl/sem_wait.c (__new_sem_wait): Likewise.
>>         (__new_sem_trywait): Likewise.
>
> I'm not all that happy with this patch.
>
> For example a statically compiled application sharing a semaphore
> with a dynamically compiled application would appear to break
> under your changes and nothing you can do would fix it. They each
> have different layouts for the semaphore. This is a common problem
> when switching internal layouts (like we did from Linuxthreads to
> NPTL). I accept that you may want to make this change and that this
> particular case might be unsupportable, but I want more discussion
> on the topic.
>
> Similarly H.J's comment to change the alignment of the type
> is equally flawed as embedded sem_t's in other types would break
> other ABIs down the line. I see he's commented on that in a later
> down-thread post.
>
> As far as I can tell the only immediately kosher solution is for
> these machines, that can't use 64-bit atomics because the
> alignment of their sem_t is not correct, is to use 32-bit atomics
> (for now). The choice of 32-bit atomics in no way prevents you
> from future 64-bit atomic uses. You can switch AFAICT to another
> internal implementation at a later date.
>
> Could you switch to 32-bit atomics for 2.21 and continue to
> investigate this optimziation for 2.22?

Since AARCH64:ILP32 is not in yet, it might make sense to use 64bit
atomics for it though we have the same issue as there are now glibc in
the wild with AARCH64:ILP32 support and those would have the same
issue as tile now.

Thanks,
Andrew

>
> Cheers,
> Carlos.
>
  
Torvald Riegel Jan. 27, 2015, 11:30 a.m. UTC | #4
On Mon, 2015-01-26 at 23:31 -0500, Carlos O'Donell wrote:
> On 01/26/2015 06:35 PM, Chris Metcalf wrote:
> > This version reflects Torvald's recent comments.
> > 
> > 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
> > 
> >         * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
> >         behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
> >         * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
> >         * nptl/sem_init.c (__new_sem_init): Likewise.
> >         * nptl/sem_open.c (sem_open): Likewise.
> >         * nptl/sem_post.c (__new_sem_post): Likewise.
> >         * nptl/sem_timedwait.c (sem_timedwait): Likewise.
> >         * nptl/sem_wait.c (__new_sem_wait): Likewise.
> >         (__new_sem_trywait): Likewise.
> 
> I'm not all that happy with this patch.
> 
> For example a statically compiled application sharing a semaphore
> with a dynamically compiled application would appear to break
> under your changes and nothing you can do would fix it. They each
> have different layouts for the semaphore. This is a common problem
> when switching internal layouts (like we did from Linuxthreads to
> NPTL). I accept that you may want to make this change and that this
> particular case might be unsupportable, but I want more discussion
> on the topic.

I'm not quite sure what you are concerned about.  The sem_t structure
exposed to programs does not change.  I agree the internal layout
changes, depending on the address of the sem_t instance.  This is
similar to having a pointer in sem_t pointing to within this instance.
One isn't allowed to do bitwise copy though, so once initialized at an
address, an instance can't move to another address (with the exception
of mapping at a different page, but that won't change the within-a-page
offset, which really matters here).

If we have a statically compiled application sharing a semaphore with a
dynamically compiled application, then if they use the same glibc code,
I don't see what could break.
If they don't, things will break, but even without the patch by Chris.

So, are you concerned about process-shared pthreads objects where the
glibc code used on both sides differs?  We can decide to never change
any of such code in incompatible ways (and this might already happen
just if we remove barriers or read from it differently!) -- but then we
won't be able to fix certain bugs (eg, this semaphore bug).  I also
don't see how versioning could help -- the (process-)sharing between the
applications happens at the source code level, and we have no versioning
support built into the data representation, so we can't detect the
difference when sharing.

> Similarly H.J's comment to change the alignment of the type
> is equally flawed as embedded sem_t's in other types would break
> other ABIs down the line. I see he's commented on that in a later
> down-thread post.

I don't think it's the same, but maybe I don't understand your concern.

> As far as I can tell the only immediately kosher solution is for
> these machines, that can't use 64-bit atomics because the
> alignment of their sem_t is not correct, is to use 32-bit atomics
> (for now). The choice of 32-bit atomics in no way prevents you
> from future 64-bit atomic uses. You can switch AFAICT to another
> internal implementation at a later date.

I agree that this is possible.  But not that this will change the
internal representation of struct new_sem too.
  
Torvald Riegel Jan. 27, 2015, 11:32 a.m. UTC | #5
On Mon, 2015-01-26 at 18:35 -0500, Chris Metcalf wrote:
> This version reflects Torvald's recent comments.

I think this looks good, but we'll have to find consensus with Carlos
first.  I also wouldn't be opposed to at first just enable the
64b-algorithm if __HAVE_64B_ATOMICS and _LP64 are defined (if the latter
is the right check...).
  
H.J. Lu Jan. 27, 2015, 11:40 a.m. UTC | #6
On Mon, Jan 26, 2015 at 3:35 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote:
> This version reflects Torvald's recent comments.
>
> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
>
>         * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
>         behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>         * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>         * nptl/sem_init.c (__new_sem_init): Likewise.
>         * nptl/sem_open.c (sem_open): Likewise.
>         * nptl/sem_post.c (__new_sem_post): Likewise.
>         * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>         * nptl/sem_wait.c (__new_sem_wait): Likewise.
>         (__new_sem_trywait): Likewise.
>

>    /* Use the values the caller provided.  */
>  #if __HAVE_64B_ATOMICS
> diff --git a/nptl/sem_open.c b/nptl/sem_open.c
> index bfd2dea..202769f 100644
> --- a/nptl/sem_open.c
> +++ b/nptl/sem_open.c
> @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...)
>           return SEM_FAILED;
>         }
>
> -      /* Create the initial file content.  */
> -      union
> -      {
> -       sem_t initsem;
> -       struct new_sem newsem;
> -      } sem;
> +      /* Create the initial file content.  We force the alignment of
> +         the sem_t to match the alignment of struct new_sem since we
> +         will copy this stack structure to a file and then mmap it,
> +         so we must ensure it is aligned to zero here so that the
> +         behavior of to_new_sem () is the same as when we later mmap
> +         it into memory and have it be page-aligned.  */
> +      sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));;
> +      struct new_sem *newsem = to_new_sem (&sem);
> +
> +      /* Initialize the unused parts of sem_t to zero.
> +         The compiler will notice most of this memset is dead based on
> +         the assignments through the struct new_sem pointer.  */
> +      memset (&sem, '\0', sizeof (sem_t));

Why is this change needed?  Since union sem has the largest alignment of
sem_t and struct new_sem, sem is properly aligned here.

>  #if __HAVE_64B_ATOMICS
> -      sem.newsem.data = value;
> +      newsem->data = value;
>  #else
> -      sem.newsem.value = value << SEM_VALUE_SHIFT;
> -      sem.newsem.nwaiters = 0;
> +      newsem->value = value << SEM_VALUE_SHIFT;
> +      newsem->nwaiters = 0;
>  #endif
>        /* This always is a shared semaphore.  */
> -      sem.newsem.private = LLL_SHARED;
> -
> -      /* Initialize the remaining bytes as well.  */
> -      memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0',
> -             sizeof (sem_t) - sizeof (struct new_sem));
> +      newsem->private = LLL_SHARED;
>
>        tmpfname = __alloca (shm_dirlen + sizeof SEM_SHM_PREFIX + 6);
>        char *xxxxxx = __mempcpy (tmpfname, shm_dir, shm_dirlen);
> @@ -242,7 +245,7 @@ sem_open (const char *name, int oflag, ...)
>           break;
>         }
>
> -      if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof
> (sem_t)))
> +      if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem, sizeof (sem_t)))
>           == sizeof (sem_t)
>           /* Map the sem_t structure from the file.  */
>           && (result = (sem_t *) mmap (NULL, sizeof (sem_t),
  
Chris Metcalf Jan. 27, 2015, 9:18 p.m. UTC | #7
On 1/27/2015 6:40 AM, H.J. Lu wrote:
> On Mon, Jan 26, 2015 at 3:35 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote:
>> --- a/nptl/sem_open.c
>> +++ b/nptl/sem_open.c
>> @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...)
>>            return SEM_FAILED;
>>          }
>>
>> -      /* Create the initial file content.  */
>> -      union
>> -      {
>> -       sem_t initsem;
>> -       struct new_sem newsem;
>> -      } sem;
>> +      /* Create the initial file content.  We force the alignment of
>> +         the sem_t to match the alignment of struct new_sem since we
>> +         will copy this stack structure to a file and then mmap it,
>> +         so we must ensure it is aligned to zero here so that the
>> +         behavior of to_new_sem () is the same as when we later mmap
>> +         it into memory and have it be page-aligned.  */
>> +      sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));;
>> +      struct new_sem *newsem = to_new_sem (&sem);
>> +
>> +      /* Initialize the unused parts of sem_t to zero.
>> +         The compiler will notice most of this memset is dead based on
>> +         the assignments through the struct new_sem pointer.  */
>> +      memset (&sem, '\0', sizeof (sem_t));
> Why is this change needed?  Since union sem has the largest alignment of
> sem_t and struct new_sem, sem is properly aligned here.

It is not needed.  Torvald's claim was that it was more stylistically
consistent to use to_new_sem() in all cases, even here.  I am personally
on the fence about this; I think the new code and old code are both
plausible ways to represent the issue with the required alignment.

If Torvald prefers the new code and you prefer the old code, perhaps
we will need to ask for a tie-breaker vote. :-)
  
Chris Metcalf Jan. 27, 2015, 9:21 p.m. UTC | #8
On 1/26/2015 11:31 PM, Carlos O'Donell wrote:
> On 01/26/2015 06:35 PM, Chris Metcalf wrote:
>> This version reflects Torvald's recent comments.
>>
>> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
>>
>>          * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
>>          behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>>          * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>>          * nptl/sem_init.c (__new_sem_init): Likewise.
>>          * nptl/sem_open.c (sem_open): Likewise.
>>          * nptl/sem_post.c (__new_sem_post): Likewise.
>>          * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>>          * nptl/sem_wait.c (__new_sem_wait): Likewise.
>>          (__new_sem_trywait): Likewise.
> I'm not all that happy with this patch.
>
> For example a statically compiled application sharing a semaphore
> with a dynamically compiled application would appear to break
> under your changes and nothing you can do would fix it. They each
> have different layouts for the semaphore. This is a common problem
> when switching internal layouts (like we did from Linuxthreads to
> NPTL). I accept that you may want to make this change and that this
> particular case might be unsupportable, but I want more discussion
> on the topic.
>
> Similarly H.J's comment to change the alignment of the type
> is equally flawed as embedded sem_t's in other types would break
> other ABIs down the line. I see he's commented on that in a later
> down-thread post.
>
> As far as I can tell the only immediately kosher solution is for
> these machines, that can't use 64-bit atomics because the
> alignment of their sem_t is not correct, is to use 32-bit atomics
> (for now). The choice of 32-bit atomics in no way prevents you
> from future 64-bit atomic uses. You can switch AFAICT to another
> internal implementation at a later date.
>
> Could you switch to 32-bit atomics for 2.21 and continue to
> investigate this optimziation for 2.22?

I am certainly willing to use 32-bit atomics here for now.  The absolutely
minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0.
Obviously the AArch64 ILP32 mode (out of tree) will want to do the same.
And I don't know whether m68k or mips require a similar change or if
they can tolerate unaligned 64-bit atomic operations.  So arguably if
we can decide that my proposed re-aligning change is best, then it also
automatically handles all the other platforms as well.
  
H.J. Lu Jan. 27, 2015, 9:26 p.m. UTC | #9
On Tue, Jan 27, 2015 at 1:21 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote:
> On 1/26/2015 11:31 PM, Carlos O'Donell wrote:
>>
>> On 01/26/2015 06:35 PM, Chris Metcalf wrote:
>>>
>>> This version reflects Torvald's recent comments.
>>>
>>> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
>>>
>>>          * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides
>>> new
>>>          behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>>>          * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>>>          * nptl/sem_init.c (__new_sem_init): Likewise.
>>>          * nptl/sem_open.c (sem_open): Likewise.
>>>          * nptl/sem_post.c (__new_sem_post): Likewise.
>>>          * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>>>          * nptl/sem_wait.c (__new_sem_wait): Likewise.
>>>          (__new_sem_trywait): Likewise.
>>
>> I'm not all that happy with this patch.
>>
>> For example a statically compiled application sharing a semaphore
>> with a dynamically compiled application would appear to break
>> under your changes and nothing you can do would fix it. They each
>> have different layouts for the semaphore. This is a common problem
>> when switching internal layouts (like we did from Linuxthreads to
>> NPTL). I accept that you may want to make this change and that this
>> particular case might be unsupportable, but I want more discussion
>> on the topic.
>>
>> Similarly H.J's comment to change the alignment of the type
>> is equally flawed as embedded sem_t's in other types would break
>> other ABIs down the line. I see he's commented on that in a later
>> down-thread post.
>>
>> As far as I can tell the only immediately kosher solution is for
>> these machines, that can't use 64-bit atomics because the
>> alignment of their sem_t is not correct, is to use 32-bit atomics
>> (for now). The choice of 32-bit atomics in no way prevents you
>> from future 64-bit atomic uses. You can switch AFAICT to another
>> internal implementation at a later date.
>>
>> Could you switch to 32-bit atomics for 2.21 and continue to
>> investigate this optimziation for 2.22?
>
>
> I am certainly willing to use 32-bit atomics here for now.  The absolutely
> minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0.
> Obviously the AArch64 ILP32 mode (out of tree) will want to do the same.
> And I don't know whether m68k or mips require a similar change or if
> they can tolerate unaligned 64-bit atomic operations.  So arguably if
> we can decide that my proposed re-aligning change is best, then it also
> automatically handles all the other platforms as well.
>

X32 can use a hybrid implementation.  Use 32-bit atomics unless
64-bit atomics on misaligned memory can simplify the code.  It is
hard to say if it is faster.  Since we are in code freeze, I prefer
to use 32-bit atomics in 2.21 for now.  We can always backport if
needed.
  
Carlos O'Donell Jan. 27, 2015, 9:30 p.m. UTC | #10
On 01/27/2015 04:21 PM, Chris Metcalf wrote:
> On 1/26/2015 11:31 PM, Carlos O'Donell wrote:
>> On 01/26/2015 06:35 PM, Chris Metcalf wrote:
>>> This version reflects Torvald's recent comments.
>>>
>>> 2015-01-25  Chris Metcalf  <cmetcalf@ezchip.com>
>>>
>>>          * sysdeps/nptl/internaltypes.h (to_new_sem): Define.  Provides new
>>>          behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)].
>>>          * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem.
>>>          * nptl/sem_init.c (__new_sem_init): Likewise.
>>>          * nptl/sem_open.c (sem_open): Likewise.
>>>          * nptl/sem_post.c (__new_sem_post): Likewise.
>>>          * nptl/sem_timedwait.c (sem_timedwait): Likewise.
>>>          * nptl/sem_wait.c (__new_sem_wait): Likewise.
>>>          (__new_sem_trywait): Likewise.
>> I'm not all that happy with this patch.
>>
>> For example a statically compiled application sharing a semaphore
>> with a dynamically compiled application would appear to break
>> under your changes and nothing you can do would fix it. They each
>> have different layouts for the semaphore. This is a common problem
>> when switching internal layouts (like we did from Linuxthreads to
>> NPTL). I accept that you may want to make this change and that this
>> particular case might be unsupportable, but I want more discussion
>> on the topic.
>>
>> Similarly H.J's comment to change the alignment of the type
>> is equally flawed as embedded sem_t's in other types would break
>> other ABIs down the line. I see he's commented on that in a later
>> down-thread post.
>>
>> As far as I can tell the only immediately kosher solution is for
>> these machines, that can't use 64-bit atomics because the
>> alignment of their sem_t is not correct, is to use 32-bit atomics
>> (for now). The choice of 32-bit atomics in no way prevents you
>> from future 64-bit atomic uses. You can switch AFAICT to another
>> internal implementation at a later date.
>>
>> Could you switch to 32-bit atomics for 2.21 and continue to
>> investigate this optimziation for 2.22?
> 
> I am certainly willing to use 32-bit atomics here for now.  The absolutely
> minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0.
> Obviously the AArch64 ILP32 mode (out of tree) will want to do the same.
> And I don't know whether m68k or mips require a similar change or if
> they can tolerate unaligned 64-bit atomic operations.  So arguably if
> we can decide that my proposed re-aligning change is best, then it also
> automatically handles all the other platforms as well.

I would like several more rounds of review of your realigning code.

You have my permission to checkin the minimal change for tilegx32.

Similarly other ILP32 with 64-bit atomic arches should do the same,
and we can continue this conversation without the pressure of the
time boxed release.

Cheers,
Carlos.
  
Joseph Myers Jan. 27, 2015, 11:31 p.m. UTC | #11
On Tue, 27 Jan 2015, Carlos O'Donell wrote:

> > I am certainly willing to use 32-bit atomics here for now.  The absolutely
> > minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0.
> > Obviously the AArch64 ILP32 mode (out of tree) will want to do the same.
> > And I don't know whether m68k or mips require a similar change or if
> > they can tolerate unaligned 64-bit atomic operations.  So arguably if
> > we can decide that my proposed re-aligning change is best, then it also
> > automatically handles all the other platforms as well.
> 
> I would like several more rounds of review of your realigning code.
> 
> You have my permission to checkin the minimal change for tilegx32.
> 
> Similarly other ILP32 with 64-bit atomic arches should do the same,
> and we can continue this conversation without the pressure of the
> time boxed release.

Is __HAVE_64B_ATOMICS used *only* for semaphores?  Because if there are 
other effects of changing it, doing so seems risky - it would be better to 
have a define such as SEM_USE_64B_ATOMICS, and make that default to 0 for 
ILP32 architectures.
  
Roland McGrath Jan. 27, 2015, 11:40 p.m. UTC | #12
> Is __HAVE_64B_ATOMICS used *only* for semaphores?  Because if there are 
> other effects of changing it, doing so seems risky - it would be better to 
> have a define such as SEM_USE_64B_ATOMICS, and make that default to 0 for 
> ILP32 architectures.

A trivial grep shows there are no other uses.
  
Torvald Riegel Jan. 28, 2015, 9:04 a.m. UTC | #13
On Tue, 2015-01-27 at 16:18 -0500, Chris Metcalf wrote:
> On 1/27/2015 6:40 AM, H.J. Lu wrote:
> > On Mon, Jan 26, 2015 at 3:35 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote:
> >> --- a/nptl/sem_open.c
> >> +++ b/nptl/sem_open.c
> >> @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...)
> >>            return SEM_FAILED;
> >>          }
> >>
> >> -      /* Create the initial file content.  */
> >> -      union
> >> -      {
> >> -       sem_t initsem;
> >> -       struct new_sem newsem;
> >> -      } sem;
> >> +      /* Create the initial file content.  We force the alignment of
> >> +         the sem_t to match the alignment of struct new_sem since we
> >> +         will copy this stack structure to a file and then mmap it,
> >> +         so we must ensure it is aligned to zero here so that the
> >> +         behavior of to_new_sem () is the same as when we later mmap
> >> +         it into memory and have it be page-aligned.  */
> >> +      sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));;
> >> +      struct new_sem *newsem = to_new_sem (&sem);
> >> +
> >> +      /* Initialize the unused parts of sem_t to zero.
> >> +         The compiler will notice most of this memset is dead based on
> >> +         the assignments through the struct new_sem pointer.  */
> >> +      memset (&sem, '\0', sizeof (sem_t));
> > Why is this change needed?  Since union sem has the largest alignment of
> > sem_t and struct new_sem, sem is properly aligned here.
> 
> It is not needed.  Torvald's claim was that it was more stylistically
> consistent to use to_new_sem() in all cases, even here.  I am personally
> on the fence about this; I think the new code and old code are both
> plausible ways to represent the issue with the required alignment.
> 
> If Torvald prefers the new code and you prefer the old code, perhaps
> we will need to ask for a tie-breaker vote. :-)
> 

I thing using to_new_sem in all cases is cleaner, but my comment on the
union wasn't quite right.  I agree the union should give the proper
alignment, because it includes struct new_sem directly, so will pick up
the uint64_t alignment of it.
  
Chris Metcalf Jan. 28, 2015, 5:45 p.m. UTC | #14
On 1/28/2015 4:04 AM, Torvald Riegel wrote:
> I thing using to_new_sem in all cases is cleaner, but my comment on the
> union wasn't quite right.  I agree the union should give the proper
> alignment, because it includes struct new_sem directly, so will pick up
> the uint64_t alignment of it.

I have reverted the code to just using the union, which on balance I
think I slightly prefer.  I think mentioning to_new_sem() in the comment
as part of the explanation is a sufficient way to deal with this.

+      /* Create the initial file content.  The union forces the
+         alignment of initsem to be at least as much as newsem.
+         When to_new_sem() provides for varying internal alignment,
+         this expression makes the alignment zero, and matches the
+         eventual alignment when the union is copied to the start of
+         an mmap'ed file page.  */
  

Patch

diff --git a/nptl/sem_getvalue.c b/nptl/sem_getvalue.c
index 1432cc7..08a0789 100644
--- a/nptl/sem_getvalue.c
+++ b/nptl/sem_getvalue.c
@@ -25,7 +25,7 @@ 
  int
  __new_sem_getvalue (sem_t *sem, int *sval)
  {
-  struct new_sem *isem = (struct new_sem *) sem;
+  struct new_sem *isem = to_new_sem (sem);

    /* XXX Check for valid SEM parameter.  */
    /* FIXME This uses relaxed MO, even though POSIX specifies that this function
diff --git a/nptl/sem_init.c b/nptl/sem_init.c
index 575b661..aaa6af8 100644
--- a/nptl/sem_init.c
+++ b/nptl/sem_init.c
@@ -50,7 +50,7 @@  __new_sem_init (sem_t *sem, int pshared, unsigned int value)
      }

    /* Map to the internal type.  */
-  struct new_sem *isem = (struct new_sem *) sem;
+  struct new_sem *isem = to_new_sem (sem);

    /* Use the values the caller provided.  */
  #if __HAVE_64B_ATOMICS
diff --git a/nptl/sem_open.c b/nptl/sem_open.c
index bfd2dea..202769f 100644
--- a/nptl/sem_open.c
+++ b/nptl/sem_open.c
@@ -186,25 +186,28 @@  sem_open (const char *name, int oflag, ...)
           return SEM_FAILED;
         }

-      /* Create the initial file content.  */
-      union
-      {
-       sem_t initsem;
-       struct new_sem newsem;
-      } sem;
+      /* Create the initial file content.  We force the alignment of
+         the sem_t to match the alignment of struct new_sem since we
+         will copy this stack structure to a file and then mmap it,
+         so we must ensure it is aligned to zero here so that the
+         behavior of to_new_sem () is the same as when we later mmap
+         it into memory and have it be page-aligned.  */
+      sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));;
+      struct new_sem *newsem = to_new_sem (&sem);
+
+      /* Initialize the unused parts of sem_t to zero.
+         The compiler will notice most of this memset is dead based on
+         the assignments through the struct new_sem pointer.  */
+      memset (&sem, '\0', sizeof (sem_t));

  #if __HAVE_64B_ATOMICS
-      sem.newsem.data = value;
+      newsem->data = value;
  #else
-      sem.newsem.value = value << SEM_VALUE_SHIFT;
-      sem.newsem.nwaiters = 0;
+      newsem->value = value << SEM_VALUE_SHIFT;
+      newsem->nwaiters = 0;
  #endif
        /* This always is a shared semaphore.  */
-      sem.newsem.private = LLL_SHARED;
-
-      /* Initialize the remaining bytes as well.  */
-      memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0',
-             sizeof (sem_t) - sizeof (struct new_sem));
+      newsem->private = LLL_SHARED;

        tmpfname = __alloca (shm_dirlen + sizeof SEM_SHM_PREFIX + 6);
        char *xxxxxx = __mempcpy (tmpfname, shm_dir, shm_dirlen);
@@ -242,7 +245,7 @@  sem_open (const char *name, int oflag, ...)
           break;
         }

-      if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof (sem_t)))
+      if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem, sizeof (sem_t)))
           == sizeof (sem_t)
           /* Map the sem_t structure from the file.  */
           && (result = (sem_t *) mmap (NULL, sizeof (sem_t),
diff --git a/nptl/sem_post.c b/nptl/sem_post.c
index 6e495ed..71818ea 100644
--- a/nptl/sem_post.c
+++ b/nptl/sem_post.c
@@ -56,7 +56,7 @@  futex_wake (unsigned int* futex, int processes_to_wake, int private)
  int
  __new_sem_post (sem_t *sem)
  {
-  struct new_sem *isem = (struct new_sem *) sem;
+  struct new_sem *isem = to_new_sem (sem);
    int private = isem->private;

  #if __HAVE_64B_ATOMICS
diff --git a/nptl/sem_timedwait.c b/nptl/sem_timedwait.c
index 042b0ac..5e650e0 100644
--- a/nptl/sem_timedwait.c
+++ b/nptl/sem_timedwait.c
@@ -30,8 +30,8 @@  sem_timedwait (sem_t *sem, const struct timespec *abstime)
        return -1;
      }

-  if (__new_sem_wait_fast ((struct new_sem *) sem, 0) == 0)
+  if (__new_sem_wait_fast (to_new_sem (sem), 0) == 0)
      return 0;
    else
-    return __new_sem_wait_slow((struct new_sem *) sem, abstime);
+    return __new_sem_wait_slow(to_new_sem (sem), abstime);
  }
diff --git a/nptl/sem_wait.c b/nptl/sem_wait.c
index c1fd10c..73759f1 100644
--- a/nptl/sem_wait.c
+++ b/nptl/sem_wait.c
@@ -22,10 +22,10 @@ 
  int
  __new_sem_wait (sem_t *sem)
  {
-  if (__new_sem_wait_fast ((struct new_sem *) sem, 0) == 0)
+  if (__new_sem_wait_fast (to_new_sem (sem), 0) == 0)
      return 0;
    else
-    return __new_sem_wait_slow((struct new_sem *) sem, NULL);
+    return __new_sem_wait_slow(to_new_sem (sem), NULL);
  }
  versioned_symbol (libpthread, __new_sem_wait, sem_wait, GLIBC_2_1);

@@ -65,7 +65,7 @@  __new_sem_trywait (sem_t *sem)
  {
    /* We must not fail spuriously, so require a definitive result even if this
       may lead to a long execution time.  */
-  if (__new_sem_wait_fast ((struct new_sem *) sem, 1) == 0)
+  if (__new_sem_wait_fast (to_new_sem (sem), 1) == 0)
      return 0;
    __set_errno (EAGAIN);
    return -1;
diff --git a/sysdeps/nptl/internaltypes.h b/sysdeps/nptl/internaltypes.h
index 8f5cfa4..8dd4018 100644
--- a/sysdeps/nptl/internaltypes.h
+++ b/sysdeps/nptl/internaltypes.h
@@ -168,6 +168,22 @@  struct new_sem
  #endif
  };

+/* The sem_t alignment is typically just 4 bytes for ILP32, whereas
+   new_sem is 8 bytes with __HAVE_64B_ATOMICS.  For better atomic
+   performance on platforms that tolerate unaligned atomics (and to
+   make it work at all on platforms that don't), round up the new_sem
+   pointer within the sem_t object to an 8-byte boundary.
+
+   Note that in this case, the useful part of the new_sem structure
+   (up to the "pad" variable) is only 12 bytes, and sem_t is always
+   at least 16 bytes, so moving the start of new_sem to an offset
+   of 4 bytes within sem_t is okay.  We must not read or write the
+   "pad" field within new_sem to maintain correctness.  */
+#define to_new_sem(s)                                                \
+  ((__alignof__ (sem_t) == 4 && __alignof__ (struct new_sem) == 8) ? \
+   (struct new_sem *) (((uintptr_t) (s) + 4) & -8) :                 \
+   (struct new_sem *) (s))
+
  struct old_sem
  {
    unsigned int value;