Message ID | 54C6CF4B.1020600@ezchip.com |
---|---|
State | New, archived |
Headers | show |
On 01/26/2015 06:35 PM, Chris Metcalf wrote: > This version reflects Torvald's recent comments. > > 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> > > * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new > behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. > * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. > * nptl/sem_init.c (__new_sem_init): Likewise. > * nptl/sem_open.c (sem_open): Likewise. > * nptl/sem_post.c (__new_sem_post): Likewise. > * nptl/sem_timedwait.c (sem_timedwait): Likewise. > * nptl/sem_wait.c (__new_sem_wait): Likewise. > (__new_sem_trywait): Likewise. I'm not all that happy with this patch. For example a statically compiled application sharing a semaphore with a dynamically compiled application would appear to break under your changes and nothing you can do would fix it. They each have different layouts for the semaphore. This is a common problem when switching internal layouts (like we did from Linuxthreads to NPTL). I accept that you may want to make this change and that this particular case might be unsupportable, but I want more discussion on the topic. Similarly H.J's comment to change the alignment of the type is equally flawed as embedded sem_t's in other types would break other ABIs down the line. I see he's commented on that in a later down-thread post. As far as I can tell the only immediately kosher solution is for these machines, that can't use 64-bit atomics because the alignment of their sem_t is not correct, is to use 32-bit atomics (for now). The choice of 32-bit atomics in no way prevents you from future 64-bit atomic uses. You can switch AFAICT to another internal implementation at a later date. Could you switch to 32-bit atomics for 2.21 and continue to investigate this optimziation for 2.22? Cheers, Carlos.
On Mon, Jan 26, 2015 at 8:31 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 01/26/2015 06:35 PM, Chris Metcalf wrote: >> This version reflects Torvald's recent comments. >> >> 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> >> >> * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new >> behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. >> * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. >> * nptl/sem_init.c (__new_sem_init): Likewise. >> * nptl/sem_open.c (sem_open): Likewise. >> * nptl/sem_post.c (__new_sem_post): Likewise. >> * nptl/sem_timedwait.c (sem_timedwait): Likewise. >> * nptl/sem_wait.c (__new_sem_wait): Likewise. >> (__new_sem_trywait): Likewise. > > I'm not all that happy with this patch. > > For example a statically compiled application sharing a semaphore > with a dynamically compiled application would appear to break > under your changes and nothing you can do would fix it. They each > have different layouts for the semaphore. This is a common problem > when switching internal layouts (like we did from Linuxthreads to > NPTL). I accept that you may want to make this change and that this > particular case might be unsupportable, but I want more discussion > on the topic. > > Similarly H.J's comment to change the alignment of the type > is equally flawed as embedded sem_t's in other types would break > other ABIs down the line. I see he's commented on that in a later > down-thread post. > > As far as I can tell the only immediately kosher solution is for > these machines, that can't use 64-bit atomics because the > alignment of their sem_t is not correct, is to use 32-bit atomics > (for now). The choice of 32-bit atomics in no way prevents you > from future 64-bit atomic uses. You can switch AFAICT to another > internal implementation at a later date. > > Could you switch to 32-bit atomics for 2.21 and continue to > investigate this optimziation for 2.22? > What is the motivation to use 64-bit atomics for ILP32 here? Performance or correctness?
On Mon, Jan 26, 2015 at 8:31 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 01/26/2015 06:35 PM, Chris Metcalf wrote: >> This version reflects Torvald's recent comments. >> >> 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> >> >> * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new >> behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. >> * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. >> * nptl/sem_init.c (__new_sem_init): Likewise. >> * nptl/sem_open.c (sem_open): Likewise. >> * nptl/sem_post.c (__new_sem_post): Likewise. >> * nptl/sem_timedwait.c (sem_timedwait): Likewise. >> * nptl/sem_wait.c (__new_sem_wait): Likewise. >> (__new_sem_trywait): Likewise. > > I'm not all that happy with this patch. > > For example a statically compiled application sharing a semaphore > with a dynamically compiled application would appear to break > under your changes and nothing you can do would fix it. They each > have different layouts for the semaphore. This is a common problem > when switching internal layouts (like we did from Linuxthreads to > NPTL). I accept that you may want to make this change and that this > particular case might be unsupportable, but I want more discussion > on the topic. > > Similarly H.J's comment to change the alignment of the type > is equally flawed as embedded sem_t's in other types would break > other ABIs down the line. I see he's commented on that in a later > down-thread post. > > As far as I can tell the only immediately kosher solution is for > these machines, that can't use 64-bit atomics because the > alignment of their sem_t is not correct, is to use 32-bit atomics > (for now). The choice of 32-bit atomics in no way prevents you > from future 64-bit atomic uses. You can switch AFAICT to another > internal implementation at a later date. > > Could you switch to 32-bit atomics for 2.21 and continue to > investigate this optimziation for 2.22? Since AARCH64:ILP32 is not in yet, it might make sense to use 64bit atomics for it though we have the same issue as there are now glibc in the wild with AARCH64:ILP32 support and those would have the same issue as tile now. Thanks, Andrew > > Cheers, > Carlos. >
On Mon, 2015-01-26 at 23:31 -0500, Carlos O'Donell wrote: > On 01/26/2015 06:35 PM, Chris Metcalf wrote: > > This version reflects Torvald's recent comments. > > > > 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> > > > > * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new > > behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. > > * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. > > * nptl/sem_init.c (__new_sem_init): Likewise. > > * nptl/sem_open.c (sem_open): Likewise. > > * nptl/sem_post.c (__new_sem_post): Likewise. > > * nptl/sem_timedwait.c (sem_timedwait): Likewise. > > * nptl/sem_wait.c (__new_sem_wait): Likewise. > > (__new_sem_trywait): Likewise. > > I'm not all that happy with this patch. > > For example a statically compiled application sharing a semaphore > with a dynamically compiled application would appear to break > under your changes and nothing you can do would fix it. They each > have different layouts for the semaphore. This is a common problem > when switching internal layouts (like we did from Linuxthreads to > NPTL). I accept that you may want to make this change and that this > particular case might be unsupportable, but I want more discussion > on the topic. I'm not quite sure what you are concerned about. The sem_t structure exposed to programs does not change. I agree the internal layout changes, depending on the address of the sem_t instance. This is similar to having a pointer in sem_t pointing to within this instance. One isn't allowed to do bitwise copy though, so once initialized at an address, an instance can't move to another address (with the exception of mapping at a different page, but that won't change the within-a-page offset, which really matters here). If we have a statically compiled application sharing a semaphore with a dynamically compiled application, then if they use the same glibc code, I don't see what could break. If they don't, things will break, but even without the patch by Chris. So, are you concerned about process-shared pthreads objects where the glibc code used on both sides differs? We can decide to never change any of such code in incompatible ways (and this might already happen just if we remove barriers or read from it differently!) -- but then we won't be able to fix certain bugs (eg, this semaphore bug). I also don't see how versioning could help -- the (process-)sharing between the applications happens at the source code level, and we have no versioning support built into the data representation, so we can't detect the difference when sharing. > Similarly H.J's comment to change the alignment of the type > is equally flawed as embedded sem_t's in other types would break > other ABIs down the line. I see he's commented on that in a later > down-thread post. I don't think it's the same, but maybe I don't understand your concern. > As far as I can tell the only immediately kosher solution is for > these machines, that can't use 64-bit atomics because the > alignment of their sem_t is not correct, is to use 32-bit atomics > (for now). The choice of 32-bit atomics in no way prevents you > from future 64-bit atomic uses. You can switch AFAICT to another > internal implementation at a later date. I agree that this is possible. But not that this will change the internal representation of struct new_sem too.
On Mon, 2015-01-26 at 18:35 -0500, Chris Metcalf wrote:
> This version reflects Torvald's recent comments.
I think this looks good, but we'll have to find consensus with Carlos
first. I also wouldn't be opposed to at first just enable the
64b-algorithm if __HAVE_64B_ATOMICS and _LP64 are defined (if the latter
is the right check...).
On Mon, Jan 26, 2015 at 3:35 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote: > This version reflects Torvald's recent comments. > > 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> > > * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new > behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. > * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. > * nptl/sem_init.c (__new_sem_init): Likewise. > * nptl/sem_open.c (sem_open): Likewise. > * nptl/sem_post.c (__new_sem_post): Likewise. > * nptl/sem_timedwait.c (sem_timedwait): Likewise. > * nptl/sem_wait.c (__new_sem_wait): Likewise. > (__new_sem_trywait): Likewise. > > /* Use the values the caller provided. */ > #if __HAVE_64B_ATOMICS > diff --git a/nptl/sem_open.c b/nptl/sem_open.c > index bfd2dea..202769f 100644 > --- a/nptl/sem_open.c > +++ b/nptl/sem_open.c > @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...) > return SEM_FAILED; > } > > - /* Create the initial file content. */ > - union > - { > - sem_t initsem; > - struct new_sem newsem; > - } sem; > + /* Create the initial file content. We force the alignment of > + the sem_t to match the alignment of struct new_sem since we > + will copy this stack structure to a file and then mmap it, > + so we must ensure it is aligned to zero here so that the > + behavior of to_new_sem () is the same as when we later mmap > + it into memory and have it be page-aligned. */ > + sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));; > + struct new_sem *newsem = to_new_sem (&sem); > + > + /* Initialize the unused parts of sem_t to zero. > + The compiler will notice most of this memset is dead based on > + the assignments through the struct new_sem pointer. */ > + memset (&sem, '\0', sizeof (sem_t)); Why is this change needed? Since union sem has the largest alignment of sem_t and struct new_sem, sem is properly aligned here. > #if __HAVE_64B_ATOMICS > - sem.newsem.data = value; > + newsem->data = value; > #else > - sem.newsem.value = value << SEM_VALUE_SHIFT; > - sem.newsem.nwaiters = 0; > + newsem->value = value << SEM_VALUE_SHIFT; > + newsem->nwaiters = 0; > #endif > /* This always is a shared semaphore. */ > - sem.newsem.private = LLL_SHARED; > - > - /* Initialize the remaining bytes as well. */ > - memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0', > - sizeof (sem_t) - sizeof (struct new_sem)); > + newsem->private = LLL_SHARED; > > tmpfname = __alloca (shm_dirlen + sizeof SEM_SHM_PREFIX + 6); > char *xxxxxx = __mempcpy (tmpfname, shm_dir, shm_dirlen); > @@ -242,7 +245,7 @@ sem_open (const char *name, int oflag, ...) > break; > } > > - if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof > (sem_t))) > + if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem, sizeof (sem_t))) > == sizeof (sem_t) > /* Map the sem_t structure from the file. */ > && (result = (sem_t *) mmap (NULL, sizeof (sem_t),
On 1/27/2015 6:40 AM, H.J. Lu wrote: > On Mon, Jan 26, 2015 at 3:35 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote: >> --- a/nptl/sem_open.c >> +++ b/nptl/sem_open.c >> @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...) >> return SEM_FAILED; >> } >> >> - /* Create the initial file content. */ >> - union >> - { >> - sem_t initsem; >> - struct new_sem newsem; >> - } sem; >> + /* Create the initial file content. We force the alignment of >> + the sem_t to match the alignment of struct new_sem since we >> + will copy this stack structure to a file and then mmap it, >> + so we must ensure it is aligned to zero here so that the >> + behavior of to_new_sem () is the same as when we later mmap >> + it into memory and have it be page-aligned. */ >> + sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));; >> + struct new_sem *newsem = to_new_sem (&sem); >> + >> + /* Initialize the unused parts of sem_t to zero. >> + The compiler will notice most of this memset is dead based on >> + the assignments through the struct new_sem pointer. */ >> + memset (&sem, '\0', sizeof (sem_t)); > Why is this change needed? Since union sem has the largest alignment of > sem_t and struct new_sem, sem is properly aligned here. It is not needed. Torvald's claim was that it was more stylistically consistent to use to_new_sem() in all cases, even here. I am personally on the fence about this; I think the new code and old code are both plausible ways to represent the issue with the required alignment. If Torvald prefers the new code and you prefer the old code, perhaps we will need to ask for a tie-breaker vote. :-)
On 1/26/2015 11:31 PM, Carlos O'Donell wrote: > On 01/26/2015 06:35 PM, Chris Metcalf wrote: >> This version reflects Torvald's recent comments. >> >> 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> >> >> * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new >> behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. >> * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. >> * nptl/sem_init.c (__new_sem_init): Likewise. >> * nptl/sem_open.c (sem_open): Likewise. >> * nptl/sem_post.c (__new_sem_post): Likewise. >> * nptl/sem_timedwait.c (sem_timedwait): Likewise. >> * nptl/sem_wait.c (__new_sem_wait): Likewise. >> (__new_sem_trywait): Likewise. > I'm not all that happy with this patch. > > For example a statically compiled application sharing a semaphore > with a dynamically compiled application would appear to break > under your changes and nothing you can do would fix it. They each > have different layouts for the semaphore. This is a common problem > when switching internal layouts (like we did from Linuxthreads to > NPTL). I accept that you may want to make this change and that this > particular case might be unsupportable, but I want more discussion > on the topic. > > Similarly H.J's comment to change the alignment of the type > is equally flawed as embedded sem_t's in other types would break > other ABIs down the line. I see he's commented on that in a later > down-thread post. > > As far as I can tell the only immediately kosher solution is for > these machines, that can't use 64-bit atomics because the > alignment of their sem_t is not correct, is to use 32-bit atomics > (for now). The choice of 32-bit atomics in no way prevents you > from future 64-bit atomic uses. You can switch AFAICT to another > internal implementation at a later date. > > Could you switch to 32-bit atomics for 2.21 and continue to > investigate this optimziation for 2.22? I am certainly willing to use 32-bit atomics here for now. The absolutely minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0. Obviously the AArch64 ILP32 mode (out of tree) will want to do the same. And I don't know whether m68k or mips require a similar change or if they can tolerate unaligned 64-bit atomic operations. So arguably if we can decide that my proposed re-aligning change is best, then it also automatically handles all the other platforms as well.
On Tue, Jan 27, 2015 at 1:21 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote: > On 1/26/2015 11:31 PM, Carlos O'Donell wrote: >> >> On 01/26/2015 06:35 PM, Chris Metcalf wrote: >>> >>> This version reflects Torvald's recent comments. >>> >>> 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> >>> >>> * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides >>> new >>> behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. >>> * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. >>> * nptl/sem_init.c (__new_sem_init): Likewise. >>> * nptl/sem_open.c (sem_open): Likewise. >>> * nptl/sem_post.c (__new_sem_post): Likewise. >>> * nptl/sem_timedwait.c (sem_timedwait): Likewise. >>> * nptl/sem_wait.c (__new_sem_wait): Likewise. >>> (__new_sem_trywait): Likewise. >> >> I'm not all that happy with this patch. >> >> For example a statically compiled application sharing a semaphore >> with a dynamically compiled application would appear to break >> under your changes and nothing you can do would fix it. They each >> have different layouts for the semaphore. This is a common problem >> when switching internal layouts (like we did from Linuxthreads to >> NPTL). I accept that you may want to make this change and that this >> particular case might be unsupportable, but I want more discussion >> on the topic. >> >> Similarly H.J's comment to change the alignment of the type >> is equally flawed as embedded sem_t's in other types would break >> other ABIs down the line. I see he's commented on that in a later >> down-thread post. >> >> As far as I can tell the only immediately kosher solution is for >> these machines, that can't use 64-bit atomics because the >> alignment of their sem_t is not correct, is to use 32-bit atomics >> (for now). The choice of 32-bit atomics in no way prevents you >> from future 64-bit atomic uses. You can switch AFAICT to another >> internal implementation at a later date. >> >> Could you switch to 32-bit atomics for 2.21 and continue to >> investigate this optimziation for 2.22? > > > I am certainly willing to use 32-bit atomics here for now. The absolutely > minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0. > Obviously the AArch64 ILP32 mode (out of tree) will want to do the same. > And I don't know whether m68k or mips require a similar change or if > they can tolerate unaligned 64-bit atomic operations. So arguably if > we can decide that my proposed re-aligning change is best, then it also > automatically handles all the other platforms as well. > X32 can use a hybrid implementation. Use 32-bit atomics unless 64-bit atomics on misaligned memory can simplify the code. It is hard to say if it is faster. Since we are in code freeze, I prefer to use 32-bit atomics in 2.21 for now. We can always backport if needed.
On 01/27/2015 04:21 PM, Chris Metcalf wrote: > On 1/26/2015 11:31 PM, Carlos O'Donell wrote: >> On 01/26/2015 06:35 PM, Chris Metcalf wrote: >>> This version reflects Torvald's recent comments. >>> >>> 2015-01-25 Chris Metcalf <cmetcalf@ezchip.com> >>> >>> * sysdeps/nptl/internaltypes.h (to_new_sem): Define. Provides new >>> behavior for [__HAVE_64B_ATOMICS && !defined (_LP64)]. >>> * nptl/sem_getvalue.c (__new_sem_getvalue): Use to_new_sem. >>> * nptl/sem_init.c (__new_sem_init): Likewise. >>> * nptl/sem_open.c (sem_open): Likewise. >>> * nptl/sem_post.c (__new_sem_post): Likewise. >>> * nptl/sem_timedwait.c (sem_timedwait): Likewise. >>> * nptl/sem_wait.c (__new_sem_wait): Likewise. >>> (__new_sem_trywait): Likewise. >> I'm not all that happy with this patch. >> >> For example a statically compiled application sharing a semaphore >> with a dynamically compiled application would appear to break >> under your changes and nothing you can do would fix it. They each >> have different layouts for the semaphore. This is a common problem >> when switching internal layouts (like we did from Linuxthreads to >> NPTL). I accept that you may want to make this change and that this >> particular case might be unsupportable, but I want more discussion >> on the topic. >> >> Similarly H.J's comment to change the alignment of the type >> is equally flawed as embedded sem_t's in other types would break >> other ABIs down the line. I see he's commented on that in a later >> down-thread post. >> >> As far as I can tell the only immediately kosher solution is for >> these machines, that can't use 64-bit atomics because the >> alignment of their sem_t is not correct, is to use 32-bit atomics >> (for now). The choice of 32-bit atomics in no way prevents you >> from future 64-bit atomic uses. You can switch AFAICT to another >> internal implementation at a later date. >> >> Could you switch to 32-bit atomics for 2.21 and continue to >> investigate this optimziation for 2.22? > > I am certainly willing to use 32-bit atomics here for now. The absolutely > minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0. > Obviously the AArch64 ILP32 mode (out of tree) will want to do the same. > And I don't know whether m68k or mips require a similar change or if > they can tolerate unaligned 64-bit atomic operations. So arguably if > we can decide that my proposed re-aligning change is best, then it also > automatically handles all the other platforms as well. I would like several more rounds of review of your realigning code. You have my permission to checkin the minimal change for tilegx32. Similarly other ILP32 with 64-bit atomic arches should do the same, and we can continue this conversation without the pressure of the time boxed release. Cheers, Carlos.
On Tue, 27 Jan 2015, Carlos O'Donell wrote: > > I am certainly willing to use 32-bit atomics here for now. The absolutely > > minimal change is to change tilegx32 to set __HAVE_64B_ATOMICS to 0. > > Obviously the AArch64 ILP32 mode (out of tree) will want to do the same. > > And I don't know whether m68k or mips require a similar change or if > > they can tolerate unaligned 64-bit atomic operations. So arguably if > > we can decide that my proposed re-aligning change is best, then it also > > automatically handles all the other platforms as well. > > I would like several more rounds of review of your realigning code. > > You have my permission to checkin the minimal change for tilegx32. > > Similarly other ILP32 with 64-bit atomic arches should do the same, > and we can continue this conversation without the pressure of the > time boxed release. Is __HAVE_64B_ATOMICS used *only* for semaphores? Because if there are other effects of changing it, doing so seems risky - it would be better to have a define such as SEM_USE_64B_ATOMICS, and make that default to 0 for ILP32 architectures.
> Is __HAVE_64B_ATOMICS used *only* for semaphores? Because if there are > other effects of changing it, doing so seems risky - it would be better to > have a define such as SEM_USE_64B_ATOMICS, and make that default to 0 for > ILP32 architectures. A trivial grep shows there are no other uses.
On Tue, 2015-01-27 at 16:18 -0500, Chris Metcalf wrote: > On 1/27/2015 6:40 AM, H.J. Lu wrote: > > On Mon, Jan 26, 2015 at 3:35 PM, Chris Metcalf <cmetcalf@ezchip.com> wrote: > >> --- a/nptl/sem_open.c > >> +++ b/nptl/sem_open.c > >> @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...) > >> return SEM_FAILED; > >> } > >> > >> - /* Create the initial file content. */ > >> - union > >> - { > >> - sem_t initsem; > >> - struct new_sem newsem; > >> - } sem; > >> + /* Create the initial file content. We force the alignment of > >> + the sem_t to match the alignment of struct new_sem since we > >> + will copy this stack structure to a file and then mmap it, > >> + so we must ensure it is aligned to zero here so that the > >> + behavior of to_new_sem () is the same as when we later mmap > >> + it into memory and have it be page-aligned. */ > >> + sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));; > >> + struct new_sem *newsem = to_new_sem (&sem); > >> + > >> + /* Initialize the unused parts of sem_t to zero. > >> + The compiler will notice most of this memset is dead based on > >> + the assignments through the struct new_sem pointer. */ > >> + memset (&sem, '\0', sizeof (sem_t)); > > Why is this change needed? Since union sem has the largest alignment of > > sem_t and struct new_sem, sem is properly aligned here. > > It is not needed. Torvald's claim was that it was more stylistically > consistent to use to_new_sem() in all cases, even here. I am personally > on the fence about this; I think the new code and old code are both > plausible ways to represent the issue with the required alignment. > > If Torvald prefers the new code and you prefer the old code, perhaps > we will need to ask for a tie-breaker vote. :-) > I thing using to_new_sem in all cases is cleaner, but my comment on the union wasn't quite right. I agree the union should give the proper alignment, because it includes struct new_sem directly, so will pick up the uint64_t alignment of it.
On 1/28/2015 4:04 AM, Torvald Riegel wrote: > I thing using to_new_sem in all cases is cleaner, but my comment on the > union wasn't quite right. I agree the union should give the proper > alignment, because it includes struct new_sem directly, so will pick up > the uint64_t alignment of it. I have reverted the code to just using the union, which on balance I think I slightly prefer. I think mentioning to_new_sem() in the comment as part of the explanation is a sufficient way to deal with this. + /* Create the initial file content. The union forces the + alignment of initsem to be at least as much as newsem. + When to_new_sem() provides for varying internal alignment, + this expression makes the alignment zero, and matches the + eventual alignment when the union is copied to the start of + an mmap'ed file page. */
diff --git a/nptl/sem_getvalue.c b/nptl/sem_getvalue.c index 1432cc7..08a0789 100644 --- a/nptl/sem_getvalue.c +++ b/nptl/sem_getvalue.c @@ -25,7 +25,7 @@ int __new_sem_getvalue (sem_t *sem, int *sval) { - struct new_sem *isem = (struct new_sem *) sem; + struct new_sem *isem = to_new_sem (sem); /* XXX Check for valid SEM parameter. */ /* FIXME This uses relaxed MO, even though POSIX specifies that this function diff --git a/nptl/sem_init.c b/nptl/sem_init.c index 575b661..aaa6af8 100644 --- a/nptl/sem_init.c +++ b/nptl/sem_init.c @@ -50,7 +50,7 @@ __new_sem_init (sem_t *sem, int pshared, unsigned int value) } /* Map to the internal type. */ - struct new_sem *isem = (struct new_sem *) sem; + struct new_sem *isem = to_new_sem (sem); /* Use the values the caller provided. */ #if __HAVE_64B_ATOMICS diff --git a/nptl/sem_open.c b/nptl/sem_open.c index bfd2dea..202769f 100644 --- a/nptl/sem_open.c +++ b/nptl/sem_open.c @@ -186,25 +186,28 @@ sem_open (const char *name, int oflag, ...) return SEM_FAILED; } - /* Create the initial file content. */ - union - { - sem_t initsem; - struct new_sem newsem; - } sem; + /* Create the initial file content. We force the alignment of + the sem_t to match the alignment of struct new_sem since we + will copy this stack structure to a file and then mmap it, + so we must ensure it is aligned to zero here so that the + behavior of to_new_sem () is the same as when we later mmap + it into memory and have it be page-aligned. */ + sem_t sem __attribute__ ((aligned (__alignof__ (struct new_sem))));; + struct new_sem *newsem = to_new_sem (&sem); + + /* Initialize the unused parts of sem_t to zero. + The compiler will notice most of this memset is dead based on + the assignments through the struct new_sem pointer. */ + memset (&sem, '\0', sizeof (sem_t)); #if __HAVE_64B_ATOMICS - sem.newsem.data = value; + newsem->data = value; #else - sem.newsem.value = value << SEM_VALUE_SHIFT; - sem.newsem.nwaiters = 0; + newsem->value = value << SEM_VALUE_SHIFT; + newsem->nwaiters = 0; #endif /* This always is a shared semaphore. */ - sem.newsem.private = LLL_SHARED; - - /* Initialize the remaining bytes as well. */ - memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0', - sizeof (sem_t) - sizeof (struct new_sem)); + newsem->private = LLL_SHARED; tmpfname = __alloca (shm_dirlen + sizeof SEM_SHM_PREFIX + 6); char *xxxxxx = __mempcpy (tmpfname, shm_dir, shm_dirlen); @@ -242,7 +245,7 @@ sem_open (const char *name, int oflag, ...) break; } - if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof (sem_t))) + if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem, sizeof (sem_t))) == sizeof (sem_t) /* Map the sem_t structure from the file. */ && (result = (sem_t *) mmap (NULL, sizeof (sem_t), diff --git a/nptl/sem_post.c b/nptl/sem_post.c index 6e495ed..71818ea 100644 --- a/nptl/sem_post.c +++ b/nptl/sem_post.c @@ -56,7 +56,7 @@ futex_wake (unsigned int* futex, int processes_to_wake, int private) int __new_sem_post (sem_t *sem) { - struct new_sem *isem = (struct new_sem *) sem; + struct new_sem *isem = to_new_sem (sem); int private = isem->private; #if __HAVE_64B_ATOMICS diff --git a/nptl/sem_timedwait.c b/nptl/sem_timedwait.c index 042b0ac..5e650e0 100644 --- a/nptl/sem_timedwait.c +++ b/nptl/sem_timedwait.c @@ -30,8 +30,8 @@ sem_timedwait (sem_t *sem, const struct timespec *abstime) return -1; } - if (__new_sem_wait_fast ((struct new_sem *) sem, 0) == 0) + if (__new_sem_wait_fast (to_new_sem (sem), 0) == 0) return 0; else - return __new_sem_wait_slow((struct new_sem *) sem, abstime); + return __new_sem_wait_slow(to_new_sem (sem), abstime); } diff --git a/nptl/sem_wait.c b/nptl/sem_wait.c index c1fd10c..73759f1 100644 --- a/nptl/sem_wait.c +++ b/nptl/sem_wait.c @@ -22,10 +22,10 @@ int __new_sem_wait (sem_t *sem) { - if (__new_sem_wait_fast ((struct new_sem *) sem, 0) == 0) + if (__new_sem_wait_fast (to_new_sem (sem), 0) == 0) return 0; else - return __new_sem_wait_slow((struct new_sem *) sem, NULL); + return __new_sem_wait_slow(to_new_sem (sem), NULL); } versioned_symbol (libpthread, __new_sem_wait, sem_wait, GLIBC_2_1); @@ -65,7 +65,7 @@ __new_sem_trywait (sem_t *sem) { /* We must not fail spuriously, so require a definitive result even if this may lead to a long execution time. */ - if (__new_sem_wait_fast ((struct new_sem *) sem, 1) == 0) + if (__new_sem_wait_fast (to_new_sem (sem), 1) == 0) return 0; __set_errno (EAGAIN); return -1; diff --git a/sysdeps/nptl/internaltypes.h b/sysdeps/nptl/internaltypes.h index 8f5cfa4..8dd4018 100644 --- a/sysdeps/nptl/internaltypes.h +++ b/sysdeps/nptl/internaltypes.h @@ -168,6 +168,22 @@ struct new_sem #endif }; +/* The sem_t alignment is typically just 4 bytes for ILP32, whereas + new_sem is 8 bytes with __HAVE_64B_ATOMICS. For better atomic + performance on platforms that tolerate unaligned atomics (and to + make it work at all on platforms that don't), round up the new_sem + pointer within the sem_t object to an 8-byte boundary. + + Note that in this case, the useful part of the new_sem structure + (up to the "pad" variable) is only 12 bytes, and sem_t is always + at least 16 bytes, so moving the start of new_sem to an offset + of 4 bytes within sem_t is okay. We must not read or write the + "pad" field within new_sem to maintain correctness. */ +#define to_new_sem(s) \ + ((__alignof__ (sem_t) == 4 && __alignof__ (struct new_sem) == 8) ? \ + (struct new_sem *) (((uintptr_t) (s) + 4) & -8) : \ + (struct new_sem *) (s)) + struct old_sem { unsigned int value;