[6/6] Compile sched_getaffinity.c with -fno-builtin-memset

Message ID 20150814121907.GF28610@gmail.com
State Dropped
Headers

Commit Message

H.J. Lu Aug. 14, 2015, 12:19 p.m. UTC
  Since sched_getaffinity.c calls memset which may not be inlined, we
should compile it with -fno-builtin-memset so that the internal hidden
memset is called without PLT.

OK for master?

	* sysdeps/unix/sysv/linux/Makefile (CFLAGS-sched_getaffinity.c):
	New.  Set to -fno-builtin-memset.
---
 sysdeps/unix/sysv/linux/Makefile | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Joseph Myers Aug. 14, 2015, 3:27 p.m. UTC | #1
On Fri, 14 Aug 2015, H.J. Lu wrote:

> Since sched_getaffinity.c calls memset which may not be inlined, we
> should compile it with -fno-builtin-memset so that the internal hidden
> memset is called without PLT.
> 
> OK for master?

memset is called all over the place.  I don't like hardcoding special 
options for particular files based on calls to memset, and especially not 
based on what some particular compiler version happens to do with a 
particular call.  Why doesn't the call to libc_hidden_builtin_proto 
(memset) in include/string.h suffice?  Can you devise some set of 
declarations / toplevel asms to put in a header that will ensure memset 
calls go via a hidden alias (ideally with the compiler knowing it's 
hidden, not just the assembler / linker, so the compiler can do any 
optimizations based on it being an intra-library call)?
  
H.J. Lu Aug. 14, 2015, 3:33 p.m. UTC | #2
On Fri, Aug 14, 2015 at 8:27 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2015, H.J. Lu wrote:
>
>> Since sched_getaffinity.c calls memset which may not be inlined, we
>> should compile it with -fno-builtin-memset so that the internal hidden
>> memset is called without PLT.
>>
>> OK for master?
>
> memset is called all over the place.  I don't like hardcoding special
> options for particular files based on calls to memset, and especially not
> based on what some particular compiler version happens to do with a
> particular call.  Why doesn't the call to libc_hidden_builtin_proto
> (memset) in include/string.h suffice?  Can you devise some set of
> declarations / toplevel asms to put in a header that will ensure memset
> calls go via a hidden alias (ideally with the compiler knowing it's
> hidden, not just the assembler / linker, so the compiler can do any
> optimizations based on it being an intra-library call)?

libc_hidden_builtin_proto (memset)

has no impact on memset inlined by compiler.
  
Joseph Myers Aug. 14, 2015, 3:46 p.m. UTC | #3
On Fri, 14 Aug 2015, H.J. Lu wrote:

> On Fri, Aug 14, 2015 at 8:27 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> > On Fri, 14 Aug 2015, H.J. Lu wrote:
> >
> >> Since sched_getaffinity.c calls memset which may not be inlined, we
> >> should compile it with -fno-builtin-memset so that the internal hidden
> >> memset is called without PLT.
> >>
> >> OK for master?
> >
> > memset is called all over the place.  I don't like hardcoding special
> > options for particular files based on calls to memset, and especially not
> > based on what some particular compiler version happens to do with a
> > particular call.  Why doesn't the call to libc_hidden_builtin_proto
> > (memset) in include/string.h suffice?  Can you devise some set of
> > declarations / toplevel asms to put in a header that will ensure memset
> > calls go via a hidden alias (ideally with the compiler knowing it's
> > hidden, not just the assembler / linker, so the compiler can do any
> > optimizations based on it being an intra-library call)?
> 
> libc_hidden_builtin_proto (memset)
> 
> has no impact on memset inlined by compiler.

Well, I see references to __GI_memset, not memset, in sched_getaffinity.os 
for both x86 and x86_64 (building with GCC 5).  So it seems to be working 
OK for me.  It's true some of those references use PLT-generating 
relocations, but those relocations shouldn't actually generate PLT entries 
when they are relocations against symbols not exported from the library - 
the linker should convert them to direct calls.  If they do generate PLT 
entries, it sounds like a linker optimization is missing.

Apart from any missing linker optimizations, the compiler should know when 
calls are to hidden functions as calls to those may be more efficient than 
calls that only the linker determines don't need a PLT entry.  On 32-bit 
x86, that means the compiler knowing it doesn't need to set %ebx up to 
point to the GOT as it would for a call through the PLT.  I don't know 
whether this gets optimized for built-in function calls using 
libc_hidden_builtin_proto, but if it doesn't, that seems like a missed 
optimization that should be addressed in the compiler not in glibc.
  
H.J. Lu Aug. 14, 2015, 3:55 p.m. UTC | #4
On Fri, Aug 14, 2015 at 8:46 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2015, H.J. Lu wrote:
>
>> On Fri, Aug 14, 2015 at 8:27 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>> > On Fri, 14 Aug 2015, H.J. Lu wrote:
>> >
>> >> Since sched_getaffinity.c calls memset which may not be inlined, we
>> >> should compile it with -fno-builtin-memset so that the internal hidden
>> >> memset is called without PLT.
>> >>
>> >> OK for master?
>> >
>> > memset is called all over the place.  I don't like hardcoding special
>> > options for particular files based on calls to memset, and especially not
>> > based on what some particular compiler version happens to do with a
>> > particular call.  Why doesn't the call to libc_hidden_builtin_proto
>> > (memset) in include/string.h suffice?  Can you devise some set of
>> > declarations / toplevel asms to put in a header that will ensure memset
>> > calls go via a hidden alias (ideally with the compiler knowing it's
>> > hidden, not just the assembler / linker, so the compiler can do any
>> > optimizations based on it being an intra-library call)?
>>
>> libc_hidden_builtin_proto (memset)
>>
>> has no impact on memset inlined by compiler.
>
> Well, I see references to __GI_memset, not memset, in sched_getaffinity.os
> for both x86 and x86_64 (building with GCC 5).  So it seems to be working
> OK for me.  It's true some of those references use PLT-generating

Yes, it does reference __GI_memset, but via PLT, since GCC doesn't
know the libcall for the builtin memset is hidden.

> relocations, but those relocations shouldn't actually generate PLT entries
> when they are relocations against symbols not exported from the library -
> the linker should convert them to direct calls.  If they do generate PLT
> entries, it sounds like a linker optimization is missing.
>
> Apart from any missing linker optimizations, the compiler should know when

Linker isn't an issue.

> calls are to hidden functions as calls to those may be more efficient than
> calls that only the linker determines don't need a PLT entry.  On 32-bit
> x86, that means the compiler knowing it doesn't need to set %ebx up to

%ebx is the issue.

> point to the GOT as it would for a call through the PLT.  I don't know
> whether this gets optimized for built-in function calls using
> libc_hidden_builtin_proto, but if it doesn't, that seems like a missed
> optimization that should be addressed in the compiler not in glibc.
>

It is hard to say it is a GCC issue when we are kind of doing this behind
the back of GCC unless there is a way to tell GCC that the libcall of
a builtin function is hidden.
  
Joseph Myers Aug. 14, 2015, 4:02 p.m. UTC | #5
On Fri, 14 Aug 2015, H.J. Lu wrote:

> > point to the GOT as it would for a call through the PLT.  I don't know
> > whether this gets optimized for built-in function calls using
> > libc_hidden_builtin_proto, but if it doesn't, that seems like a missed
> > optimization that should be addressed in the compiler not in glibc.
> >
> 
> It is hard to say it is a GCC issue when we are kind of doing this behind
> the back of GCC unless there is a way to tell GCC that the libcall of
> a builtin function is hidden.

We're declaring (essentially, though a load of macros):

extern __typeof (memset) memset __asm__ ("__GI_memset") __attribute__ ((visibility ("hidden")));

GCC is using the __asm__ information to compile calls to __builtin_memset 
to use the assembler name __GI_memset - that is, in that regard our 
declaration is being taken as applying to the libcall.  If it's not also 
using the visibility information to know that those are calls within the 
library and don't need to set up %ebx, that's a clear inconsistency; all 
pieces of information from an explicit declaration of a built-in function 
should be treated the same unless there is a clear reason not to do so.
  
H.J. Lu Aug. 14, 2015, 4:06 p.m. UTC | #6
On Fri, Aug 14, 2015 at 9:02 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2015, H.J. Lu wrote:
>
>> > point to the GOT as it would for a call through the PLT.  I don't know
>> > whether this gets optimized for built-in function calls using
>> > libc_hidden_builtin_proto, but if it doesn't, that seems like a missed
>> > optimization that should be addressed in the compiler not in glibc.
>> >
>>
>> It is hard to say it is a GCC issue when we are kind of doing this behind
>> the back of GCC unless there is a way to tell GCC that the libcall of
>> a builtin function is hidden.
>
> We're declaring (essentially, though a load of macros):
>
> extern __typeof (memset) memset __asm__ ("__GI_memset") __attribute__ ((visibility ("hidden")));
>
> GCC is using the __asm__ information to compile calls to __builtin_memset
> to use the assembler name __GI_memset - that is, in that regard our
> declaration is being taken as applying to the libcall.  If it's not also
> using the visibility information to know that those are calls within the
> library and don't need to set up %ebx, that's a clear inconsistency; all
> pieces of information from an explicit declaration of a built-in function
> should be treated the same unless there is a clear reason not to do so.
>

Should I open a GCC bug?
  
Joseph Myers Aug. 14, 2015, 4:11 p.m. UTC | #7
On Fri, 14 Aug 2015, H.J. Lu wrote:

> > We're declaring (essentially, though a load of macros):
> >
> > extern __typeof (memset) memset __asm__ ("__GI_memset") __attribute__ ((visibility ("hidden")));
> >
> > GCC is using the __asm__ information to compile calls to __builtin_memset
> > to use the assembler name __GI_memset - that is, in that regard our
> > declaration is being taken as applying to the libcall.  If it's not also
> > using the visibility information to know that those are calls within the
> > library and don't need to set up %ebx, that's a clear inconsistency; all
> > pieces of information from an explicit declaration of a built-in function
> > should be treated the same unless there is a clear reason not to do so.
> >
> 
> Should I open a GCC bug?

If you wish (and have verified that %ebx is being unnecessarily set up).  
It's a missed optimization rather than an actual bug.
  
H.J. Lu Aug. 14, 2015, 6 p.m. UTC | #8
On Fri, Aug 14, 2015 at 9:11 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2015, H.J. Lu wrote:
>
>> > We're declaring (essentially, though a load of macros):
>> >
>> > extern __typeof (memset) memset __asm__ ("__GI_memset") __attribute__ ((visibility ("hidden")));
>> >
>> > GCC is using the __asm__ information to compile calls to __builtin_memset
>> > to use the assembler name __GI_memset - that is, in that regard our
>> > declaration is being taken as applying to the libcall.  If it's not also
>> > using the visibility information to know that those are calls within the
>> > library and don't need to set up %ebx, that's a clear inconsistency; all
>> > pieces of information from an explicit declaration of a built-in function
>> > should be treated the same unless there is a clear reason not to do so.
>> >
>>
>> Should I open a GCC bug?
>
> If you wish (and have verified that %ebx is being unnecessarily set up).
> It's a missed optimization rather than an actual bug.

I withdrew my patch and opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67220
  

Patch

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index bfbabd4..368f57b 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -144,6 +144,8 @@  tests += tst-getcpu
 CFLAGS-fork.c = $(libio-mtsafe)
 CFLAGS-getpid.o = -fomit-frame-pointer
 CFLAGS-getpid.os = -fomit-frame-pointer
+# Use the internal hidden memset.
+CFLAGS-sched_getaffinity.c = -fno-builtin-memset
 endif
 
 ifeq ($(subdir),inet)