[roland/arm] ARM: Define PI_STATIC_AND_HIDDEN.

Message ID 20150414223916.9B4782C3BDC@topped-with-meat.com
State Committed
Headers

Commit Message

Roland McGrath April 14, 2015, 10:39 p.m. UTC
  I suspect this is now safe.  (It caused no apparent mischief in my
arm-linux-gnueabihf build using GCC 4.8.2.)  But I don't actually know
what GCC version is implied by "added by the GCC TLS patches".  I'm
guessing that 4.6 (our minimum) is new enough that this is always true
now.  If that's true, it would be nice to know the appropriate GCC
version number for sure and mention that in the comment.  If it is
really still the case that some supported compiler versions or
configurations do not reliably do pure PI access to static/hidden,
then I'd like to have enough information about that to write a proper
configure test.


Thanks,
Roland


2015-04-14  Roland McGrath  <roland@hack.frob.com>

	* sysdeps/arm/configure.ac (PI_STATIC_AND_HIDDEN): Define it.
	* sysdeps/arm/configure: Regenerated.
  

Comments

Chris Metcalf April 15, 2015, 6:46 p.m. UTC | #1
On 04/14/2015 06:39 PM, Roland McGrath wrote:
> I suspect this is now safe.  (It caused no apparent mischief in my
> arm-linux-gnueabihf build using GCC 4.8.2.)  But I don't actually know
> what GCC version is implied by "added by the GCC TLS patches".  I'm
> guessing that 4.6 (our minimum) is new enough that this is always true
> now.  If that's true, it would be nice to know the appropriate GCC
> version number for sure and mention that in the comment.  If it is
> really still the case that some supported compiler versions or
> configurations do not reliably do pure PI access to static/hidden,
> then I'd like to have enough information about that to write a proper
> configure test.

Did you do any benchmarking?  I enabled this on tilegx out of curiosity,
and I found it was both larger (ld.so size increased 0.3%) and slower
(average of 0.1% slower when running fork/exec of a no-op program
in a loop with 10,000 iterations).  The timing results do vary enough
that I'd want to do more extensive testing to be able to say anything
definitive, but it's not very encouraging.

I admit I'm not sure why this might be, but it seems like the right
things were happening, e.g. _dl_start_final is missing and _dl_start
is bigger in the version with PI_STATIC_AND_HIDDEN defined.

All that said, if it's the more standard way, or is desirable for some
other reason, I'm happy to enable it for tilegx, but...
  
Roland McGrath April 15, 2015, 9:11 p.m. UTC | #2
> Did you do any benchmarking?  I enabled this on tilegx out of curiosity,
> and I found it was both larger (ld.so size increased 0.3%) and slower
> (average of 0.1% slower when running fork/exec of a no-op program
> in a loop with 10,000 iterations).  The timing results do vary enough
> that I'd want to do more extensive testing to be able to say anything
> definitive, but it's not very encouraging.

I did not.  Just now I've looked at the sizes on arm-linux-gnueabihf
(-mthumb), and indeed this increases ld.so's text by 256 bytes (to
93989 from 94245).  For -marm the difference is only 192 bytes (to
126581 from 126389).

> I admit I'm not sure why this might be, but it seems like the right
> things were happening, e.g. _dl_start_final is missing and _dl_start
> is bigger in the version with PI_STATIC_AND_HIDDEN defined.

My guess is that there is some extra arithmetic to compute runtime
addresses from PC-relative ones.  Conversely, without
PI_STATIC_AND_HIDDEN defined the equivalent accesses are SP-relative
(and thus simpler) while the extra code to copy fields from the stack
struct to the hidden global struct is less than that increase.

But I didn't actually compare the code even for ARM, let alone Tile.

> All that said, if it's the more standard way, or is desirable for some
> other reason, I'm happy to enable it for tilegx, but...

It's certainly the more standard way in the sense that most machines,
including the most-used machines (x86), do it that way.  The recent
MIPS bug is an indication that the stack bootstrap_map code path is
less tested and perhaps less generally reliable.  I'd say that the
PI_STATIC_AND_HIDDEN case is just more straightforward and clean, so
if we can eventually get every machine into that state, that would be
ideal.

For x86_64, PC-relative accesses are entirely free.  (Well, you forego
indexed addressing modes you might sometimes use with absolute or
SP-relative accesses, but there is no PC-relative overhead per se.)
Not surprisingly, turning PI_STATIC_AND_HIDDEN off there adds 256
bytes to ld.so.  (Actually it is slightly surprising that it's exactly
+256 while ARM/Thumb is exactly -256. :-) It's just entirely expected
that it's a loser on x86_64.)

For i386, PC-relative accesses are relatively costly.  So it's a mild
surprise that turning PI_STATIC_AND_HIDDEN off there adds 188 bytes
(i.e. fewer PC-relative accesses needs more code, perhaps mostly the
extra copy-from-stack code).  My guess is that the compiler backend
for i386 has had more effort put into optimizing these cases.  For
example, perhaps it computes the common base address only once in the
function and reuses it more effectively while the ARM and TIle
backends are repeating the computation more often.  Given that i386
has more register pressure than any other machine, the fact that they
can win here suggests that you could too with sufficient work on the
compiler side.  (But this is all just guesses.)

(All these examples were with GCC 4.8.2 as modified by Ubuntu.
For arm-nacl with GCC 4.9.2 as modified by me, it adds 512 bytes.)


Thanks,
Roland
  
Joseph Myers April 24, 2015, 5:22 p.m. UTC | #3
On Tue, 14 Apr 2015, Roland McGrath wrote:

> I suspect this is now safe.  (It caused no apparent mischief in my
> arm-linux-gnueabihf build using GCC 4.8.2.)  But I don't actually know
> what GCC version is implied by "added by the GCC TLS patches".  I'm

I believe it means 4.1.

> 2015-04-14  Roland McGrath  <roland@hack.frob.com>
> 
> 	* sysdeps/arm/configure.ac (PI_STATIC_AND_HIDDEN): Define it.
> 	* sysdeps/arm/configure: Regenerated.

OK.
  
Roland McGrath April 24, 2015, 5:52 p.m. UTC | #4
Committed with improved comment.

Thanks,
Roland
  

Patch

--- a/sysdeps/arm/configure
+++ b/sysdeps/arm/configure
@@ -1,7 +1,8 @@ 
 # This file is generated from configure.ac by Autoconf.  DO NOT EDIT!
  # Local configure fragment for sysdeps/arm.
 
-#AC_DEFINE(PI_STATIC_AND_HIDDEN)
+$as_echo "#define PI_STATIC_AND_HIDDEN 1" >>confdefs.h
+
 
 # We check to see if the compiler and flags are
 # selecting the hard-float ABI and if they are then
--- a/sysdeps/arm/configure.ac
+++ b/sysdeps/arm/configure.ac
@@ -3,9 +3,7 @@  GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
 
 dnl It is always possible to access static and hidden symbols in an
 dnl position independent way.
-dnl NOTE: This feature was added by the GCC TLS patches.  We should test for
-dnl it.  Until we do, don't define it.
-#AC_DEFINE(PI_STATIC_AND_HIDDEN)
+AC_DEFINE(PI_STATIC_AND_HIDDEN)
 
 # We check to see if the compiler and flags are
 # selecting the hard-float ABI and if they are then