What is R_X86_64_GOTPLT64 used for?

Message ID CAMe9rOrnQRo3XXowAEcd_h=i_i5v04=i=kLWjm2ANduv8MwhYQ@mail.gmail.com
State Not applicable

Commit Message

H.J. Lu Nov. 13, 2014, 5:58 p.m. UTC
  On Thu, Nov 13, 2014 at 9:03 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Nov 13, 2014 at 8:33 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Thu, 13 Nov 2014, H.J. Lu wrote:
>>
>>> x86-64 psABI has
>>>
>>> name@GOT: specifies the offset to the GOT entry for the symbol name
>>> from the base of the GOT.
>>>
>>> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
>>> from the base of the GOT, implying that there is a corresponding PLT entry.
>>>
>>> But GCC never generates name@GOTPLT and assembler fails to assemble
>>> it:
>>
>> I've added the implementation for the large model, but only dimly remember
>> how it got added to the ABI in the first place.  The additional effect of
>> using that reloc was supposed to be that the GOT slot was to be placed
>> into .got.plt, and this might hint at the reasoning for this reloc:
>>
>> If you take the address of a function and call it, you need both a GOT
>> slot and a PLT entry (where the existence of GOT slot is implied by the
>
> That is correct.
>
>> PLT of course).  Now, if you use the normal @GOT64 reloc for the
>> address-taking operation that would create a slot in .got.  For the call
>> instruction you'd use @PLT (or variants thereof, like PLTOFF), which
>> creates the PLT slot _and_ a slot in .got.plt.  So, now we've ended up
>> with two GOT slots for the same symbol, where one should be enough (the
>> address taking operation can just as well use the slot in .got.plt).  So
>> if the compiler would emit @GOTPLT64 instead of @GOT64 for all address
>> references to symbols where it knows that it's a function it could save
>> one GOT slot.
>
> @GOTPLT will create a PLT entry, but that doesn't mean the PLT entry will
> be used.  Only @PLTOFF will use the PLT entry.  The linker should be smart
> enough to use only one GOT slot, regardless of whether @GOTPLT or @GOT
> is used to take the function address, when the function is also called via
> the PLT.  However, if @GOTPLT is used without @PLT, a PLT entry will be
> created but never used.
>
> I'd like to propose
>
> 1. Update the psABI to remove R_X86_64_GOTPLT64.
> 2. Fix the assembler to accept @GOTPLT for backward compatibility.
> 3. Make sure that the linker uses one GOT slot for @GOT and @PLTOFF.
>
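Matz's argument above can be restated as a toy slot-counting model. This is a hypothetical sketch of the *intended* @GOTPLT semantics, not of what BFD ld actually does (H.J.'s experiment later in the thread shows ld still emitting separate GLOB_DAT and JUMP_SLOT relocations for foo). The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT BFD's logic: which relocation operators
   reference one function symbol somewhere in the link. */
struct sym_refs {
    bool got;     /* address taken via @GOT / R_X86_64_GOT64 */
    bool gotplt;  /* address taken via @GOTPLT / R_X86_64_GOTPLT64 */
    bool plt;     /* called via @PLT or @PLTOFF */
};

/* Under the intended @GOTPLT semantics, either a call or a @GOTPLT
   reference forces a PLT entry for the symbol. */
static bool needs_plt_entry(struct sym_refs r)
{
    return r.plt || r.gotplt;
}

/* Each PLT entry owns one .got.plt slot.  A plain @GOT reference gets a
   separate .got slot, while @GOTPLT reuses the .got.plt slot. */
static int got_slots(struct sym_refs r)
{
    int slots = needs_plt_entry(r) ? 1 : 0;   /* .got.plt slot */
    if (r.got)
        slots += 1;                           /* extra .got slot */
    return slots;
}
```

In this model, @GOT plus @PLTOFF (what GCC emits) costs two slots, @GOTPLT plus @PLTOFF costs one, and @GOTPLT alone still creates a PLT entry that nothing calls, which is H.J.'s objection.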

Linker does:

        case R_X86_64_GOT64:
        case R_X86_64_GOTPLT64:
          base_got = htab->elf.sgot;

          if (htab->elf.sgot == NULL)
            abort ();

          if (h != NULL)
            {
              bfd_boolean dyn;

              off = h->got.offset;
              if (h->needs_plt
                  && h->plt.offset != (bfd_vma)-1
                  && off == (bfd_vma)-1)
                {
                  /* We can't use h->got.offset here to save
                     state, or even just remember the offset, as
                     finish_dynamic_symbol would use that as offset into
                     .got.  */
                  bfd_vma plt_index = h->plt.offset / plt_entry_size - 1;
                  off = (plt_index + 3) * GOT_ENTRY_SIZE;
                  base_got = htab->elf.sgotplt;
                }
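The offset arithmetic in the excerpt above can be tried in isolation. This sketch assumes the usual lazy-binding layout: 16-byte PLT entries with PLT0 as the resolver stub, 8-byte GOT entries, and the first three .got.plt entries reserved (GOT[0] holds the address of _DYNAMIC, GOT[1] and GOT[2] are filled in by the dynamic linker), which is where the `+ 3` comes from. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed lazy-binding layout, not read from BFD headers. */
#define PLT_ENTRY_SIZE   16u   /* bytes per PLT entry, PLT0 included */
#define GOT_ENTRY_SIZE   8u    /* bytes per GOT entry on x86-64 */
#define GOTPLT_RESERVED  3u    /* .got.plt[0..2] are reserved */

/* Byte offset in .got.plt of the slot used by the PLT entry that starts
   plt_offset bytes into .plt; mirrors the arithmetic in the excerpt. */
static uint64_t gotplt_slot_offset(uint64_t plt_offset)
{
    assert(plt_offset >= PLT_ENTRY_SIZE);        /* PLT0 owns no slot */
    assert(plt_offset % PLT_ENTRY_SIZE == 0);
    uint64_t plt_index = plt_offset / PLT_ENTRY_SIZE - 1;
    return (plt_index + GOTPLT_RESERVED) * GOT_ENTRY_SIZE;
}
```

For example, plt.offset = 48 (PLT index 2) maps to offset 0x28, which agrees with the `_GLOBAL_OFFSET_TABLE_+0x28` annotation on `foo@plt` in the objdump output quoted in this thread.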

So if  a symbol is accessed by both @GOT and @PLTOFF, its
needs_plt will be true and its got.plt entry will be used for
both @GOT and @GOTPLT.  @GOTPLT has no advantage
over @GOT, but potentially wastes a PLT entry.

Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64)
as reserved.  I pushed updated x86-64 psABI changes to

https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master

I will update the linker to keep accepting relocation 30 and
treat it the same as R_X86_64_GOT64.
  

Comments

Michael Matz Nov. 17, 2014, 2:14 p.m. UTC | #1
Hi,

On Thu, 13 Nov 2014, H.J. Lu wrote:

> Linker does:
> 
> ... code that looks like it might create just one GOT slot ...
> 
> So if  a symbol is accessed by both @GOT and @PLTOFF, its
> needs_plt will be true and its got.plt entry will be used for
> both @GOT and @GOTPLT.  @GOTPLT has no advantage
> over @GOT, but potentially wastes a PLT entry.

The above is not correct.  Had you tried you'd see this:

% cat x.c
extern void foo (void);
int main (void)
{
  void (*f)(void) = foo;
  f();
  foo();
}
% gcc -fPIE -mcmodel=large -S x.c; cat x.s
...
        movabsq $foo@GOT, %rax
...
        movabsq $foo@PLTOFF, %rax
...

So, foo is accessed via @GOT offset and @PLTOFF.  Then,

% cat y.c
void foo (void) {}
% gcc -o liby.so -shared -fPIC y.c
% gcc -fPIE -mcmodel=large x.s liby.so
% readelf -r a.out
...
000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0
...
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
...

The first one (to 600ff8) is the normal GOT slot, the second one the GOT 
slot for the PLT entry.  Both are actually used:

00000000004005f0 <foo@plt>:
  4005f0:       ff 25 32 0a 20 00       jmpq   *0x200a32(%rip)        # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>

That uses the second GOT slot, and:

00000000004006ec <main>:
  4006ec:       55                      push   %rbp
  4006ed:       48 89 e5                mov    %rsp,%rbp
  4006f0:       53                      push   %rbx
  4006f1:       48 83 ec 18             sub    $0x18,%rsp
  4006f5:       48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx        # 4006f5 <main+0x9>
  4006fc:       49 bb 0b 09 20 00 00    movabs $0x20090b,%r11
  400703:       00 00 00 
  400706:       4c 01 db                add    %r11,%rbx
  400709:       48 b8 f8 ff ff ff ff    movabs $0xfffffffffffffff8,%rax
  400710:       ff ff ff 
  400713:       48 8b 04 03             mov    (%rbx,%rax,1),%rax

This uses the first slot at 0x600ff8.

So, no, currently GOT and GOTPLT (at least how it's supposed to be 
implemented) are not equivalent.
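The claim that both slots are used can be checked by redoing the address arithmetic from the objdump output. The constants below are copied from the dump; the instruction lengths (7 bytes for the `lea`, 6 for the `jmpq`) are read off the encodings shown, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a RIP-relative operand: RIP already points at the next
   instruction, so add the instruction length before the displacement. */
static uint64_t rip_target(uint64_t insn_addr, unsigned insn_len, int64_t disp)
{
    return insn_addr + insn_len + (uint64_t)disp;
}

/* Large-model PIC prologue from the dump:
     lea    disp(%rip),%rbx      (7 bytes)
     movabs $imm,%r11 ; add %r11,%rbx
   which leaves _GLOBAL_OFFSET_TABLE_ in %rbx. */
static uint64_t got_base(uint64_t lea_addr, int64_t lea_disp, uint64_t gotpc_imm)
{
    return rip_target(lea_addr, 7, lea_disp) + gotpc_imm;
}

/* movabs $foo@GOT,%rax ; mov (%rbx,%rax,1),%rax loads from GOT + offset;
   a negative offset wraps modulo 2^64. */
static uint64_t got64_slot(uint64_t got, uint64_t got64_imm)
{
    return got + got64_imm;
}
```

With `lea` at 0x4006f5 (displacement -7) and the immediate 0x20090b, the GOT base comes out as 0x601000, so the @GOT load reads 0x600ff8, while the `jmpq` at 0x4005f0 (displacement 0x200a32) reads 0x601028. Those are exactly the GLOB_DAT and JUMP_SLOT offsets in the readelf output: two distinct slots.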

> Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64) as reserved.  
> I pushed updated x86-64 psABI changes to
> 
> https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master
> 
> I will update the linker to keep accepting relocation 30 and treat it the 
> same as R_X86_64_GOT64.

That seems a bit premature given the above.


Ciao,
Michael.
  
H.J. Lu Nov. 17, 2014, 8:35 p.m. UTC | #2
On Mon, Nov 17, 2014 at 6:14 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 13 Nov 2014, H.J. Lu wrote:
>
>> Linker does:
>>
>> ... code that looks like it might create just one GOT slot ...
>>
>> So if  a symbol is accessed by both @GOT and @PLTOFF, its
>> needs_plt will be true and its got.plt entry will be used for
>> both @GOT and @GOTPLT.  @GOTPLT has no advantage
>> over @GOT, but potentially wastes a PLT entry.
>
> The above is not correct.  Had you tried you'd see this:
>
> % cat x.c
> extern void foo (void);
> int main (void)
> {
>   void (*f)(void) = foo;
>   f();
>   foo();
> }
> % gcc -fPIE -mcmodel=large -S x.c; cat x.s
> ...
>         movabsq $foo@GOT, %rax
> ...
>         movabsq $foo@PLTOFF, %rax
> ...
>
> So, foo is accessed via @GOT offset and @PLTOFF.  Then,
>
> % cat y.c
> void foo (void) {}
> % gcc -o liby.so -shared -fPIC y.c
> % gcc -fPIE -mcmodel=large x.s liby.so
> % readelf -r a.out
> ...
> 000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0
> ...
> 000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
> ...
>
> The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> slot for the PLT entry.  Both are actually used:
>
> 00000000004005f0 <foo@plt>:
>   4005f0:       ff 25 32 0a 20 00       jmpq   *0x200a32(%rip)        # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>
>
> That uses the second GOT slot, and:
>
> 00000000004006ec <main>:
>   4006ec:       55                      push   %rbp
>   4006ed:       48 89 e5                mov    %rsp,%rbp
>   4006f0:       53                      push   %rbx
>   4006f1:       48 83 ec 18             sub    $0x18,%rsp
>   4006f5:       48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx        # 4006f5 <main+0x9>
>   4006fc:       49 bb 0b 09 20 00 00    movabs $0x20090b,%r11
>   400703:       00 00 00
>   400706:       4c 01 db                add    %r11,%rbx
>   400709:       48 b8 f8 ff ff ff ff    movabs $0xfffffffffffffff8,%rax
>   400710:       ff ff ff
>   400713:       48 8b 04 03             mov    (%rbx,%rax,1),%rax
>
> This uses the first slot at 0x600ff8.
>
> So, no, currently GOT and GOTPLT (at least how it's supposed to be
> implemented) are not equivalent.

It has nothing to do with the large model.  The same thing
happens with the small model.  We may be able to optimize
it, independent of GOTPLT.

In any case, -mcmodel=large shouldn't change program behavior.
  
Michael Matz Nov. 18, 2014, 1:12 p.m. UTC | #3
Hi,

On Mon, 17 Nov 2014, H.J. Lu wrote:

> It has nothing to do with the large model.

Yes, I didn't say so.  I've used it only to force GCC to emit @GOT relocs 
(otherwise it would have used @GOTPCREL) to disprove your claim.

> The same thing happens with the small model.  We may be able to optimize it, 
> independent of GOTPLT.

Yes, if we were to optimize this, the difference between GOT and GOTPLT 
would be very minor.

> In any case, -mcmodel=large shouldn't change program behavior.

No, it shouldn't of course.


Ciao,
Michael.
  
H.J. Lu Nov. 18, 2014, 2:52 p.m. UTC | #4
On Tue, Nov 18, 2014 at 5:12 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 17 Nov 2014, H.J. Lu wrote:
>
>> It has nothing to do with the large model.
>
> Yes, I didn't say so.  I've used it only to force GCC to emit @GOT relocs
> (otherwise it would have used @GOTPCREL) to disprove your claim.

Well, it was just on paper.  The linker never implemented such a GOTPLT optimization:

[hjl@gnu-6 simple]$ cat main.S
        .file   "main.c"
        .text
        .globl  _start
        .type   _start, @function
_start:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        pushq   %rbx
        subq    $24, %rsp
.L2:
        .cfi_offset 3, -24
        leaq    .L2(%rip), %rbx
        movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %r11
        addq    %r11, %rbx
        movabsq $foo@GOTPLT, %rax
        movq    (%rbx,%rax), %rax
        movq    %rax, -24(%rbp)
        movq    -24(%rbp), %rax
        call    *%rax
        movabsq $foo@PLTOFF, %rax
        addq    %rbx, %rax
        call    *%rax
        addq    $24, %rsp
        popq    %rbx
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   _start, .-_start
        .ident  "GCC: (GNU) 4.8.3 20140911 (Red Hat 4.8.3-7)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-6 simple]$ cat foo.c
void
foo (void)
{
}
[hjl@gnu-6 simple]$ make
gcc -fpie -mcmodel=large -c -o main.o main.S
gcc -fpic   -c -o foo.o foo.c
./usr/local/bin/ld -shared -o libfoo.so foo.o
./usr/local/bin/ld -o foo main.o libfoo.so
readelf -r main.o

Relocation section '.rela.text' at offset 0x290 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000012  00090000001d R_X86_64_GOTPC64  0000000000000000 _GLOBAL_OFFSET_TABLE_ + 9
00000000001f  000a0000001e R_X86_64_GOTPLT64 0000000000000000 foo + 0
000000000037  000a0000001f R_X86_64_PLTOFF64 0000000000000000 foo + 0

Relocation section '.rela.eh_frame' at offset 0x2d8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
readelf -r foo

Relocation section '.rela.dyn' at offset 0x268 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006004b8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0

Relocation section '.rela.plt' at offset 0x280 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006004d8  000200000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
[hjl@gnu-6 simple]$
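The Info column in readelf's output packs the symbol index into the upper 32 bits and the relocation type into the lower 32 bits. A small decoder (the same bit layout as `ELF64_R_SYM`/`ELF64_R_TYPE` in `<elf.h>`, redefined here so the sketch does not depend on that header; the enum names are hypothetical, the numbers are the psABI ones) makes the transcript easier to read.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as ELF64_R_SYM / ELF64_R_TYPE. */
static uint32_t rel_sym(uint64_t info)  { return (uint32_t)(info >> 32); }
static uint32_t rel_type(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }

/* x86-64 psABI relocation numbers that appear in this thread. */
enum x86_64_reloc {
    RELOC_PC32      = 2,
    RELOC_GLOB_DAT  = 6,
    RELOC_JUMP_SLOT = 7,
    RELOC_GOT64     = 27,
    RELOC_GOTPC64   = 29,
    RELOC_GOTPLT64  = 30,   /* the number the patch marks as reserved */
    RELOC_PLTOFF64  = 31,
};
```

Decoding H.J.'s output this way: main.o references foo (symbol 10) with type 30, R_X86_64_GOTPLT64, yet the linked executable still carries both a GLOB_DAT (6) and a JUMP_SLOT (7) relocation against the same symbol (index 2), i.e. two GOT slots.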

>> The same thing happens with the small model.  We may be able to optimize it,
>> independent of GOTPLT.
>
> Yes, if we were to optimize this, the difference between GOT and GOTPLT
> would be very minor.
>

I will give it a thought.  But we don't need GOTPLT for it.  We should obsolete
GOTPLT.
  
H.J. Lu Nov. 19, 2014, 4:38 p.m. UTC | #5
On Mon, Nov 17, 2014 at 6:14 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 13 Nov 2014, H.J. Lu wrote:
>
>> Linker does:
>>
>> ... code that looks like it might create just one GOT slot ...
>>
>> So if  a symbol is accessed by both @GOT and @PLTOFF, its
>> needs_plt will be true and its got.plt entry will be used for
>> both @GOT and @GOTPLT.  @GOTPLT has no advantage
>> over @GOT, but potentially wastes a PLT entry.
>
> The above is not correct.  Had you tried you'd see this:
>
> % cat x.c
> extern void foo (void);
> int main (void)
> {
>   void (*f)(void) = foo;
>   f();
>   foo();
> }
> % gcc -fPIE -mcmodel=large -S x.c; cat x.s
> ...
>         movabsq $foo@GOT, %rax
> ...
>         movabsq $foo@PLTOFF, %rax
> ...
>
> So, foo is accessed via @GOT offset and @PLTOFF.  Then,
>
> % cat y.c
> void foo (void) {}
> % gcc -o liby.so -shared -fPIC y.c
> % gcc -fPIE -mcmodel=large x.s liby.so
> % readelf -r a.out
> ...
> 000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0
> ...
> 000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
> ...
>
> The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> slot for the PLT entry.  Both are actually used:
>
> 00000000004005f0 <foo@plt>:
>   4005f0:       ff 25 32 0a 20 00       jmpq   *0x200a32(%rip)        # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>

They are not:

Starting program: /export/home/hjl/bugs/binutils/gotplt/foo

Breakpoint 1, main () at main.c:5
5  void (*f)(void) = foo;
(gdb) disass
Dump of assembler code for function main:
   0x000000000040058d <+0>: push   %rbp
   0x000000000040058e <+1>: mov    %rsp,%rbp
   0x0000000000400591 <+4>: push   %rbx
   0x0000000000400592 <+5>: sub    $0x18,%rsp
   0x0000000000400596 <+9>: lea    -0x7(%rip),%rbx        # 0x400596 <main+9>
   0x000000000040059d <+16>: movabs $0x20042a,%r11
   0x00000000004005a7 <+26>: add    %r11,%rbx
=> 0x00000000004005aa <+29>: movabs $0xfffffffffffffff8,%rax
   0x00000000004005b4 <+39>: mov    (%rbx,%rax,1),%rax
   0x00000000004005b8 <+43>: mov    %rax,-0x18(%rbp)
   0x00000000004005bc <+47>: mov    -0x18(%rbp),%rax
   0x00000000004005c0 <+51>: callq  *%rax
   0x00000000004005c2 <+53>: movabs $0xffffffffffdffad0,%rax
   0x00000000004005cc <+63>: add    %rbx,%rax
   0x00000000004005cf <+66>: callq  *%rax
   0x00000000004005d1 <+68>: mov    $0x0,%eax
   0x00000000004005d6 <+73>: add    $0x18,%rsp
   0x00000000004005da <+77>: pop    %rbx
   0x00000000004005db <+78>: pop    %rbp
   0x00000000004005dc <+79>: retq
End of assembler dump.
(gdb) b *0x00000000004005c0
Breakpoint 2 at 0x4005c0: file main.c, line 6.
(gdb) b *0x00000000004005cf
Breakpoint 3 at 0x4005cf: file main.c, line 7.
(gdb) c
Continuing.

Breakpoint 2, 0x00000000004005c0 in main () at main.c:6
6  f();
(gdb) p $rax
$5 = 140737352012384
(gdb) disass $rax
Dump of assembler code for function foo:
   0x00007ffff7df9260 <+0>: push   %rbp
   0x00007ffff7df9261 <+1>: mov    %rsp,%rbp
   0x00007ffff7df9264 <+4>: lea    0x7(%rip),%rdi        # 0x7ffff7df9272
   0x00007ffff7df926b <+11>: callq  0x7ffff7df9250 <puts@plt>
   0x00007ffff7df9270 <+16>: pop    %rbp
   0x00007ffff7df9271 <+17>: retq
End of assembler dump.
(gdb) c
Continuing.
foo

Breakpoint 3, 0x00000000004005cf in main () at main.c:7
7  foo();
(gdb) p $rax
$6 = 4195472
(gdb) disass $rax
Dump of assembler code for function foo@plt:
   0x0000000000400490 <+0>: jmpq   *0x200552(%rip)        # 0x6009e8 <foo@got.plt>
   0x0000000000400496 <+6>: pushq  $0x2
   0x000000000040049b <+11>: jmpq   0x400460
End of assembler dump.
(gdb)

> That uses the second GOT slot, and:
>
> 00000000004006ec <main>:
>   4006ec:       55                      push   %rbp
>   4006ed:       48 89 e5                mov    %rsp,%rbp
>   4006f0:       53                      push   %rbx
>   4006f1:       48 83 ec 18             sub    $0x18,%rsp
>   4006f5:       48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx        # 4006f5 <main+0x9>
>   4006fc:       49 bb 0b 09 20 00 00    movabs $0x20090b,%r11
>   400703:       00 00 00
>   400706:       4c 01 db                add    %r11,%rbx
>   400709:       48 b8 f8 ff ff ff ff    movabs $0xfffffffffffffff8,%rax
>   400710:       ff ff ff
>   400713:       48 8b 04 03             mov    (%rbx,%rax,1),%rax
>
> This uses the first slot at 0x600ff8.
>
> So, no, currently GOT and GOTPLT (at least how it's supposed to be
> implemented) are not equivalent.

GOT reference:

  void (*f)(void) = foo;
  f();

gives you the address of the function foo in liby.so without going
through the PLT, while

foo()

is called via the PLT.  For a function call, we must use the PLT.  For a
pointer reference, we don't use the PLT slot because:

1. We don't need the indirect branch in the PLT.
2. All pointer references to the same function should have the same value.

One way to optimize it is to make the PLT entry use the normal GOT
slot:

jmp  *name@GOTPCREL(%rip)
8 byte nop

where name@GOTPCREL points to the normal GOT slot
updated by R_X86_64_GLOB_DAT relocation at run-time.
Should I give it a try?
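To make the proposed entry concrete, the sketch below encodes `jmp *slot(%rip)` (opcode ff /4 with a RIP-relative ModRM byte) into a PLT entry and pads the remainder. The function name, the `entry_size` parameter, and the use of one-byte 0x90 NOPs are assumptions for illustration; a real linker would more likely use a single multi-byte NOP, and the exact layout used on the hjl/plt.got branch is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: fill one PLT entry of entry_size bytes, located at
   entry_addr, with "jmp *slot(%rip)" aimed at got_slot, then pad with
   one-byte NOPs.  Returns 0 on success, -1 if the slot is out of rel32
   range. */
static int encode_plt_via_got(uint8_t *entry, size_t entry_size,
                              uint64_t entry_addr, uint64_t got_slot)
{
    assert(entry_size >= 6);
    /* rel32 is relative to the end of the 6-byte jmp instruction. */
    int64_t disp = (int64_t)got_slot - (int64_t)(entry_addr + 6);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    uint32_t u = (uint32_t)(int32_t)disp;
    entry[0] = 0xff;   /* opcode ff /4: near indirect jmp */
    entry[1] = 0x25;   /* ModRM: mod 00, reg 4, rm 101 -> RIP + disp32 */
    for (int i = 0; i < 4; i++)
        entry[2 + i] = (uint8_t)(u >> (8 * i));   /* little-endian disp32 */
    memset(entry + 6, 0x90, entry_size - 6);      /* nop padding */
    return 0;
}
```

Pointed at the .got.plt slot 0x601028 from an entry at 0x4005f0, this reproduces the `ff 25 32 0a 20 00` bytes of `foo@plt` in Matz's objdump output; the proposal amounts to pointing the same instruction at the normal (GLOB_DAT) slot instead.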
  
H.J. Lu Nov. 19, 2014, 11:54 p.m. UTC | #6
On Wed, Nov 19, 2014 at 8:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> One way to optimize it is to make the PLT entry use the normal GOT
> slot:
>
> jmp  *name@GOTPCREL(%rip)
> 8 byte nop
>
> where name@GOTPCREL points to the normal GOT slot
> updated by R_X86_64_GLOB_DAT relocation at run-time.
> Should I give it a try?

It turned out that we can reuse the BND PLT.  I implemented it in BFD ld
on the hjl/plt.got branch:

https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/hjl/plt.got

I tested it on glibc and it works.  It should work with all models.  Please
give it a try.
  
H.J. Lu Nov. 20, 2014, 12:02 a.m. UTC | #7
On Wed, Nov 19, 2014 at 3:54 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Nov 19, 2014 at 8:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> One way to optimize it is to make the PLT entry use the normal GOT
>> slot:
>>
>> jmp  *name@GOTPCREL(%rip)
>> 8 byte nop
>>
>> where name@GOTPCREL points to the normal GOT slot
>> updated by R_X86_64_GLOB_DAT relocation at run-time.
>> Should I give it a try?
>
> It turned out that we can reuse the BND PLT.  I implemented it in BFD ld
> on the hjl/plt.got branch:
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/hjl/plt.got
>
> I tested it on glibc and it works.  It should work with all models.  Please
> give it a try.

I spoke too soon.  I found a problem and I will investigate it.
  
Michael Matz Nov. 20, 2014, 1:42 p.m. UTC | #8
Hi,

On Wed, 19 Nov 2014, H.J. Lu wrote:

> > The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> > slot for the PLT entry.  Both are actually used:
> >
> > 00000000004005f0 <foo@plt>:
> >   4005f0:       ff 25 32 0a 20 00       jmpq   *0x200a32(%rip)        # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>
> 
> They are not:

Huh?  I said both GOT slots are used and I proved it in the disasm dumps.

> => 0x00000000004005aa <+29>: movabs $0xfffffffffffffff8,%rax
>    0x00000000004005b4 <+39>: mov    (%rbx,%rax,1),%rax

Here it's using one of the GOT slots (namely the one not associated with 
the PLT entry) ...

> Breakpoint 2, 0x00000000004005c0 in main () at main.c:6
> 6  f();
> (gdb) p $rax
> $5 = 140737352012384
> (gdb) disass $rax
> Dump of assembler code for function foo:

... which is why %rax contains the final address of foo, being loaded from 
the appropriate GOT slot that was just relocated with a GLOB_DAT reloc.

> Breakpoint 3, 0x00000000004005cf in main () at main.c:7
> 7  foo();
> (gdb) p $rax
> $6 = 4195472
> (gdb) disass $rax
> Dump of assembler code for function foo@plt:
>    0x0000000000400490 <+0>: jmpq   *0x200552(%rip)        # 0x6009e8 <foo@got.plt>

And here it's using the other GOT slot (associated with this PLT entry), 
unequal to the one used above and initially pointing to the first PLT 
stub.  So why do you say that not both are used, you clearly see they are?

> One way to optimize it is to make the PLT entry use the normal GOT
> slot:

Exactly.  As a symbol lookup needs to be done anyway for the GLOB_DAT 
reloc, going through the dynamic linker for a lazy lookup later, when a 
call occurs, doesn't make sense.

> jmp *name@GOTPCREL(%rip)
> 8 byte nop

You mean replacing the PLT slot with the above?  Yep, something like that.  
Even better, of course, would be to not use the PLT slot at all; it's just a 
useless indirection.  It would be even cooler to rewrite the call insn 
from
  call foo@PLT
into
  call *foo@GOTPCREL(%rip)

(in the small model here).  Unfortunately the latter is one byte larger 
than the former.  But perhaps GCC could already emit the latter form 
when it knows a certain function symbol has its address taken (or more 
precisely if a GLOB_DAT reloc is going to be emitted for it).
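The one-byte difference follows from the encodings: a direct `call rel32` is opcode e8 plus a 4-byte displacement (5 bytes), while `call *disp(%rip)` needs opcode ff, a ModRM byte (0x15: /2 with RIP-relative addressing), and the displacement (6 bytes). The sketch below encodes both; the call-site address 0x400500 is hypothetical, while 0x400490 (foo@plt) and 0x6009b8 (the normal GOT slot) are derived from H.J.'s gdb dump. Function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit displacement little-endian, as x86 encodes it. */
static void put_le32(uint8_t *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)u;
    p[1] = (uint8_t)(u >> 8);
    p[2] = (uint8_t)(u >> 16);
    p[3] = (uint8_t)(u >> 24);
}

/* rel32 = target - address of the next instruction; must fit in 32 bits. */
static int32_t rel32(uint64_t end_of_insn, uint64_t target)
{
    int64_t d = (int64_t)target - (int64_t)end_of_insn;
    assert(d >= INT32_MIN && d <= INT32_MAX);
    return (int32_t)d;
}

/* "call foo@PLT" once linked: e8 rel32 (5 bytes). */
static size_t encode_call_direct(uint8_t *out, uint64_t insn_addr, uint64_t plt_entry)
{
    out[0] = 0xe8;
    put_le32(out + 1, rel32(insn_addr + 5, plt_entry));
    return 5;
}

/* "call *foo@GOTPCREL(%rip)" once linked: ff 15 rel32 (6 bytes). */
static size_t encode_call_via_got(uint8_t *out, uint64_t insn_addr, uint64_t got_slot)
{
    out[0] = 0xff;   /* opcode ff /2: near indirect call */
    out[1] = 0x15;   /* ModRM: mod 00, reg 2, rm 101 -> RIP + disp32 */
    put_le32(out + 2, rel32(insn_addr + 6, got_slot));
    return 6;
}
```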

> where name@GOTPCREL points to the normal GOT slot
> updated by R_X86_64_GLOB_DAT relocation at run-time.
> Should I give it a try?

Frankly, I have no idea if it's worth it.  Taking the address of a function 
symbol doesn't happen very often, except in vtables, and those don't use 
GOT slots.  Vtables should be handled in a completely different way 
anyway: as the entries aren't usually used for address comparisons, they 
should point to the PLT slots, so that only RELATIVE relocs are needed rather 
than symbol-based ones, and virtual calls can also be resolved lazily.
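The vtable suggestion relies on R_X86_64_RELATIVE (type 8, computed as B + A) needing only the load base, never a symbol lookup. The sketch below shows how a loader might apply such relocations to a flat in-memory image; the struct mirrors the layout of `Elf64_Rela` but is defined locally, and the function name, image model, and addends (standing in for PLT-entry offsets) are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as Elf64_Rela, defined locally. */
struct rela64 {
    uint64_t r_offset;
    uint64_t r_info;
    int64_t  r_addend;
};

#define RELOC_RELATIVE 8u   /* R_X86_64_RELATIVE: value = B + A */

/* Hypothetical loader step: apply R_X86_64_RELATIVE entries to an image
   whose link-time address 0 is loaded at `base`.  Symbol-based entries
   are skipped.  Returns the number of relocations applied. */
static size_t apply_relative(uint8_t *image, size_t image_size, uint64_t base,
                             const struct rela64 *rel, size_t count)
{
    size_t applied = 0;
    for (size_t i = 0; i < count; i++) {
        if ((rel[i].r_info & 0xffffffffu) != RELOC_RELATIVE)
            continue;                          /* needs a symbol: not here */
        assert(rel[i].r_offset + 8 <= image_size);
        uint64_t value = base + (uint64_t)rel[i].r_addend;   /* B + A */
        memcpy(image + rel[i].r_offset, &value, sizeof value);
        applied++;
    }
    return applied;
}
```

A vtable whose entries point at PLT entries would need only relocations of this kind; the PLT entries themselves can then bind their targets lazily on first call.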


Ciao,
Michael.
  
H.J. Lu Nov. 20, 2014, 3:04 p.m. UTC | #9
On Thu, Nov 20, 2014 at 5:42 AM, Michael Matz <matz@suse.de> wrote:
> Exactly.  As a symbol lookup needs to be done anyway for the GLOB_DAT
> reloc, going through the dynamic linker for a lazy lookup later, when a
> call occurs, doesn't make sense.
>
>> jmp *name@GOTPCREL(%rip)
>> 8 byte nop
>
> You mean replacing the PLT slot with the above?  Yep, something like that.
> Even better, of course, would be to not use the PLT slot at all; it's just a
> useless indirection.  It would be even cooler to rewrite the call insn
> from
>   call foo@PLT
> into
>   call *foo@GOTPCREL(%rip)
>
> (in the small model here).  Unfortunately the latter is one byte larger
> than the former.  But perhaps GCC could already emit the latter form
> when it knows a certain function symbol has its address taken (or more
> precisely if a GLOB_DAT reloc is going to be emitted for it).
>
>> where name@GOTPCREL points to the normal GOT slot
>> updated by R_X86_64_GLOB_DAT relocation at run-time.
>> Should I give it a try?
>
> Frankly, I have no idea if it's worth it.  Taking the address of a function
> symbol doesn't happen very often, except in vtables, and those don't use
> GOT slots.  Vtables should be handled in a completely different way
> anyway: as the entries aren't usually used for address comparisons, they
> should point to the PLT slots, so that only RELATIVE relocs are needed rather
> than symbol-based ones, and virtual calls can also be resolved lazily.
>
>
> Ciao,
> Michael.

I fixed a bug on the hjl/plt.got branch:

https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/hjl/plt.got

It passed glibc tests and bootstrapped GCC.  It optimized functions like

std::bad_exception::~bad_exception()
__cxa_finalize
std::range_error::~range_error()
std::bad_array_length::~bad_array_length()
  

Patch

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 7f636fc..981390b 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -1242,9 +1242,6 @@  examples and discussion.  They are:
 \begin{itemize}
 \item \code{name@GOT}: specifies the offset to the GOT entry for
       the symbol \code{name} from the base of the GOT.
-\item \code{name@GOTPLT}: specifies the offset to the GOT entry for
-      the symbol \code{name} from the base of the GOT, implying that
-      there is a corresponding PLT entry.
 \item \code{name@GOTOFF}: specifies the offset to the location of
       the symbol \code{name} from the base of the GOT.
 \item \code{name@GOTPCREL}: specifies the offset to the GOT entry
diff --git a/object-files.tex b/object-files.tex
index 4705e96..c0698dc 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -611,7 +611,7 @@  Name                        &  Value &   Field   & Calculation            \\
 \hline
 \code{R_X86_64_GOTPC64}     &  29    &   word64  & \code{GOT - P + A}     \\
 \hline
-\code{R_X86_64_GOTPLT64}    &  30    &   word64  & \code{G + A}           \\
+\code{Reserved}             &  30    &           &                        \\
 \hline
 \code{R_X86_64_PLTOFF64}    &  31    &   word64  & \code{L - GOT + A}     \\
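The calculations in the relocation table touched by this patch can be checked against H.J.'s gdb dump earlier in the thread. In this sketch GOT = 0x6009c0 (the `lea` result 0x400596 plus the `movabs` immediate 0x20042a), P = 0x40059f (the immediate of the `movabs` at 0x40059d, after its two opcode bytes), A = 9 (the distance from the `lea` to that immediate, matching the addend readelf reports for R_X86_64_GOTPC64 in the main.o listing), G = -8, and L = 0x400490 (foo@plt). The helper names are hypothetical. R_X86_64_GOTPLT64 used the same G + A formula as R_X86_64_GOT64; the only intended difference was the implied PLT entry.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation calculations from the psABI table (word64 fields, arithmetic
   modulo 2^64).  GOT: _GLOBAL_OFFSET_TABLE_; P: address of the relocated
   field; A: addend; G: offset of the symbol's GOT slot from GOT; L: address
   of the symbol's PLT entry. */
static uint64_t calc_gotpc64(uint64_t got, uint64_t p, int64_t a)
{
    return got - p + (uint64_t)a;        /* R_X86_64_GOTPC64:  GOT - P + A */
}

static uint64_t calc_got64(int64_t g, int64_t a)
{
    return (uint64_t)g + (uint64_t)a;    /* R_X86_64_GOT64:    G + A */
}

static uint64_t calc_pltoff64(uint64_t l, uint64_t got, int64_t a)
{
    return l - got + (uint64_t)a;        /* R_X86_64_PLTOFF64: L - GOT + A */
}
```

With those inputs the three formulas yield 0x20042a, 0xfffffffffffffff8, and 0xffffffffffdffad0, the three `movabs` immediates in the disassembly of main.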