Message ID | CAMe9rOrnQRo3XXowAEcd_h=i_i5v04=i=kLWjm2ANduv8MwhYQ@mail.gmail.com |
---|---|
State | Not applicable |
Date | Thu, 13 Nov 2014 09:58:46 -0800 |
Subject | Re: What is R_X86_64_GOTPLT64 used for? |
From | "H.J. Lu" <hjl.tools@gmail.com> |
To | Michael Matz <matz@suse.de> |
Cc | "x86-64-abi@googlegroups.com" <x86-64-abi@googlegroups.com>, GCC Development <gcc@gcc.gnu.org>, Binutils <binutils@sourceware.org>, GNU C Library <libc-alpha@sourceware.org> |
In-Reply-To | <CAMe9rOrTg=YtVZ1EqN7ha8qUPSXzms20eMU51txVAmL3+cUsQQ@mail.gmail.com> |
Commit Message
H.J. Lu
Nov. 13, 2014, 5:58 p.m. UTC
On Thu, Nov 13, 2014 at 9:03 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Nov 13, 2014 at 8:33 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Thu, 13 Nov 2014, H.J. Lu wrote:
>>
>>> x86-64 psABI has
>>>
>>> name@GOT: specifies the offset to the GOT entry for the symbol name
>>> from the base of the GOT.
>>>
>>> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
>>> from the base of the GOT, implying that there is a corresponding PLT entry.
>>>
>>> But GCC never generates name@GOTPLT and assembler fails to assemble
>>> it:
>>
>> I've added the implementation for the large model, but only dimly remember
>> how it got added to the ABI in the first place.  The additional effect of
>> using that reloc was supposed to be that the GOT slot was to be placed
>> into .got.plt, and this might hint at the reasoning for this reloc:
>>
>> If you take the address of a function and call it, you need both a GOT
>> slot and a PLT entry (where the existence of GOT slot is implied by the
>
> That is correct.
>
>> PLT of course).  Now, if you use the normal @GOT64 reloc for the
>> address-taking operation that would create a slot in .got.  For the call
>> instruction you'd use @PLT (or variants thereof, like PLTOFF), which
>> creates the PLT slot _and_ a slot in .got.plt.  So, now we've ended up
>> with two GOT slots for the same symbol, where one should be enough (the
>> address taking operation can just as well use the slot in .got.plt).  So
>> if the compiler would emit @GOTPLT64 instead of @GOT64 for all address
>> references to symbols where it knows that it's a function it could save
>> one GOT slot.
>
> @GOTPLT will create a PLT entry, but it doesn't mean PLT entry will
> be used.  Only @PLTOFF will use PLT entry.  Linker should be smart
> enough to use only one GOT slot, regardless if @GOTPLT or @GOT
> is used to take function address and call via PLT.  However, if
> @GOTPLT is used without @PLT, a PLT entry will be created and unused.
>
> I'd like to propose
>
> 1. Update psABI to remove R_X86_64_GOTPLT64.
> 2. Fix assembler to take @GOTPLT for backward compatibility,
> 3. Make sure that linker uses one GOT slot for @GOT and @PLTOFF.
>

Linker does:

        case R_X86_64_GOT64:
        case R_X86_64_GOTPLT64:
          base_got = htab->elf.sgot;

          if (htab->elf.sgot == NULL)
            abort ();

          if (h != NULL)
            {
              bfd_boolean dyn;

              off = h->got.offset;
              if (h->needs_plt
                  && h->plt.offset != (bfd_vma)-1
                  && off == (bfd_vma)-1)
                {
                  /* We can't use h->got.offset here to save
                     state, or even just remember the offset, as
                     finish_dynamic_symbol would use that as offset into
                     .got.  */
                  bfd_vma plt_index = h->plt.offset / plt_entry_size - 1;
                  off = (plt_index + 3) * GOT_ENTRY_SIZE;
                  base_got = htab->elf.sgotplt;
                }

So if a symbol is accessed by both @GOT and @PLTOFF, its
needs_plt will be true and its got.plt entry will be used for
both @GOT and @GOTPLT.  @GOTPLT has no advantage
over @GOT, but potentially wastes a PLT entry.

Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64) as reserved.
I pushed updated x86-64 psABI changes to

https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master

I will update linker to keep accepting relocation 30 and treat it the
same as R_X86_64_GOT64.

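As an illustration of the offset arithmetic in the linker excerpt, here is a minimal standalone sketch. It assumes the usual x86-64 sizes (16-byte PLT entries, 8-byte GOT entries); the names `gotplt_slot_offset` and `check_gotplt_slot_offset` are hypothetical and not part of BFD. The `- 1` skips the reserved first PLT entry and the `+ 3` skips the three reserved slots at the start of .got.plt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for the standard x86-64 PLT and GOT layout. */
#define PLT_ENTRY_SIZE 16u
#define GOT_ENTRY_SIZE 8u

/* Offset of a symbol's slot from the start of .got.plt, given the
   offset of its PLT entry from the start of .plt.  Mirrors the two
   lines of arithmetic in the linker excerpt.  */
uint64_t gotplt_slot_offset(uint64_t plt_offset)
{
    /* Offset 0 is the reserved PLT0 entry, which has no symbol slot. */
    assert(plt_offset >= PLT_ENTRY_SIZE);
    assert(plt_offset % PLT_ENTRY_SIZE == 0);

    uint64_t plt_index = plt_offset / PLT_ENTRY_SIZE - 1;
    return (plt_index + 3) * GOT_ENTRY_SIZE;
}

/* Sanity checks: the first real PLT entry maps to the fourth slot,
   and the third real PLT entry (offset 0x30) maps to offset 0x28.  */
void check_gotplt_slot_offset(void)
{
    assert(gotplt_slot_offset(0x10) == 0x18);
    assert(gotplt_slot_offset(0x30) == 0x28);
}
```

Calling `check_gotplt_slot_offset()` from a test harness exercises both cases.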
Comments
Hi,

On Thu, 13 Nov 2014, H.J. Lu wrote:

> Linker does:
>
> ... code that looks like it might create just one GOT slot ...
>
> So if a symbol is accessed by both @GOT and @PLTOFF, its
> needs_plt will be true and its got.plt entry will be used for
> both @GOT and @GOTPLT.  @GOTPLT has no advantage
> over @GOT, but potentially wastes a PLT entry.

The above is not correct.  Had you tried you'd see this:

% cat x.c
extern void foo (void);
void main (void)
{
  void (*f)(void) = foo;
  f();
  foo();
}
% gcc -fPIE -mcmodel=large -S x.c; cat x.s
...
        movabsq $foo@GOT, %rax
...
        movabsq $foo@PLTOFF, %rax
...

So, foo is accessed via @GOT offset and @PLTOFF.  Then,

% cat y.c
void foo (void) {}
% gcc -o liby.so -shared -fPIC y.c
% gcc -fPIE -mcmodel=large x.s liby.so
% readelf -r a.out
...
000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0
...
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
...

The first one (to 600ff8) is the normal GOT slot, the second one the GOT
slot for the PLT entry.  Both are actually used:

00000000004005f0 <foo@plt>:
  4005f0:  ff 25 32 0a 20 00     jmpq   *0x200a32(%rip)   # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>

That uses the second GOT slot, and:

00000000004006ec <main>:
  4006ec:  55                    push   %rbp
  4006ed:  48 89 e5              mov    %rsp,%rbp
  4006f0:  53                    push   %rbx
  4006f1:  48 83 ec 18           sub    $0x18,%rsp
  4006f5:  48 8d 1d f9 ff ff ff  lea    -0x7(%rip),%rbx   # 4006f5 <main+0x9>
  4006fc:  49 bb 0b 09 20 00 00  movabs $0x20090b,%r11
  400703:  00 00 00
  400706:  4c 01 db              add    %r11,%rbx
  400709:  48 b8 f8 ff ff ff ff  movabs $0xfffffffffffffff8,%rax
  400710:  ff ff ff
  400713:  48 8b 04 03           mov    (%rbx,%rax,1),%rax

This uses the first slot at 0x600ff8.

So, no, currently GOT and GOTPLT (at least how it's supposed to be
implemented) are not equivalent.

> Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64) as reserved.
> I pushed updated x86-64 psABI changes to
>
> https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master
>
> I will update linker to keep accepting relocation 30 and treat it the
> same as R_X86_64_GOT64.

That seems a bit premature given the above.

Ciao,
Michael.

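The two slot addresses in the disassembly can be reproduced by hand. The sketch below is only a worked check of that arithmetic, using the addresses and displacements from the listing; the helper names `rip_relative_target`, `got_slot_address` and `check_two_got_slots` are hypothetical. It relies on the fact that a RIP-relative operand is resolved against the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a RIP-relative operand: address of the following
   instruction plus the sign-extended 32-bit displacement.  */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp)
{
    /* x86 instructions are between 1 and 15 bytes long. */
    assert(insn_len >= 1 && insn_len <= 15);
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Address of a GOT slot given the GOT base and a signed offset
   (the value a @GOT or @GOTPLT operand resolves to).  */
uint64_t got_slot_address(uint64_t got_base, int64_t got_offset)
{
    return got_base + (uint64_t)got_offset;
}

/* Values copied from the a.out disassembly in the message. */
void check_two_got_slots(void)
{
    /* foo@plt: 6-byte jmpq *0x200a32(%rip) at 0x4005f0. */
    assert(rip_relative_target(0x4005f0, 6, 0x200a32) == 0x601028);

    /* main: 7-byte lea -0x7(%rip),%rbx at 0x4006f5, then add 0x20090b. */
    uint64_t got_base = rip_relative_target(0x4006f5, 7, -7) + 0x20090b;
    assert(got_base == 0x601000);

    /* movabs $0xfffffffffffffff8 is -8: the GLOB_DAT slot. */
    assert(got_slot_address(got_base, -8) == 0x600ff8);
    /* _GLOBAL_OFFSET_TABLE_+0x28: the JUMP_SLOT slot. */
    assert(got_slot_address(got_base, 0x28) == 0x601028);
}
```

The two results, 0x600ff8 and 0x601028, match the offsets of the R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLO entries in the readelf output.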
On Mon, Nov 17, 2014 at 6:14 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 13 Nov 2014, H.J. Lu wrote:
>
>> Linker does:
>>
>> ... code that looks like it might create just one GOT slot ...
>>
>> So if a symbol is accessed by both @GOT and @PLTOFF, its
>> needs_plt will be true and its got.plt entry will be used for
>> both @GOT and @GOTPLT.  @GOTPLT has no advantage
>> over @GOT, but potentially wastes a PLT entry.
>
> The above is not correct.  Had you tried you'd see this:
>
> % cat x.c
> extern void foo (void);
> void main (void)
> {
>   void (*f)(void) = foo;
>   f();
>   foo();
> }
> % gcc -fPIE -mcmodel=large -S x.c; cat x.s
> ...
>         movabsq $foo@GOT, %rax
> ...
>         movabsq $foo@PLTOFF, %rax
> ...
>
> So, foo is accessed via @GOT offset and @PLTOFF.  Then,
>
> % cat y.c
> void foo (void) {}
> % gcc -o liby.so -shared -fPIC y.c
> % gcc -fPIE -mcmodel=large x.s liby.so
> % readelf -r a.out
> ...
> 000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0
> ...
> 000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
> ...
>
> The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> slot for the PLT entry.  Both are actually used:
>
> 00000000004005f0 <foo@plt>:
>   4005f0:  ff 25 32 0a 20 00     jmpq   *0x200a32(%rip)   # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>
>
> That uses the second GOT slot, and:
>
> 00000000004006ec <main>:
>   4006ec:  55                    push   %rbp
>   4006ed:  48 89 e5              mov    %rsp,%rbp
>   4006f0:  53                    push   %rbx
>   4006f1:  48 83 ec 18           sub    $0x18,%rsp
>   4006f5:  48 8d 1d f9 ff ff ff  lea    -0x7(%rip),%rbx   # 4006f5 <main+0x9>
>   4006fc:  49 bb 0b 09 20 00 00  movabs $0x20090b,%r11
>   400703:  00 00 00
>   400706:  4c 01 db              add    %r11,%rbx
>   400709:  48 b8 f8 ff ff ff ff  movabs $0xfffffffffffffff8,%rax
>   400710:  ff ff ff
>   400713:  48 8b 04 03           mov    (%rbx,%rax,1),%rax
>
> This uses the first slot at 0x600ff8.
>
> So, no, currently GOT and GOTPLT (at least how it's supposed to be
> implemented) are not equivalent.

It has nothing to do with large model.  The same thing happens to
small model.  We may be able to optimize it, independent of GOTPLT.
In any case, -mcmodel=large shouldn't change program behavior.

Hi,

On Mon, 17 Nov 2014, H.J. Lu wrote:

> It has nothing to do with large model.

Yes, I didn't say so.  I've used it only to force GCC to emit @GOT relocs
(otherwise it would have used @GOTPCREL) to disprove your claim.

> The same thing happens to small model.  We may be able to optimize it,
> independent of GOTPLT.

Yes, if we were to optimize this, the difference between GOT and GOTPLT
would be very minor.

> In any case, -mcmodel=large shouldn't change program behavior.

No, it shouldn't of course.

Ciao,
Michael.

On Tue, Nov 18, 2014 at 5:12 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 17 Nov 2014, H.J. Lu wrote:
>
>> It has nothing to do with large model.
>
> Yes, I didn't say so.  I've used it only to force GCC to emit @GOT relocs
> (otherwise it would have used @GOTPCREL) to disprove your claim.

Well, it was just on paper.  Linker never implemented such GOTPLT
optimization:

[hjl@gnu-6 simple]$ cat main.S
        .file   "main.c"
        .text
        .globl  _start
        .type   _start, @function
_start:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        pushq   %rbx
        subq    $24, %rsp
.L2:
        .cfi_offset 3, -24
        leaq    .L2(%rip), %rbx
        movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %r11
        addq    %r11, %rbx
        movabsq $foo@GOTPLT, %rax
        movq    (%rbx,%rax), %rax
        movq    %rax, -24(%rbp)
        movq    -24(%rbp), %rax
        call    *%rax
        movabsq $foo@PLTOFF, %rax
        addq    %rbx, %rax
        call    *%rax
        addq    $24, %rsp
        popq    %rbx
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   _start, .-_start
        .ident  "GCC: (GNU) 4.8.3 20140911 (Red Hat 4.8.3-7)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-6 simple]$ cat foo.c
void
foo (void)
{
}
[hjl@gnu-6 simple]$ make
gcc -fpie -mcmodel=large -c -o main.o main.S
gcc -fpic -c -o foo.o foo.c
./usr/local/bin/ld -shared -o libfoo.so foo.o
./usr/local/bin/ld -o foo main.o libfoo.so
readelf -r main.o

Relocation section '.rela.text' at offset 0x290 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000012  00090000001d R_X86_64_GOTPC64  0000000000000000 _GLOBAL_OFFSET_TABLE_ + 9
00000000001f  000a0000001e R_X86_64_GOTPLT64 0000000000000000 foo + 0
000000000037  000a0000001f R_X86_64_PLTOFF64 0000000000000000 foo + 0

Relocation section '.rela.eh_frame' at offset 0x2d8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
readelf -r foo

Relocation section '.rela.dyn' at offset 0x268 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006004b8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0

Relocation section '.rela.plt' at offset 0x280 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000006004d8  000200000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
[hjl@gnu-6 simple]$

>> The same thing happens to small model.  We may be able to optimize it,
>> independent of GOTPLT.
>
> Yes, if we were to optimize this, the difference between GOT and GOTPLT
> would be very minor.
>

I will give it a thought.  But we don't need GOTPLT for it.  We should
obsolete GOTPLT.

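For readers matching the `Info` column of the readelf listings to relocation names: in ELF64 the upper 32 bits of `r_info` hold the symbol index and the lower 32 bits hold the relocation type. The following is a small sketch of that decoding, checked against the values printed in the transcript. The function names are hypothetical, and the enum simply restates the psABI numbers for the relocations that appear in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type numbers as listed in the x86-64 psABI. */
enum {
    R_X86_64_GLOB_DAT  = 6,
    R_X86_64_JUMP_SLOT = 7,
    R_X86_64_GOTPC64   = 29,
    R_X86_64_GOTPLT64  = 30,
    R_X86_64_PLTOFF64  = 31
};

/* ELF64 r_info layout: symbol index in the high half, type in the low. */
uint32_t rela_sym(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

uint32_t rela_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}

/* Info values copied from the readelf output of main.o and foo. */
void check_readelf_info(void)
{
    assert(rela_type(0x00090000001dULL) == R_X86_64_GOTPC64);
    assert(rela_type(0x000a0000001eULL) == R_X86_64_GOTPLT64);
    assert(rela_type(0x000a0000001fULL) == R_X86_64_PLTOFF64);
    assert(rela_sym(0x000a0000001eULL) == 10);

    /* The linked executable still carries two dynamic relocs for foo. */
    assert(rela_type(0x000200000006ULL) == R_X86_64_GLOB_DAT);
    assert(rela_type(0x000200000007ULL) == R_X86_64_JUMP_SLOT);
    assert(rela_sym(0x000200000006ULL) == rela_sym(0x000200000007ULL));
}
```

The last assertion reflects the point of the transcript: both dynamic relocations refer to the same symbol, so two GOT slots exist for `foo` even though `@GOTPLT` was used.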
On Mon, Nov 17, 2014 at 6:14 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 13 Nov 2014, H.J. Lu wrote:
>
>> Linker does:
>>
>> ... code that looks like it might create just one GOT slot ...
>>
>> So if a symbol is accessed by both @GOT and @PLTOFF, its
>> needs_plt will be true and its got.plt entry will be used for
>> both @GOT and @GOTPLT.  @GOTPLT has no advantage
>> over @GOT, but potentially wastes a PLT entry.
>
> The above is not correct.  Had you tried you'd see this:
>
> % cat x.c
> extern void foo (void);
> void main (void)
> {
>   void (*f)(void) = foo;
>   f();
>   foo();
> }
> % gcc -fPIE -mcmodel=large -S x.c; cat x.s
> ...
>         movabsq $foo@GOT, %rax
> ...
>         movabsq $foo@PLTOFF, %rax
> ...
>
> So, foo is accessed via @GOT offset and @PLTOFF.  Then,
>
> % cat y.c
> void foo (void) {}
> % gcc -o liby.so -shared -fPIC y.c
> % gcc -fPIE -mcmodel=large x.s liby.so
> % readelf -r a.out
> ...
> 000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0
> ...
> 000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0
> ...
>
> The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> slot for the PLT entry.  Both are actually used:
>
> 00000000004005f0 <foo@plt>:
>   4005f0:  ff 25 32 0a 20 00     jmpq   *0x200a32(%rip)   # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>

They are not:

Starting program: /export/home/hjl/bugs/binutils/gotplt/foo

Breakpoint 1, main () at main.c:5
5         void (*f)(void) = foo;
(gdb) disass
Dump of assembler code for function main:
   0x000000000040058d <+0>:     push   %rbp
   0x000000000040058e <+1>:     mov    %rsp,%rbp
   0x0000000000400591 <+4>:     push   %rbx
   0x0000000000400592 <+5>:     sub    $0x18,%rsp
   0x0000000000400596 <+9>:     lea    -0x7(%rip),%rbx        # 0x400596 <main+9>
   0x000000000040059d <+16>:    movabs $0x20042a,%r11
   0x00000000004005a7 <+26>:    add    %r11,%rbx
=> 0x00000000004005aa <+29>:    movabs $0xfffffffffffffff8,%rax
   0x00000000004005b4 <+39>:    mov    (%rbx,%rax,1),%rax
   0x00000000004005b8 <+43>:    mov    %rax,-0x18(%rbp)
   0x00000000004005bc <+47>:    mov    -0x18(%rbp),%rax
   0x00000000004005c0 <+51>:    callq  *%rax
   0x00000000004005c2 <+53>:    movabs $0xffffffffffdffad0,%rax
   0x00000000004005cc <+63>:    add    %rbx,%rax
   0x00000000004005cf <+66>:    callq  *%rax
   0x00000000004005d1 <+68>:    mov    $0x0,%eax
   0x00000000004005d6 <+73>:    add    $0x18,%rsp
   0x00000000004005da <+77>:    pop    %rbx
   0x00000000004005db <+78>:    pop    %rbp
   0x00000000004005dc <+79>:    retq
End of assembler dump.
(gdb) b *0x00000000004005c0
Breakpoint 2 at 0x4005c0: file main.c, line 6.
(gdb) b *0x00000000004005cf
Breakpoint 3 at 0x4005cf: file main.c, line 7.
(gdb) c
Continuing.

Breakpoint 2, 0x00000000004005c0 in main () at main.c:6
6         f();
(gdb) p $rax
$5 = 140737352012384
(gdb) disass $rax
Dump of assembler code for function foo:
   0x00007ffff7df9260 <+0>:     push   %rbp
   0x00007ffff7df9261 <+1>:     mov    %rsp,%rbp
   0x00007ffff7df9264 <+4>:     lea    0x7(%rip),%rdi        # 0x7ffff7df9272
   0x00007ffff7df926b <+11>:    callq  0x7ffff7df9250 <puts@plt>
   0x00007ffff7df9270 <+16>:    pop    %rbp
   0x00007ffff7df9271 <+17>:    retq
End of assembler dump.
(gdb) c
Continuing.
foo

Breakpoint 3, 0x00000000004005cf in main () at main.c:7
7         foo();
(gdb) p $rax
$6 = 4195472
(gdb) disass $rax
Dump of assembler code for function foo@plt:
   0x0000000000400490 <+0>:     jmpq   *0x200552(%rip)        # 0x6009e8 <foo@got.plt>
   0x0000000000400496 <+6>:     pushq  $0x2
   0x000000000040049b <+11>:    jmpq   0x400460
End of assembler dump.
(gdb)

> That uses the second GOT slot, and:
>
> 00000000004006ec <main>:
>   4006ec:  55                    push   %rbp
>   4006ed:  48 89 e5              mov    %rsp,%rbp
>   4006f0:  53                    push   %rbx
>   4006f1:  48 83 ec 18           sub    $0x18,%rsp
>   4006f5:  48 8d 1d f9 ff ff ff  lea    -0x7(%rip),%rbx   # 4006f5 <main+0x9>
>   4006fc:  49 bb 0b 09 20 00 00  movabs $0x20090b,%r11
>   400703:  00 00 00
>   400706:  4c 01 db              add    %r11,%rbx
>   400709:  48 b8 f8 ff ff ff ff  movabs $0xfffffffffffffff8,%rax
>   400710:  ff ff ff
>   400713:  48 8b 04 03           mov    (%rbx,%rax,1),%rax
>
> This uses the first slot at 0x600ff8.
>
> So, no, currently GOT and GOTPLT (at least how it's supposed to be
> implemented) are not equivalent.

GOT reference:

  void (*f)(void) = foo;
  f();

gives you the address of the function foo in liby.so without going
through PLT, while foo() is called via PLT.  For function call, we must
use PLT.  For pointer reference, we don't use PLT slot:

1. We don't need the indirect branch in PLT.
2. All pointer references to the same function should have the same
   value.

One way to optimize it is to make the PLT entry use the normal GOT
slot:

  jmp *name@GOTPCREL(%rip)
  8 byte nop

where name@GOTPCREL points to the normal GOT slot
updated by R_X86_64_GLOB_DAT relocation at run-time.
Should I give it a try?

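The `movabs $0xffffffffffdffad0,%rax` constant in the gdb session is the resolved `foo@PLTOFF` operand. The psABI gives the R_X86_64_PLTOFF64 calculation as `L - GOT + A` (PLT entry address minus GOT base plus addend). The sketch below only re-derives the numbers shown in the session; `pltoff64_value` and `check_pltoff64` are hypothetical names, and the GOT base is taken to be the value left in `%rbx` after the `lea`/`add` pair.

```c
#include <assert.h>
#include <stdint.h>

/* R_X86_64_PLTOFF64: L - GOT + A, computed modulo 2^64. */
uint64_t pltoff64_value(uint64_t plt_entry, uint64_t got_base,
                        int64_t addend)
{
    return plt_entry - got_base + (uint64_t)addend;
}

/* Numbers copied from the gdb session in the message. */
void check_pltoff64(void)
{
    /* lea -0x7(%rip),%rbx at 0x400596 yields 0x400596 itself;
       adding 0x20042a gives the GOT base held in %rbx.  */
    uint64_t got_base = 0x400596 + 0x20042a;
    assert(got_base == 0x6009c0);

    /* foo@plt is at 0x400490 (decimal 4195472, the printed $rax). */
    uint64_t pltoff = pltoff64_value(0x400490, got_base, 0);
    assert(pltoff == 0xffffffffffdffad0ULL);

    /* add %rbx,%rax recovers the PLT entry address. */
    assert(got_base + pltoff == 0x400490);

    /* The PLT entry jumps through the slot at GOT base + 0x28. */
    assert(got_base + 0x28 == 0x6009e8);
}
```

So the second call goes through `foo@plt` and its `.got.plt` slot at 0x6009e8, while the first call loads the function address from the slot at GOT base minus 8.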
On Wed, Nov 19, 2014 at 8:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> One way to optimize it is to make the PLT entry use the normal GOT
> slot:
>
>   jmp *name@GOTPCREL(%rip)
>   8 byte nop
>
> where name@GOTPCREL points to the normal GOT slot
> updated by R_X86_64_GLOB_DAT relocation at run-time.
> Should I give it a try?

It turned out that we can reuse BND PLT.  I implemented it in BFD ld
on hjl/plt.got branch:

https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/hjl/plt.got

I tested it on glibc and it works.  It should work with all models.  Please
give it a try.

On Wed, Nov 19, 2014 at 3:54 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Nov 19, 2014 at 8:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> One way to optimize it is to make the PLT entry use the normal GOT
>> slot:
>>
>>   jmp *name@GOTPCREL(%rip)
>>   8 byte nop
>>
>> where name@GOTPCREL points to the normal GOT slot
>> updated by R_X86_64_GLOB_DAT relocation at run-time.
>> Should I give it a try?
>
> It turned out that we can reuse BND PLT.  I implemented it in BFD ld
> on hjl/plt.got branch:
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/hjl/plt.got
>
> I tested it on glibc and it works.  It should work with all models.  Please
> give it a try.

I spoke too soon.  I found a problem and I will investigate it.

Hi,

On Wed, 19 Nov 2014, H.J. Lu wrote:

> > The first one (to 600ff8) is the normal GOT slot, the second one the GOT
> > slot for the PLT entry.  Both are actually used:
> >
> > 00000000004005f0 <foo@plt>:
> >   4005f0:  ff 25 32 0a 20 00     jmpq   *0x200a32(%rip)   # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>
>
> They are not:

Huh?  I said both GOT slots are used and I proved it in the disasm dumps.

> => 0x00000000004005aa <+29>:    movabs $0xfffffffffffffff8,%rax
>    0x00000000004005b4 <+39>:    mov    (%rbx,%rax,1),%rax

Here it's using one of the GOT slots (namely the one not associated with
the PLT entry) ...

> Breakpoint 2, 0x00000000004005c0 in main () at main.c:6
> 6         f();
> (gdb) p $rax
> $5 = 140737352012384
> (gdb) disass $rax
> Dump of assembler code for function foo:

... which is why %rax contains the final address of foo, being loaded from
the appropriate GOT slot that was just relocated with a GLOB_DAT reloc.

> Breakpoint 3, 0x00000000004005cf in main () at main.c:7
> 7         foo();
> (gdb) p $rax
> $6 = 4195472
> (gdb) disass $rax
> Dump of assembler code for function foo@plt:
>    0x0000000000400490 <+0>:     jmpq   *0x200552(%rip)        # 0x6009e8
> <foo@got.plt>

And here it's using the other GOT slot (associated with this PLT entry),
unequal to the one used above and initially pointing to the first PLT
stub.  So why do you say that not both are used, when you clearly see
they are?

> One way to optimize it is to make PLT entry to use the normal GOT
> slot:

Exactly.  As a symbol lookup needs to be done anyway for the GLOB_DAT
reloc, going through the dynamic linker for the lazy lookup later when a
call occurs doesn't make sense.

>   jmp *name@GOTPCREL(%rip)
>   8 byte nop

You mean replacing the PLT slot with the above?  Yep, something like that.
Even better of course would be to not use the PLT slot at all, it's just a
useless indirection.  It would be even cooler to rewrite the call insn
from
  call foo@PLT
into
  call *foo@GOTPCREL(%rip)

(in the small model here).  Unfortunately the latter is one byte larger
than the former.  But perhaps GCC could already emit the latter form
when it knows a certain function symbol has its address taken (or more
precisely if a GLOB_DAT reloc is going to be emitted for it).

> where name@GOTPCREL points to the normal GOT slot
> updated by R_X86_64_GLOB_DAT relocation at run-time.
> Should I give it a try?

Frankly, I have no idea if it's worth it.  Address takings of function
symbols don't occur very often, except in vtables, and that's not using
GOT slots.  Vtables should be handled in a completely different way
anyway: as the entries aren't usually used for address comparisons they
should point to the PLT slots, so that it's only RELATIVE relocs, not
symbol based ones, so that also virtual calls can be resolved lazily.

Ciao,
Michael.

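The "one byte larger" remark follows from the instruction encodings: a direct near call is opcode `E8` plus a 32-bit displacement, while an indirect call through a RIP-relative memory operand is opcode `FF` with ModRM byte `15` plus a 32-bit displacement. A minimal sketch of the size comparison, with placeholder zero displacements and hypothetical array and function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* call rel32 (the form used for "call foo@PLT"): E8 + disp32. */
const uint8_t call_direct[] = { 0xe8, 0x00, 0x00, 0x00, 0x00 };

/* call *disp32(%rip) (the form used for "call *foo@GOTPCREL(%rip)"):
   FF /2 with a RIP-relative ModRM byte (0x15) + disp32.  */
const uint8_t call_via_got[] = { 0xff, 0x15, 0x00, 0x00, 0x00, 0x00 };

/* Extra bytes needed per call site when calling through the GOT. */
size_t call_size_penalty(void)
{
    return sizeof call_via_got - sizeof call_direct;
}

void check_call_sizes(void)
{
    assert(sizeof call_direct == 5);
    assert(sizeof call_via_got == 6);
    assert(call_size_penalty() == 1);
}
```

Because the two forms differ in length, a linker cannot simply overwrite one with the other in place, which is presumably why the suggestion is for the compiler to emit the longer form up front.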
On Thu, Nov 20, 2014 at 5:42 AM, Michael Matz <matz@suse.de> wrote:
> Exactly.  As a symbol lookup needs to be done anyway for the GLOB_DAT
> reloc, going through the dynamic linker for the lazy lookup later when a
> call occurs doesn't make sense.
>
>>   jmp *name@GOTPCREL(%rip)
>>   8 byte nop
>
> You mean replacing the PLT slot with the above?  Yep, something like that.
> Even better of course would be to not use the PLT slot at all, it's just a
> useless indirection.  It would be even cooler to rewrite the call insn
> from
>   call foo@PLT
> into
>   call *foo@GOTPCREL(%rip)
>
> (in the small model here).  Unfortunately the latter is one byte larger
> than the former.  But perhaps GCC could already emit the latter form
> when it knows a certain function symbol has its address taken (or more
> precisely if a GLOB_DAT reloc is going to be emitted for it).
>
>> where name@GOTPCREL points to the normal GOT slot
>> updated by R_X86_64_GLOB_DAT relocation at run-time.
>> Should I give it a try?
>
> Frankly, I have no idea if it's worth it.  Address takings of function
> symbols don't occur very often, except in vtables, and that's not using
> GOT slots.  Vtables should be handled in a completely different way
> anyway: as the entries aren't usually used for address comparisons they
> should point to the PLT slots, so that it's only RELATIVE relocs, not
> symbol based ones, so that also virtual calls can be resolved lazily.
>
>
> Ciao,
> Michael.

I fixed a bug on hjl/plt.got branch:

https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/hjl/plt.got

It passed glibc tests and bootstrapped GCC.  It optimized functions like

std::bad_exception::~bad_exception()
__cxa_finalize
std::range_error::~range_error()
std::bad_array_length::~bad_array_length()

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 7f636fc..981390b 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -1242,9 +1242,6 @@ examples and discussion. They are:
 \begin{itemize}
 \item \code{name@GOT}: specifies the offset to the GOT entry for
   the symbol \code{name} from the base of the GOT.
-\item \code{name@GOTPLT}: specifies the offset to the GOT entry for
-  the symbol \code{name} from the base of the GOT, implying that
-  there is a corresponding PLT entry.
 \item \code{name@GOTOFF}: specifies the offset to the location of
   the symbol \code{name} from the base of the GOT.
 \item \code{name@GOTPCREL}: specifies the offset to the GOT entry
diff --git a/object-files.tex b/object-files.tex
index 4705e96..c0698dc 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -611,7 +611,7 @@ Name & Value & Field & Calculation \\
 \hline
 \code{R_X86_64_GOTPC64} & 29 & word64 & \code{GOT - P + A} \\
 \hline
-\code{R_X86_64_GOTPLT64} & 30 & word64 & \code{G + A} \\
+\code{Reserved} & 30 & & \\
 \hline
 \code{R_X86_64_PLTOFF64} & 31 & word64 & \code{L - GOT + A} \\
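
The patch marks value 30 as reserved in the psABI, while the cover message says the linker will keep accepting relocation 30 and treat it the same as R_X86_64_GOT64. One way a consumer of relocation records might express that compatibility rule is sketched below. This is an assumption about how such handling could look, not the BFD implementation; `canonical_reloc_type` and `check_canonical_reloc_type` are hypothetical names, and the enum restates the psABI numbers (R_X86_64_GOT64 is 27).

```c
#include <assert.h>
#include <stdint.h>

/* psABI relocation numbers relevant to the change. */
enum {
    R_X86_64_GOT64    = 27,
    R_X86_64_GOTPLT64 = 30,  /* marked "Reserved" by the patch */
    R_X86_64_PLTOFF64 = 31
};

/* Map the obsolete GOTPLT64 type onto GOT64 so that old object
   files are still accepted; every other type passes through.  */
uint32_t canonical_reloc_type(uint32_t r_type)
{
    return r_type == R_X86_64_GOTPLT64 ? (uint32_t)R_X86_64_GOT64 : r_type;
}

void check_canonical_reloc_type(void)
{
    assert(canonical_reloc_type(R_X86_64_GOTPLT64) == R_X86_64_GOT64);
    assert(canonical_reloc_type(R_X86_64_GOT64) == R_X86_64_GOT64);
    assert(canonical_reloc_type(R_X86_64_PLTOFF64) == R_X86_64_PLTOFF64);
}
```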