[v3] RISC-V: Fix IFUNC resolver cannot access gp pointer

Message ID tencent_EA6F621A42D41AFDF99A0561B51F1CB57109@qq.com
State New
Headers
Series [v3] RISC-V: Fix IFUNC resolver cannot access gp pointer |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed
redhat-pt-bot/TryBot-32bit success Build for i686

Commit Message

Yangyu Chen Nov. 27, 2024, 7:35 a.m. UTC
  In some cases, an IFUNC resolver may need to access the gp pointer to
access global variables. Such an object may have l_relocated == 0 at
this time. In this case, an IFUNC resolver will fail to access a global
variable and cause a SIGSEGV. This patch fixes this issue by relaxing
the check of l_relocated in elf_machine_runtime_setup.

As for the original BZ #31317, since the static-linked executable has
already set up the gp pointer, we don't need to execute the code to set
up the gp pointer again. Thus, this code should be skipped for !SHARED
case. I have also reproduced and checked BZ #31317 using a mold commit
bed5b1731b ("illumos: Treat absolute symbols specially"), this patch can
fix the issue. To prevent some other bug like original BZ #31317 from
happening again, I also added check `l->l_scope != NULL`.

Also, we used the wrong gp pointer previously because ref->st_value is
not the relocated address but just the offset from the base address of
ELF. An edge case may happen if we reference gp pointer in IFUNC
resolverin a PIE object. The GP will be initialized incorrectly since
the ref->st_value is not the address after relocation. This patch fixes
this issue by adding the l->l_addr to ref->st_value to get the relocated
address for the gp pointer. We don't use SYMBOL_ADDRESS macro here
because __global_pointer$ is a special symbol that has SHN_ABS type, but
it should use PC-relative addressing.

Closes: BZ #32269
Fixes: 96d1b9ac23 ("RISC-V: Fix the static-PIE non-relocated object check")

Co-authored-by: Vivian Wang <dramforever@live.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
---
 sysdeps/riscv/dl-machine.h | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
  

Comments

Andreas Schwab Dec. 2, 2024, 10:01 a.m. UTC | #1
On Nov 27 2024, Yangyu Chen wrote:

> +	asm (
> +	  "mv gp, %0\n"
> +	  :
> +	  : "r" (ref->st_value + l->l_addr)

Doesn't that apply the relocation a second time if the object is already
relocated?
  
Yangyu Chen Dec. 3, 2024, 10:36 a.m. UTC | #2
On 12/2/24 18:01, Andreas Schwab wrote:
> On Nov 27 2024, Yangyu Chen wrote:
> 
>> +	asm (
>> +	  "mv gp, %0\n"
>> +	  :
>> +	  : "r" (ref->st_value + l->l_addr)
> 
> Doesn't that apply the relocation a second time if the object is already
> relocated?
> 

No. Because we will use st_value from elf directly. Relocation does not 
change this.
  
Andreas Schwab Dec. 3, 2024, 10:47 a.m. UTC | #3
On Dez 03 2024, Yangyu Chen wrote:

> On 12/2/24 18:01, Andreas Schwab wrote:
>> On Nov 27 2024, Yangyu Chen wrote:
>> 
>>> +	asm (
>>> +	  "mv gp, %0\n"
>>> +	  :
>>> +	  : "r" (ref->st_value + l->l_addr)
>> Doesn't that apply the relocation a second time if the object is already
>> relocated?
>> 
>
> No. Because we will use st_value from elf directly. Relocation does not
> change this.

Then how did it work before?
  
Yangyu Chen Dec. 3, 2024, 10:54 a.m. UTC | #4
On 12/3/24 18:47, Andreas Schwab wrote:
> On Dez 03 2024, Yangyu Chen wrote:
> 
>> On 12/2/24 18:01, Andreas Schwab wrote:
>>> On Nov 27 2024, Yangyu Chen wrote:
>>>
>>>> +	asm (
>>>> +	  "mv gp, %0\n"
>>>> +	  :
>>>> +	  : "r" (ref->st_value + l->l_addr)
>>> Doesn't that apply the relocation a second time if the object is already
>>> relocated?
>>>
>>
>> No. Because we will use st_value from elf directly. Relocation does not
>> change this.
> 
> Then how did it work before?
> 

This code did not work before since this only happens when we use gp 
with pie binary. But ld in GNU binutils will not relax auipc + load to 
load gp directly when linking pie binary, thus gp is hardly used in this 
case. We also call load_gp to load the correct gp in _main. So this bug 
is hard to trigger when all the code are generated by compilers.
  

Patch

diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h
index b2f28697f7..b10b2eb909 100644
--- a/sysdeps/riscv/dl-machine.h
+++ b/sysdeps/riscv/dl-machine.h
@@ -348,7 +348,8 @@  elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
       gotplt[1] = (ElfW(Addr)) l;
     }
 
-  if (l->l_type == lt_executable && l->l_relocated)
+#ifdef SHARED
+  if (l->l_type == lt_executable && l->l_scope != NULL)
     {
       /* The __global_pointer$ may not be defined by the linker if the
 	 $gp register does not be used to access the global variable
@@ -362,12 +363,16 @@  elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
       _dl_lookup_symbol_x ("__global_pointer$", l, &ref,
 			   l->l_scope, NULL, 0, 0, NULL);
       if (ref)
-        asm (
-          "mv gp, %0\n"
-          :
-          : "r" (ref->st_value)
-        );
+	asm (
+	  "mv gp, %0\n"
+	  :
+	  : "r" (ref->st_value + l->l_addr)
+	  /* Don't use SYMBOL_ADDRESS here since __global_pointer$
+	     can be SHN_ABS type, but we need the address relative to
+	     PC, not the absolute address.  */
+	);
     }
+#endif
 #endif
   return lazy;
 }