[RFC] objcopy, elf: Align Header Offset [PR 32324]

Message ID c58d46f4-befd-404b-8ec4-d1f4db55a43c@arm.com
State New
Headers
Series [RFC] objcopy, elf: Align Header Offset [PR 32324] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 warning Skipped upon request
linaro-tcwg-bot/tcwg_binutils_build--master-arm warning Skipped upon request

Commit Message

Andre Simoes Dias Vieira Dec. 2, 2024, 2:16 p.m. UTC
  Hello,

This RFC 'fixes' the issue described in PR32324, but I doubt this is the 
right thing to do.  I am not at all familiar with this piece of code but 
it looked wrong for the offset after objcopy to be lower than the 
'Align' value, whereas all other headers seem to have an 'Offset' that 
is larger than their 'Align'.
Debugging this also hinted that had the offset been the same as 'Align' 
and in fact the original 'Offset' then everything would fall in place.

(deleted some leading 0s so it would fit in 80-char line).
Before objcopy/strip:
   Type           Offset             VirtAddr           PhysAddr
                  FileSiz            MemSiz              Flags  Align
...
   LOAD           0x00000000200000 0x00000000600000 0x00000000600000
                  0x0000000040467c 0x0000000040467c  R E    0x200000

After objcopy/strip:
   Type           Offset             VirtAddr           PhysAddr
                  FileSiz            MemSiz              Flags  Align
...
   LOAD           0x00000000030120 0x00000000430120 0x00000000600000
                  0x0000000040467c 0x0000000040467c  R E    0x200000


So I decided to try this hack for now and for this case it seems to work 
for the binary I was using to reproduce this.
This is just intended as breadcrumbs for the next person to pick this up.

My reasoning was that I was seeing that the p_offset and p_vaddr of the 
Header where things went wrong was much lower than the original, however 
the section addresses were not, so some sections were no longer fitting 
in this Segment and were being moved to the next one and eventually we 
ran out of space for some Sections, hence the warnings.

Kind regards,
Andre
  

Comments

Jan Beulich Dec. 3, 2024, 9:26 a.m. UTC | #1
On 02.12.2024 15:16, Andre Vieira (lists) wrote:
> This RFC 'fixes' the issue described in PR32324, but I doubt this is the 
> right thing to do.  I am not at all familiar with this piece of code but 
> it looked wrong for the offset after objcopy to be lower than the 
> 'Align' value, whereas all other headers seem to have an 'Offset' that 
> is larger than their 'Align'.
> Debugging this also hinted that had the offset been the same as 'Align' 
> and in fact the original 'Offset' then everything would fall in place.
> 
> (deleted some leading 0s so it would fit in 80-char line).
> Before objcopy/strip:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
> ...
>    LOAD           0x00000000200000 0x00000000600000 0x00000000600000
>                   0x0000000040467c 0x0000000040467c  R E    0x200000
> 
> After objcopy/strip:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
> ...
>    LOAD           0x00000000030120 0x00000000430120 0x00000000600000
>                   0x0000000040467c 0x0000000040467c  R E    0x200000

Neither here nor in the bug report a complete picture is presented. I
don't think providing the info for just a single segment / section can
suffice. What's more, in the PR you mention the input image is the
result of some other tool's optimization. Pre-optimized image data may
also be relevant, especially as long as we can't exclude there's an
issue with that optimization (as you hint at as a possibility in the
PR).

Therefore can you please provide at least a full program and section
header dump from the one binary in question, for all three states of
it (pre optimization, post optimization, and post stripping)? Ideally
you'd make available a shrunk down testcase, for people (like me) to
actually play with.

> So I decided to try this hack for now and for this case it seems to work 
> for the binary I was using to reproduce this.
> This is just intended as breadcrumbs for the next person to pick this up.
> 
> My reasoning was that I was seeing that the p_offset and p_vaddr of the 
> Header where things went wrong was much lower than the original, however 
> the section addresses were not, so some sections were no longer fitting 
> in this Segment and were being moved to the next one and eventually we 
> ran out of space for some Sections, hence the warnings.

Right, except that I don't think it makes much sense to special case
offsets below p_align. p_align, after all, says nothing about full
addresses, but is just about requirements towards their low bits.

Jan
  
Andreas Schwab Dec. 3, 2024, 10:08 a.m. UTC | #2
On Dez 02 2024, Andre Vieira (lists) wrote:

> diff --git a/bfd/elf.c b/bfd/elf.c
> index 74236a658fd9a10a61f466d9a2191998c2f4ce06..28d0143acca351203c840e6afc1dca710d6f671a 100644
> --- a/bfd/elf.c
> +++ b/bfd/elf.c
> @@ -6152,7 +6152,12 @@ assign_file_positions_for_load_sections (bfd *abfd,
>        if (m == phdr_load_seg)
>  	{
>  	  if (!m->includes_filehdr)
> -	    p->p_offset = off;
> +	    {
> +	      if (((bfd_vma) off) < p->p_align)
> +		p->p_offset = p->p_align;
> +	      else
> +		p->p_offset = off;
> +	    }

That is definitely wrong.  p_align only asserts that (p_offset -
p_vaddr) % p_align == 0.  It is _not_ required that p_offset % p_align
== 0.
  
Andre Simoes Dias Vieira Dec. 3, 2024, 10:35 a.m. UTC | #3
Hi Jan,

Thank you for responding!

On 03/12/2024 09:26, Jan Beulich wrote:
> 
> Neither here nor in the bug report a complete picture is presented. I
> don't think providing the info for just a single segment / section can
> suffice. What's more, in the PR you mention the input image is the
> result of some other tool's optimization. Pre-optimized image data may
> also be relevant, especially as long as we can't exclude there's an
> issue with that optimization (as you hint at as a possibility in the
> PR).
> 
> Therefore can you please provide at least a full program and section
> header dump from the one binary in question, for all three states of
> it (pre optimization, post optimization, and post stripping)? Ideally
> you'd make available a shrunk down testcase, for people (like me) to
> actually play with.

Yeah I fully appreciate this, I was not sure what the best way was to go 
about it, I've attached the headers and segments of the binary before 
and after copying.

I'll see if I can create a binary reproducer of code I am able to share, 
but in 9 years of GNU development I've never sent binaries over for 
debugging, we usually operate with source-level reproducers... so not 
sure about whether it was something people did. Also not sure I'll be 
able to come up with a shareable reproducer for you, but I'll try.

I think I was hoping others would have seen similar behavior and 
would've been able to investigate, but I guess if they had, it might 
have already been fixed, even if I'm not 100% convinced this is an issue 
with objcopy/strip ... could well be binary that was modified to 
something that isn't quite compliant with ELF.
> 
>> So I decided to try this hack for now and for this case it seems to work
>> for the binary I was using to reproduce this.
>> This is just intended as breadcrumbs for the next person to pick this up.
>>
>> My reasoning was that I was seeing that the p_offset and p_vaddr of the
>> Header where things went wrong was much lower than the original, however
>> the section addresses were not, so some sections were no longer fitting
>> in this Segment and were being moved to the next one and eventually we
>> ran out of space for some Sections, hence the warnings.
> 
> Right, except that I don't think it makes much sense to special case
> offsets below p_align. p_align, after all, says nothing about full
> addresses, but is just about requirements towards their low bits.

True... and from what I could find p_vaddr should equal p_offset modulo 
p_align, but I probably need to read that as:
    p_vaddr % p_align == p_offset % p_align

Both before and after headers uphold that ... though strangely, unlike 
other headers, the p_paddr of the header where the problems start didn't 
change and is no longer the same as p_vaddr.

Like I said, the p_align hack was just something I noticed whilst 
debugging that would 'make it work', but I'm not at all convinced it's 
the right approach either, but I don't know what is either.

Kind Regards,
Andre

PS: Apologies if I stop responding, I'll make sure someone else in my 
team picks this up whilst I'm away.
Elf file type is EXEC (Executable file)
Entry point 0x800000
There are 11 program headers, starting at offset 2097152

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000200000 0x0000000000600000 0x0000000000600000
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x0000000000000294 0x0000000000400294 0x0000000000400294
                 0x0000000000000042 0x0000000000000042  R      0x1
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000024458 0x0000000000024458  R E    0x10000
  LOAD           0x000000000002fdc0 0x000000000043fdc0 0x000000000043fdc0
                 0x0000000000000360 0x0000000000000360  RW     0x10000
  LOAD           0x0000000000200000 0x0000000000600000 0x0000000000600000
                 0x000000000040467c 0x000000000040467c  R E    0x200000
  DYNAMIC        0x000000000002fdd8 0x000000000043fdd8 0x000000000043fdd8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000270 0x0000000000400270 0x0000000000400270
                 0x0000000000000024 0x0000000000000024  R      0x4
  NOTE           0x0000000000030100 0x0000000000440100 0x0000000000440100
                 0x0000000000000020 0x0000000000000020  R      0x4
  GNU_EH_FRAME   0x0000000000603d08 0x0000000000a03d08 0x0000000000a03d08
                 0x0000000000000974 0x0000000000000974  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000002fdc0 0x000000000043fdc0 0x000000000043fdc0
                 0x0000000000000240 0x0000000000000240  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .note.gnu.build-id .interp .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .bolt.org.text .fini .rodata .bolt.org.eh_frame_hdr .bolt.org.eh_frame
   03     .init_array .fini_array .dynamic .got .got.plt .data .tm_clone_table .bss .note.ABI-tag
   04     .text .text.cold .eh_frame .eh_frame_hdr
   05     .dynamic
   06     .note.gnu.build-id
   07     .note.ABI-tag
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got
Elf file type is EXEC (Executable file)
Entry point 0x800000
There are 11 program headers, starting at offset 196896

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000030120 0x0000000000430120 0x0000000000600000
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x0000000000000294 0x0000000000400294 0x0000000000400294
                 0x0000000000000042 0x0000000000000042  R      0x1
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000024458 0x0000000000024458  R E    0x10000
  LOAD           0x000000000002fdc0 0x000000000043fdc0 0x000000000043fdc0
                 0x0000000000000360 0x0000000000000360  RW     0x10000
  LOAD           0x0000000000030120 0x0000000000430120 0x0000000000600000
                 0x000000000040467c 0x000000000040467c  R E    0x200000
  DYNAMIC        0x000000000002fdd8 0x000000000043fdd8 0x000000000043fdd8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000270 0x0000000000400270 0x0000000000400270
                 0x0000000000000024 0x0000000000000024  R      0x4
  NOTE           0x0000000000030100 0x0000000000440100 0x0000000000440100
                 0x0000000000000020 0x0000000000000020  R      0x4
  GNU_EH_FRAME   0x0000000000433e28 0x0000000000a03d08 0x0000000000a03d08
                 0x0000000000000974 0x0000000000000974  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000002fdc0 0x000000000043fdc0 0x000000000043fdc0
                 0x0000000000000240 0x0000000000000240  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .note.gnu.build-id .interp .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .bolt.org.text .fini .rodata .bolt.org.eh_frame_hdr .bolt.org.eh_frame
   03     .init_array .fini_array .dynamic .got .got.plt .data .tm_clone_table .bss .note.ABI-tag
   04     .bss .text .text.cold
   05     .dynamic
   06     .note.gnu.build-id
   07     .note.ABI-tag
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got
  
Andre Simoes Dias Vieira Dec. 3, 2024, 4:30 p.m. UTC | #4
OK so I have a binary reproducer for you, it's just a nonsensical hello 
world with a loop to make it run for long enough for perf to create a 
record that actually captures something. I compressed it using gzip, so 
hopefully the mailing list accepts it.

This binary was built on an aarch64-none-linux-gnu target, using a top 
of the trunk build of llvm-bolt and a gcc 11.4 with the following commands:
$ gcc hello.c fn.c -O2 -g -Wl,--emit-relocs 
-fno-reorder-blocks-and-partition
$ perf record -e cycles:u -o perf.data -- ./a.out
$ ./bolt_build/bin/perf2bolt -p perf.data -o perf.fdata -nl  ./a.out
$ ./bolt_build/bin/llvm-bolt ./a.out -o ./a-bolted.out -data perf.fdata 
-reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions 
-split-all-cold -split-eh -dyno-stats

To reproduce the issue with GNU strip/objcopy it is sufficient to call:
$ objcopy a-bolted.out bad_copy
objcopy: bad_copy: section `.eh_frame' can't be allocated in segment 2
LOAD: .text .text.cold .eh_frame .eh_frame_hdr
objcopy: bad_copy: section `.eh_frame_hdr' can't be allocated in segment 2
LOAD: .text .text.cold .eh_frame .eh_frame_hdr
$ ./bad_copy
Illegal Instruction

The sources used to create the binary are ...

hello.c:
#include <stdio.h>

extern int fn (int *, int *, int);

#  define N 2230000
int a[N];
int b[N];

int main (void)
{
   for (int i = 0; i < N; ++i)
     {
       a[i] = i % 10;
       b[i] = i % 2;
     }
   printf ("Hello World! %d\n", fn(&a[0], &b[0], N));
   return 0;
}

... and ...

fn.c:
int fn (int *a, int *b, int n)
{
   int sum = 0;
   for (int i = 0; i < n; ++ i)
     sum += a[i] * b[i];
   return sum;
}
  
Jan Beulich Dec. 17, 2024, 2:39 p.m. UTC | #5
On 03.12.2024 17:30, Andre Vieira (lists) wrote:
> OK so I have a binary reproducer for you, it's just a nonsensical hello 
> world with a loop to make it run for long enough for perf to create a 
> record that actually captures something. I compressed it using gzip, so 
> hopefully the mailing list accepts it.
> 
> This binary was built on an aarch64-none-linux-gnu target, using a top 
> of the trunk build of llvm-bolt and a gcc 11.4 with the following commands:
> $ gcc hello.c fn.c -O2 -g -Wl,--emit-relocs 
> -fno-reorder-blocks-and-partition
> $ perf record -e cycles:u -o perf.data -- ./a.out

The original binary would also have been of interest.

> $ ./bolt_build/bin/perf2bolt -p perf.data -o perf.fdata -nl  ./a.out
> $ ./bolt_build/bin/llvm-bolt ./a.out -o ./a-bolted.out -data perf.fdata 
> -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions 
> -split-all-cold -split-eh -dyno-stats
> 
> To reproduce the issue with GNU strip/objcopy it is sufficient to call:
> $ objcopy a-bolted.out bad_copy
> objcopy: bad_copy: section `.eh_frame' can't be allocated in segment 2
> LOAD: .text .text.cold .eh_frame .eh_frame_hdr
> objcopy: bad_copy: section `.eh_frame_hdr' can't be allocated in segment 2
> LOAD: .text .text.cold .eh_frame .eh_frame_hdr

It looks to me as if the problem was that objcopy tries to move the two
sections from segment 4 to segment 2. While I don't think I'm aware of a
place where it is written down what exact changes objcopy is expected /
permitted to make, it would seem to me that it shouldn't alter the
section-to-segment mapping. The two sections in question should remain
in segment 4. Doing so would feel like partial re-linking of the binary,
which pretty certainly isn't what objcopy is supposed to do (and even
less so strip).

Jan
  
Jan Beulich Dec. 17, 2024, 4:33 p.m. UTC | #6
On 17.12.2024 15:39, Jan Beulich wrote:
> On 03.12.2024 17:30, Andre Vieira (lists) wrote:
>> OK so I have a binary reproducer for you, it's just a nonsensical hello 
>> world with a loop to make it run for long enough for perf to create a 
>> record that actually captures something. I compressed it using gzip, so 
>> hopefully the mailing list accepts it.
>>
>> This binary was built on an aarch64-none-linux-gnu target, using a top 
>> of the trunk build of llvm-bolt and a gcc 11.4 with the following commands:
>> $ gcc hello.c fn.c -O2 -g -Wl,--emit-relocs 
>> -fno-reorder-blocks-and-partition
>> $ perf record -e cycles:u -o perf.data -- ./a.out
> 
> The original binary would also have been of interest.
> 
>> $ ./bolt_build/bin/perf2bolt -p perf.data -o perf.fdata -nl  ./a.out
>> $ ./bolt_build/bin/llvm-bolt ./a.out -o ./a-bolted.out -data perf.fdata 
>> -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions 
>> -split-all-cold -split-eh -dyno-stats
>>
>> To reproduce the issue with GNU strip/objcopy it is sufficient to call:
>> $ objcopy a-bolted.out bad_copy
>> objcopy: bad_copy: section `.eh_frame' can't be allocated in segment 2
>> LOAD: .text .text.cold .eh_frame .eh_frame_hdr
>> objcopy: bad_copy: section `.eh_frame_hdr' can't be allocated in segment 2
>> LOAD: .text .text.cold .eh_frame .eh_frame_hdr
> 
> It looks to me as if the problem was that objcopy tries to move the two
> sections from segment 4 to segment 2.

That was a red herring. The diagnostic emits an internal loop counter,
making the number reported entirely meaningless. There's no moving of
sections between segments, afaict right now. Back to square 1.

Jan
  

Patch

diff --git a/bfd/elf.c b/bfd/elf.c
index 74236a658fd9a10a61f466d9a2191998c2f4ce06..28d0143acca351203c840e6afc1dca710d6f671a 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -6152,7 +6152,12 @@  assign_file_positions_for_load_sections (bfd *abfd,
       if (m == phdr_load_seg)
 	{
 	  if (!m->includes_filehdr)
-	    p->p_offset = off;
+	    {
+	      if (((bfd_vma) off) < p->p_align)
+		p->p_offset = p->p_align;
+	      else
+		p->p_offset = off;
+	    }
 	  off += actual * bed->s->sizeof_phdr;
 	}