Fix writing object contents above 4GB in tekhex format

Message ID 20241023151112.112480-1-szedani@linux.ibm.com
State New
Headers
Series Fix writing object contents above 4GB in tekhex format |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Dani Szebenyi Oct. 23, 2024, 3:10 p.m. UTC
  Writing tekhex object with addresses above 4GB resulted in a corrupted object
contents, because the upper 32-bit of the address were silently ignored
in `writevalue`.
While reading a 64-bit address section did work in tekhex format, with the
address length encoded as 0.

This code adds the neccesary support for writing the upper bits of the
address, while not changing the existing behavior for addresses below
4GB.

Signed-off-by: Dani Szebenyi <szedani@linux.ibm.com>
---
 bfd/tekhex.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Jens Remus Oct. 23, 2024, 4:04 p.m. UTC | #1
Hello Dani!

On 23.10.2024 17:10, Dani Szebenyi wrote:
> Writing tekhex object with addresses above 4GB resulted in a corrupted object
> contents, because the upper 32-bit of the address were silently ignored
> in `writevalue`.
> While reading a 64-bit address section did work in tekhex format, with the
> address length encoded as 0.

The extended Tektronix hex format encodes an address as one (hex?)digit 
address-field-length N followed by the N hexadecimal digits address.

Both getvalue() and getsym() treat the address length as hexadecimal and 
a value of 0 as if it was decimal 16 when reading tekhex:

len = hex_value (*src++);
if (len == 0)
   len = 16;

> This code adds the neccesary support for writing the upper bits of the
> address, while not changing the existing behavior for addresses below
> 4GB.

Therefore I agree with you that it would be fair to also do so in 
writevalue() when writing tekhex.

Btw. the following likely unofficial documentation agrees with this 
encoding (using "0" for 16):
http://ftp.dataio.com/main/manuals/unifam/translation%20formats.pdf

> Signed-off-by: Dani Szebenyi <szedani@linux.ibm.com>
> ---
>   bfd/tekhex.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/bfd/tekhex.c b/bfd/tekhex.c
> index 96022c585ab..46f40b19817 100644
> --- a/bfd/tekhex.c
> +++ b/bfd/tekhex.c
> @@ -719,11 +719,11 @@ writevalue (char **dst, bfd_vma value)
>     int len;
>     int shift;
>   
> -  for (len = 8, shift = 28; shift; shift -= 4, len--)
> +  for (len = 16, shift = 60; shift; shift -= 4, len--)
>       {
>         if ((value >> shift) & 0xf)
>   	{
> -	  *p++ = len + '0';
> +	  *p++ = digs[len & 0xf];
>   	  while (len)
>   	    {
>   	      *p++ = digs[(value >> shift) & 0xf];

Looks good to me! Hope Nick/Alan agree.

With this patch I do wonder whether gdb/testsuite/gdb.base/dump.exp 
could also be changed to enable test reload of tekhex on 64-bit 
architectures?

Thanks and regards,
Jens
  
Alan Modra Oct. 24, 2024, 7:33 a.m. UTC | #2
On Wed, Oct 23, 2024 at 06:04:34PM +0200, Jens Remus wrote:
> Hello Dani!
> 
> On 23.10.2024 17:10, Dani Szebenyi wrote:
> > Writing tekhex object with addresses above 4GB resulted in a corrupted object
> > contents, because the upper 32-bit of the address were silently ignored
> > in `writevalue`.
> > While reading a 64-bit address section did work in tekhex format, with the
> > address length encoded as 0.
> 
> The extended Tektronix hex format encodes an address as one (hex?)digit
> address-field-length N followed by the N hexadecimal digits address.
> 
> Both getvalue() and getsym() treat the address length as hexadecimal and a
> value of 0 as if it was decimal 16 when reading tekhex:
> 
> len = hex_value (*src++);
> if (len == 0)
>   len = 16;
> 
> > This code adds the neccesary support for writing the upper bits of the
> > address, while not changing the existing behavior for addresses below
> > 4GB.
> 
> Therefore I agree with you that it would be fair to also do so in
> writevalue() when writing tekhex.
> 
> Btw. the following likely unofficial documentation agrees with this encoding
> (using "0" for 16):
> http://ftp.dataio.com/main/manuals/unifam/translation%20formats.pdf
> 
> > Signed-off-by: Dani Szebenyi <szedani@linux.ibm.com>
> > ---
> >   bfd/tekhex.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/bfd/tekhex.c b/bfd/tekhex.c
> > index 96022c585ab..46f40b19817 100644
> > --- a/bfd/tekhex.c
> > +++ b/bfd/tekhex.c
> > @@ -719,11 +719,11 @@ writevalue (char **dst, bfd_vma value)
> >     int len;
> >     int shift;
> > -  for (len = 8, shift = 28; shift; shift -= 4, len--)
> > +  for (len = 16, shift = 60; shift; shift -= 4, len--)
> >       {
> >         if ((value >> shift) & 0xf)
> >   	{
> > -	  *p++ = len + '0';
> > +	  *p++ = digs[len & 0xf];
> >   	  while (len)
> >   	    {
> >   	      *p++ = digs[(value >> shift) & 0xf];
> 
> Looks good to me! Hope Nick/Alan agree.

Um, no.  You'll get undefined behaviour when bfd_vma is 32 bits.

Huh, and it looks like the function is broken in other ways.  If, for
example, "value" is 3 you'll get the same output as "value" 0.
So I rewrote it and decided I ought to test it..

echo -n "123" > bin
binutils/objcopy -I binary bin -O tekhex xxx
binutils/objdump -s xxx
binutils/objdump: xxx: file format not recognized

Oh great, let's see what is going on.  After debugging a little,
I see we have a problem with *ABS* symbols, and after fixing that,
the tekhex file created above:
%47621103132330000000000000000000000000000000000000000000000000000000000
%103EB5.data11013
%1F3015.data40_binary_bin_star10
%1E3C85.data4F_binary_bin_end13
%1F3585*ABS*20_binary_bin_size13
%0781010

now can be dumped.
binutils/objdump -sx xxx

xxx:     file format tekhex
xxx
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00000003  00000000  00000000  00000000  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
00000000 g       .data _binary_bin_star
00000003 g       .data _binary_bin_end
00000003 g       *ABS* _binary_bin_size


Contents of section .data:
 0000 313233                               123             

	* tekhex.c (first_phase): Handle *ABS* symbols.
	(writevalue): Rewrite.

diff --git a/bfd/tekhex.c b/bfd/tekhex.c
index 96022c585ab..16e8e185572 100644
--- a/bfd/tekhex.c
+++ b/bfd/tekhex.c
@@ -395,7 +395,7 @@ first_phase (bfd *abfd, int type, char *src, char * src_end)
 	  if (!n)
 	    return false;
 	  memcpy (n, sym, len + 1);
-	  section = bfd_make_section (abfd, n);
+	  section = bfd_make_section_old_way (abfd, n);
 	  if (section == NULL)
 	    return false;
 	}
@@ -719,24 +719,13 @@ writevalue (char **dst, bfd_vma value)
   int len;
   int shift;
 
-  for (len = 8, shift = 28; shift; shift -= 4, len--)
-    {
-      if ((value >> shift) & 0xf)
-	{
-	  *p++ = len + '0';
-	  while (len)
-	    {
-	      *p++ = digs[(value >> shift) & 0xf];
-	      shift -= 4;
-	      len--;
-	    }
-	  *dst = p;
-	  return;
+  for (len = BFD_ARCH_SIZE / 4, shift = len * 4 - 4; len > 1; shift -= 4, len--)
+    if ((value >> shift) & 0xf)
+      break;
 
-	}
-    }
-  *p++ = '1';
-  *p++ = '0';
+  *p++ = digs[len & 0xf];
+  for (; len; shift -= 4, len--)
+    *p++ = digs[(value >> shift) & 0xf];
   *dst = p;
 }
  
Jens Remus Oct. 24, 2024, 8:22 a.m. UTC | #3
Hello Alan!

On 24.10.2024 09:33, Alan Modra wrote:
> On Wed, Oct 23, 2024 at 06:04:34PM +0200, Jens Remus wrote:
>> On 23.10.2024 17:10, Dani Szebenyi wrote:
...>>>    bfd/tekhex.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/bfd/tekhex.c b/bfd/tekhex.c
>>> index 96022c585ab..46f40b19817 100644
>>> --- a/bfd/tekhex.c
>>> +++ b/bfd/tekhex.c
>>> @@ -719,11 +719,11 @@ writevalue (char **dst, bfd_vma value)
>>>      int len;
>>>      int shift;
>>> -  for (len = 8, shift = 28; shift; shift -= 4, len--)
>>> +  for (len = 16, shift = 60; shift; shift -= 4, len--)
>>>        {
>>>          if ((value >> shift) & 0xf)
>>>    	{
>>> -	  *p++ = len + '0';
>>> +	  *p++ = digs[len & 0xf];
>>>    	  while (len)
>>>    	    {
>>>    	      *p++ = digs[(value >> shift) & 0xf];
>>
>> Looks good to me! Hope Nick/Alan agree.
> 
> Um, no.  You'll get undefined behaviour when bfd_vma is 32 bits.

Aww, I did not think of that.

> Huh, and it looks like the function is broken in other ways.  If, for
> example, "value" is 3 you'll get the same output as "value" 0.

Good catch!

> So I rewrote it and decided I ought to test it..
> 
> echo -n "123" > bin
> binutils/objcopy -I binary bin -O tekhex xxx
> binutils/objdump -s xxx
> binutils/objdump: xxx: file format not recognized

Dani did not notice when testing his use use case in GDB (dump tekhex ...).

> Oh great, let's see what is going on.  After debugging a little,
> I see we have a problem with *ABS* symbols, and after fixing that,
> the tekhex file created above:
> %47621103132330000000000000000000000000000000000000000000000000000000000
> %103EB5.data11013
> %1F3015.data40_binary_bin_star10
> %1E3C85.data4F_binary_bin_end13
> %1F3585*ABS*20_binary_bin_size13
> %0781010
> 
> now can be dumped.
> binutils/objdump -sx xxx
> 
> xxx:     file format tekhex
> xxx
> architecture: UNKNOWN!, flags 0x00000010:
> HAS_SYMS
> start address 0x00000000
> 
> Sections:
> Idx Name          Size      VMA       LMA       File off  Algn
>    0 .data         00000003  00000000  00000000  00000000  2**0
>                    CONTENTS, ALLOC, LOAD, DATA
> SYMBOL TABLE:
> 00000000 g       .data _binary_bin_star
> 00000003 g       .data _binary_bin_end
> 00000003 g       *ABS* _binary_bin_size
> 
> 
> Contents of section .data:
>   0000 313233                               123

Awesome, thank you!

> 	* tekhex.c (first_phase): Handle *ABS* symbols.
> 	(writevalue): Rewrite.
> 
> diff --git a/bfd/tekhex.c b/bfd/tekhex.c
> index 96022c585ab..16e8e185572 100644
> --- a/bfd/tekhex.c
> +++ b/bfd/tekhex.c
> @@ -395,7 +395,7 @@ first_phase (bfd *abfd, int type, char *src, char * src_end)
>   	  if (!n)
>   	    return false;
>   	  memcpy (n, sym, len + 1);
> -	  section = bfd_make_section (abfd, n);
> +	  section = bfd_make_section_old_way (abfd, n);
>   	  if (section == NULL)
>   	    return false;
>   	}
> @@ -719,24 +719,13 @@ writevalue (char **dst, bfd_vma value)
>     int len;
>     int shift;
>   
> -  for (len = 8, shift = 28; shift; shift -= 4, len--)
> -    {
> -      if ((value >> shift) & 0xf)
> -	{
> -	  *p++ = len + '0';
> -	  while (len)
> -	    {
> -	      *p++ = digs[(value >> shift) & 0xf];
> -	      shift -= 4;
> -	      len--;
> -	    }
> -	  *dst = p;
> -	  return;
> +  for (len = BFD_ARCH_SIZE / 4, shift = len * 4 - 4; len > 1; shift -= 4, len--)
> +    if ((value >> shift) & 0xf)
> +      break;
>   
> -	}
> -    }
> -  *p++ = '1';
> -  *p++ = '0';
> +  *p++ = digs[len & 0xf];
> +  for (; len; shift -= 4, len--)
> +    *p++ = digs[(value >> shift) & 0xf];
>     *dst = p;
>   }

Looks good to me. I see you already commited your patch. Thank you for 
taking care!

Regards,
Jens
  

Patch

diff --git a/bfd/tekhex.c b/bfd/tekhex.c
index 96022c585ab..46f40b19817 100644
--- a/bfd/tekhex.c
+++ b/bfd/tekhex.c
@@ -719,11 +719,11 @@  writevalue (char **dst, bfd_vma value)
   int len;
   int shift;
 
-  for (len = 8, shift = 28; shift; shift -= 4, len--)
+  for (len = 16, shift = 60; shift; shift -= 4, len--)
     {
       if ((value >> shift) & 0xf)
 	{
-	  *p++ = len + '0';
+	  *p++ = digs[len & 0xf];
 	  while (len)
 	    {
 	      *p++ = digs[(value >> shift) & 0xf];