[v3] gas: add .cv_ucomp and .cv_scomp pseudo-directives

Message ID 20241112014009.32623-1-mark@harmstone.com
State New
Series [v3] gas: add .cv_ucomp and .cv_scomp pseudo-directives

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Mark Harmstone Nov. 12, 2024, 1:40 a.m. UTC
Add .cv_ucomp and .cv_scomp pseudo-directives for object files for
Windows targets. They encode compressed CodeView integers according to
the algorithm in CVCompressData in
https://github.com/Microsoft/microsoft-pdb/blob/master/include/cvinfo.h.
This is essentially Microsoft's answer to LEB128, though it is used in
far fewer places.

CodeView uses these to encode the "binary annotations" in the
S_INLINESITE symbol, which express the relationship between code offsets
and line numbers in inlined functions. This has to be done in the
assembler as GCC doesn't know how many bytes each instruction takes up.
There's no equivalent of this in MSVC or LLVM, as in both cases the
assembler and compiler are integrated.

.cv_ucomp represents an unsigned big-endian integer between 0 and 0x1fffffff,
taking up 1, 2, or 4 bytes:

Value between 0 and 0x7f:

	0aaaaaaa -> 0aaaaaaa (identity-mapped)

Value between 0x80 and 0x3fff:

	00aaaaaa bbbbbbbb -> 10aaaaaa bbbbbbbb

Value between 0x4000 and 0x1fffffff:

	000aaaaa bbbbbbbb cccccccc dddddddd ->
	110aaaaa bbbbbbbb cccccccc dddddddd
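
As a rough illustration of the three layouts above, here is a minimal
standalone C sketch. The name cv_ucomp_encode and the <stdint.h> types are
assumptions made for this example only; the patch itself does this in
output_cv_comp using gas's offsetT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: encode VALUE as a
   compressed CodeView unsigned integer into OUT, which must have room
   for 4 bytes.  Returns the number of bytes written, or 0 if VALUE is
   larger than 0x1fffffff.  */
static size_t
cv_ucomp_encode (uint8_t *out, uint32_t value)
{
  assert (out != NULL);

  if (value <= 0x7f)
    {
      /* 0aaaaaaa: stored as-is.  */
      out[0] = (uint8_t) value;
      return 1;
    }

  if (value <= 0x3fff)
    {
      /* 10aaaaaa bbbbbbbb: high byte first.  */
      out[0] = (uint8_t) (0x80 | (value >> 8));
      out[1] = (uint8_t) (value & 0xff);
      return 2;
    }

  if (value <= 0x1fffffff)
    {
      /* 110aaaaa bbbbbbbb cccccccc dddddddd: high byte first.  */
      out[0] = (uint8_t) (0xc0 | (value >> 24));
      out[1] = (uint8_t) ((value >> 16) & 0xff);
      out[2] = (uint8_t) ((value >> 8) & 0xff);
      out[3] = (uint8_t) (value & 0xff);
      return 4;
    }

  return 0;
}
```

For example, 1337 (0x539) falls in the two-byte range and comes out as
85 39, which is what cv_comp.d expects for ".cv_ucomp 1337".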

.cv_scomp represents a signed big-endian integer between -0xfffffff and
0xfffffff, encoded according to EncodeSignedInt32 in cvinfo.h. The
absolute value of the integer is shifted left one bit, the LSB set
for a negative value, and the result expressed as if it were a
.cv_ucomp: cv_scomp(x) = cv_ucomp((abs(x) << 1) | (x < 0 ? 1 : 0))
---
Changelog:
v3:
* Moved output_cv_comp and sizeof_cv_comp to gas/codeview.c
* Gated .cv_ucomp and .cv_scomp behind TE_PE && O_secrel, to match existing
  CodeView code
* Fixed cv_comp test on aarch64
* Misc. minor changes as per Jan's suggestions

v2:
* Handle O_subtract case when symbols are in different fragments, and
  add test for this

 gas/as.h                       |   5 +-
 gas/codeview.c                 |  78 ++++++++++++++++++++++++
 gas/codeview.h                 |   2 +
 gas/read.c                     |  98 ++++++++++++++++++++++++++++++
 gas/testsuite/gas/pe/cv_comp.d |  14 +++++
 gas/testsuite/gas/pe/cv_comp.s | 105 +++++++++++++++++++++++++++++++++
 gas/testsuite/gas/pe/pe.exp    |   5 ++
 gas/write.c                    |  44 ++++++++++++++
 8 files changed, 350 insertions(+), 1 deletion(-)
 create mode 100644 gas/testsuite/gas/pe/cv_comp.d
 create mode 100644 gas/testsuite/gas/pe/cv_comp.s
  

Comments

Jan Beulich Nov. 12, 2024, 1:26 p.m. UTC | #1
On 12.11.2024 02:40, Mark Harmstone wrote:
> --- a/gas/codeview.c
> +++ b/gas/codeview.c
> @@ -533,6 +533,84 @@ codeview_generate_asm_lineno (void)
>    lf->num_lines++;
>  }
>  
> +/* Output a compressed CodeView integer.  The return value is the number of
> +   bytes used.  */
> +
> +unsigned int
> +output_cv_comp (char *p, offsetT value, int sign)
> +{
> +  char *orig = p;
> +
> +  if (sign)
> +    {
> +      if (value < -0xfffffff || value > 0xfffffff)
> +	as_fatal (_("value cannot be expressed as a .cv_scomp"));
> +    }
> +  else
> +    {
> +      if (value > 0x1fffffff)
> +	as_fatal (_("value cannot be expressed as a .cv_ucomp"));
> +    }

No as_fatal() please unless we really can't continue. as_bad() plus
returning 0 here ought to allow further progress, I think.

For the !sign case don't you also need to check for >= 0 (or cast
value to valueT / addressT)?
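
To illustrate the second point outside of gas: the sketch below is a
standalone approximation, not the posted patch or its eventual fix.
int64_t and uint64_t stand in for offsetT and valueT, and
cv_comp_in_range is a hypothetical name. In gas, a failed check would
presumably be reported with as_bad() before returning 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if VALUE can be expressed as a
   .cv_scomp (SIGN == 1) or a .cv_ucomp (SIGN == 0).  */
static bool
cv_comp_in_range (int64_t value, int sign)
{
  assert (sign == 0 || sign == 1);

  if (sign)
    return value >= -0xfffffff && value <= 0xfffffff;

  /* Casting to the unsigned type turns any negative VALUE into a very
     large number, so one comparison rejects both "too big" and
     "negative".  */
  return (uint64_t) value <= 0x1fffffff;
}
```

Without the cast (or an explicit value >= 0 test), a negative operand
such as -1 would pass the "value > 0x1fffffff" check in the unsigned
case.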

> --- a/gas/testsuite/gas/pe/pe.exp
> +++ b/gas/testsuite/gas/pe/pe.exp
> @@ -38,6 +38,11 @@ run_dump_test "section-exclude"
>  
>  run_dump_test "set"
>  
> +if { [istarget "aarch64-*-*"] || [istarget "i*86-*-*"]
> +    || [istarget "x86_64-*-*"] } then {
> +  run_dump_test "cv_comp"
> +}

What about arm*-*-*, which also has O_secrel?

Okay with respective adjustments and (as before) the testsuite passing
for all targets that the directive is usable with.

Jan
  
Mark Harmstone Nov. 13, 2024, 3:07 a.m. UTC | #2
Thanks Jan. Great, I'll push it with those changes made.

On 12/11/2024 13:26, Jan Beulich wrote:
> On 12.11.2024 02:40, Mark Harmstone wrote:
>> --- a/gas/codeview.c
>> +++ b/gas/codeview.c
>> @@ -533,6 +533,84 @@ codeview_generate_asm_lineno (void)
>>     lf->num_lines++;
>>   }
>>   
>> +/* Output a compressed CodeView integer.  The return value is the number of
>> +   bytes used.  */
>> +
>> +unsigned int
>> +output_cv_comp (char *p, offsetT value, int sign)
>> +{
>> +  char *orig = p;
>> +
>> +  if (sign)
>> +    {
>> +      if (value < -0xfffffff || value > 0xfffffff)
>> +	as_fatal (_("value cannot be expressed as a .cv_scomp"));
>> +    }
>> +  else
>> +    {
>> +      if (value > 0x1fffffff)
>> +	as_fatal (_("value cannot be expressed as a .cv_ucomp"));
>> +    }
> 
> No as_fatal() please unless we really can't continue. as_bad() plus
> returning 0 here ought to allow further progress, I think.
> 
> For the !sign case don't you also need to check for >= 0 (or cast
> value to valueT / addressT)?
> 
>> --- a/gas/testsuite/gas/pe/pe.exp
>> +++ b/gas/testsuite/gas/pe/pe.exp
>> @@ -38,6 +38,11 @@ run_dump_test "section-exclude"
>>   
>>   run_dump_test "set"
>>   
>> +if { [istarget "aarch64-*-*"] || [istarget "i*86-*-*"]
>> +    || [istarget "x86_64-*-*"] } then {
>> +  run_dump_test "cv_comp"
>> +}
> 
> What about arm*-*-*, which also has O_secrel?
> 
> Okay with respective adjustments and (as before) the testsuite passing
> for all targets that the directive is usable with.
> 
> Jan
  

Patch

diff --git a/gas/as.h b/gas/as.h
index 780773cd78c..3d5f710c5c5 100644
--- a/gas/as.h
+++ b/gas/as.h
@@ -263,7 +263,10 @@  enum _relax_state
   rs_dwarf2dbg,
 
   /* SFrame FRE type selection optimization.  */
-  rs_sframe
+  rs_sframe,
+
+  /* CodeView compressed integer.  */
+  rs_cv_comp,
 };
 
 typedef enum _relax_state relax_stateT;
diff --git a/gas/codeview.c b/gas/codeview.c
index 3eaa7a64fd2..445a3b97761 100644
--- a/gas/codeview.c
+++ b/gas/codeview.c
@@ -533,6 +533,84 @@  codeview_generate_asm_lineno (void)
   lf->num_lines++;
 }
 
+/* Output a compressed CodeView integer.  The return value is the number of
+   bytes used.  */
+
+unsigned int
+output_cv_comp (char *p, offsetT value, int sign)
+{
+  char *orig = p;
+
+  if (sign)
+    {
+      if (value < -0xfffffff || value > 0xfffffff)
+	as_fatal (_("value cannot be expressed as a .cv_scomp"));
+    }
+  else
+    {
+      if (value > 0x1fffffff)
+	as_fatal (_("value cannot be expressed as a .cv_ucomp"));
+    }
+
+  if (sign)
+    {
+      if (value >= 0)
+	value <<= 1;
+      else
+	value = (-value << 1) | 1;
+    }
+
+  if (value <= 0x7f)
+    {
+      *p++ = value;
+    }
+  else if (value <= 0x3fff)
+    {
+      *p++ = 0x80 | (value >> 8);
+      *p++ = value & 0xff;
+    }
+  else
+    {
+      *p++ = 0xc0 | (value >> 24);
+      *p++ = (value >> 16) & 0xff;
+      *p++ = (value >> 8) & 0xff;
+      *p++ = value & 0xff;
+    }
+
+  return p - orig;
+}
+
+/* Return the size needed to output a compressed CodeView integer.  */
+
+unsigned int
+sizeof_cv_comp (offsetT value, int sign)
+{
+  if (sign)
+    {
+      if (value < -0xfffffff || value > 0xfffffff)
+	return 0;
+
+      if (value >= 0)
+	value <<= 1;
+      else
+	value = (-value << 1) | 1;
+    }
+  else
+    {
+      if (value > 0x1fffffff)
+	return 0;
+    }
+
+  if (value <= 0x7f)
+    return 1;
+  else if (value <= 0x3fff)
+    return 2;
+  else if (value <= 0x1fffffff)
+    return 4;
+  else
+    return 0;
+}
+
 #else
 
 void
diff --git a/gas/codeview.h b/gas/codeview.h
index 57f8dfaedbf..ef2e78a9477 100644
--- a/gas/codeview.h
+++ b/gas/codeview.h
@@ -101,5 +101,7 @@  struct cv_line
 
 extern void codeview_finish (void);
 extern void codeview_generate_asm_lineno (void);
+extern unsigned int output_cv_comp (char *, offsetT, int);
+extern unsigned int sizeof_cv_comp (offsetT, int);
 
 #endif
diff --git a/gas/read.c b/gas/read.c
index aefbd7aefe8..589c7b080c2 100644
--- a/gas/read.c
+++ b/gas/read.c
@@ -265,6 +265,9 @@  static void poend (void);
 static size_t get_macro_line_sb (sb *);
 static void generate_file_debug (void);
 static char *_find_end_of_line (char *, int, int, int);
+#if defined (TE_PE) && defined (O_secrel)
+static void s_cv_comp (int sign);
+#endif
 
 void
 read_begin (void)
@@ -369,6 +372,10 @@  static const pseudo_typeS potable[] = {
   {"comm", s_comm, 0},
   {"common", s_mri_common, 0},
   {"common.s", s_mri_common, 1},
+#if defined (TE_PE) && defined (O_secrel)
+  {"cv_scomp", s_cv_comp, 1},
+  {"cv_ucomp", s_cv_comp, 0},
+#endif
   {"data", s_data, 0},
   {"dc", cons, 2},
   {"dc.a", cons, 0},
@@ -5457,6 +5464,97 @@  s_leb128 (int sign)
   demand_empty_rest_of_line ();
 }
 
+#if defined (TE_PE) && defined (O_secrel)
+
+/* Generate the appropriate fragments for a given expression to emit a
+   cv_comp value.  SIGN is 1 for cv_scomp, 0 for cv_ucomp.  */
+
+static void
+emit_cv_comp_expr (expressionS *exp, int sign)
+{
+  operatorT op = exp->X_op;
+
+  if (op == O_absent || op == O_illegal)
+    {
+      as_warn (_("zero assumed for missing expression"));
+      exp->X_add_number = 0;
+      op = O_constant;
+    }
+  else if (op == O_big)
+    {
+      as_bad (_("number invalid"));
+      exp->X_add_number = 0;
+      op = O_constant;
+    }
+  else if (op == O_register)
+    {
+      as_warn (_("register value used as expression"));
+      op = O_constant;
+    }
+
+  if (now_seg == absolute_section)
+    {
+      if (op != O_constant || exp->X_add_number != 0)
+	as_bad (_("attempt to store value in absolute section"));
+      abs_section_offset++;
+      return;
+    }
+
+  if ((op != O_constant || exp->X_add_number != 0) && in_bss ())
+    as_bad (_("attempt to store non-zero value in section `%s'"),
+	    segment_name (now_seg));
+
+  /* Let the backend know that subsequent data may be byte aligned.  */
+#ifdef md_cons_align
+  md_cons_align (1);
+#endif
+
+  if (op == O_constant)
+    {
+      offsetT value = exp->X_add_number;
+      unsigned int size;
+      char *p;
+
+      /* If we've got a constant, emit the thing directly right now.  */
+
+      size = sizeof_cv_comp (value, sign);
+      p = frag_more (size);
+      if (output_cv_comp (p, value, sign) > size)
+	abort ();
+    }
+  else
+    {
+      /* Otherwise, we have to create a variable sized fragment and
+	 resolve things later.  */
+
+      frag_var (rs_cv_comp, 4, 0, sign, make_expr_symbol (exp), 0, NULL);
+    }
+}
+
+/* Parse the .cv_ucomp and .cv_scomp pseudos.  */
+
+static void
+s_cv_comp (int sign)
+{
+  expressionS exp;
+
+#ifdef md_flush_pending_output
+  md_flush_pending_output ();
+#endif
+
+  do
+    {
+      expression (&exp);
+      emit_cv_comp_expr (&exp, sign);
+    }
+  while (*input_line_pointer++ == ',');
+
+  input_line_pointer--;
+  demand_empty_rest_of_line ();
+}
+
+#endif /* TE_PE && O_secrel */
+
 /* Code for handling base64 encoded strings.
    Based upon code in sharutils' lib/base64.c source file, written by
    Simon Josefsson.  Which was partially adapted from GNU MailUtils
diff --git a/gas/testsuite/gas/pe/cv_comp.d b/gas/testsuite/gas/pe/cv_comp.d
new file mode 100644
index 00000000000..3a87554c6a5
--- /dev/null
+++ b/gas/testsuite/gas/pe/cv_comp.d
@@ -0,0 +1,14 @@ 
+#objdump: -s -j .rdata
+#name: CodeView compressed integer test
+
+.*: .*
+
+Contents of section .rdata:
+ 0000 21002101 212a217f 21808021 853921bf  .*
+ 0010 ff21c000 400021c0 0f424021 dfffffff  .*
+ 0020 21002102 21542180 fe218100 218a7221  .*
+ 0030 c0007ffe 21c00080 0021c01e 848021df  .*
+ 0040 fffffe21 03215521 80ff2181 01218a73  .*
+ 0050 21c0007f ff21c000 800121c0 1e848121  .*
+ 0060 dfffffff 21022104 21900421 04210821  .*
+ 0070 a0082105 210921a0 0921.*
diff --git a/gas/testsuite/gas/pe/cv_comp.s b/gas/testsuite/gas/pe/cv_comp.s
new file mode 100644
index 00000000000..29183d5a346
--- /dev/null
+++ b/gas/testsuite/gas/pe/cv_comp.s
@@ -0,0 +1,105 @@ 
+	.section .rdata
+
+	.ascii		"!"
+	.cv_ucomp	0
+	.ascii		"!"
+	.cv_ucomp	1
+	.ascii		"!"
+	.cv_ucomp	42
+	.ascii		"!"
+	.cv_ucomp	127
+	.ascii		"!"
+	.cv_ucomp	128
+	.ascii		"!"
+	.cv_ucomp	1337
+	.ascii		"!"
+	.cv_ucomp	16383
+	.ascii		"!"
+	.cv_ucomp	16384
+	.ascii		"!"
+	.cv_ucomp	1000000
+	.ascii		"!"
+	.cv_ucomp	536870911
+
+	.ascii		"!"
+	.cv_scomp	0
+	.ascii		"!"
+	.cv_scomp	1
+	.ascii		"!"
+	.cv_scomp	42
+	.ascii		"!"
+	.cv_scomp	127
+	.ascii		"!"
+	.cv_scomp	128
+	.ascii		"!"
+	.cv_scomp	1337
+	.ascii		"!"
+	.cv_scomp	16383
+	.ascii		"!"
+	.cv_scomp	16384
+	.ascii		"!"
+	.cv_scomp	1000000
+	.ascii		"!"
+	.cv_scomp	268435455
+
+	.ascii		"!"
+	.cv_scomp	-1
+	.ascii		"!"
+	.cv_scomp	-42
+	.ascii		"!"
+	.cv_scomp	-127
+	.ascii		"!"
+	.cv_scomp	-128
+	.ascii		"!"
+	.cv_scomp	-1337
+	.ascii		"!"
+	.cv_scomp	-16383
+	.ascii		"!"
+	.cv_scomp	-16384
+	.ascii		"!"
+	.cv_scomp	-1000000
+	.ascii		"!"
+	.cv_scomp	-268435455
+
+	.ascii		"!"
+	# 2
+	.cv_ucomp	addr2 - addr1
+	.ascii		"!"
+	# 4
+	.cv_ucomp	addr3 - addr1
+	.ascii		"!"
+	# 4100
+	.cv_ucomp	addr4 - addr1
+
+	.ascii		"!"
+	# 2
+	.cv_scomp	addr2 - addr1
+	.ascii		"!"
+	# 4
+	.cv_scomp	addr3 - addr1
+	.ascii		"!"
+	# 4100
+	.cv_scomp	addr4 - addr1
+	.ascii		"!"
+	# -2
+	.cv_scomp	addr1 - addr2
+	.ascii		"!"
+	# -4
+	.cv_scomp	addr1 - addr3
+	.ascii		"!"
+	# -4100
+	.cv_scomp	addr1 - addr4
+	.ascii		"!"
+
+	.data
+	.space		1
+addr1: # .data + 0x1
+	.space		2
+addr2: # .data + 0x3
+	.space		2
+	# force new fragment
+	.text
+	.data
+addr3: # .data + 0x5
+	.space		0x1000
+addr4: # .data + 0x1005
diff --git a/gas/testsuite/gas/pe/pe.exp b/gas/testsuite/gas/pe/pe.exp
index 5f8396e5924..b667fbfe73a 100644
--- a/gas/testsuite/gas/pe/pe.exp
+++ b/gas/testsuite/gas/pe/pe.exp
@@ -38,6 +38,11 @@  run_dump_test "section-exclude"
 
 run_dump_test "set"
 
+if { [istarget "aarch64-*-*"] || [istarget "i*86-*-*"]
+    || [istarget "x86_64-*-*"] } then {
+  run_dump_test "cv_comp"
+}
+
 # SEH related tests
 
 # These tests are only for x86_64 targets
diff --git a/gas/write.c b/gas/write.c
index 853a9a012b7..a9c53d70f7f 100644
--- a/gas/write.c
+++ b/gas/write.c
@@ -26,6 +26,7 @@ 
 #include "output-file.h"
 #include "dwarf2dbg.h"
 #include "compress-debug.h"
+#include "codeview.h"
 
 #ifndef TC_FORCE_RELOCATION
 #define TC_FORCE_RELOCATION(FIX)		\
@@ -513,6 +514,32 @@  cvt_frag_to_fill (segT sec ATTRIBUTE_UNUSED, fragS *fragP)
       break;
 #endif
 
+#if defined (TE_PE) && defined (O_secrel)
+    case rs_cv_comp:
+      {
+	offsetT value = S_GET_VALUE (fragP->fr_symbol);
+	int size;
+
+	if (!S_IS_DEFINED (fragP->fr_symbol))
+	  {
+	    as_bad_where (fragP->fr_file, fragP->fr_line,
+			  _(".cv_%ccomp operand is an undefined symbol: %s"),
+			  fragP->fr_subtype ? 's' : 'u',
+			  S_GET_NAME (fragP->fr_symbol));
+	  }
+
+	size = output_cv_comp (fragP->fr_literal + fragP->fr_fix, value,
+			       fragP->fr_subtype);
+
+	fragP->fr_fix += size;
+	fragP->fr_type = rs_fill;
+	fragP->fr_var = 0;
+	fragP->fr_offset = 0;
+	fragP->fr_symbol = NULL;
+      }
+      break;
+#endif
+
     default:
       BAD_CASE (fragP->fr_type);
       break;
@@ -2767,6 +2794,9 @@  relax_segment (struct frag *segment_frag_root, segT segment, int pass)
 #endif
 
 	case rs_leb128:
+#if defined (TE_PE) && defined (O_secrel)
+	case rs_cv_comp:
+#endif
 	  /* Initial guess is always 1; doing otherwise can result in
 	     stable solutions that are larger than the minimum.  */
 	  address += fragP->fr_offset = 1;
@@ -3120,6 +3150,20 @@  relax_segment (struct frag *segment_frag_root, segT segment, int pass)
 		}
 		break;
 
+#if defined (TE_PE) && defined (O_secrel)
+	      case rs_cv_comp:
+		{
+		  valueT value;
+		  offsetT size;
+
+		  value = resolve_symbol_value (fragP->fr_symbol);
+		  size = sizeof_cv_comp (value, fragP->fr_subtype);
+		  growth = size - fragP->fr_offset;
+		  fragP->fr_offset = size;
+		}
+	      break;
+#endif
+
 	      case rs_cfa:
 		growth = eh_frame_relax_frag (fragP);
 		break;