[RFC,2/3] gas: add a means to programmatically determine the assembler version

Message ID 85359ef4-5c18-40c8-afbb-edc163813c59@suse.com
State New
Series gas: pre-defined symbols and .errif / .warnif

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply

Commit Message

Jan Beulich Dec. 9, 2024, 9:20 a.m. UTC
More than once I have wanted a way to know the gas version in assembly
sources, perhaps for use with .if. Add such a pre-defined symbol, but
expose it only when -v is used at least twice (three times to also emit
the symbols to the symbol table).
Together with attempting to use non-symbol characters as separators
(thus requiring quotation), this is to keep the risk of collisions with
users' symbols as low as possible.

To compensate for the effect the first -v has, also introduce --silent
(to override that meaning of -v).

Seeing that the "verbose" variable presently has no consumers, re-use it
for the purposes here. Limit its visibility though, just to be on the
safe side (e.g. with out-of-tree changes). If access is needed from
elsewhere, an accessor function may want introducing instead.

Similarly permit determining whether the assembler is a released
version. The exact value probably isn't of much use; it's more the
defined-ness that one might care about. Yet the symbol needs to have
some value anyway.

While fiddling with documentation, split -v from -version/--version.
They aren't the same; the long form of -v is --verbose, which so far
wasn't mentioned at all.
---
Subsequently we may want to consider whether target-specific code should
introduce further such symbols to indicate e.g. new ISA extensions they
support (e.g. __x86_apx__ or just __apx__ on x86, with underscores again
replaced as is done here).

The string literal near the top of predefine_symbol() effectively
becomes part of the gas specification: It may not easily have characters
re-ordered (let alone dropped) for example, and even adding characters
at the end may pose compatibility issues.

The version2 test would certainly be nice to run for non-ELF as well,
yet I can't seem to be able to think of a way to sensibly arrange for
that (checking for the absence of any _other_ symbols).

The ECOFF/XCOFF special casing of the 3rd test feels bogus. I wonder if
this doesn't need to be XFAIL instead. At the example of XCOFF:
ppc_saw_abs isn't set for local symbols, yet ppc_adjust_symtab() doesn't
distinguish local and non-local ones.
  

Comments

Michael Matz Dec. 11, 2024, 2:06 p.m. UTC | #1
Hello,

On Mon, 9 Dec 2024, Jan Beulich wrote:

> It has been more than once that I would have wanted to have a way to
> know the gas version in assembly sources, perhaps for use with .if. Add
> such a pre-defined symbol, yet key exposure to -v being used at least
> twice (three times to also emit the symbols to the symbol table).

Only defining the symbol with -vv or more is beyond quirky.  One would 
never be able to rely on these symbols in .s sources then, possibly except 
for test cases.  The common user wanting such symbol (for .if. or 
otherwise, which is very reasonable!) wouldn't know that those .s files 
need to be specifically assembled with -vv.  If the user learns of that, 
then that prints the version, so the user then needs to know even more, 
namely to use --silent.  Which then suppressed printing the version, but 
_not_ the emission of the symbols.

That's not a sensible interface :-)

Just define the symbols always, the usage of non-symbol chars should make 
all of this be reasonably non-conflicting with existing code.  If you 
decide that you absolutely _must_ have a cmdline option to control this, 
then don't make it be -v.  (I really think this should not depend on any 
options!)

Suggestion: make the symbols for .if. usage that are predefined be of a 
dependable format: e.g. 'GAS_DEFINED(foobar)' (where "foobar" would be the 
different names, but the whole string, including parens is the symbol).
That way the probability of conflicts with current code is about zero and 
the symbol names are somewhat systematic.


Ciao,
Michael.
  
Jan Beulich Dec. 12, 2024, 10:06 a.m. UTC | #2
On 11.12.2024 15:06, Michael Matz wrote:
> Hello,
> 
> On Mon, 9 Dec 2024, Jan Beulich wrote:
> 
>> It has been more than once that I would have wanted to have a way to
>> know the gas version in assembly sources, perhaps for use with .if. Add
>> such a pre-defined symbol, yet key exposure to -v being used at least
>> twice (three times to also emit the symbols to the symbol table).
> 
> Only defining the symbol with -vv or more is beyond quirky.  One would 
> never be able to rely on these symbols in .s sources then, possibly except 
> for test cases.  The common user wanting such symbol (for .if. or 
> otherwise, which is very reasonable!) wouldn't know that those .s files 
> need to be specifically assembled with -vv.  If the user learns of that, 
> then that prints the version, so the user then needs to know even more, 
> namely to use --silent.  Which then suppressed printing the version, but 
> _not_ the emission of the symbols.
> 
> That's not a sensible interface :-)
> 
> Just define the symbols always, the usage of non-symbol chars should make 
> all of this be reasonably non-conflicting with existing code.  If you 
> decide that you absolutely _must_ have a cmdline option to control this, 
> then don't make it be -v.  (I really think this should not depend on any 
> options!)

Hmm, maybe I was indeed overly concerned of still possible collisions with
what people might use. The advent of quoted symbol names, despite remaining
quirks there, has - after all - provided people with great freedom as to
the naming of their symbols.

> Suggestion: make the symbols for .if. usage that are predefined be of a 
> dependable format: e.g. 'GAS_DEFINED(foobar)' (where "foobar" would be the 
> different names, but the whole string, including parens is the symbol).
> That way the probability of conflicts with current code is about zero and 
> the symbol names are somewhat systematic.

I'll have to think about this. It surely has advantages over the initial
naming scheme I used; it merely feels as if this could also end up
misleading / ambiguous, but to support / discard that concern I'll need to
play some with it.

Jan
  

Patch

--- a/gas/as.c
+++ b/gas/as.c
@@ -95,7 +95,7 @@  int chunksize = 0;
 int debug_memory = 0;
 
 /* Enable verbose mode.  */
-int verbose = 0;
+static int verbose = 0;
 
 /* Which version of DWARF CIE to produce.  This default value of -1
    indicates that this value has not been set yet, a default value is
@@ -461,6 +461,9 @@  parse_args (int * pargc, char *** pargv)
       OPTION_NOCPP,
       OPTION_STATISTICS,
       OPTION_VERSION,
+      OPTION_VERBOSE2,
+      OPTION_VERBOSE3,
+      OPTION_SILENT,
       OPTION_DUMPCONFIG,
       OPTION_EMULATION,
       OPTION_DEBUG_PREFIX_MAP,
@@ -577,10 +580,13 @@  parse_args (int * pargc, char *** pargv)
     ,{"no-pad-sections", no_argument, NULL, OPTION_NO_PAD_SECTIONS}
     ,{"no-warn", no_argument, NULL, 'W'}
     ,{"reduce-memory-overheads", no_argument, NULL, OPTION_REDUCE_MEMORY_OVERHEADS}
+    ,{"silent", no_argument, NULL, OPTION_SILENT}
     ,{"statistics", no_argument, NULL, OPTION_STATISTICS}
     ,{"strip-local-absolute", no_argument, NULL, OPTION_STRIP_LOCAL_ABSOLUTE}
     ,{"version", no_argument, NULL, OPTION_VERSION}
     ,{"verbose", no_argument, NULL, 'v'}
+    ,{"vv", no_argument, NULL, OPTION_VERBOSE2}
+    ,{"vvv", no_argument, NULL, OPTION_VERBOSE3}
     ,{"target-help", no_argument, NULL, OPTION_TARGET_HELP}
     ,{"traditional-format", no_argument, NULL, OPTION_TRADITIONAL_FORMAT}
     ,{"warn", no_argument, NULL, OPTION_WARN}
@@ -631,7 +637,9 @@  parse_args (int * pargc, char *** pargv)
 	  if (optc == 'v')
 	    {
 	case 'v':
-	      verbose = 1;
+	      if (verbose < 0)
+		verbose = -verbose;
+	      ++verbose;
 	      break;
 	    }
 	  else if (is_a_char (optc))
@@ -718,6 +726,23 @@  This program has absolutely no warranty.
 #endif
 	  exit (EXIT_SUCCESS);
 
+	case OPTION_VERBOSE2:
+	  if (verbose < 0)
+	    verbose = -verbose;
+	  verbose += 2;
+	  break;
+
+	case OPTION_VERBOSE3:
+	  if (verbose < 0)
+	    verbose = -verbose;
+	  verbose += 3;
+	  break;
+
+	case OPTION_SILENT:
+	  if (verbose > 0)
+	    verbose = -verbose;
+	  break;
+
 	case OPTION_EMULATION:
 #ifdef USE_EMULATIONS
 	  if (strcmp (optarg, this_emulation->name))
@@ -1153,6 +1178,50 @@  This program has absolutely no warranty.
 #endif
 }
 
+/* Depending on command line options, optionally pre-define a symbol with its
+   name derived from TMPL (by replacing underscores if possible), to value
+   VAL.  */
+
+void
+predefine_symbol (const char *tmpl, valueT val)
+{
+  static const char separators[] = ".@$?%&#";
+  static char sep;
+  char name[64 /* Arbitrary, yet ought to fit all needs.  */];
+  symbolS *s;
+
+  if (abs (verbose) <= 1)
+    return;
+
+  /* To limit the risk of name collisions, try to find a non-symbol char to
+     use as separator.  */
+  if (sep == '\0')
+    for (const char *p = separators; *p != '\0'; ++p)
+      if (!is_name_beginner (*p) && !is_part_of_name (*p))
+	{
+	  sep = *p;
+	  break;
+	}
+
+  if (sep == '\0')
+    sep = '_';
+
+  for (char *p = name; ; ++p, ++tmpl)
+    {
+      *p = *tmpl != '_' ? *tmpl : sep;
+      if (*tmpl == '\0')
+	break;
+    }
+
+  /* Also put the symbol in the symbol table in yet more "verbose" mode.  */
+  if (abs (verbose) > 2)
+    s = symbol_new (name, absolute_section, &zero_address_frag, val);
+  else
+    s = symbol_create (name, absolute_section, &zero_address_frag, val);
+  S_CLEAR_EXTERNAL (s);
+  symbol_table_insert (s);
+}
+
 static void
 dump_statistics (void)
 {
@@ -1221,6 +1290,10 @@  perform_an_assembly_pass (int argc, char
   subseg_set (text_section, 0);
 #endif
 
+  predefine_symbol ("__gas_version__", BFD_VERSION);
+  if (strstr (BFD_VERSION_STRING, "." XSTRING (BFD_VERSION_DATE)) != NULL)
+    predefine_symbol ("__gas_date__", BFD_VERSION_DATE);
+
   /* This may add symbol table entries, which requires having an open BFD,
      and sections already created.  */
   md_begin ();
--- a/gas/as.h
+++ b/gas/as.h
@@ -410,9 +410,6 @@  extern unsigned int dwarf_level;
 /* Maximum level of macro nesting.  */
 extern int max_macro_nest;
 
-/* Verbosity level.  */
-extern int verbose;
-
 struct obstack;
 
 /* Obstack chunk size.  Keep large for efficient space use, make small to
@@ -510,6 +507,7 @@  void   as_report_context (void);
 const char * as_where (unsigned int *);
 const char * as_where_top (unsigned int *);
 const char * as_where_physical (unsigned int *);
+void   predefine_symbol (const char *, valueT);
 void   bump_line_counters (void);
 void   do_scrub_begin (int);
 void   input_scrub_begin (void);
--- a/gas/doc/as.texi
+++ b/gas/doc/as.texi
@@ -257,9 +257,11 @@  gcc(1), ld(1), and the Info entries for
  [@b{-o} @var{objfile}] [@b{-R}]
  [@b{--scfi=experimental}]
  [@b{--sectname-subst}]
+ [@b{--silent}]
  [@b{--size-check=[error|warning]}]
  [@b{--statistics}]
- [@b{-v}] [@b{-version}] [@b{--version}]
+ [@b{-v}] [@b{--verbose}]
+ [@b{-version}] [@b{--version}]
  [@b{-W}] [@b{--no-warn}] [@b{--warn}] [@b{--fatal-warnings}]
  [@b{-w}] [@b{-x}]
  [@b{-Z}] [@b{@@@var{FILE}}]
@@ -961,6 +963,11 @@  Honor substitution sequences in section
 @xref{Section Name Substitutions,,@code{.section @var{name}}}.
 @end ifclear
 
+@item --silent
+Since @option{-v}, when used multiple times, has dual purpose, this option
+suppresses the emission of the version ID message from any earlier @option{-v}.
+Later uses of @option{-v} will further override this.
+
 @item --size-check=error
 @itemx --size-check=warning
 Issue an error or warning for invalid ELF .size directive.
@@ -974,10 +981,15 @@  assembly.
 Remove local absolute symbols from the outgoing symbol table.
 
 @item -v
-@itemx -version
-Print the @command{as} version.
+@itemx --verbose
+Print the @command{as} version.  When used more than once, it also controls the
+insertion of certain pre-defined symbols in the symbol table: When used twice,
+those symbols become available for use in source code, without inserting them
+into the symbol table.  When used three times, such symbols are also inserted
+in the symbol table. (@pxref{Predefined Symbols})
 
 @item --version
+@itemx -version
 Print the @command{as} version and exit.
 
 @item -W
@@ -3896,6 +3908,7 @@  the same order they were declared.  This
 * Symbol Names::                Symbol Names
 * Dot::                         The Special Dot Symbol
 * Symbol Attributes::           Symbol Attributes
+* Predefined Symbols::          Predefined Symbols
 @end menu
 
 @node Labels
@@ -4237,6 +4250,30 @@  Language Reference Manual} (HP 92432-900
 @code{EXPORT} assembler directive documentation.
 @end ifset
 
+@node Predefined Symbols
+@section Predefined Symbols
+
+If enabled, certain pre-defined symbols will be made available for use, and
+possibly also inserted in the symbol table.  Below @code{_} is used as a word
+separator.  The actual separator used is target specific, and will typically
+be a character not usable in plain symbol names.  Access to these symbols will
+therefore normally require quotation.
+
+Independent of the specific target, the following symbols can be made
+available:
+@itemize @bullet
+
+@item __gas_version__
+The version of the assembler, expressed as @samp{major} @code{*} 100000000
+@code{+} @samp{minor} @code{*} 1000000 @code{+} @samp{rev} @code{*} 10000.
+
+@item __gas_date__
+The date of the assembler sources (which may not be the date the assembler was
+built).  This is added only for non-release versions of gas.  The specific
+value probably better isn't checked for, just its defined-ness.
+
+@end itemize
+
 @node Expressions
 @chapter Expressions
 
--- a/gas/testsuite/gas/all/gas.exp
+++ b/gas/testsuite/gas/all/gas.exp
@@ -566,3 +566,39 @@  run_dump_test "multibyte1"
 run_dump_test "multibyte2"
 run_list_test "multibyte3" "--multibyte-handling=warn"
 run_list_test "multibyte3" "-f --multibyte-handling=warn"
+
+# ONLY_STANDARD_ESCAPES targets can't deal with the macro-argument-like
+# expansion used in the test.
+# EVAX has an extra line printed by objdump when there are no relocations,
+# and doesn't appear to enter absolute symbols into the symbol table.
+# HPPA has too many flavors, quite a few of which won't properly handle
+# the tests.
+# IA-64 doesn't like the # that the test probes for, as that's a suffix
+# operator there.
+# The TI C54x backend segfaults apparently because of the use of $, after
+# having claimed over 90k times that it would have stopped substitution
+# symbol recursion.
+switch -glob $target_triplet {
+    alpha-*-*vms* { }
+    avr-*-* { }
+    *c54x*-*-* { }
+    cris*-*-* { }
+    hppa*-*-* { }
+    ia64-*-* { }
+    msp430*-*-* { }
+    z80-*-* { }
+    default {
+	run_dump_test "version"
+	# Non-ELF symbol tables may include section symbols.
+	# RL78 includes a special absolute symbol.
+	if { [is_elf_format] && ![istarget "rl78*-*-*"] } {
+	    run_dump_test "version2"
+	}
+	# ECOFF/XCOFF don't look to (reliably) emit local absolute symbols.
+	if { ![is_xcoff_format]
+	     && ![istarget "alpha-*-linux*ecoff*"]
+	     && ![istarget "alpha-*-osf*"] } {
+	    run_dump_test "version3"
+	}
+    }
+}
--- /dev/null
+++ b/gas/testsuite/gas/all/version.d
@@ -0,0 +1,9 @@ 
+#as: -vv -silent
+#as: -v -v -silent
+#objdump: -rsj .data
+#name: pre-defined version symbol
+
+.*: +file format .*
+
+Contents of section .data:
+ 0+ [0-9a-f]*[1-9a-f][0-9a-f]* .*
--- /dev/null
+++ b/gas/testsuite/gas/all/version.s
@@ -0,0 +1,6 @@ 
+	.data
+	.irpc sep, ".@$?%&#_"
+	.ifdef "\sep\sep\()gas\sep\()version\sep\sep"
+	.dc.l "\sep\sep\()gas\sep\()version\sep\sep"
+	.endif
+	.endr
--- /dev/null
+++ b/gas/testsuite/gas/all/version2.d
@@ -0,0 +1,5 @@ 
+#as: -vv -silent
+#as: -v -v -silent
+#nm: --quiet
+#name: pre-defined version symbol (empty symbol table)
+#source: version.s
--- /dev/null
+++ b/gas/testsuite/gas/all/version3.d
@@ -0,0 +1,9 @@ 
+#as: -vvv -silent
+#as: -v -v -v -silent
+#nm: -f bsd
+#name: pre-defined version symbol (non-empty symbol table)
+#source: version.s
+
+#...
+.* a ..gas.version..
+#pass