Use gen-as-const.py to process .pysym files

Message ID alpine.DEB.2.21.1812042206050.4787@digraph.polyomino.org.uk
State Superseded
Headers

Commit Message

Joseph Myers Dec. 4, 2018, 10:07 p.m. UTC
  This patch eliminates the gen-py-const.awk variant of gen-as-const,
switching to use of gnu-as-const.py (with a new --python option) to
process .pysym files (i.e., to generate nptl_lock_constants.py), as
the syntax of those files is identical to that of .sym files.

Note that the generated nptl_lock_constants.py is *not* identical to
the version generated by the awk script.  Apart from the trivial
changes (comment referencing the new script, and output being sorted),
the constant FUTEX_WAITERS, PTHREAD_MUTEXATTR_FLAG_BITS,
PTHREAD_MUTEXATTR_FLAG_PSHARED and PTHREAD_MUTEX_PRIO_CEILING_MASK are
now output as positive rather than negative constants (on x86_64
anyway; maybe not necessarily on 32-bit systems):

< FUTEX_WAITERS = -2147483648
---
> FUTEX_WAITERS = 2147483648

< PTHREAD_MUTEXATTR_FLAG_BITS = -251662336
< PTHREAD_MUTEXATTR_FLAG_PSHARED = -2147483648
---
> PTHREAD_MUTEXATTR_FLAG_BITS = 4043304960
> PTHREAD_MUTEXATTR_FLAG_PSHARED = 2147483648

< PTHREAD_MUTEX_PRIO_CEILING_MASK = -524288
---
> PTHREAD_MUTEX_PRIO_CEILING_MASK = 4294443008

This is because gen-as-const has a cast of the constant value to long
int, which gen-py-const lacks.

I think the positive values are more logically correct, since the
constants in question are in fact unsigned in C.  But to reliably
produce gen-as-const.py output for constants that always (in C and
Python) reflects the signedness of values with the high bit of "long
int" set would mean more complicated logic needs to be used in
computing values.

The more correct positive values by themselves produce a failure of
nptl/test-mutexattr-printers, because masking with
~PTHREAD_MUTEXATTR_FLAG_BITS & ~PTHREAD_MUTEX_NO_ELISION_NP now leaves
a bit -1 << 32 in the Python value, resulting in a KeyError exception.
To avoid that, places masking with ~ of one of the constants in
question are changed to mask with 0xffffffff as well (this reflects
how ~ in Python applies to an infinite-precision integer whereas ~ in
C does not do any promotions beyond the width of int).

Tested for x86_64.

2018-12-04  Joseph Myers  <joseph@codesourcery.com>

	* scripts/gen-as-const.py (main): Handle --python option.
	* scripts/gen-py-const.awk: Remove.
	* Makerules (py-const-script): Use gen-as-const.py.
	($(py-const)): Likewise.
	* nptl/nptl-printers.py (MutexPrinter.read_status_no_robust): Mask
	with 0xffffffff together with ~(PTHREAD_MUTEX_PRIO_CEILING_MASK).
	(MutexAttributesPrinter.read_values): Mask with 0xffffffff
	together with ~PTHREAD_MUTEXATTR_FLAG_BITS and
	~PTHREAD_MUTEX_NO_ELISION_NP.
	* manual/README.pretty-printers: Update reference to
	gen-py-const.awk.

---

This is functionally independent of
<https://sourceware.org/ml/libc-alpha/2018-12/msg00089.html> (Move
tst-signal-numbers to Python); whichever goes in second can readily be
updated for the trivial textual conflicts.
  

Comments

Andreas Schwab Dec. 4, 2018, 10:37 p.m. UTC | #1
On Dez 04 2018, Joseph Myers <joseph@codesourcery.com> wrote:

> The more correct positive values by themselves produce a failure of
> nptl/test-mutexattr-printers, because masking with
> ~PTHREAD_MUTEXATTR_FLAG_BITS & ~PTHREAD_MUTEX_NO_ELISION_NP now leaves
> a bit -1 << 32 in the Python value, resulting in a KeyError exception.
> To avoid that, places masking with ~ of one of the constants in
> question are changed to mask with 0xffffffff as well (this reflects
> how ~ in Python applies to an infinite-precision integer whereas ~ in
> C does not do any promotions beyond the width of int).

If ~ doesn't reflect C semantics then ^ 0xffffffff would be better.

Andreas.
  
Joseph Myers Dec. 4, 2018, 11 p.m. UTC | #2
On Tue, 4 Dec 2018, Andreas Schwab wrote:

> On Dez 04 2018, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> > The more correct positive values by themselves produce a failure of
> > nptl/test-mutexattr-printers, because masking with
> > ~PTHREAD_MUTEXATTR_FLAG_BITS & ~PTHREAD_MUTEX_NO_ELISION_NP now leaves
> > a bit -1 << 32 in the Python value, resulting in a KeyError exception.
> > To avoid that, places masking with ~ of one of the constants in
> > question are changed to mask with 0xffffffff as well (this reflects
> > how ~ in Python applies to an infinite-precision integer whereas ~ in
> > C does not do any promotions beyond the width of int).
> 
> If ~ doesn't reflect C semantics then ^ 0xffffffff would be better.

The constants in question remain negative on 32-bit systems both before 
and after this patch, so replacing ~CONST with CONST ^ 0xffffffff wouldn't 
achieve the result of removing the -1 << 32 bits.
  
Joseph Myers Dec. 10, 2018, 2:07 p.m. UTC | #3
Ping.  This patch 
<https://sourceware.org/ml/libc-alpha/2018-12/msg00118.html> is pending 
review.
  
Florian Weimer Dec. 10, 2018, 3:09 p.m. UTC | #4
* Joseph Myers:

> +# The output is redirected to a .py file; we'll import it in the
> +# pretty printers file to read the constants generated by
> +# gen-as-const.py.

I would drop the reference to pretty printers here …

> +    elif args.python:
> +        consts = compute_c_consts(sym_data, args.cc)
> +        print('# GENERATED FILE\n'
> +              '\n'
> +              '# Constant definitions for pretty printers.\n'
> +              '# See gen-as-const.py for details.\n')
> +        print('\n'.join('%s = %s' % c for c in sorted(consts.items())))

… and here since the code appears to be generic.

The masking changes look okay to me.

Thanks,
Florian
  
Andreas Schwab Dec. 10, 2018, 3:23 p.m. UTC | #5
On Dez 04 2018, Joseph Myers <joseph@codesourcery.com> wrote:

> @@ -157,6 +159,13 @@ def main():
>              sym_data.append('START')
>      if args.test:
>          print(gen_test(sym_data))
> +    elif args.python:
> +        consts = compute_c_consts(sym_data, args.cc)
> +        print('# GENERATED FILE\n'
> +              '\n'
> +              '# Constant definitions for pretty printers.\n'
> +              '# See gen-as-const.py for details.\n')
> +        print('\n'.join('%s = %s' % c for c in sorted(consts.items())))
>      else:
>          consts = compute_c_consts(sym_data, args.cc)
>          print(''.join('#define %s %s\n' % c for c in sorted(consts.items())), end='')

Please use ''.join instead of '\n'.join.

Andreas.
  

Patch

diff --git a/Makerules b/Makerules
index 8e49a73342..73ee61563e 100644
--- a/Makerules
+++ b/Makerules
@@ -232,42 +232,27 @@  ifdef gen-py-const-headers
 py-const-files := $(patsubst %.pysym,%.py,$(gen-py-const-headers))
 py-const-dir := $(objpfx)
 py-const := $(addprefix $(py-const-dir),$(py-const-files))
-py-const-script := $(..)scripts/gen-py-const.awk
+py-const-script := $(..)scripts/gen-as-const.py
 
 # This is a hack we use to generate .py files with constants for Python
-# pretty printers.  It works the same way as gen-as-const.
-# See scripts/gen-py-const.awk for details on how the awk | gcc mechanism
-# works.
+# pretty printers.
 #
-# $@.tmp and $@.tmp2 are temporary files we use to store the partial contents
-# of the target file.  We do this instead of just writing on $@ because, if the
-# build process terminates prematurely, re-running Make wouldn't run this rule
-# since Make would see that the target file already exists (despite it being
-# incomplete).
+# $@.tmp is a temporary file we use to store the partial contents of
+# the target file.  We do this instead of just writing on $@ because,
+# if the build process terminates prematurely, re-running Make
+# wouldn't run this rule since Make would see that the target file
+# already exists (despite it being incomplete).
 #
-# The sed line replaces "@name@SOME_NAME@value@SOME_VALUE@" strings from the
-# output of 'gcc -S' with "SOME_NAME = SOME_VALUE" strings.
-# The '-n' option, combined with the '/p' command, makes sed output only the
-# modified lines instead of the whole input file.  The output is redirected
-# to a .py file; we'll import it in the pretty printers file to read
-# the constants generated by gen-py-const.awk.
-# The regex has two capturing groups, for SOME_NAME and SOME_VALUE
-# respectively.  Notice SOME_VALUE may be prepended by a special character,
-# depending on the assembly syntax (e.g. immediates are prefixed by a '$'
-# in AT&T x86, and by a '#' in ARM).  We discard it using a complemented set
-# before the second capturing group.
+# The output is redirected to a .py file; we'll import it in the
+# pretty printers file to read the constants generated by
+# gen-as-const.py.
 $(py-const): $(py-const-dir)%.py: %.pysym $(py-const-script) \
 	     $(common-before-compile)
 	$(make-target-directory)
-	$(AWK) -f $(py-const-script) $< \
-	       | $(CC) -S -o $@.tmp $(CFLAGS) $(CPPFLAGS) -x c -
-	echo '# GENERATED FILE\n' > $@.tmp2
-	echo '# Constant definitions for pretty printers.' >> $@.tmp2
-	echo '# See gen-py-const.awk for details.\n' >> $@.tmp2
-	sed -n -r 's/^.*@name@([^@]+)@value@[^[:xdigit:]Xx-]*([[:xdigit:]Xx-]+)@.*/\1 = \2/p' \
-	    $@.tmp >> $@.tmp2
-	mv -f $@.tmp2 $@
-	rm -f $@.tmp
+	$(PYTHON) $(py-const-script) --python \
+		  --cc="$(CC) $(CFLAGS) $(CPPFLAGS)" $< \
+		  > $@.tmp
+	mv -f $@.tmp $@
 
 generated += $(py-const)
 endif  # gen-py-const-headers
diff --git a/manual/README.pretty-printers b/manual/README.pretty-printers
index 2522cb858d..a2ba89843c 100644
--- a/manual/README.pretty-printers
+++ b/manual/README.pretty-printers
@@ -104,7 +104,7 @@  Adding new pretty printers
 Adding new pretty printers to glibc requires following these steps:
 
 1. Identify which constants must be generated from C headers, and write the
-corresponding .pysym file.  See scripts/gen-py-const.awk for more information
+corresponding .pysym file.  See scripts/gen-as-const.py for more information
 on how this works.  The name of the .pysym file must be added to the
 'gen-py-const-headers' variable in your submodule's Makefile (without the .pysym
 extension).
diff --git a/nptl/nptl-printers.py b/nptl/nptl-printers.py
index c5a69ef1fd..104460c487 100644
--- a/nptl/nptl-printers.py
+++ b/nptl/nptl-printers.py
@@ -155,7 +155,7 @@  class MutexPrinter(object):
         lock_value = self.lock
 
         if self.kind & PTHREAD_MUTEX_PRIO_PROTECT_NP:
-            lock_value &= ~(PTHREAD_MUTEX_PRIO_CEILING_MASK)
+            lock_value &= 0xffffffff & ~(PTHREAD_MUTEX_PRIO_CEILING_MASK)
 
         if lock_value == PTHREAD_MUTEX_UNLOCKED:
             self.values.append(('Status', 'Not acquired'))
@@ -274,6 +274,7 @@  class MutexAttributesPrinter(object):
         """
 
         mutexattr_type = (self.mutexattr
+                          & 0xffffffff
                           & ~PTHREAD_MUTEXATTR_FLAG_BITS
                           & ~PTHREAD_MUTEX_NO_ELISION_NP)
 
diff --git a/scripts/gen-as-const.py b/scripts/gen-as-const.py
index eb85ef1aa0..086a265d93 100644
--- a/scripts/gen-as-const.py
+++ b/scripts/gen-as-const.py
@@ -129,6 +129,8 @@  def main():
                         help='C compiler (including options) to use')
     parser.add_argument('--test', action='store_true',
                         help='Generate test case instead of header')
+    parser.add_argument('--python', action='store_true',
+                        help='Generate Python file instead of header')
     parser.add_argument('sym_file',
                         help='.sym file to process')
     args = parser.parse_args()
@@ -157,6 +159,13 @@  def main():
             sym_data.append('START')
     if args.test:
         print(gen_test(sym_data))
+    elif args.python:
+        consts = compute_c_consts(sym_data, args.cc)
+        print('# GENERATED FILE\n'
+              '\n'
+              '# Constant definitions for pretty printers.\n'
+              '# See gen-as-const.py for details.\n')
+        print('\n'.join('%s = %s' % c for c in sorted(consts.items())))
     else:
         consts = compute_c_consts(sym_data, args.cc)
         print(''.join('#define %s %s\n' % c for c in sorted(consts.items())), end='')
diff --git a/scripts/gen-py-const.awk b/scripts/gen-py-const.awk
deleted file mode 100644
index 91220281c5..0000000000
--- a/scripts/gen-py-const.awk
+++ /dev/null
@@ -1,118 +0,0 @@ 
-# Script to generate constants for Python pretty printers.
-#
-# Copyright (C) 2016-2018 Free Software Foundation, Inc.
-# This file is part of the GNU C Library.
-#
-# The GNU C Library is free software; you can redistribute it and/or
-# modify it under the terms of the GNU Lesser General Public
-# License as published by the Free Software Foundation; either
-# version 2.1 of the License, or (at your option) any later version.
-#
-# The GNU C Library is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-# Lesser General Public License for more details.
-#
-# You should have received a copy of the GNU Lesser General Public
-# License along with the GNU C Library; if not, see
-# <http://www.gnu.org/licenses/>.
-
-# This script is a smaller version of the clever gen-asm-const.awk hack used to
-# generate ASM constants from .sym files.  We'll use this to generate constants
-# for Python pretty printers.
-#
-# The input to this script are .pysym files that look like:
-# #C_Preprocessor_Directive...
-# NAME1
-# NAME2 expression...
-#
-# A line giving just a name implies an expression consisting of just that name.
-# Comments start with '--'.
-#
-# The output of this script is a 'dummy' function containing 'asm' declarations
-# for each non-preprocessor line in the .pysym file.  The expression values
-# will appear as input operands to the 'asm' declaration.  For example, if we
-# have:
-#
-# /* header.h */
-# #define MACRO 42
-#
-# struct S {
-#     char c1;
-#     char c2;
-#     char c3;
-# };
-#
-# enum E {
-#     ZERO,
-#     ONE
-# };
-#
-# /* symbols.pysym */
-# #include <stddef.h>
-# #include "header.h"
-# -- This is a comment
-# MACRO
-# C3_OFFSET offsetof(struct S, c3)
-# E_ONE ONE
-#
-# the output will be:
-#
-# #include <stddef.h>
-# #include "header.h"
-# void dummy(void)
-# {
-#   asm ("@name@MACRO@value@%0@" : : "i" (MACRO));
-#   asm ("@name@C3_OFFSET@value@%0@" : : "i" (offsetof(struct S, c3)));
-#   asm ("@name@E_ONE@value@%0@" : : "i" (ONE));
-# }
-#
-# We'll later feed this output to gcc -S.  Since '-S' tells gcc to compile but
-# not assemble, gcc will output something like:
-#
-# dummy:
-# 	...
-# 	@name@MACRO@value@$42@
-# 	@name@C3_OFFSET@value@$2@
-# 	@name@E_ONE@value@$1@
-#
-# Finally, we can process that output to extract the constant values.
-# Notice gcc may prepend a special character such as '$' to each value.
-
-# found_symbol indicates whether we found a non-comment, non-preprocessor line.
-BEGIN { found_symbol = 0 }
-
-# C preprocessor directives go straight through.
-/^#/ { print; next; }
-
-# Skip comments.
-/--/ { next; }
-
-# Trim leading whitespace.
-{ sub(/^[[:blank:]]*/, ""); }
-
-# If we found a non-comment, non-preprocessor line, print the 'dummy' function
-# header.
-NF > 0 && !found_symbol {
-    print "void dummy(void)\n{";
-    found_symbol = 1;
-}
-
-# If the line contains just a name, duplicate it so we can use that name
-# as the value of the expression.
-NF == 1 { sub(/^.*$/, "& &"); }
-
-# If a line contains a name and an expression...
-NF > 1 {
-    name = $1;
-
-    # Remove any characters before the second field.
-    sub(/^[^[:blank:]]+[[:blank:]]+/, "");
-
-    # '$0' ends up being everything that appeared after the first field
-    # separator.
-    printf "  asm (\"@name@%s@value@%0@\" : : \"i\" (%s));\n", name, $0;
-}
-
-# Close the 'dummy' function.
-END { if (found_symbol) print "}"; }