[v4] Use a proper C tokenizer to implement the obsolete typedefs test.

The test for obsolete typedefs in installed headers was implemented
using grep, and could therefore get false positives on e.g. “ulong”
in a comment.  It was also scanning all of the headers included by
our headers, and therefore testing headers we don’t control, e.g.
Linux kernel headers.

This patch splits the obsolete-typedef test from
scripts/check-installed-headers.sh to a separate program,
scripts/check-obsolete-constructs.py.  Because the new program is
written in Python, it is feasible to make it tokenize C accurately
enough to avoid false positives on the contents of comments and
strings.  It also only
examines $(headers) in each subdirectory--all the headers we install,
but not any external dependencies of those headers.  Headers whose
installed name starts with finclude/ are ignored, on the assumption
that they contain Fortran.
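
To illustrate why tokenizing avoids the false positives that grep
produced, here is a minimal sketch.  It is not the code in
scripts/check-obsolete-constructs.py: the TOKEN_RE pattern, the
OBSOLETE set and the find_obsolete function are hypothetical names
invented for this example, and the lexer is far less complete than a
real C tokenizer.

```python
import re

# Hypothetical, heavily simplified C lexer.  Comments, string literals
# and character constants are each consumed as a single token, so an
# identifier that appears inside them is never seen on its own.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>   /\*.*?\*/ | //[^\n]* )
  | (?P<STRING>    "(?:[^"\\\n]|\\.)*" )
  | (?P<CHARCONST> '(?:[^'\\\n]|\\.)*' )
  | (?P<IDENT>     [A-Za-z_][A-Za-z0-9_]* )
  | (?P<OTHER>     . )
""", re.VERBOSE | re.DOTALL)

# Illustrative subset only; the real test checks a longer list.
OBSOLETE = {"ulong", "ushort", "uint", "u_long", "quad_t", "u_quad_t"}


def find_obsolete(source):
    """Return (line, name) pairs for obsolete typedef names that occur
    as identifier tokens, ignoring comments and string literals."""
    hits = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup == "IDENT" and m.group() in OBSOLETE:
            line = source.count("\n", 0, m.start()) + 1
            hits.append((line, m.group()))
    return hits


sample = (
    '/* ulong is obsolete */\n'
    'const char *s = "ulong";\n'
    'ulong counter;\n'
)
print(find_obsolete(sample))  # [(3, 'ulong')]
```

A grep for "ulong" would flag all three lines of the sample; the
token-based scan reports only the declaration on line 3.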

It is also feasible to make the new test understand the difference
between _defining_ the obsolete typedefs and _using_ the obsolete
typedefs, which means posix/{bits,sys}/types.h no longer need to be
exempted.  This uncovered an actual bug in bits/types.h: __quad_t and
__u_quad_t were being used to define __S64_TYPE, __U64_TYPE,
__SQUAD_TYPE and __UQUAD_TYPE.  These are changed to __int64_t and
__uint64_t respectively.  This is a safe change, despite the comments
in bits/types.h claiming a difference between __quad_t and __int64_t,
because those comments are incorrect.  In all current ABIs, both
__quad_t and __int64_t are ‘long’ when ‘long’ is a 64-bit type, and
‘long long’ when ‘long’ is a 32-bit type, and similarly for __u_quad_t
and __uint64_t.  (Changing the types to be what the comments say they
are would be an ABI break, as it affects C++ name mangling.)  This
patch includes a minimal change to make the comments not completely
wrong.  I plan to remove __SQUAD_TYPE and __UQUAD_TYPE altogether in
subsequent patches, but that would be inappropriate for backporting to
release branches.
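
The distinction between defining and using an obsolete typedef can be
sketched as follows.  This is only an assumption about one workable
heuristic, not a description of the new script's logic: the classify
function, the WORD_OR_PUNCT pattern and the OBSOLETE set are
hypothetical, and the input is assumed to have had comments and
strings removed already.

```python
import re

# Hypothetical splitter: identifiers, or any single non-space character.
WORD_OR_PUNCT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")

# Illustrative subset of the names under test.
OBSOLETE = {"__quad_t", "__u_quad_t", "quad_t", "u_quad_t"}


def classify(source):
    """Split occurrences of obsolete names into definitions and uses.

    Heuristic (an assumption of this sketch): a name is being defined
    when it is the last token of a typedef declaration, i.e. it sits
    directly before the terminating ';'.  Anything else is a use.
    """
    definitions, uses = [], []
    tokens = WORD_OR_PUNCT.findall(source)
    in_typedef = False
    for i, tok in enumerate(tokens):
        if tok == "typedef":
            in_typedef = True
        elif tok == ";":
            in_typedef = False
        elif tok in OBSOLETE:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if in_typedef and nxt == ";":
                definitions.append(tok)
            else:
                uses.append(tok)
    return definitions, uses


header = (
    "typedef long int __quad_t;\n"
    "#define __S64_TYPE __quad_t\n"
)
print(classify(header))  # (['__quad_t'], ['__quad_t'])
```

Under this heuristic the typedef line counts as a permitted
definition, while the #define line is reported as a use, which is the
kind of occurrence the bits/types.h fix removes.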

sys/types.h was defining the legacy BSD u_intN_t typedefs using a
construct that was not necessarily consistent with how the C99 uintN_t
typedefs are defined, and is also too complicated for the new script to
understand (it lexes C relatively accurately, but it does not attempt
to expand preprocessor macros, nor does it do any actual parsing).
This patch cuts all of that out and uses bits/types.h's __uintN_t typedefs
to define u_intN_t instead.  This is verified to not change the ABI on
any supported architecture, via the c++-types test, which means u_intN_t
and uintN_t were, in fact, consistent on all supported architectures.
I plan to restrict u_intN_t and some other legacy typedefs (but not
intN_t) to __USE_MISC in subsequent patches, but again that would be
inappropriate for backporting to release branches.

	* scripts/check-obsolete-constructs.py: New test script.
	* scripts/check-installed-headers.sh: Remove tests for
	obsolete typedefs, superseded by check-obsolete-constructs.py.
	* Rules: Run scripts/check-obsolete-constructs.py over $(headers)
	as a special test.  Update commentary.
	* posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
	(__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
	Update commentary.
	* posix/sys/types.h (__u_intN_t): Remove.
	(u_int8_t): Typedef using __uint8_t.
	(u_int16_t): Typedef using __uint16_t.
	(u_int32_t): Typedef using __uint32_t.
	(u_int64_t): Typedef using __uint64_t.
---

Changes since v3: Fortran headers are now detected by path
(finclude/*) instead of looking for Emacs mode annotations within the
file.  tokenize_c is now responsible for issuing errors for the BAD_*
and OTHER lexical productions, and no longer returns BAD_* tokens.
It is also responsible for tracking whether or not each token belongs
to a preprocessing directive line, and accurately tokenizes #include
arguments.  (Accurate tokenization of #include arguments will be
required by future patches I have planned.)

---
 Rules                                |  17 +-
 posix/bits/types.h                   |  10 +-
 posix/sys/types.h                    |  33 +-
 scripts/check-installed-headers.sh   |  37 +--
 scripts/check-obsolete-constructs.py | 466 +++++++++++++++++++++++++++
 5 files changed, 500 insertions(+), 63 deletions(-)
 create mode 100755 scripts/check-obsolete-constructs.py

2.20.1
