Fix regcomp.c build for 32-bit with GCC mainline
Commit Message
On 11/20/2017 03:18 PM, Joseph Myers wrote:
> On Mon, 20 Nov 2017, Adhemerval Zanella wrote:
>
>> Why not use the already upstream fix from gnulib
>> (252b52457da7887667c036d18cc5169777615bb0) ?
> That change would not apply cleanly to current sources,
That should be easy enough to fix. Proposed patch attached.
> and if we have
> over five years of regex changes out of sync with gnulib, it doesn't seem
> a good idea to try to tie any possible resync with gnulib (in both
> directions) to fixing build failures to make it possible to detect build
> regressions when they occur (rather than piling one failure on top of
> another that isn't yet fixed, so making it hard to determine the
> problematic patch, which is the current state).
Although I didn't quite follow all that, I vaguely took it to mean "it's
too much trouble to try to keep glibc regex code close to Gnulib". But
if someone has taken the trouble to keep them closer (as in the attached
patch) then we might as well keep them as close as we reasonably can. On
the Gnulib side I've been trying to do that by merging glibc's changes,
as in the patch I installed into Gnulib today:
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=6dc8556f97cf3e7953249190b46465ad845761cf
From 99a25c4defd9ff20c9592d999c5ff2e932c22008 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 20 Nov 2017 16:25:49 -0800
Subject: [PATCH] regex: don't assume uint64_t or uint32_t
This avoids -Werror=overflow errors for 32-bit systems in
the 64-bit case. Problem reported by Joseph Myers in:
https://sourceware.org/ml/libc-alpha/2017-11/msg00694.html
Also, when this code is used in Gnulib it ports to platforms
that lack uint64_t and uint32_t. The C standard doesn't guarantee
them, and on some 32-bit compilers there is no uint64_t.
Problem reported by Gianluigi Tiesi in:
http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00154.html
* posix/regcomp.c (init_word_char): Don't assume that the types
uint64_t and uint32_t exist. Adapted from Gnulib patch
2012-05-27T06:40:00!eggert@cs.ucla.edu. See:
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=252b52457da7887667c036d18cc5169777615bb0
---
ChangeLog | 16 ++++++++++++++++
posix/regcomp.c | 29 +++++++++++++++--------------
2 files changed, 31 insertions(+), 14 deletions(-)
Comments
On Mon, 20 Nov 2017, Paul Eggert wrote:
> On 11/20/2017 03:18 PM, Joseph Myers wrote:
> > On Mon, 20 Nov 2017, Adhemerval Zanella wrote:
> >
> > > Why not use the already upstream fix from gnulib
> > > (252b52457da7887667c036d18cc5169777615bb0) ?
> > That change would not apply cleanly to current sources,
>
> That should be easy enough to fix. Proposed patch attached.
Please commit this fix.
> > and if we have
> > over five years of regex changes out of sync with gnulib, it doesn't seem
> > a good idea to try to tie any possible resync with gnulib (in both
> > directions) to fixing build failures to make it possible to detect build
> > regressions when they occur (rather than piling one failure on top of
> > another that isn't yet fixed, so making it hard to determine the
> > problematic patch, which is the current state).
>
> Although I didn't quite follow all that, I vaguely took it to mean "it's too
> much trouble to try to keep glibc regex code close to Gnulib". But if someone
It means that *in the context where we haven't had regression-free builds
with GCC and binutils mainline for over two months* (over two weeks even
if you disregard the old powerpc64le problems), and so investigation of
regressions is routinely complicated by them having appeared while things
were already broken for another reason, we want simple, safe fixes for
build regressions that can be understood and reviewed quickly.
Complicated, high-risk changes such as merging years of changes from
gnulib need more detailed review, which would leave the build broken for
longer, and if the build is already clean at the time such complicated
changes go in, it makes it a lot easier to identify when those changes
have caused build regressions.
Joseph Myers wrote:
> Complicated, high-risk changes such as merging years of changes from
> gnulib need more detailed review
No argument there! And thanks for the further explanation. I committed the patch.
On Nov 20 2017, Paul Eggert <eggert@cs.ucla.edu> wrote:
> This avoids -Werror=overflow errors for 32-bit systems in
> the 64-bit case. Problem reported by Joseph Myers in:
> https://sourceware.org/ml/libc-alpha/2017-11/msg00694.html
> Also, when this code is used in Gnulib it ports to platforms
> that lack uint64_t and uint32_t. The C standard doesn't guarantee
> them, and on some 32-bit compilers there is no uint64_t.
> Problem reported by Gianluigi Tiesi in:
This needs to go in a comment.
Andreas.
@@ -1,3 +1,19 @@
+2017-11-20 Paul Eggert <eggert@cs.ucla.edu>
+
+ regex: don't assume uint64_t or uint32_t
+ This avoids -Werror=overflow errors for 32-bit systems in
+ the 64-bit case. Problem reported by Joseph Myers in:
+ https://sourceware.org/ml/libc-alpha/2017-11/msg00694.html
+ Also, when this code is used in Gnulib it ports to platforms
+ that lack uint64_t and uint32_t. The C standard doesn't guarantee
+ them, and on some 32-bit compilers there is no uint64_t.
+ Problem reported by Gianluigi Tiesi in:
+ http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00154.html
+ * posix/regcomp.c (init_word_char): Don't assume that the types
+ uint64_t and uint32_t exist. Adapted from Gnulib patch
+ 2012-05-27T06:40:00!eggert@cs.ucla.edu. See:
+ https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=252b52457da7887667c036d18cc5169777615bb0
+
2017-11-20 Siddhesh Poyarekar <siddhesh@sourceware.org>
* sysdeps/aarch64/memset-reg.h: New file.
@@ -925,26 +925,26 @@ init_word_char (re_dfa_t *dfa)
int ch = 0;
if (BE (dfa->map_notascii == 0, 1))
{
- if (sizeof (dfa->word_char[0]) == 8)
- {
- /* The extra temporaries here avoid "implicitly truncated"
- warnings in the case when this is dead code, i.e. 32-bit. */
- const uint64_t wc0 = UINT64_C (0x03ff000000000000);
- const uint64_t wc1 = UINT64_C (0x07fffffe87fffffe);
- dfa->word_char[0] = wc0;
- dfa->word_char[1] = wc1;
+ bitset_word_t bits0 = 0x00000000;
+ bitset_word_t bits1 = 0x03ff0000;
+ bitset_word_t bits2 = 0x87fffffe;
+ bitset_word_t bits3 = 0x07fffffe;
+ if (BITSET_WORD_BITS == 64)
+ {
+ dfa->word_char[0] = bits1 << 31 << 1 | bits0;
+ dfa->word_char[1] = bits3 << 31 << 1 | bits2;
i = 2;
}
- else if (sizeof (dfa->word_char[0]) == 4)
+ else if (BITSET_WORD_BITS == 32)
{
- dfa->word_char[0] = UINT32_C (0x00000000);
- dfa->word_char[1] = UINT32_C (0x03ff0000);
- dfa->word_char[2] = UINT32_C (0x87fffffe);
- dfa->word_char[3] = UINT32_C (0x07fffffe);
+ dfa->word_char[0] = bits0;
+ dfa->word_char[1] = bits1;
+ dfa->word_char[2] = bits2;
+ dfa->word_char[3] = bits3;
i = 4;
}
else
- abort ();
+ goto general_case;
ch = 128;
if (BE (dfa->is_utf8, 1))
@@ -954,6 +954,7 @@ init_word_char (re_dfa_t *dfa)
}
}
+ general_case:
for (; i < BITSET_WORDS; ++i)
for (int j = 0; j < BITSET_WORD_BITS; ++j, ++ch)
if (isalnum (ch) || ch == '_')