[v2] __builtin_expect cleanup for iconv{,data}/*.c (was: Re: [PATCH] __builtin_expect cleanup for iconvdata/*.c)
Commit Message
This is the second iteration of these patches. The rewriting script has
grown considerably and seems useful for subsequent clean-ups. The first
patch installs it as scripts/builtin_expect.py.
The second patch is the automated part of the conversion. The changes
look reasonable to me, but double-checking is certainly welcome.
The third patch contains the manual part. It is now much smaller
because the rewriting script is more sophisticated.
Again, I used a preprocessor #define to disable __builtin_expect and
objdump -d --reloc to compare the results. Because of a line merge and
the resulting line number difference in an assert statement, there was
one assembly difference.
Testing on Fedora 20 x86_64 also revealed no regressions.
Comments
Florian Weimer <fweimer@redhat.com> writes:
> --- a/iconv/gconv_simple.c
> +++ b/iconv/gconv_simple.c
> @@ -387,8 +387,7 @@ ucs4_internal_loop_single (struct __gconv_step *step,
> return __GCONV_INCOMPLETE_INPUT;
> }
>
> - if (__builtin_expect (((unsigned char *) state->__value.__wchb)[0] > 0x80,
> - 0))
> + if (__glibc_unlikely (((unsigned char *) state->__value.__wchb)[0] > 0x80))
I think that is a bug in the original code. Should compare > 0x7f.
Likewise in internal_ucs4_loop_unaligned.
Contrast to ucs4_internal_loop, which compares inval > 0x7fffffff.
> - if (__builtin_expect (flags & GCONV_AVOID_NOCONV, 0)
> + if (__glibc_unlikely (flags & GCONV_AVOID_NOCONV)
This looks a bit suspicious because it is not obvious that the
flags & GCONV_AVOID_NOCONV expression always evaluates to 0 or 1.
I'm not sure __glibc_unlikely ((flags & GCONV_AVOID_NOCONV) != 0)
would be any nicer, though.
My concern was that the code is not very different from
__builtin_expect (x & 2, 1), from which an excessively clever
compiler could figure out that the expectation cannot ever hold.
Anyway, the code is different enough to avoid any such problem.
> - /*if (__builtin_expect (ch, 0) == __UNKNOWN_10646_CHAR)*/ \
> + /*if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR))*/ \
> if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR)) \
I don't think that comment has any value left.
I didn't finish reading the patch yet.
On 09/10/2014 01:55 AM, Kalle Olavi Niemitalo wrote:
> Florian Weimer <fweimer@redhat.com> writes:
>
>> --- a/iconv/gconv_simple.c
>> +++ b/iconv/gconv_simple.c
>> @@ -387,8 +387,7 @@ ucs4_internal_loop_single (struct __gconv_step *step,
>> return __GCONV_INCOMPLETE_INPUT;
>> }
>>
>> - if (__builtin_expect (((unsigned char *) state->__value.__wchb)[0] > 0x80,
>> - 0))
>> + if (__glibc_unlikely (((unsigned char *) state->__value.__wchb)[0] > 0x80))
>
> I think that is a bug in the original code. Should compare > 0x7f.
> Likewise in internal_ucs4_loop_unaligned.
> Contrast to ucs4_internal_loop, which compares inval > 0x7fffffff.
This might be the case. But it seems this code is effectively
dead—iconv only converts complete multi-byte sequences, and the mbr*
need a matching locale, and we haven't got a UCS-4 locale (multi-byte
character sets which regularly use NUL bytes do not work with C).
Please file a bug if you think this still needs fixing, especially if
you can write a test case which shows the issue.
>> - if (__builtin_expect (flags & GCONV_AVOID_NOCONV, 0)
>> + if (__glibc_unlikely (flags & GCONV_AVOID_NOCONV)
>
> This looks a bit suspicious because it is not obvious that the
> flags & GCONV_AVOID_NOCONV expression always evaluates to 0 or 1.
> I'm not sure __glibc_unlikely ((flags & GCONV_AVOID_NOCONV) != 0)
> would be any nicer, though.
Has glibc a rule not to rely on implicit booleans? Than the != 0
variant would be preferred.
>> - /*if (__builtin_expect (ch, 0) == __UNKNOWN_10646_CHAR)*/ \
>> + /*if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR))*/ \
>> if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR)) \
>
> I don't think that comment has any value left.
Okay, I'll remove it.
> I didn't finish reading the patch yet.
Thanks so far!
Florian Weimer <fweimer@redhat.com> writes:
> Has glibc a rule not to rely on implicit booleans? Than the != 0
> variant would be preferred.
I'm not sure. Roland McGrath has objected to implicit Boolean coercion
before; see <https://sourceware.org/ml/libc-alpha/2012-09/msg00218.html>
and <https://sourceware.org/ml/libc-alpha/2014-06/msg00119.html>.
If there is a consensus, then I guess someone should edit
<https://sourceware.org/glibc/wiki/Style_and_Conventions>, in which
some examples of good patterns nowadays do "if (buf)" or similar.
The GNU Coding Standards don't discuss the issue, AFAICS.
The preferred style in glibc has always been to eschew implicit Boolean
coercions, except for the return value of strcmp/memcmp and the like (where
the most common idiomatic uses treat the value as a Boolean even though
nonzero values have further meaning).
On 09/10/2014 02:07 PM, Roland McGrath wrote:
> The preferred style in glibc has always been to eschew implicit Boolean
> coercions, except for the return value of strcmp/memcmp and the like (where
> the most common idiomatic uses treat the value as a Boolean even though
> nonzero values have further meaning).
>
Wikified.
https://sourceware.org/glibc/wiki/Style_and_Conventions#Boolean_Coercions
c.
> On 09/10/2014 02:07 PM, Roland McGrath wrote:
> > The preferred style in glibc has always been to eschew implicit Boolean
> > coercions, except for the return value of strcmp/memcmp and the like (where
> > the most common idiomatic uses treat the value as a Boolean even though
> > nonzero values have further meaning).
> >
>
> Wikified.
>
> https://sourceware.org/glibc/wiki/Style_and_Conventions#Boolean_Coercions
I forgot to mention the other exception: if (foo & BIT) or if (!(foo & BIT))
is fine, no need for if ((foo & BIT) != 0) or if ((foo & BIT) == 0).
On 09/12/2014 01:52 PM, Roland McGrath wrote:
>> On 09/10/2014 02:07 PM, Roland McGrath wrote:
>>> The preferred style in glibc has always been to eschew implicit Boolean
>>> coercions, except for the return value of strcmp/memcmp and the like (where
>>> the most common idiomatic uses treat the value as a Boolean even though
>>> nonzero values have further meaning).
>>>
>>
>> Wikified.
>>
>> https://sourceware.org/glibc/wiki/Style_and_Conventions#Boolean_Coercions
>
> I forgot to mention the other exception: if (foo & BIT) or if (!(foo & BIT))
> is fine, no need for if ((foo & BIT) != 0) or if ((foo & BIT) == 0).
Wikified++
c.
On 09/12/2014 07:52 PM, Roland McGrath wrote:
> I forgot to mention the other exception: if (foo & BIT) or if (!(foo & BIT))
> is fine, no need for if ((foo & BIT) != 0) or if ((foo & BIT) == 0).
Thanks, so there shouldn't be a need to change the patch.
From 25282e6d2d0bb5d9314ec02845747aa9c86c5606 Mon Sep 17 00:00:00 2001
From: Florian Weimer <fweimer@redhat.com>
Date: Tue, 9 Sep 2014 20:37:28 +0200
Subject: [PATCH 3/3] Manual cleanups after __builtin_expect conversion of
iconv/ and iconvdata/
This commit removes the one remaining __builtin_expect expressions which
can be converted, and fixes long lines introduces by the automated
rewriting.
2014-09-09 Florian Weimer <fweimer@redhat.com>
* iconv/gconv_dl.c (__gconv_find_shlib): Wrap long line resulting
from the automated __builtin_expect conversion.
* iconvdata/euc-kr.c (euckr_from_ucs4): Likewise.
* iconv/gconv_cache.c (__gconv_load_cache): Replace multi-line
__builtin_expect with __glibc_unlikely.
@@ -116,9 +116,9 @@ __gconv_load_cache (void)
|| __glibc_unlikely (header->string_offset >= cache_size)
|| __glibc_unlikely (header->hash_offset >= cache_size)
|| __glibc_unlikely (header->hash_size == 0)
- || __builtin_expect ((header->hash_offset
+ || __glibc_unlikely ((header->hash_offset
+ header->hash_size * sizeof (struct hash_entry))
- > cache_size, 0)
+ > cache_size)
|| __glibc_unlikely (header->module_offset >= cache_size)
|| __glibc_unlikely (header->otherconv_offset > cache_size))
{
@@ -93,7 +93,8 @@ __gconv_find_shlib (const char *name)
found->counter = -TRIES_BEFORE_UNLOAD - 1;
found->handle = NULL;
- if (__glibc_unlikely (__tsearch (found, &loaded, known_compare) == NULL))
+ if (__glibc_unlikely (__tsearch (found, &loaded, known_compare)
+ == NULL))
{
/* Something went wrong while inserting the entry. */
free (found);
@@ -38,7 +38,8 @@ euckr_from_ucs4 (uint32_t ch, unsigned char *cp)
cp[0] = '\xa3';
cp[1] = '\xdc';
}
- else if (__glibc_likely (ucs4_to_ksc5601 (ch, cp, 2) != __UNKNOWN_10646_CHAR))
+ else if (__glibc_likely (ucs4_to_ksc5601 (ch, cp, 2)
+ != __UNKNOWN_10646_CHAR))
{
cp[0] |= 0x80;
cp[1] |= 0x80;
--
1.9.3