[v2] __builtin_expect cleanup for iconv{,data}/*.c (was: Re: [PATCH] __builtin_expect cleanup for iconvdata/*.c)

Message ID 540F501B.2080808@redhat.com
State Under Review, archived
Headers

Commit Message

Florian Weimer Sept. 9, 2014, 7:08 p.m. UTC
  This is the second iteration of these patches.  The rewriting script has 
grown considerably and seems useful for subsequent clean-ups.  The first 
patch installs it as scripts/builtin_expect.py.

The second patch is the automated part of the conversion.  The changes 
look reasonable to me, but double-checking is certainly welcome.

The third patch contains the manual part.  It is now much smaller 
because the rewriting script is more sophisticated.

Again, I used a preprocessor #define to disable __builtin_expect and 
objdump -d --reloc to compare the results.  Because of a line merge and 
the resulting line number difference in an assert statement, there was 
one assembly difference.

Testing on Fedora 20 x86_64 also revealed no regressions.
  

Comments

Kalle Olavi Niemitalo Sept. 9, 2014, 11:55 p.m. UTC | #1
Florian Weimer <fweimer@redhat.com> writes:

> --- a/iconv/gconv_simple.c
> +++ b/iconv/gconv_simple.c
> @@ -387,8 +387,7 @@ ucs4_internal_loop_single (struct __gconv_step *step,
>        return __GCONV_INCOMPLETE_INPUT;
>      }
>  
> -  if (__builtin_expect (((unsigned char *) state->__value.__wchb)[0] > 0x80,
> -			0))
> +  if (__glibc_unlikely (((unsigned char *) state->__value.__wchb)[0] > 0x80))

I think that is a bug in the original code.  Should compare > 0x7f.
Likewise in internal_ucs4_loop_unaligned.
Contrast to ucs4_internal_loop, which compares inval > 0x7fffffff.

> -  if (__builtin_expect (flags & GCONV_AVOID_NOCONV, 0)
> +  if (__glibc_unlikely (flags & GCONV_AVOID_NOCONV)

This looks a bit suspicious because it is not obvious that the
flags & GCONV_AVOID_NOCONV expression always evaluates to 0 or 1.
I'm not sure __glibc_unlikely ((flags & GCONV_AVOID_NOCONV) != 0)
would be any nicer, though.

My concern was that the code is not very different from
__builtin_expect (x & 2, 1), from which an excessively clever
compiler could figure out that the expectation cannot ever hold.
Anyway, the code is different enough to avoid any such problem.

> -	    /*if (__builtin_expect (ch, 0) == __UNKNOWN_10646_CHAR)*/	      \
> +	    /*if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR))*/	      \
>  	    if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR))		      \

I don't think that comment has any value left.

I didn't finish reading the patch yet.
  
Florian Weimer Sept. 10, 2014, 7:45 a.m. UTC | #2
On 09/10/2014 01:55 AM, Kalle Olavi Niemitalo wrote:
> Florian Weimer <fweimer@redhat.com> writes:
>
>> --- a/iconv/gconv_simple.c
>> +++ b/iconv/gconv_simple.c
>> @@ -387,8 +387,7 @@ ucs4_internal_loop_single (struct __gconv_step *step,
>>         return __GCONV_INCOMPLETE_INPUT;
>>       }
>>
>> -  if (__builtin_expect (((unsigned char *) state->__value.__wchb)[0] > 0x80,
>> -			0))
>> +  if (__glibc_unlikely (((unsigned char *) state->__value.__wchb)[0] > 0x80))
>
> I think that is a bug in the original code.  Should compare > 0x7f.
> Likewise in internal_ucs4_loop_unaligned.
> Contrast to ucs4_internal_loop, which compares inval > 0x7fffffff.

This might be the case.  But it seems this code is effectively 
dead—iconv only converts complete multi-byte sequences, and the mbr* 
need a matching locale, and we haven't got a UCS-4 locale (multi-byte 
character sets which regularly use NUL bytes do not work with C).

Please file a bug if you think this still needs fixing, especially if 
you can write a test case which shows the issue.

>> -  if (__builtin_expect (flags & GCONV_AVOID_NOCONV, 0)
>> +  if (__glibc_unlikely (flags & GCONV_AVOID_NOCONV)
>
> This looks a bit suspicious because it is not obvious that the
> flags & GCONV_AVOID_NOCONV expression always evaluates to 0 or 1.
> I'm not sure __glibc_unlikely ((flags & GCONV_AVOID_NOCONV) != 0)
> would be any nicer, though.

Has glibc a rule not to rely on implicit booleans?  Than the != 0 
variant would be preferred.

>> -	    /*if (__builtin_expect (ch, 0) == __UNKNOWN_10646_CHAR)*/	      \
>> +	    /*if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR))*/	      \
>>   	    if (__glibc_unlikely (ch == __UNKNOWN_10646_CHAR))		      \
>
> I don't think that comment has any value left.

Okay, I'll remove it.

> I didn't finish reading the patch yet.

Thanks so far!
  
Kalle Olavi Niemitalo Sept. 10, 2014, 5:53 p.m. UTC | #3
Florian Weimer <fweimer@redhat.com> writes:

> Has glibc a rule not to rely on implicit booleans?  Than the != 0
> variant would be preferred.

I'm not sure.  Roland McGrath has objected to implicit Boolean coercion
before; see <https://sourceware.org/ml/libc-alpha/2012-09/msg00218.html>
and <https://sourceware.org/ml/libc-alpha/2014-06/msg00119.html>.
If there is a consensus, then I guess someone should edit
<https://sourceware.org/glibc/wiki/Style_and_Conventions>, in which
some examples of good patterns nowadays do "if (buf)" or similar.
The GNU Coding Standards don't discuss the issue, AFAICS.
  
Roland McGrath Sept. 10, 2014, 6:07 p.m. UTC | #4
The preferred style in glibc has always been to eschew implicit Boolean
coercions, except for the return value of strcmp/memcmp and the like (where
the most common idiomatic uses treat the value as a Boolean even though
nonzero values have further meaning).
  
Carlos O'Donell Sept. 12, 2014, 1:45 a.m. UTC | #5
On 09/10/2014 02:07 PM, Roland McGrath wrote:
> The preferred style in glibc has always been to eschew implicit Boolean
> coercions, except for the return value of strcmp/memcmp and the like (where
> the most common idiomatic uses treat the value as a Boolean even though
> nonzero values have further meaning).
> 

Wikified.

https://sourceware.org/glibc/wiki/Style_and_Conventions#Boolean_Coercions

c.
  
Roland McGrath Sept. 12, 2014, 5:52 p.m. UTC | #6
> On 09/10/2014 02:07 PM, Roland McGrath wrote:
> > The preferred style in glibc has always been to eschew implicit Boolean
> > coercions, except for the return value of strcmp/memcmp and the like (where
> > the most common idiomatic uses treat the value as a Boolean even though
> > nonzero values have further meaning).
> > 
> 
> Wikified.
> 
> https://sourceware.org/glibc/wiki/Style_and_Conventions#Boolean_Coercions

I forgot to mention the other exception: if (foo & BIT) or if (!(foo & BIT))
is fine, no need for if ((foo & BIT) != 0) or if ((foo & BIT) == 0).
  
Carlos O'Donell Sept. 12, 2014, 6:17 p.m. UTC | #7
On 09/12/2014 01:52 PM, Roland McGrath wrote:
>> On 09/10/2014 02:07 PM, Roland McGrath wrote:
>>> The preferred style in glibc has always been to eschew implicit Boolean
>>> coercions, except for the return value of strcmp/memcmp and the like (where
>>> the most common idiomatic uses treat the value as a Boolean even though
>>> nonzero values have further meaning).
>>>
>>
>> Wikified.
>>
>> https://sourceware.org/glibc/wiki/Style_and_Conventions#Boolean_Coercions
> 
> I forgot to mention the other exception: if (foo & BIT) or if (!(foo & BIT))
> is fine, no need for if ((foo & BIT) != 0) or if ((foo & BIT) == 0).

Wikified++

c.
  
Florian Weimer Sept. 23, 2014, 11:24 a.m. UTC | #8
On 09/12/2014 07:52 PM, Roland McGrath wrote:
> I forgot to mention the other exception: if (foo & BIT) or if (!(foo & BIT))
> is fine, no need for if ((foo & BIT) != 0) or if ((foo & BIT) == 0).

Thanks, so there shouldn't be a need to change the patch.
  

Patch

From 25282e6d2d0bb5d9314ec02845747aa9c86c5606 Mon Sep 17 00:00:00 2001
From: Florian Weimer <fweimer@redhat.com>
Date: Tue, 9 Sep 2014 20:37:28 +0200
Subject: [PATCH 3/3] Manual cleanups after __builtin_expect conversion of
 iconv/ and iconvdata/

This commit removes the one remaining __builtin_expect expressions which
can be converted, and fixes long lines introduces by the automated
rewriting.

2014-09-09  Florian Weimer  <fweimer@redhat.com>
 
	* iconv/gconv_dl.c (__gconv_find_shlib): Wrap long line resulting
	from the automated __builtin_expect conversion.
	* iconvdata/euc-kr.c (euckr_from_ucs4): Likewise.
	* iconv/gconv_cache.c (__gconv_load_cache): Replace multi-line
	__builtin_expect with __glibc_unlikely.

diff --git a/iconv/gconv_cache.c b/iconv/gconv_cache.c
index cb48eca..dab5ac6 100644
--- a/iconv/gconv_cache.c
+++ b/iconv/gconv_cache.c
@@ -116,9 +116,9 @@  __gconv_load_cache (void)
       || __glibc_unlikely (header->string_offset >= cache_size)
       || __glibc_unlikely (header->hash_offset >= cache_size)
       || __glibc_unlikely (header->hash_size == 0)
-      || __builtin_expect ((header->hash_offset
+      || __glibc_unlikely ((header->hash_offset
 			    + header->hash_size * sizeof (struct hash_entry))
-			   > cache_size, 0)
+			   > cache_size)
       || __glibc_unlikely (header->module_offset >= cache_size)
       || __glibc_unlikely (header->otherconv_offset > cache_size))
     {
diff --git a/iconv/gconv_dl.c b/iconv/gconv_dl.c
index 840793d..05448b3 100644
--- a/iconv/gconv_dl.c
+++ b/iconv/gconv_dl.c
@@ -93,7 +93,8 @@  __gconv_find_shlib (const char *name)
 	  found->counter = -TRIES_BEFORE_UNLOAD - 1;
 	  found->handle = NULL;
 
-	  if (__glibc_unlikely (__tsearch (found, &loaded, known_compare) == NULL))
+	  if (__glibc_unlikely (__tsearch (found, &loaded, known_compare)
+				== NULL))
 	    {
 	      /* Something went wrong while inserting the entry.  */
 	      free (found);
diff --git a/iconvdata/euc-kr.c b/iconvdata/euc-kr.c
index 50d1a8c..2f10453 100644
--- a/iconvdata/euc-kr.c
+++ b/iconvdata/euc-kr.c
@@ -38,7 +38,8 @@  euckr_from_ucs4 (uint32_t ch, unsigned char *cp)
 	  cp[0] = '\xa3';
 	  cp[1] = '\xdc';
 	}
-      else if (__glibc_likely (ucs4_to_ksc5601 (ch, cp, 2) != __UNKNOWN_10646_CHAR))
+      else if (__glibc_likely (ucs4_to_ksc5601 (ch, cp, 2)
+			       != __UNKNOWN_10646_CHAR))
 	{
 	  cp[0] |= 0x80;
 	  cp[1] |= 0x80;
-- 
1.9.3