[COMMITTED] PowerPC: Remove 64 bits instructions in PPC32 code

Message ID 53834CE6.2080802@linux.vnet.ibm.com
State Committed

Commit Message

Adhemerval Zanella Netto May 26, 2014, 2:17 p.m. UTC
  This patch replaces insrdi with insrwi in the powerpc32 assembly.  The insrdi
instructions are not wrong, since all POWER chips supported in 32-bit mode are 64-bit
and do not raise an illegal-instruction exception when running them, but valgrind
fails, reporting an invalid instruction.

There are still rldimi uses in the ppc32 assembly code; however, they are only for
little-endian and won't be used anytime soon.

Checked on powerpc32, no regressions found.
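
As a rough illustration of why the replacement is expected to be behavior-preserving, the sketch below models the two extended mnemonics in C and replicates a byte into a word the way the memset prologue does. The function names (`insrwi_model`, `insrdi_model`, `replicate_insrwi`, `replicate_insrdi_low`) are hypothetical helpers made up for this example and are not part of glibc. The models assume big-endian bit numbering (bit 0 is the most significant bit) and a field that fits in the register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "insrwi rA,rS,n,b": insert the low n bits of rs into ra,
   starting at bit b in 32-bit big-endian numbering (bit 0 is the MSB).
   Assumes 1 <= n and b + n <= 32.  */
static uint32_t insrwi_model(uint32_t ra, uint32_t rs, unsigned n, unsigned b)
{
    assert(n >= 1 && b + n <= 32);
    unsigned shift = 32 - (b + n);   /* distance of the field from bit 31 */
    uint32_t field = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    uint32_t mask = field << shift;
    return (ra & ~mask) | ((rs << shift) & mask);
}

/* Model of "insrdi rA,rS,n,b": the same insertion on a 64-bit register,
   with bit 0 being the MSB of the doubleword.
   Assumes 1 <= n and b + n <= 64.  */
static uint64_t insrdi_model(uint64_t ra, uint64_t rs, unsigned n, unsigned b)
{
    assert(n >= 1 && b + n <= 64);
    unsigned shift = 64 - (b + n);
    uint64_t field = (n == 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    uint64_t mask = field << shift;
    return (ra & ~mask) | ((rs << shift) & mask);
}

/* Byte -> halfword -> word replication with the 32-bit forms.  */
static uint32_t replicate_insrwi(uint8_t c)
{
    uint32_t r = c;
    r = insrwi_model(r, r, 8, 16);   /* replicate byte to halfword */
    r = insrwi_model(r, r, 16, 0);   /* replicate halfword to word */
    return r;
}

/* The same replication with the 64-bit forms, keeping only the low
   32 bits, which is all that 32-bit code can observe.  */
static uint32_t replicate_insrdi_low(uint8_t c)
{
    uint64_t r = c;
    r = insrdi_model(r, r, 8, 48);
    r = insrdi_model(r, r, 16, 32);
    return (uint32_t) r;
}
```

Under these assumptions both helpers return the same word for any input byte, because bit b of the 64-bit numbering corresponds to bit b - 32 of the low word.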

--

2014-05-26  Adhemerval Zanella  <azanella@linux.vnet.ibm.com>

	* sysdeps/powerpc/powerpc32/power4/memset.S (memset): Replace insrdi
	by insrwi.
	* sysdeps/powerpc/powerpc32/power6/memset.S (memset): Likewise.
	* sysdeps/powerpc/powerpc32/power7/memset.S (memset): Likewise.
	* sysdeps/powerpc/powerpc32/power7/memchr.S (memchr): Likewise.
	* sysdeps/powerpc/powerpc32/power7/memrchr.S (memrchr): Likewise.
	* sysdeps/powerpc/powerpc32/power7/rawmemchr.S (rawmemchr): Likewise.
	* sysdeps/powerpc/powerpc32/power7/strchr.S (strchr): Likewise.
	* sysdeps/powerpc/powerpc32/power7/strchrnul.S (strchrnul): Likewise.

---
  

Comments

Segher Boessenkool May 26, 2014, 6:12 p.m. UTC | #1
> This patch replaces the insrdi by insrwi in powerpc32 assembly.  Although they
> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
> do not thrown an illegal exception when running these instructions, valgrind
> fails accusing an invalid one.

This code is CPU-specific; as you say, those CPUs can use rldimi just
fine.  The reason the code uses rldimi instead of rlwimi is because
it is faster (at least on power4, power5).  Fix valgrind instead?


Segher
  
Adhemerval Zanella Netto May 26, 2014, 7:34 p.m. UTC | #2
On 26-05-2014 15:12, Segher Boessenkool wrote:
>> This patch replaces the insrdi by insrwi in powerpc32 assembly.  Although they
>> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
>> do not thrown an illegal exception when running these instructions, valgrind
>> fails accusing an invalid one.
> This code is CPU-specific; as you say, those CPUs can use rldimi just
> fine.  The reason the code uses rldimi instead of rlwimi is because
> it is faster (at least on power4, power5).  Fix valgrind instead?
>
>
> Segher
>
Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:

> ./test 
rldimi: min: 7 | max: 9
rlwimi: min: 7 | max: 10

And by issuing 16 instructions per test function I get:

> ./test 
rldimi: min: 7 | max: 9
rlwimi: min: 7 | max: 13

A newer processor (POWER7) also shows the same behavior.  And the instructions
are not in the hot path of the code (it is only executed once), so I hardly consider
this a performance regression.

Anyway, I would prefer to stay consistent and use only 32-bit instructions in 32-bit
assembly code, to avoid such issues with external tools (valgrind is only an
example) and to allow possible future implementations on different chips that
do not implement the 64-bit instructions to use the powerN code.
  
Segher Boessenkool May 26, 2014, 9:04 p.m. UTC | #3
> >> This patch replaces the insrdi by insrwi in powerpc32 assembly.  Although they
> >> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
> >> do not thrown an illegal exception when running these instructions, valgrind
> >> fails accusing an invalid one.
> > This code is CPU-specific; as you say, those CPUs can use rldimi just
> > fine.  The reason the code uses rldimi instead of rlwimi is because
> > it is faster (at least on power4, power5).  Fix valgrind instead?
> >
> >
> > Segher
> >
> Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:
> 
> > ./test 
> rldimi: min: 7 | max: 9
> rlwimi: min: 7 | max: 10
> 
> And by issuing 16 instruction per test function I get:
> 
> > ./test 
> rldimi: min: 7 | max: 9
> rlwimi: min: 7 | max: 13
> 
> Newer processor (POWER7) also shows the same behavior.

On a POWER7 I get that rlwimi is almost twice as slow as rldimi,
just as expected.  The way you constructed your test, with a blr
immediately after a single rl*imi, you get only one per group no
matter what.

> And the instructions
> and not in hot path in the code (it is only called once), so I hardly consider
> this a performance regression. 

That might well be.  But see http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
where (part of) this code was added.

> Anyway, I would prefer to keep consistent and using only 32-bits in 32-bits 
> assembly code to avoid such issues with external tools (valgrind is only an
> example) and to allow possible future implementation in different chips that
> do not implement the 64-bits instructions to use powerN code.

I find this not a convincing argument at all.  But it's not my call ;-)


Segher
  
Adhemerval Zanella Netto May 26, 2014, 9:48 p.m. UTC | #4
On 26-05-2014 18:04, Segher Boessenkool wrote:
>>>> This patch replaces the insrdi by insrwi in powerpc32 assembly.  Although they
>>>> are not wrong, since all POWER chips supported in 32-bits are 64-bits and the chips
>>>> do not thrown an illegal exception when running these instructions, valgrind
>>>> fails accusing an invalid one.
>>> This code is CPU-specific; as you say, those CPUs can use rldimi just
>>> fine.  The reason the code uses rldimi instead of rlwimi is because
>>> it is faster (at least on power4, power5).  Fix valgrind instead?
>>>
>>>
>>> Segher
>>>
>> Well, using http://pastebin.com/CttashRQ on a POWER5 (1.9 GHz) I get:
>>
>>> ./test 
>> rldimi: min: 7 | max: 9
>> rlwimi: min: 7 | max: 10
>>
>> And by issuing 16 instruction per test function I get:
>>
>>> ./test 
>> rldimi: min: 7 | max: 9
>> rlwimi: min: 7 | max: 13
>>
>> Newer processor (POWER7) also shows the same behavior.
> On a POWER7 I get that rlwimi is almost twice as slow as rldimi,
> just as expected.  The way you constructed your test with a blr
> immediately after a single rl*imi you get only one per group no
> matter what.

That's why I also constructed an example with 16 instructions (which I didn't include
in the link).  A more comprehensive example: http://pastebin.com/KD7xSqkJ and running
it on POWER7 (3.5GHz):

$ taskset -c 24 ./test
rldimi1:  min: 12 | max: 19
rldimi4:  min: 12 | max: 20
rldimi8:  min: 12 | max: 20
rldimi16: min: 12 | max: 23
rlwimi1:  min: 12 | max: 23
rlwimi4:  min: 12 | max: 23
rlwimi8:  min: 12 | max: 24
rlwimi16: min: 12 | max: 29

>
>> And the instructions
>> and not in hot path in the code (it is only called once), so I hardly consider
>> this a performance regression. 
> That might well be.  But see http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
> where (part of) this code was added.

Yep, I am aware; I recalled this thread.  Anyway, in the attachments I'm sending you the strchr
benchtest output on a POWER7 with and without the modification.  Since the code path is
taken only once I would expect some latency difference with short lengths; however, the
results do not really show any noticeable slowdown.  I will check later on a POWER5
machine, but I also don't really expect much difference.


>
>> Anyway, I would prefer to keep consistent and using only 32-bits in 32-bits 
>> assembly code to avoid such issues with external tools (valgrind is only an
>> example) and to allow possible future implementation in different chips that
>> do not implement the 64-bits instructions to use powerN code.
> I find this not a convincing argument at all.  But it's not my call ;-)

The POWER4 ifunc selection, for instance, uses the hwcap bit flags to select the best
implementation.  If a future chip would like to reuse the same code through ifunc as well,
it will require a new way to distinguish it from PPC_FEATURE_POWER4 (which is bit 0x00080000).
That will add more complexity to this code.  It is the same for external tools: they will need
to add more logic to handle different chips.  Also, AFAIK GCC does not generate 64-bit
instructions for -m32 (I might be wrong on this one).
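
To make the hwcap argument a bit more concrete, here is a minimal sketch of a resolver-style selection on a hwcap bit. It is not glibc's actual ifunc machinery, which is built from its own macros; `select_memset`, `memset_generic` and `memset_power4` are hypothetical names used only for illustration. The only value taken from the discussion above is the PPC_FEATURE_POWER4 bit.

```c
#include <stddef.h>
#include <string.h>

/* Bit value quoted above for the POWER4 hwcap flag.  */
#define PPC_FEATURE_POWER4 0x00080000UL

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical stand-in for a generic implementation: plain byte loop.  */
static void *memset_generic(void *s, int c, size_t n)
{
    unsigned char *p = s;
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char) c;
    return s;
}

/* Hypothetical stand-in for the POWER4-tuned assembly; here it simply
   defers to the C library.  */
static void *memset_power4(void *s, int c, size_t n)
{
    return memset(s, c, n);
}

/* Pick an implementation from the hwcap bits.  Any chip that sets the
   POWER4 bit gets the POWER4 code, including whatever 64-bit
   instructions that code happens to contain.  */
static memset_fn select_memset(unsigned long hwcap)
{
    if (hwcap & PPC_FEATURE_POWER4)
        return memset_power4;
    return memset_generic;
}
```

With a selection like this, a chip that wants the POWER4 code but lacks the 64-bit instructions would need an extra flag and an extra branch in the selector, which is the added complexity mentioned above.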

Anyway, as I said, I just changed this because 1. it fixes the valgrind checks, 2. I didn't
see compelling performance reasons, and 3. I do see it as more consistent to use 32-bit
instructions in 32-bit code.  However, 1. can be fixed if 2. is false (and 3. can just
be ignored).


>
>
> Segher
>
	simple_STRCHR	stupid_STRCHR	__strchr_power7	__strchr_ppc
Length   32, alignment in bytes  0:	23	201.656	9.75	11.4531
Length   32, alignment in bytes  1:	21.5312	196.188	7.5	10.7812
Length   64, alignment in bytes  0:	37.9531	221.5	11.75	19.8438
Length   64, alignment in bytes  2:	37.8594	220.828	11.875	19.875
Length  128, alignment in bytes  0:	70.4688	272.984	23.4844	32.4375
Length  128, alignment in bytes  3:	70.2969	273.281	22.7656	32.2969
Length  256, alignment in bytes  0:	135.188	374.203	37.3438	57.4844
Length  256, alignment in bytes  4:	185.766	523.484	37.3594	57.25
Length  512, alignment in bytes  0:	527.656	627.25	66.4531	107.781
Length  512, alignment in bytes  5:	265.359	545.375	65.375	107.562
Length 1024, alignment in bytes  0:	525.516	998.297	124.25	207.594
Length 1024, alignment in bytes  6:	525.469	913.25	123.375	207.672
Length 2048, alignment in bytes  0:	1045.94	1777.61	240.75	411.406
Length 2048, alignment in bytes  7:	1045.58	1657.86	240.062	406.5
Length   64, alignment in bytes  1:	37.7812	85.6562	11.7656	19.8438
Length   64, alignment in bytes  1:	37.7812	85.2188	11.3906	19.9219
Length   64, alignment in bytes  2:	37.9531	85.3281	11.2344	19.9219
Length   64, alignment in bytes  2:	37.6562	85.6719	11.25	19.875
Length   64, alignment in bytes  3:	37.7188	84.3281	11.2969	19.75
Length   64, alignment in bytes  3:	37.5781	84.5781	11.2031	19.7812
Length   64, alignment in bytes  4:	37.875	82.4219	11.4531	20
Length   64, alignment in bytes  4:	37.5	81.3594	10.8281	19.7969
Length   64, alignment in bytes  5:	37.7344	82.1719	10.6406	19.875
Length   64, alignment in bytes  5:	37.5312	81.5156	10.6562	19.9062
Length   64, alignment in bytes  6:	37.7656	82.1406	10.6094	19.75
Length   64, alignment in bytes  6:	37.7188	81.1875	10.7344	19.9219
Length   64, alignment in bytes  7:	37.625	81.3125	10.5938	19.75
Length   64, alignment in bytes  7:	37.7031	81.125	10.6562	20
Length    0, alignment in bytes  0:	3.65625	7.09375	4	4.75
Length    0, alignment in bytes  0:	2.03125	6.5625	3.71875	4.28125
Length    1, alignment in bytes  0:	2.6875	7.17188	3.45312	4.25
Length    1, alignment in bytes  0:	2.5	7	3.46875	4.26562
Length    2, alignment in bytes  0:	6.54688	8.125	3.35938	4.34375
Length    2, alignment in bytes  0:	8.51562	7.65625	3.53125	4.375
Length    3, alignment in bytes  0:	6.82812	8.71875	3.5625	4.92188
Length    3, alignment in bytes  0:	7.14062	7.96875	3.5	4.73438
Length    4, alignment in bytes  0:	7.42188	9	4.51562	5.51562
Length    4, alignment in bytes  0:	7.28125	8.5625	4.1875	4.76562
Length    5, alignment in bytes  0:	7.45312	9.40625	3.8125	4.79688
Length    5, alignment in bytes  0:	7.75	9.04688	3.92188	4.76562
Length    6, alignment in bytes  0:	8.28125	10.25	3.70312	4.71875
Length    6, alignment in bytes  0:	8.28125	9.71875	3.89062	4.76562
Length    7, alignment in bytes  0:	8.84375	12.375	3.85938	5.3125
Length    7, alignment in bytes  0:	8.73438	11.4531	3.84375	5.10938
Length    8, alignment in bytes  0:	9.21875	12.3125	5.21875	6.0625
Length    8, alignment in bytes  0:	9.3125	12	5.03125	5.76562
Length    9, alignment in bytes  0:	9.67188	13.2812	4.71875	5.73438
Length    9, alignment in bytes  0:	9.67188	13.0469	4.67188	5.73438
Length   10, alignment in bytes  0:	10.3281	13.7969	4.73438	5.53125
Length   10, alignment in bytes  0:	10.4375	13.5	4.625	5.6875
Length   11, alignment in bytes  0:	10.8125	14	4.71875	6.25
Length   11, alignment in bytes  0:	10.6406	13.8594	4.57812	5.67188
Length   12, alignment in bytes  0:	11.1562	15	5.28125	6.84375
Length   12, alignment in bytes  0:	11.25	14.75	5.125	6.40625
Length   13, alignment in bytes  0:	11.7344	15.2188	4.95312	6.35938
Length   13, alignment in bytes  0:	11.8906	14.9688	5	6.4375
Length   14, alignment in bytes  0:	12.3594	15.7812	5	6.29688
Length   14, alignment in bytes  0:	12.25	15.1094	5.04688	6.3125
Length   15, alignment in bytes  0:	12.7344	19.0938	4.92188	6.84375
Length   15, alignment in bytes  0:	12.8594	18.1406	5	6.54688
Length   16, alignment in bytes  0:	13.3906	19.7031	6.45312	7.96875
Length   16, alignment in bytes  0:	13.0625	19.3125	5.71875	7.17188
Length   17, alignment in bytes  0:	13.9062	21.7812	5.6875	7.5
Length   17, alignment in bytes  0:	13.8281	20.4219	5.6875	7.25
Length   18, alignment in bytes  0:	14.3125	24.7656	5.625	7.40625
Length   18, alignment in bytes  0:	14.3125	24.7031	5.64062	7.14062
Length   19, alignment in bytes  0:	14.6875	25.5312	5.54688	7.78125
Length   19, alignment in bytes  0:	14.6094	24.9844	5.76562	7.35938
Length   20, alignment in bytes  0:	15.2969	25.2031	6.28125	8.45312
Length   20, alignment in bytes  0:	15.2812	25.4062	5.9375	8.32812
Length   21, alignment in bytes  0:	15.6719	26.0156	5.75	8.09375
Length   21, alignment in bytes  0:	15.7812	25.875	5.76562	8.25
Length   22, alignment in bytes  0:	16.4375	26.8594	5.79688	8.1875
Length   22, alignment in bytes  0:	16.4219	27	5.78125	8.25
Length   23, alignment in bytes  0:	16.9062	31.5625	5.79688	8.75
Length   23, alignment in bytes  0:	16.75	30.5	5.73438	8.34375
Length   24, alignment in bytes  0:	17.4844	30.1719	7.1875	9.3125
Length   24, alignment in bytes  0:	17.4219	30.7969	6.60938	8.75
Length   25, alignment in bytes  0:	17.9219	31.2188	6.51562	8.82812
Length   25, alignment in bytes  0:	17.9375	30.3125	6.34375	8.84375
Length   26, alignment in bytes  0:	18.25	31.2656	6.45312	8.82812
Length   26, alignment in bytes  0:	18.2656	31.4688	6.34375	8.75
Length   27, alignment in bytes  0:	18.8594	31.9062	6.35938	9.17188
Length   27, alignment in bytes  0:	18.9062	31.2969	6.42188	8.85938
Length   28, alignment in bytes  0:	19.4062	32.2188	7.5	9.73438
Length   28, alignment in bytes  0:	19.5	32	6.92188	9.46875
Length   29, alignment in bytes  0:	19.9688	32.6406	6.75	9.4375
Length   29, alignment in bytes  0:	19.8125	32.2812	6.78125	9.5
Length   30, alignment in bytes  0:	20.3594	34.625	6.85938	9.5
Length   30, alignment in bytes  0:	20.3281	34.0781	6.75	9.35938
Length   31, alignment in bytes  0:	21	37.6562	6.92188	9.85938
Length   31, alignment in bytes  0:	20.7188	36.7344	6.76562	9.57812
Length   32, alignment in bytes  0:	21.6094	36.7344	6.46875	10.3594
Length   32, alignment in bytes  1:	21.4844	36.5938	6.0625	10.25
Length   64, alignment in bytes  0:	37.7031	64.0469	9.20312	19.75
Length   64, alignment in bytes  2:	37.75	63.4688	8.40625	19.5
Length  128, alignment in bytes  0:	70.0938	118.703	13.6562	32.0781
Length  128, alignment in bytes  3:	70.0156	118.25	13.0156	32
Length  256, alignment in bytes  0:	135.047	238.5	28.0312	56.8281
Length  256, alignment in bytes  4:	135.234	221.203	27.5156	57.0312
Length  512, alignment in bytes  0:	265.281	457.75	47.125	106.766
Length  512, alignment in bytes  5:	265.375	426.359	46.5	107.188
Length 1024, alignment in bytes  0:	525.391	912.156	85.2344	206.922
Length 1024, alignment in bytes  6:	525.5	842	85.4688	207
Length 2048, alignment in bytes  0:	1045.5	1778.64	163.734	409.719
Length 2048, alignment in bytes  7:	1045.47	1656.89	163.266	407.641
Length   64, alignment in bytes  1:	37.6094	64.5	8.70312	19.7031
Length   64, alignment in bytes  1:	37.75	58.9219	8.39062	19.4844
Length   64, alignment in bytes  2:	37.9062	63.1094	8.1875	19.5938
Length   64, alignment in bytes  2:	37.5469	63.5156	8.14062	19.5938
Length   64, alignment in bytes  3:	37.6719	64.4688	8.14062	19.75
Length   64, alignment in bytes  3:	37.75	63.5781	8.01562	19.7188
Length   64, alignment in bytes  4:	37.8438	64.3281	8.39062	19.5469
Length   64, alignment in bytes  4:	37.875	64.3906	8	19.7344
Length   64, alignment in bytes  5:	37.6406	64.1406	7.6875	19.625
Length   64, alignment in bytes  5:	37.8906	64.125	7.78125	19.6094
Length   64, alignment in bytes  6:	37.8594	64.6406	7.70312	19.6406
Length   64, alignment in bytes  6:	37.6719	64.4219	7.89062	19.5
Length   64, alignment in bytes  7:	37.8281	64	7.82812	19.5938
Length   64, alignment in bytes  7:	37.75	64.7031	7.78125	19.6875
Length    0, alignment in bytes  0:	2.32812	6.92188	3.45312	4.70312
Length    0, alignment in bytes  0:	1.96875	6.5625	3.04688	4.34375
Length    1, alignment in bytes  0:	2.6875	7.03125	3.20312	4.25
Length    1, alignment in bytes  0:	2.46875	6.64062	3.04688	4.5
Length    2, alignment in bytes  0:	5.85938	8	2.95312	4.10938
Length    2, alignment in bytes  0:	6	7.53125	3.04688	4.20312
Length    3, alignment in bytes  0:	6.84375	8.35938	3	4.20312
Length    3, alignment in bytes  0:	6.65625	7.89062	2.98438	4.14062
Length    4, alignment in bytes  0:	7.4375	9.15625	3.82812	5.29688
Length    4, alignment in bytes  0:	7.28125	8.92188	3.48438	4.92188
Length    5, alignment in bytes  0:	7.79688	9.3125	3	4.75
Length    5, alignment in bytes  0:	7.59375	9.67188	3	4.89062
Length    6, alignment in bytes  0:	8.20312	9.92188	3.07812	4.70312
Length    6, alignment in bytes  0:	8.32812	10.125	3.125	4.67188
Length    7, alignment in bytes  0:	8.71875	10.8594	3.35938	4.82812
Length    7, alignment in bytes  0:	8.71875	10.0625	3.28125	4.76562
Length    8, alignment in bytes  0:	9.40625	13	4.75	5.73438
Length    8, alignment in bytes  0:	9.07812	12.0156	4	5.71875
Length    9, alignment in bytes  0:	9.64062	13.4844	3.78125	5.65625
Length    9, alignment in bytes  0:	9.67188	13.1406	3.92188	5.67188
Length   10, alignment in bytes  0:	10.1875	13.6875	3.875	5.57812
Length   10, alignment in bytes  0:	10.3281	13.6562	3.89062	5.625
Length   11, alignment in bytes  0:	10.9375	14.7188	3.82812	5.57812
Length   11, alignment in bytes  0:	10.8125	14.4219	3.79688	5.59375
Length   12, alignment in bytes  0:	11.4688	15.25	4.5	6.84375
Length   12, alignment in bytes  0:	11.1719	14.8438	3.95312	6.48438
Length   13, alignment in bytes  0:	11.8125	15.5156	3.65625	6.4375
Length   13, alignment in bytes  0:	11.8281	15.0312	3.67188	6.39062
Length   14, alignment in bytes  0:	12.2969	16.5625	3.75	6.32812
Length   14, alignment in bytes  0:	12.3281	15.0312	3.76562	6.375
Length   15, alignment in bytes  0:	12.6094	17.0469	3.67188	6.34375
Length   15, alignment in bytes  0:	12.7344	16.4375	3.98438	6.35938
Length   16, alignment in bytes  0:	13.1875	19.9219	10.2031	7.75
Length   16, alignment in bytes  0:	13.1562	18.9531	9.875	7.25
Length   17, alignment in bytes  0:	13.6719	20.4219	9.73438	7.14062
Length   17, alignment in bytes  0:	13.8281	20.3594	9.76562	7.23438
Length   18, alignment in bytes  0:	14.3438	24.75	9.85938	7.26562
Length   18, alignment in bytes  0:	14.4219	24.7344	9.65625	7.20312
Length   19, alignment in bytes  0:	14.75	26.2188	9.65625	7.17188
Length   19, alignment in bytes  0:	14.8594	26	9.6875	7.32812
Length   20, alignment in bytes  0:	15.3594	25.5156	4.625	8.48438
Length   20, alignment in bytes  0:	15.1094	27.0469	4.25	8.20312
Length   21, alignment in bytes  0:	15.8438	27.9219	4.07812	8.25
Length   21, alignment in bytes  0:	15.7812	28.8281	4.20312	8.28125
Length   22, alignment in bytes  0:	16.4375	29.2188	4.17188	8.25
Length   22, alignment in bytes  0:	16.4531	29.5938	4.125	8.14062
Length   23, alignment in bytes  0:	16.8281	27.1094	4	8.10938
Length   23, alignment in bytes  0:	16.8125	27.1875	4.20312	8.10938
Length   24, alignment in bytes  0:	17.5	31.4531	6	9.125
Length   24, alignment in bytes  0:	17.5	30.625	5.40625	8.84375
Length   25, alignment in bytes  0:	17.8281	31.0625	5.26562	8.75
Length   25, alignment in bytes  0:	17.9219	29.125	5.5	8.85938
Length   26, alignment in bytes  0:	18.4844	31.7188	5.10938	8.95312
Length   26, alignment in bytes  0:	18.3594	31.8906	5.10938	8.9375
Length   27, alignment in bytes  0:	18.9375	31.7031	5.1875	8.82812
Length   27, alignment in bytes  0:	18.8906	31.7344	5.1875	8.82812
Length   28, alignment in bytes  0:	19.375	33.25	9.98438	9.92188
Length   28, alignment in bytes  0:	19.25	33.6094	9.6875	9.73438
Length   29, alignment in bytes  0:	19.9531	33.9844	11	9.65625
Length   29, alignment in bytes  0:	19.9688	33.7656	9.46875	9.53125
Length   30, alignment in bytes  0:	20.3438	34.4062	9.53125	9.5
Length   30, alignment in bytes  0:	20.5	34.0156	10.5938	9.5625
Length   31, alignment in bytes  0:	20.9375	34.9062	11.4531	9.625
Length   31, alignment in bytes  0:	20.875	36.3906	11.1094	9.625
	simple_STRCHR	stupid_STRCHR	__strchr_power7	__strchr_ppc
Length   32, alignment in bytes  0:	23.5625	209.562	10.25	13.3125
Length   32, alignment in bytes  1:	21.5156	196.406	7.53125	10.7969
Length   64, alignment in bytes  0:	37.9219	221.297	12.2812	19.8438
Length   64, alignment in bytes  2:	37.7188	222	11.7188	19.8438
Length  128, alignment in bytes  0:	70.25	272.047	22.9062	32.5
Length  128, alignment in bytes  3:	70.3906	273.375	22.8438	32.5
Length  256, alignment in bytes  0:	135.172	369.406	37.3125	57.6562
Length  256, alignment in bytes  4:	135.484	360.969	37.5156	57.7031
Length  512, alignment in bytes  0:	265.266	572.812	66.3906	108.25
Length  512, alignment in bytes  5:	265.25	542.641	65.7812	107.828
Length 1024, alignment in bytes  0:	525.484	995.219	124.188	209.141
Length 1024, alignment in bytes  6:	525.5	919.625	123.922	209.031
Length 2048, alignment in bytes  0:	1045.92	1807.25	241.641	411.703
Length 2048, alignment in bytes  7:	1045.73	1643.98	240.531	411.078
Length   64, alignment in bytes  1:	37.9844	83.6406	12	20.0156
Length   64, alignment in bytes  1:	37.6719	85.0312	11.7344	19.9375
Length   64, alignment in bytes  2:	37.75	85.6562	11.6875	19.8594
Length   64, alignment in bytes  2:	37.6406	85.125	11.3438	20.0781
Length   64, alignment in bytes  3:	37.6094	86	11.375	19.8125
Length   64, alignment in bytes  3:	37.625	85.5	11.4219	19.8906
Length   64, alignment in bytes  4:	37.8906	82.8281	11.4219	19.8906
Length   64, alignment in bytes  4:	37.6562	82.3281	10.9062	20
Length   64, alignment in bytes  5:	37.75	82.1719	10.6719	19.8438
Length   64, alignment in bytes  5:	37.75	82.0781	10.6406	19.9844
Length   64, alignment in bytes  6:	37.6875	82.4844	10.5	19.8125
Length   64, alignment in bytes  6:	37.75	82.2969	10.7344	20
Length   64, alignment in bytes  7:	37.6875	82.4531	10.7344	19.9062
Length   64, alignment in bytes  7:	37.8594	82.2031	10.7031	20.0156
Length    0, alignment in bytes  0:	3.67188	7.07812	4.25	4.89062
Length    0, alignment in bytes  0:	2.10938	6.75	3.75	4.39062
Length    1, alignment in bytes  0:	3.15625	7.40625	3.45312	4.26562
Length    1, alignment in bytes  0:	2.89062	7.35938	3.5	4.14062
Length    2, alignment in bytes  0:	8.82812	8.375	3.40625	4.40625
Length    2, alignment in bytes  0:	8.6875	7.75	3.54688	4.34375
Length    3, alignment in bytes  0:	6.9375	8.64062	3.51562	4.60938
Length    3, alignment in bytes  0:	6.79688	8	3.54688	4.6875
Length    4, alignment in bytes  0:	7.28125	9.42188	4.53125	5.54688
Length    4, alignment in bytes  0:	7.4375	8.75	4.10938	4.84375
Length    5, alignment in bytes  0:	7.57812	10.3281	3.95312	5.01562
Length    5, alignment in bytes  0:	7.67188	9.01562	3.79688	4.82812
Length    6, alignment in bytes  0:	8.34375	10.5	3.85938	4.75
Length    6, alignment in bytes  0:	8.21875	10.4062	3.8125	4.85938
Length    7, alignment in bytes  0:	8.65625	12.8125	3.92188	5.51562
Length    7, alignment in bytes  0:	8.71875	11.2812	3.84375	5.29688
Length    8, alignment in bytes  0:	9.20312	12.8438	5.17188	6.3125
Length    8, alignment in bytes  0:	9.32812	11.9688	4.90625	5.8125
Length    9, alignment in bytes  0:	9.625	13.2812	4.76562	5.5625
Length    9, alignment in bytes  0:	9.85938	13.2188	4.67188	5.6875
Length   10, alignment in bytes  0:	10.3438	13.75	4.6875	5.5
Length   10, alignment in bytes  0:	10.25	13.7656	4.71875	5.67188
Length   11, alignment in bytes  0:	10.9062	14.5	4.73438	6.20312
Length   11, alignment in bytes  0:	10.8906	13.5625	4.73438	5.75
Length   12, alignment in bytes  0:	11.2344	14.8281	5.23438	7.29688
Length   12, alignment in bytes  0:	11.2188	14.6094	5.25	6.48438
Length   13, alignment in bytes  0:	11.9844	15.5156	5.0625	6.59375
Length   13, alignment in bytes  0:	11.9062	15.0938	4.95312	6.34375
Length   14, alignment in bytes  0:	12.3594	16.25	5	6.45312
Length   14, alignment in bytes  0:	12.3281	15.0781	5.01562	6.4375
Length   15, alignment in bytes  0:	12.75	18.6562	4.92188	7.04688
Length   15, alignment in bytes  0:	12.75	19.0156	4.98438	6.71875
Length   16, alignment in bytes  0:	13.3594	19.0312	6.20312	7.67188
Length   16, alignment in bytes  0:	13.1562	18.6719	5.82812	7.10938
Length   17, alignment in bytes  0:	13.9219	21.9219	5.71875	7.25
Length   17, alignment in bytes  0:	13.7812	20.3906	5.65625	7.21875
Length   18, alignment in bytes  0:	14.1094	24.7812	5.78125	7.25
Length   18, alignment in bytes  0:	14.2188	24.8281	5.73438	7.23438
Length   19, alignment in bytes  0:	14.6562	26.4688	5.60938	7.98438
Length   19, alignment in bytes  0:	14.7031	24.9219	5.57812	7.35938
Length   20, alignment in bytes  0:	15.2031	25.3438	6.40625	8.4375
Length   20, alignment in bytes  0:	15.1719	26.75	5.92188	8.20312
Length   21, alignment in bytes  0:	15.9219	26.1406	5.84375	8.34375
Length   21, alignment in bytes  0:	15.75	26.2969	5.8125	8.03125
Length   22, alignment in bytes  0:	16.2812	27	5.78125	8.20312
Length   22, alignment in bytes  0:	16.2188	27.5	5.85938	8.15625
Length   23, alignment in bytes  0:	16.75	31.7031	5.84375	8.75
Length   23, alignment in bytes  0:	16.7656	30.6562	5.79688	8.375
Length   24, alignment in bytes  0:	17.1719	30.5469	7.46875	9.09375
Length   24, alignment in bytes  0:	17.1875	30	6.75	8.75
Length   25, alignment in bytes  0:	17.6562	32.5469	6.4375	8.84375
Length   25, alignment in bytes  0:	17.7344	30.0312	6.45312	8.95312
Length   26, alignment in bytes  0:	18.375	31.3438	6.46875	8.73438
Length   26, alignment in bytes  0:	18.25	31.3594	6.46875	8.85938
Length   27, alignment in bytes  0:	18.6406	32.2344	6.45312	9.57812
Length   27, alignment in bytes  0:	18.6562	31.75	6.51562	8.96875
Length   28, alignment in bytes  0:	19.3594	32.5625	7.28125	9.67188
Length   28, alignment in bytes  0:	19.4531	32.1406	6.95312	9.60938
Length   29, alignment in bytes  0:	19.9688	33.2344	6.98438	9.46875
Length   29, alignment in bytes  0:	19.7969	32.8594	6.89062	9.51562
Length   30, alignment in bytes  0:	20.3594	34.0938	6.76562	9.57812
Length   30, alignment in bytes  0:	20.2812	33.7812	6.89062	9.65625
Length   31, alignment in bytes  0:	20.8594	38.4062	6.73438	9.84375
Length   31, alignment in bytes  0:	20.9219	37.2031	6.78125	9.70312
Length   32, alignment in bytes  0:	21.5	36.9844	6.46875	10.4375
Length   32, alignment in bytes  1:	21.4375	36.75	6.10938	10.5
Length   64, alignment in bytes  0:	37.7031	64.1562	8.6875	19.8906
Length   64, alignment in bytes  2:	37.75	63.3594	8.40625	19.5938
Length  128, alignment in bytes  0:	70.2031	116.984	13.9219	32.1719
Length  128, alignment in bytes  3:	70.0781	118.141	13.3438	32.1562
Length  256, alignment in bytes  0:	135.078	239.953	27.6094	57
Length  256, alignment in bytes  4:	135.25	221.781	27.8594	57.5
Length  512, alignment in bytes  0:	265.281	460.062	47.2344	107.562
Length  512, alignment in bytes  5:	265.25	427.344	46.4531	107.875
Length 1024, alignment in bytes  0:	525.344	912.016	86.0469	208.438
Length 1024, alignment in bytes  6:	525.391	832.109	85.75	208.656
Length 2048, alignment in bytes  0:	1045.69	1819.2	163.281	411.562
Length 2048, alignment in bytes  7:	1045.53	1680.5	282.812	731.234
Length   64, alignment in bytes  1:	50.4531	72.625	10.9531	25.7812
Length   64, alignment in bytes  1:	50.5	70.1406	8.29688	17.8906
Length   64, alignment in bytes  2:	37.6562	63.7188	8.15625	17.9219
Length   64, alignment in bytes  2:	37.75	62.5	8.14062	17.9844
Length   64, alignment in bytes  3:	37.9375	60.9062	8.09375	17.9375
Length   64, alignment in bytes  3:	37.5	62	8.28125	18
Length   64, alignment in bytes  4:	37.7969	65	8.07812	18
Length   64, alignment in bytes  4:	37.875	64.5469	7.73438	18.0469
Length   64, alignment in bytes  5:	37.6094	64.3125	7.51562	17.9531
Length   64, alignment in bytes  5:	37.7969	64.8594	7.54688	17.9062
Length   64, alignment in bytes  6:	37.75	64.2969	7.51562	17.9375
Length   64, alignment in bytes  6:	37.5781	64.3594	7.60938	17.9375
Length   64, alignment in bytes  7:	37.6875	64.5156	7.39062	17.9219
Length   64, alignment in bytes  7:	37.6875	64.4531	7.5625	18.0156
Length    0, alignment in bytes  0:	2.21875	6.39062	3.65625	4.03125
Length    0, alignment in bytes  0:	2.01562	6.32812	3.35938	3.875
Length    1, alignment in bytes  0:	2.57812	7.21875	3.125	3.9375
Length    1, alignment in bytes  0:	2.64062	6.82812	3.10938	3.78125
Length    2, alignment in bytes  0:	5.625	7.75	3	3.75
Length    2, alignment in bytes  0:	5.4375	7.5	3.04688	3.75
Length    3, alignment in bytes  0:	6.73438	8.29688	2.92188	3.73438
Length    3, alignment in bytes  0:	9.625	8.125	3	3.67188
Length    4, alignment in bytes  0:	7.64062	8.9375	3.5625	4.9375
Length    4, alignment in bytes  0:	7.75	8.79688	3.28125	4.20312
Length    5, alignment in bytes  0:	7.6875	9.85938	2.98438	4.20312
Length    5, alignment in bytes  0:	7.60938	9.04688	3.04688	4.1875
Length    6, alignment in bytes  0:	8.3125	9.76562	3.17188	4.20312
Length    6, alignment in bytes  0:	8.26562	9.64062	3.15625	4.09375
Length    7, alignment in bytes  0:	8.82812	10.9219	3.125	4.0625
Length    7, alignment in bytes  0:	8.53125	10.3125	3.23438	4.10938
Length    8, alignment in bytes  0:	9.53125	12.75	4.375	5.20312
Length    8, alignment in bytes  0:	9.28125	11.7812	4	4.75
Length    9, alignment in bytes  0:	9.78125	12.8594	3.71875	4.90625
Length    9, alignment in bytes  0:	9.71875	12.7656	3.70312	4.76562
Length   10, alignment in bytes  0:	10.0469	13.25	3.70312	4.75
Length   10, alignment in bytes  0:	10.0781	13.25	3.57812	4.57812
Length   11, alignment in bytes  0:	10.7656	14	3.5625	4.76562
Length   11, alignment in bytes  0:	10.6875	13.7969	3.5	4.67188
Length   12, alignment in bytes  0:	11.1719	14.6719	4.6875	6
Length   12, alignment in bytes  0:	11.0938	14.125	4	5.5
Length   13, alignment in bytes  0:	11.8594	15.5312	3.84375	5.40625
Length   13, alignment in bytes  0:	11.75	14.9219	3.67188	5.45312
Length   14, alignment in bytes  0:	12.1094	16.25	3.92188	5.29688
Length   14, alignment in bytes  0:	12.0312	16.0781	3.875	5.5
Length   15, alignment in bytes  0:	12.75	16.9062	3.75	5.42188
Length   15, alignment in bytes  0:	12.75	16.3438	3.70312	5.46875
Length   16, alignment in bytes  0:	13.1406	19.0938	10.0469	6.67188
Length   16, alignment in bytes  0:	13.3594	19.0781	10.1094	6.14062
Length   17, alignment in bytes  0:	13.6094	19.0469	9.60938	6
Length   17, alignment in bytes  0:	13.7031	19.625	9.6875	6.0625
Length   18, alignment in bytes  0:	14.2188	23.8594	9.625	5.96875
Length   18, alignment in bytes  0:	14.2969	23.9531	9.5	5.98438
Length   19, alignment in bytes  0:	14.7031	24.5312	9.60938	5.96875
Length   19, alignment in bytes  0:	14.7656	24.25	9.5625	5.98438
Length   20, alignment in bytes  0:	15.1406	25.6875	4.5625	7.40625
Length   20, alignment in bytes  0:	15.3281	25.7812	4.28125	7.17188
Length   21, alignment in bytes  0:	15.6562	27.2656	4.0625	7.125
Length   21, alignment in bytes  0:	15.8438	27.0938	3.84375	7.03125
Length   22, alignment in bytes  0:	16.1719	27.0938	3.96875	7.14062
Length   22, alignment in bytes  0:	16.2344	27	4.01562	7.09375
Length   23, alignment in bytes  0:	16.6562	29.3281	3.98438	7.40625
Length   23, alignment in bytes  0:	16.7344	29.2812	4.0625	7.03125
Length   24, alignment in bytes  0:	17.3438	29.3906	5.75	7.89062
Length   24, alignment in bytes  0:	17.1719	29.2344	5.39062	7.46875
Length   25, alignment in bytes  0:	17.6719	29.5	5.29688	7.375
Length   25, alignment in bytes  0:	17.75	29.5625	5	7.375
Length   26, alignment in bytes  0:	18.2812	30.3594	4.96875	7.34375
Length   26, alignment in bytes  0:	18.1562	30.4062	5.0625	7.5
Length   27, alignment in bytes  0:	18.9062	31.1406	5.09375	7.5
Length   27, alignment in bytes  0:	18.6875	31.0625	5.10938	7.65625
Length   28, alignment in bytes  0:	19.2031	31.7812	9.73438	8.54688
Length   28, alignment in bytes  0:	19.1875	31.75	9.59375	8.25
Length   29, alignment in bytes  0:	19.9375	33.0469	9.64062	8.21875
Length   29, alignment in bytes  0:	19.75	33.25	9.40625	8.125
Length   30, alignment in bytes  0:	20.3125	34.2656	9.34375	8.07812
Length   30, alignment in bytes  0:	20.2969	33.7969	9.20312	8.15625
Length   31, alignment in bytes  0:	20.7812	34.5	10.4375	8.01562
Length   31, alignment in bytes  0:	20.8438	34.5	9.6875	8.04688
  

Patch

diff --git a/sysdeps/powerpc/powerpc32/power4/memset.S b/sysdeps/powerpc/powerpc32/power4/memset.S
index 88110e3..8b746a6 100644
--- a/sysdeps/powerpc/powerpc32/power4/memset.S
+++ b/sysdeps/powerpc/powerpc32/power4/memset.S
@@ -50,7 +50,7 @@  L(_memset):
 
 /* Align to word boundary.  */
 	cmplwi	cr5, rLEN, 31
-	insrdi	rCHR, rCHR, 8, 48     /* Replicate byte to halfword.  */
+	insrwi	rCHR, rCHR, 8, 16     /* Replicate byte to halfword.  */
 	beq+	L(aligned)
 	mtcrf	0x01, rMEMP0
 	subfic	rALIGN, rALIGN, 4
@@ -65,7 +65,7 @@  L(g0):
 /* Handle the case of size < 31.  */
 L(aligned):
 	mtcrf	0x01, rLEN
-	insrdi	rCHR, rCHR, 16, 32    /* Replicate halfword to word.  */
+	insrwi	rCHR, rCHR, 16, 0    /* Replicate halfword to word.  */
 	ble	cr5, L(medium)
 /* Align to 32-byte boundary.  */
 	andi.	rALIGN, rMEMP, 0x1C
diff --git a/sysdeps/powerpc/powerpc32/power6/memset.S b/sysdeps/powerpc/powerpc32/power6/memset.S
index 4b18fa7..445fa44 100644
--- a/sysdeps/powerpc/powerpc32/power6/memset.S
+++ b/sysdeps/powerpc/powerpc32/power6/memset.S
@@ -48,7 +48,7 @@  L(_memset):
 	ble-	cr1, L(small)
 /* Align to word boundary.  */
 	cmplwi	cr5, rLEN, 31
-	insrdi	rCHR, rCHR, 8, 48	/* Replicate byte to halfword.  */
+	insrwi	rCHR, rCHR, 8, 16	/* Replicate byte to halfword.  */
 	beq+	L(aligned)
 	mtcrf	0x01, rMEMP0
 	subfic	rALIGN, rALIGN, 4
@@ -64,7 +64,7 @@  L(g0):
 /* Handle the case of size < 31.  */
 L(aligned):
 	mtcrf	0x01, rLEN
-	insrdi	rCHR, rCHR, 16, 32	/* Replicate halfword to word.  */
+	insrwi	rCHR, rCHR, 16, 0	/* Replicate halfword to word.  */
 	ble	cr5, L(medium)
 /* Align to 32-byte boundary.  */
 	andi.	rALIGN, rMEMP, 0x1C
diff --git a/sysdeps/powerpc/powerpc32/power7/memchr.S b/sysdeps/powerpc/powerpc32/power7/memchr.S
index 1d6a0d6..ccdd7cf 100644
--- a/sysdeps/powerpc/powerpc32/power7/memchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/memchr.S
@@ -25,9 +25,9 @@  ENTRY (__memchr)
 	CALL_MCOUNT
 	dcbt	0,r3
 	clrrwi  r8,r3,2
-	insrdi	r4,r4,8,48
+	insrwi	r4,r4,8,16    /* Replicate byte to word.  */
 	add	r7,r3,r5      /* Calculate the last acceptable address.  */
-	insrdi	r4,r4,16,32
+	insrwi	r4,r4,16,0
 	cmplwi	r5,16
 	li	r9, -1
 	rlwinm	r6,r3,3,27,28 /* Calculate padding.  */
diff --git a/sysdeps/powerpc/powerpc32/power7/memrchr.S b/sysdeps/powerpc/powerpc32/power7/memrchr.S
index ebfd540..b05bf32 100644
--- a/sysdeps/powerpc/powerpc32/power7/memrchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/memrchr.S
@@ -32,8 +32,8 @@  ENTRY (__memrchr)
 	dcbt	r9,r6,16      /* Stream hint, decreasing addresses.  */
 
 	/* Replicate BYTE to word.  */
-	rldimi	r4,r4,8,48
-	rldimi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 	li	r6,-4
 	li	r9,-1
 	rlwinm	r0,r0,3,27,28 /* Calculate padding.  */
diff --git a/sysdeps/powerpc/powerpc32/power7/memset.S b/sysdeps/powerpc/powerpc32/power7/memset.S
index ae18761..34fc1ad 100644
--- a/sysdeps/powerpc/powerpc32/power7/memset.S
+++ b/sysdeps/powerpc/powerpc32/power7/memset.S
@@ -35,8 +35,8 @@  L(_memset):
 	cfi_offset(31,-8)
 
 	/* Replicate byte to word.  */
-	insrdi	4,4,8,48
-	insrdi	4,4,16,32
+	insrwi	4,4,8,16
+	insrwi	4,4,16,0
 
 	ble	cr6,L(small)	/* If length <= 8, use short copy code.  */
 
diff --git a/sysdeps/powerpc/powerpc32/power7/rawmemchr.S b/sysdeps/powerpc/powerpc32/power7/rawmemchr.S
index dec4db0..8ccf186 100644
--- a/sysdeps/powerpc/powerpc32/power7/rawmemchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/rawmemchr.S
@@ -27,8 +27,8 @@  ENTRY (__rawmemchr)
 	clrrwi	r8,r3,2	      /* Align the address to word boundary.  */
 
 	/* Replicate byte to word.  */
-	rldimi	r4,r4,8,48
-	rldimi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 
 	/* Now r4 has a word of c bytes.  */
 
diff --git a/sysdeps/powerpc/powerpc32/power7/strchr.S b/sysdeps/powerpc/powerpc32/power7/strchr.S
index f7ecb72..d795833 100644
--- a/sysdeps/powerpc/powerpc32/power7/strchr.S
+++ b/sysdeps/powerpc/powerpc32/power7/strchr.S
@@ -35,8 +35,8 @@  ENTRY (strchr)
 	beq	cr7,L(null_match)
 
 	/* Replicate byte to word.  */
-	insrdi	r4,r4,8,48
-	insrdi	r4,r4,16,32
+	insrwi	r4,r4,8,16
+	insrwi	r4,r4,16,0
 
 	/* Now r4 has a word of c bytes and r0 has
 	   a word of null bytes.  */
diff --git a/sysdeps/powerpc/powerpc32/power7/strchrnul.S b/sysdeps/powerpc/powerpc32/power7/strchrnul.S
index ece8237..dcc7620 100644
--- a/sysdeps/powerpc/powerpc32/power7/strchrnul.S
+++ b/sysdeps/powerpc/powerpc32/power7/strchrnul.S
@@ -27,8 +27,8 @@  ENTRY (__strchrnul)
 	clrrwi	r8,r3,2	      /* Align the address to word boundary.  */
 
 	/* Replicate byte to word.  */
-	insrdi	r4,r4,8,48
-	insrdi	r4,r4,16,32
+	insrwi  r4,r4,8,16
+	insrwi  r4,r4,16,0
 
 	rlwinm	r6,r3,3,27,28 /* Calculate padding.  */
 	lwz	r12,0(r8)     /* Load word from memory.  */
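
The operand changes in the hunks above (8,48 becoming 8,16 and 16,32 becoming 16,0) follow from bit numbering: bit 48 of a 64-bit register is the same physical bit as bit 16 of its low 32-bit word, and bit 32 corresponds to bit 0. The following C sketch models the two extended mnemonics to illustrate that both sequences leave the same value in the low word. It is only an illustration, not code from glibc; the helper names (insrwi, insrdi, replicate_with_insrwi, replicate_with_insrdi) are hypothetical, and it assumes the usual definitions insrwi rA,rS,n,b = rlwimi rA,rS,32-b-n,b,b+n-1 and insrdi rA,rS,n,b = rldimi rA,rS,64-(b+n),b.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "insrwi rA,rS,n,b": insert the low-order n bits of rs into ra,
   starting at bit b, where bit 0 is the most significant bit of the
   32-bit register.  Hypothetical helper, for illustration only.  */
static uint32_t
insrwi (uint32_t ra, uint32_t rs, unsigned n, unsigned b)
{
  assert (n >= 1 && b + n <= 32);
  unsigned shift = 32 - b - n;
  uint32_t field = (n == 32) ? UINT32_MAX : (((uint32_t) 1 << n) - 1);
  uint32_t mask = field << shift;
  return (ra & ~mask) | ((rs << shift) & mask);
}

/* Model of "insrdi rA,rS,n,b": the same operation on a 64-bit register,
   where bit 0 is the most significant bit of the doubleword.  */
static uint64_t
insrdi (uint64_t ra, uint64_t rs, unsigned n, unsigned b)
{
  assert (n >= 1 && b + n <= 64);
  unsigned shift = 64 - b - n;
  uint64_t field = (n == 64) ? UINT64_MAX : (((uint64_t) 1 << n) - 1);
  uint64_t mask = field << shift;
  return (ra & ~mask) | ((rs << shift) & mask);
}

/* New sequence: insrwi r,r,8,16 followed by insrwi r,r,16,0.  */
static uint32_t
replicate_with_insrwi (uint32_t c)
{
  c = insrwi (c, c, 8, 16);   /* Replicate byte to halfword.  */
  c = insrwi (c, c, 16, 0);   /* Replicate halfword to word.  */
  return c;
}

/* Old sequence: insrdi r,r,8,48 followed by insrdi r,r,16,32.
   Only the low 32 bits matter to the powerpc32 code.  */
static uint32_t
replicate_with_insrdi (uint64_t c)
{
  c = insrdi (c, c, 8, 48);   /* Replicate byte to halfword.  */
  c = insrdi (c, c, 16, 32);  /* Replicate halfword to word.  */
  return (uint32_t) c;
}
```

In this model, replicate_with_insrwi (0xAB) and replicate_with_insrdi (0xAB) both return 0xABABABAB. The model covers only the resulting values; it says nothing about the relative speed of the two instructions on power4/power5 that was raised in the review.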