S390: Do not clobber r13 with memcpy on 31bit with copies >1MB.

Message ID 5786162A.7080605@linux.vnet.ibm.com
State Committed
Headers

Commit Message

Stefan Liebler July 13, 2016, 10:21 a.m. UTC
  Hi,

If the default memcpy variant is called with a length of >1MB on 31bit,
r13 is clobbered as the algorithm is switching to mvcle. The mvcle code
returns without restoring r13. All other cases are restoring r13.

If memcpy is called from outside libc the ifunc resolver will only
select this variant if running on machines older than z10.
Otherwise or if memcpy is called from inside libc,
this default memcpy variant is called.
The testcase timezone/tst-tzset is triggering this issue in some
combinations of gcc versions and optimization levels.

This bug was introduced in commit 04bb21ac93e90d7696bcaf8febe2b2dd2d83585a
and thus is a regression compared to glibc 2.23 release. It should
be committed before glibc 2.24 release.

This patch removes the usage of r13 at all. Thus it is not saved
and restored. The base address for execute-instruction is now stored
in r5 which is obtained after r5 is not needed anymore as
256byte block counter.

Okay to commit before release 2.24?

Bye
Stefan

ChangeLog:

	* sysdeps/s390/s390-32/memcpy.S (memcpy): Eliminate the usage
	of r13 as it is not restored in mvcle case.
  

Comments

Stefan Liebler July 19, 2016, 6:26 a.m. UTC | #1
PING.

If there is no objection, I'll commit this patch the next days
as it is an s390 related regression compared to glibc 2.23 release.

On 07/13/2016 12:21 PM, Stefan Liebler wrote:
> Hi,
>
> If the default memcpy variant is called with a length of >1MB on 31bit,
> r13 is clobbered as the algorithm is switching to mvcle. The mvcle code
> returns without restoring r13. All other cases are restoring r13.
>
> If memcpy is called from outside libc the ifunc resolver will only
> select this variant if running on machines older than z10.
> Otherwise or if memcpy is called from inside libc,
> this default memcpy variant is called.
> The testcase timezone/tst-tzset is triggering this issue in some
> combinations of gcc versions and optimization levels.
>
> This bug was introduced in commit 04bb21ac93e90d7696bcaf8febe2b2dd2d83585a
> and thus is a regression compared to glibc 2.23 release. It should
> be committed before glibc 2.24 release.
>
> This patch removes the usage of r13 at all. Thus it is not saved
> and restored. The base address for execute-instruction is now stored
> in r5 which is obtained after r5 is not needed anymore as
> 256byte block counter.
>
> Okay to commit before release 2.24?
>
> Bye
> Stefan
>
> ChangeLog:
>
>      * sysdeps/s390/s390-32/memcpy.S (memcpy): Eliminate the usage
>      of r13 as it is not restored in mvcle case.
  
Adhemerval Zanella July 19, 2016, 1:12 p.m. UTC | #2
Sorry Stefan, for some reason I missed this thread.  I think it is
ok for 2.24.


On 19/07/2016 03:26, Stefan Liebler wrote:
> PING.
> 
> If there is no objection, I'll commit this patch the next days
> as it is an s390 related regression compared to glibc 2.23 release.
> 
> On 07/13/2016 12:21 PM, Stefan Liebler wrote:
>> Hi,
>>
>> If the default memcpy variant is called with a length of >1MB on 31bit,
>> r13 is clobbered as the algorithm is switching to mvcle. The mvcle code
>> returns without restoring r13. All other cases are restoring r13.
>>
>> If memcpy is called from outside libc the ifunc resolver will only
>> select this variant if running on machines older than z10.
>> Otherwise or if memcpy is called from inside libc,
>> this default memcpy variant is called.
>> The testcase timezone/tst-tzset is triggering this issue in some
>> combinations of gcc versions and optimization levels.
>>
>> This bug was introduced in commit 04bb21ac93e90d7696bcaf8febe2b2dd2d83585a
>> and thus is a regression compared to glibc 2.23 release. It should
>> be committed before glibc 2.24 release.
>>
>> This patch removes the usage of r13 at all. Thus it is not saved
>> and restored. The base address for execute-instruction is now stored
>> in r5 which is obtained after r5 is not needed anymore as
>> 256byte block counter.
>>
>> Okay to commit before release 2.24?
>>
>> Bye
>> Stefan
>>
>> ChangeLog:
>>
>>      * sysdeps/s390/s390-32/memcpy.S (memcpy): Eliminate the usage
>>      of r13 as it is not restored in mvcle case.
>
  
Stefan Liebler July 20, 2016, 7:32 a.m. UTC | #3
On 07/19/2016 03:12 PM, Adhemerval Zanella wrote:
> Sorry Stefan, for some reason I missed this thread.  I think it is
> ok for 2.24.
>

No problem. I've commited it.
  

Patch

commit a671883eb41bc3179e7a435e6c2857a913757bf1
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed Jul 13 10:46:10 2016 +0200

    S390: Do not clobber r13 with memcpy on 31bit with copies >1MB.
    
    If the default memcpy variant is called with a length of >1MB on 31bit,
    r13 is clobbered as the algorithm is switching to mvcle. The mvcle code
    returns without restoring r13. All other cases are restoring r13.
    
    If memcpy is called from outside libc the ifunc resolver will only select
    this variant if running on machines older than z10.
    Otherwise or if memcpy is called from inside libc, this default memcpy
    variant is called.
    The testcase timezone/tst-tzset is triggering this issue in some combinations
    of gcc versions and optimization levels.
    
    This bug was introduced in commit 04bb21ac93e90d7696bcaf8febe2b2dd2d83585a
    and thus is a regression compared to glibc 2.23 release. It should
    be committed before glibc 2.24 release.
    
    This patch removes the usage of r13 at all. Thus it is not saved and restored.
    The base address for execute-instruction is now stored in r5 which is obtained
    after r5 is not needed anymore as 256byte block counter.
    
    ChangeLog:
    
    	* sysdeps/s390/s390-32/memcpy.S (memcpy): Eliminate the usage
    	of r13 as it is not restored in mvcle case.

diff --git a/sysdeps/s390/s390-32/memcpy.S b/sysdeps/s390/s390-32/memcpy.S
index 6be5104..1a8bdbf 100644
--- a/sysdeps/s390/s390-32/memcpy.S
+++ b/sysdeps/s390/s390-32/memcpy.S
@@ -42,20 +42,18 @@  ENTRY(memcpy)
 	.machine "g5"
 	lr      %r1,%r2             # r1: Use as dest ; r2: Return dest
 .L_G5_start:
-	st      %r13,52(%r15)
-	.cfi_offset 13, -44
-	basr    %r13,0
-.L_G5_16:
 	ltr     %r4,%r4
-	je      .L_G5_4
+	je      .L_G5_99
 	ahi     %r4,-1
 	lr      %r5,%r4
 	srl     %r5,8
 	ltr     %r5,%r5
 	jne     .L_G5_13
-	ex      %r4,.L_G5_17-.L_G5_16(%r13)
 .L_G5_4:
-	l       %r13,52(%r15)
+	basr    %r5,0
+.L_G5_16:
+	ex      %r4,.L_G5_17-.L_G5_16(%r5)
+.L_G5_99:
 	br      %r14
 .L_G5_13:
 	chi	%r5,4096            # Switch to mvcle for copies >1MB
@@ -65,7 +63,6 @@  ENTRY(memcpy)
 	la      %r1,256(%r1)
 	la      %r3,256(%r3)
 	brct    %r5,.L_G5_12
-	ex      %r4,.L_G5_17-.L_G5_16(%r13)
 	j       .L_G5_4
 .L_G5_17:
 	mvc     0(1,%r1),0(%r3)