[COMMITED,2.22,2.21,2.20] S390: Save and restore fprs/vrs while resolving symbols.

Message ID ndtj69$rtq$1@ger.gmane.org
State Committed
Headers

Commit Message

Stefan Liebler April 4, 2016, 11:29 a.m. UTC
  Hi,

I`ve applied the "S390: Save and restore fprs/vrs while resolving 
symbols." patch to release branch 2.20 - 2.22. The other patches are 
needed for preparation.

[PATCH 1/5] S390: Add hwcaps value for vector facility.

The HWCAP_S390_VX flag in hwcap field of auxiliary vector indicates
if the vector facility is available and the kernel is aware of it.
This can be tested with LD_SHOW_AUXV=1 <prog>.
Currently it does not show te, because it was not incremented
by commit "S/390: Add hwcap value for transactional execution.".
Thus _DL_HWCAP_COUNT is incremented by two.

ChangeLog:

	* sysdeps/s390/dl-procinfo.c (_dl_s390_platforms): Add vector flag.
	* sysdeps/s390/dl-procinfo.h: Add vector capability.
	* sysdeps/unix/sysv/linux/s390/bits/hwcap.h (HWCAP_S390_VX): Define.

(cherry picked from commit 4e28fa80886c71e6aaf85016b82ce981c0f12e6d)







[PATCH 2/5] S390: Add new s390 platform.

The new IBM z13 is added to platform string array.
The macro _DL_PLATFORMS_COUNT is incremented to 8,
because it was not incremented by commit
"S/390: Sync AUXV capabilities and archs with kernel".

ChangeLog:

	* sysdeps/s390/dl-procinfo.c (_dl_s390_cap_flags): Add z13.
	* sysdeps/s390/dl-procinfo.h (_DL_PLATFORMS_COUNT): Increased.

(cherry picked from commit a1b0488fc9df3d895a2e5eefbcd348d3f7fe0e52)





[PATCH 3/5] S390: configure check for vector instruction support in
  assembler.

The S390 specific test checks if the assembler has support for the new z13
vector instructions by compiling a vector instruction. The .machine and
.machinemode directives are needed to compile the vector instruction without
-march=z13 option on 31/64 bit.
On success the macro HAVE_S390_VX_ASM_SUPPORT is defined. This macro is used
to determine if the optimized functions can be build without compile errors.
If the used assembler lacks vector support, then a warning is dumped while
configuring and only the common code functions are build.

The z13 instruction support was introduced in
"[Committed] S/390: Add support for IBM z13."
(https://sourceware.org/ml/binutils/2015-01/msg00197.html)

ChangeLog:

	* config.h.in (HAVE_S390_VX_ASM_SUPPORT): New macro undefine.
	* sysdeps/s390/configure.ac: Add test for S390 vector instruction
	assembler support.
	* sysdeps/s390/configure: Regenerated.

(cherry picked from commit 4f0a1cea34c05fb2acc16f1a2d291f53230eb4fb)





[PATCH 4/5] S390: Save and restore fprs/vrs while resolving symbols.

On s390, no fpr/vrs were saved while resolving a symbol
via _dl_runtime_resolve/_dl_runtime_profile.

According to the abi, the fpr-arguments are defined as call clobbered.
In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs
instead of saving them to the stack.
If gcc do this in one of the resolver-functions, then the floating point
arguments of a library-function are invalid for the first 
library-function-call.
Thus, this patch saves/restores the fprs around the resolving code.

The same could occur for vector registers. Furthermore an ifunc-resolver
could also clobber the vector/floating point argument registers.
Thus this patch provides the further variants _dl_runtime_resolve_vx/
_dl_runtime_profile_vx, which are used if the kernel claims, that
we run on a machine with vector registers.

Furthermore, if _dl_runtime_profile calls _dl_call_pltexit,
the pointers to inregs-/outregs-structs were setup invalid.
Now they point to the correct location in the stack-frame.
Before branching back to the caller, the return values are now
restored instead of containing the return values of the
_dl_call_pltexit() call.
On s390-32, an endless loop occurs if _dl_call_pltexit() should be called.
Now, this code-path branches to this function instead of just after the
preceding basr-instruction.

ChangeLog:

	* sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice
	to create a non-vector/vector version for _dl_runtime_resolve and
	_dl_runtime_profile. Move implementation to ...
	* sysdeps/s390/s390-32/dl-trampoline.h: ... here.
	(_dl_runtime_resolve) Save and restore fpr/vrs.
	(_dl_runtime_profile) Save and restore vrs and fix some issues
	if _dl_call_pltexit is called.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup):
	Choose the correct resolver function if running on a machine with vx.
	* sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice
	to create a non-vector/vector version for _dl_runtime_resolve and
	_dl_runtime_profile. Move implementation to ...
	* sysdeps/s390/s390-64/dl-trampoline.h: ... here.
	(_dl_runtime_resolve) Save and restore fpr/vrs.
	(_dl_runtime_profile) Save and restore vrs and fix some issues
	* sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup):
	Choose the correct resolver function if running on a machine with vx.

(cherry picked from commit 4603c51ef7989d7eb800cdd6f42aab206f891077
and commit d8a012c5c9e4bfc1b8db2bc6deacb85b44a2e1eb)









[PATCH 5/5] S390: Extend structs La_s390_regs / La_s390_retval with
  vector-registers.

Starting with z13, vector registers can also occur as argument registers.
Thus the passed input/output register structs for
la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new
registers. This patch extends these structs La_s390_regs and La_s390_retval
and adjusts _dl_runtime_profile() to handle those fields in case of
running on a z13 machine.

ChangeLog:

	* sysdeps/s390/bits/link.h: (La_s390_vr) New typedef.
	(La_s390_32_regs): Append vector register lr_v24-lr_v31.
	(La_s390_64_regs): Likewise.
	(La_s390_32_retval): Append vector register lrv_v24.
	(La_s390_64_retval): Likeweise.
	* sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile):
	Handle extended structs La_s390_32_regs and La_s390_32_retval.
	* sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile):
	Handle extended structs La_s390_64_regs and La_s390_64_retval.

(cherry picked from commit 5cdd1989d1d2f135d02e66250f37ba8e767f9772)
  

Patch

From 1e1cfaa2374d9b8bc468bea61469c2a71ef8a75f Mon Sep 17 00:00:00 2001
From: Stefan Liebler <stli@linux.vnet.ibm.com>
Date: Thu, 31 Mar 2016 17:37:16 +0200
Subject: [PATCH 5/5] S390: Extend structs La_s390_regs / La_s390_retval with
 vector-registers.

Starting with z13, vector registers can also occur as argument registers.
Thus the passed input/output register structs for
la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new
registers. This patch extends these structs La_s390_regs and La_s390_retval
and adjusts _dl_runtime_profile() to handle those fields in case of
running on a z13 machine.

ChangeLog:

	* sysdeps/s390/bits/link.h: (La_s390_vr) New typedef.
	(La_s390_32_regs): Append vector register lr_v24-lr_v31.
	(La_s390_64_regs): Likewise.
	(La_s390_32_retval): Append vector register lrv_v24.
	(La_s390_64_retval): Likeweise.
	* sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile):
	Handle extended structs La_s390_32_regs and La_s390_32_retval.
	* sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile):
	Handle extended structs La_s390_64_regs and La_s390_64_retval.

(cherry picked from commit 5cdd1989d1d2f135d02e66250f37ba8e767f9772)
---
 ChangeLog                            | 12 ++++++
 sysdeps/s390/bits/link.h             | 29 +++++++++++++
 sysdeps/s390/s390-32/dl-trampoline.h | 76 +++++++++++++++++++-------------
 sysdeps/s390/s390-64/dl-trampoline.h | 84 +++++++++++++++++++++---------------
 4 files changed, 136 insertions(+), 65 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6788f58..6c8f1ad 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@ 
 2016-04-04  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/bits/link.h: (La_s390_vr) New typedef.
+	(La_s390_32_regs): Append vector register lr_v24-lr_v31.
+	(La_s390_64_regs): Likewise.
+	(La_s390_32_retval): Append vector register lrv_v24.
+	(La_s390_64_retval): Likeweise.
+	* sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile):
+	Handle extended structs La_s390_32_regs and La_s390_32_retval.
+	* sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile):
+	Handle extended structs La_s390_64_regs and La_s390_64_retval.
+
+2016-04-04  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice
 	to create a non-vector/vector version for _dl_runtime_resolve and
 	_dl_runtime_profile. Move implementation to ...
diff --git a/sysdeps/s390/bits/link.h b/sysdeps/s390/bits/link.h
index 370a767..f431009 100644
--- a/sysdeps/s390/bits/link.h
+++ b/sysdeps/s390/bits/link.h
@@ -19,6 +19,9 @@ 
 # error "Never include <bits/link.h> directly; use <link.h> instead."
 #endif
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+typedef char La_s390_vr[16];
+#endif
 
 #if __ELF_NATIVE_CLASS == 32
 
@@ -32,6 +35,16 @@  typedef struct La_s390_32_regs
   uint32_t lr_r6;
   double lr_fp0;
   double lr_fp2;
+# if defined HAVE_S390_VX_ASM_SUPPORT
+  La_s390_vr lr_v24;
+  La_s390_vr lr_v25;
+  La_s390_vr lr_v26;
+  La_s390_vr lr_v27;
+  La_s390_vr lr_v28;
+  La_s390_vr lr_v29;
+  La_s390_vr lr_v30;
+  La_s390_vr lr_v31;
+# endif
 } La_s390_32_regs;
 
 /* Return values for calls from PLT on s390-32.  */
@@ -40,6 +53,9 @@  typedef struct La_s390_32_retval
   uint32_t lrv_r2;
   uint32_t lrv_r3;
   double lrv_fp0;
+# if defined HAVE_S390_VX_ASM_SUPPORT
+  La_s390_vr lrv_v24;
+# endif
 } La_s390_32_retval;
 
 
@@ -77,6 +93,16 @@  typedef struct La_s390_64_regs
   double lr_fp2;
   double lr_fp4;
   double lr_fp6;
+# if defined HAVE_S390_VX_ASM_SUPPORT
+  La_s390_vr lr_v24;
+  La_s390_vr lr_v25;
+  La_s390_vr lr_v26;
+  La_s390_vr lr_v27;
+  La_s390_vr lr_v28;
+  La_s390_vr lr_v29;
+  La_s390_vr lr_v30;
+  La_s390_vr lr_v31;
+# endif
 } La_s390_64_regs;
 
 /* Return values for calls from PLT on s390-64.  */
@@ -84,6 +110,9 @@  typedef struct La_s390_64_retval
 {
   uint64_t lrv_r2;
   double lrv_fp0;
+# if defined HAVE_S390_VX_ASM_SUPPORT
+  La_s390_vr lrv_v24;
+# endif
 } La_s390_64_retval;
 
 
diff --git a/sysdeps/s390/s390-32/dl-trampoline.h b/sysdeps/s390/s390-32/dl-trampoline.h
index 5627567..086449f 100644
--- a/sysdeps/s390/s390-32/dl-trampoline.h
+++ b/sysdeps/s390/s390-32/dl-trampoline.h
@@ -112,28 +112,31 @@  _dl_runtime_resolve:
 	cfi_startproc
 	.align 16
 _dl_runtime_profile:
-	stm    %r2,%r6,32(%r15)		# save registers
-	cfi_offset (r2, -64)		# + r6 needed as arg for
-	cfi_offset (r3, -60)		#  _dl_profile_fixup
-	cfi_offset (r4, -56)
-	cfi_offset (r5, -52)
-	cfi_offset (r6, -48)
-	std    %f0,56(%r15)
-	cfi_offset (f0, -40)
-	std    %f2,64(%r15)
-	cfi_offset (f2, -32)
 	st     %r12,12(%r15)		# r12 is used as backup of r15
 	cfi_offset (r12, -84)
 	st     %r14,16(%r15)
 	cfi_offset (r14, -80)
 	lr     %r12,%r15		# backup stack pointer
 	cfi_def_cfa_register (12)
+	ahi    %r15,-264		# create stack frame:
+					# 96 + sizeof(La_s390_32_regs)
+	st     %r12,0(%r15)		# save backchain
+
+	stm    %r2,%r6,96(%r15)		# save registers
+	cfi_offset (r2, -264)		# + r6 needed as arg for
+	cfi_offset (r3, -260)		#  _dl_profile_fixup
+	cfi_offset (r4, -256)
+	cfi_offset (r5, -252)
+	cfi_offset (r6, -248)
+	std    %f0,120(%r15)
+	cfi_offset (f0, -240)
+	std    %f2,128(%r15)
+	cfi_offset (f2, -232)
 #ifdef RESTORE_VRS
-	ahi    %r15,-224		# create stack frame
 	.machine push
 	.machine "z13"
 	.machinemode "zarch_nohighgprs"
-	vstm   %v24,%v31,96(%r15)	# store call-clobbered vr arguments
+	vstm   %v24,%v31,136(%r15)	# store call-clobbered vr arguments
 	cfi_offset (v24, -224)
 	cfi_offset (v25, -208)
 	cfi_offset (v26, -192)
@@ -143,31 +146,31 @@  _dl_runtime_profile:
 	cfi_offset (v30, -128)
 	cfi_offset (v31, -112)
 	.machine pop
-#else
-	ahi    %r15,-96			# create stack frame
 #endif
-	st     %r12,0(%r15)		# save backchain
+
 	lm     %r2,%r3,24(%r12)		# load arguments saved by PLT
 	lr     %r4,%r14			# return address as third parameter
 	basr   %r1,0
 0:	l      %r14,6f-0b(%r1)
-	la     %r5,32(%r12)		# pointer to struct La_s390_32_regs
+	la     %r5,96(%r15)		# pointer to struct La_s390_32_regs
 	la     %r6,20(%r12)		# long int * framesize
 	bas    %r14,0(%r14,%r1)		# call resolver
 	lr     %r1,%r2			# function addr returned in r2
-	ld     %f0,56(%r12)		# restore call-clobbered arg fprs
-	ld     %f2,64(%r12)
+	ld     %f0,120(%r15)		# restore call-clobbered arg fprs
+	ld     %f2,128(%r15)
 #ifdef RESTORE_VRS
 	.machine push
 	.machine "z13"
 	.machinemode "zarch_nohighgprs"
-	vlm    %v24,%v31,96(%r15)	# restore call-clobbered arg vrs
+	vlm    %v24,%v31,136(%r15)	# restore call-clobbered arg vrs
 	.machine pop
 #endif
 	icm    %r0,15,20(%r12)		# load & test framesize
 	jnm    2f
 
-	lm     %r2,%r6,32(%r12)
+	lm     %r2,%r6,96(%r15)		# framesize < 0 means no pltexit call
+					# so we can do a tail call without
+					# copying the arg overflow area
 	lr     %r15,%r12		# remove stack frame
 	cfi_def_cfa_register (15)
 	l      %r14,16(%r15)		# restore registers
@@ -175,7 +178,9 @@  _dl_runtime_profile:
 	br     %r1			# tail-call to the resolved function
 
 	cfi_def_cfa_register (12)
-2:	jz     4f			# framesize == 0 ?
+2:	la     %r4,96(%r15)		# pointer to struct La_s390_32_regs
+	st     %r4,32(%r12)
+	jz     4f			# framesize == 0 ?
 	ahi    %r0,7			# align framesize to 8
 	lhi    %r2,-8
 	nr     %r0,%r2
@@ -188,24 +193,35 @@  _dl_runtime_profile:
 	la     %r2,8(%r2)
 	la     %r3,8(%r3)
 	brct   %r0,3b
-4:	lm     %r2,%r6,32(%r12)		# load register parameters
+4:	lm     %r2,%r6,0(%r4)		# load register parameters
 	basr   %r14,%r1			# call resolved function
-	stm    %r2,%r3,72(%r12)		# store return values r2, r3, f0
-	std    %f0,80(%r12)		# to struct La_s390_32_retval
-	lm     %r2,%r3,24(%r12)		# load arguments saved by PLT
+	stm    %r2,%r3,40(%r12)		# store return values r2, r3, f0
+	std    %f0,48(%r12)		# to struct La_s390_32_retval
+#ifdef RESTORE_VRS
+	.machine push
+	.machine "z13"
+	vst    %v24,56(%r12)		# store return value v24
+	.machine pop
+#endif
+	lm     %r2,%r4,24(%r12)		# r2, r3: load arguments saved by PLT
+					# r4: pointer to struct La_s390_32_regs
 	basr   %r1,0
 5:	l      %r14,7f-5b(%r1)
-	la     %r4,32(%r12)		# pointer to struct La_s390_32_regs
-	la     %r5,72(%r12)		# pointer to struct La_s390_32_retval
+	la     %r5,40(%r12)		# pointer to struct La_s390_32_retval
 	bas    %r14,0(%r14,%r1)		# call _dl_call_pltexit
 
 	lr     %r15,%r12		# remove stack frame
 	cfi_def_cfa_register (15)
 	l      %r14,16(%r15)		# restore registers
 	l      %r12,12(%r15)
-	l      %r2,72(%r15)		# restore return values
-	l      %r3,76(%r15)
-	ld     %f0,80(%r15)
+	lm     %r2,%r3,40(%r15)		# restore return values
+	ld     %f0,48(%r15)
+#ifdef RESTORE_VRS
+	.machine push
+	.machine "z13"
+	vl    %v24,56(%r15)		# restore return value v24
+	.machine pop
+#endif
 	br     %r14
 
 6:	.long  _dl_profile_fixup - 0b
diff --git a/sysdeps/s390/s390-64/dl-trampoline.h b/sysdeps/s390/s390-64/dl-trampoline.h
index 658e3a3..33ea3de 100644
--- a/sysdeps/s390/s390-64/dl-trampoline.h
+++ b/sysdeps/s390/s390-64/dl-trampoline.h
@@ -109,31 +109,34 @@  _dl_runtime_resolve:
 	cfi_startproc
 	.align 16
 _dl_runtime_profile:
-	stmg   %r2,%r6,64(%r15)		# save call-clobbered arg regs
-	cfi_offset (r2, -96)		# + r6 needed as arg for
-	cfi_offset (r3, -88)		#  _dl_profile_fixup
-	cfi_offset (r4, -80)
-	cfi_offset (r5, -72)
-	cfi_offset (r6, -64)
-	std    %f0,104(%r15)
-	cfi_offset (f0, -56)
-	std    %f2,112(%r15)
-	cfi_offset (f2, -48)
-	std    %f4,120(%r15)
-	cfi_offset (f4, -40)
-	std    %f6,128(%r15)
-	cfi_offset (f6, -32)
 	stg    %r12,24(%r15)		# r12 is used as backup of r15
 	cfi_offset (r12, -136)
 	stg    %r14,32(%r15)
 	cfi_offset (r14, -128)
 	lgr    %r12,%r15		# backup stack pointer
 	cfi_def_cfa_register (12)
+	aghi   %r15,-360		# create stack frame:
+					# 160 + sizeof(La_s390_64_regs)
+	stg    %r12,0(%r15)		# save backchain
+
+	stmg   %r2,%r6,160(%r15)	# save call-clobbered arg regs
+	cfi_offset (r2, -360)		# + r6 needed as arg for
+	cfi_offset (r3, -352)		#  _dl_profile_fixup
+	cfi_offset (r4, -344)
+	cfi_offset (r5, -336)
+	cfi_offset (r6, -328)
+	std    %f0,200(%r15)
+	cfi_offset (f0, -320)
+	std    %f2,208(%r15)
+	cfi_offset (f2, -312)
+	std    %f4,216(%r15)
+	cfi_offset (f4, -304)
+	std    %f6,224(%r15)
+	cfi_offset (f6, -296)
 #ifdef RESTORE_VRS
-	aghi   %r15,-288		# create stack frame
 	.machine push
 	.machine "z13"
-	vstm   %v24,%v31,160(%r15)# store call-clobbered vector argument registers
+	vstm   %v24,%v31,232(%r15)      # store call-clobbered vector arguments
 	cfi_offset (v24, -288)
 	cfi_offset (v25, -272)
 	cfi_offset (v26, -256)
@@ -143,31 +146,28 @@  _dl_runtime_profile:
 	cfi_offset (v30, -192)
 	cfi_offset (v31, -176)
 	.machine pop
-#else
-	aghi   %r15,-160		# create stack frame
 #endif
-	stg    %r12,0(%r15)		# save backchain
 	lmg    %r2,%r3,48(%r12)		# load arguments saved by PLT
 	lgr    %r4,%r14			# return address as third parameter
-	la     %r5,64(%r12)		# pointer to struct La_s390_64_regs
+	la     %r5,160(%r15)		# pointer to struct La_s390_64_regs
 	la     %r6,40(%r12)		# long int * framesize
 	brasl  %r14,_dl_profile_fixup	# call resolver
 	lgr    %r1,%r2			# function addr returned in r2
-	ld     %f0,104(%r12)		# restore call-clobbered arg fprs
-	ld     %f2,112(%r12)
-	ld     %f4,120(%r12)
-	ld     %f6,128(%r12)
+	ld     %f0,200(%r15)		# restore call-clobbered arg fprs
+	ld     %f2,208(%r15)
+	ld     %f4,216(%r15)
+	ld     %f6,224(%r15)
 #ifdef RESTORE_VRS
 	.machine push
 	.machine "z13"
-	vlm    %v24,%v31,160(%r15)	# restore call-clobbered arg vrs
+	vlm    %v24,%v31,232(%r15)	# restore call-clobbered arg vrs
 	.machine pop
 #endif
 	lg     %r0,40(%r12)		# load framesize
 	ltgr   %r0,%r0
 	jnm    1f
 
-	lmg    %r2,%r6,64(%r12)		# framesize < 0 means no pltexit call
+	lmg    %r2,%r6,160(%r15)	# framesize < 0 means no pltexit call
 					# so we can do a tail call without
 					# copying the arg overflow area
 	lgr    %r15,%r12		# remove stack frame
@@ -177,7 +177,9 @@  _dl_runtime_profile:
 	br     %r1			# tail-call to resolved function
 
 	cfi_def_cfa_register (12)
-1:	jz     4f			# framesize == 0 ?
+1:	la     %r4,160(%r15)		# pointer to struct La_s390_64_regs
+	stg    %r4,64(%r12)
+	jz     4f			# framesize == 0 ?
 	aghi   %r0,7			# align framesize to 8
 	nill   %r0,0xfff8
 	slgr   %r15,%r0			# make room for framesize bytes
@@ -189,21 +191,33 @@  _dl_runtime_profile:
 	la     %r2,8(%r2)		# depending on framesize
 	la     %r3,8(%r3)
 	brctg  %r0,3b
-4:	lmg    %r2,%r6,64(%r12)		# restore call-clobbered arg gprs
+4:	lmg    %r2,%r6,0(%r4)		# restore call-clobbered arg gprs
 	basr   %r14,%r1			# call resolved function
-	stg    %r2,136(%r12)		# store return values r2, f0
-	std    %f0,144(%r12)		# to struct La_s390_64_retval
-	lmg    %r2,%r3,48(%r12)		# load arguments saved by PLT
-	la     %r4,64(%r12)		# pointer to struct La_s390_64_regs
-	la     %r5,136(%r12)		# pointer to struct La_s390_64_retval
+	stg    %r2,72(%r12)		# store return values r2, f0
+	std    %f0,80(%r12)		# to struct La_s390_64_retval
+#ifdef RESTORE_VRS
+	.machine push
+	.machine "z13"
+	vst    %v24,88(%r12)		# store return value v24
+	.machine pop
+#endif
+	lmg    %r2,%r4,48(%r12)		# r2, r3: load arguments saved by PLT
+					# r4: pointer to struct La_s390_64_regs
+	la     %r5,72(%r12)		# pointer to struct La_s390_64_retval
 	brasl  %r14,_dl_call_pltexit
 
 	lgr    %r15,%r12		# remove stack frame
 	cfi_def_cfa_register (15)
 	lg     %r14,32(%r15)		# restore registers
 	lg     %r12,24(%r15)
-	lg     %r2,136(%r15)		# restore return values
-	ld     %f0,144(%r15)
+	lg     %r2,72(%r15)		# restore return values
+	ld     %f0,80(%r15)
+#ifdef RESTORE_VRS
+	.machine push
+	.machine "z13"
+	vl    %v24,88(%r15)		# restore return value v24
+	.machine pop
+#endif
 	br     %r14			# Jump back to caller
 
 	cfi_endproc
-- 
2.3.0