Message ID | 20200504152503.9608-1-murphyp@linux.vnet.ibm.com |
---|---|
State | Committed |
Headers |
Return-Path: <libc-alpha-bounces@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9B7BA3876059; Mon, 4 May 2020 15:25:08 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9B7BA3876059 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1588605908; bh=ymJsiiuF/n4gM/tOIlNlYqmwyuse9eKuH5SPKyaLtSY=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=xzLtlnoxJM2cFiRkKDuExrIa77a+0legILyVb//l8PNV9q0ykvbXgRviLQ1cG9ntK bF58Cp5+xd8zN6HCFUulRmSsCFGgLdCwn6+PKnSvFMR2uCl/lKCPyCqtFDtblN+ATQ VuehYD66TSJv+bPHjGvnMeo3Qn8YcoKj649lXb08= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id C7938383E838 for <libc-alpha@sourceware.org>; Mon, 4 May 2020 15:25:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C7938383E838 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 044F2NHw189967; Mon, 4 May 2020 11:25:04 -0400 Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0a-001b2d01.pphosted.com with ESMTP id 30s4v6yjgv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 04 May 2020 11:25:04 -0400 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 044FJvue020045; Mon, 4 May 2020 15:25:04 GMT Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com [9.57.198.25]) by ppma04wdc.us.ibm.com with ESMTP id 30s0g6hj68-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 04 May 2020 15:25:04 +0000 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp22035.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 044FP3rq49480066 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 4 May 2020 15:25:03 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D15CEB2064; Mon, 4 May 2020 15:25:03 +0000 (GMT) Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8BEF6B2065; Mon, 4 May 2020 15:25:03 +0000 (GMT) Received: from brokenarrow.ibmuc.com (unknown [9.85.137.49]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP; Mon, 4 May 2020 15:25:03 +0000 (GMT) To: libc-alpha@sourceware.org, adhemerval.zanella@linaro.org Subject: [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905] Date: Mon, 4 May 2020 10:25:03 -0500 Message-Id: <20200504152503.9608-1-murphyp@linux.vnet.ibm.com> X-Mailer: git-send-email 2.21.1 In-Reply-To: <27007d15-933d-8d13-161f-cab2b26b37f2@linaro.org> References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-05-04_09:2020-05-04, 2020-05-04 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 spamscore=0 suspectscore=0 adultscore=0 clxscore=1011 impostorscore=0 lowpriorityscore=0 bulkscore=0 phishscore=0 mlxscore=0 mlxlogscore=876 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005040121 X-Spam-Status: No, score=-21.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <http://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <http://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: "Paul E. Murphy via Libc-alpha" <libc-alpha@sourceware.org> Reply-To: "Paul E. Murphy" <murphyp@linux.vnet.ibm.com> Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org> |
Series |
[PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
|
|
Commit Message
Paul E. Murphy
May 4, 2020, 3:25 p.m. UTC
Use rtld-strcpy.S file in power9 to redirect to power8 implementation instead. This builds the power8 strcpy rtld with --with-cpu=power9 and --disable-multi-arch. The existing behavior is unchanged when multi-arch is enabled. ---8<--- strcmp is used while resolving PLT references. Vector registers should not be used during this. The P9 strcmp makes heavy use of vector registers, so it should be avoided in rtld. This prevents quiet vector register corruption when glibc is configured with --disable-multi-arch and --with-cpu=power9. This can be seen with test-float64x-compat_totalordermag during the first call into totalordermagf64x@GLIBC_2.27. Add a guard to fallback to the power8 implementation when building power9 strcmp for libraries other than libc. --- sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
Comments
On 04/05/2020 12:25, Paul E. Murphy wrote: > Use rtld-strcpy.S file in power9 to redirect to power8 implementation > instead. This builds the power8 strcpy rtld with --with-cpu=power9 and s/strcpy/strcmp. > --disable-multi-arch. The existing behavior is unchanged when > multi-arch is enabled. LGTM thanks. As a side note, is strcmp the only string routine that used in lazy resolution? Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> > > ---8<--- > strcmp is used while resolving PLT references. Vector registers > should not be used during this. The P9 strcmp makes heavy use of > vector registers, so it should be avoided in rtld. > > This prevents quiet vector register corruption when glibc is configured > with --disable-multi-arch and --with-cpu=power9. This can be seen with > test-float64x-compat_totalordermag during the first call into > totalordermagf64x@GLIBC_2.27. > > Add a guard to fallback to the power8 implementation when building > power9 strcmp for libraries other than libc. > --- > sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S | 2 ++ > 1 file changed, 2 insertions(+) > create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S > > diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S > new file mode 100644 > index 0000000000..afdb492b3d > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S > @@ -0,0 +1,2 @@ > +/* Fallback to P8 which does not use vector regs for rtld. */ > +#include <sysdeps/powerpc/powerpc64/power8/strcmp.S> >
On 5/4/20 11:36 AM, Adhemerval Zanella wrote: > As a side note, is strcmp the only string routine that used in lazy > resolution? It was the only one I encountered while debugging this issue. I won't claim it is the only one, I have limited experience with that corner of glibc. Though, I am surprised only one test failed due to the vector register clobbering.
On 04/05/2020 15:02, Paul E Murphy wrote: > > > On 5/4/20 11:36 AM, Adhemerval Zanella wrote: > >> As a side note, is strcmp the only string routine that used in lazy >> resolution? > > It was the only one I encountered while debugging this issue. I won't claim it is the only one, I have limited experience with that corner of glibc. Though, I am surprised only one test failed due to the vector register clobbering. I am wondering which would be best way to avoid such issue to happen with different string routines since some specific loader code need to adhere to more strict ABI requirements. The powerpc Makefile already adds the expected compiler tools to avoid this very issue: --- sysdeps/powerpc/powerpc64/Makefile 13 # These flags prevent FPU or Altivec registers from being used, 14 # for code called in contexts that is not allowed to touch those registers. 15 # Stupid GCC requires us to pass all these ridiculous switches. We need to 16 # pass the -mno-* switches as well to prevent the compiler from attempting 17 # to emit altivec or vsx instructions, especially when the registers aren't 18 # available. 19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \ 20 $(foreach m,2 3 4 5 6 7 8 9, \ 21 3$m 4$m 5$m),\ 22 -ffixed-$n)) \ 23 $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\ 24 $m 1$m 2$m) 30 31,\ 25 -ffixed-v$n)) \ 26 -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx [...] 36 CFLAGS-dl-runtime.os = $(no-special-regs) 37 CFLAGS-dl-lookup.os = $(no-special-regs) 38 CFLAGS-dl-misc.os = $(no-special-regs) 39 CFLAGS-rtld-mempcpy.os = $(no-special-regs) 40 CFLAGS-rtld-memmove.os = $(no-special-regs) 41 CFLAGS-rtld-memchr.os = $(no-special-regs) 42 CFLAGS-rtld-strnlen.os = $(no-special-regs) --- However it does not help when an arch-specific implementation might ended being pulled in rtld build and thus making this ineffective (And it seems that the strcmp is missing as well for the powerpc case). And using __builtin_strcmp not -ffreestanding does really help on dl-runtime, since strcmp call might still being generated by compiler. Maybe one option could to enforce all rtld code to use generic C implementation instead, not allowing sysdeps folder to override it. It should scale better than actually testing all possible permutation and require less hacky solutions like overriding the loader object for an specific build option as this patch.
On 5/4/20 1:26 PM, Adhemerval Zanella wrote: > > > On 04/05/2020 15:02, Paul E Murphy wrote: >> >> >> On 5/4/20 11:36 AM, Adhemerval Zanella wrote: >> >>> As a side note, is strcmp the only string routine that used in lazy >>> resolution? >> >> It was the only one I encountered while debugging this issue. I won't claim it is the only one, I have limited experience with that corner of glibc. Though, I am surprised only one test failed due to the vector register clobbering. > > I am wondering which would be best way to avoid such issue to happen > with different string routines since some specific loader code need > to adhere to more strict ABI requirements. > > The powerpc Makefile already adds the expected compiler tools to > avoid this very issue: > > --- > sysdeps/powerpc/powerpc64/Makefile > > 13 # These flags prevent FPU or Altivec registers from being used, > 14 # for code called in contexts that is not allowed to touch those registers. > 15 # Stupid GCC requires us to pass all these ridiculous switches. We need to > 16 # pass the -mno-* switches as well to prevent the compiler from attempting > 17 # to emit altivec or vsx instructions, especially when the registers aren't > 18 # available. > 19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \ > 20 $(foreach m,2 3 4 5 6 7 8 9, \ > 21 3$m 4$m 5$m),\ > 22 -ffixed-$n)) \ > 23 $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\ > 24 $m 1$m 2$m) 30 31,\ > 25 -ffixed-v$n)) \ > 26 -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx > [...] > 36 CFLAGS-dl-runtime.os = $(no-special-regs) > 37 CFLAGS-dl-lookup.os = $(no-special-regs) > 38 CFLAGS-dl-misc.os = $(no-special-regs) > 39 CFLAGS-rtld-mempcpy.os = $(no-special-regs) > 40 CFLAGS-rtld-memmove.os = $(no-special-regs) > 41 CFLAGS-rtld-memchr.os = $(no-special-regs) > 42 CFLAGS-rtld-strnlen.os = $(no-special-regs) Hrm, all the string functions listed above generate VMX instructions when building with --with-cpu=power9 --disable-multi-arch. I question the correctness and completeness of this list as strcmp is missing. > --- > > However it does not help when an arch-specific implementation might ended being > pulled in rtld build and thus making this ineffective (And it seems that > the strcmp is missing as well for the powerpc case). > > And using __builtin_strcmp not -ffreestanding does really help on dl-runtime, > since strcmp call might still being generated by compiler. > > Maybe one option could to enforce all rtld code to use generic C implementation > instead, not allowing sysdeps folder to override it. It should scale better > than actually testing all possible permutation and require less hacky solutions > like overriding the loader object for an specific build option as this patch. > I agree with that line of thought. However, what would the performance impact be? I would hazard to guess it would be limited.
On 04/05/2020 18:16, Paul E Murphy wrote: > > > On 5/4/20 1:26 PM, Adhemerval Zanella wrote: >> >> >> On 04/05/2020 15:02, Paul E Murphy wrote: >>> >>> >>> On 5/4/20 11:36 AM, Adhemerval Zanella wrote: >>> >>>> As a side note, is strcmp the only string routine that used in lazy >>>> resolution? >>> >>> It was the only one I encountered while debugging this issue. I won't claim it is the only one, I have limited experience with that corner of glibc. Though, I am surprised only one test failed due to the vector register clobbering. >> >> I am wondering which would be best way to avoid such issue to happen >> with different string routines since some specific loader code need >> to adhere to more strict ABI requirements. >> >> The powerpc Makefile already adds the expected compiler tools to >> avoid this very issue: >> >> --- >> sysdeps/powerpc/powerpc64/Makefile >> >> 13 # These flags prevent FPU or Altivec registers from being used, >> 14 # for code called in contexts that is not allowed to touch those registers. >> 15 # Stupid GCC requires us to pass all these ridiculous switches. We need to >> 16 # pass the -mno-* switches as well to prevent the compiler from attempting >> 17 # to emit altivec or vsx instructions, especially when the registers aren't >> 18 # available. >> 19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \ >> 20 $(foreach m,2 3 4 5 6 7 8 9, \ >> 21 3$m 4$m 5$m),\ >> 22 -ffixed-$n)) \ >> 23 $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\ >> 24 $m 1$m 2$m) 30 31,\ >> 25 -ffixed-v$n)) \ >> 26 -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx >> [...] >> 36 CFLAGS-dl-runtime.os = $(no-special-regs) >> 37 CFLAGS-dl-lookup.os = $(no-special-regs) >> 38 CFLAGS-dl-misc.os = $(no-special-regs) > > >> 39 CFLAGS-rtld-mempcpy.os = $(no-special-regs) >> 40 CFLAGS-rtld-memmove.os = $(no-special-regs) >> 41 CFLAGS-rtld-memchr.os = $(no-special-regs) >> 42 CFLAGS-rtld-strnlen.os = $(no-special-regs) > > Hrm, all the string functions listed above generate VMX instructions when building with --with-cpu=power9 --disable-multi-arch. I question the correctness and completeness of this list as strcmp is missing. I think that's the problem of having a dissociated rule from the code it actually uses, since its definition might be unnecessary if the code is refactored or changed. I didn't dig into the history of the lazy resolution code, but it might the case where old implementation did actually used the referenced routines. What I think would be a better approach is to make no-special-regs a parametric rule define in the architecture Makefile (which should know which registers it supports in lazy resolution) and use this rule (no-special-regs) on generic definition in elf/Makefile. > >> --- >> >> However it does not help when an arch-specific implementation might ended being >> pulled in rtld build and thus making this ineffective (And it seems that >> the strcmp is missing as well for the powerpc case). >> >> And using __builtin_strcmp not -ffreestanding does really help on dl-runtime, >> since strcmp call might still being generated by compiler. >> >> Maybe one option could to enforce all rtld code to use generic C implementation >> instead, not allowing sysdeps folder to override it. It should scale better >> than actually testing all possible permutation and require less hacky solutions >> like overriding the loader object for an specific build option as this patch. >> > I agree with that line of thought. However, what would the performance impact be? I would hazard to guess it would be limited. It would impart mostly the loader code and I would expects mostly in program loading (since distros now are moving to make bind now the default build option) with large a large number of symbols with large sizes (most likely C++ programs with load of shared libraries). Using clang default installation from ubuntu-10 (which links to quite large C++ libraries with more than 30k symbols) on x86_64, strcmp consumes about ~3% of total runtime. And adding a very crude strcmp profiling using the same strategy for relocation time from rtld.c at elf/dl-lookup.c it seems strcmp consumes about 2.2% of total relocation time. Most of time is in fact spend at _dl_lookup_symbol_x and _dl_relocate_object, so I am not sure if using the best mem* routines in dynamic loader is indeed paramount.
diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S new file mode 100644 index 0000000000..afdb492b3d --- /dev/null +++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S @@ -0,0 +1,2 @@ +/* Fallback to P8 which does not use vector regs for rtld. */ +#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>