From patchwork Sun Jun 16 11:49:47 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh Bodapati
X-Patchwork-Id: 92263
From:
 bmahi496@linux.ibm.com
To: libc-alpha@sourceware.org
Cc: bergner@linux.ibm.com, MAHESH BODAPATI
Subject: [PATCH] powerpc64: Add optimized strcpy and stpcpy for POWER10
Date: Sun, 16 Jun 2024 06:49:47 -0500
Message-ID: <20240616114947.413012-1-bmahi496@linux.ibm.com>
X-Mailer: git-send-email 2.43.0

From: MAHESH BODAPATI

Improvements compared to POWER9 version:

Use simple comparisons for the first ~512 bytes
  The main loop is good for long strings, but comparing 16B each time is
  better for shorter strings. After aligning the address to 16 bytes, we
  unroll the loop four times, checking 128 bytes each time.
  There may be some overlap with the main loop for unaligned strings, but
  it is better for shorter strings.

Use new P10 instructions
  lxvp is used to load 32B with a single instruction, reducing contention
  in the load queue.

The degradations for smaller strings are not consistent and the overall
performance numbers are good.
---
 sysdeps/powerpc/powerpc64/le/power10/stpcpy.S |  24 ++
 sysdeps/powerpc/powerpc64/le/power10/strcpy.S | 394 ++++++++++++++++++
 sysdeps/powerpc/powerpc64/multiarch/Makefile  |   3 +-
 .../powerpc64/multiarch/ifunc-impl-list.c     |   6 +
 .../powerpc64/multiarch/stpcpy-power10.S      |  24 ++
 sysdeps/powerpc/powerpc64/multiarch/stpcpy.c  |  10 +-
 .../powerpc64/multiarch/strcpy-power10.S      |  26 ++
 sysdeps/powerpc/powerpc64/multiarch/strcpy.c  |   8 +-
 8 files changed, 489 insertions(+), 6 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/le/power10/stpcpy.S
 create mode 100644 sysdeps/powerpc/powerpc64/le/power10/strcpy.S
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/stpcpy-power10.S
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strcpy-power10.S

diff --git a/sysdeps/powerpc/powerpc64/le/power10/stpcpy.S b/sysdeps/powerpc/powerpc64/le/power10/stpcpy.S
new file mode 100644
index 0000000000..711a1ad512
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/le/power10/stpcpy.S
@@ -0,0 +1,24 @@
+/* Optimized stpcpy implementation for PowerPC64/POWER10.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define USE_AS_STPCPY
+#include
+
+weak_alias (__stpcpy, stpcpy)
+libc_hidden_def (__stpcpy)
+libc_hidden_builtin_def (stpcpy)
diff --git a/sysdeps/powerpc/powerpc64/le/power10/strcpy.S b/sysdeps/powerpc/powerpc64/le/power10/strcpy.S
new file mode 100644
index 0000000000..de52c92e78
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/le/power10/strcpy.S
@@ -0,0 +1,394 @@
+/* Optimized strcpy implementation for PowerPC64/POWER10.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+
+#ifdef USE_AS_STPCPY
+# ifndef STPCPY
+#  define FUNC_NAME __stpcpy
+# else
+#  define FUNC_NAME STPCPY
+# endif
+#else
+# ifndef STRCPY
+#  define FUNC_NAME strcpy
+# else
+#  define FUNC_NAME STRCPY
+# endif
+#endif  /* !USE_AS_STPCPY  */
+
+/* Implements the function
+
+   char * [r3] strcpy (char *dest [r3], const char *src [r4])
+
+   or
+
+   char * [r3] stpcpy (char *dest [r3], const char *src [r4])
+
+   if USE_AS_STPCPY is defined.
+
+   The implementation can load bytes past a NULL terminator, but only
+   up to the next 16B/64B boundary, so it never crosses a page.  */
+
+
+#define LXVP(xtp,dq,ra)                 \
+	.long(((6)<<(32-6))             \
+	| ((((xtp)-32)>>1)<<(32-10))    \
+	| ((1)<<(32-11))                \
+	| ((ra)<<(32-16))               \
+	| dq)
+
+/* Load 4 quadwords, merge into one VR for speed and check for NULLs
+   and branch to label if NULL is found.  */
+#define CHECK_64B(offset,addr,label)    \
+	LXVP(v4+32,offset,addr);        \
+	LXVP(v6+32,offset+32,addr);     \
+	vminub v14,v4,v5;               \
+	vminub v15,v6,v7;               \
+	vminub v16,v14,v15;             \
+	vcmpequb. v0,v16,v18;           \
+	beq cr6,$+12;                   \
+	li r7,offset;                   \
+	b L(label);                     \
+	stxv 32+v5,(offset+0)(r11);     \
+	stxv 32+v4,(offset+16)(r11);    \
+	stxv 32+v7,(offset+32)(r11);    \
+	stxv 32+v6,(offset+48)(r11)
+
+/* Load quadword at addr+offset to vreg, check for NULL bytes,
+   and branch to label if any are found.  */
+#define CHECK_16B(vreg,offset,addr,label)   \
+	lxv vreg+32,offset(addr);           \
+	vcmpequb. v15,vreg,v18;             \
+	bne cr6,L(label);
+
+/* Store vreg2 with length if NULL is found.  */
+#define STORE_WITH_LEN(vreg1,vreg2,reg)   \
+	vctzlsbb r8,vreg1;                \
+	addi r9,r8,1;                     \
+	sldi r9,r9,56;                    \
+	stxvl 32+vreg2,reg,r9;
+
+
+	/* TODO: change this to .machine power10 when the minimum required
+	   binutils allows it.  */
+	.machine power9
+ENTRY_TOCLESS (FUNC_NAME, 4)
+	CALL_MCOUNT 2
+
+	vspltisb v18,0		/* Zeroes in v18  */
+	vspltisb v19,-1		/* 0xFF bytes in v19  */
+
+	/* Next 16B-aligned address.  Prepare address for L(loop).  */
+	addi	r5,r4,16
+	clrrdi	r5,r5,4
+	subf	r8,r4,r5
+	add	r11,r3,r8
+
+	/* Align data and fill bytes not loaded with non matching char.  */
+	lvx	v0,0,r4
+	lvsr	v1,0,r4
+	vperm	v0,v19,v0,v1
+
+	vcmpequb. v6,v0,v18	/* 0xff if byte is NULL, 0x00 otherwise.  */
+	beq	cr6,L(no_null)
+
+	/* There's a NULL byte.  */
+	STORE_WITH_LEN(v6,v0,r3)
+
+#ifdef USE_AS_STPCPY
+	/* stpcpy returns the dest address plus the size not counting the
+	   final '\0'.  */
+	add	r3,r3,r8
+#endif
+	blr
+
+L(no_null):
+	sldi	r10,r8,56	/* stxvl wants size in top 8 bits.  */
+	stxvl	32+v0,r3,r10	/* Partial store.  */
+
+/* The main loop is optimized for longer strings(> 512 bytes),
+   so checking the first bytes in 16B chunks benefits shorter
+   strings a lot.  */
+	.p2align 4
+L(aligned):
+	CHECK_16B(v0,0,r5,tail1)
+	CHECK_16B(v1,16,r5,tail2)
+	CHECK_16B(v2,32,r5,tail3)
+	CHECK_16B(v3,48,r5,tail4)
+	CHECK_16B(v4,64,r5,tail5)
+	CHECK_16B(v5,80,r5,tail6)
+	CHECK_16B(v6,96,r5,tail7)
+	CHECK_16B(v7,112,r5,tail8)
+
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	stxv	32+v5,80(r11)
+	stxv	32+v6,96(r11)
+	stxv	32+v7,112(r11)
+
+	addi	r11,r11,128
+
+	CHECK_16B(v0,128,r5,tail1)
+	CHECK_16B(v1,128+16,r5,tail2)
+	CHECK_16B(v2,128+32,r5,tail3)
+	CHECK_16B(v3,128+48,r5,tail4)
+	CHECK_16B(v4,128+64,r5,tail5)
+	CHECK_16B(v5,128+80,r5,tail6)
+	CHECK_16B(v6,128+96,r5,tail7)
+	CHECK_16B(v7,128+112,r5,tail8)
+
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	stxv	32+v5,80(r11)
+	stxv	32+v6,96(r11)
+	stxv	32+v7,112(r11)
+
+	addi	r11,r11,128
+
+	CHECK_16B(v0,256,r5,tail1)
+	CHECK_16B(v1,256+16,r5,tail2)
+	CHECK_16B(v2,256+32,r5,tail3)
+	CHECK_16B(v3,256+48,r5,tail4)
+	CHECK_16B(v4,256+64,r5,tail5)
+	CHECK_16B(v5,256+80,r5,tail6)
+	CHECK_16B(v6,256+96,r5,tail7)
+	CHECK_16B(v7,256+112,r5,tail8)
+
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	stxv	32+v5,80(r11)
+	stxv	32+v6,96(r11)
+	stxv	32+v7,112(r11)
+
+	addi	r11,r11,128
+
+	CHECK_16B(v0,384,r5,tail1)
+	CHECK_16B(v1,384+16,r5,tail2)
+	CHECK_16B(v2,384+32,r5,tail3)
+	CHECK_16B(v3,384+48,r5,tail4)
+	CHECK_16B(v4,384+64,r5,tail5)
+	CHECK_16B(v5,384+80,r5,tail6)
+	CHECK_16B(v6,384+96,r5,tail7)
+	CHECK_16B(v7,384+112,r5,tail8)
+
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	stxv	32+v5,80(r11)
+	stxv	32+v6,96(r11)
+	stxv	32+v7,112(r11)
+
+	/* Prepare address for the loop.  */
+	addi	r5,r4,512
+	clrrdi	r5,r5,6
+	subf	r7,r4,r5
+	add	r11,r3,r7
+
+/* Switch to a more aggressive approach checking 64B each time.  */
+	.p2align 5
+L(strcpy_loop):
+	CHECK_64B(0,r5,tail_64b)
+	CHECK_64B(64,r5,tail_64b)
+	CHECK_64B(128,r5,tail_64b)
+	CHECK_64B(192,r5,tail_64b)
+
+	CHECK_64B(256,r5,tail_64b)
+	CHECK_64B(256+64,r5,tail_64b)
+	CHECK_64B(256+128,r5,tail_64b)
+	CHECK_64B(256+192,r5,tail_64b)
+	addi	r5,r5,512
+	addi	r11,r11,512
+
+	b	L(strcpy_loop)
+
+	.p2align 5
+L(tail_64b):
+	/* OK, we found a NULL byte.  Let's look for it in the current 64-byte
+	   block and mark it in its corresponding VR.  lxvp vx,0(ry) puts the
+	   low 16B bytes into vx+1, and the high into vx, so the order here is
+	   v5, v4, v7, v6.  */
+	add	r11,r11,r7
+	vcmpequb. v8,v5,v18
+	beq	cr6,L(no_null_16B)
+	/* There's a NULL byte.  */
+	STORE_WITH_LEN(v8,v5,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+L(no_null_16B):
+	stxv	32+v5,0(r11)
+	vcmpequb. v8,v4,v18
+	beq	cr6,L(no_null_32B)
+	/* There's a NULL byte.  */
+	addi	r11,r11,16
+	STORE_WITH_LEN(v8,v4,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+L(no_null_32B):
+	stxv	32+v4,16(r11)
+	vcmpequb. v8,v7,v18
+	beq	cr6,L(no_null_48B)
+	/* There's a NULL byte.  */
+	addi	r11,r11,32
+	STORE_WITH_LEN(v8,v7,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+L(no_null_48B):
+	stxv	32+v7,32(r11)
+	vcmpequb. v8,v6,v18;
+	/* There's a NULL byte.  */
+	addi	r11,r11,48
+	STORE_WITH_LEN(v8,v6,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail1):
+	/* There's a NULL byte.  */
+	STORE_WITH_LEN(v15,v0,r11)
+#ifdef USE_AS_STPCPY
+	/* stpcpy returns the dest address plus the size not counting the
+	   final '\0'.  */
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail2):
+	stxv	32+v0,0(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,16
+	STORE_WITH_LEN(v15,v1,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail3):
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,32
+	STORE_WITH_LEN(v15,v2,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail4):
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,48
+	STORE_WITH_LEN(v15,v3,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail5):
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,64
+	STORE_WITH_LEN(v15,v4,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail6):
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,80
+	STORE_WITH_LEN(v15,v5,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail7):
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	stxv	32+v5,80(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,96
+	STORE_WITH_LEN(v15,v6,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+	.p2align 4
+L(tail8):
+	stxv	32+v0,0(r11)
+	stxv	32+v1,16(r11)
+	stxv	32+v2,32(r11)
+	stxv	32+v3,48(r11)
+	stxv	32+v4,64(r11)
+	stxv	32+v5,80(r11)
+	stxv	32+v6,96(r11)
+	/* There's a NULL byte.  */
+	addi	r11,r11,112
+	STORE_WITH_LEN(v15,v7,r11)
+#ifdef USE_AS_STPCPY
+	add	r3,r11,r8
+#endif
+	blr
+
+END (FUNC_NAME)
+#ifndef USE_AS_STPCPY
+libc_hidden_builtin_def (strcpy)
+#endif
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index b847c19049..1a71f8d239 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -34,7 +34,8 @@ ifneq (,$(filter %le,$(config-machine)))
 sysdep_routines += memchr-power10 memcmp-power10 memcpy-power10 \
 		   memmove-power10 memset-power10 rawmemchr-power9 \
 		   rawmemchr-power10 strcmp-power9 strcmp-power10 \
-		   strncmp-power9 strncmp-power10 strcpy-power9 stpcpy-power9 \
+		   strncmp-power9 strncmp-power10 strcpy-power9 \
+		   strcpy-power10 stpcpy-power9 stpcpy-power10 \
 		   strlen-power9 strncpy-power9 stpncpy-power9 strlen-power10
 endif
 CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 2bb47d3527..5c515e2756 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -114,6 +114,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   /* Support sysdeps/powerpc/powerpc64/multiarch/strcpy.c.  */
   IFUNC_IMPL (i, name, strcpy,
 #ifdef __LITTLE_ENDIAN__
+	      IFUNC_IMPL_ADD (array, i, strcpy, hwcap2 & PPC_FEATURE2_ARCH_3_1
+			      && hwcap & PPC_FEATURE_HAS_VSX,
+			      __strcpy_power10)
 	      IFUNC_IMPL_ADD (array, i, strcpy, hwcap2 & PPC_FEATURE2_ARCH_3_00
 			      && hwcap & PPC_FEATURE_HAS_VSX,
 			      __strcpy_power9)
@@ -130,6 +133,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   /* Support sysdeps/powerpc/powerpc64/multiarch/stpcpy.c.  */
   IFUNC_IMPL (i, name, stpcpy,
 #ifdef __LITTLE_ENDIAN__
+	      IFUNC_IMPL_ADD (array, i, stpcpy, hwcap2 & PPC_FEATURE2_ARCH_3_1
+			      && hwcap & PPC_FEATURE_HAS_VSX,
+			      __stpcpy_power10)
 	      IFUNC_IMPL_ADD (array, i, stpcpy, hwcap2 & PPC_FEATURE2_ARCH_3_00
 			      && hwcap & PPC_FEATURE_HAS_VSX,
 			      __stpcpy_power9)
diff --git a/sysdeps/powerpc/powerpc64/multiarch/stpcpy-power10.S b/sysdeps/powerpc/powerpc64/multiarch/stpcpy-power10.S
new file mode 100644
index 0000000000..fed5494ff4
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/stpcpy-power10.S
@@ -0,0 +1,24 @@
+/* Optimized stpcpy implementation for POWER10/PPC64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define STPCPY __stpcpy_power10
+
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+#include
diff --git a/sysdeps/powerpc/powerpc64/multiarch/stpcpy.c b/sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
index 33c4a1f241..668ced0653 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
@@ -28,13 +28,17 @@ extern __typeof (__stpcpy) __stpcpy_power7 attribute_hidden;
 extern __typeof (__stpcpy) __stpcpy_power8 attribute_hidden;
 # ifdef __LITTLE_ENDIAN__
 extern __typeof (__stpcpy) __stpcpy_power9 attribute_hidden;
+extern __typeof (__stpcpy) __stpcpy_power10 attribute_hidden;
 # endif
 
 libc_ifunc_hidden (__stpcpy, __stpcpy,
 # ifdef __LITTLE_ENDIAN__
-		   (hwcap2 & PPC_FEATURE2_ARCH_3_00
-		    && hwcap & PPC_FEATURE_HAS_VSX)
-		   ? __stpcpy_power9 :
+		   (hwcap2 & PPC_FEATURE2_ARCH_3_1
+		    && hwcap & PPC_FEATURE_HAS_VSX)
+		   ? __stpcpy_power10
+		   : (hwcap2 & PPC_FEATURE2_ARCH_3_00
+		      && hwcap & PPC_FEATURE_HAS_VSX)
+		   ? __stpcpy_power9 :
 # endif
 		   (hwcap2 & PPC_FEATURE2_ARCH_2_07
 		    && hwcap & PPC_FEATURE_HAS_ALTIVEC)
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strcpy-power10.S b/sysdeps/powerpc/powerpc64/multiarch/strcpy-power10.S
new file mode 100644
index 0000000000..5d5997deb8
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strcpy-power10.S
@@ -0,0 +1,26 @@
+/* Optimized strcpy implementation for POWER10/PPC64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if defined __LITTLE_ENDIAN__ && IS_IN (libc)
+#define STRCPY __strcpy_power10
+
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+#include
+#endif
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strcpy.c b/sysdeps/powerpc/powerpc64/multiarch/strcpy.c
index 37189e6fd6..304b0fe588 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/strcpy.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/strcpy.c
@@ -27,14 +27,18 @@ extern __typeof (strcpy) __strcpy_power7 attribute_hidden;
 extern __typeof (strcpy) __strcpy_power8 attribute_hidden;
 # ifdef __LITTLE_ENDIAN__
 extern __typeof (strcpy) __strcpy_power9 attribute_hidden;
+extern __typeof (strcpy) __strcpy_power10 attribute_hidden;
 # endif
 #undef strcpy
 
 libc_ifunc_redirected (__redirect_strcpy, strcpy,
 # ifdef __LITTLE_ENDIAN__
-		       (hwcap2 & PPC_FEATURE2_ARCH_3_00
+		       (hwcap2 & PPC_FEATURE2_ARCH_3_1
 			&& hwcap & PPC_FEATURE_HAS_VSX)
-		       ? __strcpy_power9 :
+		       ? __strcpy_power10
+		       : (hwcap2 & PPC_FEATURE2_ARCH_3_00
+			  && hwcap & PPC_FEATURE_HAS_VSX)
+		       ? __strcpy_power9 :
 # endif
 		       (hwcap2 & PPC_FEATURE2_ARCH_2_07
 			&& hwcap & PPC_FEATURE_HAS_ALTIVEC)
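
Reviewer note (not part of the patch): the copy strategy in strcpy.S can be modelled in portable C. The sketch below is an illustration only, not the patch's implementation. All function names (first_nul, has_nul_64, model_stpcpy, model_matches_strcpy) are hypothetical, and scalar loops stand in for the vector instructions (vcmpequb./vctzlsbb, vminub, lxvp, stxvl). It follows the three phases of the assembly: a head copy up to the next 16B boundary, 32 checks of 16B each (512 bytes), then a 64B-per-step loop restarted from a 64B-aligned address, which may overlap bytes that were already copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index of the first NUL in the n bytes at p, or n if there is none.
   Stands in for the vcmpequb./vctzlsbb pair (hypothetical helper).  */
static size_t first_nul(const unsigned char *p, size_t n)
{
    const unsigned char *z = memchr(p, 0, n);
    return z ? (size_t)(z - p) : n;
}

/* Nonzero if any of the 64 bytes at p is NUL.  Mirrors the vminub fold in
   CHECK_64B: the byte-wise minimum of four 16B blocks is zero in a lane
   exactly when one of the blocks holds a NUL in that lane.  */
static int has_nul_64(const unsigned char *p)
{
    for (size_t lane = 0; lane < 16; lane++) {
        unsigned char m = p[lane];
        for (size_t blk = 1; blk < 4; blk++)
            if (p[16 * blk + lane] < m)
                m = p[16 * blk + lane];
        if (m == 0)
            return 1;
    }
    return 0;
}

/* Scalar model of the copy; returns a pointer to the terminating NUL in
   dst, as stpcpy does.  ASSUMPTION: like the assembly, it may read src past
   the terminator up to the next 64B boundary, so those bytes must be
   readable (the self-check below uses a padded, 64B-aligned buffer).  */
static char *model_stpcpy(char *dst, const char *src)
{
    const unsigned char *s = (const unsigned char *)src;

    /* Phase 1: head, 1..16 bytes up to the next 16B boundary.  */
    size_t head = 16 - ((uintptr_t)s & 15);
    size_t k = first_nul(s, head);
    if (k < head) {
        memcpy(dst, s, k + 1);
        return dst + k;
    }
    memcpy(dst, s, head);
    size_t done = head;

    /* Phase 2: 16B per check for the next 512 bytes (4 x 128 in the asm).  */
    for (int i = 0; i < 32; i++) {
        k = first_nul(s + done, 16);
        if (k < 16) {
            memcpy(dst + done, s + done, k + 1);
            return dst + done + k;
        }
        memcpy(dst + done, s + done, 16);
        done += 16;
    }

    /* Phase 3: restart at (src + 512) rounded down to 64B.  This can step
       back over bytes phase 2 already copied, which is harmless.  */
    done = (size_t)((((uintptr_t)s + 512) & ~(uintptr_t)63) - (uintptr_t)s);
    while (!has_nul_64(s + done)) {
        memcpy(dst + done, s + done, 64);
        done += 64;
    }
    k = first_nul(s + done, 64);
    memcpy(dst + done, s + done, k + 1);
    return dst + done + k;
}

/* Hypothetical self-check: copy a len-byte string that starts off bytes
   into a 64B-aligned buffer and compare the result against strcpy.  */
static int model_matches_strcpy(size_t off, size_t len)
{
    static _Alignas(64) char src[2048];
    static char dst[2048], ref[2048];

    if (off + len >= sizeof src)
        return 0;
    memset(src, 'x', sizeof src);
    src[off + len] = '\0';
    memset(dst, 0x55, sizeof dst);
    memset(ref, 0x55, sizeof ref);

    char *end = model_stpcpy(dst, src + off);
    strcpy(ref, src + off);
    return end == dst + len && memcmp(dst, ref, sizeof dst) == 0;
}
```

For example, with model_matches_strcpy (63, 1000) the head is a single byte, the 16B phase covers bytes 1 to 512, and the 64B loop restarts 449 bytes into the string, so it recopies 64 bytes that were already written. This is the overlap for unaligned strings that the commit message mentions.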