From patchwork Mon Sep 20 09:24:46 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Schulze Frielinghaus
X-Patchwork-Id: 45175
Date: Mon, 20 Sep 2021 11:24:46 +0200
From: Stefan Schulze Frielinghaus
To: Andreas Krebbel
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander
List-Id: Gcc-patches mailing list

This patch implements the rawmemchr expander as introduced in
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html

Bootstrapped and regtested in conjunction with the patch from above on
IBM Z.  Ok for mainline?

From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
From: Stefan Schulze Frielinghaus
Date: Mon, 8 Feb 2021 10:35:39 +0100
Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
	(s390_rawmemchrhi): Add prototype.
	(s390_rawmemchrsi): Add prototype.
	* config/s390/s390.c (s390_rawmemchr): New function.
	(s390_rawmemchrqi): New function.
	(s390_rawmemchrhi): New function.
	(s390_rawmemchrsi): New function.
	* config/s390/s390.md (rawmemchr): New expander.
	(rawmemchr): New expander.
	* config/s390/vector.md (vec_vfees): Basically a copy of
	the pattern vfees from vx-builtins.md.
	* config/s390/vx-builtins.md (*vfees): Remove.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/rawmemchr-1.c: New test.
---
 gcc/config/s390/s390-protos.h               |  4 +
 gcc/config/s390/s390.c                      | 89 ++++++++++++++++++
 gcc/config/s390/s390.md                     | 20 +++++
 gcc/config/s390/vector.md                   | 26 ++++++
 gcc/config/s390/vx-builtins.md              | 26 ------
 gcc/testsuite/gcc.target/s390/rawmemchr-1.c | 99 +++++++++++++++++++++
 6 files changed, 238 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/rawmemchr-1.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 4b03c6e99f5..0d9619e8254 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -66,6 +66,10 @@ s390_asm_declare_function_size (FILE *asm_out_file,
                                 const char *fnname ATTRIBUTE_UNUSED, tree decl);
 #endif
 
+extern void s390_rawmemchrqi(rtx dst, rtx src, rtx pat);
+extern void s390_rawmemchrhi(rtx dst, rtx src, rtx pat);
+extern void s390_rawmemchrsi(rtx dst, rtx src, rtx pat);
+
 #ifdef RTX_CODE
 extern int s390_extra_constraint_str (rtx, int, const char *);
 extern int s390_const_ok_for_constraint_p (HOST_WIDE_INT, int, const char *);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 54dd6332c3a..1435ce156e2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16559,6 +16559,95 @@ s390_excess_precision (enum excess_precision_type type)
 }
 #endif
 
+template
+static void
+s390_rawmemchr(rtx dst, rtx src, rtx pat) {
+  rtx lens = gen_reg_rtx (V16QImode);
+  rtx pattern = gen_reg_rtx (vec_mode);
+  rtx loop_start = gen_label_rtx ();
+  rtx loop_end = gen_label_rtx ();
+  rtx addr = gen_reg_rtx (Pmode);
+  rtx offset = gen_reg_rtx (Pmode);
+  rtx tmp = gen_reg_rtx (Pmode);
+  rtx loadlen = gen_reg_rtx (SImode);
+  rtx matchlen = gen_reg_rtx (SImode);
+  rtx mem;
+
+  pat = GEN_INT (trunc_int_for_mode (INTVAL (pat), elt_mode));
+  emit_insn (gen_rtx_SET (pattern, gen_rtx_VEC_DUPLICATE (vec_mode, pat)));
+
+  emit_move_insn (addr, XEXP (src, 0));
+
+  // alignment
+  emit_insn (gen_vlbb (lens, gen_rtx_MEM (BLKmode, addr), GEN_INT (6)));
+  emit_insn (gen_lcbb (loadlen, addr, GEN_INT (6)));
+  lens = convert_to_mode (vec_mode, lens, 1);
+  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (0)));
+  lens = convert_to_mode (V4SImode, lens, 1);
+  emit_insn (gen_vec_extractv4sisi (matchlen, lens, GEN_INT (1)));
+  lens = convert_to_mode (vec_mode, lens, 1);
+  emit_cmp_and_jump_insns (matchlen, loadlen, LT, NULL_RTX, SImode, 1, loop_end);
+  force_expand_binop (Pmode, and_optab, addr, GEN_INT (15), tmp, 1, OPTAB_DIRECT);
+  force_expand_binop (Pmode, sub_optab, GEN_INT (16), tmp, tmp, 1, OPTAB_DIRECT);
+  force_expand_binop (Pmode, add_optab, addr, tmp, addr, 1, OPTAB_DIRECT);
+  // now, addr is 16-byte aligned
+
+  mem = gen_rtx_MEM (vec_mode, addr);
+  set_mem_align (mem, 128);
+  emit_move_insn (lens, mem);
+  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (VSTRING_FLAG_CS)));
+  add_int_reg_note (s390_emit_ccraw_jump (4, EQ, loop_end),
+                    REG_BR_PROB,
+                    profile_probability::very_unlikely ().to_reg_br_prob_note ());
+
+  emit_label (loop_start);
+  LABEL_NUSES (loop_start) = 1;
+
+  force_expand_binop (Pmode, add_optab, addr, GEN_INT (16), addr, 1, OPTAB_DIRECT);
+  mem = gen_rtx_MEM (vec_mode, addr);
+  set_mem_align (mem, 128);
+  emit_move_insn (lens, mem);
+  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (VSTRING_FLAG_CS)));
+  add_int_reg_note (s390_emit_ccraw_jump (4, NE, loop_start),
+                    REG_BR_PROB,
+                    profile_probability::very_likely ().to_reg_br_prob_note ());
+
+  emit_label (loop_end);
+  LABEL_NUSES (loop_end) = 1;
+
+  if (TARGET_64BIT)
+    {
+      lens = convert_to_mode (V2DImode, lens, 1);
+      emit_insn (gen_vec_extractv2didi (offset, lens, GEN_INT (0)));
+    }
+  else
+    {
+      lens = convert_to_mode (V4SImode, lens, 1);
+      emit_insn (gen_vec_extractv4sisi (offset, lens, GEN_INT (1)));
+    }
+  force_expand_binop (Pmode, add_optab, addr, offset, dst, 1, OPTAB_DIRECT);
+}
+
+void
+s390_rawmemchrqi (rtx dst, rtx src, rtx pat)
+{
+  s390_rawmemchr (dst, src, pat);
+}
+
+void
+s390_rawmemchrhi (rtx dst, rtx src, rtx pat)
+{
+  s390_rawmemchr (dst, src, pat);
+}
+
+void
+s390_rawmemchrsi (rtx dst, rtx src, rtx pat)
+{
+  s390_rawmemchr (dst, src, pat);
+}
+
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 1b894a926ce..f81bcef86ce 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -12258,3 +12258,23 @@
                     UNSPECV_PPA)]
   "TARGET_ZEC12"
   "")
+
+(define_expand "rawmemchr"
+  [(match_operand 0 "register_operand")
+   (match_operand 1 "memory_operand")
+   (match_operand:SINT 2 "const_int_operand")]
+  "TARGET_VX"
+{
+  if (TARGET_64BIT)
+    emit_insn (gen_rawmemchrdi (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_rawmemchrsi (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "rawmemchr"
+  [(match_operand:P 0 "register_operand")
+   (match_operand:BLK 1 "memory_operand")
+   (match_operand:SINT 2 "const_int_operand")]
+  "TARGET_VX"
+  "s390_rawmemchr (operands[0], operands[1], operands[2]); DONE;")
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 70274a6ab70..0870e2341fc 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1988,6 +1988,32 @@
   "vll\t%v0,%1,%2"
   [(set_attr "op_type" "VRS")])
 
+; vfeebs, vfeehs, vfeefs
+; vfeezbs, vfeezhs, vfeezfs
+(define_insn "vec_vfees"
+  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
+        (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
+                           (match_operand:VI_HW_QHS 2 "register_operand" "v")
+                           (match_operand:QI 3 "const_mask_operand" "C")]
+                          UNSPEC_VEC_VFEE))
+   (set (reg:CCRAW CC_REGNUM)
+        (unspec:CCRAW [(match_dup 1)
+                       (match_dup 2)
+                       (match_dup 3)]
+                      UNSPEC_VEC_VFEECC))]
+  "TARGET_VX"
+{
+  unsigned HOST_WIDE_INT flags = UINTVAL (operands[3]);
+
+  gcc_assert (!(flags & ~(VSTRING_FLAG_ZS | VSTRING_FLAG_CS)));
+  flags &= ~VSTRING_FLAG_CS;
+
+  if (flags == VSTRING_FLAG_ZS)
+    return "vfeezs\t%v0,%v1,%v2";
+  return "vfees\t%v0,%v1,%v2";
+}
+  [(set_attr "op_type" "VRR")])
+
 ; vfenebs, vfenehs, vfenefs
 ; vfenezbs, vfenezhs, vfenezfs
 (define_insn "vec_vfenes"
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 3e7b8541887..efa77992f31 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -1366,32 +1366,6 @@
 
 ; Vector find element equal
 
-; vfeebs, vfeehs, vfeefs
-; vfeezbs, vfeezhs, vfeezfs
-(define_insn "*vfees"
-  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
-        (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
-                           (match_operand:VI_HW_QHS 2 "register_operand" "v")
-                           (match_operand:QI 3 "const_mask_operand" "C")]
-                          UNSPEC_VEC_VFEE))
-   (set (reg:CCRAW CC_REGNUM)
-        (unspec:CCRAW [(match_dup 1)
-                       (match_dup 2)
-                       (match_dup 3)]
-                      UNSPEC_VEC_VFEECC))]
-  "TARGET_VX"
-{
-  unsigned HOST_WIDE_INT flags = UINTVAL (operands[3]);
-
-  gcc_assert (!(flags & ~(VSTRING_FLAG_ZS | VSTRING_FLAG_CS)));
-  flags &= ~VSTRING_FLAG_CS;
-
-  if (flags == VSTRING_FLAG_ZS)
-    return "vfeezs\t%v0,%v1,%v2";
-  return "vfees\t%v0,%v1,%v2,%b3";
-}
-  [(set_attr "op_type" "VRR")])
-
 ; vfeeb, vfeeh, vfeef
 (define_insn "vfee"
   [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
diff --git a/gcc/testsuite/gcc.target/s390/rawmemchr-1.c b/gcc/testsuite/gcc.target/s390/rawmemchr-1.c
new file mode 100644
index 00000000000..a5125702315
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/rawmemchr-1.c
@@ -0,0 +1,99 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details -mzarch -march=z13" } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrQI" 2 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrHI" 2 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrSI" 2 "ldist" } } */
+
+#include <string.h>
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#define rawmemchrT(T, pattern)     \
+__attribute__((noinline,noclone))  \
+T* rawmemchr_##T (T *s)            \
+{                                  \
+  while (*s != pattern)            \
+    ++s;                           \
+  return s;                        \
+}
+
+rawmemchrT(int8_t, (int8_t)0xde)
+rawmemchrT(uint8_t, 0xde)
+rawmemchrT(int16_t, (int16_t)0xdead)
+rawmemchrT(uint16_t, 0xdead)
+rawmemchrT(int32_t, (int32_t)0xdeadbeef)
+rawmemchrT(uint32_t, 0xdeadbeef)
+
+#define runT(T, pattern)                           \
+void run_##T ()                                    \
+{                                                  \
+  T *buf = malloc (4096 * 2 * sizeof(T));          \
+  assert (buf != NULL);                            \
+  memset (buf, 0xa, 4096 * 2 * sizeof(T));         \
+  /* ensure q is 4096-byte aligned */              \
+  T *q = (T*)((unsigned char *)buf                 \
+              + (4096 - ((uintptr_t)buf & 4095))); \
+  T *p;                                            \
+  /* unaligned + block boundary + 1st load */      \
+  p = (T *) ((uintptr_t)q - 8);                    \
+  p[2] = pattern;                                  \
+  assert ((rawmemchr_##T (&p[0]) == &p[2]));       \
+  p[2] = (T) 0xaaaaaaaa;                           \
+  /* unaligned + block boundary + 2nd load */      \
+  p = (T *) ((uintptr_t)q - 8);                    \
+  p[6] = pattern;                                  \
+  assert ((rawmemchr_##T (&p[0]) == &p[6]));       \
+  p[6] = (T) 0xaaaaaaaa;                           \
+  /* unaligned + 1st load */                       \
+  q[5] = pattern;                                  \
+  assert ((rawmemchr_##T (&q[2]) == &q[5]));       \
+  q[5] = (T) 0xaaaaaaaa;                           \
+  /* unaligned + 2nd load */                       \
+  q[14] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[2]) == &q[14]));      \
+  q[14] = (T) 0xaaaaaaaa;                          \
+  /* unaligned + 3rd load */                       \
+  q[19] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[2]) == &q[19]));      \
+  q[19] = (T) 0xaaaaaaaa;                          \
+  /* unaligned + 4th load */                       \
+  q[25] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[2]) == &q[25]));      \
+  q[25] = (T) 0xaaaaaaaa;                          \
+  /* aligned + 1st load */                         \
+  q[5] = pattern;                                  \
+  assert ((rawmemchr_##T (&q[0]) == &q[5]));       \
+  q[5] = (T) 0xaaaaaaaa;                           \
+  /* aligned + 2nd load */                         \
+  q[14] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[0]) == &q[14]));      \
+  q[14] = (T) 0xaaaaaaaa;                          \
+  /* aligned + 3rd load */                         \
+  q[19] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[0]) == &q[19]));      \
+  q[19] = (T) 0xaaaaaaaa;                          \
+  /* aligned + 4th load */                         \
+  q[25] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[0]) == &q[25]));      \
+  q[25] = (T) 0xaaaaaaaa;                          \
+  free (buf);                                      \
+}
+
+runT(int8_t, (int8_t)0xde)
+runT(uint8_t, 0xde)
+runT(int16_t, (int16_t)0xdead)
+runT(uint16_t, 0xdead)
+runT(int32_t, (int32_t)0xdeadbeef)
+runT(uint32_t, 0xdeadbeef)
+
+int main (void)
+{
+  run_uint8_t ();
+  run_int8_t ();
+  run_uint16_t ();
+  run_int16_t ();
+  run_uint32_t ();
+  run_int32_t ();
+  return 0;
+}
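
For readers following the control flow of s390_rawmemchr, here is a rough
scalar model of the byte (QI) case in plain C.  It is only a sketch under
stated assumptions, not the code the expander emits: find_eq and
rawmemchr_model are hypothetical names, find_eq stands in for the vector
find-element-equal step, and the 4096-byte block size assumes that the
boundary operand GEN_INT (6) passed to vlbb/lcbb selects a 4K block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the vector find-element-equal step:
   index of the first byte equal to PAT in P[0..N), or N if none.  */
static size_t
find_eq (const unsigned char *p, size_t n, unsigned char pat)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (p[i] == pat)
      break;
  return i;
}

/* Scalar model of the sequence for byte elements (an assumption-laden
   sketch, not the emitted RTL).  Like rawmemchr, it requires that PAT
   occurs somewhere at or after S.  */
static unsigned char *
rawmemchr_model (unsigned char *s, unsigned char pat)
{
  uintptr_t addr = (uintptr_t) s;

  /* First, possibly unaligned chunk: at most 16 bytes, and never
     across the next 4K block boundary.  */
  size_t to_block = 4096 - (addr & 4095);
  size_t loadlen = to_block < 16 ? to_block : 16;
  size_t matchlen = find_eq (s, loadlen, pat);
  if (matchlen < loadlen)
    return s + matchlen;

  /* Advance to the next 16-byte boundary; an already aligned address
     simply moves on by a full chunk.  */
  s += 16 - (addr & 15);

  /* Aligned 16-byte chunks until a match is found.  */
  for (;;)
    {
      matchlen = find_eq (s, 16, pat);
      if (matchlen < 16)
        return s + matchlen;
      s += 16;
    }
}
```

The model mirrors the cases in rawmemchr-1.c: a start address shortly
before a block boundary, an unaligned start, and an aligned start should
all return the address of the first matching byte.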