From patchwork Wed May  1 00:31:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Pengxuan Zheng (QUIC)"
X-Patchwork-Id: 89215
From: Pengxuan Zheng
To:
CC: , Pengxuan Zheng
Subject: [PATCH] aarch64: Add vector popcount besides QImode [PR113859]
Date: Tue, 30 Apr 2024 17:31:43 -0700
Message-ID: <20240501003143.5323-1-quic_pzheng@quicinc.com>
X-Mailer: git-send-email 2.17.1
List-Id: Gcc-patches mailing list

This patch improves GCC’s vectorization of __builtin_popcount for the aarch64
target by adding popcount patterns for vector modes besides QImode, i.e.,
HImode, SImode and DImode.

With this patch, we now generate the following for HImode:
  cnt     v1.16b, v.16b
  uaddlp  v2.8h, v1.16b

For SImode, we generate:
  cnt     v1.16b, v.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h

For V2DI, we generate:
  cnt     v1.16b, v.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h
  uaddlp  v4.2d, v3.4s

gcc/ChangeLog:

	PR target/113859
	* config/aarch64/aarch64-simd.md (popcount<mode>2): New define_expand.

gcc/testsuite/ChangeLog:

	PR target/113859
	* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng
---
 gcc/config/aarch64/aarch64-simd.md            | 40 ++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 48 +++++++++++++++++++
 2 files changed, 88 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f8bb973a278..093c32ee8ff 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3540,6 +3540,46 @@ (define_insn "popcount<mode>2"
   [(set_attr "type" "neon_cnt")]
 )
 
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VQN 0 "register_operand" "=w")
+        (popcount:VQN (match_operand:VQN 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  {
+    rtx v = gen_reg_rtx (V16QImode);
+    rtx v1 = gen_reg_rtx (V16QImode);
+    emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
+    emit_insn (gen_popcountv16qi2 (v1, v));
+    if (<MODE>mode == V8HImode)
+      {
+        /* For V8HI, we generate:
+            cnt     v1.16b, v.16b
+            uaddlp  v2.8h, v1.16b */
+        emit_insn (gen_aarch64_uaddlpv16qi (operands[0], v1));
+        DONE;
+      }
+    rtx v2 = gen_reg_rtx (V8HImode);
+    emit_insn (gen_aarch64_uaddlpv16qi (v2, v1));
+    if (<MODE>mode == V4SImode)
+      {
+        /* For V4SI, we generate:
+            cnt     v1.16b, v.16b
+            uaddlp  v2.8h, v1.16b
+            uaddlp  v3.4s, v2.8h */
+        emit_insn (gen_aarch64_uaddlpv8hi (operands[0], v2));
+        DONE;
+      }
+    /* For V2DI, we generate:
+        cnt     v1.16b, v.16b
+        uaddlp  v2.8h, v1.16b
+        uaddlp  v3.4s, v2.8h
+        uaddlp  v4.2d, v3.4s */
+    rtx v3 = gen_reg_rtx (V4SImode);
+    emit_insn (gen_aarch64_uaddlpv8hi (v3, v2));
+    emit_insn (gen_aarch64_uaddlpv4si (operands[0], v3));
+    DONE;
+  }
+)
+
 ;; 'across lanes' max and min ops.
 
 ;; Template for outputting a scalar, so we can create __builtins which can be
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
new file mode 100644
index 00000000000..4c9a1b95990
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* This function should produce cnt v.16b. */
+void
+bar (unsigned char *__restrict b, unsigned char *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+    d[i] = __builtin_popcount (b[i]);
+}
+
+/* This function should produce cnt v.16b and uaddlp (Add Long Pairwise). */
+void
+bar1 (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+    d[i] = __builtin_popcount (b[i]);
+}
+
+/* This function should produce cnt v.16b and 2 uaddlp (Add Long Pairwise). */
+void
+bar2 (unsigned int *__restrict b, unsigned int *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+    d[i] = __builtin_popcount (b[i]);
+}
+
+/* This function should produce cnt v.16b and 3 uaddlp (Add Long Pairwise). */
+void
+bar3 (unsigned long long *__restrict b, unsigned long long *__restrict d)
+{
+  for (int i = 0; i < 1024; i++)
+    d[i] = __builtin_popcountll (b[i]);
+}
+
+/* SLP
+   This function should produce cnt v.16b and 3 uaddlp (Add Long Pairwise). */
+void
+bar4 (unsigned long long *__restrict b, unsigned long long *__restrict d)
+{
+  d[0] = __builtin_popcountll (b[0]);
+  d[1] = __builtin_popcountll (b[1]);
+}
+
+/* { dg-final { scan-assembler-not {\tbl\tpopcount} } } */
+/* { dg-final { scan-assembler-times {cnt\t} 5 } } */
+/* { dg-final { scan-assembler-times {uaddlp\t} 9 } } */
+/* { dg-final { scan-assembler-times {ldr\tq} 5 } } */