From patchwork Fri Feb  3 07:16:33 2023
X-Patchwork-Submitter: Prathamesh Kulkarni
X-Patchwork-Id: 64212
From: Prathamesh Kulkarni
Date: Fri, 3 Feb 2023 12:46:33 +0530
Subject: [aarch64] Code-gen for vector initialization involving constants
To: gcc Patches, Richard Sandiford
List-Id: Gcc-patches mailing list

Hi Richard,
While digging through aarch64_expand_vector_init, I noticed it gives
priority to loading a constant first:

  /* Initialise a vector which is part-variable.  We want to first try
     to build those lanes which are constant in the most efficient way we can.
     */

which results in suboptimal code-gen for the following case:

int16x8_t f_s16(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}

code-gen trunk:
f_s16:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

The attached patch tweaks the following condition:

  if (n_var == n_elts && n_elts <= 16)
    {
      ...
    }

so that it also passes when the most frequent element is not a constant
and maxv >= 80% of n_elts, with 80% being an arbitrary "high enough"
threshold. The intent is to dup the most frequently repeated variable
when its repetition count is "high enough" and then insert the constants,
which should be "better" than loading the constant first and inserting
the variables, as in the above case.

Alternatively, I suppose we can remove the threshold and, for constants,
generate both sequences and check which one is more efficient?

code-gen with patch:
f_s16:
        dup     v0.8h, w0
        movi    v1.4h, 0x1
        ins     v0.h[7], v1.h[0]
        ret

The patch is lightly tested: the vec[t]-init-*.c tests pass, and
bootstrap+test is in progress.
Does this look OK?

Thanks,
Prathamesh

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index acc0cfe5f94..df33509c6e4 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22079,30 +22079,36 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      and matches[X][1] with the count of duplicate elements (if X is the
      earliest element which has duplicates).  */
 
-  if (n_var == n_elts && n_elts <= 16)
+  int matches[16][2] = {0};
+  for (int i = 0; i < n_elts; i++)
     {
-      int matches[16][2] = {0};
-      for (int i = 0; i < n_elts; i++)
+      for (int j = 0; j <= i; j++)
         {
-          for (int j = 0; j <= i; j++)
+          if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
             {
-              if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
-                {
-                  matches[i][0] = j;
-                  matches[j][1]++;
-                  break;
-                }
+              matches[i][0] = j;
+              matches[j][1]++;
+              break;
             }
         }
-      int maxelement = 0;
-      int maxv = 0;
-      for (int i = 0; i < n_elts; i++)
-        if (matches[i][1] > maxv)
-          {
-            maxelement = i;
-            maxv = matches[i][1];
-          }
+    }
+  int maxelement = 0;
+  int maxv = 0;
+  for (int i = 0; i < n_elts; i++)
+    if (matches[i][1] > maxv)
+      {
+        maxelement = i;
+        maxv = matches[i][1];
+      }
+
+  rtx max_elem = XVECEXP (vals, 0, maxelement);
+  if (n_elts <= 16
+      && ((n_var == n_elts)
+          || (maxv >= (int)(0.8 * n_elts)
+              && !CONST_INT_P (max_elem)
+              && !CONST_DOUBLE_P (max_elem))))
+    {
 
       /* Create a duplicate of the most common element, unless all elements
          are equally useless to us, in which case just immediately set the
          vector register using the first element.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-18.c b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
new file mode 100644
index 00000000000..e20b813559e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_neon.h>
+
+/*
+** f1_s16:
+**	...
+**	dup	v[0-9]+\.8h, w[0-9]+
+**	movi	v[0-9]+\.4h, 0x1
+**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
+**	...
+**	ret
+*/
+
+int16x8_t f1_s16(int16_t x)
+{
+  return (int16x8_t) {x, x, x, x, x, x, x, 1};
+}
+
+/*
+** f2_s16:
+**	...
+**	dup	v[0-9]+\.8h, w[0-9]+
+**	movi	v[0-9]+\.4h, 0x1
+**	movi	v[0-9]+\.4h, 0x2
+**	ins	v[0-9]+\.h\[6\], v[0-9]+\.h\[0\]
+**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
+**	...
+**	ret
+*/
+
+int16x8_t f2_s16(int16_t x)
+{
+  return (int16x8_t) { x, x, x, x, x, x, 1, 2 };
+}
+
+/*
+** f3_s16:
+**	...
+**	movi	v[0-9]+\.8h, 0x1
+**	ins	v[0-9]+\.h\[0\], w0
+**	ins	v[0-9]+\.h\[1\], w0
+**	ins	v[0-9]+\.h\[2\], w0
+**	...
+**	ret
+*/
+
+int16x8_t f3_s16(int16_t x)
+{
+  return (int16x8_t) {x, x, x, 1, 1, 1, 1, 1};
+}
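
For anyone who wants to sanity-check the threshold arithmetic outside GCC, below is a rough standalone C model of the patched condition. It is only a sketch, not the code in aarch64.cc: `struct lane` and `prefers_dup_first` are hypothetical names, equal `id` values stand in for rtx_equal_p, the `is_const` flag stands in for CONST_INT_P / CONST_DOUBLE_P, and it assumes n_var is the number of non-constant lanes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one vector lane.  Lanes with the same `id`
   are treated as equal (modelling rtx_equal_p); `is_const` models
   CONST_INT_P / CONST_DOUBLE_P.  */
struct lane
{
  int id;
  bool is_const;
};

/* Returns true if the patched condition would take the "dup the most
   common element, then insert the rest" path.  */
bool
prefers_dup_first (const struct lane *v, int n_elts)
{
  assert (n_elts > 0 && n_elts <= 16);

  int count[16] = {0};   /* count[j]: lanes whose earliest match is j.  */
  int n_var = 0;         /* Assumption: number of non-constant lanes.  */

  for (int i = 0; i < n_elts; i++)
    {
      if (!v[i].is_const)
        n_var++;
      for (int j = 0; j <= i; j++)
        if (v[i].id == v[j].id)
          {
            count[j]++;
            break;
          }
    }

  /* Pick the earliest lane with the highest duplicate count.  */
  int maxelement = 0;
  int maxv = 0;
  for (int i = 0; i < n_elts; i++)
    if (count[i] > maxv)
      {
        maxelement = i;
        maxv = count[i];
      }

  /* Same shape as the condition in the patch: either every lane is
     variable, or a non-constant lane covers at least 80% of the lanes
     (the product is truncated towards zero by the cast).  */
  return n_var == n_elts
         || (maxv >= (int) (0.8 * n_elts) && !v[maxelement].is_const);
}
```

With 8 lanes, (int)(0.8 * 8) truncates to 6, so in this model the lane patterns of f1_s16 (seven copies of x) and f2_s16 (six copies of x) take the dup-first path, while f3_s16 does not: its most common element is the constant 1, with only five occurrences.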