From patchwork Mon Nov 28 17:44:05 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 61180
Date: Mon, 28 Nov 2022 17:44:05 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Jeff Law
cc: Kito Cheng, GCC Patches, Andrew Waterman
Subject: [PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
In-Reply-To: <85ca8410-376b-5de7-0cfd-c213d8ab0cbd@gmail.com>
References: <904539a8-00ca-851c-b893-d6684d58df73@gmail.com> <85ca8410-376b-5de7-0cfd-c213d8ab0cbd@gmail.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)

We produce inefficient code for some synthesized SImode conditional set
operations (i.e. ones that are not directly implemented in hardware) on
RV64.  For example a piece of C code like this:

int
sleu (unsigned int x, unsigned int y)
{
  return x <= y;
}

gets compiled (at `-O2') to this:

sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        andi    a0,a0,1         # 16    [c=4 l=4]  anddi3/1
        ret                     # 25    [c=0 l=4]  simple_return

or (at `-O1') to this:

sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        sext.w  a0,a0           # 16    [c=4 l=4]  extendsidi2/0
        ret                     # 24    [c=0 l=4]  simple_return

This is because the middle end expands a SLEU operation missing from
RISC-V hardware into a sequence of a SImode SGTU operation followed by
an explicit SImode XORI operation with immediate 1.
And while the SGTU machine instruction (alias SLTU with the input
operands swapped) gives a properly sign-extended 32-bit result which is
valid both as a SImode and as a DImode operand, the middle end does not
see that through a SImode XORI operation, because we tell the middle end
that the RISC-V target (unlike MIPS) may hold values in DImode integer
registers that are valid for SImode operations even if not properly
sign-extended.  However the RISC-V psABI requires that 32-bit function
arguments and results passed in 64-bit integer registers be properly
sign-extended, so this is explicitly done at the conclusion of the
function.

Fix this by making the backend use a sequence of a DImode SGTU operation
followed by a SImode SEQZ operation instead.  The latter operation is
known by the middle end to produce a properly sign-extended 32-bit
result and therefore combine gets rid of the sign-extension operation
that follows and actually folds it into the very same XORI machine
operation resulting in:

sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_didi
        xori    a0,a0,1         # 16    [c=4 l=4]  xordi3/1
        ret                     # 25    [c=0 l=4]  simple_return

instead (although the SEQZ alias SLTIU against immediate 1 machine
instruction would equally do and is actually retained at `-O0').  This
is handled analogously for the remaining synthesized operations of this
kind, i.e. `SLE', `SGEU', and `SGE'.

        gcc/
        * config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0
        rather than XOR 1 for LE and LEU operations.

        gcc/testsuite/
        * gcc.target/riscv/sge.c: New test.
        * gcc.target/riscv/sgeu.c: New test.
        * gcc.target/riscv/sle.c: New test.
        * gcc.target/riscv/sleu.c: New test.
---
On Mon, 28 Nov 2022, Jeff Law wrote:

> > > >   I have noticed it went nowhere.  Can you please check what
> > > > compilation options lead to this discrepancy so that we can have
> > > > the fix included in GCC 13?  I'd like to understand what's going
> > > > on here.
> > > FWIW, I don't see the redundant sign extension with this testcase
> > > at -O2 on the trunk.  Is it possible the patch has been made
> > > redundant over the last few months?
> >   Maybe at -O2, but the test cases continue to fail in my
> > configuration for other optimisation levels:
> >
> > FAIL: gcc.target/riscv/sge.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sge.c -Og -g scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c -Og -g scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c -Og -g scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c -Og -g scan-assembler-not sext\\.w
>
> I may have been running an rv32 toolchain...  So I'll start over and
> ensure that I'm running rv64 :-)
>
>
> With the trunk, I get code like Kito (AND with 0x1 mask)

 Right, I have examined assembly produced at -O2 and this is what happens
here as well:

--- sleu-O1.s   2022-11-28 16:31:18.520538342 +0000
+++ sleu-O2.s   2022-11-28 16:30:27.054241372 +0000
@@ -10,7 +10,7 @@
 sleu:
        sgtu    a0,a0,a1
        xori    a0,a0,1
-       sext.w  a0,a0
+       andi    a0,a0,1
        ret
        .size   sleu, .-sleu
        .section        .note.GNU-stack,"",@progbits

following Kito's observations.  This is why the tests incorrectly pass
at some optimisation levels while the code produced is still suboptimal
and just trivially different.

> The key difference is Roger's patch:
>
> commit c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
> Author: Roger Sayle
> Date:   Wed Aug 3 08:55:35 2022 +0100
>
>     Some additional zero-extension related optimizations in simplify-rtx.
>
>     This patch implements some additional zero-extension and sign-extension
>     related optimizations in simplify-rtx.cc.  The original motivation comes
>     from PR rtl-optimization/71775, where in comment #2 Andrew Pinksi sees:
>
>     Failed to match this instruction:
>     (set (reg:DI 88 [ _1 ])
>         (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>
> [ ... ]
>
> With that patch the sign extension is removed and instead we generate
> the AND with 0x1.
>
> Old, from combine dump:
>
>   Successfully matched this instruction:
>   (set (reg/i:DI 10 a0)
> !     (sign_extend:DI (reg:SI 78)))
>
>
> New, from combine dump:
>
>   (set (reg/i:DI 10 a0)
> !     (and:DI (subreg:DI (reg:SI 78) 0)
> !         (const_int 1 [0x1])))
>
> Note the date on Roger's patch, roughly the same time as yours.  I
> suspect Kito had tested the trunk with Roger's patch.

 That indeed seems like the correct explanation.  Thanks for tracking it
down!

> Your patch is probably still useful.  I think Kito's only concern was
> to make sure we don't have the ANDI instruction in addition to not
> having the SEXT instruction.  So still approved for trunk, just update
> the testcases to make sure we don't have the ANDI too.

 Given the false negatives, how about getting a bit stricter and also
checking there's nothing following the XORI instruction, like here?

 It might be overkill to have a check both for the sequence and for the
absence of ANDI or SEXT.W as well, but I'd rather have them both out of
an abundance of caution.

  Maciej

Changes from v1:

- Update test cases so as to verify there's no extra operation between
  XORI and the final RET, and that an ANDI instruction is not present
  either.

- Update the change description to reflect changes in code generation.
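
 For reference, the value-level equivalence the change relies on can be
checked with a small host-side model.  This is only a sketch and not part
of the patch: `sext32', `sgtu', `is_sext32', `sleu_xor1', `sleu_eq0' and
`check_equivalence' are hypothetical names made up for illustration, and
the model says nothing about what the middle end can prove about the
result, which is the actual difference between XOR 1 and EQ 0 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits, as the psABI requires for
   32-bit arguments held in 64-bit registers.  */
static int64_t
sext32 (uint32_t x)
{
  return (int64_t) (x ^ 0x80000000u) - 0x80000000LL;
}

/* Model of SGTU: unsigned comparison of two full 64-bit registers.  */
static int64_t
sgtu (int64_t a, int64_t b)
{
  return (uint64_t) a > (uint64_t) b;
}

/* A register value is a valid SImode result if it fits in 32 signed
   bits, i.e. it equals the sign-extension of its low 32 bits.  */
static int
is_sext32 (int64_t reg)
{
  return reg >= INT32_MIN && reg <= INT32_MAX;
}

/* Previous expansion: invert the SGTU result with XOR 1.  */
static int64_t
sleu_xor1 (uint32_t x, uint32_t y)
{
  return sgtu (sext32 (x), sext32 (y)) ^ 1;
}

/* New expansion: invert the SGTU result with EQ 0 (SEQZ).  */
static int64_t
sleu_eq0 (uint32_t x, uint32_t y)
{
  return sgtu (sext32 (x), sext32 (y)) == 0;
}

/* Check both expansions against plain C `<=' on boundary values.  */
static void
check_equivalence (void)
{
  static const uint32_t v[]
    = { 0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  const size_t n = sizeof v / sizeof v[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        int64_t r_old = sleu_xor1 (v[i], v[j]);
        int64_t r_new = sleu_eq0 (v[i], v[j]);

        assert (r_old == r_new);
        assert (r_new == (v[i] <= v[j]));
        assert (is_sext32 (r_new));
      }
}
```

 Calling `check_equivalence' from a test driver exercises all pairs of
the boundary values; both forms always yield 0 or 1, which is trivially
sign-extended, so no further SEXT.W or ANDI is needed on the result.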
---
 gcc/config/riscv/riscv.cc             |    4 ++--
 gcc/testsuite/gcc.target/riscv/sge.c  |   12 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sgeu.c |   12 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sle.c  |   12 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sleu.c |   12 ++++++++++++
 5 files changed, 50 insertions(+), 2 deletions(-)

gcc-riscv-int-order-inv-seqz.diff
Index: gcc/gcc/config/riscv/riscv.cc
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -3004,9 +3004,9 @@ riscv_emit_int_order_test (enum rtx_code
         }
       else if (invert_ptr == 0)
         {
-          rtx inv_target = riscv_force_binary (GET_MODE (target),
+          rtx inv_target = riscv_force_binary (word_mode,
                                                inv_code, cmp0, cmp1);
-          riscv_emit_binary (XOR, target, inv_target, const1_rtx);
+          riscv_emit_binary (EQ, target, inv_target, const0_rtx);
         }
       else
         {
Index: gcc/gcc/testsuite/gcc.target/riscv/sge.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sge.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sge (int x, int y)
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sgeu (unsigned int x, unsigned int y)
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sle.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sle (int x, int y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sleu.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sleu.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sleu (unsigned int x, unsigned int y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */