From patchwork Thu Jun 9 13:44:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 54986 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2AFC63839C5F for ; Thu, 9 Jun 2022 13:44:32 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by sourceware.org (Postfix) with ESMTPS id 38BAB3852777 for ; Thu, 9 Jun 2022 13:44:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 38BAB3852777 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com Received: by mail-wm1-x336.google.com with SMTP id 67-20020a1c1946000000b00397382b44f4so12635154wmz.2 for ; Thu, 09 Jun 2022 06:44:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=date:from:to:cc:subject:message-id:user-agent:mime-version; bh=NOGHFfGMgNhlwYf+JIYmGjAXAbaQfcaVfMvet/8ouFQ=; b=eWw2QR+Saew0a2Q5NhSfh5rflUQyZjX2vq9MGdtLVhNMrVI9MInFwJ6dngFefX5M2+ MvcBpbZOFLX1vyMsYvbpX5i1hQBPzBCCaDU4kk+QPjKNlj1Tq+C5+Gy1ke1gcQX7OLI+ QyFsFgMWMzdO7Wefi1UxpaeNqw4XXh1Ga9+degCy0Y+RFe/kbC4+oXyXfy1oASpz4+Zn mZTLj+67/Vj+tRwTu6Za7L+ym1hbe+IRFUTavnw8z3Z83XC4PKJo/zRPyS83tG//X8U2 cRsRCV+uoTzBUjYoJAGR3n56A7gf6rEXWUkaIxSpQvYUD8N2WJqy5/3DlZ2PnewWEiWk /5qg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:user-agent :mime-version; bh=NOGHFfGMgNhlwYf+JIYmGjAXAbaQfcaVfMvet/8ouFQ=; b=fgGVKo/c4omWOPii3dFgQEBQ/ZxtVQV8kgjI2xh1gM/Wc7ZBFCB1rkY9x+hOhbcHro r1UB0ttZ9eWlZ71o1bzpjhqW7hDhGxFnW4DQKzIOqUrI2nxqT37GC4I0BUBMhphXXORM IULQIDt6oDMxZkBonu7CzIEH2y+JWbfpAWPHk8Xn5gv8gOktjKtHJfjQLps/VdGwaYMR nLoWcTnHY1MMjoVrMTfKkSYr55J/VSpgUwDPdslj6929F30tBr0exfemmSijU3izHQdz KIdUclXjC/5N8aJarzy83Wsd8gCnwmHNQCyIK7jqDUDASN/JTw3TMC/QNx/QTKg5WN+0 w/XA== X-Gm-Message-State: AOAM532mDSJ9fXpOC2jSE6Vzd6tcnePPnoBmrcakBZDOhaPo7T/i7BYq teV47HbqfE9TtCwmPksP58csy+pUkbS0Rg== X-Google-Smtp-Source: ABdhPJy24j3llnRFypM2Ni4OKOqvWp8gdMZvImgxw3zzjasMlvlgqD5pqxFKl8S7DLBO5F42XZbdwg== X-Received: by 2002:a05:600c:1c94:b0:39c:55a2:af25 with SMTP id k20-20020a05600c1c9400b0039c55a2af25mr3339868wms.155.1654782251918; Thu, 09 Jun 2022 06:44:11 -0700 (PDT) Received: from tpp.orcam.me.uk (tpp.orcam.me.uk. [2001:8b0:154:0:ea6a:64ff:fe24:f2fc]) by smtp.gmail.com with ESMTPSA id k14-20020a5d518e000000b0021350f7b22esm21775935wrv.109.2022.06.09.06.44.10 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Jun 2022 06:44:11 -0700 (PDT) Date: Thu, 9 Jun 2022 14:44:08 +0100 (BST) From: "Maciej W. Rozycki" To: gcc-patches@gcc.gnu.org Subject: [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns Message-ID: User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Waterman , Kito Cheng Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" We have unordered FP comparisons implemented as RTL insns that produce multiple machine instructions. Such RTL insns are hard to match with a processor pipeline description and additionally there is a redundant SNEZ instruction produced on the result of these comparisons even though the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 1, e.g.: long flt (double x, double y) { return __builtin_isless (x, y); } with `-O2 -fno-finite-math-only -fno-signaling-nans' gets compiled to: .globl flt .type flt, @function flt: frflags a5 flt.d a0,fa0,fa1 fsflags a5 snez a0,a0 ret .size flt, .-flt because the middle end can't see through the UNSPEC operation unordered FP comparisons have been defined in terms of. These instructions are only produced via an expander already, so change the expander to emit individual RTL insns for each machine instruction in the ultimate ultimate sequence produced rather than deferring to a single RTL insn producing the whole sequence at once. gcc/ * config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant. (QUIET_PATTERN): New int attribute. (f_quiet4): Emit the intended RTL insns entirely within the preparation statements. (*f_quiet4_default) (*f_quiet4_snan): Remove insns. (*riscv_fsnvsnan2): New insn. gcc/testsuite/ * gcc.target/riscv/fle-ieee.c: New test. * gcc.target/riscv/fle-snan.c: New test. * gcc.target/riscv/fle.c: New test. * gcc.target/riscv/flef-ieee.c: New test. * gcc.target/riscv/flef-snan.c: New test. * gcc.target/riscv/flef.c: New test. * gcc.target/riscv/flt-ieee.c: New test. * gcc.target/riscv/flt-snan.c: New test. * gcc.target/riscv/flt.c: New test. * gcc.target/riscv/fltf-ieee.c: New test. * gcc.target/riscv/fltf-snan.c: New test. * gcc.target/riscv/fltf.c: New test. --- Hi, I think it is a step in the right direction, however ultimately I think we ought to actually tell GCC about the IEEE exception flags, so that the compiler can track data dependencies and we do not have to resort to UNSPECs which the compiler cannot see through. E.g. for a piece of code like: long fltlt (double x, double y, double z) { return __builtin_isless (x, y) + __builtin_isless (x, z); } (using an addition here for clarity because for a logical operation even more horror is produced) we get: .globl fltlt .type fltlt, @function fltlt: frflags a5 # 8 [c=4 l=4] riscv_frflags flt.d a0,fa0,fa1 # 9 [c=4 l=4] *cstoredfdi4 fsflags a5 # 10 [c=0 l=4] riscv_fsflags frflags a4 # 16 [c=4 l=4] riscv_frflags flt.d a5,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4 fsflags a4 # 18 [c=0 l=4] riscv_fsflags addw a0,a0,a5 # 30 [c=8 l=4] *addsi3_extended/0 ret # 40 [c=0 l=4] simple_return .size fltlt, .-fltlt where the middle FSFLAGS/FRFLAGS pair makes no sense of course and is a waste of both space and cycles. I'm yet running some benchmarking to see if the use of UNSPEC_VOLATILEs makes any noticeable performance difference, but I suspect it does not as the compiler could not do much about the original multiple-instruction single RTL insns anyway. No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for glibc testsuite.")). OK to apply? Maciej --- gcc/config/riscv/riscv.md | 67 +++++++++++++++-------------- gcc/testsuite/gcc.target/riscv/fle-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fle-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fle.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf.c | 12 +++++ 13 files changed, 179 insertions(+), 32 deletions(-) gcc-riscv-fcmp-split.diff Index: gcc/gcc/config/riscv/riscv.md =================================================================== --- gcc.orig/gcc/config/riscv/riscv.md +++ gcc/gcc/config/riscv/riscv.md @@ -57,6 +57,7 @@ ;; Floating-point unspecs. UNSPECV_FRFLAGS UNSPECV_FSFLAGS + UNSPECV_FSNVSNAN ;; Interrupt handler instructions. UNSPECV_MRET @@ -360,6 +361,7 @@ ;; Iterator and attributes for quiet comparisons. (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET]) (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")]) +(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")]) ;; This code iterator allows signed and unsigned widening multiplications ;; to use the same template. @@ -2326,39 +2328,31 @@ (set_attr "mode" "")]) (define_expand "f_quiet4" - [(parallel [(set (match_operand:X 0 "register_operand") - (unspec:X - [(match_operand:ANYF 1 "register_operand") - (match_operand:ANYF 2 "register_operand")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3))])] - "TARGET_HARD_FLOAT") - -(define_insn "*f_quiet4_default" - [(set (match_operand:X 0 "register_operand" "=r") - (unspec:X - [(match_operand:ANYF 1 "register_operand" " f") - (match_operand:ANYF 2 "register_operand" " f")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3 "=&r"))] - "TARGET_HARD_FLOAT && ! HONOR_SNANS (mode)" - "frflags\t%3\n\tf.\t%0,%1,%2\n\tfsflags\t%3" - [(set_attr "type" "fcmp") - (set_attr "mode" "") - (set (attr "length") (const_int 12))]) + [(set (match_operand:X 0 "register_operand") + (unspec:X [(match_operand:ANYF 1 "register_operand") + (match_operand:ANYF 2 "register_operand")] + QUIET_COMPARISON))] + "TARGET_HARD_FLOAT" +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx tmp = gen_reg_rtx (SImode); + rtx cmp = gen_rtx_ (mode, op1, op2); + rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx), + UNSPECV_FRFLAGS); + rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp), + UNSPECV_FSFLAGS); -(define_insn "*f_quiet4_snan" - [(set (match_operand:X 0 "register_operand" "=r") - (unspec:X - [(match_operand:ANYF 1 "register_operand" " f") - (match_operand:ANYF 2 "register_operand" " f")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3 "=&r"))] - "TARGET_HARD_FLOAT && HONOR_SNANS (mode)" - "frflags\t%3\n\tf.\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.\tzero,%1,%2" - [(set_attr "type" "fcmp") - (set_attr "mode" "") - (set (attr "length") (const_int 16))]) + emit_insn (gen_rtx_SET (tmp, frflags)); + emit_insn (gen_rtx_SET (op0, cmp)); + emit_insn (fsflags); + if (HONOR_SNANS (mode)) + emit_insn (gen_rtx_UNSPEC_VOLATILE (mode, + gen_rtvec (2, op1, op2), + UNSPECV_FSNVSNAN)); + DONE; +}) (define_insn "*seq_zero_" [(set (match_operand:GPR 0 "register_operand" "=r") @@ -2766,6 +2760,15 @@ "TARGET_HARD_FLOAT" "fsflags\t%0") +(define_insn "*riscv_fsnvsnan2" + [(unspec_volatile [(match_operand:ANYF 0 "register_operand") + (match_operand:ANYF 1 "register_operand")] + UNSPECV_FSNVSNAN)] + "TARGET_HARD_FLOAT" + "feq.\tzero,%0,%1" + [(set_attr "type" "fcmp") + (set_attr "mode" "")]) + (define_insn "riscv_mret" [(return) (unspec_volatile [(const_int 0)] UNSPECV_MRET)] Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */