From patchwork Sat Jun 3 09:55:17 2023
X-Patchwork-Submitter: Takayuki 'January June' Suwa
X-Patchwork-Id: 70558
From: Takayuki 'January June' Suwa
Date: Sat, 3 Jun 2023 18:55:17 +0900
To: GCC Patches
Cc: Max Filippov
Subject: [PATCH] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode
List-Id: Gcc-patches mailing list

This patch optimizes the boolean evaluation of EQ/NE against zero by
adding two insn_and_split patterns similar to the SImode conditional
store:

"eq_zero":
    op0 = (op1 == 0) ? 1 : 0;
    op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */

"movsicc_ne0_reg_zero":
    op0 = (op1 != 0) ? op2 : 0;
    op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */

These also work in SFmode by ignoring the sign bit, and the branch on
EQ/NE against zero in SFmode is handled in the same manner.

The reasons for this optimization in SFmode are:

  - Only zero values (negative or non-negative) have no 1 bits in
    either the exponent or the mantissa.
  - EQ/NE comparisons involving NaNs produce no signal, even if the
    NaNs are signaling.
  - Even if the IEEE 754 single-precision floating-point coprocessor
    is configured (TARGET_HARD_FLOAT is true), the comparison still
    requires:
      1. Loading the zero value into an FP register
      2. Possibly an additional FP move, if the comparison target is
         in an address register
      3. An FP equality check instruction
      4. Reading the boolean register containing the result, or a
         conditional branch
    That is, a considerable number of instructions are still
    generated.

gcc/ChangeLog:

	* config/xtensa/predicates.md (const_float_0_operand):
	Rename from obsolete "const_float_1_operand" and change the
	constant to compare.
	(cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
	New.
	* config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
	Add code for EQ/NE comparison with constant zero in SFmode.
	(xtensa_expand_scc): Add code to derive boolean evaluation
	of EQ/NE with constant zero for comparison in SFmode.
	(xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
	zero inside "cbranchsf4" to 0.
	* config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
	Change "match_operator" and the third "match_operand" to the
	ones mentioned above.
	(movsicc_ne0_reg_zero, eq_zero): New.
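The bit-level reasoning above can be checked on a host machine with a
small C sketch. This is not part of the patch. All function names
(float_bits, sf_eq_zero, sf_ne_zero, clz32, eq_zero, ne0_reg_zero) are
hypothetical, clz32 is a portable stand-in for the target's
count-leading-zeros operation and is assumed to yield 32 for a zero
input, and float is assumed to be a 32-bit IEEE 754 single.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern, roughly what the SImode
   SUBREG of an SFmode register does in the patch.  Assumes a 32-bit
   IEEE 754 float.  */
static uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &x, sizeof u);
    return u;
}

/* x == 0.0f without an FP compare: adding the word to itself shifts the
   sign bit out, so +0.0 and -0.0 both become 0 and nothing else does.  */
static int sf_eq_zero(float x)
{
    uint32_t u = float_bits(x);
    return (uint32_t)(u + u) == 0;
}

/* x != 0.0f, same idea.  */
static int sf_ne_zero(float x)
{
    uint32_t u = float_bits(x);
    return (uint32_t)(u + u) != 0;
}

/* Portable count-leading-zeros; hypothetical helper, defined to return
   32 for v == 0, which the ">> 5" formulation relies on.  */
static unsigned clz32(uint32_t v)
{
    unsigned n = 0;
    if (v == 0)
        return 32;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* "eq_zero": (v == 0) ? 1 : 0 computed as clz(v) >> 5.  */
static unsigned eq_zero(uint32_t v)
{
    return clz32(v) >> 5;
}

/* "movsicc_ne0_reg_zero": (op1 != 0) ? op2 : 0, computed as an
   unconditional move followed by a conditional move of op1 (zero).  */
static uint32_t ne0_reg_zero(uint32_t op1, uint32_t op2)
{
    uint32_t op0 = op2;
    if (op1 == 0)
        op0 = op1;
    return op0;
}
```

Only a count of exactly 32 has bit 5 set, so the shift yields 1 for a
zero input and 0 otherwise. NaNs and subnormals have a nonzero
exponent or mantissa, so the doubled word is nonzero and they compare
as "not equal to zero".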
---
 gcc/config/xtensa/predicates.md | 19 ++++++++++--
 gcc/config/xtensa/xtensa.cc     | 43 ++++++++++++++++++++++++++
 gcc/config/xtensa/xtensa.md     | 53 +++++++++++++++++++++++++++++----
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index a3575a68892..d3b49e32fa4 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -155,11 +155,11 @@
                             && CONSTANT_P (op)
                             && GET_MODE_SIZE (mode) % UNITS_PER_WORD == 0")))))
 
-;; Accept the floating point constant 1 in the appropriate mode.
-(define_predicate "const_float_1_operand"
+;; Accept the floating point constant 0 in the appropriate mode.
+(define_predicate "const_float_0_operand"
   (match_code "const_double")
 {
-  return real_equal (CONST_DOUBLE_REAL_VALUE (op), &dconst1);
+  return real_equal (CONST_DOUBLE_REAL_VALUE (op), &dconst0);
 })
 
 (define_predicate "fpmem_offset_operand"
@@ -179,6 +179,13 @@
   return false;
 })
 
+(define_predicate "cstoresf_cbranchsf_operand"
+  (ior (and (match_test "TARGET_HARD_FLOAT")
+            (match_operand 0 "register_operand"))
+       (and (match_code "const_double")
+            (match_test "real_equal (CONST_DOUBLE_REAL_VALUE (op),
+                                     &dconst0)"))))
+
 (define_predicate "branch_operator"
   (match_code "eq,ne,lt,ge"))
 
@@ -197,6 +204,12 @@
 (define_predicate "xtensa_cstoresi_operator"
   (match_code "eq,ne,gt,ge,lt,le"))
 
+(define_predicate "cstoresf_cbranchsf_operator"
+  (ior (and (match_test "TARGET_HARD_FLOAT")
+            (match_operand 0 "comparison_operator"))
+       (and (match_test "!TARGET_HARD_FLOAT")
+            (match_operand 0 "boolean_operator"))))
+
 (define_predicate "xtensa_shift_per_byte_operator"
   (match_code "ashift,ashiftrt,lshiftrt"))
 
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3b5d25b660a..fefca3b11cd 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -865,6 +865,16 @@ xtensa_expand_conditional_branch (rtx *operands, machine_mode mode)
   switch (mode)
     {
     case E_SFmode:
+      if ((test_code == EQ || test_code == NE)
+          && const_float_0_operand (cmp1, SFmode))
+        {
+          emit_move_insn (cmp1 = gen_reg_rtx (SImode),
+                          gen_rtx_SUBREG (SImode, cmp0, 0));
+          emit_insn (gen_addsi3 (cmp1, cmp1, cmp1));
+          cmp = gen_int_relational (test_code, cmp1, const0_rtx);
+          break;
+        }
+
       if (TARGET_HARD_FLOAT)
         {
           cmp = gen_float_relational (test_code, cmp0, cmp1);
@@ -996,6 +1006,34 @@ xtensa_expand_scc (rtx operands[4], machine_mode cmp_mode)
   rtx one_tmp, zero_tmp;
   rtx (*gen_fn) (rtx, rtx, rtx, rtx, rtx);
 
+  if (cmp_mode == SFmode)
+    {
+      if (const_float_0_operand (operands[3], SFmode))
+        switch (GET_CODE (operands[1]))
+          {
+          case EQ:
+            emit_move_insn (cmp = gen_reg_rtx (SImode),
+                            gen_rtx_SUBREG (SImode, operands[2], 0));
+            emit_insn (gen_addsi3 (cmp, cmp, cmp));
+            emit_insn (gen_eq_zero (dest, cmp));
+            return 1;
+
+          case NE:
+            emit_move_insn (cmp = gen_reg_rtx (SImode),
+                            gen_rtx_SUBREG (SImode, operands[2], 0));
+            emit_insn (gen_addsi3 (cmp, cmp, cmp));
+            one_tmp = force_reg (SImode, const1_rtx);
+            emit_insn (gen_movsicc_ne0_reg_zero (dest, cmp, one_tmp));
+            return 1;
+
+          default:
+            gcc_unreachable ();
+          }
+
+      if (! register_operand (operands[3], SFmode))
+        return 0;
+    }
+
   if (!(cmp = gen_conditional_move (GET_CODE (operands[1]), cmp_mode,
                                     operands[2], operands[3])))
     return 0;
@@ -4438,6 +4476,11 @@ xtensa_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return true;
 
     case CONST_DOUBLE:
+      if (outer_code == COMPARE && const_float_0_operand (x, SFmode))
+        {
+          *total = 0;
+          return true;
+        }
       if (TARGET_CONST16)
         *total = COSTS_N_INSNS (4);
       else
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 21afa747e89..87620934bbe 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1906,11 +1906,11 @@
 })
 
 (define_expand "cbranchsf4"
-  [(match_operator 0 "comparison_operator"
+  [(match_operator 0 "cstoresf_cbranchsf_operator"
     [(match_operand:SF 1 "register_operand")
-     (match_operand:SF 2 "register_operand")])
+     (match_operand:SF 2 "cstoresf_cbranchsf_operand")])
    (match_operand 3 "")]
-  "TARGET_HARD_FLOAT"
+  ""
 {
   xtensa_expand_conditional_branch (operands, SFmode);
   DONE;
@@ -2364,10 +2364,10 @@
 
 (define_expand "cstoresf4"
   [(match_operand:SI 0 "register_operand")
-   (match_operator:SI 1 "comparison_operator"
+   (match_operator:SI 1 "cstoresf_cbranchsf_operator"
     [(match_operand:SF 2 "register_operand")
-     (match_operand:SF 3 "register_operand")])]
-  "TARGET_HARD_FLOAT"
+     (match_operand:SF 3 "cstoresf_cbranchsf_operand")])]
+  ""
 {
   if (!xtensa_expand_scc (operands, SFmode))
     FAIL;
@@ -2432,6 +2432,30 @@
    (set_attr "mode" "SI")
    (set_attr "length" "3,3")])
 
+(define_insn_and_split "movsicc_ne0_reg_zero"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (if_then_else:SI (ne (match_operand:SI 1 "register_operand" "r")
+                             (const_int 0))
+                         (match_operand:SI 2 "register_operand" "r")
+                         (const_int 0)))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0)
+        (match_dup 2))
+   (set (match_dup 0)
+        (if_then_else:SI (ne (match_dup 1)
+                             (const_int 0))
+                         (match_dup 0)
+                         (match_dup 1)))]
+  ""
+  [(set_attr "type" "move")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY")
+                      (const_int 5)
+                      (const_int 6)))])
+
 (define_insn "movsfcc_internal0"
   [(set (match_operand:SF 0 "register_operand" "=a,a,f,f")
         (if_then_else:SF (match_operator 4 "branch_operator"
@@ -3157,6 +3181,23 @@
                       (const_int 5)
                       (const_int 6)))])
 
+(define_insn_and_split "eq_zero"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (eq:SI (match_operand:SI 1 "register_operand" "r")
+               (const_int 0)))]
+  "TARGET_NSA"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (clz:SI (match_dup 1)))
+   (set (match_dup 0)
+        (lshiftrt:SI (match_dup 0)
+                     (const_int 5)))]
+  ""
+  [(set_attr "type" "move")
+   (set_attr "mode" "SI")
+   (set_attr "length" "6")])
+
 (define_peephole2
   [(set (match_operand:SI 0 "register_operand")
         (match_operand:SI 6 "reload_operand"))