Message ID | 1e8fab8f-c0bb-dfc6-5533-eba3bde49ea4@yahoo.co.jp |
---|---|
State | New |
Headers |
Message-ID: <1e8fab8f-c0bb-dfc6-5533-eba3bde49ea4@yahoo.co.jp>
Date: Sat, 7 Jan 2023 11:55:26 +0900
From: Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Subject: [PATCH] xtensa: Optimize bitwise splicing operation |
Series |
xtensa: Optimize bitwise splicing operation
|
|
Commit Message
Takayuki 'January June' Suwa
Jan. 7, 2023, 2:55 a.m. UTC
This patch optimizes the operation of cutting and splicing two register
values at a specified bit position, in other words, combining (bitwise
ORing) bits 0 through (C-1) of one register with bits C through 31
of the other, where C is an immediate integer from 1 through 31.

This typically applies to copying the sign of a floating-point number, or
to __builtin_return_address() under the windowed register ABI, and saves
one instruction compared to the four shifts and a bitwise OR emitted by
the RTL generation pass.

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*splice_bits):
	New insn_and_split pattern.
---
 gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
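As an illustration of the operation described above, here is a minimal C sketch of the splice in two equivalent forms: the mask form that the new pattern matches, and the shift form (four shifts and an OR) that the message says the RTL generation pass produces. The function names splice_mask and splice_shift are hypothetical and do not appear in the patch; the sketch assumes 32-bit values and 1 <= c <= 31.

```c
#include <assert.h>
#include <stdint.h>

/* Mask form: keep bits 0..(c-1) of lo_src and bits c..31 of hi_src.
   Hypothetical helper for illustration; valid for 1 <= c <= 31. */
static uint32_t splice_mask(uint32_t lo_src, uint32_t hi_src, unsigned c)
{
    uint32_t low_mask = (UINT32_C(1) << c) - 1;
    return (lo_src & low_mask) | (hi_src & ~low_mask);
}

/* Shift form: clear the unwanted bits with two shift pairs, then OR.
   This mirrors the "four shifts and a bitwise OR" in the message. */
static uint32_t splice_shift(uint32_t lo_src, uint32_t hi_src, unsigned c)
{
    uint32_t lo = (lo_src << (32 - c)) >> (32 - c);  /* keep bits 0..c-1  */
    uint32_t hi = (hi_src >> c) << c;                /* keep bits c..31   */
    return lo | hi;
}
```

With c = 31 the splice keeps the magnitude bits of the first value and the sign bit of the second, which is the floating-point sign copy mentioned above when the arguments are IEEE single-precision bit patterns.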
Comments
On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> This patch optimizes the operation of cutting and splicing two register
> values at a specified bit position, in other words, combining (bitwise
> ORing) bits 0 through (C-1) of the register with bits C through 31
> of the other, where C is the specified immediate integer 1 through 31.
>
> This typically applies to signedness copy of floating point number or
> __builtin_return_address() if the windowed register ABI, and saves one
> instruction compared to four shifts and a bitwise OR by the RTL
> generation pass.

While I indeed see this kind of change, e.g.:

-       extui   a3, a3, 27, 5
-       slli    a2, a2, 5
-       srli    a2, a2, 5
-       slli    a3, a3, 27
-       or      a2, a2, a3
+       slli    a2, a2, 5
+       extui   a3, a3, 27, 5
+       ssai    5
+       src     a2, a3, a2

I also see the following:

-       movi.n  a6, -4
-       and     a5, a5, a6
-       extui   a3, a3, 0, 2
-       or      a3, a3, a5
+       srli    a5, a5, 2
+       slli    a3, a3, 30
+       ssai    30
+       src     a3, a5, a3

i.e. after the split there's the same number of instructions,
but the new sequence is one byte longer than the original one
because of the movi.n.

Looking at a bunch of linux builds I observe a slight code size
growth in call0 kernels and a slight code size reduction in
windowed kernels.

> gcc/ChangeLog:
>
>         * config/xtensa/xtensa.md (*splice_bits):
>         New insn_and_split pattern.
> ---
>  gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
> index 0a26d3dccf4..36ec1b1918e 100644
> --- a/gcc/config/xtensa/xtensa.md
> +++ b/gcc/config/xtensa/xtensa.md
> @@ -746,6 +746,53 @@
>     (set_attr "mode" "SI")
>     (set_attr "length" "3")])
>
> +(define_insn_and_split "*splice_bits"
> +  [(set (match_operand:SI 0 "register_operand" "=a")
> +        (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
> +                        (match_operand:SI 3 "const_int_operand" "i"))
> +                (and:SI (match_operand:SI 2 "register_operand" "r")
> +                        (match_operand:SI 4 "const_int_operand" "i"))))]
> +
> +  "!optimize_debug && optimize
> +   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
> +   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
> +       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
> +  "#"
> +  "&& can_create_pseudo_p ()"
> +  [(set (match_dup 5)
> +        (ashift:SI (match_dup 1)
> +                   (match_dup 4)))
> +   (set (match_dup 6)
> +        (lshiftrt:SI (match_dup 2)
> +                     (match_dup 3)))
> +   (set (match_dup 0)
> +        (ior:SI (lshiftrt:SI (match_dup 5)
> +                             (match_dup 4))
> +                (ashift:SI (match_dup 6)
> +                           (match_dup 3))))]
> +{
> +  int shift;
> +  if (INTVAL (operands[3]) < 0)
> +    {
> +      rtx x;
> +      x = operands[1], operands[1] = operands[2], operands[2] = x;
> +      x = operands[3], operands[3] = operands[4], operands[4] = x;
> +    }
> +  shift = floor_log2 (INTVAL (operands[3]) + 1);
> +  operands[3] = GEN_INT (shift);
> +  operands[4] = GEN_INT (32 - shift);
> +  operands[5] = gen_reg_rtx (SImode);
> +  operands[6] = gen_reg_rtx (SImode);
> +}
> +  [(set_attr "type" "arith")
> +   (set_attr "mode" "SI")
> +   (set (attr "length")
> +        (if_then_else (match_test "TARGET_DENSITY
> +                                   && (INTVAL (operands[3]) == 0x7FFFFFFF
> +                                       || INTVAL (operands[4]) == 0x7FFFFFFF)")
> +                      (const_int 11)
> +                      (const_int 12)))])

I wonder how the length could be 11 here? I always see 4 3-byte
instructions generated by this pattern.
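To make the first before/after listing above easier to follow, here is a rough C model of both instruction sequences. It rests on an assumption about the Xtensa ISA that is not stated in this thread: that "src ar, as, at" shifts the 64-bit concatenation as:at right by the amount set by ssai and keeps the low 32 bits. The helper names (xt_src, splice_old_seq, splice_new_seq) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of SRC: funnel shift right of hi:lo by sar,
   returning the low 32 bits.  sar is what SSAI would have set. */
static uint32_t xt_src(uint32_t hi, uint32_t lo, unsigned sar)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> sar);
}

/* Old sequence from the listing (five instructions). */
static uint32_t splice_old_seq(uint32_t a2, uint32_t a3)
{
    a3 = (a3 >> 27) & 0x1f;   /* extui a3, a3, 27, 5 */
    a2 = a2 << 5;             /* slli  a2, a2, 5     */
    a2 = a2 >> 5;             /* srli  a2, a2, 5     */
    a3 = a3 << 27;            /* slli  a3, a3, 27    */
    return a2 | a3;           /* or    a2, a2, a3    */
}

/* New sequence from the listing (four instructions). */
static uint32_t splice_new_seq(uint32_t a2, uint32_t a3)
{
    a2 = a2 << 5;             /* slli  a2, a2, 5     */
    a3 = (a3 >> 27) & 0x1f;   /* extui a3, a3, 27, 5 */
    return xt_src(a3, a2, 5); /* ssai 5; src a2, a3, a2 */
}
```

Under that assumption both functions return the low 27 bits of a2 combined with the high 5 bits of a3, so the saving in this case comes from SRC performing the last two shifts and the OR in one instruction.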
On 2023/01/08 6:53, Max Filippov wrote:
> While I indeed see this kind of change, e.g.:
> -       extui   a3, a3, 27, 5
> -       slli    a2, a2, 5
> -       srli    a2, a2, 5
> -       slli    a3, a3, 27
> -       or      a2, a2, a3
> +       slli    a2, a2, 5
> +       extui   a3, a3, 27, 5
> +       ssai    5
> +       src     a2, a3, a2
>
> I also see the following:
> -       movi.n  a6, -4
> -       and     a5, a5, a6
> -       extui   a3, a3, 0, 2
> -       or      a3, a3, a5
> +       srli    a5, a5, 2
> +       slli    a3, a3, 30
> +       ssai    30
> +       src     a3, a5, a3
>
> i.e. after the split there's the same number of instructions,
> but the new sequence is one byte longer than the original one
> because of the movi.n.
>
> Looking at a bunch of linux builds I observe a slight code size
> growth in call0 kernels and a slight code size reduction in
> windowed kernels.
>
>> +  [(set_attr "type" "arith")
>> +   (set_attr "mode" "SI")
>> +   (set (attr "length")
>> +        (if_then_else (match_test "TARGET_DENSITY
>> +                                   && (INTVAL (operands[3]) == 0x7FFFFFFF
>> +                                       || INTVAL (operands[4]) == 0x7FFFFFFF)")
>> +                      (const_int 11)
>> +                      (const_int 12)))])
>
> I wonder how the length could be 11 here? I always see 4 3-byte
> instructions generated by this pattern.
>

Sorry, I should have carried out a systematic test beforehand:

  #define TEST(c) \
    unsigned int test_ ## c (unsigned int a, unsigned int b) { \
      return (a & (-1U >> c)) | (b & ~(-1U >> c)); \
    }
  TEST(1)
  TEST(2)
    ...
  TEST(30)
  TEST(31)

Without this patch, compiling the above gives the following sequences,
depending on c:

  a. between 1 and 15:  slli (or add.n) + extui + slli + srli + or
  b. 16:                extui + slli + extui + or
  c. between 17 and 20: srli + slli + extui + or
  d. between 21 and 31: movi(.n) + and + extui + or

Clearly, the patch should be restricted to apply only to case a.
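For readers who want to try the test above, here is a small self-contained variant that instantiates the same TEST macro for a few values of c. The particular values (two from case a and one each from cases b, c and d) are an arbitrary selection for illustration, not the author's full list, and the sketch assumes a 32-bit unsigned int. Note that c here is the number of high bits taken from b, so in the commit message's terms the splice position is 32 - c.

```c
#include <assert.h>

/* Same macro as in the message above: low (32 - c) bits come from a,
   high c bits come from b.  Assumes unsigned int is 32 bits wide. */
#define TEST(c) \
  unsigned int test_ ## c (unsigned int a, unsigned int b) { \
    return (a & (-1U >> c)) | (b & ~(-1U >> c)); \
  }

TEST(1)   /* case a */
TEST(15)  /* case a */
TEST(16)  /* case b */
TEST(18)  /* case c */
TEST(21)  /* case d */
```

Compiling a file like this with an Xtensa cross-compiler at an optimizing level and the standard GCC option -S (the compiler binary name depends on the toolchain) should make it possible to compare the emitted sequences against the four cases listed above.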
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 0a26d3dccf4..36ec1b1918e 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -746,6 +746,53 @@
    (set_attr "mode" "SI")
    (set_attr "length" "3")])
 
+(define_insn_and_split "*splice_bits"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
+                        (match_operand:SI 3 "const_int_operand" "i"))
+                (and:SI (match_operand:SI 2 "register_operand" "r")
+                        (match_operand:SI 4 "const_int_operand" "i"))))]
+
+  "!optimize_debug && optimize
+   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
+   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
+       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 5)
+        (ashift:SI (match_dup 1)
+                   (match_dup 4)))
+   (set (match_dup 6)
+        (lshiftrt:SI (match_dup 2)
+                     (match_dup 3)))
+   (set (match_dup 0)
+        (ior:SI (lshiftrt:SI (match_dup 5)
+                             (match_dup 4))
+                (ashift:SI (match_dup 6)
+                           (match_dup 3))))]
+{
+  int shift;
+  if (INTVAL (operands[3]) < 0)
+    {
+      rtx x;
+      x = operands[1], operands[1] = operands[2], operands[2] = x;
+      x = operands[3], operands[3] = operands[4], operands[4] = x;
+    }
+  shift = floor_log2 (INTVAL (operands[3]) + 1);
+  operands[3] = GEN_INT (shift);
+  operands[4] = GEN_INT (32 - shift);
+  operands[5] = gen_reg_rtx (SImode);
+  operands[6] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && (INTVAL (operands[3]) == 0x7FFFFFFF
+                                       || INTVAL (operands[4]) == 0x7FFFFFFF)")
+                      (const_int 11)
+                      (const_int 12)))])
+
 ;; Zero-extend instructions.
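The insn condition and the preparation statements in the pattern above can be modeled in plain C to see which mask pairs match and what shift amounts result. This is a sketch under assumptions, not GCC code: exact_log2_model is a hypothetical stand-in for GCC's exact_log2 (returning -1 when the argument is not a power of two), and the two masks are taken as sign-extended 32-bit constants, which is how SImode const_int values are assumed to be represented here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for exact_log2: log2 of x if x is a power
   of two, otherwise -1. */
static int exact_log2_model(uint64_t x)
{
    int n = 0;
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    while (x >>= 1)
        n++;
    return n;
}

/* Model of the condition plus the operand swap: returns the splice
   position (1..31) when the mask pair matches, or 0 when it does not.
   The split then shifts by this value and by 32 minus this value. */
static int splice_position(int32_t mask3, int32_t mask4)
{
    int64_t m3 = mask3, m4 = mask4;   /* sign-extend, as assumed for INTVAL */

    if (m3 + m4 != -1)                /* masks must be bitwise complements */
        return 0;
    if (exact_log2_model((uint64_t)(m3 + 1)) <= 0
        && exact_log2_model((uint64_t)(m4 + 1)) <= 0)
        return 0;                     /* neither mask has the form 2^k - 1, k >= 1 */
    if (m3 < 0)                       /* make m3 the low-bits mask */
        m3 = m4;
    return exact_log2_model((uint64_t)(m3 + 1));
}
```

For the masks -4 and 3 from the second listing in the review, this model gives a position of 2, hence shift amounts 2 and 30, which agrees with the srli/slli/ssai operands quoted there.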