From patchwork Sun Aug 7 17:04:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 56584
From: "Roger Sayle"
Subject: [x86 PATCH take #2] Add peephole2 to reduce double word register shuffling
Date: Sun, 7 Aug 2022 18:04:20 +0100
Message-ID: <013401d8aa7f$bd006d80$37014880$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list
Cc: 'Richard Biener'

This is a resubmission of my patch from June to fix some forms of
inefficient register allocation using an additional peephole2 in i386.md.
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596064.html

Since the original, a number of supporting patches/improvements have
been reviewed and approved, making this peephole even more effective.
Hence for the simple function:

__int128 foo(__int128 x, __int128 y) { return x+y; }

mainline GCC on x86_64 with -O2 currently generates:

        movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

with this patch we now generate (a three insn improvement):

        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

Back in June the review of the original patch stalled, as peephole2
isn't the ideal place to fix this (with which I fully agree), and this
patch is really just a workaround for a deeper deficiency in reload/lra.
To address this I've now filed a new enhancement PR in Bugzilla,
PR rtl-optimization/106518, that describes the underlying issue,
which might make an interesting (GSoC) project for anyone brave
(foolhardy) enough to tweak GCC's register allocation.

By comparison, this single peephole can't adversely affect other targets,
and should the happy day come that it's no longer required, at worst
it would just become a harmless legacy transform that no longer triggers.

I'm also investigating Uros' suggestion that it may be possible for RTL
expansion to do a better job expanding the function prologue, but
ultimately the hard register placement constraints are fixed by the
target ABI, and poor allocation/assignment of hard registers is the
responsibility/fault of the register allocation passes. It may still be
possible to reduce register pressure by avoiding the use of SUBREGs
(which keep the source and destination double words live during
shuffling), along the lines of Richard's CONCAT suggestion.

This patch has been retested against mainline using make bootstrap and
make -k check, both with and without --target_board=unix{-m32},
with no new failures. Ok for mainline?
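
For readers following the assembly for foo above, here is a minimal C sketch (an illustration only, not part of the patch) of what the addq/adcq pair computes: a 128-bit addition done as two 64-bit additions with a carry between them. The type u128_halves and the function add128 are hypothetical names introduced for this sketch. Under the x86_64 SysV ABI, x arrives in %rdi (low) and %rsi (high), y in %rdx (low) and %rcx (high), and the result is returned in %rax (low) and %rdx (high), which is why the ideal code needs only two moves before the add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Double word addition, written the way addq/adcq perform it. */
u128_halves add128(u128_halves x, u128_halves y)
{
    u128_halves r;
    r.lo = x.lo + y.lo;               /* addq: low halves, wraps modulo 2^64 */
    uint64_t carry = r.lo < x.lo;     /* carry out of the low addition */
    r.hi = x.hi + y.hi + carry;       /* adcq: high halves plus carry */
    return r;
}

/* Small self-check: adding 1 to 2^64 - 1 must carry into the high half. */
void check_add128(void)
{
    u128_halves r = add128((u128_halves){UINT64_MAX, 0}, (u128_halves){1, 0});
    assert(r.lo == 0 && r.hi == 1);
}
```

Calling check_add128() from a main function exercises the carry case.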

2022-08-07  Roger Sayle

gcc/ChangeLog
        PR target/43644
        PR rtl-optimization/97756
        PR rtl-optimization/98438
        * config/i386/i386.md (define_peephole2): Recognize double word
        swap sequences, and replace them with more efficient idioms,
        including using xchg when optimizing for size.

gcc/testsuite/ChangeLog
        PR target/43644
        * gcc.target/i386/pr43644.c: New test case.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 298e4b3..a11fd5b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3039,6 +3039,36 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
               (set (match_dup 1) (match_dup 0))])])
 
+;; Replace a double word swap that requires 4 mov insns with a
+;; 3 mov insn implementation (or an xchg when optimizing for size).
+(define_peephole2
+  [(set (match_operand:DWIH 0 "general_reg_operand")
+        (match_operand:DWIH 1 "general_reg_operand"))
+   (set (match_operand:DWIH 2 "general_reg_operand")
+        (match_operand:DWIH 3 "general_reg_operand"))
+   (clobber (match_operand: 4 "general_reg_operand"))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 1) (match_dup 2))]
+  "REGNO (operands[0]) != REGNO (operands[3])
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO (operands[3]) == REGNO (operands[4])
+   && peep2_reg_dead_p (4, operands[0])
+   && peep2_reg_dead_p (5, operands[2])"
+  [(parallel [(set (match_dup 1) (match_dup 3))
+              (set (match_dup 3) (match_dup 1))])]
+{
+  if (!optimize_insn_for_size_p ())
+    {
+      rtx tmp = REGNO (operands[0]) > REGNO (operands[2]) ? operands[0]
+                                                          : operands[2];
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[1], operands[3]);
+      emit_move_insn (operands[3], tmp);
+      DONE;
+    }
+})
+
 (define_expand "movstrict"
   [(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
         (match_operand:SWI12 1 "general_operand"))]
diff --git a/gcc/testsuite/gcc.target/i386/pr43644.c b/gcc/testsuite/gcc.target/i386/pr43644.c
new file mode 100644
index 0000000..ffdf31c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43644.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
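
As a rough illustration of why the peephole's replacement is equivalent to the sequence it matches (a sketch under simplifying assumptions, not part of the patch), the C model below treats operands 1 and 3 as plain 64-bit variables and operands 0 and 2 as temporaries that are dead afterwards, as the peep2_reg_dead_p conditions require. The function names swap_four_moves, swap_three_moves and swaps_agree are hypothetical and exist only for this example; the model ignores register numbering and the clobber of operand 4.

```c
#include <assert.h>
#include <stdint.h>

/* The matched sequence: copy both values out, then copy them back crossed. */
static void swap_four_moves(uint64_t *op1, uint64_t *op3)
{
    uint64_t op0 = *op1;   /* (set op0 op1) */
    uint64_t op2 = *op3;   /* (set op2 op3) */
    *op3 = op0;            /* (set op3 op0) */
    *op1 = op2;            /* (set op1 op2) */
}

/* The replacement when not optimizing for size: one temporary, three moves. */
static void swap_three_moves(uint64_t *op1, uint64_t *op3)
{
    uint64_t tmp = *op1;   /* emit_move_insn (tmp, operands[1]) */
    *op1 = *op3;           /* emit_move_insn (operands[1], operands[3]) */
    *op3 = tmp;            /* emit_move_insn (operands[3], tmp) */
}

/* Returns nonzero when both sequences leave (a, b) exchanged. */
int swaps_agree(uint64_t a, uint64_t b)
{
    uint64_t x1 = a, y1 = b, x2 = a, y2 = b;
    swap_four_moves(&x1, &y1);
    swap_three_moves(&x2, &y2);
    return x1 == b && y1 == a && x2 == b && y2 == a;
}
```

When optimizing for size, the peephole instead keeps the parallel swap pattern, which the existing i386.md patterns can emit as a single xchg; the C model above does not attempt to represent that case.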