From patchwork Mon Jan 6 13:03:22 2025
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 104171
Date: Mon, 6 Jan 2025 13:03:22 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Richard Henderson, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 1/7] Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse

Contrary to user documentation, the `-mbwx', `-mcix', `-mfix', `-mmax' feature options and their inverse forms are ignored whenever the `-mcpu=' option is in effect, either because it has been given explicitly or because it has been configured as the default, such as with the `alphaev56-linux-gnu' target. In the latter case there is no way to change the settings these options are supposed to tweak other than with `-mcpu=', and the settings cannot be controlled individually, which makes all the feature options permanently inactive.

This appears to be a regression from commit 7816bea0e23b ("config.gcc: Reorganize --with-cpu logic.") back in 2003, which replaced the setting of the default feature mask with the setting of the default CPU across a few targets, while the complementing logic in the Alpha backend was not updated accordingly.

Fix this by making the individual feature options take precedence over `-mcpu='. Add test cases to verify that this is the case, and to cover the defaults for the boundary cases as well.

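The precedence rule described above can be modelled outside the compiler with plain integers. The following is only a sketch: the bit values assigned to the `MASK_*' names and the helper `merge_cpu_flags' are hypothetical and chosen for illustration, while the two masking statements mirror the ones in the `alpha_option_override' hunk of this patch.

```c
#include <assert.h>

/* Hypothetical bit assignments for illustration only; the real MASK_*
   values come from the backend's option definitions.  */
enum
{
  MASK_BWX = 1 << 0,
  MASK_CIX = 1 << 1,
  MASK_FIX = 1 << 2,
  MASK_MAX = 1 << 3
};

#define FEATURE_MASK (MASK_BWX | MASK_CIX | MASK_FIX | MASK_MAX)

/* Combine the feature bits implied by `-mcpu=' (CPU_FLAGS) with the
   current FLAGS, leaving untouched every bit recorded in EXPLICIT_BITS
   as having been set or cleared by an individual feature option.  */
static unsigned int
merge_cpu_flags (unsigned int flags, unsigned int explicit_bits,
                 unsigned int cpu_flags)
{
  /* In this sketch a CPU entry carries feature bits only.  */
  assert ((cpu_flags & ~(unsigned int) FEATURE_MASK) == 0);

  /* Clear only those feature bits the user has not decided on.  */
  flags &= ~(FEATURE_MASK & ~explicit_bits);
  /* Take the CPU defaults only for those same undecided bits.  */
  flags |= cpu_flags & ~explicit_bits;
  return flags;
}
```

With this model `-mcpu=ev56 -mno-bwx' corresponds to `merge_cpu_flags (0, MASK_BWX, MASK_BWX)', which leaves BWX clear, and `-mcpu=ev5 -mbwx' corresponds to `merge_cpu_flags (MASK_BWX, MASK_BWX, 0)', which leaves BWX set.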
This has the drawback that the relative order of `-mcpu=' and these individual options is ignored, so e.g. `-mno-bwx -mcpu=ev6' will keep the BWX feature disabled even though `-mcpu=ev6' comes later on the command line. This may affect some scenarios involving user overrides, such as with CFLAGS passed to `configure' and `make' invocations. I do believe it has been our practice anyway for more fine-grained options to override group options regardless of their relative order on the command line, and in any case using `-mcpu=ev6 -mbwx' as the override will do the right thing if required, canceling any previous `-mno-bwx'.

This has been spotted with `alphaev56-linux-gnu' target verification and a recently added test case:

FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\sldq_u\\s 2
FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\smskwh\\s 1
FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\smskwl\\s 1
FAIL: gcc.target/alpha/stwx0.c -O1 scan-assembler-times \\sstq_u\\s 2

(and similarly for the remaining optimization levels covered) which this fix has addressed.

gcc/
* config/alpha/alpha.cc (alpha_option_override): Ignore CPU flags corresponding to features the enabling or disabling of which has been requested with an individual feature option.

gcc/testsuite/
* gcc.target/alpha/target-bwx-1.c: New file.
* gcc.target/alpha/target-bwx-2.c: New file.
* gcc.target/alpha/target-bwx-3.c: New file.
* gcc.target/alpha/target-bwx-4.c: New file.
* gcc.target/alpha/target-cix-1.c: New file.
* gcc.target/alpha/target-cix-2.c: New file.
* gcc.target/alpha/target-cix-3.c: New file.
* gcc.target/alpha/target-cix-4.c: New file.
* gcc.target/alpha/target-fix-1.c: New file.
* gcc.target/alpha/target-fix-2.c: New file.
* gcc.target/alpha/target-fix-3.c: New file.
* gcc.target/alpha/target-fix-4.c: New file.
* gcc.target/alpha/target-max-1.c: New file.
* gcc.target/alpha/target-max-2.c: New file.
* gcc.target/alpha/target-max-3.c: New file.
* gcc.target/alpha/target-max-4.c: New file.
---
Changes from v1:
- Original standalone submission: .
- Fold into this series to prevent failures with newly-added tests in 2/7, 6/7, and 7/7.
- Mention the likely cause of the regression in the description.
---
 gcc/config/alpha/alpha.cc | 5 +++--
 gcc/testsuite/gcc.target/alpha/target-bwx-1.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-bwx-2.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-bwx-3.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-bwx-4.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-cix-1.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-cix-2.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-cix-3.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-cix-4.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-fix-1.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-fix-2.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-fix-3.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-fix-4.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-max-1.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-max-2.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-max-3.c | 6 ++++++
 gcc/testsuite/gcc.target/alpha/target-max-4.c | 6 ++++++
 17 files changed, 99 insertions(+), 2 deletions(-)

gcc-alpha-flags-explicit.diff
Index: gcc/gcc/config/alpha/alpha.cc
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.cc
+++ gcc/gcc/config/alpha/alpha.cc
@@ -460,8 +460,9 @@ alpha_option_override (void)
             line_size = cpu_table[i].line_size;
             l1_size = cpu_table[i].l1_size;
             l2_size = cpu_table[i].l2_size;
-            target_flags &= ~ (MASK_BWX | MASK_MAX | MASK_FIX | MASK_CIX);
-            target_flags |= cpu_table[i].flags;
+            target_flags &= ~((MASK_BWX | MASK_MAX | MASK_FIX | MASK_CIX)
+                              & ~target_flags_explicit);
+            target_flags |= cpu_table[i].flags & ~target_flags_explicit;
             break;
           }
       if (i == ct_size)
Index: gcc/gcc/testsuite/gcc.target/alpha/target-bwx-1.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-bwx-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev5" } */
+
+#ifdef __alpha_bwx__
+# error "BWX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-bwx-2.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-bwx-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev56" } */
+
+#ifndef __alpha_bwx__
+# error "BWX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-bwx-3.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-bwx-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev5 -mbwx" } */
+
+#ifndef __alpha_bwx__
+# error "BWX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-bwx-4.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-bwx-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev56 -mno-bwx" } */
+
+#ifdef __alpha_bwx__
+# error "BWX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-cix-1.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-cix-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev6" } */
+
+#ifdef __alpha_cix__
+# error "CIX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-cix-2.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-cix-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev67" } */
+
+#ifndef __alpha_cix__
+# error "CIX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-cix-3.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-cix-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev6 -mcix" } */
+
+#ifndef __alpha_cix__
+# error "CIX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-cix-4.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-cix-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev67 -mno-cix" } */
+
+#ifdef __alpha_cix__
+# error "CIX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-fix-1.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-fix-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=pca56" } */
+
+#ifdef __alpha_fix__
+# error "FIX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-fix-2.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-fix-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev6" } */
+
+#ifndef __alpha_fix__
+# error "FIX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-fix-3.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-fix-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=pca56 -mfix" } */
+
+#ifndef __alpha_fix__
+# error "FIX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-fix-4.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-fix-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev6 -mno-fix" } */
+
+#ifdef __alpha_fix__
+# error "FIX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-max-1.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-max-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev56" } */
+
+#ifdef __alpha_max__
+# error "MAX enabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-max-2.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-max-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=pca56" } */
+
+#ifndef __alpha_max__
+# error "MAX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-max-3.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-max-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ev56 -mmax" } */
+
+#ifndef __alpha_max__
+# error "MAX disabled"
+#endif
Index: gcc/gcc/testsuite/gcc.target/alpha/target-max-4.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/target-max-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=pca56 -mno-max" } */
+
+#ifdef __alpha_max__
+# error "MAX enabled"
+#endif

From patchwork Mon Jan 6 13:03:28 2025
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 104170
Date: Mon, 6 Jan 2025 13:03:28 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Richard Henderson, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 2/7] Alpha: Optimize block moves coming from longword-aligned source

Now that proper alignment determination for block moves is in place, copying a block of longword-aligned data has become a real case. Implement the merging of loaded data from pairs of SImode registers into single DImode registers so that it can be used efficiently with unaligned stores, as suggested by a comment in `alpha_expand_block_move', and discard the comment. Provide test cases accordingly.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Merge loaded data from pairs of SImode registers into single DImode registers if to be used with unaligned stores.

gcc/testsuite/
* gcc.target/alpha/memcpy-si-aligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src-bwx.c: New file.
---
No change from v1.
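For reference, the arithmetic behind the pair merging can be sketched on plain integers. This is a model only and not the RTL expansion itself: the helper names `merge_pair' and `merge_longword_pairs' are hypothetical, and unsigned 32-bit values stand in for the SImode data pieces, assuming the little-endian layout where the longword at the lower address ends up in the low half of the quadword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merge two 32-bit longwords into one 64-bit quadword: shift the
   second one left by 32 and OR in the first, i.e. ASHIFT then IOR.  */
static uint64_t
merge_pair (uint32_t lo, uint32_t hi)
{
  return ((uint64_t) hi << 32) | lo;
}

/* Merge consecutive pairs of longwords from SRC into DST and return
   the number of quadwords produced.  If WORDS is odd, the trailing
   longword is not consumed and has to be handled separately.  */
static size_t
merge_longword_pairs (const uint32_t *src, size_t words, uint64_t *dst)
{
  size_t pairs = words / 2;

  for (size_t i = 0; i < pairs; ++i)
    dst[i] = merge_pair (src[2 * i], src[2 * i + 1]);
  return pairs;
}
```

For the 60-byte copies used in the test cases this gives 7 quadwords plus one trailing longword, which is the case the odd-piece handling in the patch takes care of.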
---
 gcc/config/alpha/alpha.cc | 45 +++++++--
 gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c | 16 +++
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c | 16 +++
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-src-bwx.c | 11 ++
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-src.c | 15 +++
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned.c | 51 +++++++++++
 6 files changed, 146 insertions(+), 8 deletions(-)

gcc-alpha-block-move-si-unaligned.diff
Index: gcc/gcc/config/alpha/alpha.cc
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.cc
+++ gcc/gcc/config/alpha/alpha.cc
@@ -3930,14 +3930,44 @@ alpha_expand_block_move (rtx operands[])
     {
       words = bytes / 4;
 
-      for (i = 0; i < words; ++i)
-        data_regs[nregs + i] = gen_reg_rtx (SImode);
+      /* Load an even quantity of SImode data pieces only.  */
+      unsigned int hwords = words / 2;
+      for (i = 0; i / 2 < hwords; ++i)
+        {
+          data_regs[nregs + i] = gen_reg_rtx (SImode);
+          emit_move_insn (data_regs[nregs + i],
+                          adjust_address (orig_src, SImode, ofs + i * 4));
+        }
 
-      for (i = 0; i < words; ++i)
-        emit_move_insn (data_regs[nregs + i],
-                        adjust_address (orig_src, SImode, ofs + i * 4));
+      /* If we'll be using unaligned stores, merge data from pairs
+         of SImode registers into DImode registers so that we can
+         store it more efficiently via quadword unaligned stores.  */
+      unsigned int j;
+      if (dst_align < 32)
+        for (i = 0, j = 0; i < words / 2; ++i, j = i * 2)
+          {
+            rtx hi = expand_simple_binop (DImode, ASHIFT,
+                                          data_regs[nregs + j + 1],
+                                          GEN_INT (32), NULL_RTX,
+                                          1, OPTAB_WIDEN);
+            data_regs[nregs + i] = expand_simple_binop (DImode, IOR, hi,
+                                                        data_regs[nregs + j],
+                                                        NULL_RTX,
+                                                        1, OPTAB_WIDEN);
+          }
+      else
+        j = i;
 
-      nregs += words;
+      /* Take care of any remaining odd trailing SImode data piece.  */
+      if (j < words)
+        {
+          data_regs[nregs + i] = gen_reg_rtx (SImode);
+          emit_move_insn (data_regs[nregs + i],
+                          adjust_address (orig_src, SImode, ofs + j * 4));
+          ++i;
+        }
+
+      nregs += i;
       bytes -= words * 4;
       ofs += words * 4;
     }
@@ -4056,13 +4086,12 @@ alpha_expand_block_move (rtx operands[])
     }
 
   /* Due to the above, this won't be aligned.  */
-  /* ??? If we have more than one of these, consider constructing full
-     words in registers and using alpha_expand_unaligned_store_words.  */
   while (i < nregs && GET_MODE (data_regs[i]) == SImode)
     {
       alpha_expand_unaligned_store (orig_dst, data_regs[i], 4, ofs);
       ofs += 4;
       i++;
+      gcc_assert (i == nregs || GET_MODE (data_regs[i]) != SImode);
     }
 
   if (dst_align >= 16)
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-aligned.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+unsigned int aligned_src_si[17] = { [0 ... 16] = 0xeaebeced };
+unsigned int aligned_dst_si[17] = { [0 ... 16] = 0xdcdbdad9 };
+
+void
+memcpy_aligned_data_si (void)
+{
+  __builtin_memcpy (aligned_dst_si + 1, aligned_src_si + 1, 60);
+}
+
+/* { dg-final { scan-assembler-times "\\sldl\\s" 15 } } */
+/* { dg-final { scan-assembler-times "\\sstl\\s" 15 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+unsigned int unaligned_src_si[17] = { [0 ... 16] = 0xfefdfcfb };
+
+void
+memcpy_unaligned_dst_si (void *dst)
+{
+  __builtin_memcpy (dst, unaligned_src_si + 1, 60);
+}
+
+/* { dg-final { scan-assembler-times "\\sldl\\s" 15 } } */
+/* { dg-final { scan-assembler-times "\\sldq_u\\s" 4 } } */
+/* { dg-final { scan-assembler-times "\\sstq_u\\s" 10 } } */
+/* { dg-final { scan-assembler-not "\\sstl\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-src-bwx.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-src-bwx.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#include "memcpy-si-unaligned-src.c"
+
+/* { dg-final { scan-assembler-times "\\sldbu\\s" 4 } } */
+/* { dg-final { scan-assembler-times "\\sldq_u\\s" 8 } } */
+/* { dg-final { scan-assembler-times "\\sstb\\s" 4 } } */
+/* { dg-final { scan-assembler-times "\\sstl\\s" 14 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldl|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-src.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-src.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-bwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+unsigned int unaligned_dst_si[17] = { [0 ... 16] = 0xc8c9cacb };
+
+void
+memcpy_unaligned_src_si (const void *src)
+{
+  __builtin_memcpy (unaligned_dst_si + 1, src, 60);
+}
+
+/* { dg-final { scan-assembler-times "\\sldq_u\\s" 10 } } */
+/* { dg-final { scan-assembler-times "\\sstl\\s" 15 } } */
+/* { dg-final { scan-assembler-not "\\s(?:ldl|stq_u)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-additional-sources memcpy-si-aligned.c } */
+/* { dg-additional-sources memcpy-si-unaligned-src.c } */
+/* { dg-additional-sources memcpy-si-unaligned-dst.c } */
+/* { dg-options "" } */
+
+void memcpy_aligned_data_si (void);
+void memcpy_unaligned_dst_si (void *);
+void memcpy_unaligned_src_si (const void *);
+
+extern unsigned int aligned_src_si[];
+extern unsigned int aligned_dst_si[];
+extern unsigned int unaligned_src_si[];
+extern unsigned int unaligned_dst_si[];
+
+int
+main (void)
+{
+  unsigned int v;
+  int i;
+
+  for (i = 1, v = 0x04030201; i < 16; i++, v += 0x04040404)
+    unaligned_src_si[i] = v;
+  asm ("" : : : "memory");
+  memcpy_unaligned_dst_si (aligned_src_si + 1);
+  asm ("" : : : "memory");
+  memcpy_aligned_data_si ();
+  asm ("" : : : "memory");
+  memcpy_unaligned_src_si (aligned_dst_si + 1);
+  asm ("" : : : "memory");
+  for (i = 1, v = 0x04030201; i < 16; i++, v += 0x04040404)
+    if (unaligned_dst_si[i] != v)
+      return 1;
+  if (unaligned_src_si[0] != 0xfefdfcfb)
+    return 1;
+  if (unaligned_src_si[16] != 0xfefdfcfb)
+    return 1;
+  if (aligned_src_si[0] != 0xeaebeced)
+    return 1;
+  if (aligned_src_si[16] != 0xeaebeced)
+    return 1;
+  if (aligned_dst_si[0] != 0xdcdbdad9)
+    return 1;
+  if (aligned_dst_si[16] != 0xdcdbdad9)
+    return 1;
+  if (unaligned_dst_si[0] != 0xc8c9cacb)
+    return 1;
+  if (unaligned_dst_si[16] != 0xc8c9cacb)
+    return 1;
+  return 0;
+}

From patchwork Mon Jan 6 13:03:32 2025
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 104174
Date: Mon, 6 Jan 2025 13:03:32 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Richard Henderson, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 3/7] Alpha: Fix a block move pessimisation with zero-extension after LDWU

For the BWX case we have a pessimisation in `alpha_expand_block_move' for HImode loads, where we place the data loaded into a HImode register as well, thereby losing the information that the data loaded has already been zero-extended to the full DImode width of the register. Later on, when we store this data in QImode quantities into an unaligned destination, we zero-extend it again for the purpose of right-shifting, such as with the test case included, which produces code at `-O2' as follows:

	ldah $2,unaligned_src_hi($29) !gprelhigh
	lda $1,unaligned_src_hi($2) !gprellow
	ldwu $6,unaligned_src_hi($2) !gprellow
	ldwu $5,2($1)
	ldwu $4,4($1)
	bis $31,$31,$31
	zapnot $6,3,$3 # Redundant!
	ldbu $7,6($1)
	zapnot $5,3,$2 # Redundant!
	stb $6,0($16)
	zapnot $4,3,$1 # Redundant!
	stb $5,2($16)
	srl $3,8,$3
	stb $4,4($16)
	srl $2,8,$2
	stb $3,1($16)
	srl $1,8,$1
	stb $2,3($16)
	stb $1,5($16)
	stb $7,6($16)

The non-BWX case is unaffected, because there we use byte insertion, so we don't care that data is held in a HImode register.

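Why the ZAPNOT instructions marked above are redundant can be checked with a small model on plain integers. This is a sketch only: the helper names `ldwu', `zapnot3' and the two `high_byte_*' functions are hypothetical stand-ins for the instructions in the listing, assuming that LDWU zero-extends the 16-bit datum to 64 bits and that ZAPNOT with a byte mask of 3 keeps the two low bytes and clears the rest.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LDWU: load a 16-bit datum zero-extended to 64 bits.  */
static uint64_t
ldwu (uint16_t mem)
{
  return (uint64_t) mem;
}

/* Model of `zapnot x,3,y': keep bytes 0 and 1, clear bytes 2 to 7.  */
static uint64_t
zapnot3 (uint64_t x)
{
  return x & 0xffff;
}

/* The byte stored at offset 1, computed the way the original code
   does it, with an extra zero-extension before the right shift.  */
static uint8_t
high_byte_with_zapnot (uint16_t mem)
{
  return (uint8_t) (zapnot3 (ldwu (mem)) >> 8);
}

/* The same byte computed directly from the loaded value.  */
static uint8_t
high_byte_without_zapnot (uint16_t mem)
{
  return (uint8_t) (ldwu (mem) >> 8);
}
```

Both helpers return the same byte for every input, because the value coming out of `ldwu' never has any of bits 16 to 63 set in the first place.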
Address this by making the holding RTX a HImode subreg of the original DImode register, which the RTL passes can then see through and eliminate the zero-extension where it would otherwise be required, resulting in this shortened code:

	ldah $2,unaligned_src_hi($29) !gprelhigh
	lda $1,unaligned_src_hi($2) !gprellow
	ldwu $4,unaligned_src_hi($2) !gprellow
	ldwu $3,2($1)
	ldwu $2,4($1)
	bis $31,$31,$31
	srl $4,8,$6
	ldbu $1,6($1)
	srl $3,8,$5
	stb $4,0($16)
	stb $6,1($16)
	srl $2,8,$4
	stb $3,2($16)
	stb $5,3($16)
	stb $2,4($16)
	stb $4,5($16)
	stb $1,6($16)

While at it reformat the enclosing do-while statement according to the GNU Coding Standards, observing that in this case it does not obfuscate the change owing to the odd original indentation.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode subreg of a DImode register to hold data from an aligned HImode load.
---
Changes from v1:
- Remove a comma from the last sentence of the change description for clarity.
---
 gcc/config/alpha/alpha.cc | 17 +++++++++------
 gcc/testsuite/gcc.target/alpha/memcpy-hi-unaligned-dst.c | 16 ++++++++++++++
 2 files changed, 27 insertions(+), 6 deletions(-)

gcc-alpha-unaligned-store-bwx-hi.diff
Index: gcc/gcc/config/alpha/alpha.cc
===================================================================
--- gcc.orig/gcc/config/alpha/alpha.cc
+++ gcc/gcc/config/alpha/alpha.cc
@@ -3998,14 +3998,19 @@ alpha_expand_block_move (rtx operands[])
   if (bytes >= 2)
     {
       if (src_align >= 16)
-        {
-          do {
-            data_regs[nregs++] = tmp = gen_reg_rtx (HImode);
-            emit_move_insn (tmp, adjust_address (orig_src, HImode, ofs));
+        do
+          {
+            tmp = gen_reg_rtx (DImode);
+            emit_move_insn (tmp,
+                            expand_simple_unop (DImode, SET,
+                                                adjust_address (orig_src,
+                                                                HImode, ofs),
+                                                NULL_RTX, 1));
+            data_regs[nregs++] = gen_rtx_SUBREG (HImode, tmp, 0);
             bytes -= 2;
             ofs += 2;
-          } while (bytes >= 2);
-        }
+          }
+        while (bytes >= 2);
       else if (! TARGET_BWX)
         {
           data_regs[nregs++] = tmp = gen_reg_rtx (HImode);
Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-hi-unaligned-dst.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-hi-unaligned-dst.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mbwx" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+unsigned short unaligned_src_hi[4];
+
+void
+memcpy_unaligned_dst_hi (void *dst)
+{
+  __builtin_memcpy (dst, unaligned_src_hi, 7);
+}
+
+/* { dg-final { scan-assembler-times "\\sldwu\\s" 3 } } */
+/* { dg-final { scan-assembler-times "\\sldbu\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sstb\\s" 7 } } */
+/* { dg-final { scan-assembler-not "\\szapnot\\s" } } */

From patchwork Mon Jan 6 13:03:38 2025
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 104175
Date: Mon, 6 Jan 2025 13:03:38 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Richard Henderson, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 4/7] Alpha: Export `emit_unlikely_jump' for a subsequent change to use

Rename the `emit_unlikely_jump' function to `alpha_emit_unlikely_jump' so as to avoid namespace pollution, update callers accordingly, and export it for use in the machine description. Make it return the insn emitted.

gcc/ * config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New prototype. * config/alpha/alpha.cc (emit_unlikely_jump): Rename to... (alpha_emit_unlikely_jump): ... this. Return the insn emitted. (alpha_split_atomic_op, alpha_split_compare_and_swap) (alpha_split_compare_and_swap_12, alpha_split_atomic_exchange) (alpha_split_atomic_exchange_12): Update call sites accordingly. --- No change from v1. --- gcc/config/alpha/alpha-protos.h | 1 + gcc/config/alpha/alpha.cc | 19 ++++++++++--------- 2 files changed, 11 insertions(+), 9 deletions(-) gcc-alpha-emit-unlikely-jump-export.diff Index: gcc/gcc/config/alpha/alpha-protos.h =================================================================== --- gcc.orig/gcc/config/alpha/alpha-protos.h +++ gcc/gcc/config/alpha/alpha-protos.h @@ -59,6 +59,7 @@ extern rtx alpha_expand_zap_mask (HOST_W extern void alpha_expand_builtin_vector_binop (rtx (*)(rtx, rtx, rtx), machine_mode, rtx, rtx, rtx); +extern rtx alpha_emit_unlikely_jump (rtx, rtx); extern void alpha_expand_builtin_establish_vms_condition_handler (rtx, rtx); extern void alpha_expand_builtin_revert_vms_condition_handler (rtx); Index: gcc/gcc/config/alpha/alpha.cc =================================================================== --- gcc.orig/gcc/config/alpha/alpha.cc +++ gcc/gcc/config/alpha/alpha.cc @@ -4420,12 +4420,13 @@ alpha_expand_builtin_vector_binop (rtx ( /* A subroutine of the atomic operation splitters. Jump to LABEL if COND is true. Mark the jump as unlikely to be taken. */ -static void -emit_unlikely_jump (rtx cond, rtx label) +rtx +alpha_emit_unlikely_jump (rtx cond, rtx label) { rtx x = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, label, pc_rtx); rtx_insn *insn = emit_jump_insn (gen_rtx_SET (pc_rtx, x)); add_reg_br_prob_note (insn, profile_probability::very_unlikely ()); + return insn; } /* Subroutines of the atomic operation splitters. 
Emit barriers @@ -4517,7 +4518,7 @@ alpha_split_atomic_op (enum rtx_code cod emit_insn (gen_store_conditional (mode, cond, mem, scratch)); x = gen_rtx_EQ (DImode, cond, const0_rtx); - emit_unlikely_jump (x, label); + alpha_emit_unlikely_jump (x, label); alpha_post_atomic_barrier (model); } @@ -4567,7 +4568,7 @@ alpha_split_compare_and_swap (rtx operan emit_insn (gen_rtx_SET (cond, x)); x = gen_rtx_EQ (DImode, cond, const0_rtx); } - emit_unlikely_jump (x, label2); + alpha_emit_unlikely_jump (x, label2); emit_move_insn (cond, newval); emit_insn (gen_store_conditional @@ -4576,7 +4577,7 @@ alpha_split_compare_and_swap (rtx operan if (!is_weak) { x = gen_rtx_EQ (DImode, cond, const0_rtx); - emit_unlikely_jump (x, label1); + alpha_emit_unlikely_jump (x, label1); } if (!is_mm_relaxed (mod_f)) @@ -4679,7 +4680,7 @@ alpha_split_compare_and_swap_12 (rtx ope emit_insn (gen_rtx_SET (cond, x)); x = gen_rtx_EQ (DImode, cond, const0_rtx); } - emit_unlikely_jump (x, label2); + alpha_emit_unlikely_jump (x, label2); emit_insn (gen_mskxl (cond, scratch, mask, addr)); @@ -4691,7 +4692,7 @@ alpha_split_compare_and_swap_12 (rtx ope if (!is_weak) { x = gen_rtx_EQ (DImode, cond, const0_rtx); - emit_unlikely_jump (x, label1); + alpha_emit_unlikely_jump (x, label1); } if (!is_mm_relaxed (mod_f)) @@ -4731,7 +4732,7 @@ alpha_split_atomic_exchange (rtx operand emit_insn (gen_store_conditional (mode, cond, mem, scratch)); x = gen_rtx_EQ (DImode, cond, const0_rtx); - emit_unlikely_jump (x, label); + alpha_emit_unlikely_jump (x, label); alpha_post_atomic_barrier (model); } @@ -4805,7 +4806,7 @@ alpha_split_atomic_exchange_12 (rtx oper emit_insn (gen_store_conditional (DImode, scratch, mem, scratch)); x = gen_rtx_EQ (DImode, scratch, const0_rtx); - emit_unlikely_jump (x, label); + alpha_emit_unlikely_jump (x, label); alpha_post_atomic_barrier (model); } From patchwork Mon Jan 6 13:03:42 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 104176
Date: Mon, 6 Jan 2025 13:03:42 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Richard Henderson, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 5/7] IRA+LRA: Let the backend request to split basic blocks

The next change for Alpha will produce extra labels and branches in reload, which in turn requires basic blocks to be split at completion. We do this already for functions that can trap, so just extend the arrangement with a flag for the backend to use whenever it finds it necessary.

gcc/
	* function.h (struct function): Add `split_basic_blocks_after_reload' member.
	* lra.cc (lra): Handle it.
	* reload1.cc (reload): Likewise.
---
This was approved by Richard Sandiford in v1, so I'll commit it along with 6/7, which relies on it.

No change from v1.
---
 gcc/function.h | 3 +++
 gcc/lra.cc | 6 ++++--
 gcc/reload1.cc | 6 ++++--
 3 files changed, 11 insertions(+), 4 deletions(-)

gcc-split-basic-blocks-after-reload.diff
Index: gcc/gcc/function.h
===================================================================
--- gcc.orig/gcc/function.h
+++ gcc/gcc/function.h
@@ -449,6 +449,9 @@ struct GTY(()) function { /* Set for artificial function created for [[assume (cond)]]. These should be GIMPLE optimized, but not expanded to RTL. */ unsigned int assume_function : 1; + + /* Nonzero if reload will have to split basic blocks.
*/ + unsigned int split_basic_blocks_after_reload : 1; }; /* Add the decl D to the local_decls list of FUN. */ Index: gcc/gcc/lra.cc =================================================================== --- gcc.orig/gcc/lra.cc +++ gcc/gcc/lra.cc @@ -2594,8 +2594,10 @@ lra (FILE *f, int verbose) inserted_p = fixup_abnormal_edges (); - /* We've possibly turned single trapping insn into multiple ones. */ - if (cfun->can_throw_non_call_exceptions) + /* Split basic blocks if we've possibly turned single trapping insn + into multiple ones or otherwise the backend requested to do so. */ + if (cfun->can_throw_non_call_exceptions + || cfun->split_basic_blocks_after_reload) { auto_sbitmap blocks (last_basic_block_for_fn (cfun)); bitmap_ones (blocks); Index: gcc/gcc/reload1.cc =================================================================== --- gcc.orig/gcc/reload1.cc +++ gcc/gcc/reload1.cc @@ -1272,8 +1272,10 @@ reload (rtx_insn *first, int global) inserted = fixup_abnormal_edges (); - /* We've possibly turned single trapping insn into multiple ones. */ - if (cfun->can_throw_non_call_exceptions) + /* Split basic blocks if we've possibly turned single trapping insn + into multiple ones or otherwise the backend requested to do so. */ + if (cfun->can_throw_non_call_exceptions + || cfun->split_basic_blocks_after_reload) { auto_sbitmap blocks (last_basic_block_for_fn (cfun)); bitmap_ones (blocks); From patchwork Mon Jan 6 13:03:47 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. 
Rozycki"
X-Patchwork-Id: 104172
Date: Mon, 6 Jan 2025 13:03:47 +0000 (GMT)
From: "Maciej W.
Rozycki"
To: Richard Henderson, gcc-patches@gcc.gnu.org
cc: Arnd Bergmann, John Paul Adrian Glaubitz, Richard Henderson, "Paul E. McKenney", Linus Torvalds
Subject: [PATCH v2 6/7] Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]

With non-BWX Alpha implementations we have a problem of data races where an 8-bit byte or 16-bit word quantity is to be written to memory, because in those cases we use an unprotected read-modify-write (RMW) access of a 32-bit longword or 64-bit quadword width. If the contents of the longword or quadword accessed outside the byte or word to be written are changed midway through, either by a concurrent write executing on the same CPU, such as by a signal handler, or by a parallel write executing on another CPU, such as by another thread or via a shared memory segment, then the concluding write of the RMW access will clobber them. This is especially important for the safety of RCU algorithms, but is otherwise an issue anyway.

To guard against these data races with byte and aligned word quantities, introduce the `-msafe-bwa' command-line option (standing for Safe Byte & Word Access) that instructs the compiler to instead use an atomic RMW access sequence where byte and word memory access machine instructions are not available. There is no change to code produced for BWX targets.
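
For illustration only, the difference between the unprotected and the atomic RMW access can be modelled in portable C roughly as below. This is a sketch and not code from the patch: the helper names `racy_store_byte' and `safe_store_byte' are made up, a 32-bit containing word and little-endian byte numbering are assumed, and a C11 weak compare-and-exchange loop stands in for the LDx_L/STx_C sequence the compiler would actually emit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; neither is part of the patch.
   Byte numbering within the word is assumed little-endian.  */

/* Models the plain non-BWX sequence: an unprotected read-modify-write
   of the containing word.  A write to a neighbouring byte that lands
   between the load and the store below is silently lost.  */
static void
racy_store_byte (uint32_t *word, unsigned int byte, uint8_t value)
{
  unsigned int shift = (byte & 3) * 8;
  uint32_t tmp = *word;                     /* Read.  */
  tmp &= ~((uint32_t) 0xff << shift);       /* Mask out the old byte.  */
  tmp |= (uint32_t) value << shift;         /* Insert the new byte.  */
  *word = tmp;                              /* Write back all 4 bytes.  */
}

/* Models the safe variant: the write-back only succeeds if the word
   has not changed since it was read; otherwise recompute and retry.  */
static void
safe_store_byte (_Atomic uint32_t *word, unsigned int byte, uint8_t value)
{
  unsigned int shift = (byte & 3) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old = atomic_load_explicit (word, memory_order_relaxed);
  uint32_t new_word;

  do
    new_word = (old & ~mask) | ((uint32_t) value << shift);
  while (!atomic_compare_exchange_weak_explicit (word, &old, new_word,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed));
}
```

On failure the compare-and-exchange refreshes `old' with the current contents of the word, so each retry merges the new byte into up-to-date neighbouring bytes, which is the property the unprotected sequence lacks.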

It would be sufficient for the secondary reload handler to use a pair of scratch registers, as requested by `reload_out', but that would result in poor code, as one of the scratches would be occupied by the data retrieved and the other one would have to be reloaded with repeated calculations, all within the LL/SC sequence. Therefore I chose to add a dedicated `reload_out_safe_bwa' handler and ask for more scratches there by defining a 256-bit OI integer mode.

While reload is documented in our manual to support an arbitrary number of scratches, in reality this hasn't been implemented for IRA:

  /* ??? It would be useful to be able to handle only two, or more than three, operands, but for now we can only handle the case of having exactly three: output, input and one temp/scratch. */

and it seems to be the case for LRA as well. Do what everyone else does then and just have one wide multi-register scratch.

I note that the atomic sequences emitted are suboptimal performance-wise, as the looping branch for the unsuccessful completion of the sequence points backwards, which means it will be predicted as taken even though in most cases it will fall through. I do not see this as a deficiency of the proposed change, as it takes care of recording that the branch is unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore generic code elsewhere shou

Add test cases accordingly.

There are notable regressions between a plain `-mno-bwx' configuration and a `-mno-bwx -msafe-bwa' one:

FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: g++.dg/init/array25.C -std=c++17 execution test
FAIL: g++.dg/init/array25.C -std=c++98 execution test
FAIL: g++.dg/init/array25.C -std=c++26 execution test

They come from the fact that these test cases play tricks with alignment and end up calling code that expects a reference to aligned data but is handed one to unaligned data.

This doesn't cause a visible problem with plain `-mno-bwx' code, because the resulting alignment exception is fixed up by Linux. There's no such handling currently implemented for LDL_L or LDQ_L instructions (which are first in the sequence) and consequently the offender is issued with SIGBUS instead. Suitable handling will be added to Linux to complement this change, so these regressions are seen as harmless and expected.

gcc/
	PR target/117759
	* config/alpha/alpha-modes.def (OI): New integer mode.
	* config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New prototype.
	* config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New function.
	(alpha_secondary_reload): Handle TARGET_SAFE_BWA.
	* config/alpha/alpha.md (aligned_store_safe_bwa)
	(unaligned_store_safe_bwa, reload_out_safe_bwa)
	(reload_out_unaligned_safe_bwa): New expanders.
	(mov, movcqi, reload_out_aligned): Handle TARGET_SAFE_BWA.
	(reload_out): Guard against TARGET_SAFE_BWA.
	* config/alpha/alpha.opt (msafe-bwa): New option.
	* config/alpha/alpha.opt.urls: Regenerate.
	* doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option.

gcc/testsuite/
	PR target/117759
	* gcc.target/alpha/stb.c: New file.
	* gcc.target/alpha/stb-bwa.c: New file.
	* gcc.target/alpha/stb-bwx.c: New file.
	* gcc.target/alpha/stba.c: New file.
	* gcc.target/alpha/stba-bwa.c: New file.
	* gcc.target/alpha/stba-bwx.c: New file.
	* gcc.target/alpha/stw.c: New file.
	* gcc.target/alpha/stw-bwa.c: New file.
	* gcc.target/alpha/stw-bwx.c: New file.
	* gcc.target/alpha/stwa.c: New file.
	* gcc.target/alpha/stwa-bwa.c: New file.
	* gcc.target/alpha/stwa-bwx.c: New file.
---
NB I note that there is a warning in gcc/config/alpha/sync.md that it is unpredictable whether the lock_flag is cleared by a normal load or store executed on the same CPU, and therefore we need to make sure no register spill is inserted in the sequence. I seem not to have seen it actually happen and testing results with actual hardware do look good. However, out of an abundance of caution we may want to make sure it can't happen. It should be quite a straightforward change, but owing to the number of issues encountered, as indicated by the size of the patchset, and the limited time, I did not get to it in time for Stage 1 closure. So I've chosen to post this change for review anyway with the intent to make a suitable update in the coming weeks. As it only affects the newly-added `-msafe-bwa' option I don't think it will be disruptive to the pre-release stabilisation process. I will appreciate input on this part.

NB2 I reckon the manual ought to be updated to say only one scratch is permitted for secondary reloads. While it has been documented otherwise since forever, it has never actually matched reality.

Changes from v1:

- Add a reference to PR target/117759.
--- gcc/config/alpha/alpha-modes.def | 4 gcc/config/alpha/alpha-protos.h | 1 gcc/config/alpha/alpha.cc | 68 ++++++++++++- gcc/config/alpha/alpha.md | 155 +++++++++++++++++++++++++++++- gcc/config/alpha/alpha.opt | 4 gcc/config/alpha/alpha.opt.urls | 3 gcc/doc/invoke.texi | 9 + gcc/testsuite/gcc.target/alpha/stb-bwa.c | 28 +++++ gcc/testsuite/gcc.target/alpha/stb-bwx.c | 16 +++ gcc/testsuite/gcc.target/alpha/stb.c | 25 ++++ gcc/testsuite/gcc.target/alpha/stba-bwa.c | 35 ++++++ gcc/testsuite/gcc.target/alpha/stba-bwx.c | 23 ++++ gcc/testsuite/gcc.target/alpha/stba.c | 33 ++++++ gcc/testsuite/gcc.target/alpha/stw-bwa.c | 28 +++++ gcc/testsuite/gcc.target/alpha/stw-bwx.c | 16 +++ gcc/testsuite/gcc.target/alpha/stw.c | 25 ++++ gcc/testsuite/gcc.target/alpha/stwa-bwa.c | 35 ++++++ gcc/testsuite/gcc.target/alpha/stwa-bwx.c | 23 ++++ gcc/testsuite/gcc.target/alpha/stwa.c | 33 ++++++ 19 files changed, 558 insertions(+), 6 deletions(-) gcc-alpha-safe-bwa.diff Index: gcc/gcc/config/alpha/alpha-modes.def =================================================================== --- gcc.orig/gcc/config/alpha/alpha-modes.def +++ gcc/gcc/config/alpha/alpha-modes.def @@ -17,6 +17,10 @@ You should have received a copy of the G along with GCC; see the file COPYING3. If not see . */ +/* 256-bit integer mode used by "reload_out_safe_bwa" secondary + reload patterns to obtain 4 scratch registers. */ +INT_MODE (OI, 32); + /* 128-bit floating point. This gets reset in alpha_option_override if VAX float format is in use. 
*/ FLOAT_MODE (TF, 16, ieee_quad_format); Index: gcc/gcc/config/alpha/alpha-protos.h =================================================================== --- gcc.orig/gcc/config/alpha/alpha-protos.h +++ gcc/gcc/config/alpha/alpha-protos.h @@ -43,6 +43,7 @@ extern enum reg_class alpha_preferred_re extern void alpha_set_memflags (rtx, rtx); extern bool alpha_split_const_mov (machine_mode, rtx *); extern bool alpha_expand_mov (machine_mode, rtx *); +extern bool alpha_expand_mov_safe_bwa (machine_mode, rtx *); extern bool alpha_expand_mov_nobwx (machine_mode, rtx *); extern void alpha_expand_movmisalign (machine_mode, rtx *); extern void alpha_emit_floatuns (rtx[]); Index: gcc/gcc/config/alpha/alpha.cc =================================================================== --- gcc.orig/gcc/config/alpha/alpha.cc +++ gcc/gcc/config/alpha/alpha.cc @@ -1660,8 +1660,10 @@ alpha_secondary_reload (bool in_p, rtx x if (!aligned_memory_operand (x, mode)) sri->icode = direct_optab_handler (reload_in_optab, mode); } - else + else if (aligned_memory_operand (x, mode) || !TARGET_SAFE_BWA) sri->icode = direct_optab_handler (reload_out_optab, mode); + else + sri->icode = code_for_reload_out_safe_bwa (mode); return NO_REGS; } } @@ -2386,6 +2388,70 @@ alpha_expand_mov_nobwx (machine_mode mod } return true; } + + return false; +} + +/* Expand a multi-thread and async-signal safe QImode or HImode + move instruction; return true if all work is done. */ + +bool +alpha_expand_mov_safe_bwa (machine_mode mode, rtx *operands) +{ + /* If the output is not a register, the input must be. */ + if (MEM_P (operands[0])) + operands[1] = force_reg (mode, operands[1]); + + /* If it's a memory load, the sequence is the usual non-BWX one. */ + if (any_memory_operand (operands[1], mode)) + return alpha_expand_mov_nobwx (mode, operands); + + /* Handle memory store cases, unaligned and aligned. The only case + where we can be called during reload is for aligned loads; all + other cases require temporaries. 
*/ + if (any_memory_operand (operands[0], mode)) + { + if (aligned_memory_operand (operands[0], mode)) + { + rtx label = gen_rtx_LABEL_REF (DImode, gen_label_rtx ()); + emit_label (XEXP (label, 0)); + + rtx aligned_mem, bitnum; + rtx status = gen_reg_rtx (SImode); + rtx temp = gen_reg_rtx (SImode); + get_aligned_mem (operands[0], &aligned_mem, &bitnum); + emit_insn (gen_aligned_store_safe_bwa (aligned_mem, operands[1], + bitnum, status, temp)); + + rtx cond = gen_rtx_EQ (DImode, + gen_rtx_SUBREG (DImode, status, 0), + const0_rtx); + alpha_emit_unlikely_jump (cond, label); + } + else + { + rtx addr = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (addr, get_unaligned_address (operands[0]))); + + rtx aligned_addr = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (aligned_addr, + gen_rtx_AND (DImode, addr, GEN_INT (-8)))); + + rtx label = gen_rtx_LABEL_REF (DImode, gen_label_rtx ()); + emit_label (XEXP (label, 0)); + + rtx status = gen_reg_rtx (DImode); + rtx temp = gen_reg_rtx (DImode); + rtx seq = gen_unaligned_store_safe_bwa (mode, addr, operands[1], + aligned_addr, status, temp); + alpha_set_memflags (seq, operands[0]); + emit_insn (seq); + + rtx cond = gen_rtx_EQ (DImode, status, const0_rtx); + alpha_emit_unlikely_jump (cond, label); + } + return true; + } return false; } Index: gcc/gcc/config/alpha/alpha.md =================================================================== --- gcc.orig/gcc/config/alpha/alpha.md +++ gcc/gcc/config/alpha/alpha.md @@ -4200,6 +4200,31 @@ << INTVAL (operands[2]))); }) +;; Multi-thread and async-signal safe variant. Operand 0 is the aligned +;; SImode MEM. Operand 1 is the data to store. Operand 2 is the number +;; of bits within the word that the value should be placed. Operand 3 is +;; the SImode status. Operand 4 is a SImode temporary. 
+ +(define_expand "aligned_store_safe_bwa" + [(set (match_operand:SI 3 "register_operand") + (unspec_volatile:SI + [(match_operand:SI 0 "memory_operand")] UNSPECV_LL)) + (set (subreg:DI (match_dup 3) 0) + (and:DI (subreg:DI (match_dup 3) 0) (match_dup 5))) + (set (subreg:DI (match_operand:SI 4 "register_operand") 0) + (ashift:DI (zero_extend:DI (match_operand 1 "register_operand")) + (match_operand:DI 2 "const_int_operand"))) + (set (subreg:DI (match_dup 3) 0) + (ior:DI (subreg:DI (match_dup 4) 0) (subreg:DI (match_dup 3) 0))) + (parallel [(set (subreg:DI (match_dup 3) 0) + (unspec_volatile:DI [(const_int 0)] UNSPECV_SC)) + (set (match_dup 0) (match_dup 3))])] + "" +{ + operands[5] = GEN_INT (~(GET_MODE_MASK (GET_MODE (operands[1])) + << INTVAL (operands[2]))); +}) + ;; For the unaligned byte and halfword cases, we use code similar to that ;; in the Architecture book, but reordered to lower the number of registers ;; required. Operand 0 is the address. Operand 1 is the data to store. @@ -4227,6 +4252,31 @@ "" "operands[5] = GEN_INT (GET_MODE_MASK (mode));") +;; Multi-thread and async-signal safe variant. Operand 0 is the address. +;; Operand 1 is the data to store. Operand 2 is the aligned address. +;; Operand 3 is the DImode status. Operand 4 is a DImode temporary. 
+ +(define_expand "@unaligned_store_safe_bwa" + [(set (match_operand:DI 3 "register_operand") + (unspec_volatile:DI + [(mem:DI (match_operand:DI 2 "register_operand"))] UNSPECV_LL)) + (set (match_dup 3) + (and:DI (not:DI + (ashift:DI (match_dup 5) + (ashift:DI (match_operand:DI 0 "register_operand") + (const_int 3)))) + (match_dup 3))) + (set (match_operand:DI 4 "register_operand") + (ashift:DI (zero_extend:DI + (match_operand:I12MODE 1 "register_operand")) + (ashift:DI (match_dup 0) (const_int 3)))) + (set (match_dup 3) (ior:DI (match_dup 4) (match_dup 3))) + (parallel [(set (match_dup 3) + (unspec_volatile:DI [(const_int 0)] UNSPECV_SC)) + (set (mem:DI (match_dup 2)) (match_dup 3))])] + "" + "operands[5] = GEN_INT (GET_MODE_MASK (mode));") + ;; Here are the define_expand's for QI and HI moves that use the above ;; patterns. We have the normal sets, plus the ones that need scratch ;; registers for reload. @@ -4236,8 +4286,8 @@ (match_operand:I12MODE 1 "general_operand"))] "" { - if (TARGET_BWX - ? alpha_expand_mov (mode, operands) + if (TARGET_BWX ? alpha_expand_mov (mode, operands) + : TARGET_SAFE_BWA ? alpha_expand_mov_safe_bwa (mode, operands) : alpha_expand_mov_nobwx (mode, operands)) DONE; }) @@ -4292,7 +4342,9 @@ operands[1] = gen_lowpart (HImode, operands[1]); do_aligned2: operands[0] = gen_lowpart (HImode, operands[0]); - done = alpha_expand_mov_nobwx (HImode, operands); + done = (TARGET_SAFE_BWA + ? 
alpha_expand_mov_safe_bwa (HImode, operands) + : alpha_expand_mov_nobwx (HImode, operands)); gcc_assert (done); DONE; } @@ -4371,6 +4423,8 @@ } else { + gcc_assert (!TARGET_SAFE_BWA); + rtx addr = get_unaligned_address (operands[0]); rtx scratch1 = gen_rtx_REG (DImode, regno); rtx scratch2 = gen_rtx_REG (DImode, regno + 1); @@ -4388,6 +4442,52 @@ DONE; }) +(define_expand "@reload_out_safe_bwa" + [(parallel [(match_operand:RELOAD12 0 "any_memory_operand" "=m") + (match_operand:RELOAD12 1 "register_operand" "r") + (match_operand:OI 2 "register_operand" "=&r")])] + "!TARGET_BWX && TARGET_SAFE_BWA" +{ + unsigned regno = REGNO (operands[2]); + + if (mode == CQImode) + { + operands[0] = gen_lowpart (HImode, operands[0]); + operands[1] = gen_lowpart (HImode, operands[1]); + } + + rtx addr = get_unaligned_address (operands[0]); + rtx status = gen_rtx_REG (DImode, regno); + rtx areg = gen_rtx_REG (DImode, regno + 1); + rtx aligned_addr = gen_rtx_REG (DImode, regno + 2); + rtx scratch = gen_rtx_REG (DImode, regno + 3); + + if (REG_P (addr)) + areg = addr; + else + emit_move_insn (areg, addr); + emit_move_insn (aligned_addr, gen_rtx_AND (DImode, areg, GEN_INT (-8))); + + rtx label = gen_label_rtx (); + emit_label (label); + LABEL_NUSES (label) = 1; + + rtx seq = gen_reload_out_unaligned_safe_bwa (areg, operands[1], + aligned_addr, + status, scratch); + alpha_set_memflags (seq, operands[0]); + emit_insn (seq); + + rtx label_ref = gen_rtx_LABEL_REF (DImode, label); + rtx cond = gen_rtx_EQ (DImode, status, const0_rtx); + rtx jump = alpha_emit_unlikely_jump (cond, label_ref); + JUMP_LABEL (jump) = label; + + cfun->split_basic_blocks_after_reload = 1; + + DONE; +}) + ;; Helpers for the above. The way reload is structured, we can't ;; always get a proper address for a stack slot during reload_foo ;; expansion, so we must delay our address manipulations until after. 
@@ -4420,10 +4520,55 @@ { rtx aligned_mem, bitnum; get_aligned_mem (operands[0], &aligned_mem, &bitnum); - emit_insn (gen_aligned_store (aligned_mem, operands[1], bitnum, - operands[2], operands[3])); + if (TARGET_SAFE_BWA) + { + rtx label = gen_label_rtx (); + emit_label (label); + LABEL_NUSES (label) = 1; + + rtx status = operands[2]; + rtx temp = operands[3]; + emit_insn (gen_aligned_store_safe_bwa (aligned_mem, operands[1], bitnum, + status, temp)); + + rtx label_ref = gen_rtx_LABEL_REF (DImode, label); + rtx cond = gen_rtx_EQ (DImode, gen_rtx_SUBREG (DImode, status, 0), + const0_rtx); + rtx jump = alpha_emit_unlikely_jump (cond, label_ref); + JUMP_LABEL (jump) = label; + + cfun->split_basic_blocks_after_reload = 1; + } + else + emit_insn (gen_aligned_store (aligned_mem, operands[1], bitnum, + operands[2], operands[3])); DONE; }) + +;; Operand 0 is the address. Operand 1 is the data to store. Operand 2 +;; is the aligned address. Operand 3 is the DImode status. Operand 4 is +;; a DImode scratch. 
+ +(define_expand "reload_out_unaligned_safe_bwa" + [(set (match_operand:DI 3 "register_operand") + (unspec_volatile:DI [(mem:DI (match_operand:DI 2 "register_operand"))] + UNSPECV_LL)) + (set (match_dup 3) + (and:DI (not:DI + (ashift:DI (match_dup 5) + (ashift:DI (match_operand:DI 0 "register_operand") + (const_int 3)))) + (match_dup 3))) + (set (match_operand:DI 4 "register_operand") + (ashift:DI (zero_extend:DI + (match_operand:I12MODE 1 "register_operand")) + (ashift:DI (match_dup 0) (const_int 3)))) + (set (match_dup 3) (ior:DI (match_dup 4) (match_dup 3))) + (parallel [(set (match_dup 3) + (unspec_volatile:DI [(const_int 0)] UNSPECV_SC)) + (set (mem:DI (match_dup 2)) (match_dup 3))])] + "" + "operands[5] = GEN_INT (GET_MODE_MASK (mode));") ;; Vector operations Index: gcc/gcc/config/alpha/alpha.opt =================================================================== --- gcc.orig/gcc/config/alpha/alpha.opt +++ gcc/gcc/config/alpha/alpha.opt @@ -69,6 +69,10 @@ mcix Target Mask(CIX) Emit code for the counting ISA extension. +msafe-bwa +Target Mask(SAFE_BWA) +Emit multi-thread and async-signal safe code for byte and word memory accesses. + mexplicit-relocs Target Mask(EXPLICIT_RELOCS) Emit code using explicit relocation directives. Index: gcc/gcc/config/alpha/alpha.opt.urls =================================================================== --- gcc.orig/gcc/config/alpha/alpha.opt.urls +++ gcc/gcc/config/alpha/alpha.opt.urls @@ -35,6 +35,9 @@ UrlSuffix(gcc/DEC-Alpha-Options.html#ind mcix UrlSuffix(gcc/DEC-Alpha-Options.html#index-mcix) +msafe-bwa +UrlSuffix(gcc/DEC-Alpha-Options.html#index-msafe-bwa) + mexplicit-relocs UrlSuffix(gcc/DEC-Alpha-Options.html#index-mexplicit-relocs) Index: gcc/gcc/doc/invoke.texi =================================================================== --- gcc.orig/gcc/doc/invoke.texi +++ gcc/gcc/doc/invoke.texi @@ -976,6 +976,7 @@ Objective-C and Objective-C++ Dialects}. 
-mtrap-precision=@var{mode} -mbuild-constants -mcpu=@var{cpu-type} -mtune=@var{cpu-type} -mbwx -mmax -mfix -mcix +-msafe-bwa -mfloat-vax -mfloat-ieee -mexplicit-relocs -msmall-data -mlarge-data -msmall-text -mlarge-text @@ -25691,6 +25692,14 @@ CIX, FIX and MAX instruction sets. The sets supported by the CPU type specified via @option{-mcpu=} option or that of the CPU on which GCC was built if none is specified. +@opindex msafe-bwa +@opindex mno-safe-bwa +@item -msafe-bwa +@itemx -mno-safe-bwa +Indicate whether in the absence of the optional BWX instruction set +GCC should generate multi-thread and async-signal safe code for byte +and aligned word memory accesses. + @opindex mfloat-vax @opindex mfloat-ieee @item -mfloat-vax Index: gcc/gcc/testsuite/gcc.target/alpha/stb-bwa.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stb-bwa.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void +stb (char *p, char v) +{ + *p = v; +} + +/* Expect assembly such as: + + bic $16,7,$2 + insbl $17,$16,$17 +$L2: + ldq_l $1,0($2) + mskbl $1,$16,$1 + bis $17,$1,$1 + stq_c $1,0($2) + beq $1,$L2 + + with address masking. 
*/ + +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sinsbl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskbl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" 1 } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stb-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stb-bwx.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void +stb (char *p, char v) +{ + *p = v; +} + +/* Expect assembly such as: + + stb $17,0($16) + */ + +/* { dg-final { scan-assembler-times "\\sstb\\s\\\$17,0\\\(\\\$16\\\)\\s" 1 } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stb.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stb.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -mno-safe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void +stb (char *p, char v) +{ + *p = v; +} + +/* Expect assembly such as: + + insbl $17,$16,$17 + ldq_u $1,0($16) + mskbl $1,$16,$1 + bis $17,$1,$17 + stq_u $17,0($16) + + without address masking. 
*/ + +/* { dg-final { scan-assembler-times "\\sldq_u\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_u\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sinsbl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskbl\\s" 1 } } */ +/* { dg-final { scan-assembler-not "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stba-bwa.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stba-bwa.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +typedef union + { + int i; + char c; + } +char_a; + +void +stba (char_a *p, char v) +{ + p->c = v; +} + +/* Expect assembly such as: + + and $17,0xff,$17 +$L2: + ldl_l $1,0($16) + bic $1,255,$1 + bis $17,$1,$1 + stl_c $1,0($16) + beq $1,$L2 + + without any INSBL or MSKBL instructions and without address masking. */ + +/* { dg-final { scan-assembler-times "\\sldl_l\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstl_c\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sand\\s\\\$\[0-9\]+,0xff,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sbic\\s\\\$\[0-9\]+,255,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-not "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" } } */ +/* { dg-final { scan-assembler-not "\\s(?:insbl|mskbl)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stba-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stba-bwx.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +typedef union + { + int i; + char c; + } +char_a; + +void +stba (char_a *p, char v) +{ + p->c = v; +} + +/* Expect assembly such as: + + stb $17,0($16) + */ + +/* { dg-final { scan-assembler-times "\\sstb\\s\\\$17,0\\\(\\\$16\\\)\\s" 1 } } */ Index: 
gcc/gcc/testsuite/gcc.target/alpha/stba.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stba.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -mno-safe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +typedef union + { + int i; + char c; + } +char_a; + +void +stba (char_a *p, char v) +{ + p->c = v; +} + +/* Expect assembly such as: + + and $17,0xff,$17 + ldl $1,0($16) + bic $1,255,$1 + bis $17,$1,$17 + stl $17,0($16) + + without any INSBL or MSKBL instructions and without address masking. */ + +/* { dg-final { scan-assembler-times "\\sldl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sand\\s\\\$\[0-9\]+,0xff,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sbic\\s\\\$\[0-9\]+,255,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-not "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" } } */ +/* { dg-final { scan-assembler-not "\\s(?:insbl|mskbl)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stw-bwa.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stw-bwa.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void +stw (short *p, short v) +{ + *p = v; +} + +/* Expect assembly such as: + + bic $16,7,$2 + inswl $17,$16,$17 +$L2: + ldq_l $1,0($2) + mskwl $1,$16,$1 + bis $17,$1,$1 + stq_c $1,0($2) + beq $1,$L2 + + with address masking. 
*/ + +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sinswl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskwl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" 1 } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stw-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stw-bwx.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void +stw (short *p, short v) +{ + *p = v; +} + +/* Expect assembly such as: + + stw $17,0($16) + */ + +/* { dg-final { scan-assembler-times "\\sstw\\s\\\$17,0\\\(\\\$16\\\)\\s" 1 } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stw.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stw.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -mno-safe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +void +stw (short *p, short v) +{ + *p = v; +} + +/* Expect assembly such as: + + inswl $17,$16,$17 + ldq_u $1,0($16) + mskwl $1,$16,$1 + bis $17,$1,$17 + stq_u $17,0($16) + + without address masking. 
*/ + +/* { dg-final { scan-assembler-times "\\sldq_u\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_u\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sinswl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskwl\\s" 1 } } */ +/* { dg-final { scan-assembler-not "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stwa-bwa.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stwa-bwa.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +typedef union + { + int i; + short c; + } +short_a; + +void +stwa (short_a *p, short v) +{ + p->c = v; +} + +/* Expect assembly such as: + + zapnot $17,3,$17 +$L2: + ldl_l $1,0($16) + zapnot $1,252,$1 + bis $17,$1,$1 + stl_c $1,0($16) + beq $1,$L2 + + without any INSWL or MSKWL instructions and without address masking. */ + +/* { dg-final { scan-assembler-times "\\sldl_l\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstl_c\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\szapnot\\s\\\$\[0-9\]+,3,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\szapnot\\s\\\$\[0-9\]+,252,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-not "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" } } */ +/* { dg-final { scan-assembler-not "\\s(?:inswl|mskwl)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stwa-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stwa-bwx.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +typedef union + { + int i; + short c; + } +short_a; + +void +stwa (short_a *p, short v) +{ + p->c = v; +} + +/* Expect assembly such as: + + stw $17,0($16) + */ + +/* { dg-final { scan-assembler-times "\\sstw\\s\\\$17,0\\\(\\\$16\\\)\\s" 1 } } */ Index: 
gcc/gcc/testsuite/gcc.target/alpha/stwa.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stwa.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -mno-safe-bwa" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +typedef union + { + int i; + short c; + } +short_a; + +void +stwa (short_a *p, short v) +{ + p->c = v; +} + +/* Expect assembly such as: + + zapnot $17,3,$17 + ldl $1,0($16) + zapnot $1,252,$1 + bis $17,$1,$17 + stl $17,0($16) + + without any INSWL or MSKWL instructions and without address masking. */ + +/* { dg-final { scan-assembler-times "\\sldl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\szapnot\\s\\\$\[0-9\]+,3,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\szapnot\\s\\\$\[0-9\]+,252,\\\$\[0-9\]+\\s" 1 } } */ +/* { dg-final { scan-assembler-not "\\sbic\\s\\\$\[0-9\]+,7,\\\$\[0-9\]+\\s" } } */ +/* { dg-final { scan-assembler-not "\\s(?:inswl|mskwl)\\s" } } */ From patchwork Mon Jan 6 13:03:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. 
Rozycki" X-Patchwork-Id: 104173 Date: Mon, 6 Jan 2025 13:03:52 +0000 (GMT) From: "Maciej W. 
Rozycki" To: Richard Henderson, gcc-patches@gcc.gnu.org cc: Arnd Bergmann, John Paul Adrian Glaubitz, Richard Henderson, "Paul E. McKenney", Linus Torvalds Subject: [PATCH v2 7/7] Alpha: Add option to avoid data races for partial writes [PR117759]

As with data races for 8-bit byte and 16-bit word memory writes on non-BWX Alpha implementations, we have the same problem even on BWX implementations with partial memory writes produced for unaligned stores as well as for block memory move and clear operations. This happens at the boundaries of the area written, where we produce unprotected RMW sequences such as:

	ldbu $1,0($3)
	stw $31,8($3)
	stq $1,0($3)

to zero a 9-byte member at byte offset 1 of a quadword-aligned struct, happily clobbering a 1-byte member at the beginning of said struct if a concurrent write happens while executing on the same CPU, such as in a signal handler, or if a parallel write happens while executing on another CPU, such as in another thread or via a shared memory segment.
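The lost update described above can be modelled on any host with a short, single-threaded C sketch. Everything in it is an assumption made for illustration and none of it is part of the patch: the `struct s' layout, the `set_flag' callback standing in for the concurrent writer, and the two helper functions. The unsafe variant mirrors the LDBU/STW/STQ sequence, while the safe variant only models the BWX-style approach of touching no byte outside the member being cleared.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 1-byte member followed by a 9-byte member at
   byte offset 1 of a quadword-aligned struct.  */
struct s
{
  _Alignas (8) uint8_t flag;
  uint8_t buf[9];
};

typedef void (*writer_fn) (struct s *);

/* Stand-in for a signal handler or another thread updating FLAG.  */
static void
set_flag (struct s *p)
{
  p->flag = 1;
}

/* Model of the unprotected sequence.  INTERLEAVED runs between the
   load of the neighbouring byte and the final quadword store.  */
static void
clear_buf_unsafe (struct s *p, writer_fn interleaved)
{
  uint8_t q[8] = { p->flag };		/* ldbu $1,0($3) */
  uint8_t h[2] = { 0, 0 };

  memcpy ((uint8_t *) p + 8, h, sizeof h);	/* stw $31,8($3) */
  if (interleaved)
    interleaved (p);
  memcpy (p, q, sizeof q);		/* stq $1,0($3): stale FLAG */
}

/* Model of a safe variant that only ever writes bytes of BUF.  */
static void
clear_buf_safe (struct s *p, writer_fn interleaved)
{
  for (int i = 0; i < 9; i++)
    {
      if (i == 4 && interleaved)
	interleaved (p);
      p->buf[i] = 0;
    }
}
```

Calling `clear_buf_unsafe (&x, set_flag)' leaves `x.flag' at its old value because the quadword store writes back the byte fetched before the update, whereas `clear_buf_safe (&x, set_flag)' preserves the update.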
To guard against these data races with partial memory write accesses, introduce the `-msafe-partial' command-line option, which instructs the compiler to protect boundaries of the data quantity accessed by instead using a longer code sequence composed of narrower memory writes where suitable machine instructions are available (i.e. with BWX targets) or atomic RMW access sequences where byte and word memory access machine instructions are not available (i.e. with non-BWX targets).

To avoid branches, there are redundant overlapping writes in unaligned cases where STQ_U operations are used in the middle of a block, so as to make sure no part of the data to be written is lost regardless of run-time alignment. For the non-BWX case this means that with blocks whose size is not a multiple of 8 there are additional atomic RMW sequences issued towards the end of the block, in addition to the always required pair enclosing the block from each end.

Only one such additional atomic RMW sequence is actually required, but the code currently issues two for the sake of simplicity. An improvement might be added to `alpha_expand_unaligned_store_words_safe_partial' in the future, by folding in `alpha_expand_unaligned_store_safe_partial' code for handling multi-word blocks whose size is not a multiple of 8 (i.e. with a trailing partial-word part). It would improve performance a bit, but the current code is correct regardless.

Update test cases with `-mno-safe-partial' where required and add new ones accordingly.
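As an illustration of the atomic RMW access sequences mentioned above for non-BWX targets, here is a minimal portable sketch in C11. It is not the code GCC emits: it uses a compare-and-swap retry loop from <stdatomic.h> as a stand-in for the LDQ_L/STQ_C pair, and the function name `store_byte_atomic' and its parameters are hypothetical. The mask and insert steps play the roles that MSKBL and INSBL have in the generated assembly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Store byte VAL at byte index IDX (0 to 7, counted from the least
   significant byte as on little-endian Alpha) of the quadword *WORD,
   leaving all the other bytes as found at the time of the update.  */
static void
store_byte_atomic (_Atomic uint64_t *word, unsigned int idx, uint8_t val)
{
  uint64_t mask = (uint64_t) 0xff << (idx * 8);	/* Bytes MSKBL clears.  */
  uint64_t ins = (uint64_t) val << (idx * 8);	/* What INSBL produces.  */
  uint64_t old = atomic_load_explicit (word, memory_order_relaxed);
  uint64_t upd;

  /* Retry until no other writer has modified the quadword between the
     read and the write; OLD is refreshed on each failed attempt.  */
  do
    upd = (old & ~mask) | ins;
  while (!atomic_compare_exchange_weak_explicit (word, &old, upd,
						 memory_order_relaxed,
						 memory_order_relaxed));
}
```

The retry loop corresponds to the `beq $1,$L2' branch back to the load-locked instruction: a neighbouring byte changed by another writer in the meantime is picked up on the next iteration rather than overwritten.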
There are notable regressions between a plain `-mno-bwx' configuration and a `-mno-bwx -msafe-partial' one:

FAIL: gm2/iso/run/pass/strcons.mod execution, -g
FAIL: gm2/iso/run/pass/strcons.mod execution, -O
FAIL: gm2/iso/run/pass/strcons.mod execution, -O -g
FAIL: gm2/iso/run/pass/strcons.mod execution, -Os
FAIL: gm2/iso/run/pass/strcons.mod execution, -O3 -fomit-frame-pointer
FAIL: gm2/iso/run/pass/strcons.mod execution, -O3 -fomit-frame-pointer -finline-functions
FAIL: gm2/iso/run/pass/strcons4.mod execution, -g
FAIL: gm2/iso/run/pass/strcons4.mod execution, -O
FAIL: gm2/iso/run/pass/strcons4.mod execution, -O -g
FAIL: gm2/iso/run/pass/strcons4.mod execution, -Os
FAIL: gm2/iso/run/pass/strcons4.mod execution, -O3 -fomit-frame-pointer
FAIL: gm2/iso/run/pass/strcons4.mod execution, -O3 -fomit-frame-pointer -finline-functions

Just as with `-msafe-bwa' regressions they come from the fact that these test cases end up calling code that expects a reference to aligned data but is handed one to unaligned data, causing an alignment exception with LDL_L or LDQ_L, which will eventually be fixed up by Linux.

In some cases GCC chooses to open-code block memory write operations, so with non-BWX targets `-msafe-partial' will in the usual case have to be used together with `-msafe-bwa'.

Credit to Magnus Lindholm for sharing hardware for the purpose of verifying the BWX side of this change.

gcc/
	PR target/117759
	* config/alpha/alpha-protos.h
	(alpha_expand_unaligned_store_safe_partial): New prototype.
	* config/alpha/alpha.cc (alpha_expand_movmisalign)
	(alpha_expand_block_move, alpha_expand_block_clear): Handle
	TARGET_SAFE_PARTIAL.
	(alpha_expand_unaligned_store_safe_partial)
	(alpha_expand_unaligned_store_words_safe_partial)
	(alpha_expand_clear_safe_partial_nobwx): New functions.
	* config/alpha/alpha.md (insvmisaligndi): Handle
	TARGET_SAFE_PARTIAL.
	* config/alpha/alpha.opt (msafe-partial): New option.
	* config/alpha/alpha.opt.urls: Regenerate.
	* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
	the new option.

gcc/testsuite/
	PR target/117759
	* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add `-mno-safe-partial'.
	* gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
	* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
	* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New file.
	* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c: New file.
	* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
	* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New file.
	* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c: New file.
	* gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
	* gcc.target/alpha/stlx0-safe-partial.c: New file.
	* gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
	* gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
	* gcc.target/alpha/stqx0-safe-partial.c: New file.
	* gcc.target/alpha/stqx0-safe-partial-bwx.c: New file.
	* gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'.
	* gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'. Refer to
	stwx0.c rather than copying its code and also verify no LDQ_U or
	STQ_U instructions have been produced.
	* gcc.target/alpha/stwx0-safe-partial.c: New file.
	* gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
---
 Verifying with the `alphaev56-linux-gnu' target revealed a bunch of regressions with test cases where I forgot to add `-mno-safe-partial'. I took the opportunity to add complementary tests to cover the `-msafe-partial' case too.

 NB from my limited experience with Modula 2 decades ago I thought the language was strongly-typed, so an alignment mismatch I guess shouldn't happen. But perhaps I've been wrong; corrections are welcome.

 NB2 as expected the atomic RMW sequences have a noticeable influence on the system's performance. Regression testing completes in ~19h30m for `-mno-bwx' and 23h15m for `-mno-bwx -msafe-bwa -msafe-partial'. But correctness has to take priority over performance.
Changes from v1:

- Add a reference to PR target/117759.

- Add `-mno-safe-partial' to memclr-a2-o1-c9-ptr.c, stlx0.c, stwx0.c, and stwx0-bwx.c tests.

- Make stwx0-bwx.c a bit stricter and also verify no LDQ_U or STQ_U instructions have been produced and include stwx0.c rather than copying its code.

- Add memclr-a2-o1-c9-ptr-safe-partial.c, stlx0-safe-partial.c, stlx0-safe-partial-bwx.c, stqx0-safe-partial.c, stqx0-safe-partial-bwx.c, stwx0-safe-partial.c, and stwx0-safe-partial-bwx.c tests.

- Update the change description accordingly.
---
 gcc/config/alpha/alpha-protos.h | 3
 gcc/config/alpha/alpha.cc | 616 +++++++++-
 gcc/config/alpha/alpha.md | 12
 gcc/config/alpha/alpha.opt | 4
 gcc/config/alpha/alpha.opt.urls | 3
 gcc/doc/invoke.texi | 12
 gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c | 22
 gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c | 2
 gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c | 13
 gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c | 12
 gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c | 2
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c | 13
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c | 12
 gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c | 2
 gcc/testsuite/gcc.target/alpha/stlx0-safe-partial-bwx.c | 17
 gcc/testsuite/gcc.target/alpha/stlx0-safe-partial.c | 29
 gcc/testsuite/gcc.target/alpha/stlx0.c | 2
 gcc/testsuite/gcc.target/alpha/stqx0-safe-partial-bwx.c | 21
 gcc/testsuite/gcc.target/alpha/stqx0-safe-partial.c | 29
 gcc/testsuite/gcc.target/alpha/stqx0.c | 2
 gcc/testsuite/gcc.target/alpha/stwx0-bwx.c | 14
 gcc/testsuite/gcc.target/alpha/stwx0-safe-partial-bwx.c | 15
 gcc/testsuite/gcc.target/alpha/stwx0-safe-partial.c | 29
 gcc/testsuite/gcc.target/alpha/stwx0.c | 2
 24 files changed, 837 insertions(+), 51 deletions(-)

gcc-alpha-safe-partial.diff
Index: gcc/gcc/config/alpha/alpha-protos.h
=================================================================== --- gcc.orig/gcc/config/alpha/alpha-protos.h +++ gcc/gcc/config/alpha/alpha-protos.h @@ -54,6 +54,9 @@ extern void alpha_expand_unaligned_load HOST_WIDE_INT, int); extern void alpha_expand_unaligned_store (rtx, rtx, HOST_WIDE_INT, HOST_WIDE_INT); +extern void alpha_expand_unaligned_store_safe_partial (rtx, rtx, HOST_WIDE_INT, + HOST_WIDE_INT, + HOST_WIDE_INT); extern int alpha_expand_block_move (rtx []); extern int alpha_expand_block_clear (rtx []); extern rtx alpha_expand_zap_mask (HOST_WIDE_INT); Index: gcc/gcc/config/alpha/alpha.cc =================================================================== --- gcc.orig/gcc/config/alpha/alpha.cc +++ gcc/gcc/config/alpha/alpha.cc @@ -2481,7 +2481,11 @@ alpha_expand_movmisalign (machine_mode m { if (!reg_or_0_operand (operands[1], mode)) operands[1] = force_reg (mode, operands[1]); - alpha_expand_unaligned_store (operands[0], operands[1], 8, 0); + if (TARGET_SAFE_PARTIAL) + alpha_expand_unaligned_store_safe_partial (operands[0], operands[1], + 8, 0, BITS_PER_UNIT); + else + alpha_expand_unaligned_store (operands[0], operands[1], 8, 0); } else gcc_unreachable (); @@ -3673,6 +3677,310 @@ alpha_expand_unaligned_store (rtx dst, r emit_move_insn (meml, dstl); } +/* Store data SRC of size SIZE using unaligned methods to location + referred by base DST plus offset OFS and of alignment ALIGN. This is + a multi-thread and async-signal safe implementation for all sizes from + 8 down to 1. + + For BWX targets it is straightforward, we just write data piecemeal, + taking any advantage of the alignment known and observing that we + shouldn't have been called for alignments of 32 or above in the first + place (though adding support for that would be easy). + + For non-BWX targets we need to load data from memory, mask it such as + to keep any part outside the area written, insert data to be stored, + and write the result back atomically. 
For sizes that are not a power + of 2 there are no byte mask or insert machine instructions available + so the mask required has to be built by hand, however ZAP and ZAPNOT + instructions can then be used to apply the mask. Since LL/SC loops + are used, the high and low parts have to be disentangled from each + other and handled sequentially except for size 1 where there is only + the low part to be written. */ + +void +alpha_expand_unaligned_store_safe_partial (rtx dst, rtx src, + HOST_WIDE_INT size, + HOST_WIDE_INT ofs, + HOST_WIDE_INT align) +{ + if (TARGET_BWX) + { + machine_mode mode = align >= 2 * BITS_PER_UNIT ? HImode : QImode; + HOST_WIDE_INT step = mode == HImode ? 2 : 1; + + while (1) + { + rtx dstl = src == const0_rtx ? const0_rtx : gen_lowpart (mode, src); + rtx meml = adjust_address (dst, mode, ofs); + emit_move_insn (meml, dstl); + + ofs += step; + size -= step; + if (size == 0) + return; + + if (size < step) + { + mode = QImode; + step = 1; + } + + if (src != const0_rtx) + src = expand_simple_binop (DImode, LSHIFTRT, src, + GEN_INT (step * BITS_PER_UNIT), + NULL, 1, OPTAB_WIDEN); + } + } + + rtx dsta = XEXP (dst, 0); + if (GET_CODE (dsta) == LO_SUM) + dsta = force_reg (Pmode, dsta); + + rtx addr = copy_addr_to_reg (plus_constant (Pmode, dsta, ofs)); + + rtx byte_mask = NULL_RTX; + switch (size) + { + case 3: + case 5: + case 6: + case 7: + /* If size is not a power of 2 we need to build the byte mask from + size by hand. This is SIZE consecutive bits starting from bit 0. */ + byte_mask = force_reg (DImode, GEN_INT (~(HOST_WIDE_INT_M1U << size))); + + /* Unlike with machine INSxx and MSKxx operations there is no + implicit mask applied to addr with corresponding operations + made by hand, so extract the byte index now. */ + emit_insn (gen_rtx_SET (addr, + gen_rtx_AND (DImode, addr, GEN_INT (~-8)))); + } + + /* Must handle high before low for degenerate case of aligned. 
*/ + if (size != 1) + { + rtx addrh = gen_reg_rtx (DImode); + rtx aligned_addrh = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (addrh, + plus_constant (DImode, dsta, ofs + size - 1))); + emit_insn (gen_rtx_SET (aligned_addrh, + gen_rtx_AND (DImode, addrh, GEN_INT (-8)))); + + /* AND addresses cannot be in any alias set, since they may implicitly + alias surrounding code. Ideally we'd have some alias set that + covered all types except those with alignment 8 or higher. */ + rtx memh = change_address (dst, DImode, aligned_addrh); + set_mem_alias_set (memh, 0); + + rtx insh = gen_reg_rtx (DImode); + rtx maskh = NULL_RTX; + switch (size) + { + case 1: + case 2: + case 4: + case 8: + if (src != CONST0_RTX (GET_MODE (src))) + emit_insn (gen_insxh (insh, gen_lowpart (DImode, src), + GEN_INT (size * 8), addr)); + break; + case 3: + case 5: + case 6: + case 7: + { + /* For the high part we shift the byte mask right by 8 minus + the byte index in addr, so we need an extra calculation. */ + rtx shamt = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (shamt, + gen_rtx_MINUS (DImode, + force_reg (DImode, + GEN_INT (8)), + addr))); + + maskh = gen_reg_rtx (DImode); + rtx shift = gen_rtx_LSHIFTRT (DImode, byte_mask, shamt); + emit_insn (gen_rtx_SET (maskh, shift)); + + /* Insert any bytes required by hand, by doing a byte-wise + shift on SRC right by the same number and then zap the + bytes outside the byte mask. 
*/ + if (src != CONST0_RTX (GET_MODE (src))) + { + rtx byte_loc = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (byte_loc, + gen_rtx_ASHIFT (DImode, + shamt, GEN_INT (3)))); + rtx bytes = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (bytes, + gen_rtx_LSHIFTRT (DImode, + gen_lowpart (DImode, + src), + byte_loc))); + + rtx zapmask = gen_rtx_NOT (QImode, + gen_rtx_SUBREG (QImode, maskh, 0)); + rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask), + UNSPEC_ZAP); + emit_insn (gen_rtx_SET (insh, + gen_rtx_AND (DImode, zap, bytes))); + } + } + break; + default: + gcc_unreachable (); + } + + rtx labelh = gen_rtx_LABEL_REF (DImode, gen_label_rtx ()); + emit_label (XEXP (labelh, 0)); + + rtx dsth = gen_reg_rtx (DImode); + emit_insn (gen_load_locked (DImode, dsth, memh)); + + switch (size) + { + case 1: + case 2: + case 4: + case 8: + emit_insn (gen_mskxh (dsth, dsth, GEN_INT (size * 8), addr)); + break; + case 3: + case 5: + case 6: + case 7: + { + rtx zapmask = gen_rtx_SUBREG (QImode, maskh, 0); + rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask), + UNSPEC_ZAP); + emit_insn (gen_rtx_SET (dsth, gen_rtx_AND (DImode, zap, dsth))); + } + break; + default: + gcc_unreachable (); + } + + if (src != CONST0_RTX (GET_MODE (src))) + dsth = expand_simple_binop (DImode, IOR, insh, dsth, dsth, 0, + OPTAB_WIDEN); + + emit_insn (gen_store_conditional (DImode, dsth, memh, dsth)); + + alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, dsth, const0_rtx), labelh); + } + + /* Now handle low. */ + rtx addrl = gen_reg_rtx (DImode); + rtx aligned_addrl = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (addrl, plus_constant (DImode, dsta, ofs))); + emit_insn (gen_rtx_SET (aligned_addrl, + gen_rtx_AND (DImode, addrl, GEN_INT (-8)))); + + /* AND addresses cannot be in any alias set, since they may implicitly + alias surrounding code. Ideally we'd have some alias set that + covered all types except those with alignment 8 or higher. 
*/ + rtx meml = change_address (dst, DImode, aligned_addrl); + set_mem_alias_set (meml, 0); + + rtx insl = gen_reg_rtx (DImode); + rtx maskl; + switch (size) + { + case 1: + if (src != CONST0_RTX (GET_MODE (src))) + emit_insn (gen_insbl (insl, gen_lowpart (QImode, src), addr)); + break; + case 2: + if (src != CONST0_RTX (GET_MODE (src))) + emit_insn (gen_inswl (insl, gen_lowpart (HImode, src), addr)); + break; + case 4: + if (src != CONST0_RTX (GET_MODE (src))) + emit_insn (gen_insll (insl, gen_lowpart (SImode, src), addr)); + break; + case 8: + if (src != CONST0_RTX (GET_MODE (src))) + emit_insn (gen_insql (insl, gen_lowpart (DImode, src), addr)); + break; + case 3: + case 5: + case 6: + case 7: + /* For the low part we shift the byte mask left by the byte index, + which is already in ADDR. */ + maskl = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (maskl, + gen_rtx_ASHIFT (DImode, byte_mask, addr))); + + /* Insert any bytes required by hand, by doing a byte-wise shift + on SRC left by the same number and then zap the bytes outside + the byte mask. 
*/ + if (src != CONST0_RTX (GET_MODE (src))) + { + rtx byte_loc = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (byte_loc, + gen_rtx_ASHIFT (DImode, + force_reg (DImode, addr), + GEN_INT (3)))); + rtx bytes = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (bytes, + gen_rtx_ASHIFT (DImode, + gen_lowpart (DImode, src), + byte_loc))); + + rtx zapmask = gen_rtx_NOT (QImode, + gen_rtx_SUBREG (QImode, maskl, 0)); + rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask), + UNSPEC_ZAP); + emit_insn (gen_rtx_SET (insl, gen_rtx_AND (DImode, zap, bytes))); + } + break; + default: + gcc_unreachable (); + } + + rtx labell = gen_rtx_LABEL_REF (DImode, gen_label_rtx ()); + emit_label (XEXP (labell, 0)); + + rtx dstl = gen_reg_rtx (DImode); + emit_insn (gen_load_locked (DImode, dstl, meml)); + + switch (size) + { + case 1: + emit_insn (gen_mskbl (dstl, dstl, addr)); + break; + case 2: + emit_insn (gen_mskwl (dstl, dstl, addr)); + break; + case 4: + emit_insn (gen_mskll (dstl, dstl, addr)); + break; + case 8: + emit_insn (gen_mskql (dstl, dstl, addr)); + break; + case 3: + case 5: + case 6: + case 7: + { + rtx zapmask = gen_rtx_SUBREG (QImode, maskl, 0); + rtx zap = gen_rtx_UNSPEC (DImode, gen_rtvec (1, zapmask), UNSPEC_ZAP); + emit_insn (gen_rtx_SET (dstl, gen_rtx_AND (DImode, zap, dstl))); + } + break; + default: + gcc_unreachable (); + } + + if (src != CONST0_RTX (GET_MODE (src))) + dstl = expand_simple_binop (DImode, IOR, insl, dstl, dstl, 0, OPTAB_WIDEN); + + emit_insn (gen_store_conditional (DImode, dstl, meml, dstl)); + + alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, dstl, const0_rtx), labell); +} + /* The block move code tries to maximize speed by separating loads and stores at the expense of register pressure: we load all of the data before we store it back out. There are two secondary effects worth @@ -3838,6 +4146,117 @@ alpha_expand_unaligned_store_words (rtx emit_move_insn (st_addr_1, st_tmp_1); } +/* Store an integral number of consecutive unaligned quadwords. 
DATA_REGS + may be NULL to store zeros. This is a multi-thread and async-signal + safe implementation. */ + +static void +alpha_expand_unaligned_store_words_safe_partial (rtx *data_regs, rtx dmem, + HOST_WIDE_INT words, + HOST_WIDE_INT ofs, + HOST_WIDE_INT align) +{ + rtx const im8 = GEN_INT (-8); + rtx ins_tmps[MAX_MOVE_WORDS]; + HOST_WIDE_INT i; + + /* Generate all the tmp registers we need. */ + for (i = 0; i < words; i++) + ins_tmps[i] = data_regs != NULL ? gen_reg_rtx (DImode) : const0_rtx; + + if (ofs != 0) + dmem = adjust_address (dmem, GET_MODE (dmem), ofs); + + /* For BWX store the ends before we start fiddling with data registers + to fill the middle. Also if we have no more than two quadwords, + then obviously we're done. */ + if (TARGET_BWX) + { + rtx datan = data_regs ? data_regs[words - 1] : const0_rtx; + rtx data0 = data_regs ? data_regs[0] : const0_rtx; + HOST_WIDE_INT e = (words - 1) * 8; + + alpha_expand_unaligned_store_safe_partial (dmem, data0, 8, 0, align); + alpha_expand_unaligned_store_safe_partial (dmem, datan, 8, e, align); + if (words <= 2) + return; + } + + rtx dmema = XEXP (dmem, 0); + if (GET_CODE (dmema) == LO_SUM) + dmema = force_reg (Pmode, dmema); + + /* Shift the input data into place. 
*/ + rtx dreg = copy_addr_to_reg (dmema); + if (data_regs != NULL) + { + for (i = words - 1; i >= 0; i--) + { + emit_insn (gen_insqh (ins_tmps[i], data_regs[i], dreg)); + emit_insn (gen_insql (data_regs[i], data_regs[i], dreg)); + } + for (i = words - 1; i > 0; i--) + ins_tmps[i - 1] = expand_simple_binop (DImode, IOR, data_regs[i], + ins_tmps[i - 1], + ins_tmps[i - 1], + 1, OPTAB_DIRECT); + } + + if (!TARGET_BWX) + { + rtx temp = gen_reg_rtx (DImode); + rtx mem = gen_rtx_MEM (DImode, + expand_simple_binop (Pmode, AND, dreg, im8, + NULL_RTX, 1, OPTAB_DIRECT)); + + rtx label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ()); + emit_label (XEXP (label, 0)); + + emit_insn (gen_load_locked (DImode, temp, mem)); + emit_insn (gen_mskql (temp, temp, dreg)); + if (data_regs != NULL) + temp = expand_simple_binop (DImode, IOR, temp, data_regs[0], + temp, 1, OPTAB_DIRECT); + emit_insn (gen_store_conditional (DImode, temp, mem, temp)); + + alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, temp, const0_rtx), label); + } + + for (i = words - 1; i > 0; --i) + { + rtx temp = change_address (dmem, Pmode, + gen_rtx_AND (Pmode, + plus_constant (Pmode, + dmema, i * 8), + im8)); + set_mem_alias_set (temp, 0); + emit_move_insn (temp, ins_tmps[i - 1]); + } + + if (!TARGET_BWX) + { + rtx temp = gen_reg_rtx (DImode); + rtx addr = expand_simple_binop (Pmode, PLUS, dreg, + GEN_INT (words * 8 - 1), + NULL_RTX, 1, OPTAB_DIRECT); + rtx mem = gen_rtx_MEM (DImode, + expand_simple_binop (Pmode, AND, addr, im8, + NULL_RTX, 1, OPTAB_DIRECT)); + + rtx label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ()); + emit_label (XEXP (label, 0)); + + emit_insn (gen_load_locked (DImode, temp, mem)); + emit_insn (gen_mskqh (temp, temp, dreg)); + if (data_regs != NULL) + temp = expand_simple_binop (DImode, IOR, temp, ins_tmps[words - 1], + temp, 1, OPTAB_DIRECT); + emit_insn (gen_store_conditional (DImode, temp, mem, temp)); + + alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, temp, const0_rtx), label); + } +} + /* Get 
the base alignment and offset of EXPR in A and O respectively. Check for any pseudo register pointer alignment and for any tree node information and return the largest alignment determined and @@ -4147,26 +4566,74 @@ alpha_expand_block_move (rtx operands[]) if (GET_MODE (data_regs[i + words]) != DImode) break; - if (words == 1) - alpha_expand_unaligned_store (orig_dst, data_regs[i], 8, ofs); + if (TARGET_SAFE_PARTIAL) + { + if (words == 1) + alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[i], + 8, ofs, dst_align); + else + alpha_expand_unaligned_store_words_safe_partial (data_regs + i, + orig_dst, words, + ofs, dst_align); + } else - alpha_expand_unaligned_store_words (data_regs + i, orig_dst, - words, ofs); - + { + if (words == 1) + alpha_expand_unaligned_store (orig_dst, data_regs[i], 8, ofs); + else + alpha_expand_unaligned_store_words (data_regs + i, orig_dst, + words, ofs); + } i += words; ofs += words * 8; } - /* Due to the above, this won't be aligned. */ + /* If we are in the partial memory access safety mode with a non-BWX + target, then coalesce data loaded of different widths so as to + minimize the number of safe partial stores as they are expensive. */ + if (!TARGET_BWX && TARGET_SAFE_PARTIAL) + { + HOST_WIDE_INT size = 0; + unsigned int n; + + for (n = i; i < nregs; i++) + { + if (i != n) + { + /* Don't widen SImode data where obtained by extraction. 
*/ + rtx data = data_regs[n]; + if (GET_MODE (data) == SImode && src_align < 32) + data = gen_rtx_SUBREG (DImode, data, 0); + rtx field = expand_simple_binop (DImode, ASHIFT, data_regs[i], + GEN_INT (size * BITS_PER_UNIT), + NULL_RTX, 1, OPTAB_DIRECT); + data_regs[n] = expand_simple_binop (DImode, IOR, data, field, + data, 1, OPTAB_WIDEN); + } + size += GET_MODE_SIZE (GET_MODE (data_regs[i])); + gcc_assert (size < 8); + } + if (size > 0) + alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[n], + size, ofs, dst_align); + ofs += size; + } + + /* We've done aligned stores above, this won't be aligned. */ while (i < nregs && GET_MODE (data_regs[i]) == SImode) { - alpha_expand_unaligned_store (orig_dst, data_regs[i], 4, ofs); + gcc_assert (TARGET_BWX || !TARGET_SAFE_PARTIAL); + if (TARGET_SAFE_PARTIAL) + alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[i], + 4, ofs, dst_align); + else + alpha_expand_unaligned_store (orig_dst, data_regs[i], 4, ofs); ofs += 4; i++; gcc_assert (i == nregs || GET_MODE (data_regs[i]) != SImode); } - if (dst_align >= 16) + if (TARGET_BWX && dst_align >= 16) while (i < nregs && GET_MODE (data_regs[i]) == HImode) { emit_move_insn (adjust_address (orig_dst, HImode, ofs), data_regs[i]); @@ -4176,7 +4643,12 @@ alpha_expand_block_move (rtx operands[]) else while (i < nregs && GET_MODE (data_regs[i]) == HImode) { - alpha_expand_unaligned_store (orig_dst, data_regs[i], 2, ofs); + gcc_assert (TARGET_BWX || !TARGET_SAFE_PARTIAL); + if (TARGET_SAFE_PARTIAL) + alpha_expand_unaligned_store_safe_partial (orig_dst, data_regs[i], + 2, ofs, dst_align); + else + alpha_expand_unaligned_store (orig_dst, data_regs[i], 2, ofs); i++; ofs += 2; } @@ -4185,6 +4657,7 @@ alpha_expand_block_move (rtx operands[]) while (i < nregs) { gcc_assert (GET_MODE (data_regs[i]) == QImode); + gcc_assert (TARGET_BWX || !TARGET_SAFE_PARTIAL); emit_move_insn (adjust_address (orig_dst, QImode, ofs), data_regs[i]); i++; ofs += 1; @@ -4193,6 +4666,27 @@ 
alpha_expand_block_move (rtx operands[]) return 1; } +/* Expand a multi-thread and async-signal safe partial clear of a longword + or a quadword quantity indicated by MODE at aligned memory location MEM + according to MASK. */ + +static void +alpha_expand_clear_safe_partial_nobwx (rtx mem, machine_mode mode, + HOST_WIDE_INT mask) +{ + rtx label = gen_rtx_LABEL_REF (DImode, gen_label_rtx ()); + emit_label (XEXP (label, 0)); + + rtx temp = gen_reg_rtx (mode); + rtx status = mode == DImode ? temp : gen_rtx_SUBREG (DImode, temp, 0); + + emit_insn (gen_load_locked (mode, temp, mem)); + emit_insn (gen_rtx_SET (temp, gen_rtx_AND (mode, temp, GEN_INT (mask)))); + emit_insn (gen_store_conditional (mode, status, mem, temp)); + + alpha_emit_unlikely_jump (gen_rtx_EQ (DImode, status, const0_rtx), label); +} + int alpha_expand_block_clear (rtx operands[]) { @@ -4237,8 +4731,9 @@ alpha_expand_block_clear (rtx operands[] { /* Given that alignofs is bounded by align, the only time BWX could generate three stores is for a 7 byte fill. Prefer two individual - stores over a load/mask/store sequence. */ - if ((!TARGET_BWX || alignofs == 7) + stores over a load/mask/store sequence. In the partial safety + mode always do individual stores regardless of their count. 
*/ + if ((!TARGET_BWX || (!TARGET_SAFE_PARTIAL && alignofs == 7)) && align >= 32 && !(alignofs == 4 && bytes >= 4)) { @@ -4264,10 +4759,15 @@ alpha_expand_block_clear (rtx operands[] } alignofs = 0; - tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask), - NULL_RTX, 1, OPTAB_WIDEN); + if (TARGET_SAFE_PARTIAL) + alpha_expand_clear_safe_partial_nobwx (mem, mode, mask); + else + { + tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask), + NULL_RTX, 1, OPTAB_WIDEN); - emit_move_insn (mem, tmp); + emit_move_insn (mem, tmp); + } } if (TARGET_BWX && (alignofs & 1) && bytes >= 1) @@ -4372,7 +4872,11 @@ alpha_expand_block_clear (rtx operands[] { words = bytes / 8; - alpha_expand_unaligned_store_words (NULL, orig_dst, words, ofs); + if (TARGET_SAFE_PARTIAL) + alpha_expand_unaligned_store_words_safe_partial (NULL, orig_dst, + words, ofs, align); + else + alpha_expand_unaligned_store_words (NULL, orig_dst, words, ofs); bytes -= words * 8; ofs += words * 8; @@ -4389,7 +4893,7 @@ alpha_expand_block_clear (rtx operands[] /* If we have appropriate alignment (and it wouldn't take too many instructions otherwise), mask out the bytes we need. */ - if ((TARGET_BWX ? words > 2 : bytes > 0) + if ((TARGET_BWX ? !TARGET_SAFE_PARTIAL && words > 2 : bytes > 0) && (align >= 64 || (align >= 32 && bytes < 4))) { machine_mode mode = (align >= 64 ? 
DImode : SImode); @@ -4401,18 +4905,46 @@ alpha_expand_block_clear (rtx operands[] mask = HOST_WIDE_INT_M1U << (bytes * 8); - tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask), - NULL_RTX, 1, OPTAB_WIDEN); + if (TARGET_SAFE_PARTIAL) + alpha_expand_clear_safe_partial_nobwx (mem, mode, mask); + else + { + tmp = expand_binop (mode, and_optab, mem, GEN_INT (mask), + NULL_RTX, 1, OPTAB_WIDEN); - emit_move_insn (mem, tmp); + emit_move_insn (mem, tmp); + } return 1; } - if (!TARGET_BWX && bytes >= 4) + if (bytes >= 4) { - alpha_expand_unaligned_store (orig_dst, const0_rtx, 4, ofs); - bytes -= 4; - ofs += 4; + if (align >= 32) + do + { + emit_move_insn (adjust_address (orig_dst, SImode, ofs), + const0_rtx); + bytes -= 4; + ofs += 4; + } + while (bytes >= 4); + else if (!TARGET_BWX) + { + gcc_assert (bytes < 8); + if (TARGET_SAFE_PARTIAL) + { + alpha_expand_unaligned_store_safe_partial (orig_dst, const0_rtx, + bytes, ofs, align); + ofs += bytes; + bytes = 0; + } + else + { + alpha_expand_unaligned_store (orig_dst, const0_rtx, 4, ofs); + bytes -= 4; + ofs += 4; + } + } } if (bytes >= 2) @@ -4428,18 +4960,38 @@ alpha_expand_block_clear (rtx operands[] } else if (! 
TARGET_BWX) { - alpha_expand_unaligned_store (orig_dst, const0_rtx, 2, ofs); - bytes -= 2; - ofs += 2; + gcc_assert (bytes < 4); + if (TARGET_SAFE_PARTIAL) + { + alpha_expand_unaligned_store_safe_partial (orig_dst, const0_rtx, + bytes, ofs, align); + ofs += bytes; + bytes = 0; + } + else + { + alpha_expand_unaligned_store (orig_dst, const0_rtx, 2, ofs); + bytes -= 2; + ofs += 2; + } } } while (bytes > 0) - { - emit_move_insn (adjust_address (orig_dst, QImode, ofs), const0_rtx); - bytes -= 1; - ofs += 1; - } + if (TARGET_BWX || !TARGET_SAFE_PARTIAL) + { + emit_move_insn (adjust_address (orig_dst, QImode, ofs), const0_rtx); + bytes -= 1; + ofs += 1; + } + else + { + gcc_assert (bytes < 2); + alpha_expand_unaligned_store_safe_partial (orig_dst, const0_rtx, + bytes, ofs, align); + ofs += bytes; + bytes = 0; + } return 1; } Index: gcc/gcc/config/alpha/alpha.md =================================================================== --- gcc.orig/gcc/config/alpha/alpha.md +++ gcc/gcc/config/alpha/alpha.md @@ -4781,9 +4781,15 @@ && INTVAL (operands[1]) != 64)) FAIL; - alpha_expand_unaligned_store (operands[0], operands[3], - INTVAL (operands[1]) / 8, - INTVAL (operands[2]) / 8); + if (TARGET_SAFE_PARTIAL) + alpha_expand_unaligned_store_safe_partial (operands[0], operands[3], + INTVAL (operands[1]) / 8, + INTVAL (operands[2]) / 8, + BITS_PER_UNIT); + else + alpha_expand_unaligned_store (operands[0], operands[3], + INTVAL (operands[1]) / 8, + INTVAL (operands[2]) / 8); DONE; }) Index: gcc/gcc/config/alpha/alpha.opt =================================================================== --- gcc.orig/gcc/config/alpha/alpha.opt +++ gcc/gcc/config/alpha/alpha.opt @@ -73,6 +73,10 @@ msafe-bwa Target Mask(SAFE_BWA) Emit multi-thread and async-signal safe code for byte and word memory accesses. +msafe-partial +Target Mask(SAFE_PARTIAL) +Emit multi-thread and async-signal safe code for partial memory accesses. 
+ mexplicit-relocs Target Mask(EXPLICIT_RELOCS) Emit code using explicit relocation directives. Index: gcc/gcc/config/alpha/alpha.opt.urls =================================================================== --- gcc.orig/gcc/config/alpha/alpha.opt.urls +++ gcc/gcc/config/alpha/alpha.opt.urls @@ -38,6 +38,9 @@ UrlSuffix(gcc/DEC-Alpha-Options.html#ind msafe-bwa UrlSuffix(gcc/DEC-Alpha-Options.html#index-msafe-bwa) +msafe-partial +UrlSuffix(gcc/DEC-Alpha-Options.html#index-msafe-partial) + mexplicit-relocs UrlSuffix(gcc/DEC-Alpha-Options.html#index-mexplicit-relocs) Index: gcc/gcc/doc/invoke.texi =================================================================== --- gcc.orig/gcc/doc/invoke.texi +++ gcc/gcc/doc/invoke.texi @@ -976,7 +976,7 @@ Objective-C and Objective-C++ Dialects}. -mtrap-precision=@var{mode} -mbuild-constants -mcpu=@var{cpu-type} -mtune=@var{cpu-type} -mbwx -mmax -mfix -mcix --msafe-bwa +-msafe-bwa -msafe-partial -mfloat-vax -mfloat-ieee -mexplicit-relocs -msmall-data -mlarge-data -msmall-text -mlarge-text @@ -25700,6 +25700,16 @@ Indicate whether in the absence of the o GCC should generate multi-thread and async-signal safe code for byte and aligned word memory accesses. +@opindex msafe-partial +@opindex mno-safe-partial +@item -msafe-partial +@itemx -mno-safe-partial +Indicate whether GCC should generate multi-thread and async-signal +safe code for partial memory accesses, including piecemeal accesses +to unaligned data as well as block accesses to leading and trailing +parts of aggregate types or other objects in memory that do not +respectively start and end on an aligned 64-bit data boundary. 
+ @opindex mfloat-vax @opindex mfloat-ieee @item -mfloat-vax Index: gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "memclr-a2-o1-c9-ptr.c" + +/* Expect assembly such as: + + stb $31,1($16) + stw $31,2($16) + stw $31,4($16) + stw $31,6($16) + stw $31,8($16) + + that is with a byte store at offset 1, followed by word stores at + offsets 2, 4, 6, and 8. */ + +/* { dg-final { scan-assembler-times "\\sstb\\s\\\$31,1\\\(\\\$16\\\)\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,2\\\(\\\$16\\\)\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,4\\\(\\\$16\\\)\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,6\\\(\\\$16\\\)\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstw\\s\\\$31,8\\\(\\\$16\\\)\\s" 1 } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c +++ gcc/gcc/testsuite/gcc.target/alpha/memclr-a2-o1-c9-ptr.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-mbwx" } */ +/* { dg-options "-mbwx -mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ typedef unsigned int __attribute__ ((mode (QI))) int08_t; Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-msafe-partial -mbwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include 
"memcpy-di-unaligned-dst.c" + +/* { dg-final { scan-assembler-times "\\sldq\\s" 7 } } */ +/* { dg-final { scan-assembler-times "\\sstb\\s" 16 } } */ +/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */ +/* { dg-final { scan-assembler-not "\\sldq_l\\s" } } */ +/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */ +/* { dg-final { scan-assembler-not "\\sstq\\s" } } */ +/* { dg-final { scan-assembler-not "\\sstq_c\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-msafe-partial -mno-bwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "memcpy-di-unaligned-dst.c" + +/* { dg-final { scan-assembler-times "\\sldq\\s" 7 } } */ +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */ +/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */ +/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */ +/* { dg-final { scan-assembler-not "\\sstq\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c +++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-di-unaligned-dst.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "" } */ +/* { dg-options "-mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ unsigned long unaligned_src_di[9] = { [0 ... 
8] = 0xfefdfcfbfaf9f8f7 }; Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-msafe-partial -mbwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "memcpy-si-unaligned-dst.c" + +/* { dg-final { scan-assembler-times "\\sldl\\s" 15 } } */ +/* { dg-final { scan-assembler-times "\\sstb\\s" 20 } } */ +/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */ +/* { dg-final { scan-assembler-not "\\sldq_l\\s" } } */ +/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */ +/* { dg-final { scan-assembler-not "\\sstl\\s" } } */ +/* { dg-final { scan-assembler-not "\\sstq_c\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-msafe-partial -mno-bwx" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "memcpy-si-unaligned-dst.c" + +/* { dg-final { scan-assembler-times "\\sldl\\s" 15 } } */ +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 4 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 4 } } */ +/* { dg-final { scan-assembler-times "\\sstq_u\\s" 6 } } */ +/* { dg-final { scan-assembler-not "\\sldq_u\\s" } } */ +/* { dg-final { scan-assembler-not "\\sstl\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c +++ gcc/gcc/testsuite/gcc.target/alpha/memcpy-si-unaligned-dst.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "" } */ +/* { dg-options 
"-mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ unsigned int unaligned_src_si[17] = { [0 ... 16] = 0xfefdfcfb }; Index: gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial-bwx.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "stlx0.c" + +/* Expect assembly such as: + + stb $31,0($16) + stb $31,1($16) + stb $31,2($16) + stb $31,3($16) + + without any LDQ_U or STQ_U instructions. */ + +/* { dg-final { scan-assembler-times "\\sstb\\s" 4 } } */ +/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stlx0-safe-partial.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "stlx0.c" + +/* Expect assembly such as: + + lda $2,3($16) + bic $2,7,$2 +$L2: + ldq_l $1,0($2) + msklh $1,$16,$1 + stq_c $1,0($2) + beq $1,$L2 + bic $16,7,$2 +$L3: + ldq_l $1,0($2) + mskll $1,$16,$1 + stq_c $1,0($2) + beq $1,$L3 + + without any INSLH, INSLL, BIS, LDQ_U, or STQ_U instructions. 
*/ + +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */ +/* { dg-final { scan-assembler-times "\\smsklh\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskll\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */ +/* { dg-final { scan-assembler-not "\\s(?:bis|inslh|insll|ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stlx0.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/stlx0.c +++ gcc/gcc/testsuite/gcc.target/alpha/stlx0.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "" } */ +/* { dg-options "-mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ typedef struct { int v __attribute__ ((packed)); } intx; Index: gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial-bwx.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "stqx0.c" + +/* Expect assembly such as: + + stb $31,0($16) + stb $31,1($16) + stb $31,2($16) + stb $31,3($16) + stb $31,4($16) + stb $31,5($16) + stb $31,6($16) + stb $31,7($16) + + without any LDQ_U or STQ_U instructions. 
*/ + +/* { dg-final { scan-assembler-times "\\sstb\\s" 8 } } */ +/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stqx0-safe-partial.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "stqx0.c" + +/* Expect assembly such as: + + lda $2,7($16) + bic $2,7,$2 +$L2: + ldq_l $1,0($2) + mskqh $1,$16,$1 + stq_c $1,0($2) + beq $1,$L2 + bic $16,7,$2 +$L3: + ldq_l $1,0($2) + mskql $1,$16,$1 + stq_c $1,0($2) + beq $1,$L3 + + without any INSQH, INSQL, BIS, LDQ_U, or STQ_U instructions. */ + +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */ +/* { dg-final { scan-assembler-times "\\smskqh\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskql\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */ +/* { dg-final { scan-assembler-not "\\s(?:bis|insqh|insql|ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stqx0.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/stqx0.c +++ gcc/gcc/testsuite/gcc.target/alpha/stqx0.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "" } */ +/* { dg-options "-mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ typedef struct { long v __attribute__ ((packed)); } longx; Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0-bwx.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/stwx0-bwx.c +++ gcc/gcc/testsuite/gcc.target/alpha/stwx0-bwx.c @@ -1,19 +1,15 @@ /* { dg-do compile } */ -/* { dg-options "-mbwx" } */ +/* { dg-options "-mbwx -mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ -typedef struct { short v __attribute__ ((packed)); } shortx; - -void
-stwx0 (shortx *p) -{ - p->v = 0; -} +#include "stwx0.c" /* Expect assembly such as: stb $31,0($16) stb $31,1($16) - */ + + without any LDQ_U or STQ_U instructions. */ /* { dg-final { scan-assembler-times "\\sstb\\s\\\$31," 2 } } */ +/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial-bwx.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial-bwx.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mbwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "stwx0.c" + +/* Expect assembly such as: + + stb $31,0($16) + stb $31,1($16) + + without any LDQ_U or STQ_U instructions. */ + +/* { dg-final { scan-assembler-times "\\sstb\\s\\\$31," 2 } } */ +/* { dg-final { scan-assembler-not "\\s(?:ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/alpha/stwx0-safe-partial.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-mno-bwx -msafe-partial" } */ +/* { dg-skip-if "" { *-*-* } { "-O0" } } */ + +#include "stwx0.c" + +/* Expect assembly such as: + + lda $2,1($16) + bic $2,7,$2 +$L2: + ldq_l $1,0($2) + mskwh $1,$16,$1 + stq_c $1,0($2) + beq $1,$L2 + bic $16,7,$2 +$L3: + ldq_l $1,0($2) + mskwl $1,$16,$1 + stq_c $1,0($2) + beq $1,$L3 + + without any INSWH, INSWL, BIS, LDQ_U, or STQ_U instructions. 
*/ + +/* { dg-final { scan-assembler-times "\\sldq_l\\s" 2 } } */ +/* { dg-final { scan-assembler-times "\\smskwh\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\smskwl\\s" 1 } } */ +/* { dg-final { scan-assembler-times "\\sstq_c\\s" 2 } } */ +/* { dg-final { scan-assembler-not "\\s(?:bis|inswh|inswl|ldq_u|stq_u)\\s" } } */ Index: gcc/gcc/testsuite/gcc.target/alpha/stwx0.c =================================================================== --- gcc.orig/gcc/testsuite/gcc.target/alpha/stwx0.c +++ gcc/gcc/testsuite/gcc.target/alpha/stwx0.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-mno-bwx" } */ +/* { dg-options "-mno-bwx -mno-safe-partial" } */ /* { dg-skip-if "" { *-*-* } { "-O0" } } */ typedef struct { short v __attribute__ ((packed)); } shortx;
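
For readers unfamiliar with the non-BWX sequences the tests above expect, here is a rough C11 model of what an LDQ_L/MSKxx/STQ_C/BEQ loop achieves for a partial clear. It is only a sketch and not code from the patch: `clear_bytes_safe' is a hypothetical helper, and a compare-and-swap retry loop stands in for the load-locked/store-conditional pair. `ofs' is assumed to be the byte offset within an aligned quadword and `size' the number of bytes to clear, with `ofs + size <= 8'.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: clear SIZE bytes starting
   at byte offset OFS within the aligned quadword *Q, leaving all the
   remaining bytes intact.  Assumes OFS + SIZE <= 8.  */
static void
clear_bytes_safe (_Atomic uint64_t *q, unsigned int ofs, unsigned int size)
{
  /* Bit mask covering SIZE bytes, then moved to the byte offset.  */
  uint64_t field = size >= 8 ? ~UINT64_C (0)
			     : (UINT64_C (1) << (size * 8)) - 1;
  uint64_t keep = ~(field << (ofs * 8));
  uint64_t old = atomic_load_explicit (q, memory_order_relaxed);

  /* Retry until nothing else has modified the quadword between the load
     and the store; on failure OLD is refreshed with the current value.
     This models the role of the LDQ_L/STQ_C/BEQ loop.  */
  while (!atomic_compare_exchange_weak_explicit (q, &old, old & keep,
						 memory_order_relaxed,
						 memory_order_relaxed))
    ;
}
```

Clearing 2 bytes at offset 1 of a quadword holding 0xfefdfcfbfaf9f8f7 this way leaves 0xfefdfcfbfa0000f7. A plain load/mask/store of the same quadword would produce the same value in isolation, but could silently undo a concurrent update made to the neighbouring bytes by another thread or a signal handler, which is the hazard `-msafe-partial' is documented to avoid.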