From patchwork Tue Jun 28 15:27:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 55500
To: libc-alpha@sourceware.org
Subject: [PATCH v1 2/3] x86: Move and slightly improve memset_erms
Date: Tue, 28 Jun 2022 08:27:34 -0700
Message-Id: <20220628152735.17863-2-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220628152735.17863-1-goldstein.w.n@gmail.com>
References: <20220628152735.17863-1-goldstein.w.n@gmail.com>
List-Id: Libc-alpha mailing list
From: Noah Goldstein
Reply-To: Noah Goldstein

Implementation wise:
    1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not
       use the L(stosb) label that was previously defined.

    2. Don't give the hotpath (fallthrough) to zero size.

Code positioning wise:

Move L(memset_{chk}_erms) to its own file.  Leaving it in between the
memset_{impl}_unaligned both adds unnecessary complexity to the file
and wastes space in a relatively hot cache section.
---
 .../multiarch/memset-vec-unaligned-erms.S     | 54 ++++++++-----------
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index abc12d9cda..d98c613651 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -156,37 +156,6 @@ L(entry_from_wmemset):
 #if defined USE_MULTIARCH && IS_IN (libc)
 END (MEMSET_SYMBOL (__memset, unaligned))
 
-# if VEC_SIZE == 16
-ENTRY (__memset_chk_erms)
-	cmp	%RDX_LP, %RCX_LP
-	jb	HIDDEN_JUMPTARGET (__chk_fail)
-END (__memset_chk_erms)
-
-/* Only used to measure performance of REP STOSB.  */
-ENTRY (__memset_erms)
-	/* Skip zero length.  */
-	test	%RDX_LP, %RDX_LP
-	jnz	L(stosb)
-	movq	%rdi, %rax
-	ret
-# else
-/* Provide a hidden symbol to debugger.  */
-	.hidden	MEMSET_SYMBOL (__memset, erms)
-ENTRY (MEMSET_SYMBOL (__memset, erms))
-# endif
-L(stosb):
-	mov	%RDX_LP, %RCX_LP
-	movzbl	%sil, %eax
-	mov	%RDI_LP, %RDX_LP
-	rep stosb
-	mov	%RDX_LP, %RAX_LP
-	VZEROUPPER_RETURN
-# if VEC_SIZE == 16
-END (__memset_erms)
-# else
-END (MEMSET_SYMBOL (__memset, erms))
-# endif
-
 # if defined SHARED && IS_IN (libc)
 ENTRY_CHK (MEMSET_CHK_SYMBOL (__memset_chk, unaligned_erms))
 	cmp	%RDX_LP, %RCX_LP
@@ -461,3 +430,26 @@ L(between_2_3):
 #endif
 	ret
 END (MEMSET_SYMBOL (__memset, unaligned_erms))
+
+#if defined USE_MULTIARCH && IS_IN (libc) && VEC_SIZE == 16
+ENTRY (__memset_chk_erms)
+	cmp	%RDX_LP, %RCX_LP
+	jb	HIDDEN_JUMPTARGET (__chk_fail)
+END (__memset_chk_erms)
+
+/* Only used to measure performance of REP STOSB.  */
+ENTRY (__memset_erms)
+	/* Skip zero length.  */
+	test	%RDX_LP, %RDX_LP
+	jz	L(stosb_return_zero)
+	mov	%RDX_LP, %RCX_LP
+	movzbl	%sil, %eax
+	mov	%RDI_LP, %RDX_LP
+	rep stosb
+	mov	%RDX_LP, %RAX_LP
+	ret
+L(stosb_return_zero):
+	movq	%rdi, %rax
+	ret
+END (__memset_erms)
+#endif
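
For readers who find C easier to follow than the assembly, the control
flow of the relocated __memset_chk_erms / __memset_erms pair can be
sketched roughly as below.  This is an illustration only, not part of the
patch: the names sketch_memset_erms and sketch_memset_chk_erms are
hypothetical, abort () stands in for __chk_fail, the assert is an added
sanity check, and GCC or Clang is assumed for __builtin_expect and the
inline asm.  A byte loop is used on non-x86-64 targets so the sketch still
builds there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __memset_erms; not the glibc symbol.  */
void *
sketch_memset_erms (void *dst, int c, size_t n)
{
  /* Zero size is treated as the rare case: it takes the branch and the
     non-zero path is the fallthrough, mirroring `jz L(stosb_return_zero)`.  */
  if (__builtin_expect (n == 0, 0))
    return dst;

  assert (dst != NULL);

#if defined __x86_64__ && defined __GNUC__
  void *d = dst;
  /* REP STOSB stores AL to [RDI] RCX times and advances RDI / counts
     down RCX, so both are read-write operands on local copies.  */
  __asm__ volatile ("rep stosb"
		    : "+D" (d), "+c" (n)
		    : "a" (c)
		    : "memory");
#else
  /* Portable fallback for other targets (assumption for this sketch).  */
  unsigned char *p = dst;
  while (n-- != 0)
    *p++ = (unsigned char) c;
#endif

  /* No VZEROUPPER equivalent is needed: no vector registers are used.  */
  return dst;
}

/* Hypothetical stand-in for __memset_chk_erms: fail if the destination
   object is smaller than the requested length, else do the memset.  */
void *
sketch_memset_chk_erms (void *dst, int c, size_t n, size_t dstlen)
{
  if (dstlen < n)
    abort ();
  return sketch_memset_erms (dst, c, n);
}
```

In the assembly, __memset_chk_erms simply falls through into
__memset_erms after the size check; the C sketch expresses that as a call.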