From patchwork Mon May 15 14:48:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergey Bugaev
X-Patchwork-Id: 55764
To: libc-alpha@sourceware.org
Subject: [RFC PATCH 0/6] .text.subsections for some questionable benefit
Date: Mon, 15 May 2023 17:48:09 +0300
Message-Id: <20230515144815.3939017-1-bugaevc@gmail.com>
X-Mailer: git-send-email 2.40.1
From: Sergey Bugaev
Reply-To: Sergey Bugaev

Hello,

this patch series is the continuation of the __COLD patchset, and the
result of me looking into how GCC places some code into the
.text.xxxxxx subsections instead of the regular .text.
Namely, as far as I was able to understand, GCC does the following:

1. Functions marked with __attribute__ ((cold)) are (among other
   effects) placed into .text.unlikely;
2. Similarly, functions marked with __attribute__ ((hot)) get placed
   into .text.hot;
3. ELF constructors and main () are placed into .text.startup;
4. ELF destructors are placed into .text.exit.

When using profile-guided optimization, GCC may be able to make
decisions about this differently based on the profile data, but those
are the static rules.

The default linker script (ld --verbose) contains the following stanza
for constructing the .text of the final executable/library:

  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(SORT(.text.sorted.*))
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf.em.  */
    *(.gnu.warning)
  }

So: the contents of .text.{unlikely,hot,startup,exit} of the linked
object files are grouped together during linking, but all end up inside
the final binary's .text.

Since GCC does not intrinsically know about glibc specifics, it makes
some sense to try and help it with finding startup- and exit-only code.
Hence the __TEXT_STARTUP and __TEXT_EXIT macros.

The supposed benefit of this is cache locality. As I understand it,
it's two-sided. For instance, talking about .text.exit:

1. During normal runtime (when not exiting yet), the .text.exit
   functions don't "get in the way", i.e. don't take up the precious
   place in the caches.
2. During exit, the code to be run (a large part of it anyway) is
   located in mostly the same place, and now it _is_, rightfully,
   taking up the cache space, and making full use of it.

The same applies to .text.startup. And depending on how lucky you are,
your system may not need to page in .text.unlikely at all -- if nothing
on the system abort ()s or error ()s out.

That's the idea anyway.
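To make the static rules above concrete, here is a minimal sketch. It is
not taken from the patches: DEMO_TEXT_STARTUP and DEMO_TEXT_EXIT are
hypothetical stand-ins for __TEXT_STARTUP and __TEXT_EXIT (the real
definitions may well differ), all demo_* names are made up, and it
assumes GCC or a compatible compiler targeting ELF.

```c
/* Hypothetical stand-ins for the series' __TEXT_STARTUP / __TEXT_EXIT
   macros; the real definitions may differ.  Assumes GCC (or a
   compatible compiler) targeting ELF.  */
#define DEMO_TEXT_STARTUP __attribute__ ((section (".text.startup")))
#define DEMO_TEXT_EXIT    __attribute__ ((section (".text.exit")))

int demo_init_count;   /* how many times the startup helper ran */
int demo_exit_count;   /* how many times the exit helper ran */

/* Startup-only helper, explicitly placed next to other startup code.  */
DEMO_TEXT_STARTUP void
demo_init (void)
{
  demo_init_count++;
}

/* Exit-only helper, explicitly placed next to other exit code.  */
DEMO_TEXT_EXIT void
demo_fini (void)
{
  demo_exit_count++;
}

/* Rule 1: a cold function; GCC may place it into .text.unlikely.  */
__attribute__ ((cold)) int
demo_fail (int code)
{
  return -code;
}

/* Rule 2: a hot function; GCC may place it into .text.hot.  */
__attribute__ ((hot)) int
demo_sum (const int *values, int count)
{
  int total = 0;
  for (int i = 0; i < count; i++)
    total += values[i];
  return total;
}

/* Rule 3: an ELF constructor, run before main ().  */
__attribute__ ((constructor)) static void
demo_ctor (void)
{
  demo_init ();
}

/* Rule 4: an ELF destructor, run at exit.  */
__attribute__ ((destructor)) static void
demo_dtor (void)
{
  demo_fini ();
}
```

Compiling this with something like "gcc -O2 -c" and listing the object
file's sections with "objdump -h" should show where each function
landed. The explicit section attributes take effect regardless, while
the cold/hot/constructor placement depends on the compiler version and
optimization level.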
I have checked that indeed, the various startup, exit, and cold
functions are all neatly grouped together with this patchset. What I
have not done is run any benchmarks (what would be the relevant
benchmarks to run?), so I can't tell if this provides any noticeable
benefit. But having spent countless hours over the last few weeks
single-stepping through x86_64 Hurd startup in QEMU, I can confidently
say that during libc startup, it page faults on missing code pages way
too often. This is normally invisible to the program and to the
debugger, but very visible when you're debugging the whole system.

One more thing: the Linux kernel has a somewhat similar thing with
__init and __exit macros, which place the annotated function into
.init.text and .exit.text. They then do further tricks with this, such
as (potentially?) unmapping the pages containing .init.text after
startup is completed. The SerenityOS Kernel similarly has
UNMAP_AFTER_INIT (and READONLY_AFTER_INIT, which is like
attribute_relro).

This patchset *does not* introduce anything like that. It only does
grouping (and even that is done by GCC/ld), not any unmapping. It is
still 100% safe to call any __TEXT_STARTUP function after startup (such
as if a function has been mistakenly marked __TEXT_STARTUP, or is only
normally used during startup, but may also be called later in some
exceptional / rare cases).

Now to the downsides:

1. This adds __TEXT_STARTUP annotations all over the place,
   particularly in elf/. So much code churn for some questionable and
   frankly theoretical benefit.

2. Even worse, this modifies assembly code! -- on all architectures.
   These are the architectures I have not even *heard* of, and cannot
   cross-compile for or test on. Surely I should not be allowed
   anywhere near writing assembly code for them!

   Counterpoint: I'm not altering the actual assembly code, I'm only
   really changing ".text" to ".section .text.startup", what could
   possibly go wrong?

Sergey