[RFC,0/6] .text.subsections for some questionable benefit

Message ID 20230515144815.3939017-1-bugaevc@gmail.com

Message

Sergey Bugaev May 15, 2023, 2:48 p.m. UTC
  Hello,

this patch series is the continuation of the __COLD patchset, and the
result of me looking into how GCC places some code into the
.text.xxxxxx subsections instead of the regular .text. Namely, as far
as I was able to understand, GCC does the following:

1. Functions marked with __attribute__ ((cold)) are (among other
   effects) placed into .text.unlikely;
2. Similarly, functions marked with __attribute__ ((hot)) get placed
   into .text.hot;
3. ELF constructors and main () are placed into .text.startup;
4. ELF destructors are placed into .text.exit.

When using profile-guided optimization, GCC may be able to make
decisions about this differently based on the profile data, but those
are the static rules.
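The four static rules above can be sketched in C as follows. This is a
minimal illustration, not code from the series: every function and
variable name is made up, and the section named in each comment is what
GCC is expected to choose on an ELF target at -O2, not something the
code itself enforces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag, set by the constructor so callers can see it ran.  */
int tables_ready;

/* Rule 1: cold functions are expected to land in .text.unlikely.  */
__attribute__ ((cold, noinline)) void
report_failure (const char *what)
{
  fprintf (stderr, "fatal: %s\n", what);
}

/* Rule 2: hot functions are expected to land in .text.hot.  */
__attribute__ ((hot, noinline)) unsigned int
byte_sum (const unsigned char *p, size_t n)
{
  unsigned int sum = 0;

  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    sum += p[i];
  return sum;
}

/* Rule 3: ELF constructors are expected to land in .text.startup.  */
__attribute__ ((constructor)) static void
init_tables (void)
{
  tables_ready = 1;
}

/* Rule 4: ELF destructors are expected to land in .text.exit.  */
__attribute__ ((destructor)) static void
release_tables (void)
{
  tables_ready = 0;
}
```

Compiling this with "gcc -O2 -c" and listing the section headers of the
object file with "objdump -h" should show which subsections were
actually emitted; the result may differ with other compilers or
optimization levels.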

The default linker script (ld --verbose) contains the following
stanza for constructing the .text of the final executable/library:

  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(SORT(.text.sorted.*))
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf.em.  */
    *(.gnu.warning)
  }

So: the contents of .text.{unlikely,hot,startup,exit} of the linked
object files are grouped together during linking, but all end up
inside the final binary's .text.

Since GCC does not intrinsically know about glibc specifics, it makes
some sense to try and help it with finding startup- and exit-only
code. Hence, __TEXT_STARTUP and __TEXT_EXIT macros.
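The actual definitions of __TEXT_STARTUP and __TEXT_EXIT are in the
patches, not in this message. As a rough sketch of how such macros
could be written, assuming GCC's section attribute on an ELF target
(the EXAMPLE_* names and the functions below are hypothetical stand-ins):

```c
#include <assert.h>

/* Hypothetical stand-ins for the series' macros; the real definitions
   may differ.  Each one asks the compiler to emit the annotated
   function into the named .text subsection.  */
#define EXAMPLE_TEXT_STARTUP __attribute__ ((section (".text.startup")))
#define EXAMPLE_TEXT_EXIT    __attribute__ ((section (".text.exit")))

static int initialized;

/* Startup-only code, grouped with the other .text.startup input.  */
EXAMPLE_TEXT_STARTUP void
example_init (void)
{
  initialized = 1;
}

/* Exit-only code, grouped with the other .text.exit input.  */
EXAMPLE_TEXT_EXIT void
example_fini (void)
{
  /* Tearing down only makes sense after example_init has run.  */
  assert (initialized);
  initialized = 0;
}

/* Ordinary code, left in the regular .text.  */
int
example_is_initialized (void)
{
  return initialized;
}
```

The annotation only changes which input section the function is emitted
into; the default linker script shown above then does the grouping.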

The supposed benefit of this is cache locality. As I understand it,
it's two-sided. For instance, talking about .text.exit:

1. During normal runtime (when not exiting yet), the .text.exit
   functions don't "get in the way", i.e. don't take up the precious
   place in the caches.
2. During exit, the code to be run (a large part of it anyway) is
   located in mostly the same place, and now it _is_, rightfully,
   taking up the cache space, and making full use of it.

The same applies to .text.startup. And depending on how lucky you are,
your system may not need to page in .text.unlikely at all -- if
nothing on the system abort ()s or error ()s out.

That's the idea anyway.

I have checked that indeed, the various startup, exit, and cold
functions are all neatly grouped together with this patchset. What I
have not done is run any benchmarks (what would be the relevant
benchmarks to run?), so I can't tell whether this provides any
noticeable benefit.

But having spent countless hours over the last few weeks single-
stepping through x86_64 Hurd startup in QEMU, I can confidently say
that during libc startup, it page faults on missing code pages way too
often. This is normally invisible to the program and to the debugger,
but very visible when you're debugging the whole system.

One more thing: the Linux kernel has a somewhat similar thing with
__init and __exit macros, which place the annotated function into
.init.text and .exit.text. They then do further tricks with this, such
as (potentially?) unmapping the pages containing .init.text after startup
is completed. The SerenityOS Kernel similarly has UNMAP_AFTER_INIT
(and READONLY_AFTER_INIT, which is like attribute_relro).

This patchset *does not* introduce anything like that. It only does
grouping (and even that is done by GCC/ld), not any unmapping. It is
still 100% safe to call any __TEXT_STARTUP function after startup
(such as if a function has been mistakenly marked __TEXT_STARTUP, or
only normally used during startup, but may also be called later in
some exceptional / rare cases).
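To make the "grouping only" point concrete, here is a small sketch,
again with a hypothetical EXAMPLE_TEXT_STARTUP macro and made-up
function names, assuming GCC on an ELF target. The annotated function
runs once from a constructor and is then called again later; since
nothing is unmapped, the late call behaves like any other call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the series' __TEXT_STARTUP macro.  */
#define EXAMPLE_TEXT_STARTUP __attribute__ ((section (".text.startup")))

/* Value parsed during startup.  */
int startup_value;

/* Normally used only during startup, hence the annotation.  Parses a
   run of leading decimal digits.  */
EXAMPLE_TEXT_STARTUP int
parse_tunable (const char *s)
{
  int v = 0;

  assert (s != NULL);
  while (*s >= '0' && *s <= '9')
    v = v * 10 + (*s++ - '0');
  return v;
}

/* The usual caller: runs before main ().  */
__attribute__ ((constructor)) static void
read_startup_config (void)
{
  startup_value = parse_tunable ("42");
}

/* The rare late caller: still safe, because the code stays mapped.  */
int
reload_config (const char *s)
{
  return parse_tunable (s);
}
```

This is the difference from the kernel's __init: there, a late call
into freed or unmapped init text would be a bug, whereas here it only
touches a page that is less likely to be hot.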

Now to the downsides:
1. This adds __TEXT_STARTUP annotations all over the place,
   particularly in elf/. So much code churn for some questionable and
   frankly theoretical benefit.
2. Even worse, this modifies assembly code! -- on all architectures.
   These include architectures I have not even *heard* of, and cannot
   cross-compile for or test on. Surely I should not be allowed
   anywhere near writing assembly code for them!

   Counterpoint: I'm not altering the actual assembly code, I'm only
   really changing ".text" to ".section .text.startup", what could
   possibly go wrong?

Sergey
  

Comments

Cristian Rodríguez May 15, 2023, 3:33 p.m. UTC | #1
On Mon, May 15, 2023 at 10:48 AM Sergey Bugaev via Libc-alpha <
libc-alpha@sourceware.org> wrote:

>
> One more thing: the Linux kernel has a somewhat similar thing with
> __init and __exit macros, which place the annotated function into
> .init.text and .exit.text. They then do further tricks with this, such
> as (potentially?) unmapping the pages containing .init.text after startup
> is completed. The SerenityOS Kernel similarly has UNMAP_AFTER_INIT
> (and READONLY_AFTER_INIT, which is like attribute_relro).
>
>
This, not only for functions but also for variables, is a sorely missed
compiler/linker extension that would be very useful to have. Many
programs, not just the kernel or libc, need single-use routines that
never ought to be called again, or ro_after_init variables.