[v2,0/2] Generic x86-64 CPU diagnostics dumper

Message ID cover.1701947019.git.fweimer@redhat.com

Message

Florian Weimer Dec. 7, 2023, 11:05 a.m. UTC
  This is an unchanged repost after a rebase.

I hope to get this integrated so that I can work on further CPU
compatibility diagnostics.  The intent behind those is to give
distributions better built-in tools to diagnose compatibility issues
after they incorporate CPU-specific optimizations.

Thanks,
Florian

Florian Weimer (2):
  elf: Wire up _dl_diagnostics_cpu_kernel
  x86: Add generic CPUID data dumper to ld.so --list-diagnostics

 elf/Makefile                                  |   1 +
 elf/dl-diagnostics-cpu-kernel.c               |  24 +
 elf/dl-diagnostics.c                          |   1 +
 elf/dl-diagnostics.h                          |   4 +
 manual/dynlink.texi                           |  86 +++-
 .../linux/x86/dl-diagnostics-cpu-kernel.c     | 457 ++++++++++++++++++
 6 files changed, 572 insertions(+), 1 deletion(-)
 create mode 100644 elf/dl-diagnostics-cpu-kernel.c
 create mode 100644 sysdeps/unix/sysv/linux/x86/dl-diagnostics-cpu-kernel.c


base-commit: 958478889c6a7a12b35b857b9788b7ad8706a01e
  

Comments

Adhemerval Zanella Dec. 7, 2023, 11:46 a.m. UTC | #1
On 07/12/23 08:05, Florian Weimer wrote:
> This is an unchanged repost after a rebase.
> 
> I hope to get this integrated so that I can work on further CPU
> compatibility diagnostics.  The intent behind those is to give
> distributions better built-in tools to diagnose compatibility issues
> after they incorporate CPU-specific optimizations.

What I would expect from ld.so --list-diagnostics is a dump of the
information relevant to ifunc and other optimization selection, with
some meaningful text on the data queried from the kernel/CPU and on
which function would be selected based on that information.

Instead, we are moving towards a generic tool that dumps a lot of 
information that is either not directly relevant or requires some 
other tool to post-process it. 

This makes me wonder if it would be better to create a cpuid 
dumper/parser as a different program instead.

  
Florian Weimer Dec. 7, 2023, 12:36 p.m. UTC | #2
* Adhemerval Zanella Netto:

> On 07/12/23 08:05, Florian Weimer wrote:
>> This is an unchanged repost after a rebase.
>> 
>> I hope to get this integrated so that I can work on further CPU
>> compatibility diagnostics.  The intent behind those is to give
>> distributions better built-in tools to diagnose compatibility issues
>> after they incorporate CPU-specific optimizations.
>
> What I would expect from ld.so --list-diagnostics is a dump of the
> information relevant to ifunc and other optimization selection, with
> some meaningful text on the data queried from the kernel/CPU and on
> which function would be selected based on that information.

We don't know what GCC (__builtin_cpu_supports etc.) bases its selection
on because it does not use the glibc interfaces.  Even if we restricted
this to other toolchain usage, we'd end up having to update it alongside
GCC changes that add more selection logic.  The generic dumper avoids
that.

Even inside glibc, we have logic that looks at CPUID data that is
currently not captured in the dumps.
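
To illustrate the GCC side of this point, here is a minimal sketch of an
application querying CPU features through the compiler builtins rather
than through any glibc interface.  The function name format_gcc_view and
the "gcc.*" output labels are hypothetical, chosen only for this example;
__builtin_cpu_init and __builtin_cpu_supports are the GCC builtins named
above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: record what the compiler's own CPU detection
   reports for two features.  Nothing here goes through glibc's
   cpu_features data, which is why a glibc-only dump cannot explain
   the result.  Returns the snprintf result.  */
int format_gcc_view(char *buf, size_t len)
{
  assert(buf != NULL && len > 0);
  int sse42 = 0, avx2 = 0;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  /* Required before __builtin_cpu_supports when called early;
     harmless otherwise.  */
  __builtin_cpu_init();
  sse42 = __builtin_cpu_supports("sse4.2") != 0;
  avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
  /* On non-x86 targets both values stay 0.  */
  return snprintf(buf, len, "gcc.sse4_2=%d gcc.avx2=%d", sse42, avx2);
}
```

A caller would pass a small buffer, for example
`char buf[64]; format_gcc_view(buf, sizeof buf); puts(buf);`.  A raw
CPUID dump lets someone reconstruct this answer after the fact without
the dumper having to track each feature name the compiler adds.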

> Instead, we are moving towards a generic tool that dumps a lot of 
> information that is either not directly relevant or requires some 
> other tool to post-process it. 
>
> This makes me wonder if it would be better to create a cpuid 
> dumper/parser as a different program instead.

Some people can read these dumps directly (so far, I don't), even the
limited, glibc-specific dumps we have today, which surprised me.

Architecture-specific tools already exist, sure.  The downside is that
every architecture needs a different tool, and we would need to educate
users and support staff to request the appropriate data for each
architecture.  I think in glibc, we are quite well-positioned to note
what's important to include for each architecture (because we have such
selection logic in glibc for the string functions) and provide an
architecture-agnostic interface to capture the data for later analysis
by someone with architecture-specific knowledge.

Thanks,
Florian
  
Adhemerval Zanella Dec. 7, 2023, 1:11 p.m. UTC | #3
On 07/12/23 09:36, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 07/12/23 08:05, Florian Weimer wrote:
>>> This is an unchanged repost after a rebase.
>>>
>>> I hope to get this integrated so that I can work on further CPU
>>> compatibility diagnostics.  The intent behind those is to give
>>> distributions better built-in tools to diagnose compatibility issues
>>> after they incorporate CPU-specific optimizations.
>>
>> What I would expect from ld.so --list-diagnostics is a dump of the
>> information relevant to ifunc and other optimization selection, with
>> some meaningful text on the data queried from the kernel/CPU and on
>> which function would be selected based on that information.
> 
> We don't know what GCC (__builtin_cpu_supports etc.) bases its selection
> on because it does not use the glibc interfaces.  Even if we restricted
> this to other toolchain usage, we'd end up having to update it alongside
> GCC changes that add more selection logic.  The generic dumper avoids
> that.
> 
> Even inside glibc, we have logic that looks at CPUID data that is
> currently not captured in the dumps.

But my understanding is that --list-diagnostics should be limited to
glibc's own selection; it does not make much sense to try to sync with
other interfaces (__builtin_cpu_supports) if they use a completely
different selection interface.

> 
>> Instead, we are moving towards a generic tool that dumps a lot of 
>> information that is either not directly relevant or requires some 
>> other tool to post-process it. 
>>
>> This makes me wonder if it would be better to create a cpuid 
>> dumper/parser as a different program instead.
> 
> Some people can read these dumps directly (so far, I don't), even the
> limited, glibc-specific dumps we have today, which surprised me.
> 
> Architecture-specific tools already exist, sure.  The downside is that
> every architecture needs a different tool, and we would need to educate
> users and support staff to request the appropriate data for each
> architecture.  I think in glibc, we are quite well-positioned to note
> what's important to include for each architecture (because we have such
> selection logic in glibc for the string functions) and provide an
> architecture-agnostic interface to capture the data for later analysis
> by someone with architecture-specific knowledge.

Right, but currently --list-diagnostics dumps around 100 lines related
to x86 on my system.  Although some are straightforward (like cache
size), some are really specific to implementation details, like
x86.cpu_features.features vs x86.cpu_features.preferred and the
description of the preferred fields.

Also, such information can be queried either by accessing cpuid itself
and/or through the x86.h-specific ABI.  My question is whether this is
really required to be in ld.so.
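
For reference, querying cpuid directly from a separate program could
look roughly like the sketch below.  It assumes the compiler-provided
<cpuid.h> header (GCC and Clang); the function name dump_basic_cpuid and
the "cpuid.leaf[...]" output format are hypothetical and are not the
format used by the patch.  Only subleaf 0 of each basic leaf is read, so
this is far less complete than a real dumper.

```c
#include <assert.h>
#include <stdio.h>
#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>
#endif

/* Print subleaf 0 of every basic CPUID leaf, one line per leaf.
   Returns the number of leaves printed (0 on non-x86 targets or if
   CPUID is not supported).  */
int dump_basic_cpuid(FILE *out)
{
  assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
  /* Highest supported basic leaf; 0 means CPUID is unavailable.  */
  unsigned int max_leaf = __get_cpuid_max(0, NULL);
  if (max_leaf == 0)
    return 0;

  int count = 0;
  for (unsigned int leaf = 0; leaf <= max_leaf; ++leaf)
    {
      unsigned int eax, ebx, ecx, edx;
      __cpuid_count(leaf, 0, eax, ebx, ecx, edx);
      fprintf(out,
              "cpuid.leaf[0x%x] eax=0x%08x ebx=0x%08x ecx=0x%08x edx=0x%08x\n",
              leaf, eax, ebx, ecx, edx);
      ++count;
    }
  return count;
#else
  return 0;
#endif
}
```

Calling `dump_basic_cpuid(stdout)` prints the registers for the CPU the
thread happens to be running on.  The code is x86-only by construction,
which is the trade-off under discussion: each architecture would need
its own equivalent.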
  
Florian Weimer Dec. 11, 2023, 9:06 a.m. UTC | #4
* Adhemerval Zanella Netto:

> On 07/12/23 09:36, Florian Weimer wrote:
>> * Adhemerval Zanella Netto:
>> 
>>> On 07/12/23 08:05, Florian Weimer wrote:
>>>> This is an unchanged repost after a rebase.
>>>>
>>>> I hope to get this integrated so that I can work on further CPU
>>>> compatibility diagnostics.  The intent behind those is to give
>>>> distributions better built-in tools to diagnose compatibility issues
>>>> after they incorporate CPU-specific optimizations.
>>>
>>> What I would expect from ld.so --list-diagnostics is a dump of the
>>> information relevant to ifunc and other optimization selection, with
>>> some meaningful text on the data queried from the kernel/CPU and on
>>> which function would be selected based on that information.
>> 
>> We don't know what GCC (__builtin_cpu_supports etc.) bases its selection
>> on because it does not use the glibc interfaces.  Even if we restricted
>> this to other toolchain usage, we'd end up having to update it alongside
>> GCC changes that add more selection logic.  The generic dumper avoids
>> that.
>> 
>> Even inside glibc, we have logic that looks at CPUID data that is
>> currently not captured in the dumps.
>
> But my understanding is that --list-diagnostics should be limited to
> glibc's own selection; it does not make much sense to try to sync with
> other interfaces (__builtin_cpu_supports) if they use a completely
> different selection interface.

I think we should consider the whole GNU toolchain, not just glibc.
Just as in other cases.

> Right, but currently --list-diagnostics dumps around 100 lines related
> to x86 on my system.  Although some are straightforward (like cache
> size), some are really specific to implementation details, like
> x86.cpu_features.features vs x86.cpu_features.preferred and the
> description of the preferred fields.

Yes, and those are required to diagnose issues with IFUNC selection and
other aspects of self-configuration.  Even so, the current dumps are
insufficient to trace how glibc comes up with the cache information for
sysconf.

They also do not cover asymmetric reporting across multiple cores.
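
As a small illustration of the sysconf side, the cache values that an
application sees can be read back as sketched below.  The _SC_LEVEL*
names are glibc extensions, so the sketch guards on their presence; the
helper names print_one and print_cache_info and the output labels are
hypothetical.  This only shows the final values; it says nothing about
which CPUID leaves glibc consulted to derive them, which is the gap
described above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Print one cache parameter.  Returns 1 if sysconf reported a
   positive value, 0 if the value is unknown.  */
static int print_one(FILE *out, const char *label, int name)
{
  long value = sysconf(name);
  if (value <= 0)
    {
      fprintf(out, "%s=unknown\n", label);
      return 0;
    }
  fprintf(out, "%s=%ld\n", label, value);
  return 1;
}

/* Print a few cache parameters as sysconf reports them.
   Returns how many of them had a known value.  */
int print_cache_info(FILE *out)
{
  assert(out != NULL);
  int known = 0;
#ifdef _SC_LEVEL1_DCACHE_SIZE
  known += print_one(out, "l1d.size", _SC_LEVEL1_DCACHE_SIZE);
  known += print_one(out, "l1d.linesize", _SC_LEVEL1_DCACHE_LINESIZE);
  known += print_one(out, "l2.size", _SC_LEVEL2_CACHE_SIZE);
  known += print_one(out, "l3.size", _SC_LEVEL3_CACHE_SIZE);
#endif
  return known;
}
```

Calling `print_cache_info(stdout)` on two machines and comparing the
output shows whether they differ, but not why.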

> Also, such information can be queried either by accessing cpuid itself
> and/or through the x86.h-specific ABI.  My question is whether this is
> really required to be in ld.so.

If we don't put it into a cross-architecture project, then we'll have to
teach everyone to use the appropriate tool for gathering such data on
every architecture.  This makes switching between architectures more
difficult.  I don't think we can get non-x86 CPU support into cpuid
<http://www.etallen.com/cpuid.html>.

Thanks,
Florian