[v2] i386: Allow -mlarge-data-threshold with -mcmodel=large

Message ID 20230525151632.3567825-1-maskray@google.com
State New

Commit Message

Fangrui Song May 25, 2023, 3:16 p.m. UTC
When using -mcmodel=medium, data objects larger than the
-mlarge-data-threshold threshold are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections.

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song <maskray@google.com>

---
Changes from v1 (https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
* Clarify commit message. Add link to https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
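For readers skimming the thread, the behavior change described in the commit message can be sketched roughly as below. This is only an illustration: the enum and the function name goes_in_large_section are hypothetical stand-ins, not GCC's internal API, and the sketch omits the other checks that the real ix86_in_large_data_p performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the code models discussed here; GCC's own
   enum (CM_MEDIUM, CM_LARGE, ...) also has PIC variants.  */
enum code_model { MODEL_SMALL, MODEL_MEDIUM, MODEL_LARGE };

/* Return true if an object of SIZE bytes would be placed in a large
   data section (.ldata/.lbss/.lrodata) under the patched behavior.  */
bool
goes_in_large_section (enum code_model model, size_t size, size_t threshold)
{
  /* Before the patch, only the medium model passed this check.  */
  if (model != MODEL_MEDIUM && model != MODEL_LARGE)
    return false;

  /* Strictly greater than, as in the patched comparisons.  */
  return size > threshold;
}
```

With a threshold of 65536, goes_in_large_section (MODEL_LARGE, 65537, 65536) is true, while the same call with MODEL_SMALL is false regardless of size.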
---
 gcc/config/i386/i386.cc                    | 15 +++++++++------
 gcc/config/i386/i386.opt                   |  2 +-
 gcc/doc/invoke.texi                        |  7 ++++---
 gcc/testsuite/gcc.target/i386/large-data.c | 13 +++++++++++++
 4 files changed, 27 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
  

Comments

Jan Beulich May 25, 2023, 3:28 p.m. UTC | #1
On 25.05.2023 17:16, Fangrui Song wrote:
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.
>  
>  @opindex mlarge-data-threshold
>  @item -mlarge-data-threshold=@var{threshold}
> -When @option{-mcmodel=medium} is specified, data objects larger than
> -@var{threshold} are placed in the large data section.  This value must be the
> -same across all objects linked into the binary, and defaults to 65535.
> +When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
> +objects larger than @var{threshold} are placed in large data sections.  This
> +value must be the same across all objects linked into the binary, and defaults
> +to 65535.

Where's the "must be the same" requirement coming from?

As to the default - to remain compatible with earlier versions, shouldn't
large model code default to "infinity"?

Jan
  
Fangrui Song May 25, 2023, 4:11 p.m. UTC | #2
On 2023-05-25, Jan Beulich wrote:
>On 25.05.2023 17:16, Fangrui Song wrote:
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.
>>
>>  @opindex mlarge-data-threshold
>>  @item -mlarge-data-threshold=@var{threshold}
>> -When @option{-mcmodel=medium} is specified, data objects larger than
>> -@var{threshold} are placed in the large data section.  This value must be the
>> -same across all objects linked into the binary, and defaults to 65535.
>> +When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
>> +objects larger than @var{threshold} are placed in large data sections.  This
>> +value must be the same across all objects linked into the binary, and defaults
>> +to 65535.
>
>Where's the "must be the same" requirement coming from?

It's an existing requirement.  I think it may be related to discouraging
different COMDAT section names due to different -mlarge-data-threshold=
values.  I don't think it makes sense, but I did not feel strongly about
dropping it.

Happy to drop the requirement if I revise this patch.

>As to the default - to remain compatible with earlier versions, shouldn't
>large model code default to "infinity"?
>
>Jan

I have thought about this compatibility need and feel that it is very
unlikely to be needed.  GNU ld has supported large data sections since
2005
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=3b22753a67cf616514de804ef6d5ed5e90a7d883).
Users' programs linked with the internal linker scripts will keep
working, and -fdata-sections sections will still be combined.

First, -mcmodel=large use cases are rare.  -mcmodel=large was perhaps
considered a theoretical exercise in trying to reach feature completion
(https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU/m/NNuA0P7pAQAJ);
without this patch, -mcmodel=large object files don't interact well with
existing -mcmodel=small object files.
Moreover, if a user expects a specific section prefix with
-mcmodel=large, that's a brittle assumption. I think it's fair to say
that the fault is on the user side and GCC doesn't need to work around
their issues.
  
Jan Beulich May 26, 2023, 7:11 a.m. UTC | #3
On 25.05.2023 18:11, Fangrui Song wrote:
> On 2023-05-25, Jan Beulich wrote:
>> On 25.05.2023 17:16, Fangrui Song wrote:
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.
>>>
>>>  @opindex mlarge-data-threshold
>>>  @item -mlarge-data-threshold=@var{threshold}
>>> -When @option{-mcmodel=medium} is specified, data objects larger than
>>> -@var{threshold} are placed in the large data section.  This value must be the
>>> -same across all objects linked into the binary, and defaults to 65535.
>>> +When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
>>> +objects larger than @var{threshold} are placed in large data sections.  This
>>> +value must be the same across all objects linked into the binary, and defaults
>>> +to 65535.
>>
>> Where's the "must be the same" requirement coming from?
> 
> It's an existing requirement.  I think it may be related to discouraging
> different COMDAT section names due to different -mlarge-data-threshold=
> values.  I don't think it makes sense, but I did not feel strongly about
> dropping it.
> 
> Happy to drop the requirement if I revise this patch.

I understand that this isn't something you introduce, but it still struck
me as odd. Therefore I thought I'd suggest to take the opportunity to at
least soften the language, unless of course there's a real reason behind
it.

>> As to the default - to remain compatible with earlier versions, shouldn't
>> large model code default to "infinity"?
>>
>> Jan
> 
> I have thought about this compatibility need and feel that it is very
> unlikely to be needed.  GNU ld has supported large data sections since
> 2005
> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=3b22753a67cf616514de804ef6d5ed5e90a7d883).
> Users' programs linked with the internal linker scripts will keep
> working, and -fdata-sections sections will still be combined.

Well, the concern clearly is about custom scripts. Imo ...

> First, -mcmodel=large use cases are rare.  -mcmodel=large was perhaps
> considered a theoretical exercise in trying to reach feature completion
> (https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU/m/NNuA0P7pAQAJ);
> without this patch, -mcmodel=large object files don't interact well with
> existing -mcmodel=small object files.

... the more exotic a project, the more likely it is that they're using
custom scripts.

> Moreover, if a user expects a specific section prefix with
> -mcmodel=large, that's a brittle assumption. I think it's fair to say
> that the fault is on the user side and GCC doesn't need to work around
> their issues.

I guess I don't really see what you base this on. Without any special
options, expecting data to end up in .data/.bss/.rodata (and variants
thereof) looks like quite reasonable an assumption to me.

Jan
  
Fangrui Song May 26, 2023, 6:50 p.m. UTC | #4
On Fri, May 26, 2023 at 12:11 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 25.05.2023 18:11, Fangrui Song wrote:
> > On 2023-05-25, Jan Beulich wrote:
> >> On 25.05.2023 17:16, Fangrui Song wrote:
> >>> --- a/gcc/doc/invoke.texi
> >>> +++ b/gcc/doc/invoke.texi
> >>> @@ -32942,9 +32942,10 @@ the cache line size.  @samp{compat} is the default.
> >>>
> >>>  @opindex mlarge-data-threshold
> >>>  @item -mlarge-data-threshold=@var{threshold}
> >>> -When @option{-mcmodel=medium} is specified, data objects larger than
> >>> -@var{threshold} are placed in the large data section.  This value must be the
> >>> -same across all objects linked into the binary, and defaults to 65535.
> >>> +When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
> >>> +objects larger than @var{threshold} are placed in large data sections.  This
> >>> +value must be the same across all objects linked into the binary, and defaults
> >>> +to 65535.
> >>
> >> Where's the "must be the same" requirement coming from?
> >
> > It's an existing requirement.  I think it may be related to discouraging
> > different COMDAT section names due to different -mlarge-data-threshold=
> > values.  I don't think it makes sense, but I did not feel strongly about
> > dropping it.
> >
> > Happy to drop the requirement if I revise this patch.
>
> I understand that this isn't something you introduce, but it still struck
> me as odd. Therefore I thought I'd suggest to take the opportunity to at
> least soften the language, unless of course there's a real reason behind
> it.

Dropping "This value must be the same across all objects linked into
the binary" looks good to me.

> >> As to the default - to remain compatible with earlier versions, shouldn't
> >> large model code default to "infinity"?
> >>
> >> Jan
> >
> > I have thought about this compatibility need and feel that it is very
> > unlikely to be needed.  GNU ld has supported large data sections since
> > 2005
> > (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=3b22753a67cf616514de804ef6d5ed5e90a7d883).
> > Users' programs linked with the internal linker scripts will keep
> > working, and -fdata-sections sections will still be combined.
>
> Well, the concern clearly is about custom scripts. Imo ...
>
> > First, -mcmodel=large use cases are rare.  -mcmodel=large was perhaps
> > considered a theoretical exercise in trying to reach feature completion
> > (https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU/m/NNuA0P7pAQAJ);
> > without this patch, -mcmodel=large object files don't interact well with
> > existing -mcmodel=small object files.
>
> ... the more exotic a project, the more likely it is that they're using
> custom scripts.
>
> > Moreover, if a user expects a specific section prefix with
> > -mcmodel=large, that's a brittle assumption. I think it's fair to say
> > that the fault is on the user side and GCC doesn't need to work around
> > their issues.
>
> I guess I don't really see what you base this on. Without any special
> options, expecting data to end up in .data/.bss/.rodata (and variants
> thereof) looks like quite reasonable an assumption to me.
>
> Jan

Making the -mlarge-data-threshold= default value differ between
-mcmodel=medium and -mcmodel=large seems quite odd to me.

The default value is 65536, which is larger than most data objects
that we may encounter in practice.
I want to investigate how often users use -mcmodel=large, but it is
quite difficult.  Many of the uses are for AIX and/or PowerPC.
I have tried to be considerate, but I am not sure we have users in the
intersection of the three sets: -mcmodel=large, data objects larger
than 65536, and linker scripts used in a way that orphan .ldata
sections will cause trouble.
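
To make the two positions in this thread concrete, the sketch below models both default policies side by side: a single shared default for the medium and large models (what the patch does), and a compatibility default where the large model is effectively "infinity" unless the user passes the option (what the review suggests). All names are hypothetical and the value 65536 is taken from the message above; none of this is GCC code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not GCC's internal enum.  */
enum code_model { MODEL_SMALL, MODEL_MEDIUM, MODEL_LARGE };

/* Default threshold as quoted in the discussion above.  */
#define SHARED_DEFAULT_THRESHOLD ((size_t) 65536)

/* Policy A (the patch): medium and large share one default.  */
size_t
default_threshold_shared (enum code_model model)
{
  (void) model;
  return SHARED_DEFAULT_THRESHOLD;
}

/* Policy B (suggested in review): the large model defaults to an
   effectively infinite threshold, so nothing moves to .l* sections
   unless the user lowers it explicitly.  */
size_t
default_threshold_compat (enum code_model model)
{
  return model == MODEL_LARGE ? SIZE_MAX : SHARED_DEFAULT_THRESHOLD;
}

/* Simplified placement decision: only medium and large models use
   large data sections, and only for objects above the threshold.  */
bool
goes_in_large_section (enum code_model model, size_t size, size_t threshold)
{
  if (model != MODEL_MEDIUM && model != MODEL_LARGE)
    return false;
  return size > threshold;
}
```

Under policy A a 100000-byte object built with the large model lands in a large data section by default; under policy B it stays in the regular section, which is the case that matters for custom linker scripts that do not mention .ldata and friends.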
  

Patch

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 202abf0b39c..3568da4f053 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -637,7 +637,8 @@  ix86_can_inline_p (tree caller, tree callee)
 static bool
 ix86_in_large_data_p (tree exp)
 {
-  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
+  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
+      ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
     return false;
 
   if (exp == NULL_TREE)
@@ -848,8 +849,9 @@  x86_elf_aligned_decl_common (FILE *file, tree decl,
 			const char *name, unsigned HOST_WIDE_INT size,
 			unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-      && size > (unsigned int)ix86_section_threshold)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+      ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+     size > (unsigned int)ix86_section_threshold)
     {
       switch_to_section (get_named_section (decl, ".lbss", 0));
       fputs (LARGECOMM_SECTION_ASM_OP, file);
@@ -869,9 +871,10 @@  void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
 		       	unsigned HOST_WIDE_INT size, unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-      && size > (unsigned int)ix86_section_threshold)
-    switch_to_section (get_named_section (decl, ".lbss", 0));
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+       ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+      size > (unsigned int)ix86_section_threshold)
+    switch_to_section(get_named_section(decl, ".lbss", 0));
   else
     switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d74f6b1f8fc..de8e722cd62 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -282,7 +282,7 @@  Branches are this expensive (arbitrary units).
 
 mlarge-data-threshold=
 Target RejectNegative Joined UInteger Var(ix86_section_threshold) Init(DEFAULT_LARGE_SECTION_THRESHOLD)
--mlarge-data-threshold=<number>	Data greater than given threshold will go into .ldata section in x86-64 medium model.
+-mlarge-data-threshold=<number>	Data greater than given threshold will go into a large data section in x86-64 medium and large code models.
 
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee78591c73e..4b5391e12b5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -32942,9 +32942,10 @@  the cache line size.  @samp{compat} is the default.
 
 @opindex mlarge-data-threshold
 @item -mlarge-data-threshold=@var{threshold}
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
+objects larger than @var{threshold} are placed in large data sections.  This
+value must be the same across all objects linked into the binary, and defaults
+to 65535.
 
 @opindex mrtd
 @item -mrtd
diff --git a/gcc/testsuite/gcc.target/i386/large-data.c b/gcc/testsuite/gcc.target/i386/large-data.c
new file mode 100644
index 00000000000..09a917431d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/large-data.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mcmodel=large -mlarge-data-threshold=4" } */
+/* { dg-final { scan-assembler ".lbss" } } */
+/* { dg-final { scan-assembler ".bss" } } */
+/* { dg-final { scan-assembler ".ldata" } } */
+/* { dg-final { scan-assembler ".data" } } */
+/* { dg-final { scan-assembler ".lrodata" } } */
+/* { dg-final { scan-assembler ".rodata" } } */
+
+const char rodata_a[] = "abc", rodata_b[] = "abcd";
+char data_a[4] = {1}, data_b[5] = {1};
+char bss_a[4], bss_b[5];
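
As a reading aid for the new test, the sketch below repeats its six objects and checks their sizes against the threshold of 4 passed in dg-options. It assumes the strict "size > threshold" comparison visible in the patched hunks, so the 4-byte *_a objects are expected to stay in the regular sections and the 5-byte *_b objects to move to the large ones. The THRESHOLD macro and exceeds_threshold helper are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same declarations as in the test file.  */
const char rodata_a[] = "abc", rodata_b[] = "abcd";
char data_a[4] = {1}, data_b[5] = {1};
char bss_a[4], bss_b[5];

/* Mirrors -mlarge-data-threshold=4 from dg-options.  */
#define THRESHOLD 4

/* "abc" plus its terminating NUL is 4 bytes; "abcd" is 5.  */
static_assert (sizeof rodata_a == 4, "rodata_a is at the threshold");
static_assert (sizeof rodata_b == 5, "rodata_b is above the threshold");

/* Strictly greater than, matching the comparison in the patch.  */
bool
exceeds_threshold (size_t size)
{
  return size > THRESHOLD;
}
```

Calling exceeds_threshold on sizeof data_a or sizeof bss_a gives false, and on sizeof data_b or sizeof bss_b gives true, which is why the test expects both the regular and the .l* section names in the assembly.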