Various fixes for DWARF register size computation

Message ID 87pmbvx41g.fsf@oldenburg.str.redhat.com
State New
Headers
Series Various fixes for DWARF register size computation |

Commit Message

Florian Weimer Jan. 3, 2023, 11:15 a.m. UTC
  The previous code had several issues.

1. XALLOCAVEC does not create any objects, so invocating
   the non-POD poly_uint16 assignment operator is undefined.
2. The default constructor of poly-ints does not create a
   zero poly-int object (unlike what happens with regular ints).
3. The register size array must have DWARF_FRAME_REGISTERS + 1
   elements.  The extra element can be DWARF_FRAME_RETURN_COLUMN
   or DWARF_ALT_FRAME_RETURN_COLUMN.

To fix problem 3, merely increasing the array size is sufficient,
but it inhibits the x86-64 register size optimization in libgcc
because it does not use the extra register, so it has size zero.
To re-enable the optimization, expose the maximum used register
to libgcc.  This is sufficient for the optimizers to figure out
that the memcpy call in uw_install_context_1 has a fixed size
argument on x86-64.

This restores bootstrap on aarch64-linux-gnu and powerpc64-linux-gnu.
Not sure about test suite results yet, I need to check the baseline.

gcc/

	* debug.h (dwarf_reg_sizes_constant): Remove declaration.
	(dwarf_single_register_size): New struct.
	* dwarf2cfi.cc (generate_dwarf_reg_sizes): Initialize
	extra register size.  Use in-place new for initialization.
	Remove unnecessary memset.
	(dwarf_reg_sizes_constant): Remove.
	(dwarf_single_register_size::dwarf_single_register_size):
	New constructor based on removed dwarf_reg_sizes_constant
	function.  Allocate extra size element.
	(expand_builtin_init_dwarf_reg_sizes): Allocate extra size
	element.
	* target.def (init_dwarf_reg_sizes_extra): Mention extra size
	element.
	* doc/tm.texi: Update.

gcc/c-family/

	* c-cppbuiltin.cc (c_cpp_builtins): Switch to
	dwarf_single_register_size for obtaining DWARF register sizes.
	Define __LIBGCC_DWARF_REG_MAXIMUM__.

libgcc/

	* unwind-dw2.c (dwarf_reg_size): Use
	__LIBGCC_DWARF_REG_MAXIMUM__.

---
 gcc/c-family/c-cppbuiltin.cc | 12 ++++++++----
 gcc/debug.h                  | 13 ++++++++++++-
 gcc/doc/tm.texi              |  2 +-
 gcc/dwarf2cfi.cc             | 45 +++++++++++++++++++++++++++-----------------
 gcc/target.def               |  2 +-
 libgcc/unwind-dw2.c          |  7 +++++--
 6 files changed, 55 insertions(+), 26 deletions(-)


base-commit: 201c21b0e847679645df1af3dd13459274f41047
  

Comments

Jakub Jelinek Jan. 3, 2023, 12:05 p.m. UTC | #1
On Tue, Jan 03, 2023 at 12:15:23PM +0100, Florian Weimer wrote:
> --- a/gcc/debug.h
> +++ b/gcc/debug.h
> @@ -245,7 +245,18 @@ extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
>  
>  /* Dwarf2 frame information.  */
>  
> -extern int dwarf_reg_sizes_constant ();
> +/* Query size information about DWARF registers.  */
> +struct dwarf_single_register_size
> +{
> +  dwarf_single_register_size();

Formatting, space before (

> @@ -334,27 +333,39 @@ generate_dwarf_reg_sizes (poly_uint16 *sizes)
>      targetm.init_dwarf_reg_sizes_extra (sizes);
>  }
>  
> -/* Return 0 if the DWARF register sizes are not constant, otherwise
> -   return the size constant.  */
> -
> -int
> -dwarf_reg_sizes_constant ()
> +dwarf_single_register_size::dwarf_single_register_size()

Likewise.

> +  for (int i = DWARF_FRAME_REGISTERS; i >= 0; --i)
> +    {
> +      unsigned short value;
> +      if (!sizes[i].is_constant (&value) || value != 0)

    if (!known_eq (sizes[i], 0))
?

Though, I still wonder, because all of this is a hack for a single target
- x86_64-linux -m64 - I think no other target has similar constant sizes, whether
it wouldn't be better to revert all this compiler side stuff and handle it
purely on the libgcc side - allow target headers to specify a simple
expression how to compute dwarf_reg_size + don't define dwarf_reg_size_table
array in that case and instead in a testcase verify that
__builtin_init_dwarf_reg_size_table initializes an array to the exact same
values as the libgcc/config/**/*.h overridden dwarf_reg_size version.
That way, for x86_64-linux we can use
((index) <= __LIBGCC_DWARF_FRAME_REGISTERS__ ? 8 : 0)
but could provide something reasonable even for other targets if it improves
the unwinder sufficiently.
Say s390x-linux -m64 is
((index) <= 32 ? 8 : (index) == 33 ? 4 : 0)
etc.

Or, if you want to do it on the compiler side, instead of predefining
__LIBGCC_DWARF_REG_SIZES_CONSTANT__ and __LIBGCC_DWARF_REG_MAXIMUM__
register conditionally a new builtin, __builtin_dwarf_reg_size,
which would be defined only if -fbuilding-libgcc and the compiler determines
dwarf_reg_size is desirable to be computed inline without a table and
would fold the builtin to say that
index <= 16U ? 8 : 0 on x86_64 -m64,
index <= 9U ? 4 : index - 11U <= 5U ? 12 : 0 on x86_64 -m32 etc.
and if the expression is too large/complex, wouldn't predefine the builtin.

Then you can
#if __has_builtin(__builtin_dwarf_reg_size)
use the builtin and don't provide the table + initialize it,
otherwise initialize + use the table.

Or, is it actually the use of table that is bad on the unwinder side,
or lack of a small upper bound for what you get from the table?
In that case you could predefine upper bound on the sizes instead (if
constant) and simply add if (size > __LIBGCC_DWARF_REG_SIZE_MAX__)
__builtin_unreachable ()).

	Jakub
  
Florian Weimer Jan. 3, 2023, 1:25 p.m. UTC | #2
* Jakub Jelinek:

> On Tue, Jan 03, 2023 at 12:15:23PM +0100, Florian Weimer wrote:
>> --- a/gcc/debug.h
>> +++ b/gcc/debug.h
>> @@ -245,7 +245,18 @@ extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
>>  
>>  /* Dwarf2 frame information.  */
>>  
>> -extern int dwarf_reg_sizes_constant ();
>> +/* Query size information about DWARF registers.  */
>> +struct dwarf_single_register_size
>> +{
>> +  dwarf_single_register_size();
>
> Formatting, space before (
>
>> @@ -334,27 +333,39 @@ generate_dwarf_reg_sizes (poly_uint16 *sizes)
>>      targetm.init_dwarf_reg_sizes_extra (sizes);
>>  }
>>  
>> -/* Return 0 if the DWARF register sizes are not constant, otherwise
>> -   return the size constant.  */
>> -
>> -int
>> -dwarf_reg_sizes_constant ()
>> +dwarf_single_register_size::dwarf_single_register_size()
>
> Likewise.
>
>> +  for (int i = DWARF_FRAME_REGISTERS; i >= 0; --i)
>> +    {
>> +      unsigned short value;
>> +      if (!sizes[i].is_constant (&value) || value != 0)
>
>     if (!known_eq (sizes[i], 0))
> ?

Right.

> Though, I still wonder, because all of this is a hack for a single target
> - x86_64-linux -m64 - I think no other target has similar constant
> sizes,

Really?  That's odd.

Is it because other architectures track callee-saved vector registers
through unwinding?

> whether
> it wouldn't be better to revert all this compiler side stuff and handle it
> purely on the libgcc side - allow target headers to specify a simple
> expression how to compute dwarf_reg_size + don't define dwarf_reg_size_table
> array in that case and instead in a testcase verify that
> __builtin_init_dwarf_reg_size_table initializes an array to the exact same
> values as the libgcc/config/**/*.h overridden dwarf_reg_size version.
> That way, for x86_64-linux we can use
> ((index) <= __LIBGCC_DWARF_FRAME_REGISTERS__ ? 8 : 0)
> but could provide something reasonable even for other targets if it improves
> the unwinder sufficiently.
> Say s390x-linux -m64 is
> ((index) <= 32 ? 8 : (index) == 33 ? 4 : 0)
> etc.

> Or, if you want to do it on the compiler side, instead of predefining
> __LIBGCC_DWARF_REG_SIZES_CONSTANT__ and __LIBGCC_DWARF_REG_MAXIMUM__
> register conditionally a new builtin, __builtin_dwarf_reg_size,
> which would be defined only if -fbuilding-libgcc and the compiler determines
> dwarf_reg_size is desirable to be computed inline without a table and
> would fold the builtin to say that
> index <= 16U ? 8 : 0 on x86_64 -m64,
> index <= 9U ? 4 : index - 11U <= 5U ? 12 : 0 on x86_64 -m32 etc.
> and if the expression is too large/complex, wouldn't predefine the builtin.

I think the pre-computation of the size array is useful even for targets
where the expression is not so simple, but the array elements are still
constants.  A builtin like __builtin_dwarf_reg_size could use a
reference to a constant static array, so that we can get rid of the
array initialization code in libgcc.  Before we can do that, we need to
figure out if the fully dynamic register sizes on AArch64 with SVE are
actually correct—and if we need to fix the non-SVE unwinder to work
properly for SVE programs.

So I don't want to revert the size array computation just yet.

> Or, is it actually the use of table that is bad on the unwinder side,
> or lack of a small upper bound for what you get from the table?
> In that case you could predefine upper bound on the sizes instead (if
> constant) and simply add if (size > __LIBGCC_DWARF_REG_SIZE_MAX__)
> __builtin_unreachable ()).

It also matters what kind of register sizes are used in practice.
Looking at the FDE for _Unwind_RaiseException on i686, we only save
4-byte registers there, I think.  Perhaps only non-GCC-generated code
may exercise the other register sizes?  That's different on AArch64,
where the vector registers are saved as well.

Should I repost this patch with the three nits fixed?  Or should I
revert two of the three patches I committed instead?

Thanks,
Florian
  
Jakub Jelinek Jan. 3, 2023, 1:54 p.m. UTC | #3
On Tue, Jan 03, 2023 at 02:25:21PM +0100, Florian Weimer wrote:
> > Though, I still wonder, because all of this is a hack for a single target
> > - x86_64-linux -m64 - I think no other target has similar constant
> > sizes,
> 
> Really?  That's odd.

I've tried about 30 cross compilers I had around, I admit it isn't
exhaustive.

> Is it because other architectures track callee-saved vector registers
> through unwinding?

I think it is far more than just vector registers, it can be floating point
registers, or just one integral or special register of a different size etc.

> > Or, if you want to do it on the compiler side, instead of predefining
> > __LIBGCC_DWARF_REG_SIZES_CONSTANT__ and __LIBGCC_DWARF_REG_MAXIMUM__
> > register conditionally a new builtin, __builtin_dwarf_reg_size,
> > which would be defined only if -fbuilding-libgcc and the compiler determines
> > dwarf_reg_size is desirable to be computed inline without a table and
> > would fold the builtin to say that
> > index <= 16U ? 8 : 0 on x86_64 -m64,
> > index <= 9U ? 4 : index - 11U <= 5U ? 12 : 0 on x86_64 -m32 etc.
> > and if the expression is too large/complex, wouldn't predefine the builtin.
> 
> I think the pre-computation of the size array is useful even for targets
> where the expression is not so simple, but the array elements are still
> constants.  A builtin like __builtin_dwarf_reg_size could use a
> reference to a constant static array, so that we can get rid of the
> array initialization code in libgcc.  Before we can do that, we need to

I think constant static array might be sometimes an option too, but not
always, not just because of AArch64, or other potential poly-int arches
(RISCV?), but also if something could depend on some runtime check.
It is true that most of the time it is constant and depends just on the
target or more often on target + options combo (mostly ABI related options).

> figure out if the fully dynamic register sizes on AArch64 with SVE are
> actually correct—and if we need to fix the non-SVE unwinder to work
> properly for SVE programs.
> 
> So I don't want to revert the size array computation just yet.
> 
> > Or, is it actually the use of table that is bad on the unwinder side,
> > or lack of a small upper bound for what you get from the table?
> > In that case you could predefine upper bound on the sizes instead (if
> > constant) and simply add if (size > __LIBGCC_DWARF_REG_SIZE_MAX__)
> > __builtin_unreachable ()).
> 
> It also matters what kind of register sizes are used in practice.

Yes, but that is hard to find out easily.  E.g. some registers might be
only saved/restored in signal frames and nowhere else, others only rarely
touched but still we'd need their sizes if they are ever used.

> Should I repost this patch with the three nits fixed?  Or should I
> revert two of the three patches I committed instead?

I lean towards reversion and trying to figure out something that works
on more than one arch.  It doesn't have to improve all arches (say
AArch64 is clearly a nightmare that isn't handled correctly even without
any changes - as noted in the PRs, if libgcc is built without SVE, it will
hardcode 8, while if it is built with SVE, it will be runtime dependent and
will be wrong in theory when some HW has 2048 bit SVE vectors - when it is
256 bytes), but still watching into what we compile -O2 -nostdinc -fbuilding-libgcc

static unsigned char dwarf_reg_size_table[__LIBGCC_DWARF_FRAME_REGISTERS__+1];

void
foo (void)
{
  __builtin_init_dwarf_reg_size_table (dwarf_reg_size_table);
}

on at least 10 most common arches would be useful and optimizing what is
easily possible would be nice.

	Jakub
  
Richard Sandiford Jan. 12, 2023, 4:50 p.m. UTC | #4
Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Tue, Jan 03, 2023 at 02:25:21PM +0100, Florian Weimer wrote:
>> > Though, I still wonder, because all of this is a hack for a single target
>> > - x86_64-linux -m64 - I think no other target has similar constant
>> > sizes,
>> 
>> Really?  That's odd.
>
> I've tried about 30 cross compilers I had around, I admit it isn't
> exhaustive.
>
>> Is it because other architectures track callee-saved vector registers
>> through unwinding?
>
> I think it is far more than just vector registers, it can be floating point
> registers, or just one integral or special register of a different size etc.
>
>> > Or, if you want to do it on the compiler side, instead of predefining
>> > __LIBGCC_DWARF_REG_SIZES_CONSTANT__ and __LIBGCC_DWARF_REG_MAXIMUM__
>> > register conditionally a new builtin, __builtin_dwarf_reg_size,
>> > which would be defined only if -fbuilding-libgcc and the compiler determines
>> > dwarf_reg_size is desirable to be computed inline without a table and
>> > would fold the builtin to say that
>> > index <= 16U ? 8 : 0 on x86_64 -m64,
>> > index <= 9U ? 4 : index - 11U <= 5U ? 12 : 0 on x86_64 -m32 etc.
>> > and if the expression is too large/complex, wouldn't predefine the builtin.
>> 
>> I think the pre-computation of the size array is useful even for targets
>> where the expression is not so simple, but the array elements are still
>> constants.  A builtin like __builtin_dwarf_reg_size could use a
>> reference to a constant static array, so that we can get rid of the
>> array initialization code in libgcc.  Before we can do that, we need to
>
> I think constant static array might be sometimes an option too, but not
> always, not just because of AArch64, or other potential poly-int arches
> (RISCV?), but also if something could depend on some runtime check.
> It is true that most of the time it is constant and depends just on the
> target or more often on target + options combo (mostly ABI related options).
>
>> figure out if the fully dynamic register sizes on AArch64 with SVE are
>> actually correct—and if we need to fix the non-SVE unwinder to work
>> properly for SVE programs.
>> 
>> So I don't want to revert the size array computation just yet.
>> 
>> > Or, is it actually the use of table that is bad on the unwinder side,
>> > or lack of a small upper bound for what you get from the table?
>> > In that case you could predefine upper bound on the sizes instead (if
>> > constant) and simply add if (size > __LIBGCC_DWARF_REG_SIZE_MAX__)
>> > __builtin_unreachable ()).
>> 
>> It also matters what kind of register sizes are used in practice.
>
> Yes, but that is hard to find out easily.  E.g. some registers might be
> only saved/restored in signal frames and nowhere else, others only rarely
> touched but still we'd need their sizes if they are ever used.
>
>> Should I repost this patch with the three nits fixed?  Or should I
>> revert two of the three patches I committed instead?
>
> I lean towards reversion and trying to figure out something that works
> on more than one arch.  It doesn't have to improve all arches (say
> AArch64 is clearly a nightmare that isn't handled correctly even without
> any changes - as noted in the PRs, if libgcc is built without SVE, it will
> hardcode 8, while if it is built with SVE, it will be runtime dependent and
> will be wrong in theory when some HW has 2048 bit SVE vectors - when it is
> 256 bytes), but still watching into what we compile -O2 -nostdinc -fbuilding-libgcc

I'm jumping in here without fully understanding the context, so maybe this
is exactly your point, but: the SIMD/FP DWARF registers are supposed to be
size 8 regardless of which features are enabled.  That's already only half
of the hardware register size for base Armv8-A, since Advanced SIMD registers
are 16 bytes in size.

So yeah, if we're using the hardware register size then something is wrong.

Thanks,
Richard

> static unsigned char dwarf_reg_size_table[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
>
> void
> foo (void)
> {
>   __builtin_init_dwarf_reg_size_table (dwarf_reg_size_table);
> }
>
> on at least 10 most common arches would be useful and optimizing what is
> easily possible would be nice.
>
> 	Jakub
  
Jakub Jelinek Jan. 12, 2023, 5:07 p.m. UTC | #5
On Thu, Jan 12, 2023 at 04:50:07PM +0000, Richard Sandiford wrote:
> I'm jumping in here without fully understanding the context, so maybe this
> is exactly your point, but: the SIMD/FP DWARF registers are supposed to be
> size 8 regardless of which features are enabled.  That's already only half
> of the hardware register size for base Armv8-A, since Advanced SIMD registers
> are 16 bytes in size.
> 
> So yeah, if we're using the hardware register size then something is wrong.

I'm talking about what the following compiles to
static unsigned char dwarf_reg_size_table[__LIBGCC_DWARF_FRAME_REGISTERS__+1];

void
foo (void)
{
  __builtin_init_dwarf_reg_size_table (dwarf_reg_size_table);
}
(and therefore what libgcc/unwind-dw2.c (init_dwarf_reg_size_table) as well)
with -O2 -fbuilding-libgcc -march=armv8-a vs. -O2 -fbuilding-libgcc -march=armv8-a+sve
The former is setting I think [0..31, 46, 48..63, 72..79, 96]=8, [64..71, 80..95]=0
(and leaving others untouched, which keeps them 0).
While the latter is setting [0..31, 46, 72..79, 96]=8, [64..71, 80..95]=0
and [48..63]=cntd

	Jakub
.arch armv8-a
	.file	"8.c"
	.text
	.align	2
	.align	4
	.global	foo
	.type	foo, %function
foo:
.LFB0:
	adrp	x2, .LANCHOR0
	add	x0, x2, :lo12:.LANCHOR0
	mov	w1, 8
	strb	w1, [x2, #:lo12:.LANCHOR0]
	strb	w1, [x0, 1]
	strb	w1, [x0, 2]
	strb	w1, [x0, 3]
	strb	w1, [x0, 4]
	strb	w1, [x0, 5]
	strb	w1, [x0, 6]
	strb	w1, [x0, 7]
	strb	w1, [x0, 8]
	strb	w1, [x0, 9]
	strb	w1, [x0, 10]
	strb	w1, [x0, 11]
	strb	w1, [x0, 12]
	strb	w1, [x0, 13]
	strb	w1, [x0, 14]
	strb	w1, [x0, 15]
	strb	w1, [x0, 16]
	strb	w1, [x0, 17]
	strb	w1, [x0, 18]
	strb	w1, [x0, 19]
	strb	w1, [x0, 20]
	strb	w1, [x0, 21]
	strb	w1, [x0, 22]
	strb	w1, [x0, 23]
	strb	w1, [x0, 24]
	strb	w1, [x0, 25]
	strb	w1, [x0, 26]
	strb	w1, [x0, 27]
	strb	w1, [x0, 28]
	strb	w1, [x0, 29]
	strb	w1, [x0, 30]
	strb	w1, [x0, 31]
	strb	wzr, [x0, 64]
	strb	w1, [x0, 46]
	strb	wzr, [x0, 65]
	strb	wzr, [x0, 66]
	strb	wzr, [x0, 67]
	strb	wzr, [x0, 68]
	strb	wzr, [x0, 69]
	strb	wzr, [x0, 70]
	strb	wzr, [x0, 71]
	strb	w1, [x0, 72]
	strb	w1, [x0, 73]
	strb	w1, [x0, 74]
	strb	w1, [x0, 75]
	strb	w1, [x0, 76]
	strb	w1, [x0, 77]
	strb	w1, [x0, 78]
	strb	w1, [x0, 79]
	strb	wzr, [x0, 80]
	strb	wzr, [x0, 81]
	strb	wzr, [x0, 82]
	strb	wzr, [x0, 83]
	strb	wzr, [x0, 84]
	strb	wzr, [x0, 85]
	strb	wzr, [x0, 86]
	strb	wzr, [x0, 87]
	strb	wzr, [x0, 88]
	strb	wzr, [x0, 89]
	strb	wzr, [x0, 90]
	strb	wzr, [x0, 91]
	strb	wzr, [x0, 92]
	strb	wzr, [x0, 93]
	strb	wzr, [x0, 94]
	strb	wzr, [x0, 95]
	strb	w1, [x0, 48]
	strb	w1, [x0, 49]
	strb	w1, [x0, 50]
	strb	w1, [x0, 51]
	strb	w1, [x0, 52]
	strb	w1, [x0, 53]
	strb	w1, [x0, 54]
	strb	w1, [x0, 55]
	strb	w1, [x0, 56]
	strb	w1, [x0, 57]
	strb	w1, [x0, 58]
	strb	w1, [x0, 59]
	strb	w1, [x0, 60]
	strb	w1, [x0, 61]
	strb	w1, [x0, 62]
	strb	w1, [x0, 63]
	strb	w1, [x0, 96]
	ret
.LFE0:
	.size	foo, .-foo
	.bss
	.align	4
	.set	.LANCHOR0,. + 0
	.type	dwarf_reg_size_table, %object
	.size	dwarf_reg_size_table, 98
dwarf_reg_size_table:
	.zero	98
	.section	.eh_frame,"aw",@progbits
.Lframe1:
	.4byte	.LECIE1-.LSCIE1
.LSCIE1:
	.4byte	0
	.byte	0x3
	.string	"zR"
	.byte	0x1
	.byte	0x78
	.byte	0x1e
	.byte	0x1
	.byte	0x1b
	.byte	0xc
	.byte	0x1f
	.byte	0
	.align	3
.LECIE1:
.LSFDE1:
	.4byte	.LEFDE1-.LASFDE1
.LASFDE1:
	.4byte	.LASFDE1-.Lframe1
	.4byte	.LFB0-.
	.4byte	.LFE0-.LFB0
	.byte	0
	.align	3
.LEFDE1:
	.ident	"GCC: (GNU) 13.0.0 20230111 (experimental)"
	.section	.note.GNU-stack,"",@progbits
.arch armv8-a+sve
	.file	"8.c"
	.text
	.align	2
	.align	4
	.global	foo
	.type	foo, %function
foo:
.LFB0:
	adrp	x3, .LANCHOR0
	add	x0, x3, :lo12:.LANCHOR0
	mov	w1, 8
	cntd	x2
	strb	w1, [x3, #:lo12:.LANCHOR0]
	strb	w1, [x0, 1]
	strb	w1, [x0, 2]
	strb	w1, [x0, 3]
	strb	w1, [x0, 4]
	strb	w1, [x0, 5]
	strb	w1, [x0, 6]
	strb	w1, [x0, 7]
	strb	w1, [x0, 8]
	strb	w1, [x0, 9]
	strb	w1, [x0, 10]
	strb	w1, [x0, 11]
	strb	w1, [x0, 12]
	strb	w1, [x0, 13]
	strb	w1, [x0, 14]
	strb	w1, [x0, 15]
	strb	w1, [x0, 16]
	strb	w1, [x0, 17]
	strb	w1, [x0, 18]
	strb	w1, [x0, 19]
	strb	w1, [x0, 20]
	strb	w1, [x0, 21]
	strb	w1, [x0, 22]
	strb	w1, [x0, 23]
	strb	w1, [x0, 24]
	strb	w1, [x0, 25]
	strb	w1, [x0, 26]
	strb	w1, [x0, 27]
	strb	w1, [x0, 28]
	strb	w1, [x0, 29]
	strb	w1, [x0, 30]
	strb	w1, [x0, 31]
	strb	wzr, [x0, 64]
	strb	w1, [x0, 46]
	strb	wzr, [x0, 65]
	strb	wzr, [x0, 66]
	strb	wzr, [x0, 67]
	strb	wzr, [x0, 68]
	strb	wzr, [x0, 69]
	strb	wzr, [x0, 70]
	strb	wzr, [x0, 71]
	strb	w1, [x0, 72]
	strb	w1, [x0, 73]
	strb	w1, [x0, 74]
	strb	w1, [x0, 75]
	strb	w1, [x0, 76]
	strb	w1, [x0, 77]
	strb	w1, [x0, 78]
	strb	w1, [x0, 79]
	strb	wzr, [x0, 80]
	strb	wzr, [x0, 81]
	strb	wzr, [x0, 82]
	strb	wzr, [x0, 83]
	strb	wzr, [x0, 84]
	strb	wzr, [x0, 85]
	strb	wzr, [x0, 86]
	strb	wzr, [x0, 87]
	strb	wzr, [x0, 88]
	strb	wzr, [x0, 89]
	strb	wzr, [x0, 90]
	strb	wzr, [x0, 91]
	strb	wzr, [x0, 92]
	strb	wzr, [x0, 93]
	strb	wzr, [x0, 94]
	strb	wzr, [x0, 95]
	strb	w2, [x0, 48]
	strb	w2, [x0, 49]
	strb	w2, [x0, 50]
	strb	w2, [x0, 51]
	strb	w2, [x0, 52]
	strb	w2, [x0, 53]
	strb	w2, [x0, 54]
	strb	w2, [x0, 55]
	strb	w2, [x0, 56]
	strb	w2, [x0, 57]
	strb	w2, [x0, 58]
	strb	w2, [x0, 59]
	strb	w2, [x0, 60]
	strb	w2, [x0, 61]
	strb	w2, [x0, 62]
	strb	w2, [x0, 63]
	strb	w1, [x0, 96]
	ret
.LFE0:
	.size	foo, .-foo
	.bss
	.align	3
	.set	.LANCHOR0,. + 0
	.type	dwarf_reg_size_table, %object
	.size	dwarf_reg_size_table, 98
dwarf_reg_size_table:
	.zero	98
	.section	.eh_frame,"aw",@progbits
.Lframe1:
	.4byte	.LECIE1-.LSCIE1
.LSCIE1:
	.4byte	0
	.byte	0x3
	.string	"zR"
	.byte	0x1
	.byte	0x78
	.byte	0x1e
	.byte	0x1
	.byte	0x1b
	.byte	0xc
	.byte	0x1f
	.byte	0
	.align	3
.LECIE1:
.LSFDE1:
	.4byte	.LEFDE1-.LASFDE1
.LASFDE1:
	.4byte	.LASFDE1-.Lframe1
	.4byte	.LFB0-.
	.4byte	.LFE0-.LFB0
	.byte	0
	.align	3
.LEFDE1:
	.ident	"GCC: (GNU) 13.0.0 20230111 (experimental)"
	.section	.note.GNU-stack,"",@progbits
  
Richard Sandiford Jan. 12, 2023, 9:03 p.m. UTC | #6
Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, Jan 12, 2023 at 04:50:07PM +0000, Richard Sandiford wrote:
>> I'm jumping in here without fully understanding the context, so maybe this
>> is exactly your point, but: the SIMD/FP DWARF registers are supposed to be
>> size 8 regardless of which features are enabled.  That's already only half
>> of the hardware register size for base Armv8-A, since Advanced SIMD registers
>> are 16 bytes in size.
>> 
>> So yeah, if we're using the hardware register size then something is wrong.
>
> I'm talking about what the following compiles to
> static unsigned char dwarf_reg_size_table[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
>
> void
> foo (void)
> {
>   __builtin_init_dwarf_reg_size_table (dwarf_reg_size_table);
> }
> (and therefore what libgcc/unwind-dw2.c (init_dwarf_reg_size_table) as well)
> with -O2 -fbuilding-libgcc -march=armv8-a vs. -O2 -fbuilding-libgcc -march=armv8-a+sve
> The former is setting I think [0..31, 46, 48..63, 72..79, 96]=8, [64..71, 80..95]=0
> (and leaving others untouched, which keeps them 0).
> While the latter is setting [0..31, 46, 72..79, 96]=8, [64..71, 80..95]=0
> and [48..63]=cntd

Ah, interesting.  So the SIMD/FP registers are OK, but the predicate
registers are causing a problem.

I think we should set the predicates to size 0 too, like we do for
call-clobbered FP registers.  Predicate registers should never need
to be represented in CFI.

Thanks,
Richard
  

Patch

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index ddfd63b8eb9..8098aca41e8 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1522,10 +1522,14 @@  c_cpp_builtins (cpp_reader *pfile)
       builtin_define_with_int_value ("__LIBGCC_DWARF_FRAME_REGISTERS__",
 				     DWARF_FRAME_REGISTERS);
       {
-	int value = dwarf_reg_sizes_constant ();
-	if (value > 0)
-	  builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
-					 value);
+	dwarf_single_register_size srs;
+	if (srs.common_size > 0)
+	  {
+	    builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
+					   srs.common_size);
+	    builtin_define_with_int_value ("__LIBGCC_DWARF_REG_MAXIMUM__",
+					   srs.maximum_register);
+	  }
       }
       builtin_define_with_int_value ("__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__",
 				     DWARF_CIE_DATA_ALIGNMENT);
diff --git a/gcc/debug.h b/gcc/debug.h
index 4fe9f3570ac..2e843da8b41 100644
--- a/gcc/debug.h
+++ b/gcc/debug.h
@@ -245,7 +245,18 @@  extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
 
 /* Dwarf2 frame information.  */
 
-extern int dwarf_reg_sizes_constant ();
+/* Query size information about DWARF registers.  */
+struct dwarf_single_register_size
+{
+  dwarf_single_register_size();
+
+  /* The common register size, or 0 if the register size varies.  */
+  unsigned int common_size;
+
+  /* The maximum register number that is actually present.  Registers
+     above the maximum are size zero even if common_size is positive.  */
+  unsigned int maximum_register;
+};
 
 extern void dwarf2out_begin_prologue (unsigned int, unsigned int,
 				      const char *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b6d7900f212..eb29cfb95aa 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9847,7 +9847,7 @@  sizes of those pieces in the table used by the unwinder at runtime.
 It will be called by @code{generate_dwarf_reg_sizes} after
 filling in a single size corresponding to each hard register;
 @var{sizes} is the address of the table.  It will contain
-@code{DWARF_FRAME_REGISTERS} elements when this hook is called.
+@code{DWARF_FRAME_REGISTERS + 1} elements when this hook is called.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_ASM_TTYPE (rtx @var{sym})
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index d5a27dc36c5..5bd12e070b3 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -291,11 +291,10 @@  init_one_dwarf_reg_size (int regno, machine_mode regmode,
 static void
 generate_dwarf_reg_sizes (poly_uint16 *sizes)
 {
-  for (unsigned int i = 0; i < DWARF_FRAME_REGISTERS; i++)
-    sizes[i] = poly_uint16{};
+  for (unsigned int i = 0; i <= DWARF_FRAME_REGISTERS; i++)
+    new (&sizes[i]) poly_uint16(0);
 
   init_one_dwarf_reg_state init_state{};
-  memset ((char *)&init_state, 0, sizeof (init_state));
 
   for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     {
@@ -334,27 +333,39 @@  generate_dwarf_reg_sizes (poly_uint16 *sizes)
     targetm.init_dwarf_reg_sizes_extra (sizes);
 }
 
-/* Return 0 if the DWARF register sizes are not constant, otherwise
-   return the size constant.  */
-
-int
-dwarf_reg_sizes_constant ()
+dwarf_single_register_size::dwarf_single_register_size()
 {
-  poly_uint16 *sizes = XALLOCAVEC (poly_uint16, DWARF_FRAME_REGISTERS);
+  poly_uint16 *sizes = XALLOCAVEC (poly_uint16, DWARF_FRAME_REGISTERS + 1);
   generate_dwarf_reg_sizes (sizes);
 
-  int result;
-  for (unsigned int i = 0; i < DWARF_FRAME_REGISTERS; i++)
+  /* Find the last register number actually in use.  */
+  for (int i = DWARF_FRAME_REGISTERS; i >= 0; --i)
+    {
+      unsigned short value;
+      if (!sizes[i].is_constant (&value) || value != 0)
+	{
+	  maximum_register = i;
+	  break;
+	}
+    }
+
+  /* Check for a common register size among the used registers.  */
+  for (unsigned int i = 0; i <= maximum_register; ++i)
     {
       unsigned short value;
       if (!sizes[i].is_constant (&value))
-	return 0;
+	{
+	  common_size = 0;
+	  break;
+	}
       if (i == 0)
-	result = value;
-      else if (result != value)
-	return 0;
+	common_size = value;
+      else if (common_size != value)
+	{
+	  common_size = 0;
+	  break;
+	}
     }
-  return result;
 }
 
 /* Generate code to initialize the dwarf register size table located
@@ -363,7 +374,7 @@  dwarf_reg_sizes_constant ()
 void
 expand_builtin_init_dwarf_reg_sizes (tree address)
 {
-  poly_uint16 *sizes = XALLOCAVEC (poly_uint16, DWARF_FRAME_REGISTERS);
+  poly_uint16 *sizes = XALLOCAVEC (poly_uint16, DWARF_FRAME_REGISTERS + 1);
   generate_dwarf_reg_sizes (sizes);
 
   scalar_int_mode mode = SCALAR_INT_TYPE_MODE (char_type_node);
diff --git a/gcc/target.def b/gcc/target.def
index da5dd31d7a4..dfffc534f83 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4042,7 +4042,7 @@  sizes of those pieces in the table used by the unwinder at runtime.\n\
 It will be called by @code{generate_dwarf_reg_sizes} after\n\
 filling in a single size corresponding to each hard register;\n\
 @var{sizes} is the address of the table.  It will contain\n\
-@code{DWARF_FRAME_REGISTERS} elements when this hook is called.",
+@code{DWARF_FRAME_REGISTERS + 1} elements when this hook is called.",
  void, (poly_uint16 *sizes), nullptr)
 
 /* Fetch the fixed register(s) which hold condition codes, for
diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 792338c1e38..7a173487061 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -150,9 +150,12 @@  struct _Unwind_Context
 
 #ifdef __LIBGCC_DWARF_REG_SIZES_CONSTANT__
 static inline unsigned char
-dwarf_reg_size (int index __attribute__ ((__unused__)))
+dwarf_reg_size (unsigned int index)
 {
-  return __LIBGCC_DWARF_REG_SIZES_CONSTANT__;
+  if (index <= __LIBGCC_DWARF_REG_MAXIMUM__)
+    return __LIBGCC_DWARF_REG_SIZES_CONSTANT__;
+  else
+    return 0;
 }
 #else
 /* Byte size of every register managed by these routines.  */