[0/3,RFC] fsra: Add final gimple sra before expander

Message ID 20240227070412.3471038-1-guojiufu@linux.ibm.com
Series fsra: Add final gimple sra before expander


Jiufu Guo Feb. 27, 2024, 7:04 a.m. UTC

As is known, there are a few PRs (meta-bug PR101926) about accessing
aggregate parameters/returns that are passed through registers.

Given the suggestion from: 
We could even use the actual SRA pass in a special mode right before
RTL expansion for the incoming/outgoing part.

Compared to other solutions (e.g. the previous light-sra-in-expander),
this method could decouple the different parts (gimple-sra and rtl-expand),
and could make maximum use of the existing SRA pass.

The following patches implement a prototype of this idea.

In this prototype, only "parameters and returns" are treated as 'sra'
candidates.  If a 'parameter' is scalarized, then an IFN_ARG_PART is 
generated for the access at the beginning of the function, and if an
access of a 'return' is scalarized, then IFN_SET_RET is generated 
for it.  Those IFNs are expanded according to the incoming/outgoing
registers for the accesses.
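
To make the setting concrete, here is a minimal C sketch of the kind of
code this targets.  The names (pair_t, sum_pair, make_pair) are
hypothetical and not taken from the patches, and the offset/size reading
of the .ARG_PART operands is only an assumption based on the
".ARG_PART (x, 0, 128)" dump quoted further down in this mail.

```c
/* Hypothetical example, not taken from the patch series.  */
typedef struct { double a; double b; } pair_t;

/* 'p' is an aggregate parameter.  On ABIs that pass small structs in
   registers, the idea is that the final SRA pass replaces p.a and p.b
   with scalars defined at function entry, roughly:
     p_a = .ARG_PART (p, 0, 64);
     p_b = .ARG_PART (p, 64, 64);
   The operand meaning (bit offset, bit size) is an assumption.  */
double sum_pair (pair_t p)
{
  return p.a + p.b;
}

/* 'r' is an aggregate return value.  Per the description, scalarized
   accesses to it would become IFN_SET_RET calls; the operands are not
   sketched here because the cover letter does not give them.  */
pair_t make_pair (double a, double b)
{
  pair_t r;
  r.a = a;
  r.b = b;
  return r;
}
```

Calling sum_pair (make_pair (1.0, 2.0)) exercises both the incoming
and the outgoing side in one expression.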

Bootstrapped/regtested on ppc64{,le} and x86_64.

In this prototype, a few areas can still be enhanced:
- Accessing multiple registers in one stmt.
- Arg accesses across function calls.
- More special target/ABI behavior.

I would like to ask for comments/suggestions before going into depth,
to ensure this is heading in the right direction and to avoid missing
anything important.

One thing/concern in this implementation:
For an aggregate parameter, if it is not passed through registers,
there is no need to scalarize in this sra.
Take i386/pr101908-3.c as an example.
Without sra, the stmts look like this:
 bar ();
 vect__1.5_10 = MEM <vector(2) double> [(double *)&x];

With sra, the stmts look like this:
 x_1 = .ARG_PART (x, 0, 128);
 bar ();
 vect__1.5_10 = x_1;

The issue is that, without the patch, no instructions are generated
before invoking 'bar ()'; with the patch, instructions may be generated
before 'bar ()', and those insns would not be easy for RTL passes to
optimize.
This would not be hard to fix (though the fix may be hacky):
- Let the 'sra' pass know whether the access is in a register, and
  avoid generating 'ARG_PART' if it is not. This would introduce coupling
  between gimple sra and rtl.
- When expanding 'ARG_PART', if the access is in mem, then defer it.
  This would mean 'ARG_PART' may expand to nothing, which makes the
  dumped rtl a little confusing.
Any comments?
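
For illustration, the shape of code behind this concern can be sketched
in C as below.  This is a hypothetical example, not the actual
i386/pr101908-3.c testcase; the struct layout, the function names and
the statement that such a struct is passed in memory (assuming the
x86-64 SysV ABI) are my assumptions.

```c
/* Hypothetical example, not the actual i386/pr101908-3.c testcase.  */
struct big { double v[4]; };

static int bar_calls;   /* counts calls so the sketch is observable */

void bar (void)
{
  bar_calls++;
}

/* Assumption: on x86-64 SysV a 32-byte struct like this is passed in
   memory, not in registers.  Without the new pass, x.v[0] and x.v[1]
   can simply be loaded from the incoming argument area after the call.
   With an eager .ARG_PART at function entry, the loads would be
   emitted before bar (), so the values would have to stay live
   across the call.  */
double first_two_after_call (struct big x)
{
  bar ();
  return x.v[0] + x.v[1];
}
```

The result is the same either way; only the placement of the loads
relative to the call differs, which is what the two options above try
to address.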

This prototype is split into three patches for review.
1/3: Add final gimple sra just before expander
2/3: Add support for ARG_PARTS
3/3: Add support for RET_PARTS

Thanks for your comments and suggestions!

Jeff (Jiufu Guo)