[RFC] Aarch64: optimize stpcpy a bit.

On Thu, Jun 04, 2015 at 02:44:59PM +0100, Richard Earnshaw wrote:
> On 04/06/15 13:28, Ondřej Bílka wrote:
> > On Thu, Jun 04, 2015 at 11:27:57AM +0100, Richard Earnshaw wrote:
> >> On 25/05/15 12:45, Ondřej Bílka wrote:
> > >>> Replaces it with strcpy. One could argue that the opposite, replacing
> > >>> strcpy with stpcpy, is faster.
> >>>
> > >>> The reason is register pressure. strcpy needs an extra register to save
> > >>> the return value, while stpcpy already has its return value in the
> > >>> register used to write the terminating zero.
> >>
> >>
> >> Depends on your architecture.  On aarch64 we have plenty of spare
> >> registers, so strcpy simply copies the destination register into a
> >> scratch.  It then doesn't have to carefully calculate the return value
> >> at the end of the function (making the tail code simpler - there are
> >> multiple return statements, but only one entry point).
> >>
> > That's correct. The main saving comes from the return value being in the
> > first register, which forces an extra copy; that is suboptimal.
> 
> No, look at the AArch64 code.  The only time we ever end up with a
> simple MOV instruction to copy the register from one location to another
> is in the stPcpy code.  In strcpy it always ends up folded into some
> other operation that we have to do anyway.  Once it's been copied to
> that other register we never have to use it elsewhere again.
> Furthermore, the need to handle smallish copies with overlapping stores
> means we need both the original base address /and/ the final result, so
> we'd still need to end up saving it for stpcpy.
>
I wrote too fast; I was referring to the fact that you would need to copy it
for small sizes. With dest in a different register, I could make strcpy and
stpcpy cost the same number of instructions in most cases, except for sizes
8-15, by adjusting the offsets with some constants.
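To make the small-size case concrete, here is a minimal C sketch (not the AArch64 assembly under discussion) of the overlapping-store trick Richard mentions. It assumes the caller already knows `len = strlen(src) + 1` and that `len` is between 8 and 16; the names `copy_8_16`, `sketch_strcpy_small` and `sketch_stpcpy_small` are hypothetical. The copy itself is identical for both functions. Only the returned pointer differs, and the second store is addressed relative to the end, so both the base and the end pointer are live.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a string whose total size including the NUL
   (len = strlen(src) + 1) is 8..16 bytes, using two 8-byte loads and two
   8-byte stores that overlap when len < 16. */
static void copy_8_16(char *dst, const char *src, size_t len)
{
    uint64_t head, tail;
    memcpy(&head, src, 8);            /* first 8 bytes of the string */
    memcpy(&tail, src + len - 8, 8);  /* last 8 bytes, ending with the NUL */
    memcpy(dst, &head, 8);            /* store relative to the base */
    memcpy(dst + len - 8, &tail, 8);  /* store relative to the end */
}

/* strcpy-style result: the original destination. */
char *sketch_strcpy_small(char *dst, const char *src, size_t len)
{
    copy_8_16(dst, src, len);
    return dst;
}

/* stpcpy-style result: pointer to the terminating NUL in dst. */
char *sketch_stpcpy_small(char *dst, const char *src, size_t len)
{
    copy_8_16(dst, src, len);
    return dst + len - 1;
}
```

Note that `dst + len - 8` equals `end - 7`, where `end` is the stpcpy return value. This is the kind of constant offset adjustment that lets one register serve as both the store base and the result.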

Also, I think you could remove the extra instructions from the stpcpy loop
with the following change, which also removes one instruction from strcpy
if I read the code correctly.
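The patch itself is not reproduced here. As a rough illustration of the register-pressure argument in this thread, this C sketch shares one copy core between both functions. It is not the proposed change and not the AArch64 code. `copy_core`, `sketch_stpcpy` and `sketch_strcpy` are hypothetical names, and a plain byte loop stands in for the real word-at-a-time loop. The core ends holding a pointer to the NUL it just wrote, which is exactly the stpcpy result. strcpy instead has to keep the original `dst` live across the loop.

```c
#include <stddef.h>

/* Hypothetical shared core: copies src to dst, including the NUL, and
   returns a pointer to the NUL it wrote in dst. */
static char *copy_core(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;  /* the pointer that stored the terminator */
}

/* stpcpy: the result is already in hand when the loop ends. */
char *sketch_stpcpy(char *dst, const char *src)
{
    return copy_core(dst, src);
}

/* strcpy: one extra live value, the original destination. */
char *sketch_strcpy(char *dst, const char *src)
{
    char *ret = dst;
    copy_core(dst, src);
    return ret;
}
```

As Richard points out, whether that extra live value costs an instruction depends on the architecture. On AArch64 the copy of `dst` is usually folded into an operation the code performs anyway.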
