[v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

Message ID 20240618181407.2750948-1-lis8215@gmail.com
State Superseded
Headers
Series [v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed

Commit Message

Siarhei Volkau June 18, 2024, 6:14 p.m. UTC
  If the address register is dead after load/store operation it looks
beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
at least if optimizing for size.

Changes v1 -> v2:
 - switching to peephole2 approach
 - added test case

gcc/ChangeLog:

        * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
        (peephole2 to rewrite DI/DF store): New.
        (thumb1_movdi_insn): Handle overlapped regs ldmia case.
        (thumb_movdf_insn): Likewise.

	* config/arm/iterators.md (DIDF): New.

gcc/testsuite:

        * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
---
 gcc/config/arm/iterators.md                   |  3 +++
 gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
 .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
  

Comments

Richard Earnshaw (lists) June 19, 2024, 12:19 p.m. UTC | #1
On 18/06/2024 19:14, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> Changes v1 -> v2:
>  - switching to peephole2 approach
>  - added test case
> 
> gcc/ChangeLog:
> 
>         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>         (peephole2 to rewrite DI/DF store): New.
>         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>         (thumb_movdf_insn): Likewise.
> 
> 	* config/arm/iterators.md (DIDF): New.
> 
> gcc/testsuite:
> 
>         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> 
> Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
> ---
>  gcc/config/arm/iterators.md                   |  3 +++
>  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8d066fcf05d..09046bff83b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>  ;; A list of the 32bit and 64bit integer modes
>  (define_mode_iterator SIDI [SI DI])
>  
> +;; A list of the 64bit modes for thumb1.
> +(define_mode_iterator DIDF [DI DF])
> +
>  ;; A list of atomic compare and swap success return modes
>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>  
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..ed4b706773a 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>        gcc_assert (TARGET_HAVE_MOVT);
>        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>      case 4:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +	return \"ldmia\\t%m1, {%0, %H0}\";

See below for why I don't think this is a case we need to consider here.

>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 5:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>  	return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>      case 1:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +	return \"ldmia\\t%m1, {%0, %H0}\";
>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 2:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>     (set_attr "conds" "clob")
>     (set_attr "type" "multiple")]
>  )
> -
> +
> +;; match patterns usable by ldmia/stmia
> +(define_peephole2
> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> +	(mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> +  "TARGET_THUMB1
> +   && (peep2_reg_dead_p (1, operands[1])
> +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"

I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).

> +  [(set (match_dup 0)
> +	(mem:DIDF (post_inc:SI (match_dup 1))))]
> +  ""

This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:

     operands[1] = change_address (operands[1], VOIDmode,
				   gen_rtx_POST_INC (SImode,
						     XEXP (operands[1], 0)));

> +)
> +
> +(define_peephole2
> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> +	(match_operand:DIDF 0 "low_register_operand" ""))]
> +  "TARGET_THUMB1
> +   && peep2_reg_dead_p (1, operands[1])"
> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> +	(match_dup 0))]

There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:

  stmia r0!, {r0, r1}

is ok, but

  stmia r1!, {r0, r1}

is not.

> +  ""
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> new file mode 100644
> index 00000000000..167fa9ec876
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mthumb -Os" }  */
> +/* { dg-require-effective-target arm_thumb1_ok } */
> +
> +void copy_df(double *dst, const double *src)
> +{
> +    *dst = *src;
> +}
> +
> +void copy_di(unsigned long long *dst, const unsigned long long *src)
> +{
> +    *dst = *src;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */

R.
  
Siarhei Volkau June 19, 2024, 3:11 p.m. UTC | #2
ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com>:
>
> On 18/06/2024 19:14, Siarhei Volkau wrote:
> > If the address register is dead after load/store operation it looks
> > beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> > at least if optimizing for size.
> >
> > Changes v1 -> v2:
> >  - switching to peephole2 approach
> >  - added test case
> >
> > gcc/ChangeLog:
> >
> >         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
> >         (peephole2 to rewrite DI/DF store): New.
> >         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
> >         (thumb_movdf_insn): Likewise.
> >
> >       * config/arm/iterators.md (DIDF): New.
> >
> > gcc/testsuite:
> >
> >         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> >
> > Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
> > ---
> >  gcc/config/arm/iterators.md                   |  3 +++
> >  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
> >  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
> >  3 files changed, 45 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 8d066fcf05d..09046bff83b 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
> >  ;; A list of the 32bit and 64bit integer modes
> >  (define_mode_iterator SIDI [SI DI])
> >
> > +;; A list of the 64bit modes for thumb1.
> > +(define_mode_iterator DIDF [DI DF])
> > +
> >  ;; A list of atomic compare and swap success return modes
> >  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
> >
> > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > index d7074b43f60..ed4b706773a 100644
> > --- a/gcc/config/arm/thumb1.md
> > +++ b/gcc/config/arm/thumb1.md
> > @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
> >        gcc_assert (TARGET_HAVE_MOVT);
> >        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
> >      case 4:
> > +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> > +     return \"ldmia\\t%m1, {%0, %H0}\";
>
> See below for why I don't think this is a case we need to consider here.
>
> >        return \"ldmia\\t%1, {%0, %H0}\";
> >      case 5:
> >        return \"stmia\\t%0, {%1, %H1}\";
> > @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
> >       return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
> >        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
> >      case 1:
> > +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> > +     return \"ldmia\\t%m1, {%0, %H0}\";
> >        return \"ldmia\\t%1, {%0, %H0}\";
> >      case 2:
> >        return \"stmia\\t%0, {%1, %H1}\";
> > @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
> >     (set_attr "conds" "clob")
> >     (set_attr "type" "multiple")]
> >  )
> > -
> > +
> > +;; match patterns usable by ldmia/stmia
> > +(define_peephole2
> > +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> > +     (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> > +  "TARGET_THUMB1
> > +   && (peep2_reg_dead_p (1, operands[1])
> > +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
>
> I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).
>

The first condition here is for ldmia cases with writeback, the second
condition is for cases without writeback. I falsely thought that base
register has to be the last register in the list similarly like for
stmia it has to be first.
So, I have to reject the no-writeback case there because it emits
incorrect RTL, this is clear now.

However, to not miss possible optimization, no-writeback cases check
might be moved to instruction itself as it was done in v1 patch
(incorrect instruction length will appear again then).
Any objections on that?

> > +  [(set (match_dup 0)
> > +     (mem:DIDF (post_inc:SI (match_dup 1))))]
> > +  ""
>
> This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:
>
>      operands[1] = change_address (operands[1], VOIDmode,
>                                    gen_rtx_POST_INC (SImode,
>                                                      XEXP (operands[1], 0)));
>

Got it, will rewrite, thanks.

> > +)
> > +
> > +(define_peephole2
> > +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> > +     (match_operand:DIDF 0 "low_register_operand" ""))]
> > +  "TARGET_THUMB1
> > +   && peep2_reg_dead_p (1, operands[1])"
> > +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> > +     (match_dup 0))]
>
> There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:
>
>   stmia r0!, {r0, r1}
>
> is ok, but
>
>   stmia r1!, {r0, r1}
>
> is not.
>

I don't think that base register will appear also in the list here,
because it can't be Pmode and DI/DFmode at the same time, can it?

> > +  ""
> > +)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> > new file mode 100644
> > index 00000000000..167fa9ec876
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mthumb -Os" }  */
> > +/* { dg-require-effective-target arm_thumb1_ok } */
> > +
> > +void copy_df(double *dst, const double *src)
> > +{
> > +    *dst = *src;
> > +}
> > +
> > +void copy_di(unsigned long long *dst, const unsigned long long *src)
> > +{
> > +    *dst = *src;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> > +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
>
> R.
>

BR,
Siarhei
  
Richard Earnshaw (lists) June 19, 2024, 3:32 p.m. UTC | #3
On 19/06/2024 16:11, Siarhei Volkau wrote:
> ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com>:
>>
>> On 18/06/2024 19:14, Siarhei Volkau wrote:
>>> If the address register is dead after load/store operation it looks
>>> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
>>> at least if optimizing for size.
>>>
>>> Changes v1 -> v2:
>>>  - switching to peephole2 approach
>>>  - added test case
>>>
>>> gcc/ChangeLog:
>>>
>>>         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>>>         (peephole2 to rewrite DI/DF store): New.
>>>         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>>>         (thumb_movdf_insn): Likewise.
>>>
>>>       * config/arm/iterators.md (DIDF): New.
>>>
>>> gcc/testsuite:
>>>
>>>         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
>>>
>>> Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
>>> ---
>>>  gcc/config/arm/iterators.md                   |  3 +++
>>>  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
>>>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
>>>  3 files changed, 45 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>>
>>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>>> index 8d066fcf05d..09046bff83b 100644
>>> --- a/gcc/config/arm/iterators.md
>>> +++ b/gcc/config/arm/iterators.md
>>> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>>>  ;; A list of the 32bit and 64bit integer modes
>>>  (define_mode_iterator SIDI [SI DI])
>>>
>>> +;; A list of the 64bit modes for thumb1.
>>> +(define_mode_iterator DIDF [DI DF])
>>> +
>>>  ;; A list of atomic compare and swap success return modes
>>>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>>>
>>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
>>> index d7074b43f60..ed4b706773a 100644
>>> --- a/gcc/config/arm/thumb1.md
>>> +++ b/gcc/config/arm/thumb1.md
>>> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>>>        gcc_assert (TARGET_HAVE_MOVT);
>>>        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>>>      case 4:
>>> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
>>> +     return \"ldmia\\t%m1, {%0, %H0}\";
>>
>> See below for why I don't think this is a case we need to consider here.
>>
>>>        return \"ldmia\\t%1, {%0, %H0}\";
>>>      case 5:
>>>        return \"stmia\\t%0, {%1, %H1}\";
>>> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>>>       return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>>>        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>>>      case 1:
>>> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
>>> +     return \"ldmia\\t%m1, {%0, %H0}\";
>>>        return \"ldmia\\t%1, {%0, %H0}\";
>>>      case 2:
>>>        return \"stmia\\t%0, {%1, %H1}\";
>>> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>>>     (set_attr "conds" "clob")
>>>     (set_attr "type" "multiple")]
>>>  )
>>> -
>>> +
>>> +;; match patterns usable by ldmia/stmia
>>> +(define_peephole2
>>> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
>>> +     (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
>>> +  "TARGET_THUMB1
>>> +   && (peep2_reg_dead_p (1, operands[1])
>>> +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
>>
>> I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).
>>
> 
> The first condition here is for ldmia cases with writeback, the second
> condition is for cases without writeback. I falsely thought that base
> register has to be the last register in the list similarly like for
> stmia it has to be first.
> So, I have to reject the no-writeback case there because it emits
> incorrect RTL, this is clear now.
> 
> However, to not miss possible optimization, no-writeback cases check
> might be moved to instruction itself as it was done in v1 patch
> (incorrect instruction length will appear again then).
> Any objections on that?

Yes, I think the no-writeback case will need to be handled inside the code that prints the instruction (like you were doing originally).  We'll just have to forgo the knowledge that this case only needs 2 bytes for the instruction here (it's not a major loss).

> 
>>> +  [(set (match_dup 0)
>>> +     (mem:DIDF (post_inc:SI (match_dup 1))))]
>>> +  ""
>>
>> This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:
>>
>>      operands[1] = change_address (operands[1], VOIDmode,
>>                                    gen_rtx_POST_INC (SImode,
>>                                                      XEXP (operands[1], 0)));
>>
> 
> Got it, will rewrite, thanks.
> 
>>> +)
>>> +
>>> +(define_peephole2
>>> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
>>> +     (match_operand:DIDF 0 "low_register_operand" ""))]
>>> +  "TARGET_THUMB1
>>> +   && peep2_reg_dead_p (1, operands[1])"
>>> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
>>> +     (match_dup 0))]
>>
>> There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:
>>
>>   stmia r0!, {r0, r1}
>>
>> is ok, but
>>
>>   stmia r1!, {r0, r1}
>>
>> is not.
>>
> 
> I don't think that base register will appear also in the list here,
> because it can't be Pmode and DI/DFmode at the same time, can it?

It's highly unlikely (as I said), but it could theoretically happen with some very weird casts.  Something a bit like

int64_t f;
...
*(int64_t *)f = f;

might trigger it.  This would be OK when compiling little-endian code, as the low bits used for the pointer are in the first transfer value; but incorrect on big-endian since then we'd be using the second transfer register for the pointer.

But compilers are all about weird corner cases :)

R.

> 
>>> +  ""
>>> +)
>>> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>> new file mode 100644
>>> index 00000000000..167fa9ec876
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mthumb -Os" }  */
>>> +/* { dg-require-effective-target arm_thumb1_ok } */
>>> +
>>> +void copy_df(double *dst, const double *src)
>>> +{
>>> +    *dst = *src;
>>> +}
>>> +
>>> +void copy_di(unsigned long long *dst, const unsigned long long *src)
>>> +{
>>> +    *dst = *src;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
>>> +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
>>
>> R.
>>
> 
> BR,
> Siarhei
  

Patch

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 8d066fcf05d..09046bff83b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -50,6 +50,9 @@  (define_mode_iterator QHSD [QI HI SI DI])
 ;; A list of the 32bit and 64bit integer modes
 (define_mode_iterator SIDI [SI DI])
 
+;; A list of the 64bit modes for thumb1.
+(define_mode_iterator DIDF [DI DF])
+
 ;; A list of atomic compare and swap success return modes
 (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
 
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index d7074b43f60..ed4b706773a 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -633,6 +633,8 @@  (define_insn "*thumb1_movdi_insn"
       gcc_assert (TARGET_HAVE_MOVT);
       return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
     case 4:
+      if (reg_overlap_mentioned_p (operands[0], operands[1]))
+	return \"ldmia\\t%m1, {%0, %H0}\";
       return \"ldmia\\t%1, {%0, %H0}\";
     case 5:
       return \"stmia\\t%0, {%1, %H1}\";
@@ -966,6 +968,8 @@  (define_insn "*thumb_movdf_insn"
 	return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
       return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
     case 1:
+      if (reg_overlap_mentioned_p (operands[0], operands[1]))
+	return \"ldmia\\t%m1, {%0, %H0}\";
       return \"ldmia\\t%1, {%0, %H0}\";
     case 2:
       return \"stmia\\t%0, {%1, %H1}\";
@@ -2055,4 +2059,25 @@  (define_insn "thumb1_stack_protect_test_insn"
    (set_attr "conds" "clob")
    (set_attr "type" "multiple")]
 )
-
+
+;; match patterns usable by ldmia/stmia
+(define_peephole2
+  [(set (match_operand:DIDF 0 "low_register_operand" "")
+	(mem:DIDF (match_operand:SI 1 "low_register_operand")))]
+  "TARGET_THUMB1
+   && (peep2_reg_dead_p (1, operands[1])
+       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
+  [(set (match_dup 0)
+	(mem:DIDF (post_inc:SI (match_dup 1))))]
+  ""
+)
+
+(define_peephole2
+  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
+	(match_operand:DIDF 0 "low_register_operand" ""))]
+  "TARGET_THUMB1
+   && peep2_reg_dead_p (1, operands[1])"
+  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
+	(match_dup 0))]
+  ""
+)
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
new file mode 100644
index 00000000000..167fa9ec876
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mthumb -Os" }  */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+void copy_df(double *dst, const double *src)
+{
+    *dst = *src;
+}
+
+void copy_di(unsigned long long *dst, const unsigned long long *src)
+{
+    *dst = *src;
+}
+
+/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
+/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */