aarch64: Thunderx specific memcpy and memmove
Commit Message
Now that the IFUNC infrastructure for aarch64 is in place, here is a
patch that uses it to create ThunderX-specific versions of memcpy and
memmove.
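
For readers new to IFUNC: a resolver runs once, inspects the CPU, and returns the variant that later calls use. The portable sketch below mimics that decision with an ordinary function pointer so it compiles anywhere. The MIDR_EL1 bit layout is architectural. The ThunderX identification values ('C' implementer, part number 0x0a1), the sample MIDR values, and all function names are assumptions for illustration, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIDR_EL1 fields: implementer in bits [31:24], part number in [15:4].  */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

/* Assumed ThunderX identification: implementer 'C' (0x43, Cavium),
   part number 0x0a1.  Verify against the real CPU feature headers.  */
#define IS_THUNDERX(midr) \
  (MIDR_IMPLEMENTER (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1)

/* Compile-time check of the field extraction on a sample MIDR value.  */
static_assert (MIDR_PARTNUM (0x430f0a10u) == 0x0a1, "part number field");

typedef void *copy_fn (void *, const void *, size_t);

/* Hypothetical stand-ins for __memcpy_thunderx and __memcpy_generic.  */
static void *
copy_thunderx_sketch (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
copy_generic_sketch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Plays the role of the IFUNC resolver, which the dynamic linker calls
   once to pick the variant that every later call will use.  */
static copy_fn *
select_copy (uint64_t midr)
{
  return IS_THUNDERX (midr) ? copy_thunderx_sketch : copy_generic_sketch;
}
```

In the patch itself the MIDR value presumably comes from the CPU information set up via init-arch.h rather than from a parameter; the selection logic is the part this sketch is meant to show.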

This was part of my original patch before it was split in two, and a
couple of issues were raised at that time.

Siddhesh Poyarekar wanted to separate the generic and ThunderX copies
of memcpy/memmove instead of using ifdefs in a combined source file.
I prefer the ifdef version as a cleaner implementation with less code
duplication, but I can change it if that is the consensus.

Also, Adhemerval Zanella did some benchmarking that showed the
prefetching done in the ThunderX version might be appropriate for the
generic version. However, if you look at the prefetching, we only do it
every other time through the loop. This is because each loop iteration
copies 64 bytes and the ThunderX cache line size is 128 bytes, so one
prefetch every two iterations covers one cache line. Other aarch64
chips with a 64-byte cache line might want a different prefetching
setup.
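
The every-other-iteration pattern can be sketched in C. This is not the patch's assembly: `__builtin_prefetch` (a GCC/Clang builtin whose hint affects only timing, never the copied data) stands in for the prefetch instruction, and `PREFETCH_AHEAD` is a hypothetical distance. The line size is a parameter so the trade-off raised above is visible: with 128-byte lines the prefetch step runs on every other 64-byte iteration, with 64-byte lines on every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOOP_BYTES 64        /* bytes copied per loop iteration */
#define PREFETCH_AHEAD 512   /* hypothetical prefetch distance in bytes */

/* Copies n bytes (a multiple of LOOP_BYTES) in 64-byte steps and reaches
   the prefetch step once per LINE_SIZE-byte cache line.  With a 128-byte
   line that is every other iteration; with a 64-byte line it is every
   iteration.  Returns how many iterations reached the prefetch step.  */
static size_t
copy_with_line_prefetch (unsigned char *dst, const unsigned char *src,
                         size_t n, size_t line_size)
{
  size_t prefetch_steps = 0;

  assert (n % LOOP_BYTES == 0);
  assert (line_size >= LOOP_BYTES && line_size % LOOP_BYTES == 0);

  for (size_t off = 0; off < n; off += LOOP_BYTES)
    {
      if (off % line_size == 0)
        {
          /* Only form the prefetch address while it stays in the buffer.  */
          if (off + PREFETCH_AHEAD < n)
            __builtin_prefetch (src + off + PREFETCH_AHEAD, 0, 0);
          prefetch_steps++;
        }
      memcpy (dst + off, src + off, LOOP_BYTES);
    }
  return prefetch_steps;
}
```

For a 1024-byte copy this gives 16 iterations, of which 8 reach the prefetch step with a 128-byte line and all 16 with a 64-byte line.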

If people think we should use the ThunderX version of memcpy for all
aarch64 systems, I am happy to drop this patch and create one that just
changes memcpy.S to do the ThunderX-style prefetches on all aarch64
systems.

Steve Ellcey
sellcey@cavium.com

2017-03-24  Steve Ellcey  <sellcey@caviumnetworks.com>

	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
	(memmove): Use MEMMOVE for name.
	(memcpy): Use MEMCPY for name.  Add loop with prefetching
	under USE_THUNDERX macro.
	* sysdeps/aarch64/multiarch/Makefile: New file.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
	* sysdeps/aarch64/multiarch/init-arch.h: Likewise.
	* sysdeps/aarch64/multiarch/memcpy.c: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
Comments
On 24/03/17 23:25, Steve Ellcey wrote:
> Now that the IFUNC infrastructure for aarch64 is in place, here is a
> patch to use it to create ThunderX specific versions of memcpy and
> memmove.
>
> This was part of my original patch before it was split in two and a
> couple of issues were raised at that time.
>
> Siddhesh Poyarekar wanted to separate the generic and thunderx copies
> of memcpy/memmove instead of using ifdefs in a combined source file.
> I prefer the ifdef version as a cleaner implementation with less code
> duplication but I can change it if that is the consensus.
>
Both are fine with me.
> Also Adhemerval Zanella did some benchmarking that showed the
> prefetching done in the thunderx version might be appropriate for the
> generic version. However if you look at the prefetching we only do it
> every other time through the loop. This is because the loop copies 64
> bytes and the ThunderX cache line size is 128 bytes. If other aarch64
> chips have a 64 byte cache line they might want a different prefetching
> setup.
>
> If people think we should use the ThunderX version of memcpy for all
> aarch64 systems I am happy to drop this patch and create one that just
> changes memcpy.S to do the ThunderX style prefetches for all aarch64
> systems.
>
Adding prefetches to the generic code is preferable
if it can make both ThunderX and generic users happy.
We need to find the best way to add the prefetches;
the new memcpy benchmarks may help here.
On Fri, Mar 24, 2017 at 11:25 PM, Steve Ellcey
<sellcey@caviumnetworks.com> wrote:
> Also Adhemerval Zanella did some benchmarking that showed the
> prefetching done in the thunderx version might be appropriate for the
> generic version. However if you look at the prefetching we only do it
> every other time through the loop. This is because the loop copies 64
> bytes and the ThunderX cache line size is 128 bytes. If other aarch64
> chips have a 64 byte cache line they might want a different prefetching
> setup.
Can you link to the benchmark numbers, workloads, and systems used?
Ramana
On Mon, 2017-03-27 at 11:52 +0100, Ramana Radhakrishnan wrote:
>
> > Also Adhemerval Zanella did some benchmarking that showed the
> > prefetching done in the thunderx version might be appropriate for the
> > generic version. However if you look at the prefetching we only do it
> > every other time through the loop. This is because the loop copies 64
> > bytes and the ThunderX cache line size is 128 bytes. If other aarch64
> > chips have a 64 byte cache line they might want a different prefetching
> > setup.
> Can you link to the benchmark numbers, workloads and what systems ?
>
> Ramana
The only reference I have to Adhemerval's results is at:
https://sourceware.org/ml/libc-alpha/2017-02/msg00118.html
Attached are my latest results on ThunderX, with the IFUNC numbers from
the glibc memcpy performance benchmarks. They include the new
bench-memcpy-random benchmark, which doesn't show much difference. It is
really bench-memcpy-large that stands out.
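
The separate `__memcpy_thunderx` and `__memcpy_generic` columns come from the benchmarks calling each IFUNC variant directly, which, as I understand it, is what ifunc-impl-list.c in the ChangeLog provides. A minimal sketch of that idea, assuming a hand-written table of hypothetical implementations, runs every variant over a grid of lengths and source/destination alignments and checks it against a reference copy. The real benchmarks also time each call; this sketch leaves timing out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *copy_fn (void *, const void *, size_t);

/* Hypothetical stand-ins for the variants an implementation list exposes.  */
static void *
copy_bytewise (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

static void *
copy_libc (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

struct impl { const char *name; copy_fn *fn; };

static const struct impl impls[] = {
  { "copy_bytewise", copy_bytewise },
  { "copy_libc", copy_libc },
};

/* Runs every implementation over the same (length, source alignment,
   destination alignment) grid and returns the number of mismatches
   against a reference copy.  Comparing the whole destination buffer
   also catches writes outside the requested range.  */
static int
check_all_impls (void)
{
  static unsigned char src[4096 + 64], dst[4096 + 64], ref[4096 + 64];
  int failures = 0;

  for (size_t i = 0; i < sizeof src; i++)
    src[i] = (unsigned char) (i * 31u + 7u);

  for (size_t k = 0; k < sizeof impls / sizeof impls[0]; k++)
    for (size_t len = 1; len <= 4096; len *= 2)
      for (size_t a1 = 0; a1 < 16; a1 += 3)
        for (size_t a2 = 0; a2 < 16; a2 += 5)
          {
            assert (a1 + len <= sizeof src && a2 + len <= sizeof dst);
            memset (dst, 0, sizeof dst);
            memset (ref, 0, sizeof ref);
            memcpy (ref + a2, src + a1, len);
            impls[k].fn (dst + a2, src + a1, len);
            if (memcmp (dst, ref, sizeof dst) != 0)
              failures++;
          }
  return failures;
}
```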
Steve Ellcey
sellcey@cavium.com
bench-memcpy results (timings, lower is better):
Length, alignment:  builtin_memcpy  simple_memcpy  __memcpy_thunderx  __memcpy_generic
Length 1, alignment 0/ 0: 39.2188 18.75 23.4375 23.125
Length 1, alignment 0/ 0: 27.0312 17.6562 23.125 23.125
Length 1, alignment 0/ 0: 27.0312 17.0312 22.9688 22.8125
Length 1, alignment 0/ 0: 27.9688 17.1875 26.875 24.375
Length 2, alignment 0/ 0: 27.0312 26.875 23.125 23.2812
Length 2, alignment 1/ 0: 27.0312 25.625 23.125 22.8125
Length 2, alignment 0/ 1: 27.0312 25.1562 22.9688 22.8125
Length 2, alignment 1/ 1: 26.875 25.4688 22.9688 22.8125
Length 4, alignment 0/ 0: 26.25 26.7188 21.25 20.9375
Length 4, alignment 2/ 0: 25 25.9375 20.9375 20.7812
Length 4, alignment 0/ 2: 24.6875 25.9375 21.0938 20.7812
Length 4, alignment 2/ 2: 24.8438 25.4688 20.7812 20.625
Length 8, alignment 0/ 0: 24.2188 38.5938 19.6875 19.8438
Length 8, alignment 3/ 0: 34.2188 37.1875 28.9062 28.75
Length 8, alignment 0/ 3: 35.7812 36.875 30.4688 30.3125
Length 8, alignment 3/ 3: 44.2188 36.875 38.9062 38.5938
Length 16, alignment 0/ 0: 23.75 75 19.5312 19.375
Length 16, alignment 4/ 0: 34.0625 74.5312 28.9062 28.5938
Length 16, alignment 0/ 4: 35.9375 74.375 30.4688 30.3125
Length 16, alignment 4/ 4: 44.2188 74.5312 38.9062 38.5938
Length 32, alignment 0/ 0: 25.3125 110 19.6875 19.0625
Length 32, alignment 5/ 0: 35.3125 110 30.3125 30
Length 32, alignment 0/ 5: 35.3125 110.156 30.1562 30
Length 32, alignment 5/ 5: 45.3125 110 40 40
Length 64, alignment 0/ 0: 26.25 198.906 21.25 21.25
Length 64, alignment 6/ 0: 45 198.906 39.6875 39.8438
Length 64, alignment 0/ 6: 46.5625 198.75 41.25 41.25
Length 64, alignment 6/ 6: 64.375 198.906 59.2188 58.9062
Length 128, alignment 0/ 0: 34.0625 376.875 29.6875 27.9688
Length 128, alignment 7/ 0: 75.625 376.719 71.25 70
Length 128, alignment 0/ 7: 77.8125 376.875 73.5938 71.25
Length 128, alignment 7/ 7: 80.625 376.562 75.9375 74.6875
Length 256, alignment 0/ 0: 44.375 732.344 39.0625 41.0938
Length 256, alignment 8/ 0: 120.312 732.188 116.094 121.406
Length 256, alignment 0/ 8: 122.5 732.344 118.438 122.812
Length 256, alignment 8/ 8: 90.3125 732.344 86.0938 88.4375
Length 512, alignment 0/ 0: 64.375 1443.44 59.375 57.6562
Length 512, alignment 9/ 0: 216.406 1443.59 212.812 211.25
Length 512, alignment 0/ 9: 218.594 1443.44 214.844 212.656
Length 512, alignment 9/ 9: 110.469 1443.44 106.25 104.844
Length 1024, alignment 0/ 0: 107.344 2865.94 103.281 101.719
Length 1024, alignment 10/ 0: 414.219 2866.09 410.312 405.312
Length 1024, alignment 0/10: 416.094 2865.47 412.344 406.562
Length 1024, alignment 10/10: 154.219 2865 150 147.812
Length 2048, alignment 0/ 0: 216.406 5714.69 212.969 209.531
Length 2048, alignment 11/ 0: 793.281 5710.47 789.844 787.969
Length 2048, alignment 0/11: 796.094 5710.62 791.875 789.688
Length 2048, alignment 11/11: 262.344 5710.62 259.219 254.844
Length 4096, alignment 0/ 0: 408.75 11399.7 406.094 398.75
Length 4096, alignment 12/ 0: 1558.28 11399.7 1555.78 1552.97
Length 4096, alignment 0/12: 1559.84 11400 1556.88 1554.22
Length 4096, alignment 12/12: 455.312 11399.7 452.5 445.312
Length 8192, alignment 0/ 0: 796.094 22779.5 944.375 782.344
Length 8192, alignment 13/ 0: 3089.38 22779.7 3084.53 3082.81
Length 8192, alignment 0/13: 3091.56 22922.8 3087.5 3085.16
Length 8192, alignment 13/13: 841.875 22779.8 838.906 827.031
Length 16384, alignment 0/ 0: 1585.78 45738.1 1579.22 1567.66
Length 16384, alignment 14/ 0: 6164.69 45726.9 6155.31 6166.88
Length 16384, alignment 0/14: 6160.94 45736.9 6158.75 6166.88
Length 16384, alignment 14/14: 1624.84 45793.9 1622.03 1608.75
Length 32768, alignment 0/ 0: 3905.47 93004.7 3902.34 4998.44
Length 32768, alignment 15/ 0: 13493.4 92454.4 13462.8 14771.6
Length 32768, alignment 0/15: 13685.5 92742.5 13495 13854.5
Length 32768, alignment 15/15: 4035.31 92889.4 4008.44 4661.09
Length 65536, alignment 0/ 0: 8697.66 193559 8674.38 16843.6
Length 65536, alignment 16/ 0: 8698.12 193557 8677.5 17120.6
Length 65536, alignment 0/16: 8845.62 193541 8678.12 16837.8
Length 65536, alignment 16/16: 8834.38 193557 8679.53 17148.3
Length 0, alignment 0/ 0: 28.2812 18.2812 23.4375 23.5938
Length 0, alignment 0/ 0: 27.3438 17.8125 23.2812 23.4375
Length 0, alignment 0/ 0: 27.3438 17.8125 23.2812 23.2812
Length 0, alignment 0/ 0: 27.1875 17.8125 23.2812 23.2812
Length 1, alignment 0/ 0: 27.3438 17.6562 22.8125 22.9688
Length 1, alignment 1/ 0: 27.0312 17.3438 22.9688 22.9688
Length 1, alignment 0/ 1: 27.0312 17.3438 22.9688 22.9688
Length 1, alignment 1/ 1: 27.0312 17.1875 22.9688 22.8125
Length 2, alignment 0/ 0: 27.1875 25.3125 22.9688 22.8125
Length 2, alignment 2/ 0: 27.0312 25.4688 22.9688 22.9688
Length 2, alignment 0/ 2: 27.0312 25.1562 22.9688 22.8125
Length 2, alignment 2/ 2: 27.1875 25.1562 22.9688 22.8125
Length 3, alignment 0/ 0: 27.0312 24.6875 22.9688 22.8125
Length 3, alignment 3/ 0: 27.0312 22.9688 22.9688 22.9688
Length 3, alignment 0/ 3: 27.0312 22.6562 22.9688 22.9688
Length 3, alignment 3/ 3: 27.0312 22.6562 22.9688 22.8125
Length 4, alignment 0/ 0: 25.3125 26.0938 21.0938 20.7812
Length 4, alignment 4/ 0: 25 25.9375 20.7812 20.7812
Length 4, alignment 0/ 4: 25 25.4688 20.9375 20.7812
Length 4, alignment 4/ 4: 24.8438 25.625 20.7812 20.625
Length 5, alignment 0/ 0: 25 29.6875 20.7812 20.625
Length 5, alignment 5/ 0: 34.5312 28.5938 30.4688 30.3125
Length 5, alignment 0/ 5: 36.25 28.2812 31.875 31.875
Length 5, alignment 5/ 5: 44.5312 28.2812 40.3125 40.1562
Length 6, alignment 0/ 0: 25 32.1875 22.9688 20.625
Length 6, alignment 6/ 0: 29.5312 31.5625 25.4688 25.1562
Length 6, alignment 0/ 6: 30.9375 30.9375 27.0312 26.7188
Length 6, alignment 6/ 6: 35 31.0938 30.7812 30.7812
Length 7, alignment 0/ 0: 24.8438 35.4688 20.7812 20.625
Length 7, alignment 7/ 0: 29.375 34.6875 25.1562 25
Length 7, alignment 0/ 7: 31.0938 34.2188 27.0312 26.875
Length 7, alignment 7/ 7: 35.1562 34.0625 30.9375 30.625
Length 8, alignment 0/ 0: 24.0625 37.5 19.8438 19.2188
Length 8, alignment 8/ 0: 23.5938 37.5 19.375 19.375
Length 8, alignment 0/ 8: 23.5938 37.1875 19.5312 19.2188
Length 8, alignment 8/ 8: 23.75 37.1875 19.5312 19.2188
Length 9, alignment 0/ 0: 35.3125 40.7812 29.8438 29.6875
Length 9, alignment 9/ 0: 39.6875 40.3125 34.5312 34.2188
Length 9, alignment 0/ 9: 39.8438 40 34.375 34.2188
Length 9, alignment 9/ 9: 44.2188 40 38.75 38.5938
Length 10, alignment 0/ 0: 35.3125 43.5938 30 29.8438
Length 10, alignment 10/ 0: 39.6875 42.9688 34.375 34.2188
Length 10, alignment 0/10: 39.6875 42.8125 34.375 34.2188
Length 10, alignment 10/10: 44.2188 42.8125 38.75 38.5938
Length 11, alignment 0/ 0: 35.3125 46.25 29.8438 29.6875
Length 11, alignment 11/ 0: 39.8438 45.9375 34.375 34.0625
Length 11, alignment 0/11: 39.6875 45.625 34.5312 34.2188
Length 11, alignment 11/11: 44.0625 45.7812 38.75 38.5938
Length 12, alignment 0/ 0: 35.3125 48.9062 29.8438 29.6875
Length 12, alignment 12/ 0: 35.9375 48.4375 30.625 30.3125
Length 12, alignment 0/12: 34.8438 48.2812 29.375 29.0625
Length 12, alignment 12/12: 34.6875 48.2812 29.375 29.2188
Length 13, alignment 0/ 0: 35.1562 51.25 30 29.6875
Length 13, alignment 13/ 0: 39.8438 51.0938 34.375 34.0625
Length 13, alignment 0/13: 39.6875 51.25 34.375 34.2188
Length 13, alignment 13/13: 44.2188 51.25 38.9062 38.5938
Length 14, alignment 0/ 0: 35.3125 58.4375 29.8438 29.6875
Length 14, alignment 14/ 0: 39.8438 58.4375 34.375 34.2188
Length 14, alignment 0/14: 39.6875 58.4375 34.5312 34.375
Length 14, alignment 14/14: 44.2188 58.4375 38.9062 38.75
Length 15, alignment 0/ 0: 35.3125 72.0312 29.8438 29.6875
Length 15, alignment 15/ 0: 39.8438 71.875 34.5312 34.375
Length 15, alignment 0/15: 39.6875 72.0312 34.375 34.2188
Length 15, alignment 15/15: 44.2188 72.0312 38.75 38.75
Length 16, alignment 0/ 0: 23.75 74.8438 19.5312 19.375
Length 16, alignment 16/ 0: 23.5938 74.6875 19.375 19.2188
Length 16, alignment 0/16: 23.5938 74.5312 19.2188 17.6562
Length 16, alignment 16/16: 23.5938 74.5312 19.375 19.2188
Length 17, alignment 0/ 0: 36.7188 68.4375 31.5625 31.4062
Length 17, alignment 17/ 0: 40.9375 68.2812 35.7812 35.7812
Length 17, alignment 0/17: 40.4688 68.4375 35.625 35.4688
Length 17, alignment 17/17: 45 68.4375 40 39.8438
Length 18, alignment 0/ 0: 36.0938 71.25 31.25 30.9375
Length 18, alignment 18/ 0: 40.625 71.0938 35.625 35.4688
Length 18, alignment 0/18: 40.4688 71.25 35.625 35.3125
Length 18, alignment 18/18: 45 71.0938 40.1562 39.8438
Length 19, alignment 0/ 0: 36.0938 73.9062 31.25 31.0938
Length 19, alignment 19/ 0: 40.4688 74.0625 35.625 35.4688
Length 19, alignment 0/19: 40.4688 73.9062 35.7812 35.4688
Length 19, alignment 19/19: 45 73.9062 40 39.8438
Length 20, alignment 0/ 0: 36.0938 76.7188 31.0938 30.9375
Length 20, alignment 20/ 0: 40.625 76.7188 35.7812 35.625
Length 20, alignment 0/20: 40.4688 76.7188 35.7812 35.4688
Length 20, alignment 20/20: 45 76.5625 40.1562 39.8438
Length 21, alignment 0/ 0: 36.0938 79.5312 31.25 30.9375
Length 21, alignment 21/ 0: 40.625 79.5312 35.625 35.4688
Length 21, alignment 0/21: 40.4688 79.5312 35.625 35.4688
Length 21, alignment 21/21: 45 79.5312 40.1562 40
Length 22, alignment 0/ 0: 36.0938 82.3438 31.0938 30.9375
Length 22, alignment 22/ 0: 40.625 82.3438 35.625 35.4688
Length 22, alignment 0/22: 40.625 82.3438 35.7812 35.4688
Length 22, alignment 22/22: 45 82.1875 40.1562 40
Length 23, alignment 0/ 0: 36.0938 85.1562 31.25 30.9375
Length 23, alignment 23/ 0: 40.4688 85.1562 35.7812 35.625
Length 23, alignment 0/23: 40.4688 85.1562 35.7812 35.4688
Length 23, alignment 23/23: 45 85 40.1562 39.8438
Length 24, alignment 0/ 0: 36.0938 87.8125 31.25 31.0938
Length 24, alignment 24/ 0: 35.4688 87.8125 30.625 30.4688
Length 24, alignment 0/24: 35.4688 87.8125 30.625 30.4688
Length 24, alignment 24/24: 35 87.8125 30.1562 30
Length 25, alignment 0/ 0: 36.0938 90.625 31.0938 30.9375
Length 25, alignment 25/ 0: 40.625 233.906 36.0938 35.3125
Length 25, alignment 0/25: 40.9375 90.7812 35.7812 35.4688
Length 25, alignment 25/25: 45.3125 90.4688 40 39.8438
Length 26, alignment 0/ 0: 36.0938 93.2812 31.0938 30.9375
Length 26, alignment 26/ 0: 40.4688 93.2812 35.625 35.4688
Length 26, alignment 0/26: 40.625 93.2812 35.625 35.4688
Length 26, alignment 26/26: 45 93.2812 40.1562 40
Length 27, alignment 0/ 0: 36.0938 96.0938 31.0938 30.9375
Length 27, alignment 27/ 0: 40.625 96.0938 35.625 35.4688
Length 27, alignment 0/27: 40.625 96.0938 35.7812 35.4688
Length 27, alignment 27/27: 45 96.0938 40 39.8438
Length 28, alignment 0/ 0: 36.0938 98.9062 31.25 31.0938
Length 28, alignment 28/ 0: 40.4688 98.75 35.625 35.625
Length 28, alignment 0/28: 40.4688 98.9062 35.625 35.4688
Length 28, alignment 28/28: 45 99.0625 40.1562 39.8438
Length 29, alignment 0/ 0: 36.25 101.719 31.25 31.0938
Length 29, alignment 29/ 0: 40.4688 101.719 35.625 35.3125
Length 29, alignment 0/29: 40.4688 101.719 35.625 35.4688
Length 29, alignment 29/29: 45 101.719 40 39.8438
Length 30, alignment 0/ 0: 36.0938 104.531 31.0938 31.0938
Length 30, alignment 30/ 0: 40.625 104.375 35.625 35.4688
Length 30, alignment 0/30: 40.625 104.375 35.7812 35.4688
Length 30, alignment 30/30: 45 104.531 40.1562 39.8438
Length 31, alignment 0/ 0: 36.0938 107.344 31.25 30.9375
Length 31, alignment 31/ 0: 40.4688 107.344 35.7812 35.625
Length 31, alignment 0/31: 40.4688 107.188 35.7812 35.4688
Length 31, alignment 31/31: 45 107.344 40 40
Length 48, alignment 0/ 0: 25.4688 154.375 20.9375 20.9375
Length 48, alignment 3/ 0: 44.5312 154.375 39.6875 39.6875
Length 48, alignment 0/ 3: 46.25 154.375 41.25 41.0938
Length 48, alignment 3/ 3: 64.0625 154.375 59.2188 58.9062
Length 80, alignment 0/ 0: 27.9688 243.281 23.125 22.5
Length 80, alignment 5/ 0: 57.5 243.281 53.125 53.125
Length 80, alignment 0/ 5: 56.875 243.281 52.0312 51.875
Length 80, alignment 5/ 5: 87.5 243.281 82.8125 82.5
Length 96, alignment 0/ 0: 27.1875 287.656 22.6562 22.3438
Length 96, alignment 6/ 0: 57.8125 287.656 53.4375 53.2812
Length 96, alignment 0/ 6: 57.1875 287.656 52.0312 51.875
Length 96, alignment 6/ 6: 87.6562 287.656 82.5 82.3438
Length 112, alignment 0/ 0: 33.5938 332.344 29.5312 27.0312
Length 112, alignment 7/ 0: 75.3125 332.188 71.4062 69.6875
Length 112, alignment 0/ 7: 77.6562 332.188 73.5938 71.25
Length 112, alignment 7/ 7: 80.3125 332.188 76.25 74.5312
Length 144, alignment 0/ 0: 33.2812 421.25 29.0625 26.875
Length 144, alignment 9/ 0: 75.3125 421.25 71.0938 69.6875
Length 144, alignment 0/ 9: 99.5312 421.094 95.1562 93.9062
Length 144, alignment 9/ 9: 84.8438 421.25 80.9375 79.5312
Length 160, alignment 0/ 0: 37.5 465.625 33.9062 31.7188
Length 160, alignment 10/ 0: 96.5625 465.625 92.6562 90.9375
Length 160, alignment 0/10: 98.75 465.625 94.8438 92.8125
Length 160, alignment 10/10: 84.8438 465.625 80.9375 79.2188
Length 176, alignment 0/ 0: 37.6562 510.156 33.9062 31.5625
Length 176, alignment 11/ 0: 96.5625 510 92.6562 91.0938
Length 176, alignment 0/11: 98.75 510 95 92.8125
Length 176, alignment 11/11: 84.8438 510 80.9375 79.2188
Length 192, alignment 0/ 0: 37.6562 554.531 33.75 31.5625
Length 192, alignment 12/ 0: 96.4062 554.531 92.6562 91.0938
Length 192, alignment 0/12: 98.75 554.531 94.8438 92.6562
Length 192, alignment 12/12: 84.6875 554.531 232.656 79.6875
Length 208, alignment 0/ 0: 38.2812 598.906 34.6875 31.7188
Length 208, alignment 13/ 0: 97.1875 598.906 92.9688 91.0938
Length 208, alignment 0/13: 123.125 598.906 119.062 123.594
Length 208, alignment 13/13: 90 598.906 85.9375 88.5938
Length 224, alignment 0/ 0: 42.6562 643.438 38.75 40.625
Length 224, alignment 14/ 0: 120.625 643.438 116.719 121.875
Length 224, alignment 0/14: 122.812 643.438 118.75 123.438
Length 224, alignment 14/14: 90 643.438 85.9375 88.4375
Length 240, alignment 0/ 0: 42.8125 687.656 38.9062 40.7812
Length 240, alignment 15/ 0: 120.625 687.969 116.562 121.719
Length 240, alignment 0/15: 122.812 687.812 118.75 123.281
Length 240, alignment 15/15: 90 687.812 85.9375 88.4375
Length 272, alignment 0/ 0: 42.8125 776.719 38.9062 40.625
Length 272, alignment 17/ 0: 120.469 776.719 116.562 121.719
Length 272, alignment 0/17: 147.812 776.875 143.125 141.719
Length 272, alignment 17/17: 95.3125 776.875 91.25 89.8438
Length 288, alignment 0/ 0: 47.9688 821.25 43.9062 41.7188
Length 288, alignment 18/ 0: 144.531 821.25 140.781 138.906
Length 288, alignment 0/18: 146.719 821.25 142.812 140.469
Length 288, alignment 18/18: 95.1562 821.25 91.25 89.375
Length 304, alignment 0/ 0: 47.9688 865.625 44.0625 41.7188
Length 304, alignment 19/ 0: 144.531 865.625 140.625 138.75
Length 304, alignment 0/19: 146.719 865.781 142.812 140.469
Length 304, alignment 19/19: 95.1562 865.625 91.25 89.375
Length 320, alignment 0/ 0: 47.9688 910 44.0625 41.7188
Length 320, alignment 20/ 0: 144.531 910 140.625 138.906
Length 320, alignment 0/20: 146.875 910.156 142.969 140.625
Length 320, alignment 20/20: 95.1562 910 91.25 89.5312
Length 336, alignment 0/ 0: 47.9688 954.531 43.9062 41.7188
Length 336, alignment 21/ 0: 144.531 954.531 140.781 138.906
Length 336, alignment 0/21: 171.562 954.531 167.031 165.312
Length 336, alignment 21/21: 100 954.531 96.0938 94.6875
Length 352, alignment 0/ 0: 52.6562 999.062 48.9062 46.7188
Length 352, alignment 22/ 0: 168.281 998.906 164.531 162.812
Length 352, alignment 0/22: 172.031 998.906 165.312 162.656
Length 352, alignment 22/22: 100.469 999.219 96.25 94.6875
Length 368, alignment 0/ 0: 52.9688 1043.44 49.2188 46.7188
Length 368, alignment 23/ 0: 168.594 1043.91 164.844 162.812
Length 368, alignment 0/23: 170.781 1043.44 166.562 164.375
Length 368, alignment 23/23: 99.8438 1043.59 96.0938 94.5312
Length 384, alignment 0/ 0: 52.8125 1087.97 49.0625 46.5625
Length 384, alignment 24/ 0: 167.656 1087.97 164.062 162.188
Length 384, alignment 0/24: 170 1087.97 165.938 163.906
Length 384, alignment 24/24: 100.625 1087.34 95.9375 93.9062
Length 400, alignment 0/ 0: 53.4375 1132.5 49.375 46.7188
Length 400, alignment 25/ 0: 169.062 1132.34 164.531 163.281
Length 400, alignment 0/25: 195.625 1132.34 190.938 189.375
Length 400, alignment 25/25: 105.312 1132.5 101.094 100
Length 416, alignment 0/ 0: 58.125 1176.72 54.0625 51.7188
Length 416, alignment 26/ 0: 192.656 1176.88 188.281 186.719
Length 416, alignment 0/26: 194.844 1176.88 190.625 188.281
Length 416, alignment 26/26: 105.469 1176.88 101.094 99.375
Length 432, alignment 0/ 0: 58.125 1221.25 54.0625 51.7188
Length 432, alignment 27/ 0: 192.5 1221.41 188.438 186.562
Length 432, alignment 0/27: 194.688 1221.25 190.625 188.438
Length 432, alignment 27/27: 105.312 1221.25 101.094 99.375
Length 448, alignment 0/ 0: 58.125 1265.78 53.9062 51.7188
Length 448, alignment 28/ 0: 192.656 1265.62 188.438 186.719
Length 448, alignment 0/28: 194.844 1265.62 190.625 188.281
Length 448, alignment 28/28: 105.312 1265.62 101.25 99.375
Length 464, alignment 0/ 0: 58.125 1310.16 53.9062 51.7188
Length 464, alignment 29/ 0: 192.5 1311.25 189.062 186.562
Length 464, alignment 0/29: 219.062 1310.31 215.156 212.969
Length 464, alignment 29/29: 110.781 1310.16 106.25 105
Length 480, alignment 0/ 0: 63.2812 1354.69 59.0625 57.1875
Length 480, alignment 30/ 0: 216.406 1354.69 212.344 210.938
Length 480, alignment 0/30: 218.75 1354.53 214.531 212.5
Length 480, alignment 30/30: 110.312 1354.53 106.25 104.844
Length 496, alignment 0/ 0: 63.125 1399.06 59.0625 57.0312
Length 496, alignment 31/ 0: 216.25 1399.06 212.5 210.938
Length 496, alignment 0/31: 218.594 1399.06 215 212.656
Length 496, alignment 31/31: 110.469 1398.91 106.25 104.688
Length 1024, alignment 0/ 0: 107.031 2866.09 103.125 100.938
Length 1024, alignment 32/ 0: 106.875 2863.91 101.406 100.781
Length 1024, alignment 0/32: 106.719 2865.47 102.812 100.625
Length 1024, alignment 32/32: 106.875 2865 102.656 100.156
Length 1056, alignment 0/ 0: 115.781 2954.69 111.875 108.906
Length 1056, alignment 33/ 0: 434.688 2954.69 430.938 428.438
Length 1056, alignment 0/33: 436.875 2954.53 433.125 430.156
Length 1056, alignment 33/33: 159.219 3092.19 155.469 161.094
Length 1088, alignment 0/ 0: 112.031 3043.44 108.75 112.969
Length 1088, alignment 34/ 0: 435 3043.59 430.781 428.281
Length 1088, alignment 0/34: 436.875 3043.59 433.125 430.156
Length 1088, alignment 34/34: 159.219 3043.44 155.469 160.469
Length 1120, alignment 0/ 0: 117.031 3132.34 113.125 117.812
Length 1120, alignment 35/ 0: 458.906 3132.34 455 452.344
Length 1120, alignment 0/35: 461.094 3132.5 457.344 453.906
Length 1120, alignment 35/35: 164.531 3132.34 160.625 157.812
Length 1152, alignment 0/ 0: 117.5 3221.25 113.594 110.156
Length 1152, alignment 36/ 0: 458.906 3221.25 455 452.344
Length 1152, alignment 0/36: 461.094 3221.25 457.344 454.062
Length 1152, alignment 36/36: 164.531 3221.25 160.781 157.812
Length 1184, alignment 0/ 0: 122.344 3445 118.281 115.781
Length 1184, alignment 37/ 0: 482.969 3310.31 479.062 476.25
Length 1184, alignment 0/37: 485 3310.16 481.25 477.812
Length 1184, alignment 37/37: 169.375 3310.16 165.469 163.125
Length 1216, alignment 0/ 0: 122.344 3398.91 118.594 115.312
Length 1216, alignment 38/ 0: 482.812 3399.06 479.062 476.25
Length 1216, alignment 0/38: 485 3398.91 481.25 477.812
Length 1216, alignment 38/38: 169.375 3398.91 165.312 163.125
Length 1248, alignment 0/ 0: 127.344 3487.97 123.594 120.156
Length 1248, alignment 39/ 0: 506.562 3487.97 502.812 500.156
Length 1248, alignment 0/39: 508.906 3487.97 505 501.719
Length 1248, alignment 39/39: 174.531 3487.81 299.062 168.906
Length 1280, alignment 0/ 0: 127.344 3577.19 123.125 120.156
Length 1280, alignment 40/ 0: 506.562 3576.72 502.812 500.156
Length 1280, alignment 0/40: 508.906 3576.72 505 501.719
Length 1280, alignment 40/40: 174.531 3576.72 170.312 168.125
Length 1312, alignment 0/ 0: 132.5 3665.78 128.438 125.156
Length 1312, alignment 41/ 0: 530.625 3665.62 526.719 524.062
Length 1312, alignment 0/41: 532.812 3665.78 529.062 526.094
Length 1312, alignment 41/41: 179.531 3666.56 175.469 172.812
Length 1344, alignment 0/ 0: 132.344 3754.84 128.594 125
Length 1344, alignment 42/ 0: 530.469 3755 526.875 524.062
Length 1344, alignment 0/42: 532.812 3754.84 668.125 526.875
Length 1344, alignment 42/42: 179.531 3755 175.625 173.125
Length 1376, alignment 0/ 0: 137.344 3843.44 133.75 130.156
Length 1376, alignment 43/ 0: 554.375 3843.44 550.625 547.812
Length 1376, alignment 0/43: 556.719 3843.59 552.969 549.531
Length 1376, alignment 43/43: 184.688 3843.44 180.469 178.125
Length 1408, alignment 0/ 0: 137.344 3932.34 133.594 130.156
Length 1408, alignment 44/ 0: 554.375 3932.5 550.625 547.812
Length 1408, alignment 0/44: 556.719 3932.5 552.812 549.688
Length 1408, alignment 44/44: 184.531 3932.34 180.312 178.125
Length 1440, alignment 0/ 0: 142.344 4021.41 138.438 135.312
Length 1440, alignment 45/ 0: 578.438 4158.28 574.844 572.5
Length 1440, alignment 0/45: 580.469 4021.25 576.875 573.281
Length 1440, alignment 45/45: 189.531 4021.56 185.312 183.125
Length 1472, alignment 0/ 0: 142.344 4110.47 138.594 135.156
Length 1472, alignment 46/ 0: 578.281 4110.47 574.531 571.719
Length 1472, alignment 0/46: 580.625 4110.47 576.719 573.438
Length 1472, alignment 46/46: 189.531 4110.47 185.312 183.125
Length 1504, alignment 0/ 0: 147.344 4199.22 143.594 140.156
Length 1504, alignment 47/ 0: 602.188 4199.38 598.438 595.938
Length 1504, alignment 0/47: 604.375 4199.38 600.312 597.344
Length 1504, alignment 47/47: 195.625 4199.38 190.312 188.594
Length 1536, alignment 0/ 0: 147.344 4288.28 143.75 140.312
Length 1536, alignment 48/ 0: 147.344 4288.28 143.594 140.312
Length 1536, alignment 0/48: 148.125 4288.28 144.219 140.781
Length 1536, alignment 48/48: 147.969 4288.28 144.375 141.25
Length 1568, alignment 0/ 0: 153.125 4377.19 149.062 145.938
Length 1568, alignment 49/ 0: 626.094 4377.19 622.344 619.531
Length 1568, alignment 0/49: 628.281 4377.19 624.219 621.094
Length 1568, alignment 49/49: 200.156 4377.19 195.938 193.438
Length 1600, alignment 0/ 0: 152.969 4466.09 148.75 145.469
Length 1600, alignment 50/ 0: 626.094 4466.88 622.031 620.312
Length 1600, alignment 0/50: 628.281 4465.78 624.531 621.406
Length 1600, alignment 50/50: 200 4465.62 195.938 193.594
Length 1632, alignment 0/ 0: 157.969 4554.69 154.844 150.938
Length 1632, alignment 51/ 0: 650 4554.69 646.094 643.438
Length 1632, alignment 0/51: 652.188 4554.69 648.125 645
Length 1632, alignment 51/51: 206.406 4555.16 201.406 202.344
Length 1664, alignment 0/ 0: 157.812 4643.59 153.906 150.781
Length 1664, alignment 52/ 0: 650 4643.59 645.938 643.438
Length 1664, alignment 0/52: 780 4644.22 648.125 645.156
Length 1664, alignment 52/52: 209.219 4644.38 206.25 201.25
Length 1696, alignment 0/ 0: 165.469 4733.12 163.906 158.281
Length 1696, alignment 53/ 0: 673.906 4732.81 669.844 667.344
Length 1696, alignment 0/53: 676.094 4732.81 672.344 669.219
Length 1696, alignment 53/53: 214.062 4732.66 226.719 211.562
Length 1728, alignment 0/ 0: 171.875 4821.41 163.125 159.531
Length 1728, alignment 54/ 0: 673.75 4821.41 669.688 667.188
Length 1728, alignment 0/54: 676.094 4954.69 672.031 669.688
Length 1728, alignment 54/54: 225.781 4821.88 211.094 223.594
Length 1760, alignment 0/ 0: 177.188 4910.47 166.25 163.438
Length 1760, alignment 55/ 0: 697.656 4910.31 693.75 691.094
Length 1760, alignment 0/55: 700 4910.31 695.938 692.812
Length 1760, alignment 55/55: 237.656 4910.31 234.062 230.469
Length 1792, alignment 0/ 0: 170.312 4999.22 167.656 162.188
Length 1792, alignment 56/ 0: 697.812 4999.22 693.594 691.094
Length 1792, alignment 0/56: 700 4999.22 695.781 692.812
Length 1792, alignment 56/56: 235.625 5001.09 232.344 229.375
Length 1824, alignment 0/ 0: 192.031 5088.75 182.5 183.594
Length 1824, alignment 57/ 0: 721.719 5088.28 718.125 715.625
Length 1824, alignment 0/57: 723.906 5088.75 720 716.875
Length 1824, alignment 57/57: 242.5 5088.28 240 235.312
Length 1856, alignment 0/ 0: 185 5177.03 187.031 175.625
Length 1856, alignment 58/ 0: 721.719 5177.03 717.656 715
Length 1856, alignment 0/58: 724.062 5177.34 720 717.344
Length 1856, alignment 58/58: 242.969 5316.56 239.688 236.094
Length 1888, alignment 0/ 0: 187.969 5266.72 188.125 190
Length 1888, alignment 59/ 0: 745.625 5266.41 741.719 739.375
Length 1888, alignment 0/59: 747.969 5266.56 743.594 740.781
Length 1888, alignment 59/59: 250 5266.56 247.031 242.812
Length 1920, alignment 0/ 0: 192.031 5355.16 200.312 180.312
Length 1920, alignment 60/ 0: 745.469 5355.16 741.562 739.219
Length 1920, alignment 0/60: 747.656 5355.16 743.594 740.781
Length 1920, alignment 60/60: 250.625 5355.31 247.5 242.656
Length 1952, alignment 0/ 0: 211.25 5444.53 208.281 203.906
Length 1952, alignment 61/ 0: 769.375 5444.22 765.781 763.281
Length 1952, alignment 0/61: 771.562 5444.38 768.125 765.312
Length 1952, alignment 61/61: 254.688 5443.91 251.875 247.344
Length 1984, alignment 0/ 0: 210.938 5532.81 208.125 203.594
Length 1984, alignment 62/ 0: 769.531 5532.97 765.625 763.281
Length 1984, alignment 0/62: 771.562 5664.53 767.5 765.781
Length 1984, alignment 62/62: 254.531 5533.44 252.188 247.344
Length 2016, alignment 0/ 0: 215.938 5622.03 213.125 208.438
Length 2016, alignment 63/ 0: 793.281 5621.72 789.531 787.969
Length 2016, alignment 0/63: 795.469 5621.72 791.719 789.531
Length 2016, alignment 63/63: 261.875 5621.72 259.531 255.312
Length 65536, alignment 0/ 0: 8698.28 193524 8673.91 16843.6
__memcpy_thunderx __memcpy_generic
Length 65543, alignment 0/ 0: 8917.5 16964.4
Length 65551, alignment 0/ 3: 27263.1 35761.9
Length 65567, alignment 3/ 0: 26981.2 37039.4
Length 65599, alignment 3/ 5: 27258.1 35762.5
Length 131079, alignment 0/ 0: 17373.8 33872.5
Length 131087, alignment 0/ 3: 54279.4 71325
Length 131103, alignment 3/ 0: 53722.5 72745.6
Length 131135, alignment 3/ 5: 54763.8 71325.6
Length 262151, alignment 0/ 0: 35076.9 67497.5
Length 262159, alignment 0/ 3: 108816 142948
Length 262175, alignment 3/ 0: 107732 145918
Length 262207, alignment 3/ 5: 108326 142461
Length 524295, alignment 0/ 0: 68905 134738
Length 524303, alignment 0/ 3: 216870 285178
Length 524319, alignment 3/ 0: 214163 290869
Length 524351, alignment 3/ 5: 216894 285184
Length 1048583, alignment 0/ 0: 138173 270021
Length 1048591, alignment 0/ 3: 433695 570389
Length 1048607, alignment 3/ 0: 429163 582216
Length 1048639, alignment 3/ 5: 433696 570890
Length 2097159, alignment 0/ 0: 275731 540180
Length 2097167, alignment 0/ 3: 867276 1.14113e+06
Length 2097183, alignment 3/ 0: 857778 1.16346e+06
Length 2097215, alignment 3/ 5: 866851 1.1407e+06
Length 4194311, alignment 0/ 0: 551626 1.08047e+06
Length 4194319, alignment 0/ 3: 1.73384e+06 2.2816e+06
Length 4194335, alignment 3/ 0: 1.71571e+06 2.32717e+06
Length 4194367, alignment 3/ 5: 1.73384e+06 2.2816e+06
Length 8388615, alignment 0/ 0: 1.29659e+06 3.89121e+06
Length 8388623, alignment 0/ 3: 3.52809e+06 6.25292e+06
Length 8388639, alignment 3/ 0: 3.4988e+06 6.3012e+06
Length 8388671, alignment 3/ 5: 3.52861e+06 6.21914e+06
Length 16777223, alignment 0/ 0: 3.72447e+06 1.37534e+07
Length 16777231, alignment 0/ 3: 7.49027e+06 1.86301e+07
Length 16777247, alignment 3/ 0: 7.46343e+06 1.88143e+07
Length 16777279, alignment 3/ 5: 7.49158e+06 1.85915e+07
Length 33554439, alignment 0/ 0: 7.98001e+06 2.81398e+07
Length 33554447, alignment 0/ 3: 1.52274e+07 3.77285e+07
Length 33554463, alignment 3/ 0: 1.51635e+07 3.81515e+07
Length 33554495, alignment 3/ 5: 1.52274e+07 3.77297e+07
__memcpy_thunderx __memcpy_generic
Memory size 4096: 96434.8 95387.3
Memory size 8192: 94016.1 93048.1
Memory size 16384: 102725 101778
Memory size 32768: 108842 107799
Memory size 65536: 149327 148883
In sysdeps/aarch64/multiarch/memcpy_generic.S, it has:
+#include "../memcpy.S"
Is it OK to use a relative path here, or is it recommended to use the full path starting from sysdeps/?
On Sat, 2017-04-01 at 21:01 -0300, Wainer dos Santos Moschetta wrote:
> In sysdeps/aarch64/multiarch/memcpy_generic.S, it has:
> +#include "../memcpy.S"
>
> Is it OK to use a relative path here, or is it recommended to use
> the full path starting from sysdeps/?
I think it's OK. I don't see any preference listed in the Coding Style
page of the glibc wiki for one way or the other. I see other includes
that use relative paths; the most common one is
'#include "../test-skeleton.c"', but I also see other examples:
sysdeps/sparc/sparc64/multiarch/rtld-memset.c:#include "../rtld-memset.c"
sysdeps/sparc/sparc64/multiarch/rtld-memcpy.c:#include "../rtld-memcpy.c"
sysdeps/wordsize-64/ftw.c:#include "../../io/ftw.c"
sysdeps/wordsize-64/fts.c:#include "../../io/fts.c"
sysdeps/unix/sysv/linux/sparc/sparc64/xstat.c:#include "../../i386/xstat.c"
sysdeps/unix/sysv/linux/sparc/sparc64/fxstat.c:#include "../../i386/fxstat.c"
sysdeps/unix/sysv/linux/sparc/sparc64/fxstatat.c:#include "../../i386/fxstatat.c"
sysdeps/unix/sysv/linux/sparc/sparc64/lxstat.c:#include "../../i386/lxstat.c"
sysdeps/unix/sysv/linux/aarch64/readelflib.c:#include "../arm/readelflib.c"
sysdeps/unix/sysv/linux/wordsize-64/statvfs.c:#include "../statvfs.c"
sysdeps/unix/sysv/linux/wordsize-64/getdirentries.c:#include "../getdirentries.c"
sysdeps/unix/sysv/linux/wordsize-64/fstatvfs.c:#include "../fstatvfs.c"
sysdeps/unix/sysv/linux/wordsize-64/aio_write.c:#include "../../../../pthread/aio_write.c"
sysdeps/unix/sysv/linux/wordsize-64/openat.c:#include "../openat.c"
That seems more common than using:
sysdeps/unix/sysv/linux/s390/s390-32/updwtmp.c:#include "sysdeps/gnu/updwtmp.c"
sysdeps/unix/sysv/linux/s390/s390-32/getutmp.c:#include "sysdeps/gnu/getutmp.c"
sysdeps/x86_64/fpu/e_sqrtl.c:#include "sysdeps/i386/fpu/e_sqrtl.c"
sysdeps/x86_64/fpu/e_atan2l.c:#include "sysdeps/i386/fpu/e_atan2l.c"
sysdeps/x86_64/fpu/s_atanl.c:#include "sysdeps/i386/fpu/s_atanl.c"
sysdeps/x86_64/fpu/e_acosl.c:#include "sysdeps/i386/fpu/e_acosl.c"
Steve Ellcey
On Saturday 25 March 2017 04:55 AM, Steve Ellcey wrote:
> If people think we should use the ThunderX version of memcpy for all
> aarch64 systems I am happy to drop this patch and create one that just
> changes memcpy.S to do the ThunderX style prefetches for all aarch64
> systems.
That could be done as an add-on if we find out that it is the case.
The patch looks good to me with the formatting fixups I have specified
inline.
Siddhesh
> 2017-03-24 Steve Ellcey <sellcey@caviumnetworks.com>
>
> * sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
> (memmove): Use MEMMOVE for name.
> (memcpy): Use MEMCPY for name. Add loop with prefetching
> under USE_THUNDERX macro.
> * sysdeps/aarch64/multiarch/Makefile: New file.
> * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
> * sysdeps/aarch64/multiarch/init-arch.h: Likewise.
> * sysdeps/aarch64/multiarch/memcpy.c: Likewise.
> * sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
> * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
> * sysdeps/aarch64/multiarch/memmove.c: Likewise.
>
>
> ifunc.patch
>
>
> diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
> index 29af8b1..74444b4 100644
> --- a/sysdeps/aarch64/memcpy.S
> +++ b/sysdeps/aarch64/memcpy.S
> @@ -59,7 +59,14 @@
> Overlapping large forward memmoves use a loop that copies backwards.
> */
>
> -ENTRY_ALIGN (memmove, 6)
> +#ifndef MEMMOVE
> +# define MEMMOVE memmove
Single char indent.
> +#endif
> +#ifndef MEMCPY
> +# define MEMCPY memcpy
Likewise.
> +#endif
> +
> +ENTRY_ALIGN (MEMMOVE, 6)
>
> DELOUSE (0)
> DELOUSE (1)
> @@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
> b.lo L(move_long)
>
> /* Common case falls through into memcpy. */
> -END (memmove)
> -libc_hidden_builtin_def (memmove)
> -ENTRY (memcpy)
> +END (MEMMOVE)
> +libc_hidden_builtin_def (MEMMOVE)
> +ENTRY (MEMCPY)
>
> DELOUSE (0)
> DELOUSE (1)
> @@ -158,10 +165,22 @@ L(copy96):
>
> .p2align 4
> L(copy_long):
> +
> +#ifdef USE_THUNDERX
> +
> + /* On thunderx, large memcpy's are helped by software prefetching.
> + This loop is identical to the one below it but with prefetching
> + instructions included. For loops that are less than 32768 bytes,
> + the prefetching does not help and slows the code down, so we only
> + use the prefetching loop for the largest memcpys. */
> +
> + cmp count, #32768
> + b.lo L(copy_long_without_prefetch)
> and tmp1, dstin, 15
> bic dst, dstin, 15
> ldp D_l, D_h, [src]
> sub src, src, tmp1
> + prfm pldl1strm, [src, 384]
> add count, count, tmp1 /* Count is now 16 too large. */
> ldp A_l, A_h, [src, 16]
> stp D_l, D_h, [dstin]
> @@ -169,7 +188,10 @@ L(copy_long):
> ldp C_l, C_h, [src, 48]
> ldp D_l, D_h, [src, 64]!
> subs count, count, 128 + 16 /* Test and readjust count. */
> - b.ls 2f
> +
> +L(prefetch_loop64):
> + tbz src, #6, 1f
> + prfm pldl1strm, [src, 512]
> 1:
> stp A_l, A_h, [dst, 16]
> ldp A_l, A_h, [src, 16]
> @@ -180,12 +202,40 @@ L(copy_long):
> stp D_l, D_h, [dst, 64]!
> ldp D_l, D_h, [src, 64]!
> subs count, count, 64
> - b.hi 1b
> + b.hi L(prefetch_loop64)
> + b L(last64)
> +
> +L(copy_long_without_prefetch):
> +#endif
> +
> + and tmp1, dstin, 15
> + bic dst, dstin, 15
> + ldp D_l, D_h, [src]
> + sub src, src, tmp1
> + add count, count, tmp1 /* Count is now 16 too large. */
> + ldp A_l, A_h, [src, 16]
> + stp D_l, D_h, [dstin]
> + ldp B_l, B_h, [src, 32]
> + ldp C_l, C_h, [src, 48]
> + ldp D_l, D_h, [src, 64]!
> + subs count, count, 128 + 16 /* Test and readjust count. */
> + b.ls L(last64)
> +L(loop64):
> + stp A_l, A_h, [dst, 16]
> + ldp A_l, A_h, [src, 16]
> + stp B_l, B_h, [dst, 32]
> + ldp B_l, B_h, [src, 32]
> + stp C_l, C_h, [dst, 48]
> + ldp C_l, C_h, [src, 48]
> + stp D_l, D_h, [dst, 64]!
> + ldp D_l, D_h, [src, 64]!
> + subs count, count, 64
> + b.hi L(loop64)
>
> /* Write the last full set of 64 bytes. The remainder is at most 64
> bytes, so it is safe to always copy 64 bytes from the end even if
> there is just 1 byte left. */
> -2:
> +L(last64):
> ldp E_l, E_h, [srcend, -64]
> stp A_l, A_h, [dst, 16]
> ldp A_l, A_h, [srcend, -48]
> @@ -256,5 +306,5 @@ L(move_long):
> stp C_l, C_h, [dstin]
> 3: ret
>
> -END (memcpy)
> -libc_hidden_builtin_def (memcpy)
> +END (MEMCPY)
> +libc_hidden_builtin_def (MEMCPY)
> diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
> index e69de29..78d52c7 100644
> --- a/sysdeps/aarch64/multiarch/Makefile
> +++ b/sysdeps/aarch64/multiarch/Makefile
> @@ -0,0 +1,3 @@
> +ifeq ($(subdir),string)
> +sysdep_routines += memcpy_generic memcpy_thunderx
> +endif
> diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> index e69de29..c4f23df 100644
> --- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> @@ -0,0 +1,51 @@
> +/* Enumerate available IFUNC implementations of a function. AARCH64 version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <assert.h>
> +#include <string.h>
> +#include <wchar.h>
> +#include <ldsodefs.h>
> +#include <ifunc-impl-list.h>
> +#include <init-arch.h>
> +#include <stdio.h>
> +
> +/* Maximum number of IFUNC implementations. */
> +#define MAX_IFUNC 2
> +
> +size_t
> +__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> + size_t max)
> +{
> + assert (max >= MAX_IFUNC);
> +
> + size_t i = 0;
> +
> + INIT_ARCH ();
> +
> + /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c. */
> + IFUNC_IMPL (i, name, memcpy,
> + IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
> + __memcpy_thunderx)
> + IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
> + IFUNC_IMPL (i, name, memmove,
> + IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
> + __memmove_thunderx)
> + IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
> +
> + return i;
> +}
> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..e690e00 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.
One line description of the file.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <ldsodefs.h>
> +
> +#define INIT_ARCH() \
> + uint64_t __attribute__((unused)) midr = \
> + GLRO(dl_aarch64_cpu_features).midr_el1;
> diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
> index e69de29..4e3f251 100644
> --- a/sysdeps/aarch64/multiarch/memcpy.c
> +++ b/sysdeps/aarch64/multiarch/memcpy.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memcpy. AARCH64 version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* Define multiple versions only for the definition in libc. */
> +
> +#if IS_IN (libc)
> +/* Redefine memcpy so that the compiler won't complain about the type
> + mismatch with the IFUNC selector in strong_alias, below. */
> +# undef memcpy
> +# define memcpy __redirect_memcpy
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memcpy) __libc_memcpy;
> +
> +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
> +extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memcpy,
> + IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
> +
> +#undef memcpy
Single char indent.
> +strong_alias (__libc_memcpy, memcpy);
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
> index e69de29..50e1a1c 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_generic.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
> @@ -0,0 +1,42 @@
> +/* A Generic Optimized memcpy implementation for AARCH64.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* The actual memcpy and memmove code is in ../memcpy.S. If we are
> + building libc this file defines __memcpy_generic and __memmove_generic.
> + Otherwise the include of ../memcpy.S will define the normal __memcpy
> + and __memmove entry points. */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_generic
> +#define MEMMOVE __memmove_generic
> +
> +/* Do not hide the generic versions of memcpy and memmove, we use them
> + internally. */
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
> + .globl __GI_memcpy; __GI_memcpy = __memcpy_generic
> + .globl __GI_memmove; __GI_memmove = __memmove_generic
Single char indent for all macro defs.
> +
> +#endif
> +
> +#include "../memcpy.S"
> diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> index e69de29..ee971c8 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> @@ -0,0 +1,32 @@
> +/* A Thunderx Optimized memcpy implementation for AARCH64.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
> + ifdef. If we are not building libc then we do not build anything when
> + compiling this file and __memcpy is defined by memcpy_generic.S. */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_thunderx
> +#define MEMMOVE __memmove_thunderx
> +#define USE_THUNDERX
> +#include "../memcpy.S"
Single char indent for all macro defs.
> +
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
> index e69de29..8d7a146 100644
> --- a/sysdeps/aarch64/multiarch/memmove.c
> +++ b/sysdeps/aarch64/multiarch/memmove.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memmove. AARCH64 version.
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +/* Define multiple versions only for the definition in libc. */
> +
> +#if IS_IN (libc)
> +/* Redefine memmove so that the compiler won't complain about the type
> + mismatch with the IFUNC selector in strong_alias, below. */
> +# undef memmove
> +# define memmove __redirect_memmove
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memmove) __libc_memmove;
> +
> +extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
> +extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memmove,
> + IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
> +
> +#undef memmove
Single char indent.
> +strong_alias (__libc_memmove, memmove);
> +#endif
>
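diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 29af8b1..74444b4 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -59,7 +59,14 @@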
Overlapping large forward memmoves use a loop that copies backwards.
*/
-ENTRY_ALIGN (memmove, 6)
+#ifndef MEMMOVE
+# define MEMMOVE memmove
+#endif
+#ifndef MEMCPY
+# define MEMCPY memcpy
+#endif
+
+ENTRY_ALIGN (MEMMOVE, 6)
DELOUSE (0)
DELOUSE (1)
@@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
b.lo L(move_long)
/* Common case falls through into memcpy. */
-END (memmove)
-libc_hidden_builtin_def (memmove)
-ENTRY (memcpy)
+END (MEMMOVE)
+libc_hidden_builtin_def (MEMMOVE)
+ENTRY (MEMCPY)
DELOUSE (0)
DELOUSE (1)
@@ -158,10 +165,22 @@ L(copy96):
.p2align 4
L(copy_long):
+
+#ifdef USE_THUNDERX
+
+ /* On thunderx, large memcpy's are helped by software prefetching.
+ This loop is identical to the one below it but with prefetching
+ instructions included. For loops that are less than 32768 bytes,
+ the prefetching does not help and slows the code down, so we only
+ use the prefetching loop for the largest memcpys. */
+
+ cmp count, #32768
+ b.lo L(copy_long_without_prefetch)
and tmp1, dstin, 15
bic dst, dstin, 15
ldp D_l, D_h, [src]
sub src, src, tmp1
+ prfm pldl1strm, [src, 384]
add count, count, tmp1 /* Count is now 16 too large. */
ldp A_l, A_h, [src, 16]
stp D_l, D_h, [dstin]
@@ -169,7 +188,10 @@ L(copy_long):
ldp C_l, C_h, [src, 48]
ldp D_l, D_h, [src, 64]!
subs count, count, 128 + 16 /* Test and readjust count. */
- b.ls 2f
+
+L(prefetch_loop64):
+ tbz src, #6, 1f
+ prfm pldl1strm, [src, 512]
1:
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [src, 16]
@@ -180,12 +202,40 @@ L(copy_long):
stp D_l, D_h, [dst, 64]!
ldp D_l, D_h, [src, 64]!
subs count, count, 64
- b.hi 1b
+ b.hi L(prefetch_loop64)
+ b L(last64)
+
+L(copy_long_without_prefetch):
+#endif
+
+ and tmp1, dstin, 15
+ bic dst, dstin, 15
+ ldp D_l, D_h, [src]
+ sub src, src, tmp1
+ add count, count, tmp1 /* Count is now 16 too large. */
+ ldp A_l, A_h, [src, 16]
+ stp D_l, D_h, [dstin]
+ ldp B_l, B_h, [src, 32]
+ ldp C_l, C_h, [src, 48]
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 128 + 16 /* Test and readjust count. */
+ b.ls L(last64)
+L(loop64):
+ stp A_l, A_h, [dst, 16]
+ ldp A_l, A_h, [src, 16]
+ stp B_l, B_h, [dst, 32]
+ ldp B_l, B_h, [src, 32]
+ stp C_l, C_h, [dst, 48]
+ ldp C_l, C_h, [src, 48]
+ stp D_l, D_h, [dst, 64]!
+ ldp D_l, D_h, [src, 64]!
+ subs count, count, 64
+ b.hi L(loop64)
/* Write the last full set of 64 bytes. The remainder is at most 64
bytes, so it is safe to always copy 64 bytes from the end even if
there is just 1 byte left. */
-2:
+L(last64):
ldp E_l, E_h, [srcend, -64]
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [srcend, -48]
@@ -256,5 +306,5 @@ L(move_long):
stp C_l, C_h, [dstin]
3: ret
-END (memcpy)
-libc_hidden_builtin_def (memcpy)
+END (MEMCPY)
+libc_hidden_builtin_def (MEMCPY)
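diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index e69de29..78d52c7 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -0,0 +1,3 @@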
+ifeq ($(subdir),string)
+sysdep_routines += memcpy_generic memcpy_thunderx
+endif
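diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index e69de29..c4f23df 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -0,0 +1,51 @@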
+/* Enumerate available IFUNC implementations of a function. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include <string.h>
+#include <wchar.h>
+#include <ldsodefs.h>
+#include <ifunc-impl-list.h>
+#include <init-arch.h>
+#include <stdio.h>
+
+/* Maximum number of IFUNC implementations. */
+#define MAX_IFUNC 2
+
+size_t
+__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
+ size_t max)
+{
+ assert (max >= MAX_IFUNC);
+
+ size_t i = 0;
+
+ INIT_ARCH ();
+
+ /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c. */
+ IFUNC_IMPL (i, name, memcpy,
+ IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
+ __memcpy_thunderx)
+ IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
+ IFUNC_IMPL (i, name, memmove,
+ IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
+ __memmove_thunderx)
+ IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
+
+ return i;
+}
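diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
index e69de29..e690e00 100644
--- a/sysdeps/aarch64/multiarch/init-arch.h
+++ b/sysdeps/aarch64/multiarch/init-arch.h
@@ -0,0 +1,22 @@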
+/* This file is part of the GNU C Library.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <ldsodefs.h>
+
+#define INIT_ARCH() \
+ uint64_t __attribute__((unused)) midr = \
+ GLRO(dl_aarch64_cpu_features).midr_el1;
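diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index e69de29..4e3f251 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -0,0 +1,39 @@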
+/* Multiple versions of memcpy. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+ mismatch with the IFUNC selector in strong_alias, below. */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memcpy,
+ IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
+
+#undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+#endif
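diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
index e69de29..50e1a1c 100644
--- a/sysdeps/aarch64/multiarch/memcpy_generic.S
+++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
@@ -0,0 +1,42 @@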
+/* A Generic Optimized memcpy implementation for AARCH64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual memcpy and memmove code is in ../memcpy.S. If we are
+ building libc this file defines __memcpy_generic and __memmove_generic.
+ Otherwise the include of ../memcpy.S will define the normal __memcpy
+ and __memmove entry points. */
+
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+#define MEMCPY __memcpy_generic
+#define MEMMOVE __memmove_generic
+
+/* Do not hide the generic versions of memcpy and memmove, we use them
+ internally. */
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
+ .globl __GI_memcpy; __GI_memcpy = __memcpy_generic
+ .globl __GI_memmove; __GI_memmove = __memmove_generic
+
+#endif
+
+#include "../memcpy.S"
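diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
index e69de29..ee971c8 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
@@ -0,0 +1,32 @@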
+/* A Thunderx Optimized memcpy implementation for AARCH64.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
+ ifdef. If we are not building libc then we do not build anything when
+ compiling this file and __memcpy is defined by memcpy_generic.S. */
+
+#include <sysdep.h>
+
+#if IS_IN (libc)
+
+#define MEMCPY __memcpy_thunderx
+#define MEMMOVE __memmove_thunderx
+#define USE_THUNDERX
+#include "../memcpy.S"
+
+#endif
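diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
index e69de29..8d7a146 100644
--- a/sysdeps/aarch64/multiarch/memmove.c
+++ b/sysdeps/aarch64/multiarch/memmove.c
@@ -0,0 +1,39 @@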
+/* Multiple versions of memmove. AARCH64 version.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+
+#if IS_IN (libc)
+/* Redefine memmove so that the compiler won't complain about the type
+ mismatch with the IFUNC selector in strong_alias, below. */
+# undef memmove
+# define memmove __redirect_memmove
+# include <string.h>
+# include <init-arch.h>
+
+extern __typeof (__redirect_memmove) __libc_memmove;
+
+extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
+extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
+
+libc_ifunc (__libc_memmove,
+ IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
+
+#undef memmove
+strong_alias (__libc_memmove, memmove);
+#endif