[6/7] stdlib: Optimize qsort{_r} swap implementation

Message ID 1516298002-4618-7-git-send-email-adhemerval.zanella@linaro.org
State Dropped

Commit Message

Adhemerval Zanella Jan. 18, 2018, 5:53 p.m. UTC
This patch adds an optimized swap operation to qsort, based on the
previous msort one.  Instead of a byte-wise swap, three variants are
provided:

  1. Using uint32_t loads and stores.
  2. Using uint64_t loads and stores.
  3. A generic one using a temporary buffer and memcpy/mempcpy.

Variants 1. and 2. are selected only if the architecture defines
_STRING_ARCH_unaligned or if the base pointer is aligned to the required
type.  This choice is based on data from bench-qsort: programs usually
call qsort with an element size that is a multiple of the machine word.
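
For illustration, a condensed sketch of the scheme (not the patch itself,
which is at the bottom of this page; the alignment test is written here in
the '&'-mask form the review below converges on, and the uint64_t variant
is elided since it mirrors the uint32_t one):

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   typedef void (*swap_t) (void *, void *, size_t);

   static bool
   aligned_to (const void *base, size_t align)
   {
     /* ALIGN is assumed to be a power of two.  */
     return ((uintptr_t) base & (align - 1)) == 0;
   }

   static void
   swap_u32 (void *a, void *b, size_t size)
   {
     /* SIZE is unused; it is kept only for the common swap_t signature.  */
     uint32_t *ua = a, *ub = b, tmp = *ua;
     *ua = *ub, *ub = tmp;
   }

   static void
   swap_generic (void *a, void *b, size_t size)
   {
     unsigned char tmp[128];   /* Large enough for common key sizes.  */
     while (size > 0)
       {
         size_t s = size < sizeof tmp ? size : sizeof tmp;
         memcpy (tmp, a, s);
         memcpy (a, b, s);
         memcpy (b, tmp, s);
         a = (char *) a + s;
         b = (char *) b + s;
         size -= s;
       }
   }

   static swap_t
   select_swap_func (const void *base, size_t size)
   {
     if (size == sizeof (uint32_t) && aligned_to (base, sizeof (uint32_t)))
       return swap_u32;
     /* A swap_u64 branch, analogous to the uint32_t one, covers case 2.  */
     return swap_generic;
   }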

Benchmarking shows improved performance in most cases (diff is the
percentage change in time relative to base, so negative values mean the
patched version is faster):

Results for member size 4
  Sorted
  nmemb   |      base |   patched | diff
        32|      1401 |      1958 | 39.76
      4096|    351333 |    368533 | 4.90
     32768|   3369386 |   3131712 | -7.05
    524288|  63192972 |  59807494 | -5.36

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      2391 |      2061 | -13.80
      4096|   1124074 |    961816 | -14.43
     32768|  11196607 |   9410438 | -15.95
    524288| 215908169 | 185586732 | -14.04

  Unsorted
  nmemb   |      base |   patched | diff
        32|      4993 |      2021 | -59.52
      4096|   1113860 |    963126 | -13.53
     32768|  11251293 |   9518795 | -15.40
    524288| 217252237 | 185072278 | -14.81

Results for member size 8
  Sorted
  nmemb   |      base |   patched | diff
        32|      1296 |      1267 | -2.24
      4096|    359418 |    334852 | -6.83
     32768|   3535229 |   3345157 | -5.38
    524288|  69847251 |  67029358 | -4.03

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      2745 |      2340 | -14.75
      4096|   1222082 |   1014314 | -17.00
     32768|  12244800 |   9924706 | -18.95
    524288| 241557971 | 196898760 | -18.49

  Unsorted
  nmemb   |      base |   patched | diff
        32|      2972 |      2389 | -19.62
      4096|   1314861 |   1024052 | -22.12
     32768|  12397909 |  10120848 | -18.37
    524288| 241789262 | 193414824 | -20.01

Results for member size 32
  Sorted
  nmemb   |      base |   patched | diff
        32|      1305 |      1287 | -1.38
      4096|    346332 |    347979 | 0.48
     32768|   3458244 |   3408058 | -1.45
    524288|  72793445 |  69973719 | -3.87

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      5435 |      4890 | -10.03
      4096|   2032260 |   1688556 | -16.91
     32768|  19909035 |  16419992 | -17.52
    524288| 390339319 | 325921585 | -16.50

  Unsorted
  nmemb   |      base |   patched | diff
        32|      5833 |      5351 | -8.26
      4096|   2022531 |   1724961 | -14.71
     32768|  19842888 |  16588545 | -16.40
    524288| 388838382 | 324102703 | -16.65

Checked on x86_64-linux-gnu.

	[BZ #19305].
	* stdlib/qsort.c (SWAP): Remove.
	(check_alignment, swap_u32, swap_u64, swap_generic,
	select_swap_func): New functions.
	(__qsort_r): Use select_swap_func.
---
 stdlib/qsort.c | 77 ++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 59 insertions(+), 18 deletions(-)
  

Comments

Paul Eggert Jan. 22, 2018, 8:27 a.m. UTC | #1
Adhemerval Zanella wrote:
> +static inline bool
> +check_alignment (const void *base, size_t align)
> +{
> +  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
> +}

Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an 
architecture that does not allow unaligned access?

> +static inline void
> +swap_generic (void *a, void *b, size_t size)

Why is this inline? It's used only as a function pointer, and the other 
functions so used are not declared inline.

> +static inline swap_t
> +select_swap_func (const void *base, size_t size)
> +{
> +  if (size == 4 && check_alignment (base, 4))
> +    return swap_u32;
> +  else if (size == 8 && check_alignment (base, 8))
> +    return swap_u64;
> +  return swap_generic;
> +}

The conditions aren't portable enough. Use something like this instead for 
swap_u32, and similarly for swap_u64.

   if (size == sizeof (uint32_t) && check_alignment (base, alignof (uint32_t)))
     return swap_u32;

> +static void
> +swap_u32 (void *a, void *b, size_t size)

The pointer arguments should be declared 'void *restrict'. This can help GCC 
generate better code. Similarly for the other swap functions.

> +  uint32_t tmp = *(uint32_t*) a;
> +  *(uint32_t*) a = *(uint32_t*) b;
> +  *(uint32_t*) b = tmp;

It's nicer to avoid casts when possible, as is the case here and elsewhere. This 
is because casts are too powerful in C. Something like this, say:

     uint32_t *ua = a, *ub = b, tmp = *ua;
     *ua = *ub, *ub = tmp;

> +  unsigned char tmp[128];

Why 128? A comment seems called for.

> +static inline void
> +swap_generic (void *a, void *b, size_t size)
> +{
> +  unsigned char tmp[128];
> +  do
> +    {
> +      size_t s = size > sizeof (tmp) ? sizeof (tmp) : size;
> +      memcpy (tmp, a, s);
> +      a = __mempcpy (a, b, s);
> +      b = __mempcpy (b, tmp, s);
> +      size -= s;
> +    }
> +  while (size > 0);
> +}

On my platform (GCC 7.2.1 20170915 (Red Hat 7.2.1-2) x86-64) this inlined the 
memcpy but not the mempcpy calls. How about something like this instead? It 
should let the compiler do a better job of block-move-style operations in the 
loop. If mempcpy is inlined for you, feel free to substitute it for two of the 
loop's calls to memcpy.

   static void
   swap_generic (void *restrict va, void *restrict vb, size_t size)
   {
     char *a = va, *b = vb;
     enum { n = 128 }; /* Why 128?  */
     unsigned char tmp[n];
     while (size >= n)
       {
	memcpy (tmp, a, n);
	memcpy (a, b, n);
	memcpy (b, tmp, n);
	a += n;
	b += n;
	size -= n;
       }
     memcpy (tmp, a, size);
     memcpy (a, b, size);
     memcpy (b, tmp, size);
   }
  
Adhemerval Zanella Jan. 22, 2018, 10:55 a.m. UTC | #2
On 22/01/2018 06:27, Paul Eggert wrote:
> Adhemerval Zanella wrote:
>> +static inline bool
>> +check_alignment (const void *base, size_t align)
>> +{
>> +  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
>> +}
> 
> Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an architecture that does not allow unaligned access?

Yes, I checked it on a sparc64 machine.  This test is similar to the Linux
kernel one in lib/sort.c.

> 
>> +static inline void
>> +swap_generic (void *a, void *b, size_t size)
> 
> Why is this inline? It's used only as a function pointer, and the other functions so used are not declared inline.

It should not be, I fixed it locally.

> 
>> +static inline swap_t
>> +select_swap_func (const void *base, size_t size)
>> +{
>> +  if (size == 4 && check_alignment (base, 4))
>> +    return swap_u32;
>> +  else if (size == 8 && check_alignment (base, 8))
>> +    return swap_u64;
>> +  return swap_generic;
>> +}
> 
> The conditions aren't portable enough. Use something like this instead for swap_u32, and similarly for swap_u64.
> 
>   if (size == sizeof (uint32_t) && check_alignment (base, alignof (uint32_t)))
>     return swap_u32;

Ack, fixed locally.

> 
>> +static void
>> +swap_u32 (void *a, void *b, size_t size)
> 
> The pointer arguments should be declared 'void *restrict'. This can help GCC generate better code. Similarly for the other swap functions.
> 
>> +  uint32_t tmp = *(uint32_t*) a;
>> +  *(uint32_t*) a = *(uint32_t*) b;
>> +  *(uint32_t*) b = tmp;
> 
> It's nicer to avoid casts when possible, as is the case here and elsewhere. This is because casts are too powerful in C. Something like this, say:
> 
>     uint32_t *ua = a, *ub = b, tmp = *ua;
>     *ua = *ub, *ub = tmp;

Right, I changed to your suggestion.

> 
>> +  unsigned char tmp[128];
> 
> Why 128? A comment seems called for.

It is indeed an arbitrary value, based on some real-world usage: it covers
all gcc and firefox calls (the largest key size gcc uses is 56 and
firefox's is 40).  I will add a comment about it.
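
A possible wording for that comment (my assumption, based on the numbers
above, not the actual text that will be added):

   /* A stack buffer of 128 bytes covers the key sizes seen in practice:
      the largest key gcc sorts is 56 bytes and firefox's is 40.  Larger
      elements are swapped in 128-byte chunks.  */
   unsigned char tmp[128];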

> 
>> +static inline void
>> +swap_generic (void *a, void *b, size_t size)
>> +{
>> +  unsigned char tmp[128];
>> +  do
>> +    {
>> +      size_t s = size > sizeof (tmp) ? sizeof (tmp) : size;
>> +      memcpy (tmp, a, s);
>> +      a = __mempcpy (a, b, s);
>> +      b = __mempcpy (b, tmp, s);
>> +      size -= s;
>> +    }
>> +  while (size > 0);
>> +}
> 
> On my platform (GCC 7.2.1 20170915 (Red Hat 7.2.1-2) x86-64) this inlined the memcpy but not the mempcpy calls. How about something like this instead? It should let the compiler do a better job of block-move-style operations in the loop. If mempcpy is inlined for you, feel free to substitute it for two of the loop's calls to memcpy.
> 
>   static void
>   swap_generic (void *restrict va, void *restrict vb, size_t size)
>   {
>     char *a = va, *b = vb;
>     enum { n = 128 }; /* Why 128?  */
>     unsigned char tmp[n];
>     while (size >= n)
>       {
>     memcpy (tmp, a, n);
>     memcpy (a, b, n);
>     memcpy (b, tmp, n);
>     a += n;
>     b += n;
>     size -= n;
>       }
>     memcpy (tmp, a, size);
>     memcpy (a, b, size);
>     memcpy (b, tmp, size);
>   }

For this specific code inlining is not always a gain; it depends on 1. which
minimum ISA the compiler uses to generate the inline variants and 2. which
ifunc variants glibc provides for mempcpy (and whether they are exposed for
internal calls).  On recent x86_64, for instance, the mempcpy call will
dispatch to __memmove_avx_unaligned_erms, which is faster than the default
SSE2 inline variants.  Your suggestion turns out to be in fact slower (base
is the current approach) on my machine (i7-4790K):

Results for member size 32
  Sorted
  nmemb   |      base |   patched | diff
        32|      1184 |      1268 | 7.09
      4096|    325865 |    333332 | 2.29
     32768|   3331750 |   3431695 | 3.00
    524288|  69067176 |  68805735 | -0.38

  Repeated
  nmemb   |      base |   patched | diff
        32|      4813 |      5779 | 20.07
      4096|   1624137 |   1972045 | 21.42
     32768|  15896739 |  19705289 | 23.96
    524288| 316328778 | 393942797 | 24.54

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      5332 |      6198 | 16.24
      4096|   1312703 |   1563919 | 19.14
     32768|  12360726 |  14990070 | 21.27
    524288| 231603294 | 283228681 | 22.29

  Unsorted
  nmemb   |      base |   patched | diff
        32|      6047 |      7115 | 17.66
      4096|   1695241 |   2010943 | 18.62
     32768|  16430388 |  19636166 | 19.51
    524288| 329496913 | 395355847 | 19.99

In fact, if I use -fno-builtin to force the memcpy call to go through the
ifunc, I get another speedup (and it could be faster still, for x86_64 at
least, if glibc were built with -mavx):

Results for member size 32
  Sorted
  nmemb   |      base |   patched | diff
        32|      1184 |      1240 | 4.73
      4096|    325865 |    326596 | 0.22
     32768|   3331750 |   3613807 | 8.47
    524288|  69067176 |  74352201 | 7.65

  Repeated
  nmemb   |      base |   patched | diff
        32|      4813 |      4133 | -14.13
      4096|   1624137 |   1707452 | 5.13
     32768|  15896739 |  13999315 | -11.94
    524288| 316328778 | 280461810 | -11.34

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      5332 |      4681 | -12.21
      4096|   1312703 |   1226684 | -6.55
     32768|  12360726 |  11362772 | -8.07
    524288| 231603294 | 212250739 | -8.36

  Unsorted
  nmemb   |      base |   patched | diff
        32|      6047 |      6676 | 10.40
      4096|   1695241 |   1492257 | -11.97
     32768|  16430388 |  14799600 | -9.93
    524288| 329496913 | 303681410 | -7.83

It might be that your approach is faster on other architectures which do not
have an ifunc mempcpy; however, I do not want to over-engineer this code,
since most real-world usage corresponds to key sizes of 4 and 8.
  
Alexander Monakov Jan. 22, 2018, 1:46 p.m. UTC | #3
On Mon, 22 Jan 2018, Adhemerval Zanella wrote:
> On 22/01/2018 06:27, Paul Eggert wrote:
> > Adhemerval Zanella wrote:
> >> +static inline bool
> >> +check_alignment (const void *base, size_t align)
> >> +{
> >> +  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
> >> +}
> > 
> > Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an architecture that does not allow unaligned access?
> 
> Yes, I checked on sparc64 machine.  This test is similar to the Linux kernel one
> at lib/sort.c.

But the kernel source correctly uses '&' there rather than '%'.

Alexander
  
Adhemerval Zanella Jan. 22, 2018, 3:23 p.m. UTC | #4
> On 22 Jan 2018, at 11:46, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
>> On Mon, 22 Jan 2018, Adhemerval Zanella wrote:
>>> On 22/01/2018 06:27, Paul Eggert wrote:
>>> Adhemerval Zanella wrote:
>>>> +static inline bool
>>>> +check_alignment (const void *base, size_t align)
>>>> +{
>>>> +  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
>>>> +}
>>> 
>>> Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an architecture that does not allow unaligned access?
>> 
>> Yes, I checked on sparc64 machine.  This test is similar to the Linux kernel one
>> at lib/sort.c.
> 
> But the kernel source correctly uses '&' there rather than '%'

Indeed, I assume I got lucky on sparc64 (the tests only use aligned buffers).  I fixed it locally.
  
Paul Eggert Jan. 22, 2018, 5:15 p.m. UTC | #5
On 01/22/2018 02:55 AM, Adhemerval Zanella wrote:
> On 22/01/2018 06:27, Paul Eggert wrote:
>> Adhemerval Zanella wrote:
>>> +static inline bool
>>> +check_alignment (const void *base, size_t align)
>>> +{
>>> +  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
>>> +}
>> Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an architecture that does not allow unaligned access?
> Yes, I checked on sparc64 machine.  This test is similar to the Linux kernel one
> at lib/sort.c.

The Linux kernel lib/sort.c test is (((unsigned long)base & (align - 1)) 
== 0), which is correct. The test above uses '% (align - 1)' instead, 
which is clearly wrong; the "%" should be "&". As the tests evidently 
did not catch the error, they need to be improved to catch it.
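
A sketch of the kind of test that could catch this (the buffer layout,
values, and helper names are assumptions, not glibc's actual test harness):
sort 4-byte records from deliberately misaligned base pointers, so that for
some offset a wrong alignment check selects the word-sized swap on a
misaligned base and faults on strict-alignment machines, while the final
check verifies correctness everywhere.

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>
   #include <string.h>

   static int
   cmp_u32 (const void *a, const void *b)
   {
     uint32_t x, y;
     memcpy (&x, a, sizeof x);   /* Avoid unaligned loads in the test itself.  */
     memcpy (&y, b, sizeof y);
     return (x > y) - (x < y);
   }

   static void
   check_offset (size_t offset)
   {
     enum { n = 64 };
     static unsigned char buf[n * sizeof (uint32_t) + 8];
     unsigned char *base = buf + offset;

     /* Fill with n..1 so the sorted result is 1..n.  */
     for (uint32_t i = 0; i < n; i++)
       {
         uint32_t v = n - i;
         memcpy (base + i * sizeof v, &v, sizeof v);
       }
     qsort (base, n, sizeof (uint32_t), cmp_u32);
     for (uint32_t i = 0; i < n; i++)
       {
         uint32_t v;
         memcpy (&v, base + i * sizeof v, sizeof v);
         assert (v == i + 1);
       }
   }

   int
   main (void)
   {
     /* Offsets 1-3 produce misaligned bases for 4-byte elements.  */
     for (size_t offset = 0; offset < 4; offset++)
       check_offset (offset);
     return 0;
   }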

> It might be that your approach is faster for other architectures which do not have
> ifunc mempcpy, however I do not want to over-engineer this code since most real
> word correspond to key sizes of 4 and 8.

Thanks, that all makes sense.

One other question. Would it improve performance to partially evaluate 
qsort for the case where the key size is that of a pointer, to allow the 
swap to be done inline with four insns? I would imagine that this is the 
most common case.
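
For reference, a minimal sketch of the swap being alluded to (the function
name is illustrative): with the element size fixed at sizeof (void *), the
swap reduces to two loads and two stores, so it can be inlined into the
sort loop.

   static inline void
   swap_voidptr (void *restrict a, void *restrict b)
   {
     void **pa = a;
     void **pb = b;
     void *tmp = *pa;
     *pa = *pb;
     *pb = tmp;
   }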
  
Adhemerval Zanella Jan. 22, 2018, 5:48 p.m. UTC | #6
On 22/01/2018 15:15, Paul Eggert wrote:
> On 01/22/2018 02:55 AM, Adhemerval Zanella wrote:
>> On 22/01/2018 06:27, Paul Eggert wrote:
>>> Adhemerval Zanella wrote:
>>>> +static inline bool
>>>> +check_alignment (const void *base, size_t align)
>>>> +{
>>>> +  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
>>>> +}
>>> Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an architecture that does not allow unaligned access?
>> Yes, I checked on sparc64 machine.  This test is similar to the Linux kernel one
>> at lib/sort.c.
> 
> The Linux kernel lib/sort.c test is (((unsigned long)base & (align - 1)) == 0), which is correct. The test above uses '% (align - 1)' instead, which is clearly wrong; the "%" should be "&". As the tests evidently did not catch the error, they need to be improved to catch it.

Indeed, Alexander Monakov pointed out this issue in his message and I have
fixed it locally.

> 
>> It might be that your approach is faster for other architectures which do not have
>> ifunc mempcpy, however I do not want to over-engineer this code since most real
>> word correspond to key sizes of 4 and 8.
> 
> Thanks, that all makes sense.
> 
> One other question. Would it improve performance to partially evaluate qsort for the case where the key size is that of a pointer, to allow the swap to be done inline with four insns? I would imagine that this is the most common case.
> 

I noted that, at least on x86_64, calling a function pointer is slightly
faster than embedding the test in a switch (as the current msort does).  One
option I have not tested, and which would trade code size for performance,
is to parametrize the qsort creation (as in the 7/7 patch in this set) to
provide qsort_uint32_t, qsort_uint64_t, and qsort_generic, for instance
(which would call the swap inline).

So we would have something like:

void qsort (void *pbase, size_t total_elems, size_t size)
{
  if (size == sizeof (uint32_t)
    && check_alignment (base, sizeof (uint32_t)))
    return qsort_uint32_t (pbase, total_elems, size);
  else if (size == sizeof (uint64_t)
    && check_alignment (base, sizeof (uint64_t)))
    return qsort_uint64_t (pbase, total_elems, size);
  return qsort_generic (pbase, total_elems, size);
}
  
Paul Eggert Jan. 22, 2018, 6:29 p.m. UTC | #7
On 01/22/2018 09:48 AM, Adhemerval Zanella wrote:
> One option I have not
> tested, and which will trade code side for performance; would parametrize
> the qsort creation (as for the 7/7 patch in this set) to have qsort_uint32_t,
> qsort_uint64_t, and qsort_generic for instance (which calls the swap inline).
>
> So we will have something as:
>
> void qsort (void *pbase, size_t total_elems, size_t size)
> {
>    if (size == sizeof (uint32_t)
>      && check_alignment (base, sizeof (uint32_t)))
>      return qsort_uint32_t (pbase, total_elems, size);
>    else if (size == sizeof (uint64_t)
>      && check_alignment (base, sizeof (uint64_t)))
>      return qsort_uint64_t (pbase, total_elems, size);
>    return qsort_generic (pbase, total_elems, size);
> }

Yes, that's the option I was thinking of, except I was thinking that the 
first test should be "if (size == sizeof (void *) && check_alignment 
(base, alignof (void *))) return qsort_voidptr (pbase, total_elems, 
size);" because sorting arrays of pointers is the most common. (Also, 
check_alignment's argument should use alignof not sizeof.)
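
A self-contained sketch of the dispatch described here (all names such as
sort_voidptr and sort_dispatch are illustrative, not glibc identifiers; the
specialized path is shown as a toy insertion sort just to make the inline
swap visible, and the generic path simply falls back to qsort):

   #include <stdalign.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <stdlib.h>

   typedef int (*cmp_fn) (const void *, const void *);

   static bool
   check_alignment (const void *base, size_t align)
   {
     return ((uintptr_t) base & (align - 1)) == 0;
   }

   /* Pointer-size specialization: the swap is two loads and two stores.  */
   static void
   sort_voidptr (void *pbase, size_t n, cmp_fn cmp)
   {
     void **a = pbase;
     for (size_t i = 1; i < n; i++)
       for (size_t j = i; j > 0 && cmp (&a[j - 1], &a[j]) > 0; j--)
         {
           void *tmp = a[j - 1];
           a[j - 1] = a[j];
           a[j] = tmp;
         }
   }

   void
   sort_dispatch (void *pbase, size_t total_elems, size_t size, cmp_fn cmp)
   {
     if (size == sizeof (void *)
         && check_alignment (pbase, alignof (void *)))
       sort_voidptr (pbase, total_elems, cmp);
     else
       qsort (pbase, total_elems, size, cmp);   /* Generic fallback.  */
   }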
  
Adhemerval Zanella Jan. 22, 2018, 7:33 p.m. UTC | #8
On 22/01/2018 16:29, Paul Eggert wrote:
> On 01/22/2018 09:48 AM, Adhemerval Zanella wrote:
>> One option I have not
>> tested, and which will trade code side for performance; would parametrize
>> the qsort creation (as for the 7/7 patch in this set) to have qsort_uint32_t,
>> qsort_uint64_t, and qsort_generic for instance (which calls the swap inline).
>>
>> So we will have something as:
>>
>> void qsort (void *pbase, size_t total_elems, size_t size)
>> {
>>    if (size == sizeof (uint32_t)
>>      && check_alignment (base, sizeof (uint32_t)))
>>      return qsort_uint32_t (pbase, total_elems, size);
>>    else if (size == sizeof (uint64_t)
>>      && check_alignment (base, sizeof (uint64_t)))
>>      return qsort_uint64_t (pbase, total_elems, size);
>>    return qsort_generic (pbase, total_elems, size);
>> }
> 
> Yes, that's the option I was thinking of, except I was thinking that the first test should be "if (size == sizeof (void *) && check_alignment (base, alignof (void *))) return qsort_voidptr (pbase, total_elems, size);" because sorting arrays of pointers is the most common. (Also, check_alignment's argument should use alignof not sizeof.)
> 

I added the suggested implementation and the results are slightly better:

Results for member size 8
  Sorted
  nmemb   |      base |   patched | diff
        32|      1173 |      1282 | 9.29
      4096|    325485 |    332451 | 2.14
     32768|   3232255 |   3293842 | 1.91
    524288|  65645381 |  66182948 | 0.82

  Repeated
  nmemb   |      base |   patched | diff
        32|      2074 |      2034 | -1.93
      4096|    948339 |    913363 | -3.69
     32768|   8906214 |   8651378 | -2.86
    524288| 173498547 | 166294093 | -4.15

  MostlySorted
  nmemb   |      base |   patched | diff
        32|      2211 |      2147 | -2.89
      4096|    757543 |    739765 | -2.35
     32768|   7785343 |   7570811 | -2.76
    524288| 133912169 | 129728791 | -3.12

  Unsorted
  nmemb   |      base |   patched | diff
        32|      2219 |      2191 | -1.26
      4096|   1017790 |    989068 | -2.82
     32768|   9747216 |   9456092 | -2.99
    524288| 191726744 | 185012121 | -3.50

At the cost of a larger text size and slightly more code:

# Before
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   2578       0       0    2578     a12 stdlib/qsort.os

# After
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   6037       0       0    6037    1795 stdlib/qsort.os


I still prefer my version, which generates a shorter text segment and also
optimizes for uint32_t.
  
Paul Eggert Jan. 23, 2018, 6:04 a.m. UTC | #9
Adhemerval Zanella wrote:
> At the cost of large text sizes and slight more code:

Yes, that's a common tradeoff for this sort of optimization. My guess is that 
most glibc users these days would like to spend 4 kB of text space to gain a 
2%-or-so CPU speedup. (But it's just a guess. :-)
> I still prefer my version where generates shorter text segment and also
> optimizes for uint32_t.

The more-inlined version could also optimize for uint32_t. Such an optimization 
should not change the machine code on platforms with 32-bit pointers (since 
uint32_t has the same size and alignment restrictions as void *, and GCC should 
be smart enough to figure this out) but should speed up the size-4 case on 
platforms with 64-bit pointers.
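
A tiny compile-time illustration of that point (assuming a typical ABI
where uintptr_t matches the pointer width; this is not something the C
standard guarantees for every platform):

   #include <stdint.h>

   #if UINTPTR_MAX == UINT32_MAX
   /* On 32-bit-pointer targets the pointer specialization already covers
      4-byte keys, so a separate uint32_t path adds no machine code.  */
   _Static_assert (sizeof (uint32_t) == sizeof (void *),
                   "4-byte keys share the pointer-sized specialization");
   #endif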

Any thoughts on why the more-inlined version is a bit slower when input is 
already sorted?
  
Adhemerval Zanella Jan. 23, 2018, 6:28 p.m. UTC | #10
On 23/01/2018 04:04, Paul Eggert wrote:
> Adhemerval Zanella wrote:
>> At the cost of large text sizes and slight more code:
> 
> Yes, that's a common tradeoff for this sort of optimization. My guess is that most glibc users these days would like to spend 4 kB of text space to gain a 2%-or-so CPU speedup. (But it's just a guess. :-)
>> I still prefer my version where generates shorter text segment and also
>> optimizes for uint32_t.
> 
> The more-inlined version could also optimize for uint32_t. Such an optimization should not change the machine code on platforms with 32-bit pointers (since uint32_t has the same size and alignment restrictions as void *, and GCC should be smart enough to figure this out) but should speed up the size-4 case on platforms with 64-bit pointers.
> 
> Any thoughts on why the more-inlined version is a bit slower when input is already sorted?

Again, do we really want to over-engineer it?  GCC profile data shows 95% of
the total calls are issued with up to 9 elements and 92% with a key size of
8.  Firefox is somewhat more diverse, with 72% up to 17 elements and 95%
with a key size of 8.  I think that adding even more code complexity by
parametrizing the qsort calls to inline the swap operations won't really
make much difference in the aforementioned use cases.

I would rather add specialized sort implementations, as the BSD family does
(heapsort and mergesort), to provide different algorithms for different
constraints (mergesort for a stable sort, heapsort/mergesort to avoid
quicksort's worst case).  We might even extend it to add something like
introsort.
  
Paul Eggert Jan. 23, 2018, 11:37 p.m. UTC | #11
On 01/23/2018 10:28 AM, Adhemerval Zanella wrote:

> Again do we really to over-engineering it? GCC profile usage shows 95% to total
> issues done with up to 9 elements and 92% of key size 8.  Firefox is somewhat
> more diverse with 72% up to 17 elements and 95% of key size 8.

You have a point. I assume these were on machines with 64-bit pointers. 
In that case why bother with a size-4 special case? Special-casing 
pointer-size items should suffice.

> I would rather add specialized sort implementation such as BSD family, heapsort
> and mergesort, to provide different algorithm for different constraints (mergesort
> for stable-sort, heapsort/mergesort to avoid worse-case from quicksort). We might
> even extend it to add something like introsort.

Each of us over-engineers in his own way (:-).
  
Adhemerval Zanella Jan. 24, 2018, 10:47 a.m. UTC | #12
On 23/01/2018 21:37, Paul Eggert wrote:
> On 01/23/2018 10:28 AM, Adhemerval Zanella wrote:
> 
>> Again do we really to over-engineering it? GCC profile usage shows 95% to total
>> issues done with up to 9 elements and 92% of key size 8.  Firefox is somewhat
>> more diverse with 72% up to 17 elements and 95% of key size 8.
> 
> You have a point. I assume these were on machines with 64-bit pointers. In that case why bother with a size-4 special case? Special-casing pointer-size items should suffice.

Yes, I just tested on x86_64, and I added the size-4 case mainly because it
is quite simple in terms of code complexity and resulting code size.

> 
>> I would rather add specialized sort implementation such as BSD family, heapsort
>> and mergesort, to provide different algorithm for different constraints (mergesort
>> for stable-sort, heapsort/mergesort to avoid worse-case from quicksort). We might
>> even extend it to add something like introsort.
> 
> Each of us over-engineers in his own way (:-).
> 

I do think your points are fair: most usage of qsort already hits the
quicksort implementation (due to the total size of the array), and those
cases will see a small speedup from the swap optimization and the undefined
behaviour fix for qsort_r.

I think if speed is the focus, there are other ideas to optimize for it,
like BZ #17941.
  

Patch

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index b3a5102..2194003 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -23,20 +23,59 @@ 
 #include <limits.h>
 #include <stdlib.h>
 #include <string.h>
+#include <stdbool.h>
 
-/* Byte-wise swap two items of size SIZE. */
-#define SWAP(a, b, size)						      \
-  do									      \
-    {									      \
-      size_t __size = (size);						      \
-      char *__a = (a), *__b = (b);					      \
-      do								      \
-	{								      \
-	  char __tmp = *__a;						      \
-	  *__a++ = *__b;						      \
-	  *__b++ = __tmp;						      \
-	} while (--__size > 0);						      \
-    } while (0)
+/* Swap SIZE bytes between addresses A and B.  Helper to generic types
+   are provided as an optimization.  */
+
+typedef void (*swap_t)(void *, void *, size_t);
+
+static inline bool
+check_alignment (const void *base, size_t align)
+{
+  return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
+}
+
+static void
+swap_u32 (void *a, void *b, size_t size)
+{
+  uint32_t tmp = *(uint32_t*) a;
+  *(uint32_t*) a = *(uint32_t*) b;
+  *(uint32_t*) b = tmp;
+}
+
+static void
+swap_u64 (void *a, void *b, size_t size)
+{
+  uint64_t tmp = *(uint64_t*) a;
+  *(uint64_t*) a = *(uint64_t*) b;
+  *(uint64_t*) b = tmp;
+}
+
+static inline void
+swap_generic (void *a, void *b, size_t size)
+{
+  unsigned char tmp[128];
+  do
+    {
+      size_t s = size > sizeof (tmp) ? sizeof (tmp) : size;
+      memcpy (tmp, a, s);
+      a = __mempcpy (a, b, s);
+      b = __mempcpy (b, tmp, s);
+      size -= s;
+    }
+  while (size > 0);
+}
+
+static inline swap_t
+select_swap_func (const void *base, size_t size)
+{
+  if (size == 4 && check_alignment (base, 4))
+    return swap_u32;
+  else if (size == 8 && check_alignment (base, 8))
+    return swap_u64;
+  return swap_generic;
+}
 
 /* Discontinue quicksort algorithm when partition gets below this size.
    This particular magic number was chosen to work best on a Sun 4/260. */
@@ -96,6 +135,8 @@  __qsort_r (void *const pbase, size_t total_elems, size_t size,
     /* Avoid lossage with unsigned arithmetic below.  */
     return;
 
+  swap_t swap = select_swap_func (pbase, size);
+
   if (total_elems > MAX_THRESH)
     {
       char *lo = base_ptr;
@@ -119,13 +160,13 @@  __qsort_r (void *const pbase, size_t total_elems, size_t size,
 	  char *mid = lo + size * ((hi - lo) / size >> 1);
 
 	  if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)
-	    SWAP (mid, lo, size);
+	    swap (mid, lo, size);
 	  if ((*cmp) ((void *) hi, (void *) mid, arg) < 0)
-	    SWAP (mid, hi, size);
+	    swap (mid, hi, size);
 	  else
 	    goto jump_over;
 	  if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)
-	    SWAP (mid, lo, size);
+	    swap (mid, lo, size);
 	jump_over:;
 
 	  left_ptr  = lo + size;
@@ -144,7 +185,7 @@  __qsort_r (void *const pbase, size_t total_elems, size_t size,
 
 	      if (left_ptr < right_ptr)
 		{
-		  SWAP (left_ptr, right_ptr, size);
+		  swap (left_ptr, right_ptr, size);
 		  if (mid == left_ptr)
 		    mid = right_ptr;
 		  else if (mid == right_ptr)
@@ -216,7 +257,7 @@  __qsort_r (void *const pbase, size_t total_elems, size_t size,
         tmp_ptr = run_ptr;
 
     if (tmp_ptr != base_ptr)
-      SWAP (tmp_ptr, base_ptr, size);
+      swap (tmp_ptr, base_ptr, size);
 
     /* Insertion sort, running from left-hand-side up to right-hand-side.  */