Unroll slightly so some of the multiples can be pipelined on out-order
machines. Unrolling further started to induce slowdowns for sizes
[0, 4] but can help the loop so if larger sizes are the target
further unrolling can be beneficial.
Results for __dl_new_hash
Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Time as Geometric Mean of N=25 runs
Geometric of all benchmark New / Old: 0.791
type, length, New Time, Old Time, New Time / Old Time
fixed, 0, 0.641, 0.658, 0.974
fixed, 1, 1.888, 1.883, 1.003
fixed, 2, 2.712, 2.833, 0.957
fixed, 3, 3.314, 3.739, 0.886
fixed, 4, 4.316, 4.866, 0.887
fixed, 5, 5.16, 5.966, 0.865
fixed, 6, 5.986, 7.241, 0.827
fixed, 7, 7.264, 8.435, 0.861
fixed, 8, 8.052, 9.846, 0.818
fixed, 9, 9.369, 11.316, 0.828
fixed, 10, 10.256, 12.925, 0.794
fixed, 11, 12.191, 14.546, 0.838
fixed, 12, 12.667, 15.92, 0.796
fixed, 13, 14.442, 17.465, 0.827
fixed, 14, 14.808, 18.981, 0.78
fixed, 15, 16.244, 20.565, 0.79
fixed, 16, 17.166, 22.044, 0.779
fixed, 32, 35.447, 50.558, 0.701
fixed, 64, 86.479, 134.529, 0.643
fixed, 128, 155.453, 287.527, 0.541
fixed, 256, 302.57, 593.64, 0.51
random, 2, 11.168, 10.61, 1.053
random, 4, 13.308, 13.53, 0.984
random, 8, 16.579, 19.437, 0.853
random, 16, 21.292, 24.776, 0.859
random, 32, 30.56, 35.906, 0.851
random, 64, 49.249, 68.577, 0.718
random, 128, 81.845, 140.664, 0.582
random, 256, 152.517, 292.204, 0.522
---
sysdeps/generic/dl-hash.h | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)