[0/2] Enable EVEX strcmp

Message ID 20211101125412.611713-1-hjl.tools@gmail.com

H.J. Lu Nov. 1, 2021, 12:54 p.m. UTC
Remove Prefer_AVX2_STRCMP to enable EVEX strcmp.  When comparing two
32-byte strings, EVEX strcmp has been improved to require 1 load, 1
VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2
KORDs, 1 KMOVD and 1 TESTL.  For comparison, AVX2 strcmp requires 1 load,
2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL.  EVEX strcmp is now faster
than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.
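
The shorter sequence works because a single 32-bit mask can encode both "byte is non-null" and "bytes are equal", and incrementing that mask wraps to zero only when all 32 bits are set.  The following is a scalar C model of that mask arithmetic, not the glibc assembly.  It assumes that VPTESTM produces the non-null mask of the first string and that VPCMP is an equality compare using that mask as its write mask; the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32

/* Model of VPTESTM (assumed usage): bit i is set when byte i of s1
   is non-zero.  */
static uint32_t
nonzero_mask (const unsigned char *s1)
{
  uint32_t k = 0;
  for (int i = 0; i < VEC_SIZE; i++)
    if (s1[i] != 0)
      k |= (uint32_t) 1 << i;
  return k;
}

/* Model of VPCMP for equality under write mask K2 (assumed usage):
   bit i is set only when bit i of K2 is set and s1[i] == s2[i].  */
static uint32_t
masked_eq_mask (const unsigned char *s1, const unsigned char *s2,
                uint32_t k2)
{
  uint32_t k = 0;
  for (int i = 0; i < VEC_SIZE; i++)
    if ((k2 >> i & 1) != 0 && s1[i] == s2[i])
      k |= (uint32_t) 1 << i;
  return k;
}

/* Hypothetical helper: return VEC_SIZE when all 32 byte pairs are
   equal and non-null, otherwise the index of the first mismatch or
   null byte.  */
static int
first_stop_index (const unsigned char *s1, const unsigned char *s2)
{
  assert (s1 != NULL && s2 != NULL);
  uint32_t k1 = masked_eq_mask (s1, s2, nonzero_mask (s1));
  /* Model of INCL: the sum wraps to 0 only when k1 == 0xffffffff.  */
  uint32_t ecx = k1 + 1;
  if (ecx == 0)
    return VEC_SIZE;
  /* The lowest set bit of k1 + 1 sits at the lowest clear bit of k1,
     which is the first mismatching or null position.  */
  int idx = 0;
  while ((ecx & 1) == 0)
    {
      ecx >>= 1;
      idx++;
    }
  return idx;
}
```

In this model one mask and one increment replace the separate null check, equality check and mask combination, which matches the reduction in instruction count described above.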

bench-strcmp data on Tiger Lake:

Function: strcmp
Variant: default
                                    __strcmp_avx2	__strcmp_evex
=======================================================================
        length=1, align1=1, align2=1:        23.69	       25.56	
        length=1, align1=1, align2=1:        24.62	       23.43	
        length=1, align1=1, align2=1:        23.87	       23.43	
        length=2, align1=2, align2=2:         6.82	        6.61	
        length=2, align1=2, align2=2:         5.38	        5.98	
        length=2, align1=2, align2=2:         6.86	        6.85	
        length=3, align1=3, align2=3:         6.85	        6.86	
        length=3, align1=3, align2=3:         5.98	        5.98	
        length=3, align1=3, align2=3:         5.98	        6.10	
        length=4, align1=4, align2=4:         6.58	        5.98	
        length=4, align1=4, align2=4:         6.37	        5.98	
        length=4, align1=4, align2=4:         6.58	        5.98	
        length=5, align1=5, align2=5:         5.98	        5.98	
        length=5, align1=5, align2=5:         6.06	        6.82	
        length=5, align1=5, align2=5:         5.98	        5.98	
        length=6, align1=6, align2=6:         6.58	        5.98	
        length=6, align1=6, align2=6:         6.58	        6.06	
        length=6, align1=6, align2=6:         5.98	        5.98	
        length=7, align1=7, align2=7:         5.98	        5.98	
        length=7, align1=7, align2=7:         5.98	        6.05	
        length=7, align1=7, align2=7:         5.98	        5.98	
        length=8, align1=8, align2=8:         5.38	        5.38	
        length=8, align1=8, align2=8:         5.98	        5.38	
        length=8, align1=8, align2=8:         5.98	        5.38	
        length=9, align1=9, align2=9:         5.38	        5.38	
        length=9, align1=9, align2=9:         5.38	        5.38	
        length=9, align1=9, align2=9:         4.78	        5.38	
     length=10, align1=10, align2=10:         6.05	        5.40	
     length=10, align1=10, align2=10:         5.38	        5.38	
     length=10, align1=10, align2=10:         5.38	        5.38	
     length=11, align1=11, align2=11:         4.78	        5.38	
     length=11, align1=11, align2=11:         4.78	        5.38	
     length=11, align1=11, align2=11:         4.78	        5.38	
     length=12, align1=12, align2=12:         4.86	        5.38	
     length=12, align1=12, align2=12:         5.98	        5.38	
     length=12, align1=12, align2=12:         5.98	        5.38	
     length=13, align1=13, align2=13:         5.98	        5.38	
     length=13, align1=13, align2=13:         4.78	        5.38	
     length=13, align1=13, align2=13:         4.78	        5.38	
     length=14, align1=14, align2=14:         5.98	        5.38	
     length=14, align1=14, align2=14:         5.47	        5.38	
     length=14, align1=14, align2=14:         5.38	        5.38	
     length=15, align1=15, align2=15:         5.38	        5.38	
     length=15, align1=15, align2=15:         5.98	        5.38	
     length=15, align1=15, align2=15:         6.05	        5.38	
     length=16, align1=16, align2=16:         4.79	        4.79	
     length=16, align1=16, align2=16:         4.78	        4.78	
     length=16, align1=16, align2=16:         5.38	        4.79	
     length=17, align1=17, align2=17:         6.58	        7.18	
     length=17, align1=17, align2=17:         6.58	        7.18	
     length=17, align1=17, align2=17:         6.58	        7.20	
     length=18, align1=18, align2=18:         6.58	        7.20	
     length=18, align1=18, align2=18:         6.58	        7.20	
     length=18, align1=18, align2=18:         6.58	        7.20	
     length=19, align1=19, align2=19:         6.58	        7.20	
     length=19, align1=19, align2=19:         6.58	        7.18	
     length=19, align1=19, align2=19:         6.58	        7.20	
     length=20, align1=20, align2=20:         6.58	        7.18	
     length=20, align1=20, align2=20:         6.58	        7.17	
     length=20, align1=20, align2=20:         6.58	        7.18	
     length=21, align1=21, align2=21:         6.58	        7.07	
     length=21, align1=21, align2=21:         7.18	        5.98	
     length=21, align1=21, align2=21:         7.18	        5.98	
     length=22, align1=22, align2=22:         6.58	        5.98	
     length=22, align1=22, align2=22:         7.18	        5.98	
     length=22, align1=22, align2=22:         7.18	        6.06	
     length=23, align1=23, align2=23:         6.58	        5.98	
     length=23, align1=23, align2=23:         6.58	        5.98	
     length=23, align1=23, align2=23:         6.58	        5.98	
     length=24, align1=24, align2=24:         4.86	        4.79	
     length=24, align1=24, align2=24:         5.38	        4.79	
     length=24, align1=24, align2=24:         5.38	        4.79	
     length=25, align1=25, align2=25:         4.78	        4.79	
     length=25, align1=25, align2=25:         5.38	        4.79	
     length=25, align1=25, align2=25:         5.38	        4.78	
     length=26, align1=26, align2=26:         5.46	        4.78	
     length=26, align1=26, align2=26:         5.38	        4.79	
     length=26, align1=26, align2=26:         5.38	        4.78	
     length=27, align1=27, align2=27:         4.78	        4.79	
     length=27, align1=27, align2=27:         4.78	        4.78	
     length=27, align1=27, align2=27:         4.78	        4.79	
     length=28, align1=28, align2=28:         5.38	        4.79	
     length=28, align1=28, align2=28:         4.78	        4.79	
     length=28, align1=28, align2=28:         5.38	        4.78	
     length=29, align1=29, align2=29:         4.78	        4.79	
     length=29, align1=29, align2=29:         5.38	        4.78	
     length=29, align1=29, align2=29:         4.78	        4.79	
     length=30, align1=30, align2=30:         4.78	        4.86	
     length=30, align1=30, align2=30:         5.38	        4.79	
     length=30, align1=30, align2=30:         4.78	        4.79	
     length=31, align1=31, align2=31:         4.78	        4.86	
     length=31, align1=31, align2=31:         5.38	        4.78	
     length=31, align1=31, align2=31:         5.38	        4.78	
        length=4, align1=0, align2=0:         6.00	        5.39	
        length=4, align1=0, align2=0:         6.00	        5.38	
        length=4, align1=0, align2=0:         6.00	        5.38	
        length=4, align1=0, align2=0:         5.98	        5.38	
        length=4, align1=0, align2=0:         6.02	        5.38	
        length=4, align1=0, align2=0:         5.98	        5.38	
        length=4, align1=0, align2=1:         5.98	        5.98	
        length=4, align1=1, align2=2:         5.38	        5.98	
        length=8, align1=0, align2=0:         5.98	        5.38	
        length=8, align1=0, align2=0:         6.02	        5.38	
        length=8, align1=0, align2=0:         6.00	        5.38	
        length=8, align1=0, align2=0:         6.00	        5.38	
        length=8, align1=0, align2=0:         6.02	        5.38	
        length=8, align1=0, align2=0:         5.98	        5.38	
        length=8, align1=0, align2=2:         5.98	        5.98	
        length=8, align1=2, align2=3:         5.38	        5.98	
       length=16, align1=0, align2=0:         5.38	        4.79	
       length=16, align1=0, align2=0:         5.38	        4.78	
       length=16, align1=0, align2=0:         4.87	        4.78	
       length=16, align1=0, align2=0:         5.38	        4.79	
       length=16, align1=0, align2=0:         4.78	        4.79	
       length=16, align1=0, align2=0:         5.38	        4.79	
       length=16, align1=0, align2=3:         6.00	        5.38	
       length=16, align1=3, align2=4:         5.98	        5.38	
       length=32, align1=0, align2=0:         7.82	        5.99	
       length=32, align1=0, align2=0:         7.71	        6.58	
       length=32, align1=0, align2=0:         6.44	        4.79	
       length=32, align1=0, align2=0:         6.81	        4.79	
       length=32, align1=0, align2=0:         6.53	        4.79	
       length=32, align1=0, align2=0:         6.33	        4.79	
       length=32, align1=0, align2=4:         8.61	        4.78	
       length=32, align1=4, align2=5:         6.74	        5.49	
       length=64, align1=0, align2=0:         9.67	        8.24	
       length=64, align1=0, align2=0:        11.11	        8.23	
       length=64, align1=0, align2=0:        10.00	        6.88	
       length=64, align1=0, align2=0:        12.82	        6.88	
       length=64, align1=0, align2=0:        10.42	        7.88	
       length=64, align1=0, align2=0:        10.37	        6.88	
       length=64, align1=0, align2=5:        11.08	        6.88	
       length=64, align1=5, align2=6:         9.29	        6.88	
      length=128, align1=0, align2=0:        14.06	       14.08	
      length=128, align1=0, align2=0:        14.23	       14.14	
      length=128, align1=0, align2=0:         8.41	        7.48	
      length=128, align1=0, align2=0:        10.55	        7.48	
      length=128, align1=0, align2=0:         8.45	        7.48	
      length=128, align1=0, align2=0:         9.38	        7.48	
      length=128, align1=0, align2=6:         8.44	        7.48	
      length=128, align1=6, align2=7:         8.66	        7.48	
      length=256, align1=0, align2=0:        16.54	       17.55	
      length=256, align1=0, align2=0:        16.42	       17.49	
      length=256, align1=0, align2=0:        17.03	       17.47	
      length=256, align1=0, align2=0:        17.57	       17.49	
      length=256, align1=0, align2=0:        16.63	       17.47	
      length=256, align1=0, align2=0:        17.88	       17.54	
      length=256, align1=0, align2=7:        20.20	       19.18	
      length=256, align1=7, align2=8:        20.17	       19.14	
      length=512, align1=0, align2=0:        25.17	       24.51	
      length=512, align1=0, align2=0:        24.60	       24.38	
      length=512, align1=0, align2=0:        24.53	       24.52	
      length=512, align1=0, align2=0:        25.71	       24.34	
      length=512, align1=0, align2=0:        24.55	       24.48	
      length=512, align1=0, align2=0:        25.15	       24.44	
      length=512, align1=0, align2=8:        25.97	       25.90	
      length=512, align1=8, align2=9:        25.88	       25.92	
     length=1024, align1=0, align2=0:        40.13	       36.75	
     length=1024, align1=0, align2=0:        39.84	       36.63	
     length=1024, align1=0, align2=0:        40.50	       36.84	
     length=1024, align1=0, align2=0:        40.16	       36.76	
     length=1024, align1=0, align2=0:        39.72	       36.76	
     length=1024, align1=0, align2=0:        40.67	       36.76	
     length=1024, align1=0, align2=9:        40.57	       39.59	
    length=1024, align1=9, align2=10:        40.66	       39.60	
       length=16, align1=1, align2=2:         6.59	        7.18	
       length=16, align1=2, align2=1:         7.18	        7.18	
       length=16, align1=1, align2=2:         5.39	        5.38	
       length=16, align1=2, align2=1:         5.97	        5.40	
       length=16, align1=1, align2=2:         5.41	        5.38	
       length=16, align1=2, align2=1:         5.98	        5.38	
       length=32, align1=2, align2=4:         8.81	        7.18	
       length=32, align1=4, align2=2:         8.79	        7.18	
       length=32, align1=2, align2=4:         7.57	        4.79	
       length=32, align1=4, align2=2:         6.79	        4.79	
       length=32, align1=2, align2=4:         7.03	        4.78	
       length=32, align1=4, align2=2:         7.04	        4.78	
       length=64, align1=3, align2=6:        10.00	        8.38	
       length=64, align1=6, align2=3:         8.89	        9.57	
       length=64, align1=3, align2=6:         9.31	        6.88	
       length=64, align1=6, align2=3:        10.06	        6.88	
       length=64, align1=3, align2=6:         9.38	        6.88	
       length=64, align1=6, align2=3:        10.42	        6.88	
      length=128, align1=4, align2=8:        17.36	       16.15	
      length=128, align1=8, align2=4:        14.30	       14.50	
      length=128, align1=4, align2=8:         8.48	        7.48	
      length=128, align1=8, align2=4:         8.78	        7.48	
      length=128, align1=4, align2=8:         8.45	        7.48	
      length=128, align1=8, align2=4:         8.57	        7.55	
     length=256, align1=5, align2=10:        20.73	       19.26	
     length=256, align1=10, align2=5:        16.81	       18.56	
     length=256, align1=5, align2=10:        20.44	       19.14	
     length=256, align1=10, align2=5:        16.76	       18.57	
     length=256, align1=5, align2=10:        20.03	       19.22	
     length=256, align1=10, align2=5:        17.01	       18.55	
     length=512, align1=6, align2=12:        26.50	       25.81	
     length=512, align1=12, align2=6:        24.64	       25.61	
     length=512, align1=6, align2=12:        26.23	       25.90	
     length=512, align1=12, align2=6:        24.78	       25.70	
     length=512, align1=6, align2=12:        25.85	       25.90	
     length=512, align1=12, align2=6:        25.98	       25.71	
    length=1024, align1=7, align2=14:        40.62	       39.69	
    length=1024, align1=14, align2=7:        39.74	       39.06	
    length=1024, align1=7, align2=14:        40.70	       39.58	
    length=1024, align1=14, align2=7:        40.16	       39.04	
    length=1024, align1=7, align2=14:        40.62	       39.65	
    length=1024, align1=14, align2=7:        39.68	       39.12	
length=128, align1=8063, align2=8063:        14.19	       14.43	
length=128, align1=8063, align2=8062:        14.57	       14.48	
length=129, align1=8062, align2=8063:        17.52	       16.06	
length=129, align1=8062, align2=8062:        14.13	       14.08	
length=129, align1=8062, align2=8062:        14.16	       14.08	
length=129, align1=8062, align2=8061:        15.59	       14.54	
length=130, align1=8061, align2=8062:        17.53	       16.14	
length=130, align1=8061, align2=8061:        14.66	       14.08	
length=130, align1=8061, align2=8061:        13.80	       14.09	
length=130, align1=8061, align2=8060:        14.28	       14.47	
length=131, align1=8060, align2=8061:        17.84	       16.11	
length=131, align1=8060, align2=8060:        14.08	       14.07	
length=131, align1=8060, align2=8060:        14.02	       14.07	
length=131, align1=8060, align2=8059:        15.05	       14.48	
length=132, align1=8059, align2=8060:        17.46	       16.10	
length=132, align1=8059, align2=8059:        13.99	       14.07	
length=132, align1=8059, align2=8059:        14.01	       14.08	
length=132, align1=8059, align2=8058:        14.54	       14.54	
length=133, align1=8058, align2=8059:        17.38	       16.17	
length=133, align1=8058, align2=8058:        14.14	       14.08	
length=133, align1=8058, align2=8058:        13.88	       14.06	
length=133, align1=8058, align2=8057:        14.66	       14.47	
length=134, align1=8057, align2=8058:        17.45	       16.13	
length=134, align1=8057, align2=8057:        14.10	       14.07	
length=134, align1=8057, align2=8057:        14.54	       14.07	
length=134, align1=8057, align2=8056:        14.58	       14.49	
length=135, align1=8056, align2=8057:        17.65	       16.10	
length=135, align1=8056, align2=8056:        13.91	       14.08	
length=135, align1=8056, align2=8056:        14.16	       14.07	
length=135, align1=8056, align2=8055:        15.19	       14.74	
length=136, align1=8055, align2=8056:        18.17	       16.10	
length=136, align1=8055, align2=8055:        14.68	       14.64	
length=136, align1=8055, align2=8055:        14.58	       14.64	
length=136, align1=8055, align2=8054:        15.21	       15.03	
length=137, align1=8054, align2=8055:        17.75	       16.22	
length=137, align1=8054, align2=8054:        14.51	       14.62	
length=137, align1=8054, align2=8054:        15.15	       14.69	
length=137, align1=8054, align2=8053:        15.11	       14.94	
length=138, align1=8053, align2=8054:        18.13	       16.22	
length=138, align1=8053, align2=8053:        14.61	       14.70	
length=138, align1=8053, align2=8053:        14.41	       14.70	
length=138, align1=8053, align2=8052:        14.96	       14.94	
length=139, align1=8052, align2=8053:        17.98	       16.21	
length=139, align1=8052, align2=8052:        14.63	       14.68	
length=139, align1=8052, align2=8052:        15.30	       14.62	
length=139, align1=8052, align2=8051:        15.20	       14.95	
length=140, align1=8051, align2=8052:        17.66	       16.13	
length=140, align1=8051, align2=8051:        14.60	       14.68	
length=140, align1=8051, align2=8051:        14.58	       14.62	
length=140, align1=8051, align2=8050:        15.51	       14.94	
length=141, align1=8050, align2=8051:        17.41	       16.14	
length=141, align1=8050, align2=8050:        14.77	       14.71	
length=141, align1=8050, align2=8050:        14.50	       14.62	
length=141, align1=8050, align2=8049:        14.95	       14.97	
length=142, align1=8049, align2=8050:        17.55	       16.14	
length=142, align1=8049, align2=8049:        14.46	       14.70	
length=142, align1=8049, align2=8049:        14.60	       14.61	
length=142, align1=8049, align2=8048:        14.77	       14.78	
length=143, align1=8048, align2=8049:        18.15	       16.15	
length=143, align1=8048, align2=8048:        13.92	       14.02	
length=143, align1=8048, align2=8048:        13.88	       14.02	
length=143, align1=8048, align2=8047:        14.11	       14.32	
length=144, align1=8047, align2=8048:        17.64	       16.19	
length=144, align1=8047, align2=8047:        14.20	       13.96	
length=144, align1=8047, align2=8047:        14.03	       13.95	
length=144, align1=8047, align2=8046:        14.36	       14.32	
length=145, align1=8046, align2=8047:        17.82	       16.11	
length=145, align1=8046, align2=8046:        14.39	       13.95	
length=145, align1=8046, align2=8046:        13.88	       13.95	
length=145, align1=8046, align2=8045:        14.55	       14.33	
length=146, align1=8045, align2=8046:        18.02	       16.10	
length=146, align1=8045, align2=8045:        13.91	       13.95	
length=146, align1=8045, align2=8045:        13.77	       13.95	
length=146, align1=8045, align2=8044:        14.26	       14.32	
length=147, align1=8044, align2=8045:        17.43	       16.17	
length=147, align1=8044, align2=8044:        14.02	       14.01	
length=147, align1=8044, align2=8044:        13.99	       13.89	
length=147, align1=8044, align2=8043:        14.40	       14.32	
length=148, align1=8043, align2=8044:        17.57	       16.08	
length=148, align1=8043, align2=8043:        14.00	       13.95	
length=148, align1=8043, align2=8043:        14.18	       13.95	
length=148, align1=8043, align2=8042:        14.66	       14.33	
length=149, align1=8042, align2=8043:        17.50	       16.20	
length=149, align1=8042, align2=8042:        13.87	       13.95	
length=149, align1=8042, align2=8042:        14.12	       13.96	
length=149, align1=8042, align2=8041:        14.74	       14.32	
length=150, align1=8041, align2=8042:        17.63	       16.13	
length=150, align1=8041, align2=8041:        13.87	       13.95	
length=150, align1=8041, align2=8041:        13.73	       13.94	
length=150, align1=8041, align2=8040:        14.31	       14.34	
length=151, align1=8040, align2=8041:        18.46	       16.09	
length=151, align1=8040, align2=8040:        15.37	       13.95	
length=151, align1=8040, align2=8040:        14.01	       13.95	
length=151, align1=8040, align2=8039:        14.25	       14.32	
length=152, align1=8039, align2=8040:        17.70	       16.11	
length=152, align1=8039, align2=8039:        13.89	       14.03	
length=152, align1=8039, align2=8039:        14.49	       14.02	
length=152, align1=8039, align2=8038:        14.31	       14.39	
length=153, align1=8038, align2=8039:        17.62	       16.10	
length=153, align1=8038, align2=8038:        13.75	       13.95	
length=153, align1=8038, align2=8038:        14.00	       13.94	
length=153, align1=8038, align2=8037:        14.25	       14.33	
length=154, align1=8037, align2=8038:        18.33	       16.11	
length=154, align1=8037, align2=8037:        14.12	       13.96	
length=154, align1=8037, align2=8037:        14.08	       13.95	
length=154, align1=8037, align2=8036:        15.15	       14.33	
length=155, align1=8036, align2=8037:        17.66	       16.09	
length=155, align1=8036, align2=8036:        14.22	       14.01	
length=155, align1=8036, align2=8036:        13.87	       14.02	
length=155, align1=8036, align2=8035:        14.63	       14.32	
length=156, align1=8035, align2=8036:        17.57	       16.10	
length=156, align1=8035, align2=8035:        14.00	       13.96	
length=156, align1=8035, align2=8035:        13.88	       13.95	
length=156, align1=8035, align2=8034:        14.79	       14.41	
length=157, align1=8034, align2=8035:        17.74	       16.15	
length=157, align1=8034, align2=8034:        14.13	       13.94	
length=157, align1=8034, align2=8034:        14.86	       13.95	
length=157, align1=8034, align2=8033:        14.35	       14.33	
length=158, align1=8033, align2=8034:        17.68	       16.16	
length=158, align1=8033, align2=8033:        13.94	       13.94	

H.J. Lu (2):
  x86-64: Improve EVEX strcmp with masked load
  x86-64: Remove Prefer_AVX2_STRCMP

 sysdeps/x86/cpu-features.c                    |   8 -
 sysdeps/x86/cpu-tunables.c                    |   2 -
 ...cpu-features-preferred_feature_index_1.def |   1 -
 sysdeps/x86_64/multiarch/strcmp-evex.S        | 461 +++++++++---------
 sysdeps/x86_64/multiarch/strcmp.c             |   3 +-
 sysdeps/x86_64/multiarch/strncmp.c            |   3 +-
 6 files changed, 245 insertions(+), 233 deletions(-)