x86: Fix Haswell strong flags (BZ#23709)

Message ID 20181011182352.25394-1-adhemerval.zanella@linaro.org
State Dropped

Commit Message

Adhemerval Zanella Netto Oct. 11, 2018, 6:23 p.m. UTC
  The commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
default flags for Haswell models.  Previously, new models were handled by the
default switch path, which assumed a Core i3/i5/i7 if AVX is available.  After
that commit, Haswell models (0x3f, 0x3c, 0x45, 0x46) no longer set the flags
Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
Prefer_PMINUB_for_stringop; only the TSX handling is applied.

This patch fixes it by disentangling the TSX flag handling from the memory
optimization flags.  The strstr case cited in the bug report now selects
__strstr_sse2_unaligned as expected for Haswell CPUs.

Checked on x86_64-linux-gnu.

	[BZ #23709]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
	independently of other flags.
---
 ChangeLog                  | 6 ++++++
 sysdeps/x86/cpu-features.c | 6 ++++++
 2 files changed, 12 insertions(+)
  

Comments

Florian Weimer Oct. 23, 2018, 9:55 a.m. UTC | #1
* Adhemerval Zanella:

> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index f4e0f5a2ed..80b3054cf8 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -316,7 +316,13 @@ init_cpu_features (struct cpu_features *cpu_features)
>  		    | bit_arch_Fast_Unaligned_Copy
>  		    | bit_arch_Prefer_PMINUB_for_stringop);
>  	      break;
> +	    }
>  
> +	 /* Disable TSX on some Haswell processors to avoid TSX on kernels that
> +	    weren't updated with the latest microcode package (which disables
> +	    broken feature by default).  */
> +	 switch (model)
> +	    {
>  	    case 0x3f:
>  	      /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
>  	      if (stepping >= 4)

I think the change is okay as posted.  It will need some testing in the
field because the newly selected implementations could have unexpected
performance drawbacks.

Thanks,
Florian
  
Adhemerval Zanella Netto Oct. 24, 2018, 7:22 p.m. UTC | #2
On 23/10/2018 06:55, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
>> index f4e0f5a2ed..80b3054cf8 100644
>> --- a/sysdeps/x86/cpu-features.c
>> +++ b/sysdeps/x86/cpu-features.c
>> @@ -316,7 +316,13 @@ init_cpu_features (struct cpu_features *cpu_features)
>>  		    | bit_arch_Fast_Unaligned_Copy
>>  		    | bit_arch_Prefer_PMINUB_for_stringop);
>>  	      break;
>> +	    }
>>  
>> +	 /* Disable TSX on some Haswell processors to avoid TSX on kernels that
>> +	    weren't updated with the latest microcode package (which disables
>> +	    broken feature by default).  */
>> +	 switch (model)
>> +	    {
>>  	    case 0x3f:
>>  	      /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
>>  	      if (stepping >= 4)
> 
> I think the change is okay as posted.  It will need some testing in the
> field because the newly selected implementations could have unexpected
> performance drawbacks.

This fix only changes ifunc selection for Haswell chips [1]:

Microarch           Core        Ext.Family  Family  Ext.Model  Model   (decoded)
Haswell (Client)    GT3E        0           0x6     0x4        0x6     Family 6 Model 70
                    ULT         0           0x6     0x4        0x5     Family 6 Model 69
                    S           0           0x6     0x3        0xC     Family 6 Model 60

Haswell (Server)    E, EP, EX   0           0x6     0x3        0xF     Family 6 Model 63

On these chips the internal glibc flags bit_arch_Fast_Rep_String, 
bit_arch_Fast_Unaligned_Load, bit_arch_Fast_Unaligned_Copy, and 
bit_arch_Prefer_PMINUB_for_stringop won't be set:

  * The bit_arch_Fast_Rep_String is only used for ifunc selection
    on i686 (32-bit), where it selects the *ssse3_rep* variants of
    memcpy, memmove, bcopy, and mempcpy.  If it is not set, the
    *ssse3* variants are used instead.

    I am not sure what the performance difference between them is,
    but my expectation is that ssse3_rep should be faster than ssse3.

  * The bit_arch_Fast_Unaligned_Load influences both i686 and x86_64.
    For x86_64 it influences the selection of the SSE2 unaligned
    optimized variants for stpncpy, strcpy, strncpy, stpcpy, strncat,
    strcat, and strstr (see the sketch at the end of this message).
    When it is not set, an ssse3 or plain sse2 variant is used instead
    for all but strstr.

    I am not sure what the performance difference between them is,
    but my expectation is that the unaligned versions should be faster.

  * The bit_arch_Fast_Unaligned_Copy influences mempcpy, memmove, and
    memcpy.  If the chip does not have SSSE3, the bit selects either
    an ERMS or an unaligned variant.  For Haswell the
    *_avx_unaligned_erms variants will be selected anyway, so this
    bit won't interfere with the best selection.

The bit_arch_Prefer_PMINUB_for_stringop is not used in ifunc selection.

[1] https://en.wikichip.org/wiki/intel/cpuid
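
As a concrete illustration of the strstr case: the x86_64 resolver
essentially picks __strstr_sse2_unaligned when Fast_Unaligned_Load is set
(queried via HAS_ARCH_FEATURE) and __strstr_sse2 otherwise.  Below is a
small standalone sketch of that selection logic with hypothetical stand-ins
for the glibc internals; it is not the actual resolver from
sysdeps/x86_64/multiarch/strstr.c:

  #include <stdio.h>

  /* Hypothetical stand-in for the Fast_Unaligned_Load arch bit; in
     glibc the resolver queries it from the cpu_features data.  */
  static int fast_unaligned_load = 1;

  /* Stubs standing in for the real multiarch variants.  */
  static const char *strstr_sse2 (void) { return "__strstr_sse2"; }
  static const char *strstr_sse2_unaligned (void)
  { return "__strstr_sse2_unaligned"; }

  int
  main (void)
  {
    /* Mirrors the selection: prefer the unaligned SSE2 variant when
       Fast_Unaligned_Load is set.  */
    const char *(*impl) (void)
      = fast_unaligned_load ? strstr_sse2_unaligned : strstr_sse2;
    printf ("selected: %s\n", impl ());
    return 0;
  }

With the fix applied, Haswell again reports Fast_Unaligned_Load, so the
unaligned variant is selected as before.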
  

Patch

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index f4e0f5a2ed..80b3054cf8 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -316,7 +316,13 @@  init_cpu_features (struct cpu_features *cpu_features)
 		    | bit_arch_Fast_Unaligned_Copy
 		    | bit_arch_Prefer_PMINUB_for_stringop);
 	      break;
+	    }
 
+	 /* Disable TSX on some Haswell processors to avoid TSX on kernels that
+	    weren't updated with the latest microcode package (which disables
+	    broken feature by default).  */
+	 switch (model)
+	    {
 	    case 0x3f:
 	      /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
 	      if (stepping >= 4)