Add Prefer_MAP_32BIT_EXEC for Silvermont

Message ID CAMe9rOofj5_TQFcU695CbMmpqcULUgD4r0O9kY2+VmsNoPJx9A@mail.gmail.com
State New, archived

Commit Message

H.J. Lu Dec. 12, 2015, 5:36 p.m. UTC
  On Fri, Dec 11, 2015 at 6:01 PM, Jeff Law <law@redhat.com> wrote:
> On 12/11/2015 05:31 PM, Zack Weinberg wrote:
>>
>> On Fri, Dec 11, 2015 at 7:14 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>>>
>>>> And I'd argue that this is killing ASLR at a level that it should be
>>>> an opt-out rather than opt-in.  Crippling ASLR is, IMHO,
>>>> unacceptable.
>>>
>>>
>>> You're arguing then that running 32bit code is unacceptable.
>>
>>
>> I don't see that that follows.
>>
>> Right now, 32-bit code has security margin X and 64-bit code has
>> security margin Y > X.  The proposed patch *reduces* the security
>> margin of 64-bit code from Y to X (give or take).  That may be, and
>> IMHO is, an unacceptable change *even if* X is agreed to be adequate,
>> or anyway the best that can be done for 32-bit.
>
> Exactly.  For a 64 bit application, this change will essentially cripple
> ASLR if I understand the patch correctly.  That is unacceptable to me and
> likely to Red Hat as a whole.
>
>>
>> Fundamentally, my issue here is that there are people right now
>> depending on this security margin to be Y, so a glibc upgrade should
>> not silently remove that.  It is a compatibility break of the worst
>> kind: completely invisible in normal operation, but the system no
>> longer has a property you were counting on to protect you under
>> abnormal (adversarial) conditions.
>
> Right.  And in fact, ASLR is the margin by which some currently known
> vulnerabilities have not been turned into proof of concept exploits.
>
> ASLR, while not perfect, while bypassable via various information leaks, is
> still a vital component in the overall security profile for Linux,
> particularly for 64 bit OSs & applications.
>

Here is the updated patch to make it opt-in.  OK for master?
  

Comments

Zack Weinberg Dec. 14, 2015, 2:56 p.m. UTC | #1
On Sat, Dec 12, 2015 at 12:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Here is the updated patch to make it opt-in.  OK for master?

Thank you for being willing to make that change.

I have no more principled objections; however, the code could be much
simpler. Now that it's opt-in, there's no reason to entangle it with
the x86 tuning code; it should be wholly controlled by the environment
variable.  Also, why are you open-coding a loop over the contents of
__environ?  Isn't this what __secure_getenv is for?

zw
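Zack's suggestion amounts to secure_getenv semantics: ignore the variable whenever the process runs in secure mode. The following is a minimal application-level sketch of that behaviour, not code that runs inside ld.so. It assumes Linux and glibc's getauxval, and uses the kernel's AT_SECURE flag (on Linux, __libc_enable_secure is set from it). The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Hypothetical helper: report whether the opt-in variable is present.
   Like secure_getenv, it ignores the environment when the kernel marks
   the process AT_SECURE (for example, a set-user-ID program).  */
static bool
prefer_map_32bit_exec_requested (void)
{
  if (getauxval (AT_SECURE) != 0)
    return false;
  /* As in the posted patch, presence alone enables the feature.  */
  return getenv ("LD_ENABLE_PREFER_MAP_32BIT_EXEC") != NULL;
}
```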
  
Florian Weimer Dec. 14, 2015, 3:15 p.m. UTC | #2
On 12/12/2015 06:36 PM, H.J. Lu wrote:

> Here is the updated patch to make it opt-in.  OK for master?

I would still like to see performance numbers for the PIE and vDSO
cases, as requested here:

  <https://sourceware.org/ml/libc-alpha/2015-12/>

As far as I can tell, the application impact of this setting has not
been investigated, either:

  <https://sourceware.org/ml/libc-alpha/2015-12/msg00261.html>

And it probably makes sense to guard this by a configure switch because
for almost all users, it's just bloat.

I'm somewhat sympathetic to Zack's request to remove the special-casing
on CPU settings.  It would certainly simplify application compatibility
testing without access to the defective silicon.

Florian
  
H.J. Lu Dec. 14, 2015, 4:43 p.m. UTC | #3
On Mon, Dec 14, 2015 at 6:56 AM, Zack Weinberg <zackw@panix.com> wrote:
> On Sat, Dec 12, 2015 at 12:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> Here is the updated patch to make it opt-in.  OK for master?
>
> Thank you for being willing to make that change.
>
> I have no more principled objections; however, the code could be much
> simpler. Now that it's opt-in, there's no reason to entangle it with
> the x86 tuning code; it should be wholly controlled by the environment

How about

@@ -161,6 +195,14 @@ init_cpu_features (struct cpu_features *cpu_features)
   if (HAS_CPU_FEATURE (CMOV))
     cpu_features->feature[index_I686] |= bit_I686;

+  /* For 64-bit applications, branch prediction performance may be
+     negatively impacted when the target of a branch is more than 4GB
+     away from the branch.  Set the Prefer_MAP_32BIT_EXEC bit so that
+     mmap will try to map executable pages with MAP_32BIT first.
+     NB: MAP_32BIT will map to lower 2GB, not lower 4GB, address.  */
+  cpu_features->feature[index_Prefer_MAP_32BIT_EXEC]
+    |= get_prefer_map_32bit_exec ();
+

> variable.  Also, why are you open-coding a loop over the contents of
> __environ?  Isn't this what __secure_getenv is for?
>

It is because get_prefer_map_32bit_exec is called very early in ld.so, when
__secure_getenv/getenv aren't available yet.
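Because getenv cannot be called at that point, the patch compares each environ entry by hand. The function below pulls the same loop out into a hypothetical standalone form. It takes the environment array as a parameter so the loop can be exercised in isolation. It is a sketch of the patch's logic, not the patch itself.

```c
#include <stddef.h>

/* Hypothetical standalone version of the loop in the patch: return 1 if
   any "NAME=value" entry in ENVP matches NAME_EQ, where NAME_EQ is the
   variable name followed by '='.  It uses no libc functions, which is
   why such a loop can run before getenv is usable.  */
static int
env_has_name (char *const *envp, const char *name_eq)
{
  for (char *const *cur = envp; *cur != NULL; ++cur)
    {
      for (size_t i = 0; ; i++)
        {
          if (name_eq[i] != (*cur)[i])
            break;              /* Mismatch, including end of entry.  */
          if ((*cur)[i] == '=')
            return 1;           /* Whole name and its '=' matched.  */
        }
    }
  return 0;
}
```

The name is passed with its trailing '='. As a result, an entry such as LD_ENABLE_PREFER_MAP_32BIT_EXECX=1 does not match. As in the patch, any value, even an empty one, counts as a match.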
  
H.J. Lu Dec. 14, 2015, 5:10 p.m. UTC | #4
On Mon, Dec 14, 2015 at 7:15 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 12/12/2015 06:36 PM, H.J. Lu wrote:
>
>> Here is the updated patch to make it opt-in.  OK for master?
>
> I would still like to see performance numbers for the PIE and vDSO
> cases, as requested here:

This optimization has no performance impact on either PIE or
vDSO.

>   <https://sourceware.org/ml/libc-alpha/2015-12/>
>
> As far as I can tell, the application impact of this setting has not
> been investigated, either:
>
>   <https://sourceware.org/ml/libc-alpha/2015-12/msg00261.html>
>
> And it probably makes sense to guard this by a configure switch because
> for almost all users, it's just bloat.

Adding a configure switch defeats LD_ENABLE_PREFER_MAP_32BIT_EXEC.

old:

[hjl@gnu-6 build-x86_64-linux]$ size elf/dl-sysdep.os
   text   data    bss    dec    hex filename
   3976     20      4   4000    fa0 elf/dl-sysdep.os

and new:

[hjl@gnu-6 build-x86_64-linux]$ size elf/dl-sysdep.os
   text   data    bss    dec    hex filename
   4201     20      4   4225   1081 elf/dl-sysdep.os

It increases elf/dl-sysdep.os by 225 bytes.  Calling it "bloat" is
a stretch.

> I'm somewhat sympathetic to Zack's request to remove the special-casing
> on CPU settings.  It would certainly simplify application compatibility
> testing without access to the defective silicon.
>
> Florian
>
  
H.J. Lu Dec. 14, 2015, 5:31 p.m. UTC | #5
On Mon, Dec 14, 2015 at 9:16 AM, Zack Weinberg <zackw@panix.com> wrote:
> On Mon, Dec 14, 2015 at 11:43 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> How about
>>
>> @@ -161,6 +195,14 @@ init_cpu_features (struct cpu_features *cpu_features)
>>    if (HAS_CPU_FEATURE (CMOV))
>>      cpu_features->feature[index_I686] |= bit_I686;
>>
>> +  /* For 64-bit applications, branch prediction performance may be
>> +     negatively impacted when the target of a branch is more than 4GB
>> +     away from the branch.  Set the Prefer_MAP_32BIT_EXEC bit so that
>> +     mmap will try to map executable pages with MAP_32BIT first.
>> +     NB: MAP_32BIT will map to lower 2GB, not lower 4GB, address.  */
>> +  cpu_features->feature[index_Prefer_MAP_32BIT_EXEC]
>> +    |= get_prefer_map_32bit_exec ();
>
> Uh, what I'm saying is I don't think this belongs in cpu_features[] *at all*.

If other 64-bit architectures want to support LD_ENABLE_PREFER_MAP_32BIT_EXEC,
I will make it generic.  Otherwise, I will leave it in x86 cpu_features.

>>> variable.  Also, why are you open-coding a loop over the contents of
>>> __environ?  Isn't this what __secure_getenv is for?
>>
>> It is because get_prefer_map_32bit_exec is called very early in ld.so, when
>> __secure_getenv/getenv aren't available yet.
>
> Then we should fix that instead of duplicating security-sensitive logic.

ld.so has its own functions to process environment variables. Since
only sysdeps/x86/cpu-features.c needs it, I left __secure_getenv out of
elf/dl-environ.c.
  
Florian Weimer Dec. 14, 2015, 5:34 p.m. UTC | #6
On 12/14/2015 06:10 PM, H.J. Lu wrote:
> On Mon, Dec 14, 2015 at 7:15 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 12/12/2015 06:36 PM, H.J. Lu wrote:
>>
>>> Here is the updated patch to make it opt-in.  OK for master?
>>
>> I would still like to see performance numbers for the PIE and vDSO
>> cases, as requested here:
> 
> This optimization has no performance impact on either PIE or
> vDSO.

That's unfortunate.  Why don't you need a more complete fix?

Florian
  
H.J. Lu Dec. 14, 2015, 5:38 p.m. UTC | #7
On Mon, Dec 14, 2015 at 9:34 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 12/14/2015 06:10 PM, H.J. Lu wrote:
>> On Mon, Dec 14, 2015 at 7:15 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 12/12/2015 06:36 PM, H.J. Lu wrote:
>>>
>>>> Here is the updated patch to make it opt-in.  OK for master?
>>>
>>> I would still like to see performance numbers for the PIE and vDSO
>>> cases, as requested here:
>>
>> This optimization has no performance impact on either PIE or
>> vDSO.
>
> That's unfortunate.  Why don't you need a more complete fix?
>

PIE and vDSO are loaded by the kernel, not ld.so, which is beyond
the scope of this patch.  For our workloads, this patch is sufficient.
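H.J.'s point is that the kernel, not ld.so, decides where a PIE executable and the vDSO are placed, so an mmap change in ld.so cannot move them. The following is a minimal sketch for checking the 4GB condition from the optimization guide against the vDSO. It assumes Linux and glibc's getauxval, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/auxv.h>

/* 4GB, the branch distance named in the Silvermont optimization guide.  */
#define FOUR_GB (UINT64_C (1) << 32)

/* Hypothetical helper: is TARGET more than 4GB away from BRANCH?  The
   absolute difference is taken in unsigned arithmetic, so it cannot
   overflow.  */
static bool
branch_target_beyond_4gb (uint64_t branch, uint64_t target)
{
  uint64_t distance = branch > target ? branch - target : target - branch;
  return distance > FOUR_GB;
}

/* Hypothetical helper: compare a code address in the program with the
   vDSO base the kernel reports through AT_SYSINFO_EHDR.  Returns false
   when no vDSO is reported (getauxval returns 0).  */
static bool
vdso_beyond_4gb_from (uint64_t code_addr)
{
  uint64_t vdso = getauxval (AT_SYSINFO_EHDR);
  return vdso != 0 && branch_target_beyond_4gb (code_addr, vdso);
}
```

A non-PIE x86-64 program typically has its code in the low 4GB, while the kernel typically places the vDSO near the top of the user address space. Such a check would therefore usually report true there, whatever ld.so does with its own mmap calls.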
  
Carlos O'Donell Dec. 15, 2015, 6:27 p.m. UTC | #8
On 12/12/2015 12:36 PM, H.J. Lu wrote:
>> > ASLR, while not perfect, while bypassable via various information leaks, is
>> > still a vital component in the overall security profile for Linux,
>> > particularly for 64 bit OSs & applications.
>> >
> Here is the updated patch to make it opt-in.  OK for master?

I agree with Zack and others here: using Prefer_MAP_32BIT_EXEC unconditionally
is simply unacceptable.

See comments below.

> 0001-Add-Prefer_MAP_32BIT_EXEC-for-Silvermont.patch
> 
> 
> From 55a5e6278f86cecba8515804a7a2859a109920ba Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Wed, 21 Oct 2015 14:44:23 -0700
> Subject: [PATCH] Add Prefer_MAP_32BIT_EXEC for Silvermont
> 
> According to the Silvermont software optimization guide, for 64-bit
> applications, branch prediction performance can be negatively impacted
> when the target of a branch is more than 4GB away from the branch.  Set
> the Prefer_MAP_32BIT_EXEC bit for Silvermont so that mmap will try to
> map executable pages with MAP_32BIT first.  Also enable Silvermont
> optimizations for Knights Landing.
> 
> Prefer_MAP_32BIT_EXEC reduces the bits available for address space
> layout randomization (ASLR).  It is always disabled for SUID programs
> and can only be enabled by setting the environment variable
> LD_ENABLE_PREFER_MAP_32BIT_EXEC.
> 
> On Fedora 23, this patch speeds up the GCC 5 testsuite by 3% on Silvermont.
> 
> 	* sysdeps/unix/sysv/linux/wordsize-64/mmap.c: New file.
> 	* sysdeps/unix/sysv/linux/x86_64/64/mmap.c: Likewise.
> 	* sysdeps/x86/cpu-features.c (get_prefer_map_32bit_exec): New
> 	function.
> 	(init_cpu_features): Call get_prefer_map_32bit_exec for
> 	Silvermont.  Enable Silvermont optimizations for Knights Landing.
> 	* sysdeps/x86/cpu-features.h (bit_Prefer_MAP_32BIT_EXEC): New.
> 	(index_Prefer_MAP_32BIT_EXEC): Likewise.
> ---
>  sysdeps/unix/sysv/linux/wordsize-64/mmap.c | 40 +++++++++++++++++++++++++
>  sysdeps/unix/sysv/linux/x86_64/64/mmap.c   | 37 +++++++++++++++++++++++
>  sysdeps/x86/cpu-features.c                 | 48 ++++++++++++++++++++++++++++--
>  sysdeps/x86/cpu-features.h                 |  3 ++
>  4 files changed, 126 insertions(+), 2 deletions(-)
>  create mode 100644 sysdeps/unix/sysv/linux/wordsize-64/mmap.c
>  create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/mmap.c
> 
> diff --git a/sysdeps/unix/sysv/linux/wordsize-64/mmap.c b/sysdeps/unix/sysv/linux/wordsize-64/mmap.c
> new file mode 100644
> index 0000000..e098976
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/wordsize-64/mmap.c
> @@ -0,0 +1,40 @@
> +/* Linux mmap system call.  64-bit version.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <errno.h>
> +#include <sysdep.h>
> +
> +/* An architecture may override this.  */
> +#ifndef MMAP_PREPARE
> +# define MMAP_PREPARE(addr, len, prot, flags, fd, offset)
> +#endif
> +
> +__ptr_t
> +__mmap (__ptr_t addr, size_t len, int prot, int flags, int fd, off_t offset)
> +{
> +  MMAP_PREPARE (addr, len, prot, flags, fd, offset);
> +  return (__ptr_t) INLINE_SYSCALL (mmap, 6, addr, len, prot, flags,
> +				   fd, offset);
> +}

OK.

> +
> +weak_alias (__mmap, mmap)
> +weak_alias (__mmap, mmap64)
> +weak_alias (__mmap, __mmap64)
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/mmap.c b/sysdeps/unix/sysv/linux/x86_64/64/mmap.c
> new file mode 100644
> index 0000000..031316c
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/mmap.c
> @@ -0,0 +1,37 @@
> +/* Linux mmap system call.  x86-64 version.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <ldsodefs.h>
> +
> +/* If the Prefer_MAP_32BIT_EXEC bit is set, try to map executable pages
> +   with MAP_32BIT first.  */
> +#define MMAP_PREPARE(addr, len, prot, flags, fd, offset)		\
> +  if ((addr) == NULL							\
> +      && ((prot) & PROT_EXEC) != 0					\
> +      && HAS_ARCH_FEATURE (Prefer_MAP_32BIT_EXEC))			\
> +    {									\
> +      __ptr_t ret = (__ptr_t) INLINE_SYSCALL (mmap, 6, (addr), (len),	\
> +					      (prot),			\
> +					      (flags) | MAP_32BIT,	\
> +					      (fd), (offset));		\
> +      if (ret != MAP_FAILED)						\
> +	return ret;							\
> +    }
> +
> +#include <sysdeps/unix/sysv/linux/wordsize-64/mmap.c>

OK.

> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index fba3ef0..33e0e73 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -39,6 +39,37 @@ get_common_indeces (struct cpu_features *cpu_features,
>      }
>  }
>  
> +/* Prefer_MAP_32BIT_EXEC reduces bits available for address space layout
> +   randomization (ASLR).  Prefer_MAP_32BIT_EXEC is always disabled for
> +   SUID programs and can be enabled by setting environment variable,
> +   LD_ENABLE_PREFER_MAP_32BIT_EXEC.  */

Drop 'ENABLE' and use just "LD_PREFER_MAP_32BIT_EXEC"; whether you're enabling or
disabling it is implicit in the value.

> +
> +static inline unsigned int
> +get_prefer_map_32bit_exec (void)
> +{
> +#if defined __LP64__ && IS_IN (rtld)
> +  extern char **__environ attribute_hidden;
> +  extern int __libc_enable_secure;
> +  if (__builtin_expect (__libc_enable_secure, 0))
> +    return 0;

This returns early for SUID programs but fails to unset the unsecure LD env var,
which it should do if you consider LD_PREFER_MAP_32BIT_EXEC dangerous, and I do.

> +  for (char **current = __environ; *current != NULL; ++current)
> +    {
> +      /* Check LD_ENABLE_PREFER_MAP_32BIT_EXEC=.  */
> +      static const char *enable = "LD_ENABLE_PREFER_MAP_32BIT_EXEC=";
> +      for (size_t i = 0; ; i++)
> +	{
> +	  if (enable[i] != (*current)[i])
> +	    break;
> +	  if ((*current)[i] == '=')
> +	    return bit_Prefer_MAP_32BIT_EXEC;
> +	}
> +    }
> +  return 0;

Why not just add this to process_envvars in elf/rtld.c via EXTRA_LD_ENVVARS,
and to EXTRA_UNSECURE_ENVVARS or just UNSECURE_ENVVARS?

Is this early enough? It should be.

> +#else
> +  return 0;
> +#endif
> +}
> +
>  static inline void
>  init_cpu_features (struct cpu_features *cpu_features)
>  {
> @@ -78,22 +109,35 @@ init_cpu_features (struct cpu_features *cpu_features)
>  	      cpu_features->feature[index_Slow_BSF] |= bit_Slow_BSF;
>  	      break;
>  
> +	    case 0x57:
> +	      /* Knights Landing.  Enable Silvermont optimizations.  */
> +
>  	    case 0x37:
>  	    case 0x4a:
>  	    case 0x4d:
>  	    case 0x5a:
>  	    case 0x5d:
> -	      /* Unaligned load versions are faster than SSSE3
> -		 on Silvermont.  */
> +	      /* Unaligned load versions are faster than SSSE3 on
> +		 Silvermont.  For 64-bit applications, branch
> +		 prediction performance can be negatively impacted
> +		 when the target of a branch is more than 4GB away
> +		 from the branch.  Set the Prefer_MAP_32BIT_EXEC bit
> +		 so that mmap will try to map executable pages with
> +		 MAP_32BIT first.  NB: MAP_32BIT will map to lower
> +		 2GB, not lower 4GB, address.  */
>  #if index_Fast_Unaligned_Load != index_Prefer_PMINUB_for_stringop
>  # error index_Fast_Unaligned_Load != index_Prefer_PMINUB_for_stringop
>  #endif
> +#if index_Fast_Unaligned_Load != index_Prefer_MAP_32BIT_EXEC
> +# error index_Fast_Unaligned_Load != index_Prefer_MAP_32BIT_EXEC
> +#endif
>  #if index_Fast_Unaligned_Load != index_Slow_SSE4_2
>  # error index_Fast_Unaligned_Load != index_Slow_SSE4_2
>  #endif
>  	      cpu_features->feature[index_Fast_Unaligned_Load]
>  		|= (bit_Fast_Unaligned_Load
>  		    | bit_Prefer_PMINUB_for_stringop
> +		    | get_prefer_map_32bit_exec ()
>  		    | bit_Slow_SSE4_2);

All of the above code is gone, but this remains as part of the macro
that processes the env var and writes the bit to the cpu_features
array?

+  /* For 64-bit applications, branch prediction performance may be
+     negatively impacted when the target of a branch is more than 4GB
+     away from the branch.  Set the Prefer_MAP_32BIT_EXEC bit so that
+     mmap will try to map executable pages with MAP_32BIT first.
+     NB: MAP_32BIT will map to lower 2GB, not lower 4GB, address.  */

Please add the same comment about ASLR that get_prefer_map_32bit_exec has.

+  cpu_features->feature[index_Prefer_MAP_32BIT_EXEC]
+    |= get_prefer_map_32bit_exec ();

You wouldn't need get_prefer_map_32bit_exec, since this is all part of
the code, like dl-librecon.h, which parses the extra env var.

>  	      break;
>  
> diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
> index 80edbee..93bee69 100644
> --- a/sysdeps/x86/cpu-features.h
> +++ b/sysdeps/x86/cpu-features.h
> @@ -33,6 +33,7 @@
>  #define bit_AVX512DQ_Usable		(1 << 13)
>  #define bit_I586			(1 << 14)
>  #define bit_I686			(1 << 15)
> +#define bit_Prefer_MAP_32BIT_EXEC	(1 << 16)

OK.

>  
>  /* CPUID Feature flags.  */
>  
> @@ -97,6 +98,7 @@
>  # define index_AVX512DQ_Usable		FEATURE_INDEX_1*FEATURE_SIZE
>  # define index_I586			FEATURE_INDEX_1*FEATURE_SIZE
>  # define index_I686			FEATURE_INDEX_1*FEATURE_SIZE
> +# define index_Prefer_MAP_32BIT_EXEC	FEATURE_INDEX_1*FEATURE_SIZE

OK.

>  
>  # if defined (_LIBC) && !IS_IN (nonlib)
>  #  ifdef __x86_64__
> @@ -248,6 +250,7 @@ extern const struct cpu_features *__get_cpu_features (void)
>  # define index_AVX512DQ_Usable		FEATURE_INDEX_1
>  # define index_I586			FEATURE_INDEX_1
>  # define index_I686			FEATURE_INDEX_1
> +# define index_Prefer_MAP_32BIT_EXEC	FEATURE_INDEX_1

Ok.

>  
>  #endif	/* !__ASSEMBLER__ */
>  
> -- 2.5.0

Please post new version.

Cheers,
Carlos.
  
Carlos O'Donell Dec. 15, 2015, 6:38 p.m. UTC | #9
On 12/15/2015 01:27 PM, Carlos O'Donell wrote:
> +  cpu_features->feature[index_Prefer_MAP_32BIT_EXEC]
> +    |= get_prefer_map_32bit_exec ();
> 
> You wouldn't need get_prefer_map_32bit_exec, since this is all part of
> the code, like dl-librecon.h, which parses the extra env var.

To be clear:

* Add new bit flag definitions for cpu_features.
* Add a sysdeps/unix/sysv/linux/x86_64/dl-silvermont.h
  * Fill in EXTRA_LD_ENVVARS or add new ones.
  * Write to rtld's GLRO cpu_features the bit you need based
    on __libc_enable_secure.

That should simplify and concentrate the Silvermont fixes to
just two files, making future maintenance and documentation
easier.

$0.02.

Cheers,
Carlos.
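Carlos's plan can be sketched at application level. Everything here is an assumption for illustration:

- the renamed variable LD_PREFER_MAP_32BIT_EXEC;
- the choice of "1" and "0" as its enable/disable values;
- the stand-in struct demo_cpu_features in place of rtld's GLRO cpu_features.

Only the bit value comes from the patch. In secure mode the variable is removed from the environment, mirroring what ld.so's process_envvars does for names listed in UNSECURE_ENVVARS.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

/* Bit value taken from the patch's cpu-features.h.  */
#define bit_Prefer_MAP_32BIT_EXEC (1 << 16)

/* Hypothetical stand-in for the cpu_features word owned by rtld.  */
struct demo_cpu_features
{
  unsigned int feature;
};

/* Hypothetical sketch of the suggested flow: the variable's value
   ("1" or "0", an assumption) sets or clears the bit.  For AT_SECURE
   processes the variable is unset, so children never inherit it, and
   the bit is forced off.  */
static void
apply_prefer_map_32bit_exec (struct demo_cpu_features *cf)
{
  const char *name = "LD_PREFER_MAP_32BIT_EXEC";
  if (getauxval (AT_SECURE) != 0)
    {
      unsetenv (name);
      cf->feature &= ~(unsigned int) bit_Prefer_MAP_32BIT_EXEC;
      return;
    }
  const char *value = getenv (name);
  if (value != NULL && strcmp (value, "1") == 0)
    cf->feature |= bit_Prefer_MAP_32BIT_EXEC;
  else if (value != NULL && strcmp (value, "0") == 0)
    cf->feature &= ~(unsigned int) bit_Prefer_MAP_32BIT_EXEC;
}
```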
  

Patch

From 55a5e6278f86cecba8515804a7a2859a109920ba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Wed, 21 Oct 2015 14:44:23 -0700
Subject: [PATCH] Add Prefer_MAP_32BIT_EXEC for Silvermont

According to the Silvermont software optimization guide, for 64-bit
applications, branch prediction performance can be negatively impacted
when the target of a branch is more than 4GB away from the branch.  Set
the Prefer_MAP_32BIT_EXEC bit for Silvermont so that mmap will try to
map executable pages with MAP_32BIT first.  Also enable Silvermont
optimizations for Knights Landing.

Prefer_MAP_32BIT_EXEC reduces the bits available for address space
layout randomization (ASLR).  It is always disabled for SUID programs
and can only be enabled by setting the environment variable
LD_ENABLE_PREFER_MAP_32BIT_EXEC.

On Fedora 23, this patch speeds up the GCC 5 testsuite by 3% on Silvermont.

	* sysdeps/unix/sysv/linux/wordsize-64/mmap.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/64/mmap.c: Likewise.
	* sysdeps/x86/cpu-features.c (get_prefer_map_32bit_exec): New
	function.
	(init_cpu_features): Call get_prefer_map_32bit_exec for
	Silvermont.  Enable Silvermont optimizations for Knights Landing.
	* sysdeps/x86/cpu-features.h (bit_Prefer_MAP_32BIT_EXEC): New.
	(index_Prefer_MAP_32BIT_EXEC): Likewise.
---
 sysdeps/unix/sysv/linux/wordsize-64/mmap.c | 40 +++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/x86_64/64/mmap.c   | 37 +++++++++++++++++++++++
 sysdeps/x86/cpu-features.c                 | 48 ++++++++++++++++++++++++++++--
 sysdeps/x86/cpu-features.h                 |  3 ++
 4 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/wordsize-64/mmap.c
 create mode 100644 sysdeps/unix/sysv/linux/x86_64/64/mmap.c

diff --git a/sysdeps/unix/sysv/linux/wordsize-64/mmap.c b/sysdeps/unix/sysv/linux/wordsize-64/mmap.c
new file mode 100644
index 0000000..e098976
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/wordsize-64/mmap.c
@@ -0,0 +1,40 @@ 
+/* Linux mmap system call.  64-bit version.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <errno.h>
+#include <sysdep.h>
+
+/* An architecture may override this.  */
+#ifndef MMAP_PREPARE
+# define MMAP_PREPARE(addr, len, prot, flags, fd, offset)
+#endif
+
+__ptr_t
+__mmap (__ptr_t addr, size_t len, int prot, int flags, int fd, off_t offset)
+{
+  MMAP_PREPARE (addr, len, prot, flags, fd, offset);
+  return (__ptr_t) INLINE_SYSCALL (mmap, 6, addr, len, prot, flags,
+				   fd, offset);
+}
+
+weak_alias (__mmap, mmap)
+weak_alias (__mmap, mmap64)
+weak_alias (__mmap, __mmap64)
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/mmap.c b/sysdeps/unix/sysv/linux/x86_64/64/mmap.c
new file mode 100644
index 0000000..031316c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/mmap.c
@@ -0,0 +1,37 @@ 
+/* Linux mmap system call.  x86-64 version.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+
+/* If the Prefer_MAP_32BIT_EXEC bit is set, try to map executable pages
+   with MAP_32BIT first.  */
+#define MMAP_PREPARE(addr, len, prot, flags, fd, offset)		\
+  if ((addr) == NULL							\
+      && ((prot) & PROT_EXEC) != 0					\
+      && HAS_ARCH_FEATURE (Prefer_MAP_32BIT_EXEC))			\
+    {									\
+      __ptr_t ret = (__ptr_t) INLINE_SYSCALL (mmap, 6, (addr), (len),	\
+					      (prot),			\
+					      (flags) | MAP_32BIT,	\
+					      (fd), (offset));		\
+      if (ret != MAP_FAILED)						\
+	return ret;							\
+    }
+
+#include <sysdeps/unix/sysv/linux/wordsize-64/mmap.c>
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index fba3ef0..33e0e73 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -39,6 +39,37 @@ get_common_indeces (struct cpu_features *cpu_features,
     }
 }
 
+/* Prefer_MAP_32BIT_EXEC reduces bits available for address space layout
+   randomization (ASLR).  Prefer_MAP_32BIT_EXEC is always disabled for
+   SUID programs and can be enabled by setting environment variable,
+   LD_ENABLE_PREFER_MAP_32BIT_EXEC.  */
+
+static inline unsigned int
+get_prefer_map_32bit_exec (void)
+{
+#if defined __LP64__ && IS_IN (rtld)
+  extern char **__environ attribute_hidden;
+  extern int __libc_enable_secure;
+  if (__builtin_expect (__libc_enable_secure, 0))
+    return 0;
+  for (char **current = __environ; *current != NULL; ++current)
+    {
+      /* Check LD_ENABLE_PREFER_MAP_32BIT_EXEC=.  */
+      static const char *enable = "LD_ENABLE_PREFER_MAP_32BIT_EXEC=";
+      for (size_t i = 0; ; i++)
+	{
+	  if (enable[i] != (*current)[i])
+	    break;
+	  if ((*current)[i] == '=')
+	    return bit_Prefer_MAP_32BIT_EXEC;
+	}
+    }
+  return 0;
+#else
+  return 0;
+#endif
+}
+
 static inline void
 init_cpu_features (struct cpu_features *cpu_features)
 {
@@ -78,22 +109,35 @@ init_cpu_features (struct cpu_features *cpu_features)
 	      cpu_features->feature[index_Slow_BSF] |= bit_Slow_BSF;
 	      break;
 
+	    case 0x57:
+	      /* Knights Landing.  Enable Silvermont optimizations.  */
+
 	    case 0x37:
 	    case 0x4a:
 	    case 0x4d:
 	    case 0x5a:
 	    case 0x5d:
-	      /* Unaligned load versions are faster than SSSE3
-		 on Silvermont.  */
+	      /* Unaligned load versions are faster than SSSE3 on
+		 Silvermont.  For 64-bit applications, branch
+		 prediction performance can be negatively impacted
+		 when the target of a branch is more than 4GB away
+		 from the branch.  Set the Prefer_MAP_32BIT_EXEC bit
+		 so that mmap will try to map executable pages with
+		 MAP_32BIT first.  NB: MAP_32BIT will map to lower
+		 2GB, not lower 4GB, address.  */
 #if index_Fast_Unaligned_Load != index_Prefer_PMINUB_for_stringop
 # error index_Fast_Unaligned_Load != index_Prefer_PMINUB_for_stringop
 #endif
+#if index_Fast_Unaligned_Load != index_Prefer_MAP_32BIT_EXEC
+# error index_Fast_Unaligned_Load != index_Prefer_MAP_32BIT_EXEC
+#endif
 #if index_Fast_Unaligned_Load != index_Slow_SSE4_2
 # error index_Fast_Unaligned_Load != index_Slow_SSE4_2
 #endif
 	      cpu_features->feature[index_Fast_Unaligned_Load]
 		|= (bit_Fast_Unaligned_Load
 		    | bit_Prefer_PMINUB_for_stringop
+		    | get_prefer_map_32bit_exec ()
 		    | bit_Slow_SSE4_2);
 	      break;
 
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 80edbee..93bee69 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -33,6 +33,7 @@ 
 #define bit_AVX512DQ_Usable		(1 << 13)
 #define bit_I586			(1 << 14)
 #define bit_I686			(1 << 15)
+#define bit_Prefer_MAP_32BIT_EXEC	(1 << 16)
 
 /* CPUID Feature flags.  */
 
@@ -97,6 +98,7 @@ 
 # define index_AVX512DQ_Usable		FEATURE_INDEX_1*FEATURE_SIZE
 # define index_I586			FEATURE_INDEX_1*FEATURE_SIZE
 # define index_I686			FEATURE_INDEX_1*FEATURE_SIZE
+# define index_Prefer_MAP_32BIT_EXEC	FEATURE_INDEX_1*FEATURE_SIZE
 
 # if defined (_LIBC) && !IS_IN (nonlib)
 #  ifdef __x86_64__
@@ -248,6 +250,7 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define index_AVX512DQ_Usable		FEATURE_INDEX_1
 # define index_I586			FEATURE_INDEX_1
 # define index_I686			FEATURE_INDEX_1
+# define index_Prefer_MAP_32BIT_EXEC	FEATURE_INDEX_1
 
 #endif	/* !__ASSEMBLER__ */
 
-- 
2.5.0
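The #if/#error guards in init_cpu_features exist because several bits are OR'd into one feature[] element in a single statement. The following is a minimal sketch of the same invariant using C11 _Static_assert. The bit and index macros are copied from the cpu-features.h hunk above. FEATURE_INDEX_1 is assumed to be 0 and demo_feature is a hypothetical stand-in, both only so that the sketch compiles on its own.

```c
/* Stand-ins copied from the patch's cpu-features.h.  FEATURE_INDEX_1 is
   an assumption (0) chosen only so this compiles on its own.  */
#define FEATURE_INDEX_1 0
#define bit_I586 (1 << 14)
#define bit_I686 (1 << 15)
#define bit_Prefer_MAP_32BIT_EXEC (1 << 16)
#define index_I586 FEATURE_INDEX_1
#define index_I686 FEATURE_INDEX_1
#define index_Prefer_MAP_32BIT_EXEC FEATURE_INDEX_1

/* Compile-time form of the #if/#error guards: every bit OR'd into one
   feature[] element must live at that element's index.  */
_Static_assert (index_I586 == index_I686
                && index_I686 == index_Prefer_MAP_32BIT_EXEC,
                "flags combined in one OR must share a feature index");

/* The flags must also occupy distinct bits within that element.  */
_Static_assert (((bit_I586 & bit_I686)
                 | (bit_I586 & bit_Prefer_MAP_32BIT_EXEC)
                 | (bit_I686 & bit_Prefer_MAP_32BIT_EXEC)) == 0,
                "feature bits must not overlap");

/* Hypothetical stand-in for cpu_features->feature[].  */
static unsigned int demo_feature[FEATURE_INDEX_1 + 1];

/* Set all three flags with one store, in the style of init_cpu_features.  */
static void
demo_set_flags (void)
{
  demo_feature[index_I686] |= bit_I586 | bit_I686 | bit_Prefer_MAP_32BIT_EXEC;
}
```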