[RFCv2] Dynamic lock elision support

Message ID 55E4A9E7.3030700@linux.vnet.ibm.com
State Superseded

Commit Message

Paul E. Murphy Aug. 31, 2015, 7:24 p.m. UTC
  Narrowing my focus here, we should have a runtime
mechanism to disable elision for those applications
which experience significant degradation from the
non-optional nature of this feature.

I think we can table the discussion of runtime
tunable parameters as it is highly dependent on the
framework which emerges.

In the meantime, there is a need to turn this
off for select workloads. It would be preferable
to add this in such a way that it can be easily
merged into the tunables framework when it
does evolve.

--
Paul M.


On 08/18/2015 11:46 AM, Adhemerval Zanella wrote:
> Hi,
> 
> Andi Kleen has sent some patches to add tunables for lock elision with
> NPTL algorithms:
> 
> http://patchwork.sourceware.org/patch/4358/
> http://patchwork.sourceware.org/patch/4355/
> http://patchwork.sourceware.org/patch/4356/
> http://patchwork.sourceware.org/patch/4359/
> http://patchwork.sourceware.org/patch/4357/
> http://patchwork.sourceware.org/patch/4361/
> http://patchwork.sourceware.org/patch/4360/
> 
> But he has not yet pinged or followed up on these.  I also would prefer
> the tunables to be as platform-agnostic as possible, since we have
> 3 architectures that support LE (x86, ppc, zarch).  So I would suggest
> you work on these options and/or adjust the patches accordingly.
> 
> Also, we may want to avoid a configure flag and enable LE support by
> default for powerpc, or disable it by default (enabled through env vars,
> as in Andi Kleen's patchset).
> 
> On 18-08-2015 13:10, Paul E. Murphy wrote:
>> Elided locks can have mixed overall performance in practice. That is, there is some non-trivial tuning a user might have to do to see the positive benefits. Additionally, when tuning the adaptive lock constants on PPC, my experimentation seems to correlate tuned values with both the number of hardware threads per core, and the behavior of the application.
>>
>> My initial thought is elision should be disabled by default, with an environment variable to toggle both support, and potentially override tuning constants.
>>
>> Paul
>>
>
  

Comments

Carlos O'Donell Sept. 2, 2015, 6:58 p.m. UTC | #1
On 08/31/2015 03:24 PM, Paul E. Murphy wrote:
> Narrowing my focus here, we should have a runtime
> mechanism to disable elision for those applications
> which experience significant degradation from the
> non-optional nature of this feature.
> 
> I think we can table the discussion of runtime
> tunable parameters as it is highly dependent on the
> framework which emerges.
> 
> In the meantime, there is a need to turn this
> off for select workloads. It would be preferable
> to add this in such a way that it can be easily
> merged into the tunables framework when it
> does evolve.

Is this theoretical, or do you have actual customer workloads (I'm
not talking about the synthetic benchmarks you have) where default
pthread mutexes cause the application to experience significant
performance loss?

If you are proposing that your patch is sufficiently important
to bypass the tunables discussion and get accepted right away
then you need to give sufficient detail for the community to
accept that we need to move quickly because users are having
problems.

Otherwise I think that Siddhesh's present work should get
reviewed by you, and others, get merged, and represent the
basis for an internal tunables API that enables per-env-var
tunables.

In summary, we can add "yet one more env var" for you, but
you have to justify it.

Cheers,
Carlos.
  
Adhemerval Zanella Sept. 2, 2015, 7:30 p.m. UTC | #2
On 02-09-2015 15:58, Carlos O'Donell wrote:
> On 08/31/2015 03:24 PM, Paul E. Murphy wrote:
>> Narrowing my focus here, we should have a runtime
>> mechanism to disable elision for those applications
>> which experience significant degradation from the
>> non-optional nature of this feature.
>>
>> I think we can table the discussion of runtime
>> tunable parameters as it is highly dependent on the
>> framework which emerges.
>>
>> In the meantime, there is a need to turn this
>> off for select workloads. It would be preferable
>> to add this in such a way that it can be easily
>> merged into the tunables framework when it
>> does evolve.
> 
> Is this theoretical or do you have such customer workloads,
> I'm not talking about the synthetic benchmarks you have, where
> default pthread mutexes cause the application to experience
> significant performance loss?
> 
> If you are proposing that your patch is sufficiently important
> to bypass the tunables discussion and get accepted right away
> then you need to give sufficient detail for the community to
> accept that we need to move quickly because users are having
> problems.
> 
> Otherwise I think that Siddhesh's present work should get
> reviewed by you, and others, get merged, and represent the
> basis for an internal tunables API that enables per-env-var
> tunables.
> 
> In summary, we can add "yet one more env var" for you, but
> you have to justify it.
> 

Based on Siddhesh's initial proposal [1], IMHO the best approach is to divide
by functionality or library area, such as:

* GLIBC_MALLOC for malloc-related tunables (all the current ones, such as mmap, trim, etc.)
* GLIBC_THREAD for libpthread ones, such as PI-aware locks, TLE, stack size, etc.
* etc.

So in the TLE case, for instance, we can use GLIBC_THREAD as the base and
add it in the general case (instead of replicating the logic in every architecture).

[1] https://sourceware.org/glibc/wiki/TuningLibraryRuntimeBehavior

> Cheers,
> Carlos.
>
  
Carlos O'Donell Sept. 2, 2015, 9:01 p.m. UTC | #3
On 09/02/2015 03:30 PM, Adhemerval Zanella wrote:
> Based on Siddhesh initial proposal [1], IMHO best approach can be divide in
> functionalities or libraries areas, such as:
> 
> * GLIBC_MALLOC for malloc related tunables (all current one, such as mmap trim, etc.)
> * GLIBC_THREAD for libpthread one, such as PI-aware locks, TLE, stack size, etc.
> * etc.
> 
> So in the TLE case for instance, we can use GLIBC_THREAD as the base and
> add it in general case (instead of replicate the logic in every architecture).
> 
> [1] https://sourceware.org/glibc/wiki/TuningLibraryRuntimeBehavior

As long as the documentation in the manual says tunables are supported only for
a single release, and are not backwards compatible, then I'm happy, because we
can eventually revisit this decision.

I don't see any real value in having grouped tunables, because the user still
has to know all of them and clear all of them. And now, instead of having either
a simple 1:1 of tunable:envvar, or ALL:1 as Roland suggested, we have some mix
that the user has to learn.

I still recommend one tunable per env var.

We might have a tool provided by glibc to do this, which would help,
e.g. glibc-tune --clear, glibc-tune --list
e.g. glibc-tune --set GLIBC_PTHREAD_ELISION_DISABLE=yes

Cheers,
Carlos.
  
Steven Munroe Sept. 2, 2015, 10:02 p.m. UTC | #4
On Wed, 2015-09-02 at 17:01 -0400, Carlos O'Donell wrote:
> On 09/02/2015 03:30 PM, Adhemerval Zanella wrote:
> > Based on Siddhesh initial proposal [1], IMHO best approach can be divide in
> > functionalities or libraries areas, such as:
> > 
> > * GLIBC_MALLOC for malloc related tunables (all current one, such as mmap trim, etc.)
> > * GLIBC_THREAD for libpthread one, such as PI-aware locks, TLE, stack size, etc.
> > * etc.
> > 
> > So in the TLE case for instance, we can use GLIBC_THREAD as the base and
> > add it in general case (instead of replicate the logic in every architecture).
> > 
> > [1] https://sourceware.org/glibc/wiki/TuningLibraryRuntimeBehavior
> 
> As long as the documentation in the manual says tunables are supported only for
> a single release, and are not backwards compatible, then I'm happy, because we
> can eventually revisit this decision.
> 
> I don't see any real value in having grouped tunnables, because the user still
> has to know all of them and clear all of them. And now instead of having either
> a simple 1:1 of tunable:envvar, or ALL:1 as Roland suggested, we have some mix
> that the user has to learn.
> 
> I still recommend one tunable per env var.
> 
I also prefer a 1:1 structure, but suggest overlaying a reasonable
prefix-naming standard/practice to make the tooling you suggest below
possible and practical.

> We might have a tool provided by glibcto do this would help 
> e.g. glibc-tune --clear, glibc-tune --list
> e.g. glibc-tune --set GLIBC_PTHREAD_ELISION_DISABLE=yes
> 
> Cheers,
> Carlos.
>
  
Steven Munroe Sept. 3, 2015, 3:08 p.m. UTC | #5
On Wed, 2015-09-02 at 14:58 -0400, Carlos O'Donell wrote:
> On 08/31/2015 03:24 PM, Paul E. Murphy wrote:
> > Narrowing my focus here, we should have a runtime
> > mechanism to disable elision for those applications
> > which experience significant degradation from the
> > non-optional nature of this feature.
> > 
> > I think we can table the discussion of runtime
> > tunable parameters as it is highly dependent on the
> > framework which emerges.
> > 
> > In the meantime, there is a need to turn this
> > off for select workloads. It would be preferable
> > to add this in such a way that it can be easily
> > merged into the tunables framework when it
> > does evolve.
> 
> Is this theoretical or do you have such customer workloads,
> I'm not talking about the synthetic benchmarks you have, where
> default pthread mutexes cause the application to experience
> significant performance loss?
> 
We are motivated to address this issue as we have code (TLE-enabled
GLIBC) in the field and have heard some complaints. Unfortunately the
customer did not provide a test case.

Our (well, Paul's really) analysis is that we are mis-tuned for TLE
transactions that abort due to syscalls within the critical region. This
is compounded by older kernels that did not tabort the transaction early,
but instead caused the transaction to fail for other reasons (like
overflowing the footprint). The net is that for some applications (with a
propensity to include syscalls within pthread_mutex critical regions) we
see near 100% TLE abort frequencies, and we take an extra long time to
abort each transaction.

Clearly it is better to tabort the transaction early for most syscalls,
but we do not expect correct handling of this until kernel 4.2 or later.

So for now (until non-TLE-friendly kernels age out of the installed base)
a mechanism to optionally disable TLE seems like a good idea.

> If you are proposing that your patch is sufficiently important
> to bypass the tunables discussion and get accepted right away
> then you need to give sufficient detail for the community to
> accept that we need to move quickly because users are having
> problems.
> 
We are willing to work with the community as long as the discussion does
not drag on or turn into a filibuster.

We do need to fix this for existing customers, soon.

> Otherwise I think that Siddhesh's present work should get
> reviewed by you, and others, get merged, and represent the
> basis for an internal tunables API that enables per-env-var
> tunables.
> 
> In summary, we can add "yet one more env var" for you, but
> you have to justify it.
> 
> Cheers,
> Carlos.
>
  
Ondrej Bilka Sept. 3, 2015, 7:51 p.m. UTC | #6
On Thu, Sep 03, 2015 at 10:08:54AM -0500, Steven Munroe wrote:
> On Wed, 2015-09-02 at 14:58 -0400, Carlos O'Donell wrote:
> > On 08/31/2015 03:24 PM, Paul E. Murphy wrote:
> > > Narrowing my focus here, we should have a runtime
> > > mechanism to disable elision for those applications
> > > which experience significant degradation from the
> > > non-optional nature of this feature.
> > > 
> > > I think we can table the discussion of runtime
> > > tunable parameters as it is highly dependent on the
> > > framework which emerges.
> > > 
> > > In the meantime, there is a need to turn this
> > > off for select workloads. It would be preferable
> > > to add this in such a way that it can be easily
> > > merged into the tunables framework when it
> > > does evolve.
> > 
> > Is this theoretical or do you have such customer workloads,
> > I'm not talking about the synthetic benchmarks you have, where
> > default pthread mutexes cause the application to experience
> > significant performance loss?
> > 
> We are motivated to address this issue as we have code (TLE enabled
> GLIBC) in in the field and have heard some complaints. Unfortunately the
> customer did not provide a test case.
> 
> Our (well Paul's really) analysis is that we are missed tuned for TLE
> transactions that abort due to syscalls within the critical region. This
> is compounded by older kernels that did not tabort the transaction early
> but cause the transaction to fail due to other (like overflowing the
> foot print) reasons. Net for some applications (with a propensity to
> include syscalls within pthread_mutex critical regions) we see near 100%
> TLE abort frequencies. And we are taking an extra long time to abort the
> transaction.
> 
> Clearly is better to tabort the transaction early for most syscalls but
> we do not expect correct handling of this until 4.2 or later.
> 

Then adding a tunable for that looks like a bad idea. If you want to add
a tunable, it should be for something where only the programmer has data
about performance.

What you described looks quite traceable, as the problem is too many
aborts. To handle that, it would be better to add profiling counters and
disable elision when aborts exceed a certain percentage.

My proposal with the least profiling overhead would instead be to disable
elision once the application exceeds, say, 1024 aborts per second, or some
other constant we find from benchmarks. In pseudocode it could look
something like this; we don't have to care much about reliability, as it
is a heuristic that affects only performance, not correctness.

int use_elision = 1;
int abort_count = 0;
uint64_t abort_time = 0;

void
elision_aborted (void)
{
  int c = atomic_fetch_add_relaxed (&abort_count, 1);
  if (c == 1024)
    {
      uint64_t time = gettime ();
      uint64_t ctime = atomic_load_relaxed (&abort_time);
      /* More than 1024 aborts within the last second: stop eliding.  */
      if (time - ctime < 1000000000)
        use_elision = 0;
      atomic_store_relaxed (&abort_time, time);
      atomic_store_relaxed (&abort_count, 0);
    }
}
  
Steven Munroe Sept. 3, 2015, 8:23 p.m. UTC | #7
On Thu, 2015-09-03 at 21:51 +0200, Ondřej Bílka wrote:
> On Thu, Sep 03, 2015 at 10:08:54AM -0500, Steven Munroe wrote:
> > On Wed, 2015-09-02 at 14:58 -0400, Carlos O'Donell wrote:
> > > On 08/31/2015 03:24 PM, Paul E. Murphy wrote:
> > > > Narrowing my focus here, we should have a runtime
> > > > mechanism to disable elision for those applications
> > > > which experience significant degradation from the
> > > > non-optional nature of this feature.
> > > > 
> > > > I think we can table the discussion of runtime
> > > > tunable parameters as it is highly dependent on the
> > > > framework which emerges.
> > > > 
> > > > In the meantime, there is a need to turn this
> > > > off for select workloads. It would be preferable
> > > > to add this in such a way that it can be easily
> > > > merged into the tunables framework when it
> > > > does evolve.
> > > 
> > > Is this theoretical or do you have such customer workloads,
> > > I'm not talking about the synthetic benchmarks you have, where
> > > default pthread mutexes cause the application to experience
> > > significant performance loss?
> > > 
> > We are motivated to address this issue as we have code (TLE enabled
> > GLIBC) in in the field and have heard some complaints. Unfortunately the
> > customer did not provide a test case.
> > 
> > Our (well Paul's really) analysis is that we are missed tuned for TLE
> > transactions that abort due to syscalls within the critical region. This
> > is compounded by older kernels that did not tabort the transaction early
> > but cause the transaction to fail due to other (like overflowing the
> > foot print) reasons. Net for some applications (with a propensity to
> > include syscalls within pthread_mutex critical regions) we see near 100%
> > TLE abort frequencies. And we are taking an extra long time to abort the
> > transaction.
> > 
> > Clearly is better to tabort the transaction early for most syscalls but
> > we do not expect correct handling of this until 4.2 or later.
> > 
> 
> Then adding tunable for that looks like bad idea. If you want to add
> tunable it should be for something where only programmer has data about
> performance.
> 
I did not ask for a tunable for this. I am asking for an enable/disable
control for TLE.

The community via Carlos et al is asking for a larger discussion about
tunables in general.

I was asked to describe the specific issue that I thought justified
expediting this discussion and/or staging the implementation. That was
the intent of my reply to Carlos.

We will address the heuristics for our TLE implementation. But there
will always be cases where the customer and the kernel do unexpected
things. So a top-level control to enable/disable TLE is a first-order
requirement.
  
Carlos O'Donell Sept. 4, 2015, 2 a.m. UTC | #8
On 09/03/2015 11:08 AM, Steven Munroe wrote:
> We are willing to work with the community as long as the discussion does
> not drag-on or turn into a filibuster.
> 
> We do need to fix this for existing customers, soon.  

Next steps:

- New thread to get consensus on the namespace for new tunable env vars.

  - I suggest GLIBC_PTHREAD_ELISION_* since this is elision for posix threads,
    not generic threads like we might some day export via an execution agent
    library (see discussions around libea).

- Patch with manual documentation for the new tunable.

  - Start a new section on Tunables. We can help write it. Lay out the
    ground work for stability rules etc.

  - Describe what GLIBC_PTHREAD_ELISION_* does.

I look forward to working with you to clean up all the env vars in
glibc and extend them via the tunables in some way or another, since
it would be really nice to be able to dynamically enable or disable
elision at runtime; that's the kind of interface we are designing for
tunables.

Cheers,
Carlos.
  
Carlos O'Donell Sept. 4, 2015, 3:07 a.m. UTC | #9
On 09/03/2015 04:23 PM, Steven Munroe wrote:
> The community via Carlos et al is asking for a larger discussion about
> tunables in general.

Kicked off the discussion to allow patches that accept GLIBC_*
namespace env vars to operate as if they were tunables.

https://www.sourceware.org/ml/libc-alpha/2015-09/msg00130.html

c.
  
Ondrej Bilka Sept. 5, 2015, 6:23 a.m. UTC | #10
On Thu, Sep 03, 2015 at 03:23:15PM -0500, Steven Munroe wrote:
> On Thu, 2015-09-03 at 21:51 +0200, Ondřej Bílka wrote:
> > On Thu, Sep 03, 2015 at 10:08:54AM -0500, Steven Munroe wrote:
> > > On Wed, 2015-09-02 at 14:58 -0400, Carlos O'Donell wrote:
> > > > On 08/31/2015 03:24 PM, Paul E. Murphy wrote:
> > > > > Narrowing my focus here, we should have a runtime
> > > > > mechanism to disable elision for those applications
> > > > > which experience significant degradation from the
> > > > > non-optional nature of this feature.
> > > > > 
> > > > > I think we can table the discussion of runtime
> > > > > tunable parameters as it is highly dependent on the
> > > > > framework which emerges.
> > > > > 
> > > > > In the meantime, there is a need to turn this
> > > > > off for select workloads. It would be preferable
> > > > > to add this in such a way that it can be easily
> > > > > merged into the tunables framework when it
> > > > > does evolve.
> > > > 
> > > > Is this theoretical or do you have such customer workloads,
> > > > I'm not talking about the synthetic benchmarks you have, where
> > > > default pthread mutexes cause the application to experience
> > > > significant performance loss?
> > > > 
> > > We are motivated to address this issue as we have code (TLE enabled
> > > GLIBC) in in the field and have heard some complaints. Unfortunately the
> > > customer did not provide a test case.
> > > 
> > > Our (well Paul's really) analysis is that we are missed tuned for TLE
> > > transactions that abort due to syscalls within the critical region. This
> > > is compounded by older kernels that did not tabort the transaction early
> > > but cause the transaction to fail due to other (like overflowing the
> > > foot print) reasons. Net for some applications (with a propensity to
> > > include syscalls within pthread_mutex critical regions) we see near 100%
> > > TLE abort frequencies. And we are taking an extra long time to abort the
> > > transaction.
> > > 
> > > Clearly is better to tabort the transaction early for most syscalls but
> > > we do not expect correct handling of this until 4.2 or later.
> > > 
> > 
> > Then adding tunable for that looks like bad idea. If you want to add
> > tunable it should be for something where only programmer has data about
> > performance.
> > 
> I did not ask for a tunable for this. I am asking for a enable / disable
> control for TLE.
> 
But you still haven't answered why it's necessary to manually
enable/disable it, when automatic detection could do the job as well.

> The community via Carlos et al is asking for a larger discussion about
> tunables in general.
> 
> I was asked to describe the specific issue that I thought justified
> expediting this discussion and or staging the implementation. This was
> the intent for my reply to Carlos.
> 
> We will address the heuristics for our TLE implementation. But there
> will always be cases where the customer and the kernel will do
> unexpected things. So a top level control to enable/disable TLE is a
> first order requirement.

I am still skeptical of that claim. Is it possible that elision would
be slow yet succeed most of the time?

If not, then profiling should work. Also, this doesn't handle the case
where elision helps half the mutexes and harms the other half, where you
would need better granularity than per-program.
  
Paul E. Murphy Sept. 8, 2015, 4:30 p.m. UTC | #11
On 09/05/2015 01:23 AM, Ondřej Bílka wrote:
> On Thu, Sep 03, 2015 at 03:23:15PM -0500, Steven Munroe wrote:
>> On Thu, 2015-09-03 at 21:51 +0200, Ondřej Bílka wrote:
>>> On Thu, Sep 03, 2015 at 10:08:54AM -0500, Steven Munroe wrote:
>>>
>>> Then adding tunable for that looks like bad idea. If you want to add
>>> tunable it should be for something where only programmer has data about
>>> performance.
>>>
>> I did not ask for a tunable for this. I am asking for a enable / disable
>> control for TLE.
>>
> But you still didn't answered why its neccessary to manually
> enable/disable it, when automatic detection could do job as well. 
> 
>> The community via Carlos et al is asking for a larger discussion about
>> tunables in general.
>>
>> I was asked to describe the specific issue that I thought justified
>> expediting this discussion and or staging the implementation. This was
>> the intent for my reply to Carlos.
>>
>> We will address the heuristics for our TLE implementation. But there
>> will always be cases where the customer and the kernel will do
>> unexpected things. So a top level control to enable/disable TLE is a
>> first order requirement.
> 
> I am still skeptical of that claim, is it possbible that ellision would
> be slow and succeed most of time?
> 
> If not then profiling should work. Also this doesn't handle case where
> elision helps half mutexes and harms other half where you would need
> better granularity than per-program one.
> 

Ondřej, you do present an interesting idea. I appreciate your feedback,
but we are still a few steps away from tuning TLE in such a way. I think
we need to take a conservative approach with TLE, especially on PPC,
as we work out the kinks.

TLE is much more sensitive on PPC than other architectures. Having too
many users ruins it for all. The hardware works best with very short
critical sections, or low contention for TLE resources per thread/core.

As for the tunables discussion, we will retain the right to remove or
alter these options as we see fit. Such a change can be trivially
reverted should we find an effective runtime heuristic, but we lack the
breadth of data to propose a good mechanism in the meantime.

BR,
Paul
  

Patch

From 78373b09b201ce840ad164d911ee7017f59070f7 Mon Sep 17 00:00:00 2001
From: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
Date: Thu, 27 Aug 2015 18:04:59 -0500
Subject: [PATCH] [RFCv2] Optionally disable TLE

It may be desirable to disable TLE.  While we wait out the
tunables framework, it may be helpful to have an env var
to turn it off if it is discovered to hamper performance.

My hope is this degenerate usage can be trivially ported
to the tunables framework when it arrives.

This defines the environment variable GLIBC_LOCK_ELISION.
If it is set to "none", TLE is disabled for all archs which
support it.

2015-08-28  Paul E. Murphy  <murphyp@linux.vnet.ibm.com>

	* sysdeps/unix/sysv/linux/powerpc/elision-conf.c (elision_init):
	Disable based on env var set to "none".
	* sysdeps/unix/sysv/linux/s390/elision-conf.c (elision_init):
	Likewise.
	* sysdeps/unix/sysv/linux/x86/elision-conf.c (elision_init):
	Likewise.
---
 sysdeps/unix/sysv/linux/powerpc/elision-conf.c |    5 +++++
 sysdeps/unix/sysv/linux/s390/elision-conf.c    |    5 +++++
 sysdeps/unix/sysv/linux/x86/elision-conf.c     |    5 +++++
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/powerpc/elision-conf.c b/sysdeps/unix/sysv/linux/powerpc/elision-conf.c
index 5341222..30a3bfd 100644
--- a/sysdeps/unix/sysv/linux/powerpc/elision-conf.c
+++ b/sysdeps/unix/sysv/linux/powerpc/elision-conf.c
@@ -62,6 +62,11 @@  elision_init (int argc __attribute__ ((unused)),
 #ifdef ENABLE_LOCK_ELISION
   int elision_available = (GLRO (dl_hwcap2) & PPC_FEATURE2_HAS_HTM) ? 1 : 0;
   __pthread_force_elision = __libc_enable_secure ? 0 : elision_available;
+
+  /* Check environment to see if we should disable LE */
+  for (; *environ != NULL; environ++)
+    if (strcmp (*environ, "GLIBC_LOCK_ELISION=none") == 0)
+      __pthread_force_elision = 0;
 #endif
   if (!__pthread_force_elision)
     /* Disable elision on rwlocks.  */
diff --git a/sysdeps/unix/sysv/linux/s390/elision-conf.c b/sysdeps/unix/sysv/linux/s390/elision-conf.c
index e1ff599..5fcb343 100644
--- a/sysdeps/unix/sysv/linux/s390/elision-conf.c
+++ b/sysdeps/unix/sysv/linux/s390/elision-conf.c
@@ -64,6 +64,11 @@  elision_init (int argc __attribute__ ((unused)),
      When false elision is never attempted.  */
   int elision_available = (GLRO (dl_hwcap) & HWCAP_S390_TE) ? 1 : 0;
 
   __pthread_force_elision = __libc_enable_secure ? 0 : elision_available;
+
+  /* Check environment to see if we should disable LE.  */
+  for (; *environ != NULL; environ++)
+    if (strcmp (*environ, "GLIBC_LOCK_ELISION=none") == 0)
+      __pthread_force_elision = 0;
 }
 
diff --git a/sysdeps/unix/sysv/linux/x86/elision-conf.c b/sysdeps/unix/sysv/linux/x86/elision-conf.c
index 4a73382..6690304 100644
--- a/sysdeps/unix/sysv/linux/x86/elision-conf.c
+++ b/sysdeps/unix/sysv/linux/x86/elision-conf.c
@@ -65,6 +65,11 @@  elision_init (int argc __attribute__ ((unused)),
   __elision_available = HAS_CPU_FEATURE (RTM);
 #ifdef ENABLE_LOCK_ELISION
   __pthread_force_elision = __libc_enable_secure ? 0 : __elision_available;
+
+  /* Check environment to see if we should disable LE */
+  for (; *environ != NULL; environ++)
+    if (strcmp (*environ, "GLIBC_LOCK_ELISION=none") == 0)
+      __pthread_force_elision = 0;
 #endif
   if (!HAS_CPU_FEATURE (RTM))
     __elision_aconf.retry_try_xbegin = 0; /* Disable elision on rwlocks */
-- 
1.7.1