Message ID | CAMe9rOoP_293SM=sYpuqx5Yg9K1a-F9GtN6g0DDFjHTZM5OkcA@mail.gmail.com |
---|---|
State | Committed |
Headers |
Received: (qmail 45175 invoked by alias); 7 Jun 2019 18:00:34 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 36338 invoked by uid 89); 7 Jun 2019 18:00:27 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-13.9 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-ot1-f66.google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=Ft40I7aIj2GiHj/sYep+/LgtSN8WIYku6jcOrDzpJXQ=; b=ZN+2qdkOShe3YVdXo146ANVlIBpPiJC8LUHCkHMFA7UMsGLq6dxwzd62X9vwlrKEcl gPhhH9zrSb/U+u9P046nm8HkP5an/YP8UV8X+v1BneDJdPDxgvxnVdJdaFrm/V3p9izF doDi6j/tyMunNUS9lh2uiheNubmmuTd6XA6JMPLGsoCd1Sv5x50rV04ZqUIwE8lFru4D s0fZdW8ga6fMnfMpZ4pQAzQO7AcF9BvJdeoyeLJGXT8mN/N0Gd4MIQ5yRC9nmQ9ZFyr9 gJNZvKBzTzpekeMUGxTvhLlurJw6EJbRk9W/2hH3pyabotfvG3bsRXxAP2xhjfYI/g6P CzKg== MIME-Version: 1.0 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Fri, 7 Jun 2019 10:59:46 -0700 Message-ID: <CAMe9rOoP_293SM=sYpuqx5Yg9K1a-F9GtN6g0DDFjHTZM5OkcA@mail.gmail.com> Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128 To: GNU C Library <libc-alpha@sourceware.org> Content-Type: multipart/mixed; boundary="000000000000ae10a9058abf9ae0" |
Commit Message
H.J. Lu
June 7, 2019, 5:59 p.m. UTC
-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions, which leads to store forward stall: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 There is no easy fix in compiler. This patch limits vector width to 128 bits to work around this issue. It improves performance of sin and cos by more than 40% on Skylake compiled with -O3 -march=skylake. OK for master branch? * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set to -mprefer-vector-width=128.
Comments
* H. J. Lu: > -O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c > with 256-bit vector instructions, which leads to store forward stall: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 > > There is no easy fix in compiler. This patch limits vector width to > 128 bits to work around this issue. It improves performance of sin > and cos by more than 40% on Skylake compiled with -O3 -march=skylake. > > OK for master branch? > > * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set > to -mprefer-vector-width=128. This is bug 24603, right? Let's hope that the reproducer in the test case is misleadingly reduced, and we can fix the actual issue in the compiler. I updated the GCC PR. Thanks, Florian
On Fri, Jun 7, 2019 at 11:53 AM Florian Weimer <fweimer@redhat.com> wrote: > > * H. J. Lu: > > > -O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c > > with 256-bit vector instructions, which leads to store forward stall: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 > > > > There is no easy fix in compiler. This patch limits vector width to > > 128 bits to work around this issue. It improves performance of sin > > and cos by more than 40% on Skylake compiled with -O3 -march=skylake. > > > > OK for master branch? > > > > * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set > > to -mprefer-vector-width=128. > > This is bug 24603, right? Yes. > Let's hope that the reproducer in the test case is misleadingly reduced, > and we can fix the actual issue in the compiler. I updated the GCC PR. > It makes when arrays are 32-byte aligned.
On Fri, 7 Jun 2019, H.J. Lu wrote: > -O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c > with 256-bit vector instructions, which leads to store forward stall: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 > > There is no easy fix in compiler. This patch limits vector width to > 128 bits to work around this issue. It improves performance of sin > and cos by more than 40% on Skylake compiled with -O3 -march=skylake. > > OK for master branch? > > * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set > to -mprefer-vector-width=128. This is a case where the Makefile definition clearly needs a comment explaining the issue, stating what GCC version is known to have the issue and pointing to the GCC bug.
On Mon, Jun 10, 2019 at 8:45 AM Joseph Myers <joseph@codesourcery.com> wrote: > > On Fri, 7 Jun 2019, H.J. Lu wrote: > > > -O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c > > with 256-bit vector instructions, which leads to store forward stall: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 > > > > There is no easy fix in compiler. This patch limits vector width to > > 128 bits to work around this issue. It improves performance of sin > > and cos by more than 40% on Skylake compiled with -O3 -march=skylake. > > > > OK for master branch? > > > > * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set > > to -mprefer-vector-width=128. > > This is a case where the Makefile definition clearly needs a comment > explaining the issue, stating what GCC version is known to have the issue > and pointing to the GCC bug. > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. GCC 7 doesn't have this problem since it doesn't vectorize the second loop. Here is the updated patch. OK for master? Thanks.
* H. J. Lu: > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. > > Here is the updated patch. > > OK for master? I still don't understand why we need a workaround for a GCC performance bug in glibc. Is there any precedent for this? Thanks, Florian
On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: > > * H. J. Lu: > > > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. > > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. > > > > Here is the updated patch. > > > > OK for master? > > I still don't understand why we need a workaround for a GCC performance > bug in glibc. Is there any precedent for this? > I don't know if we ever investigated performance impacts of different GCC optimizations on glibc.
On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > * H. J. Lu: > > > > > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. > > > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. > > > > > > Here is the updated patch. > > > > > > OK for master? > > > > I still don't understand why we need a workaround for a GCC performance > > bug in glibc. Is there any precedent for this? > > > > I don't know if we ever investigated performance impacts of different GCC > optimizations on glibc. > I found: sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param max-variable-expansions-in-unroller=2 --param max-unroll-times=2 -funroll-loops -fpeel-loops
* H. J. Lu: > On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: >> >> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: >> > >> > * H. J. Lu: >> > >> > > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. >> > > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. >> > > >> > > Here is the updated patch. >> > > >> > > OK for master? >> > >> > I still don't understand why we need a workaround for a GCC performance >> > bug in glibc. Is there any precedent for this? >> > >> >> I don't know if we ever investigated performance impacts of different GCC >> optimizations on glibc. >> > > I found: > > sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param > max-variable-expansions-in-unroller=2 --param max-unroll-times=2 > -funroll-loops -fpeel-loops That's been there since before 2012. This reflects my main concern about such tweaks: That they will never be reviewed and removed when they are no longer necessary (and probably have negative impact). Thanks, Florian
On Tue, Jun 11, 2019 at 9:16 AM Florian Weimer <fweimer@redhat.com> wrote: > > * H. J. Lu: > > > On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: > >> > >> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: > >> > > >> > * H. J. Lu: > >> > > >> > > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. > >> > > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. > >> > > > >> > > Here is the updated patch. > >> > > > >> > > OK for master? > >> > > >> > I still don't understand why we need a workaround for a GCC performance > >> > bug in glibc. Is there any precedent for this? > >> > > >> > >> I don't know if we ever investigated performance impacts of different GCC > >> optimizations on glibc. > >> > > > > I found: > > > > sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param > > max-variable-expansions-in-unroller=2 --param max-unroll-times=2 > > -funroll-loops -fpeel-loops > > That's been there since before 2012. This reflects my main concern > about such tweaks: That they will never be reviewed and removed when > they are no longer necessary (and probably have negative impact). > In the case of sysdeps/ieee754/dbl-64/branred.c, 128-bit vector instructions give better performance that 256-bit . It won't change with different or new compilers.
On 11/06/2019 13:16, Florian Weimer wrote: > * H. J. Lu: > >> On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: >>> >>> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: >>>> >>>> * H. J. Lu: >>>> >>>>> Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. >>>>> GCC 7 doesn't have this problem since it doesn't vectorize the second loop. >>>>> >>>>> Here is the updated patch. >>>>> >>>>> OK for master? >>>> >>>> I still don't understand why we need a workaround for a GCC performance >>>> bug in glibc. Is there any precedent for this? >>>> >>> >>> I don't know if we ever investigated performance impacts of different GCC >>> optimizations on glibc. >>> >> >> I found: >> >> sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param >> max-variable-expansions-in-unroller=2 --param max-unroll-times=2 >> -funroll-loops -fpeel-loops > > That's been there since before 2012. This reflects my main concern > about such tweaks: That they will never be reviewed and removed when > they are no longer necessary (and probably have negative impact). > In fact I have been playing in make powerpc sysdeps less convoluted and I almost sure this snippet could be removed without performance differences, as for some mpa.c hacks that tried to overcome some compiler not being able to optimize that eventually turned out to produce worse results in newer version and chips (c4c0848bbb7a4ad6ab8149abf982a0f10fd2821b).
* Florian Weimer: > * H. J. Lu: > >> On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: >>> >>> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: >>> > >>> > * H. J. Lu: >>> > >>> > > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. >>> > > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. >>> > > >>> > > Here is the updated patch. >>> > > >>> > > OK for master? >>> > >>> > I still don't understand why we need a workaround for a GCC performance >>> > bug in glibc. Is there any precedent for this? >>> > >>> >>> I don't know if we ever investigated performance impacts of different GCC >>> optimizations on glibc. >>> >> >> I found: >> >> sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param >> max-variable-expansions-in-unroller=2 --param max-unroll-times=2 >> -funroll-loops -fpeel-loops > > That's been there since before 2012. This reflects my main concern > about such tweaks: That they will never be reviewed and removed when > they are no longer necessary (and probably have negative impact). For the record, this is not a strong objection to your patch, H.J. Thanks, Florian
On Fri, Jun 14, 2019 at 6:58 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Florian Weimer: > > > * H. J. Lu: > > > >> On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: > >>> > >>> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: > >>> > > >>> > * H. J. Lu: > >>> > > >>> > > Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. > >>> > > GCC 7 doesn't have this problem since it doesn't vectorize the second loop. > >>> > > > >>> > > Here is the updated patch. > >>> > > > >>> > > OK for master? > >>> > > >>> > I still don't understand why we need a workaround for a GCC performance > >>> > bug in glibc. Is there any precedent for this? > >>> > > >>> > >>> I don't know if we ever investigated performance impacts of different GCC > >>> optimizations on glibc. > >>> > >> > >> I found: > >> > >> sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param > >> max-variable-expansions-in-unroller=2 --param max-unroll-times=2 > >> -funroll-loops -fpeel-loops > > > > That's been there since before 2012. This reflects my main concern > > about such tweaks: That they will never be reviewed and removed when > > they are no longer necessary (and probably have negative impact). > > For the record, this is not a strong objection to your patch, H.J. > In this particular case, 256-bit vectorizer caused 40% slowdown in sin/cos bench. After the GCC bug is fixed, we should update it to check for the fixed GCC. I'd like to have a resolution before glibc 2.31 is released. Thanks.
On 6/14/19 7:58 AM, Florian Weimer wrote: > * Florian Weimer: > >> * H. J. Lu: >> >>> On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: >>>> >>>> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: >>>>> >>>>> * H. J. Lu: >>>>> >>>>>> Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. >>>>>> GCC 7 doesn't have this problem since it doesn't vectorize the second loop. >>>>>> >>>>>> Here is the updated patch. >>>>>> >>>>>> OK for master? >>>>> >>>>> I still don't understand why we need a workaround for a GCC performance >>>>> bug in glibc. Is there any precedent for this? >>>>> >>>> >>>> I don't know if we ever investigated performance impacts of different GCC >>>> optimizations on glibc. >>>> >>> >>> I found: >>> >>> sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param >>> max-variable-expansions-in-unroller=2 --param max-unroll-times=2 >>> -funroll-loops -fpeel-loops >> >> That's been there since before 2012. This reflects my main concern >> about such tweaks: That they will never be reviewed and removed when >> they are no longer necessary (and probably have negative impact). > > For the record, this is not a strong objection to your patch, H.J. So one way to address this would be to somehow conditionalize the bits on the GCC version where they were an issue. If the version number is higher then issue an error of some kind which would force someone to look at the problem again. But I certainly understand the concern here :-) jeff
On Fri, Jun 14, 2019 at 9:43 AM Jeff Law <law@redhat.com> wrote: > > On 6/14/19 7:58 AM, Florian Weimer wrote: > > * Florian Weimer: > > > >> * H. J. Lu: > >> > >>> On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: > >>>> > >>>> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: > >>>>> > >>>>> * H. J. Lu: > >>>>> > >>>>>> Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. > >>>>>> GCC 7 doesn't have this problem since it doesn't vectorize the second loop. > >>>>>> > >>>>>> Here is the updated patch. > >>>>>> > >>>>>> OK for master? > >>>>> > >>>>> I still don't understand why we need a workaround for a GCC performance > >>>>> bug in glibc. Is there any precedent for this? > >>>>> > >>>> > >>>> I don't know if we ever investigated performance impacts of different GCC > >>>> optimizations on glibc. > >>>> > >>> > >>> I found: > >>> > >>> sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param > >>> max-variable-expansions-in-unroller=2 --param max-unroll-times=2 > >>> -funroll-loops -fpeel-loops > >> > >> That's been there since before 2012. This reflects my main concern > >> about such tweaks: That they will never be reviewed and removed when > >> they are no longer necessary (and probably have negative impact). > > > > For the record, this is not a strong objection to your patch, H.J. > So one way to address this would be to somehow conditionalize the bits > on the GCC version where they were an issue. If the version number is > higher then issue an error of some kind which would force someone to > look at the problem again. > We know GCC 8 and 9 have this issue. But we don't know when this issue will be fixed in GCC and we should allow building glibc with GCC newer than the current minimum version, including GCC 10. How should GCC version be checked here?
On 6/17/19 1:06 PM, H.J. Lu wrote: > On Fri, Jun 14, 2019 at 9:43 AM Jeff Law <law@redhat.com> wrote: >> >> On 6/14/19 7:58 AM, Florian Weimer wrote: >>> * Florian Weimer: >>> >>>> * H. J. Lu: >>>> >>>>> On Tue, Jun 11, 2019 at 9:07 AM H.J. Lu <hjl.tools@gmail.com> wrote: >>>>>> >>>>>> On Tue, Jun 11, 2019 at 9:02 AM Florian Weimer <fweimer@redhat.com> wrote: >>>>>>> >>>>>>> * H. J. Lu: >>>>>>> >>>>>>>> Fixed. I also added check for -mprefer-vector-width=128 which is new in GCC 8. >>>>>>>> GCC 7 doesn't have this problem since it doesn't vectorize the second loop. >>>>>>>> >>>>>>>> Here is the updated patch. >>>>>>>> >>>>>>>> OK for master? >>>>>>> >>>>>>> I still don't understand why we need a workaround for a GCC performance >>>>>>> bug in glibc. Is there any precedent for this? >>>>>>> >>>>>> >>>>>> I don't know if we ever investigated performance impacts of different GCC >>>>>> optimizations on glibc. >>>>>> >>>>> >>>>> I found: >>>>> >>>>> sysdeps/powerpc/powerpc64/power4/Makefile:CFLAGS-memmove.c += --param >>>>> max-variable-expansions-in-unroller=2 --param max-unroll-times=2 >>>>> -funroll-loops -fpeel-loops >>>> >>>> That's been there since before 2012. This reflects my main concern >>>> about such tweaks: That they will never be reviewed and removed when >>>> they are no longer necessary (and probably have negative impact). >>> >>> For the record, this is not a strong objection to your patch, H.J. >> So one way to address this would be to somehow conditionalize the bits >> on the GCC version where they were an issue. If the version number is >> higher then issue an error of some kind which would force someone to >> look at the problem again. >> > We know GCC 8 and 9 have this issue. But we don't know when this issue > will be fixed in GCC and we should allow building glibc with GCC newer than > the current minimum version, including GCC 10. How should GCC version > be checked here? ISTM that 8, 9, 10 would use the new flag. 11 would issue an error which would trigger a reinvestigation roughly a year from now. The glibc maintainers would have the final say about this -- it's just an idea off the top of my head. jeff >
* Jeff Law: > ISTM that 8, 9, 10 would use the new flag. 11 would issue an error > which would trigger a reinvestigation roughly a year from now. Do you suggest to put in a #warning for GCC 11, so that people can configure with --disable-werror and still build with GCC 11? Given that GCC 11 will probably add other warnings that break the build, this proposal isn't entirely unreasonable (even though I generally dislike such time bombs). Thanks, Florian
On Mon, 17 Jun 2019, Florian Weimer wrote: > * Jeff Law: > > > ISTM that 8, 9, 10 would use the new flag. 11 would issue an error > > which would trigger a reinvestigation roughly a year from now. > > Do you suggest to put in a #warning for GCC 11, so that people can > configure with --disable-werror and still build with GCC 11? I don't think that's a good idea. We might want e.g. a helper script to find places in the source tree to revisit after some given GCC version is out / after some given GCC version is no longer supported for building glibc. (E.g. DIAG_IGNORE_NEEDS_COMMENT calls where the version specified is now too old to build glibc and so we should see if the warning in question still appears with the oldest supported version.) That implies having a limited number of standard ways to mark such places in the sources so they can be found later.
On Mon, Jun 17, 2019 at 12:12 PM Florian Weimer <fweimer@redhat.com> wrote: > > * Jeff Law: > > > ISTM that 8, 9, 10 would use the new flag. 11 would issue an error > > which would trigger a reinvestigation roughly a year from now. > > Do you suggest to put in a #warning for GCC 11, so that people can > configure with --disable-werror and still build with GCC 11? > > Given that GCC 11 will probably add other warnings that break the build, > this proposal isn't entirely unreasonable (even though I generally > dislike such time bombs). > Since there is no issue in source, this requires an artificial warning purely based on GCC version. We can place GCC version check in configure script with explicit -mprefer-vector-width=128 support. --enable-mprefer-vector-width: 1. Default. For GCC 8, 9, 10, use -mprefer-vector-width=128. For GCC 11 and above, issue a configure error. 2. Enabled. No GCC version check, use -mprefer-vector-width=128. 3. Disabled. No GCC version check, don't use -mprefer-vector-width=128.
On Mon, 17 Jun 2019, H.J. Lu wrote: > On Mon, Jun 17, 2019 at 12:12 PM Florian Weimer <fweimer@redhat.com> wrote: > > > > * Jeff Law: > > > > > ISTM that 8, 9, 10 would use the new flag. 11 would issue an error > > > which would trigger a reinvestigation roughly a year from now. > > > > Do you suggest to put in a #warning for GCC 11, so that people can > > configure with --disable-werror and still build with GCC 11? > > > > Given that GCC 11 will probably add other warnings that break the build, > > this proposal isn't entirely unreasonable (even though I generally > > dislike such time bombs). > > > > Since there is no issue in source, this requires an artificial warning > purely based on GCC version. We can place GCC version check in > configure script with explicit -mprefer-vector-width=128 support. > > --enable-mprefer-vector-width: > > 1. Default. For GCC 8, 9, 10, use -mprefer-vector-width=128. > For GCC 11 and above, issue a configure error. I don't think anything causing the build to break with newer GCC versions is appropriate.
On 6/17/19 3:37 PM, Joseph Myers wrote: > On Mon, 17 Jun 2019, H.J. Lu wrote: > >> On Mon, Jun 17, 2019 at 12:12 PM Florian Weimer <fweimer@redhat.com> wrote: >>> >>> * Jeff Law: >>> >>>> ISTM that 8, 9, 10 would use the new flag. 11 would issue an error >>>> which would trigger a reinvestigation roughly a year from now. >>> >>> Do you suggest to put in a #warning for GCC 11, so that people can >>> configure with --disable-werror and still build with GCC 11? >>> >>> Given that GCC 11 will probably add other warnings that break the build, >>> this proposal isn't entirely unreasonable (even though I generally >>> dislike such time bombs). >>> >> >> Since there is no issue in source, this requires an artificial warning >> purely based on GCC version. We can place GCC version check in >> configure script with explicit -mprefer-vector-width=128 support. >> >> --enable-mprefer-vector-width: >> >> 1. Default. For GCC 8, 9, 10, use -mprefer-vector-width=128. >> For GCC 11 and above, issue a configure error. > > I don't think anything causing the build to break with newer GCC versions > is appropriate. Fair enough -- hence a bit of a step away from the decision process in my prior message on this subject. I threw it out as a way to address the issue Florian raised, but I strongly believe it's up to the glibc team to decide how they want to go forward. jeff
On Mon, Jun 17, 2019 at 2:41 PM Jeff Law <law@redhat.com> wrote: > > On 6/17/19 3:37 PM, Joseph Myers wrote: > > On Mon, 17 Jun 2019, H.J. Lu wrote: > > > >> On Mon, Jun 17, 2019 at 12:12 PM Florian Weimer <fweimer@redhat.com> wrote: > >>> > >>> * Jeff Law: > >>> > >>>> ISTM that 8, 9, 10 would use the new flag. 11 would issue an error > >>>> which would trigger a reinvestigation roughly a year from now. > >>> > >>> Do you suggest to put in a #warning for GCC 11, so that people can > >>> configure with --disable-werror and still build with GCC 11? > >>> > >>> Given that GCC 11 will probably add other warnings that break the build, > >>> this proposal isn't entirely unreasonable (even though I generally > >>> dislike such time bombs). > >>> > >> > >> Since there is no issue in source, this requires an artificial warning > >> purely based on GCC version. We can place GCC version check in > >> configure script with explicit -mprefer-vector-width=128 support. > >> > >> --enable-mprefer-vector-width: > >> > >> 1. Default. For GCC 8, 9, 10, use -mprefer-vector-width=128. > >> For GCC 11 and above, issue a configure error. > > > > I don't think anything causing the build to break with newer GCC versions > > is appropriate. > Fair enough -- hence a bit of a step away from the decision process in > my prior message on this subject. I threw it out as a way to address > the issue Florian raised, but I strongly believe it's up to the glibc > team to decide how they want to go forward. > I'd like to check in my patch to close the 40% performance gap with GCC 8, 9 and 10. We will monitor performance changes in the future GCC. Thanks.
On Thu, Jun 20, 2019 at 9:30 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Mon, Jun 17, 2019 at 2:41 PM Jeff Law <law@redhat.com> wrote: > > > > On 6/17/19 3:37 PM, Joseph Myers wrote: > > > On Mon, 17 Jun 2019, H.J. Lu wrote: > > > > > >> On Mon, Jun 17, 2019 at 12:12 PM Florian Weimer <fweimer@redhat.com> wrote: > > >>> > > >>> * Jeff Law: > > >>> > > >>>> ISTM that 8, 9, 10 would use the new flag. 11 would issue an error > > >>>> which would trigger a reinvestigation roughly a year from now. > > >>> > > >>> Do you suggest to put in a #warning for GCC 11, so that people can > > >>> configure with --disable-werror and still build with GCC 11? > > >>> > > >>> Given that GCC 11 will probably add other warnings that break the build, > > >>> this proposal isn't entirely unreasonable (even though I generally > > >>> dislike such time bombs). > > >>> > > >> > > >> Since there is no issue in source, this requires an artificial warning > > >> purely based on GCC version. We can place GCC version check in > > >> configure script with explicit -mprefer-vector-width=128 support. > > >> > > >> --enable-mprefer-vector-width: > > >> > > >> 1. Default. For GCC 8, 9, 10, use -mprefer-vector-width=128. > > >> For GCC 11 and above, issue a configure error. > > > > > > I don't think anything causing the build to break with newer GCC versions > > > is appropriate. > > Fair enough -- hence a bit of a step away from the decision process in > > my prior message on this subject. I threw it out as a way to address > > the issue Florian raised, but I strongly believe it's up to the glibc > > team to decide how they want to go forward. > > > > I'd like to check in my patch to close the 40% performance gap with GCC > 8, 9 and 10. We will monitor performance changes in the future GCC. > > Thanks. > I will check in my patch tomorrow. Thanks.
From 53f43ccf241896d37b759ac416df0ef0ccd2da0e Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Fri, 17 May 2019 14:23:03 -0700 Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128 -O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions, which leads to store forward stall: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 There is no easy fix in compiler. This patch limits vector width to 128 bits to work around this issue. It improves performance of sin and cos by more than 40% on Skylake compiled with -O3 -march=skylake. * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set to -mprefer-vector-width=128. --- sysdeps/x86_64/fpu/Makefile | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile index 2b7d69bb50..b5f9589021 100644 --- a/sysdeps/x86_64/fpu/Makefile +++ b/sysdeps/x86_64/fpu/Makefile @@ -237,3 +237,7 @@ CFLAGS-test-float-libmvec-sincosf-avx512.c = -DREQUIRE_AVX512F CFLAGS-test-float-libmvec-sincosf-avx512-main.c = $(libmvec-sincos-cflags) $(float-vlen16-arch-ext-cflags) endif endif + +ifeq ($(subdir),math) +CFLAGS-branred.c = -mprefer-vector-width=128 +endif -- 2.20.1