| Message ID | CAMe9rOqpbUF2m40pxjyr+O8pSrA9EmUgNsOcBPuP-wDaMqn+RQ@mail.gmail.com |
|---|---|
| State | New, archived |
| Subject | Re: [PATCH x86_64] Update memcpy, mempcpy and memmove selection order for Excavator CPU BZ #19583 |
| From | "H.J. Lu" <hjl.tools@gmail.com> |
| To | "Pawar, Amit" <Amit.Pawar@amd.com> |
| Cc | libc-alpha@sourceware.org |
| Date | Fri, 18 Mar 2016 06:51:32 -0700 |
Commit Message
H.J. Lu
March 18, 2016, 1:51 p.m. UTC
On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>> No, it isn't fixed. Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more. Also you need to fix ALL selections.
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index 8882590..a5afaf4 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>         ret
>  #endif
>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +       jnz     3f
>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>         jnz     2f
>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>         jnz     2f
>         lea     __memcpy_ssse3(%rip), %RAX_LP
>  2:     ret
> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
> +       ret
>  END(__new_memcpy)
>
>  # undef ENTRY
>
> Will update all IFUNCs if this is OK, else please suggest.
>

Better, but not OK. Try something like

diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
index ab5998c..2abe2fd 100644
--- a/sysdeps/x86_64/multiarch/memcpy.S
+++ b/sysdeps/x86_64/multiarch/memcpy.S
@@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
        ret
 #endif
 1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
+       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
+       jnz     3f
        HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
        jnz     2f
-       lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
+3:     lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
        HAS_ARCH_FEATURE (Fast_Unaligned_Load)
        jnz     2f
        lea     __memcpy_sse2(%rip), %RAX_LP
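For readers tracing the branches, the selection order the suggested hunk implements can be sketched in C as follows. This is a minimal sketch, not code from the patch: the `select_memcpy` name is invented for illustration, the `HAS_ARCH_FEATURE`/`HAS_CPU_FEATURE` C macros are assumed from glibc's init-arch.h of that era, and the SSSE3 tail is abridged from the unchanged part of memcpy.S.

```c
#include <string.h>
#include <init-arch.h>  /* assumed: HAS_ARCH_FEATURE/HAS_CPU_FEATURE for C */

extern __typeof (memcpy) __memcpy_avx_unaligned, __memcpy_sse2_unaligned,
                         __memcpy_ssse3, __memcpy_sse2;

/* Sketch of the branch order in the suggested __new_memcpy hunk:
   Avoid_AVX_Fast_Unaligned_Load only vetoes the AVX variant (jnz 3f)
   and falls through to the normal SSE2/SSSE3 selection.  */
static __typeof (memcpy) *
select_memcpy (void)
{
  if (HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
      && !HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load))
    return __memcpy_avx_unaligned;   /* label 1, branch to 2f */
  if (HAS_ARCH_FEATURE (Fast_Unaligned_Load))
    return __memcpy_sse2_unaligned;  /* label 3, branch to 2f */
  if (HAS_CPU_FEATURE (SSSE3))
    return __memcpy_ssse3;           /* unchanged tail, abridged:
                                        Fast_Copy_Backward branch omitted */
  return __memcpy_sse2;
}
```

The key difference from Amit's version: when the veto bit is set, selection falls through to the full SSE2/SSSE3 fallback chain rather than jumping straight to a single hard-coded implementation.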
Comments
On 18-03-2016 10:51, H.J. Lu wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>> No, it isn't fixed. Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more. Also you need to fix ALL selections.
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..a5afaf4 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>         ret
>>  #endif
>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +       jnz     3f
>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>         jnz     2f
>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>         jnz     2f
>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>  2:     ret
>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>> +       ret
>>  END(__new_memcpy)
>>
>>  # undef ENTRY
>>
>> Will update all IFUNCs if this is OK, else please suggest.
>>
>
> Better, but not OK. Try something like
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index ab5998c..2abe2fd 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>         ret
>  #endif
>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +       jnz     3f
>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>         jnz     2f
> -       lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
> +3:     lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>         HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>         jnz     2f
>         lea     __memcpy_sse2(%rip), %RAX_LP
>

I know this is not related to this patch, but is there any reason not to code the resolver using the libc_ifunc macros?
On Fri, Mar 18, 2016 at 6:55 AM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
> On 18-03-2016 10:51, H.J. Lu wrote:
>> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>>> No, it isn't fixed. Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more. Also you need to fix ALL selections.
>>>
>>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>>> index 8882590..a5afaf4 100644
>>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>>         ret
>>>  #endif
>>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>>> +       jnz     3f
>>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>>         jnz     2f
>>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>>         jnz     2f
>>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>>  2:     ret
>>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>>> +       ret
>>>  END(__new_memcpy)
>>>
>>>  # undef ENTRY
>>>
>>> Will update all IFUNCs if this is OK, else please suggest.
>>>
>>
>> Better, but not OK. Try something like
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index ab5998c..2abe2fd 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>>         ret
>>  #endif
>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +       jnz     3f
>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>         jnz     2f
>> -       lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>> +3:     lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>>         HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>>         jnz     2f
>>         lea     __memcpy_sse2(%rip), %RAX_LP
>>
>
> I know this is not related to this patch, but is there any reason not to code the resolver using the libc_ifunc macros?

Did you mean writing them in C? It can be done. Someone needs to write patches.
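For context, a resolver written in C along the lines Adhemerval suggests might look roughly like the sketch below. This is only an illustration of the idea under discussion, not code from any patch: the `libc_ifunc` macro comes from glibc's include/libc-symbols.h, the C feature-test macros are assumed from init-arch.h, and the fallback chain is deliberately simplified.

```c
#include <string.h>
#include <init-arch.h>  /* assumed: HAS_ARCH_FEATURE for C code */

extern __typeof (memcpy) __memcpy_avx_unaligned attribute_hidden;
extern __typeof (memcpy) __memcpy_sse2_unaligned attribute_hidden;
extern __typeof (memcpy) __memcpy_sse2 attribute_hidden;

/* libc_ifunc (name, expr) emits an IFUNC symbol whose resolver returns
   EXPR at load time; the conditional mirrors the assembly's branch
   order (simplified: SSSE3 variants omitted).  */
libc_ifunc (memcpy,
            (HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
             && !HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load))
            ? __memcpy_avx_unaligned
            : (HAS_ARCH_FEATURE (Fast_Unaligned_Load)
               ? __memcpy_sse2_unaligned
               : __memcpy_sse2));
```

Glibc did eventually move the x86-64 IFUNC selectors to C, but at the time of this thread they were hand-written assembly as shown above.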
On Fri, Mar 18, 2016 at 6:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>> No, it isn't fixed. Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more. Also you need to fix ALL selections.
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..a5afaf4 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>         ret
>>  #endif
>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +       jnz     3f
>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>         jnz     2f
>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>         jnz     2f
>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>  2:     ret
>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>> +       ret
>>  END(__new_memcpy)
>>
>>  # undef ENTRY
>>
>> Will update all IFUNCs if this is OK, else please suggest.
>>
>
> Better, but not OK. Try something like
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index ab5998c..2abe2fd 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>         ret
>  #endif
>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +       jnz     3f
>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>         jnz     2f
> -       lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
> +3:     lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>         HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>         jnz     2f
>         lea     __memcpy_sse2(%rip), %RAX_LP
>

One question: if you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?
> One question: if you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?
Any idea whether any other string or memory functions currently select their implementation based on this feature? If not, let me just verify this one.

Also, this feature is enabled in the generic code; to disable it, a change is needed after that point.
--Amit Pawar
On Fri, Mar 18, 2016 at 8:19 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>> One question: if you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?
>
> Any idea whether any other string or memory functions currently select their implementation based on this feature? If not, let me just verify this one.
>
> Also, this feature is enabled in the generic code; to disable it, a change is needed after that point.
>

It was done based on the assumption that an AVX-enabled machine has fast AVX unaligned loads. If that isn't true for AMD CPUs, we can enable it for all Intel AVX CPUs and you can set it for AMD CPUs properly.
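In sysdeps/x86/cpu-features.c terms, the vendor split being discussed could look roughly like the sketch below. Treat everything here as an assumption: the helper name and its placement are invented, the `index_*`/`bit_*` names follow the cpu-features.h interface of that era, and the Excavator family/model range (0x15, models 0x60-0x7f) is drawn from BZ #19583 rather than from this thread.

```c
#include <cpu-features.h>  /* assumed glibc-internal header */

/* Sketch: set AVX_Fast_Unaligned_Load by default only on the Intel
   path, and let the AMD path set the proposed veto bit for CPUs whose
   unaligned AVX loads are slow.  FAMILY and MODEL are the values that
   init_cpu_features has already computed.  */
static void
set_unaligned_load_defaults (struct cpu_features *cpu_features,
                             unsigned int family, unsigned int model)
{
  if (cpu_features->kind == arch_kind_intel)
    {
      /* Keep the existing default: assume any Intel AVX machine has
         fast unaligned AVX loads.  */
      if (CPU_FEATURES_CPU_P (cpu_features, AVX))
        cpu_features->feature[index_AVX_Fast_Unaligned_Load]
          |= bit_AVX_Fast_Unaligned_Load;
    }
  else if (cpu_features->kind == arch_kind_amd)
    {
      /* Excavator (assumed family 0x15, models 0x60-0x7f, per
         BZ #19583): AVX is present but unaligned AVX loads are slow,
         so set the veto bit instead.  */
      if (family == 0x15 && model >= 0x60 && model <= 0x7f)
        cpu_features->feature[index_Avoid_AVX_Fast_Unaligned_Load]
          |= bit_Avoid_AVX_Fast_Unaligned_Load;
    }
}
```

This keeps the IFUNC selectors vendor-neutral: they only test feature bits, and all vendor knowledge stays in the CPU-feature initialization code.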
--- a/sysdeps/x86_64/multiarch/memcpy.S
+++ b/sysdeps/x86_64/multiarch/memcpy.S
@@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
        ret
 #endif
 1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
+       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
+       jnz     3f
        HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
        jnz     2f
-       lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
+3:     lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
        HAS_ARCH_FEATURE (Fast_Unaligned_Load)
        jnz     2f
        lea     __memcpy_sse2(%rip), %RAX_LP