[v6,1/2] x86: Add comment explaining no Slow_SSE4_2 check in ifunc-sse4_2

Message ID 20220630201319.3777218-1-goldstein.w.n@gmail.com
State Superseded
Headers
Series [v6,1/2] x86: Add comment explaining no Slow_SSE4_2 check in ifunc-sse4_2 |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Noah Goldstein June 30, 2022, 8:13 p.m. UTC
  Just for clarities sake and so that if a future implementation is
added we remember to add the check.
---
 sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
 1 file changed, 4 insertions(+)
  

Comments

H.J. Lu June 30, 2022, 11:20 p.m. UTC | #1
On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> Just for clarities sake and so that if a future implementation is
> added we remember to add the check.
> ---
>  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> index ee36525bcf..752798278c 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
>  {
>    const struct cpu_features* cpu_features = __get_cpu_features ();
>
> +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> +     there is no other optimized implementation keep using it.  If an
> +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> +     (cpu_features, Slow_SSE4_2) check.  */

This function always uses sse4.2 instructions (pcmpstri) since there
is no other optimized implementation.  If an ...

>    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
>      return OPTIMIZE (sse42);
>
> --
> 2.34.1
>
  
Noah Goldstein July 1, 2022, 12:01 a.m. UTC | #2
On Thu, Jun 30, 2022 at 4:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > Just for clarities sake and so that if a future implementation is
> > added we remember to add the check.
> > ---
> >  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > index ee36525bcf..752798278c 100644
> > --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
> >  {
> >    const struct cpu_features* cpu_features = __get_cpu_features ();
> >
> > +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> > +     there is no other optimized implementation keep using it.  If an
> > +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> > +     (cpu_features, Slow_SSE4_2) check.  */
>
> This function always uses sse4.2 instructions (pcmpstri) since there
> is no other optimized implementation.  If an ...

Is it all sse4.2 instructions that count for Slow_SSE4.2 or other the
microcode string ones?
>
> >    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> >      return OPTIMIZE (sse42);
> >
> > --
> > 2.34.1
> >
>
>
> --
> H.J.
  
Noah Goldstein July 1, 2022, 12:01 a.m. UTC | #3
On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Jun 30, 2022 at 4:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > Just for clarities sake and so that if a future implementation is
> > > added we remember to add the check.
> > > ---
> > >  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > index ee36525bcf..752798278c 100644
> > > --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
> > >  {
> > >    const struct cpu_features* cpu_features = __get_cpu_features ();
> > >
> > > +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> > > +     there is no other optimized implementation keep using it.  If an
> > > +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> > > +     (cpu_features, Slow_SSE4_2) check.  */
> >
> > This function always uses sse4.2 instructions (pcmpstri) since there
> > is no other optimized implementation.  If an ...
>
> Is it all sse4.2 instructions that count for Slow_SSE4.2 or other the
> microcode string ones?
other->only*
> >
> > >    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> > >      return OPTIMIZE (sse42);
> > >
> > > --
> > > 2.34.1
> > >
> >
> >
> > --
> > H.J.
  
H.J. Lu July 1, 2022, 10:38 p.m. UTC | #4
On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 4:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > Just for clarities sake and so that if a future implementation is
> > > > added we remember to add the check.
> > > > ---
> > > >  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > index ee36525bcf..752798278c 100644
> > > > --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
> > > >  {
> > > >    const struct cpu_features* cpu_features = __get_cpu_features ();
> > > >
> > > > +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> > > > +     there is no other optimized implementation keep using it.  If an
> > > > +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> > > > +     (cpu_features, Slow_SSE4_2) check.  */
> > >
> > > This function always uses sse4.2 instructions (pcmpstri) since there
> > > is no other optimized implementation.  If an ...
> >
> > Is it all sse4.2 instructions that count for Slow_SSE4.2 or other the
> > microcode string ones?
> other->only*

No.  Only the string instructions are slow.

> > >
> > > >    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> > > >      return OPTIMIZE (sse42);
> > > >
> > > > --
> > > > 2.34.1
> > > >
> > >
> > >
> > > --
> > > H.J.
  
Noah Goldstein July 1, 2022, 10:52 p.m. UTC | #5
On Fri, Jul 1, 2022 at 3:38 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 4:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > > >
> > > > > Just for clarities sake and so that if a future implementation is
> > > > > added we remember to add the check.
> > > > > ---
> > > > >  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
> > > > >  1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > index ee36525bcf..752798278c 100644
> > > > > --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
> > > > >  {
> > > > >    const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > >
> > > > > +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> > > > > +     there is no other optimized implementation keep using it.  If an
> > > > > +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> > > > > +     (cpu_features, Slow_SSE4_2) check.  */
> > > >
> > > > This function always uses sse4.2 instructions (pcmpstri) since there
> > > > is no other optimized implementation.  If an ...

Then always uses slow sse4.2...

Also I think there needs to be a 'but' before since otherwise the sentence
sounds like it's saying the function uses sse4.2 because there are no
other implementations.

> > >
> > > Is it all sse4.2 instructions that count for Slow_SSE4.2 or other the
> > > microcode string ones?
> > other->only*
>
> No.  Only the string instructions are slow.
>
> > > >
> > > > >    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> > > > >      return OPTIMIZE (sse42);
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >
> > > >
> > > >
> > > > --
> > > > H.J.
>
>
>
> --
> H.J.
  
H.J. Lu July 3, 2022, 6 p.m. UTC | #6
On Fri, Jul 1, 2022 at 3:52 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Fri, Jul 1, 2022 at 3:38 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 4:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > > > >
> > > > > > Just for clarities sake and so that if a future implementation is
> > > > > > added we remember to add the check.
> > > > > > ---
> > > > > >  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
> > > > > >  1 file changed, 4 insertions(+)
> > > > > >
> > > > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > > index ee36525bcf..752798278c 100644
> > > > > > --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > > +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > > @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
> > > > > >  {
> > > > > >    const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > > >
> > > > > > +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> > > > > > +     there is no other optimized implementation keep using it.  If an
> > > > > > +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> > > > > > +     (cpu_features, Slow_SSE4_2) check.  */
> > > > >
> > > > > This function always uses sse4.2 instructions (pcmpstri) since there
> > > > > is no other optimized implementation.  If an ...
>
> Then always uses slow sse4.2...

SSE4.2 isn't always slow.

> Also I think there needs to be a 'but' before since otherwise the sentence
> sounds like it's saying the function uses sse4.2 because there are no
> other implementations.

because there are no other "optimized" implementations.

> > > >
> > > > Is it all sse4.2 instructions that count for Slow_SSE4.2 or other the
> > > > microcode string ones?
> > > other->only*
> >
> > No.  Only the string instructions are slow.
> >
> > > > >
> > > > > >    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> > > > > >      return OPTIMIZE (sse42);
> > > > > >
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > H.J.
> >
> >
> >
> > --
> > H.J.
  
Noah Goldstein July 4, 2022, 4:28 a.m. UTC | #7
On Sun, Jul 3, 2022 at 11:01 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Jul 1, 2022 at 3:52 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Fri, Jul 1, 2022 at 3:38 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 5:01 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 4:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 1:13 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > > > > >
> > > > > > > Just for clarities sake and so that if a future implementation is
> > > > > > > added we remember to add the check.
> > > > > > > ---
> > > > > > >  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 4 ++++
> > > > > > >  1 file changed, 4 insertions(+)
> > > > > > >
> > > > > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > > > index ee36525bcf..752798278c 100644
> > > > > > > --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > > > +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> > > > > > > @@ -27,6 +27,10 @@ IFUNC_SELECTOR (void)
> > > > > > >  {
> > > > > > >    const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > > > >
> > > > > > > +  /* This function uses slow sse4.2 instructions (pcmpstri) but since
> > > > > > > +     there is no other optimized implementation keep using it.  If an
> > > > > > > +     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
> > > > > > > +     (cpu_features, Slow_SSE4_2) check.  */
> > > > > >
> > > > > > This function always uses sse4.2 instructions (pcmpstri) since there
> > > > > > is no other optimized implementation.  If an ...
> >
> > Then always uses slow sse4.2...
>
> SSE4.2 isn't always slow.
>

Think made the comment reflect that in V7.

> > Also I think there needs to be a 'but' before since otherwise the sentence
> > sounds like it's saying the function uses sse4.2 because there are no
> > other implementations.
>
> because there are no other "optimized" implementations.
>

Made clearer in V7.

> > > > >
> > > > > Is it all sse4.2 instructions that count for Slow_SSE4.2 or other the
> > > > > microcode string ones?
> > > > other->only*
> > >
> > > No.  Only the string instructions are slow.
> > >
> > > > > >
> > > > > > >    if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> > > > > > >      return OPTIMIZE (sse42);
> > > > > > >
> > > > > > > --
> > > > > > > 2.34.1
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > H.J.
> > >
> > >
> > >
> > > --
> > > H.J.
>
>
>
> --
> H.J.
  

Patch

diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
index ee36525bcf..752798278c 100644
--- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
@@ -27,6 +27,10 @@  IFUNC_SELECTOR (void)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
+  /* This function uses slow sse4.2 instructions (pcmpstri) but since
+     there is no other optimized implementation keep using it.  If an
+     optimized fallback is added add a X86_ISA_CPU_FEATURE_ARCH_P
+     (cpu_features, Slow_SSE4_2) check.  */
   if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
     return OPTIMIZE (sse42);