rs6000, fix vec_replace_unaligned builtin arguments

Message ID 63afb57c80647d16859c4a2bdd83fa2bbcdc843e.camel@us.ibm.com
State New
Headers
Series rs6000, fix vec_replace_unaligned builtin arguments |

Commit Message

Carl Love May 30, 2023, 8:41 p.m. UTC
  GCC maintainers:

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and past error.  The documentation was fixed in:


   commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
   Author: Bill Schmidt <wschmidt@linux.ibm.com>
   Date:   Fri Feb 4 13:26:44 2022 -0600

       rs6000: Clean up ISA 3.1 documentation [PR100808]

       Due to a pasto error in the documentation, vec_replace_unaligned was
       implemented with the same function prototypes as vec_replace_elt.  It was
       intended that vec_replace_unaligned always specify output vectors as having
       type vector unsigned char, to emphasize that elements are potentially
       misaligned by this built-in function.  This patch corrects the
       misimplementation.

       2022-02-04  Bill Schmidt  <wschmidt@linux.ibm.com>

       gcc/
               PR target/100808
               * doc/extend.texi (Basic PowerPC Built-in Functions Available on ISA
               3.1): Provide consistent type names.  Remove unnecessary semicolons.
               Fix bad line breaks.

This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

                 Carl 

--------------------------------------------------
rs6000, fix vec_replace_unaligned builtin arguments

The first argument of the vec_replace_unaligned builtin should always be
unsinged char, as specified in gcc/doc/extend.texi.

This patch fixes the buitin definitions and updates the testcases to use
the correct arguments.

gcc/ChangeLog:
	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
	Fix first argument type.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/ver-replace-word-runnable.c
	(vec_replace_unaligned) Fix first argument type.
	(vresult_uchar): Fix expected	results.
	(vec_replace_unaligned): Update for loop to check uchar results.
	Remove extra spaces in if statements.
	Insert missing spaces in for statements.
	(dg-final): Update expected instruction counts.
---
 gcc/config/rs6000/rs6000-overload.def         |  12 +-
 .../powerpc/vec-replace-word-runnable.c       | 157 ++++++++++--------
 2 files changed, 92 insertions(+), 77 deletions(-)
  

Comments

Kewen.Lin June 13, 2023, 3:24 a.m. UTC | #1
Hi Carl,

on 2023/5/31 04:41, Carl Love wrote:
> GCC maintainers:
> 
> The following patch fixes the first argument in the builtin definition
> and the corresponding test cases.  Initially, the builtin specification
> was wrong due to a cut and past error.  The documentation was fixed in:
> 
> 
>    commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
>    Author: Bill Schmidt <wschmidt@linux.ibm.com>
>    Date:   Fri Feb 4 13:26:44 2022 -0600
> 
>        rs6000: Clean up ISA 3.1 documentation [PR100808]
> 
>        Due to a pasto error in the documentation, vec_replace_unaligned was
>        implemented with the same function prototypes as vec_replace_elt.  It was
>        intended that vec_replace_unaligned always specify output vectors as having
>        type vector unsigned char, to emphasize that elements are potentially
>        misaligned by this built-in function.  This patch corrects the
>        misimplementation.
> 
>        2022-02-04  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
>        gcc/
>                PR target/100808
>                * doc/extend.texi (Basic PowerPC Built-in Functions Available on ISA
>                3.1): Provide consistent type names.  Remove unnecessary semicolons.
>                Fix bad line breaks.
> 

Wrong referred commit, should be ed3fea09b18f67e757b5768b42cb6e816626f1db.
The above commit used the wrong commit log.

> This patch fixes the arguments in the definitions and updates the
> testcases accordingly.  Additionally, a few minor spacing issues are
> fixed.
> 
> The patch has been tested on Power 10 with no regressions.  Please let
> me know if the patch is acceptable for mainline.  Thanks.
> 
>                  Carl 
> 
> --------------------------------------------------
> rs6000, fix vec_replace_unaligned builtin arguments
> 
> The first argument of the vec_replace_unaligned builtin should always be
> unsinged char, as specified in gcc/doc/extend.texi.

s/unsinged/unsigned/

> 
> This patch fixes the buitin definitions and updates the testcases to use

s/buitin/builtin/

> the correct arguments.
> 
> gcc/ChangeLog:
> 	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> 	Fix first argument type.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/powerpc/ver-replace-word-runnable.c
> 	(vec_replace_unaligned) Fix first argument type.
> 	(vresult_uchar): Fix expected	results.

Nit: unexpected tab.

> 	(vec_replace_unaligned): Update for loop to check uchar results.
> 	Remove extra spaces in if statements.
> 	Insert missing spaces in for statements.
> 	(dg-final): Update expected instruction counts.
> ---
>  gcc/config/rs6000/rs6000-overload.def         |  12 +-
>  .../powerpc/vec-replace-word-runnable.c       | 157 ++++++++++--------
>  2 files changed, 92 insertions(+), 77 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index c582490c084..26dc662b8fb 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3059,17 +3059,17 @@
>      VREPLACE_ELT_V2DF
>  
>  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
>      VREPLACE_UN_UV4SI
> -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
>      VREPLACE_UN_V4SI
> -  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
> +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
>      VREPLACE_UN_UV2DI
> -  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
> +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
>      VREPLACE_UN_V2DI
> -  vuc __builtin_vec_replace_un (vf, float, const int);
> +  vuc __builtin_vec_replace_un (vuc, float, const int);
>      VREPLACE_UN_V4SF
> -  vuc __builtin_vec_replace_un (vd, double, const int);
> +  vuc __builtin_vec_replace_un (vuc, double, const int);
>      VREPLACE_UN_V2DF

Looks good, since the given element can be replaced without aligned,
the given vector type don't need to match the given element, with
the potential implication that it can be misaligned.

>  
>  [VEC_REVB, vec_revb, __builtin_vec_revb]
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> index 27318822871..66b0ef58996 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> @@ -20,6 +20,9 @@ main (int argc, char *argv [])
>    unsigned char ch;
>    unsigned int index;
>  
> +  vector unsigned char src_va_uchar;
> +  vector unsigned char expected_vresult_uchar;
> +
>    vector unsigned int vresult_uint;
>    vector unsigned int expected_vresult_uint;
>    vector unsigned int src_va_uint;
> @@ -64,10 +67,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
>  
> -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> +  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n");
> -    for(i = 0; i < 4; i++)
> +    for (i = 0; i < 4; i++)
>        printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
>  	     i, vresult_uint[i], i, expected_vresult_uint[i]);
>  #else
> @@ -82,10 +85,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
>  
> -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> +  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_int, src_va_int, index)\n");
> -    for(i = 0; i < 4; i++)
> +    for (i = 0; i < 4; i++)
>        printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
>  	     i, vresult_int[i], i, expected_vresult_int[i]);
>  #else
> @@ -100,10 +103,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
>  
> -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> +  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_float, src_va_float, index)\n");
> -    for(i = 0; i < 4; i++)
> +    for (i = 0; i < 4; i++)
>        printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
>  	     i, vresult_float[i], i, expected_vresult_float[i]);
>  #else
> @@ -122,7 +125,7 @@ main (int argc, char *argv [])
>    if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint, index)\n");
> -    for(i = 0; i < 2; i++)
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
>  	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
>  #else
> @@ -137,10 +140,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
>  
> -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> +  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint, index)\n");
> -    for(i = 0; i < 2; i++)
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
>  	     i, vresult_llint[i], i, expected_vresult_llint[i]);
>  #else
> @@ -155,10 +158,10 @@ main (int argc, char *argv [])
>  						 
>    vresult_double = vec_replace_elt (src_va_double, src_a_double, 1);
>  
> -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
> +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
>  #if DEBUG
>      printf("ERROR, vec_replace_elt (src_vb_double, src_va_double, index)\n");
> -    for(i = 0; i < 2; i++)
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
>  	     i, vresult_double[i], i, expected_vresult_double[i]);
>  #else
> @@ -169,60 +172,68 @@ main (int argc, char *argv [])
>  
>    /* Vector replace 32-bit element, unaligned */
>    src_a_uint = 345;
> -  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
> -  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
> +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>    /* Byte index 7 will overwrite part of elements 2 and 3 */
> -  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> +			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint, 3);
> -  vresult_uint = (vector unsigned int) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint, 3);
>  
> -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_uint, src_va_uint, index)\n");
> -    for(i = 0; i < 4; i++)
> -      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
> -	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_uint, index)\n");
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d] = 0%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>  
>    src_a_int = 234;
> -  src_va_int = (vector int) { 1, 0, 3, 4 };
> -  vresult_int = (vector int) { 0, 0, 0, 0 };
> +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
> +					  3, 0, 0, 0, 4, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>    /* Byte index 7 will over write part of elements 1 and 2 */
> -  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
> -						 
> -  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int, 7);
> -  vresult_int = (vector signed int) vresult_uchar;
> +  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0xea, 0, 0,
> +						    0, 0, 0, 0, 4, 0, 0, 0 };
>  
> -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int, 7);
> +
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int, index)\n");
> -    for(i = 0; i < 4; i++)
> -      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
> -	     i, vresult_int[i], i, expected_vresult_int[i]);
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_int, index)\n");
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>  
>    src_a_float = 34.0;
> -  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
> -  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
> -  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 };
> +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41, 0xa0,
> +					  5, 6, 7, 8, 0x41, 0xf0, 0, 0};
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
> +			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_float, src_a_float, 8);
> -  vresult_float = (vector float) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_float, 8);
>  
> -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_float, src_va_float, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_float, "
>  	   "index)\n");
> -    for(i = 0; i < 4; i++)
> -      printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
> -	     i, vresult_float[i], i, expected_vresult_float[i]);
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
> @@ -230,61 +241,67 @@ main (int argc, char *argv [])
>  
>    /* Vector replace 64-bit element, unaligned  */
>    src_a_ullint = 456;
> -  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
> -  vresult_ullint = (vector unsigned long long int) { 0, 0 };
> -  expected_vresult_ullint = (vector unsigned long long int) { 456*256,
> -							      0x200 };
> +  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0, 0,
> +					  0x22, 0x2, 0, 0, 0, 0, 0, 0 };
> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0xc8, 1, 0,
> +			       0, 2, 0, 0, 0, 0, 0, 0 };
>  						 
>    /* Byte index 7 will over write least significant byte of  element 0  */
> -  vresult_uchar = vec_replace_unaligned (src_va_ullint, src_a_ullint, 7);
> -  vresult_ullint = (vector unsigned long long) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_ullint, 7);
>  
> -  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_ullint, src_va_ullint, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_ullint, "
>  	   "index)\n");
> -    for(i = 0; i < 2; i++)
> -      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
> -	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> +    for (i = 0; i < 15; i++)
> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>  
>    src_a_llint = 678;
> -  src_va_llint = (vector long long int) { 0, 0x101 };
> +  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0, 0, 0,
> +					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
>    vresult_llint = (vector long long int) { 0, 0 };
>    /* Byte index 7 will over write least significant byte of  element 0  */
> -  expected_vresult_llint = (vector long long int) { 678*256, 0x100 };
> +  expected_vresult_uchar
> +    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0xa6, 0x2, 0x0,
> +			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_llint, src_a_llint, 7);
> -  vresult_llint = (vector signed long long) vresult_uchar;
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_llint, 7);
>  
> -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_llint, src_va_llint, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_llint, "
>  	   "index)\n");
> -    for(i = 0; i < 2; i++)
> -      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
> -	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> +    for (i = 0; i < 16; i++)
> +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>  #else
>      abort();
>  #endif
>    }
>    
>    src_a_double = 678.0;
> -  src_va_double = (vector double) { 0.0, 50.0 };
> +  //  src_va_double = (vector double) { 0.0, 50.0 };
> +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					  0, 0, 0, 0, 0, 0, 0, 0 };
>    vresult_double = (vector double) { 0.0, 0.0 };
>    expected_vresult_double = (vector double) { 0.0, 678.0 };
>  						 
> -  vresult_uchar = vec_replace_unaligned (src_va_double, src_a_double, 0);
> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_double, 0);
>    vresult_double = (vector double) vresult_uchar;
>  
> -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
> +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
>  #if DEBUG
> -    printf("ERROR, vec_replace_unaligned (src_vb_double, src_va_double, "
> +    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_double, "
>  	   "index)\n");
> -    for(i = 0; i < 2; i++)
> +    for (i = 0; i < 2; i++)
>        printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
>  	     i, vresult_double[i], i, expected_vresult_double[i]);
>  #else
> @@ -295,7 +312,5 @@ main (int argc, char *argv [])
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
> -/* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
> -
> -
> +/* { dg-final { scan-assembler-times {\mvinsw\M} 8 } } */
> +/* { dg-final { scan-assembler-times {\mvinsd\M} 4 } } */


Why do you have this change?  As changelog, it just updates some types as the
prototypes change and formats, it's not clear to me why it can cause the
expected insn count to change.

BR,
Kewen
  
Carl Love June 15, 2023, 4 p.m. UTC | #2
On Tue, 2023-06-13 at 11:24 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:41, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and past error.  The documentation was fixed
> > in:
> > 
> > 
> >    commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
> >    Author: Bill Schmidt <wschmidt@linux.ibm.com>
> >    Date:   Fri Feb 4 13:26:44 2022 -0600
> > 
> >        rs6000: Clean up ISA 3.1 documentation [PR100808]
> > 
> >        Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >        implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >        intended that vec_replace_unaligned always specify output
> > vectors as having
> >        type vector unsigned char, to emphasize that elements are
> > potentially
> >        misaligned by this built-in function.  This patch corrects
> > the
> >        misimplementation.
> > 
> >        2022-02-04  Bill Schmidt  <wschmidt@linux.ibm.com>
> > 
> >        gcc/
> >                PR target/100808
> >                * doc/extend.texi (Basic PowerPC Built-in Functions
> > Available on ISA
> >                3.1): Provide consistent type names.  Remove
> > unnecessary semicolons.
> >                Fix bad line breaks.
> > 
> 
> Wrong referred commit, should be
> ed3fea09b18f67e757b5768b42cb6e816626f1db.
> The above commit used the wrong commit log.

Fixed the commit reference as noted.

> 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >                  Carl 
> > 
> > --------------------------------------------------
> > rs6000, fix vec_replace_unaligned builtin arguments
> > 
> > The first argument of the vec_replace_unaligned builtin should
> > always be
> > unsinged char, as specified in gcc/doc/extend.texi.
> 
> s/unsinged/unsigned/

Fixed.

> 
> > This patch fixes the buitin definitions and updates the testcases
> > to use
> 
> s/buitin/builtin/

Fixed.

> 
> > the correct arguments.
> > 
> > gcc/ChangeLog:
> > 	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> > 	Fix first argument type.
> > 
> > gcc/testsuite/ChangeLog:
> > 	* gcc.target/powerpc/ver-replace-word-runnable.c
> > 	(vec_replace_unaligned) Fix first argument type.
> > 	(vresult_uchar): Fix expected	results.
> 
> Nit: unexpected tab.

Fixed.

> 
> > 	(vec_replace_unaligned): Update for loop to check uchar
> > results.
> > 	Remove extra spaces in if statements.
> > 	Insert missing spaces in for statements.
> > 	(dg-final): Update expected instruction counts.
> > ---
> >  gcc/config/rs6000/rs6000-overload.def         |  12 +-
> >  .../powerpc/vec-replace-word-runnable.c       | 157 ++++++++++--
> > ------
> >  2 files changed, 92 insertions(+), 77 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..26dc662b8fb 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -3059,17 +3059,17 @@
> >      VREPLACE_ELT_V2DF
> >  
> >  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> > -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
> >      VREPLACE_UN_UV4SI
> > -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
> >      VREPLACE_UN_V4SI
> > -  vuc __builtin_vec_replace_un (vull, unsigned long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const
> > int);
> >      VREPLACE_UN_UV2DI
> > -  vuc __builtin_vec_replace_un (vsll, signed long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
> >      VREPLACE_UN_V2DI
> > -  vuc __builtin_vec_replace_un (vf, float, const int);
> > +  vuc __builtin_vec_replace_un (vuc, float, const int);
> >      VREPLACE_UN_V4SF
> > -  vuc __builtin_vec_replace_un (vd, double, const int);
> > +  vuc __builtin_vec_replace_un (vuc, double, const int);
> >      VREPLACE_UN_V2DF
> 
> Looks good, since the given element can be replaced without aligned,
> the given vector type don't need to match the given element, with
> the potential implication that it can be misaligned.
> 
> >  
> >  [VEC_REVB, vec_revb, __builtin_vec_revb]
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c
> > index 27318822871..66b0ef58996 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> > @@ -20,6 +20,9 @@ main (int argc, char *argv [])
> >    unsigned char ch;
> >    unsigned int index;
> >  
> > +  vector unsigned char src_va_uchar;
> > +  vector unsigned char expected_vresult_uchar;
> > +
> >    vector unsigned int vresult_uint;
> >    vector unsigned int expected_vresult_uint;
> >    vector unsigned int src_va_uint;
> > @@ -64,10 +67,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
> >  
> > -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> > +  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > +    for (i = 0; i < 4; i++)
> >        printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
> > %d\n",
> >  	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> >  #else
> > @@ -82,10 +85,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
> >  
> > -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> > +  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_int, src_va_int,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > +    for (i = 0; i < 4; i++)
> >        printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
> > %d\n",
> >  	     i, vresult_int[i], i, expected_vresult_int[i]);
> >  #else
> > @@ -100,10 +103,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
> >  
> > -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> > +  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_float, src_va_float,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > +    for (i = 0; i < 4; i++)
> >        printf(" vresult_float[%d] = %f, expected_vresult_float[%d]
> > = %f\n",
> >  	     i, vresult_float[i], i, expected_vresult_float[i]);
> >  #else
> > @@ -122,7 +125,7 @@ main (int argc, char *argv [])
> >    if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint,
> > index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_ullint[%d] = %d,
> > expected_vresult_ullint[%d] = %d\n",
> >  	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> >  #else
> > @@ -137,10 +140,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
> >  
> > -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> > +  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint,
> > index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d]
> > = %d\n",
> >  	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> >  #else
> > @@ -155,10 +158,10 @@ main (int argc, char *argv [])
> >  						 
> >    vresult_double = vec_replace_elt (src_va_double, src_a_double,
> > 1);
> >  
> > -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
> > +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
> >  #if DEBUG
> >      printf("ERROR, vec_replace_elt (src_vb_double, src_va_double,
> > index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_double[%d] = %f,
> > expected_vresult_double[%d] = %f\n",
> >  	     i, vresult_double[i], i, expected_vresult_double[i]);
> >  #else
> > @@ -169,60 +172,68 @@ main (int argc, char *argv [])
> >  
> >    /* Vector replace 32-bit element, unaligned */
> >    src_a_uint = 345;
> > -  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
> > -  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
> > +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> > +					  0, 0, 0, 0, 0, 0, 0, 0 };
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> >    /* Byte index 7 will overwrite part of elements 2 and 3 */
> > -  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0
> > };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
> > +			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint,
> > 3);
> > -  vresult_uint = (vector unsigned int) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint,
> > 3);
> >  
> > -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_uint,
> > src_va_uint, index)\n");
> > -    for(i = 0; i < 4; i++)
> > -      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
> > %d\n",
> > -	     i, vresult_uint[i], i, expected_vresult_uint[i]);
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_va_uint, index)\n");
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d]
> > = 0%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >  
> >    src_a_int = 234;
> > -  src_va_int = (vector int) { 1, 0, 3, 4 };
> > -  vresult_int = (vector int) { 0, 0, 0, 0 };
> > +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
> > +					  3, 0, 0, 0, 4, 0, 0, 0 };
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> >    /* Byte index 7 will over write part of elements 1 and 2 */
> > -  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
> > -						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int,
> > 7);
> > -  vresult_int = (vector signed int) vresult_uchar;
> > +  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0,
> > 0xea, 0, 0,
> > +						    0, 0, 0, 0, 4, 0,
> > 0, 0 };
> >  
> > -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int,
> > 7);
> > +
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int,
> > index)\n");
> > -    for(i = 0; i < 4; i++)
> > -      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
> > %d\n",
> > -	     i, vresult_int[i], i, expected_vresult_int[i]);
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_va_int, index)\n");
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] =
> > 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >  
> >    src_a_float = 34.0;
> > -  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
> > -  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
> > -  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0
> > };
> > +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41,
> > 0xa0,
> > +					  5, 6, 7, 8, 0x41, 0xf0, 0,
> > 0};
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
> > +			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_float,
> > src_a_float, 8);
> > -  vresult_float = (vector float) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_float, 8);
> >  
> > -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_float,
> > src_va_float, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_va_float, "
> >  	   "index)\n");
> > -    for(i = 0; i < 4; i++)
> > -      printf(" vresult_float[%d] = %f, expected_vresult_float[%d]
> > = %f\n",
> > -	     i, vresult_float[i], i, expected_vresult_float[i]);
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_uchar[%d] = 0x%x,
> > expected_vresult_uchar[%d] = 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> > @@ -230,61 +241,67 @@ main (int argc, char *argv [])
> >  
> >    /* Vector replace 64-bit element, unaligned  */
> >    src_a_ullint = 456;
> > -  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
> > -  vresult_ullint = (vector unsigned long long int) { 0, 0 };
> > -  expected_vresult_ullint = (vector unsigned long long int) {
> > 456*256,
> > -							      0x200 };
> > +  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0,
> > 0,
> > +					  0x22, 0x2, 0, 0, 0, 0, 0, 0
> > };
> > +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					   0, 0, 0, 0, 0, 0, 0, 0 };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0xc8, 1, 0,
> > +			       0, 2, 0, 0, 0, 0, 0, 0 };
> >  						 
> >    /* Byte index 7 will over write least significant byte
> > of  element 0  */
> > -  vresult_uchar = vec_replace_unaligned (src_va_ullint,
> > src_a_ullint, 7);
> > -  vresult_ullint = (vector unsigned long long) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_ullint, 7);
> >  
> > -  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_ullint,
> > src_va_ullint, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_va_ullint, "
> >  	   "index)\n");
> > -    for(i = 0; i < 2; i++)
> > -      printf(" vresult_ullint[%d] = %d,
> > expected_vresult_ullint[%d] = %d\n",
> > -	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
> > +    for (i = 0; i < 15; i++)
> > +      printf(" vresult_uchar[%d] = 0x%x,
> > expected_vresult_uchar[%d] = 0x%x\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >  
> >    src_a_llint = 678;
> > -  src_va_llint = (vector long long int) { 0, 0x101 };
> > +  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0,
> > 0, 0,
> > +					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
> >    vresult_llint = (vector long long int) { 0, 0 };
> >    /* Byte index 7 will over write least significant byte
> > of  element 0  */
> > -  expected_vresult_llint = (vector long long int) { 678*256, 0x100
> > };
> > +  expected_vresult_uchar
> > +    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0xa6,
> > 0x2, 0x0,
> > +			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0
> > };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_llint,
> > src_a_llint, 7);
> > -  vresult_llint = (vector signed long long) vresult_uchar;
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_llint, 7);
> >  
> > -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
> > +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_llint,
> > src_va_llint, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_va_llint, "
> >  	   "index)\n");
> > -    for(i = 0; i < 2; i++)
> > -      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d]
> > = %d\n",
> > -	     i, vresult_llint[i], i, expected_vresult_llint[i]);
> > +    for (i = 0; i < 16; i++)
> > +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d]
> > = %d\n",
> > +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
> >  #else
> >      abort();
> >  #endif
> >    }
> >    
> >    src_a_double = 678.0;
> > -  src_va_double = (vector double) { 0.0, 50.0 };
> > +  //  src_va_double = (vector double) { 0.0, 50.0 };
> > +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
> > +					  0, 0, 0, 0, 0, 0, 0, 0 };
> >    vresult_double = (vector double) { 0.0, 0.0 };
> >    expected_vresult_double = (vector double) { 0.0, 678.0 };
> >  						 
> > -  vresult_uchar = vec_replace_unaligned (src_va_double,
> > src_a_double, 0);
> > +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
> > src_a_double, 0);
> >    vresult_double = (vector double) vresult_uchar;
> >  
> > -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
> > +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
> >  #if DEBUG
> > -    printf("ERROR, vec_replace_unaligned (src_vb_double,
> > src_va_double, "
> > +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
> > src_va_double, "
> >  	   "index)\n");
> > -    for(i = 0; i < 2; i++)
> > +    for (i = 0; i < 2; i++)
> >        printf(" vresult_double[%d] = %f,
> > expected_vresult_double[%d] = %f\n",
> >  	     i, vresult_double[i], i, expected_vresult_double[i]);
> >  #else
> > @@ -295,7 +312,5 @@ main (int argc, char *argv [])
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
> > -/* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
> > -
> > -
> > +/* { dg-final { scan-assembler-times {\mvinsw\M} 8 } } */
> > +/* { dg-final { scan-assembler-times {\mvinsd\M} 4 } } */
> 
> Why do you have this change?  As changelog, it just updates some
> types as the
> prototypes change and formats, it's not clear to me why it can cause
> the
> expected insn count to change.

The original builtin call 

  vector unsigned char vec_replace_unligned (vector unsigned long long int, ...) 

generated the vinsd instruction for the two calls with the first
argument of unsigned long long int.  When the first argument of the
builtin is changed to the correct type, vector unsigned char the
builtin generates the vinsw instruction instead.  The change occurs in
two places resulting in reducing the counts for vinsd by two and
increasing the counts for vinsw by two.  The other calls to the builtin
are either vector ints or vector floats which generate the vinsw
instruction.  Changing the first argument in those calls to vector
unsigned char still generate the vinsw instruction.

                           Carl
  
Kewen.Lin June 19, 2023, 3:50 a.m. UTC | #3
Hi Carl,

on 2023/6/16 00:00, Carl Love wrote:
> On Tue, 2023-06-13 at 11:24 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/31 04:41, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> The following patch fixes the first argument in the builtin
>>> definition
>>> and the corresponding test cases.  Initially, the builtin
>>> specification
>>> was wrong due to a cut and past error.  The documentation was fixed
>>> in:
>>>
>>>
>>>    commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
>>>    Author: Bill Schmidt <wschmidt@linux.ibm.com>
>>>    Date:   Fri Feb 4 13:26:44 2022 -0600
>>>
>>>        rs6000: Clean up ISA 3.1 documentation [PR100808]
>>>
>>>        Due to a pasto error in the documentation,
>>> vec_replace_unaligned was
>>>        implemented with the same function prototypes as
>>> vec_replace_elt.  It was
>>>        intended that vec_replace_unaligned always specify output
>>> vectors as having
>>>        type vector unsigned char, to emphasize that elements are
>>> potentially
>>>        misaligned by this built-in function.  This patch corrects
>>> the
>>>        misimplementation.
>>>
>>>        2022-02-04  Bill Schmidt  <wschmidt@linux.ibm.com>
>>>
>>>        gcc/
>>>                PR target/100808
>>>                * doc/extend.texi (Basic PowerPC Built-in Functions
>>> Available on ISA
>>>                3.1): Provide consistent type names.  Remove
>>> unnecessary semicolons.
>>>                Fix bad line breaks.
>>>
>>
>> Wrong referred commit, should be
>> ed3fea09b18f67e757b5768b42cb6e816626f1db.
>> The above commit used the wrong commit log.
> 
> Fixed the commit reference as noted.
> 
>>
>>> This patch fixes the arguments in the definitions and updates the
>>> testcases accordingly.  Additionally, a few minor spacing issues
>>> are
>>> fixed.
>>>
>>> The patch has been tested on Power 10 with no regressions.  Please
>>> let
>>> me know if the patch is acceptable for mainline.  Thanks.
>>>
>>>                  Carl 
>>>
>>> --------------------------------------------------
>>> rs6000, fix vec_replace_unaligned builtin arguments
>>>
>>> The first argument of the vec_replace_unaligned builtin should
>>> always be
>>> unsinged char, as specified in gcc/doc/extend.texi.
>>
>> s/unsinged/unsigned/
> 
> Fixed.
> 
>>
>>> This patch fixes the buitin definitions and updates the testcases
>>> to use
>>
>> s/buitin/builtin/
> 
> Fixed.
> 
>>
>>> the correct arguments.
>>>
>>> gcc/ChangeLog:
>>> 	* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
>>> 	Fix first argument type.
>>>
>>> gcc/testsuite/ChangeLog:
>>> 	* gcc.target/powerpc/ver-replace-word-runnable.c
>>> 	(vec_replace_unaligned) Fix first argument type.
>>> 	(vresult_uchar): Fix expected	results.
>>
>> Nit: unexpected tab.
> 
> Fixed.
> 
>>
>>> 	(vec_replace_unaligned): Update for loop to check uchar
>>> results.
>>> 	Remove extra spaces in if statements.
>>> 	Insert missing spaces in for statements.
>>> 	(dg-final): Update expected instruction counts.
>>> ---
>>>  gcc/config/rs6000/rs6000-overload.def         |  12 +-
>>>  .../powerpc/vec-replace-word-runnable.c       | 157 ++++++++++--
>>> ------
>>>  2 files changed, 92 insertions(+), 77 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-overload.def
>>> b/gcc/config/rs6000/rs6000-overload.def
>>> index c582490c084..26dc662b8fb 100644
>>> --- a/gcc/config/rs6000/rs6000-overload.def
>>> +++ b/gcc/config/rs6000/rs6000-overload.def
>>> @@ -3059,17 +3059,17 @@
>>>      VREPLACE_ELT_V2DF
>>>  
>>>  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
>>> -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
>>>      VREPLACE_UN_UV4SI
>>> -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
>>>      VREPLACE_UN_V4SI
>>> -  vuc __builtin_vec_replace_un (vull, unsigned long long, const
>>> int);
>>> +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const
>>> int);
>>>      VREPLACE_UN_UV2DI
>>> -  vuc __builtin_vec_replace_un (vsll, signed long long, const
>>> int);
>>> +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
>>>      VREPLACE_UN_V2DI
>>> -  vuc __builtin_vec_replace_un (vf, float, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, float, const int);
>>>      VREPLACE_UN_V4SF
>>> -  vuc __builtin_vec_replace_un (vd, double, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, double, const int);
>>>      VREPLACE_UN_V2DF
>>
>> Looks good, since the given element can be replaced without aligned,
>> the given vector type don't need to match the given element, with
>> the potential implication that it can be misaligned.
>>
>>>  
>>>  [VEC_REVB, vec_revb, __builtin_vec_revb]
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
>>> runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
>>> runnable.c
>>> index 27318822871..66b0ef58996 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
>>> @@ -20,6 +20,9 @@ main (int argc, char *argv [])
>>>    unsigned char ch;
>>>    unsigned int index;
>>>  
>>> +  vector unsigned char src_va_uchar;
>>> +  vector unsigned char expected_vresult_uchar;
>>> +
>>>    vector unsigned int vresult_uint;
>>>    vector unsigned int expected_vresult_uint;
>>>    vector unsigned int src_va_uint;
>>> @@ -64,10 +67,10 @@ main (int argc, char *argv [])
>>>  						 
>>>    vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
>>>  
>>> -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
>>> +  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
>>>  #if DEBUG
>>>      printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint,
>>> index)\n");
>>> -    for(i = 0; i < 4; i++)
>>> +    for (i = 0; i < 4; i++)
>>>        printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
>>> %d\n",
>>>  	     i, vresult_uint[i], i, expected_vresult_uint[i]);
>>>  #else
>>> @@ -82,10 +85,10 @@ main (int argc, char *argv [])
>>>  						 
>>>    vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
>>>  
>>> -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
>>> +  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
>>>  #if DEBUG
>>>      printf("ERROR, vec_replace_elt (src_vb_int, src_va_int,
>>> index)\n");
>>> -    for(i = 0; i < 4; i++)
>>> +    for (i = 0; i < 4; i++)
>>>        printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
>>> %d\n",
>>>  	     i, vresult_int[i], i, expected_vresult_int[i]);
>>>  #else
>>> @@ -100,10 +103,10 @@ main (int argc, char *argv [])
>>>  						 
>>>    vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
>>>  
>>> -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
>>> +  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
>>>  #if DEBUG
>>>      printf("ERROR, vec_replace_elt (src_vb_float, src_va_float,
>>> index)\n");
>>> -    for(i = 0; i < 4; i++)
>>> +    for (i = 0; i < 4; i++)
>>>        printf(" vresult_float[%d] = %f, expected_vresult_float[%d]
>>> = %f\n",
>>>  	     i, vresult_float[i], i, expected_vresult_float[i]);
>>>  #else
>>> @@ -122,7 +125,7 @@ main (int argc, char *argv [])
>>>    if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
>>>  #if DEBUG
>>>      printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint,
>>> index)\n");
>>> -    for(i = 0; i < 2; i++)
>>> +    for (i = 0; i < 2; i++)
>>>        printf(" vresult_ullint[%d] = %d,
>>> expected_vresult_ullint[%d] = %d\n",
>>>  	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
>>>  #else
>>> @@ -137,10 +140,10 @@ main (int argc, char *argv [])
>>>  						 
>>>    vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
>>>  
>>> -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
>>> +  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
>>>  #if DEBUG
>>>      printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint,
>>> index)\n");
>>> -    for(i = 0; i < 2; i++)
>>> +    for (i = 0; i < 2; i++)
>>>        printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d]
>>> = %d\n",
>>>  	     i, vresult_llint[i], i, expected_vresult_llint[i]);
>>>  #else
>>> @@ -155,10 +158,10 @@ main (int argc, char *argv [])
>>>  						 
>>>    vresult_double = vec_replace_elt (src_va_double, src_a_double,
>>> 1);
>>>  
>>> -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
>>> +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
>>>  #if DEBUG
>>>      printf("ERROR, vec_replace_elt (src_vb_double, src_va_double,
>>> index)\n");
>>> -    for(i = 0; i < 2; i++)
>>> +    for (i = 0; i < 2; i++)
>>>        printf(" vresult_double[%d] = %f,
>>> expected_vresult_double[%d] = %f\n",
>>>  	     i, vresult_double[i], i, expected_vresult_double[i]);
>>>  #else
>>> @@ -169,60 +172,68 @@ main (int argc, char *argv [])
>>>  
>>>    /* Vector replace 32-bit element, unaligned */
>>>    src_a_uint = 345;
>>> -  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
>>> -  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
>>> +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
>>> +					  0, 0, 0, 0, 0, 0, 0, 0 };
>>> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
>>> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>>>    /* Byte index 7 will overwrite part of elements 2 and 3 */
>>> -  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0
>>> };
>>> +  expected_vresult_uchar
>>> +    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
>>> +			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
>>>  						 
>>> -  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint,
>>> 3);
>>> -  vresult_uint = (vector unsigned int) vresult_uchar;
>>> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint,
>>> 3);
>>>  
>>> -  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
>>> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>>>  #if DEBUG
>>> -    printf("ERROR, vec_replace_unaligned (src_vb_uint,
>>> src_va_uint, index)\n");
>>> -    for(i = 0; i < 4; i++)
>>> -      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] =
>>> %d\n",
>>> -	     i, vresult_uint[i], i, expected_vresult_uint[i]);
>>> +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
>>> src_va_uint, index)\n");
>>> +    for (i = 0; i < 16; i++)
>>> +      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d]
>>> = 0%x\n",
>>> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>>>  #else
>>>      abort();
>>>  #endif
>>>    }
>>>  
>>>    src_a_int = 234;
>>> -  src_va_int = (vector int) { 1, 0, 3, 4 };
>>> -  vresult_int = (vector int) { 0, 0, 0, 0 };
>>> +  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
>>> +					  3, 0, 0, 0, 4, 0, 0, 0 };
>>> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
>>> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>>>    /* Byte index 7 will over write part of elements 1 and 2 */
>>> -  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
>>> -						 
>>> -  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int,
>>> 7);
>>> -  vresult_int = (vector signed int) vresult_uchar;
>>> +  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0,
>>> 0xea, 0, 0,
>>> +						    0, 0, 0, 0, 4, 0,
>>> 0, 0 };
>>>  
>>> -  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
>>> +  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int,
>>> 7);
>>> +
>>> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>>>  #if DEBUG
>>> -    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int,
>>> index)\n");
>>> -    for(i = 0; i < 4; i++)
>>> -      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] =
>>> %d\n",
>>> -	     i, vresult_int[i], i, expected_vresult_int[i]);
>>> +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
>>> src_va_int, index)\n");
>>> +    for (i = 0; i < 16; i++)
>>> +      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] =
>>> 0x%x\n",
>>> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>>>  #else
>>>      abort();
>>>  #endif
>>>    }
>>>  
>>>    src_a_float = 34.0;
>>> -  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
>>> -  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
>>> -  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0
>>> };
>>> +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41,
>>> 0xa0,
>>> +					  5, 6, 7, 8, 0x41, 0xf0, 0,
>>> 0};
>>> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
>>> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>>> +  expected_vresult_uchar
>>> +    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
>>> +			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
>>>  						 
>>> -  vresult_uchar = vec_replace_unaligned (src_va_float,
>>> src_a_float, 8);
>>> -  vresult_float = (vector float) vresult_uchar;
>>> +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
>>> src_a_float, 8);
>>>  
>>> -  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
>>> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>>>  #if DEBUG
>>> -    printf("ERROR, vec_replace_unaligned (src_vb_float,
>>> src_va_float, "
>>> +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
>>> src_va_float, "
>>>  	   "index)\n");
>>> -    for(i = 0; i < 4; i++)
>>> -      printf(" vresult_float[%d] = %f, expected_vresult_float[%d]
>>> = %f\n",
>>> -	     i, vresult_float[i], i, expected_vresult_float[i]);
>>> +    for (i = 0; i < 16; i++)
>>> +      printf(" vresult_uchar[%d] = 0x%x,
>>> expected_vresult_uchar[%d] = 0x%x\n",
>>> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>>>  #else
>>>      abort();
>>>  #endif
>>> @@ -230,61 +241,67 @@ main (int argc, char *argv [])
>>>  
>>>    /* Vector replace 64-bit element, unaligned  */
>>>    src_a_ullint = 456;
>>> -  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
>>> -  vresult_ullint = (vector unsigned long long int) { 0, 0 };
>>> -  expected_vresult_ullint = (vector unsigned long long int) {
>>> 456*256,
>>> -							      0x200 };
>>> +  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0,
>>> 0,
>>> +					  0x22, 0x2, 0, 0, 0, 0, 0, 0
>>> };
>>> +  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
>>> +					   0, 0, 0, 0, 0, 0, 0, 0 };
>>> +  expected_vresult_uchar
>>> +    = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0xc8, 1, 0,
>>> +			       0, 2, 0, 0, 0, 0, 0, 0 };
>>>  						 
>>>    /* Byte index 7 will over write least significant byte
>>> of  element 0  */
>>> -  vresult_uchar = vec_replace_unaligned (src_va_ullint,
>>> src_a_ullint, 7);
>>> -  vresult_ullint = (vector unsigned long long) vresult_uchar;
>>> +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
>>> src_a_ullint, 7);
>>>  
>>> -  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
>>> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>>>  #if DEBUG
>>> -    printf("ERROR, vec_replace_unaligned (src_vb_ullint,
>>> src_va_ullint, "
>>> +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
>>> src_va_ullint, "
>>>  	   "index)\n");
>>> -    for(i = 0; i < 2; i++)
>>> -      printf(" vresult_ullint[%d] = %d,
>>> expected_vresult_ullint[%d] = %d\n",
>>> -	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
>>> +    for (i = 0; i < 15; i++)
>>> +      printf(" vresult_uchar[%d] = 0x%x,
>>> expected_vresult_uchar[%d] = 0x%x\n",
>>> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>>>  #else
>>>      abort();
>>>  #endif
>>>    }
>>>  
>>>    src_a_llint = 678;
>>> -  src_va_llint = (vector long long int) { 0, 0x101 };
>>> +  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0,
>>> 0, 0,
>>> +					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
>>>    vresult_llint = (vector long long int) { 0, 0 };
>>>    /* Byte index 7 will over write least significant byte
>>> of  element 0  */
>>> -  expected_vresult_llint = (vector long long int) { 678*256, 0x100
>>> };
>>> +  expected_vresult_uchar
>>> +    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0xa6,
>>> 0x2, 0x0,
>>> +			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0
>>> };
>>>  						 
>>> -  vresult_uchar = vec_replace_unaligned (src_va_llint,
>>> src_a_llint, 7);
>>> -  vresult_llint = (vector signed long long) vresult_uchar;
>>> +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
>>> src_a_llint, 7);
>>>  
>>> -  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
>>> +  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
>>>  #if DEBUG
>>> -    printf("ERROR, vec_replace_unaligned (src_vb_llint,
>>> src_va_llint, "
>>> +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
>>> src_va_llint, "
>>>  	   "index)\n");
>>> -    for(i = 0; i < 2; i++)
>>> -      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d]
>>> = %d\n",
>>> -	     i, vresult_llint[i], i, expected_vresult_llint[i]);
>>> +    for (i = 0; i < 16; i++)
>>> +      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d]
>>> = %d\n",
>>> +	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
>>>  #else
>>>      abort();
>>>  #endif
>>>    }
>>>    
>>>    src_a_double = 678.0;
>>> -  src_va_double = (vector double) { 0.0, 50.0 };
>>> +  //  src_va_double = (vector double) { 0.0, 50.0 };
>>> +  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
>>> +					  0, 0, 0, 0, 0, 0, 0, 0 };
>>>    vresult_double = (vector double) { 0.0, 0.0 };
>>>    expected_vresult_double = (vector double) { 0.0, 678.0 };
>>>  						 
>>> -  vresult_uchar = vec_replace_unaligned (src_va_double,
>>> src_a_double, 0);
>>> +  vresult_uchar = vec_replace_unaligned (src_va_uchar,
>>> src_a_double, 0);
>>>    vresult_double = (vector double) vresult_uchar;
>>>  
>>> -  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
>>> +  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
>>>  #if DEBUG
>>> -    printf("ERROR, vec_replace_unaligned (src_vb_double,
>>> src_va_double, "
>>> +    printf("ERROR, vec_replace_unaligned (src_va_uchar,
>>> src_va_double, "
>>>  	   "index)\n");
>>> -    for(i = 0; i < 2; i++)
>>> +    for (i = 0; i < 2; i++)
>>>        printf(" vresult_double[%d] = %f,
>>> expected_vresult_double[%d] = %f\n",
>>>  	     i, vresult_double[i], i, expected_vresult_double[i]);
>>>  #else
>>> @@ -295,7 +312,5 @@ main (int argc, char *argv [])
>>>    return 0;
>>>  }
>>>  
>>> -/* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
>>> -/* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
>>> -
>>> -
>>> +/* { dg-final { scan-assembler-times {\mvinsw\M} 8 } } */
>>> +/* { dg-final { scan-assembler-times {\mvinsd\M} 4 } } */
>>
>> Why do you have this change?  As changelog, it just updates some
>> types as the
>> prototypes change and formats, it's not clear to me why it can cause
>> the
>> expected insn count to change.
> 
> The original builtin call 
> 
>   vector unsigned char vec_replace_unligned (vector unsigned long long int, ...) 
> 
> generated the vinsd instruction for the two calls with the first
> argument of unsigned long long int.  When the first argument of the
> builtin is changed to the correct type, vector unsigned char the
> builtin generates the vinsw instruction instead.  The change occurs in
> two places resulting in reducing the counts for vinsd by two and
> increasing the counts for vinsw by two.  The other calls to the builtin
> are either vector ints or vector floats which generate the vinsw
> instruction.  Changing the first argument in those calls to vector
> unsigned char still generate the vinsw instruction.

But it did expose something odd and needed to be handled in this change.
I had a further check, for the below test case:

#include "altivec.h"

#ifdef ORIG
vector unsigned char foo (vector unsigned long long v){
  unsigned long long val = 678ull;
  return vec_replace_unaligned (v, val, 7);
}
#else
vector unsigned char foo (vector unsigned long long v){
  unsigned long long val = 678ull;
  return vec_replace_unaligned ((vector unsigned char)v, val, 7);
}
#endif

Without this patch (-DORIG required to match the previous prototype),
it would generate vinsd; while with this proposed patch, it would
generate vinsw.  I think it's unexpected since users can still have
the need to replace a doubleword size of chunk but give a constant
which can be represented by int.  The previous way can support it,
while the new way can't.  So we should have some way to distinguish
it, we have some special-casing in function
altivec_resolve_overloaded_builtin, could you have a check and try
there?  Thanks!

BR,
Kewen
  
Carl Love July 7, 2023, 8:18 p.m. UTC | #4
Kewen:

On Mon, 2023-06-19 at 11:50 +0800, Kewen.Lin wrote:
> > generated the vinsd instruction for the two calls with the first
> > argument of unsigned long long int.  When the first argument of the
> > builtin is changed to the correct type, vector unsigned char the
> > builtin generates the vinsw instruction instead.  The change occurs
> > in
> > two places resulting in reducing the counts for vinsd by two and
> > increasing the counts for vinsw by two.  The other calls to the
> > builtin
> > are either vector ints or vector floats which generate the vinsw
> > instruction.  Changing the first argument in those calls to vector
> > unsigned char still generate the vinsw instruction.
> 
> But it did expose something odd and needed to be handled in this
> change.
> I had a further check, for the below test case:
> 
> #include "altivec.h"
> 
> #ifdef ORIG
> vector unsigned char foo (vector unsigned long long v){
>   unsigned long long val = 678ull;
>   return vec_replace_unaligned (v, val, 7);
> }
> #else
> vector unsigned char foo (vector unsigned long long v){
>   unsigned long long val = 678ull;
>   return vec_replace_unaligned ((vector unsigned char)v, val, 7);
> }
> #endif
> 
> Without this patch (-DORIG required to match the previous prototype),
> it would generate vinsd; while with this proposed patch, it would
> generate vinsw.  I think it's unexpected since users can still have
> the need to replace a doubleword size of chunk but give a constant
> which can be represented by int.  The previous way can support it,
> while the new way can't.  So we should have some way to distinguish
> it, we have some special-casing in function
> altivec_resolve_overloaded_builtin, could you have a check and try
> there?  Thanks!

I added the needed handling in altivec_resolve_overloaded_builtin to
address the issue with the built-in generating the correct instruction
for the unsigned long long cases in the test file.  I added an
additional test file with the above test case.  It was put into a new
test file as it requires the -flax-vector-conversions argument.  I felt
that it was best to separate the tests that need/do not need the -flax-
vector-conversions argument.

Note, adding the additional case statement RS6000_OVLD_VEC_REPLACE_UN
to handle the three argument built-in vec_replace_unaligned in
altivec_resolve_overloaded_builtin exposed an issue with function
find_instance.  Function find_instance assumes there are only two
arguments in the builtin.  There are no checks on the actual number of
arguments used by the built-in. This leads to an error in
tree_operand_check_failed() when using find_builtin.  The find_buitin
function was extended to handle 2 or 3 arguments with a check to make
sure the number of arguments is either 2 or 3.

FYI, I also noticed in the current patch the names in rs6000-
builtins.def and rs6000-overload.def for builtin_altivec_vreplace_un
still reflect the type of the first argument.  The current patch
changes the first argument to vuc, but the naming didn't all get
updated.  I think the names should be changed to reflect the name of
the second argument since the first arguments are all identical.  For
example:
 
    -- a/gcc/config/rs6000/rs6000-builtins.def
    +++ b/gcc/config/rs6000/rs6000-builtins.def
    @@ -3388,29 +3388,29 @@
       const vull __builtin_altivec_vpextd (vull, vull);
         VPEXTD vpextd {}
     
   -  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
   -                                                 const int<4>);
   -    VREPLACE_UN_UV2DI vreplace_un_v2di {}
   +  const vuc __builtin_altivec_vreplace_un_udi (vuc, unsigned long long, \
   +                                               const int<4>);
   +    VREPLACE_UN_UDI vreplace_un_di {}
 
 The name changes will ripple thru files rs6000-builtins.def, rs6000-
 overload.def and vsx.md.

I did all the naming as well in the new version 3 of the patch.

                         Carl
  

Patch

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..26dc662b8fb 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@ 
     VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
     VREPLACE_UN_UV4SI
-  vuc __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vuc, signed int, const int);
     VREPLACE_UN_V4SI
-  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
     VREPLACE_UN_UV2DI
-  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
     VREPLACE_UN_V2DI
-  vuc __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vuc, float, const int);
     VREPLACE_UN_V4SF
-  vuc __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vuc, double, const int);
     VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
index 27318822871..66b0ef58996 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -20,6 +20,9 @@  main (int argc, char *argv [])
   unsigned char ch;
   unsigned int index;
 
+  vector unsigned char src_va_uchar;
+  vector unsigned char expected_vresult_uchar;
+
   vector unsigned int vresult_uint;
   vector unsigned int expected_vresult_uint;
   vector unsigned int src_va_uint;
@@ -64,10 +67,10 @@  main (int argc, char *argv [])
 						 
   vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n");
-    for(i = 0; i < 4; i++)
+    for (i = 0; i < 4; i++)
       printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
 	     i, vresult_uint[i], i, expected_vresult_uint[i]);
 #else
@@ -82,10 +85,10 @@  main (int argc, char *argv [])
 						 
   vresult_int = vec_replace_elt (src_va_int, src_a_int, 1);
 
-  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+  if (!vec_all_eq (vresult_int, expected_vresult_int)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_int, src_va_int, index)\n");
-    for(i = 0; i < 4; i++)
+    for (i = 0; i < 4; i++)
       printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
 	     i, vresult_int[i], i, expected_vresult_int[i]);
 #else
@@ -100,10 +103,10 @@  main (int argc, char *argv [])
 						 
   vresult_float = vec_replace_elt (src_va_float, src_a_float, 1);
 
-  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
+  if (!vec_all_eq (vresult_float, expected_vresult_float)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_float, src_va_float, index)\n");
-    for(i = 0; i < 4; i++)
+    for (i = 0; i < 4; i++)
       printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
 	     i, vresult_float[i], i, expected_vresult_float[i]);
 #else
@@ -122,7 +125,7 @@  main (int argc, char *argv [])
   if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_ullint, src_va_ullint, index)\n");
-    for(i = 0; i < 2; i++)
+    for (i = 0; i < 2; i++)
       printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
 	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
 #else
@@ -137,10 +140,10 @@  main (int argc, char *argv [])
 						 
   vresult_llint = vec_replace_elt (src_va_llint, src_a_llint, 1);
 
-  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+  if (!vec_all_eq (vresult_llint, expected_vresult_llint)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_llint, src_va_llint, index)\n");
-    for(i = 0; i < 2; i++)
+    for (i = 0; i < 2; i++)
       printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
 	     i, vresult_llint[i], i, expected_vresult_llint[i]);
 #else
@@ -155,10 +158,10 @@  main (int argc, char *argv [])
 						 
   vresult_double = vec_replace_elt (src_va_double, src_a_double, 1);
 
-  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
+  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
 #if DEBUG
     printf("ERROR, vec_replace_elt (src_vb_double, src_va_double, index)\n");
-    for(i = 0; i < 2; i++)
+    for (i = 0; i < 2; i++)
       printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
 	     i, vresult_double[i], i, expected_vresult_double[i]);
 #else
@@ -169,60 +172,68 @@  main (int argc, char *argv [])
 
   /* Vector replace 32-bit element, unaligned */
   src_a_uint = 345;
-  src_va_uint = (vector unsigned int) { 1, 2, 0, 0 };
-  vresult_uint = (vector unsigned int) { 0, 0, 0, 0 };
+  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
   /* Byte index 7 will overwrite part of elements 2 and 3 */
-  expected_vresult_uint = (vector unsigned int) { 1, 2, 345*256, 0 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 1, 0, 0, 0, 2, 0, 0, 0,
+			       0, 0x59, 0x1, 0, 0, 0, 0, 0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_uint, src_a_uint, 3);
-  vresult_uint = (vector unsigned int) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_uint, 3);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_uint, src_va_uint, index)\n");
-    for(i = 0; i < 4; i++)
-      printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
-	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_uint, index)\n");
+    for (i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uint[%d] = 0%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
 
   src_a_int = 234;
-  src_va_int = (vector int) { 1, 0, 3, 4 };
-  vresult_int = (vector int) { 0, 0, 0, 0 };
+  src_va_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0, 0, 0,
+					  3, 0, 0, 0, 4, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
   /* Byte index 7 will over write part of elements 1 and 2 */
-  expected_vresult_int = (vector int) { 1, 234*256, 0, 4 };
-						 
-  vresult_uchar = vec_replace_unaligned (src_va_int, src_a_int, 7);
-  vresult_int = (vector signed int) vresult_uchar;
+  expected_vresult_uchar = (vector unsigned char) { 1, 0, 0, 0, 0, 0xea, 0, 0,
+						    0, 0, 0, 0, 4, 0, 0, 0 };
 
-  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_int, 7);
+
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_int, src_va_int, index)\n");
-    for(i = 0; i < 4; i++)
-      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
-	     i, vresult_int[i], i, expected_vresult_int[i]);
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_int, index)\n");
+    for (i = 0; i < 16; i++)
+      printf(" vresult_int[%d] = 0x%x, expected_vresult_int[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
 
   src_a_float = 34.0;
-  src_va_float = (vector float) { 0.0, 10.0, 20.0, 30.0 };
-  vresult_float = (vector float) { 0.0, 0.0, 0.0, 0.0 };
-  expected_vresult_float = (vector float) { 0.0, 34.0, 20.0, 30.0 };
+  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0x41, 0xa0,
+					  5, 6, 7, 8, 0x41, 0xf0, 0, 0};
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 8, 0x42,
+			       5, 6, 7, 8, 0x41, 0xf0, 0, 0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_float, src_a_float, 8);
-  vresult_float = (vector float) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_float, 8);
 
-  if (!vec_all_eq (vresult_float,  expected_vresult_float)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_float, src_va_float, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_float, "
 	   "index)\n");
-    for(i = 0; i < 4; i++)
-      printf(" vresult_float[%d] = %f, expected_vresult_float[%d] = %f\n",
-	     i, vresult_float[i], i, expected_vresult_float[i]);
+    for (i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
@@ -230,61 +241,67 @@  main (int argc, char *argv [])
 
   /* Vector replace 64-bit element, unaligned  */
   src_a_ullint = 456;
-  src_va_ullint = (vector unsigned long long int) { 0, 0x222 };
-  vresult_ullint = (vector unsigned long long int) { 0, 0 };
-  expected_vresult_ullint = (vector unsigned long long int) { 456*256,
-							      0x200 };
+  src_va_uchar = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0, 0, 0,
+					  0x22, 0x2, 0, 0, 0, 0, 0, 0 };
+  vresult_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					   0, 0, 0, 0, 0, 0, 0, 0 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 0, 0xc, 0x1, 0, 0, 0xc8, 1, 0,
+			       0, 2, 0, 0, 0, 0, 0, 0 };
 						 
   /* Byte index 7 will over write least significant byte of  element 0  */
-  vresult_uchar = vec_replace_unaligned (src_va_ullint, src_a_ullint, 7);
-  vresult_ullint = (vector unsigned long long) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_ullint, 7);
 
-  if (!vec_all_eq (vresult_ullint,  expected_vresult_ullint)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_ullint, src_va_ullint, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_ullint, "
 	   "index)\n");
-    for(i = 0; i < 2; i++)
-      printf(" vresult_ullint[%d] = %d, expected_vresult_ullint[%d] = %d\n",
-	     i, vresult_ullint[i], i, expected_vresult_ullint[i]);
+    for (i = 0; i < 15; i++)
+      printf(" vresult_uchar[%d] = 0x%x, expected_vresult_uchar[%d] = 0x%x\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
 
   src_a_llint = 678;
-  src_va_llint = (vector long long int) { 0, 0x101 };
+  src_va_uchar = (vector unsigned char) { 0, 0xa6, 0x2, 0, 0, 0, 0, 0,
+					  0x0, 0x1, 0, 0, 0, 0, 0, 0 };
   vresult_llint = (vector long long int) { 0, 0 };
   /* Byte index 7 will over write least significant byte of  element 0  */
-  expected_vresult_llint = (vector long long int) { 678*256, 0x100 };
+  expected_vresult_uchar
+    = (vector unsigned char) { 0x0, 0xa6, 0x2, 0x0, 0x0, 0xa6, 0x2, 0x0,
+			       0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_llint, src_a_llint, 7);
-  vresult_llint = (vector signed long long) vresult_uchar;
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_llint, 7);
 
-  if (!vec_all_eq (vresult_llint,  expected_vresult_llint)) {
+  if (!vec_all_eq (vresult_uchar, expected_vresult_uchar)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_llint, src_va_llint, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_llint, "
 	   "index)\n");
-    for(i = 0; i < 2; i++)
-      printf(" vresult_llint[%d] = %d, expected_vresult_llint[%d] = %d\n",
-	     i, vresult_llint[i], i, expected_vresult_llint[i]);
+    for (i = 0; i < 16; i++)
+      printf(" vresult_uchar[%d] = %d, expected_vresult_uchar[%d] = %d\n",
+	     i, vresult_uchar[i], i, expected_vresult_uchar[i]);
 #else
     abort();
 #endif
   }
   
   src_a_double = 678.0;
-  src_va_double = (vector double) { 0.0, 50.0 };
+  //  src_va_double = (vector double) { 0.0, 50.0 };
+  src_va_uchar = (vector unsigned char) { 0, 0, 0, 0, 0, 0, 0, 0,
+					  0, 0, 0, 0, 0, 0, 0, 0 };
   vresult_double = (vector double) { 0.0, 0.0 };
   expected_vresult_double = (vector double) { 0.0, 678.0 };
 						 
-  vresult_uchar = vec_replace_unaligned (src_va_double, src_a_double, 0);
+  vresult_uchar = vec_replace_unaligned (src_va_uchar, src_a_double, 0);
   vresult_double = (vector double) vresult_uchar;
 
-  if (!vec_all_eq (vresult_double,  expected_vresult_double)) {
+  if (!vec_all_eq (vresult_double, expected_vresult_double)) {
 #if DEBUG
-    printf("ERROR, vec_replace_unaligned (src_vb_double, src_va_double, "
+    printf("ERROR, vec_replace_unaligned (src_va_uchar, src_va_double, "
 	   "index)\n");
-    for(i = 0; i < 2; i++)
+    for (i = 0; i < 2; i++)
       printf(" vresult_double[%d] = %f, expected_vresult_double[%d] = %f\n",
 	     i, vresult_double[i], i, expected_vresult_double[i]);
 #else
@@ -295,7 +312,5 @@  main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mvinsw\M} 6 } } */
-/* { dg-final { scan-assembler-times {\mvinsd\M} 6 } } */
-
-
+/* { dg-final { scan-assembler-times {\mvinsw\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mvinsd\M} 4 } } */