[0/5] Add Power10 XXSPLTI* and LXVKQ instructions

Message ID YYSs1OZk3bR0VxED@toto.the-meissners.org

Message

Michael Meissner Nov. 5, 2021, 4:02 a.m. UTC
These patches are a refinement of the patches I posted on September 13th to
add XXSPLTIDP support.  They generate instructions that load certain constants
directly into a VSX register instead of using PLXV to load the constant from
memory.

On the Power10:

 * XXSPLTIDP is a prefixed instruction that takes a value encoded as an SFmode
   constant, converts it to DFmode, and splats that value into both 64-bit
   halves of the register.

 * XXSPLTIW is a prefixed instruction that takes a 32-bit value and splats it
   into all four 32-bit words of the vector register, i.e. it can be used to
   generate V4SImode and V4SFmode vector constants where all of the elements
   are the same.

 * XXSPLTI32DX is a prefixed instruction that takes a 32-bit value and splats
   it into either the 2 even 32-bit words or the 2 odd 32-bit words of the
   vector register.  Thus 2 XXSPLTI32DX instructions can generate a 64-bit
   constant that cannot be generated by XXSPLTIDP.  Note, in the current set of
   patches, I do not add support for XXSPLTI32DX.  I have done so in previous
   patches, and I could add it if desired.  Because it requires 2 back-to-back
   prefixed instructions that are serially dependent on each other, I don't
   think using XXSPLTI32DX is worthwhile.

 * LXVKQ is a non-prefixed instruction that loads certain 128-bit values that
   match particular IEEE 128-bit constants (-0.0f128, 1.0f128, 2.0f128, etc.).
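
As a rough illustration of the semantics described in the list above, the following C sketch models a 128-bit VSX register as two 64-bit doublewords (most significant first) and emulates XXSPLTIW, XXSPLTIDP, and XXSPLTI32DX in software.  The type and function names are hypothetical and this is not GCC code; the model assumes IEEE 754 float/double on the host and does not attempt to model NaN payloads or denormal corner cases.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit VSX register: dw[0] is the most
   significant doubleword (the ISA's big-endian element numbering).  */
typedef struct
{
  uint64_t dw[2];
} vsr128;

/* XXSPLTIW: replicate a 32-bit immediate into all four words.  */
static vsr128
model_xxspltiw (uint32_t imm32)
{
  uint64_t w = ((uint64_t) imm32 << 32) | imm32;
  vsr128 r = { { w, w } };
  return r;
}

/* XXSPLTIDP: treat the 32-bit immediate as SFmode bits, widen the value
   to DFmode, and replicate it into both doublewords.  Assumes IEEE 754
   float/double on the host; NaN payload handling is not modeled.  */
static vsr128
model_xxspltidp (uint32_t sf_bits)
{
  float f;
  double d;
  uint64_t df_bits;

  memcpy (&f, &sf_bits, sizeof f);
  d = (double) f;
  memcpy (&df_bits, &d, sizeof df_bits);

  vsr128 r = { { df_bits, df_bits } };
  return r;
}

/* XXSPLTI32DX: write a 32-bit immediate into word IX of each doubleword
   (IX = 0: words 0 and 2, the even words; IX = 1: words 1 and 3, the odd
   words), leaving the other two words unchanged.  */
static vsr128
model_xxsplti32dx (vsr128 r, int ix, uint32_t imm32)
{
  assert (ix == 0 || ix == 1);

  for (int i = 0; i < 2; i++)
    {
      if (ix == 0)
        r.dw[i] = (r.dw[i] & 0x00000000FFFFFFFFULL) | ((uint64_t) imm32 << 32);
      else
        r.dw[i] = (r.dw[i] & 0xFFFFFFFF00000000ULL) | imm32;
    }
  return r;
}
```

For example, model_xxspltidp (0x3F800000) (the bits of 1.0f) yields 0x3FF0000000000000 (1.0 as a double) in both doublewords, while a value such as 0x123456789ABCDEF0 in each doubleword needs two dependent model_xxsplti32dx calls, one with IX = 0 and one with IX = 1.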

There are 5 patches in this set.

One of the takeaways from the last review was that it would be desirable to
generate one of these instructions whenever it produces a value that matches
the vector constant, even if the vector type is not the instruction's native
vector type.

For example, the following code:

	vector unsigned long long
	foo (void)
	{
	#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	  return (vector unsigned long long) { 0, 1ULL << 63 };
	#else
	  return (vector unsigned long long) { 1ULL << 63, 0 };
	#endif
	}

should generate:

	foo:
		lxvkq 34,16
		blr
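
The LXVKQ case in this example works because, once the two vector elements are placed in the register, the only bit set is the sign bit, which is the IEEE 128-bit encoding of -0.0.  A hypothetical recognizer along those lines is sketched below.  Only the mapping of -0.0 to immediate 16 comes from the example; the other immediate values are taken from a reading of the Power ISA 3.1 LXVKQ table and should be treated as assumptions to check against the ISA.  The function names are illustrative, not GCC's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LXVKQ recognizer.  HI and LO are the most and least
   significant doublewords of the 128-bit constant as it sits in the
   register.  Returns the LXVKQ immediate, or 0 if no form matches.
   IEEE 128-bit layout: 1 sign bit, 15-bit exponent (bias 16383), 112-bit
   fraction, so the top 32 bits hold the sign, the exponent, and the first
   16 fraction bits.  Immediates other than 16 are an assumption.  */
static int
sketch_lxvkq_uim (uint64_t hi, uint64_t lo)
{
  /* Every candidate constant has zeros below its top 32 bits.  */
  if (lo != 0 || (hi & 0xFFFFFFFFULL) != 0)
    return 0;

  switch ((uint32_t) (hi >> 32))
    {
    case 0x3FFF0000u: return 1;   /* +1.0 */
    case 0x40000000u: return 2;   /* +2.0 */
    case 0x40008000u: return 3;   /* +3.0 */
    case 0x40010000u: return 4;   /* +4.0 */
    case 0x40014000u: return 5;   /* +5.0 */
    case 0x40018000u: return 6;   /* +6.0 */
    case 0x4001C000u: return 7;   /* +7.0 */
    case 0x7FFF0000u: return 8;   /* +Infinity */
    case 0x7FFF8000u: return 9;   /* quiet NaN */
    case 0x80000000u: return 16;  /* -0.0 (confirmed by the example) */
    case 0xBFFF0000u: return 17;  /* -1.0 */
    case 0xC0000000u: return 18;  /* -2.0 */
    case 0xC0008000u: return 19;  /* -3.0 */
    case 0xC0010000u: return 20;  /* -4.0 */
    case 0xC0014000u: return 21;  /* -5.0 */
    case 0xC0018000u: return 22;  /* -6.0 */
    case 0xC001C000u: return 23;  /* -7.0 */
    case 0xFFFF0000u: return 24;  /* -Infinity */
    default:          return 0;
    }
}

/* Map the two C elements of a vector unsigned long long to register
   doublewords, mirroring the #if in the example: on little endian,
   element 1 is the most significant doubleword.  */
static int
sketch_lxvkq_from_elements (uint64_t elt0, uint64_t elt1, int little_endian)
{
  assert (little_endian == 0 || little_endian == 1);
  return little_endian ? sketch_lxvkq_uim (elt1, elt0)
                       : sketch_lxvkq_uim (elt0, elt1);
}
```

With this mapping, both arms of the #if in foo describe the same register value, so sketch_lxvkq_from_elements returns 16 for either byte order.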

To that end, I added a data structure that represents a vector or scalar
constant as a series of bytes, half-words, words, and double-words, along with a
function that converts a constant into it.  The recognizer functions then use
this data structure to decide whether a given instruction can be generated.
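
A minimal sketch of such a data structure, assuming the constant has already been laid out as 16 bytes in big-endian (register) order.  The names const128_view and const128_from_bytes are hypothetical stand-ins, not the structure and conversion function the patches add, and the real structure may carry different fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a 128-bit constant at several element widths.
   Index 0 is always the most significant element (register order),
   independent of host or target memory endianness.  */
typedef struct
{
  uint64_t double_words[2];
  uint32_t words[4];
  uint16_t half_words[8];
  uint8_t bytes[16];
  bool all_words_same;
  bool all_double_words_same;
} const128_view;

/* Fill *VIEW from IMAGE, the 16 bytes of the constant with IMAGE[0] the
   most significant byte.  Each wider view is built from the narrower one.  */
static void
const128_from_bytes (const uint8_t image[16], const128_view *view)
{
  assert (image != NULL && view != NULL);

  for (int i = 0; i < 16; i++)
    view->bytes[i] = image[i];

  for (int i = 0; i < 8; i++)
    view->half_words[i] = (uint16_t) ((image[2 * i] << 8) | image[2 * i + 1]);

  for (int i = 0; i < 4; i++)
    view->words[i] = ((uint32_t) view->half_words[2 * i] << 16)
                     | view->half_words[2 * i + 1];

  for (int i = 0; i < 2; i++)
    view->double_words[i] = ((uint64_t) view->words[2 * i] << 32)
                            | view->words[2 * i + 1];

  view->all_words_same = (view->words[1] == view->words[0]
                          && view->words[2] == view->words[0]
                          && view->words[3] == view->words[0]);
  view->all_double_words_same
    = (view->double_words[1] == view->double_words[0]);
}
```

Once the view is filled in, a recognizer can look at whichever element width it needs without reconverting the constant.  For instance, an XXSPLTIW candidate is simply a view with all_words_same set, although a real recognizer would presumably still prefer cheaper sequences (such as XXSPLTIB) for some of those values.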

This way, functions like easy_vector_constant avoid converting the same vector
constant into an internal format over and over as they check each candidate
instruction.  For example, this is the part of easy_vector_constant that
determines whether a vector constant can be generated with LXVKQ, XXSPLTIDP, or
XXSPLTIW:

      /* Constants that can be generated with ISA 3.1 instructions are
         easy.  */
      vec_const_128bit_type vsx_const;
      if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
	{
	  if (constant_generates_lxvkq (&vsx_const))
	    return true;

	  if (constant_generates_xxspltiw (&vsx_const))
	    return true;

	  if (constant_generates_xxspltidp (&vsx_const))
	    return true;
	}
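
The bodies of the recognizers called in this snippet are not shown in this message.  As an illustration only, here is a hypothetical XXSPLTIDP test in the same spirit: a 128-bit constant qualifies when both doublewords hold the same double and that double converts to SFmode and back without change.  The function name, the exclusion of NaNs, and the rejection of SFmode denormals are assumptions made for this sketch and are not necessarily what constant_generates_xxspltidp does.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical XXSPLTIDP check.  HI and LO are the two doublewords of the
   constant.  On success, *IMM receives the SFmode bit pattern that
   XXSPLTIDP would encode.  Assumes IEEE 754 float/double on the host.  */
static bool
sketch_generates_xxspltidp (uint64_t hi, uint64_t lo, uint32_t *imm)
{
  double d, back;
  float f;
  uint32_t sf_bits;
  uint64_t back_bits;
  bool is_inf = (hi & 0x7FFFFFFFFFFFFFFFULL) == 0x7FF0000000000000ULL;

  assert (imm != NULL);

  /* XXSPLTIDP writes the same value to both doublewords.  */
  if (hi != lo)
    return false;

  memcpy (&d, &hi, sizeof d);

  /* Assumption: this sketch leaves NaNs to other code paths.  */
  if (d != d)
    return false;

  /* Finite values outside the float range cannot round-trip; test first,
     since converting them to float is undefined in ISO C.  */
  if (!is_inf && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  f = (float) d;
  memcpy (&sf_bits, &f, sizeof sf_bits);

  /* Assumption: reject SFmode denormals (zero exponent, nonzero fraction).  */
  if ((sf_bits & 0x7F800000u) == 0 && (sf_bits & 0x007FFFFFu) != 0)
    return false;

  /* Compare bit patterns rather than values, so any precision lost in the
     narrowing conversion is detected.  */
  back = (double) f;
  memcpy (&back_bits, &back, sizeof back_bits);
  if (back_bits != hi)
    return false;

  *imm = sf_bits;
  return true;
}
```

For example, 1.0 in both doublewords (0x3FF0000000000000) is accepted with the SFmode immediate 0x3F800000, while 0.1 is rejected because (float) 0.1 does not widen back to the same double.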

In theory, many of the Altivec constant functions could be converted to use
this data structure, but I haven't rewritten those functions.

The 5 patches are:

1) Add the data structure and the function that converts vector/scalar
   constants to that data structure.  Note, this function is not used in this
   patch, but the remaining 4 patches depend on it.

2) Add support to recognize when we could generate the LXVKQ instruction.

3) Add support to recognize when we could generate the XXSPLTIW instruction.

4) Add support to recognize when we could generate the XXSPLTIDP instruction
   for vector constants.

5) Add support to recognize when we could generate the XXSPLTIDP instruction
   for SFmode and DFmode constants.

I have built these patches on power9 and power10 little endian systems with no
regressions in the current tests.  I am kicking off a build on a power8 big
endian system as I write this post.  I have run previous versions of the patches
on the big endian system without problems.  I would like to check these patches
into the GCC 12 trunk.

At the moment, I am not asking to back-port the patches to GCC 11, but we can
do so if it is deemed desirable.

Comments

Michael Meissner Nov. 5, 2021, 1:08 p.m. UTC | #1
When I posted this set of patches, I mentioned that I would start a build/check
on a big endian power8 system.  There were no regressions with these patches on
the big endian system, testing both 32-bit and 64-bit code generation.