[OG12,commit] vect: WORKAROUND vectorizer bug

Message ID 264e9c27-cef4-b2a5-8758-a8b621428e01@codesourcery.com
State Committed

Commit Message

Andrew Stubbs Oct. 24, 2022, 4:50 p.m. UTC
  I've committed this to the OG12 branch to remove some test failures. We 
probably ought to have something on mainline also, but a proper fix 
would be better.

Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase
fails to compile due to an ICE.  The OpenACC worker broadcasting code
creates SLP-optimizable loads and stores in amdgcn address-space-4.
Previously this was "ok" as SLP didn't work with vectors of fewer than
64 lanes, but the newly implemented smaller vectors are working as
intended and optimizing this.

Unfortunately the vectorizer is losing the address-space data from the
intermediate types, and it all falls apart during expand when it tries
to convert a 32-bit address into a 64-bit address, which is not
something that works. At first sight it looks like we could possibly
make that work with POINTERS_EXTEND_UNSIGNED, but that only changes the
error message. Fundamentally we need to make sure that the various
instances of "vectype" have the correct address space, but my attempts
to do so showed that that's a larger task than I have time for right now.

This patch simply prevents the vectorizer working in the case where it 
would break. This should not be a regression because this code didn't 
vectorize at all, previously.

Andrew
vect: WORKAROUND vectorizer bug

This patch disables vectorization of memory accesses to non-default address
spaces where the pointer size is different to the usual pointer size.  This
condition typically occurs in OpenACC programs on amdgcn, where LDS memory is
used for broadcasting gang-private variables between threads. In particular,
see libgomp.oacc-c-c++-common/private-variables.c.
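
As a rough illustration of the condition described above, here is a minimal standalone sketch. It does not use any GCC internals: `addr_space_id`, `kGenericSpace`, `kDefaultPointerBits`, `toy_pointer_bits` and `skip_vectorization` are all hypothetical names invented for this example. The 32-bit width for address space 4 follows the description in this message; the 64-bit width for every other space is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for an address-space identifier; 0 plays the
// role of the generic (default) address space.
using addr_space_id = std::uint8_t;
constexpr addr_space_id kGenericSpace = 0;

// Assumed width of a default pointer, in bits (the role Pmode plays in
// the real check).
constexpr unsigned kDefaultPointerBits = 64;

// Toy target hook: address space 4 uses 32-bit addresses, every other
// space is assumed to use the default width.
constexpr unsigned toy_pointer_bits(addr_space_id as) {
  return as == 4 ? 32u : kDefaultPointerBits;
}

// True when an access should be left unvectorized: the object lives in
// a non-generic address space whose pointers differ in size from the
// default pointer.
constexpr bool skip_vectorization(addr_space_id as) {
  return as != kGenericSpace && toy_pointer_bits(as) != kDefaultPointerBits;
}

// Compile-time checks of the three interesting cases.
static_assert(!skip_vectorization(kGenericSpace), "generic space is unaffected");
static_assert(skip_vectorization(4), "32-bit space 4 is excluded");
static_assert(!skip_vectorization(1), "same-width non-generic space is unaffected");
```

Note that a non-generic address space with default-sized pointers is still vectorized under this rule; only the size mismatch triggers the exclusion.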

The problem is that the address space information is dropped from the various
types in the middle-end and eventually it triggers an ICE trying to do an
address conversion.  That ICE can be avoided by defining
POINTERS_EXTEND_UNSIGNED, but that just produces wrong RTL code later on.

A correct solution would ensure that all the vectypes have the correct address
spaces, but I don't have time for that right now.
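
To make the failure mode a little more concrete, the following toy model contrasts a vector-type constructor that drops the address space with one that keeps it. It is only a sketch of the idea, not GCC code: `ToyType`, `vector_of_lossy`, `vector_of_preserving` and `toy_address_bits` are hypothetical names, and the address widths (32-bit for space 4, 64-bit otherwise) are assumptions taken from the description above.

```cpp
// Toy type descriptor: element width, lane count, and address space.
struct ToyType {
  unsigned bits;        // width of one element, in bits
  unsigned lanes;       // 1 for a scalar
  unsigned addr_space;  // 0 is the generic address space
};

// Models the problem: the derived vector type forgets where the data
// lives and silently falls back to the generic address space.
constexpr ToyType vector_of_lossy(ToyType scalar, unsigned lanes) {
  return ToyType{scalar.bits, lanes, 0u};
}

// Models the proper fix: the address space is carried over.
constexpr ToyType vector_of_preserving(ToyType scalar, unsigned lanes) {
  return ToyType{scalar.bits, lanes, scalar.addr_space};
}

// Assumed address width for a type: 32 bits in space 4, else 64 bits.
constexpr unsigned toy_address_bits(ToyType t) {
  return t.addr_space == 4u ? 32u : 64u;
}

// A 32-bit scalar living in address space 4.
constexpr ToyType kLdsInt = {32u, 1u, 4u};

// The lossy vector type disagrees with the scalar about address width,
// which is the kind of mismatch that later needs a 32-to-64-bit
// address conversion.
static_assert(toy_address_bits(vector_of_lossy(kLdsInt, 2u))
              != toy_address_bits(kLdsInt), "lossy type mismatches");

// The preserving vector type agrees with the scalar.
static_assert(toy_address_bits(vector_of_preserving(kLdsInt, 2u))
              == toy_address_bits(kLdsInt), "preserving type matches");
```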

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_analyze_data_refs): Work around an
	address-space bug.
  

Comments

Richard Biener Oct. 24, 2022, 6:06 p.m. UTC | #1
> On 24.10.2022 at 18:51, Andrew Stubbs <ams@codesourcery.com> wrote:
> 
> I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better.
> 
> Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase fails to compile due to an ICE.  The OpenACC worker broadcasting code creates SLP-optimizable loads and stores in amdgcn address-space-4. Previously this was "ok" as SLP didn't work with vectors of fewer than 64 lanes, but the newly implemented smaller vectors are working as intended and optimizing this.
> 
> Unfortunately the vectorizer is losing the address-space data from the intermediate types, and it all falls apart during expand when it tries to convert a 32-bit address into a 64-bit address, which is not something that works. At first sight it looks like we could possibly make that work with POINTERS_EXTEND_UNSIGNED, but that only changes the error message. Fundamentally we need to make sure that the various instances of "vectype" have the correct address space, but my attempts to do so showed that that's a larger task than I have time for right now.

Istr there were issues like this in the past that I fixed, so any testcase that exposes this with just a gcn cc1 would be nice to have.

Richard 

  
Andrew Stubbs Oct. 27, 2022, 1:12 p.m. UTC | #2
On 24/10/2022 19:06, Richard Biener wrote:
> Istr there were issues like this in the past that I fixed, so any testcase that exposes this with just a gcn cc1 would be nice to have.

I've been unable to reproduce this issue on the mainline compiler. The 
SLP vectorizer says the accesses are not consecutive, although I don't 
know why they would be different.

A simple testcase works fine on OG12 as well. It's something weird to do 
with the OpenACC worker broadcasting code that I can't reproduce manually.

Thank you for the offer. I'll let you know if I get a testcase.

Andrew
  

Patch

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 09223baf718..70b671ed94a 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4598,7 +4598,21 @@  vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
       /* Set vectype for STMT.  */
       scalar_type = TREE_TYPE (DR_REF (dr));
       tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
-      if (!vectype)
+
+      /* FIXME: If the object is in an address-space in which the pointer size
+	 is different to the default address space then vectorizing here will
+	 lead to an ICE down the road because the address space information
+	 gets lost.  This work-around fixes the problem until we have a proper
+	 solution.  */
+      tree base_object = DR_REF (dr);
+      tree op = (TREE_CODE (base_object) == COMPONENT_REF
+		 || TREE_CODE (base_object) == ARRAY_REF
+		 ? TREE_OPERAND (base_object, 0) : base_object);
+      addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (op));
+      bool addr_space_bug = (!ADDR_SPACE_GENERIC_P (as)
+			     && targetm.addr_space.pointer_mode (as) != Pmode);
+
+      if (!vectype || addr_space_bug)
         {
           if (dump_enabled_p ())
             {