Fix wrong code in gnatmake

Message ID Yk7KXSeQ3UmLmMzV@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka April 7, 2022, 11:26 a.m. UTC
  Hi,
this patch fixes a miscompilation of gnatmake.  Modref attempts to track memory
accesses relative to the base pointers which are parameters of functions.
If it fails, it still distinguishes between unknown memory accesses and
global memory accesses.  The latter makes it possible to disambiguate with
memory that is not accessible from the outside world (i.e. everything that
does not escape from the caller function).  This is useful so we do not punt
when an unknown function is called.
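
As a rough illustration of that distinction, here is a small C++ sketch.  It is
not taken from the PR; opaque_callee, leaked and both caller functions are
names invented for this example, and opaque_callee merely stands in for a
function whose body the optimizer cannot see.

```cpp
// Hypothetical stand-ins; none of these names come from GCC or the PR.
static int *leaked = nullptr;	// global through which a local may escape

// Stands in for an unknown function.  It may touch any memory reachable
// from the outside world, including escaped locals, but never a local
// whose address was not published.
void
opaque_callee ()
{
  if (leaked)
    *leaked += 1;
}

// 'priv' never escapes: the call cannot modify it, so the load after the
// call can be disambiguated against whatever the callee does.
int
non_escaping_local ()
{
  int priv = 1;
  opaque_callee ();
  return priv;
}

// 'pub' escapes through 'leaked'.  It is not a global variable, yet the
// callee can reach it, so it has to be treated like global memory here.
int
escaping_local ()
{
  int pub = 1;
  leaked = &pub;
  opaque_callee ();
  leaked = nullptr;
  return pub;
}
```

Under these assumptions the first caller always sees its own value of 1, while
the second observes the callee's store.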

Now, I added ref_may_access_global_memory_p to tree-ssa-alias, which uses
ptr_deref_may_alias_global_p.  There is, however, a shift in meaning: the
latter only tests whether the dereference may alias a global variable.

In the testcase we are disambiguating heap-allocated escaping memory, which is
not a global variable but is still global memory in modref's sense.
So we additionally need to test contains_escaped.

The patch simply copies the logic from the predicate and adds the check.
I am not sure whether there is a better way to handle this.
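
The difference between the two notions of "global" can be sketched with a toy
model.  This is only an assumption-laden illustration: toy_pt_solution and
both functions are invented here, the field names merely mirror flags that
appear in the patch, and the real predicates do more (for example they look
into the escaped placeholder solutions).

```cpp
// Toy model only: a stripped-down stand-in for a points-to solution.
// This is not GCC's struct pt_solution.
struct toy_pt_solution
{
  bool anything = false;
  bool nonlocal = false;
  bool vars_contains_escaped_heap = false;
  bool vars_contains_escaped = false;
};

// "May alias a global variable": escaped local variables do not count.
bool
toy_includes_global_var (const toy_pt_solution &pt)
{
  return pt.anything || pt.nonlocal || pt.vars_contains_escaped_heap;
}

// "May access global memory" in modref's sense: same logic, plus the
// additional check for escaped local memory.
bool
toy_may_access_global_memory (const toy_pt_solution &pt)
{
  return toy_includes_global_var (pt) || pt.vars_contains_escaped;
}
```

For a solution that only has vars_contains_escaped set, the first toy
predicate answers false and the second answers true, which is the gap
described above.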

I apologize for taking so long to look into the PR (and my other bugs).
I will try to handle them quickly now.

Bootstrapped/regtested x86_64-linux.

Honza

gcc/ChangeLog:

2022-04-07  Jan Hubicka  <hubicka@ucw.cz>

	PR 104303
	* tree-ssa-alias.cc (ref_may_access_global_memory_p): Fix handling of
	refs.
  

Comments

Richard Biener April 7, 2022, 12:25 p.m. UTC | #1
On Thu, 7 Apr 2022, Jan Hubicka wrote:

> Hi,
> this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
> accesses relative to the base pointers which are parameters of functions.
> If it fails, it still makes difference between unknown memory access and
> global memory access.  The second makes it possible to disambiguate with
> memory that is not accessible from outside world (i.e. everything that does
> not escape from the caller function).  This is useful so we do not punt
> when unknown function is called.
> 
> Now I added ref_may_access_global_memory_p to tree-ssa-alias whic is using
> ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
> predicate: the second tests that the dereference may alias with global variable.
> 
> In the testcase we are disambiguating heap allocated escaping memory which is
> not a global variable but it is still a global memory in the modref's sense.
> So we need to test in addition contains_escaped.
> 
> The patch simply copies logic from the predicate and adds the check.  
> I am not sure if there is better way to handle this?

I'm testing the following variant, which exposes this detail
(whether escaped local memory counts as global or not) in the APIs that
say "global" and which allows removing ref_may_access_global_memory_p.

Richard.

From 66f986c1bdea2597b1e9406429bf1c144d757a3e Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Thu, 7 Apr 2022 14:07:54 +0200
Subject: [PATCH] ipa/104303 - miscompilation of gnatmake
To: gcc-patches@gcc.gnu.org

Modref attempts to track memory accesses relative to the base pointers
which are parameters of functions.
If it fails, it still makes difference between unknown memory access and
global memory access.  The second makes it possible to disambiguate with
memory that is not accessible from outside world (i.e. everything that does
not escape from the caller function).  This is useful so we do not punt
when unknown function is called.

The added ref_may_access_global_memory_p ends up using
ptr_deref_may_alias_global_p which does not consider escaped automatic
variables as global.  For modref those are still global since they
can be accessed from functions called.

The following adds a flag to the *_global_p APIs indicating whether
escaped local memory should be considered as global or not and
removes ref_may_access_global_memory_p in favor of using
ref_may_alias_global_p with the flag set to true.

Richard.

2022-04-07  Richard Biener  <rguenther@suse.de>
	    Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/104303
	* tree-ssa-alias.h (ptr_deref_may_alias_global_p,
	ref_may_alias_global_p, ref_may_alias_global_p,
	stmt_may_clobber_global_p, pt_solution_includes_global): Add
	bool parameters indicating whether escaped locals should be
	considered global.
	* tree-ssa-structalias.cc (pt_solution_includes_global):
	When the new escaped_local_p flag is true also consider
	pt->vars_contains_escaped.
	* tree-ssa-alias.cc (ptr_deref_may_alias_global_p):
	Pass down new escaped_local_p flag.
	(ref_may_alias_global_p): Likewise.
	(stmt_may_clobber_global_p): Likewise.
	(ref_may_alias_global_p_1): Likewise.  For decls also
	query the escaped solution if true.
	(ref_may_access_global_memory_p): Remove.
	(modref_may_conflict): Use ref_may_alias_global_p with
	escaped locals considered global.
	(ref_maybe_used_by_stmt_p): Adjust.
	* ipa-fnsummary.cc (points_to_local_or_readonly_memory_p):
	Likewise.
	* tree-ssa-dse.cc (dse_classify_store): Likewise.
	* trans-mem.cc (thread_private_new_memory): Likewise, but
	consider escaped locals global.
	* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Likewise.

	* gnat.dg/concat5.adb: New.
	* gnat.dg/concat5_pkg1.adb: Likewise.
	* gnat.dg/concat5_pkg1.ads: Likewise.
	* gnat.dg/concat5_pkg2.adb: Likewise.
	* gnat.dg/concat5_pkg2.ads: Likewise.
---
 gcc/ipa-fnsummary.cc                   |  2 +-
 gcc/testsuite/gnat.dg/concat5.adb      |  9 ++++
 gcc/testsuite/gnat.dg/concat5_pkg1.adb | 18 +++++++
 gcc/testsuite/gnat.dg/concat5_pkg1.ads |  5 ++
 gcc/testsuite/gnat.dg/concat5_pkg2.adb | 10 ++++
 gcc/testsuite/gnat.dg/concat5_pkg2.ads |  5 ++
 gcc/trans-mem.cc                       |  2 +-
 gcc/tree-ssa-alias.cc                  | 65 ++++++++++----------------
 gcc/tree-ssa-alias.h                   | 10 ++--
 gcc/tree-ssa-dce.cc                    |  2 +-
 gcc/tree-ssa-dse.cc                    |  4 +-
 gcc/tree-ssa-structalias.cc            | 15 ++++--
 12 files changed, 93 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/concat5.adb
 create mode 100644 gcc/testsuite/gnat.dg/concat5_pkg1.adb
 create mode 100644 gcc/testsuite/gnat.dg/concat5_pkg1.ads
 create mode 100644 gcc/testsuite/gnat.dg/concat5_pkg2.adb
 create mode 100644 gcc/testsuite/gnat.dg/concat5_pkg2.ads

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 10b679bbb27..8a19a54998b 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2587,7 +2587,7 @@ points_to_local_or_readonly_memory_p (tree t)
 	  && DECL_BY_REFERENCE (DECL_RESULT (current_function_decl))
 	  && t == ssa_default_def (cfun, DECL_RESULT (current_function_decl)))
 	return true;
-      return !ptr_deref_may_alias_global_p (t);
+      return !ptr_deref_may_alias_global_p (t, false);
     }
   if (TREE_CODE (t) == ADDR_EXPR)
     return refs_local_or_readonly_memory_p (TREE_OPERAND (t, 0));
diff --git a/gcc/testsuite/gnat.dg/concat5.adb b/gcc/testsuite/gnat.dg/concat5.adb
new file mode 100644
index 00000000000..fabf24865d6
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/concat5.adb
@@ -0,0 +1,9 @@
+-- { dg-do run }
+-- { dg-options "-O" }
+
+with Concat5_Pkg1; use Concat5_Pkg1;
+
+procedure Concat5 is
+begin
+  Scan ("-RTS=none");
+end;
diff --git a/gcc/testsuite/gnat.dg/concat5_pkg1.adb b/gcc/testsuite/gnat.dg/concat5_pkg1.adb
new file mode 100644
index 00000000000..c32f5af314e
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/concat5_pkg1.adb
@@ -0,0 +1,18 @@
+with Concat5_Pkg2; use Concat5_Pkg2;
+
+package body Concat5_Pkg1 is
+
+  procedure Make_Failed (S : String);
+  pragma No_Inline (Make_Failed);
+
+  procedure Make_Failed (S : String) is
+  begin
+    Compare (S);
+  end;
+
+  procedure Scan (S : String) is
+  begin
+    Make_Failed ("option " & S & " should start with '--'");
+  end;
+
+end Concat5_Pkg1;
diff --git a/gcc/testsuite/gnat.dg/concat5_pkg1.ads b/gcc/testsuite/gnat.dg/concat5_pkg1.ads
new file mode 100644
index 00000000000..7f46d87f6e6
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/concat5_pkg1.ads
@@ -0,0 +1,5 @@
+package Concat5_Pkg1 is
+
+  procedure Scan (S : String);
+
+end Concat5_Pkg1;
diff --git a/gcc/testsuite/gnat.dg/concat5_pkg2.adb b/gcc/testsuite/gnat.dg/concat5_pkg2.adb
new file mode 100644
index 00000000000..98bd38826b2
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/concat5_pkg2.adb
@@ -0,0 +1,10 @@
+package body Concat5_Pkg2 is
+
+  procedure Compare (S : String) is
+  begin
+    if S /= "option -RTS=none should start with '--'" then
+      raise Program_Error;
+    end if;
+  end;
+
+end Concat5_Pkg2;
diff --git a/gcc/testsuite/gnat.dg/concat5_pkg2.ads b/gcc/testsuite/gnat.dg/concat5_pkg2.ads
new file mode 100644
index 00000000000..2931ffd5d5a
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/concat5_pkg2.ads
@@ -0,0 +1,5 @@
+package Concat5_Pkg2 is
+
+  procedure Compare (S : String);
+
+end Concat5_Pkg2;
diff --git a/gcc/trans-mem.cc b/gcc/trans-mem.cc
index e9feac2321c..ae2921f808e 100644
--- a/gcc/trans-mem.cc
+++ b/gcc/trans-mem.cc
@@ -1400,7 +1400,7 @@ thread_private_new_memory (basic_block entry_block, tree x)
   /* Search DEF chain to find the original definition of this address.  */
   do
     {
-      if (ptr_deref_may_alias_global_p (x))
+      if (ptr_deref_may_alias_global_p (x, true))
 	{
 	  /* Address escapes.  This is not thread-private.  */
 	  retval = mem_non_local;
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 50bd47b31f3..063f1893851 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -210,10 +210,12 @@ dump_alias_stats (FILE *s)
 }
 
 
-/* Return true, if dereferencing PTR may alias with a global variable.  */
+/* Return true, if dereferencing PTR may alias with a global variable.
+   When ESCAPED_LOCAL_P is true escaped local memory is also considered
+   global.  */
 
 bool
-ptr_deref_may_alias_global_p (tree ptr)
+ptr_deref_may_alias_global_p (tree ptr, bool escaped_local_p)
 {
   struct ptr_info_def *pi;
 
@@ -230,7 +232,7 @@ ptr_deref_may_alias_global_p (tree ptr)
     return true;
 
   /* ???  This does not use TBAA to prune globals ptr may not access.  */
-  return pt_solution_includes_global (&pi->pt);
+  return pt_solution_includes_global (&pi->pt, escaped_local_p);
 }
 
 /* Return true if dereferencing PTR may alias DECL.
@@ -480,37 +482,44 @@ ptrs_compare_unequal (tree ptr1, tree ptr2)
   return false;
 }
 
-/* Returns whether reference REF to BASE may refer to global memory.  */
+/* Returns whether reference REF to BASE may refer to global memory.
+   When ESCAPED_LOCAL_P is true escaped local memory is also considered
+   global.  */
 
 static bool
-ref_may_alias_global_p_1 (tree base)
+ref_may_alias_global_p_1 (tree base, bool escaped_local_p)
 {
   if (DECL_P (base))
-    return is_global_var (base);
+    return (is_global_var (base)
+	    || (escaped_local_p
+		&& pt_solution_includes (&cfun->gimple_df->escaped, base)));
   else if (TREE_CODE (base) == MEM_REF
 	   || TREE_CODE (base) == TARGET_MEM_REF)
-    return ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0));
+    return ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0),
+					 escaped_local_p);
   return true;
 }
 
 bool
-ref_may_alias_global_p (ao_ref *ref)
+ref_may_alias_global_p (ao_ref *ref, bool escaped_local_p)
 {
   tree base = ao_ref_base (ref);
-  return ref_may_alias_global_p_1 (base);
+  return ref_may_alias_global_p_1 (base, escaped_local_p);
 }
 
 bool
-ref_may_alias_global_p (tree ref)
+ref_may_alias_global_p (tree ref, bool escaped_local_p)
 {
   tree base = get_base_address (ref);
-  return ref_may_alias_global_p_1 (base);
+  return ref_may_alias_global_p_1 (base, escaped_local_p);
 }
 
-/* Return true whether STMT may clobber global memory.  */
+/* Return true whether STMT may clobber global memory.
+   When ESCAPED_LOCAL_P is true escaped local memory is also considered
+   global.  */
 
 bool
-stmt_may_clobber_global_p (gimple *stmt)
+stmt_may_clobber_global_p (gimple *stmt, bool escaped_local_p)
 {
   tree lhs;
 
@@ -531,7 +540,7 @@ stmt_may_clobber_global_p (gimple *stmt)
     case GIMPLE_ASSIGN:
       lhs = gimple_assign_lhs (stmt);
       return (TREE_CODE (lhs) != SSA_NAME
-	      && ref_may_alias_global_p (lhs));
+	      && ref_may_alias_global_p (lhs, escaped_local_p));
     case GIMPLE_CALL:
       return true;
     default:
@@ -2567,30 +2576,6 @@ refs_output_dependent_p (tree store1, tree store2)
   return refs_may_alias_p_1 (&r1, &r2, false);
 }
 
-/* Return ture if REF may access global memory.  */
-
-bool
-ref_may_access_global_memory_p (ao_ref *ref)
-{
-  if (!ref->ref)
-    return true;
-  tree base = ao_ref_base (ref);
-  if (TREE_CODE (base) == MEM_REF
-      || TREE_CODE (base) == TARGET_MEM_REF)
-    {
-      if (ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0)))
-	return true;
-    }
-  else
-    {
-      if (!auto_var_in_fn_p (base, current_function_decl)
-	  || pt_solution_includes (&cfun->gimple_df->escaped,
-				   base))
-	return true;
-    }
-  return false;
-}
-
 /* Returns true if and only if REF may alias any access stored in TT.
    IF TBAA_P is true, use TBAA oracle.  */
 
@@ -2654,7 +2639,7 @@ modref_may_conflict (const gcall *stmt,
 		{
 		  if (global_memory_ok)
 		    continue;
-		  if (ref_may_access_global_memory_p (ref))
+		  if (ref_may_alias_global_p (ref, true))
 		    return true;
 		  global_memory_ok = true;
 		  num_tests++;
@@ -2990,7 +2975,7 @@ ref_maybe_used_by_stmt_p (gimple *stmt, ao_ref *ref, bool tbaa_p)
 	return is_global_var (base);
       else if (TREE_CODE (base) == MEM_REF
 	       || TREE_CODE (base) == TARGET_MEM_REF)
-	return ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0));
+	return ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0), false);
       return false;
     }
 
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index dfb21275657..fa081ab0173 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -121,18 +121,18 @@ extern tree ao_ref_alias_ptr_type (ao_ref *);
 extern tree ao_ref_base_alias_ptr_type (ao_ref *);
 extern bool ao_ref_alignment (ao_ref *, unsigned int *,
 			      unsigned HOST_WIDE_INT *);
-extern bool ptr_deref_may_alias_global_p (tree);
+extern bool ptr_deref_may_alias_global_p (tree, bool);
 extern bool ptr_derefs_may_alias_p (tree, tree);
 extern bool ptrs_compare_unequal (tree, tree);
-extern bool ref_may_alias_global_p (tree);
-extern bool ref_may_alias_global_p (ao_ref *);
+extern bool ref_may_alias_global_p (tree, bool);
+extern bool ref_may_alias_global_p (ao_ref *, bool);
 extern bool refs_may_alias_p (tree, tree, bool = true);
 extern bool refs_may_alias_p_1 (ao_ref *, ao_ref *, bool);
 extern bool refs_anti_dependent_p (tree, tree);
 extern bool refs_output_dependent_p (tree, tree);
 extern bool ref_maybe_used_by_stmt_p (gimple *, tree, bool = true);
 extern bool ref_maybe_used_by_stmt_p (gimple *, ao_ref *, bool = true);
-extern bool stmt_may_clobber_global_p (gimple *);
+extern bool stmt_may_clobber_global_p (gimple *, bool);
 extern bool stmt_may_clobber_ref_p (gimple *, tree, bool = true);
 extern bool stmt_may_clobber_ref_p_1 (gimple *, ao_ref *, bool = true);
 extern bool call_may_clobber_ref_p (gcall *, tree, bool = true);
@@ -171,7 +171,7 @@ extern void dump_alias_stats (FILE *);
 extern unsigned int compute_may_aliases (void);
 extern bool pt_solution_empty_p (const pt_solution *);
 extern bool pt_solution_singleton_or_null_p (struct pt_solution *, unsigned *);
-extern bool pt_solution_includes_global (struct pt_solution *);
+extern bool pt_solution_includes_global (struct pt_solution *, bool);
 extern bool pt_solution_includes (struct pt_solution *, const_tree);
 extern bool pt_solutions_intersect (struct pt_solution *, struct pt_solution *);
 extern void pt_solution_reset (struct pt_solution *);
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 2a13ea34829..34ce8abe33a 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -315,7 +315,7 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
     }
 
   if ((gimple_vdef (stmt) && keep_all_vdefs_p ())
-      || stmt_may_clobber_global_p (stmt))
+      || stmt_may_clobber_global_p (stmt, true))
     {
       mark_stmt_necessary (stmt, true);
       return;
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 3beaed3ad38..881a2d0f98d 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1030,7 +1030,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
 	 just pretend the stmt makes itself dead.  Otherwise fail.  */
       if (defs.is_empty ())
 	{
-	  if (ref_may_alias_global_p (ref))
+	  if (ref_may_alias_global_p (ref, false))
 	    return DSE_STORE_LIVE;
 
 	  if (by_clobber_p)
@@ -1062,7 +1062,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
 	    {
 	      /* But if the store is to global memory it is definitely
 		 not dead.  */
-	      if (ref_may_alias_global_p (ref))
+	      if (ref_may_alias_global_p (ref, false))
 		return DSE_STORE_LIVE;
 	      defs.unordered_remove (i);
 	    }
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index d318f883e70..581bdcf5652 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -6995,10 +6995,12 @@ pt_solution_singleton_or_null_p (struct pt_solution *pt, unsigned *uid)
   return true;
 }
 
-/* Return true if the points-to solution *PT includes global memory.  */
+/* Return true if the points-to solution *PT includes global memory.
+   If ESCAPED_LOCAL_P is true then escaped local variables are also
+   considered global.  */
 
 bool
-pt_solution_includes_global (struct pt_solution *pt)
+pt_solution_includes_global (struct pt_solution *pt, bool escaped_local_p)
 {
   if (pt->anything
       || pt->nonlocal
@@ -7009,12 +7011,17 @@ pt_solution_includes_global (struct pt_solution *pt)
       || pt->vars_contains_escaped_heap)
     return true;
 
+  if (escaped_local_p && pt->vars_contains_escaped)
+    return true;
+
   /* 'escaped' is also a placeholder so we have to look into it.  */
   if (pt->escaped)
-    return pt_solution_includes_global (&cfun->gimple_df->escaped);
+    return pt_solution_includes_global (&cfun->gimple_df->escaped,
+					escaped_local_p);
 
   if (pt->ipa_escaped)
-    return pt_solution_includes_global (&ipa_escaped_pt);
+    return pt_solution_includes_global (&ipa_escaped_pt,
+					escaped_local_p);
 
   return false;
 }
  
Jan Hubicka April 7, 2022, 12:58 p.m. UTC | #2
> On Thu, 7 Apr 2022, Jan Hubicka wrote:
> 
> > Hi,
> > this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
> > accesses relative to the base pointers which are parameters of functions.
> > If it fails, it still makes difference between unknown memory access and
> > global memory access.  The second makes it possible to disambiguate with
> > memory that is not accessible from outside world (i.e. everything that does
> > not escape from the caller function).  This is useful so we do not punt
> > when unknown function is called.
> > 
> > Now I added ref_may_access_global_memory_p to tree-ssa-alias whic is using
> > ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
> > predicate: the second tests that the dereference may alias with global variable.
> > 
> > In the testcase we are disambiguating heap allocated escaping memory which is
> > not a global variable but it is still a global memory in the modref's sense.
> > So we need to test in addition contains_escaped.
> > 
> > The patch simply copies logic from the predicate and adds the check.  
> > I am not sure if there is better way to handle this?
> 
> I'm testing the following variant which exposes this detail
> (escaped local memory global or not) in the APIs that say "global"
> which allows to remove ref_may_access_global_memory_p.

Thank you.  Indeed it is better to have an explicit flag, since the
clash of names is a bit sensitive.

Honza
  
Richard Biener April 7, 2022, 1:04 p.m. UTC | #3
On Thu, 7 Apr 2022, Jan Hubicka wrote:

> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
> > 
> > > Hi,
> > > this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
> > > accesses relative to the base pointers which are parameters of functions.
> > > If it fails, it still makes difference between unknown memory access and
> > > global memory access.  The second makes it possible to disambiguate with
> > > memory that is not accessible from outside world (i.e. everything that does
> > > not escape from the caller function).  This is useful so we do not punt
> > > when unknown function is called.
> > > 
> > > Now I added ref_may_access_global_memory_p to tree-ssa-alias whic is using
> > > ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
> > > predicate: the second tests that the dereference may alias with global variable.
> > > 
> > > In the testcase we are disambiguating heap allocated escaping memory which is
> > > not a global variable but it is still a global memory in the modref's sense.
> > > So we need to test in addition contains_escaped.
> > > 
> > > The patch simply copies logic from the predicate and adds the check.  
> > > I am not sure if there is better way to handle this?
> > 
> > I'm testing the following variant which exposes this detail
> > (escaped local memory global or not) in the APIs that say "global"
> > which allows to remove ref_may_access_global_memory_p.
> 
> Thank you.  Indeed it is better to have an explicit flag, since the
> clash of names is bit sensitive. 

OK - bootstrapped / tested on x86_64-unknown-linux-gnu including Ada
and now pushed.

Thanks for analyzing this!

Richard.
  
Thomas Schwinge April 12, 2022, 1:11 p.m. UTC | #4
Hi!

On 2022-04-07T15:04:15+0200, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> On Thu, 7 Apr 2022, Jan Hubicka wrote:
>> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
>> > > this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
>> > > accesses relative to the base pointers which are parameters of functions.
>> > > If it fails, it still makes difference between unknown memory access and
>> > > global memory access.  The second makes it possible to disambiguate with
>> > > memory that is not accessible from outside world (i.e. everything that does
>> > > not escape from the caller function).  This is useful so we do not punt
>> > > when unknown function is called.
>> > >
>> > > Now I added ref_may_access_global_memory_p to tree-ssa-alias whic is using
>> > > ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
>> > > predicate: the second tests that the dereference may alias with global variable.
>> > >
>> > > In the testcase we are disambiguating heap allocated escaping memory which is
>> > > not a global variable but it is still a global memory in the modref's sense.
>> > > So we need to test in addition contains_escaped.
>> > >
>> > > The patch simply copies logic from the predicate and adds the check.
>> > > I am not sure if there is better way to handle this?
>> >
>> > I'm testing the following variant which exposes this detail
>> > (escaped local memory global or not) in the APIs that say "global"
>> > which allows to remove ref_may_access_global_memory_p.
>>
>> Thank you.  Indeed it is better to have an explicit flag, since the
>> clash of names is bit sensitive.
>
> OK - bootstrapped / tested on x86_64-unknown-linux-gnu including Ada
> and now pushed.

This commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
"ipa/104303 - miscompilation of gnatmake" is causing one regression in
nvptx offload testing:

    [...]
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/private-variables.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1   at line 142 (test for bogus messages, line 131)
    [...]

I've done a before/after 'diff' of
'-fdump-tree-all -foffload-options=nvptx-none=-fdump-tree-all'
with all functions and calls other than 't4' commented out.

For '-O0', there's no difference at all.

For '-O1', for host compilation we see:

    diff -ru 0-O1/a-private-variables.f90.117t.dce2 ./a-private-variables.f90.117t.dce2
    --- 0-O1/a-private-variables.f90.117t.dce2      2022-04-12 08:36:54.525302868 +0200
    +++ ./a-private-variables.f90.117t.dce2 2022-04-12 12:51:43.726304109 +0200
    @@ -30,9 +30,13 @@

       <bb 3> [local count: 87490071]:
       # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
    +  pt.x = .offset.15_2;
       _25 = .offset.15_2 * 2;
    +  pt.y = _25;
       _27 = .offset.15_2 * 4;
    +  pt.z = _27;
       _29 = .offset.15_2 * 6;
    +  pt.attr[4] = _29;

       <bb 4> [local count: 1073741824]:
       # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
    diff -ru 0-O1/a-private-variables.f90.118t.stdarg ./a-private-variables.f90.118t.stdarg
    --- 0-O1/a-private-variables.f90.118t.stdarg    2022-04-12 08:36:54.525302868 +0200
    +++ ./a-private-variables.f90.118t.stdarg       2022-04-12 12:51:43.726304109 +0200
    @@ -4,6 +4,7 @@
     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
     {
    +  struct vec3 pt;
       integer(kind=4) .offset.15_2;
       integer(kind=4) .offset.10_4;
       integer(kind=4) _25;
    @@ -25,9 +26,13 @@

       <bb 3> [local count: 87490071]:
       # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
    +  pt.x = .offset.15_2;
       _25 = .offset.15_2 * 2;
    +  pt.y = _25;
       _27 = .offset.15_2 * 4;
    +  pt.z = _27;
       _29 = .offset.15_2 * 6;
    +  pt.attr[4] = _29;

       <bb 4> [local count: 1073741824]:
       # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
    [Similar for following passes/dumps.]
    diff -ru 0-O1/a-private-variables.f90.141t.lim2 ./a-private-variables.f90.141t.lim2
    --- 0-O1/a-private-variables.f90.141t.lim2      2022-04-12 08:36:54.525302868 +0200
    +++ ./a-private-variables.f90.141t.lim2 2022-04-12 12:51:43.730304125 +0200
    @@ -24,11 +24,42 @@
     ;; 5 succs { 7 6 }
     ;; 7 succs { 3 }
     ;; 6 succs { 1 }
    +
    +Symbols to be put in SSA form
    +{ D.4340 D.4356 D.4357 D.4358 D.4359 D.4360 D.4361 D.4362 D.4363 }
    +Incremental SSA update started at block: 0
    +Number of blocks in CFG: 9
    +Number of blocks to update: 8 ( 89%)
    +
    +
    +
    +SSA replacement table
    +N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
    +
    +pt_x_lsm.22_1 -> { pt_x_lsm.22_72 }
    +pt_z_lsm.24_20 -> { pt_z_lsm.24_9 }
    +pt_attr_I_lsm.25_21 -> { pt_attr_I_lsm.25_10 }
    +pt_y_lsm.23_22 -> { pt_y_lsm.23_73 }
    +Incremental SSA update started at block: 3
    +Number of blocks in CFG: 9
    +Number of blocks to update: 3 ( 33%)
    +
    +
     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
     {
    +  integer(kind=4) D.4363;
    +  integer(kind=4) pt_attr_I_lsm.25;
    +  integer(kind=4) D.4361;
    +  integer(kind=4) pt_z_lsm.24;
    +  integer(kind=4) D.4359;
    +  integer(kind=4) pt_y_lsm.23;
    +  integer(kind=4) D.4357;
    +  integer(kind=4) pt_x_lsm.22;
    +  struct vec3 pt;
       integer(kind=4) .offset.15_2;
       integer(kind=4) .offset.10_4;
    +  integer(kind=4) _7(D);
       integer(kind=4) _25;
       integer(kind=4) _27;
       integer(kind=4) _29;
    @@ -37,21 +68,32 @@
       integer(kind=8) _43;
       integer(kind=4)[1025] * _45;
       integer(kind=4) _46;
    +  integer(kind=4) _47(D);
       integer(kind=4) _48;
       integer(kind=4) _50;
    +  integer(kind=4) _51(D);
       integer(kind=4) _52;
       integer(kind=4) _54;
       integer(kind=4) .offset.10_56;
       integer(kind=4) .offset.15_63;
    +  integer(kind=4) _70(D);

       <bb 2> [local count: 7128820]:
    +  pt_x_lsm.22_8 = _7(D);
    +  pt_y_lsm.23_49 = _47(D);
    +  pt_z_lsm.24_53 = _51(D);
    +  pt_attr_I_lsm.25_71 = _70(D);
       _45 = *.omp_data_i_44(D).arr;

       <bb 3> [local count: 87490071]:
       # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
    +  pt_x_lsm.22_72 = .offset.15_2;
       _25 = .offset.15_2 * 2;
    +  pt_y_lsm.23_73 = _25;
       _27 = .offset.15_2 * 4;
    +  pt_z_lsm.24_9 = _27;
       _29 = .offset.15_2 * 6;
    +  pt_attr_I_lsm.25_10 = _29;
       _41 = .offset.15_2 * 32;

       <bb 4> [local count: 1073741824]:
    @@ -84,6 +126,14 @@
       goto <bb 3>; [100.00%]

       <bb 6> [local count: 35644102]:
    +  # pt_z_lsm.24_20 = PHI <pt_z_lsm.24_9(5)>
    +  # pt_attr_I_lsm.25_21 = PHI <pt_attr_I_lsm.25_10(5)>
    +  # pt_x_lsm.22_1 = PHI <pt_x_lsm.22_72(5)>
    +  # pt_y_lsm.23_22 = PHI <pt_y_lsm.23_73(5)>
    +  pt.attr[4] = pt_attr_I_lsm.25_21;
    +  pt.z = pt_z_lsm.24_20;
    +  pt.y = pt_y_lsm.23_22;
    +  pt.x = pt_x_lsm.22_1;
       return;

     }
    [Similar for following passes/dumps.]
    diff -ru 0-O1/a-private-variables.f90.148t.dse3 ./a-private-variables.f90.148t.dse3
    --- 0-O1/a-private-variables.f90.148t.dse3      2022-04-12 08:36:54.525302868 +0200
    +++ ./a-private-variables.f90.148t.dse3 2022-04-12 12:51:43.730304125 +0200
    @@ -1,11 +1,24 @@

     ;; Function t4_._omp_fn.0 (t4_._omp_fn.0, funcdef_no=3, decl_uid=4278, cgraph_uid=4, symbol_order=3)

    +Removing basic block 7
    +Removing basic block 8
    +Removing basic block 9
     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
     {
    +  integer(kind=4) D.4363;
    +  integer(kind=4) pt_attr_I_lsm.25;
    +  integer(kind=4) D.4361;
    +  integer(kind=4) pt_z_lsm.24;
    +  integer(kind=4) D.4359;
    +  integer(kind=4) pt_y_lsm.23;
    +  integer(kind=4) D.4357;
    +  integer(kind=4) pt_x_lsm.22;
    +  struct vec3 pt;
       integer(kind=4) .offset.15_2;
       integer(kind=4) .offset.10_4;
    +  integer(kind=4) _7(D);
       integer(kind=4) _25;
       integer(kind=4) _27;
       integer(kind=4) _29;
    @@ -14,25 +27,28 @@
       integer(kind=8) _43;
       integer(kind=4)[1025] * _45;
       integer(kind=4) _46;
    +  integer(kind=4) _47(D);
       integer(kind=4) _48;
       integer(kind=4) _50;
    +  integer(kind=4) _51(D);
       integer(kind=4) _52;
       integer(kind=4) _54;
       integer(kind=4) .offset.10_56;
       integer(kind=4) .offset.15_63;
    +  integer(kind=4) _70(D);

       <bb 2> [local count: 7128820]:
       _45 = *.omp_data_i_44(D).arr;

       <bb 3> [local count: 87490071]:
    -  # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
    +  # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
       _25 = .offset.15_2 * 2;
       _27 = .offset.15_2 * 4;
       _29 = .offset.15_2 * 6;
       _41 = .offset.15_2 * 32;

       <bb 4> [local count: 1073741824]:
    -  # .offset.10_4 = PHI <0(3), .offset.10_56(8)>
    +  # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
       _42 = .offset.10_4 + _41;
       _43 = (integer(kind=8)) _42;
       _46 = (*_45)[_43];
    @@ -43,23 +59,17 @@
       (*_45)[_43] = _54;
       .offset.10_56 = .offset.10_4 + 1;
       if (.offset.10_56 <= 31)
    -    goto <bb 8>; [89.00%]
    +    goto <bb 4>; [89.00%]
       else
         goto <bb 5>; [11.00%]

    -  <bb 8> [local count: 955630224]:
    -  goto <bb 4>; [100.00%]
    -
       <bb 5> [local count: 437450365]:
       .offset.15_63 = .offset.15_2 + 1;
       if (.offset.15_63 <= 31)
    -    goto <bb 7>; [89.00%]
    +    goto <bb 3>; [89.00%]
       else
         goto <bb 6>; [11.00%]

    -  <bb 7> [local count: 389330825]:
    -  goto <bb 3>; [100.00%]
    -
       <bb 6> [local count: 35644102]:
       return;

    [Similar for following passes/dumps.]

..., so in 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}'
assignments for the new 'struct vec3 pt;' get cleaned out, so that should
all be fine; no actual changes in the end.

Comparing '-O1' nvptx offload target compilation before/after, the first
difference is in 'a.xnvptx-none.mkoffload.117t.dce2': similar to host
compilation.  But then, in the following passes, things do not get cleaned
up as they do for the host compilation; the 'pt.{x,y,z,attr}' assignments
for the new 'struct vec3 pt;' persist:

    diff -ru 0-O1/a.xnvptx-none.mkoffload.252t.optimized ./a.xnvptx-none.mkoffload.252t.optimized
    --- 0-O1/a.xnvptx-none.mkoffload.252t.optimized 2022-04-12 08:36:54.569303204 +0200
    +++ ./a.xnvptx-none.mkoffload.252t.optimized    2022-04-12 12:51:43.774304292 +0200
    @@ -7,34 +7,36 @@
     __attribute__((oacc function (32, 8, 32), oacc parallel, omp target entrypoint))
     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
     {
    -  unsigned int ivtmp$6;
    +  unsigned int ivtmp$8;
       unsigned int ivtmp$5;
    +  unsigned int ivtmp$3;
       int D.1527;
       int D.1524;
    +  struct vec3 pt;
       int _2;
    +  int[1025] * _4;
    +  int _22;
       int _23;
       int _25;
       int _27;
       int _29;
       int _34;
    -  int[1025] * _43;
    -  sizetype _45;
    -  sizetype _46;
    -  int[1025] * _47;
    -  unsigned int _48;
    +  sizetype _41;
    +  unsigned int _42;
    +  unsigned int _43;
    +  unsigned int _45;
    +  int _46;
       int _49;
    -  unsigned int _50;
       int _51;
    -  int _52;
    -  int _53;
    -  int _63;
    -  int _82;
    -  int _83;
    -  int _87;
    +  int _54;
    +  sizetype _63;
       int _95;
    +  int _96;
    +  int _99;
       int _102;
    -  int _103;
    +  int[1025] * _103;
       int _104;
    +  int _105;
       int _107;

       <bb 2> [local count: 7128820]:
    @@ -47,43 +49,50 @@
         goto <bb 7>; [73.00%]

       <bb 3> [local count: 1924781]:
    -  _87 = _104 * 32;
    -  ivtmp$5_80 = (unsigned int) _87;
    -  _52 = _104 * 2;
    -  ivtmp$6_54 = (unsigned int) _52;
    +  ivtmp$3_61 = (unsigned int) _104;
    +  _54 = _104 * 2;
    +  ivtmp$5_56 = (unsigned int) _54;
    +  _46 = _104 * 32;
    +  ivtmp$8_48 = (unsigned int) _46;
    +  _42 = (unsigned int) _23;

       <bb 4> [local count: 87490071]:
    -  # _2 = PHI <_104(3), _63(6)>
    -  # ivtmp$5_22 = PHI <ivtmp$5_80(3), ivtmp$5_81(6)>
    -  # ivtmp$6_72 = PHI <ivtmp$6_54(3), ivtmp$6_56(6)>
    -  _25 = (int) ivtmp$6_72;
    -  _50 = ivtmp$6_72 * 2;
    -  _27 = (int) _50;
    -  _48 = ivtmp$6_72 * 3;
    -  _29 = (int) _48;
    +  # ivtmp$3_106 = PHI <ivtmp$3_61(3), ivtmp$3_47(6)>
    +  # ivtmp$5_87 = PHI <ivtmp$5_56(3), ivtmp$5_72(6)>
    +  # ivtmp$8_52 = PHI <ivtmp$8_48(3), ivtmp$8_50(6)>
    +  _2 = (int) ivtmp$3_106;
    +  pt.x = _2;
    +  _25 = (int) ivtmp$5_87;
    +  pt.y = _25;
    +  _45 = ivtmp$5_87 * 2;
    +  _27 = (int) _45;
    +  pt.z = _27;
    +  _43 = ivtmp$5_87 * 3;
    +  _29 = (int) _43;
    +  pt.attr[4] = _29;
       _34 = .UNIQUE (OACC_FORK, 0, 2);

       <bb 5> [local count: 437450365]:
       _107 = .GOACC_DIM_POS (2);
    -  _82 = (int) ivtmp$5_22;
    -  _83 = _82 + _107;
    -  _47 = *.omp_data_i_44(D).arr;
    -  _46 = (sizetype) _83;
    -  _45 = _46 * 4;
    -  _43 = _47 + _45;
    -  _49 = MEM <int> [(int[1025] *)_43];
    -  _51 = _2 + _49;
    -  _53 = _25 + _51;
    -  _103 = _27 + _53;
    -  _95 = _29 + _103;
    -  MEM <int> [(int[1025] *)_43] = _95;
    +  _49 = (int) ivtmp$8_52;
    +  _51 = _49 + _107;
    +  _103 = *.omp_data_i_44(D).arr;
    +  _63 = (sizetype) _51;
    +  _41 = _63 * 4;
    +  _4 = _103 + _41;
    +  _95 = MEM <int> [(int[1025] *)_4];
    +  _96 = _2 + _95;
    +  _22 = _25 + _96;
    +  _105 = _22 + _27;
    +  _99 = _29 + _105;
    +  MEM <int> [(int[1025] *)_4] = _99;
       .UNIQUE (OACC_JOIN, _34, 2);

       <bb 6> [local count: 87490071]:
    -  _63 = _2 + 1;
    -  ivtmp$5_81 = ivtmp$5_22 + 32;
    -  ivtmp$6_56 = ivtmp$6_72 + 2;
    -  if (_23 != _63)
    +  ivtmp$3_47 = ivtmp$3_106 + 1;
    +  ivtmp$5_72 = ivtmp$5_87 + 2;
    +  ivtmp$8_50 = ivtmp$8_52 + 32;
    +  if (_42 != ivtmp$3_47)
         goto <bb 4>; [89.00%]
       else
         goto <bb 7>; [11.00%]

..., and thus the change in diagnostics:

     [...]
     source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: ‘gang’
     source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: ‘gang’
    +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90: In function ‘t4_._omp_fn.0’:
    +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ adjusted for OpenACC privatization level: ‘gang’
    +  131 |   !$acc loop gang private(pt) ! { dg-line l_loop[incr c_loop] }
    +      |                                                               ^

For '-O2', host compilation begins the same as for '-O1', and again in
'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}' assignments
for the new 'struct vec3 pt;' get cleaned out:

    --- ./a-private-variables.f90.144t.sink1        2022-04-12 14:28:19.173520425 +0200
    +++ ./a-private-variables.f90.148t.dse3 2022-04-12 14:28:19.173520425 +0200
    [...]
     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
     {
    @@ -97,10 +74,6 @@
       goto <bb 3>; [100.00%]

       <bb 6> [local count: 35644102]:
    -  pt.attr[4] = _29;
    -  pt.z = _27;
    -  pt.y = _25;
    -  pt.x = .offset.15_2;
       return;

     }
    [...]

For '-O2', nvptx offload target compilation looks very similar to host
compilation, and again in 'a.xnvptx-none.mkoffload.148t.dse3', the
'pt.{x,y,z,attr}' assignments for the new 'struct vec3 pt;' get cleaned
out:

    --- ./a.xnvptx-none.mkoffload.144t.sink1        2022-04-12 14:28:19.213520366 +0200
    +++ ./a.xnvptx-none.mkoffload.148t.dse3 2022-04-12 14:28:19.213520366 +0200
    [...]
     __attribute__((oacc function (32, 8, 32), oacc parallel, omp target entrypoint))
     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
     {
    @@ -34,13 +25,9 @@

       <bb 2> [local count: 7128820]:
       _104 = .GOACC_DIM_POS (0);
    -  pt.x = _104;
       _25 = _104 * 2;
    -  pt.y = _25;
       _27 = _104 * 4;
    -  pt.z = _27;
       _29 = _104 * 6;
    -  pt.attr[4] = _29;
       _34 = .UNIQUE (OACC_FORK, 0, 2);

       <bb 3> [local count: 437450365]:

..., so no actual changes in the end.

I have not verified other ("higher") optimization levels, but given no
change in diagnostics, I suppose the same ("no actual changes") happens
for those.

Is the '-O1' change/regression unexpected, so that it should be analyzed,
or should we just accept the slightly worse code generation (for '-O1'
only), with me adjusting the test case for the change in diagnostics
accordingly?


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Richard Biener April 12, 2022, 1:45 p.m. UTC | #5
On Tue, 12 Apr 2022, Thomas Schwinge wrote:

> Hi!
> 
> On 2022-04-07T15:04:15+0200, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
> >> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
> >> > > this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
> >> > > accesses relative to the base pointers which are parameters of functions.
> >> > > If it fails, it still makes difference between unknown memory access and
> >> > > global memory access.  The second makes it possible to disambiguate with
> >> > > memory that is not accessible from outside world (i.e. everything that does
> >> > > not escape from the caller function).  This is useful so we do not punt
> >> > > when unknown function is called.
> >> > >
> >> > > Now I added ref_may_access_global_memory_p to tree-ssa-alias which is using
> >> > > ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
> >> > > predicate: the second tests that the dereference may alias with global variable.
> >> > >
> >> > > In the testcase we are disambiguating heap allocated escaping memory which is
> >> > > not a global variable but it is still a global memory in the modref's sense.
> >> > > So we need to test in addition contains_escaped.
> >> > >
> >> > > The patch simply copies logic from the predicate and adds the check.
> >> > > I am not sure if there is better way to handle this?
> >> >
> >> > I'm testing the following variant which exposes this detail
> >> > (escaped local memory global or not) in the APIs that say "global"
> >> > which allows to remove ref_may_access_global_memory_p.
> >>
> >> Thank you.  Indeed it is better to have an explicit flag, since the
> >> clash of names is a bit sensitive.
> >
> > OK - bootstrapped / tested on x86_64-unknown-linux-gnu including Ada
> > and now pushed.
> 
> This commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
> "ipa/104303 - miscompilation of gnatmake" is causing one regression in
> nvptx offload testing:
> 
>     [...]
>     [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/private-variables.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1   at line 142 (test for bogus messages, line 131)
>     [...]
> 
> I've done a before/after 'diff' of
> '-fdump-tree-all -foffload-options=nvptx-none=-fdump-tree-all'
> with all functions and calls other than 't4' commented out.

I suppose the

diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 2a13ea34829..34ce8abe33a 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -315,7 +315,7 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
     }
 
   if ((gimple_vdef (stmt) && keep_all_vdefs_p ())
-      || stmt_may_clobber_global_p (stmt))
+      || stmt_may_clobber_global_p (stmt, true))
     {
       mark_stmt_necessary (stmt, true);
       return;

was overly conservative (I was probably misled by keep_all_vdefs_p ());
passing false to stmt_may_clobber_global_p should fix the regression?
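
As a rough illustration of what the second argument changes, here is a toy C++ model.  It is not GCC's implementation: the 'Target' enum and the function 'may_clobber_global' are hypothetical stand-ins for the real alias-oracle query, assuming only that the flag decides whether escaped local memory counts as "global":

```cpp
#include <cassert>

// Toy classification of the memory a store writes to (hypothetical;
// GCC derives this from points-to information instead).
enum class Target { GlobalVar, EscapedLocal, NonEscapedLocal };

// Hypothetical stand-in for the two-argument predicate: the flag says
// whether escaped local memory is to be treated as global memory.
bool may_clobber_global(Target t, bool escaped_local_is_global) {
    switch (t) {
    case Target::GlobalVar:
        return true;                     // always global
    case Target::EscapedLocal:
        return escaped_local_is_global;  // depends on the caller's view
    case Target::NonEscapedLocal:
        return false;                    // never global
    }
    return true;  // conservative fallback, not reached for valid input
}
```

In this model, a pass that asks with 'true' treats every store to an escaped local as one that may clobber global memory, while asking with 'false' leaves such stores to be judged by the pass's other checks.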

Richard.

  
Thomas Schwinge April 12, 2022, 3:07 p.m. UTC | #6
Hi!

On 2022-04-12T15:45:03+0200, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 12 Apr 2022, Thomas Schwinge wrote:
>> On 2022-04-07T15:04:15+0200, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
>> >> > On Thu, 7 Apr 2022, Jan Hubicka wrote:
>> >> > > this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
>> >> > > accesses relative to the base pointers which are parameters of functions.
>> >> > > If it fails, it still makes difference between unknown memory access and
>> >> > > global memory access.  The second makes it possible to disambiguate with
>> >> > > memory that is not accessible from outside world (i.e. everything that does
>> >> > > not escape from the caller function).  This is useful so we do not punt
>> >> > > when unknown function is called.
>> >> > >
>> >> > > Now I added ref_may_access_global_memory_p to tree-ssa-alias which is using
>> >> > > ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
>> >> > > predicate: the second tests that the dereference may alias with global variable.
>> >> > >
>> >> > > In the testcase we are disambiguating heap allocated escaping memory which is
>> >> > > not a global variable but it is still a global memory in the modref's sense.
>> >> > > So we need to test in addition contains_escaped.
>> >> > >
>> >> > > The patch simply copies logic from the predicate and adds the check.
>> >> > > I am not sure if there is better way to handle this?
>> >> >
>> >> > I'm testing the following variant which exposes this detail
>> >> > (escaped local memory global or not) in the APIs that say "global"
>> >> > which allows to remove ref_may_access_global_memory_p.
>> >>
>> >> Thank you.  Indeed it is better to have an explicit flag, since the
>> >> clash of names is a bit sensitive.
>> >
>> > OK - bootstrapped / tested on x86_64-unknown-linux-gnu including Ada
>> > and now pushed.
>>
>> This commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
>> "ipa/104303 - miscompilation of gnatmake" is causing one regression in
>> nvptx offload testing:
>>
>>     [...]
>>     [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/private-variables.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1   at line 142 (test for bogus messages, line 131)
>>     [...]
>>
>> I've done a before/after 'diff' of
>> '-fdump-tree-all -foffload-options=nvptx-none=-fdump-tree-all'
>> with all functions and calls other than 't4' commented out.
>
> I suppose the
>
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index 2a13ea34829..34ce8abe33a 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -315,7 +315,7 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
>      }
>
>    if ((gimple_vdef (stmt) && keep_all_vdefs_p ())
> -      || stmt_may_clobber_global_p (stmt))
> +      || stmt_may_clobber_global_p (stmt, true))
>      {
>        mark_stmt_necessary (stmt, true);
>        return;
>
> was overly conservative (I was probably misled by keep_all_vdefs_p ());
> passing false to stmt_may_clobber_global_p should fix the regression?

It does, thanks.  Per more 'diff'ing, this change again enables
the '-O1' host compilation 'a-private-variables.f90.117t.dce2' to
clean this up, and likewise for nvptx offload compilation
'a.xnvptx-none.mkoffload.117t.dce2', thus the extra diagnostic
disappears again.  In fact, for all optimization flag variants
that I've tried, we again get exactly the same dumps as before
commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
"ipa/104303 - miscompilation of gnatmake"!

So I'll run that through standard bootstrap/regression testing, and push?
Do you have a suggestion for the rationale to put into the commit log?


Grüße
 Thomas


>> For '-O0', there's no difference at all.
>>
>> For '-O1', for host compilation we see:
>>
>>     diff -ru 0-O1/a-private-variables.f90.117t.dce2 ./a-private-variables.f90.117t.dce2
>>     --- 0-O1/a-private-variables.f90.117t.dce2      2022-04-12 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.117t.dce2 2022-04-12 12:51:43.726304109 +0200
>>     @@ -30,9 +30,13 @@
>>
>>        <bb 3> [local count: 87490071]:
>>        # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>     +  pt.x = .offset.15_2;
>>        _25 = .offset.15_2 * 2;
>>     +  pt.y = _25;
>>        _27 = .offset.15_2 * 4;
>>     +  pt.z = _27;
>>        _29 = .offset.15_2 * 6;
>>     +  pt.attr[4] = _29;
>>
>>        <bb 4> [local count: 1073741824]:
>>        # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>     diff -ru 0-O1/a-private-variables.f90.118t.stdarg ./a-private-variables.f90.118t.stdarg
>>     --- 0-O1/a-private-variables.f90.118t.stdarg    2022-04-12 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.118t.stdarg       2022-04-12 12:51:43.726304109 +0200
>>     @@ -4,6 +4,7 @@
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     +  struct vec3 pt;
>>        integer(kind=4) .offset.15_2;
>>        integer(kind=4) .offset.10_4;
>>        integer(kind=4) _25;
>>     @@ -25,9 +26,13 @@
>>
>>        <bb 3> [local count: 87490071]:
>>        # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>     +  pt.x = .offset.15_2;
>>        _25 = .offset.15_2 * 2;
>>     +  pt.y = _25;
>>        _27 = .offset.15_2 * 4;
>>     +  pt.z = _27;
>>        _29 = .offset.15_2 * 6;
>>     +  pt.attr[4] = _29;
>>
>>        <bb 4> [local count: 1073741824]:
>>        # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>     [Similar for following passes/dumps.]
>>     diff -ru 0-O1/a-private-variables.f90.141t.lim2 ./a-private-variables.f90.141t.lim2
>>     --- 0-O1/a-private-variables.f90.141t.lim2      2022-04-12 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.141t.lim2 2022-04-12 12:51:43.730304125 +0200
>>     @@ -24,11 +24,42 @@
>>      ;; 5 succs { 7 6 }
>>      ;; 7 succs { 3 }
>>      ;; 6 succs { 1 }
>>     +
>>     +Symbols to be put in SSA form
>>     +{ D.4340 D.4356 D.4357 D.4358 D.4359 D.4360 D.4361 D.4362 D.4363 }
>>     +Incremental SSA update started at block: 0
>>     +Number of blocks in CFG: 9
>>     +Number of blocks to update: 8 ( 89%)
>>     +
>>     +
>>     +
>>     +SSA replacement table
>>     +N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>>     +
>>     +pt_x_lsm.22_1 -> { pt_x_lsm.22_72 }
>>     +pt_z_lsm.24_20 -> { pt_z_lsm.24_9 }
>>     +pt_attr_I_lsm.25_21 -> { pt_attr_I_lsm.25_10 }
>>     +pt_y_lsm.23_22 -> { pt_y_lsm.23_73 }
>>     +Incremental SSA update started at block: 3
>>     +Number of blocks in CFG: 9
>>     +Number of blocks to update: 3 ( 33%)
>>     +
>>     +
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     +  integer(kind=4) D.4363;
>>     +  integer(kind=4) pt_attr_I_lsm.25;
>>     +  integer(kind=4) D.4361;
>>     +  integer(kind=4) pt_z_lsm.24;
>>     +  integer(kind=4) D.4359;
>>     +  integer(kind=4) pt_y_lsm.23;
>>     +  integer(kind=4) D.4357;
>>     +  integer(kind=4) pt_x_lsm.22;
>>     +  struct vec3 pt;
>>        integer(kind=4) .offset.15_2;
>>        integer(kind=4) .offset.10_4;
>>     +  integer(kind=4) _7(D);
>>        integer(kind=4) _25;
>>        integer(kind=4) _27;
>>        integer(kind=4) _29;
>>     @@ -37,21 +68,32 @@
>>        integer(kind=8) _43;
>>        integer(kind=4)[1025] * _45;
>>        integer(kind=4) _46;
>>     +  integer(kind=4) _47(D);
>>        integer(kind=4) _48;
>>        integer(kind=4) _50;
>>     +  integer(kind=4) _51(D);
>>        integer(kind=4) _52;
>>        integer(kind=4) _54;
>>        integer(kind=4) .offset.10_56;
>>        integer(kind=4) .offset.15_63;
>>     +  integer(kind=4) _70(D);
>>
>>        <bb 2> [local count: 7128820]:
>>     +  pt_x_lsm.22_8 = _7(D);
>>     +  pt_y_lsm.23_49 = _47(D);
>>     +  pt_z_lsm.24_53 = _51(D);
>>     +  pt_attr_I_lsm.25_71 = _70(D);
>>        _45 = *.omp_data_i_44(D).arr;
>>
>>        <bb 3> [local count: 87490071]:
>>        # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
>>     +  pt_x_lsm.22_72 = .offset.15_2;
>>        _25 = .offset.15_2 * 2;
>>     +  pt_y_lsm.23_73 = _25;
>>        _27 = .offset.15_2 * 4;
>>     +  pt_z_lsm.24_9 = _27;
>>        _29 = .offset.15_2 * 6;
>>     +  pt_attr_I_lsm.25_10 = _29;
>>        _41 = .offset.15_2 * 32;
>>
>>        <bb 4> [local count: 1073741824]:
>>     @@ -84,6 +126,14 @@
>>        goto <bb 3>; [100.00%]
>>
>>        <bb 6> [local count: 35644102]:
>>     +  # pt_z_lsm.24_20 = PHI <pt_z_lsm.24_9(5)>
>>     +  # pt_attr_I_lsm.25_21 = PHI <pt_attr_I_lsm.25_10(5)>
>>     +  # pt_x_lsm.22_1 = PHI <pt_x_lsm.22_72(5)>
>>     +  # pt_y_lsm.23_22 = PHI <pt_y_lsm.23_73(5)>
>>     +  pt.attr[4] = pt_attr_I_lsm.25_21;
>>     +  pt.z = pt_z_lsm.24_20;
>>     +  pt.y = pt_y_lsm.23_22;
>>     +  pt.x = pt_x_lsm.22_1;
>>        return;
>>
>>      }
>>     [Similar for following passes/dumps.]
>>     diff -ru 0-O1/a-private-variables.f90.148t.dse3 ./a-private-variables.f90.148t.dse3
>>     --- 0-O1/a-private-variables.f90.148t.dse3      2022-04-12 08:36:54.525302868 +0200
>>     +++ ./a-private-variables.f90.148t.dse3 2022-04-12 12:51:43.730304125 +0200
>>     @@ -1,11 +1,24 @@
>>
>>      ;; Function t4_._omp_fn.0 (t4_._omp_fn.0, funcdef_no=3, decl_uid=4278, cgraph_uid=4, symbol_order=3)
>>
>>     +Removing basic block 7
>>     +Removing basic block 8
>>     +Removing basic block 9
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     +  integer(kind=4) D.4363;
>>     +  integer(kind=4) pt_attr_I_lsm.25;
>>     +  integer(kind=4) D.4361;
>>     +  integer(kind=4) pt_z_lsm.24;
>>     +  integer(kind=4) D.4359;
>>     +  integer(kind=4) pt_y_lsm.23;
>>     +  integer(kind=4) D.4357;
>>     +  integer(kind=4) pt_x_lsm.22;
>>     +  struct vec3 pt;
>>        integer(kind=4) .offset.15_2;
>>        integer(kind=4) .offset.10_4;
>>     +  integer(kind=4) _7(D);
>>        integer(kind=4) _25;
>>        integer(kind=4) _27;
>>        integer(kind=4) _29;
>>     @@ -14,25 +27,28 @@
>>        integer(kind=8) _43;
>>        integer(kind=4)[1025] * _45;
>>        integer(kind=4) _46;
>>     +  integer(kind=4) _47(D);
>>        integer(kind=4) _48;
>>        integer(kind=4) _50;
>>     +  integer(kind=4) _51(D);
>>        integer(kind=4) _52;
>>        integer(kind=4) _54;
>>        integer(kind=4) .offset.10_56;
>>        integer(kind=4) .offset.15_63;
>>     +  integer(kind=4) _70(D);
>>
>>        <bb 2> [local count: 7128820]:
>>        _45 = *.omp_data_i_44(D).arr;
>>
>>        <bb 3> [local count: 87490071]:
>>     -  # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
>>     +  # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>        _25 = .offset.15_2 * 2;
>>        _27 = .offset.15_2 * 4;
>>        _29 = .offset.15_2 * 6;
>>        _41 = .offset.15_2 * 32;
>>
>>        <bb 4> [local count: 1073741824]:
>>     -  # .offset.10_4 = PHI <0(3), .offset.10_56(8)>
>>     +  # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>        _42 = .offset.10_4 + _41;
>>        _43 = (integer(kind=8)) _42;
>>        _46 = (*_45)[_43];
>>     @@ -43,23 +59,17 @@
>>        (*_45)[_43] = _54;
>>        .offset.10_56 = .offset.10_4 + 1;
>>        if (.offset.10_56 <= 31)
>>     -    goto <bb 8>; [89.00%]
>>     +    goto <bb 4>; [89.00%]
>>        else
>>          goto <bb 5>; [11.00%]
>>
>>     -  <bb 8> [local count: 955630224]:
>>     -  goto <bb 4>; [100.00%]
>>     -
>>        <bb 5> [local count: 437450365]:
>>        .offset.15_63 = .offset.15_2 + 1;
>>        if (.offset.15_63 <= 31)
>>     -    goto <bb 7>; [89.00%]
>>     +    goto <bb 3>; [89.00%]
>>        else
>>          goto <bb 6>; [11.00%]
>>
>>     -  <bb 7> [local count: 389330825]:
>>     -  goto <bb 3>; [100.00%]
>>     -
>>        <bb 6> [local count: 35644102]:
>>        return;
>>
>>     [Similar for following passes/dumps.]
>>
>> ..., so in 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}'
>> assignments for the new 'struct vec3 pt;' get cleaned out, so that should
>> all be fine; no actual changes in the end.
>>
>> Comparing '-O1' nvptx offload target compilation before/after, the first
>> difference is in 'a.xnvptx-none.mkoffload.117t.dce2': similar to host
>> compilation.  But then, in the following passes, things do not get cleaned up as
>> they do for the host compilation; the 'pt.{x,y,z,attr}' assignments for
>> the new 'struct vec3 pt;' persist:
>>
>>     diff -ru 0-O1/a.xnvptx-none.mkoffload.252t.optimized ./a.xnvptx-none.mkoffload.252t.optimized
>>     --- 0-O1/a.xnvptx-none.mkoffload.252t.optimized 2022-04-12 08:36:54.569303204 +0200
>>     +++ ./a.xnvptx-none.mkoffload.252t.optimized    2022-04-12 12:51:43.774304292 +0200
>>     @@ -7,34 +7,36 @@
>>      __attribute__((oacc function (32, 8, 32), oacc parallel, omp target entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     -  unsigned int ivtmp$6;
>>     +  unsigned int ivtmp$8;
>>        unsigned int ivtmp$5;
>>     +  unsigned int ivtmp$3;
>>        int D.1527;
>>        int D.1524;
>>     +  struct vec3 pt;
>>        int _2;
>>     +  int[1025] * _4;
>>     +  int _22;
>>        int _23;
>>        int _25;
>>        int _27;
>>        int _29;
>>        int _34;
>>     -  int[1025] * _43;
>>     -  sizetype _45;
>>     -  sizetype _46;
>>     -  int[1025] * _47;
>>     -  unsigned int _48;
>>     +  sizetype _41;
>>     +  unsigned int _42;
>>     +  unsigned int _43;
>>     +  unsigned int _45;
>>     +  int _46;
>>        int _49;
>>     -  unsigned int _50;
>>        int _51;
>>     -  int _52;
>>     -  int _53;
>>     -  int _63;
>>     -  int _82;
>>     -  int _83;
>>     -  int _87;
>>     +  int _54;
>>     +  sizetype _63;
>>        int _95;
>>     +  int _96;
>>     +  int _99;
>>        int _102;
>>     -  int _103;
>>     +  int[1025] * _103;
>>        int _104;
>>     +  int _105;
>>        int _107;
>>
>>        <bb 2> [local count: 7128820]:
>>     @@ -47,43 +49,50 @@
>>          goto <bb 7>; [73.00%]
>>
>>        <bb 3> [local count: 1924781]:
>>     -  _87 = _104 * 32;
>>     -  ivtmp$5_80 = (unsigned int) _87;
>>     -  _52 = _104 * 2;
>>     -  ivtmp$6_54 = (unsigned int) _52;
>>     +  ivtmp$3_61 = (unsigned int) _104;
>>     +  _54 = _104 * 2;
>>     +  ivtmp$5_56 = (unsigned int) _54;
>>     +  _46 = _104 * 32;
>>     +  ivtmp$8_48 = (unsigned int) _46;
>>     +  _42 = (unsigned int) _23;
>>
>>        <bb 4> [local count: 87490071]:
>>     -  # _2 = PHI <_104(3), _63(6)>
>>     -  # ivtmp$5_22 = PHI <ivtmp$5_80(3), ivtmp$5_81(6)>
>>     -  # ivtmp$6_72 = PHI <ivtmp$6_54(3), ivtmp$6_56(6)>
>>     -  _25 = (int) ivtmp$6_72;
>>     -  _50 = ivtmp$6_72 * 2;
>>     -  _27 = (int) _50;
>>     -  _48 = ivtmp$6_72 * 3;
>>     -  _29 = (int) _48;
>>     +  # ivtmp$3_106 = PHI <ivtmp$3_61(3), ivtmp$3_47(6)>
>>     +  # ivtmp$5_87 = PHI <ivtmp$5_56(3), ivtmp$5_72(6)>
>>     +  # ivtmp$8_52 = PHI <ivtmp$8_48(3), ivtmp$8_50(6)>
>>     +  _2 = (int) ivtmp$3_106;
>>     +  pt.x = _2;
>>     +  _25 = (int) ivtmp$5_87;
>>     +  pt.y = _25;
>>     +  _45 = ivtmp$5_87 * 2;
>>     +  _27 = (int) _45;
>>     +  pt.z = _27;
>>     +  _43 = ivtmp$5_87 * 3;
>>     +  _29 = (int) _43;
>>     +  pt.attr[4] = _29;
>>        _34 = .UNIQUE (OACC_FORK, 0, 2);
>>
>>        <bb 5> [local count: 437450365]:
>>        _107 = .GOACC_DIM_POS (2);
>>     -  _82 = (int) ivtmp$5_22;
>>     -  _83 = _82 + _107;
>>     -  _47 = *.omp_data_i_44(D).arr;
>>     -  _46 = (sizetype) _83;
>>     -  _45 = _46 * 4;
>>     -  _43 = _47 + _45;
>>     -  _49 = MEM <int> [(int[1025] *)_43];
>>     -  _51 = _2 + _49;
>>     -  _53 = _25 + _51;
>>     -  _103 = _27 + _53;
>>     -  _95 = _29 + _103;
>>     -  MEM <int> [(int[1025] *)_43] = _95;
>>     +  _49 = (int) ivtmp$8_52;
>>     +  _51 = _49 + _107;
>>     +  _103 = *.omp_data_i_44(D).arr;
>>     +  _63 = (sizetype) _51;
>>     +  _41 = _63 * 4;
>>     +  _4 = _103 + _41;
>>     +  _95 = MEM <int> [(int[1025] *)_4];
>>     +  _96 = _2 + _95;
>>     +  _22 = _25 + _96;
>>     +  _105 = _22 + _27;
>>     +  _99 = _29 + _105;
>>     +  MEM <int> [(int[1025] *)_4] = _99;
>>        .UNIQUE (OACC_JOIN, _34, 2);
>>
>>        <bb 6> [local count: 87490071]:
>>     -  _63 = _2 + 1;
>>     -  ivtmp$5_81 = ivtmp$5_22 + 32;
>>     -  ivtmp$6_56 = ivtmp$6_72 + 2;
>>     -  if (_23 != _63)
>>     +  ivtmp$3_47 = ivtmp$3_106 + 1;
>>     +  ivtmp$5_72 = ivtmp$5_87 + 2;
>>     +  ivtmp$8_50 = ivtmp$8_52 + 32;
>>     +  if (_42 != ivtmp$3_47)
>>          goto <bb 4>; [89.00%]
>>        else
>>          goto <bb 7>; [11.00%]
>>
>> ..., and thus the change in diagnostics:
>>
>>      [...]
>>      source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: ‘gang’
>>      source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: ‘gang’
>>     +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90: In function ‘t4_._omp_fn.0’:
>>     +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ adjusted for OpenACC privatization level: ‘gang’
>>     +  131 |   !$acc loop gang private(pt) ! { dg-line l_loop[incr c_loop] }
>>     +      |                                                               ^
>>
>> For '-O2', host compilation begins same as '-O1', and again in
>> 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}' assignments
>> for the new 'struct vec3 pt;' get cleaned out:
>>
>>     --- ./a-private-variables.f90.144t.sink1        2022-04-12 14:28:19.173520425 +0200
>>     +++ ./a-private-variables.f90.148t.dse3 2022-04-12 14:28:19.173520425 +0200
>>     [...]
>>      __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     @@ -97,10 +74,6 @@
>>        goto <bb 3>; [100.00%]
>>
>>        <bb 6> [local count: 35644102]:
>>     -  pt.attr[4] = _29;
>>     -  pt.z = _27;
>>     -  pt.y = _25;
>>     -  pt.x = .offset.15_2;
>>        return;
>>
>>      }
>>     [...]
>>
>> For '-O2', nvptx offload target compilation looks very similar to host
>> compilation, and again in 'a.xnvptx-none.mkoffload.148t.dse3', the
>> 'pt.{x,y,z,attr}' assignments for the new 'struct vec3 pt;' get cleaned
>> out:
>>
>>     --- ./a.xnvptx-none.mkoffload.144t.sink1        2022-04-12 14:28:19.213520366 +0200
>>     +++ ./a.xnvptx-none.mkoffload.148t.dse3 2022-04-12 14:28:19.213520366 +0200
>>     [...]
>>      __attribute__((oacc function (32, 8, 32), oacc parallel, omp target entrypoint))
>>      void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>      {
>>     @@ -34,13 +25,9 @@
>>
>>        <bb 2> [local count: 7128820]:
>>        _104 = .GOACC_DIM_POS (0);
>>     -  pt.x = _104;
>>        _25 = _104 * 2;
>>     -  pt.y = _25;
>>        _27 = _104 * 4;
>>     -  pt.z = _27;
>>        _29 = _104 * 6;
>>     -  pt.attr[4] = _29;
>>        _34 = .UNIQUE (OACC_FORK, 0, 2);
>>
>>        <bb 3> [local count: 437450365]:
>>
>> ..., so no actual changes in the end.
>>
>> I have not verified other ("higher") optimization levels, but given no
>> change in diagnostics, I suppose the same ("no actual changes") happens
>> for those.
>>
>> Is the '-O1' change/regression unexpected, and should be analyzed, or
>> should we just accept the slightly worse code generation (for '-O1'
>> only), and I accordingly adjust the test case for the change in
>> diagnostics?
>>
>>
>> Regards
>>  Thomas
>> -----------------
>> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
  
Richard Biener April 12, 2022, 3:51 p.m. UTC | #7
> On 12.04.2022 at 17:08, Thomas Schwinge <thomas@codesourcery.com> wrote:
> 
> Hi!
> 
>> On 2022-04-12T15:45:03+0200, Richard Biener <rguenther@suse.de> wrote:
>>> On Tue, 12 Apr 2022, Thomas Schwinge wrote:
>>> On 2022-04-07T15:04:15+0200, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>> On Thu, 7 Apr 2022, Jan Hubicka wrote:
>>>>>> On Thu, 7 Apr 2022, Jan Hubicka wrote:
>>>>>>> this patch fixes miscompilation of gnatmake.  Modref attempts to track memory
>>>>>>> accesses relative to the base pointers which are parameters of functions.
>>>>>>> If it fails, it still distinguishes between unknown memory access and
>>>>>>> global memory access.  The latter makes it possible to disambiguate with
>>>>>>> memory that is not accessible from the outside world (i.e. everything that does
>>>>>>> not escape from the caller function).  This is useful so we do not punt
>>>>>>> when an unknown function is called.
>>>>>>> 
>>>>>>> Now I added ref_may_access_global_memory_p to tree-ssa-alias, which uses
>>>>>>> ptr_deref_may_alias_global_p.  There is however a shift in meaning of this
>>>>>>> predicate: the latter tests whether the dereference may alias with a global variable.
>>>>>>> 
>>>>>>> In the testcase we are disambiguating heap-allocated escaping memory, which is
>>>>>>> not a global variable but is still global memory in modref's sense.
>>>>>>> So we need to additionally test contains_escaped.
>>>>>>> 
>>>>>>> The patch simply copies logic from the predicate and adds the check.
>>>>>>> I am not sure if there is a better way to handle this?
>>>>>> 
>>>>>> I'm testing the following variant, which exposes this detail
>>>>>> (whether escaped local memory counts as global or not) in the APIs that
>>>>>> say "global", which allows removing ref_may_access_global_memory_p.
>>>>> 
>>>>> Thank you.  Indeed it is better to have an explicit flag, since the
>>>>> clash of names is a bit sensitive.
>>>> 
>>>> OK - bootstrapped / tested on x86_64-unknown-linux-gnu including Ada
>>>> and now pushed.
>>> 
>>> This commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
>>> "ipa/104303 - miscompilation of gnatmake" is causing one regression in
>>> nvptx offload testing:
>>> 
>>>    [...]
>>>    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/private-variables.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1   at line 142 (test for bogus messages, line 131)
>>>    [...]
>>> 
>>> I've done a before/after 'diff' of
>>> '-fdump-tree-all -foffload-options=nvptx-none=-fdump-tree-all'
>>> with all functions and calls other than 't4' commented out.
>> 
>> I suppose the
>> 
>> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
>> index 2a13ea34829..34ce8abe33a 100644
>> --- a/gcc/tree-ssa-dce.cc
>> +++ b/gcc/tree-ssa-dce.cc
>> @@ -315,7 +315,7 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
>>     }
>> 
>>   if ((gimple_vdef (stmt) && keep_all_vdefs_p ())
>> -      || stmt_may_clobber_global_p (stmt))
>> +      || stmt_may_clobber_global_p (stmt, true))
>>     {
>>       mark_stmt_necessary (stmt, true);
>>       return;
>> 
>> was overly conservative (I was probably misled by keep_all_vdefs_p ());
>> passing false to stmt_may_clobber_global_p should fix the regression?
> 
> It does, thanks.  Per more 'diff'ing, this change again enables
> the '-O1' host compilation 'a-private-variables.f90.117t.dce2' to
> clean this up, and likewise for nvptx offload compilation
> 'a.xnvptx-none.mkoffload.117t.dce2', thus the extra diagnostic
> again disappears.  In fact, for all optimization flag variants
> that I've tried, we again get exactly the same dumps as before
> commit r12-8048-g8c0ebaf9f586100920a3c0849fb10e9985d7ae58
> "ipa/104303 - miscompilation of gnatmake"!
> 
> So I'll run that through standard bootstrap/regression testing, and push?
> Got a suggestion for rationale to put into the commit log?

I have already tested and pushed it.

Richard.
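
For readers following the thread, the effect of the boolean argument can be sketched with a toy model.  This is not GCC code: the types, the names (StoreTarget, may_clobber_global, obviously_necessary) and the classification of the example stores are all hypothetical.  The only assumption taken from the discussion is that the flag decides whether escaped local memory counts as "global".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types and names are hypothetical and are not
// GCC's internal representation.
enum class StoreTarget {
  GlobalVariable,  // store to a global variable
  EscapedLocal,    // store to local memory whose address has escaped
  PrivateLocal     // store to local memory that never escapes
};

struct Store {
  std::string lhs;
  StoreTarget target;
};

// Toy analogue of a "may clobber global memory" query.  The flag decides
// whether escaped local memory is counted as global.
bool may_clobber_global(const Store &s, bool escaped_local_is_global) {
  switch (s.target) {
    case StoreTarget::GlobalVariable:
      return true;
    case StoreTarget::EscapedLocal:
      return escaped_local_is_global;
    case StoreTarget::PrivateLocal:
      return false;
  }
  return false;
}

// Toy analogue of the "obviously necessary" marking step of a DCE pass:
// returns the stores that would be kept unconditionally.
std::vector<std::string> obviously_necessary(const std::vector<Store> &stores,
                                             bool escaped_local_is_global) {
  std::vector<std::string> kept;
  for (const Store &s : stores)
    if (may_clobber_global(s, escaped_local_is_global))
      kept.push_back(s.lhs);
  return kept;
}

// Usage example: the same three stores under both flag values.
// Treating "pt.x" as an escaped local is an assumption for illustration.
inline void example() {
  const std::vector<Store> stores = {
      {"g", StoreTarget::GlobalVariable},
      {"pt.x", StoreTarget::EscapedLocal},
      {"tmp", StoreTarget::PrivateLocal},
  };
  assert(obviously_necessary(stores, true).size() == 2);   // g and pt.x kept
  assert(obviously_necessary(stores, false).size() == 1);  // only g kept
}
```

In this model, passing true forces every store to escaped local memory to be kept, while passing false leaves such stores to the normal liveness reasoning, which is the direction of the change discussed above.  Whether 'pt' is classified this way inside GCC is not shown in the thread.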

> 
> Regards
> Thomas
> 
> 
>>> For '-O0', there's no difference at all.
>>> 
>>> For '-O1', for host compilation we see:
>>> 
>>>    diff -ru 0-O1/a-private-variables.f90.117t.dce2 ./a-private-variables.f90.117t.dce2
>>>    --- 0-O1/a-private-variables.f90.117t.dce2      2022-04-12 08:36:54.525302868 +0200
>>>    +++ ./a-private-variables.f90.117t.dce2 2022-04-12 12:51:43.726304109 +0200
>>>    @@ -30,9 +30,13 @@
>>> 
>>>       <bb 3> [local count: 87490071]:
>>>       # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>>    +  pt.x = .offset.15_2;
>>>       _25 = .offset.15_2 * 2;
>>>    +  pt.y = _25;
>>>       _27 = .offset.15_2 * 4;
>>>    +  pt.z = _27;
>>>       _29 = .offset.15_2 * 6;
>>>    +  pt.attr[4] = _29;
>>> 
>>>       <bb 4> [local count: 1073741824]:
>>>       # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>>    diff -ru 0-O1/a-private-variables.f90.118t.stdarg ./a-private-variables.f90.118t.stdarg
>>>    --- 0-O1/a-private-variables.f90.118t.stdarg    2022-04-12 08:36:54.525302868 +0200
>>>    +++ ./a-private-variables.f90.118t.stdarg       2022-04-12 12:51:43.726304109 +0200
>>>    @@ -4,6 +4,7 @@
>>>     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>>     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>>     {
>>>    +  struct vec3 pt;
>>>       integer(kind=4) .offset.15_2;
>>>       integer(kind=4) .offset.10_4;
>>>       integer(kind=4) _25;
>>>    @@ -25,9 +26,13 @@
>>> 
>>>       <bb 3> [local count: 87490071]:
>>>       # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>>    +  pt.x = .offset.15_2;
>>>       _25 = .offset.15_2 * 2;
>>>    +  pt.y = _25;
>>>       _27 = .offset.15_2 * 4;
>>>    +  pt.z = _27;
>>>       _29 = .offset.15_2 * 6;
>>>    +  pt.attr[4] = _29;
>>> 
>>>       <bb 4> [local count: 1073741824]:
>>>       # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>>    [Similar for following passes/dumps.]
>>>    diff -ru 0-O1/a-private-variables.f90.141t.lim2 ./a-private-variables.f90.141t.lim2
>>>    --- 0-O1/a-private-variables.f90.141t.lim2      2022-04-12 08:36:54.525302868 +0200
>>>    +++ ./a-private-variables.f90.141t.lim2 2022-04-12 12:51:43.730304125 +0200
>>>    @@ -24,11 +24,42 @@
>>>     ;; 5 succs { 7 6 }
>>>     ;; 7 succs { 3 }
>>>     ;; 6 succs { 1 }
>>>    +
>>>    +Symbols to be put in SSA form
>>>    +{ D.4340 D.4356 D.4357 D.4358 D.4359 D.4360 D.4361 D.4362 D.4363 }
>>>    +Incremental SSA update started at block: 0
>>>    +Number of blocks in CFG: 9
>>>    +Number of blocks to update: 8 ( 89%)
>>>    +
>>>    +
>>>    +
>>>    +SSA replacement table
>>>    +N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>>>    +
>>>    +pt_x_lsm.22_1 -> { pt_x_lsm.22_72 }
>>>    +pt_z_lsm.24_20 -> { pt_z_lsm.24_9 }
>>>    +pt_attr_I_lsm.25_21 -> { pt_attr_I_lsm.25_10 }
>>>    +pt_y_lsm.23_22 -> { pt_y_lsm.23_73 }
>>>    +Incremental SSA update started at block: 3
>>>    +Number of blocks in CFG: 9
>>>    +Number of blocks to update: 3 ( 33%)
>>>    +
>>>    +
>>>     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>>     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>>     {
>>>    +  integer(kind=4) D.4363;
>>>    +  integer(kind=4) pt_attr_I_lsm.25;
>>>    +  integer(kind=4) D.4361;
>>>    +  integer(kind=4) pt_z_lsm.24;
>>>    +  integer(kind=4) D.4359;
>>>    +  integer(kind=4) pt_y_lsm.23;
>>>    +  integer(kind=4) D.4357;
>>>    +  integer(kind=4) pt_x_lsm.22;
>>>    +  struct vec3 pt;
>>>       integer(kind=4) .offset.15_2;
>>>       integer(kind=4) .offset.10_4;
>>>    +  integer(kind=4) _7(D);
>>>       integer(kind=4) _25;
>>>       integer(kind=4) _27;
>>>       integer(kind=4) _29;
>>>    @@ -37,21 +68,32 @@
>>>       integer(kind=8) _43;
>>>       integer(kind=4)[1025] * _45;
>>>       integer(kind=4) _46;
>>>    +  integer(kind=4) _47(D);
>>>       integer(kind=4) _48;
>>>       integer(kind=4) _50;
>>>    +  integer(kind=4) _51(D);
>>>       integer(kind=4) _52;
>>>       integer(kind=4) _54;
>>>       integer(kind=4) .offset.10_56;
>>>       integer(kind=4) .offset.15_63;
>>>    +  integer(kind=4) _70(D);
>>> 
>>>       <bb 2> [local count: 7128820]:
>>>    +  pt_x_lsm.22_8 = _7(D);
>>>    +  pt_y_lsm.23_49 = _47(D);
>>>    +  pt_z_lsm.24_53 = _51(D);
>>>    +  pt_attr_I_lsm.25_71 = _70(D);
>>>       _45 = *.omp_data_i_44(D).arr;
>>> 
>>>       <bb 3> [local count: 87490071]:
>>>       # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
>>>    +  pt_x_lsm.22_72 = .offset.15_2;
>>>       _25 = .offset.15_2 * 2;
>>>    +  pt_y_lsm.23_73 = _25;
>>>       _27 = .offset.15_2 * 4;
>>>    +  pt_z_lsm.24_9 = _27;
>>>       _29 = .offset.15_2 * 6;
>>>    +  pt_attr_I_lsm.25_10 = _29;
>>>       _41 = .offset.15_2 * 32;
>>> 
>>>       <bb 4> [local count: 1073741824]:
>>>    @@ -84,6 +126,14 @@
>>>       goto <bb 3>; [100.00%]
>>> 
>>>       <bb 6> [local count: 35644102]:
>>>    +  # pt_z_lsm.24_20 = PHI <pt_z_lsm.24_9(5)>
>>>    +  # pt_attr_I_lsm.25_21 = PHI <pt_attr_I_lsm.25_10(5)>
>>>    +  # pt_x_lsm.22_1 = PHI <pt_x_lsm.22_72(5)>
>>>    +  # pt_y_lsm.23_22 = PHI <pt_y_lsm.23_73(5)>
>>>    +  pt.attr[4] = pt_attr_I_lsm.25_21;
>>>    +  pt.z = pt_z_lsm.24_20;
>>>    +  pt.y = pt_y_lsm.23_22;
>>>    +  pt.x = pt_x_lsm.22_1;
>>>       return;
>>> 
>>>     }
>>>    [Similar for following passes/dumps.]
>>>    diff -ru 0-O1/a-private-variables.f90.148t.dse3 ./a-private-variables.f90.148t.dse3
>>>    --- 0-O1/a-private-variables.f90.148t.dse3      2022-04-12 08:36:54.525302868 +0200
>>>    +++ ./a-private-variables.f90.148t.dse3 2022-04-12 12:51:43.730304125 +0200
>>>    @@ -1,11 +1,24 @@
>>> 
>>>     ;; Function t4_._omp_fn.0 (t4_._omp_fn.0, funcdef_no=3, decl_uid=4278, cgraph_uid=4, symbol_order=3)
>>> 
>>>    +Removing basic block 7
>>>    +Removing basic block 8
>>>    +Removing basic block 9
>>>     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>>     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>>     {
>>>    +  integer(kind=4) D.4363;
>>>    +  integer(kind=4) pt_attr_I_lsm.25;
>>>    +  integer(kind=4) D.4361;
>>>    +  integer(kind=4) pt_z_lsm.24;
>>>    +  integer(kind=4) D.4359;
>>>    +  integer(kind=4) pt_y_lsm.23;
>>>    +  integer(kind=4) D.4357;
>>>    +  integer(kind=4) pt_x_lsm.22;
>>>    +  struct vec3 pt;
>>>       integer(kind=4) .offset.15_2;
>>>       integer(kind=4) .offset.10_4;
>>>    +  integer(kind=4) _7(D);
>>>       integer(kind=4) _25;
>>>       integer(kind=4) _27;
>>>       integer(kind=4) _29;
>>>    @@ -14,25 +27,28 @@
>>>       integer(kind=8) _43;
>>>       integer(kind=4)[1025] * _45;
>>>       integer(kind=4) _46;
>>>    +  integer(kind=4) _47(D);
>>>       integer(kind=4) _48;
>>>       integer(kind=4) _50;
>>>    +  integer(kind=4) _51(D);
>>>       integer(kind=4) _52;
>>>       integer(kind=4) _54;
>>>       integer(kind=4) .offset.10_56;
>>>       integer(kind=4) .offset.15_63;
>>>    +  integer(kind=4) _70(D);
>>> 
>>>       <bb 2> [local count: 7128820]:
>>>       _45 = *.omp_data_i_44(D).arr;
>>> 
>>>       <bb 3> [local count: 87490071]:
>>>    -  # .offset.15_2 = PHI <0(2), .offset.15_63(7)>
>>>    +  # .offset.15_2 = PHI <0(2), .offset.15_63(5)>
>>>       _25 = .offset.15_2 * 2;
>>>       _27 = .offset.15_2 * 4;
>>>       _29 = .offset.15_2 * 6;
>>>       _41 = .offset.15_2 * 32;
>>> 
>>>       <bb 4> [local count: 1073741824]:
>>>    -  # .offset.10_4 = PHI <0(3), .offset.10_56(8)>
>>>    +  # .offset.10_4 = PHI <0(3), .offset.10_56(4)>
>>>       _42 = .offset.10_4 + _41;
>>>       _43 = (integer(kind=8)) _42;
>>>       _46 = (*_45)[_43];
>>>    @@ -43,23 +59,17 @@
>>>       (*_45)[_43] = _54;
>>>       .offset.10_56 = .offset.10_4 + 1;
>>>       if (.offset.10_56 <= 31)
>>>    -    goto <bb 8>; [89.00%]
>>>    +    goto <bb 4>; [89.00%]
>>>       else
>>>         goto <bb 5>; [11.00%]
>>> 
>>>    -  <bb 8> [local count: 955630224]:
>>>    -  goto <bb 4>; [100.00%]
>>>    -
>>>       <bb 5> [local count: 437450365]:
>>>       .offset.15_63 = .offset.15_2 + 1;
>>>       if (.offset.15_63 <= 31)
>>>    -    goto <bb 7>; [89.00%]
>>>    +    goto <bb 3>; [89.00%]
>>>       else
>>>         goto <bb 6>; [11.00%]
>>> 
>>>    -  <bb 7> [local count: 389330825]:
>>>    -  goto <bb 3>; [100.00%]
>>>    -
>>>       <bb 6> [local count: 35644102]:
>>>       return;
>>> 
>>>    [Similar for following passes/dumps.]
>>> 
>>> ..., so in 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}'
>>> assignments for the new 'struct vec3 pt;' get cleaned out, so that should
>>> all be fine; no actual changes in the end.
>>> 
>>> Comparing '-O1' nvptx offload target compilation before/after, the first
>>> difference is in 'a.xnvptx-none.mkoffload.117t.dce2': similar to host
>>> compilation.  But then, in the following passes, things do not get cleaned up as
>>> they do for the host compilation; the 'pt.{x,y,z,attr}' assignments for
>>> the new 'struct vec3 pt;' persist:
>>> 
>>>    diff -ru 0-O1/a.xnvptx-none.mkoffload.252t.optimized ./a.xnvptx-none.mkoffload.252t.optimized
>>>    --- 0-O1/a.xnvptx-none.mkoffload.252t.optimized 2022-04-12 08:36:54.569303204 +0200
>>>    +++ ./a.xnvptx-none.mkoffload.252t.optimized    2022-04-12 12:51:43.774304292 +0200
>>>    @@ -7,34 +7,36 @@
>>>     __attribute__((oacc function (32, 8, 32), oacc parallel, omp target entrypoint))
>>>     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>>     {
>>>    -  unsigned int ivtmp$6;
>>>    +  unsigned int ivtmp$8;
>>>       unsigned int ivtmp$5;
>>>    +  unsigned int ivtmp$3;
>>>       int D.1527;
>>>       int D.1524;
>>>    +  struct vec3 pt;
>>>       int _2;
>>>    +  int[1025] * _4;
>>>    +  int _22;
>>>       int _23;
>>>       int _25;
>>>       int _27;
>>>       int _29;
>>>       int _34;
>>>    -  int[1025] * _43;
>>>    -  sizetype _45;
>>>    -  sizetype _46;
>>>    -  int[1025] * _47;
>>>    -  unsigned int _48;
>>>    +  sizetype _41;
>>>    +  unsigned int _42;
>>>    +  unsigned int _43;
>>>    +  unsigned int _45;
>>>    +  int _46;
>>>       int _49;
>>>    -  unsigned int _50;
>>>       int _51;
>>>    -  int _52;
>>>    -  int _53;
>>>    -  int _63;
>>>    -  int _82;
>>>    -  int _83;
>>>    -  int _87;
>>>    +  int _54;
>>>    +  sizetype _63;
>>>       int _95;
>>>    +  int _96;
>>>    +  int _99;
>>>       int _102;
>>>    -  int _103;
>>>    +  int[1025] * _103;
>>>       int _104;
>>>    +  int _105;
>>>       int _107;
>>> 
>>>       <bb 2> [local count: 7128820]:
>>>    @@ -47,43 +49,50 @@
>>>         goto <bb 7>; [73.00%]
>>> 
>>>       <bb 3> [local count: 1924781]:
>>>    -  _87 = _104 * 32;
>>>    -  ivtmp$5_80 = (unsigned int) _87;
>>>    -  _52 = _104 * 2;
>>>    -  ivtmp$6_54 = (unsigned int) _52;
>>>    +  ivtmp$3_61 = (unsigned int) _104;
>>>    +  _54 = _104 * 2;
>>>    +  ivtmp$5_56 = (unsigned int) _54;
>>>    +  _46 = _104 * 32;
>>>    +  ivtmp$8_48 = (unsigned int) _46;
>>>    +  _42 = (unsigned int) _23;
>>> 
>>>       <bb 4> [local count: 87490071]:
>>>    -  # _2 = PHI <_104(3), _63(6)>
>>>    -  # ivtmp$5_22 = PHI <ivtmp$5_80(3), ivtmp$5_81(6)>
>>>    -  # ivtmp$6_72 = PHI <ivtmp$6_54(3), ivtmp$6_56(6)>
>>>    -  _25 = (int) ivtmp$6_72;
>>>    -  _50 = ivtmp$6_72 * 2;
>>>    -  _27 = (int) _50;
>>>    -  _48 = ivtmp$6_72 * 3;
>>>    -  _29 = (int) _48;
>>>    +  # ivtmp$3_106 = PHI <ivtmp$3_61(3), ivtmp$3_47(6)>
>>>    +  # ivtmp$5_87 = PHI <ivtmp$5_56(3), ivtmp$5_72(6)>
>>>    +  # ivtmp$8_52 = PHI <ivtmp$8_48(3), ivtmp$8_50(6)>
>>>    +  _2 = (int) ivtmp$3_106;
>>>    +  pt.x = _2;
>>>    +  _25 = (int) ivtmp$5_87;
>>>    +  pt.y = _25;
>>>    +  _45 = ivtmp$5_87 * 2;
>>>    +  _27 = (int) _45;
>>>    +  pt.z = _27;
>>>    +  _43 = ivtmp$5_87 * 3;
>>>    +  _29 = (int) _43;
>>>    +  pt.attr[4] = _29;
>>>       _34 = .UNIQUE (OACC_FORK, 0, 2);
>>> 
>>>       <bb 5> [local count: 437450365]:
>>>       _107 = .GOACC_DIM_POS (2);
>>>    -  _82 = (int) ivtmp$5_22;
>>>    -  _83 = _82 + _107;
>>>    -  _47 = *.omp_data_i_44(D).arr;
>>>    -  _46 = (sizetype) _83;
>>>    -  _45 = _46 * 4;
>>>    -  _43 = _47 + _45;
>>>    -  _49 = MEM <int> [(int[1025] *)_43];
>>>    -  _51 = _2 + _49;
>>>    -  _53 = _25 + _51;
>>>    -  _103 = _27 + _53;
>>>    -  _95 = _29 + _103;
>>>    -  MEM <int> [(int[1025] *)_43] = _95;
>>>    +  _49 = (int) ivtmp$8_52;
>>>    +  _51 = _49 + _107;
>>>    +  _103 = *.omp_data_i_44(D).arr;
>>>    +  _63 = (sizetype) _51;
>>>    +  _41 = _63 * 4;
>>>    +  _4 = _103 + _41;
>>>    +  _95 = MEM <int> [(int[1025] *)_4];
>>>    +  _96 = _2 + _95;
>>>    +  _22 = _25 + _96;
>>>    +  _105 = _22 + _27;
>>>    +  _99 = _29 + _105;
>>>    +  MEM <int> [(int[1025] *)_4] = _99;
>>>       .UNIQUE (OACC_JOIN, _34, 2);
>>> 
>>>       <bb 6> [local count: 87490071]:
>>>    -  _63 = _2 + 1;
>>>    -  ivtmp$5_81 = ivtmp$5_22 + 32;
>>>    -  ivtmp$6_56 = ivtmp$6_72 + 2;
>>>    -  if (_23 != _63)
>>>    +  ivtmp$3_47 = ivtmp$3_106 + 1;
>>>    +  ivtmp$5_72 = ivtmp$5_87 + 2;
>>>    +  ivtmp$8_50 = ivtmp$8_52 + 32;
>>>    +  if (_42 != ivtmp$3_47)
>>>         goto <bb 4>; [89.00%]
>>>       else
>>>         goto <bb 7>; [11.00%]
>>> 
>>> ..., and thus the change in diagnostics:
>>> 
>>>     [...]
>>>     source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: ‘gang’
>>>     source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ ought to be adjusted for OpenACC privatization level: ‘gang’
>>>    +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90: In function ‘t4_._omp_fn.0’:
>>>    +source-gcc/libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90:131:63: note: variable ‘pt’ adjusted for OpenACC privatization level: ‘gang’
>>>    +  131 |   !$acc loop gang private(pt) ! { dg-line l_loop[incr c_loop] }
>>>    +      |                                                               ^
>>> 
>>> For '-O2', host compilation begins the same as for '-O1', and again in
>>> 'a-private-variables.f90.148t.dse3', the 'pt.{x,y,z,attr}' assignments
>>> for the new 'struct vec3 pt;' get cleaned out:
>>> 
>>>    --- ./a-private-variables.f90.144t.sink1        2022-04-12 14:28:19.173520425 +0200
>>>    +++ ./a-private-variables.f90.148t.dse3 2022-04-12 14:28:19.173520425 +0200
>>>    [...]
>>>     __attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint))
>>>     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>>     {
>>>    @@ -97,10 +74,6 @@
>>>       goto <bb 3>; [100.00%]
>>> 
>>>       <bb 6> [local count: 35644102]:
>>>    -  pt.attr[4] = _29;
>>>    -  pt.z = _27;
>>>    -  pt.y = _25;
>>>    -  pt.x = .offset.15_2;
>>>       return;
>>> 
>>>     }
>>>    [...]
>>> 
>>> For '-O2', nvptx offload target compilation looks very similar to host
>>> compilation, and again in 'a.xnvptx-none.mkoffload.148t.dse3', the
>>> 'pt.{x,y,z,attr}' assignments for the new 'struct vec3 pt;' get cleaned
>>> out:
>>> 
>>>    --- ./a.xnvptx-none.mkoffload.144t.sink1        2022-04-12 14:28:19.213520366 +0200
>>>    +++ ./a.xnvptx-none.mkoffload.148t.dse3 2022-04-12 14:28:19.213520366 +0200
>>>    [...]
>>>     __attribute__((oacc function (32, 8, 32), oacc parallel, omp target entrypoint))
>>>     void t4_._omp_fn.0 (const struct .omp_data_t.1 & restrict .omp_data_i)
>>>     {
>>>    @@ -34,13 +25,9 @@
>>> 
>>>       <bb 2> [local count: 7128820]:
>>>       _104 = .GOACC_DIM_POS (0);
>>>    -  pt.x = _104;
>>>       _25 = _104 * 2;
>>>    -  pt.y = _25;
>>>       _27 = _104 * 4;
>>>    -  pt.z = _27;
>>>       _29 = _104 * 6;
>>>    -  pt.attr[4] = _29;
>>>       _34 = .UNIQUE (OACC_FORK, 0, 2);
>>> 
>>>       <bb 3> [local count: 437450365]:
>>> 
>>> ..., so no actual changes in the end.
>>> 
>>> I have not verified other ("higher") optimization levels, but given no
>>> change in diagnostics, I suppose the same ("no actual changes") happens
>>> for those.
>>> 
>>> Is the '-O1' change/regression unexpected, so that it should be
>>> analyzed, or should we just accept the slightly worse code generation
>>> (for '-O1' only), with me adjusting the test case accordingly for the
>>> change in diagnostics?
>>> 
>>> 
>>> Regards
>>> Thomas
>>> -----------------
>>> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
>>> 
>> 
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
  
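For readers skimming the thread, the decision made by the MEM_REF/TARGET_MEM_REF branch of the patch below can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: `toy_pt_solution`, `toy_old_check` and `toy_new_check` are hypothetical names, not GCC API, and the single `includes_global` flag stands in for whatever `pt_solution_includes_global` actually computes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the points-to solution attached
   to an SSA pointer; the real GCC structure carries much more state.  */
struct toy_pt_solution
{
  bool includes_global;        /* models pt_solution_includes_global () */
  bool vars_contains_escaped;  /* models pi->pt.vars_contains_escaped */
};

/* Toy model of the check before the patch: only "may alias a global
   variable" is considered.  IS_SSA_NAME false models a pointer constant,
   PT == NULL models missing points-to information.  */
bool
toy_old_check (bool is_ssa_name, const struct toy_pt_solution *pt)
{
  if (!is_ssa_name || !pt)
    return true;
  return pt->includes_global;
}

/* Toy model of the check after the patch: escaped memory also counts as
   global memory in modref's sense.  */
bool
toy_new_check (bool is_ssa_name, const struct toy_pt_solution *pt)
{
  if (!is_ssa_name || !pt)
    return true;
  return pt->includes_global || pt->vars_contains_escaped;
}
```

In this toy model, a pointer to escaping heap memory (`includes_global` false, `vars_contains_escaped` true) is reported as not accessing global memory by the old check and as possibly accessing it by the new one, which is the case the commit message describes.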

Patch

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 50bd47b31f3..9e34f76c3cb 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -2578,8 +2578,24 @@  ref_may_access_global_memory_p (ao_ref *ref)
   if (TREE_CODE (base) == MEM_REF
       || TREE_CODE (base) == TARGET_MEM_REF)
     {
-      if (ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0)))
+      struct ptr_info_def *pi;
+      tree ptr = TREE_OPERAND (base, 0);
+
+      /* If we end up with a pointer constant here that may point
+	 to global memory.  */
+      if (TREE_CODE (ptr) != SSA_NAME)
+	return true;
+
+      pi = SSA_NAME_PTR_INFO (ptr);
+
+      /* If we do not have points-to information for this variable,
+	 we have to punt.  */
+      if (!pi)
 	return true;
+
+      /* ???  This does not use TBAA to prune globals ptr may not access.  */
+      return pt_solution_includes_global (&pi->pt)
+	     || pi->pt.vars_contains_escaped;
     }
   else
     {