OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

Message ID 18c3aed8-71dd-9b7f-6c7c-da529876d3f5@codesourcery.com
State New
Headers
Series OpenMP/Fortran: Partially fix non-rect loop nests [PR107424] |

Commit Message

Tobias Burnus Jan. 19, 2023, 2:40 p.m. UTC
  This is all about non-rectangular loop nests in OpenMP.

The attached patch depends on the obvious fix for https://gcc.gnu.org/PR108459,
which is together with a nice testcase in Jakub's WIP patch attached to the PR;
without, gfortran.dg/gomp/canonical-loop-1.f90 fails with an ICE (segfault).

My patch fixes part of the Fortran issues found. Namely, it ensures that a "regular"
non-rectangular loop nest actually works by passing the outer-loop-var, the multiplier
and offset in a TREE_VEC to the middle end. It additionally avoids pointlessly
creating a temporary variable for a VAR_DECL (main advantage: dump looks cleaner and
avoids some dependency analysis) - and likewise for 'step' given that 'step' was
evaluated before.

There is an additional issue - not quite addressed in this patch: There are cases
when a loop variable is replaced by another variable ('count') and then at the beginning
of the loop body, the original variable gets the value from the count variable. Obviously,
this no longer works with non-rectangular loop nests.
The 'count' appears in two cases: (a) when the iteration step is not 1 or -1 and (b) if
the iteration variable is a pointer (scalar with allocatable, pointer, optional argument
or just a dummy argument; oddly, even if it has the value attribute).

There is pending work to be done in this case, as mentioned in comment 6 and 8 of the PR.
This patch adds some 'sorry' messages for them. I hope and think that I have not missed
a case where 'count' is used which I did not catch, but I should have all or at least most.

OK for mainline, once the other patch has been committed?

Tobias

PS: I still need to verify that everything is fine, once the other patch has been committed.
A flaky mainboard on the laptop causes multiple random freezes per day, which makes testing
+ patch writing a bit harder. (At least the mainboard replacement is scheduled for tomorrow :-) )
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Comments

Jakub Jelinek Jan. 20, 2023, 5:39 p.m. UTC | #1
On Thu, Jan 19, 2023 at 03:40:19PM +0100, Tobias Burnus wrote:
> +  gfc_symbol *var = code->ext.iterator->var->symtree->n.sym;
> +
> +  gfc_se se;
> +  tree tree_var, a1, a2;
> +  a1 = integer_one_node;
> +  a2 = integer_zero_node;
> +
> +  gfc_init_se (&se, NULL);
> +  gfc_conv_expr_lhs (&se, code->ext.iterator->var);
> +  gfc_add_block_to_block (pblock, &se.pre);
> +  tree_var = se.expr;
> +
> +  {
> +    /* FIXME: Handle non-unity iterations, cf. PR fortran/107424.

I think instead of non-unity etc. it is better to talk about
constant step 1 or -1.

> +       The issue is that for those a 'count' variable is used.  */
> +    dovar_init *di;
> +    unsigned ix;
> +    tree t = tree_var;
> +    while (TREE_CODE (t) == INDIRECT_REF)
> +      t = TREE_OPERAND (t, 0);
> +    FOR_EACH_VEC_ELT (*inits, ix, di)
> +      {
> +	tree t2 = di->var;
> +	while (TREE_CODE (t2) == INDIRECT_REF)
> +	  t2 = TREE_OPERAND (t2, 0);

The actual problem with non-simple loops for non-rectangular loops is
both in case it is an inner loop which uses some outer loop's iterator,
or if it is outer loop whose iterator is used, both of those cases
will not be handled properly.  The former case because instead of
having lb and ub expressions in canonicalized form var-outer * m + a
lb will be 0 (that is fine) and ub will be
(var-outer * m2 + a2 + step - var-outer * m1 - a1) / step
or so (sure, we can simplify that to
(var-outer * (m1 - m2) + (a2 + step - a1)) / step
but the division remains.  And the latter case is bad because we
need var-outer but we actually compute some artificial count iterator
and var-outer is only initialized in the body of the loop.
These sorry_at seems to handle just one of those, when the outer
loop whose var-outer is referenced is not simple, no?

I wonder if it wouldn't be cleaner and easier to simply remember for
each loop in XALLOCAVEC array whether it was simple or not and why
(from the:
      if (VAR_P (dovar))
        {
          if (integer_onep (step))
            simple = 1;
          else if (tree_int_cst_equal (step, integer_minus_one_node))
            simple = -1;
        }
      else
        dovar_decl
          = gfc_trans_omp_variable (code->ext.iterator->var->symtree->n.sym,
                                    false);
remember if it was simple (1/-1) or VAR_P !simple (then we would
if needed for non-rect sorry_at about step not being constant 1 or -1)
or if it is the !VAR_P case.
And then the non-rect sorry can be emitted for both the cases easily
(especially if you precompute the:
      if (VAR_P (dovar))
        {
          if (integer_onep (step))
            simple_loop[i] = 1;
          else if (tree_int_cst_equal (step, integer_minus_one_node))
            simple_loop[i] = -1;
	  else
	    simple_loop[i] = 0;
        }
      else
	simple_loop[i] = 2;
early) and in this function check it for both loop_n and i.

> +	if (t == t2)
> +	  {
> +	    HOST_WIDE_INT intval;
> +	    if (gfc_extract_hwi (code->ext.iterator->step, &intval, 0) == 0
> +		&& intval != 1 && intval != -1)
> +	      sorry_at (gfc_get_location (&code->loc),
> +			"non-rectangular loop nest with non-unit loop iteration"
> +			" step for %qs", var->name);

I'd say step other than constant 1 or -1.

> +  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops

unit ?

> +  ! Then same, execpt use nonunit stride for 'k'

except, non-unit ?

> +  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
> +  ! Then same, execpt use nonunit stride for 'k'

2x again
(and some more later).

	Jakub
  
Jakub Jelinek Jan. 20, 2023, 6 p.m. UTC | #2
On Fri, Jan 20, 2023 at 06:39:04PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > +       The issue is that for those a 'count' variable is used.  */
> > +    dovar_init *di;
> > +    unsigned ix;
> > +    tree t = tree_var;
> > +    while (TREE_CODE (t) == INDIRECT_REF)
> > +      t = TREE_OPERAND (t, 0);
> > +    FOR_EACH_VEC_ELT (*inits, ix, di)
> > +      {
> > +	tree t2 = di->var;
> > +	while (TREE_CODE (t2) == INDIRECT_REF)
> > +	  t2 = TREE_OPERAND (t2, 0);
> 
> The actual problem with non-simple loops for non-rectangular loops is
> both in case it is an inner loop which uses some outer loop's iterator,
> or if it is outer loop whose iterator is used, both of those cases
> will not be handled properly.  The former case because instead of
> having lb and ub expressions in canonicalized form var-outer * m + a
> lb will be 0 (that is fine) and ub will be
> (var-outer * m2 + a2 + step - var-outer * m1 - a1) / step
> or so (sure, we can simplify that to
> (var-outer * (m1 - m2) + (a2 + step - a1)) / step
> but the division remains.  And the latter case is bad because we
> need var-outer but we actually compute some artificial count iterator
> and var-outer is only initialized in the body of the loop.
> These sorry_at seems to handle just one of those, when the outer
> loop whose var-outer is referenced is not simple, no?

Though, I wonder if we shouldn't for GCC 13 just sorry_at about
steps other than constant 1/-1 (in both outer loop with var-outer referenced
in inner loop and on inner loop that references it) and for the !VAR_P case
actually handle it if step 1/-1 by using simple like translation just with
an artificial iterator.
Say for:
subroutine foo (x, y, z)
  integer :: x, y, z
  !$omp do private (x)
  do x = y, z
  end do
end subroutine foo
we right now in *.original dump have:
    D.4265 = *y;
    D.4266 = *z;
    D.4267 = (1 - D.4265) + D.4266;
    #pragma omp for private(count.0) private(x)
    for (count.0 = 0; count.0 < D.4267; count.0 = count.0 + 1)
      {
        *x = D.4265 + NON_LVALUE_EXPR <count.0>;
        L.1:;
      }
What I'd suggest is:
    D.4265 = *y;
    D.4266 = *z;
    #pragma omp for private(x)
    for (x.0 = D.4265; x.0 <= D.4266; x.0 = x.0 + 1)
      {
        *x = x.0;
        L.1:;
      }
or so.  This could be done independently from the non-rect stuff,
as a first change.

	Jakub
  
Jakub Jelinek Jan. 20, 2023, 8:02 p.m. UTC | #3
On Fri, Jan 20, 2023 at 07:00:18PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Though, I wonder if we shouldn't for GCC 13 just sorry_at about
> steps other than constant 1/-1 (in both outer loop with var-outer referenced
> in inner loop and on inner loop that references it) and for the !VAR_P case
> actually handle it if step 1/-1 by using simple like translation just with
> an artificial iterator.

As for the steps other than constant 1/-1, we have 5 cases:
  do i = x, y, 25
or
  do i = 12, 72, z
or
  do i = x, y, -42
or
  do i = 42, -10, z
or
  do i = x, y, z
The 1st and 3rd are with constant step, 2nd and 4th with constant lower and
upper bounds and the last one has step and at least one of the bounds
non-constant.

I wonder if in the light of e.g. PR108431 which says that
do i = -huge(i), huge(i) is invalid (well, that one would be very wrong
even from OpenMP POV because computing number of iterations definitely
overflows) and the fact that we handle step 1 and -1 the simple way
do do i = huge(i) - 10, huge(i) will not work either, I wonder if even
do i = huge(i) - 5, huge(i) - 1, 2 is undefined (similar reasoning, if
i after loop needs to be set to the huge(i) + 1 it is signed integer
overflow).  If yes, then perhaps at least the first 4 cases could be easily
handled (perhaps for GCC 13 just if clauses->non_rectangular only) as
for (i = x; i <= y; i += 25)
or
for (i = 12; i <= 72; i += z)
or
for (i = x; i >= y; i -= 42)
or
for (i = 42; i >= -10; i += z)

If those give equivalent behavior, then that would mean a sorry
only for the last case - the problem is that we then don't know at compile
time the direction.
Though perhaps even for that case we could play tricks, handle
  do i = x, y, z
as
if (z > 0)
  a = x, b = y, c = z;
else
  a = INT_MIN, b = too_lazy_to_compute_that_now, c = -z;
for (counter = a; counter <= b; counter += c)
{
  if (z > 0)
    i = counter;
  else
    i = counter - (unsigned) INT_MAX;
}
If that works, we'd need to figure also out how to handle that
in the non-rect cases.  But the m1 * var-outer + a1 and m2 * var-outer + a2
factors can be non-constant invariants, so again we could compute something
for them depending on if the outer or inner step was positive or negative.

	Jakub
  

Patch

OpenMP/Fortran: Partially fix non-rect loop nests [PR107424]

This patch ensures that loop bounds depending on outer loop vars use the
proper TREE_VEC format. It additionally gives a sorry if such an outer
var has a non-one/non-minus-one increment as currently a count variable
is used in this case (see PR).

gcc/fortran/ChangeLog:

	PR fortran/107424
	* trans-openmp.cc (gfc_nonrect_loop_expr): New.
	(gfc_trans_omp_do): Call it for start/end loop bound
	for non-rectangular loop nests.

gcc/testsuite/

	PR fortran/107424
	* gfortran.dg/gomp/non-rectangular-loop-3.f90: New test.

libgomp/ChangeLog:

	PR fortran/107424
	* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
	* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.

 gcc/fortran/trans-openmp.cc                        | 167 +++++-
 .../gfortran.dg/gomp/non-rectangular-loop-3.f90    |  85 +++
 .../libgomp.fortran/non-rectangular-loop-1.f90     | 637 +++++++++++++++++++++
 .../libgomp.fortran/non-rectangular-loop-1a.f90    | 374 ++++++++++++
 .../libgomp.fortran/non-rectangular-loop-2.f90     | 243 ++++++++
 5 files changed, 1495 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..73376894316 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5120,6 +5120,136 @@  typedef struct dovar_init_d {
   tree init;
 } dovar_init;
 
+static bool
+gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n,
+		       gfc_code *code, gfc_expr *expr, vec<dovar_init> *inits)
+{
+  int i;
+  for (i = 0; i < loop_n; i++)
+    {
+      gcc_assert (code->ext.iterator->var->expr_type == EXPR_VARIABLE);
+      if (gfc_find_sym_in_expr (code->ext.iterator->var->symtree->n.sym, expr))
+	break;
+      code = code->block->next;
+    }
+  if (i >= loop_n)
+    return false;
+
+  /* Canonic format: TREE_VEC with [var, multiplier, offset].  */
+  gfc_symbol *var = code->ext.iterator->var->symtree->n.sym;
+
+  gfc_se se;
+  tree tree_var, a1, a2;
+  a1 = integer_one_node;
+  a2 = integer_zero_node;
+
+  gfc_init_se (&se, NULL);
+  gfc_conv_expr_lhs (&se, code->ext.iterator->var);
+  gfc_add_block_to_block (pblock, &se.pre);
+  tree_var = se.expr;
+
+  {
+    /* FIXME: Handle non-unity iterations, cf. PR fortran/107424.
+       The issue is that for those a 'count' variable is used.  */
+    dovar_init *di;
+    unsigned ix;
+    tree t = tree_var;
+    while (TREE_CODE (t) == INDIRECT_REF)
+      t = TREE_OPERAND (t, 0);
+    FOR_EACH_VEC_ELT (*inits, ix, di)
+      {
+	tree t2 = di->var;
+	while (TREE_CODE (t2) == INDIRECT_REF)
+	  t2 = TREE_OPERAND (t2, 0);
+	if (t == t2)
+	  {
+	    HOST_WIDE_INT intval;
+	    if (gfc_extract_hwi (code->ext.iterator->step, &intval, 0) == 0
+		&& intval != 1 && intval != -1)
+	      sorry_at (gfc_get_location (&code->loc),
+			"non-rectangular loop nest with non-unit loop iteration"
+			" step for %qs", var->name);
+	    else
+	      sorry_at (gfc_get_location (&code->loc),
+			"non-rectangular loop nest with dummy-argument or "
+			"pointer, optional or allocatable do-variable %qs",
+			var->name);
+
+	    inform (gfc_get_location (&expr->where), "Used here");
+	    return false;
+	  }
+      }
+  }
+
+  if (expr->expr_type == EXPR_VARIABLE)
+    gcc_assert (expr->symtree->n.sym == var);
+  else if (expr->expr_type != EXPR_OP
+	   || (expr->value.op.op != INTRINSIC_TIMES
+	       && expr->value.op.op != INTRINSIC_PLUS
+	       && expr->value.op.op != INTRINSIC_MINUS))
+    gcc_unreachable ();
+  else
+    {
+      gfc_expr *et = NULL, *eo = NULL, *e = expr;
+      if (expr->value.op.op != INTRINSIC_TIMES)
+	{
+	  if (gfc_find_sym_in_expr (var, expr->value.op.op1))
+	    {
+	      e = expr->value.op.op1;
+	      eo = expr->value.op.op2;
+	    }
+	  else
+	    {
+	      eo = expr->value.op.op1;
+	      e = expr->value.op.op2;
+	    }
+	}
+      if (e->value.op.op == INTRINSIC_TIMES)
+	{
+	  if (e->value.op.op1->expr_type == EXPR_VARIABLE
+	      && e->value.op.op1->symtree->n.sym == var)
+	    et = e->value.op.op2;
+	  else
+	    {
+	      et = e->value.op.op1;
+	      gcc_assert (e->value.op.op2->expr_type == EXPR_VARIABLE
+			  && e->value.op.op2->symtree->n.sym == var);
+	    }
+	}
+      else
+	gcc_assert (e->expr_type == EXPR_VARIABLE && e->symtree->n.sym == var);
+      if (et != NULL)
+	{
+	  gfc_init_se (&se, NULL);
+	  gfc_conv_expr_val (&se, et);
+	  gfc_add_block_to_block (pblock, &se.pre);
+	  a1 = se.expr;
+	}
+      if (eo != NULL)
+	{
+	  gfc_init_se (&se, NULL);
+	  gfc_conv_expr_val (&se, eo);
+	  gfc_add_block_to_block (pblock, &se.pre);
+	  a2 = se.expr;
+	  if (expr->value.op.op == INTRINSIC_MINUS && expr->value.op.op2 == eo)
+	    /* outer-var - a2.  */
+	    a2 = fold_build1 (NEGATE_EXPR, TREE_TYPE (a2), a2);
+	  else if (expr->value.op.op == INTRINSIC_MINUS)
+	    /* a2 - outer-var.  */
+	    a1 = fold_build1 (NEGATE_EXPR, TREE_TYPE (a1), a1);
+	}
+      a1 = DECL_P (a1) ? a1 : gfc_evaluate_now (a1, pblock);
+      a2 = DECL_P (a2) ? a2 : gfc_evaluate_now (a2, pblock);
+    }
+
+  gfc_init_se (sep, NULL);
+  sep->expr = make_tree_vec (3);
+  TREE_VEC_ELT (sep->expr, 0) = tree_var;
+  TREE_VEC_ELT (sep->expr, 1) = fold_convert (TREE_TYPE (tree_var), a1);
+  TREE_VEC_ELT (sep->expr, 2) = fold_convert (TREE_TYPE (tree_var), a2);
+
+  return true;
+}
 
 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
@@ -5219,19 +5349,35 @@  gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
       gcc_assert (TREE_CODE (type) == INTEGER_TYPE);
 
       gfc_init_se (&se, NULL);
-      gfc_conv_expr_val (&se, code->ext.iterator->start);
-      gfc_add_block_to_block (pblock, &se.pre);
-      from = gfc_evaluate_now (se.expr, pblock);
+      if (!clauses->non_rectangular
+	  || !gfc_nonrect_loop_expr (pblock, &se, i, orig_code->block->next,
+				     code->ext.iterator->start, &inits))
+	{
+	  gfc_conv_expr_val (&se, code->ext.iterator->start);
+	  gfc_add_block_to_block (pblock, &se.pre);
+	  if (!DECL_P (se.expr))
+	    se.expr = gfc_evaluate_now (se.expr, pblock);
+	}
+      from = se.expr;
 
       gfc_init_se (&se, NULL);
-      gfc_conv_expr_val (&se, code->ext.iterator->end);
-      gfc_add_block_to_block (pblock, &se.pre);
-      to = gfc_evaluate_now (se.expr, pblock);
+      if (!clauses->non_rectangular
+	  || !gfc_nonrect_loop_expr (pblock, &se, i, orig_code->block->next,
+				     code->ext.iterator->end, &inits))
+	{
+	  gfc_conv_expr_val (&se, code->ext.iterator->end);
+	  gfc_add_block_to_block (pblock, &se.pre);
+	  if (!DECL_P (se.expr))
+	    se.expr = gfc_evaluate_now (se.expr, pblock);
+	}
+      to = se.expr;
 
       gfc_init_se (&se, NULL);
       gfc_conv_expr_val (&se, code->ext.iterator->step);
       gfc_add_block_to_block (pblock, &se.pre);
-      step = gfc_evaluate_now (se.expr, pblock);
+      if (!DECL_P (se.expr))
+	se.expr = gfc_evaluate_now (se.expr, pblock);
+      step = se.expr;
       dovar_decl = dovar;
 
       /* Special case simple loops.  */
@@ -5331,9 +5477,9 @@  gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 	      OMP_CLAUSE_LINEAR_NO_COPYIN (tmp) = 1;
 	      OMP_CLAUSE_DECL (tmp) = dovar_decl;
 	      omp_clauses = gfc_trans_add_clause (tmp, omp_clauses);
+	      if (!simple)
+		dovar_found = 3;
 	    }
-	  if (!simple)
-	    dovar_found = 3;
 	}
       else if (!dovar_found && !simple)
 	{
@@ -5367,9 +5513,8 @@  gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 		}
 	      else
 		{
-		  tmp = gfc_evaluate_now (step, pblock);
 		  tmp = fold_build2_loc (input_location, PLUS_EXPR, type,
-					 dovar, tmp);
+					 dovar, step);
 		}
 	      tmp = fold_build2_loc (input_location, MODIFY_EXPR, type,
 				     dovar, tmp);
diff --git a/gcc/testsuite/gfortran.dg/gomp/non-rectangular-loop-3.f90 b/gcc/testsuite/gfortran.dg/gomp/non-rectangular-loop-3.f90
new file mode 100644
index 00000000000..5c8e92589dc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/non-rectangular-loop-3.f90
@@ -0,0 +1,85 @@ 
+! PR fortran/107424
+
+subroutine foo (av, avo, a0, a0o, a1, a2, a3, a4)
+implicit none
+
+integer, value :: av
+integer, value :: avo
+integer :: a0
+integer :: a0o
+integer, pointer :: a1
+integer, pointer, optional :: a2
+integer, allocatable :: a3
+integer, allocatable, optional :: a4
+integer :: a5
+integer, pointer :: a6
+integer, allocatable :: a7
+
+integer :: j
+
+!$omp simd collapse(2)
+do av = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'av'" }
+  do j = av, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do avo = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'avo'" }
+  do j = avo, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a0 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a0'" }
+  do j = a0, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a0o = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a0o'" }
+  do j = a0o, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a1 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a1'" }
+  do j = a1, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a2 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a2'" }
+  do j = a2, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a3 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a3'" }
+  do j = a3, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a4 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a4'" }
+  do j = a4, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a5 = 1, 10
+  do j = a5, 20
+  end do
+end do
+
+!$omp simd collapse(2)
+do a6 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a6'" }
+  do j = a6, 20  ! { dg-note "Used here" }
+  end do
+end do
+
+!$omp simd collapse(2)
+do a7 = 1, 10  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with dummy-argument or pointer, optional or allocatable do-variable 'a7'" }
+  do j = a7, 20  ! { dg-note "Used here" }
+  end do
+end do
+end
diff --git a/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-1.f90 b/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-1.f90
new file mode 100644
index 00000000000..99bfa1a0c98
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-1.f90
@@ -0,0 +1,637 @@ 
+! { dg-do run }
+! { dg-additional-options "-msse2" { target sse2_runtime } }
+! { dg-additional-options "-mavx" { target avx_runtime } }
+
+! PR fortran/107424
+
+! Nonrectangular loop nests checks
+
+! See PR or non-rectangular-loop-1a.f90 for the commented tests
+! Hint: Those use strides for loop vars part of nonrectangular loop nests
+
+module m
+  implicit none (type, external)
+contains
+
+! The 'k' loop uses i or j as start value
+! but a constant end value such that 'lastprivate'
+! should be well-defined
+subroutine lastprivate_check_simd_1
+  integer :: n,m,p, i,j,k
+
+  n = 11
+  m = 23
+  p = 27
+
+  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
+  ! Then same, execpt use nonunit stride for 'k'
+
+!  !$omp simd collapse(3) lastprivate(k)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+!  !$omp simd collapse(3) lastprivate(k)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) then
+          print *, i, j, k,p, " -> i, j, k, p   (k < 1 - 41 .or. k > p)"
+          error stop
+        end if
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same but 'private' for all (i,j) vars
+
+!  !$omp simd collapse(3) lastprivate(k) private(i,j)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!
+!  !$omp simd collapse(3) lastprivate(k) private(i,j)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same - but with lastprivate(i,j)
+
+!  !$omp simd collapse(3) lastprivate(k) lastprivate(i,j)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!  if (i /= n + 1 .or. j /= m + 2) error stop
+
+!  !$omp simd collapse(3) lastprivate(k) lastprivate(i,j)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!  if (i /= n + 2 .or. j /= m + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 2 .or. j /= m + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 1 .or. j /= m + 2) error stop
+end subroutine lastprivate_check_simd_1
+
+
+! Same but with do simd
+subroutine lastprivate_check_do_simd_1
+  integer :: n,m,p, i,j,k
+
+  n = 11
+  m = 23
+  p = 27
+
+  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
+  ! Then same, execpt use nonunit stride for 'k'
+
+!  !$omp parallel do simd collapse(3) lastprivate(k)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+!  !$omp parallel do simd collapse(3) lastprivate(k)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same but 'private' for all (i,j) vars
+
+!  !$omp parallel do simd collapse(3) lastprivate(k) private(i,j)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+!  !$omp parallel do simd collapse(3) lastprivate(k) private(i,j)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same - but with lastprivate(i,j)
+
+!  !$omp parallel do simd collapse(3) lastprivate(k) lastprivate(i,j)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!  if (i /= n + 1 .or. j /= m + 2) error stop
+
+!  !$omp parallel do simd collapse(3) lastprivate(k) lastprivate(i,j)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!  if (i /= n + 2 .or. j /= m + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 2 .or. j /= m + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 1 .or. j /= m + 2) error stop
+end subroutine lastprivate_check_do_simd_1
+
+
+
+! Same but with do
+subroutine lastprivate_check_do_1
+  integer :: n,m,p, i,j,k
+
+  n = 11
+  m = 23
+  p = 27
+
+  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
+  ! Then same, execpt use nonunit stride for 'k'
+
+!  !$omp parallel do collapse(3) lastprivate(k)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+!  !$omp parallel do collapse(3) lastprivate(k)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same but 'private' for all (i,j) vars
+
+!  !$omp parallel do collapse(3) lastprivate(k) private(i,j)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+!  !$omp parallel do collapse(3) lastprivate(k) private(i,j)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same - but with lastprivate(i,j)
+
+!  !$omp parallel do collapse(3) lastprivate(k) lastprivate(i,j)
+!  do i = 1, n
+!    do j = 1, m, 2
+!      do k = j - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!  if (i /= n + 1 .or. j /= m + 2) error stop
+
+!  !$omp parallel do collapse(3) lastprivate(k) lastprivate(i,j)
+!  do i = 1, n, 2
+!    do j = 1, m
+!      do k = i - 41, p
+!        if (k < 1 - 41 .or. k > p) error stop
+!      end do
+!    end do
+!  end do
+!  if (k /= p + 1) error stop
+!  if (i /= n + 2 .or. j /= m + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n, 2
+    do j = 1, m
+      do k = j - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 2 .or. j /= m + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n
+    do j = 1, m, 2
+      do k = i - 41, p
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 1 .or. j /= m + 2) error stop
+end subroutine lastprivate_check_do_1
+
+
+
+subroutine lastprivate_check_2
+  integer :: n,m,p, i,j,k,ll
+
+  n = 11
+  m = 23
+  p = 27
+
+!  !$omp parallel do simd collapse(3) lastprivate(p)
+!  do i = 1, n
+!    do j = 1, m,2
+!      do k = 1, j + 41
+!        do ll = 1, p, 2
+!          if (k > 23 + 41 .or. k < 1) error stop
+!        end do
+!      end do
+!    end do
+!  end do
+!  if (ll /= 29) error stop
+
+!  !$omp simd collapse(3) lastprivate(p)
+!  do i = 1, n
+!    do j = 1, m,2
+!      do k = 1, j + 41
+!        do ll = 1, p, 2
+!          if (k > 23 + 41 .or. k < 1) error stop
+!        end do
+!      end do
+!    end do
+!  end do
+!  if (ll /= 29) error stop
+
+!  !$omp simd collapse(3) lastprivate(k)
+!  do i = 1, n,2
+!    do j = 1, m
+!      do k = 1, i + 41
+!        if (k > 11 + 41 .or. k < 1) error stop
+!      end do
+!    end do
+!  end do
+!if (k /= 53) then
+!  print *, k, 53
+!  error stop
+!endif
+
+!$omp simd collapse(3) lastprivate(k)
+do i = 1, n,2
+  do j = 1, m
+    do k = 1, j + 41
+      if (k > 23 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 65) then
+  print *, k, 65
+  error stop
+endif
+
+
+!$omp simd collapse(3) lastprivate(k)
+do i = 1, n
+  do j = 1, m,2
+    do k = 1, i + 41
+      if (k > 11 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 53) then
+  print *, k, 53
+  error stop
+endif
+
+! - Same but without 'private':
+!!$omp simd collapse(3) lastprivate(k)
+!do i = 1, n
+!  do j = 1, m,2
+!    do k = 1, j + 41
+!      if (k > 23 + 41 .or. k < 1) error stop
+!    end do
+!  end do
+!end do
+!if (k /= 65) then
+!  print *, k, 65
+!  error stop
+!endif
+
+
+!!$omp simd collapse(3) lastprivate(k)
+!do i = 1, n,2
+!  do j = 1, m
+!    do k = 1, i + 41
+!      if (k > 11 + 41 .or. k < 1) error stop
+!    end do
+!  end do
+!end do
+!if (k /= 53) then
+!  print *, k, 53
+!  error stop
+!endif
+
+!$omp simd collapse(3) lastprivate(k)
+do i = 1, n,2
+  do j = 1, m
+    do k = 1, j + 41
+      if (k > 23 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 65) then
+  print *, k, 65
+  error stop
+endif
+
+
+!$omp simd collapse(3) lastprivate(k)
+do i = 1, n
+  do j = 1, m,2
+    do k = 1, i + 41
+      if (k > 11 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 53) then
+  print *, k, 53
+  error stop
+endif
+
+! - all with lastprivate
+!!$omp simd collapse(3) lastprivate(k) lastprivate(i, j)
+!do i = 1, n
+!  do j = 1, m,2
+!    do k = 1, j + 41
+!      if (k > 23 + 41 .or. k < 1) error stop
+!    end do
+!  end do
+!end do
+!if (k /= 65) then
+!  print *, k, 65
+!  error stop
+!endif
+
+
+!!$omp simd collapse(3) lastprivate(k) lastprivate(i, j)
+!do i = 1, n,2
+!  do j = 1, m
+!    do k = 1, i + 41
+!      if (k > 11 + 41 .or. k < 1) error stop
+!    end do
+!  end do
+!end do
+!if (k /= 53) then
+!  print *, k, 53
+!  error stop
+!endif
+
+!$omp simd collapse(3) lastprivate(k) lastprivate(i, j)
+do i = 1, n,2
+  do j = 1, m
+    do k = 1, j + 41
+      if (k > 23 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 65) then
+  print *, k, 65
+  error stop
+endif
+
+
+!$omp simd collapse(3) lastprivate(k) lastprivate(i, j)
+do i = 1, n
+  do j = 1, m,2
+    do k = 1, i + 41
+      if (k > 11 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 53) then
+  print *, k, 53
+  error stop
+endif
+
+end
+end module m
+
+program main
+  use m
+  implicit none (type, external)
+  call lastprivate_check_simd_1
+  call lastprivate_check_do_simd_1
+  call lastprivate_check_do_1
+  call lastprivate_check_2
+end
diff --git a/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-1a.f90 b/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-1a.f90
new file mode 100644
index 00000000000..9607fcc1038
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-1a.f90
@@ -0,0 +1,374 @@ 
+! { dg-do compile }
+! { dg-additional-options "-msse2" { target sse2_runtime } }
+! { dg-additional-options "-mavx" { target avx_runtime } }
+
+! PR fortran/107424
+
+! Nonrectangular loop nests checks
+
+! ========================================================
+! NOTE: The testcases are from non-rectangular-loop-1.f90,
+! but commented there. Feel free to remove this
+! file + uncomment them in non-rectangular-loop-1.f90
+! Otherwise, you need to change it to 'dg-do run'!
+! ========================================================
+
+module m
+  implicit none (type, external)
+contains
+
+! The 'k' loop uses i or j as start value
+! but a constant end value such that 'lastprivate'
+! should be well-defined
+subroutine lastprivate_check_simd_1
+  integer :: n,m,p, i,j,k
+
+  n = 11
+  m = 23
+  p = 27
+
+  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
+  ! Then same, execpt use nonunit stride for 'k'
+
+  !$omp simd collapse(3) lastprivate(k)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same but 'private' for all (i,j) vars
+
+  !$omp simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same - but with lastprivate(i,j)
+
+  !$omp simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 1 .or. j /= m + 2) error stop
+
+  !$omp simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 2 .or. j /= m + 1) error stop
+
+end subroutine lastprivate_check_simd_1
+
+
+! Same but with do simd
+subroutine lastprivate_check_do_simd_1
+  integer :: n,m,p, i,j,k
+
+  n = 11
+  m = 23
+  p = 27
+
+  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
+  ! Then same, execpt use nonunit stride for 'k'
+
+  !$omp parallel do simd collapse(3) lastprivate(k)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same but 'private' for all (i,j) vars
+
+  !$omp parallel do simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same - but with lastprivate(i,j)
+
+  !$omp parallel do simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 1 .or. j /= m + 2) error stop
+
+  !$omp parallel do simd collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 2 .or. j /= m + 1) error stop
+
+end subroutine lastprivate_check_do_simd_1
+
+
+
+! Same but with do
+subroutine lastprivate_check_do_1
+  integer :: n,m,p, i,j,k
+
+  n = 11
+  m = 23
+  p = 27
+
+  ! Use 'i' or 'j', unite stride on 'i' or on 'j' -> 4 loops
+  ! Then same, execpt use nonunit stride for 'k'
+
+  !$omp parallel do collapse(3) lastprivate(k)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same but 'private' for all (i,j) vars
+
+  !$omp parallel do collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k) private(i,j)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+
+  ! Same - but with lastprivate(i,j)
+
+  !$omp parallel do collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n
+    do j = 1, m, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = j - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 1 .or. j /= m + 2) error stop
+
+  !$omp parallel do collapse(3) lastprivate(k) lastprivate(i,j)
+  do i = 1, n, 2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = i - 41, p  ! { dg-note "Used here" }
+        if (k < 1 - 41 .or. k > p) error stop
+      end do
+    end do
+  end do
+  if (k /= p + 1) error stop
+  if (i /= n + 2 .or. j /= m + 1) error stop
+
+end subroutine lastprivate_check_do_1
+
+
+
+subroutine lastprivate_check_2
+  integer :: n,m,p, i,j,k,ll
+
+  n = 11
+  m = 23
+  p = 27
+
+  !$omp parallel do simd collapse(3) lastprivate(p)
+  do i = 1, n
+    do j = 1, m,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = 1, j + 41  ! { dg-note "Used here" }
+        do ll = 1, p, 2
+          if (k > 23 + 41 .or. k < 1) error stop
+        end do
+      end do
+    end do
+  end do
+  if (ll /= 29) error stop
+
+  !$omp simd collapse(3) lastprivate(p)
+  do i = 1, n
+    do j = 1, m,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+      do k = 1, j + 41  ! { dg-note "Used here" }
+        do ll = 1, p, 2
+          if (k > 23 + 41 .or. k < 1) error stop
+        end do
+      end do
+    end do
+  end do
+  if (ll /= 29) error stop
+
+  !$omp simd collapse(3) lastprivate(k)
+  do i = 1, n,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+    do j = 1, m
+      do k = 1, i + 41  ! { dg-note "Used here" }
+        if (k > 11 + 41 .or. k < 1) error stop
+      end do
+    end do
+  end do
+if (k /= 53) then
+  print *, k, 53
+  error stop
+endif
+
+! - Same but without 'private':
+!$omp simd collapse(3) lastprivate(k)
+do i = 1, n
+  do j = 1, m,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+    do k = 1, j + 41  ! { dg-note "Used here" }
+      if (k > 23 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 65) then
+  print *, k, 65
+  error stop
+endif
+
+
+!$omp simd collapse(3) lastprivate(k)
+do i = 1, n,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+  do j = 1, m
+    do k = 1, i + 41  ! { dg-note "Used here" }
+      if (k > 11 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 53) then
+  print *, k, 53
+  error stop
+endif
+
+! - all with lastprivate
+!$omp simd collapse(3) lastprivate(k) lastprivate(i, j)
+do i = 1, n
+  do j = 1, m,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'j'" }
+    do k = 1, j + 41  ! { dg-note "Used here" }
+      if (k > 23 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 65) then
+  print *, k, 65
+  error stop
+endif
+
+
+!$omp simd collapse(3) lastprivate(k) lastprivate(i, j)
+do i = 1, n,2  ! { dg-message "sorry, unimplemented: non-rectangular loop nest with non-unit loop iteration step for 'i'" }
+  do j = 1, m
+    do k = 1, i + 41  ! { dg-note "Used here" }
+      if (k > 11 + 41 .or. k < 1) error stop
+    end do
+  end do
+end do
+if (k /= 53) then
+  print *, k, 53
+  error stop
+endif
+
+end
+end module m
+
+program main
+  use m
+  implicit none (type, external)
+  call lastprivate_check_simd_1
+  call lastprivate_check_do_simd_1
+  call lastprivate_check_do_1
+  call lastprivate_check_2
+end
diff --git a/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-2.f90 b/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-2.f90
new file mode 100644
index 00000000000..0cea61e5f0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/non-rectangular-loop-2.f90
@@ -0,0 +1,243 @@ 
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-original -fcheck=all" }
+
+! PR fortran/107424
+
+! Nonrectangular loop nests checks
+
+! Valid patterns are:
+!  (1)  a2 - var-outer
+!  (2)  a1 * var-outer
+!  (3)  a1 * var-outer + a2
+!  (4)  a2 + a1 * var-outer
+!  (5)  a1 * var-outer - a2
+!  (6)  a2 - a1 * var-outer
+!  (7)  var-outer * a1
+!  (8)  var-outer * a1 + a2
+!  (9)  a2 + var-outer * a1
+! (10)  var-outer * a1 - a2
+! (11)  a2 - var-outer * a1
+
+module m
+contains
+
+
+! { dg-final { scan-tree-dump-times "for \\(one_two_inner = one_two_outer \\* -1 \\+ one_a2; one_two_inner <= one_two_outer \\* two_a1 \\+ 0; one_two_inner = one_two_inner \\+ 1\\)" 1 original } }
+
+!  (1)  a2 - var-outer
+!  (2)  a1 * var-outer
+subroutine one_two()
+  implicit none
+  integer :: one_a2
+  integer :: two_a1
+  integer :: one_two_outer, one_two_inner
+  integer :: i, j
+  integer, allocatable :: var(:,:)
+
+  one_a2 = 13
+  two_a1 = 5
+  allocate(var(1:10, one_a2 - 10:two_a1 * 10), &
+           source=0)
+  if (size(var) <= 4) error stop
+
+  !$omp simd collapse(2)
+  do one_two_outer = 1, 10
+    do one_two_inner = one_a2 - one_two_outer, two_a1 * one_two_outer
+      !$omp atomic update
+      var(one_two_outer,one_two_inner) = var(one_two_outer,one_two_inner) + 2
+    end do
+  end do
+
+  do i = 1, 10
+    do j = one_a2 - i, two_a1 * i
+      if (var(i,j) /= 2) error stop
+    end do
+  end do
+end
+
+
+! { dg-final { scan-tree-dump-times "for \\(three_four_inner = three_four_outer \\* three_a1 \\+ three_a2; three_four_inner <= three_four_outer \\* four_a1 \\+ four_a2; three_four_inner = three_four_inner \\+ 1\\)" 1 original } }
+
+!  (3)  a1 * var-outer + a2
+!  (4)  a2 + a1 * var-outer
+subroutine three_four()
+  implicit none
+  integer :: three_a1, three_a2
+  integer :: four_a1, four_a2
+  integer :: three_four_outer, three_four_inner
+  integer :: i, j
+  integer, allocatable :: var(:,:)
+
+  three_a1 = 2
+  three_a2 = 3
+  four_a1 = 3
+  four_a2 = 5
+  allocate(var(1:10, three_a1 * 1 + three_a2:four_a2 + four_a1 * 10), &
+           source=0)
+  if (size(var) <= 4) error stop
+
+  !$omp simd collapse(2)
+  do three_four_outer = 1, 10
+    do three_four_inner = three_a1 * three_four_outer + three_a2, four_a2 + four_a1 * three_four_outer
+      !$omp atomic update
+      var(three_four_outer, three_four_inner) = var(three_four_outer, three_four_inner) + 2
+    end do
+  end do
+  do i = 1, 10
+    do j = three_a1 * i + three_a2, four_a2 + four_a1 * i
+      if (var(i,j) /= 2) error stop
+    end do
+  end do
+end
+
+
+! { dg-final { scan-tree-dump-times "for \\(five_six_inner = five_six_outer \\* five_a1 \\+ D\\.\[0-9\]+; five_six_inner <= five_six_outer \\* D\\.\[0-9\]+ \\+ six_a2; five_six_inner = five_six_inner \\+ 1\\)" 1 original } }
+
+!  (5)  a1 * var-outer - a2
+!  (6)  a2 - a1 * var-outer
+subroutine five_six()
+  implicit none
+  integer :: five_a1, five_a2
+  integer :: six_a1, six_a2
+  integer :: five_six_outer, five_six_inner
+  integer :: i, j
+  integer, allocatable :: var(:,:)
+
+  five_a1 = 2
+  five_a2 = -3
+  six_a1 = 3
+  six_a2 = 20
+  allocate(var(1:10, five_a1 * 1 - five_a2:six_a2 - six_a1 * 1), &
+           source=0)
+  if (size(var) <= 4) error stop
+
+  !$omp simd collapse(2)
+  do five_six_outer = 1, 10
+    do five_six_inner = five_a1 * five_six_outer - five_a2, six_a2 - six_a1 * five_six_outer
+      !$omp atomic update
+      var(five_six_outer, five_six_inner) = var(five_six_outer, five_six_inner) + 2
+    end do
+  end do
+
+  do i = 1, 10
+    do j = five_a1 * i - five_a2, six_a2 - six_a1 * i
+      if (var(i,j) /= 2) error stop
+    end do
+  end do
+end
+
+
+! { dg-final { scan-tree-dump-times "for \\(seven_eight_inner = seven_eight_outer \\* seven_a1 \\+ 0; seven_eight_inner <= seven_eight_outer \\* eight_a1 \\+ eight_a2; seven_eight_inner = seven_eight_inner \\+ 1\\)" 1 original } }
+
+!  (7)  var-outer * a1
+!  (8)  var-outer * a1 + a2
+subroutine seven_eight()
+  implicit none
+  integer :: seven_a1
+  integer :: eight_a1, eight_a2
+  integer :: seven_eight_outer, seven_eight_inner
+  integer :: i, j
+  integer, allocatable :: var(:,:)
+
+  seven_a1 = 3
+  eight_a1 = 2
+  eight_a2 = -4
+  allocate(var(1:10, 1 * seven_a1 : 10 * eight_a1 + eight_a2), &
+           source=0)
+  if (size(var) <= 4) error stop
+
+  !$omp simd collapse(2)
+  do seven_eight_outer = 1, 10
+    do seven_eight_inner = seven_eight_outer * seven_a1, seven_eight_outer * eight_a1 + eight_a2
+      !$omp atomic update
+      var(seven_eight_outer, seven_eight_inner) = var(seven_eight_outer, seven_eight_inner) + 2
+    end do
+  end do
+
+  do i = 1, 10
+    do j = i * seven_a1, i * eight_a1 + eight_a2
+      if (var(i,j) /= 2) error stop
+    end do
+  end do
+end
+
+
+! { dg-final { scan-tree-dump-times "for \\(nine_ten_inner = nine_ten_outer \\* nine_a1 \\+ nine_a2; nine_ten_inner <= nine_ten_outer \\* ten_a1 \\+ D\\.\[0-9\]+; nine_ten_inner = nine_ten_inner \\+ 1\\)" 1 original } }
+
+!  (9)  a2 + var-outer * a1
+! (10)  var-outer * a1 - a2
+subroutine nine_ten()
+  implicit none
+  integer :: nine_a1, nine_a2
+  integer :: ten_a1, ten_a2
+  integer :: nine_ten_outer, nine_ten_inner
+  integer :: i, j
+  integer, allocatable :: var(:,:)
+
+  nine_a1 = 3
+  nine_a2 = 5
+  ten_a1 = 2
+  ten_a2 = 3
+  allocate(var(1:10, nine_a2 + 1 * nine_a1:10 * ten_a1 - ten_a2), &
+           source=0)
+  if (size(var) <= 4) error stop
+
+  !$omp simd collapse(2)
+  do nine_ten_outer = 1, 10
+    do nine_ten_inner = nine_a2 + nine_ten_outer * nine_a1, nine_ten_outer * ten_a1 - ten_a2
+      !$omp atomic update
+      var(nine_ten_outer, nine_ten_inner) = var(nine_ten_outer, nine_ten_inner) + 2
+    end do
+  end do
+
+  do i = 1, 10
+    do j = nine_a2 + i * nine_a1, i * ten_a1 - ten_a2
+      if (var(i,j) /= 2) error stop
+    end do
+  end do
+end
+
+
+! { dg-final { scan-tree-dump-times "for \\(eleven_inner = eleven_outer \\* D\\.\[0-9\]+ \\+ eleven_a2; eleven_inner <= 10; eleven_inner = eleven_inner \\+ 1\\)" 1 original } }
+
+! (11)  a2 - var-outer * a1
+
+subroutine eleven()
+  implicit none
+  integer :: eleven_a1, eleven_a2
+  integer :: eleven_outer, eleven_inner
+  integer :: i, j
+  integer, allocatable :: var(:,:)
+
+  eleven_a1 = 2
+  eleven_a2 = 3
+  allocate(var(1:10, eleven_a2 - 10 * eleven_a1 : 10), &
+           source=0)
+  if (size(var) <= 4) error stop
+
+  !$omp simd collapse(2)
+  do eleven_outer = 1, 10
+    do eleven_inner = eleven_a2 - eleven_outer * eleven_a1, 10
+      !$omp atomic update
+      var(eleven_outer, eleven_inner) = var(eleven_outer, eleven_inner) + 2
+    end do
+  end do
+
+  do i = 1, 10
+    do j = eleven_a2 - i * eleven_a1, 10
+      if (var(i,j) /= 2) error stop
+    end do
+  end do
+end
+end module m
+
+program main
+use m
+implicit none
+call one_two()
+call three_four()
+call five_six()
+call seven_eight()
+call nine_ten()
+call eleven()
+end