[PR103149] introduce asmnesia internal function

Message ID: orr1aviyen.fsf@lxoliva.fsfla.org
State: New
Series: [PR103149] introduce asmnesia internal function

Commit Message

Alexandre Oliva Dec. 2, 2021, 1:33 a.m. UTC
  The new internal function is now used by -fharden-conditional-branches
and -fharden-compares to detach the values used in redundant hardening
compares from their original locations.  It still eventually expands
to an asm statement, as before, but the temporary is only introduced
at expand time, which keeps gimple optimizations from making the input
and output diverge in ways that could prevent their unification into
matching asm operands.
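
To illustrate (a sketch only, not code from the patch; the passes build
the equivalent gimple directly rather than C source, and the detach()
function name below is just illustrative): before, detach_value emitted
an asm with a matched input/output pair, and now it emits a call to the
new internal function, so the asm only materializes at expansion time.

  /* Sketch of the value-detaching idiom, written as C for illustration.  */
  static inline int
  detach (int val)
  {
    int ret;
    /* What the hardening passes used to emit, in gimple form:  */
    asm ("" : "=g" (ret) : "0" (val));
    /* With this patch the gimple instead carries
         ret = .ASMNESIA (val);
       and an equivalent asm is only generated during expansion.  */
    return ret;
  }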

The patch also adds logic to check that asm operands match, and to
keep them matchable, which may improve some cases of "+m" and "+X" in
asm statements that are currently rejected because the single operand,
split into input and output, ends up diverging in ways that can't be
merged back by reload.
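
For a concrete (hypothetical) picture of the kind of asm this targets:
a "+m" operand is split into an output and a matching input referring
to the same object, and if later transformations make the two copies
diverge, reload can no longer merge them back.  The names below are
made up for illustration.

  /* Hypothetical example, not from the patch: "+m" is split internally
     into an output "=m" and a matching input "0" on the same object.  */
  static int counter;

  void
  bump (void)
  {
    asm ("" : "+m" (counter));
  }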

While investigating regressions hit by an earlier iteration of this
patch, I came across pr93027.c, and noticed that, with optimization
(unlike the test), reload overwrites one of the inputs with another.
This preexisted this patch, and I have not attempted to fix it.


The changes above paved the way to fix PR103149: a vectorized loop
ends up using vector booleans on aarch64, and the hardening pass
attempts to "copy-and-forget" them with an asm statement (thus
ASMNESIA) for the redundant hardening compare.

The problem there is that vector booleans, on the affected platform,
can't go in GENERAL_REGS, but the asm statement we used to emit could
only use those registers or memory.

The workaround in the newly-introduced ASMNESIA expander is to check
whether the mode is suitable for GENERAL_REGS, and to force the asm
operands to use MEM otherwise.  This is not ideal, but
functionality to map a mode to a constraint string that selects a
suitable register class doesn't seem readily available.  Maybe in a
future patch...
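
For reference, the check that picks between the "=g" and "=m" forms,
condensed from the expander in the patch below (an excerpt, not
standalone code):

  /* Condensed from expand_ASMNESIA: fall back to memory unless some
     GENERAL_REGS hard register can hold MODE.  */
  bool need_memory = true;
  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
        && targetm.hard_regno_mode_ok (i, mode))
      {
        need_memory = false;
        break;
      }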

ASMNESIA currently works only with gimple registers of types with
scalar modes.  I have not attempted to make it more general than that,
though I envision future uses that might end up requiring it.  I'll
cross that bridge if/when I get to it.


Regstrapped on x86_64-linux-gnu.  Also bootstrapped along with a patch
that enables both compare hardening passes.  Built crosses to aarch64-
and riscv64-, and verified that the mentioned aarch64 PR is fixed.  Ok
to install?


(Manual inspection of the asm output for riscv PR103302 suggests that
the reported problem may be gone with the given testcase, but I have not
built enough of the target environment to enable me to run that.)



for  gcc/ChangeLog

	PR middle-end/103149
	* cfgexpand.c (expand_asm_stmt): Pass original input
	constraint, and constraint array, to asm_operand_ok.
	* gimple-harden-conditionals.cc (detach_value): Call
	ASMNESIA internal function.
	* internal-fn.c (expand_ASMNESIA): New.
	* internal-fn.def (ASMNESIA): New.
	* recog.c (asm_operand_ok): Turn into wrapper of...
	(asm_operand_ok_match): ... renamed.  Skip non-constraint
	characters faster.  Don't require register_operand for matched
	operands.  Combine results with...
	(check_match): New.
	(check_asm_operands): Enable match checking.

for  gcc/testsuite/ChangeLog

	PR middle-end/103149
	* gcc.target/aarch64/pr103149.c: New.
	* gcc.target/i386/pr85030.c: Also accept impossible constraint
	error.
---
 gcc/cfgexpand.c                             |    6 +
 gcc/gimple-harden-conditionals.cc           |   17 ----
 gcc/internal-fn.c                           |   81 +++++++++++++++++++
 gcc/internal-fn.def                         |    4 +
 gcc/recog.c                                 |  114 +++++++++++++++++++++++----
 gcc/testsuite/gcc.target/aarch64/pr103149.c |   13 +++
 gcc/testsuite/gcc.target/i386/pr85030.c     |    2 
 7 files changed, 200 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr103149.c
  

Comments

Richard Biener Dec. 2, 2021, 9:35 a.m. UTC | #1
On Wed, 1 Dec 2021, Alexandre Oliva wrote:

> 
> The new internal function is now used by -fharden-conditional-branches
> and -fharden-compares to detach the values used in redundant hardening
> compares from their original locations.  It still eventually expands
> to an asm statement, as before, but the temporary is only introduced
> at expand time, which keeps gimple optimizations from making the input
> and output diverge in ways that could prevent their unification into
> matching asm operands.
> 
> The patch also adds logic to check that asm operands match, and to
> keep them matchable, which may improve some cases of "+m" and "+X" in
> asm statements that are currently rejected because the single operand,
> split into input and output, ends up diverging in ways that can't be
> merged back by reload.
> 
> While investigating regressions hit by an earlier iteration of this
> patch, I came across pr93027.c, and noticed that, with optimization
> (unlike the test), reload overwrites one of the inputs with another.
> This preexisted this patch, and I have not attempted to fix it.
> 
> 
> The changes above paved the way to fix PR103149: a vectorized loop
> ends up using vector booleans on aarch64, and the hardening pass
> attempts to "copy-and-forget" them with an asm statement (thus
> ASMNESIA) for the redundant hardening compare.
> 
> The problem there is that vector booleans, on the affected platform,
> can't go in GENERAL_REGS, but the asm statement we used to emit could
> only use those registers or memory.
> 
> The workaround in the newly-introduced ASMNESIA expander is to check
> whether the mode is suitable for GENERAL_REGS, and to force the asm
> operands to use MEM otherwise.  This is not ideal, but
> functionality to map a mode to a constraint string that selects a
> suitable register class doesn't seem readily available.  Maybe in a
> future patch...
> 
> ASMNESIA currently works only with gimple registers of types with
> scalar modes.  I have not attempted to make it more general than that,
> though I envision future uses that might end up requiring it.  I'll
> cross that bridge if/when I get to it.
> 
> 
> Regstrapped on x86_64-linux-gnu.  Also bootstrapped along with a patch
> that enables both compare hardening passes.  Built crosses to aarch64-
> and riscv64-, and verified that the mentioned aarch64 PR is fixed.  Ok
> to install?

While adding ASMNESIA looks OK at this point, I'm not sure about the
asm handling stuff.  You mention 'reload' above, do you mean LRA?

Thanks,
Richard.

  
Alexandre Oliva Dec. 3, 2021, 9:14 a.m. UTC | #2
On Dec  2, 2021, Richard Biener <rguenther@suse.de> wrote:

> While adding ASMNESIA looks OK at this point, I'm not sure about the
> asm handling stuff.  You mention 'reload' above, do you mean LRA?

I meant the allocation was first visible in -fdump-rtl-reload.  I've
added a note about this problem, and a modified testcase, to PR93027
once I realized that the problem was present in an unpatched compiler,
even at -O0.


In the previous patch, I'd already abandoned the intent to use the +X
constraint for ASMNESIA, because it was not useful and created other
problems.  I'd fixed many of those additional problems in the
recog.c and cfgexpand.c changes, but those turned out not to be needed
to fix the PR once I backpedaled to +g (or +m).  ASMNESIA also turned
out to be unnecessary, so I prepared and retested another version of the
patch that uses the same switch-to-MEM logic I wrote for ASMNESIA in the
previous patch, but now directly in the detach_value implementation
within the harden-conditional passes, using +m and an addressable
temporary if we find that +g won't do.

If you find that ASMNESIA is still useful as a reusable internal
primitive, I can reintroduce it, now or at a later time.  It would bring
some slight codegen benefit, but we could do without it at this stage.


[PR103149] detach values through mem only if general regs won't do

From: Alexandre Oliva <oliva@adacore.com>

When hardening compares or conditional branches, we perform redundant
tests, and to prevent them from being optimized out, we use asm
statements that preserve a value used in a compare, but in a way that
the compiler can no longer assume it's the same value, so it can't
optimize the redundant test away.

We used to use +g, but that requires general regs or mem.  You might
think that, if a reg constraint can't be satisfied, the register
allocator will fall back to memory, but that's not so: we decide on
matching MEMs very early on, by using the same addressable operand on
both input and output, and only if the constraint does not allow
registers.  If it does, we use gimple registers and then pseudos as
inputs and outputs; the inputs can then be substituted by equivalent
expressions, and if no register constraint fits (e.g. because that
mode won't fit in general regs, or won't fit in regs at all), the
register allocator will give up before even trying to allocate some
temporary memory to unify input and output.

This patch arranges for us to create and use the temporary stack slot
if we can tell the mode requires memory, or won't otherwise fit in
general regs, and thus to use +m for that asm.
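
At the source level, the memory fallback amounts to roughly the
following (a hedged sketch; the pass builds the equivalent gimple
directly, and big_t and detach_via_mem below are hypothetical
stand-ins for a type whose mode doesn't fit in general regs):

  /* Hypothetical stand-in for a type whose mode needs memory.  */
  typedef struct { double d[4]; } big_t;

  static big_t
  detach_via_mem (big_t val)
  {
    big_t dtch = val;                     /* copy into an addressable temp */
    asm ("" : "=m" (dtch) : "m" (dtch));  /* detach through memory */
    return dtch;                          /* copy back out; the temp is then
                                             clobbered in the gimple form */
  }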


Regstrapped on x86_64-linux-gnu.  Verified that the mentioned aarch64 PR
is fixed.  Also bootstrapping along with a patch that enables both
compare hardening passes.  Ok to install?


for  gcc/ChangeLog

	PR middle-end/103149
	* gimple-harden-conditionals.cc (detach_value): Use memory if
	general regs won't do.

for  gcc/testsuite/ChangeLog

	PR middle-end/103149
	* gcc.target/aarch64/pr103149.c: New.
---
 gcc/gimple-harden-conditionals.cc           |   67 +++++++++++++++++++++++++--
 gcc/testsuite/gcc.target/aarch64/pr103149.c |   14 ++++++
 2 files changed, 75 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr103149.c

diff --git a/gcc/gimple-harden-conditionals.cc b/gcc/gimple-harden-conditionals.cc
index cfa2361d65be0..81867d6e4275f 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
+#include "rtl.h"
 #include "tree.h"
 #include "fold-const.h"
 #include "gimple.h"
@@ -132,25 +134,78 @@ detach_value (location_t loc, gimple_stmt_iterator *gsip, tree val)
   tree ret = make_ssa_name (TREE_TYPE (val));
   SET_SSA_NAME_VAR_OR_IDENTIFIER (ret, SSA_NAME_IDENTIFIER (val));
 
-  /* Output asm ("" : "=g" (ret) : "0" (val));  */
+  /* Some modes won't fit in general regs, so we fall back to memory
+     for them.  ??? It would be ideal to try to identify an alternate,
+     wider or more suitable register class, and use the corresponding
+     constraint, but there's no logic to go from register class to
+     constraint, even if there is a corresponding constraint, and even
+     if we could enumerate constraints, we can't get to their string
+     either.  So this will do for now.  */
+  bool need_memory = true;
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (val));
+  if (mode != BLKmode)
+    for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+      if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
+	  && targetm.hard_regno_mode_ok (i, mode))
+	{
+	  need_memory = false;
+	  break;
+	}
+
+  tree asminput = val;
+  tree asmoutput = ret;
+  const char *constraint_out = need_memory ? "=m" : "=g";
+  const char *constraint_in = need_memory ? "m" : "0";
+
+  if (need_memory)
+    {
+      tree temp = create_tmp_var (TREE_TYPE (val), "dtch");
+      mark_addressable (temp);
+
+      gassign *copyin = gimple_build_assign (temp, asminput);
+      gimple_set_location (copyin, loc);
+      gsi_insert_before (gsip, copyin, GSI_SAME_STMT);
+
+      asminput = asmoutput = temp;
+    }
+
+  /* Output an asm statement with matching input and output.  It does
+     nothing, but after it the compiler no longer knows the output
+     still holds the same value as the input.  */
   vec<tree, va_gc> *inputs = NULL;
   vec<tree, va_gc> *outputs = NULL;
   vec_safe_push (outputs,
 		 build_tree_list
 		 (build_tree_list
-		  (NULL_TREE, build_string (2, "=g")),
-		  ret));
+		  (NULL_TREE, build_string (strlen (constraint_out),
+					    constraint_out)),
+		  asmoutput));
   vec_safe_push (inputs,
 		 build_tree_list
 		 (build_tree_list
-		  (NULL_TREE, build_string (1, "0")),
-		  val));
+		  (NULL_TREE, build_string (strlen (constraint_in),
+					    constraint_in)),
+		  asminput));
   gasm *detach = gimple_build_asm_vec ("", inputs, outputs,
 				       NULL, NULL);
   gimple_set_location (detach, loc);
   gsi_insert_before (gsip, detach, GSI_SAME_STMT);
 
-  SSA_NAME_DEF_STMT (ret) = detach;
+  if (need_memory)
+    {
+      gassign *copyout = gimple_build_assign (ret, asmoutput);
+      gimple_set_location (copyout, loc);
+      gsi_insert_before (gsip, copyout, GSI_SAME_STMT);
+      SSA_NAME_DEF_STMT (ret) = copyout;
+
+      gassign *clobber = gimple_build_assign (asmoutput,
+					      build_clobber
+					      (TREE_TYPE (asmoutput)));
+      gimple_set_location (clobber, loc);
+      gsi_insert_before (gsip, clobber, GSI_SAME_STMT);
+    }
+  else
+    SSA_NAME_DEF_STMT (ret) = detach;
 
   return ret;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103149.c b/gcc/testsuite/gcc.target/aarch64/pr103149.c
new file mode 100644
index 0000000000000..906bc9ae06612
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr103149.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+sve -O2 -fharden-conditional-branches -fno-tree-scev-cprop" } */
+
+/* -fharden-conditional-branches prevents optimization of its redundant
+   compares by detaching values from the operands with asm statements.  They
+   used to require GENERAL_REGS, but the vectorized booleans, generated while
+   vectorizing this function, can't be held in GENERAL_REGS.  */
+
+void
+foo (int *p)
+{
+  while (*p < 1)
+    ++*p;
+}
  
Richard Biener Dec. 7, 2021, 11:51 a.m. UTC | #3
On Fri, 3 Dec 2021, Alexandre Oliva wrote:

> On Dec  2, 2021, Richard Biener <rguenther@suse.de> wrote:
> 
> > While adding ASMNESIA looks OK at this point, I'm not sure about the
> > asm handling stuff.  You mention 'reload' above, do you mean LRA?
> 
> I meant the allocation was first visible in -fdump-rtl-reload.  I've
> added a note about this problem, and a modified testcase, to PR93027
> once I realized that the problem was present in an unpatched compiler,
> even at -O0.

Ah, I see.

> In the previous patch, I'd already abandoned the intent to use the +X
> constraint for ASMNESIA, because it was not useful and created other
> problems.  I'd fixed numerous of those additional problems in the
> recog.c and cfgexpand.c changes, but those turned out to be not needed
> to fix the PR once I backpedaled to +g (or +m).  ASMNESIA also turned
> out to be unnecessary, so I prepared and retested another version of the
> patch that uses the same switch-to-MEM logic I wrote for ASMNESIA in the
> previous patch, but now directly in the detach_value implementation
> within the harden-conditional passes, using +m and an addressable
> temporary if we find that +g won't do.
> 
> If you find that ASMNESIA is still useful as a reusable internal
> primitive, I can reintroduce it, now or at a later time.  It would bring
> some slight codegen benefit, but we could do without it at this stage.

I prefer to be minimalistic at this point.  Maybe we can re-use
ASMNESIA for the FENV_ACCESS stuff if somebody picks up Marc's work
during next stage1.

> 
> [PR103149] detach values through mem only if general regs won't do
> 
> From: Alexandre Oliva <oliva@adacore.com>
> 
> When hardening compares or conditional branches, we perform redundant
> tests, and to prevent them from being optimized out, we use asm
> statements that preserve a value used in a compare, but in a way that
> the compiler can no longer assume it's the same value, so it can't
> optimize the redundant test away.
> 
> We used to use +g, but that requires general regs or mem.  You might
> think that, if a reg constraint can't be satisfied, the register
> allocator will fall back to memory, but that's not so: we decide on
> matching MEMs very early on, by using the same addressable operand on
> both input and output, and only if the constraint does not allow
> registers.  If it does, we use gimple registers and then pseudos as
> inputs and outputs; the inputs can then be substituted by equivalent
> expressions, and if no register constraint fits (e.g. because that
> mode won't fit in general regs, or won't fit in regs at all), the
> register allocator will give up before even trying to allocate some
> temporary memory to unify input and output.
> 
> This patch arranges for us to create and use the temporary stack slot
> if we can tell the mode requires memory, or won't otherwise fit in
> general regs, and thus to use +m for that asm.
> 
> 
> Regstrapped on x86_64-linux-gnu.  Verified that the mentioned aarch64 PR
> is fixed.  Also bootstrapping along with a patch that enables both
> compare hardening passes.  Ok to install?

OK.

Thanks,
Richard.

  

Patch

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index fb84d469f1e77..f0272ba1ac9ba 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3374,10 +3374,10 @@  expand_asm_stmt (gasm *stmt)
       tree val = input_tvec[i];
       tree type = TREE_TYPE (val);
       bool allows_reg, allows_mem, ok;
-      const char *constraint;
+      const char *constraint, *orig_constraint;
       rtx op;
 
-      constraint = constraints[i + noutputs];
+      orig_constraint = constraint = constraints[i + noutputs];
       ok = parse_input_constraint (&constraint, i, ninputs, noutputs, 0,
 				   constraints.address (),
 				   &allows_mem, &allows_reg);
@@ -3397,7 +3397,7 @@  expand_asm_stmt (gasm *stmt)
       else if (MEM_P (op))
 	op = validize_mem (op);
 
-      if (asm_operand_ok (op, constraint, NULL) <= 0)
+      if (asm_operand_ok (op, orig_constraint, constraints.address ()) <= 0)
 	{
 	  if (allows_reg && TYPE_MODE (type) != BLKmode)
 	    op = force_reg (TYPE_MODE (type), op);
diff --git a/gcc/gimple-harden-conditionals.cc b/gcc/gimple-harden-conditionals.cc
index cfa2361d65be0..3768b2e23bdaf 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -132,21 +132,8 @@  detach_value (location_t loc, gimple_stmt_iterator *gsip, tree val)
   tree ret = make_ssa_name (TREE_TYPE (val));
   SET_SSA_NAME_VAR_OR_IDENTIFIER (ret, SSA_NAME_IDENTIFIER (val));
 
-  /* Output asm ("" : "=g" (ret) : "0" (val));  */
-  vec<tree, va_gc> *inputs = NULL;
-  vec<tree, va_gc> *outputs = NULL;
-  vec_safe_push (outputs,
-		 build_tree_list
-		 (build_tree_list
-		  (NULL_TREE, build_string (2, "=g")),
-		  ret));
-  vec_safe_push (inputs,
-		 build_tree_list
-		 (build_tree_list
-		  (NULL_TREE, build_string (1, "0")),
-		  val));
-  gasm *detach = gimple_build_asm_vec ("", inputs, outputs,
-				       NULL, NULL);
+  gcall *detach = gimple_build_call_internal (IFN_ASMNESIA, 1, val);
+  gimple_call_set_lhs (detach, ret);
   gimple_set_location (detach, loc);
   gsi_insert_before (gsip, detach, GSI_SAME_STMT);
 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 6ac3460d538be..a4a2ca91d9f93 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3234,6 +3234,87 @@  expand_LAUNDER (internal_fn, gcall *call)
   expand_assignment (lhs, gimple_call_arg (call, 0), false);
 }
 
+/* Expand ASMNESIA to assignment and asm that makes the value unknown.  */
+
+static void
+expand_ASMNESIA (internal_fn, gcall *call)
+{
+  tree to = gimple_call_lhs (call);
+
+  if (!to)
+    return;
+
+  location_t locus = gimple_location (call);
+
+  tree from = gimple_call_arg (call, 0);
+
+  rtx to_rtx = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx from_rtx = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+
+  enum machine_mode mode = GET_MODE (to_rtx);
+
+  gcc_checking_assert (mode != BLKmode && mode != VOIDmode
+		       && (GET_MODE (from_rtx) == mode
+			   || GET_MODE (from_rtx) == VOIDmode));
+
+  rtvec argvec = rtvec_alloc (1);
+  rtvec constraintvec = rtvec_alloc (1);
+  rtvec labelvec = rtvec_alloc (0);
+
+  bool need_memory = true;
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
+	&& targetm.hard_regno_mode_ok (i, mode))
+      {
+	need_memory = false;
+	break;
+      }
+
+  rtx regtemp = copy_to_mode_reg (mode, from_rtx);
+  rtx temp = regtemp;
+  const char *constraint = "=g";
+
+  if (need_memory)
+    {
+      constraint = "=m";
+      temp = assign_stack_temp (mode, GET_MODE_SIZE (mode));
+      emit_move_insn (temp, regtemp);
+    }
+
+  /* This is roughly equivalent to asm ("" : "+g" (temp));
+
+     We use +m instead of +g if we NEED_MEMORY, i.e., the mode is not suitable
+     for GENERAL_REGS.  ??? Ideally, we'd choose a suitable register class, if
+     there is one, but how do we get a constraint type that maps to it?  +X is
+     at the same time too lax, because arbitrary RTL and x87 FP regs are not
+     desirable, and too strict, because it doesn't guide any register class.
+     ??? Maybe the errors reg-stack would flag on e.g. libgcc/_multc3 could be
+     conditioned on the operand's being referenced?
+
+     Unlike expansion of gimple asm stmts, it doesn't go through
+     targetm.md_asm_adjust (we don't wish any clobbers).
+
+     Furthermore, we ensure input and output start out with the same pseudo or
+     MEM, which gimple doesn't.  This is particularly important for MEMs, since
+     reloads can't merge disparate ones, and it would be important for X because
+     it guides NO_REGS.  */
+  rtx body = gen_rtx_ASM_OPERANDS (mode,
+				   ggc_strdup (""),
+				   constraint, 0,
+				   argvec, constraintvec, labelvec,
+				   locus);
+  ASM_OPERANDS_INPUT (body, 0) = temp;
+  ASM_OPERANDS_INPUT_CONSTRAINT_EXP (body, 0)
+    = gen_rtx_ASM_INPUT_loc (mode, "0", locus);
+  emit_insn (gen_rtx_SET (temp, body));
+
+  if (need_memory)
+    emit_move_insn (regtemp, temp);
+
+  emit_move_insn (to_rtx, regtemp);
+}
+
 /* Expand {MASK_,}SCATTER_STORE{S,U} call CALL using optab OPTAB.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index acb0dbda5567b..f10b220192196 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -410,6 +410,10 @@  DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL)
 /* To implement __builtin_launder.  */
 DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
 
+/* Copy a value while preventing optimizations based on knowledge
+   about the input operand.  */
+DEF_INTERNAL_FN (ASMNESIA, ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* Divmod function.  */
 DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
 
diff --git a/gcc/recog.c b/gcc/recog.c
index 5a42c45361d5c..2f402daad55a0 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -53,6 +53,9 @@  along with GCC; see the file COPYING3.  If not see
 static void validate_replace_rtx_1 (rtx *, rtx, rtx, rtx_insn *, bool);
 static void validate_replace_src_1 (rtx *, void *);
 static rtx_insn *split_insn (rtx_insn *);
+static int asm_operand_ok_match (rtx op, const char *constraint,
+				 const char **constraints,
+				 rtx *operands, int match_index);
 
 struct target_recog default_target_recog;
 #if SWITCHABLE_TARGET
@@ -170,7 +173,7 @@  check_asm_operands (rtx x)
       const char *c = constraints[i];
       if (c[0] == '%')
 	c++;
-      if (! asm_operand_ok (operands[i], c, constraints))
+      if (! asm_operand_ok_match (operands[i], c, constraints, operands, -1))
 	return 0;
     }
 
@@ -2167,6 +2170,67 @@  get_referenced_operands (const char *string, bool *used,
 
 int
 asm_operand_ok (rtx op, const char *constraint, const char **constraints)
+{
+  return asm_operand_ok_match (op, constraint, constraints, NULL, -1);
+}
+
+/* Combine a previous cumulative result PREVRES with RES.  If MATCH_INDEX is
+   nonnegative, check that OP is compatible with the matched operand, using
+   OPERANDS if not NULL, under the CN-implied requirements.
+ */
+
+static int
+check_match (int prevres, rtx op, enum constraint_num cn, int res,
+	     rtx *operands, int match_index)
+{
+  if (prevres > 0 || !res)
+    return prevres;
+
+  if (match_index < 0)
+    return res;
+
+  bool allows_reg = false, allows_mem = false;
+  enum reg_class rclass = NO_REGS;
+
+  if (cn == CONSTRAINT__LIMIT)
+    {
+      allows_reg = allows_mem = true;
+      rclass = GENERAL_REGS;
+    }
+  else
+    {
+      insn_extra_constraint_allows_reg_mem (cn, &allows_reg, &allows_mem);
+      if (allows_reg)
+	rclass = reg_class_for_constraint (cn);
+    }
+
+  rtx mop = operands ? operands[match_index] : NULL_RTX;
+
+  /* Avoid divergence between input and output when reload wouldn't be able to
+     fix it up.  Keep matching MEMs identical if we can't have REGs, but allow
+     reloadable MEMs otherwise.  Avoid immediates and rvalue expressions, since
+     they can't match outputs.  */
+  if (op != mop
+      && ((mop && allows_mem && MEM_P (op) && MEM_P (mop)
+	   && (!allows_reg || rclass == NO_REGS || GET_MODE (mop) == BLKmode)
+	   && !rtx_equal_p (op, mop))
+	  || (mop && allows_reg
+	      && (rclass == NO_REGS || GET_MODE (mop) == BLKmode)
+	      && (!allows_mem || !MEM_P (op) || !MEM_P (mop)))
+	  || !nonimmediate_operand (op, VOIDmode)))
+    return prevres;
+
+  return res;
+}
+
+/* Check if an asm_operand matches its constraints.
+   If MATCH_INDEX is nonnegative, also check for potential compatibility
+   with the matched operand, using OPERANDS if non-NULL.
+   Return > 0 if ok, = 0 if bad, < 0 if inconclusive.  */
+
+static int
+asm_operand_ok_match (rtx op, const char *constraint, const char **constraints,
+		      rtx *operands, int match_index)
 {
   int result = 0;
   bool incdec_ok = false;
@@ -2177,7 +2241,8 @@  asm_operand_ok (rtx op, const char *constraint, const char **constraints)
   /* Empty constraint string is the same as "X,...,X", i.e. X for as
      many alternatives as required to match the other operands.  */
   if (*constraint == '\0')
-    result = 1;
+    result = check_match (result, op, CONSTRAINT_X,
+			  1, operands, match_index);
 
   while (*constraint)
     {
@@ -2186,7 +2251,8 @@  asm_operand_ok (rtx op, const char *constraint, const char **constraints)
       int len;
       switch (c)
 	{
-	case ',':
+	case '&': case '?': case '!': case '#': case '*':
+	case '=': case '+': case ',':
 	  constraint++;
 	  continue;
 
@@ -2204,7 +2270,8 @@  asm_operand_ok (rtx op, const char *constraint, const char **constraints)
 
 	      match = strtoul (constraint, &end, 10);
 	      if (!result)
-		result = asm_operand_ok (op, constraints[match], NULL);
+		result = asm_operand_ok_match (op, constraints[match], NULL,
+					       operands, match);
 	      constraint = (const char *) end;
 	    }
 	  else
@@ -2229,12 +2296,14 @@  asm_operand_ok (rtx op, const char *constraint, const char **constraints)
 	     offsettable address exists.  */
 	case 'o': /* offsettable */
 	  if (offsettable_nonstrict_memref_p (op))
-	    result = 1;
+	    result = check_match (result, op, CONSTRAINT_o, 1,
+				  operands, match_index);
 	  break;
 
 	case 'g':
 	  if (general_operand (op, VOIDmode))
-	    result = 1;
+	    result = check_match (result, op, CONSTRAINT__LIMIT, 1,
+				  operands, match_index);
 	  break;
 
 	case '<':
@@ -2253,18 +2322,21 @@  asm_operand_ok (rtx op, const char *constraint, const char **constraints)
 	  switch (get_constraint_type (cn))
 	    {
 	    case CT_REGISTER:
-	      if (!result
-		  && reg_class_for_constraint (cn) != NO_REGS
-		  && GET_MODE (op) != BLKmode
-		  && register_operand (op, VOIDmode))
-		result = 1;
+	      /* Don't demand a register for a matched operand, see pr93027.  */
+	      result = check_match (result, op, cn,
+				    reg_class_for_constraint (cn) != NO_REGS
+				    && GET_MODE (op) != BLKmode
+				    && (match_index >= 0
+					|| register_operand (op, VOIDmode)),
+				    operands, match_index);
 	      break;
 
 	    case CT_CONST_INT:
-	      if (!result
-		  && CONST_INT_P (op)
-		  && insn_const_int_ok_for_constraint (INTVAL (op), cn))
-		result = 1;
+	      result = check_match (result, op, cn,
+				    CONST_INT_P (op)
+				    && (insn_const_int_ok_for_constraint
+					(INTVAL (op), cn)),
+				    operands, match_index);
 	      break;
 
 	    case CT_MEMORY:
@@ -2275,16 +2347,22 @@  asm_operand_ok (rtx op, const char *constraint, const char **constraints)
 	      /* Every memory operand can be reloaded to fit.  */
 	      if (!mem)
 		mem = extract_mem_from_operand (op);
-	      result = result || memory_operand (mem, VOIDmode);
+	      result = check_match (result, op, cn,
+				    memory_operand (mem, VOIDmode),
+				    operands, match_index);
 	      break;
 
 	    case CT_ADDRESS:
 	      /* Every address operand can be reloaded to fit.  */
-	      result = result || address_operand (op, VOIDmode);
+	      result = check_match (result, op, cn,
+				    address_operand (op, VOIDmode),
+				    operands, match_index);
 	      break;
 
 	    case CT_FIXED_FORM:
-	      result = result || constraint_satisfied_p (op, cn);
+	      result = check_match (result, op, cn,
+				    constraint_satisfied_p (op, cn),
+				    operands, match_index);
 	      break;
 	    }
 	  break;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103149.c b/gcc/testsuite/gcc.target/aarch64/pr103149.c
new file mode 100644
index 0000000000000..e7283b4f569c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr103149.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+sve -O2 -fharden-conditional-branches -fno-tree-scev-cprop" } */
+
+/* -fharden-conditional-branches relies on ASMNESIA, that used to require
+   GENERAL_REGS even for vectorized booleans, which can't go on
+   GENERAL_REGS.  */
+
+void
+foo (int *p)
+{
+  while (*p < 1)
+    ++*p;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr85030.c b/gcc/testsuite/gcc.target/i386/pr85030.c
index ff41df6bb643c..f24e690b8c11d 100644
--- a/gcc/testsuite/gcc.target/i386/pr85030.c
+++ b/gcc/testsuite/gcc.target/i386/pr85030.c
@@ -6,5 +6,5 @@  void
 foo ()
 {
   struct S a;
-  asm volatile ("" : "=rm" (a) : "0" (1)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */
+  asm volatile ("" : "=rm" (a) : "0" (1)); /* { dg-error "impossible constraint in 'asm'|inconsistent operand constraints in an 'asm'" } */
 }