[v2] ree: Improve ree pass for rs6000 target.

Message ID 466cc7e1-9f40-be24-33ef-d965e1e61cba@linux.ibm.com
State New
Headers
Series [v2] ree: Improve ree pass for rs6000 target. |

Commit Message

Ajit Agarwal April 6, 2023, 10:49 a.m. UTC
  Hello All:

Eliminate unnecessary redundant extension within basic and across basic blocks. For rs6000 target we see redundant zero and sign extension and done to improve ree pass to eliminate such redundant zero and sign extension.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


	ree: Improve ree pass for rs6000 target.

	Eliminate unnecessary redundant extension within basic
	and across basic blocks. For rs6000 target we see
	redundant zero and sign extension and done to improve
	ree pass to eliminate such redundant zero and sign
	extension.

	2023-04-06  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

	* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
	elimination across and within basic blocks.
	(def_arith_p): New function to check definition has arithmetic
	operation.
	(combine_set_extension): Modification to incorporate AND
	and current zero_extend and sign_extend instruction.
	(combline_reaching_defs): Add zero_extend and sign_extend.
	Add FUNCTION_ARG_REGNO_P abi interfaces calls and
	FUNCTION_VALUE_REGNO_P support.
	(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
	zero_extend sign_extend and AND instruction.
	(insn_is_zext_p): New function.
	(add_removable_extension): Add FUNCTION_ARG_REGNO_P abi
	interface calls.
	* common/config/rs6000/rs6000-common.cc: Add REE pass as a
	default rs6000 target pass for O2 and above.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/zext-elim.C: New testcase.
	* g++.target/powerpc/zext-elim-1.C: New testcase.
	* g++.target/powerpc/zext-elim-2.C: New testcase.
	* g++.target/powerpc/zext-elim-3.C: New testcase.
	* g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/common/config/rs6000/rs6000-common.cc     |   2 +
 gcc/ree.cc                                    | 655 ++++++++++++++----
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
 .../g++.target/powerpc/zext-elim-1.C          |  19 +
 .../g++.target/powerpc/zext-elim-2.C          |  11 +
 .../g++.target/powerpc/zext-elim-3.C          |  16 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 +
 7 files changed, 606 insertions(+), 145 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C
  

Comments

Bernhard Reutner-Fischer April 12, 2023, 6:08 p.m. UTC | #1
On 6 April 2023 12:49:53 CEST, Ajit Agarwal via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>Hello All:
>
>Eliminate unnecessary redundant extension within basic and across basic blocks.

To borrow HP's "arm-chair development mode", unfortunately most of the comments i had for the previous version apply to this version too, fwiw.

PS: Pretty please avoid moving functions around, if possible, as it spoils the history needlessly, IMHO.

thanks, and looking forward to a stage 1 patch.
  
Jeff Law April 14, 2023, 8:17 p.m. UTC | #2
On 4/6/23 04:49, Ajit Agarwal wrote:
> Hello All:
> 
> Eliminate unnecessary redundant extension within basic and across basic blocks. For rs6000 target we see redundant zero and sign extension and done to improve ree pass to eliminate such redundant zero and sign extension.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 	ree: Improve ree pass for rs6000 target.
> 
> 	Eliminate unnecessary redundant extension within basic
> 	and across basic blocks. For rs6000 target we see
> 	redundant zero and sign extension and done to improve
> 	ree pass to eliminate such redundant zero and sign
> 	extension.
> 
> 	2023-04-06  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
> 
> gcc/ChangeLog:
> 
> 	* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
> 	elimination across and within basic blocks.
> 	(def_arith_p): New function to check definition has arithmetic
> 	operation.
> 	(combine_set_extension): Modification to incorporate AND
> 	and current zero_extend and sign_extend instruction.
> 	(combline_reaching_defs): Add zero_extend and sign_extend.
> 	Add FUNCTION_ARG_REGNO_P abi interfaces calls and
> 	FUNCTION_VALUE_REGNO_P support.
> 	(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
> 	zero_extend sign_extend and AND instruction.
> 	(insn_is_zext_p): New function.
> 	(add_removable_extension): Add FUNCTION_ARG_REGNO_P abi
> 	interface calls.
> 	* common/config/rs6000/rs6000-common.cc: Add REE pass as a
> 	default rs6000 target pass for O2 and above.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.target/powerpc/zext-elim.C: New testcase.
> 	* g++.target/powerpc/zext-elim-1.C: New testcase.
> 	* g++.target/powerpc/zext-elim-2.C: New testcase.
> 	* g++.target/powerpc/zext-elim-3.C: New testcase.
> 	* g++.target/powerpc/sext-elim.C: New testcase.
It would be useful to know the kinds of patterns you're trying to 
improve.  I get the sense there's at least three distinct cases you're 
trying to handle.

One case appears to stem from operations which we know produce a zero 
extended results.  For example x & 0x1.   We can kindof view that as a 
zero extension from narrow modes up through word_mode since we know the 
upper bits are zero.

Another stems from exploiting ABI characteristics.

Finally extending to handle cases across basic blocks

These should be independent changes.  So I can easily see this patch 
should morph into a patch series with at least 4 entries.

1/4 Just moves code around and produces no functional changes.
2/4 Would implement the case where an operation is known to
     produce a zero extended result.
3/4 Would exploit the ABI characteristics to eliminate more
     extensions.
4/4 Would extend REE to work across blocks

WIth this kind of structure patches #1 and #2 might to in fairly 
quickly, even if we have to figure out how to handle ABI issues.  It's 
also easier for the reviewer on multiple levels.

> ---
>   gcc/common/config/rs6000/rs6000-common.cc     |   2 +
>   gcc/ree.cc                                    | 655 ++++++++++++++----
>   gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
>   .../g++.target/powerpc/zext-elim-1.C          |  19 +
>   .../g++.target/powerpc/zext-elim-2.C          |  11 +
>   .../g++.target/powerpc/zext-elim-3.C          |  16 +
>   gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 +
>   7 files changed, 606 insertions(+), 145 deletions(-)
>   create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C
> 
> diff --git a/gcc/common/config/rs6000/rs6000-common.cc b/gcc/common/config/rs6000/rs6000-common.cc
> index 2140c442ba9..968db215028 100644
> --- a/gcc/common/config/rs6000/rs6000-common.cc
> +++ b/gcc/common/config/rs6000/rs6000-common.cc
> @@ -34,6 +34,8 @@ static const struct default_options rs6000_option_optimization_table[] =
>       { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>       /* Enable -fsched-pressure for first pass instruction scheduling.  */
>       { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> +    /* Enable -free for zero extension and sign extension elimination.*/
> +    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
So if we're going to make this change, then we need to update the 
documentation as well (there's a section which lists which -f options 
are enabled at the different -O<n> optimization levels.

It may be better to have the ppc backend enable the REE pass on its own 
rather than forcing it on for all the targets since it hasn't been 
tested on all the targets.  That's been pretty standard practice for the 
REE implementation.


>       /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>          loops at -O2 and above by default.  */
>       { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> diff --git a/gcc/ree.cc b/gcc/ree.cc
> index 413aec7c8eb..8057f0325f4 100644
> --- a/gcc/ree.cc
> +++ b/gcc/ree.cc
> @@ -253,6 +253,101 @@ struct ext_cand
>   
>   static int max_insn_uid;
>   
> +/* Get all the reaching definitions of an instruction.  The definitions are
> +   desired for REG used in INSN.  Return the definition list or NULL if a
> +   definition is missing.  If DEST is non-NULL, additionally push the INSN
> +   of the definitions onto DEST.  */
> +
> +static struct df_link *
> +get_defs (rtx_insn *insn, rtx reg, vec<rtx_insn *> *dest)
[ ... ]
So you moved some functions around.  This needs to be reflected in the 
ChangeLog entry.  Presumably you did this so that you didn't need to 
have a prototype for the static functions?


> +
> +
> +/* Identify instruction AND with identical zero extension.  */
> +
> +static unsigned int
> +insn_is_zext_p (rtx insn)
This is *very* poorly named.  You're not operating on an INSN at this 
point anymore.  You're operating on an RTX.  All INSNs are RTXs, but not 
all RTXs are INSNs.

The function comment should indicate what the input argument is as well 
as the return value.  We generally would prefer to use booleans for 
return values rather than integers when we can.  So the return type 
should probably be adjusted.

/* Return TRUE if OP can be considered a zero extension from one or
    more sub-word modes to larger modes up to a full word.

    For example (and:DI (reg) (const_int X))

    Depending on the value of X could be considered a zero extension
    from QI, HI and SI to larger modes up to DImode.  */



> +{
> +  if (GET_CODE (insn) == AND)
> +    {
> +      rtx set = XEXP (insn, 0);
> +      if (REG_P(set))
Formatting.  Space between the macro/function name and the open 
parenthesis for arguments.

> +	{
> +	  if (XEXP (insn, 1) == const1_rtx)
> +	    return 1;
This is *way* too specific.  You probably want to be looking at the 
result mode's mask and comparing the constant to that.


> +/* Identify instruction AND with identical zero extension.  */
> +
> +static unsigned int
> +insn_is_zext_p (rtx_insn *insn)
The function comment needs improvements similar to the other function. 
But this time you really are working with an INSN.



> +{
> +  rtx body = PATTERN (insn);
> +
> +  if (GET_CODE (body) == PARALLEL)
> +     return 0;
You should consider handling the case where you've got a 2 element 
PARALLEL where the first element is a suitable AND and the second is 
just a CLOBBER.  This case arises on targets that implement condition 
code registers.

Using single_set may be a good way to handle that.  It'll give you back 
the SET expression after stripping out the CLOBBER.


> +
> +  if (GET_CODE(body) == SET && GET_CODE (SET_SRC (body)) == AND)
Formatting nit.  GET_CODE(body) should be GET_CODE (body)).


> +   {
> +     rtx set = XEXP (SET_SRC (body), 0);
> +
> +     if (REG_P(set) && GET_MODE (SET_DEST (body)) == GET_MODE(set))
Multiple formatting nits here.

> +       {
> +	 if (XEXP (SET_SRC (body), 1) == const1_rtx)
> +	   return 1;
Same comment as prior function.  I don't think we want to restrict this 
to just (const_int 1).








> @@ -359,27 +479,45 @@ combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, rtx *orig_set)
> +      if (GET_CODE (SET_SRC (cand_pat)) == AND)
> +	temp_extension
> +	= gen_rtx_fmt_ee (cand->code, cand->mode, XEXP (orig_src, 0),
> +			  XEXP (orig_src, 1));
> +      else
> +	temp_extension
> +	 = gen_rtx_fmt_e (cand->code, cand->mode, XEXP (orig_src, 0));Isn't cand->code always AND here?  Would it make more sense to use 
gen_rtx_AND?  The reason the other code doesn't here is because I think 
the candidate could be a zero or sign extension, thus we have to use 
gen_rtx_fmt_e.


>     else if (GET_CODE (orig_src) == IF_THEN_ELSE)
>       {
>         /* Only IF_THEN_ELSE of phi-type copies are combined.  Otherwise,
> -         in general, IF_THEN_ELSE should not be combined.  */
> -      return false;
> +	in general, IF_THEN_ELSE should not be combined.  Relaxed
> +	cases with IF_THEN_ELSE across basic blocls */
> +	return true;
Hard to know if this is safe/correct.  I would *strongly* suggest 
breaking this patch up along the lines I've suggested above.  That way 
we can focus on one significant change at a time.



> +	temp_extension
> +	= gen_rtx_fmt_e (cand->code, cand->mode,orig_src);
Please be careful with formatting.  THere's always a space after a comma 
in an argument list.


> +
>         rtx simplified_temp_extension = simplify_rtx (temp_extension);
> +
>         if (simplified_temp_extension)
>           temp_extension = simplified_temp_extension;
> +
>         new_set = gen_rtx_SET (new_reg, temp_extension);
>       }
You're introducing extraneous newlines.  Sometimes that's useful to 
provide visual indications of related code.  But be aware that often 
folks will object to such changes intermixed in a large patch.



>   
> +static bool
> +def_arith_p (rtx_insn *insn, rtx orig_src)
Missing a comment indicating what this function does, its arguments and 
return value.  These are important so that when someone looks at your 
code later they can quickly ascertain the general purpose of the 
function without having to read and try to understand all the code.


> +{
> +  if (!REG_P (orig_src)) return true;
Bring the "return true;" down to a new line.

> +
> +  vec<rtx_insn *> *dest = XCNEWVEC (vec<rtx_insn *>, 4);
> +  if (!get_defs (insn, orig_src, dest))
> +    return false;
> +
> +  int i;
> +  rtx_insn *def_insn;
> +  bool has_arith = false;
> +
> +  FOR_EACH_VEC_ELT (*dest, i, def_insn)
> +    {
> +      if (DEBUG_INSN_P (def_insn))
> +	{
> +	  has_arith = true;
> +	  break;
> +	}
Hard to know since there's no function comment, but does it make sense 
to set HAS_ARITH for a DEBUG_INSN?


> +
> +      if ((GET_CODE (PATTERN (def_insn)) == SET
> +	   && (GET_CODE (SET_SRC (PATTERN (def_insn))) == ASHIFT
> +	   || GET_CODE (SET_SRC (PATTERN (def_insn))) == LSHIFTRT))
> +	   //|| GET_CODE (SET_SRC (PATTERN (def_insn))) == ZERO_EXTEND))
> +	   || GET_CODE (PATTERN (def_insn)) == PARALLEL)
> +	{
> +	  has_arith = true;
> +	  break;
> +	}
So there's the commented out line in here.  That should be removed.  It 
seems to me that there should be a comment indicating what property 
we're looking for so that the reader knows why these codes were chosen.




> +
> +      if (GET_CODE (PATTERN (def_insn)) == SET
> +	  && (GET_RTX_CLASS (GET_CODE (SET_SRC (PATTERN (def_insn)))) == RTX_BIN_ARITH
> +	  || GET_RTX_CLASS (GET_CODE (SET_SRC (PATTERN (def_insn)))) == RTX_COMM_ARITH))
> +	{
> +	  rtx src = XEXP (SET_SRC (PATTERN (def_insn)),0);
> +
> +	  if (GET_CODE (src) == LSHIFTRT
> +	      || GET_CODE (src) == ASHIFT)
> +	    {
> +	      has_arith = true;
> +	      break;
> +	    }
> +	}
Similarly.  Why are these particular RTX codes relevant?



> +}
> +
> +/* Find feasibility of extension elimination
> +   across basic blocks.  */
Needs to document the arguments and return value.

> +
> +static bool
> +eliminate_across_bbs_p (ext_cand *cand,
> +			rtx_insn *def_insn)
Both arguments should probably be on the same line.  The general rule is
if they fit in 80 columns, then don't split them.


> +{
> +  basic_block bb = BLOCK_FOR_INSN (cand->insn);
> +  edge fallthru_edge;
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      rtx_insn *insn = BB_END (e->src) ? PREV_INSN (BB_END (e->src)) : NULL;
> +
> +      if (insn && NONDEBUG_INSN_P (insn)
> +	  && GET_CODE (PATTERN (insn)) == SET && SET_SRC (PATTERN(insn))
> +	  && GET_CODE (SET_SRC (PATTERN (insn))) == IF_THEN_ELSE)
Filter out the NONDEBUG_INSN_P like this:

	if (!NONDEBUG_INSN_P (insn))
	  continue;

Use single_set to ensure you have a proper set.  If your code can't 
handle a single_set inside a clobber, then what you've written is OK, 
but it should at least be documented.

THere's also multiple whitespace problems in your code.  I would 
strongly suggest you review the code formatting guidelines and perhaps 
use one of the various formatting tools out there to help make sure 
you're following the guidelines.

So at a high level, what property are you looking for from a CFG 
standpoint?  I'd hazard a guess you're looking for extended basic blocks 
or dominance.



> +
> +   rtx set = single_set(cand->insn);
> +   /* The destination register of the extension insn must not be
> +	 used or set between the def_insn and cand->insn exclusive.  */
> +   if (INSN_CHAIN_CODE_P (GET_CODE (def_insn))
> +       && INSN_CHAIN_CODE_P (cand->code))
> +     if ((cand->code == ZERO_EXTEND)
> +	  && REG_P (SET_DEST (set)) && NEXT_INSN (def_insn)
> +	  && (reg_used_between_p (SET_DEST (set), def_insn, cand->insn)
> +		|| reg_set_between_p (SET_DEST (set), def_insn, cand->insn)))
> +	return false;
This looks similar to other check(s) in ree.cc.  Would it make sense to 
factor this test into its own function?  If so, that would belong in 
patch #1 of the proposed series.


> +
> +   if (cand->code == ZERO_EXTEND && (bb != BLOCK_FOR_INSN (def_insn)
> +	|| DF_INSN_LUID (def_insn) > DF_INSN_LUID (cand->insn)))
> +     return false;
Formatting problems with your operations.  That makes this much harder 
to read.  Bring the && down to a new line and indent it just past the 
open paren.  The || will need further indentation as well.

> +
> +   if (insn_is_zext_p (cand->insn)
> +	&& GET_CODE (PATTERN (BB_END (bb))) != USE)
> +     return false;
So why is the USE important here?  And if it is, don't you need to check 
the argument of hte use for something?


> +
> +   if (insn_is_zext_p (cand->insn)
> +	&& GET_CODE (PATTERN (BB_END (bb))) == USE
> +	&& REGNO (XEXP (PATTERN (BB_END (bb)), 0)) != REGNO (SET_DEST (cand->expr)))
> +     return false;
You've repeated some of the tests.  Please clean this up.


> +
> +
> +   if (cand->code == SIGN_EXTEND
> +       && GET_CODE ((PATTERN (def_insn))) == SET)
> +     {
> +	rtx orig_src = SET_SRC (PATTERN (def_insn));
> +	machine_mode ext_src_mode;
> +
> +	ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
> +
> +	if (GET_MODE (SET_DEST (PATTERN (def_insn))) != ext_src_mode)
> +	  return false;
> +	if (GET_CODE (orig_src) != PLUS)
> +	  return false;
> +
> +	if (!REG_P (XEXP (orig_src, 0)))
> +	  return false;
> +
> +	if (!REG_P (XEXP (orig_src,1)))
> +	  return false;
> +
> +	if (GET_CODE (orig_src) == PLUS)
> +	  {
> +	    bool def_src1
> +	      = def_arith_p (def_insn,
> +			     XEXP (SET_SRC (PATTERN(def_insn)), 0));
> +	    bool def_src2
> +	       = def_arith_p (def_insn,
> +			     XEXP (SET_SRC (PATTERN(def_insn)), 1));
> +
> +	    if (def_src1 || def_src2)
> +	      return false;
> +	}
> +     }
So are you relying on WORD_REGISTER_OPERATIONS here?  It's like you're 
expecting arithmetic to be setting full words for sub-word operations. 
If so, you need to actually verify WORD_REGISTER_OPERATIONS is in effect 
and that the mode of the operation is smaller than a word.





> +
>   /* Merge the DEF_INSN with an extension.  Calls combine_set_extension
>      on the SET pattern.  */
>   
> @@ -710,15 +971,26 @@ merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, ext_state *state)
>     ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
>     sub_rtx = get_sub_rtx (def_insn);
>   
> +
>     if (sub_rtx == NULL)
>       return false;
>   
> -  if (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
> -	  || ((state->modified[INSN_UID (def_insn)].kind
> -	       == (cand->code == ZERO_EXTEND
> +  bool copy_needed
> +    = (REGNO (SET_DEST (cand->expr)) != REGNO (XEXP (SET_SRC (cand->expr), 0)));
> +
> +  bool feasible =  eliminate_across_bbs_p (cand, def_insn);
> +
> +  if (!feasible) return false;
Various formatting issues.


> +
> +  if (((!copy_needed && (insn_is_zext_p (cand->insn)
> +	|| (cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND))
> +	&& (cand->code == SIGN_EXTEND || GET_MODE (SET_DEST (*sub_rtx)) != ext_src_mode)
> +	&& state->modified[INSN_UID (def_insn)].kind == EXT_MODIFIED_NONE))
> +	|| ((state->modified[INSN_UID (def_insn)].kind
> +		== (cand->code == ZERO_EXTEND
>   		   ? EXT_MODIFIED_ZEXT : EXT_MODIFIED_SEXT))
> -	      && state->modified[INSN_UID (def_insn)].mode
> -		 == ext_src_mode))
> +	     && state->modified[INSN_UID (def_insn)].mode
> +		== ext_src_mode))
Ugh.  What a mess.  This huge condition deserves a comment.  THere may 
be formatting issues in here too.  Mailers tend to muck up whitespace so 
it's a bit hard to be 100% sure.



> @@ -771,6 +1044,45 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
>     state->defs_list.truncate (0);
>     state->copies_list.truncate (0);
>   
> +  if (cand->code == ZERO_EXTEND)
> +    {
> +      rtx orig_src = XEXP (SET_SRC (cand->expr),0);
> +      if (!get_defs (cand->insn, orig_src, NULL))
> +	{
> +	  if (GET_MODE (orig_src) == QImode
> +	      && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
> +	      && !FUNCTION_VALUE_REGNO_P (REGNO (orig_src)))
Way too specific.  You don't want to be checking specific modes.  It's 
also the case that you can't necessarily depend on ABIs to guarantee a 
particular zero/sign extension.  As has been mentioned earlier in this 
thread we need a way to describe what the ABI guarantees & requires. 
Just assuming that these are extended is wrong.




>   	  FOR_EACH_VEC_ELT (state->modified_list, i, def_insn)
>   	    {
>   	      ext_modified *modified = &state->modified[INSN_UID (def_insn)];
>   	      if (modified->kind == EXT_MODIFIED_NONE)
>   		modified->kind = (cand->code == ZERO_EXTEND ? EXT_MODIFIED_ZEXT
> -						            : EXT_MODIFIED_SEXT);
> +							    : EXT_MODIFIED_SEXT);
Looks like a gratutious whitespace change.  If the formatting on this 
was wrong before, then include the formatting fix in patch #1 of the series.


I think at this point there's enough TODOs for you.  I look forward to 
seeing an updated series.

Jeff
  

Patch

diff --git a/gcc/common/config/rs6000/rs6000-common.cc b/gcc/common/config/rs6000/rs6000-common.cc
index 2140c442ba9..968db215028 100644
--- a/gcc/common/config/rs6000/rs6000-common.cc
+++ b/gcc/common/config/rs6000/rs6000-common.cc
@@ -34,6 +34,8 @@  static const struct default_options rs6000_option_optimization_table[] =
     { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Enable -fsched-pressure for first pass instruction scheduling.  */
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+    /* Enable -free for zero extension and sign extension elimination.*/
+    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
     /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
        loops at -O2 and above by default.  */
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
diff --git a/gcc/ree.cc b/gcc/ree.cc
index 413aec7c8eb..8057f0325f4 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,101 @@  struct ext_cand
 
 static int max_insn_uid;
 
+/* Get all the reaching definitions of an instruction.  The definitions are
+   desired for REG used in INSN.  Return the definition list or NULL if a
+   definition is missing.  If DEST is non-NULL, additionally push the INSN
+   of the definitions onto DEST.  */
+
+static struct df_link *
+get_defs (rtx_insn *insn, rtx reg, vec<rtx_insn *> *dest)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+    {
+      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+	return NULL;
+      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+	break;
+    }
+
+  if (use == NULL)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+    {
+      /* Problem getting some definition for this instruction.  */
+      if (ref_link->ref == NULL)
+	return NULL;
+      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+	return NULL;
+      /* As global regs are assumed to be defined at each function call
+	 dataflow can report a call_insn as being a definition of REG.
+	 But we can't do anything with that in this pass so proceed only
+	 if the instruction really sets REG in a way that can be deduced
+	 from the RTL structure.  */
+      if (global_regs[REGNO (reg)]
+	  && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+	return NULL;
+    }
+
+  if (dest)
+    for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+      dest->safe_push (DF_REF_INSN (ref_link->ref));
+
+  return ref_chain;
+}
+
+
+/* Identify instruction AND with identical zero extension.  */
+
+static unsigned int
+insn_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+    {
+      rtx set = XEXP (insn, 0);
+      if (REG_P(set))
+	{
+	  if (XEXP (insn, 1) == const1_rtx)
+	    return 1;
+	}
+      else
+	return 0;
+    }
+
+  return 0;
+}
+
+/* Identify instruction AND with identical zero extension.  */
+
+static unsigned int
+insn_is_zext_p (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+
+  if (GET_CODE (body) == PARALLEL)
+     return 0;
+
+  if (GET_CODE(body) == SET && GET_CODE (SET_SRC (body)) == AND)
+   {
+     rtx set = XEXP (SET_SRC (body), 0);
+
+     if (REG_P(set) && GET_MODE (SET_DEST (body)) == GET_MODE(set))
+       {
+	 if (XEXP (SET_SRC (body), 1) == const1_rtx)
+	   return 1;
+       }
+     else
+      return 0;
+   }
+
+   return 0;
+}
+
 /* Update or remove REG_EQUAL or REG_EQUIV notes for INSN.  */
 
 static bool
@@ -297,6 +392,31 @@  update_reg_equal_equiv_notes (rtx_insn *insn, machine_mode new_mode,
   return true;
 }
 
+/* Return true if INSN is
+     (SET (reg REGNO (def_reg)) (if_then_else (cond) (REG x1) (REG x2)))
+   and store x1 and x2 in REG_1 and REG_2.  */
+
+static bool
+is_cond_copy_insn (rtx_insn *insn, rtx *reg1, rtx *reg2)
+{
+  rtx expr = single_set (insn);
+
+  if (expr != NULL_RTX
+      && GET_CODE (expr) == SET
+      && GET_CODE (SET_DEST (expr)) == REG
+      && GET_CODE (SET_SRC (expr))  == IF_THEN_ELSE
+      && REG_P (XEXP (SET_SRC (expr), 1))
+      && REG_P (XEXP (SET_SRC (expr), 2)))
+    {
+      *reg1 = XEXP (SET_SRC (expr), 1);
+      *reg2 = XEXP (SET_SRC (expr), 2);
+      return true;
+    }
+
+  return false;
+}
+
+
 /* Given a insn (CURR_INSN), an extension candidate for removal (CAND)
    and a pointer to the SET rtx (ORIG_SET) that needs to be modified,
    this code modifies the SET rtx to a new SET rtx that extends the
@@ -319,7 +439,7 @@  combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, rtx *orig_set)
 {
   rtx orig_src = SET_SRC (*orig_set);
   machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
-  rtx new_set;
+  rtx new_set = NULL_RTX;
   rtx cand_pat = single_set (cand->insn);
 
   /* If the extension's source/destination registers are not the same
@@ -359,27 +479,45 @@  combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, rtx *orig_set)
   else if (GET_CODE (orig_src) == cand->code)
     {
       /* Here is a sequence of two extensions.  Try to merge them.  */
-      rtx temp_extension
-	= gen_rtx_fmt_e (cand->code, cand->mode, XEXP (orig_src, 0));
+      rtx temp_extension = NULL_RTX;
+      if (GET_CODE (SET_SRC (cand_pat)) == AND)
+	temp_extension
+	= gen_rtx_fmt_ee (cand->code, cand->mode, XEXP (orig_src, 0),
+			  XEXP (orig_src, 1));
+      else
+	temp_extension
+	 = gen_rtx_fmt_e (cand->code, cand->mode, XEXP (orig_src, 0));
       rtx simplified_temp_extension = simplify_rtx (temp_extension);
       if (simplified_temp_extension)
         temp_extension = simplified_temp_extension;
+
       new_set = gen_rtx_SET (new_reg, temp_extension);
     }
   else if (GET_CODE (orig_src) == IF_THEN_ELSE)
     {
       /* Only IF_THEN_ELSE of phi-type copies are combined.  Otherwise,
-         in general, IF_THEN_ELSE should not be combined.  */
-      return false;
+	in general, IF_THEN_ELSE should not be combined.  Relaxed
+	cases with IF_THEN_ELSE across basic blocls */
+	return true;
     }
   else
     {
       /* This is the normal case.  */
-      rtx temp_extension
-	= gen_rtx_fmt_e (cand->code, cand->mode, orig_src);
+      rtx temp_extension = NULL_RTX;
+
+      if (GET_CODE (SET_SRC (cand_pat)) == AND)
+	temp_extension
+	= gen_rtx_fmt_ee (cand->code, cand->mode,orig_src,
+			  XEXP (SET_SRC (cand_pat), 1));
+      else
+	temp_extension
+	= gen_rtx_fmt_e (cand->code, cand->mode,orig_src);
+
       rtx simplified_temp_extension = simplify_rtx (temp_extension);
+
       if (simplified_temp_extension)
         temp_extension = simplified_temp_extension;
+
       new_set = gen_rtx_SET (new_reg, temp_extension);
     }
 
@@ -454,53 +592,6 @@  transform_ifelse (ext_cand *cand, rtx_insn *def_insn)
   return false;
 }
 
-/* Get all the reaching definitions of an instruction.  The definitions are
-   desired for REG used in INSN.  Return the definition list or NULL if a
-   definition is missing.  If DEST is non-NULL, additionally push the INSN
-   of the definitions onto DEST.  */
-
-static struct df_link *
-get_defs (rtx_insn *insn, rtx reg, vec<rtx_insn *> *dest)
-{
-  df_ref use;
-  struct df_link *ref_chain, *ref_link;
-
-  FOR_EACH_INSN_USE (use, insn)
-    {
-      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
-        return NULL;
-      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
-	break;
-    }
-
-  gcc_assert (use != NULL);
-
-  ref_chain = DF_REF_CHAIN (use);
-
-  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
-    {
-      /* Problem getting some definition for this instruction.  */
-      if (ref_link->ref == NULL)
-        return NULL;
-      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
-        return NULL;
-      /* As global regs are assumed to be defined at each function call
-	 dataflow can report a call_insn as being a definition of REG.
-	 But we can't do anything with that in this pass so proceed only
-	 if the instruction really sets REG in a way that can be deduced
-	 from the RTL structure.  */
-      if (global_regs[REGNO (reg)]
-	  && !set_of (reg, DF_REF_INSN (ref_link->ref)))
-	return NULL;
-    }
-
-  if (dest)
-    for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
-      dest->safe_push (DF_REF_INSN (ref_link->ref));
-
-  return ref_chain;
-}
-
 /* Get all the reaching uses of an instruction.  The uses are desired for REG
    set in INSN.  Return use list or NULL if a use is missing or irregular.  */
 
@@ -530,30 +621,6 @@  get_uses (rtx_insn *insn, rtx reg)
   return ref_chain;
 }
 
-/* Return true if INSN is
-     (SET (reg REGNO (def_reg)) (if_then_else (cond) (REG x1) (REG x2)))
-   and store x1 and x2 in REG_1 and REG_2.  */
-
-static bool
-is_cond_copy_insn (rtx_insn *insn, rtx *reg1, rtx *reg2)
-{
-  rtx expr = single_set (insn);
-
-  if (expr != NULL_RTX
-      && GET_CODE (expr) == SET
-      && GET_CODE (SET_DEST (expr)) == REG
-      && GET_CODE (SET_SRC (expr))  == IF_THEN_ELSE
-      && GET_CODE (XEXP (SET_SRC (expr), 1)) == REG
-      && GET_CODE (XEXP (SET_SRC (expr), 2)) == REG)
-    {
-      *reg1 = XEXP (SET_SRC (expr), 1);
-      *reg2 = XEXP (SET_SRC (expr), 2);
-      return true;
-    }
-
-  return false;
-}
-
 enum ext_modified_kind
 {
   /* The insn hasn't been modified by ree pass yet.  */
@@ -698,6 +765,200 @@  get_sub_rtx (rtx_insn *def_insn)
   return sub_rtx;
 }
 
+static bool
+def_arith_p (rtx_insn *insn, rtx orig_src)
+{
+  if (!REG_P (orig_src)) return true;
+
+  vec<rtx_insn *> *dest = XCNEWVEC (vec<rtx_insn *>, 4);
+  if (!get_defs (insn, orig_src, dest))
+    return false;
+
+  int i;
+  rtx_insn *def_insn;
+  bool has_arith = false;
+
+  FOR_EACH_VEC_ELT (*dest, i, def_insn)
+    {
+      if (DEBUG_INSN_P (def_insn))
+	{
+	  has_arith = true;
+	  break;
+	}
+
+      if ((GET_CODE (PATTERN (def_insn)) == SET
+	   && (GET_CODE (SET_SRC (PATTERN (def_insn))) == ASHIFT
+	   || GET_CODE (SET_SRC (PATTERN (def_insn))) == LSHIFTRT))
+	   //|| GET_CODE (SET_SRC (PATTERN (def_insn))) == ZERO_EXTEND))
+	   || GET_CODE (PATTERN (def_insn)) == PARALLEL)
+	{
+	  has_arith = true;
+	  break;
+	}
+
+      if (GET_CODE (PATTERN (def_insn)) == SET
+	  && (GET_RTX_CLASS (GET_CODE (SET_SRC (PATTERN (def_insn)))) == RTX_BIN_ARITH
+	  || GET_RTX_CLASS (GET_CODE (SET_SRC (PATTERN (def_insn)))) == RTX_COMM_ARITH))
+	{
+	  rtx src = XEXP (SET_SRC (PATTERN (def_insn)),0);
+
+	  if (GET_CODE (src) == LSHIFTRT
+	      || GET_CODE (src) == ASHIFT)
+	    {
+	      has_arith = true;
+	      break;
+	    }
+	}
+     }
+  XDELETEVEC (dest);
+  return has_arith;
+}
+
+/* Find feasibility of extension elimination
+   across basic blocks.  */
+
+static bool
+eliminate_across_bbs_p (ext_cand *cand,
+			rtx_insn *def_insn)
+{
+  basic_block bb = BLOCK_FOR_INSN (cand->insn);
+  edge fallthru_edge;
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      rtx_insn *insn = BB_END (e->src) ? PREV_INSN (BB_END (e->src)) : NULL;
+
+      if (insn && NONDEBUG_INSN_P (insn)
+	  && GET_CODE (PATTERN (insn)) == SET && SET_SRC (PATTERN(insn))
+	  && GET_CODE (SET_SRC (PATTERN (insn))) == IF_THEN_ELSE)
+	{
+	  if (e->dest == bb)
+	    {
+	      basic_block jump_block = e->dest;
+	      if (jump_block == bb)
+		{
+		  if (single_succ_p (BLOCK_FOR_INSN (def_insn)))
+		    {
+		      fallthru_edge = single_succ_edge (BLOCK_FOR_INSN (def_insn));
+		      if (BB_END (fallthru_edge->dest) && (bb != fallthru_edge->dest
+			|| e->dest != fallthru_edge->dest))
+			return false;
+		    }
+		   else
+		     return false;
+		}
+		else return false;
+	      }
+	      else
+		{
+		  if (single_succ_p (e->dest))
+		    {
+		      fallthru_edge = single_succ_edge (e->dest);
+		      if (BB_END (fallthru_edge->dest) && bb != fallthru_edge->dest)
+			return false;
+		    }
+		}
+	     }
+	  }
+  if (single_succ_p (BLOCK_FOR_INSN (def_insn)))
+    {
+      fallthru_edge = single_succ_edge (BLOCK_FOR_INSN (def_insn));
+      if (BB_END (fallthru_edge->dest) && bb != fallthru_edge->dest)
+	return false;
+    }
+   else
+     return false;
+
+   rtx set = single_set(cand->insn);
+   /* The destination register of the extension insn must not be
+	 used or set between the def_insn and cand->insn exclusive.  */
+   if (INSN_CHAIN_CODE_P (GET_CODE (def_insn))
+       && INSN_CHAIN_CODE_P (cand->code))
+     if ((cand->code == ZERO_EXTEND)
+	  && REG_P (SET_DEST (set)) && NEXT_INSN (def_insn)
+	  && (reg_used_between_p (SET_DEST (set), def_insn, cand->insn)
+		|| reg_set_between_p (SET_DEST (set), def_insn, cand->insn)))
+	return false;
+
+   if (cand->code == ZERO_EXTEND && (bb != BLOCK_FOR_INSN (def_insn)
+	|| DF_INSN_LUID (def_insn) > DF_INSN_LUID (cand->insn)))
+     return false;
+
+   if (insn_is_zext_p (cand->insn)
+	&& GET_CODE (PATTERN (BB_END (bb))) != USE)
+     return false;
+
+   if (insn_is_zext_p (cand->insn)
+	&& GET_CODE (PATTERN (BB_END (bb))) == USE
+	&& REGNO (XEXP (PATTERN (BB_END (bb)), 0)) != REGNO (SET_DEST (cand->expr)))
+     return false;
+
+
+   if (cand->code == SIGN_EXTEND
+       && GET_CODE ((PATTERN (def_insn))) == SET)
+     {
+	rtx orig_src = SET_SRC (PATTERN (def_insn));
+	machine_mode ext_src_mode;
+
+	ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
+
+	if (GET_MODE (SET_DEST (PATTERN (def_insn))) != ext_src_mode)
+	  return false;
+	if (GET_CODE (orig_src) != PLUS)
+	  return false;
+
+	if (!REG_P (XEXP (orig_src, 0)))
+	  return false;
+
+	if (!REG_P (XEXP (orig_src,1)))
+	  return false;
+
+	if (GET_CODE (orig_src) == PLUS)
+	  {
+	    bool def_src1
+	      = def_arith_p (def_insn,
+			     XEXP (SET_SRC (PATTERN(def_insn)), 0));
+	    bool def_src2
+	       = def_arith_p (def_insn,
+			     XEXP (SET_SRC (PATTERN(def_insn)), 1));
+
+	    if (def_src1 || def_src2)
+	      return false;
+	}
+     }
+
+   if ((cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND)
+	&& GET_CODE (PATTERN (def_insn)) == PARALLEL)
+     return false;
+
+   if (cand->code == ZERO_EXTEND
+	&& GET_CODE (PATTERN (def_insn)) == SET)
+     {
+	if (GET_CODE (SET_SRC (PATTERN (def_insn))) != XOR
+	    && GET_CODE (SET_SRC (PATTERN (def_insn))) != IOR)
+	  return false;
+
+
+	if (GET_CODE (SET_SRC (PATTERN (def_insn))) == XOR
+	     || GET_CODE (SET_SRC (PATTERN (def_insn))) == IOR)
+	  {
+	    bool def_src1
+	      = def_arith_p (def_insn,
+			     XEXP (SET_SRC (PATTERN(def_insn)), 0));
+	    bool def_src2
+	       = def_arith_p (def_insn,
+			     XEXP (SET_SRC (PATTERN(def_insn)), 1));
+
+	    if (def_src1 || def_src2)
+	      return false;
+	  }
+      }
+
+   return true;
+}
+
 /* Merge the DEF_INSN with an extension.  Calls combine_set_extension
    on the SET pattern.  */
 
@@ -710,15 +971,26 @@  merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, ext_state *state)
   ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
   sub_rtx = get_sub_rtx (def_insn);
 
+
   if (sub_rtx == NULL)
     return false;
 
-  if (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
-	  || ((state->modified[INSN_UID (def_insn)].kind
-	       == (cand->code == ZERO_EXTEND
+  bool copy_needed
+    = (REGNO (SET_DEST (cand->expr)) != REGNO (XEXP (SET_SRC (cand->expr), 0)));
+
+  bool feasible =  eliminate_across_bbs_p (cand, def_insn);
+
+  if (!feasible) return false;
+
+  if (((!copy_needed && (insn_is_zext_p (cand->insn)
+	|| (cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND))
+	&& (cand->code == SIGN_EXTEND || GET_MODE (SET_DEST (*sub_rtx)) != ext_src_mode)
+	&& state->modified[INSN_UID (def_insn)].kind == EXT_MODIFIED_NONE))
+	|| ((state->modified[INSN_UID (def_insn)].kind
+		== (cand->code == ZERO_EXTEND
 		   ? EXT_MODIFIED_ZEXT : EXT_MODIFIED_SEXT))
-	      && state->modified[INSN_UID (def_insn)].mode
-		 == ext_src_mode))
+	     && state->modified[INSN_UID (def_insn)].mode
+		== ext_src_mode))
     {
       if (GET_MODE_UNIT_SIZE (GET_MODE (SET_DEST (*sub_rtx)))
 	  >= GET_MODE_UNIT_SIZE (cand->mode))
@@ -734,7 +1006,6 @@  merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, ext_state *state)
 	  return true;
 	}
     }
-
   return false;
 }
 
@@ -744,7 +1015,9 @@  merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, ext_state *state)
 static inline rtx
 get_extended_src_reg (rtx src)
 {
-  while (GET_CODE (src) == SIGN_EXTEND || GET_CODE (src) == ZERO_EXTEND)
+  while (GET_CODE (src) == SIGN_EXTEND
+	 || GET_CODE (src) == ZERO_EXTEND
+	 || insn_is_zext_p(src))
     src = XEXP (src, 0);
   gcc_assert (REG_P (src));
   return src;
@@ -771,6 +1044,45 @@  combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
   state->defs_list.truncate (0);
   state->copies_list.truncate (0);
 
+  if (cand->code == ZERO_EXTEND)
+    {
+      rtx orig_src = XEXP (SET_SRC (cand->expr),0);
+      if (!get_defs (cand->insn, orig_src, NULL))
+	{
+	  if (GET_MODE (orig_src) == QImode
+	      && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+	      && !FUNCTION_VALUE_REGNO_P (REGNO (orig_src)))
+	     {
+		if (side_effects_p (PATTERN (cand->insn)))
+		  return false;
+
+		struct df_link *uses
+		  = get_uses (cand->insn, SET_DEST (PATTERN (cand->insn)));
+
+		if (!uses) return false;
+
+		for (df_link *use = uses; use; use = use->next)
+		  {
+		    if (BLOCK_FOR_INSN (cand->insn)
+			!= BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+		      return false;
+
+		    rtx_insn *insn = DF_REF_INSN (use->ref);
+
+		    if (GET_CODE (PATTERN (insn)) == SET)
+		      {
+			rtx_code code = GET_CODE (SET_SRC (PATTERN (insn)));
+			if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
+			    || GET_RTX_CLASS (code) == RTX_COMM_ARITH
+			    || GET_RTX_CLASS (code) == RTX_UNARY)
+			  return false;
+		       }
+		    }
+		 return true;
+	     }
+	 }
+    }
+
   outcome = make_defs_and_copies_lists (cand->insn, set_pat, state);
 
   if (!outcome)
@@ -1004,15 +1316,17 @@  combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
       cand->mode = mode;
     }
 
-  merge_successful = true;
-
+  merge_successful = false;
   /* Go through the defs vector and try to merge all the definitions
      in this vector.  */
   state->modified_list.truncate (0);
   FOR_EACH_VEC_ELT (state->defs_list, defs_ix, def_insn)
     {
       if (merge_def_and_ext (cand, def_insn, state))
-	state->modified_list.safe_push (def_insn);
+	{
+	  merge_successful = true;
+	  state->modified_list.safe_push (def_insn);
+	}
       else
         {
           merge_successful = false;
@@ -1045,34 +1359,71 @@  combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	 definitions could be merged.  */
       if (apply_change_group ())
         {
-          if (dump_file)
-            fprintf (dump_file, "All merges were successful.\n");
+	  if (state->modified_list.length() == 0)
+	     return false;
+
+	  rtx_insn *insn = state->modified_list[0];
+
+	  if ((cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND)
+	       && GET_CODE (PATTERN (insn)) == SET
+	       && GET_CODE (SET_SRC (PATTERN (insn))) != XOR
+	       && GET_CODE (SET_SRC (PATTERN (insn))) != PLUS
+	       && GET_CODE (SET_SRC (PATTERN (insn))) != IOR)
+	     return false;
+
+	   if (dump_file)
+	     fprintf (dump_file, "All merges were successful.\n");
 
 	  FOR_EACH_VEC_ELT (state->modified_list, i, def_insn)
 	    {
 	      ext_modified *modified = &state->modified[INSN_UID (def_insn)];
 	      if (modified->kind == EXT_MODIFIED_NONE)
 		modified->kind = (cand->code == ZERO_EXTEND ? EXT_MODIFIED_ZEXT
-						            : EXT_MODIFIED_SEXT);
+							    : EXT_MODIFIED_SEXT);
 
 	      if (copy_needed)
 		modified->do_not_reextend = 1;
 	    }
           return true;
         }
-      else
-        {
-          /* Changes need not be cancelled explicitly as apply_change_group
-             does it.  Print list of definitions in the dump_file for debug
-             purposes.  This extension cannot be deleted.  */
-          if (dump_file)
-            {
-	      fprintf (dump_file,
-		       "Merge cancelled, non-mergeable definitions:\n");
-	      FOR_EACH_VEC_ELT (state->modified_list, i, def_insn)
-	        print_rtl_single (dump_file, def_insn);
-            }
-        }
+	else
+	  {
+	    if (state->modified_list.length() == 0)
+	      return false;
+
+	     rtx_insn *insn = state->modified_list[0];
+
+	     if ((cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND)
+		  && GET_CODE (PATTERN (insn)) == SET
+		  && GET_CODE (SET_SRC (PATTERN (insn))) != XOR
+		  && GET_CODE (SET_SRC (PATTERN (insn))) != PLUS
+		  && GET_CODE (SET_SRC (PATTERN (insn))) != IOR)
+		return false;
+
+	    if (cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND)
+	      {
+		FOR_EACH_VEC_ELT (state->modified_list, i, def_insn)
+		  {
+		    ext_modified *modified = &state->modified[INSN_UID (def_insn)];
+		    if (modified->kind == EXT_MODIFIED_NONE)
+		      modified->kind = (cand->code == ZERO_EXTEND ? EXT_MODIFIED_ZEXT
+								  : EXT_MODIFIED_SEXT);
+
+		     modified->do_not_reextend = 1;
+		   }
+		  return true;
+	      }
+	    /* Changes need not be cancelled explicitly as apply_change_group
+		does it.  Print list of definitions in the dump_file for debug
+		purposes.  This extension cannot be deleted.  */
+	    if (dump_file)
+	      {
+		fprintf (dump_file,
+			"Merge cancelled, non-mergeable definitions:\n");
+		FOR_EACH_VEC_ELT (state->modified_list, i, def_insn)
+		  print_rtl_single (dump_file, def_insn);
+	      }
+	   }
     }
   else
     {
@@ -1106,42 +1457,51 @@  add_removable_extension (const_rtx expr, rtx_insn *insn,
   mode = GET_MODE (dest);
 
   if (REG_P (dest)
-      && (code == SIGN_EXTEND || code == ZERO_EXTEND)
+      && (code == SIGN_EXTEND || code == ZERO_EXTEND || insn_is_zext_p(src))
       && REG_P (XEXP (src, 0)))
     {
       rtx reg = XEXP (src, 0);
       struct df_link *defs, *def;
       ext_cand *cand;
 
+      defs = get_defs (insn, reg, NULL);
+
       /* Zero-extension of an undefined value is partly defined (it's
 	 completely undefined for sign-extension, though).  So if there exists
 	 a path from the entry to this zero-extension that leaves this register
 	 uninitialized, removing the extension could change the behavior of
 	 correct programs.  So first, check it is not the case.  */
-      if (code == ZERO_EXTEND && !bitmap_bit_p (init_regs, REGNO (reg)))
+      if (!defs && FUNCTION_ARG_REGNO_P (REGNO (reg)))
 	{
-	  if (dump_file)
-	    {
-	      fprintf (dump_file, "Cannot eliminate extension:\n");
-	      print_rtl_single (dump_file, insn);
-	      fprintf (dump_file, " because it can operate on uninitialized"
-			          " data\n");
-	    }
+	  ext_cand e = {expr, code, mode, insn};
+	  insn_list->safe_push (e);
 	  return;
 	}
 
-      /* Second, make sure we can get all the reaching definitions.  */
-      defs = get_defs (insn, reg, NULL);
-      if (!defs)
-	{
-	  if (dump_file)
-	    {
-	      fprintf (dump_file, "Cannot eliminate extension:\n");
-	      print_rtl_single (dump_file, insn);
-	      fprintf (dump_file, " because of missing definition(s)\n");
-	    }
-	  return;
-	}
+       if ((code == ZERO_EXTEND
+	    && (!FUNCTION_ARG_REGNO_P (REGNO (reg)))
+	    && bitmap_bit_p (init_regs, REGNO (reg))))
+	 {
+	   if (dump_file)
+	     {
+	       fprintf (dump_file, "Cannot eliminate extension:\n");
+	       print_rtl_single (dump_file, insn);
+	       fprintf (dump_file, " because it can operate on uninitialized"
+				   " data\n");
+	     }
+	     return;
+	 }
+
+	if (!defs)
+	  {
+	    if (dump_file)
+	      {
+		fprintf (dump_file, "Cannot eliminate extension:\n");
+		print_rtl_single (dump_file, insn);
+		fprintf (dump_file, " because of missing definition(s)\n");
+	      }
+	      return;
+	  }
 
       /* Third, make sure the reaching definitions don't feed another and
 	 different extension.  FIXME: this obviously can be improved.  */
@@ -1321,7 +1681,8 @@  find_and_remove_re (void)
 	      && (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0))))
 	    {
               reinsn_copy_list.safe_push (curr_cand->insn);
-              reinsn_copy_list.safe_push (state.defs_list[0]);
+	      if (state.defs_list.length() != 0)
+		reinsn_copy_list.safe_push (state.defs_list[0]);
 	    }
 	  reinsn_del_list.safe_push (curr_cand->insn);
 	  state.modified[INSN_UID (curr_cand->insn)].deleted = 1;
@@ -1342,24 +1703,27 @@  find_and_remove_re (void)
      Remember that the memory reference will be changed to refer to the
      destination of the extention.  So we're actually emitting a copy
      from the new destination to the old destination.  */
-  for (unsigned int i = 0; i < reinsn_copy_list.length (); i += 2)
-    {
-      rtx_insn *curr_insn = reinsn_copy_list[i];
-      rtx_insn *def_insn = reinsn_copy_list[i + 1];
-
-      /* Use the mode of the destination of the defining insn
-	 for the mode of the copy.  This is necessary if the
-	 defining insn was used to eliminate a second extension
-	 that was wider than the first.  */
-      rtx sub_rtx = *get_sub_rtx (def_insn);
-      rtx set = single_set (curr_insn);
-      rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
-				 REGNO (XEXP (SET_SRC (set), 0)));
-      rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
-				 REGNO (SET_DEST (set)));
-      rtx new_set = gen_rtx_SET (new_dst, new_src);
-      emit_insn_after (new_set, def_insn);
-    }
+     for (unsigned int i = 0; i < reinsn_copy_list.length (); i += 2)
+	{
+	  rtx_insn *curr_insn = reinsn_copy_list[i];
+
+	  if ((i+1) < reinsn_copy_list.length())
+	    {
+	      rtx_insn *def_insn = reinsn_copy_list[i + 1];
+	      /* Use the mode of the destination of the defining insn
+		 for the mode of the copy.  This is necessary if the
+		 defining insn was used to eliminate a second extension
+		 that was wider than the first.  */
+	      rtx sub_rtx = *get_sub_rtx (def_insn);
+	      rtx set = single_set (curr_insn);
+	      rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
+					 REGNO (XEXP (SET_SRC (set), 0)));
+	      rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
+					 REGNO (SET_DEST (set)));
+	      rtx new_set = gen_rtx_SET (new_dst, new_src);
+	      emit_insn_after (new_set, def_insn);
+	   }
+	}
 
   /* Delete all useless extensions here in one sweep.  */
   FOR_EACH_VEC_ELT (reinsn_del_list, i, curr_insn)
@@ -1368,9 +1732,10 @@  find_and_remove_re (void)
   reinsn_list.release ();
   XDELETEVEC (state.modified);
 
-  if (dump_file && num_re_opportunities > 0)
+  if (dump_file && num_re_opportunities > 0) {
     fprintf (dump_file, "Elimination opportunities = %d realized = %d\n",
 	     num_re_opportunities, num_realized);
+  }
 }
 
 /* Find and remove redundant extensions.  */
diff --git a/gcc/testsuite/g++.target/powerpc/sext-elim.C b/gcc/testsuite/g++.target/powerpc/sext-elim.C
new file mode 100644
index 00000000000..431696cf11e
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/sext-elim.C
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -free" } */
+
+unsigned long c2l(unsigned char* p)
+{
+  unsigned long res = *p + *(p+1);
+  return res;
+}
+
+long c2sl(signed char* p)
+{
+  long res = *p + *(p+1);
+  return res;
+}
+
+/* { dg-final { scan-assembler-not "extsw" } } */
diff --git a/gcc/testsuite/g++.target/powerpc/zext-elim-1.C b/gcc/testsuite/g++.target/powerpc/zext-elim-1.C
new file mode 100644
index 00000000000..bc6cc0eb3ca
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/zext-elim-1.C
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -free" } */
+
+extern unsigned char magic1[256];
+
+unsigned int hash(const unsigned char inp[4])
+{
+   const unsigned long long INIT = 0x1ULL;
+   unsigned long long h1 = INIT;
+   h1 = magic1[((unsigned long long)inp[0]) ^ h1];
+   h1 = magic1[((unsigned long long)inp[1]) ^ h1];
+   h1 = magic1[((unsigned long long)inp[2]) ^ h1];
+   h1 = magic1[((unsigned long long)inp[3]) ^ h1];
+   return h1;
+}
+
+/* { dg-final { scan-assembler-not "rlwinm" } } */
diff --git a/gcc/testsuite/g++.target/powerpc/zext-elim-2.C b/gcc/testsuite/g++.target/powerpc/zext-elim-2.C
new file mode 100644
index 00000000000..4e72925104f
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/zext-elim-2.C
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -free" } */
+
+unsigned char g(unsigned char t[], unsigned char v)
+{
+  return (t[v & 0x7f] & 0x7f) | (v & 0x80);
+}
+
+/* { dg-final { scan-assembler-times "rlwinm" 2 } } */
diff --git a/gcc/testsuite/g++.target/powerpc/zext-elim-3.C b/gcc/testsuite/g++.target/powerpc/zext-elim-3.C
new file mode 100644
index 00000000000..1d443af066a
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/zext-elim-3.C
@@ -0,0 +1,16 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -free" } */
+
+void *memset(void *b, int c, unsigned long len)
+{
+  unsigned long i;
+
+  for (i = 0; i < len; i++)
+    ((unsigned char *)b)[i] = c;
+
+   return b;
+}
+
+/* { dg-final { scan-assembler-not "rlwinm" } } */
diff --git a/gcc/testsuite/g++.target/powerpc/zext-elim.C b/gcc/testsuite/g++.target/powerpc/zext-elim.C
new file mode 100644
index 00000000000..56eabbe0c19
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/zext-elim.C
@@ -0,0 +1,30 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -free" } */
+
+#include <stddef.h>
+
+bool foo (int a, int b)
+{
+  if (a > 2)
+    return false;
+
+  if (b < 10)
+    return true;
+
+  return true;
+}
+
+int bar (int a, int b)
+{
+  if (a > 2)
+    return 0;
+
+  if (b < 10)
+    return 1;
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "rldicl" } } */