fold fold_truth_andor field merging into ifcombine (was: [PATCH] assorted improvements for fold_truth_andor_1)
Checks

Context                                        | Check   | Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm     | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
Commit Message
This patch introduces various improvements to the logic that merges
field compares, moving it into ifcombine.
Before the patch, we could merge:
(a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1)
into something like:
(((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)
if both of A's fields live within the same alignment boundaries, and
so do B's, at the same relative positions. Constants may be used
instead of the object B.
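For concreteness, here is a minimal sketch of the kind of source the
pre-existing merging already handled (struct and field names are
invented for illustration; this is not one of the new testcases):

  struct S { unsigned char x1, y1; };

  int
  eq (struct S *a, struct S *b)
  {
    /* Both of A's fields sit in the same aligned halfword, and so do
       B's, at the same offsets, so the two byte compares can become a
       single masked 16-bit compare.  */
    return a->x1 == b->x1 && a->y1 == b->y1;
  }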
The initial goal of this patch was to enable such combinations when a
field crossed alignment boundaries, e.g. for packed types. We can't
generally access such fields with a single memory access, so when we
come across such a compare, we will attempt to combine each access
separately.
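As a hypothetical illustration (layout invented), with a packed struct
a member can straddle an alignment boundary, so its compare has to be
carried out, and merged, one word at a time:

  struct __attribute__ ((packed)) P { unsigned char c; unsigned int i; };

  int
  eq (struct P *a, struct P *b)
  {
    /* 'i' straddles a 4-byte boundary and can't in general be loaded
       with a single aligned access, so its compare is split per word;
       each part may then merge with neighboring compares such as the
       one on 'c'.  */
    return a->c == b->c && a->i == b->i;
  }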
Some merging opportunities were missed because of right-shifts,
compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
narrowing conversions, especially after earlier merges. This patch
introduces handlers for several cases involving these.
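For instance, the xor form is the same test as comparing the masked
fields directly; a sketch with an arbitrary mask (names made up):

  struct T { unsigned char x1, y1; };

  int
  eq_low_nibble_then_y1 (struct T *a, struct T *b)
  {
    /* ((x ^ y) & M) == 0 is equivalent to (x & M) == (y & M), so this
       can now be recognized and merged with the compare of y1.  */
    return ((a->x1 ^ b->x1) & 0x0f) == 0 && a->y1 == b->y1;
  }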
Merging multiple field accesses into wider bitfield-like accesses is
undesirable too early in compilation, so we move it from folding to
ifcombine, and extend ifcombine to merge noncontiguous compares,
provided there are no intervening side effects.  VUSEs used to prevent
ifcombine; that seemed excessively conservative, since the relevant
side effects (including the possibility of trapping loads) were
already tested for, so that restriction is removed.
Unlike in earlier ifcombine, when merging noncontiguous compares the
merged compare must replace the earliest compare, which may require
moving up the DEFs that the later compare depended on.
When it is the second of a noncontiguous pair of compares that first
accesses a word, we may merge the first compare with part of the
second compare that refers to the same word, keeping the compare of
the remaining bits at the spot where the second compare used to be.
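A sketch of a noncontiguous merge (names invented): the compares of x1
and y1 are separated by an unrelated test, yet, absent side effects in
between, they can still be combined, with the merged compare replacing
the earliest one:

  struct R { unsigned char x1, y1; };

  int
  f (struct R *a, struct R *b, int k)
  {
    /* a->x1 == b->x1 and a->y1 == b->y1 are not adjacent in the
       condition chain, but ifcombine can now merge them, leaving the
       unrelated test on k in place.  */
    return a->x1 == b->x1 && k > 0 && a->y1 == b->y1;
  }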
Handling of compares with non-constant fields was somewhat generalized
from what fold used to do: we now handle non-adjacent fields, even
when a field of one object crosses an alignment boundary while the
corresponding field of the other object doesn't.
The -Wno-error for toplev.o on rs6000 is because of toplev.c's:
if ((flag_sanitize & SANITIZE_ADDRESS)
&& !FRAME_GROWS_DOWNWARD)
and rs6000.h's:
#define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 \
|| (flag_sanitize & SANITIZE_ADDRESS) != 0)
The mutually exclusive conditions involving flag_sanitize are now
noticed and reported by ifcombine's warning on mutually exclusive
compares.  i386's insn-attrtab.o needs -Wno-error for similar reasons.
for gcc/ChangeLog
* fold-const.cc (make_bit_field): Export.
(all_ones_mask_p): Drop.
(unextend, decode_field_reference, fold_truth_andor_1): Move
field compare merging logic...
* gimple-fold.cc: ... here.
(ssa_is_substitutable_p, is_cast_p, is_binop_p): New.
(prepare_xor, follow_load): New.
(compute_split_boundary_from_align): New.
(make_bit_field_load, build_split_load): New.
(reuse_split_load, mergeable_loads_p): New.
(fold_truth_andor_maybe_separate): New.
* tree-ssa-ifcombine.cc: Include bitmap.h.
(constant_condition_p): New.
(recognize_if_then_else_nc, recognize_if_succs): New.
(bb_no_side_effects_p): Don't reject VUSEs.
(update_profile_after_ifcombine): Adjust for noncontiguous
merges.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Extended for noncontiguous merges
after factoring out of...
(ifcombine_ifandif): ... this. Drop result_inv arg. Try
fold_truth_andor_maybe_separate.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb arg. Call
recognize_if_then_else_nc. Adjust ifcombine_ifandif calls.
(tree_ssa_ifcombine_bb): Return the earliest affected block.
Call recognize_if_then_else_nc. Try noncontiguous blocks.
(pass_tree_ifcombine::execute): Retry affected blocks.
* config/i386/t-i386 (insn-attrtab.o-warn): Disable errors.
* config/rs6000/t-rs6000 (toplev.o-warn): Likewise.
for gcc/testsuite/ChangeLog
* gcc.dg/field-merge-1.c: New.
* gcc.dg/field-merge-2.c: New.
* gcc.dg/field-merge-3.c: New.
* gcc.dg/field-merge-4.c: New.
* gcc.dg/field-merge-5.c: New.
* gcc.dg/field-merge-6.c: New.
* gcc.dg/field-merge-7.c: New.
---
gcc/config/i386/t-i386 | 2
gcc/config/rs6000/t-rs6000 | 4
gcc/fold-const.cc | 512 -------------
gcc/fold-const.h | 8
gcc/gimple-fold.cc | 1315 ++++++++++++++++++++++++++++++++++
gcc/testsuite/gcc.dg/field-merge-1.c | 64 ++
gcc/testsuite/gcc.dg/field-merge-2.c | 31 +
gcc/testsuite/gcc.dg/field-merge-3.c | 36 +
gcc/testsuite/gcc.dg/field-merge-4.c | 40 +
gcc/testsuite/gcc.dg/field-merge-5.c | 40 +
gcc/testsuite/gcc.dg/field-merge-6.c | 26 +
gcc/testsuite/gcc.dg/field-merge-7.c | 23 +
gcc/tree-ssa-ifcombine.cc | 572 ++++++++++++---
13 files changed, 2040 insertions(+), 633 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c
create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c
create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c
create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c
create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c
create mode 100644 gcc/testsuite/gcc.dg/field-merge-6.c
create mode 100644 gcc/testsuite/gcc.dg/field-merge-7.c
Comments
On Thu, Sep 26, 2024 at 10:49 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
>
> This patch introduces various improvements to the logic that merges
> field compares, moving it into ifcombine.
>
> Before the patch, we could merge:
>
> (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1)
>
> into something like:
>
> (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)
>
> if both of A's fields live within the same alignment boundaries, and
> so do B's, at the same relative positions. Constants may be used
> instead of the object B.
>
> The initial goal of this patch was to enable such combinations when a
> field crossed alignment boundaries, e.g. for packed types. We can't
> generally access such fields with a single memory access, so when we
> come across such a compare, we will attempt to combine each access
> separately.
>
> Some merging opportunities were missed because of right-shifts,
> compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
> narrowing conversions, especially after earlier merges. This patch
> introduces handlers for several cases involving these.
>
> The merging of multiple field accesses into wider bitfield-like
> accesses is undesirable to do too early in compilation, so we move it
> from folding to ifcombine, and extend ifcombine to merge noncontiguous
> compares, absent intervening side effects. VUSEs used to prevent
> ifcombine; that seemed excessively conservative, since relevant side
> effects were already tested, including the possibility of trapping
> loads, so that's removed.
>
> Unlike earlier ifcombine, when merging noncontiguous compares the
> merged compare must replace the earliest compare, which may require
> moving up the DEFs that contributed to the latter compare.
>
> When it is the second of a noncontiguous pair of compares that first
> accesses a word, we may merge the first compare with part of the
> second compare that refers to the same word, keeping the compare of
> the remaining bits at the spot where the second compare used to be.
>
> Handling compares with non-constant fields was somewhat generalized
> from what fold used to do, now handling non-adjacent fields, even if a
> field of one object crosses an alignment boundary but the other
> doesn't.
Thanks for working on this. There's #if 0 portions in the patch - did you
send the correct version?
>
> The -Wno-error for toplev.o on rs6000 is because of toplev.c's:
>
> if ((flag_sanitize & SANITIZE_ADDRESS)
> && !FRAME_GROWS_DOWNWARD)
>
> and rs6000.h's:
>
> #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 \
> || (flag_sanitize & SANITIZE_ADDRESS) != 0)
>
> The mutually exclusive conditions involving flag_sanitize are now
> noticed and reported by ifcombine's warning on mutually exclusive
> compares. i386's needs -Wno-error for insn-attrtab.o for similar
> reasons.
I wonder if we can check the locations as to whether the spelling location
is different and suppress diagnostics when macro expansions were involved?
Adding -Wno-error should be the last resort and I suspect such cases
happen in user code as well?
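(One possible shape of such a check, purely illustrative; loc1 and
loc2 stand for the locations of the two compares:)

  /* Skip the "mutually exclusive compares" warning when either
     condition comes from a macro expansion.  */
  if (linemap_location_from_macro_expansion_p (line_table, loc1)
      || linemap_location_from_macro_expansion_p (line_table, loc2))
    return;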
More comments inline.
>
> diff --git a/gcc/config/i386/t-i386 b/gcc/config/i386/t-i386
> index bf4ae109af986..1b904787ec624 100644
> --- a/gcc/config/i386/t-i386
> +++ b/gcc/config/i386/t-i386
> @@ -79,3 +79,5 @@ s-i386-bt: $(srcdir)/config/i386/i386-builtin-types.awk \
> $(AWK) -f $^ > tmp-bt.inc
> $(SHELL) $(srcdir)/../move-if-change tmp-bt.inc i386-builtin-types.inc
> $(STAMP) $@
> +
> +insn-attrtab.o-warn = -Wno-error
> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
> index 155788de40a35..a83968d663a61 100644
> --- a/gcc/config/rs6000/t-rs6000
> +++ b/gcc/config/rs6000/t-rs6000
> @@ -95,6 +95,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
> $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
> $(srcdir)/config/rs6000/rs6000-tables.opt
>
> +# FRAME_GROWS_DOWNWARD tests flag_sanitize in a way that rules out a
> +# test in toplev.c.
> +toplev.o-warn = -Wno-error
> +
> # The rs6000 backend doesn't cause warnings in these files.
> insn-conditions.o-warn =
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 0578f42ac0c51..552a706ab6def 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -137,7 +137,6 @@ static tree range_successor (tree);
> static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
> static tree fold_cond_expr_with_comparison (location_t, tree, enum tree_code,
> tree, tree, tree, tree);
> -static tree unextend (tree, int, int, tree);
> static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
> static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
> static tree fold_binary_op_with_conditional_arg (location_t,
> @@ -4701,7 +4700,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
> is the original memory reference used to preserve the alias set of
> the access. */
>
> -static tree
> +tree
> make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
> HOST_WIDE_INT bitsize, poly_int64 bitpos,
> int unsignedp, int reversep)
> @@ -4951,136 +4950,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
> return lhs;
> }
>
> -/* Subroutine for fold_truth_andor_1: decode a field reference.
> -
> - If EXP is a comparison reference, we return the innermost reference.
> -
> - *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
> - set to the starting bit number.
> -
> - If the innermost field can be completely contained in a mode-sized
> - unit, *PMODE is set to that mode. Otherwise, it is set to VOIDmode.
> -
> - *PVOLATILEP is set to 1 if the any expression encountered is volatile;
> - otherwise it is not changed.
> -
> - *PUNSIGNEDP is set to the signedness of the field.
> -
> - *PREVERSEP is set to the storage order of the field.
> -
> - *PMASK is set to the mask used. This is either contained in a
> - BIT_AND_EXPR or derived from the width of the field.
> -
> - *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
> -
> - Return 0 if this is not a component reference or is one that we can't
> - do anything with. */
> -
> -static tree
> -decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
> - HOST_WIDE_INT *pbitpos, machine_mode *pmode,
> - int *punsignedp, int *preversep, int *pvolatilep,
> - tree *pmask, tree *pand_mask)
> -{
> - tree exp = *exp_;
> - tree outer_type = 0;
> - tree and_mask = 0;
> - tree mask, inner, offset;
> - tree unsigned_type;
> - unsigned int precision;
> -
> - /* All the optimizations using this function assume integer fields.
> - There are problems with FP fields since the type_for_size call
> - below can fail for, e.g., XFmode. */
> - if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> - return NULL_TREE;
> -
> - /* We are interested in the bare arrangement of bits, so strip everything
> - that doesn't affect the machine mode. However, record the type of the
> - outermost expression if it may matter below. */
> - if (CONVERT_EXPR_P (exp)
> - || TREE_CODE (exp) == NON_LVALUE_EXPR)
> - outer_type = TREE_TYPE (exp);
> - STRIP_NOPS (exp);
> -
> - if (TREE_CODE (exp) == BIT_AND_EXPR)
> - {
> - and_mask = TREE_OPERAND (exp, 1);
> - exp = TREE_OPERAND (exp, 0);
> - STRIP_NOPS (exp); STRIP_NOPS (and_mask);
> - if (TREE_CODE (and_mask) != INTEGER_CST)
> - return NULL_TREE;
> - }
> -
> - poly_int64 poly_bitsize, poly_bitpos;
> - inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
> - pmode, punsignedp, preversep, pvolatilep);
> - if ((inner == exp && and_mask == 0)
> - || !poly_bitsize.is_constant (pbitsize)
> - || !poly_bitpos.is_constant (pbitpos)
> - || *pbitsize < 0
> - || offset != 0
> - || TREE_CODE (inner) == PLACEHOLDER_EXPR
> - /* We eventually want to build a larger reference and need to take
> - the address of this. */
> - || (!REFERENCE_CLASS_P (inner) && !DECL_P (inner))
> - /* Reject out-of-bound accesses (PR79731). */
> - || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> - && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> - *pbitpos + *pbitsize) < 0))
> - return NULL_TREE;
> -
> - unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
> - if (unsigned_type == NULL_TREE)
> - return NULL_TREE;
> -
> - *exp_ = exp;
> -
> - /* If the number of bits in the reference is the same as the bitsize of
> - the outer type, then the outer type gives the signedness. Otherwise
> - (in case of a small bitfield) the signedness is unchanged. */
> - if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
> - *punsignedp = TYPE_UNSIGNED (outer_type);
> -
> - /* Compute the mask to access the bitfield. */
> - precision = TYPE_PRECISION (unsigned_type);
> -
> - mask = build_int_cst_type (unsigned_type, -1);
> -
> - mask = const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> - mask = const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> -
> - /* Merge it with the mask we found in the BIT_AND_EXPR, if any. */
> - if (and_mask != 0)
> - mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
> - fold_convert_loc (loc, unsigned_type, and_mask), mask);
> -
> - *pmask = mask;
> - *pand_mask = and_mask;
> - return inner;
> -}
> -
> -/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
> - bit positions and MASK is SIGNED. */
> -
> -static bool
> -all_ones_mask_p (const_tree mask, unsigned int size)
> -{
> - tree type = TREE_TYPE (mask);
> - unsigned int precision = TYPE_PRECISION (type);
> -
> - /* If this function returns true when the type of the mask is
> - UNSIGNED, then there will be errors. In particular see
> - gcc.c-torture/execute/990326-1.c. There does not appear to be
> - any documentation paper trail as to why this is so. But the pre
> - wide-int worked with that restriction and it has been preserved
> - here. */
> - if (size > precision || TYPE_SIGN (type) == UNSIGNED)
> - return false;
> -
> - return wi::mask (size, false, precision) == wi::to_wide (mask);
> -}
> -
> /* Subroutine for fold: determine if VAL is the INTEGER_CONST that
> represents the sign bit of EXP's type. If EXP represents a sign
> or zero extension, also test VAL against the unextended type.
> @@ -6390,48 +6259,6 @@ fold_range_test (location_t loc, enum tree_code code, tree type,
> return 0;
> }
>
> -/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
> - bit value. Arrange things so the extra bits will be set to zero if and
> - only if C is signed-extended to its full width. If MASK is nonzero,
> - it is an INTEGER_CST that should be AND'ed with the extra bits. */
> -
> -static tree
> -unextend (tree c, int p, int unsignedp, tree mask)
> -{
> - tree type = TREE_TYPE (c);
> - int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
> - tree temp;
> -
> - if (p == modesize || unsignedp)
> - return c;
> -
> - /* We work by getting just the sign bit into the low-order bit, then
> - into the high-order bit, then sign-extend. We then XOR that value
> - with C. */
> - temp = build_int_cst (TREE_TYPE (c),
> - wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
> -
> - /* We must use a signed type in order to get an arithmetic right shift.
> - However, we must also avoid introducing accidental overflows, so that
> - a subsequent call to integer_zerop will work. Hence we must
> - do the type conversion here. At this point, the constant is either
> - zero or one, and the conversion to a signed type can never overflow.
> - We could get an overflow if this conversion is done anywhere else. */
> - if (TYPE_UNSIGNED (type))
> - temp = fold_convert (signed_type_for (type), temp);
> -
> - temp = const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
> - temp = const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
> - if (mask != 0)
> - temp = const_binop (BIT_AND_EXPR, temp,
> - fold_convert (TREE_TYPE (c), mask));
> - /* If necessary, convert the type back to match the type of C. */
> - if (TYPE_UNSIGNED (type))
> - temp = fold_convert (type, temp);
> -
> - return fold_convert (type, const_binop (BIT_XOR_EXPR, c, temp));
> -}
> -
> /* For an expression that has the form
> (A && B) || ~B
> or
> @@ -6502,20 +6329,13 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
> lhs, rhs);
> return NULL_TREE;
> }
> -
> +
> /* Find ways of folding logical expressions of LHS and RHS:
> Try to merge two comparisons to the same innermost item.
> Look for range tests like "ch >= '0' && ch <= '9'".
> Look for combinations of simple terms on machines with expensive branches
> and evaluate the RHS unconditionally.
>
> - For example, if we have p->a == 2 && p->b == 4 and we can make an
> - object large enough to span both A and B, we can do this with a comparison
> - against the object ANDed with the a mask.
> -
> - If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
> - operations to do this with one comparison.
> -
> We check for both normal comparisons and the BIT_AND_EXPRs made this by
> function and the one above.
>
> @@ -6540,24 +6360,9 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
> convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
> comparison for one-bit fields. */
>
> - enum tree_code wanted_code;
> enum tree_code lcode, rcode;
> tree ll_arg, lr_arg, rl_arg, rr_arg;
> - tree ll_inner, lr_inner, rl_inner, rr_inner;
> - HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
> - HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
> - HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
> - HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
> - int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
> - int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
> - machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
> - scalar_int_mode lnmode, rnmode;
> - tree ll_mask, lr_mask, rl_mask, rr_mask;
> - tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
> - tree l_const, r_const;
> - tree lntype, rntype, result;
> - HOST_WIDE_INT first_bit, end_bit;
> - int volatilep;
> + tree result;
>
> /* Start by getting the comparison codes. Fail if anything is volatile.
> If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> @@ -6652,316 +6457,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
> build_int_cst (TREE_TYPE (ll_arg), 0));
> }
>
> - /* See if the comparisons can be merged. Then get all the parameters for
> - each side. */
> -
> - if ((lcode != EQ_EXPR && lcode != NE_EXPR)
> - || (rcode != EQ_EXPR && rcode != NE_EXPR))
> - return 0;
> -
> - ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
> - volatilep = 0;
> - ll_inner = decode_field_reference (loc, &ll_arg,
> - &ll_bitsize, &ll_bitpos, &ll_mode,
> - &ll_unsignedp, &ll_reversep, &volatilep,
> - &ll_mask, &ll_and_mask);
> - lr_inner = decode_field_reference (loc, &lr_arg,
> - &lr_bitsize, &lr_bitpos, &lr_mode,
> - &lr_unsignedp, &lr_reversep, &volatilep,
> - &lr_mask, &lr_and_mask);
> - rl_inner = decode_field_reference (loc, &rl_arg,
> - &rl_bitsize, &rl_bitpos, &rl_mode,
> - &rl_unsignedp, &rl_reversep, &volatilep,
> - &rl_mask, &rl_and_mask);
> - rr_inner = decode_field_reference (loc, &rr_arg,
> - &rr_bitsize, &rr_bitpos, &rr_mode,
> - &rr_unsignedp, &rr_reversep, &volatilep,
> - &rr_mask, &rr_and_mask);
> -
> - /* It must be true that the inner operation on the lhs of each
> - comparison must be the same if we are to be able to do anything.
> - Then see if we have constants. If not, the same must be true for
> - the rhs's. */
> - if (volatilep
> - || ll_reversep != rl_reversep
> - || ll_inner == 0 || rl_inner == 0
> - || ! operand_equal_p (ll_inner, rl_inner, 0))
> - return 0;
> -
> - if (TREE_CODE (lr_arg) == INTEGER_CST
> - && TREE_CODE (rr_arg) == INTEGER_CST)
> - {
> - l_const = lr_arg, r_const = rr_arg;
> - lr_reversep = ll_reversep;
> - }
> - else if (lr_reversep != rr_reversep
> - || lr_inner == 0 || rr_inner == 0
> - || ! operand_equal_p (lr_inner, rr_inner, 0))
> - return 0;
> - else
> - l_const = r_const = 0;
> -
> - /* If either comparison code is not correct for our logical operation,
> - fail. However, we can convert a one-bit comparison against zero into
> - the opposite comparison against that bit being set in the field. */
> -
> - wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
> - if (lcode != wanted_code)
> - {
> - if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
> - {
> - /* Make the left operand unsigned, since we are only interested
> - in the value of one bit. Otherwise we are doing the wrong
> - thing below. */
> - ll_unsignedp = 1;
> - l_const = ll_mask;
> - }
> - else
> - return 0;
> - }
> -
> - /* This is analogous to the code for l_const above. */
> - if (rcode != wanted_code)
> - {
> - if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
> - {
> - rl_unsignedp = 1;
> - r_const = rl_mask;
> - }
> - else
> - return 0;
> - }
> -
> - /* See if we can find a mode that contains both fields being compared on
> - the left. If we can't, fail. Otherwise, update all constants and masks
> - to be relative to a field of that size. */
> - first_bit = MIN (ll_bitpos, rl_bitpos);
> - end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> - if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> - TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> - volatilep, &lnmode))
> - return 0;
> -
> - lnbitsize = GET_MODE_BITSIZE (lnmode);
> - lnbitpos = first_bit & ~ (lnbitsize - 1);
> - lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
> - xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
> -
> - if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> - {
> - xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
> - xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
> - }
> -
> - ll_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, ll_mask),
> - size_int (xll_bitpos));
> - rl_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, rl_mask),
> - size_int (xrl_bitpos));
> - if (ll_mask == NULL_TREE || rl_mask == NULL_TREE)
> - return 0;
> -
> - if (l_const)
> - {
> - l_const = fold_convert_loc (loc, lntype, l_const);
> - l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
> - l_const = const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
> - if (l_const == NULL_TREE)
> - return 0;
> - if (! integer_zerop (const_binop (BIT_AND_EXPR, l_const,
> - fold_build1_loc (loc, BIT_NOT_EXPR,
> - lntype, ll_mask))))
> - {
> - warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> -
> - return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> - }
> - }
> - if (r_const)
> - {
> - r_const = fold_convert_loc (loc, lntype, r_const);
> - r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
> - r_const = const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
> - if (r_const == NULL_TREE)
> - return 0;
> - if (! integer_zerop (const_binop (BIT_AND_EXPR, r_const,
> - fold_build1_loc (loc, BIT_NOT_EXPR,
> - lntype, rl_mask))))
> - {
> - warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> -
> - return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> - }
> - }
> -
> - /* If the right sides are not constant, do the same for it. Also,
> - disallow this optimization if a size, signedness or storage order
> - mismatch occurs between the left and right sides. */
> - if (l_const == 0)
> - {
> - if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
> - || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
> - || ll_reversep != lr_reversep
> - /* Make sure the two fields on the right
> - correspond to the left without being swapped. */
> - || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
> - return 0;
> -
> - first_bit = MIN (lr_bitpos, rr_bitpos);
> - end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> - if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> - TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> - volatilep, &rnmode))
> - return 0;
> -
> - rnbitsize = GET_MODE_BITSIZE (rnmode);
> - rnbitpos = first_bit & ~ (rnbitsize - 1);
> - rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
> - xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
> -
> - if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> - {
> - xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
> - xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
> - }
> -
> - lr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
> - rntype, lr_mask),
> - size_int (xlr_bitpos));
> - rr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
> - rntype, rr_mask),
> - size_int (xrr_bitpos));
> - if (lr_mask == NULL_TREE || rr_mask == NULL_TREE)
> - return 0;
> -
> - /* Make a mask that corresponds to both fields being compared.
> - Do this for both items being compared. If the operands are the
> - same size and the bits being compared are in the same position
> - then we can do this by masking both and comparing the masked
> - results. */
> - ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> - lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
> - if (lnbitsize == rnbitsize
> - && xll_bitpos == xlr_bitpos
> - && lnbitpos >= 0
> - && rnbitpos >= 0)
> - {
> - lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
> - lntype, lnbitsize, lnbitpos,
> - ll_unsignedp || rl_unsignedp, ll_reversep);
> - if (! all_ones_mask_p (ll_mask, lnbitsize))
> - lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
> -
> - rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
> - rntype, rnbitsize, rnbitpos,
> - lr_unsignedp || rr_unsignedp, lr_reversep);
> - if (! all_ones_mask_p (lr_mask, rnbitsize))
> - rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
> -
> - return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> - }
> -
> - /* There is still another way we can do something: If both pairs of
> - fields being compared are adjacent, we may be able to make a wider
> - field containing them both.
> -
> - Note that we still must mask the lhs/rhs expressions. Furthermore,
> - the mask must be shifted to account for the shift done by
> - make_bit_field_ref. */
> - if (((ll_bitsize + ll_bitpos == rl_bitpos
> - && lr_bitsize + lr_bitpos == rr_bitpos)
> - || (ll_bitpos == rl_bitpos + rl_bitsize
> - && lr_bitpos == rr_bitpos + rr_bitsize))
> - && ll_bitpos >= 0
> - && rl_bitpos >= 0
> - && lr_bitpos >= 0
> - && rr_bitpos >= 0)
> - {
> - tree type;
> -
> - lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
> - ll_bitsize + rl_bitsize,
> - MIN (ll_bitpos, rl_bitpos),
> - ll_unsignedp, ll_reversep);
> - rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
> - lr_bitsize + rr_bitsize,
> - MIN (lr_bitpos, rr_bitpos),
> - lr_unsignedp, lr_reversep);
> -
> - ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
> - size_int (MIN (xll_bitpos, xrl_bitpos)));
> - lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
> - size_int (MIN (xlr_bitpos, xrr_bitpos)));
> - if (ll_mask == NULL_TREE || lr_mask == NULL_TREE)
> - return 0;
> -
> - /* Convert to the smaller type before masking out unwanted bits. */
> - type = lntype;
> - if (lntype != rntype)
> - {
> - if (lnbitsize > rnbitsize)
> - {
> - lhs = fold_convert_loc (loc, rntype, lhs);
> - ll_mask = fold_convert_loc (loc, rntype, ll_mask);
> - type = rntype;
> - }
> - else if (lnbitsize < rnbitsize)
> - {
> - rhs = fold_convert_loc (loc, lntype, rhs);
> - lr_mask = fold_convert_loc (loc, lntype, lr_mask);
> - type = lntype;
> - }
> - }
> -
> - if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
> - lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
> -
> - if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
> - rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
> -
> - return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> - }
> -
> - return 0;
> - }
> -
> - /* Handle the case of comparisons with constants. If there is something in
> - common between the masks, those bits of the constants must be the same.
> - If not, the condition is always false. Test for this to avoid generating
> - incorrect code below. */
> - result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> - if (! integer_zerop (result)
> - && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
> - const_binop (BIT_AND_EXPR, result, r_const)) != 1)
> - {
> - if (wanted_code == NE_EXPR)
> - {
> - warning (0, "%<or%> of unmatched not-equal tests is always 1");
> - return constant_boolean_node (true, truth_type);
> - }
> - else
> - {
> - warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
> - return constant_boolean_node (false, truth_type);
> - }
> - }
> -
> - if (lnbitpos < 0)
> - return 0;
> -
> - /* Construct the expression we will return. First get the component
> - reference we will make. Unless the mask is all ones the width of
> - that field, perform the mask operation. Then compare with the
> - merged constant. */
> - result = make_bit_field_ref (loc, ll_inner, ll_arg,
> - lntype, lnbitsize, lnbitpos,
> - ll_unsignedp || rl_unsignedp, ll_reversep);
> -
> - ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> - if (! all_ones_mask_p (ll_mask, lnbitsize))
> - result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
> -
> - return build2_loc (loc, wanted_code, truth_type, result,
> - const_binop (BIT_IOR_EXPR, l_const, r_const));
> + return 0;
> }
>
> /* T is an integer expression that is being multiplied, divided, or taken a
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index 3e3998b57b042..6372dee74e9eb 100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -253,11 +253,19 @@ extern tree fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE
> extern tree_code minmax_from_comparison (tree_code, tree, tree,
> tree, tree);
>
> +extern tree make_bit_field_ref (location_t, tree, tree, tree,
> + HOST_WIDE_INT, poly_int64, int, int);
> +
> /* In gimple-fold.cc. */
> extern void clear_type_padding_in_mask (tree, unsigned char *);
> extern bool clear_padding_type_may_have_padding_p (tree);
> extern bool arith_overflowed_p (enum tree_code, const_tree, const_tree,
> const_tree);
> +extern tree fold_truth_andor_maybe_separate (location_t, enum tree_code, tree,
> + enum tree_code, tree, tree,
> + enum tree_code, tree, tree,
> + tree *);
> +
>
> /* Class used to compare gimple operands. */
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 942de7720fd2b..13ffd16e646eb 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see
> #include "varasm.h"
> #include "internal-fn.h"
> #include "gimple-range.h"
> +#include "tree-ssa-loop-niter.h" // stmt_dominates_stmt_p
Ick. Don't use it.
> enum strlen_range_kind {
> /* Compute the exact constant string length. */
> @@ -7384,7 +7385,1321 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
>
> return NULL_TREE;
> }
> +
> +/* Return TRUE if expression STMT is suitable for replacement. ???
> + Same as ssa_is_replaceable_p, except that we don't insist it has a
> + single use. */
>
> +static bool
> +ssa_is_substitutable_p (gimple *stmt)
I don't yet see how many times you use this - there's other passes
which need to query a subset of this like DCE or DSE for whether
a stmt can be elided.
This is mostly gimple_has_side_effects () but there's also
!cfun->can_delete_dead_exceptions && stmt_could_throw_p (cfun, stmt)
I think the name is probably bad and suggests it's generally applicable,
so maybe rename it to something more specific for your use?
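(Purely for illustration, a minimal sketch of the combination named
above; the function name is made up and this is not code from the
patch:)

  /* The "can this stmt be elided/moved" subset alluded to above.  */
  static bool
  stmt_elidable_for_ifcombine_p (gimple *stmt)
  {
    if (gimple_has_side_effects (stmt))
      return false;
    if (!cfun->can_delete_dead_exceptions
        && stmt_could_throw_p (cfun, stmt))
      return false;
    return true;
  }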
> +{
> +#if 0
> + use_operand_p use_p;
> +#endif
> + tree def;
> +#if 0
> + gimple *use_stmt;
> +#endif
> +
> + /* Only consider modify stmts. */
> + if (!is_gimple_assign (stmt))
> + return false;
> +
> + /* If the statement may throw an exception, it cannot be replaced. */
> + if (stmt_could_throw_p (cfun, stmt))
only when it throws internally?
> + return false;
> +
> + /* Punt if there is more than 1 def. */
> + def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF);
> + if (!def)
NULL_DEF_OPERAND_P (def)
> + return false;
> +
> +#if 0
> + /* Only consider definitions which have a single use. */
> + if (!single_imm_use (def, &use_p, &use_stmt))
> + return false;
> +
> + /* Used in this block, but at the TOP of the block, not the end. */
> + if (gimple_code (use_stmt) == GIMPLE_PHI)
> + return false;
> +#endif
> +
> + /* There must be no VDEFs. */
> + if (gimple_vdef (stmt))
> + return false;
> +
> + /* Float expressions must go through memory if float-store is on. */
> + if (flag_float_store
> + && FLOAT_TYPE_P (TREE_TYPE (def)))
> + return false;
This looks odd - did you run into actual problems? No other pass cares.
> +
> + /* An assignment with a register variable on the RHS is not
> + replaceable. */
> + if (gimple_assign_rhs_code (stmt) == VAR_DECL
> + && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
> + return false;
> +
> + /* No function calls can be replaced. */
> + if (is_gimple_call (stmt))
> + return false;
calls are not GIMPLE assignments, so redundant.
> +
> + /* Leave any stmt with volatile operands alone as well. */
> + if (gimple_has_volatile_ops (stmt))
> + return false;
> +
> + return true;
> +}
> +
> +/* Follow substitutable SSA DEFs for *NAME, including type casts,
> + adjusting *NAME to the single rhs or the type cast operand along
> + the way. Return the target type of the earliest type cast
> + found. */
> +
> +static tree
> +is_cast_p (tree *name)
very short name, add a prefix?
> +{
> + tree type = 0;
> +
> + while (TREE_CODE (*name) == SSA_NAME
> + && !SSA_NAME_IS_DEFAULT_DEF (*name))
> + {
> + gimple *def = SSA_NAME_DEF_STMT (*name);
> +
> + if (!ssa_is_substitutable_p (def))
Overly restrictive/expensive? Just check if it's a GIMPLE assign?
> + break;
> +
> + if (gimple_num_ops (def) != 2)
> + break;
> +
> + if (gimple_assign_single_p (def))
> + {
> + if (gimple_assign_load_p (def))
> + break;
> + *name = gimple_assign_rhs1 (def);
that could be a constant, or a REALPART_EXPR. Maybe move after ...
> + continue;
> + }
> +
> + if (!gimple_assign_cast_p (def))
> + break;
... this?
> +
> + if (!type)
> + type = TREE_TYPE (*name);
> + *name = gimple_assign_rhs1 (def);
> + }
> +
> + return type;
Overall this is a very odd function as it strips even (widen)(truncate)x?
> +}
> +
> +/* Follow substitutable SSA DEFs for *NAME. If we find it is assigned
> + a CODE binary expr, return the second operand, and set *NAME to the
> + first operand. */
> +
> +static tree
> +is_binop_p (enum tree_code code, tree *name)
Likewise to all of the above.
> +{
> + while (TREE_CODE (*name) == SSA_NAME
> + && !SSA_NAME_IS_DEFAULT_DEF (*name))
> + {
> + gimple *def = SSA_NAME_DEF_STMT (*name);
> +
> + if (!ssa_is_substitutable_p (def))
> + return 0;
> +
> + switch (gimple_num_ops (def))
> + {
> + default:
> + return 0;
> +
> + case 2:
> + if (gimple_assign_single_p (def) && !gimple_assign_load_p (def))
> + {
> + *name = gimple_assign_rhs1 (def);
> + continue;
> + }
> + return 0;
> +
> + case 3:
> + break;
> + }
> +
> + if (gimple_assign_rhs_code (def) != code)
> + return 0;
> +
> + *name = gimple_assign_rhs1 (def);
> + return gimple_assign_rhs2 (def);
> + }
> +
> + return 0;
> +}
> +
> +/* If *R_ARG is a constant zero, and L_ARG is a possibly masked
> + BIT_XOR_EXPR, return 1 and set *r_arg to l_arg.
> + Otherwise, return 0.
> +
> + The returned value should be passed to decode_field_reference for it
> + to handle l_arg, and then doubled for r_arg. */
> +
> +static int
> +prepare_xor (tree l_arg, tree *r_arg)
Same.
> +{
> + int ret = 0;
> +
> + if (!integer_zerop (*r_arg))
> + return ret;
> +
> + tree exp = l_arg;
> +
> + if (tree and_mask = is_binop_p (BIT_AND_EXPR, &exp))
> + {
> + if (TREE_CODE (and_mask) != INTEGER_CST)
> + return ret;
> + }
> +
> + if (is_binop_p (BIT_XOR_EXPR, &exp))
> + {
> + *r_arg = l_arg;
> + return 1;
> + }
> +
> + return ret;
> +}
> +
> +/* If EXP is a SSA_NAME whose DEF is a load stmt, set *LOAD to it and
> + return its RHS, otherwise return EXP. */
> +
> +static tree
> +follow_load (tree exp, gimple **load)
> +{
> + if (TREE_CODE (exp) == SSA_NAME
> + && !SSA_NAME_IS_DEFAULT_DEF (exp))
> + {
> + gimple *def = SSA_NAME_DEF_STMT (exp);
> + if (gimple_assign_load_p (def))
> + {
> + *load = def;
> + exp = gimple_assign_rhs1 (def);
> + }
> + }
> +
> + return exp;
> +}
> +
> +/* Subroutine for fold_truth_andor_1: decode a field reference.
> +
> + If EXP is a comparison reference, we return the innermost reference.
> +
> + *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
> + set to the starting bit number.
> +
> + If the innermost field can be completely contained in a mode-sized
> + unit, *PMODE is set to that mode. Otherwise, it is set to VOIDmode.
> +
> + *PVOLATILEP is set to 1 if the any expression encountered is volatile;
> + otherwise it is not changed.
> +
> + *PUNSIGNEDP is set to the signedness of the field.
> +
> + *PREVERSEP is set to the storage order of the field.
> +
> + *PMASK is set to the mask used. This is either contained in a
> + BIT_AND_EXPR or derived from the width of the field.
> +
> + *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
> +
> + XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked)
> + BIT_XOR_EXPR compared with zero. We're to take the first or second
> + operand thereof if so. It should be zero otherwise.
> +
> + *LOAD is set to the load stmt of the innermost reference, if any,
> + *and NULL otherwise.
> +
> + Return 0 if this is not a component reference or is one that we can't
> + do anything with. */
> +
> +static tree
> +decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
> + HOST_WIDE_INT *pbitpos, machine_mode *pmode,
> + int *punsignedp, int *preversep, int *pvolatilep,
> + tree *pmask, tree *pand_mask, int xor_which,
> + gimple **load)
> +{
> + tree exp = *exp_;
> + tree outer_type = 0;
> + tree and_mask = 0;
> + tree mask, inner, offset;
> + tree unsigned_type;
> + unsigned int precision;
> + HOST_WIDE_INT shiftrt = 0;
> +
> + *load = NULL;
> +
> + /* All the optimizations using this function assume integer fields.
> + There are problems with FP fields since the type_for_size call
> + below can fail for, e.g., XFmode. */
> + if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> + return NULL_TREE;
> +
> + /* We are interested in the bare arrangement of bits, so strip everything
> + that doesn't affect the machine mode. However, record the type of the
> + outermost expression if it may matter below. */
> + outer_type = is_cast_p (&exp);
I think is_cast_p strips more than casts not affecting the machine mode.
It looks like you want a STRIP_NOPS that works on GIMPLE?
> +
> + if ((and_mask = is_binop_p (BIT_AND_EXPR, &exp)))
> + {
> + if (TREE_CODE (and_mask) != INTEGER_CST)
> + return NULL_TREE;
> + }
> +
> + if (xor_which)
> + {
> + tree op2 = is_binop_p (BIT_XOR_EXPR, &exp);
> + gcc_checking_assert (op2);
> + if (xor_which > 1)
> + exp = op2;
> + }
> +
> + if (tree t = is_cast_p (&exp))
> + if (!outer_type)
> + outer_type = t;
> +
> + if (tree shift = is_binop_p (RSHIFT_EXPR, &exp))
> + {
> + if (TREE_CODE (shift) != INTEGER_CST || !tree_fits_shwi_p (shift))
> + return NULL_TREE;
> + shiftrt = tree_to_shwi (shift);
> + if (shiftrt <= 0)
> + return NULL_TREE;
> + }
> +
> + if (tree t = is_cast_p (&exp))
> + if (!outer_type)
> + outer_type = t;
I'm not sure if it would scale, but it looks like there's a certain
set of patterns
this tries to match - did you think of using a (match ...) pattern in match.pd
instead?
> + exp = follow_load (exp, load);
> +
> + poly_int64 poly_bitsize, poly_bitpos;
> + inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
> + pmode, punsignedp, preversep, pvolatilep);
> +
> + if ((inner == exp && and_mask == 0)
> + || !poly_bitsize.is_constant (pbitsize)
> + || !poly_bitpos.is_constant (pbitpos)
> + || *pbitsize <= shiftrt
> + || offset != 0
> + || TREE_CODE (inner) == PLACEHOLDER_EXPR
> + /* Reject out-of-bound accesses (PR79731). */
> + || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> + && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> + *pbitpos + *pbitsize) < 0))
> + return NULL_TREE;
> +
> + if (shiftrt)
> + {
> + if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + *pbitpos += shiftrt;
> + *pbitsize -= shiftrt;
> + }
> +
> + if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
> + {
> + HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
> + if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + *pbitpos += excess;
> + *pbitsize -= excess;
> + }
> +
> + unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
> + if (unsigned_type == NULL_TREE)
> + return NULL_TREE;
> +
> + *exp_ = exp;
> +
> + /* If the number of bits in the reference is the same as the bitsize of
> + the outer type, then the outer type gives the signedness. Otherwise
> + (in case of a small bitfield) the signedness is unchanged. */
> + if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
> + *punsignedp = TYPE_UNSIGNED (outer_type);
> +
> + /* Compute the mask to access the bitfield. */
> + precision = TYPE_PRECISION (unsigned_type);
> +
> + mask = build_int_cst_type (unsigned_type, -1);
> +
> + mask = int_const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> + mask = int_const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> +
> + /* Merge it with the mask we found in the BIT_AND_EXPR, if any. */
> + if (and_mask != 0)
> + mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
> + fold_convert_loc (loc, unsigned_type, and_mask), mask);
> +
> + *pmask = mask;
> + *pand_mask = and_mask;
> + return inner;
> +}
> +
> +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
> + bit value. Arrange things so the extra bits will be set to zero if and
> + only if C is signed-extended to its full width. If MASK is nonzero,
> + it is an INTEGER_CST that should be AND'ed with the extra bits. */
> +
> +static tree
> +unextend (tree c, int p, int unsignedp, tree mask)
> +{
> + tree type = TREE_TYPE (c);
> + int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
> + tree temp;
> +
> + if (p == modesize || unsignedp)
> + return c;
> +
> + /* We work by getting just the sign bit into the low-order bit, then
> + into the high-order bit, then sign-extend. We then XOR that value
> + with C. */
> + temp = build_int_cst (TREE_TYPE (c),
> + wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
> +
> + /* We must use a signed type in order to get an arithmetic right shift.
> + However, we must also avoid introducing accidental overflows, so that
> + a subsequent call to integer_zerop will work. Hence we must
> + do the type conversion here. At this point, the constant is either
> + zero or one, and the conversion to a signed type can never overflow.
> + We could get an overflow if this conversion is done anywhere else. */
> + if (TYPE_UNSIGNED (type))
> + temp = fold_convert (signed_type_for (type), temp);
> +
> + temp = int_const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
> + temp = int_const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
> + if (mask != 0)
> + temp = int_const_binop (BIT_AND_EXPR, temp,
> + fold_convert (TREE_TYPE (c), mask));
> + /* If necessary, convert the type back to match the type of C. */
> + if (TYPE_UNSIGNED (type))
> + temp = fold_convert (type, temp);
> +
> + return fold_convert (type, int_const_binop (BIT_XOR_EXPR, c, temp));
I'd like all of the above to be done via wide_int, that has utilities to build
(shifted) masks and sign- or zero-extend from specific bits.
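(A rough sketch of those wide_int building blocks, illustrative only
and not a drop-in rewrite of unextend; C, P, UNSIGNEDP and MASK are
assumed to be unextend's parameters:)

  wide_int w = wi::to_wide (c);
  /* Zero- or sign-extend the value from bit P...  */
  w = unsignedp ? wi::zext (w, p) : wi::sext (w, p);
  /* ... and apply MASK, all without building INTEGER_CST trees.  */
  if (mask)
    w = wi::bit_and (w, wi::to_wide (mask));
  tree res = wide_int_to_tree (TREE_TYPE (c), w);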
> +}
> +
> +/* Return the one bitpos within bit extents L or R that is at an
> + ALIGN-bit alignment boundary, or -1 if there is more than one such
> + boundary, if there isn't any, or if there is any such boundary
> + between the extents. L and R are given by bitpos and bitsize. If
> + it doesn't return -1, there are two consecutive ALIGN-bit words
> + that contain both extents, and at least one of the extents
> + straddles across the returned alignment boundary. */
> +
> +static inline HOST_WIDE_INT
> +compute_split_boundary_from_align (HOST_WIDE_INT align,
> + HOST_WIDE_INT l_bitpos,
> + HOST_WIDE_INT l_bitsize,
> + HOST_WIDE_INT r_bitpos,
> + HOST_WIDE_INT r_bitsize)
> +{
> + HOST_WIDE_INT amask = ~(align - 1);
> +
> + HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
> + HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
> +
> + HOST_WIDE_INT boundary = (end_bit - 1) & amask;
> +
> + /* Make sure we're crossing no more than one alignment boundary.
> +
> + ??? We don't have logic to recombine loads of two adjacent
> + fields that each crosses a different alignment boundary, so
> + as to load the middle word only once, if other words can't be
> + otherwise recombined. */
> + if (boundary - first_bit > align)
> + return -1;
> +
> + HOST_WIDE_INT l_start_word = l_bitpos & amask;
> + HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
> +
> + HOST_WIDE_INT r_start_word = r_bitpos & amask;
> + HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
> +
> + /* If neither field straddles across an alignment boundary, it's no
> + use to even try to merge them. */
> + if (l_start_word == l_end_word && r_start_word == r_end_word)
> + return -1;
> +
> + return boundary;
> +}
> +
> +/* Make a bit_field_ref. If POINT is NULL, return the BIT_FIELD_REF.
> + Otherwise, build and insert a load stmt before POINT, and return
> + the SSA_NAME. ??? Rewrite LOAD in terms of the bitfield? */
> +
> +static tree
> +make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type,
> + HOST_WIDE_INT bitsize, poly_int64 bitpos,
> + int unsignedp, int reversep, gimple *point)
> +{
> + tree ref = make_bit_field_ref (loc, unshare_expr (inner),
> + unshare_expr (orig_inner),
> + type, bitsize, bitpos,
> + unsignedp, reversep);
> + if (!point)
> + return ref;
> +
> + gimple_stmt_iterator gsi = gsi_for_stmt (point);
> + return force_gimple_operand_gsi (&gsi, ref,
> + true, NULL_TREE,
> + true, GSI_SAME_STMT);
I suppose 'inner' is what the earlier get_inner_reference returned - I don't
think this built load properly preserves TBAA properties, at least
a MEM[(char *)&a] would have an inner of 'a'?
> +}
> +
> +/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
> + LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
> + respectively, starting at BIT_POS, using reversed endianness if
> + REVERSEP. Also initialize BITPOS (the starting position of each
> + part into INNER), BITSIZ (the bit count starting at BITPOS),
> + TOSHIFT[1] (the amount by which the part and its mask are to be
> + shifted right to bring its least-significant bit to bit zero) and
> + SHIFTED (the amount by which the part, by separate loading, has
> + already been shifted right, but that the mask needs shifting to
> + match). */
> +
> +static inline void
> +build_split_load (tree /* out */ ln_arg[2],
> + HOST_WIDE_INT /* out */ bitpos[2],
> + HOST_WIDE_INT /* out */ bitsiz[2],
> + HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> + HOST_WIDE_INT /* out */ shifted[2],
> + location_t loc, tree inner, tree orig_inner,
> + scalar_int_mode mode, scalar_int_mode mode2,
> + HOST_WIDE_INT bit_pos, bool reversep,
> + gimple *point[2])
> +{
> + bitsiz[0] = GET_MODE_BITSIZE (mode);
> + bitsiz[1] = GET_MODE_BITSIZE (mode2);
> +
> + for (int i = 0; i < 2; i++)
> + {
> + tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
Use type_for_mode?
> + bitpos[i] = bit_pos;
> + ln_arg[i] = make_bit_field_load (loc, inner, orig_inner,
> + type, bitsiz[i],
> + bit_pos, 1, reversep, point[i]);
> + bit_pos += bitsiz[i];
> + }
> +
> + toshift[1] = toshift[0];
> + if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + {
> + shifted[0] = bitsiz[1];
> + shifted[1] = 0;
> + toshift[0] = 0;
> + }
> + else
> + {
> + shifted[1] = bitsiz[0];
> + shifted[0] = 0;
> + toshift[1] = 0;
> + }
> +}
> +
> +/* Make arrangements to split at bit BOUNDARY a single loaded word
> + (with REVERSEP bit order) LN_ARG[0], to be shifted right by
> + TOSHIFT[0] to bring the field of interest to the least-significant
> + bit. The expectation is that the same loaded word will be
> + propagated from part 0 to part 1, with just different shifting and
> + masking to extract both parts. MASK is not expected to do more
> + than masking out the bits that belong to the other part. See
> + build_split_load for more information on the other fields. */
> +
> +static inline void
> +reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
> + HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
> + HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
> + HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> + HOST_WIDE_INT /* out */ shifted[2],
> + tree /* out */ mask[2],
> + HOST_WIDE_INT boundary, bool reversep)
> +{
> + ln_arg[1] = ln_arg[0];
> + bitpos[1] = bitpos[0];
> + bitsiz[1] = bitsiz[0];
> + shifted[1] = shifted[0] = 0;
> +
> + tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1);
> +
> + if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + {
> + toshift[1] = toshift[0];
> + toshift[0] = bitpos[0] + bitsiz[0] - boundary;
> + mask[0] = int_const_binop (LSHIFT_EXPR, basemask,
> + bitsize_int (toshift[0]));
> + mask[1] = int_const_binop (BIT_XOR_EXPR, basemask, mask[0]);
> + }
> + else
> + {
> + toshift[1] = boundary - bitpos[1];
> + mask[1] = int_const_binop (LSHIFT_EXPR, basemask,
> + bitsize_int (toshift[1]));
> + mask[0] = int_const_binop (BIT_XOR_EXPR, basemask, mask[1]);
> + }
> +}
> +
> +/* Return nonzero if LSTMT and RSTMT, assumed to load from the same
> + words, can be safely merged into a single load. Return zero if
> + there are intervening stores, or if neither stmt dominates the
> + other. If they are mergeable, return -1 if the merged load should
> + be inserted before LSTMT to dominate both, otherwise return +1. */
> +
> +static int
> +mergeable_loads_p (gimple *lstmt, gimple *rstmt)
> +{
> + gimple *stmts[2] = { lstmt, rstmt };
> + int ret;
> +
> + if (stmt_dominates_stmt_p (lstmt, rstmt))
Can you avoid these please?
> + {
> + ret = -1;
> + stmts[0] = lstmt;
> + stmts[1] = rstmt;
> + }
> + else if (stmt_dominates_stmt_p (rstmt, lstmt))
> + {
> + ret = 1;
> + stmts[1] = lstmt;
> + stmts[0] = rstmt;
> + }
> + else
> + return 0;
> +
> + basic_block bbs[2] = { gimple_bb (stmts[0]), gimple_bb (stmts[1]) };
> + if (bbs[0] == bbs[1])
> + {
> + for (gimple_stmt_iterator gsi = gsi_for_stmt (stmts[0]);
> + gsi_stmt (gsi) != stmts[1]; gsi_next (&gsi))
> + if (gimple_vdef (gsi_stmt (gsi)))
> + return 0;
if (gimple_vuse (stmts[0]) != gimple_vuse (stmts[1]))
return 0;
> + return ret;
> + }
> +
> + for (gimple_stmt_iterator gsi = gsi_for_stmt (stmts[0]);
> + !gsi_end_p (gsi); gsi_next (&gsi))
> + if (gimple_vdef (gsi_stmt (gsi)))
> + return 0;
> +
> + for (gphi_iterator gpi = gsi_start_phis (bbs[1]);
> + !gsi_end_p (gpi); gsi_next (&gpi))
> + if (gimple_vdef (gsi_stmt (gpi)))
> + return 0;
> +
> + for (gimple_stmt_iterator gsi = gsi_for_stmt (stmts[1]);
> + !gsi_end_p (gsi); gsi_prev (&gsi))
> + if (gimple_vdef (gsi_stmt (gsi)))
> + return 0;
Same. All of the stmt walking shouldn't be necessary
as long as one stmt dominates the other.
> + for (basic_block bb = bbs[1]; bb != bbs[0]; )
> + {
> + /* ??? For now, stop if any visited block has more than one
> + predecessor. */
> + if (!single_pred_p (bb))
> + return 0;
> +
> + bb = single_pred (bb);
> +
> + for (gphi_iterator gpi = gsi_start_phis (bbs[1]);
> + !gsi_end_p (gpi); gsi_next (&gpi))
> + if (gimple_vdef (gsi_stmt (gpi)))
> + return 0;
> +
> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> + !gsi_end_p (gsi); gsi_next (&gsi))
> + if (gimple_vdef (gsi_stmt (gsi)))
> + return 0;
> + }
> +
> + return ret;
> +}
> +
> +/* Find ways of folding logical expressions of LHS and RHS:
> + Try to merge two comparisons to the same innermost item.
> + Look for range tests like "ch >= '0' && ch <= '9'".
> + Look for combinations of simple terms on machines with expensive branches
> + and evaluate the RHS unconditionally.
> +
> + For example, if we have p->a == 2 && p->b == 4 and we can make an
> + object large enough to span both A and B, we can do this with a comparison
> + against the object ANDed with the a mask.
> +
> + If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
> + operations to do this with one comparison.
> +
> + We check for both normal comparisons and the BIT_AND_EXPRs made this by
> + function and the one above.
> +
> + CODE is the logical operation being done. It can be TRUTH_ANDIF_EXPR,
> + TRUTH_AND_EXPR, TRUTH_ORIF_EXPR, or TRUTH_OR_EXPR.
> +
> + TRUTH_TYPE is the type of the logical operand and LHS and RHS are its
> + two operands.
> +
> + SEPARATEP should be NULL if LHS and RHS are adjacent in
> + CODE-chained compares, and a non-NULL pointer to NULL_TREE
> + otherwise. If the "words" accessed by RHS are already accessed by
> + LHS, this won't matter, but if RHS accesses "words" that LHS
> + doesn't, then *SEPARATEP will be set to the compares that should
> + take RHS's place. By "words" we mean contiguous bits that do not
> +   cross a TYPE_ALIGN boundary of the accessed object's type.
> +
> + We return the simplified tree or 0 if no optimization is possible. */
> +
> +tree
> +fold_truth_andor_maybe_separate (location_t loc,
> + enum tree_code code, tree truth_type,
> + enum tree_code lcode, tree ll_arg, tree lr_arg,
> + enum tree_code rcode, tree rl_arg, tree rr_arg,
> + tree *separatep)
> +{
> + /* If this is the "or" of two comparisons, we can do something if
> + the comparisons are NE_EXPR. If this is the "and", we can do something
> + if the comparisons are EQ_EXPR. I.e.,
> + (a->b == 2 && a->c == 4) can become (a->new == NEW).
> +
> + WANTED_CODE is this operation code. For single bit fields, we can
> + convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
> + comparison for one-bit fields. */
> +
> + enum tree_code orig_code = code;
> + enum tree_code wanted_code;
> + tree ll_inner, lr_inner, rl_inner, rr_inner;
> + gimple *ll_load, *lr_load, *rl_load, *rr_load;
> + int l_mergeable = 0, r_mergeable = 0;
> + HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
> + HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
> + HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
> + HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
> + int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
> + int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
> + machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
> + scalar_int_mode lnmode, lnmode2, rnmode;
> + tree ll_mask, lr_mask, rl_mask, rr_mask;
> + tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
> + tree l_const, r_const;
> + tree lntype, rntype, result;
> + HOST_WIDE_INT first_bit, end_bit;
> + int volatilep;
> + bool l_split_load;
> +
> + gcc_checking_assert (!separatep || !*separatep);
> +
> + /* Start by getting the comparison codes. Fail if anything is volatile.
> + If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> + it were surrounded with a NE_EXPR. */
> +
> + if (TREE_SIDE_EFFECTS (ll_arg) || TREE_SIDE_EFFECTS (lr_arg)
> + || TREE_SIDE_EFFECTS (rl_arg) || TREE_SIDE_EFFECTS (rr_arg))
> + return 0;
> +
> + if (TREE_CODE_CLASS (lcode) != tcc_comparison
> + || TREE_CODE_CLASS (rcode) != tcc_comparison)
> + return 0;
> +
> + code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
> + ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
> +
> + bool lsignbit = false, rsignbit = false;
> + if ((lcode == LT_EXPR || lcode == GE_EXPR)
> + && integer_zerop (lr_arg)
> + && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
> + && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
> + {
> + lsignbit = true;
> + lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
> + }
> + if ((rcode == LT_EXPR || rcode == GE_EXPR)
> + && integer_zerop (rr_arg)
> + && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
> + && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
> + {
> + rsignbit = true;
> + rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
> + }
> +
> + /* See if the comparisons can be merged. Then get all the parameters for
> + each side. */
> +
> + if ((lcode != EQ_EXPR && lcode != NE_EXPR)
> + || (rcode != EQ_EXPR && rcode != NE_EXPR))
> + return 0;
> +
> + ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
> + volatilep = 0;
> + int l_xor = prepare_xor (ll_arg, &lr_arg);
> + ll_inner = decode_field_reference (loc, &ll_arg,
> + &ll_bitsize, &ll_bitpos, &ll_mode,
> + &ll_unsignedp, &ll_reversep, &volatilep,
> + &ll_mask, &ll_and_mask, l_xor,
> + &ll_load);
> + lr_inner = decode_field_reference (loc, &lr_arg,
> + &lr_bitsize, &lr_bitpos, &lr_mode,
> + &lr_unsignedp, &lr_reversep, &volatilep,
> + &lr_mask, &lr_and_mask, 2 * l_xor,
> + &lr_load);
> + int r_xor = prepare_xor (rl_arg, &rr_arg);
> + rl_inner = decode_field_reference (loc, &rl_arg,
> + &rl_bitsize, &rl_bitpos, &rl_mode,
> + &rl_unsignedp, &rl_reversep, &volatilep,
> + &rl_mask, &rl_and_mask, r_xor,
> + &rl_load);
> + rr_inner = decode_field_reference (loc, &rr_arg,
> + &rr_bitsize, &rr_bitpos, &rr_mode,
> + &rr_unsignedp, &rr_reversep, &volatilep,
> + &rr_mask, &rr_and_mask, 2 * r_xor,
> + &rr_load);
> +
> + /* It must be true that the inner operation on the lhs of each
> + comparison must be the same if we are to be able to do anything.
> + Then see if we have constants. If not, the same must be true for
> + the rhs's. */
> + if (volatilep
> + || ll_reversep != rl_reversep
> + || ll_inner == 0 || rl_inner == 0
> + || ! operand_equal_p (ll_inner, rl_inner, 0)
> + || (ll_load && rl_load
> + && ! (l_mergeable = mergeable_loads_p (ll_load, rl_load))))
> + return 0;
> +
> + if (TREE_CODE (lr_arg) == INTEGER_CST
> + && TREE_CODE (rr_arg) == INTEGER_CST)
> + {
> + l_const = lr_arg, r_const = rr_arg;
> + lr_reversep = ll_reversep;
> + }
> + else if (lr_reversep != rr_reversep
> + || lr_inner == 0 || rr_inner == 0
> + || ! operand_equal_p (lr_inner, rr_inner, 0)
> + || (lr_load && rr_load
> + && ! (r_mergeable = mergeable_loads_p (lr_load, rr_load))))
> + return 0;
> + else
> + l_const = r_const = 0;
> +
> + if (lsignbit)
> + {
> + tree mask = build_int_cst_type (TREE_TYPE (ll_arg), -1);
> + tree sign = int_const_binop (LSHIFT_EXPR, mask,
> + bitsize_int (ll_bitsize - 1));
> + if (!ll_mask)
> + ll_mask = sign;
> + else
> + ll_mask = int_const_binop (BIT_AND_EXPR, ll_mask, sign);
> + if (!ll_and_mask)
> + ll_and_mask = sign;
> + else
> + ll_and_mask = int_const_binop (BIT_AND_EXPR, ll_and_mask, sign);
> + }
> +
> + if (rsignbit)
> + {
> + tree mask = build_int_cst_type (TREE_TYPE (rl_arg), -1);
> + tree sign = int_const_binop (LSHIFT_EXPR, mask,
> + bitsize_int (rl_bitsize - 1));
> + if (!rl_mask)
> + rl_mask = sign;
> + else
> + rl_mask = int_const_binop (BIT_AND_EXPR, rl_mask, sign);
> + if (!rl_and_mask)
> + rl_and_mask = sign;
> + else
> + rl_and_mask = int_const_binop (BIT_AND_EXPR, rl_and_mask, sign);
> + }
> +
> + /* If either comparison code is not correct for our logical operation,
> + fail. However, we can convert a one-bit comparison against zero into
> + the opposite comparison against that bit being set in the field. */
> +
> + wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
> + if (lcode != wanted_code)
> + {
> + if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
> + {
> + /* Make the left operand unsigned, since we are only interested
> + in the value of one bit. Otherwise we are doing the wrong
> + thing below. */
> + ll_unsignedp = 1;
> + l_const = ll_mask;
> + }
> + else
> + return 0;
> + }
> +
> + /* This is analogous to the code for l_const above. */
> + if (rcode != wanted_code)
> + {
> + if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
> + {
> + rl_unsignedp = 1;
> + r_const = rl_mask;
> + }
> + else
> + return 0;
> + }
> +
> + /* This will be bumped to 2 if any of the field pairs crosses an
> + alignment boundary, so the merged compare has to be done in two
> + parts. */
> + int parts = 1;
> + /* Set to true if the second combined compare should come first,
> + e.g., because the second original compare accesses a word that
> + the first one doesn't, and the combined compares access those in
> + cmp[0]. */
> + bool first1 = false;
> + /* Set to true if the first original compare is not the one being
> + split. */
> + bool maybe_separate = false;
> +
> + /* The following 2-dimensional arrays use the first index to
> + identify left(0)- vs right(1)-hand compare operands, and the
> + second one to identify merged compare parts. */
> + /* The memory loads or constants to be compared. */
> + tree ld_arg[2][2];
> + /* The first bit of the corresponding inner object that the
> + corresponding LD_ARG covers. */
> + HOST_WIDE_INT bitpos[2][2];
> + /* The bit count starting at BITPOS that the corresponding LD_ARG
> + covers. */
> + HOST_WIDE_INT bitsiz[2][2];
> + /* The number of bits by which LD_ARG has already been shifted
> + right, WRT mask. */
> + HOST_WIDE_INT shifted[2][2];
> + /* The number of bits by which both LD_ARG and MASK need shifting to
> + bring its least-significant bit to bit zero. */
> + HOST_WIDE_INT toshift[2][2];
> + /* An additional mask to be applied to LD_ARG, to remove any bits
> + that may have been loaded for use in another compare, but that
> + don't belong in the corresponding compare. */
> + tree xmask[2][2] = {};
> +
> + /* The combined compare or compares. */
> + tree cmp[2];
> +
> + /* Consider we're comparing two non-contiguous fields of packed
> + structs, both aligned at 32-bit boundaries:
> +
> + ll_arg: an 8-bit field at offset 0
> + lr_arg: a 16-bit field at offset 2
> +
> + rl_arg: an 8-bit field at offset 1
> + rr_arg: a 16-bit field at offset 3
> +
> + We'll have r_split_load, because rr_arg straddles across an
> + alignment boundary.
> +
> + We'll want to have:
> +
> + bitpos = { { 0, 0 }, { 0, 32 } }
> + bitsiz = { { 32, 32 }, { 32, 8 } }
> +
> + And, for little-endian:
> +
> + shifted = { { 0, 0 }, { 0, 32 } }
> + toshift = { { 0, 24 }, { 0, 0 } }
> +
> + Or, for big-endian:
> +
> + shifted = { { 0, 0 }, { 8, 0 } }
> + toshift = { { 8, 0 }, { 0, 0 } }
> + */
> +
> + /* See if we can find a mode that contains both fields being compared on
> + the left. If we can't, fail. Otherwise, update all constants and masks
> + to be relative to a field of that size. */
> + first_bit = MIN (ll_bitpos, rl_bitpos);
> + end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> + if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> + TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> + volatilep, &lnmode))
> + {
> + /* Consider the possibility of recombining loads if any of the
> + fields straddles across an alignment boundary, so that either
> + part can be loaded along with the other field. */
> + HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
> + HOST_WIDE_INT boundary = compute_split_boundary_from_align
> + (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> +
> + if (boundary < 0
> + || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> + align, BITS_PER_WORD, volatilep, &lnmode)
> + || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> + align, BITS_PER_WORD, volatilep, &lnmode2))
> + return 0;
> +
> + l_split_load = true;
> + parts = 2;
> + if (ll_bitpos >= boundary)
> + maybe_separate = first1 = true;
> + else if (ll_bitpos + ll_bitsize <= boundary)
> + maybe_separate = true;
> + }
> + else
> + l_split_load = false;
> +
> + lnbitsize = GET_MODE_BITSIZE (lnmode);
> + lnbitpos = first_bit & ~ (lnbitsize - 1);
> + if (l_split_load)
> + lnbitsize += GET_MODE_BITSIZE (lnmode2);
> + lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
> + if (!lntype)
> + {
> + gcc_checking_assert (l_split_load);
> + lntype = build_nonstandard_integer_type (lnbitsize, 1);
> + }
> + xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
> +
> + if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + {
> + xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
> + xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
> + }
> +
> + ll_mask = int_const_binop (LSHIFT_EXPR,
> + fold_convert_loc (loc, lntype, ll_mask),
> + size_int (xll_bitpos));
> + if (!ll_mask)
> + return 0;
> + rl_mask = int_const_binop (LSHIFT_EXPR,
> + fold_convert_loc (loc, lntype, rl_mask),
> + size_int (xrl_bitpos));
> + if (!rl_mask)
> + return 0;
> +
> + if (l_const)
> + {
> + l_const = fold_convert_loc (loc, lntype, l_const);
> + l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
> + l_const = int_const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
> + if (! integer_zerop (int_const_binop (BIT_AND_EXPR, l_const,
> + fold_build1_loc (loc, BIT_NOT_EXPR,
> + lntype, ll_mask))))
> + {
> + warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> +
> + return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> + }
> + }
> + if (r_const)
> + {
> + r_const = fold_convert_loc (loc, lntype, r_const);
> + r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
> + r_const = int_const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
> + if (! integer_zerop (int_const_binop (BIT_AND_EXPR, r_const,
> + fold_build1_loc (loc, BIT_NOT_EXPR,
> + lntype, rl_mask))))
> + {
> + warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> +
> + return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> + }
> + }
> +
> + /* If the right sides are not constant, do the same for it. Also,
> + disallow this optimization if a size, signedness or storage order
> + mismatch occurs between the left and right sides. */
> + if (l_const == 0)
> + {
> + if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
> + || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
> + || ll_reversep != lr_reversep
> + /* Make sure the two fields on the right
> + correspond to the left without being swapped. */
> + || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos
> + || lnbitpos < 0)
> + return 0;
> +
> + bool r_split_load;
> + scalar_int_mode rnmode2;
> +
> + first_bit = MIN (lr_bitpos, rr_bitpos);
> + end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> + if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> + TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> + volatilep, &rnmode))
> + {
> + /* Consider the possibility of recombining loads if any of the
> + fields straddles across an alignment boundary, so that either
> + part can be loaded along with the other field. */
> + HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
> + HOST_WIDE_INT boundary = compute_split_boundary_from_align
> + (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
> +
> + if (boundary < 0
> + /* If we're to split both, make sure the split point is
> + the same. */
> + || (l_split_load
> + && (boundary - lr_bitpos
> + != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
> + || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> + align, BITS_PER_WORD, volatilep, &rnmode)
> + || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> + align, BITS_PER_WORD, volatilep, &rnmode2))
> + return 0;
> +
> + r_split_load = true;
> + parts = 2;
> + if (lr_bitpos >= boundary)
> + maybe_separate = first1 = true;
> + else if (lr_bitpos + lr_bitsize <= boundary)
> + maybe_separate = true;
> + }
> + else
> + r_split_load = false;
> +
> + rnbitsize = GET_MODE_BITSIZE (rnmode);
> + rnbitpos = first_bit & ~ (rnbitsize - 1);
> + if (r_split_load)
> + rnbitsize += GET_MODE_BITSIZE (rnmode2);
> + rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
> + if (!rntype)
> + {
> + gcc_checking_assert (r_split_load);
> + rntype = build_nonstandard_integer_type (rnbitsize, 1);
> + }
> + xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
> +
> + if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> + {
> + xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
> + xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
> + }
> +
> + lr_mask = int_const_binop (LSHIFT_EXPR,
> + fold_convert_loc (loc, rntype, lr_mask),
> + size_int (xlr_bitpos));
> + rr_mask = int_const_binop (LSHIFT_EXPR,
> + fold_convert_loc (loc, rntype, rr_mask),
> + size_int (xrr_bitpos));
> +
> + lr_mask = int_const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
> +
> + toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
> + shifted[1][0] = 0;
> +
> + if (!r_split_load)
> + {
> + bitpos[1][0] = rnbitpos;
> + bitsiz[1][0] = rnbitsize;
> + ld_arg[1][0] = make_bit_field_load (loc, lr_inner, lr_arg,
> + rntype, rnbitsize, rnbitpos,
> + lr_unsignedp || rr_unsignedp,
> + lr_reversep,
> + r_mergeable == 0
> + ? NULL
> + : r_mergeable < 0
> + ? lr_load
> + : rr_load);
> + }
> +
> + if (parts > 1)
> + {
> + if (r_split_load)
> + {
> + gimple *point[2];
> + if (r_mergeable == 0)
> + point[0] = point[1] = NULL;
> + else if (r_mergeable < 0)
> + {
> + point[0] = lr_load;
> + point[1] = rr_load;
> + }
> + else
> + {
> + point[0] = rr_load;
> + point[1] = lr_load;
> + }
> + build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> + shifted[1], loc, lr_inner, lr_arg,
> + rnmode, rnmode2, rnbitpos, lr_reversep, point);
> + }
> + else
> + reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> + shifted[1], xmask[1],
> + lnbitpos + GET_MODE_BITSIZE (lnmode)
> + - ll_bitpos + lr_bitpos, lr_reversep);
> + }
> + }
> + else
> + {
> + /* Handle the case of comparisons with constants. If there is
> + something in common between the masks, those bits of the
> + constants must be the same. If not, the condition is always
> + false. Test for this to avoid generating incorrect code
> + below. */
> + result = int_const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> + if (! integer_zerop (result)
> + && simple_cst_equal (int_const_binop (BIT_AND_EXPR,
> + result, l_const),
> + int_const_binop (BIT_AND_EXPR,
> + result, r_const)) != 1)
> + {
> + if (wanted_code == NE_EXPR)
> + {
> + warning (0,
> + "%<or%> of unmatched not-equal tests"
> + " is always 1");
> + return constant_boolean_node (true, truth_type);
> + }
> + else
> + {
> + warning (0,
> + "%<and%> of mutually exclusive equal-tests"
> + " is always 0");
> + return constant_boolean_node (false, truth_type);
> + }
> + }
> +
> + if (lnbitpos < 0)
> + return 0;
> +
> + /* The constants are combined so as to line up with the loaded
> + field, so use the same parameters. */
> + ld_arg[1][0] = int_const_binop (BIT_IOR_EXPR, l_const, r_const);
> + toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
> + shifted[1][0] = 0;
> + bitpos[1][0] = lnbitpos;
> + bitsiz[1][0] = lnbitsize;
> +
> + if (parts > 1)
> + reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> + shifted[1], xmask[1],
> + lnbitpos + GET_MODE_BITSIZE (lnmode),
> + lr_reversep);
> +
> + lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1);
> +
> + /* If the compiler thinks this is used uninitialized below, it's
> + because it can't realize that parts can only be 2 when
> + comparing with constants if l_split_load is also true.  This
> + just silences the warning. */
> + rnbitpos = 0;
> + }
> +
> + ll_mask = int_const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> + toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
> + shifted[0][0] = 0;
> +
> + if (!l_split_load)
> + {
> + bitpos[0][0] = lnbitpos;
> + bitsiz[0][0] = lnbitsize;
> + ld_arg[0][0] = make_bit_field_load (loc, ll_inner, ll_arg,
> + lntype, lnbitsize, lnbitpos,
> + ll_unsignedp || rl_unsignedp,
> + ll_reversep,
> + l_mergeable == 0
> + ? NULL
> + : l_mergeable < 0
> + ? ll_load
> + : rl_load);
> + }
> +
> + if (parts > 1)
> + {
> + if (l_split_load)
> + {
> + gimple *point[2];
> + if (l_mergeable == 0)
> + point[0] = point[1] = NULL;
> + else if (l_mergeable < 0)
> + {
> + point[0] = ll_load;
> + point[1] = rl_load;
> + }
> + else
> + {
> + point[0] = rl_load;
> + point[1] = ll_load;
> + }
> + build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> + shifted[0], loc, ll_inner, ll_arg,
> + lnmode, lnmode2, lnbitpos, ll_reversep, point);
> + }
> + else
> + reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> + shifted[0], xmask[0],
> + rnbitpos + GET_MODE_BITSIZE (rnmode)
> + - lr_bitpos + ll_bitpos, ll_reversep);
> + }
> +
> + for (int i = 0; i < parts; i++)
> + {
> + tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
> + tree mask[2] = { ll_mask, lr_mask };
> +
> + for (int j = 0; j < 2; j++)
> + {
> + op[j] = unshare_expr (op[j]);
> +
> + /* Mask out the bits belonging to the other part. */
> + if (xmask[j][i])
> + mask[j] = int_const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]);
> +
> + if (shifted[j][i])
> + {
> + tree shiftsz = bitsize_int (shifted[j][i]);
> + mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
> + }
> + mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]);
> + }
> +
> + HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
> +
> + if (shift)
> + {
> + int j;
> + if (shift > 0)
> + j = 0;
> + else
> + {
> + j = 1;
> + shift = -shift;
> + }
> +
> + tree shiftsz = bitsize_int (shift);
> + op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]),
> + op[j], shiftsz);
> + mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
> + }
> +
> + /* Convert to the smaller type before masking out unwanted
> + bits. */
> + tree type = TREE_TYPE (op[0]);
> + if (type != TREE_TYPE (op[1]))
> + {
> + int j = (TYPE_PRECISION (type)
> + < TYPE_PRECISION (TREE_TYPE (op[1])));
> + if (!j)
> + type = TREE_TYPE (op[1]);
> + op[j] = fold_convert_loc (loc, type, op[j]);
> + mask[j] = fold_convert_loc (loc, type, mask[j]);
> + }
> +
> + for (int j = 0; j < 2; j++)
> + if (! integer_all_onesp (mask[j]))
> + op[j] = build2_loc (loc, BIT_AND_EXPR, type,
> + op[j], mask[j]);
> +
> + cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]);
> + }
> +
> + if (first1)
> + std::swap (cmp[0], cmp[1]);
> +
> + if (parts == 1)
> + result = cmp[0];
> + else if (!separatep || !maybe_separate)
> + result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]);
> + else
> + {
> + result = cmp[0];
> + *separatep = cmp[1];
> + }
> +
The function is large - maybe some factoring is possible? I also notice
you return a GENERIC expression here where having a GIMPLE value
and an alternate gimple_seq output for defining stmts would be better?
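Something like the following shape, for instance (just a sketch; the
exact parameters are made up):

  /* Return a GIMPLE value for the merged condition, appending the
     statements that define it to *SEQ for the caller to insert,
     instead of building a GENERIC expression.  */
  tree
  fold_truth_andor_maybe_separate (location_t loc,
				   enum tree_code code, tree truth_type,
				   enum tree_code lcode,
				   tree ll_arg, tree lr_arg,
				   enum tree_code rcode,
				   tree rl_arg, tree rr_arg,
				   gimple_seq *seq, tree *separatep);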
> + return result;
> +}
> +
> /* Try to simplify the AND of two comparisons, specified by
> (OP1A CODE1 OP1B) and (OP2B CODE2 OP2B), respectively.
> If this can be simplified to a single expression (without requiring
> diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c b/gcc/testsuite/gcc.dg/field-merge-1.c
> new file mode 100644
> index 0000000000000..1818e104437e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-1.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
> +
> +/* Check that field loads compared with constants are merged, even if
> + tested out of order, and when fields straddle across alignment
> + boundaries. */
> +
> +struct TL {
> + unsigned char p;
> + unsigned int a;
> + unsigned char q;
> + unsigned int b;
> + unsigned char r;
> + unsigned int c;
> + unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
> +
> +struct TB {
> + unsigned char p;
> + unsigned int a;
> + unsigned char q;
> + unsigned int b;
> + unsigned char r;
> + unsigned int c;
> + unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
> +struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
> +
> +void f (void) {
> + /* Which words of | vL | vB | */
> + /* are accessed by |0123|0123| */
> + if (0 /* the tests? | | | */
> + || vL.p != vc /* |* | | */
> + || vB.p != vc /* | |* | */
> + || vL.s != vc /* | *| | */
> + || vB.q != vc /* | | * | */
> + || vL.a != vi /* |^* | | */
> + || vB.b != vi /* | | ^* | */
> + || vL.c != vi /* | *^| | */
> + || vB.c != vi /* | | ^*| */
> + || vL.b != vi /* | ^^ | | */
> + || vL.q != vc /* | ^ | | */
> + || vB.a != vi /* | |^^ | */
> + || vB.r != vc /* | | ^ | */
> + || vB.s != vc /* | | ^| */
> + || vL.r != vc /* | ^ | | */
> + )
> + __builtin_abort ();
> +}
> +
> +int main () {
> + f ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
> +/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c b/gcc/testsuite/gcc.dg/field-merge-2.c
> new file mode 100644
> index 0000000000000..80c573b31ddc7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct TL {
> + unsigned short a;
> + unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +struct TB {
> + unsigned char p;
> + unsigned short a;
> + unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +#define vc 0xaa
> +
> +struct TL vL = { vc, vc };
> +struct TB vB = { vc, vc, vc };
> +
> +void f (void) {
> + if (0
> + || vL.b != vB.b
> + || vL.a != vB.a
> + )
> + __builtin_abort ();
> +}
> +
> +int main () {
> + f ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c b/gcc/testsuite/gcc.dg/field-merge-3.c
> new file mode 100644
> index 0000000000000..f26e8a96ea04f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
> +
> +struct T1 {
> + unsigned char p[2];
> + unsigned short a;
> + unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +struct T2 {
> + unsigned short p;
> + unsigned short a;
> + unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
> +struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
> +
> +void f (void) {
> + if (0
> + || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
> + || v1.a != v2.a
> + || ((v1.z ^ v2.z) & 0xff00ff00) != 0
> + || ((v1.z ^ v2.z) & 0x00ff00ff) != 0)
> + __builtin_abort ();
> +}
> +
> +int main () {
> + f ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c b/gcc/testsuite/gcc.dg/field-merge-4.c
> new file mode 100644
> index 0000000000000..c629069e52b2c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-4.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> + unsigned int zn;
> + unsigned char p;
> + unsigned char qn;
> + unsigned short a;
> + unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +struct T2 {
> + unsigned int zn;
> + unsigned char rn;
> + unsigned char p;
> + unsigned char qn;
> + unsigned short a;
> + unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> + if (0
> + || v1.p != v2.p
> + || v1.a != v2.a
> + || v1.z != v2.z
> + )
> + __builtin_abort ();
> +}
> +
> +int main () {
> + f ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c b/gcc/testsuite/gcc.dg/field-merge-5.c
> new file mode 100644
> index 0000000000000..1580b14bcc935
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> + unsigned int zn;
> + unsigned char p;
> + unsigned char qn;
> + unsigned short a;
> + unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +struct T2 {
> + unsigned int zn;
> + unsigned char rn;
> + unsigned char p;
> + unsigned char qn;
> + unsigned short a;
> + unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> + if (0
> + || v1.p != v2.p
> + || v1.a != v2.a
> + || v1.z != v2.z
> + )
> + __builtin_abort ();
> +}
> +
> +int main () {
> + f ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-6.c b/gcc/testsuite/gcc.dg/field-merge-6.c
> new file mode 100644
> index 0000000000000..7fd48a138d14a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-6.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +/* { dg-shouldfail } */
> +
> +/* Check that the third compare won't be pulled ahead of the second one,
> +   which would prevent the NULL pointer dereference that should cause
> +   the execution to fail.  */
> +
> +struct s {
> + char a, b;
> + int *p;
> +};
> +
> +struct s a = { 0, 1, 0 };
> +struct s b = { 0, 0, 0 };
> +
> +int f () {
> + return (a.a != b.a
> + || *b.p != *a.p
> + || a.b != b.b);
> +}
> +
> +int main() {
> + f ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-7.c b/gcc/testsuite/gcc.dg/field-merge-7.c
> new file mode 100644
> index 0000000000000..728a29b6fafa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-7.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-ifcombine-details" } */
> +
> +/* Check that the third compare won't be combined with the first one. */
> +
> +struct s {
> + char a, b;
> + int p;
> +};
> +
> +struct s a = { 0, 0, 0 };
> +struct s b = { 0, 0, 0 };
> +
> +int f () {
> + return (a.a != b.a || (a.p != b.p && a.b != b.b));
> +}
> +
> +int g () {
> + return (a.a == b.a && (a.p == b.p || a.b == b.b));
> +}
> +
> +/* { dg-final { scan-tree-dump-not "optimizing" "ifcombine" } } */
> +/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "ifcombine" } } */
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index 6a3bc99190d9e..58222445daf97 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3. If not see
> #include "tree-ssa.h"
> #include "attribs.h"
> #include "asan.h"
> +#include "bitmap.h"
>
> #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
> #define LOGICAL_OP_NON_SHORT_CIRCUIT \
> @@ -49,6 +50,28 @@ along with GCC; see the file COPYING3. If not see
> false) >= 2)
> #endif
>
> +/* Return TRUE iff COND is NULL, or the condition in it is constant. */
> +
> +static bool
> +constant_condition_p (gcond *cond)
> +{
> + if (!cond)
> + return true;
> +
> + return (CONSTANT_CLASS_P (gimple_cond_lhs (cond))
> + && CONSTANT_CLASS_P (gimple_cond_rhs (cond)));
> +}
> +
> +/* Return TRUE iff COND_BB ends in no condition at all, or in a constant
> +   condition.  */
> +
> +static bool
> +constant_condition_p (basic_block cond_bb)
> +{
> + gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_bb));
> + return constant_condition_p (cond);
It's a bit odd this returns true for no condition. Maybe
static_outgoing_edge_p ()?
(what about a throwing stmt?) IMO inlining the predicate to its few(?) uses
avoids giving it a "proper" name ;)
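E.g. inlined at a use, roughly (sketch):

  gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_bb));
  if (!cond
      || (CONSTANT_CLASS_P (gimple_cond_lhs (cond))
	  && CONSTANT_CLASS_P (gimple_cond_rhs (cond))))
    return false;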
> +}
> +
> /* This pass combines COND_EXPRs to simplify control flow. It
> currently recognizes bit tests and comparisons in chains that
> represent logical and or logical or of two COND_EXPRs.
> @@ -110,6 +133,37 @@ recognize_if_then_else (basic_block cond_bb,
> return true;
> }
>
> +/* Same as recognize_if_then_else, but check that the condition is not
> + constant. It is not useful to combine constant conditions. */
I do wonder why we ever get to the situation with constant conditions,
thus why recognize_if_then_else should match them in the first place?
> +static bool
> +recognize_if_then_else_nc (basic_block cond_bb,
> + basic_block *then_bb, basic_block *else_bb)
> +{
> + return recognize_if_then_else (cond_bb, then_bb, else_bb)
> + && !constant_condition_p (cond_bb);
> +}
> +
> +/* Same as recognize_if_then_else, but don't associate the blocks with then and
> + else, check both possibilities. */
> +
> +static bool
> +recognize_if_succs (basic_block cond_bb,
> + basic_block succ1, basic_block succ2)
> +{
> + basic_block t, e;
> +
> + if (EDGE_COUNT (cond_bb->succs) != 2)
> + return false;
> +
> + /* Find both succs. */
> + t = EDGE_SUCC (cond_bb, 0)->dest;
> + e = EDGE_SUCC (cond_bb, 1)->dest;
Can you use
basic_block tem_then = NULL, tem_else = NULL;
recognize_if_then_else (cond_bb, &tem_then, &tem_else);
or even
(recognize_if_then_else (cond_bb, succ1, succ2)
|| recognize_if_then_else (cond_bb, succ2, succ1))
at callers instead of adding another function?
> + return ((t == succ1 && e == succ2)
> + || (t == succ2 && e == succ1));
> +}
> +
> /* Verify if the basic block BB does not have side-effects. Return
> true in this case, else false. */
>
> @@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb)
> enum tree_code rhs_code;
> if (gimple_has_side_effects (stmt)
> || gimple_could_trap_p (stmt)
> - || gimple_vuse (stmt)
> + /* || gimple_vuse (stmt) */
I think you want to preserve || gimple_vdef (stmt) at least - this possibly
saves you from checking for intermediate stores.
As you special-case hard register uses, do we want to ever make hard-register
uses or defs unconditional? I'll note that on GIMPLE hard register uses/defs
are treated as memory, so they have virtual defs and uses.
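That is, keep rejecting stores while only dropping the VUSE test, roughly
(sketch, with the rest of the condition unchanged):

  if (gimple_has_side_effects (stmt)
      || gimple_could_trap_p (stmt)
      || gimple_vdef (stmt))
    return false;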
> /* We need to rewrite stmts with undefined overflow to use
> unsigned arithmetic but cannot do so for signed division. */
> || ((ass = dyn_cast <gassign *> (stmt))
> @@ -356,14 +410,28 @@ recognize_bits_test (gcond *cond, tree *name, tree *bits, bool inv)
> }
>
>
> -/* Update profile after code in outer_cond_bb was adjusted so
> - outer_cond_bb has no condition. */
> +/* Update profile after code in either outer_cond_bb or inner_cond_bb was
> + adjusted so that it has no condition. */
>
> static void
> update_profile_after_ifcombine (basic_block inner_cond_bb,
> basic_block outer_cond_bb)
> {
> - edge outer_to_inner = find_edge (outer_cond_bb, inner_cond_bb);
> + /* In the following we assume that inner_cond_bb has single predecessor. */
> + gcc_assert (single_pred_p (inner_cond_bb));
> +
> + basic_block outer_to_inner_bb = inner_cond_bb;
> + profile_probability prob = profile_probability::always ();
> + for (;;)
> + {
> + basic_block parent = single_pred (outer_to_inner_bb);
> + prob *= find_edge (parent, outer_to_inner_bb)->probability;
> + if (parent == outer_cond_bb)
> + break;
> + outer_to_inner_bb = parent;
> + }
> +
> + edge outer_to_inner = find_edge (outer_cond_bb, outer_to_inner_bb);
> edge outer2 = (EDGE_SUCC (outer_cond_bb, 0) == outer_to_inner
> ? EDGE_SUCC (outer_cond_bb, 1)
> : EDGE_SUCC (outer_cond_bb, 0));
> @@ -374,40 +442,294 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
> std::swap (inner_taken, inner_not_taken);
> gcc_assert (inner_taken->dest == outer2->dest);
>
> - /* In the following we assume that inner_cond_bb has single predecessor. */
> - gcc_assert (single_pred_p (inner_cond_bb));
> + if (outer_to_inner_bb == inner_cond_bb
> + && constant_condition_p (outer_cond_bb))
> + {
> + /* Path outer_cond_bb->(outer2) needs to be merged into path
> + outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
> + and probability of inner_not_taken updated. */
> +
> + inner_cond_bb->count = outer_cond_bb->count;
> +
> + /* Handle special case where inner_taken probability is always. In this
> + case we know that the overall outcome will be always as well, but
> + combining probabilities will be conservative because it does not know
> + that outer2->probability is inverse of
> + outer_to_inner->probability. */
> + if (inner_taken->probability == profile_probability::always ())
> + ;
> + else
> + inner_taken->probability = outer2->probability
> + + outer_to_inner->probability * inner_taken->probability;
> + inner_not_taken->probability = profile_probability::always ()
> + - inner_taken->probability;
> +
> + outer_to_inner->probability = profile_probability::always ();
> + outer2->probability = profile_probability::never ();
> + }
> + else if (constant_condition_p (inner_cond_bb))
> + {
> + /* Path inner_cond_bb->(inner_taken) needs to be merged into path
> + outer_cond_bb->(outer2). We've accumulated the probabilities from
> + outer_cond_bb->(outer)->...->inner_cond_bb in prob, so we have to
> + adjust that by inner_taken, and make inner unconditional. */
> +
> + prob *= inner_taken->probability;
> + outer2->probability += prob;
> + outer_to_inner->probability = profile_probability::always ()
> + - outer2->probability;
> +
> + inner_taken->probability = profile_probability::never ();
> + inner_not_taken->probability = profile_probability::always ();
> + }
> + else
> + {
> + /* We've moved part of the inner cond to outer, but we don't know the
> + probabilities for each part, so estimate the effects by moving half of
> + the odds of inner_taken to outer. */
> +
> + inner_taken->probability *= profile_probability::even ();
> + inner_not_taken->probability = profile_probability::always ()
> + - inner_taken->probability;
> +
> + prob *= inner_taken->probability;
> + outer2->probability += prob;
> + outer_to_inner->probability = profile_probability::always ()
> + - outer2->probability;
> + }
> +}
> +
> +/* Set NAME's bit in USED if OUTER dominates it. */
> +
> +static void
> +ifcombine_mark_ssa_name (bitmap used, tree name, basic_block outer)
> +{
> + if (SSA_NAME_IS_DEFAULT_DEF (name))
> + return;
> +
> + gimple *def = SSA_NAME_DEF_STMT (name);
> + basic_block bb = gimple_bb (def);
> + if (!dominated_by_p (CDI_DOMINATORS, bb, outer))
> + return;
> +
> + bitmap_set_bit (used, SSA_NAME_VERSION (name));
> +}
> +
> +/* Data structure passed to ifcombine_mark_ssa_name. */
> +struct ifcombine_mark_ssa_name_t
> +{
> + /* SSA_NAMEs that have been referenced. */
> + bitmap used;
> + /* Dominating block of DEFs that might need moving. */
> + basic_block outer;
> +};
> +
> +/* Mark in DATA->used any SSA_NAMEs used in *t. */
> +
> +static tree
> +ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
> +{
> + ifcombine_mark_ssa_name_t *data = (ifcombine_mark_ssa_name_t *)data_;
> +
> + if (*t && TREE_CODE (*t) == SSA_NAME)
> + ifcombine_mark_ssa_name (data->used, *t, data->outer);
> +
> + return NULL;
> +}
> +
> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
> + COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
> + replaced with a constant, but if there are intervening blocks, it's best to
> + adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND. */
What kind of "intervening" blocks could there be?
> +
> +static tree
> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
> + gcond *outer_cond, bool outer_inv,
> + tree cond, bool must_canon,
> + tree cond2)
> +{
> + tree ret = cond;
> + if (cond2)
> + ret = fold_build2 (TRUTH_AND_EXPR, TREE_TYPE (ret), ret, cond2);
> +
> + /* Split cond into cond2 if they're contiguous. ??? We might be able to
> + handle ORIF as well, inverting both conditions, but it's not clear that
> + this would be enough, and it never comes up. */
> + if (!cond2
> + && TREE_CODE (cond) == TRUTH_ANDIF_EXPR
> + && single_pred (gimple_bb (inner_cond)) == gimple_bb (outer_cond))
> + {
> + cond2 = TREE_OPERAND (cond, 1);
> + cond = TREE_OPERAND (cond, 0);
> + }
> +
> + bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
> + != gimple_bb (outer_cond));
> + bool result_inv = outer_p ? outer_inv : inner_inv;
> +
> + if (result_inv)
> + cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
> +
> + if (tree tcanon = canonicalize_cond_expr_cond (cond))
> + cond = tcanon;
> + else if (must_canon)
> + return NULL_TREE;
? I think you should use gimple_build or gimplify what was built to
an is_gimple_condexpr_for_cond like ifcombine does elsewhere.
> +
> + if (outer_p)
> + {
> + {
> + auto_bitmap used;
> + basic_block outer_bb = gimple_bb (outer_cond);
>
> - /* Path outer_cond_bb->(outer2) needs to be merged into path
> - outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
> - and probability of inner_not_taken updated. */
> + /* Mark SSA DEFs that are referenced by cond and may thus need to be
> + moved to outer. */
> + {
> + ifcombine_mark_ssa_name_t data = { used, outer_bb };
> + walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
> + }
I think this moving code should be an enhancement on top of the basic
functionality - that is, I'd insert at inner_cond. The intervening blocks must
necessarily form straight-line code and thus we're doing "scheduling" here
which is premature on GIMPLE?
> + if (!bitmap_empty_p (used))
> + {
> + /* Iterate up from inner_cond, moving DEFs identified as used by
> + cond, and marking USEs in the DEFs for moving as well. */
> + gimple_stmt_iterator gsins = gsi_for_stmt (outer_cond);
> + for (basic_block bb = gimple_bb (inner_cond);
> + bb != outer_bb; bb = single_pred (bb))
> + {
> + for (gimple_stmt_iterator gsitr = gsi_last_bb (bb);
> + !gsi_end_p (gsitr); gsi_prev (&gsitr))
> + {
> + gimple *stmt = gsi_stmt (gsitr);
> + bool move = false;
> + tree t;
> + ssa_op_iter it;
> +
> + FOR_EACH_SSA_TREE_OPERAND (t, stmt, it, SSA_OP_DEF)
> + if (bitmap_bit_p (used, SSA_NAME_VERSION (t)))
> + {
> + move = true;
> + break;
> + }
> +
> + if (!move)
> + continue;
> +
> + /* Mark uses in STMT before moving it. */
> + FOR_EACH_SSA_TREE_OPERAND (t, stmt, it, SSA_OP_USE)
> + ifcombine_mark_ssa_name (used, t, outer_bb);
> +
> + gsi_move_before (&gsitr, &gsins, GSI_NEW_STMT);
> + }
> +
> + /* Surprisingly, there may be PHI nodes in single-predecessor
> + blocks, as in pr50682.C.  Fortunately, since they can't
> + involve back edges, there won't be references to parallel
> + nodes that we'd have to pay special attention to to keep
> + them parallel. We can't move the PHI nodes, but we can turn
> + them into assignments. */
> + for (gphi_iterator gsi = gsi_start_phis (bb);
> + !gsi_end_p (gsi);)
> + {
> + gphi *phi = gsi.phi ();
> +
> + gcc_assert (gimple_phi_num_args (phi) == 1);
> + tree def = gimple_phi_result (phi);
> +
> + if (!bitmap_bit_p (used, SSA_NAME_VERSION (def)))
> + {
> + gsi_next (&gsi);
> + continue;
> + }
> +
> + /* Mark uses in STMT before moving it. */
> + use_operand_p use_p;
> + ssa_op_iter it;
> + FOR_EACH_PHI_ARG (use_p, phi, it, SSA_OP_USE)
> + ifcombine_mark_ssa_name (used, USE_FROM_PTR (use_p),
> + outer_bb);
> +
> + tree use = gimple_phi_arg_def (phi, 0);
> + location_t loc = gimple_phi_arg_location (phi, 0);
> +
> + remove_phi_node (&gsi, false);
> +
> + gassign *a = gimple_build_assign (def, use);
> + gimple_set_location (a, loc);
> + gsi_insert_before (&gsins, a, GSI_NEW_STMT);
> + }
> + }
> + }
> + }
>
> - inner_cond_bb->count = outer_cond_bb->count;
> + if (!is_gimple_condexpr_for_cond (cond))
> + {
> + gimple_stmt_iterator gsi = gsi_for_stmt (outer_cond);
> + cond = force_gimple_operand_gsi_1 (&gsi, cond,
> + is_gimple_condexpr_for_cond,
> + NULL, true, GSI_SAME_STMT);
> + }
>
> - /* Handle special case where inner_taken probability is always. In this case
> - we know that the overall outcome will be always as well, but combining
> - probabilities will be conservative because it does not know that
> - outer2->probability is inverse of outer_to_inner->probability. */
> - if (inner_taken->probability == profile_probability::always ())
> - ;
> + /* Leave CFG optimization to cfg_cleanup. */
> + gimple_cond_set_condition_from_tree (outer_cond, cond);
> + update_stmt (outer_cond);
> +
> + if (cond2)
> + {
> + if (inner_inv)
> + cond2 = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond2), cond2);
> +
> + if (tree tcanon = canonicalize_cond_expr_cond (cond2))
> + cond2 = tcanon;
> + if (!is_gimple_condexpr_for_cond (cond2))
> + {
> + gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
> + cond2 = force_gimple_operand_gsi_1 (&gsi, cond2,
> + is_gimple_condexpr_for_cond,
> + NULL, true, GSI_SAME_STMT);
> + }
> + gimple_cond_set_condition_from_tree (inner_cond, cond2);
> + }
> + else
> + gimple_cond_set_condition_from_tree (inner_cond,
> + inner_inv
> + ? boolean_false_node
> + : boolean_true_node);
> + update_stmt (inner_cond);
> + }
> else
> - inner_taken->probability = outer2->probability + outer_to_inner->probability
> - * inner_taken->probability;
> - inner_not_taken->probability = profile_probability::always ()
> - - inner_taken->probability;
> + {
> + if (!is_gimple_condexpr_for_cond (cond))
> + {
> + gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
> + cond = force_gimple_operand_gsi_1 (&gsi, cond,
> + is_gimple_condexpr_for_cond,
> + NULL, true, GSI_SAME_STMT);
> + }
> + gimple_cond_set_condition_from_tree (inner_cond, cond);
> + update_stmt (inner_cond);
> +
> + /* Leave CFG optimization to cfg_cleanup. */
> + gimple_cond_set_condition_from_tree (outer_cond,
> + outer_inv
> + ? boolean_false_node
> + : boolean_true_node);
There is gimple_cond_make_{true,false}
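I.e. (sketch):

  if (outer_inv)
    gimple_cond_make_false (outer_cond);
  else
    gimple_cond_make_true (outer_cond);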
> + update_stmt (outer_cond);
> + }
> +
> + update_profile_after_ifcombine (gimple_bb (inner_cond),
> + gimple_bb (outer_cond));
>
> - outer_to_inner->probability = profile_probability::always ();
> - outer2->probability = profile_probability::never ();
> + return ret;
> }
>
> /* If-convert on a and pattern with a common else block. The inner
> if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
> - inner_inv, outer_inv and result_inv indicate whether the conditions
> - are inverted.
> + inner_inv, outer_inv indicate whether the conditions are inverted.
> Returns true if the edges to the common else basic-block were merged. */
>
> static bool
> ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> - basic_block outer_cond_bb, bool outer_inv, bool result_inv)
> + basic_block outer_cond_bb, bool outer_inv)
> {
> gimple_stmt_iterator gsi;
> tree name1, name2, bit1, bit2, bits1, bits2;
> @@ -446,26 +768,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> t2 = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
> t2 = force_gimple_operand_gsi (&gsi, t2, true, NULL_TREE,
> true, GSI_SAME_STMT);
> - t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR,
> - boolean_type_node, t2, t);
> - t = canonicalize_cond_expr_cond (t);
> - if (!t)
> - return false;
> - if (!is_gimple_condexpr_for_cond (t))
> - {
> - gsi = gsi_for_stmt (inner_cond);
> - t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
> - NULL, true, GSI_SAME_STMT);
> - }
> - gimple_cond_set_condition_from_tree (inner_cond, t);
> - update_stmt (inner_cond);
>
> - /* Leave CFG optimization to cfg_cleanup. */
> - gimple_cond_set_condition_from_tree (outer_cond,
> - outer_inv ? boolean_false_node : boolean_true_node);
> - update_stmt (outer_cond);
> + t = fold_build2 (EQ_EXPR, boolean_type_node, t2, t);
>
> - update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
> + if (!ifcombine_replace_cond (inner_cond, inner_inv,
> + outer_cond, outer_inv,
> + t, true, NULL_TREE))
> + return false;
>
> if (dump_file)
> {
> @@ -485,9 +794,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> In that case remove the outer test and change the inner one to
> test for name & (bits1 | bits2) != 0. */
> else if (recognize_bits_test (inner_cond, &name1, &bits1, !inner_inv)
> - && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
> + && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
> {
> - gimple_stmt_iterator gsi;
> tree t;
>
> if ((TREE_CODE (name1) == SSA_NAME
> @@ -530,33 +838,14 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> bits1 = fold_convert (TREE_TYPE (bits2), bits1);
> }
>
> - /* Do it. */
> - gsi = gsi_for_stmt (inner_cond);
> t = fold_build2 (BIT_IOR_EXPR, TREE_TYPE (name1), bits1, bits2);
> - t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
> - true, GSI_SAME_STMT);
> t = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
> - t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
> - true, GSI_SAME_STMT);
> - t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR, boolean_type_node, t,
> + t = fold_build2 (EQ_EXPR, boolean_type_node, t,
> build_int_cst (TREE_TYPE (t), 0));
> - t = canonicalize_cond_expr_cond (t);
> - if (!t)
> + if (!ifcombine_replace_cond (inner_cond, inner_inv,
> + outer_cond, outer_inv,
> + t, false, NULL_TREE))
Can you split out the factoring of this function?
> return false;
> - if (!is_gimple_condexpr_for_cond (t))
> - {
> - gsi = gsi_for_stmt (inner_cond);
> - t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
> - NULL, true, GSI_SAME_STMT);
> - }
> - gimple_cond_set_condition_from_tree (inner_cond, t);
> - update_stmt (inner_cond);
> -
> - /* Leave CFG optimization to cfg_cleanup. */
> - gimple_cond_set_condition_from_tree (outer_cond,
> - outer_inv ? boolean_false_node : boolean_true_node);
> - update_stmt (outer_cond);
> - update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
And moving this?
>
> if (dump_file)
> {
> @@ -576,7 +865,7 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> else if (TREE_CODE_CLASS (gimple_cond_code (inner_cond)) == tcc_comparison
> && TREE_CODE_CLASS (gimple_cond_code (outer_cond)) == tcc_comparison)
> {
> - tree t;
> + tree t, ts = NULL_TREE;
> enum tree_code inner_cond_code = gimple_cond_code (inner_cond);
> enum tree_code outer_cond_code = gimple_cond_code (outer_cond);
>
> @@ -599,10 +888,20 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> outer_cond_code,
> gimple_cond_lhs (outer_cond),
> gimple_cond_rhs (outer_cond),
> - gimple_bb (outer_cond))))
> + gimple_bb (outer_cond)))
> + && !(t = (fold_truth_andor_maybe_separate
> + (UNKNOWN_LOCATION, TRUTH_ANDIF_EXPR,
> + boolean_type_node,
> + outer_cond_code,
> + gimple_cond_lhs (outer_cond),
> + gimple_cond_rhs (outer_cond),
> + inner_cond_code,
> + gimple_cond_lhs (inner_cond),
> + gimple_cond_rhs (inner_cond),
> + &ts))))
> {
> + {
> tree t1, t2;
> - gimple_stmt_iterator gsi;
> bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT;
> if (param_logical_op_non_short_circuit != -1)
> logical_op_non_short_circuit
> @@ -624,34 +923,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> gimple_cond_rhs (outer_cond));
> t = fold_build2_loc (gimple_location (inner_cond),
> TRUTH_AND_EXPR, boolean_type_node, t1, t2);
> - if (result_inv)
> - {
> - t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
> - result_inv = false;
> - }
> - gsi = gsi_for_stmt (inner_cond);
> - t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
> - NULL, true, GSI_SAME_STMT);
> + }
> }
> - if (result_inv)
> - t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
> - t = canonicalize_cond_expr_cond (t);
> - if (!t)
> - return false;
> - if (!is_gimple_condexpr_for_cond (t))
> - {
> - gsi = gsi_for_stmt (inner_cond);
> - t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
> - NULL, true, GSI_SAME_STMT);
> - }
> - gimple_cond_set_condition_from_tree (inner_cond, t);
> - update_stmt (inner_cond);
>
> - /* Leave CFG optimization to cfg_cleanup. */
> - gimple_cond_set_condition_from_tree (outer_cond,
> - outer_inv ? boolean_false_node : boolean_true_node);
> - update_stmt (outer_cond);
> - update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
> + if (!(t = ifcombine_replace_cond (inner_cond, inner_inv,
> + outer_cond, outer_inv,
> + t, false, ts)))
> + return false;
>
> if (dump_file)
> {
> @@ -669,19 +947,21 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> /* Helper function for tree_ssa_ifcombine_bb. Recognize a CFG pattern and
> dispatch to the appropriate if-conversion helper for a particular
> set of INNER_COND_BB, OUTER_COND_BB, THEN_BB and ELSE_BB.
> - PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB. */
> + PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.
> + OUTER_SUCC_BB is the successor of OUTER_COND_BB on the path towards
> + INNER_COND_BB. */
>
> static bool
> tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> basic_block then_bb, basic_block else_bb,
> - basic_block phi_pred_bb)
> + basic_block phi_pred_bb, basic_block outer_succ_bb)
> {
> /* The && form is characterized by a common else_bb with
> the two edges leading to it mergable. The latter is
> guaranteed by matching PHI arguments in the else_bb and
> the inner cond_bb having no side-effects. */
> if (phi_pred_bb != else_bb
> - && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &else_bb)
> + && recognize_if_then_else_nc (outer_cond_bb, &outer_succ_bb, &else_bb)
> && same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb))
> {
> /* We have
> @@ -693,13 +973,12 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> <else_bb>
> ...
> */
> - return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false,
> - false);
> + return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false);
> }
>
> /* And a version where the outer condition is negated. */
> if (phi_pred_bb != else_bb
> - && recognize_if_then_else (outer_cond_bb, &else_bb, &inner_cond_bb)
> + && recognize_if_then_else_nc (outer_cond_bb, &else_bb, &outer_succ_bb)
> && same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb))
> {
> /* We have
> @@ -711,8 +990,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> <else_bb>
> ...
> */
> - return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true,
> - false);
> + return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true);
> }
>
> /* The || form is characterized by a common then_bb with the
> @@ -720,7 +998,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> by matching PHI arguments in the then_bb and the inner cond_bb
> having no side-effects. */
> if (phi_pred_bb != then_bb
> - && recognize_if_then_else (outer_cond_bb, &then_bb, &inner_cond_bb)
> + && recognize_if_then_else_nc (outer_cond_bb, &then_bb, &outer_succ_bb)
> && same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb))
> {
> /* We have
> @@ -731,13 +1009,12 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> <then_bb>
> ...
> */
> - return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true,
> - true);
> + return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true);
> }
>
> /* And a version where the outer condition is negated. */
> if (phi_pred_bb != then_bb
> - && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &then_bb)
> + && recognize_if_then_else_nc (outer_cond_bb, &outer_succ_bb, &then_bb)
> && same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb))
> {
> /* We have
> @@ -748,8 +1025,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> <then_bb>
> ...
> */
> - return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false,
> - true);
> + return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false);
> }
>
> return false;
> @@ -759,13 +1035,13 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
> if-conversion helper. We start with BB as the innermost
> worker basic-block. Returns true if a transformation was done. */
>
> -static bool
> +static basic_block
> tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
> {
> basic_block then_bb = NULL, else_bb = NULL;
>
> - if (!recognize_if_then_else (inner_cond_bb, &then_bb, &else_bb))
> - return false;
> + if (!recognize_if_then_else_nc (inner_cond_bb, &then_bb, &else_bb))
> + return NULL;
>
> /* Recognize && and || of two conditions with a common
> then/else block which entry edges we can merge. That is:
> @@ -775,14 +1051,41 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
> if (a && b)
> ;
> This requires a single predecessor of the inner cond_bb. */
I'm a bit lost as to what the following now matches - it's certainly
not only the
special-case in the above comment. Having the loop also appears to make
it somewhat "unbound"?
Isn't this also an improvement that's independent of the bitfield compare
simplification? If so can you split it out to make the overall patch smaller?
> - if (single_pred_p (inner_cond_bb)
> - && bb_no_side_effects_p (inner_cond_bb))
> + for (basic_block bb = inner_cond_bb, outer_cond_bb, exit_bb = NULL;
> + single_pred_p (bb) && bb_no_side_effects_p (bb)
> + && (!exit_bb || same_phi_args_p (bb, inner_cond_bb, exit_bb));
> + bb = outer_cond_bb)
> {
> - basic_block outer_cond_bb = single_pred (inner_cond_bb);
> + outer_cond_bb = single_pred (bb);
> +
> + /* Skip blocks without conditions. */
> + if (single_succ_p (outer_cond_bb))
> + continue;
> +
> + /* When considering noncontiguous conditions, make sure that all
> + non-final conditions lead to the same successor of the final
> + condition, when not taking the path to inner_bb, so that we can
> + combine C into A, both in A && (B && C), and in A || (B || C), but
> + neither in A && (B || C), nor A || (B && C). Say, if C goes to
> + THEN_BB or ELSE_BB, then B must go to either of these, say X, besides
> + C (whether C is then or else), and A must go to X and B (whether then
> + or else).
> +
> + We test for this, while allowing intervening nonconditional blocks, by
> + first taking note of which of the successors of the inner conditional
> + block is the exit path taken by the first considered outer conditional
> + block.
> +
> + Having identified and saved the exit block in exit_bb at the end of
> + the loop, here we test that subsequent conditional blocks under
> + consideration also use the exit block as a successor, besides the
> + block that leads to inner_cond_bb. */
> + if (exit_bb && !recognize_if_succs (outer_cond_bb, bb, exit_bb))
> + break;
>
> if (tree_ssa_ifcombine_bb_1 (inner_cond_bb, outer_cond_bb,
> - then_bb, else_bb, inner_cond_bb))
> - return true;
> + then_bb, else_bb, inner_cond_bb, bb))
> + return outer_cond_bb;
>
> if (forwarder_block_to (else_bb, then_bb))
> {
> @@ -793,8 +1096,8 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
> For same_phi_args_p we look at equality of arguments between
> edge from outer_cond_bb and the forwarder block. */
> if (tree_ssa_ifcombine_bb_1 (inner_cond_bb, outer_cond_bb, else_bb,
> - then_bb, else_bb))
> - return true;
> + then_bb, else_bb, bb))
> + return outer_cond_bb;
> }
> else if (forwarder_block_to (then_bb, else_bb))
> {
> @@ -805,12 +1108,25 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
> For same_phi_args_p we look at equality of arguments between
> edge from outer_cond_bb and the forwarder block. */
> if (tree_ssa_ifcombine_bb_1 (inner_cond_bb, outer_cond_bb, else_bb,
> - then_bb, then_bb))
> - return true;
> + then_bb, then_bb, bb))
> + return outer_cond_bb;
> + }
> +
> + /* Record the exit path taken by the outer condition. In the loop
> + condition test, we'll check whether INNER_COND_BB and the current
> + OUTER_COND_BB (then BB) share the same PHI args at EXIT_BB. */
> + if (!exit_bb)
> + {
> + if (recognize_if_succs (outer_cond_bb, then_bb, bb))
> + exit_bb = then_bb;
> + else if (recognize_if_succs (outer_cond_bb, bb, else_bb))
> + exit_bb = else_bb;
> + else
> + break;
> }
> }
>
> - return false;
> + return NULL;
> }
>
> /* Main entry for the tree if-conversion pass. */
> @@ -866,7 +1182,7 @@ pass_tree_ifcombine::execute (function *fun)
> basic_block bb = bbs[i];
>
> if (safe_is_a <gcond *> (*gsi_last_bb (bb)))
> - if (tree_ssa_ifcombine_bb (bb))
> + if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb))
would it help if we'd perform this walk in a more defined manner? In particular
after your more complex CFG matching is single-pred-before-succ order still
appropriate?
> {
> /* Clear range info from all stmts in BB which is now executed
> conditional on a always true/false condition. */
> @@ -885,6 +1201,12 @@ pass_tree_ifcombine::execute (function *fun)
> rewrite_to_defined_overflow (&gsi);
> }
> cfg_changed |= true;
> + /* Go back to outer_bb, in case it could be further optimized, but
> + only at -O2+, since it could get quadratic. */
You're also walking back(?) in tree_ssa_ifcombine_bb now. I also think
single-pred-before-succ order is supposed to catch all 2nd level opportunities?
Thanks,
Richard.
> + do
> + i++;
> + while (optimize >= 2 && bbs[i] != outer_bb);
> + continue;
> }
> }
>
>
> --
> Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
> Free Software Activist GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
On Oct 10, 2024, Richard Biener <richard.guenther@gmail.com> wrote:
> Thanks for working on this. There's #if 0 portions in the patch - did you
> send the correct version?
'fraid so. Sorry, I'd forgotten about that bit. Long story (not so)
short, this patch is a bit of a frankenstein monster: there's a large
portion from the original fold_truth_andor patch, that deals with the
bit field merging; there's a largish portion from a couple of years ago,
in which I set out to reconstruct the mergeable expressions in gimple
for ifcombine, but failed because vuses would unavoidably be present,
and the latest recent try in which I went ahead with the bits that
dropped vuses, and managed to introduce in ifcombine what I considered
the most valuable contribution in this patch, namely, the combination of
non-neighbor conditions. The logic from each phase was not really
touched in subsequent phases, and years had elapsed, so the recollection
of these cpp conditionals had slipped out of my mind.
My swapped-back-in realization is that I took a copy of
ssa_is_replaceable_p and made it ssa_is_substitutable_p, with some
undesired conditions removed by #if 0 along with declarations only used
in those bits. The idea back then was to highlight the differences, so
that, when revisiting it, I could perhaps refactor and combine the
functions if that made sense. I'm not sure it does.
> I wonder if we can check the locations as to whether the spelling location
> is different and suppress diagnostics when macro expansions were involved?
Presumably. I had tracked down the ppc condition back in the
fold_truth_andor version of the patch, but the x86 hits are newer, and
the line numbers didn't help me find the errors, so I don't even know
whether it's macros that we're talking about. These new warnings were
encouraging, for they showed that the ifcombine incarnation of the patch
covered additional combinations that the fold-based one couldn't. But
they only came up in the latest version, so I know the mutually
exclusive conditions are not neighbors.
But I digress. The relevant information here is that the line numbers
are off, so I don't even know whether macros are involved.
> Adding -Wno-error should be the last resort and I suspect such cases
> happen in user code as well?
I'm pretty sure this will come up in user code, yeah, but isn't that
something we'd *want* to flag, rather than suppress? I mean, mutually
exclusive conditions are probably something worth flagging and looking
into even (especially?) if they come from macros. The fact that we
missed them before doesn't strike me as enough of a reason to silence
them.
>> +/* Return TRUE if expression STMT is suitable for replacement. ???
>> + Same as ssa_is_replaceable_p, except that we don't insist it has a
>> + single use. */
>>
>> +static bool
>> +ssa_is_substitutable_p (gimple *stmt)
> I don't yet see how many times you use this - there's other passes
> which need to query a subset of this like DCE or DSE for whether
> a stmt can be elided.
Only twice, in the bits from phase2 that recognize and reconstruct from
gimple condition operand trees that could be merged. I realize now that
perhaps gimple matching could do this job, so perhaps I can get rid of
this.
> This is mostly gimple_has_side_effects () but there's also
> !cfun->can_delete_dead_exceptions && stmt_could_throw_p (cfun, stmt)
Erhm, I don't think the paragraph above applies to
ssa_is_substitutable_p.
> I think the name is probably bad and suggests it's generally applicable,
> so maybe rename it to something more specific for your use?
ACK, will do if I don't get rid of it.
TBH, I don't think I've given this modified version of
tree-outof-ssa.cc's ssa_is_replaceable_p() much thought, I just knew I
didn't mind combining/merging trees even if they had additional uses.
Should I apply your suggestions for it to the original function?
>> +static tree
>> +is_cast_p (tree *name)
> very short name, add a prefix?
Will do if it survives.
>> +{
>> + tree type = 0;
>> +
>> + while (TREE_CODE (*name) == SSA_NAME
>> + && !SSA_NAME_IS_DEFAULT_DEF (*name))
>> + {
>> + gimple *def = SSA_NAME_DEF_STMT (*name);
>> +
>> + if (!ssa_is_substitutable_p (def))
> Overly restrictive/expensive? Just check if it's a GIMPLE assign?
This is from phase2, so the details escape me already, but I don't think
that would be enough. We need to know that the definition is movable in
principle, presumably to the point of use. If we combine conditions,
we'll end up evaluating it at that point.
>> + if (gimple_assign_single_p (def))
>> + {
>> + if (gimple_assign_load_p (def))
>> + break;
>> + *name = gimple_assign_rhs1 (def);
> that could be a constant, or a REALPART_EXPR. Maybe move after ...
>> + continue;
>> + }
>> +
>> + if (!gimple_assign_cast_p (def))
>> + break;
> ... this?
I don't get what you're getting at here.
IIRC this function recognizes gimple that, in the fold logic, used to
come up in (bit)field extraction and manipulation for testing,
particularly type widening and narrowing. It is supposed to preserve
the preexisting logic.
>> +
>> + if (!type)
>> + type = TREE_TYPE (*name);
>> + *name = gimple_assign_rhs1 (def);
>> + }
>> +
>> + return type;
> Overall this is a very odd function as it strips even (widen)(truncate)x?
It's not really stripping anything, just identifying the initial type of
the field that underwent type casts.
>> + /* We are interested in the bare arrangement of bits, so strip everything
>> + that doesn't affect the machine mode. However, record the type of the
>> + outermost expression if it may matter below. */
>> + outer_type = is_cast_p (&exp);
> I think is_cast_p strips more than casts not affecting the machine mode.
> It looks like you want a STRIP_NOPS that works on GIMPLE?
I don't think so. IIRC this is modeled after preexisting code that did
more than STRIP_NOPs, it followed CONVERT_EXPR_P as well.
> I'm not sure if it would scale, but it looks like there's a certain
> set of patterns
> this tries to match - did you think of using a (match ...) pattern in match.pd
> instead?
I didn't. I recall looking into such pattern matching at some point,
before getting started on ifcombine, but I didn't revisit it. The
patterns I needed to recognize to preserve the fold functionality were
so varied depending on details that I wasn't sure that would be useful.
But I guess now I should give it a try.
>> +static tree
>> +unextend (tree c, int p, int unsignedp, tree mask)
> I'd like all of the above to be done via wide_int, that has utilities to build
> (shifted) masks and sign- or zero-extend from specific bits.
Ugh :-) I was hoping I wouldn't have to revisit the bit-manipulation
logic taken from the earliest version of the patch, let alone rewrite it
:-)
>> + tree ref = make_bit_field_ref (loc, unshare_expr (inner),
>> + unshare_expr (orig_inner),
>> + type, bitsize, bitpos,
> I suppose 'inner' is what the earlier get_inner_reference returned - I don't
> think this built load properly preserves TBAA properties, at least
> a MEM[(char *)&a] would have an inner of 'a'?
These bitfield references combine access to multiple fields of a. It's
not very much unlike accessing the entirety of 'a'. What else
could/should it be?
>> +build_split_load (tree /* out */ ln_arg[2],
>> + tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
> Use type_for_mode?
Hmm, I'm pretty sure I tried something like that but had to resort to
this back in the first version.
>> +static int
>> +mergeable_loads_p (gimple *lstmt, gimple *rstmt)
>> + if (stmt_dominates_stmt_p (lstmt, rstmt))
> Can you avoid these please?
I'm not sure. IIRC we need to know which stmt dominates the other. But
that's only to know where to insert prep stmts; maybe we can return them
and leave it for the caller to gimplify as needed.
> if (gimple_vuse (stmts[0]) != gimple_vuse (stmts[1]))
> return 0;
*doh*, yeah, right, thanks!
> The function is large - maybe some factoring is possible? I also notice
> you return a GENERIC expression here where having a GIMPLE value
> and an alternate gimple_seq output for defining stmts would be better?
It would have to be two seqs, to handle split cases.
I'll give that a try.
>> +static bool
>> +constant_condition_p (gcond *cond)
>> +{
>> + if (!cond)
>> + return true;
> It's a bit odd this returns true for no condition. Maybe
> static_outgoing_edge_p ()?
That name would work. I'll think of other possibilities. I didn't
expect that covering the degenerate case would be a problem.
> (what about a throwing stmt?)
That would prevent ifcombine from considering the block.
> IMO inlining the predicate to it's few(?) uses
> avoids giving it a "proper" name ;)
I'll consider that. Are you suggesting inlining the second
constant_condition_p (that takes a bb), the one quoted above, or both?
>> +/* Same as recognize_if_then_else, but check that the condition is not
>> + constant. It is not useful to combine constant conditions. */
> I do wonder why we ever get to the situation with constant conditions,
> thus why recognize_if_then_else should match them in the first place?
Two situations:
- when there's a straight-line block (that could be cleaned up)
separating blocks with combinable conditions
- when we're crossing a block whose condition has already been combined,
looking for other conditions to combine in earlier blocks
>> +static bool
>> +recognize_if_succs (basic_block cond_bb,
>> + basic_block succ1, basic_block succ2)
> Can you use
> basic_block tem_then = NULL, tem_else = NULL;
> recognize_if_then_else (cond_bb, &tem_then, &tem_else);
> or even
> (recognize_if_then_else (cond_bb, succ1, succ2)
> || recognize_if_then_else (cond_bb, succ2, succ1))
> at callers instead of adding another function?
I could, I recall that it worked that way before I optimized it to avoid
various tests that we don't need but the other function performs, and
even the indirection and optional filling-in that we don't need.
>> @@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb)
>> enum tree_code rhs_code;
>> if (gimple_has_side_effects (stmt)
>> || gimple_could_trap_p (stmt)
>> - || gimple_vuse (stmt)
>> + /* || gimple_vuse (stmt) */
> I think you want to preserve || gimple_vdef (stmt) at least - this possibly
> saves you from checking for intermediate stores.
That's covered by gimple_has_side_effects, I hope.
(checks)
Hmm, apparently not.
Comments in there, and even its name, are now very misleading.
Yeah, looks like we need to rule out gimple_vdef.
> As you special-case hard register uses, do we want to ever make hard-register
> uses or defs unconditional? I'll note that on GIMPLE hard register uses/defs
> are treated as memory, so they have virtual defs and uses.
Erhm, do I? I don't recall that, and I can't tell what you're getting
at.
>> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
>> + COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
>> + replaced with a constant, but if there are intervening blocks, it's best to
>> + adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND. */
> What kind of "intervening" blocks could there be?
(a.f1 == b.f1 // bb A
&& x != y // bb B
&& a.f2 == b.f2 // bb C
... )
We may be able to combine C into A, without penalty, as long as B has no
side effects that prevent the combination.
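As a compilable sketch of that shape (the field names, widths and the
function wrapper are made up; the assumption is that f1 and f2 of each
object land in the same word):

struct S { unsigned int f1 : 8; unsigned int f2 : 8; };

int
f (struct S a, struct S b, int x, int y)
{
  if (a.f1 == b.f1      /* bb A */
      && x != y         /* bb B: unrelated, no side effects */
      && a.f2 == b.f2)  /* bb C: combinable with bb A */
    return 1;
  return 0;
}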
>> +static tree
>> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
>> + gcond *outer_cond, bool outer_inv,
>> + tree cond, bool must_canon,
>> + tree cond2)
>> +{
[...]
>> + if (tree tcanon = canonicalize_cond_expr_cond (cond))
>> + cond = tcanon;
>> + else if (must_canon)
>> + return NULL_TREE;
> ? I think you should use gimple_build or gimplify what was built to
> a is_gimple_condexpr_for_cond like ifcombine does elsewhere.
That bit is refactored from preexisting code. Some paths required
canonicalization to succeed. I only preserved that.
>> - /* Path outer_cond_bb->(outer2) needs to be merged into path
>> - outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
>> - and probability of inner_not_taken updated. */
>> + /* Mark SSA DEFs that are referenced by cond and may thus need to be
>> + moved to outer. */
>> + {
>> + ifcombine_mark_ssa_name_t data = { used, outer_bb };
>> + walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
>> + }
> I think this moving code should be an enhancement ontop of the basic
> functionality - that is, I'd insert at inner_cond.
Combining A into C (from my example above), as you suggest here, could
penalize a carefully written sequence of conditions, so I'd rather not
do that. It wouldn't even be an optimization.
> The intervening blocks must
> necessarily form straight-line code and thus we're doing "scheduling" here
> which is premature on GIMPLE?
Maybe it wasn't clear from the description that the intervening blocks
could have multiple other unrelated conditions?
>> + /* Leave CFG optimization to cfg_cleanup. */
>> + gimple_cond_set_condition_from_tree (outer_cond,
>> + outer_inv
>> + ? boolean_false_node
>> + : boolean_true_node);
> There is gimple_cond_make_{true,false}
Yeah. This code was factored from ifcombine_ifandif.
I don't like making unrelated changes during refactoring.
I understand I'm to separate the refactoring, so I might as well make
this cleanup (?) another separate patch.
>> + if (!ifcombine_replace_cond (inner_cond, inner_inv,
>> + outer_cond, outer_inv,
>> + t, false, NULL_TREE))
> Can you split out factoring this function?
Heh. Sure. Funny. I had it separate before I folded and consolidated
the patch to submit it.
>> - if (!is_gimple_condexpr_for_cond (t))
>> - {
>> - gsi = gsi_for_stmt (inner_cond);
>> - t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
>> - NULL, true, GSI_SAME_STMT);
>> - }
>> - gimple_cond_set_condition_from_tree (inner_cond, t);
>> - update_stmt (inner_cond);
>> -
>> - /* Leave CFG optimization to cfg_cleanup. */
>> - gimple_cond_set_condition_from_tree (outer_cond,
>> - outer_inv ? boolean_false_node : boolean_true_node);
>> - update_stmt (outer_cond);
>> - update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
> And moving this?
*nod*, it's the same (re)factoring.
>> This requires a single predecessor of the inner cond_bb. */
> I'm a bit lost as to what the following now matches - it's certainly
> not only the
> special-case in the above comment. Having the loop also appears to make
> it somewhat "unbound"?
We're walking up the chain of single-preds looking for an outer block
whose condition can be combined with the inner.
The comment still holds, but I guess it should also point out that
intervening blocks may exist and can be skipped.
> Isn't this also an improvement that's independent of the bitfield compare
> simplification?
Yes.
> If so can you split it out to make the overall patch smaller?
Heh. Sure. Funny again ;-)
The code for moving defs up and combining into the earlier condition
would probably also go into this separate patch, because without
intervening blocks, using the inner block is always ok, except when we
get a split condition, i.e., when the second condition accesses a field
that straddles over words, and the first condition accesses one of those
words.
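For illustration only, a made-up layout in which that happens (the packed
attribute, the types and the assumption of 32-bit words are just an
example):

struct P
{
  unsigned char a;   /* byte 0 */
  unsigned int b;    /* bytes 1..4: crosses the 32-bit word boundary */
} __attribute__ ((packed));

int
g (struct P *p, struct P *q)
{
  return p->a == q->a      /* first compare: word 0 only */
         && p->b == q->b;  /* second compare: straddles words 0 and 1 */
}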
Handling that correctly requires the code to move bits up, but it
complicates the patch splitting. Would you prefer a larger patch
sequence, breaking up the pieces so that every patch generates ideal
code for the features it supports, or would an intermediate patch in
which we combine the split conditions at the inner block in a way that
loses the short-circuiting from andif/orif, but that a later patch in
the sequence restores, be ok?
>> if (safe_is_a <gcond *> (*gsi_last_bb (bb)))
>> - if (tree_ssa_ifcombine_bb (bb))
>> + if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb))
> would it help if we'd perform this walk in a more defined manner? In particular
> after your more complex CFG matching is single-pred-before-succ order still
> appropriate?
Perhaps. There's an apparent potential for quadratic behavior here, but
I'm not sure it is actually there. If it is, a different order might or
might not make things better.
I'm not entirely sure that the combination into outer requires
reconsidering outer. ISTM that, if we could have combined outer, we
would have already done that, and then the combination of inner would
involve the same third block.
I am sure, however, that reconsidering inner is necessary for some
optimizations, such as cases in which we split its condition, and then
we could combine part of the condition into one block, and another part
into another.
>> + /* Go back to outer_bb, in case it could be further optimized, but
>> + only at -O2+, since it could get quadratic. */
> You're also walking back(?) in tree_ssa_ifcombine_bb now. I also think
> single-pred-before-succ order is supposed to catch all 2nd level opportunities?
No, we need at the very least to retry the inner block after a
successful combination, because of split conditions, and because in most
cases we bring the outer condition into the inner one.
The uncertainty to me is whether there's any case in which a combination
into outer makes room for outer to be further combined as inner. I
suspect some split condition cases might, and I haven't been able to
convince myself otherwise, nor have I been able to find a case in which
it does (I had testing code to detect this).
Thanks again,
On Mon, Oct 21, 2024 at 4:30 AM Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Oct 10, 2024, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > Thanks for working on this. There's #if 0 portions in the patch - did you
> > send the correct version?
>
> 'fraid so. Sorry, I'd forgotten about that bit. Long story (not so)
> short, this patch is a bit of a frankenstein monster: there's a large
> portion from the original fold_truth_andor patch, that deals with the
> bit field merging; there's a largish portion from a couple of years ago,
> in which I set out to reconstruct the mergeable expressions in gimple
> for ifcombine, but failed because vuses would unavoidably be present,
> and the latest recent try in which I went ahead with the bits that
> dropped vuses, and managed to introduce in ifcombine what I considered
> the most valuable contribution in this patch, namely, the combination of
> non-neighbor conditions. The logic from each phase was not really
> touched in subsequent phases, and years had elapsed, so the recollection
> of these cpp conditionals had slipped out of my mind.
>
> My swapped-back-in realization is that I took a copy of
> ssa_is_replaceable_p and made it ssa_is_substitutable_p, with some
> undesired conditions removed by #if 0 along with declarations only used
> in those bits. The idea back then was to highlight the differences, so
> that, when revisiting it, I could perhaps refactor and combine the
> functions if that made sense. I'm not sure it does.
>
> > I wonder if we can check the locations as to whether the spelling location
> > is different and suppress diagnostics when macro expansions were involved?
>
> Presumably. I had tracked down the ppc condition back in the
> fold_truth_andor version of the patch, but the x86 hits are newer, and
> the line numbers didn't help me find the errors, so I don't even know
> whether it's macros that we're talking about. These new warnings were
> encouraging, for they showed that the ifcombine incarnation of the patch
> covered additional combinations that the fold-based one couldn't. But
> they only came up in the latest version, so I know the mutually
> exclusive conditions are not neighbors.
>
> But I digress. The relevant information here is that the line numbers
> are off, so I don't even know whether macros are involved.
>
> > Adding -Wno-error should be the last resort and I suspect such cases
> > happen in user code as well?
>
> I'm pretty sure this will come up in user code, yeah, but isn't that
> something we'd *want* to flag, rather than suppress? I mean, mutually
> exclusive conditions are probably something worth flagging and looking
> into even (especially?) if they come from macros. The fact that we
> missed them before doesn't strike me as enough of a reason to silence
> them.
>
> >> +/* Return TRUE if expression STMT is suitable for replacement. ???
> >> + Same as ssa_is_replaceable_p, except that we don't insist it has a
> >> + single use. */
> >>
> >> +static bool
> >> +ssa_is_substitutable_p (gimple *stmt)
>
> > I don't yet see how many times you use this - there's other passes
> > which need to query a subset of this like DCE or DSE for whether
> > a stmt can be elided.
>
> Only twice, in the bits from phase2 that recognize and reconstruct from
> gimple condition operand trees that could be merged. I realize now that
> perhaps gimple matching could do this job, so perhaps I can get rid of
> this.
>
> > This is mostly gimple_has_side_effects () but there's also
> > !cfun->can_delete_dead_exceptions && stmt_could_throw_p (cfun, stmt)
>
> Erhm, I don't think the paragraph above applies to
> ssa_is_substitutable_p.
>
> > I think the name is probably bad and suggests it's generally applicable,
> > so maybe rename it to something more specific for your use?
>
> ACK, will do if I don't get rid of it.
>
> TBH, I don't think I've given this modified version of
> tree-outof-ssa.cc's ssa_is_replaceable_p() much thought, I just knew I
> didn't mind combining/merging trees even if they had additional uses.
>
> Should I apply your suggestions for it to the original function?
>
> >> +static tree
> >> +is_cast_p (tree *name)
>
> > very short name, add a prefix?
>
> Will do if it survives.
>
> >> +{
> >> + tree type = 0;
> >> +
> >> + while (TREE_CODE (*name) == SSA_NAME
> >> + && !SSA_NAME_IS_DEFAULT_DEF (*name))
> >> + {
> >> + gimple *def = SSA_NAME_DEF_STMT (*name);
> >> +
> >> + if (!ssa_is_substitutable_p (def))
>
> > Overly restrictive/expensive? Just check if it's a GIMPLE assign?
>
> This is from phase2, so the details escape me already, but I don't think
> that would be enough. We need to know that the definition is movable in
> principle, presumably to the point of use. If we combine conditions,
> we'll end up evaluating it at that point.
>
> >> + if (gimple_assign_single_p (def))
> >> + {
> >> + if (gimple_assign_load_p (def))
> >> + break;
> >> + *name = gimple_assign_rhs1 (def);
>
> > that could be a constant, or a REALPART_EXPR. Maybe move after ...
>
> >> + continue;
> >> + }
> >> +
> >> + if (!gimple_assign_cast_p (def))
> >> + break;
>
> > ... this?
>
> I don't get what you're getting at here.
>
> IIRC this function recognizes gimple that, in the fold logic, used to
> come up in (bit)field extraction and manipulation for testing,
> particularly type widening and narrowing. It is supposed to preserve
> the preexisting logic.
The GIMPLE variant of that you came up with looks overly broad to
the extent of being wrong (in the wrong-code sense), but of course
it's always difficult to tell when looking at this piece-wise.
> >> +
> >> + if (!type)
> >> + type = TREE_TYPE (*name);
> >> + *name = gimple_assign_rhs1 (def);
> >> + }
> >> +
> >> + return type;
>
> > Overall this is a very odd function as it strips even (widen)(truncate)x?
>
> It's not really stripping anything, just identifying the initial type of
> the field that underwent type casts.
>
> >> + /* We are interested in the bare arrangement of bits, so strip everything
> >> + that doesn't affect the machine mode. However, record the type of the
> >> + outermost expression if it may matter below. */
> >> + outer_type = is_cast_p (&exp);
>
> > I think is_cast_p strips more than casts not affecting the machine mode.
> > It looks like you want a STRIP_NOPS that works on GIMPLE?
>
> I don't think so. IIRC this is modeled after preexisting code that did
> more than STRIP_NOPs, it followed CONVERT_EXPR_P as well.
>
> > I'm not sure if it would scale, but it looks like there's a certain
> > set of patterns
> > this tries to match - did you think of using a (match ...) pattern in match.pd
> > instead?
>
> I didn't. I recall looking into such pattern matching at some point,
> before getting started on ifcombine, but I didn't revisit it. The
> patterns I needed to recognize to preserve the fold functionality were
> so varied depending on details that I wasn't sure that would be useful.
> But I guess now I should give it a try.
>
> >> +static tree
> >> +unextend (tree c, int p, int unsignedp, tree mask)
>
> > I'd like all of the above to be done via wide_int, that has utilities to build
> > (shifted) masks and sign- or zero-extend from specific bits.
>
> Ugh :-) I was hoping I wouldn't have to revisit the bit-manipulation
> logic taken from the earliest version of the patch, let alone rewrite it
> :-)
>
> >> + tree ref = make_bit_field_ref (loc, unshare_expr (inner),
> >> + unshare_expr (orig_inner),
> >> + type, bitsize, bitpos,
>
> > I suppose 'inner' is what the earlier get_inner_reference returned - I don't
> > think this built load properly preserves TBAA properties, at least
> > a MEM[(char *)&a] would have an inner of 'a'?
>
> These bitfield references combine access to multiple fields of a. It's
> not very much unlike accessing the entirety of 'a'. What else
> could/should it be?
It should be a semantically valid transform ;) It probably falls apart
only for "interesting" cases like in-place construction of a differently
typed entity using the storage of 'a', but nevertheless - given 'inner'
isn't necessarily a declaration but could be a pointer dereference
(of aggregate type) - this concern is valid.
> >> +build_split_load (tree /* out */ ln_arg[2],
>
> >> + tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
>
> > Use type_for_mode?
>
> Hmm, I'm pretty sure I tried something like that but had to resort to
> this back in the first version.
>
> >> +static int
> >> +mergeable_loads_p (gimple *lstmt, gimple *rstmt)
>
> >> + if (stmt_dominates_stmt_p (lstmt, rstmt))
>
> > Can you avoid these please?
>
> I'm not sure. IIRC we need to know which stmt dominates the other. But
> that's only to know where to insert prep stmts; maybe we can return them
> and leave it for the caller to gimplify as needed.
>
> > if (gimple_vuse (stmts[0]) != gimple_vuse (stmts[1]))
> > return 0;
>
> *doh*, yeah, right, thanks!
>
> > The function is large - maybe some factoring is possible? I also notice
> > you return a GENERIC expression here where having a GIMPLE value
> > and an alternate gimple_seq output for defining stmts would be better?
>
> It would have to be two seqs, to handle split cases.
> I'll give that a try.
>
> >> +static bool
> >> +constant_condition_p (gcond *cond)
> >> +{
> >> + if (!cond)
> >> + return true;
>
> > It's a bit odd this returns true for no condition. Maybe
> > static_outgoing_edge_p ()?
>
> That name would work. I'll think of other possibilities. I didn't
> expect that covering the degenerate case would be a problem.
>
> > (what about a throwing stmt?)
>
> That would prevent ifcombine from considering the block.
>
> > IMO inlining the predicate to it's few(?) uses
> > avoids giving it a "proper" name ;)
>
> I'll consider that. Are you suggesting inlining the second
> constant_condition_p (that takes a bb), the one quoted above, or both?
The one with the weird semantics (no condition) at least. When reading
code I always prefer slightly longer but "standard API" code over lots
of little "pass specific" helpers that make the code smaller but make you
need to look up the exact semantics of said uncommon predicates.
> >> +/* Same as recognize_if_then_else, but check that the condition is not
> >> + constant. It is not useful to combine constant conditions. */
>
> > I do wonder why we ever get to the situation with constant conditions,
> > thus why recognize_if_then_else should match them in the first place?
>
> Two situations:
>
> - when there's a straight-line block (that could be cleaned up)
> separating blocks with combinable conditions
>
> - when we're crossing a block whose condition has already been combined,
> looking for other conditions to combine in earlier blocks
Ah, ok.
> >> +static bool
> >> +recognize_if_succs (basic_block cond_bb,
> >> + basic_block succ1, basic_block succ2)
>
> > Can you use
>
> > basic_block tem_then = NULL, tem_else = NULL;
> > recognize_if_then_else (cond_bb, &tem_then, &tem_else);
>
> > or even
>
> > (recognize_if_then_else (cond_bb, succ1, succ2)
> > || recognize_if_then_else (cond_bb, succ2, succ1))
>
> > at callers instead of adding another function?
>
> I could, I recall that it worked that way before I optimized it to avoid
> various tests that we don't need but the other function performs, and
> even the indirection and optional filling-in that we don't need.
>
>
> >> @@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb)
> >> enum tree_code rhs_code;
> >> if (gimple_has_side_effects (stmt)
> >> || gimple_could_trap_p (stmt)
> >> - || gimple_vuse (stmt)
> >> + /* || gimple_vuse (stmt) */
>
> > I think you want to preserve || gimple_vdef (stmt) at least - this possibly
> > saves you from checking for intermediate stores.
>
> That's covered by gimple_has_side_effects, I hope.
> (checks)
> Hmm, apparently not.
> Comments in there, and even its name, are now very misleading.
gimple_has_side_effects says "effects not represented in the IL";
it doesn't mean side effects in the sense used by language standards.
> Yeah, looks like we need to rule out gimple_vdef.
>
> > As you special-case hard register uses, do we want to ever make hard-register
> > uses or defs unconditional? I'll note that on GIMPLE hard register uses/defs
> > are treated as memory, so they have virtual defs and uses.
>
> Erhm, do I? I don't recall that, and I can't tell what you're getting
> at.
Somewhere you tested DECL_HARD_REGISTER?
> >> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
> >> + COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
> >> + replaced with a constant, but if there are intervening blocks, it's best to
> >> + adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND. */
>
> > What kind of "intervening" blocks could there be?
>
> (a.f1 == b.f1 // bb A
> && x != y // bb B
> && a.f2 == b.f2 // bb C
> ... )
>
> We may be able to combine C into A, without penalty, as long as B has no
> side effects that prevent the combination.
>
> >> +static tree
> >> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
> >> + gcond *outer_cond, bool outer_inv,
> >> + tree cond, bool must_canon,
> >> + tree cond2)
> >> +{
> [...]
> >> + if (tree tcanon = canonicalize_cond_expr_cond (cond))
> >> + cond = tcanon;
> >> + else if (must_canon)
> >> + return NULL_TREE;
>
> > ? I think you should use gimple_build or gimplify what was built to
> > a is_gimple_condexpr_for_cond like ifcombine does elsewhere.
>
> That bit is refactored from preexisting code. Some paths required
> canonicalization to succeed. I only preserved that.
>
> >> - /* Path outer_cond_bb->(outer2) needs to be merged into path
> >> - outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
> >> - and probability of inner_not_taken updated. */
> >> + /* Mark SSA DEFs that are referenced by cond and may thus need to be
> >> + moved to outer. */
> >> + {
> >> + ifcombine_mark_ssa_name_t data = { used, outer_bb };
> >> + walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
> >> + }
>
> > I think this moving code should be an enhancement ontop of the basic
> > functionality - that is, I'd insert at inner_cond.
>
> Combining A into C (from my example above), as you suggest here, could
> penalize a carefully written sequence of conditions, so I'd rather not
> do that. It wouldn't even be an optimization.
Hmm, OK. Still, this makes the whole thing quite complicated - none of
if-combine currently tries to "re-associate" such cases. I think this feature
addition should be split out at least.
> > The intervening blocks must
> > necessarily form straight-line code and thus we're doing "scheduling" here
> > which is premature on GIMPLE?
>
> Maybe it wasn't clear from the description that the intervening blocks
> could have multiple other unrelated conditions?
Yes, that wasn't clear.
> >> + /* Leave CFG optimization to cfg_cleanup. */
> >> + gimple_cond_set_condition_from_tree (outer_cond,
> >> + outer_inv
> >> + ? boolean_false_node
> >> + : boolean_true_node);
>
> > There is gimple_cond_make_{true,false}
>
> Yeah. This code was factored from ifcombine_ifandif.
> I don't like making unrelated changes during refactoring.
>
> I understand I'm to separate the refactoring, so I might as well make
> this cleanup (?) another separate patch.
>
> >> + if (!ifcombine_replace_cond (inner_cond, inner_inv,
> >> + outer_cond, outer_inv,
> >> + t, false, NULL_TREE))
>
> > Can you split out factoring this function?
>
> Heh. Sure. Funny. I had it separate before I folded and consolidated
> the patch to submit it.
>
> >> - if (!is_gimple_condexpr_for_cond (t))
> >> - {
> >> - gsi = gsi_for_stmt (inner_cond);
> >> - t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
> >> - NULL, true, GSI_SAME_STMT);
> >> - }
> >> - gimple_cond_set_condition_from_tree (inner_cond, t);
> >> - update_stmt (inner_cond);
> >> -
> >> - /* Leave CFG optimization to cfg_cleanup. */
> >> - gimple_cond_set_condition_from_tree (outer_cond,
> >> - outer_inv ? boolean_false_node : boolean_true_node);
> >> - update_stmt (outer_cond);
> >> - update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
>
> > And moving this?
>
> *nod*, it's the same (re)factoring.
>
> >> This requires a single predecessor of the inner cond_bb. */
>
> > I'm a bit lost as to what the following now matches - it's certainly
> > not only the
> > special-case in the above comment. Having the loop also appears to make
> > it somewhat "unbound"?
>
> We're walking up the chain of single-preds looking for an outer block
> whose condition can be combined with the inner.
>
> The comment still holds, but I guess it should also point out that
> intervening blocks may exist and can be skipped.
>
> > Isn't this also an improvement that's independent of the bitfield compare
> > simplification?
>
> Yes.
>
> > If so can you split it out to make the overall patch smaller?
>
> Heh. Sure. Funny again ;-)
>
> The code for moving defs up and combining into the earlier condition
> would probably also go into this separate patch, because without
> intervening blocks, using the inner block is always ok, except when we
> get a split condition, i.e., when the second condition accesses a field
> that straddles over words, and the first condition accesses one of those
> words.
>
> Handling that correctly requires the code to move bits up, but it
> complicates the patch splitting. Would you prefer a larger patch
> sequence, breaking up the pieces so that every patch generates ideal
> code for the features it supports, or would an intermediate patch in
> which we combine the split conditions at the inner block in a way that
> loses the short-circuiting from andif/orif, but that a later patch in
> the sequence restores, be ok?
I think intermediate non-optimal code is OK. I think doing the general
if-combine enhancements (condition re-association) first would be good.
WRT the code motion, I defer to you on whether it makes sense to disentangle
that from the re-association or not.
I'll note that you do the transforms "greedily" rather than re-constructing
a combined condition from the CFG and find the "global" best combinations.
On my overly long TODO (or rather "wishlist") is also the ability to
do condition re-association _without_ necessarily combining; for example,
when profile feedback identifies an easy-to-predict, currently inner condition,
we want to test that first (turn it into the outer condition). Iff the whole
condition would be combined then the scalar reassoc pass would also
re-associate conditions to do "related" checks together, like transform
(a == 1) || (c == 3) || (a == 7) to (a == 1) || (a == 7) || (c == 3).
Being able
to identify the CFG part that can be freely associated (no side-effects
intervening) this way is a first step that's somehow hidden in your patch.
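To make that concrete, a minimal sketch (the function wrappers are made up;
only the shape of the condition comes from the example above):

/* As written: the two tests on 'a' are separated by the test on 'c'.  */
int
f_before (int a, int c)
{
  return a == 1 || c == 3 || a == 7;
}

/* After the wished-for re-association, the related tests on 'a' are
   adjacent, so later passes can handle them together, independently
   of 'c'.  */
int
f_after (int a, int c)
{
  return a == 1 || a == 7 || c == 3;
}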
>
> >> if (safe_is_a <gcond *> (*gsi_last_bb (bb)))
> >> - if (tree_ssa_ifcombine_bb (bb))
> >> + if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb))
>
> > would it help if we'd perform this walk in a more defined manner? In particular
> > after your more complex CFG matching is single-pred-before-succ order still
> > appropriate?
>
> Perhaps. There's an apparent potential for quadratic behavior here, but
> I'm not sure it is actually there. If it is, a different order might or
> might not make things better.
>
> I'm not entirely sure that the combination into outer requires
> reconsidering outer. ISTM that, if we could have combined outer, we
> would have already done that, and then the combination of inner would
> involve the same third block.
>
> I am sure, however, that reconsidering inner is necessary for some
> optimizations, such as cases in which we split its condition, and then
> we could combine part of the condition into one block, and another part
> into another.
>
> >> + /* Go back to outer_bb, in case it could be further optimized, but
> >> + only at -O2+, since it could get quadratic. */
>
> > You're also walking back(?) in tree_ssa_ifcombine_bb now. I also think
> > single-pred-before-succ order is supposed to catch all 2nd level opportunities?
>
> No, we need at the very least to retry the inner block after a
> successful combination, because of split conditions, and because in most
> cases we bring the outer condition into the inner one.
>
> The uncertainty to me is whether there's any case in which a combination
> into outer makes room for outer to be further combined as inner. I
> suspect some split condition cases might, and I haven't been able to
> convince myself otherwise, nor have I been able to find a case in which
> it does (I had testing code to detect this).
I see. Maybe we want to put some hard limit on the amount of work the
pass does trying to form combinations.
Thanks,
Richard.
>
> Thanks again,
>
> --
> Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
> Free Software Activist GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive
On Oct 22, 2024, Richard Biener <richard.guenther@gmail.com> wrote:
> On Mon, Oct 21, 2024 at 4:30 AM Alexandre Oliva <oliva@adacore.com> wrote:
>>
>> On Oct 10, 2024, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> > As you special-case hard register uses, do we want to ever make hard-register
>> > uses or defs unconditional? I'll note that on GIMPLE hard register uses/defs
>> > are treated as memory, so they have virtual defs and uses.
>>
>> Erhm, do I? I don't recall that, and I can't tell what you're getting
>> at.
> Somewhere you tested DECL_HARD_REGISTER?
Aah, yeah, that was in ssa_is_substitutable_p, mostly copied from
ssa_is_replaceable_p.
> I'll note that you do the transforms "greedily" rather than re-constructing
> a combined condition from the CFG and find the "global" best combinations.
*nod*
For the intended case, namely combining bitfield compares whose bits are
loaded from the same "word", this greedy approach is fine: the
combinations accumulate until all combinable tests involving the same
"words" become one. For other circumstances, I can see that smarter
algorithms might be called for.
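E.g., in a made-up testcase like the one below (field names and widths are
arbitrary), every compare loads from the same 32-bit word, and each greedy
combination just widens the mask until a single masked word compare covers
all four fields:

struct W
{
  unsigned int f0 : 8;
  unsigned int f1 : 8;
  unsigned int f2 : 8;
  unsigned int f3 : 8;
};

int
same (const struct W *p, const struct W *q)
{
  return p->f0 == q->f0
         && p->f1 == q->f1
         && p->f2 == q->f2
         && p->f3 == q->f3;
}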
> Being able
> to identify the CFG part that can be freely associated (no side-effects
> intervening) this way is a first step thats somehow hidden in your patch.
*nod*
Hopefully it will be more clearly visible in the upcoming patchset.
> I see. Maybe we want to put some hard limit on the amount of work the
> pass does trying to form combinations.
I've dropped the retrying of ifcombine on preds, and instead made the loop
that searches for an outer block to ifcombine with the inner one run all the way.
I suppose we could add a param to limit that, but exploring all pairs of
conditions in a straight-line region whose conditionals all share the
same exit doesn't strike me as particularly expensive. It's
theoretically O(n^2) on the block count, whereas the previous restriction
to combining only neighbor blocks kept it at O(n), but the constraints of no side
effects and on the shape of the region seem to make this not much of an
issue.
On Oct 20, 2024, Alexandre Oliva <oliva@adacore.com> wrote:
> the x86 hits are newer, and
> the line numbers didn't help me find the errors, so I don't even know
> whether it's macros that we're talking about. These new warnings were
> encouraging, for they showed that the ifcombine incarnation of the patch
> covered additional combinations that the fold-based one couldn't. But
> they only came up in the latest version, so I know the mutually
> exclusive conditions are not neighbors.
I've looked into the warning-turned-error in insn-attr.cc.
It's in get_attr_unit's handling of vec_dupv4sf.
(set (attr "type")
(cond [(and (eq_attr "alternative" "0")
(match_test "!TARGET_AVX2"))
(const_string "sseshuf1")
(eq_attr "alternative" "3")
(const_string "sseshuf1")
]
(const_string "ssemov")))
both cond cases, as well as the fallback value, all match the types for
unit sse. The generated code for that is the negation of the cond
conditions, followed by each of the conditions:
if ((((which_alternative != 0) || (!((!TARGET_AVX2))))
&& (which_alternative != 3))
|| ((which_alternative == 0) && ((!TARGET_AVX2)))
|| (which_alternative == 3))
return UNIT_SSE;
else
return UNIT_INTEGER;
until mergephi2, the return value is represented sort of as:
(((which_alternative != 0
|| TARGET_AVX2)
&& which_alternative != 3))
? UNIT_SSE
: (which_alternative == 0) && !TARGET_AVX2
? UNIT_SSE
: (which_alternative == 3)
? UNIT_SSE
: UNIT_INTEGER)
threadfull1 simplifies this to:
((which_alternative != 0)
? UNIT_SSE
: TARGET_AVX2
? UNIT_SSE
: !TARGET_AVX2
? UNIT_SSE
: UNIT_INTEGER)
until the field-load-merging ifcombine looks at the bit tests and
realizes they are mutually exclusive:
((which_alternative != 0)
? UNIT_SSE
: UNIT_SSE)
and that eventually gets down to UNIT_SSE, which perhaps genattrtab
could have generated to begin with.
Without the field-load merging improvement to ifcombine, we end up
taking a far more convoluted path to the same end result: phiopt2
optimizes the final ternary op to a binary op:
((which_alternative != 0)
? UNIT_SSE
: TARGET_AVX2
? UNIT_SSE
: !TARGET_AVX2 * UNIT_SSE)
(because UNIT_INTEGER is zero), then dom2 realizes that the value of
!TARGET_AVX2 is known at that point, so we get:
((which_alternative != 0)
? UNIT_SSE
: TARGET_AVX2
? UNIT_SSE
: UNIT_SSE)
reassoc1 drops the trailing test:
((which_alternative != 0)
? UNIT_SSE
: UNIT_SSE)
so does dce3, leaving only the load of which_alternative, that only dse3
drops.
So it's not quite the case that we don't realize that the conditions are
mutually exclusive, or that they are initially complementary, but the
way we take advantage of it doesn't trigger warnings.
With the new optimization, we get the warning we would have got from
fold if the conditions were neighbors. But since they only become
mutually exclusive after jump threading, fold couldn't deal with them;
the improved ifcombine can, and it reports it.
Since the conditions were not originally mutually exclusive, and only
became so after jump threading and various other optimizations, I
suppose we shouldn't warn any more; it could flag too many false
positives.
This also shows an opportunity to improve genattrtab, which fails to notice
that all paths lead to Rome, so to speak, and outputs a reasonably
complex condition that should optimize down to a constant boolean.
@@ -79,3 +79,5 @@ s-i386-bt: $(srcdir)/config/i386/i386-builtin-types.awk \
$(AWK) -f $^ > tmp-bt.inc
$(SHELL) $(srcdir)/../move-if-change tmp-bt.inc i386-builtin-types.inc
$(STAMP) $@
+
+insn-attrtab.o-warn = -Wno-error
@@ -95,6 +95,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
$(srcdir)/config/rs6000/rs6000-tables.opt
+# FRAME_GROWS_DOWNWARD tests flag_sanitize in a way that rules out a
+# test in toplev.c.
+toplev.o-warn = -Wno-error
+
# The rs6000 backend doesn't cause warnings in these files.
insn-conditions.o-warn =
@@ -137,7 +137,6 @@ static tree range_successor (tree);
static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
static tree fold_cond_expr_with_comparison (location_t, tree, enum tree_code,
tree, tree, tree, tree);
-static tree unextend (tree, int, int, tree);
static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
static tree fold_binary_op_with_conditional_arg (location_t,
@@ -4701,7 +4700,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
is the original memory reference used to preserve the alias set of
the access. */
-static tree
+tree
make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
HOST_WIDE_INT bitsize, poly_int64 bitpos,
int unsignedp, int reversep)
@@ -4951,136 +4950,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
return lhs;
}
-/* Subroutine for fold_truth_andor_1: decode a field reference.
-
- If EXP is a comparison reference, we return the innermost reference.
-
- *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
- set to the starting bit number.
-
- If the innermost field can be completely contained in a mode-sized
- unit, *PMODE is set to that mode. Otherwise, it is set to VOIDmode.
-
- *PVOLATILEP is set to 1 if the any expression encountered is volatile;
- otherwise it is not changed.
-
- *PUNSIGNEDP is set to the signedness of the field.
-
- *PREVERSEP is set to the storage order of the field.
-
- *PMASK is set to the mask used. This is either contained in a
- BIT_AND_EXPR or derived from the width of the field.
-
- *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
-
- Return 0 if this is not a component reference or is one that we can't
- do anything with. */
-
-static tree
-decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
- HOST_WIDE_INT *pbitpos, machine_mode *pmode,
- int *punsignedp, int *preversep, int *pvolatilep,
- tree *pmask, tree *pand_mask)
-{
- tree exp = *exp_;
- tree outer_type = 0;
- tree and_mask = 0;
- tree mask, inner, offset;
- tree unsigned_type;
- unsigned int precision;
-
- /* All the optimizations using this function assume integer fields.
- There are problems with FP fields since the type_for_size call
- below can fail for, e.g., XFmode. */
- if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
- return NULL_TREE;
-
- /* We are interested in the bare arrangement of bits, so strip everything
- that doesn't affect the machine mode. However, record the type of the
- outermost expression if it may matter below. */
- if (CONVERT_EXPR_P (exp)
- || TREE_CODE (exp) == NON_LVALUE_EXPR)
- outer_type = TREE_TYPE (exp);
- STRIP_NOPS (exp);
-
- if (TREE_CODE (exp) == BIT_AND_EXPR)
- {
- and_mask = TREE_OPERAND (exp, 1);
- exp = TREE_OPERAND (exp, 0);
- STRIP_NOPS (exp); STRIP_NOPS (and_mask);
- if (TREE_CODE (and_mask) != INTEGER_CST)
- return NULL_TREE;
- }
-
- poly_int64 poly_bitsize, poly_bitpos;
- inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
- pmode, punsignedp, preversep, pvolatilep);
- if ((inner == exp && and_mask == 0)
- || !poly_bitsize.is_constant (pbitsize)
- || !poly_bitpos.is_constant (pbitpos)
- || *pbitsize < 0
- || offset != 0
- || TREE_CODE (inner) == PLACEHOLDER_EXPR
- /* We eventually want to build a larger reference and need to take
- the address of this. */
- || (!REFERENCE_CLASS_P (inner) && !DECL_P (inner))
- /* Reject out-of-bound accesses (PR79731). */
- || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
- && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
- *pbitpos + *pbitsize) < 0))
- return NULL_TREE;
-
- unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
- if (unsigned_type == NULL_TREE)
- return NULL_TREE;
-
- *exp_ = exp;
-
- /* If the number of bits in the reference is the same as the bitsize of
- the outer type, then the outer type gives the signedness. Otherwise
- (in case of a small bitfield) the signedness is unchanged. */
- if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
- *punsignedp = TYPE_UNSIGNED (outer_type);
-
- /* Compute the mask to access the bitfield. */
- precision = TYPE_PRECISION (unsigned_type);
-
- mask = build_int_cst_type (unsigned_type, -1);
-
- mask = const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
- mask = const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
-
- /* Merge it with the mask we found in the BIT_AND_EXPR, if any. */
- if (and_mask != 0)
- mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
- fold_convert_loc (loc, unsigned_type, and_mask), mask);
-
- *pmask = mask;
- *pand_mask = and_mask;
- return inner;
-}
-
-/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
- bit positions and MASK is SIGNED. */
-
-static bool
-all_ones_mask_p (const_tree mask, unsigned int size)
-{
- tree type = TREE_TYPE (mask);
- unsigned int precision = TYPE_PRECISION (type);
-
- /* If this function returns true when the type of the mask is
- UNSIGNED, then there will be errors. In particular see
- gcc.c-torture/execute/990326-1.c. There does not appear to be
- any documentation paper trail as to why this is so. But the pre
- wide-int worked with that restriction and it has been preserved
- here. */
- if (size > precision || TYPE_SIGN (type) == UNSIGNED)
- return false;
-
- return wi::mask (size, false, precision) == wi::to_wide (mask);
-}
-
/* Subroutine for fold: determine if VAL is the INTEGER_CONST that
represents the sign bit of EXP's type. If EXP represents a sign
or zero extension, also test VAL against the unextended type.
@@ -6390,48 +6259,6 @@ fold_range_test (location_t loc, enum tree_code code, tree type,
return 0;
}
-/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
- bit value. Arrange things so the extra bits will be set to zero if and
- only if C is signed-extended to its full width. If MASK is nonzero,
- it is an INTEGER_CST that should be AND'ed with the extra bits. */
-
-static tree
-unextend (tree c, int p, int unsignedp, tree mask)
-{
- tree type = TREE_TYPE (c);
- int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
- tree temp;
-
- if (p == modesize || unsignedp)
- return c;
-
- /* We work by getting just the sign bit into the low-order bit, then
- into the high-order bit, then sign-extend. We then XOR that value
- with C. */
- temp = build_int_cst (TREE_TYPE (c),
- wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
-
- /* We must use a signed type in order to get an arithmetic right shift.
- However, we must also avoid introducing accidental overflows, so that
- a subsequent call to integer_zerop will work. Hence we must
- do the type conversion here. At this point, the constant is either
- zero or one, and the conversion to a signed type can never overflow.
- We could get an overflow if this conversion is done anywhere else. */
- if (TYPE_UNSIGNED (type))
- temp = fold_convert (signed_type_for (type), temp);
-
- temp = const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
- temp = const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
- if (mask != 0)
- temp = const_binop (BIT_AND_EXPR, temp,
- fold_convert (TREE_TYPE (c), mask));
- /* If necessary, convert the type back to match the type of C. */
- if (TYPE_UNSIGNED (type))
- temp = fold_convert (type, temp);
-
- return fold_convert (type, const_binop (BIT_XOR_EXPR, c, temp));
-}
-
/* For an expression that has the form
(A && B) || ~B
or
@@ -6502,20 +6329,13 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
lhs, rhs);
return NULL_TREE;
}
-
+
/* Find ways of folding logical expressions of LHS and RHS:
Try to merge two comparisons to the same innermost item.
Look for range tests like "ch >= '0' && ch <= '9'".
Look for combinations of simple terms on machines with expensive branches
and evaluate the RHS unconditionally.
- For example, if we have p->a == 2 && p->b == 4 and we can make an
- object large enough to span both A and B, we can do this with a comparison
- against the object ANDed with the a mask.
-
- If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
- operations to do this with one comparison.
-
We check for both normal comparisons and the BIT_AND_EXPRs made this by
function and the one above.
@@ -6540,24 +6360,9 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
comparison for one-bit fields. */
- enum tree_code wanted_code;
enum tree_code lcode, rcode;
tree ll_arg, lr_arg, rl_arg, rr_arg;
- tree ll_inner, lr_inner, rl_inner, rr_inner;
- HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
- HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
- HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
- HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
- int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
- int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
- machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
- scalar_int_mode lnmode, rnmode;
- tree ll_mask, lr_mask, rl_mask, rr_mask;
- tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
- tree l_const, r_const;
- tree lntype, rntype, result;
- HOST_WIDE_INT first_bit, end_bit;
- int volatilep;
+ tree result;
/* Start by getting the comparison codes. Fail if anything is volatile.
If one operand is a BIT_AND_EXPR with the constant one, treat it as if
@@ -6652,316 +6457,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
build_int_cst (TREE_TYPE (ll_arg), 0));
}
- /* See if the comparisons can be merged. Then get all the parameters for
- each side. */
-
- if ((lcode != EQ_EXPR && lcode != NE_EXPR)
- || (rcode != EQ_EXPR && rcode != NE_EXPR))
- return 0;
-
- ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
- volatilep = 0;
- ll_inner = decode_field_reference (loc, &ll_arg,
- &ll_bitsize, &ll_bitpos, &ll_mode,
- &ll_unsignedp, &ll_reversep, &volatilep,
- &ll_mask, &ll_and_mask);
- lr_inner = decode_field_reference (loc, &lr_arg,
- &lr_bitsize, &lr_bitpos, &lr_mode,
- &lr_unsignedp, &lr_reversep, &volatilep,
- &lr_mask, &lr_and_mask);
- rl_inner = decode_field_reference (loc, &rl_arg,
- &rl_bitsize, &rl_bitpos, &rl_mode,
- &rl_unsignedp, &rl_reversep, &volatilep,
- &rl_mask, &rl_and_mask);
- rr_inner = decode_field_reference (loc, &rr_arg,
- &rr_bitsize, &rr_bitpos, &rr_mode,
- &rr_unsignedp, &rr_reversep, &volatilep,
- &rr_mask, &rr_and_mask);
-
- /* It must be true that the inner operation on the lhs of each
- comparison must be the same if we are to be able to do anything.
- Then see if we have constants. If not, the same must be true for
- the rhs's. */
- if (volatilep
- || ll_reversep != rl_reversep
- || ll_inner == 0 || rl_inner == 0
- || ! operand_equal_p (ll_inner, rl_inner, 0))
- return 0;
-
- if (TREE_CODE (lr_arg) == INTEGER_CST
- && TREE_CODE (rr_arg) == INTEGER_CST)
- {
- l_const = lr_arg, r_const = rr_arg;
- lr_reversep = ll_reversep;
- }
- else if (lr_reversep != rr_reversep
- || lr_inner == 0 || rr_inner == 0
- || ! operand_equal_p (lr_inner, rr_inner, 0))
- return 0;
- else
- l_const = r_const = 0;
-
- /* If either comparison code is not correct for our logical operation,
- fail. However, we can convert a one-bit comparison against zero into
- the opposite comparison against that bit being set in the field. */
-
- wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
- if (lcode != wanted_code)
- {
- if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
- {
- /* Make the left operand unsigned, since we are only interested
- in the value of one bit. Otherwise we are doing the wrong
- thing below. */
- ll_unsignedp = 1;
- l_const = ll_mask;
- }
- else
- return 0;
- }
-
- /* This is analogous to the code for l_const above. */
- if (rcode != wanted_code)
- {
- if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
- {
- rl_unsignedp = 1;
- r_const = rl_mask;
- }
- else
- return 0;
- }
-
- /* See if we can find a mode that contains both fields being compared on
- the left. If we can't, fail. Otherwise, update all constants and masks
- to be relative to a field of that size. */
- first_bit = MIN (ll_bitpos, rl_bitpos);
- end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
- if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
- TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
- volatilep, &lnmode))
- return 0;
-
- lnbitsize = GET_MODE_BITSIZE (lnmode);
- lnbitpos = first_bit & ~ (lnbitsize - 1);
- lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
- xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
-
- if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
- {
- xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
- xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
- }
-
- ll_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, ll_mask),
- size_int (xll_bitpos));
- rl_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, rl_mask),
- size_int (xrl_bitpos));
- if (ll_mask == NULL_TREE || rl_mask == NULL_TREE)
- return 0;
-
- if (l_const)
- {
- l_const = fold_convert_loc (loc, lntype, l_const);
- l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
- l_const = const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
- if (l_const == NULL_TREE)
- return 0;
- if (! integer_zerop (const_binop (BIT_AND_EXPR, l_const,
- fold_build1_loc (loc, BIT_NOT_EXPR,
- lntype, ll_mask))))
- {
- warning (0, "comparison is always %d", wanted_code == NE_EXPR);
-
- return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
- }
- }
- if (r_const)
- {
- r_const = fold_convert_loc (loc, lntype, r_const);
- r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
- r_const = const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
- if (r_const == NULL_TREE)
- return 0;
- if (! integer_zerop (const_binop (BIT_AND_EXPR, r_const,
- fold_build1_loc (loc, BIT_NOT_EXPR,
- lntype, rl_mask))))
- {
- warning (0, "comparison is always %d", wanted_code == NE_EXPR);
-
- return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
- }
- }
-
- /* If the right sides are not constant, do the same for it. Also,
- disallow this optimization if a size, signedness or storage order
- mismatch occurs between the left and right sides. */
- if (l_const == 0)
- {
- if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
- || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
- || ll_reversep != lr_reversep
- /* Make sure the two fields on the right
- correspond to the left without being swapped. */
- || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
- return 0;
-
- first_bit = MIN (lr_bitpos, rr_bitpos);
- end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
- if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
- TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
- volatilep, &rnmode))
- return 0;
-
- rnbitsize = GET_MODE_BITSIZE (rnmode);
- rnbitpos = first_bit & ~ (rnbitsize - 1);
- rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
- xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
-
- if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
- {
- xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
- xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
- }
-
- lr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
- rntype, lr_mask),
- size_int (xlr_bitpos));
- rr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
- rntype, rr_mask),
- size_int (xrr_bitpos));
- if (lr_mask == NULL_TREE || rr_mask == NULL_TREE)
- return 0;
-
- /* Make a mask that corresponds to both fields being compared.
- Do this for both items being compared. If the operands are the
- same size and the bits being compared are in the same position
- then we can do this by masking both and comparing the masked
- results. */
- ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
- lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
- if (lnbitsize == rnbitsize
- && xll_bitpos == xlr_bitpos
- && lnbitpos >= 0
- && rnbitpos >= 0)
- {
- lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
- lntype, lnbitsize, lnbitpos,
- ll_unsignedp || rl_unsignedp, ll_reversep);
- if (! all_ones_mask_p (ll_mask, lnbitsize))
- lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
-
- rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
- rntype, rnbitsize, rnbitpos,
- lr_unsignedp || rr_unsignedp, lr_reversep);
- if (! all_ones_mask_p (lr_mask, rnbitsize))
- rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
-
- return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
- }
-
- /* There is still another way we can do something: If both pairs of
- fields being compared are adjacent, we may be able to make a wider
- field containing them both.
-
- Note that we still must mask the lhs/rhs expressions. Furthermore,
- the mask must be shifted to account for the shift done by
- make_bit_field_ref. */
- if (((ll_bitsize + ll_bitpos == rl_bitpos
- && lr_bitsize + lr_bitpos == rr_bitpos)
- || (ll_bitpos == rl_bitpos + rl_bitsize
- && lr_bitpos == rr_bitpos + rr_bitsize))
- && ll_bitpos >= 0
- && rl_bitpos >= 0
- && lr_bitpos >= 0
- && rr_bitpos >= 0)
- {
- tree type;
-
- lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
- ll_bitsize + rl_bitsize,
- MIN (ll_bitpos, rl_bitpos),
- ll_unsignedp, ll_reversep);
- rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
- lr_bitsize + rr_bitsize,
- MIN (lr_bitpos, rr_bitpos),
- lr_unsignedp, lr_reversep);
-
- ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
- size_int (MIN (xll_bitpos, xrl_bitpos)));
- lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
- size_int (MIN (xlr_bitpos, xrr_bitpos)));
- if (ll_mask == NULL_TREE || lr_mask == NULL_TREE)
- return 0;
-
- /* Convert to the smaller type before masking out unwanted bits. */
- type = lntype;
- if (lntype != rntype)
- {
- if (lnbitsize > rnbitsize)
- {
- lhs = fold_convert_loc (loc, rntype, lhs);
- ll_mask = fold_convert_loc (loc, rntype, ll_mask);
- type = rntype;
- }
- else if (lnbitsize < rnbitsize)
- {
- rhs = fold_convert_loc (loc, lntype, rhs);
- lr_mask = fold_convert_loc (loc, lntype, lr_mask);
- type = lntype;
- }
- }
-
- if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
- lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
-
- if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
- rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
-
- return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
- }
-
- return 0;
- }
-
- /* Handle the case of comparisons with constants. If there is something in
- common between the masks, those bits of the constants must be the same.
- If not, the condition is always false. Test for this to avoid generating
- incorrect code below. */
- result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
- if (! integer_zerop (result)
- && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
- const_binop (BIT_AND_EXPR, result, r_const)) != 1)
- {
- if (wanted_code == NE_EXPR)
- {
- warning (0, "%<or%> of unmatched not-equal tests is always 1");
- return constant_boolean_node (true, truth_type);
- }
- else
- {
- warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
- return constant_boolean_node (false, truth_type);
- }
- }
-
- if (lnbitpos < 0)
- return 0;
-
- /* Construct the expression we will return. First get the component
- reference we will make. Unless the mask is all ones the width of
- that field, perform the mask operation. Then compare with the
- merged constant. */
- result = make_bit_field_ref (loc, ll_inner, ll_arg,
- lntype, lnbitsize, lnbitpos,
- ll_unsignedp || rl_unsignedp, ll_reversep);
-
- ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
- if (! all_ones_mask_p (ll_mask, lnbitsize))
- result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
-
- return build2_loc (loc, wanted_code, truth_type, result,
- const_binop (BIT_IOR_EXPR, l_const, r_const));
+ return 0;
}
/* T is an integer expression that is being multiplied, divided, or taken a
@@ -253,11 +253,19 @@ extern tree fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE
extern tree_code minmax_from_comparison (tree_code, tree, tree,
tree, tree);
+extern tree make_bit_field_ref (location_t, tree, tree, tree,
+ HOST_WIDE_INT, poly_int64, int, int);
+
/* In gimple-fold.cc. */
extern void clear_type_padding_in_mask (tree, unsigned char *);
extern bool clear_padding_type_may_have_padding_p (tree);
extern bool arith_overflowed_p (enum tree_code, const_tree, const_tree,
const_tree);
+extern tree fold_truth_andor_maybe_separate (location_t, enum tree_code, tree,
+ enum tree_code, tree, tree,
+ enum tree_code, tree, tree,
+ tree *);
+
/* Class used to compare gimple operands. */
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see
#include "varasm.h"
#include "internal-fn.h"
#include "gimple-range.h"
+#include "tree-ssa-loop-niter.h" // stmt_dominates_stmt_p
enum strlen_range_kind {
/* Compute the exact constant string length. */
@@ -7384,7 +7385,1321 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
return NULL_TREE;
}
+
+/* Return TRUE if expression STMT is suitable for replacement. ???
+ Same as ssa_is_replaceable_p, except that we don't insist it has a
+ single use. */
+static bool
+ssa_is_substitutable_p (gimple *stmt)
+{
+#if 0
+ use_operand_p use_p;
+#endif
+ tree def;
+#if 0
+ gimple *use_stmt;
+#endif
+
+ /* Only consider modify stmts. */
+ if (!is_gimple_assign (stmt))
+ return false;
+
+ /* If the statement may throw an exception, it cannot be replaced. */
+ if (stmt_could_throw_p (cfun, stmt))
+ return false;
+
+ /* Punt if there is more than 1 def. */
+ def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF);
+ if (!def)
+ return false;
+
+#if 0
+ /* Only consider definitions which have a single use. */
+ if (!single_imm_use (def, &use_p, &use_stmt))
+ return false;
+
+ /* Used in this block, but at the TOP of the block, not the end. */
+ if (gimple_code (use_stmt) == GIMPLE_PHI)
+ return false;
+#endif
+
+ /* There must be no VDEFs. */
+ if (gimple_vdef (stmt))
+ return false;
+
+ /* Float expressions must go through memory if float-store is on. */
+ if (flag_float_store
+ && FLOAT_TYPE_P (TREE_TYPE (def)))
+ return false;
+
+ /* An assignment with a register variable on the RHS is not
+ replaceable. */
+ if (gimple_assign_rhs_code (stmt) == VAR_DECL
+ && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
+ return false;
+
+ /* No function calls can be replaced. */
+ if (is_gimple_call (stmt))
+ return false;
+
+ /* Leave any stmt with volatile operands alone as well. */
+ if (gimple_has_volatile_ops (stmt))
+ return false;
+
+ return true;
+}
+
+/* Follow substitutable SSA DEFs for *NAME, including type casts,
+ adjusting *NAME to the single rhs or the type cast operand along
+ the way. Return the target type of the earliest type cast
+ found. */
+
+static tree
+is_cast_p (tree *name)
+{
+ tree type = 0;
+
+ while (TREE_CODE (*name) == SSA_NAME
+ && !SSA_NAME_IS_DEFAULT_DEF (*name))
+ {
+ gimple *def = SSA_NAME_DEF_STMT (*name);
+
+ if (!ssa_is_substitutable_p (def))
+ break;
+
+ if (gimple_num_ops (def) != 2)
+ break;
+
+ if (gimple_assign_single_p (def))
+ {
+ if (gimple_assign_load_p (def))
+ break;
+ *name = gimple_assign_rhs1 (def);
+ continue;
+ }
+
+ if (!gimple_assign_cast_p (def))
+ break;
+
+ if (!type)
+ type = TREE_TYPE (*name);
+ *name = gimple_assign_rhs1 (def);
+ }
+
+ return type;
+}
+
+/* Follow substitutable SSA DEFs for *NAME. If we find it is assigned
+ a CODE binary expr, return the second operand, and set *NAME to the
+ first operand. */
+
+static tree
+is_binop_p (enum tree_code code, tree *name)
+{
+ while (TREE_CODE (*name) == SSA_NAME
+ && !SSA_NAME_IS_DEFAULT_DEF (*name))
+ {
+ gimple *def = SSA_NAME_DEF_STMT (*name);
+
+ if (!ssa_is_substitutable_p (def))
+ return 0;
+
+ switch (gimple_num_ops (def))
+ {
+ default:
+ return 0;
+
+ case 2:
+ if (gimple_assign_single_p (def) && !gimple_assign_load_p (def))
+ {
+ *name = gimple_assign_rhs1 (def);
+ continue;
+ }
+ return 0;
+
+ case 3:
+ break;
+ }
+
+ if (gimple_assign_rhs_code (def) != code)
+ return 0;
+
+ *name = gimple_assign_rhs1 (def);
+ return gimple_assign_rhs2 (def);
+ }
+
+ return 0;
+}
+
+/* If *R_ARG is a constant zero, and L_ARG is a possibly masked
+ BIT_XOR_EXPR, return 1 and set *R_ARG to L_ARG.
+ Otherwise, return 0.
+
+ The returned value should be passed to decode_field_reference for it
+ to handle L_ARG, and then doubled for R_ARG. */
+
+static int
+prepare_xor (tree l_arg, tree *r_arg)
+{
+ int ret = 0;
+
+ if (!integer_zerop (*r_arg))
+ return ret;
+
+ tree exp = l_arg;
+
+ if (tree and_mask = is_binop_p (BIT_AND_EXPR, &exp))
+ {
+ if (TREE_CODE (and_mask) != INTEGER_CST)
+ return ret;
+ }
+
+ if (is_binop_p (BIT_XOR_EXPR, &exp))
+ {
+ *r_arg = l_arg;
+ return 1;
+ }
+
+ return ret;
+}
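
At the source level, the shape prepare_xor looks for is an equality test of a
(possibly masked) XOR against zero.  A minimal C sketch of such input follows;
the struct, fields and mask value are invented for illustration:

  struct S { unsigned short a, b; };
  struct S u, v;

  int
  eq (void)
  {
    /* Both operands are EQ compares of a (masked) XOR against zero,
       the shape prepare_xor flags so that decode_field_reference can
       treat them as field compares of u against v.  */
    return ((u.a ^ v.a) & 0x0fff) == 0
           && (u.b ^ v.b) == 0;
  }

  int
  main (void)
  {
    return !eq ();  /* u and v are zero-initialized, so eq () is 1.  */
  }
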
+
+/* If EXP is a SSA_NAME whose DEF is a load stmt, set *LOAD to it and
+ return its RHS, otherwise return EXP. */
+
+static tree
+follow_load (tree exp, gimple **load)
+{
+ if (TREE_CODE (exp) == SSA_NAME
+ && !SSA_NAME_IS_DEFAULT_DEF (exp))
+ {
+ gimple *def = SSA_NAME_DEF_STMT (exp);
+ if (gimple_assign_load_p (def))
+ {
+ *load = def;
+ exp = gimple_assign_rhs1 (def);
+ }
+ }
+
+ return exp;
+}
+
+/* Subroutine for fold_truth_andor_maybe_separate: decode a field reference.
+
+ If EXP is a comparison reference, we return the innermost reference.
+
+ *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
+ set to the starting bit number.
+
+ If the innermost field can be completely contained in a mode-sized
+ unit, *PMODE is set to that mode. Otherwise, it is set to VOIDmode.
+
+ *PVOLATILEP is set to 1 if any expression encountered is volatile;
+ otherwise it is not changed.
+
+ *PUNSIGNEDP is set to the signedness of the field.
+
+ *PREVERSEP is set to the storage order of the field.
+
+ *PMASK is set to the mask used. This is either contained in a
+ BIT_AND_EXPR or derived from the width of the field.
+
+ *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
+
+ XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked)
+ BIT_XOR_EXPR compared with zero. We're to take the first or second
+ operand thereof if so. It should be zero otherwise.
+
+ *LOAD is set to the load stmt of the innermost reference, if any,
+ and NULL otherwise.
+
+ Return 0 if this is not a component reference or is one that we can't
+ do anything with. */
+
+static tree
+decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
+ HOST_WIDE_INT *pbitpos, machine_mode *pmode,
+ int *punsignedp, int *preversep, int *pvolatilep,
+ tree *pmask, tree *pand_mask, int xor_which,
+ gimple **load)
+{
+ tree exp = *exp_;
+ tree outer_type = 0;
+ tree and_mask = 0;
+ tree mask, inner, offset;
+ tree unsigned_type;
+ unsigned int precision;
+ HOST_WIDE_INT shiftrt = 0;
+
+ *load = NULL;
+
+ /* All the optimizations using this function assume integer fields.
+ There are problems with FP fields since the type_for_size call
+ below can fail for, e.g., XFmode. */
+ if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+ return NULL_TREE;
+
+ /* We are interested in the bare arrangement of bits, so strip everything
+ that doesn't affect the machine mode. However, record the type of the
+ outermost expression if it may matter below. */
+ outer_type = is_cast_p (&exp);
+
+ if ((and_mask = is_binop_p (BIT_AND_EXPR, &exp)))
+ {
+ if (TREE_CODE (and_mask) != INTEGER_CST)
+ return NULL_TREE;
+ }
+
+ if (xor_which)
+ {
+ tree op2 = is_binop_p (BIT_XOR_EXPR, &exp);
+ gcc_checking_assert (op2);
+ if (xor_which > 1)
+ exp = op2;
+ }
+
+ if (tree t = is_cast_p (&exp))
+ if (!outer_type)
+ outer_type = t;
+
+ if (tree shift = is_binop_p (RSHIFT_EXPR, &exp))
+ {
+ if (TREE_CODE (shift) != INTEGER_CST || !tree_fits_shwi_p (shift))
+ return NULL_TREE;
+ shiftrt = tree_to_shwi (shift);
+ if (shiftrt <= 0)
+ return NULL_TREE;
+ }
+
+ if (tree t = is_cast_p (&exp))
+ if (!outer_type)
+ outer_type = t;
+
+ exp = follow_load (exp, load);
+
+ poly_int64 poly_bitsize, poly_bitpos;
+ inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
+ pmode, punsignedp, preversep, pvolatilep);
+
+ if ((inner == exp && and_mask == 0)
+ || !poly_bitsize.is_constant (pbitsize)
+ || !poly_bitpos.is_constant (pbitpos)
+ || *pbitsize <= shiftrt
+ || offset != 0
+ || TREE_CODE (inner) == PLACEHOLDER_EXPR
+ /* Reject out-of-bound accesses (PR79731). */
+ || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
+ && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
+ *pbitpos + *pbitsize) < 0))
+ return NULL_TREE;
+
+ if (shiftrt)
+ {
+ if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+ *pbitpos += shiftrt;
+ *pbitsize -= shiftrt;
+ }
+
+ if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
+ {
+ HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
+ if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+ *pbitpos += excess;
+ *pbitsize -= excess;
+ }
+
+ unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
+ if (unsigned_type == NULL_TREE)
+ return NULL_TREE;
+
+ *exp_ = exp;
+
+ /* If the number of bits in the reference is the same as the bitsize of
+ the outer type, then the outer type gives the signedness. Otherwise
+ (in case of a small bitfield) the signedness is unchanged. */
+ if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
+ *punsignedp = TYPE_UNSIGNED (outer_type);
+
+ /* Compute the mask to access the bitfield. */
+ precision = TYPE_PRECISION (unsigned_type);
+
+ mask = build_int_cst_type (unsigned_type, -1);
+
+ mask = int_const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
+ mask = int_const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
+
+ /* Merge it with the mask we found in the BIT_AND_EXPR, if any. */
+ if (and_mask != 0)
+ mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
+ fold_convert_loc (loc, unsigned_type, and_mask), mask);
+
+ *pmask = mask;
+ *pand_mask = and_mask;
+ return inner;
+}
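
For instance, decode_field_reference is now able to look through right shifts
and narrowing casts, so compares such as the following C sketch (names
invented for illustration) are meant to expose the underlying field access:

  struct W { unsigned int w; } x, y;

  int
  hi_equal (void)
  {
    /* Each operand is a narrowing cast of a right-shifted load; the
       shift and the cast together select the upper 16 bits of W, so
       decode_field_reference narrows the reference to a 16-bit portion
       of W instead of the full 32-bit load.  */
    return (unsigned short) (x.w >> 16) == (unsigned short) (y.w >> 16);
  }

  int
  main (void)
  {
    return !hi_equal ();
  }
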
+
+/* Subroutine for fold_truth_andor_maybe_separate: C is an INTEGER_CST
+ interpreted as a P bit value. Arrange things so the extra bits will be
+ set to zero if and only if C is sign-extended to its full width. If
+ MASK is nonzero, it is an INTEGER_CST that should be AND'ed with the
+ extra bits. */
+
+static tree
+unextend (tree c, int p, int unsignedp, tree mask)
+{
+ tree type = TREE_TYPE (c);
+ int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
+ tree temp;
+
+ if (p == modesize || unsignedp)
+ return c;
+
+ /* We work by getting just the sign bit into the low-order bit, then
+ into the high-order bit, then sign-extend. We then XOR that value
+ with C. */
+ temp = build_int_cst (TREE_TYPE (c),
+ wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
+
+ /* We must use a signed type in order to get an arithmetic right shift.
+ However, we must also avoid introducing accidental overflows, so that
+ a subsequent call to integer_zerop will work. Hence we must
+ do the type conversion here. At this point, the constant is either
+ zero or one, and the conversion to a signed type can never overflow.
+ We could get an overflow if this conversion is done anywhere else. */
+ if (TYPE_UNSIGNED (type))
+ temp = fold_convert (signed_type_for (type), temp);
+
+ temp = int_const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
+ temp = int_const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
+ if (mask != 0)
+ temp = int_const_binop (BIT_AND_EXPR, temp,
+ fold_convert (TREE_TYPE (c), mask));
+ /* If necessary, convert the type back to match the type of C. */
+ if (TYPE_UNSIGNED (type))
+ temp = fold_convert (type, temp);
+
+ return fold_convert (type, int_const_binop (BIT_XOR_EXPR, c, temp));
+}
+
+/* Return the one bitpos within bit extents L or R that is at an
+ ALIGN-bit alignment boundary, or -1 if there is more than one such
+ boundary, if there isn't any, or if there is any such boundary
+ between the extents. L and R are given by bitpos and bitsize. If
+ it doesn't return -1, there are two consecutive ALIGN-bit words
+ that contain both extents, and at least one of the extents
+ straddles across the returned alignment boundary. */
+
+static inline HOST_WIDE_INT
+compute_split_boundary_from_align (HOST_WIDE_INT align,
+ HOST_WIDE_INT l_bitpos,
+ HOST_WIDE_INT l_bitsize,
+ HOST_WIDE_INT r_bitpos,
+ HOST_WIDE_INT r_bitsize)
+{
+ HOST_WIDE_INT amask = ~(align - 1);
+
+ HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
+ HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
+
+ HOST_WIDE_INT boundary = (end_bit - 1) & amask;
+
+ /* Make sure we're crossing no more than one alignment boundary.
+
+ ??? We don't have logic to recombine loads of two adjacent
+ fields that each crosses a different alignment boundary, so
+ as to load the middle word only once, if other words can't be
+ otherwise recombined. */
+ if (boundary - first_bit > align)
+ return -1;
+
+ HOST_WIDE_INT l_start_word = l_bitpos & amask;
+ HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
+
+ HOST_WIDE_INT r_start_word = r_bitpos & amask;
+ HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
+
+ /* If neither field straddles across an alignment boundary, it's no
+ use to even try to merge them. */
+ if (l_start_word == l_end_word && r_start_word == r_end_word)
+ return -1;
+
+ return boundary;
+}
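
As a worked example (a stand-alone re-implementation of the arithmetic, with
made-up field positions): a 16-bit field at bit 24 together with an 8-bit
field at bit 40, under 32-bit alignment, yields a split point at bit 32:

  #include <stdio.h>

  int
  main (void)
  {
    long align = 32, amask = ~(align - 1);
    long l_bitpos = 24, l_bitsize = 16;   /* straddles bit 32 */
    long r_bitpos = 40, r_bitsize = 8;    /* entirely in the second word */
    long first_bit = l_bitpos < r_bitpos ? l_bitpos : r_bitpos;
    long end_bit = (l_bitpos + l_bitsize > r_bitpos + r_bitsize
                    ? l_bitpos + l_bitsize : r_bitpos + r_bitsize);
    long boundary = (end_bit - 1) & amask;          /* 47 & ~31 == 32 */
    int crosses_one = boundary - first_bit <= align;
    int straddles
      = ((l_bitpos & amask) != ((l_bitpos + l_bitsize - 1) & amask)
         || (r_bitpos & amask) != ((r_bitpos + r_bitsize - 1) & amask));
    if (crosses_one && straddles)
      printf ("split at bit %ld\n", boundary);      /* prints 32 */
    return 0;
  }
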
+
+/* Make a bit_field_ref. If POINT is NULL, return the BIT_FIELD_REF.
+ Otherwise, build and insert a load stmt before POINT, and return
+ the SSA_NAME. ??? Rewrite LOAD in terms of the bitfield? */
+
+static tree
+make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type,
+ HOST_WIDE_INT bitsize, poly_int64 bitpos,
+ int unsignedp, int reversep, gimple *point)
+{
+ tree ref = make_bit_field_ref (loc, unshare_expr (inner),
+ unshare_expr (orig_inner),
+ type, bitsize, bitpos,
+ unsignedp, reversep);
+ if (!point)
+ return ref;
+
+ gimple_stmt_iterator gsi = gsi_for_stmt (point);
+ return force_gimple_operand_gsi (&gsi, ref,
+ true, NULL_TREE,
+ true, GSI_SAME_STMT);
+}
+
+/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
+ LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
+ respectively, starting at BIT_POS, using reversed endianness if
+ REVERSEP. Also initialize BITPOS (the starting position of each
+ part into INNER), BITSIZ (the bit count starting at BITPOS),
+ TOSHIFT[1] (the amount by which the part and its mask are to be
+ shifted right to bring its least-significant bit to bit zero) and
+ SHIFTED (the amount by which the part, by separate loading, has
+ already been shifted right, but that the mask needs shifting to
+ match). */
+
+static inline void
+build_split_load (tree /* out */ ln_arg[2],
+ HOST_WIDE_INT /* out */ bitpos[2],
+ HOST_WIDE_INT /* out */ bitsiz[2],
+ HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
+ HOST_WIDE_INT /* out */ shifted[2],
+ location_t loc, tree inner, tree orig_inner,
+ scalar_int_mode mode, scalar_int_mode mode2,
+ HOST_WIDE_INT bit_pos, bool reversep,
+ gimple *point[2])
+{
+ bitsiz[0] = GET_MODE_BITSIZE (mode);
+ bitsiz[1] = GET_MODE_BITSIZE (mode2);
+
+ for (int i = 0; i < 2; i++)
+ {
+ tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
+ bitpos[i] = bit_pos;
+ ln_arg[i] = make_bit_field_load (loc, inner, orig_inner,
+ type, bitsiz[i],
+ bit_pos, 1, reversep, point[i]);
+ bit_pos += bitsiz[i];
+ }
+
+ toshift[1] = toshift[0];
+ if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+ {
+ shifted[0] = bitsiz[1];
+ shifted[1] = 0;
+ toshift[0] = 0;
+ }
+ else
+ {
+ shifted[1] = bitsiz[0];
+ shifted[0] = 0;
+ toshift[1] = 0;
+ }
+}
+
+/* Make arrangements to split at bit BOUNDARY a single loaded word
+ (with REVERSEP bit order) LN_ARG[0], to be shifted right by
+ TOSHIFT[0] to bring the field of interest to the least-significant
+ bit. The expectation is that the same loaded word will be
+ propagated from part 0 to part 1, with just different shifting and
+ masking to extract both parts. MASK is not expected to do more
+ than masking out the bits that belong to the other part. See
+ build_split_load for more information on the other fields. */
+
+static inline void
+reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
+ HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
+ HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
+ HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
+ HOST_WIDE_INT /* out */ shifted[2],
+ tree /* out */ mask[2],
+ HOST_WIDE_INT boundary, bool reversep)
+{
+ ln_arg[1] = ln_arg[0];
+ bitpos[1] = bitpos[0];
+ bitsiz[1] = bitsiz[0];
+ shifted[1] = shifted[0] = 0;
+
+ tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1);
+
+ if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+ {
+ toshift[1] = toshift[0];
+ toshift[0] = bitpos[0] + bitsiz[0] - boundary;
+ mask[0] = int_const_binop (LSHIFT_EXPR, basemask,
+ bitsize_int (toshift[0]));
+ mask[1] = int_const_binop (BIT_XOR_EXPR, basemask, mask[0]);
+ }
+ else
+ {
+ toshift[1] = boundary - bitpos[1];
+ mask[1] = int_const_binop (LSHIFT_EXPR, basemask,
+ bitsize_int (toshift[1]));
+ mask[0] = int_const_binop (BIT_XOR_EXPR, basemask, mask[1]);
+ }
+}
+
+/* Return nonzero if LSTMT and RSTMT, assumed to load from the same
+ words, can be safely merged into a single load. Return zero if
+ there are intervening stores, or if neither stmt dominates the
+ other. If they are mergeable, return -1 if the merged load should
+ be inserted before LSTMT to dominate both, otherwise return +1. */
+
+static int
+mergeable_loads_p (gimple *lstmt, gimple *rstmt)
+{
+ gimple *stmts[2] = { lstmt, rstmt };
+ int ret;
+
+ if (stmt_dominates_stmt_p (lstmt, rstmt))
+ {
+ ret = -1;
+ stmts[0] = lstmt;
+ stmts[1] = rstmt;
+ }
+ else if (stmt_dominates_stmt_p (rstmt, lstmt))
+ {
+ ret = 1;
+ stmts[1] = lstmt;
+ stmts[0] = rstmt;
+ }
+ else
+ return 0;
+
+ basic_block bbs[2] = { gimple_bb (stmts[0]), gimple_bb (stmts[1]) };
+ if (bbs[0] == bbs[1])
+ {
+ for (gimple_stmt_iterator gsi = gsi_for_stmt (stmts[0]);
+ gsi_stmt (gsi) != stmts[1]; gsi_next (&gsi))
+ if (gimple_vdef (gsi_stmt (gsi)))
+ return 0;
+ return ret;
+ }
+
+ for (gimple_stmt_iterator gsi = gsi_for_stmt (stmts[0]);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ if (gimple_vdef (gsi_stmt (gsi)))
+ return 0;
+
+ for (gphi_iterator gpi = gsi_start_phis (bbs[1]);
+ !gsi_end_p (gpi); gsi_next (&gpi))
+ if (gimple_vdef (gsi_stmt (gpi)))
+ return 0;
+
+ for (gimple_stmt_iterator gsi = gsi_for_stmt (stmts[1]);
+ !gsi_end_p (gsi); gsi_prev (&gsi))
+ if (gimple_vdef (gsi_stmt (gsi)))
+ return 0;
+
+ for (basic_block bb = bbs[1]; bb != bbs[0]; )
+ {
+ /* ??? For now, stop if any visited block has more than one
+ predecessor. */
+ if (!single_pred_p (bb))
+ return 0;
+
+ bb = single_pred (bb);
+
+ for (gphi_iterator gpi = gsi_start_phis (bbs[1]);
+ !gsi_end_p (gpi); gsi_next (&gpi))
+ if (gimple_vdef (gsi_stmt (gpi)))
+ return 0;
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ if (gimple_vdef (gsi_stmt (gsi)))
+ return 0;
+ }
+
+ return ret;
+}
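
For example, any statement carrying a VDEF on the path between the two loads
makes mergeable_loads_p give up.  A C sketch (all names hypothetical) of a
case it would reject:

  struct S { unsigned char a, b; };
  struct S s, t;
  unsigned char g;

  int
  f (void)
  {
    if (s.a != t.a)
      return 1;
    g = 2;              /* store with a VDEF between the two loads of s */
    return s.b != t.b;  /* so the loads of s (and of t) are not merged */
  }

  int
  main (void)
  {
    return f ();        /* all globals are zero, so f () returns 0 */
  }
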
+
+/* Find ways of folding logical expressions of LHS and RHS:
+ Try to merge two comparisons to the same innermost item.
+ Look for range tests like "ch >= '0' && ch <= '9'".
+ Look for combinations of simple terms on machines with expensive branches
+ and evaluate the RHS unconditionally.
+
+ For example, if we have p->a == 2 && p->b == 4 and we can make an
+ object large enough to span both A and B, we can do this with a comparison
+ against the object ANDed with a mask.
+
+ If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
+ operations to do this with one comparison.
+
+ We check for both normal comparisons and the BIT_AND_EXPRs made by this
+ function and the one above.
+
+ CODE is the logical operation being done. It can be TRUTH_ANDIF_EXPR,
+ TRUTH_AND_EXPR, TRUTH_ORIF_EXPR, or TRUTH_OR_EXPR.
+
+ TRUTH_TYPE is the type of the logical operand and LHS and RHS are its
+ two operands.
+
+ SEPARATEP should be NULL if LHS and RHS are adjacent in
+ CODE-chained compares, and a non-NULL pointer to NULL_TREE
+ otherwise. If the "words" accessed by RHS are already accessed by
+ LHS, this won't matter, but if RHS accesses "words" that LHS
+ doesn't, then *SEPARATEP will be set to the compares that should
+ take RHS's place. By "words" we mean contiguous bits that do not
+ cross a TYPE_ALIGN boundary of the accessed object's type.
+
+ We return the simplified tree or 0 if no optimization is possible. */
+
+tree
+fold_truth_andor_maybe_separate (location_t loc,
+ enum tree_code code, tree truth_type,
+ enum tree_code lcode, tree ll_arg, tree lr_arg,
+ enum tree_code rcode, tree rl_arg, tree rr_arg,
+ tree *separatep)
+{
+ /* If this is the "or" of two comparisons, we can do something if
+ the comparisons are NE_EXPR. If this is the "and", we can do something
+ if the comparisons are EQ_EXPR. I.e.,
+ (a->b == 2 && a->c == 4) can become (a->new == NEW).
+
+ WANTED_CODE is this operation code. For single bit fields, we can
+ convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
+ comparison for one-bit fields. */
+
+ enum tree_code orig_code = code;
+ enum tree_code wanted_code;
+ tree ll_inner, lr_inner, rl_inner, rr_inner;
+ gimple *ll_load, *lr_load, *rl_load, *rr_load;
+ int l_mergeable = 0, r_mergeable = 0;
+ HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
+ HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
+ HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
+ HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
+ int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
+ int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
+ machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
+ scalar_int_mode lnmode, lnmode2, rnmode;
+ tree ll_mask, lr_mask, rl_mask, rr_mask;
+ tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
+ tree l_const, r_const;
+ tree lntype, rntype, result;
+ HOST_WIDE_INT first_bit, end_bit;
+ int volatilep;
+ bool l_split_load;
+
+ gcc_checking_assert (!separatep || !*separatep);
+
+ /* Start by getting the comparison codes. Fail if anything is volatile.
+ If one operand is a BIT_AND_EXPR with the constant one, treat it as if
+ it were surrounded with a NE_EXPR. */
+
+ if (TREE_SIDE_EFFECTS (ll_arg) || TREE_SIDE_EFFECTS (lr_arg)
+ || TREE_SIDE_EFFECTS (rl_arg) || TREE_SIDE_EFFECTS (rr_arg))
+ return 0;
+
+ if (TREE_CODE_CLASS (lcode) != tcc_comparison
+ || TREE_CODE_CLASS (rcode) != tcc_comparison)
+ return 0;
+
+ code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
+ ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
+
+ bool lsignbit = false, rsignbit = false;
+ if ((lcode == LT_EXPR || lcode == GE_EXPR)
+ && integer_zerop (lr_arg)
+ && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
+ && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
+ {
+ lsignbit = true;
+ lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
+ }
+ if ((rcode == LT_EXPR || rcode == GE_EXPR)
+ && integer_zerop (rr_arg)
+ && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
+ && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
+ {
+ rsignbit = true;
+ rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
+ }
+
+ /* See if the comparisons can be merged. Then get all the parameters for
+ each side. */
+
+ if ((lcode != EQ_EXPR && lcode != NE_EXPR)
+ || (rcode != EQ_EXPR && rcode != NE_EXPR))
+ return 0;
+
+ ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
+ volatilep = 0;
+ int l_xor = prepare_xor (ll_arg, &lr_arg);
+ ll_inner = decode_field_reference (loc, &ll_arg,
+ &ll_bitsize, &ll_bitpos, &ll_mode,
+ &ll_unsignedp, &ll_reversep, &volatilep,
+ &ll_mask, &ll_and_mask, l_xor,
+ &ll_load);
+ lr_inner = decode_field_reference (loc, &lr_arg,
+ &lr_bitsize, &lr_bitpos, &lr_mode,
+ &lr_unsignedp, &lr_reversep, &volatilep,
+ &lr_mask, &lr_and_mask, 2 * l_xor,
+ &lr_load);
+ int r_xor = prepare_xor (rl_arg, &rr_arg);
+ rl_inner = decode_field_reference (loc, &rl_arg,
+ &rl_bitsize, &rl_bitpos, &rl_mode,
+ &rl_unsignedp, &rl_reversep, &volatilep,
+ &rl_mask, &rl_and_mask, r_xor,
+ &rl_load);
+ rr_inner = decode_field_reference (loc, &rr_arg,
+ &rr_bitsize, &rr_bitpos, &rr_mode,
+ &rr_unsignedp, &rr_reversep, &volatilep,
+ &rr_mask, &rr_and_mask, 2 * r_xor,
+ &rr_load);
+
+ /* It must be true that the inner operation on the lhs of each
+ comparison must be the same if we are to be able to do anything.
+ Then see if we have constants. If not, the same must be true for
+ the rhs's. */
+ if (volatilep
+ || ll_reversep != rl_reversep
+ || ll_inner == 0 || rl_inner == 0
+ || ! operand_equal_p (ll_inner, rl_inner, 0)
+ || (ll_load && rl_load
+ && ! (l_mergeable = mergeable_loads_p (ll_load, rl_load))))
+ return 0;
+
+ if (TREE_CODE (lr_arg) == INTEGER_CST
+ && TREE_CODE (rr_arg) == INTEGER_CST)
+ {
+ l_const = lr_arg, r_const = rr_arg;
+ lr_reversep = ll_reversep;
+ }
+ else if (lr_reversep != rr_reversep
+ || lr_inner == 0 || rr_inner == 0
+ || ! operand_equal_p (lr_inner, rr_inner, 0)
+ || (lr_load && rr_load
+ && ! (r_mergeable = mergeable_loads_p (lr_load, rr_load))))
+ return 0;
+ else
+ l_const = r_const = 0;
+
+ if (lsignbit)
+ {
+ tree mask = build_int_cst_type (TREE_TYPE (ll_arg), -1);
+ tree sign = int_const_binop (LSHIFT_EXPR, mask,
+ bitsize_int (ll_bitsize - 1));
+ if (!ll_mask)
+ ll_mask = sign;
+ else
+ ll_mask = int_const_binop (BIT_AND_EXPR, ll_mask, sign);
+ if (!ll_and_mask)
+ ll_and_mask = sign;
+ else
+ ll_and_mask = int_const_binop (BIT_AND_EXPR, ll_and_mask, sign);
+ }
+
+ if (rsignbit)
+ {
+ tree mask = build_int_cst_type (TREE_TYPE (rl_arg), -1);
+ tree sign = int_const_binop (LSHIFT_EXPR, mask,
+ bitsize_int (rl_bitsize - 1));
+ if (!rl_mask)
+ rl_mask = sign;
+ else
+ rl_mask = int_const_binop (BIT_AND_EXPR, rl_mask, sign);
+ if (!rl_and_mask)
+ rl_and_mask = sign;
+ else
+ rl_and_mask = int_const_binop (BIT_AND_EXPR, rl_and_mask, sign);
+ }
+
+ /* If either comparison code is not correct for our logical operation,
+ fail. However, we can convert a one-bit comparison against zero into
+ the opposite comparison against that bit being set in the field. */
+
+ wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
+ if (lcode != wanted_code)
+ {
+ if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
+ {
+ /* Make the left operand unsigned, since we are only interested
+ in the value of one bit. Otherwise we are doing the wrong
+ thing below. */
+ ll_unsignedp = 1;
+ l_const = ll_mask;
+ }
+ else
+ return 0;
+ }
+
+ /* This is analogous to the code for l_const above. */
+ if (rcode != wanted_code)
+ {
+ if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
+ {
+ rl_unsignedp = 1;
+ r_const = rl_mask;
+ }
+ else
+ return 0;
+ }
+
+ /* This will be bumped to 2 if any of the field pairs crosses an
+ alignment boundary, so the merged compare has to be done in two
+ parts. */
+ int parts = 1;
+ /* Set to true if the second combined compare should come first,
+ e.g., because the second original compare accesses a word that
+ the first one doesn't, and the combined compares access those in
+ cmp[0]. */
+ bool first1 = false;
+ /* Set to true if the first original compare is not the one being
+ split. */
+ bool maybe_separate = false;
+
+ /* The following 2-dimensional arrays use the first index to
+ identify left(0)- vs right(1)-hand compare operands, and the
+ second one to identify merged compare parts. */
+ /* The memory loads or constants to be compared. */
+ tree ld_arg[2][2];
+ /* The first bit of the corresponding inner object that the
+ corresponding LD_ARG covers. */
+ HOST_WIDE_INT bitpos[2][2];
+ /* The bit count starting at BITPOS that the corresponding LD_ARG
+ covers. */
+ HOST_WIDE_INT bitsiz[2][2];
+ /* The number of bits by which LD_ARG has already been shifted
+ right, WRT mask. */
+ HOST_WIDE_INT shifted[2][2];
+ /* The number of bits by which both LD_ARG and MASK need shifting to
+ bring its least-significant bit to bit zero. */
+ HOST_WIDE_INT toshift[2][2];
+ /* An additional mask to be applied to LD_ARG, to remove any bits
+ that may have been loaded for use in another compare, but that
+ don't belong in the corresponding compare. */
+ tree xmask[2][2] = {};
+
+ /* The combined compare or compares. */
+ tree cmp[2];
+
+ /* Consider we're comparing two non-contiguous fields of packed
+ structs, both aligned at 32-bit boundaries:
+
+ ll_arg: an 8-bit field at offset 0
+ lr_arg: a 16-bit field at offset 2
+
+ rl_arg: an 8-bit field at offset 1
+ rr_arg: a 16-bit field at offset 3
+
+ We'll have r_split_load, because rr_arg straddles across an
+ alignment boundary.
+
+ We'll want to have:
+
+ bitpos = { { 0, 0 }, { 0, 32 } }
+ bitsiz = { { 32, 32 }, { 32, 8 } }
+
+ And, for little-endian:
+
+ shifted = { { 0, 0 }, { 0, 32 } }
+ toshift = { { 0, 24 }, { 0, 0 } }
+
+ Or, for big-endian:
+
+ shifted = { { 0, 0 }, { 8, 0 } }
+ toshift = { { 8, 0 }, { 0, 0 } }
+ */
+
+ /* See if we can find a mode that contains both fields being compared on
+ the left. If we can't, fail. Otherwise, update all constants and masks
+ to be relative to a field of that size. */
+ first_bit = MIN (ll_bitpos, rl_bitpos);
+ end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
+ if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+ TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
+ volatilep, &lnmode))
+ {
+ /* Consider the possibility of recombining loads if any of the
+ fields straddles across an alignment boundary, so that either
+ part can be loaded along with the other field. */
+ HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
+ HOST_WIDE_INT boundary = compute_split_boundary_from_align
+ (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
+
+ if (boundary < 0
+ || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
+ align, BITS_PER_WORD, volatilep, &lnmode)
+ || !get_best_mode (end_bit - boundary, boundary, 0, 0,
+ align, BITS_PER_WORD, volatilep, &lnmode2))
+ return 0;
+
+ l_split_load = true;
+ parts = 2;
+ if (ll_bitpos >= boundary)
+ maybe_separate = first1 = true;
+ else if (ll_bitpos + ll_bitsize <= boundary)
+ maybe_separate = true;
+ }
+ else
+ l_split_load = false;
+
+ lnbitsize = GET_MODE_BITSIZE (lnmode);
+ lnbitpos = first_bit & ~ (lnbitsize - 1);
+ if (l_split_load)
+ lnbitsize += GET_MODE_BITSIZE (lnmode2);
+ lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
+ if (!lntype)
+ {
+ gcc_checking_assert (l_split_load);
+ lntype = build_nonstandard_integer_type (lnbitsize, 1);
+ }
+ xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
+
+ if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+ {
+ xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
+ xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
+ }
+
+ ll_mask = int_const_binop (LSHIFT_EXPR,
+ fold_convert_loc (loc, lntype, ll_mask),
+ size_int (xll_bitpos));
+ if (!ll_mask)
+ return 0;
+ rl_mask = int_const_binop (LSHIFT_EXPR,
+ fold_convert_loc (loc, lntype, rl_mask),
+ size_int (xrl_bitpos));
+ if (!rl_mask)
+ return 0;
+
+ if (l_const)
+ {
+ l_const = fold_convert_loc (loc, lntype, l_const);
+ l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
+ l_const = int_const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
+ if (! integer_zerop (int_const_binop (BIT_AND_EXPR, l_const,
+ fold_build1_loc (loc, BIT_NOT_EXPR,
+ lntype, ll_mask))))
+ {
+ warning (0, "comparison is always %d", wanted_code == NE_EXPR);
+
+ return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
+ }
+ }
+ if (r_const)
+ {
+ r_const = fold_convert_loc (loc, lntype, r_const);
+ r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
+ r_const = int_const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
+ if (! integer_zerop (int_const_binop (BIT_AND_EXPR, r_const,
+ fold_build1_loc (loc, BIT_NOT_EXPR,
+ lntype, rl_mask))))
+ {
+ warning (0, "comparison is always %d", wanted_code == NE_EXPR);
+
+ return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
+ }
+ }
+
+ /* If the right sides are not constant, do the same for it. Also,
+ disallow this optimization if a size, signedness or storage order
+ mismatch occurs between the left and right sides. */
+ if (l_const == 0)
+ {
+ if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
+ || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
+ || ll_reversep != lr_reversep
+ /* Make sure the two fields on the right
+ correspond to the left without being swapped. */
+ || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos
+ || lnbitpos < 0)
+ return 0;
+
+ bool r_split_load;
+ scalar_int_mode rnmode2;
+
+ first_bit = MIN (lr_bitpos, rr_bitpos);
+ end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
+ if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+ TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
+ volatilep, &rnmode))
+ {
+ /* Consider the possibility of recombining loads if any of the
+ fields straddles across an alignment boundary, so that either
+ part can be loaded along with the other field. */
+ HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
+ HOST_WIDE_INT boundary = compute_split_boundary_from_align
+ (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
+
+ if (boundary < 0
+ /* If we're to split both, make sure the split point is
+ the same. */
+ || (l_split_load
+ && (boundary - lr_bitpos
+ != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
+ || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
+ align, BITS_PER_WORD, volatilep, &rnmode)
+ || !get_best_mode (end_bit - boundary, boundary, 0, 0,
+ align, BITS_PER_WORD, volatilep, &rnmode2))
+ return 0;
+
+ r_split_load = true;
+ parts = 2;
+ if (lr_bitpos >= boundary)
+ maybe_separate = first1 = true;
+ else if (lr_bitpos + lr_bitsize <= boundary)
+ maybe_separate = true;
+ }
+ else
+ r_split_load = false;
+
+ rnbitsize = GET_MODE_BITSIZE (rnmode);
+ rnbitpos = first_bit & ~ (rnbitsize - 1);
+ if (r_split_load)
+ rnbitsize += GET_MODE_BITSIZE (rnmode2);
+ rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
+ if (!rntype)
+ {
+ gcc_checking_assert (r_split_load);
+ rntype = build_nonstandard_integer_type (rnbitsize, 1);
+ }
+ xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
+
+ if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
+ {
+ xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
+ xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
+ }
+
+ lr_mask = int_const_binop (LSHIFT_EXPR,
+ fold_convert_loc (loc, rntype, lr_mask),
+ size_int (xlr_bitpos));
+ rr_mask = int_const_binop (LSHIFT_EXPR,
+ fold_convert_loc (loc, rntype, rr_mask),
+ size_int (xrr_bitpos));
+
+ lr_mask = int_const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
+
+ toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
+ shifted[1][0] = 0;
+
+ if (!r_split_load)
+ {
+ bitpos[1][0] = rnbitpos;
+ bitsiz[1][0] = rnbitsize;
+ ld_arg[1][0] = make_bit_field_load (loc, lr_inner, lr_arg,
+ rntype, rnbitsize, rnbitpos,
+ lr_unsignedp || rr_unsignedp,
+ lr_reversep,
+ r_mergeable == 0
+ ? NULL
+ : r_mergeable < 0
+ ? lr_load
+ : rr_load);
+ }
+
+ if (parts > 1)
+ {
+ if (r_split_load)
+ {
+ gimple *point[2];
+ if (r_mergeable == 0)
+ point[0] = point[1] = NULL;
+ else if (r_mergeable < 0)
+ {
+ point[0] = lr_load;
+ point[1] = rr_load;
+ }
+ else
+ {
+ point[0] = rr_load;
+ point[1] = lr_load;
+ }
+ build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+ shifted[1], loc, lr_inner, lr_arg,
+ rnmode, rnmode2, rnbitpos, lr_reversep, point);
+ }
+ else
+ reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+ shifted[1], xmask[1],
+ lnbitpos + GET_MODE_BITSIZE (lnmode)
+ - ll_bitpos + lr_bitpos, lr_reversep);
+ }
+ }
+ else
+ {
+ /* Handle the case of comparisons with constants. If there is
+ something in common between the masks, those bits of the
+ constants must be the same. If not, the condition is always
+ false. Test for this to avoid generating incorrect code
+ below. */
+ result = int_const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
+ if (! integer_zerop (result)
+ && simple_cst_equal (int_const_binop (BIT_AND_EXPR,
+ result, l_const),
+ int_const_binop (BIT_AND_EXPR,
+ result, r_const)) != 1)
+ {
+ if (wanted_code == NE_EXPR)
+ {
+ warning (0,
+ "%<or%> of unmatched not-equal tests"
+ " is always 1");
+ return constant_boolean_node (true, truth_type);
+ }
+ else
+ {
+ warning (0,
+ "%<and%> of mutually exclusive equal-tests"
+ " is always 0");
+ return constant_boolean_node (false, truth_type);
+ }
+ }
+
+ if (lnbitpos < 0)
+ return 0;
+
+ /* The constants are combined so as to line up with the loaded
+ field, so use the same parameters. */
+ ld_arg[1][0] = int_const_binop (BIT_IOR_EXPR, l_const, r_const);
+ toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
+ shifted[1][0] = 0;
+ bitpos[1][0] = lnbitpos;
+ bitsiz[1][0] = lnbitsize;
+
+ if (parts > 1)
+ reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
+ shifted[1], xmask[1],
+ lnbitpos + GET_MODE_BITSIZE (lnmode),
+ lr_reversep);
+
+ lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1);
+
+ /* If the compiler thinks this is used uninitialized below, it's
+ because it can't realize that parts can only be 2 when
+ comparing with constants if l_split_load is also true. This
+ just silences the warning. */
+ rnbitpos = 0;
+ }
+
+ ll_mask = int_const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
+ toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
+ shifted[0][0] = 0;
+
+ if (!l_split_load)
+ {
+ bitpos[0][0] = lnbitpos;
+ bitsiz[0][0] = lnbitsize;
+ ld_arg[0][0] = make_bit_field_load (loc, ll_inner, ll_arg,
+ lntype, lnbitsize, lnbitpos,
+ ll_unsignedp || rl_unsignedp,
+ ll_reversep,
+ l_mergeable == 0
+ ? NULL
+ : l_mergeable < 0
+ ? ll_load
+ : rl_load);
+ }
+
+ if (parts > 1)
+ {
+ if (l_split_load)
+ {
+ gimple *point[2];
+ if (l_mergeable == 0)
+ point[0] = point[1] = NULL;
+ else if (l_mergeable < 0)
+ {
+ point[0] = ll_load;
+ point[1] = rl_load;
+ }
+ else
+ {
+ point[0] = rl_load;
+ point[1] = ll_load;
+ }
+ build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
+ shifted[0], loc, ll_inner, ll_arg,
+ lnmode, lnmode2, lnbitpos, ll_reversep, point);
+ }
+ else
+ reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
+ shifted[0], xmask[0],
+ rnbitpos + GET_MODE_BITSIZE (rnmode)
+ - lr_bitpos + ll_bitpos, ll_reversep);
+ }
+
+ for (int i = 0; i < parts; i++)
+ {
+ tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
+ tree mask[2] = { ll_mask, lr_mask };
+
+ for (int j = 0; j < 2; j++)
+ {
+ op[j] = unshare_expr (op[j]);
+
+ /* Mask out the bits belonging to the other part. */
+ if (xmask[j][i])
+ mask[j] = int_const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]);
+
+ if (shifted[j][i])
+ {
+ tree shiftsz = bitsize_int (shifted[j][i]);
+ mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
+ }
+ mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]);
+ }
+
+ HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
+
+ if (shift)
+ {
+ int j;
+ if (shift > 0)
+ j = 0;
+ else
+ {
+ j = 1;
+ shift = -shift;
+ }
+
+ tree shiftsz = bitsize_int (shift);
+ op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]),
+ op[j], shiftsz);
+ mask[j] = int_const_binop (RSHIFT_EXPR, mask[j], shiftsz);
+ }
+
+ /* Convert to the smaller type before masking out unwanted
+ bits. */
+ tree type = TREE_TYPE (op[0]);
+ if (type != TREE_TYPE (op[1]))
+ {
+ int j = (TYPE_PRECISION (type)
+ < TYPE_PRECISION (TREE_TYPE (op[1])));
+ if (!j)
+ type = TREE_TYPE (op[1]);
+ op[j] = fold_convert_loc (loc, type, op[j]);
+ mask[j] = fold_convert_loc (loc, type, mask[j]);
+ }
+
+ for (int j = 0; j < 2; j++)
+ if (! integer_all_onesp (mask[j]))
+ op[j] = build2_loc (loc, BIT_AND_EXPR, type,
+ op[j], mask[j]);
+
+ cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]);
+ }
+
+ if (first1)
+ std::swap (cmp[0], cmp[1]);
+
+ if (parts == 1)
+ result = cmp[0];
+ else if (!separatep || !maybe_separate)
+ result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]);
+ else
+ {
+ result = cmp[0];
+ *separatep = cmp[1];
+ }
+
+ return result;
+}
+
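
To make the splitting concrete, here is a C sketch (struct layout invented for
illustration) where one field crosses the 32-bit alignment boundary of a
packed, 4-byte-aligned struct; the part of that field that shares the first
word with the other field is expected to merge into a single 32-bit compare,
while the remaining bits stay in a second, narrower compare:

  struct P {
    unsigned char a;      /* bits 0..7 of the first 32-bit word */
    unsigned short pad;   /* bits 8..23 */
    unsigned int b;       /* bits 24..55: straddles the boundary at 32 */
  } __attribute__ ((packed, aligned (4)));

  struct P x, y;

  int
  same (void)
  {
    return x.a == y.a && x.b == y.b;
  }

  int
  main (void)
  {
    return !same ();      /* x and y are zero-initialized, so same () is 1 */
  }
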
/* Try to simplify the AND of two comparisons, specified by
(OP1A CODE1 OP1B) and (OP2B CODE2 OP2B), respectively.
If this can be simplified to a single expression (without requiring
new file mode 100644
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
+
+/* Check that field loads compared with constants are merged, even if
+ tested out of order, and when fields straddle across alignment
+ boundaries. */
+
+struct TL {
+ unsigned char p;
+ unsigned int a;
+ unsigned char q;
+ unsigned int b;
+ unsigned char r;
+ unsigned int c;
+ unsigned char s;
+} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
+
+struct TB {
+ unsigned char p;
+ unsigned int a;
+ unsigned char q;
+ unsigned int b;
+ unsigned char r;
+ unsigned int c;
+ unsigned char s;
+} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
+
+#define vc 0xaa
+#define vi 0x12345678
+
+struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
+struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
+
+void f (void) {
+ /* Which words of | vL | vB | */
+ /* are accessed by |0123|0123| */
+ if (0 /* the tests? | | | */
+ || vL.p != vc /* |* | | */
+ || vB.p != vc /* | |* | */
+ || vL.s != vc /* | *| | */
+ || vB.q != vc /* | | * | */
+ || vL.a != vi /* |^* | | */
+ || vB.b != vi /* | | ^* | */
+ || vL.c != vi /* | *^| | */
+ || vB.c != vi /* | | ^*| */
+ || vL.b != vi /* | ^^ | | */
+ || vL.q != vc /* | ^ | | */
+ || vB.a != vi /* | |^^ | */
+ || vB.r != vc /* | | ^ | */
+ || vB.s != vc /* | | ^| */
+ || vL.r != vc /* | ^ | | */
+ )
+ __builtin_abort ();
+}
+
+int main () {
+ f ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
+/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
+/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
+/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
new file mode 100644
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct TL {
+ unsigned short a;
+ unsigned short b;
+} __attribute__ ((packed, aligned (8)));
+
+struct TB {
+ unsigned char p;
+ unsigned short a;
+ unsigned short b;
+} __attribute__ ((packed, aligned (8)));
+
+#define vc 0xaa
+
+struct TL vL = { vc, vc };
+struct TB vB = { vc, vc, vc };
+
+void f (void) {
+ if (0
+ || vL.b != vB.b
+ || vL.a != vB.a
+ )
+ __builtin_abort ();
+}
+
+int main () {
+ f ();
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
+
+struct T1 {
+ unsigned char p[2];
+ unsigned short a;
+ unsigned int z;
+} __attribute__((__aligned__(8)));
+
+struct T2 {
+ unsigned short p;
+ unsigned short a;
+ unsigned int z;
+} __attribute__((__aligned__(8)));
+
+#define vc 0xaa
+#define vi 0x12345678
+
+struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
+struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
+
+void f (void) {
+ if (0
+ || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
+ || v1.a != v2.a
+ || ((v1.z ^ v2.z) & 0xff00ff00) != 0
+ || ((v1.z ^ v2.z) & 0x00ff00ff) != 0)
+ __builtin_abort ();
+}
+
+int main () {
+ f ();
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct T1 {
+ unsigned int zn;
+ unsigned char p;
+ unsigned char qn;
+ unsigned short a;
+ unsigned int z;
+} __attribute__((__packed__, __aligned__(4)));
+
+struct T2 {
+ unsigned int zn;
+ unsigned char rn;
+ unsigned char p;
+ unsigned char qn;
+ unsigned short a;
+ unsigned int z;
+} __attribute__((__packed__, __aligned__(4)));
+
+#define vc 0xaa
+#define vs 0xccdd
+#define vi 0x12345678
+
+struct T1 v1 = { -1, vc, 1, vs, vi };
+struct T2 v2 = { -1, 0, vc, 1, vs, vi };
+
+void f (void) {
+ if (0
+ || v1.p != v2.p
+ || v1.a != v2.a
+ || v1.z != v2.z
+ )
+ __builtin_abort ();
+}
+
+int main () {
+ f ();
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct T1 {
+ unsigned int zn;
+ unsigned char p;
+ unsigned char qn;
+ unsigned short a;
+ unsigned int z;
+} __attribute__((__packed__, __aligned__(8)));
+
+struct T2 {
+ unsigned int zn;
+ unsigned char rn;
+ unsigned char p;
+ unsigned char qn;
+ unsigned short a;
+ unsigned int z;
+} __attribute__((__packed__, __aligned__(8)));
+
+#define vc 0xaa
+#define vs 0xccdd
+#define vi 0x12345678
+
+struct T1 v1 = { -1, vc, 1, vs, vi };
+struct T2 v2 = { -1, 0, vc, 1, vs, vi };
+
+void f (void) {
+ if (0
+ || v1.p != v2.p
+ || v1.a != v2.a
+ || v1.z != v2.z
+ )
+ __builtin_abort ();
+}
+
+int main () {
+ f ();
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+/* { dg-shouldfail } */
+
+/* Check that the third compare won't be pulled ahead of the second one,
+ which would prevent the NULL pointer dereference that should cause
+ the execution to fail. */
+
+struct s {
+ char a, b;
+ int *p;
+};
+
+struct s a = { 0, 1, 0 };
+struct s b = { 0, 0, 0 };
+
+int f () {
+ return (a.a != b.a
+ || *b.p != *a.p
+ || a.b != b.b);
+}
+
+int main() {
+ f ();
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-ifcombine-details" } */
+
+/* Check that the third compare won't be combined with the first one. */
+
+struct s {
+ char a, b;
+ int p;
+};
+
+struct s a = { 0, 0, 0 };
+struct s b = { 0, 0, 0 };
+
+int f () {
+ return (a.a != b.a || (a.p != b.p && a.b != b.b));
+}
+
+int g () {
+ return (a.a == b.a && (a.p == b.p || a.b == b.b));
+}
+
+/* { dg-final { scan-tree-dump-not "optimizing" "ifcombine" } } */
+/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "ifcombine" } } */
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-ssa.h"
#include "attribs.h"
#include "asan.h"
+#include "bitmap.h"
#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
#define LOGICAL_OP_NON_SHORT_CIRCUIT \
@@ -49,6 +50,28 @@ along with GCC; see the file COPYING3. If not see
false) >= 2)
#endif
+/* Return TRUE iff COND is NULL, or the condition in it is constant. */
+
+static bool
+constant_condition_p (gcond *cond)
+{
+ if (!cond)
+ return true;
+
+ return (CONSTANT_CLASS_P (gimple_cond_lhs (cond))
+ && CONSTANT_CLASS_P (gimple_cond_rhs (cond)));
+}
+
+/* Return TRUE iff COND_BB does not end in a COND stmt, or the condition it
+ ends with is constant. */
+
+static bool
+constant_condition_p (basic_block cond_bb)
+{
+ gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_bb));
+ return constant_condition_p (cond);
+}
+
/* This pass combines COND_EXPRs to simplify control flow. It
currently recognizes bit tests and comparisons in chains that
represent logical and or logical or of two COND_EXPRs.
@@ -110,6 +133,37 @@ recognize_if_then_else (basic_block cond_bb,
return true;
}
+/* Same as recognize_if_then_else, but check that the condition is not
+ constant. It is not useful to combine constant conditions. */
+
+static bool
+recognize_if_then_else_nc (basic_block cond_bb,
+ basic_block *then_bb, basic_block *else_bb)
+{
+ return recognize_if_then_else (cond_bb, then_bb, else_bb)
+ && !constant_condition_p (cond_bb);
+}
+
+/* Same as recognize_if_then_else, but don't tell then from else: return TRUE
+ iff the successors of COND_BB are SUCC1 and SUCC2, in either order. */
+
+static bool
+recognize_if_succs (basic_block cond_bb,
+ basic_block succ1, basic_block succ2)
+{
+ basic_block t, e;
+
+ if (EDGE_COUNT (cond_bb->succs) != 2)
+ return false;
+
+ /* Find both succs. */
+ t = EDGE_SUCC (cond_bb, 0)->dest;
+ e = EDGE_SUCC (cond_bb, 1)->dest;
+
+ return ((t == succ1 && e == succ2)
+ || (t == succ2 && e == succ1));
+}
+
/* Verify if the basic block BB does not have side-effects. Return
true in this case, else false. */
@@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb)
enum tree_code rhs_code;
if (gimple_has_side_effects (stmt)
|| gimple_could_trap_p (stmt)
- || gimple_vuse (stmt)
+ /* || gimple_vuse (stmt) */
/* We need to rewrite stmts with undefined overflow to use
unsigned arithmetic but cannot do so for signed division. */
|| ((ass = dyn_cast <gassign *> (stmt))
@@ -356,14 +410,28 @@ recognize_bits_test (gcond *cond, tree *name, tree *bits, bool inv)
}
-/* Update profile after code in outer_cond_bb was adjusted so
- outer_cond_bb has no condition. */
+/* Update profile after code in either outer_cond_bb or inner_cond_bb was
+ adjusted so that it has no condition. */
static void
update_profile_after_ifcombine (basic_block inner_cond_bb,
basic_block outer_cond_bb)
{
- edge outer_to_inner = find_edge (outer_cond_bb, inner_cond_bb);
+ /* In the following we assume that inner_cond_bb has single predecessor. */
+ gcc_assert (single_pred_p (inner_cond_bb));
+
+ basic_block outer_to_inner_bb = inner_cond_bb;
+ profile_probability prob = profile_probability::always ();
+ for (;;)
+ {
+ basic_block parent = single_pred (outer_to_inner_bb);
+ prob *= find_edge (parent, outer_to_inner_bb)->probability;
+ if (parent == outer_cond_bb)
+ break;
+ outer_to_inner_bb = parent;
+ }
+
+ edge outer_to_inner = find_edge (outer_cond_bb, outer_to_inner_bb);
edge outer2 = (EDGE_SUCC (outer_cond_bb, 0) == outer_to_inner
? EDGE_SUCC (outer_cond_bb, 1)
: EDGE_SUCC (outer_cond_bb, 0));
@@ -374,40 +442,294 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
std::swap (inner_taken, inner_not_taken);
gcc_assert (inner_taken->dest == outer2->dest);
- /* In the following we assume that inner_cond_bb has single predecessor. */
- gcc_assert (single_pred_p (inner_cond_bb));
+ if (outer_to_inner_bb == inner_cond_bb
+ && constant_condition_p (outer_cond_bb))
+ {
+ /* Path outer_cond_bb->(outer2) needs to be merged into path
+ outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
+ and probability of inner_not_taken updated. */
+
+ inner_cond_bb->count = outer_cond_bb->count;
+
+ /* Handle special case where inner_taken probability is always. In this
+ case we know that the overall outcome will be always as well, but
+ combining probabilities will be conservative because it does not know
+ that outer2->probability is inverse of
+ outer_to_inner->probability. */
+ if (inner_taken->probability == profile_probability::always ())
+ ;
+ else
+ inner_taken->probability = outer2->probability
+ + outer_to_inner->probability * inner_taken->probability;
+ inner_not_taken->probability = profile_probability::always ()
+ - inner_taken->probability;
+
+ outer_to_inner->probability = profile_probability::always ();
+ outer2->probability = profile_probability::never ();
+ }
+ else if (constant_condition_p (inner_cond_bb))
+ {
+ /* Path inner_cond_bb->(inner_taken) needs to be merged into path
+ outer_cond_bb->(outer2). We've accumulated the probabilities from
+ outer_cond_bb->(outer_to_inner)->...->inner_cond_bb in prob, so we have
+ to adjust that by inner_taken, and make inner unconditional. */
+
+ prob *= inner_taken->probability;
+ outer2->probability += prob;
+ outer_to_inner->probability = profile_probability::always ()
+ - outer2->probability;
+
+ inner_taken->probability = profile_probability::never ();
+ inner_not_taken->probability = profile_probability::always ();
+ }
+ else
+ {
+ /* We've moved part of the inner cond to outer, but we don't know the
+ probabilities for each part, so estimate the effects by moving half of
+ the odds of inner_taken to outer. */
+
+ inner_taken->probability *= profile_probability::even ();
+ inner_not_taken->probability = profile_probability::always ()
+ - inner_taken->probability;
+
+ prob *= inner_taken->probability;
+ outer2->probability += prob;
+ outer_to_inner->probability = profile_probability::always ()
+ - outer2->probability;
+ }
+}
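To make the first case above concrete with made-up numbers: suppose outer_to_inner carried probability 70% and outer2 30%, and inner_taken previously had probability 50%. Once the outer condition becomes constant, the merged inner_taken edge gets 0.30 + 0.70 * 0.50 = 0.65 and inner_not_taken the remaining 0.35; and since the 30% that used to bypass inner_cond_bb now flows through it, inner_cond_bb also inherits outer_cond_bb's count.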
+
+/* Set NAME's bit in USED if OUTER dominates it. */
+
+static void
+ifcombine_mark_ssa_name (bitmap used, tree name, basic_block outer)
+{
+ if (SSA_NAME_IS_DEFAULT_DEF (name))
+ return;
+
+ gimple *def = SSA_NAME_DEF_STMT (name);
+ basic_block bb = gimple_bb (def);
+ if (!dominated_by_p (CDI_DOMINATORS, bb, outer))
+ return;
+
+ bitmap_set_bit (used, SSA_NAME_VERSION (name));
+}
+
+/* Data structure passed to ifcombine_mark_ssa_name. */
+struct ifcombine_mark_ssa_name_t
+{
+ /* SSA_NAMEs that have been referenced. */
+ bitmap used;
+ /* Dominating block of DEFs that might need moving. */
+ basic_block outer;
+};
+
+/* Mark in DATA->used any SSA_NAMEs used in *t. */
+
+static tree
+ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
+{
+ ifcombine_mark_ssa_name_t *data = (ifcombine_mark_ssa_name_t *)data_;
+
+ if (*t && TREE_CODE (*t) == SSA_NAME)
+ ifcombine_mark_ssa_name (data->used, *t, data->outer);
+
+ return NULL;
+}
+
+/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
+ COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
+ replaced with a constant, but if there are intervening blocks, it's best to
+ adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND. */
+
+static tree
+ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
+ gcond *outer_cond, bool outer_inv,
+ tree cond, bool must_canon,
+ tree cond2)
+{
+ tree ret = cond;
+ if (cond2)
+ ret = fold_build2 (TRUTH_AND_EXPR, TREE_TYPE (ret), ret, cond2);
+
+ /* Split COND2 out of COND if the inner and outer blocks are adjacent.
+ ??? We might be able to handle ORIF as well, inverting both conditions,
+ but it's not clear that this would be enough, and it never comes up. */
+ if (!cond2
+ && TREE_CODE (cond) == TRUTH_ANDIF_EXPR
+ && single_pred (gimple_bb (inner_cond)) == gimple_bb (outer_cond))
+ {
+ cond2 = TREE_OPERAND (cond, 1);
+ cond = TREE_OPERAND (cond, 0);
+ }
+
+ bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
+ != gimple_bb (outer_cond));
+ bool result_inv = outer_p ? outer_inv : inner_inv;
+
+ if (result_inv)
+ cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
+
+ if (tree tcanon = canonicalize_cond_expr_cond (cond))
+ cond = tcanon;
+ else if (must_canon)
+ return NULL_TREE;
+
+ if (outer_p)
+ {
+ {
+ auto_bitmap used;
+ basic_block outer_bb = gimple_bb (outer_cond);
- /* Path outer_cond_bb->(outer2) needs to be merged into path
- outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
- and probability of inner_not_taken updated. */
+ /* Mark SSA DEFs that are referenced by cond and may thus need to be
+ moved to outer. */
+ {
+ ifcombine_mark_ssa_name_t data = { used, outer_bb };
+ walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
+ }
+
+ if (!bitmap_empty_p (used))
+ {
+ /* Iterate up from inner_cond, moving DEFs identified as used by
+ cond, and marking USEs in the DEFs for moving as well. */
+ gimple_stmt_iterator gsins = gsi_for_stmt (outer_cond);
+ for (basic_block bb = gimple_bb (inner_cond);
+ bb != outer_bb; bb = single_pred (bb))
+ {
+ for (gimple_stmt_iterator gsitr = gsi_last_bb (bb);
+ !gsi_end_p (gsitr); gsi_prev (&gsitr))
+ {
+ gimple *stmt = gsi_stmt (gsitr);
+ bool move = false;
+ tree t;
+ ssa_op_iter it;
+
+ FOR_EACH_SSA_TREE_OPERAND (t, stmt, it, SSA_OP_DEF)
+ if (bitmap_bit_p (used, SSA_NAME_VERSION (t)))
+ {
+ move = true;
+ break;
+ }
+
+ if (!move)
+ continue;
+
+ /* Mark uses in STMT before moving it. */
+ FOR_EACH_SSA_TREE_OPERAND (t, stmt, it, SSA_OP_USE)
+ ifcombine_mark_ssa_name (used, t, outer_bb);
+
+ gsi_move_before (&gsitr, &gsins, GSI_NEW_STMT);
+ }
+
+ /* Surprisingly, there may be PHI nodes in single-predecessor
+ blocks, as in pr50682.C. Fortunately, since they can't
+ involve back edges, there won't be references to parallel
+ nodes that we'd have to take special care of to keep
+ them parallel. We can't move the PHI nodes, but we can turn
+ them into assignments. */
+ for (gphi_iterator gsi = gsi_start_phis (bb);
+ !gsi_end_p (gsi);)
+ {
+ gphi *phi = gsi.phi ();
+
+ gcc_assert (gimple_phi_num_args (phi) == 1);
+ tree def = gimple_phi_result (phi);
+
+ if (!bitmap_bit_p (used, SSA_NAME_VERSION (def)))
+ {
+ gsi_next (&gsi);
+ continue;
+ }
+
+ /* Mark uses in the PHI before turning it into an assignment. */
+ use_operand_p use_p;
+ ssa_op_iter it;
+ FOR_EACH_PHI_ARG (use_p, phi, it, SSA_OP_USE)
+ ifcombine_mark_ssa_name (used, USE_FROM_PTR (use_p),
+ outer_bb);
+
+ tree use = gimple_phi_arg_def (phi, 0);
+ location_t loc = gimple_phi_arg_location (phi, 0);
+
+ remove_phi_node (&gsi, false);
+
+ gassign *a = gimple_build_assign (def, use);
+ gimple_set_location (a, loc);
+ gsi_insert_before (&gsins, a, GSI_NEW_STMT);
+ }
+ }
+ }
+ }
- inner_cond_bb->count = outer_cond_bb->count;
+ if (!is_gimple_condexpr_for_cond (cond))
+ {
+ gimple_stmt_iterator gsi = gsi_for_stmt (outer_cond);
+ cond = force_gimple_operand_gsi_1 (&gsi, cond,
+ is_gimple_condexpr_for_cond,
+ NULL, true, GSI_SAME_STMT);
+ }
- /* Handle special case where inner_taken probability is always. In this case
- we know that the overall outcome will be always as well, but combining
- probabilities will be conservative because it does not know that
- outer2->probability is inverse of outer_to_inner->probability. */
- if (inner_taken->probability == profile_probability::always ())
- ;
+ /* Leave CFG optimization to cfg_cleanup. */
+ gimple_cond_set_condition_from_tree (outer_cond, cond);
+ update_stmt (outer_cond);
+
+ if (cond2)
+ {
+ if (inner_inv)
+ cond2 = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond2), cond2);
+
+ if (tree tcanon = canonicalize_cond_expr_cond (cond2))
+ cond2 = tcanon;
+ if (!is_gimple_condexpr_for_cond (cond2))
+ {
+ gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
+ cond2 = force_gimple_operand_gsi_1 (&gsi, cond2,
+ is_gimple_condexpr_for_cond,
+ NULL, true, GSI_SAME_STMT);
+ }
+ gimple_cond_set_condition_from_tree (inner_cond, cond2);
+ }
+ else
+ gimple_cond_set_condition_from_tree (inner_cond,
+ inner_inv
+ ? boolean_false_node
+ : boolean_true_node);
+ update_stmt (inner_cond);
+ }
else
- inner_taken->probability = outer2->probability + outer_to_inner->probability
- * inner_taken->probability;
- inner_not_taken->probability = profile_probability::always ()
- - inner_taken->probability;
+ {
+ if (!is_gimple_condexpr_for_cond (cond))
+ {
+ gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
+ cond = force_gimple_operand_gsi_1 (&gsi, cond,
+ is_gimple_condexpr_for_cond,
+ NULL, true, GSI_SAME_STMT);
+ }
+ gimple_cond_set_condition_from_tree (inner_cond, cond);
+ update_stmt (inner_cond);
+
+ /* Leave CFG optimization to cfg_cleanup. */
+ gimple_cond_set_condition_from_tree (outer_cond,
+ outer_inv
+ ? boolean_false_node
+ : boolean_true_node);
+ update_stmt (outer_cond);
+ }
+
+ update_profile_after_ifcombine (gimple_bb (inner_cond),
+ gimple_bb (outer_cond));
- outer_to_inner->probability = profile_probability::always ();
- outer2->probability = profile_probability::never ();
+ return ret;
}
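At the source level, the effect of replacing one condition with the merged test and neutralizing (or trimming) the other corresponds roughly to the rewrite below; the struct and field names are hypothetical, chosen only to illustrate the shape of the transformation:

/* Illustration only: 'separate' and 'combined' are equivalent, which is
   what lets the pass place the merged test at one of the two conditions
   and reduce the other to a constant (or, when only part of a word can
   be merged, to a compare of the leftover bits).  */
struct pair { unsigned char a, b; };

static int separate (const struct pair *p, const struct pair *q)
{
  if (p->a == q->a)      /* outer condition */
    if (p->b == q->b)    /* inner condition */
      return 1;
  return 0;
}

static int combined (const struct pair *p, const struct pair *q)
{
  if ((p->a == q->a) & (p->b == q->b))  /* merged test */
    return 1;                           /* the other condition folds away */
  return 0;
}

int main (void)
{
  struct pair x = { 1, 2 }, y = { 1, 2 }, z = { 1, 3 };
  return !(separate (&x, &y) == combined (&x, &y)
           && separate (&x, &z) == combined (&x, &z));
}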
/* If-convert on a and pattern with a common else block. The inner
if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
- inner_inv, outer_inv and result_inv indicate whether the conditions
- are inverted.
+ inner_inv, outer_inv indicate whether the conditions are inverted.
Returns true if the edges to the common else basic-block were merged. */
static bool
ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
- basic_block outer_cond_bb, bool outer_inv, bool result_inv)
+ basic_block outer_cond_bb, bool outer_inv)
{
gimple_stmt_iterator gsi;
tree name1, name2, bit1, bit2, bits1, bits2;
@@ -446,26 +768,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
t2 = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
t2 = force_gimple_operand_gsi (&gsi, t2, true, NULL_TREE,
true, GSI_SAME_STMT);
- t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR,
- boolean_type_node, t2, t);
- t = canonicalize_cond_expr_cond (t);
- if (!t)
- return false;
- if (!is_gimple_condexpr_for_cond (t))
- {
- gsi = gsi_for_stmt (inner_cond);
- t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
- NULL, true, GSI_SAME_STMT);
- }
- gimple_cond_set_condition_from_tree (inner_cond, t);
- update_stmt (inner_cond);
- /* Leave CFG optimization to cfg_cleanup. */
- gimple_cond_set_condition_from_tree (outer_cond,
- outer_inv ? boolean_false_node : boolean_true_node);
- update_stmt (outer_cond);
+ t = fold_build2 (EQ_EXPR, boolean_type_node, t2, t);
- update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
+ if (!ifcombine_replace_cond (inner_cond, inner_inv,
+ outer_cond, outer_inv,
+ t, true, NULL_TREE))
+ return false;
if (dump_file)
{
@@ -485,9 +794,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
In that case remove the outer test and change the inner one to
test for name & (bits1 | bits2) != 0. */
else if (recognize_bits_test (inner_cond, &name1, &bits1, !inner_inv)
- && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
+ && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
{
- gimple_stmt_iterator gsi;
tree t;
if ((TREE_CODE (name1) == SSA_NAME
@@ -530,33 +838,14 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
bits1 = fold_convert (TREE_TYPE (bits2), bits1);
}
- /* Do it. */
- gsi = gsi_for_stmt (inner_cond);
t = fold_build2 (BIT_IOR_EXPR, TREE_TYPE (name1), bits1, bits2);
- t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
- true, GSI_SAME_STMT);
t = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
- t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
- true, GSI_SAME_STMT);
- t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR, boolean_type_node, t,
+ t = fold_build2 (EQ_EXPR, boolean_type_node, t,
build_int_cst (TREE_TYPE (t), 0));
- t = canonicalize_cond_expr_cond (t);
- if (!t)
+ if (!ifcombine_replace_cond (inner_cond, inner_inv,
+ outer_cond, outer_inv,
+ t, false, NULL_TREE))
return false;
- if (!is_gimple_condexpr_for_cond (t))
- {
- gsi = gsi_for_stmt (inner_cond);
- t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
- NULL, true, GSI_SAME_STMT);
- }
- gimple_cond_set_condition_from_tree (inner_cond, t);
- update_stmt (inner_cond);
-
- /* Leave CFG optimization to cfg_cleanup. */
- gimple_cond_set_condition_from_tree (outer_cond,
- outer_inv ? boolean_false_node : boolean_true_node);
- update_stmt (outer_cond);
- update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
if (dump_file)
{
@@ -576,7 +865,7 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
else if (TREE_CODE_CLASS (gimple_cond_code (inner_cond)) == tcc_comparison
&& TREE_CODE_CLASS (gimple_cond_code (outer_cond)) == tcc_comparison)
{
- tree t;
+ tree t, ts = NULL_TREE;
enum tree_code inner_cond_code = gimple_cond_code (inner_cond);
enum tree_code outer_cond_code = gimple_cond_code (outer_cond);
@@ -599,10 +888,20 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
outer_cond_code,
gimple_cond_lhs (outer_cond),
gimple_cond_rhs (outer_cond),
- gimple_bb (outer_cond))))
+ gimple_bb (outer_cond)))
+ && !(t = (fold_truth_andor_maybe_separate
+ (UNKNOWN_LOCATION, TRUTH_ANDIF_EXPR,
+ boolean_type_node,
+ outer_cond_code,
+ gimple_cond_lhs (outer_cond),
+ gimple_cond_rhs (outer_cond),
+ inner_cond_code,
+ gimple_cond_lhs (inner_cond),
+ gimple_cond_rhs (inner_cond),
+ &ts))))
{
+ {
tree t1, t2;
- gimple_stmt_iterator gsi;
bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT;
if (param_logical_op_non_short_circuit != -1)
logical_op_non_short_circuit
@@ -624,34 +923,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
gimple_cond_rhs (outer_cond));
t = fold_build2_loc (gimple_location (inner_cond),
TRUTH_AND_EXPR, boolean_type_node, t1, t2);
- if (result_inv)
- {
- t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
- result_inv = false;
- }
- gsi = gsi_for_stmt (inner_cond);
- t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
- NULL, true, GSI_SAME_STMT);
+ }
}
- if (result_inv)
- t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
- t = canonicalize_cond_expr_cond (t);
- if (!t)
- return false;
- if (!is_gimple_condexpr_for_cond (t))
- {
- gsi = gsi_for_stmt (inner_cond);
- t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
- NULL, true, GSI_SAME_STMT);
- }
- gimple_cond_set_condition_from_tree (inner_cond, t);
- update_stmt (inner_cond);
- /* Leave CFG optimization to cfg_cleanup. */
- gimple_cond_set_condition_from_tree (outer_cond,
- outer_inv ? boolean_false_node : boolean_true_node);
- update_stmt (outer_cond);
- update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
+ if (!(t = ifcombine_replace_cond (inner_cond, inner_inv,
+ outer_cond, outer_inv,
+ t, false, ts)))
+ return false;
if (dump_file)
{
@@ -669,19 +947,21 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
/* Helper function for tree_ssa_ifcombine_bb. Recognize a CFG pattern and
dispatch to the appropriate if-conversion helper for a particular
set of INNER_COND_BB, OUTER_COND_BB, THEN_BB and ELSE_BB.
- PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB. */
+ PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.
+ OUTER_SUCC_BB is the successor of OUTER_COND_BB on the path towards
+ INNER_COND_BB. */
static bool
tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
basic_block then_bb, basic_block else_bb,
- basic_block phi_pred_bb)
+ basic_block phi_pred_bb, basic_block outer_succ_bb)
{
/* The && form is characterized by a common else_bb with
the two edges leading to it mergable. The latter is
guaranteed by matching PHI arguments in the else_bb and
the inner cond_bb having no side-effects. */
if (phi_pred_bb != else_bb
- && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &else_bb)
+ && recognize_if_then_else_nc (outer_cond_bb, &outer_succ_bb, &else_bb)
&& same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb))
{
/* We have
@@ -693,13 +973,12 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
<else_bb>
...
*/
- return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false,
- false);
+ return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false);
}
/* And a version where the outer condition is negated. */
if (phi_pred_bb != else_bb
- && recognize_if_then_else (outer_cond_bb, &else_bb, &inner_cond_bb)
+ && recognize_if_then_else_nc (outer_cond_bb, &else_bb, &outer_succ_bb)
&& same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb))
{
/* We have
@@ -711,8 +990,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
<else_bb>
...
*/
- return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true,
- false);
+ return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true);
}
/* The || form is characterized by a common then_bb with the
@@ -720,7 +998,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
by matching PHI arguments in the then_bb and the inner cond_bb
having no side-effects. */
if (phi_pred_bb != then_bb
- && recognize_if_then_else (outer_cond_bb, &then_bb, &inner_cond_bb)
+ && recognize_if_then_else_nc (outer_cond_bb, &then_bb, &outer_succ_bb)
&& same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb))
{
/* We have
@@ -731,13 +1009,12 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
<then_bb>
...
*/
- return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true,
- true);
+ return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true);
}
/* And a version where the outer condition is negated. */
if (phi_pred_bb != then_bb
- && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &then_bb)
+ && recognize_if_then_else_nc (outer_cond_bb, &outer_succ_bb, &then_bb)
&& same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb))
{
/* We have
@@ -748,8 +1025,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
<then_bb>
...
*/
- return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false,
- true);
+ return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false);
}
return false;
@@ -759,13 +1035,13 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
if-conversion helper. We start with BB as the innermost
worker basic-block. Returns true if a transformation was done. */
-static bool
+static basic_block
tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
{
basic_block then_bb = NULL, else_bb = NULL;
- if (!recognize_if_then_else (inner_cond_bb, &then_bb, &else_bb))
- return false;
+ if (!recognize_if_then_else_nc (inner_cond_bb, &then_bb, &else_bb))
+ return NULL;
/* Recognize && and || of two conditions with a common
then/else block which entry edges we can merge. That is:
@@ -775,14 +1051,41 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
if (a && b)
;
This requires a single predecessor of the inner cond_bb. */
- if (single_pred_p (inner_cond_bb)
- && bb_no_side_effects_p (inner_cond_bb))
+ for (basic_block bb = inner_cond_bb, outer_cond_bb, exit_bb = NULL;
+ single_pred_p (bb) && bb_no_side_effects_p (bb)
+ && (!exit_bb || same_phi_args_p (bb, inner_cond_bb, exit_bb));
+ bb = outer_cond_bb)
{
- basic_block outer_cond_bb = single_pred (inner_cond_bb);
+ outer_cond_bb = single_pred (bb);
+
+ /* Skip blocks without conditions. */
+ if (single_succ_p (outer_cond_bb))
+ continue;
+
+ /* When considering noncontiguous conditions, make sure that all
+ non-final conditions lead to the same successor of the final
+ condition, when not taking the path to inner_cond_bb, so that we can
+ combine C into A, both in A && (B && C) and in A || (B || C), but
+ neither in A && (B || C), nor in A || (B && C). Say C exits to
+ THEN_BB and ELSE_BB; then B must exit to the block leading to C and
+ to one of those two, call it X, and A must in turn exit to the block
+ leading to B and to X as well.
+
+ We test for this, while allowing intervening nonconditional blocks,
+ by first taking note of which of the successors of the inner
+ conditional block is the exit path taken by the first considered
+ outer conditional block.
+
+ Having identified and saved that exit block in exit_bb at the end of
+ the loop, here we test that subsequent conditional blocks under
+ consideration also use the exit block as a successor, besides the
+ block that leads to inner_cond_bb. */
+ if (exit_bb && !recognize_if_succs (outer_cond_bb, bb, exit_bb))
+ break;
if (tree_ssa_ifcombine_bb_1 (inner_cond_bb, outer_cond_bb,
- then_bb, else_bb, inner_cond_bb))
- return true;
+ then_bb, else_bb, inner_cond_bb, bb))
+ return outer_cond_bb;
if (forwarder_block_to (else_bb, then_bb))
{
@@ -793,8 +1096,8 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
For same_phi_args_p we look at equality of arguments between
edge from outer_cond_bb and the forwarder block. */
if (tree_ssa_ifcombine_bb_1 (inner_cond_bb, outer_cond_bb, else_bb,
- then_bb, else_bb))
- return true;
+ then_bb, else_bb, bb))
+ return outer_cond_bb;
}
else if (forwarder_block_to (then_bb, else_bb))
{
@@ -805,12 +1108,25 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
For same_phi_args_p we look at equality of arguments between
edge from outer_cond_bb and the forwarder block. */
if (tree_ssa_ifcombine_bb_1 (inner_cond_bb, outer_cond_bb, else_bb,
- then_bb, then_bb))
- return true;
+ then_bb, then_bb, bb))
+ return outer_cond_bb;
+ }
+
+ /* Record the exit path taken by the outer condition. In the loop
+ condition test, we'll check whether INNER_COND_BB and the current
+ OUTER_COND_BB (BB, next iteration) share the same PHI args at EXIT_BB. */
+ if (!exit_bb)
+ {
+ if (recognize_if_succs (outer_cond_bb, then_bb, bb))
+ exit_bb = then_bb;
+ else if (recognize_if_succs (outer_cond_bb, bb, else_bb))
+ exit_bb = else_bb;
+ else
+ break;
}
}
- return false;
+ return NULL;
}
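For reference, the kind of shapes the walk above does and does not accept, written out with hypothetical globals (in the spirit of the dump-scanning testcase earlier, and only meant to connect the comment about A && (B && C) versus A && (B || C) to source code):

/* Illustration only, not part of the patch or its testsuite.  */
struct S { char a, b; int p; };
static struct S x, y;

int accepts (void)
{
  /* A && (B && C): every condition's "false" exit is the same block, so
     the innermost compare may be combined with the outermost one even
     though another compare sits in between.  */
  return x.a == y.a && (x.p == y.p && x.b == y.b);
}

int rejects (void)
{
  /* A && (B || C): the inner || gives the last compare a different
     alternate exit, so the noncontiguous combination is not attempted.  */
  return x.a == y.a && (x.p == y.p || x.b == y.b);
}

int main (void)
{
  /* Both return 1 for zero-initialized x and y.  */
  return accepts () ^ rejects ();
}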
/* Main entry for the tree if-conversion pass. */
@@ -866,7 +1182,7 @@ pass_tree_ifcombine::execute (function *fun)
basic_block bb = bbs[i];
if (safe_is_a <gcond *> (*gsi_last_bb (bb)))
- if (tree_ssa_ifcombine_bb (bb))
+ if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb))
{
/* Clear range info from all stmts in BB which is now executed
conditional on a always true/false condition. */
@@ -885,6 +1201,12 @@ pass_tree_ifcombine::execute (function *fun)
rewrite_to_defined_overflow (&gsi);
}
cfg_changed |= true;
+ /* Go back to outer_bb, in case it could be further optimized, but
+ only at -O2+, since it could get quadratic. */
+ do
+ i++;
+ while (optimize >= 2 && bbs[i] != outer_bb);
+ continue;
}
}