[v3] tree-optimization/95821 - Convert strlen + strchr to memchr
Commit Message
This patch allows strchr(x, c) to be replaced with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.
Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
Since memchr doesn't need to re-find the null terminator it is faster
than strchr.
Bootstrapped and tested on x86_64-linux.
PR tree-optimization/95821
gcc/
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
memchr instead of strchr if strlen already computed.
gcc/testsuite/
* c-c++-common/pr95821-1.c: New test.
* c-c++-common/pr95821-2.c: New test.
* c-c++-common/pr95821-3.c: New test.
* c-c++-common/pr95821-4.c: New test.
* c-c++-common/pr95821-5.c: New test.
* c-c++-common/pr95821-6.c: New test.
* c-c++-common/pr95821-7.c: New test.
* c-c++-common/pr95821-8.c: New test.
---
gcc/testsuite/c-c++-common/pr95821-1.c | 15 ++++
gcc/testsuite/c-c++-common/pr95821-2.c | 17 ++++
gcc/testsuite/c-c++-common/pr95821-3.c | 17 ++++
gcc/testsuite/c-c++-common/pr95821-4.c | 16 ++++
gcc/testsuite/c-c++-common/pr95821-5.c | 19 +++++
gcc/testsuite/c-c++-common/pr95821-6.c | 18 ++++
gcc/testsuite/c-c++-common/pr95821-7.c | 18 ++++
gcc/testsuite/c-c++-common/pr95821-8.c | 19 +++++
gcc/tree-ssa-strlen.cc | 113 ++++++++++++++++++++-----
9 files changed, 233 insertions(+), 19 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/pr95821-1.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-2.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-3.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-4.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-5.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-6.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-7.c
create mode 100644 gcc/testsuite/c-c++-common/pr95821-8.c
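The transformation described in the commit message can be sketched in plain C. This is only an illustration of the intended equivalences, not code from the patch; the helper names strchr_known_len and strchr_unknown_chr are hypothetical, and t stands for an already-computed strlen (s).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: what strchr (s, c) can
   become when t = strlen (s) is already available and it is known
   whether (char) c is zero.  */
static char *
strchr_known_len (const char *s, int c, size_t t)
{
  /* Searching for the terminator: the answer is simply s + t.  */
  if ((char) c == 0)
    return (char *) s + t;

  /* (char) c is non-zero, so the terminator can never match and
     scanning the first t bytes is enough.  */
  return (char *) memchr (s, c, t);
}

/* Hypothetical helper: nothing is known about c, so the terminator is
   included in the scan to keep the c == 0 case correct.  */
static char *
strchr_unknown_chr (const char *s, int c, size_t t)
{
  return (char *) memchr (s, c, t + 1);
}
```

In both helpers memchr is given an explicit bound, which is the reason the commit message gives for preferring it over strchr once the length is known.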
Comments
On 6/21/2022 12:12 PM, Noah Goldstein via Gcc-patches wrote:
> This patch allows for strchr(x, c) to the replace with memchr(x, c,
> strlen(x) + 1) if strlen(x) has already been computed earlier in the
> tree.
>
> Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
>
> Since memchr doesn't need to re-find the null terminator it is faster
> than strchr.
>
> bootstrapped and tested on x86_64-linux.
>
> PR tree-optimization/95821
>
> gcc/
>
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr instead of strchr if strlen already computed.
>
> gcc/testsuite/
>
> * c-c++-common/pr95821-1.c: New test.
> * c-c++-common/pr95821-2.c: New test.
> * c-c++-common/pr95821-3.c: New test.
> * c-c++-common/pr95821-4.c: New test.
> * c-c++-common/pr95821-5.c: New test.
> * c-c++-common/pr95821-6.c: New test.
> * c-c++-common/pr95821-7.c: New test.
> * c-c++-common/pr95821-8.c: New test.
Given Jakub's involvement to-date and the fact this touches
tree-ssa-strlen.cc I think Jakub should have final ACK/NAK on this.
jeff
On Sat, Jul 9, 2022 at 8:59 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
Ping.
On Tue, Jun 21, 2022 at 11:12:15AM -0700, Noah Goldstein wrote:
> This patch allows for strchr(x, c) to the replace with memchr(x, c,
> strlen(x) + 1) if strlen(x) has already been computed earlier in the
> tree.
>
> Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
>
> Since memchr doesn't need to re-find the null terminator it is faster
> than strchr.
>
> bootstrapped and tested on x86_64-linux.
>
> PR tree-optimization/95821
>
> gcc/
>
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr instead of strchr if strlen already computed.
>
> gcc/testsuite/
>
> * c-c++-common/pr95821-1.c: New test.
> * c-c++-common/pr95821-2.c: New test.
> * c-c++-common/pr95821-3.c: New test.
> * c-c++-common/pr95821-4.c: New test.
> * c-c++-common/pr95821-5.c: New test.
> * c-c++-common/pr95821-6.c: New test.
> * c-c++-common/pr95821-7.c: New test.
> * c-c++-common/pr95821-8.c: New test.
Sorry for the delay.
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr95821-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler "memchr" } } */
Please don't scan the assembler output; whether memchr will be emitted
as a call or expanded inline etc. is not known.
Better use "-O2 -fdump-tree-optimized" in dg-options
and scan the optimized dump for "memchr \\\(".
Ditto for other tests.
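A minimal sketch of what pr95821-1.c might look like with that suggestion applied, assuming the usual scan-tree-dump form of the directive. The function body is the one from the posted test; only the dg directives differ, and they have not been run against a patched compiler.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
/* Assumed directive form: look for a memchr call in the optimized
   GIMPLE dump instead of in the assembler output.  */
/* { dg-final { scan-tree-dump "memchr \\\(" "optimized" } } */

#include <stddef.h>

char *
foo (char *s, char c)
{
  size_t slen = __builtin_strlen (s);
  if (slen < 1000)
    return NULL;

  return __builtin_strchr (s, c);
}
```

Scanning the dump keeps the test independent of whether the target later expands memchr inline or emits a library call.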
> @@ -2452,32 +2459,96 @@ strlen_pass::handle_builtin_strchr ()
> fprintf (dump_file, "Optimizing: ");
> print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> }
> - if (si != NULL && si->endptr != NULL_TREE)
> + /* Three potential optimizations assume t=strlen (s) has already been
> + computed:
> + 1. strchr (s, chr) where chr is known to be zero -> t
-> s + t
rather than
-> t
actually.
> + 2. strchr (s, chr) where chr is known not to be zero ->
> + memchr (s, chr, t)
> + 3. strchr (s, chr) where chr is not known to be zero or
nor instead of or?
> + non-zero -> memchr (s, chr, t + 1). */
> + if (!is_strchr_zerop)
> {
> - rhs = unshare_expr (si->endptr);
> - if (!useless_type_conversion_p (TREE_TYPE (lhs),
> - TREE_TYPE (rhs)))
> - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> + /* If its not strchr (s, zerop) then try and convert to
> + memchr since strlen has already been computed. */
> + tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> +
> + /* Only need to check length strlen (s) + 1 if chr may be zero.
> + Otherwise the last chr (which is known to be zero) can never
> + be a match. */
> + bool chr_nonzero = false;
> + if (TREE_CODE (chr) == INTEGER_CST
> + && integer_nonzerop (fold_convert (char_type_node, chr)))
> + chr_nonzero = true;
> + else if (TREE_CODE (chr) == SSA_NAME
> + && CHAR_TYPE_SIZE < INT_TYPE_SIZE)
> + {
> + value_range r;
> + /* Try to determine using ranges if (char) chr must
> + be always 0. That is true e.g. if all the subranges
must be always non-zero ?
> + have the INT_TYPE_SIZE - CHAR_TYPE_SIZE bits
> + the same on lower and upper bounds. */
That is actually not enough, see below.
> + if (get_range_query (cfun)->range_of_expr (r, chr, stmt)
> + && r.kind () == VR_RANGE)
> + {
> + wide_int mask
> + = wi::mask (CHAR_TYPE_SIZE, true, INT_TYPE_SIZE);
Wrong indentation, = should be 2 columns left of wide_int.
> + for (unsigned i = 0; i < r.num_pairs (); ++i)
> + if ((r.lower_bound (i) & mask)
> + != (r.upper_bound (i) & mask))
> + {
> + chr_nonzero = false;
> + break;
> + }
This else if actually can't do what it intends to, because
chr_nonzero is initialized to false at the start and in the loop you
also just set it to false, so it is always false.
You need to add chr_nonzero = true; before the for loop above.
With that, all the above test proves is that there is no range like
[15, 257] where it would include 256 in the middle of the range or
at the end. But the above doesn't clear chr_nonzero on ranges like
[0, 32] or [256, 511] where (char) chr can still be zero.
So, the test should be:
if ((r.lower_bound (i) & mask)
!= (r.upper_bound (i) & mask)
|| (r.lower_bound (i) & ~mask) == 0)
or so, that will rule out also the above ranges and if one just has ranges
like:
[1, 32] U [48, 56] U [257, 511]
all is fine, (char) chr is non-zero.
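The corrected per-subrange test can be tried out on plain integers. The sketch below is an illustration only: it assumes an 8-bit char and a 32-bit int, and struct int_range, HIGH_MASK and char_known_nonzero are hypothetical stand-ins for the value_range pairs, the wi::mask result and the chr_nonzero computation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one [lower, upper] pair of the range.  */
struct int_range
{
  unsigned int lower;
  unsigned int upper;
};

/* Assumption: 8-bit char, 32-bit int.  Selects every bit above the
   low byte, mirroring mask in the patch.  */
#define HIGH_MASK 0xffffff00u

/* Return true only if no value in any subrange has a zero low byte,
   i.e. (char) chr is known to be non-zero.  */
static bool
char_known_nonzero (const struct int_range *r, size_t n)
{
  /* No range information at all: nothing is known.  */
  if (n == 0)
    return false;

  for (size_t i = 0; i < n; ++i)
    if ((r[i].lower & HIGH_MASK) != (r[i].upper & HIGH_MASK)
	|| (r[i].lower & ~HIGH_MASK) == 0)
      return false;

  return true;
}
```

The first condition rejects a subrange that crosses a multiple of 256, and the second rejects one that starts on a multiple of 256; with both in place, [0, 32], [256, 511] and [15, 257] are all rejected while [1, 32] U [48, 56] U [257, 511] is accepted.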
But this also shows that the testsuite coverage is insufficient because
nothing caught this.
I see almost no tests where the second argument to strchr is a
constant (ideally check all of 0, ~0 & ~(unsigned char) ~0, ' ',
(~0 & ~(unsigned char) ~0) + ' '). You have one test with
if (c != 0x100) return else strchr, which effectively is strchr (, 0x100),
and one with if (c != 0) return else strchr, which has c range of ~[0, 0];
with that you can't do much (you can just verify that we don't conclude
that (char) c can't be zero). Beyond the tests with constant strchr
arguments (where I think you want to check in each case whether there is
"= slen\[a-zA-Z.0-9_]* \\\+ 1;"
or not (and how many times, if you e.g. stick more tests into one source
file, ideally all where you want the + 1 in one file and all that should
not have it in another)), it would be nice to have at least some tests
that exercise the above problematic cases, say something like:
if (c < 256)
{
if (c < 1 || c > 64)
return ...;
}
else
{
if (c < 257 || c > 511)
return ...;
}
...
strchr (..., c);
c above should be (needs to be verified in the debugger) [1, 64] U [257, 511]
and so chr_nonzero. Similarly construct cases like [1, 32] U [48, 56] U [257, 511]
(chr_nonzero) or [0, 32] U [256, 511] (unknown whether c is zero or
non-zero) or [15, 257] (unknown too).
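Filled out into a complete test, the [1, 64] U [257, 511] case might look like the sketch below. This is an assumption-laden illustration: the dg directives follow the patterns suggested above, the expectation that no "+ 1" appears is the intended outcome for a chr_nonzero range rather than something checked against a patched compiler, and the range GCC actually computes for c still needs to be verified.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
/* { dg-final { scan-tree-dump "memchr \\\(" "optimized" } } */
/* Unverified expectation: (char) c is known non-zero here, so the
   length passed to memchr should not have 1 added to it.  */
/* { dg-final { scan-tree-dump-not "= slen\[a-zA-Z.0-9_]* \\\+ 1;" "optimized" } } */

#include <stddef.h>
#include <string.h>

char *
foo (char *s, int c)
{
  size_t slen = strlen (s);
  if (slen < 1000)
    return NULL;

  /* Restrict c to [1, 64] U [257, 511].  */
  if (c < 256)
    {
      if (c < 1 || c > 64)
	return NULL;
    }
  else
    {
      if (c < 257 || c > 511)
	return NULL;
    }

  return strchr (s, c);
}
```

The other ranges mentioned ([1, 32] U [48, 56] U [257, 511], [0, 32] U [256, 511], [15, 257]) could be built the same way by changing the guards, with the "+ 1" pattern expected to be present for the last two.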
Jakub
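For the constant arguments listed in the review, what matters is the value left after strchr converts its int argument to char. The sketch below, which assumes an 8-bit char and uses a hypothetical helper named search_byte, shows which of the four suggested constants behave like a search for the terminator.

```c
/* Hypothetical helper: the byte strchr effectively searches for,
   assuming an 8-bit char.  */
static unsigned char
search_byte (int c)
{
  return (unsigned char) c;
}

/* The four constants suggested in the review.  */
static const int review_constants[4] = {
  0,                                /* low byte 0 */
  ~0 & ~(unsigned char) ~0,         /* all bits above the low byte set, low byte 0 */
  ' ',                              /* low byte 32 */
  (~0 & ~(unsigned char) ~0) + ' '  /* upper bits set, low byte 32 */
};
```

The first two constants have a zero low byte, so they should take the strchr (s, 0) path; the last two have a non-zero low byte, so they are candidates for memchr without the extra byte.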
new file mode 100644
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include <stddef.h>
+
+char *
+foo (char *s, char c)
+{
+ size_t slen = __builtin_strlen(s);
+ if(slen < 1000)
+ return NULL;
+
+ return __builtin_strchr(s, c);
+}
new file mode 100644
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include <stddef.h>
+
+char *
+foo (char *s, char c, char * other)
+{
+ size_t slen = __builtin_strlen(s);
+ if(slen < 1000)
+ return NULL;
+
+ *other = 0;
+
+ return __builtin_strchr(s, c);
+}
new file mode 100644
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include <stddef.h>
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+ size_t slen = __builtin_strlen(s);
+ if(slen < 1000)
+ return NULL;
+
+ *other = 0;
+
+ return __builtin_strchr(s, c);
+}
new file mode 100644
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include <stddef.h>
+#include <string.h>
+
+char *
+foo (char *s, char c)
+{
+ size_t slen = strlen(s);
+ if(slen < 1000)
+ return NULL;
+
+ return strchr(s, c);
+}
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include <stddef.h>
+#include <string.h>
+
+char *
+foo (char *s, char c, char * other)
+{
+ size_t slen = strlen(s);
+ if(slen < 1000)
+ return NULL;
+
+ *other = 0;
+
+ return strchr(s, c);
+}
+int main() {}
new file mode 100644
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include <stddef.h>
+#include <string.h>
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+ size_t slen = strlen(s);
+ if(slen < 1000)
+ return NULL;
+
+ *other = 0;
+
+ return strchr(s, c);
+}
new file mode 100644
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include <stddef.h>
+#include <string.h>
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+ size_t slen = strlen(s);
+ if(slen < 1000 || c == 0)
+ return NULL;
+
+ *other = 0;
+
+ return strchr(s, c);
+}
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "strchr" } } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include <stddef.h>
+#include <string.h>
+
+char *
+foo (char * __restrict s, int c, char * __restrict other)
+{
+ size_t slen = strlen(s);
+ if(slen < 1000 || c != 0x100)
+ return NULL;
+
+ *other = 0;
+
+ return strchr(s, c);
+}
@@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
}
}
-/* Handle a strchr call. If strlen of the first argument is known, replace
- the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
- that lhs of the call is endptr and strlen of the argument is endptr - x. */
+/* Handle a strchr call. If strlen of the first argument is known,
+ replace the strchr (x, 0) call with the endptr or x + strlen,
+ otherwise remember that lhs of the call is endptr and strlen of the
+ argument is endptr - x. If strlen of x is not known but has been
+ computed earlier in the tree then replace strchr (x, c) with
+ memchr (x, c, strlen + 1). */
void
strlen_pass::handle_builtin_strchr ()
@@ -2418,8 +2421,12 @@ strlen_pass::handle_builtin_strchr ()
if (lhs == NULL_TREE)
return;
- if (!integer_zerop (gimple_call_arg (stmt, 1)))
- return;
+ tree chr = gimple_call_arg (stmt, 1);
+ /* strchr only uses the lower char of input so to check if its
+ strchr (s, zerop) only take into account the lower char. */
+ bool is_strchr_zerop
+ = (TREE_CODE (chr) == INTEGER_CST
+ && integer_zerop (fold_convert (char_type_node, chr)));
tree src = gimple_call_arg (stmt, 0);
@@ -2452,32 +2459,96 @@ strlen_pass::handle_builtin_strchr ()
fprintf (dump_file, "Optimizing: ");
print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
}
- if (si != NULL && si->endptr != NULL_TREE)
+ /* Three potential optimizations assume t=strlen (s) has already been
+ computed:
+ 1. strchr (s, chr) where chr is known to be zero -> t
+ 2. strchr (s, chr) where chr is known not to be zero ->
+ memchr (s, chr, t)
+ 3. strchr (s, chr) where chr is not known to be zero or
+ non-zero -> memchr (s, chr, t + 1). */
+ if (!is_strchr_zerop)
{
- rhs = unshare_expr (si->endptr);
- if (!useless_type_conversion_p (TREE_TYPE (lhs),
- TREE_TYPE (rhs)))
- rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
+ /* If its not strchr (s, zerop) then try and convert to
+ memchr since strlen has already been computed. */
+ tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
+
+ /* Only need to check length strlen (s) + 1 if chr may be zero.
+ Otherwise the last chr (which is known to be zero) can never
+ be a match. */
+ bool chr_nonzero = false;
+ if (TREE_CODE (chr) == INTEGER_CST
+ && integer_nonzerop (fold_convert (char_type_node, chr)))
+ chr_nonzero = true;
+ else if (TREE_CODE (chr) == SSA_NAME
+ && CHAR_TYPE_SIZE < INT_TYPE_SIZE)
+ {
+ value_range r;
+ /* Try to determine using ranges if (char) chr must
+ be always 0. That is true e.g. if all the subranges
+ have the INT_TYPE_SIZE - CHAR_TYPE_SIZE bits
+ the same on lower and upper bounds. */
+ if (get_range_query (cfun)->range_of_expr (r, chr, stmt)
+ && r.kind () == VR_RANGE)
+ {
+ wide_int mask
+ = wi::mask (CHAR_TYPE_SIZE, true, INT_TYPE_SIZE);
+ for (unsigned i = 0; i < r.num_pairs (); ++i)
+ if ((r.lower_bound (i) & mask)
+ != (r.upper_bound (i) & mask))
+ {
+ chr_nonzero = false;
+ break;
+ }
+ }
+ }
+ if (!chr_nonzero)
+ {
+ tree one = build_int_cst (TREE_TYPE (rhs), 1);
+ rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
+ unshare_expr (rhs), one);
+ tree size = make_ssa_name (TREE_TYPE (rhs));
+ gassign *size_stmt = gimple_build_assign (size, rhs);
+ gsi_insert_before (&m_gsi, size_stmt, GSI_SAME_STMT);
+ rhs = size;
+ }
+ if (!update_gimple_call (&m_gsi, fn, 3, src, chr, rhs))
+ return;
}
else
{
- rhs = fold_convert_loc (loc, sizetype, unshare_expr (rhs));
- rhs = fold_build2_loc (loc, POINTER_PLUS_EXPR,
- TREE_TYPE (src), src, rhs);
- if (!useless_type_conversion_p (TREE_TYPE (lhs),
- TREE_TYPE (rhs)))
- rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
+ if (si != NULL && si->endptr != NULL_TREE)
+ {
+ rhs = unshare_expr (si->endptr);
+ if (!useless_type_conversion_p (TREE_TYPE (lhs),
+ TREE_TYPE (rhs)))
+ rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
+ }
+ else
+ {
+ rhs = fold_convert_loc (loc, sizetype, unshare_expr (rhs));
+ rhs = fold_build2_loc (loc, POINTER_PLUS_EXPR,
+ TREE_TYPE (src), src, rhs);
+ if (!useless_type_conversion_p (TREE_TYPE (lhs),
+ TREE_TYPE (rhs)))
+ rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
+ }
+ gimplify_and_update_call_from_tree (&m_gsi, rhs);
}
- gimplify_and_update_call_from_tree (&m_gsi, rhs);
+
stmt = gsi_stmt (m_gsi);
update_stmt (stmt);
+
if (dump_file && (dump_flags & TDF_DETAILS) != 0)
{
fprintf (dump_file, "into: ");
print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
}
- if (si != NULL
- && si->endptr == NULL_TREE
+
+ /* Don't update strlen of lhs if search-char wasn't known to be zero. */
+ if (!is_strchr_zerop)
+ return;
+
+ if (si != NULL && si->endptr == NULL_TREE
&& !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
si = unshare_strinfo (si);
@@ -2487,6 +2558,10 @@ strlen_pass::handle_builtin_strchr ()
return;
}
}
+
+ if (!is_strchr_zerop)
+ return;
+
if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
return;
if (TREE_CODE (src) != SSA_NAME || !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (src))