s390: Add expander for uaddc/usubc optabs
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
Commit Message
Bootstrapped and regtested on s390. Both expander are constrained to
z196 because of the conditional moves. I guess this is reasonable
nowadays.
Would be great if you could have a second look that setting the
carry/borrow bit (bit 18 of the PSW) is indeed correct. Brain twisted
me at first ;-)
-- 8< --
gcc/ChangeLog:
* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
(add<mode>3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub<mode>3_borrow_cc): Renamed to ...
(sub<mode>3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add<mode>3_alc_carry1_cc): Renamed to ...
(add<mode>3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub<mode>3_slb_borrow1_cc): New.
(uaddc<mode>5): New.
(usubc<mode>5): New.
gcc/testsuite/ChangeLog:
* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
---
gcc/config/s390/s390.md | 103 +++++++++++++++++++-----
gcc/testsuite/gcc.target/s390/uaddc-1.c | 80 ++++++++++++++++++
gcc/testsuite/gcc.target/s390/uaddc-2.c | 25 ++++++
gcc/testsuite/gcc.target/s390/usubc-1.c | 80 ++++++++++++++++++
gcc/testsuite/gcc.target/s390/usubc-2.c | 25 ++++++
5 files changed, 295 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-1.c
create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-2.c
create mode 100644 gcc/testsuite/gcc.target/s390/usubc-1.c
create mode 100644 gcc/testsuite/gcc.target/s390/usubc-2.c
Comments
Hi Stefan,
thanks for the patch and sorry for the slow review.
On 9/18/24 19:25, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on s390. Both expander are constrained to
> z196 because of the conditional moves. I guess this is reasonable
> nowadays.
The reason for the limitation seems to be that you rely on the load on
condition pattern in order to get the carry into a GPR. What you really
want here is load on condition immediate, which has been added with z13.
If you compile your test for z196 we would end up using another alcr,
what would have been possible also before z196. But I agree, that the
limitation to z196 upwards is reasonable.
...
>
> +(define_expand "uaddc<mode>5"
> + [(match_operand:GPR 0 "register_operand")
> + (match_operand:GPR 1 "nonimmediate_operand")
> + (match_operand:GPR 2 "nonimmediate_operand")
> + (match_operand:GPR 3 "nonimmediate_operand")
> + (match_operand:GPR 4 "general_operand")]
> + "TARGET_Z196 && (<MODE>mode != DImode || TARGET_64BIT)"
Why is the second part necessary here? Shouldn't this also work for 31
bit with zarch?
> +{
> + rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, CC_REGNUM), const0_rtx);
> + if (operands[4] == const0_rtx)
> + emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], operands[3]));
> + else
If we would just generate the alc pattern with a carry in of 0, wouldn't
the other optimizers be able to figure out that the a normal add would
do here? This path does not seem to get exercised by your testcases.
> + {
> + rtx tmp;
> + if (CONSTANT_P (operands[4]))
> + {
> + tmp = gen_reg_rtx (SImode);
> + emit_move_insn (tmp, operands[4]);
> + }
> + else
> + tmp = operands[4];
> + s390_emit_compare (LTU, tmp, const0_rtx);
Couldn't you just replace tmp with a force_reg (<MODE>mode, operands[4])
here to get rid of the IF statement above?
The CC handling doesn't look correct to me. The s390_emit_compare would
create a CCU mode comparison result, which is currently consumed as CCL1
by the ALC pattern. This seems to be inconsistent. s390_emit_compare
returns the condition, which I think needs to be fed into the
alc_carry1_cc pattern as input condition.
Also is LTU really the correct code here? CCU + LTU would expect CC1,
but you want CC2 or CC3 for the carry in, so GTU should be the better
choice. s390_alc_comparison should make sure that only valid
combinations are accepted, CCU + GTU would be one of them.
Similar for usubc I think.
> diff --git a/gcc/testsuite/gcc.target/s390/uaddc-1.c b/gcc/testsuite/gcc.target/s390/uaddc-1.c
> new file mode 100644
> index 00000000000..dfad5915690
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/uaddc-1.c
> @@ -0,0 +1,80 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
> +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
> +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
> +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
Your checks seem to rely on the testcase being compiled with at least
-march=z13.
Bye,
Andreas
On Mon, Nov 11, 2024 at 11:29:16AM +0100, Andreas Krebbel wrote:
> Hi Stefan,
>
> thanks for the patch and sorry for the slow review.
No worries. Thanks for the detailed review. Much appreciated!
>
>
> On 9/18/24 19:25, Stefan Schulze Frielinghaus wrote:
> > Bootstrapped and regtested on s390. Both expander are constrained to
> > z196 because of the conditional moves. I guess this is reasonable
> > nowadays.
>
> The reason for the limitation seems to be that you rely on the load on
> condition pattern in order to get the carry into a GPR. What you really want
> here is load on condition immediate, which has been added with z13. If you
> compile your test for z196 we would end up using another alcr, what would
> have been possible also before z196. But I agree, that the limitation to
> z196 upwards is reasonable.
>
> ...
> > +(define_expand "uaddc<mode>5"
> > + [(match_operand:GPR 0 "register_operand")
> > + (match_operand:GPR 1 "nonimmediate_operand")
> > + (match_operand:GPR 2 "nonimmediate_operand")
> > + (match_operand:GPR 3 "nonimmediate_operand")
> > + (match_operand:GPR 4 "general_operand")]
> > + "TARGET_Z196 && (<MODE>mode != DImode || TARGET_64BIT)"
>
> Why is the second part necessary here? Shouldn't this also work for 31 bit
> with zarch?
Right, that works for 31bit + zarch and therefore TARGET_Z196 is enough.
>
> > +{
> > + rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, CC_REGNUM), const0_rtx);
> > + if (operands[4] == const0_rtx)
> > + emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], operands[3]));
> > + else
> If we would just generate the alc pattern with a carry in of 0, wouldn't the
> other optimizers be able to figure out that the a normal add would do here?
> This path does not seem to get exercised by your testcases.
The zero carry in can occur due to
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a7aec76a74dd38524be325343158d3049b6ab3ac
which is why we have a special case just in order to emit an add with
carry out only.
> > + {
> > + rtx tmp;
> > + if (CONSTANT_P (operands[4]))
> > + {
> > + tmp = gen_reg_rtx (SImode);
> > + emit_move_insn (tmp, operands[4]);
> > + }
> > + else
> > + tmp = operands[4];
> > + s390_emit_compare (LTU, tmp, const0_rtx);
>
> Couldn't you just replace tmp with a force_reg (<MODE>mode, operands[4])
> here to get rid of the IF statement above?
Changed.
> The CC handling doesn't look correct to me. The s390_emit_compare would
> create a CCU mode comparison result, which is currently consumed as CCL1 by
> the ALC pattern. This seems to be inconsistent. s390_emit_compare returns
> the condition, which I think needs to be fed into the alc_carry1_cc pattern
> as input condition.
>
> Also is LTU really the correct code here? CCU + LTU would expect CC1, but
> you want CC2 or CC3 for the carry in, so GTU should be the better choice.
> s390_alc_comparison should make sure that only valid combinations are
> accepted, CCU + GTU would be one of them.
I was coming up with my own condition since conditions created by
s390_emit_compare() are of void mode which is why the alc predicates
s390_alc_comparison() failed since these require GPR mode. I've fixed
that by using PUT_MODE_RAW on those. I think we could also remove the
mode from the match_operand which might be more appropriate. I've done
it the latter way for sub<mode>3_slb_borrow1_cc. Once we have settled
for one or the other version I will align uaddc/usubc.
>
> Similar for usubc I think.
Here we have that s390_emit_compare (GTU, ...) emits a CCU mode
comparison and s390_slb_comparison() expects a LEU for that which is why
the predicate fails. Thus, the interplay of s390_emit_compare() and
s390_slb_comparison() is not clear to me. For the moment, I switched to
LEU.
>
> > diff --git a/gcc/testsuite/gcc.target/s390/uaddc-1.c b/gcc/testsuite/gcc.target/s390/uaddc-1.c
> > new file mode 100644
> > index 00000000000..dfad5915690
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/s390/uaddc-1.c
> > @@ -0,0 +1,80 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
> > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
> > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
> > +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
>
> Your checks seem to rely on the testcase being compiled with at least
> -march=z13.
Changed.
I've attached a revised version of the patch.
Cheers,
Stefan
From ab688eadf19136db73e2da0a53d8dd7bd00fe7a0 Mon Sep 17 00:00:00 2001
From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
Date: Tue, 12 Nov 2024 10:31:04 +0100
Subject: [PATCH] s390: Add expander for uaddc/usubc optabs
gcc/ChangeLog:
* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
(add<mode>3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub<mode>3_borrow_cc): Renamed to ...
(sub<mode>3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add<mode>3_alc_carry1_cc): Renamed to ...
(add<mode>3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub<mode>3_slb_borrow1_cc): New.
(uaddc<mode>5): New.
(usubc<mode>5): New.
gcc/testsuite/ChangeLog:
* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
---
gcc/config/s390/s390.md | 97 ++++++++++++++++++++-----
gcc/testsuite/gcc.target/s390/uaddc-1.c | 80 ++++++++++++++++++++
gcc/testsuite/gcc.target/s390/uaddc-2.c | 25 +++++++
gcc/testsuite/gcc.target/s390/usubc-1.c | 80 ++++++++++++++++++++
gcc/testsuite/gcc.target/s390/usubc-2.c | 25 +++++++
5 files changed, 289 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-1.c
create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-2.c
create mode 100644 gcc/testsuite/gcc.target/s390/usubc-1.c
create mode 100644 gcc/testsuite/gcc.target/s390/usubc-2.c
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 4a225ae24f3..1777d7e9f64 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -6001,14 +6001,14 @@
z10_super_E1,z10_super_E1,z10_super_E1")])
; alr, alfi, slfi, al, aly, alrk, alhsik, algr, algfi, slgfi, alg, alsi, algsi, algrk, alghsik
-(define_insn "*add<mode>3_carry1_cc"
- [(set (reg CC_REGNUM)
- (compare (plus:GPR (match_operand:GPR 1 "nonimmediate_operand" "%0,d, 0, 0,d,0,0,0")
- (match_operand:GPR 2 "general_operand" " d,d,Op,On,K,R,T,C"))
- (match_dup 1)))
- (set (match_operand:GPR 0 "nonimmediate_operand" "=d,d, d, d,d,d,d,d")
+(define_insn "add<mode>3_carry1_cc"
+ [(set (reg:CCL1 CC_REGNUM)
+ (compare:CCL1 (plus:GPR (match_operand:GPR 1 "nonimmediate_operand" "%0,d, 0, 0,d,0,0,0")
+ (match_operand:GPR 2 "general_operand" " d,d,Op,On,K,R,T,C"))
+ (match_dup 1)))
+ (set (match_operand:GPR 0 "nonimmediate_operand" "=d,d, d, d,d,d,d,d")
(plus:GPR (match_dup 1) (match_dup 2)))]
- "s390_match_ccmode (insn, CCL1mode)"
+ ""
"@
al<g>r\t%0,%2
al<g>rk\t%0,%1,%2
@@ -6541,14 +6541,14 @@
(set_attr "z10prop" "z10_super_c_E1,*,z10_super_E1,z10_super_E1")])
; slr, sl, sly, slgr, slg, slrk, slgrk
-(define_insn "*sub<mode>3_borrow_cc"
- [(set (reg CC_REGNUM)
- (compare (minus:GPR (match_operand:GPR 1 "register_operand" "0,d,0,0")
- (match_operand:GPR 2 "general_operand" "d,d,R,T"))
- (match_dup 1)))
- (set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+(define_insn "sub<mode>3_borrow_cc"
+ [(set (reg:CCL2 CC_REGNUM)
+ (compare:CCL2 (minus:GPR (match_operand:GPR 1 "register_operand" "0,d,0,0")
+ (match_operand:GPR 2 "general_operand" "d,d,R,T"))
+ (match_dup 1)))
+ (set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
(minus:GPR (match_dup 1) (match_dup 2)))]
- "s390_match_ccmode (insn, CCL2mode)"
+ ""
"@
sl<g>r\t%0,%2
sl<g>rk\t%0,%1,%2
@@ -6754,22 +6754,44 @@
; add(di|si)cc instruction pattern(s).
;
+(define_expand "uaddc<mode>5"
+ [(match_operand:GPR 0 "register_operand")
+ (match_operand:GPR 1 "nonimmediate_operand")
+ (match_operand:GPR 2 "nonimmediate_operand")
+ (match_operand:GPR 3 "nonimmediate_operand")
+ (match_operand:GPR 4 "general_operand")]
+ "TARGET_Z196"
+{
+ if (operands[4] == const0_rtx)
+ emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], operands[3]));
+ else
+ {
+ operands[4] = force_reg (<MODE>mode, operands[4]);
+ rtx alc_cond = s390_emit_compare (GTU, operands[4], const0_rtx);
+ PUT_MODE_RAW (alc_cond, <MODE>mode);
+ emit_insn (gen_add<mode>3_alc_carry1_cc (operands[0], operands[2], operands[3], alc_cond));
+ }
+ rtx mov_cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, CC_REGNUM), const0_rtx);
+ emit_insn (gen_mov<mode>cc (operands[1], mov_cond, const1_rtx, const0_rtx));
+ DONE;
+})
+
; the following 4 patterns are used when the result of an add with
; carry is checked for an overflow condition
; op1 + op2 + c < op1
; alcr, alc, alcgr, alcg
-(define_insn "*add<mode>3_alc_carry1_cc"
- [(set (reg CC_REGNUM)
- (compare
+(define_insn "add<mode>3_alc_carry1_cc"
+ [(set (reg:CCL1 CC_REGNUM)
+ (compare:CCL1
(plus:GPR (plus:GPR (match_operand:GPR 3 "s390_alc_comparison" "")
(match_operand:GPR 1 "nonimmediate_operand" "%0,0"))
(match_operand:GPR 2 "general_operand" "d,T"))
(match_dup 1)))
(set (match_operand:GPR 0 "register_operand" "=d,d")
(plus:GPR (plus:GPR (match_dup 3) (match_dup 1)) (match_dup 2)))]
- "s390_match_ccmode (insn, CCL1mode)"
+ ""
"@
alc<g>r\t%0,%2
alc<g>\t%0,%2"
@@ -6854,6 +6876,28 @@
alc<g>\t%0,%2"
[(set_attr "op_type" "RRE,RXY")])
+(define_expand "usubc<mode>5"
+ [(match_operand:GPR 0 "register_operand")
+ (match_operand:GPR 1 "nonimmediate_operand")
+ (match_operand:GPR 2 "nonimmediate_operand")
+ (match_operand:GPR 3 "nonimmediate_operand")
+ (match_operand:GPR 4 "general_operand")]
+ "TARGET_Z196"
+{
+ if (operands[4] == const0_rtx)
+ emit_insn (gen_sub<mode>3_borrow_cc (operands[0], operands[2], operands[3]));
+ else
+ {
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_xor<mode>3 (tmp, operands[4], const1_rtx));
+ rtx slb_cond = s390_emit_compare (LEU, tmp, const0_rtx);
+ emit_insn (gen_sub<mode>3_slb_borrow1_cc (operands[0], operands[2], operands[3], slb_cond));
+ }
+ rtx mov_cond = gen_rtx_GTU (<MODE>mode, gen_rtx_REG (CCL2mode, CC_REGNUM), const0_rtx);
+ emit_insn (gen_mov<mode>cc (operands[1], mov_cond, const1_rtx, const0_rtx));
+ DONE;
+})
+
; slbr, slb, slbgr, slbg
(define_insn "*sub<mode>3_slb_cc"
[(set (reg CC_REGNUM)
@@ -6885,6 +6929,23 @@
[(set_attr "op_type" "RRE,RXY")
(set_attr "z10prop" "z10_c,*")])
+; slbr, slb, slbgr, slbg
+(define_insn "sub<mode>3_slb_borrow1_cc"
+ [(set (reg:CCL2 CC_REGNUM)
+ (compare:CCL2
+ (minus:GPR (minus:GPR (match_operand:GPR 1 "nonimmediate_operand" "0,0")
+ (match_operand:GPR 2 "general_operand" "d,T"))
+ (match_operand 3 "s390_slb_comparison" ""))
+ (match_dup 1)))
+ (set (match_operand:GPR 0 "register_operand" "=d,d")
+ (minus:GPR (minus:GPR (match_dup 1) (match_dup 2)) (match_dup 3)))]
+ ""
+ "@
+ slb<g>r\t%0,%2
+ slb<g>\t%0,%2"
+ [(set_attr "op_type" "RRE,RXY")
+ (set_attr "z10prop" "z10_c,*")])
+
(define_expand "add<mode>cc"
[(match_operand:GPR 0 "register_operand" "")
(match_operand 1 "comparison_operator" "")
diff --git a/gcc/testsuite/gcc.target/s390/uaddc-1.c b/gcc/testsuite/gcc.target/s390/uaddc-1.c
new file mode 100644
index 00000000000..1592ee7f75a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/uaddc-1.c
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
+/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
+
+#include <assert.h>
+
+unsigned int __attribute__ ((noipa))
+uaddc (unsigned int x, unsigned int y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned int r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+
+#ifdef __s390x__
+unsigned long __attribute__ ((noipa))
+uaddcl (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+#endif
+
+void
+test_int (void)
+{
+ _Bool c;
+ unsigned int r;
+
+ r = uaddc (0xf0000000u, 0x0fffffffu, 0, &c);
+ assert (r == 0xffffffffu && !c);
+
+ r = uaddc (0xf0000000u, 0x0fffffffu, 1, &c);
+ assert (r == 0 && c);
+
+ r = uaddc (0xf0000001u, 0x0fffffffu, 0, &c);
+ assert (r == 0 && c);
+
+ r = uaddc (0xf0000001u, 0x0fffffffu, 1, &c);
+ assert (r == 1 && c);
+}
+
+#ifdef __s390x__
+void
+test_long (void)
+{
+ _Bool c;
+ unsigned long r;
+
+ r = uaddcl (0xf000000000000000u, 0x0fffffffffffffffu, 0, &c);
+ assert (r == 0xffffffffffffffffu && !c);
+
+ r = uaddcl (0xf000000000000000u, 0x0fffffffffffffffu, 1, &c);
+ assert (r == 0 && c);
+
+ r = uaddcl (0xf000000000000001u, 0x0fffffffffffffffu, 0, &c);
+ assert (r == 0 && c);
+
+ r = uaddcl (0xf000000000000001u, 0x0fffffffffffffffu, 1, &c);
+ assert (r == 1 && c);
+}
+#endif
+
+int
+main (void)
+{
+ test_int ();
+#ifdef __s390x__
+ test_long ();
+#endif
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/s390/uaddc-2.c b/gcc/testsuite/gcc.target/s390/uaddc-2.c
new file mode 100644
index 00000000000..da88733c835
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/uaddc-2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z13 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 6 "optimized" } } */
+/* { dg-final { scan-assembler-times "\\talcr\\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\talcr\\t" 3 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "\\talcgr\\t" 3 { target lp64 } } } */
+
+#define TEST(T, V, OP) \
+ unsigned T \
+ uaddc_##T##_##V (unsigned T x, unsigned T y, _Bool carry_in, _Bool *carry_out) \
+ { \
+ unsigned T r; \
+ _Bool c1 = __builtin_add_overflow (x, y, &r); \
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r); \
+ *carry_out = c1 OP c2; \
+ return r; \
+ }
+
+TEST(int, 1, |)
+TEST(int, 2, ||)
+TEST(int, 3, ^)
+
+TEST(long, 1, |)
+TEST(long, 2, ||)
+TEST(long, 3, ^)
diff --git a/gcc/testsuite/gcc.target/s390/usubc-1.c b/gcc/testsuite/gcc.target/s390/usubc-1.c
new file mode 100644
index 00000000000..87311206f06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/usubc-1.c
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.USUBC \\(" 2 "optimized" { target lp64 } } } */
+/* { dg-final { scan-tree-dump-times "\\.USUBC \\(" 1 "optimized" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\tslbr\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\tslbgr\\t" 1 { target lp64 } } } */
+
+#include <assert.h>
+
+unsigned int __attribute__ ((noipa))
+usubc (unsigned int x, unsigned int y, _Bool borrow_in, _Bool *borrow_out)
+{
+ unsigned int r;
+ _Bool b1 = __builtin_sub_overflow (x, y, &r);
+ _Bool b2 = __builtin_sub_overflow (r, borrow_in, &r);
+ *borrow_out = b1 | b2;
+ return r;
+}
+
+#ifdef __s390x__
+unsigned long __attribute__ ((noipa))
+usubcl (unsigned long x, unsigned long y, _Bool borrow_in, _Bool *borrow_out)
+{
+ unsigned long r;
+ _Bool b1 = __builtin_sub_overflow (x, y, &r);
+ _Bool b2 = __builtin_sub_overflow (r, borrow_in, &r);
+ *borrow_out = b1 | b2;
+ return r;
+}
+#endif
+
+void
+test_int (void)
+{
+ _Bool b;
+ unsigned int r;
+
+ r = usubc (0xfu, 0xfu, 0, &b);
+ assert (r == 0 && !b);
+
+ r = usubc (0xfu, 0xfu, 1, &b);
+ assert (r == 0xffffffffu && b);
+
+ r = usubc (0xfu, 0xffu, 0, &b);
+ assert (r == 0xffffff10u && b);
+
+ r = usubc (0xfu, 0xffu, 1, &b);
+ assert (r == 0xffffff0fu && b);
+}
+
+#ifdef __s390x__
+void
+test_long (void)
+{
+ _Bool b;
+ unsigned long r;
+
+ r = usubcl (0xfu, 0xfu, 0, &b);
+ assert (r == 0 && !b);
+
+ r = usubcl (0xfu, 0xfu, 1, &b);
+ assert (r == 0xffffffffffffffffu && b);
+
+ r = usubcl (0xfu, 0xffu, 0, &b);
+ assert (r == 0xffffffffffffff10u && b);
+
+ r = usubcl (0xfu, 0xffu, 1, &b);
+ assert (r == 0xffffffffffffff0fu && b);
+}
+#endif
+
+int
+main (void)
+{
+ test_int ();
+#ifdef __s390x__
+ test_long ();
+#endif
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/s390/usubc-2.c b/gcc/testsuite/gcc.target/s390/usubc-2.c
new file mode 100644
index 00000000000..382581e4d2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/usubc-2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z13 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.USUBC \\(" 6 "optimized" } } */
+/* { dg-final { scan-assembler-times "\\tslbr\\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\tslbr\\t" 3 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "\\tslbgr\\t" 3 { target lp64 } } } */
+
+#define TEST(T, V, OP) \
+ unsigned T \
+ uaddc_##T##_##V (unsigned T x, unsigned T y, _Bool carry_in, _Bool *carry_out) \
+ { \
+ unsigned T r; \
+ _Bool c1 = __builtin_sub_overflow (x, y, &r); \
+ _Bool c2 = __builtin_sub_overflow (r, carry_in, &r); \
+ *carry_out = c1 OP c2; \
+ return r; \
+ }
+
+TEST(int, 1, |)
+TEST(int, 2, ||)
+TEST(int, 3, ^)
+
+TEST(long, 1, |)
+TEST(long, 2, ||)
+TEST(long, 3, ^)
Hi Stefan,
On 11/12/24 10:35, Stefan Schulze Frielinghaus wrote:
>>> + rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, CC_REGNUM), const0_rtx);
>>> + if (operands[4] == const0_rtx)
>>> + emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], operands[3]));
>>> + else
>> If we would just generate the alc pattern with a carry in of 0, wouldn't the
>> other optimizers be able to figure out that the a normal add would do here?
>> This path does not seem to get exercised by your testcases.
> The zero carry in can occur due to
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a7aec76a74dd38524be325343158d3049b6ab3ac
> which is why we have a special case just in order to emit an add with
> carry out only.
Ok. I was hoping that this downgrade from add with carry in/out to add
with carry out would happen somewhat "automatically", if the the carry
in happens to be a constant zero. But probably not.
Your testcases invokes the pattern also with a constant 0 as carry in,
but since you prevent inlining the pattern is never expanded with a
const0_rtx. The testcase in the commit above is x86 specific, so it
might make sense to add an invocation which triggers the code path
explicitly. Just to be sure.
>> create a CCU mode comparison result, which is currently consumed as CCL1 by
>> the ALC pattern. This seems to be inconsistent. s390_emit_compare returns
>> the condition, which I think needs to be fed into the alc_carry1_cc pattern
>> as input condition.
>>
>> Also is LTU really the correct code here? CCU + LTU would expect CC1, but
>> you want CC2 or CC3 for the carry in, so GTU should be the better choice.
>> s390_alc_comparison should make sure that only valid combinations are
>> accepted, CCU + GTU would be one of them.
> I was coming up with my own condition since conditions created by
> s390_emit_compare() are of void mode which is why the alc predicates
> s390_alc_comparison() failed since these require GPR mode. I've fixed
> that by using PUT_MODE_RAW on those. I think we could also remove the
> mode from the match_operand which might be more appropriate. I've done
> it the latter way for sub<mode>3_slb_borrow1_cc. Once we have settled
> for one or the other version I will align uaddc/usubc.
The PLUS/MINUS arithmetic operations require both operands to have a
proper integer mode. So I think dropping the mode from match_operand
would be wrong. On the other hand, an IF_THEN_ELSE always requires the
comparison to have VOIDmode, that's what s390_emit_compare is supposed
to be used for. So taking what s390_emit_compare generates and changing
the mode is the right thing here. Since PUT_MODE is always a bit ugly,
we might also want go with extending s390_emit_compare with a target
mode operand defaulting to VOIDmode instead.
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
>>> +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
>>> +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
>>> +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
>>> +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
>> Your checks seem to rely on the testcase being compiled with at least
>> -march=z13.
I think for the run test you would have to make sure that the test is
not executed on machines older than z13 then.
/* { dg-do run { target { s390_useable_hw } } } */
Bye,
Andreas
On Mon, Nov 18, 2024 at 09:32:29AM +0100, Andreas Krebbel wrote:
> Hi Stefan,
>
>
> On 11/12/24 10:35, Stefan Schulze Frielinghaus wrote:
> > > > + rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, CC_REGNUM), const0_rtx);
> > > > + if (operands[4] == const0_rtx)
> > > > + emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], operands[3]));
> > > > + else
> > > If we would just generate the alc pattern with a carry in of 0, wouldn't the
> > > other optimizers be able to figure out that the a normal add would do here?
> > > This path does not seem to get exercised by your testcases.
> > The zero carry in can occur due to
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a7aec76a74dd38524be325343158d3049b6ab3ac
> > which is why we have a special case just in order to emit an add with
> > carry out only.
>
> Ok. I was hoping that this downgrade from add with carry in/out to add with
> carry out would happen somewhat "automatically", if the the carry in happens
> to be a constant zero. But probably not.
>
> Your testcases invokes the pattern also with a constant 0 as carry in, but
> since you prevent inlining the pattern is never expanded with a const0_rtx.
> The testcase in the commit above is x86 specific, so it might make sense to
> add an invocation which triggers the code path explicitly. Just to be sure.
Ok, just for the curiosity, my take is that the middle end only emits an
u{add,sub}c optab with a constant zero if a pattern is detected as it
would occur e.g. for bitints, i.e., where multiple limbs are about to be
handled and the first one has a zero carry in. Thus, just invoking the
test case with a zero results in an add-overflow only as expected.
Anyhow, I have added tests u{add,sub}c-3.c in order to test this path,
too.
>
> > > create a CCU mode comparison result, which is currently consumed as CCL1 by
> > > the ALC pattern. This seems to be inconsistent. s390_emit_compare returns
> > > the condition, which I think needs to be fed into the alc_carry1_cc pattern
> > > as input condition.
> > >
> > > Also is LTU really the correct code here? CCU + LTU would expect CC1, but
> > > you want CC2 or CC3 for the carry in, so GTU should be the better choice.
> > > s390_alc_comparison should make sure that only valid combinations are
> > > accepted, CCU + GTU would be one of them.
> > I was coming up with my own condition since conditions created by
> > s390_emit_compare() are of void mode which is why the alc predicates
> > s390_alc_comparison() failed since these require GPR mode. I've fixed
> > that by using PUT_MODE_RAW on those. I think we could also remove the
> > mode from the match_operand which might be more appropriate. I've done
> > it the latter way for sub<mode>3_slb_borrow1_cc. Once we have settled
> > for one or the other version I will align uaddc/usubc.
>
> The PLUS/MINUS arithmetic operations require both operands to have a proper
> integer mode. So I think dropping the mode from match_operand would be
> wrong. On the other hand, an IF_THEN_ELSE always requires the comparison to
> have VOIDmode, that's what s390_emit_compare is supposed to be used for. So
> taking what s390_emit_compare generates and changing the mode is the right
> thing here. Since PUT_MODE is always a bit ugly, we might also want go with
> extending s390_emit_compare with a target mode operand defaulting to
> VOIDmode instead.
Ok, I have added a mode operand to s390_emit_compare() and changed
everything accordingly.
>
> > > > +/* { dg-do run } */
> > > > +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
> > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
> > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
> > > > +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
> > > > +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
> > > Your checks seem to rely on the testcase being compiled with at least
> > > -march=z13.
>
> I think for the run test you would have to make sure that the test is not
> executed on machines older than z13 then.
>
> /* { dg-do run { target { s390_useable_hw } } } */
Didn't know about this neat predicate. Will probably use it more often
in the future :)
I will send a v2 shortly.
Cheers,
Stefan
@@ -6001,14 +6001,14 @@
z10_super_E1,z10_super_E1,z10_super_E1")])
; alr, alfi, slfi, al, aly, alrk, alhsik, algr, algfi, slgfi, alg, alsi, algsi, algrk, alghsik
-(define_insn "*add<mode>3_carry1_cc"
- [(set (reg CC_REGNUM)
- (compare (plus:GPR (match_operand:GPR 1 "nonimmediate_operand" "%0,d, 0, 0,d,0,0,0")
- (match_operand:GPR 2 "general_operand" " d,d,Op,On,K,R,T,C"))
- (match_dup 1)))
- (set (match_operand:GPR 0 "nonimmediate_operand" "=d,d, d, d,d,d,d,d")
+(define_insn "add<mode>3_carry1_cc"
+ [(set (reg:CCL1 CC_REGNUM)
+ (compare:CCL1 (plus:GPR (match_operand:GPR 1 "nonimmediate_operand" "%0,d, 0, 0,d,0,0,0")
+ (match_operand:GPR 2 "general_operand" " d,d,Op,On,K,R,T,C"))
+ (match_dup 1)))
+ (set (match_operand:GPR 0 "nonimmediate_operand" "=d,d, d, d,d,d,d,d")
(plus:GPR (match_dup 1) (match_dup 2)))]
- "s390_match_ccmode (insn, CCL1mode)"
+ ""
"@
al<g>r\t%0,%2
al<g>rk\t%0,%1,%2
@@ -6541,14 +6541,14 @@
(set_attr "z10prop" "z10_super_c_E1,*,z10_super_E1,z10_super_E1")])
; slr, sl, sly, slgr, slg, slrk, slgrk
-(define_insn "*sub<mode>3_borrow_cc"
- [(set (reg CC_REGNUM)
- (compare (minus:GPR (match_operand:GPR 1 "register_operand" "0,d,0,0")
- (match_operand:GPR 2 "general_operand" "d,d,R,T"))
- (match_dup 1)))
- (set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+(define_insn "sub<mode>3_borrow_cc"
+ [(set (reg:CCL2 CC_REGNUM)
+ (compare:CCL2 (minus:GPR (match_operand:GPR 1 "register_operand" "0,d,0,0")
+ (match_operand:GPR 2 "general_operand" "d,d,R,T"))
+ (match_dup 1)))
+ (set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
(minus:GPR (match_dup 1) (match_dup 2)))]
- "s390_match_ccmode (insn, CCL2mode)"
+ ""
"@
sl<g>r\t%0,%2
sl<g>rk\t%0,%1,%2
@@ -6754,22 +6754,50 @@
; add(di|si)cc instruction pattern(s).
;
+(define_expand "uaddc<mode>5"
+ [(match_operand:GPR 0 "register_operand")
+ (match_operand:GPR 1 "nonimmediate_operand")
+ (match_operand:GPR 2 "nonimmediate_operand")
+ (match_operand:GPR 3 "nonimmediate_operand")
+ (match_operand:GPR 4 "general_operand")]
+ "TARGET_Z196 && (<MODE>mode != DImode || TARGET_64BIT)"
+{
+ rtx cond = gen_rtx_LTU (<MODE>mode, gen_rtx_REG (CCL1mode, CC_REGNUM), const0_rtx);
+ if (operands[4] == const0_rtx)
+ emit_insn (gen_add<mode>3_carry1_cc (operands[0], operands[2], operands[3]));
+ else
+ {
+ rtx tmp;
+ if (CONSTANT_P (operands[4]))
+ {
+ tmp = gen_reg_rtx (SImode);
+ emit_move_insn (tmp, operands[4]);
+ }
+ else
+ tmp = operands[4];
+ s390_emit_compare (LTU, tmp, const0_rtx);
+ emit_insn (gen_add<mode>3_alc_carry1_cc (operands[0], operands[2], operands[3], cond));
+ }
+ emit_insn (gen_mov<mode>cc (operands[1], cond, const1_rtx, const0_rtx));
+ DONE;
+})
+
; the following 4 patterns are used when the result of an add with
; carry is checked for an overflow condition
; op1 + op2 + c < op1
; alcr, alc, alcgr, alcg
-(define_insn "*add<mode>3_alc_carry1_cc"
- [(set (reg CC_REGNUM)
- (compare
+(define_insn "add<mode>3_alc_carry1_cc"
+ [(set (reg:CCL1 CC_REGNUM)
+ (compare:CCL1
(plus:GPR (plus:GPR (match_operand:GPR 3 "s390_alc_comparison" "")
(match_operand:GPR 1 "nonimmediate_operand" "%0,0"))
(match_operand:GPR 2 "general_operand" "d,T"))
(match_dup 1)))
(set (match_operand:GPR 0 "register_operand" "=d,d")
(plus:GPR (plus:GPR (match_dup 3) (match_dup 1)) (match_dup 2)))]
- "s390_match_ccmode (insn, CCL1mode)"
+ ""
"@
alc<g>r\t%0,%2
alc<g>\t%0,%2"
@@ -6854,6 +6882,28 @@
alc<g>\t%0,%2"
[(set_attr "op_type" "RRE,RXY")])
+(define_expand "usubc<mode>5"
+ [(match_operand:GPR 0 "register_operand")
+ (match_operand:GPR 1 "nonimmediate_operand")
+ (match_operand:GPR 2 "nonimmediate_operand")
+ (match_operand:GPR 3 "nonimmediate_operand")
+ (match_operand:GPR 4 "general_operand")]
+ "TARGET_Z196 && (<MODE>mode != DImode || TARGET_64BIT)"
+{
+ rtx cond = gen_rtx_GTU (<MODE>mode, gen_rtx_REG (CCL2mode, CC_REGNUM), const0_rtx);
+ if (operands[4] == const0_rtx)
+ emit_insn (gen_sub<mode>3_borrow_cc (operands[0], operands[2], operands[3]));
+ else
+ {
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_xor<mode>3 (tmp, operands[4], const1_rtx));
+ s390_emit_compare (GTU, tmp, const0_rtx);
+ emit_insn (gen_sub<mode>3_slb_borrow1_cc (operands[0], operands[2], operands[3], cond));
+ }
+ emit_insn (gen_mov<mode>cc (operands[1], cond, const1_rtx, const0_rtx));
+ DONE;
+})
+
; slbr, slb, slbgr, slbg
(define_insn "*sub<mode>3_slb_cc"
[(set (reg CC_REGNUM)
@@ -6885,6 +6935,23 @@
[(set_attr "op_type" "RRE,RXY")
(set_attr "z10prop" "z10_c,*")])
+; slbr, slb, slbgr, slbg
+(define_insn "sub<mode>3_slb_borrow1_cc"
+ [(set (reg:CCL2 CC_REGNUM)
+ (compare:CCL2
+ (minus:GPR (minus:GPR (match_operand:GPR 1 "nonimmediate_operand" "0,0")
+ (match_operand:GPR 2 "general_operand" "d,T"))
+ (match_operand:GPR 3 "s390_slb_comparison" ""))
+ (match_dup 1)))
+ (set (match_operand:GPR 0 "register_operand" "=d,d")
+ (minus:GPR (minus:GPR (match_dup 1) (match_dup 2)) (match_dup 3)))]
+ ""
+ "@
+ slb<g>r\t%0,%2
+ slb<g>\t%0,%2"
+ [(set_attr "op_type" "RRE,RXY")
+ (set_attr "z10prop" "z10_c,*")])
+
(define_expand "add<mode>cc"
[(match_operand:GPR 0 "register_operand" "")
(match_operand 1 "comparison_operator" "")
new file mode 100644
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
+/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
+
+#include <assert.h>
+
+unsigned int __attribute__ ((noipa))
+uaddc (unsigned int x, unsigned int y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned int r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+
+#ifdef __s390x__
+unsigned long __attribute__ ((noipa))
+uaddcl (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+#endif
+
+void
+test_int (void)
+{
+ _Bool c;
+ unsigned int r;
+
+ r = uaddc (0xf0000000u, 0x0fffffffu, 0, &c);
+ assert (r == 0xffffffffu && !c);
+
+ r = uaddc (0xf0000000u, 0x0fffffffu, 1, &c);
+ assert (r == 0 && c);
+
+ r = uaddc (0xf0000001u, 0x0fffffffu, 0, &c);
+ assert (r == 0 && c);
+
+ r = uaddc (0xf0000001u, 0x0fffffffu, 1, &c);
+ assert (r == 1 && c);
+}
+
+#ifdef __s390x__
+void
+test_long (void)
+{
+ _Bool c;
+ unsigned long r;
+
+ r = uaddcl (0xf000000000000000u, 0x0fffffffffffffffu, 0, &c);
+ assert (r == 0xffffffffffffffffu && !c);
+
+ r = uaddcl (0xf000000000000000u, 0x0fffffffffffffffu, 1, &c);
+ assert (r == 0 && c);
+
+ r = uaddcl (0xf000000000000001u, 0x0fffffffffffffffu, 0, &c);
+ assert (r == 0 && c);
+
+ r = uaddcl (0xf000000000000001u, 0x0fffffffffffffffu, 1, &c);
+ assert (r == 1 && c);
+}
+#endif
+
+int
+main (void)
+{
+ test_int ();
+#ifdef __s390x__
+ test_long ();
+#endif
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 6 "optimized" } } */
+/* { dg-final { scan-assembler-times "\\talcr\\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\talcr\\t" 3 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "\\talcgr\\t" 3 { target lp64 } } } */
+
+#define TEST(T, V, OP) \
+ unsigned T \
+ uaddc_##T##_##V (unsigned T x, unsigned T y, _Bool carry_in, _Bool *carry_out) \
+ { \
+ unsigned T r; \
+ _Bool c1 = __builtin_add_overflow (x, y, &r); \
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r); \
+ *carry_out = c1 OP c2; \
+ return r; \
+ }
+
+TEST(int, 1, |)
+TEST(int, 2, ||)
+TEST(int, 3, ^)
+
+TEST(long, 1, |)
+TEST(long, 2, ||)
+TEST(long, 3, ^)
new file mode 100644
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.USUBC \\(" 2 "optimized" { target lp64 } } } */
+/* { dg-final { scan-tree-dump-times "\\.USUBC \\(" 1 "optimized" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\tslbr\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\tslbgr\\t" 1 { target lp64 } } } */
+
+#include <assert.h>
+
+unsigned int __attribute__ ((noipa))
+usubc (unsigned int x, unsigned int y, _Bool borrow_in, _Bool *borrow_out)
+{
+ unsigned int r;
+ _Bool b1 = __builtin_sub_overflow (x, y, &r);
+ _Bool b2 = __builtin_sub_overflow (r, borrow_in, &r);
+ *borrow_out = b1 | b2;
+ return r;
+}
+
+#ifdef __s390x__
+unsigned long __attribute__ ((noipa))
+usubcl (unsigned long x, unsigned long y, _Bool borrow_in, _Bool *borrow_out)
+{
+ unsigned long r;
+ _Bool b1 = __builtin_sub_overflow (x, y, &r);
+ _Bool b2 = __builtin_sub_overflow (r, borrow_in, &r);
+ *borrow_out = b1 | b2;
+ return r;
+}
+#endif
+
+void
+test_int (void)
+{
+ _Bool b;
+ unsigned int r;
+
+ r = usubc (0xfu, 0xfu, 0, &b);
+ assert (r == 0 && !b);
+
+ r = usubc (0xfu, 0xfu, 1, &b);
+ assert (r == 0xffffffffu && b);
+
+ r = usubc (0xfu, 0xffu, 0, &b);
+ assert (r == 0xffffff10u && b);
+
+ r = usubc (0xfu, 0xffu, 1, &b);
+ assert (r == 0xffffff0fu && b);
+}
+
+#ifdef __s390x__
+void
+test_long (void)
+{
+ _Bool b;
+ unsigned long r;
+
+ r = usubcl (0xfu, 0xfu, 0, &b);
+ assert (r == 0 && !b);
+
+ r = usubcl (0xfu, 0xfu, 1, &b);
+ assert (r == 0xffffffffffffffffu && b);
+
+ r = usubcl (0xfu, 0xffu, 0, &b);
+ assert (r == 0xffffffffffffff10u && b);
+
+ r = usubcl (0xfu, 0xffu, 1, &b);
+ assert (r == 0xffffffffffffff0fu && b);
+}
+#endif
+
+int
+main (void)
+{
+ test_int ();
+#ifdef __s390x__
+ test_long ();
+#endif
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\\.USUBC \\(" 6 "optimized" } } */
+/* { dg-final { scan-assembler-times "\\tslbr\\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\\tslbr\\t" 3 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "\\tslbgr\\t" 3 { target lp64 } } } */
+
+#define TEST(T, V, OP) \
+ unsigned T \
+ uaddc_##T##_##V (unsigned T x, unsigned T y, _Bool carry_in, _Bool *carry_out) \
+ { \
+ unsigned T r; \
+ _Bool c1 = __builtin_sub_overflow (x, y, &r); \
+ _Bool c2 = __builtin_sub_overflow (r, carry_in, &r); \
+ *carry_out = c1 OP c2; \
+ return r; \
+ }
+
+TEST(int, 1, |)
+TEST(int, 2, ||)
+TEST(int, 3, ^)
+
+TEST(long, 1, |)
+TEST(long, 2, ||)
+TEST(long, 3, ^)