From patchwork Tue Nov 28 03:29:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 80859 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8C57E3857BAA for ; Tue, 28 Nov 2023 03:30:39 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 35F19385829D for ; Tue, 28 Nov 2023 03:29:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 35F19385829D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 35F19385829D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142201; cv=none; b=dcTwSTMnAr+NIOBXfQCDItDtLj9r5Ri+BHxp6fWXaty8LlG0Or7ugJpcM014PxgcMoR2qJefjiZbXia+6nOS3OCZ0cpkcj2IjRAv+0508W1Iq1hQmzYaQ8Xz8srozQEqnKfL/jhxsMFeMdOGi0Mc7kGiBl760R7jgxaUxsb55hE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142201; c=relaxed/simple; bh=NdRC/7aPyWpOhi1zvHFvCN8RuMJgun8WqtC2hnvGCu8=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=cfnmTlfOlmeaK3rI1CMHQRnUR1Go1mRoRp34acTSOx16kMwPJJ2Qf5Du0COjN5ytPbHVCFRwEvlQQiZr5rOHmSZQmmbezI/XxRRHIsG34RrOCEfFVNa0NOfHOte33BvZkLw54t6YZwk1WNjMYyDjwqDaWUYMizTaNkcz9FtChAI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8CxtPCyXmVlD049AA--.57265S3; Tue, 28 Nov 2023 11:29:54 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxnd6lXmVlM2ZOAA--.43637S8; Tue, 28 Nov 2023 11:29:52 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math. Date: Tue, 28 Nov 2023 11:29:37 +0800 Message-Id: <20231128032938.17202-5-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231128032938.17202-1-xujiahao@loongson.cn> References: <20231128032938.17202-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxnd6lXmVlM2ZOAA--.43637S8 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfCr47Xr48tF15KrW5Wr45Jwc_yoW5Xw48Zo WrtF4DJ3W8GryF939rKrs3Zry8X3WUAr4xAay3ZwnYyFs7Jr98t3sF9a1Yv343Ar9rWry5 C3s7W3ZxZa4xJa1kl-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYj7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6r4j6r4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI0UMc 02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUtVWrXwAv7VC2z280aVAF wI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28IcxkI7V AKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCj r7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42IY6x IIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAI w20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x 0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU8QJ57UUUUU== X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org When -mrecip option is turned on, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations for better throughput. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lasx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype. (loongarch_emit_swdivsf): Ditto. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set recip_mask for -mrecip and -mrecip= options. (loongarch_emit_swrsqrtsf): New function. (loongarch_emit_swdivsf): Ditto. (use_rsqrt_p): Ditto. (loongarch_optab_supported_p): Ditto. (TARGET_OPTAB_SUPPORTED_P): New hook. * config/loongarch/loongarch.h (RECIP_MASK_NONE): New bitmasks. (RECIP_MASK_DIV): Ditto. (RECIP_MASK_SQRT): Ditto. (RECIP_MASK_RSQRT): Ditto. (RECIP_MASK_VEC_DIV): Ditto. (RECIP_MASK_VEC_SQRT): Ditto. (RECIP_MASK_VEC_RSQRT): Ditto. (RECIP_MASK_ALL): Ditto. (TARGET_RECIP_DIV): New tests. (TARGET_RECIP_SQRT): Ditto. (TARGET_RECIP_RSQRT): Ditto. (TARGET_RECIP_VEC_DIV): Ditto. (TARGET_RECIP_VEC_SQRT): Ditto. (TARGET_RECIP_VEC_RSQRT): Ditto. * config/loongarch/loongarch.md (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch.opt (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lsx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate. * doc/invoke.texi (LoongArch Options): Document new options. gcc/testsuite/ChangeLog: * gcc.target/loongarch/recip-divf.c: New test. * gcc.target/loongarch/recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test. diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 8af6cc6f532..cc1a9daf7cf 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -197,6 +200,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 2d7b8f02b4b..08c81ef53e4 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1249,7 +1249,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLASX 0 "register_operand") + (div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand") + (match_operand:FLASX 2 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V8SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLASX 0 "register_operand" "=f") (div:FLASX (match_operand:FLASX 1 "register_operand" "f") (match_operand:FLASX 2 "register_operand" "f")))] @@ -1278,7 +1296,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLASX 0 "register_operand") + (sqrt:FLASX (match_operand:FLASX 1 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (sqrt:FLASX (match_operand:FLASX 1 "register_operand" "f")))] "ISA_HAS_LASX" @@ -1710,7 +1744,20 @@ (define_insn "lasx_xvfrint_" [(set_attr "type" "simd_fcvt") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] + "ISA_HAS_LASX" + { + if (mode == V8SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] UNSPEC_LASX_XVFRSQRT))] diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index cb8fc36b086..f2ff93b5e10 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -220,5 +220,7 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int); extern tree loongarch_build_builtin_va_list (void); extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool); +extern void loongarch_emit_swrsqrtsf (rtx, rtx, machine_mode, bool); +extern void loongarch_emit_swdivsf (rtx, rtx, rtx, machine_mode); extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d3896d72bc2..afee09c3b61 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -7547,6 +7547,69 @@ loongarch_option_override_internal (struct gcc_options *opts, /* Function to allocate machine-dependent function status. */ init_machine_status = &loongarch_init_machine_status; + + /* -mrecip options. */ + static struct + { + const char *string; /* option name. */ + unsigned int mask; /* mask bits to set. */ + } + const recip_options[] = { + { "all", RECIP_MASK_ALL }, + { "none", RECIP_MASK_NONE }, + { "div", RECIP_MASK_DIV }, + { "sqrt", RECIP_MASK_SQRT }, + { "rsqrt", RECIP_MASK_RSQRT }, + { "vec-div", RECIP_MASK_VEC_DIV }, + { "vec-sqrt", RECIP_MASK_VEC_SQRT }, + { "vec-rsqrt", RECIP_MASK_VEC_RSQRT }, + }; + + if (loongarch_recip_name) + { + char *p = ASTRDUP (loongarch_recip_name); + char *q; + unsigned int mask, i; + bool invert; + + while ((q = strtok (p, ",")) != NULL) + { + p = NULL; + if (*q == '!') + { + invert = true; + q++; + } + else + invert = false; + + if (!strcmp (q, "default")) + mask = RECIP_MASK_ALL; + else + { + for (i = 0; i < ARRAY_SIZE (recip_options); i++) + if (!strcmp (q, recip_options[i].string)) + { + mask = recip_options[i].mask; + break; + } + + if (i == ARRAY_SIZE (recip_options)) + { + error ("unknown option for %<-mrecip=%s%>", q); + invert = false; + mask = RECIP_MASK_NONE; + } + } + + if (invert) + recip_mask &= ~mask; + else + recip_mask |= mask; + } + } + if (loongarch_recip) + recip_mask |= RECIP_MASK_ALL; } @@ -11443,6 +11506,156 @@ loongarch_build_signbit_mask (machine_mode mode, bool vect, bool invert) return force_reg (vec_mode, v); } +/* Use rsqrte instruction and Newton-Rhapson to compute the approximation of + a single precision floating point [reciprocal] square root. */ + +void loongarch_emit_swrsqrtsf (rtx res, rtx a, machine_mode mode, bool recip) +{ + rtx x0, e0, e1, e2, mhalf, monehalf; + REAL_VALUE_TYPE r; + int unspec; + + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + e1 = gen_reg_rtx (mode); + e2 = gen_reg_rtx (mode); + + real_arithmetic (&r, ABS_EXPR, &dconsthalf, NULL); + mhalf = const_double_from_real_value (r, SFmode); + + real_arithmetic (&r, PLUS_EXPR, &dconsthalf, &dconst1); + monehalf = const_double_from_real_value (r, SFmode); + unspec = UNSPEC_RSQRTE; + + if (VECTOR_MODE_P (mode)) + { + mhalf = loongarch_build_const_vector (mode, true, mhalf); + monehalf = loongarch_build_const_vector (mode, true, monehalf); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRSQRTE + : UNSPEC_LSX_VFRSQRTE; + } + + /* rsqrt(a) = rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) + sqrt(a) = a * rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) */ + + a = force_reg (mode, a); + + /* x0 = rsqrt(a) estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, a), + unspec))); + + /* If (a == 0.0) Filter out infinity to prevent NaN for sqrt(0.0). */ + if (!recip) + { + rtx zero = force_reg (mode, CONST0_RTX (mode)); + + if (VECTOR_MODE_P (mode)) + { + machine_mode imode = related_int_vector_mode (mode).require (); + rtx mask = gen_reg_rtx (imode); + emit_insn (gen_rtx_SET (mask, gen_rtx_NE (imode, a, zero))); + emit_insn (gen_rtx_SET (x0, gen_rtx_AND (mode, x0, + gen_lowpart (mode, mask)))); + } + else + { + rtx target = emit_conditional_move (x0, { GT, a, zero, mode }, + x0, zero, mode, 0); + if (target != x0) + emit_move_insn (x0, target); + } + } + + /* e0 = x0 * a */ + emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a))); + /* e1 = e0 * x0 */ + emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0))); + + /* e2 = 1.5 - e1 * 0.5 */ + mhalf = force_reg (mode, mhalf); + monehalf = force_reg (mode, monehalf); + emit_insn (gen_rtx_SET (e2, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, e1), + mhalf, monehalf))); + + if (recip) + /* res = e2 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, x0, e2))); + else + /* res = e2 * e0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e2, e0))); +} + +/* Use recipe instruction and Newton-Rhapson to compute the approximation of + a single precision floating point divide. */ + +void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode) +{ + rtx x0, e0, mtwo; + REAL_VALUE_TYPE r; + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + int unspec = UNSPEC_RECIPE; + + real_arithmetic (&r, ABS_EXPR, &dconst2, NULL); + mtwo = const_double_from_real_value (r, SFmode); + + if (VECTOR_MODE_P (mode)) + { + mtwo = loongarch_build_const_vector (mode, true, mtwo); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRECIPE + : UNSPEC_LSX_VFRECIPE; + } + + mtwo = force_reg (mode, mtwo); + + /* a / b = a * recipe(b) * (2.0 - b * recipe(b)) */ + + /* x0 = 1./b estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b), + unspec))); + /* 2.0 - b * x0 */ + emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, b), x0, mtwo))); + + /* x0 = a * x0 */ + if (a != CONST1_RTX (mode)) + emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0))); + + /* res = e0 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0))); +} + +/* Return true if it is safe to use the rsqrt optabs to optimize + 1.0/sqrt. */ + +static bool +use_rsqrt_p (machine_mode mode) +{ + if (TARGET_RECIP_RSQRT && GET_MODE_INNER (mode) == SFmode) + return (flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations); + else + return true; +} + +/* Implement the TARGET_OPTAB_SUPPORTED_P hook. */ + +static bool +loongarch_optab_supported_p (int op, machine_mode mode1, machine_mode, + optimization_type opt_type) +{ + switch (op) + { + case rsqrt_optab: + return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1); + + default: + return true; + } +} + static bool loongarch_builtin_support_vector_misalignment (machine_mode mode, const_tree type, @@ -11610,6 +11823,9 @@ loongarch_asm_code_end (void) #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \ loongarch_autovectorize_vector_modes +#undef TARGET_OPTAB_SUPPORTED_P +#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS loongarch_init_builtins #undef TARGET_BUILTIN_DECL diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index 115222e70fd..9910084a6a5 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -700,6 +700,24 @@ enum reg_class && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) +#define RECIP_MASK_NONE 0x00 +#define RECIP_MASK_DIV 0x01 +#define RECIP_MASK_SQRT 0x02 +#define RECIP_MASK_RSQRT 0x04 +#define RECIP_MASK_VEC_DIV 0x08 +#define RECIP_MASK_VEC_SQRT 0x10 +#define RECIP_MASK_VEC_RSQRT 0x20 +#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ + | RECIP_MASK_RSQRT | RECIP_MASK_VEC_SQRT \ + | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_RSQRT) + +#define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_RSQRT ((recip_mask & RECIP_MASK_RSQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_DIV ((recip_mask & RECIP_MASK_VEC_DIV) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_RSQRT ((recip_mask & RECIP_MASK_VEC_RSQRT) != 0 || TARGET_uARCH_LA664) + /* 1 if N is a possible register number for function argument passing. We have no FP argument registers when soft-float. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 0b6910d84ab..ce4d2d9ad06 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -896,9 +896,21 @@ (define_peephole ;; Float division and modulus. (define_expand "div3" [(set (match_operand:ANYF 0 "register_operand") - (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") - (match_operand:ANYF 2 "register_operand")))] - "") + (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") + (match_operand:ANYF 2 "register_operand")))] + "" +{ + if (mode == SFmode + && TARGET_RECIP_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], SFmode); + DONE; + } +}) (define_insn "*div3" [(set (match_operand:ANYF 0 "register_operand" "=f") @@ -1129,7 +1141,23 @@ (define_insn "*fnma4" ;; ;; .................... -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (sqrt:ANYF (match_operand:ANYF 1 "register_operand")))] + "" + { + if (mode == SFmode + && TARGET_RECIP_SQRT + && flag_unsafe_math_optimizations + && !optimize_insn_for_size_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 0); + DONE; + } + }) + +(define_insn "*sqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (sqrt:ANYF (match_operand:ANYF 1 "register_operand" "f")))] "" @@ -1138,6 +1166,19 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) +(define_expand "rsqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" +{ + if (mode == SFmode && TARGET_RECIP_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 1); + DONE; + } +}) + (define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 4d36e3ec4de..3c4fbda18ee 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -31,6 +31,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -205,6 +208,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 20946326e37..dc78837ecf5 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1153,7 +1153,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLSX 0 "register_operand") + (div:FLSX (match_operand:FLSX 1 "reg_or_vecotr_1_operand") + (match_operand:FLSX 2 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V4SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLSX 0 "register_operand" "=f") (div:FLSX (match_operand:FLSX 1 "register_operand" "f") (match_operand:FLSX 2 "register_operand" "f")))] @@ -1182,7 +1200,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLSX 0 "register_operand") + (sqrt:FLSX (match_operand:FLSX 1 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (sqrt:FLSX (match_operand:FLSX 1 "register_operand" "f")))] "ISA_HAS_LSX" @@ -1638,7 +1672,20 @@ (define_insn "lsx_vfrint_" [(set_attr "type" "simd_fcvt") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] UNSPEC_LSX_VFRSQRT))] diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index f7796da10b2..9e9ce58cb53 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) +(define_predicate "reg_or_vecotr_1_operand" + (ior (match_operand 0 "const_vector_1_operand") + (match_operand 0 "register_operand"))) + ;; These are used in vec_merge, hence accept bitmask as const_int. (define_predicate "const_exp_2_operand" (and (match_code "const_int") diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 2e6bac37f3c..fd2a3e2a848 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1209,6 +1209,7 @@ Objective-C and Objective-C++ Dialects}. -msoft-float -mhard-float -mdouble-float -munordered-float -mcmov -mror -mrori -msext -msfimm -mshftimm -mcmodel=@var{code-model}} +-mrecip -mrecip=@var{opt} @emph{PDP-11 Options} @gccoptlist{-mfpu -msoft-float -mac0 -mno-ac0 -m40 -m45 -m10 @@ -26523,6 +26524,57 @@ detecting corresponding assembler support: This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one. +@opindex mrecip +@item -mrecip +This option enables use of the reciprocal estimate and reciprocal square +root estimate instructions with additional Newton-Raphson steps to increase +precision instead of doing a divide or square root and divide for +floating-point arguments. +These instructions are generated only when @option{-funsafe-math-optimizations} +is enabled together with @option{-ffinite-math-only} and +@option{-fno-trapping-math}. +Note that while the throughput of the sequence is higher than the throughput of +the non-reciprocal instruction, the precision of the sequence can be decreased +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). + +@opindex mrecip=opt +@item -mrecip=@var{opt} +This option controls which reciprocal estimate instructions +may be used. @var{opt} is a comma-separated list of options, which may +be preceded by a @samp{!} to invert the option: + +@table @samp +@item all +Enable all estimate instructions. + +@item default +Enable the default instructions, equivalent to @option{-mrecip}. + +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. + +@item div +Enable the approximation for scalar division. + +@item vec-div +Enable the approximation for vectorized division. + +@item sqrt +Enable the approximation for scalar square root. + +@item vec-sqrt +Enable the approximation for vectorized square root. + +@item rsqrt +Enable the approximation for scalar reciprocal square root. + +@item vec-rsqrt +Enable the approximation for vectorized reciprocal square root. +@end table + +So, for example, @option{-mrecip=all,!sqrt} enables +all of the reciprocal approximations, except for scalar square root. + @item loongarch-vect-unroll-limit The vectorizer will use available tuning information to determine whether it would be beneficial to unroll the main vectorized loop and by how much. This diff --git a/gcc/testsuite/gcc.target/loongarch/recip-divf.c b/gcc/testsuite/gcc.target/loongarch/recip-divf.c new file mode 100644 index 00000000000..82b3224a250 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-divf.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -march=la664" } */ +/* { dg-final { scan-assembler "frecipe.s" } } */ + +float +foo(float a, float b) +{ + return a / b; +} diff --git a/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c new file mode 100644 index 00000000000..23ceb4cd261 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -march=la664" } */ +/* { dg-final { scan-assembler-times "frsqrte.s" 3 } } */ + +extern float sqrtf (float); + +float +foo1 (float a, float b) +{ + return a/sqrtf(b); +} + +float +foo2 (float a, float b) +{ + return sqrtf(a/b); +} + +float +foo3 (float a) +{ + return sqrtf(a); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c new file mode 100644 index 00000000000..6ca72a1ce81 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -march=la664" } */ +/* { dg-final { scan-assembler "xvfrecipe.s" } } */ + +float a[8],b[8],c[8]; + +void +foo () +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c new file mode 100644 index 00000000000..5f1c4e3d164 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -march=la664" } */ +/* { dg-final { scan-assembler-times "xvfrsqrte.s" 3 } } */ + +float a[8], b[8], c[8]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c new file mode 100644 index 00000000000..015dafb50f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -march=la664" } */ +/* { dg-final { scan-assembler "vfrecipe.s" } } */ + +float a[4],b[4],c[4]; + +void +foo () +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c new file mode 100644 index 00000000000..5a1a14e291c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -march=la664" } */ +/* { dg-final { scan-assembler-times "vfrsqrte.s" 3 } } */ + +float a[4], b[4], c[4]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i]); +}