[RFC] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support

  Hi!

Here is more complete patch to add std::bfloat16_t support on
x86, AArch64 and (only partially) on ARM 32-bit.  No BFmode optabs
are added by the patch, so for binops/unops it extends to SFmode
first and then truncates back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
of all those conversions so that we avoid double rounding, for
BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
it emits BFmode -> SFmode conversion first and then converts to the even
wider mode, neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is subset or superset
of the other, while SFmode is superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions if we don't optimize for space (and for the latter
if -frounding-math isn't enabled either).
For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
In C by default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so truncate only on explicit casts and assignments.
In C++ unfortunately (but that is the case of also _Float16) we don't
support excess precision yet which means that for
__bf16 (__bf16 a, __bf16 b, __bf16 c, __bf16 d) { return a * b + c * d; }
we do a lot of conversions.
The aarch64 part is untested but has a chance of working (IMHO),
though I'd appreciate if ARM maintainers could decide whether it is
acceptable for them that __bf16 changes mangling and will allow arithmetics
and conversions.
The arm part is partial, libgcc side is missing as the target doesn't really
seem to use soft-fp right now.  Perhaps the config/arm/ changes can be
left out from the patch (thus keep ARM 32-bit __bf16 as before) and support
for it can be done at some later time.

Thoughts on this?

2022-09-29  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* config/arm/arm.h (arm_bf16_type_node): Remove.
	(arm_bf16_ptr_type_node): Adjust comment.
	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	(arm_mangle_type): Mangle BFmode as DFb16_.
	(arm_invalid_conversion): Only reject BF <-> HF conversions if
	HFmode is non-IEEE format.
	(arm_invalid_unary_op, arm_invalid_binary_op): Remove.
	* config/arm/arm-builtins.cc (arm_bf16_type_node): Remove.
	(arm_simd_builtin_std_type): Use bfloat16_type_node rather than
	arm_bf16_type_node.
	(arm_init_simd_builtin_types): Likewise.
	(arm_init_simd_builtin_scalar_types): Likewise.
	(arm_init_bf16_types): Likewise.
	* config/i386/i386.cc (ix86_mangle_type): Mangle BFmode as DFb16_.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
	(aarch64_bf16_ptr_type_node): Adjust comment.
	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
	bfloat16_type_node rather than aarch64_bf16_type_node.
	(aarch64_mangle_type): Mangle BFmode as DFb16_.
	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
	aarch64_invalid_binary_op): Remove BFmode related rejections.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
	aarch64_bf16_type_node.
	(aarch64_init_simd_builtin_types, aarch64_init_bf16_types): Likewise.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine for C++ __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/arm/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DFb16_.
	* testsuite/demangle-expected (_Z3xxxDFb16_): New test.

	Jakub

Message ID	YzXABvJX2wl3gHkK@tucnak
State	New
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 75A983857BBF Date: Thu, 29 Sep 2022 17:55:50 +0200 To: Jason Merrill <jason@redhat.com>, "Joseph S. Myers" <joseph@codesourcery.com>, Hongtao Liu <crazylht@gmail.com>, hjl.tools@gmail.com, Richard Earnshaw <richard.earnshaw@arm.com>, Kyrylo Tkachov <kyrylo.tkachov@arm.com>, richard.sandiford@arm.com Subject: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support Message-ID: <YzXABvJX2wl3gHkK@tucnak> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Precedence: list From: Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Jakub Jelinek <jakub@redhat.com> Cc: gcc-patches@gcc.gnu.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
Series	[RFC] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support \| [RFC] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support

[RFC] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and __bf16 arithmetic support

Commit Message

Comments

Patch