[RFC,1/2] RISC-V: Add support for _Bfloat16.

Message ID 20230919084444.2089-1-jinma@linux.alibaba.com
State Superseded
Delegated to: Jeff Law
Series [RFC,1/2] RISC-V: Add support for _Bfloat16.

Commit Message

Jin Ma Sept. 19, 2023, 8:44 a.m. UTC
  gcc/ChangeLog:

	* config/riscv/iterators.md (HFBF): New.
	* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
	Initialize data type _Bfloat16.
	* config/riscv/riscv-modes.def (FLOAT_MODE): New.
	(ADJUST_FLOAT_FORMAT): New.
	* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
	(riscv_scalar_mode_supported_p): Ditto.
	(riscv_libgcc_floating_mode_supported_p): Ditto.
	(riscv_block_arith_comp_libfuncs_for_mode): New.
	(riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
	* config/riscv/riscv.md (mode): Add BF.
	(truncdfbf2): New.
	(movhf): Support for BFmode.
	(mov<mode>): Ditto.
	(*mov<mode>_softfloat):  Ditto.
	(fix_truncbf<GPR:mode>2): New.
	(fixuns_truncbf<GPR:mode>2): New.
	(float<mode>bf2): New.
	(floatuns<mode>bf2): New.

libgcc/ChangeLog:

	* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
	(_FP_NANSIGN_B): New.
	* config/riscv/t-softfp32: Add support for BF libfuncs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/bf16_arithmetic.c: New test.
	* gcc.target/riscv/bf16_call.c: New test.
	* gcc.target/riscv/bf16_comparisons.c: New test.
	* gcc.target/riscv/bf16_convert-1.c: New test.
	* gcc.target/riscv/bf16_convert-2.c: New test.
	* gcc.target/riscv/bf16_convert_run.c: New test.
---
 gcc/config/riscv/iterators.md                 |   2 +
 gcc/config/riscv/riscv-builtins.cc            |  16 ++
 gcc/config/riscv/riscv-modes.def              |   4 +
 gcc/config/riscv/riscv.cc                     |  93 ++++++++--
 gcc/config/riscv/riscv.md                     |  94 ++++++++--
 .../gcc.target/riscv/bf16_arithmetic.c        |  36 ++++
 gcc/testsuite/gcc.target/riscv/bf16_call.c    |  17 ++
 .../gcc.target/riscv/bf16_comparisons.c       |  25 +++
 .../gcc.target/riscv/bf16_convert-1.c         |  39 +++++
 .../gcc.target/riscv/bf16_convert-2.c         |  38 ++++
 .../gcc.target/riscv/bf16_convert_run.c       | 163 ++++++++++++++++++
 libgcc/config/riscv/sfp-machine.h             |   3 +
 libgcc/config/riscv/t-softfp32                |   7 +-
 13 files changed, 503 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_call.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_comparisons.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_convert-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_convert-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16_convert_run.c
  

Comments

Jeff Law Sept. 29, 2023, 5:46 p.m. UTC | #1
On 9/19/23 02:44, Jin Ma wrote:
> gcc/ChangeLog:
> 
> 	* config/riscv/iterators.md (HFBF): New.
> 	* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
> 	Initialize data type _Bfloat16.
> 	* config/riscv/riscv-modes.def (FLOAT_MODE): New.
> 	(ADJUST_FLOAT_FORMAT): New.
> 	* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
> 	(riscv_scalar_mode_supported_p): Ditto.
> 	(riscv_libgcc_floating_mode_supported_p): Ditto.
> 	(riscv_block_arith_comp_libfuncs_for_mode): New.
> 	(riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
> 	* config/riscv/riscv.md (mode): Add BF.
> 	(truncdfbf2): New.
> 	(movhf): Support for BFmode.
> 	(mov<mode>): Ditto.
> 	(*mov<mode>_softfloat):  Ditto.
> 	(fix_truncbf<GPR:mode>2): New.
> 	(fixuns_truncbf<GPR:mode>2): New.
> 	(float<mode>bf2): New.
> 	(floatuns<mode>bf2): New.
> 
> libgcc/ChangeLog:
> 
> 	* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
> 	(_FP_NANSIGN_B): New.
> 	* config/riscv/t-softfp32: Add support for BF libfuncs.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/bf16_arithmetic.c: New test.
> 	* gcc.target/riscv/bf16_call.c: New test.
> 	* gcc.target/riscv/bf16_comparisons.c: New test.
> 	* gcc.target/riscv/bf16_convert-1.c: New test.
> 	* gcc.target/riscv/bf16_convert-2.c: New test.
> 	* gcc.target/riscv/bf16_convert_run.c: New test.
So this can't go in the tree until the extension has moved into a frozen 
state.  Hopefully that'll happen before we close stage1 development in Nov.



> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index e00b8ee3579..5048628c784 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -1631,6 +1631,20 @@ (define_insn "truncdfhf2"
>     [(set_attr "type" "fcvt")
>      (set_attr "mode" "HF")])
>   
> +;; The conversion of DF to BF needs to be done with SF if there is a
> +;; chance to generate at least one instruction, otherwise just using
> +;; libfunc __truncdfbf2.
> +(define_expand "truncdfbf2"
> +  [(set (match_operand:BF     0 "register_operand" "=f")
> +       (float_truncate:BF
> +           (match_operand:DF 1 "register_operand" " f")))]
> +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
> +  {
> +    convert_move (operands[0],
> +		  convert_modes (SFmode, DFmode, operands[1], 0), 0);
> +    DONE;
> +  })
So for conversions to/from BFmode, doesn't generic code take care of 
this for us?  Search for convert_mode_scalar in expr.cc. That code will 
utilize SFmode as an intermediate step just like your expander.   Is 
there some reason that generic code is insufficient?

Similarly for the other conversions.

Otherwise it looks pretty good.

Jeff
  
Jin Ma Oct. 9, 2023, 6:18 a.m. UTC | #2
> On 9/19/23 02:44, Jin Ma wrote:
> > gcc/ChangeLog:
> > 
> > 	* config/riscv/iterators.md (HFBF): New.
> > 	* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
> > 	Initialize data type _Bfloat16.
> > 	* config/riscv/riscv-modes.def (FLOAT_MODE): New.
> > 	(ADJUST_FLOAT_FORMAT): New.
> > 	* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
> > 	(riscv_scalar_mode_supported_p): Ditto.
> > 	(riscv_libgcc_floating_mode_supported_p): Ditto.
> > 	(riscv_block_arith_comp_libfuncs_for_mode): New.
> > 	(riscv_init_libfuncs): Opening and closing some libfuncs for BFmode.
> > 	* config/riscv/riscv.md (mode): Add BF.
> > 	(truncdfbf2): New.
> > 	(movhf): Support for BFmode.
> > 	(mov<mode>): Ditto.
> > 	(*mov<mode>_softfloat):  Ditto.
> > 	(fix_truncbf<GPR:mode>2): New.
> > 	(fixuns_truncbf<GPR:mode>2): New.
> > 	(float<mode>bf2): New.
> > 	(floatuns<mode>bf2): New.
> > 
> > libgcc/ChangeLog:
> > 
> > 	* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
> > 	(_FP_NANSIGN_B): New.
> > 	* config/riscv/t-softfp32: Add support for BF libfuncs.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/riscv/bf16_arithmetic.c: New test.
> > 	* gcc.target/riscv/bf16_call.c: New test.
> > 	* gcc.target/riscv/bf16_comparisons.c: New test.
> > 	* gcc.target/riscv/bf16_convert-1.c: New test.
> > 	* gcc.target/riscv/bf16_convert-2.c: New test.
> > 	* gcc.target/riscv/bf16_convert_run.c: New test.
> So this can't go in the tree until the extension has moved into a frozen 
> state.  Hopefully that'll happen before we close stage1 development in Nov.

Ok, this is very reasonable.

> 
> 
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index e00b8ee3579..5048628c784 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -1631,6 +1631,20 @@ (define_insn "truncdfhf2"
> >     [(set_attr "type" "fcvt")
> >      (set_attr "mode" "HF")])
> >   
> > +;; The conversion of DF to BF needs to be done with SF if there is a
> > +;; chance to generate at least one instruction, otherwise just using
> > +;; libfunc __truncdfbf2.
> > +(define_expand "truncdfbf2"
> > +  [(set (match_operand:BF     0 "register_operand" "=f")
> > +       (float_truncate:BF
> > +           (match_operand:DF 1 "register_operand" " f")))]
> > +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
> > +  {
> > +    convert_move (operands[0],
> > +		  convert_modes (SFmode, DFmode, operands[1], 0), 0);
> > +    DONE;
> > +  })
> So for conversions to/from BFmode, doesn't generic code take care of 
> this for us?  Search for convert_mode_scalar in expr.cc. That code will 
> utilize SFmode as an intermediate step just like your expander.   Is 
> there some reason that generic code is insufficient?
>
> Similarly for the other conversions.

As far as I can see, the function 'convert_mode_scalar' does not fully handle
conversions to/from BFmode. It handles BF to HF, SF, DF and SF to BF well, but the
remaining conversions get no special treatment and fall straight through to a
libcall.

Maybe I should enhance it instead? That seems like a good choice, but I'm not
sure.

Jin

> 
> Otherwise it looks pretty good.
> 
> Jeff
  
Jeff Law Oct. 9, 2023, 7:16 p.m. UTC | #3
On 10/9/23 00:18, Jin Ma wrote:

>>> +;; The conversion of DF to BF needs to be done with SF if there is a
>>> +;; chance to generate at least one instruction, otherwise just using
>>> +;; libfunc __truncdfbf2.
>>> +(define_expand "truncdfbf2"
>>> +  [(set (match_operand:BF     0 "register_operand" "=f")
>>> +       (float_truncate:BF
>>> +           (match_operand:DF 1 "register_operand" " f")))]
>>> +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
>>> +  {
>>> +    convert_move (operands[0],
>>> +		  convert_modes (SFmode, DFmode, operands[1], 0), 0);
>>> +    DONE;
>>> +  })
>> So for conversions to/from BFmode, doesn't generic code take care of
>> this for us?  Search for convert_mode_scalar in expr.cc. That code will
>> utilize SFmode as an intermediate step just like your expander.   Is
>> there some reason that generic code is insufficient?
>>
>> Similarly for the other conversions.
> 
> As far as I can see, the function 'convert_mode_scalar' doesn't seem to be perfect for
> dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, DF and
> SF to BF well, but the rest of the conversion without any processing, directly using
> the libcall.
> 
> Maybe I should choose to enhance its functionality? This seems to be a
> good choice, I'm not sure.

My recollection was that BF could be converted to/from SF trivially and 
if we wanted BF->DF we'd first convert to SF, then to DF.

Direct BF<->DF conversions aren't actually important from a performance 
standpoint.  So it's OK if they have an extra step IMHO.

jeff
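
The point above, that BF<->SF is trivial and BF<->DF can take an extra step
through SF, can be illustrated outside the compiler. The following is a
minimal C sketch, not GCC code. It assumes an IEEE binary32 'float', the
default round-to-nearest mode, and bfloat16 stored as the upper 16 bits of a
binary32 encoding. The names bf16_bits, bf16_to_float, float_to_bf16,
bf16_to_double and double_to_bf16 are hypothetical helpers introduced only for
this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: raw bfloat16 bits
   (1 sign bit, 8 exponent bits, 7 mantissa bits).  */
typedef uint16_t bf16_bits;

/* BF -> SF: exact, because a bfloat16 value is the upper half of an
   IEEE binary32 encoding.  */
static float
bf16_to_float (bf16_bits h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* SF -> BF with round-to-nearest-even.  NaNs are quieted instead of
   rounded so that the payload cannot round up into an infinity.  */
static bf16_bits
float_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (bf16_bits) ((u >> 16) | 0x0040u);
  /* Add 0x7fff plus the lowest kept bit: ties go to the even result.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (bf16_bits) (u >> 16);
}

/* BF -> DF in two steps (BF -> SF -> DF); both steps are exact.  */
static double
bf16_to_double (bf16_bits h)
{
  return (double) bf16_to_float (h);
}

/* DF -> BF in two steps (DF -> SF -> BF); each step may round.  */
static bf16_bits
double_to_bf16 (double d)
{
  return float_to_bf16 ((float) d);
}
```

With these helpers, float_to_bf16 (3.14f) gives 0x4049 and
bf16_to_double (0x4049) gives 3.140625. Widening is exact in both steps.
Narrowing through SF rounds twice, so in rare tie cases it may differ from a
single correctly rounded DF->BF conversion: for 1 + 2^-8 + 2^-40 the two-step
path returns 0x3F80, while direct rounding would give 0x3F81.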
  
Jin Ma Oct. 25, 2023, 10:15 a.m. UTC | #4
> >>> +;; The conversion of DF to BF needs to be done with SF if there is a
> >>> +;; chance to generate at least one instruction, otherwise just using
> >>> +;; libfunc __truncdfbf2.
> >>> +(define_expand "truncdfbf2"
> >>> +  [(set (match_operand:BF     0 "register_operand" "=f")
> >>> +       (float_truncate:BF
> >>> +           (match_operand:DF 1 "register_operand" " f")))]
> >>> +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
> >>> +  {
> >>> +    convert_move (operands[0],
> >>> +		  convert_modes (SFmode, DFmode, operands[1], 0), 0);
> >>> +    DONE;
> >>> +  })
> >> So for conversions to/from BFmode, doesn't generic code take care of
> >> this for us?  Search for convert_mode_scalar in expr.cc. That code will
> >> utilize SFmode as an intermediate step just like your expander.   Is
> >> there some reason that generic code is insufficient?
> >>
> >> Similarly for the other conversions.
> > 
> > As far as I can see, the function 'convert_mode_scalar' doesn't seem to be perfect for
> > dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, DF and
> > SF to BF well, but the rest of the conversion without any processing, directly using
> > the libcall.
> > 
> > Maybe I should choose to enhance its functionality? This seems to be a
> > good choice, I'm not sure.
> 
> My recollection was that BF could be converted to/from SF trivially and 
> if we wanted BF->DF we'd first convert to SF, then to DF.
> 
> Direct BF<->DF conversions aren't actually important from a performance 
> standpoint.  So it's OK if they have an extra step IMHO.

Thank you very much for your review and detailed reply. I may not have expressed myself clearly,
and I am a little confused by your guidance. My understanding is that you also consider it reasonable to
convert through SF, right? That is in fact what I did.

In this patch, my thoughts are as follows:

The general principle is to use real instructions instead of libcalls wherever possible for conversions,
while keeping the number of newly defined libcalls to a minimum (only reusing those already defined by other
architectures such as aarch64). If SF can be used as an intermediate mode, converting through SF is preferred;
otherwise the libcall is used directly.

1. Conversions between floating-point modes

For BF->DF, as you said, the function 'convert_mode_scalar' in the generic code already handles this well,
expressing it as BF->SF->DF. The generated instruction sequence may be one of the following:
  'call __extendbfsf2' + 'call __extendsfdf2' (when only soft floating point is supported);
  'call __extendbfsf2' + 'fcvt.d.s'           (when (TARGET_DOUBLE_FLOAT || TARGET_ZDINX) is true);
  'fcvt.s.bf16'        + 'fcvt.d.s'           (when ((TARGET_DOUBLE_FLOAT || TARGET_ZDINX) && TARGET_ZFBFMIN) is true).

For DF->BF, if any of fcvt.s.d and fcvt.bf16.s cannot be generated, 'call __truncdfbf2' is generated directly
by the function 'convert_mode_scalar'. Otherwise the new pattern (define_expand "truncdfbf2") is used. This
makes it possible to implement DF->BF as 'fcvt.s.d' + 'fcvt.bf16.s', which the function
'convert_mode_scalar' cannot generate.

2. For conversions between integer modes and BF, GCC appears to use only libcalls, which is
clearly wrong: for example, the BF->SI conversion directly calls the unimplemented libcall __fixunsbfsi.
I therefore added new patterns that handle these conversions through SF.

Thanks,

Jin

> 
> jeff
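
The integer case in point 2 follows the same idea: go through SF in both
directions. The sketch below is a C-level analogue only, not the expanders
from the patch. It assumes IEEE binary32 'float', round-to-nearest, and
bfloat16 stored as the upper 16 bits of a binary32 encoding. All function and
type names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: raw bfloat16 bits.  */
typedef uint16_t bf16_bits;

/* BF -> SF: exact widening, bf16 is the upper half of binary32.  */
static float
bf16_to_float (bf16_bits h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* SF -> BF with round-to-nearest-even; NaNs are quieted.  */
static bf16_bits
float_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)
    return (bf16_bits) ((u >> 16) | 0x0040u);
  u += 0x7fffu + ((u >> 16) & 1u);
  return (bf16_bits) (u >> 16);
}

/* BF -> signed integer: widen to SF, then truncate toward zero.
   As with any C float-to-int cast, out-of-range values are undefined.  */
static int32_t
bf16_to_int32 (bf16_bits h)
{
  return (int32_t) bf16_to_float (h);
}

/* BF -> unsigned integer, same two steps.  */
static uint32_t
bf16_to_uint32 (bf16_bits h)
{
  return (uint32_t) bf16_to_float (h);
}

/* Signed integer -> BF: convert to SF first, then narrow to BF.  */
static bf16_bits
int32_to_bf16 (int32_t i)
{
  return float_to_bf16 ((float) i);
}

/* Unsigned integer -> BF, same two steps.  */
static bf16_bits
uint32_to_bf16 (uint32_t u)
{
  return float_to_bf16 ((float) u);
}
```

For example, bf16_to_int32 (0x4049) truncates 3.140625 to 3, and
int32_to_bf16 (257) returns 0x4380 (256.0), because 257 lies exactly halfway
between two bfloat16 values and the tie goes to the even encoding.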
  
Jeff Law Nov. 10, 2023, 9:21 p.m. UTC | #5
On 10/25/23 04:15, Jin Ma wrote:
>>>>> +;; The conversion of DF to BF needs to be done with SF if there is a
>>>>> +;; chance to generate at least one instruction, otherwise just using
>>>>> +;; libfunc __truncdfbf2.
>>>>> +(define_expand "truncdfbf2"
>>>>> +  [(set (match_operand:BF     0 "register_operand" "=f")
>>>>> +       (float_truncate:BF
>>>>> +           (match_operand:DF 1 "register_operand" " f")))]
>>>>> +  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
>>>>> +  {
>>>>> +    convert_move (operands[0],
>>>>> +		  convert_modes (SFmode, DFmode, operands[1], 0), 0);
>>>>> +    DONE;
>>>>> +  })
>>>> So for conversions to/from BFmode, doesn't generic code take care of
>>>> this for us?  Search for convert_mode_scalar in expr.cc. That code will
>>>> utilize SFmode as an intermediate step just like your expander.   Is
>>>> there some reason that generic code is insufficient?
>>>>
>>>> Similarly for the other conversions.
>>>
>>> As far as I can see, the function 'convert_mode_scalar' doesn't seem to be perfect for
>>> dealing with the conversions to/from BFmode. It can only handle BF to HF, SF, DF and
>>> SF to BF well, but the rest of the conversion without any processing, directly using
>>> the libcall.
>>>
>>> Maybe I should choose to enhance its functionality? This seems to be a
>>> good choice, I'm not sure.
>>
>> My recollection was that BF could be converted to/from SF trivially and
>> if we wanted BF->DF we'd first convert to SF, then to DF.
>>
>> Direct BF<->DF conversions aren't actually important from a performance
>> standpoint.  So it's OK if they have an extra step IMHO.
> 
> Thank you very much for your review and detailed reply. Maybe there are some problems with my expression
> and I am a little confused about your guidance. My understanding is that you also think that it is reasonable to
> convert through SF, right? In fact, this is what I did.
My point was that I would expect the generic code to handle the 
conversion and that we didn't need to handle it explicitly in the RISC-V 
backend.

Meaning that I don't think we need a define_expand for truncdfbf2, 
fix_truncbf<GPR:mode>2, fixuns_truncbf<GPR:mode>2, float<mode>bf2, or 
floatuns<mode>bf2.


> 
> In this patch, my thoughts are as follows:
> 
> The general principle is to use the real instructions instead of libcall as much as possible for conversions,
> while minimizing the definition of libcall(only reusing which has been defined by other architectures such
> as aarch64). If SF can be used as a transit, it is preferred to convert to SF, otherwise libcall is directly used.
> 
> 1. For the conversions between floating points
> 
> For BF->DF, as you said, the function 'convert_mode_scalar' in the general code has been well implemented,
> which will be expressed as BF->SF->DF. And the generated instruction list may be as follows:
>    'call __extendbfsf2' + 'call __extendsfdf2' (when only soft floating point support);
>    'call __extendbfsf2' + 'fcvt.d.s'           (when (TARGET_DOUBLE_FLOAT || TARGET_ZDINX) is true);
>    'fcvt.s.bf16'        + 'fcvt.d.s'           (when ((TARGET_DOUBLE_FLOAT || TARGET_ZDINX) && TARGET_ZFBFMIN) is true)
> 
> For DF->BF, if any of fcvt.s.d and fcvt.bf16.s cannot be generated, the 'call __truncdfbf2' is directly generated
> by the function 'convert_mode_scalar'. Otherwise the new pattern(define_expand "truncdfbf2") is used. This
> makes it possible to implement DF->BF by 'fcvt.s.d' + 'fcvt.bf16.s', which cannot be generated by the function
> 'convert_mode_scalar'.
But I would have expected convert_mode_scalar to generate DF->BF by 
first truncating to SF, then to BF.   If that is missing for truncation, 
then we should add it to convert_mode_scalar rather than expressing it 
as a backend expander.





> 
> 2. For the conversions between integer and BF, it seems that gcc only uses libcall to implement it, but this is
> obviously wrong. For example, the conversion BF->SI directly calls the unimplemented libcall __fixunsbfsi.
> So I added some new pattern to handle these transformations with SF.
I would suggest these move into target independent code as well. 
There's no reason I'm aware of that they should be implemented entirely 
in a target machine description.  We're not really doing anything target 
specific in here.

jeff
  

Patch

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index ecf033f2fa7..73523b73fdd 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -84,6 +84,8 @@  (define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF "TARGET_ZFHMIN")])
 ;; instruction.
 (define_mode_attr size [(QI "b") (HI "h")])
 
+(define_mode_iterator HFBF [HF BF])
+
 ;; Mode attributes for loads.
 (define_mode_attr load [(QI "lb") (HI "lh") (SI "lw") (DI "ld") (HF "flh") (SF "flw") (DF "fld")])
 
diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index 3fe3a89dcc2..b7bb89794f7 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -192,6 +192,7 @@  static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES];
   riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]]
 
 tree riscv_float16_type_node = NULL_TREE;
+tree riscv_bfloat16_type_node = NULL_TREE;
 
 /* Return the function type associated with function prototype TYPE.  */
 
@@ -235,6 +236,21 @@  riscv_init_builtin_types (void)
   if (!maybe_get_identifier ("_Float16"))
     lang_hooks.types.register_builtin_type (riscv_float16_type_node,
 					    "_Float16");
+
+  /* Provide the _Bfloat16 type and bfloat16_type_node if needed.  */
+  if (!bfloat16_type_node)
+    {
+      riscv_bfloat16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (riscv_bfloat16_type_node) = 16;
+      SET_TYPE_MODE (riscv_bfloat16_type_node, BFmode);
+      layout_type (riscv_bfloat16_type_node);
+    }
+  else
+    riscv_bfloat16_type_node = bfloat16_type_node;
+
+  if (!maybe_get_identifier ("_Bfloat16"))
+    lang_hooks.types.register_builtin_type (riscv_bfloat16_type_node,
+					    "_Bfloat16");
 }
 
 /* Implement TARGET_INIT_BUILTINS.  */
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index e3c6ccb2809..723bfaee42d 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -22,6 +22,10 @@  along with GCC; see the file COPYING3.  If not see
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+FLOAT_MODE (BF, 2, 0);
+/* Reuse definition from arm.  */
+ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
+
 /* Vector modes.  */
 
 /* Encode the ratio of SEW/LMUL into the mask types. There are the following
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8c766e2e2be..910523ee2b9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8727,9 +8727,17 @@  riscv_asan_shadow_offset (void)
 static const char *
 riscv_mangle_type (const_tree type)
 {
-  /* Half-precision float, _Float16 is "DF16_".  */
+  /* Half-precision float, _Float16 is "DF16_" and _Bfloat16 is "DF16b".  */
   if (SCALAR_FLOAT_TYPE_P (type) && TYPE_PRECISION (type) == 16)
-    return "DF16_";
+    {
+      if (TYPE_MODE (type) == HFmode)
+	return "DF16_";
+
+      if (TYPE_MODE (type) == BFmode)
+	return "DF16b";
+
+      gcc_unreachable ();
+    }
 
   /* Mangle all vector type for vector extension.  */
   /* The mangle name follows the rule of RVV LLVM
@@ -8750,19 +8758,19 @@  riscv_mangle_type (const_tree type)
 static bool
 riscv_scalar_mode_supported_p (scalar_mode mode)
 {
-  if (mode == HFmode)
+  if (mode == HFmode || mode == BFmode)
     return true;
   else
     return default_scalar_mode_supported_p (mode);
 }
 
 /* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE
-   if MODE is HFmode, and punt to the generic implementation otherwise.  */
+   if MODE is HFmode or BFmode, and punt to the generic implementation otherwise.  */
 
 static bool
 riscv_libgcc_floating_mode_supported_p (scalar_float_mode mode)
 {
-  if (mode == HFmode)
+  if (mode == HFmode || mode == BFmode)
     return true;
   else
     return default_libgcc_floating_mode_supported_p (mode);
@@ -8813,27 +8821,74 @@  riscv_floatn_mode (int n, bool extended)
   return default_floatn_mode (n, extended);
 }
 
+/* Record that we have no arithmetic or comparison libfuncs for
+   machine mode MODE.  */
+
+static void
+riscv_block_arith_comp_libfuncs_for_mode (machine_mode mode)
+{
+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, mode, NULL);
+  set_optab_libfunc (sdiv_optab, mode, NULL);
+  set_optab_libfunc (smul_optab, mode, NULL);
+  set_optab_libfunc (neg_optab, mode, NULL);
+  set_optab_libfunc (sub_optab, mode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, mode, NULL);
+  set_optab_libfunc (ne_optab, mode, NULL);
+  set_optab_libfunc (lt_optab, mode, NULL);
+  set_optab_libfunc (le_optab, mode, NULL);
+  set_optab_libfunc (ge_optab, mode, NULL);
+  set_optab_libfunc (gt_optab, mode, NULL);
+  set_optab_libfunc (unord_optab, mode, NULL);
+}
+
 static void
 riscv_init_libfuncs (void)
 {
+  machine_mode mode_iter;
   /* Half-precision float operations.  The compiler handles all operations
      with NULL libfuncs by converting to SFmode.  */
 
-  /* Arithmetic.  */
-  set_optab_libfunc (add_optab, HFmode, NULL);
-  set_optab_libfunc (sdiv_optab, HFmode, NULL);
-  set_optab_libfunc (smul_optab, HFmode, NULL);
-  set_optab_libfunc (neg_optab, HFmode, NULL);
-  set_optab_libfunc (sub_optab, HFmode, NULL);
+  riscv_block_arith_comp_libfuncs_for_mode (HFmode);
 
-  /* Comparisons.  */
-  set_optab_libfunc (eq_optab, HFmode, NULL);
-  set_optab_libfunc (ne_optab, HFmode, NULL);
-  set_optab_libfunc (lt_optab, HFmode, NULL);
-  set_optab_libfunc (le_optab, HFmode, NULL);
-  set_optab_libfunc (ge_optab, HFmode, NULL);
-  set_optab_libfunc (gt_optab, HFmode, NULL);
-  set_optab_libfunc (unord_optab, HFmode, NULL);
+  /* For all possible libcalls in BFmode, record NULL.  */
+  riscv_block_arith_comp_libfuncs_for_mode (BFmode);
+  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)
+    {
+      set_conv_libfunc (trunc_optab, BFmode, mode_iter, NULL);
+      set_conv_libfunc (trunc_optab, mode_iter, BFmode, NULL);
+      set_conv_libfunc (sext_optab, mode_iter, BFmode, NULL);
+      set_conv_libfunc (sext_optab, BFmode, mode_iter, NULL);
+    }
+
+  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
+    {
+      set_conv_libfunc (sfix_optab, BFmode, mode_iter, NULL);
+      set_conv_libfunc (sfix_optab, mode_iter, BFmode, NULL);
+      set_conv_libfunc (ufix_optab, BFmode, mode_iter, NULL);
+      set_conv_libfunc (ufix_optab, mode_iter, BFmode, NULL);
+
+      set_conv_libfunc (sfloat_optab, mode_iter, BFmode, NULL);
+      set_conv_libfunc (sfloat_optab, BFmode, mode_iter, NULL);
+      set_conv_libfunc (ufloat_optab, mode_iter, BFmode, NULL);
+      set_conv_libfunc (ufloat_optab, BFmode, mode_iter, NULL);
+    }
+
+  /* Enable libfuncs conversion for BFmode.  */
+  set_conv_libfunc (sext_optab, SFmode, BFmode, "__extendbfsf2");
+  set_conv_libfunc (trunc_optab, BFmode, SFmode, "__truncsfbf2");
+  set_conv_libfunc (trunc_optab, BFmode, DFmode, "__truncdfbf2");
+
+  set_conv_libfunc (sfloat_optab, BFmode, DImode, "__floatdibf");
+  set_conv_libfunc (ufloat_optab, BFmode, DImode, "__floatundibf");
+
+  /* Convert between BFmode and HFmode using only trunc libfunc if needed.  */
+  set_conv_libfunc (sext_optab, BFmode, HFmode, "__trunchfbf2");
+  set_conv_libfunc (sext_optab, HFmode, BFmode, "__truncbfhf2");
+  set_conv_libfunc (trunc_optab, BFmode, HFmode, "__trunchfbf2");
+  set_conv_libfunc (trunc_optab, HFmode, BFmode, "__truncbfhf2");
 }
 
 #if CHECKING_P
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e00b8ee3579..5048628c784 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -189,7 +189,7 @@  (define_attr "move_type"
   (const_string "unknown"))
 
 ;; Main data type used by the insn
-(define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,HF,SF,DF,TF,
+(define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,BF,HF,SF,DF,TF,
   RVVMF64BI,RVVMF32BI,RVVMF16BI,RVVMF8BI,RVVMF4BI,RVVMF2BI,RVVM1BI,
   RVVM8QI,RVVM4QI,RVVM2QI,RVVM1QI,RVVMF2QI,RVVMF4QI,RVVMF8QI,
   RVVM8HI,RVVM4HI,RVVM2HI,RVVM1HI,RVVMF2HI,RVVMF4HI,
@@ -1631,6 +1631,20 @@  (define_insn "truncdfhf2"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "HF")])
 
+;; The conversion of DF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __truncdfbf2.
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF     0 "register_operand" "=f")
+       (float_truncate:BF
+           (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_DOUBLE_FLOAT || TARGET_ZDINX"
+  {
+    convert_move (operands[0],
+		  convert_modes (SFmode, DFmode, operands[1], 0), 0);
+    DONE;
+  })
+
 ;;
 ;;  ....................
 ;;
@@ -1784,12 +1798,12 @@  (define_insn "extendhfdf2"
    (set_attr "mode" "DF")])
 
 ;; 16-bit floating point moves
-(define_expand "movhf"
-  [(set (match_operand:HF 0 "")
-	(match_operand:HF 1 ""))]
+(define_expand "mov<mode>"
+  [(set (match_operand:HFBF 0 "")
+	(match_operand:HFBF 1 ""))]
   ""
 {
-  if (riscv_legitimize_move (HFmode, operands[0], operands[1]))
+  if (riscv_legitimize_move (<MODE>mode, operands[0], operands[1]))
     DONE;
 })
 
@@ -1804,16 +1818,16 @@  (define_insn "*movhf_hardfloat"
    (set_attr "type" "fmove")
    (set_attr "mode" "HF")])
 
-(define_insn "*movhf_softfloat"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=f, r,r,m,*f,*r")
-	(match_operand:HF 1 "move_operand"         " f,Gr,m,r,*r,*f"))]
-  "!TARGET_ZFHMIN
-   && (register_operand (operands[0], HFmode)
-       || reg_or_0_operand (operands[1], HFmode))"
+(define_insn "*mov<mode>_softfloat"
+  [(set (match_operand:HFBF 0 "nonimmediate_operand" "=f, r,r,m,*f,*r")
+	(match_operand:HFBF 1 "move_operand"         " f,Gr,m,r,*r,*f"))]
+  "(!(TARGET_ZFHMIN && <MODE>mode == HFmode) || (<MODE>mode == BFmode))
+   && (register_operand (operands[0], <MODE>mode)
+       || reg_or_0_operand (operands[1], <MODE>mode))"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" "fmove,move,load,store,mtc,mfc")
    (set_attr "type" "fmove")
-   (set_attr "mode" "HF")])
+   (set_attr "mode" "<MODE>")])
 
 ;;
 ;;  ....................
@@ -1858,6 +1872,62 @@  (define_insn "floatuns<GPR:mode><ANYF:mode>2"
   [(set_attr "type" "fcvt")
    (set_attr "mode" "<ANYF:MODE>")])
 
+;; The conversion of BF to SI/DI needs to be done with SF.
+(define_expand "fix_truncbf<GPR:mode>2"
+  [(set (match_operand:GPR      0 "register_operand" "=r")
+	(fix:GPR
+	    (match_operand:BF 1 "register_operand" " f")))]
+  ""
+  {
+    rtx op1 = gen_reg_rtx (SFmode);
+    convert_move (op1, operands[1], 0);
+    expand_fix (operands[0], op1, 0);
+    DONE;
+  })
+
+(define_expand "fixuns_truncbf<GPR:mode>2"
+  [(set (match_operand:GPR      0 "register_operand" "=r")
+	(unsigned_fix:GPR
+	    (match_operand:BF 1 "register_operand" " f")))]
+  ""
+  {
+    rtx op1 = gen_reg_rtx (SFmode);
+    convert_move (op1, operands[1], 1);
+    expand_fix (operands[0], op1, 1);
+    DONE;
+  })
+
+;; The conversion of SI to BF needs to be done with SF.
+;; The conversion of DI to BF needs to be done with libfuncs
+;; __floatdibf and __floatundibf directly if there is no F
+;; extension, because we have not yet enabled __floatdisf
+;; and __floatundisf.
+(define_expand "float<mode>bf2"
+  [(set (match_operand:BF    0 "register_operand" "= f")
+	(float:BF
+	    (match_operand:GPR 1 "reg_or_0_operand" " rJ")))]
+  "(<MODE>mode == SImode)
+   || (<MODE>mode == DImode && (TARGET_HARD_FLOAT || TARGET_ZFINX))"
+  {
+    rtx op1 = gen_reg_rtx (SFmode);
+    expand_float (op1, operands[1], 0);
+    convert_move (operands[0], op1, 0);
+    DONE;
+  })
+
+(define_expand "floatuns<mode>bf2"
+  [(set (match_operand:BF    0 "register_operand" "= f")
+	(unsigned_float:BF
+	    (match_operand:GPR 1 "reg_or_0_operand" " rJ")))]
+  "(<MODE>mode == SImode)
+   || (<MODE>mode == DImode && (TARGET_HARD_FLOAT || TARGET_ZFINX))"
+  {
+    rtx op1 = gen_reg_rtx (SFmode);
+    expand_float (op1, operands[1], 1);
+    convert_move (operands[0], op1, 1);
+    DONE;
+  })
+
 (define_insn "l<rint_pattern><ANYF:mode><GPR:mode>2"
   [(set (match_operand:GPR       0 "register_operand" "=r")
 	(unspec:GPR
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c b/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
new file mode 100644
index 00000000000..9e67b2babc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imac -mabi=ilp32 -O" { target { rv32 } } } */
+/* { dg-options "-march=rv64imac -mabi=lp64 -O" { target { rv64 } } } */
+
+extern _Bfloat16 bf;
+extern _Bfloat16 bf1;
+extern _Bfloat16 bf2;
+
+/* Arithmetic.  */
+void bf_add_bf () { bf = bf1 + bf2; }
+
+void bf_sub_bf () { bf = bf1 - bf2; }
+
+void bf_mul_bf () { bf = bf1 * bf2; }
+
+void bf_div_bf () { bf = bf1 / bf2; }
+
+void bf_add_const () { bf = bf1 + 3.14; }
+
+void const_sub_bf () { bf = 3.14 - bf2; }
+
+void bf_mul_const () { bf = bf1 *3.14; }
+
+void const_div_bf () { bf = 3.14 / bf2; }
+
+void bf_inc () { ++bf; }
+
+void bf_dec () { --bf; }
+
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 16 } } */
+/* { dg-final { scan-assembler-times "call\t__truncsfbf2" 6 } } */
+/* { dg-final { scan-assembler-times "call\t__truncdfbf2" 4 } } */
+/* { dg-final { scan-assembler-not "call\t__addbf3" } } */
+/* { dg-final { scan-assembler-not "call\t__subbf3" } } */
+/* { dg-final { scan-assembler-not "call\t__mulbf3" } } */
+/* { dg-final { scan-assembler-not "call\t__divbf3" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_call.c b/gcc/testsuite/gcc.target/riscv/bf16_call.c
new file mode 100644
index 00000000000..01576e38ac5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16_call.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imac -mabi=ilp32 -O" { target { rv32 } } } */
+/* { dg-options "-march=rv64imac -mabi=lp64 -O" { target { rv64 } } } */
+
+_Bfloat16 add (_Bfloat16 a, _Bfloat16 b) __attribute__ ((noinline));
+_Bfloat16 add (_Bfloat16 a, _Bfloat16 b)
+{
+  return a + b;
+}
+
+_Bfloat16 test(_Bfloat16 a, _Bfloat16 b)
+{
+  return add (a, b);
+}
+
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 2 } } */
+/* { dg-final { scan-assembler-times "call\t__truncsfbf2" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_comparisons.c b/gcc/testsuite/gcc.target/riscv/bf16_comparisons.c
new file mode 100644
index 00000000000..ff692378c00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16_comparisons.c
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imac -mabi=ilp32 -O" { target { rv32 } } } */
+/* { dg-options "-march=rv64imac -mabi=lp64 -O" { target { rv64 } } } */
+
+extern _Bfloat16 bf;
+extern _Bfloat16 bf1;
+extern _Bfloat16 bf2;
+
+/* Comparisons.  */
+void bf_lt_bf () { bf = (bf1 < bf2) ? bf1 : bf2; }
+
+void bf_gt_bf () { bf = (bf1 > bf2) ? bf1 : bf2; }
+
+void bf_eq_bf () { bf = (bf1 == bf2) ? bf1 : bf2; }
+
+void bf_lt_const () { bf = (bf1 < 3.14) ? bf1 : bf2; }
+
+void const_gt_bf () { bf = (3.14 > bf2) ? bf1 : bf2; }
+
+void bf_eq_const () { bf = (bf1 == 3.14) ? bf1 : bf2; }
+
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 9 } } */
+/* { dg-final { scan-assembler-not "call\t__ltbf2" } } */
+/* { dg-final { scan-assembler-not "call\t__gtbf2" } } */
+/* { dg-final { scan-assembler-not "call\t__eqbf2" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_convert-1.c b/gcc/testsuite/gcc.target/riscv/bf16_convert-1.c
new file mode 100644
index 00000000000..3b9a7434373
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16_convert-1.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -O" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O" { target { rv64 } } } */
+
+extern _Bfloat16 bf;
+extern _Bfloat16 bf1;
+extern _Bfloat16 bf2;
+extern _Float16 hf;
+extern float sf;
+extern double df;
+
+extern int si;
+extern long long di;
+
+extern unsigned int usi;
+extern unsigned long long udi;
+
+/* Fp or gp converts to bf.  */
+void hf_to_bf () { bf = hf; } /* { dg-final { scan-assembler-times "call\t__trunchfbf2" 1 } } */
+void sf_to_bf () { bf = sf; }
+void df_to_bf () { bf = df; }
+void si_to_bf () { bf = si; }
+void di_to_bf () { bf = di; } /* { dg-final { scan-assembler-times "call\t__floatdibf" 1 { target { rv32 } } } } */
+void usi_to_bf () { bf = usi; }
+void udi_to_bf () { bf = udi; } /* { dg-final { scan-assembler-times "call\t__floatundibf" 1 { target { rv32 } } } } */
+void const_to_bf () { __volatile__ const float temp = 3.14; bf = temp; }
+/* { dg-final { scan-assembler-times "call\t__truncsfbf2" 5 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "call\t__truncsfbf2" 7 { target { rv64 } } } } */
+
+/* Bf converts to fp or gp.  */
+void bf_to_hf () { hf = bf; } /* { dg-final { scan-assembler-times "call\t__truncsfhf2" 1 } } */
+void bf_to_sf () { sf = bf; }
+void bf_to_df () { df = bf; }
+void bf_to_si () { si = bf; }
+void bf_to_di () { di = bf; } /* { dg-final { scan-assembler-times "call\t__fixsfdi" 1 { target { rv32 } } } } */
+void bf_to_usi () { usi = bf; }
+void bf_to_udi () { udi = bf; } /* { dg-final { scan-assembler-times "call\t__fixunssfdi" 1 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 6 { target { rv64 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_convert-2.c b/gcc/testsuite/gcc.target/riscv/bf16_convert-2.c
new file mode 100644
index 00000000000..912b875bdd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16_convert-2.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imac -mabi=ilp32 -O" { target { rv32 } } } */
+/* { dg-options "-march=rv64imac -mabi=lp64 -O" { target { rv64 } } } */
+
+extern _Bfloat16 bf;
+extern _Bfloat16 bf1;
+extern _Bfloat16 bf2;
+extern _Float16 hf;
+extern float sf;
+extern double df;
+
+extern int si;
+extern long long di;
+
+extern unsigned int usi;
+extern unsigned long long udi;
+
+/* Fp or gp converts to bf.  */
+void hf_to_bf () { bf = hf; } /* { dg-final { scan-assembler-times "call\t__trunchfbf2" 1 } } */
+void sf_to_bf () { bf = sf; }
+void df_to_bf () { bf = df; } /* { dg-final { scan-assembler-times "call\t__truncdfbf2" 1 } } */
+void si_to_bf () { bf = si; } /* { dg-final { scan-assembler-times "call\t__floatsisf" 1 } } */
+void di_to_bf () { bf = di; } /* { dg-final { scan-assembler-times "call\t__floatdibf" 1 } } */
+void usi_to_bf () { bf = usi; }  /* { dg-final { scan-assembler-times "call\t__floatunsisf" 1 } } */
+void udi_to_bf () { bf = udi; }  /* { dg-final { scan-assembler-times "call\t__floatundibf" 1 } } */
+void const_to_bf () { __volatile__ const float temp = 3.14; bf = temp; }
+/* { dg-final { scan-assembler-times "call\t__truncsfbf2" 4 } } */
+
+/* Bf converts to fp or gp.  */
+void bf_to_hf () { hf = bf; } /* { dg-final { scan-assembler-times "call\t__truncsfhf2" 1 } } */
+void bf_to_sf () { sf = bf; }
+void bf_to_df () { df = bf; } /* { dg-final { scan-assembler-times "call\t__extendsfdf2" 1 } } */
+void bf_to_si () { si = bf; } /* { dg-final { scan-assembler-times "call\t__fixsfsi" 1 } } */
+void bf_to_di () { di = bf; } /* { dg-final { scan-assembler-times "call\t__fixsfdi" 1 } } */
+void bf_to_usi () { usi = bf; }  /* { dg-final { scan-assembler-times "call\t__fixunssfsi" 1 } } */
+void bf_to_udi () { udi = bf; }  /* { dg-final { scan-assembler-times "call\t__fixunssfdi" 1 } } */
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "call\t__extendbfsf2" 6 { target { rv64 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_convert_run.c b/gcc/testsuite/gcc.target/riscv/bf16_convert_run.c
new file mode 100644
index 00000000000..d9b1f0f6298
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16_convert_run.c
@@ -0,0 +1,163 @@ 
+/* { dg-do run } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -O" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O" { target { rv64 } } } */
+
+#include <stdio.h>
+
+#define NO_INLINE __attribute__((noinline))
+
+int NO_INLINE
+bf16_to_int ()
+{
+  int ret[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+  _Bfloat16 a_bf16 = 1.2;
+  _Bfloat16 b_bf16 = 7.1;
+  signed char a_char = 1;
+  short int a_short_int = 2;
+  int a_int = 3;
+  long a_long = 4;
+  long long a_long_long = 5;
+
+  a_bf16 = (_Bfloat16)a_char;
+  if (a_bf16 != (_Bfloat16)1)
+    ret[0] = 1;
+
+  a_bf16 = (_Bfloat16)a_short_int;
+  if (a_bf16 != (_Bfloat16)2)
+    ret[1] = 1;
+
+  a_bf16 = (_Bfloat16)a_int;
+  if (a_bf16 != (_Bfloat16)3)
+    ret[2] = 1;
+
+  a_bf16 = (_Bfloat16)a_long;
+  if (a_bf16 != (_Bfloat16)4)
+    ret[3] = 1;
+
+  a_bf16 = (_Bfloat16)a_long_long;
+  if (a_bf16 != (_Bfloat16)5)
+    ret[4] = 1;
+
+  a_char = (signed char)b_bf16;
+  if (a_char != (signed char)7.1)
+    ret[5] = 1;
+
+  a_short_int = (short int)b_bf16;
+  if (a_short_int != (short int)7.1)
+    ret[6] = 1;
+
+  a_int = (int)b_bf16;
+  if (a_int != (int)7.1)
+    ret[7] = 1;
+
+  a_long = (long)b_bf16;
+  if (a_long != (long)7.1)
+    ret[8] = 1;
+
+  a_long_long = (long long)b_bf16;
+  if (a_long_long != (long long)7.1)
+    ret[9] = 1;
+
+  if ((ret[0] == 1) || (ret[1] == 1) || (ret[2] == 1) || (ret[3] == 1) || (ret[4] == 1) ||
+      (ret[5] == 1) || (ret[6] == 1) || (ret[7] == 1) || (ret[8] == 1) || (ret[9] == 1))
+    return 1;
+  else
+    return 0;
+}
+
+int NO_INLINE
+bf16_to_uint ()
+{
+  int ret[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+  _Bfloat16 a_bf16 = 1.2;
+  _Bfloat16 b_bf16 = 7.1;
+  unsigned char a_uchar = 1;
+  unsigned short int a_short_uint = 2;
+  unsigned int a_uint = 3;
+  unsigned long a_ulong = 4;
+  unsigned long long a_ulong_ulong = 5;
+
+  a_bf16 = (_Bfloat16)a_uchar;
+  if (a_bf16 != (_Bfloat16)1)
+    ret[0] = 1;
+
+  a_bf16 = (_Bfloat16)a_short_uint;
+  if (a_bf16 != (_Bfloat16)2)
+    ret[1] = 1;
+
+  a_bf16 = (_Bfloat16)a_uint;
+  if (a_bf16 != (_Bfloat16)3)
+    ret[2] = 1;
+
+  a_bf16 = (_Bfloat16)a_ulong;
+  if (a_bf16 != (_Bfloat16)4)
+    ret[3] = 1;
+
+  a_bf16 = (_Bfloat16)a_ulong_ulong;
+  if (a_bf16 != (_Bfloat16)5)
+    ret[4] = 1;
+
+  a_uchar = (unsigned char)b_bf16;
+  if (a_uchar != (unsigned char)7.1)
+    ret[5] = 1;
+
+  a_short_uint = (unsigned short int)b_bf16;
+  if (a_short_uint != (unsigned short int)7.1)
+    ret[6] = 1;
+
+  a_uint = (unsigned int)b_bf16;
+  if (a_uint != (unsigned int)7.1)
+    ret[7] = 1;
+
+  a_ulong = (unsigned long)b_bf16;
+  if (a_ulong != (unsigned long)7.1)
+    ret[8] = 1;
+
+  a_ulong_ulong = (unsigned long long)b_bf16;
+  if (a_ulong_ulong != (unsigned long long)7.1)
+    ret[9] = 1;
+
+  if ((ret[0] == 1) || (ret[1] == 1) || (ret[2] == 1) || (ret[3] == 1) || (ret[4] == 1) ||
+      (ret[5] == 1) || (ret[6] == 1) || (ret[7] == 1) || (ret[8] == 1) || (ret[9] == 1))
+    return 1;
+  else
+    return 0;
+}
+
+int NO_INLINE
+bf16_to_float ()
+{
+  int ret[4] = {0, 0, 0, 0};
+  _Bfloat16 a_bf16 = 1.2;
+  _Bfloat16 b_bf16 = 7.5;
+  float a_float = 3.7;
+  double a_double = 5.8;
+  a_bf16 = (_Bfloat16)a_float;
+  if (a_bf16 != ((_Bfloat16)3.7))
+    ret[0] = 1;
+
+  a_bf16 = (_Bfloat16)a_double;
+  if (a_bf16 != ((_Bfloat16)5.8))
+    ret[1] = 1;
+
+  a_float = (float)b_bf16;
+  if (a_float != (float)7.5)
+    ret[2] = 1;
+
+  a_double = (double)b_bf16;
+  if (a_double != (double)7.5)
+    ret[3] = 1;
+
+  if ((ret[0] == 1) || (ret[1] == 1) || (ret[2] == 1) || (ret[3] == 1))
+    return 1;
+  else
+    return 0;
+}
+
+int main()
+{
+  if (bf16_to_int () || bf16_to_uint () || bf16_to_float ())
+    return 1;
+  else
+    return 0;
+}
diff --git a/libgcc/config/riscv/sfp-machine.h b/libgcc/config/riscv/sfp-machine.h
index 38e2817bffa..6e294b38783 100644
--- a/libgcc/config/riscv/sfp-machine.h
+++ b/libgcc/config/riscv/sfp-machine.h
@@ -41,6 +41,7 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define _FP_DIV_MEAT_D(R,X,Y)	_FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
@@ -64,6 +65,7 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define _FP_DIV_MEAT_D(R,X,Y)	_FP_DIV_MEAT_1_udiv_norm(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
@@ -82,6 +84,7 @@  typedef unsigned int UTItype __attribute__ ((mode (TI)));
 typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
 #define CMPtype __gcc_CMPtype
 
+#define _FP_NANSIGN_B		0
 #define _FP_NANSIGN_H		0
 #define _FP_NANSIGN_S		0
 #define _FP_NANSIGN_D		0
diff --git a/libgcc/config/riscv/t-softfp32 b/libgcc/config/riscv/t-softfp32
index 1a3b1caa6b0..0c61f77714b 100644
--- a/libgcc/config/riscv/t-softfp32
+++ b/libgcc/config/riscv/t-softfp32
@@ -42,7 +42,8 @@  softfp_extras += divsf3 divdf3 divtf3
 
 endif
 
-softfp_extensions += hfsf hfdf hftf
-softfp_truncations += tfhf dfhf sfhf
+softfp_extensions += hfsf hfdf hftf bfsf
+softfp_truncations += tfhf dfhf sfhf tfbf dfbf sfbf bfhf hfbf
 softfp_extras += fixhfsi fixhfdi fixunshfsi fixunshfdi \
-                 floatsihf floatdihf floatunsihf floatundihf
+		 floatsihf floatdihf floatunsihf floatundihf \
+		 floatdibf floatundibf
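
Note on the tests above: they expect bf16 arithmetic and comparisons to widen through __extendbfsf2, operate in SFmode, and narrow through __truncsfbf2, with no __addbf3-style calls. The following is a minimal sketch of what that widen/narrow round trip amounts to at the bit level, assuming round-to-nearest-even on narrowing. The helper names (bf16_bits_to_float, float_to_bf16_bits, bf16_add) are hypothetical and this is not the libgcc soft-fp implementation; it only relies on bfloat16 being the high 16 bits of an IEEE binary32 value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not the libgcc entry points.  */

/* Widen: a bfloat16 bit pattern is the high half of a binary32 pattern,
   so extension is exact and just appends 16 zero bits.  */
float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Narrow: drop the low 16 bits, rounding to nearest with ties to even
   (an assumption of this sketch).  NaNs are kept as quiet NaNs.  */
uint16_t
float_to_bf16_bits (float f)
{
  uint32_t wide;
  memcpy (&wide, &f, sizeof wide);
  if ((wide & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((wide >> 16) | 0x0040u);
  uint32_t lsb = (wide >> 16) & 1u;
  wide += 0x7fffu + lsb;
  return (uint16_t) (wide >> 16);
}

/* The shape the arithmetic tests scan for: extend both operands,
   add in float, truncate the result back to bf16.  */
uint16_t
bf16_add (uint16_t a, uint16_t b)
{
  return float_to_bf16_bits (bf16_bits_to_float (a) + bf16_bits_to_float (b));
}
```

Under these assumptions 7.1f narrows to the pattern 0x40e3, which widens back to 7.09375, so converting it to an integer still gives 7. That is the property the integer checks in bf16_convert_run.c depend on.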