[RFC,V4] Enable libmvec support for RISC-V
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Testing passed
redhat-pt-bot/TryBot-32bit | success | Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
Commit Message
From: yulong <shiyulong@iscas.ac.cn>
Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
This patch tries to enable libmvec on RISC-V. I have also demonstrated
how this all fits together by adding an implementation of vector cos.
This patch is an initial attempt, and we hope to receive valuable comments.
Thanks,
yulong
---
sysdeps/riscv/configure | 4 +
sysdeps/riscv/configure.ac | 4 +
sysdeps/riscv/rvd/Makefile | 5 +
sysdeps/riscv/rvd/Versions | 5 +
sysdeps/riscv/rvd/bits/math-vector.h | 29 ++++
sysdeps/riscv/rvd/cos.c | 94 ++++++++++++
sysdeps/riscv/rvd/math_private.h | 42 ++++++
sysdeps/riscv/rvd/v_math.h | 139 ++++++++++++++++++
sysdeps/riscv/rvd/vecmath_config.h | 33 +++++
sysdeps/unix/sysv/linux/riscv/libmvec.abilist | 1 +
10 files changed, 356 insertions(+)
mode change 100644 => 100755 sysdeps/riscv/configure
create mode 100644 sysdeps/riscv/rvd/Makefile
create mode 100644 sysdeps/riscv/rvd/Versions
create mode 100644 sysdeps/riscv/rvd/bits/math-vector.h
create mode 100644 sysdeps/riscv/rvd/cos.c
create mode 100644 sysdeps/riscv/rvd/math_private.h
create mode 100644 sysdeps/riscv/rvd/v_math.h
create mode 100644 sysdeps/riscv/rvd/vecmath_config.h
create mode 100644 sysdeps/unix/sysv/linux/riscv/libmvec.abilist
Comments
On 4/15/24 1:21 AM, shiyulong@iscas.ac.cn wrote:
> From: yulong <shiyulong@iscas.ac.cn>
>
> Diff: Change the version from GLIBC_2.39 to GLIBC_2.40.
> This patch tries to enable libmvec on RISC-V. I have also demonstrated
> how this all fits together by adding an implementation of vector cos.
> This patch is an initial attempt, and we hope to receive valuable comments.
Just an FYI -- Palmer's team over at Rivos have implementations for a
number of routines that would fit into libmvec. You might reach out to
Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
implementation.
https://github.com/rivosinc/veclibm/
Their implementations may provide good guidance on performant
implementations of various routines that libmvec typically provides.
jeff
On 2024/4/25 13:07, Jeff Law wrote:
> Just an FYI -- Palmer's team over at Rivos have implementations for a
> number of routines that would fit into libmvec. You might reach out
> to Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
> implementation.
>
> https://github.com/rivosinc/veclibm/
>
> Their implementations may provide good guidance on performant
> implementations of various routines that libmvec typically provides.
>
> jeff
Thanks Jeff for your advice, I'm working on a new implementation after
reading the above code.
On Wed, 24 Apr 2024 22:07:31 PDT (-0700), jeffreyalaw@gmail.com wrote:
> Just an FYI -- Palmer's team over at Rivos have implementations for a
> number of routines that would fit into libmvec. You might reach out to
> Ping Tak Peter Tang <ptpt@rivosinc.com> for information on his
> implementation.
>
> https://github.com/rivosinc/veclibm/
>
> Their implementations may provide good guidance on performant
> implementations of various routines that libmvec typically provides.
Ya, that's the idea of veclibm. The actual functions are written in a
way that's more suitable for some other libraries, but the core
computational implementations should be the same. A few of us had
briefly talked internally about getting these into glibc. IIUC all the
code was written at Rivos and thus could be copyright assigned to the
FSF and used in glibc. We don't have time to do that right now, but if
you're interested in helping that'd be awesome. We'll need to be
careful with the copyright/licensing, though.
That said, I've never really quite managed to figure out how all the
libmvec stuff is supposed to fit together. I'm more worried about the
ABI side of things than the implementation, so I think starting with
just one function to get the ABI template figured out is a reasonable way
to go, and we can get the rest of the implementations ported over next.
The first thing that jumps out on the ABI side of things is cos() taking
EMUL=2 types. I'm not sure if there's a reason for that, but it seems
we'd want EMUL=1 to fit more data in the argument registers?
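To make the EMUL trade-off concrete, here is a small arithmetic sketch. It is not part of the patch, and the helper names are made up for illustration. The size of the vector argument register pool is passed in as a parameter; 16 registers (v8-v23) is an assumption that should be checked against the RISC-V psABI vector calling convention.

```c
/* Illustrative helpers only; none of these names come from the patch.  */

/* Elements held by one vector register group: VLEN * LMUL / SEW.  */
unsigned
lanes_per_group (unsigned vlen_bits, unsigned lmul, unsigned sew_bits)
{
  return vlen_bits * lmul / sew_bits;
}

/* Vector arguments of a given LMUL that fit in ARG_REGS registers,
   since each argument occupies LMUL consecutive registers.  */
unsigned
vector_args_that_fit (unsigned arg_regs, unsigned lmul)
{
  return arg_regs / lmul;
}
```

With VLEN=128 and 64-bit elements, an LMUL=1 argument carries 2 lanes and an LMUL=2 argument carries 4. Under the assumed 16-register pool, 16 LMUL=1 arguments fit but only 8 LMUL=2 arguments do.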
Also, I think some of this can be split out: the roundtoint/converttoint
isn't really a libmvec thing (see
https://inbox.sourceware.org/libc-alpha/20220803174258.4235-1-palmer@rivosinc.com/,
which fails some test), and ptr_barrier() can probably be pulled out to
something generic as it's the same as arm64's version.
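For reference, a generic pointer barrier could look roughly like the sketch below. It is modeled on the AArch64 helper of the same name as best recalled, not copied from either tree, and it relies on GNU C extensions (statement expressions and inline asm). The `example_data` structure and `load_b` function are hypothetical stand-ins for the coefficient table in cos.c.

```c
/* Sketch of a generic pointer barrier.  The empty asm claims to modify
   the pointer, so the compiler can no longer assume it knows the
   address and will not constant-fold loads made through it.  */
#define ptr_barrier(ptr)                \
  ({                                    \
    __typeof (ptr) __ptr = (ptr);       \
    __asm ("" : "+r" (__ptr));          \
    __ptr;                              \
  })

/* Hypothetical data table, standing in for the one in cos.c.  */
static const struct example_data
{
  double a, b;
} example_data = { 1.0, 2.0 };

/* Load a field through the barrier, the way a vector routine would
   fetch its constants.  */
double
load_b (void)
{
  const struct example_data *d = ptr_barrier (&example_data);
  return d->b;
}
```

The barrier returns the same address it was given; its only purpose is to hide that fact from the optimizer.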
I'm also only seeing draft versions of the vector intrinsics. I know we
merged them into GCC and usually that means things are stable, but we
merged these pre-freeze (based on some assertions things wouldn't
change) and things have drifted around a bit in the spec. I think we're
probably safe just depending on the types. If there's no frozen version,
we should at least write down exactly which version we're following,
though.
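One way to write down the intrinsics version is to check the compiler's test macro. The sketch below assumes the `__riscv_v_intrinsic` macro from the RVV intrinsics specification and the usual RISC-V encoding of major * 1000000 + minor * 1000 + revision; both should be confirmed against the specification. The struct and function names are hypothetical.

```c
/* Hypothetical helper type: the components of an encoded version.  */
struct rvv_intrinsic_version
{
  unsigned major_ver;
  unsigned minor_ver;
  unsigned revision;
};

/* Split a value encoded as major * 1000000 + minor * 1000 + revision.  */
struct rvv_intrinsic_version
decode_rvv_intrinsic_version (unsigned long value)
{
  struct rvv_intrinsic_version v;
  v.major_ver = (unsigned) (value / 1000000);
  v.minor_ver = (unsigned) ((value / 1000) % 1000);
  v.revision = (unsigned) (value % 1000);
  return v;
}

/* Version the compiler reports, or 0 when the macro is not defined
   (for example when building for a non-RISC-V host).  */
unsigned long
compiler_rvv_intrinsic_version (void)
{
#ifdef __riscv_v_intrinsic
  return __riscv_v_intrinsic;
#else
  return 0;
#endif
}
```

A header such as v_math.h could then use a preprocessor check on the same macro to refuse to build against an intrinsics version it was not written for.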
Also: are there GCC patches for these? It'd be great to be able to test
things through the whole codegen stack so we can make sure it works.
On 2024/5/1 0:26, Palmer Dabbelt wrote:
> Ya, that's the idea of veclibm. The actual functions are written in a
> way that's more suitable for some other libraries, but the core
> computational implementations should be the same. A few of us had
> briefly talked internally about getting these into glibc. IIUC all the
> code was written at Rivos and thus could be copyright assigned to the
> FSF and used in glibc. We don't have time to do that right now, but
> if you're interested in helping that'd be awesome. We'll need to be
> careful with the copyright/licensing, though.
Thanks for your reply. I also received an email from Peter Tang. I am
very interested in contributing to glibc.
>
> That said, I've never really quite managed to figure out how all the
> libmvec stuff is supposed to fit together. I'm more worried about the
> ABI side of things than the implementation, so I think starting with
> just one function to get the ABI template figured out is a reasonable
> way to go, and we can get the rest of the implementations ported over
> next. The first thing that jumps out on the ABI side of things is
> cos() taking EMUL=2 types. I'm not sure if there's a reason for that,
> but it seems we'd want EMUL=1 to fit more data in the argument registers?
Setting EMUL=2 is just a personal experiment. I think you are right and
I will improve it in the next version.
>
> Also, I think some of this can be split out: the
> roundtoint/converttoint isn't really a libmvec thing (see
> https://inbox.sourceware.org/libc-alpha/20220803174258.4235-1-palmer@rivosinc.com/,
> which fails some test), and ptr_barrier() can probably be pulled out
> to something generic as it's the same as arm64's version.
>
> I'm also only seeing draft versions of the vector intrinsics. I know
> we merged them into GCC and usually that means things are stable, but
> we merged these pre-freeze (based on some assertions things wouldn't
> change) and things have drifted around a bit in the spec. I think
> we're probably safe just depending on the types. If there's no frozen
> version, we should at least write down exactly which version we're
> following, though.
We are currently developing based on the latest branches. Can we declare
that we are following RVV 1.0?
>
> Also: are there GCC patches for these? It'd be great to be able to
> test things through the whole codegen stack so we can make sure it works.
Unfortunately, there are no patches for GCC right now. This may be the
direction of future work.
Hi yulong, have you made any further progress? I have finished a new
version of libmvec support for RISC-V, which is also based on the
implementations by Palmer's team over at Rivos.
https://github.com/rivosinc/veclibm/
I couldn't find a vector function name mangling scheme for RISC-V, so I
defined it as follows. It may be incorrect, but I think it's worth discussing.
_ZGV<x>N<y>v<v...>_<func_name>
'x' is the LMUL: for LMUL 1/2/4/8, 'x' is 1/2/4/8.
'y' is the number of elements, i.e. 'simdlen' in GCC.
'v...' depends on the number of parameters: there are as many 'v'
characters as there are parameters.
'func_name' is the scalar function name.
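The proposed scheme can be illustrated with a short sketch that assembles a name from its parts. This is not code from the patch: `rvv_mangle` is a hypothetical helper, and it only models the all-vector-parameter case described above.

```c
#include <stdio.h>

/* Hypothetical helper: write "_ZGV<x>N<y>v...v_<func_name>" into BUF.
   LMUL must be 1, 2, 4 or 8; NPARAMS is the number of vector
   parameters, each contributing one 'v'.  Returns 0 on success and -1
   on an invalid LMUL or a buffer that is too small.  */
int
rvv_mangle (char *buf, size_t buflen, unsigned lmul, unsigned simdlen,
            unsigned nparams, const char *func_name)
{
  if (lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8)
    return -1;

  int n = snprintf (buf, buflen, "_ZGV%uN%u", lmul, simdlen);
  if (n < 0 || (size_t) n >= buflen)
    return -1;

  size_t pos = (size_t) n;
  for (unsigned i = 0; i < nparams; i++)
    {
      /* Keep one byte free for the terminator written below.  */
      if (pos + 1 >= buflen)
        return -1;
      buf[pos++] = 'v';
    }

  n = snprintf (buf + pos, buflen - pos, "_%s", func_name);
  if (n < 0 || (size_t) n >= buflen - pos)
    return -1;
  return 0;
}
```

Under this scheme, a one-argument cos with LMUL=1 and two lanes becomes `_ZGV1N2v_cos`, and a two-argument pow with LMUL=2 and four lanes becomes `_ZGV2N4vv_pow`.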
This patch supports vectorized versions of the following math functions
on RISC-V (currently only for VLENB <= 256, although it is very easy to
extend to larger VLENB). I have also finished the GCC patch to support
libmvec on RISC-V.
exp/asin/atan/acos/atanh/exp10/exp2/tan/tanh/pow/sin/log/cos/acosh/asinh/atan2/expm1/tgamma/lgamma/log2/log10/cbrt/erfc/erf/cosh/sinh
Hi Palmer, I have temporarily changed the copyright information in some
files that come from veclibm. This is not meant as a violation of your
copyright; I simply don't know how to resolve the conflict between the
LGPL and Apache 2.0. If you know how, please tell me so I can fix it.
Thank you.
Zhijin Zeng
On 2024/5/10 21:06, yulong wrote:
> Thanks for your reply. I also received an email from Peter Tang. I
> am very interested in contributing to glibc.
From 0eda8e538c7f7d4036d9decceb714acf3314f885 Mon Sep 17 00:00:00 2001
From: Zhijin Zeng <zhijin.zeng@spacemit.com>
Date: Thu, 31 Oct 2024 18:13:19 +0800
Subject: [PATCH] RISC-V: support vector math library for risc-v
Add RISC-V vector function mangling rules as follows:
_ZGV<x>N<y>v_<func_name>
'x' is the LMUL: for LMUL 1/2/4/8, 'x' is 1/2/4/8.
'y' is the number of elements, i.e. 'simdlen' in GCC.
'func_name' is the scalar function name.
gcc/ChangeLog:
* config/riscv/riscv.cc (INCLUDE_STRING):
(riscv_vector_type_p):
(supported_simd_type):
(lane_size):
(riscv_simd_clone_compute_vecsize_and_simdlen):
(riscv_simd_clone_adjust):
(riscv_simd_clone_usable):
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
(TARGET_SIMD_CLONE_ADJUST):
(TARGET_SIMD_CLONE_USABLE):
---
gcc/config/riscv/riscv.cc | 241 +++++++++++++++++++++++++++++++++++++-
1 file changed, 240 insertions(+), 1 deletion(-)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4f8e3ab931a..9b44d36b171 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see
#define IN_TARGET_CODE 1
#define INCLUDE_STRING
+#include <cmath>
#include "config.h"
#include "system.h"
#include "coretypes.h"
@@ -33,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
#include "insn-config.h"
#include "insn-attr.h"
#include "recog.h"
+#include "cgraph.h"
#include "output.h"
#include "alias.h"
#include "tree.h"
@@ -5197,7 +5199,9 @@ riscv_vector_type_p (const_tree type)
{
/* Currently, only builtin scalabler vector type is allowed, in the future,
more vector types may be allowed, such as GNU vector type, etc. */
- return riscv_vector::builtin_type_p (type);
+ if (!type)
+ return false;
+ return riscv_vector::builtin_type_p (type) || VECTOR_TYPE_P (type);
}
static unsigned int
@@ -11099,6 +11103,231 @@ riscv_get_raw_result_mode (int regno)
return default_get_reg_raw_mode (regno);
}
+/* Return true for types that could be supported as SIMD return or
+ argument types. */
+
+static bool
+supported_simd_type (tree t)
+{
+ if (SCALAR_FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t))
+ {
+ HOST_WIDE_INT s = tree_to_shwi (TYPE_SIZE_UNIT (t));
+ return s == 1 || s == 2 || s == 4 || s == 8;
+ }
+ return false;
+}
+
+static unsigned
+lane_size (cgraph_simd_clone_arg_type clone_arg_type, tree type)
+{
+ gcc_assert (clone_arg_type != SIMD_CLONE_ARG_TYPE_MASK);
+
+ if (INTEGRAL_TYPE_P (type)
+ || SCALAR_FLOAT_TYPE_P (type))
+ switch (TYPE_PRECISION (type) / BITS_PER_UNIT)
+ {
+ default:
+ break;
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ return TYPE_PRECISION (type);
+ }
+ gcc_unreachable ();
+}
+
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */
+
+static int
+riscv_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+ struct cgraph_simd_clone *clonei,
+ tree base_type ATTRIBUTE_UNUSED,
+ int num, bool explicit_p)
+{
+ tree t, ret_type;
+ unsigned int elt_bit = 0;
+ unsigned HOST_WIDE_INT const_simdlen;
+
+ if (!TARGET_VECTOR)
+ return 0;
+
+ if (maybe_ne (clonei->simdlen, 0U)
+ && clonei->simdlen.is_constant (&const_simdlen)
+ && (const_simdlen < 2
+ || const_simdlen > 1024
+ || (const_simdlen & (const_simdlen - 1)) != 0))
+ {
+ if (explicit_p)
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "unsupported simdlen %wd", const_simdlen);
+ return 0;
+ }
+
+ ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+ if (TREE_CODE (ret_type) != VOID_TYPE
+ && !supported_simd_type (ret_type))
+ {
+ if (!explicit_p)
+ ;
+ else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support return type %qT "
+ "for simd", ret_type);
+ else
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "unsupported return type %qT for simd",
+ ret_type);
+ return 0;
+ }
+
+ auto_vec<std::pair <tree, unsigned int>> vec_elts (clonei->nargs + 1);
+ if (TREE_CODE (ret_type) != VOID_TYPE)
+ {
+ elt_bit = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+ vec_elts.safe_push (std::make_pair (ret_type, elt_bit));
+ }
+
+ int i;
+ tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
+ bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
+ for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
+ t && t != void_list_node; t = TREE_CHAIN (t), i++)
+ {
+ tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+ if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
+ && !supported_simd_type (arg_type))
+ {
+ if (!explicit_p)
+ ;
+ else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support argument type %qT "
+ "for simd", arg_type);
+ else
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "unsupported argument type %qT for simd",
+ arg_type);
+ return 0;
+ }
+ unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
+ if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+ vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
+ if (!elt_bit)
+ elt_bit = lane_bits;
+ if (elt_bit != lane_bits)
+ return 0;
+ }
+
+ if (!elt_bit)
+ return 0;
+
+ clonei->vecsize_mangle = 'n';
+ clonei->mask_mode = VOIDmode;
+ poly_uint64 simdlen;
+ auto_vec<poly_uint64> simdlens (2);
+
+ clonei->vecsize_int = 0;
+ clonei->vecsize_float = 0;
+
+ if ((unsigned int)TARGET_MIN_VLEN <= elt_bit)
+ return 0;
+
+ /* Keep track of the possible simdlens the clones of this function can have,
+ and check them later to see if we support them. */
+ if (known_eq (clonei->simdlen, 0U))
+ {
+ if (TARGET_MAX_LMUL >= RVV_M1)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M1), elt_bit));
+ if (TARGET_MAX_LMUL >= RVV_M2)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M2), elt_bit));
+ if (TARGET_MAX_LMUL >= RVV_M4)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M4), elt_bit));
+ if (TARGET_MAX_LMUL >= RVV_M8)
+ simdlens.safe_push (
+ exact_div (poly_uint64 (TARGET_MIN_VLEN * RVV_M8), elt_bit));
+ }
+ else
+ simdlens.safe_push (clonei->simdlen);
+
+ unsigned j = 0;
+ while (j < simdlens.length ())
+ {
+ bool remove_simdlen = false;
+ for (auto elt : vec_elts)
+ if (known_gt (simdlens[j] * elt.second,
+ TARGET_MIN_VLEN * TARGET_MAX_LMUL))
+ {
+ /* Don't issue a warning for every simdclone when there is no
+ specific simdlen clause. */
+ if (explicit_p && maybe_ne (clonei->simdlen, 0U))
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support simdlen %wd for "
+ "type %qT",
+ constant_lower_bound (simdlens[j]), elt.first);
+ remove_simdlen = true;
+ break;
+ }
+ if (remove_simdlen)
+ simdlens.ordered_remove (j);
+ else
+ j++;
+ }
+
+ int count = simdlens.length ();
+ if (count == 0)
+ {
+ if (explicit_p && known_eq (clonei->simdlen, 0U))
+ {
+ /* Warn the user if we can't generate any simdclone. */
+ //simdlen = exact_div (TARGET_MIN_VLEN * LMUL, elt_bit);
+ warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+ "GCC does not currently support a simdclone with simdlens"
+ " %wd and %wd for these types.",
+ constant_lower_bound (simdlen),
+ constant_lower_bound (simdlen*2));
+ }
+ return 0;
+ }
+
+ gcc_assert (num < count);
+ clonei->vecsize_mangle = std::exp2 (num) + '0';
+ clonei->simdlen = simdlens[num];
+ return count;
+}
+
+/* Implement TARGET_SIMD_CLONE_ADJUST. */
+
+static void
+riscv_simd_clone_adjust (struct cgraph_node *node)
+{
+ tree t = TREE_TYPE (node->decl);
+ TYPE_ATTRIBUTES (t) = make_attribute ("riscv_vector_cc", "default",
+ TYPE_ATTRIBUTES (t));
+}
+
+/* Implement TARGET_SIMD_CLONE_USABLE. */
+
+static int
+riscv_simd_clone_usable (struct cgraph_node *node)
+{
+ switch (node->simdclone->vecsize_mangle)
+ {
+ case '1':
+ case '2':
+ case '4':
+ case '8':
+ if (!TARGET_VECTOR)
+ return -1;
+ return 0;
+ default:
+ gcc_unreachable ();
+ }
+}
+
/* Initialize the GCC target structure. */
#undef TARGET_ASM_ALIGNED_HI_OP
#define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -11451,6 +11680,16 @@ riscv_get_raw_result_mode (int regno)
#undef TARGET_GET_RAW_RESULT_MODE
#define TARGET_GET_RAW_RESULT_MODE riscv_get_raw_result_mode
+#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
+ riscv_simd_clone_compute_vecsize_and_simdlen
+
+#undef TARGET_SIMD_CLONE_ADJUST
+#define TARGET_SIMD_CLONE_ADJUST riscv_simd_clone_adjust
+
+#undef TARGET_SIMD_CLONE_USABLE
+#define TARGET_SIMD_CLONE_USABLE riscv_simd_clone_usable
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-riscv.h"
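The simdlen selection in the patch above can be summarized with a small standalone sketch. It is a simplification under stated assumptions: it uses plain integers instead of GCC's poly_uint64, it skips the per-argument filtering and the explicit simdlen clause, and every name in it is hypothetical.

```c
#include <stddef.h>

/* One candidate clone: the LMUL digit used in the mangled name and the
   number of lanes it processes.  */
struct simd_clone_candidate
{
  char lmul_char;
  unsigned simdlen;
};

/* Fill OUT with one candidate per LMUL in 1, 2, 4, 8 up to MAX_LMUL,
   using simdlen = MIN_VLEN * LMUL / ELT_BITS.  Returns the number of
   candidates, or 0 when an element does not fit in a minimal vector,
   mirroring the early return in the patch.  */
size_t
candidate_clones (unsigned min_vlen, unsigned max_lmul, unsigned elt_bits,
                  struct simd_clone_candidate out[4])
{
  size_t count = 0;

  if (elt_bits == 0 || min_vlen <= elt_bits)
    return 0;

  for (unsigned lmul = 1; lmul <= max_lmul && lmul <= 8; lmul *= 2)
    {
      out[count].lmul_char = (char) ('0' + lmul);
      out[count].simdlen = min_vlen * lmul / elt_bits;
      count++;
    }
  return count;
}
```

For a minimum VLEN of 128, a maximum LMUL of 2 and 64-bit elements, this yields two candidates: LMUL digit '1' with 2 lanes and LMUL digit '2' with 4 lanes.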
Hi, Zhijin Zeng:
Thank you for your contribution.
I am still working on this, but I have not pushed it upstream yet
because I have been busy with an urgent project recently. Once that is
done, I will send the patch upstream as soon as possible.
Thanks!
yulong
On 2024/11/4 12:41, Zhijin Zeng wrote:
> Hi yulong, have you made any further progress? I have finished a new
> version of libmvec support for RISC-V, which is also based on the
> implementations by Palmer's team over at Rivos.
>
> https://github.com/rivosinc/veclibm/
old mode 100644
new mode 100755
@@ -80,3 +80,7 @@ if test "$libc_cv_static_pie_on_riscv" = yes; then
printf "%s\n" "#define SUPPORT_STATIC_PIE 1" >>confdefs.h
fi
+
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
@@ -43,3 +43,7 @@ EOF
if test "$libc_cv_static_pie_on_riscv" = yes; then
AC_DEFINE(SUPPORT_STATIC_PIE)
fi
+
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
new file mode 100644
@@ -0,0 +1,5 @@
+libmvec-supported-funcs = cos
+
+ifeq ($(subdir),mathvec)
+libmvec-support = $(addprefix d,$(libmvec-supported-funcs))
+endif
new file mode 100644
@@ -0,0 +1,5 @@
+libmvec {
+ GLIBC_2.40 {
+ _ZGVnN2v_cos;
+ }
+}
new file mode 100644
@@ -0,0 +1,29 @@
+/* Platform-specific SIMD declarations of math functions.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+#if defined __riscv__
+# define __DECL_RVV_RISCV _Pragma
+# undef __DECL_RVV_cos
+# define __DECL_RVV_cos __DECL_RVV_RISCV
+#endif
new file mode 100644
@@ -0,0 +1,94 @@
+/* Double-precision vector cos function.
+
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "v_math.h"
+
+
+static const struct data
+{
+ vfloat64m2_t poly[7];
+ vfloat64m2_t range_val, shift, inv_pi, half_pi, pi_1, pi_2, pi_3;
+} data = {
+ /* Worst-case error is 3.3 ulp in [-pi/2, pi/2]. */
+ .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
+ V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
+ V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
+ V2 (-0x1.9e9540300a1p-41) },
+ .inv_pi = V2 (0x1.45f306dc9c883p-2),
+ .half_pi = V2 (0x1.921fb54442d18p+0),
+ .pi_1 = V2 (0x1.921fb54442d18p+1),
+ .pi_2 = V2 (0x1.1a62633145c06p-53),
+ .pi_3 = V2 (0x1.c1cd129024e09p-106),
+ .shift = V2 (0x1.8p52),
+ .range_val = V2 (0x1p23)
+};
+
+#define C(i) d->poly[i]
+
+static vfloat64m2_t NOINLINE
+special_case (vfloat64m2_t x, vfloat64m2_t y, vuint64m2_t odd, vuint64m2_t cmp)
+{
+ y = vreinterpret_v_u64m2_f64m2 (vxor (vreinterpret_v_f64m2_u64m2 (y), odd, 2));
+ return v_call_f64 (cos, x, y, cmp);
+}
+
+vfloat64m2_t V_NAME_D1 (cos) (vfloat64m2_t x)
+{
+ const struct data *d = ptr_barrier (&data);
+ vfloat64m2_t n, r, r2, r3, r4, t1, t2, t3, y;
+ vuint64m2_t odd, cmp;
+
+ r = vfabs_v_f64m2 (x, 2);
+ cmp = (vuint64m2_t) vmsgeu (vreinterpret_v_f64m2_u64m2 (r),
+ vreinterpret_v_f64m2_u64m2 (d->range_val));
+ if (__glibc_unlikely (v_any_u64 (cmp)))
+ /* If fenv exceptions are to be triggered correctly, set any special lanes
+ to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by the
+ special-case handler later. */
+ r = vmsltu (cmp, v_f64 (1.0), r);
+
+ /* n = rint((|x|+pi/2)/pi) - 0.5. */
+ n = vfmadd (d->shift, d->inv_pi, vfadd (r, d->half_pi, 2), 2);
+ odd = vshlq_n_u64 (vreinterpret_v_f64m2_u64m2 (n), 63);
+ n = vfsub (n, d->shift, 2);
+ n = vfsub (n, v_f64 (0.5), 2);
+
+ /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
+ r = vfmsub (r, d->pi_1, n, 2);
+ r = vfmsub (r, d->pi_2, n, 2);
+ r = vfmsub (r, d->pi_3, n, 2);
+
+ /* sin(r) poly approx. */
+ r2 = vfmul (r, r, 2);
+ r3 = vfmul (r2, r, 2);
+ r4 = vfmul (r2, r2, 2);
+
+ t1 = vfmadd (C (4), C (5), r2, 2);
+ t2 = vfmadd (C (2), C (3), r2, 2);
+ t3 = vfmadd (C (0), C (1), r2, 2);
+
+ y = vfmadd (t1, C (6), r4, 2);
+ y = vfmadd (t2, y, r4, 2);
+ y = vfmadd (t3, y, r4, 2);
+ y = vfmadd (r, y, r3, 2);
+
+ if (__glibc_unlikely (v_any_u64 (cmp)))
+ return special_case (x, y, odd, cmp);
+ return vreinterpret_v_u64m2_f64m2 (vxor (vreinterpret_v_f64m2_u64m2 (y), odd, 2));
+}
new file mode 100644
@@ -0,0 +1,42 @@
+/* Configure optimized libm functions. RISC-V version.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef RISCV_MATH_PRIVATE_H
+#define RISCV_MATH_PRIVATE_H 1
+
+#include <stdint.h>
+#include <math.h>
+
+/* Use inline round and lround instructions. */
+#define TOINT_INTRINSICS 1
+
+static inline double_t
+roundtoint (double_t x)
+{
+ return round (x);
+}
+
+static inline int32_t
+converttoint (double_t x)
+{
+ return lround (x);
+}
+
+#include_next <math_private.h>
+
+#endif
new file mode 100644
@@ -0,0 +1,139 @@
+/* Utilities for RISC-V vector libmvec routines.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _V_MATH_H
+#define _V_MATH_H
+
+#include <riscv_vector.h>
+#include "vecmath_config.h"
+
+#define V_NAME_D1(fun) _ZGVnN2v_##fun
+
+/* Shorthand helpers for declaring constants. */
+#define V2(X) { X, X }
+#define V4(X) { X, X, X, X }
+#define V8(X) { X, X, X, X, X, X, X, X }
+
+static inline vfloat32m4_t
+v_f32 (float x)
+{
+ return (vfloat32m4_t) V4 (x);
+}
+static inline vuint32m4_t
+v_u32 (uint32_t x)
+{
+ return (vuint32m4_t) V4 (x);
+}
+static inline vint32m4_t
+v_s32 (int32_t x)
+{
+ return (vint32m4_t) V4 (x);
+}
+
+/* True if any element of a vector compare result is non-zero. */
+static inline int
+v_any_u32 (vuint32m4_t x)
+{
+ /* assume elements in x are either 0 or -1u. */
+ return vpaddd_u64 (vreinterpret_v_u64m2_u32m2 (x)) != 0;
+}
+static inline int
+v_any_u32h (vuint32m2_t x)
+{
+ return vget_lane_u64 (vreinterpret_v_u32m2_u64m2 (x), 0) != 0;
+}
+static inline vfloat32m4_t
+v_lookup_f32 (const float *tab, vuint32m4_t idx)
+{
+ return (vfloat32m4_t){ tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]] };
+}
+static inline vuint32m4_t
+v_lookup_u32 (const uint32_t *tab, vuint32m4_t idx)
+{
+ return (vuint32m4_t){ tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]] };
+}
+static inline vfloat32m4_t
+v_call_f32 (float (*f) (float), vfloat32m4_t x, vfloat32m4_t y, vuint32m4_t p)
+{
+ return (vfloat32m4_t){ p[0] ? f (x[0]) : y[0], p[1] ? f (x[1]) : y[1],
+ p[2] ? f (x[2]) : y[2], p[3] ? f (x[3]) : y[3] };
+}
+static inline vfloat32m4_t
+v_call2_f32 (float (*f) (float, float), vfloat32m4_t x1, vfloat32m4_t x2,
+ vfloat32m4_t y, vuint32m4_t p)
+{
+ return (vfloat32m4_t){ p[0] ? f (x1[0], x2[0]) : y[0],
+ p[1] ? f (x1[1], x2[1]) : y[1],
+ p[2] ? f (x1[2], x2[2]) : y[2],
+ p[3] ? f (x1[3], x2[3]) : y[3] };
+}
+
+static inline vfloat64m2_t
+v_f64 (double x)
+{
+ return (vfloat64m2_t) V2 (x);
+}
+static inline vuint64m2_t
+v_u64 (uint64_t x)
+{
+ return (vuint64m2_t) V2 (x);
+}
+static inline vint64m2_t
+v_s64 (int64_t x)
+{
+ return (vint64m2_t) V2 (x);
+}
+
+/* True if any element of a vector compare result is non-zero. */
+static inline int
+v_any_u64 (vuint64m1_t x)
+{
+ /* assume elements in x are either 0 or -1u. */
+ return vpaddd_u64 (x) != 0;
+}
+/* True if all elements of a vector compare result are non-zero. */
+static inline int
+v_all_u64 (vuint64m1_t x)
+{
+ /* assume elements in x are either 0 or -1u. */
+ return vpaddd_s64 (vreinterpretq_s64_u64 (x)) == -2;
+}
+static inline vfloat64m1_t
+v_lookup_f64 (const double *tab, vuint64m1_t idx)
+{
+ return (vfloat64m1_t){ tab[idx[0]], tab[idx[1]] };
+}
+static inline vuint64m1_t
+v_lookup_u64 (const uint64_t *tab, vuint64m1_t idx)
+{
+ return (vuint64m1_t){ tab[idx[0]], tab[idx[1]] };
+}
+static inline vfloat64m1_t
+v_call_f64 (double (*f) (double), vfloat64m1_t x, vfloat64m1_t y, vuint64m1_t p)
+{
+ return (vfloat64m1_t){ p[0] ? f (x[0]) : y[0], p[1] ? f (x[1]) : y[1] };
+}
+static inline vfloat64m1_t
+v_call2_f64 (double (*f) (double, double), vfloat64m1_t x1, vfloat64m1_t x2,
+ vfloat64m1_t y, vuint64m1_t p)
+{
+ return (vfloat64m1_t){ p[0] ? f (x1[0], x2[0]) : y[0],
+ p[1] ? f (x1[1], x2[1]) : y[1] };
+}
+
+#endif
new file mode 100644
@@ -0,0 +1,33 @@
+/* Configuration for libmvec routines.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef _VECMATH_CONFIG_H
+#define _VECMATH_CONFIG_H
+
+#include <math_private.h>
+
+/* Return ptr but hide its value from the compiler so accesses through it
+ cannot be optimized based on the contents. */
+#define ptr_barrier(ptr) \
+ ({ \
+ __typeof (ptr) __ptr = (ptr); \
+ __asm("" : "+r"(__ptr)); \
+ __ptr; \
+ })
+
+#endif
new file mode 100644
@@ -0,0 +1 @@
+GLIBC_2.40 _ZGVnN2v_cos F