From patchwork Tue Mar 21 17:01:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Matthias Kretz <m.kretz@gsi.de>
X-Patchwork-Id: 66707
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id D04E0384B0D4
	for <patchwork@sourceware.org>; Tue, 21 Mar 2023 17:01:55 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D04E0384B0D4
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1679418115;
	bh=6miSIw+o6mIroceCaK3IjtVkbNZEysq7tmCuFUxA4MA=;
	h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:
	 From;
	b=Jdzcfyk6PYQmGMp/QJ9RmrsKE8Z/hs29dkCPxqdaD/8/dxiQygAdH7e+Nq5aXQiWm
	 WpnPIyxFQpqBvP8fvOY3UTQnBTE6fnsONdkExr5yDXI16Qj2iZRZ/zzjDFn1lxtJ0g
	 qVyX1/76/gCwpr80We7kg9MgOniDeHMqyvn/6JZk=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from lxmtout1.gsi.de (lxmtout1.gsi.de [140.181.3.111])
 by sourceware.org (Postfix) with ESMTPS id DFFBF3857007;
 Tue, 21 Mar 2023 17:01:24 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DFFBF3857007
Received: from localhost (localhost [127.0.0.1])
 by lxmtout1.gsi.de (Postfix) with ESMTP id BB5EA2051042;
 Tue, 21 Mar 2023 18:01:23 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at lxmtout1.gsi.de
Received: from lxmtout1.gsi.de ([127.0.0.1])
 by localhost (lxmtout1.gsi.de [127.0.0.1]) (amavisd-new, port 10024)
 with LMTP id ICwNkixSbgJQ; Tue, 21 Mar 2023 18:01:23 +0100 (CET)
Received: from srvEX6.campus.gsi.de (unknown [10.10.4.96])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by lxmtout1.gsi.de (Postfix) with ESMTPS id 9E20B2051040;
 Tue, 21 Mar 2023 18:01:23 +0100 (CET)
Received: from minbar.localnet (140.181.3.12) by srvEX6.campus.gsi.de
 (10.10.4.96) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.26; Tue, 21 Mar
 2023 18:01:23 +0100
To: <gcc-patches@gcc.gnu.org>, <libstdc++@gcc.gnu.org>
Subject: [committed] libstdc++: Fix simd compilation with Clang
Date: Tue, 21 Mar 2023 18:01:22 +0100
Message-ID: <27030948.6Emhk5qWAg@minbar>
Organization: GSI Helmholtz Centre for Heavy Ion Research
In-Reply-To: 
 <CACb0b4nf-+u34UiV7DZ=v-06_qjBWVddihSWdTtOyg0c5fvfDg@mail.gmail.com>
References: 
 <CACb0b4nf-+u34UiV7DZ=v-06_qjBWVddihSWdTtOyg0c5fvfDg@mail.gmail.com>
X-Originating-IP: [140.181.3.12]
X-ClientProxiedBy: srvex5.Campus.gsi.de (10.10.4.95) To srvEX6.campus.gsi.de
 (10.10.4.96)
X-Spam-Status: No, score=-10.1 required=5.0 tests=BAYES_00, BODY_8BITS,
 GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Matthias Kretz via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Matthias Kretz <m.kretz@gsi.de>
Reply-To: Matthias Kretz <m.kretz@gsi.de>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Slightly modified patch. I had to fix floating-point AVX512 blending on 
Clang by removing a cast. While at it I cleaned up the -Wundef noise.

----- 8< ------

Clang fails to compile some constant expressions involving simd.
Therefore, disable the non-conforming constexpr extension of the simd
API when compiling with Clang.
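
As a rough illustration of the mechanism (a minimal sketch, not the
libstdc++ source): the macro DEMO_SIMD_CONSTEXPR and the function twice
are hypothetical names, and the constexpr expansion in the #else branch
is an assumption, since that branch is not shown in this patch.

```cpp
#include <cassert>

// Hypothetical stand-in for _GLIBCXX_SIMD_CONSTEXPR: expands to nothing
// in strict ANSI mode or when compiling with Clang.
#if (defined(__STRICT_ANSI__) && __STRICT_ANSI__) || defined(__clang__)
#define DEMO_SIMD_CONSTEXPR
#else
// Assumption: the remaining branch enables constexpr.
#define DEMO_SIMD_CONSTEXPR constexpr
#endif

// Callable at run time with every compiler; usable in constant
// expressions only where the macro expands to constexpr.
DEMO_SIMD_CONSTEXPR inline int twice(int x) { return 2 * x; }
```

Code that only calls such a function at run time behaves the same in
both configurations; only uses inside constant expressions are affected.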

Fix the AVX512 blend implementation for Clang. It previously converted
the bitmask to bool, which selects one whole vector instead of blending
per element. Instead, use a Clang builtin to convert the bitmask to a
vector mask before applying a vector blend with ?:. A similar change is
required for the masked unary implementation, because the GCC builtins
do not exist on Clang.
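
The difference between a boolean choice and a per-element blend can be
modelled in portable C++. This is a minimal sketch under stated
assumptions, not the libstdc++ implementation: Vec4, expand_mask, blend,
and boolean_choice are hypothetical names, and scalar loops stand in for
the Clang builtins and the vector ?: operator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector type used only for this illustration.
using Vec4 = std::array<std::int32_t, 4>;

// Model of a bitmask-to-vector-mask conversion: bit i of k becomes an
// all-ones (-1) or all-zeros (0) value in lane i.
inline Vec4 expand_mask(std::uint8_t k)
{
  Vec4 m{};
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = ((k >> i) & 1) ? -1 : 0;
  return m;
}

// Per-lane blend: lane i is taken from b where bit i of k is set,
// otherwise from a.
inline Vec4 blend(std::uint8_t k, const Vec4& a, const Vec4& b)
{
  const Vec4 m = expand_mask(k);
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = m[i] ? b[i] : a[i];
  return r;
}

// Model of the old Clang path: the bitmask collapses to a single bool,
// so either all of a or all of b is returned.
inline Vec4 boolean_choice(std::uint8_t k, const Vec4& a, const Vec4& b)
{
  return k ? a : b;
}
```

With k = 0x5, a = {1, 2, 3, 4}, and b = {10, 20, 30, 40}, blend yields
{10, 2, 30, 4}, whereas boolean_choice returns a unchanged.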

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_detail.h: Don't declare the
	simd API as constexpr with Clang.
	* include/experimental/bits/simd_x86.h (__movm): New.
	(_S_blend_avx512): Resolve FIXME. Implement blend using __movm
	and ?:.
	(_SimdImplX86::_S_masked_unary): Clang does not implement the
	same builtins. Implement the function using __movm, ?:, and -
	operators on vector_size types instead.
---
 .../include/experimental/bits/simd_detail.h   |  2 +-
 .../include/experimental/bits/simd_x86.h      | 58 +++++++++++++++++--
 2 files changed, 55 insertions(+), 5 deletions(-)


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 stdₓ::simd
──────────────────────────────────────────────────────────────────────────

diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h
index 30cc1ef0eef..49b94decf0a 100644
--- a/libstdc++-v3/include/experimental/bits/simd_detail.h
+++ b/libstdc++-v3/include/experimental/bits/simd_detail.h
@@ -267,7 +267,7 @@ namespace experimental
 #define _GLIBCXX_SIMD_IS_UNLIKELY(__x) __builtin_expect(__x, 0)
 #define _GLIBCXX_SIMD_IS_LIKELY(__x) __builtin_expect(__x, 1)
 
-#if defined __STRICT_ANSI__ && __STRICT_ANSI__
+#if __STRICT_ANSI__ || defined __clang__
 #define _GLIBCXX_SIMD_CONSTEXPR
 #define _GLIBCXX_SIMD_USE_CONSTEXPR_API const
 #else
diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
index 608918542c6..7b8f1c664b3 100644
--- a/libstdc++-v3/include/experimental/bits/simd_x86.h
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -363,6 +363,53 @@ __maskload_pd(const double* __ptr, _Tp __k)
 
 // }}}
 
+#ifdef __clang__
+template <size_t _Np, typename _Tp, typename _Kp>
+  _GLIBCXX_SIMD_INTRINSIC constexpr auto
+  __movm(_Kp __k) noexcept
+  {
+    static_assert(is_unsigned_v<_Kp>);
+    if constexpr (sizeof(_Tp) == 1 && __have_avx512bw)
+      {
+	if constexpr (_Np <= 16 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2b128(__k);
+	else if constexpr (_Np <= 32 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2b256(__k);
+	else
+	  return __builtin_ia32_cvtmask2b512(__k);
+      }
+    else if constexpr (sizeof(_Tp) == 2 && __have_avx512bw)
+      {
+	if constexpr (_Np <= 8 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2w128(__k);
+	else if constexpr (_Np <= 16 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2w256(__k);
+	else
+	  return __builtin_ia32_cvtmask2w512(__k);
+      }
+    else if constexpr (sizeof(_Tp) == 4 && __have_avx512dq)
+      {
+	if constexpr (_Np <= 4 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2d128(__k);
+	else if constexpr (_Np <= 8 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2d256(__k);
+	else
+	  return __builtin_ia32_cvtmask2d512(__k);
+      }
+    else if constexpr (sizeof(_Tp) == 8 && __have_avx512dq)
+      {
+	if constexpr (_Np <= 2 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2q128(__k);
+	else if constexpr (_Np <= 4 && __have_avx512vl)
+	  return __builtin_ia32_cvtmask2q256(__k);
+	else
+	  return __builtin_ia32_cvtmask2q512(__k);
+      }
+    else
+      __assert_unreachable<_Tp>();
+  }
+#endif // __clang__
+
 #ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
 #include "simd_x86_conversions.h"
 #endif
@@ -619,14 +666,13 @@ _pdep_u32(
     _GLIBCXX_SIMD_INTRINSIC static _TV
     _S_blend_avx512(const _Kp __k, const _TV __a, const _TV __b) noexcept
     {
-#ifdef __clang__
-      // FIXME: this does a boolean choice, not a blend
-      return __k ? __a : __b;
-#else
       static_assert(__is_vector_type_v<_TV>);
       using _Tp = typename _VectorTraits<_TV>::value_type;
       static_assert(sizeof(_TV) >= 16);
       static_assert(sizeof(_Tp) <= 8);
+#ifdef __clang__
+      return __movm<_VectorTraits<_TV>::_S_full_size, _Tp>(__k) ? __b : __a;
+#else
       using _IntT
 	= conditional_t<(sizeof(_Tp) > 2),
 			conditional_t<sizeof(_Tp) == 4, int, long long>,
@@ -3483,6 +3529,9 @@ _S_masked_unary(const _SimdWrapper<_K, _Np> __k, const _SimdWrapper<_Tp, _Np> __
 	    // optimize masked unary increment and decrement as masked sub +/-1
 	    constexpr int __pm_one
 	      = is_same_v<_Op<void>, __increment<void>> ? -1 : 1;
+#ifdef __clang__
+	    return __movm<_Np, _Tp>(__k._M_data) ? __v._M_data - __pm_one : __v._M_data;
+#else // __clang__
 	    if constexpr (is_integral_v<_Tp>)
 	      {
 		constexpr bool __lp64 = sizeof(long) == sizeof(long long);
@@ -3526,6 +3575,7 @@ _S_masked_unary(const _SimdWrapper<_K, _Np> __k, const _SimdWrapper<_Tp, _Np> __
 		_GLIBCXX_SIMD_MASK_SUB(8, 16, subpd128);
 #undef _GLIBCXX_SIMD_MASK_SUB
 	      }
+#endif // __clang__
 	  }
 	else
 	  return _Base::template _S_masked_unary<_Op>(__k, __v);