This patch splits implementation of generate_cannonical_any, into tree
different path depending on bits generated (R), required bits of entropy
(b), and value k such that R^k > 2^d > R^(k-1):
* log2(R) >= d: single call of generator is sufficient (k = 1),
and computation can be performed in the generator value type (_UIntR)
* bit_width(R^k) <= 128: computation can be performed using type
__detail::_Select_uint_least_t; previous implementation is reused
* bit_width(R^k) > 128: the computations required may overflow 128 bit;
such generators were previously unsupported, as illustrated by
r17-507-g5d17eec7d83445
This patch implements the support for the last branch, by observing
that while it is possible for computation to overflow 128 the sum:
sum_[0, k) urng() * R^i
The values of sum, such that (sum / x * 2^d) > 1, are rejected, and
any value (sum / x) > 2^d can be rejected. As 2^d is guaranteed to fit
in 128 integer, we can use above to reject the value, before overflow
will occur.
Firstly, we know that for l = k-1, the value R^l < 2^d, and
sum_[0, l) urng() * R^i <= R^l (because urng() is alwayus < R),
always fit in 128bit integer. We compute tha valuet (__lsum) using
normal loop.
This leaves us with final iteration of multiplying __kth * R^l,
where __kth is result of invocation of urng. In that case we
compute the bitwidth (__abits) of value that can be multiplied
by _R^l without overflow 128bit integer. The __kth * R^l is
performed by multiplying the loop of __abits of __kth in
loop where:
* __sumx (initially __lsum / __x) is the current value of sum / x
* __remx (initially 2^d - __sumx) is the value that we can still
add before overflowing (shifted with each iteration)
* __modx (initially __modx % __x) are corrected reminders
In each iteration the lowest __kth bits are multiplied with _Rl (__val),
scaled (__valx) and checked against __rem. When fits the __sumx, __modx
and __remx are updated, and both values of __remx and __kth are shifted
right by __abits.
If the multiplication finished without overflow (__kth is zero),
we handle the collected reminders (__modx), and compute the __ret value.
The value greater or equal 1 (e.g. due floating point aproximation), are
still rejected.
Similar approach is applied when computing x = R^k / 2^d.
Again while R^k will overflow 128 bits, we know that x < R (so fits
in 64bits), as R^l < 2^d < R^k, then x < R. We again compute
R * R^l / 2^d, by splitting the value of R into __abits chunks (_pR).
For result of chunks multipliations (_pRk), the bits (_xp) remaining
after division by __shift (starts from __d and reduces by __abits),
are collected into cumulative value of __x. While the lower bits
are preserved for later iteration (__pRk).
PR libstdc++/119739
libstdc++-v3/ChangeLog:
* include/bits/random.tcc (__generate_canonical_any):
Rewrite to support more cases.
* testsuite/26_numerics/random/uniform_real_distribution/operators/gencanon.cc:
Add test for previously unsupported generator.
* testsuite/26_numerics/random/uniform_real_distribution/operators/gencanon_eng.cc:
Add test for previously unsuported-generator ranges.
* testsuite/26_numerics/random/uniform_real_distribution/operators/gencanon_eng_neg.cc:
Moved (now-supported) generator test to gencanon_eng.cc, and removed
the file.
---
Tested on x86_64-linux. Also tested a modified version, with __bits <= 128
branch being removed and __abits hardoced to 10. This confirms that new
implementation gives the same output as existing one.
OK for trunk?
libstdc++-v3/include/bits/random.tcc | 169 +++++++++++++++---
.../operators/gencanon.cc | 96 ++++++++++
.../operators/gencanon_eng.cc | 63 +++++++
.../operators/gencanon_eng_neg.cc | 89 ---------
4 files changed, 300 insertions(+), 117 deletions(-)
delete mode 100644 libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/gencanon_eng_neg.cc
@@ -242,7 +242,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr size_t
mersenne_twister_engine<_UIntType, __w, __n, __m, __r, __a, __u, __d,
__s, __b, __t, __c, __l, __f>::tempering_u;
-
+
template<typename _UIntType,
size_t __w, size_t __n, size_t __m, size_t __r,
_UIntType __a, size_t __u, _UIntType __d, size_t __s,
@@ -3753,38 +3753,151 @@ namespace __detail
constexpr unsigned __log2R
= sizeof(_UIntR) * __CHAR_BIT__ - __builtin_clzg(__R) - 1;
// We overstimate number of required bits, by computing
- // r such that l * log2(R) >= d, so:
- // R^l >= (2 ^ log2(R)) ^ l == 2 ^ (log2(r) * l) >= 2^d
- // And then requiring l * bit_width(R) bits.
- constexpr unsigned __l = (__d + __log2R - 1) / __log2R;
- constexpr unsigned __bits = (__log2R + 1) * __l;
- using _UInt = typename __detail::_Select_uint_least_t<__bits>::type;
-
- _GLIBCXX_GEN_CANON_CONST _UInt __rd = _UInt(1) << __d;
- _GLIBCXX_GEN_CANON_CONST auto __logRrd = __gen_canon_log(__rd, __R);
- _GLIBCXX_GEN_CANON_CONST unsigned __k
- = __logRrd.__floor_log + (__rd > __logRrd.__floor_pow);
-
- _GLIBCXX_GEN_CANON_CONST _UInt __Rk
- = (__k > __logRrd.__floor_log)
- ? _UInt(__logRrd.__floor_pow) * _UInt(__R)
- : _UInt(__logRrd.__floor_pow);
- _GLIBCXX_GEN_CANON_CONST _UInt __x = __Rk / __rd;
+ // r such that m * log2(R) >= d, so:
+ // R^m >= (2 ^ log2(R)) ^ m == 2 ^ (log2(r) * m) >= 2^d
+ // And then requiring m * bit_width(R) bits.
+ constexpr unsigned __m = (__d + __log2R - 1) / __log2R;
+ constexpr unsigned __bits = (__log2R + 1) * __m;
+ if constexpr (__log2R >= __d)
+ {
+ // range already provide required number of bits,
+ // so in this case __k == 1, and single call to generator
+ // is sufficient. Futhermore 2^d fits in _UIntR
+ constexpr _UIntR __rd = _UIntR(1) << __d;
+ constexpr _UIntR __x = __R >> __d;
- while (true)
+ while (true)
+ {
+ _UIntR __val(__urng() - _Urbg::min());
+ const _RealT __ret = _RealT(__val / __x) / _RealT(__rd);
+ if (__ret < _RealT(1.0))
+ return __ret;
+ }
+ }
+ // generator call produce less bits than __d, but R^k > rd fits
+ // into 128 bits.
+ else if constexpr (__bits <= 128)
{
- _UInt __Ri{1};
- _UInt __sum(__urng() - _Urbg::min());
- for (int __i = __k - 1; __i > 0; --__i)
+ using _UInt = typename __detail::_Select_uint_least_t<__bits>::type;
+
+ _GLIBCXX_GEN_CANON_CONST _UInt __rd = _UInt(1) << __d;
+ _GLIBCXX_GEN_CANON_CONST auto __logRrd = __gen_canon_log(__rd, __R);
+ // __rd is power of two, and __R is not so __floor_pow is never
+ // equal to __rd.
+ _GLIBCXX_GEN_CANON_CONST unsigned __k = __logRrd.__floor_log + 1;
+ _GLIBCXX_GEN_CANON_CONST _UInt __Rk = __logRrd.__floor_pow * _UInt(__R);
+ // x = floor(R^k / 2^d) = R^k >> d;
+ _GLIBCXX_GEN_CANON_CONST _UInt __x = __Rk >> __d;
+
+ while (true)
{
- __Ri *= _UInt(__R);
- __sum += _UInt(__urng() - _Urbg::min()) * __Ri;
+ _UInt __Ri{1};
+ _UInt __sum(__urng() - _Urbg::min());
+ for (int __i = __k - 1; __i > 0; --__i)
+ {
+ __Ri *= __R;
+ __sum += __Ri * _UIntR(__urng() - _Urbg::min());
+ }
+ const _RealT __ret = _RealT(__sum / __x) / _RealT(__rd);
+ if (__ret < _RealT(1.0))
+ return __ret;
}
- const _RealT __ret = _RealT(__sum / __x) / _RealT(__rd);
- if (__ret < _RealT(1.0))
- return __ret;
}
-#undef _GLIBCXX_GEN_CANON_CONST
+ else
+ {
+#pragma GCC diagnostic ignored "-Wshift-count-overflow" // we shift by 64bits
+ static_assert(__log2R < 64, "Only generators emitting up to 64 bits are supported");
+ using _UInt = typename __detail::_Select_uint_least_t<128>::type;
+ using _UInt64 = typename __detail::_Select_uint_least_t<64>::type;
+
+ _GLIBCXX_GEN_CANON_CONST _UInt __rd = _UInt(1) << __d;
+ _GLIBCXX_GEN_CANON_CONST auto __logRrd = __gen_canon_log(__rd, __R);
+ // __rd is power of two, and __R is not so __floor_pow is never
+ // equal to __rd, and l == k - 1
+ _GLIBCXX_GEN_CANON_CONST unsigned __l = __logRrd.__floor_log;
+ _GLIBCXX_GEN_CANON_CONST _UInt __Rl = __logRrd.__floor_pow;
+
+ // __abits is maximum bit width of the value, that can
+ // be multiplied by R^l without overflowing 128 bit integer
+ // TODO: use bit_width(_Rl)
+ _GLIBCXX_GEN_CANON_CONST unsigned __bwRl = (__log2R + 1) * __l;
+ // For __R close to power of two, the actual __k may be smaller than __m,
+ // and will use less than 128bits.
+ _GLIBCXX_GEN_CANON_CONST unsigned __abits = __bwRl > 64 ? 128 - __bwRl : 64;
+ _GLIBCXX_GEN_CANON_CONST _UInt64 __amask
+ = (__abits == 64) ? ~_UInt64(0) : (_UInt64(1) << __abits) - 1;
+
+ // Compute __x = _Rk / __rd (_Rk >> __d), by spliting _R into chunks
+ // (_pR) of __abits, so their multiplication does not overflow 128 bits.
+ // __x fits in 64bits, becuse __R^l < __rd < __R^k, thus __x < _R
+ _UInt64 __x(0), __pR(__R); _UInt __pRk(0);
+ for (unsigned __shift = __d; __pR; __shift -= __abits)
+ {
+ __pRk += __Rl * (__pR & __amask);
+ __pR >>= __abits;
+
+ _UInt __xp(__pRk >> __shift);
+ __x += _UInt64(__xp);
+ __pRk -= (__xp << __shift);
+ __pRk >>= __abits;
+ }
+
+ while (true)
+ {
+ // Compute sum of __urng() * R^i for i in range [0, l).
+ // The values is smaller than R^l which is smaller than
+ // __rd so fits into 128 bit integer.
+ _UInt __Ri{1};
+ _UInt __lsum(__urng() - _Urbg::min());
+ for (int __i = __l - 1; __i > 0; --__i)
+ {
+ __Ri *= _UInt64(__R);
+ __lsum += __Ri * _UInt64(__urng() - _Urbg::min());
+ }
+
+ // The last iteration __urng() * R^k may overflow 128bit
+ // integer. We use the fact that we reject __sum / __x >= __rd,
+ // and split __urng() value into chunks of __abits, so their
+ // products with R^l does not overflow 128bit. These product
+ // are then multiplied by 2^__shift, scaled by __x and added into
+ // the __sumx, until they value is smaller than remaining value
+ // __rem [(__rd - __sumx) * 2^__shift], or whole value was
+ // multiplied (__kth is zero). The reminder of division by
+ // __x are accumulated into __modx value, and handled later.
+ // In consequence this guarantees that __sumx never exceeded
+ // __rd value, and thus does not overflow 128 bit integer.
+ _UInt __sumx = __lsum / __x, __modx = __lsum % __x;
+ _UInt __remx = __rd - __sumx;
+ _UInt64 __kth(__urng() - _Urbg::min());
+ for (unsigned __shift = 0; __kth; __shift += __abits)
+ {
+ const _UInt __val = __Rl * (__kth & __amask);
+ const _UInt __valx = __val / __x;
+ if (__valx > __remx)
+ break;
+ __kth >>= __abits;
+
+ __sumx += (__valx << __shift);
+ __modx += ((__val % __x) << __shift);
+
+ __remx -= __valx;
+ __remx >>= __abits;
+ }
+ if (__kth) // __sum / __x > _rd after adding __kth iteration
+ continue;
+
+ // Handle accumulated reminders
+ const _UInt __valx = __modx / __x;
+ if (__valx > __rd - __sumx) // avoids overflow
+ continue;
+
+ __sumx += __valx;
+ const _RealT __ret = _RealT(__sumx) / _RealT(__rd);
+ if (__ret < _RealT(1.0))
+ return __ret;
+ }
+ }
+#undef _GLIBCXX_GEN_CANON_CONST
}
#if !defined(_GLIBCXX_GENERATE_CANONICAL_STRICT)
@@ -234,6 +234,98 @@ test_2p31m1(const std::mt19937& rng)
}
}
+template<std::uint64_t Max, typename Under = std::mt19937_64>
+struct trimmed_engine
+{
+ using result_type = std::uint64_t;
+
+ static constexpr
+ result_type min()
+ { return result_type(0); }
+
+ static constexpr
+ result_type max()
+ { return result_type(Max); }
+
+ trimmed_engine() : dist(min(), max())
+ {}
+
+ trimmed_engine(const Under& gen)
+ : under(gen), dist(min(), max())
+ {}
+
+ result_type operator()()
+ { return dist(under); }
+
+ void discard(std::size_t n)
+ {
+ for (size_t i = 0; i < n; ++i)
+ (void)dist(under);
+ }
+
+ friend bool
+ operator==(const trimmed_engine& lhs, const trimmed_engine& rhs)
+ { return lhs.under == rhs.under; }
+
+private:
+ Under under;
+ std::uniform_int_distribution<result_type> dist;
+};
+
+// Uses a generate that emits range of size 2^55+2.
+// To produce 128bit IEE float, we need 3 calls to above generator, that can
+// produce value that have 155 bits, and uses separate path.
+template <typename T>
+void
+test_2p55p1(const std::mt19937& rng)
+{
+ if (!std::numeric_limits<T>::is_iec559)
+ return;
+
+ constexpr size_t mantissa = std::numeric_limits<T>::digits;
+ constexpr size_t call_per_elem = (mantissa / 55) + 1;
+ const trimmed_engine<(1ULL << 55) + 1ULL, std::mt19937> lrng{rng};
+
+ int deviation = 0, max = 0, rms = 0, zeros = 0;
+ int skips = run_generator<T, -1u, call_per_elem, 1u>
+ (lrng, deviation, max, rms, zeros);
+
+ switch (mantissa)
+ {
+ case 24: // ieee32
+ VERIFY(skips == 0);
+ VERIFY(deviation == 7682);
+ VERIFY(max == 295);
+ VERIFY(rms == 971);
+ VERIFY(zeros == 0);
+ break;
+ case 53: // ieee64
+ VERIFY(skips == 0);
+ VERIFY(deviation == 7682);
+ VERIFY(max == 295);
+ VERIFY(rms == 971);
+ VERIFY(zeros == 0);
+ break;
+ case 64: // ieee80
+ VERIFY(skips == 0);
+ VERIFY(deviation == 6788);
+ VERIFY(max == 248);
+ VERIFY(rms == 854);
+ VERIFY(zeros == 0);
+ break;
+ case 113: // ieee128
+ VERIFY(skips == 0);
+ VERIFY(deviation == 8222);
+ VERIFY(max == 257);
+ VERIFY(rms == 999);
+ VERIFY(zeros == 0);
+ break;
+ default:
+ VERIFY(false);
+ break;
+ }
+}
+
int main()
{
std::mt19937 rng(8890);
@@ -244,20 +336,24 @@ int main()
test_2p32<float>(rng);
test_10p6<float>(rng);
test_2p31m1<float>(rng);
+ test_2p55p1<float>(rng);
test_2p32<double>(rng);
test_10p6<double>(rng);
test_2p31m1<double>(rng);
+ test_2p55p1<double>(rng);
test_2p32<long double>(rng);
test_10p6<long double>(rng);
test_2p31m1<long double>(rng);
+ test_2p55p1<long double>(rng);
#ifndef _GLIBCXX_GENERATE_CANONICAL_STRICT
# ifdef __SIZEOF_FLOAT128__
test_2p32<__float128>(rng);
test_10p6<__float128>(rng);
test_2p31m1<__float128>(rng);
+ test_2p55p1<__float128>(rng);
# endif
#endif
}
@@ -10,6 +10,40 @@ test_engine()
(void)std::generate_canonical<_Real, size_t(-1)>(__engine);
}
+template<std::uint64_t Max, typename Under = std::mt19937_64>
+struct trimmed_engine
+{
+ using result_type = std::uint64_t;
+
+ static constexpr
+ result_type min()
+ { return result_type(0); }
+
+ static constexpr
+ result_type max()
+ { return result_type(Max); }
+
+ trimmed_engine() : dist(min(), max())
+ {}
+
+ result_type operator()()
+ { return dist(under); }
+
+private:
+ Under under;
+ std::uniform_int_distribution<result_type> dist;
+};
+
+template<typename Real, size_t Bits>
+void
+test_bits()
+{
+ trimmed_engine<(std::uint64_t(1) << Bits) - 1> pow2_engine;
+ (void)std::generate_canonical<Real, -1u>(pow2_engine);
+ trimmed_engine<(std::uint64_t(1) << Bits) - 2> non_pow2_engine;
+ (void)std::generate_canonical<Real, -1u>(non_pow2_engine);
+}
+
template<typename _Real>
void
test_all_engines()
@@ -29,6 +63,35 @@ test_all_engines()
test_engine<_Real, std::philox4x32>();
test_engine<_Real, std::philox4x64>();
#endif
+
+ // For 128bit floating points, generator emitting a range, which size is
+ // not power of two, but of width of B bits, such that for any N:
+ // N * B < 113 (bits in ieee128)
+ // (N+1) * B > 128
+ // use >128bits patch, as they would otherwise require integer with more
+ // than 128 bits.
+ // N == 3: B in [43, 57)
+ test_bits<_Real, 42>(); // 3 calls
+ test_bits<_Real, 43>(); // 3 calls, >128bits path
+ test_bits<_Real, 56>(); // 3 calls, >128bits path
+ test_bits<_Real, 57>(); // 2 calls
+
+ // N == 4: B in [33, 38)
+ test_bits<_Real, 32>(); // 4 calls
+ test_bits<_Real, 33>(); // 4 calls, >128bits path
+ test_bits<_Real, 37>(); // 4 calls, >128bits path
+ test_bits<_Real, 38>(); // 3 calls
+
+ // N == 5: B in [26, 29)
+ test_bits<_Real, 25>(); // 5 calls
+ test_bits<_Real, 26>(); // 5 calls, >128bits path
+ test_bits<_Real, 28>(); // 5 calls, >128bits path
+ test_bits<_Real, 29>(); // 4 calls
+
+ // N == 6: B == 22
+ test_bits<_Real, 21>(); // 6 calls
+ test_bits<_Real, 22>(); // 6 calls, >128bits path
+ test_bits<_Real, 23>(); // 5 calls
}
int main()
deleted file mode 100644
@@ -1,89 +0,0 @@
-// { dg-do compile { target { c++11 } } }
-// { dg-require-effective-target __float128 }
-// { dg-require-effective-target base_quadfloat_support }
-// { dg-add-options __float128 }
-
-#include <random>
-#include <cstdint>
-
-template<std::uint64_t Max, typename Under = std::mt19937_64>
-struct trimmed_engine
-{
- using result_type = std::uint64_t;
-
- static constexpr
- result_type min()
- { return result_type(0); }
-
- static constexpr
- result_type max()
- { return result_type(Max); }
-
- trimmed_engine() : dist(min(), max())
- {}
-
- result_type operator()()
- { return dist(under); }
-
-private:
- Under under;
- std::uniform_int_distribution<result_type> dist;
-};
-
-template<typename Real, size_t Bits>
-void
-test_non_pow2()
-{
- trimmed_engine<(std::uint64_t(1) << Bits) - 2> non_pow2_engine;
- (void)std::generate_canonical<Real, -1u>(non_pow2_engine);
-}
-
-template<typename Real, size_t Bits>
-void
-test_pow2()
-{
- trimmed_engine<(std::uint64_t(1) << Bits) - 1> pow2_engine;
- (void)std::generate_canonical<Real, -1u>(pow2_engine);
-}
-
-int main()
-{
- // For 128bit floating points, generator emitting a range, which size is
- // not power of two, but of width of B bits, such that for any N:
- // N * B < 113 (bits in ieee128)
- // (N+1) * B > 128
- // are not supported, as they would require integer with more than 127 bits.
-
- // N == 3: B in [43, 57)
- test_non_pow2<__float128, 42>(); // 3 calls
- test_non_pow2<__float128, 43>(); // { dg-error "from here" }
- test_non_pow2<__float128, 56>(); // { dg-error "from here" }
- test_non_pow2<__float128, 57>(); // 2 calls
- test_pow2<__float128, 43>();
- test_pow2<__float128, 56>();
-
- // N == 4: B in [33, 38)
- test_non_pow2<__float128, 32>(); // 4 calls
- test_non_pow2<__float128, 33>(); // { dg-error "from here" }
- test_non_pow2<__float128, 37>(); // { dg-error "from here" }
- test_non_pow2<__float128, 38>(); // 3 calls
- test_pow2<__float128, 33>();
- test_pow2<__float128, 37>();
-
- // N == 5: B in [26, 29)
- test_non_pow2<__float128, 25>(); // 5 calls
- test_non_pow2<__float128, 26>(); // { dg-error "from here" }
- test_non_pow2<__float128, 28>(); // { dg-error "from here" }
- test_non_pow2<__float128, 29>(); // 4 calls
- test_pow2<__float128, 26>();
- test_pow2<__float128, 28>();
-
- // N == 6: B == 22
- test_non_pow2<__float128, 21>(); // 6 calls
- test_non_pow2<__float128, 22>(); // { dg-error "from here" }
- test_non_pow2<__float128, 23>(); // 5 calls
- test_pow2<__float128, 22>();
-}
-
-// { dg-prune-output "no type named 'type' in 'struct std::__detail::_Select_uint_least_t" }
-// { dg-prune-output "static assertion failed: sorry, would be too much trouble for a slow result" }