Rasmus Munk Larsen
5297b7162a
Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.
2021-02-25 18:21:21 +00:00
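To show what a "NaN propagation strategy" means for a max reduction, here is a minimal sketch in standard C++. It assumes three policies modeled loosely on the option names Eigen 3.4 exposes (fast, propagate NaN, propagate numbers). The enum `NaNPolicy` and the function `max_reduce` are hypothetical and are not Eigen's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical policy tags, modeled on (but not identical to) Eigen's options.
enum class NaNPolicy { Fast, PropagateNaN, PropagateNumbers };

// Max-reduction over a non-empty vector under the chosen NaN policy.
inline float max_reduce(const std::vector<float>& v, NaNPolicy policy) {
  float acc = v[0];
  for (std::size_t i = 1; i < v.size(); ++i) {
    const float x = v[i];
    switch (policy) {
      case NaNPolicy::Fast:
        // Plain comparison: the result in the presence of NaNs depends on order.
        acc = (x > acc) ? x : acc;
        break;
      case NaNPolicy::PropagateNaN:
        // Any NaN poisons the result.
        if (std::isnan(acc) || std::isnan(x)) {
          acc = std::numeric_limits<float>::quiet_NaN();
        } else if (x > acc) {
          acc = x;
        }
        break;
      case NaNPolicy::PropagateNumbers:
        // NaNs are skipped unless every element is NaN; std::fmax does this pairwise.
        acc = std::fmax(acc, x);
        break;
    }
  }
  return acc;
}
```

A min reduction follows the same pattern with `<` and `std::fmin`.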
Rasmus Munk Larsen
be0574e215
New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow with AVX512. A future change will introduce a specialization for double.
2021-02-17 02:50:32 +00:00
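A fractional figure like "1.4 ulps" implies the error is measured in units of the float spacing near a higher-precision reference. The commit does not say how it was measured; the sketch below shows one common convention, assuming a double-precision reference value. The helper `ulp_error` is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: error of a float result, in units of the float ULP
// at the reference value (the reference is computed in double precision).
inline double ulp_error(float approx, double reference) {
  const float r = static_cast<float>(reference);
  // Spacing between r and the next representable float away from zero.
  const float next =
      std::nextafter(r, std::copysign(std::numeric_limits<float>::infinity(), r));
  const double ulp = std::fabs(double(next) - double(r));
  return std::fabs(double(approx) - reference) / ulp;
}
```

For example, `ulp_error(std::pow(2.0f, 0.5f), std::sqrt(2.0))` measures how far a float `pow` result lands from the double-precision value of sqrt(2).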
Rasmus Munk Larsen
6e3b795f81
Add more tests for pow and fix a corner case for huge exponents, where the result is always zero or infinite unless x is one.
2021-02-05 16:58:49 -08:00
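The corner case follows from x^y = exp(y * log(x)): once |y| is huge (e.g. 1e30), even the floats adjacent to 1 give |y * log(x)| far beyond the float range, so the result saturates to 0 or infinity. The sketch below encodes that expectation for positive x and checks it against `std::pow`. The function `expected_pow_huge_y` is hypothetical and assumes x > 0.

```cpp
#include <cmath>
#include <limits>

// Hypothetical reference for x > 0 and |y| so large that y * log(x)
// overflows or underflows the float exponent range for every x != 1.
inline float expected_pow_huge_y(float x, float y) {
  if (x == 1.0f) return 1.0f;  // 1^y == 1 for any y
  // x^y grows without bound when x > 1 and y > 0, or when x < 1 and y < 0.
  const bool grows = (x > 1.0f) == (y > 0.0f);
  return grows ? std::numeric_limits<float>::infinity() : 0.0f;
}
```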
Antonio Sanchez
4c42d5ee41
Eliminate implicit conversion warning in test/array_cwise.cpp
2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90
Replace std::isnan with numext::isnan for C++03
2021-01-23 11:02:35 -08:00
Antonio Sanchez
f0e46ed5d4
Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
`GenericPacketMathFunctions` to remove the `constexpr`.
While adding tests for half/bfloat16, found other issues related to
implicit conversions.
Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types. These seem to be implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Rasmus Munk Larsen
cdd8fdc32e
Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
I ran some tests (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]` and `y` in `{2, sqrt(2), -sqrt(2)}`, and got the following error statistics:
```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```
If I widen the range to all normal floats, I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:
```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```
which seems reasonable, since these results are subnormals with only a couple of significant bits left.
2021-01-18 13:25:16 +00:00
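The statistics quoted above (max and RMS relative error against a double-precision reference) can be reproduced in spirit with a harness like the one below. It is a sketch under assumptions: `pow_error_stats` and `ErrorStats` are hypothetical names, x is sampled geometrically rather than enumerating every float, and points where the reference is zero or non-finite are skipped because relative error is undefined there.

```cpp
#include <algorithm>
#include <cmath>

struct ErrorStats {
  double max_rel_error;
  double rms_rel_error;
  long count;
};

// Compare a float pow implementation against a double-precision reference
// for x sampled geometrically in [lo, hi] (lo > 0) and a fixed exponent y.
template <typename PowFn>
ErrorStats pow_error_stats(PowFn pow_f, float lo, float hi, float y, int samples) {
  double max_err = 0.0, sum_sq = 0.0;
  long count = 0;
  const double ratio = std::pow(double(hi) / double(lo), 1.0 / (samples - 1));
  double xd = lo;
  for (int i = 0; i < samples; ++i, xd *= ratio) {
    const float x = static_cast<float>(xd);
    const double ref = std::pow(double(x), double(y));
    if (ref == 0.0 || !std::isfinite(ref)) continue;  // relative error undefined
    const double err = std::fabs((double(pow_f(x, y)) - ref) / ref);
    max_err = std::max(max_err, err);
    sum_sq += err * err;
    ++count;
  }
  return {max_err, count ? std::sqrt(sum_sq / count) : 0.0, count};
}
```

Passing the float overload of `std::pow` as `pow_f` gives a baseline to compare a vectorized implementation against.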
Rasmus Munk Larsen
f9fac1d5b0
Add log2() to Eigen.
2020-12-04 21:45:09 +00:00
Rasmus Munk Larsen
f23dc5b971
Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b
Add log2() operator to Eigen
2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
6964ae8d52
Change the sign operator in Eigen to return NaN for NaN arguments, not zero.
2020-07-07 01:54:04 +00:00
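The behavior change is easy to state as code. A naive `(x > 0) - (x < 0)` returns 0 for NaN because both comparisons are false; checking for NaN first makes it propagate. The function `sign_propagate_nan` is a hypothetical scalar sketch, not Eigen's vectorized implementation, and it assumes a floating-point `T`.

```cpp
#include <cmath>

// Sign of x as -1, 0, or +1, but NaN in gives NaN out.
template <typename T>
T sign_propagate_nan(T x) {
  if (std::isnan(x)) return x;  // the naive formula below would yield 0 here
  return T((x > T(0)) - (x < T(0)));
}
```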
Rasmus Munk Larsen
74ec8e6618
Make size odd for transposeInPlace test to make sure we hit the scalar path.
2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
b47c777993
Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
(Generated by http://go/benchy . Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)
name                              old time/op    new time/op    delta
BM_TransposeInPlace<float>/4      9.84ns ± 0%    6.51ns ± 0%    -33.80%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8      23.6ns ± 1%    17.6ns ± 0%    -25.26%  (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16     78.8ns ± 0%    60.3ns ± 0%    -23.50%  (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32      302ns ± 0%     229ns ± 0%    -24.40%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59     1.03µs ± 0%    0.84µs ± 1%    -17.87%  (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64     1.20µs ± 0%    0.89µs ± 1%    -25.81%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128    8.96µs ± 0%    3.82µs ± 2%    -57.33%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256     152µs ± 3%      17µs ± 2%    -89.06%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512     837µs ± 1%     208µs ± 0%    -75.15%  (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k     4.28ms ± 2%    1.08ms ± 2%    -74.72%  (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
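The loop structure behind a blocked in-place transpose can be sketched with scalar tiles. Unlike the commit, this sketch does not load tiles into SIMD registers; it only shows how off-diagonal B x B tiles are swapped with their mirror tiles, diagonal tiles are transposed within themselves, and a scalar loop handles the leftover rows and columns when n is not a multiple of B (the path an odd matrix size exercises). The function name, the row-major `std::vector` storage, and the default B = 4 are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical blocked in-place transpose of an n x n row-major matrix.
template <typename T>
void transpose_in_place_blocked(std::vector<T>& a, std::size_t n, std::size_t B = 4) {
  const std::size_t nb = n - n % B;  // extent covered by full B x B tiles
  for (std::size_t bi = 0; bi < nb; bi += B) {
    for (std::size_t bj = bi; bj < nb; bj += B) {
      for (std::size_t i = bi; i < bi + B; ++i) {
        // On a diagonal tile, visit only the strict upper triangle so each
        // pair is swapped once; off-diagonal tiles are swapped in full.
        const std::size_t j0 = (bi == bj) ? i + 1 : bj;
        for (std::size_t j = j0; j < bj + B; ++j) std::swap(a[i * n + j], a[j * n + i]);
      }
    }
  }
  // Scalar remainder: every pair (i, j) with i < j and j >= nb.
  for (std::size_t j = nb; j < n; ++j)
    for (std::size_t i = 0; i < j; ++i) std::swap(a[i * n + j], a[j * n + i]);
}
```

Each unordered pair (i, j) with i < j is swapped exactly once, either inside a tile or in the remainder loop.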
Joel Holdsworth
d5c665742b
Add absolute_difference coefficient-wise binary Array function
2020-03-19 17:45:20 +00:00
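Coefficient-wise absolute difference computes |a_i - b_i| for each pair of coefficients. The sketch below is a hypothetical free function over `std::vector`, not Eigen's Array API. It computes `max - min` rather than `abs(a - b)` so that unsigned inputs do not wrap around. It assumes equal-length inputs and no NaN values, since `std::max`/`std::min` are not symmetric in the presence of NaN.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical coefficient-wise |a - b|; assumes a.size() == b.size() and no NaNs.
template <typename T>
std::vector<T> absolute_difference(const std::vector<T>& a, const std::vector<T>& b) {
  std::vector<T> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = std::max(a[i], b[i]) - std::min(a[i], b[i]);  // safe for unsigned T
  return out;
}
```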
Ilya Tokar
19876ced76
Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and the previous behavior of
pround. Also adds a corresponding unsupported/../Tensor op.
Performance is the same as, e.g., floor (tested on SSE/AVX).
2020-01-07 21:22:44 +00:00
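The practical difference between `std::rint` and `std::round` shows up on halfway cases: `std::rint` uses the current floating-point rounding mode, which defaults to round-to-nearest-even, while `std::round` always rounds halfway cases away from zero. The small demo below (the function name is hypothetical) prints both side by side and assumes the default rounding mode has not been changed.

```cpp
#include <cmath>
#include <cstdio>

// Print std::rint vs std::round on halfway cases (default rounding mode assumed).
inline void compare_rint_round() {
  const double xs[] = {0.5, 1.5, 2.5, -2.5};
  for (double x : xs)
    std::printf("x=%5.1f  rint=%5.1f  round=%5.1f\n", x, std::rint(x), std::round(x));
}
```

Under the default mode, 2.5 goes to 2 with `rint` but to 3 with `round`.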
Srinivas Vasudevan
88062b7fed
Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one.
2019-12-12 01:56:54 +00:00
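Computing expm1 for complex z = x + iy as `exp(z) - 1` loses accuracy near zero because of cancellation in the real part. One known remedy, shown here as a sketch (not necessarily the formula Eigen adopted), rewrites the real part using the identity e^x cos y - 1 = expm1(x) cos y - 2 sin^2(y/2). The function name `complex_expm1` is hypothetical.

```cpp
#include <cmath>
#include <complex>

// expm1 for complex z = x + iy:
//   exp(z) - 1 = (e^x cos y - 1) + i e^x sin y,
// with the real part rewritten as expm1(x) cos y - 2 sin^2(y/2)
// to avoid cancellation when z is near 0.
template <typename T>
std::complex<T> complex_expm1(const std::complex<T>& z) {
  const T x = z.real(), y = z.imag();
  const T s = std::sin(y / T(2));
  const T re = std::expm1(x) * std::cos(y) - T(2) * s * s;
  const T im = std::exp(x) * std::sin(y);
  return {re, im};
}
```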
Gael Guennebaud
87427d2eaa
PR 719: fix real/imag namespace conflict
2019-10-08 09:15:17 +02:00
Gael Guennebaud
543529da6a
Add more extensive tests of Array ctors, including {} variants
2019-01-22 15:30:50 +01:00
Rasmus Munk Larsen
28ba1b2c32
Add support for inverse hyperbolic functions.
Fix cost of division.
2019-01-11 17:45:37 -08:00
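For reference, the inverse hyperbolic functions have closed-form logarithmic definitions, sketched below with hypothetical `_ref` names. These textbook formulas are only a cross-check. They lose accuracy for tiny |x| (asinh, atanh) and for large negative x (asinh), and `x * x` can overflow for huge |x|, which is why production code relies on `std::asinh`, `std::acosh`, and `std::atanh` or more careful approximations.

```cpp
#include <cmath>

// Textbook logarithmic definitions of the inverse hyperbolic functions.
inline double asinh_ref(double x) { return std::log(x + std::sqrt(x * x + 1.0)); }
inline double acosh_ref(double x) { return std::log(x + std::sqrt(x * x - 1.0)); }  // x >= 1
inline double atanh_ref(double x) { return 0.5 * std::log((1.0 + x) / (1.0 - x)); }  // |x| < 1
```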
Gael Guennebaud
e0f6d352fb
Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header.
2018-09-20 18:07:32 +02:00