Everton Constantino
8a7f360ec3
- Vectorizing MMA packing.
...
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
9b411757ab
Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
...
This change also contains a few minor cleanups:
1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
which can be done in other ways.
2. Remove the "HasInsert" enum, which is no longer needed since we removed the
corresponding packet ops.
3. Add faster pselect op for Packet4i when SSE4.1 is supported.
Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.
Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4 9.77 9.77 71670320
BM_TransposeInPlace<float>/8 21.9 21.9 31929525
BM_TransposeInPlace<float>/16 66.6 66.6 10000000
BM_TransposeInPlace<float>/32 243 243 2879561
BM_TransposeInPlace<float>/59 844 844 829767
BM_TransposeInPlace<float>/64 933 933 750567
BM_TransposeInPlace<float>/128 3944 3945 177405
BM_TransposeInPlace<float>/256 16853 16853 41457
BM_TransposeInPlace<float>/512 204952 204968 3448
BM_TransposeInPlace<float>/1k 1053889 1053861 664
BM_TransposeInPlace<bool>/4 14.4 14.4 48637301
BM_TransposeInPlace<bool>/8 36.0 36.0 19370222
BM_TransposeInPlace<bool>/16 31.5 31.5 22178902
BM_TransposeInPlace<bool>/32 111 111 6272048
BM_TransposeInPlace<bool>/59 626 626 1000000
BM_TransposeInPlace<bool>/64 428 428 1632689
BM_TransposeInPlace<bool>/128 1677 1677 417377
BM_TransposeInPlace<bool>/256 7126 7126 96264
BM_TransposeInPlace<bool>/512 29021 29024 24165
BM_TransposeInPlace<bool>/1k 116321 116330 6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31
Added support for reverse iterators for Vectorwise operations.
2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5
Indexed view should have RowMajorBit when there is statically a single row
2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28
Resolve "IndexedView of a vector should allow linear access"
2020-05-13 19:24:42 +00:00
Rasmus Munk Larsen
c1d944dd91
Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
...
I cannot measure any performance changes for SSE, AVX, or AVX512.
name old time/op new time/op delta
BM_LinSpace<float>/1 1.63ns ± 0% 1.63ns ± 0% ~ (p=0.762 n=5+5)
BM_LinSpace<float>/8 4.92ns ± 3% 4.89ns ± 3% ~ (p=0.421 n=5+5)
BM_LinSpace<float>/64 34.6ns ± 0% 34.6ns ± 0% ~ (p=0.841 n=5+5)
BM_LinSpace<float>/512 217ns ± 0% 217ns ± 0% ~ (p=0.421 n=5+5)
BM_LinSpace<float>/4k 1.68µs ± 0% 1.68µs ± 0% ~ (p=1.000 n=5+5)
BM_LinSpace<float>/32k 13.3µs ± 0% 13.3µs ± 0% ~ (p=0.905 n=5+4)
BM_LinSpace<float>/256k 107µs ± 0% 107µs ± 0% ~ (p=0.841 n=5+5)
BM_LinSpace<float>/1M 427µs ± 0% 427µs ± 0% ~ (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
Rasmus Munk Larsen
225ab040e0
Remove unused packet op "palign".
...
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618
Make size odd for transposeInPlace test to make sure we hit the scalar path.
2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
ab773c7e91
Extend support for Packet16b:
...
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests
This speeds up matmul for boolean matrices by about 10x
name old time/op new time/op delta
BM_MatMul<bool>/8 267ns ± 0% 479ns ± 0% +79.25% (p=0.008 n=5+5)
BM_MatMul<bool>/32 6.42µs ± 0% 0.87µs ± 0% -86.50% (p=0.008 n=5+5)
BM_MatMul<bool>/64 43.3µs ± 0% 5.9µs ± 0% -86.42% (p=0.008 n=5+5)
BM_MatMul<bool>/128 315µs ± 0% 44µs ± 0% -85.98% (p=0.008 n=5+5)
BM_MatMul<bool>/256 2.41ms ± 0% 0.34ms ± 0% -85.68% (p=0.008 n=5+5)
BM_MatMul<bool>/512 18.8ms ± 0% 2.7ms ± 0% -85.53% (p=0.008 n=5+5)
BM_MatMul<bool>/1k 149ms ± 0% 22ms ± 0% -85.40% (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993
Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
...
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
(Generated by http://go/benchy . Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)
name old time/op new time/op delta
BM_TransposeInPlace<float>/4 9.84ns ± 0% 6.51ns ± 0% -33.80% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8 23.6ns ± 1% 17.6ns ± 0% -25.26% (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16 78.8ns ± 0% 60.3ns ± 0% -23.50% (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32 302ns ± 0% 229ns ± 0% -24.40% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59 1.03µs ± 0% 0.84µs ± 1% -17.87% (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64 1.20µs ± 0% 0.89µs ± 1% -25.81% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128 8.96µs ± 0% 3.82µs ± 2% -57.33% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256 152µs ± 3% 17µs ± 2% -89.06% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512 837µs ± 1% 208µs ± 0% -75.15% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k 4.28ms ± 2% 1.08ms ± 2% -74.72% (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
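The blocking idea behind b47c777993 can be illustrated with a scalar sketch. This is not Eigen's implementation: the function name, the tile size `kBlock`, and the row-major layout are assumptions chosen for illustration. It only shows how working tile by tile keeps each pair of touched regions small.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tile size; a real kernel would tie this to the packet size.
constexpr std::size_t kBlock = 4;

// Transpose an n x n row-major matrix in place, one kBlock x kBlock tile
// at a time. Only tiles on or above the diagonal are visited.
void transpose_in_place_blocked(std::vector<float>& a, std::size_t n) {
  for (std::size_t ib = 0; ib < n; ib += kBlock) {
    const std::size_t iend = std::min(ib + kBlock, n);
    for (std::size_t jb = ib; jb < n; jb += kBlock) {
      const std::size_t jend = std::min(jb + kBlock, n);
      for (std::size_t i = ib; i < iend; ++i) {
        // On a diagonal tile, swap only the strict upper triangle.
        const std::size_t jstart = (jb == ib) ? i + 1 : jb;
        for (std::size_t j = jstart; j < jend; ++j) {
          std::swap(a[i * n + j], a[j * n + i]);
        }
      }
    }
  }
}
```

An odd size such as 5 or 59 exercises the partial tiles at the edges, which is also why the benchmark above includes a size of 59.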
Rasmus Munk Larsen
e80ec24357
Remove unused packet op "preduxp".
2020-04-23 18:17:14 +00:00
Rasmus Munk Larsen
2f6ddaa25c
Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
...
Benchmark numbers for the logical and of two NxN tensors:
name old time/op new time/op delta
BM_booleanAnd_1T/3 [using 1 threads] 14.6ns ± 0% 14.4ns ± 0% -0.96%
BM_booleanAnd_1T/4 [using 1 threads] 20.5ns ±12% 9.0ns ± 0% -56.07%
BM_booleanAnd_1T/7 [using 1 threads] 41.7ns ± 0% 10.5ns ± 0% -74.87%
BM_booleanAnd_1T/8 [using 1 threads] 52.1ns ± 0% 10.1ns ± 0% -80.59%
BM_booleanAnd_1T/10 [using 1 threads] 76.3ns ± 0% 13.8ns ± 0% -81.87%
BM_booleanAnd_1T/15 [using 1 threads] 167ns ± 0% 16ns ± 0% -90.45%
BM_booleanAnd_1T/16 [using 1 threads] 188ns ± 0% 16ns ± 0% -91.57%
BM_booleanAnd_1T/31 [using 1 threads] 667ns ± 0% 34ns ± 0% -94.83%
BM_booleanAnd_1T/32 [using 1 threads] 710ns ± 0% 35ns ± 0% -95.01%
BM_booleanAnd_1T/64 [using 1 threads] 2.80µs ± 0% 0.11µs ± 0% -95.93%
BM_booleanAnd_1T/128 [using 1 threads] 11.2µs ± 0% 0.4µs ± 0% -96.11%
BM_booleanAnd_1T/256 [using 1 threads] 44.6µs ± 0% 2.5µs ± 0% -94.31%
BM_booleanAnd_1T/512 [using 1 threads] 178µs ± 0% 10µs ± 0% -94.35%
BM_booleanAnd_1T/1k [using 1 threads] 717µs ± 0% 78µs ± 1% -89.07%
BM_booleanAnd_1T/2k [using 1 threads] 2.87ms ± 0% 0.31ms ± 1% -89.08%
BM_booleanAnd_1T/4k [using 1 threads] 11.7ms ± 0% 1.9ms ± 4% -83.55%
BM_booleanAnd_1T/10k [using 1 threads] 70.3ms ± 0% 17.2ms ± 4% -75.48%
2020-04-20 20:16:28 +00:00
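As a rough illustration of why packing booleans helps (2f6ddaa25c), the sketch below ANDs eight byte-sized flags per step using a 64-bit word. It assumes each flag is stored as one byte holding 0 or 1; the function name is hypothetical and this is not the packet code added by the commit, which uses SIMD registers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// out[i] = a[i] AND b[i], for flags stored as bytes holding 0 or 1.
void and_flags(const std::uint8_t* a, const std::uint8_t* b,
               std::uint8_t* out, std::size_t n) {
  std::size_t i = 0;
  // Wide path: eight flags per iteration.
  for (; i + 8 <= n; i += 8) {
    std::uint64_t wa, wb;
    std::memcpy(&wa, a + i, 8);
    std::memcpy(&wb, b + i, 8);
    const std::uint64_t wr = wa & wb;
    std::memcpy(out + i, &wr, 8);
  }
  // Scalar tail for the remaining flags.
  for (; i < n; ++i) {
    out[i] = static_cast<std::uint8_t>(a[i] & b[i]);
  }
}
```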
Christoph Hertzberg
d46d726e9d
CommaInitializer wrongfully asserted for 0-sized blocks
...
The commainitializer unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
8e875719b3
Replace norm() with squaredNorm() to address integer overflows
...
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.
Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
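The failure mode described in 8e875719b3 can be reproduced without Eigen. The sketch below is an assumption-laden illustration: it emulates 32-bit wraparound with unsigned arithmetic (signed overflow itself is undefined behaviour in C++), and the helper names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Sum of squares with 32-bit wraparound. Unsigned arithmetic keeps the wrap
// well defined; the bits are then reinterpreted as a signed value.
std::int32_t wrapped_squared_norm(const std::int32_t* v, int n) {
  std::uint32_t acc = 0;
  for (int i = 0; i < n; ++i) {
    const std::uint64_t u = static_cast<std::uint32_t>(v[i]);
    acc += static_cast<std::uint32_t>(u * u);
  }
  std::int32_t out;
  std::memcpy(&out, &acc, sizeof out);
  return out;
}

// Taking the square root of a wrapped (negative) value yields NaN.
double norm_from_squared(std::int32_t squared) {
  return std::sqrt(static_cast<double>(squared));
}
```

For the vector (40000, 40000) the true squared norm is 3,200,000,000, which does not fit in 32 signed bits; the wrapped value is negative and its square root is NaN. Comparing squared norms directly avoids the square root, although the values are still wrapped.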
Rasmus Munk Larsen
4fd5d1477b
Fix packetmath test build for AVX.
2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
55c8fe8d0f
Fix bug in 52d54278be
2020-03-27 16:41:15 +00:00
Joel Holdsworth
52d54278be
Additional NEON packet-math operations
2020-03-26 20:18:19 +00:00
Aaron Franke
5c22c7a7de
Make file formatting comply with POSIX and Unix standards
...
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Joel Holdsworth
d5c665742b
Add absolute_difference coefficient-wise binary Array function
2020-03-19 17:45:20 +00:00
Joel Holdsworth
54aa8fa186
Implement integer square-root for NEON
2020-03-19 17:05:13 +00:00
Joel Holdsworth
88337acae2
test/packetmath: Add tests for all integer types
2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578
test/packetmath: Made negate non-mandatory
2020-03-10 22:46:19 +00:00
Rasmus Munk Larsen
6ac37768a9
Revert "add some static checks for packet-picking logic"
...
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f
Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
...
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8
Disable test in test/vectorization_logic.cpp, which is currently failing with AVX.
2020-02-24 23:28:25 +00:00
Francesco Mazzoli
7769600245
add some static checks for packet-picking logic
2020-02-07 18:16:16 +01:00
Christoph Hertzberg
1d0c45122a
Removing executable bit from file mode
2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68
Bug #1790 : Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal).
2020-01-11 14:57:22 +01:00
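The point about NaNs in 35219cea68 can be shown with two quiet NaNs that differ only in their payload bits. The sketch assumes IEEE-754 single precision; the helper names are hypothetical and are not the test utilities touched by the commit.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Build a float from its raw bit pattern.
float float_from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Strict comparison of the underlying bytes.
bool bitwise_equal(float a, float b) {
  return std::memcmp(&a, &b, sizeof(float)) == 0;
}

// Treat any two NaNs as matching, otherwise compare values.
bool nan_aware_equal(float a, float b) {
  return (std::isnan(a) && std::isnan(b)) || a == b;
}
```

With bit patterns `0x7fc00000` and `0x7fc00001`, both values are NaN, yet `bitwise_equal` reports a mismatch while `nan_aware_equal` accepts them.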
Srinivas Vasudevan
2e099e8d8f
Added special_packetmath test and tweaked bounds on tests.
...
Refactor shared packetmath code to header file.
(Squashed from PR !38 )
2020-01-11 10:31:21 +00:00
Christoph Hertzberg
8333e03590
Use data.data() instead of &data (since it is not obvious that Array is trivially copyable)
2020-01-09 11:38:19 +01:00
Ilya Tokar
19876ced76
Bug #1785 : Introduce numext::rint.
...
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
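The difference between the two rounding behaviours mentioned in 19876ced76 shows up on halfway cases. The sketch below uses only the standard library and assumes the default floating-point rounding mode (round to nearest, ties to even); the struct and function names are hypothetical.

```cpp
#include <cmath>

struct Rounded {
  double rint_value;   // std::rint: follows the current rounding mode
  double round_value;  // std::round: halfway cases go away from zero
};

Rounded round_both(double x) {
  return {std::rint(x), std::round(x)};
}
```

Under the default rounding mode, `round_both(2.5)` gives 2 for `rint_value` and 3 for `round_value`, while `round_both(3.5)` gives 4 for both.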
Everton Constantino
eedb7eeacf
Protecting integer_types's long long test with a check to see if we have CXX11 support.
2020-01-07 14:35:35 +00:00
Christoph Hertzberg
870e53c0f2
Bug #1788 : Fix rule-of-three violations inside the stable modules.
...
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code (#1587 )
2019-12-19 17:30:11 +01:00
Christoph Hertzberg
6965f6de7f
Fix unit-test which I broke in previous fix
2019-12-19 13:42:14 +01:00
Christoph Hertzberg
72166d0e6e
Fix some maybe-uninitialized warnings
2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac
Workaround class-memaccess warnings on newer GCC versions
2019-12-18 16:37:26 +01:00
Rasmus Munk Larsen
a566074480
Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
...
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.
This change also contains a few improvements to speed up the original float specialization of logistic:
- Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
- Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).
The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.
The benchmarks below repeatedly call
u = v.logistic() (u = v.tanh(), respectively)
where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].
Benchmark numbers for logistic:
Before:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float 4467 4468 155835 model_time: 4827
AVX
BM_eigen_logistic_float 2347 2347 299135 model_time: 2926
AVX+FMA
BM_eigen_logistic_float 1467 1467 476143 model_time: 2926
AVX512
BM_eigen_logistic_float 805 805 858696 model_time: 1463
After:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float 2589 2590 270264 model_time: 4827
AVX
BM_eigen_logistic_float 1428 1428 489265 model_time: 2926
AVX+FMA
BM_eigen_logistic_float 1059 1059 662255 model_time: 2926
AVX512
BM_eigen_logistic_float 673 673 1000000 model_time: 1463
Benchmark numbers for tanh:
Before:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float 2391 2391 292624 model_time: 4242
AVX
BM_eigen_tanh_float 1256 1256 554662 model_time: 2633
AVX+FMA
BM_eigen_tanh_float 823 823 866267 model_time: 1609
AVX512
BM_eigen_tanh_float 443 443 1578999 model_time: 805
After:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float 2588 2588 273531 model_time: 4242
AVX
BM_eigen_tanh_float 1536 1536 452321 model_time: 2633
AVX+FMA
BM_eigen_tanh_float 1007 1007 694681 model_time: 1609
AVX512
BM_eigen_tanh_float 471 471 1472178 model_time: 805
2019-12-16 21:33:42 +00:00
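A scalar sketch of the branch-hint idea in a566074480 follows. The `SKETCH_PREDICT_*` macros are hypothetical stand-ins, not Eigen's definitions, and they rely on the GCC/Clang builtin `__builtin_expect` with a plain fallback elsewhere. The function uses `std::exp` on both paths rather than the rational approximation from the commit, so it shows only the control flow.

```cpp
#include <cmath>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#define SKETCH_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#else
#define SKETCH_PREDICT_FALSE(x) (x)
#define SKETCH_PREDICT_TRUE(x) (x)
#endif

// Logistic function with a rarely taken path for large negative inputs,
// where 1/(1+exp(-x)) is replaced by exp(x).
float logistic_sketch(float x) {
  if (SKETCH_PREDICT_FALSE(x < -9.0f)) {
    return std::exp(x);  // rare path
  }
  return 1.0f / (1.0f + std::exp(-x));  // common path
}
```

The hint does not change results; it only tells the compiler which branch to lay out as the fall-through path.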
Ilya Tokar
06e99aaf40
Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
...
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
still an order of magnitude faster than scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Srinivas Vasudevan
88062b7fed
Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one.
2019-12-12 01:56:54 +00:00
Joel Holdsworth
1b6e0395e6
Added io test
2019-12-11 18:22:57 +00:00
Gael Guennebaud
6358599ecb
Fix QuaternionBase::cast for quaternion map and wrapper.
2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013
bug #1776 : fix vector-wise STL iterator's operator-> using a proxy as pointer type.
...
This changeset fixes also the value_type definition.
2019-12-03 14:40:15 +01:00
Joel Holdsworth
743c925286
test/packetmath: Silence alignment warnings
2019-11-05 19:06:12 +00:00
Hans Johnson
8c8cab1afd
STYLE: Convert CMake-language commands to lower case
...
Ancient CMake versions required upper-case commands. Later command names
became case-insensitive. Now the preferred style is lower-case.
2019-10-31 11:36:37 -05:00
Hans Johnson
6fb3e5f176
STYLE: Remove CMake-language block-end command arguments
...
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308
1. Fix a bug in psqrt and make it return 0 for +inf arguments.
...
2. Simplify handling of special cases by taking advantage of the fact that the
builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)
Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
2019-11-15 17:09:46 -08:00
Gael Guennebaud
8af045a287
bug #1774 : fix VectorwiseOp::begin()/end() return types regarding constness.
2019-11-14 11:45:52 +01:00
Gael Guennebaud
8496f86f84
Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices.
2019-11-13 21:16:53 +01:00
Gael Guennebaud
e7d8ba747c
bug #1752 : make is_convertible behave like the C++11 std::is_convertible, and fall back to std::is_convertible when c++11 is enabled.
2019-10-10 17:41:47 +02:00
Gael Guennebaud
fb557aec5c
bug #1752 : disable some is_convertible tests for recent compilers.
2019-10-10 11:40:21 +02:00
Gael Guennebaud
36da231a41
Disable an expected warning in unit test
2019-10-08 16:28:14 +02:00
Gael Guennebaud
87427d2eaa
PR 719: fix real/imag namespace conflict
2019-10-08 09:15:17 +02:00
Rasmus Larsen
d38e6fbc27
Merged in rmlarsen/eigen (pull request PR-704)
...
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Christoph Hertzberg
efd9867ff0
bug #1746 : Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types
2019-09-24 11:09:58 +02:00
Rasmus Munk Larsen
6de5ed08d8
Add generic PacketMath implementation of the Error Function (erf).
2019-09-19 12:48:30 -07:00
Srinivas Vasudevan
df0816b71f
Merging eigen/eigen.
2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109
Add Bessel functions to SpecialFunctions.
...
- Split SpecialFunctions files into a separate BesselFunctions file.
In particular add:
- Modified bessel functions of the second kind k0, k1, k0e, k1e
- Bessel functions of the first kind j0, j1
- Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Srinivas Vasudevan
facdec5aa7
Add packetized versions of i0e and i1e special functions.
...
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
- Move chebevl to GenericPacketMathFunctions.
A brief benchmark, building Eigen with FMA, AVX and AVX2 flags:
Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 57.3 57.3 10000000
BM_eigen_i0e_double/8 398 398 1748554
BM_eigen_i0e_double/64 3184 3184 218961
BM_eigen_i0e_double/512 25579 25579 27330
BM_eigen_i0e_double/4k 205043 205042 3418
BM_eigen_i0e_double/32k 1646038 1646176 422
BM_eigen_i0e_double/256k 13180959 13182613 53
BM_eigen_i0e_double/1M 52684617 52706132 10
BM_eigen_i0e_float/1 28.4 28.4 24636711
BM_eigen_i0e_float/8 75.7 75.7 9207634
BM_eigen_i0e_float/64 512 512 1000000
BM_eigen_i0e_float/512 4194 4194 166359
BM_eigen_i0e_float/4k 32756 32761 21373
BM_eigen_i0e_float/32k 261133 261153 2678
BM_eigen_i0e_float/256k 2087938 2088231 333
BM_eigen_i0e_float/1M 8380409 8381234 84
BM_eigen_i1e_double/1 56.3 56.3 10000000
BM_eigen_i1e_double/8 397 397 1772376
BM_eigen_i1e_double/64 3114 3115 223881
BM_eigen_i1e_double/512 25358 25361 27761
BM_eigen_i1e_double/4k 203543 203593 3462
BM_eigen_i1e_double/32k 1613649 1613803 428
BM_eigen_i1e_double/256k 12910625 12910374 54
BM_eigen_i1e_double/1M 51723824 51723991 10
BM_eigen_i1e_float/1 28.3 28.3 24683049
BM_eigen_i1e_float/8 74.8 74.9 9366216
BM_eigen_i1e_float/64 505 505 1000000
BM_eigen_i1e_float/512 4068 4068 171690
BM_eigen_i1e_float/4k 31803 31806 21948
BM_eigen_i1e_float/32k 253637 253692 2763
BM_eigen_i1e_float/256k 2019711 2019918 346
BM_eigen_i1e_float/1M 8238681 8238713 86
After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 15.8 15.8 44097476
BM_eigen_i0e_double/8 99.3 99.3 7014884
BM_eigen_i0e_double/64 777 777 886612
BM_eigen_i0e_double/512 6180 6181 100000
BM_eigen_i0e_double/4k 48136 48140 14678
BM_eigen_i0e_double/32k 385936 385943 1801
BM_eigen_i0e_double/256k 3293324 3293551 228
BM_eigen_i0e_double/1M 12423600 12424458 57
BM_eigen_i0e_float/1 16.3 16.3 43038042
BM_eigen_i0e_float/8 30.1 30.1 23456931
BM_eigen_i0e_float/64 169 169 4132875
BM_eigen_i0e_float/512 1338 1339 516860
BM_eigen_i0e_float/4k 10191 10191 68513
BM_eigen_i0e_float/32k 81338 81337 8531
BM_eigen_i0e_float/256k 651807 651984 1000
BM_eigen_i0e_float/1M 2633821 2634187 268
BM_eigen_i1e_double/1 16.2 16.2 42352499
BM_eigen_i1e_double/8 110 110 6316524
BM_eigen_i1e_double/64 822 822 851065
BM_eigen_i1e_double/512 6480 6481 100000
BM_eigen_i1e_double/4k 51843 51843 10000
BM_eigen_i1e_double/32k 414854 414852 1680
BM_eigen_i1e_double/256k 3320001 3320568 212
BM_eigen_i1e_double/1M 13442795 13442391 53
BM_eigen_i1e_float/1 17.6 17.6 41025735
BM_eigen_i1e_float/8 35.5 35.5 19597891
BM_eigen_i1e_float/64 240 240 2924237
BM_eigen_i1e_float/512 1424 1424 485953
BM_eigen_i1e_float/4k 10722 10723 65162
BM_eigen_i1e_float/32k 86286 86297 8048
BM_eigen_i1e_float/256k 691821 691868 1000
BM_eigen_i1e_float/1M 2777336 2777747 256
This shows anywhere from a 50% to 75% improvement on these operations.
I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).
Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Gael Guennebaud
747c6a51ca
bug #1736 : fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)
2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d
bug #1741 : fix self-adjoint*matrix, triangular*matrix, and triangular^-1*matrix with a destination having a non-trivial inner-stride
2019-09-11 15:04:25 +02:00
Gael Guennebaud
c06e6fd115
bug #1741 : fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride
2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956
bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1
2019-09-10 16:25:24 +02:00
Srinivas Vasudevan
e38dd48a27
PR 681: Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
1187bb65ad
Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.
2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen
9aba527405
Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.
2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d
Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
2019-08-27 11:30:31 -07:00
Rasmus Munk Larsen
a3298b22ec
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
...
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
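For reference, the well-known formulas attributed to Kahan for log1p and expm1 can be written in scalar form as below. This is a sketch in double precision with hypothetical function names, not the vectorized code from a3298b22ec; the explicit infinity checks are an assumption about how the infinite-argument handling mentioned above can be done.

```cpp
#include <cmath>

// log(1 + x), accurate for small |x|.
double log1p_kahan(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;       // x is below the rounding granularity of 1
  if (std::isinf(u)) return u;  // log1p(+inf) = +inf, avoids inf/inf below
  // The rounding error in u is cancelled by dividing by (u - 1).
  return std::log(u) * (x / (u - 1.0));
}

// exp(x) - 1, accurate for small |x|.
double expm1_kahan(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;  // exp(x) is negligible next to 1
  if (std::isinf(u)) return u;   // overflow or x = +inf
  return um1 * (x / std::log(u));
}
```

For x = 1e-10 the naive `std::log(1.0 + x)` loses roughly half of its significant digits, whereas both functions above agree closely with `std::log1p` and `std::expm1`.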
Rasmus Munk Larsen
85928e5f47
Guard against repeated definition of EIGEN_MPL2_ONLY
2019-08-07 14:19:00 -07:00
Mehdi Goli
16a56b2ddd
[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
...
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
2019-06-27 12:25:09 +01:00
Rasmus Munk Larsen
988f24b730
Various fixes for packet ops.
...
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Rasmus Larsen
c1b0aea653
Merged in Artem-B/eigen (pull request PR-654)
...
Minor build improvements
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1
Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.
2019-05-31 15:26:06 -07:00
tra
b4c49bf00e
Minor build improvements
...
* Allow specifying multiple GPU architectures. E.g.:
cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it, clang will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
2019-05-31 14:08:34 -07:00
Christoph Hertzberg
4ccd1ece92
bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions
2019-05-10 14:57:05 +02:00
Eugene Zhulenev
e9f0eb8a5e
Add masked_store_available to unpacket_traits
2019-05-02 14:52:58 -07:00
Eugene Zhulenev
b4010f02f9
Add masked pstoreu to AVX and AVX512 PacketMath
2019-05-02 13:14:18 -07:00
Anuj Rawat
8c7a6feb8e
Adding lowlevel APIs for optimized RHS packet load in TensorFlow
...
SpatialConvolution
Low-level APIs are added in order to optimize packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for the scenario where a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.
This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.
Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim
Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.
AVX512:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128, 24x24, 1, 64, 8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
32, 24x24, 3, 64, 5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128, 24x24, 3, 64, 3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
32, 14x14, 24, 64, 5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128, 3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X
AVX2:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
32, 24x24, 3, 64, 5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128, 24x24, 1, 64, 5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128, 24x24, 3, 64, 3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128, 3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X
In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).
On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:
AVX512:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 41350 | 15073 | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 7277 | 7341 | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 8675 | 8681 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 24155 | 16079 | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 25052 | 17152 | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 18269 | 18345 | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 19468 | 19872 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 156060 | 42432 | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 132701 | 36944 | 3.59X
AVX2:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 26233 | 12393 | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 6091 | 6062 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 7427 | 7408 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 23453 | 20826 | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 23167 | 22091 | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422 | 23682 | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165 | 23663 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 72689 | 44969 | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 61732 | 39779 | 1.55X
All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
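The "two partial packets, then merge" idea from 8c7a6feb8e can be emulated with plain arrays. Everything here is hypothetical: the `Packet` alias, `kPacketSize`, and the function name are illustration only, and a real implementation would use masked SIMD loads rather than scalar loops.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kPacketSize = 8;  // e.g. eight floats per packet
using Packet = std::array<float, kPacketSize>;

// Build one packet whose first tail_len lanes come from the end of one
// column and whose remaining lanes come from the start of the next column.
Packet load_across_columns(const float* tail_of_column, std::size_t tail_len,
                           const float* head_of_next_column) {
  Packet first{};   // lanes [0, tail_len) filled, the rest zero
  Packet second{};  // lanes [tail_len, kPacketSize) filled, the rest zero
  for (std::size_t lane = 0; lane < tail_len; ++lane) {
    first[lane] = tail_of_column[lane];
  }
  for (std::size_t lane = tail_len; lane < kPacketSize; ++lane) {
    second[lane] = head_of_next_column[lane - tail_len];
  }
  // Merge: each lane is taken from whichever partial packet owns it.
  Packet merged{};
  for (std::size_t lane = 0; lane < kPacketSize; ++lane) {
    merged[lane] = (lane < tail_len) ? first[lane] : second[lane];
  }
  return merged;
}
```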
Gael Guennebaud
48898a988a
fix unit test in c++03: c++03 does not allow passing local or anonymous enum as template param
2019-03-18 11:38:36 +01:00
Gael Guennebaud
cf7e2e277f
bug #1692 : enable enum as sizes of Matrix and Array
2019-03-17 21:59:30 +01:00
David Tellenbach
b013176e52
Remove undefined std::complex<int>
2019-03-14 11:40:28 +01:00
David Tellenbach
97f9a46cb9
PR 593: Add variadic ctor for DiagonalMatrix with unit tests
2019-03-14 10:18:24 +01:00
Gael Guennebaud
45ab514fe2
revert debug stuff
2019-03-14 10:08:12 +01:00
Gael Guennebaud
d7d2f0680e
bug #1684 : partially workaround clang's 6/7 bug #40815
2019-03-13 10:40:01 +01:00
Gael Guennebaud
b0d406d91c
Enable construction of Ref<VectorType> from a runtime vector.
2019-03-03 15:25:25 +01:00
Gael Guennebaud
32502f3c45
bug #1684 : add simplified regression test for respective clang's bug (this also reveals the same bug in Apple's clang)
2019-02-22 10:29:06 +01:00
Gael Guennebaud
2a39659d79
Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.
2019-02-20 15:23:23 +01:00
Gael Guennebaud
44b54fa4a3
Protect c++11 type alias with Eigen's macro, and add respective unit test.
2019-02-20 14:43:05 +01:00
Gael Guennebaud
4e8047cdcf
Fix compilation with gcc and remove TR1 stuff.
2019-02-20 13:59:34 +01:00
Gael Guennebaud
edd413c184
bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
...
- this helps clang 5 and 6 to support alignas in STL's containers.
- this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
3b5deeb546
bug #899 : make sparseqr unit test more stable by 1) trying with larger threshold and 2) relax rank computation for rank-deficient problems.
2019-02-19 22:57:51 +01:00
Gael Guennebaud
292d61970a
Fix C++17 compilation
2019-02-19 21:59:41 +01:00
Gael Guennebaud
2cfc025bda
fix unit test compilation in c++17: std::ptr_fun has been removed.
2019-02-19 14:05:22 +01:00
Gael Guennebaud
7d10c78738
bug #1046 : add unit tests for correct propagation of alignment through std::alignment_of
2019-02-19 10:31:56 +01:00
Gael Guennebaud
e23bf40dc2
Add unit test for LinSpaced and complex numbers.
2019-02-18 22:03:47 +01:00
Gael Guennebaud
31b6e080a9
Fix regression: .conjugate() was popped out but not re-introduced.
2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0
Set cost of conjugate to 0 (in practice it boils down to a no-op).
...
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * falls back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1
GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
...
Before, only s*A*B was caught, which was inconsistent with GEMM, sub-optimal,
and could even lead to compilation errors (https://stackoverflow.com/questions/54738495 ).
2019-02-18 11:47:54 +01:00
Gael Guennebaud
dada863d23
Enable unit tests of PartialPivLU on fixed size matrices, and increase tested matrix size (blocking was not tested!)
2019-02-11 17:56:20 +01:00
Gael Guennebaud
8a06c699d0
bug #1669 : fix PartialPivLU/inverse with zero-sized matrices.
2019-01-29 10:27:13 +01:00
Gael Guennebaud
f489f44519
bug #1574 : implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs.
2019-01-28 17:29:50 +01:00
Gael Guennebaud
53560f9186
bug #1672 : fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro)
2019-01-28 13:47:28 +01:00
Christoph Hertzberg
934b8a1304
Avoid I as an identifier, since it may clash with the C-header complex.h
2019-01-25 14:54:39 +01:00
Gael Guennebaud
6908ce2a15
More thoroughly check variadic template ctor of fixed-size vectors
2019-01-24 10:24:28 +01:00
David Tellenbach
db152b9ee6
PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
...
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
Gael Guennebaud
543529da6a
Add more extensive tests of Array ctors, including {} variants
2019-01-22 15:30:50 +01:00
Gael Guennebaud
d18f49cbb3
Fix compilation of unit tests with gcc and c++17
2019-01-18 11:12:42 +01:00
Christoph Hertzberg
d575505d25
After fixing bug #1557 , boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success)
2019-01-17 19:14:07 +01:00
Gael Guennebaud
0fe6b7d687
Make nestByValue work again (broken since 3.3) and add unit tests.
2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82
Extend reshaped unit tests and remove useless const_cast
2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1
Cleanup useless const_cast and add missing broadcast assignment tests
2019-01-17 16:55:42 +01:00
Patrick Peltzer
bba2f05064
Boosttest only available for Boost version >= 1.53.0
2019-01-17 11:54:37 +01:00
Patrick Peltzer
15e53d5d93
PR 567: make all dense solvers inherit SolverBase (LU,Cholesky,QR,SVD).
...
This changeset also includes:
* add HouseholderSequence::conjugateIf
* define int as the StorageIndex type for all dense solvers
* dedicated unit tests, including assertion checking
* _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
* CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
* Cholesky: add missing assertions
* FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
* BDCSVD: Unambiguous return type for ternary operator
* SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11
Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, and make PartialPivLU use it.
2019-01-17 11:33:43 +01:00
Gael Guennebaud
c8e40edac9
Remove Eigen2ToEigen3 migration page (obsolete since 3.3)
2019-01-16 16:27:00 +01:00
Gael Guennebaud
aeffdf909e
bug #1617 : add unit tests for empty triangular solve.
2019-01-16 15:24:59 +01:00
Gael Guennebaud
502f717980
bug #1646 : disable aliasing detection for empty and 1x1 expression
2019-01-16 14:33:45 +01:00
Gael Guennebaud
2b70b2f570
Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry
2019-01-15 22:50:42 +01:00
Gael Guennebaud
6ec6bf0b0d
Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests)
2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24
bug #1592 : makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests)
2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e
Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols.
...
This fixes VectorXd(n).middleCol(0,0).outerSize(), which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
32d7232aec
fix always true warning with gcc 4.7
2019-01-15 11:18:48 +01:00
Gael Guennebaud
e7d4d4f192
cleanup
2019-01-15 10:51:03 +01:00
Rasmus Larsen
5a59452aae
Merged eigen/eigen into default
2019-01-14 10:23:23 -08:00
Gael Guennebaud
61b6eb05fe
AVX512 (r)sqrt(double) was mistakenly disabled with clang and others
2019-01-14 17:28:47 +01:00
Greg Coombe
9d988a1e1a
Initialize isometric transforms like affine transforms.
...
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
f566724023
Fix StorageIndex FIXME in dense LU solvers
2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
28ba1b2c32
Add support for inverse hyperbolic functions.
...
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
fcfced13ed
Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
8f04442526
Collapsed revision
...
* Collapsed revision
* Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
cb955df9a6
Add packet op "pones". Write pnot(a) as pxor(pones(a), a).
2019-01-09 16:17:08 -08:00
Rasmus Larsen
cb3c059fa4
Merged eigen/eigen into default
2019-01-09 15:04:17 -08:00
Gael Guennebaud
e6b217b8dd
bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
...
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen
055f0b73db
Add support for pcmp_eq and pnot, including for complex types.
2019-01-07 16:53:36 -08:00
Gael Guennebaud
697fba3bb0
Fix unit test
2018-12-27 11:20:47 +01:00
Gael Guennebaud
0f6f75bd8a
Implement a faster fix for sin/cos of large entries that also correctly handles INF input.
2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8
Make sure that psin/pcos return numbers in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)
2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb
Fix plog(+INF): it returned ~87 instead of +INF
2018-12-23 15:40:52 +01:00
Gael Guennebaud
cfc70dc13f
Add regression test for bug #1174
2018-12-12 18:03:31 +01:00
Gael Guennebaud
2de8da70fd
bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.
2018-12-12 17:30:08 +01:00
Gael Guennebaud
72c0bbe2bd
Simplify handling of tests that must fail to compile.
...
Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).
2018-12-12 15:48:36 +01:00
Gael Guennebaud
81c27325ae
bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512
2018-12-08 14:27:48 +01:00
Gael Guennebaud
cd25b538ab
Fix noise in sparse_basic_3 (numerical cancellation)
2018-12-08 00:13:37 +01:00
Gael Guennebaud
efaf03bf96
Fix noise in lu unit test
2018-12-08 00:05:03 +01:00
Gael Guennebaud
aab749b1c3
fix test regarding AVX512 vectorization of complexes.
2018-12-06 16:55:00 +01:00
Gael Guennebaud
c53eececb0
Implement AVX512 vectorization of std::complex<float/double>
2018-12-06 15:58:06 +01:00
Christoph Hertzberg
919414b9fe
bug #785 : Make Cholesky decomposition work for empty matrices
2018-12-03 16:18:15 +01:00
Gael Guennebaud
69ace742be
Several improvements regarding packet-bitwise operations:
...
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
48fe78c375
bug #1630 : fix linspaced when requesting smaller packet size than default one.
2018-11-28 13:15:06 +01:00
Gael Guennebaud
382279eb7f
Extend unit test to recursively check half-packet types and non packet types
2018-11-26 14:10:07 +01:00
Gael Guennebaud
e3b22a6bd0
merge
2018-11-23 16:06:21 +01:00
Gael Guennebaud
572d62697d
check two ctors
2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b
Fix double = bool !
2018-11-23 15:12:06 +01:00
Christoph Hertzberg
ea60a172cf
Add default constructor to Bar to make test compile again with clang-3.8
2018-11-23 14:24:22 +01:00
Gael Guennebaud
c685fe9838
Move regression test to right unit test file
2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8
Workaround weird MSVC bug
2018-11-21 15:53:37 +01:00
Gael Guennebaud
43c987b1c1
Add explicit regression test for bug #1622
2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c
PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
...
Commit aa110e681b optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize
This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment
A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Gael Guennebaud
784a3f13cf
bug #1619 : fix mixing of const and non-const generic iterators
2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba
bug #1619 : make const and non-const iterators compatible
2018-11-09 16:49:19 +01:00
Gael Guennebaud
f62a0f69c6
Fix max-size in indexed-view
2018-11-08 18:40:22 +01:00
Gael Guennebaud
9d318b92c6
add unit tests for bug #1619
2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e
bug #1617 : Fix SolveTriangular.solveInPlace crashing for empty matrix.
...
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Rasmus Munk Larsen
954b4ca9d0
Suppress compiler warning about unused global variable.
2018-10-22 13:48:56 -07:00
Gael Guennebaud
e3b85771d7
Show call stack in case of failing sparse solving.
2018-10-16 00:43:44 +02:00
Gael Guennebaud
3a33db4de5
merge
2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1
Suppress unused variable compiler warning in sparse subtest 3.
2018-10-12 13:41:57 -07:00
Gael Guennebaud
8214cf1896
Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways
2018-10-11 10:27:23 +02:00
Gael Guennebaud
2ef1b39674
Relaxed fastmath unit test: if std::foo fails, then let's only trigger a warning if numext::foo fails too.
...
A true error will be triggered only if std::foo works but our numext::foo fails.
2018-10-11 09:45:30 +02:00
Gael Guennebaud
1d5a6363ea
relax numerical tests from equal to approx (x87)
2018-10-11 09:29:56 +02:00
Gael Guennebaud
ce243ee45b
bug #520 : add diagmat +/- diagmat operators.
2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47
Merged in ezhulenev/eigen-02 (pull request PR-525)
...
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688
bug #632 : add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
...
They are rewritten as two compound assignments to bypass the hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 13:23:52 -07:00
Gael Guennebaud
76ceae49c1
bug #1609 : add inplace transposition unit test
2018-10-10 21:48:58 +02:00
Christoph Hertzberg
f3130ee1ba
Avoid empty macro arguments
2018-10-10 08:23:40 +02:00
Rasmus Munk Larsen
e8918743c1
Merged in ezhulenev/eigen-01 (pull request PR-523)
...
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
befcac883d
Hide stl-container detection test under #if
2018-10-09 15:36:01 -07:00
Eugene Zhulenev
c0ca8a9fa3
Compile time detection for unimplemented stl-style iterators
2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454
bug #65 : add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()
2018-10-09 23:36:50 +02:00
Gael Guennebaud
c0c3be26ed
Extend unit tests for partial reductions
2018-10-09 22:54:54 +02:00
Gael Guennebaud
c6e2dde714
fix c++11 deprecated warning
2018-10-08 18:26:05 +02:00
Gael Guennebaud
649d4758a6
merge
2018-10-08 17:35:18 +02:00
Gael Guennebaud
c9643f4a6f
Disable C++11 deprecated warning when limiting Eigen to C++98
2018-10-08 10:43:43 +02:00
Gael Guennebaud
6c3f6cd52b
Fix maybe-uninitialized warning
2018-10-07 23:29:51 +02:00
Gael Guennebaud
16b2001ece
Fix gcc 8.1 warning: "maybe use uninitialized"
2018-10-07 21:54:49 +02:00
Gael Guennebaud
409132bb81
Workaround gcc bug making it trigger an invalid warning
2018-10-07 09:23:15 +02:00
Gael Guennebaud
d92f004ab7
Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns
2018-10-05 23:11:21 +02:00
Gael Guennebaud
3e64b1fc86
Move iterators to internal, improve doc, make unit test c++03 friendly
2018-10-03 15:13:15 +02:00
Gael Guennebaud
8a1e98240e
add unit tests
2018-10-03 11:56:27 +02:00
Gael Guennebaud
5f26f57598
Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
...
This means that now AutoOrder is allowed, and it always returns a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25
Add pointer-based iterator for direct-access expressions
2018-10-02 23:44:36 +02:00
Gael Guennebaud
12487531ce
Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns)
2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893
Use Index instead of ptrdiff_t or int, fix random-accessors.
2018-10-02 13:29:32 +02:00
Gael Guennebaud
b0c66adfb1
bug #231 : initial implementation of STL iterators for dense expressions
2018-10-01 23:21:37 +02:00
Gael Guennebaud
626942d9dd
fix alignment issue in ploaddup for AVX512
2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36
Merge with default.
2018-09-23 21:52:58 +02:00
Christoph Hertzberg
e3c8289047
Replace unused PREDICATE by corresponding STATIC_ASSERT
2018-09-21 21:15:51 +02:00
Gael Guennebaud
1bf12880ae
Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all)
2018-09-21 16:50:04 +02:00
Gael Guennebaud
03a0cb2b72
fix unalignedcount for avx512
2018-09-21 14:40:26 +02:00
Gael Guennebaud
91716f03a7
Fix vectorization logic unit test for AVX512
2018-09-21 14:32:24 +02:00