Ilya Tokar
06e99aaf40
Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
...
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
stil an order of magnitude faster than scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Rasmus Munk Larsen
73a8d572f5
Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781.
2019-12-12 19:34:25 +00:00
Srinivas Vasudevan
88062b7fed
Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one.
2019-12-12 01:56:54 +00:00
Joel Holdsworth
3c0ef9f394
IO: Fixed printing of char and unsigned char matrices
2019-12-11 18:22:57 +00:00
Joel Holdsworth
e87af0ed37
Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t
2019-12-11 18:22:57 +00:00
Gael Guennebaud
15b3bcfca0
Bug 1786: fix compilation with MSVC
2019-12-11 16:16:38 +01:00
Deven Desai
c49f0d851a
Fix for HIP breakage detected on 191210
...
The following commit introduces compile errors when running eigen with hipcc
2918f85ba9
hipcc errors out because it requies the device attribute on the methods within the TensorBlockV2ResourceRequirements struct instroduced by the commit above. The fix is to add the device attribute to those methods
2019-12-10 22:14:05 +00:00
Gael Guennebaud
8fbe0e4699
Update old links to bitbucket to point to gitlab.com
2019-12-04 10:57:07 +01:00
Rasmus Larsen
cacf433975
Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754)
...
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
Approved-by: Deven Desai <deven.desai.amd@gmail.com>
2019-12-04 00:45:42 +00:00
Gael Guennebaud
6358599ecb
Fix QuaternionBase::cast for quaternion map and wrapper.
2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013
bug #1776 : fix vector-wise STL iterator's operator-> using a proxy as pointer type.
...
This changeset fixes also the value_type definition.
2019-12-03 14:40:15 +01:00
Rasmus Munk Larsen
66f07efeae
Revert the specialization for scalar_logistic_op<float> introduced in:
...
77b447c24e
While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.
2019-12-02 17:00:58 -08:00
Rasmus Larsen
3b15373bb3
Merged in ezhulenev/eigen-02 (pull request PR-767)
...
Fix shadow warnings in AlignedBox and SparseBlock
2019-12-02 18:23:11 +00:00
Deven Desai
312c8e77ff
Fix for the HIP build+test errors.
...
Recent changes have introduced the following build error when compiling with HIPCC
---------
unsupported/test/../../Eigen/src/Core/GenericPacketMath.h:254:58: error: 'ldexp': no overloaded function has restriction specifiers that are compatible with the ambient context 'pldexp'
---------
The fix for the error is to pick the math function(s) from the global namespace (where they are declared as device functions in the HIP header files) when compiling with HIPCC.
2019-12-02 17:41:32 +00:00
Mehdi Goli
00f32752f7
[SYCL] Rebasing the SYCL support branch on top of the Einge upstream master branch.
...
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when local memory is version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Eugene Zhulenev
82a47338df
Fix shadow warnings in AlignedBox and SparseBlock
2019-11-27 16:22:27 -08:00
Rasmus Munk Larsen
ea51a9eace
Add missing EIGEN_DEVICE_FUNC attribute to template specializations for pexp to fix GPU build.
2019-11-27 10:17:09 -08:00
Rasmus Munk Larsen
5a3ebda36b
Fix warning due to missing cast for exponent arguments for std::frexp and std::lexp.
2019-11-26 16:18:29 -08:00
Joel Holdsworth
86eb41f1cb
SparseRef: Fixed alignment warning on ARM GCC
2019-11-07 14:34:06 +00:00
Anshul Jaiswal
c1a67cb5af
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
2019-11-06 22:40:38 +00:00
Rasmus Munk Larsen
cc3d0e6a40
Add EIGEN_HAS_INTRINSIC_INT128 macro
...
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
2019-11-06 14:24:33 -08:00
Rasmus Munk Larsen
ee404667e2
Rollback or PR-746 and partial rollback of 668ab3fc47
...
.
std::array is still not supported in CUDA device code on Windows.
2019-11-05 17:17:58 -08:00
Hans Johnson
e78ed6e7f3
COMP: Simplify install commands for Eigen
...
Confirm that install directory is identical
before and after this simplifying patch.
```bash
hg clone <<Eigen>>
mkdir eigen-bld
cd eigen-bld
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/bef
make install
find /tmp/pre_eigen_modernize >/tmp/bef
# Apply this patch
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/aft
make install
find /tmp/post_eigen_modernize |sed 's/post_e/pre_e/g' >/tmp/aft
diff /tmp/bef /tmp/aft
```
2019-11-17 15:14:25 -06:00
Gael Guennebaud
e5778b87b9
Fix duplicate symbol linking error.
2019-11-20 17:23:19 +01:00
Hans Johnson
6fb3e5f176
STYLE: Remove CMake-language block-end command arguments
...
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308
1. Fix a bug in psqrt and make it return 0 for +inf arguments.
...
2. Simplify handling of special cases by taking advantage of the fact that the
builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)
Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
2019-11-15 17:09:46 -08:00
Gael Guennebaud
2cb2915f90
bug #1744 : fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the later was disabled on some compilers
2019-11-15 13:39:51 +01:00
Gael Guennebaud
8af045a287
bug #1774 : fix VectorwiseOp::begin()/end() return types regarding constness.
2019-11-14 11:45:52 +01:00
Sakshi Goynar
75b4c0a3e0
PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag
2019-10-31 16:09:16 -07:00
Gael Guennebaud
8496f86f84
Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices.
2019-11-13 21:16:53 +01:00
Gael Guennebaud
71aa53dd6d
Disable AVX on broken xcode versions. See PR 748.
...
Patch adapted from Hans Johnson's PR 748.
2019-11-12 11:40:38 +01:00
Eugene Zhulenev
e7ed4bd388
Remove internal::smart_copy and replace with std::copy
2019-10-29 11:25:24 -07:00
Gael Guennebaud
e7d8ba747c
bug #1752 : make is_convertible equivalent to the std c++11 equivalent and fallback to std::is_convertible when c++11 is enabled.
2019-10-10 17:41:47 +02:00
Gael Guennebaud
196de2efe3
Explicitly bypass resize and memmoves when there is already the exact right number of elements available.
2019-10-08 21:44:33 +02:00
Gael Guennebaud
d1def335dc
fix one more possible conflicts with real/imag
2019-10-08 16:19:10 +02:00
Gael Guennebaud
87427d2eaa
PR 719: fix real/imag namespace conflict
2019-10-08 09:15:17 +02:00
Rasmus Munk Larsen
fab4e3a753
Address comments on Chebyshev evaluation code:
...
1. Use pmadd when possible.
2. Add casts to avoid c++03 warnings.
2019-10-02 12:48:17 -07:00
Rasmus Munk Larsen
bd0fac456f
Prevent infinite loop in the nvcc compiler while unrolling the recurrent templates for Chebyshev polynomial evaluation.
2019-10-01 13:15:30 -07:00
Gael Guennebaud
9549ba8313
Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real)
2019-10-01 12:54:25 +02:00
Gael Guennebaud
c8b2c603b0
Fix speed issue with SimplicialLDLT for complexes: the diagonal is real!
2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen
13ef08e5ac
Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.
2019-09-27 13:56:04 -07:00
Eugene Zhulenev
0c845e28c9
Fix erf in c++03
2019-09-25 11:31:45 -07:00
Deven Desai
5e186b1987
Fix for the HIP build+test errors.
...
The errors were introduced by this commit : d38e6fbc27
After the above mentioned commit, some of the tests started failing with the following error
```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
3 errors generated.
```
This PR fixes the compile error by removing the "old" implementation for "erf" (assuming that the "new" implementation is what we want going forward. from a GPU point-of-view both implementations are the same).
This PR also fixes what seems like a cut-n-paste error in the aforementioned commit
2019-09-25 15:39:13 +00:00
Rasmus Larsen
d38e6fbc27
Merged in rmlarsen/eigen (pull request PR-704)
...
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Eugene Zhulenev
ef9dfee7bd
Tensor block evaluation V2 support for unary/binary/broadcsting
2019-09-24 12:52:45 -07:00
Christoph Hertzberg
efd9867ff0
bug #1746 : Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types
2019-09-24 11:09:58 +02:00
Rasmus Munk Larsen
6de5ed08d8
Add generic PacketMath implementation of the Error Function (erf).
2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen
28b6786498
Fix build on setups without AVX512DQ.
2019-09-19 12:36:09 -07:00
Deven Desai
e02d429637
Fix for the HIP build+test errors.
...
The errors were introduced by this commit : 6e215cf109
The fix is switching to using ::<math_func> instead std::<math_func> when compiling for GPU
2019-09-18 18:44:20 +00:00
Srinivas Vasudevan
6e215cf109
Add Bessel functions to SpecialFunctions.
...
- Split SpecialFunctions files in to a separate BesselFunctions file.
In particular add:
- Modified bessel functions of the second kind k0, k1, k0e, k1e
- Bessel functions of the first kind j0, j1
- Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Srinivas Vasudevan
facdec5aa7
Add packetized versions of i0e and i1e special functions.
...
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
- Move chebevl to GenericPacketMathFunctions.
A brief benchmark with building Eigen with FMA, AVX and AVX2 flags
Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 57.3 57.3 10000000
BM_eigen_i0e_double/8 398 398 1748554
BM_eigen_i0e_double/64 3184 3184 218961
BM_eigen_i0e_double/512 25579 25579 27330
BM_eigen_i0e_double/4k 205043 205042 3418
BM_eigen_i0e_double/32k 1646038 1646176 422
BM_eigen_i0e_double/256k 13180959 13182613 53
BM_eigen_i0e_double/1M 52684617 52706132 10
BM_eigen_i0e_float/1 28.4 28.4 24636711
BM_eigen_i0e_float/8 75.7 75.7 9207634
BM_eigen_i0e_float/64 512 512 1000000
BM_eigen_i0e_float/512 4194 4194 166359
BM_eigen_i0e_float/4k 32756 32761 21373
BM_eigen_i0e_float/32k 261133 261153 2678
BM_eigen_i0e_float/256k 2087938 2088231 333
BM_eigen_i0e_float/1M 8380409 8381234 84
BM_eigen_i1e_double/1 56.3 56.3 10000000
BM_eigen_i1e_double/8 397 397 1772376
BM_eigen_i1e_double/64 3114 3115 223881
BM_eigen_i1e_double/512 25358 25361 27761
BM_eigen_i1e_double/4k 203543 203593 3462
BM_eigen_i1e_double/32k 1613649 1613803 428
BM_eigen_i1e_double/256k 12910625 12910374 54
BM_eigen_i1e_double/1M 51723824 51723991 10
BM_eigen_i1e_float/1 28.3 28.3 24683049
BM_eigen_i1e_float/8 74.8 74.9 9366216
BM_eigen_i1e_float/64 505 505 1000000
BM_eigen_i1e_float/512 4068 4068 171690
BM_eigen_i1e_float/4k 31803 31806 21948
BM_eigen_i1e_float/32k 253637 253692 2763
BM_eigen_i1e_float/256k 2019711 2019918 346
BM_eigen_i1e_float/1M 8238681 8238713 86
After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 15.8 15.8 44097476
BM_eigen_i0e_double/8 99.3 99.3 7014884
BM_eigen_i0e_double/64 777 777 886612
BM_eigen_i0e_double/512 6180 6181 100000
BM_eigen_i0e_double/4k 48136 48140 14678
BM_eigen_i0e_double/32k 385936 385943 1801
BM_eigen_i0e_double/256k 3293324 3293551 228
BM_eigen_i0e_double/1M 12423600 12424458 57
BM_eigen_i0e_float/1 16.3 16.3 43038042
BM_eigen_i0e_float/8 30.1 30.1 23456931
BM_eigen_i0e_float/64 169 169 4132875
BM_eigen_i0e_float/512 1338 1339 516860
BM_eigen_i0e_float/4k 10191 10191 68513
BM_eigen_i0e_float/32k 81338 81337 8531
BM_eigen_i0e_float/256k 651807 651984 1000
BM_eigen_i0e_float/1M 2633821 2634187 268
BM_eigen_i1e_double/1 16.2 16.2 42352499
BM_eigen_i1e_double/8 110 110 6316524
BM_eigen_i1e_double/64 822 822 851065
BM_eigen_i1e_double/512 6480 6481 100000
BM_eigen_i1e_double/4k 51843 51843 10000
BM_eigen_i1e_double/32k 414854 414852 1680
BM_eigen_i1e_double/256k 3320001 3320568 212
BM_eigen_i1e_double/1M 13442795 13442391 53
BM_eigen_i1e_float/1 17.6 17.6 41025735
BM_eigen_i1e_float/8 35.5 35.5 19597891
BM_eigen_i1e_float/64 240 240 2924237
BM_eigen_i1e_float/512 1424 1424 485953
BM_eigen_i1e_float/4k 10722 10723 65162
BM_eigen_i1e_float/32k 86286 86297 8048
BM_eigen_i1e_float/256k 691821 691868 1000
BM_eigen_i1e_float/1M 2777336 2777747 256
This shows anywhere from a 50% to 75% improvement on these operations.
I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).
Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Srinivas Vasudevan
b052ec6992
Merged eigen/eigen into default
2019-09-11 18:01:54 -07:00
Deven Desai
cdb377d0cb
Fix for the HIP build+test errors introduced by the ndtri support.
...
The fixes needed are
* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
* switching to using ::<math_func> instead std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Gael Guennebaud
747c6a51ca
bug #1736 : fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)
2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d
bug #1741 : fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride
2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08
Fix compilation of BLAS backend and frontend
2019-09-11 10:02:37 +02:00
Gael Guennebaud
afa8d13532
Fix some implicit literal to Scalar conversions in SparseCore
2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115
bug #1741 : fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride
2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956
bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1
2019-09-10 16:25:24 +02:00
Gael Guennebaud
17226100c5
Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
...
Another solution would have been to make pshift* fully generic template functions with
partial specialization which is always a mess in c++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3
Fix compilation without vector engine available (e.g., x86 with SSE disabled):
...
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7
Merged eigen/eigen
2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd
Fix doc issues regarding ndtri
2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926
Fix possible warning regarding strict equality comparisons
2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615
Merging from eigen/eigen.
2019-09-03 15:34:47 -04:00
Gael Guennebaud
8e7e3d9bc8
Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)
2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27
PR 681: Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Eugene Zhulenev
f59bed7a13
Change typedefs from private to protected to fix MSVC compilation
2019-09-03 19:11:36 -07:00
Srinivas Vasudevan
18ceb3413d
Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b
Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
...
Add specializations for complex types since std::log1p and std::exp1m do not support complex.
2019-08-08 16:27:32 -07:00
Gael Guennebaud
15f3d9d272
More colamd cleanup:
...
- Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- move some ugly free functions as member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d
Eigen_Colamd.h updated to replace constexpr with consts and enums.
2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face
Ordering.h edited to fix dependencies on Eigen_Colamd.h
2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2
Eigen_Colamd.h edited replacing macros with constexprs and functions.
2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf
Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions
2019-07-21 04:53:31 +00:00
Michael Grupp
6e17491f45
Fix typo in Umeyama method documentation
2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456
Remove {} accidentally added in previous commit
2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f
Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN
block, to make it actually appear in the generated documentation.
2019-07-12 19:46:37 +02:00
Christoph Hertzberg
c2671e5315
Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
...
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Rasmus Munk Larsen
23b958818e
Fix compiler for unsigned integers.
2019-07-09 11:18:25 -07:00
Anshul Jaiswal
fab51d133e
Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD
...
to prevent conflicts with other libraries / code.
2019-06-08 21:09:06 +00:00
Rasmus Munk Larsen
f6c51d9209
Fix missing header inclusion and colliding definitions for half type casting, which broke
...
build with -march=native on Haswell/Skylake.
2019-08-30 14:03:29 -07:00
Rasmus Munk Larsen
1187bb65ad
Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.
2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen
9aba527405
Revert changes to std_falback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.
2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d
Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
2019-08-27 11:30:31 -07:00
Christoph Hertzberg
2fb24384c9
Merged in jaopaulolc/eigen (pull request PR-679)
...
Fixes for Altivec/VSX and compilation with clang on PowerPC
2019-08-22 15:57:33 +00:00
João P. L. de Carvalho
5ac7984ffa
Fix debug macros in p{load,store}u
2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40
Add missing pcmp_XX methods for double/Packet2d
...
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen
a3298b22ec
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
...
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
João P. L. de Carvalho
787f6ef025
Fix packed load/store for PowerPC's VSX
...
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d vec_xl/vec_xst should be prefered over vec_ld/vec_st, although the latter works when casted to float/Packet4f.
Silencing some weird warning with throw but some GCC versions. Such warning are not thrown by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294
Fix offset argument of ploadu/pstoreu for Altivec
...
If no offset is given, them it should be zero.
Also passes full address to vec_vsx_ld/st builtins.
Removes userless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e
bug #1718 : Add cast to successfully compile with clang on PowerPC
...
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Justin Carpentier
ffaf658ecd
PR 655: Fix missing Eigen namespace in Macros
2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c
[SYCL] Adding the SYCL memory model. The SYCL memory model provides :
...
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessor to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Munk Larsen
8053eeb51e
Fix CUDA compilation error for pselect<half>.
2019-06-28 12:07:29 -07:00
Mehdi Goli
16a56b2ddd
[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
...
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
2019-06-27 12:25:09 +01:00
Deven Desai
ba506d5bd2
fix for a ROCm/HIP specificcompile errror introduced by a recent commit.
2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e
Remove extra "one" in comment.
2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb
Update comment as suggested by tra@google.com.
2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad
Fix grammar.
2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e
Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause.
2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1
Fix CUDA build on Mac.
2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730
Various fixes for packet ops.
...
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Christoph Hertzberg
e0be7f30e1
bug #1724 : Mask buggy warnings with g++-7
...
(grafted from 427f2f66d6
)
2019-06-14 14:57:46 +02:00
Rasmus Munk Larsen
6d432eae5d
Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.
2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816
Add workaround for choosing the right include files with FP16C support with clang.
2019-06-05 13:36:37 -07:00
Rasmus Munk Larsen
b08527b0c1
Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.
2019-05-31 15:26:06 -07:00
Deven Desai
2c38930161
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves
56bc4974fb
GEMV: remove double declaration of constant.
...
That was hurting users with compilers that would object to proceed with
that:
"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
LhsPacketSize = Traits::LhsPacketSize,
^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
2019-05-23 14:50:29 -07:00
Christoph Hertzberg
ac21a08c13
Cast Index to RealScalar
...
This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
Reported by Peter Anderson-Sprecher via eMail
2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen
3eb5ad0ed0
Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177
...
and are part of LLVM 3.8.0.
2019-05-20 17:19:20 -07:00
Rasmus Larsen
e92486b8c3
Merged in rmlarsen/eigen (pull request PR-643)
...
Make Eigen build with cuda 10 and clang.
Approved-by: Justin Lebar <justin.lebar@gmail.com>
2019-05-20 17:02:39 +00:00
Gael Guennebaud
cc7ecbb124
Merged in scramsby/eigen (pull request PR-646)
...
Eigen: Fix MSVC C++17 language standard detection logic
2019-05-20 07:19:10 +00:00
Rasmus Larsen
bf9cbed8d0
Merged in glchaves/eigen (pull request PR-635)
...
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-17 19:40:50 +00:00
Rasmus Munk Larsen
ab0a30e429
Make Eigen build with cuda 10 and clang.
2019-05-15 13:32:15 -07:00
Christoph Hertzberg
5f32b79edc
Collapsed revision from PR-641
...
* SparseLU.h - corrected example, it didn't compile
* Changed encoding back to UTF8
2019-05-13 19:02:30 +02:00
Anuj Rawat
ad372084f5
Removing unused API to fix compile error in TensorFlow due to
...
AVX512VL, AVX512BW usage
2019-05-12 14:43:10 +00:00
Christoph Hertzberg
4ccd1ece92
bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions
2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen
d3ef7cf03e
Fix build with clang on Windows.
2019-05-09 11:07:04 -07:00
Eugene Zhulenev
45b40d91ca
Fix AVX512 & GCC 6.3 compilation
2019-05-07 16:44:55 -07:00
Christoph Hertzberg
cca76c272c
Restore C++03 compatibility
2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7
Fix traits for scalar_logistic_op.
2019-05-03 15:49:09 -07:00
Scott Ramsby
ff06ef7584
Eigen: Fix MSVC C++17 language standard detection logic
...
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
2019-05-03 14:14:09 -07:00
Eugene Zhulenev
e9f0eb8a5e
Add masked_store_available to unpacket_traits
2019-05-02 14:52:58 -07:00
Eugene Zhulenev
96e30e936a
Add masked pstoreu for Packet16h
2019-05-02 14:11:01 -07:00
Eugene Zhulenev
b4010f02f9
Add masked pstoreu to AVX and AVX512 PacketMath
2019-05-02 13:14:18 -07:00
Gael Guennebaud
578407f42f
Fix regression in changeset ae33e866c7
2019-05-02 15:45:21 +02:00
Gustavo Lima Chaves
d4dcb71bcb
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
...
We take advantage of smaller SIMD registers as well, in that case.
Gains up to 3x for select input sizes.
2019-04-26 14:12:39 -07:00
Andy May
ae33e866c7
Fix compilation with PGI version 19
2019-04-25 21:23:19 +01:00
Gael Guennebaud
665ac22cc6
Merged in ezhulenev/eigen-01 (pull request PR-632)
...
Fix doxygen warnings
2019-04-25 20:02:20 +00:00
Eugene Zhulenev
8ead5bb3d8
Fix doxygen warnings to enable statis code analysis
2019-04-24 12:42:28 -07:00
Eugene Zhulenev
07355d47c6
Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h
2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen
144ca33321
Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings.
2019-04-24 08:50:07 -07:00
Eugene Zhulenev
a7b7f3ca8a
Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings
2019-04-23 17:23:19 -07:00
Eugene Zhulenev
68a2a8c445
Use packet ops instead of AVX2 intrinsics
2019-04-23 11:41:02 -07:00
Anuj Rawat
8c7a6feb8e
Adding lowlevel APIs for optimized RHS packet load in TensorFlow
...
SpatialConvolution
Low-level APIs are added in order to optimized packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for scenario when a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.
This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.
Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim
Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.
AVX512:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128, 24x24, 1, 64, 8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
32, 24x24, 3, 64, 5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128, 24x24, 3, 64, 3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
32, 14x14, 24, 64, 5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128, 3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X
AVX2:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
32, 24x24, 3, 64, 5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128, 24x24, 1, 64, 5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128, 24x24, 3, 64, 3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128, 3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X
In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).
On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:
AVX512:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 41350 | 15073 | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 7277 | 7341 | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 8675 | 8681 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 24155 | 16079 | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 25052 | 17152 | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 18269 | 18345 | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 19468 | 19872 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 156060 | 42432 | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 132701 | 36944 | 3.59X
AVX2:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 26233 | 12393 | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 6091 | 6062 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 7427 | 7408 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 23453 | 20826 | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 23167 | 22091 | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422 | 23682 | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165 | 23663 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 72689 | 44969 | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 61732 | 39779 | 1.55X
All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
Gael Guennebaud
45e65fbb77
bug #1695 : fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign.
2019-03-27 20:16:58 +01:00
William D. Irons
8de66719f9
Collapsed revision from PR-619
...
* Add support for pcmp_eq in AltiVec/Complex.h
* Fixed implementation of pcmp_eq for double
The new logic is based on the logic from NEON for double.
2019-03-26 18:14:49 +00:00
Gael Guennebaud
f11364290e
ICC does not support -fno-unsafe-math-optimizations
2019-03-22 09:26:24 +01:00
David Tellenbach
3031d57200
PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN
2019-03-21 02:21:04 +01:00
Deven Desai
51e399fc15
updates requested in the PR feedback. Also droping coded within #ifdef EIGEN_HAS_OLD_HIP_FP16
2019-03-19 21:45:25 +00:00
Deven Desai
2dbea5510f
Merged eigen/eigen into default
2019-03-19 16:52:38 -04:00
Rasmus Larsen
5c93b38c5f
Merged in rmlarsen/eigen (pull request PR-618)
...
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.
Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-03-18 15:51:55 +00:00
Gael Guennebaud
cf7e2e277f
bug #1692 : enable enum as sizes of Matrix and Array
2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen
e42f9aa68a
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_<float>.
2019-03-15 17:15:14 -07:00
Rasmus Munk Larsen
8450a6d519
Clean up half packet traits and add a few more missing packet ops.
2019-03-14 15:18:06 -07:00
David Tellenbach
97f9a46cb9
PR 593: Add variadtic ctor for DiagonalMatrix with unit tests
2019-03-14 10:18:24 +01:00
Rasmus Munk Larsen
6a34003141
Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2.
2019-03-13 11:52:41 -07:00
Gael Guennebaud
d7d2f0680e
bug #1684 : partially workaround clang's 6/7 bug #40815
2019-03-13 10:40:01 +01:00
Rasmus Munk Larsen
77f7d4a894
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-11 17:51:16 -07:00
Gael Guennebaud
656d9bc66b
Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax
2019-03-10 21:19:18 +01:00
Rasmus Larsen
4d808e834a
Merged in rmlarsen/eigen_threadpool (pull request PR-606)
...
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
Approved-by: Sameer Agarwal <sameeragarwal@google.com>
2019-03-06 17:59:03 +00:00
Gael Guennebaud
bfbf7da047
bug #1689 fix used-but-marked-unused warning
2019-03-05 23:46:24 +01:00
Rasmus Munk Larsen
0318fc7f44
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
2019-03-05 10:24:54 -08:00
Gael Guennebaud
b0d406d91c
Enable construction of Ref<VectorType> from a runtime vector.
2019-03-03 15:25:25 +01:00
Sam Hasinoff
9ba81cf0ff
Fully qualify Eigen::internal::aligned_free
...
This helps avoids a conflict on certain Windows toolchains
(potentially due to some ADL name resolution bug) in the case
where aligned_free is defined in the global namespace. In any
case, tightening this up is harmless.
2019-03-02 17:42:16 +00:00
Gael Guennebaud
22144e949d
bug #1629 : fix compilation of PardisoSupport (regression introduced in changeset a7842daef2
...
)
2019-03-02 22:44:47 +01:00
Rasmus Larsen
2ca1e73239
Merged in rmlarsen/eigen (pull request PR-597)
...
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2.
Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-25 17:02:16 +00:00
Gael Guennebaud
e409dbba14
Enable SSE vectorization of Quaternion and cross3() with AVX
2019-02-23 10:45:40 +01:00
Gael Guennebaud
0b25a5c431
fix alignment in ploadquad
2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen
1dc1677d52
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2.
2019-02-22 12:33:57 -08:00
Gael Guennebaud
cca6c207f4
AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM
2019-02-21 17:18:28 +01:00
Gael Guennebaud
1c09ee8541
bug #1674 : workaround clang fast-math aggressive optimizations
2019-02-22 15:48:53 +01:00
Gael Guennebaud
7e3084bb6f
Fix compilation on ARM.
2019-02-22 14:56:12 +01:00
Gael Guennebaud
42c23f14ac
Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.
2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen
4d7f317102
Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.
2019-02-21 13:32:13 -08:00
Gael Guennebaud
2a39659d79
Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.
2019-02-20 15:23:23 +01:00
Gael Guennebaud
302377110a
Update documentation of Matrix and Array type aliases.
2019-02-20 15:18:48 +01:00
Gael Guennebaud
44b54fa4a3
Protect c++11 type alias with Eigen's macro, and add respective unit test.
2019-02-20 14:43:05 +01:00
Gael Guennebaud
7195f008ce
Merged in ra_bauke/eigen (pull request PR-180)
...
alias template for matrix and array classes, see also bug #864
Approved-by: Heiko Bauke <heiko.bauke@mail.de>
2019-02-20 13:22:39 +00:00
Gael Guennebaud
edd413c184
bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
...
- this helps clang 5 and 6 to support alignas in STL's containers.
- this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
482c5fb321
bug #899 : remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.
2019-02-19 22:52:15 +01:00
Christoph Hertzberg
a1646fc960
Commas at the end of enumerator lists are not allowed in C++03
2019-02-19 14:32:25 +01:00
Gael Guennebaud
ab78cabd39
Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.
2019-02-19 14:04:35 +01:00
Gael Guennebaud
115da6a1ea
Fix conversion warnings
2019-02-19 14:00:15 +01:00
Gael Guennebaud
7580112c31
Fix harmless Scalar vs RealScalar cast.
2019-02-18 22:12:28 +01:00
Gael Guennebaud
796db94e6e
bug #1194 : implement slightly faster and SIMD friendly 4x4 determinant.
2019-02-18 16:21:27 +01:00
Gael Guennebaud
31b6e080a9
Fix regression: .conjugate() was popped out but not re-introduced.
2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0
Set cost of conjugate to 0 (in practice it boils down to a no-op).
...
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1
GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
...
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495 ).
2019-02-18 11:47:54 +01:00
Christoph Hertzberg
ec032ac03b
Guard C++11-style default constructor. Also, this is only needed for MSVC
2019-02-16 09:44:05 +01:00
Gael Guennebaud
83309068b4
bug #1680 : improve MSVC inlining by declaring many triavial constructors and accessors as STRONG_INLINE.
2019-02-15 16:35:35 +01:00
Gael Guennebaud
0505248f25
bug #1680 : make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC)
2019-02-15 16:33:56 +01:00
Gael Guennebaud
559320745e
bug #1678 : Fix lack of __FMA__ macro on MSVC with AVX512
2019-02-15 10:30:28 +01:00
Gael Guennebaud
d85ae650bf
bug #1678 : workaround MSVC compilation issues with AVX512
2019-02-15 10:24:17 +01:00
Gael Guennebaud
f2970819a2
bug #1679 : avoid possible division by 0 in complex-schur
2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen
65e23ca7e9
Revert b55b5c7280
...
.
2019-02-14 13:46:13 -08:00
Gael Guennebaud
bdcb5f3304
Let's properly use Score instead of std::abs, and remove deprecated FIXME ( a /= b does a/b and not a * (1/b) as it was a long time ago...)
2019-02-11 22:56:19 +01:00
Gael Guennebaud
2edfc6807d
Fix compilation of empty products of the form: Mx0 * 0xN
2019-02-11 18:24:07 +01:00
Gael Guennebaud
eb46f34a8c
Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.
...
Not sure that's so critical, but this does not complexify the code base much.
2019-02-11 17:59:35 +01:00
Gael Guennebaud
ab6e6edc32
Speedup PartialPivLU for small matrices by passing compile-time sizes when available.
...
This change set also makes a better use of Map<>+OuterStride and Ref<> yielding surprising speed up for small dynamic sizes as well.
The table below reports times in micro seconds for 10 random matrices:
| ------ float --------- | ------- double ------- |
size | before after ratio | before after ratio |
fixed 1 | 0.34 0.11 2.93 | 0.35 0.11 3.06 |
fixed 2 | 0.81 0.24 3.38 | 0.91 0.25 3.60 |
fixed 3 | 1.49 0.49 3.04 | 1.68 0.55 3.01 |
fixed 4 | 2.31 0.70 3.28 | 2.45 1.08 2.27 |
fixed 5 | 3.49 1.11 3.13 | 3.84 2.24 1.71 |
fixed 6 | 4.76 1.64 2.88 | 4.87 2.84 1.71 |
dyn 1 | 0.50 0.40 1.23 | 0.51 0.40 1.26 |
dyn 2 | 1.08 0.85 1.27 | 1.04 0.69 1.49 |
dyn 3 | 1.76 1.26 1.40 | 1.84 1.14 1.60 |
dyn 4 | 2.57 1.75 1.46 | 2.67 1.66 1.60 |
dyn 5 | 3.80 2.64 1.43 | 4.00 2.48 1.61 |
dyn 6 | 5.06 3.43 1.47 | 5.15 3.21 1.60 |
2019-02-11 13:58:24 +01:00
Gael Guennebaud
013cc3a6b3
Make GEMM fallback to GEMV for runtime vectors.
...
This is a more general and simpler version of changeset 4c0fa6ce0f
2019-02-07 16:24:09 +01:00
Gael Guennebaud
fa2fcb4895
Backed out changeset 4c0fa6ce0f
2019-02-07 16:07:08 +01:00
Gael Guennebaud
b3c4344a68
bug #1676 : workaround GCC's bug in c++17 mode.
2019-02-07 15:21:35 +01:00
Eugene Zhulenev
6d0f6265a9
Remove duplicated comment line
2019-02-04 10:30:25 -08:00
Eugene Zhulenev
690b2c45b1
Fix GeneralBlockPanelKernel Android compilation
2019-02-04 10:29:15 -08:00
Gael Guennebaud
871e2e5339
bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)
2019-02-03 08:54:47 +01:00
Rasmus Larsen
e7b481ea74
Merged in rmlarsen/eigen (pull request PR-578)
...
Speed up Eigen matrix*vector and vector*matrix multiplication.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2019-02-02 01:53:44 +00:00
Sameer Agarwal
b55b5c7280
Speed up row-major matrix-vector product on ARM
...
The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.
This change introduces two modifications to this logic.
1. A smaller threshold for ARM and ARM64 devices.
The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on
small and little cores with frequency pinning.
On big (out-of-order) cores, this change has little to no impact. But
on the small (in-order) cores, the matrix-vector products are up to
700% faster. Especially on large matrices.
The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.
With the current change, Eigen handily beats that code.
2. Make the logic for choosing number of simultaneous rows apply
unifiormly to 8, 4 and 2 rows instead of just 8 rows.
Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen
4c0fa6ce0f
Speed up Eigen matrix*vector and vector*matrix multiplication.
...
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
The benchmarks below test
c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;
respectively.
Benchmark measurements:
SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 1096 312 +71.5%
BM_MatVec/128 4581 1464 +68.0%
BM_MatVec/256 18534 5710 +69.2%
BM_MatVec/512 118083 24162 +79.5%
BM_MatVec/1k 704106 173346 +75.4%
BM_MatVec/2k 3080828 742728 +75.9%
BM_MatVec/4k 25421512 4530117 +82.2%
BM_VecMat/32 352 130 +63.1%
BM_VecMat/64 1213 425 +65.0%
BM_VecMat/128 4640 1564 +66.3%
BM_VecMat/256 17902 5884 +67.1%
BM_VecMat/512 70466 24000 +65.9%
BM_VecMat/1k 340150 161263 +52.6%
BM_VecMat/2k 1420590 645576 +54.6%
BM_VecMat/4k 8083859 4364327 +46.0%
AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 619 120 +80.6%
BM_MatVec/128 9693 752 +92.2%
BM_MatVec/256 38356 2773 +92.8%
BM_MatVec/512 69006 12803 +81.4%
BM_MatVec/1k 443810 160378 +63.9%
BM_MatVec/2k 2633553 646594 +75.4%
BM_MatVec/4k 16211095 4327148 +73.3%
BM_VecMat/64 925 227 +75.5%
BM_VecMat/128 3438 830 +75.9%
BM_VecMat/256 13427 2936 +78.1%
BM_VecMat/512 53944 12473 +76.9%
BM_VecMat/1k 302264 157076 +48.0%
BM_VecMat/2k 1396811 675778 +51.6%
BM_VecMat/4k 8962246 4459010 +50.2%
AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 401 111 +72.3%
BM_MatVec/128 1846 513 +72.2%
BM_MatVec/256 36739 1927 +94.8%
BM_MatVec/512 54490 9227 +83.1%
BM_MatVec/1k 487374 161457 +66.9%
BM_MatVec/2k 2016270 643824 +68.1%
BM_MatVec/4k 13204300 4077412 +69.1%
BM_VecMat/32 324 106 +67.3%
BM_VecMat/64 1034 246 +76.2%
BM_VecMat/128 3576 802 +77.6%
BM_VecMat/256 13411 2561 +80.9%
BM_VecMat/512 58686 10037 +82.9%
BM_VecMat/1k 320862 163750 +49.0%
BM_VecMat/2k 1406719 651397 +53.7%
BM_VecMat/4k 7785179 4124677 +47.0%
Currently watchingStop watching
2019-01-31 14:24:08 -08:00
Gael Guennebaud
7ef879f6bf
GEBP: improves pipelining in the 1pX4 path with FMA.
...
Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA.
With AVX+FMA I measured a speed up of about x1.25 in such cases.
2019-01-30 23:45:12 +01:00
Gael Guennebaud
de77bf5d6c
Fix compilation with ARM64.
2019-01-30 16:48:20 +01:00
Gael Guennebaud
eb4c6bb22d
Fix conflicts and merge
2019-01-30 15:57:08 +01:00
Gael Guennebaud
df12fae8b8
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 , the previous GCC issue is fixed in GCC trunk (will be gcc 9).
2019-01-30 11:52:28 +01:00
Gael Guennebaud
3775926bba
ARM64 & GEBP: add specialization for double +30% speed up
2019-01-30 11:49:06 +01:00
Gael Guennebaud
be5b0f664a
ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM
2019-01-30 11:48:25 +01:00
Gael Guennebaud
8a06c699d0
bug #1669 : fix PartialPivLU/inverse with zero-sized matrices.
2019-01-29 10:27:13 +01:00
Gael Guennebaud
a2a07e62b9
Fix compilation with c++03 (local class cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected.
2019-01-29 10:10:07 +01:00
Gael Guennebaud
f489f44519
bug #1574 : implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs.
2019-01-28 17:29:50 +01:00
Gael Guennebaud
803fa79767
Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) functiion
2019-01-28 17:24:44 +01:00
Christoph Hertzberg
5a52e35f9a
Renaming some more I
identifiers
2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen
71429883ee
Fix compilation error in NEON GEBP specializaition of madd.
2019-01-25 17:00:21 -08:00
Gael Guennebaud
ec8a387972
cleanup
2019-01-24 10:24:45 +01:00
David Tellenbach
237b03b372
PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients.
2019-01-23 00:07:19 +01:00
Gael Guennebaud
80f81f9c4b
Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.
2019-01-22 17:08:47 +01:00
David Tellenbach
db152b9ee6
PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
...
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
nluehr
92774f0275
Replace host_define.h with cuda_runtime_api.h
2019-01-18 16:10:09 -06:00
Christoph Hertzberg
da0a41b9ce
Mask unused-parameter warnings, when building with NDEBUG
2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen
2eccbaf3f7
Add missing logical packet ops for GPU and NEON.
2019-01-17 17:45:08 -08:00
Gael Guennebaud
ee3662abc5
Remove some useless const_cast
2019-01-17 18:27:49 +01:00
Gael Guennebaud
0fe6b7d687
Make nestByValue works again (broken since 3.3) and add unit tests.
2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82
Extend reshaped unit tests and remove useless const_cast
2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1
Cleanup useless const_cast and add missing broadcast assignment tests
2019-01-17 16:55:42 +01:00
Gael Guennebaud
be05d0030d
Make FullPivLU use conjugateIf<>
2019-01-17 12:01:00 +01:00
Patrick Peltzer
15e53d5d93
PR 567: makes all dense solvers inherit SoverBase (LU,Cholesky,QR,SVD).
...
This changeset also includes:
* add HouseholderSequence::conjugateIf
* define int as the StorageIndex type for all dense solvers
* dedicated unit tests, including assertion checking
* _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
* CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
* Cholesky: add missing assertions
* FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
* BDCSVD: Unambiguous return type for ternary operator
* SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11
Add conjugateIf<bool> members to DesneBase, TriangularView, SelfadjointView, and make PartialPivLU use it.
2019-01-17 11:33:43 +01:00
Gael Guennebaud
562985bac4
bug #1646 : fix false aliasing detection for A.row(0) = A.col(0);
...
This changeset completely disable the detection for vectors for which are current mechanism cannot detect any positive aliasing anyway.
2019-01-17 00:14:27 +01:00
Rasmus Munk Larsen
7401e2541d
Fix compilation error for logical packet ops with older compilers.
2019-01-16 14:43:33 -08:00
Gael Guennebaud
0f028f61cb
GEBP: fix swapped kernel mode with AVX512 and complex scalars
2019-01-16 22:26:38 +01:00
Gael Guennebaud
e118ce86fd
GEBP: cleanup logic to choose between a 4 packets of 1 packet
2019-01-16 21:47:42 +01:00
Gael Guennebaud
70e133333d
bug #1661 : fix regression in GEBP and AVX512
2019-01-16 21:22:20 +01:00
Gael Guennebaud
502f717980
bug #1646 : disable aliasing detection for empty and 1x1 expression
2019-01-16 14:33:45 +01:00
Gael Guennebaud
0b466b6933
bug #1633 : use proper type for madd temporaries, factorize RhsPacketx4.
2019-01-16 13:50:13 +01:00
Renjie Liu
dbfcceabf5
Bug: 1633: refactor gebp kernel and optimize for neon
2019-01-16 12:51:36 +08:00
Gael Guennebaud
2b70b2f570
Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry
2019-01-15 22:50:42 +01:00
Gael Guennebaud
2c2c114995
Silent maybe-uninitialized warnings by gcc
2019-01-15 16:53:15 +01:00
Gael Guennebaud
6ec6bf0b0d
Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests)
2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24
bug #1592 : makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests)
2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e
Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols.
...
This fix VectorXd(n).middleCol(0,0).outerSize() which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
6cf7afa3d9
Typo
2019-01-15 11:04:37 +01:00
Rasmus Larsen
7b3aab0936
Merged in rmlarsen/eigen (pull request PR-570)
...
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Gael Guennebaud
250dcd1fdb
bug #1652 : fix position of EIGEN_ALIGN16 attributes in Neon and Altivec
2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae
Merged eigen/eigen into default
2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d
AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers
2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe
AVX512 (r)sqrt(double) was mistakenly disabled with clang and others
2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90
fix warning
2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3
Doc: add Isometry in the list of supported Mode of Transform<>
2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a
Initialize isometric transforms like affine transforms.
...
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
4356a55a61
PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
...
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
f566724023
Fix StorageIndex FIXME in dense LU solvers
2019-01-13 17:54:30 +01:00