eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Georg Jäger	1b1082334b	adding attributes to constructors to support hip-clang on ROCm 3.5	2020-08-20 16:48:11 +02:00
Deven Desai	603e213d13	Fixing a CUDA / P100 regression introduced by PR 181 PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified. That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only	2020-08-20 00:29:57 +00:00
Rasmus Munk Larsen	d10b27fe37	Add missing inline keyword in Quaternion.h.	2020-08-14 17:51:04 +00:00
David Tellenbach	c6820a6316	Replace the call to int64_t in the blasutil test by explicit types Some platforms define int64_t to be long long even for C++03. If this is the case we miss the definition of internal::make_unsigned for this type. If we just define the template we get duplicated definitions errors for platforms defining int64_t as signed long for C++03. We need to find a way to distinguish both cases at compile-time.	2020-08-14 17:24:37 +02:00
David Tellenbach	8ba1b0f41a	bfloat16 packetmath for Arm Neon backend	2020-08-13 15:48:40 +00:00
Pedro Caldeira	704798d1df	Add support for Bfloat16 to use vector instructions on Altivec architecture	2020-08-10 13:22:01 -05:00
Zachary Garrett	21122498ec	Temporarily turn off the NEON implementation of pfloor as it does not work for large values. The NEON implementation mimics the SSE implementation, but didn't mention the caveat that due to the unsigned of signed integer conversions, not all values in the original floating point represented are supported.	2020-08-04 16:28:23 +00:00
David Tellenbach	5e484fa11d	Fix StlDeque for GCC 10 StlDeque extends std::deque by accessing some of its internal members. Since GCC 10 these are not accessible anymore.	2020-07-29 12:31:13 +00:00
Teng Lu	3ec4f0b641	Fix undefined BF16 union behavior in AVX512.	2020-07-29 02:20:21 +00:00
David Tellenbach	99da2e1a8d	Fix clang-tidy warnings in generic bfloat16 implementation See !172 for related discussions.	2020-07-27 16:00:24 +02:00
David Tellenbach	c1ffe452fc	Fix bfloat16 casts If we have explicit conversion operators available (C++11) we define explicit casts from bfloat16 to other types. If not (C++03), we don't define conversion operators but rely on implicit conversion chains from bfloat16 over float to other types.	2020-07-23 20:55:06 +00:00
Rasmus Munk Larsen	1b84f21e32	Revert change that made conversion from bfloat16 to {float, double} implicit. Add roundtrip tests for casting between bfloat16 and complex types.	2020-07-22 18:09:00 -07:00
David Tellenbach	38b91f256b	Fix cast of bfloat16 to std::complex<T> This fixes https://gitlab.com/libeigen/eigen/-/issues/1951	2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen	bed7fbe854	Make sure we take the little-endian path if __BYTE_ORDER__ is not defined.	2020-07-22 18:54:38 +00:00
Niels Dekker	0e1a33a461	Faster conversion from integer types to bfloat16 Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types. A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.	2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen	acab22c205	Avoid division by zero in nonZerosEstimate() for empty blocks.	2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen	0aeaf5f451	Make numext::as_uint a device function.	2020-07-22 00:33:41 +00:00
Alexander Turkin	60faa9f897	user-defined copy operations removed in favor of compiler-generated ones	2020-07-20 14:59:35 +03:00
Niels Dekker	b11f817bcf	Avoid undefined behavior by union type punning in float_to_bfloat16_rtne Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior. See also C++ Core Guidelines: "Don't use a union for type punning" https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning `numext::as_uint` was suggested by David Tellenbach	2020-07-14 19:55:20 +02:00
Sheng Yang	56b3e3f3f8	AVX path for BF16	2020-07-14 01:34:03 +00:00
Niels Dekker	4ab32e2de2	Allow implicit conversion from bfloat16 to float and double Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also supports implicit conversion from a smaller to a larger floating point type. Intel's oneDNN bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp	2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen	ed00df445d	Guard operator<< by EIGEN_NO_IO.	2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen	fb77b7288c	Add operator<< to print a quaternion.	2020-07-09 12:49:58 -07:00
David Tellenbach	ee4715ff48	Fix test basic stuff - Guard fundamental types that are not available pre C++11 - Separate subsequent angle brackets >> by spaces - Allow casting of Eigen::half and Eigen::bfloat16 to complex types	2020-07-09 17:24:00 +00:00
Forrest Voight	8889a2c1c6	Add operator==/operator!= to Quaternion. Fixes #1876 .	2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen	6964ae8d52	Change the sign operator in Eigen to return NaN for NaN arguments, not zero.	2020-07-07 01:54:04 +00:00
Sheng Yang	116c5235ac	BF16 for scalar_cmp_with_cast_op	2020-07-01 18:33:42 +00:00
Antonio Sanchez	9cb8771e9c	Fix tensor casts for large packets and casts to/from std::complex The original tensor casts were only defined for `SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the missing 1:N and 8:1. We also add casting `Eigen::half` to/from `std::complex<T>`, which was missing to make it consistent with `Eigen::bfloat16`, and generalize the overload to work for any complex type. Tests were added to `basicstuff`, `packetmath`, and `cxx11_tensor_casts` to test all cast configurations.	2020-06-30 18:53:55 +00:00
Antonio Sanchez	7222f0b6b5	Fix packetmath_1 float tests for arm/aarch64. Added missing `pmadd<Packet2f>` for NEON. This leads to a significant improvement in precision over the previous `pmul+padd`, which was causing the `pcos` tests to fail. Also added an approx test with `std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would pass. Modified `log(denorm)` tests. Denorms are not always supported by all systems (returns `::min`), are always flushed to zero on 32-bit arm, and configurably flush to zero on sse/avx/aarch64. This leads to inconsistent results across different systems (i.e. `-inf` vs `nan`). Added a check for existence and exclude ARM. Removed logistic exactness test, since scalar and vectorized versions follow different code-paths due to differences in `pexp` and `pmadd`, which result in slightly different values. For example, exactness always fails on arm, aarch64, and altivec.	2020-06-24 14:03:35 -07:00
Antonio Sanchez	ff4e7a0820	Add missing Packet2l/Packet2ul ops for NEON. The current multiply (`pmul`) and comparison operators (`pcmp_lt`, `pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and `Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests in clang. Here we add and test the missing ops. Tested: ``` $ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath $ adb push packetmath /data/local/tmp/ $ adb shell "/data/local/tmp/packetmath" $ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath $ adb push packetmath /data/local/tmp/ $ adb shell "/data/local/tmp/packetmath" $ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath $ adb push packetmath /data/local/tmp/ $ adb shell "/data/local/tmp/packetmath" $ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath $ adb push packetmath /data/local/tmp/ $ adb shell "/data/local/tmp/packetmath" ```	2020-06-22 11:24:43 -07:00
Antonio Sanchez	03ebdf6acb	Added missing NEON pcasts, update packetmath tests. The NEON `pcast` operators are all implemented and tested for existing packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting between `int64_t` and `int8_t` in `GenericPacketMath.h`. Removed incorrect `HasHalfPacket` definition for NEON's `Packet2l`/`Packet2ul`. Adjustments were also made to the `packetmath` tests. These include - minor bug fixes for cast tests (i.e. 4:1 casts, only casting for packets that are vectorizable) - added 8:1 cast tests - random number generation - original had uninteresting 0 to 0 casts for many casts between floating-point and integers, and exhibited signed overflow undefined behavior Tested: ``` $ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath $ adb push packetmath /data/local/tmp/ $ adb shell "/data/local/tmp/packetmath" ```	2020-06-21 09:32:31 -07:00
Teng Lu	386d809bde	Support BFloat16 in Eigen	2020-06-20 19:16:24 +00:00
Pedro Caldeira	a475bf14d4	Fix pscatter and pgather for Altivec Complex double	2020-06-16 16:41:02 -03:00
David Tellenbach	c6c84ed961	Fix unused variable warning on Arm	2020-06-15 00:14:58 +02:00
Sebastien Boisvert	6228f27234	Fix #1818 : SparseLU: add methods nnzL() and nnzU() Now this compiles without errors: $ clang++ -I ../../ test_sparseLU.cpp -std=c++03	2020-06-11 23:49:49 +00:00
Sebastien Boisvert	39cbd6578f	Fix #1911 : add benchmark for move semantics with fixed-size matrix $ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \ -o bench_move_semantics $ ./bench_move_semantics float copy semantics: 1755.97 ms float move semantics: 55.063 ms double copy semantics: 2457.65 ms double move semantics: 55.034 ms	2020-06-11 23:43:25 +00:00
Antonio Sanchez	a7d2552af8	Remove HasCast and fix packetmath cast tests. The use of the `packet_traits<>::HasCast` field is currently inconsistent with `type_casting_traits<>`, and is unused apart from within `test/packetmath.cpp`. In addition, those packetmath cast tests do not currently reflect how casts are performed in practice: they ignore the `SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio. Here we remove the unused `HasCast`, and modify the packet cast tests to better reflect their usage.	2020-06-11 17:26:56 +00:00
Sebastien Boisvert	463ec86648	Fix #1757 : remove the word 'suicide'	2020-06-11 00:56:54 +00:00
ShengYang1	b5d66b5e73	Implement scalar_cmp_with_cast_op	2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen	c4059ffcb6	Fix static analyzer warning in SelfadjointProduct.h. Fix compiler warnings in GeneralBlockPanelKernel.h.	2020-06-08 11:48:44 -07:00
Thales Sabino	1fcaaf460f	Update FindComputeCpp.cmake to fix build problems on Windows - Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows - Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows	2020-06-05 20:51:20 +00:00
Rasmus Munk Larsen	c2ab36f47a	Fix broken packetmath test for logistic on Arm.	2020-06-04 16:24:47 -07:00
Rasmus Munk Larsen	537e2b322f	Fix typo in previous update to generic predux_any.	2020-06-04 22:25:05 +00:00
Rasmus Munk Larsen	fdc1cbdce3	Avoid implicit float equality comparison in generic predux_any, but use numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal.	2020-06-04 22:15:56 +00:00
Rasmus Munk Larsen	daf9bbeca2	Fix compilation error in logistic packet op.	2020-06-03 00:57:41 +00:00
Gael Guennebaud	029a76e115	Bug #1777 : make the scalar and packet path consistent for the logistic function + respective unit test	2020-05-31 00:53:37 +02:00
Gael Guennebaud	99b7f7cb9c	Fix #556 : warnings with mingw	2020-05-31 00:39:44 +02:00
Gael Guennebaud	867a756509	Fix #1833 : compilation issue of "array!=scalar" with c++20	2020-05-30 23:53:58 +02:00
Gael Guennebaud	ab615e4114	Save one extra temporary when assigning a sparse product to a row-major sparse matrix	2020-05-30 23:15:12 +02:00
Kan Chen	8d1302f566	Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ptranspose on NEON	2020-05-29 00:33:45 +00:00
Yong Tang	8e1df5b082	Fix incorrect usage of `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC` This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)` in `Eigen/Core` header. In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true. This causes issues when building on non PPC platform and `MatrixProduct.h` is not available. This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2020-05-28 05:53:44 -07:00
Kan Chen	4e7046063b	Fix #1874 : it works on both MSVC 2017 and other platforms.	2020-05-21 18:42:56 +08:00
Pedro Caldeira	2d67af2d2b	Add pscatter for Packet16{u}c (int8)	2020-05-20 17:29:34 -03:00
Everton Constantino	8a7f360ec3	- Vectorizing MMA packing. - Optimizing MMA kernel. - Adding PacketBlock store to blas_data_mapper.	2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen	a145e4adf5	Add newline at the end of StlIterators.h.	2020-05-15 20:36:00 +00:00
Gael Guennebaud	8ce9630ddb	Fix #1874 : workaround MSVC 2017 compilation issue.	2020-05-15 20:47:32 +02:00
Rasmus Munk Larsen	9b411757ab	Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types. This change also contains a few minor cleanups: 1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan, which can be done in other ways. 2. Remove the "HasInsert" enum, which is no longer needed since we removed the corresponding packet ops. 3. Add faster pselect op for Packet4i when SSE4.1 is supported. Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>. Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00 CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB Benchmark Time(ns) CPU(ns) Iterations ----------------------------------------------------------------------- BM_TransposeInPlace<float>/4 9.77 9.77 71670320 BM_TransposeInPlace<float>/8 21.9 21.9 31929525 BM_TransposeInPlace<float>/16 66.6 66.6 10000000 BM_TransposeInPlace<float>/32 243 243 2879561 BM_TransposeInPlace<float>/59 844 844 829767 BM_TransposeInPlace<float>/64 933 933 750567 BM_TransposeInPlace<float>/128 3944 3945 177405 BM_TransposeInPlace<float>/256 16853 16853 41457 BM_TransposeInPlace<float>/512 204952 204968 3448 BM_TransposeInPlace<float>/1k 1053889 1053861 664 BM_TransposeInPlace<bool>/4 14.4 14.4 48637301 BM_TransposeInPlace<bool>/8 36.0 36.0 19370222 BM_TransposeInPlace<bool>/16 31.5 31.5 22178902 BM_TransposeInPlace<bool>/32 111 111 6272048 BM_TransposeInPlace<bool>/59 626 626 1000000 BM_TransposeInPlace<bool>/64 428 428 1632689 BM_TransposeInPlace<bool>/128 1677 1677 417377 BM_TransposeInPlace<bool>/256 7126 7126 96264 BM_TransposeInPlace<bool>/512 29021 29024 24165 BM_TransposeInPlace<bool>/1k 116321 116330 6068	2020-05-14 22:39:13 +00:00
Felipe Attanasio	d640276d31	Added support for reverse iterators for Vectorwise operations.	2020-05-14 22:38:20 +00:00
Christopher Moore	fa8fd4b4d5	Indexed view should have RowMajorBit when there is statically a single row	2020-05-14 22:11:19 +00:00
Christopher Moore	a187ffea28	Resolve "IndexedView of a vector should allow linear access"	2020-05-13 19:24:42 +00:00
Pedro Caldeira	5fdc179241	Altivec template functions to better code reusability	2020-05-11 21:04:51 +00:00
mehdi-goli	d3e81db6c5	Eigen moved the `scanLauncher` function inside the internal namespace. This commit applies the following changes: - Moving the `scanLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend. - Replacing `SYCL/sycl.hpp` with `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard. - minor fixes: commenting out an unused variable to avoid compiler warnings.	2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen	c1d944dd91	Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp. I cannot measure any performance changes for SSE, AVX, or AVX512. name old time/op new time/op delta BM_LinSpace<float>/1 1.63ns ± 0% 1.63ns ± 0% ~ (p=0.762 n=5+5) BM_LinSpace<float>/8 4.92ns ± 3% 4.89ns ± 3% ~ (p=0.421 n=5+5) BM_LinSpace<float>/64 34.6ns ± 0% 34.6ns ± 0% ~ (p=0.841 n=5+5) BM_LinSpace<float>/512 217ns ± 0% 217ns ± 0% ~ (p=0.421 n=5+5) BM_LinSpace<float>/4k 1.68µs ± 0% 1.68µs ± 0% ~ (p=1.000 n=5+5) BM_LinSpace<float>/32k 13.3µs ± 0% 13.3µs ± 0% ~ (p=0.905 n=5+4) BM_LinSpace<float>/256k 107µs ± 0% 107µs ± 0% ~ (p=0.841 n=5+5) BM_LinSpace<float>/1M 427µs ± 0% 427µs ± 0% ~ (p=0.690 n=5+5)	2020-05-08 15:41:50 -07:00
David Tellenbach	5c4e19fbe7	Possibility to specify user-defined default cache sizes for GEBP kernel Some architectures have no convenient way to determine cache sizes at runtime. Eigen's GEBP kernel falls back to default cache values in this case which might not be correct in all situations. This patch introduces three preprocessor directives `EIGEN_DEFAULT_L1_CACHE_SIZE` `EIGEN_DEFAULT_L2_CACHE_SIZE` `EIGEN_DEFAULT_L3_CACHE_SIZE` to give users the possibility to set these default values explicitly.	2020-05-08 12:54:36 +02:00
Rasmus Munk Larsen	225ab040e0	Remove unused packet op "palign". Clean up a compiler warning in c++03 mode in AVX512/Complex.h.	2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen	49f1aeb60d	Remove traits declaring NEON vectorized casts that do not actually have packet op implementations.	2020-05-07 09:49:22 -07:00
Xiaoxiang Cao	a74a278abd	Fix confusing template param name for Stride fwd decl.	2020-04-30 01:43:05 +00:00
Rasmus Munk Larsen	923ee9aba3	Fix the embarrassingly incomplete fix to the embarrassing bug in blocked transpose.	2020-04-29 17:27:36 +00:00
Rasmus Munk Larsen	a32923a439	Fix (embarrassing) bug in blocked transpose.	2020-04-29 17:02:27 +00:00
Rasmus Munk Larsen	1e41406c36	Add missing transpose in cleanup loop. Without it, we trip an assertion in debug mode.	2020-04-29 01:30:51 +00:00
Rasmus Munk Larsen	fbe7916c55	Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile.	2020-04-29 00:58:41 +00:00
Rasmus Munk Larsen	ab773c7e91	Extend support for Packet16b: * Add ptranspose<,4> to support matmul and add unit test for Matrix<bool> Matrix<bool> * work around a bug in slicing of Tensor<bool>. * Add tensor tests This speeds up matmul for boolean matrices by about 10x name old time/op new time/op delta BM_MatMul<bool>/8 267ns ± 0% 479ns ± 0% +79.25% (p=0.008 n=5+5) BM_MatMul<bool>/32 6.42µs ± 0% 0.87µs ± 0% -86.50% (p=0.008 n=5+5) BM_MatMul<bool>/64 43.3µs ± 0% 5.9µs ± 0% -86.42% (p=0.008 n=5+5) BM_MatMul<bool>/128 315µs ± 0% 44µs ± 0% -85.98% (p=0.008 n=5+5) BM_MatMul<bool>/256 2.41ms ± 0% 0.34ms ± 0% -85.68% (p=0.008 n=5+5) BM_MatMul<bool>/512 18.8ms ± 0% 2.7ms ± 0% -85.53% (p=0.008 n=5+5) BM_MatMul<bool>/1k 149ms ± 0% 22ms ± 0% -85.40% (p=0.008 n=5+5)	2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen	b47c777993	Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once. rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench 10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s (Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench) name old time/op new time/op delta BM_TransposeInPlace<float>/4 9.84ns ± 0% 6.51ns ± 0% -33.80% (p=0.008 n=5+5) BM_TransposeInPlace<float>/8 23.6ns ± 1% 17.6ns ± 0% -25.26% (p=0.016 n=5+4) BM_TransposeInPlace<float>/16 78.8ns ± 0% 60.3ns ± 0% -23.50% (p=0.029 n=4+4) BM_TransposeInPlace<float>/32 302ns ± 0% 229ns ± 0% -24.40% (p=0.008 n=5+5) BM_TransposeInPlace<float>/59 1.03µs ± 0% 0.84µs ± 1% -17.87% (p=0.016 n=5+4) BM_TransposeInPlace<float>/64 1.20µs ± 0% 0.89µs ± 1% -25.81% (p=0.008 n=5+5) BM_TransposeInPlace<float>/128 8.96µs ± 0% 3.82µs ± 2% -57.33% (p=0.008 n=5+5) BM_TransposeInPlace<float>/256 152µs ± 3% 17µs ± 2% -89.06% (p=0.008 n=5+5) BM_TransposeInPlace<float>/512 837µs ± 1% 208µs ± 0% -75.15% (p=0.008 n=5+5) BM_TransposeInPlace<float>/1k 4.28ms ± 2% 1.08ms ± 2% -74.72% (p=0.008 n=5+5)	2020-04-28 16:08:16 +00:00
Pedro Caldeira	29f0917a43	Add support to vector instructions to Packet16uc and Packet16c	2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen	e80ec24357	Remove unused packet op "preduxp".	2020-04-23 18:17:14 +00:00
René Wagner	0aebe19aca	BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers. This enables operator== on Eigen matrices in device code.	2020-04-23 17:25:08 +02:00
Pedro Caldeira	0c67b855d2	Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations	2020-04-21 14:52:46 -03:00
Rasmus Munk Larsen	e8f40e4670	Fix bug in ptrue for Packet16b.	2020-04-20 21:45:10 +00:00
Rasmus Munk Larsen	2f6ddaa25c	Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x. Benchmark numbers for the logical and of two NxN tensors: name old time/op new time/op delta BM_booleanAnd_1T/3 [using 1 threads] 14.6ns ± 0% 14.4ns ± 0% -0.96% BM_booleanAnd_1T/4 [using 1 threads] 20.5ns ±12% 9.0ns ± 0% -56.07% BM_booleanAnd_1T/7 [using 1 threads] 41.7ns ± 0% 10.5ns ± 0% -74.87% BM_booleanAnd_1T/8 [using 1 threads] 52.1ns ± 0% 10.1ns ± 0% -80.59% BM_booleanAnd_1T/10 [using 1 threads] 76.3ns ± 0% 13.8ns ± 0% -81.87% BM_booleanAnd_1T/15 [using 1 threads] 167ns ± 0% 16ns ± 0% -90.45% BM_booleanAnd_1T/16 [using 1 threads] 188ns ± 0% 16ns ± 0% -91.57% BM_booleanAnd_1T/31 [using 1 threads] 667ns ± 0% 34ns ± 0% -94.83% BM_booleanAnd_1T/32 [using 1 threads] 710ns ± 0% 35ns ± 0% -95.01% BM_booleanAnd_1T/64 [using 1 threads] 2.80µs ± 0% 0.11µs ± 0% -95.93% BM_booleanAnd_1T/128 [using 1 threads] 11.2µs ± 0% 0.4µs ± 0% -96.11% BM_booleanAnd_1T/256 [using 1 threads] 44.6µs ± 0% 2.5µs ± 0% -94.31% BM_booleanAnd_1T/512 [using 1 threads] 178µs ± 0% 10µs ± 0% -94.35% BM_booleanAnd_1T/1k [using 1 threads] 717µs ± 0% 78µs ± 1% -89.07% BM_booleanAnd_1T/2k [using 1 threads] 2.87ms ± 0% 0.31ms ± 1% -89.08% BM_booleanAnd_1T/4k [using 1 threads] 11.7ms ± 0% 1.9ms ± 4% -83.55% BM_booleanAnd_1T/10k [using 1 threads] 70.3ms ± 0% 17.2ms ± 4% -75.48%	2020-04-20 20:16:28 +00:00
Rasmus Munk Larsen	5ab87d8aba	Move eigen_packet_wrapper to GenericPacketMath.h and use it for SSE/AVX/AVX512 as it is already used for NEON. This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i. Use this mechanism to define packets for half and clean up the packet op implementations.	2020-04-15 18:17:19 +00:00
Rasmus Munk Larsen	4aae8ac693	Fix typo in TypeCasting.h	2020-04-14 02:55:51 +00:00
Rasmus Munk Larsen	1d674003b2	Fix bug in vectorized casting of {uint8, int8} -> {int16, uint16, int32, uint32, float} {uint16, int16} -> {int32, uint32, int64, uint64, float} for NEON. These conversions were advertised as vectorized, but not actually implemented.	2020-04-14 02:11:06 +00:00
Christoph Hertzberg	d46d726e9d	CommaInitializer wrongfully asserted for 0-sized blocks commainitializer unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.	2020-04-13 16:41:20 +02:00
Antonio Sanchez	c854e189e6	Fixed commainitializer test. The removed `finished()` call was responsible for enforcing that the initializer was provided the correct number of values. Putting it back in to restore previous behavior.	2020-04-10 13:53:26 -07:00
Rasmus Munk Larsen	f0577a2bfd	Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path. Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size. Measured improvement with AVX+FMA: name old time/op new time/op delta BM_MatMul_ATB/8 139ns ± 1% 129ns ± 1% -7.49% (p=0.008 n=5+5) BM_MatMul_ATB/32 1.46µs ± 1% 1.22µs ± 0% -16.72% (p=0.008 n=5+5) BM_MatMul_ATB/64 8.43µs ± 1% 7.41µs ± 0% -12.04% (p=0.008 n=5+5) BM_MatMul_ATB/128 56.8µs ± 1% 52.9µs ± 1% -6.83% (p=0.008 n=5+5) BM_MatMul_ATB/256 407µs ± 1% 395µs ± 3% -2.94% (p=0.032 n=5+5) BM_MatMul_ATB/512 3.27ms ± 3% 3.18ms ± 1% ~ (p=0.056 n=5+5) Measured improvement for AVX512: name old time/op new time/op delta BM_MatMul_ATB/8 167ns ± 1% 154ns ± 1% -7.63% (p=0.008 n=5+5) BM_MatMul_ATB/32 1.08µs ± 1% 0.83µs ± 3% -23.58% (p=0.008 n=5+5) BM_MatMul_ATB/64 6.21µs ± 1% 5.06µs ± 1% -18.47% (p=0.008 n=5+5) BM_MatMul_ATB/128 36.1µs ± 2% 31.3µs ± 1% -13.32% (p=0.008 n=5+5) BM_MatMul_ATB/256 263µs ± 2% 242µs ± 2% -7.92% (p=0.008 n=5+5) BM_MatMul_ATB/512 1.95ms ± 2% 1.91ms ± 2% ~ (p=0.095 n=5+5) BM_MatMul_ATB/1k 15.4ms ± 4% 14.8ms ± 2% ~ (p=0.095 n=5+5)	2020-04-07 22:09:51 +00:00
Antonio Sanchez	9dda5eb7d2	Missing struct definition in NumTraits	2020-04-07 09:01:11 -07:00
Akshay Naresh Modi	bcc0e9e15c	Add numeric_limits min and max for bool This will allow (among other things) computation of argmax and argmin of bool tensors	2020-04-06 23:38:57 +00:00
Bernardo Bahia Monteiro	54a0a9c9dd	Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar The error generated by the compiler was: no matching function for call to 'maxi' RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero); The important part in the following notes was: candidate template ignored: deduced conflicting types for parameter 'T' ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>') EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y) I am using CoDiPack to provide the RealScalar type. This bug was introduced in `bc000deaa` Fix conjugate-gradient for very small rhs	2020-03-29 19:44:12 -04:00
Rasmus Munk Larsen	393dbd8ee9	Fix bug in `52d54278be`	2020-03-27 16:42:18 +00:00
Joel Holdsworth	6d2dbfc453	NEON: Fixed MSVC types definitions	2020-03-26 20:19:58 +00:00
Joel Holdsworth	52d54278be	Additional NEON packet-math operations	2020-03-26 20:18:19 +00:00
Everton Constantino	deb93ed1bf	Adhere to recommended load/store intrinsics for ppc64le	2020-03-23 15:18:15 -03:00
Everton Constantino	5afdaa473a	Fixing float32's pround halfway criteria to match STL's criteria.	2020-03-21 22:30:54 -05:00
Alessio M	96cd1ff718	Fixed: - access violation when initializing 0x0 matrices - exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert is configured to throw	2020-03-21 05:11:21 +00:00
dlazenby	cc954777f2	Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h	2020-03-20 19:30:01 +00:00
Masaki Murooka	55ecd58a3c	Bug https://gitlab.com/libeigen/eigen/-/issues/1415 : add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base.	2020-03-20 13:37:37 +09:00
Rasmus Munk Larsen	4da2c6b197	Remove reference to non-existent unary_op_base class.	2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen	eda90baf35	Add missing arguments to numext::absdiff().	2020-03-19 18:16:55 +00:00
Joel Holdsworth	d5c665742b	Add absolute_difference coefficient-wise binary Array function	2020-03-19 17:45:20 +00:00
Everton Constantino	6ff5a14091	Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned types.	2020-03-19 17:31:49 +00:00
Joel Holdsworth	232f904082	Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions	2020-03-19 17:24:06 +00:00
Joel Holdsworth	54aa8fa186	Implement integer square-root for NEON	2020-03-19 17:05:13 +00:00
Allan Leal	37ccb86916	Update NullaryFunctors.h	2020-03-16 11:59:02 +00:00
Deven Desai	7158ed4e0e	Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::Half packet type	2020-03-12 01:06:24 +00:00
Joel Holdsworth	d53ae40f7b	NEON: Added int64_t and uint64_t packet math	2020-03-10 22:46:19 +00:00
Joel Holdsworth	4b9ecf2924	NEON: Added int8_t and uint8_t packet math	2020-03-10 22:46:19 +00:00
Joel Holdsworth	ceaabd4e16	NEON: Added int16_t and uint16_t packet math	2020-03-10 22:46:19 +00:00
Joel Holdsworth	d5d3cf9339	NEON: Added uint32_t packet math	2020-03-10 22:46:19 +00:00
Joel Holdsworth	eacf97f727	NEON: Implemented half-size vectors	2020-03-10 22:46:19 +00:00
Joel Holdsworth	5f411b729e	NEON: Set packet_traits<double> flags	2020-03-10 22:46:19 +00:00
Sami Kama	b733b8b680	remove duplicate pset1 for half and add some comments about why we need to expose pmul/add/div/min/max on host	2020-03-10 20:28:43 +00:00
Rasmus Munk Larsen	52a2fbbb00	Revert "avoid selecting half-packets when unnecessary" This reverts commit `5ca10480b0`	2020-02-25 01:07:43 +00:00
Rasmus Munk Larsen	235bcfe08d	Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE" This reverts commit `44df2109c8`	2020-02-25 01:07:28 +00:00
Rasmus Munk Larsen	d7a42eade6	Revert "do not pick full-packet if it'd result in more operations" This reverts commit `e9cc0cd353`	2020-02-25 01:07:15 +00:00
Tobias Bosch	f0ce88cff7	Include <sstream> explicitly, and don't rely on the implicit include via <complex>. This implicit dependency no longer exists in a recent LLVM release (sha 78be61871704).	2020-02-24 23:09:36 +00:00
Francesco Mazzoli	e9cc0cd353	do not pick full-packet if it'd result in more operations See comment and <https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.	2020-02-07 18:16:16 +01:00
Francesco Mazzoli	44df2109c8	Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE See comment for details.	2020-02-07 18:16:16 +01:00
Francesco Mazzoli	5ca10480b0	avoid selecting half-packets when unnecessary See <https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation> for an explanation of the problem this solves. In short, for some reason, before this commit the half-packet is selected when the array / matrix size is not a multiple of `unpacket_traits<PacketType>::size`, where `PacketType` starts out being the full Packet. For example, for some data of 100 `float`s, `Packet4f` will be selected rather than `Packet8f`, because 100 is not a multiple of 8, the size of `Packet8f`. This commit switches to selecting the half-packet if the size is less than the packet size, which seems to make more sense. As I stated in the SO post I'm not sure that I'm understanding the issue correctly, but this fix resolves the issue in my program. Moreover, `make check` passes, with the exception of line 614 and 616 in `test/packetmath.cpp`, which however also fail on master on my machine: CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0); ... CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);	2020-02-07 18:16:16 +01:00
Rasmus Munk Larsen	6601abce86	Remove rogue include in TypeCasting.h. Meta.h is already included by the top-level header in Eigen/Core.	2020-01-14 21:03:53 +00:00
Everton Constantino	5a8b97b401	Switching unpacket_traits<Packet4i> to vectorizable=true.	2020-01-13 16:08:20 -03:00
Everton Constantino	42838c28b8	Adding correct cache sizes for PPC architecture.	2020-01-13 16:58:14 +00:00
Rasmus Munk Larsen	e1ecfc162d	Explicitly call ::rint and ::rintf for targets without c++11. Without this, the Windows build breaks when trying to compile numext::rint<double>.	2020-01-10 21:14:08 +00:00
Joel Holdsworth	da5a7afed0	Improvements to the tidiness and completeness of the NEON implementation	2020-01-10 18:31:15 +00:00
Anuj Rawat	452371cead	Fix for gcc build error when using Eigen headers with AVX512	2020-01-10 18:05:42 +00:00
mehdi-goli	601f89dfd0	Adding RInt vector support for SYCL.	2020-01-10 18:00:36 +00:00
Rasmus Munk Larsen	9254974115	Don't add EIGEN_DEVICE_FUNC to random() since ::rand is not available in Cuda.	2020-01-09 21:23:09 +00:00
Rasmus Munk Larsen	a3ec89b5bd	Add missing EIGEN_DEVICE_FUNC annotations in MathFunctions.h.	2020-01-09 21:06:34 +00:00
Rasmus Munk Larsen	e6fcee995b	Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower.	2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen	4217a9f090	The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change sets the limits to the values found by scanning all floating point numbers (using std::nextafterf()).	2020-01-08 22:21:37 +00:00
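The scanning approach mentioned in this message can be sketched as follows. `first_float_satisfying` is a hypothetical helper, not part of Eigen; it uses the `float` overload of `std::nextafter` to step through consecutive representable values.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: walks upward through consecutive floats in
// [start, stop] and returns the first one for which pred(x) is true.
// Returns NaN if no value in the range satisfies the predicate.
template <typename Pred>
float first_float_satisfying(float start, float stop, Pred pred) {
  const float inf = std::numeric_limits<float>::infinity();
  float x = start;
  while (x <= stop) {
    if (pred(x)) return x;
    if (x == stop) break;         // avoid looping forever when stop is +inf
    x = std::nextafter(x, inf);   // next representable float above x
  }
  return std::numeric_limits<float>::quiet_NaN();
}
```

A clipping point could then be located by passing a predicate such as "the approximation evaluates to exactly 1" over a suitable range.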
Ilya Tokar	19876ced76	Bug #1785 : Introduce numext::rint. This provides a new op that matches std::rint and previous behavior of pround. Also adds corresponding unsupported/../Tensor op. Performance is the same as e. g. floor (tested SSE/AVX).	2020-01-07 21:22:44 +00:00
mehdi-goli	d0ae052da4	[SYCL Backend] * Adding Missing operations for vector comparison in SYCL. This caused compiler error for vector comparison when compiling SYCL * Fixing the compiler error for placement new in TensorForcedEval.h This caused compiler error when compiling SYCL backend * Reducing the SYCL warning by removing the abort function inside the kernel * Adding Strong inline to functions inside SYCL interop.	2020-01-07 15:13:37 +00:00
Janek Kozicki	00de570793	Fix -Werror -Wfloat-conversion warning.	2019-12-23 23:52:44 +01:00
Christoph Hertzberg	870e53c0f2	Bug #1788 : Fix rule-of-three violations inside the stable modules. This fixes deprecated-copy warnings when compiling with GCC>=9. Also protect some additional Base-constructors from getting called by user code (#1587)	2019-12-19 17:30:11 +01:00
Christoph Hertzberg	72166d0e6e	Fix some maybe-uninitialized warnings	2019-12-18 18:26:20 +01:00
Rasmus Munk Larsen	7252163335	Add default definition for EIGEN_PREDICT_*	2019-12-16 22:31:59 +00:00
Rasmus Munk Larsen	a566074480	Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function). This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in `66f07efeae`), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
- Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
- Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set. The benchmarks below repeat calls to u = v.logistic() (u = v.tanh(), respectively), where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
ISA      Benchmark                Time(ns)  CPU(ns)  Iterations  model_time
-----------------------------------------------------------------
SSE      BM_eigen_logistic_float      4467     4468      155835        4827
AVX      BM_eigen_logistic_float      2347     2347      299135        2926
AVX+FMA  BM_eigen_logistic_float      1467     1467      476143        2926
AVX512   BM_eigen_logistic_float       805      805      858696        1463

After:
ISA      Benchmark                Time(ns)  CPU(ns)  Iterations  model_time
-----------------------------------------------------------------
SSE      BM_eigen_logistic_float      2589     2590      270264        4827
AVX      BM_eigen_logistic_float      1428     1428      489265        2926
AVX+FMA  BM_eigen_logistic_float      1059     1059      662255        2926
AVX512   BM_eigen_logistic_float       673      673     1000000        1463

Benchmark numbers for tanh:

Before:
ISA      Benchmark                Time(ns)  CPU(ns)  Iterations  model_time
-----------------------------------------------------------------
SSE      BM_eigen_tanh_float          2391     2391      292624        4242
AVX      BM_eigen_tanh_float          1256     1256      554662        2633
AVX+FMA  BM_eigen_tanh_float           823      823      866267        1609
AVX512   BM_eigen_tanh_float           443      443     1578999         805

After:
ISA      Benchmark                Time(ns)  CPU(ns)  Iterations  model_time
-----------------------------------------------------------------
SSE      BM_eigen_tanh_float          2588     2588      273531        4242
AVX      BM_eigen_tanh_float          1536     1536      452321        2633
AVX+FMA  BM_eigen_tanh_float          1007     1007      694681        1609
AVX512   BM_eigen_tanh_float           471      471     1472178         805	2019-12-16 21:33:42 +00:00
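The reason exp(x) can stand in for the logistic function at large negative x follows from logistic(x) = exp(x) / (1 + exp(x)): the relative error of the substitution is exp(x) itself. The sketch below checks this numerically in double precision; the function names are hypothetical and this is not Eigen's vectorized implementation.

```cpp
#include <cmath>

// Reference logistic function, evaluated in double precision.
inline double logistic_ref(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Relative error made by approximating logistic(x) with exp(x).
// Analytically this equals exp(x), so it shrinks quickly as x decreases.
inline double rel_error_exp_approx(double x) {
  const double exact = logistic_ref(x);
  return std::fabs(std::exp(x) - exact) / exact;
}
```

At x = -9 the relative error is about exp(-9), roughly 1.2e-4, and it keeps falling for more negative arguments.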
Christoph Hertzberg	8e5da71466	Resolve double-promotion warnings when compiling with clang. `sin` was calling `sin(double)` instead of `std::sin(float)`	2019-12-13 22:46:40 +01:00
Ilya Tokar	06e99aaf40	Bug 1785: fix pround on x86 to use the same rounding mode as std::round. This also adds pset1frombits helper to Packet[24]d. Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after, still an order of magnitude faster than scalar version: 33.8µs ± 2%.	2019-12-12 17:38:53 -05:00
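The distinction behind this fix is the tie-breaking rule: `std::round` rounds halfway cases away from zero, whereas `std::rint` follows the current floating-point rounding mode, which is round-to-nearest-even by default. A minimal sketch using only the standard library (the struct and function names are hypothetical, and the default rounding mode is assumed):

```cpp
#include <cmath>

// Holds both roundings of the same input for side-by-side comparison.
struct RoundingPair {
  double round_result;  // std::round: ties away from zero
  double rint_result;   // std::rint: ties to even under the default mode
};

inline RoundingPair compare_rounding(double x) {
  return RoundingPair{std::round(x), std::rint(x)};
}
```

For 2.5 the two results differ (3 versus 2), while for 3.5 both give 4.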
Rasmus Munk Larsen	73a8d572f5	Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781.	2019-12-12 19:34:25 +00:00
Srinivas Vasudevan	88062b7fed	Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one.	2019-12-12 01:56:54 +00:00
Joel Holdsworth	3c0ef9f394	IO: Fixed printing of char and unsigned char matrices	2019-12-11 18:22:57 +00:00
Joel Holdsworth	e87af0ed37	Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t	2019-12-11 18:22:57 +00:00
Gael Guennebaud	15b3bcfca0	Bug 1786: fix compilation with MSVC	2019-12-11 16:16:38 +01:00
Deven Desai	c49f0d851a	Fix for HIP breakage detected on 191210. The following commit introduces compile errors when running eigen with hipcc: `2918f85ba9`. hipcc errors out because it requires the device attribute on the methods within the TensorBlockV2ResourceRequirements struct introduced by the commit above. The fix is to add the device attribute to those methods	2019-12-10 22:14:05 +00:00
Gael Guennebaud	8fbe0e4699	Update old links to bitbucket to point to gitlab.com	2019-12-04 10:57:07 +01:00
Rasmus Larsen	cacf433975	Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754) Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda. Approved-by: Deven Desai <deven.desai.amd@gmail.com>	2019-12-04 00:45:42 +00:00
Gael Guennebaud	6358599ecb	Fix QuaternionBase::cast for quaternion map and wrapper.	2019-12-03 14:51:14 +01:00
Gael Guennebaud	7745f69013	bug #1776 : fix vector-wise STL iterator's operator-> using a proxy as pointer type. This changeset fixes also the value_type definition.	2019-12-03 14:40:15 +01:00
Rasmus Munk Larsen	66f07efeae	Revert the specialization for scalar_logistic_op<float> introduced in: `77b447c24e` While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.	2019-12-02 17:00:58 -08:00
Rasmus Larsen	3b15373bb3	Merged in ezhulenev/eigen-02 (pull request PR-767) Fix shadow warnings in AlignedBox and SparseBlock	2019-12-02 18:23:11 +00:00
Deven Desai	312c8e77ff	Fix for the HIP build+test errors. Recent changes have introduced the following build error when compiling with HIPCC --------- unsupported/test/../../Eigen/src/Core/GenericPacketMath.h:254:58: error: 'ldexp': no overloaded function has restriction specifiers that are compatible with the ambient context 'pldexp' --------- The fix for the error is to pick the math function(s) from the global namespace (where they are declared as device functions in the HIP header files) when compiling with HIPCC.	2019-12-02 17:41:32 +00:00
Mehdi Goli	00f32752f7	[SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL * Unifying load to private register for tall/skinny no shared * Unifying load to vector tile for tensor-vector/vector-tensor operation * Removing all the LHS/RHS class for extracting data from global * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when the local memory version is used. * Modifying benchmark to accept custom Reduction Sizes * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host * Adding Test for SYCL * Modifying SYCL CMake	2019-11-28 10:08:54 +00:00
Eugene Zhulenev	82a47338df	Fix shadow warnings in AlignedBox and SparseBlock	2019-11-27 16:22:27 -08:00
Rasmus Munk Larsen	ea51a9eace	Add missing EIGEN_DEVICE_FUNC attribute to template specializations for pexp to fix GPU build.	2019-11-27 10:17:09 -08:00
Rasmus Munk Larsen	5a3ebda36b	Fix warning due to missing cast for exponent arguments for std::frexp and std::ldexp.	2019-11-26 16:18:29 -08:00
Joel Holdsworth	86eb41f1cb	SparseRef: Fixed alignment warning on ARM GCC	2019-11-07 14:34:06 +00:00
Anshul Jaiswal	c1a67cb5af	Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.	2019-11-06 22:40:38 +00:00
Rasmus Munk Larsen	cc3d0e6a40	Add EIGEN_HAS_INTRINSIC_INT128 macro Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.	2019-11-06 14:24:33 -08:00
Rasmus Munk Larsen	ee404667e2	Rollback of PR-746 and partial rollback of `668ab3fc47` . std::array is still not supported in CUDA device code on Windows.	2019-11-05 17:17:58 -08:00
Hans Johnson	e78ed6e7f3	COMP: Simplify install commands for Eigen. Confirm that install directory is identical before and after this simplifying patch.
```bash
hg clone <<Eigen>>
mkdir eigen-bld
cd eigen-bld
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/bef
make install
find /tmp/pre_eigen_modernize >/tmp/bef
# Apply this patch
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/aft
make install
find /tmp/post_eigen_modernize |sed 's/post_e/pre_e/g' >/tmp/aft
diff /tmp/bef /tmp/aft
```	2019-11-17 15:14:25 -06:00
Gael Guennebaud	e5778b87b9	Fix duplicate symbol linking error.	2019-11-20 17:23:19 +01:00
Hans Johnson	6fb3e5f176	STYLE: Remove CMake-language block-end command arguments Ancient versions of CMake required else(), endif(), and similar block termination commands to have arguments matching the command starting the block. This is no longer the preferred style.	2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen	f1e8307308	1. Fix a bug in psqrt and make it return 0 for +inf arguments. 2. Simplify handling of special cases by taking advantage of the fact that the builtin vrsqrt approximation handles negative, zero and +inf arguments correctly. This speeds up the SSE and AVX implementations by ~20%. 3. Make the Newton-Raphson formula used for rsqrt more numerically robust: Before: y = y * (1.5 - x/2 * y^2) After: y = y * (1.5 - y * (x/2) * y) Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision. 4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration. Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o	2019-11-15 17:09:46 -08:00
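The effect of the reordering in item 3 can be seen in a scalar sketch. The two functions below are hypothetical illustrations of one refinement step each, written in single precision and assuming IEEE-754 float arithmetic without extended intermediate precision; they are not Eigen's packet code.

```cpp
#include <cmath>

// One Newton-Raphson refinement step for y ~ 1/sqrt(x).

// "Before" ordering: y*y is formed first and can overflow for tiny x,
// because y is then very large.
inline float rsqrt_step_before(float x, float y) {
  return y * (1.5f - (0.5f * x) * (y * y));
}

// "After" ordering: y * (x/2) is formed first; since x*y*y ~= 1, the
// intermediate products stay in range even for denormal x.
inline float rsqrt_step_after(float x, float y) {
  return y * (1.5f - (y * (0.5f * x)) * y);
}
```

With the denormal input x = 1e-40f and the estimate y = 1e20f, the "before" form overflows when squaring y and returns a non-finite value, while the "after" form returns a value close to 1e20.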
Gael Guennebaud	2cb2915f90	bug #1744 : fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the latter was disabled on some compilers	2019-11-15 13:39:51 +01:00
Gael Guennebaud	8af045a287	bug #1774 : fix VectorwiseOp::begin()/end() return types regarding constness.	2019-11-14 11:45:52 +01:00
Sakshi Goynar	75b4c0a3e0	PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag	2019-10-31 16:09:16 -07:00
Gael Guennebaud	8496f86f84	Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices.	2019-11-13 21:16:53 +01:00
Gael Guennebaud	71aa53dd6d	Disable AVX on broken xcode versions. See PR 748. Patch adapted from Hans Johnson's PR 748.	2019-11-12 11:40:38 +01:00
Eugene Zhulenev	e7ed4bd388	Remove internal::smart_copy and replace with std::copy	2019-10-29 11:25:24 -07:00
Gael Guennebaud	e7d8ba747c	bug #1752 : make is_convertible equivalent to the std c++11 equivalent and fallback to std::is_convertible when c++11 is enabled.	2019-10-10 17:41:47 +02:00
Gael Guennebaud	196de2efe3	Explicitly bypass resize and memmoves when there is already the exact right number of elements available.	2019-10-08 21:44:33 +02:00
Gael Guennebaud	d1def335dc	fix one more possible conflicts with real/imag	2019-10-08 16:19:10 +02:00
Gael Guennebaud	87427d2eaa	PR 719: fix real/imag namespace conflict	2019-10-08 09:15:17 +02:00
Rasmus Munk Larsen	fab4e3a753	Address comments on Chebyshev evaluation code: 1. Use pmadd when possible. 2. Add casts to avoid c++03 warnings.	2019-10-02 12:48:17 -07:00
Rasmus Munk Larsen	bd0fac456f	Prevent infinite loop in the nvcc compiler while unrolling the recurrent templates for Chebyshev polynomial evaluation.	2019-10-01 13:15:30 -07:00
Gael Guennebaud	9549ba8313	Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real)	2019-10-01 12:54:25 +02:00
Gael Guennebaud	c8b2c603b0	Fix speed issue with SimplicialLDLT for complexes: the diagonal is real!	2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen	13ef08e5ac	Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.	2019-09-27 13:56:04 -07:00
Eugene Zhulenev	0c845e28c9	Fix erf in c++03	2019-09-25 11:31:45 -07:00
Deven Desai	5e186b1987	Fix for the HIP build+test errors. The errors were introduced by this commit: `d38e6fbc27`. After the above-mentioned commit, some of the tests started failing with the following error:
```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
  return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
                     ^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
      ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                      ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                                ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
3 errors generated.
```
This PR fixes the compile error by removing the "old" implementation for "erf" (assuming that the "new" implementation is what we want going forward; from a GPU point of view, both implementations are the same). This PR also fixes what seems like a cut-n-paste error in the aforementioned commit	2019-09-25 15:39:13 +00:00
Rasmus Larsen	d38e6fbc27	Merged in rmlarsen/eigen (pull request PR-704) Add generic PacketMath implementation of the Error Function (erf).	2019-09-24 23:40:29 +00:00
Eugene Zhulenev	ef9dfee7bd	Tensor block evaluation V2 support for unary/binary/broadcasting	2019-09-24 12:52:45 -07:00
Christoph Hertzberg	efd9867ff0	bug #1746 : Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types	2019-09-24 11:09:58 +02:00
Rasmus Munk Larsen	6de5ed08d8	Add generic PacketMath implementation of the Error Function (erf).	2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen	28b6786498	Fix build on setups without AVX512DQ.	2019-09-19 12:36:09 -07:00
Deven Desai	e02d429637	Fix for the HIP build+test errors. The errors were introduced by this commit: `6e215cf109`. The fix is switching to using ::<math_func> instead of std::<math_func> when compiling for GPU	2019-09-18 18:44:20 +00:00
Srinivas Vasudevan	6e215cf109	Add Bessel functions to SpecialFunctions. - Split SpecialFunctions files in to a separate BesselFunctions file. In particular add: - Modified bessel functions of the second kind k0, k1, k0e, k1e - Bessel functions of the first kind j0, j1 - Bessel functions of the second kind y0, y1	2019-09-14 12:16:47 -04:00
Srinivas Vasudevan	facdec5aa7	Add packetized versions of i0e and i1e special functions. - In particular refactor the i0e and i1e code so scalar and vectorized path share code. - Move chebevl to GenericPacketMathFunctions. A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                   Time(ns)   CPU(ns)  Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1           57.3      57.3    10000000
BM_eigen_i0e_double/8            398       398     1748554
BM_eigen_i0e_double/64          3184      3184      218961
BM_eigen_i0e_double/512        25579     25579       27330
BM_eigen_i0e_double/4k        205043    205042        3418
BM_eigen_i0e_double/32k      1646038   1646176         422
BM_eigen_i0e_double/256k    13180959  13182613          53
BM_eigen_i0e_double/1M      52684617  52706132          10
BM_eigen_i0e_float/1            28.4      28.4    24636711
BM_eigen_i0e_float/8            75.7      75.7     9207634
BM_eigen_i0e_float/64            512       512     1000000
BM_eigen_i0e_float/512          4194      4194      166359
BM_eigen_i0e_float/4k          32756     32761       21373
BM_eigen_i0e_float/32k        261133    261153        2678
BM_eigen_i0e_float/256k      2087938   2088231         333
BM_eigen_i0e_float/1M        8380409   8381234          84
BM_eigen_i1e_double/1           56.3      56.3    10000000
BM_eigen_i1e_double/8            397       397     1772376
BM_eigen_i1e_double/64          3114      3115      223881
BM_eigen_i1e_double/512        25358     25361       27761
BM_eigen_i1e_double/4k        203543    203593        3462
BM_eigen_i1e_double/32k      1613649   1613803         428
BM_eigen_i1e_double/256k    12910625  12910374          54
BM_eigen_i1e_double/1M      51723824  51723991          10
BM_eigen_i1e_float/1            28.3      28.3    24683049
BM_eigen_i1e_float/8            74.8      74.9     9366216
BM_eigen_i1e_float/64            505       505     1000000
BM_eigen_i1e_float/512          4068      4068      171690
BM_eigen_i1e_float/4k          31803     31806       21948
BM_eigen_i1e_float/32k        253637    253692        2763
BM_eigen_i1e_float/256k      2019711   2019918         346
BM_eigen_i1e_float/1M        8238681   8238713          86

After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                   Time(ns)   CPU(ns)  Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1           15.8      15.8    44097476
BM_eigen_i0e_double/8           99.3      99.3     7014884
BM_eigen_i0e_double/64           777       777      886612
BM_eigen_i0e_double/512         6180      6181      100000
BM_eigen_i0e_double/4k         48136     48140       14678
BM_eigen_i0e_double/32k       385936    385943        1801
BM_eigen_i0e_double/256k     3293324   3293551         228
BM_eigen_i0e_double/1M      12423600  12424458          57
BM_eigen_i0e_float/1            16.3      16.3    43038042
BM_eigen_i0e_float/8            30.1      30.1    23456931
BM_eigen_i0e_float/64            169       169     4132875
BM_eigen_i0e_float/512          1338      1339      516860
BM_eigen_i0e_float/4k          10191     10191       68513
BM_eigen_i0e_float/32k         81338     81337        8531
BM_eigen_i0e_float/256k       651807    651984        1000
BM_eigen_i0e_float/1M        2633821   2634187         268
BM_eigen_i1e_double/1           16.2      16.2    42352499
BM_eigen_i1e_double/8            110       110     6316524
BM_eigen_i1e_double/64           822       822      851065
BM_eigen_i1e_double/512         6480      6481      100000
BM_eigen_i1e_double/4k         51843     51843       10000
BM_eigen_i1e_double/32k       414854    414852        1680
BM_eigen_i1e_double/256k     3320001   3320568         212
BM_eigen_i1e_double/1M      13442795  13442391          53
BM_eigen_i1e_float/1            17.6      17.6    41025735
BM_eigen_i1e_float/8            35.5      35.5    19597891
BM_eigen_i1e_float/64            240       240     2924237
BM_eigen_i1e_float/512          1424      1424      485953
BM_eigen_i1e_float/4k          10722     10723       65162
BM_eigen_i1e_float/32k         86286     86297        8048
BM_eigen_i1e_float/256k       691821    691868        1000
BM_eigen_i1e_float/1M        2777336   2777747         256

This shows anywhere from a 50% to 75% improvement on these operations. I've also benchmarked without any of these flags turned on, and got similar performance to before (if not better). Also tested packetmath.cpp + special_functions to ensure no regressions.	2019-09-11 18:34:02 -07:00
Srinivas Vasudevan	b052ec6992	Merged eigen/eigen into default	2019-09-11 18:01:54 -07:00
Deven Desai	cdb377d0cb	Fix for the HIP build+test errors introduced by the ndtri support. The fixes needed are * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs) * switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC * removing an errant "j" from a testcase (don't know how that made it in to begin with!)	2019-09-06 16:03:49 +00:00
Gael Guennebaud	747c6a51ca	bug #1736 : fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)	2019-09-11 15:40:07 +02:00
Gael Guennebaud	031f17117d	bug #1741 : fix self-adjointmatrix, triangularmatrix, and triangular^1*matrix with a destination having a non-trivial inner-stride	2019-09-11 15:04:25 +02:00
Gael Guennebaud	459b2bcc08	Fix compilation of BLAS backend and frontend	2019-09-11 10:02:37 +02:00
Gael Guennebaud	afa8d13532	Fix some implicit literal to Scalar conversions in SparseCore	2019-09-11 00:03:07 +02:00
Gael Guennebaud	c06e6fd115	bug #1741 : fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride	2019-09-10 23:29:52 +02:00
Gael Guennebaud	ea0d5dc956	bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1	2019-09-10 16:25:24 +02:00
Gael Guennebaud	17226100c5	Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions. Another solution would have been to make pshift* fully generic template functions with partial specialization which is always a mess in c++03.	2019-09-06 09:26:04 +02:00
Gael Guennebaud	55b63d4ea3	Fix compilation without vector engine available (e.g., x86 with SSE disabled): -> ppolevl is required by ndtri even for the scalar path	2019-09-05 18:16:46 +02:00
Srinivas Vasudevan	a9cf823db7	Merged eigen/eigen	2019-09-04 23:50:52 -04:00
Gael Guennebaud	e6c183f8fd	Fix doc issues regarding ndtri	2019-09-04 23:00:21 +02:00
Gael Guennebaud	5702a57926	Fix possible warning regarding strict equality comparisons	2019-09-04 22:57:04 +02:00
Srinivas Vasudevan	99036a3615	Merging from eigen/eigen.	2019-09-03 15:34:47 -04:00
Gael Guennebaud	8e7e3d9bc8	Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)	2019-09-03 13:09:03 +02:00
Srinivas Vasudevan	e38dd48a27	PR 681: Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Eugene Zhulenev	f59bed7a13	Change typedefs from private to protected to fix MSVC compilation	2019-09-03 19:11:36 -07:00
Srinivas Vasudevan	18ceb3413d	Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen	d55d392e7b	Fix bugs in log1p and expm1 where repeated using statements would clobber each other. Add specializations for complex types since std::log1p and std::expm1 do not support complex.	2019-08-08 16:27:32 -07:00
Gael Guennebaud	15f3d9d272	More colamd cleanup: - Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc. - Fix signed/unsigned warning - move some ugly free functions as member functions	2019-09-03 00:50:51 +02:00
Anshul Jaiswal	a4d1a6cd7d	Eigen_Colamd.h updated to replace constexpr with consts and enums.	2019-08-17 05:29:23 +00:00
Anshul Jaiswal	283558face	Ordering.h edited to fix dependencies on Eigen_Colamd.h	2019-08-15 20:21:56 +00:00
Anshul Jaiswal	39f30923c2	Eigen_Colamd.h edited replacing macros with constexprs and functions.	2019-08-15 20:15:19 +00:00
Anshul Jaiswal	0a6b553ecf	Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions	2019-07-21 04:53:31 +00:00
Michael Grupp	6e17491f45	Fix typo in Umeyama method documentation	2019-07-17 11:20:41 +00:00
Christoph Hertzberg	e0f5a2a456	Remove {} accidentally added in previous commit	2019-07-18 20:22:17 +02:00
Christoph Hertzberg	ea6d7eb32f	Move variadic constructors outside `#ifndef EIGEN_PARSED_BY_DOXYGEN` block, to make it actually appear in the generated documentation.	2019-07-12 19:46:37 +02:00
Christoph Hertzberg	c2671e5315	Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING Also, document LinSpaced only where it is implemented	2019-07-12 19:43:32 +02:00
Rasmus Munk Larsen	23b958818e	Fix compiler for unsigned integers.	2019-07-09 11:18:25 -07:00
Anshul Jaiswal	fab51d133e	Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD to prevent conflicts with other libraries / code.	2019-06-08 21:09:06 +00:00
Rasmus Munk Larsen	f6c51d9209	Fix missing header inclusion and colliding definitions for half type casting, which broke build with -march=native on Haswell/Skylake.	2019-08-30 14:03:29 -07:00
Rasmus Munk Larsen	1187bb65ad	Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.	2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen	9aba527405	Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.	2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen	b021cdea6d	Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.	2019-08-27 11:30:31 -07:00
Christoph Hertzberg	2fb24384c9	Merged in jaopaulolc/eigen (pull request PR-679) Fixes for Altivec/VSX and compilation with clang on PowerPC	2019-08-22 15:57:33 +00:00
João P. L. de Carvalho	5ac7984ffa	Fix debug macros in p{load,store}u	2019-08-14 11:59:12 -06:00
João P. L. de Carvalho	db9147ae40	Add missing pcmp_XX methods for double/Packet2d This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.	2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen	a3298b22ec	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. Depending on instruction set, significant speedups are observed for the vectorized path: log1p wall time is reduced 60-93% (2.5x - 15x speedup) expm1 wall time is reduced 0-85% (1x - 7x speedup) The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly. Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM	2019-08-12 13:53:28 -07:00
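A scalar sketch of Kahan-style formulas for log1p and expm1, assuming IEEE-754 double arithmetic without extended intermediate precision or fast-math optimizations. The function names are hypothetical; Eigen's vectorized versions operate on packets and are not reproduced here.

```cpp
#include <cmath>

// log(1 + x) computed as x * log(u) / (u - 1) with u = 1 + x.
// The rounding error in u cancels between log(u) and (u - 1).
inline double log1p_kahan(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;            // x is too small to change 1.0
  const double log_u = std::log(u);
  if (u == log_u) return x;          // only true for u = +inf: return +inf
  return x * (log_u / (u - 1.0));
}

// exp(x) - 1 computed as (u - 1) * x / log(u) with u = exp(x).
inline double expm1_kahan(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;            // exp(x) rounded to exactly 1
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;      // exp(x) underflowed to 0
  const double log_u = std::log(u);
  if (u == log_u) return u;          // only true for u = +inf
  return um1 * (x / log_u);
}
```

The explicit checks for infinite arguments correspond to the extra branch that the message says slows down the scalar path.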
João P. L. de Carvalho	787f6ef025	Fix packed load/store for PowerPC's VSX. The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts. For double/Packet2d vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f. Silencing some weird warnings with throw by some GCC versions. Such warnings are not thrown by Clang.	2019-08-09 16:02:55 -06:00
João P. L. de Carvalho	4d29aa0294	Fix offset argument of ploadu/pstoreu for Altivec. If no offset is given, then it should be zero. Also passes full address to vec_vsx_ld/st builtins. Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT. Removes unnecessary casts.	2019-08-09 15:59:26 -06:00
João P. L. de Carvalho	66d073c38e	bug #1718 : Add cast to successfully compile with clang on PowerPC Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h	2019-08-09 15:56:26 -06:00
Justin Carpentier	ffaf658ecd	PR 655: Fix missing Eigen namespace in Macros	2019-06-05 09:51:59 +02:00
Mehdi Goli	0b24e1cb5c	[SYCL] Adding the SYCL memory model. The SYCL memory model provides : * an interface for SYCL buffers to behave as a non-dereferenceable pointer * an interface for placeholder accessor to behave like a pointer on both host and device	2019-07-01 16:02:30 +01:00
Rasmus Munk Larsen	8053eeb51e	Fix CUDA compilation error for pselect<half>.	2019-06-28 12:07:29 -07:00
Mehdi Goli	16a56b2ddd	[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization	2019-06-27 12:25:09 +01:00
Deven Desai	ba506d5bd2	fix for a ROCm/HIP specific compile error introduced by a recent commit.	2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen	c9394d7a0e	Remove extra "one" in comment.	2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen	b8f8dac4eb	Update comment as suggested by tra@google.com.	2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen	e5e63c2cad	Fix grammar.	2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen	302a404b7e	Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause.	2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen	b5237f53b1	Fix CUDA build on Mac.	2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen	988f24b730	Various fixes for packet ops. 1. Fix buggy pcmp_eq and unit test for half types. 2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types. 3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.	2019-06-20 11:47:49 -07:00
Christoph Hertzberg	e0be7f30e1	bug #1724 : Mask buggy warnings with g++-7 (grafted from `427f2f66d6` )	2019-06-14 14:57:46 +02:00
Rasmus Munk Larsen	6d432eae5d	Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.	2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen	f715f6e816	Add workaround for choosing the right include files with FP16C support with clang.	2019-06-05 13:36:37 -07:00
Rasmus Munk Larsen	b08527b0c1	Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.	2019-05-31 15:26:06 -07:00
Deven Desai	2c38930161	fix for HIP build errors that were introduced by a commit earlier this week	2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves	56bc4974fb	GEMV: remove double declaration of constant. That was hurting users with compilers that would object to proceed with that: """ ./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow] LhsPacketSize = Traits::LhsPacketSize, ^ ./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here static const Index LhsPacketSize = Traits::LhsPacketSize; """	2019-05-23 14:50:29 -07:00
Christoph Hertzberg	ac21a08c13	Cast Index to RealScalar This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types). Reported by Peter Anderson-Sprecher via eMail	2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen	3eb5ad0ed0	Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177 and are part of LLVM 3.8.0.	2019-05-20 17:19:20 -07:00
Rasmus Larsen	e92486b8c3	Merged in rmlarsen/eigen (pull request PR-643) Make Eigen build with cuda 10 and clang. Approved-by: Justin Lebar <justin.lebar@gmail.com>	2019-05-20 17:02:39 +00:00
Gael Guennebaud	cc7ecbb124	Merged in scramsby/eigen (pull request PR-646) Eigen: Fix MSVC C++17 language standard detection logic	2019-05-20 07:19:10 +00:00

6445 Commits