Gael Guennebaud
06eb24cf4d
Introduce gpu_assert for assertions in device code, and disable them with clang-cuda.
2018-07-13 16:04:27 +02:00
Gael Guennebaud
5fd03ddbfb
Make EIGEN_TEST_CUDA_CLANG more friendly with OSX
2018-07-13 16:03:14 +02:00
Gael Guennebaud
86d9c0255c
Forward declaring std::array does not work with all std libs, so let's just include <array>
2018-07-13 13:06:44 +02:00
David Hyde
d908afe35f
bug #1558 : fix a corner case in MINRES when both v_new and w_new vanish.
2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379
Reduce number of allocations in TensorContractionThreadPool.
2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746
bug #1569 : fix Tensor<half>::mean() on AVX with respective unit test.
2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304
Add MIPS changes missing from previous merge.
2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739
Assert that no output kernel is defined for GPU contraction
2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85
Disable type traits for GCC < 5.1.0
2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce
Specify default output kernel for TensorContractionOp
2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f
Add regression for bugs #1573 and #1575
2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88
bug #1432 : fix conservativeResize for non-relocatable scalar types. For those we need to bypass the realloc routines and fall back to allocate-new, copy, delete. The remaining problem is that we have no mechanism to accurately determine whether a type is relocatable, so for now let's be super conservative and use either RequireInitialization or std::is_trivially_copyable.
2018-07-18 23:33:07 +02:00
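The fallback described in the bug #1432 message can be sketched roughly as follows. This is a minimal illustration, not Eigen's implementation: `resize_buffer`, `resize_impl` and `SelfRef` are hypothetical names, the buffer is assumed to come from `std::malloc`, and only `std::is_trivially_copyable` is used as the (conservative) relocatability test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Illustrative non-relocatable type (hypothetical): it stores a pointer to
// itself, so a raw byte move such as realloc would leave `self` dangling.
struct SelfRef {
  int value;
  SelfRef* self;
  explicit SelfRef(int v = 0) : value(v), self(this) {}
  SelfRef(const SelfRef& other) : value(other.value), self(this) {}
  SelfRef& operator=(const SelfRef& other) { value = other.value; return *this; }
};

// Trivially copyable types: moving the bytes is safe, so realloc is fine.
// Elements beyond the old size are left uninitialized.
template <typename T>
T* resize_impl(T* old_ptr, std::size_t /*old_size*/, std::size_t new_size, std::true_type) {
  void* p = std::realloc(old_ptr, new_size * sizeof(T));
  if (p == nullptr) throw std::bad_alloc();
  return static_cast<T*>(p);
}

// Everything else: allocate a new block, copy-construct, then destroy and free the old one.
template <typename T>
T* resize_impl(T* old_ptr, std::size_t old_size, std::size_t new_size, std::false_type) {
  T* new_ptr = static_cast<T*>(std::malloc(new_size * sizeof(T)));
  if (new_ptr == nullptr) throw std::bad_alloc();
  const std::size_t kept = old_size < new_size ? old_size : new_size;
  std::size_t built = 0;
  try {
    for (; built < kept; ++built) ::new (static_cast<void*>(new_ptr + built)) T(old_ptr[built]);
    for (; built < new_size; ++built) ::new (static_cast<void*>(new_ptr + built)) T();
  } catch (...) {
    // Roll back: destroy what was built and leave the old buffer untouched.
    while (built > 0) new_ptr[--built].~T();
    std::free(new_ptr);
    throw;
  }
  for (std::size_t i = 0; i < old_size; ++i) old_ptr[i].~T();
  std::free(old_ptr);
  return new_ptr;
}

// Dispatch on the conservative relocatability test.
template <typename T>
T* resize_buffer(T* old_ptr, std::size_t old_size, std::size_t new_size) {
  assert(new_size > 0);  // realloc with size 0 is implementation-defined
  return resize_impl(old_ptr, old_size, new_size, std::is_trivially_copyable<T>());
}
```

With this sketch, resizing a buffer of `int` goes through `realloc`, while resizing a buffer of `SelfRef` copy-constructs each element so that every `self` pointer refers to its new location.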
Gael Guennebaud
053ed97c72
Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests.
2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725
bug #1575 : fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted.
2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9
More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA
2018-07-18 13:51:36 +02:00
Alexey Frunze
3875fb05aa
Add support for MIPS SIMD (MSA)
2018-07-06 16:04:30 -07:00
Gael Guennebaud
44ea5f7623
Add unit test for -Tensor<complex> on GPU
2018-07-12 17:19:38 +02:00
Gael Guennebaud
12e1ebb68b
Remove local Index typedef from unit-tests
2018-07-12 17:16:40 +02:00
Gael Guennebaud
63185be8b2
Disable eigenvalues test for clang-cuda
2018-07-12 17:03:14 +02:00
Gael Guennebaud
bec013b2c9
fix unused warning
2018-07-12 17:02:18 +02:00
Gael Guennebaud
5c73c9223a
Fix shadowing typedefs
2018-07-12 17:01:07 +02:00
Gael Guennebaud
98728312c8
Fix compilation regarding std::array
2018-07-12 17:00:37 +02:00
Gael Guennebaud
eb3d8f68bb
fix unused warning
2018-07-12 16:59:47 +02:00
Gael Guennebaud
006e18e52b
Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff to more appropriate places (Macros.h), and alignment/vectorization logic is now in util/ConfigureVectorization.h
2018-07-12 16:57:41 +02:00
Thales Sabino
9a6a43319f
Fix cxx11_tensor_fft not building on Windows.
The type used in Eigen::DSizes needs to be at least 8 bytes long. Internally Tensor tries to convert this to an __int64 on Windows and this fails to build. On Linux, long and long long are both 8 byte integer types.
Changing from "long long" to "std::int64_t".
2018-07-12 11:20:59 +01:00
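The point of the "long long" to "std::int64_t" change can be illustrated with a small standalone sketch. It does not use Eigen: `DimIndex` and `total_size` are hypothetical names, and the 3-D shape is only an example. The fixed-width alias states the 64-bit requirement explicitly instead of relying on what `long` or `long long` happen to be on a given platform (`long` is 32 bits wide on 64-bit Windows).

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical alias for illustration; not a name taken from Eigen.
typedef std::int64_t DimIndex;

// Fails at compile time on any platform where the alias is not 64 bits wide.
static_assert(sizeof(DimIndex) * CHAR_BIT == 64, "DimIndex must be exactly 64 bits wide");

// Total number of coefficients described by a 3-D shape.
inline DimIndex total_size(const std::array<DimIndex, 3>& dims) {
  assert(dims[0] >= 0 && dims[1] >= 0 && dims[2] >= 0);
  return dims[0] * dims[1] * dims[2];
}
```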
Gael Guennebaud
b347eb0b1c
Fix doc
2018-07-12 11:56:18 +02:00
Mark D Ryan
e79c5149bf
Fix AVX512 implementations of psqrt
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issues requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packet8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
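The behaviour this fix restores can be expressed as a small scalar check. The sketch below is an assumption-laden illustration, not Eigen's test: it does not use AVX512 intrinsics, and `has_ieee_sqrt_semantics`, `reference_sqrt` and `sqrt_negative_to_zero` are hypothetical names. The last function only mimics the symptom described in the commit message (returning -0.0 for negative input).

```cpp
#include <cassert>
#include <cmath>

typedef float (*SqrtFn)(float);

// Checks the three cases that matter here: a negative input must give NaN,
// +0 must give +0, and an exactly representable square must give its root.
inline bool has_ieee_sqrt_semantics(SqrtFn f) {
  const float neg = f(-4.0f);
  const float zero = f(0.0f);
  const float four = f(4.0f);
  return std::isnan(neg) && zero == 0.0f && !std::signbit(zero) && four == 2.0f;
}

// Reference behaviour from the standard library.
inline float reference_sqrt(float x) { return std::sqrt(x); }

// Stand-in that reproduces the reported symptom: -0.0 instead of NaN.
inline float sqrt_negative_to_zero(float x) { return x < 0.0f ? -0.0f : std::sqrt(x); }
```

Passing `reference_sqrt` to the check succeeds, while `sqrt_negative_to_zero` fails it because its result for a negative input is not NaN. The check assumes the code is compiled without options such as `-ffast-math` that relax NaN handling.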
Yuefeng Zhou
1eff6cf8a7
Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances.
2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e
Fix implicit conversion from 0.0 to scalar
2018-02-16 22:26:01 +04:00
Gael Guennebaud
937ad18221
add unit test for SimplicialCholesky and Boost multiprec.
2018-02-16 22:25:11 +04:00
Julian Kent
6d451cf2b6
Add missing consts for rows and cols functions in SparseLU
2018-02-10 13:44:05 +01:00
Daniele E. Domenichelli
a12b8a8c75
FindEigen3: Set Eigen3_FOUND variable
2018-07-11 16:31:50 +02:00
Gael Guennebaud
8bdb214fd0
remove double ;;
2018-07-12 11:17:53 +02:00
Gael Guennebaud
a9060378d3
bug #1570 : fix warning
2018-07-12 11:07:09 +02:00
Gael Guennebaud
6cd6551b26
Add deprecated header files for TensorFlow
2018-07-12 10:50:53 +02:00
Gael Guennebaud
da0c604078
Merged in deven-amd/eigen (pull request PR-402)
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
a4ea611ca7
Remove useless specialization thanks to is_convertible being more robust.
2018-07-12 09:59:44 +02:00
Gael Guennebaud
8a40dda5a6
Add some basic unit-tests
2018-07-12 09:59:00 +02:00
Gael Guennebaud
8ef267ccbd
spellcheck
2018-07-12 09:58:29 +02:00
Gael Guennebaud
21cf4a1a8b
Make is_convertible more robust and conformant to std::is_convertible
2018-07-12 09:57:19 +02:00
Gael Guennebaud
8a5955a052
Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.
2018-07-11 17:16:50 +02:00
Gael Guennebaud
d193cc87f4
Fix regression in 9357838f94
2018-07-11 17:09:23 +02:00
Gael Guennebaud
fb33687736
Fix double ;;
2018-07-11 17:08:30 +02:00
Deven Desai
876f392c39
Updates corresponding to the latest round of PR feedback
The major changes are
1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
1fe0b74904
deleting hip specific files that are no longer required
2018-07-11 09:28:44 -04:00
Deven Desai
dec47a6493
renaming CUDA* to GPU* for some header files
2018-07-11 09:26:54 -04:00
Deven Desai
471cfe5ff7
renaming CUDA* to GPU* for some header files
2018-07-11 09:22:04 -04:00
Deven Desai
38807a2575
merging updates from upstream
2018-07-11 09:17:33 -04:00
Gael Guennebaud
f00d08cc0a
Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix.
2018-07-11 14:01:47 +02:00
Gael Guennebaud
1625476091
Add internal::is_identity compile-time helper
2018-07-11 14:00:24 +02:00