Gael Guennebaud
f159cf3d75
Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
...
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd
Properly set the number of registers for AVX512
2018-12-11 15:33:17 +01:00
Gael Guennebaud
7166496f70
bug #1643 : fix compilation issue with gcc and no optimizaion
2018-12-11 13:24:42 +01:00
Gael Guennebaud
0d90637838
enable spilling workaround on architectures with SSE/AVX
2018-12-10 23:22:44 +01:00
Gael Guennebaud
bff90bf270
workaround "may be used uninitialized" warning
2018-12-08 18:58:28 +01:00
Gael Guennebaud
81c27325ae
bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512
2018-12-08 14:27:48 +01:00
Gael Guennebaud
426bce7529
fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target
2018-12-08 09:44:21 +01:00
Gael Guennebaud
956678a4ef
bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.
2018-12-07 18:03:36 +01:00
Gael Guennebaud
7b6d0ff1f6
Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also has to turn the #warning regarding AVX512-FMA to a #error.
2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d
bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA
2018-12-07 10:01:09 +01:00
Gael Guennebaud
ae59a7652b
bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA
2018-12-07 09:23:28 +01:00
Gael Guennebaud
4e7746fe22
bug #1636 : fix gemm performance issue with gcc>=6 and no FMA
2018-12-07 09:15:46 +01:00
Gael Guennebaud
cbf2f4b7a0
AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only
2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5
Fix compilation with avx512f only, i.e., no AVX512DQ
2018-12-06 18:11:07 +01:00
Gael Guennebaud
c53eececb0
Implement AVX512 vectorization of std::complex<float/double>
2018-12-06 15:58:06 +01:00
Gael Guennebaud
3fba59ea59
temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!
2018-12-06 00:13:26 +01:00
Gael Guennebaud
1ac2695ef7
bug #1636 : fix compilation with some ABI versions.
2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen
47d8b741b2
#elif -> #else to fix GPU build.
2018-12-05 13:19:31 -08:00
Christoph Hertzberg
c1d356e8b4
bug #1635 : Use infinity from Numtraits instead of creating it manually.
2018-12-05 15:01:04 +01:00
Rasmus Munk Larsen
b57b31cce9
Merged in ezhulenev/eigen-01 (pull request PR-553)
...
Do not disable alignment with EIGEN_GPUCC
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-04 23:47:19 +00:00
Eugene Zhulenev
0bb15bb6d6
Update checks in ConfigureVectorization.h
2018-12-03 17:10:40 -08:00
Eugene Zhulenev
fd0fbfa9b5
Do not disable alignment with EIGEN_GPUCC
2018-12-03 15:54:10 -08:00
Christoph Hertzberg
919414b9fe
bug #785 : Make Cholesky decomposition work for empty matrices
2018-12-03 16:18:15 +01:00
Gael Guennebaud
0ea7ae7213
Add missing padd for Packet8i (it was implicitly generated by clang and gcc)
2018-11-30 21:52:25 +01:00
Gael Guennebaud
ab4df3e6ff
bug #1634 : remove double copy in move-ctor of non movable Matrix/Array
2018-11-30 21:25:51 +01:00
Gael Guennebaud
c785464430
Add packet sin and cos to Altivec/VSX and NEON
2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be
Several improvements regarding packet-bitwise operations:
...
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876
Add psin/pcos on AVX512 -> almost for free, at last!
2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a
Cleanup
2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303
Fix pandnot order in AVX512
2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6
Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)
2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d
Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)
2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7
same for pmax
2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6
pmin/pmax o SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions
2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b
Add missing SSE/AVX type-casting in AVX512 mode
2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375
bug #1630 : fix linspaced when requesting smaller packet size than default one.
2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35
Use explicit packet type in SSE/PacketMath pldexp
2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08
do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered used-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use only the 4 first).
2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24
bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.
2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21
Update pshiftleft to pass the shift as a true compile-time integer.
2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda
Unify SSE/AVX psin functions.
...
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
- add packet-float to packet-int type traits
- add packet float<->int reinterpret casts
- add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Benoit Jacob
7b1cb8a440
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008
Unify Altivec/VSX pexp(double) with default implementation
2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e
cleanup
2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10
Unify SSE and AVX pexp for double.
2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054
Unify NEON's pexp with generic implementation
2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc
Unify Altivec/VSX's pexp with generic implementation
2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5
Unify SSE and AVX implementation of pexp
2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47
Unify Altivec/VSX's plog with generic implementation, and enable it!
2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8
Unify NEON's plog with generic implementation
2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114
First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
...
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c
Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"
2018-11-26 14:14:07 +01:00
Gael Guennebaud
0836a715d6
bug #1611 : fix plog(0) on NEON
2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4
Fix typos
2018-11-23 22:22:14 +00:00
Gael Guennebaud
ccabdd88c9
Fix reserved usage of double __ in macro names
2018-11-23 16:01:47 +01:00
Gael Guennebaud
a7842daef2
Fix several uninitialized member from ctor
2018-11-23 15:10:28 +01:00
Gael Guennebaud
a476054879
bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup
2018-11-23 10:25:19 +01:00
Gael Guennebaud
4b2cebade8
Workaround weird MSVC bug
2018-11-21 15:53:37 +01:00
Deven Desai
e7e6809e6b
ROCm/HIP specfic fixes + updates
...
1. Eigen/src/Core/arch/GPU/Half.h
Updating the HIPCC implementation half so that it can declared as a __shared__ variable
2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h
introducing a EIGEN_USE_STD(func) macro that calls
- std::func be default
- ::func when eigen is being compiled with HIPCC
This change was requested in the previous HIP PR
(https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff )
3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h
Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC
4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
Gael Guennebaud
6a510fe69c
Make MaxPacketSize a true upper bound, even for fixed-size inputs
2018-11-16 11:25:32 +01:00
Mark D Ryan
670d56441c
PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
...
Commit aa110e681b
optimised the multiplication of small dyanmically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize
This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment
A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046
Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES
2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f
typo
2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a
help doxygen linking to DenseBase::NulllaryExpr
2018-11-14 14:42:59 +01:00
luz.paz"
f67b19a884
[PATCH 1/2] Misc. typos
...
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
CMakeLists.txt | 26 +++++++++----------
Eigen/src/Core/GenericPacketMath.h | 2 +-
Eigen/src/SparseLU/SparseLU.h | 2 +-
bench/bench_norm.cpp | 2 +-
doc/HiPerformance.dox | 2 +-
doc/QuickStartGuide.dox | 2 +-
.../Eigen/CXX11/src/Tensor/TensorChipping.h | 6 ++---
.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h | 2 +-
.../src/Tensor/TensorForwardDeclarations.h | 4 +--
.../src/Tensor/TensorGpuHipCudaDefines.h | 2 +-
.../Eigen/CXX11/src/Tensor/TensorReduction.h | 2 +-
.../CXX11/src/Tensor/TensorReductionGpu.h | 2 +-
.../test/cxx11_tensor_concatenation.cpp | 2 +-
unsupported/test/cxx11_tensor_executor.cpp | 2 +-
14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Rasmus Munk Larsen
77b447c24e
Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX.
2018-11-12 13:42:24 -08:00
Gael Guennebaud
0105146915
Fix warning in c++03
2018-11-10 09:11:38 +01:00
Gael Guennebaud
784a3f13cf
bug #1619 : fix mixing of const and non-const generic iterators
2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba
bug #1619 : make const and non-const iterators compatible
2018-11-09 16:49:19 +01:00
Gael Guennebaud
bd9a00718f
Let doxygen sees lastN
2018-11-09 11:35:48 +01:00
Gael Guennebaud
a368848473
Recent xcode versions does support EIGEN_HAS_STATIC_ARRAY_TEMPLATE
2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6
Fix max-size in indexed-view
2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff
Merged in glchaves/eigen (pull request PR-539)
...
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gustavo Lima Chaves
4ad359237a
Vectorize row-by-row gebp loop iterations on 16 packets as well
...
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Matthieu Vigne
8d7a73e48e
bug #1617 : Fix SolveTriangular.solveInPlace crashing for empty matrix.
...
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d
bug #1618 : Use different power-of-2 check to avoid MSVC warning
2018-11-01 13:23:19 +01:00
Christian von Schultz
4a40b3785d
Collapsed revision (based on pull request PR-325)
...
* Support compiling without IO streams
Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen
14054e217f
Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC. This breaks several target in the TensorFlow Cuda build, e.g.,
...
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code
4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
9caafca550
Merged in rmlarsen/eigen (pull request PR-532)
...
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672
Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
...
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
d8f285852b
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-18 16:55:02 -07:00
Gael Guennebaud
0f780bb0b4
Fix float-to-double warning
2018-10-16 09:19:45 +02:00
Gael Guennebaud
a39e0f7438
bug #1612 : fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>)
2018-10-16 01:04:25 +02:00
Gael Guennebaud
d2d570c116
Remove useless (and broken) resize
2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d
Iterative solvers: unify and fix handling of multiple rhs.
...
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
3a33db4de5
merge
2018-10-15 09:22:27 +02:00
Mark D Ryan
aa110e681b
PR 526: Speed up multiplication of small, dynamically sized matrices
...
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop. Using these types is likely to lead to little or no
vectorization. Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.
This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions. It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.
I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds. Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.
Disabling Packed8d on AVX512 builds:
Total Cases: 969
Better: 511
Worse: 85
Same: 373
Max Improvement: 169.00% (4 8 6)
Max Degradation: 36.50% (8 5 3)
Median Improvement: 35.46%
Median Degradation: 17.41%
Total FLOPs Improvement: 19.42%
Disabling Packet16f and Packed8f on AVX512 builds:
Total Cases: 969
Better: 658
Worse: 5
Same: 306
Max Improvement: 214.05% (8 6 5)
Max Degradation: 22.26% (16 2 1)
Median Improvement: 60.05%
Median Degradation: 13.32%
Total FLOPs Improvement: 59.58%
Disabling Packed8f on AVX builds:
Total Cases: 969
Better: 663
Worse: 96
Same: 210
Max Improvement: 155.29% (4 10 5)
Max Degradation: 35.12% (8 3 2)
Median Improvement: 34.28%
Median Degradation: 15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
Eugene Zhulenev
d9392f9e55
Fix code format
2018-11-02 14:51:35 -07:00
Eugene Zhulenev
118520f04a
Workaround nbcc+msvc compiler bug
2018-11-02 14:48:28 -07:00
Christoph Hertzberg
24dc076519
Explicitly convert 0 to Scalar for custom types
2018-10-12 10:22:19 +02:00
Gael Guennebaud
43633fbaba
Fix warning with AVX512f
2018-10-11 10:13:48 +02:00
Gael Guennebaud
97e2c808e9
Fix avx512 plog(NaN) to return NaN instead of +inf
2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5
Enable avx512 plog with clang
2018-10-11 10:12:21 +02:00
Gael Guennebaud
f0aa7e40fc
Fix regression in changeset 5335659c47
2018-10-10 23:47:30 +02:00
Gael Guennebaud
ce243ee45b
bug #520 : add diagmat +/- diagmat operators.
2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47
Merged in ezhulenev/eigen-02 (pull request PR-525)
...
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688
bug #632 : add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
...
They are rewritten as two compound assignment to by-pass hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 13:23:52 -07:00
Eugene Zhulenev
2bf1a31d81
Use void type if stl-style iterators are not supported
2018-10-10 10:31:40 -07:00
Rasmus Munk Larsen
e8918743c1
Merged in ezhulenev/eigen-01 (pull request PR-523)
...
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
c0ca8a9fa3
Compile time detection for unimplemented stl-style iterators
2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454
bug #65 : add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()
2018-10-09 23:36:50 +02:00
Gael Guennebaud
bfa2a81a50
Make redux_vec_unroller more flexible regarding packet-type
2018-10-09 23:30:41 +02:00
Christoph Hertzberg
f6359ad795
Small Doxygen fixes
2018-10-09 19:33:35 +02:00
Gael Guennebaud
7a882c05ab
Fix compilation on CUDA
2018-10-09 17:02:16 +02:00
Gael Guennebaud
e00487f7d2
bug #1603 : add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy.
2018-10-08 22:27:04 +02:00
Gael Guennebaud
649d4758a6
merge
2018-10-08 17:35:18 +02:00
Gael Guennebaud
774bb9d6f7
fix a doxygen issue
2018-10-08 09:30:15 +02:00
Gael Guennebaud
bcb7c66b53
Workaround gcc's alloc-size-larger-than= warning
2018-10-07 21:55:59 +02:00
Gael Guennebaud
6512c5e136
Implement a better workaround for GCC's bug #87544
2018-10-07 15:00:05 +02:00
Gael Guennebaud
409132bb81
Workaround gcc bug making it trigger an invalid warning
2018-10-07 09:23:15 +02:00
Gael Guennebaud
c6a1ab4036
Workaround MSVC compilation issue
2018-10-06 13:49:17 +02:00
Gael Guennebaud
e21766c6f5
Clarify doc of rowwise/colwise/vectorwise.
2018-10-05 23:12:09 +02:00
Gael Guennebaud
d92f004ab7
Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns
2018-10-05 23:11:21 +02:00
Gael Guennebaud
3e64b1fc86
Move iterators to internal, improve doc, make unit test c++03 friendly
2018-10-03 15:13:15 +02:00
Gael Guennebaud
2b2b4d0580
fix unused warning
2018-10-03 14:16:21 +02:00
Gael Guennebaud
5f26f57598
Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
...
This means that now AutoOrder is allowed, and it always return a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25
Add pointer-based iterator for direct-access expressions
2018-10-02 23:44:36 +02:00
Gael Guennebaud
8c38528168
Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index)
2018-10-02 14:03:26 +02:00
Gael Guennebaud
12487531ce
Add templated subVector<Vertical/Horizonal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns)
2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893
Use Index instead of ptrdiff_t or int, fix random-accessors.
2018-10-02 13:29:32 +02:00
Gael Guennebaud
de2efbc43c
bug #1605 : workaround ABI issue with vector types (aka __m128) versus scalar types (aka float)
2018-10-01 23:45:55 +02:00
Gael Guennebaud
b0c66adfb1
bug #231 : initial implementation of STL iterators for dense expressions
2018-10-01 23:21:37 +02:00
Christoph Hertzberg
564ca71e39
Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
...
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6
This commit contains the following (HIP specific) updates:
...
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
Changing "pass-by-reference" argument to be "pass-by-value" instead
(in a __global__ function decl).
"pass-by-reference" arguments to __global__ functions are unwise,
and will be explicitly flagged as errors by the newer versions of HIP.
- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
Changes introduced in recent commits breaks the HIP compile.
Adding EIGEN_DEVICE_FUNC attribute to some functions and
calling ::malloc/free instead of the corresponding std:: versions
to get the HIP compile working again
- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Change introduced a recent commit breaks the HIP compile
(link stage errors out due to failure to inline a function).
Disabling the recently introduced code (only for HIP compile), to get
the eigen nightly testing going again.
Will submit another PR once we have te proper fix.
- Eigen/src/Core/util/ConfigureVectorization.h
Enabling GPU VECTOR support when HIP compiler is in use
(for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Gael Guennebaud
af3ad4b513
oops, I've been too fast in previous copy/paste
2018-09-27 09:28:57 +02:00
Gael Guennebaud
24b163a877
#pragma GCC diagnostic push/pop is not supported prioro to gcc 4.6
2018-09-27 09:23:54 +02:00
Gael Guennebaud
41c3a2ffc1
Fix documentation of reshape to vectors.
2018-09-25 16:35:44 +02:00
Christoph Hertzberg
2c083ace3e
Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides
2018-09-24 18:01:17 +02:00
Gael Guennebaud
626942d9dd
fix alignment issue in ploaddup for AVX512
2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36
Merge with default.
2018-09-23 21:52:58 +02:00
Gael Guennebaud
795e12393b
Fix logic in diagonal*dense product in a corner case.
...
The problem was for: diag(1x1) * mat(1,n)
2018-09-22 16:44:33 +02:00
Gael Guennebaud
bac36d0996
Demangle Travseral and Unrolling in Redux
2018-09-21 23:03:45 +02:00
Gael Guennebaud
1bf12880ae
Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all)
2018-09-21 16:50:04 +02:00
Gael Guennebaud
371068992a
Add more debug output
2018-09-21 14:32:39 +02:00
Gael Guennebaud
b00e48a867
Improve slice-vectorization logic for redux (significant speed-up for reduxion of blocks)
2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787
merge with default Eigen
2018-09-21 11:51:49 +02:00
Gael Guennebaud
47720e7970
Doc fixes
2018-09-21 11:48:22 +02:00
Gael Guennebaud
3ec2985914
Merged indexing cleanup (pull request PR-506)
2018-09-21 09:36:05 +00:00
Gael Guennebaud
651e5d4866
Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
...
This change also make it future proof for AVX1024
2018-09-21 10:33:22 +02:00
Gael Guennebaud
f0ef3467de
Fix doc
2018-09-20 22:57:28 +02:00
Gael Guennebaud
617f75f117
Add indexing namespace
2018-09-20 22:57:10 +02:00
Gael Guennebaud
0c56d22e2e
Fix shadowing
2018-09-20 22:56:21 +02:00
Gael Guennebaud
9419f506d0
Fix regression introduced by the previous fix for AVX512.
...
It brokes the complex-complex case on SSE.
2018-09-20 17:32:34 +02:00
Gael Guennebaud
e38d1ab4d1
Workaround increases required alignment warning
2018-09-20 17:07:33 +02:00
Gael Guennebaud
71496b0e25
Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
...
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
2018-09-20 17:01:24 +02:00
Gael Guennebaud
5a30eed17e
Fix warnings in AVX512
2018-09-20 16:58:51 +02:00
Gael Guennebaud
c3a19527a2
Fix doc wrt previous change
2018-09-19 11:49:26 +02:00
Gael Guennebaud
dfa8439e4d
Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
...
The API changed from A.respahed(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
Gael Guennebaud
297ca62319
ease transition by adding placeholders::all/last/and as deprecated
2018-09-17 16:24:52 +02:00
Gael Guennebaud
2014c7ae28
Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end.
2018-09-15 14:35:10 +02:00
Gael Guennebaud
82772e8d9d
Rename Symbolic namespace to symbolic to be consistent with numext namespace
2018-09-15 14:16:20 +02:00
Gael Guennebaud
3e8188fc77
bug #1600 : initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert)
2018-09-18 21:24:48 +02:00
Christoph Hertzberg
d7378aae8e
Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment.
2018-09-14 20:17:47 +02:00
Gael Guennebaud
1141bcf794
Fix conjugate-gradient for very small rhs
2018-09-13 23:53:28 +02:00
Deven Desai
c64fe9ea1f
Updates to fix HIP-clang specific compile errors.
...
Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
Gael Guennebaud
5927eef612
Enable std::result_of for msvc 2015 and later
2018-09-13 09:44:46 +02:00
Christoph Hertzberg
3adece4827
Fix misleading indentation of errorCode and make it loop-local
2018-09-12 14:41:38 +02:00
Christoph Hertzberg
7e9c9fbb2d
Disable type-limits warnings for g++ < 4.8
2018-09-12 14:40:39 +02:00
Justin Carpentier
4827bec776
LLT: correct doc and add missing reference for the return type of rankUpdate
...
---
Eigen/src/Cholesky/LLT.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
2018-09-11 09:33:21 +02:00
luz.paz"
43fd42a33b
Fix doxy and misc. typos
...
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
Eigen/src/Core/ProductEvaluators.h | 4 ++--
Eigen/src/Core/arch/GPU/Half.h | 2 +-
Eigen/src/Core/util/Memory.h | 2 +-
Eigen/src/Geometry/Hyperplane.h | 2 +-
Eigen/src/Geometry/Transform.h | 2 +-
Eigen/src/Geometry/Translation.h | 12 ++++++------
doc/PreprocessorDirectives.dox | 2 +-
doc/TutorialGeometry.dox | 2 +-
test/boostmultiprec.cpp | 2 +-
test/triangular.cpp | 2 +-
10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Jiandong Ruan
6dcd2642aa
bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above
2018-09-08 12:05:33 -07:00
Alexey Frunze
ec38f07b79
bug #1595 : Don't use C++11's std::isnan() in MIPS/MSA packet math.
...
This removes reliance on C++11 and improves generated code.
2018-09-06 15:40:09 -07:00
cgs1019
c6066ac411
Make param name and docs constistent for JacobiRotation::makeGivens
...
Previously the rendered math in the doc string called the optional return value
'r', while the actual parameter and the doc string text referred to the
parameter as 'z'. This changeset renames all the z's to r's to match the math.
2018-09-06 11:04:17 -04:00
Christoph Hertzberg
ddbc564386
Fixed a few more shadowing warnings when compiling with g++ (and c++03)
2018-08-30 16:33:03 +02:00
Mehdi Goli
7ec8b40ad9
Collapsed revision
...
* Separating SYCL math function.
* Converting function overload to function specialisation.
* Applying the suggested design.
2018-08-28 14:20:48 +01:00
Christoph Hertzberg
73ca600bca
Fix numerous shadow-warnings for GCC<=4.8
2018-08-28 18:32:39 +02:00
Christoph Hertzberg
ef4d79fed8
Disable/ReenableStupidWarnings did not work properly, when included recursively
2018-08-28 18:26:22 +02:00
Gael Guennebaud
befaf83f5f
bug #1590 : fix collision with some system headers defining the macro FP32
2018-08-28 13:21:28 +02:00
Christoph Hertzberg
42f3ee4fb8
Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop
...
Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers
2018-08-28 11:44:15 +02:00
Alexey Frunze
050bcf6126
bug #1584 : Improve random (avoid undefined behavior).
2018-08-08 20:19:32 -07:00
Christoph Hertzberg
ad4a08fb68
Use Intel cast intrinsics, since MSVC does not allow direct casting.
...
Reported by David Winkler.
2018-08-24 19:04:33 +02:00
Christoph Hertzberg
41f1cc67b8
Assertion depended on a not yet initialized value
2018-08-17 16:42:53 +02:00
Christoph Hertzberg
595cae9b09
Silence logical-op-parentheses warning
2018-08-17 16:30:32 +02:00
Justin Carpentier
eabc7a4031
PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type
...
The type should be RowMajor
2018-08-10 14:30:06 +02:00
Rasmus Munk Larsen
c49e93440f
SuiteSparse defines the macro SuiteSparse_long to control what type is used for 64bit integers. The default value of this macro on non-MSVC platforms is long and __int64 on MSVC. CholmodSupport defaults to using long for the long variants of CHOLMOD functions. This creates problems when SuiteSparse_long is different than long. So the correct thing to do here is
...
to use SuiteSparse_long as the type instead of long.
2018-08-13 15:53:31 -07:00
Gael Guennebaud
3ec60215df
Merged in rmlarsen/eigen2 (pull request PR-466)
...
Move sigmoid functor to core and rename it to 'logistic'.
2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen
d6e283ba96
sigmoid -> logistic
2018-08-13 11:14:50 -07:00
Rasmus Munk Larsen
bfc5091dd5
Cast to diagonalSize to RealScalar instead Scalar.
2018-08-09 14:46:17 -07:00
Rasmus Munk Larsen
8603d80029
Cast diagonalSize() to Scalar before multiplication. Without this, automatic differentiation in Ceres breaks because Scalar is a custom type that does not support multiplication by Index.
2018-08-09 11:09:10 -07:00
Mehdi Goli
67711eaa31
Fixing typo.
2018-08-08 11:38:10 +01:00
Mehdi Goli
22031ab59a
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 11:07:27 +01:00
Rasmus Munk Larsen
fa68342ef8
Move sigmoid functor to core.
2018-08-03 17:31:23 -07:00
Rasmus Munk Larsen
7f8b53fd0e
bug #1580 : Fix cuda clang build. STL is not supported, so std::equal_to and std::not_equal breaks compilation.
...
Update the definition of EIGEN_CONSTEXPR_ARE_DEVICE_FUNC to exclude clang.
See also PR 450.
2018-08-01 12:36:24 -07:00
Mehdi Goli
01358300d5
Creating separate SYCL required PR for uncontroversial files.
2018-08-03 16:59:15 +01:00
Gael Guennebaud
62169419ab
Fix two regressions introduced in previous merges: bad usage of EIGEN_HAS_VARIADIC_TEMPLATES and linking issue.
2018-08-01 23:35:34 +02:00
Benoit Steiner
17221115c9
Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447)
...
Adding variadic version of assert which can take a parameter pack as its input.
2018-08-01 16:41:54 +00:00
Mehdi Goli
af96018b49
Using the suggested modification.
2018-08-01 16:04:44 +01:00
Mehdi Goli
c84509d7cc
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 12:40:54 +01:00
Mehdi Goli
3a197a60e6
variadic version of assert which can take a parameter pack as its input.
2018-08-01 12:19:14 +01:00
Alexey Frunze
7b91c11207
bug #1578 : Improve prefetching in matrix multiplication on MIPS.
2018-07-24 18:36:44 -07:00
Mark D Ryan
bc615e4585
Re-enable FMA for fast sqrt functions
2018-07-30 13:21:00 +02:00
Mark D Ryan
e79c5149bf
Fix AVX512 implementations of psqrt
...
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb
fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issues requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb
shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packed8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Rasmus Munk Larsen
2ebcb911b2
Add pcast packet op for NEON.
2018-07-26 14:28:48 -07:00
Christoph Hertzberg
fd4fe7cbc5
Fixed issue which made documentation not getting built anymore
2018-07-24 22:56:15 +02:00
Gael Guennebaud
4ca3e48f42
fix typo
2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a
Add lastN shorcuts to seq/seqN.
2018-07-23 16:20:25 +02:00
Eugene Zhulenev
2bf864f1eb
Disable type traits for stdlibc++ <= 4.9.3
2018-07-20 10:11:44 -07:00
Gael Guennebaud
509a5fa77f
Fix IsRelocatable without C++11
2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009
Fix determination of EIGEN_HAS_TYPE_TRAITS
2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f
Fix stupid error in Quaternion move ctor
2018-07-19 18:33:53 +02:00
Alexey Frunze
1f523e7304
Add MIPS changes missing from previous merge.
2018-07-18 12:27:50 -07:00
Eugene Zhulenev
086ded5c85
Disable type traits for GCC < 5.1.0
2018-07-18 16:32:55 -07:00
Gael Guennebaud
863580fe88
bug #1432 : fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable
2018-07-18 23:33:07 +02:00
Gael Guennebaud
a503fc8725
bug #1575 : fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted.
2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9
More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA
2018-07-18 13:51:36 +02:00
Deven Desai
f124f07965
applying EIGEN_DECLARE_TEST to *gpu* tests
...
Also, a few minor fixes for GPU tests running in HIP mode.
1. Adding an include for hip/hip_runtime.h in the Macros.h file
For HIP __host__ and __device__ are macros which are defined in hip headers.
Their definitions need to be included before their use in the file.
2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
"Fuse computations into the Tensor contractions using output kernel"
3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
2b2cd85694
bug #1573 : add noexcept move constructor and move assignment operator to Quaternion
2018-07-17 11:11:33 +02:00
Gael Guennebaud
5539587b1f
Some warning fixes
2018-07-17 10:29:12 +02:00
Gael Guennebaud
40797dbea3
bug #1572 : use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11.
2018-07-17 00:11:20 +02:00
Gael Guennebaud
a87cff20df
Fix GeneralizedEigenSolver when requesting for eigenvalues only.
2018-07-14 09:38:49 +02:00
Rasmus Munk Larsen
4a3952fd55
Relax the condition to not only work on Android.
2018-07-13 11:24:07 -07:00
Rasmus Munk Larsen
02a9443db9
Clang produces incorrect Thumb2 assembler when using alloca.
...
Don't define EIGEN_ALLOCA when generating Thumb with clang.
2018-07-13 11:03:04 -07:00
Gael Guennebaud
20991c3203
bug #1571 : fix is_convertible<from,to> with "from" a reference.
2018-07-13 17:47:28 +02:00
Gael Guennebaud
86d9c0255c
Forward declaring std::array does not work with all std libs, so let's just include <array>
2018-07-13 13:06:44 +02:00
Alexey Frunze
3875fb05aa
Add support for MIPS SIMD (MSA)
2018-07-06 16:04:30 -07:00
Gael Guennebaud
5c73c9223a
Fix shadowing typedefs
2018-07-12 17:01:07 +02:00
Gael Guennebaud
98728312c8
Fix compilation regarding std::array
2018-07-12 17:00:37 +02:00
Gael Guennebaud
eb3d8f68bb
fix unused warning
2018-07-12 16:59:47 +02:00
Gael Guennebaud
006e18e52b
Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h),
...
and alignment/vectorization logic is now in util/ConfigureVectorization.h
2018-07-12 16:57:41 +02:00
Julian Kent
6d451cf2b6
Add missing consts for rows and cols functions in SparseLU
2018-02-10 13:44:05 +01:00
Gael Guennebaud
8bdb214fd0
remove double ;;
2018-07-12 11:17:53 +02:00
Gael Guennebaud
a9060378d3
bug #1570 : fix warning
2018-07-12 11:07:09 +02:00
Gael Guennebaud
da0c604078
Merged in deven-amd/eigen (pull request PR-402)
...
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
a4ea611ca7
Remove useless specialization thanks to is_convertible being more robust.
2018-07-12 09:59:44 +02:00
Gael Guennebaud
8ef267ccbd
spellcheck
2018-07-12 09:58:29 +02:00
Gael Guennebaud
21cf4a1a8b
Make is_convertible more robust and conformant to std::is_convertible
2018-07-12 09:57:19 +02:00
Gael Guennebaud
8a5955a052
Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.
2018-07-11 17:16:50 +02:00
Gael Guennebaud
d193cc87f4
Fix regression in 9357838f94
2018-07-11 17:09:23 +02:00
Gael Guennebaud
fb33687736
Fix double ;;
2018-07-11 17:08:30 +02:00
Deven Desai
876f392c39
Updates corresponding to the latest round of PR feedback
...
The major changes are
1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
471cfe5ff7
renaming CUDA* to GPU* for some header files
2018-07-11 09:22:04 -04:00
Deven Desai
38807a2575
merging updates from upstream
2018-07-11 09:17:33 -04:00
Gael Guennebaud
f00d08cc0a
Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix.
2018-07-11 14:01:47 +02:00
Gael Guennebaud
1625476091
Add internall::is_identity compile-time helper
2018-07-11 14:00:24 +02:00
Gael Guennebaud
fe723d6129
Fix conversion warning
2018-07-10 09:10:32 +02:00
Gael Guennebaud
9357838f94
bug #1543 : improve linear indexing for general block expressions
2018-07-10 09:10:15 +02:00
Gael Guennebaud
de9e31a06d
Introduce the macro ei_declare_local_nested_eval to help allocating on the stack local temporaries via alloca, and let outer-products makes a good use of it.
...
If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
2018-07-09 15:41:14 +02:00
Gael Guennebaud
ec323b7e66
Skip null numerators in triangular-vector-solve (as in BLAS TRSV).
2018-07-09 11:13:19 +02:00
Gael Guennebaud
359dd77ec3
Fix legitimate "declaration shadows a typedef" warning
2018-07-09 11:03:39 +02:00
Mark D Ryan
90a53ca6fd
Fix the Packet16h version of ptranspose
...
The AVX512 version of ptranpose for PacketBlock<Packet16h,16> was
reordering the PacketBlock argument incorrectly. This lead to errors in
the multiplication of matrices composed of 16 bit floats on AVX512
machines, if at least of the matrices was using RowMajor order. This
error is responsible for one tensorflow unit test failure on AVX512
machines:
//tensorflow/python/kernel_tests:batch_matmul_op_test
2018-06-16 15:13:06 -07:00
Gael Guennebaud
1f54164eca
Fix a few issues with Packet16h
2018-07-07 00:15:07 +02:00
Gael Guennebaud
f2dc048df9
complete implementation of Packet16h (AVX512)
2018-07-06 17:43:11 +02:00
Gael Guennebaud
f4d623ffa7
Complete Packet8h implementation and test it in packetmath unit test
2018-07-06 17:13:36 +02:00
Deven Desai
b6cc0961b1
updates based on PR feedback
...
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)
1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
Deven Desai
ba972fb6b4
moving Half headers from CUDA dir to GPU dir, removing the HIP versions
2018-06-13 12:26:18 -04:00
Deven Desai
d1d22ef0f4
syncing this fork with upstream
2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d
Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
...
Derivative of the incomplete Gamma function and the sample of a Gamma random variable
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-11 17:57:47 +00:00
Andrea Bocci
f7124b3e46
Extend CUDA support to matrix inversion and selfadjointeigensolver
2018-06-11 18:33:24 +02:00
Gael Guennebaud
0537123953
bug #1565 : help MSVC to generatenot too bad ASM in reductions.
2018-07-05 09:21:26 +02:00