eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-24 14:45:14 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	f159cf3d75	Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels. With a 6pX4 kernel (not committed yet), this provides a +20% speedup.	2018-12-11 15:36:27 +01:00
Gael Guennebaud	0a7e7af6fd	Properly set the number of registers for AVX512	2018-12-11 15:33:17 +01:00
Gael Guennebaud	7166496f70	bug #1643 : fix compilation issue with gcc and no optimization	2018-12-11 13:24:42 +01:00
Gael Guennebaud	0d90637838	enable spilling workaround on architectures with SSE/AVX	2018-12-10 23:22:44 +01:00
Gael Guennebaud	bff90bf270	workaround "may be used uninitialized" warning	2018-12-08 18:58:28 +01:00
Gael Guennebaud	81c27325ae	bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512	2018-12-08 14:27:48 +01:00
Gael Guennebaud	426bce7529	fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target	2018-12-08 09:44:21 +01:00
Gael Guennebaud	956678a4ef	bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.	2018-12-07 18:03:36 +01:00
Gael Guennebaud	7b6d0ff1f6	Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error.	2018-12-07 15:14:50 +01:00
Gael Guennebaud	f233c6194d	bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA	2018-12-07 10:01:09 +01:00
Gael Guennebaud	ae59a7652b	bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA	2018-12-07 09:23:28 +01:00
Gael Guennebaud	4e7746fe22	bug #1636 : fix gemm performance issue with gcc>=6 and no FMA	2018-12-07 09:15:46 +01:00
Gael Guennebaud	cbf2f4b7a0	AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only	2018-12-06 18:21:56 +01:00
Gael Guennebaud	1d683ae2f5	Fix compilation with avx512f only, i.e., no AVX512DQ	2018-12-06 18:11:07 +01:00
Gael Guennebaud	c53eececb0	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Gael Guennebaud	3fba59ea59	temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!	2018-12-06 00:13:26 +01:00
Gael Guennebaud	1ac2695ef7	bug #1636 : fix compilation with some ABI versions.	2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen	47d8b741b2	#elif -> #else to fix GPU build.	2018-12-05 13:19:31 -08:00
Christoph Hertzberg	c1d356e8b4	bug #1635 : Use infinity from NumTraits instead of creating it manually.	2018-12-05 15:01:04 +01:00
Rasmus Munk Larsen	b57b31cce9	Merged in ezhulenev/eigen-01 (pull request PR-553) Do not disable alignment with EIGEN_GPUCC Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-12-04 23:47:19 +00:00
Eugene Zhulenev	0bb15bb6d6	Update checks in ConfigureVectorization.h	2018-12-03 17:10:40 -08:00
Eugene Zhulenev	fd0fbfa9b5	Do not disable alignment with EIGEN_GPUCC	2018-12-03 15:54:10 -08:00
Christoph Hertzberg	919414b9fe	bug #785 : Make Cholesky decomposition work for empty matrices	2018-12-03 16:18:15 +01:00
Gael Guennebaud	0ea7ae7213	Add missing padd for Packet8i (it was implicitly generated by clang and gcc)	2018-11-30 21:52:25 +01:00
Gael Guennebaud	ab4df3e6ff	bug #1634 : remove double copy in move-ctor of non movable Matrix/Array	2018-11-30 21:25:51 +01:00
Gael Guennebaud	c785464430	Add packet sin and cos to Altivec/VSX and NEON	2018-11-30 16:21:33 +01:00
Gael Guennebaud	69ace742be	Several improvements regarding packet-bitwise operations: - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...)	2018-11-30 15:56:08 +01:00
Gael Guennebaud	fa87f9d876	Add psin/pcos on AVX512 -> almost for free, at last!	2018-11-30 14:33:13 +01:00
Gael Guennebaud	c68bd2fa7a	Cleanup	2018-11-30 14:32:31 +01:00
Gael Guennebaud	f91500d303	Fix pandnot order in AVX512	2018-11-30 14:32:06 +01:00
Gael Guennebaud	b477d60bc6	Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)	2018-11-30 11:26:30 +01:00
Gael Guennebaud	e19ece822d	Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)	2018-11-28 17:56:24 +01:00
Gael Guennebaud	41052f63b7	same for pmax	2018-11-28 17:17:28 +01:00
Gael Guennebaud	3e95e398b6	pmin/pmax on SSE: make sure to use AVX instructions with AVX enabled, and disable gcc workaround for fixed gcc versions	2018-11-28 17:14:20 +01:00
Gael Guennebaud	aa6097395b	Add missing SSE/AVX type-casting in AVX512 mode	2018-11-28 16:09:08 +01:00
Gael Guennebaud	48fe78c375	bug #1630 : fix linspaced when requesting smaller packet size than default one.	2018-11-28 13:15:06 +01:00
Eugene Zhulenev	80f1651f35	Use explicit packet type in SSE/PacketMath pldexp	2018-11-27 17:25:49 -08:00
Benoit Jacob	a4159dba08	do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4).	2018-11-27 16:53:14 -05:00
Gael Guennebaud	b131a4db24	bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.	2018-11-27 23:45:00 +01:00
Gael Guennebaud	a1a5fbbd21	Update pshiftleft to pass the shift as a true compile-time integer.	2018-11-27 22:57:30 +01:00
Gael Guennebaud	fa7fd61eda	Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv	2018-11-27 22:41:51 +01:00
Benoit Jacob	7b1cb8a440	fix the build on 64-bit ARM when NEON is disabled	2018-11-27 11:11:02 -05:00
Gael Guennebaud	b5695a6008	Unify Altivec/VSX pexp(double) with default implementation	2018-11-27 13:53:05 +01:00
Gael Guennebaud	7655a8af6e	cleanup	2018-11-26 23:21:29 +01:00
Gael Guennebaud	502f92fa10	Unify SSE and AVX pexp for double.	2018-11-26 23:12:44 +01:00
Gael Guennebaud	4a347a0054	Unify NEON's pexp with generic implementation	2018-11-26 22:15:44 +01:00
Gael Guennebaud	5c8406babc	Unify Altivec/VSX's pexp with generic implementation	2018-11-26 16:47:13 +01:00
Gael Guennebaud	cf8b85d5c5	Unify SSE and AVX implementation of pexp	2018-11-26 16:36:19 +01:00
Gael Guennebaud	c2f35b1b47	Unify Altivec/VSX's plog with generic implementation, and enable it!	2018-11-26 15:58:11 +01:00
Gael Guennebaud	c24e98e6a8	Unify NEON's plog with generic implementation	2018-11-26 15:02:16 +01:00
Gael Guennebaud	2c44c40114	First step toward a unification of packet log implementation, currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.	2018-11-26 14:21:24 +01:00
Gael Guennebaud	5f6045077c	Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"	2018-11-26 14:14:07 +01:00
Gael Guennebaud	0836a715d6	bug #1611 : fix plog(0) on NEON	2018-11-26 09:08:38 +01:00
Patrik Huber	95566eeed4	Fix typos	2018-11-23 22:22:14 +00:00
Gael Guennebaud	ccabdd88c9	Fix reserved usage of double __ in macro names	2018-11-23 16:01:47 +01:00
Gael Guennebaud	a7842daef2	Fix several uninitialized members from ctor	2018-11-23 15:10:28 +01:00
Gael Guennebaud	a476054879	bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup	2018-11-23 10:25:19 +01:00
Gael Guennebaud	4b2cebade8	Workaround weird MSVC bug	2018-11-21 15:53:37 +01:00
Deven Desai	e7e6809e6b	ROCm/HIP specific fixes + updates 1. Eigen/src/Core/arch/GPU/Half.h Updating the HIPCC implementation of half so that it can be declared as a __shared__ variable 2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h introducing a EIGEN_USE_STD(func) macro that calls - std::func by default - ::func when eigen is being compiled with HIPCC This change was requested in the previous HIP PR (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff) 3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC 4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors	2018-11-19 18:13:59 +00:00
Gael Guennebaud	6a510fe69c	Make MaxPacketSize a true upper bound, even for fixed-size inputs	2018-11-16 11:25:32 +01:00
Mark D Ryan	670d56441c	PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals Commit `aa110e681b` optimised the multiplication of small dynamically sized matrices by restricting the packet size to a maximum of 4, increasing the chances that SIMD instructions are used in the computation. However, it introduced a mismatch between the packet size and the requestedAlignment. This mismatch can lead to crashes when the destination is not aligned. This patch fixes the issue by ensuring that the AssignmentTraits are correctly computed when using a restricted packet size. * * * Bind LinearPacketType to MaxPacketSize This commit applies any packet size limit specified when instantiating copy_using_evaluator_traits to the LinearPacketType, providing that the size of the destination is not known at compile time. * * * Add unit test for restricted packet assignment A new unit test is added to check that multiplication of small dynamically sized matrices works correctly when the packet size is restricted to 4 and the destination is unaligned.	2018-11-13 16:15:08 +01:00
Nikolaus Demmel	3dc0845046	Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES	2018-11-14 18:11:30 +01:00
Gael Guennebaud	7fddc6a51f	typo	2018-11-14 14:43:18 +01:00
Gael Guennebaud	449f948b2a	help doxygen linking to DenseBase::NullaryExpr	2018-11-14 14:42:59 +01:00
luz.paz	f67b19a884	[PATCH 1/2] Misc. typos From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001 Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of: ``` als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch ``` --- CMakeLists.txt \| 26 +++++++++---------- Eigen/src/Core/GenericPacketMath.h \| 2 +- Eigen/src/SparseLU/SparseLU.h \| 2 +- bench/bench_norm.cpp \| 2 +- doc/HiPerformance.dox \| 2 +- doc/QuickStartGuide.dox \| 2 +- .../Eigen/CXX11/src/Tensor/TensorChipping.h \| 6 ++--- .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h \| 2 +- .../src/Tensor/TensorForwardDeclarations.h \| 4 +-- .../src/Tensor/TensorGpuHipCudaDefines.h \| 2 +- .../Eigen/CXX11/src/Tensor/TensorReduction.h \| 2 +- .../CXX11/src/Tensor/TensorReductionGpu.h \| 2 +- .../test/cxx11_tensor_concatenation.cpp \| 2 +- unsupported/test/cxx11_tensor_executor.cpp \| 2 +- 14 files changed, 29 insertions(+), 29 deletions(-)	2018-09-18 04:15:01 -04:00
Rasmus Munk Larsen	77b447c24e	Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX.	2018-11-12 13:42:24 -08:00
Gael Guennebaud	0105146915	Fix warning in c++03	2018-11-10 09:11:38 +01:00
Gael Guennebaud	784a3f13cf	bug #1619 : fix mixing of const and non-const generic iterators	2018-11-09 21:45:10 +01:00
Gael Guennebaud	db9a9a12ba	bug #1619 : make const and non-const iterators compatible	2018-11-09 16:49:19 +01:00
Gael Guennebaud	bd9a00718f	Let doxygen see lastN	2018-11-09 11:35:48 +01:00
Gael Guennebaud	a368848473	Recent xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE	2018-11-09 10:33:17 +01:00
Gael Guennebaud	f62a0f69c6	Fix max-size in indexed-view	2018-11-08 18:40:22 +01:00
Gael Guennebaud	bf495859ff	Merged in glchaves/eigen (pull request PR-539) Vectorize row-by-row gebp loop iterations on 16 packets as well	2018-11-07 07:21:15 +00:00
Gustavo Lima Chaves	4ad359237a	Vectorize row-by-row gebp loop iterations on 16 packets as well Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com> Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>	2018-11-06 10:48:42 -08:00
Matthieu Vigne	8d7a73e48e	bug #1617 : Fix SolveTriangular.solveInPlace crashing for empty matrix. This made FullPivLU.kernel() crash when used on the zero matrix. Add unit test for FullPivLU.kernel() on the zero matrix.	2018-10-31 20:28:18 +01:00
Christoph Hertzberg	66b28e290d	bug #1618 : Use different power-of-2 check to avoid MSVC warning	2018-11-01 13:23:19 +01:00
Christian von Schultz	4a40b3785d	Collapsed revision (based on pull request PR-325) * Support compiling without IO streams Add the preprocessor definition EIGEN_NO_IO which, if defined, disables all use of the IO streams part of the standard library.	2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen	14054e217f	Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC). This breaks several targets in the TensorFlow Cuda build, e.g., INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc: /b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed /b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code /b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed /b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code 4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii". ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid	2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen	9caafca550	Merged in rmlarsen/eigen (pull request PR-532) Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.	2018-10-19 21:37:14 +00:00
Christoph Hertzberg	449ff74672	Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file). Manually grafted from `d107a371c6`	2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen	d8f285852b	Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.	2018-10-18 16:55:02 -07:00
Gael Guennebaud	0f780bb0b4	Fix float-to-double warning	2018-10-16 09:19:45 +02:00
Gael Guennebaud	a39e0f7438	bug #1612 : fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>)	2018-10-16 01:04:25 +02:00
Gael Guennebaud	d2d570c116	Remove useless (and broken) resize	2018-10-16 00:42:48 +02:00
Gael Guennebaud	f0fb95135d	Iterative solvers: unify and fix handling of multiple rhs. m_info was not properly computed and the logic was repeated in several places.	2018-10-15 23:47:46 +02:00
Gael Guennebaud	3a33db4de5	merge	2018-10-15 09:22:27 +02:00
Mark D Ryan	aa110e681b	PR 526: Speed up multiplication of small, dynamically sized matrices The Packet16f, Packet8f and Packet8d types are too large to use with dynamically sized matrices typically processed by the SliceVectorizedTraversal specialization of the dense_assignment_loop. Using these types is likely to lead to little or no vectorization. Significant slowdown in the multiplication of these small matrices can be observed when building with AVX and AVX512 enabled. This patch introduces a new dense_assignment_kernel that is used when computing small products whose operands have dynamic dimensions. It ensures that the PacketSize used is no larger than 4, thereby increasing the chance that vectorized instructions will be used when computing the product. I tested all 969 possible combinations of M, K, and N that are handled by the dense_assignment_loop on x86 builds. Although a few combinations are slowed down by this patch they are far outnumbered by the cases that are sped up, as the following results demonstrate. Disabling Packet8d on AVX512 builds: Total Cases: 969 Better: 511 Worse: 85 Same: 373 Max Improvement: 169.00% (4 8 6) Max Degradation: 36.50% (8 5 3) Median Improvement: 35.46% Median Degradation: 17.41% Total FLOPs Improvement: 19.42% Disabling Packet16f and Packet8f on AVX512 builds: Total Cases: 969 Better: 658 Worse: 5 Same: 306 Max Improvement: 214.05% (8 6 5) Max Degradation: 22.26% (16 2 1) Median Improvement: 60.05% Median Degradation: 13.32% Total FLOPs Improvement: 59.58% Disabling Packet8f on AVX builds: Total Cases: 969 Better: 663 Worse: 96 Same: 210 Max Improvement: 155.29% (4 10 5) Max Degradation: 35.12% (8 3 2) Median Improvement: 34.28% Median Degradation: 15.05% Total FLOPs Improvement: 26.02%	2018-10-12 15:20:21 +02:00
Eugene Zhulenev	d9392f9e55	Fix code format	2018-11-02 14:51:35 -07:00
Eugene Zhulenev	118520f04a	Workaround nvcc+msvc compiler bug	2018-11-02 14:48:28 -07:00
Christoph Hertzberg	24dc076519	Explicitly convert 0 to Scalar for custom types	2018-10-12 10:22:19 +02:00
Gael Guennebaud	43633fbaba	Fix warning with AVX512f	2018-10-11 10:13:48 +02:00
Gael Guennebaud	97e2c808e9	Fix avx512 plog(NaN) to return NaN instead of +inf	2018-10-11 10:13:13 +02:00
Gael Guennebaud	b3f66d29a5	Enable avx512 plog with clang	2018-10-11 10:12:21 +02:00
Gael Guennebaud	f0aa7e40fc	Fix regression in changeset `5335659c47`	2018-10-10 23:47:30 +02:00
Gael Guennebaud	ce243ee45b	bug #520 : add diagmat +/- diagmat operators.	2018-10-10 23:38:22 +02:00
Gael Guennebaud	5335659c47	Merged in ezhulenev/eigen-02 (pull request PR-525) Fix bug in partial reduction of expressions requiring evaluation	2018-10-10 20:59:00 +00:00
Gael Guennebaud	eec0dfd688	bug #632 : add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense. They are rewritten as two compound assignments to bypass the hybrid dense-sparse iterator.	2018-10-10 22:50:15 +02:00
Eugene Zhulenev	8e6dc2c81d	Fix bug in partial reduction of expressions requiring evaluation	2018-10-10 13:23:52 -07:00
Eugene Zhulenev	2bf1a31d81	Use void type if stl-style iterators are not supported	2018-10-10 10:31:40 -07:00
Rasmus Munk Larsen	e8918743c1	Merged in ezhulenev/eigen-01 (pull request PR-523) Compile time detection for unimplemented stl-style iterators	2018-10-09 23:42:01 +00:00
Eugene Zhulenev	c0ca8a9fa3	Compile time detection for unimplemented stl-style iterators	2018-10-09 15:28:23 -07:00
Gael Guennebaud	1dd1f8e454	bug #65 : add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()	2018-10-09 23:36:50 +02:00
Gael Guennebaud	bfa2a81a50	Make redux_vec_unroller more flexible regarding packet-type	2018-10-09 23:30:41 +02:00
Christoph Hertzberg	f6359ad795	Small Doxygen fixes	2018-10-09 19:33:35 +02:00
Gael Guennebaud	7a882c05ab	Fix compilation on CUDA	2018-10-09 17:02:16 +02:00
Gael Guennebaud	e00487f7d2	bug #1603 : add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy.	2018-10-08 22:27:04 +02:00
Gael Guennebaud	649d4758a6	merge	2018-10-08 17:35:18 +02:00
Gael Guennebaud	774bb9d6f7	fix a doxygen issue	2018-10-08 09:30:15 +02:00
Gael Guennebaud	bcb7c66b53	Workaround gcc's alloc-size-larger-than= warning	2018-10-07 21:55:59 +02:00
Gael Guennebaud	6512c5e136	Implement a better workaround for GCC's bug #87544	2018-10-07 15:00:05 +02:00
Gael Guennebaud	409132bb81	Workaround gcc bug making it trigger an invalid warning	2018-10-07 09:23:15 +02:00
Gael Guennebaud	c6a1ab4036	Workaround MSVC compilation issue	2018-10-06 13:49:17 +02:00
Gael Guennebaud	e21766c6f5	Clarify doc of rowwise/colwise/vectorwise.	2018-10-05 23:12:09 +02:00
Gael Guennebaud	d92f004ab7	Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns	2018-10-05 23:11:21 +02:00
Gael Guennebaud	3e64b1fc86	Move iterators to internal, improve doc, make unit test c++03 friendly	2018-10-03 15:13:15 +02:00
Gael Guennebaud	2b2b4d0580	fix unused warning	2018-10-03 14:16:21 +02:00
Gael Guennebaud	5f26f57598	Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>). This means that now AutoOrder is allowed, and it always returns a column-vector.	2018-10-03 11:41:47 +02:00
Gael Guennebaud	0481900e25	Add pointer-based iterator for direct-access expressions	2018-10-02 23:44:36 +02:00
Gael Guennebaud	8c38528168	Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index)	2018-10-02 14:03:26 +02:00
Gael Guennebaud	12487531ce	Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns)	2018-10-02 14:02:34 +02:00
Gael Guennebaud	37e29fc893	Use Index instead of ptrdiff_t or int, fix random-accessors.	2018-10-02 13:29:32 +02:00
Gael Guennebaud	de2efbc43c	bug #1605 : workaround ABI issue with vector types (aka __m128) versus scalar types (aka float)	2018-10-01 23:45:55 +02:00
Gael Guennebaud	b0c66adfb1	bug #231 : initial implementation of STL iterators for dense expressions	2018-10-01 23:21:37 +02:00
Christoph Hertzberg	564ca71e39	Merged in deven-amd/eigen/HIP_fixes (pull request PR-518) PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)	2018-10-01 16:51:04 +00:00
Deven Desai	94898488a6	This commit contains the following (HIP specific) updates: - unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h Changing "pass-by-reference" argument to be "pass-by-value" instead (in a __global__ function decl). "pass-by-reference" arguments to __global__ functions are unwise, and will be explicitly flagged as errors by the newer versions of HIP. - Eigen/src/Core/util/Memory.h - unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h Changes introduced in recent commits break the HIP compile. Adding EIGEN_DEVICE_FUNC attribute to some functions and calling ::malloc/free instead of the corresponding std:: versions to get the HIP compile working again - unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h Change introduced in a recent commit breaks the HIP compile (link stage errors out due to failure to inline a function). Disabling the recently introduced code (only for HIP compile), to get the eigen nightly testing going again. Will submit another PR once we have the proper fix. - Eigen/src/Core/util/ConfigureVectorization.h Enabling GPU VECTOR support when HIP compiler is in use (for both the host and device compile phases)	2018-10-01 14:28:37 +00:00
Gael Guennebaud	af3ad4b513	oops, I've been too fast in previous copy/paste	2018-09-27 09:28:57 +02:00
Gael Guennebaud	24b163a877	#pragma GCC diagnostic push/pop is not supported prior to gcc 4.6	2018-09-27 09:23:54 +02:00
Gael Guennebaud	41c3a2ffc1	Fix documentation of reshape to vectors.	2018-09-25 16:35:44 +02:00
Christoph Hertzberg	2c083ace3e	Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides	2018-09-24 18:01:17 +02:00
Gael Guennebaud	626942d9dd	fix alignment issue in ploaddup for AVX512	2018-09-28 16:57:32 +02:00
Gael Guennebaud	84a1101b36	Merge with default.	2018-09-23 21:52:58 +02:00
Gael Guennebaud	795e12393b	Fix logic in diagonaldense product in a corner case. The problem was for: diag(1x1) mat(1,n)	2018-09-22 16:44:33 +02:00
Gael Guennebaud	bac36d0996	Demangle Traversal and Unrolling in Redux	2018-09-21 23:03:45 +02:00
Gael Guennebaud	1bf12880ae	Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all)	2018-09-21 16:50:04 +02:00
Gael Guennebaud	371068992a	Add more debug output	2018-09-21 14:32:39 +02:00
Gael Guennebaud	b00e48a867	Improve slice-vectorization logic for redux (significant speed-up for reduction of blocks)	2018-09-21 13:45:56 +02:00
Gael Guennebaud	a488d59787	merge with default Eigen	2018-09-21 11:51:49 +02:00
Gael Guennebaud	47720e7970	Doc fixes	2018-09-21 11:48:22 +02:00
Gael Guennebaud	3ec2985914	Merged indexing cleanup (pull request PR-506)	2018-09-21 09:36:05 +00:00
Gael Guennebaud	651e5d4866	Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only. This change also makes it future proof for AVX1024	2018-09-21 10:33:22 +02:00
Gael Guennebaud	f0ef3467de	Fix doc	2018-09-20 22:57:28 +02:00
Gael Guennebaud	617f75f117	Add indexing namespace	2018-09-20 22:57:10 +02:00
Gael Guennebaud	0c56d22e2e	Fix shadowing	2018-09-20 22:56:21 +02:00
Gael Guennebaud	9419f506d0	Fix regression introduced by the previous fix for AVX512. It broke the complex-complex case on SSE.	2018-09-20 17:32:34 +02:00
Gael Guennebaud	e38d1ab4d1	Workaround increases required alignment warning	2018-09-20 17:07:33 +02:00
Gael Guennebaud	71496b0e25	Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512). This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.	2018-09-20 17:01:24 +02:00
Gael Guennebaud	5a30eed17e	Fix warnings in AVX512	2018-09-20 16:58:51 +02:00
Gael Guennebaud	c3a19527a2	Fix doc wrt previous change	2018-09-19 11:49:26 +02:00
Gael Guennebaud	dfa8439e4d	Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types. The API changed from A.reshaped(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).	2018-09-19 11:49:26 +02:00
Gael Guennebaud	297ca62319	ease transition by adding placeholders::all/last/end as deprecated	2018-09-17 16:24:52 +02:00
Gael Guennebaud	2014c7ae28	Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end.	2018-09-15 14:35:10 +02:00
Gael Guennebaud	82772e8d9d	Rename Symbolic namespace to symbolic to be consistent with numext namespace	2018-09-15 14:16:20 +02:00
Gael Guennebaud	3e8188fc77	bug #1600 : initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert)	2018-09-18 21:24:48 +02:00
Christoph Hertzberg	d7378aae8e	Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment.	2018-09-14 20:17:47 +02:00
Gael Guennebaud	1141bcf794	Fix conjugate-gradient for very small rhs	2018-09-13 23:53:28 +02:00
Deven Desai	c64fe9ea1f	Updates to fix HIP-clang specific compile errors. Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"	2018-08-30 20:22:16 +00:00
Gael Guennebaud	5927eef612	Enable std::result_of for msvc 2015 and later	2018-09-13 09:44:46 +02:00
Christoph Hertzberg	3adece4827	Fix misleading indentation of errorCode and make it loop-local	2018-09-12 14:41:38 +02:00
Christoph Hertzberg	7e9c9fbb2d	Disable type-limits warnings for g++ < 4.8	2018-09-12 14:40:39 +02:00
Justin Carpentier	4827bec776	LLT: correct doc and add missing reference for the return type of rankUpdate --- Eigen/src/Cholesky/LLT.h \| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)	2018-09-11 09:33:21 +02:00
luz.paz	43fd42a33b	Fix doxy and misc. typos Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` --- Eigen/src/Core/ProductEvaluators.h \| 4 ++-- Eigen/src/Core/arch/GPU/Half.h \| 2 +- Eigen/src/Core/util/Memory.h \| 2 +- Eigen/src/Geometry/Hyperplane.h \| 2 +- Eigen/src/Geometry/Transform.h \| 2 +- Eigen/src/Geometry/Translation.h \| 12 ++++++------ doc/PreprocessorDirectives.dox \| 2 +- doc/TutorialGeometry.dox \| 2 +- test/boostmultiprec.cpp \| 2 +- test/triangular.cpp \| 2 +- 10 files changed, 16 insertions(+), 16 deletions(-)	2018-08-01 21:34:47 -04:00
Jiandong Ruan	6dcd2642aa	bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above	2018-09-08 12:05:33 -07:00
Alexey Frunze	ec38f07b79	bug #1595 : Don't use C++11's std::isnan() in MIPS/MSA packet math. This removes reliance on C++11 and improves generated code.	2018-09-06 15:40:09 -07:00
cgs1019	c6066ac411	Make param name and docs consistent for JacobiRotation::makeGivens Previously the rendered math in the doc string called the optional return value 'r', while the actual parameter and the doc string text referred to the parameter as 'z'. This changeset renames all the z's to r's to match the math.	2018-09-06 11:04:17 -04:00
Christoph Hertzberg	ddbc564386	Fixed a few more shadowing warnings when compiling with g++ (and c++03)	2018-08-30 16:33:03 +02:00
Mehdi Goli	7ec8b40ad9	Collapsed revision * Separating SYCL math function. * Converting function overload to function specialisation. * Applying the suggested design.	2018-08-28 14:20:48 +01:00
Christoph Hertzberg	73ca600bca	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
Christoph Hertzberg	ef4d79fed8	Disable/ReenableStupidWarnings did not work properly, when included recursively	2018-08-28 18:26:22 +02:00
Gael Guennebaud	befaf83f5f	bug #1590 : fix collision with some system headers defining the macro FP32	2018-08-28 13:21:28 +02:00
Christoph Hertzberg	42f3ee4fb8	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers	2018-08-28 11:44:15 +02:00
Alexey Frunze	050bcf6126	bug #1584 : Improve random (avoid undefined behavior).	2018-08-08 20:19:32 -07:00
Christoph Hertzberg	ad4a08fb68	Use Intel cast intrinsics, since MSVC does not allow direct casting. Reported by David Winkler.	2018-08-24 19:04:33 +02:00
Christoph Hertzberg	41f1cc67b8	Assertion depended on a not yet initialized value	2018-08-17 16:42:53 +02:00
Christoph Hertzberg	595cae9b09	Silence logical-op-parentheses warning	2018-08-17 16:30:32 +02:00
Justin Carpentier	eabc7a4031	PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type The type should be RowMajor	2018-08-10 14:30:06 +02:00
Rasmus Munk Larsen	c49e93440f	SuiteSparse defines the macro SuiteSparse_long to control what type is used for 64bit integers. The default value of this macro on non-MSVC platforms is long and __int64 on MSVC. CholmodSupport defaults to using long for the long variants of CHOLMOD functions. This creates problems when SuiteSparse_long is different than long. So the correct thing to do here is to use SuiteSparse_long as the type instead of long.	2018-08-13 15:53:31 -07:00
Gael Guennebaud	3ec60215df	Merged in rmlarsen/eigen2 (pull request PR-466) Move sigmoid functor to core and rename it to 'logistic'.	2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen	d6e283ba96	sigmoid -> logistic	2018-08-13 11:14:50 -07:00
Rasmus Munk Larsen	bfc5091dd5	Cast diagonalSize to RealScalar instead of Scalar.	2018-08-09 14:46:17 -07:00
Rasmus Munk Larsen	8603d80029	Cast diagonalSize() to Scalar before multiplication. Without this, automatic differentiation in Ceres breaks because Scalar is a custom type that does not support multiplication by Index.	2018-08-09 11:09:10 -07:00
Mehdi Goli	67711eaa31	Fixing typo.	2018-08-08 11:38:10 +01:00
Mehdi Goli	22031ab59a	Adding EIGEN_UNROLL_LOOP macro.	2018-08-08 11:07:27 +01:00
Rasmus Munk Larsen	fa68342ef8	Move sigmoid functor to core.	2018-08-03 17:31:23 -07:00
Rasmus Munk Larsen	7f8b53fd0e	bug #1580 : Fix cuda clang build. STL is not supported, so std::equal_to and std::not_equal breaks compilation. Update the definition of EIGEN_CONSTEXPR_ARE_DEVICE_FUNC to exclude clang. See also PR 450.	2018-08-01 12:36:24 -07:00
Mehdi Goli	01358300d5	Creating separate SYCL required PR for uncontroversial files.	2018-08-03 16:59:15 +01:00
Gael Guennebaud	62169419ab	Fix two regressions introduced in previous merges: bad usage of EIGEN_HAS_VARIADIC_TEMPLATES and linking issue.	2018-08-01 23:35:34 +02:00
Benoit Steiner	17221115c9	Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447) Adding variadic version of assert which can take a parameter pack as its input.	2018-08-01 16:41:54 +00:00
Mehdi Goli	af96018b49	Using the suggested modification.	2018-08-01 16:04:44 +01:00
Mehdi Goli	c84509d7cc	Adding new arch/SYCL headers, used for SYCL vectorization.	2018-08-01 12:40:54 +01:00
Mehdi Goli	3a197a60e6	variadic version of assert which can take a parameter pack as its input.	2018-08-01 12:19:14 +01:00
Alexey Frunze	7b91c11207	bug #1578 : Improve prefetching in matrix multiplication on MIPS.	2018-07-24 18:36:44 -07:00
Mark D Ryan	bc615e4585	Re-enable FMA for fast sqrt functions	2018-07-30 13:21:00 +02:00
Mark D Ryan	e79c5149bf	Fix AVX512 implementations of psqrt This commit fixes the AVX512 implementations of psqrt in the same way that `3ed67cb0bb` fixed the AVX2 version of this function. The AVX512 versions of psqrt incorrectly return -0.0 for negative values, instead of NaN. Fixing the issues requires adding some additional instructions that slow down the algorithms. A similar test to the one used in `3ed67cb0bb` shows that the corrected Packet16f code runs at 73% of the speed of the existing code, while the corrected Packet8d function runs at 68% of the original.	2018-06-25 05:05:02 -07:00
Rasmus Munk Larsen	2ebcb911b2	Add pcast packet op for NEON.	2018-07-26 14:28:48 -07:00
Christoph Hertzberg	fd4fe7cbc5	Fixed issue that prevented the documentation from being built	2018-07-24 22:56:15 +02:00
Gael Guennebaud	4ca3e48f42	fix typo	2018-07-23 16:51:57 +02:00
Gael Guennebaud	c747cde69a	Add lastN shortcuts to seq/seqN.	2018-07-23 16:20:25 +02:00
Eugene Zhulenev	2bf864f1eb	Disable type traits for stdlibc++ <= 4.9.3	2018-07-20 10:11:44 -07:00
Gael Guennebaud	509a5fa77f	Fix IsRelocatable without C++11	2018-07-19 18:47:38 +02:00
Gael Guennebaud	2ca2592009	Fix determination of EIGEN_HAS_TYPE_TRAITS	2018-07-19 18:47:18 +02:00
Gael Guennebaud	5e5987996f	Fix stupid error in Quaternion move ctor	2018-07-19 18:33:53 +02:00
Alexey Frunze	1f523e7304	Add MIPS changes missing from previous merge.	2018-07-18 12:27:50 -07:00
Eugene Zhulenev	086ded5c85	Disable type traits for GCC < 5.1.0	2018-07-18 16:32:55 -07:00
Gael Guennebaud	863580fe88	bug #1432 : fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable	2018-07-18 23:33:07 +02:00
Gael Guennebaud	a503fc8725	bug #1575 : fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted.	2018-07-18 23:26:13 +02:00
Gael Guennebaud	308725c3c9	More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA	2018-07-18 13:51:36 +02:00
Deven Desai	f124f07965	applying EIGEN_DECLARE_TEST to gpu tests Also, a few minor fixes for GPU tests running in HIP mode. 1. Adding an include for hip/hip_runtime.h in the Macros.h file For HIP __host__ and __device__ are macros which are defined in hip headers. Their definitions need to be included before their use in the file. 2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel" 3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit	2018-07-17 14:16:48 -04:00
Gael Guennebaud	2b2cd85694	bug #1573 : add noexcept move constructor and move assignment operator to Quaternion	2018-07-17 11:11:33 +02:00
Gael Guennebaud	5539587b1f	Some warning fixes	2018-07-17 10:29:12 +02:00
Gael Guennebaud	40797dbea3	bug #1572 : use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11.	2018-07-17 00:11:20 +02:00
Gael Guennebaud	a87cff20df	Fix GeneralizedEigenSolver when requesting for eigenvalues only.	2018-07-14 09:38:49 +02:00
Rasmus Munk Larsen	4a3952fd55	Relax the condition to not only work on Android.	2018-07-13 11:24:07 -07:00
Rasmus Munk Larsen	02a9443db9	Clang produces incorrect Thumb2 assembler when using alloca. Don't define EIGEN_ALLOCA when generating Thumb with clang.	2018-07-13 11:03:04 -07:00
Gael Guennebaud	20991c3203	bug #1571 : fix is_convertible<from,to> with "from" a reference.	2018-07-13 17:47:28 +02:00
Gael Guennebaud	86d9c0255c	Forward declaring std::array does not work with all std libs, so let's just include <array>	2018-07-13 13:06:44 +02:00
Alexey Frunze	3875fb05aa	Add support for MIPS SIMD (MSA)	2018-07-06 16:04:30 -07:00
Gael Guennebaud	5c73c9223a	Fix shadowing typedefs	2018-07-12 17:01:07 +02:00
Gael Guennebaud	98728312c8	Fix compilation regarding std::array	2018-07-12 17:00:37 +02:00
Gael Guennebaud	eb3d8f68bb	fix unused warning	2018-07-12 16:59:47 +02:00
Gael Guennebaud	006e18e52b	Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h), and alignment/vectorization logic is now in util/ConfigureVectorization.h	2018-07-12 16:57:41 +02:00
Julian Kent	6d451cf2b6	Add missing consts for rows and cols functions in SparseLU	2018-02-10 13:44:05 +01:00
Gael Guennebaud	8bdb214fd0	remove double ;;	2018-07-12 11:17:53 +02:00
Gael Guennebaud	a9060378d3	bug #1570 : fix warning	2018-07-12 11:07:09 +02:00
Gael Guennebaud	da0c604078	Merged in deven-amd/eigen (pull request PR-402) Adding support for using Eigen in HIP kernels.	2018-07-12 08:07:16 +00:00
Gael Guennebaud	a4ea611ca7	Remove useless specialization thanks to is_convertible being more robust.	2018-07-12 09:59:44 +02:00
Gael Guennebaud	8ef267ccbd	spellcheck	2018-07-12 09:58:29 +02:00
Gael Guennebaud	21cf4a1a8b	Make is_convertible more robust and conformant to std::is_convertible	2018-07-12 09:57:19 +02:00
Gael Guennebaud	8a5955a052	Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.	2018-07-11 17:16:50 +02:00
Gael Guennebaud	d193cc87f4	Fix regression in `9357838f94`	2018-07-11 17:09:23 +02:00
Gael Guennebaud	fb33687736	Fix double ;;	2018-07-11 17:08:30 +02:00
Deven Desai	876f392c39	Updates corresponding to the latest round of PR feedback The major changes are 1. Moving CUDA/PacketMath.h to GPU/PacketMath.h 2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h 3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h The above three changes effectively enable the Eigen "Packet" layer for the HIP platform 4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic") 5. Updating the "EIGEN_DEVICE_FUNC" marking in some places The change has been tested on the HIP and CUDA platforms.	2018-07-11 10:39:54 -04:00
Deven Desai	471cfe5ff7	renaming CUDA* to GPU* for some header files	2018-07-11 09:22:04 -04:00
Deven Desai	38807a2575	merging updates from upstream	2018-07-11 09:17:33 -04:00
Gael Guennebaud	f00d08cc0a	Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix.	2018-07-11 14:01:47 +02:00
Gael Guennebaud	1625476091	Add internal::is_identity compile-time helper	2018-07-11 14:00:24 +02:00
Gael Guennebaud	fe723d6129	Fix conversion warning	2018-07-10 09:10:32 +02:00
Gael Guennebaud	9357838f94	bug #1543 : improve linear indexing for general block expressions	2018-07-10 09:10:15 +02:00
Gael Guennebaud	de9e31a06d	Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer-products make good use of it. If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.	2018-07-09 15:41:14 +02:00
Gael Guennebaud	ec323b7e66	Skip null numerators in triangular-vector-solve (as in BLAS TRSV).	2018-07-09 11:13:19 +02:00
Gael Guennebaud	359dd77ec3	Fix legitimate "declaration shadows a typedef" warning	2018-07-09 11:03:39 +02:00
Mark D Ryan	90a53ca6fd	Fix the Packet16h version of ptranspose The AVX512 version of ptranspose for PacketBlock<Packet16h,16> was reordering the PacketBlock argument incorrectly. This led to errors in the multiplication of matrices composed of 16 bit floats on AVX512 machines, if at least one of the matrices was using RowMajor order. This error is responsible for one tensorflow unit test failure on AVX512 machines: //tensorflow/python/kernel_tests:batch_matmul_op_test	2018-06-16 15:13:06 -07:00
Gael Guennebaud	1f54164eca	Fix a few issues with Packet16h	2018-07-07 00:15:07 +02:00
Gael Guennebaud	f2dc048df9	complete implementation of Packet16h (AVX512)	2018-07-06 17:43:11 +02:00
Gael Guennebaud	f4d623ffa7	Complete Packet8h implementation and test it in packetmath unit test	2018-07-06 17:13:36 +02:00
Deven Desai	b6cc0961b1	updates based on PR feedback There are two major changes (and a few minor ones which are not listed here...see PR discussion for details) 1. Eigen::half implementations for HIP and CUDA have been merged. This means that - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h` - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h` - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h` After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install. 2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate. - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)` - `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)` - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`	2018-06-14 10:21:54 -04:00
Deven Desai	ba972fb6b4	moving Half headers from CUDA dir to GPU dir, removing the HIP versions	2018-06-13 12:26:18 -04:00
Deven Desai	d1d22ef0f4	syncing this fork with upstream	2018-06-13 12:09:52 -04:00
Benoit Steiner	d3a380af4d	Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403) Derivative of the incomplete Gamma function and the sample of a Gamma random variable Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>	2018-06-11 17:57:47 +00:00
Andrea Bocci	f7124b3e46	Extend CUDA support to matrix inversion and selfadjointeigensolver	2018-06-11 18:33:24 +02:00
Gael Guennebaud	0537123953	bug #1565 : help MSVC to generate not too bad ASM in reductions.	2018-07-05 09:21:26 +02:00

6019 Commits