eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Srinivas Vasudevan	e38dd48a27	PR 681: Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Eugene Zhulenev	f59bed7a13	Change typedefs from private to protected to fix MSVC compilation	2019-09-03 19:11:36 -07:00
Srinivas Vasudevan	18ceb3413d	Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen	d55d392e7b	Fix bugs in log1p and expm1 where repeated using statements would clobber each other. Add specializations for complex types since std::log1p and std::expm1 do not support complex.	2019-08-08 16:27:32 -07:00
Gael Guennebaud	15f3d9d272	More colamd cleanup: - Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc. - Fix signed/unsigned warning - move some ugly free functions as member functions	2019-09-03 00:50:51 +02:00
Anshul Jaiswal	a4d1a6cd7d	Eigen_Colamd.h updated to replace constexpr with consts and enums.	2019-08-17 05:29:23 +00:00
Anshul Jaiswal	283558face	Ordering.h edited to fix dependencies on Eigen_Colamd.h	2019-08-15 20:21:56 +00:00
Anshul Jaiswal	39f30923c2	Eigen_Colamd.h edited replacing macros with constexprs and functions.	2019-08-15 20:15:19 +00:00
Anshul Jaiswal	0a6b553ecf	Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions	2019-07-21 04:53:31 +00:00
Michael Grupp	6e17491f45	Fix typo in Umeyama method documentation	2019-07-17 11:20:41 +00:00
Christoph Hertzberg	e0f5a2a456	Remove {} accidentally added in previous commit	2019-07-18 20:22:17 +02:00
Christoph Hertzberg	ea6d7eb32f	Move variadic constructors outside `#ifndef EIGEN_PARSED_BY_DOXYGEN` block, to make it actually appear in the generated documentation.	2019-07-12 19:46:37 +02:00
Christoph Hertzberg	c2671e5315	Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING Also, document LinSpaced only where it is implemented	2019-07-12 19:43:32 +02:00
Rasmus Munk Larsen	23b958818e	Fix compiler for unsigned integers.	2019-07-09 11:18:25 -07:00
Anshul Jaiswal	fab51d133e	Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD to prevent conflicts with other libraries / code.	2019-06-08 21:09:06 +00:00
Rasmus Munk Larsen	f6c51d9209	Fix missing header inclusion and colliding definitions for half type casting, which broke build with -march=native on Haswell/Skylake.	2019-08-30 14:03:29 -07:00
Rasmus Munk Larsen	1187bb65ad	Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.	2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen	9aba527405	Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.	2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen	b021cdea6d	Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.	2019-08-27 11:30:31 -07:00
Christoph Hertzberg	2fb24384c9	Merged in jaopaulolc/eigen (pull request PR-679) Fixes for Altivec/VSX and compilation with clang on PowerPC	2019-08-22 15:57:33 +00:00
João P. L. de Carvalho	5ac7984ffa	Fix debug macros in p{load,store}u	2019-08-14 11:59:12 -06:00
João P. L. de Carvalho	db9147ae40	Add missing pcmp_XX methods for double/Packet2d This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.	2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen	a3298b22ec	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. Depending on instruction set, significant speedups are observed for the vectorized path: log1p wall time is reduced 60-93% (2.5x - 15x speedup) expm1 wall time is reduced 0-85% (1x - 7x speedup) The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly. Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM	2019-08-12 13:53:28 -07:00
João P. L. de Carvalho	787f6ef025	Fix packed load/store for PowerPC's VSX The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts. For double/Packet2d vec_xl/vec_xst should be preferred over vec_ld/vec_st, although the latter works when cast to float/Packet4f. Silencing some weird warning with throw but some GCC versions. Such warnings are not thrown by Clang.	2019-08-09 16:02:55 -06:00
João P. L. de Carvalho	4d29aa0294	Fix offset argument of ploadu/pstoreu for Altivec If no offset is given, then it should be zero. Also passes full address to vec_vsx_ld/st builtins. Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT. Removes unnecessary casts.	2019-08-09 15:59:26 -06:00
João P. L. de Carvalho	66d073c38e	bug #1718 : Add cast to successfully compile with clang on PowerPC Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h	2019-08-09 15:56:26 -06:00
Justin Carpentier	ffaf658ecd	PR 655: Fix missing Eigen namespace in Macros	2019-06-05 09:51:59 +02:00
Mehdi Goli	0b24e1cb5c	[SYCL] Adding the SYCL memory model. The SYCL memory model provides : * an interface for SYCL buffers to behave as a non-dereferenceable pointer * an interface for placeholder accessor to behave like a pointer on both host and device	2019-07-01 16:02:30 +01:00
Rasmus Munk Larsen	8053eeb51e	Fix CUDA compilation error for pselect<half>.	2019-06-28 12:07:29 -07:00
Mehdi Goli	16a56b2ddd	[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization	2019-06-27 12:25:09 +01:00
Deven Desai	ba506d5bd2	fix for a ROCm/HIP specific compile error introduced by a recent commit.	2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen	c9394d7a0e	Remove extra "one" in comment.	2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen	b8f8dac4eb	Update comment as suggested by tra@google.com.	2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen	e5e63c2cad	Fix grammar.	2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen	302a404b7e	Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause.	2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen	b5237f53b1	Fix CUDA build on Mac.	2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen	988f24b730	Various fixes for packet ops. 1. Fix buggy pcmp_eq and unit test for half types. 2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types. 3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.	2019-06-20 11:47:49 -07:00
Christoph Hertzberg	e0be7f30e1	bug #1724 : Mask buggy warnings with g++-7 (grafted from `427f2f66d6` )	2019-06-14 14:57:46 +02:00
Rasmus Munk Larsen	6d432eae5d	Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.	2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen	f715f6e816	Add workaround for choosing the right include files with FP16C support with clang.	2019-06-05 13:36:37 -07:00
Rasmus Munk Larsen	b08527b0c1	Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.	2019-05-31 15:26:06 -07:00
Deven Desai	2c38930161	fix for HIP build errors that were introduced by a commit earlier this week	2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves	56bc4974fb	GEMV: remove double declaration of constant. That was hurting users with compilers that would object to proceed with that: """ ./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow] LhsPacketSize = Traits::LhsPacketSize, ^ ./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here static const Index LhsPacketSize = Traits::LhsPacketSize; """	2019-05-23 14:50:29 -07:00
Christoph Hertzberg	ac21a08c13	Cast Index to RealScalar This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types). Reported by Peter Anderson-Sprecher via eMail	2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen	3eb5ad0ed0	Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177 and are part of LLVM 3.8.0.	2019-05-20 17:19:20 -07:00
Rasmus Larsen	e92486b8c3	Merged in rmlarsen/eigen (pull request PR-643) Make Eigen build with cuda 10 and clang. Approved-by: Justin Lebar <justin.lebar@gmail.com>	2019-05-20 17:02:39 +00:00
Gael Guennebaud	cc7ecbb124	Merged in scramsby/eigen (pull request PR-646) Eigen: Fix MSVC C++17 language standard detection logic	2019-05-20 07:19:10 +00:00
Rasmus Larsen	bf9cbed8d0	Merged in glchaves/eigen (pull request PR-635) Speed up GEMV on AVX-512 builds, just as done for GEBP previously. Approved-by: Rasmus Larsen <rmlarsen@google.com>	2019-05-17 19:40:50 +00:00
Rasmus Munk Larsen	ab0a30e429	Make Eigen build with cuda 10 and clang.	2019-05-15 13:32:15 -07:00
Christoph Hertzberg	5f32b79edc	Collapsed revision from PR-641 * SparseLU.h - corrected example, it didn't compile * Changed encoding back to UTF8	2019-05-13 19:02:30 +02:00
Anuj Rawat	ad372084f5	Removing unused API to fix compile error in TensorFlow due to AVX512VL, AVX512BW usage	2019-05-12 14:43:10 +00:00
Christoph Hertzberg	4ccd1ece92	bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions	2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen	d3ef7cf03e	Fix build with clang on Windows.	2019-05-09 11:07:04 -07:00
Eugene Zhulenev	45b40d91ca	Fix AVX512 & GCC 6.3 compilation	2019-05-07 16:44:55 -07:00
Christoph Hertzberg	cca76c272c	Restore C++03 compatibility	2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen	8e33844fc7	Fix traits for scalar_logistic_op.	2019-05-03 15:49:09 -07:00
Scott Ramsby	ff06ef7584	Eigen: Fix MSVC C++17 language standard detection logic To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17). See these web pages for more details: https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/ https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros	2019-05-03 14:14:09 -07:00
Eugene Zhulenev	e9f0eb8a5e	Add masked_store_available to unpacket_traits	2019-05-02 14:52:58 -07:00
Eugene Zhulenev	96e30e936a	Add masked pstoreu for Packet16h	2019-05-02 14:11:01 -07:00
Eugene Zhulenev	b4010f02f9	Add masked pstoreu to AVX and AVX512 PacketMath	2019-05-02 13:14:18 -07:00
Gael Guennebaud	578407f42f	Fix regression in changeset `ae33e866c7`	2019-05-02 15:45:21 +02:00
Gustavo Lima Chaves	d4dcb71bcb	Speed up GEMV on AVX-512 builds, just as done for GEBP previously. We take advantage of smaller SIMD registers as well, in that case. Gains up to 3x for select input sizes.	2019-04-26 14:12:39 -07:00
Andy May	ae33e866c7	Fix compilation with PGI version 19	2019-04-25 21:23:19 +01:00
Gael Guennebaud	665ac22cc6	Merged in ezhulenev/eigen-01 (pull request PR-632) Fix doxygen warnings	2019-04-25 20:02:20 +00:00
Eugene Zhulenev	8ead5bb3d8	Fix doxygen warnings to enable static code analysis	2019-04-24 12:42:28 -07:00
Eugene Zhulenev	07355d47c6	Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h	2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen	144ca33321	Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings.	2019-04-24 08:50:07 -07:00
Eugene Zhulenev	a7b7f3ca8a	Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings	2019-04-23 17:23:19 -07:00
Eugene Zhulenev	68a2a8c445	Use packet ops instead of AVX2 intrinsics	2019-04-23 11:41:02 -07:00
Anuj Rawat	8c7a6feb8e	Adding lowlevel APIs for optimized RHS packet load in TensorFlow SpatialConvolution Low-level APIs are added in order to optimized packet load in gemm_pack_rhs in TensorFlow SpatialConvolution. The optimization is for scenario when a packet is split across 2 adjacent columns. In this case we read it as two 'partial' packets and then merge these into 1. Currently this only works for Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other packet types (such as Packet8d) also. This optimization shows significant speedup in SpatialConvolution with certain parameters. Some examples are below. Benchmark parameters are specified as: Batch size, Input dim, Depth, Num of filters, Filter dim Speedup numbers are specified for number of threads 1, 2, 4, 8, 16. AVX512: Parameters \| Speedup (Num of threads: 1, 2, 4, 8, 16) ----------------------------\|------------------------------------------ 128, 24x24, 3, 64, 5x5 \|2.18X, 2.13X, 1.73X, 1.64X, 1.66X 128, 24x24, 1, 64, 8x8 \|2.00X, 1.98X, 1.93X, 1.91X, 1.91X 32, 24x24, 3, 64, 5x5 \|2.26X, 2.14X, 2.17X, 2.22X, 2.33X 128, 24x24, 3, 64, 3x3 \|1.51X, 1.45X, 1.45X, 1.67X, 1.57X 32, 14x14, 24, 64, 5x5 \|1.21X, 1.19X, 1.16X, 1.70X, 1.17X 128, 128x128, 3, 96, 11x11 \|2.17X, 2.18X, 2.19X, 2.20X, 2.18X AVX2: Parameters \| Speedup (Num of threads: 1, 2, 4, 8, 16) ----------------------------\|------------------------------------------ 128, 24x24, 3, 64, 5x5 \| 1.66X, 1.65X, 1.61X, 1.56X, 1.49X 32, 24x24, 3, 64, 5x5 \| 1.71X, 1.63X, 1.77X, 1.58X, 1.68X 128, 24x24, 1, 64, 5x5 \| 1.44X, 1.40X, 1.38X, 1.37X, 1.33X 128, 24x24, 3, 64, 3x3 \| 1.68X, 1.63X, 1.58X, 1.56X, 1.62X 128, 128x128, 3, 96, 11x11 \| 1.36X, 1.36X, 1.37X, 1.37X, 1.37X In the higher level benchmark cifar10, we observe a runtime improvement of around 6% for AVX512 on Intel Skylake server (8 cores). 
On lower level PackRhs micro-benchmarks specified in TensorFlow tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe the following runtime numbers: AVX512: Parameters \| Runtime without patch (ns) \| Runtime with patch (ns) \| Speedup ---------------------------------------------------------------\|----------------------------\|-------------------------\|--------- BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) \| 41350 \| 15073 \| 2.74X BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) \| 7277 \| 7341 \| 0.99X BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) \| 8675 \| 8681 \| 1.00X BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) \| 24155 \| 16079 \| 1.50X BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) \| 25052 \| 17152 \| 1.46X BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) \| 18269 \| 18345 \| 1.00X BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) \| 19468 \| 19872 \| 0.98X BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) \| 156060 \| 42432 \| 3.68X BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) \| 132701 \| 36944 \| 3.59X AVX2: Parameters \| Runtime without patch (ns) \| Runtime with patch (ns) \| Speedup ---------------------------------------------------------------\|----------------------------\|-------------------------\|--------- BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) \| 26233 \| 12393 \| 2.12X BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) \| 6091 \| 6062 \| 1.00X BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) \| 7427 \| 7408 \| 1.00X BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) \| 23453 \| 20826 \| 1.13X BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) \| 23167 \| 22091 \| 1.09X BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) \| 23422 \| 23682 \| 0.99X BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) \| 
23165 \| 23663 \| 0.98X BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) \| 72689 \| 44969 \| 1.62X BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) \| 61732 \| 39779 \| 1.55X All benchmarks on Intel Skylake server with 8 cores.	2019-04-20 06:46:43 +00:00
Gael Guennebaud	45e65fbb77	bug #1695 : fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign.	2019-03-27 20:16:58 +01:00
William D. Irons	8de66719f9	Collapsed revision from PR-619 * Add support for pcmp_eq in AltiVec/Complex.h * Fixed implementation of pcmp_eq for double The new logic is based on the logic from NEON for double.	2019-03-26 18:14:49 +00:00
Gael Guennebaud	f11364290e	ICC does not support -fno-unsafe-math-optimizations	2019-03-22 09:26:24 +01:00
David Tellenbach	3031d57200	PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN	2019-03-21 02:21:04 +01:00
Deven Desai	51e399fc15	updates requested in the PR feedback. Also dropping code within #ifdef EIGEN_HAS_OLD_HIP_FP16	2019-03-19 21:45:25 +00:00
Deven Desai	2dbea5510f	Merged eigen/eigen into default	2019-03-19 16:52:38 -04:00
Rasmus Larsen	5c93b38c5f	Merged in rmlarsen/eigen (pull request PR-618) Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>. Approved-by: Gael Guennebaud <g.gael@free.fr>	2019-03-18 15:51:55 +00:00
Gael Guennebaud	cf7e2e277f	bug #1692 : enable enum as sizes of Matrix and Array	2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen	e42f9aa68a	Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.	2019-03-15 17:15:14 -07:00
Rasmus Munk Larsen	8450a6d519	Clean up half packet traits and add a few more missing packet ops.	2019-03-14 15:18:06 -07:00
David Tellenbach	97f9a46cb9	PR 593: Add variadic ctor for DiagonalMatrix with unit tests	2019-03-14 10:18:24 +01:00
Rasmus Munk Larsen	6a34003141	Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2.	2019-03-13 11:52:41 -07:00
Gael Guennebaud	d7d2f0680e	bug #1684 : partially workaround clang's 6/7 bug #40815	2019-03-13 10:40:01 +01:00
Rasmus Munk Larsen	77f7d4a894	Clean up PacketMathHalf.h and add a few missing logical packet ops.	2019-03-11 17:51:16 -07:00
Gael Guennebaud	656d9bc66b	Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax	2019-03-10 21:19:18 +01:00
Rasmus Larsen	4d808e834a	Merged in rmlarsen/eigen_threadpool (pull request PR-606) Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239` Approved-by: Sameer Agarwal <sameeragarwal@google.com>	2019-03-06 17:59:03 +00:00
Gael Guennebaud	bfbf7da047	bug #1689 fix used-but-marked-unused warning	2019-03-05 23:46:24 +01:00
Rasmus Munk Larsen	0318fc7f44	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
Gael Guennebaud	b0d406d91c	Enable construction of Ref<VectorType> from a runtime vector.	2019-03-03 15:25:25 +01:00
Sam Hasinoff	9ba81cf0ff	Fully qualify Eigen::internal::aligned_free This helps avoid a conflict on certain Windows toolchains (potentially due to some ADL name resolution bug) in the case where aligned_free is defined in the global namespace. In any case, tightening this up is harmless.	2019-03-02 17:42:16 +00:00
Gael Guennebaud	22144e949d	bug #1629 : fix compilation of PardisoSupport (regression introduced in changeset `a7842daef2` )	2019-03-02 22:44:47 +01:00
Rasmus Larsen	2ca1e73239	Merged in rmlarsen/eigen (pull request PR-597) Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Approved-by: Gael Guennebaud <g.gael@free.fr>	2019-02-25 17:02:16 +00:00
Gael Guennebaud	e409dbba14	Enable SSE vectorization of Quaternion and cross3() with AVX	2019-02-23 10:45:40 +01:00
Gael Guennebaud	0b25a5c431	fix alignment in ploadquad	2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen	1dc1677d52	Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2.	2019-02-22 12:33:57 -08:00
Gael Guennebaud	cca6c207f4	AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM	2019-02-21 17:18:28 +01:00
Gael Guennebaud	1c09ee8541	bug #1674 : workaround clang fast-math aggressive optimizations	2019-02-22 15:48:53 +01:00
Gael Guennebaud	7e3084bb6f	Fix compilation on ARM.	2019-02-22 14:56:12 +01:00
Gael Guennebaud	42c23f14ac	Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.	2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen	4d7f317102	Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.	2019-02-21 13:32:13 -08:00
Gael Guennebaud	2a39659d79	Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.	2019-02-20 15:23:23 +01:00
Gael Guennebaud	302377110a	Update documentation of Matrix and Array type aliases.	2019-02-20 15:18:48 +01:00
Gael Guennebaud	44b54fa4a3	Protect c++11 type alias with Eigen's macro, and add respective unit test.	2019-02-20 14:43:05 +01:00
Gael Guennebaud	7195f008ce	Merged in ra_bauke/eigen (pull request PR-180) alias template for matrix and array classes, see also bug #864 Approved-by: Heiko Bauke <heiko.bauke@mail.de>	2019-02-20 13:22:39 +00:00
Gael Guennebaud	edd413c184	bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode: - this helps clang 5 and 6 to support alignas in STL's containers. - this makes the public API of our (and users) classes cleaner	2019-02-20 13:52:11 +01:00
Gael Guennebaud	482c5fb321	bug #899 : remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.	2019-02-19 22:52:15 +01:00
Christoph Hertzberg	a1646fc960	Commas at the end of enumerator lists are not allowed in C++03	2019-02-19 14:32:25 +01:00
Gael Guennebaud	ab78cabd39	Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.	2019-02-19 14:04:35 +01:00
Gael Guennebaud	115da6a1ea	Fix conversion warnings	2019-02-19 14:00:15 +01:00
Gael Guennebaud	7580112c31	Fix harmless Scalar vs RealScalar cast.	2019-02-18 22:12:28 +01:00
Gael Guennebaud	796db94e6e	bug #1194 : implement slightly faster and SIMD friendly 4x4 determinant.	2019-02-18 16:21:27 +01:00
Gael Guennebaud	31b6e080a9	Fix regression: .conjugate() was popped out but not re-introduced.	2019-02-18 14:45:55 +01:00
Gael Guennebaud	c69d0d08d0	Set cost of conjugate to 0 (in practice it boils down to a no-op). This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)	2019-02-18 14:43:07 +01:00
Gael Guennebaud	512b74aaa1	GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product. Before only sAB was caught which was both inconsistent with GEMM, sub-optimal, and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495).	2019-02-18 11:47:54 +01:00
Christoph Hertzberg	ec032ac03b	Guard C++11-style default constructor. Also, this is only needed for MSVC	2019-02-16 09:44:05 +01:00
Gael Guennebaud	83309068b4	bug #1680 : improve MSVC inlining by declaring many trivial constructors and accessors as STRONG_INLINE.	2019-02-15 16:35:35 +01:00
Gael Guennebaud	0505248f25	bug #1680 : make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC)	2019-02-15 16:33:56 +01:00
Gael Guennebaud	559320745e	bug #1678 : Fix lack of __FMA__ macro on MSVC with AVX512	2019-02-15 10:30:28 +01:00
Gael Guennebaud	d85ae650bf	bug #1678 : workaround MSVC compilation issues with AVX512	2019-02-15 10:24:17 +01:00
Gael Guennebaud	f2970819a2	bug #1679 : avoid possible division by 0 in complex-schur	2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen	65e23ca7e9	Revert `b55b5c7280` .	2019-02-14 13:46:13 -08:00
Gael Guennebaud	bdcb5f3304	Let's properly use Score instead of std::abs, and remove deprecated FIXME ( a /= b does a/b and not a * (1/b) as it was a long time ago...)	2019-02-11 22:56:19 +01:00
Gael Guennebaud	2edfc6807d	Fix compilation of empty products of the form: Mx0 * 0xN	2019-02-11 18:24:07 +01:00
Gael Guennebaud	eb46f34a8c	Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%. Not sure that's so critical, but this does not complexify the code base much.	2019-02-11 17:59:35 +01:00
Gael Guennebaud	ab6e6edc32	Speedup PartialPivLU for small matrices by passing compile-time sizes when available. This change set also makes a better use of Map<>+OuterStride and Ref<> yielding surprising speed up for small dynamic sizes as well. The table below reports times in micro seconds for 10 random matrices: \| ------ float --------- \| ------- double ------- \| size \| before after ratio \| before after ratio \| fixed 1 \| 0.34 0.11 2.93 \| 0.35 0.11 3.06 \| fixed 2 \| 0.81 0.24 3.38 \| 0.91 0.25 3.60 \| fixed 3 \| 1.49 0.49 3.04 \| 1.68 0.55 3.01 \| fixed 4 \| 2.31 0.70 3.28 \| 2.45 1.08 2.27 \| fixed 5 \| 3.49 1.11 3.13 \| 3.84 2.24 1.71 \| fixed 6 \| 4.76 1.64 2.88 \| 4.87 2.84 1.71 \| dyn 1 \| 0.50 0.40 1.23 \| 0.51 0.40 1.26 \| dyn 2 \| 1.08 0.85 1.27 \| 1.04 0.69 1.49 \| dyn 3 \| 1.76 1.26 1.40 \| 1.84 1.14 1.60 \| dyn 4 \| 2.57 1.75 1.46 \| 2.67 1.66 1.60 \| dyn 5 \| 3.80 2.64 1.43 \| 4.00 2.48 1.61 \| dyn 6 \| 5.06 3.43 1.47 \| 5.15 3.21 1.60 \|	2019-02-11 13:58:24 +01:00
Gael Guennebaud	013cc3a6b3	Make GEMM fallback to GEMV for runtime vectors. This is a more general and simpler version of changeset `4c0fa6ce0f`	2019-02-07 16:24:09 +01:00
Gael Guennebaud	fa2fcb4895	Backed out changeset `4c0fa6ce0f`	2019-02-07 16:07:08 +01:00
Gael Guennebaud	b3c4344a68	bug #1676 : workaround GCC's bug in c++17 mode.	2019-02-07 15:21:35 +01:00
Eugene Zhulenev	6d0f6265a9	Remove duplicated comment line	2019-02-04 10:30:25 -08:00
Eugene Zhulenev	690b2c45b1	Fix GeneralBlockPanelKernel Android compilation	2019-02-04 10:29:15 -08:00
Gael Guennebaud	871e2e5339	bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)	2019-02-03 08:54:47 +01:00
Rasmus Larsen	e7b481ea74	Merged in rmlarsen/eigen (pull request PR-578) Speed up Eigen matrix*vector and vector*matrix multiplication. Approved-by: Eugene Zhulenev <ezhulenev@google.com>	2019-02-02 01:53:44 +00:00
Sameer Agarwal	b55b5c7280	Speed up row-major matrix-vector product on ARM The row-major matrix-vector multiplication code uses a threshold to check if processing 8 rows at a time would thrash the cache. This change introduces two modifications to this logic. 1. A smaller threshold for ARM and ARM64 devices. The value of this threshold was determined empirically using a Pixel2 phone, by benchmarking a large number of matrix-vector products in the range [1..4096]x[1..4096] and measuring performance separately on big and little cores with frequency pinning. On big (out-of-order) cores, this change has little to no impact. But on the small (in-order) cores, the matrix-vector products are up to 700% faster. Especially on large matrices. The motivation for this change was some internal code at Google which was using hand-written NEON for implementing similar functionality, processing the matrix one row at a time, which exhibited substantially better performance than Eigen. With the current change, Eigen handily beats that code. 2. Make the logic for choosing number of simultaneous rows apply uniformly to 8, 4 and 2 rows instead of just 8 rows. Since the default threshold for non-ARM devices is essentially unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM performance. This was verified by running the same set of benchmarks on a Xeon desktop.	2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen	4c0fa6ce0f	Speed up Eigen matrix*vector and vector*matrix multiplication. This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector. The benchmarks below test c.noalias()= n_by_n_matrix * n_by_1_matrix; c.noalias()= 1_by_n_matrix * n_by_n_matrix; respectively. Benchmark measurements: SSE: Run on * (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00 CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB Benchmark Base (ns) New (ns) Improvement ------------------------------------------------------------------ BM_MatVec/64 1096 312 +71.5% BM_MatVec/128 4581 1464 +68.0% BM_MatVec/256 18534 5710 +69.2% BM_MatVec/512 118083 24162 +79.5% BM_MatVec/1k 704106 173346 +75.4% BM_MatVec/2k 3080828 742728 +75.9% BM_MatVec/4k 25421512 4530117 +82.2% BM_VecMat/32 352 130 +63.1% BM_VecMat/64 1213 425 +65.0% BM_VecMat/128 4640 1564 +66.3% BM_VecMat/256 17902 5884 +67.1% BM_VecMat/512 70466 24000 +65.9% BM_VecMat/1k 340150 161263 +52.6% BM_VecMat/2k 1420590 645576 +54.6% BM_VecMat/4k 8083859 4364327 +46.0% AVX2: Run on * (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00 CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB Benchmark Base (ns) New (ns) Improvement ------------------------------------------------------------------ BM_MatVec/64 619 120 +80.6% BM_MatVec/128 9693 752 +92.2% BM_MatVec/256 38356 2773 +92.8% BM_MatVec/512 69006 12803 +81.4% BM_MatVec/1k 443810 160378 +63.9% BM_MatVec/2k 2633553 646594 +75.4% BM_MatVec/4k 16211095 4327148 +73.3% BM_VecMat/64 925 227 +75.5% BM_VecMat/128 3438 830 +75.9% BM_VecMat/256 13427 2936 +78.1% BM_VecMat/512 53944 12473 +76.9% BM_VecMat/1k 302264 157076 +48.0% BM_VecMat/2k 1396811 675778 +51.6% BM_VecMat/4k 8962246 4459010 +50.2% AVX512: Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00 CPU: Intel Skylake Xeon with 
HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB Benchmark Base (ns) New (ns) Improvement ------------------------------------------------------------------ BM_MatVec/64 401 111 +72.3% BM_MatVec/128 1846 513 +72.2% BM_MatVec/256 36739 1927 +94.8% BM_MatVec/512 54490 9227 +83.1% BM_MatVec/1k 487374 161457 +66.9% BM_MatVec/2k 2016270 643824 +68.1% BM_MatVec/4k 13204300 4077412 +69.1% BM_VecMat/32 324 106 +67.3% BM_VecMat/64 1034 246 +76.2% BM_VecMat/128 3576 802 +77.6% BM_VecMat/256 13411 2561 +80.9% BM_VecMat/512 58686 10037 +82.9% BM_VecMat/1k 320862 163750 +49.0% BM_VecMat/2k 1406719 651397 +53.7% BM_VecMat/4k 7785179 4124677 +47.0%	2019-01-31 14:24:08 -08:00
Gael Guennebaud	7ef879f6bf	GEBP: improves pipelining in the 1pX4 path with FMA. Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA. With AVX+FMA I measured a speed up of about x1.25 in such cases.	2019-01-30 23:45:12 +01:00
Gael Guennebaud	de77bf5d6c	Fix compilation with ARM64.	2019-01-30 16:48:20 +01:00
Gael Guennebaud	eb4c6bb22d	Fix conflicts and merge	2019-01-30 15:57:08 +01:00
Gael Guennebaud	df12fae8b8	According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 , the previous GCC issue is fixed in GCC trunk (will be gcc 9).	2019-01-30 11:52:28 +01:00
Gael Guennebaud	3775926bba	ARM64 & GEBP: add specialization for double +30% speed up	2019-01-30 11:49:06 +01:00
Gael Guennebaud	be5b0f664a	ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM	2019-01-30 11:48:25 +01:00
Gael Guennebaud	8a06c699d0	bug #1669 : fix PartialPivLU/inverse with zero-sized matrices.	2019-01-29 10:27:13 +01:00
Gael Guennebaud	a2a07e62b9	Fix compilation with c++03 (local classes cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected.	2019-01-29 10:10:07 +01:00
Gael Guennebaud	f489f44519	bug #1574 : implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs.	2019-01-28 17:29:50 +01:00
Gael Guennebaud	803fa79767	Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) function	2019-01-28 17:24:44 +01:00
Christoph Hertzberg	5a52e35f9a	Renaming some more `I` identifiers	2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen	71429883ee	Fix compilation error in NEON GEBP specialization of madd.	2019-01-25 17:00:21 -08:00
Gael Guennebaud	ec8a387972	cleanup	2019-01-24 10:24:45 +01:00
David Tellenbach	237b03b372	PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients.	2019-01-23 00:07:19 +01:00
Gael Guennebaud	80f81f9c4b	Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.	2019-01-22 17:08:47 +01:00
David Tellenbach	db152b9ee6	PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc) - {1,2,3,4,5,...} for fixed-size vectors only - {{1,2,3},{4,5,6}} for the general cases - {{1,2,3,4,5,....}} is allowed for both row and column-vector	2019-01-21 16:25:57 +01:00
nluehr	92774f0275	Replace host_define.h with cuda_runtime_api.h	2019-01-18 16:10:09 -06:00
Christoph Hertzberg	da0a41b9ce	Mask unused-parameter warnings, when building with NDEBUG	2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen	2eccbaf3f7	Add missing logical packet ops for GPU and NEON.	2019-01-17 17:45:08 -08:00
Gael Guennebaud	ee3662abc5	Remove some useless const_cast	2019-01-17 18:27:49 +01:00
Gael Guennebaud	0fe6b7d687	Make nestByValue work again (broken since 3.3) and add unit tests.	2019-01-17 18:27:25 +01:00
Gael Guennebaud	4b7cf7ff82	Extend reshaped unit tests and remove useless const_cast	2019-01-17 17:35:32 +01:00
Gael Guennebaud	b57c9787b1	Cleanup useless const_cast and add missing broadcast assignment tests	2019-01-17 16:55:42 +01:00
Gael Guennebaud	be05d0030d	Make FullPivLU use conjugateIf<>	2019-01-17 12:01:00 +01:00
Patrick Peltzer	15e53d5d93	PR 567: makes all dense solvers inherit SolverBase (LU,Cholesky,QR,SVD). This changeset also includes: * add HouseholderSequence::conjugateIf * define int as the StorageIndex type for all dense solvers * dedicated unit tests, including assertion checking * _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks * CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions * Cholesky: add missing assertions * FullPivHouseholderQR: Corrected Scalar type in _solve_impl() * BDCSVD: Unambiguous return type for ternary operator * SVDBase: Corrected Scalar type in _solve_impl()	2019-01-17 01:17:39 +01:00
Gael Guennebaud	7f32109c11	Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, and make PartialPivLU use it.	2019-01-17 11:33:43 +01:00
Gael Guennebaud	562985bac4	bug #1646 : fix false aliasing detection for A.row(0) = A.col(0); This changeset completely disables the detection for vectors, for which our current mechanism cannot detect any positive aliasing anyway.	2019-01-17 00:14:27 +01:00
Rasmus Munk Larsen	7401e2541d	Fix compilation error for logical packet ops with older compilers.	2019-01-16 14:43:33 -08:00
Gael Guennebaud	0f028f61cb	GEBP: fix swapped kernel mode with AVX512 and complex scalars	2019-01-16 22:26:38 +01:00
Gael Guennebaud	e118ce86fd	GEBP: cleanup logic to choose between 4 packets or 1 packet	2019-01-16 21:47:42 +01:00
Gael Guennebaud	70e133333d	bug #1661 : fix regression in GEBP and AVX512	2019-01-16 21:22:20 +01:00
Gael Guennebaud	502f717980	bug #1646 : disable aliasing detection for empty and 1x1 expression	2019-01-16 14:33:45 +01:00
Gael Guennebaud	0b466b6933	bug #1633 : use proper type for madd temporaries, factorize RhsPacketx4.	2019-01-16 13:50:13 +01:00
Renjie Liu	dbfcceabf5	Bug: 1633: refactor gebp kernel and optimize for neon	2019-01-16 12:51:36 +08:00
Gael Guennebaud	2b70b2f570	Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry	2019-01-15 22:50:42 +01:00
Gael Guennebaud	2c2c114995	Silence maybe-uninitialized warnings by gcc	2019-01-15 16:53:15 +01:00
Gael Guennebaud	6ec6bf0b0d	Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index,Index) on empty matrices by an assertion (+ doc & unit tests)	2019-01-15 15:21:14 +01:00
Gael Guennebaud	027e44ed24	bug #1592 : makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests)	2019-01-15 15:13:24 +01:00
Gael Guennebaud	f8bc5cb39e	Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols. This fixes VectorXd(n).middleCol(0,0).outerSize() which was equal to 1.	2019-01-15 15:09:49 +01:00
Gael Guennebaud	6cf7afa3d9	Typo	2019-01-15 11:04:37 +01:00
Rasmus Larsen	7b3aab0936	Merged in rmlarsen/eigen (pull request PR-570) Add support for inverse hyperbolic functions. Fix cost of division.	2019-01-14 21:31:33 +00:00
Gael Guennebaud	250dcd1fdb	bug #1652 : fix position of EIGEN_ALIGN16 attributes in Neon and Altivec	2019-01-14 21:45:56 +01:00
Rasmus Larsen	5a59452aae	Merged eigen/eigen into default	2019-01-14 10:23:23 -08:00
Gael Guennebaud	3c9e6d206d	AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers	2019-01-14 17:57:28 +01:00
Gael Guennebaud	61b6eb05fe	AVX512 (r)sqrt(double) was mistakenly disabled with clang and others	2019-01-14 17:28:47 +01:00
Gael Guennebaud	ccddeaad90	fix warning	2019-01-14 16:51:16 +01:00
Gael Guennebaud	d4881751d3	Doc: add Isometry in the list of supported Mode of Transform<>	2019-01-14 16:38:26 +01:00
Greg Coombe	9d988a1e1a	Initialize isometric transforms like affine transforms. The isometric transform, like the affine transform, has an implicit last row of [0, 0, 0, 1]. This was not being properly initialized, as verified by a new test function.	2019-01-11 23:14:35 -08:00
Gael Guennebaud	4356a55a61	PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos. This makes both the small and huge argument cases faster because: - for small inputs this removes the last pselect - for large inputs only the reduction part follows a scalar path, the rest use the same SIMD path as the small-argument case.	2019-01-14 13:54:01 +01:00
Gael Guennebaud	f566724023	Fix StorageIndex FIXME in dense LU solvers	2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen	1c6e6e2c3f	Merge.	2019-01-11 17:47:11 -08:00
Rasmus Munk Larsen	28ba1b2c32	Add support for inverse hyperbolic functions. Fix cost of division.	2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen	89c4001d6f	Fix warnings in ptrue for complex and half types.	2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen	a49d01edba	Fix warnings in ptrue for complex and half types.	2019-01-11 13:18:17 -08:00
Rasmus Munk Larsen	df29511ac0	Fix merge.	2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen	9396ace46b	Merge.	2019-01-11 10:28:52 -08:00
Rasmus Larsen	74882471d0	Merged eigen/eigen into default	2019-01-11 10:20:55 -08:00
Gael Guennebaud	9005f0111f	Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fixes a compilation issue with gcc-4.7.	2019-01-11 17:10:54 +01:00
Mark D Ryan	3c9add6598	Remove reinterpret_cast from AVX512 complex implementation The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&) ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working correctly. They're used to convert the kernel parameters to PacketBlock<Packet8d,T>& so that the complex number versions of ptranspose can be written using the existing double implementations. Unfortunately, they don't seem to work and are responsible for 9 unit test failures in the AVX512 build of tensorflow master. This commit fixes the issue by manually initialising PacketBlock<Packet8d,T> variables with the contents of the kernel parameter before calling the double version of ptranspose, and then copying the resulting values back into the kernel parameter before returning.	2019-01-11 14:02:09 +01:00
Rasmus Munk Larsen	fcfced13ed	Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.	2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen	e15bb785ad	Collapsed revision * Add packet op "pones". Write pnot(a) as pxor(pones(a), a). * Collapsed revision * Simplify a bit. * Undo useless diffs. * Fix typo.	2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen	f6ba6071c5	Fix typo.	2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen	8f04442526	Collapsed revision * Collapsed revision * Add packet op "pones". Write pnot(a) as pxor(pones(a), a). * Collapsed revision * Simplify a bit. * Undo useless diffs. * Fix typo.	2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen	e00521b514	Undo useless diffs.	2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen	f2767112c8	Simplify a bit.	2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen	cb955df9a6	Add packet op "pones". Write pnot(a) as pxor(pones(a), a).	2019-01-09 16:17:08 -08:00
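The identity named in commits cb955df9a6 and fcfced13ed, pnot(a) = pxor(pones(a), a), can be illustrated on a single 32-bit lane. This is only a scalar sketch: the function names mirror the commit messages, but the bodies are assumptions and not Eigen's packet implementations, which operate on SIMD registers.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-in for a packet of one 32-bit lane (illustrative assumption).
using Lane = std::uint32_t;

// All bits set; the argument only selects the type, as in the commit's pones(a).
constexpr Lane pones(Lane /*a*/) { return 0xFFFFFFFFu; }

// Bitwise exclusive-or of two lanes.
constexpr Lane pxor(Lane a, Lane b) { return a ^ b; }

// Bitwise NOT expressed as XOR with an all-ones mask.
constexpr Lane pnot(Lane a) { return pxor(pones(a), a); }

// Compile-time check that the identity matches the built-in complement.
static_assert(pnot(0x0F0F0F0Fu) == static_cast<Lane>(~0x0F0F0F0Fu),
              "XOR with all-ones equals bitwise NOT");

// Run-time check that applying pnot twice restores the input.
inline void self_check() { assert(pnot(pnot(12345u)) == 12345u); }
```

Expressing NOT through XOR is useful on instruction sets that provide a vector XOR but no dedicated vector NOT, since only an all-ones constant is needed.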
Rasmus Larsen	cb3c059fa4	Merged eigen/eigen into default	2019-01-09 15:04:17 -08:00
Gael Guennebaud	d812f411c3	bug #1654 : fix compilation with cuda and no c++11	2019-01-09 18:00:05 +01:00
Gael Guennebaud	3492a1ca74	fix plog(+inf) with AVX512	2019-01-09 16:53:37 +01:00
Gael Guennebaud	47810cf5b7	Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSX	2019-01-09 16:40:42 +01:00
Gael Guennebaud	3f14e0d19e	fix warning	2019-01-09 15:45:21 +01:00
Gael Guennebaud	aeec68f77b	Add missing pcmp_lt and others for AVX512	2019-01-09 15:36:41 +01:00
Gael Guennebaud	e6b217b8dd	bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows: - no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs - FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs	2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen	055f0b73db	Add support for pcmp_eq and pnot, including for complex types.	2019-01-07 16:53:36 -08:00
Eugene Zhulenev	190d053e41	Explicitly set fill character when printing aligned data to ostream	2019-01-03 14:55:28 -08:00
Mark D Ryan	bc5dd4cafd	PR560: Fix the AVX512f only builds Commit `c53eececb0` introduced AVX512 support for complex numbers but required avx512dq to build. Commit `1d683ae2f5` fixed some, but it would seem not all, of the hard avx512dq dependencies. Build failures are still evident on Eigen and TensorFlow when compiling with just avx512f and no avx512dq using gcc 7.3. Looking at the code there does indeed seem to be a problem. Commit `c53eececb0` calls avx512dq intrinsics directly, e.g., _mm512_extractf32x8_ps and _mm512_and_ps. This commit fixes the issue by replacing the direct intrinsic calls with the various wrapper functions that are safe to use on avx512f only builds.	2019-01-03 14:33:04 +01:00
Gael Guennebaud	60d3fe9a89	One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)	2018-12-24 13:05:03 +01:00
Gael Guennebaud	4aa667b510	Add EIGEN_STRONG_INLINE where required	2018-12-24 10:45:01 +01:00
Gael Guennebaud	961ff567e8	Add missing pcmp_lt_or_nan for AVX512	2018-12-23 22:13:29 +01:00
Gael Guennebaud	0f6f75bd8a	Implement a faster fix for sin/cos of large entries that also correctly handle INF input.	2018-12-23 17:26:21 +01:00
Gael Guennebaud	38d704def8	Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)	2018-12-23 16:13:24 +01:00
Gael Guennebaud	5713fb7feb	Fix plog(+INF): it returned ~87 instead of +INF	2018-12-23 15:40:52 +01:00
Christoph Hertzberg	6dd93f7e3b	Make code compile again for older compilers. See https://stackoverflow.com/questions/7411515/	2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves	1024a70e82	gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs The patch works by altering the gebp lhs packing routines to also consider ½ and ¼ packet length rows when packing, besides the original whole package and row-by-row attempts. Finally, gebp itself will try to fit a fraction of a packet at a time if: i) ½ and/or ¼ packets are available for the current context (e.g. AVX2 and SSE-sized SIMD register for x86) ii) The matrix's height is favorable to it (it may be it's too small in that dimension to take full advantage of the current/maximum packet width or it may be the case that last rows may take advantage of smaller packets before gebp goes row-by-row) This helps mitigate huge slowdowns one had on AVX512 builds when compared to AVX2 ones, for some dimensions. Gains top at an extra 1x in throughput. This patch is a complement to changeset `4ad359237a` . Since packing is changed, Eigen users which would go for very low-level API usage, like TensorFlow, will have to be adapted to work fine with the changes.	2018-12-21 11:03:18 -08:00
Gustavo Lima Chaves	e763fcd09e	Introducing "vectorized" byte on unpacket_traits structs This is a preparation to a change on gebp_traits, where a new template argument will be introduced to dictate the packet size, so it won't be bound to the current/max packet size only anymore. By having packet types defined early on gebp_traits, one has now to act on packet types, not scalars anymore, for the enum values defined on that class. One approach for reaching the vectorizable/size properties one needs there could be getting the packet's scalar again with unpacket_traits<>, then the size/Vectorizable enum entries from packet_traits<>. It turns out guards like "#ifndef EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet variations of packet_traits<> for some types (and it makes sense to keep that). In other words, one can't go back to the scalar and create a new PacketType, as this will always lead to the maximum packet type for the architecture. The less costly/invasive solution for that, thus, is to add the vectorizable info on every unpacket_traits struct as well.	2018-12-19 14:24:44 -08:00
Gael Guennebaud	efa4c9c40f	bug #1615 : slightly increase the default unrolling limit to compensate for changeset `101ea26f5e` . This solves a performance regression with clang and 3x3 matrix products.	2018-12-13 10:42:39 +01:00
Gael Guennebaud	f582ea3579	Fix compilation with expression template scalar type.	2018-12-12 22:47:00 +01:00
Gael Guennebaud	2de8da70fd	bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.	2018-12-12 17:30:08 +01:00
Gael Guennebaud	37c91e1836	bug #1644 : fix warning	2018-12-11 22:07:20 +01:00
Gael Guennebaud	f159cf3d75	Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels. With a 6pX4 kernel (not committed yet), this provides a +20% speedup.	2018-12-11 15:36:27 +01:00
Gael Guennebaud	0a7e7af6fd	Properly set the number of registers for AVX512	2018-12-11 15:33:17 +01:00
Gael Guennebaud	7166496f70	bug #1643 : fix compilation issue with gcc and no optimization	2018-12-11 13:24:42 +01:00
Gael Guennebaud	0d90637838	enable spilling workaround on architectures with SSE/AVX	2018-12-10 23:22:44 +01:00
Gael Guennebaud	bff90bf270	workaround "may be used uninitialized" warning	2018-12-08 18:58:28 +01:00
Gael Guennebaud	81c27325ae	bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512	2018-12-08 14:27:48 +01:00
Gael Guennebaud	426bce7529	fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target	2018-12-08 09:44:21 +01:00
Gael Guennebaud	956678a4ef	bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.	2018-12-07 18:03:36 +01:00
Gael Guennebaud	7b6d0ff1f6	Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error.	2018-12-07 15:14:50 +01:00
Gael Guennebaud	f233c6194d	bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA	2018-12-07 10:01:09 +01:00
Gael Guennebaud	ae59a7652b	bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA	2018-12-07 09:23:28 +01:00
Gael Guennebaud	4e7746fe22	bug #1636 : fix gemm performance issue with gcc>=6 and no FMA	2018-12-07 09:15:46 +01:00
Gael Guennebaud	cbf2f4b7a0	AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only	2018-12-06 18:21:56 +01:00
Gael Guennebaud	1d683ae2f5	Fix compilation with avx512f only, i.e., no AVX512DQ	2018-12-06 18:11:07 +01:00
Gael Guennebaud	c53eececb0	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Gael Guennebaud	3fba59ea59	temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!	2018-12-06 00:13:26 +01:00
Gael Guennebaud	1ac2695ef7	bug #1636 : fix compilation with some ABI versions.	2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen	47d8b741b2	#elif -> #else to fix GPU build.	2018-12-05 13:19:31 -08:00
Christoph Hertzberg	c1d356e8b4	bug #1635 : Use infinity from Numtraits instead of creating it manually.	2018-12-05 15:01:04 +01:00
Rasmus Munk Larsen	b57b31cce9	Merged in ezhulenev/eigen-01 (pull request PR-553) Do not disable alignment with EIGEN_GPUCC Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-12-04 23:47:19 +00:00
Eugene Zhulenev	0bb15bb6d6	Update checks in ConfigureVectorization.h	2018-12-03 17:10:40 -08:00
Eugene Zhulenev	fd0fbfa9b5	Do not disable alignment with EIGEN_GPUCC	2018-12-03 15:54:10 -08:00
Christoph Hertzberg	919414b9fe	bug #785 : Make Cholesky decomposition work for empty matrices	2018-12-03 16:18:15 +01:00
Gael Guennebaud	0ea7ae7213	Add missing padd for Packet8i (it was implicitly generated by clang and gcc)	2018-11-30 21:52:25 +01:00
Gael Guennebaud	ab4df3e6ff	bug #1634 : remove double copy in move-ctor of non movable Matrix/Array	2018-11-30 21:25:51 +01:00
Gael Guennebaud	c785464430	Add packet sin and cos to Altivec/VSX and NEON	2018-11-30 16:21:33 +01:00
Gael Guennebaud	69ace742be	Several improvements regarding packet-bitwise operations: - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...)	2018-11-30 15:56:08 +01:00

6242 Commits