eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-24 14:45:14 +08:00

Author	SHA1	Message	Date
Eugene Zhulenev	a7b7f3ca8a	Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix a few other doxygen warnings	2019-04-23 17:23:19 -07:00
Eugene Zhulenev	68a2a8c445	Use packet ops instead of AVX2 intrinsics	2019-04-23 11:41:02 -07:00
Anuj Rawat	8c7a6feb8e	Adding lowlevel APIs for optimized RHS packet load in TensorFlow SpatialConvolution	2019-04-20 06:46:43 +00:00

    Low-level APIs are added in order to optimize packet load in gemm_pack_rhs in TensorFlow SpatialConvolution. The optimization is for the scenario where a packet is split across 2 adjacent columns. In this case we read it as two 'partial' packets and then merge these into 1. Currently this only works for Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other packet types (such as Packet8d) also.

    This optimization shows significant speedup in SpatialConvolution with certain parameters. Some examples are below.

    Benchmark parameters are specified as: Batch size, Input dim, Depth, Num of filters, Filter dim
    Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

    AVX512:

    Parameters                 | Speedup (Num of threads: 1, 2, 4, 8, 16)
    ---------------------------|------------------------------------------
    128, 24x24, 3, 64, 5x5     | 2.18X, 2.13X, 1.73X, 1.64X, 1.66X
    128, 24x24, 1, 64, 8x8     | 2.00X, 1.98X, 1.93X, 1.91X, 1.91X
    32, 24x24, 3, 64, 5x5      | 2.26X, 2.14X, 2.17X, 2.22X, 2.33X
    128, 24x24, 3, 64, 3x3     | 1.51X, 1.45X, 1.45X, 1.67X, 1.57X
    32, 14x14, 24, 64, 5x5     | 1.21X, 1.19X, 1.16X, 1.70X, 1.17X
    128, 128x128, 3, 96, 11x11 | 2.17X, 2.18X, 2.19X, 2.20X, 2.18X

    AVX2:

    Parameters                 | Speedup (Num of threads: 1, 2, 4, 8, 16)
    ---------------------------|------------------------------------------
    128, 24x24, 3, 64, 5x5     | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
    32, 24x24, 3, 64, 5x5      | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
    128, 24x24, 1, 64, 5x5     | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
    128, 24x24, 3, 64, 3x3     | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
    128, 128x128, 3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

    In the higher level benchmark cifar10, we observe a runtime improvement of around 6% for AVX512 on Intel Skylake server (8 cores).

    On lower level PackRhs micro-benchmarks specified in TensorFlow tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe the following runtime numbers:

    AVX512:

    Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
    ---------------------------------------------------------------|----------------------------|-------------------------|---------
    BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 41350                      | 15073                   | 2.74X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  | 7277                       | 7341                    | 0.99X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  | 8675                       | 8681                    | 1.00X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 24155                      | 16079                   | 1.50X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 25052                      | 17152                   | 1.46X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 18269                      | 18345                   | 1.00X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 19468                      | 19872                   | 0.98X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 156060                     | 42432                   | 3.68X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 132701                     | 36944                   | 3.59X

    AVX2:

    Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
    ---------------------------------------------------------------|----------------------------|-------------------------|---------
    BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 26233                      | 12393                   | 2.12X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  | 6091                       | 6062                    | 1.00X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  | 7427                       | 7408                    | 1.00X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 23453                      | 20826                   | 1.13X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 23167                      | 22091                   | 1.09X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422                      | 23682                   | 0.99X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165                      | 23663                   | 0.98X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 72689                      | 44969                   | 1.62X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 61732                      | 39779                   | 1.55X

    All benchmarks on Intel Skylake server with 8 cores.

Gael Guennebaud	45e65fbb77	bug #1695 : fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign.	2019-03-27 20:16:58 +01:00
William D. Irons	8de66719f9	Collapsed revision from PR-619 * Add support for pcmp_eq in AltiVec/Complex.h * Fixed implementation of pcmp_eq for double The new logic is based on the logic from NEON for double.	2019-03-26 18:14:49 +00:00
Gael Guennebaud	f11364290e	ICC does not support -fno-unsafe-math-optimizations	2019-03-22 09:26:24 +01:00
Deven Desai	51e399fc15	Updates requested in the PR feedback. Also dropping code within #ifdef EIGEN_HAS_OLD_HIP_FP16	2019-03-19 21:45:25 +00:00
Deven Desai	2dbea5510f	Merged eigen/eigen into default	2019-03-19 16:52:38 -04:00
Rasmus Larsen	5c93b38c5f	Merged in rmlarsen/eigen (pull request PR-618) Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>. Approved-by: Gael Guennebaud <g.gael@free.fr>	2019-03-18 15:51:55 +00:00
Gael Guennebaud	cf7e2e277f	bug #1692 : enable enum as sizes of Matrix and Array	2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen	e42f9aa68a	Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.	2019-03-15 17:15:14 -07:00
Rasmus Munk Larsen	8450a6d519	Clean up half packet traits and add a few more missing packet ops.	2019-03-14 15:18:06 -07:00
David Tellenbach	97f9a46cb9	PR 593: Add variadic ctor for DiagonalMatrix with unit tests	2019-03-14 10:18:24 +01:00
Rasmus Munk Larsen	6a34003141	Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2.	2019-03-13 11:52:41 -07:00
Gael Guennebaud	d7d2f0680e	bug #1684 : partially workaround clang's 6/7 bug #40815	2019-03-13 10:40:01 +01:00
Rasmus Munk Larsen	77f7d4a894	Clean up PacketMathHalf.h and add a few missing logical packet ops.	2019-03-11 17:51:16 -07:00
Gael Guennebaud	656d9bc66b	Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax	2019-03-10 21:19:18 +01:00
Rasmus Larsen	4d808e834a	Merged in rmlarsen/eigen_threadpool (pull request PR-606) Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239` Approved-by: Sameer Agarwal <sameeragarwal@google.com>	2019-03-06 17:59:03 +00:00
Gael Guennebaud	bfbf7da047	bug #1689 fix used-but-marked-unused warning	2019-03-05 23:46:24 +01:00
Rasmus Munk Larsen	0318fc7f44	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
Gael Guennebaud	b0d406d91c	Enable construction of Ref<VectorType> from a runtime vector.	2019-03-03 15:25:25 +01:00
Sam Hasinoff	9ba81cf0ff	Fully qualify Eigen::internal::aligned_free This helps avoid a conflict on certain Windows toolchains (potentially due to some ADL name resolution bug) in the case where aligned_free is defined in the global namespace. In any case, tightening this up is harmless.	2019-03-02 17:42:16 +00:00
Gael Guennebaud	22144e949d	bug #1629 : fix compilation of PardisoSupport (regression introduced in changeset `a7842daef2` )	2019-03-02 22:44:47 +01:00
Rasmus Larsen	2ca1e73239	Merged in rmlarsen/eigen (pull request PR-597) Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Approved-by: Gael Guennebaud <g.gael@free.fr>	2019-02-25 17:02:16 +00:00
Gael Guennebaud	e409dbba14	Enable SSE vectorization of Quaternion and cross3() with AVX	2019-02-23 10:45:40 +01:00
Gael Guennebaud	0b25a5c431	fix alignment in ploadquad	2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen	1dc1677d52	Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2.	2019-02-22 12:33:57 -08:00
Gael Guennebaud	cca6c207f4	AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM	2019-02-21 17:18:28 +01:00
Gael Guennebaud	1c09ee8541	bug #1674 : workaround clang fast-math aggressive optimizations	2019-02-22 15:48:53 +01:00
Gael Guennebaud	7e3084bb6f	Fix compilation on ARM.	2019-02-22 14:56:12 +01:00
Gael Guennebaud	42c23f14ac	Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.	2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen	4d7f317102	Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.	2019-02-21 13:32:13 -08:00
Gael Guennebaud	2a39659d79	Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.	2019-02-20 15:23:23 +01:00
Gael Guennebaud	302377110a	Update documentation of Matrix and Array type aliases.	2019-02-20 15:18:48 +01:00
Gael Guennebaud	44b54fa4a3	Protect c++11 type alias with Eigen's macro, and add respective unit test.	2019-02-20 14:43:05 +01:00
Gael Guennebaud	7195f008ce	Merged in ra_bauke/eigen (pull request PR-180) alias template for matrix and array classes, see also bug #864 Approved-by: Heiko Bauke <heiko.bauke@mail.de>	2019-02-20 13:22:39 +00:00
Gael Guennebaud	edd413c184	bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode: - this helps clang 5 and 6 to support alignas in STL's containers. - this makes the public API of our (and users) classes cleaner	2019-02-20 13:52:11 +01:00
Gael Guennebaud	482c5fb321	bug #899 : remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.	2019-02-19 22:52:15 +01:00
Christoph Hertzberg	a1646fc960	Commas at the end of enumerator lists are not allowed in C++03	2019-02-19 14:32:25 +01:00
Gael Guennebaud	ab78cabd39	Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.	2019-02-19 14:04:35 +01:00
Gael Guennebaud	115da6a1ea	Fix conversion warnings	2019-02-19 14:00:15 +01:00
Gael Guennebaud	7580112c31	Fix harmless Scalar vs RealScalar cast.	2019-02-18 22:12:28 +01:00
Gael Guennebaud	796db94e6e	bug #1194 : implement slightly faster and SIMD friendly 4x4 determinant.	2019-02-18 16:21:27 +01:00
Gael Guennebaud	31b6e080a9	Fix regression: .conjugate() was popped out but not re-introduced.	2019-02-18 14:45:55 +01:00
Gael Guennebaud	c69d0d08d0	Set cost of conjugate to 0 (in practice it boils down to a no-op). This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)	2019-02-18 14:43:07 +01:00
Gael Guennebaud	512b74aaa1	GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product. Before only sAB was caught which was both inconsistent with GEMM, sub-optimal, and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495).	2019-02-18 11:47:54 +01:00
Christoph Hertzberg	ec032ac03b	Guard C++11-style default constructor. Also, this is only needed for MSVC	2019-02-16 09:44:05 +01:00
Gael Guennebaud	83309068b4	bug #1680 : improve MSVC inlining by declaring many trivial constructors and accessors as STRONG_INLINE.	2019-02-15 16:35:35 +01:00
Gael Guennebaud	0505248f25	bug #1680 : make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC)	2019-02-15 16:33:56 +01:00
Gael Guennebaud	559320745e	bug #1678 : Fix lack of __FMA__ macro on MSVC with AVX512	2019-02-15 10:30:28 +01:00
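
Commit 8c7a6feb8e in the list above describes reading a packet that straddles two adjacent columns as two 'partial' packets and then merging them into one. The following is only a scalar sketch of that idea, not the code from the commit: the names `Packet`, `load_partial`, `merge_partial` and `load_split` are hypothetical, the packet is modelled as a plain array of 8 floats rather than a SIMD register, and `n` is assumed to be at most the packet size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an 8-lane float packet (not Eigen's Packet8f).
constexpr std::size_t kPacketSize = 8;
using Packet = std::array<float, kPacketSize>;

// Load the first n lanes from src and leave the remaining lanes at zero.
inline Packet load_partial(const float* src, std::size_t n) {
  Packet p{};  // value-initialised: all lanes start at 0
  for (std::size_t i = 0; i < n && i < kPacketSize; ++i) p[i] = src[i];
  return p;
}

// Keep the first n lanes of a and fill the rest with the leading lanes of b.
inline Packet merge_partial(const Packet& a, const Packet& b, std::size_t n) {
  Packet out = a;
  for (std::size_t i = n; i < kPacketSize; ++i) out[i] = b[i - n];
  return out;
}

// Build one full packet from the last n elements of one column and the
// first (kPacketSize - n) elements of the next column.
// Assumes n <= kPacketSize.
inline Packet load_split(const float* column_tail, std::size_t n,
                         const float* next_column_head) {
  const Packet first = load_partial(column_tail, n);
  const Packet second = load_partial(next_column_head, kPacketSize - n);
  return merge_partial(first, second, n);
}
```

A real implementation would presumably use masked or blended SIMD loads instead of scalar loops; the sketch only shows how the lanes are laid out after the merge.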

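Commits e42f9aa68a and 5c93b38c5f in the list above make clipping outside [-18:18] consistent between the vectorized and non-vectorized paths of scalar_logistic_op<float>. As a rough illustration of why a shared clipping step gives both paths the same answer, here is a minimal sketch. It assumes that "clipping" means clamping the argument to [-18, 18] before evaluating 1 / (1 + exp(-x)); the function names are hypothetical and this is not Eigen's implementation.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical scalar path: clamp the argument to [-18, 18], then evaluate
// the logistic function 1 / (1 + exp(-x)).
inline float clipped_logistic(float x) {
  const float clipped = std::min(18.0f, std::max(-18.0f, x));
  return 1.0f / (1.0f + std::exp(-clipped));
}

// Hypothetical array path: applies the same helper to every element, so any
// input outside [-18, 18] yields exactly the same value as the scalar path.
inline void clipped_logistic_array(const float* in, float* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = clipped_logistic(in[i]);
}
```

Because both entry points share one clamp, inputs such as 100.0f and 18.0f map to the same result on either path, which is the kind of consistency the commit messages describe.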
5974 Commits