Rasmus Larsen
4d808e834a
Merged in rmlarsen/eigen_threadpool (pull request PR-606)
...
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
Approved-by: Sameer Agarwal <sameeragarwal@google.com>
2019-03-06 17:59:03 +00:00
Gael Guennebaud
bfbf7da047
bug #1689 fix used-but-marked-unused warning
2019-03-05 23:46:24 +01:00
Rasmus Munk Larsen
0318fc7f44
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
2019-03-05 10:24:54 -08:00
Gael Guennebaud
b0d406d91c
Enable construction of Ref<VectorType> from a runtime vector.
2019-03-03 15:25:25 +01:00
Sam Hasinoff
9ba81cf0ff
Fully qualify Eigen::internal::aligned_free
...
This helps avoids a conflict on certain Windows toolchains
(potentially due to some ADL name resolution bug) in the case
where aligned_free is defined in the global namespace. In any
case, tightening this up is harmless.
2019-03-02 17:42:16 +00:00
Gael Guennebaud
22144e949d
bug #1629 : fix compilation of PardisoSupport (regression introduced in changeset a7842daef2
...
)
2019-03-02 22:44:47 +01:00
Rasmus Larsen
2ca1e73239
Merged in rmlarsen/eigen (pull request PR-597)
...
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2.
Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-25 17:02:16 +00:00
Gael Guennebaud
e409dbba14
Enable SSE vectorization of Quaternion and cross3() with AVX
2019-02-23 10:45:40 +01:00
Gael Guennebaud
0b25a5c431
fix alignment in ploadquad
2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen
1dc1677d52
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2.
2019-02-22 12:33:57 -08:00
Gael Guennebaud
cca6c207f4
AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM
2019-02-21 17:18:28 +01:00
Gael Guennebaud
1c09ee8541
bug #1674 : workaround clang fast-math aggressive optimizations
2019-02-22 15:48:53 +01:00
Gael Guennebaud
7e3084bb6f
Fix compilation on ARM.
2019-02-22 14:56:12 +01:00
Gael Guennebaud
42c23f14ac
Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.
2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen
4d7f317102
Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.
2019-02-21 13:32:13 -08:00
Gael Guennebaud
2a39659d79
Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.
2019-02-20 15:23:23 +01:00
Gael Guennebaud
302377110a
Update documentation of Matrix and Array type aliases.
2019-02-20 15:18:48 +01:00
Gael Guennebaud
44b54fa4a3
Protect c++11 type alias with Eigen's macro, and add respective unit test.
2019-02-20 14:43:05 +01:00
Gael Guennebaud
7195f008ce
Merged in ra_bauke/eigen (pull request PR-180)
...
alias template for matrix and array classes, see also bug #864
Approved-by: Heiko Bauke <heiko.bauke@mail.de>
2019-02-20 13:22:39 +00:00
Gael Guennebaud
edd413c184
bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
...
- this helps clang 5 and 6 to support alignas in STL's containers.
- this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
482c5fb321
bug #899 : remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.
2019-02-19 22:52:15 +01:00
Christoph Hertzberg
a1646fc960
Commas at the end of enumerator lists are not allowed in C++03
2019-02-19 14:32:25 +01:00
Gael Guennebaud
ab78cabd39
Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.
2019-02-19 14:04:35 +01:00
Gael Guennebaud
115da6a1ea
Fix conversion warnings
2019-02-19 14:00:15 +01:00
Gael Guennebaud
7580112c31
Fix harmless Scalar vs RealScalar cast.
2019-02-18 22:12:28 +01:00
Gael Guennebaud
796db94e6e
bug #1194 : implement slightly faster and SIMD friendly 4x4 determinant.
2019-02-18 16:21:27 +01:00
Gael Guennebaud
31b6e080a9
Fix regression: .conjugate() was popped out but not re-introduced.
2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0
Set cost of conjugate to 0 (in practice it boils down to a no-op).
...
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1
GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
...
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495 ).
2019-02-18 11:47:54 +01:00
Christoph Hertzberg
ec032ac03b
Guard C++11-style default constructor. Also, this is only needed for MSVC
2019-02-16 09:44:05 +01:00
Gael Guennebaud
83309068b4
bug #1680 : improve MSVC inlining by declaring many triavial constructors and accessors as STRONG_INLINE.
2019-02-15 16:35:35 +01:00
Gael Guennebaud
0505248f25
bug #1680 : make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC)
2019-02-15 16:33:56 +01:00
Gael Guennebaud
559320745e
bug #1678 : Fix lack of __FMA__ macro on MSVC with AVX512
2019-02-15 10:30:28 +01:00
Gael Guennebaud
d85ae650bf
bug #1678 : workaround MSVC compilation issues with AVX512
2019-02-15 10:24:17 +01:00
Gael Guennebaud
f2970819a2
bug #1679 : avoid possible division by 0 in complex-schur
2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen
65e23ca7e9
Revert b55b5c7280
...
.
2019-02-14 13:46:13 -08:00
Gael Guennebaud
bdcb5f3304
Let's properly use Score instead of std::abs, and remove deprecated FIXME ( a /= b does a/b and not a * (1/b) as it was a long time ago...)
2019-02-11 22:56:19 +01:00
Gael Guennebaud
2edfc6807d
Fix compilation of empty products of the form: Mx0 * 0xN
2019-02-11 18:24:07 +01:00
Gael Guennebaud
eb46f34a8c
Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.
...
Not sure that's so critical, but this does not complexify the code base much.
2019-02-11 17:59:35 +01:00
Gael Guennebaud
ab6e6edc32
Speedup PartialPivLU for small matrices by passing compile-time sizes when available.
...
This change set also makes a better use of Map<>+OuterStride and Ref<> yielding surprising speed up for small dynamic sizes as well.
The table below reports times in micro seconds for 10 random matrices:
| ------ float --------- | ------- double ------- |
size | before after ratio | before after ratio |
fixed 1 | 0.34 0.11 2.93 | 0.35 0.11 3.06 |
fixed 2 | 0.81 0.24 3.38 | 0.91 0.25 3.60 |
fixed 3 | 1.49 0.49 3.04 | 1.68 0.55 3.01 |
fixed 4 | 2.31 0.70 3.28 | 2.45 1.08 2.27 |
fixed 5 | 3.49 1.11 3.13 | 3.84 2.24 1.71 |
fixed 6 | 4.76 1.64 2.88 | 4.87 2.84 1.71 |
dyn 1 | 0.50 0.40 1.23 | 0.51 0.40 1.26 |
dyn 2 | 1.08 0.85 1.27 | 1.04 0.69 1.49 |
dyn 3 | 1.76 1.26 1.40 | 1.84 1.14 1.60 |
dyn 4 | 2.57 1.75 1.46 | 2.67 1.66 1.60 |
dyn 5 | 3.80 2.64 1.43 | 4.00 2.48 1.61 |
dyn 6 | 5.06 3.43 1.47 | 5.15 3.21 1.60 |
2019-02-11 13:58:24 +01:00
Gael Guennebaud
013cc3a6b3
Make GEMM fallback to GEMV for runtime vectors.
...
This is a more general and simpler version of changeset 4c0fa6ce0f
2019-02-07 16:24:09 +01:00
Gael Guennebaud
fa2fcb4895
Backed out changeset 4c0fa6ce0f
2019-02-07 16:07:08 +01:00
Gael Guennebaud
b3c4344a68
bug #1676 : workaround GCC's bug in c++17 mode.
2019-02-07 15:21:35 +01:00
Eugene Zhulenev
6d0f6265a9
Remove duplicated comment line
2019-02-04 10:30:25 -08:00
Eugene Zhulenev
690b2c45b1
Fix GeneralBlockPanelKernel Android compilation
2019-02-04 10:29:15 -08:00
Gael Guennebaud
871e2e5339
bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)
2019-02-03 08:54:47 +01:00
Rasmus Larsen
e7b481ea74
Merged in rmlarsen/eigen (pull request PR-578)
...
Speed up Eigen matrix*vector and vector*matrix multiplication.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2019-02-02 01:53:44 +00:00
Sameer Agarwal
b55b5c7280
Speed up row-major matrix-vector product on ARM
...
The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.
This change introduces two modifications to this logic.
1. A smaller threshold for ARM and ARM64 devices.
The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on
small and little cores with frequency pinning.
On big (out-of-order) cores, this change has little to no impact. But
on the small (in-order) cores, the matrix-vector products are up to
700% faster. Especially on large matrices.
The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.
With the current change, Eigen handily beats that code.
2. Make the logic for choosing number of simultaneous rows apply
unifiormly to 8, 4 and 2 rows instead of just 8 rows.
Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen
4c0fa6ce0f
Speed up Eigen matrix*vector and vector*matrix multiplication.
...
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
The benchmarks below test
c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;
respectively.
Benchmark measurements:
SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 1096 312 +71.5%
BM_MatVec/128 4581 1464 +68.0%
BM_MatVec/256 18534 5710 +69.2%
BM_MatVec/512 118083 24162 +79.5%
BM_MatVec/1k 704106 173346 +75.4%
BM_MatVec/2k 3080828 742728 +75.9%
BM_MatVec/4k 25421512 4530117 +82.2%
BM_VecMat/32 352 130 +63.1%
BM_VecMat/64 1213 425 +65.0%
BM_VecMat/128 4640 1564 +66.3%
BM_VecMat/256 17902 5884 +67.1%
BM_VecMat/512 70466 24000 +65.9%
BM_VecMat/1k 340150 161263 +52.6%
BM_VecMat/2k 1420590 645576 +54.6%
BM_VecMat/4k 8083859 4364327 +46.0%
AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 619 120 +80.6%
BM_MatVec/128 9693 752 +92.2%
BM_MatVec/256 38356 2773 +92.8%
BM_MatVec/512 69006 12803 +81.4%
BM_MatVec/1k 443810 160378 +63.9%
BM_MatVec/2k 2633553 646594 +75.4%
BM_MatVec/4k 16211095 4327148 +73.3%
BM_VecMat/64 925 227 +75.5%
BM_VecMat/128 3438 830 +75.9%
BM_VecMat/256 13427 2936 +78.1%
BM_VecMat/512 53944 12473 +76.9%
BM_VecMat/1k 302264 157076 +48.0%
BM_VecMat/2k 1396811 675778 +51.6%
BM_VecMat/4k 8962246 4459010 +50.2%
AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 401 111 +72.3%
BM_MatVec/128 1846 513 +72.2%
BM_MatVec/256 36739 1927 +94.8%
BM_MatVec/512 54490 9227 +83.1%
BM_MatVec/1k 487374 161457 +66.9%
BM_MatVec/2k 2016270 643824 +68.1%
BM_MatVec/4k 13204300 4077412 +69.1%
BM_VecMat/32 324 106 +67.3%
BM_VecMat/64 1034 246 +76.2%
BM_VecMat/128 3576 802 +77.6%
BM_VecMat/256 13411 2561 +80.9%
BM_VecMat/512 58686 10037 +82.9%
BM_VecMat/1k 320862 163750 +49.0%
BM_VecMat/2k 1406719 651397 +53.7%
BM_VecMat/4k 7785179 4124677 +47.0%
Currently watchingStop watching
2019-01-31 14:24:08 -08:00
Gael Guennebaud
7ef879f6bf
GEBP: improves pipelining in the 1pX4 path with FMA.
...
Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA.
With AVX+FMA I measured a speed up of about x1.25 in such cases.
2019-01-30 23:45:12 +01:00