eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-02-11 18:00:51 +08:00


Sameer Agarwal b55b5c7280 Speed up row-major matrix-vector product on ARM		2019-02-01 15:23:53 -08:00

The row-major matrix-vector multiplication code uses a threshold to check whether processing 8 rows at a time would thrash the cache. This change introduces two modifications to this logic.

1. A smaller threshold for ARM and ARM64 devices. The value of this threshold was determined empirically using a Pixel2 phone, by benchmarking a large number of matrix-vector products in the range [1..4096]x[1..4096] and measuring performance separately on big and little cores with frequency pinning. On big (out-of-order) cores, this change has little to no impact. But on the small (in-order) cores, the matrix-vector products are up to 700% faster, especially on large matrices. The motivation for this change was some internal code at Google which was using hand-written NEON for implementing similar functionality, processing the matrix one row at a time, which exhibited substantially better performance than Eigen. With the current change, Eigen handily beats that code.

2. Make the logic for choosing the number of simultaneous rows apply uniformly to 8, 4 and 2 rows instead of just 8 rows.

Since the default threshold for non-ARM devices is essentially unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM performance. This was verified by running the same set of benchmarks on a Xeon desktop.
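The row-selection logic described in the commit message can be sketched roughly as follows. This is an illustrative reading, not the code from the commit: the names `RowBlockLimits` and `choose_row_block_limits` are hypothetical, the comparison of the row stride in bytes against the threshold is an assumption, and the ARM threshold is left as a parameter because the commit message does not state its value. Only the non-ARM value (32 * 1024) comes from the message.

```cpp
#include <cstddef>

// Threshold for non-ARM targets, as stated in the commit message (32 * 1024).
// Treating it as a byte count is an assumption of this sketch.
constexpr std::size_t kDefaultThresholdBytes = 32 * 1024;

// Hypothetical container for the upper loop bounds of the 8-, 4- and 2-row
// passes. A bound of 0 means "skip this pass".
struct RowBlockLimits {
  std::ptrdiff_t n8;
  std::ptrdiff_t n4;
  std::ptrdiff_t n2;
};

// Sketch: if one row of the left-hand side spans more bytes than the
// threshold, assume multi-row blocks would thrash the cache and disable all
// of them, so the caller falls back to one row at a time. The same test is
// applied to the 8-, 4- and 2-row passes alike.
inline RowBlockLimits choose_row_block_limits(std::ptrdiff_t rows,
                                              std::ptrdiff_t stride,
                                              std::size_t scalar_size,
                                              std::size_t threshold_bytes) {
  const bool row_too_wide =
      static_cast<std::size_t>(stride) * scalar_size > threshold_bytes;

  RowBlockLimits limits;
  // A loop "for (i = 0; i < n8; i += 8)" only starts a block when rows
  // i..i+7 all exist, hence the bound rows - 7 (likewise for 4 and 2).
  limits.n8 = row_too_wide ? 0 : rows - 7;
  limits.n4 = row_too_wide ? 0 : rows - 3;
  limits.n2 = row_too_wide ? 0 : rows - 1;
  return limits;
}
```

A caller would run the 8-row loop while `i < n8`, then the 4-row loop while `i < n4`, then the 2-row loop while `i < n2`, and finish any remaining rows one at a time. With a smaller threshold passed in for ARM builds, wide matrices drop to the single-row path sooner, which is the behavior the message credits for the speedup on in-order cores.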
src	Speed up row-major matrix-vector product on ARM	2019-02-01 15:23:53 -08:00
Cholesky
CholmodSupport
CMakeLists.txt
Core	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Dense
Eigen
Eigenvalues	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Geometry	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport	Move KLU support to official	2017-11-10 14:11:22 +01:00
LU
MetisSupport
OrderingMethods
PardisoSupport	Extend CUDA support to matrix inversion and selfadjointeigensolver	2018-06-11 18:33:24 +02:00
PaStiXSupport	clarify Pastix requirements	2017-11-27 22:11:57 +01:00
QR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
QtAlignedMalloc	bug #1468 (1/2) : add missing std:: to memcpy	2017-09-22 09:23:24 +02:00
Sparse	bug #1392 : fix #include <Eigen/Sparse> with mpl2-only	2017-02-11 10:35:01 +01:00
SparseCholesky
SparseCore
SparseLU	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
SparseQR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
StdList	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
StdVector	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
SuperLUSupport
SVD
UmfPackSupport