eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-02-17 18:09:55 +08:00

Rasmus Munk Larsen 4c0fa6ce0f Speed up Eigen matrix*vector and vector*matrix multiplication.

This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.

The benchmarks below test

c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;

respectively.

Benchmark measurements:

SSE:

Run on * (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark        Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64          1096       312      +71.5%
BM_MatVec/128         4581      1464      +68.0%
BM_MatVec/256        18534      5710      +69.2%
BM_MatVec/512       118083     24162      +79.5%
BM_MatVec/1k        704106    173346      +75.4%
BM_MatVec/2k       3080828    742728      +75.9%
BM_MatVec/4k      25421512   4530117      +82.2%
BM_VecMat/32           352       130      +63.1%
BM_VecMat/64          1213       425      +65.0%
BM_VecMat/128         4640      1564      +66.3%
BM_VecMat/256        17902      5884      +67.1%
BM_VecMat/512        70466     24000      +65.9%
BM_VecMat/1k        340150    161263      +52.6%
BM_VecMat/2k       1420590    645576      +54.6%
BM_VecMat/4k       8083859   4364327      +46.0%

AVX2:

Run on * (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark        Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64           619       120      +80.6%
BM_MatVec/128         9693       752      +92.2%
BM_MatVec/256        38356      2773      +92.8%
BM_MatVec/512        69006     12803      +81.4%
BM_MatVec/1k        443810    160378      +63.9%
BM_MatVec/2k       2633553    646594      +75.4%
BM_MatVec/4k      16211095   4327148      +73.3%
BM_VecMat/64           925       227      +75.5%
BM_VecMat/128         3438       830      +75.9%
BM_VecMat/256        13427      2936      +78.1%
BM_VecMat/512        53944     12473      +76.9%
BM_VecMat/1k        302264    157076      +48.0%
BM_VecMat/2k       1396811    675778      +51.6%
BM_VecMat/4k       8962246   4459010      +50.2%

AVX512:

Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark        Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64           401       111      +72.3%
BM_MatVec/128         1846       513      +72.2%
BM_MatVec/256        36739      1927      +94.8%
BM_MatVec/512        54490      9227      +83.1%
BM_MatVec/1k        487374    161457      +66.9%
BM_MatVec/2k       2016270    643824      +68.1%
BM_MatVec/4k      13204300   4077412      +69.1%
BM_VecMat/32           324       106      +67.3%
BM_VecMat/64          1034       246      +76.2%
BM_VecMat/128         3576       802      +77.6%
BM_VecMat/256        13411      2561      +80.9%
BM_VecMat/512        58686     10037      +82.9%
BM_VecMat/1k        320862    163750      +49.0%
BM_VecMat/2k       1406719    651397      +53.7%
BM_VecMat/4k       7785179   4124677      +47.0%

2019-01-31 14:24:08 -08:00
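
The commit message does not say how the Improvement column is computed. The figures are consistent with the relative reduction in run time, 100 * (Base - New) / Base, rounded to one decimal place. The sketch below rests on that assumption; the function names `improvement_percent` and `speedup_factor` are hypothetical helpers and are not part of Eigen or of the benchmark code.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction in run time, in percent, rounded to one decimal place.
// Assumption: this mirrors how the "Improvement" column was derived; the
// commit message itself does not state the formula.
inline double improvement_percent(double base_ns, double new_ns) {
    assert(base_ns > 0.0 && new_ns >= 0.0);
    const double raw = 100.0 * (base_ns - new_ns) / base_ns;
    return std::round(raw * 10.0) / 10.0;
}

// The same measurement expressed as a speedup factor (how many times faster).
inline double speedup_factor(double base_ns, double new_ns) {
    assert(base_ns > 0.0 && new_ns > 0.0);
    return base_ns / new_ns;
}
```

For the SSE row BM_MatVec/64, `improvement_percent(1096, 312)` gives 71.5, matching the table, and `speedup_factor(1096, 312)` is roughly 3.5. Reading the column as a speedup factor can be easier when comparing rows: an improvement near 50% is about 2x, while one above 90% is more than 10x.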
..
src	Speed up Eigen matrix*vector and vector*matrix multiplication.	2019-01-31 14:24:08 -08:00
Cholesky	bug #1455 : Cholesky module depends on Jacobi for rank-updates.	2017-08-22 11:37:32 +02:00
CholmodSupport
CMakeLists.txt
Core	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Dense
Eigen
Eigenvalues	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Geometry	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport	Move KLU support to official	2017-11-10 14:11:22 +01:00
LU	use MKL's lapacke.h header when using MKL	2017-08-17 21:58:39 +02:00
MetisSupport
OrderingMethods
PardisoSupport	Extend CUDA support to matrix inversion and selfadjointeigensolver	2018-06-11 18:33:24 +02:00
PaStiXSupport	clarify Pastix requirements	2017-11-27 22:11:57 +01:00
QR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
QtAlignedMalloc	bug #1468 (1/2) : add missing std:: to memcpy	2017-09-22 09:23:24 +02:00
Sparse	bug #1392 : fix #include <Eigen/Sparse> with mpl2-only	2017-02-11 10:35:01 +01:00
SparseCholesky
SparseCore
SparseLU	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
SparseQR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
StdList	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
StdVector	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
SuperLUSupport
SVD	use MKL's lapacke.h header when using MKL	2017-08-17 21:58:39 +02:00
UmfPackSupport