eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-03-01 18:26:24 +08:00


Rasmus Munk Larsen f0577a2bfd		2020-04-07 22:09:51 +00:00

Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.

Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.

Measured improvement with AVX+FMA:

    name                old time/op   new time/op   delta
    BM_MatMul_ATB/8     139ns ± 1%    129ns ± 1%     -7.49%  (p=0.008 n=5+5)
    BM_MatMul_ATB/32    1.46µs ± 1%   1.22µs ± 0%   -16.72%  (p=0.008 n=5+5)
    BM_MatMul_ATB/64    8.43µs ± 1%   7.41µs ± 0%   -12.04%  (p=0.008 n=5+5)
    BM_MatMul_ATB/128   56.8µs ± 1%   52.9µs ± 1%    -6.83%  (p=0.008 n=5+5)
    BM_MatMul_ATB/256   407µs ± 1%    395µs ± 3%     -2.94%  (p=0.032 n=5+5)
    BM_MatMul_ATB/512   3.27ms ± 3%   3.18ms ± 1%      ~     (p=0.056 n=5+5)

Measured improvement for AVX512:

    name                old time/op   new time/op   delta
    BM_MatMul_ATB/8     167ns ± 1%    154ns ± 1%     -7.63%  (p=0.008 n=5+5)
    BM_MatMul_ATB/32    1.08µs ± 1%   0.83µs ± 3%   -23.58%  (p=0.008 n=5+5)
    BM_MatMul_ATB/64    6.21µs ± 1%   5.06µs ± 1%   -18.47%  (p=0.008 n=5+5)
    BM_MatMul_ATB/128   36.1µs ± 2%   31.3µs ± 1%   -13.32%  (p=0.008 n=5+5)
    BM_MatMul_ATB/256   263µs ± 2%    242µs ± 2%     -7.92%  (p=0.008 n=5+5)
    BM_MatMul_ATB/512   1.95ms ± 2%   1.91ms ± 2%      ~     (p=0.095 n=5+5)
    BM_MatMul_ATB/1k    15.4ms ± 4%   14.8ms ± 2%      ~     (p=0.095 n=5+5)
src	Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.	2020-04-07 22:09:51 +00:00
Cholesky
CholmodSupport
Core	Include <sstream> explicitly, and don't rely on the implicit include via <complex>.	2020-02-24 23:09:36 +00:00
Dense
Eigen
Eigenvalues	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Geometry	Enable SSE vectorization of Quaternion and cross3() with AVX	2019-02-23 10:45:40 +01:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport	Move KLU support to official	2017-11-10 14:11:22 +01:00
LU
MetisSupport
OrderingMethods	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
PardisoSupport	Extend CUDA support to matrix inversion and selfadjointeigensolver	2018-06-11 18:33:24 +02:00
PaStiXSupport	clarify Pastix requirements	2017-11-27 22:11:57 +01:00
QR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
QtAlignedMalloc
Sparse	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
SparseCholesky	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
SparseCore
SparseLU	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
SparseQR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque
StdList
StdVector
SuperLUSupport
SVD
UmfPackSupport