eigen/Eigen
Rasmus Munk Larsen f0577a2bfd Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.
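
To make the packing change concrete, here is a minimal sketch of the idea in plain C++. It is not Eigen's gemm_pack_rhs: the function names, the fixed panel width of 4, and the packed layout are assumptions chosen for illustration, and `std::memcpy` of 4 floats stands in for a 4-wide (half- or quarter-packet) load and store. With AVX a full float packet holds 8 values and with AVX512 it holds 16, so a 4-wide copy corresponds to a half packet and a quarter packet respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helpers for illustration only; not Eigen's internals.
// `rhs` is a row-major depth x cols matrix. The packed output stores,
// for each panel of 4 columns, the 4 values of every row contiguously.

// Scalar path: four independent element copies per row of a panel.
std::vector<float> pack_rhs_scalar(const std::vector<float>& rhs,
                                   std::size_t depth, std::size_t cols) {
  assert(cols % 4 == 0 && rhs.size() == depth * cols);
  std::vector<float> packed(depth * cols);
  std::size_t out = 0;
  for (std::size_t j = 0; j < cols; j += 4) {
    for (std::size_t k = 0; k < depth; ++k) {
      const float* src = &rhs[k * cols + j];
      packed[out + 0] = src[0];
      packed[out + 1] = src[1];
      packed[out + 2] = src[2];
      packed[out + 3] = src[3];
      out += 4;
    }
  }
  return packed;
}

// Chunked path: one 16-byte copy per row of a panel. The memcpy is a
// stand-in for a 4-wide SIMD load/store; compilers typically lower a
// fixed-size 16-byte memcpy to a single 128-bit move.
std::vector<float> pack_rhs_chunk4(const std::vector<float>& rhs,
                                   std::size_t depth, std::size_t cols) {
  assert(cols % 4 == 0 && rhs.size() == depth * cols);
  std::vector<float> packed(depth * cols);
  std::size_t out = 0;
  for (std::size_t j = 0; j < cols; j += 4) {
    for (std::size_t k = 0; k < depth; ++k) {
      std::memcpy(&packed[out], &rhs[k * cols + j], 4 * sizeof(float));
      out += 4;
    }
  }
  return packed;
}
```

Both functions produce the same packed buffer; only the way each group of 4 values is moved differs. The chunked copy relies on the 4 values being contiguous in memory, which is why the sketch assumes a row-major right-hand side.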

Measured improvement with AVX+FMA:

name                           old time/op             new time/op             delta
BM_MatMul_ATB/8                 139ns ± 1%              129ns ± 1%   -7.49%          (p=0.008 n=5+5)
BM_MatMul_ATB/32               1.46µs ± 1%             1.22µs ± 0%  -16.72%          (p=0.008 n=5+5)
BM_MatMul_ATB/64               8.43µs ± 1%             7.41µs ± 0%  -12.04%          (p=0.008 n=5+5)
BM_MatMul_ATB/128              56.8µs ± 1%             52.9µs ± 1%   -6.83%          (p=0.008 n=5+5)
BM_MatMul_ATB/256               407µs ± 1%              395µs ± 3%   -2.94%          (p=0.032 n=5+5)
BM_MatMul_ATB/512              3.27ms ± 3%             3.18ms ± 1%     ~             (p=0.056 n=5+5)


Measured improvement for AVX512:

name                          old time/op             new time/op             delta
BM_MatMul_ATB/8                167ns ± 1%              154ns ± 1%   -7.63%          (p=0.008 n=5+5)
BM_MatMul_ATB/32              1.08µs ± 1%             0.83µs ± 3%  -23.58%          (p=0.008 n=5+5)
BM_MatMul_ATB/64              6.21µs ± 1%             5.06µs ± 1%  -18.47%          (p=0.008 n=5+5)
BM_MatMul_ATB/128             36.1µs ± 2%             31.3µs ± 1%  -13.32%          (p=0.008 n=5+5)
BM_MatMul_ATB/256              263µs ± 2%              242µs ± 2%   -7.92%          (p=0.008 n=5+5)
BM_MatMul_ATB/512             1.95ms ± 2%             1.91ms ± 2%     ~             (p=0.095 n=5+5)
BM_MatMul_ATB/1k              15.4ms ± 4%             14.8ms ± 2%     ~             (p=0.095 n=5+5)
2020-04-07 22:09:51 +00:00
src Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path. 2020-04-07 22:09:51 +00:00
Cholesky bug #1455: Cholesky module depends on Jacobi for rank-updates. 2017-08-22 11:37:32 +02:00
CholmodSupport Update link to suitesparse. 2016-01-27 22:48:40 +01:00
Core Include <sstream> explicitly, and don't rely on the implicit include via <complex>. 2020-02-24 23:09:36 +00:00
Dense
Eigen
Eigenvalues Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
Geometry Enable SSE vectorization of Quaternion and cross3() with AVX 2019-02-23 10:45:40 +01:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport Move KLU support to official 2017-11-10 14:11:22 +01:00
LU use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
MetisSupport
OrderingMethods Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
PardisoSupport Extend CUDA support to matrix inversion and selfadjointeigensolver 2018-06-11 18:33:24 +02:00
PaStiXSupport clarify Pastix requirements 2017-11-27 22:11:57 +01:00
QR Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
QtAlignedMalloc bug #1468 (1/2) : add missing std:: to memcpy 2017-09-22 09:23:24 +02:00
Sparse Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
SparseCholesky Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
SparseCore
SparseLU Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
SparseQR Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
SPQRSupport Update link to suitesparse. 2016-01-27 22:48:40 +01:00
StdDeque bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
StdList bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
StdVector bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
SuperLUSupport bug #1119: Adjust call to ?gssvx for SuperLU 5 2016-07-10 02:29:57 +02:00
SVD use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
UmfPackSupport Update link to suitesparse. 2016-01-27 22:48:40 +01:00