eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

History

Rasmus Munk Larsen b47c777993 Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once. rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.TransposeInPlace.float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench 10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s (Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".TransposeInPlace.float.*" experimental/users/rmlarsen/bench:matmul_bench) name old time/op new time/op delta BM_TransposeInPlace<float>/4 9.84ns ± 0% 6.51ns ± 0% -33.80% (p=0.008 n=5+5) BM_TransposeInPlace<float>/8 23.6ns ± 1% 17.6ns ± 0% -25.26% (p=0.016 n=5+4) BM_TransposeInPlace<float>/16 78.8ns ± 0% 60.3ns ± 0% -23.50% (p=0.029 n=4+4) BM_TransposeInPlace<float>/32 302ns ± 0% 229ns ± 0% -24.40% (p=0.008 n=5+5) BM_TransposeInPlace<float>/59 1.03µs ± 0% 0.84µs ± 1% -17.87% (p=0.016 n=5+4) BM_TransposeInPlace<float>/64 1.20µs ± 0% 0.89µs ± 1% -25.81% (p=0.008 n=5+5) BM_TransposeInPlace<float>/128 8.96µs ± 0% 3.82µs ± 2% -57.33% (p=0.008 n=5+5) BM_TransposeInPlace<float>/256 152µs ± 3% 17µs ± 2% -89.06% (p=0.008 n=5+5) BM_TransposeInPlace<float>/512 837µs ± 1% 208µs ± 0% -75.15% (p=0.008 n=5+5) BM_TransposeInPlace<float>/1k 4.28ms ± 2% 1.08ms ± 2% -74.72% (p=0.008 n=5+5)		2020-04-28 16:08:16 +00:00
..
src	Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.	2020-04-28 16:08:16 +00:00
Cholesky
CholmodSupport
Core	Include <sstream> explicitly, and don't rely on the implicit include via <complex>.	2020-02-24 23:09:36 +00:00
Dense
Eigen
Eigenvalues	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Geometry	Enable SSE vectorization of Quaternion and cross3() with AVX	2019-02-23 10:45:40 +01:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport
LU
MetisSupport
OrderingMethods	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
PardisoSupport
PaStiXSupport
QR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
QtAlignedMalloc
Sparse	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
SparseCholesky	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
SparseCore
SparseLU	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
SparseQR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque
StdList
StdVector
SuperLUSupport
SVD
UmfPackSupport