mirror of
https://gitlab.com/libeigen/eigen.git
synced 2024-12-21 07:19:46 +08:00
f85038b7f3
Clang optimizes the GEBP microkernel poorly on 32-bit ARM: it generates excessive 16-byte register spills, which slow basic f32 matrix multiplication by approximately 50%. Specializing `gebp_traits` eliminates these spills. Volatile inline asm both acts as a barrier that prevents reordering and enforces strict register use. In a simple f32 matrix-multiply example, this change reduces the number of 16-byte spills from 109 to zero and yields a 1.5x speedup (search for `16-byte Spill` in the assembly at https://godbolt.org/z/chsPbE). This replaces !379; see that merge request for further discussion. The `gebp_traits` specializations for NEON are also moved to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h`, alongside the other NEON-specific code. Fixes #2138. |
||
---|---|---|
src | ||
Cholesky | ||
CholmodSupport | ||
Core | ||
Dense | ||
Eigen | ||
Eigenvalues | ||
Geometry | ||
Householder | ||
IterativeLinearSolvers | ||
Jacobi | ||
KLUSupport | ||
LU | ||
MetisSupport | ||
OrderingMethods | ||
PardisoSupport | ||
PaStiXSupport | ||
QR | ||
QtAlignedMalloc | ||
Sparse | ||
SparseCholesky | ||
SparseCore | ||
SparseLU | ||
SparseQR | ||
SPQRSupport | ||
StdDeque | ||
StdList | ||
StdVector | ||
SuperLUSupport | ||
SVD | ||
UmfPackSupport |
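
The commit message above relies on volatile inline asm to keep values in registers and to stop the compiler from rearranging the kernel. The following is a minimal sketch of that general idea only, not the code from the commit: `pin_value` and `dot_pinned` are hypothetical names, and the example uses a scalar `float` with a generic register-or-memory constraint so that it builds on any GCC or Clang target, whereas the actual change specializes `gebp_traits` for NEON vector registers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Eigen). An empty volatile asm statement
// takes `v` as a read-write operand, so the compiler cannot delete the
// statement and must have `v` available in a register or in memory here.
template <typename T>
inline void pin_value(T& v) {
#if defined(__GNUC__) || defined(__clang__)
  asm volatile("" : "+r,m"(v));
#else
  (void)v;  // No barrier on other compilers; the result is unchanged.
#endif
}

// Hypothetical example kernel: a dot product whose accumulator is pinned
// after every update. The asm statement emits no instructions, so the
// numerical result is the same as for a plain loop.
inline float dot_pinned(const float* a, const float* b, std::size_t n) {
  assert(a != nullptr && b != nullptr);
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    acc += a[i] * b[i];
    pin_value(acc);
  }
  return acc;
}
```

Whether such a barrier helps or hurts depends on the compiler and target, so the generated assembly should be inspected (as the commit message does with its Compiler Explorer link) before adopting the pattern.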