eigen/Eigen
Antonio Sanchez f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).
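
As a rough illustration of the technique described above (not the code from this commit), the sketch below wraps a 4-wide multiply-accumulate in a volatile inline ASM statement on 32-bit ARM with NEON. The helper name `madd4_pinned` is hypothetical, and the choice of the `vmla.f32` instruction with `"w"` register constraints is an assumption about how such a barrier could be written with GCC or Clang. On other targets the function falls back to a scalar loop followed by an empty volatile ASM statement, which only acts as a compiler reordering barrier.

```cpp
#include <cstddef>

#if defined(__arm__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not part of Eigen): acc[i] += a[i] * b[i] for i in 0..3.
static inline void madd4_pinned(const float* a, const float* b, float* acc) {
#if defined(__arm__) && defined(__ARM_NEON) && defined(__GNUC__)
  // 32-bit ARM with NEON: load the operands into 128-bit vector values.
  float32x4_t va = vld1q_f32(a);
  float32x4_t vb = vld1q_f32(b);
  float32x4_t vc = vld1q_f32(acc);
  // The "w" constraints force each operand into a NEON register, and
  // "volatile" keeps the compiler from moving or removing the statement.
  asm volatile("vmla.f32 %q[c], %q[a], %q[b]"
               : [c] "+w"(vc)
               : [a] "w"(va), [b] "w"(vb));
  vst1q_f32(acc, vc);
#else
  // Portable fallback for other targets: plain scalar multiply-accumulate.
  for (std::size_t i = 0; i < 4; ++i) {
    acc[i] += a[i] * b[i];
  }
#if defined(__GNUC__)
  // Empty volatile ASM with a "memory" clobber: a compiler-level barrier only.
  asm volatile("" ::: "memory");
#endif
#endif
}
```

Calling `madd4_pinned(a, b, acc)` with `a = {1, 2, 3, 4}`, `b = {5, 6, 7, 8}` and `acc = {1, 1, 1, 1}` leaves `acc` equal to `{6, 13, 22, 33}` on either code path. Whether such a barrier actually removes spills depends on the compiler and the surrounding kernel, so the generated assembly should be inspected as in the Godbolt link above.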

This replaces !379; see that merge request for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00
src Fix excessive GEBP register spilling for 32-bit NEON. 2021-02-03 09:01:48 -08:00
Cholesky
CholmodSupport
Core Fix excessive GEBP register spilling for 32-bit NEON. 2021-02-03 09:01:48 -08:00
Dense
Eigen
Eigenvalues
Geometry 1)provide a better generic paddsub op implementation 2021-01-13 22:54:03 +00:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport
LU Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
MetisSupport
OrderingMethods
PardisoSupport
PaStiXSupport
QR
QtAlignedMalloc
Sparse
SparseCholesky
SparseCore
SparseLU
SparseQR
SPQRSupport
StdDeque
StdList
StdVector
SuperLUSupport
SVD
UmfPackSupport