eigen/Eigen
Gustavo Lima Chaves 1024a70e82 gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The patch works by altering the gebp lhs packing routines to also
consider ½ and ¼ packet lenght rows when packing, besides the original
whole package and row-by-row attempts. Finally, gebp itself will try
to fit a fraction of a packet at a time if:

i) ½ and/or ¼ packets are available for the current context (e.g. AVX2
   and SSE-sized SIMD register for x86)

ii) The matrix's height is favorable to it (it may be it's too small
    in that dimension to take full advantage of the current/maximum
    packet width or it may be the case that last rows may take
    advantage of smaller packets before gebp goes row-by-row)

This helps mitigate huge slowdowns one had on AVX512 builds when
compared to AVX2 ones, for some dimensions. Gains top at an extra 1x
in throughput. This patch is a complement to changeset 4ad359237a
.

Since packing is changed, Eigen users which would go for very
low-level API usage, like TensorFlow, will have to be adapted to work
fine with the changes.
2018-12-21 11:03:18 -08:00
..
src gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs 2018-12-21 11:03:18 -08:00
Cholesky bug #1455: Cholesky module depends on Jacobi for rank-updates. 2017-08-22 11:37:32 +02:00
CholmodSupport
CMakeLists.txt bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. 2016-08-29 10:59:37 +02:00
Core Implement AVX512 vectorization of std::complex<float/double> 2018-12-06 15:58:06 +01:00
Dense
Eigen
Eigenvalues Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
Geometry Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport Move KLU support to official 2017-11-10 14:11:22 +01:00
LU use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
MetisSupport
OrderingMethods
PardisoSupport Extend CUDA support to matrix inversion and selfadjointeigensolver 2018-06-11 18:33:24 +02:00
PaStiXSupport clarify Pastix requirements 2017-11-27 22:11:57 +01:00
QR Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
QtAlignedMalloc bug #1468 (1/2) : add missing std:: to memcpy 2017-09-22 09:23:24 +02:00
Sparse bug #1392: fix #include <Eigen/Sparse> with mpl2-only 2017-02-11 10:35:01 +01:00
SparseCholesky
SparseCore
SparseLU Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
SparseQR Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
StdList bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
StdVector bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
SuperLUSupport
SVD use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
UmfPackSupport