Mark D Ryan aa110e681b PR 526: Speed up multiplication of small, dynamically sized matrices
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop.  Using these types is likely to lead to little or no
vectorization.  Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.

This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions.  It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.

I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds.  Although a few combinations are slowed down
by this patch, they are far outnumbered by the cases that are sped up, as the
following results demonstrate.


Disabling Packet8d on AVX512 builds:

Total Cases:             969
Better:                  511
Worse:                   85
Same:                    373
Max Improvement:         169.00% (4 8 6)
Max Degradation:         36.50% (8 5 3)
Median Improvement:      35.46%
Median Degradation:      17.41%
Total FLOPs Improvement: 19.42%


Disabling Packet16f and Packet8f on AVX512 builds:

Total Cases:             969
Better:                  658
Worse:                   5
Same:                    306
Max Improvement:         214.05% (8 6 5)
Max Degradation:         22.26% (16 2 1)
Median Improvement:      60.05%
Median Degradation:      13.32%
Total FLOPs Improvement: 59.58%


Disabling Packet8f on AVX builds:

Total Cases:             969
Better:                  663
Worse:                   96
Same:                    210
Max Improvement:         155.29% (4 10 5)
Max Degradation:         35.12% (8 3 2)
Median Improvement:      34.28%
Median Degradation:      15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
bench Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product. 2018-07-11 17:16:50 +02:00
blas Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
cmake bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11. 2018-10-19 17:20:51 +02:00
debug Misc. source and comment typos 2018-03-11 10:01:44 -04:00
demos Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls 2014-03-26 22:02:48 -04:00
doc Document EIGEN_NO_IO preprocessor directive 2018-10-25 16:49:25 +02:00
Eigen PR 526: Speed up multiplication of small, dynamically sized matrices 2018-10-12 15:20:21 +02:00
failtest Add unit tests for bug #981: valid and invalid usage of ternary operator 2015-09-09 11:38:25 +02:00
lapack fix md5sum of lapack_addons 2018-06-15 14:21:29 +02:00
scripts Simplify handling and non-splitted tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folder. 2018-07-16 18:55:40 +02:00
test add unit tests for bug #1619 2018-11-01 15:14:50 +01:00
unsupported Merged in ezhulenev/eigen-02 (pull request PR-534) 2018-10-25 18:34:35 +00:00
.hgeol Added a pattern which forces LF line endings for *.sh files. 2013-07-31 18:20:58 +02:00
.hgignore ignore all *build* sub directories 2017-12-14 14:22:14 +01:00
CMakeLists.txt bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11. 2018-10-19 17:20:51 +02:00
COPYING.BSD Intel(R) MKL support added. 2011-12-05 14:52:21 +07:00
COPYING.GPL there's no reason why we should follow the FSF's stupid recommendation for the naming of these files, right? This could give the wrong impression that Eigen is only GPL-licensed. 2009-11-14 23:26:07 -05:00
COPYING.LGPL Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
COPYING.MINPACK add COPYING.MINPACK 2012-07-15 11:46:22 -04:00
COPYING.MPL2 add COPYING.MPL2 2012-07-15 10:20:59 -04:00
COPYING.README Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
CTestConfig.cmake Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product. 2018-07-11 17:16:50 +02:00
CTestCustom.cmake.in Allow to filter out build-error messages 2018-07-24 20:12:49 +02:00
eigen3.pc.in Further fixes for CMAKE_INSTALL_PREFIX correctness 2015-11-07 21:29:24 -05:00
INSTALL finally, the right fix: set CTEST_BUILD_TARGET. 2009-10-04 20:27:44 -04:00
README.md Add links where to make PRs and report bugs into README.md 2018-04-13 21:05:28 +00:00
signature_of_eigen3_matrix_library improve the scripts for building unit tests: 2009-11-25 21:26:37 -05:00

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/.

For pull requests, please only use the official repository at https://bitbucket.org/eigen/eigen.

For bug reports and feature requests, go to http://eigen.tuxfamily.org/bz.