Go to file
Rasmus Munk Larsen 48c635e223 Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small.
Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.

Improvements in Wall time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1                    3088      1610    +47.9%
BM_OuterishProd/64/4                    3562      2414    +32.2%
BM_OuterishProd/64/32                   8861      7815    +11.8%
BM_OuterishProd/128/1                  11363      6504    +42.8%
BM_OuterishProd/128/4                  11128      9794    +12.0%
BM_OuterishProd/128/64                 27691     27396     +1.1%
BM_OuterishProd/256/1                  33214     28123    +15.3%
BM_OuterishProd/256/4                  34312     36818     -7.3%
BM_OuterishProd/256/128               174866    176398     -0.9%
BM_OuterishProd/512/1                7963684    104224    +98.7%
BM_OuterishProd/512/4                7987913    112867    +98.6%
BM_OuterishProd/512/256              8198378   1306500    +84.1%
BM_OuterishProd/1k/1                 7356256    324432    +95.6%
BM_OuterishProd/1k/4                 8129616    331621    +95.9%
BM_OuterishProd/1k/512              27265418   7517538    +72.4%

Improvements in CPU time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1                    6169      1608    +73.9%
BM_OuterishProd/64/4                    7117      2412    +66.1%
BM_OuterishProd/64/32                  17702     15616    +11.8%
BM_OuterishProd/128/1                  45415      6498    +85.7%
BM_OuterishProd/128/4                  44459      9786    +78.0%
BM_OuterishProd/128/64                110657    109489     +1.1%
BM_OuterishProd/256/1                 265158     28101    +89.4%
BM_OuterishProd/256/4                 274234    183885    +32.9%
BM_OuterishProd/256/128              1397160   1408776     -0.8%
BM_OuterishProd/512/1               78947048    520703    +99.3%
BM_OuterishProd/512/4               86955578   1349742    +98.4%
BM_OuterishProd/512/256             74701613  15584661    +79.1%
BM_OuterishProd/1k/1                78352601   3877911    +95.1%
BM_OuterishProd/1k/4                78521643   3966221    +94.9%
BM_OuterishProd/1k/512              258104736  89480530    +65.3%
2016-10-06 10:33:10 -07:00
bench Do not manually add absolute path to boost-library. 2016-09-22 00:10:47 +02:00
blas Relax mixing-type constraints for binary coefficient-wise operators: 2016-06-06 15:11:41 +02:00
cmake FindEigen3.cmake : search for package only if EIGEN3_INCLUDE_DIR is not already defined 2016-08-22 22:13:10 +00:00
debug Make gdb pretty printer Python3-compatible (bug #800). 2014-04-28 14:10:22 +01:00
demos Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls 2014-03-26 22:02:48 -04:00
doc Add missing file. 2016-09-23 10:26:08 +02:00
Eigen Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small. 2016-10-06 10:33:10 -07:00
failtest Add unit tests for bug #981: valid and invalid usage of ternary operator 2015-09-09 11:38:25 +02:00
lapack Workaround "misleading-indentation" warnings 2016-05-11 08:41:36 +02:00
scripts Fix help output of buildtests and check scripts 2016-05-11 19:39:09 +02:00
test Fix compilation of qr.inverse() for column and full pivoting variants. 2016-10-06 09:55:50 +02:00
unsupported Increased the robustness of the reduction tests on fp16 2016-10-05 10:42:41 -07:00
.hgeol Added a pattern which forces LF line endings for *.sh files. 2013-07-31 18:20:58 +02:00
.hgignore Ignore automalically imported lapack source files 2014-10-17 15:34:39 +02:00
CMakeLists.txt Add aliases Eigen_*_DIR to Eigen3_*_DIR 2016-08-05 15:21:14 +02:00
COPYING.BSD Intel(R) MKL support added. 2011-12-05 14:52:21 +07:00
COPYING.GPL
COPYING.LGPL Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
COPYING.MINPACK add COPYING.MINPACK 2012-07-15 11:46:22 -04:00
COPYING.MPL2 add COPYING.MPL2 2012-07-15 10:20:59 -04:00
COPYING.README Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
CTestConfig.cmake swap 3.2 <-> default CTestConfig.cmake file 2014-03-05 10:07:44 +01:00
CTestCustom.cmake.in Reduce maximum number of warnings/errors. (they took GBs even for limited period of time) 2013-06-20 17:39:15 +02:00
eigen3.pc.in Further fixes for CMAKE_INSTALL_PREFIX correctness 2015-11-07 21:29:24 -05:00
INSTALL
README.md Reverted the README 2015-02-27 13:09:49 -08:00
signature_of_eigen3_matrix_library

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/.