Go to file
Rasmus Munk Larsen 05754100fe * Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt apprimation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster.

The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx.

AVX+FMA (float)

name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 1%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.08ns ± 1%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  95.5ns ± 0%    ~
BM_eigen_sqrt_float/4k     776ns ± 0%   763ns ± 0%  -1.67%
BM_eigen_sqrt_float/32k   6.57µs ± 1%  6.13µs ± 0%  -6.69%
BM_eigen_sqrt_float/256k  83.7µs ± 3%  83.3µs ± 2%    ~
BM_eigen_sqrt_float/1M     335µs ± 2%   332µs ± 2%    ~

SSE+FMA (float)
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 0%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.06ns ± 0%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  96.3ns ± 4%    ~
BM_eigen_sqrt_float/4k     774ns ± 0%   763ns ± 0%  -1.50%
BM_eigen_sqrt_float/32k   6.58µs ± 2%  6.11µs ± 0%  -7.06%
BM_eigen_sqrt_float/256k  82.7µs ± 1%  82.6µs ± 1%    ~
BM_eigen_sqrt_float/1M     330µs ± 1%   329µs ± 2%    ~

SSE+FMA (double)
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.08ns ± 0%   -6.68%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.65%
BM_eigen_sqrt_double/512     417ns ± 0%   374ns ± 1%  -10.29%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -11.00%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.7µs ± 0%  -11.07%
BM_eigen_sqrt_double/256k    213µs ± 0%   206µs ± 1%   -3.31%
BM_eigen_sqrt_double/1M      862µs ± 0%   870µs ± 2%   +0.96%

AVX+FMA (double)
name                        old cpu/op  new cpu/op  delta
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.06ns ± 0%   -6.95%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.80%
BM_eigen_sqrt_double/512     417ns ± 0%   373ns ± 1%  -10.59%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -10.79%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.8µs ± 0%  -10.94%
BM_eigen_sqrt_double/256k    214µs ± 0%   208µs ± 2%   -2.76%
BM_eigen_sqrt_double/1M      866µs ± 3%   923µs ± 7%     ~
2020-12-16 18:16:11 +00:00
bench Fix #1911: add benchmark for move semantics with fixed-size matrix 2020-06-11 23:43:25 +00:00
blas STYLE: Remove CMake-language block-end command arguments 2019-10-31 11:36:27 -05:00
ci Add CI configuration for ppc64le 2020-09-22 00:26:23 +00:00
cmake check for include dirs set 2020-11-26 10:22:46 +00:00
debug
demos Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
doc Fix typo in doc 2020-11-30 10:53:29 +00:00
Eigen * Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup. 2020-12-16 18:16:11 +00:00
failtest Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
lapack Remove code checking for CMake < 3.5 2020-12-14 09:57:44 +00:00
scripts Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
test Replace M_LOG2E and M_LN2 with custom macros. 2020-12-11 14:34:31 -08:00
unsupported Replace call to FixedDimensions() with a singleton instance of 2020-12-16 07:34:44 -07:00
.gitignore New CI infrastructure, including AArch64 runners 2020-09-11 18:11:49 +00:00
.gitlab-ci.yml New CI infrastructure, including AArch64 runners 2020-09-11 18:11:49 +00:00
.hgeol
CMakeLists.txt Remove code checking for CMake < 3.5 2020-12-14 09:57:44 +00:00
COPYING.APACHE Add Apache 2.0 license text in COPYING.APACHE. 2020-06-18 12:45:27 -07:00
COPYING.BSD Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
COPYING.GPL
COPYING.LGPL
COPYING.MINPACK Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
COPYING.MPL2
COPYING.README
CTestConfig.cmake STYLE: Convert CMake-language commands to lower case 2019-10-31 11:36:37 -05:00
CTestCustom.cmake.in
eigen3.pc.in
INSTALL
README.md Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
signature_of_eigen3_matrix_library

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/.

For pull request, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.