eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Rasmus Munk Larsen 05754100fe

* Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt approximation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster.

The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx.

AVX+FMA (float)
name                        old cpu/op    new cpu/op    delta
BM_eigen_sqrt_float/1       1.08ns ± 0%   1.09ns ± 1%   ~
BM_eigen_sqrt_float/8       2.07ns ± 0%   2.08ns ± 1%   ~
BM_eigen_sqrt_float/64      12.4ns ± 0%   12.4ns ± 1%   ~
BM_eigen_sqrt_float/512     95.7ns ± 0%   95.5ns ± 0%   ~
BM_eigen_sqrt_float/4k      776ns ± 0%    763ns ± 0%    -1.67%
BM_eigen_sqrt_float/32k     6.57µs ± 1%   6.13µs ± 0%   -6.69%
BM_eigen_sqrt_float/256k    83.7µs ± 3%   83.3µs ± 2%   ~
BM_eigen_sqrt_float/1M      335µs ± 2%    332µs ± 2%    ~

SSE+FMA (float)
name                        old cpu/op    new cpu/op    delta
BM_eigen_sqrt_float/1       1.08ns ± 0%   1.09ns ± 0%   ~
BM_eigen_sqrt_float/8       2.07ns ± 0%   2.06ns ± 0%   ~
BM_eigen_sqrt_float/64      12.4ns ± 0%   12.4ns ± 1%   ~
BM_eigen_sqrt_float/512     95.7ns ± 0%   96.3ns ± 4%   ~
BM_eigen_sqrt_float/4k      774ns ± 0%    763ns ± 0%    -1.50%
BM_eigen_sqrt_float/32k     6.58µs ± 2%   6.11µs ± 0%   -7.06%
BM_eigen_sqrt_float/256k    82.7µs ± 1%   82.6µs ± 1%   ~
BM_eigen_sqrt_float/1M      330µs ± 1%    329µs ± 2%    ~

SSE+FMA (double)
BM_eigen_sqrt_double/1      1.63ns ± 0%   1.63ns ± 0%   ~
BM_eigen_sqrt_double/8      6.51ns ± 0%   6.08ns ± 0%   -6.68%
BM_eigen_sqrt_double/64     52.1ns ± 0%   46.5ns ± 1%   -10.65%
BM_eigen_sqrt_double/512    417ns ± 0%    374ns ± 1%    -10.29%
BM_eigen_sqrt_double/4k     3.33µs ± 0%   2.97µs ± 1%   -11.00%
BM_eigen_sqrt_double/32k    26.7µs ± 0%   23.7µs ± 0%   -11.07%
BM_eigen_sqrt_double/256k   213µs ± 0%    206µs ± 1%    -3.31%
BM_eigen_sqrt_double/1M     862µs ± 0%    870µs ± 2%    +0.96%

AVX+FMA (double)
name                        old cpu/op    new cpu/op    delta
BM_eigen_sqrt_double/1      1.63ns ± 0%   1.63ns ± 0%   ~
BM_eigen_sqrt_double/8      6.51ns ± 0%   6.06ns ± 0%   -6.95%
BM_eigen_sqrt_double/64     52.1ns ± 0%   46.5ns ± 1%   -10.80%
BM_eigen_sqrt_double/512    417ns ± 0%    373ns ± 1%    -10.59%
BM_eigen_sqrt_double/4k     3.33µs ± 0%   2.97µs ± 1%   -10.79%
BM_eigen_sqrt_double/32k    26.7µs ± 0%   23.8µs ± 0%   -10.94%
BM_eigen_sqrt_double/256k   214µs ± 0%    208µs ± 2%    -2.76%
BM_eigen_sqrt_double/1M     866µs ± 3%    923µs ± 7%    ~

2020-12-16 18:16:11 +00:00
bench	Fix #1911 : add benchmark for move semantics with fixed-size matrix	2020-06-11 23:43:25 +00:00
blas	STYLE: Remove CMake-language block-end command arguments	2019-10-31 11:36:27 -05:00
ci	Add CI configuration for ppc64le	2020-09-22 00:26:23 +00:00
cmake	check for include dirs set	2020-11-26 10:22:46 +00:00
debug
demos	Make file formatting comply with POSIX and Unix standards	2020-03-23 18:09:02 +00:00
doc	Fix typo in doc	2020-11-30 10:53:29 +00:00
Eigen	* Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.	2020-12-16 18:16:11 +00:00
failtest	Make file formatting comply with POSIX and Unix standards	2020-03-23 18:09:02 +00:00
lapack	Remove code checking for CMake < 3.5	2020-12-14 09:57:44 +00:00
scripts	Replace calls to "hg" by calls to "git"	2019-12-04 11:24:06 +01:00
test	Replace M_LOG2E and M_LN2 with custom macros.	2020-12-11 14:34:31 -08:00
unsupported	Replace call to FixedDimensions() with a singleton instance of	2020-12-16 07:34:44 -07:00
.gitignore	New CI infrastructure, including AArch64 runners	2020-09-11 18:11:49 +00:00
.gitlab-ci.yml	New CI infrastructure, including AArch64 runners	2020-09-11 18:11:49 +00:00
.hgeol
CMakeLists.txt	Remove code checking for CMake < 3.5	2020-12-14 09:57:44 +00:00
COPYING.APACHE	Add Apache 2.0 license text in COPYING.APACHE.	2020-06-18 12:45:27 -07:00
COPYING.BSD	Make file formatting comply with POSIX and Unix standards	2020-03-23 18:09:02 +00:00
COPYING.GPL
COPYING.LGPL
COPYING.MINPACK	Make file formatting comply with POSIX and Unix standards	2020-03-23 18:09:02 +00:00
COPYING.MPL2
COPYING.README
CTestConfig.cmake	STYLE: Convert CMake-language commands to lower case	2019-10-31 11:36:37 -05:00
CTestCustom.cmake.in
eigen3.pc.in
INSTALL
README.md	Update old links to bitbucket to point to gitlab.com	2019-12-04 10:57:07 +01:00
signature_of_eigen3_matrix_library
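
The latest commit listed above describes an iterative sqrt that starts from an rsqrt estimate, refines it with Newton-Raphson steps, and is written in terms of pmadd. The following is a minimal scalar sketch of that general idea, not Eigen's actual packet code: the names `madd` and `newton_sqrt` are hypothetical, `std::fma` stands in for a fused multiply-add instruction, and the caller supplies the initial estimate of 1/sqrt(x) that real hardware would provide.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for one fused multiply-add lane: a * b + c.
inline float madd(float a, float b, float c) { return std::fma(a, b, c); }

// Refines an estimate y0 of 1/sqrt(x) with `steps` Newton-Raphson iterations
// and returns x * y as the approximation of sqrt(x).
// Assumes x is positive, finite, and normal; zero, infinity, NaN, and
// denormals would need separate handling that this sketch omits.
inline float newton_sqrt(float x, float y0, int steps) {
  const float half_x = 0.5f * x;
  float y = y0;
  for (int i = 0; i < steps; ++i) {
    // Newton step for f(y) = 1/y^2 - x:  y <- y * (1.5 - 0.5 * x * y^2)
    const float r = madd(-half_x * y, y, 1.5f);
    y = y * r;
  }
  return x * y;
}
```

Each step roughly squares the relative error of the estimate, which is why the accuracy of the starting value decides how many steps are needed: a starting estimate that is too coarse needs an extra iteration, and the commit message notes that on NEON this made the builtin sqrt instruction the faster choice for double.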

README.md

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/.

For pull requests, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.