eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Rasmus Munk Larsen a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).

This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in `66f07efeae`), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
- Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
- Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeatedly call u = v.logistic() (u = v.tanh(), respectively), where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467     4468      155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347     2347      299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467     1467      476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805      805      858696  model_time: 1463

After:
Benchmark                  Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589     2590      270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428     1428      489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059     1059      662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673      673     1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float            2391     2391      292624  model_time: 4242
AVX
BM_eigen_tanh_float            1256     1256      554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float             823      823      866267  model_time: 1609
AVX512
BM_eigen_tanh_float             443      443     1578999  model_time: 805

After:
Benchmark                  Time(ns)  CPU(ns)  Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float            2588     2588      273531  model_time: 4242
AVX
BM_eigen_tanh_float            1536     1536      452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float            1007     1007      694681  model_time: 1609
AVX512
BM_eigen_tanh_float             471      471     1472178  model_time: 805

2019-12-16 21:33:42 +00:00
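The two ideas in the commit message above (switching to exp(x) below a cutoff, and hinting to the compiler that this branch is rare) can be illustrated with a scalar sketch. This is not Eigen's implementation: the names SKETCH_PREDICT_FALSE, logistic_sketch, logistic_ref and relative_error are hypothetical, the code is scalar rather than SIMD, and plain 1/(1+exp(-x)) stands in for the fast rational approximation. Since 1/(1+exp(-x)) = exp(x)/(1+exp(x)), replacing it with exp(x) introduces a relative error of exp(x), which is roughly 1.2e-4 at x = -9 and shrinks rapidly for more negative x.

```cpp
#include <cmath>

// Hypothetical branch-prediction hint, modelled on the idea described in the
// commit message; this macro is illustrative and not copied from Eigen.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#else
#define SKETCH_PREDICT_FALSE(x) (x)
#endif

// Reference logistic function evaluated in double precision.
inline double logistic_ref(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Scalar float sketch (assumption: the main path here is the textbook
// formula, standing in for a fast rational approximation).
inline float logistic_sketch(float x) {
  const float kCutoff = -9.0f;  // cutoff value taken from the commit message
  if (SKETCH_PREDICT_FALSE(x < kCutoff)) {
    // Rare path: for very negative x, logistic(x) ~= exp(x).
    return std::exp(x);
  }
  // Common path.
  return 1.0f / (1.0f + std::exp(-x));
}

// Relative error of a float result against a double-precision reference.
inline double relative_error(float approx, double exact) {
  return std::fabs((static_cast<double>(approx) - exact) / exact);
}
```

Calling relative_error(logistic_sketch(x), logistic_ref(x)) for a few values of x on either side of the cutoff shows how quickly the exp(x) substitution becomes accurate as x decreases.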
..
src	Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).	2019-12-16 21:33:42 +00:00
Cholesky	bug #1455 : Cholesky module depends on Jacobi for rank-updates.	2017-08-22 11:37:32 +02:00
CholmodSupport
Core	Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.	2019-09-06 09:26:04 +02:00
Dense
Eigen
Eigenvalues	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
Geometry	Enable SSE vectorization of Quaternion and cross3() with AVX	2019-02-23 10:45:40 +01:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport	Move KLU support to official	2017-11-10 14:11:22 +01:00
LU	use MKL's lapacke.h header when using MKL	2017-08-17 21:58:39 +02:00
MetisSupport
OrderingMethods	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
PardisoSupport	Extend CUDA support to matrix inversion and selfadjointeigensolver	2018-06-11 18:33:24 +02:00
PaStiXSupport	clarify Pastix requirements	2017-11-27 22:11:57 +01:00
QR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
QtAlignedMalloc	bug #1468 (1/2) : add missing std:: to memcpy	2017-09-22 09:23:24 +02:00
Sparse	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
SparseCholesky	Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in `2ca1e73239`	2019-03-05 10:24:54 -08:00
SparseCore
SparseLU	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
SparseQR	Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop	2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
StdList	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
StdVector	bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)	2017-02-03 15:22:35 +01:00
SuperLUSupport	bug #1119 : Adjust call to ?gssvx for SuperLU 5	2016-07-10 02:29:57 +02:00
SVD	use MKL's lapacke.h header when using MKL	2017-08-17 21:58:39 +02:00
UmfPackSupport