eigen/Eigen
Rasmus Munk Larsen a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-1)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_predict and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeated calls

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
..
src Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function). 2019-12-16 21:33:42 +00:00
Cholesky bug #1455: Cholesky module depends on Jacobi for rank-updates. 2017-08-22 11:37:32 +02:00
CholmodSupport
Core Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions. 2019-09-06 09:26:04 +02:00
Dense
Eigen
Eigenvalues Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
Geometry Enable SSE vectorization of Quaternion and cross3() with AVX 2019-02-23 10:45:40 +01:00
Householder
IterativeLinearSolvers
Jacobi
KLUSupport Move KLU support to official 2017-11-10 14:11:22 +01:00
LU use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
MetisSupport
OrderingMethods Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
PardisoSupport Extend CUDA support to matrix inversion and selfadjointeigensolver 2018-06-11 18:33:24 +02:00
PaStiXSupport clarify Pastix requirements 2017-11-27 22:11:57 +01:00
QR Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
QtAlignedMalloc bug #1468 (1/2) : add missing std:: to memcpy 2017-09-22 09:23:24 +02:00
Sparse Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
SparseCholesky Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
SparseCore
SparseLU Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
SparseQR Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop 2018-08-28 11:44:15 +02:00
SPQRSupport
StdDeque bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
StdList bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
StdVector bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
SuperLUSupport bug #1119: Adjust call to ?gssvx for SuperLU 5 2016-07-10 02:29:57 +02:00
SVD use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
UmfPackSupport