eigen/unsupported/Eigen
Srinivas Vasudevan facdec5aa7 Add packetized versions of i0e and i1e special functions.
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
  - Move chebevl to GenericPacketMathFunctions.


A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

Before:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            57.3           57.3     10000000
BM_eigen_i0e_double/8           398            398        1748554
BM_eigen_i0e_double/64         3184           3184         218961
BM_eigen_i0e_double/512       25579          25579          27330
BM_eigen_i0e_double/4k       205043         205042           3418
BM_eigen_i0e_double/32k     1646038        1646176            422
BM_eigen_i0e_double/256k   13180959       13182613             53
BM_eigen_i0e_double/1M     52684617       52706132             10
BM_eigen_i0e_float/1             28.4           28.4     24636711
BM_eigen_i0e_float/8             75.7           75.7      9207634
BM_eigen_i0e_float/64           512            512        1000000
BM_eigen_i0e_float/512         4194           4194         166359
BM_eigen_i0e_float/4k         32756          32761          21373
BM_eigen_i0e_float/32k       261133         261153           2678
BM_eigen_i0e_float/256k     2087938        2088231            333
BM_eigen_i0e_float/1M       8380409        8381234             84
BM_eigen_i1e_double/1            56.3           56.3     10000000
BM_eigen_i1e_double/8           397            397        1772376
BM_eigen_i1e_double/64         3114           3115         223881
BM_eigen_i1e_double/512       25358          25361          27761
BM_eigen_i1e_double/4k       203543         203593           3462
BM_eigen_i1e_double/32k     1613649        1613803            428
BM_eigen_i1e_double/256k   12910625       12910374             54
BM_eigen_i1e_double/1M     51723824       51723991             10
BM_eigen_i1e_float/1             28.3           28.3     24683049
BM_eigen_i1e_float/8             74.8           74.9      9366216
BM_eigen_i1e_float/64           505            505        1000000
BM_eigen_i1e_float/512         4068           4068         171690
BM_eigen_i1e_float/4k         31803          31806          21948
BM_eigen_i1e_float/32k       253637         253692           2763
BM_eigen_i1e_float/256k     2019711        2019918            346
BM_eigen_i1e_float/1M       8238681        8238713             86


After:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            15.8           15.8     44097476
BM_eigen_i0e_double/8            99.3           99.3      7014884
BM_eigen_i0e_double/64          777            777         886612
BM_eigen_i0e_double/512        6180           6181         100000
BM_eigen_i0e_double/4k        48136          48140          14678
BM_eigen_i0e_double/32k      385936         385943           1801
BM_eigen_i0e_double/256k    3293324        3293551            228
BM_eigen_i0e_double/1M     12423600       12424458             57
BM_eigen_i0e_float/1             16.3           16.3     43038042
BM_eigen_i0e_float/8             30.1           30.1     23456931
BM_eigen_i0e_float/64           169            169        4132875
BM_eigen_i0e_float/512         1338           1339         516860
BM_eigen_i0e_float/4k         10191          10191          68513
BM_eigen_i0e_float/32k        81338          81337           8531
BM_eigen_i0e_float/256k      651807         651984           1000
BM_eigen_i0e_float/1M       2633821        2634187            268
BM_eigen_i1e_double/1            16.2           16.2     42352499
BM_eigen_i1e_double/8           110            110        6316524
BM_eigen_i1e_double/64          822            822         851065
BM_eigen_i1e_double/512        6480           6481         100000
BM_eigen_i1e_double/4k        51843          51843          10000
BM_eigen_i1e_double/32k      414854         414852           1680
BM_eigen_i1e_double/256k    3320001        3320568            212
BM_eigen_i1e_double/1M     13442795       13442391             53
BM_eigen_i1e_float/1             17.6           17.6     41025735
BM_eigen_i1e_float/8             35.5           35.5     19597891
BM_eigen_i1e_float/64           240            240        2924237
BM_eigen_i1e_float/512         1424           1424         485953
BM_eigen_i1e_float/4k         10722          10723          65162
BM_eigen_i1e_float/32k        86286          86297           8048
BM_eigen_i1e_float/256k      691821         691868           1000
BM_eigen_i1e_float/1M       2777336        2777747            256


This shows anywhere from a 50% to 75% improvement on these operations.

I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).

Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
..
CXX11 Update ThreadLocal to use separate Initialize/Release callables 2019-09-10 16:13:32 -07:00
src Add packetized versions of i0e and i1e special functions. 2019-09-11 18:34:02 -07:00
AdolcForward bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
AlignedVector3 bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
ArpackSupport bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
AutoDiff Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
BVH bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
CMakeLists.txt
EulerAngles bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
FFT bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
IterativeSolvers bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
KroneckerProduct
LevenbergMarquardt bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
MatrixFunctions Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file). 2018-10-19 21:10:28 +02:00
MoreVectorization bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
MPRealSupport Fix MPrealSupport 2018-09-20 18:30:10 +02:00
NonLinearOptimization bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
NumericalDiff bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
OpenGLSupport bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
Polynomials bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
Skyline bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
SparseExtra
SpecialFunctions PR 681: Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Splines Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00