mirror of https://gitlab.com/libeigen/eigen.git
synced 2025-01-06 14:14:46 +08:00
f1e8307308
2. Simplify handling of special cases by taking advantage of the fact that the builtin vrsqrt approximation handles negative, zero and +inf arguments correctly. This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
   Before: y = y * (1.5 - x/2 * y^2)
   After: y = y * (1.5 - y * (x/2) * y)
   Forming y^2 can overflow or underflow for very small (denormalized) or very large values of x, while the intermediate product y * (x/2) ~= sqrt(x)/2 stays in range and the full product y * (x/2) * y ~= 1/2. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.

Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o

||
---|---|---|
src | ||
Cholesky | ||
CholmodSupport | ||
CMakeLists.txt | ||
Core | ||
Dense | ||
Eigen | ||
Eigenvalues | ||
Geometry | ||
Householder | ||
IterativeLinearSolvers | ||
Jacobi | ||
KLUSupport | ||
LU | ||
MetisSupport | ||
OrderingMethods | ||
PardisoSupport | ||
PaStiXSupport | ||
QR | ||
QtAlignedMalloc | ||
Sparse | ||
SparseCholesky | ||
SparseCore | ||
SparseLU | ||
SparseQR | ||
SPQRSupport | ||
StdDeque | ||
StdList | ||
StdVector | ||
SuperLUSupport | ||
SVD | ||
UmfPackSupport |
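
The Newton-Raphson reformulation described in item 3 of the commit message can be illustrated with a scalar sketch. This is not Eigen's packet code: the function names `rsqrt_step_before` and `rsqrt_step_after` are hypothetical, the initial estimate is a stand-in for the hardware vrsqrt approximation (assumed here to be about 0.1% off), and the example assumes IEEE single precision with denormals enabled (no flush-to-zero, no fast-math).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar stand-ins for the vectorized refinement step.

// "Before" form: y * (1.5 - x/2 * y^2). The square y*y is formed explicitly.
inline float rsqrt_step_before(float x, float y) {
  const float half_x = 0.5f * x;
  const float y2 = y * y;  // overflows to +inf once |y| exceeds about 1.8e19
  return y * (1.5f - half_x * y2);
}

// "After" form: y * (1.5 - (y * (x/2)) * y). No y*y is ever formed.
inline float rsqrt_step_after(float x, float y) {
  const float half_x = 0.5f * x;
  const float t = y * half_x;  // roughly sqrt(x)/2, which stays in range
  return y * (1.5f - t * y);   // t * y is roughly 1/2 near convergence
}

// Compares both forms on a denormal input near the 1e-42 limit
// mentioned in the commit message.
inline void check_denormal_example() {
  const float x = 1e-42f;                  // denormal in single precision
  const float y0 = 1.001f / std::sqrt(x);  // assumed estimate, ~0.1% too large
  const double exact = 1.0 / std::sqrt(static_cast<double>(x));

  const float before = rsqrt_step_before(x, y0);
  const float after = rsqrt_step_after(x, y0);

  assert(!std::isfinite(before));  // y0 * y0 overflowed, so the result is lost
  assert(std::isfinite(after));
  assert(std::fabs(after / exact - 1.0) < 1e-3);
}
```

Calling `check_denormal_example()` exercises both forms on the same input: with y0 near 1e21, the explicit square is about 1e42 and exceeds the largest finite float, whereas the reordered product never leaves the representable range.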