Commit Graph

11 Commits

Author SHA1 Message Date
Antonio Sanchez
82d61af3a4 Fix rint SSE/NEON again, using optimization barrier.
This is a new version of !423, which failed for MSVC.

Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).

Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.

Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part.  If we are
over `2^digits` then just return the input.  This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied.  Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
Antonio Sánchez
9a663973b4 Revert "Fix rint for SSE/NEON."
This reverts commit e72dfeb8b9
2021-03-03 18:51:51 +00:00
Antonio Sanchez
e72dfeb8b9 Fix rint for SSE/NEON.
It seems *sometimes* with aggressive optimizations the combination
`psub(padd(a, b), b)` trick to force rounding is compiled away. Here
we replace with inline assembly to prevent this (I tried `volatile`,
but that leads to additional loads from memory).

Also fixed an edge case for large inputs `a` where adding `b` bumps
the value up a power of two and ends up rounding away more than
just the fractional part.  If we are over `2^digits` then just return
the input.  This edge case was missed in the test since the test was
comparing approximate equality, which was still satisfied.  Adding
a strict equality option catches it.
2021-03-03 09:41:46 -08:00
Antonio Sanchez
b2126fd6b5 Fix pfrexp/pldexp for half.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`.  This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.

Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00
Antonio Sanchez
f149e0ebc3 Fix MSVC complex sqrt and packetmath test.
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).

Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
2021-01-08 01:17:19 +00:00
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00