Commit Graph

6139 Commits

Author SHA1 Message Date
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45 Fixes duplicate symbol when building blas
Missing inline breaks blas, since symbol generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`

Changed rest of inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
Both, Eigen::half and Eigen::Bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain

  Eigen::{half,bfloat16} -> float -> bool

We thus remove the explicit cast operator to bool.
2020-11-19 18:49:09 +01:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Antonio Sanchez
60218829b7 EOF newline added to InverseSize4.
Causing build breakages due to `-Wnewline-eof -Werror` that seems to be
common across Google.
2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen
2d63706545 Add missing parens around macro argument. 2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109 Replace SSE_SHUFFLE_MASK macro with shuffle_mask. 2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8 Avoid promotion of Arm __fp16 to float in Neon PacketMath
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617 Fix missing EIGEN_CONSTEXPR pop_macro in Half.
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055 Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
acxz
9175f50d6f Add EIGEN_DEVICE_FUNC to TranspositionsBase
Fixes #2057.
2020-11-16 15:37:40 +00:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
2020-11-14 05:50:42 +00:00
guoqiangqi
8324e5e049 Fix typo in NEON/PacketMath.h 2020-11-13 00:46:41 +00:00
Rasmus Munk Larsen
bec72345d6 Simplify expression for inner product fallback in Gemv product evaluator. 2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen
276db21f26 Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct. 2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen
cf12474a8b Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime.
This speeds up inner products where the one or or both arguments is dynamic for small and medium-sized vectors (up to 32k).

name                           old time/op             new time/op   delta
BM_VecVecStatStat<float>/1     1.64ns ± 0%             1.64ns ± 0%     ~
BM_VecVecStatStat<float>/8     2.99ns ± 0%             2.99ns ± 0%     ~
BM_VecVecStatStat<float>/64    7.00ns ± 1%             7.04ns ± 0%   +0.66%
BM_VecVecStatStat<float>/512   61.6ns ± 0%             61.6ns ± 0%     ~
BM_VecVecStatStat<float>/4k     551ns ± 0%              553ns ± 1%   +0.26%
BM_VecVecStatStat<float>/32k   4.45µs ± 0%             4.45µs ± 0%     ~
BM_VecVecStatStat<float>/256k  77.9µs ± 0%             78.1µs ± 1%     ~
BM_VecVecStatStat<float>/1M     312µs ± 0%              312µs ± 1%     ~
BM_VecVecDynStat<float>/1      13.3ns ± 1%              4.6ns ± 0%  -65.35%
BM_VecVecDynStat<float>/8      14.4ns ± 0%              6.2ns ± 0%  -57.00%
BM_VecVecDynStat<float>/64     24.0ns ± 0%             10.2ns ± 3%  -57.57%
BM_VecVecDynStat<float>/512     138ns ± 0%               68ns ± 0%  -50.52%
BM_VecVecDynStat<float>/4k     1.11µs ± 0%             0.56µs ± 0%  -49.72%
BM_VecVecDynStat<float>/32k    8.89µs ± 0%             4.46µs ± 0%  -49.89%
BM_VecVecDynStat<float>/256k   78.2µs ± 0%             78.1µs ± 1%     ~
BM_VecVecDynStat<float>/1M      313µs ± 0%              312µs ± 1%     ~
BM_VecVecDynDyn<float>/1       10.4ns ± 0%             10.5ns ± 0%   +0.91%
BM_VecVecDynDyn<float>/8       12.0ns ± 3%             11.9ns ± 0%     ~
BM_VecVecDynDyn<float>/64      37.4ns ± 0%             19.6ns ± 1%  -47.57%
BM_VecVecDynDyn<float>/512      159ns ± 0%               81ns ± 0%  -49.07%
BM_VecVecDynDyn<float>/4k      1.13µs ± 0%             0.58µs ± 1%  -49.11%
BM_VecVecDynDyn<float>/32k     8.91µs ± 0%             5.06µs ±12%  -43.23%
BM_VecVecDynDyn<float>/256k    78.2µs ± 0%             78.2µs ± 1%     ~
BM_VecVecDynDyn<float>/1M       313µs ± 0%              312µs ± 1%     ~
2020-11-12 18:02:37 +00:00
Pedro Caldeira
c29935b323 Add support for dynamic dispatch of MMA instructions for POWER 10 2020-11-12 11:31:15 -03:00
acxz
b714dd9701 remove annotation for first declaration of default con/destruction 2020-11-12 04:34:12 +00:00
mehdi-goli
e24a1f57e3 [SYCL Function pointer Issue]: SYCL does not support function pointer inside the kernel, due to the portability issue of a function pointer and memory address space among host and accelerators. To fix the issue, function pointers have been replaced by function objects. 2020-11-12 01:50:28 +00:00
guoqiangqi
82fe059f35 Fix issue2045 which get a error case _mm256_set_m128d op not supported by gcc 7.x 2020-11-04 09:21:39 +08:00
Deven Desai
39a038f2e4 Fix for ROCm (and CUDA?) breakage - 201029
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error

e265f7ed8e

```

Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
  return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                           ^~~~~~
                                                                           Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
          ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```

The fix is in this commit is trivial. Please review and merge
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e Remove unused functions in Half.h.
The following functions have been removed:

  Eigen::half fabsh(const Eigen::half&)
  Eigen::half exph(const Eigen::half&)
  Eigen::half sqrth(const Eigen::half&)
  Eigen::half powh(const Eigen::half&, const Eigen::half&)
  Eigen::half floorh(const Eigen::half&)
  Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00
David Tellenbach
09f015852b Replace numext::as_uint with numext::bit_cast<numext::uint32_t> 2020-10-29 07:28:28 +01:00
David Tellenbach
e265f7ed8e Add support for Armv8.2-a __fp16
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces

* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
  float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half

The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.

This closes #1940.
2020-10-28 20:15:09 +00:00
mehdi-goli
b9ff791fed [Missing SYCL math op]: Addin the missing LDEXP Function for SYCL. 2020-10-28 08:32:57 +00:00
mehdi-goli
61461d682a [Fixing expf issue]: Eigen uses the packet type operation for scaler type float on Sigmoid function(https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result SYCL backend breaks since SYCL backend only supports packet operation for vectorized type float4 and double2. The issue has been fixed by adding scalar type float to packet operation pexp for SYCL backend. 2020-10-28 08:30:34 +00:00
guoqiangqi
28aef8e816 Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> 2020-10-20 11:37:09 +08:00
guoqiangqi
4a77eda1fd remove unnecessary specialize template of pexp for scale float/double 2020-10-19 00:51:42 +00:00
Antonio Sanchez
d9f0d9eb76 Fix missing pfirst<Packet16b> for MSVC.
It was only defined under one `#ifdef` case.  This fixes the `packetmath_14`
test for MSVC.
2020-10-16 16:22:00 -07:00
Rasmus Munk Larsen
21edea5edd Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent. 2020-10-15 18:39:58 -07:00
Deven Desai
011e0db31d Fix for ROCm/HIP breakage - 201013
The following commit seems to have introduced regressions in ROCm/HIP support.

183a208212

It causes some unit-tests to fail with the following error

```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```

The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up the some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. use ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contain the decls for the std math functions ( `math_functions.h` ), and this is the cause of the error above.

It seems that the newer HIP compilers do support the calling of `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in EIGEN to use the `std::` namespace (instead of the global namespace ), when compiling  with HIP compiler. However it seems there was a recent commit to remove the EIGEN_USING_STD_MATH macro and collapse it uses into the EIGEN_USING_STD macro ( 4091f6b25c ).

Replacing all std math calls will essentially require re-surrecting the EIGEN_USING_STD_MATH macro, so not choosing that option.

Also HIP compilers only have support std math calls within device code, and not all std functions (specifically not for malloc/free which are prefixed via EIGEN_USING_STD). So modyfing EIGEN_USE_STD implementation to use std:: namspace for HIP will not work either.

Hence going for the ugly solution of special casing the three calls that breaking the HIP compile, to explicitly use the std:: namespace
2020-10-15 12:17:35 +00:00
Rasmus Munk Larsen
6ea8091705 Revert change from 4e4d3f32d1 that broke BFloat16.h build with older compilers. 2020-10-15 01:20:08 +00:00
Guoqiang QI
4700713faf Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops,also unified AVX512 plog<Packet16f> op with generic api 2020-10-15 00:54:45 +00:00
Rasmus Munk Larsen
af6f43d7ff Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512. 2020-10-14 23:11:24 +00:00
Rasmus Munk Larsen
208b3626d1 Revert generic implementation of predux, since it break compilation of predux_any with MSVC. 2020-10-14 21:41:28 +00:00
David Tellenbach
e3e2cf9d24 Add MatrixBase::cwiseArg() 2020-10-14 01:56:42 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
acxz
807e51528d undefine EIGEN_CONSTEXPR before redefinition 2020-10-12 20:28:56 -04:00
Rasmus Munk Larsen
9a4d04c05f Make bitwise_helper a device function to unbreak GPU builds. 2020-10-10 01:45:20 +00:00
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
David Tellenbach
4091f6b25c Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD 2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
183a208212 Implement generic bitwise logical packet ops that work for all types. 2020-10-08 22:45:20 +00:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
f66f3393e3 Use reinterpret_cast instead of C-style cast in Inverse_NEON.h 2020-10-04 00:35:09 +02:00
Rasmus Munk Larsen
22c971a225 Don't cast away const in Inverse_NEON.h. 2020-10-02 15:06:34 -07:00
Rasmus Munk Larsen
f93841b53e Use EIGEN_USING_STD to fix CUDA compilation error on BFloat16.h. 2020-10-02 14:47:15 -07:00
Rasmus Munk Larsen
ee714f79f7 Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments. 2020-10-02 21:05:35 +00:00
janos
f7b185a8b1 dont use =* might not return a Scalar 2020-10-02 14:36:51 +02:00