6642 Commits

Author SHA1 Message Date
Gengxin Xie
5c642950a5 Bug Fix: correct the bug that won't define EIGEN_HAS_FP16_C
if the compiler isn't clang
2021-11-04 22:13:01 +00:00
Gilad
0d73440fb2 Documentation of Quaternion constructor from MatrixBase 2021-11-04 16:21:26 +00:00
Xinle Liu
478a1bdda6 Fix total deflation issue in BDCSVD, when & only when M is already diagonal. 2021-11-02 16:53:55 +00:00
Chip Kerchner
9cf34ee0ae Invert rows and depth in non-vectorized portion of packing (PowerPC). 2021-10-28 21:59:41 +00:00
Ilya Tokar
e1cb6369b0 Add AVX vector path to float2half/half2float
Makes e.g. matrix multiplication 2x faster:
name         old cpu/op  new cpu/op  delta
BM_convers   181ms ± 1%    62ms ± 9%  -65.82%  (p=0.016 n=4+5)

Tested on all possible input values (not adding tests, since they
take a long time).
2021-10-28 13:59:01 -04:00
Antonio Sanchez
03d4cbb307 Fix min/max nan-propagation for scalar "other".
Copied input type from `EIGEN_MAKE_CWISE_BINARY_OP`.

Fixes #2362.
2021-10-28 09:28:29 -07:00
Antonio Sanchez
e559701981 Fix compile issue for gcc 4.8 2021-10-28 08:23:19 -07:00
Rohit Santhanam
48e40b22bf Preliminary HIP bfloat16 GPU support. 2021-10-27 18:36:45 +00:00
Antonio Sanchez
40bbe8a4d0 Fix ZVector build.
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu.  This allows the
packetmath tests to pass.
2021-10-27 16:30:15 +00:00
Alex Druinsky
6bb6a6bf53 Vectorize fp16 tanh and logistic functions on Neon
Activates vectorization of the Eigen::half versions of the tanh and
logistic functions when they run on Neon. Both functions convert their
inputs to float before computing the output, and as a result of this
commit, the conversions and the computation in float are vectorized.
2021-10-27 16:09:16 +00:00
Andreas Krebbel
8faafc3aaa ZVector: Move alignas qualifier to come first
We currently have plenty of type definitions with the alignment
qualifier coming after the type.  The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];

Turn this into:
EIGEN_ALIGN16 int ai[4];
2021-10-26 15:33:47 +02:00
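The placement described in the commit above can be sketched in standalone C++. This is a minimal illustration that assumes the alignment macro expands to `alignas(16)`; `ALIGN16_SKETCH`, `PacketSketch`, and `is_aligned16` are hypothetical names, not Eigen's own definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for an alignment macro; assumed to expand to alignas(16).
#define ALIGN16_SKETCH alignas(16)

struct PacketSketch {
  // The qualifier comes first, before the type, as the commit recommends.
  ALIGN16_SKETCH int ai[4];
};

// The member's alignment requirement propagates to the enclosing struct.
static_assert(alignof(PacketSketch) == 16, "struct should be 16-byte aligned");

// Returns true if the pointer value is a multiple of 16.
inline bool is_aligned16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

With the qualifier in front, `is_aligned16` should hold for the `ai` member of any `PacketSketch` object with automatic or static storage.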
Alex Druinsky
d0e3791b1a Fix vectorized reductions for Eigen::half
Fixes compiler errors in expressions that look like

  Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()

The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.

The change is necessary for Eigen::half because unlike other types,
Eigen::half scalars cannot be implicitly created from integers.
2021-10-25 14:44:33 -07:00
Yann Billeter
6c3206152a fix(CommaInitializer): pass dims at compile-time 2021-10-25 19:53:38 +00:00
Antonio Sanchez
0578feaabc Remove const from visitor return type.
This seems to interfere with `pload`/`ploadu`, since `pload<const
Packet**>` are not defined.

This should unbreak the arm/ppc builds.
2021-10-25 19:09:50 +00:00
benardp
b63c096fbb Extend EIGEN_QT_SUPPORT to Qt6 2021-10-23 23:43:06 +00:00
Lennart Steffen
163f11e24a Included note on inner stride for compile-time vectors. See https://gitlab.com/libeigen/eigen/-/issues/2355#note_711078126 2021-10-22 09:46:43 +00:00
Rasmus Munk Larsen
2d3fec8ff6 Add nan-propagation options to matrix and array plugins. 2021-10-21 19:40:11 +00:00
Antonio Sanchez
b86e013321 Revert bit_cast to use memcpy for CUDA.
To elide the memcpy, we need to first load the `src` value into
registers by making a local copy. This avoids the need to resort
to potential UB by using `reinterpret_cast`.

This change doesn't seem to affect CPU (at least not with gcc/clang).
With optimizations on, the copy is also elided.
2021-10-21 08:14:11 -07:00
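The memcpy-based approach in the commit above can be sketched as follows. This is a hedged illustration, not Eigen's actual `bit_cast`: `bit_cast_sketch` and `float_bits` are hypothetical names, and the example assumes `float` is a 32-bit IEEE-754 type.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterprets the bytes of `src` as a value of type Tgt.
template <typename Tgt, typename Src>
Tgt bit_cast_sketch(const Src& src) {
  static_assert(sizeof(Tgt) == sizeof(Src), "source and target sizes must match");
  static_assert(std::is_trivially_copyable<Src>::value, "Src must be trivially copyable");
  static_assert(std::is_trivially_copyable<Tgt>::value, "Tgt must be trivially copyable");
  // Local copy of the source, mirroring the commit's description of first
  // loading the value so that the memcpy can be elided by the optimizer.
  const Src staged = src;
  Tgt tgt;
  std::memcpy(&tgt, &staged, sizeof(Tgt));
  return tgt;
}

// Returns the raw bit pattern of a float (assumes a 32-bit float).
inline std::uint32_t float_bits(float f) {
  return bit_cast_sketch<std::uint32_t>(f);
}
```

Unlike a `reinterpret_cast` between unrelated types, copying the bytes with `std::memcpy` stays within defined behavior for trivially copyable types.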
Antonio Sanchez
45e67a6fda Use reinterpret_cast on GPU for bit_cast.
This seems to be the recommended approach for doing type punning in
CUDA. See for example
- https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union
- https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/
(the latter puns a double to an `int2`).
The issue is that for CUDA, the `memcpy` is not elided, and ends up
being an expensive operation.  We already have similar `reinterpret_cast`s across
the Eigen codebase for GPU (as does TensorFlow).
2021-10-20 21:34:40 +00:00
Antonio Sanchez
95bb645e92 Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation.
Looks like we need to update the
`EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR` for newer versions of MSVC as
well when compiling with NVCC.  Fixes build issues for VS 2017.
2021-10-20 19:38:14 +00:00
Antonio Sanchez
fd5f48e465 Fix tuple compilation for VS2017.
VS2017 doesn't like deducing alias types, leading to a bunch of compile
errors for functions involving the `tuple` alias.  Replacing with
`TupleImpl` seems to solve this, allowing the test to compile/pass.
2021-10-20 19:18:34 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
f2c9c2d2f7 Vectorize Visitor.h. 2021-10-20 16:58:01 +00:00
Antonio Sanchez
f0f1d7938b Disable testing of complex compound assignment operators for MSVC.
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).

Trying to use one of these on device will currently lead to a
duplicate definition error.  This is still probably preferable
to no error though.  If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.

The only proper solution would be to define our own custom `Complex`
type.
2021-09-27 15:15:11 -07:00
Antonio Sanchez
21640612be Disable more CUDA warnings.
For cuda 9.2 and 11.4, they changed the numbers again.

Fixes #2331.
2021-09-24 21:31:14 -07:00
Antonio Sanchez
e9e90892fe Disable another device warning 2021-09-23 13:43:18 -07:00
Antonio Sanchez
86c0decc48 Disable more NVCC warnings.
The 2979 warning is yet another "calling a __host__ function from a
__host__ __device__ function" warning.  Although we probably should eventually
address these, they are flooding the logs.  Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.

The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case.  Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects.  Again,
these warnings are flooding the logs, so disabling for now.
2021-09-23 10:52:39 -07:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
sciencewhiz
4b6036e276 fix various typos 2021-09-22 16:15:06 +00:00
Alexander Grund
b5eaa42695 Fix alias violation in BFloat16
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
2021-09-20 10:37:50 +02:00
Antonio Sanchez
3c724c44cf Fix strict aliasing bug causing product_small failure.
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.

Fixes #2327.
2021-09-17 21:09:34 +00:00
Antonio Sanchez
5dac0b53c9 Move Eigen::all,last,lastp1,lastN to Eigen::placeholders::.
These names are so common, IMO they should not exist directly in the
`Eigen::` namespace.  This prevents us from using the `last` or `all`
names for any parameters or local variables, otherwise spitting out
warnings about shadowing or hiding the global values.  Many external
projects (and our own examples) also heavily use
```
using namespace Eigen;
```
which means these conflict with external libraries as well, e.g.
`std::fill(first,last,value)`.

It seems originally these were placed in a separate namespace
`Eigen::placeholders`, which has since been deprecated.  I propose
to un-deprecate this, and restore the original locations.

These symbols are also imported into `Eigen::indexing`, which
additionally imports `fix` and `seq`. An alternative is to remove the
`placeholders` namespace and stick with `indexing`.

NOTE: this is an API-breaking change.

Fixes #2321.
2021-09-17 10:21:42 -07:00
Rasmus Munk Larsen
6cadab6896 Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert. 2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen
7b975acb1f Remove unused variable. 2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen
92849d814b Remove unused variable. 2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen
da027fa20a Remove unused variable. 2021-09-16 20:02:42 +00:00
Antonio Sanchez
cb50730993 Default eigen_packet_wrapper constructor.
This makes it trivial, allowing use of `memcpy`.

Fixes #2326
2021-09-14 10:57:22 -07:00
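The effect of defaulting a constructor, as in the commit above, can be checked with standard type traits. The two wrapper types below are hypothetical and only illustrate the language rule; they are not Eigen's `eigen_packet_wrapper`.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical wrapper with a user-provided (empty) default constructor.
struct UserProvidedCtor {
  int value;
  UserProvidedCtor() {}
};

// Hypothetical wrapper whose default constructor is explicitly defaulted.
struct DefaultedCtor {
  int value;
  DefaultedCtor() = default;
};

// A user-provided constructor makes the type non-trivial, even if it is empty.
static_assert(!std::is_trivial<UserProvidedCtor>::value, "user-provided ctor is not trivial");
// A defaulted constructor keeps the type trivial.
static_assert(std::is_trivial<DefaultedCtor>::value, "defaulted ctor keeps the type trivial");

// Copies a trivial wrapper byte by byte.
inline DefaultedCtor copy_via_memcpy(const DefaultedCtor& src) {
  DefaultedCtor dst;
  std::memcpy(&dst, &src, sizeof(DefaultedCtor));
  return dst;
}
```

Some compilers warn when `memcpy` writes into an object of non-trivial type, which is one plausible reason a trivial wrapper is easier to work with.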
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Antonio Sanchez
26e5beb8cb Device-compatible Tuple implementation.
An analogue of `std::tuple` that works on device.

Context: I've tried `std::tuple` in various versions of NVCC and clang,
and although code seems to compile, it often fails to run - generating
"illegal memory access" errors, or "illegal instruction" errors.
This replacement does work on device.
2021-09-08 13:34:19 -07:00
Antonio Sanchez
fcd73b4884 Add a simple serialization mechanism.
The `Serializer<T>` class implements a binary serialization that
can write to (`serialize`) and read from (`deserialize`) a byte
buffer.  Also added convenience routines for serializing
a list of arguments.

This will mainly be for testing, specifically to transfer data to
and from the GPU.
2021-09-08 09:38:59 -07:00
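A byte-buffer serializer of the kind described in the commit above might look like the sketch below. It is a simplified assumption covering trivially copyable types only; `SerializerSketch`, its member signatures, and `round_trip_ok` are hypothetical and do not reproduce Eigen's `Serializer<T>` interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical binary serializer for trivially copyable types.
template <typename T>
struct SerializerSketch {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");

  // Number of bytes needed to store `value`.
  static std::size_t size(const T&) { return sizeof(T); }

  // Writes `value` at `dest` and returns the position just past it.
  static std::uint8_t* serialize(std::uint8_t* dest, const T& value) {
    std::memcpy(dest, &value, sizeof(T));
    return dest + sizeof(T);
  }

  // Reads `value` from `src` and returns the position just past it.
  static const std::uint8_t* deserialize(const std::uint8_t* src, T& value) {
    std::memcpy(&value, src, sizeof(T));
    return src + sizeof(T);
  }
};

// Serializes a list of two arguments and reads them back in the same order.
inline bool round_trip_ok(int i, double d) {
  std::uint8_t buffer[sizeof(int) + sizeof(double)];
  std::uint8_t* w = SerializerSketch<int>::serialize(buffer, i);
  w = SerializerSketch<double>::serialize(w, d);

  int i2 = 0;
  double d2 = 0.0;
  const std::uint8_t* r = SerializerSketch<int>::deserialize(buffer, i2);
  r = SerializerSketch<double>::deserialize(r, d2);

  return i2 == i && d2 == d && w == buffer + sizeof(buffer) && r == buffer + sizeof(buffer);
}
```

Returning the advanced pointer from each call lets several values be packed back to back, which suits the stated use of moving test data to and from a device buffer.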
Antonio Sanchez
7792b1e909 Fix AVX2 PacketMath.h.
There were a couple typos ps -> epi32, and an unaligned load issue.
2021-09-03 19:47:57 +00:00
Antonio Sanchez
5bf35383e0 Disable MSVC constant condition warning.
We make extensive use of `if (CONSTANT)`, and cannot use C++17's
`if constexpr`.
2021-09-03 11:07:18 -07:00
Antonio Sanchez
def145547f Add missing packet types in pset1 call.
Oops, introduced this when "fixing" integer packets.
2021-09-02 16:21:07 -07:00
Antonio Sanchez
3b48a3b964 Remove stray DynamicSparseMatrix references.
DynamicSparseMatrix has been removed.  These shouldn't be here anymore.
2021-09-02 19:47:26 +00:00
Antonio Sanchez
ebd4b17d2f Fix tridiagonalization_inplace_selector.
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
2021-09-02 12:23:27 -07:00
Antonio Sanchez
998bab4b04 Missing EIGEN_DEVICE_FUNCs to get gpu_basic passing with CUDA 9.
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored.  Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
2021-09-01 19:49:53 -07:00
Antonio Sanchez
3d4ba855e0 Fix AVX integer packet issues.
Most are instances of AVX2 functions not protected by
`EIGEN_VECTORIZE_AVX2`.  There was also a missing semi-colon
for AVX512.
2021-09-01 14:14:43 -07:00
Antonio Sanchez
3a6296d4f1 Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang.
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.

This fixes our gitlab CI.
2021-09-01 09:19:55 -07:00
Antonio Sanchez
ff07a8a639 GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315).
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".

Tested `r` instead, and that seems to work, even with latest compilers.

Also fixed some minor macro issues to eliminate warnings on armv7.

Fixes #2315.
2021-08-31 20:20:47 +00:00
Antonio Sanchez
cc3573ab44 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
2021-08-31 19:13:12 +00:00