Commit Graph

6623 Commits

Author SHA1 Message Date
Antonio Sanchez
95bb645e92 Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation.
Looks like we need to update the
`EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR` for newer versions of MSVC as
well when compiling with NVCC.  Fixes build issues for VS 2017.
2021-10-20 19:38:14 +00:00
Antonio Sanchez
fd5f48e465 Fix tuple compilation for VS2017.
VS2017 doesn't like deducing alias types, leading to a bunch of compile
errors for functions involving the `tuple` alias.  Replacing with
`TupleImpl` seems to solve this, allowing the test to compile/pass.
2021-10-20 19:18:34 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
f2c9c2d2f7 Vectorize Visitor.h. 2021-10-20 16:58:01 +00:00
Antonio Sanchez
f0f1d7938b Disable testing of complex compound assignment operators for MSVC.
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).

Trying to use one of these on device will currently lead to a
duplicate definition error.  This is still probably preferable
to no error though.  If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.

The only proper solution would be to define our own custom `Complex`
type.
2021-09-27 15:15:11 -07:00
Antonio Sanchez
21640612be Disable more CUDA warnings.
For cuda 9.2 and 11.4, they changed the numbers again.

Fixes #2331.
2021-09-24 21:31:14 -07:00
Antonio Sanchez
e9e90892fe Disable another device warning 2021-09-23 13:43:18 -07:00
Antonio Sanchez
86c0decc48 Disable more NVCC warnings.
The 2979 warning is yet another "calling a __host__ function from a
__host__ device__ function.  Although we probably should eventually
address these, they are flooding the logs.  Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.

The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case.  Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects.  Again,
these warnings are flooding the logs, so disabling for now.
2021-09-23 10:52:39 -07:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
sciencewhiz
4b6036e276 fix various typos 2021-09-22 16:15:06 +00:00
Alexander Grund
b5eaa42695
Fix alias violation in BFloat16
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
2021-09-20 10:37:50 +02:00
Antonio Sanchez
3c724c44cf Fix strict aliasing bug causing product_small failure.
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.

Fixes #2327.
2021-09-17 21:09:34 +00:00
Antonio Sanchez
5dac0b53c9 Move Eigen::all,last,lastp1,lastN to Eigen::placeholders::.
These names are so common, IMO they should not exist directly in the
`Eigen::` namespace.  This prevents us from using the `last` or `all`
names for any parameters or local variables, otherwise spitting out
warnings about shadowing or hiding the global values.  Many external
projects (and our own examples) also heavily use
```
using namespace Eigen;
```
which means these conflict with external libraries as well, e.g.
`std::fill(first,last,value)`.

It seems originally these were placed in a separate namespace
`Eigen::placeholders`, which has since been deprecated.  I propose
to un-deprecate this, and restore the original locations.

These symbols are also imported into `Eigen::indexing`, which
additionally imports `fix` and `seq`. An alternative is to remove the
`placeholders` namespace and stick with `indexing`.

NOTE: this is an API-breaking change.

Fixes #2321.
2021-09-17 10:21:42 -07:00
Rasmus Munk Larsen
6cadab6896 Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert. 2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen
7b975acb1f Remove unused variable. 2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen
92849d814b Remove unused variable. 2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen
da027fa20a Remove unused variable. 2021-09-16 20:02:42 +00:00
Antonio Sanchez
cb50730993 Default eigen_packet_wrapper constructor.
This makes it trivial, allowing use of `memcpy`.

Fixes #2326
2021-09-14 10:57:22 -07:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Antonio Sanchez
26e5beb8cb Device-compatible Tuple implementation.
An analogue of `std::tuple` that works on device.

Context: I've tried `std::tuple` in various versions of NVCC and clang,
and although code seems to compile, it often fails to run - generating
"illegal memory access" errors, or "illegal instruction" errors.
This replacement does work on device.
2021-09-08 13:34:19 -07:00
Antonio Sanchez
fcd73b4884 Add a simple serialization mechanism.
The `Serializer<T>` class implements a binary serialization that
can write to (`serialize`) and read from (`deserialize`) a byte
buffer.  Also added convenience routines for serializing
a list of arguments.

This will mainly be for testing, specifically to transfer data to
and from the GPU.
2021-09-08 09:38:59 -07:00
Antonio Sanchez
7792b1e909 Fix AVX2 PacketMath.h.
There were a couple typos ps -> epi32, and an unaligned load issue.
2021-09-03 19:47:57 +00:00
Antonio Sanchez
5bf35383e0 Disable MSVC constant condition warning.
We use extensive use of `if (CONSTANT)`, and cannot use c++17's `if
constexpr`.
2021-09-03 11:07:18 -07:00
Antonio Sanchez
def145547f Add missing packet types in pset1 call.
Oops, introduced this when "fixing" integer packets.
2021-09-02 16:21:07 -07:00
Antonio Sanchez
3b48a3b964 Remove stray DynamicSparseMatrix references.
DynamicSparseMatrix has been removed.  These shouldn't be here anymore.
2021-09-02 19:47:26 +00:00
Antonio Sanchez
ebd4b17d2f Fix tridiagonalization_inplace_selector.
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
2021-09-02 12:23:27 -07:00
Antonio Sanchez
998bab4b04 Missing EIGEN_DEVICE_FUNCs to get gpu_basic passing with CUDA 9.
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored.  Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
2021-09-01 19:49:53 -07:00
Antonio Sanchez
3d4ba855e0 Fix AVX integer packet issues.
Most are instances of AVX2 functions not protected by
`EIGEN_VECTORIZE_AVX2`.  There was also a missing semi-colon
for AVX512.
2021-09-01 14:14:43 -07:00
Antonio Sanchez
3a6296d4f1 Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang.
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.

This fixes our gitlab CI.
2021-09-01 09:19:55 -07:00
Antonio Sanchez
ff07a8a639 GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315).
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".

Tested `r` instead, and that seems to work, even with latest compilers.

Also fixed some minor macro issues to eliminate warnings on armv7.

Fixes #2315.
2021-08-31 20:20:47 +00:00
Antonio Sanchez
cc3573ab44 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
2021-08-31 19:13:12 +00:00
Adam Kallai
1415817d8d win: include intrin header in Windows on ARM
intrin header is needed for _BitScanReverse and
_BitScanReverse64
2021-08-31 10:57:34 +02:00
Antonio Sanchez
5db9e5c779 Fix fix<N> when variable templates are not supported.
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.

Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.

Fixes #2308
2021-08-30 08:06:55 -07:00
Jakub Lichman
dc5b1f7d75 AVX512 and AVX2 support for Packet16i and Packet8i added 2021-08-25 19:38:23 +00:00
Han-Kuan Chen
ab28419298 optimize predux if architecture is aarch64 2021-08-25 19:18:54 +00:00
Rasmus Munk Larsen
82dd3710da Update version of master branch to 3.4.90. 2021-08-18 13:46:05 -07:00
Antonio Sanchez
2b410ecbef Workaround VS 2017 arg bug.
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs.  It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).
2021-08-18 18:39:18 +00:00
Jakob Struye
53a29c7e35 Clearer doc for squaredNorm 2021-08-18 15:11:15 +00:00
Antonio Sanchez
fc9d352432 Renamed shift_left/shift_right to shiftLeft/shiftRight.
For naming consistency.  Also moved to ArrayCwiseUnaryOps, and added
test.
2021-08-17 20:04:48 -07:00
Antonio Sanchez
2cc6ee0d2e Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.
2021-08-17 07:42:04 -07:00
Chip-Kerchner
8dcf3e38ba Fix unaligned loads in ploadLhs & ploadRhs for P8. 2021-08-16 20:28:22 -05:00
andiwand
5c6b3efead minor doc fix in Map.h 2021-08-16 12:02:33 +00:00
Chip-Kerchner
e07227c411 Reverse compare logic ƒin F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8). 2021-08-13 11:21:28 -05:00
Chip Kerchner
66499f0f17 Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+ 2021-08-12 21:38:54 +00:00
Rasmus Munk Larsen
8ce341caf2 * revise the meta_least_common_multiple function template, add a bool variable to check whether the A is larger than B.
* This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B.

Authored by @awoniu.
2021-08-11 18:10:01 +00:00
ChipKerchner
413bc491f1 Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl). 2021-08-10 15:03:18 -05:00
Gauri Deshpande
e6a5a594a7 remove denormal flushing in fp32tobf16 for avx & avx512 2021-08-09 22:15:21 +00:00
Rasmus Munk Larsen
a5a7faeb45 Avoid memory allocation in tridiagonalization_inplace_selector::run. 2021-08-06 20:48:10 +00:00
Alexander Karatarakis
4ba872bd75 Avoid leading underscore followed by cap in template identifiers 2021-08-04 22:41:52 +00:00
Antonio Sanchez
5ad8b9bfe2 Make inverse 3x3 faster and avoid gcc bug.
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory.  I'm *pretty* sure it's
a false positive, but it's hard to trigger.  The same warning
does not trigger with clang or later compiler versions.

In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.

```
name                                            old cpu/op  new cpu/op  delta
BM_Inverse3x3<DynamicMatrix3T<float>>            423ns ± 2%   433ns ± 3%   +2.32%    (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>>           425ns ± 2%   427ns ± 3%   +0.48%    (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>>            7.10ns ± 2%  0.80ns ± 1%  -88.67%  (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>>           7.45ns ± 2%  1.34ns ± 1%  -82.01%  (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>>     409ns ± 3%   419ns ± 3%   +2.40%   (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>>    414ns ± 3%   413ns ± 2%     ~       (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>>     7.57ns ± 1%  0.80ns ± 1%  -89.37%  (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>>    9.09ns ± 1%  2.58ns ±41%  -71.60%  (p=0.000 n=113+116)
```
2021-08-04 21:18:44 +00:00