Commit Graph

11444 Commits

Author SHA1 Message Date
Chip Kerchner
91e99ec1e0 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow 2021-06-30 23:05:04 +00:00
Alexander Karatarakis
60400334a9 Make DenseStorage<> trivially_copyable 2021-06-30 04:27:51 +00:00
大河メタル
c81da59a25 Correct declarations for aarch64-pc-windows-msvc 2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen
5aebbe9098 Get rid of redundant pabs instruction in complex square root. 2021-06-29 23:26:15 +00:00
Antonio Sanchez
3a087ccb99 Modify tensor argmin/argmax to always return first occurence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.
2021-06-29 10:36:20 -07:00
Rohit Santhanam
2d132d1736 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses this.
2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen
bffd267d17 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. 2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen
52a5f98212 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. 2021-06-24 15:47:48 -07:00
Rasmus Munk Larsen
4ad30a73fc Use internal::ref_selector to avoid holding a reference to a RHS expression. 2021-06-22 14:31:32 +00:00
Rasmus Munk Larsen
ea62c937ed Update ComplexEigenSolver_eigenvectors.cpp 2021-06-21 19:06:25 +00:00
Rasmus Munk Larsen
c8a2b4d20a Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp 2021-06-21 19:06:04 +00:00
Antonio Sanchez
e9ab4278b7 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.
2021-06-21 17:29:55 +00:00
Antonio Sanchez
35a367d557 Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes ##2267
2021-06-18 13:22:54 -07:00
Antonio Sanchez
12e8d57108 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.
2021-06-16 18:41:17 -07:00
Chip-Kerchner
ef1fd341a8 EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate. 2021-06-16 16:30:31 +00:00
jenswehner
175f0cc1e9 changed documentation to make example compile 2021-06-16 11:45:06 +02:00
Antonio Sanchez
9e94c59570 Add missing ppc pcmp_lt_or_nan<Packet8bf> 2021-06-15 13:42:17 -07:00
Antonio Sanchez
954879183b Fix placement of permanent GPU defines. 2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen
13fb5ab92c Fix more enum arithmetic. 2021-06-15 09:09:31 -07:00
Antonio Sanchez
ad82d20cf6 Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.
2021-06-11 23:19:10 +00:00
Antonio Sanchez
514977f31b Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
2021-06-11 17:19:54 +00:00
Antonio Sanchez
6aec83263d Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
2021-06-11 17:02:19 +00:00
Rasmus Munk Larsen
fc87e2cbaa Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled. 2021-06-11 02:35:53 +00:00
Rasmus Munk Larsen
f64b2954c7 Fix c++20 warnings about using enums in arithmetic expressions. 2021-06-10 17:17:39 -07:00
Nicolas Cornu
001a57519a Fix parsing of version for nvhpc
As the first line of the version is empty it crashes,
so delete first line if it is empty
2021-06-10 18:30:53 +00:00
Rohit Santhanam
c8d40a7bf1 Removed dead code from GPU float16 unit test. 2021-05-28 20:06:48 +00:00
Cyril Kaiser
91cd67f057 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor. 2021-05-26 19:28:13 +00:00
Antonio Sanchez
dba753a986 Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.
2021-05-25 18:25:35 +00:00
Antonio Sanchez
ebb300d0b4 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251
2021-05-23 12:44:37 -07:00
Jakub Lichman
12471fcb5d predux_half_dowto4 test extended to all applicable packets 2021-05-21 16:42:19 +00:00
Steve Bronder
1720057023 Adds macro for checking if C++14 variable templates are supported 2021-05-21 16:25:32 +00:00
Niall Murphy
391094c507 Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.
2021-05-20 23:17:02 +00:00
Jakub Lichman
8877f8d9b2 ptranpose test for non-square kernels added 2021-05-19 08:26:45 +00:00
Guoqiang QI
3e006bfd31 Ensure all generated matrices for inverse_4x4 testes are invertible, this fix #2248 . 2021-05-13 15:03:30 +00:00
Nathan Luehr
972cf0c28a Fix calls to device functions from host code 2021-05-11 22:47:49 +00:00
Nathan Luehr
7e6a1c129c Device implementation of log for std::complex types. 2021-05-11 22:02:21 +00:00
Nathan Luehr
6753f0f197 Fix ambiguity due to argument dependent lookup. 2021-05-11 15:41:11 -05:00
guoqiangqi
3d9051ea84 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 . 2021-05-10 23:53:16 +00:00
Rohit Santhanam
39ec31c0ad Fix for issue where numext::imag and numext::real are used before they are defined. 2021-05-10 19:48:32 +00:00
Antonio Sanchez
c0eb5f89a4 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This changes
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.
2021-05-07 18:14:00 +00:00
Antonio Sanchez
0eba8a1fe3 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.
2021-05-07 17:51:29 +00:00
Antonio Sanchez
90e9a33e1c Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.
2021-05-07 16:26:57 +00:00
Christoph Hertzberg
722ca0b665 Revert addition of unused paddsub<Packet2cf>. This fixes #2242 2021-05-06 18:36:47 +02:00
Antonio Sanchez
e3b7f59659 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.
2021-05-04 13:34:49 -07:00
Antonio Sanchez
1c013be2cc Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.
2021-04-29 17:39:58 +00:00
Antonio Sanchez
172db7bfc3 Add missing pcmp_lt_or_nan for NEON Packet4bf. 2021-04-27 14:12:11 -07:00
Theo Fletcher
2ced0cc233 Added complex matrix unit tests for SelfAdjointEigenSolve 2021-04-26 19:00:51 +00:00
Jakub Lichman
d87648a6be Tests added and AVX512 bug fixed for pcmp_lt_or_nan 2021-04-25 20:58:56 +00:00
Jakub Lichman
1115f5462e Tests for pcmp_lt and pcmp_le added 2021-04-23 19:51:43 +00:00
Turing Eret
3804ca0d90 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.
2021-04-23 07:43:35 -06:00