11659 Commits

Author SHA1 Message Date
Rohit Santhanam
beea14a18f Enable extract et al. for HIP GPU. 2021-07-09 14:58:07 +00:00
Rasmus Munk Larsen
0c361c4899 Defer to std::fill_n when filling a dense object with a constant value. 2021-07-09 03:59:35 +00:00
Antonio Sanchez
1e6c6c1576 Replace memset with fill to work for non-trivial scalars.
For custom scalars, zero is not necessarily represented by
a zeroed-out memory block (e.g. gnu MPFR). We therefore
cannot rely on `memset` if we want to fill a matrix or tensor
with zeroes. Instead, we should rely on `fill`, which for trivial
types does end up getting converted to a `memset` under-the-hood
(at least with gcc/clang).

Requires adding a `fill(begin, end, v)` to `TensorDevice`.

Replaced all potentially bad instances of memset with fill.

Fixes #2245.
2021-07-08 18:34:41 +00:00
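To illustrate the problem commit 1e6c6c1576 describes, here is a minimal sketch. `BiasedScalar` is a hypothetical stand-in type invented for this example (it is not an Eigen or MPFR type); its only purpose is to have a zero value that is not represented by all-zero bytes. The helper names are likewise assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical custom scalar: the value is stored with an offset of 1000,
// so the number zero is NOT represented by an all-zero memory block.
struct BiasedScalar {
  int raw;  // holds value + 1000
  BiasedScalar() : raw(1000) {}
  explicit BiasedScalar(int v) : raw(v + 1000) {}
  int value() const { return raw - 1000; }
};

// Byte-wise zeroing: every element ends up with raw == 0,
// which this type interprets as -1000 rather than 0.
inline int first_value_after_memset() {
  std::vector<BiasedScalar> data(4, BiasedScalar(7));
  std::memset(static_cast<void*>(data.data()), 0,
              data.size() * sizeof(BiasedScalar));
  return data[0].value();
}

// Element-wise fill: every element is assigned a real zero value.
inline int first_value_after_fill() {
  std::vector<BiasedScalar> data(4, BiasedScalar(7));
  std::fill(data.begin(), data.end(), BiasedScalar(0));
  return data[0].value();
}

// Small self-check showing the difference between the two approaches.
inline void check_fill_vs_memset() {
  assert(first_value_after_memset() != 0);
  assert(first_value_after_fill() == 0);
}
```

For trivial types such as `float`, compilers are generally free to lower the `std::fill` call to the same code as `memset`, which is the point the commit message makes.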
Jonas Harsch
e9c9a3130b Removed superfluous boolean degenerate in TensorMorphing.h. 2021-07-08 18:02:58 +00:00
Guoqiang QI
4bcd42c271 Make a copy of the input matrix when computing the inverse in place; this fixes #2285. 2021-07-08 17:05:26 +00:00
Kolja Brix
a59cf78c8d Add Doxygen-style documentation to main.h. 2021-07-07 18:23:59 +00:00
Antonio Sanchez
f44f05532d Fix CMake directory issues.
Allows absolute and relative paths for
- `INCLUDE_INSTALL_DIR`
- `CMAKEPACKAGE_INSTALL_DIR`
- `PKGCONFIG_INSTALL_DIR`

Type should be `PATH` not `STRING`.  Contrary to !211, these don't
seem to be made absolute if user-defined - according to the doc any
directories should use `PATH` type, which allows a file dialog
to be used via the GUI.  It also better handles file separators.

If the user provides an absolute path, it will be made relative to
`CMAKE_INSTALL_PREFIX` so that `configure_package_config_file` will
work.

Fixes #2155 and #2269.
2021-07-07 17:24:57 +00:00
Antonio Sanchez
f5a9873bbb Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks like
```
```
end up generating titles that look like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate properly formatted headers, we
must directly use HTML instead of Markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.
2021-07-03 04:39:22 +00:00
Rasmus Munk Larsen
7b35638ddb Fix breakage of conj_helper in conjunction with custom types introduced in !537. 2021-07-02 20:42:15 +00:00
Jonas Harsch
aab747021b Don't crash when attempting to shuffle an empty tensor. 2021-07-02 20:33:52 +00:00
Rasmus Munk Larsen
bbfc4d54cd Use padd instead of +. 2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen
9312a5bf5c Implement a generic vectorized version of Smith's algorithm for complex division. 2021-07-01 23:31:12 +00:00
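For reference, a scalar sketch of Smith's method for complex division follows. It is not the vectorized packet code from commit 9312a5bf5c; the function name `smith_divide` is an assumption made for this example. The idea is to scale by the ratio of the smaller to the larger denominator component, so that `c*c + d*d` is never formed directly and cannot overflow on its own.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Scalar sketch of Smith's method for (a + bi) / (c + di).
// Branch on which denominator component is larger in magnitude so that
// the ratio r always satisfies |r| <= 1.
inline std::complex<double> smith_divide(std::complex<double> x,
                                         std::complex<double> y) {
  const double a = x.real(), b = x.imag();
  const double c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const double r = d / c;
    const double den = c + d * r;  // equals (c^2 + d^2) / c
    return {(a + b * r) / den, (b - a * r) / den};
  } else {
    const double r = c / d;
    const double den = c * r + d;  // equals (c^2 + d^2) / d
    return {(a * r + b) / den, (b * r - a) / den};
  }
}

// Self-check with inputs whose squared magnitude would overflow a double.
inline void check_smith_divide() {
  const std::complex<double> big(1e200, 1e200);
  const std::complex<double> q = smith_divide(big, big);
  assert(std::abs(q.real() - 1.0) < 1e-12);
  assert(std::abs(q.imag()) < 1e-12);
}
```

A textbook formula that divides by `c*c + d*d` would overflow for the inputs used in `check_smith_divide`, whereas the scaled form stays finite.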
Antonio Sanchez
6035da5283 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
2021-07-01 22:58:14 +00:00
Antonio Sanchez
154f00e9ea Fix inverse nullptr/asan errors for LU.
For empty or single-column matrices, `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.
2021-07-01 13:41:04 -07:00
Dan Miller
eb04775903 Fix duplicate definitions on Mac 2021-07-01 14:54:12 +00:00
Chip Kerchner
91e99ec1e0 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow 2021-06-30 23:05:04 +00:00
Alexander Karatarakis
60400334a9 Make DenseStorage<> trivially_copyable 2021-06-30 04:27:51 +00:00
大河メタル
c81da59a25 Correct declarations for aarch64-pc-windows-msvc 2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen
5aebbe9098 Get rid of redundant pabs instruction in complex square root. 2021-06-29 23:26:15 +00:00
Antonio Sanchez
3a087ccb99 Modify tensor argmin/argmax to always return first occurrence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurrence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.
2021-06-29 10:36:20 -07:00
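The tie-breaking rule from commit 3a087ccb99 can be sketched as follows. This is not the Eigen functor itself; `IndexValue`, `argmin_combine`, and the two driver functions are names assumed for this example. The point is that comparing indices on equal values makes the result independent of the order in which partial results are merged, which is what varies across threads or GPU blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using IndexValue = std::pair<std::ptrdiff_t, float>;  // (index, value)

// Merge two candidates. On equal values keep the smaller index, so the
// outcome does not depend on the order in which candidates arrive.
inline IndexValue argmin_combine(const IndexValue& a, const IndexValue& b) {
  if (b.second < a.second) return b;
  if (b.second == a.second && b.first < a.first) return b;
  return a;
}

// Reduce front to back. Assumes v is non-empty.
inline std::ptrdiff_t argmin_forward(const std::vector<float>& v) {
  IndexValue acc(0, v[0]);
  for (std::size_t i = 1; i < v.size(); ++i)
    acc = argmin_combine(acc, IndexValue(static_cast<std::ptrdiff_t>(i), v[i]));
  return acc.first;
}

// Reduce back to front, mimicking a different merge order. Assumes v is non-empty.
inline std::ptrdiff_t argmin_backward(const std::vector<float>& v) {
  const std::size_t n = v.size();
  IndexValue acc(static_cast<std::ptrdiff_t>(n - 1), v[n - 1]);
  for (std::size_t i = n - 1; i-- > 0;)
    acc = argmin_combine(acc, IndexValue(static_cast<std::ptrdiff_t>(i), v[i]));
  return acc.first;
}

// Both traversal orders agree on the first occurrence of the minimum.
inline void check_argmin_is_stable() {
  const std::vector<float> v = {3.f, 1.f, 2.f, 1.f, 1.f};
  assert(argmin_forward(v) == argmin_backward(v));
}
```

Without the index comparison, the backward traversal would settle on the last occurrence of the minimum rather than the first.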
Rohit Santhanam
2d132d1736 Commit 52a5f982 broke conj_helper functionality for HIP GPUs.
This commit addresses this.
2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen
bffd267d17 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. 2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen
52a5f98212 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. 2021-06-24 15:47:48 -07:00
Rasmus Munk Larsen
4ad30a73fc Use internal::ref_selector to avoid holding a reference to a RHS expression. 2021-06-22 14:31:32 +00:00
Rasmus Munk Larsen
ea62c937ed Update ComplexEigenSolver_eigenvectors.cpp 2021-06-21 19:06:25 +00:00
Rasmus Munk Larsen
c8a2b4d20a Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp 2021-06-21 19:06:04 +00:00
Antonio Sanchez
e9ab4278b7 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.
2021-06-21 17:29:55 +00:00
Antonio Sanchez
35a367d557 Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes #2267.
2021-06-18 13:22:54 -07:00
Antonio Sanchez
12e8d57108 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.
2021-06-16 18:41:17 -07:00
Chip-Kerchner
ef1fd341a8 EIGEN_STRONG_INLINE was NOT inlining in some critically needed areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. 2021-06-16 16:30:31 +00:00
jenswehner
175f0cc1e9 changed documentation to make example compile 2021-06-16 11:45:06 +02:00
Antonio Sanchez
9e94c59570 Add missing ppc pcmp_lt_or_nan<Packet8bf> 2021-06-15 13:42:17 -07:00
Antonio Sanchez
954879183b Fix placement of permanent GPU defines. 2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen
13fb5ab92c Fix more enum arithmetic. 2021-06-15 09:09:31 -07:00
Antonio Sanchez
ad82d20cf6 Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.
2021-06-11 23:19:10 +00:00
Antonio Sanchez
514977f31b Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
2021-06-11 17:19:54 +00:00
Antonio Sanchez
6aec83263d Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
2021-06-11 17:02:19 +00:00
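The `#ifndef` guard pattern that commit 6aec83263d refers to looks roughly like the sketch below. `DEMO_CONTRACTION_DISPATCH` and `dispatch_contraction` are hypothetical names chosen for this example; the actual macro and its arguments in the Tensor headers are not reproduced here.

```cpp
#include <cassert>
#include <string>

// --- User code, placed before the library header is included ---
// Hypothetical override supplied by the client.
#define DEMO_CONTRACTION_DISPATCH(name) (std::string("custom:") + (name))

// --- What the guarded library header would contain ---
// The default is only defined if the client has not provided its own.
#ifndef DEMO_CONTRACTION_DISPATCH
#define DEMO_CONTRACTION_DISPATCH(name) (std::string("default:") + (name))
#endif

// Library code calls the macro without knowing which definition is active.
inline std::string dispatch_contraction(const char* name) {
  return DEMO_CONTRACTION_DISPATCH(name);
}

// With the override above in place, the custom path is taken.
inline void check_dispatch_override() {
  assert(dispatch_contraction("gemm") == "custom:gemm");
}
```

Deleting the first `#define` makes the guarded default take effect, so clients that do nothing see unchanged behavior.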
Rasmus Munk Larsen
fc87e2cbaa Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled. 2021-06-11 02:35:53 +00:00
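A minimal sketch of the idea in commit fc87e2cbaa follows: build negative zero from its bit pattern instead of from a floating-point expression that an optimizer might simplify. `bit_cast_sketch` is a stand-in written for this example, not Eigen's internal helper, and the bit patterns assume IEEE 754 binary32 and binary64 representations.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// memcpy-based bit cast; a stand-in for illustration only.
template <typename To, typename From>
inline To bit_cast_sketch(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Only the sign bit is set: this is -0.0 under IEEE 754 (assumed here).
inline float negative_zero_f() {
  return bit_cast_sketch<float>(std::uint32_t{0x80000000u});
}

inline double negative_zero_d() {
  return bit_cast_sketch<double>(std::uint64_t{0x8000000000000000ull});
}

// The results compare equal to zero but carry a negative sign bit.
inline void check_negative_zero() {
  assert(std::signbit(negative_zero_f()));
  assert(std::signbit(negative_zero_d()));
}
```

Because the value comes from an integer constant, there is no floating-point negation for the compiler to rewrite.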
Rasmus Munk Larsen
f64b2954c7 Fix c++20 warnings about using enums in arithmetic expressions. 2021-06-10 17:17:39 -07:00
Nicolas Cornu
001a57519a Fix parsing of version for nvhpc
The first line of the version output is empty, which crashes the parser,
so delete the first line if it is empty.
2021-06-10 18:30:53 +00:00
Rohit Santhanam
c8d40a7bf1 Removed dead code from GPU float16 unit test. 2021-05-28 20:06:48 +00:00
Cyril Kaiser
91cd67f057 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor. 2021-05-26 19:28:13 +00:00
Antonio Sanchez
dba753a986 Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.
2021-05-25 18:25:35 +00:00
Antonio Sanchez
ebb300d0b4 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251
2021-05-23 12:44:37 -07:00
Jakub Lichman
12471fcb5d predux_half_dowto4 test extended to all applicable packets 2021-05-21 16:42:19 +00:00
Steve Bronder
1720057023 Adds macro for checking if C++14 variable templates are supported 2021-05-21 16:25:32 +00:00
Niall Murphy
391094c507 Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other)). Calling the base class swap (i.e. in DenseBase) results in
assertion errors or memory corruption.
2021-05-20 23:17:02 +00:00
Jakub Lichman
8877f8d9b2 ptranspose test for non-square kernels added 2021-05-19 08:26:45 +00:00
Guoqiang QI
3e006bfd31 Ensure all generated matrices for inverse_4x4 tests are invertible, this fixes #2248. 2021-05-13 15:03:30 +00:00
Nathan Luehr
972cf0c28a Fix calls to device functions from host code 2021-05-11 22:47:49 +00:00