Commit Graph

6584 Commits

Author SHA1 Message Date
Nathan Luehr
7e6a1c129c Device implementation of log for std::complex types. 2021-05-11 22:02:21 +00:00
Nathan Luehr
6753f0f197 Fix ambiguity due to argument dependent lookup. 2021-05-11 15:41:11 -05:00
guoqiangqi
3d9051ea84 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 . 2021-05-10 23:53:16 +00:00
Rohit Santhanam
39ec31c0ad Fix for issue where numext::imag and numext::real are used before they are defined. 2021-05-10 19:48:32 +00:00
Antonio Sanchez
c0eb5f89a4 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This changes
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.
2021-05-07 18:14:00 +00:00
Antonio Sanchez
90e9a33e1c Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.
2021-05-07 16:26:57 +00:00
Christoph Hertzberg
722ca0b665 Revert addition of unused paddsub<Packet2cf>. This fixes #2242 2021-05-06 18:36:47 +02:00
Antonio Sanchez
1c013be2cc Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.
2021-04-29 17:39:58 +00:00
Antonio Sanchez
172db7bfc3 Add missing pcmp_lt_or_nan for NEON Packet4bf. 2021-04-27 14:12:11 -07:00
Jakub Lichman
d87648a6be Tests added and AVX512 bug fixed for pcmp_lt_or_nan 2021-04-25 20:58:56 +00:00
Antonio Sanchez
d213a0bcea DenseStorage safely copy/swap.
Fixes #2229.

For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set.  Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.
2021-04-22 18:45:19 +00:00
Rasmus Munk Larsen
85a76a16ea Make vectorized compute_inverse_size4 compile with AVX. 2021-04-22 15:21:01 +00:00
Chip-Kerchner
06c2760bd1 Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings). 2021-04-21 00:47:13 +00:00
Jakub Lichman
2b1dfd1ba0 HasExp added for AVX512 Packet8d 2021-04-20 19:07:58 +00:00
Antonio Sanchez
1d79c68ba0 Fix ldexp for AVX512 (#2215)
Wrong shuffle was used.  Need to interleave low/high halves with a
`permute` instruction.

Fixes #2215.
2021-04-20 16:25:22 +00:00
David Tellenbach
3e819d83bf Before 3.4 branch 2021-04-18 23:36:14 +02:00
Christoph Hertzberg
9357feedc7 Avoid using uninitialized inputs and if available, use slightly more efficient movsd instruction for pset1<Packet2cf>. 2021-04-13 01:36:59 +02:00
Christoph Hertzberg
1e1c8a735c Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
Fixes #2209
2021-04-12 01:26:15 +00:00
Christoph Hertzberg
d58678069c Make iterators default constructible and assignable, by making... 2021-04-09 17:03:28 +00:00
Antonio Sanchez
fcb5106c6e Scaled epsilon the wrong way.
Should have been 0.5 to widen the bounds, since this is inverse
precision.  Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35 Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189).
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen
22edb46823 Align local arrays to Packet boundary. 2021-04-06 16:22:36 +00:00
Antonio Sanchez
90187a33e1 Fix SelfAdjoingEigenSolver (#2191)
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.

Also adjust Wilkinson shift for small `e = subdiag[end-1]` -
I couldn't find a reference for the original, and it was not
consistent with the Wilkinson definition.

Fixes #2191.
2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen
3ddc0974ce Fix two bugs in commit 2021-04-02 22:06:27 +00:00
Chip Kerchner
c24bee6120 Fix address of temporary object errors in clang11.
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
Rasmus Munk Larsen
5bbc9cea93 Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input.
Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
2021-03-31 21:09:19 +00:00
Antonio Sanchez
78ee3d6261 Fix CUDA constexpr issues for numeric_limits.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
\#define CUDART_INF_F            __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`.  For portability, we
need to remove `constexpr` from the affected functions.

For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expr-relaxed-constexpr`).
2021-03-30 18:01:27 +00:00
Antonio Sanchez
af1247fbc1 Use Index type in loop over coefficients.
Previously was `int`.  Brought up by Kyle Snow (Polaris Geospatial
Services) on the mailing list.
2021-03-29 17:40:55 +00:00
Antonio Sanchez
87729ea39f Eliminate round_impl double-promotion warnings for c++03. 2021-03-25 16:52:19 +00:00
Deven Desai
748489ef9c Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform
The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit

e7b8643d70

```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  static float (max)() {
                ^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
    return HIPRT_MAX_NORMAL_F;
           ^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
                           ^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
                               ^
```

The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file.

Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP
2021-03-25 13:45:52 +00:00
Chip Kerchner
d59ef212e1 Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). 2021-03-25 11:08:19 +00:00
Steve Bronder
e7b8643d70 Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010.
2021-03-24 18:14:56 +00:00
Christoph Hertzberg
69a4f70956 Revert "Uses _mm512_abs_pd for Packet8d pabs"
This reverts commit f019b97aca
2021-03-23 18:52:19 +00:00
David Tellenbach
4811e81966 Remove yet another comma at end of enum 2021-03-18 23:30:00 +01:00
Steve Bronder
f019b97aca Uses _mm512_abs_pd for Packet8d pabs 2021-03-18 15:47:52 +00:00
Niek Bouman
ed964ba3f1 Proposed fix for issue #2187 2021-03-18 00:55:36 +00:00
Antonio Sanchez
8dfe1029a5 Augment NumTraits with min/max_exponent() again.
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible.  Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.

The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.

Fixes #2148
2021-03-16 20:12:46 -07:00
David Tellenbach
eb71e5db98 Fix another warning on missing commas 2021-03-17 03:07:04 +01:00
David Tellenbach
df4bc2731c Revert "Augment NumTraits with min/max_exponent()."
This reverts commit 75ce9cd2a7.
2021-03-17 03:06:08 +01:00
Antonio Sanchez
75ce9cd2a7 Augment NumTraits with min/max_exponent().
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase.  Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.

Fixes #2148
2021-03-17 01:00:41 +00:00
David Tellenbach
9fb7062440 Silence warning on comma at end of enumerator list 2021-03-17 01:46:52 +01:00
Theo Fletcher
b8502a9dd6 Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary. 2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen
2e83cbbba9 Add NaN propagation options to minCoeff/maxCoeff visitors. 2021-03-16 17:02:50 +00:00
Antonio Sanchez
f612df2736 Add fmod(half, half).
This is to support TensorFlow's `tf.math.floormod` for half.
2021-03-15 13:32:24 -07:00
Antonio Sanchez
14b7ebea11 Fix numext::round pre c++11 for large inputs.
This is to resolve an issue for large inputs when +0.5 can
actually lead to +1 if the input doesn't have enough precision
to resolve the addition - leading to an off-by-one error.

See discussion on 9a663973.
2021-03-15 19:08:04 +00:00
Chip Kerchner
c9d4367fa4 Fix pround and add print 2021-03-15 19:07:43 +00:00
Antonio Sanchez
d24f9f9b55 Fix NVCC+ICC issues.
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.

ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h.  This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.

ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)

```
I tracked this down to `friend class Eigen::Map`.  Putting the `friend`
statements at the bottom of the class seems to resolve this issue.

Fixes #2180
2021-03-15 18:42:04 +00:00
Antonio Sanchez
14487ed14e Add increment/decrement operators to Eigen::half.
This is for consistency with bfloat16, and to support initialization
with `std::iota`.
2021-03-15 10:52:23 -07:00
Antonio Sanchez
d098c4d64c Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.
Doesn't seem to correctly select the register type, and most types
lead to compiler crashes.
2021-03-10 16:05:01 -08:00
Antonio Sanchez
543e34ab9d Re-implement move assignments.
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.

Here we pass down the move assignment to the underlying storage.  Static
storage does a one-way copy, dynamic storage does a swap.

Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.

Fixes: #2119
2021-03-10 16:55:20 +00:00
Ben Niu
b8d1857f0d [MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both the native ARM64 (_M_ARM64 is defined) or ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can emulated by ARM64 instructions, it is still a good to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons. 2021-03-10 10:21:31 +00:00
Antonio Sanchez
853a5c4b84 Fix ambiguous call to CUDA __half constructor. 2021-03-08 21:06:28 -08:00
Antonio Sanchez
94327dbfba Fix typo: DEVICE -> GPU 2021-03-08 11:21:00 -08:00
Antonio Sanchez
1296abdf82 Fix non-trivial Half constructor for CUDA.
Both CUDA and HIP require trivial default constructors for types used
in shared memory. Otherwise failing with
```
error: initialization is not supported for __shared__ variables.
```
2021-03-08 07:32:54 -08:00
Antonio Sanchez
6045243141 Revert stack allocation limit change that crept in.
This was accidentally introduced when copying changes between repos.
2021-03-05 14:29:37 -08:00
Deven Desai
1a96d49afe Changing the Eigen::half implementation for HIP
Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build.

In the ROCm Tensorflow build,
* files that do not contain ant GPU code get compiled via gcc, and
* files that contnain GPU code get compiled via hipcc.

In certain case, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half has a "pass-by-value" argument, its value was getting corrupted, when received by the function.

The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data-store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function.

Changing the Eigen::half argument to be "pass by reference" seems to workaround the error.

In order to fix it such that we do not run into it again in TF, this commit changes the Eigne::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during host compile phase of the hipcc compile.
2021-03-05 19:27:13 +00:00
Antonio Sanchez
2468253c9a Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.

Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise.  This simplifies checks for supported
features.

Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.

Fixes: #2170
2021-03-05 18:33:18 +00:00
Antonio Sanchez
82d61af3a4 Fix rint SSE/NEON again, using optimization barrier.
This is a new version of !423, which failed for MSVC.

Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).

Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.

Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part.  If we are
over `2^digits` then just return the input.  This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied.  Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
David Tellenbach
5f0b4a4010 Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"
This reverts commit 6cbb3038ac because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
2021-03-05 13:16:43 +01:00
Steve Bronder
6cbb3038ac Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() 2021-03-04 18:58:08 +00:00
Antonio Sánchez
9a663973b4 Revert "Fix rint for SSE/NEON."
This reverts commit e72dfeb8b9
2021-03-03 18:51:51 +00:00
Antonio Sanchez
e72dfeb8b9 Fix rint for SSE/NEON.
It seems *sometimes* with aggressive optimizations the combination
`psub(padd(a, b), b)` trick to force rounding is compiled away. Here
we replace with inline assembly to prevent this (I tried `volatile`,
but that leads to additional loads from memory).

Also fixed an edge case for large inputs `a` where adding `b` bumps
the value up a power of two and ends up rounding away more than
just the fractional part.  If we are over `2^digits` then just return
the input.  This edge case was missed in the test since the test was
comparing approximate equality, which was still satisfied.  Adding
a strict equality option catches it.
2021-03-03 09:41:46 -08:00
Antonio Sanchez
1e0c7d4f49 Add print for SSE/NEON, use NEON rounding intrinsics if available.
In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the
current rounding mode.

For NEON, we use the provided intrinsics for rint/floor/ceil if
available (armv8).

Related to #1969.
2021-02-27 22:42:07 +00:00
Antonio Sanchez
c65c2b31d4 Make half/bfloat16 constructor take inputs by value, fix powerpc test.
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.

Changing `half` to take inputs by value fixes this.  Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.

Also eliminated some compile warnings for powerpc.
2021-02-27 21:32:06 +00:00
Christoph Hertzberg
39a590dfb6 Remove unused include 2021-02-27 19:02:33 +01:00
Christoph Hertzberg
8f686ac4ec clang 10 aggressively warns about precision loss when converting int to float (or long to double)
(cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
a3521d743c Fix some enum-enum conversion warnings
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
ca528593f4 Fixed/masked more implicit copy constructor warnings
(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
4fb3459a23 Fix double-promotion warnings
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Antonio Sanchez
29ebd84cb7 Fix NEON sqrt for 32-bit, add prsqrt.
With !406, we accidentally broke arm 32-bit NEON builds, since
`vsqrt_f32` is only available for 64-bit.

Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.

Note that several of the 32-bit NEON packet tests are currently
failing - either due to denormal handling (NEON versions flush
to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen
fe19714f80 Merge branch 'rmlarsen1/eigen-nan_prop' 2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen
e67672024d Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen
5e7d4c33d6 Add TODO. 2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen
fb5b59641a Defer default for minCoeff/maxCoeff to templated variant. 2021-02-26 09:07:00 -08:00
Antonio Sanchez
e19829c3b0 Fix floor/ceil for NEON fp16.
Forgot to test this.  Fixes bug introduced in !416.
2021-02-25 20:39:56 -08:00
Antonio Sanchez
5529db7524 Fix SSE/NEON pfloor/pceil for saturated values.
The original will saturate if the input does not fit into an integer
type.  Here we fix this, returning the input if it doesn't have
enough precision to have a fractional part.

Also added `pceil` for NEON.

Fixes #1969.
2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen
51eba8c3e2 Fix indentation. 2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen
5297b7162a Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 18:21:21 +00:00
Chip-Kerchner
6eebe97bab Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. 2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen
4cb0592af7 Fix indentation. 2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen
0065f9d322 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen
113e61f364 Remove unused function scalar_cmp_with_cast. 2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen
98ca58b02c Cast anonymous enums to int when used in expressions. 2021-02-24 23:50:06 +00:00
Chip-Kerchner
c31ead8a15 Having forward template function declarations in a P10 file causes bad code in certain situations. 2021-02-24 23:43:30 +00:00
Antonio Sanchez
a31effc3bc Add invoke_result and eliminate result_of warnings for C++17+.
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.

Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre C++17.

Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.

For completeness, added nullary op support for c++03.

Fixes #1850.
2021-02-24 21:36:14 +00:00
Chip-Kerchner
8523d447a1 Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. 2021-02-24 20:49:15 +00:00
Antonio Sanchez
5908aeeaba Fix CUDA device new and delete, and add test.
HIP does not support new/delete on device, so test is skipped.
2021-02-24 11:31:41 -08:00
Antonio Sanchez
6cf0ab5e99 Disable fast psqrt for NEON.
Accuracy is too poor - requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.

Fixes #2094.
2021-02-23 19:52:55 -08:00
Antonio Sanchez
aba3998278 Fix check if GPU compile phase for std::hash 2021-02-23 19:52:08 -08:00
Antonio Sanchez
db5691ff2b Fix some CUDA warnings.
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.

`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`.  Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.

Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.

Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang.  Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen
88d4c6d4c8 Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect
implementation.
2021-02-23 23:11:03 +00:00
Adam Shapiro
2ac0b78739 Fixed sparse conservativeResize() when both num cols and rows decreased.
The previous implementation caused a buffer overflow trying to calculate non-
zero counts for columns that no longer exist.
2021-02-23 21:32:39 +00:00
Chip-Kerchner
10c77b0ff4 Fix compilation errors with later versions of GCC and use of MMA. 2021-02-22 15:01:47 -06:00
Christoph Hertzberg
73922b0174 Fixes Bug #1925. Packets should be passed by const reference, even to inline functions. 2021-02-20 18:56:42 +01:00
Christoph Hertzberg
a7749c09bc Bug #1910: Make SparseCholesky work for RowMajor matrices 2021-02-19 19:36:18 +01:00
Antonio Sánchez
128eebf05e Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)."
This reverts commit 12fd3dd655
2021-02-19 17:09:16 +00:00
Rasmus Munk Larsen
7f09d3487d Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. 2021-02-18 20:49:18 +00:00
Masaki Murooka
12fd3dd655 add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC). 2021-02-17 22:55:47 +00:00
David Tellenbach
aa8b22e776 Bump to 3.4.99 2021-02-17 23:23:17 +01:00
David Tellenbach
5336ad8591 Define internal::make_unsigned for [unsigned]long long on macOS.
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization

  internal::make_unsigned<long long>,

for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00