Commit Graph

11385 Commits

Author SHA1 Message Date
Antonio Sanchez
db5691ff2b Fix some CUDA warnings.
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.

`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`.  Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.

Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.

Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang.  Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen
88d4c6d4c8 Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect
implementation.
2021-02-23 23:11:03 +00:00
Adam Shapiro
2ac0b78739 Fixed sparse conservativeResize() when both num cols and rows decreased.
The previous implementation caused a buffer overflow trying to calculate non-
zero counts for columns that no longer exist.
2021-02-23 21:32:39 +00:00
Chip-Kerchner
10c77b0ff4 Fix compilation errors with later versions of GCC and use of MMA. 2021-02-22 15:01:47 -06:00
Christoph Hertzberg
73922b0174 Fixes Bug #1925. Packets should be passed by const reference, even to inline functions. 2021-02-20 18:56:42 +01:00
Antonio Sanchez
5f9cfb2529 Add missing adolc isinf/isnan.
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.

Fixed: #2157
2021-02-19 22:26:56 +00:00
Christoph Hertzberg
ce4af0b38f Missing change regarding #1910 2021-02-19 20:51:35 +01:00
Christoph Hertzberg
a7749c09bc Bug #1910: Make SparseCholesky work for RowMajor matrices 2021-02-19 19:36:18 +01:00
Antonio Sánchez
128eebf05e Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)."
This reverts commit 12fd3dd655
2021-02-19 17:09:16 +00:00
frgossen
33e0af0130 Return nan at poles of polygamma, digamma, and zeta if limit is not defined 2021-02-19 16:35:11 +00:00
Rasmus Munk Larsen
7f09d3487d Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. 2021-02-18 20:49:18 +00:00
Masaki Murooka
12fd3dd655 add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC). 2021-02-17 22:55:47 +00:00
David Tellenbach
aa8b22e776 Bump to 3.4.99 2021-02-17 23:23:17 +01:00
David Tellenbach
5336ad8591 Define internal::make_unsigned for [unsigned]long long on macOS.
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization

  internal::make_unsigned<long long>,

for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00
Antonio Sanchez
0845df7f77 Fix uninitialized warning on AVX. 2021-02-17 13:13:39 -08:00
Chip Kerchner
9b51dc7972 Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product 2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen
be0574e215 New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. 2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980 Updated pfrexp implementation.
The original implementation fails for 0, denormals, inf, and NaN.

See #2150
2021-02-17 02:23:24 +00:00
David Tellenbach
9ad4096ccb Document possible inconsistencies when using Matrix<bool, ...> 2021-02-17 00:50:26 +01:00
Ashutosh Sharma
f702792a7c missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel) 2021-02-16 16:33:59 +00:00
Jan van Dijk
db61b8d478 Avoid -Wunused warnings in NDEBUG builds.
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
2021-02-12 18:35:35 +00:00
David Tellenbach
622c598944 Don't allow all test jobs to fail but only the currently failing ones. 2021-02-12 14:01:17 +01:00
Antonio Sanchez
90ee821c56 Use vrsqrts for rsqrt Newton iterations.
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
2021-02-11 11:33:51 -08:00
Antonio Sanchez
9fde9cce5d Adjust bounds for pexp_float/double
The original clamping bounds on `_x` actually produce finite values:
```
  exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38

  exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.

This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
Antonio Sanchez
4cb563a01e Fix ldexp implementations.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits.  See #2131 for a complete discussion,
and !375 for other possible implementations.

Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.

The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.

Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.

Fixes #2131.
2021-02-10 22:45:41 +00:00
Ashutosh Sharma
7eb07da538 loop less ptranspose 2021-02-10 10:21:37 -08:00
David Tellenbach
36200b7855 Remove vim specific comments to recognoize correct file-type.
As discussed in #2143 we remove editor specific comments.
2021-02-09 09:13:09 +01:00
David Tellenbach
54589635ad Replace nullptr by NULL in SparseLU.h to be C++03 compliant. 2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas
984d010b7b add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves 2021-02-08 22:00:31 +00:00
Nikolaus Demmel
b578930657 Fix documentation typos in LDLT.h 2021-02-08 21:07:29 +00:00
Antonio Sanchez
66841ea070 Enable bdcsvd on host.
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error.  This prevents it from running on
the host as well.

Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host.  Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).

Fixes #2139
2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen
6e3b795f81 Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one. 2021-02-05 16:58:49 -08:00
Antonio Sanchez
abcde69a79 Disable vectorized pow for half/bfloat16.
We are potentially seeing some accuracy issues with these.  Ideally we
would hand off to `float`, but that's not trivial with the current
setup.

We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379.  See there for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00
Antonio Sanchez
56c8b14d87 Eliminate implicit conversions from float to double. 2021-02-01 15:31:01 -08:00
Antonio Sanchez
fb4548e27b Implement bit_* for device.
Unfortunately `std::bit_and` and the like are host-only functions prior
to c++14 (since they are not `constexpr`).  They also never exist in the
global namespace, so the current implementation  always fails to compile via
NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.

To overcome these limitations, we implement these functionals here.
2021-02-01 13:27:45 -08:00
Antonio Sanchez
1615a27993 Fix altivec packetmath.
Allows the altivec packetmath tests to pass.  There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.

Related to !370, #2128

cc: @ChipKerchner @pdrocaldeira

Tested on `_BIG_ENDIAN` running on QEMU with VSX.  Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c Fix clang compilation for AltiVec from previous check-in 2021-01-28 18:36:40 +00:00
David Tellenbach
170a504c2f Add the following functions
DenseBase::setConstant(NoChange_t, Index, const Scalar&)
  DenseBase::setConstant(Index, NoChange_t, const Scalar&)

to close #663.
2021-01-28 15:13:07 +01:00
David Tellenbach
598e1b6e54 Add the following functions:
DenseBase::setZero(NoChange_t, Index)
  DenseBase::setZero(Index, NoChange_t)
  DenseBase::setOnes(NoChange_t, Index)
  DenseBase::setOnes(Index, NoChange_t)
  DenseBase::setRandom(NoChange_t, Index)
  DenseBase::setRandom(Index, NoChange_t)

This closes #663.
2021-01-28 01:10:36 +01:00
Gael Guennebaud
0668c68b03 Allow for negative strides.
Note that using a stride of -1 is still not possible because it would
clash with the definition of Eigen::Dynamic.

This fixes #747.
2021-01-27 23:32:12 +01:00
Samir Benmendil
288d456c29 Replace language_support module with builtin CheckLanguage
The workaround_9220 function was introduced a long time ago to
workaround a CMake issue with enable_language(OPTIONAL). Since then
CMake has clarified that the OPTIONAL keywords has not been
implemented[0].

A CheckLanguage module is now provided with CMake to check if a language
can be enabled. Use that instead.

[0] https://cmake.org/cmake/help/v3.18/command/enable_language.html
2021-01-27 13:26:40 +00:00
Antonio Sanchez
3f4684f87d Include <cstdint> in one place, remove custom typedefs
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.

To avoid this, we remove the custom `typedef` definitions for win32.  The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.

Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Chip Kerchner
0784d9f87b Fix sqrt, ldexp and frexp compilation errors. 2021-01-25 15:22:19 -06:00
Gmc2
a4edb1079c fix test of ExtractVolumePatchesOp 2021-01-25 03:23:46 +00:00
Antonio Sanchez
4c42d5ee41 Eliminate implicit conversion warning in test/array_cwise.cpp 2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90 Replace std::isnan with numext::isnan for c++03 2021-01-23 11:02:35 -08:00
Florian Maurin
c35965b381 Remove unused variable in SparseLU.h 2021-01-22 22:24:11 +00:00
Antonio Sanchez
f0e46ed5d4 Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.

While adding tests for half/bfloat16, found other issues related to
implicit conversions.

Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types.  These seem to be  implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
f19bcffee6 Specialize std::complex operators for use on GPU device.
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction).  For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.

To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen:internal`, and `Eigen::numext` namespaces.

This allow us to remove specializations of the
sum/difference/product/quotient ops, and allow us to treat complex
numbers like most other scalars (e.g. in tests).
2021-01-22 18:19:19 +00:00