When calling conservativeResize() on a matrix with the DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct overload of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived>&
other)). Calling the base class swap (i.e. the one in DenseBase) results in
assertion errors or memory corruption.
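A minimal sketch of the failing pattern (hedged; the exact trigger may depend on sizes and alignment):
```
#include <Eigen/Dense>

int main() {
  // With DontAlign, the temporary built inside conservativeResize() must
  // share the matrix's Options so that PlainObjectBase::swap is selected.
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::DontAlign> m(2, 2);
  m.setOnes();
  m.conservativeResize(3, 3);  // could assert or corrupt memory before the fix
  return 0;
}
```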
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version. This change
restores the second parameter, which also restores ABI compatibility.
The specialization for `std::complex` exists because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.
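A hedged sketch of the restored two-parameter shape (simplified; the real definitions live in Eigen/src/Core/MathFunctions.h):
```
namespace Eigen {
namespace internal {

template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
struct conj_impl {
  EIGEN_DEVICE_FUNC static Scalar run(const Scalar& x) { return x; }
};

// std::conj is not a device function, so complex types negate the
// imaginary part directly instead of calling it.
template<typename T>
struct conj_impl<std::complex<T>, true> {
  EIGEN_DEVICE_FUNC static std::complex<T> run(const std::complex<T>& x) {
    return std::complex<T>(numext::real(x), -numext::imag(x));
  }
};

}  // namespace internal
}  // namespace Eigen
```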
We may consider removing the unnecessary second parameter in the future,
but this will require modifying boost as well.
Fixes #2112.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.
Related to !481, which already changed the API to address a static
linkage issue.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.
Related to !477, which uncovered the issue.
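A small illustration of the fixed behavior:
```
#include <Eigen/Core>
#include <complex>

int main() {
  std::complex<float> z(1.0f, 1.0f);
  // Before the fix, the cxx11 path deduced the result as
  // std::complex<float>, so this assignment failed to compile.
  float phase = Eigen::numext::arg(z);
  return phase > 0.0f ? 0 : 1;
}
```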
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.
Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be. This is leading to
multiple build breakages.
The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).
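A hedged sketch of the dispatch (simplified; the real code in TensorRandom.h also mixes in the second grid dimension):
```
EIGEN_DEVICE_FUNC static uint64_t get_random_seed() {
#if defined(EIGEN_GPU_COMPILE_PHASE)
  // rand() is unavailable on device; derive a per-thread seed from the
  // thread index instead.
  return static_cast<uint64_t>(blockIdx.x * blockDim.x + threadIdx.x);
#else
  // Rely on Eigen's own random implementation.
  return Eigen::internal::random<uint64_t>();
#endif
}
```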
Fixes #1602.
m_deviceProperties and m_devicePropInitialized were defined as global
statics in a header, giving each translation unit its own copy. This can
cause issues if initializeDeviceProp() is called in one translation unit
and m_deviceProperties is then used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which define those variables as static locals. Per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.
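A sketch of the pattern, simplified:
```
// A local static in an inline function names a single object program-wide
// (C++ standard 7.1.2/4), unlike a namespace-scope static defined in a
// header, which is duplicated in every translation unit.
inline bool& getDevicePropInitialized() {
  static bool initialized = false;  // one instance across all TUs
  return initialized;
}
```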
This fixes issue #1475.
`TensorRandom` currently relies on BSD `random()`, which is not always
available. The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
|| /* Glibc since 2.19: */ _DEFAULT_SOURCE
|| /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If
`random()` is not available, we fall back to using `rand()`.
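A hedged sketch of the dispatch (the macro test is taken from the manpage excerpt above; the function name is hypothetical and the shipped check may differ):
```
#include <cstdlib>

inline long portable_random() {
#if (defined(_XOPEN_SOURCE) && _XOPEN_SOURCE >= 500) || \
    defined(_DEFAULT_SOURCE) || defined(_SVID_SOURCE) || defined(_BSD_SOURCE)
  return random();  // BSD random() is available
#else
  return rand();    // fallback, e.g. for MinGW via msys2
#endif
}
```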
Fixes #2229.
For dynamic matrices with fixed-size storage, only copy/swap
elements that have been set. Otherwise, this leads to inefficient
copying and potential UB from reading uninitialized elements.
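An example of the affected case:
```
#include <Eigen/Dense>

int main() {
  // Dynamic dimensions with fixed maximum sizes: storage is a stack array
  // of 16 floats, but only the active 2x2 / 3x3 blocks are initialized.
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, 0, 4, 4> a(2, 2), b(3, 3);
  a.setZero();
  b.setOnes();
  a.swap(b);  // should touch only the elements in use, not all 16
  return 0;
}
```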
The namespace declaration for googlehash is a configurable macro that
can be disabled. In particular, it is disabled within Google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.
Here we perform a bit of gymnastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace pollution. Symbols within
the `::google` namespace are imported into `Eigen::google`.
We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
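A hedged sketch of the idea (not the shipped header):
```
namespace google {}  // guarantee ::google exists even when googlehash's
                     // namespace macro is disabled
namespace Eigen {
namespace google {
// Imports google::dense_hash_map etc. when they live in ::google; when the
// maps are in the global namespace instead, unqualified lookup from within
// Eigen::google still finds them there.
using namespace ::google;
}
}
```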
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.
The segmentation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to a computation that it presumed would execute on the GPU.
But because of the issue, the computation was scheduled to execute on the CPU, so the CPU ended up attempting to access a GPU memory location.
The fix expands the GPU-specific ScanLauncher specialization to handle cases where vectorization is enabled.
Previously, the GPU specialization was chosen only if vectorization was not used.
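A hedged sketch of the change (template parameters assumed, not the exact Eigen code):
```
template <typename Self, typename Reducer, typename Device, bool Vectorize>
struct ScanLauncher;  // primary template, selected per device

// After the fix, the GpuDevice specialization matches for *both* values of
// Vectorize, so a GPU tensor can no longer fall through to the CPU path.
template <typename Self, typename Reducer, bool Vectorize>
struct ScanLauncher<Self, Reducer, Eigen::GpuDevice, Vectorize> {
  void operator()(Self& self, typename Self::CoeffReturnType* data) {
    // ... launch the GPU scan kernel ...
  }
};
```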
The value should have been 0.5 to widen the bounds, since this is an
inverse precision. Setting it to 0.5, however, leads to many more failing
tests at Google, so it is reverted to 1 for now.
Clang-tidy complains that full specializations in headers can cause ODR
violations. Marked these as `inline` to fix.
It also complains when arguments are renamed in specializations, so the
argument names were set to match.
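A generic illustration (not the Eigen code in question):
```
template <typename T>
T identity_value() { return T(1); }

// A full specialization is an ordinary function definition, so in a header
// it must be marked inline or it violates the ODR when the header is
// included from several translation units.
template <>
inline float identity_value<float>() { return 1.0f; }
```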
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.
Also adjust the Wilkinson shift for small `e = subdiag[end-1]`:
I couldn't find a reference for the original formula, and it was not
consistent with the Wilkinson definition.
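For reference, the textbook Wilkinson shift (e.g. Golub & Van Loan) for the trailing 2x2 block with diagonal entries `d1, d2` and off-diagonal entry `e` is
```
mu = d2 - e^2 / (delta + sign(delta) * sqrt(delta^2 + e^2)),  delta = (d1 - d2) / 2
```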
Fixes #2191.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
#define CUDART_INF_F __int_as_float(0x7f800000)
```
This fails for cuda-clang (though it passes with nvcc). These constants are
currently used by `device::numeric_limits`. For portability, we
need to remove `constexpr` from the affected functions.
For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyway, since the methods themselves are now `constexpr` and so
should be supported on device (clang/hipcc natively, nvcc with
`--expt-relaxed-constexpr`).
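A hedged illustration of the failure mode (function names hypothetical):
```
#include <math_constants.h>  // defines CUDART_INF_F

// __int_as_float is a device intrinsic, not constexpr, so cuda-clang
// rejects the constexpr definition while nvcc accepts it.
__device__ constexpr float device_inf_bad() { return CUDART_INF_F; }  // error on cuda-clang
__device__ float device_inf_ok() { return CUDART_INF_F; }             // portable
```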