Commit Graph

2957 Commits

Author SHA1 Message Date
Nico
b17bcddbca Fix -Wbitwise-instead-of-logical clang warning
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.

Here, it is inconsequential, so switch to && and || to suppress the warning.
2021-10-21 23:32:45 -04:00
Antonio Sanchez
24ebb37f38 Disable Tree reduction for GPU.
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.
2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen
360290fc42 Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction.
Add more unit tests for summation accuracy.
2021-10-20 19:54:06 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
1d75fab368 Speed up tensor reduction 2021-10-02 14:58:23 +00:00
Antonio Sanchez
be9e7d205f Reduce tensor_contract_gpu test.
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed.  Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allow the
two to run in just over a minute each.
2021-10-02 04:36:15 +00:00
Antonio Sanchez
701f5d1c91 Fix gpu special function tests.
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.

Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.
2021-10-01 10:20:50 -07:00
Antonio Sanchez
de218b471d Add -arch=<arch> argument for nvcc.
Without this flag, when compiling with nvcc, if the compute architecture of a card does
not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`,
then the kernel will fail to run with:
```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
```
This can happen, for example, when compiling with an older cuda version
that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda
9.2 only supports up to `sm_70`).

With the `-arch=<arch>` flag, the code will compile and run at the
supplied architecture.
2021-09-24 20:48:01 -07:00
Antonio Sanchez
846d34384a Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.
2021-09-24 20:15:55 -07:00
Antonio Sanchez
7b00e8b186 Clean up CUDA CMake files.
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).

The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).
2021-09-24 14:43:59 -07:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
sciencewhiz
4b6036e276 fix various typos 2021-09-22 16:15:06 +00:00
Alexander Karatarakis
4d622be118 [AutodiffScalar] Remove const when returning by value
clang-tidy: Return type 'const T' is 'const'-qualified at the top level,
which may reduce code readability without improving const correctness

The types are somewhat long, but the affected return types are of the form:
```
const T my_func() { /**/ }
```

Change to:
```
T my_func() { /**/ }
```
2021-09-18 21:23:32 +00:00
Rasmus Munk Larsen
6cadab6896 Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert. 2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Antonio Sanchez
6c10495a78 Remove unnecessary std::tuple reference. 2021-09-09 15:49:44 +00:00
Antonio Sanchez
eea2a3385c Remove more DynamicSparseMatrix references.
Also fixed some typos in SparseExtra/MarketIO.h.
2021-09-02 15:36:47 -07:00
Jens Wehner
8286073c73 Matrixmarket extension 2021-09-02 17:23:33 +00:00
Antonio Sanchez
74da2e6821 Rename Tuple -> Pair.
This is to make way for a new `Tuple` class that mimics `std::tuple`,
but can be reliably used on device and with aligned Eigen types.

The existing Tuple has very few references, and is actually an
analogue of `std::pair`.
2021-09-02 02:20:54 +00:00
Maxiwell S. Garcia
09fc0f97b5 Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h 2021-09-01 16:42:51 +00:00
jenswehner
a443a2373f updated documentation 2021-08-31 22:58:28 +00:00
Antonio Sanchez
cc3573ab44 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
2021-08-31 19:13:12 +00:00
Turing Eret
3324389f6d Add EIGEN_TENSOR_PLUGIN support per issue #2052. 2021-08-30 19:36:55 +00:00
Jens Wehner
53ad9c75b4 included unordered_map header 2021-08-27 16:53:28 +00:00
jenswehner
9abf4d0bec made RandomSetter C++11 compatible 2021-08-25 20:24:55 +00:00
Antonio Sanchez
eeacbd26c8 Bump CMake files to at least c++11.
Removed all configurations that explicitly test or set the c++ standard
flags. The only place the standard is now configured is at the top of
the main `CMakeLists.txt` file, which can easily be updated (e.g. if
we decide to move to c++14+). This can also be set via command-line using
```
> cmake -DCMAKE_CXX_STANDARD 14
```

Kept the `EIGEN_TEST_CXX11` flag for now - that still controls whether to
build/run the `cxx11_*` tests.  We will likely end up renaming these
tests and removing the `CXX11` subfolder.
2021-08-25 20:07:48 +00:00
jenswehner
90b3b6b572 added doxygen flowchart 2021-08-24 17:11:51 +00:00
jenswehner
d85de1ef56 removed sparse dynamic matrix 2021-08-24 10:33:00 +02:00
jenswehner
e3e74001f7 added includes for unordered_map 2021-08-10 13:34:57 +02:00
Alexander Karatarakis
4ba872bd75 Avoid leading underscore followed by cap in template identifiers 2021-08-04 22:41:52 +00:00
Antonio Sanchez
31f796ebef Fix MPReal detection and support.
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.

Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`.  It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.

Fixes #2282.
2021-08-03 17:55:03 +00:00
Antonio Sanchez
8cf6cb27ba Fix TriSycl CMake files.
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.

Also, trisycl now requires c++17.
2021-08-03 16:44:29 +00:00
Alexander Karatarakis
f357283d31 _DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier 2021-07-29 18:02:04 +00:00
Antonio Sanchez
1fd5ce1002 For GpuDevice::fill, use a single memset if all bytes are equal.
The original `fill` implementation introduced a 5x regression on my
nvidia Quadro K1200.  @rohitsan reported up to 100x regression for
HIP.  This restores performance.
2021-07-10 13:37:16 +00:00
Antonio Sanchez
9c22795d65 Put attach/detach buffer back in for TensorDeviceSycl.
Also added a test to verify the original buffer is updated correctly.
2021-07-09 10:00:05 -07:00
Antonio Sanchez
1e6c6c1576 Replace memset with fill to work for non-trivial scalars.
For custom scalars, zero is not necessarily represented by
a zeroed-out memory block (e.g. gnu MPFR). We therefore
cannot rely on `memset` if we want to fill a matrix or tensor
with zeroes. Instead, we should rely on `fill`, which for trivial
types does end up getting converted to a `memset` under-the-hood
(at least with gcc/clang).

Requires adding a `fill(begin, end, v)` to `TensorDevice`.

Replaced all potentially bad instances of memset with fill.

Fixes #2245.
2021-07-08 18:34:41 +00:00
Jonas Harsch
e9c9a3130b Removed superfluous boolean degenerate in TensorMorphing.h. 2021-07-08 18:02:58 +00:00
Antonio Sanchez
f5a9873bbb Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks like
```
```
end up generating titles that looks like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate proper formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.
2021-07-03 04:39:22 +00:00
Jonas Harsch
aab747021b Don't crash when attempting to shuffle an empty tensor. 2021-07-02 20:33:52 +00:00
Antonio Sanchez
6035da5283 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
2021-07-01 22:58:14 +00:00
Antonio Sanchez
3a087ccb99 Modify tensor argmin/argmax to always return first occurence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.
2021-06-29 10:36:20 -07:00
Antonio Sanchez
e9ab4278b7 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.
2021-06-21 17:29:55 +00:00
jenswehner
175f0cc1e9 changed documentation to make example compile 2021-06-16 11:45:06 +02:00
Antonio Sanchez
954879183b Fix placement of permanent GPU defines. 2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen
13fb5ab92c Fix more enum arithmetic. 2021-06-15 09:09:31 -07:00
Antonio Sanchez
514977f31b Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
2021-06-11 17:19:54 +00:00
Antonio Sanchez
6aec83263d Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
2021-06-11 17:02:19 +00:00
Rohit Santhanam
c8d40a7bf1 Removed dead code from GPU float16 unit test. 2021-05-28 20:06:48 +00:00
Nathan Luehr
972cf0c28a Fix calls to device functions from host code 2021-05-11 22:47:49 +00:00
Antonio Sanchez
0eba8a1fe3 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.
2021-05-07 17:51:29 +00:00