eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-27 07:29:52 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	24ebb37f38	Disable Tree reduction for GPU. For moderately sized inputs, running the Tree reduction quickly fills/overflows the GPU thread stack space, leading to memory errors. This was happening in the `cxx11_tensor_complex_gpu` test, for example. Disabling tree reduction on GPU fixes this.	2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen	360290fc42	Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction. Add more unit tests for summation accuracy.	2021-10-20 19:54:06 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	1d75fab368	Speed up tensor reduction	2021-10-02 14:58:23 +00:00
Antonio Sanchez	be9e7d205f	Reduce tensor_contract_gpu test. The original test times out after 60 minutes on Windows, even when setting flags to optimize for speed. Reducing the number of contractions performed from 3600->27 for subtests 8,9 allows the two to run in just over a minute each.	2021-10-02 04:36:15 +00:00
Antonio Sanchez	701f5d1c91	Fix gpu special function tests. Some checks used incorrect values, partly from copy-paste errors, partly from the change in behaviour introduced in !398. Modified results to match scipy, simplified tests by updating `VERIFY_IS_CWISE_APPROX` to work for scalars.	2021-10-01 10:20:50 -07:00
Antonio Sanchez	de218b471d	Add -arch=<arch> argument for nvcc. Without this flag, when compiling with nvcc, if the compute architecture of a card does not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`, then the kernel will fail to run with: ``` cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device. ``` This can happen, for example, when compiling with an older cuda version that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda 9.2 only supports up to `sm_70`). With the `-arch=<arch>` flag, the code will compile and run at the supplied architecture.	2021-09-24 20:48:01 -07:00
Antonio Sanchez	846d34384a	Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS Also add a missing space for clang.	2021-09-24 20:15:55 -07:00
Antonio Sanchez	7b00e8b186	Clean up CUDA CMake files. - Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt - Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed to the cuda compiler (nvcc or clang). The latter is to support passing custom flags (e.g. `-arch=` to nvcc, or to disable cuda-specific warnings).	2021-09-24 14:43:59 -07:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
sciencewhiz	4b6036e276	fix various typos	2021-09-22 16:15:06 +00:00
Alexander Karatarakis	4d622be118	[AutodiffScalar] Remove const when returning by value clang-tidy: Return type 'const T' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness The types are somewhat long, but the affected return types are of the form: ``` const T my_func() { // } ``` Change to: ``` T my_func() { // } ```	2021-09-18 21:23:32 +00:00
Rasmus Munk Larsen	6cadab6896	Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert.	2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	6c10495a78	Remove unnecessary std::tuple reference.	2021-09-09 15:49:44 +00:00
Antonio Sanchez	eea2a3385c	Remove more DynamicSparseMatrix references. Also fixed some typos in SparseExtra/MarketIO.h.	2021-09-02 15:36:47 -07:00
Jens Wehner	8286073c73	Matrixmarket extension	2021-09-02 17:23:33 +00:00
Antonio Sanchez	74da2e6821	Rename Tuple -> Pair. This is to make way for a new `Tuple` class that mimics `std::tuple`, but can be reliably used on device and with aligned Eigen types. The existing Tuple has very few references, and is actually an analogue of `std::pair`.	2021-09-02 02:20:54 +00:00
Maxiwell S. Garcia	09fc0f97b5	Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h	2021-09-01 16:42:51 +00:00
jenswehner	a443a2373f	updated documentation	2021-08-31 22:58:28 +00:00
Antonio Sanchez	cc3573ab44	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there is no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which required also updating the `TensorReductionGpu` implementation which previously made assumptions about available packets.	2021-08-31 19:13:12 +00:00
Turing Eret	3324389f6d	Add EIGEN_TENSOR_PLUGIN support per issue #2052 .	2021-08-30 19:36:55 +00:00
Jens Wehner	53ad9c75b4	included unordered_map header	2021-08-27 16:53:28 +00:00
jenswehner	9abf4d0bec	made RandomSetter C++11 compatible	2021-08-25 20:24:55 +00:00
Antonio Sanchez	eeacbd26c8	Bump CMake files to at least c++11. Removed all configurations that explicitly test or set the c++ standard flags. The only place the standard is now configured is at the top of the main `CMakeLists.txt` file, which can easily be updated (e.g. if we decide to move to c++14+). This can also be set via command-line using ``` > cmake -DCMAKE_CXX_STANDARD=14 ``` Kept the `EIGEN_TEST_CXX11` flag for now - that still controls whether to build/run the `cxx11_*` tests. We will likely end up renaming these tests and removing the `CXX11` subfolder.	2021-08-25 20:07:48 +00:00
jenswehner	90b3b6b572	added doxygen flowchart	2021-08-24 17:11:51 +00:00
jenswehner	d85de1ef56	removed sparse dynamic matrix	2021-08-24 10:33:00 +02:00
jenswehner	e3e74001f7	added includes for unordered_map	2021-08-10 13:34:57 +02:00
Alexander Karatarakis	4ba872bd75	Avoid leading underscore followed by cap in template identifiers	2021-08-04 22:41:52 +00:00
Antonio Sanchez	31f796ebef	Fix MPReal detection and support. The latest version of `mpreal` has a bug that breaks `min`/`max`. It also breaks with the latest dev version of `mpfr`. Here we add `FindMPREAL.cmake` which searches for the library and tests if compilation works. Removed our internal copy of `mpreal.h` under `unsupported/test`, as it is out-of-sync with the latest, and similarly breaks with the latest `mpfr`. It would be best to use the installed version of `mpreal` anyways, since that's what we actually want to test. Fixes #2282.	2021-08-03 17:55:03 +00:00
Antonio Sanchez	8cf6cb27ba	Fix TriSycl CMake files. This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was broken by commit `00f32752`, which modified `add_sycl_to_target` for ComputeCPP. This makes the corresponding modifications for trisycl to make them consistent. Also, trisycl now requires c++17.	2021-08-03 16:44:29 +00:00
Alexander Karatarakis	f357283d31	_DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier	2021-07-29 18:02:04 +00:00
Antonio Sanchez	1fd5ce1002	For GpuDevice::fill, use a single memset if all bytes are equal. The original `fill` implementation introduced a 5x regression on my nvidia Quadro K1200. @rohitsan reported up to 100x regression for HIP. This restores performance.	2021-07-10 13:37:16 +00:00
Antonio Sanchez	9c22795d65	Put attach/detach buffer back in for TensorDeviceSycl. Also added a test to verify the original buffer is updated correctly.	2021-07-09 10:00:05 -07:00
Antonio Sanchez	1e6c6c1576	Replace memset with fill to work for non-trivial scalars. For custom scalars, zero is not necessarily represented by a zeroed-out memory block (e.g. gnu MPFR). We therefore cannot rely on `memset` if we want to fill a matrix or tensor with zeroes. Instead, we should rely on `fill`, which for trivial types does end up getting converted to a `memset` under-the-hood (at least with gcc/clang). Requires adding a `fill(begin, end, v)` to `TensorDevice`. Replaced all potentially bad instances of memset with fill. Fixes #2245.	2021-07-08 18:34:41 +00:00
Jonas Harsch	e9c9a3130b	Removed superfluous boolean `degenerate` in TensorMorphing.h.	2021-07-08 18:02:58 +00:00
Antonio Sanchez	f5a9873bbb	Fix Tensor documentation page. The extra [TOC] tag is generating a huge floating duplicated table-of-contents, which obscures the majority of the page (see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html). Remove it. Also, headers do not support markup (see [doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so backticks like ``` ``` end up generating titles that look like ``` Constructor <tt>Tensor<double,2></tt> ``` Removing backticks for now. To generate proper formatted headers, we must directly use html instead of markdown, i.e. ``` <h2>Constructor <code>Tensor<double,2></code></h2> ``` which is ugly. Fixes #2254.	2021-07-03 04:39:22 +00:00
Jonas Harsch	aab747021b	Don't crash when attempting to shuffle an empty tensor.	2021-07-02 20:33:52 +00:00
Antonio Sanchez	6035da5283	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.	2021-07-01 22:58:14 +00:00
Antonio Sanchez	3a087ccb99	Modify tensor argmin/argmax to always return first occurrence. As written, depending on multithreading/gpu, the returned index from `argmin`/`argmax` is not currently stable. Here we modify the functors to always keep the first occurrence (i.e. if the value is equal to the current min/max, then keep the one with the smallest index). This is otherwise causing unpredictable results in some TF tests.	2021-06-29 10:36:20 -07:00
Antonio Sanchez	e9ab4278b7	Rewrite balancer to avoid overflows. The previous balancer overflowed for large row/column norms. Modified to prevent that. Fixes #2273.	2021-06-21 17:29:55 +00:00
jenswehner	175f0cc1e9	changed documentation to make example compile	2021-06-16 11:45:06 +02:00
Antonio Sanchez	954879183b	Fix placement of permanent GPU defines.	2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen	13fb5ab92c	Fix more enum arithmetic.	2021-06-15 09:09:31 -07:00
Antonio Sanchez	514977f31b	Add ability to permanently enable HIP/CUDA gpu* defines. When using Eigen for gpu, these simplify portability. If `EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then we do not undefine them.	2021-06-11 17:19:54 +00:00
Antonio Sanchez	6aec83263d	Allow custom TENSOR_CONTRACTION_DISPATCH macro. Currently TF lite needs to hack around with the Tensor headers in order to customize the contraction dispatch method. Here we add simple `#ifndef` guards to allow them to provide their own dispatch prior to inclusion.	2021-06-11 17:02:19 +00:00
Rohit Santhanam	c8d40a7bf1	Removed dead code from GPU float16 unit test.	2021-05-28 20:06:48 +00:00
Nathan Luehr	972cf0c28a	Fix calls to device functions from host code	2021-05-11 22:47:49 +00:00
Antonio Sanchez	0eba8a1fe3	Clean up gpu device properties. Made a class and singleton to encapsulate initialization and retrieval of device properties. Related to !481, which already changed the API to address a static linkage issue.	2021-05-07 17:51:29 +00:00
Antonio Sanchez	e3b7f59659	Simplify TensorRandom and remove time-dependence. Time-dependence prevents tests from being repeatable. This has long been an issue with debugging the tensor tests. Removing this will allow future tests to be repeatable in the usual way. Also, the recently added macros in !476 are causing headaches across different platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE` are sometimes defined with values, sometimes defined as empty, and sometimes not defined at all when they probably should be. This is leading to multiple build breakages. The simplest approach is to generate a seed via `Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a hash based on the current thread ID (since `rand()` isn't supported on GPU). Fixes #1602.	2021-05-04 13:34:49 -07:00
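
The tie-breaking rule described in commit 3a087ccb99 (keep the smallest index when values are equal) can be sketched in plain C++. This is a minimal illustration under stated assumptions, not Eigen's actual reducer: `IndexedValue`, `combine_min`, and `argmin_range` are hypothetical names introduced here, and NaN inputs are not treated specially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical (index, value) pair used only for this sketch.
struct IndexedValue {
  std::size_t index;
  double value;
};

// Combine two partial results. The smaller value wins; on a tie the
// smaller index wins, so the outcome does not depend on argument order.
inline IndexedValue combine_min(const IndexedValue& a, const IndexedValue& b) {
  if (b.value < a.value) return b;
  if (a.value < b.value) return a;
  return (a.index <= b.index) ? a : b;
}

// Sequentially reduce the half-open range [begin, end) of `data`.
inline IndexedValue argmin_range(const std::vector<double>& data,
                                 std::size_t begin, std::size_t end) {
  assert(begin < end && end <= data.size());
  IndexedValue best{begin, data[begin]};
  for (std::size_t i = begin + 1; i < end; ++i) {
    best = combine_min(best, IndexedValue{i, data[i]});
  }
  return best;
}
```

Because `combine_min` is symmetric in its arguments, partial results computed on separate chunks (for example by different threads) can be merged in any order and still yield the first occurrence of the minimum.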


2956 Commits
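
Commit 360290fc42 ties summation accuracy to the leaf size of a tree reduction. The general idea can be sketched as follows. This is a minimal illustration, not Eigen's implementation: `naive_sum` and `tree_sum` are hypothetical names, `float` stands in for the low-precision types mentioned in the commit, and any particular `leaf_size` is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain left-to-right accumulation in float.
inline float naive_sum(const std::vector<float>& x) {
  float s = 0.0f;
  for (float v : x) s += v;
  return s;
}

// Tree (pairwise) reduction over the half-open range [begin, end).
// Ranges of at most `leaf_size` elements are summed sequentially;
// larger ranges are split in half and the two partial sums are added.
inline float tree_sum(const std::vector<float>& x, std::size_t begin,
                      std::size_t end, std::size_t leaf_size) {
  assert(leaf_size > 0 && begin <= end && end <= x.size());
  if (end - begin <= leaf_size) {
    float s = 0.0f;
    for (std::size_t i = begin; i < end; ++i) s += x[i];
    return s;
  }
  const std::size_t mid = begin + (end - begin) / 2;
  return tree_sum(x, begin, mid, leaf_size) + tree_sum(x, mid, end, leaf_size);
}
```

A smaller `leaf_size` shortens the sequential chains in which a large running sum absorbs small addends, which is where most rounding error accumulates. Setting `leaf_size` to the input length makes `tree_sum` degenerate to the sequential loop.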