eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
jenswehner	675b72e44b	added clang format	2021-11-09 23:49:01 +01:00
Ben Barsdell	50df8d3d6d	Avoid integer overflow in EigenMetaKernel indexing - The current implementation computes `size + total_threads`, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value. - The num_blocks calculation can also overflow due to the implementation of divup(). - This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes. - Also adds relevant tests.	2021-11-05 16:39:37 +11:00
Rasmus Munk Larsen	55e3ae02ac	Compare summation results against forward error bound.	2021-11-04 18:04:04 -07:00
Gengxin Xie	5c642950a5	Bug fix: EIGEN_HAS_FP16_C was not being defined when the compiler isn't clang	2021-11-04 22:13:01 +00:00
Gilad	0d73440fb2	Documentation of Quaternion constructor from MatrixBase	2021-11-04 16:21:26 +00:00
Minh Quan HO	4284c68fbb	nestbyvalue test: fix uninitialized matrix. The test computed with an uninitialized matrix, which happens to be zeroed on Linux but may contain NaN on other systems. This commit fixes it by initializing the matrix with Random().	2021-11-04 14:32:12 +01:00
Xinle Liu	478a1bdda6	Fix total deflation issue in BDCSVD, when & only when M is already diagonal.	2021-11-02 16:53:55 +00:00
Antonio Sanchez	8f8c2ba2fe	Remove bad "take" impl that causes g++-11 crash. For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes g++-11 to ICE with `sorry, unimplemented: unexpected AST of kind nontype_argument_pack`. It does work with other versions of gcc, and with clang. I filed a GCC bug [here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999). Technically we should never actually run into this case, since you can't take n > 0 elements from an empty list. Commenting it out allows our Eigen tests to pass.	2021-11-01 17:04:41 +00:00
Antonio Sanchez	f6c8cc0e99	Fix TensorReduction warnings and error bound for sum accuracy test. The sum accuracy test currently uses the default test precision for the given scalar type. However, scalars are generated via a normal distribution, and given a large enough count and strong enough random generator, the expected sum is zero. This causes the test to periodically fail. Here we estimate an upper-bound for the error as `sqrt(N) * prec` for summing N values, with each having an approximate epsilon of `prec`. Also fixed a few warnings generated by MSVC when compiling the reduction test.	2021-10-30 14:59:00 -07:00
Rasmus Munk Larsen	b3bea43a2d	Don't use unrolled loops for stateful reducers. The problem is the combination step, e.g. `reducer0.reducePacket(accum1, accum0); reducer0.reducePacket(accum2, accum0); reducer0.reducePacket(accum3, accum0);`. For the mean reducer this increments the count as well as adding together the accumulators, so the wrong count is divided into the sum at the end.	2021-10-28 23:52:54 +00:00
Chip Kerchner	9cf34ee0ae	Invert rows and depth in non-vectorized portion of packing (PowerPC).	2021-10-28 21:59:41 +00:00
Ilya Tokar	e1cb6369b0	Add AVX vector path to float2half/half2float. Makes e.g. matrix multiplication 2x faster: name old cpu/op new cpu/op delta BM_convers 181ms ± 1% 62ms ± 9% -65.82% (p=0.016 n=4+5). Tested on all possible input values (not adding tests, since they take a long time).	2021-10-28 13:59:01 -04:00
Antonio Sanchez	03d4cbb307	Fix min/max nan-propagation for scalar "other". Copied input type from `EIGEN_MAKE_CWISE_BINARY_OP`. Fixes #2362.	2021-10-28 09:28:29 -07:00
Antonio Sanchez	e559701981	Fix compile issue for gcc 4.8	2021-10-28 08:23:19 -07:00
Fabian Keßler	19cacd3ecb	optimize cmake scripts for subproject use	2021-10-28 16:08:02 +02:00
Rohit Santhanam	48e40b22bf	Preliminary HIP bfloat16 GPU support.	2021-10-27 18:36:45 +00:00
Antonio Sanchez	40bbe8a4d0	Fix ZVector build. Cross-compiled via `s390x-linux-gnu-g++`, run via qemu. This allows the packetmath tests to pass.	2021-10-27 16:30:15 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon. Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Antonio Sánchez	185ad0e610	Revert "Avoid integer overflow in EigenMetaKernel indexing". This reverts commit `100d7caf92`.	2021-10-27 14:55:25 +00:00
Rasmus Munk Larsen	68e0d023c0	Remove license column in tables for builtin sparse solvers since all are MPL2 now.	2021-10-26 18:09:22 +00:00
Andreas Krebbel	8faafc3aaa	ZVector: Move alignas qualifier to come first. We currently have plenty of type definitions with the alignment qualifier coming after the type. The compiler warns about ignoring them: `int EIGEN_ALIGN16 ai[4];` Turn this into: `EIGEN_ALIGN16 int ai[4];`	2021-10-26 15:33:47 +02:00
Ben Barsdell	100d7caf92	Avoid integer overflow in EigenMetaKernel indexing - The current implementation computes `size + total_threads`, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value. - The num_blocks calculation can also overflow due to the implementation of divup(). - This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes. - Also adds relevant tests.	2021-10-26 00:04:28 +00:00
Alex Druinsky	d0e3791b1a	Fix vectorized reductions for Eigen::half. Fixes compiler errors in expressions that look like `Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()`. The error comes from the code that creates the initial value for vectorized reductions. The fix is to specify the scalar type of the reduction's initial value. The change is necessary for Eigen::half because, unlike other types, Eigen::half scalars cannot be implicitly created from integers.	2021-10-25 14:44:33 -07:00
Maxiwell S. Garcia	99600bd1a6	test: fix boostmultiprec test to compile with older Boost versions. The Eigen boostmultiprec test redefines a symbol that is already defined inside Boost Math [1]. Boost has fixed it recently [2], but this patch avoids errors when the Boost version is older than 1.77. https://github.com/boostorg/math/blob/boost-1.76.0/include/boost/math/policies/policy.hpp#L18 `6830712302 (diff-c7a8e5911c2e6be4138e1a966d762200f147792ac16ad96fdcc724313d11f839)`	2021-10-25 20:32:33 +00:00
Yann Billeter	6c3206152a	fix(CommaInitializer): pass dims at compile-time	2021-10-25 19:53:38 +00:00
Antonio Sanchez	a500da1dc0	Fix broadcasting oob error. For vectorized 1-dimensional inputs that do not take the special blocking path (e.g. `std::complex<...>`), there was an index-out-of-bounds error causing the broadcast size to be computed incorrectly. Here we fix this, and make other minor cleanup changes. Fixes #2351.	2021-10-25 19:31:12 +00:00
Antonio Sanchez	0578feaabc	Remove const from visitor return type. This seems to interfere with `pload`/`ploadu`, since `pload<const Packet**>` are not defined. This should unbreak the arm/ppc builds.	2021-10-25 19:09:50 +00:00
benardp	b63c096fbb	Extend EIGEN_QT_SUPPORT to Qt6	2021-10-23 23:43:06 +00:00
Lennart Steffen	163f11e24a	Included note on inner stride for compile-time vectors. See https://gitlab.com/libeigen/eigen/-/issues/2355#note_711078126	2021-10-22 09:46:43 +00:00
Nico	b17bcddbca	Fix -Wbitwise-instead-of-logical clang warning. `&` and `|` don't short-circuit; `&&` and `||` do. When both arguments to those are boolean, the short-circuiting version is usually the desired one, so clang warns on this. Here, it is inconsequential, so switch to `&&` and `||` to suppress the warning.	2021-10-21 23:32:45 -04:00
Rasmus Munk Larsen	2d3fec8ff6	Add nan-propagation options to matrix and array plugins.	2021-10-21 19:40:11 +00:00
Antonio Sanchez	b86e013321	Revert bit_cast to use memcpy for CUDA. To elide the memcpy, we need to first load the `src` value into registers by making a local copy. This avoids the need to resort to potential UB by using `reinterpret_cast`. This change doesn't seem to affect CPU (at least not with gcc/clang). With optimizations on, the copy is also elided.	2021-10-21 08:14:11 -07:00
Antonio Sanchez	45e67a6fda	Use reinterpret_cast on GPU for bit_cast. This seems to be the recommended approach for doing type punning in CUDA. See for example - https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union - https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/ (the latter puns a double to an `int2`). The issue is that for CUDA, the `memcpy` is not elided, and ends up being an expensive operation. We already have similar `reinterpret_cast`s across the Eigen codebase for GPU (as does TensorFlow).	2021-10-20 21:34:40 +00:00
Antonio Sanchez	24ebb37f38	Disable Tree reduction for GPU. For moderately sized inputs, running the Tree reduction quickly fills/overflows the GPU thread stack space, leading to memory errors. This was happening in the `cxx11_tensor_complex_gpu` test, for example. Disabling tree reduction on GPU fixes this.	2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen	360290fc42	Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction. Add more unit tests for summation accuracy.	2021-10-20 19:54:06 +00:00
Antonio Sanchez	95bb645e92	Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation. Looks like we need to update the `EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR` for newer versions of MSVC as well when compiling with NVCC. Fixes build issues for VS 2017.	2021-10-20 19:38:14 +00:00
Antonio Sanchez	fd5f48e465	Fix tuple compilation for VS2017. VS2017 doesn't like deducing alias types, leading to a bunch of compile errors for functions involving the `tuple` alias. Replacing with `TupleImpl` seems to solve this, allowing the test to compile/pass.	2021-10-20 19:18:34 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h. The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	f2c9c2d2f7	Vectorize Visitor.h.	2021-10-20 16:58:01 +00:00
Antonio Sanchez	2bf07fa5b5	Fix Windows CMake compiler/OS detection. Replaced deprecated `DetermineVSServicePack` macro with recommended `CMAKE_CXX_COMPILER_VERSION`. Deleted custom `OSVersion` detection. The windows-specific code is highly outdated, and on other systems simply returns `CMAKE_SYSTEM`. We will get values like `windows-10.0.17763`, but this is preferable to `unknownwin`, and saves us needing to maintain a separate cmake file.	2021-10-02 16:30:38 +00:00
Rasmus Munk Larsen	1d75fab368	Speed up tensor reduction	2021-10-02 14:58:23 +00:00
Antonio Sanchez	be9e7d205f	Reduce tensor_contract_gpu test. The original test times out after 60 minutes on Windows, even when setting flags to optimize for speed. Reducing the number of contractions performed from 3600 to 27 for subtests 8 and 9 allows the two to run in just over a minute each.	2021-10-02 04:36:15 +00:00
Antonio Sanchez	701f5d1c91	Fix gpu special function tests. Some checks used incorrect values, partly from copy-paste errors, partly from the change in behaviour introduced in !398. Modified results to match scipy, simplified tests by updating `VERIFY_IS_CWISE_APPROX` to work for scalars.	2021-10-01 10:20:50 -07:00
Antonio Sanchez	f0f1d7938b	Disable testing of complex compound assignment operators for MSVC. MSVC does not support specializing compound assignments for `std::complex`, since it already specializes them (contrary to the standard). Trying to use one of these on device will currently lead to a duplicate definition error. This is still probably preferable to no error though. If we remove the definitions for MSVC, then it will compile, but the kernel will fail silently. The only proper solution would be to define our own custom `Complex` type.	2021-09-27 15:15:11 -07:00
Kolja Brix	51a0b4e2d2	Reorganize test main file	2021-09-27 18:30:47 +00:00
Antonio Sanchez	21640612be	Disable more CUDA warnings. For cuda 9.2 and 11.4, they changed the numbers again. Fixes #2331.	2021-09-24 21:31:14 -07:00
Antonio Sanchez	de218b471d	Add -arch=<arch> argument for nvcc. Without this flag, when compiling with nvcc, if the compute architecture of a card does not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`, then the kernel will fail to run with: `cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.` This can happen, for example, when compiling with an older cuda version that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda 9.2 only supports up to `sm_70`). With the `-arch=<arch>` flag, the code will compile and run at the supplied architecture.	2021-09-24 20:48:01 -07:00
Antonio Sanchez	846d34384a	Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS. Also add a missing space for clang.	2021-09-24 20:15:55 -07:00
Antonio Sanchez	7b00e8b186	Clean up CUDA CMake files. - Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt - Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed to the cuda compiler (nvcc or clang). The latter is to support passing custom flags (e.g. `-arch=` to nvcc, or to disable cuda-specific warnings).	2021-09-24 14:43:59 -07:00
Antonio Sanchez	e9e90892fe	Disable another device warning	2021-09-23 13:43:18 -07:00
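
The overflow described in the "Avoid integer overflow in EigenMetaKernel indexing" commits (`50df8d3d6d`, `100d7caf92`) can be illustrated with a ceiling division. This is only a sketch of the general problem, not the code from that patch: `divup_naive` and `divup_safe` are hypothetical names, and both assume a non-negative `x` and a positive `y`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, for illustration only: the textbook ceiling division.
// The intermediate x + y - 1 overflows when x is close to the maximum
// representable value of T, so it must not be called with such inputs.
template <typename T>
constexpr T divup_naive(T x, T y) {
  return (x + y - 1) / y;
}

// Overflow-free variant: no intermediate value ever exceeds x.
template <typename T>
constexpr T divup_safe(T x, T y) {
  return x == 0 ? T(0) : (x - 1) / y + 1;
}
```

For small inputs the two agree (for example, both map `x = 9`, `y = 4` to 3), but only `divup_safe` stays well defined when `x` is `std::numeric_limits<T>::max()`.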

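The `memcpy`-based type punning mentioned in commit `b86e013321` ("Revert bit_cast to use memcpy for CUDA") follows a common pattern: copy the source into a local first, then `memcpy` from that local. A minimal host-side sketch of the pattern is below. `bit_cast_sketch` is a hypothetical name and this is not Eigen's implementation; whether the copy is elided depends on the compiler and optimization level.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical name; illustrates the pattern only, not Eigen's bit_cast.
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  const From staged = src;  // local copy, so the value can be held in registers
  To dst;
  std::memcpy(&dst, &staged, sizeof(To));  // well-defined, unlike reinterpret_cast punning
  return dst;
}
```

On a platform with IEEE 754 single-precision `float`, `bit_cast_sketch<std::uint32_t>(1.0f)` yields `0x3f800000`.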

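The problem described in commit `b3bea43a2d` ("Don't use unrolled loops for stateful reducers") can be reproduced with a toy reducer. Everything below is hypothetical and scalar-only (`ToyMeanReducer`, `mean_unrolled_buggy`, `mean_simple`); it is not Eigen's reducer API, just a sketch of why combining partial accumulators through the same stateful `reduce` call inflates the count.

```cpp
#include <cstddef>

// Hypothetical toy reducer: every reduce() call adds a value and bumps the count.
struct ToyMeanReducer {
  std::size_t count = 0;
  void reduce(float value, float* accum) {
    *accum += value;
    ++count;
  }
  float finalize(float accum) const { return accum / static_cast<float>(count); }
};

// Two-way unrolled loop. The final combination step reuses reduce(), so the
// count is incremented once more than the number of input values.
inline float mean_unrolled_buggy(const float* data, std::size_t n) {
  ToyMeanReducer reducer;
  float accum0 = 0.0f, accum1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    reducer.reduce(data[i], &accum0);
    reducer.reduce(data[i + 1], &accum1);
  }
  for (; i < n; ++i) reducer.reduce(data[i], &accum0);
  reducer.reduce(accum1, &accum0);  // combination step: count is now n + 1
  return reducer.finalize(accum0);
}

// Plain loop with a single accumulator: the count matches the input size.
inline float mean_simple(const float* data, std::size_t n) {
  ToyMeanReducer reducer;
  float accum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) reducer.reduce(data[i], &accum);
  return reducer.finalize(accum);
}
```

For the input `{1, 2, 3, 4}` the plain loop returns 2.5, while the unrolled version divides the same sum of 10 by 5 and returns 2.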
11617 Commits