eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-27 07:29:52 +08:00

Author	SHA1	Message	Date
Nathan Luehr	7e6a1c129c	Device implementation of log for std::complex types.	2021-05-11 22:02:21 +00:00
Nathan Luehr	6753f0f197	Fix ambiguity due to argument dependent lookup.	2021-05-11 15:41:11 -05:00
guoqiangqi	3d9051ea84	Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242.	2021-05-10 23:53:16 +00:00
Rohit Santhanam	39ec31c0ad	Fix for issue where numext::imag and numext::real are used before they are defined.	2021-05-10 19:48:32 +00:00
Antonio Sanchez	c0eb5f89a4	Restore ABI compatibility for conj with 3.3, fix conflict with boost. The boost library unfortunately specializes `conj` for various types and assumes the original two-template-parameter version. This change restores the second parameter. This also restores ABI compatibility. The specialization for `std::complex` is because `std::conj` is not a device function. For custom complex scalar types, users should provide their own `conj` implementation. We may consider removing the unnecessary second parameter in the future - but this will require modifying boost as well. Fixes #2112.	2021-05-07 18:14:00 +00:00
Antonio Sanchez	90e9a33e1c	Fix numext::arg return type. The cxx11 path for `numext::arg` incorrectly returned the complex type instead of the real type, leading to compile errors. Fixed this and added tests. Related to !477, which uncovered the issue.	2021-05-07 16:26:57 +00:00
Christoph Hertzberg	722ca0b665	Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242	2021-05-06 18:36:47 +02:00
Antonio Sanchez	1c013be2cc	Better CUDA complex division. The original produced NaNs when dividing 0/b for subnormal b. The `complex_divide_stable` was changed to use the more common Smith's algorithm.	2021-04-29 17:39:58 +00:00
Antonio Sanchez	172db7bfc3	Add missing pcmp_lt_or_nan for NEON Packet4bf.	2021-04-27 14:12:11 -07:00
Jakub Lichman	d87648a6be	Tests added and AVX512 bug fixed for pcmp_lt_or_nan	2021-04-25 20:58:56 +00:00
Antonio Sanchez	d213a0bcea	DenseStorage safely copy/swap. Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Otherwise, this leads to inefficient copying, and potential UB for non-initialized elements.	2021-04-22 18:45:19 +00:00
Rasmus Munk Larsen	85a76a16ea	Make vectorized compute_inverse_size4 compile with AVX.	2021-04-22 15:21:01 +00:00
Chip-Kerchner	06c2760bd1	Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).	2021-04-21 00:47:13 +00:00
Jakub Lichman	2b1dfd1ba0	HasExp added for AVX512 Packet8d	2021-04-20 19:07:58 +00:00
Antonio Sanchez	1d79c68ba0	Fix ldexp for AVX512 (#2215). Wrong shuffle was used. Need to interleave low/high halves with a `permute` instruction. Fixes #2215.	2021-04-20 16:25:22 +00:00
David Tellenbach	3e819d83bf	Before 3.4 branch	2021-04-18 23:36:14 +02:00
Christoph Hertzberg	9357feedc7	Avoid using uninitialized inputs and if available, use slightly more efficient `movsd` instruction for `pset1<Packet2cf>`.	2021-04-13 01:36:59 +02:00
Christoph Hertzberg	1e1c8a735c	Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for `std::result_of` and `std::invoke_result`. Fixes #2209	2021-04-12 01:26:15 +00:00
Christoph Hertzberg	d58678069c	Make iterators default constructible and assignable, by making...	2021-04-09 17:03:28 +00:00
Antonio Sanchez	fcb5106c6e	Scaled epsilon the wrong way. Should have been 0.5 to widen the bounds, since this is inverse precision. Setting to 0.5, however, leads to many more failing tests at Google, so reverting to 1 for now.	2021-04-07 15:08:39 -07:00
Christoph Hertzberg	6197ce1a35	Replace `-2147483648` by `-0.0f` or `-0.0` constants (this should fix #2189). Also, remove unnecessary `pgather` operations.	2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen	22edb46823	Align local arrays to Packet boundary.	2021-04-06 16:22:36 +00:00
Antonio Sanchez	90187a33e1	Fix SelfAdjointEigenSolver (#2191). Adjust the relaxation step to use the condition ``` abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1])) ``` for setting the subdiagonal entry to zero. Also adjust Wilkinson shift for small `e = subdiag[end-1]` - I couldn't find a reference for the original, and it was not consistent with the Wilkinson definition. Fixes #2191.	2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen	3ddc0974ce	Fix two bugs in commit	2021-04-02 22:06:27 +00:00
Chip Kerchner	c24bee6120	Fix address of temporary object errors in clang11. This fixes the problem with taking the address of temporary objects which clang11 treats as errors.	2021-04-02 16:27:08 +00:00
Rasmus Munk Larsen	5bbc9cea93	Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input. Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.	2021-03-31 21:09:19 +00:00
Antonio Sanchez	78ee3d6261	Fix CUDA constexpr issues for numeric_limits. Some CUDA/HIP constants fail on device with `constexpr` since they internally rely on non-constexpr functions, e.g. ``` #define CUDART_INF_F __int_as_float(0x7f800000) ``` This fails for cuda-clang (though passes with nvcc). These constants are currently used by `device::numeric_limits`. For portability, we need to remove `constexpr` from the affected functions. For C++11 or higher, we should be able to rely on the `std::numeric_limits` versions anyways, since the methods themselves are now `constexpr`, so should be supported on device (clang/hipcc natively, nvcc with `--expt-relaxed-constexpr`).	2021-03-30 18:01:27 +00:00
Antonio Sanchez	af1247fbc1	Use Index type in loop over coefficients. Previously was `int`. Brought up by Kyle Snow (Polaris Geospatial Services) on the mailing list.	2021-03-29 17:40:55 +00:00
Antonio Sanchez	87729ea39f	Eliminate `round_impl` double-promotion warnings for c++03.	2021-03-25 16:52:19 +00:00
Deven Desai	748489ef9c	Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform. The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit `e7b8643d70` ``` In file included from /home/rocm-user/eigen/test/main.h:360: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:162: /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr] static float (max)() { ^ /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression return HIPRT_MAX_NORMAL_F; ^ /home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F' #define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff) ^ /opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here __device__ static inline float __int_as_float(int x) { ^ ``` The problem seems to be that some of the constants defined in the HIP `math_constants.h` call the `__int_as_float` routine, which is not declared `constexpr` in the HIP runtime header file. Working around this issue for now, by skipping the `constexpr` support (enabled via the above commit) on HIP.	2021-03-25 13:45:52 +00:00
Chip Kerchner	d59ef212e1	Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).	2021-03-25 11:08:19 +00:00
Steve Bronder	e7b8643d70	Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"" This reverts commit `5f0b4a4010`.	2021-03-24 18:14:56 +00:00
Christoph Hertzberg	69a4f70956	Revert "Uses _mm512_abs_pd for Packet8d pabs" This reverts commit `f019b97aca`	2021-03-23 18:52:19 +00:00
David Tellenbach	4811e81966	Remove yet another comma at end of enum	2021-03-18 23:30:00 +01:00
Steve Bronder	f019b97aca	Uses _mm512_abs_pd for Packet8d pabs	2021-03-18 15:47:52 +00:00
Niek Bouman	ed964ba3f1	Proposed fix for issue #2187	2021-03-18 00:55:36 +00:00
Antonio Sanchez	8dfe1029a5	Augment NumTraits with min/max_exponent() again. Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase where possible. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. The previous MR !443 failed for c++03 due to lack of `constexpr`. Because of this, we need to keep around the `std::numeric_limits` version in enum expressions until the switch to c++11. Fixes #2148	2021-03-16 20:12:46 -07:00
David Tellenbach	eb71e5db98	Fix another warning on missing commas	2021-03-17 03:07:04 +01:00
David Tellenbach	df4bc2731c	Revert "Augment NumTraits with min/max_exponent()." This reverts commit `75ce9cd2a7`.	2021-03-17 03:06:08 +01:00
Antonio Sanchez	75ce9cd2a7	Augment NumTraits with min/max_exponent(). Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. Fixes #2148	2021-03-17 01:00:41 +00:00
David Tellenbach	9fb7062440	Silence warning on comma at end of enumerator list	2021-03-17 01:46:52 +01:00
Theo Fletcher	b8502a9dd6	Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary.	2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen	2e83cbbba9	Add NaN propagation options to minCoeff/maxCoeff visitors.	2021-03-16 17:02:50 +00:00
Antonio Sanchez	f612df2736	Add fmod(half, half). This is to support TensorFlow's `tf.math.floormod` for half.	2021-03-15 13:32:24 -07:00
Antonio Sanchez	14b7ebea11	Fix numext::round pre c++11 for large inputs. This is to resolve an issue for large inputs when +0.5 can actually lead to +1 if the input doesn't have enough precision to resolve the addition - leading to an off-by-one error. See discussion on `9a663973`.	2021-03-15 19:08:04 +00:00
Chip Kerchner	c9d4367fa4	Fix pround and add print	2021-03-15 19:07:43 +00:00
Antonio Sanchez	d24f9f9b55	Fix NVCC+ICC issues. NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180	2021-03-15 18:42:04 +00:00
Antonio Sanchez	14487ed14e	Add increment/decrement operators to Eigen::half. This is for consistency with bfloat16, and to support initialization with `std::iota`.	2021-03-15 10:52:23 -07:00
Antonio Sanchez	d098c4d64c	Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang. Doesn't seem to correctly select the register type, and most types lead to compiler crashes.	2021-03-10 16:05:01 -08:00
Antonio Sanchez	543e34ab9d	Re-implement move assignments. The original swap approach leads to potential undefined behavior (reading uninitialized memory) and results in unnecessary copying of data for static storage. Here we pass down the move assignment to the underlying storage. Static storage does a one-way copy, dynamic storage does a swap. Modified the tests to no longer read from the moved-from matrix/tensor, since that can lead to UB. Added a test to ensure we do not access uninitialized memory in a move. Fixes: #2119	2021-03-10 16:55:20 +00:00
Ben Niu	b8d1857f0d	[MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both native ARM64 (_M_ARM64 is defined) and ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can be emulated by ARM64 instructions, it is still a good idea to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons.	2021-03-10 10:21:31 +00:00
Antonio Sanchez	853a5c4b84	Fix ambiguous call to CUDA __half constructor.	2021-03-08 21:06:28 -08:00
Antonio Sanchez	94327dbfba	Fix typo: DEVICE -> GPU	2021-03-08 11:21:00 -08:00
Antonio Sanchez	1296abdf82	Fix non-trivial Half constructor for CUDA. Both CUDA and HIP require trivial default constructors for types used in shared memory. Otherwise failing with ``` error: initialization is not supported for __shared__ variables. ```	2021-03-08 07:32:54 -08:00
Antonio Sanchez	6045243141	Revert stack allocation limit change that crept in. This was accidentally introduced when copying changes between repos.	2021-03-05 14:29:37 -08:00
Deven Desai	1a96d49afe	Changing the Eigen::half implementation for HIP. Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build. In the ROCm TensorFlow build, * files that do not contain any GPU code get compiled via gcc, and * files that contain GPU code get compiled via hipcc. In certain cases, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half as a "pass-by-value" argument, its value was getting corrupted when received by the function. The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function. Changing the Eigen::half argument to be "pass by reference" seems to work around the error. In order to fix it such that we do not run into it again in TF, this commit changes the Eigen::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during the host compile phase of the hipcc compile.	2021-03-05 19:27:13 +00:00
Antonio Sanchez	2468253c9a	Define EIGEN_CPLUSPLUS and replace most __cplusplus checks. The macro `__cplusplus` is not defined correctly in MSVC unless building with the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170	2021-03-05 18:33:18 +00:00
Antonio Sanchez	82d61af3a4	Fix rint SSE/NEON again, using optimization barrier. This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-05 08:54:12 -08:00
David Tellenbach	5f0b4a4010	Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()" This reverts commit `6cbb3038ac` because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.	2021-03-05 13:16:43 +01:00
Steve Bronder	6cbb3038ac	Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()	2021-03-04 18:58:08 +00:00
Antonio Sánchez	9a663973b4	Revert "Fix rint for SSE/NEON." This reverts commit `e72dfeb8b9`	2021-03-03 18:51:51 +00:00
Antonio Sanchez	e72dfeb8b9	Fix rint for SSE/NEON. It seems sometimes with aggressive optimizations the combination `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory). Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-03 09:41:46 -08:00
Antonio Sanchez	1e0c7d4f49	Add print for SSE/NEON, use NEON rounding intrinsics if available. In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode. For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8). Related to #1969.	2021-02-27 22:42:07 +00:00
Antonio Sanchez	c65c2b31d4	Make half/bfloat16 constructor take inputs by value, fix powerpc test. Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`. Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down-the-road. Also eliminated some compile warnings for powerpc.	2021-02-27 21:32:06 +00:00
Christoph Hertzberg	39a590dfb6	Remove unused include	2021-02-27 19:02:33 +01:00
Christoph Hertzberg	8f686ac4ec	clang 10 aggressively warns about precision loss when converting int to float (or long to double) (cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	a3521d743c	Fix some enum-enum conversion warnings (cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	ca528593f4	Fixed/masked more implicit copy constructor warnings (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	4fb3459a23	Fix double-promotion warnings (cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)	2021-02-27 18:44:26 +01:00
Antonio Sanchez	29ebd84cb7	Fix NEON sqrt for 32-bit, add prsqrt. With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit. Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases. Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).	2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen	fe19714f80	Merge branch 'rmlarsen1/eigen-nan_prop'	2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen	e67672024d	Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop	2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen	5e7d4c33d6	Add TODO.	2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen	fb5b59641a	Defer default for minCoeff/maxCoeff to templated variant.	2021-02-26 09:07:00 -08:00
Antonio Sanchez	e19829c3b0	Fix floor/ceil for NEON fp16. Forgot to test this. Fixes bug introduced in !416.	2021-02-25 20:39:56 -08:00
Antonio Sanchez	5529db7524	Fix SSE/NEON pfloor/pceil for saturated values. The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part. Also added `pceil` for NEON. Fixes #1969.	2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen	51eba8c3e2	Fix indentation.	2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen	5297b7162a	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.	2021-02-25 18:21:21 +00:00
Chip-Kerchner	6eebe97bab	Fix clang compile when no MMA flags are set. Simplify MMA compiler detection.	2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen	4cb0592af7	Fix indentation.	2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen	0065f9d322	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.	2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen	113e61f364	Remove unused function scalar_cmp_with_cast.	2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen	98ca58b02c	Cast anonymous enums to int when used in expressions.	2021-02-24 23:50:06 +00:00
Chip-Kerchner	c31ead8a15	Having forward template function declarations in a P10 file causes bad code in certain situations.	2021-02-24 23:43:30 +00:00
Antonio Sanchez	a31effc3bc	Add `invoke_result` and eliminate `result_of` warnings for C++17+. The `std::result_of` meta struct is deprecated in C++17 and removed in C++20. It was still slipping through due to a faulty definition of `EIGEN_HAS_STD_RESULT_OF`. Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and `Eigen::internal::invoke_result` implementation with fallback for pre C++17. Replaces the `result_of` definition with one based on `std::invoke_result` for C++17 and higher. For completeness, added nullary op support for c++03. Fixes #1850.	2021-02-24 21:36:14 +00:00
Chip-Kerchner	8523d447a1	Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins.	2021-02-24 20:49:15 +00:00
Antonio Sanchez	5908aeeaba	Fix CUDA device new and delete, and add test. HIP does not support new/delete on device, so test is skipped.	2021-02-24 11:31:41 -08:00
Antonio Sanchez	6cf0ab5e99	Disable fast psqrt for NEON. Accuracy is too poor - requires at least two Newton iterations, but then it is no longer significantly faster than `vsqrt`. Fixes #2094.	2021-02-23 19:52:55 -08:00
Antonio Sanchez	aba3998278	Fix check if GPU compile phase for std::hash	2021-02-23 19:52:08 -08:00
Antonio Sanchez	db5691ff2b	Fix some CUDA warnings. Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not running on GPU. `std::hash<float>` is not a device function, so cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define if `EIGEN_HAS_STD_HASH`. Same for `half`. Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability, eliminate warnings about `EIGEN_CUDA_ARCH` not being defined. Replaced a couple C-style casts with `reinterpret_cast` for aligned loading of `half` to `half2`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.	2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen	88d4c6d4c8	Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current implementation.	2021-02-23 23:11:03 +00:00
Adam Shapiro	2ac0b78739	Fixed sparse conservativeResize() when both num cols and rows decreased. The previous implementation caused a buffer overflow trying to calculate non-zero counts for columns that no longer exist.	2021-02-23 21:32:39 +00:00
Chip-Kerchner	10c77b0ff4	Fix compilation errors with later versions of GCC and use of MMA.	2021-02-22 15:01:47 -06:00
Christoph Hertzberg	73922b0174	Fixes Bug #1925. Packets should be passed by const reference, even to inline functions.	2021-02-20 18:56:42 +01:00
Christoph Hertzberg	a7749c09bc	Bug #1910: Make SparseCholesky work for RowMajor matrices	2021-02-19 19:36:18 +01:00
Antonio Sánchez	128eebf05e	Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)." This reverts commit `12fd3dd655`	2021-02-19 17:09:16 +00:00
Rasmus Munk Larsen	7f09d3487d	Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp.	2021-02-18 20:49:18 +00:00
Masaki Murooka	12fd3dd655	add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC).	2021-02-17 22:55:47 +00:00
David Tellenbach	aa8b22e776	Bump to 3.4.99	2021-02-17 23:23:17 +01:00
David Tellenbach	5336ad8591	Define internal::make_unsigned for [unsigned]long long on macOS. macOS defines int64_t as long long even for C++03 and therefore expects a template specialization internal::make_unsigned<long long>, for C++03. Since other platforms define int64_t as long for C++03 we cannot add the specialization for all cases.	2021-02-17 23:03:10 +01:00
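
Commit 1c013be2cc above switches `complex_divide_stable` to Smith's algorithm. The following is a minimal host-side C++ sketch of that general technique, not Eigen's actual implementation: the function name `smith_divide` is hypothetical, and device qualifiers and special handling of infinities and NaNs are omitted. The textbook formula divides by `c*c + d*d`, which underflows to zero when the divisor is subnormal; scaling by the ratio of the smaller to the larger component avoids forming that sum of squares.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper (not part of Eigen): computes (a + ib) / (c + id)
// using Smith's algorithm, which never forms c*c + d*d directly.
template <typename T>
std::complex<T> smith_divide(const std::complex<T>& x,
                             const std::complex<T>& y) {
  const T a = x.real(), b = x.imag();
  const T c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    // |d/c| <= 1, so the ratio cannot overflow.
    const T r = d / c;
    const T den = c + d * r;  // equals (c^2 + d^2) / c
    return std::complex<T>((a + b * r) / den, (b - a * r) / den);
  } else {
    // |c/d| < 1, so the ratio cannot overflow.
    const T r = c / d;
    const T den = c * r + d;  // equals (c^2 + d^2) / d
    return std::complex<T>((a * r + b) / den, (b * r - a) / den);
  }
}
```

For instance, `smith_divide(std::complex<double>(0, 0), std::complex<double>(1e-310, 0))` returns zero, whereas a formula that first computes `c*c + d*d` would evaluate `0/0` there.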

6584 Commits