eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-27 07:29:52 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	db5691ff2b	Fix some CUDA warnings. Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not running on GPU. `std::hash<float>` is not a device function, so cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define if `EIGEN_HAS_STD_HASH`. Same for `half`. Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability, eliminate warnings about `EIGEN_CUDA_ARCH` not being defined. Replaced a couple C-style casts with `reinterpret_cast` for aligned loading of `half` to `half2`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.	2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen	88d4c6d4c8	Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect implementation.	2021-02-23 23:11:03 +00:00
Adam Shapiro	2ac0b78739	Fixed sparse conservativeResize() when both num cols and rows decreased. The previous implementation caused a buffer overflow trying to calculate non- zero counts for columns that no longer exist.	2021-02-23 21:32:39 +00:00
Chip-Kerchner	10c77b0ff4	Fix compilation errors with later versions of GCC and use of MMA.	2021-02-22 15:01:47 -06:00
Christoph Hertzberg	73922b0174	Fixes Bug #1925 . Packets should be passed by const reference, even to inline functions.	2021-02-20 18:56:42 +01:00
Antonio Sanchez	5f9cfb2529	Add missing adolc isinf/isnan. Also modified cmake/FindAdolc.cmake to eliminate warnings, and added search paths to match install layout. Fixed: #2157	2021-02-19 22:26:56 +00:00
Christoph Hertzberg	ce4af0b38f	Missing change regarding #1910	2021-02-19 20:51:35 +01:00
Christoph Hertzberg	a7749c09bc	Bug #1910 : Make SparseCholesky work for RowMajor matrices	2021-02-19 19:36:18 +01:00
Antonio Sánchez	128eebf05e	Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)." This reverts commit `12fd3dd655`	2021-02-19 17:09:16 +00:00
frgossen	33e0af0130	Return nan at poles of polygamma, digamma, and zeta if limit is not defined	2021-02-19 16:35:11 +00:00
Rasmus Munk Larsen	7f09d3487d	Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp.	2021-02-18 20:49:18 +00:00
Masaki Murooka	12fd3dd655	add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC).	2021-02-17 22:55:47 +00:00
David Tellenbach	aa8b22e776	Bump to 3.4.99	2021-02-17 23:23:17 +01:00
David Tellenbach	5336ad8591	Define internal::make_unsigned for [unsigned]long long on macOS. macOS defines int64_t as long long even for C++03 and therefore expects a template specialization internal::make_unsigned<long long>, for C++03. Since other platforms define int64_t as long for C++03 we cannot add the specialization for all cases.	2021-02-17 23:03:10 +01:00
Antonio Sanchez	0845df7f77	Fix uninitialized warning on AVX.	2021-02-17 13:13:39 -08:00
Chip Kerchner	9b51dc7972	Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product	2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen	be0574e215	New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.	2021-02-17 02:50:32 +00:00
Antonio Sanchez	7ff0b7a980	Updated pfrexp implementation. The original implementation fails for 0, denormals, inf, and NaN. See #2150	2021-02-17 02:23:24 +00:00
David Tellenbach	9ad4096ccb	Document possible inconsistencies when using `Matrix<bool, ...>`	2021-02-17 00:50:26 +01:00
Ashutosh Sharma	f702792a7c	missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel)	2021-02-16 16:33:59 +00:00
Jan van Dijk	db61b8d478	Avoid -Wunused warnings in NDEBUG builds. In two places in SuperLUSupport.h, a local variable 'size' is created that is used only inside an eigen_assert. Remove these, just fetch the required values inside the assert statements. This avoids annoying -Wunused warnings (and -Werror=unused errors) in NDEBUG builds.	2021-02-12 18:35:35 +00:00
David Tellenbach	622c598944	Don't allow all test jobs to fail but only the currently failing ones.	2021-02-12 14:01:17 +01:00
Antonio Sanchez	90ee821c56	Use vrsqrts for rsqrt Newton iterations. It's slightly faster and slightly more accurate, allowing our current packetmath tests to pass for sqrt with a single iteration.	2021-02-11 11:33:51 -08:00
Antonio Sanchez	9fde9cce5d	Adjust bounds for pexp_float/double The original clamping bounds on `_x` actually produce finite values: ``` exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38 exp(709.437) = 1.27226e+308 < 1.79769e+308 ``` so with an accurate `ldexp` implementation, `pexp` fails for large inputs, producing finite values instead of `inf`. This adjusts the bounds slightly outside the finite range so that the output will overflow to +/- `inf` as expected.	2021-02-10 22:48:05 +00:00
Antonio Sanchez	4cb563a01e	Fix ldexp implementations. The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131.	2021-02-10 22:45:41 +00:00
Ashutosh Sharma	7eb07da538	loop less ptranspose	2021-02-10 10:21:37 -08:00
David Tellenbach	36200b7855	Remove vim specific comments to recognoize correct file-type. As discussed in #2143 we remove editor specific comments.	2021-02-09 09:13:09 +01:00
David Tellenbach	54589635ad	Replace nullptr by NULL in SparseLU.h to be C++03 compliant.	2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas	984d010b7b	add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves	2021-02-08 22:00:31 +00:00
Nikolaus Demmel	b578930657	Fix documentation typos in LDLT.h	2021-02-08 21:07:29 +00:00
Antonio Sanchez	66841ea070	Enable bdcsvd on host. Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation is skipped, leading to a linker error. This prevents it from running on the host as well. Seems it was disabled 6 years ago (`5384e891`) to match `jacobiSvd`, but `jacobiSvd` is now enabled on host. Tested and runs fine on host, but will not compile/run for device (though it's not labelled as a device function, so this should be fine). Fixes #2139	2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen	6e3b795f81	Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one.	2021-02-05 16:58:49 -08:00
Antonio Sanchez	abcde69a79	Disable vectorized pow for half/bfloat16. We are potentially seeing some accuracy issues with these. Ideally we would hand off to `float`, but that's not trivial with the current setup. We may want to consider adding `ppow<Packet>` and `HasPow`, so implementations can more easily specialize this.	2021-02-05 12:17:34 -08:00
Antonio Sanchez	f85038b7f3	Fix excessive GEBP register spilling for 32-bit NEON. Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.	2021-02-03 09:01:48 -08:00
Antonio Sanchez	56c8b14d87	Eliminate implicit conversions from float to double.	2021-02-01 15:31:01 -08:00
Antonio Sanchez	fb4548e27b	Implement bit_* for device. Unfortunately `std::bit_and` and the like are host-only functions prior to c++14 (since they are not `constexpr`). They also never exist in the global namespace, so the current implementation always fails to compile via NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global namespace on device. To overcome these limitations, we implement these functionals here.	2021-02-01 13:27:45 -08:00
Antonio Sanchez	1615a27993	Fix altivec packetmath. Allows the altivec packetmath tests to pass. There were a few issues: - `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems - `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead of 0xFFFF) - `pfrexp` needed to set the `exponent` argument. Related to !370, #2128 cc: @ChipKerchner @pdrocaldeira Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build flags to get it to work for little endian.	2021-01-28 18:37:09 +00:00
Chip Kerchner	1414e2212c	Fix clang compilation for AltiVec from previous check-in	2021-01-28 18:36:40 +00:00
David Tellenbach	170a504c2f	Add the following functions DenseBase::setConstant(NoChange_t, Index, const Scalar&) DenseBase::setConstant(Index, NoChange_t, const Scalar&) to close #663.	2021-01-28 15:13:07 +01:00
David Tellenbach	598e1b6e54	Add the following functions: DenseBase::setZero(NoChange_t, Index) DenseBase::setZero(Index, NoChange_t) DenseBase::setOnes(NoChange_t, Index) DenseBase::setOnes(Index, NoChange_t) DenseBase::setRandom(NoChange_t, Index) DenseBase::setRandom(Index, NoChange_t) This closes #663.	2021-01-28 01:10:36 +01:00
Gael Guennebaud	0668c68b03	Allow for negative strides. Note that using a stride of -1 is still not possible because it would clash with the definition of Eigen::Dynamic. This fixes #747.	2021-01-27 23:32:12 +01:00
Samir Benmendil	288d456c29	Replace language_support module with builtin CheckLanguage The workaround_9220 function was introduced a long time ago to workaround a CMake issue with enable_language(OPTIONAL). Since then CMake has clarified that the OPTIONAL keywords has not been implemented[0]. A CheckLanguage module is now provided with CMake to check if a language can be enabled. Use that instead. [0] https://cmake.org/cmake/help/v3.18/command/enable_language.html	2021-01-27 13:26:40 +00:00
Antonio Sanchez	3f4684f87d	Include `<cstdint>` in one place, remove custom typedefs Originating from [this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case), some win32 compilers define `__int32` as a `long`, but MinGW defines `std::int32_t` as an `int`, leading to a type conflict. To avoid this, we remove the custom `typedef` definitions for win32. The Tensor module requires C++11 anyways, so we are guaranteed to have included `<cstdint>` already in `Eigen/Core`. Also re-arranged the headers to only include `<cstdint>` in one place to avoid this type of error again.	2021-01-26 14:23:05 -08:00
Chip Kerchner	0784d9f87b	Fix sqrt, ldexp and frexp compilation errors.	2021-01-25 15:22:19 -06:00
Gmc2	a4edb1079c	fix test of ExtractVolumePatchesOp	2021-01-25 03:23:46 +00:00
Antonio Sanchez	4c42d5ee41	Eliminate implicit conversion warning in test/array_cwise.cpp	2021-01-23 11:54:00 -08:00
Antonio Sanchez	e0d13ead90	Replace std::isnan with numext::isnan for c++03	2021-01-23 11:02:35 -08:00
Florian Maurin	c35965b381	Remove unused variable in SparseLU.h	2021-01-22 22:24:11 +00:00
Antonio Sanchez	f0e46ed5d4	Fix pow and other cwise ops for half/bfloat16. The new `generic_pow` implementation was failing for half/bfloat16 since their construction from int/float is not `constexpr`. Modified in `GenericPacketMathFunctions` to remove `constexpr`. While adding tests for half/bfloat16, found other issues related to implicit conversions. Also needed to implement `numext::arg` for non-integer, non-complex, non-float/double/long double types. These seem to be implicitly converted to `std::complex<T>`, which then fails for half/bfloat16.	2021-01-22 11:10:54 -08:00
Antonio Sanchez	f19bcffee6	Specialize std::complex operators for use on GPU device. NVCC and older versions of clang do not fully support `std::complex` on device, leading to either compile errors (Cannot call `__host__` function) or worse, runtime errors (Illegal instruction). For most functions, we can implement specialized `numext` versions. Here we specialize the standard operators (with the exception of stream operators and member function operators with a scalar that are already specialized in `<complex>`) so they can be used in device code as well. To import these operators into the current scope, use `EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into the `Eigen`, `Eigen:internal`, and `Eigen::numext` namespaces. This allow us to remove specializations of the sum/difference/product/quotient ops, and allow us to treat complex numbers like most other scalars (e.g. in tests).	2021-01-22 18:19:19 +00:00

1 2 3 4 5 ...

11385 Commits