eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	7f09d3487d	Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp.	2021-02-18 20:49:18 +00:00
Masaki Murooka	12fd3dd655	add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC).	2021-02-17 22:55:47 +00:00
David Tellenbach	aa8b22e776	Bump to 3.4.99	2021-02-17 23:23:17 +01:00
David Tellenbach	5336ad8591	Define internal::make_unsigned for [unsigned]long long on macOS. macOS defines int64_t as long long even for C++03 and therefore expects a template specialization internal::make_unsigned<long long>, for C++03. Since other platforms define int64_t as long for C++03 we cannot add the specialization for all cases.	2021-02-17 23:03:10 +01:00
Antonio Sanchez	0845df7f77	Fix uninitialized warning on AVX.	2021-02-17 13:13:39 -08:00
Chip Kerchner	9b51dc7972	Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product	2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen	be0574e215	New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.	2021-02-17 02:50:32 +00:00
Antonio Sanchez	7ff0b7a980	Updated pfrexp implementation. The original implementation fails for 0, denormals, inf, and NaN. See #2150	2021-02-17 02:23:24 +00:00
David Tellenbach	9ad4096ccb	Document possible inconsistencies when using `Matrix<bool, ...>`	2021-02-17 00:50:26 +01:00
Ashutosh Sharma	f702792a7c	Add missing method in packetmath.h: `void ptranspose(PacketBlock<Packet16uc, 4>& kernel)`	2021-02-16 16:33:59 +00:00
Jan van Dijk	db61b8d478	Avoid -Wunused warnings in NDEBUG builds. In two places in SuperLUSupport.h, a local variable 'size' is created that is used only inside an eigen_assert. Remove these, just fetch the required values inside the assert statements. This avoids annoying -Wunused warnings (and -Werror=unused errors) in NDEBUG builds.	2021-02-12 18:35:35 +00:00
David Tellenbach	622c598944	Don't allow all test jobs to fail but only the currently failing ones.	2021-02-12 14:01:17 +01:00
Antonio Sanchez	90ee821c56	Use vrsqrts for rsqrt Newton iterations. It's slightly faster and slightly more accurate, allowing our current packetmath tests to pass for sqrt with a single iteration.	2021-02-11 11:33:51 -08:00
Antonio Sanchez	9fde9cce5d	Adjust bounds for pexp_float/double. The original clamping bounds on `_x` actually produce finite values:
```
exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38
exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large inputs, producing finite values instead of `inf`. This adjusts the bounds slightly outside the finite range so that the output will overflow to +/- `inf` as expected.	2021-02-10 22:48:05 +00:00
Antonio Sanchez	4cb563a01e	Fix ldexp implementations. The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131.	2021-02-10 22:45:41 +00:00
Ashutosh Sharma	7eb07da538	loop less ptranspose	2021-02-10 10:21:37 -08:00
David Tellenbach	36200b7855	Remove vim specific comments to recognize correct file-type. As discussed in #2143 we remove editor specific comments.	2021-02-09 09:13:09 +01:00
David Tellenbach	54589635ad	Replace nullptr by NULL in SparseLU.h to be C++03 compliant.	2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas	984d010b7b	add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves	2021-02-08 22:00:31 +00:00
Nikolaus Demmel	b578930657	Fix documentation typos in LDLT.h	2021-02-08 21:07:29 +00:00
Antonio Sanchez	66841ea070	Enable bdcsvd on host. Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation is skipped, leading to a linker error. This prevents it from running on the host as well. Seems it was disabled 6 years ago (`5384e891`) to match `jacobiSvd`, but `jacobiSvd` is now enabled on host. Tested and runs fine on host, but will not compile/run for device (though it's not labelled as a device function, so this should be fine). Fixes #2139	2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen	6e3b795f81	Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one.	2021-02-05 16:58:49 -08:00
Antonio Sanchez	abcde69a79	Disable vectorized pow for half/bfloat16. We are potentially seeing some accuracy issues with these. Ideally we would hand off to `float`, but that's not trivial with the current setup. We may want to consider adding `ppow<Packet>` and `HasPow`, so implementations can more easily specialize this.	2021-02-05 12:17:34 -08:00
Antonio Sanchez	f85038b7f3	Fix excessive GEBP register spilling for 32-bit NEON. Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.	2021-02-03 09:01:48 -08:00
Antonio Sanchez	56c8b14d87	Eliminate implicit conversions from float to double.	2021-02-01 15:31:01 -08:00
Antonio Sanchez	fb4548e27b	Implement bit_* for device. Unfortunately `std::bit_and` and the like are host-only functions prior to c++14 (since they are not `constexpr`). They also never exist in the global namespace, so the current implementation always fails to compile via NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global namespace on device. To overcome these limitations, we implement these functionals here.	2021-02-01 13:27:45 -08:00
Antonio Sanchez	1615a27993	Fix altivec packetmath. Allows the altivec packetmath tests to pass. There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.
Related to !370, #2128 cc: @ChipKerchner @pdrocaldeira Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build flags to get it to work for little endian.	2021-01-28 18:37:09 +00:00
Chip Kerchner	1414e2212c	Fix clang compilation for AltiVec from previous check-in	2021-01-28 18:36:40 +00:00
David Tellenbach	170a504c2f	Add the following functions DenseBase::setConstant(NoChange_t, Index, const Scalar&) DenseBase::setConstant(Index, NoChange_t, const Scalar&) to close #663.	2021-01-28 15:13:07 +01:00
David Tellenbach	598e1b6e54	Add the following functions: DenseBase::setZero(NoChange_t, Index) DenseBase::setZero(Index, NoChange_t) DenseBase::setOnes(NoChange_t, Index) DenseBase::setOnes(Index, NoChange_t) DenseBase::setRandom(NoChange_t, Index) DenseBase::setRandom(Index, NoChange_t) This closes #663.	2021-01-28 01:10:36 +01:00
Gael Guennebaud	0668c68b03	Allow for negative strides. Note that using a stride of -1 is still not possible because it would clash with the definition of Eigen::Dynamic. This fixes #747.	2021-01-27 23:32:12 +01:00
Samir Benmendil	288d456c29	Replace language_support module with builtin CheckLanguage. The workaround_9220 function was introduced a long time ago to work around a CMake issue with enable_language(OPTIONAL). Since then CMake has clarified that the OPTIONAL keyword has not been implemented [0]. A CheckLanguage module is now provided with CMake to check if a language can be enabled. Use that instead. [0] https://cmake.org/cmake/help/v3.18/command/enable_language.html	2021-01-27 13:26:40 +00:00
Antonio Sanchez	3f4684f87d	Include `<cstdint>` in one place, remove custom typedefs. Originating from [this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case), some win32 compilers define `__int32` as a `long`, but MinGW defines `std::int32_t` as an `int`, leading to a type conflict. To avoid this, we remove the custom `typedef` definitions for win32. The Tensor module requires C++11 anyways, so we are guaranteed to have included `<cstdint>` already in `Eigen/Core`. Also re-arranged the headers to only include `<cstdint>` in one place to avoid this type of error again.	2021-01-26 14:23:05 -08:00
Chip Kerchner	0784d9f87b	Fix sqrt, ldexp and frexp compilation errors.	2021-01-25 15:22:19 -06:00
Gmc2	a4edb1079c	fix test of ExtractVolumePatchesOp	2021-01-25 03:23:46 +00:00
Antonio Sanchez	4c42d5ee41	Eliminate implicit conversion warning in test/array_cwise.cpp	2021-01-23 11:54:00 -08:00
Antonio Sanchez	e0d13ead90	Replace std::isnan with numext::isnan for c++03	2021-01-23 11:02:35 -08:00
Florian Maurin	c35965b381	Remove unused variable in SparseLU.h	2021-01-22 22:24:11 +00:00
Antonio Sanchez	f0e46ed5d4	Fix pow and other cwise ops for half/bfloat16. The new `generic_pow` implementation was failing for half/bfloat16 since their construction from int/float is not `constexpr`. Modified in `GenericPacketMathFunctions` to remove `constexpr`. While adding tests for half/bfloat16, found other issues related to implicit conversions. Also needed to implement `numext::arg` for non-integer, non-complex, non-float/double/long double types. These seem to be implicitly converted to `std::complex<T>`, which then fails for half/bfloat16.	2021-01-22 11:10:54 -08:00
Antonio Sanchez	f19bcffee6	Specialize std::complex operators for use on GPU device. NVCC and older versions of clang do not fully support `std::complex` on device, leading to either compile errors (Cannot call `__host__` function) or worse, runtime errors (Illegal instruction). For most functions, we can implement specialized `numext` versions. Here we specialize the standard operators (with the exception of stream operators and member function operators with a scalar that are already specialized in `<complex>`) so they can be used in device code as well. To import these operators into the current scope, use `EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into the `Eigen`, `Eigen::internal`, and `Eigen::numext` namespaces. This allows us to remove specializations of the sum/difference/product/quotient ops, and allows us to treat complex numbers like most other scalars (e.g. in tests).	2021-01-22 18:19:19 +00:00
David Tellenbach	65e2169c45	Add support for Arm SVE. This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently sizeless. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is length agnostic). During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128` to `2048`, indicating the length of an SVE vector. Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`. This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements. This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.	2021-01-21 21:11:57 +00:00
Antonio Sanchez	b2126fd6b5	Fix pfrexp/pldexp for half. The recent addition of vectorized pow (!330) relies on `pfrexp` and `pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`. Adding tests for these packet ops also exposed an issue with handling negative values in `pfrexp`, returning an incorrect exponent. Added the missing implementations, corrected the exponent in `pfrexp1`, and added `packetmath` tests.	2021-01-21 19:32:28 +00:00
Antonio Sanchez	25d8498f8b	Fix stable_norm_1 test. Test enters an infinite loop if size is 1x1 when choosing to select unique indices for adding `inf` and `NaN` to the input. Here we revert to non-unique indices, and split the `hypotNorm` check into two cases: one where both `inf` and `NaN` are added, and one where only `NaN` is added.	2021-01-21 09:44:42 -08:00
David Tellenbach	660c6b857c	Remove std::cerr in iterative solver since we don't have iostream. This fixes #2123	2021-01-21 11:40:05 +01:00
Antonio Sanchez	d5b7981119	Fix signed-unsigned comparison. Hex literals are interpreted as unsigned, leading to a comparison between signed max supported function `abcd[0]` (which was negative) to the unsigned literal `0x80000006`. Should not change result since signed is implicitly converted to unsigned for the comparison, but eliminates the warning.	2021-01-20 08:34:00 -08:00
Ivan Popivanov	e409795d6b	Proper CPUID	2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen	cdd8fdc32e	Vectorize `pow(x, y)`. This closes https://gitlab.com/libeigen/eigen/-/issues/2085 , which also contains a description of the algorithm. I ran some tests (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}`, and I get the following error statistics:
```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```
If I widen the range to all normal floats I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:
```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```
which seems reasonable, since these results are subnormals with only a couple of significant bits left.	2021-01-18 13:25:16 +00:00
Antonio Sanchez	bde6741641	Improved std::complex sqrt and rsqrt. Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously `complex_sqrt` was only used for CUDA and MSVC), and implements custom `complex_rsqrt`. Also introduces `numext::rsqrt` to simplify implementation, and modified `numext::hypot` to adhere to IEEE/IEC 60559 for special cases. The `complex_sqrt` and `complex_rsqrt` implementations were found to be significantly faster than `std::sqrt<std::complex<T>>` and `1/numext::sqrt<std::complex<T>>`. Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           9.21 ns         9.21 ns     73225448
BM_StdSqrt<std::complex<float>>        17.1 ns         17.1 ns     40966545
BM_Sqrt<std::complex<double>>          8.53 ns         8.53 ns     81111062
BM_StdSqrt<std::complex<double>>       21.5 ns         21.5 ns     32757248
BM_Rsqrt<std::complex<float>>          10.3 ns         10.3 ns     68047474
BM_DivSqrt<std::complex<float>>        16.3 ns         16.3 ns     42770127
BM_Rsqrt<std::complex<double>>         11.3 ns         11.3 ns     61322028
BM_DivSqrt<std::complex<double>>       16.5 ns         16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           7.46 ns         7.45 ns     90742042
BM_StdSqrt<std::complex<float>>        16.6 ns         16.6 ns     42369878
BM_Sqrt<std::complex<double>>          8.49 ns         8.49 ns     81629030
BM_StdSqrt<std::complex<double>>       21.8 ns         21.7 ns     31809588
BM_Rsqrt<std::complex<float>>          8.39 ns         8.39 ns     82933666
BM_DivSqrt<std::complex<float>>        14.4 ns         14.4 ns     48638676
BM_Rsqrt<std::complex<double>>         9.83 ns         9.82 ns     70068956
BM_DivSqrt<std::complex<double>>       15.7 ns         15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           24.2 ns         24.1 ns     28616031
BM_StdSqrt<std::complex<float>>         104 ns          103 ns      6826926
BM_Sqrt<std::complex<double>>          31.8 ns         31.8 ns     22157591
BM_StdSqrt<std::complex<double>>        128 ns          128 ns      5437375
BM_Rsqrt<std::complex<float>>          31.9 ns         31.8 ns     22384383
BM_DivSqrt<std::complex<float>>        99.2 ns         98.9 ns      7250438
BM_Rsqrt<std::complex<double>>         46.0 ns         45.8 ns     15338689
BM_DivSqrt<std::complex<double>>        119 ns          119 ns      5898944
```	2021-01-17 08:50:57 -08:00
Maozhou, Ge	21a8a2487c	fix paddings of TensorVolumePatchOp	2021-01-15 11:51:49 +08:00
Guoqiang QI	38ae5353ab	1) provide a better generic paddsub op implementation; 2) make paddsub op support Packet2cf/Packet4f/Packet2f in NEON; 3) make paddsub op support Packet2cf/Packet4f in SSE	2021-01-13 22:54:03 +00:00
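
The clamping-bound observation in commit 9fde9cce5d above can be checked with the standard library alone. The following is a minimal sketch, not Eigen's implementation: the helper names are hypothetical, and the larger values `89.0` and `710.0` are illustrative choices made here, since the commit message does not state the adjusted bounds.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if exp(bound), evaluated in double precision, does not exceed
// the largest finite float.
inline bool exp_fits_in_float(double bound) {
  return std::exp(bound) <= static_cast<double>(std::numeric_limits<float>::max());
}

// True if exp(bound) is still a finite double.
inline bool exp_fits_in_double(double bound) {
  return std::isfinite(std::exp(bound));
}

// Hypothetical helper: checks the two bounds quoted in the commit message,
// plus two larger illustrative values (assumptions, not taken from Eigen).
inline void check_pexp_bounds() {
  assert(exp_fits_in_float(88.3762626647950));  // quoted float bound: result is finite
  assert(exp_fits_in_double(709.437));          // quoted double bound: result is finite
  assert(!exp_fits_in_float(89.0));             // illustrative: exceeds the float range
  assert(!exp_fits_in_double(710.0));           // illustrative: overflows to +inf
}
```

Calling `check_pexp_bounds()` passes silently if the quoted bounds behave as the commit message describes: clamping the input at those values yields a finite result, so the clamp has to sit slightly beyond the finite range for `pexp` to return `inf`.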

11275 Commits