eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-12 14:25:16 +08:00

Author	SHA1	Message	Date
Jakub Lichman	d87648a6be	Tests added and AVX512 bug fixed for pcmp_lt_or_nan	2021-04-25 20:58:56 +00:00
Jakub Lichman	1115f5462e	Tests for pcmp_lt and pcmp_le added	2021-04-23 19:51:43 +00:00
Antonio Sanchez	d213a0bcea	DenseStorage safely copy/swap. Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Otherwise, this leads to inefficient copying, and potential UB for non-initialized elements.	2021-04-22 18:45:19 +00:00
Antonio Sanchez	69adf26aa3	Modify googlehash use to account for namespace issues. The namespace declaration for googlehash is a configurable macro that can be disabled. In particular, it is disabled within google, causing compile errors since `dense_hash_map`/`sparse_hash_map` are then in the global namespace instead of in `::google`. Here we play a bit of gynastics to allow for both `google::_hash_map` and `_hash_map`, while limiting namespace polution. Symbols within the `::google` namespace are imported into `Eigen::google`. We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be defined.	2021-04-12 19:00:39 -07:00
Christoph Hertzberg	d58678069c	Make iterators default constructible and assignable, by making...	2021-04-09 17:03:28 +00:00
Antonio Sanchez	ace7f132ed	Fix clang tidy warnings in AnnoyingScalar. Clang-tidy complains that full specializations in headers can cause ODR violations. Marked these as `inline` to fix. It also complains about renaming arguments in specializations. Set the argument names to match.	2021-04-05 12:49:38 -07:00
Rasmus Munk Larsen	3ddc0974ce	Fix two bugs in commit	2021-04-02 22:06:27 +00:00
David Tellenbach	ae95b74af9	Add CMake infrastructure for smoke testing Necessary CMake changes to implement pre-merge smoke tests running via CI.	2021-03-31 22:09:00 +00:00
Rasmus Munk Larsen	5bbc9cea93	Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input. Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.	2021-03-31 21:09:19 +00:00
Antonio Sanchez	78ee3d6261	Fix CUDA constexpr issues for numeric_limits. Some CUDA/HIP constants fail on device with `constexpr` since they internally rely on non-constexpr functions, e.g. ``` \#define CUDART_INF_F __int_as_float(0x7f800000) ``` This fails for cuda-clang (though passes with nvcc). These constants are currently used by `device::numeric_limits`. For portability, we need to remove `constexpr` from the affected functions. For C++11 or higher, we should be able to rely on the `std::numeric_limits` versions anyways, since the methods themselves are now `constexpr`, so should be supported on device (clang/hipcc natively, nvcc with `--expr-relaxed-constexpr`).	2021-03-30 18:01:27 +00:00
Chip Kerchner	d59ef212e1	Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).	2021-03-25 11:08:19 +00:00
Antonio Sanchez	5521c65afb	Eliminate mixingtypes_7 warning. `g_called` is not used in subtest 7, so was generating a `-Wunneeded-internal-declaration` warnings. Here we silence it by initializing the static variable.	2021-03-24 11:05:41 -07:00
David Tellenbach	0cc9b5eb40	Split test commainitializer into two substests	2021-03-18 13:28:51 +01:00
Antonio Sanchez	c3fbc6cec7	Use singleton pattern for static registered tests. The original fails with nvcc+msvc - there's a static order of initialization issue leading to registered tests being cleared. The test then fails on ``` VERIFY(EigenTest::all().size()>0); ``` since `EigenTest` no longer contains any tests. The singleton pattern fixes this.	2021-03-18 00:56:31 +00:00
Antonio Sanchez	8dfe1029a5	Augment NumTraits with min/max_exponent() again. Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase where possible. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. The previous MR !443 failed for c++03 due to lack of `constexpr`. Because of this, we need to keep around the `std::numeric_limits` version in enum expressions until the switch to c++11. Fixes #2148	2021-03-16 20:12:46 -07:00
David Tellenbach	df4bc2731c	Revert "Augment NumTraits with min/max_exponent()." This reverts commit `75ce9cd2a7`.	2021-03-17 03:06:08 +01:00
Antonio Sanchez	75ce9cd2a7	Augment NumTraits with min/max_exponent(). Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. Fixes #2148	2021-03-17 01:00:41 +00:00
Rasmus Munk Larsen	2e83cbbba9	Add NaN propagation options to minCoeff/maxCoeff visitors.	2021-03-16 17:02:50 +00:00
Antonio Sanchez	f612df2736	Add fmod(half, half). This is to support TensorFlow's `tf.math.floormod` for half.	2021-03-15 13:32:24 -07:00
Antonio Sanchez	d24f9f9b55	Fix NVCC+ICC issues. NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180	2021-03-15 18:42:04 +00:00
Antonio Sanchez	14487ed14e	Add increment/decrement operators to Eigen::half. This is for consistency with bfloat16, and to support initialization with `std::iota`.	2021-03-15 10:52:23 -07:00
Antonio Sanchez	b271110788	Bump up rand histogram threshold. The previous one sometimes fails for MSVC which has a poor random number generator. Fixes #2182	2021-03-10 22:17:03 -08:00
Antonio Sanchez	543e34ab9d	Re-implement move assignments. The original swap approach leads to potential undefined behavior (reading uninitialized memory) and results in unnecessary copying of data for static storage. Here we pass down the move assignment to the underlying storage. Static storage does a one-way copy, dynamic storage does a swap. Modified the tests to no longer read from the moved-from matrix/tensor, since that can lead to UB. Added a test to ensure we do not access uninitialized memory in a move. Fixes: #2119	2021-03-10 16:55:20 +00:00
Antonio Sanchez	2468253c9a	Define EIGEN_CPLUSPLUS and replace most __cplusplus checks. The macro `__cplusplus` is not defined correctly in MSVC unless building with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170	2021-03-05 18:33:18 +00:00
Antonio Sanchez	82d61af3a4	Fix rint SSE/NEON again, using optimization barrier. This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-05 08:54:12 -08:00
Antonio Sánchez	9a663973b4	Revert "Fix rint for SSE/NEON." This reverts commit `e72dfeb8b9`	2021-03-03 18:51:51 +00:00
Antonio Sanchez	e72dfeb8b9	Fix rint for SSE/NEON. It seems sometimes with aggressive optimizations the combination `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory). Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-03 09:41:46 -08:00
Christoph Hertzberg	199c5f2b47	geo_alignedbox_5 was failing with AVX enabled, due to storing `Vector4d` in a `std::vector` without using an aligned allocator. Got rid of using `std::vector` and simplified the code. Avoid leading `_`	2021-03-01 03:59:21 +01:00
Antonio Sanchez	1e0c7d4f49	Add print for SSE/NEON, use NEON rounding intrinsics if available. In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode. For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8). Related to #1969.	2021-02-27 22:42:07 +00:00
Antonio Sanchez	c65c2b31d4	Make half/bfloat16 constructor take inputs by value, fix powerpc test. Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`. Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down-the-road. Also eliminated some compile warnings for powerpc.	2021-02-27 21:32:06 +00:00
Christoph Hertzberg	ca528593f4	Fixed/masked more implicit copy constructor warnings (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)	2021-02-27 18:44:26 +01:00
Antonio Sanchez	29ebd84cb7	Fix NEON sqrt for 32-bit, add prsqrt. With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit. Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases. Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).	2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen	fe19714f80	Merge branch 'rmlarsen1/eigen-nan_prop'	2021-02-26 09:21:24 -08:00
Antonio Sanchez	e19829c3b0	Fix floor/ceil for NEON fp16. Forgot to test this. Fixes bug introduced in !416.	2021-02-25 20:39:56 -08:00
Antonio Sanchez	5529db7524	Fix SSE/NEON pfloor/pceil for saturated values. The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part. Also added `pceil` for NEON. Fixes #1969.	2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen	5297b7162a	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.	2021-02-25 18:21:21 +00:00
Antonio Sanchez	ecb7b19dfa	Disable new/delete test for HIP	2021-02-25 08:04:05 -08:00
Antonio Sanchez	5908aeeaba	Fix CUDA device new and delete, and add test. HIP does not support new/delete on device, so test is skipped.	2021-02-24 11:31:41 -08:00
Antonio Sanchez	119763cf38	Eliminate CMake FindPackageHandleStandardArgs warnings. CMake complains that the package name does not match when the case differs, e.g.: ``` CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message): The package name passed to `find_package_handle_standard_args` (UMFPACK) does not match the name of the calling package (Umfpack). This can lead to problems in calling code that expects `find_package` result variables (e.g., `_FOUND`) to follow a certain pattern. Call Stack (most recent call first): cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args) bench/spbench/CMakeLists.txt:24 (find_package) This warning is for project developers. Use -Wno-dev to suppress it. ``` Here we rename the libraries to match their true cases.	2021-02-24 09:52:05 +00:00
Adam Shapiro	2ac0b78739	Fixed sparse conservativeResize() when both num cols and rows decreased. The previous implementation caused a buffer overflow trying to calculate non- zero counts for columns that no longer exist.	2021-02-23 21:32:39 +00:00
Christoph Hertzberg	ce4af0b38f	Missing change regarding #1910	2021-02-19 20:51:35 +01:00
Christoph Hertzberg	a7749c09bc	Bug #1910 : Make SparseCholesky work for RowMajor matrices	2021-02-19 19:36:18 +01:00
Rasmus Munk Larsen	be0574e215	New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.	2021-02-17 02:50:32 +00:00
Antonio Sanchez	7ff0b7a980	Updated pfrexp implementation. The original implementation fails for 0, denormals, inf, and NaN. See #2150	2021-02-17 02:23:24 +00:00
Antonio Sanchez	4cb563a01e	Fix ldexp implementations. The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131.	2021-02-10 22:45:41 +00:00
Ralf Hannemann-Tamas	984d010b7b	add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves	2021-02-08 22:00:31 +00:00
Rasmus Munk Larsen	6e3b795f81	Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one.	2021-02-05 16:58:49 -08:00
Gael Guennebaud	0668c68b03	Allow for negative strides. Note that using a stride of -1 is still not possible because it would clash with the definition of Eigen::Dynamic. This fixes #747.	2021-01-27 23:32:12 +01:00
Samir Benmendil	288d456c29	Replace language_support module with builtin CheckLanguage The workaround_9220 function was introduced a long time ago to workaround a CMake issue with enable_language(OPTIONAL). Since then CMake has clarified that the OPTIONAL keywords has not been implemented[0]. A CheckLanguage module is now provided with CMake to check if a language can be enabled. Use that instead. [0] https://cmake.org/cmake/help/v3.18/command/enable_language.html	2021-01-27 13:26:40 +00:00
Antonio Sanchez	4c42d5ee41	Eliminate implicit conversion warning in test/array_cwise.cpp	2021-01-23 11:54:00 -08:00
Antonio Sanchez	e0d13ead90	Replace std::isnan with numext::isnan for c++03	2021-01-23 11:02:35 -08:00
Antonio Sanchez	f0e46ed5d4	Fix pow and other cwise ops for half/bfloat16. The new `generic_pow` implementation was failing for half/bfloat16 since their construction from int/float is not `constexpr`. Modified in `GenericPacketMathFunctions` to remove `constexpr`. While adding tests for half/bfloat16, found other issues related to implicit conversions. Also needed to implement `numext::arg` for non-integer, non-complex, non-float/double/long double types. These seem to be implicitly converted to `std::complex<T>`, which then fails for half/bfloat16.	2021-01-22 11:10:54 -08:00
Antonio Sanchez	f19bcffee6	Specialize std::complex operators for use on GPU device. NVCC and older versions of clang do not fully support `std::complex` on device, leading to either compile errors (Cannot call `__host__` function) or worse, runtime errors (Illegal instruction). For most functions, we can implement specialized `numext` versions. Here we specialize the standard operators (with the exception of stream operators and member function operators with a scalar that are already specialized in `<complex>`) so they can be used in device code as well. To import these operators into the current scope, use `EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into the `Eigen`, `Eigen:internal`, and `Eigen::numext` namespaces. This allow us to remove specializations of the sum/difference/product/quotient ops, and allow us to treat complex numbers like most other scalars (e.g. in tests).	2021-01-22 18:19:19 +00:00
Antonio Sanchez	b2126fd6b5	Fix pfrexp/pldexp for half. The recent addition of vectorized pow (!330) relies on `pfrexp` and `pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`. Adding tests for these packet ops also exposed an issue with handling negative values in `pfrexp`, returning an incorrect exponent. Added the missing implementations, corrected the exponent in `pfrexp1`, and added `packetmath` tests.	2021-01-21 19:32:28 +00:00
Antonio Sanchez	25d8498f8b	Fix stable_norm_1 test. Test enters an infinite loop if size is 1x1 when choosing to select unique indices for adding `inf` and `NaN` to the input. Here we revert to non-unique indices, and split the `hypotNorm` check into two cases: one where both `inf` and `NaN` are added, and one where only `NaN` is added.	2021-01-21 09:44:42 -08:00
Rasmus Munk Larsen	cdd8fdc32e	Vectorize `pow(x, y)`. This closes https://gitlab.com/libeigen/eigen/-/issues/2085 , which also contains a description of the algorithm. I ran some testing (comparing to `std::pow(double(x), double(y)))` for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}` I get the following error statistics: ``` max_rel_error = 8.34405e-07 rms_rel_error = 2.76654e-07 ``` If I widen the range to all normal float I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`: ``` max_rel_error = 0.666667 rms = 6.8727e-05 count = 1335165689 argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45 ``` which seems reasonable, since these results are subnormals with only couple of significant bits left.	2021-01-18 13:25:16 +00:00
Antonio Sanchez	bde6741641	Improved std::complex sqrt and rsqrt. Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously `complex_sqrt` was only used for CUDA and MSVC), and implements custom `complex_rsqrt`. Also introduces `numext::rsqrt` to simplify implementation, and modified `numext::hypot` to adhere to IEEE IEC 6059 for special cases. The `complex_sqrt` and `complex_rsqrt` implementations were found to be significantly faster than `std::sqrt<std::complex<T>>` and `1/numext::sqrt<std::complex<T>>`. Benchmark file attached. ``` GCC 10, Intel Xeon, x86_64: --------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------- BM_Sqrt<std::complex<float>> 9.21 ns 9.21 ns 73225448 BM_StdSqrt<std::complex<float>> 17.1 ns 17.1 ns 40966545 BM_Sqrt<std::complex<double>> 8.53 ns 8.53 ns 81111062 BM_StdSqrt<std::complex<double>> 21.5 ns 21.5 ns 32757248 BM_Rsqrt<std::complex<float>> 10.3 ns 10.3 ns 68047474 BM_DivSqrt<std::complex<float>> 16.3 ns 16.3 ns 42770127 BM_Rsqrt<std::complex<double>> 11.3 ns 11.3 ns 61322028 BM_DivSqrt<std::complex<double>> 16.5 ns 16.5 ns 42200711 Clang 11, Intel Xeon, x86_64: --------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------- BM_Sqrt<std::complex<float>> 7.46 ns 7.45 ns 90742042 BM_StdSqrt<std::complex<float>> 16.6 ns 16.6 ns 42369878 BM_Sqrt<std::complex<double>> 8.49 ns 8.49 ns 81629030 BM_StdSqrt<std::complex<double>> 21.8 ns 21.7 ns 31809588 BM_Rsqrt<std::complex<float>> 8.39 ns 8.39 ns 82933666 BM_DivSqrt<std::complex<float>> 14.4 ns 14.4 ns 48638676 BM_Rsqrt<std::complex<double>> 9.83 ns 9.82 ns 70068956 BM_DivSqrt<std::complex<double>> 15.7 ns 15.7 ns 44487798 Clang 9, Pixel 2, aarch64: --------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------- BM_Sqrt<std::complex<float>> 24.2 ns 24.1 ns 28616031 BM_StdSqrt<std::complex<float>> 104 ns 103 ns 6826926 BM_Sqrt<std::complex<double>> 31.8 ns 31.8 ns 22157591 BM_StdSqrt<std::complex<double>> 128 ns 128 ns 5437375 BM_Rsqrt<std::complex<float>> 31.9 ns 31.8 ns 22384383 BM_DivSqrt<std::complex<float>> 99.2 ns 98.9 ns 7250438 BM_Rsqrt<std::complex<double>> 46.0 ns 45.8 ns 15338689 BM_DivSqrt<std::complex<double>> 119 ns 119 ns 5898944 ```	2021-01-17 08:50:57 -08:00
Antonio Sanchez	f149e0ebc3	Fix MSVC complex sqrt and packetmath test. MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`. Here we replace it with a custom version (currently used on GPU). Also fixed the `packetmath` test, which previously skipped several corner cases since `CHECK_CWISE1` only tests the first `PacketSize` elements.	2021-01-08 01:17:19 +00:00
Antonio Sanchez	8d9cfba799	Fix rand test for MSVC. MSVC's uniform random number generator is not quite as uniform as others, requiring a slightly wider threshold on the histogram test. After inspecting histograms for several runs, there's no obvious bias -- just some bins end up having slightly more less elements (often > 2% but less than 2.5%).	2021-01-07 12:48:40 -08:00
Essex Edwards	e741b43668	Make Transform::computeRotationScaling(0,&S) continuous	2021-01-07 17:45:14 +00:00
Antonio Sanchez	bb1de9dbde	Fix Ref Stride checks. The existing `Ref` class failed to consider cases where the Ref's `Stride` setting could match the underlying referred object's stride, but didn't at runtime. This led to trying to set invalid stride values, causing runtime failures in some cases, and garbage due to mismatched strides in others. Here we add the missing runtime checks. This involves computing the strides necessary to align with the referred object's storage, and verifying we can actually set those strides at runtime. In the `const` case, if it may be possible to refer to the original storage at compile-time but fails at runtime, then we defer to the `construct(...)` method that makes a copy. Added more tests to check these cases. Fixes #2093.	2021-01-05 10:41:25 -08:00
Christoph Hertzberg	12dda34b15	Eliminate boolean product warnings by factoring out a `combine_scalar_factors` helper function.	2021-01-05 18:15:30 +00:00
Antonio Sanchez	070d303d56	Add CUDA complex sqrt. This is to support scalar `sqrt` of complex numbers `std::complex<T>` on device, requested by Tensorflow folks. Technically `std::complex` is not supported by NVCC on device (though it is by clang), so the default `sqrt(std::complex<T>)` function only works on the host. Here we create an overload to add back the functionality. Also modified the CMake file to add `--relaxed-constexpr` (or equivalent) flag for NVCC to allow calling constexpr functions from device functions, and added support for specifying compute architecture for NVCC (was already available for clang).	2020-12-22 23:25:23 -08:00
Antonio Sanchez	c6efc4e0ba	Replace M_LOG2E and M_LN2 with custom macros. For these to exist we would need to define `_USE_MATH_DEFINES` before `cmath` or `math.h` is first included. However, we don't control the include order for projects outside Eigen, so even defining the macro in `Eigen/Core` does not fix the issue for projects that end up including `<cmath>` before Eigen does (explicitly or transitively). To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.	2020-12-11 14:34:31 -08:00
Rasmus Munk Larsen	125cc9a5df	Implement vectorized complex square root. Closes #1905 Measured speedup for sqrt of `complex<float>` on Skylake: SSE: ``` name old time/op new time/op delta BM_eigen_sqrt_ctype/1 49.4ns ± 0% 54.3ns ± 0% +10.01% BM_eigen_sqrt_ctype/8 332ns ± 0% 50ns ± 1% -84.97% BM_eigen_sqrt_ctype/64 2.81µs ± 1% 0.38µs ± 0% -86.49% BM_eigen_sqrt_ctype/512 23.8µs ± 0% 3.0µs ± 0% -87.32% BM_eigen_sqrt_ctype/4k 202µs ± 0% 24µs ± 2% -88.03% BM_eigen_sqrt_ctype/32k 1.63ms ± 0% 0.19ms ± 0% -88.18% BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 1.5ms ± 1% -88.20% BM_eigen_sqrt_ctype/1M 52.1ms ± 0% 6.2ms ± 0% -88.18% ``` AVX2: ``` name old cpu/op new cpu/op delta BM_eigen_sqrt_ctype/1 53.6ns ± 0% 55.6ns ± 0% +3.71% BM_eigen_sqrt_ctype/8 334ns ± 0% 27ns ± 0% -91.86% BM_eigen_sqrt_ctype/64 2.79µs ± 0% 0.22µs ± 2% -92.28% BM_eigen_sqrt_ctype/512 23.8µs ± 1% 1.7µs ± 1% -92.81% BM_eigen_sqrt_ctype/4k 201µs ± 0% 14µs ± 1% -93.24% BM_eigen_sqrt_ctype/32k 1.62ms ± 0% 0.11ms ± 1% -93.29% BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 0.9ms ± 1% -93.31% BM_eigen_sqrt_ctype/1M 52.0ms ± 0% 3.5ms ± 1% -93.31% ``` AVX512: ``` name old cpu/op new cpu/op delta BM_eigen_sqrt_ctype/1 53.7ns ± 0% 56.2ns ± 1% +4.75% BM_eigen_sqrt_ctype/8 334ns ± 0% 18ns ± 2% -94.63% BM_eigen_sqrt_ctype/64 2.79µs ± 0% 0.12µs ± 1% -95.54% BM_eigen_sqrt_ctype/512 23.9µs ± 1% 1.0µs ± 1% -95.89% BM_eigen_sqrt_ctype/4k 202µs ± 0% 8µs ± 1% -96.13% BM_eigen_sqrt_ctype/32k 1.63ms ± 0% 0.06ms ± 1% -96.15% BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 0.5ms ± 4% -96.11% BM_eigen_sqrt_ctype/1M 52.1ms ± 0% 2.0ms ± 1% -96.13% ```	2020-12-08 18:13:35 -08:00
Rasmus Munk Larsen	f9fac1d5b0	Add log2() to Eigen.	2020-12-04 21:45:09 +00:00
Rasmus Munk Larsen	f23dc5b971	Revert "Add log2() operator to Eigen" This reverts commit `4d91519a9b`.	2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen	4d91519a9b	Add log2() operator to Eigen	2020-12-03 22:31:44 +00:00
Antonio Sanchez	eb4d4ae070	Include chrono in main for c++11. Hack to fix tensor tests, since min/max are overridden by `main.h`.	2020-12-03 11:27:32 -08:00
Antonio Sanchez	89f90b585d	AVX512 missing ops. This allows the `packetmath` tests to pass for AVX512 on skylake. Made `half` and `bfloat16` consistent in terms of ops they support. Note the `log` tests are currently disabled for `bfloat16` since they fail due to poor precision (they were previously disabled for `Packet8bf` via test function specialization -- I just removed that specialization and disabled it in the generic test).	2020-11-30 16:28:57 +00:00
Bowie Owens	9842366bba	Make inclusion of doc sub-directory optional by adjusting options. Allows exclusion of doc and related targets to help when using eigen via add_subdirectory(). Requested by: https://gitlab.com/libeigen/eigen/-/issues/1842 Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.	2020-11-27 08:11:49 +11:00
Rasmus Munk Larsen	79818216ed	Revert "Fix Half NaN definition and test." This reverts commit `c770746d70`.	2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen	c770746d70	Fix Half NaN definition and test. The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted). There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition. Also modified the `bfloat16_float` test to match. Tested with `cortex-a53` and `cortex-a55`.	2020-11-24 20:53:07 +00:00
Antonio Sanchez	a3b300f1af	Implement missing AVX half ops. Minimal implementation of AVX `Eigen::half` ops to bring in line with `bfloat16`. Allows `packetmath_13` to pass. Also adjusted `bfloat16` packet traits to match the supported set of ops (e.g. Bessel is not actually implemented).	2020-11-24 16:46:41 +00:00
Antonio Sanchez	38abf2be42	Fix Half NaN definition and test. The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted). There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition. Also modified the `bfloat16_float` test to match. Tested with `cortex-a53` and `cortex-a55`.	2020-11-23 14:13:59 -08:00
Antonio Sanchez	4cf01d2cf5	Update AVX half packets, disable test. The AVX half implementation is incomplete, causing the `packetmath_13` test to fail. This disables the test. Also refactored the existing AVX implementation to use `bit_cast` instead of direct access to `.x`.	2020-11-21 09:05:10 -08:00
Antonio Sanchez	a8fdcae55d	Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix. Multiplication of column-major `DynamicSparseMatrix`es involves three temporaries: - two for transposing twice to sort the coefficients (`ConservativeSparseSparseProduct.h`, L160-161) - one for a final copy assignment (`SparseAssign.h`, L108) The latter is avoided in an optimization for `SparseMatrix`. Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not worth the effort to optimize further, so I simply disabled counting temporaries via a macro. Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra` tests actually re-run all the original `sparse_product` tests as well. We may want to simply drop the `DynamicSparseMatrix` tests altogether, which would eliminate the test duplication. Related to #2048	2020-11-18 23:15:33 +00:00
David Tellenbach	11e4056f6b	Re-enable Arm Neon Eigen::half packets of size 8 - Add predux_half_dowto4 - Remove explicit casts in Half.h to match the behaviour of BFloat16.h - Enable more packetmath tests for Eigen::half	2020-11-18 23:02:21 +00:00
Antonio Sanchez	17268b155d	Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom The existing `TensorRandom.h` implementation makes the assumption that `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not always true. This currently fails on arm64, where `x` has type `__fp16`. Added `bit_cast` specializations to allow casting to/from `uint16_t` for both `half` and `bfloat16`. Also added tests in `half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the future.	2020-11-18 20:32:35 +00:00
Antonio Sanchez	41d5d5334b	Initialize primitives to fix -Wuninitialized-const-reference. The `meta` test generates warnings with the latest version of clang due to passing uninitialized variables as const reference arguments. ``` test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference] VERIFY(( check_is_convertible(a.dot(b), f) )); ``` We don't actually use the variables, but initializing them eliminates the new warning. Fixes #2067.	2020-11-18 20:23:20 +00:00
Antonio Sanchez	8e9cc5b10a	Eliminate double-promotion warnings. Clang currently complains about implicit conversions, e.g. ``` test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion] data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2)); ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion] data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2)); ``` Modified to explicitly cast to double.	2020-11-16 10:39:09 -08:00
Antonio Sanchez	bb69a8db5d	Explicit casts of S -> std::complex<T> When calling `internal::cast<S, std::complex<T>>(x)`, clang often generates an implicit conversion warning due to an implicit cast from type `S` to `T`. This currently affects the following tests: - `basicstuff` - `bfloat16_float` - `cxx11_tensor_casts` The implicit cast leads to widening/narrowing float conversions. Widening warnings only seem to be generated by clang (`-Wdouble-promotion`). To eliminate the warning, we explicitly cast the real-component first from `S` to `T`. We also adjust tests to use `internal::cast` instead of `static_cast` when a complex type may be involved.	2020-11-14 05:50:42 +00:00
Christoph Hertzberg	90f6d9d23e	Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace.	2020-11-13 16:21:53 +01:00
Everton Constantino	348a48682e	Fix erroneous forward declaration of boost nvp.	2020-11-10 13:07:34 -03:00
Deven Desai	9d11e2c03e	CMakefile update for ROCm 4.0 Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.	2020-10-29 18:06:31 +00:00
David Tellenbach	e265f7ed8e	Add support for Armv8.2-a __fp16 Armv8.2-a provides a native half-precision floating point (__fp16 aka. float16_t). This patch introduces * __fp16 as underlying type of Eigen::half if this type is available * the packet types Packet4hf and Packet8hf representing float16x4_t and float16x8_t respectively * packet-math for the above packets with corresponding scalar type Eigen::half The packet-math functionality has been implemented by Ashutosh Sharma <ashutosh.sharma@amperecomputing.com>. This closes #1940.	2020-10-28 20:15:09 +00:00
Rasmus Munk Larsen	c6953f799b	Add packet generic ops `predux_fmin`, `predux_fmin_nan`, `predux_fmax`, and `predux_fmax_nan` that implement reductions with `PropagateNaN`, and `PropagateNumbers` semantics. Add (slow) generic implementations for most reductions.	2020-10-13 21:48:31 +00:00
Rasmus Munk Larsen	4e4d3f32d1	Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512.	2020-10-09 20:05:49 +00:00
David Tellenbach	7a8d3d5b81	Disable test exceptions when using OpenMP.	2020-10-09 17:49:07 +02:00
Rasmus Munk Larsen	b431024404	Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms. Change test to only test for NaN-propagation for pfmin/pfmax.	2020-10-07 19:05:18 +00:00
Rasmus Munk Larsen	3b445d9bf2	Add a generic packet ops corresponding to {std}::fmin and {std}::fmax. The non-sensical NaN-propagation rules for std::min std::max implemented by pmin and pmax in Eigen is a longstanding source og confusion and bug report. This change is a first step towards addressing it, as discussing in issue #564 .	2020-10-01 16:54:31 +00:00
Antonio Sanchez	d5a0d89491	Fix alignedbox 32-bit precision test failure. The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors. In particular, the following is not guaranteed to hold: ``` IsometryTransform identity = IsometryTransform::Identity(); BoxType transformedC; transformedC.extend(c.transformed(identity)); VERIFY(transformedC.contains(c)); ``` since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`. Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly. Otherwise, invalid combinations of modes would also incorrectly pass the assertion.	2020-09-30 08:42:03 -07:00
Martin Pecka	6425e875a1	Added AlignedBox::transform(AffineTransform).	2020-09-28 18:06:23 +00:00
David Tellenbach	493a7c773c	Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>>	2020-09-21 12:43:41 +02:00
Rasmus Munk Larsen	e55182ac09	Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr. Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.	2020-09-18 17:38:58 +00:00
Tim Shen	bb56a62582	Make bfloat16(float(-nan)) produce -nan, not nan.	2020-09-15 13:24:23 -07:00
Pedro Caldeira	35d149e34c	Add missing functions for Packet8bf in Altivec architecture. Including new tests for bfloat16 Packets. Fix prsqrt on GenericPacketMath.	2020-09-08 09:22:11 -05:00
Everton Constantino	6fe88a3c9d	MatrixProuct enhancements: - Changes to Altivec/MatrixProduct Adapting code to gcc 10. Generic code style and performance enhancements. Adding PanelMode support. Adding stride/offset support. Enabling float64, std::complex and std::complex. Fixing lack of symm_pack. Enabling mixedtypes. - Adding std::complex tests to blasutil. - Adding an implementation of storePacketBlock when Incr!= 1.	2020-09-02 18:21:36 -03:00
Gael Guennebaud	25424d91f6	Fix #1974 : assertion when reserving an empty sparse matrix	2020-08-26 12:32:20 +02:00
Deven Desai	603e213d13	Fixing a CUDA / P100 regression introduced by PR 181 PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified. That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only	2020-08-20 00:29:57 +00:00

1 2 3 4 5 ...

2524 Commits