Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.
Changing `half` to take inputs by value fixes this. Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down the road.
Also eliminated some compile warnings for powerpc.
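To illustrate the underlying issue, a minimal standalone sketch (not Eigen code; `float` stands in for `half` here):
```
#include <iostream>
#include <limits>

void by_ref(const int& e) { std::cout << e << "\n"; }
void by_val(int e)        { std::cout << e << "\n"; }

int main() {
  // Pre-C++17, the line below ODR-uses the static constant and can fail to
  // link if no out-of-class definition is available:
  //   by_ref(std::numeric_limits<float>::max_exponent);

  // Either workaround avoids the ODR-use:
  by_val(std::numeric_limits<float>::max_exponent);       // take by value
  by_ref(int(std::numeric_limits<float>::max_exponent));  // materialize a temporary
  return 0;
}
```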
With !406, we accidentally broke arm 32-bit NEON builds, since
`vsqrt_f32` is only available for 64-bit.
Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.
Note that several of the 32-bit NEON packet tests are currently
failing - either due to denormal handling (NEON versions flush
to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
The original implementations will saturate if the input does not fit into an
integer type. Here we fix this by returning the input unchanged when it is too
large to have a fractional part.
Also added `pceil` for NEON.
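Roughly, the scalar version of the fix looks like this (a sketch only, not the actual packet code; the `2^23` threshold assumes single-precision `float`):
```
#include <cmath>
#include <cstdint>

float floor_without_saturation(float x) {
  // Any float with magnitude >= 2^23 has no fractional bits, so there is
  // nothing to round; returning it unchanged avoids the int conversion that
  // used to saturate. NaN and inf also take this path.
  const float kNoFractionLimit = 8388608.0f;  // 2^23
  if (!(std::fabs(x) < kNoFractionLimit)) return x;
  float r = static_cast<float>(static_cast<int32_t>(x));  // truncate toward zero
  return (r > x) ? r - 1.0f : r;                          // adjust negatives down
}
```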
Fixes #1969.
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.
Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre C++17.
Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.
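A rough sketch of such a wrapper with a pre-C++17 fallback (names and layout are assumptions, not Eigen's exact code):
```
#include <type_traits>

namespace sketch {
#if __cplusplus >= 201703L
template <typename F, typename... Args>
struct invoke_result : std::invoke_result<F, Args...> {};
#else
template <typename F, typename... Args>
struct invoke_result { typedef typename std::result_of<F(Args...)>::type type; };
#endif
}  // namespace sketch

// Example: the result type of calling this functor with an int is double.
struct Square { double operator()(int x) const { return double(x) * x; } };
static_assert(std::is_same<sketch::invoke_result<Square, int>::type, double>::value,
              "Square(int) yields double");
```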
For completeness, added nullary op support for C++03.
Fixes #1850.
Added an `EIGEN_HAS_STD_HASH` macro that checks for C++11 support and that we
are not compiling for GPU.
`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.
Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability and to
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang. Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
macOS defines `int64_t` as `long long` even for C++03, and therefore expects
the template specialization `internal::make_unsigned<long long>` in that case.
Since other platforms define `int64_t` as `long` for C++03, we cannot add the
specialization unconditionally.
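The shape of the fix is roughly as follows (a hedged sketch; the guard macro and placement are assumptions, and Eigen's internal `make_unsigned` primary template is assumed to be in scope):
```
// Only Apple/C++03 builds see int64_t as long long, so only they get the
// extra specialization of the internal make_unsigned helper.
#if defined(__APPLE__) && __cplusplus < 201103L
namespace Eigen { namespace internal {
template <> struct make_unsigned<long long> { typedef unsigned long long type; };
}}  // namespace Eigen::internal
#endif
```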
In two places in `SuperLUSupport.h`, a local variable `size` is created that
is used only inside an `eigen_assert`. Remove these locals and fetch the
required values directly inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
The original clamping bounds on `_x` actually produce finite values:
```
exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38
exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.
This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits. See #2131 for a complete discussion,
and !375 for other possible implementations.
Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.
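A scalar sketch of the 4-factor idea (illustrative only; the packet implementation builds each power of two by bit manipulation rather than calling `std::ldexp`, and the clamp bound here assumes `double`):
```
#include <cmath>

double ldexp_4factor(double a, int e) {
  // Beyond |e| = 2099 the correctly rounded result for any finite nonzero
  // double is already 0 or +/-inf, so clamping keeps every factor finite.
  const int max_e = 2099;
  if (e > max_e) e = max_e;
  if (e < -max_e) e = -max_e;
  // Split e into four parts that each fit comfortably in the exponent range,
  // then apply the factors one by one; intermediate overflow/underflow then
  // saturates to inf/0 in the same cases as the true result (up to possible
  // last-ulp double rounding for subnormal outputs).
  const int q = e / 4, r = e - 3 * q;
  const double f = std::ldexp(1.0, q);
  return ((a * f) * f) * f * std::ldexp(1.0, r);
}
```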
The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.
A remaining TODO is to delegate to a faster version when we know the exponent
does fit within the exponent bits.
Fixes #2131.
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error. This prevents it from running on
the host as well.
Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host. Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).
Fixes #2139.
We are potentially seeing some accuracy issues with these. Ideally we
would hand off to `float`, but that's not trivial with the current
setup.
We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.
By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).
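A hedged sketch of the general trick (not the actual `gebp_traits` specialization):
```
#include <arm_neon.h>

// Fused multiply-accumulate with an empty volatile asm that ties the
// accumulator to a NEON register ("w" constraint) and acts as a scheduling
// barrier, which discourages clang from spilling it to the stack.
inline float32x4_t madd_no_spill(float32x4_t acc, float32x4_t a, float32x4_t b) {
  acc = vmlaq_f32(acc, a, b);   // acc += a * b
#if defined(__clang__) && defined(__ARM_NEON) && !defined(__aarch64__)
  asm volatile("" : "+w"(acc));
#endif
  return acc;
}
```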
This is a replacement of !379. See there for further discussion.
Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.
Fixes #2138.
Unfortunately `std::bit_and` and the like are host-only functions prior
to c++14 (since they are not `constexpr`). They also never exist in the
global namespace, so the current implementation always fails to compile with
NVCC, since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.
To overcome these limitations, we implement these functionals here.
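A minimal sketch of such a functional (assumed form; the real definitions live in Eigen's internal namespace and cover `bit_or`, `bit_xor`, and `bit_not` as well):
```
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

template <typename T>
struct bit_and_sketch {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const {
    return a & b;  // usable in device code, unlike std::bit_and before C++14
  }
};
```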
Allows the altivec packetmath tests to pass. There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.
Related to !370, #2128
cc: @ChipKerchner @pdrocaldeira
Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build
flags to get it to work for little endian.
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.
To avoid this, we remove the custom `typedef` definitions for win32. The
Tensor module requires C++11 anyway, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.
Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified the
implementation in `GenericPacketMathFunctions` to remove `constexpr`.
While adding tests for half/bfloat16, found other issues related to
implicit conversions.
Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types. These seem to be implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction). For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.
To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen::internal`, and `Eigen::numext` namespaces.
This allows us to remove the specializations of the
sum/difference/product/quotient ops, and to treat complex
numbers like most other scalars (e.g. in tests).
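A heavily simplified sketch of the approach for one operator and one scalar type (assumed form; the real header generates all arithmetic and comparison operators per scalar type via macros, and device-side `real()`/`imag()` access relies on relaxed constexpr support):
```
#include <complex>

#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

namespace sketch {
// A non-template overload for a concrete scalar type is preferred over the
// std:: template during overload resolution, so `a + b` in device code picks
// this device-safe version once it is brought into scope.
SKETCH_DEVICE_FUNC inline std::complex<float> operator+(const std::complex<float>& a,
                                                        const std::complex<float>& b) {
  return std::complex<float>(a.real() + b.real(), a.imag() + b.imag());
}
}  // namespace sketch
```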
This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is *length agnostic*).
During compilation the flag `-msve-vector-bits=N` has to be set, where `N` is a power of two in the range of `128` to `2048`, indicating the length of an SVE vector.
Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.
This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.
This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.
Added the missing implementations, corrected the exponent in `pfrexp`,
and added `packetmath` tests.
Hex literals are interpreted as unsigned, leading to a comparison between the
signed maximum supported function ID `abcd[0]` (which was negative) and the
unsigned literal `0x80000006`. This should not change the result, since signed
is implicitly converted to unsigned for the comparison, but it eliminates the
warning.
I ran some testing (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]` and `y` in `{2, sqrt(2), -sqrt(2)}`, and I get the following error statistics:
```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```
If I widen the range to all normal floats I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:
```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```
which seems reasonable, since these results are subnormals with only a couple of significant bits left.
Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`,
causing a compile error here. We don't need the local anyway since it's
only used in one spot.
Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD`
for small matrices (twice as fast for 2x2, 20% faster for 3x3,
1% faster for 10x10). Since the majority of cases will be small,
let's stick with `JacobiSVD`. See !361.
In the previous code, in attempting to correct for a negative
determinant, we end up multiplying and dividing by a number that
is often very near, but not exactly +/-1. By flushing to +/-1,
we can replace a division with a multiplication, and results
are more numerically consistent.
The following commit breaks ROCm support for Eigen
f149e0ebc3
All unit tests fail with the following error
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:166:
/home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt'
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
^
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here
template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x);
^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
Error generating file
/home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o
test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```
The error message is accurate, and the fix (provided in this commit) is trivial.
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).
Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
Since `eigen_assert` is a macro, the statements can become noops (e.g.
when compiling for GPU), so they may not execute the contained logic -- which
in this case is the entire `Ref` construction. We need to separate the assert
from statements which have consequences.
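For illustration only (plain `assert` in place of `eigen_assert`; names are made up):
```
#include <cassert>

int global_value = 0;
bool construct(int v) { global_value = v; return true; }  // work with a side effect

void broken() { assert(construct(42)); }  // whole call vanishes under NDEBUG

void fixed() {
  const bool ok = construct(42);          // the construction always runs
  assert(ok);
  (void)ok;                               // silence -Wunused in NDEBUG builds
}
```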
Fixes #2113.
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime. This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.
Here we add the missing runtime checks. This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.
In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.
Added more tests to check these cases.
Fixes #2093.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.
Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.
Also modified the CMake file to add the `--expt-relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying the compute architecture
for NVCC (this was already available for clang).
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included. However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).
To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
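The defines are, in spirit (digits shown here are a correct truncation, though the exact precision used may differ):
```
// log2(e) and ln(2), provided unconditionally so no include-order or
// _USE_MATH_DEFINES dance is needed.
#ifndef EIGEN_LOG2E
#define EIGEN_LOG2E 1.442695040888963407359924681001892137
#endif
#ifndef EIGEN_LN2
#define EIGEN_LN2 0.693147180559945309417232121458176568
#endif
```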
The following commit introduced a breakage in ROCm/HIP support for Eigen.
5ec4907434
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
return half2half2(from);
^~~~~~~~~~
__half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
__half2 __half2half2(__half x)
^
1 error generated when compiling for gfx900.
```
The cause seems to be a copy-paste error, and the fix is trivial.
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also tried to simplify the `#ifdef` guards to make them more clear.
In the current `dense_assignment_loop` implementations, if the
destination's inner or outer size is zero at compile time and if the kernel
involves a product, we currently get a compile error (#2080). This is
triggered by attempting to multiply a non-existent row by a column (or
vice-versa).
To address this, we add a specialization for zero-sized assignments
(`AllAtOnceTraversal`) which evaluates to a no-op. We also add a static
check to ensure the size is in fact zero. This now seems to be the only
existing use of `AllAtOnceTraversal`.
Fixes #2080.
Removed redundant checks and redundant code for CUDA/HIP.
Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.
Fixes #1847.
Current implementations fail to consider half-float packets, only
half-float scalars. Added specializations for packets on AVX, AVX512 and
NEON. Added tests to `special_packetmath`.
The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so the Bessel and `ndtri` special
functions have been disabled.
Tested with AVX, AVX512.
The `shfl*` functions are `__device__` only; the `#ifdef`s were adjusted so
that our versions are defined whenever the corresponding CUDA/HIP ones are.
Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.
Fixes #2083.
Adding the term e*ln(2) is split into two steps for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation
log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)
which must be added in reverse order since |x| < (sqrt(2)-1).
This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
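In terms of Eigen's packet primitives the rearranged tail looks roughly like this (a sketch; variable names are assumptions, and the polynomial term `y = x^3*P(x)/Q(x)` is computed earlier in the routine):
```
#include <Eigen/Core>

// x: reduced argument (mantissa - 1), x2 = x*x, y = x^3*P(x)/Q(x), e: exponent.
// Result: e*ln(2) + x + y - 0.5*x^2, folded into 2 pmadd + 1 padd.
template <typename Packet>
Packet plog_tail_sketch(const Packet& x, const Packet& x2,
                        const Packet& y, const Packet& e) {
  using namespace Eigen::internal;
  const Packet cst_neg_half = pset1<Packet>(-0.5f);
  const Packet cst_ln2      = pset1<Packet>(0.693147180559945f);
  Packet r = pmadd(cst_neg_half, x2, y);  // y - 0.5*x^2
  r = padd(x, r);                         // ... + x
  return pmadd(e, cst_ln2, r);            // ... + e*ln(2)
}
```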
The current impl corrupts the comparison masks when converting
from float back to bfloat16. The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`). This was
causing `packetmath_15` to fail on arm.
Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
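A scalar sketch of the idea (the real `F32MaskToBf16Mask` operates on NEON packets):
```
#include <cstdint>
#include <cstring>

// A float comparison mask lane is all-zeros or all-ones, so its low 16 bits
// are already a valid bfloat16 mask. A value conversion from float to
// bfloat16 would instead mangle the bit pattern.
uint16_t f32_mask_to_bf16_mask(float mask_lane) {
  uint32_t bits;
  std::memcpy(&bits, &mask_lane, sizeof(bits));   // reinterpret, don't convert
  return static_cast<uint16_t>(bits & 0xFFFFu);
}
```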
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`. This shouldn't affect anything else, since
it is only used in `bit_cast<uint16_t,half>()`, which is not itself
`constexpr`.
Fixes #2077.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.
Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.
Also modified the `bfloat16_float` test to match.
Tested with `cortex-a53` and `cortex-a55`.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```
Details:
- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).
- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
  specializations (`-Wint-in-bool-context`).
- Deprecated the above specialized ops for bool.
- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`. Allows `packetmath_13` to pass.
Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail. This disables the test.
Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
A missing `inline` breaks the blas build, since the symbol gets generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, and `double.cpp`.
Changed the rest of the inlines to `EIGEN_STRONG_INLINE`.
Both `Eigen::half` and `Eigen::bfloat16` are implicitly convertible to
`float` and can hence be converted to `bool` via the conversion chain
`Eigen::{half,bfloat16} -> float -> bool`.
We thus remove the explicit cast operator to `bool`.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.
Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.
Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.
We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.
Related to #2048
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`. Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
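The underlying idea is a memcpy-based reinterpretation, roughly as follows (a sketch; Eigen's actual `numext::bit_cast` and its specializations differ in detail):
```
#include <cstdint>
#include <cstring>

// Works regardless of whether the 16-bit storage is uint16_t or __fp16,
// because only the bit pattern is copied, never value-converted.
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}
```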
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`. This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`
The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).
To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`. We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
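The pattern of the fix is roughly the following (a simplified sketch of the idea, not the exact `cast_impl` code):
```
#include <complex>

// Cast the real component explicitly from S to T before constructing the
// complex value, so any narrowing/widening is intentional and warning-free.
template <typename T, typename S>
std::complex<T> cast_to_complex(const S& x) {
  return std::complex<T>(static_cast<T>(x), T(0));
}
```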