eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Antonio Sanchez	e2f21465fe	Special function implementations for half/bfloat16 packets. Current implementations fail to consider half-float packets, only half-float scalars. Added specializations for packets on AVX, AVX512 and NEON. Added tests to `special_packetmath`. The current `special_functions` tests would fail for half and bfloat16 due to lack of precision. The NEON tests also fail with precision issues and due to different handling of `sqrt(inf)`, so special functions bessel, ndtri have been disabled. Tested with AVX, AVX512.	2020-12-04 10:16:29 -08:00
David Tellenbach	305b8bd277	Remove duplicate #if clause	2020-12-04 18:55:46 +01:00
Antonio Sanchez	9ee9ac81de	Fix shfl* macros for CUDA/HIP The `shfl*` functions are `__device__` only, and adjusted `#ifdef`s so they are defined whenever the corresponding CUDA/HIP ones are. Also changed the HIP/CUDA<9.0 versions to cast to int instead of doing the conversion `half`<->`float`. Fixes #2083	2020-12-04 17:18:32 +00:00
shrek1402	a9a2f2bebf	The function 'prefetch' did not work correctly on the win64 platform	2020-12-04 17:18:08 +00:00
Rasmus Munk Larsen	f23dc5b971	Revert "Add log2() operator to Eigen" This reverts commit `4d91519a9b`.	2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen	4d91519a9b	Add log2() operator to Eigen	2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen	25d8ae7465	Small cleanup of generic plog implementations: Adding the term e*ln(2) is split into two steps for no obvious reason. This dates back to the original Cephes code from which the algorithm is adapted. It appears that this was done in Cephes to prevent the compiler from reordering the addition of the 3 terms in the approximation log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x) which must be added in reverse order since |x| < (sqrt(2)-1). This allows rewriting the code to just 2 pmadd and 1 padd instructions, which on a Skylake processor speeds up the code by 5-7%.	2020-12-03 19:40:40 +00:00
Antonio Sanchez	70fbcf82ed	Fix typo in `F32MaskToBf16Mask`.	2020-12-02 07:58:34 -08:00
Antonio Sanchez	2627e2f2e6	Fix neon cmp* functions for bf16. The current impl corrupts the comparison masks when converting from float back to bfloat16. The resulting masks are then no longer all zeros or all ones, which breaks when used with `pselect` (e.g. in `pmin<PropagateNumbers>`). This was causing `packetmath_15` to fail on arm. Introducing a simple `F32MaskToBf16Mask` corrects this (takes the lower 16-bits for each float mask).	2020-12-02 01:29:34 +00:00
Antonio Sanchez	ddd48b242c	Implement CUDA __shfl* for Eigen::half Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu` test are broken, as well as several ops in Tensorflow. The gpu functions `__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float. Here we add the required specializations.	2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen	e57281a741	Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available.	2020-12-01 11:31:47 -08:00
Antonio Sanchez	1992af3de2	Fix #2077, `EIGEN_CONSTEXPR` in `Half`. `bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from `raw_half_as_uint16(...)`. This shouldn't affect anything else, since it is only used in a `bit_cast<uint16_t,half>()` which is not itself `constexpr`. Fixes #2077.	2020-12-01 03:10:21 +00:00
acxz	7b80609d49	add EIGEN_DEVICE_FUNC to methods	2020-12-01 03:08:47 +00:00
Antonio Sanchez	89f90b585d	AVX512 missing ops. This allows the `packetmath` tests to pass for AVX512 on skylake. Made `half` and `bfloat16` consistent in terms of ops they support. Note the `log` tests are currently disabled for `bfloat16` since they fail due to poor precision (they were previously disabled for `Packet8bf` via test function specialization -- I just removed that specialization and disabled it in the generic test).	2020-11-30 16:28:57 +00:00
Jim Lersch	a7170f2aca	Fix doxygen class blocks that were not associated with the correct classes.	2020-11-27 08:48:11 -07:00
Andreas Krebbel	1e74f93d55	Fix some packet-functions in the IBM ZVector packet-math.	2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen	79818216ed	Revert "Fix Half NaN definition and test." This reverts commit `c770746d70`.	2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen	c770746d70	Fix Half NaN definition and test. The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted). There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition. Also modified the `bfloat16_float` test to match. Tested with `cortex-a53` and `cortex-a55`.	2020-11-24 20:53:07 +00:00
Antonio Sanchez	22f67b5958	Fix boolean float conversion and product warnings. This fixes some gcc warnings such as: ``` Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool] Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); } ``` Details: - Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`). - Added `scalar_square_op<bool>` and `scalar_cube_op<bool>` specializations (`-Wint-in-bool-context`) - Deprecated above specialized ops for bool. - Modified `cxx11_tensor_block_eval` to specialize generator for booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to avoid deprecated bool ops.	2020-11-24 20:20:36 +00:00
Antonio Sanchez	a3b300f1af	Implement missing AVX half ops. Minimal implementation of AVX `Eigen::half` ops to bring in line with `bfloat16`. Allows `packetmath_13` to pass. Also adjusted `bfloat16` packet traits to match the supported set of ops (e.g. Bessel is not actually implemented).	2020-11-24 16:46:41 +00:00
Antonio Sanchez	38abf2be42	Fix Half NaN definition and test. The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted). There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition. Also modified the `bfloat16_float` test to match. Tested with `cortex-a53` and `cortex-a55`.	2020-11-23 14:13:59 -08:00
Antonio Sanchez	4cf01d2cf5	Update AVX half packets, disable test. The AVX half implementation is incomplete, causing the `packetmath_13` test to fail. This disables the test. Also refactored the existing AVX implementation to use `bit_cast` instead of direct access to `.x`.	2020-11-21 09:05:10 -08:00
Antonio Sanchez	fd1dcb6b45	Fixes duplicate symbol when building blas Missing inline breaks blas, since symbol generated in `complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp` Changed rest of inlines to `EIGEN_STRONG_INLINE`.	2020-11-20 09:37:40 -08:00
David Tellenbach	6c9c3f9a1a	Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool. Both Eigen::half and Eigen::bfloat16 are implicitly convertible to float and can hence be converted to bool via the conversion chain Eigen::{half,bfloat16} -> float -> bool. We thus remove the explicit cast operator to bool.	2020-11-19 18:49:09 +01:00
Antonio Sanchez	a8fdcae55d	Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix. Multiplication of column-major `DynamicSparseMatrix`es involves three temporaries: - two for transposing twice to sort the coefficients (`ConservativeSparseSparseProduct.h`, L160-161) - one for a final copy assignment (`SparseAssign.h`, L108) The latter is avoided in an optimization for `SparseMatrix`. Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not worth the effort to optimize further, so I simply disabled counting temporaries via a macro. Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra` tests actually re-run all the original `sparse_product` tests as well. We may want to simply drop the `DynamicSparseMatrix` tests altogether, which would eliminate the test duplication. Related to #2048	2020-11-18 23:15:33 +00:00
David Tellenbach	11e4056f6b	Re-enable Arm Neon Eigen::half packets of size 8 - Add predux_half_dowto4 - Remove explicit casts in Half.h to match the behaviour of BFloat16.h - Enable more packetmath tests for Eigen::half	2020-11-18 23:02:21 +00:00
Antonio Sanchez	17268b155d	Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom The existing `TensorRandom.h` implementation makes the assumption that `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not always true. This currently fails on arm64, where `x` has type `__fp16`. Added `bit_cast` specializations to allow casting to/from `uint16_t` for both `half` and `bfloat16`. Also added tests in `half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the future.	2020-11-18 20:32:35 +00:00
Antonio Sanchez	60218829b7	EOF newline added to InverseSize4. Causing build breakages due to `-Wnewline-eof -Werror` that seems to be common across Google.	2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen	2d63706545	Add missing parens around macro argument.	2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen	6bba58f109	Replace SSE_SHUFFLE_MASK macro with shuffle_mask.	2020-11-17 15:28:37 -08:00
David Tellenbach	e9b55c4db8	Avoid promotion of Arm __fp16 to float in Neon PacketMath Using overloaded arithmetic operators for Arm __fp16 always causes a promotion to float. We replace operator* by vmulh_f16 to avoid this.	2020-11-17 20:19:44 +01:00
Antonio Sanchez	117a4c0617	Fix missing `EIGEN_CONSTEXPR` pop_macro in `Half`. `EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if `EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.	2020-11-17 08:29:33 -08:00
Guoqiang QI	394f564055	Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath.	2020-11-17 12:27:01 +00:00
acxz	9175f50d6f	Add EIGEN_DEVICE_FUNC to TranspositionsBase Fixes #2057.	2020-11-16 15:37:40 +00:00
Antonio Sanchez	bb69a8db5d	Explicit casts of S -> std::complex<T> When calling `internal::cast<S, std::complex<T>>(x)`, clang often generates an implicit conversion warning due to an implicit cast from type `S` to `T`. This currently affects the following tests: - `basicstuff` - `bfloat16_float` - `cxx11_tensor_casts` The implicit cast leads to widening/narrowing float conversions. Widening warnings only seem to be generated by clang (`-Wdouble-promotion`). To eliminate the warning, we explicitly cast the real-component first from `S` to `T`. We also adjust tests to use `internal::cast` instead of `static_cast` when a complex type may be involved.	2020-11-14 05:50:42 +00:00
guoqiangqi	8324e5e049	Fix typo in NEON/PacketMath.h	2020-11-13 00:46:41 +00:00
Rasmus Munk Larsen	bec72345d6	Simplify expression for inner product fallback in Gemv product evaluator.	2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen	276db21f26	Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct.	2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen	cf12474a8b	Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime. This speeds up inner products where one or both arguments are dynamic for small and medium-sized vectors (up to 32k). name old time/op new time/op delta BM_VecVecStatStat<float>/1 1.64ns ± 0% 1.64ns ± 0% ~ BM_VecVecStatStat<float>/8 2.99ns ± 0% 2.99ns ± 0% ~ BM_VecVecStatStat<float>/64 7.00ns ± 1% 7.04ns ± 0% +0.66% BM_VecVecStatStat<float>/512 61.6ns ± 0% 61.6ns ± 0% ~ BM_VecVecStatStat<float>/4k 551ns ± 0% 553ns ± 1% +0.26% BM_VecVecStatStat<float>/32k 4.45µs ± 0% 4.45µs ± 0% ~ BM_VecVecStatStat<float>/256k 77.9µs ± 0% 78.1µs ± 1% ~ BM_VecVecStatStat<float>/1M 312µs ± 0% 312µs ± 1% ~ BM_VecVecDynStat<float>/1 13.3ns ± 1% 4.6ns ± 0% -65.35% BM_VecVecDynStat<float>/8 14.4ns ± 0% 6.2ns ± 0% -57.00% BM_VecVecDynStat<float>/64 24.0ns ± 0% 10.2ns ± 3% -57.57% BM_VecVecDynStat<float>/512 138ns ± 0% 68ns ± 0% -50.52% BM_VecVecDynStat<float>/4k 1.11µs ± 0% 0.56µs ± 0% -49.72% BM_VecVecDynStat<float>/32k 8.89µs ± 0% 4.46µs ± 0% -49.89% BM_VecVecDynStat<float>/256k 78.2µs ± 0% 78.1µs ± 1% ~ BM_VecVecDynStat<float>/1M 313µs ± 0% 312µs ± 1% ~ BM_VecVecDynDyn<float>/1 10.4ns ± 0% 10.5ns ± 0% +0.91% BM_VecVecDynDyn<float>/8 12.0ns ± 3% 11.9ns ± 0% ~ BM_VecVecDynDyn<float>/64 37.4ns ± 0% 19.6ns ± 1% -47.57% BM_VecVecDynDyn<float>/512 159ns ± 0% 81ns ± 0% -49.07% BM_VecVecDynDyn<float>/4k 1.13µs ± 0% 0.58µs ± 1% -49.11% BM_VecVecDynDyn<float>/32k 8.91µs ± 0% 5.06µs ±12% -43.23% BM_VecVecDynDyn<float>/256k 78.2µs ± 0% 78.2µs ± 1% ~ BM_VecVecDynDyn<float>/1M 313µs ± 0% 312µs ± 1% ~	2020-11-12 18:02:37 +00:00
Pedro Caldeira	c29935b323	Add support for dynamic dispatch of MMA instructions for POWER 10	2020-11-12 11:31:15 -03:00
acxz	b714dd9701	remove annotation for first declaration of default con/destruction	2020-11-12 04:34:12 +00:00
mehdi-goli	e24a1f57e3	[SYCL Function pointer Issue]: SYCL does not support function pointer inside the kernel, due to the portability issue of a function pointer and memory address space among host and accelerators. To fix the issue, function pointers have been replaced by function objects.	2020-11-12 01:50:28 +00:00
guoqiangqi	82fe059f35	Fix issue #2045: build error because the `_mm256_set_m128d` op is not supported by gcc 7.x.	2020-11-04 09:21:39 +08:00
Deven Desai	39a038f2e4	Fix for ROCm (and CUDA?) breakage - 201029 The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error `e265f7ed8e` ``` Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20: In file included from /home/rocm-user/eigen/test/main.h:355: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:169: /home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'? return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t>(ptr))); ^~~~~~ Eigen::numext /home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here namespace numext { ^ 1 error generated when compiling for gfx900. CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message): Error generating file /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed make[3]: [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1 CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed make[2]: * [test/CMakeFiles/gpu_basic.dir/all] Error 2 CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed make[1]: * [test/CMakeFiles/gpu_basic.dir/rule] Error 2 Makefile:5401: recipe for target 'gpu_basic' failed make: * [gpu_basic] Error 2 ``` The fix in this commit is trivial. Please review and merge.	2020-10-29 15:34:05 +00:00
David Tellenbach	f895755c0e	Remove unused functions in Half.h. The following functions have been removed: Eigen::half fabsh(const Eigen::half&) Eigen::half exph(const Eigen::half&) Eigen::half sqrth(const Eigen::half&) Eigen::half powh(const Eigen::half&, const Eigen::half&) Eigen::half floorh(const Eigen::half&) Eigen::half ceilh(const Eigen::half&)	2020-10-29 07:37:52 +01:00
David Tellenbach	09f015852b	Replace numext::as_uint with numext::bit_cast<numext::uint32_t>	2020-10-29 07:28:28 +01:00
David Tellenbach	e265f7ed8e	Add support for Armv8.2-a __fp16 Armv8.2-a provides a native half-precision floating point (__fp16 aka. float16_t). This patch introduces * __fp16 as underlying type of Eigen::half if this type is available * the packet types Packet4hf and Packet8hf representing float16x4_t and float16x8_t respectively * packet-math for the above packets with corresponding scalar type Eigen::half The packet-math functionality has been implemented by Ashutosh Sharma <ashutosh.sharma@amperecomputing.com>. This closes #1940.	2020-10-28 20:15:09 +00:00
mehdi-goli	b9ff791fed	[Missing SYCL math op]: Adding the missing LDEXP function for SYCL.	2020-10-28 08:32:57 +00:00
mehdi-goli	61461d682a	[Fixing expf issue]: Eigen uses the packet type operation for scalar type float in the Sigmoid function (https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result the SYCL backend breaks, since it only supports packet operations for the vectorized types float4 and double2. The issue has been fixed by adding scalar type float to the packet operation pexp for the SYCL backend.	2020-10-28 08:30:34 +00:00
guoqiangqi	28aef8e816	Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f>	2020-10-20 11:37:09 +08:00
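
Two of the commits listed above (`17268b155d` and `2627e2f2e6`) concern bit-level handling of 16-bit floating-point types: reinterpreting a value as `uint16_t`, and narrowing a 32-bit comparison mask to a 16-bit one by taking its lower 16 bits. The following is a minimal scalar sketch of those two ideas, not Eigen's implementation. The names `bit_cast_sketch`, `f32_mask_to_bf16_mask_sketch` and `is_valid_mask16` are hypothetical, and the real code operates on SIMD packets rather than single integers.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Eigen's API): reinterpret the bits of one
// trivially copyable type as another type of the same size. memcpy is
// used so that no assumption is made about a member named `x` or `value`.
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Scalar analogue of narrowing a 32-bit comparison mask to a 16-bit one:
// keep the lower 16 bits, so all-ones stays all-ones and zero stays zero.
inline std::uint16_t f32_mask_to_bf16_mask_sketch(std::uint32_t mask32) {
  return static_cast<std::uint16_t>(mask32 & 0xFFFFu);
}

// A select-style operation expects each mask lane to be all zeros or all ones.
inline bool is_valid_mask16(std::uint16_t mask16) {
  return mask16 == 0x0000u || mask16 == 0xFFFFu;
}
```

Under these assumptions, `f32_mask_to_bf16_mask_sketch(0xFFFFFFFFu)` yields `0xFFFF`, which is still a valid mask. A numeric float-to-bfloat16 conversion gives no such guarantee, because the all-ones bit pattern is a NaN when read as a float.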


6358 Commits