Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test were broken, as were several ops in TensorFlow. The GPU functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to `float`.
Here we add the required specializations.
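The added specializations take roughly this shape (a sketch, not the exact Eigen code; shown for the `__shfl_sync` variant on recent CUDA, with `Eigen::half`'s layout compatibility with `__half` on GPU builds assumed):
```
#if defined(__CUDACC__)
// An explicit overload for Eigen::half removes the ambiguity that the
// implicit half -> float conversion creates among the built-in overloads.
__device__ EIGEN_STRONG_INLINE Eigen::half __shfl_sync(
    unsigned mask, Eigen::half var, int srcLane, int width = warpSize) {
  const __half h = var;  // layout-compatible on GPU builds (assumption)
  return static_cast<Eigen::half>(__shfl_sync(mask, h, srcLane, width));
}
#endif
```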
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`. This shouldn't affect anything else, since
it is only used in `bit_cast<uint16_t,half>()`, which is not itself
`constexpr`.
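For context, a typical pre-C++20 `bit_cast` is built on `memcpy`, which cannot appear in a constant expression (a generic sketch, not Eigen's exact implementation):
```
#include <cstring>

template <typename Tgt, typename Src>
Tgt bit_cast(const Src& src) {
  static_assert(sizeof(Tgt) == sizeof(Src), "types must have the same size");
  Tgt tgt;
  std::memcpy(&tgt, &src, sizeof(Tgt));  // not usable in constexpr before C++20
  return tgt;
}
```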
Fixes #2077.
This allows the `packetmath` tests to pass for AVX512 on Skylake.
Made `half` and `bfloat16` consistent in terms of the ops they support.
Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (when casting a `float` to `__fp16`, a
signaling NaN is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.
Also modified the `bfloat16_float` test to match.
Tested with `cortex-a53` and `cortex-a55`.
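For reference, comparing NaNs according to the IEEE 754 definition means classifying by the exponent and mantissa fields rather than requiring an exact bit pattern. A minimal binary16 sketch (helper name hypothetical):
```
#include <cstdint>

// A binary16 value is NaN iff its exponent bits are all ones and its
// mantissa is non-zero; the sign and the quiet/signaling bit are irrelevant.
bool isnan_binary16(std::uint16_t bits) {
  return (bits & 0x7C00) == 0x7C00 && (bits & 0x03FF) != 0;
}
```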
This fixes some compiler warnings, such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```
Details:
- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).
- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`); sketched below.
- Deprecated the above specialized ops for `bool`.
- Modified `cxx11_tensor_block_eval` to specialize the generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid the deprecated bool ops.
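The bool specializations take roughly this form (a sketch following Eigen's functor conventions; exact signatures are assumed, and the primary `scalar_square_op` template is presumed in scope):
```
// For bool, a*a == a, so returning the argument avoids the integer
// multiply that triggers -Wint-in-bool-context. The real specialization
// is additionally marked deprecated.
template <>
struct scalar_square_op<bool> {
  EIGEN_DEVICE_FUNC inline bool operator()(const bool& a) const { return a; }
};
```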
Minimal implementation of AVX `Eigen::half` ops to bring it in line
with `bfloat16`. Allows `packetmath_13` to pass.
Also adjusted the `bfloat16` packet traits to match the supported set
of ops (e.g. the Bessel functions are not actually implemented).
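Without native half arithmetic, the standard approach is to widen to `float`, compute, and narrow back. A sketch using F16C intrinsics, assuming a packet of eight halves stored in an `__m128i` (function name hypothetical):
```
#include <immintrin.h>

// Add eight half-precision values: convert to float, add, convert back.
static inline __m128i padd_half8(__m128i a, __m128i b) {
  const __m256 af = _mm256_cvtph_ps(a);
  const __m256 bf = _mm256_cvtph_ps(b);
  const __m256 rf = _mm256_add_ps(af, bf);
  return _mm256_cvtps_ph(rf, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}
```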
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail. This disables the test.
Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
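The refactor follows this pattern (hypothetical helper; `numext::bit_cast` is the cast added for exactly this purpose):
```
#include <immintrin.h>
#include <Eigen/Core>

// Broadcast one half across a packet without touching the
// representation-dependent member ('.x' may be uint16_t or __fp16).
static inline __m256i broadcast_half16(Eigen::half h) {
  return _mm256_set1_epi16(
      static_cast<short>(Eigen::numext::bit_cast<Eigen::numext::uint16_t>(h)));
}
```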
A missing `inline` breaks BLAS, since the symbol is generated in each of
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, and `double.cpp`,
producing multiply-defined symbols at link time.
Changed the rest of the inlines to `EIGEN_STRONG_INLINE`.
Both `Eigen::half` and `Eigen::bfloat16` are implicitly convertible to
`float` and can hence be converted to `bool` via the conversion chain
`Eigen::{half,bfloat16} -> float -> bool`.
We thus remove the explicit cast operator to `bool`.
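In other words, the following keeps compiling without any explicit `operator bool`:
```
#include <Eigen/Core>

bool is_nonzero(Eigen::half h) {
  // half -> float (user-defined conversion) followed by float -> bool
  // (standard conversion) is a single valid implicit conversion sequence.
  return h;
}
```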
Multiplication of column-major `DynamicSparseMatrix` objects involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.
Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.
Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.
We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.
Related to #2048
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`. Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
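The new tests exercise a round trip through the bit representation, along these lines (a sketch; `VERIFY_IS_EQUAL` is the usual Eigen test macro):
```
#include <Eigen/Core>

// Round trip through the bit representation; works whether Eigen::half
// stores a uint16_t or a native __fp16.
void test_half_bit_cast() {
  const Eigen::half h(1.5f);
  const Eigen::numext::uint16_t bits =
      Eigen::numext::bit_cast<Eigen::numext::uint16_t>(h);
  VERIFY_IS_EQUAL(Eigen::numext::bit_cast<Eigen::half>(bits), h);
}
```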
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`. This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`
The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).
To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`. We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
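The fix follows this pattern (a generic sketch of the idea, not the exact Eigen code):
```
#include <complex>

// Casting the real component to T first avoids the implicit S -> T
// conversion inside std::complex<T>'s constructor that triggers the warning.
template <typename T, typename S>
std::complex<T> cast_to_complex(const S& x) {
  return std::complex<T>(static_cast<T>(x), T(0));
}
```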
Commit e265f7ed8e breaks Eigen for ROCm (and probably CUDA too) with the
following error:
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
^~~~~~
Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
Error generating file
/home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o
test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```
The fix in this commit is trivial: fully qualify the namespace at the point of use, as the diagnostic suggests. Please review and merge.
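Concretely (a before/after sketch derived from the quoted error):
```
// Before: 'numext' is not visible unqualified from this context.
//   return Eigen::half_impl::raw_uint16_to_half(
//       __ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
// After:
return Eigen::half_impl::raw_uint16_to_half(
    __ldg(reinterpret_cast<const Eigen::numext::uint16_t*>(ptr)));
```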
Armv8.2-a provides a native half-precision floating-point type (`__fp16`, aka
`float16_t`). This patch introduces
* `__fp16` as the underlying type of `Eigen::half` if this type is available
* the packet types `Packet4hf` and `Packet8hf`, representing `float16x4_t` and
`float16x8_t` respectively
* packet math for the above packets with corresponding scalar type
`Eigen::half` (see the sketch below)
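A minimal sketch of what the new packet math amounts to (function name hypothetical; requires FP16 support, e.g. `-march=armv8.2-a+fp16`):
```
#include <arm_neon.h>

// With native __fp16 support, a packet op maps directly onto an Armv8.2-a
// NEON FP16 intrinsic instead of widening to float first.
static inline float16x8_t padd_f16x8(float16x8_t a, float16x8_t b) {
  return vaddq_f16(a, b);
}
```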
The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.
This closes #1940.
Commit 183a208212 seems to have introduced regressions in ROCm/HIP support.
It causes some unit tests to fail with the following error:
```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```
The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up some of the std functions (whose calls are prefixed by `EIGEN_USING_STD`) from the global namespace (i.e. use `::bit_xor` instead of `std::bit_xor`). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contains the decls for the std math functions (`math_functions.h`), and this is the cause of the error above.
It seems that newer HIP compilers do support calling `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in Eigen to use the `std::` namespace (instead of the global namespace) when compiling with the HIP compiler. However, a recent commit removed the `EIGEN_USING_STD_MATH` macro and collapsed its uses into the `EIGEN_USING_STD` macro (4091f6b25c).
Replacing all std math calls would essentially require resurrecting the `EIGEN_USING_STD_MATH` macro, so we are not choosing that option.
Also, HIP compilers only support `std::` math calls within device code, not all std functions (specifically not `malloc`/`free`, which are also prefixed via `EIGEN_USING_STD`). So modifying the `EIGEN_USING_STD` implementation to use the `std::` namespace for HIP will not work either.
Hence we go with the ugly solution of special-casing the three calls that break the HIP compile to explicitly use the `std::` namespace.
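The special-casing looks roughly like this inside the affected functor bodies (a sketch; the exact guard used may differ):
```
#if defined(EIGEN_HIP_DEVICE_COMPILE)
  // HIP device code: refer to std::bit_and explicitly, since ::bit_and is
  // not declared in the HIP math headers.
  return std::bit_and<Packet>()(a, b);
#else
  EIGEN_USING_STD(bit_and);
  return bit_and<Packet>()(a, b);
#endif
```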
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.
In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.
Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly. Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
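The replacement check has this flavor (a sketch reusing the test's types; `VERIFY_IS_APPROX` is the usual Eigen test macro):
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
// Require the identity-transformed box to match c to within floating-point
// precision, rather than requiring exact containment.
VERIFY_IS_APPROX(transformedC.min(), c.min());
VERIFY_IS_APPROX(transformedC.max(), c.max());
```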
Commit e55182ac09 causes regressions in the ROCm/HIP support for Eigen.
I suspect the same breakages occur on the CUDA side too.
The above commit puts the `EIGEN_CONSTEXPR` attribute on the `half_base` constructor. `half_base` is derived from `__half_raw`.
When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU-compiler-specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.
This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. As in the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with GPU support.
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.
Fixes: #1979
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------
BM_float -0.1285 -0.1275 568 495 572 499
BM_double -0.2265 -0.2254 638 494 641 496
```