Commit Graph

6286 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
3b445d9bf2 Add generic packet ops corresponding to {std}::fmin and {std}::fmax. The nonsensical NaN-propagation rules for std::min and std::max, implemented by pmin and pmax in Eigen, are a longstanding source of confusion and bug reports. This change is a first step towards addressing them, as discussed in issue #564. 2020-10-01 16:54:31 +00:00
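The order dependence that makes `std::min`-style NaN handling confusing is easy to see with scalars. The snippet below is an illustration, not Eigen's `pmin`/`pmax` code, and it uses only the C++ standard library. `std::min(a, b)` returns `(b < a) ? b : a`, so whether a NaN survives depends on argument order. `std::fmin` returns the non-NaN operand when exactly one argument is NaN.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

const double kNaN = std::numeric_limits<double>::quiet_NaN();

// std::min(a, b) returns (b < a) ? b : a. Every ordered comparison with a
// NaN is false, so the result depends on which argument holds the NaN.
inline double min_nan_first()  { return std::min(kNaN, 1.0); }  // NaN
inline double min_nan_second() { return std::min(1.0, kNaN); }  // 1.0

// std::fmin treats a NaN argument as missing data: if exactly one argument
// is NaN, the other is returned, whatever the order.
inline double fmin_nan_first()  { return std::fmin(kNaN, 1.0); }  // 1.0
inline double fmin_nan_second() { return std::fmin(1.0, kNaN); }  // 1.0
```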
Rasmus Munk Larsen
44b9d4e412 Specialize pldexp_double and pfdexp_double and get rid of the Packet2l definition for SSE. SSE does not support conversion between 64-bit integers and double, and the existing implementation of casting between Packet2d and Packet2l results in undefined behavior when casting NaN to int. Since pldexp and pfdexp only manipulate exponent fields that fit in 32 bits, this change provides specializations that use the existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead. 2020-09-30 13:33:44 -07:00
Antonio Sanchez
d5a0d89491 Fix alignedbox 32-bit precision test failure.
The current `test/geo_alignedbox` tests fail on 32-bit ARM due to small floating-point errors.

In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` can differ ever so slightly from `c`. Instead, we replace this test with one that checks that an identity transform is within floating-point precision of `c`.

Also updated the condition on `AlignedBox::transform(...)` to explicitly accept only the `Affine`, `AffineCompact`, and `Isometry` modes. Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
David Tellenbach
30960d485e Fix failure in GEBP kernel when compiling with OpenMP and FMA
Fixes #1995
2020-09-30 01:26:07 +02:00
Rasmus Munk Larsen
f9d1500f74 Revert !182. 2020-09-29 13:56:17 -07:00
Rasmus Munk Larsen
068121ec02 Add missing newline at the end of Inverse_NEON.h 2020-09-29 15:32:52 +00:00
Rasmus Munk Larsen
74ff5719b3 Fix compilation of 64-bit constant arguments to pset1frombits in TypeCasting.h on platforms where uint64_t != unsigned long. 2020-09-28 22:47:11 +00:00
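Template argument deduction is one way a 64-bit constant can break only on some platforms. An integer literal's type (`unsigned long` or `unsigned long long`) need not be the type that `std::uint64_t` aliases. The sketch below uses a hypothetical `from_bits` helper and does not reproduce Eigen's `pset1frombits` declaration. It shows the failure mode and the usual remedy, which is to cast the constant to `std::uint64_t`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical bit-cast helper whose integer type is a deduced template
// parameter. The primary template is declared but never defined, so only
// explicitly specialized type pairs link.
template <typename To, typename Bits>
To from_bits(Bits bits);

// The only specialization: double from std::uint64_t.
template <>
inline double from_bits<double, std::uint64_t>(std::uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d;
}

// from_bits<double>(0x3FF0000000000000UL) would deduce Bits as
// unsigned long. Where std::uint64_t is unsigned long long (for example
// on macOS), that selects the undefined primary template and fails at
// link time. Casting the constant makes the deduced type std::uint64_t
// on every platform.
inline double one_from_bits() {
  return from_bits<double>(static_cast<std::uint64_t>(0x3FF0000000000000ULL));
}
```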
Rasmus Munk Larsen
3a0b23e473 Fix compilation of pset1frombits calls on iOS. 2020-09-28 22:30:36 +00:00
Christoph Hertzberg
6b0c0b587e Provide a more efficient Packet2l->Packet2d cast method 2020-09-28 22:14:02 +00:00
Martin Pecka
6425e875a1 Added AlignedBox::transform(AffineTransform). 2020-09-28 18:06:23 +00:00
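The commit message does not say how the new box is computed. A common technique is sketched here with hypothetical `Box2D` and `Affine2D` types rather than Eigen's. It maps the box center through the full affine transform. It maps the half-extents through the element-wise absolute value of the linear part. Together these give the tightest axis-aligned box around the transformed corners.

```cpp
#include <cmath>

// Hypothetical 2-D box and affine map y = A x + t (not Eigen types).
struct Box2D { double min[2], max[2]; };
struct Affine2D { double A[2][2]; double t[2]; };

// Bounding box of an affinely transformed box: the center goes through
// the full transform, and the half-extents go through |A| (element-wise
// absolute value). On each axis i the transformed corners span exactly
// center_i +/- sum_j |A_ij| * half_j.
inline Box2D transform_box(const Box2D& b, const Affine2D& m) {
  double center[2], half[2];
  for (int i = 0; i < 2; ++i) {
    center[i] = 0.5 * (b.min[i] + b.max[i]);
    half[i] = 0.5 * (b.max[i] - b.min[i]);
  }
  Box2D out;
  for (int i = 0; i < 2; ++i) {
    double c = m.t[i], h = 0.0;
    for (int j = 0; j < 2; ++j) {
      c += m.A[i][j] * center[j];
      h += std::fabs(m.A[i][j]) * half[j];
    }
    out.min[i] = c - h;
    out.max[i] = c + h;
  }
  return out;
}
```

For example, a 90-degree rotation of the box [0, 1] x [0, 2] yields [-2, 0] x [0, 1].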
Deven Desai
ce5c59729d Fix for ROCm/HIP breakage - 200921
The following commit causes regressions in the ROCm/HIP support for Eigen
e55182ac09

I suspect the same breakages occur on the CUDA side too.

The above commit puts the EIGEN_CONSTEXPR attribute on the `half_base` constructor. `half_base` is derived from `__half_raw`.

When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU-compiler-specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.

This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. As in the case above, that change was leading to compile errors because `__half_raw` has a different definition when compiling with GPU support.
2020-09-22 22:26:45 +00:00
Guoqiang QI
821702e771 Fix the bugs in issues #1997 and #1991, triggered by the MSVC compiler not supporting subscript operations a[index] on vector types (type of a: __i28d) 2020-09-21 15:49:00 +00:00
Rasmus Munk Larsen
c4b99f78c7 Fix breakage in pcast<Packet2l, Packet2d> due to _mm_cvtsi128_si64 not being available on 32-bit x86.
If SSE 4.1 is available use the faster _mm_extract_epi64 intrinsic.
2020-09-18 18:13:20 -07:00
guoqiangqi
9aad16b443 Fix undefined reference to pset1frombits bug on different platforms 2020-09-19 00:53:21 +00:00
David Tellenbach
c4aa8e0db2 Rename variable to avoid shadowing of a previously declared one 2020-09-18 22:53:15 +02:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HAS_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
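Replacing runtime initialization logic with `static const` data works because C++11 guarantees that a function-local static is initialized exactly once, even when several threads call the function concurrently. The thresholds below are hypothetical examples of the kind of constants a safe-norm routine needs, not the values blueNorm computes. They use `static const` rather than `constexpr` because `std::sqrt` is not usable in constant expressions in C++11.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scaling thresholds: squaring a value below `small` drops
// into the subnormal range, and squaring a value above `big` overflows.
// Illustrative only, not blueNorm's constants.
struct NormThresholds {
  double small;
  double big;
};

// A function-local static const is initialized exactly once, and since
// C++11 that initialization is thread-safe, so no separate
// "already initialized?" flag is needed.
inline const NormThresholds& norm_thresholds() {
  static const NormThresholds t = {
      std::sqrt(std::numeric_limits<double>::min()),
      std::sqrt(std::numeric_limits<double>::max())};
  return t;
}
```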
Rasmus Munk Larsen
14022f5eb5 Fix more mildly embarrassing typos in ARM intrinsics in PacketMath.h.
'vmvnq_u64' does not exist for some reason.
2020-09-18 04:14:13 +00:00
Rasmus Munk Larsen
a5b226920f Fix typo in PacketMath.h 2020-09-18 01:22:23 +00:00
Rasmus Munk Larsen
3af744b023 Add missing packet op pcmp_lt_or_nan for Packet2d on ARM. 2020-09-18 01:07:01 +00:00
Rasmus Munk Larsen
31a6b88ff3 Disable double version of compute_inverse_size4 on Inverse_NEON.h if Packet2d is not supported. 2020-09-17 23:51:06 +00:00
Brad King
880fa43b2b Add support for CastXML on ARM aarch64
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.

Fixes: #1979
2020-09-16 13:40:23 -04:00
daravi
6f0f6f792e Fix compiler error due to C++20 operator== generation rules 2020-09-16 02:06:53 +00:00
Benoit Jacob
cc0c38ace8 Remove old Clang compiler bug work-arounds. The two LLVM bugs referenced in the comments here have long been fixed. The workarounds were now detrimental because (1) they prevented using fused mul-add on Clang/ARM32 and (2) the unnecessary 'volatile' in 'asm volatile' prevented legitimate reordering by the compiler. 2020-09-15 20:54:14 -04:00
Tim Shen
bb56a62582 Make bfloat16(float(-nan)) produce -nan, not nan. 2020-09-15 13:24:23 -07:00
Guoqiang QI
3012e755e9 Add plog support for Packet2d on NEON 2020-09-15 17:10:35 +00:00
Rasmus Munk Larsen
e4fb0ddf78 Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.h 2020-09-15 01:18:55 +00:00
Pedro Caldeira
65e400896b Fix bfloat16 round on gcc 4.8 2020-09-14 10:43:59 -03:00
Rasmus Munk Larsen
5636f80d11 Fix issue #1968. Don't discard return value from "new" in C++17. 2020-09-13 17:38:45 +00:00
Guoqiang QI
7c5d48f313 Unify the SSE pldexp_double API 2020-09-12 10:56:55 +00:00
Rasmus Munk Larsen
71e08c702b Make blueNorm threadsafe if C++11 atomics are available. 2020-09-12 01:23:29 +00:00
Niels Dekker
5328c9be43 Fix half_impl::float_to_half_rtne(float) warning: '<<' causes overflow
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
2020-09-10 16:22:28 +02:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Guoqiang QI
85428a3440 Add Neon psqrt<Packet2d> and pexp<Packet2d> 2020-09-08 09:04:03 +00:00
Alexander Neumann
5272106826 Remove semicolon triggering -Wextra-semi-stmt 2020-09-07 11:42:30 +02:00
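The commit does not say which semicolon was removed. One typical trigger for clang's `-Wextra-semi-stmt` is a statement-like macro that ends in its own `;`, so every call site produces an extra empty statement. The macros below are hypothetical examples of that pattern and of the usual fix.

```cpp
// A statement-like macro should not end in ';'. With the trailing
// semicolon, a call written as SWAP_INTS_BAD(x, y); expands to
// "do { ... } while (0);;", and the extra empty statement is what
// -Wextra-semi-stmt reports.
#define SWAP_INTS_BAD(a, b) do { int t_ = (a); (a) = (b); (b) = t_; } while (0);

// The usual form leaves the semicolon to the call site.
#define SWAP_INTS(a, b) do { int t_ = (a); (a) = (b); (b) = t_; } while (0)

inline void swap_pair(int& x, int& y) {
  SWAP_INTS(x, y);  // expands to exactly one statement
}
```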
Stephen Zheng
5f25bcf7d6 Add Inverse_NEON.h
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.

```
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_float                 -0.1285         -0.1275           568           495           572           499
BM_double                -0.2265         -0.2254           638           494           641           496
```
2020-09-04 10:55:47 +00:00
Everton Constantino
6fe88a3c9d MatrixProduct enhancements:
- Changes to Altivec/MatrixProduct
  Adapting code to gcc 10.
  Generic code style and performance enhancements.
  Adding PanelMode support.
  Adding stride/offset support.
  Enabling float64, std::complex<float> and std::complex<double>.
  Adding missing symm_pack.
  Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr != 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275 Changing u/int8_t to un/signed char because clang does not understand it.

Implementing pcmp_eq for Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Gael Guennebaud
27e6648074 fix #1901: warning in Mode==(Upper|Lower) 2020-09-02 15:43:58 +02:00
Chip Kerchner
e5886457c8 Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub. 2020-08-28 19:27:32 +00:00
Gael Guennebaud
25424d91f6 Fix #1974: assertion when reserving an empty sparse matrix 2020-08-26 12:32:20 +02:00
Guoqiang QI
8bb0febaf9 Add psqrt support for Packet2f/Packet4f on NEON 2020-08-21 03:17:15 +00:00
Georg Jäger
1b1082334b adding attributes to constructors to support hip-clang on ROCm 3.5 2020-08-20 16:48:11 +02:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds the `__launch_bounds__(1024)` attribute to GPU kernels that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit restricts the changes in PR 181 to HIP only.
2020-08-20 00:29:57 +00:00
Rasmus Munk Larsen
d10b27fe37 Add missing inline keyword in Quaternion.h. 2020-08-14 17:51:04 +00:00
David Tellenbach
c6820a6316 Replace the call to int64_t in the blasutil test by explicit types
Some platforms define int64_t to be long long even for C++03. If this is
the case, we miss the definition of internal::make_unsigned for this
type. If we just define the template, we get duplicate definition
errors on platforms that define int64_t as signed long for C++03.

We need to find a way to distinguish both cases at compile time.
2020-08-14 17:24:37 +02:00
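One compile-time approach is to specialize on the fundamental types `long` and `long long` rather than on the `int64_t` typedef. The sketch is written in C++11, so it uses `<type_traits>` and `long long`, which C++03 code cannot assume. `make_unsigned_sketch` is hypothetical and is not Eigen's `internal::make_unsigned`.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait. Specializing on the fundamental types `long` and
// `long long` (instead of on the typedef std::int64_t) gives two distinct
// specializations on every platform, so there is no duplicate definition,
// and std::int64_t resolves to whichever one it aliases.
template <typename T> struct make_unsigned_sketch;
template <> struct make_unsigned_sketch<long> { typedef unsigned long type; };
template <> struct make_unsigned_sketch<long long> { typedef unsigned long long type; };

// Compile-time check of which fundamental type std::int64_t is.
const bool int64_is_long = std::is_same<std::int64_t, long>::value;
const bool int64_is_long_long = std::is_same<std::int64_t, long long>::value;
```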
David Tellenbach
8ba1b0f41a bfloat16 packetmath for Arm Neon backend 2020-08-13 15:48:40 +00:00
Pedro Caldeira
704798d1df Add support for Bfloat16 to use vector instructions on Altivec architecture
2020-08-10 13:22:01 -05:00
Zachary Garrett
21122498ec Temporarily turn off the NEON implementation of pfloor, as it does not work for large values.
The NEON implementation mimics the SSE implementation, but did not mention the caveat that, due to the use of signed integer conversions, not all values representable in the original floating-point type are supported.
2020-08-04 16:28:23 +00:00
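The failure mode is easiest to see in a scalar floor that goes through a `float -> int32 -> float` round trip, which is roughly what a vector implementation built on integer conversion instructions does. The sketch below is hypothetical, not the NEON code. Its guard relies on the fact that every `float` with magnitude of at least 2^23 is already an integer, so large values never reach the conversion.

```cpp
#include <cmath>
#include <cstdint>

// Floor via a float -> int32 -> float round trip. The conversion is only
// meaningful while the value fits in an int32. The guard returns any value
// with magnitude >= 2^23 unchanged (such floats are already integers);
// the same test also passes infinities and NaN through.
inline float floor_sketch(float x) {
  if (!(std::fabs(x) < 8388608.0f)) return x;  // 2^23
  float t = static_cast<float>(static_cast<std::int32_t>(x));  // truncates toward zero
  if (t > x) t -= 1.0f;        // negative non-integers truncate upward
  return std::copysign(t, x);  // keeps the sign of -0.0f
}
```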
David Tellenbach
5e484fa11d Fix StlDeque for GCC 10
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641 Fix undefined behavior of the BF16 union in AVX512. 2020-07-29 02:20:21 +00:00
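In C++, reading a union member other than the one last written is undefined behavior. Copying the bytes with `std::memcpy` is the portable alternative. The helpers below are hypothetical illustrations of that pattern for bfloat16, not the AVX512 code.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret bits with std::memcpy instead of type punning through a
// union; memcpy between objects of the same size is well defined.
inline float bf16_bits_to_float(std::uint16_t h) {
  // A bfloat16 value is the top half of the corresponding float.
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

inline std::uint32_t float_to_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}
```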