Gael Guennebaud
6cf7afa3d9
Typo
2019-01-15 11:04:37 +01:00
Rasmus Larsen
7b3aab0936
Merged in rmlarsen/eigen (pull request PR-570)
...
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Gael Guennebaud
250dcd1fdb
bug #1652 : fix position of EIGEN_ALIGN16 attributes in Neon and Altivec
2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae
Merged eigen/eigen into default
2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d
AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers
2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe
AVX512 (r)sqrt(double) was mistakenly disabled with clang and others
2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90
fix warning
2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3
Doc: add Isometry in the list of supported Mode of Transform<>
2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a
Initialize isometric transforms like affine transforms.
...
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
4356a55a61
PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
...
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
f566724023
Fix StorageIndex FIXME in dense LU solvers
2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
1c6e6e2c3f
Merge.
2019-01-11 17:47:11 -08:00
Rasmus Munk Larsen
28ba1b2c32
Add support for inverse hyperbolic functions.
...
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
89c4001d6f
Fix warnings in ptrue for complex and half types.
2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen
a49d01edba
Fix warnings in ptrue for complex and half types.
2019-01-11 13:18:17 -08:00
Rasmus Munk Larsen
df29511ac0
Fix merge.
2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen
9396ace46b
Merge.
2019-01-11 10:28:52 -08:00
Rasmus Larsen
74882471d0
Merged eigen/eigen into default
2019-01-11 10:20:55 -08:00
Gael Guennebaud
9005f0111f
Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fix a compilation issue with gcc-4.7.
2019-01-11 17:10:54 +01:00
Mark D Ryan
3c9add6598
Remove reinterpret_cast from AVX512 complex implementation
...
The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&)
ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working
correctly. They're used to convert the kernel parameters to
PacketBlock<Packet8d,T>& so that the complex number versions of
ptranspose can be written using the existing double implementations.
Unfortunately, they don't seem to work and are responsible for 9 unit
test failures in the AVX512 build of tensorflow master. This commit
fixes the issue by manually initialising PacketBlock<Packet8d,T>
variables with the contents of the kernel parameter before calling
the double version of ptranspose, and then copying the resulting
values back into the kernel parameter before returning.
2019-01-11 14:02:09 +01:00
Rasmus Munk Larsen
fcfced13ed
Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
e15bb785ad
Collapsed revision
...
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5
Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526
Collapsed revision
...
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514
Undo useless diffs.
2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8
Simplify a bit.
2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6
Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
2019-01-09 16:17:08 -08:00
Rasmus Larsen
cb3c059fa4
Merged eigen/eigen into default
2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3
bug #1654 : fix compilation with cuda and no c++11
2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74
fix plog(+inf) with AVX512
2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7
Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSE
2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e
fix warning
2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b
Add missing pcmp_lt and others for AVX512
2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd
bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
...
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen
055f0b73db
Add support for pcmp_eq and pnot, including for complex types.
2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41
Explicitly set fill character when printing aligned data to ostream
2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd
PR560: Fix the AVX512f only builds
...
Commit c53eececb0
introduced AVX512 support for complex numbers but required
avx512dq to build. Commit 1d683ae2f5
fixed some but not, it would seem all,
of the hard avx512dq dependencies. Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3. Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps. This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
60d3fe9a89
One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)
2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510
Add EIGEN_STRONG_INLINE where required
2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8
Add missing pcmp_lt_or_nan for AVX512
2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a
Implement a faster fix for sin/cos of large entries that also correctly handle INF input.
2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8
Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)
2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb
Fix plog(+INF): it returned ~87 instead of +INF
2018-12-23 15:40:52 +01:00
Christoph Hertzberg
6dd93f7e3b
Make code compile again for older compilers.
...
See https://stackoverflow.com/questions/7411515/
2018-12-22 13:09:07 +01:00
Gael Guennebaud
efa4c9c40f
bug #1615 : slightly increase the default unrolling limit to compensate for changeset 101ea26f5e
...
.
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
f582ea3579
Fix compilation with expression template scalar type.
2018-12-12 22:47:00 +01:00
Gael Guennebaud
2de8da70fd
bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.
2018-12-12 17:30:08 +01:00
Gael Guennebaud
37c91e1836
bug #1644 : fix warning
2018-12-11 22:07:20 +01:00
Gael Guennebaud
f159cf3d75
Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
...
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd
Properly set the number of registers for AVX512
2018-12-11 15:33:17 +01:00