Rasmus Munk Larsen
7401e2541d
Fix compilation error for logical packet ops with older compilers.
2019-01-16 14:43:33 -08:00
Gael Guennebaud
0f028f61cb
GEBP: fix swapped kernel mode with AVX512 and complex scalars
2019-01-16 22:26:38 +01:00
Gael Guennebaud
e118ce86fd
GEBP: cleanup logic to choose between a 4 packets of 1 packet
2019-01-16 21:47:42 +01:00
Gael Guennebaud
70e133333d
bug #1661 : fix regression in GEBP and AVX512
2019-01-16 21:22:20 +01:00
Gael Guennebaud
502f717980
bug #1646 : disable aliasing detection for empty and 1x1 expression
2019-01-16 14:33:45 +01:00
Gael Guennebaud
0b466b6933
bug #1633 : use proper type for madd temporaries, factorize RhsPacketx4.
2019-01-16 13:50:13 +01:00
Renjie Liu
dbfcceabf5
Bug: 1633: refactor gebp kernel and optimize for neon
2019-01-16 12:51:36 +08:00
Gael Guennebaud
2b70b2f570
Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry
2019-01-15 22:50:42 +01:00
Gael Guennebaud
2c2c114995
Silent maybe-uninitialized warnings by gcc
2019-01-15 16:53:15 +01:00
Gael Guennebaud
6ec6bf0b0d
Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests)
2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24
bug #1592 : makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests)
2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e
Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols.
...
This fix VectorXd(n).middleCol(0,0).outerSize() which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
6cf7afa3d9
Typo
2019-01-15 11:04:37 +01:00
Rasmus Larsen
7b3aab0936
Merged in rmlarsen/eigen (pull request PR-570)
...
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Gael Guennebaud
250dcd1fdb
bug #1652 : fix position of EIGEN_ALIGN16 attributes in Neon and Altivec
2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae
Merged eigen/eigen into default
2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d
AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers
2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe
AVX512 (r)sqrt(double) was mistakenly disabled with clang and others
2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90
fix warning
2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3
Doc: add Isometry in the list of supported Mode of Transform<>
2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a
Initialize isometric transforms like affine transforms.
...
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
4356a55a61
PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
...
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
f566724023
Fix StorageIndex FIXME in dense LU solvers
2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
1c6e6e2c3f
Merge.
2019-01-11 17:47:11 -08:00
Rasmus Munk Larsen
28ba1b2c32
Add support for inverse hyperbolic functions.
...
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
89c4001d6f
Fix warnings in ptrue for complex and half types.
2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen
a49d01edba
Fix warnings in ptrue for complex and half types.
2019-01-11 13:18:17 -08:00
Rasmus Munk Larsen
df29511ac0
Fix merge.
2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen
9396ace46b
Merge.
2019-01-11 10:28:52 -08:00
Rasmus Larsen
74882471d0
Merged eigen/eigen into default
2019-01-11 10:20:55 -08:00
Gael Guennebaud
9005f0111f
Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fix a compilation issue with gcc-4.7.
2019-01-11 17:10:54 +01:00
Mark D Ryan
3c9add6598
Remove reinterpret_cast from AVX512 complex implementation
...
The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&)
ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working
correctly. They're used to convert the kernel parameters to
PacketBlock<Packet8d,T>& so that the complex number versions of
ptranspose can be written using the existing double implementations.
Unfortunately, they don't seem to work and are responsible for 9 unit
test failures in the AVX512 build of tensorflow master. This commit
fixes the issue by manually initialising PacketBlock<Packet8d,T>
variables with the contents of the kernel parameter before calling
the double version of ptranspose, and then copying the resulting
values back into the kernel parameter before returning.
2019-01-11 14:02:09 +01:00
Rasmus Munk Larsen
fcfced13ed
Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
e15bb785ad
Collapsed revision
...
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5
Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526
Collapsed revision
...
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514
Undo useless diffs.
2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8
Simplify a bit.
2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6
Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
2019-01-09 16:17:08 -08:00
Rasmus Larsen
cb3c059fa4
Merged eigen/eigen into default
2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3
bug #1654 : fix compilation with cuda and no c++11
2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74
fix plog(+inf) with AVX512
2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7
Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSE
2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e
fix warning
2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b
Add missing pcmp_lt and others for AVX512
2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd
bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
...
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen
055f0b73db
Add support for pcmp_eq and pnot, including for complex types.
2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41
Explicitly set fill character when printing aligned data to ostream
2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd
PR560: Fix the AVX512f only builds
...
Commit c53eececb0
introduced AVX512 support for complex numbers but required
avx512dq to build. Commit 1d683ae2f5
fixed some but not, it would seem all,
of the hard avx512dq dependencies. Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3. Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps. This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
60d3fe9a89
One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)
2018-12-24 13:05:03 +01:00