Gael Guennebaud
c68bd2fa7a
Cleanup
2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303
Fix pandnot order in AVX512
2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6
Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)
2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d
Disable gcc's fma workaround for gcc >= 8 (based on GEMM benchmarks)
2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7
same for pmax
2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6
pmin/pmax on SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions
2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b
Add missing SSE/AVX type-casting in AVX512 mode
2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375
bug #1630 : fix linspaced when requesting smaller packet size than default one.
2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35
Use explicit packet type in SSE/PacketMath pldexp
2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08
do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4).
2018-11-27 16:53:14 -05:00
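The idea of reading only the bytes known to exist can be sketched in portable C++. This is a minimal illustration, not the NEON code from the patch: `Float2Sketch` and `load_one_float` are hypothetical names, and zero-filling the unused lane is an assumption made here so that no uninitialized value is ever read.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a 2-lane float register (8 bytes).
struct Float2Sketch {
  float lane[2];
};

// Reads exactly sizeof(float) bytes from src, never the 8 bytes a full
// 2-lane load would touch. The second lane is zero-filled (an assumption of
// this sketch) so tools such as ASan or MSan see only initialized data.
inline Float2Sketch load_one_float(const float* src) {
  Float2Sketch r = {{0.0f, 0.0f}};
  std::memcpy(&r.lane[0], src, sizeof(float));
  return r;
}
```

Calling `load_one_float(&x)` on a single `float x` is safe even when `x` is the last element of its buffer, which is the situation the commit message describes.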
Gael Guennebaud
b131a4db24
bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.
2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21
Update pshiftleft to pass the shift as a true compile-time integer.
2018-11-27 22:57:30 +01:00
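Passing the shift as a true compile-time integer can be sketched with a non-type template parameter. The scalar function below is a hypothetical illustration (`pshiftleft_sketch` is not an Eigen function); the point is that the count is a constant expression, which is what immediate-operand SIMD shift instructions require.

```cpp
#include <cassert>
#include <cstdint>

// The shift count N is a template parameter, so it is known at compile time
// and can be validated with static_assert instead of at run time.
template <int N>
inline std::uint32_t pshiftleft_sketch(std::uint32_t a) {
  static_assert(N >= 0 && N < 32, "shift count must be in [0, 32)");
  return a << N;
}
```

A call then reads `pshiftleft_sketch<3>(x)` rather than `pshiftleft_sketch(x, 3)`.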
Gael Guennebaud
fa7fd61eda
Unify SSE/AVX psin functions.
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
- add packet-float to packet-int type traits
- add packet float<->int reinterpret casts
- add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
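A select operation of the kind mentioned in the list above can be sketched on plain 32-bit integers. This is a hypothetical scalar model, not the blendv-based AVX code: it assumes the mask is all ones or all zeros per lane and that the argument order is (mask, a, b), returning `a` where the mask is set.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select: keeps the bits of a where mask is 1 and the bits of b
// where mask is 0. Argument order is an assumption of this sketch.
inline std::uint32_t pselect_sketch(std::uint32_t mask, std::uint32_t a,
                                    std::uint32_t b) {
  return (a & mask) | (b & ~mask);
}
```

A hardware blend instruction performs the same choice in one step instead of three bitwise operations, which is presumably where the speedup comes from.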
Rasmus Munk Larsen
08edbc8cfe
Merged in bjacob/eigen/fixbuild (pull request PR-549)
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 20:14:12 +00:00
Benoit Jacob
7b1cb8a440
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008
Unify Altivec/VSX pexp(double) with default implementation
2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e
cleanup
2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10
Unify SSE and AVX pexp for double.
2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054
Unify NEON's pexp with generic implementation
2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc
Unify Altivec/VSX's pexp with generic implementation
2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5
Unify SSE and AVX implementation of pexp
2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47
Unify Altivec/VSX's plog with generic implementation, and enable it!
2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8
Unify NEON's plog with generic implementation
2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114
First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
To this end, I added the following functions: pzero, pcmp_*, pfrexp, and pset1frombits.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c
Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"
2018-11-26 14:14:07 +01:00
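The operand-order issue behind this change can be sketched on scalars. The x86 and-not intrinsics (for example `_mm_andnot_ps(a, b)`) compute `(~a) & b`, so obtaining "A and not B" means swapping the operands. The function names below are hypothetical and model a single 32-bit lane only.

```cpp
#include <cassert>
#include <cstdint>

// Models the operand order of the x86 and-not intrinsics: (~a) & b.
inline std::uint32_t andnot_intrinsic_order(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// "A and not B": a & ~b, obtained by swapping the operands passed to the
// intrinsic-style helper.
inline std::uint32_t pandnot_sketch(std::uint32_t a, std::uint32_t b) {
  return andnot_intrinsic_order(b, a);
}
```

With `a = 0xF0F0` and `b = 0x00FF`, `pandnot_sketch(a, b)` yields `0xF000`, whereas the unswapped intrinsic order yields `0x000F`.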
Gael Guennebaud
382279eb7f
Extend unit test to recursively check half-packet types and non packet types
2018-11-26 14:10:07 +01:00
Gael Guennebaud
0836a715d6
bug #1611 : fix plog(0) on NEON
2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4
Fix typos
2018-11-23 22:22:14 +00:00
Gael Guennebaud
e3b22a6bd0
merge
2018-11-23 16:06:21 +01:00
Gael Guennebaud
ccabdd88c9
Fix reserved usage of double __ in macro names
2018-11-23 16:01:47 +01:00
Gael Guennebaud
572d62697d
check two ctors
2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b
Fix double = bool !
2018-11-23 15:12:06 +01:00
Gael Guennebaud
a7842daef2
Fix several uninitialized members in ctors
2018-11-23 15:10:28 +01:00
Christoph Hertzberg
ea60a172cf
Add default constructor to Bar to make test compile again with clang-3.8
2018-11-23 14:24:22 +01:00
Christoph Hertzberg
806352d844
Small typo found by Patrik Huber (pull request PR-547)
2018-11-23 12:34:27 +00:00
Gael Guennebaud
a476054879
bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup
2018-11-23 10:25:19 +01:00
Gael Guennebaud
c685fe9838
Move regression test to right unit test file
2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8
Workaround weird MSVC bug
2018-11-21 15:53:37 +01:00
Christoph Hertzberg
0ec8afde57
Fixed most conversion warnings in MatrixFunctions module
2018-11-20 16:23:28 +01:00
Gael Guennebaud
6a510fe69c
Make MaxPacketSize a true upper bound, even for fixed-size inputs
2018-11-16 11:25:32 +01:00
Gael Guennebaud
43c987b1c1
Add explicit regression test for bug #1622
2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c
PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
Commit aa110e681b optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize
This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, provided that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment
A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046
Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES
2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f
typo
2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a
help doxygen linking to DenseBase::NullaryExpr
2018-11-14 14:42:59 +01:00
Gael Guennebaud
4263f23c28
Improve doc on multi-threading and warn about hyper-threading
2018-11-14 14:42:29 +01:00
Gael Guennebaud
db529ae4ec
doxygen does not like \addtogroup and \ingroup on the same line
2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen
72928a2c8a
Merged in rmlarsen/eigen2 (pull request PR-543)
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626
Remove accidental changes.
2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
2018-11-12 17:46:02 -08:00
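The partitioning step behind a capped parallel copy can be sketched as follows. This is not the TensorThreadPoolDevice code: `plan_copy_blocks` and `blocked_memcpy_sketch` are hypothetical names, the even split is an assumption, and the blocks are copied sequentially here where a thread pool would schedule one task per block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Splits n bytes into at most max_blocks contiguous (offset, length) blocks.
// The default cap of 4 mirrors the thread limit named in the commit message.
inline std::vector<std::pair<std::size_t, std::size_t>> plan_copy_blocks(
    std::size_t n, std::size_t max_blocks = 4) {
  std::vector<std::pair<std::size_t, std::size_t>> blocks;
  if (n == 0 || max_blocks == 0) return blocks;
  const std::size_t count = std::min(n, max_blocks);
  const std::size_t base = n / count;
  const std::size_t extra = n % count;  // first `extra` blocks get one more byte
  std::size_t offset = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t len = base + (i < extra ? 1 : 0);
    blocks.emplace_back(offset, len);
    offset += len;
  }
  return blocks;
}

// Copies block by block. In a threaded version each iteration would be
// handed to a worker; the blocks do not overlap, so no locking is needed.
inline void blocked_memcpy_sketch(void* dst, const void* src, std::size_t n) {
  for (const auto& b : plan_copy_blocks(n)) {
    std::memcpy(static_cast<char*>(dst) + b.first,
                static_cast<const char*>(src) + b.first, b.second);
  }
}
```

Capping the block count keeps the number of concurrent copies at four regardless of buffer size, which matches the observation that extra threads only contend for memory bandwidth.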