Gael Guennebaud
4a347a0054
Unify NEON's pexp with generic implementation
2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc
Unify Altivec/VSX's pexp with generic implementation
2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5
Unify SSE and AVX implementation of pexp
2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47
Unify Altivec/VSX's plog with generic implementation, and enable it!
2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8
Unify NEON's plog with generic implementation
2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114
First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
...
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c
Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"
2018-11-26 14:14:07 +01:00
Gael Guennebaud
382279eb7f
Extend unit test to recursively check half-packet types and non packet types
2018-11-26 14:10:07 +01:00
Gael Guennebaud
0836a715d6
bug #1611 : fix plog(0) on NEON
2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4
Fix typos
2018-11-23 22:22:14 +00:00
Gael Guennebaud
e3b22a6bd0
merge
2018-11-23 16:06:21 +01:00
Gael Guennebaud
ccabdd88c9
Fix reserved usage of double __ in macro names
2018-11-23 16:01:47 +01:00
Gael Guennebaud
572d62697d
check two ctors
2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b
Fix double = bool !
2018-11-23 15:12:06 +01:00
Gael Guennebaud
a7842daef2
Fix several uninitialized member from ctor
2018-11-23 15:10:28 +01:00
Christoph Hertzberg
ea60a172cf
Add default constructor to Bar to make test compile again with clang-3.8
2018-11-23 14:24:22 +01:00
Christoph Hertzberg
806352d844
Small typo found be Patrick Huber (pull request PR-547)
2018-11-23 12:34:27 +00:00
Gael Guennebaud
a476054879
bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup
2018-11-23 10:25:19 +01:00
Gael Guennebaud
c685fe9838
Move regression test to right unit test file
2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8
Workaround weird MSVC bug
2018-11-21 15:53:37 +01:00
Christoph Hertzberg
0ec8afde57
Fixed most conversion warnings in MatrixFunctions module
2018-11-20 16:23:28 +01:00
Gael Guennebaud
6a510fe69c
Make MaxPacketSize a true upper bound, even for fixed-size inputs
2018-11-16 11:25:32 +01:00
Gael Guennebaud
43c987b1c1
Add explicit regression test for bug #1622
2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c
PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
...
Commit aa110e681b
optimised the multiplication of small dyanmically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize
This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment
A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046
Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES
2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f
typo
2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a
help doxygen linking to DenseBase::NulllaryExpr
2018-11-14 14:42:59 +01:00
Gael Guennebaud
4263f23c28
Improve doc on multi-threading and warn about hyper-threading
2018-11-14 14:42:29 +01:00
Gael Guennebaud
db529ae4ec
doxygen does not like \addtogroup and \ingroup in the same line
2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen
72928a2c8a
Merged in rmlarsen/eigen2 (pull request PR-543)
...
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626
Remove accidental changes.
2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen
77b447c24e
Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX.
2018-11-12 13:42:24 -08:00
Gael Guennebaud
c81bdbdadc
Add manual doc on STL-compatible iterators
2018-11-12 22:06:33 +01:00
Gael Guennebaud
0105146915
Fix warning in c++03
2018-11-10 09:11:38 +01:00
Rasmus Munk Larsen
93f9988a7e
A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles.
2018-11-09 14:15:32 -08:00
Gael Guennebaud
784a3f13cf
bug #1619 : fix mixing of const and non-const generic iterators
2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba
bug #1619 : make const and non-const iterators compatible
2018-11-09 16:49:19 +01:00
Gael Guennebaud
fbd6e7b025
add missing ref to a.zeta(b)
2018-11-09 13:53:42 +01:00
Gael Guennebaud
dffd1e11de
Limit the size of the toc
2018-11-09 13:52:34 +01:00
Gael Guennebaud
a88e0a0e95
Update doxy hacks wrt doxygen 1.8.13/14
2018-11-09 13:52:10 +01:00
Gael Guennebaud
bd9a00718f
Let doxygen sees lastN
2018-11-09 11:35:48 +01:00
Gael Guennebaud
d7c644213c
Add and update manual pages for slicing, indexing, and reshaping.
2018-11-09 11:35:27 +01:00
Gael Guennebaud
a368848473
Recent xcode versions does support EIGEN_HAS_STATIC_ARRAY_TEMPLATE
2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6
Fix max-size in indexed-view
2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff
Merged in glchaves/eigen (pull request PR-539)
...
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gael Guennebaud
995730fc6c
Add option to disable plot generation
2018-11-07 00:41:16 +01:00
Gustavo Lima Chaves
4ad359237a
Vectorize row-by-row gebp loop iterations on 16 packets as well
...
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Gael Guennebaud
9d318b92c6
add unit tests for bug #1619
2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e
bug #1617 : Fix SolveTriangular.solveInPlace crashing for empty matrix.
...
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00