eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	c68bd2fa7a	Cleanup	2018-11-30 14:32:31 +01:00
Gael Guennebaud	f91500d303	Fix pandnot order in AVX512	2018-11-30 14:32:06 +01:00
Gael Guennebaud	b477d60bc6	Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)	2018-11-30 11:26:30 +01:00
Gael Guennebaud	e19ece822d	Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)	2018-11-28 17:56:24 +01:00
Gael Guennebaud	41052f63b7	same for pmax	2018-11-28 17:17:28 +01:00
Gael Guennebaud	3e95e398b6	pmin/pmax on SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions	2018-11-28 17:14:20 +01:00
Gael Guennebaud	aa6097395b	Add missing SSE/AVX type-casting in AVX512 mode	2018-11-28 16:09:08 +01:00
Gael Guennebaud	48fe78c375	bug #1630 : fix linspaced when requesting smaller packet size than default one.	2018-11-28 13:15:06 +01:00
Eugene Zhulenev	80f1651f35	Use explicit packet type in SSE/PacketMath pldexp	2018-11-27 17:25:49 -08:00
Benoit Jacob	a4159dba08	do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4).	2018-11-27 16:53:14 -05:00
Gael Guennebaud	b131a4db24	bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.	2018-11-27 23:45:00 +01:00
Gael Guennebaud	a1a5fbbd21	Update pshiftleft to pass the shift as a true compile-time integer.	2018-11-27 22:57:30 +01:00
Gael Guennebaud	fa7fd61eda	Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv	2018-11-27 22:41:51 +01:00
Rasmus Munk Larsen	08edbc8cfe	Merged in bjacob/eigen/fixbuild (pull request PR-549) fix the build on 64-bit ARM when NEON is disabled	2018-11-27 20:14:12 +00:00
Benoit Jacob	7b1cb8a440	fix the build on 64-bit ARM when NEON is disabled	2018-11-27 11:11:02 -05:00
Gael Guennebaud	b5695a6008	Unify Altivec/VSX pexp(double) with default implementation	2018-11-27 13:53:05 +01:00
Gael Guennebaud	7655a8af6e	cleanup	2018-11-26 23:21:29 +01:00
Gael Guennebaud	502f92fa10	Unify SSE and AVX pexp for double.	2018-11-26 23:12:44 +01:00
Gael Guennebaud	4a347a0054	Unify NEON's pexp with generic implementation	2018-11-26 22:15:44 +01:00
Gael Guennebaud	5c8406babc	Unify Altivec/VSX's pexp with generic implementation	2018-11-26 16:47:13 +01:00
Gael Guennebaud	cf8b85d5c5	Unify SSE and AVX implementation of pexp	2018-11-26 16:36:19 +01:00
Gael Guennebaud	c2f35b1b47	Unify Altivec/VSX's plog with generic implementation, and enable it!	2018-11-26 15:58:11 +01:00
Gael Guennebaud	c24e98e6a8	Unify NEON's plog with generic implementation	2018-11-26 15:02:16 +01:00
Gael Guennebaud	2c44c40114	First step toward a unification of packet log implementation, currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.	2018-11-26 14:21:24 +01:00
Gael Guennebaud	5f6045077c	Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"	2018-11-26 14:14:07 +01:00
Gael Guennebaud	382279eb7f	Extend unit test to recursively check half-packet types and non packet types	2018-11-26 14:10:07 +01:00
Gael Guennebaud	0836a715d6	bug #1611 : fix plog(0) on NEON	2018-11-26 09:08:38 +01:00
Patrik Huber	95566eeed4	Fix typos	2018-11-23 22:22:14 +00:00
Gael Guennebaud	e3b22a6bd0	merge	2018-11-23 16:06:21 +01:00
Gael Guennebaud	ccabdd88c9	Fix reserved usage of double __ in macro names	2018-11-23 16:01:47 +01:00
Gael Guennebaud	572d62697d	check two ctors	2018-11-23 15:37:09 +01:00
Gael Guennebaud	354f14293b	Fix double = bool !	2018-11-23 15:12:06 +01:00
Gael Guennebaud	a7842daef2	Fix several uninitialized members from ctor	2018-11-23 15:10:28 +01:00
Christoph Hertzberg	ea60a172cf	Add default constructor to Bar to make test compile again with clang-3.8	2018-11-23 14:24:22 +01:00
Christoph Hertzberg	806352d844	Small typo found by Patrik Huber (pull request PR-547)	2018-11-23 12:34:27 +00:00
Gael Guennebaud	a476054879	bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup	2018-11-23 10:25:19 +01:00
Gael Guennebaud	c685fe9838	Move regression test to right unit test file	2018-11-21 15:59:47 +01:00
Gael Guennebaud	4b2cebade8	Workaround weird MSVC bug	2018-11-21 15:53:37 +01:00
Christoph Hertzberg	0ec8afde57	Fixed most conversion warnings in MatrixFunctions module	2018-11-20 16:23:28 +01:00
Gael Guennebaud	6a510fe69c	Make MaxPacketSize a true upper bound, even for fixed-size inputs	2018-11-16 11:25:32 +01:00
Gael Guennebaud	43c987b1c1	Add explicit regression test for bug #1622	2018-11-16 11:24:51 +01:00
Mark D Ryan	670d56441c	PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals Commit `aa110e681b` optimised the multiplication of small dynamically sized matrices by restricting the packet size to a maximum of 4, increasing the chances that SIMD instructions are used in the computation. However, it introduced a mismatch between the packet size and the requestedAlignment. This mismatch can lead to crashes when the destination is not aligned. This patch fixes the issue by ensuring that the AssignmentTraits are correctly computed when using a restricted packet size. * * * Bind LinearPacketType to MaxPacketSize This commit applies any packet size limit specified when instantiating copy_using_evaluator_traits to the LinearPacketType, providing that the size of the destination is not known at compile time. * * * Add unit test for restricted packet assignment A new unit test is added to check that multiplication of small dynamically sized matrices works correctly when the packet size is restricted to 4 and the destination is unaligned.	2018-11-13 16:15:08 +01:00
Nikolaus Demmel	3dc0845046	Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES	2018-11-14 18:11:30 +01:00
Gael Guennebaud	7fddc6a51f	typo	2018-11-14 14:43:18 +01:00
Gael Guennebaud	449f948b2a	help doxygen linking to DenseBase::NullaryExpr	2018-11-14 14:42:59 +01:00
Gael Guennebaud	4263f23c28	Improve doc on multi-threading and warn about hyper-threading	2018-11-14 14:42:29 +01:00
Gael Guennebaud	db529ae4ec	doxygen does not like \addtogroup and \ingroup in the same line	2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen	72928a2c8a	Merged in rmlarsen/eigen2 (pull request PR-543) Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. Approved-by: Eugene Zhulenev <ezhulenev@google.com>	2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen	cda479d626	Remove accidental changes.	2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen	719d9aee65	Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.	2018-11-12 17:46:02 -08:00
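
The parallel memcpy described in commit 719d9aee65 can be illustrated with a minimal sketch. This is not the code from that commit and does not use Eigen's TensorThreadPoolDevice API: the function name parallel_memcpy, the min_block threshold, and the use of plain std::thread instead of a thread pool are all assumptions made for illustration. Only the cap of 4 threads comes from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (not Eigen's API): copies n bytes from src to dst,
// splitting the work across at most max_threads threads. The default cap
// of 4 mirrors the limit mentioned in the commit message; min_block is an
// assumed threshold below which extra threads are not worth starting.
inline void parallel_memcpy(void* dst, const void* src, std::size_t n,
                            std::size_t max_threads = 4,
                            std::size_t min_block = 32 * 1024) {
  // Use one thread per min_block bytes, but never more than max_threads.
  const std::size_t blocks = std::max<std::size_t>(1, n / min_block);
  const std::size_t num_threads = std::min(max_threads, blocks);
  if (num_threads <= 1) {
    std::memcpy(dst, src, n);
    return;
  }

  // Ceiling division so that num_threads chunks cover all n bytes.
  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  char* d = static_cast<char*>(dst);
  const char* s = static_cast<const char*>(src);

  std::vector<std::thread> workers;
  workers.reserve(num_threads - 1);
  for (std::size_t t = 1; t < num_threads; ++t) {
    const std::size_t begin = t * chunk;
    if (begin >= n) break;
    const std::size_t len = std::min(chunk, n - begin);
    workers.emplace_back([=] { std::memcpy(d + begin, s + begin, len); });
  }

  // The calling thread copies the first chunk itself, then waits.
  std::memcpy(d, s, std::min(chunk, n));
  for (auto& w : workers) w.join();
}
```

Calling parallel_memcpy(dst, src, n) with the defaults never starts more than three extra threads, on the reasoning given in the commit message that additional threads only contend for memory bandwidth.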

10237 Commits