eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Christoph Hertzberg	6dd93f7e3b	Make code compile again for older compilers. See https://stackoverflow.com/questions/7411515/	2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves	1024a70e82	gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs. The patch works by altering the gebp lhs packing routines to also consider ½ and ¼ packet length rows when packing, besides the original whole-packet and row-by-row attempts. Finally, gebp itself will try to fit a fraction of a packet at a time if: i) ½ and/or ¼ packets are available for the current context (e.g. AVX2- and SSE-sized SIMD registers for x86); ii) the matrix's height is favorable to it (it may be too small in that dimension to take full advantage of the current/maximum packet width, or the last rows may benefit from smaller packets before gebp goes row-by-row). This helps mitigate huge slowdowns one had on AVX512 builds when compared to AVX2 ones, for some dimensions. Gains top out at an extra 1x in throughput. This patch is a complement to changeset `4ad359237a`. Since packing is changed, Eigen users that rely on very low-level APIs, like TensorFlow, will have to be adapted to work with the changes.	2018-12-21 11:03:18 -08:00
Gustavo Lima Chaves	e763fcd09e	Introducing "vectorized" byte on unpacket_traits structs. This is a preparation for a change on gebp_traits, where a new template argument will be introduced to dictate the packet size, so it won't be bound to the current/max packet size only anymore. With packet types defined early in gebp_traits, one now has to act on packet types, not scalars, for the enum values defined in that class. One approach for reaching the vectorizable/size properties needed there could be getting the packet's scalar again with unpacket_traits<>, then the size/Vectorizable enum entries from packet_traits<>. It turns out guards like "#ifndef EIGEN_VECTORIZE_AVX512" in AVX/PacketMath.h will hide smaller packet variations of packet_traits<> for some types (and it makes sense to keep that). In other words, one can't go back to the scalar and create a new PacketType, as this will always lead to the maximum packet type for the architecture. The least costly/invasive solution, thus, is to add the vectorizable info to every unpacket_traits struct as well.	2018-12-19 14:24:44 -08:00
Gael Guennebaud	efa4c9c40f	bug #1615 : slightly increase the default unrolling limit to compensate for changeset `101ea26f5e`. This solves a performance regression with clang and 3x3 matrix products.	2018-12-13 10:42:39 +01:00
Gael Guennebaud	f20c991679	add changesets related to matrix product perf.	2018-12-13 10:33:29 +01:00
Rasmus Munk Larsen	dd6d65898a	Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0.	2018-12-12 14:45:31 -08:00
Gael Guennebaud	f582ea3579	Fix compilation with expression template scalar type.	2018-12-12 22:47:00 +01:00
Gael Guennebaud	cfc70dc13f	Add regression test for bug #1174	2018-12-12 18:03:31 +01:00
Gael Guennebaud	2de8da70fd	bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.	2018-12-12 17:30:08 +01:00
Gael Guennebaud	72c0bbe2bd	Simplify handling of tests that must fail to compile. Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).	2018-12-12 15:48:36 +01:00
Gael Guennebaud	37c91e1836	bug #1644 : fix warning	2018-12-11 22:07:20 +01:00
Gael Guennebaud	f159cf3d75	Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels. With a 6pX4 kernel (not committed yet), this provides a +20% speedup.	2018-12-11 15:36:27 +01:00
Gael Guennebaud	0a7e7af6fd	Properly set the number of registers for AVX512	2018-12-11 15:33:17 +01:00
Gael Guennebaud	7166496f70	bug #1643 : fix compilation issue with gcc and no optimization	2018-12-11 13:24:42 +01:00
Gael Guennebaud	0d90637838	enable spilling workaround on architectures with SSE/AVX	2018-12-10 23:22:44 +01:00
Gael Guennebaud	cf697272e1	Remove debug code.	2018-12-09 23:05:46 +01:00
Gael Guennebaud	450dc97c6b	Various fixes in polynomial solver and its unit tests: - cleanup noise in imaginary part of real roots - take into account the magnitude of the derivative to check roots. - use <= instead of < at appropriate places	2018-12-09 22:54:39 +01:00
Gael Guennebaud	348bb386d1	Enable "old" CMP0026 policy (not perfect, but better than dozens of warnings)	2018-12-08 18:59:51 +01:00
Gael Guennebaud	bff90bf270	workaround "may be used uninitialized" warning	2018-12-08 18:58:28 +01:00
Gael Guennebaud	81c27325ae	bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512	2018-12-08 14:27:48 +01:00
Gael Guennebaud	426bce7529	fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target	2018-12-08 09:44:21 +01:00
Gael Guennebaud	cd25b538ab	Fix noise in sparse_basic_3 (numerical cancellation)	2018-12-08 00:13:37 +01:00
Gael Guennebaud	efaf03bf96	Fix noise in lu unit test	2018-12-08 00:05:03 +01:00
Gael Guennebaud	956678a4ef	bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.	2018-12-07 18:03:36 +01:00
Gael Guennebaud	7b6d0ff1f6	Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error.	2018-12-07 15:14:50 +01:00
Gael Guennebaud	f233c6194d	bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA	2018-12-07 10:01:09 +01:00
Gael Guennebaud	ae59a7652b	bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA	2018-12-07 09:23:28 +01:00
Gael Guennebaud	4e7746fe22	bug #1636 : fix gemm performance issue with gcc>=6 and no FMA	2018-12-07 09:15:46 +01:00
Gael Guennebaud	cbf2f4b7a0	AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only	2018-12-06 18:21:56 +01:00
Gael Guennebaud	1d683ae2f5	Fix compilation with avx512f only, i.e., no AVX512DQ	2018-12-06 18:11:07 +01:00
Gael Guennebaud	aab749b1c3	fix test regarding AVX512 vectorization of complexes.	2018-12-06 16:55:00 +01:00
Gael Guennebaud	c53eececb0	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Gael Guennebaud	3fba59ea59	temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!	2018-12-06 00:13:26 +01:00
Gael Guennebaud	1ac2695ef7	bug #1636 : fix compilation with some ABI versions.	2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen	47d8b741b2	#elif -> #else to fix GPU build.	2018-12-05 13:19:31 -08:00
Rasmus Munk Larsen	8a02883d58	Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554) Fix tensor contraction on AVX512 builds Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-12-05 18:19:32 +00:00
Gael Guennebaud	acc3459a49	Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping.	2018-12-05 17:17:23 +01:00
Gael Guennebaud	e2e897298a	Fix page nesting	2018-12-05 17:13:46 +01:00
Christoph Hertzberg	c1d356e8b4	bug #1635 : Use infinity from Numtraits instead of creating it manually.	2018-12-05 15:01:04 +01:00
Mark D Ryan	36f8f6d0be	Fix evalShardedByInnerDim for AVX512 builds evalShardedByInnerDim ensures that the values it passes for start_k and end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel does not work correctly when the values of k are not multiples of the packet_size. While this precaution works for AVX builds, it is insufficient for AVX512 builds where the maximum packet size is 16. The result is slightly incorrect float32 contractions on AVX512 builds. This commit fixes the problem by ensuring that k is always a multiple of the packet_size if the packet_size is > 8.	2018-12-05 12:29:03 +01:00
Rasmus Munk Larsen	b57b31cce9	Merged in ezhulenev/eigen-01 (pull request PR-553) Do not disable alignment with EIGEN_GPUCC Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-12-04 23:47:19 +00:00
Eugene Zhulenev	0bb15bb6d6	Update checks in ConfigureVectorization.h	2018-12-03 17:10:40 -08:00
Eugene Zhulenev	fd0fbfa9b5	Do not disable alignment with EIGEN_GPUCC	2018-12-03 15:54:10 -08:00
Christoph Hertzberg	919414b9fe	bug #785 : Make Cholesky decomposition work for empty matrices	2018-12-03 16:18:15 +01:00
Gael Guennebaud	0ea7ae7213	Add missing padd for Packet8i (it was implicitly generated by clang and gcc)	2018-11-30 21:52:25 +01:00
Gael Guennebaud	ab4df3e6ff	bug #1634 : remove double copy in move-ctor of non movable Matrix/Array	2018-11-30 21:25:51 +01:00
Gael Guennebaud	c785464430	Add packet sin and cos to Altivec/VSX and NEON	2018-11-30 16:21:33 +01:00
Gael Guennebaud	69ace742be	Several improvements regarding packet-bitwise operations: - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...)	2018-11-30 15:56:08 +01:00
Gael Guennebaud	fa87f9d876	Add psin/pcos on AVX512 -> almost for free, at last!	2018-11-30 14:33:13 +01:00
Gael Guennebaud	c68bd2fa7a	Cleanup	2018-11-30 14:32:31 +01:00


10437 Commits