eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-27 07:29:52 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	94684721bd	Smarter block size computation	2014-04-18 15:35:34 +02:00
Gael Guennebaud	1388f4f9fd	Fix typo (was working with clang!)	2014-04-18 11:43:13 +02:00
Gael Guennebaud	6d665d446b	Fixes for fixed sizes and non vectorizable types	2014-04-17 23:26:34 +02:00
Gael Guennebaud	2c3c95990d	merge	2014-04-17 22:50:49 +02:00
Benoit Steiner	6d6df90c9a	Implemented the pgather/pscatter packet primitives for the arm/NEON architecture	2014-04-17 12:28:01 -07:00
Gael Guennebaud	9746396d1b	Optimize AVX pset1 for complexes and ploaddup	2014-04-17 20:51:04 +02:00
Gael Guennebaud	11fbdcbc38	Fix and optimize mixed products	2014-04-17 16:04:30 +02:00
Gael Guennebaud	0fa8290366	Optimize ploaddup for AVX	2014-04-17 16:02:27 +02:00
Gael Guennebaud	d936ddc3d1	Fallback to lazy products for very small ones.	2014-04-16 23:15:42 +02:00
Gael Guennebaud	de8336a9bc	Enable alloca on Mac OS X	2014-04-16 23:14:58 +02:00
Gael Guennebaud	d5a795f673	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.	2014-04-16 17:05:11 +02:00
Benoit Steiner	feaf7c7e6d	Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e., gcc 4.8).	2014-04-14 10:44:17 -07:00
Benoit Steiner	b446ff037e	Deleted some dead code.	2014-04-04 14:12:24 -07:00
Gael Guennebaud	8d0441052e	Finally, prefetching seems to help getting more stable performance	2014-03-31 10:42:19 +02:00
Gael Guennebaud	1c0728043a	Workaround alignment warnings	2014-03-30 22:43:47 +02:00
Gael Guennebaud	e497a27ddc	Optimize gebp kernel: 1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000) 2 - improve pipelining when dealing with the last rows of the lhs	2014-03-30 21:57:05 +02:00
Benoit Steiner	ad59ade116	Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size.	2014-03-28 12:11:23 -07:00
Gael Guennebaud	10aa14592a	Add a mechanism to recursively access half-size packet types	2014-03-28 10:18:04 +01:00
Gael Guennebaud	8d2bb2c20d	merge with default branch	2014-03-28 09:24:18 +01:00
Gael Guennebaud	c94fde118a	Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization)	2014-03-28 09:11:06 +01:00
Benoit Steiner	51e85c936d	Merged latest changes from parent.	2014-03-27 18:32:15 -07:00
Benoit Steiner	8a94cb3edd	Implemented the SSE version of the gather and scatter packet primitives.	2014-03-27 18:29:01 -07:00
Benoit Steiner	7f3162f707	Implemented the AVX version of the gather and scatter packet primitives.	2014-03-27 17:42:25 -07:00
Benoit Steiner	ee86679096	Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel.	2014-03-27 16:03:03 -07:00
Gael Guennebaud	58fe2fc2b2	enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...)	2014-03-27 23:38:50 +01:00
Benoit Steiner	729363114f	Fixed compilation error when FMA instructions are enabled.	2014-03-27 11:20:41 -07:00
Benoit Steiner	1697d7a179	Silenced "unused variable" warnings when compiling with FMA.	2014-03-27 11:00:47 -07:00
Benoit Steiner	3e1fe8e416	Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used, however this doesn't seem to impact performance.	2014-03-27 10:38:41 -07:00
Benoit Steiner	b776458ccb	Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product.	2014-03-27 10:02:24 -07:00
Benoit Steiner	c4902a3d01	Implemented the AVX version of the ptranspose packet primitive.	2014-03-27 09:34:51 -07:00
Gael Guennebaud	052aedd394	Implement pcplflip, palign, predux and the like for AVX/complexes	2014-03-27 14:47:00 +01:00
Gael Guennebaud	fb03b56647	Fix warning	2014-03-27 11:38:35 +01:00
Mark Borgerding	9ce0d78513	immintrin.h did not come until intel version 11	2014-03-26 22:26:07 -04:00
Benoit Steiner	a419cea4a0	Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions.	2014-03-26 19:03:07 -07:00
Benoit Steiner	14bc4b9704	Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode).	2014-03-26 17:35:18 -07:00
Benoit Steiner	e45a6bed45	Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.	2014-03-26 15:58:13 -07:00
Benoit Steiner	cc73164aa8	Merged latest updates from the parent branch	2014-03-26 15:23:59 -07:00
Gael Guennebaud	f0a4c9d5ab	Update gebp kernel to process a panel of 4 columns at once for the remaining ones.	2014-03-26 23:22:36 +01:00
Gael Guennebaud	8be011e776	Remove remaining bits of the dead working buffer	2014-03-26 23:14:44 +01:00
Benoit Steiner	a078f442a3	Vectorized the multiplication and division of complex numbers using AVX instructions.	2014-03-26 15:11:18 -07:00
Benoit Steiner	cf1a7bfbe1	Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings.	2014-03-26 12:03:31 -07:00
Gael Guennebaud	bc401eb6fa	Implement new 1 packet x 8 gebp kernel	2014-03-26 18:53:00 +01:00
Gael Guennebaud	b286a1e75c	add pbroadcast2/4 generic intrinsics	2014-03-26 16:46:36 +01:00
Benoit Steiner	6bf3cc2732	Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf>	2014-03-25 09:00:43 -07:00
Benoit Steiner	7ae9b0805d	Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.	2014-03-24 13:33:40 -07:00
Benoit Steiner	72707a8664	Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to true to prevent build failures when vectorization is disabled.	2014-03-21 11:40:29 -07:00
Benoit Steiner	8a0845ebd7	Merged latest changes from the parent	2014-03-18 12:58:08 -07:00
Christoph Hertzberg	35a2c9cde7	clang does not accept this without template keyword	2014-03-14 16:48:29 +01:00
Gael Guennebaud	bb4b67cf39	Relax Ref such that Ref<MatrixXf> accepts a RowVectorXf which can be seen as a degenerate MatrixXf(1,N)	2014-03-13 18:04:19 +01:00
Christoph Hertzberg	2db792852f	Silence stupid parenthesis warnings for old GCC versions (<= 4.6.x)	2014-03-13 12:58:57 +01:00

3264 Commits