eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Steven Peters	953ca5ba2f	Spline.h: fix spelling "spang" -> "span"	2019-02-08 06:23:24 +00:00
Eugene Zhulenev	59998117bb	Don't do parallel_pack if we can use thread_local memory in tensor contractions	2019-02-07 09:21:25 -08:00
Gael Guennebaud	013cc3a6b3	Make GEMM fallback to GEMV for runtime vectors. This is a more general and simpler version of changeset `4c0fa6ce0f`	2019-02-07 16:24:09 +01:00
Gael Guennebaud	fa2fcb4895	Backed out changeset `4c0fa6ce0f`	2019-02-07 16:07:08 +01:00
Gael Guennebaud	b3c4344a68	bug #1676 : workaround GCC's bug in c++17 mode.	2019-02-07 15:21:35 +01:00
Rasmus Larsen	3091c03898	Merged in ezhulenev/eigen-01 (pull request PR-581) Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing Approved-by: Rasmus Larsen <rmlarsen@google.com> Approved-by: Gael Guennebaud <g.gael@free.fr>	2019-02-05 22:45:20 +00:00
Eugene Zhulenev	8491127082	Do not reduce parallelism too much in contractions with small number of threads	2019-02-04 12:59:33 -08:00
Eugene Zhulenev	eb21bab769	Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing	2019-02-04 10:43:16 -08:00
Eugene Zhulenev	6d0f6265a9	Remove duplicated comment line	2019-02-04 10:30:25 -08:00
Eugene Zhulenev	690b2c45b1	Fix GeneralBlockPanelKernel Android compilation	2019-02-04 10:29:15 -08:00
Gael Guennebaud	871e2e5339	bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)	2019-02-03 08:54:47 +01:00
Rasmus Larsen	e7b481ea74	Merged in rmlarsen/eigen (pull request PR-578) Speed up Eigen matrix*vector and vector*matrix multiplication. Approved-by: Eugene Zhulenev <ezhulenev@google.com>	2019-02-02 01:53:44 +00:00
Sameer Agarwal	b55b5c7280	Speed up row-major matrix-vector product on ARM	2019-02-01 15:23:53 -08:00
    The row-major matrix-vector multiplication code uses a threshold to check if processing 8 rows at a time would thrash the cache. This change introduces two modifications to this logic.
    1. A smaller threshold for ARM and ARM64 devices. The value of this threshold was determined empirically using a Pixel2 phone, by benchmarking a large number of matrix-vector products in the range [1..4096]x[1..4096] and measuring performance separately on big and little cores with frequency pinning. On big (out-of-order) cores, this change has little to no impact. But on the small (in-order) cores, the matrix-vector products are up to 700% faster, especially on large matrices. The motivation for this change was some internal code at Google which was using hand-written NEON for implementing similar functionality, processing the matrix one row at a time, which exhibited substantially better performance than Eigen. With the current change, Eigen handily beats that code.
    2. Make the logic for choosing the number of simultaneous rows apply uniformly to 8, 4 and 2 rows instead of just 8 rows.
    Since the default threshold for non-ARM devices is essentially unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM performance. This was verified by running the same set of benchmarks on a Xeon desktop.
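    The row-selection idea in commit b55b5c7280 can be sketched roughly as follows. This is not Eigen's code: `rows_per_pass`, its parameters, and the comparison form (k rows times the row stride in bytes against one threshold) are assumptions made for illustration. The commit message only states that the same threshold logic applies to 8, 4 and 2 rows, and it does not give the ARM threshold value.

```cpp
#include <cstddef>

// Hypothetical sketch, not Eigen's implementation.
// Returns the largest block height among 8, 4 and 2 rows whose combined
// footprint (k * row_stride_bytes) fits within threshold_bytes.
// Falls back to processing 1 row per pass when no block fits.
std::size_t rows_per_pass(std::size_t row_stride_bytes, std::size_t threshold_bytes) {
    const std::size_t candidates[] = {8, 4, 2};
    for (std::size_t k : candidates) {
        if (k * row_stride_bytes <= threshold_bytes) return k;
    }
    return 1;
}
```

    With an assumed threshold of 8 * 32 * 1024 bytes, a 4096-byte row stride selects 8 rows per pass, while a 40000-byte stride drops to 4. Passing a smaller threshold models the more conservative setting the commit describes for ARM.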
Rasmus Munk Larsen	4c0fa6ce0f	Speed up Eigen matrix*vector and vector*matrix multiplication.	2019-01-31 14:24:08 -08:00
    This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
    The benchmarks below test
        c.noalias()= n_by_n_matrix * n_by_1_matrix;
        c.noalias()= 1_by_n_matrix * n_by_n_matrix;
    respectively.

    Benchmark measurements:

    SSE:
    Run on * (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
    CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
    Benchmark          Base (ns)   New (ns)  Improvement
    ------------------------------------------------------------------
    BM_MatVec/64            1096        312  +71.5%
    BM_MatVec/128           4581       1464  +68.0%
    BM_MatVec/256          18534       5710  +69.2%
    BM_MatVec/512         118083      24162  +79.5%
    BM_MatVec/1k          704106     173346  +75.4%
    BM_MatVec/2k         3080828     742728  +75.9%
    BM_MatVec/4k        25421512    4530117  +82.2%
    BM_VecMat/32             352        130  +63.1%
    BM_VecMat/64            1213        425  +65.0%
    BM_VecMat/128           4640       1564  +66.3%
    BM_VecMat/256          17902       5884  +67.1%
    BM_VecMat/512          70466      24000  +65.9%
    BM_VecMat/1k          340150     161263  +52.6%
    BM_VecMat/2k         1420590     645576  +54.6%
    BM_VecMat/4k         8083859    4364327  +46.0%

    AVX2:
    Run on * (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
    CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
    Benchmark          Base (ns)   New (ns)  Improvement
    ------------------------------------------------------------------
    BM_MatVec/64             619        120  +80.6%
    BM_MatVec/128           9693        752  +92.2%
    BM_MatVec/256          38356       2773  +92.8%
    BM_MatVec/512          69006      12803  +81.4%
    BM_MatVec/1k          443810     160378  +63.9%
    BM_MatVec/2k         2633553     646594  +75.4%
    BM_MatVec/4k        16211095    4327148  +73.3%
    BM_VecMat/64             925        227  +75.5%
    BM_VecMat/128           3438        830  +75.9%
    BM_VecMat/256          13427       2936  +78.1%
    BM_VecMat/512          53944      12473  +76.9%
    BM_VecMat/1k          302264     157076  +48.0%
    BM_VecMat/2k         1396811     675778  +51.6%
    BM_VecMat/4k         8962246    4459010  +50.2%

    AVX512:
    Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
    CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
    Benchmark          Base (ns)   New (ns)  Improvement
    ------------------------------------------------------------------
    BM_MatVec/64             401        111  +72.3%
    BM_MatVec/128           1846        513  +72.2%
    BM_MatVec/256          36739       1927  +94.8%
    BM_MatVec/512          54490       9227  +83.1%
    BM_MatVec/1k          487374     161457  +66.9%
    BM_MatVec/2k         2016270     643824  +68.1%
    BM_MatVec/4k        13204300    4077412  +69.1%
    BM_VecMat/32             324        106  +67.3%
    BM_VecMat/64            1034        246  +76.2%
    BM_VecMat/128           3576        802  +77.6%
    BM_VecMat/256          13411       2561  +80.9%
    BM_VecMat/512          58686      10037  +82.9%
    BM_VecMat/1k          320862     163750  +49.0%
    BM_VecMat/2k         1406719     651397  +53.7%
    BM_VecMat/4k         7785179    4124677  +47.0%
Gael Guennebaud	7ef879f6bf	GEBP: improves pipelining in the 1pX4 path with FMA. Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA. With AVX+FMA I measured a speed up of about x1.25 in such cases.	2019-01-30 23:45:12 +01:00
Gael Guennebaud	de77bf5d6c	Fix compilation with ARM64.	2019-01-30 16:48:20 +01:00
Gael Guennebaud	d586686924	Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper.	2019-01-30 16:48:01 +01:00
Gael Guennebaud	eb4c6bb22d	Fix conflicts and merge	2019-01-30 15:57:08 +01:00
Gael Guennebaud	e3622a0396	Slightly extend discussions on auto and move the content of the Pit falls wiki page here. http://eigen.tuxfamily.org/index.php?title=Pit_Falls	2019-01-30 13:09:21 +01:00
Gael Guennebaud	df12fae8b8	According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 , the previous GCC issue is fixed in GCC trunk (will be gcc 9).	2019-01-30 11:52:28 +01:00
Gael Guennebaud	3775926bba	ARM64 & GEBP: add specialization for double +30% speed up	2019-01-30 11:49:06 +01:00
Gael Guennebaud	be5b0f664a	ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM	2019-01-30 11:48:25 +01:00
Christoph Hertzberg	a7779a9b42	Hide some annoying unused variable warnings in g++8.1	2019-01-29 16:48:21 +01:00
Gael Guennebaud	efe02292a6	Add recent gemm related changesets and various cleanups in perf-monitoring	2019-01-29 11:53:47 +01:00
Gael Guennebaud	8a06c699d0	bug #1669 : fix PartialPivLU/inverse with zero-sized matrices.	2019-01-29 10:27:13 +01:00
Gael Guennebaud	a2a07e62b9	Fix compilation with c++03 (local class cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected.	2019-01-29 10:10:07 +01:00
Gael Guennebaud	f489f44519	bug #1574 : implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs.	2019-01-28 17:29:50 +01:00
Gael Guennebaud	803fa79767	Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) function	2019-01-28 17:24:44 +01:00
Gael Guennebaud	53560f9186	bug #1672 : fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro)	2019-01-28 13:47:28 +01:00
Christoph Hertzberg	c9825b967e	Renaming even more `I` identifiers	2019-01-26 13:22:13 +01:00
Christoph Hertzberg	5a52e35f9a	Renaming some more `I` identifiers	2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen	71429883ee	Fix compilation error in NEON GEBP specialization of madd.	2019-01-25 17:00:21 -08:00
Christoph Hertzberg	934b8a1304	Avoid `I` as an identifier, since it may clash with the C-header complex.h	2019-01-25 14:54:39 +01:00
Gael Guennebaud	ec8a387972	cleanup	2019-01-24 10:24:45 +01:00
Gael Guennebaud	6908ce2a15	More thoroughly check variadic template ctor of fixed-size vectors	2019-01-24 10:24:28 +01:00
David Tellenbach	237b03b372	PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients.	2019-01-23 00:07:19 +01:00
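    As a rough illustration of the technique named in PR 574, the sketch below builds a fixed-size vector from its coefficients with a variadic template constructor. `FixedVec` is a hypothetical stand-in, not Eigen's Matrix or Array class. One general property of this approach is that the argument count is a compile-time constant, so a size mismatch can be rejected with `static_assert`. Whether that was the motivation for the PR is an assumption.

```cpp
#include <cstddef>

// Hypothetical fixed-size vector, for illustration only (not Eigen's class).
template <typename T, std::size_t N>
struct FixedVec {
    T data[N];

    // Variadic constructor: one argument per coefficient.
    // sizeof...(Args) is known at compile time, so a wrong count fails to compile.
    template <typename... Args>
    explicit FixedVec(Args... coeffs) : data{static_cast<T>(coeffs)...} {
        static_assert(sizeof...(Args) == N,
                      "coefficient count must equal the vector size");
    }

    T operator[](std::size_t i) const { return data[i]; }
};
```

    With this sketch, `FixedVec<double, 3> v(1, 2.5, 3);` compiles and stores the three coefficients as doubles, while `FixedVec<double, 3> w(1, 2);` is rejected at compile time.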
Christoph Hertzberg	bd6dadcda8	Tell doxygen that cxx11 math is available	2019-01-24 00:14:02 +01:00
Gael Guennebaud	c64d5d3827	Bypass inline asm for non compatible compilers.	2019-01-23 23:43:13 +01:00
Christoph Hertzberg	e16913a45f	Fix name of tutorial snippet.	2019-01-23 10:35:06 +01:00
Gael Guennebaud	80f81f9c4b	Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.	2019-01-22 17:08:47 +01:00
David Tellenbach	db152b9ee6	PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc) - {1,2,3,4,5,...} for fixed-size vectors only - {{1,2,3},{4,5,6}} for the general cases - {{1,2,3,4,5,....}} is allowed for both row and column-vector	2019-01-21 16:25:57 +01:00
Gael Guennebaud	543529da6a	Add more extensive tests of Array ctors, including {} variants	2019-01-22 15:30:50 +01:00
nluehr	92774f0275	Replace host_define.h with cuda_runtime_api.h	2019-01-18 16:10:09 -06:00
Gael Guennebaud	d18f49cbb3	Fix compilation of unit tests with gcc and c++17	2019-01-18 11:12:42 +01:00
Christoph Hertzberg	da0a41b9ce	Mask unused-parameter warnings, when building with NDEBUG	2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen	2eccbaf3f7	Add missing logical packet ops for GPU and NEON.	2019-01-17 17:45:08 -08:00
Christoph Hertzberg	d575505d25	After fixing bug #1557 , boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success)	2019-01-17 19:14:07 +01:00
Gael Guennebaud	ee3662abc5	Remove some useless const_cast	2019-01-17 18:27:49 +01:00
Gael Guennebaud	0fe6b7d687	Make nestByValue work again (broken since 3.3) and add unit tests.	2019-01-17 18:27:25 +01:00
Gael Guennebaud	4b7cf7ff82	Extend reshaped unit tests and remove useless const_cast	2019-01-17 17:35:32 +01:00

10576 Commits