eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-27 07:29:52 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	d06a48959a	bug #1383 : Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0.	2017-01-25 15:27:13 +01:00
Rasmus Munk Larsen	ae3e43a125	Remove extra space.	2017-01-24 16:16:39 -08:00
Benoit Steiner	e96c77668d	Merged in rmlarsen/eigen2 (pull request PR-292) Adds a fast memcpy function to Eigen.	2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen	3be5ee2352	Update copy helper to use fast_memcpy.	2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen	e6b1020221	Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml
This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB

Benchmark                      Base (ns)  New (ns)  Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                      3.48      2.39       +31.3%
BM_memcpy_1T/8                      12.3      6.51       +47.0%
BM_memcpy_1T/64                      371       383        -3.2%
BM_memcpy_1T/512                   66922     66720        +0.3%
BM_memcpy_1T/4k                  9892867   6849682       +30.8%
BM_memcpy_1T/5k                 14951099  10332856       +30.9%
BM_memcpy_2T/2                      3.50      2.46       +29.7%
BM_memcpy_2T/8                      12.3      7.66       +37.7%
BM_memcpy_2T/64                      371       376        -1.3%
BM_memcpy_2T/512                   66652     66788        -0.2%
BM_memcpy_2T/4k                  6145012   6117776        +0.4%
BM_memcpy_2T/5k                  9181478   9010942        +1.9%
BM_memcpy_4T/2                      3.47      2.47       +31.0%
BM_memcpy_4T/8                      12.3      6.67       +45.8%
BM_memcpy_4T/64                      374       376        -0.5%
BM_memcpy_4T/512                   67833     68019        -0.3%
BM_memcpy_4T/4k                  5057425   5188253        -2.6%
BM_memcpy_4T/5k                  7555638   7779468        -3.0%
BM_memcpy_6T/2                      3.51      2.50       +28.8%
BM_memcpy_6T/8                      12.3      7.61       +38.1%
BM_memcpy_6T/64                      373       378        -1.3%
BM_memcpy_6T/512                   66871     66774        +0.1%
BM_memcpy_6T/4k                  5112975   5233502        -2.4%
BM_memcpy_6T/5k                  7614180   7772246        -2.1%
BM_memcpy_8T/2                      3.47      2.41       +30.5%
BM_memcpy_8T/8                      12.4      10.5       +15.3%
BM_memcpy_8T/64                      372       388        -4.3%
BM_memcpy_8T/512                   67373     66588        +1.2%
BM_memcpy_8T/4k                  5148462   5254897        -2.1%
BM_memcpy_8T/5k                  7660989   7799058        -1.8%
BM_memcpy_12T/2                     3.50      2.40       +31.4%
BM_memcpy_12T/8                     12.4      7.55       +39.1%
BM_memcpy_12T/64                     374       378        -1.1%
BM_memcpy_12T/512                  67132     66683        +0.7%
BM_memcpy_12T/4k                 5185125   5292920        -2.1%
BM_memcpy_12T/5k                 7717284   7942684        -2.9%
BM_slicingSmallPieces_1T/2          47.3      47.5        +0.4%
BM_slicingSmallPieces_1T/8          53.6      52.3        +2.4%
BM_slicingSmallPieces_1T/64          491       476        +3.1%
BM_slicingSmallPieces_1T/512       21734     18814       +13.4%
BM_slicingSmallPieces_1T/4k       394660    396760        -0.5%
BM_slicingSmallPieces_1T/5k       218722    209244        +4.3%
BM_slicingSmallPieces_2T/2          80.7      79.9        +1.0%
BM_slicingSmallPieces_2T/8          54.2      53.1        +2.0%
BM_slicingSmallPieces_2T/64          497       477        +4.0%
BM_slicingSmallPieces_2T/512       21732     18822       +13.4%
BM_slicingSmallPieces_2T/4k       392885    390490        +0.6%
BM_slicingSmallPieces_2T/5k       221988    208678        +6.0%
BM_slicingSmallPieces_4T/2          80.8      80.1        +0.9%
BM_slicingSmallPieces_4T/8          54.1      53.2        +1.7%
BM_slicingSmallPieces_4T/64          493       476        +3.4%
BM_slicingSmallPieces_4T/512       21702     18758       +13.6%
BM_slicingSmallPieces_4T/4k       393962    404023        -2.6%
BM_slicingSmallPieces_4T/5k       249667    211732       +15.2%
BM_slicingSmallPieces_6T/2          80.5      80.1        +0.5%
BM_slicingSmallPieces_6T/8          54.4      53.4        +1.8%
BM_slicingSmallPieces_6T/64          488       478        +2.0%
BM_slicingSmallPieces_6T/512       21719     18841       +13.3%
BM_slicingSmallPieces_6T/4k       394950    397583        -0.7%
BM_slicingSmallPieces_6T/5k       223080    210148        +5.8%
BM_slicingSmallPieces_8T/2          81.2      80.4        +1.0%
BM_slicingSmallPieces_8T/8          58.1      53.5        +7.9%
BM_slicingSmallPieces_8T/64          489       480        +1.8%
BM_slicingSmallPieces_8T/512       21586     18798       +12.9%
BM_slicingSmallPieces_8T/4k       394592    400165        -1.4%
BM_slicingSmallPieces_8T/5k       219688    208301        +5.2%
BM_slicingSmallPieces_12T/2         80.2      79.8        +0.7%
BM_slicingSmallPieces_12T/8         54.4      53.4        +1.8%
BM_slicingSmallPieces_12T/64         488       476        +2.5%
BM_slicingSmallPieces_12T/512      21931     18831       +14.1%
BM_slicingSmallPieces_12T/4k      393962    396541        -0.7%
BM_slicingSmallPieces_12T/5k      218803    207965        +5.0%	2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen	7b6aaa3440	Fix NaN propagation for AVX512.	2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen	5e144bbaa4	Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.	2017-01-24 13:32:50 -08:00
Gael Guennebaud	d83db761a2	Add support for std::integral_constant	2017-01-24 16:28:12 +01:00
Gael Guennebaud	bc10201854	Add test for multiple symbols	2017-01-24 16:27:51 +01:00
Gael Guennebaud	c43d254d13	Fix seq().reverse() in c++98	2017-01-24 11:36:43 +01:00
Gael Guennebaud	5783158e8f	Add unit test for FixedInt and Symbolic	2017-01-24 10:55:12 +01:00
Gael Guennebaud	ddd83f82d8	Add support for "SymbolicExpr op fix<N>" in C++98/11 mode.	2017-01-24 10:54:42 +01:00
Gael Guennebaud	228fef1b3a	Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|)	2017-01-24 10:53:51 +01:00
Gael Guennebaud	bb52f74e62	Add internal doc	2017-01-24 10:13:35 +01:00
Gael Guennebaud	41c523a0ab	Rename fix_t to FixedInt	2017-01-24 09:39:49 +01:00
Gael Guennebaud	156e6234f1	bug #1375 : fix cmake installation with cmake 2.8	2017-01-24 09:16:40 +01:00
Gael Guennebaud	ba3f977946	bug #1376 : add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))	2017-01-23 22:06:08 +01:00
Gael Guennebaud	b0db4eff36	bug #1382 : move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)	2017-01-23 22:03:57 +01:00
Gael Guennebaud	ca79c1545a	Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t	2017-01-23 22:02:53 +01:00
Gael Guennebaud	4b607b5692	Use Index instead of size_t	2017-01-23 22:00:33 +01:00
Luke Iwanski	bf44fed9b7	Allows AMD APU	2017-01-23 15:56:45 +00:00
Gael Guennebaud	0fe278f7be	bug #1379 : fix compilation in sparsediagonaldense with openmp	2017-01-21 23:27:01 +01:00
Gael Guennebaud	22a172751e	bug #1378 : fix doc (DiagonalIndex vs Diagonal)	2017-01-21 22:09:59 +01:00
Mehdi Goli	602f8c27f5	Reverting back to the previous TensorDeviceSycl.h as the total number of buffers is not enough for tensorflow.	2017-01-20 18:23:20 +00:00
Gael Guennebaud	4d302a080c	Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only)	2017-01-19 20:34:18 +01:00
Gael Guennebaud	54f3fbee24	Exploit fixed values in seq and reverse with C++98 compatibility	2017-01-19 19:57:32 +01:00
Gael Guennebaud	7691723e34	Add support for fixed-value in symbolic expression, c++11 only for now.	2017-01-19 19:25:29 +01:00
Benoit Steiner	924600a0e8	Made sure that enabling avx2 instructions enables avx and sse instructions as well.	2017-01-19 09:54:48 -08:00
Mehdi Goli	77cc4d06c7	Removing unused variables	2017-01-19 17:06:21 +00:00
Mehdi Goli	837fdbdcb2	Merging with Benoit's upstream.	2017-01-19 11:34:34 +00:00
Mehdi Goli	6bdd15f572	Adding non-dereferenceable pointer tracking for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos; modifying TensorDeviceSycl to use the LegacyPointer class.	2017-01-19 11:30:59 +00:00
Benoit Steiner	aa7fb88dfa	Merged in LaFeuille/eigen (pull request PR-289) Fix a typo	2017-01-18 16:44:39 -08:00
Gael Guennebaud	e84ed7b6ef	Remove dead code	2017-01-18 23:18:28 +01:00
Gael Guennebaud	f3ccbe0419	Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimizes the usage of last and end.	2017-01-18 23:16:32 +01:00
Mehdi Goli	c6f7b33834	Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy.	2017-01-18 10:45:28 +00:00
Gael Guennebaud	15471432fe	Add a .reverse() member to ArithmeticSequence.	2017-01-18 11:35:27 +01:00
Gael Guennebaud	e4f8dd860a	Add missing operator*	2017-01-18 10:49:01 +01:00
Gael Guennebaud	198507141b	Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n)	2017-01-18 09:43:58 +01:00
Gael Guennebaud	5484ddd353	Merge the generic and dynamic overloads of block()	2017-01-17 22:11:46 +01:00
Gael Guennebaud	655ba783f8	Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common: A.triangularView() = B*A.selfadjointView()*B.adjoint() case that used to work in 3.2.	2017-01-17 18:03:35 +01:00
Gael Guennebaud	5e36ec3b6f	Fix regression when passing enums to operator()	2017-01-17 17:10:16 +01:00
Gael Guennebaud	f7852c3d16	Fix -Wunnamed-type-template-args	2017-01-17 16:05:58 +01:00
Gael Guennebaud	4f36dcfda8	Add a generic block() method compatible with Eigen::fix	2017-01-17 11:34:28 +01:00
Gael Guennebaud	71e5b71356	Add a get_runtime_value helper to deal with pointer-to-function hack, plus some refactoring to make the internals more consistent.	2017-01-17 11:33:57 +01:00
Gael Guennebaud	59801a3250	Add \newin{3.x} doxygen command	2017-01-17 10:31:28 +01:00
Gael Guennebaud	23bfcfc15f	Add missing overload of get_compile_time for c++98/11	2017-01-17 10:30:21 +01:00
Gael Guennebaud	edff32c2c2	Disambiguate the two versions of fix for doxygen	2017-01-17 10:29:33 +01:00
Gael Guennebaud	4989922be2	Add support for symbolic expressions as arguments of operator()	2017-01-16 22:21:23 +01:00
Gael Guennebaud	12e22a2844	typos in doc	2017-01-16 16:31:19 +01:00
Gael Guennebaud	e70c4c97fa	Typo	2017-01-16 16:20:16 +01:00
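
Commit e6b1020221 above describes a two-part dispatch: fixed small sizes go to memcpy so the compiler can inline the copy, and other sizes go to memmove. The following is a minimal sketch of that idea only. The name fast_memcpy_sketch and the chosen set of small sizes (1, 2, 4, 8, 16) are assumptions made for illustration and are not taken from Eigen's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration; NOT Eigen's actual fast_memcpy.
// Assumes dst and src each point to at least n valid bytes.
inline void fast_memcpy_sketch(void* dst, const void* src, std::size_t n) {
  switch (n) {
    // A compile-time-constant size lets the compiler expand memcpy inline
    // (typically a few load/store instructions) instead of calling libc.
    case 1:  std::memcpy(dst, src, 1);  break;
    case 2:  std::memcpy(dst, src, 2);  break;
    case 4:  std::memcpy(dst, src, 4);  break;
    case 8:  std::memcpy(dst, src, 8);  break;
    case 16: std::memcpy(dst, src, 16); break;
    // All other sizes use memmove, which the commit message reports as
    // faster than memcpy for large blocks on Linux with GCC or Clang.
    default: std::memmove(dst, src, n); break;
  }
}
```

Whether memmove actually beats memcpy depends on the C library, compiler, and CPU, so the thresholds in a sketch like this would need to be re-measured on the target platform.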


9256 Commits
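
Commit 5e144bbaa4 above aligns the vectorized pmax/pmin with std::max/std::min with respect to NaN. As a hedged illustration of the scalar side only, the sketch below writes out the comparison that mainstream standard library implementations use for std::max and std::min, which returns the first argument whenever either operand is NaN. The names ref_max, ref_min, and same_value are hypothetical, and nothing here reproduces Eigen's packet code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Reference scalar semantics (hypothetical names). Any comparison with NaN
// is false, so both functions fall through to the first argument `a`.
inline float ref_max(float a, float b) { return (a < b) ? b : a; }
inline float ref_min(float a, float b) { return (b < a) ? b : a; }

// True when x and y are both NaN or compare equal.
inline bool same_value(float x, float y) {
  return (std::isnan(x) && std::isnan(y)) || x == y;
}
```

With these definitions, ref_max(NaN, 1.0f) yields NaN while ref_max(1.0f, NaN) yields 1.0f, so the result depends on argument order. A vectorized path has to reproduce exactly that order dependence to be consistent with the scalar one.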