Rasmus Munk Larsen
|
edaa0fc5d1
|
Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
|
2017-01-26 12:46:06 -08:00 |
|
Gael Guennebaud
|
25a1703579
|
Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
|
2017-01-26 08:04:23 +00:00 |
|
Gael Guennebaud
|
607be65a03
|
Fix duplicates of array_size between unsupported and Core
|
2017-01-25 22:53:58 +01:00 |
|
Benoit Steiner
|
e96c77668d
|
Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
|
2017-01-25 00:14:04 +00:00 |
|
Rasmus Munk Larsen
|
e6b1020221
|
Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml
This is of course surprising, since memcpy is a less constrained version of memmove. This Stack Overflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2 3.48 2.39 +31.3%
BM_memcpy_1T/8 12.3 6.51 +47.0%
BM_memcpy_1T/64 371 383 -3.2%
BM_memcpy_1T/512 66922 66720 +0.3%
BM_memcpy_1T/4k 9892867 6849682 +30.8%
BM_memcpy_1T/5k 14951099 10332856 +30.9%
BM_memcpy_2T/2 3.50 2.46 +29.7%
BM_memcpy_2T/8 12.3 7.66 +37.7%
BM_memcpy_2T/64 371 376 -1.3%
BM_memcpy_2T/512 66652 66788 -0.2%
BM_memcpy_2T/4k 6145012 6117776 +0.4%
BM_memcpy_2T/5k 9181478 9010942 +1.9%
BM_memcpy_4T/2 3.47 2.47 +31.0%
BM_memcpy_4T/8 12.3 6.67 +45.8%
BM_memcpy_4T/64 374 376 -0.5%
BM_memcpy_4T/512 67833 68019 -0.3%
BM_memcpy_4T/4k 5057425 5188253 -2.6%
BM_memcpy_4T/5k 7555638 7779468 -3.0%
BM_memcpy_6T/2 3.51 2.50 +28.8%
BM_memcpy_6T/8 12.3 7.61 +38.1%
BM_memcpy_6T/64 373 378 -1.3%
BM_memcpy_6T/512 66871 66774 +0.1%
BM_memcpy_6T/4k 5112975 5233502 -2.4%
BM_memcpy_6T/5k 7614180 7772246 -2.1%
BM_memcpy_8T/2 3.47 2.41 +30.5%
BM_memcpy_8T/8 12.4 10.5 +15.3%
BM_memcpy_8T/64 372 388 -4.3%
BM_memcpy_8T/512 67373 66588 +1.2%
BM_memcpy_8T/4k 5148462 5254897 -2.1%
BM_memcpy_8T/5k 7660989 7799058 -1.8%
BM_memcpy_12T/2 3.50 2.40 +31.4%
BM_memcpy_12T/8 12.4 7.55 +39.1%
BM_memcpy_12T/64 374 378 -1.1%
BM_memcpy_12T/512 67132 66683 +0.7%
BM_memcpy_12T/4k 5185125 5292920 -2.1%
BM_memcpy_12T/5k 7717284 7942684 -2.9%
BM_slicingSmallPieces_1T/2 47.3 47.5 +0.4%
BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4%
BM_slicingSmallPieces_1T/64 491 476 +3.1%
BM_slicingSmallPieces_1T/512 21734 18814 +13.4%
BM_slicingSmallPieces_1T/4k 394660 396760 -0.5%
BM_slicingSmallPieces_1T/5k 218722 209244 +4.3%
BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0%
BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0%
BM_slicingSmallPieces_2T/64 497 477 +4.0%
BM_slicingSmallPieces_2T/512 21732 18822 +13.4%
BM_slicingSmallPieces_2T/4k 392885 390490 +0.6%
BM_slicingSmallPieces_2T/5k 221988 208678 +6.0%
BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9%
BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7%
BM_slicingSmallPieces_4T/64 493 476 +3.4%
BM_slicingSmallPieces_4T/512 21702 18758 +13.6%
BM_slicingSmallPieces_4T/4k 393962 404023 -2.6%
BM_slicingSmallPieces_4T/5k 249667 211732 +15.2%
BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5%
BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_6T/64 488 478 +2.0%
BM_slicingSmallPieces_6T/512 21719 18841 +13.3%
BM_slicingSmallPieces_6T/4k 394950 397583 -0.7%
BM_slicingSmallPieces_6T/5k 223080 210148 +5.8%
BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0%
BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9%
BM_slicingSmallPieces_8T/64 489 480 +1.8%
BM_slicingSmallPieces_8T/512 21586 18798 +12.9%
BM_slicingSmallPieces_8T/4k 394592 400165 -1.4%
BM_slicingSmallPieces_8T/5k 219688 208301 +5.2%
BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7%
BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_12T/64 488 476 +2.5%
BM_slicingSmallPieces_12T/512 21931 18831 +14.1%
BM_slicingSmallPieces_12T/4k 393962 396541 -0.7%
BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
|
2017-01-24 13:55:18 -08:00 |
|
Rasmus Munk Larsen
|
5e144bbaa4
|
Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
|
2017-01-24 13:32:50 -08:00 |
|
NeroBurner
|
c4fc2611ba
|
add cmake-option to enable/disable creation of tests
* * *
disable unsupported/test when tests are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
|
2017-01-02 09:09:21 +01:00 |
|
Benoit Steiner
|
660da83e18
|
Pulled latest update from trunk
|
2016-12-21 16:43:27 -08:00 |
|
Benoit Steiner
|
4236aebe10
|
Simplified the contraction code
|
2016-12-21 16:42:56 -08:00 |
|
Benoit Steiner
|
3cfa16f41d
|
Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
|
2016-12-21 15:08:54 -08:00 |
|
Benoit Steiner
|
519d63d350
|
Added support for libxsmm kernel in multithreaded contractions
|
2016-12-21 15:06:06 -08:00 |
|
Benoit Steiner
|
0657228569
|
Simplified the way we link libxsmm
|
2016-12-21 14:40:08 -08:00 |
|
Benoit Steiner
|
f9eff17e91
|
Leverage libxsmm kernels within single-threaded contractions
|
2016-12-21 12:32:06 -08:00 |
|
Benoit Steiner
|
c19fe5e9ed
|
Added support for libxsmm in the eigen makefiles
|
2016-12-21 10:43:40 -08:00 |
|
Luke Iwanski
|
c55ecfd820
|
Fix for auto appearing in functor template argument.
|
2016-12-21 15:42:51 +00:00 |
|
Benoit Steiner
|
0f577d4744
|
Merged eigen/eigen into default
|
2016-12-20 17:02:06 -08:00 |
|
Luke Iwanski
|
29186f766f
|
Fixed order of initialisation in ExecExprFunctorKernel functor.
|
2016-12-20 21:32:42 +00:00 |
|
Gael Guennebaud
|
e8d6862f14
|
Properly adjust precision when saving to Market format.
|
2016-12-20 22:10:33 +01:00 |
|
Gael Guennebaud
|
e2f4ee1c2b
|
Speed up parsing of sparse Market file.
|
2016-12-20 21:56:21 +01:00 |
|
Luke Iwanski
|
8245851d1b
|
Matching parameters order between lambda and the functor.
|
2016-12-20 16:18:15 +00:00 |
|
Benoit Steiner
|
548ed30a1c
|
Added an OpenCL regression test
|
2016-12-19 18:56:26 -08:00 |
|
Benoit Steiner
|
27ceb43bf6
|
Fixed race condition in the tensor_shuffling_sycl test
|
2016-12-19 15:34:42 -08:00 |
|
Benoit Steiner
|
70d0172f0c
|
Merged eigen/eigen into default
|
2016-12-16 17:37:04 -08:00 |
|
Benoit Steiner
|
8910442e19
|
Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl.
|
2016-12-16 15:45:04 -08:00 |
|
Luke Iwanski
|
54db66c5df
|
struct -> class in order to silence compilation warning.
|
2016-12-16 20:25:20 +00:00 |
|
Mehdi Goli
|
35bae513a0
|
Converting all parallel_for lambdas to functors in order to prevent kernel name duplication errors; adding TensorConcatenationOp backend for sycl.
|
2016-12-16 19:46:45 +00:00 |
|
Mehdi Goli
|
c5e8546306
|
Adding an async_handler to the sycl queue, as the lack of one can cause undefined behaviour.
|
2016-12-15 16:59:57 +00:00 |
|
Benoit Steiner
|
2c2e218471
|
Avoid using #define macros since they can conflict with user code
|
2016-12-14 19:49:15 -08:00 |
|
Benoit Steiner
|
3beb180ee5
|
Don't call EnvThread::OnCancel by default since it doesn't do anything.
|
2016-12-14 18:33:39 -08:00 |
|
Benoit Steiner
|
9ff5d0f821
|
Merged eigen/eigen into default
|
2016-12-14 17:32:16 -08:00 |
|
Mehdi Goli
|
730eb9fe1c
|
Adding asynchronous execution as it improves the performance.
|
2016-12-14 17:38:53 +00:00 |
|
Mehdi Goli
|
2d4a091beb
|
Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h
|
2016-12-14 15:30:37 +00:00 |
|
Benoit Steiner
|
a432fc102d
|
Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool
|
2016-12-12 15:24:16 -08:00 |
|
Benoit Steiner
|
8ae68924ed
|
Made ThreadPoolInterface::Cancel() an optional functionality
|
2016-12-12 11:58:38 -08:00 |
|
Benoit Steiner
|
76fca22134
|
Use a more accurate timer to sleep on Linux systems.
|
2016-12-09 15:12:24 -08:00 |
|
Benoit Steiner
|
4deafd35b7
|
Introduce a portable EIGEN_SLEEP macro.
|
2016-12-09 14:52:15 -08:00 |
|
Benoit Steiner
|
aafa97f4d2
|
Fixed build error with MSVC
|
2016-12-09 14:42:32 -08:00 |
|
Benoit Steiner
|
2f5b7a199b
|
Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.
|
2016-12-09 13:05:14 -08:00 |
|
Benoit Steiner
|
3d59a47720
|
Added a message to ease the detection of platforms on which thread cancellation isn't supported.
|
2016-12-08 14:51:46 -08:00 |
|
Benoit Steiner
|
28ee8f42b2
|
Added a Flush method to the RunQueue
|
2016-12-08 14:07:56 -08:00 |
|
Benoit Steiner
|
69ef267a77
|
Added the new threadpool cancel method to the threadpool interface base class.
|
2016-12-08 14:03:25 -08:00 |
|
Benoit Steiner
|
7bfff85355
|
Added support for thread cancellation on Linux
|
2016-12-08 08:12:49 -08:00 |
|
Benoit Steiner
|
462c28e77a
|
Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
|
2016-12-05 02:31:11 +00:00 |
|
Gael Guennebaud
|
4465d20403
|
Add missing generic load methods.
|
2016-12-03 21:25:04 +01:00 |
|
Srinivas Vasudevan
|
218764ee1f
|
Added support for expm1 in Eigen.
|
2016-12-02 14:13:01 -08:00 |
|
Mehdi Goli
|
592acc5bfa
|
Making the default numeric_list work with sycl.
|
2016-12-02 17:58:30 +00:00 |
|
Mehdi Goli
|
79aa2b784e
|
Adding sycl backend for TensorPadding.h; disabling __uint128 for sycl in TensorIntDiv.h; disabling cache size for sycl in TensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
|
2016-12-01 13:02:27 +00:00 |
|
Benoit Steiner
|
a70393fd02
|
Cleaned up forward declarations
|
2016-11-30 21:59:07 -08:00 |
|
Benoit Steiner
|
e073de96dc
|
Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again
|
2016-11-30 21:36:52 -08:00 |
|
Benoit Steiner
|
fca27350eb
|
Added the deallocate_all() method back
|
2016-11-30 20:45:20 -08:00 |
|