Gael Guennebaud
|
d06a48959a
|
bug #1383: Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0.
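The edge case named in this message can be illustrated with a small stand-alone sketch. It is not Eigen's code: `linspaced_sketch` is a hypothetical helper built on `std::vector`, and returning `high` for `n == 1` is an assumption made here for illustration. The point is only that `n == 0` with `high == n - 1 == -1` should yield an empty result without ever dividing by `n - 1`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a LinSpaced-like routine; NOT Eigen's implementation.
// n is signed (like Eigen's Index) so that n - 1 is well defined when n == 0.
inline std::vector<double> linspaced_sketch(std::ptrdiff_t n, double low, double high) {
  if (n <= 0) return {};               // empty request: nothing to fill, no division
  std::vector<double> out(static_cast<std::size_t>(n));
  if (n == 1) {                        // assumption: a single sample takes the value 'high'
    out[0] = high;
    return out;
  }
  const double step = (high - low) / static_cast<double>(n - 1);
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    out[static_cast<std::size_t>(i)] = low + step * static_cast<double>(i);
  }
  return out;
}
```

Calling `linspaced_sketch(n, 0.0, static_cast<double>(n - 1))` with `n == 0` passes `high == -1.0` and returns an empty vector.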
|
2017-01-25 15:27:13 +01:00 |
|
Benoit Steiner
|
e96c77668d
|
Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
|
2017-01-25 00:14:04 +00:00 |
|
Rasmus Munk Larsen
|
3be5ee2352
|
Update copy helper to use fast_memcpy.
|
2017-01-24 14:22:49 -08:00 |
|
Rasmus Munk Larsen
|
e6b1020221
|
Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2 3.48 2.39 +31.3%
BM_memcpy_1T/8 12.3 6.51 +47.0%
BM_memcpy_1T/64 371 383 -3.2%
BM_memcpy_1T/512 66922 66720 +0.3%
BM_memcpy_1T/4k 9892867 6849682 +30.8%
BM_memcpy_1T/5k 14951099 10332856 +30.9%
BM_memcpy_2T/2 3.50 2.46 +29.7%
BM_memcpy_2T/8 12.3 7.66 +37.7%
BM_memcpy_2T/64 371 376 -1.3%
BM_memcpy_2T/512 66652 66788 -0.2%
BM_memcpy_2T/4k 6145012 6117776 +0.4%
BM_memcpy_2T/5k 9181478 9010942 +1.9%
BM_memcpy_4T/2 3.47 2.47 +31.0%
BM_memcpy_4T/8 12.3 6.67 +45.8%
BM_memcpy_4T/64 374 376 -0.5%
BM_memcpy_4T/512 67833 68019 -0.3%
BM_memcpy_4T/4k 5057425 5188253 -2.6%
BM_memcpy_4T/5k 7555638 7779468 -3.0%
BM_memcpy_6T/2 3.51 2.50 +28.8%
BM_memcpy_6T/8 12.3 7.61 +38.1%
BM_memcpy_6T/64 373 378 -1.3%
BM_memcpy_6T/512 66871 66774 +0.1%
BM_memcpy_6T/4k 5112975 5233502 -2.4%
BM_memcpy_6T/5k 7614180 7772246 -2.1%
BM_memcpy_8T/2 3.47 2.41 +30.5%
BM_memcpy_8T/8 12.4 10.5 +15.3%
BM_memcpy_8T/64 372 388 -4.3%
BM_memcpy_8T/512 67373 66588 +1.2%
BM_memcpy_8T/4k 5148462 5254897 -2.1%
BM_memcpy_8T/5k 7660989 7799058 -1.8%
BM_memcpy_12T/2 3.50 2.40 +31.4%
BM_memcpy_12T/8 12.4 7.55 +39.1%
BM_memcpy_12T/64 374 378 -1.1%
BM_memcpy_12T/512 67132 66683 +0.7%
BM_memcpy_12T/4k 5185125 5292920 -2.1%
BM_memcpy_12T/5k 7717284 7942684 -2.9%
BM_slicingSmallPieces_1T/2 47.3 47.5 -0.4%
BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4%
BM_slicingSmallPieces_1T/64 491 476 +3.1%
BM_slicingSmallPieces_1T/512 21734 18814 +13.4%
BM_slicingSmallPieces_1T/4k 394660 396760 -0.5%
BM_slicingSmallPieces_1T/5k 218722 209244 +4.3%
BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0%
BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0%
BM_slicingSmallPieces_2T/64 497 477 +4.0%
BM_slicingSmallPieces_2T/512 21732 18822 +13.4%
BM_slicingSmallPieces_2T/4k 392885 390490 +0.6%
BM_slicingSmallPieces_2T/5k 221988 208678 +6.0%
BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9%
BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7%
BM_slicingSmallPieces_4T/64 493 476 +3.4%
BM_slicingSmallPieces_4T/512 21702 18758 +13.6%
BM_slicingSmallPieces_4T/4k 393962 404023 -2.6%
BM_slicingSmallPieces_4T/5k 249667 211732 +15.2%
BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5%
BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_6T/64 488 478 +2.0%
BM_slicingSmallPieces_6T/512 21719 18841 +13.3%
BM_slicingSmallPieces_6T/4k 394950 397583 -0.7%
BM_slicingSmallPieces_6T/5k 223080 210148 +5.8%
BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0%
BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9%
BM_slicingSmallPieces_8T/64 489 480 +1.8%
BM_slicingSmallPieces_8T/512 21586 18798 +12.9%
BM_slicingSmallPieces_8T/4k 394592 400165 -1.4%
BM_slicingSmallPieces_8T/5k 219688 208301 +5.2%
BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7%
BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_12T/64 488 476 +2.5%
BM_slicingSmallPieces_12T/512 21931 18831 +14.1%
BM_slicingSmallPieces_12T/4k 393962 396541 -0.7%
BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
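The two observations above can be sketched as a size-dispatching copy routine. This is a minimal illustration, not the function added by this commit: the name `fast_copy_sketch` and the particular set of special-cased sizes are assumptions. Small constant sizes go to `memcpy` so the compiler can inline them, and everything else goes to `memmove`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper illustrating the idea; NOT Eigen's actual implementation.
// The small-size cases assume non-overlapping buffers, as memcpy requires.
inline void fast_copy_sketch(void* dst, const void* src, std::size_t size) {
  assert(dst != nullptr && src != nullptr);
  switch (size) {
    // A compile-time constant size lets the compiler expand memcpy inline
    // (typically a few loads and stores) instead of emitting a library call.
    case 1:  std::memcpy(dst, src, 1);  break;
    case 2:  std::memcpy(dst, src, 2);  break;
    case 4:  std::memcpy(dst, src, 4);  break;
    case 8:  std::memcpy(dst, src, 8);  break;
    case 16: std::memcpy(dst, src, 16); break;
    default:
      // Observation 2: memmove was measured to be faster than memcpy for
      // large blocks on the benchmarked Linux setup, so use it as fallback.
      std::memmove(dst, src, size);
      break;
  }
}
```

Whether the fallback pays off depends on the platform's C library, so the large-block branch is best validated with a benchmark on the target machine.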
|
2017-01-24 13:55:18 -08:00 |
|
Rasmus Munk Larsen
|
7b6aaa3440
|
Fix NaN propagation for AVX512.
|
2017-01-24 13:37:08 -08:00 |
|
Rasmus Munk Larsen
|
5e144bbaa4
|
Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
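For reference, the scalar side of this consistency requirement can be demonstrated with the standard library alone. `std::max(a, b)` and `std::min(a, b)` return their first argument whenever the comparison is false, so a NaN in the first position propagates while a NaN in the second position is dropped. The helper `lanewise_max_sketch` below is hypothetical and only emulates a 4-wide packet with `std::array`; it is not Eigen's `pmax`, and the exact behaviour chosen for the vectorized path is the one described in #1373.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>

// Scalar reference: std::max(a, b) is (a < b) ? b : a, and
// std::min(a, b) is (b < a) ? b : a. Any comparison with NaN is false,
// so both return 'a' when either argument is NaN.
inline float scalar_max_ref(float a, float b) { return std::max(a, b); }
inline float scalar_min_ref(float a, float b) { return std::min(a, b); }

// Hypothetical emulation of a 4-lane packet max that applies the scalar
// rule lane by lane, which is what "consistent with std::max" means here.
inline std::array<float, 4> lanewise_max_sketch(const std::array<float, 4>& a,
                                                const std::array<float, 4>& b) {
  std::array<float, 4> r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = scalar_max_ref(a[i], b[i]);
  return r;
}
```

With `nan = std::numeric_limits<float>::quiet_NaN()`, `scalar_max_ref(nan, 1.0f)` is NaN while `scalar_max_ref(1.0f, nan)` is `1.0f`; the result depends on argument order, which is why the scalar and vectorized paths have to agree on it.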
|
2017-01-24 13:32:50 -08:00 |
|
Gael Guennebaud
|
156e6234f1
|
bug #1375: fix cmake installation with cmake 2.8
|
2017-01-24 09:16:40 +01:00 |
|
Gael Guennebaud
|
ba3f977946
|
bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))
|
2017-01-23 22:06:08 +01:00 |
|
Gael Guennebaud
|
b0db4eff36
|
bug #1382: move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)
|
2017-01-23 22:03:57 +01:00 |
|
Gael Guennebaud
|
ca79c1545a
|
Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t
|
2017-01-23 22:02:53 +01:00 |
|
Gael Guennebaud
|
4b607b5692
|
Use Index instead of size_t
|
2017-01-23 22:00:33 +01:00 |
|
Gael Guennebaud
|
0fe278f7be
|
bug #1379: fix compilation in sparse*diagonal*dense with openmp
|
2017-01-21 23:27:01 +01:00 |
|
Gael Guennebaud
|
22a172751e
|
bug #1378: fix doc (DiagonalIndex vs Diagonal)
|
2017-01-21 22:09:59 +01:00 |
|
Benoit Steiner
|
924600a0e8
|
Made sure that enabling avx2 instructions enables avx and sse instructions as well.
|
2017-01-19 09:54:48 -08:00 |
|
Benoit Steiner
|
aa7fb88dfa
|
Merged in LaFeuille/eigen (pull request PR-289)
Fix a typo
|
2017-01-18 16:44:39 -08:00 |
|
Gael Guennebaud
|
655ba783f8
|
Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common:
A.triangularView() = B*A.selfadjointView()*B.adjoint()
case that used to work in 3.2.
|
2017-01-17 18:03:35 +01:00 |
|
LaFeuille
|
1b19b80c06
|
Fix a typo
|
2017-01-13 07:24:55 +00:00 |
|
NeroBurner
|
c4fc2611ba
|
add cmake-option to enable/disable creation of tests
* * *
disable unsupported/test when tests are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
|
2017-01-02 09:09:21 +01:00 |
|
Gael Guennebaud
|
45199b9773
|
Fix typo
|
2017-01-11 09:34:08 +01:00 |
|
Gael Guennebaud
|
ad3eef7608
|
Add link to SO
|
2017-01-09 13:01:39 +01:00 |
|
Gael Guennebaud
|
831fffe874
|
Add missing doc of SparseView
|
2017-01-06 18:01:29 +01:00 |
|
Gael Guennebaud
|
e383d6159a
|
MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd
|
2017-01-06 15:44:13 +01:00 |
|
Gael Guennebaud
|
f3f026c9aa
|
Convert integers to real numbers when computing relative L2 error
|
2017-01-05 13:36:08 +01:00 |
|
Gael Guennebaud
|
2299717fd5
|
Fix and workaround several doxygen issues/warnings
|
2017-01-04 23:27:33 +01:00 |
|
Gael Guennebaud
|
ee6f7f6c0c
|
Add doc for sparse triangular solve functions
|
2017-01-04 23:10:36 +01:00 |
|
Gael Guennebaud
|
5165de97a4
|
Add missing snippet files.
|
2017-01-04 23:08:27 +01:00 |
|
Gael Guennebaud
|
a0a36ad0ef
|
bug #1336: workaround doxygen failing to include numerous members of MatrixBase in Matrix
|
2017-01-04 22:02:39 +01:00 |
|
Gael Guennebaud
|
29a1a58113
|
Document selfadjointView
|
2017-01-04 22:01:50 +01:00 |
|
Gael Guennebaud
|
a5ebc92f8d
|
bug #1336: fix doxygen issue regarding EIGEN_CWISE_BINARY_RETURN_TYPE
|
2017-01-04 18:21:44 +01:00 |
|
Gael Guennebaud
|
45b289505c
|
Add debug output
|
2017-01-03 11:31:02 +01:00 |
|
Gael Guennebaud
|
5838f078a7
|
Fix inclusion
|
2017-01-03 11:30:27 +01:00 |
|
Gael Guennebaud
|
8702562177
|
bug #1370: add doc for StorageIndex
|
2017-01-03 11:25:41 +01:00 |
|
Gael Guennebaud
|
575c078759
|
bug #1370: rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index
|
2017-01-03 11:19:14 +01:00 |
|
Valentin Roussellet
|
d3c5525c23
|
Added += and + operators to inner iterators
Fix #1340
|
2016-12-28 18:29:30 +01:00 |
|
Gael Guennebaud
|
5c27962453
|
Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class.
|
2017-01-02 22:27:07 +01:00 |
|
Marco Falke
|
4ebf69394d
|
doc: Fix trivial typo in AsciiQuickReference.txt
* * *
fixup!
|
2017-01-01 13:25:48 +00:00 |
|
Gael Guennebaud
|
8d7810a476
|
bug #1365: fix another type mismatch warning
(sync is set from and compared to an Index)
|
2016-12-28 23:35:43 +01:00 |
|
Gael Guennebaud
|
97812ff0d3
|
bug #1369: fix type mismatch warning.
Returned values of omp thread id and numbers are int,
so let's use int instead of Index here.
|
2016-12-28 23:29:35 +01:00 |
|
Gael Guennebaud
|
7713e20fd2
|
Fix compilation
|
2016-12-27 22:04:58 +01:00 |
|
Gael Guennebaud
|
ab69a7f6d1
|
Cleanup because traits<CwiseBinaryOp>::Flags now exposes the correct storage order
|
2016-12-27 16:55:47 +01:00 |
|
Gael Guennebaud
|
d32a43e33a
|
Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly.
|
2016-12-27 16:35:45 +01:00 |
|
Gael Guennebaud
|
7136267461
|
Add missing .outer() member to iterators of evaluators of cwise sparse binary expression
|
2016-12-27 16:34:30 +01:00 |
|
Gael Guennebaud
|
fe0ee72390
|
Fix check of storage order mismatch for "sparse cwiseop sparse".
|
2016-12-27 16:33:19 +01:00 |
|
Gael Guennebaud
|
6b8f637ab1
|
Harmless typo
|
2016-12-27 16:31:17 +01:00 |
|
Benoit Steiner
|
354baa0fb1
|
Avoid using horizontal adds since they're not very efficient.
|
2016-12-21 20:55:07 -08:00 |
|
Benoit Steiner
|
d7825b6707
|
Use native AVX512 types instead of Eigen Packets whenever possible.
|
2016-12-21 20:06:18 -08:00 |
|
Benoit Steiner
|
660da83e18
|
Pulled latest update from trunk
|
2016-12-21 16:43:27 -08:00 |
|
Benoit Steiner
|
4236aebe10
|
Simplified the contraction code
|
2016-12-21 16:42:56 -08:00 |
|
Benoit Steiner
|
3cfa16f41d
|
Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
|
2016-12-21 15:08:54 -08:00 |
|
Benoit Steiner
|
519d63d350
|
Added support for libxsmm kernel in multithreaded contractions
|
2016-12-21 15:06:06 -08:00 |
|