Commit Graph

5484 Commits

Author SHA1 Message Date
Benoit Steiner
e96c77668d Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352 Update copy helper to use fast_memcpy. 2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221 Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.

2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:

Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                          3.48      2.39    +31.3%
BM_memcpy_1T/8                          12.3      6.51    +47.0%
BM_memcpy_1T/64                          371       383     -3.2%
BM_memcpy_1T/512                       66922     66720     +0.3%
BM_memcpy_1T/4k                      9892867   6849682    +30.8%
BM_memcpy_1T/5k                     14951099  10332856    +30.9%
BM_memcpy_2T/2                          3.50      2.46    +29.7%
BM_memcpy_2T/8                          12.3      7.66    +37.7%
BM_memcpy_2T/64                          371       376     -1.3%
BM_memcpy_2T/512                       66652     66788     -0.2%
BM_memcpy_2T/4k                      6145012   6117776     +0.4%
BM_memcpy_2T/5k                      9181478   9010942     +1.9%
BM_memcpy_4T/2                          3.47      2.47    +31.0%
BM_memcpy_4T/8                          12.3      6.67    +45.8
BM_memcpy_4T/64                          374       376     -0.5%
BM_memcpy_4T/512                       67833     68019     -0.3%
BM_memcpy_4T/4k                      5057425   5188253     -2.6%
BM_memcpy_4T/5k                      7555638   7779468     -3.0%
BM_memcpy_6T/2                          3.51      2.50    +28.8%
BM_memcpy_6T/8                          12.3      7.61    +38.1%
BM_memcpy_6T/64                          373       378     -1.3%
BM_memcpy_6T/512                       66871     66774     +0.1%
BM_memcpy_6T/4k                      5112975   5233502     -2.4%
BM_memcpy_6T/5k                      7614180   7772246     -2.1%
BM_memcpy_8T/2                          3.47      2.41    +30.5%
BM_memcpy_8T/8                          12.4      10.5    +15.3%
BM_memcpy_8T/64                          372       388     -4.3%
BM_memcpy_8T/512                       67373     66588     +1.2%
BM_memcpy_8T/4k                      5148462   5254897     -2.1%
BM_memcpy_8T/5k                      7660989   7799058     -1.8%
BM_memcpy_12T/2                         3.50      2.40    +31.4%
BM_memcpy_12T/8                         12.4      7.55    +39.1
BM_memcpy_12T/64                         374       378     -1.1%
BM_memcpy_12T/512                      67132     66683     +0.7%
BM_memcpy_12T/4k                     5185125   5292920     -2.1%
BM_memcpy_12T/5k                     7717284   7942684     -2.9%
BM_slicingSmallPieces_1T/2              47.3      47.5     +0.4%
BM_slicingSmallPieces_1T/8              53.6      52.3     +2.4%
BM_slicingSmallPieces_1T/64              491       476     +3.1%
BM_slicingSmallPieces_1T/512           21734     18814    +13.4%
BM_slicingSmallPieces_1T/4k           394660    396760     -0.5%
BM_slicingSmallPieces_1T/5k           218722    209244     +4.3%
BM_slicingSmallPieces_2T/2              80.7      79.9     +1.0%
BM_slicingSmallPieces_2T/8              54.2      53.1     +2.0
BM_slicingSmallPieces_2T/64              497       477     +4.0%
BM_slicingSmallPieces_2T/512           21732     18822    +13.4%
BM_slicingSmallPieces_2T/4k           392885    390490     +0.6%
BM_slicingSmallPieces_2T/5k           221988    208678     +6.0%
BM_slicingSmallPieces_4T/2              80.8      80.1     +0.9%
BM_slicingSmallPieces_4T/8              54.1      53.2     +1.7%
BM_slicingSmallPieces_4T/64              493       476     +3.4%
BM_slicingSmallPieces_4T/512           21702     18758    +13.6%
BM_slicingSmallPieces_4T/4k           393962    404023     -2.6%
BM_slicingSmallPieces_4T/5k           249667    211732    +15.2%
BM_slicingSmallPieces_6T/2              80.5      80.1     +0.5%
BM_slicingSmallPieces_6T/8              54.4      53.4     +1.8%
BM_slicingSmallPieces_6T/64              488       478     +2.0%
BM_slicingSmallPieces_6T/512           21719     18841    +13.3%
BM_slicingSmallPieces_6T/4k           394950    397583     -0.7%
BM_slicingSmallPieces_6T/5k           223080    210148     +5.8%
BM_slicingSmallPieces_8T/2              81.2      80.4     +1.0%
BM_slicingSmallPieces_8T/8              58.1      53.5     +7.9%
BM_slicingSmallPieces_8T/64              489       480     +1.8%
BM_slicingSmallPieces_8T/512           21586     18798    +12.9%
BM_slicingSmallPieces_8T/4k           394592    400165     -1.4%
BM_slicingSmallPieces_8T/5k           219688    208301     +5.2%
BM_slicingSmallPieces_12T/2             80.2      79.8     +0.7%
BM_slicingSmallPieces_12T/8             54.4      53.4     +1.8
BM_slicingSmallPieces_12T/64             488       476     +2.5%
BM_slicingSmallPieces_12T/512          21931     18831    +14.1%
BM_slicingSmallPieces_12T/4k          393962    396541     -0.7%
BM_slicingSmallPieces_12T/5k          218803    207965     +5.0%
2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen
7b6aaa3440 Fix NaN propagation for AVX512. 2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4 Make NaN propagatation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
d83db761a2 Add support for std::integral_constant 2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854 Add test for multiple symbols 2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13 Fix seq().reverse() in c++98 2017-01-24 11:36:43 +01:00
Gael Guennebaud
ddd83f82d8 Add support for "SymbolicExpr op fix<N>" in C++98/11 mode. 2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|) 2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62 Add internal doc 2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab Rename fix_t to FixedInt 2017-01-24 09:39:49 +01:00
Gael Guennebaud
ba3f977946 bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j)) 2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36 bug #1382: move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!) 2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a Add std:: namespace prefix to all (hopefully) instances if size_t/ptrdfiff_t 2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692 Use Index instead of size_t 2017-01-23 22:00:33 +01:00
Gael Guennebaud
0fe278f7be bug #1379: fix compilation in sparse*diagonal*dense with openmp 2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e bug #1378: fix doc (DiagonalIndex vs Diagonal) 2017-01-21 22:09:59 +01:00
Gael Guennebaud
4d302a080c Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only) 2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24 Exploit fixed values in seq and reverse with C++98 compatibility 2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34 Add support for fixed-value in symbolic expression, c++11 only for now. 2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8 Made sure that enabling avx2 instructions enables avx and sse instructions as well. 2017-01-19 09:54:48 -08:00
Gael Guennebaud
e84ed7b6ef Remove dead code 2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419 Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimize the usage of last and end. 2017-01-18 23:16:32 +01:00
Gael Guennebaud
15471432fe Add a .reverse() member to ArithmeticSequence. 2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a Add missing operator* 2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n) 2017-01-18 09:43:58 +01:00
Gael Guennebaud
5484ddd353 Merge the generic and dynamic overloads of block() 2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8 Defer set-to-zero in triangular = product so that no aliasing issue occur in the common:
A.triangularView() = B*A.sefladjointView()*B.adjoint()
case that used to work in 3.2.
2017-01-17 18:03:35 +01:00
Gael Guennebaud
5e36ec3b6f Fix regression when passing enums to operator() 2017-01-17 17:10:16 +01:00
Gael Guennebaud
4f36dcfda8 Add a generic block() method compatible with Eigen::fix 2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356 Add a get_runtime_value helper to deal with pointer-to-function hack,
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
23bfcfc15f Add missing overload of get_compile_time for c++98/11 2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2 Disambiguate the two versions of fix for doxygen 2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2 Add support for symbolic expressions as arguments of operator() 2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844 typos in doc 2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa Typo 2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845 Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
Gael Guennebaud
6e97698161 Introduce a EIGEN_HAS_CXX14 macro 2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381 Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree. 2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8 Reverting unintentional change to Eigen/Geometry 2017-01-16 11:05:56 +00:00
Fraser Cormack
8245d3c7ad Fix case-sensitivity of file include 2017-01-12 12:13:18 +00:00
Gael Guennebaud
752bd92ba5 Large code refactoring:
- generalize some utilities and move them to Meta (size(), array_size())
 - move handling of all and single indices to IndexedViewHelper.h
 - several cleanup changes
2017-01-11 17:24:02 +01:00
Gael Guennebaud
f93d1c58e0 Make get_compile_time compatible with variable_if_dynamic 2017-01-11 17:08:59 +01:00
Gael Guennebaud
c020d307a6 Make variable_if_dynamic<T> implicitely convertible to T 2017-01-11 17:08:05 +01:00
Gael Guennebaud
43c617e2ee merge 2017-01-11 14:33:37 +01:00
Gael Guennebaud
b1dc0fa813 Move fix and symbolic to their own file, and improve doxygen compatibility 2017-01-11 14:28:28 +01:00
Gael Guennebaud
04397f17e2 Add 1D overloads of operator() 2017-01-11 13:17:09 +01:00
Gael Guennebaud
1b5570988b Add doc to seq, seqN, ArithmeticSequence, operator(), etc. 2017-01-10 22:58:58 +01:00
Gael Guennebaud
17eac60446 Factorize const and non-const version of the generic operator() method. 2017-01-10 21:45:55 +01:00
Gael Guennebaud
d072fc4b14 add writeable IndexedView 2017-01-10 17:10:35 +01:00
Gael Guennebaud
c9d5e5c6da Simplify Symbolic API: std::tuple is now used internally and automatically built. 2017-01-10 16:55:07 +01:00
Gael Guennebaud
407e7b7a93 Simplify symbolic API by using "symbol=value" to associate a runtime value to a symbol. 2017-01-10 16:45:32 +01:00
Gael Guennebaud
96e6cf9aa2 Fix linking issue. 2017-01-10 16:35:46 +01:00
Gael Guennebaud
e63678bc89 Fix ambiguous call 2017-01-10 16:33:40 +01:00
Gael Guennebaud
8e247744a4 Fix linking issue 2017-01-10 16:32:06 +01:00
Gael Guennebaud
b47a7e5c3a Add doc for IndexedView 2017-01-10 16:28:57 +01:00
Gael Guennebaud
87963f441c Fallback to Block<> when possible (Index, all, seq with > increment).
This is important to take advantage of the optimized implementations (evaluator, products, etc.),
and to support sparse matrices.
2017-01-10 14:25:30 +01:00
Gael Guennebaud
a98c7efb16 Add a more generic evaluation mechanism and minimalistic doc. 2017-01-10 11:46:29 +01:00
Gael Guennebaud
13d954f270 Cleanup Eigen's namespace 2017-01-10 11:06:02 +01:00
Gael Guennebaud
9eaab4f9e0 Refactoring: move all symbolic stuff into its own namespace 2017-01-10 10:57:08 +01:00
Gael Guennebaud
acd08900c9 Move 'last' and 'end' to their own namespace 2017-01-10 10:31:07 +01:00
Gael Guennebaud
1df2377d78 Implement c++98 version of seq() 2017-01-10 10:28:45 +01:00
Gael Guennebaud
ecd9cc5412 Isolate legacy code (we keep it for performance comparison purpose) 2017-01-10 09:34:25 +01:00
Gael Guennebaud
b50c3e967e Add a minimalistic symbolic scalar type with expression template and make use of it to define the last placeholder and to unify the return type of seq and seqN. 2017-01-09 23:42:16 +01:00
Gael Guennebaud
68064e14fa Rename span/range to seqN/seq 2017-01-09 17:35:21 +01:00
Gael Guennebaud
ad3eef7608 Add link to SO 2017-01-09 13:01:39 +01:00
Gael Guennebaud
75aef5b37f Fix extraction of compile-time size of std::array with gcc 2017-01-06 22:04:49 +01:00
Gael Guennebaud
233dff1b35 Add support for plain arrays for columns and both rows/columns 2017-01-06 22:01:53 +01:00
Gael Guennebaud
76e183bd52 Propagate compile-time size for plain arrays 2017-01-06 22:01:23 +01:00
Gael Guennebaud
3264d3c761 Add support for plain-array as indices, e.g., mat({1,2,3,4}) 2017-01-06 21:53:32 +01:00
Gael Guennebaud
831fffe874 Add missing doc of SparseView 2017-01-06 18:01:29 +01:00
Gael Guennebaud
a875167d99 Propagate compile-time increment and strides.
Had to introduce a UndefinedIncr constant for non structured list of indices.
2017-01-06 15:54:55 +01:00
Gael Guennebaud
e383d6159a MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd 2017-01-06 15:44:13 +01:00
Gael Guennebaud
fad1fa75b3 Propagate compile-time size with "all" and add c++11 array unit test 2017-01-06 13:29:33 +01:00
Gael Guennebaud
3730e3ca9e Use "fix" for compile-time values, propagate compile-time sizes for span, clean some cleanup. 2017-01-06 13:10:10 +01:00
Gael Guennebaud
ac7e4ac9c0 Initial commit to add a generic indexed-based view of matrices.
This version already works as a read-only expression.
Numerous refactoring, renaming, extension, tuning passes are expected...
2017-01-06 00:01:44 +01:00
Jim Radford
0c226644d8 LLT: const the arg to solveInPlace() to allow passing .transpose(), .block(), etc. 2017-01-04 14:42:57 -08:00
Jim Radford
be281e5289 LLT: avoid making a copy when decomposing in place 2017-01-04 14:43:56 -08:00
Gael Guennebaud
e27f17bf5c Gub 1453: fix Map with non-default inner-stride but no outer-stride. 2017-08-22 13:27:37 +02:00
Gael Guennebaud
21d0a0bcf5 bug #1456: add perf recommendation for LLT and storage format 2017-08-22 12:46:35 +02:00
Gael Guennebaud
2c3d70d915 Re-enable hidden doc in LLT 2017-08-22 12:04:09 +02:00
Gael Guennebaud
a6e7a41a55 bug #1455: Cholesky module depends on Jacobi for rank-updates. 2017-08-22 11:37:32 +02:00
Gael Guennebaud
e6021cc8cc bug #1458: fix documentation of LLT and LDLT info() method. 2017-08-22 11:32:55 +02:00
Gael Guennebaud
f727844658 use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
Gael Guennebaud
8c858bd891 Clarify doc regarding the usage of MKL_DIRECT_CALL 2017-08-17 12:17:45 +02:00
Gael Guennebaud
b95f92843c Fix support for MKL's BLAS when using MKL_DIRECT_CALL. 2017-08-17 12:07:10 +02:00
Gael Guennebaud
687bedfcad Make NoAlias and JacobiRotation compatible with CUDA. 2017-08-17 11:51:22 +02:00
Gael Guennebaud
1f4b24d2df Do not preallocate more space than the matrix size (when the sparse matrix boils down to a vector 2017-07-20 10:13:48 +02:00
Gael Guennebaud
55d7181557 Fix lazyness of operator* with CUDA 2017-07-20 09:47:28 +02:00
Gael Guennebaud
cda47c42c2 Fix compilation in c++98 mode. 2017-07-17 21:08:20 +02:00
Gael Guennebaud
3182bdbae6 Disable vectorization when compiled by nvcc, even is EIGEN_NO_CUDA is defined 2017-07-17 11:01:28 +02:00
Gael Guennebaud
9f8136ff74 disable nvcc boolean-expr-is-constant warning 2017-07-17 10:43:18 +02:00
Gael Guennebaud
bbd97b4095 Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases 2017-07-17 01:02:51 +02:00
Gael Guennebaud
2299717fd5 Fix and workaround several doxygen issues/warnings 2017-01-04 23:27:33 +01:00
Gael Guennebaud
ee6f7f6c0c Add doc for sparse triangular solve functions 2017-01-04 23:10:36 +01:00
Gael Guennebaud
a0a36ad0ef bug #1336: workaround doxygen failing to include numerous members of MatriBase in Matrix 2017-01-04 22:02:39 +01:00
Gael Guennebaud
29a1a58113 Document selfadjointView 2017-01-04 22:01:50 +01:00
Gael Guennebaud
8702562177 bug #1370: add doc for StorageIndex 2017-01-03 11:25:41 +01:00
Gael Guennebaud
575c078759 bug #1370: rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index 2017-01-03 11:19:14 +01:00
Valentin Roussellet
d3c5525c23 Added += and + operators to inner iterators
Fix #1340
#1340
2016-12-28 18:29:30 +01:00
Gael Guennebaud
5c27962453 Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class. 2017-01-02 22:27:07 +01:00
Gael Guennebaud
8d7810a476 bug #1365: fix another type mismatch warning
(sync is set from and compared to an Index)
2016-12-28 23:35:43 +01:00
Gael Guennebaud
97812ff0d3 bug #1369: fix type mismatch warning.
Returned values of omp thread id and numbers are int,
o let's use int instead of Index here.
2016-12-28 23:29:35 +01:00
Gael Guennebaud
7713e20fd2 Fix compilation 2016-12-27 22:04:58 +01:00
Gael Guennebaud
ab69a7f6d1 Cleanup because trait<CwiseBinaryOp>::Flags now expose the correct storage order 2016-12-27 16:55:47 +01:00
Gael Guennebaud
d32a43e33a Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly. 2016-12-27 16:35:45 +01:00
Gael Guennebaud
7136267461 Add missing .outer() member to iterators of evaluators of cwise sparse binary expression 2016-12-27 16:34:30 +01:00
Gael Guennebaud
fe0ee72390 Fix check of storage order mismatch for "sparse cwiseop sparse". 2016-12-27 16:33:19 +01:00
Gael Guennebaud
6b8f637ab1 Harmless typo 2016-12-27 16:31:17 +01:00
Benoit Steiner
354baa0fb1 Avoid using horizontal adds since they're not very efficient. 2016-12-21 20:55:07 -08:00
Benoit Steiner
d7825b6707 Use native AVX512 types instead of Eigen Packets whenever possible. 2016-12-21 20:06:18 -08:00
Gael Guennebaud
c6882a72ed Merged in joaoruileal/eigen (pull request PR-276)
Minor improvements to Umfpack support
2016-12-21 21:39:48 +01:00
Joao Rui Leal
c8c89b5e19 renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus() 2016-12-21 09:16:28 +00:00
Gael Guennebaud
f2f9df8aa5 Remove MSVC warning 4127 - conditional expression is constant from the disabled list as we now have a local workaround. 2016-12-20 22:53:19 +01:00
Gael Guennebaud
2b3fc981b8 bug #1362: workaround constant conditional warning produced by MSVC 2016-12-20 22:52:27 +01:00
Gael Guennebaud
94e8d8902f Fix bug #1367: compilation fix for gcc 4.1! 2016-12-20 22:17:01 +01:00
Gael Guennebaud
684cfc762d Add transpose, adjoint, conjugate methods to SelfAdjointView (useful to write generic code) 2016-12-20 16:33:53 +01:00
Gael Guennebaud
11f55b2979 Optimize storage layout of Cwise* and PlainObjectBase evaluator to remove the functor or outer-stride if they are empty.
For instance, sizeof("(A-B).cwiseAbs2()") with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
In theory, evaluators should be completely optimized away by the compiler, but this might help in some cases.
2016-12-20 15:55:40 +01:00
Gael Guennebaud
5271474b15 Remove common "noncopyable" base class from evaluator_base to get a chance to get EBO (Empty Base Optimization)
Note: we should probbaly get rid of this class and define a macro instead.
2016-12-20 15:51:30 +01:00
Gael Guennebaud
316673bbde Clean-up usage of ExpressionTraits in all/any implementation. 2016-12-20 14:38:05 +01:00
Christoph Hertzberg
10c6bcdc2e Add support for long indexes and for (real-valued) row-major matrices to CholmodSupport module 2016-12-19 14:07:42 +01:00
Gael Guennebaud
f5d644b415 Make sure that HyperPlane::transform manitains a unit normal vector in the Affine case. 2016-12-20 09:35:00 +01:00
Benoit Steiner
923acadfac Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics 2016-12-19 13:02:27 -08:00
Benoit Jacob
751e097c57 Use 32 registers on ARM64 2016-12-19 13:44:46 -05:00
Benoit Steiner
fb1d0138ec Include SSE packet instructions when compiling with avx512 enabled. 2016-12-19 07:32:48 -08:00
Joao Rui Leal
95b804c0fe it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack 2016-12-19 10:45:59 +00:00
Gael Guennebaud
8c0e701504 bug #1360: fix sign issue with pmull on altivec 2016-12-18 22:13:19 +00:00
Gael Guennebaud
fc94258e77 Fix unused warning 2016-12-18 22:11:48 +00:00
ermak
d60cca32e5 Transformation methods added to ParametrizedLine class. 2016-12-17 00:45:13 +07:00
Benoit Steiner
9e03dfb452 Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code 2016-12-17 09:23:37 -08:00
Rafael Guglielmetti
8f11df2667 NumTraits.h:
For the values 'ReadCost, AddCost and MulCost', information about value Eigen::HugeCost
2016-12-16 09:07:12 +00:00
Benoit Steiner
1324ffef2f Reenabled the use of constexpr on OpenCL devices 2016-12-15 06:49:38 -08:00
Gael Guennebaud
5d00fdf0e8 bug #1363: fix mingw's ABI issue 2016-12-15 11:58:31 +01:00
Gael Guennebaud
11b492e993 bug #1358: fix compilation for sparse += sparse.selfadjointView(); 2016-12-14 17:53:47 +01:00
Gael Guennebaud
e67397bfa7 bug #1359: fix compilation of col_major_sparse.row() *= scalar
(used to work in 3.2.9 though the expression is not really writable)
2016-12-14 17:05:26 +01:00
Gael Guennebaud
98d7458275 bug #1359: fix sparse /=scalar and *=scalar implementation.
InnerIterators must be obtained from an evaluator.
2016-12-14 17:03:13 +01:00
Gael Guennebaud
c817ce3ba3 bug #1361: fix compilation issue in mat=perm.inverse() 2016-12-13 23:10:27 +01:00
Benoit Steiner
6811e6cf49 Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268)
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-08 05:14:11 -08:00
Angelos Mantzaflaris
7694684992 Remove superfluous const's (can cause warnings on some Intel compilers)
(grafted from e236d3443c
)
2016-12-07 00:37:48 +01:00
Gael Guennebaud
eb621413c1 Revert vec/y to vec*(1/y) in row-major TRSM:
- div is extremely costly
- this is consistent with the column-major case
- this is consistent with all other BLAS implementations
2016-12-06 15:04:50 +01:00
Gael Guennebaud
8365c2c941 Fix BLAS backend for symmetric rank K updates. 2016-12-06 14:47:09 +01:00
Srinivas Vasudevan
e6c8b5500c Change comparisons to use Scalar instead of RealScalar. 2016-12-05 14:01:45 -08:00
Srinivas Vasudevan
f7d7c33a28 Fix expm1 CUDA implementation (do not shadow exp CUDA implementation). 2016-12-05 12:19:01 -08:00
Srinivas Vasudevan
09ee7f0c80 Fix small nit where I changed name of plog1p to pexpm1. 2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
a0d3ac760f Sync from Head. 2016-12-02 14:14:45 -08:00
Srinivas Vasudevan
218764ee1f Added support for expm1 in Eigen. 2016-12-02 14:13:01 -08:00
Gael Guennebaud
66f65ccc36 Ease compiler job to generate clean and efficient code in mat*vec. 2016-12-02 22:41:26 +01:00
Gael Guennebaud
fe696022ec Operators += and -= do not resize! 2016-12-02 22:40:25 +01:00
Angelos Mantzaflaris
18de92329e use numext::abs
(grafted from 0a08d4c60b
)
2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris
e8a6aa518e 1. Add explicit template to abs2 (resolves deduction for some arithmetic types)
2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned)
(grafted from 4086187e49
)
2016-12-02 11:39:18 +01:00
Gael Guennebaud
a6b971e291 Fix memory leak in Ref<Sparse> 2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65 Optimize SparseLU::solve for rhs vectors 2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903 remove temporary in SparseLU::solve 2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4 bug #1356: fix calls to evaluator::coeffRef(0,0) to get the address of the destination
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
Gael Guennebaud
91003f3b86 typo 2016-12-05 13:51:07 +01:00
Gael Guennebaud
e3f613cbd4 Improve performance of row-major-dense-matrix * vector products for recent CPUs.
This revised version does not bother about aligned loads/stores,
and rather processes 8 rows at ones for better instruction pipelining.
2016-12-05 13:02:01 +01:00
Gael Guennebaud
3abc827354 Clean debugging code 2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
6a5fe86098 Complete rewrite of column-major-matrix * vector product to deliver higher performance of modern CPU.
The previous code has been optimized for Intel core2 for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverage FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
Christoph Hertzberg
22f7d398e2 bug #1355: Fixed wrong line-endings on two files 2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4 Clean up SparseCore module regarding ReverseInnerIterator 2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09 typo UIntPtr
(grafted from b6f04a2dd4
)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655 fix two warnings(unused typedef, unused variable) and a typo
(grafted from a9aa3bcf50
)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb fix member order 2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae Merged in rmlarsen/eigen (pull request PR-256)
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Benoit Steiner
7ff26ddcbb Merged eigen/eigen into default 2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d Fix misleading-indentation warnings. 2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e Adding sycl backend for TensorPadding.h; disbaling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP ; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code. 2016-12-01 13:02:27 +00:00
Benoit Steiner
fd1dc3363e Merged eigen/eigen into default 2016-11-30 20:16:17 -08:00
Gael Guennebaud
8df272af88 Fix slection of product implementation for dynamic size matrices with fixed max size. 2016-11-30 22:21:33 +01:00
Gael Guennebaud
c927af60ed Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. 2016-11-30 17:59:13 +01:00
Gael Guennebaud
ab4ef5e66e bug #1351: fix compilation of random with old compilers 2016-11-30 17:37:53 +01:00
Rasmus Munk Larsen
a0329f64fb Add a default constructor for the "fake" __half class when not using the
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Benoit Steiner
9f8fbd9434 Merged eigen/eigen into default 2016-11-26 11:28:25 -08:00
Mehdi Goli
7318daf887 Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h. 2016-11-25 16:19:07 +00:00
Benoit Steiner
3be1afca11 Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5 2016-11-23 18:49:51 -08:00
Mehdi Goli
b8cc5635d5 Removing unsupported device from test case; cleaning the tensor device sycl. 2016-11-23 16:30:41 +00:00
Gael Guennebaud
e340866c81 Fix compilation with gcc and old ABI version 2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98 Fix compilation issue with MSVC:
MSVC always messes up with shadowed template arguments, for instance in:
  struct B { typedef float T; }
  template<typename T> struct A : B {
    T g;
  };
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
Gael Guennebaud
74637fa4e3 Optimize predux<Packet8f> (AVX) 2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856 Disable usage of SSE3 _mm_hadd_ps that is extremely slow. 2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e Optimize predux<Packet4d> (AVX) 2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940 Disable usage of SSE3 haddpd that is extremely slow. 2016-11-22 16:58:31 +01:00
Gael Guennebaud
6a84246a6a Fix regression in assigment of sparse block to spasre block. 2016-11-21 21:46:42 +01:00
Benoit Steiner
ed839c5851 Enable the use of constant expressions with clang >= 3.6 2016-11-20 10:34:49 -08:00
Gael Guennebaud
465ede0f20 Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d3
)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474 Fixed merge conflicts 2016-11-19 19:12:59 -08:00
Benoit Steiner
1bdf1b9ce0 Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
8649e16c2a Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio 2016-11-18 14:18:34 -08:00
Gael Guennebaud
164414c563 Merged in ChunW/eigen (pull request PR-252)
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Luke Iwanski
5159675c33 Added isnan, isfinite and isinf for SYCL device. Plus test for that. 2016-11-18 16:01:48 +00:00
Gael Guennebaud
8193ffb3d3 bug #1343: fix compilation regression in mat+=selfadjoint_view.
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2 bug #1343: fix compilation regression in array = matrix_product 2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f Merged eigen/eigen into default 2016-11-17 22:53:37 -08:00
Chun Wang
0d0948c3b9 Workaround for error in VS2012 with /clr 2016-11-17 17:54:27 -05:00
Konstantinos Margaritis
672aa97d4d implement float/std::complex<float> for ZVector as well, minor fixes to ZVector 2016-11-17 13:27:33 -05:00
Luke Iwanski
c5130dedbe Specialised basic math functions for SYCL device. 2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256 Enable the use of AVX512 instruction by default 2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c bump default branch to 3.3.90 2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4 Optimized the computation of exp, sqrt, ceil anf floor for fp16 on Pascal GPUs 2016-11-16 09:01:51 -08:00
Gael Guennebaud
0ee92aa38e Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products. 2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0 bug #426: move operator && and || to MatrixBase and SparseMatrixBase. 2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c Merged in olesalscheider/eigen (pull request PR-248)
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba Fix regression in SparseMatrix::ReverseInnerIterator 2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408 Make sure not to call numext::maxi on expression templates 2016-11-12 12:20:57 +01:00
Gael Guennebaud
eeac81b8c0 bump to 3.3.0 2016-11-10 13:55:14 +01:00
Gael Guennebaud
e80bc2ddb0 Fix printing of sparse expressions 2016-11-10 10:35:32 +01:00
Benoit Steiner
db3903498d Merged in benoitsteiner/opencl (pull request PR-246)
Improved support for OpenCL
2016-11-08 22:28:44 +00:00
Gael Guennebaud
436a111792 Generalize Cholmod support to hanlde any sparse type as the rhs and result of the solve method 2016-11-06 20:29:23 +01:00
Gael Guennebaud
afc55b1885 Generalize IterativeSolverBase::solve to hanlde any sparse type as the results (instead of SparseMatrix only) 2016-11-06 20:28:18 +01:00
Gael Guennebaud
a5c2d8a3cc Generalize solve_sparse_through_dense_panels to handle SparseVector. 2016-11-06 15:20:58 +01:00
Gael Guennebaud
f8bfe10613 Add missing friend declaration 2016-11-06 15:20:30 +01:00
Gael Guennebaud
fc7180cda8 Add a default ctor to evaluator<SparseVector>.
Needed for evaluator<Solve>.
2016-11-06 15:20:00 +01:00
Gael Guennebaud
4d226ab5b5 Enable swapping between SparseMatrix and SparseVector 2016-11-06 15:15:03 +01:00
Gael Guennebaud
a354c3ca59 Fix compilation of LLT with complex<mpreal>. 2016-11-05 11:28:29 +01:00
Benoit Steiner
d46a36cc84 Merged eigen/eigen into default 2016-11-04 18:22:55 -07:00
Mehdi Goli
0ebe3808ca Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size; 2016-11-04 18:18:19 +00:00
Gael Guennebaud
ba05572dcb bump to 3.3-rc2 2016-11-04 09:09:06 +01:00
Benoit Steiner
5c3995769c Improved AVX512 configuration 2016-11-03 04:50:28 -07:00
Benoit Steiner
ca0ba0d9a4 Improved AVX512 support 2016-11-03 04:00:49 -07:00
Benoit Steiner
c80587c92b Merged eigen/eigen into default 2016-11-03 03:55:11 -07:00
Gael Guennebaud
3f1d0cdc22 bug #1337: improve doc of homogeneous() and hnormalized() 2016-11-03 11:03:08 +01:00
Gael Guennebaud
78e93ac1ad bug #1330: Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement. 2016-11-03 10:21:59 +01:00
Benoit Steiner
3e37166d0b Merged in benoitsteiner/opencl (pull request PR-244)
Disable vectorization on device only when compiling for sycl
2016-11-02 22:01:03 +00:00
Benoit Steiner
0585b2965d Disable vectorization on device only when compiling for sycl 2016-11-02 11:44:27 -07:00
Gael Guennebaud
a07bb428df bug #1004: improve accuracy of LinSpaced for abs(low) >> abs(high). 2016-11-02 11:34:38 +01:00
Gael Guennebaud
598de8b193 Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX. 2016-11-02 10:38:13 +01:00
Benoit Steiner
7a0e96b80d Gate the code that refers to cuda fp16 primitives more thoroughly 2016-11-01 12:08:09 -07:00
Gael Guennebaud
3ecb343dc3 Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression. 2016-10-26 22:50:41 +02:00
Gael Guennebaud
97feea9d39 add a generic EIGEN_HAS_CXX11 2016-10-26 15:53:13 +02:00
Gael Guennebaud
ca6a2a5248 Fix warning with ICC 2016-10-26 14:13:05 +02:00
Gael Guennebaud
b15a5dc3f4 Fix ICC warnings 2016-10-25 22:20:24 +02:00
Gael Guennebaud
aad72f3c6d Add missing inline keywords 2016-10-25 20:20:09 +02:00
Benoit Steiner
3e194a6a73 Fixed a typo 2016-10-25 08:42:15 -07:00
Gael Guennebaud
58146be99b bug #1004: one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity.
This version simply does low+i*step plus a branch to return high if i==size-1.
Vectorization is accomplished with a branch and the help of pinsertlast.
Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.
2016-10-25 16:53:09 +02:00
Gael Guennebaud
13fc18d3a2 Add a pinsertlast function replacing the last entry of a packet by a scalar.
(useful to vectorize LinSpaced)
2016-10-25 16:48:49 +02:00
Gael Guennebaud
2634f9386c bug #1333: fix bad usage of const_cast_derived. Better use .data() for that purpose. 2016-10-24 22:22:35 +02:00
Gael Guennebaud
9e8f07d7b5 Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors. 2016-10-24 22:16:48 +02:00
Gael Guennebaud
b027d7a8cf bug #1004: remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term.
Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.
2016-10-24 20:27:21 +02:00
Benoit Steiner
b11aab5fcc Merged in benoitsteiner/opencl (pull request PR-238)
Added support for OpenCL to the Tensor Module
2016-10-24 15:30:45 +00:00
Gael Guennebaud
53c77061f0 bug #698: rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible.
Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution.
This changeset also disable vectorization for this integer path.
2016-10-24 15:50:27 +02:00
Gael Guennebaud
40f62974b7 bug #1328: workaround a compilation issue with gcc 4.2 2016-10-20 19:19:37 +02:00
Benoit Steiner
cf20b30d65 Merge latest updates from trunk 2016-10-20 09:42:05 -07:00
Benoit Steiner
d3943cd50c Fixed a few typos in the ternary tensor expressions types 2016-10-19 12:56:12 -07:00
Mehdi Goli
8fb162fc85 Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core. 2016-10-16 12:52:34 +01:00
Luke Iwanski
2e188dd4d4 Merged ComputeCpp to default. 2016-10-14 16:47:40 +01:00
Mehdi Goli
15380f9a87 Applyiing Benoit's comment to return the missing line back in Eigen/Core 2016-10-14 16:39:41 +01:00
Gael Guennebaud
692b30ca95 Fix previous merge. 2016-10-14 17:16:28 +02:00
Gael Guennebaud
050c681bdd Merged in rmlarsen/eigen2 (pull request PR-232)
Improve performance of parallelized matrix multiply for rectangular matrices
2016-10-14 14:51:09 +00:00