Commit Graph

5376 Commits

Author SHA1 Message Date
Benoit Steiner
7b61944669 Made most of the packet math primitives usable within CUDA kernel when compiling with clang 2017-02-28 17:05:28 -08:00
Benoit Steiner
857adbbd52 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 14:58:45 -08:00
Benoit Steiner
4a7df114c8 Added missing EIGEN_DEVICE_FUNC 2017-02-28 14:00:15 -08:00
Benoit Steiner
765f4cc4b4 Deleted extra: EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet. 2017-02-28 11:57:00 -08:00
Benoit Steiner
e993c94f07 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:56:45 -08:00
Benoit Steiner
33443ec2b0 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:50:10 -08:00
Benoit Steiner
f3e9c42876 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:46:30 -08:00
Gael Guennebaud
4e98a7b2f0 bug #1396: add some missing EIGEN_DEVICE_FUNC 2017-02-28 09:47:38 +01:00
Benoit Steiner
889c606f8f Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops 2017-02-27 17:17:47 -08:00
Benoit Steiner
193939d6aa Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods. 2017-02-27 17:11:47 -08:00
Benoit Steiner
ed4dc9d01a Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel 2017-02-27 16:57:01 -08:00
Benoit Steiner
b1fc7c9a09 Added missing EIGEN_DEVICE_FUNC qualifiers. 2017-02-27 16:48:30 -08:00
Benoit Steiner
554116bec1 Added EIGEN_DEVICE_FUNC to make the prototype of the EigenBase override match that of DenseBase 2017-02-27 16:45:31 -08:00
Benoit Steiner
34d9fce93b Avoid unecessary float to double conversions. 2017-02-27 16:33:33 -08:00
Gael Guennebaud
76687f385c bug #1394: fix compilation of SelfAdjointEigenSolver<Matrix>(sparse*sparse); 2017-02-20 14:27:26 +01:00
Gael Guennebaud
6572825703 bug #1395: fix the use of compile-time vectors as inputs of JacobiSVD. 2017-02-20 13:44:37 +01:00
Gael Guennebaud
a811a04696 Silent warning. 2017-02-20 10:14:21 +01:00
Gael Guennebaud
63798df038 Fix usage of CUDACC_VER 2017-02-20 08:16:36 +01:00
Gael Guennebaud
deefa54a54 Fix tracking of temporaries in unit tests 2017-02-19 10:32:54 +01:00
Gael Guennebaud
cbbf88c4d7 Use int32_t instead of int in NEON code. Some platforms with 16 bytes int supports ARM NEON. 2017-02-17 14:39:02 +01:00
Gael Guennebaud
582b5e39bf bug #1393: enable Matrix/Array explicit ctor from types with conversion operators (was ok with 3.2) 2017-02-17 14:10:57 +01:00
Benoit Steiner
31a25ab226 Merged eigen/eigen into default 2017-02-14 15:36:21 -08:00
Gael Guennebaud
5937c4ae32 Fall back is_integral to std::is_integral in c++11 2017-02-13 17:14:26 +01:00
Jonathan Hseu
3453b00a1e Fix vector indexing with uint64_t 2017-02-11 21:45:32 -08:00
Gael Guennebaud
e7ebe52bfb bug #1391: include IO.h before DenseBase to enable its usage in DenseBase plugins. 2017-02-13 09:46:20 +01:00
Gael Guennebaud
b3750990d5 Workaround some gcc 4.7 warnings 2017-02-11 23:24:06 +01:00
Gael Guennebaud
c16ee72b20 bug #1392: fix #include <Eigen/Sparse> with mpl2-only 2017-02-11 10:35:01 +01:00
Gael Guennebaud
e43016367a Forgot to include a file in previous commit 2017-02-11 10:34:18 +01:00
Gael Guennebaud
6486d4fc95 Worakound gcc 4.7 issue in c++11. 2017-02-11 10:29:10 +01:00
Gael Guennebaud
4a4a72951f Fix previous commits: disbale only problematic indexed view methods for old compilers instead of disabling everything.
Tested with gcc 4.7 (c++03) and gcc 4.8 (c++03 & c++11)
2017-02-11 10:28:44 +01:00
Benoit Steiner
fad776492f Merged eigen/eigen into default 2017-02-10 14:27:43 -08:00
Benoit Steiner
1ef30b8090 Fixed bug introduced in previous commit 2017-02-10 13:35:10 -08:00
Benoit Steiner
769208a17f Pulled latest updates from upstream 2017-02-10 13:11:40 -08:00
Benoit Steiner
8b3cc54c42 Added a new EIGEN_HAS_INDEXED_VIEW define that set to 0 for older compilers that are known to fail to compile the indexed views (I used the define from the indexed_views.cpp test).
Only include the indexed view methods when the compiler supports the code.
This makes it possible to use Eigen again in complex code bases such as TensorFlow and older compilers such as gcc 4.8
2017-02-10 13:08:49 -08:00
Gael Guennebaud
a1ff24f96a Fix prunning in (sparse*sparse).pruned() when the result is nearly dense. 2017-02-10 13:59:32 +01:00
Gael Guennebaud
0256c52359 Include clang in the list of non strict MSVC (just to be sure) 2017-02-10 13:41:52 +01:00
Alexander Neumann
dd58462e63 fixed inlining issue with clang-cl on visual studio
(grafted from 7962ac1a58
)
2017-02-08 23:50:38 +01:00
Gael Guennebaud
fc8fd5fd24 Improve multi-threading heuristic for matrix products with a small number of columns. 2017-02-07 17:19:59 +01:00
Gael Guennebaud
4254b3eda3 bug #1389: MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
Benoit Steiner
2db75c07a6 fixed the ordering of the template and EIGEN_DEVICE_FUNC keywords in a few more places to get more of the Eigen codebase to compile with nvcc again. 2017-02-01 15:41:29 -08:00
Benoit Steiner
fcd257039b Replaced EIGEN_DEVICE_FUNC template<foo> with template<foo> EIGEN_DEVICE_FUNC to make the code compile with nvcc8. 2017-02-01 15:30:49 -08:00
Gael Guennebaud
0eceea4efd Define EIGEN_COMP_GNUC to reflect version number: 47, 48, 49, 50, 60, ... 2017-02-01 23:36:40 +01:00
Gael Guennebaud
645a8e32a5 Fix compilation of JacobiSVD for vectors type 2017-01-31 16:22:54 +01:00
Gael Guennebaud
53026d29d4 bug #478: fix regression in the eigen decomposition of zero matrices. 2017-01-31 14:22:42 +01:00
Benoit Steiner
fbc39fd02c Merge latest changes from upstream 2017-01-30 15:25:57 -08:00
Gael Guennebaud
c86911ac73 bug #1384: fix evaluation of "sparse/scalar" that used the wrong evaluation path. 2017-01-30 13:38:24 +01:00
Gael Guennebaud
d024e9942d MSVC 1900 release is not c++14 compatible enough for us. The 1910 update seems to be fine though. 2017-01-27 22:17:59 +01:00
Rasmus Munk Larsen
edaa0fc5d1 Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warning everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579 Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
98dfe0c13f Fix useless ';' warning 2017-01-25 22:55:04 +01:00
Gael Guennebaud
28351073d8 Fix unamed type as template argument (ok in c++11 only) 2017-01-25 22:54:51 +01:00
Gael Guennebaud
607be65a03 Fix duplicates of array_size bewteen unsupported and Core 2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
7d39c6d50a Merged eigen/eigen into default 2017-01-25 09:22:26 -08:00
Rasmus Munk Larsen
5c9ed4ba0d Reverse arguments for pmin in AVX. 2017-01-25 09:21:57 -08:00
Gael Guennebaud
850ca961d2 bug #1383: fix regression in LinSpaced for integers and high<low 2017-01-25 18:13:53 +01:00
Gael Guennebaud
296d24be4d bug #1381: fix sparse.diagonal() used as a rvalue.
The problem was that is "sparse" is not const, then sparse.diagonal() must have the
LValueBit flag meaning that sparse.diagonal().coeff(i) must returns a const reference,
const Scalar&. However, sparse::coeff() cannot returns a reference for a non-existing
zero coefficient. The trick is to return a reference to a local member of
evaluator<SparseMatrix>.
2017-01-25 17:39:01 +01:00
Gael Guennebaud
d06a48959a bug #1383: Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0. 2017-01-25 15:27:13 +01:00
Rasmus Munk Larsen
ae3e43a125 Remove extra space. 2017-01-24 16:16:39 -08:00
Benoit Steiner
e96c77668d Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352 Update copy helper to use fast_memcpy. 2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221 Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.

2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:

Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                          3.48      2.39    +31.3%
BM_memcpy_1T/8                          12.3      6.51    +47.0%
BM_memcpy_1T/64                          371       383     -3.2%
BM_memcpy_1T/512                       66922     66720     +0.3%
BM_memcpy_1T/4k                      9892867   6849682    +30.8%
BM_memcpy_1T/5k                     14951099  10332856    +30.9%
BM_memcpy_2T/2                          3.50      2.46    +29.7%
BM_memcpy_2T/8                          12.3      7.66    +37.7%
BM_memcpy_2T/64                          371       376     -1.3%
BM_memcpy_2T/512                       66652     66788     -0.2%
BM_memcpy_2T/4k                      6145012   6117776     +0.4%
BM_memcpy_2T/5k                      9181478   9010942     +1.9%
BM_memcpy_4T/2                          3.47      2.47    +31.0%
BM_memcpy_4T/8                          12.3      6.67    +45.8
BM_memcpy_4T/64                          374       376     -0.5%
BM_memcpy_4T/512                       67833     68019     -0.3%
BM_memcpy_4T/4k                      5057425   5188253     -2.6%
BM_memcpy_4T/5k                      7555638   7779468     -3.0%
BM_memcpy_6T/2                          3.51      2.50    +28.8%
BM_memcpy_6T/8                          12.3      7.61    +38.1%
BM_memcpy_6T/64                          373       378     -1.3%
BM_memcpy_6T/512                       66871     66774     +0.1%
BM_memcpy_6T/4k                      5112975   5233502     -2.4%
BM_memcpy_6T/5k                      7614180   7772246     -2.1%
BM_memcpy_8T/2                          3.47      2.41    +30.5%
BM_memcpy_8T/8                          12.4      10.5    +15.3%
BM_memcpy_8T/64                          372       388     -4.3%
BM_memcpy_8T/512                       67373     66588     +1.2%
BM_memcpy_8T/4k                      5148462   5254897     -2.1%
BM_memcpy_8T/5k                      7660989   7799058     -1.8%
BM_memcpy_12T/2                         3.50      2.40    +31.4%
BM_memcpy_12T/8                         12.4      7.55    +39.1
BM_memcpy_12T/64                         374       378     -1.1%
BM_memcpy_12T/512                      67132     66683     +0.7%
BM_memcpy_12T/4k                     5185125   5292920     -2.1%
BM_memcpy_12T/5k                     7717284   7942684     -2.9%
BM_slicingSmallPieces_1T/2              47.3      47.5     +0.4%
BM_slicingSmallPieces_1T/8              53.6      52.3     +2.4%
BM_slicingSmallPieces_1T/64              491       476     +3.1%
BM_slicingSmallPieces_1T/512           21734     18814    +13.4%
BM_slicingSmallPieces_1T/4k           394660    396760     -0.5%
BM_slicingSmallPieces_1T/5k           218722    209244     +4.3%
BM_slicingSmallPieces_2T/2              80.7      79.9     +1.0%
BM_slicingSmallPieces_2T/8              54.2      53.1     +2.0
BM_slicingSmallPieces_2T/64              497       477     +4.0%
BM_slicingSmallPieces_2T/512           21732     18822    +13.4%
BM_slicingSmallPieces_2T/4k           392885    390490     +0.6%
BM_slicingSmallPieces_2T/5k           221988    208678     +6.0%
BM_slicingSmallPieces_4T/2              80.8      80.1     +0.9%
BM_slicingSmallPieces_4T/8              54.1      53.2     +1.7%
BM_slicingSmallPieces_4T/64              493       476     +3.4%
BM_slicingSmallPieces_4T/512           21702     18758    +13.6%
BM_slicingSmallPieces_4T/4k           393962    404023     -2.6%
BM_slicingSmallPieces_4T/5k           249667    211732    +15.2%
BM_slicingSmallPieces_6T/2              80.5      80.1     +0.5%
BM_slicingSmallPieces_6T/8              54.4      53.4     +1.8%
BM_slicingSmallPieces_6T/64              488       478     +2.0%
BM_slicingSmallPieces_6T/512           21719     18841    +13.3%
BM_slicingSmallPieces_6T/4k           394950    397583     -0.7%
BM_slicingSmallPieces_6T/5k           223080    210148     +5.8%
BM_slicingSmallPieces_8T/2              81.2      80.4     +1.0%
BM_slicingSmallPieces_8T/8              58.1      53.5     +7.9%
BM_slicingSmallPieces_8T/64              489       480     +1.8%
BM_slicingSmallPieces_8T/512           21586     18798    +12.9%
BM_slicingSmallPieces_8T/4k           394592    400165     -1.4%
BM_slicingSmallPieces_8T/5k           219688    208301     +5.2%
BM_slicingSmallPieces_12T/2             80.2      79.8     +0.7%
BM_slicingSmallPieces_12T/8             54.4      53.4     +1.8
BM_slicingSmallPieces_12T/64             488       476     +2.5%
BM_slicingSmallPieces_12T/512          21931     18831    +14.1%
BM_slicingSmallPieces_12T/4k          393962    396541     -0.7%
BM_slicingSmallPieces_12T/5k          218803    207965     +5.0%
2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen
7b6aaa3440 Fix NaN propagation for AVX512. 2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4 Make NaN propagatation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
d83db761a2 Add support for std::integral_constant 2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854 Add test for multiple symbols 2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13 Fix seq().reverse() in c++98 2017-01-24 11:36:43 +01:00
Gael Guennebaud
ddd83f82d8 Add support for "SymbolicExpr op fix<N>" in C++98/11 mode. 2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|) 2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62 Add internal doc 2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab Rename fix_t to FixedInt 2017-01-24 09:39:49 +01:00
Gael Guennebaud
ba3f977946 bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j)) 2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36 bug #1382: move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!) 2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a Add std:: namespace prefix to all (hopefully) instances if size_t/ptrdfiff_t 2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692 Use Index instead of size_t 2017-01-23 22:00:33 +01:00
Gael Guennebaud
0fe278f7be bug #1379: fix compilation in sparse*diagonal*dense with openmp 2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e bug #1378: fix doc (DiagonalIndex vs Diagonal) 2017-01-21 22:09:59 +01:00
Gael Guennebaud
4d302a080c Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only) 2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24 Exploit fixed values in seq and reverse with C++98 compatibility 2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34 Add support for fixed-value in symbolic expression, c++11 only for now. 2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8 Made sure that enabling avx2 instructions enables avx and sse instructions as well. 2017-01-19 09:54:48 -08:00
Gael Guennebaud
e84ed7b6ef Remove dead code 2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419 Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimize the usage of last and end. 2017-01-18 23:16:32 +01:00
Gael Guennebaud
15471432fe Add a .reverse() member to ArithmeticSequence. 2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a Add missing operator* 2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n) 2017-01-18 09:43:58 +01:00
Gael Guennebaud
5484ddd353 Merge the generic and dynamic overloads of block() 2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8 Defer set-to-zero in triangular = product so that no aliasing issue occur in the common:
A.triangularView() = B*A.sefladjointView()*B.adjoint()
case that used to work in 3.2.
2017-01-17 18:03:35 +01:00
Gael Guennebaud
5e36ec3b6f Fix regression when passing enums to operator() 2017-01-17 17:10:16 +01:00
Gael Guennebaud
4f36dcfda8 Add a generic block() method compatible with Eigen::fix 2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356 Add a get_runtime_value helper to deal with pointer-to-function hack,
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
23bfcfc15f Add missing overload of get_compile_time for c++98/11 2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2 Disambiguate the two versions of fix for doxygen 2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2 Add support for symbolic expressions as arguments of operator() 2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844 typos in doc 2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa Typo 2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845 Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
Gael Guennebaud
6e97698161 Introduce a EIGEN_HAS_CXX14 macro 2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381 Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree. 2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8 Reverting unintentional change to Eigen/Geometry 2017-01-16 11:05:56 +00:00