eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Benoit Steiner	e96c77668d	Merged in rmlarsen/eigen2 (pull request PR-292) Adds a fast memcpy function to Eigen.	2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen	3be5ee2352	Update copy helper to use fast_memcpy.	2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen	e6b1020221	Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                      Base (ns)  New (ns)   Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                 3.48       2.39       +31.3%
BM_memcpy_1T/8                 12.3       6.51       +47.0%
BM_memcpy_1T/64                371        383        -3.2%
BM_memcpy_1T/512               66922      66720      +0.3%
BM_memcpy_1T/4k                9892867    6849682    +30.8%
BM_memcpy_1T/5k                14951099   10332856   +30.9%
BM_memcpy_2T/2                 3.50       2.46       +29.7%
BM_memcpy_2T/8                 12.3       7.66       +37.7%
BM_memcpy_2T/64                371        376        -1.3%
BM_memcpy_2T/512               66652      66788      -0.2%
BM_memcpy_2T/4k                6145012    6117776    +0.4%
BM_memcpy_2T/5k                9181478    9010942    +1.9%
BM_memcpy_4T/2                 3.47       2.47       +31.0%
BM_memcpy_4T/8                 12.3       6.67       +45.8%
BM_memcpy_4T/64                374        376        -0.5%
BM_memcpy_4T/512               67833      68019      -0.3%
BM_memcpy_4T/4k                5057425    5188253    -2.6%
BM_memcpy_4T/5k                7555638    7779468    -3.0%
BM_memcpy_6T/2                 3.51       2.50       +28.8%
BM_memcpy_6T/8                 12.3       7.61       +38.1%
BM_memcpy_6T/64                373        378        -1.3%
BM_memcpy_6T/512               66871      66774      +0.1%
BM_memcpy_6T/4k                5112975    5233502    -2.4%
BM_memcpy_6T/5k                7614180    7772246    -2.1%
BM_memcpy_8T/2                 3.47       2.41       +30.5%
BM_memcpy_8T/8                 12.4       10.5       +15.3%
BM_memcpy_8T/64                372        388        -4.3%
BM_memcpy_8T/512               67373      66588      +1.2%
BM_memcpy_8T/4k                5148462    5254897    -2.1%
BM_memcpy_8T/5k                7660989    7799058    -1.8%
BM_memcpy_12T/2                3.50       2.40       +31.4%
BM_memcpy_12T/8                12.4       7.55       +39.1%
BM_memcpy_12T/64               374        378        -1.1%
BM_memcpy_12T/512              67132      66683      +0.7%
BM_memcpy_12T/4k               5185125    5292920    -2.1%
BM_memcpy_12T/5k               7717284    7942684    -2.9%
BM_slicingSmallPieces_1T/2     47.3       47.5       +0.4%
BM_slicingSmallPieces_1T/8     53.6       52.3       +2.4%
BM_slicingSmallPieces_1T/64    491        476        +3.1%
BM_slicingSmallPieces_1T/512   21734      18814      +13.4%
BM_slicingSmallPieces_1T/4k    394660     396760     -0.5%
BM_slicingSmallPieces_1T/5k    218722     209244     +4.3%
BM_slicingSmallPieces_2T/2     80.7       79.9       +1.0%
BM_slicingSmallPieces_2T/8     54.2       53.1       +2.0%
BM_slicingSmallPieces_2T/64    497        477        +4.0%
BM_slicingSmallPieces_2T/512   21732      18822      +13.4%
BM_slicingSmallPieces_2T/4k    392885     390490     +0.6%
BM_slicingSmallPieces_2T/5k    221988     208678     +6.0%
BM_slicingSmallPieces_4T/2     80.8       80.1       +0.9%
BM_slicingSmallPieces_4T/8     54.1       53.2       +1.7%
BM_slicingSmallPieces_4T/64    493        476        +3.4%
BM_slicingSmallPieces_4T/512   21702      18758      +13.6%
BM_slicingSmallPieces_4T/4k    393962     404023     -2.6%
BM_slicingSmallPieces_4T/5k    249667     211732     +15.2%
BM_slicingSmallPieces_6T/2     80.5       80.1       +0.5%
BM_slicingSmallPieces_6T/8     54.4       53.4       +1.8%
BM_slicingSmallPieces_6T/64    488        478        +2.0%
BM_slicingSmallPieces_6T/512   21719      18841      +13.3%
BM_slicingSmallPieces_6T/4k    394950     397583     -0.7%
BM_slicingSmallPieces_6T/5k    223080     210148     +5.8%
BM_slicingSmallPieces_8T/2     81.2       80.4       +1.0%
BM_slicingSmallPieces_8T/8     58.1       53.5       +7.9%
BM_slicingSmallPieces_8T/64    489        480        +1.8%
BM_slicingSmallPieces_8T/512   21586      18798      +12.9%
BM_slicingSmallPieces_8T/4k    394592     400165     -1.4%
BM_slicingSmallPieces_8T/5k    219688     208301     +5.2%
BM_slicingSmallPieces_12T/2    80.2       79.8       +0.7%
BM_slicingSmallPieces_12T/8    54.4       53.4       +1.8%
BM_slicingSmallPieces_12T/64   488        476        +2.5%
BM_slicingSmallPieces_12T/512  21931      18831      +14.1%
BM_slicingSmallPieces_12T/4k   393962     396541     -0.7%
BM_slicingSmallPieces_12T/5k   218803     207965     +5.0%	2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen	7b6aaa3440	Fix NaN propagation for AVX512.	2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen	5e144bbaa4	Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.	2017-01-24 13:32:50 -08:00
Gael Guennebaud	d83db761a2	Add support for std::integral_constant	2017-01-24 16:28:12 +01:00
Gael Guennebaud	bc10201854	Add test for multiple symbols	2017-01-24 16:27:51 +01:00
Gael Guennebaud	c43d254d13	Fix seq().reverse() in c++98	2017-01-24 11:36:43 +01:00
Gael Guennebaud	ddd83f82d8	Add support for "SymbolicExpr op fix<N>" in C++98/11 mode.	2017-01-24 10:54:42 +01:00
Gael Guennebaud	228fef1b3a	Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|)	2017-01-24 10:53:51 +01:00
Gael Guennebaud	bb52f74e62	Add internal doc	2017-01-24 10:13:35 +01:00
Gael Guennebaud	41c523a0ab	Rename fix_t to FixedInt	2017-01-24 09:39:49 +01:00
Gael Guennebaud	ba3f977946	bug #1376 : add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))	2017-01-23 22:06:08 +01:00
Gael Guennebaud	b0db4eff36	bug #1382 : move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)	2017-01-23 22:03:57 +01:00
Gael Guennebaud	ca79c1545a	Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t	2017-01-23 22:02:53 +01:00
Gael Guennebaud	4b607b5692	Use Index instead of size_t	2017-01-23 22:00:33 +01:00
Gael Guennebaud	0fe278f7be	bug #1379 : fix compilation in sparsediagonaldense with openmp	2017-01-21 23:27:01 +01:00
Gael Guennebaud	22a172751e	bug #1378 : fix doc (DiagonalIndex vs Diagonal)	2017-01-21 22:09:59 +01:00
Gael Guennebaud	4d302a080c	Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only)	2017-01-19 20:34:18 +01:00
Gael Guennebaud	54f3fbee24	Exploit fixed values in seq and reverse with C++98 compatibility	2017-01-19 19:57:32 +01:00
Gael Guennebaud	7691723e34	Add support for fixed-value in symbolic expression, c++11 only for now.	2017-01-19 19:25:29 +01:00
Benoit Steiner	924600a0e8	Made sure that enabling avx2 instructions enables avx and sse instructions as well.	2017-01-19 09:54:48 -08:00
Gael Guennebaud	e84ed7b6ef	Remove dead code	2017-01-18 23:18:28 +01:00
Gael Guennebaud	f3ccbe0419	Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimize the usage of last and end.	2017-01-18 23:16:32 +01:00
Gael Guennebaud	15471432fe	Add a .reverse() member to ArithmeticSequence.	2017-01-18 11:35:27 +01:00
Gael Guennebaud	e4f8dd860a	Add missing operator*	2017-01-18 10:49:01 +01:00
Gael Guennebaud	198507141b	Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n)	2017-01-18 09:43:58 +01:00
Gael Guennebaud	5484ddd353	Merge the generic and dynamic overloads of block()	2017-01-17 22:11:46 +01:00
Gael Guennebaud	655ba783f8	Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common: A.triangularView() = B*A.selfadjointView()*B.adjoint() case that used to work in 3.2.	2017-01-17 18:03:35 +01:00
Gael Guennebaud	5e36ec3b6f	Fix regression when passing enums to operator()	2017-01-17 17:10:16 +01:00
Gael Guennebaud	4f36dcfda8	Add a generic block() method compatible with Eigen::fix	2017-01-17 11:34:28 +01:00
Gael Guennebaud	71e5b71356	Add a get_runtime_value helper to deal with pointer-to-function hack, plus some refactoring to make the internals more consistent.	2017-01-17 11:33:57 +01:00
Gael Guennebaud	23bfcfc15f	Add missing overload of get_compile_time for c++98/11	2017-01-17 10:30:21 +01:00
Gael Guennebaud	edff32c2c2	Disambiguate the two versions of fix for doxygen	2017-01-17 10:29:33 +01:00
Gael Guennebaud	4989922be2	Add support for symbolic expressions as arguments of operator()	2017-01-16 22:21:23 +01:00
Gael Guennebaud	12e22a2844	typos in doc	2017-01-16 16:31:19 +01:00
Gael Guennebaud	e70c4c97fa	Typo	2017-01-16 16:20:16 +01:00
Gael Guennebaud	a9232af845	Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime". This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.	2017-01-16 16:17:01 +01:00
Gael Guennebaud	6e97698161	Introduce a EIGEN_HAS_CXX14 macro	2017-01-16 16:13:37 +01:00
Mehdi Goli	e46e722381	Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.	2017-01-16 13:58:49 +00:00
Luke Iwanski	23778a15d8	Reverting unintentional change to Eigen/Geometry	2017-01-16 11:05:56 +00:00
Fraser Cormack	8245d3c7ad	Fix case-sensitivity of file include	2017-01-12 12:13:18 +00:00
Gael Guennebaud	752bd92ba5	Large code refactoring: - generalize some utilities and move them to Meta (size(), array_size()) - move handling of all and single indices to IndexedViewHelper.h - several cleanup changes	2017-01-11 17:24:02 +01:00
Gael Guennebaud	f93d1c58e0	Make get_compile_time compatible with variable_if_dynamic	2017-01-11 17:08:59 +01:00
Gael Guennebaud	c020d307a6	Make variable_if_dynamic<T> implicitly convertible to T	2017-01-11 17:08:05 +01:00
Gael Guennebaud	43c617e2ee	merge	2017-01-11 14:33:37 +01:00
Gael Guennebaud	b1dc0fa813	Move fix and symbolic to their own file, and improve doxygen compatibility	2017-01-11 14:28:28 +01:00
Gael Guennebaud	04397f17e2	Add 1D overloads of operator()	2017-01-11 13:17:09 +01:00
Gael Guennebaud	1b5570988b	Add doc to seq, seqN, ArithmeticSequence, operator(), etc.	2017-01-10 22:58:58 +01:00
Gael Guennebaud	17eac60446	Factorize const and non-const version of the generic operator() method.	2017-01-10 21:45:55 +01:00
Gael Guennebaud	d072fc4b14	add writeable IndexedView	2017-01-10 17:10:35 +01:00
Gael Guennebaud	c9d5e5c6da	Simplify Symbolic API: std::tuple is now used internally and automatically built.	2017-01-10 16:55:07 +01:00
Gael Guennebaud	407e7b7a93	Simplify symbolic API by using "symbol=value" to associate a runtime value to a symbol.	2017-01-10 16:45:32 +01:00
Gael Guennebaud	96e6cf9aa2	Fix linking issue.	2017-01-10 16:35:46 +01:00
Gael Guennebaud	e63678bc89	Fix ambiguous call	2017-01-10 16:33:40 +01:00
Gael Guennebaud	8e247744a4	Fix linking issue	2017-01-10 16:32:06 +01:00
Gael Guennebaud	b47a7e5c3a	Add doc for IndexedView	2017-01-10 16:28:57 +01:00
Gael Guennebaud	87963f441c	Fallback to Block<> when possible (Index, all, seq with > increment). This is important to take advantage of the optimized implementations (evaluator, products, etc.), and to support sparse matrices.	2017-01-10 14:25:30 +01:00
Gael Guennebaud	a98c7efb16	Add a more generic evaluation mechanism and minimalistic doc.	2017-01-10 11:46:29 +01:00
Gael Guennebaud	13d954f270	Cleanup Eigen's namespace	2017-01-10 11:06:02 +01:00
Gael Guennebaud	9eaab4f9e0	Refactoring: move all symbolic stuff into its own namespace	2017-01-10 10:57:08 +01:00
Gael Guennebaud	acd08900c9	Move 'last' and 'end' to their own namespace	2017-01-10 10:31:07 +01:00
Gael Guennebaud	1df2377d78	Implement c++98 version of seq()	2017-01-10 10:28:45 +01:00
Gael Guennebaud	ecd9cc5412	Isolate legacy code (we keep it for performance comparison purpose)	2017-01-10 09:34:25 +01:00
Gael Guennebaud	b50c3e967e	Add a minimalistic symbolic scalar type with expression template and make use of it to define the last placeholder and to unify the return type of seq and seqN.	2017-01-09 23:42:16 +01:00
Gael Guennebaud	68064e14fa	Rename span/range to seqN/seq	2017-01-09 17:35:21 +01:00
Gael Guennebaud	ad3eef7608	Add link to SO	2017-01-09 13:01:39 +01:00
Gael Guennebaud	75aef5b37f	Fix extraction of compile-time size of std::array with gcc	2017-01-06 22:04:49 +01:00
Gael Guennebaud	233dff1b35	Add support for plain arrays for columns and both rows/columns	2017-01-06 22:01:53 +01:00
Gael Guennebaud	76e183bd52	Propagate compile-time size for plain arrays	2017-01-06 22:01:23 +01:00
Gael Guennebaud	3264d3c761	Add support for plain-array as indices, e.g., mat({1,2,3,4})	2017-01-06 21:53:32 +01:00
Gael Guennebaud	831fffe874	Add missing doc of SparseView	2017-01-06 18:01:29 +01:00
Gael Guennebaud	a875167d99	Propagate compile-time increment and strides. Had to introduce a UndefinedIncr constant for non structured list of indices.	2017-01-06 15:54:55 +01:00
Gael Guennebaud	e383d6159a	MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd	2017-01-06 15:44:13 +01:00
Gael Guennebaud	fad1fa75b3	Propagate compile-time size with "all" and add c++11 array unit test	2017-01-06 13:29:33 +01:00
Gael Guennebaud	3730e3ca9e	Use "fix" for compile-time values, propagate compile-time sizes for span, clean some cleanup.	2017-01-06 13:10:10 +01:00
Gael Guennebaud	ac7e4ac9c0	Initial commit to add a generic indexed-based view of matrices. This version already works as a read-only expression. Numerous refactoring, renaming, extension, tuning passes are expected...	2017-01-06 00:01:44 +01:00
Jim Radford	0c226644d8	LLT: const the arg to solveInPlace() to allow passing .transpose(), .block(), etc.	2017-01-04 14:42:57 -08:00
Jim Radford	be281e5289	LLT: avoid making a copy when decomposing in place	2017-01-04 14:43:56 -08:00
Gael Guennebaud	e27f17bf5c	bug #1453 : fix Map with non-default inner-stride but no outer-stride.	2017-08-22 13:27:37 +02:00
Gael Guennebaud	21d0a0bcf5	bug #1456 : add perf recommendation for LLT and storage format	2017-08-22 12:46:35 +02:00
Gael Guennebaud	2c3d70d915	Re-enable hidden doc in LLT	2017-08-22 12:04:09 +02:00
Gael Guennebaud	a6e7a41a55	bug #1455 : Cholesky module depends on Jacobi for rank-updates.	2017-08-22 11:37:32 +02:00
Gael Guennebaud	e6021cc8cc	bug #1458 : fix documentation of LLT and LDLT info() method.	2017-08-22 11:32:55 +02:00
Gael Guennebaud	f727844658	use MKL's lapacke.h header when using MKL	2017-08-17 21:58:39 +02:00
Gael Guennebaud	8c858bd891	Clarify doc regarding the usage of MKL_DIRECT_CALL	2017-08-17 12:17:45 +02:00
Gael Guennebaud	b95f92843c	Fix support for MKL's BLAS when using MKL_DIRECT_CALL.	2017-08-17 12:07:10 +02:00
Gael Guennebaud	687bedfcad	Make NoAlias and JacobiRotation compatible with CUDA.	2017-08-17 11:51:22 +02:00
Gael Guennebaud	1f4b24d2df	Do not preallocate more space than the matrix size (when the sparse matrix boils down to a vector)	2017-07-20 10:13:48 +02:00
Gael Guennebaud	55d7181557	Fix laziness of operator* with CUDA	2017-07-20 09:47:28 +02:00
Gael Guennebaud	cda47c42c2	Fix compilation in c++98 mode.	2017-07-17 21:08:20 +02:00
Gael Guennebaud	3182bdbae6	Disable vectorization when compiled by nvcc, even if EIGEN_NO_CUDA is defined	2017-07-17 11:01:28 +02:00
Gael Guennebaud	9f8136ff74	disable nvcc boolean-expr-is-constant warning	2017-07-17 10:43:18 +02:00
Gael Guennebaud	bbd97b4095	Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases	2017-07-17 01:02:51 +02:00
Gael Guennebaud	2299717fd5	Fix and workaround several doxygen issues/warnings	2017-01-04 23:27:33 +01:00
Gael Guennebaud	ee6f7f6c0c	Add doc for sparse triangular solve functions	2017-01-04 23:10:36 +01:00
Gael Guennebaud	a0a36ad0ef	bug #1336 : workaround doxygen failing to include numerous members of MatrixBase in Matrix	2017-01-04 22:02:39 +01:00
Gael Guennebaud	29a1a58113	Document selfadjointView	2017-01-04 22:01:50 +01:00
Gael Guennebaud	8702562177	bug #1370 : add doc for StorageIndex	2017-01-03 11:25:41 +01:00
Gael Guennebaud	575c078759	bug #1370 : rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index	2017-01-03 11:19:14 +01:00
Valentin Roussellet	d3c5525c23	Added += and + operators to inner iterators. Fix #1340	2016-12-28 18:29:30 +01:00
Gael Guennebaud	5c27962453	Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class.	2017-01-02 22:27:07 +01:00
Gael Guennebaud	8d7810a476	bug #1365 : fix another type mismatch warning (sync is set from and compared to an Index)	2016-12-28 23:35:43 +01:00
Gael Guennebaud	97812ff0d3	bug #1369 : fix type mismatch warning. Returned values of omp thread id and numbers are int, so let's use int instead of Index here.	2016-12-28 23:29:35 +01:00
Gael Guennebaud	7713e20fd2	Fix compilation	2016-12-27 22:04:58 +01:00
Gael Guennebaud	ab69a7f6d1	Cleanup because trait<CwiseBinaryOp>::Flags now expose the correct storage order	2016-12-27 16:55:47 +01:00
Gael Guennebaud	d32a43e33a	Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly.	2016-12-27 16:35:45 +01:00
Gael Guennebaud	7136267461	Add missing .outer() member to iterators of evaluators of cwise sparse binary expression	2016-12-27 16:34:30 +01:00
Gael Guennebaud	fe0ee72390	Fix check of storage order mismatch for "sparse cwiseop sparse".	2016-12-27 16:33:19 +01:00
Gael Guennebaud	6b8f637ab1	Harmless typo	2016-12-27 16:31:17 +01:00
Benoit Steiner	354baa0fb1	Avoid using horizontal adds since they're not very efficient.	2016-12-21 20:55:07 -08:00
Benoit Steiner	d7825b6707	Use native AVX512 types instead of Eigen Packets whenever possible.	2016-12-21 20:06:18 -08:00
Gael Guennebaud	c6882a72ed	Merged in joaoruileal/eigen (pull request PR-276) Minor improvements to Umfpack support	2016-12-21 21:39:48 +01:00
Joao Rui Leal	c8c89b5e19	renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus()	2016-12-21 09:16:28 +00:00
Gael Guennebaud	f2f9df8aa5	Remove MSVC warning 4127 - conditional expression is constant from the disabled list as we now have a local workaround.	2016-12-20 22:53:19 +01:00
Gael Guennebaud	2b3fc981b8	bug #1362 : workaround constant conditional warning produced by MSVC	2016-12-20 22:52:27 +01:00
Gael Guennebaud	94e8d8902f	Fix bug #1367 : compilation fix for gcc 4.1!	2016-12-20 22:17:01 +01:00
Gael Guennebaud	684cfc762d	Add transpose, adjoint, conjugate methods to SelfAdjointView (useful to write generic code)	2016-12-20 16:33:53 +01:00
Gael Guennebaud	11f55b2979	Optimize storage layout of Cwise* and PlainObjectBase evaluator to remove the functor or outer-stride if they are empty. For instance, sizeof("(A-B).cwiseAbs2()") with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. In theory, evaluators should be completely optimized away by the compiler, but this might help in some cases.	2016-12-20 15:55:40 +01:00
Gael Guennebaud	5271474b15	Remove common "noncopyable" base class from evaluator_base to get a chance to get EBO (Empty Base Optimization). Note: we should probably get rid of this class and define a macro instead.	2016-12-20 15:51:30 +01:00
Gael Guennebaud	316673bbde	Clean-up usage of ExpressionTraits in all/any implementation.	2016-12-20 14:38:05 +01:00
Christoph Hertzberg	10c6bcdc2e	Add support for long indexes and for (real-valued) row-major matrices to CholmodSupport module	2016-12-19 14:07:42 +01:00
Gael Guennebaud	f5d644b415	Make sure that HyperPlane::transform maintains a unit normal vector in the Affine case.	2016-12-20 09:35:00 +01:00
Benoit Steiner	923acadfac	Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics	2016-12-19 13:02:27 -08:00
Benoit Jacob	751e097c57	Use 32 registers on ARM64	2016-12-19 13:44:46 -05:00
Benoit Steiner	fb1d0138ec	Include SSE packet instructions when compiling with avx512 enabled.	2016-12-19 07:32:48 -08:00
Joao Rui Leal	95b804c0fe	it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack	2016-12-19 10:45:59 +00:00
Gael Guennebaud	8c0e701504	bug #1360 : fix sign issue with pmull on altivec	2016-12-18 22:13:19 +00:00
Gael Guennebaud	fc94258e77	Fix unused warning	2016-12-18 22:11:48 +00:00
ermak	d60cca32e5	Transformation methods added to ParametrizedLine class.	2016-12-17 00:45:13 +07:00
Benoit Steiner	9e03dfb452	Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code	2016-12-17 09:23:37 -08:00
Rafael Guglielmetti	8f11df2667	NumTraits.h: For the values 'ReadCost, AddCost and MulCost', information about value Eigen::HugeCost	2016-12-16 09:07:12 +00:00
Benoit Steiner	1324ffef2f	Reenabled the use of constexpr on OpenCL devices	2016-12-15 06:49:38 -08:00
Gael Guennebaud	5d00fdf0e8	bug #1363 : fix mingw's ABI issue	2016-12-15 11:58:31 +01:00
Gael Guennebaud	11b492e993	bug #1358 : fix compilation for sparse += sparse.selfadjointView();	2016-12-14 17:53:47 +01:00
Gael Guennebaud	e67397bfa7	bug #1359 : fix compilation of col_major_sparse.row() *= scalar (used to work in 3.2.9 though the expression is not really writable)	2016-12-14 17:05:26 +01:00
Gael Guennebaud	98d7458275	bug #1359 : fix sparse /=scalar and *=scalar implementation. InnerIterators must be obtained from an evaluator.	2016-12-14 17:03:13 +01:00
Gael Guennebaud	c817ce3ba3	bug #1361 : fix compilation issue in mat=perm.inverse()	2016-12-13 23:10:27 +01:00
Benoit Steiner	6811e6cf49	Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268) Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).	2016-12-08 05:14:11 -08:00
Angelos Mantzaflaris	7694684992	Remove superfluous const's (can cause warnings on some Intel compilers) (grafted from `e236d3443c` )	2016-12-07 00:37:48 +01:00
Gael Guennebaud	eb621413c1	Revert vec/y to vec*(1/y) in row-major TRSM: - div is extremely costly - this is consistent with the column-major case - this is consistent with all other BLAS implementations	2016-12-06 15:04:50 +01:00
Gael Guennebaud	8365c2c941	Fix BLAS backend for symmetric rank K updates.	2016-12-06 14:47:09 +01:00
Srinivas Vasudevan	e6c8b5500c	Change comparisons to use Scalar instead of RealScalar.	2016-12-05 14:01:45 -08:00
Srinivas Vasudevan	f7d7c33a28	Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).	2016-12-05 12:19:01 -08:00
Srinivas Vasudevan	09ee7f0c80	Fix small nit where I changed name of plog1p to pexpm1.	2016-12-02 15:30:12 -08:00
Srinivas Vasudevan	a0d3ac760f	Sync from Head.	2016-12-02 14:14:45 -08:00
Srinivas Vasudevan	218764ee1f	Added support for expm1 in Eigen.	2016-12-02 14:13:01 -08:00
Gael Guennebaud	66f65ccc36	Ease compiler job to generate clean and efficient code in mat*vec.	2016-12-02 22:41:26 +01:00
Gael Guennebaud	fe696022ec	Operators += and -= do not resize!	2016-12-02 22:40:25 +01:00
Angelos Mantzaflaris	18de92329e	use numext::abs (grafted from `0a08d4c60b` )	2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris	e8a6aa518e	1. Add explicit template to abs2 (resolves deduction for some arithmetic types) 2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned) (grafted from `4086187e49` )	2016-12-02 11:39:18 +01:00
Gael Guennebaud	a6b971e291	Fix memory leak in Ref<Sparse>	2016-12-05 16:59:30 +01:00
Gael Guennebaud	8640ffac65	Optimize SparseLU::solve for rhs vectors	2016-12-05 15:41:14 +01:00
Gael Guennebaud	62acd67903	remove temporary in SparseLU::solve	2016-12-05 15:11:57 +01:00
Gael Guennebaud	0db6d5b3f4	bug #1356 : fix calls to evaluator::coeffRef(0,0) to get the address of the destination by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).	2016-12-05 15:08:09 +01:00
Gael Guennebaud	91003f3b86	typo	2016-12-05 13:51:07 +01:00
Gael Guennebaud	e3f613cbd4	Improve performance of row-major-dense-matrix * vector products for recent CPUs. This revised version does not bother about aligned loads/stores, and rather processes 8 rows at once for better instruction pipelining.	2016-12-05 13:02:01 +01:00
Gael Guennebaud	3abc827354	Clean debugging code	2016-12-05 12:59:32 +01:00
Benoit Steiner	462c28e77a	Merged in srvasude/eigen (pull request PR-265) Add Expm1 support to Eigen.	2016-12-05 02:31:11 +00:00
Gael Guennebaud	6a5fe86098	Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs. The previous code had been optimized for Intel core2, for which unaligned loads/stores were prohibitively expensive. This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA. According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast. Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching. We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).	2016-12-03 21:14:14 +01:00
Christoph Hertzberg	22f7d398e2	bug #1355 : Fixed wrong line-endings on two files	2016-12-02 11:22:05 +01:00
Gael Guennebaud	27873008d4	Clean up SparseCore module regarding ReverseInnerIterator	2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris	8c24723a09	typo UIntPtr (grafted from `b6f04a2dd4` )	2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris	aeba0d8655	fix two warnings(unused typedef, unused variable) and a typo (grafted from `a9aa3bcf50` )	2016-12-01 21:23:43 +01:00
Gael Guennebaud	181138a1cb	fix member order	2016-12-01 17:06:20 +01:00
Gael Guennebaud	9f297d57ae	Merged in rmlarsen/eigen (pull request PR-256) Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.	2016-12-01 15:27:33 +00:00
Benoit Steiner	7ff26ddcbb	Merged eigen/eigen into default	2016-12-01 07:13:17 -08:00
Gael Guennebaud	037b46762d	Fix misleading-indentation warnings.	2016-12-01 16:05:42 +01:00
Mehdi Goli	79aa2b784e	Adding sycl backend for TensorPadding.h; disabling __uint128 for sycl in TensorIntDiv.h; disabling cache size for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.	2016-12-01 13:02:27 +00:00
Benoit Steiner	fd1dc3363e	Merged eigen/eigen into default	2016-11-30 20:16:17 -08:00
Gael Guennebaud	8df272af88	Fix selection of product implementation for dynamic size matrices with fixed max size.	2016-11-30 22:21:33 +01:00
Gael Guennebaud	c927af60ed	Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times.	2016-11-30 17:59:13 +01:00
Gael Guennebaud	ab4ef5e66e	bug #1351 : fix compilation of random with old compilers	2016-11-30 17:37:53 +01:00
Rasmus Munk Larsen	a0329f64fb	Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.	2016-11-29 13:18:09 -08:00
Benoit Steiner	9f8fbd9434	Merged eigen/eigen into default	2016-11-26 11:28:25 -08:00
Mehdi Goli	7318daf887	Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.	2016-11-25 16:19:07 +00:00
Benoit Steiner	3be1afca11	Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5	2016-11-23 18:49:51 -08:00
Mehdi Goli	b8cc5635d5	Removing unsupported device from test case; cleaning the tensor device sycl.	2016-11-23 16:30:41 +00:00
Gael Guennebaud	e340866c81	Fix compilation with gcc and old ABI version	2016-11-23 14:04:57 +01:00
Gael Guennebaud	a91de27e98	Fix compilation issue with MSVC: MSVC always messes up with shadowed template arguments, for instance in: struct B { typedef float T; }; template<typename T> struct A : B { T g; }; The type of A<double>::g will be float and not double.	2016-11-23 12:24:48 +01:00
Gael Guennebaud	74637fa4e3	Optimize predux<Packet8f> (AVX)	2016-11-22 21:57:52 +01:00
Gael Guennebaud	178c084856	Disable usage of SSE3 _mm_hadd_ps that is extremely slow.	2016-11-22 21:53:14 +01:00
Gael Guennebaud	7dd894e40e	Optimize predux<Packet4d> (AVX)	2016-11-22 21:41:30 +01:00
Gael Guennebaud	f3fb0a1940	Disable usage of SSE3 haddpd that is extremely slow.	2016-11-22 16:58:31 +01:00
Gael Guennebaud	6a84246a6a	Fix regression in assignment of sparse block to sparse block.	2016-11-21 21:46:42 +01:00
Benoit Steiner	ed839c5851	Enable the use of constant expressions with clang >= 3.6	2016-11-20 10:34:49 -08:00
Gael Guennebaud	465ede0f20	Fix compilation issue in mat = permutation (regression introduced in `8193ffb3d3` )	2016-11-20 09:41:37 +01:00
Benoit Steiner	81151bd474	Fixed merge conflicts	2016-11-19 19:12:59 -08:00
Benoit Steiner	1bdf1b9ce0	Merged in benoitsteiner/opencl (pull request PR-253) OpenCL improvements	2016-11-19 04:44:43 +00:00
Benoit Steiner	8649e16c2a	Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio	2016-11-18 14:18:34 -08:00
Gael Guennebaud	164414c563	Merged in ChunW/eigen (pull request PR-252) Workaround for error in VS2012 with /clr	2016-11-18 21:07:29 +00:00
Luke Iwanski	5159675c33	Added isnan, isfinite and isinf for SYCL device. Plus test for that.	2016-11-18 16:01:48 +00:00
Gael Guennebaud	8193ffb3d3	bug #1343 : fix compilation regression in mat+=selfadjoint_view. Generic EigenBase2EigenBase assignment was incomplete.	2016-11-18 10:17:34 +01:00
Gael Guennebaud	cebff7e3a2	bug #1343 : fix compilation regression in array = matrix_product	2016-11-18 10:09:33 +01:00
Benoit Steiner	7c30078b9f	Merged eigen/eigen into default	2016-11-17 22:53:37 -08:00
Chun Wang	0d0948c3b9	Workaround for error in VS2012 with /clr	2016-11-17 17:54:27 -05:00
Konstantinos Margaritis	672aa97d4d	implement float/std::complex<float> for ZVector as well, minor fixes to ZVector	2016-11-17 13:27:33 -05:00
Luke Iwanski	c5130dedbe	Specialised basic math functions for SYCL device.	2016-11-17 11:47:13 +00:00
Benoit Steiner	f2e8b73256	Enable the use of AVX512 instruction by default	2016-11-16 21:28:04 -08:00
Gael Guennebaud	7b09e4dd8c	bump default branch to 3.3.90	2016-11-16 22:20:58 +01:00
Benoit Steiner	dff9a049c4	Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs	2016-11-16 09:01:51 -08:00
Gael Guennebaud	0ee92aa38e	Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products.	2016-11-14 18:47:41 +01:00
Gael Guennebaud	2e334f5da0	bug #426 : move operator && and || to MatrixBase and SparseMatrixBase.	2016-11-14 18:47:02 +01:00
Gael Guennebaud	a048aba14c	Merged in olesalscheider/eigen (pull request PR-248) Make sure not to call numext::maxi on expression templates	2016-11-14 13:25:53 +00:00
Gael Guennebaud	eedb87f4ba	Fix regression in SparseMatrix::ReverseInnerIterator	2016-11-14 14:05:53 +01:00
Niels Ole Salscheider	51fef87408	Make sure not to call numext::maxi on expression templates	2016-11-12 12:20:57 +01:00
Gael Guennebaud	eeac81b8c0	bump to 3.3.0	2016-11-10 13:55:14 +01:00
Gael Guennebaud	e80bc2ddb0	Fix printing of sparse expressions	2016-11-10 10:35:32 +01:00
Benoit Steiner	db3903498d	Merged in benoitsteiner/opencl (pull request PR-246) Improved support for OpenCL	2016-11-08 22:28:44 +00:00
Gael Guennebaud	436a111792	Generalize Cholmod support to handle any sparse type as the rhs and result of the solve method	2016-11-06 20:29:23 +01:00
Gael Guennebaud	afc55b1885	Generalize IterativeSolverBase::solve to handle any sparse type as the results (instead of SparseMatrix only)	2016-11-06 20:28:18 +01:00
Gael Guennebaud	a5c2d8a3cc	Generalize solve_sparse_through_dense_panels to handle SparseVector.	2016-11-06 15:20:58 +01:00
Gael Guennebaud	f8bfe10613	Add missing friend declaration	2016-11-06 15:20:30 +01:00
Gael Guennebaud	fc7180cda8	Add a default ctor to evaluator<SparseVector>. Needed for evaluator<Solve>.	2016-11-06 15:20:00 +01:00
Gael Guennebaud	4d226ab5b5	Enable swapping between SparseMatrix and SparseVector	2016-11-06 15:15:03 +01:00
Gael Guennebaud	a354c3ca59	Fix compilation of LLT with complex<mpreal>.	2016-11-05 11:28:29 +01:00
Benoit Steiner	d46a36cc84	Merged eigen/eigen into default	2016-11-04 18:22:55 -07:00
Mehdi Goli	0ebe3808ca	Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;	2016-11-04 18:18:19 +00:00
Gael Guennebaud	ba05572dcb	bump to 3.3-rc2	2016-11-04 09:09:06 +01:00
Benoit Steiner	5c3995769c	Improved AVX512 configuration	2016-11-03 04:50:28 -07:00
Benoit Steiner	ca0ba0d9a4	Improved AVX512 support	2016-11-03 04:00:49 -07:00
Benoit Steiner	c80587c92b	Merged eigen/eigen into default	2016-11-03 03:55:11 -07:00
Gael Guennebaud	3f1d0cdc22	bug #1337 : improve doc of homogeneous() and hnormalized()	2016-11-03 11:03:08 +01:00
Gael Guennebaud	78e93ac1ad	bug #1330 : Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement.	2016-11-03 10:21:59 +01:00
Benoit Steiner	3e37166d0b	Merged in benoitsteiner/opencl (pull request PR-244) Disable vectorization on device only when compiling for sycl	2016-11-02 22:01:03 +00:00
Benoit Steiner	0585b2965d	Disable vectorization on device only when compiling for sycl	2016-11-02 11:44:27 -07:00
Gael Guennebaud	a07bb428df	bug #1004 : improve accuracy of LinSpaced for abs(low) >> abs(high).	2016-11-02 11:34:38 +01:00
Gael Guennebaud	598de8b193	Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.	2016-11-02 10:38:13 +01:00
Benoit Steiner	7a0e96b80d	Gate the code that refers to cuda fp16 primitives more thoroughly	2016-11-01 12:08:09 -07:00
Gael Guennebaud	3ecb343dc3	Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression.	2016-10-26 22:50:41 +02:00
Gael Guennebaud	97feea9d39	add a generic EIGEN_HAS_CXX11	2016-10-26 15:53:13 +02:00
Gael Guennebaud	ca6a2a5248	Fix warning with ICC	2016-10-26 14:13:05 +02:00
Gael Guennebaud	b15a5dc3f4	Fix ICC warnings	2016-10-25 22:20:24 +02:00
Gael Guennebaud	aad72f3c6d	Add missing inline keywords	2016-10-25 20:20:09 +02:00
Benoit Steiner	3e194a6a73	Fixed a typo	2016-10-25 08:42:15 -07:00
Gael Guennebaud	58146be99b	bug #1004 : one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity. This version simply does low+i*step plus a branch to return high if i==size-1. Vectorization is accomplished with a branch and the help of pinsertlast. A quick benchmark revealed that the overhead is really marginal, even when filling small vectors.	2016-10-25 16:53:09 +02:00
Gael Guennebaud	13fc18d3a2	Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced)	2016-10-25 16:48:49 +02:00
Gael Guennebaud	2634f9386c	bug #1333 : fix bad usage of const_cast_derived. Better use .data() for that purpose.	2016-10-24 22:22:35 +02:00
Gael Guennebaud	9e8f07d7b5	Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors.	2016-10-24 22:16:48 +02:00
Gael Guennebaud	b027d7a8cf	bug #1004 : remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term. Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.	2016-10-24 20:27:21 +02:00
Benoit Steiner	b11aab5fcc	Merged in benoitsteiner/opencl (pull request PR-238) Added support for OpenCL to the Tensor Module	2016-10-24 15:30:45 +00:00
Gael Guennebaud	53c77061f0	bug #698 : rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible. Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution. This changeset also disables vectorization for this integer path.	2016-10-24 15:50:27 +02:00
Gael Guennebaud	40f62974b7	bug #1328 : workaround a compilation issue with gcc 4.2	2016-10-20 19:19:37 +02:00
Benoit Steiner	cf20b30d65	Merge latest updates from trunk	2016-10-20 09:42:05 -07:00
Benoit Steiner	d3943cd50c	Fixed a few typos in the ternary tensor expressions types	2016-10-19 12:56:12 -07:00
Mehdi Goli	8fb162fc85	Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core.	2016-10-16 12:52:34 +01:00
Luke Iwanski	2e188dd4d4	Merged ComputeCpp to default.	2016-10-14 16:47:40 +01:00
Mehdi Goli	15380f9a87	Applying Benoit's comment to return the missing line back in Eigen/Core	2016-10-14 16:39:41 +01:00
Gael Guennebaud	692b30ca95	Fix previous merge.	2016-10-14 17:16:28 +02:00
Gael Guennebaud	050c681bdd	Merged in rmlarsen/eigen2 (pull request PR-232) Improve performance of parallelized matrix multiply for rectangular matrices	2016-10-14 14:51:09 +00:00

5484 Commits