Commit Graph

6402 Commits

Author SHA1 Message Date
Benoit Steiner
8afce86e64 Added support for RowMajor layout to the image patch extraction code
Sped up the unsupported_cxx11_tensor_image_patch test and reduced its memory footprint
2015-02-25 09:48:54 -08:00
Benoit Jacob
692136350b So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
On x86, I tested a Sandy Bridge with AVX and 12M of cache, and a Haswell with AVX+FMA and 6M of cache, on MatrixXf sizes up to 2400.

I could not see any significant impact of this offset.

On a Nexus 5, the offset has a slight effect: values around 32 (times sizeof(float)) are the worst. Anything else performs the same: the current 64 (8*pk), or... 0.

So let's just go with 0!

Note that we needed a fix anyway for not accounting for the value of RhsProgress. Using 0 nicely avoids the issue altogether! (A sketch of a zero-offset prefetch follows this entry.)
2015-02-25 12:37:14 -05:00
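For context, a minimal hedged sketch of the kind of software prefetch being tuned in the commit above. The offset on a float* is counted in elements, so offset 64 means 64*sizeof(float) = 256 bytes ahead. The names (prefetch_rhs, blockB) are illustrative rather than the actual kernel variables, and __builtin_prefetch is the GCC/Clang builtin, not Eigen's internal wrapper:

// Illustrative only: tuning the prefetch distance in a packed-RHS traversal.
// blockB points into the packed right-hand-side block; "offset" counts floats.
inline void prefetch_rhs(const float* blockB, int offset)
{
  __builtin_prefetch(blockB + offset, /*rw=*/0, /*locality=*/3);
}

// With the measurements above, the call site can simply use offset 0:
//   prefetch_rhs(blockB, 0);
// which also sidesteps scaling the offset by RhsProgress.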
Christoph Hertzberg
531fa9de77 bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references. 2015-02-24 21:03:28 +01:00
Benoit Jacob
26275b250a Fix my recent prefetch changes:
- the first prefetch is actually harmful on Haswell with FMA,
   but it is the most beneficial on ARM.
 - the second prefetch... I was very stupid and multiplied an offset on a scalar*
   pointer by sizeof(scalar). The old offset was 64; pk = 8, so 64 = pk*8.
   So this effectively restores the older offset. Actually, there were
   two prefetches here, one with offset 48 and one with offset 64. I could not
   confirm any benefit from this strange 48 offset on either the Haswell or
   my ARM device. (A short pointer-arithmetic illustration follows this entry.)
2015-02-23 16:55:17 -05:00
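The slip described above is worth spelling out: arithmetic on a typed pointer already advances in elements, so also multiplying the offset by sizeof(Scalar) scales it twice. A small illustration with made-up names:

#include <cstddef>

// Illustration of the double-scaling bug described in the commit above.
void prefetch_offsets(const float* packed, std::ptrdiff_t offset_in_elements)
{
  // Correct: float* arithmetic already moves in units of sizeof(float) bytes.
  const float* p_ok  = packed + offset_in_elements;

  // Buggy pattern: the extra sizeof(float) factor turns an intended
  // 8-element offset into a 32-element (128-byte) one.
  const float* p_bug = packed + offset_in_elements * sizeof(float);

  (void)p_ok; (void)p_bug;
}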
Benoit Jacob
488874781b Add analyze-blocking-sizes program under bench/ to analyze multiple logs
generated by benchmark-blocking-sizes.
2015-02-23 14:02:29 -05:00
Christoph Hertzberg
052b6b40f1 Fix two trivial warnings 2015-02-22 12:40:51 +01:00
Christoph Hertzberg
ecbf2a6656 log1p is defined only for real Scalars in C++11 2015-02-21 19:58:24 +01:00
Christoph Hertzberg
6af6cf0c2e I cannot reproduce any problems that justified this hack. However, it makes builds fail in C++11 mode. 2015-02-21 19:43:56 +01:00
Gael Guennebaud
3cf642baa3 Fix compilation of unit tests disabling assertion checking 2015-02-21 14:13:48 +01:00
Benoit Jacob
458cf91cd9 Add benchmark-blocking-sizes.cpp to bench/ per mailing list discussion. 2015-02-20 17:08:04 -05:00
Gael Guennebaud
03ec601ff7 Initial version of a small script to help tracking performance regressions 2015-02-20 19:20:34 +01:00
Gael Guennebaud
333b497383 update bench_gemm 2015-02-20 11:59:49 +01:00
Gael Guennebaud
2da1594750 Fix doc of Ref<> 2015-02-20 11:52:22 +01:00
Gael Guennebaud
01b8440579 With C++11, Matrix<float> + Matrix<complex<float>> does not even compile 2015-02-20 09:32:49 +01:00
Gael Guennebaud
3594451ee0 Remove the EIGEN_TEST_C++0x option and let EIGEN_TEST_CXX11 add the -std=c++11 flag 2015-02-20 09:31:27 +01:00
Gael Guennebaud
b192e29eae In C++11, destructors do not throw by default (fix CommaInitializer unit test) 2015-02-20 09:28:34 +01:00
Benoit Steiner
ab41652d81 Pulled latest changes from trunk 2015-02-19 21:23:37 -08:00
Benoit Steiner
7765039f1c Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. 2015-02-19 21:22:51 -08:00
Gael Guennebaud
a66f5fc2fd Fix regression with C++11 support of lambda: now internal::result_of falls back to std::result_of in C++11. 2015-02-19 23:32:12 +01:00
Gael Guennebaud
ece6b440f9 Fix a C++11 compilation issue in unit test 2015-02-19 23:31:08 +01:00
Gael Guennebaud
1b7e12847d Fix some calls to result_of on binary functors as unary ones. 2015-02-19 23:30:41 +01:00
Gael Guennebaud
0f4dd15dfc Declare as const some variables that should be const 2015-02-19 23:28:57 +01:00
Benoit Steiner
92ceb02c6d Pulled latest updates from trunk 2015-02-19 11:59:52 -08:00
Benoit Steiner
110fb90250 Improved the documentation 2015-02-19 11:59:04 -08:00
Gael Guennebaud
829dddd0fd Add support for C++11 result_of/lambdas 2015-02-19 15:18:37 +01:00
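As a quick illustration of what C++11 lambda support enables (standard Eigen API; the example itself is not from the commit):

#include <Eigen/Core>

// A lambda passed as a coefficient-wise functor; result_of is what Eigen
// uses internally to deduce the functor's return type.
Eigen::MatrixXf squared(const Eigen::MatrixXf& m)
{
  return m.unaryExpr([](float x) { return x * x; });
}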
Benoit Jacob
db05f2d01e rotating kernel: avoid compiling anything outside of ARM 2015-02-18 15:43:52 -05:00
Benoit Jacob
0ed00d5438 remove a newly introduced redundant typedef - sorry. 2015-02-18 15:05:01 -05:00
Benoit Jacob
9bd8a4bab5 bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
This is substantially faster on ARM, where it's important to minimize the number of loads.

This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.

Eventually one could have a generic rotating kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (it is even about 1% slower). A rough sketch of the rotation idea follows this entry.
2015-02-18 15:03:35 -05:00
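As a rough, hedged illustration of the rotating idea (not the actual Eigen kernel code): instead of issuing a separate broadcast load for each of the four right-hand-side coefficients, the kernel can load them once as a packet and rotate that packet between the multiply-accumulate steps. On NEON, a one-lane rotation of a float32x4_t can be expressed with vext:

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Sketch only: one step of a 4x4 accumulation using a rotating RHS packet
// instead of four broadcast loads. Each acc[i] accumulates the lane-wise
// product of lhs with rhs rotated by i lanes, so the accumulators hold
// "rotated" diagonals of the 4x4 block and must be un-rotated once at the
// end (omitted here).
inline void rotating_step(float32x4_t lhs, float32x4_t rhs, float32x4_t acc[4])
{
  for (int i = 0; i < 4; ++i) {
    acc[i] = vmlaq_f32(acc[i], lhs, rhs);  // acc[i] += lhs * rhs, lane-wise
    rhs    = vextq_f32(rhs, rhs, 1);       // rotate rhs lanes by one
  }
}
#endif

The pay-off is fewer loads per multiply-accumulate, which matters on ARM where loads are comparatively expensive; per the commit above, on Sandy Bridge the same trick is slightly slower.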
Hauke Heibel
ee27d50633 Fixed template parameter. 2015-02-18 18:51:08 +01:00
Gael Guennebaud
73a24de424 merge 2015-02-18 15:51:00 +01:00
Gael Guennebaud
63eb0f6fe6 Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) 2015-02-18 15:49:05 +01:00
Gael Guennebaud
fc5c3e85e2 Fix bug #961: eigen-doc.tgz included part of itself. 2015-02-18 15:47:01 +01:00
Benoit Jacob
4a3e6c8be1 bug #958 - Allow testing specific blocking sizes
This is only a debugging/testing patch. It allows testing specific
product blocking sizes, typically to study the impact on performance.

Example usage:

int testk, testm, testn;
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
#include <Eigen/Core>
2015-02-18 09:43:55 -05:00
Gael Guennebaud
c7bb1e8ea8 Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number of requested ones 2015-02-18 15:19:23 +01:00
Jan Blechta
168ceb271e Really use a zero guess in ConjugateGradient::solve, as documented
and as expected for consistency with other methods. (See the sketch after this entry.)
2015-02-18 14:26:10 +01:00
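For reference, a hedged sketch of the distinction this commit enforces, using Eigen's iterative-solver API:

#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// solve() is documented to start from the zero vector; a warm start has to
// go through solveWithGuess().
Eigen::VectorXd solve_spd(const Eigen::SparseMatrix<double>& A,
                          const Eigen::VectorXd& b,
                          const Eigen::VectorXd& x0)
{
  Eigen::ConjugateGradient<Eigen::SparseMatrix<double> > cg;
  cg.compute(A);
  Eigen::VectorXd x_zero_guess = cg.solve(b);               // starts from x = 0
  Eigen::VectorXd x_warm_start = cg.solveWithGuess(b, x0);  // starts from x = x0
  return x_warm_start;
}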
Gael Guennebaud
8fdcaded5e merge 2015-03-04 10:18:08 +01:00
Gael Guennebaud
c43154bbc5 Check for no-reallocation in SparseMatrix::insert (bug #974) 2015-03-04 10:16:46 +01:00
Gael Guennebaud
1ce0178363 Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974) 2015-03-04 09:39:26 +01:00
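For readers unfamiliar with the insertion path optimized in the two commits above, a hedged sketch of the sequential outer-index pattern (the reserve() estimate is an assumption of the example, not part of the commits):

#include <Eigen/SparseCore>

// Columns (the outer dimension of the default column-major storage) are
// filled in increasing order, with space reserved up front, so insert()
// should not trigger reallocation.
Eigen::SparseMatrix<double> build_tridiagonal(int n)
{
  Eigen::SparseMatrix<double> A(n, n);
  A.reserve(Eigen::VectorXi::Constant(n, 3));  // ~3 nonzeros per column
  for (int j = 0; j < n; ++j) {                // sequential outer indices
    if (j > 0)     A.insert(j - 1, j) = -1.0;
    A.insert(j, j) = 2.0;
    if (j + 1 < n) A.insert(j + 1, j) = -1.0;
  }
  A.makeCompressed();
  return A;
}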
Gael Guennebaud
3dca4a1efc Update manual wrt new LSCG solver. 2015-03-04 09:35:30 +01:00
Gael Guennebaud
05274219a7 Add a CG-based solver for rectangular least-square problems (bug #975). 2015-03-04 09:34:27 +01:00
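A hedged usage sketch of the new least-squares solver; the class name LeastSquaresConjugateGradient follows the manual update mentioned above, and the interface shown is assumed to match the other iterative solvers:

#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Solve min ||A*x - b||^2 for a rectangular sparse A (m x n, m >= n) with a
// CG-type method on the normal equations applied implicitly (A^T*A is not
// formed explicitly).
Eigen::VectorXd least_squares(const Eigen::SparseMatrix<double>& A,
                              const Eigen::VectorXd& b)
{
  Eigen::LeastSquaresConjugateGradient<Eigen::SparseMatrix<double> > lscg;
  lscg.compute(A);
  return lscg.solve(b);
}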
Benoit Jacob
2aa09e6b4e Fix asm comments in 1px1 kernel 2015-03-03 13:44:00 -05:00
Benoit Steiner
5d2fd64a1a Fixed compilation error when compiling with gcc4.7 2015-03-03 08:56:49 -08:00
Benoit Jacob
f64b4480af Add missing copyright notices 2015-03-03 11:43:56 -05:00
Benoit Jacob
eae8e27b7d Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp 2015-03-03 11:41:21 -05:00
Marc Glisse
37a93c4263 New scoring functor to select the pivot.
This can be useful for non-floating-point scalars, where choosing the biggest element is generally not the best choice.
2015-03-03 17:08:28 +01:00
Benoit Jacob
ccc1277a42 must also disable complex<double> when disabling double vectorization 2015-03-03 10:17:05 -05:00
Benoit Jacob
f839099512 Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. 2015-03-03 09:35:22 -05:00
Benoit Jacob
9930e9583b Improve analyze-blocking-sizes, and in particular give it an evaluate-defaults tool
that shows the efficiency of Eigen's default blocking-size choices, using a
previously computed table from benchmark-blocking-sizes.
2015-03-02 18:08:38 -05:00
Benoit Jacob
1ec0f4fadf HalfPacket also needed to be disabled for double, on ARMv8. 2015-03-02 16:08:54 -05:00
Gael Guennebaud
3109f0e74e Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate() 2015-03-02 20:09:33 +01:00
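For illustration, the kind of expression the commit above speeds up, using the standard Eigen Geometry API:

#include <Eigen/Geometry>

// Relative rotation between two orientations; per the commit above, the
// conjugation in q1 * q2.conjugate() is now done with SSE rather than
// coefficient by coefficient.
Eigen::Quaternionf relative_rotation(const Eigen::Quaternionf& q1,
                                     const Eigen::Quaternionf& q2)
{
  return q1 * q2.conjugate();
}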