eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	58740ce4c6	Improve product kernel: replace the previous dynamic loop-swapping strategy with a more general one. It consists in increasing the actual number of rows of the lhs's micro horizontal panel for small depth such that the L1 cache is fully exploited.	2015-03-06 10:30:35 +01:00
Benoit Jacob	5db2baa573	Make benchmark-blocking-sizes detect changes to clock speed and be resilient to that.	2015-03-05 13:44:20 -05:00
Gael Guennebaud	4c8b95d5c5	Rename LSCG to LeastSquaresConjugateGradient	2015-03-05 10:16:32 +01:00
Gael Guennebaud	7550107028	Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"	2015-03-05 10:03:46 +01:00
Gael Guennebaud	2dc968e453	bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.	2015-03-04 17:03:13 +01:00
Benoit Jacob	2231b3dece	output to cout, not cerr, the actual results	2015-03-04 09:45:12 -05:00
Benoit Jacob	00ea121881	Complete the tool to analyze the efficiency of default sizes.	2015-03-04 09:30:56 -05:00
Jan Blechta	168ceb271e	Really use zero guess in ConjugateGradients::solve as documented and expected for consistency with other methods.	2015-02-18 14:26:10 +01:00
Gael Guennebaud	8fdcaded5e	merge	2015-03-04 10:18:08 +01:00
Gael Guennebaud	c43154bbc5	Check for no-reallocation in SparseMatrix::insert (bug #974 )	2015-03-04 10:16:46 +01:00
Gael Guennebaud	1ce0178363	Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974 )	2015-03-04 09:39:26 +01:00
Gael Guennebaud	3dca4a1efc	Update manual wrt new LSCG solver.	2015-03-04 09:35:30 +01:00
Gael Guennebaud	05274219a7	Add a CG-based solver for rectangular least-squares problems (bug #975 ).	2015-03-04 09:34:27 +01:00
Benoit Jacob	2aa09e6b4e	Fix asm comments in 1px1 kernel	2015-03-03 13:44:00 -05:00
Benoit Steiner	5d2fd64a1a	Fixed compilation error when compiling with gcc4.7	2015-03-03 08:56:49 -08:00
Benoit Jacob	f64b4480af	Add missing copyright notices	2015-03-03 11:43:56 -05:00
Benoit Jacob	eae8e27b7d	Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp	2015-03-03 11:41:21 -05:00
Marc Glisse	37a93c4263	New scoring functor to select the pivot. This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.	2015-03-03 17:08:28 +01:00
Benoit Jacob	ccc1277a42	must also disable complex<double> when disabling double vectorization	2015-03-03 10:17:05 -05:00
Benoit Jacob	f839099512	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.	2015-03-03 09:35:22 -05:00
Benoit Jacob	9930e9583b	Improve analyze-blocking-sizes, and in particular give it an evaluate-defaults tool that shows the efficiency of Eigen's default blocking sizes choices, using a previously computed table from benchmark-blocking-sizes.	2015-03-02 18:08:38 -05:00
Benoit Jacob	1ec0f4fadf	HalfPacket also needed to be disabled for double, on ARMv8.	2015-03-02 16:08:54 -05:00
Gael Guennebaud	3109f0e74e	Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate()	2015-03-02 20:09:33 +01:00
Abhijit Kundu	ef09ce4552	Fix for TensorIO for Fixed sized Tensors. The following code snippet was failing to compile: TensorFixedSize<double, Sizes<4, 3> > t_4x3; cout << t_4x3;	2015-02-28 21:30:31 -05:00
Abhijit Kundu	3a4b6827b4	Merged eigen/eigen into default	2015-02-28 20:15:28 -05:00
Christoph Hertzberg	31e2ffe82c	Replaced POSIX random() by internal::random	2015-02-28 18:39:37 +01:00
Christoph Hertzberg	73dd95e7b0	Use @CMAKE_MAKE_PROGRAM@ instead of make in buildtests.sh	2015-02-28 16:51:53 +01:00
Christoph Hertzberg	682196e9fc	Fixed MPRealSupport	2015-02-28 16:41:00 +01:00
Christoph Hertzberg	33f40b2883	Cygwin does not like weak linking either.	2015-02-28 14:53:11 +01:00
Christoph Hertzberg	0f82a1d7b7	bug #967 : Automatically add cxx11 suffix when building in C++11 mode	2015-02-28 14:52:26 +01:00
Gael Guennebaud	9aee1e300a	Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within product kernel.	2015-02-27 22:55:12 +01:00
Gael Guennebaud	b10cd3afd2	Re-enable detection of min/max parentheses protection, and re-enable mpreal_support unit test.	2015-02-27 22:38:00 +01:00
Benoit Jacob	6466fa63be	Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.	2015-02-27 15:30:10 -05:00
Benoit Steiner	bf9877a92a	Pulled latest updates from trunk	2015-02-27 09:23:22 -08:00
Benoit Steiner	90f4e90f1d	Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized	2015-02-27 09:22:37 -08:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	f5ff4d826f	Fix NEON build flags: in the current NDK, at least with the clang-3.5 toolchain, -mfpu=neon is not enough to activate NEON, since it's incompatible with the default float ABI, and I have to pass -mfloat-abi=softfp (which is what everyone does in practice). In fact, it would be a good idea to pass -mfloat-abi=softfp all the time, regardless of NEON. Also removing the -mcpu=cortex-a8, as 1) it's not needed and 2) if we really wanted to pass a specific -mcpu flag, that would presumably be to tune performance for benchmarks, and it would then not really make sense to tune for the very old cortex-a8 (it reflects ARM CPUs from 5 years ago).	2015-02-27 10:56:50 -05:00
Benoit Jacob	b7fc8746e0	Replace a static assert by a runtime one, fixes the build of unit tests on ARM. Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.	2015-02-27 10:01:59 -05:00
Abhijit Kundu	4084dce038	Added CMake support for Tensor module. CMake now installs CXX11 Tensor module like the rest of the unsupported modules	2015-02-26 16:50:09 -05:00
Benoit Steiner	f074bb4b5f	Fixed another compilation problem with TensorIntDiv.h	2015-02-26 11:14:23 -08:00
Benoit Steiner	57154fdb32	Can now use the tensor 'reverse' operation as a lvalue	2015-02-26 11:13:42 -08:00
Benoit Steiner	2fffe69b1b	Added missing copy constructor	2015-02-26 09:27:53 -08:00
Gael Guennebaud	bcf9bb5c1f	Avoid packing rhs multiple-times when blocking on the lhs only.	2015-02-26 17:01:33 +01:00
Gael Guennebaud	4ec3f04b3a	Make sure that the block size computation is tested by our unit test.	2015-02-26 17:00:36 +01:00
Gael Guennebaud	2e9cb06a87	Update changeset list to be checked by perf_monitoring/gemm.	2015-02-26 16:13:33 +01:00
Gael Guennebaud	a46061ab7b	Make perf_monitoring/gemm script more flexible: - skip existing dataset - add a "-up" option to recompute the dataset (see script header) - allow to specify a filename prefix	2015-02-26 16:12:58 +01:00
Gael Guennebaud	a8ad8887bf	Implement a more generic blocking-size selection algorithm. See explanations inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)	2015-02-26 16:04:35 +01:00
Gael Guennebaud	400becc591	Fix typos in block-size testing code, and set peeling on k to 8.	2015-02-26 15:57:06 +01:00
Benoit Steiner	bffb6bdf45	Made TensorIntDiv.h compile with MSVC	2015-02-25 23:54:43 -08:00

6232 Commits