Commit Graph

3817 Commits

Author SHA1 Message Date
Gael Guennebaud
8580eb6808 bug #949: add static assertion for incompatible scalar types in dense end-user decompositions. 2015-03-13 21:06:20 +01:00
Gael Guennebaud
a9df28c95b SparseMatrix::insert: switch to a fully uncompressed mode if sequential insertion is not possible (otherwise an arbitrarily large amount of memory was preallocated in some cases) 2015-03-13 21:00:21 +01:00
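As a minimal sketch of the insertion pattern this commit concerns (sizes and the per-column non-zero estimate are illustrative): reserving a budget per column keeps insert() on the fast sequential path and avoids the uncompressed fallback.

```cpp
#include <Eigen/Sparse>

int main()
{
  const int n = 1000;
  Eigen::SparseMatrix<double> A(n, n);        // column-major by default
  // Reserving an estimated number of non-zeros per column avoids the costly
  // fallback to fully uncompressed storage during insertion.
  A.reserve(Eigen::VectorXi::Constant(n, 3));
  for (int j = 0; j < n; ++j)
    A.insert(j, j) = 1.0;                     // sequential insertion per column
  A.makeCompressed();                         // back to compressed storage
}
```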
Gael Guennebaud
5ffe29cb9f Bound pre-allocation to the maximal size representable by StorageIndex and throw bad_alloc if that's not possible. 2015-03-13 20:57:33 +01:00
Gael Guennebaud
2f6f8bf31c Add missing coeff/coeffRef members to Block<sparse>, and extend unit tests. 2015-03-13 16:24:40 +01:00
Gael Guennebaud
fd78874888 Fix compilation of iterative solvers with dense matrices 2015-03-09 21:31:03 +01:00
Gael Guennebaud
d4317a85e8 Add typedefs for return types of SparseMatrixBase::selfadjointView 2015-03-09 21:29:46 +01:00
Gael Guennebaud
9e885fb766 Add unit tests for CG and sparse-LLT for long int as storage-index 2015-03-09 14:33:15 +01:00
Gael Guennebaud
224a1fe4c6 bug #963: make IncompleteLUT compatible with non-default storage index types. 2015-03-09 13:55:20 +01:00
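A hedged usage sketch of what this enables (template parameters follow Eigen's public API; the problem data is assumed): pairing IncompleteLUT with a sparse matrix that uses a non-default storage index.

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// IncompleteLUT takes the storage index as its second template parameter,
// so it can match a SparseMatrix using `long` indices.
typedef Eigen::SparseMatrix<double, Eigen::ColMajor, long> SpMat;

void solve(const SpMat& A, const Eigen::VectorXd& b, Eigen::VectorXd& x)
{
  Eigen::BiCGSTAB<SpMat, Eigen::IncompleteLUT<double, long> > solver;
  solver.compute(A);
  x = solver.solve(b);
}
```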
Gael Guennebaud
0ee391863e Avoid underflow when blocking sizes are tuned manually. 2015-03-06 21:51:09 +01:00
Gael Guennebaud
14a5f135a3 bug #969: work around ambiguous calls to Ref using enable_if. 2015-03-06 17:51:31 +01:00
Gael Guennebaud
87681e508f bug #978: early return for vanishing products 2015-03-06 16:11:22 +01:00
Gael Guennebaud
cd3bbffa73 Improve blocking heuristic: if the lhs fits within L1, then block on the rhs in L1 (allows keeping the packed rhs in L1) 2015-03-06 14:31:39 +01:00
Gael Guennebaud
58740ce4c6 Improve product kernel: replace the previous dynamic loop-swapping strategy with a more general one:
it consists of increasing the actual number of rows of the lhs's micro horizontal panel for small depths, so that the L1 cache is fully exploited.
2015-03-06 10:30:35 +01:00
Gael Guennebaud
4c8b95d5c5 Rename LSCG to LeastSquaresConjugateGradient 2015-03-05 10:16:32 +01:00
Gael Guennebaud
7550107028 Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of k-rank-update-like products, i.e., for products of the kind: "large x small" * "small x large" 2015-03-05 10:03:46 +01:00
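For context, a product of the "large x small" * "small x large" kind mentioned above could look like the following sketch (sizes are illustrative):

```cpp
#include <Eigen/Dense>

int main()
{
  // "large x small" * "small x large": the depth (here 8) is tiny compared to
  // the destination, so memory accesses to C dominate the cost.
  Eigen::MatrixXf A = Eigen::MatrixXf::Random(4096, 8);
  Eigen::MatrixXf B = Eigen::MatrixXf::Random(8, 4096);
  Eigen::MatrixXf C = A * B;    // k-rank-update-like GEMM
  return C(0, 0) > 0 ? 0 : 1;   // keep the result alive
}
```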
Gael Guennebaud
2dc968e453 bug #824: improve accuracy of Quaternion::angularDistance using atan2 instead of acos. 2015-03-04 17:03:13 +01:00
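A small illustration of where the accuracy matters (angles are illustrative): for two nearly identical unit quaternions the dot product is close to 1, where acos() is ill-conditioned, whereas an atan2-based formulation keeps tiny angles accurate.

```cpp
#include <Eigen/Geometry>
#include <iostream>

int main()
{
  using Eigen::Quaterniond;
  using Eigen::Vector3d;
  Quaterniond q1(Eigen::AngleAxisd(0.0,  Vector3d::UnitZ()));
  Quaterniond q2(Eigen::AngleAxisd(1e-9, Vector3d::UnitZ()));
  // Near-identical rotations: the atan2-based formula preserves the ~1e-9
  // angular distance that an acos-based one tends to lose.
  std::cout << q1.angularDistance(q2) << "\n";
}
```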
Jan Blechta
168ceb271e Really use a zero guess in ConjugateGradient::solve, as documented
and as expected for consistency with other methods.
2015-02-18 14:26:10 +01:00
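For reference, the two entry points in question, as a minimal sketch: solve() starts from a zero guess, while solveWithGuess() remains available when a warm start is wanted.

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

void run(const Eigen::SparseMatrix<double>& A, const Eigen::VectorXd& b)
{
  Eigen::ConjugateGradient<Eigen::SparseMatrix<double> > cg;
  cg.compute(A);

  Eigen::VectorXd x0 = cg.solve(b);              // starts from x = 0, as documented
  Eigen::VectorXd x1 = cg.solveWithGuess(b, x0); // explicit warm start
}
```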
Gael Guennebaud
8fdcaded5e merge 2015-03-04 10:18:08 +01:00
Gael Guennebaud
c43154bbc5 Check for no-reallocation in SparseMatrix::insert (bug #974) 2015-03-04 10:16:46 +01:00
Gael Guennebaud
1ce0178363 Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974) 2015-03-04 09:39:26 +01:00
Gael Guennebaud
3dca4a1efc Update manual wrt new LSCG solver. 2015-03-04 09:35:30 +01:00
Gael Guennebaud
05274219a7 Add a CG-based solver for rectangular least-square problems (bug #975). 2015-03-04 09:34:27 +01:00
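A hedged usage sketch of the new solver on an over-determined sparse system (names follow the public API; the problem data is assumed):

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Solves min ||A*x - b||^2 for a rectangular (m > n) sparse A without
// forming A^T * A explicitly.
Eigen::VectorXd least_squares(const Eigen::SparseMatrix<double>& A,
                              const Eigen::VectorXd& b)
{
  Eigen::LeastSquaresConjugateGradient<Eigen::SparseMatrix<double> > lscg;
  lscg.compute(A);
  return lscg.solve(b);
}
```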
Benoit Jacob
2aa09e6b4e Fix asm comments in 1px1 kernel 2015-03-03 13:44:00 -05:00
Benoit Jacob
eae8e27b7d Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp 2015-03-03 11:41:21 -05:00
Marc Glisse
37a93c4263 New scoring functor to select the pivot.
This can be useful for non-floating-point scalars, where choosing the biggest element is generally not the best choice.
2015-03-03 17:08:28 +01:00
Benoit Jacob
ccc1277a42 must also disable complex<double> when disabling double vectorization 2015-03-03 10:17:05 -05:00
Benoit Jacob
f839099512 Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics. 2015-03-03 09:35:22 -05:00
Benoit Jacob
1ec0f4fadf HalfPacket also needed to be disabled for double, on ARMv8. 2015-03-02 16:08:54 -05:00
Gael Guennebaud
3109f0e74e Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate() 2015-03-02 20:09:33 +01:00
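The pattern that benefits from the vectorized conjugate, e.g. computing a relative rotation (a short sketch):

```cpp
#include <Eigen/Geometry>

// The relative rotation between two unit quaternions; for unit quaternions
// the conjugate is the inverse, and the conjugate+product pair is exactly
// the pattern the SSE path accelerates.
Eigen::Quaternionf relative(const Eigen::Quaternionf& q1,
                            const Eigen::Quaternionf& q2)
{
  return q1 * q2.conjugate();
}
```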
Gael Guennebaud
9aee1e300a Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within the product kernel. 2015-02-27 22:55:12 +01:00
Gael Guennebaud
b10cd3afd2 Re-enable detection of min/max parentheses protection, and re-enable the mpreal_support unit test. 2015-02-27 22:38:00 +01:00
Benoit Jacob
6466fa63be Reimplement the selection between rotating and non-rotating kernels
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Jacob
2fc3b484d7 remove trailing comma 2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4 Disable Packet2f/2i halfpacket support in NEON.
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
b7fc8746e0 Replace a static assert with a runtime one; this fixes the build of unit tests on ARM.
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Gael Guennebaud
bcf9bb5c1f Avoid packing the rhs multiple times when blocking on the lhs only. 2015-02-26 17:01:33 +01:00
Gael Guennebaud
4ec3f04b3a Make sure that the block size computation is tested by our unit test. 2015-02-26 17:00:36 +01:00
Gael Guennebaud
a8ad8887bf Implement a more generic blocking-size selection algorithm. See explanations inline.
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
2015-02-26 16:04:35 +01:00
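A minimal sketch of the cache-size rule quoted above (variable and function names are hypothetical, not Eigen's internals):

```cpp
#include <algorithm>
#include <cstddef>

// Effective cache budget for the 2nd level of blocking:
// max(l2, l3 / number_of_cores_sharing_the_l3), per the heuristic above.
std::size_t blocking_cache_size(std::size_t l2, std::size_t l3,
                                std::size_t cores_sharing_l3)
{
  return std::max(l2, l3 / cores_sharing_l3);
}
```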
Gael Guennebaud
400becc591 Fix typos in block-size testing code, and set peeling on k to 8. 2015-02-26 15:57:06 +01:00
Benoit Jacob
692136350b So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.

I could not see any significant impact of this offset.

On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.

So let's just go with 0!

Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
2015-02-25 12:37:14 -05:00
Christoph Hertzberg
531fa9de77 bug #970: Add EIGEN_DEVICE_FUNC to rvalue functions, in case CUDA supports rvalue references. 2015-02-24 21:03:28 +01:00
Benoit Jacob
26275b250a Fix my recent prefetch changes:
 - the first prefetch is actually harmful on Haswell with FMA,
   but it is the most beneficial on ARM.
 - the second prefetch... I was very stupid and multiplied an offset of a
   scalar* pointer by sizeof(scalar). The old offset was 64; pk = 8, so 64 = pk*8.
   So this effectively restores the older offset. Actually, there were
   two prefetches here, one with offset 48 and one with offset 64. I could not
   confirm any benefit from this strange 48 offset on either the Haswell or
   my ARM device.
2015-02-23 16:55:17 -05:00
Christoph Hertzberg
052b6b40f1 Fix two trivial warnings 2015-02-22 12:40:51 +01:00
Christoph Hertzberg
ecbf2a6656 log1p is defined only for real Scalars in C++11 2015-02-21 19:58:24 +01:00
Gael Guennebaud
3cf642baa3 Fix compilation of unit tests disabling assertion checking 2015-02-21 14:13:48 +01:00
Gael Guennebaud
2da1594750 Fix doc of Ref<> 2015-02-20 11:52:22 +01:00
Gael Guennebaud
b192e29eae In C++11 destructors do not throw by default (fix CommaInitializer unit test) 2015-02-20 09:28:34 +01:00
Benoit Steiner
ab41652d81 Pulled latest changes from trunk 2015-02-19 21:23:37 -08:00
Benoit Steiner
7765039f1c Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device. 2015-02-19 21:22:51 -08:00
Gael Guennebaud
a66f5fc2fd Fix regression with C++11 support of lambda: now internal::result_of falls back to std::result_of in C++11. 2015-02-19 23:32:12 +01:00