eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-02-23 18:20:47 +08:00

Author	SHA1	Message	Date
Deanna Hood	f52b78491c	Remove packet isNaN, isInf, isFinite	2015-03-17 09:26:24 +10:00
Deanna Hood	1c78d6f2a6	Add boolean not operator (!) array support	2015-03-17 08:29:57 +10:00
Deanna Hood	e1d6e6c972	Make cube, inverse and abs2 free-functions	2015-03-17 06:25:24 +10:00
Deanna Hood	fef4e071d7	Rename isinf to isInf	2015-03-17 05:58:47 +10:00
Deanna Hood	46cf9cda32	Add isfinite array support as isFinite	2015-03-17 04:33:12 +10:00
Deanna Hood	1d76ceab55	Remove floor, ceil, round for complex numbers	2015-03-17 02:36:07 +10:00
Deanna Hood	717b7954ce	Update cost of coeff-wise arg call	2015-03-17 02:11:57 +10:00
Deanna Hood	fb68b149cb	Rename isnan to isNaN	2015-03-17 02:04:42 +10:00
Deanna Hood	f89fcefa79	Add hyperbolic trigonometric functions from std array support	2015-03-11 13:13:30 +10:00
Deanna Hood	a5e49976f5	Add log10 array support	2015-03-11 08:56:42 +10:00
Deanna Hood	19a71056ae	Allow calling of square(array) in addition to array.square()	2015-03-11 06:59:28 +10:00
Deanna Hood	31fdd67756	Additional unary coeff-wise functors (isnan, round, arg, e.g.)	2015-03-11 06:39:23 +10:00
Gael Guennebaud	fd78874888	Fix compilation of iterative solvers with dense matrices	2015-03-09 21:31:03 +01:00
Gael Guennebaud	d4317a85e8	Add typedefs for return types of SparseMatrixBase::selfadjointView	2015-03-09 21:29:46 +01:00
Gael Guennebaud	9e885fb766	Add unit tests for CG and sparse-LLT for long int as storage-index	2015-03-09 14:33:15 +01:00
Gael Guennebaud	224a1fe4c6	bug #963 : make IncompleteLUT compatible with non-default storage index types.	2015-03-09 13:55:20 +01:00
Gael Guennebaud	0ee391863e	Avoid underflow when blocking sizes are tuned manually.	2015-03-06 21:51:09 +01:00
Gael Guennebaud	14a5f135a3	bug #969 : work around ambiguous calls to Ref using enable_if.	2015-03-06 17:51:31 +01:00
Gael Guennebaud	87681e508f	bug #978 : early return for vanishing products	2015-03-06 16:11:22 +01:00
Gael Guennebaud	cd3bbffa73	Improve blocking heuristic: if the lhs fits within L1, then block on the rhs in L1 (allows keeping the packed rhs in L1)	2015-03-06 14:31:39 +01:00
Gael Guennebaud	58740ce4c6	Improve product kernel: replace the previous dynamic loop-swapping strategy by a more general one: It consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that L1 cache is fully exploited.	2015-03-06 10:30:35 +01:00
Gael Guennebaud	4c8b95d5c5	Rename LSCG to LeastSquaresConjugateGradient	2015-03-05 10:16:32 +01:00
Gael Guennebaud	7550107028	Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"	2015-03-05 10:03:46 +01:00
Gael Guennebaud	2dc968e453	bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.	2015-03-04 17:03:13 +01:00
Jan Blechta	168ceb271e	Really use zero guess in ConjugateGradients::solve as documented and expected for consistency with other methods.	2015-02-18 14:26:10 +01:00
Gael Guennebaud	8fdcaded5e	merge	2015-03-04 10:18:08 +01:00
Gael Guennebaud	c43154bbc5	Check for no-reallocation in SparseMatrix::insert (bug #974 )	2015-03-04 10:16:46 +01:00
Gael Guennebaud	1ce0178363	Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974 )	2015-03-04 09:39:26 +01:00
Gael Guennebaud	3dca4a1efc	Update manual wrt new LSCG solver.	2015-03-04 09:35:30 +01:00
Gael Guennebaud	05274219a7	Add a CG-based solver for rectangular least-square problems (bug #975 ).	2015-03-04 09:34:27 +01:00
Benoit Jacob	2aa09e6b4e	Fix asm comments in 1px1 kernel	2015-03-03 13:44:00 -05:00
Benoit Jacob	eae8e27b7d	Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp	2015-03-03 11:41:21 -05:00
Marc Glisse	37a93c4263	New scoring functor to select the pivot. This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.	2015-03-03 17:08:28 +01:00
Benoit Jacob	ccc1277a42	must also disable complex<double> when disabling double vectorization	2015-03-03 10:17:05 -05:00
Benoit Jacob	f839099512	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.	2015-03-03 09:35:22 -05:00
Benoit Jacob	1ec0f4fadf	HalfPacket also needed to be disabled for double, on ARMv8.	2015-03-02 16:08:54 -05:00
Gael Guennebaud	3109f0e74e	Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate()	2015-03-02 20:09:33 +01:00
Gael Guennebaud	9aee1e300a	Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within product kernel.	2015-02-27 22:55:12 +01:00
Gael Guennebaud	b10cd3afd2	Re-enable detection of min/max parentheses protection, and re-enable mpreal_support unit test.	2015-02-27 22:38:00 +01:00
Benoit Jacob	6466fa63be	Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.	2015-02-27 15:30:10 -05:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	b7fc8746e0	Replace a static assert by a runtime one, fixes the build of unit tests on ARM. Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.	2015-02-27 10:01:59 -05:00
Gael Guennebaud	bcf9bb5c1f	Avoid packing rhs multiple-times when blocking on the lhs only.	2015-02-26 17:01:33 +01:00
Gael Guennebaud	4ec3f04b3a	Make sure that the block size computation is tested by our unit test.	2015-02-26 17:00:36 +01:00
Gael Guennebaud	a8ad8887bf	Implement a more generic blocking-size selection algorithm. See explanations inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)	2015-02-26 16:04:35 +01:00
Gael Guennebaud	400becc591	Fix typos in block-size testing code, and set peeling on k to 8.	2015-02-26 15:57:06 +01:00
Benoit Jacob	692136350b	So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!	2015-02-25 12:37:14 -05:00
Christoph Hertzberg	531fa9de77	bug #970 : Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references.	2015-02-24 21:03:28 +01:00
Benoit Jacob	26275b250a	Fix my recent prefetch changes: - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM. - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device.	2015-02-23 16:55:17 -05:00
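
Commit 2dc968e453 above (bug #824) mentions computing Quaternion::angularDistance with atan2 instead of acos for better accuracy. The sketch below illustrates why that can matter for nearly identical rotations. It is only an illustration under stated assumptions: the Quat struct and the functions conjugate, multiply, angular_distance_acos and angular_distance_atan2 are hypothetical names made up here, not Eigen's API, and both inputs are assumed to be unit quaternions.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical minimal quaternion type for illustration only;
// this is not Eigen's Quaternion class.
struct Quat { double w, x, y, z; };

// Conjugate: negate the vector part.
Quat conjugate(const Quat& q) { return {q.w, -q.x, -q.y, -q.z}; }

// Hamilton product a * b.
Quat multiply(const Quat& a, const Quat& b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// acos-based distance: 2 * acos(|a . b|).
// Near |a . b| == 1 the dot product rounds to 1 and small angles are lost.
double angular_distance_acos(const Quat& a, const Quat& b) {
    double d = std::abs(a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z);
    return 2.0 * std::acos(std::min(d, 1.0));  // clamp guards against d > 1
}

// atan2-based distance: form d = a * conj(b), then 2 * atan2(|vec(d)|, |w(d)|).
// The vector part keeps the small-angle information that the dot product drops.
double angular_distance_atan2(const Quat& a, const Quat& b) {
    Quat d = multiply(a, conjugate(b));
    double vec_norm = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return 2.0 * std::atan2(vec_norm, std::abs(d.w));
}
```

For two rotations about the same axis that differ by a very small angle (say 1e-9 rad), the acos form typically collapses to zero in double precision because the dot product rounds to 1, while the atan2 form should still recover the angle from the vector part of the relative rotation. For well-separated rotations the two forms are expected to agree.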


3825 Commits