eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	14a5f135a3	bug #969 : workaround ambiguous calls to Ref using enable_if.	2015-03-06 17:51:31 +01:00
Gael Guennebaud	d23fcc0672	bug #978 : add unit test for zero-sized products	2015-03-06 16:12:08 +01:00
Gael Guennebaud	87681e508f	bug #978 : early return for vanishing products	2015-03-06 16:11:22 +01:00
Gael Guennebaud	4c8eeeaed6	update gemm changeset list	2015-03-06 15:08:20 +01:00
Gael Guennebaud	cd3bbffa73	Improve blocking heuristic: if the lhs fit within L1, then block on the rhs in L1 (allows to keep packed rhs in L1)	2015-03-06 14:31:39 +01:00
Gael Guennebaud	eedd5063fd	Update gemm performance monitoring tool: - permit to recompute a subset of changesets - update changeset list - add a few more cases	2015-03-06 11:47:13 +01:00
Gael Guennebaud	58740ce4c6	Improve product kernel: replace the previous dynamic loop swapping strategy by a more general one: It consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that L1 cache is fully exploited.	2015-03-06 10:30:35 +01:00
Benoit Jacob	4ab01f7c21	slightly increase tolerance to clock speed variation	2015-03-05 14:41:16 -05:00
Benoit Jacob	5db2baa573	Make benchmark-blocking-sizes detect changes to clock speed and be resilient to that.	2015-03-05 13:44:20 -05:00
Gael Guennebaud	4c8b95d5c5	Rename LSCG to LeastSquaresConjugateGradient	2015-03-05 10:16:32 +01:00
Gael Guennebaud	7550107028	Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"	2015-03-05 10:03:46 +01:00
Gael Guennebaud	2dc968e453	bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.	2015-03-04 17:03:13 +01:00
Benoit Jacob	2231b3dece	output to cout, not cerr, the actual results	2015-03-04 09:45:12 -05:00
Benoit Jacob	00ea121881	Complete the tool to analyze the efficiency of default sizes.	2015-03-04 09:30:56 -05:00
Benoit Steiner	0196141938	Fixed the optimized AVX implementation of the fast rsqrt function	2015-03-02 13:49:39 -08:00
Benoit Steiner	b0f2b6f297	Updated the tensor type casting code as follows: in the case where TgtRatio < SrcRatio, disable the vectorization of the source expression unless it has direct-access.	2015-03-02 10:11:40 -08:00
Benoit Steiner	d9cb604a5d	Disabled the use of aligned memory loads when converting a tensor from float to doubles since alignment can't always be guaranteed.	2015-03-02 09:41:36 -08:00
Benoit Steiner	4fd7f47692	Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.	2015-03-02 09:38:47 -08:00
Benoit Steiner	ae73859a0a	Fixed incorrect assertion	2015-02-28 08:02:02 -08:00
Benoit Steiner	131449298f	Fixed clang compilation warning	2015-02-28 03:01:19 -08:00
Benoit Steiner	56ea45ff0f	Silenced some compilation warnings	2015-02-28 02:37:41 -08:00
Benoit Steiner	bb483313f6	Fixed another batch of compilation warnings	2015-02-28 02:32:46 -08:00
Benoit Steiner	fb53384b0f	Improved the default implementation of prsqrt	2015-02-28 01:51:26 -08:00
Benoit Steiner	61409d9449	Silenced one more compilation warning	2015-02-28 01:49:09 -08:00
Benoit Steiner	1a7b84dc75	Silenced a few compilation warnings	2015-02-28 01:45:15 -08:00
Benoit Steiner	37357a310f	Fixed compilation warnings	2015-02-27 23:54:24 -08:00
Benoit Steiner	cf1eea11de	Fixed compilation warnings	2015-02-27 23:52:02 -08:00
Benoit Steiner	78732186ee	Fixed compilation warnings	2015-02-27 23:51:16 -08:00
Benoit Steiner	4250a0cab0	Fixed compilation warnings	2015-02-27 21:59:10 -08:00
Benoit Steiner	a4e37b0617	Reverted the README	2015-02-27 13:09:49 -08:00
Benoit Steiner	306fceccbe	Pulled latest updates from trunk	2015-02-27 13:05:26 -08:00
Benoit Steiner	75e7f381c8	Pulled latest updates from trunk	2015-02-27 12:57:55 -08:00
Benoit Steiner	2386fc8528	Added support for 32bit index on a per tensor/tensor expression. This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.	2015-02-27 12:57:13 -08:00
Benoit Steiner	e1f6a45b14	README.md edited online with Bitbucket	2015-02-27 20:44:24 +00:00
Benoit Steiner	90893bbe18	README.md edited online with Bitbucket	2015-02-27 20:44:10 +00:00
Benoit Steiner	473e6d4c3d	README.md edited online with Bitbucket	2015-02-27 20:41:45 +00:00
Benoit Steiner	4369538227	README.md edited online with Bitbucket	2015-02-27 20:41:33 +00:00
Benoit Steiner	99cfbd6e84	README.md edited online with Bitbucket	2015-02-27 20:41:14 +00:00
Benoit Jacob	6466fa63be	Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.	2015-02-27 15:30:10 -05:00
Benoit Steiner	05089aba75	Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts	2015-02-27 09:27:30 -08:00
Benoit Steiner	bf9877a92a	Pulled latest updates from trunk	2015-02-27 09:23:22 -08:00
Benoit Steiner	90f4e90f1d	Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized	2015-02-27 09:22:37 -08:00
Benoit Steiner	573b377110	Added support for vectorized type casting of tensors	2015-02-27 08:46:04 -08:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	f5ff4d826f	Fix NEON build flags: in the current NDK, at least with the clang-3.5 toolchain, -mfpu=neon is not enough to activate NEON, since it's incompatible with the default float ABI, and I have to pass -mfloat-abi=softfp (which is what everyone does in practice). In fact, it would be a good idea to pass -mfloat-abi=softfp all the time, regardless of NEON. Also removing the -mcpu=cortex-a8, as 1) it's not needed and 2) if we really wanted to pass a specific -mcpu flag, that would presumably to tune performance for benchmarks, and it would then not really make sense to tune for the very old cortex-a8 (it reflects ARM CPUs from 5 years ago).	2015-02-27 10:56:50 -05:00
Benoit Jacob	b7fc8746e0	Replace a static assert by a runtime one, fixes the build of unit tests on ARM Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.	2015-02-27 10:01:59 -05:00
Benoit Steiner	f074bb4b5f	Fixed another compilation problem with TensorIntDiv.h	2015-02-26 11:14:23 -08:00
Benoit Steiner	57154fdb32	Can now use the tensor 'reverse' operation as a lvalue	2015-02-26 11:13:42 -08:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00

6468 Commits