eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-15 07:10:37 +08:00

Author	SHA1	Message	Date
Benoit Steiner	131449298f	Fixed clang compilation warning	2015-02-28 03:01:19 -08:00
Benoit Steiner	56ea45ff0f	Silenced some compilation warnings	2015-02-28 02:37:41 -08:00
Benoit Steiner	bb483313f6	Fixed another batch of compilation warnings	2015-02-28 02:32:46 -08:00
Benoit Steiner	fb53384b0f	Improved the default implementation of prsqrt	2015-02-28 01:51:26 -08:00
Benoit Steiner	61409d9449	Silenced one more compilation warning	2015-02-28 01:49:09 -08:00
Benoit Steiner	1a7b84dc75	Silenced a few compilation warnings	2015-02-28 01:45:15 -08:00
Benoit Steiner	37357a310f	Fixed compilation warnings	2015-02-27 23:54:24 -08:00
Benoit Steiner	cf1eea11de	Fixed compilation warnings	2015-02-27 23:52:02 -08:00
Benoit Steiner	78732186ee	Fixed compilation warnings	2015-02-27 23:51:16 -08:00
Benoit Steiner	4250a0cab0	Fixed compilation warnings	2015-02-27 21:59:10 -08:00
Benoit Steiner	a4e37b0617	Reverted the README	2015-02-27 13:09:49 -08:00
Benoit Steiner	306fceccbe	Pulled latest updates from trunk	2015-02-27 13:05:26 -08:00
Benoit Steiner	75e7f381c8	Pulled latest updates from trunk	2015-02-27 12:57:55 -08:00
Benoit Steiner	2386fc8528	Added support for 32bit index on a per tensor/tensor expression. This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.	2015-02-27 12:57:13 -08:00
Benoit Steiner	e1f6a45b14	README.md edited online with Bitbucket	2015-02-27 20:44:24 +00:00
Benoit Steiner	90893bbe18	README.md edited online with Bitbucket	2015-02-27 20:44:10 +00:00
Benoit Steiner	473e6d4c3d	README.md edited online with Bitbucket	2015-02-27 20:41:45 +00:00
Benoit Steiner	4369538227	README.md edited online with Bitbucket	2015-02-27 20:41:33 +00:00
Benoit Steiner	99cfbd6e84	README.md edited online with Bitbucket	2015-02-27 20:41:14 +00:00
Benoit Jacob	6466fa63be	Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.	2015-02-27 15:30:10 -05:00
Benoit Steiner	05089aba75	Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts	2015-02-27 09:27:30 -08:00
Benoit Steiner	bf9877a92a	Pulled latest updates from trunk	2015-02-27 09:23:22 -08:00
Benoit Steiner	90f4e90f1d	Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized	2015-02-27 09:22:37 -08:00
Benoit Steiner	573b377110	Added support for vectorized type casting of tensors	2015-02-27 08:46:04 -08:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	f5ff4d826f	Fix NEON build flags: in the current NDK, at least with the clang-3.5 toolchain, -mfpu=neon is not enough to activate NEON, since it's incompatible with the default float ABI, and I have to pass -mfloat-abi=softfp (which is what everyone does in practice). In fact, it would be a good idea to pass -mfloat-abi=softfp all the time, regardless of NEON. Also removing the -mcpu=cortex-a8, as 1) it's not needed and 2) if we really wanted to pass a specific -mcpu flag, that would presumably be to tune performance for benchmarks, and it would then not really make sense to tune for the very old cortex-a8 (it reflects ARM CPUs from 5 years ago).	2015-02-27 10:56:50 -05:00
Benoit Jacob	b7fc8746e0	Replace a static assert by a runtime one, fixes the build of unit tests on ARM. Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.	2015-02-27 10:01:59 -05:00
Benoit Steiner	f074bb4b5f	Fixed another compilation problem with TensorIntDiv.h	2015-02-26 11:14:23 -08:00
Benoit Steiner	57154fdb32	Can now use the tensor 'reverse' operation as a lvalue	2015-02-26 11:13:42 -08:00
Benoit Steiner	f41b1f1666	Added support for fast reciprocal square root computation.	2015-02-26 09:42:41 -08:00
Benoit Steiner	2fffe69b1b	Added missing copy constructor	2015-02-26 09:27:53 -08:00
Gael Guennebaud	bcf9bb5c1f	Avoid packing rhs multiple-times when blocking on the lhs only.	2015-02-26 17:01:33 +01:00
Gael Guennebaud	4ec3f04b3a	Make sure that the block size computation is tested by our unit test.	2015-02-26 17:00:36 +01:00
Gael Guennebaud	2e9cb06a87	Update changeset list to be checked by perf_monitoring/gemm.	2015-02-26 16:13:33 +01:00
Gael Guennebaud	a46061ab7b	Make perf_monitoring/gemm script more flexible: - skip existing dataset - add a "-up" option to recompute the dataset (see script header) - allow to specify a filename prefix	2015-02-26 16:12:58 +01:00
Gael Guennebaud	a8ad8887bf	Implement a more generic blocking-size selection algorithm. See explanations inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)	2015-02-26 16:04:35 +01:00
Gael Guennebaud	400becc591	Fix typos in block-size testing code, and set peeling on k to 8.	2015-02-26 15:57:06 +01:00
Benoit Steiner	bffb6bdf45	Made TensorIntDiv.h compile with MSVC	2015-02-25 23:54:43 -08:00
Benoit Steiner	27f3fb2bcc	Fixed another clang warning	2015-02-25 22:54:20 -08:00
Benoit Steiner	f8fbb3f9a6	Fixed several compilation warnings reported by clang	2015-02-25 22:22:37 -08:00
Benoit Steiner	8e817b65d0	Silenced a few more compilation warnings generated by nvcc	2015-02-25 17:46:20 -08:00
Benoit Steiner	410070e5ab	Added more tests to validate support for tensors laid out in RowMajor order.	2015-02-25 16:14:59 -08:00
Benoit Steiner	1cfd51908c	Added support for RowMajor layout to the tensor patch extraction code.	2015-02-25 13:29:12 -08:00
Benoit Steiner	eb21a8173e	Pulled latest changes from trunk	2015-02-25 09:49:44 -08:00
Benoit Steiner	8afce86e64	Added support for RowMajor layout to the image patch extraction code. Sped up the unsupported_cxx11_tensor_image_patch test and reduced its memory footprint	2015-02-25 09:48:54 -08:00
Benoit Jacob	692136350b	So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!	2015-02-25 12:37:14 -05:00
Christoph Hertzberg	531fa9de77	bug #970 : Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references.	2015-02-24 21:03:28 +01:00
Benoit Jacob	26275b250a	Fix my recent prefetch changes: - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM. - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the haswell or my ARM device.	2015-02-23 16:55:17 -05:00
Benoit Jacob	488874781b	Add analyze-blocking-sizes program under bench/ to analyze multiple logs generated by benchmark-blocking-sizes.	2015-02-23 14:02:29 -05:00

6449 Commits