Benoit Jacob
5db2baa573
Make benchmark-blocking-sizes detect changes to clock speed and be resilient to that.
2015-03-05 13:44:20 -05:00
Gael Guennebaud
4c8b95d5c5
Rename LSCG to LeastSquaresConjugateGradient
2015-03-05 10:16:32 +01:00
Gael Guennebaud
7550107028
Product optimization: implement a dynamic loop-swapping startegy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"
2015-03-05 10:03:46 +01:00
Gael Guennebaud
2dc968e453
bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.
2015-03-04 17:03:13 +01:00
Benoit Jacob
2231b3dece
output to cout, not cerr, the actual results
2015-03-04 09:45:12 -05:00
Benoit Jacob
00ea121881
Complete the tool to analyze the efficiency of default sizes.
2015-03-04 09:30:56 -05:00
Benoit Steiner
0196141938
Fixed the optimized AVX implementation of the fast rsqrt function
2015-03-02 13:49:39 -08:00
Benoit Steiner
b0f2b6f297
Updated the tensor type casting code as follow: in the case where TgtRatio < SrcRatio, disable the vectorization of the source expression unless is has direct-access.
2015-03-02 10:11:40 -08:00
Benoit Steiner
d9cb604a5d
Disabled the use of aligned memory loads when converting a tensor from float to doubles since alignment can't always be guaranteed.
2015-03-02 09:41:36 -08:00
Benoit Steiner
4fd7f47692
Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.
2015-03-02 09:38:47 -08:00
Benoit Steiner
ae73859a0a
Fixed incorrect assertion
2015-02-28 08:02:02 -08:00
Benoit Steiner
131449298f
Fixed clang compilation warning
2015-02-28 03:01:19 -08:00
Benoit Steiner
56ea45ff0f
Silenced some compilation warnings
2015-02-28 02:37:41 -08:00
Benoit Steiner
bb483313f6
Fixed another batch of compilation warnings
2015-02-28 02:32:46 -08:00
Benoit Steiner
fb53384b0f
Improved the default implementation of prsqrt
2015-02-28 01:51:26 -08:00
Benoit Steiner
61409d9449
Silenced one more comilation warning
2015-02-28 01:49:09 -08:00
Benoit Steiner
1a7b84dc75
Silenced a few compilation warnings
2015-02-28 01:45:15 -08:00
Benoit Steiner
37357a310f
Fixed compilation warnings
2015-02-27 23:54:24 -08:00
Benoit Steiner
cf1eea11de
Fixed compilation warnings
2015-02-27 23:52:02 -08:00
Benoit Steiner
78732186ee
Fixed compilation warnings
2015-02-27 23:51:16 -08:00
Benoit Steiner
4250a0cab0
Fixed compilation warnings
2015-02-27 21:59:10 -08:00
Benoit Steiner
a4e37b0617
Reverted the README
2015-02-27 13:09:49 -08:00
Benoit Steiner
306fceccbe
Pulled latest updates from trunk
2015-02-27 13:05:26 -08:00
Benoit Steiner
75e7f381c8
Pulled latest updates from trunk
2015-02-27 12:57:55 -08:00
Benoit Steiner
2386fc8528
Added support for 32bit index on a per tensor/tensor expression. This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.
2015-02-27 12:57:13 -08:00
Benoit Steiner
e1f6a45b14
README.md edited online with Bitbucket
2015-02-27 20:44:24 +00:00
Benoit Steiner
90893bbe18
README.md edited online with Bitbucket
2015-02-27 20:44:10 +00:00
Benoit Steiner
473e6d4c3d
README.md edited online with Bitbucket
2015-02-27 20:41:45 +00:00
Benoit Steiner
4369538227
README.md edited online with Bitbucket
2015-02-27 20:41:33 +00:00
Benoit Steiner
99cfbd6e84
README.md edited online with Bitbucket
2015-02-27 20:41:14 +00:00
Benoit Jacob
6466fa63be
Reimplement the selection between rotating and non-rotating kernels
...
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Steiner
05089aba75
Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts
2015-02-27 09:27:30 -08:00
Benoit Steiner
bf9877a92a
Pulled latest updates from trunk
2015-02-27 09:23:22 -08:00
Benoit Steiner
90f4e90f1d
Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized
2015-02-27 09:22:37 -08:00
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Jacob
2fc3b484d7
remove trailing comma
2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4
Disable Packet2f/2i halfpacket support in NEON.
...
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
f5ff4d826f
Fix NEON build flags: in the current NDK, at least with the clang-3.5 toolchain,
...
-mfpu=neon is not enough to activate NEON, since it's incompatible with the default float ABI,
and I have to pass -mfloat-abi=softfp (which is what everyone does in practice).
In fact, it would be a good idea to pass -mfloat-abi=softfp all the time, regardless of NEON.
Also removing the -mcpu=cortex-a8, as 1) it's not needed and 2) if we really wanted to pass
a specific -mcpu flag, that would presumably to tune performance for benchmarks, and it would
then not really make sense to tune for the very old cortex-a8 (it reflects ARM CPUs from 5 years ago).
2015-02-27 10:56:50 -05:00
Benoit Jacob
b7fc8746e0
Replace a static assert by a runtime one, fixes the build of unit tests on ARM
...
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Benoit Steiner
f074bb4b5f
Fixed another compilation problem with TensorIntDiv.h
2015-02-26 11:14:23 -08:00
Benoit Steiner
57154fdb32
Can now use the tensor 'reverse' operation as a lvalue
2015-02-26 11:13:42 -08:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00
Benoit Steiner
2fffe69b1b
Added missing copy constructor
2015-02-26 09:27:53 -08:00
Gael Guennebaud
bcf9bb5c1f
Avoid packing rhs multiple-times when blocking on the lhs only.
2015-02-26 17:01:33 +01:00
Gael Guennebaud
4ec3f04b3a
Make sure that the block size computation is tested by our unit test.
2015-02-26 17:00:36 +01:00
Gael Guennebaud
2e9cb06a87
Update changeset list to be checked by perf_monitoring/gemm.
2015-02-26 16:13:33 +01:00
Gael Guennebaud
a46061ab7b
Make perf_monitoring/gemm script more flexible:
...
- skip existing dataset
- add a "-up" option to recompute the dataset (see script header)
- allow to specify a filename prefix
2015-02-26 16:12:58 +01:00
Gael Guennebaud
a8ad8887bf
Implement a more generic blocking-size selection algorithm. See explanations inlines.
...
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
2015-02-26 16:04:35 +01:00
Gael Guennebaud
400becc591
Fix typos in block-size testing code, and set peeling on k to 8.
2015-02-26 15:57:06 +01:00
Benoit Steiner
bffb6bdf45
Made TensorIntDiv.h compile with MSVC
2015-02-25 23:54:43 -08:00