Gael Guennebaud
1ce0178363
Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974)
2015-03-04 09:39:26 +01:00
Gael Guennebaud
3dca4a1efc
Update manual wrt new LSCG solver.
2015-03-04 09:35:30 +01:00
Gael Guennebaud
05274219a7
Add a CG-based solver for rectangular least-square problems (bug #975).
2015-03-04 09:34:27 +01:00
Benoit Jacob
f839099512
Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.
2015-03-03 09:35:22 -05:00
Benoit Jacob
9930e9583b
Improve analyze-blocking-sizes, and in particular give it an evaluate-defaults tool
that shows the efficiency of Eigen's default blocking sizes choices, using a
previously computed table from benchmark-blocking-sizes.
2015-03-02 18:08:38 -05:00
Benoit Jacob
1ec0f4fadf
HalfPacket also needed to be disabled for double, on ARMv8.
2015-03-02 16:08:54 -05:00
Gael Guennebaud
3109f0e74e
Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate()
2015-03-02 20:09:33 +01:00
Abhijit Kundu
ef09ce4552
Fix for TensorIO for Fixed sized Tensors.
The following code snippet was failing to compile:
TensorFixedSize<double, Sizes<4, 3> > t_4x3;
cout << t_4x3;
2015-02-28 21:30:31 -05:00
Abhijit Kundu
3a4b6827b4
Merged eigen/eigen into default
2015-02-28 20:15:28 -05:00
Christoph Hertzberg
31e2ffe82c
Replaced POSIX random() by internal::random
2015-02-28 18:39:37 +01:00
Christoph Hertzberg
73dd95e7b0
Use @CMAKE_MAKE_PROGRAM@ instead of make in buildtests.sh
2015-02-28 16:51:53 +01:00
Christoph Hertzberg
682196e9fc
Fixed MPRealSupport
2015-02-28 16:41:00 +01:00
Christoph Hertzberg
33f40b2883
Cygwin does not like weak linking either.
2015-02-28 14:53:11 +01:00
Christoph Hertzberg
0f82a1d7b7
bug #967: Automatically add cxx11 suffix when building in C++11 mode
2015-02-28 14:52:26 +01:00
Gael Guennebaud
9aee1e300a
Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within the product kernel.
2015-02-27 22:55:12 +01:00
Gael Guennebaud
b10cd3afd2
Re-enable detection of min/max parentheses protection, and re-enable mpreal_support unit test.
2015-02-27 22:38:00 +01:00
Benoit Jacob
6466fa63be
Reimplement the selection between rotating and non-rotating kernels
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
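The idea in the commit above, choosing between two kernels with templates rather than macros and if()'s, can be sketched roughly as follows. This is an illustration only and not Eigen's code: `KernelKind`, `use_rotating_kernel`, `kernel_selector`, and `selected_kernel` are hypothetical names, and the rule "rotate for float only" is an assumption made purely for the example.

```cpp
#include <cassert>

// Hypothetical tags standing in for the two kernel variants.
enum class KernelKind { Rotating, NonRotating };

// Hypothetical compile-time trait: by default, do not use the rotating kernel.
template <typename Scalar>
struct use_rotating_kernel {
  static const bool value = false;
};

// Assumption for this sketch only: float opts in to the rotating kernel.
template <>
struct use_rotating_kernel<float> {
  static const bool value = true;
};

// The selection is a template specialization, so only the chosen branch
// is instantiated; no macro and no runtime if() is involved.
template <bool UseRotating>
struct kernel_selector;

template <>
struct kernel_selector<true> {
  static KernelKind run() { return KernelKind::Rotating; }
};

template <>
struct kernel_selector<false> {
  static KernelKind run() { return KernelKind::NonRotating; }
};

template <typename Scalar>
KernelKind selected_kernel() {
  return kernel_selector<use_rotating_kernel<Scalar>::value>::run();
}
```

Because the branch that is not selected is never instantiated, code that would not compile for a given scalar type or target stays out of the build, which is one plausible reason this style helps with platform-specific build failures.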
Benoit Steiner
bf9877a92a
Pulled latest updates from trunk
2015-02-27 09:23:22 -08:00
Benoit Steiner
90f4e90f1d
Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized
2015-02-27 09:22:37 -08:00
Benoit Jacob
2fc3b484d7
remove trailing comma
2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4
Disable Packet2f/2i halfpacket support in NEON.
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
f5ff4d826f
Fix NEON build flags: in the current NDK, at least with the clang-3.5 toolchain,
-mfpu=neon is not enough to activate NEON, since it's incompatible with the default float ABI,
and I have to pass -mfloat-abi=softfp (which is what everyone does in practice).
In fact, it would be a good idea to pass -mfloat-abi=softfp all the time, regardless of NEON.
Also removing the -mcpu=cortex-a8, as 1) it's not needed and 2) if we really wanted to pass
a specific -mcpu flag, that would presumably be to tune performance for benchmarks, and it would
then not really make sense to tune for the very old cortex-a8 (it reflects ARM CPUs from 5 years ago).
2015-02-27 10:56:50 -05:00
Benoit Jacob
b7fc8746e0
Replace a static assert by a runtime one, fixes the build of unit tests on ARM
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Abhijit Kundu
4084dce038
Added CMake support for Tensor module. CMake now installs CXX11 Tensor module like the rest of the unsupported modules
2015-02-26 16:50:09 -05:00
Benoit Steiner
f074bb4b5f
Fixed another compilation problem with TensorIntDiv.h
2015-02-26 11:14:23 -08:00
Benoit Steiner
57154fdb32
Can now use the tensor 'reverse' operation as an lvalue
2015-02-26 11:13:42 -08:00
Benoit Steiner
2fffe69b1b
Added missing copy constructor
2015-02-26 09:27:53 -08:00
Gael Guennebaud
bcf9bb5c1f
Avoid packing rhs multiple times when blocking on the lhs only.
2015-02-26 17:01:33 +01:00
Gael Guennebaud
4ec3f04b3a
Make sure that the block size computation is tested by our unit test.
2015-02-26 17:00:36 +01:00
Gael Guennebaud
2e9cb06a87
Update changeset list to be checked by perf_monitoring/gemm.
2015-02-26 16:13:33 +01:00
Gael Guennebaud
a46061ab7b
Make perf_monitoring/gemm script more flexible:
- skip existing dataset
- add a "-up" option to recompute the dataset (see script header)
- allow to specify a filename prefix
2015-02-26 16:12:58 +01:00
Gael Guennebaud
a8ad8887bf
Implement a more generic blocking-size selection algorithm. See inline explanations.
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
2015-02-26 16:04:35 +01:00
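The cache-size rule quoted in the commit above, max(l2,l3/nb_core_sharing_l3), can be sketched as a small helper. This is an illustration only and not Eigen's code: the function name `second_level_cache_bytes` and its parameter names are hypothetical, and the cache sizes are assumed to be already known rather than queried from the hardware (which is the hard part the commit message points at).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Eigen): effective cache size, in bytes,
// for the 2nd level of blocking, following max(l2, l3 / nb_core_sharing_l3).
inline std::size_t second_level_cache_bytes(std::size_t l2_bytes,
                                            std::size_t l3_bytes,
                                            std::size_t nb_core_sharing_l3) {
  assert(nb_core_sharing_l3 > 0 && "at least one core must share the L3");
  // Each core gets either its private L2 or its share of the L3,
  // whichever is larger.
  return std::max(l2_bytes, l3_bytes / nb_core_sharing_l3);
}
```

For example, with a 256 KiB L2 and a 6 MiB L3 shared by 4 cores, the per-core share of the L3 (1.5 MiB) is larger than the L2, so the helper returns 1572864 bytes.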
Gael Guennebaud
400becc591
Fix typos in block-size testing code, and set peeling on k to 8.
2015-02-26 15:57:06 +01:00
Benoit Steiner
bffb6bdf45
Made TensorIntDiv.h compile with MSVC
2015-02-25 23:54:43 -08:00
Benoit Steiner
27f3fb2bcc
Fixed another clang warning
2015-02-25 22:54:20 -08:00
Benoit Steiner
f8fbb3f9a6
Fixed several compilation warnings reported by clang
2015-02-25 22:22:37 -08:00
Benoit Steiner
8e817b65d0
Silenced a few more compilation warnings generated by nvcc
2015-02-25 17:46:20 -08:00
Benoit Steiner
410070e5ab
Added more tests to validate support for tensors laid out in RowMajor order.
2015-02-25 16:14:59 -08:00
Benoit Steiner
1cfd51908c
Added support for RowMajor layout to the tensor patch extraction code.
2015-02-25 13:29:12 -08:00
Benoit Steiner
eb21a8173e
Pulled latest changes from trunk
2015-02-25 09:49:44 -08:00
Benoit Steiner
8afce86e64
Added support for RowMajor layout to the image patch extraction code
Sped up the unsupported_cxx11_tensor_image_patch test and reduced its memory footprint
2015-02-25 09:48:54 -08:00
Benoit Jacob
692136350b
So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.
I could not see any significant impact of this offset.
On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.
So let's just go with 0!
Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
2015-02-25 12:37:14 -05:00
Christoph Hertzberg
531fa9de77
bug #970: Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references.
2015-02-24 21:03:28 +01:00
Benoit Jacob
26275b250a
Fix my recent prefetch changes:
- the first prefetch is actually harmful on Haswell with FMA,
but it is the most beneficial on ARM.
- the second prefetch... I was very stupid and multiplied by sizeof(scalar)
an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8.
So this effectively restores the older offset. Actually, there were
two prefetches here, one with offset 48 and one with offset 64. I could not
confirm any benefit from this strange 48 offset on either the Haswell or
my ARM device.
2015-02-23 16:55:17 -05:00
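The offset mistake described in the commit above comes down to how pointer arithmetic scales: an offset added to a typed pointer is counted in elements, not bytes. The following sketch illustrates that rule only; `byte_distance` and the 256-element buffer are hypothetical and do not come from Eigen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes between `buffer` and
// `buffer + offset_in_elements` for a float* pointer.
inline std::size_t byte_distance(std::size_t offset_in_elements) {
  static const float buffer[256] = {};
  assert(offset_in_elements <= 256 && "offset must stay within the buffer");
  // Adding to a float* already advances by offset * sizeof(float) bytes.
  const float* target = buffer + offset_in_elements;
  return static_cast<std::size_t>(reinterpret_cast<const char*>(target) -
                                  reinterpret_cast<const char*>(buffer));
}
```

An offset of 64 elements therefore spans 64 * sizeof(float) bytes. Multiplying the offset by sizeof(float) before adding it to the pointer applies the scaling twice, so the prefetch would land much further ahead than intended.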
Benoit Jacob
488874781b
Add analyze-blocking-sizes program under bench/ to analyze multiple logs
generated by benchmark-blocking-sizes.
2015-02-23 14:02:29 -05:00
Christoph Hertzberg
052b6b40f1
Fix two trivial warnings
2015-02-22 12:40:51 +01:00
Christoph Hertzberg
ecbf2a6656
log1p is defined only for real Scalars in C++11
2015-02-21 19:58:24 +01:00
Christoph Hertzberg
6af6cf0c2e
I cannot reproduce any problems that justified this hack. However, it makes builds fail in C++11 mode.
2015-02-21 19:43:56 +01:00
Gael Guennebaud
3cf642baa3
Fix compilation of unit tests disabling assertion checking
2015-02-21 14:13:48 +01:00
Benoit Jacob
458cf91cd9
Add benchmark-blocking-sizes.cpp to bench/ per mailing list discussion.
2015-02-20 17:08:04 -05:00