Benoit Jacob
340b8afb14
bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
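The numerical meaning of "fused" in the message above can be illustrated with a minimal sketch in standard C++. It does not use Eigen's macros or NEON intrinsics, and the function names `mul_then_add` and `fused_mul_add` are hypothetical. It assumes IEEE-754 doubles with round-to-nearest. Note that it shows only the rounding difference, not whether one or two instructions are emitted, which is the separate property the renamed macros describe.

```cpp
#include <cmath>

// Multiply and add as two separately rounded operations. The volatile
// temporary keeps the compiler from contracting them into one fused op.
inline double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;  // first rounding happens here
  return product + c;               // second rounding happens here
}

// Fused multiply-add: a * b + c is computed with a single rounding at the end.
inline double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60, which rounds to 1.0 as a double. The unfused version therefore returns 0, while the fused version returns -2^-60.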
Benoit Jacob
9f99f61e69
bug #936 , patch 1/3: some cleanup and renaming for consistency.
2015-01-30 17:43:56 -05:00
Benoit Jacob
759bd92a85
bug #935 : Add asm comments in GEBP kernels to work around a bug
in both GCC and Clang on ARM/NEON, whereby they spill registers,
severely harming performance. The reason why the asm comments
make a difference is that they prevent the compiler from
reordering code across these boundaries, which has the effect
of extending the lifetime of local variables and increasing
register pressure on this register-tight code.
2015-01-30 17:27:56 -05:00
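The technique described above, placing assembly comments at the boundaries of a hot loop step, might look roughly like the sketch below. The macro `SKETCH_ASM_COMMENT` and the function `dot4_with_markers` are hypothetical names for illustration, not the ones used in the GEBP kernels. The sketch assumes GCC or Clang inline-assembly syntax and a target assembler that accepts `#` as a comment marker. The markers do not change the computed result. Following the explanation in the commit message, they are only meant to discourage the compiler from moving code across them, and whether that helps depends on the compiler and target.

```cpp
#include <cstddef>

// Hypothetical marker macro: emits an assembly comment on GCC/Clang and
// expands to nothing elsewhere.
#if defined(__GNUC__)
#define SKETCH_ASM_COMMENT(text) __asm__ __volatile__("# " text)
#else
#define SKETCH_ASM_COMMENT(text) ((void)0)
#endif

// Dot product with four accumulators; each unrolled step is bracketed by
// markers so its temporaries are less likely to stay live across steps.
inline float dot4_with_markers(const float* a, const float* b, std::size_t n) {
  float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    SKETCH_ASM_COMMENT("begin step");
    acc0 += a[i + 0] * b[i + 0];
    acc1 += a[i + 1] * b[i + 1];
    acc2 += a[i + 2] * b[i + 2];
    acc3 += a[i + 3] * b[i + 3];
    SKETCH_ASM_COMMENT("end step");
  }
  // Scalar tail for the remaining elements.
  for (; i < n; ++i) acc0 += a[i] * b[i];
  return (acc0 + acc1) + (acc2 + acc3);
}
```

Inspecting the generated assembly (for example with `-S`) shows the comments in place, which also makes it easier to check for register spills between them.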
Gael Guennebaud
f1092d2f73
bug #941 : fix accuracy issue in ColPivHouseholderQR, do not stop decomposition on a small pivot
2015-01-30 19:04:04 +01:00
Gael Guennebaud
9d82f7e30d
Supernodes was disabled.
2015-01-30 17:24:40 +01:00
Benoit Steiner
e896c0ade7
Marked the contraction operation as read only, since its result can't be assigned.
2015-01-29 10:29:47 -08:00
Benoit Steiner
5a6ea4edf6
Added more tests to cover tensor reductions
2015-01-28 10:02:47 -08:00
Gael Guennebaud
a727a2c4ed
bug #933 : RealSchur, do not consider the input matrix norm to check negligible sub-diag entries. This also makes this test consistent with the complex and self-adjoint cases.
2015-01-28 16:07:51 +01:00
Benoit Steiner
9dfdbd7e56
Improved the performance of tensor reductions that preserve the innermost dimension(s).
2015-01-27 14:15:31 -08:00
Benoit Steiner
46fc881e4a
Added a few benchmarks for the tensor code
2015-01-26 17:46:40 -08:00
Gael Guennebaud
c6eb84aabc
Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices
2015-01-26 17:09:01 +01:00
Gael Guennebaud
e1f1091fde
Add support for dense ?= diagonal
2015-01-24 10:32:49 +01:00
Gael Guennebaud
b9d314ae19
bug #329 : fix typo
2015-01-17 21:55:33 +01:00
Benoit Steiner
14f537c296
gcc doesn't consider that
template<typename OtherDerived> TensorStridingOp& operator = (const OtherDerived& other)
provides a valid assignment operator for the striding operation, and therefore refuses to compile code like:
result.stride(foo) = source.stride(bar);
Added the explicit
TensorStridingOp& operator = (const TensorStridingOp& other)
as a workaround to get the code to compile, and did the same in all the operations that can be used as lvalues.
2015-01-16 09:09:23 -08:00
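The workaround above can be sketched with a small stand-in class. One likely reading of the compiler's behaviour is standard C++: a template `operator=` is never the copy assignment operator, so the implicitly declared one is still chosen for same-type assignment, and it is deleted when the class holds a reference member. `StridedView` and its members are hypothetical and much simpler than `TensorStridingOp`. The reference member is an assumption made here to reproduce the problem.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an lvalue expression such as a striding op.
class StridedView {
 public:
  StridedView(std::vector<int>& data, std::size_t stride)
      : data_(data), stride_(stride) {}
  StridedView(const StridedView&) = default;

  std::size_t size() const { return (data_.size() + stride_ - 1) / stride_; }
  int& operator[](std::size_t i) { return data_[i * stride_]; }
  int operator[](std::size_t i) const { return data_[i * stride_]; }

  // Generic assignment from any indexable source. Being a template, it is
  // never treated as the copy assignment operator.
  template <typename Other>
  StridedView& operator=(const Other& other) {
    copyFrom(other);
    return *this;
  }

  // Explicit copy assignment. Without it, the implicit one would be deleted
  // (reference member) and "view = otherView" would fail to compile.
  StridedView& operator=(const StridedView& other) {
    copyFrom(other);
    return *this;
  }

 private:
  // Copies element by element, up to the shorter of the two views.
  template <typename Other>
  void copyFrom(const Other& other) {
    const std::size_t n = size() < other.size() ? size() : other.size();
    for (std::size_t i = 0; i < n; ++i) (*this)[i] = other[i];
  }

  std::vector<int>& data_;  // the reference member deletes the implicit operator=
  std::size_t stride_;
};
```

Both operators forward to the same helper, so the explicit overload adds no new behaviour. It only gives same-type assignment a usable candidate.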
Benoit Steiner
641e824c56
Added cube() operation
2015-01-15 11:11:48 -08:00
Benoit Steiner
b5124e7cfd
Created many additional tests
2015-01-14 15:46:04 -08:00
Benoit Steiner
54e3633b43
Updated the list of include files
2015-01-14 15:43:38 -08:00
Benoit Steiner
f697df7237
Improved support for RowMajor tensors
Misc fixes and API cleanups.
2015-01-14 15:38:48 -08:00
Benoit Steiner
6559d09c60
Ensured that each thread has its own copy of the TensorEvaluator: this avoids race conditions when the evaluator calls a non-thread-safe functor, e.g. when generating random numbers.
2015-01-14 15:34:50 -08:00
Benoit Steiner
8a382aa119
Improved the resizing of tensors
2015-01-14 15:33:11 -08:00
Benoit Steiner
703c526355
Misc improvements
2015-01-14 15:31:52 -08:00
Benoit Steiner
4cdf3fe427
Misc fixes
2015-01-14 15:30:47 -08:00
Benoit Steiner
0feff6e987
Expanded the functionality of index lists
2015-01-14 15:29:48 -08:00
Gael Guennebaud
cd679f2c47
Fix doc: setConstant does not exist for SparseMatrix.
2015-01-14 22:06:09 +01:00
Benoit Steiner
1ac8600126
Fixed the return type of coefficient wise operations. For example, the abs function returns a floating point value when called on a complex input.
2015-01-14 12:47:46 -08:00
Benoit Steiner
378bdfb7f0
Added missing APIs to the TensorMap class
2015-01-14 12:45:20 -08:00
Benoit Steiner
0526dc1bb4
Added missing APIs to the tensor class
2015-01-14 12:44:08 -08:00
Benoit Steiner
1a36590e84
Fixed the printing of RowMajor tensors
2015-01-14 12:43:20 -08:00
Benoit Steiner
7e0b6c56b4
Added ability to initialize a tensor using an initializer list
2015-01-14 12:41:30 -08:00
Benoit Steiner
b12dd1ae3c
Misc improvements for fixed size tensors
2015-01-14 12:39:34 -08:00
Benoit Steiner
71676eaddd
Added support for RowMajor inputs to the contraction code.
2015-01-14 12:36:57 -08:00
Benoit Steiner
0a0ab6dd15
Increased the functionality of the tensor devices
2015-01-14 11:45:17 -08:00
Benoit Steiner
5692723c58
Improved the performance of the contraction code on CUDA
2015-01-14 11:42:52 -08:00
Benoit Steiner
8f4b8d204b
Improved the performance of tensor reductions
Added the ability to generate random numbers following a normal distribution
Created a test to validate the ability to generate random numbers.
2015-01-14 10:19:33 -08:00
Benoit Steiner
3bd2b41b2e
Created a test for tensor type casting
2015-01-14 10:17:02 -08:00
Benoit Steiner
4928ea1212
Added ability to reverse the order of the coefficients in a tensor
2015-01-14 10:15:58 -08:00
Benoit Steiner
b00fe1590d
Added ability to swap the layout of a tensor
2015-01-14 10:14:46 -08:00
Benoit Steiner
c94174b4fe
Improved tensor references
2015-01-14 10:13:08 -08:00
Benoit Steiner
91dd53e54d
Created some documentation
2015-01-13 16:07:51 -08:00
Gael Guennebaud
279786e987
Fix missing evaluator in outer-product
2015-01-13 10:25:50 +01:00
Gael Guennebaud
ae4644cc68
bug #907 , ARM64: workaround ICE in xcode/clang
2015-01-13 10:03:00 +01:00
Gael Guennebaud
36f7c1337f
bug #907 , ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang
2015-01-13 09:57:37 +01:00
Gael Guennebaud
63974bcb88
bug #907 : workaround some missing intrinsics in current NDK's gcc version (ARM64)
2015-01-07 09:44:25 +01:00
Gael Guennebaud
79f4a59ed9
bug #907 : fix compilation with ARM64
2015-01-07 09:41:56 +01:00
Benoit Steiner
9f98650d0a
Ensured that contractions that can be reduced to a matrix vector product work correctly even when the input coefficients aren't aligned.
2015-01-06 09:29:13 -08:00
Gael Guennebaud
db5b0741b5
Fix bug #925 : typo in MATLAB versions of middleRows
2015-01-04 21:39:50 +01:00
Gael Guennebaud
f5f6e2c6f4
bug #921 : fix utilization of bitwise operation on enums in first_aligned
2014-12-19 14:41:59 +01:00
Gael Guennebaud
25c7d9164f
bug #920 : fix MSVC 2015 compilation issues
2014-12-18 22:58:15 +01:00
Gael Guennebaud
b8d9eaa19b
Use true compile time "if" for Transform::makeAffine
2014-12-13 22:16:39 +01:00
Gael Guennebaud
f806c23012
Fix false negatives in geo_transformations unit tests
2014-12-16 16:50:30 +01:00