Gael Guennebaud
8313fb7df7
Add row/column-wise reverseInPlace feature.
2015-03-31 21:35:53 +02:00
Gael Guennebaud
dfb674a25e
Make reverseInPlace really work in-place.
2015-03-31 20:17:10 +02:00
Gael Guennebaud
20d030f207
Fix vectorization of swap for non trivial expressions
2015-03-31 20:16:02 +02:00
Benoit Jacob
73cdeae1d3
Only use blocking sizes LUTs for single-thread products for now
2015-03-31 11:17:23 -04:00
Benoit Jacob
0cbd5ae3cb
Correctly detect Android with ndk_build
2015-03-31 11:17:21 -04:00
Gael Guennebaud
ae01c05e18
Fix computeProductBlockingSizes with m==0, and add respective unit test.
2015-03-31 15:19:57 +02:00
Gael Guennebaud
bd76d837e6
Fix sign of SuperLU::determinant
2015-03-31 14:57:32 +02:00
Gael Guennebaud
35d3053d55
Fix regression introduced in 3b169d792d
2015-03-31 09:23:53 +02:00
Christoph Hertzberg
3b169d792d
Suppress unused variable warning
2015-03-31 00:49:08 +02:00
Christoph Hertzberg
1efae98fee
bug #985 : RealQZ failed when either matrix had zero rows or columns (report and patch by Ben Goodrich)
...
Also added a regression test
2015-03-30 23:56:20 +02:00
Benoit Steiner
35722fa022
Made the index type a template parameter of the tensor class instead of encoding it in the options.
2015-03-30 14:55:54 -07:00
Christoph Hertzberg
58af8bf90c
bug #982 : Make sure numext::maxi and numext::mini are called correctly, in case Scalar expressions return expression templates.
2015-03-30 16:47:22 +02:00
Gael Guennebaud
2adbf6b8ca
fix stupid warning with old GCC
2015-03-28 22:34:54 +01:00
Gael Guennebaud
41e20248f8
merge
2015-03-28 14:43:35 +01:00
Christoph Hertzberg
09a5361d1b
bug #983 : Pass Vector3 by const reference and not by value
2015-03-28 12:36:24 +01:00
Gael Guennebaud
eb7e4c2b9c
Pass Vector3 type by reference
2015-03-27 12:11:24 +01:00
Gael Guennebaud
79cb875249
merge
2015-03-27 10:56:04 +01:00
Gael Guennebaud
1b8cc9af43
Slight numerical stability improvement in 2x2 svd
2015-03-27 10:55:00 +01:00
Gael Guennebaud
3d59ae0203
Fix hypot(0,0).
2015-03-27 09:59:24 +01:00
Benoit Steiner
d3f7915aeb
Pulled latest update from the eigen main codebase
2015-03-24 13:12:14 -07:00
Benoit Steiner
abdbe8562e
Fixed the CUDA packet primitives
2015-03-24 10:45:46 -07:00
Gael Guennebaud
29eaa2b0f1
Make MatrixBase::is* methods aware of nested_eval.
2015-03-24 13:42:42 +01:00
Gael Guennebaud
d27968eb7e
D&C SVD: directly falls back to JacobiSVD for very small problems (by-pass upper-bidiagonalization)
2015-03-24 13:38:07 +01:00
Gael Guennebaud
4472f3e578
Avoid SVD: consider denormalized small numbers as zero when computing the rank of the matrix
2015-03-23 09:40:21 +01:00
Deanna Hood
83e5b7656b
Use M_PI instead of acos(-1) for pi
2015-03-22 06:04:31 +10:00
Deanna Hood
4bab4790c0
Add \sa tags of isFinite/isInf for each other
2015-03-22 05:39:08 +10:00
Gael Guennebaud
4e2b18d909
Update approx. minimum ordering method to push and keep structural empty diagonal elements to the bottom-right part of the matrix
2015-03-20 16:33:48 +01:00
Gael Guennebaud
d6b2f300db
Fix MSVC compilation: aligned type must be passed by reference
2015-03-19 17:28:32 +01:00
Gael Guennebaud
61c45d7cfd
Fix comparison warning
2015-03-19 17:13:22 +01:00
Gael Guennebaud
f329d0908a
Improve random number generation for integer and add unit test
2015-03-19 15:10:36 +01:00
Benoit Jacob
dc04f12967
use unsigned short instead of uint16_t which doesn't exist in c++98
2015-03-17 10:31:45 -04:00
Deanna Hood
596be3cd86
Use std::arg for real numbers when c++11 is used, custom implementation otherwise
2015-03-17 15:28:12 +10:00
Deanna Hood
e26134ed62
Use std::round when c++11 is used, custom implementation otherwise
2015-03-17 14:55:14 +10:00
Deanna Hood
e21e29a088
Update cost of arg call to depend on if the scalar is complex or not
2015-03-17 14:04:33 +10:00
Deanna Hood
447a5a6b01
Fix VML declarations to only be for real/complex as appropriate
2015-03-17 13:33:31 +10:00
Deanna Hood
f52b78491c
Remove packet isNaN, isInf, isFinite
2015-03-17 09:26:24 +10:00
Deanna Hood
1c78d6f2a6
Add boolean not operator (!) array support
2015-03-17 08:29:57 +10:00
Benoit Jacob
364cfd529d
Similar to cset 3589a9c115
...
, also in 2px4 kernel: actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment
2015-03-16 16:28:44 -04:00
Deanna Hood
e1d6e6c972
Make cube, inverse and abs2 free-functions
2015-03-17 06:25:24 +10:00
Benoit Jacob
577056aa94
Include stdint.h. Not going for cstdint because it is a C++11 addition. Needed for uint16_t at least, in lookup-table code.
2015-03-16 16:21:50 -04:00
Benoit Jacob
eb6929cb19
fix bug in maxsize calculation, which would cause products of size > 2048 to address the lookup table out of bounds
2015-03-16 16:15:47 -04:00
Deanna Hood
fef4e071d7
Rename isinf to isInf
2015-03-17 05:58:47 +10:00
Deanna Hood
46cf9cda32
Add isfinite array support as isFinite
2015-03-17 04:33:12 +10:00
Deanna Hood
1d76ceab55
Remove floor, ceil, round for complex numbers
2015-03-17 02:36:07 +10:00
Deanna Hood
717b7954ce
Update cost of coeff-wise arg call
2015-03-17 02:11:57 +10:00
Deanna Hood
fb68b149cb
Rename isnan to isNaN
2015-03-17 02:04:42 +10:00
Benoit Jacob
35c3a8bb84
Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance.
2015-03-16 11:05:51 -04:00
Benoit Jacob
e274607d7f
fix compilation with GCC 4.8
2015-03-16 10:48:27 -04:00
Benoit Jacob
151b8b95c6
Fix bug in case where EIGEN_TEST_SPECIFIC_BLOCKING_SIZE is defined but false
2015-03-15 19:10:51 -04:00
Benoit Jacob
02babb9c0f
Provide a empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now.
2015-03-15 18:13:12 -04:00
Benoit Jacob
3589a9c115
actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment
2015-03-15 18:12:18 -04:00
Benoit Jacob
1dd3d89818
Fix a unused-var warning
2015-03-15 18:07:19 -04:00
Benoit Jacob
e56aabf205
Refactor computeProductBlockingSizes to make room for the possibility of using lookup tables
2015-03-15 18:05:12 -04:00
Benoit Jacob
488c15615a
organize a little our default cache sizes, and use a saner default L1 outside of x86 (10% faster on Nexus 5)
2015-03-13 14:51:26 -07:00
Gael Guennebaud
1330f8bbd1
bug #973 , improve AVX support by enabling vectorization of Vector4i-like types, and enforcing alignement of Vector4f/Vector2d-like types to preserve compatibility with SSE and future Eigen versions that will vectorize them with AVX enabled.
2015-03-13 21:15:50 +01:00
Gael Guennebaud
d99ab35f9e
Fix internal::random(x,y) for integer types. The previous implementation could return y+1. The new implementation uses rejection sampling to get an unbiased behabior.
2015-03-13 21:12:46 +01:00
Gael Guennebaud
8580eb6808
bug #949 : add static assertion for incompatible scalar types in dense end-user decompositions.
2015-03-13 21:06:20 +01:00
Gael Guennebaud
a9df28c95b
SparseMatrix::insert: switch to a fully uncompressed mode if sequential insertion is not possible (otherwise an arbitrary large amount of memory was preallocated in some cases)
2015-03-13 21:00:21 +01:00
Gael Guennebaud
5ffe29cb9f
Bound pre-allocation to the maximal size representable by StorageIndex and throw bad_alloc if that's not possible.
2015-03-13 20:57:33 +01:00
Gael Guennebaud
2f6f8bf31c
Add missing coeff/coeffRef members to Block<sparse>, and extend unit tests.
2015-03-13 16:24:40 +01:00
Doug Kwan
657407227e
Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
...
doubles instead of swapping the doubles.
2015-03-11 15:13:37 -07:00
Deanna Hood
f89fcefa79
Add hyperbolic trigonometric functions from std array support
2015-03-11 13:13:30 +10:00
Deanna Hood
a5e49976f5
Add log10 array support
2015-03-11 08:56:42 +10:00
Deanna Hood
19a71056ae
Allow calling of square(array) in addition to array.square()
2015-03-11 06:59:28 +10:00
Deanna Hood
31fdd67756
Additional unary coeff-wise functors (isnan, round, arg, e.g.)
2015-03-11 06:39:23 +10:00
Gael Guennebaud
fd78874888
Fix compilation of iterative solvers with dense matrices
2015-03-09 21:31:03 +01:00
Gael Guennebaud
d4317a85e8
Add typedefs for return types of SparseMatrixBase::selfadjointView
2015-03-09 21:29:46 +01:00
Gael Guennebaud
9e885fb766
Add unit tests for CG and sparse-LLT for long int as storage-index
2015-03-09 14:33:15 +01:00
Gael Guennebaud
224a1fe4c6
bug #963 : make IncompleteLUT compatible with non-default storage index types.
2015-03-09 13:55:20 +01:00
Gael Guennebaud
0ee391863e
Avoid undeflow when blocking size are tuned manually.
2015-03-06 21:51:09 +01:00
Gael Guennebaud
14a5f135a3
bug #969 : workaround abiguous calls to Ref using enable_if.
2015-03-06 17:51:31 +01:00
Gael Guennebaud
87681e508f
bug #978 : early return for vanishing products
2015-03-06 16:11:22 +01:00
Gael Guennebaud
cd3bbffa73
Improve blocking heuristic: if the lhs fit within L1, then block on the rhs in L1 (allows to keep packed rhs in L1)
2015-03-06 14:31:39 +01:00
Gael Guennebaud
58740ce4c6
Improve product kernel: replace the previous dynamic loop swaping strategy by a more general one:
...
It consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that L1 cache is fully exploited.
2015-03-06 10:30:35 +01:00
Gael Guennebaud
4c8b95d5c5
Rename LSCG to LeastSquaresConjugateGradient
2015-03-05 10:16:32 +01:00
Gael Guennebaud
7550107028
Product optimization: implement a dynamic loop-swapping startegy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"
2015-03-05 10:03:46 +01:00
Gael Guennebaud
2dc968e453
bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.
2015-03-04 17:03:13 +01:00
Benoit Steiner
0196141938
Fixed the optimized AVX implementation of the fast rsqrt function
2015-03-02 13:49:39 -08:00
Benoit Steiner
4fd7f47692
Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.
2015-03-02 09:38:47 -08:00
Benoit Steiner
fb53384b0f
Improved the default implementation of prsqrt
2015-02-28 01:51:26 -08:00
Benoit Steiner
306fceccbe
Pulled latest updates from trunk
2015-02-27 13:05:26 -08:00
Benoit Steiner
2386fc8528
Added support for 32bit index on a per tensor/tensor expression. This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.
2015-02-27 12:57:13 -08:00
Benoit Jacob
6466fa63be
Reimplement the selection between rotating and non-rotating kernels
...
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Steiner
05089aba75
Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts
2015-02-27 09:27:30 -08:00
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Jacob
2fc3b484d7
remove trailing comma
2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4
Disable Packet2f/2i halfpacket support in NEON.
...
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
b7fc8746e0
Replace a static assert by a runtime one, fixes the build of unit tests on ARM
...
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00
Gael Guennebaud
bcf9bb5c1f
Avoid packing rhs multiple-times when blocking on the lhs only.
2015-02-26 17:01:33 +01:00
Gael Guennebaud
4ec3f04b3a
Make sure that the block size computation is tested by our unit test.
2015-02-26 17:00:36 +01:00
Gael Guennebaud
a8ad8887bf
Implement a more generic blocking-size selection algorithm. See explanations inlines.
...
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
2015-02-26 16:04:35 +01:00
Gael Guennebaud
400becc591
Fix typos in block-size testing code, and set peeling on k to 8.
2015-02-26 15:57:06 +01:00
Benoit Jacob
692136350b
So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
...
On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.
I could not see any significant impact of this offset.
On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.
So let's just go with 0!
Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
2015-02-25 12:37:14 -05:00
Christoph Hertzberg
531fa9de77
bug #970 : Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references.
2015-02-24 21:03:28 +01:00
Benoit Jacob
26275b250a
Fix my recent prefetch changes:
...
- the first prefetch is actually harmful on Haswell with FMA,
but it is the most beneficial on ARM.
- the second prefetch... I was very stupid and multiplied by sizeof(scalar)
and offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8.
So this effectively restores the older offset. Actually, there were
two prefetches here, one with offset 48 and one with offset 64. I could not
confirm any benefit from this strange 48 offset on either the haswell or
my ARM device.
2015-02-23 16:55:17 -05:00
Christoph Hertzberg
052b6b40f1
Fix two trivial warnings
2015-02-22 12:40:51 +01:00
Christoph Hertzberg
ecbf2a6656
log1p is defined only for real Scalars in C++11
2015-02-21 19:58:24 +01:00
Gael Guennebaud
3cf642baa3
Fix compilation of unit tests disabling assertion cheking
2015-02-21 14:13:48 +01:00
Gael Guennebaud
2da1594750
Fix doc of Ref<>
2015-02-20 11:52:22 +01:00
Gael Guennebaud
b192e29eae
In C++11 destructors do not throw by default (fix CommaInitializer unit test)
2015-02-20 09:28:34 +01:00
Benoit Steiner
ab41652d81
Pulled latest changes from trunk
2015-02-19 21:23:37 -08:00
Benoit Steiner
7765039f1c
Marked the CUDA packet primitives as EIGEN_DEVICE_FUNC since they'll end up being executed on the GPU device.
2015-02-19 21:22:51 -08:00
Gael Guennebaud
a66f5fc2fd
Fix regression with C++11 support of lambda: now internal::result_of falls back to std::result_of in C++11.
2015-02-19 23:32:12 +01:00
Gael Guennebaud
1b7e12847d
Fix some calls to result_of on binary functors as unary ones.
2015-02-19 23:30:41 +01:00
Gael Guennebaud
0f4dd15dfc
Declare const some const variables
2015-02-19 23:28:57 +01:00
Gael Guennebaud
829dddd0fd
Add support for C++11 result_of/lambdas
2015-02-19 15:18:37 +01:00
Benoit Jacob
db05f2d01e
rotating kernel: avoid compiling anything outside of ARM
2015-02-18 15:43:52 -05:00
Benoit Jacob
0ed00d5438
remove a newly introduced redundant typedef - sorry.
2015-02-18 15:05:01 -05:00
Benoit Jacob
9bd8a4bab5
bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path
...
This is substantially faster on ARM, where it's important to minimize the number of loads.
This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower).
2015-02-18 15:03:35 -05:00
Hauke Heibel
ee27d50633
Fixed template parameter.
2015-02-18 18:51:08 +01:00
Gael Guennebaud
73a24de424
merge
2015-02-18 15:51:00 +01:00
Gael Guennebaud
63eb0f6fe6
Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro)
2015-02-18 15:49:05 +01:00
Benoit Jacob
4a3e6c8be1
bug #958 - Allow testing specific blocking sizes
...
This is only a debugging/testing patch. It allows testing specific
product blocking sizes, typically to study the impact on performance.
Example usage:
int testk, testm, testn;
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
#define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
#include <Eigen/Core>
2015-02-18 09:43:55 -05:00
Gael Guennebaud
c7bb1e8ea8
Fix a regression when using OpenMP, and fix bug #714 : the number of threads might be lower than the number of requested ones
2015-02-18 15:19:23 +01:00
Jan Blechta
168ceb271e
Really use zero guess in ConjugateGradients::solve as documented
...
and expected for consistency with other methods.
2015-02-18 14:26:10 +01:00
Gael Guennebaud
8fdcaded5e
merge
2015-03-04 10:18:08 +01:00
Gael Guennebaud
c43154bbc5
Check for no-reallocation in SparseMatrix::insert (bug #974 )
2015-03-04 10:16:46 +01:00
Gael Guennebaud
1ce0178363
Improve efficiency of SparseMatrix::insert/coeffRef for sequential outer-index insertion strategies (bug #974 )
2015-03-04 09:39:26 +01:00
Gael Guennebaud
3dca4a1efc
Update manual wrt new LSCG solver.
2015-03-04 09:35:30 +01:00
Gael Guennebaud
05274219a7
Add a CG-based solver for rectangular least-square problems (bug #975 ).
2015-03-04 09:34:27 +01:00
Benoit Jacob
2aa09e6b4e
Fix asm comments in 1px1 kernel
2015-03-03 13:44:00 -05:00
Benoit Jacob
eae8e27b7d
Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp
2015-03-03 11:41:21 -05:00
Marc Glisse
37a93c4263
New scoring functor to select the pivot.
...
This is can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.
2015-03-03 17:08:28 +01:00
Benoit Jacob
ccc1277a42
must also disable complex<double> when disabling double vectorization
2015-03-03 10:17:05 -05:00
Benoit Jacob
f839099512
Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.
2015-03-03 09:35:22 -05:00
Benoit Jacob
1ec0f4fadf
HalfPacket also needed to be disabled for double, on ARMv8.
2015-03-02 16:08:54 -05:00
Gael Guennebaud
3109f0e74e
Add SSE vectorization of Quaternion::conjugate. Significant speed-up when combined with products like q1*q2.conjugate()
2015-03-02 20:09:33 +01:00
Gael Guennebaud
9aee1e300a
Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loop within product kernel.
2015-02-27 22:55:12 +01:00
Gael Guennebaud
b10cd3afd2
Re-enbale detection of min/max parentheses protection, and re-enable mpreal_support unit test.
2015-02-27 22:38:00 +01:00
Gael Guennebaud
548b781380
Fix bug #945 : workaround MSVC warning
2015-02-18 12:53:49 +01:00
Gael Guennebaud
6f4adc9e94
Add missing install directives for arch/CUDA
2015-02-18 11:40:06 +01:00
Gael Guennebaud
63464754ef
Add an internal assertion in makeCompressed to catch a possible risk of null-pointer access.
2015-02-18 11:29:54 +01:00
Gael Guennebaud
eb563049f7
Remove some dead stores.
2015-02-18 11:26:48 +01:00
Gael Guennebaud
dc7e6acc05
Fix possible usage of a null pointer in CholmodSupport
2015-02-18 11:26:25 +01:00
Gael Guennebaud
d4eda01488
Big 957, workaround MSVC/ICC compilation issue
2015-02-18 11:24:32 +01:00
Gael Guennebaud
20cac72b82
Packet must be passed by const reference and not by value to avoid alignment issue.
2015-02-17 22:58:32 +01:00
Christoph Hertzberg
97a36ecba4
Suppress some remaining Index conversion warnings
2015-02-17 18:52:39 +01:00
Gael Guennebaud
159fb181c2
Disable __m128* wrappers when compiling with AVX and -fabi-version=4
2015-02-17 16:27:20 +01:00
Gael Guennebaud
91ab2489dd
Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI)
2015-02-17 16:08:07 +01:00
Gael Guennebaud
9daf8eba6f
Fix compilation of Cholmod*(matrix) ctor
2015-02-17 15:24:52 +01:00
Gael Guennebaud
3373c903b3
Fix compilation of int*complex with gcc
2015-02-16 19:18:12 +01:00
Gael Guennebaud
f0b1b1df9b
Fix SparseLU::signDeterminant() method, and add a SparseLU::determinant() method.
2015-02-16 19:09:22 +01:00
Gael Guennebaud
8768ff3c31
Add PermutationMatrix::determinant method.
2015-02-16 19:08:25 +01:00
Martin Drozdik
64b29e06b9
bug #956 : Fixed bug in move constructors of DenseStorage which caused "moved-from" objects to be in an invalid state.
2015-02-16 18:18:46 +09:00
Gael Guennebaud
1c0e8bcf09
Fix unused variable warning.
2015-02-16 17:21:30 +01:00
Gael Guennebaud
0f464d9d87
bug #897 : fix regression in BiCGSTAB(mat) ctor (an all other iterative solvers).
...
Add respective regression unit test.
2015-02-16 17:05:10 +01:00
Gael Guennebaud
470d26d580
Remove some useless typedefs
2015-02-16 16:48:21 +01:00
Gael Guennebaud
953d5ccfd5
Doc: explain how to free allocated memory in SparseMAtrix
2015-02-16 15:56:11 +01:00
Gael Guennebaud
98604576d1
Merged in chtz/eigen-indexconversion (pull request PR-92)
...
bug #877 , bug #572 : Get rid of Index conversion warnings, summary of changes:
- Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).
- Eigen::Index is used throughout the API to represent indices, offsets, and sizes.
- Classes storing an array of indices uses the type StorageIndex to store them. This is a template parameter of the class. Default is int.
- Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
2015-02-16 15:29:00 +01:00
Gael Guennebaud
45cbb0bbb1
The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index
2015-02-16 15:05:41 +01:00
Gael Guennebaud
cc641aabb7
Remove deprecated usage of expr::Index.
2015-02-16 14:46:51 +01:00
Gael Guennebaud
aa6c516ec1
Fix many long to int conversion warnings:
...
- fix usage of Index (API) versus StorageIndex (when multiple indexes are stored)
- use StorageIndex(val) when the input has already been check
- use internal::convert_index<StorageIndex>(val) when val is potentially unsafe (directly comes from user input)
2015-02-16 13:19:05 +01:00
Christoph Hertzberg
bd511dde9d
bug #952 : Missing \endcode made doxygen fail to build ColPivHouseholderQR
2015-02-15 06:08:25 +01:00
Benoit Steiner
e2cfddf75f
Pulled latest updates from trunk
2015-02-13 16:21:59 -08:00
Benoit Steiner
0927801a84
Optimized version of the sin(), exp(), log() and sqrt() function for AVX
2015-02-13 16:07:08 -08:00
Benoit Jacob
e972b55ec4
bug #953 - Fix prefetches in 3px4 product kernel
...
This gives a 10% speedup on nexus 4 and on nexus 5.
2015-02-13 14:52:36 -05:00
Gael Guennebaud
fc202bab39
Index refactoring: StorageIndex must be used for storage only (and locally when it make sense). In all other cases use the global Index type.
2015-02-13 18:57:41 +01:00
Gael Guennebaud
fe51319980
Merge Index-refactoring branch with default, fix PastixSupport, remove some useless typedefs
2015-02-13 10:03:53 +01:00
Gael Guennebaud
0918c51e60
merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper
2015-02-12 21:48:41 +01:00
Gael Guennebaud
409547a0c8
update EIGEN_FAST_MATH documentation
2015-02-12 21:04:31 +01:00
Benoit Steiner
f669f5656a
Marked a few functions as EIGEN_DEVICE_FUNC to enable the use of tensors in cuda kernels.
2015-02-10 14:29:47 -08:00
Gael Guennebaud
029d236ceb
merge
2015-02-10 23:12:47 +01:00
Gael Guennebaud
fe25f3b8e3
FMA has been wrongly disabled
2015-02-10 23:11:35 +01:00
Benoit Steiner
cc5d7ff523
Added vectorized implementation of the exponential function for ARM/NEON
2015-02-10 14:02:38 -08:00
Jan Blechta
c3f3580b8f
Fix bug #733 : step by step solving is not a good example for solveWithGuess
2015-02-10 14:24:39 +01:00
Gael Guennebaud
c6e8caf090
Allows Lower|Upper as a template argument of CG and MINRES: in this case the full matrix will be considered.
2015-02-10 18:57:41 +01:00
Gael Guennebaud
87629cd639
bug #897 : makes iterative sparse solvers use a Ref<SparseMatrix> instead of a SparseMatrix pointer. This fixes usage of iterative solvers with a Map<SparseMatrix>.
2015-02-09 11:41:25 +01:00
Gael Guennebaud
d4ec48575e
Make Block<SparseMatrix> inherit SparseCompressedBase in the case of an inner-panels and fix valuePtr() innerIndexPtr()
2015-02-09 11:14:36 +01:00
Gael Guennebaud
3af29caae8
Cleaning and add more unit tests for Ref<SparseMatrix> and Map<SparseMatrix>
2015-02-09 10:23:45 +01:00
Gael Guennebaud
f2ff8c091e
Add a Ref<SparseMatrix> specialization.
2015-02-07 22:04:18 +01:00
Gael Guennebaud
f3be317614
Add a Map<SparseMatrix> specialization.
2015-02-07 22:03:25 +01:00
Gael Guennebaud
08081f8293
Make SparseTranspose inherit SparseCompressBase when possible
2015-02-07 22:02:14 +01:00
Gael Guennebaud
7838fda82c
Add a SparseCompressedBase class providing (un)compressed accessors (like data()/*Stride() for dense matrices),
...
and a CompressedAccessBit flag (similar to DirectAccessBit for dense matrices).
2015-02-07 22:00:46 +01:00
Benoit Steiner
01f7918788
Pulled latest fixes
2015-02-06 05:30:20 -08:00
Gael Guennebaud
b50ffaddf2
merge
2015-02-06 14:27:12 +01:00
Gael Guennebaud
74e460b995
Fix symmetric product
2015-02-06 14:26:24 +01:00
Benoit Steiner
c739102ef9
Pulled the latest changes from the trunk
2015-02-06 05:25:03 -08:00
Benoit Steiner
dcb2a8b184
Added the EIGEN_HAS_CONSTEXPR define
...
Gate the tensor index list code based on the value of EIGEN_HAS_CONSTEXPR
2015-02-06 02:51:59 -08:00
Gael Guennebaud
b1eca55328
Use Ref<> to ensure that both x and b in Ax=b are compatible with Umfpack/SuperLU expectations
2015-02-03 23:46:05 +01:00
Gael Guennebaud
ebdf6a2dbb
SPQR: fix default threshold value
2015-02-03 22:32:34 +01:00
Benoit Jacob
5ef95fabee
bug #936 , patch 3/3: Properly detect FMA support on ARM (requires VFPv4)
...
and use it instead of MLA when available, because it's both more accurate,
and faster.
2015-01-30 17:45:03 -05:00
Benoit Jacob
0f21613698
bug #936 , patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD
2015-01-30 17:44:26 -05:00
Benoit Jacob
340b8afb14
bug #936 , patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,
...
because this is what they are about. "Fused" means "no intermediate rounding
between the mul and the add, only one rounding at the end". Instead,
what we are concerned about here is whether a temporary register is needed,
i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA.
But a true fused mul-add is only available on VFPv4: VFMA.
2015-01-31 14:15:57 -05:00
Benoit Jacob
9f99f61e69
bug #936 , patch 1/3: some cleanup and renaming for consistency.
2015-01-30 17:43:56 -05:00
Benoit Jacob
759bd92a85
bug #935 : Add asm comments in GEBP kernels to work around a bug
...
in both GCC and Clang on ARM/NEON, whereby they spill registers,
severely harming performance. The reason why the asm comments
make a difference is that they prevent the compiler from
reordering code across these boundaries, which has the effect
of extending the lifetime of local variables and increasing
register pressure on this register-tight code.
2015-01-30 17:27:56 -05:00
Gael Guennebaud
f1092d2f73
bug #941 : fix accuracy issue in ColPivHouseholderQR, do not stop decomposition on a small pivot
2015-01-30 19:04:04 +01:00
Gael Guennebaud
9d82f7e30d
Supernodes was disabled.
2015-01-30 17:24:40 +01:00
Gael Guennebaud
a727a2c4ed
bug #933 : RealSchur, do not consider the input matrix norm to check negligible sub-diag entries. This also makes this test consistent with the complex and self-adjoint cases.
2015-01-28 16:07:51 +01:00
Gael Guennebaud
c6eb84aabc
Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices
2015-01-26 17:09:01 +01:00
Gael Guennebaud
e1f1091fde
Add support for dense ?= diagonal
2015-01-24 10:32:49 +01:00
Gael Guennebaud
b9d314ae19
bug #329 : fix typo
2015-01-17 21:55:33 +01:00
Gael Guennebaud
279786e987
Fix missing evaluator in outer-product
2015-01-13 10:25:50 +01:00
Gael Guennebaud
ae4644cc68
bug #907 , ARM64: workaround ICE in xcode/clang
2015-01-13 10:03:00 +01:00
Gael Guennebaud
36f7c1337f
bug #907 , ARM64: workaround vreinterpretq_u64_* not defined in xcode/clang
2015-01-13 09:57:37 +01:00
Gael Guennebaud
63974bcb88
Big 907: workaround some missing intrinsics in current NDK's gcc version (ARM64)
2015-01-07 09:44:25 +01:00
Gael Guennebaud
79f4a59ed9
bug #907 : fix compilation with ARM64
2015-01-07 09:41:56 +01:00
Benoit Steiner
9f98650d0a
Ensured that contractions that can be reduced to a matrix vector product work correctly even when the input coefficients aren't aligned.
2015-01-06 09:29:13 -08:00
Gael Guennebaud
f5f6e2c6f4
bug #921 : fix utilization of bitwise operation on enums in first_aligned
2014-12-19 14:41:59 +01:00
Gael Guennebaud
25c7d9164f
bug #920 : fix MSVC 2015 compilation issues
2014-12-18 22:58:15 +01:00
Gael Guennebaud
b8d9eaa19b
Use true compile time "if" for Transform::makeAffine
2014-12-13 22:16:39 +01:00
Gael Guennebaud
7dad5f797e
bug #821 : workaround MSVC 2013 issue with using Base::Base::operator=
2014-12-16 13:33:43 +01:00
Christoph Hertzberg
e8cdbedefb
bug #877 , bug #572 : Introduce a global Index typedef. Rename Sparse*::Index to StorageIndex, make Dense*::StorageIndex an alias to DenseIndex. Overall this commit gets rid of all Index conversion warnings.
2014-12-04 22:48:53 +01:00
Gael Guennebaud
433bce5c3a
UmfPack support: fix redundant evaluation/copies when calling compute() and support generic expressions as input
2014-12-02 17:30:57 +01:00
Gael Guennebaud
775f7e5fbb
bug #697 : make sure empty classes are at the end in case of multiple inheritence
2014-12-02 14:40:19 +01:00
Gael Guennebaud
a819fa148d
Fix MSVC compilation issue
2014-12-02 14:35:31 +01:00
Gael Guennebaud
1a8dc85142
bug #897 : fix UmfPack usage with mapped sparse matrices
2014-12-02 13:57:13 +01:00
Gael Guennebaud
4974d1d2b4
Fix bug #911 : m_extractedDataAreDirty was not initialized in UmfPackLU
2014-12-02 13:54:06 +01:00
Gael Guennebaud
e2f3e4e4aa
Document non-const SparseMatrix::diagonal() method.
2014-12-01 14:45:15 +01:00
Gael Guennebaud
b26e697182
Make SparseMatrix::coeff() returns a const reference and add a non const version of SparseMatrix::diagonal()
2014-12-01 14:41:39 +01:00
Gael Guennebaud
b1f9f603a0
Simplify return type of diagonal(Index) (and ease compiler job)
2014-11-28 14:39:47 +01:00
Gael Guennebaud
5384e89147
Disable MatrixBase::bdcSvd with CUDA (just like MatrixBase::jacobiSvd
2014-11-26 22:29:29 +01:00
Gael Guennebaud
8518ba0bbc
Fix Hyperplane::Through(a,b,c) when points are aligned or identical. We use the stratgey as in Quaternion::setFromTwoVectors.
2014-11-26 15:01:53 +01:00
Gael Guennebaud
0efaff9b3b
Fix out-of-bounds write
2014-12-11 16:15:20 +01:00
Gael Guennebaud
41a20994cc
In simplicial cholesky: avoid deep copy of the input matrix is this later can be used readily
2014-12-08 17:56:33 +01:00
Gael Guennebaud
a910a7466e
Fix inner iterator type
2014-12-08 17:55:31 +01:00
Gael Guennebaud
4371911861
Remove useless and non standard numext::atanh2 function.
2014-12-08 16:44:34 +01:00
Gael Guennebaud
bea36925db
bug #876 : implement a portable log1p function
2014-12-08 16:26:53 +01:00
Gael Guennebaud
7f7a712062
Optimize Simplicial Cholesky when NaturalOrdering is used.
2014-12-08 15:02:25 +01:00
Gael Guennebaud
30c849669d
Fix dynamic allocation in JacobiSVD (regression)
2014-12-08 14:45:04 +01:00
Gael Guennebaud
80ed5bd90c
Workaround various "returning reference to temporary" warnings.
2014-12-05 12:49:30 +01:00
Gael Guennebaud
da584912b6
Fix memory pre-allocation when permuting inner vectors of a sparse matrix.
2014-11-24 17:31:59 +01:00
Benoit Steiner
509e4ddc02
Added reduction packet primitives for CUDA
2014-11-19 10:34:11 -08:00
Gael Guennebaud
722916e19d
bug #903 : clean swap API regarding extra enable_if parameters, and add failtests for swap
2014-11-06 09:25:26 +01:00
Gael Guennebaud
c6fefe5d8e
Big 853: replace enable_if in Ref<> ctor by static assertions and add failtests for Ref<>
2014-11-05 16:15:17 +01:00
Gael Guennebaud
ee06f78679
Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively.
2014-11-04 21:58:52 +01:00
Benoit Steiner
2dde63499c
Generalized the matrix vector product code.
2014-10-31 16:33:51 -07:00
Christoph Hertzberg
0833b82efd
Run sparse_basic unit tests also for rectangular matrices.
...
TriangularView with UnitDiag does not work properly yet (bug #901 )
2014-10-31 17:12:13 +01:00
Benoit Steiner
bc99c5f7db
fixed some potential alignment issues.
2014-10-30 18:09:53 -07:00
Benoit Steiner
1946cc4478
Added missing packet primitives for CUDA.
2014-10-30 17:52:32 -07:00
Christoph Hertzberg
4ec2f07a5b
Fixed bug in SparseBlock which caused a segfault in sparse_extra_3 test
2014-10-30 21:34:10 +01:00
Christoph Hertzberg
883168ed94
Make select CUDA compatible (comparison operators aren't yet, so no test case yet)
2014-10-30 20:16:16 +01:00
Christoph Hertzberg
e5f134006b
EIGEN_UNUSED_VARIABLE works better than casting to void. Make this also usable from CUDA code
2014-10-30 19:59:09 +01:00
Gael Guennebaud
21c0a2ce0c
Move D&C SVD to official SVD module.
2014-10-29 11:29:33 +01:00
Christoph Hertzberg
e2e7ba9f85
bug #898 : add inline hint to const_cast_ptr
2014-10-28 14:49:44 +01:00
Christoph Hertzberg
bd2d330b25
Temporary workaround for bug #875 :
...
Let TriangularView<Sparse>::nonZeros() return nonZeros() of the nested expression
2014-10-28 13:31:00 +01:00
Konstantinos Margaritis
79225db0b6
Merged in kmargar/eigen (pull request PR-87)
...
Extend NEON to add ARMv8 64-bit double support
2014-10-28 13:08:53 +02:00
Konstantinos Margaritis
94ed7c81e6
Bug #896 : Swap order of checking __VSX__/__ALTIVEC__
2014-10-22 06:15:18 -04:00
Konstantinos Margaritis
fcb3573d17
Merged eigen/eigen into default
2014-10-22 10:42:18 +03:00
Konstantinos Margaritis
fae4fd7a26
Added ARMv8 support
2014-10-22 07:39:49 +00:00
Christoph Hertzberg
cf09c5f687
Prevent CUDA calling a __host__ function from a __host__ __device__ function is not allowed
error.
2014-10-21 20:40:09 +02:00
Konstantinos Margaritis
b508619392
working 64-bit support in PacketMath.h, Complex.h needed
2014-10-21 18:10:33 +00:00
Konstantinos Margaritis
87524922dc
check for __ARM_NEON instead as it's defined in arm64 as well
2014-10-21 18:08:50 +00:00
Gael Guennebaud
fe57b2f963
bug #701 : workaround (min) and (max) blocking ADL by introducing numext::mini and numext::maxi internal functions and a EIGEN_NOT_A_MACRO macro.
2014-10-20 15:55:32 +02:00
Gael Guennebaud
973e6a035f
bug #718 : Introduce a compilation error when using the wrong InnerIterator type with a SparseVector
2014-10-20 14:07:08 +02:00
Christoph Hertzberg
84aaa03182
Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN.
...
psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0
2014-10-20 13:13:43 +02:00
Gael Guennebaud
aa5f79206f
Fix bug #859 : pexp(NaN) returned Inf instead of NaN
2014-10-20 11:38:51 +02:00
Gael Guennebaud
b4a9b3f496
Add unit tests for Rotation2D's inverse(), operator*, slerp, and fix regression wrt explicit ctor change
2014-10-20 11:04:32 +02:00
Gael Guennebaud
d04f23260d
Fix bug #894 : the sign of LDLT was not re-initialized at each call of compute()
2014-10-20 10:48:40 +02:00
Gael Guennebaud
8838b0a1ff
Fix SparseQR::rank for a completely empty matrix.
2014-10-19 22:42:20 +02:00