Rasmus Larsen
30242b7565
Merged eigen/eigen into default
2016-04-01 17:19:36 -07:00
Rasmus Munk Larsen
9d51f7c457
Add rcond method to LDLT.
2016-04-01 16:48:38 -07:00
Rasmus Munk Larsen
f54137606e
Add condition estimation to Cholesky (LLT) factorization.
2016-04-01 16:19:45 -07:00
Rasmus Munk Larsen
fb8dccc23e
Replace "inline static" with "static inline" for consistency.
2016-04-01 12:48:18 -07:00
Rasmus Munk Larsen
91414e0042
Fix comments in ConditionEstimator and minor cleanup.
2016-04-01 11:58:17 -07:00
Rasmus Munk Larsen
1aa89fb855
Add matrix condition estimator module that implements the Higham/Hager algorithm from http://www.maths.manchester.ac.uk/~higham/narep/narep135.pdf used in LAPACK. Add rcond() methods to FullPivLU and PartialPivLU.
2016-04-01 10:27:59 -07:00
Till Hoffmann
80eba21ad0
Merge upstream.
2016-04-01 18:18:49 +01:00
Till Hoffmann
3cb0a237c1
Fixed suggestions by Eugene Brevdo.
2016-04-01 17:51:39 +01:00
tillahoffmann
49960adbdd
Merged eigen/eigen into default
2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81
Added polygamma function.
2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf
Added zeta function.
2016-04-01 13:32:29 +01:00
Benoit Steiner
0ea7ab4f62
Hashing was only officially introduced in c++11. Therefore only define an implementation of the hash function for float16 if c++11 is enabled.
2016-03-31 14:44:55 -07:00
Benoit Steiner
92b7f7b650
Improved code formatting
2016-03-31 13:09:58 -07:00
Benoit Steiner
f197813f37
Added the ability to hash a fp16
2016-03-31 13:09:23 -07:00
Benoit Steiner
4c859181da
Made it possible to use the NumTraits for complex and Array in a cuda kernel.
2016-03-31 12:48:38 -07:00
Benoit Steiner
c36ab19902
Added __ldg primitive for fp16.
2016-03-31 10:55:03 -07:00
Benoit Steiner
b575fb1d02
Added NumTraits for half floats
2016-03-31 10:43:59 -07:00
Benoit Steiner
8c8a79cec1
Fixed a typo
2016-03-31 10:33:32 -07:00
Benoit Steiner
4f1a7e51c1
Pull math functions from the global namespace only when compiling cuda code with nvcc. When compiling with clang, we want to use the std namespace.
2016-03-30 17:59:49 -07:00
Benoit Steiner
bc68fc2fe7
Enable constant expressions when compiling cuda code with clang.
2016-03-30 17:58:32 -07:00
Benoit Jacob
01b5333e44
bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain
2016-03-30 11:02:33 -04:00
Benoit Steiner
1841d6d4c3
Added missing cuda template specializations for numext::ceil
2016-03-29 13:29:34 -07:00
Benoit Steiner
e02b784ec3
Added support for standard mathematical functions and transcendentals (such as exp, log, abs, ...) on fp16
2016-03-29 09:20:36 -07:00
Benoit Steiner
c38295f0a0
Added support for fmod
2016-03-28 15:53:02 -07:00
Konstantinos Margaritis
01e7298fe6
actually include ZVector files, passes most basic tests (float still fails)
2016-03-28 10:58:02 -04:00
Konstantinos Margaritis
f48011119e
Merged eigen/eigen into default
2016-03-28 01:48:45 +03:00
Konstantinos Margaritis
ed6b9d08f1
some primitives ported, but missing intrinsics and crash with asm() are a problem
2016-03-27 18:47:49 -04:00
Benoit Steiner
65716e99a5
Improved the cost estimate of the quotient op
2016-03-25 11:13:53 -07:00
Benoit Steiner
d94f6ba965
Started to model the cost of divisions more accurately.
2016-03-25 11:02:56 -07:00
Benoit Steiner
2e4e4cb74d
Use numext::abs instead of abs to avoid incorrect conversion to integer of the argument
2016-03-23 16:57:12 -07:00
Benoit Steiner
81d340984a
Removed executable bit from header files
2016-03-23 16:15:02 -07:00
Benoit Steiner
bff8cbad06
Removed executable bit from header files
2016-03-23 16:14:23 -07:00
Benoit Steiner
7a570e50ef
Fixed contractions of fp16
2016-03-23 16:00:06 -07:00
Benoit Steiner
fc3660285f
Made type conversion explicit
2016-03-23 09:56:50 -07:00
Benoit Steiner
0e68882604
Added the ability to divide a half float by an index
2016-03-23 09:46:42 -07:00
Benoit Steiner
6971146ca9
Added more conversion operators for half floats
2016-03-23 09:44:52 -07:00
Benoit Steiner
f9ad25e4d8
Fixed contractions of 16 bit floats
2016-03-22 09:30:23 -07:00
Benoit Steiner
134d750eab
Completed the implementation of vectorized type casting of half floats.
2016-03-18 13:36:28 -07:00
Benoit Steiner
7bd551b3a9
Make all the conversions explicit
2016-03-18 12:20:08 -07:00
Benoit Steiner
7b98de1f15
Implemented some of the missing type casting for half floats
2016-03-17 21:45:45 -07:00
Christoph Hertzberg
46aa9772fc
Merged in ebrevdo/eigen (pull request PR-169)
...
Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.
2016-03-16 21:59:08 +01:00
Eugene Brevdo
1f69a1b65f
Change the header guard around certain numext functions to be CUDA specific.
2016-03-16 12:44:35 -07:00
Benoit Steiner
5a51366ea5
Fixed a typo.
2016-03-14 09:25:16 -07:00
Benoit Steiner
fcf59e1c37
Properly gate the use of cuda intrinsics in the code
2016-03-14 09:13:44 -07:00
Benoit Steiner
97a1f1c273
Make sure we only use the half float intrinsic when compiling with a version of CUDA that is recent enough to provide them
2016-03-14 08:37:58 -07:00
Benoit Steiner
e29c9676b1
Don't mark the cast operator as explicit, since this is a c++11 feature that's not supported by older compilers.
2016-03-12 00:15:58 -08:00
Benoit Steiner
eecd914864
Also replaced uint32_t with unsigned int to make the code more portable
2016-03-11 19:34:21 -08:00
Benoit Steiner
1ca8c1ec97
Replaced a couple more uint16_t with unsigned short
2016-03-11 19:28:28 -08:00
Benoit Steiner
0423b66187
Use unsigned short instead of uint16_t since it's more portable
2016-03-11 17:53:41 -08:00
Benoit Steiner
048c4d6efd
Made half floats usable on hardware that doesn't support them natively.
2016-03-11 17:21:42 -08:00
Benoit Steiner
456e038a4e
Fixed the +=, -=, *= and /= operators to return a reference
2016-03-10 15:17:44 -08:00
Eugene Brevdo
836e92a051
Update MathFunctions/SpecialFunctions with intelligent header guards.
2016-03-09 09:04:45 -08:00
Eugene Brevdo
5e7de771e3
Properly fix merge issues.
2016-03-08 17:35:05 -08:00
Eugene Brevdo
73220d2bb0
Resolve bad merge.
2016-03-08 17:28:21 -08:00
Eugene Brevdo
14f0fde51f
Add certain functions to numext (log, exp, tan) because CUDA doesn't support std::
...
Use these in SpecialFunctions.
2016-03-08 17:17:44 -08:00
Eugene Brevdo
0bb5de05a1
Finishing touches on igamma/igammac for GPU. Tests now pass.
2016-03-07 15:35:09 -08:00
Eugene Brevdo
5707004d6b
Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
...
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.
GPU tests are still failing for igamma/igammac.
1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
05bbca079a
Turn on some of the cxx11 features when compiling with visual studio 2015
2016-03-05 10:52:08 -08:00
Eugene Brevdo
0b9e0abc96
Make igamma and igammac work correctly.
...
This required replacing ::abs with std::abs.
Modified some unit tests.
2016-03-04 21:12:10 -08:00
Eugene Brevdo
7ea35bfa1c
Initial implementation of igamma and igammac.
2016-03-03 19:39:41 -08:00
Benoit Steiner
1032441c6f
Enable partial support for half floats on Kepler GPUs.
2016-03-03 10:34:20 -08:00
Benoit Steiner
1da10a7358
Enable the conversion between floats and half floats on older GPUs that support it.
2016-03-03 10:33:20 -08:00
Benoit Steiner
2de8cc9122
Merged in ebrevdo/eigen (pull request PR-167)
...
Add infinity() support to numext::numeric_limits, use it in lgamma.
I tested the code on my gtx-titan-black gpu, and it appears to work as expected.
2016-03-03 09:42:12 -08:00
Eugene Brevdo
ab3dc0b0fe
Small bugfix to numeric_limits for CUDA.
2016-03-02 21:48:46 -08:00
Eugene Brevdo
6afea46838
Add infinity() support to numext::numeric_limits, use it in lgamma.
...
This makes the infinity access a __device__ function, removing
nvcc warnings.
2016-03-02 21:35:48 -08:00
Gael Guennebaud
3fccef6f50
bug #537 : fix compilation with Apple's compiler
2016-03-02 13:22:46 +01:00
Gael Guennebaud
dfa80b2060
Compilation fix
2016-03-01 12:48:56 +01:00
Gael Guennebaud
bee9efc203
Compilation fix
2016-03-01 12:47:27 +01:00
Gael Guennebaud
e9bea614ec
Fix shortcoming in fixed-value deduction of startRow/startCol
2016-02-29 10:31:27 +01:00
Gael Guennebaud
8e6faab51e
bug #1172 : make valuePtr and innerIndexPtr properly return null for empty matrices.
2016-02-27 14:55:40 +01:00
Gael Guennebaud
91e1375ba9
merge
2016-02-23 11:09:05 +01:00
Gael Guennebaud
055000a424
Fix startRow()/startCol() for dense Block with direct access:
...
the initial implementation failed for empty rows/columns, for which these values are ambiguous.
2016-02-23 11:07:59 +01:00
Benoit Steiner
6270d851e3
Declare the half float type as arithmetic.
2016-02-22 13:59:33 -08:00
Benoit Steiner
584832cb3c
Implemented the ptranspose function on half floats
2016-02-21 12:44:53 -08:00
Benoit Steiner
95fceb6452
Added the ability to compute the absolute value of a half float
2016-02-21 20:24:11 +00:00
Benoit Steiner
9ff269a1d3
Moved some of the fp16 operators outside the Eigen namespace to workaround some nvcc limitations.
2016-02-20 07:47:23 +00:00
Gael Guennebaud
d90a2dac5e
merge
2016-02-19 23:01:27 +01:00
Gael Guennebaud
6fa35bbd28
bug #1170 : skip calls to memcpy/memmove for empty input.
2016-02-19 22:58:52 +01:00
Gael Guennebaud
6f0992c05b
Fix nesting type and complete reflection methods of Block expressions.
2016-02-19 22:21:02 +01:00
Gael Guennebaud
f3643eec57
Add typedefs for the return type of all block methods.
2016-02-19 22:15:01 +01:00
Benoit Steiner
180156ba1a
Added support for tensor reductions on half floats
2016-02-19 10:05:59 -08:00
Benoit Steiner
5c4901b83a
Implemented the scalar division of 2 half floats
2016-02-19 10:03:19 -08:00
Benoit Steiner
f7cb755299
Added support for operators +=, -=, *= and /= on CUDA half floats
2016-02-19 15:57:26 +00:00
Benoit Steiner
dc26459b99
Implemented protate() for CUDA
2016-02-19 15:16:54 +00:00
Benoit Steiner
ac5d706a94
Added support for simple coefficient wise tensor expression using half floats on CUDA devices
2016-02-19 08:19:12 +00:00
Benoit Steiner
0606a0a39b
FP16 support on CUDA is only available starting with CUDA 7.5. Disable it when using an older version of CUDA
2016-02-18 23:15:23 -08:00
Benoit Steiner
7151bd8768
Reverted unintended changes introduced by a bad merge
2016-02-19 06:20:50 +00:00
Benoit Steiner
17b9fbed34
Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa
2016-02-19 06:16:07 +00:00
Benoit Steiner
8ce46f9d89
Improved implementation of ptanh for SSE and AVX
2016-02-18 13:24:34 -08:00
Eugene Brevdo
832380c455
Merged eigen/eigen into default
2016-02-17 14:44:06 -08:00
Eugene Brevdo
06a2bc7c9c
Tiny bugfix in SpecialFunctions: some compilers don't like doubles
...
implicitly downcast to floats in an array constructor.
2016-02-17 14:41:59 -08:00
Gael Guennebaud
f6f057bb7d
bug #1166 : fix shortcoming in gemv when the destination is not a vector at compile-time.
2016-02-15 21:43:07 +01:00
Gael Guennebaud
4252af6897
Remove dead code.
2016-02-12 16:13:35 +01:00
Gael Guennebaud
2f5f56a820
Fix usage of evaluator in sparse * permutation products.
2016-02-12 16:13:16 +01:00
Gael Guennebaud
0a537cb2d8
bug #901 : fix triangular-view with unit diagonal of sparse rectangular matrices.
2016-02-12 15:58:31 +01:00
Benoit Steiner
17e93ba148
Pulled latest updates from trunk
2016-02-11 15:05:38 -08:00
Benoit Steiner
3628f7655d
Made it possible to run the scalar_binary_pow_op functor on GPU
2016-02-11 15:05:03 -08:00
Hauke Heibel
eeac46f980
bug #774 : re-added comment referencing equations in the original paper
2016-02-11 19:38:37 +01:00
Benoit Steiner
c569cfe12a
Inline the +=, -=, *= and /= operators consistently between DenseBase.h and SelfCwiseBinaryOp.h
2016-02-11 09:33:32 -08:00
Gael Guennebaud
8cc9232b9a
bug #774 : fix a numerical issue producing unwanted reflections.
2016-02-11 15:32:56 +01:00
Gael Guennebaud
2d35c0cb5f
Merged in rmlarsen/eigen (pull request PR-163)
...
Implement complete orthogonal decomposition in Eigen.
2016-02-11 15:12:34 +01:00
Benoit Steiner
33e2373f01
Merged in nnyby/eigen/nnyby/doc-grammar-fix-linearly-space-linearly-1443742971203 (pull request PR-138)
...
[doc] grammar fix: "linearly space" -> "linearly spaced"
2016-02-10 23:29:59 -08:00
Benoit Steiner
6d8b1dce06
Avoid implicit cast from double to float.
2016-02-10 18:07:11 -08:00
Rasmus Munk Larsen
b6fdf7468c
Rename inverse -> pseudoInverse.
2016-02-10 13:03:07 -08:00
Benoit Jacob
9d6f1ad398
I'm told to use __EMSCRIPTEN__ by an Emscripten dev.
2016-02-10 12:48:34 -05:00
Benoit Steiner
bfb3fcd94f
Optimized implementation of the tanh function for SSE
2016-02-10 08:52:30 -08:00
Benoit Steiner
2d523332b3
Optimized implementation of the hyperbolic tangent function for AVX
2016-02-10 08:48:05 -08:00
Benoit Jacob
e6ee18d6b4
Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC
2016-02-10 11:11:49 -05:00
Benoit Jacob
964a95bf5e
Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088
2016-02-10 10:37:22 -05:00
Benoit Steiner
970751ece3
Disabling the nvcc warnings in addition to the clang warnings when clang is used as a frontend for nvcc
2016-02-09 20:55:50 -08:00
Rasmus Munk Larsen
bb8811c655
Enable inverse() method for computing pseudo-inverse.
2016-02-09 20:35:20 -08:00
Benoit Steiner
5cc0dd5f44
Fixed the code that disables the use of variadic templates when compiling with nvcc on ARM devices.
2016-02-09 10:32:01 -08:00
Benoit Steiner
24d291cf16
Worked around nvcc crash when compiling Eigen on Tegra X1
2016-02-09 02:34:02 +00:00
Rasmus Munk Larsen
53f60e0afc
Make applyZAdjointOnTheLeftInPlace protected.
2016-02-08 09:01:43 -08:00
Rasmus Munk Larsen
414efa47d3
Add missing calls to tests of COD.
...
Fix a few mistakes in 3.2 -> 3.3 port.
2016-02-08 08:50:34 -08:00
Gael Guennebaud
c2bf2f56ef
Remove custom unaligned loads for SSE. They were only useful for core2 CPU.
2016-02-08 14:29:12 +01:00
Gael Guennebaud
a4c76f8d34
Improve inlining
2016-02-08 11:33:02 +01:00
Rasmus Munk Larsen
16ec450ca1
Nevermind.
2016-02-06 17:54:01 -08:00
Rasmus Munk Larsen
019fff9a00
Add my name to copyright notice in ColPivHouseholder.h, mostly for previous work on stable norm downdate formula.
2016-02-06 17:48:42 -08:00
Rasmus Munk Larsen
86d6201d7b
Merge.
2016-02-06 16:36:56 -08:00
Rasmus Munk Larsen
d904c8ac8f
Implement complete orthogonal decomposition in Eigen.
2016-02-06 16:32:00 -08:00
Gael Guennebaud
c6a12d1dc6
Fix warning with gcc < 4.8
2016-02-06 18:06:51 +01:00
Gael Guennebaud
5b2d287878
bug #779 : allow non aligned buffers for buffers smaller than the requested alignment.
2016-02-05 21:46:39 +01:00
Gael Guennebaud
e8e1d504d6
Add an explicit assertion on the alignment of the pointer returned by std::malloc
2016-02-05 21:38:16 +01:00
Gael Guennebaud
62a1c911cd
Remove posix_memalign, _mm_malloc, and _aligned_malloc special paths.
2016-02-05 21:24:35 +01:00
Benoit Steiner
bcdcdace48
Pulled latest updates from trunk
2016-02-04 08:56:49 -08:00
Gael Guennebaud
659fc9c159
Remove dead code
2016-02-04 09:55:09 +01:00
Gael Guennebaud
d5d7798b9d
Improve heuristics for switching between coeff-based and general matrix product implementation.
2016-02-04 09:53:47 +01:00
Benoit Steiner
f535378995
Added support for vectorized type casting of int to char.
2016-02-03 18:58:29 -08:00
Benoit Steiner
727ff26960
Disable 2 more nvcc warning messages
2016-02-03 16:01:37 -08:00
Benoit Steiner
bcbde37a11
Made sure the code compiles when EIGEN_HAS_C99_MATH isn't defined
2016-02-03 14:53:08 -08:00
Benoit Steiner
f933f69021
Added a few comments
2016-02-03 14:12:18 -08:00
Benoit Steiner
5d82e47ef6
Properly disable nvcc warning messages in user code.
2016-02-03 14:10:06 -08:00
Benoit Steiner
d7742d22e4
Revert the nvcc messages to their default severity instead of forcing them to be warnings
2016-02-03 13:47:28 -08:00
Benoit Steiner
ac26e1aaf3
Pulled latest updates from trunk
2016-02-03 12:52:20 -08:00
Benoit Steiner
492fe7ce02
Silenced some unhelpful warnings generated by nvcc.
2016-02-03 12:51:19 -08:00
Gael Guennebaud
b70db60e4d
Merged in rmlarsen/eigen (pull request PR-161)
...
Change Eigen's ColPivHouseholderQR to use numerically stable norm downdate formula
2016-02-03 21:37:06 +01:00
Rasmus Munk Larsen
5fb04ab2da
Fix bad line break. Don't repeat Kahan matrix test since it is deterministic.
2016-02-03 10:12:10 -08:00
Rasmus Munk Larsen
d9a6f86cc0
Make the array of directly computed column norms a member to avoid allocation in computeInPlace.
2016-02-03 09:55:30 -08:00
Gael Guennebaud
70dc14e4e1
bug #1161 : fix division by zero for huge scalar types
2016-02-03 18:25:41 +01:00
Damien R
c301f99208
bug #1164 : fix list and deque specializations such that our aligned allocator is automatically activated only when the user did not specify an allocator (or specified the default std::allocator).
2016-02-03 18:07:25 +01:00
Gael Guennebaud
eb6d9aea0e
Clarify error message when writing to a read-only sparse-sub-matrix.
2016-02-03 16:58:23 +01:00
Rasmus Munk Larsen
00f9ef6c76
merging.
2016-02-01 11:10:30 -08:00
Gael Guennebaud
ff1157bcbf
bug #694 : document that SparseQR::matrixR is not sorted.
2016-02-01 16:09:34 +01:00
Gael Guennebaud
ec469700dc
bug #557 : make InnerIterator of sparse storage types more versatile by adding default-ctor, copy-ctor/assignment
2016-02-01 15:04:33 +01:00
Gael Guennebaud
6e0a86194c
Fix integer path for num_steps==1
2016-02-01 15:00:04 +01:00
Gael Guennebaud
e1d219e5c9
bug #698 : fix linspaced for integer types.
2016-02-01 14:25:34 +01:00
Gael Guennebaud
2c3224924b
Fix warning and replace min/max macros by calls to mini/maxi
2016-02-01 10:23:45 +01:00
Benoit Steiner
3f1ee45833
Fixed compilation errors triggered by duplicate inline declaration
2016-01-31 10:48:49 -08:00
Gael Guennebaud
d142165942
bug #667 : declare several critical functions as FORCE_INLINE to make ICC happier.
...
Changed files: Eigen/src/Core/ArrayBase.h, Eigen/src/Core/AssignEvaluator.h, Eigen/src/Core/CoreEvaluators.h, Eigen/src/Core/CwiseUnaryOp.h, Eigen/src/Core/DenseBase.h, Eigen/src/Core/MatrixBase.h
2016-01-31 16:34:10 +01:00
Gael Guennebaud
1bc207c528
backout changeset d4a9e61569
...
: the extended SparseView is not needed anymore
2016-01-30 14:43:21 +01:00
Gael Guennebaud
8ed1553d20
bug #632 : implement general coefficient-wise "dense op sparse" operations through specialized evaluators instead of using SparseView.
...
This makes it possible to handle arbitrary storage orders, and to bypass the more complex iterator of the sparse-sparse case.
2016-01-30 14:39:50 +01:00
Gael Guennebaud
699634890a
bug #946 : generalize Cholmod::solve to handle any rhs expression
2016-01-29 23:02:22 +01:00
Gael Guennebaud
15084cf1ac
bug #632 : add support for "dense +/- sparse" operations. The current implementation is based on SparseView to make the dense subexpression compatible with the sparse one.
2016-01-29 22:09:45 +01:00
Gael Guennebaud
d4a9e61569
Extend SparseView to allow keeping explicit zeros. This is equivalent to sparseView(1,-1) but faster because the test is removed at compile-time.
2016-01-29 22:07:56 +01:00
Gael Guennebaud
d8d37349c3
bug #696 : enable zero-sized block at compile-time by relaxing the respective assertion
2016-01-29 12:44:49 +01:00
Gael Guennebaud
e8ccc06fe5
merge
2016-01-29 09:40:38 +01:00
Benoit Steiner
d3f533b395
Fixed compilation warning
2016-01-28 20:09:45 -08:00
Abhijit Kundu
3fde202215
Making ceil() functor generic w.r.t packet type
2016-01-28 21:27:00 -05:00
Rasmus Munk Larsen
acce4dd050
Change Eigen's ColPivHouseholderQR to use the numerically stable norm downdate formula from http://www.netlib.org/lapack/lawnspdf/lawn176.pdf , which has been used in LAPACK's xGEQPF and xGEQP3 since 2006. With the old formula, the code chooses the wrong pivots and fails to correctly determine rank on graded matrices.
...
This change also adds additional checks for non-increasing diagonal in R11 to existing unit tests, and adds a new unit test with the Kahan matrix, which consistently fails for the original code.
Benchmark timings on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz. Code compiled with AVX & FMA. I just ran on square matrices of 3 different sizes.
Before:
Benchmark                Time(ns)      CPU(ns)  Iterations
----------------------------------------------------------
BM_EigencolPivQR/64         53677        53627       12890
BM_EigencolPivQR/512     15265408     15250784          46
BM_EigencolPivQR/4k   15403556228  15388788368           2
After (non-vectorized version):
Benchmark                Time(ns)      CPU(ns)  Iterations  Degradation
-----------------------------------------------------------------------
BM_EigencolPivQR/64         63736        63669       10844        18.5%
BM_EigencolPivQR/512     16052546     16037381          43         5.1%
BM_EigencolPivQR/4k   15149263620  15132025316           2        -2.0%
Performance-wise there seems to be a ~18.5% degradation for small (64x64) matrices, probably due to the cost of more O(min(m,n)^2) sqrt operations that are not needed for the unstable formula.
2016-01-28 15:07:26 -08:00
Gael Guennebaud
b908e071a8
bug #178 : get rid of some const_cast in SparseCore
2016-01-28 22:11:18 +01:00
Gael Guennebaud
c1d900af61
bug #178 : remove additional const on nested expression, and remove several const_cast.
2016-01-28 21:43:20 +01:00
Gael Guennebaud
f50bb1e6f3
Fix compilation with gcc
2016-01-28 13:25:26 +01:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Gael Guennebaud
df15fbc452
bug #1158 : PartialReduxExpr is a vector expression, and it thus must expose the LinearAccessBit flag
2016-01-28 13:16:30 +01:00
Gael Guennebaud
9bcadb7fd1
Disable stupid MSVC warning
2016-01-28 12:14:16 +01:00
Gael Guennebaud
b4d87fff4a
Fix MSVC warning.
2016-01-28 12:12:30 +01:00
Gael Guennebaud
2bad3e78d9
bug #96 , bug #1006 : fix by value argument in result_of.
2016-01-28 12:12:06 +01:00
Benoit Steiner
291069e885
Fixed some compilation problems with nvcc + clang
2016-01-27 15:37:03 -08:00
Gael Guennebaud
4865e1e732
Update link to suitesparse.
2016-01-27 22:48:40 +01:00
Eugene Brevdo
c8d94ae944
digamma special function: merge shared code.
...
Moved type-specific code into a helper class digamma_impl_maybe_poly<Scalar>.
2016-01-27 09:52:29 -08:00
Gael Guennebaud
9c8f7dfe94
bug #1156 : fix several function declarations whose arguments were passed by value instead of being passed by reference
2016-01-27 18:34:42 +01:00
Gael Guennebaud
9aa6fae123
bug #1154 : move to dynamic scheduling for spmv products.
2016-01-27 18:03:51 +01:00
Gael Guennebaud
9801c959e6
Fix tri = complex * real product, and add respective unit test.
2016-01-27 17:12:25 +01:00
Gael Guennebaud
21b5345782
Add meta_least_common_multiple helper.
2016-01-27 17:11:39 +01:00
Gael Guennebaud
fecea26d93
Extend doc on shifting strategy
2016-01-27 15:55:15 +01:00
Gael Guennebaud
cfa21f8123
Remove dead code.
2016-01-26 23:33:15 +01:00
Gael Guennebaud
6850eab33b
Re-enable blocking on rows in non-l3 blocking mode.
2016-01-26 23:32:48 +01:00
Gael Guennebaud
aa8c6a251e
Make sure that micro-panel-size is smaller than blocking sizes (otherwise we might get a buffer overflow)
2016-01-26 23:31:48 +01:00
Gael Guennebaud
5b0a9ee003
Make sure that block sizes are smaller than input matrix sizes.
2016-01-26 23:30:24 +01:00
Christoph Hertzberg
44d4674955
bug #1153 : Don't rely on __GXX_EXPERIMENTAL_CXX0X__ to detect C++11 support
2016-01-26 16:45:33 +01:00
Gael Guennebaud
8328caa618
bug #51 : add block preallocation mechanism to selfadjoint*matrix product.
2016-01-25 22:06:42 +01:00
Gael Guennebaud
e58827d2ed
bug #51 : make general_matrix_matrix_triangular_product use L3-blocking helper so that general symmetric rank-updates and general-matrix-to-triangular products do not trigger dynamic memory allocation for fixed size matrices.
2016-01-25 17:16:33 +01:00
Gael Guennebaud
b114e6fd3b
Improve documentation.
2016-01-25 11:56:25 +01:00
Gael Guennebaud
869b4443ac
Add SparseVector::conservativeResize() method.
2016-01-25 11:55:39 +01:00
Gael Guennebaud
acf6f7af6b
Merged in larsmans/eigen (pull request PR-156)
...
Documentation fixes
2016-01-24 22:28:49 +01:00
Lars Buitinck
cc482e32f1
Method is called visit, not visitor
2016-01-24 15:50:59 +01:00
Gael Guennebaud
1cf85bd875
bug #977 : add stableNormalize[d] methods: they are analogous to normalize[d] but with careful handling of under/over-flow
2016-01-23 22:40:11 +01:00
Gael Guennebaud
369d6d1ae3
Add link to reference paper.
2016-01-23 22:16:03 +01:00
Gael Guennebaud
0caa4b1531
bug #1150 : make IncompleteCholesky more robust by iteratively increasing the shift until the factorization succeeds (with at most 10 attempts).
2016-01-23 22:13:54 +01:00
Gael Guennebaud
5358c38589
bug #1095 : add Cholmod*::logDeterminant/determinant (from patch of Joshua Pritikin)
2016-01-22 16:05:29 +01:00
Gael Guennebaud
06971223ef
Unify std::numeric_limits and device::numeric_limits within numext namespace
2016-01-22 15:02:21 +01:00
Gael Guennebaud
ee37eb4eed
bug #977 : avoid division by 0 in normalize() and normalized().
2016-01-21 20:43:42 +01:00
Gael Guennebaud
7cae8918c0
Fix compilation on old gcc+AVX
2016-01-21 20:30:32 +01:00
Gael Guennebaud
8dca9f97e3
Add numext::sqrt function to enable custom optimized implementation.
...
This changeset adds two specializations for float/double on SSE. Those
are mostly useful with GCC, for which std::sqrt adds an extra and costly
check on the result of _mm_sqrt_*. Clang does not add this burden.
In this changeset, only DenseBase::norm() makes use of it.
2016-01-21 20:18:51 +01:00
Gael Guennebaud
34340458cb
bug #1151 : remove useless critical section
2016-01-21 14:29:45 +01:00
Gael Guennebaud
ed8ade9c65
bug #1149 : fix Pastix*::*parm()
2016-01-20 19:01:24 +01:00
Gael Guennebaud
4c5e96aab6
bug #1148 : silence Pastix by default
2016-01-20 18:56:17 +01:00
Gael Guennebaud
db237d0c75
bug #1145 : fix PastixSupport LLT/LDLT wrappers (missing resize prior to calls to selfAdjointView)
2016-01-20 18:49:01 +01:00
Gael Guennebaud
0b7169d1f7
bug #1147 : fix compilation of PastixSupport
2016-01-20 18:15:59 +01:00