Gael Guennebaud
e2ca478485
bug #1214 : consider denormals as zero in D&C SVD. This also workaround infinite binary search when compiling with ICC's unsafe optimizations.
2016-05-03 23:15:29 +02:00
Benoit Steiner
6c3e5b85bc
Fixed compilation error with cuda >= 7.5
2016-05-03 09:38:42 -07:00
Benoit Steiner
da50419df8
Made a cast explicit
2016-05-02 19:50:22 -07:00
Gael Guennebaud
b1bd53aa6b
Fix performance regression: with AVX, unaligned stores were emitted instead of aligned ones for fixed size assignement.
2016-05-01 23:25:06 +02:00
Benoit Steiner
2b890ae618
Fixed compilation errors generated by clang
2016-04-29 18:30:40 -07:00
Benoit Steiner
46bcb70969
Don't turn on const expressions when compiling with gcc >= 4.8 unless the -std=c++11 option has been used
2016-04-29 15:20:59 -07:00
Gael Guennebaud
0f3c4c8ff4
Fix compilation of sparse.cast<>().transpose().
2016-04-29 18:26:08 +02:00
Benoit Steiner
dacb23277e
Fixed the igamma and igammac implementations to make them callable from a gpu kernel.
2016-04-28 18:54:54 -07:00
Benoit Steiner
a5d4545083
Deleted unused variable
2016-04-28 14:14:48 -07:00
Justin Lebar
40d1e2f8c7
Eliminate mutual recursion in igamma{,c}_impl::Run.
...
Presently, igammac_impl::Run calls igamma_impl::Run, which in turn calls
igammac_impl::Run.
This isn't actually mutual recursion; the calls are guarded such that we never
get into a loop. Nonetheless, it's a stretch for clang to prove this. As a
result, clang emits a recursive call in both igammac_impl::Run and
igamma_impl::Run.
That this is suboptimal code is bad enough, but it's particularly bad when
compiling for CUDA/nvptx. nvptx allows recursion, but only begrudgingly: If
you have recursive calls in a kernel, it's on you to manually specify the
kernel's stack size. Otherwise, ptxas will dump a warning, make a guess, and
who knows if it's right.
This change explicitly eliminates the mutual recursion in igammac_impl::Run and
igamma_impl::Run.
2016-04-28 13:57:08 -07:00
Benoit Steiner
2b917291d9
Merged in rmlarsen/eigen2 (pull request PR-183)
...
Detect cxx_constexpr support when compiling with clang.
2016-04-27 15:19:54 -07:00
Rasmus Munk Larsen
09b9e951e3
Depend on the more extensive support for constexpr in clang:
...
http://clang.llvm.org/docs/LanguageExtensions.html#c-1y-relaxed-constexpr
2016-04-27 14:59:11 -07:00
Rasmus Munk Larsen
1a325ef71c
Detect cxx_constexpr support when compiling with clang.
2016-04-27 14:33:51 -07:00
Benoit Steiner
c61170e87d
fpclassify isn't portable enough. In particular, the return values of the function are not available on all the platforms Eigen supportes: remove it from Eigen.
2016-04-27 14:22:20 -07:00
Benoit Steiner
f629fe95c8
Made the index type a template parameter to evaluateProductBlockingSizes
...
Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.
2016-04-27 13:11:19 -07:00
Benoit Steiner
25141b69d4
Improved support for min and max on 16 bit floats when running on recent cuda gpus
2016-04-27 12:57:21 -07:00
Benoit Steiner
6744d776ba
Added support for fpclassify in Eigen::Numext
2016-04-27 12:10:25 -07:00
Benoit Steiner
5c372d19e3
Merged in rmlarsen/eigen (pull request PR-179)
...
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 18:06:36 -07:00
Rasmus Munk Larsen
a3256d78d8
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 16:49:28 -07:00
Benoit Steiner
80200a1828
Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instruction.
2016-04-20 12:10:27 -07:00
Benoit Steiner
1d0238375d
Made sure all the required header files are included when trying to use fp16
2016-04-19 17:44:12 -07:00
Gael Guennebaud
e4fe611e2c
Enable lazy-coeff-based-product for vector*(1x1) products
2016-04-16 15:17:39 +02:00
Benoit Steiner
1a16fb1532
Deleted extraneous comma.
2016-04-15 15:50:13 -07:00
Gael Guennebaud
2a7115daca
bug #1203 : by-pass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small
2016-04-15 22:34:11 +02:00
Benoit Steiner
1d23430628
Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g on Haswell CPUs).
2016-04-15 10:53:31 -07:00
Gael Guennebaud
1e80bddde3
Fix trmv for mixing types.
2016-04-15 17:58:36 +02:00
Benoit Steiner
a62e924656
Added ability to access the cache sizes from the tensor devices
2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426
Added support for exclusive or
2016-04-14 20:37:46 -07:00
Gael Guennebaud
20f387fafa
Improve numerical robustness of JacoviSVD:
...
- avoid noise amplification in complex to real conversion
- compare off-diagonal entries to the current biggest diagonal entry: no need to bother about a 2x2 block containing ridiculously small entries compared to the rest of the matrix.
2016-04-14 22:46:55 +02:00
Benoit Steiner
7718749fee
Force the inlining of the << operator on half floats
2016-04-14 11:51:54 -07:00
Benoit Steiner
5379d2b594
Inline the << operator on half floats
2016-04-14 11:40:48 -07:00
Benoit Steiner
5c13765ee3
Added ability to printf fp16
2016-04-14 10:24:52 -07:00
Gael Guennebaud
3551dea887
Cleaning pass on rcond estimator.
2016-04-14 16:45:41 +02:00
Gael Guennebaud
d402adc3d7
Better use .data() than &coeffRef(0)
2016-04-14 15:18:08 +02:00
Gael Guennebaud
ea7087ef31
Merged in rmlarsen/eigen (pull request PR-174)
...
Add matrix condition number estimation module.
2016-04-14 15:11:33 +02:00
Benoit Steiner
36f5a10198
Properly gate the definition of the error and gamma functions for fp16
2016-04-13 18:44:48 -07:00
Benoit Steiner
10b69810d1
Improved support for trigonometric functions on GPU
2016-04-13 16:00:51 -07:00
Benoit Steiner
d6105b53b8
Added basic implementation of the lgamma, digamma, igamma, igammac, polygamma, and zeta function for fp16
2016-04-13 15:26:02 -07:00
Gael Guennebaud
703251f10f
merge
2016-04-13 23:45:10 +02:00
Gael Guennebaud
39211ba46b
Fix JacobiSVD for complex when the complex-to-real update already gives a diagonal 2x2 block.
2016-04-13 23:43:26 +02:00
Benoit Steiner
2986253259
Cleaned up the implementation of digamma
2016-04-13 14:24:06 -07:00
Benoit Steiner
d5de1a8220
Pulled latest updates from trunk
2016-04-13 14:17:11 -07:00
Benoit Steiner
87ca15c4e8
Added support for sin, cos, tan, and tanh on fp16
2016-04-13 14:12:38 -07:00
Gael Guennebaud
feef39e2d1
Fix underflow in JacoviSVD's complex to real preconditioner
2016-04-13 22:49:51 +02:00
Benoit Steiner
bf3f6688f0
Added support for computing cos, sin, tan, and tanh on GPU.
2016-04-13 11:55:08 -07:00
Benoit Steiner
473c8380ea
Added constructors to convert unsigned integers into fp16
2016-04-13 11:03:37 -07:00
Gael Guennebaud
42a3352a3b
Workaround a division by zero when outerstride==0
2016-04-13 19:02:02 +02:00
Gael Guennebaud
6f960b83ff
Make use of is_same_dense helper instead of extract_data to detect input/outputs are the same.
2016-04-13 18:47:12 +02:00
Gael Guennebaud
b7716c0328
Fix incomplete previous patch on matrix comparision.
2016-04-13 18:32:56 +02:00
Gael Guennebaud
2630d97c62
Fix detection of same matrices when both matrices are not handled by extract_data.
2016-04-13 18:26:08 +02:00