Benoit Steiner
70195a5ff7
Added missing EIGEN_DEVICE_FUNC
2016-05-11 14:10:09 -07:00
Benoit Steiner
09a19c33a8
Added missing EIGEN_DEVICE_FUNC qualifiers
2016-05-11 14:07:43 -07:00
Christoph Hertzberg
33ca7e3c8d
bug #1207 : Add and fix logical-op warnings
2016-05-11 19:36:34 +02:00
Benoit Steiner
217d984abc
Fixed a typo in my previous commit
2016-05-11 10:22:15 -07:00
Christoph Hertzberg
0f61343893
Workaround maybe-uninitialized warning
2016-05-11 09:00:18 +02:00
Christoph Hertzberg
3bfc9b47ca
Workaround "misleading-indentation" warnings
2016-05-11 08:41:36 +02:00
Benoit Steiner
0b9e3dcd06
Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%.
2016-05-10 11:05:33 -07:00
Benoit Steiner
8adf5cc70f
Added support for packet processing of fp16 on kepler and maxwell gpus
2016-05-06 19:16:43 -07:00
Christoph Hertzberg
a11bd82dc3
bug #1213 : Give names to anonymous enums
2016-05-06 11:31:56 +02:00
Benoit Steiner
0451940fa4
Relaxed the dummy precision for fp16
2016-05-05 15:40:01 -07:00
Christoph Hertzberg
dacb469bc9
Enable and fix -Wdouble-conversion warnings
2016-05-05 13:35:45 +02:00
Ola Røer Thorsen
be78aea6b3
fix double-promotion/float-conversion in Core/SpecialFunctions.h
2016-05-04 10:52:08 +02:00
Gael Guennebaud
75a94b9662
Improve documentation of BDCSVD
2016-05-04 12:53:14 +02:00
Gael Guennebaud
e2ca478485
bug #1214 : consider denormals as zero in D&C SVD. This also works around an infinite binary search when compiling with ICC's unsafe optimizations.
2016-05-03 23:15:29 +02:00
Benoit Steiner
6c3e5b85bc
Fixed compilation error with cuda >= 7.5
2016-05-03 09:38:42 -07:00
Benoit Steiner
da50419df8
Made a cast explicit
2016-05-02 19:50:22 -07:00
Gael Guennebaud
b1bd53aa6b
Fix performance regression: with AVX, unaligned stores were emitted instead of aligned ones for fixed-size assignment.
2016-05-01 23:25:06 +02:00
Benoit Steiner
2b890ae618
Fixed compilation errors generated by clang
2016-04-29 18:30:40 -07:00
Benoit Steiner
46bcb70969
Don't turn on const expressions when compiling with gcc >= 4.8 unless the -std=c++11 option has been used
2016-04-29 15:20:59 -07:00
Gael Guennebaud
0f3c4c8ff4
Fix compilation of sparse.cast<>().transpose().
2016-04-29 18:26:08 +02:00
Benoit Steiner
dacb23277e
Fixed the igamma and igammac implementations to make them callable from a gpu kernel.
2016-04-28 18:54:54 -07:00
Benoit Steiner
a5d4545083
Deleted unused variable
2016-04-28 14:14:48 -07:00
Justin Lebar
40d1e2f8c7
Eliminate mutual recursion in igamma{,c}_impl::Run.
Presently, igammac_impl::Run calls igamma_impl::Run, which in turn calls
igammac_impl::Run.
This isn't actually mutual recursion; the calls are guarded such that we never
get into a loop. Nonetheless, it's a stretch for clang to prove this. As a
result, clang emits a recursive call in both igammac_impl::Run and
igamma_impl::Run.
That this is suboptimal code is bad enough, but it's particularly bad when
compiling for CUDA/nvptx. nvptx allows recursion, but only begrudgingly: If
you have recursive calls in a kernel, it's on you to manually specify the
kernel's stack size. Otherwise, ptxas will dump a warning, make a guess, and
who knows if it's right.
This change explicitly eliminates the mutual recursion in igammac_impl::Run and
igamma_impl::Run.
2016-04-28 13:57:08 -07:00
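The refactoring described in 40d1e2f8c7 can be illustrated with a minimal sketch. It is not the Eigen code: the function names are hypothetical, and the shape parameter is hard-wired to a = 1 so that the two kernels have closed forms (P(1, x) = 1 - exp(-x), Q(1, x) = exp(-x)) in place of the series and continued-fraction evaluations a real implementation would use. The point is only the call graph: in the "before" pair each entry point may call the other, while in the "after" pair both call the leaf kernels directly.

```cpp
#include <cmath>

// Hypothetical leaf kernels (assumption: a = 1, so closed forms exist).
inline double lower_series(double x) { return -std::expm1(-x); }  // P(1, x)
inline double upper_fraction(double x) { return std::exp(-x); }   // Q(1, x)

// Before: guarded, but the call graph is cyclic. The guards guarantee that
// at most one cross-call happens, yet a compiler may still treat both
// functions as recursive.
inline double upper_before(double x);
inline double lower_before(double x) {
  if (x > 1.0) return 1.0 - upper_before(x);
  return lower_series(x);
}
inline double upper_before(double x) {
  if (x <= 1.0) return 1.0 - lower_before(x);
  return upper_fraction(x);
}

// After: each entry point dispatches straight to a leaf kernel, so the
// call graph has no cycle and no stack-size guess is needed.
inline double lower_after(double x) {
  if (x > 1.0) return 1.0 - upper_fraction(x);
  return lower_series(x);
}
inline double upper_after(double x) {
  if (x <= 1.0) return 1.0 - lower_series(x);
  return upper_fraction(x);
}
```

Both versions return the same values; only the structure of the calls differs.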
Benoit Steiner
2b917291d9
Merged in rmlarsen/eigen2 (pull request PR-183)
Detect cxx_constexpr support when compiling with clang.
2016-04-27 15:19:54 -07:00
Rasmus Munk Larsen
09b9e951e3
Depend on the more extensive support for constexpr in clang:
http://clang.llvm.org/docs/LanguageExtensions.html#c-1y-relaxed-constexpr
2016-04-27 14:59:11 -07:00
Rasmus Munk Larsen
1a325ef71c
Detect cxx_constexpr support when compiling with clang.
2016-04-27 14:33:51 -07:00
Benoit Steiner
c61170e87d
fpclassify isn't portable enough. In particular, the return values of the function are not available on all the platforms Eigen supports: remove it from Eigen.
2016-04-27 14:22:20 -07:00
Benoit Steiner
f629fe95c8
Made the index type a template parameter to evaluateProductBlockingSizes
Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.
2016-04-27 13:11:19 -07:00
Benoit Steiner
25141b69d4
Improved support for min and max on 16 bit floats when running on recent cuda gpus
2016-04-27 12:57:21 -07:00
Benoit Steiner
6744d776ba
Added support for fpclassify in Eigen::Numext
2016-04-27 12:10:25 -07:00
Benoit Steiner
5c372d19e3
Merged in rmlarsen/eigen (pull request PR-179)
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 18:06:36 -07:00
Rasmus Munk Larsen
a3256d78d8
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 16:49:28 -07:00
Benoit Steiner
80200a1828
Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instructions.
2016-04-20 12:10:27 -07:00
Benoit Steiner
1d0238375d
Made sure all the required header files are included when trying to use fp16
2016-04-19 17:44:12 -07:00
Gael Guennebaud
e4fe611e2c
Enable lazy-coeff-based-product for vector*(1x1) products
2016-04-16 15:17:39 +02:00
Benoit Steiner
1a16fb1532
Deleted extraneous comma.
2016-04-15 15:50:13 -07:00
Gael Guennebaud
2a7115daca
bug #1203 : by-pass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small
2016-04-15 22:34:11 +02:00
Benoit Steiner
1d23430628
Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g. on Haswell CPUs).
2016-04-15 10:53:31 -07:00
Gael Guennebaud
1e80bddde3
Fix trmv for mixing types.
2016-04-15 17:58:36 +02:00
Benoit Steiner
a62e924656
Added ability to access the cache sizes from the tensor devices
2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426
Added support for exclusive or
2016-04-14 20:37:46 -07:00
Gael Guennebaud
20f387fafa
Improve numerical robustness of JacobiSVD:
- avoid noise amplification in complex to real conversion
- compare off-diagonal entries to the current biggest diagonal entry: no need to bother about a 2x2 block containing ridiculously small entries compared to the rest of the matrix.
2016-04-14 22:46:55 +02:00
Benoit Steiner
7718749fee
Force the inlining of the << operator on half floats
2016-04-14 11:51:54 -07:00
Benoit Steiner
5379d2b594
Inline the << operator on half floats
2016-04-14 11:40:48 -07:00
Benoit Steiner
5c13765ee3
Added ability to printf fp16
2016-04-14 10:24:52 -07:00
Gael Guennebaud
3551dea887
Cleaning pass on rcond estimator.
2016-04-14 16:45:41 +02:00
Gael Guennebaud
d402adc3d7
Better use .data() than &coeffRef(0)
2016-04-14 15:18:08 +02:00
Gael Guennebaud
ea7087ef31
Merged in rmlarsen/eigen (pull request PR-174)
Add matrix condition number estimation module.
2016-04-14 15:11:33 +02:00
Benoit Steiner
36f5a10198
Properly gate the definition of the error and gamma functions for fp16
2016-04-13 18:44:48 -07:00
Benoit Steiner
10b69810d1
Improved support for trigonometric functions on GPU
2016-04-13 16:00:51 -07:00
Benoit Steiner
d6105b53b8
Added basic implementation of the lgamma, digamma, igamma, igammac, polygamma, and zeta function for fp16
2016-04-13 15:26:02 -07:00
Gael Guennebaud
703251f10f
merge
2016-04-13 23:45:10 +02:00
Gael Guennebaud
39211ba46b
Fix JacobiSVD for complex when the complex-to-real update already gives a diagonal 2x2 block.
2016-04-13 23:43:26 +02:00
Benoit Steiner
2986253259
Cleaned up the implementation of digamma
2016-04-13 14:24:06 -07:00
Benoit Steiner
d5de1a8220
Pulled latest updates from trunk
2016-04-13 14:17:11 -07:00
Benoit Steiner
87ca15c4e8
Added support for sin, cos, tan, and tanh on fp16
2016-04-13 14:12:38 -07:00
Gael Guennebaud
feef39e2d1
Fix underflow in JacobiSVD's complex to real preconditioner
2016-04-13 22:49:51 +02:00
Benoit Steiner
bf3f6688f0
Added support for computing cos, sin, tan, and tanh on GPU.
2016-04-13 11:55:08 -07:00
Benoit Steiner
473c8380ea
Added constructors to convert unsigned integers into fp16
2016-04-13 11:03:37 -07:00
Gael Guennebaud
42a3352a3b
Workaround a division by zero when outerstride==0
2016-04-13 19:02:02 +02:00
Gael Guennebaud
6f960b83ff
Make use of is_same_dense helper instead of extract_data to detect input/outputs are the same.
2016-04-13 18:47:12 +02:00
Gael Guennebaud
b7716c0328
Fix incomplete previous patch on matrix comparison.
2016-04-13 18:32:56 +02:00
Gael Guennebaud
2630d97c62
Fix detection of same matrices when both matrices are not handled by extract_data.
2016-04-13 18:26:08 +02:00
Gael Guennebaud
06447e0a39
Improve half-packet vectorization logic to distinguish linear versus inner traversal modes.
2016-04-13 18:15:49 +02:00
Gael Guennebaud
bbb8854bf7
Enable half-packet in reductions.
2016-04-13 13:02:34 +02:00
Benoit Steiner
aa1ba8bbd2
Don't put a comma at the end of an enumerator list
2016-04-12 16:28:11 -07:00
Gael Guennebaud
b67c983291
Enable the use of half-packet in coeff-based product.
For instance, Matrix4f*Vector4f is now vectorized again when using AVX.
2016-04-12 23:03:03 +02:00
Rasmus Larsen
6498dadc2f
Merged eigen/eigen into default
2016-04-11 17:42:05 -07:00
Benoit Steiner
748c4c4599
More accurate cost estimates for exp, log, tanh, and sqrt.
2016-04-11 13:11:04 -07:00
Benoit Steiner
833efb39bf
Added epsilon, dummy_precision, infinity and quiet_NaN NumTraits for fp16
2016-04-11 11:03:56 -07:00
Benoit Steiner
e939b087fe
Pulled latest update from trunk
2016-04-11 11:03:02 -07:00
Gael Guennebaud
0483430283
Move LAPACK declarations from blas.h to lapack.h and fix compatibility with EIGEN_USE_MKL
2016-04-11 17:12:31 +02:00
Gael Guennebaud
097d1e8823
Cleanup obsolete assign_scalar_eig2mkl helper.
2016-04-11 16:09:29 +02:00
Gael Guennebaud
fec4c334ba
Remove all references to MKL in BLAS wrappers.
2016-04-11 16:04:09 +02:00
Gael Guennebaud
ddabc992fa
Fix long to int conversion in BLAS API.
2016-04-11 15:52:01 +02:00
Gael Guennebaud
8191f373be
Silence unused warning.
2016-04-11 15:37:16 +02:00
Gael Guennebaud
6a9ca88e7e
Relax dependency on MKL for EIGEN_USE_BLAS
2016-04-11 15:17:14 +02:00
Gael Guennebaud
4e8e5888d7
Improve constness of blas level-3 interface.
2016-04-11 15:12:44 +02:00
Gael Guennebaud
675e0a2224
Fix static/inline keywords order.
2016-04-11 15:06:20 +02:00
Till Hoffmann
643b697649
Proper handling of domain errors.
2016-04-10 00:37:53 +01:00
Rasmus Munk Larsen
1f70bd4134
Merge.
2016-04-09 15:31:53 -07:00
Rasmus Munk Larsen
096e355f8e
Add short-circuit to avoid calling matrix norm for empty matrix.
2016-04-09 15:29:56 -07:00
Rasmus Larsen
be80fb49fc
Merged default (4a92b590a0) into default
2016-04-09 13:13:01 -07:00
Rasmus Larsen
7a8176587b
Merged eigen/eigen into default
2016-04-09 12:47:41 -07:00
Rasmus Munk Larsen
4a92b590a0
Merge.
2016-04-09 12:47:24 -07:00
Rasmus Munk Larsen
ee6c69733a
A few tiny adjustments to short-circuit logic.
2016-04-09 12:45:49 -07:00
Till Hoffmann
7f4826890c
Merge upstream
2016-04-09 20:08:07 +01:00
Till Hoffmann
de057ebe54
Added nans to zeta function.
2016-04-09 20:07:36 +01:00
Benoit Steiner
5da90fc8dd
Use numext::abs instead of std::abs in scalar_fuzzy_default_impl to make it usable inside GPU kernels.
2016-04-08 19:40:48 -07:00
Benoit Steiner
01bd577288
Fixed the implementation of Eigen::numext::isfinite, Eigen::numext::isnan, and Eigen::numext::isinf on CUDA devices
2016-04-08 16:40:10 -07:00
Benoit Steiner
89a3dc35a3
Fixed isfinite_impl: NumTraits<T>::highest() and NumTraits<T>::lowest() are finite numbers.
2016-04-08 15:56:16 -07:00
Benoit Steiner
995f202cea
Disabled the use of half2 on cuda devices of compute capability < 5.3
2016-04-08 14:43:36 -07:00
Benoit Steiner
8d22967bd9
Initial support for taking the power of fp16
2016-04-08 14:22:39 -07:00
Benoit Steiner
3394379319
Fixed the packet_traits for half floats.
2016-04-08 13:33:59 -07:00
Rasmus Larsen
0b81a18d12
Merged eigen/eigen into default
2016-04-08 12:58:57 -07:00
Benoit Jacob
cd2b667ac8
Add references to filed LLVM bugs
2016-04-08 08:12:47 -04:00
Benoit Steiner
3bd16457e1
Properly handle complex numbers.
2016-04-07 23:28:04 -07:00
Rasmus Larsen
c34e55c62b
Merged eigen/eigen into default
2016-04-07 20:23:03 -07:00
Rasmus Munk Larsen
283c51cd5e
Widen short-circuiting ReciprocalConditionNumberEstimate so we don't call InverseMatrixL1NormEstimate for dec.rows() <= 1.
2016-04-07 16:45:40 -07:00
Rasmus Munk Larsen
d51803a728
Use Index instead of int for indexing and sizes.
2016-04-07 16:39:48 -07:00
Rasmus Munk Larsen
fd872aefb3
Remove transpose() method from LLT and LDLT classes as it would imply conjugation.
Explicitly cast constants to RealScalar in ConditionEstimator.h.
2016-04-07 16:28:44 -07:00
Rasmus Munk Larsen
0b5546d182
Use lpNorm<1>() to compute l1 norms in LLT and LDLT.
2016-04-07 15:49:30 -07:00
parthaEth
2d5bb375b7
Static casting scalar types so as to let the Cholesky module of Eigen work with Ceres
2016-04-08 00:14:44 +02:00
Benoit Steiner
74f64838c5
Updated the unary functors to use the numext implementation of typical functions instead of the one provided in the standard library. The standard library functions aren't supported officially by cuda, so we're better off using the numext implementations.
2016-04-07 11:42:14 -07:00
Benoit Steiner
737644366f
Move the functions operating on fp16 out of the std namespace and into the Eigen::numext namespace
2016-04-07 11:40:15 -07:00
Benoit Steiner
b89d3f78b2
Updated the isnan, isinf and isfinite functions to make them compatible with cuda devices.
2016-04-07 10:08:49 -07:00
Benoit Steiner
df838736e2
Fixed compilation warning triggered by msvc
2016-04-06 20:48:55 -07:00
Benoit Steiner
14ea7c7ec7
Fixed packet_traits<half>
2016-04-06 19:30:21 -07:00
Benoit Steiner
532fdf24cb
Added support for hardware conversion between fp16 and full floats whenever possible.
2016-04-06 17:11:31 -07:00
Benoit Steiner
58c1dbff19
Made the fp16 code more portable.
2016-04-06 13:44:08 -07:00
Benoit Steiner
cf7e73addd
Added some missing conversions to the Half class, and fixed the implementation of the < operator on cuda devices.
2016-04-06 09:59:51 -07:00
Benoit Steiner
10bdd8e378
Merged in tillahoffmann/eigen (pull request PR-173)
Added zeta function of two arguments and polygamma function
2016-04-06 09:40:17 -07:00
Benoit Steiner
72abfa11dd
Added support for isfinite on fp16
2016-04-06 09:07:30 -07:00
Rasmus Munk Larsen
4d07064a3d
Fix bug in alternate lower bound calculation due to missing parentheses.
Make a few expressions more concise.
2016-04-05 16:40:48 -07:00
Konstantinos Margaritis
2bba4ee2cf
Merged kmargar/eigen/tip into default
2016-04-05 22:22:08 +03:00
Konstantinos Margaritis
317384b397
complete the port, remove float support
2016-04-05 14:56:45 -04:00
tillahoffmann
726bd5f077
Merged eigen/eigen into default
2016-04-05 18:21:05 +01:00
Till Hoffmann
a350c25a39
Added accuracy comments.
2016-04-05 18:20:40 +01:00
Konstantinos Margaritis
bc0ad363c6
add remaining includes
2016-04-05 06:01:17 -04:00
Konstantinos Margaritis
2d41dc9622
complete int/double specialized traits for ZVector
2016-04-05 06:00:51 -04:00
Konstantinos Margaritis
988344daf1
enable the other includes as well
2016-04-05 05:59:30 -04:00
Rasmus Larsen
d7eeee0c1d
Merged eigen/eigen into default
2016-04-04 15:58:27 -07:00
Rasmus Munk Larsen
513c372960
Fix docstrings to list all supported decompositions.
2016-04-04 14:34:59 -07:00
Rasmus Munk Larsen
86e0ed81f8
Addresses comments on Eigen pull request PR-174.
* Get rid of code-duplication for real vs. complex matrices.
* Fix flipped arguments to select.
* Make the condition estimation functions free functions.
* Use Vector::Unit() to generate canonical unit vectors.
* Misc. cleanup.
2016-04-04 14:20:01 -07:00
Benoit Jacob
158fea0f5e
bug #1190 - Don't trust __ARM_FEATURE_FMA on Clang/ARM
2016-04-04 16:42:40 -04:00
Benoit Jacob
03f2997a11
bug #1191 - Prevent Clang/ARM from rewriting VMLA into VMUL+VADD
2016-04-04 16:41:47 -04:00
Till Hoffmann
b97911dd18
Refactored code into type-specific helper functions.
2016-04-04 19:16:03 +01:00
Benoit Steiner
c4179dd470
Updated the scalar_abs_op struct to make it compatible with cuda devices.
2016-04-04 11:11:51 -07:00
Benoit Steiner
1108b4f218
Fixed the signature of numext::abs to make it compatible with complex numbers
2016-04-04 11:09:25 -07:00
Rasmus Larsen
30242b7565
Merged eigen/eigen into default
2016-04-01 17:19:36 -07:00
Rasmus Munk Larsen
9d51f7c457
Add rcond method to LDLT.
2016-04-01 16:48:38 -07:00
Rasmus Munk Larsen
f54137606e
Add condition estimation to Cholesky (LLT) factorization.
2016-04-01 16:19:45 -07:00
Rasmus Munk Larsen
fb8dccc23e
Replace "inline static" with "static inline" for consistency.
2016-04-01 12:48:18 -07:00
Rasmus Munk Larsen
91414e0042
Fix comments in ConditionEstimator and minor cleanup.
2016-04-01 11:58:17 -07:00
Rasmus Munk Larsen
1aa89fb855
Add matrix condition estimator module that implements the Higham/Hager algorithm from http://www.maths.manchester.ac.uk/~higham/narep/narep135.pdf used in LAPACK. Add rcond() methods to FullPivLU and PartialPivLU.
2016-04-01 10:27:59 -07:00
Till Hoffmann
80eba21ad0
Merge upstream.
2016-04-01 18:18:49 +01:00
Till Hoffmann
3cb0a237c1
Fixed suggestions by Eugene Brevdo.
2016-04-01 17:51:39 +01:00
tillahoffmann
49960adbdd
Merged eigen/eigen into default
2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81
Added polygamma function.
2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf
Added zeta function.
2016-04-01 13:32:29 +01:00
Benoit Steiner
0ea7ab4f62
Hashing was only officially introduced in c++11. Therefore only define an implementation of the hash function for float16 if c++11 is enabled.
2016-03-31 14:44:55 -07:00
Benoit Steiner
92b7f7b650
Improved code formatting
2016-03-31 13:09:58 -07:00
Benoit Steiner
f197813f37
Added the ability to hash a fp16
2016-03-31 13:09:23 -07:00
Benoit Steiner
4c859181da
Made it possible to use the NumTraits for complex and Array in a cuda kernel.
2016-03-31 12:48:38 -07:00
Benoit Steiner
c36ab19902
Added __ldg primitive for fp16.
2016-03-31 10:55:03 -07:00
Benoit Steiner
b575fb1d02
Added NumTraits for half floats
2016-03-31 10:43:59 -07:00
Benoit Steiner
8c8a79cec1
Fixed a typo
2016-03-31 10:33:32 -07:00
Benoit Steiner
4f1a7e51c1
Pull math functions from the global namespace only when compiling cuda code with nvcc. When compiling with clang, we want to use the std namespace.
2016-03-30 17:59:49 -07:00
Benoit Steiner
bc68fc2fe7
Enable constant expressions when compiling cuda code with clang.
2016-03-30 17:58:32 -07:00
Benoit Jacob
01b5333e44
bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain
2016-03-30 11:02:33 -04:00