eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Benoit Steiner	e617711306	Don't attempt to use MMX instructions with Visual Studio since they're only partially supported.	2016-05-24 06:43:58 -07:00
Benoit Steiner	334e76537f	Worked around missing clang intrinsic	2016-05-24 00:29:28 -07:00
Benoit Steiner	b517ab349b	Use the generic ploadquad intrinsics since it does the job	2016-05-24 00:11:17 -07:00
Benoit Steiner	646872cb3b	Worked around missing clang intrinsics	2016-05-24 00:07:08 -07:00
Benoit Steiner	3dfc391a61	Added missing EIGEN_DEVICE_FUNC qualifier	2016-05-23 20:56:59 -07:00
Benoit Steiner	3d0741f027	Include mmintrin.h to make it possible to use mmx instructions when needed. For example, this will enable the definition of a half packet for the Packet4f type.	2016-05-23 20:43:48 -07:00
Benoit Steiner	33a94f5dc7	Use the Index type instead of integers to specify the strides in pgather/pscatter	2016-05-23 20:37:30 -07:00
Benoit Steiner	6bc684ab6a	Added missing alignment in the fp16 packet traits	2016-05-23 20:32:30 -07:00
Benoit Steiner	283e33dea4	ptranspose is not a template.	2016-05-23 19:55:55 -07:00
Benoit Steiner	a5a3ba2b80	Avoid unnecessary float to double conversions	2016-05-23 17:16:09 -07:00
Benoit Steiner	5ba0ebe7c9	Avoid unnecessary float to double conversion.	2016-05-23 17:14:31 -07:00
Benoit Steiner	7d980d74e5	Started to vectorize the processing of 16bit floats on CPU.	2016-05-23 15:21:40 -07:00
Benoit Steiner	5d51a7f12c	Don't optimize the processing of the last rows of a matrix matrix product in cases that violate the assumptions made by the optimized code path.	2016-05-23 15:13:16 -07:00
Christoph Hertzberg	88654762da	Replace multiple constructors of half-type by a generic/templated constructor. This fixes an incompatibility with long double, exposed by the previous commit.	2016-05-23 10:03:03 +02:00
Christoph Hertzberg	718521d5cf	Silenced several double-promotion warnings	2016-05-22 18:17:04 +02:00
Gael Guennebaud	ccaace03c9	Make EIGEN_HAS_CONSTEXPR user configurable	2016-05-20 15:10:08 +02:00
Gael Guennebaud	c3410804cd	Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable	2016-05-20 15:05:38 +02:00
Gael Guennebaud	abd1c1af7a	Make EIGEN_HAS_STD_RESULT_OF user configurable	2016-05-20 15:01:27 +02:00
Gael Guennebaud	1395056fc0	Make EIGEN_HAS_C99_MATH user configurable	2016-05-20 14:58:19 +02:00
Gael Guennebaud	48bf5ec216	Make EIGEN_HAS_RVALUE_REFERENCES user configurable	2016-05-20 14:54:20 +02:00
Gael Guennebaud	f43ae88892	Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES	2016-05-20 14:48:51 +02:00
Gael Guennebaud	998f2efc58	Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used.	2016-05-20 14:44:28 +02:00
Gael Guennebaud	c028d96089	Improve doc of special math functions	2016-05-20 14:18:48 +02:00
Gael Guennebaud	0ba32f99bd	Rename UniformRandom to UnitRandom.	2016-05-20 13:21:34 +02:00
Gael Guennebaud	7a9d9cde94	Fix coding practice in Quaternion::UniformRandom	2016-05-20 13:19:52 +02:00
Joseph Mirabel	eb0cc2573a	bug #823 : add static method to Quaternion for uniform random rotations.	2016-05-20 13:15:40 +02:00
Gael Guennebaud	6761c64d60	zeta and polygamma are not unary functions, but binary ones.	2016-05-19 18:34:16 +02:00
Gael Guennebaud	7a54032408	zeta and digamma do not require C++11/C99	2016-05-19 17:36:47 +02:00
Gael Guennebaud	ce12562710	Add some c++11 flags in documentation	2016-05-19 17:35:30 +02:00
Gael Guennebaud	b6ed8244b4	bug #1201 : optimize affine*vector products	2016-05-19 16:09:15 +02:00
Gael Guennebaud	73693b5de6	bug #1221 : disable gcc 6 warning: ignoring attributes on template argument	2016-05-19 15:21:53 +02:00
Gael Guennebaud	df9a5e13c6	Fix SelfAdjointEigenSolver for some input expression types, and add new regression unit tests for sparse and selfadjointview inputs.	2016-05-19 13:07:33 +02:00
Gael Guennebaud	6a2916df80	DiagonalWrapper is a vector, so it must expose the LinearAccessBit flag.	2016-05-19 13:06:21 +02:00
Gael Guennebaud	a226f6af6b	Add support for SelfAdjointView::diagonal()	2016-05-19 13:05:33 +02:00
Gael Guennebaud	ee7da3c7c5	Fix SelfAdjointView::triangularView for complexes.	2016-05-19 13:01:51 +02:00
Gael Guennebaud	b6b8578a67	bug #1230 : add support for SelfadjointView::triangularView.	2016-05-19 11:36:38 +02:00
Gael Guennebaud	84df9142e7	bug #1231 : fix compilation regression regarding complex_array/=real_array and add respective unit tests	2016-05-18 23:00:13 +02:00
Gael Guennebaud	21d692d054	Use coeff(i,j) instead of operator().	2016-05-18 17:09:20 +02:00
Gael Guennebaud	8456bbbadb	bug #1224 : fix regression in (dense*dense).sparseView() by specializing evaluator<SparseView<Product>> for sparse products only.	2016-05-18 16:53:28 +02:00
Gael Guennebaud	b507b82326	Use default sorting strategy for square products.	2016-05-18 16:51:54 +02:00
Gael Guennebaud	747e3290c0	bug #1213 : rename some enums type for consistency.	2016-05-18 13:26:56 +02:00
Rasmus Munk Larsen	0dbd68145f	Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h.	2016-05-17 10:25:19 -07:00
Rasmus Munk Larsen	e55deb21c5	Improvements to parallelFor. Move some scalar functors from TensorFunctors.h to Eigen core.	2016-05-12 14:07:22 -07:00
Benoit Steiner	fae0493f98	Fixed a couple of bugs related to the Pascal family of GPUs	2016-05-11 23:02:26 -07:00
Benoit Steiner	b6a517c47d	Added the ability to load fp16 using the texture path. Improved the performance of some reductions on fp16	2016-05-11 21:26:48 -07:00
Benoit Steiner	518149e868	Misc fixes for fp16	2016-05-11 20:11:14 -07:00
Benoit Steiner	56a1757d74	Made predux_min and predux_max on fp16 less noisy	2016-05-11 17:37:34 -07:00
Benoit Steiner	9091351dbe	__ldg is only available with cuda architectures >= 3.5	2016-05-11 15:22:13 -07:00
Benoit Steiner	02f76dae2d	Fixed a typo	2016-05-11 15:08:38 -07:00
Christoph Hertzberg	131e5a1a4a	Do not copy for trivial 1x1 case. This also avoids a "maybe-uninitialized" warning in some situations.	2016-05-11 23:50:13 +02:00
Benoit Steiner	70195a5ff7	Added missing EIGEN_DEVICE_FUNC	2016-05-11 14:10:09 -07:00
Benoit Steiner	09a19c33a8	Added missing EIGEN_DEVICE_FUNC qualifiers	2016-05-11 14:07:43 -07:00
Christoph Hertzberg	33ca7e3c8d	bug #1207 : Add and fix logical-op warnings	2016-05-11 19:36:34 +02:00
Benoit Steiner	217d984abc	Fixed a typo in my previous commit	2016-05-11 10:22:15 -07:00
Christoph Hertzberg	0f61343893	Workaround maybe-uninitialized warning	2016-05-11 09:00:18 +02:00
Christoph Hertzberg	3bfc9b47ca	Workaround "misleading-indentation" warnings	2016-05-11 08:41:36 +02:00
Benoit Steiner	0b9e3dcd06	Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%.	2016-05-10 11:05:33 -07:00
Benoit Steiner	8adf5cc70f	Added support for packet processing of fp16 on kepler and maxwell gpus	2016-05-06 19:16:43 -07:00
Christoph Hertzberg	a11bd82dc3	bug #1213 : Give names to anonymous enums	2016-05-06 11:31:56 +02:00
Benoit Steiner	0451940fa4	Relaxed the dummy precision for fp16	2016-05-05 15:40:01 -07:00
Christoph Hertzberg	dacb469bc9	Enable and fix -Wdouble-conversion warnings	2016-05-05 13:35:45 +02:00
Ola Røer Thorsen	be78aea6b3	fix double-promotion/float-conversion in Core/SpecialFunctions.h	2016-05-04 10:52:08 +02:00
Gael Guennebaud	75a94b9662	Improve documentation of BDCSVD	2016-05-04 12:53:14 +02:00
Gael Guennebaud	e2ca478485	bug #1214 : consider denormals as zero in D&C SVD. This also works around an infinite binary search when compiling with ICC's unsafe optimizations.	2016-05-03 23:15:29 +02:00
Benoit Steiner	6c3e5b85bc	Fixed compilation error with cuda >= 7.5	2016-05-03 09:38:42 -07:00
Benoit Steiner	da50419df8	Made a cast explicit	2016-05-02 19:50:22 -07:00
Gael Guennebaud	b1bd53aa6b	Fix performance regression: with AVX, unaligned stores were emitted instead of aligned ones for fixed size assignment.	2016-05-01 23:25:06 +02:00
Benoit Steiner	2b890ae618	Fixed compilation errors generated by clang	2016-04-29 18:30:40 -07:00
Benoit Steiner	46bcb70969	Don't turn on const expressions when compiling with gcc >= 4.8 unless the -std=c++11 option has been used	2016-04-29 15:20:59 -07:00
Gael Guennebaud	0f3c4c8ff4	Fix compilation of sparse.cast<>().transpose().	2016-04-29 18:26:08 +02:00
Benoit Steiner	dacb23277e	Fixed the igamma and igammac implementations to make them callable from a gpu kernel.	2016-04-28 18:54:54 -07:00
Benoit Steiner	a5d4545083	Deleted unused variable	2016-04-28 14:14:48 -07:00
Justin Lebar	40d1e2f8c7	Eliminate mutual recursion in igamma{,c}_impl::Run. Presently, igammac_impl::Run calls igamma_impl::Run, which in turn calls igammac_impl::Run. This isn't actually mutual recursion; the calls are guarded such that we never get into a loop. Nonetheless, it's a stretch for clang to prove this. As a result, clang emits a recursive call in both igammac_impl::Run and igamma_impl::Run. That this is suboptimal code is bad enough, but it's particularly bad when compiling for CUDA/nvptx. nvptx allows recursion, but only begrudgingly: If you have recursive calls in a kernel, it's on you to manually specify the kernel's stack size. Otherwise, ptxas will dump a warning, make a guess, and who knows if it's right. This change explicitly eliminates the mutual recursion in igammac_impl::Run and igamma_impl::Run.	2016-04-28 13:57:08 -07:00
Konstantinos Margaritis	87294c84a6	define Packet2d constants with VSX only	2016-04-28 14:39:56 -03:00
Konstantinos Margaritis	6ed7a7281c	remove accidentally pasted code	2016-04-28 14:35:55 -03:00
Konstantinos Margaritis	62f9093b31	improve state of MathFunctions as well	2016-04-28 14:33:09 -03:00
Konstantinos Margaritis	8ed26120c8	bring Altivec/VSX to a better state, implement some of the missing functions	2016-04-28 14:32:42 -03:00
Konstantinos Margaritis	950158f6d1	add name to copyrights	2016-04-28 14:32:11 -03:00
Konstantinos Margaritis	ee0459300b	minor fix, add to copyright	2016-04-28 14:31:21 -03:00
Benoit Steiner	2b917291d9	Merged in rmlarsen/eigen2 (pull request PR-183) Detect cxx_constexpr support when compiling with clang.	2016-04-27 15:19:54 -07:00
Rasmus Munk Larsen	09b9e951e3	Depend on the more extensive support for constexpr in clang: http://clang.llvm.org/docs/LanguageExtensions.html#c-1y-relaxed-constexpr	2016-04-27 14:59:11 -07:00
Rasmus Munk Larsen	1a325ef71c	Detect cxx_constexpr support when compiling with clang.	2016-04-27 14:33:51 -07:00
Benoit Steiner	c61170e87d	fpclassify isn't portable enough. In particular, the return values of the function are not available on all the platforms Eigen supports: remove it from Eigen.	2016-04-27 14:22:20 -07:00
Benoit Steiner	f629fe95c8	Made the index type a template parameter to evaluateProductBlockingSizes. Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.	2016-04-27 13:11:19 -07:00
Benoit Steiner	25141b69d4	Improved support for min and max on 16 bit floats when running on recent cuda gpus	2016-04-27 12:57:21 -07:00
Benoit Steiner	6744d776ba	Added support for fpclassify in Eigen::Numext	2016-04-27 12:10:25 -07:00
Konstantinos Margaritis	3f80696ae1	Merged eigen/eigen into default	2016-04-22 15:05:21 +03:00
Benoit Steiner	5c372d19e3	Merged in rmlarsen/eigen (pull request PR-179) Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.	2016-04-21 18:06:36 -07:00
Rasmus Munk Larsen	a3256d78d8	Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.	2016-04-21 16:49:28 -07:00
Konstantinos Margaritis	e5b2ef47d5	Merged eigen/eigen into default	2016-04-21 18:03:08 +03:00
Benoit Steiner	80200a1828	Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instructions.	2016-04-20 12:10:27 -07:00
Benoit Steiner	1d0238375d	Made sure all the required header files are included when trying to use fp16	2016-04-19 17:44:12 -07:00
Gael Guennebaud	e4fe611e2c	Enable lazy-coeff-based-product for vector*(1x1) products	2016-04-16 15:17:39 +02:00
Benoit Steiner	1a16fb1532	Deleted extraneous comma.	2016-04-15 15:50:13 -07:00
Gael Guennebaud	2a7115daca	bug #1203 : by-pass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small	2016-04-15 22:34:11 +02:00
Benoit Steiner	1d23430628	Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g. on Haswell CPUs).	2016-04-15 10:53:31 -07:00
Gael Guennebaud	1e80bddde3	Fix trmv for mixing types.	2016-04-15 17:58:36 +02:00
Konstantinos Margaritis	0e8fc31087	remove pgather/pscatter for std::complex<double> for s390x	2016-04-15 07:08:57 -04:00
Benoit Steiner	a62e924656	Added ability to access the cache sizes from the tensor devices	2016-04-14 21:25:06 -07:00
Benoit Steiner	18e6f67426	Added support for exclusive or	2016-04-14 20:37:46 -07:00
Gael Guennebaud	20f387fafa	Improve numerical robustness of JacobiSVD: - avoid noise amplification in complex to real conversion - compare off-diagonal entries to the current biggest diagonal entry: no need to bother about a 2x2 block containing ridiculously small entries compared to the rest of the matrix.	2016-04-14 22:46:55 +02:00
Benoit Steiner	7718749fee	Force the inlining of the << operator on half floats	2016-04-14 11:51:54 -07:00
Benoit Steiner	5379d2b594	Inline the << operator on half floats	2016-04-14 11:40:48 -07:00
Benoit Steiner	5c13765ee3	Added ability to printf fp16	2016-04-14 10:24:52 -07:00
Gael Guennebaud	3551dea887	Cleaning pass on rcond estimator.	2016-04-14 16:45:41 +02:00
Gael Guennebaud	d402adc3d7	Better use .data() than &coeffRef(0)	2016-04-14 15:18:08 +02:00
Gael Guennebaud	ea7087ef31	Merged in rmlarsen/eigen (pull request PR-174) Add matrix condition number estimation module.	2016-04-14 15:11:33 +02:00
Benoit Steiner	36f5a10198	Properly gate the definition of the error and gamma functions for fp16	2016-04-13 18:44:48 -07:00
Benoit Steiner	10b69810d1	Improved support for trigonometric functions on GPU	2016-04-13 16:00:51 -07:00
Benoit Steiner	d6105b53b8	Added basic implementation of the lgamma, digamma, igamma, igammac, polygamma, and zeta function for fp16	2016-04-13 15:26:02 -07:00
Gael Guennebaud	703251f10f	merge	2016-04-13 23:45:10 +02:00
Gael Guennebaud	39211ba46b	Fix JacobiSVD for complex when the complex-to-real update already gives a diagonal 2x2 block.	2016-04-13 23:43:26 +02:00
Benoit Steiner	2986253259	Cleaned up the implementation of digamma	2016-04-13 14:24:06 -07:00
Benoit Steiner	d5de1a8220	Pulled latest updates from trunk	2016-04-13 14:17:11 -07:00
Benoit Steiner	87ca15c4e8	Added support for sin, cos, tan, and tanh on fp16	2016-04-13 14:12:38 -07:00
Gael Guennebaud	feef39e2d1	Fix underflow in JacobiSVD's complex to real preconditioner	2016-04-13 22:49:51 +02:00
Benoit Steiner	bf3f6688f0	Added support for computing cos, sin, tan, and tanh on GPU.	2016-04-13 11:55:08 -07:00
Benoit Steiner	473c8380ea	Added constructors to convert unsigned integers into fp16	2016-04-13 11:03:37 -07:00
Gael Guennebaud	42a3352a3b	Workaround a division by zero when outerstride==0	2016-04-13 19:02:02 +02:00
Gael Guennebaud	6f960b83ff	Make use of is_same_dense helper instead of extract_data to detect input/outputs are the same.	2016-04-13 18:47:12 +02:00
Gael Guennebaud	b7716c0328	Fix incomplete previous patch on matrix comparison.	2016-04-13 18:32:56 +02:00
Gael Guennebaud	2630d97c62	Fix detection of same matrices when both matrices are not handled by extract_data.	2016-04-13 18:26:08 +02:00
Gael Guennebaud	06447e0a39	Improve half-packet vectorization logic to distinguish linear versus inner traversal modes.	2016-04-13 18:15:49 +02:00
Gael Guennebaud	bbb8854bf7	Enable half-packet in reduxions.	2016-04-13 13:02:34 +02:00
Benoit Steiner	aa1ba8bbd2	Don't put a comma at the end of an enumerator list	2016-04-12 16:28:11 -07:00
Gael Guennebaud	b67c983291	Enable the use of half-packet in coeff-based product. For instance, Matrix4f*Vector4f is now vectorized again when using AVX.	2016-04-12 23:03:03 +02:00
Rasmus Larsen	6498dadc2f	Merged eigen/eigen into default	2016-04-11 17:42:05 -07:00
Benoit Steiner	748c4c4599	More accurate cost estimates for exp, log, tanh, and sqrt.	2016-04-11 13:11:04 -07:00
Benoit Steiner	833efb39bf	Added epsilon, dummy_precision, infinity and quiet_NaN NumTraits for fp16	2016-04-11 11:03:56 -07:00
Benoit Steiner	e939b087fe	Pulled latest update from trunk	2016-04-11 11:03:02 -07:00
Gael Guennebaud	0483430283	Move LAPACK declarations from blas.h to lapack.h and fix compatibility with EIGEN_USE_MKL	2016-04-11 17:12:31 +02:00
Gael Guennebaud	097d1e8823	Cleanup obsolete assign_scalar_eig2mkl helper.	2016-04-11 16:09:29 +02:00
Gael Guennebaud	fec4c334ba	Remove all references to MKL in BLAS wrappers.	2016-04-11 16:04:09 +02:00
Gael Guennebaud	ddabc992fa	Fix long to int conversion in BLAS API.	2016-04-11 15:52:01 +02:00
Gael Guennebaud	8191f373be	Silence unused warning.	2016-04-11 15:37:16 +02:00
Gael Guennebaud	6a9ca88e7e	Relax dependency on MKL for EIGEN_USE_BLAS	2016-04-11 15:17:14 +02:00
Gael Guennebaud	4e8e5888d7	Improve constness of blas level-3 interface.	2016-04-11 15:12:44 +02:00
Gael Guennebaud	675e0a2224	Fix static/inline keywords order.	2016-04-11 15:06:20 +02:00
Till Hoffmann	643b697649	Proper handling of domain errors.	2016-04-10 00:37:53 +01:00
Rasmus Munk Larsen	1f70bd4134	Merge.	2016-04-09 15:31:53 -07:00
Rasmus Munk Larsen	096e355f8e	Add short-circuit to avoid calling matrix norm for empty matrix.	2016-04-09 15:29:56 -07:00
Rasmus Larsen	be80fb49fc	Merged default (`4a92b590a0` ) into default	2016-04-09 13:13:01 -07:00
Rasmus Larsen	7a8176587b	Merged eigen/eigen into default	2016-04-09 12:47:41 -07:00
Rasmus Munk Larsen	4a92b590a0	Merge.	2016-04-09 12:47:24 -07:00
Rasmus Munk Larsen	ee6c69733a	A few tiny adjustments to short-circuit logic.	2016-04-09 12:45:49 -07:00
Till Hoffmann	7f4826890c	Merge upstream	2016-04-09 20:08:07 +01:00
Till Hoffmann	de057ebe54	Added nans to zeta function.	2016-04-09 20:07:36 +01:00
Benoit Steiner	5da90fc8dd	Use numext::abs instead of std::abs in scalar_fuzzy_default_impl to make it usable inside GPU kernels.	2016-04-08 19:40:48 -07:00
Benoit Steiner	01bd577288	Fixed the implementation of Eigen::numext::isfinite, Eigen::numext::isnan, and Eigen::numext::isinf on CUDA devices	2016-04-08 16:40:10 -07:00
Benoit Steiner	89a3dc35a3	Fixed isfinite_impl: NumTraits<T>::highest() and NumTraits<T>::lowest() are finite numbers.	2016-04-08 15:56:16 -07:00
Benoit Steiner	995f202cea	Disabled the use of half2 on cuda devices of compute capability < 5.3	2016-04-08 14:43:36 -07:00
Benoit Steiner	8d22967bd9	Initial support for taking the power of fp16	2016-04-08 14:22:39 -07:00
Benoit Steiner	3394379319	Fixed the packet_traits for half floats.	2016-04-08 13:33:59 -07:00
Rasmus Larsen	0b81a18d12	Merged eigen/eigen into default	2016-04-08 12:58:57 -07:00
Benoit Jacob	cd2b667ac8	Add references to filed LLVM bugs	2016-04-08 08:12:47 -04:00
Benoit Steiner	3bd16457e1	Properly handle complex numbers.	2016-04-07 23:28:04 -07:00
Rasmus Larsen	c34e55c62b	Merged eigen/eigen into default	2016-04-07 20:23:03 -07:00
Rasmus Munk Larsen	283c51cd5e	Widen short-circuiting ReciprocalConditionNumberEstimate so we don't call InverseMatrixL1NormEstimate for dec.rows() <= 1.	2016-04-07 16:45:40 -07:00
Rasmus Munk Larsen	d51803a728	Use Index instead of int for indexing and sizes.	2016-04-07 16:39:48 -07:00
Rasmus Munk Larsen	fd872aefb3	Remove transpose() method from LLT and LDLT classes as it would imply conjugation. Explicitly cast constants to RealScalar in ConditionEstimator.h.	2016-04-07 16:28:44 -07:00
Rasmus Munk Larsen	0b5546d182	Use lpNorm<1>() to compute l1 norms in LLT and LDLT.	2016-04-07 15:49:30 -07:00
parthaEth	2d5bb375b7	Static casting scalar types so as to let the Cholesky module of Eigen work with Ceres	2016-04-08 00:14:44 +02:00
Benoit Steiner	74f64838c5	Updated the unary functors to use the numext implementation of typical functions instead of the one provided in the standard library. The standard library functions aren't supported officially by cuda, so we're better off using the numext implementations.	2016-04-07 11:42:14 -07:00
Benoit Steiner	737644366f	Move the functions operating on fp16 out of the std namespace and into the Eigen::numext namespace	2016-04-07 11:40:15 -07:00
Benoit Steiner	b89d3f78b2	Updated the isnan, isinf and isfinite functions to make them compatible with cuda devices.	2016-04-07 10:08:49 -07:00
Benoit Steiner	df838736e2	Fixed compilation warning triggered by msvc	2016-04-06 20:48:55 -07:00
Benoit Steiner	14ea7c7ec7	Fixed packet_traits<half>	2016-04-06 19:30:21 -07:00
Benoit Steiner	532fdf24cb	Added support for hardware conversion between fp16 and full floats whenever possible.	2016-04-06 17:11:31 -07:00
Benoit Steiner	58c1dbff19	Made the fp16 code more portable.	2016-04-06 13:44:08 -07:00
Benoit Steiner	cf7e73addd	Added some missing conversions to the Half class, and fixed the implementation of the < operator on cuda devices.	2016-04-06 09:59:51 -07:00
Benoit Steiner	10bdd8e378	Merged in tillahoffmann/eigen (pull request PR-173) Added zeta function of two arguments and polygamma function	2016-04-06 09:40:17 -07:00
Benoit Steiner	72abfa11dd	Added support for isfinite on fp16	2016-04-06 09:07:30 -07:00
Rasmus Munk Larsen	4d07064a3d	Fix bug in alternate lower bound calculation due to missing parentheses. Make a few expressions more concise.	2016-04-05 16:40:48 -07:00
Konstantinos Margaritis	2bba4ee2cf	Merged kmargar/eigen/tip into default	2016-04-05 22:22:08 +03:00
Konstantinos Margaritis	317384b397	complete the port, remove float support	2016-04-05 14:56:45 -04:00
tillahoffmann	726bd5f077	Merged eigen/eigen into default	2016-04-05 18:21:05 +01:00
Till Hoffmann	a350c25a39	Added accuracy comments.	2016-04-05 18:20:40 +01:00
Konstantinos Margaritis	bc0ad363c6	add remaining includes	2016-04-05 06:01:17 -04:00
Konstantinos Margaritis	2d41dc9622	complete int/double specialized traits for ZVector	2016-04-05 06:00:51 -04:00
Konstantinos Margaritis	988344daf1	enable the other includes as well	2016-04-05 05:59:30 -04:00
Rasmus Larsen	d7eeee0c1d	Merged eigen/eigen into default	2016-04-04 15:58:27 -07:00
Rasmus Munk Larsen	513c372960	Fix docstrings to list all supported decompositions.	2016-04-04 14:34:59 -07:00
Rasmus Munk Larsen	86e0ed81f8	Addresses comments on Eigen pull request PR-174. * Get rid of code-duplication for real vs. complex matrices. * Fix flipped arguments to select. * Make the condition estimation functions free functions. * Use Vector::Unit() to generate canonical unit vectors. * Misc. cleanup.	2016-04-04 14:20:01 -07:00
Benoit Jacob	158fea0f5e	bug #1190 - Don't trust __ARM_FEATURE_FMA on Clang/ARM	2016-04-04 16:42:40 -04:00
Benoit Jacob	03f2997a11	bug #1191 - Prevent Clang/ARM from rewriting VMLA into VMUL+VADD	2016-04-04 16:41:47 -04:00
Till Hoffmann	b97911dd18	Refactored code into type-specific helper functions.	2016-04-04 19:16:03 +01:00
Benoit Steiner	c4179dd470	Updated the scalar_abs_op struct to make it compatible with cuda devices.	2016-04-04 11:11:51 -07:00
Benoit Steiner	1108b4f218	Fixed the signature of numext::abs to make it compatible with complex numbers	2016-04-04 11:09:25 -07:00
Rasmus Larsen	30242b7565	Merged eigen/eigen into default	2016-04-01 17:19:36 -07:00
Rasmus Munk Larsen	9d51f7c457	Add rcond method to LDLT.	2016-04-01 16:48:38 -07:00
Rasmus Munk Larsen	f54137606e	Add condition estimation to Cholesky (LLT) factorization.	2016-04-01 16:19:45 -07:00
Rasmus Munk Larsen	fb8dccc23e	Replace "inline static" with "static inline" for consistency.	2016-04-01 12:48:18 -07:00
Rasmus Munk Larsen	91414e0042	Fix comments in ConditionEstimator and minor cleanup.	2016-04-01 11:58:17 -07:00
Rasmus Munk Larsen	1aa89fb855	Add matrix condition estimator module that implements the Higham/Hager algorithm from http://www.maths.manchester.ac.uk/~higham/narep/narep135.pdf used in LAPACK. Add rcond() methods to FullPivLU and PartialPivLU.	2016-04-01 10:27:59 -07:00
Till Hoffmann	80eba21ad0	Merge upstream.	2016-04-01 18:18:49 +01:00
Till Hoffmann	3cb0a237c1	Fixed suggestions by Eugene Brevdo.	2016-04-01 17:51:39 +01:00
tillahoffmann	49960adbdd	Merged eigen/eigen into default	2016-04-01 14:36:15 +01:00
Till Hoffmann	57239f4a81	Added polygamma function.	2016-04-01 14:35:21 +01:00
Till Hoffmann	dd5d390daf	Added zeta function.	2016-04-01 13:32:29 +01:00
Benoit Steiner	0ea7ab4f62	Hashing was only officially introduced in c++11. Therefore only define an implementation of the hash function for float16 if c++11 is enabled.	2016-03-31 14:44:55 -07:00
Benoit Steiner	92b7f7b650	Improved code formatting	2016-03-31 13:09:58 -07:00
Benoit Steiner	f197813f37	Added the ability to hash a fp16	2016-03-31 13:09:23 -07:00
Benoit Steiner	4c859181da	Made it possible to use the NumTraits for complex and Array in a cuda kernel.	2016-03-31 12:48:38 -07:00
Benoit Steiner	c36ab19902	Added __ldg primitive for fp16.	2016-03-31 10:55:03 -07:00
Benoit Steiner	b575fb1d02	Added NumTraits for half floats	2016-03-31 10:43:59 -07:00
Benoit Steiner	8c8a79cec1	Fixed a typo	2016-03-31 10:33:32 -07:00
Benoit Steiner	4f1a7e51c1	Pull math functions from the global namespace only when compiling cuda code with nvcc. When compiling with clang, we want to use the std namespace.	2016-03-30 17:59:49 -07:00
Benoit Steiner	bc68fc2fe7	Enable constant expressions when compiling cuda code with clang.	2016-03-30 17:58:32 -07:00
Benoit Jacob	01b5333e44	bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain	2016-03-30 11:02:33 -04:00
Benoit Steiner	1841d6d4c3	Added missing cuda template specializations for numext::ceil	2016-03-29 13:29:34 -07:00
Benoit Steiner	e02b784ec3	Added support for standard mathematical functions and transcendentals (such as exp, log, abs, ...) on fp16	2016-03-29 09:20:36 -07:00
Benoit Steiner	c38295f0a0	Added support for fmod	2016-03-28 15:53:02 -07:00
Konstantinos Margaritis	01e7298fe6	actually include ZVector files, passes most basic tests (float still fails)	2016-03-28 10:58:02 -04:00
Konstantinos Margaritis	f48011119e	Merged eigen/eigen into default	2016-03-28 01:48:45 +03:00
Konstantinos Margaritis	ed6b9d08f1	some primitives ported, but missing intrinsics and crash with asm() are a problem	2016-03-27 18:47:49 -04:00
Benoit Steiner	65716e99a5	Improved the cost estimate of the quotient op	2016-03-25 11:13:53 -07:00
Benoit Steiner	d94f6ba965	Started to model the cost of divisions more accurately.	2016-03-25 11:02:56 -07:00
Benoit Steiner	2e4e4cb74d	Use numext::abs instead of abs to avoid incorrect conversion to integer of the argument	2016-03-23 16:57:12 -07:00
Benoit Steiner	81d340984a	Removed executable bit from header files	2016-03-23 16:15:02 -07:00
Benoit Steiner	bff8cbad06	Removed executable bit from header files	2016-03-23 16:14:23 -07:00
Benoit Steiner	7a570e50ef	Fixed contractions of fp16	2016-03-23 16:00:06 -07:00
Benoit Steiner	fc3660285f	Made type conversion explicit	2016-03-23 09:56:50 -07:00
Benoit Steiner	0e68882604	Added the ability to divide a half float by an index	2016-03-23 09:46:42 -07:00
Benoit Steiner	6971146ca9	Added more conversion operators for half floats	2016-03-23 09:44:52 -07:00
Benoit Steiner	f9ad25e4d8	Fixed contractions of 16 bit floats	2016-03-22 09:30:23 -07:00
Benoit Steiner	134d750eab	Completed the implementation of vectorized type casting of half floats.	2016-03-18 13:36:28 -07:00
Benoit Steiner	7bd551b3a9	Make all the conversions explicit	2016-03-18 12:20:08 -07:00
Benoit Steiner	7b98de1f15	Implemented some of the missing type casting for half floats	2016-03-17 21:45:45 -07:00
Christoph Hertzberg	46aa9772fc	Merged in ebrevdo/eigen (pull request PR-169) Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.	2016-03-16 21:59:08 +01:00
Eugene Brevdo	1f69a1b65f	Change the header guard around certain numext functions to be CUDA specific.	2016-03-16 12:44:35 -07:00
Benoit Steiner	5a51366ea5	Fixed a typo.	2016-03-14 09:25:16 -07:00
Benoit Steiner	fcf59e1c37	Properly gate the use of cuda intrinsics in the code	2016-03-14 09:13:44 -07:00
Benoit Steiner	97a1f1c273	Make sure we only use the half float intrinsic when compiling with a version of CUDA that is recent enough to provide them	2016-03-14 08:37:58 -07:00
Benoit Steiner	e29c9676b1	Don't mark the cast operator as explicit, since this is a c++11 feature that's not supported by older compilers.	2016-03-12 00:15:58 -08:00
Benoit Steiner	eecd914864	Also replaced uint32_t with unsigned int to make the code more portable	2016-03-11 19:34:21 -08:00
Benoit Steiner	1ca8c1ec97	Replaced a couple more uint16_t with unsigned short	2016-03-11 19:28:28 -08:00
Benoit Steiner	0423b66187	Use unsigned short instead of uint16_t since they're more portable	2016-03-11 17:53:41 -08:00
Benoit Steiner	048c4d6efd	Made half floats usable on hardware that doesn't support them natively.	2016-03-11 17:21:42 -08:00
Benoit Steiner	456e038a4e	Fixed the +=, -=, *= and /= operators to return a reference	2016-03-10 15:17:44 -08:00
Eugene Brevdo	836e92a051	Update MathFunctions/SpecialFunctions with intelligent header guards.	2016-03-09 09:04:45 -08:00
Eugene Brevdo	5e7de771e3	Properly fix merge issues.	2016-03-08 17:35:05 -08:00
Eugene Brevdo	73220d2bb0	Resolve bad merge.	2016-03-08 17:28:21 -08:00
Eugene Brevdo	14f0fde51f	Add certain functions to numext (log, exp, tan) because CUDA doesn't support std::. Use these in SpecialFunctions.	2016-03-08 17:17:44 -08:00
Eugene Brevdo	0bb5de05a1	Finishing touches on igamma/igammac for GPU. Tests now pass.	2016-03-07 15:35:09 -08:00
Eugene Brevdo	5707004d6b	Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes. 0. Prior to this PR, not a single sharded CUDA test was actually being run. Fixed that. GPU tests are still failing for igamma/igammac. 1. Add calls for igamma/igammac to TensorBase 2. Fix up CUDA-specific calls of igamma/igammac 3. Add unit tests for digamma, igamma, igammac in CUDA.	2016-03-07 14:08:56 -08:00
Benoit Steiner	05bbca079a	Turn on some of the cxx11 features when compiling with visual studio 2015	2016-03-05 10:52:08 -08:00
Eugene Brevdo	0b9e0abc96	Make igamma and igammac work correctly. This required replacing ::abs with std::abs. Modified some unit tests.	2016-03-04 21:12:10 -08:00
Eugene Brevdo	7ea35bfa1c	Initial implementation of igamma and igammac.	2016-03-03 19:39:41 -08:00
Benoit Steiner	1032441c6f	Enable partial support for half floats on Kepler GPUs.	2016-03-03 10:34:20 -08:00
Benoit Steiner	1da10a7358	Enable the conversion between floats and half floats on older GPUs that support it.	2016-03-03 10:33:20 -08:00

4884 Commits