Commit Graph

1934 Commits

Author SHA1 Message Date
Benoit Steiner
488ad7dd1b Added missing EIGEN_DEVICE_FUNC qualifiers 2016-09-14 13:35:00 -07:00
Benoit Steiner
028e299577 Fixed a bug impacting some outer reductions on GPU 2016-09-12 18:36:52 -07:00
Benoit Steiner
8321dcce76 Merged latest updates from trunk 2016-09-12 10:33:05 -07:00
Benoit Steiner
eb6ba00cc8 Properly size the list of waiters 2016-09-12 10:31:55 -07:00
Benoit Steiner
a618094b62 Added a resize method to MaxSizeVector 2016-09-12 10:30:53 -07:00
Gael Guennebaud
471eac5399 bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) 2016-09-08 08:36:27 +02:00
Gael Guennebaud
e1642f485c bug #1288: fix memory leak in arpack wrapper. 2016-09-05 18:01:30 +02:00
Benoit Steiner
13df3441ae Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instructions to initialize it. This can result in random crashes when compiling with AVX instructions enabled. 2016-09-02 19:25:47 -07:00
Benoit Steiner
cadd124d73 Pulled latest update from trunk 2016-09-02 15:30:02 -07:00
Benoit Steiner
05b0518077 Made the index type an explicit template parameter to help some compilers compile the code. 2016-09-02 15:29:34 -07:00
Benoit Steiner
adf864fec0 Merged in rmlarsen/eigen (pull request PR-222)
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 14:11:20 -07:00
Rasmus Munk Larsen
13e93ca8b7 Fix CUDA build broken by changes to min and max reduction. 2016-09-02 13:41:36 -07:00
Benoit Steiner
c53f783705 Updated the contraction code to support constant inputs. 2016-09-01 11:41:27 -07:00
Gael Guennebaud
46475eff9a Adjust Tensor module wrt recent change in nullary functor 2016-09-01 13:40:45 +02:00
Rasmus Munk Larsen
a1e092d1e8 Fix bugs to make the min and max reducers work correctly with IEEE infinities. 2016-08-31 15:04:16 -07:00
Gael Guennebaud
1f84f0d33a merge EulerAngles module 2016-08-30 10:01:53 +02:00
Gael Guennebaud
e074f720c7 Include missing forward declaration of SparseMatrix 2016-08-29 18:56:46 +02:00
Gael Guennebaud
35a8e94577 bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. 2016-08-29 10:59:37 +02:00
Gael Guennebaud
965e595f02 Add missing log1p method 2016-08-26 14:55:00 +02:00
Benoit Steiner
34ae80179a Use array_prod instead of calling TotalSize since TotalSize is only available on DSizes. 2016-08-15 10:29:14 -07:00
Benoit Steiner
fe73648c98 Fixed a bug in the documentation. 2016-08-12 10:00:43 -07:00
Benoit Steiner
e3a8dfb02f std::erfcf doesn't exist: use numext::erfc instead 2016-08-11 15:24:06 -07:00
Benoit Steiner
64e68cbe87 Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything. 2016-08-08 19:29:59 -07:00
Benoit Steiner
ca2cee2739 Merged in ibab/eigen (pull request PR-206)
Expose real and imag methods on Tensors
2016-08-03 11:53:04 -07:00
Benoit Steiner
a20b58845f CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running. 2016-08-03 10:00:43 -07:00
Benoit Steiner
fd220dd8b0 Use numext::conj instead of std::conj 2016-08-01 18:16:16 -07:00
Benoit Steiner
e256acec7c Avoid unnecessary object copies 2016-08-01 17:03:39 -07:00
Benoit Steiner
2693fd54bf bug #1266: half implementation has been moved to half_impl namespace 2016-07-29 13:45:56 -07:00
Gael Guennebaud
cc2f6d68b1 bug #1264: fix compilation 2016-07-27 23:30:47 +02:00
Gael Guennebaud
8972323c08 bug #1261: add missing max(ADS,ADS) overload (same for min) 2016-07-27 14:52:48 +02:00
Gael Guennebaud
0d7039319c bug #1260: remove doubtful specializations of ScalarBinaryOpTraits 2016-07-27 14:35:52 +02:00
Benoit Steiner
3d3d34e442 Deleted dead code. 2016-07-25 08:53:37 -07:00
Gael Guennebaud
6d5daf32f5 bug #1255: comment out broken and unused line. 2016-07-25 14:48:30 +02:00
Gael Guennebaud
f9598d73b5 bug #1250: fix pow() for AutoDiffScalar with custom nested scalar type. 2016-07-25 14:42:19 +02:00
Gael Guennebaud
fd1117f2be Implement digits10 for mpreal 2016-07-25 14:38:55 +02:00
Gael Guennebaud
9908020d36 Add minimal support for Array<string>, and fix Tensor<string> 2016-07-25 14:25:56 +02:00
Benoit Steiner
c6b0de2c21 Improved partial reductions in more cases 2016-07-22 17:18:20 -07:00
Gael Guennebaud
0f350a8b7e Fix CUDA compilation 2016-07-21 18:47:07 +02:00
Yi Lin
7b4abc2b1d Fixed a code comment error 2016-07-20 22:28:54 +08:00
Benoit Steiner
20f7ef2f89 An evalTo expression is aligned iff both the lhs and the rhs are aligned. 2016-07-12 10:56:42 -07:00
Benoit Steiner
3a2dd352ae Improved the contraction mapper to properly support tensor products 2016-07-11 13:43:41 -07:00
Benoit Steiner
0bc020be9d Improved the detection of packet size in the tensor scan evaluator. 2016-07-11 12:14:56 -07:00
Gael Guennebaud
a96a7ce3f7 Move CUDA's special functions to SpecialFunctions module. 2016-07-11 18:39:11 +02:00
Gael Guennebaud
fd60966310 merge 2016-07-11 18:11:47 +02:00
Gael Guennebaud
194daa3048 Fix assertion (it did not make sense for static_val types) 2016-07-11 11:39:27 +02:00
Gael Guennebaud
18c35747ce Emulate _BitScanReverse64 for 32-bit builds 2016-07-11 11:38:04 +02:00
Gael Guennebaud
599f8ba617 Change runtime to compile-time conditional. 2016-07-08 11:39:43 +02:00
Gael Guennebaud
544935101a Fix warnings 2016-07-08 11:38:52 +02:00
Gael Guennebaud
2f7e2614e7 bug #1232: refactor special functions as a new SpecialFunctions module, currently in unsupported/. 2016-07-08 11:13:55 +02:00
Gael Guennebaud
179ebb88f9 Fix warning 2016-07-07 09:16:40 +02:00
Gael Guennebaud
ce9fc0ce14 fix clang compilation 2016-07-04 12:59:02 +02:00
Gael Guennebaud
440020474c Workaround compilation issue with msvc 2016-07-04 12:49:19 +02:00
Igor Babuschkin
78f37ca03c Expose real and imag methods on Tensors 2016-07-01 17:34:31 +01:00
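For illustration, a minimal sketch of what this exposes, assuming the unsupported Tensor header path of this period and that real()/imag() yield component-wise tensor expressions:

```cpp
#include <complex>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<std::complex<float>, 1> t(4);
  t.setConstant(std::complex<float>(1.0f, 2.0f));
  // real() and imag() return tensor expressions over the components.
  Eigen::Tensor<float, 1> re = t.real();  // 1 1 1 1
  Eigen::Tensor<float, 1> im = t.imag();  // 2 2 2 2
  return 0;
}
```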
Benoit Steiner
cb2d8b8fa6 Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu. 2016-06-29 15:42:01 -07:00
Benoit Steiner
b2a47641ce Made the code compile when using CUDA architecture < 300 2016-06-29 15:32:47 -07:00
Igor Babuschkin
85699850d9 Add missing CUDA kernel to tensor scan op
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
2016-06-29 11:54:35 +01:00
Benoit Steiner
75c333f94c Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor.
Also avoid taking references to values that may become stale after a copy construction.
2016-06-27 10:32:38 -07:00
Benoit Steiner
7944d4431f Made the cost model cwiseMax and cwiseMin methods const to help the PowerPC cuda compiler compile this code. 2016-08-18 13:46:36 -07:00
Benoit Steiner
647a51b426 Force the inlining of a simple accessor. 2016-08-18 12:31:02 -07:00
Benoit Steiner
a452dedb4f Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
Enable efficient Tensor reduction for doubles on the GPU (continued)
2016-08-18 12:29:54 -07:00
Igor Babuschkin
18c67df31c Fix remaining CUDA >= 300 checks 2016-08-18 17:18:30 +01:00
Igor Babuschkin
1569a7d7ab Add the necessary CUDA >= 300 checks back 2016-08-18 17:15:12 +01:00
Benoit Steiner
2b17f34574 Properly detect the type of the result of a contraction. 2016-08-16 16:00:30 -07:00
Igor Babuschkin
841e075154 Remove CUDA >= 300 checks and enable outer reduction for doubles 2016-08-06 18:07:50 +01:00
Igor Babuschkin
0425118e2a Merge upstream changes 2016-08-05 14:34:57 +01:00
Igor Babuschkin
9537e8b118 Make use of atomicExch for atomicExchCustom 2016-08-05 14:29:58 +01:00
Igor Babuschkin
eeb0d880ee Enable efficient Tensor reduction for doubles 2016-07-01 19:08:26 +01:00
Gael Guennebaud
cfff370549 Fix hyperbolic functions for autodiff. 2016-06-24 23:21:35 +02:00
Gael Guennebaud
3852351793 merge pull request 198 2016-06-24 11:48:17 +02:00
Gael Guennebaud
6dd9077070 Fix some unused typedef warnings. 2016-06-24 11:34:21 +02:00
Gael Guennebaud
ce90647fa5 Fix NumTraits<AutoDiff> 2016-06-24 11:34:02 +02:00
Gael Guennebaud
fa39f81b48 Fix instantiation of ScalarBinaryOpTraits for AutoDiff. 2016-06-24 11:33:30 +02:00
Rasmus Munk Larsen
a9c1e4d7b7 Return -1 from CurrentThreadId when called by thread outside the pool. 2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2 Resolve merge. 2016-06-23 15:08:03 -07:00
Gael Guennebaud
360a743a10 bug #1241: does not emit anything for empty tensors 2016-06-23 18:47:31 +02:00
Gael Guennebaud
7c6561485a merge PR 194 2016-06-23 15:29:57 +02:00
Benoit Steiner
a29a2cb4ff Silenced a couple of compilation warnings generated by xcode 2016-06-22 16:43:02 -07:00
Benoit Steiner
f8fcd6b32d Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers 2016-06-22 16:03:11 -07:00
Benoit Steiner
c58df31747 Handle empty tensors in the print functions 2016-06-21 09:22:43 -07:00
Benoit Steiner
de32f8d656 Fixed the printing of rank-0 tensors 2016-06-20 10:46:45 -07:00
Tal Hadad
8e198d6835 Complete docs and add ostream operator for EulerAngles. 2016-06-19 20:42:45 +03:00
Geoffrey Lalonde
72c95383e0 Add autodiff coverage for standard library hyperbolic functions, and tests.
* * *
Corrected tanh derivative, moved test definitions.
* * *
Added more test cases, removed lingering lines
2016-06-15 23:33:19 -07:00
Benoit Steiner
7d495d890a Merged in ibab/eigen (pull request PR-197)
Implement exclusive scan option for Tensor library
2016-06-14 17:54:59 -07:00
Benoit Steiner
aedc5be1d6 Avoid generating pseudo random numbers that are multiples of 5: this helps
spread the load over multiple cpus without having to rely on work stealing.
2016-06-14 17:51:47 -07:00
Igor Babuschkin
c4d10e921f Implement exclusive scan option 2016-06-14 19:44:07 +01:00
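A minimal usage sketch, assuming the scan API of this period where cumsum takes the scan axis plus an optional exclusive flag:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<int, 1> t(4);
  t.setValues({1, 2, 3, 4});
  Eigen::Tensor<int, 1> inclusive = t.cumsum(0);        // 1 3 6 10
  Eigen::Tensor<int, 1> exclusive = t.cumsum(0, true);  // 0 1 3  6
  return 0;
}
```

An exclusive scan shifts the running total by one element, so each output omits its own input.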
Gael Guennebaud
76236cdea4 merge 2016-06-14 15:33:47 +02:00
Gael Guennebaud
62134082aa Update AutoDiffScalar wrt scalar-multiple. 2016-06-14 15:06:35 +02:00
Gael Guennebaud
5d38203735 Update Tensor module to use bind1st_op and bind2nd_op 2016-06-14 15:06:03 +02:00
Tal Hadad
6edfe8771b A little bit of docs 2016-06-13 22:03:19 +03:00
Tal Hadad
6e1c086593 Add static assertion 2016-06-13 21:55:17 +03:00
Gael Guennebaud
3c12e24164 Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them. 2016-06-13 16:18:59 +02:00
Tal Hadad
06206482d9 More docs, and minor code fixes 2016-06-12 23:40:17 +03:00
Benoit Steiner
65d33e5898 Merged in ibab/eigen (pull request PR-195)
Add small fixes to TensorScanOp
2016-06-10 19:31:17 -07:00
Benoit Steiner
a05607875a Don't refer to the half2 type unless it's been defined 2016-06-10 11:53:56 -07:00
Igor Babuschkin
86aedc9282 Add small fixes to TensorScanOp 2016-06-07 20:06:38 +01:00
Benoit Steiner
84b2060a9e Fixed compilation error with gcc 4.4 2016-06-06 17:16:19 -07:00
Benoit Steiner
7ef9f47b58 Misc small improvements to the reduction code. 2016-06-06 14:09:46 -07:00
Tal Hadad
e30133e439 Doc EulerAngles class, and minor fixes. 2016-06-06 22:01:40 +03:00
Benoit Steiner
9137f560f0 Moved assertions to the constructor to make the code more portable 2016-06-06 07:26:48 -07:00
Gael Guennebaud
66e99ab6a1 Relax mixing-type constraints for binary coefficient-wise operators:
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binary operators (comparisons,pow,etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
2016-06-06 15:11:41 +02:00
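For context, a hedged sketch of the extension point this introduces; MyReal is a hypothetical user-defined scalar used only for illustration (a real custom type would also need a NumTraits specialization and the usual operator overloads):

```cpp
#include <Eigen/Core>

// Hypothetical custom scalar type, for illustration only.
struct MyReal {
  double v;
  MyReal(double x = 0) : v(x) {}
};
inline MyReal operator*(const MyReal& a, double b) { return MyReal(a.v * b); }
inline MyReal operator*(double a, const MyReal& b) { return MyReal(a * b.v); }

namespace Eigen {
// Declare the result type of mixed MyReal/double binary operations so that
// mixed-type coefficient-wise expressions are accepted.
template<typename BinaryOp>
struct ScalarBinaryOpTraits<MyReal, double, BinaryOp> { typedef MyReal ReturnType; };
template<typename BinaryOp>
struct ScalarBinaryOpTraits<double, MyReal, BinaryOp> { typedef MyReal ReturnType; };
}

int main() { return 0; }
```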
Rasmus Munk Larsen
f1f2ff8208 size_t -> int 2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen
76308e7fd2 Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool. 2016-06-03 16:28:58 -07:00
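A small sketch of the new queries, assuming the standalone ThreadPool header of this period; per the commit above, CurrentThreadId returns -1 when called from a thread that is not part of the pool:

```cpp
#include <chrono>
#include <cstdio>
#include <thread>
#include <unsupported/Eigen/CXX11/ThreadPool>

int main() {
  Eigen::ThreadPool pool(2);
  std::printf("NumThreads: %d\n", pool.NumThreads());            // 2
  std::printf("outside the pool: %d\n", pool.CurrentThreadId()); // -1
  pool.Schedule([&pool] {
    // Inside a worker, the id is in [0, NumThreads).
    std::printf("inside the pool: %d\n", pool.CurrentThreadId());
  });
  // Crude synchronization, just to keep the example short.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  return 0;
}
```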
Benoit Steiner
37638dafd7 Simplified the code that dispatches vectorized reductions on GPU 2016-06-09 10:29:52 -07:00
Benoit Steiner
66796e843d Fixed definition of some of the reducer_traits 2016-06-09 08:50:01 -07:00
Benoit Steiner
14a112ee15 Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression. 2016-06-09 08:25:22 -07:00
Benoit Steiner
8f92c26319 Improved code formatting 2016-06-09 08:23:42 -07:00
Benoit Steiner
aa33446dac Improved support for vectorization of 16-bit floats 2016-06-09 08:22:27 -07:00
Benoit Steiner
d6d39c7ddb Added missing EIGEN_DEVICE_FUNC 2016-06-07 14:35:08 -07:00
Gael Guennebaud
e8b922ca63 Fix MatrixFunctions module. 2016-06-03 09:21:35 +02:00
Benoit Steiner
c3c8ad8046 Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing 2016-06-02 21:17:41 -07:00
Eugene Brevdo
39baff850c Add TernaryFunctors and the betainc SpecialFunction.
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.

Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.

Added unit tests in array.cpp and cxx11_tensor_cuda.cu


Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc().  Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe, and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
2016-06-02 17:04:19 -07:00
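A minimal array-level sketch, assuming the free function exposed by the new SpecialFunctions module:

```cpp
#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/SpecialFunctions>

int main() {
  Eigen::ArrayXf a(3), b(3), x(3);
  a << 0.5f, 2.0f, 5.0f;
  b << 2.0f, 2.0f, 0.5f;
  x << 0.1f, 0.5f, 0.9f;
  // Element-wise regularized incomplete beta function I_x(a, b).
  std::cout << Eigen::betainc(a, b, x) << std::endl;
  return 0;
}
```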
Benoit Steiner
c21eaedce6 Use array_prod to compute the number of elements contained in the input tensor expression 2016-06-04 07:47:04 -07:00
Benoit Steiner
36a4500822 Merged in ibab/eigen (pull request PR-192)
Add generic scan method
2016-06-03 17:28:33 -07:00
Benoit Steiner
c2a102345f Improved the performance of full reductions.
AFTER:
BM_fullReduction/10        4541       4543     154017  21.0M items/s
BM_fullReduction/64        5191       5193     100000  752.5M items/s
BM_fullReduction/512       9588       9588      71361  25.5G items/s
BM_fullReduction/4k      244314     244281       2863  64.0G items/s
BM_fullReduction/5k      359382     359363       1946  64.8G items/s

BEFORE:
BM_fullReduction/10        9085       9087      74395  10.5M items/s
BM_fullReduction/64        9478       9478      72014  412.1M items/s
BM_fullReduction/512      14643      14646      46902  16.7G items/s
BM_fullReduction/4k      260338     260384       2678  60.0G items/s
BM_fullReduction/5k      385076     385178       1818  60.5G items/s
2016-06-03 17:27:08 -07:00
Igor Babuschkin
dc03b8f3a1 Add generic scan method 2016-06-03 17:37:04 +01:00
Rasmus Munk Larsen
811aadbe00 Add syntactic sugar to Eigen tensors to allow more natural syntax.
Specifically, this enables expressions involving:

scalar + tensor
scalar * tensor
scalar / tensor
scalar - tensor
2016-06-02 12:41:28 -07:00
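A short sketch of the expressions this enables (previously the scalar had to appear on the right-hand side):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> t(2, 2);
  t.setConstant(4.0f);
  Eigen::Tensor<float, 2> a = 1.0f + t;  // scalar + tensor
  Eigen::Tensor<float, 2> b = 2.0f * t;  // scalar * tensor
  Eigen::Tensor<float, 2> c = 8.0f / t;  // scalar / tensor
  Eigen::Tensor<float, 2> d = 5.0f - t;  // scalar - tensor
  return 0;
}
```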
Tal Hadad
52e4cbf539 Merged eigen/eigen into default 2016-06-02 22:15:20 +03:00
Tal Hadad
2aaaf22623 Fix Gael reports (except documentation)
- "Scalar angle(int) const"  should be  "const Vector& angles() const"
- then method "coeffs" could be removed.
- avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;)
- about the "fromRotation" methods:
 - replace the ones which are not static by operator= (as in Quaternion)
 - the others are actually static methods: use a capital F: FromRotation
- method "invert" should be removed.
- use a macro to define both float and double EulerAnglesXYZ* typedefs
- AddConstIf -> not used
- no need for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants:
  if(IsHeadingOpposite!=IsEven) res.alpha() = -res.alpha();
2016-06-02 22:12:57 +03:00
Igor Babuschkin
fbd7ed6ff7 Add tensor scan op
This is the initial implementation of a generic scan operation.
Based on this, cumsum and cumprod methods have been added to TensorBase.
2016-06-02 13:35:47 +01:00
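For illustration, the TensorBase entry points this adds, sketched under the header layout of this period:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> t(3, 2);
  t.setConstant(2.0f);
  // Scan along axis 0: running sum / running product down each column.
  Eigen::Tensor<float, 2> sums  = t.cumsum(0);   // each column: 2 4 6
  Eigen::Tensor<float, 2> prods = t.cumprod(0);  // each column: 2 4 8
  return 0;
}
```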
Benoit Steiner
0ed08fd281 Use a single PacketSize variable 2016-06-01 21:19:05 -07:00
Benoit Steiner
8f6fedc55f Fixed compilation warning 2016-06-01 21:14:46 -07:00
Benoit Steiner
873e6ac54b Silenced compilation warning generated by nvcc. 2016-06-01 14:20:50 -07:00
Benoit Steiner
d27b0ad4c8 Added support for mean reductions on fp16 2016-06-01 11:12:07 -07:00
Benoit Steiner
5aeb3687c4 Only enable optimized reductions of fp16 if the reduction functor supports them 2016-05-31 10:33:40 -07:00
Benoit Steiner
e2946d962d Reimplement clamp as a static function. 2016-05-27 12:58:43 -07:00
Benoit Steiner
e96d36d4cd Use NULL instead of nullptr to preserve the compatibility with cxx03 2016-05-27 12:54:06 -07:00
Benoit Steiner
abc815798b Added a new operation to enable more powerful tensor indexing. 2016-05-27 12:22:25 -07:00
Gael Guennebaud
22a035db95 Fix compilation when defaulting to row-major 2016-05-27 10:31:11 +02:00
Benoit Steiner
1ae2567861 Fixed some compilation warnings 2016-05-26 15:57:19 -07:00
Benoit Steiner
1a47844529 Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g. fp16 to float) 2016-05-26 14:37:09 -07:00
Benoit Steiner
36369ab63c Resolved merge conflicts 2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a Merged latest reduction improvements 2016-05-26 12:19:33 -07:00
Benoit Steiner
c1c7f06c35 Improved the performance of inner reductions. 2016-05-26 11:53:59 -07:00
Benoit Steiner
8288b0aec2 Code cleanup. 2016-05-26 09:00:04 -07:00
Benoit Steiner
2d7ed54ba2 Made the static storage class qualifier come first. 2016-05-25 22:16:15 -07:00
Benoit Steiner
e1fca8866e Deleted unnecessary explicit qualifiers. 2016-05-25 22:15:26 -07:00
Benoit Steiner
9b0aaf5113 Don't mark inline functions as static since it confuses the ICC compiler 2016-05-25 22:10:11 -07:00
Benoit Steiner
037a463fd5 Marked unused variables as such 2016-05-25 22:07:48 -07:00
Benoit Steiner
3ac4045272 Made the IndexPair code compile in non cxx11 mode 2016-05-25 15:15:12 -07:00
Benoit Steiner
66556d0e05 Made the index pair list code more portable across various compilers 2016-05-25 14:34:27 -07:00
Benoit Steiner
034aa3b2c0 Improved the performance of tensor padding 2016-05-25 11:43:08 -07:00
Benoit Steiner
58026905ae Added support for statically known lists of pairs of indices 2016-05-25 11:04:14 -07:00
Benoit Steiner
0835667329 There is no need to make the fp16 full reduction kernel a static function. 2016-05-24 23:11:56 -07:00
Benoit Steiner
b5d6b52a4d Fixed compilation warning 2016-05-24 23:10:57 -07:00
Benoit Steiner
a09cbf9905 Merged in rmlarsen/eigen (pull request PR-188)
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-23 12:55:12 -07:00
Christoph Hertzberg
718521d5cf Silenced several double-promotion warnings 2016-05-22 18:17:04 +02:00
Christoph Hertzberg
25a03c02d6 Fix some sign-compare warnings 2016-05-22 16:42:27 +02:00
Gael Guennebaud
ccaace03c9 Make EIGEN_HAS_CONSTEXPR user configurable 2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable 2016-05-20 15:05:38 +02:00
Gael Guennebaud
48bf5ec216 Make EIGEN_HAS_RVALUE_REFERENCES user configurable 2016-05-20 14:54:20 +02:00
Gael Guennebaud
f43ae88892 Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES 2016-05-20 14:48:51 +02:00
Gael Guennebaud
2f656ce447 Remove std:: to enable custom scalar types. 2016-05-19 23:13:47 +02:00
Rasmus Larsen
b1e080c752 Merged eigen/eigen into default 2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen
5624219b6b Merge. 2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen
7df811cfe5 Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. 2016-05-18 15:09:48 -07:00
Benoit Steiner
bb3ff8e9d9 Advertise the packet api of the tensor reducers iff the corresponding packet primitives are available. 2016-05-18 14:52:49 -07:00
Gael Guennebaud
548a487800 bug #1229: bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway. 2016-05-18 16:44:05 +02:00
Gael Guennebaud
43790e009b Pass argument by const ref instead of by value in pow(AutoDiffScalar...) 2016-05-18 16:28:02 +02:00
Gael Guennebaud
1fbfab27a9 bug #1223: fix compilation of AutoDiffScalar's min/max operators, and add regression unit test. 2016-05-18 16:26:26 +02:00
Gael Guennebaud
448d9d943c bug #1222: fix compilation in AutoDiffScalar and add respective unit test 2016-05-18 16:00:11 +02:00
Rasmus Munk Larsen
f519fca72b Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor. 2016-05-17 16:06:00 -07:00
Benoit Steiner
86ae94462e #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. 2016-05-17 14:06:15 -07:00
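In other words, opting back into the old pool is now an explicit request; a minimal sketch of such a translation unit:

```cpp
// The non-blocking pool is the default; define this to get the old one back.
#define EIGEN_USE_SIMPLE_THREAD_POOL
#define EIGEN_USE_THREADS
#include <unsupported/Eigen/CXX11/Tensor>

int main() { return 0; }
```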
Benoit Steiner
997c335970 Fixed compilation error 2016-05-17 12:54:18 -07:00
Benoit Steiner
ebf6ada5ee Fixed compilation error in the tensor thread pool 2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen
0bb61b04ca Merge upstream. 2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen
0dbd68145f Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h. 2016-05-17 10:25:19 -07:00
Rasmus Larsen
00228f2506 Merged eigen/eigen into default 2016-05-17 09:49:31 -07:00
Benoit Steiner
e7e64c3277 Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit:
Before:
BM_broadcasting/10        500000       3690    27.10 MFlops/s
BM_broadcasting/80        500000       4014  1594.24 MFlops/s
BM_broadcasting/640       100000      14770 27731.35 MFlops/s
BM_broadcasting/4K          5000     632711 39512.48 MFlops/s
After:
BM_broadcasting/10        500000       4287    23.33 MFlops/s
BM_broadcasting/80        500000       4455  1436.41 MFlops/s
BM_broadcasting/640       200000      10195 40173.01 MFlops/s
BM_broadcasting/4K          5000     423746 58997.57 MFlops/s
2016-05-17 09:24:35 -07:00
Benoit Steiner
5fa27574dd Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10            5000000        460   217.03 MFlops/s
BM_padding/80            5000000        460 13899.40 MFlops/s
BM_padding/640           5000000        461 888421.17 MFlops/s
BM_padding/4K            5000000        460 54316322.55 MFlops/s
After:
BM_padding/10            5000000        454   220.20 MFlops/s
BM_padding/80            5000000        455 14039.86 MFlops/s
BM_padding/640           5000000        452 904968.83 MFlops/s
BM_padding/4K            5000000        411 60750049.21 MFlops/s
2016-05-17 09:17:26 -07:00
Benoit Steiner
8d06c02ffd Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10            5000000        460   217.03 MFlops/s
BM_padding/80            5000000        460 13899.40 MFlops/s
BM_padding/640           5000000        461 888421.17 MFlops/s
BM_padding/4K            5000000        460 54316322.55 MFlops/s
After:
BM_padding/10            5000000        454   220.20 MFlops/s
BM_padding/80            5000000        455 14039.86 MFlops/s
BM_padding/640           5000000        452 904968.83 MFlops/s
BM_padding/4K            5000000        411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00
David Dement
ccc7563ac5 made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process 2016-05-16 14:26:41 -04:00
Benoit Steiner
a80d875916 Added missing costPerCoeff method 2016-05-16 09:31:10 -07:00
Benoit Steiner
83ef39e055 Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions.
Before:
BM_colReduction_12T/10       1000000       1949    51.29 MFlops/s
BM_colReduction_12T/80        100000      15636   409.29 MFlops/s
BM_colReduction_12T/640        20000      95100  4307.01 MFlops/s
BM_colReduction_12T/4K           500    4573423  5466.36 MFlops/s
BM_colReduction_4T/10        1000000       1867    53.56 MFlops/s
BM_colReduction_4T/80         500000       5288  1210.11 MFlops/s
BM_colReduction_4T/640         10000     106924  3830.75 MFlops/s
BM_colReduction_4T/4K            500    9946374  2513.48 MFlops/s
BM_colReduction_8T/10        1000000       1912    52.30 MFlops/s
BM_colReduction_8T/80         200000       8354   766.09 MFlops/s
BM_colReduction_8T/640         20000      85063  4815.22 MFlops/s
BM_colReduction_8T/4K            500    5445216  4591.19 MFlops/s
BM_rowReduction_12T/10       1000000       2041    48.99 MFlops/s
BM_rowReduction_12T/80        100000      15426   414.87 MFlops/s
BM_rowReduction_12T/640        50000      39117 10470.98 MFlops/s
BM_rowReduction_12T/4K           500    3034298  8239.14 MFlops/s
BM_rowReduction_4T/10        1000000       1834    54.51 MFlops/s
BM_rowReduction_4T/80         500000       5406  1183.81 MFlops/s
BM_rowReduction_4T/640         50000      35017 11697.16 MFlops/s
BM_rowReduction_4T/4K            500    3428527  7291.76 MFlops/s
BM_rowReduction_8T/10        1000000       1925    51.95 MFlops/s
BM_rowReduction_8T/80         200000       8519   751.23 MFlops/s
BM_rowReduction_8T/640         50000      33441 12248.42 MFlops/s
BM_rowReduction_8T/4K           1000    2852841  8763.19 MFlops/s


After:
BM_colReduction_12T/10      50000000         59  1678.30 MFlops/s
BM_colReduction_12T/80       5000000        725  8822.71 MFlops/s
BM_colReduction_12T/640        20000      90882  4506.93 MFlops/s
BM_colReduction_12T/4K           500    4668855  5354.63 MFlops/s
BM_colReduction_4T/10       50000000         59  1687.37 MFlops/s
BM_colReduction_4T/80        5000000        737  8681.24 MFlops/s
BM_colReduction_4T/640         50000     108637  3770.34 MFlops/s
BM_colReduction_4T/4K            500    7912954  3159.38 MFlops/s
BM_colReduction_8T/10       50000000         60  1657.21 MFlops/s
BM_colReduction_8T/80        5000000        726  8812.48 MFlops/s
BM_colReduction_8T/640         20000      91451  4478.90 MFlops/s
BM_colReduction_8T/4K            500    5441692  4594.16 MFlops/s
BM_rowReduction_12T/10      20000000         93  1065.28 MFlops/s
BM_rowReduction_12T/80       2000000        950  6730.96 MFlops/s
BM_rowReduction_12T/640        50000      38196 10723.48 MFlops/s
BM_rowReduction_12T/4K           500    3019217  8280.29 MFlops/s
BM_rowReduction_4T/10       20000000         93  1064.30 MFlops/s
BM_rowReduction_4T/80        2000000        959  6667.71 MFlops/s
BM_rowReduction_4T/640         50000      37433 10941.96 MFlops/s
BM_rowReduction_4T/4K            500    3036476  8233.23 MFlops/s
BM_rowReduction_8T/10       20000000         93  1072.47 MFlops/s
BM_rowReduction_8T/80        2000000        959  6670.04 MFlops/s
BM_rowReduction_8T/640         50000      38069 10759.37 MFlops/s
BM_rowReduction_8T/4K           1000    2758988  9061.29 MFlops/s
2016-05-16 08:55:21 -07:00
Benoit Steiner
b789a26804 Fixed syntax error 2016-05-16 08:51:08 -07:00
Benoit Steiner
83dfb40f66 Turn on the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define. 2016-05-13 17:23:15 -07:00
Benoit Steiner
97605c7b27 New multithreaded contraction that doesn't rely on the thread pool to run the closures in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closures in any order. 2016-05-13 17:11:29 -07:00
Benoit Steiner
c4fc8b70ec Removed unnecessary thread synchronization 2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31 Fixed compilation errors triggered by old versions of gcc 2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen
5005b27fc8 Disabled the cost model by accident. Revert. 2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen
989e419328 Address comments by bsteiner. 2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen
e55deb21c5 Improvements to parallelFor.
Move some scalar functors from TensorFunctors.h to Eigen core.
2016-05-12 14:07:22 -07:00
Benoit Steiner
ae9688f313 Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel. 2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45 Fixed potential race condition in the non blocking thread pool 2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec Replace implicit cast with an explicit one 2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061 Worked around compilation errors with older versions of gcc 2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82 Improved the portability of the tensor code 2016-05-11 23:29:09 -07:00
Benoit Steiner
b6a517c47d Added the ability to load fp16 using the texture path.
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Christoph Hertzberg
1a1ce6ff61 Removed deprecated flag (which apparently was ignored anyway) 2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65 fixed some double-promotion and sign-compare warnings 2016-05-11 23:02:26 +02:00
Benoit Steiner
217d984abc Fixed a typo in my previous commit 2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48 Fix potential race condition in the CUDA reduction code. 2016-05-11 10:08:51 -07:00
Benoit Steiner
6a5717dc74 Explicitly initialize all the atomic variables. 2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1 Properly gate the use of half2. 2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092 Added support for fp16 to the sigmoid functor. 2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552 Small improvement to the full reduction of fp16 2016-05-10 11:58:18 -07:00
Benoit Steiner
4013b8feca Simplified the reduction code a little. 2016-05-10 09:40:42 -07:00
Benoit Steiner
4670d7d5ce Improved the performance of full reductions on GPU:
Before:
BM_fullReduction/10       200000      11751     8.51 MFlops/s
BM_fullReduction/80         5000     523385    12.23 MFlops/s
BM_fullReduction/640          50   36179326    11.32 MFlops/s
BM_fullReduction/4K            1 2173517195    11.50 MFlops/s

After:
BM_fullReduction/10       500000       5987    16.70 MFlops/s
BM_fullReduction/80       200000      10636   601.73 MFlops/s
BM_fullReduction/640       50000      58428  7010.31 MFlops/s
BM_fullReduction/4K         1000    2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58 Added the ability to use a scratch buffer in cuda kernels 2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2 Added a new parallelFor api to the thread pool device. 2016-05-09 10:45:12 -07:00
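A hedged sketch of the new entry point, assuming the signature of this period (a range length, a per-item TensorOpCost estimate, and a callback invoked over [first, last) sub-ranges):

```cpp
#define EIGEN_USE_THREADS
#include <vector>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::ThreadPool pool(4);
  Eigen::ThreadPoolDevice device(&pool, 4);
  std::vector<float> data(1 << 20, 1.0f);
  // Per-element cost estimate (bytes loaded, bytes stored, compute cycles);
  // the device uses it to choose a block size before sharding the range.
  Eigen::TensorOpCost cost(sizeof(float), sizeof(float), 1);
  device.parallelFor(static_cast<Eigen::Index>(data.size()), cost,
                     [&](Eigen::Index first, Eigen::Index last) {
                       for (Eigen::Index i = first; i < last; ++i)
                         data[i] *= 2.0f;
                     });
  return 0;
}
```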
Benoit Steiner
dc7dbc2df7 Optimized the non blocking thread pool:
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
* Steal only 1 task from a remote queue instead of half of the tasks.
2016-05-09 10:17:17 -07:00
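The permutation trick is worth spelling out; a self-contained illustrative sketch (not Eigen's actual code) of visiting every victim queue exactly once in pseudo-random order by stepping with a stride coprime to the queue count:

```cpp
#include <cstdio>
#include <random>
#include <vector>

int main() {
  const unsigned kQueues = 8;
  // Precompute all strides coprime to kQueues; any such stride generates a
  // full permutation of {0, ..., kQueues - 1}.
  std::vector<unsigned> coprimes;
  for (unsigned i = 1; i <= kQueues; ++i) {
    unsigned a = i, b = kQueues;
    while (b) { unsigned t = a % b; a = b; b = t; }  // gcd(i, kQueues)
    if (a == 1) coprimes.push_back(i);
  }
  std::mt19937 rng(42);
  unsigned r = rng();
  unsigned victim = r % kQueues;
  unsigned inc = coprimes[r % coprimes.size()];
  for (unsigned i = 0; i < kQueues; ++i) {
    std::printf("try stealing one task from queue %u\n", victim);
    victim = (victim + inc) % kQueues;  // next queue in the permutation
  }
  return 0;
}
```

Because the stride is coprime to the queue count, all queues are visited before any repeats, which is exactly the property the first bullet above relies on.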
Benoit Steiner
c54ae65c83 Marked a few tensor operations as read only 2016-05-05 17:18:47 -07:00
Benoit Steiner
910e013506 Relaxed an assertion that was tighter than necessary. 2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658 Fixed some incorrect assertions 2016-05-05 10:02:26 -07:00
Benoit Steiner
a4d6e8fef0 Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. 2016-05-05 09:25:55 -07:00
Benoit Steiner
f363e533aa Added tests for full contractions using thread pools and gpu devices.
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58 Updated the contraction code to ensure that full contractions return a tensor of rank 0 2016-05-05 08:37:47 -07:00
Christoph Hertzberg
dacb469bc9 Enable and fix -Wdouble-conversion warnings 2016-05-05 13:35:45 +02:00
Benoit Steiner
dd2b45feed Removed extraneous 'explicit' keywords 2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae Use numext::isfinite instead of std::isfinite 2016-05-03 19:56:40 -07:00
Benoit Steiner
aad9a04da4 Deleted superfluous explicit keyword. 2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b Fixed compilation error 2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8 Added missing accessors to fixed sized tensors 2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e Deleted trailing commas 2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2 Deleted useless trailing commas 2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb Deleted unnecessary trailing commas. 2016-04-29 18:33:46 -07:00
Benoit Steiner
f100d1494c Return the proper size (ie 1) for tensors of rank 0 2016-04-29 18:14:33 -07:00
Benoit Steiner
a8c0405cf5 Deleted unused default values for template parameters 2016-04-29 16:34:43 -07:00
Benoit Steiner
c07404f6a1 Restore Tensor support for non c++11 compilers 2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021 Fixed include path 2016-04-29 15:11:09 -07:00
Gael Guennebaud
318e65e0ae Fix missing inclusion of Eigen/Core 2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases. 2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133 Refactor the unsupported CXX11/Core module to internal headers only. 2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46 Fixed the partial evaluation of non vectorizable tensor subexpressions 2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260 Refined the cost of the striding operation. 2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e Provide access to the base threadpool classes 2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a Added the ability to switch to the new thread pool with a #define 2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b Fixed several compilation warnings 2016-04-21 11:03:02 -07:00
Benoit Steiner
2dde1b1028 Don't crash when attempting to reduce empty tensors. 2016-04-20 18:08:20 -07:00
Benoit Steiner
c7c2054bb5 Started to implement a portable way to yield. 2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028 Implemented a more portable version of thread local variables 2016-04-19 15:56:02 -07:00
Benoit Steiner
5b1106c56b Fixed a compilation error with nvcc 7. 2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db Simplified the code that launches cuda kernels. 2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d Don't take the address of a kernel on CUDA devices that don't support this feature. 2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058 Use numext::ceil instead of std::ceil 2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b Avoid an unnecessary copy of the evaluator. 2016-04-19 13:54:28 -07:00
Benoit Steiner
50968a0a3e Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. 2016-04-19 11:53:58 -07:00
Benoit Steiner
c8e8f93d6c Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. 2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a Deleted unnecessary variable 2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a Fixed a few compilation warnings 2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f Merged in rmlarsen/eigen (pull request PR-178)
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b Get rid of void* casting when calling EvalRange::run. 2016-04-15 12:51:33 -07:00
Benoit Steiner
a62e924656 Added ability to access the cache sizes from the tensor devices 2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426 Added support for exclusive or 2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02 Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. 2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d Added missing definition of PacketSize in the gpu evaluator of convolution 2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e Merged in rmlarsen/eigen (pull request PR-177)
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
9c064b5a97 Cleanup 2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41 Prepared the migration to the new non blocking thread pool 2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b Improvements to cost model. 2016-04-14 15:52:58 -07:00
Benoit Steiner
78a51abc12 Added a more scalable non blocking thread pool 2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7 Merge upstream updates. 2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6 Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. 2016-04-14 13:57:35 -07:00
Benoit Steiner
5912ad877c Silenced a compilation warning 2016-04-14 11:40:14 -07:00
Benoit Steiner
c7167fee0e Added support for fp16 to the sigmoid function 2016-04-14 10:08:33 -07:00
Benoit Steiner
3b76df64fc Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on device of capability >= 5.3 2016-04-12 10:58:51 -07:00
Benoit Steiner
7d5b17087f Added missing EIGEN_DEVICE_FUNC to the tensor conversion code. 2016-04-07 20:01:19 -07:00
Benoit Steiner
48308ed801 Added support for isinf, isnan, and isfinite checks to the tensor api 2016-04-07 09:48:36 -07:00
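A minimal sketch of the new element-wise checks, assuming they yield boolean tensor expressions:

```cpp
#include <limits>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> t(3);
  t.setValues({1.0f, std::numeric_limits<float>::quiet_NaN(),
               std::numeric_limits<float>::infinity()});
  Eigen::Tensor<bool, 1> nan_mask = t.isnan();     // 0 1 0
  Eigen::Tensor<bool, 1> inf_mask = t.isinf();     // 0 0 1
  Eigen::Tensor<bool, 1> fin_mask = t.isfinite();  // 1 0 0
  return 0;
}
```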
Benoit Steiner
cfb34d808b Fixed a possible integer overflow. 2016-04-07 08:46:52 -07:00
Benoit Steiner
7be1eaad1e Fixed typos in the implementation of the zeta and polygamma ops. 2016-04-06 14:15:37 -07:00
tillahoffmann
726bd5f077 Merged eigen/eigen into default 2016-04-05 18:21:05 +01:00
Gael Guennebaud
4d7e230d2f bug #1189: fix pow/atan2 compilation for AutoDiffScalar 2016-04-05 14:49:41 +02:00
Till Hoffmann
80eba21ad0 Merge upstream. 2016-04-01 18:18:49 +01:00
Till Hoffmann
ffd770ce94 Fixed CUDA signature. 2016-04-01 17:58:24 +01:00
tillahoffmann
49960adbdd Merged eigen/eigen into default 2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81 Added polygamma function. 2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf Added zeta function. 2016-04-01 13:32:29 +01:00
Benoit Steiner
3da495e6b9 Relaxed the condition used to gate the fft code. 2016-03-31 18:11:51 -07:00
Benoit Steiner
0f5cc504fe Properly gate the fft code 2016-03-31 12:59:39 -07:00
Benoit Steiner
af4ef540bf Fixed an off-by-one bug in a debug assertion 2016-03-30 18:37:19 -07:00
Benoit Steiner
791e5cfb69 Added NumTraits for type2index. 2016-03-30 18:36:36 -07:00
Benoit Steiner
483aaad10a Fixed compilation warning 2016-03-30 17:08:13 -07:00
Benoit Steiner
1b40abbf99 Added missing assignment operator to the TensorUInt128 class, and made misc small improvements 2016-03-30 13:17:03 -07:00
Benoit Steiner
aa45ad2aac Fixed the formatting of the README. 2016-03-29 15:06:13 -07:00
Benoit Steiner
56df5ef1d7 Attempt to fix the formatting of the README 2016-03-29 15:03:38 -07:00
Benoit Steiner
c38295f0a0 Added support for fmod 2016-03-28 15:53:02 -07:00
Benoit Steiner
6772f653c3 Made it possible to customize the threadpool 2016-03-28 10:01:04 -07:00
Benoit Steiner
1bc81f7889 Fixed compilation warnings on arm 2016-03-28 09:21:04 -07:00
Benoit Steiner
78f83d6f6a Prevent potential overflow. 2016-03-28 09:18:04 -07:00
Benoit Steiner
74f91ed06c Improved support for integer modulo 2016-03-25 17:21:56 -07:00
Benoit Steiner
41434a8a85 Avoid unnecessary conversions 2016-03-23 16:52:38 -07:00
Benoit Steiner
92693b50eb Fixed compilation warning 2016-03-23 16:40:36 -07:00
Benoit Steiner
393bc3b16b Added comment 2016-03-23 16:22:15 -07:00
Christoph Hertzberg
9642fd7a93 Replace all M_PI by EIGEN_PI and add a check to the testsuite. 2016-03-23 15:37:45 +01:00
Benoit Steiner
3d1e857327 Fixed compilation error 2016-03-22 15:48:28 -07:00
Benoit Steiner
de7d92c259 Pulled latest updates from trunk 2016-03-22 15:24:49 -07:00
Benoit Steiner
002cf0d1c9 Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead 2016-03-22 15:24:23 -07:00
Benoit Steiner
bc2b802751 Fixed a couple of typos 2016-03-22 14:27:34 -07:00
Benoit Steiner
6a31b7be3e Avoid using std::vector whenever possible 2016-03-22 14:02:50 -07:00
Benoit Steiner
65a7113a36 Use an enum instead of a static const int to prevent possible link error 2016-03-22 09:33:54 -07:00
Benoit Steiner
f9ad25e4d8 Fixed contractions of 16 bit floats 2016-03-22 09:30:23 -07:00
Benoit Steiner
8ef3181f15 Worked around a constness related issue 2016-03-21 11:24:05 -07:00
Benoit Steiner
7a07d6aa2b Small cleanup 2016-03-21 11:12:17 -07:00
Benoit Steiner
e91f255301 Marked variables that are only used in debug mode as such 2016-03-21 10:02:00 -07:00
Benoit Steiner
db5c14de42 Explicitly cast the default value into the proper scalar type. 2016-03-21 09:52:58 -07:00
Benoit Steiner
8e03333f06 Renamed some class members to make the code more readable. 2016-03-18 15:21:04 -07:00
Benoit Steiner
6c08943d9f Fixed a bug in the padding of extracted image patches. 2016-03-18 15:19:10 -07:00
Benoit Steiner
9a7ece9caf Worked around constness issue 2016-03-18 10:38:29 -07:00
Benoit Steiner
edc679f6c6 Fixed compilation warning 2016-03-18 07:12:34 -07:00
Benoit Steiner
70eb70f5f8 Avoid mutable class members when possible 2016-03-17 21:47:18 -07:00
Benoit Steiner
95b8961a9b Allocate the mersenne twister used by the random number generators on the heap instead of on the stack since they tend to keep a lot of state (i.e. about 5k) around. 2016-03-17 15:23:51 -07:00
Benoit Steiner
f7329619da Fix bug in tensor contraction. The code assumes that the contraction axis indices for the LHS (after possibly swapping to ColMajor!) are increasing. Explicitly sort the contraction axis pairs to make it so. 2016-03-17 15:08:02 -07:00
Christoph Hertzberg
46aa9772fc Merged in ebrevdo/eigen (pull request PR-169)
Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.
2016-03-16 21:59:08 +01:00
Benoit Steiner
b72ffcb05e Made the comparison of Eigen::array GPU friendly 2016-03-11 16:37:59 -08:00
Benoit Steiner
25f69cb932 Added a comparison operator for Eigen::array
Alias Eigen::array to std::array when compiling with Visual Studio 2015
2016-03-11 15:20:37 -08:00
Benoit Steiner
86d45a3c83 Worked around visual studio compilation warnings. 2016-03-09 21:29:39 -08:00
Benoit Steiner
8fd4241377 Fixed a typo. 2016-03-10 02:28:46 +00:00
Benoit Steiner
a685a6beed Made the list reductions less ambiguous. 2016-03-09 17:41:52 -08:00
Benoit Steiner
3149b5b148 Avoid implicit cast 2016-03-09 17:35:17 -08:00
Benoit Steiner
b2100b83ad Made sure to include the <random> header file when compiling with visual studio 2016-03-09 16:03:16 -08:00
Benoit Steiner
f05fb449b8 Avoid unnecessary conversion from 32bit int to 64bit unsigned int 2016-03-09 15:27:45 -08:00
Benoit Steiner
1d566417d2 Enable the random number generators when compiling with visual studio 2016-03-09 10:55:11 -08:00
Benoit Steiner
b084133dbf Fixed the integer division code on windows 2016-03-09 07:06:36 -08:00
Benoit Steiner
6d30683113 Fixed static assertion 2016-03-08 21:02:51 -08:00
Eugene Brevdo
5e7de771e3 Properly fix merge issues. 2016-03-08 17:35:05 -08:00
Benoit Steiner
46177c8d64 Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues. 2016-03-08 16:37:27 -08:00
Benoit Steiner
6d6413f768 Simplified the full reduction code 2016-03-08 16:02:00 -08:00
Benoit Steiner
5a427a94a9 Fixed the tensor generator code 2016-03-08 13:28:06 -08:00
Benoit Steiner
a81b88bef7 Fixed the tensor concatenation code 2016-03-08 12:30:19 -08:00
Benoit Steiner
551ff11d0d Fixed the tensor layout swapping code 2016-03-08 12:28:10 -08:00
Benoit Steiner
8768c063f5 Fixed the tensor chipping code. 2016-03-08 12:26:49 -08:00
Benoit Steiner
e09eb835db Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit. 2016-03-08 12:07:33 -08:00
Benoit Steiner
3b614a2358 Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly. 2016-03-07 17:53:28 -08:00
Benoit Steiner
769685e74e Added the ability to pad a tensor using a non-zero value 2016-03-07 14:45:37 -08:00
Benoit Steiner
7f87cc3a3b Fix a couple of typos in the code. 2016-03-07 14:31:27 -08:00
Eugene Brevdo
5707004d6b Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.

GPU tests are still failing for igamma/igammac.

1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
9a54c3e32b Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be. 2016-03-06 09:38:56 -08:00
Benoit Steiner
05bbca079a Turn on some of the cxx11 features when compiling with visual studio 2015 2016-03-05 10:52:08 -08:00
Benoit Steiner
23aed8f2e4 Use EIGEN_PI instead of redefining our own constant PI 2016-03-05 08:04:45 -08:00
Benoit Steiner
ec35068edc Don't rely on the M_PI constant since not all compilers provide it. 2016-03-04 16:42:38 -08:00
Benoit Steiner
60d9df11c1 Fixed the computation of leading zeros when compiling with msvc. 2016-03-04 16:27:02 -08:00
Benoit Steiner
c561eeb7bf Don't use implicit type conversions in initializer lists since not all compilers support them. 2016-03-04 14:12:45 -08:00
Benoit Steiner
2c50fc878e Fixed a typo 2016-03-04 14:09:38 -08:00
Benoit Steiner
5cf4558c0a Added support for rounding, flooring, and ceiling to the tensor api 2016-03-03 12:36:55 -08:00
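A minimal sketch of the three new element-wise ops:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> t(3);
  t.setValues({1.2f, -2.7f, 3.5f});
  Eigen::Tensor<float, 1> r = t.round();  //  1 -3  4
  Eigen::Tensor<float, 1> f = t.floor();  //  1 -3  3
  Eigen::Tensor<float, 1> c = t.ceil();   //  2 -2  4
  return 0;
}
```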
Benoit Steiner
68ac5c1738 Improved the performance of large outer reductions on cuda 2016-02-29 18:11:58 -08:00
Benoit Steiner
b2075cb7a2 Made the signature of the inner and outer reducers consistent 2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045 Optimized the performance of narrow reductions on CUDA devices 2016-02-29 10:48:16 -08:00
Benoit Steiner
609b3337a7 Print some information to stderr when a CUDA kernel fails 2016-02-27 20:42:57 +00:00
Benoit Steiner
ac2e6e0d03 Properly vectorized the random number generators 2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag 2016-02-26 12:38:18 -08:00
Benoit Steiner
2cd32cad27 Reverted previous commit since it caused more problems than it solved 2016-02-26 13:21:44 +00:00
Benoit Steiner
d9d05dd96e Fixed handling of long doubles on aarch64 2016-02-26 04:13:58 -08:00
Benoit Steiner
c36c09169e Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices. 2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b Marked the And and Or reducers as stateless. 2016-02-24 16:43:01 -08:00
Benoit Steiner
1d9256f7db Updated the padding code to work with half floats 2016-02-23 05:51:22 +00:00
Benoit Steiner
72d2cf642e Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode. 2016-02-22 15:29:41 -08:00
Benoit Steiner
5cd00068c0 include <iostream> in the tensor header since we now use it to better report cuda initialization errors 2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463 Fixed compilation warning generated by clang 2016-02-21 22:43:37 -08:00
Benoit Steiner
96a24b05cc Optimized casting of tensors in the case where the casting happens to be a no-op 2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f Prevent unnecessary Index to int conversions 2016-02-21 08:49:36 -08:00
Rasmus Munk Larsen
8eb127022b Get rid of duplicate code. 2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447 Speed up tensor FFT by up to ~25-50%.
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8            132       134     -1.5%
BM_tensor_fft_single_1D_cpu/9           1162      1229     -5.8%
BM_tensor_fft_single_1D_cpu/16           199       195     +2.0%
BM_tensor_fft_single_1D_cpu/17          2587      2267    +12.4%
BM_tensor_fft_single_1D_cpu/32           373       341     +8.6%
BM_tensor_fft_single_1D_cpu/33          5922      4879    +17.6%
BM_tensor_fft_single_1D_cpu/64           797       675    +15.3%
BM_tensor_fft_single_1D_cpu/65         13580     10481    +22.8%
BM_tensor_fft_single_1D_cpu/128         1753      1375    +21.6%
BM_tensor_fft_single_1D_cpu/129        31426     22789    +27.5%
BM_tensor_fft_single_1D_cpu/256         4005      3008    +24.9%
BM_tensor_fft_single_1D_cpu/257        70910     49549    +30.1%
BM_tensor_fft_single_1D_cpu/512         8989      6524    +27.4%
BM_tensor_fft_single_1D_cpu/513       165402    107751    +34.9%
BM_tensor_fft_single_1D_cpu/999       198293    115909    +41.5%
BM_tensor_fft_single_1D_cpu/1ki        21289     14143    +33.6%
BM_tensor_fft_single_1D_cpu/1k        361980    233355    +35.5%
BM_tensor_fft_double_1D_cpu/8            138       131     +5.1%
BM_tensor_fft_double_1D_cpu/9           1253      1133     +9.6%
BM_tensor_fft_double_1D_cpu/16           218       200     +8.3%
BM_tensor_fft_double_1D_cpu/17          2770      2392    +13.6%
BM_tensor_fft_double_1D_cpu/32           406       368     +9.4%
BM_tensor_fft_double_1D_cpu/33          6418      5153    +19.7%
BM_tensor_fft_double_1D_cpu/64           856       728    +15.0%
BM_tensor_fft_double_1D_cpu/65         14666     11148    +24.0%
BM_tensor_fft_double_1D_cpu/128         1913      1502    +21.5%
BM_tensor_fft_double_1D_cpu/129        36414     24072    +33.9%
BM_tensor_fft_double_1D_cpu/256         4226      3216    +23.9%
BM_tensor_fft_double_1D_cpu/257        86638     52059    +39.9%
BM_tensor_fft_double_1D_cpu/512         9397      6939    +26.2%
BM_tensor_fft_double_1D_cpu/513       203208    114090    +43.9%
BM_tensor_fft_double_1D_cpu/999       237841    125583    +47.2%
BM_tensor_fft_double_1D_cpu/1ki        20921     15392    +26.4%
BM_tensor_fft_double_1D_cpu/1k        455183    250763    +44.9%
BM_tensor_fft_single_2D_cpu/8           1051      1005     +4.4%
BM_tensor_fft_single_2D_cpu/9          16784     14837    +11.6%
BM_tensor_fft_single_2D_cpu/16          4074      3772     +7.4%
BM_tensor_fft_single_2D_cpu/17         75802     63884    +15.7%
BM_tensor_fft_single_2D_cpu/32         20580     16931    +17.7%
BM_tensor_fft_single_2D_cpu/33        345798    278579    +19.4%
BM_tensor_fft_single_2D_cpu/64         97548     81237    +16.7%
BM_tensor_fft_single_2D_cpu/65       1592701   1227048    +23.0%
BM_tensor_fft_single_2D_cpu/128       472318    384303    +18.6%
BM_tensor_fft_single_2D_cpu/129      7038351   5445308    +22.6%
BM_tensor_fft_single_2D_cpu/256      2309474   1850969    +19.9%
BM_tensor_fft_single_2D_cpu/257     31849182  23797538    +25.3%
BM_tensor_fft_single_2D_cpu/512     10395194   8077499    +22.3%
BM_tensor_fft_single_2D_cpu/513     144053843  104242541    +27.6%
BM_tensor_fft_single_2D_cpu/999     279885833  208389718    +25.5%
BM_tensor_fft_single_2D_cpu/1ki     45967677  36070985    +21.5%
BM_tensor_fft_single_2D_cpu/1k      619727095  456489500    +26.3%
BM_tensor_fft_double_2D_cpu/8           1110      1016     +8.5%
BM_tensor_fft_double_2D_cpu/9          17957     15768    +12.2%
BM_tensor_fft_double_2D_cpu/16          4558      4000    +12.2%
BM_tensor_fft_double_2D_cpu/17         79237     66901    +15.6%
BM_tensor_fft_double_2D_cpu/32         21494     17699    +17.7%
BM_tensor_fft_double_2D_cpu/33        357962    290357    +18.9%
BM_tensor_fft_double_2D_cpu/64        105179     87435    +16.9%
BM_tensor_fft_double_2D_cpu/65       1617143   1288006    +20.4%
BM_tensor_fft_double_2D_cpu/128       512848    419397    +18.2%
BM_tensor_fft_double_2D_cpu/129      7271322   5636884    +22.5%
BM_tensor_fft_double_2D_cpu/256      2415529   1922032    +20.4%
BM_tensor_fft_double_2D_cpu/257     32517952  24462177    +24.8%
BM_tensor_fft_double_2D_cpu/512     10724898   8287617    +22.7%
BM_tensor_fft_double_2D_cpu/513     146007419  108603266    +25.6%
BM_tensor_fft_double_2D_cpu/999     296351330  221885776    +25.1%
BM_tensor_fft_double_2D_cpu/1ki     59334166  48357539    +18.5%
BM_tensor_fft_double_2D_cpu/1k      666660132  483840349    +27.4%
2016-02-19 16:29:23 -08:00
Benoit Steiner
46fc23f91c Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. 2016-02-19 13:44:22 -08:00
Benoit Steiner
670db7988d Updated the contraction code to make it compatible with half floats. 2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a Added support for tensor reductions on half floats 2016-02-19 10:05:59 -08:00
Benoit Steiner
f268db1c4b Added the ability to query the minor version of a cuda device 2016-02-19 16:31:04 +00:00
Benoit Steiner
f3352e0fb0 Don't make the array constructors explicit 2016-02-19 15:58:57 +00:00
Benoit Steiner
cd042dbbfd Fixed a bug in the tensor type converter 2016-02-19 15:03:26 +00:00
Benoit Steiner
de345eff2e Added a method to conjugate the content of a tensor or the result of a tensor expression. 2016-02-11 16:34:07 -08:00
Benoit Steiner
9a21b38ccc Worked around a few clang compilation warnings 2016-02-10 08:02:04 -08:00
Benoit Steiner
72ab7879f7 Fixed clang compilation warnings 2016-02-10 06:48:28 -08:00
Benoit Steiner
e88535634d Fixed some clang compilation warnings 2016-02-09 23:32:41 -08:00
Benoit Steiner
d69946183d Updated the TensorIntDivisor code to work properly on LLP64 systems 2016-02-08 21:03:59 -08:00
Benoit Steiner
4d4211c04e Avoid unnecessary type conversions 2016-02-05 18:19:41 -08:00
Benoit Steiner
f535378995 Added support for vectorized type casting of int to char. 2016-02-03 18:58:29 -08:00
Benoit Steiner
4ab63a3f6f Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements. 2016-02-03 17:23:07 -08:00
Benoit Steiner
1cbb79cdfd Made sure the dummy element of the size-0 array is always initialized to silence some compiler warnings 2016-02-03 15:58:26 -08:00
Benoit Steiner
dc413dbe8a Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6 Use EIGEN_STATIC_ASSERT for backward compatibility. 2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341 Don't try to use direct offsets when computing a tensor product, since the required stride isn't available. 2016-02-02 11:06:53 -08:00
Ville Kallioniemi
aedea349aa Replace separate low word constructors with a single templated constructor. 2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f Rebase to latest. 2016-02-01 19:32:31 -07:00
Benoit Steiner
6b5dff875e Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. 2016-02-01 12:46:32 -08:00
Benoit Steiner
e80ed948e1 Fixed a number of compilation warnings generated by the cuda tests 2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf Fixed a few compilation warnings 2016-01-31 16:48:50 -08:00
Benoit Steiner
963f2d2a8f Marked several methods EIGEN_DEVICE_FUNC 2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0 Fixed a couple of compilation warnings. 2016-01-28 23:15:45 -08:00
Gael Guennebaud
ddf64babde merge 2016-01-28 13:21:48 +01:00
Benoit Steiner
4bf9eaf77a Deleted an invalid assertion that prevented the assignment of empty tensors. 2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885 Fixed some compilation problems with nvcc + clang 2016-01-27 15:37:03 -08:00
Gael Guennebaud
9c8f7dfe94 bug #1156: fix several function declarations whose arguments were passed by value instead of being passed by reference 2016-01-27 18:34:42 +01:00
Ville Kallioniemi
02db1228ed Add constructor for long types. 2016-01-26 23:41:01 -07:00
Hauke Heibel
5eb2790be0 Fixed minor typo in SplineFitting. 2016-01-25 22:17:52 +01:00
Benoit Steiner
e3a15a03a4 Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression 2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e Added missing EIGEN_DEVICE_FUNC qualifier 2016-01-24 20:36:05 -08:00
Benoit Steiner
cb4e53ff7f Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
Add ctor for long
2016-01-22 19:11:31 -08:00
Ville Kallioniemi
9f94e030c1 Re-add executable flags to minimize changeset. 2016-01-22 20:08:45 -07:00
Benoit Steiner
3aeeca32af Leverage the new blocking code in the tensor contraction code. 2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27 Created a mechanism to enable contraction mappers to determine the best blocking strategy. 2016-01-22 14:37:26 -08:00
Gael Guennebaud
6a44ccb58b Backout changeset 690bc950f7 2016-01-22 15:03:53 +01:00
Ville Kallioniemi
9b6c72958a Update to latest default branch 2016-01-21 23:08:54 -07:00
Benoit Steiner
c33479324c Fixed a constness bug 2016-01-21 17:08:11 -08:00
Jan Prach
690bc950f7 fix clang warnings
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
7ce932edd3 Small cleanup and small fix to the contraction of row major tensors 2016-01-20 18:12:08 -08:00
Benoit Steiner
47076bf00e Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%. 2016-01-20 14:51:48 -08:00
Ville Kallioniemi
915e7667cd Remove executable bit from header files 2016-01-19 21:17:29 -07:00
Ville Kallioniemi
2832175a68 Use explicitly 32 bit integer types in constructors. 2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901 Improved the formatting of the code 2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375 Moved the contraction mapping code to its own file to make the code more manageable. 2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f Improved code indentation 2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33 Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. 2016-01-19 17:05:10 -08:00