eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Benoit Steiner	c6b0de2c21	Improved partial reductions in more cases	2016-07-22 17:18:20 -07:00
Gael Guennebaud	0f350a8b7e	Fix CUDA compilation	2016-07-21 18:47:07 +02:00
Yi Lin	7b4abc2b1d	Fixed a code comment error	2016-07-20 22:28:54 +08:00
Benoit Steiner	20f7ef2f89	An evalTo expression is only aligned iff both the lhs and the rhs are aligned.	2016-07-12 10:56:42 -07:00
Benoit Steiner	3a2dd352ae	Improved the contraction mapper to properly support tensor products	2016-07-11 13:43:41 -07:00
Benoit Steiner	0bc020be9d	Improved the detection of packet size in the tensor scan evaluator.	2016-07-11 12:14:56 -07:00
Gael Guennebaud	a96a7ce3f7	Move CUDA's special functions to SpecialFunctions module.	2016-07-11 18:39:11 +02:00
Gael Guennebaud	fd60966310	merge	2016-07-11 18:11:47 +02:00
Gael Guennebaud	194daa3048	Fix assertion (it did not make sense for static_val types)	2016-07-11 11:39:27 +02:00
Gael Guennebaud	18c35747ce	Emulate _BitScanReverse64 for 32 bits builds	2016-07-11 11:38:04 +02:00
Gael Guennebaud	599f8ba617	Change runtime to compile-time conditional.	2016-07-08 11:39:43 +02:00
Gael Guennebaud	544935101a	Fix warnings	2016-07-08 11:38:52 +02:00
Gael Guennebaud	2f7e2614e7	bug #1232 : refactor special functions as a new SpecialFunctions module, currently in unsupported/.	2016-07-08 11:13:55 +02:00
Gael Guennebaud	179ebb88f9	Fix warning	2016-07-07 09:16:40 +02:00
Gael Guennebaud	ce9fc0ce14	fix clang compilation	2016-07-04 12:59:02 +02:00
Gael Guennebaud	440020474c	Workaround compilation issue with msvc	2016-07-04 12:49:19 +02:00
Igor Babuschkin	78f37ca03c	Expose real and imag methods on Tensors	2016-07-01 17:34:31 +01:00
Benoit Steiner	cb2d8b8fa6	Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu.	2016-06-29 15:42:01 -07:00
Benoit Steiner	b2a47641ce	Made the code compile when using CUDA architecture < 300	2016-06-29 15:32:47 -07:00
Igor Babuschkin	85699850d9	Add missing CUDA kernel to tensor scan op The TensorScanOp implementation was missing a CUDA kernel launch. This adds a simple placeholder implementation.	2016-06-29 11:54:35 +01:00
Benoit Steiner	75c333f94c	Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor. Also avoid taking references to values that may becomes stale after a copy construction.	2016-06-27 10:32:38 -07:00
Benoit Steiner	7944d4431f	Made the cost model cwiseMax and cwiseMin methods consts to help the PowerPC cuda compiler compile this code.	2016-08-18 13:46:36 -07:00
Benoit Steiner	647a51b426	Force the inlining of a simple accessor.	2016-08-18 12:31:02 -07:00
Benoit Steiner	a452dedb4f	Merged in ibab/eigen/double-tensor-reduction (pull request PR-216) Enable efficient Tensor reduction for doubles on the GPU (continued)	2016-08-18 12:29:54 -07:00
Igor Babuschkin	18c67df31c	Fix remaining CUDA >= 300 checks	2016-08-18 17:18:30 +01:00
Igor Babuschkin	1569a7d7ab	Add the necessary CUDA >= 300 checks back	2016-08-18 17:15:12 +01:00
Benoit Steiner	2b17f34574	Properly detect the type of the result of a contraction.	2016-08-16 16:00:30 -07:00
Igor Babuschkin	841e075154	Remove CUDA >= 300 checks and enable outer reductin for doubles	2016-08-06 18:07:50 +01:00
Igor Babuschkin	0425118e2a	Merge upstream changes	2016-08-05 14:34:57 +01:00
Igor Babuschkin	9537e8b118	Make use of atomicExch for atomicExchCustom	2016-08-05 14:29:58 +01:00
Igor Babuschkin	eeb0d880ee	Enable efficient Tensor reduction for doubles	2016-07-01 19:08:26 +01:00
Gael Guennebaud	cfff370549	Fix hyperbolic functions for autodiff.	2016-06-24 23:21:35 +02:00
Gael Guennebaud	3852351793	merge pull request 198	2016-06-24 11:48:17 +02:00
Gael Guennebaud	6dd9077070	Fix some unused typedef warnings.	2016-06-24 11:34:21 +02:00
Gael Guennebaud	ce90647fa5	Fix NumTraits<AutoDiff>	2016-06-24 11:34:02 +02:00
Gael Guennebaud	fa39f81b48	Fix instantiation of ScalarBinaryOpTraits for AutoDiff.	2016-06-24 11:33:30 +02:00
Rasmus Munk Larsen	a9c1e4d7b7	Return -1 from CurrentThreadId when called by thread outside the pool.	2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen	d39df320d2	Resolve merge.	2016-06-23 15:08:03 -07:00
Gael Guennebaud	360a743a10	bug #1241 : does not emmit anything for empty tensors	2016-06-23 18:47:31 +02:00
Gael Guennebaud	7c6561485a	merge PR 194	2016-06-23 15:29:57 +02:00
Benoit Steiner	a29a2cb4ff	Silenced a couple of compilation warnings generated by xcode	2016-06-22 16:43:02 -07:00
Benoit Steiner	f8fcd6b32d	Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers	2016-06-22 16:03:11 -07:00
Benoit Steiner	c58df31747	Handle empty tensors in the print functions	2016-06-21 09:22:43 -07:00
Benoit Steiner	de32f8d656	Fixed the printing of rank-0 tensors	2016-06-20 10:46:45 -07:00
Tal Hadad	8e198d6835	Complete docs and add ostream operator for EulerAngles.	2016-06-19 20:42:45 +03:00
Geoffrey Lalonde	72c95383e0	Add autodiff coverage for standard library hyperbolic functions, and tests. * * * Corrected tanh derivatived, moved test definitions. * * * Added more test cases, removed lingering lines	2016-06-15 23:33:19 -07:00
Benoit Steiner	7d495d890a	Merged in ibab/eigen (pull request PR-197) Implement exclusive scan option for Tensor library	2016-06-14 17:54:59 -07:00
Benoit Steiner	aedc5be1d6	Avoid generating pseudo random numbers that are multiple of 5: this helps spread the load over multiple cpus without havind to rely on work stealing.	2016-06-14 17:51:47 -07:00
Igor Babuschkin	c4d10e921f	Implement exclusive scan option	2016-06-14 19:44:07 +01:00
Gael Guennebaud	76236cdea4	merge	2016-06-14 15:33:47 +02:00
Gael Guennebaud	62134082aa	Update AutoDiffScalar wrt to scalar-multiple.	2016-06-14 15:06:35 +02:00
Gael Guennebaud	5d38203735	Update Tensor module to use bind1st_op and bind2nd_op	2016-06-14 15:06:03 +02:00
Tal Hadad	6edfe8771b	Little bit docs	2016-06-13 22:03:19 +03:00
Tal Hadad	6e1c086593	Add static assertion	2016-06-13 21:55:17 +03:00
Gael Guennebaud	3c12e24164	Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them.	2016-06-13 16:18:59 +02:00
Tal Hadad	06206482d9	More docs, and minor code fixes	2016-06-12 23:40:17 +03:00
Benoit Steiner	65d33e5898	Merged in ibab/eigen (pull request PR-195) Add small fixes to TensorScanOp	2016-06-10 19:31:17 -07:00
Benoit Steiner	a05607875a	Don't refer to the half2 type unless it's been defined	2016-06-10 11:53:56 -07:00
Igor Babuschkin	86aedc9282	Add small fixes to TensorScanOp	2016-06-07 20:06:38 +01:00
Benoit Steiner	84b2060a9e	Fixed compilation error with gcc 4.4	2016-06-06 17:16:19 -07:00
Benoit Steiner	7ef9f47b58	Misc small improvements to the reduction code.	2016-06-06 14:09:46 -07:00
Tal Hadad	e30133e439	Doc EulerAngles class, and minor fixes.	2016-06-06 22:01:40 +03:00
Benoit Steiner	9137f560f0	Moved assertions to the constructor to make the code more portable	2016-06-06 07:26:48 -07:00
Gael Guennebaud	66e99ab6a1	Relax mixing-type constraints for binary coefficient-wise operators: - Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP> - Remove the "functor_is_product_like" helper (was pretty ugly) - Currently, OP is not used, but it is available to the user for fine grained tuning - Currently, only the following operators have been generalized: ,/,+,-,=,=,/=,+=,-= - TODO: generalize all other binray operators (comparisons,pow,etc.) - TODO: handle "scalar op array" operators (currently only * is handled) - TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits	2016-06-06 15:11:41 +02:00
Rasmus Munk Larsen	f1f2ff8208	size_t -> int	2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen	76308e7fd2	Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool.	2016-06-03 16:28:58 -07:00
Benoit Steiner	37638dafd7	Simplified the code that dispatches vectorized reductions on GPU	2016-06-09 10:29:52 -07:00
Benoit Steiner	66796e843d	Fixed definition of some of the reducer_traits	2016-06-09 08:50:01 -07:00
Benoit Steiner	14a112ee15	Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression.	2016-06-09 08:25:22 -07:00
Benoit Steiner	8f92c26319	Improved code formatting	2016-06-09 08:23:42 -07:00
Benoit Steiner	aa33446dac	Improved support for vectorization of 16-bit floats	2016-06-09 08:22:27 -07:00
Benoit Steiner	d6d39c7ddb	Added missing EIGEN_DEVICE_FUNC	2016-06-07 14:35:08 -07:00
Gael Guennebaud	e8b922ca63	Fix MatrixFunctions module.	2016-06-03 09:21:35 +02:00
Benoit Steiner	c3c8ad8046	Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing	2016-06-02 21:17:41 -07:00
Eugene Brevdo	39baff850c	Add TernaryFunctors and the betainc SpecialFunction. TernaryFunctors and their executors allow operations on 3-tuples of inputs. API fully implemented for Arrays and Tensors based on binary functors. Ported the cephes betainc function (regularized incomplete beta integral) to Eigen, with support for CPU and GPU, floats, doubles, and half types. Added unit tests in array.cpp and cxx11_tensor_cuda.cu Collapsed revision * Merged helper methods for betainc across floats and doubles. * Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase. * Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper. * betainc: merge incbcf and incbd into incbeta_cfe. and more cleanup. * Update TernaryOp and SpecialFunctions (betainc) based on review comments.	2016-06-02 17:04:19 -07:00
Benoit Steiner	c21eaedce6	Use array_prod to compute the number of elements contained in the input tensor expression	2016-06-04 07:47:04 -07:00
Benoit Steiner	36a4500822	Merged in ibab/eigen (pull request PR-192) Add generic scan method	2016-06-03 17:28:33 -07:00
Benoit Steiner	c2a102345f	Improved the performance of full reductions. AFTER: BM_fullReduction/10 4541 4543 154017 21.0M items/s BM_fullReduction/64 5191 5193 100000 752.5M items/s BM_fullReduction/512 9588 9588 71361 25.5G items/s BM_fullReduction/4k 244314 244281 2863 64.0G items/s BM_fullReduction/5k 359382 359363 1946 64.8G items/s BEFORE: BM_fullReduction/10 9085 9087 74395 10.5M items/s BM_fullReduction/64 9478 9478 72014 412.1M items/s BM_fullReduction/512 14643 14646 46902 16.7G items/s BM_fullReduction/4k 260338 260384 2678 60.0G items/s BM_fullReduction/5k 385076 385178 1818 60.5G items/s	2016-06-03 17:27:08 -07:00
Igor Babuschkin	dc03b8f3a1	Add generic scan method	2016-06-03 17:37:04 +01:00
Rasmus Munk Larsen	811aadbe00	Add syntactic sugar to Eigen tensors to allow more natural syntax. Specifically, this enables expressions involving: scalar + tensor scalar * tensor scalar / tensor scalar - tensor	2016-06-02 12:41:28 -07:00
Tal Hadad	52e4cbf539	Merged eigen/eigen into default	2016-06-02 22:15:20 +03:00
Tal Hadad	2aaaf22623	Fix Gael reports (except documention) - "Scalar angle(int) const" should be "const Vector& angles() const" - then method "coeffs" could be removed. - avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;) - about the "fromRotation" methods: - replace the ones which are not static by operator= (as in Quaternion) - the others are actually static methods: use a capital F: FromRotation - method "invert" should be removed. - use a macro to define both float and double EulerAnglesXYZ* typedefs - AddConstIf -> not used - no needs for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants: if(IsHeadingOpposite-=IsEven) res.alpha() = -res.alpha();	2016-06-02 22:12:57 +03:00
Igor Babuschkin	fbd7ed6ff7	Add tensor scan op This is the initial implementation a generic scan operation. Based on this, cumsum and cumprod method have been added to TensorBase.	2016-06-02 13:35:47 +01:00
Benoit Steiner	0ed08fd281	Use a single PacketSize variable	2016-06-01 21:19:05 -07:00
Benoit Steiner	8f6fedc55f	Fixed compilation warning	2016-06-01 21:14:46 -07:00
Benoit Steiner	873e6ac54b	Silenced compilation warning generated by nvcc.	2016-06-01 14:20:50 -07:00
Benoit Steiner	d27b0ad4c8	Added support for mean reductions on fp16	2016-06-01 11:12:07 -07:00
Benoit Steiner	5aeb3687c4	Only enable optimized reductions of fp16 if the reduction functor supports them	2016-05-31 10:33:40 -07:00
Benoit Steiner	e2946d962d	Reimplement clamp as a static function.	2016-05-27 12:58:43 -07:00
Benoit Steiner	e96d36d4cd	Use NULL instead of nullptr to preserve the compatibility with cxx03	2016-05-27 12:54:06 -07:00
Benoit Steiner	abc815798b	Added a new operation to enable more powerful tensorindexing.	2016-05-27 12:22:25 -07:00
Gael Guennebaud	22a035db95	Fix compilation when defaulting to row-major	2016-05-27 10:31:11 +02:00
Benoit Steiner	1ae2567861	Fixed some compilation warnings	2016-05-26 15:57:19 -07:00
Benoit Steiner	1a47844529	Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g fp16 to float)	2016-05-26 14:37:09 -07:00
Benoit Steiner	36369ab63c	Resolved merge conflicts	2016-05-26 13:39:39 -07:00
Benoit Steiner	28fcb5ca2a	Merged latest reduction improvements	2016-05-26 12:19:33 -07:00
Benoit Steiner	c1c7f06c35	Improved the performance of inner reductions.	2016-05-26 11:53:59 -07:00
Benoit Steiner	8288b0aec2	Code cleanup.	2016-05-26 09:00:04 -07:00
Benoit Steiner	2d7ed54ba2	Made the static storage class qualifier come first.	2016-05-25 22:16:15 -07:00
Benoit Steiner	e1fca8866e	Deleted unnecessary explicit qualifiers.	2016-05-25 22:15:26 -07:00
Benoit Steiner	9b0aaf5113	Don't mark inline functions as static since it confuses the ICC compiler	2016-05-25 22:10:11 -07:00
Benoit Steiner	037a463fd5	Marked unused variables as such	2016-05-25 22:07:48 -07:00
Benoit Steiner	3ac4045272	Made the IndexPair code compile in non cxx11 mode	2016-05-25 15:15:12 -07:00
Benoit Steiner	66556d0e05	Made the index pair list code more portable accross various compilers	2016-05-25 14:34:27 -07:00
Benoit Steiner	034aa3b2c0	Improved the performance of tensor padding	2016-05-25 11:43:08 -07:00
Benoit Steiner	58026905ae	Added support for statically known lists of pairs of indices	2016-05-25 11:04:14 -07:00
Benoit Steiner	0835667329	There is no need to make the fp16 full reduction kernel a static function.	2016-05-24 23:11:56 -07:00
Benoit Steiner	b5d6b52a4d	Fixed compilation warning	2016-05-24 23:10:57 -07:00
Benoit Steiner	a09cbf9905	Merged in rmlarsen/eigen (pull request PR-188) Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.	2016-05-23 12:55:12 -07:00
Christoph Hertzberg	718521d5cf	Silenced several double-promotion warnings	2016-05-22 18:17:04 +02:00
Christoph Hertzberg	25a03c02d6	Fix some sign-compare warnings	2016-05-22 16:42:27 +02:00
Gael Guennebaud	ccaace03c9	Make EIGEN_HAS_CONSTEXPR user configurable	2016-05-20 15:10:08 +02:00
Gael Guennebaud	c3410804cd	Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable	2016-05-20 15:05:38 +02:00
Gael Guennebaud	48bf5ec216	Make EIGEN_HAS_RVALUE_REFERENCES user configurable	2016-05-20 14:54:20 +02:00
Gael Guennebaud	f43ae88892	Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES	2016-05-20 14:48:51 +02:00
Gael Guennebaud	2f656ce447	Remove std:: to enable custom scalar types.	2016-05-19 23:13:47 +02:00
Rasmus Larsen	b1e080c752	Merged eigen/eigen into default	2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen	5624219b6b	Merge.	2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen	7df811cfe5	Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.	2016-05-18 15:09:48 -07:00
Benoit Steiner	bb3ff8e9d9	Advertize the packet api of the tensor reducers iff the corresponding packet primitives are available.	2016-05-18 14:52:49 -07:00
Gael Guennebaud	548a487800	bug #1229 : bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway.	2016-05-18 16:44:05 +02:00
Gael Guennebaud	43790e009b	Pass argument by const ref instead of by value in pow(AutoDiffScalar...)	2016-05-18 16:28:02 +02:00
Gael Guennebaud	1fbfab27a9	bug #1223 : fix compilation of AutoDiffScalar's min/max operators, and add regression unit test.	2016-05-18 16:26:26 +02:00
Gael Guennebaud	448d9d943c	bug #1222 : fix compilation in AutoDiffScalar and add respective unit test	2016-05-18 16:00:11 +02:00
Rasmus Munk Larsen	f519fca72b	Reduce overhead for small tensors and cheap ops by short-circuiting the const computation and block size calculation in parallelFor.	2016-05-17 16:06:00 -07:00
Benoit Steiner	86ae94462e	#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly.	2016-05-17 14:06:15 -07:00
Benoit Steiner	997c335970	Fixed compilation error	2016-05-17 12:54:18 -07:00
Benoit Steiner	ebf6ada5ee	Fixed compilation error in the tensor thread pool	2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen	0bb61b04ca	Merge upstream.	2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen	0dbd68145f	Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h.	2016-05-17 10:25:19 -07:00
Rasmus Larsen	00228f2506	Merged eigen/eigen into default	2016-05-17 09:49:31 -07:00
Benoit Steiner	e7e64c3277	Enable the use of the packet api to evaluate tensor broadcasts. This speed things up quite a bit: Before" M_broadcasting/10 500000 3690 27.10 MFlops/s BM_broadcasting/80 500000 4014 1594.24 MFlops/s BM_broadcasting/640 100000 14770 27731.35 MFlops/s BM_broadcasting/4K 5000 632711 39512.48 MFlops/s After: BM_broadcasting/10 500000 4287 23.33 MFlops/s BM_broadcasting/80 500000 4455 1436.41 MFlops/s BM_broadcasting/640 200000 10195 40173.01 MFlops/s BM_broadcasting/4K 5000 423746 58997.57 MFlops/s	2016-05-17 09:24:35 -07:00
Benoit Steiner	5fa27574dd	Allow vectorized padding on GPU. This helps speed things up a little Before: BM_padding/10 5000000 460 217.03 MFlops/s BM_padding/80 5000000 460 13899.40 MFlops/s BM_padding/640 5000000 461 888421.17 MFlops/s BM_padding/4K 5000000 460 54316322.55 MFlops/s After: BM_padding/10 5000000 454 220.20 MFlops/s BM_padding/80 5000000 455 14039.86 MFlops/s BM_padding/640 5000000 452 904968.83 MFlops/s BM_padding/4K 5000000 411 60750049.21 MFlops/s	2016-05-17 09:17:26 -07:00
Benoit Steiner	8d06c02ffd	Allow vectorized padding on GPU. This helps speed things up a little. Before: BM_padding/10 5000000 460 217.03 MFlops/s BM_padding/80 5000000 460 13899.40 MFlops/s BM_padding/640 5000000 461 888421.17 MFlops/s BM_padding/4K 5000000 460 54316322.55 MFlops/s After: BM_padding/10 5000000 454 220.20 MFlops/s BM_padding/80 5000000 455 14039.86 MFlops/s BM_padding/640 5000000 452 904968.83 MFlops/s BM_padding/4K 5000000 411 60750049.21 MFlops/s	2016-05-17 09:13:27 -07:00
David Dement	ccc7563ac5	made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process	2016-05-16 14:26:41 -04:00
Benoit Steiner	a80d875916	Added missing costPerCoeff method	2016-05-16 09:31:10 -07:00
Benoit Steiner	83ef39e055	Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions. Before: BM_colReduction_12T/10 1000000 1949 51.29 MFlops/s BM_colReduction_12T/80 100000 15636 409.29 MFlops/s BM_colReduction_12T/640 20000 95100 4307.01 MFlops/s BM_colReduction_12T/4K 500 4573423 5466.36 MFlops/s BM_colReduction_4T/10 1000000 1867 53.56 MFlops/s BM_colReduction_4T/80 500000 5288 1210.11 MFlops/s BM_colReduction_4T/640 10000 106924 3830.75 MFlops/s BM_colReduction_4T/4K 500 9946374 2513.48 MFlops/s BM_colReduction_8T/10 1000000 1912 52.30 MFlops/s BM_colReduction_8T/80 200000 8354 766.09 MFlops/s BM_colReduction_8T/640 20000 85063 4815.22 MFlops/s BM_colReduction_8T/4K 500 5445216 4591.19 MFlops/s BM_rowReduction_12T/10 1000000 2041 48.99 MFlops/s BM_rowReduction_12T/80 100000 15426 414.87 MFlops/s BM_rowReduction_12T/640 50000 39117 10470.98 MFlops/s BM_rowReduction_12T/4K 500 3034298 8239.14 MFlops/s BM_rowReduction_4T/10 1000000 1834 54.51 MFlops/s BM_rowReduction_4T/80 500000 5406 1183.81 MFlops/s BM_rowReduction_4T/640 50000 35017 11697.16 MFlops/s BM_rowReduction_4T/4K 500 3428527 7291.76 MFlops/s BM_rowReduction_8T/10 1000000 1925 51.95 MFlops/s BM_rowReduction_8T/80 200000 8519 751.23 MFlops/s BM_rowReduction_8T/640 50000 33441 12248.42 MFlops/s BM_rowReduction_8T/4K 1000 2852841 8763.19 MFlops/s After: BM_colReduction_12T/10 50000000 59 1678.30 MFlops/s BM_colReduction_12T/80 5000000 725 8822.71 MFlops/s BM_colReduction_12T/640 20000 90882 4506.93 MFlops/s BM_colReduction_12T/4K 500 4668855 5354.63 MFlops/s BM_colReduction_4T/10 50000000 59 1687.37 MFlops/s BM_colReduction_4T/80 5000000 737 8681.24 MFlops/s BM_colReduction_4T/640 50000 108637 3770.34 MFlops/s BM_colReduction_4T/4K 500 7912954 3159.38 MFlops/s BM_colReduction_8T/10 50000000 60 1657.21 MFlops/s BM_colReduction_8T/80 5000000 726 8812.48 MFlops/s BM_colReduction_8T/640 20000 91451 4478.90 MFlops/s BM_colReduction_8T/4K 500 5441692 4594.16 MFlops/s BM_rowReduction_12T/10 20000000 93 1065.28 MFlops/s BM_rowReduction_12T/80 2000000 950 6730.96 MFlops/s BM_rowReduction_12T/640 50000 38196 10723.48 MFlops/s BM_rowReduction_12T/4K 500 3019217 8280.29 MFlops/s BM_rowReduction_4T/10 20000000 93 1064.30 MFlops/s BM_rowReduction_4T/80 2000000 959 6667.71 MFlops/s BM_rowReduction_4T/640 50000 37433 10941.96 MFlops/s BM_rowReduction_4T/4K 500 3036476 8233.23 MFlops/s BM_rowReduction_8T/10 20000000 93 1072.47 MFlops/s BM_rowReduction_8T/80 2000000 959 6670.04 MFlops/s BM_rowReduction_8T/640 50000 38069 10759.37 MFlops/s BM_rowReduction_8T/4K 1000 2758988 9061.29 MFlops/s	2016-05-16 08:55:21 -07:00
Benoit Steiner	b789a26804	Fixed syntax error	2016-05-16 08:51:08 -07:00
Benoit Steiner	83dfb40f66	Turnon the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define.	2016-05-13 17:23:15 -07:00
Benoit Steiner	97605c7b27	New multithreaded contraction that doesn't rely on the thread pool to run the closure in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closure in any order.	2016-05-13 17:11:29 -07:00
Benoit Steiner	c4fc8b70ec	Removed unnecessary thread synchronization	2016-05-13 10:49:38 -07:00
Benoit Steiner	7aa3557d31	Fixed compilation errors triggered by old versions of gcc	2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen	5005b27fc8	Diasbled cost model by accident. Revert.	2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen	989e419328	Address comments by bsteiner.	2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen	e55deb21c5	Improvements to parallelFor. Move some scalar functors from TensorFunctors. to Eigen core.	2016-05-12 14:07:22 -07:00
Benoit Steiner	ae9688f313	Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel.	2016-05-12 12:06:51 -07:00
Benoit Steiner	2a54b70d45	Fixed potential race condition in the non blocking thread pool	2016-05-12 11:45:48 -07:00
Benoit Steiner	a071629fec	Replace implicit cast with an explicit one	2016-05-12 10:40:07 -07:00
Benoit Steiner	2f9401b061	Worked around compilation errors with older versions of gcc	2016-05-11 23:39:20 -07:00
Benoit Steiner	09653e1f82	Improved the portability of the tensor code	2016-05-11 23:29:09 -07:00

1 2 3 4 5 ...

1648 Commits