Commit Graph

1934 Commits

Author SHA1 Message Date
Benoit Steiner
488ad7dd1b Added missing EIGEN_DEVICE_FUNC qualifiers 2016-09-14 13:35:00 -07:00
Benoit Steiner
028e299577 Fixed a bug impacting some outer reductions on GPU 2016-09-12 18:36:52 -07:00
Benoit Steiner
8321dcce76 Merged latest updates from trunk 2016-09-12 10:33:05 -07:00
Benoit Steiner
eb6ba00cc8 Properly size the list of waiters 2016-09-12 10:31:55 -07:00
Benoit Steiner
a618094b62 Added a resize method to MaxSizeVector 2016-09-12 10:30:53 -07:00
Gael Guennebaud
471eac5399 bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) 2016-09-08 08:36:27 +02:00
Gael Guennebaud
e1642f485c bug #1288: fix memory leak in arpack wrapper. 2016-09-05 18:01:30 +02:00
Benoit Steiner
13df3441ae Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instructions to initialize it. This can result in random crashes when compiling with AVX instructions enabled. 2016-09-02 19:25:47 -07:00
Benoit Steiner
cadd124d73 Pulled latest update from trunk 2016-09-02 15:30:02 -07:00
Benoit Steiner
05b0518077 Made the index type an explicit template parameter to help some compilers compile the code. 2016-09-02 15:29:34 -07:00
Benoit Steiner
adf864fec0 Merged in rmlarsen/eigen (pull request PR-222)
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 14:11:20 -07:00
Rasmus Munk Larsen
13e93ca8b7 Fix CUDA build broken by changes to min and max reduction. 2016-09-02 13:41:36 -07:00
Benoit Steiner
c53f783705 Updated the contraction code to support constant inputs. 2016-09-01 11:41:27 -07:00
Gael Guennebaud
46475eff9a Adjust Tensor module wrt recent change in nullary functor 2016-09-01 13:40:45 +02:00
Rasmus Munk Larsen
a1e092d1e8 Fix bugs to make the min and max reducers work correctly with IEEE infinities. 2016-08-31 15:04:16 -07:00
Gael Guennebaud
1f84f0d33a merge EulerAngles module 2016-08-30 10:01:53 +02:00
Gael Guennebaud
e074f720c7 Include missing forward declaration of SparseMatrix 2016-08-29 18:56:46 +02:00
Gael Guennebaud
35a8e94577 bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. 2016-08-29 10:59:37 +02:00
Gael Guennebaud
965e595f02 Add missing log1p method 2016-08-26 14:55:00 +02:00
Benoit Steiner
34ae80179a Use array_prod instead of calling TotalSize since TotalSize is only available on DSizes. 2016-08-15 10:29:14 -07:00
Benoit Steiner
fe73648c98 Fixed a bug in the documentation. 2016-08-12 10:00:43 -07:00
Benoit Steiner
e3a8dfb02f std::erfcf doesn't exist: use numext::erfc instead 2016-08-11 15:24:06 -07:00
Benoit Steiner
64e68cbe87 Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything. 2016-08-08 19:29:59 -07:00
Benoit Steiner
ca2cee2739 Merged in ibab/eigen (pull request PR-206)
Expose real and imag methods on Tensors
2016-08-03 11:53:04 -07:00
Benoit Steiner
a20b58845f CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running. 2016-08-03 10:00:43 -07:00
Benoit Steiner
fd220dd8b0 Use numext::conj instead of std::conj 2016-08-01 18:16:16 -07:00
Benoit Steiner
e256acec7c Avoid unnecessary object copies 2016-08-01 17:03:39 -07:00
Benoit Steiner
2693fd54bf bug #1266: half implementation has been moved to half_impl namespace 2016-07-29 13:45:56 -07:00
Gael Guennebaud
cc2f6d68b1 bug #1264: fix compilation 2016-07-27 23:30:47 +02:00
Gael Guennebaud
8972323c08 bug #1261: add missing max(ADS,ADS) overload (same for min) 2016-07-27 14:52:48 +02:00
Gael Guennebaud
0d7039319c bug #1260: remove doubtful specializations of ScalarBinaryOpTraits 2016-07-27 14:35:52 +02:00
Benoit Steiner
3d3d34e442 Deleted dead code. 2016-07-25 08:53:37 -07:00
Gael Guennebaud
6d5daf32f5 bug #1255: comment out broken and unused line. 2016-07-25 14:48:30 +02:00
Gael Guennebaud
f9598d73b5 bug #1250: fix pow() for AutoDiffScalar with custom nested scalar type. 2016-07-25 14:42:19 +02:00
Gael Guennebaud
fd1117f2be Implement digits10 for mpreal 2016-07-25 14:38:55 +02:00
Gael Guennebaud
9908020d36 Add minimal support for Array<string>, and fix Tensor<string> 2016-07-25 14:25:56 +02:00
Benoit Steiner
c6b0de2c21 Improved partial reductions in more cases 2016-07-22 17:18:20 -07:00
Gael Guennebaud
0f350a8b7e Fix CUDA compilation 2016-07-21 18:47:07 +02:00
Yi Lin
7b4abc2b1d Fixed a code comment error 2016-07-20 22:28:54 +08:00
Benoit Steiner
20f7ef2f89 An evalTo expression is aligned iff both the lhs and the rhs are aligned. 2016-07-12 10:56:42 -07:00
Benoit Steiner
3a2dd352ae Improved the contraction mapper to properly support tensor products 2016-07-11 13:43:41 -07:00
Benoit Steiner
0bc020be9d Improved the detection of packet size in the tensor scan evaluator. 2016-07-11 12:14:56 -07:00
Gael Guennebaud
a96a7ce3f7 Move CUDA's special functions to SpecialFunctions module. 2016-07-11 18:39:11 +02:00
Gael Guennebaud
fd60966310 merge 2016-07-11 18:11:47 +02:00
Gael Guennebaud
194daa3048 Fix assertion (it did not make sense for static_val types) 2016-07-11 11:39:27 +02:00
Gael Guennebaud
18c35747ce Emulate _BitScanReverse64 for 32-bit builds 2016-07-11 11:38:04 +02:00
Gael Guennebaud
599f8ba617 Change runtime to compile-time conditional. 2016-07-08 11:39:43 +02:00
Gael Guennebaud
544935101a Fix warnings 2016-07-08 11:38:52 +02:00
Gael Guennebaud
2f7e2614e7 bug #1232: refactor special functions as a new SpecialFunctions module, currently in unsupported/. 2016-07-08 11:13:55 +02:00
Gael Guennebaud
179ebb88f9 Fix warning 2016-07-07 09:16:40 +02:00
Gael Guennebaud
ce9fc0ce14 fix clang compilation 2016-07-04 12:59:02 +02:00
Gael Guennebaud
440020474c Workaround compilation issue with msvc 2016-07-04 12:49:19 +02:00
Igor Babuschkin
78f37ca03c Expose real and imag methods on Tensors 2016-07-01 17:34:31 +01:00
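For illustration, a minimal sketch of what this exposes, assuming the unsupported Tensor header path of this period and that real()/imag() yield component-wise tensor expressions:

```cpp
#include <complex>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<std::complex<float>, 1> t(4);
  t.setConstant(std::complex<float>(1.0f, 2.0f));
  // real() and imag() return tensor expressions over the components.
  Eigen::Tensor<float, 1> re = t.real();  // 1 1 1 1
  Eigen::Tensor<float, 1> im = t.imag();  // 2 2 2 2
  return 0;
}
```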
Benoit Steiner
cb2d8b8fa6 Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu. 2016-06-29 15:42:01 -07:00
Benoit Steiner
b2a47641ce Made the code compile when using CUDA architecture < 300 2016-06-29 15:32:47 -07:00
Igor Babuschkin
85699850d9 Add missing CUDA kernel to tensor scan op
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
2016-06-29 11:54:35 +01:00
Benoit Steiner
75c333f94c Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor.
Also avoid taking references to values that may become stale after a copy construction.
2016-06-27 10:32:38 -07:00
Benoit Steiner
7944d4431f Made the cost model cwiseMax and cwiseMin methods const to help the PowerPC cuda compiler compile this code. 2016-08-18 13:46:36 -07:00
Benoit Steiner
647a51b426 Force the inlining of a simple accessor. 2016-08-18 12:31:02 -07:00
Benoit Steiner
a452dedb4f Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
Enable efficient Tensor reduction for doubles on the GPU (continued)
2016-08-18 12:29:54 -07:00
Igor Babuschkin
18c67df31c Fix remaining CUDA >= 300 checks 2016-08-18 17:18:30 +01:00
Igor Babuschkin
1569a7d7ab Add the necessary CUDA >= 300 checks back 2016-08-18 17:15:12 +01:00
Benoit Steiner
2b17f34574 Properly detect the type of the result of a contraction. 2016-08-16 16:00:30 -07:00
Igor Babuschkin
841e075154 Remove CUDA >= 300 checks and enable outer reduction for doubles 2016-08-06 18:07:50 +01:00
Igor Babuschkin
0425118e2a Merge upstream changes 2016-08-05 14:34:57 +01:00
Igor Babuschkin
9537e8b118 Make use of atomicExch for atomicExchCustom 2016-08-05 14:29:58 +01:00
Igor Babuschkin
eeb0d880ee Enable efficient Tensor reduction for doubles 2016-07-01 19:08:26 +01:00
Gael Guennebaud
cfff370549 Fix hyperbolic functions for autodiff. 2016-06-24 23:21:35 +02:00
Gael Guennebaud
3852351793 merge pull request 198 2016-06-24 11:48:17 +02:00
Gael Guennebaud
6dd9077070 Fix some unused typedef warnings. 2016-06-24 11:34:21 +02:00
Gael Guennebaud
ce90647fa5 Fix NumTraits<AutoDiff> 2016-06-24 11:34:02 +02:00
Gael Guennebaud
fa39f81b48 Fix instantiation of ScalarBinaryOpTraits for AutoDiff. 2016-06-24 11:33:30 +02:00
Rasmus Munk Larsen
a9c1e4d7b7 Return -1 from CurrentThreadId when called by thread outside the pool. 2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2 Resolve merge. 2016-06-23 15:08:03 -07:00
Gael Guennebaud
360a743a10 bug #1241: does not emit anything for empty tensors 2016-06-23 18:47:31 +02:00
Gael Guennebaud
7c6561485a merge PR 194 2016-06-23 15:29:57 +02:00
Benoit Steiner
a29a2cb4ff Silenced a couple of compilation warnings generated by xcode 2016-06-22 16:43:02 -07:00
Benoit Steiner
f8fcd6b32d Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers 2016-06-22 16:03:11 -07:00
Benoit Steiner
c58df31747 Handle empty tensors in the print functions 2016-06-21 09:22:43 -07:00
Benoit Steiner
de32f8d656 Fixed the printing of rank-0 tensors 2016-06-20 10:46:45 -07:00
Tal Hadad
8e198d6835 Complete docs and add ostream operator for EulerAngles. 2016-06-19 20:42:45 +03:00
Geoffrey Lalonde
72c95383e0 Add autodiff coverage for standard library hyperbolic functions, and tests.
* * *
Corrected tanh derivative, moved test definitions.
* * *
Added more test cases, removed lingering lines
2016-06-15 23:33:19 -07:00
Benoit Steiner
7d495d890a Merged in ibab/eigen (pull request PR-197)
Implement exclusive scan option for Tensor library
2016-06-14 17:54:59 -07:00
Benoit Steiner
aedc5be1d6 Avoid generating pseudo random numbers that are multiples of 5: this helps
spread the load over multiple cpus without having to rely on work stealing.
2016-06-14 17:51:47 -07:00
Igor Babuschkin
c4d10e921f Implement exclusive scan option 2016-06-14 19:44:07 +01:00
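A minimal usage sketch, assuming the scan API of this period where cumsum takes the scan axis plus an optional exclusive flag:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<int, 1> t(4);
  t.setValues({1, 2, 3, 4});
  Eigen::Tensor<int, 1> inclusive = t.cumsum(0);        // 1 3 6 10
  Eigen::Tensor<int, 1> exclusive = t.cumsum(0, true);  // 0 1 3  6
  return 0;
}
```

An exclusive scan shifts the running total by one element, so each output omits its own input.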
Gael Guennebaud
76236cdea4 merge 2016-06-14 15:33:47 +02:00
Gael Guennebaud
62134082aa Update AutoDiffScalar wrt scalar-multiple. 2016-06-14 15:06:35 +02:00
Gael Guennebaud
5d38203735 Update Tensor module to use bind1st_op and bind2nd_op 2016-06-14 15:06:03 +02:00
Tal Hadad
6edfe8771b A little bit of docs 2016-06-13 22:03:19 +03:00
Tal Hadad
6e1c086593 Add static assertion 2016-06-13 21:55:17 +03:00
Gael Guennebaud
3c12e24164 Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them. 2016-06-13 16:18:59 +02:00
Tal Hadad
06206482d9 More docs, and minor code fixes 2016-06-12 23:40:17 +03:00
Benoit Steiner
65d33e5898 Merged in ibab/eigen (pull request PR-195)
Add small fixes to TensorScanOp
2016-06-10 19:31:17 -07:00
Benoit Steiner
a05607875a Don't refer to the half2 type unless it's been defined 2016-06-10 11:53:56 -07:00
Igor Babuschkin
86aedc9282 Add small fixes to TensorScanOp 2016-06-07 20:06:38 +01:00
Benoit Steiner
84b2060a9e Fixed compilation error with gcc 4.4 2016-06-06 17:16:19 -07:00
Benoit Steiner
7ef9f47b58 Misc small improvements to the reduction code. 2016-06-06 14:09:46 -07:00
Tal Hadad
e30133e439 Doc EulerAngles class, and minor fixes. 2016-06-06 22:01:40 +03:00
Benoit Steiner
9137f560f0 Moved assertions to the constructor to make the code more portable 2016-06-06 07:26:48 -07:00
Gael Guennebaud
66e99ab6a1 Relax mixing-type constraints for binary coefficient-wise operators:
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binary operators (comparisons,pow,etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
2016-06-06 15:11:41 +02:00
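For context, a hedged sketch of the extension point this introduces; MyReal is a hypothetical user-defined scalar used only for illustration (a real custom type would also need a NumTraits specialization and the usual operator overloads):

```cpp
#include <Eigen/Core>

// Hypothetical custom scalar type, for illustration only.
struct MyReal {
  double v;
  MyReal(double x = 0) : v(x) {}
};
inline MyReal operator*(const MyReal& a, double b) { return MyReal(a.v * b); }
inline MyReal operator*(double a, const MyReal& b) { return MyReal(a * b.v); }

namespace Eigen {
// Declare the result type of mixed MyReal/double binary operations so that
// mixed-type coefficient-wise expressions are accepted.
template<typename BinaryOp>
struct ScalarBinaryOpTraits<MyReal, double, BinaryOp> { typedef MyReal ReturnType; };
template<typename BinaryOp>
struct ScalarBinaryOpTraits<double, MyReal, BinaryOp> { typedef MyReal ReturnType; };
}

int main() { return 0; }
```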
Rasmus Munk Larsen
f1f2ff8208 size_t -> int 2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen
76308e7fd2 Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool. 2016-06-03 16:28:58 -07:00
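A small sketch of the new queries, assuming the standalone ThreadPool header of this period; per the commit above, CurrentThreadId returns -1 when called from a thread that is not part of the pool:

```cpp
#include <chrono>
#include <cstdio>
#include <thread>
#include <unsupported/Eigen/CXX11/ThreadPool>

int main() {
  Eigen::ThreadPool pool(2);
  std::printf("NumThreads: %d\n", pool.NumThreads());            // 2
  std::printf("outside the pool: %d\n", pool.CurrentThreadId()); // -1
  pool.Schedule([&pool] {
    // Inside a worker, the id is in [0, NumThreads).
    std::printf("inside the pool: %d\n", pool.CurrentThreadId());
  });
  // Crude synchronization, just to keep the example short.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  return 0;
}
```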
Benoit Steiner
37638dafd7 Simplified the code that dispatches vectorized reductions on GPU 2016-06-09 10:29:52 -07:00
Benoit Steiner
66796e843d Fixed definition of some of the reducer_traits 2016-06-09 08:50:01 -07:00
Benoit Steiner
14a112ee15 Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression. 2016-06-09 08:25:22 -07:00
Benoit Steiner
8f92c26319 Improved code formatting 2016-06-09 08:23:42 -07:00
Benoit Steiner
aa33446dac Improved support for vectorization of 16-bit floats 2016-06-09 08:22:27 -07:00
Benoit Steiner
d6d39c7ddb Added missing EIGEN_DEVICE_FUNC 2016-06-07 14:35:08 -07:00
Gael Guennebaud
e8b922ca63 Fix MatrixFunctions module. 2016-06-03 09:21:35 +02:00
Benoit Steiner
c3c8ad8046 Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing 2016-06-02 21:17:41 -07:00
Eugene Brevdo
39baff850c Add TernaryFunctors and the betainc SpecialFunction.
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.

Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.

Added unit tests in array.cpp and cxx11_tensor_cuda.cu


Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc().  Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe, and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
2016-06-02 17:04:19 -07:00
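A minimal array-level sketch, assuming the free function exposed by the new SpecialFunctions module:

```cpp
#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/SpecialFunctions>

int main() {
  Eigen::ArrayXf a(3), b(3), x(3);
  a << 0.5f, 2.0f, 5.0f;
  b << 2.0f, 2.0f, 0.5f;
  x << 0.1f, 0.5f, 0.9f;
  // Element-wise regularized incomplete beta function I_x(a, b).
  std::cout << Eigen::betainc(a, b, x) << std::endl;
  return 0;
}
```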
Benoit Steiner
c21eaedce6 Use array_prod to compute the number of elements contained in the input tensor expression 2016-06-04 07:47:04 -07:00
Benoit Steiner
36a4500822 Merged in ibab/eigen (pull request PR-192)
Add generic scan method
2016-06-03 17:28:33 -07:00
Benoit Steiner
c2a102345f Improved the performance of full reductions.
AFTER:
BM_fullReduction/10        4541       4543     154017  21.0M items/s
BM_fullReduction/64        5191       5193     100000  752.5M items/s
BM_fullReduction/512       9588       9588      71361  25.5G items/s
BM_fullReduction/4k      244314     244281       2863  64.0G items/s
BM_fullReduction/5k      359382     359363       1946  64.8G items/s

BEFORE:
BM_fullReduction/10        9085       9087      74395  10.5M items/s
BM_fullReduction/64        9478       9478      72014  412.1M items/s
BM_fullReduction/512      14643      14646      46902  16.7G items/s
BM_fullReduction/4k      260338     260384       2678  60.0G items/s
BM_fullReduction/5k      385076     385178       1818  60.5G items/s
2016-06-03 17:27:08 -07:00
Igor Babuschkin
dc03b8f3a1 Add generic scan method 2016-06-03 17:37:04 +01:00
Rasmus Munk Larsen
811aadbe00 Add syntactic sugar to Eigen tensors to allow more natural syntax.
Specifically, this enables expressions involving:

scalar + tensor
scalar * tensor
scalar / tensor
scalar - tensor
2016-06-02 12:41:28 -07:00
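A short sketch of the expressions this enables (previously the scalar had to appear on the right-hand side):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> t(2, 2);
  t.setConstant(4.0f);
  Eigen::Tensor<float, 2> a = 1.0f + t;  // scalar + tensor
  Eigen::Tensor<float, 2> b = 2.0f * t;  // scalar * tensor
  Eigen::Tensor<float, 2> c = 8.0f / t;  // scalar / tensor
  Eigen::Tensor<float, 2> d = 5.0f - t;  // scalar - tensor
  return 0;
}
```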
Tal Hadad
52e4cbf539 Merged eigen/eigen into default 2016-06-02 22:15:20 +03:00
Tal Hadad
2aaaf22623 Fix Gael reports (except documentation)
- "Scalar angle(int) const"  should be  "const Vector& angles() const"
- then method "coeffs" could be removed.
- avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;)
- about the "fromRotation" methods:
 - replace the ones which are not static by operator= (as in Quaternion)
 - the others are actually static methods: use a capital F: FromRotation
- method "invert" should be removed.
- use a macro to define both float and double EulerAnglesXYZ* typedefs
- AddConstIf -> not used
- no need for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants:
  if(IsHeadingOpposite!=IsEven) res.alpha() = -res.alpha();
2016-06-02 22:12:57 +03:00
Igor Babuschkin
fbd7ed6ff7 Add tensor scan op
This is the initial implementation of a generic scan operation.
Based on this, cumsum and cumprod methods have been added to TensorBase.
2016-06-02 13:35:47 +01:00
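For illustration, the TensorBase entry points this adds, sketched under the header layout of this period:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> t(3, 2);
  t.setConstant(2.0f);
  // Scan along axis 0: running sum / running product down each column.
  Eigen::Tensor<float, 2> sums  = t.cumsum(0);   // each column: 2 4 6
  Eigen::Tensor<float, 2> prods = t.cumprod(0);  // each column: 2 4 8
  return 0;
}
```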
Benoit Steiner
0ed08fd281 Use a single PacketSize variable 2016-06-01 21:19:05 -07:00
Benoit Steiner
8f6fedc55f Fixed compilation warning 2016-06-01 21:14:46 -07:00
Benoit Steiner
873e6ac54b Silenced compilation warning generated by nvcc. 2016-06-01 14:20:50 -07:00
Benoit Steiner
d27b0ad4c8 Added support for mean reductions on fp16 2016-06-01 11:12:07 -07:00
Benoit Steiner
5aeb3687c4 Only enable optimized reductions of fp16 if the reduction functor supports them 2016-05-31 10:33:40 -07:00
Benoit Steiner
e2946d962d Reimplement clamp as a static function. 2016-05-27 12:58:43 -07:00
Benoit Steiner
e96d36d4cd Use NULL instead of nullptr to preserve the compatibility with cxx03 2016-05-27 12:54:06 -07:00
Benoit Steiner
abc815798b Added a new operation to enable more powerful tensor indexing. 2016-05-27 12:22:25 -07:00
Gael Guennebaud
22a035db95 Fix compilation when defaulting to row-major 2016-05-27 10:31:11 +02:00
Benoit Steiner
1ae2567861 Fixed some compilation warnings 2016-05-26 15:57:19 -07:00
Benoit Steiner
1a47844529 Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g. fp16 to float) 2016-05-26 14:37:09 -07:00
Benoit Steiner
36369ab63c Resolved merge conflicts 2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a Merged latest reduction improvements 2016-05-26 12:19:33 -07:00
Benoit Steiner
c1c7f06c35 Improved the performance of inner reductions. 2016-05-26 11:53:59 -07:00
Benoit Steiner
8288b0aec2 Code cleanup. 2016-05-26 09:00:04 -07:00
Benoit Steiner
2d7ed54ba2 Made the static storage class qualifier come first. 2016-05-25 22:16:15 -07:00
Benoit Steiner
e1fca8866e Deleted unnecessary explicit qualifiers. 2016-05-25 22:15:26 -07:00
Benoit Steiner
9b0aaf5113 Don't mark inline functions as static since it confuses the ICC compiler 2016-05-25 22:10:11 -07:00
Benoit Steiner
037a463fd5 Marked unused variables as such 2016-05-25 22:07:48 -07:00
Benoit Steiner
3ac4045272 Made the IndexPair code compile in non cxx11 mode 2016-05-25 15:15:12 -07:00
Benoit Steiner
66556d0e05 Made the index pair list code more portable across various compilers 2016-05-25 14:34:27 -07:00
Benoit Steiner
034aa3b2c0 Improved the performance of tensor padding 2016-05-25 11:43:08 -07:00
Benoit Steiner
58026905ae Added support for statically known lists of pairs of indices 2016-05-25 11:04:14 -07:00
Benoit Steiner
0835667329 There is no need to make the fp16 full reduction kernel a static function. 2016-05-24 23:11:56 -07:00
Benoit Steiner
b5d6b52a4d Fixed compilation warning 2016-05-24 23:10:57 -07:00
Benoit Steiner
a09cbf9905 Merged in rmlarsen/eigen (pull request PR-188)
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-23 12:55:12 -07:00
Christoph Hertzberg
718521d5cf Silenced several double-promotion warnings 2016-05-22 18:17:04 +02:00
Christoph Hertzberg
25a03c02d6 Fix some sign-compare warnings 2016-05-22 16:42:27 +02:00
Gael Guennebaud
ccaace03c9 Make EIGEN_HAS_CONSTEXPR user configurable 2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable 2016-05-20 15:05:38 +02:00
Gael Guennebaud
48bf5ec216 Make EIGEN_HAS_RVALUE_REFERENCES user configurable 2016-05-20 14:54:20 +02:00
Gael Guennebaud
f43ae88892 Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES 2016-05-20 14:48:51 +02:00
Gael Guennebaud
2f656ce447 Remove std:: to enable custom scalar types. 2016-05-19 23:13:47 +02:00
Rasmus Larsen
b1e080c752 Merged eigen/eigen into default 2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen
5624219b6b Merge. 2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen
7df811cfe5 Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. 2016-05-18 15:09:48 -07:00
Benoit Steiner
bb3ff8e9d9 Advertise the packet api of the tensor reducers iff the corresponding packet primitives are available. 2016-05-18 14:52:49 -07:00
Gael Guennebaud
548a487800 bug #1229: bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway. 2016-05-18 16:44:05 +02:00
Gael Guennebaud
43790e009b Pass argument by const ref instead of by value in pow(AutoDiffScalar...) 2016-05-18 16:28:02 +02:00
Gael Guennebaud
1fbfab27a9 bug #1223: fix compilation of AutoDiffScalar's min/max operators, and add regression unit test. 2016-05-18 16:26:26 +02:00
Gael Guennebaud
448d9d943c bug #1222: fix compilation in AutoDiffScalar and add respective unit test 2016-05-18 16:00:11 +02:00
Rasmus Munk Larsen
f519fca72b Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor. 2016-05-17 16:06:00 -07:00
Benoit Steiner
86ae94462e #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. 2016-05-17 14:06:15 -07:00
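In other words, opting back into the old pool is now an explicit request; a minimal sketch of such a translation unit:

```cpp
// The non-blocking pool is the default; define this to get the old one back.
#define EIGEN_USE_SIMPLE_THREAD_POOL
#define EIGEN_USE_THREADS
#include <unsupported/Eigen/CXX11/Tensor>

int main() { return 0; }
```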
Benoit Steiner
997c335970 Fixed compilation error 2016-05-17 12:54:18 -07:00
Benoit Steiner
ebf6ada5ee Fixed compilation error in the tensor thread pool 2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen
0bb61b04ca Merge upstream. 2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen
0dbd68145f Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h. 2016-05-17 10:25:19 -07:00
Rasmus Larsen
00228f2506 Merged eigen/eigen into default 2016-05-17 09:49:31 -07:00
Benoit Steiner
e7e64c3277 Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit:
Before:
BM_broadcasting/10        500000       3690    27.10 MFlops/s
BM_broadcasting/80        500000       4014  1594.24 MFlops/s
BM_broadcasting/640       100000      14770 27731.35 MFlops/s
BM_broadcasting/4K          5000     632711 39512.48 MFlops/s
After:
BM_broadcasting/10        500000       4287    23.33 MFlops/s
BM_broadcasting/80        500000       4455  1436.41 MFlops/s
BM_broadcasting/640       200000      10195 40173.01 MFlops/s
BM_broadcasting/4K          5000     423746 58997.57 MFlops/s
2016-05-17 09:24:35 -07:00
Benoit Steiner
5fa27574dd Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10            5000000        460   217.03 MFlops/s
BM_padding/80            5000000        460 13899.40 MFlops/s
BM_padding/640           5000000        461 888421.17 MFlops/s
BM_padding/4K            5000000        460 54316322.55 MFlops/s
After:
BM_padding/10            5000000        454   220.20 MFlops/s
BM_padding/80            5000000        455 14039.86 MFlops/s
BM_padding/640           5000000        452 904968.83 MFlops/s
BM_padding/4K            5000000        411 60750049.21 MFlops/s
2016-05-17 09:17:26 -07:00
Benoit Steiner
8d06c02ffd Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10            5000000        460   217.03 MFlops/s
BM_padding/80            5000000        460 13899.40 MFlops/s
BM_padding/640           5000000        461 888421.17 MFlops/s
BM_padding/4K            5000000        460 54316322.55 MFlops/s
After:
BM_padding/10            5000000        454   220.20 MFlops/s
BM_padding/80            5000000        455 14039.86 MFlops/s
BM_padding/640           5000000        452 904968.83 MFlops/s
BM_padding/4K            5000000        411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00
David Dement
ccc7563ac5 made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process 2016-05-16 14:26:41 -04:00
Benoit Steiner
a80d875916 Added missing costPerCoeff method 2016-05-16 09:31:10 -07:00
Benoit Steiner
83ef39e055 Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions.
Before:
BM_colReduction_12T/10       1000000       1949    51.29 MFlops/s
BM_colReduction_12T/80        100000      15636   409.29 MFlops/s
BM_colReduction_12T/640        20000      95100  4307.01 MFlops/s
BM_colReduction_12T/4K           500    4573423  5466.36 MFlops/s
BM_colReduction_4T/10        1000000       1867    53.56 MFlops/s
BM_colReduction_4T/80         500000       5288  1210.11 MFlops/s
BM_colReduction_4T/640         10000     106924  3830.75 MFlops/s
BM_colReduction_4T/4K            500    9946374  2513.48 MFlops/s
BM_colReduction_8T/10        1000000       1912    52.30 MFlops/s
BM_colReduction_8T/80         200000       8354   766.09 MFlops/s
BM_colReduction_8T/640         20000      85063  4815.22 MFlops/s
BM_colReduction_8T/4K            500    5445216  4591.19 MFlops/s
BM_rowReduction_12T/10       1000000       2041    48.99 MFlops/s
BM_rowReduction_12T/80        100000      15426   414.87 MFlops/s
BM_rowReduction_12T/640        50000      39117 10470.98 MFlops/s
BM_rowReduction_12T/4K           500    3034298  8239.14 MFlops/s
BM_rowReduction_4T/10        1000000       1834    54.51 MFlops/s
BM_rowReduction_4T/80         500000       5406  1183.81 MFlops/s
BM_rowReduction_4T/640         50000      35017 11697.16 MFlops/s
BM_rowReduction_4T/4K            500    3428527  7291.76 MFlops/s
BM_rowReduction_8T/10        1000000       1925    51.95 MFlops/s
BM_rowReduction_8T/80         200000       8519   751.23 MFlops/s
BM_rowReduction_8T/640         50000      33441 12248.42 MFlops/s
BM_rowReduction_8T/4K           1000    2852841  8763.19 MFlops/s


After:
BM_colReduction_12T/10      50000000         59  1678.30 MFlops/s
BM_colReduction_12T/80       5000000        725  8822.71 MFlops/s
BM_colReduction_12T/640        20000      90882  4506.93 MFlops/s
BM_colReduction_12T/4K           500    4668855  5354.63 MFlops/s
BM_colReduction_4T/10       50000000         59  1687.37 MFlops/s
BM_colReduction_4T/80        5000000        737  8681.24 MFlops/s
BM_colReduction_4T/640         50000     108637  3770.34 MFlops/s
BM_colReduction_4T/4K            500    7912954  3159.38 MFlops/s
BM_colReduction_8T/10       50000000         60  1657.21 MFlops/s
BM_colReduction_8T/80        5000000        726  8812.48 MFlops/s
BM_colReduction_8T/640         20000      91451  4478.90 MFlops/s
BM_colReduction_8T/4K            500    5441692  4594.16 MFlops/s
BM_rowReduction_12T/10      20000000         93  1065.28 MFlops/s
BM_rowReduction_12T/80       2000000        950  6730.96 MFlops/s
BM_rowReduction_12T/640        50000      38196 10723.48 MFlops/s
BM_rowReduction_12T/4K           500    3019217  8280.29 MFlops/s
BM_rowReduction_4T/10       20000000         93  1064.30 MFlops/s
BM_rowReduction_4T/80        2000000        959  6667.71 MFlops/s
BM_rowReduction_4T/640         50000      37433 10941.96 MFlops/s
BM_rowReduction_4T/4K            500    3036476  8233.23 MFlops/s
BM_rowReduction_8T/10       20000000         93  1072.47 MFlops/s
BM_rowReduction_8T/80        2000000        959  6670.04 MFlops/s
BM_rowReduction_8T/640         50000      38069 10759.37 MFlops/s
BM_rowReduction_8T/4K           1000    2758988  9061.29 MFlops/s
2016-05-16 08:55:21 -07:00
Benoit Steiner
b789a26804 Fixed syntax error 2016-05-16 08:51:08 -07:00
Benoit Steiner
83dfb40f66 Turn on the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define. 2016-05-13 17:23:15 -07:00
Benoit Steiner
97605c7b27 New multithreaded contraction that doesn't rely on the thread pool to run the closures in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closures in any order. 2016-05-13 17:11:29 -07:00
Benoit Steiner
c4fc8b70ec Removed unnecessary thread synchronization 2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31 Fixed compilation errors triggered by old versions of gcc 2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen
5005b27fc8 Disabled the cost model by accident. Revert. 2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen
989e419328 Address comments by bsteiner. 2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen
e55deb21c5 Improvements to parallelFor.
Move some scalar functors from TensorFunctors.h to Eigen core.
2016-05-12 14:07:22 -07:00
Benoit Steiner
ae9688f313 Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel. 2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45 Fixed potential race condition in the non blocking thread pool 2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec Replace implicit cast with an explicit one 2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061 Worked around compilation errors with older versions of gcc 2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82 Improved the portability of the tensor code 2016-05-11 23:29:09 -07:00
Benoit Steiner
b6a517c47d Added the ability to load fp16 using the texture path.
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Christoph Hertzberg
1a1ce6ff61 Removed deprecated flag (which apparently was ignored anyway) 2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65 fixed some double-promotion and sign-compare warnings 2016-05-11 23:02:26 +02:00
Benoit Steiner
217d984abc Fixed a typo in my previous commit 2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48 Fix potential race condition in the CUDA reduction code. 2016-05-11 10:08:51 -07:00
Benoit Steiner
6a5717dc74 Explicitly initialize all the atomic variables. 2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1 Properly gate the use of half2. 2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092 Added support for fp16 to the sigmoid functor. 2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552 Small improvement to the full reduction of fp16 2016-05-10 11:58:18 -07:00
Benoit Steiner
4013b8feca Simplified the reduction code a little. 2016-05-10 09:40:42 -07:00
Benoit Steiner
4670d7d5ce Improved the performance of full reductions on GPU:
Before:
BM_fullReduction/10       200000      11751     8.51 MFlops/s
BM_fullReduction/80         5000     523385    12.23 MFlops/s
BM_fullReduction/640          50   36179326    11.32 MFlops/s
BM_fullReduction/4K            1 2173517195    11.50 MFlops/s

After:
BM_fullReduction/10       500000       5987    16.70 MFlops/s
BM_fullReduction/80       200000      10636   601.73 MFlops/s
BM_fullReduction/640       50000      58428  7010.31 MFlops/s
BM_fullReduction/4K         1000    2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58 Added the ability to use a scratch buffer in cuda kernels 2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2 Added a new parallelFor api to the thread pool device. 2016-05-09 10:45:12 -07:00
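A hedged sketch of the new entry point, assuming the signature of this period (a range length, a per-item TensorOpCost estimate, and a callback invoked over [first, last) sub-ranges):

```cpp
#define EIGEN_USE_THREADS
#include <vector>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::ThreadPool pool(4);
  Eigen::ThreadPoolDevice device(&pool, 4);
  std::vector<float> data(1 << 20, 1.0f);
  // Per-element cost estimate (bytes loaded, bytes stored, compute cycles);
  // the device uses it to choose a block size before sharding the range.
  Eigen::TensorOpCost cost(sizeof(float), sizeof(float), 1);
  device.parallelFor(static_cast<Eigen::Index>(data.size()), cost,
                     [&](Eigen::Index first, Eigen::Index last) {
                       for (Eigen::Index i = first; i < last; ++i)
                         data[i] *= 2.0f;
                     });
  return 0;
}
```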
Benoit Steiner
dc7dbc2df7 Optimized the non blocking thread pool:
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
* Steal only 1 task from a remote queue instead of half of the tasks.
2016-05-09 10:17:17 -07:00
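The permutation trick is worth spelling out; a self-contained illustrative sketch (not Eigen's actual code) of visiting every victim queue exactly once in pseudo-random order by stepping with a stride coprime to the queue count:

```cpp
#include <cstdio>
#include <random>
#include <vector>

int main() {
  const unsigned kQueues = 8;
  // Precompute all strides coprime to kQueues; any such stride generates a
  // full permutation of {0, ..., kQueues - 1}.
  std::vector<unsigned> coprimes;
  for (unsigned i = 1; i <= kQueues; ++i) {
    unsigned a = i, b = kQueues;
    while (b) { unsigned t = a % b; a = b; b = t; }  // gcd(i, kQueues)
    if (a == 1) coprimes.push_back(i);
  }
  std::mt19937 rng(42);
  unsigned r = rng();
  unsigned victim = r % kQueues;
  unsigned inc = coprimes[r % coprimes.size()];
  for (unsigned i = 0; i < kQueues; ++i) {
    std::printf("try stealing one task from queue %u\n", victim);
    victim = (victim + inc) % kQueues;  // next queue in the permutation
  }
  return 0;
}
```

Because the stride is coprime to the queue count, all queues are visited before any repeats, which is exactly the property the first bullet above relies on.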
Benoit Steiner
c54ae65c83 Marked a few tensor operations as read only 2016-05-05 17:18:47 -07:00
Benoit Steiner
910e013506 Relaxed an assertion that was tighter than necessary. 2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658 Fixed some incorrect assertions 2016-05-05 10:02:26 -07:00
Benoit Steiner
a4d6e8fef0 Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. 2016-05-05 09:25:55 -07:00
Benoit Steiner
f363e533aa Added tests for full contractions using thread pools and gpu devices.
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58 Updated the contraction code to ensure that full contractions return a tensor of rank 0 2016-05-05 08:37:47 -07:00
Christoph Hertzberg
dacb469bc9 Enable and fix -Wdouble-conversion warnings 2016-05-05 13:35:45 +02:00
Benoit Steiner
dd2b45feed Removed extraneous 'explicit' keywords 2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae Use numext::isfinite instead of std::isfinite 2016-05-03 19:56:40 -07:00
Benoit Steiner
aad9a04da4 Deleted superfluous explicit keyword. 2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b Fixed compilation error 2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8 Added missing accessors to fixed sized tensors 2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e Deleted trailing commas 2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2 Deleted useless trailing commas 2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb Deleted unnecessary trailing commas. 2016-04-29 18:33:46 -07:00
Benoit Steiner
f100d1494c Return the proper size (ie 1) for tensors of rank 0 2016-04-29 18:14:33 -07:00
Benoit Steiner
a8c0405cf5 Deleted unused default values for template parameters 2016-04-29 16:34:43 -07:00
Benoit Steiner
c07404f6a1 Restore Tensor support for non c++11 compilers 2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021 Fixed include path 2016-04-29 15:11:09 -07:00
Gael Guennebaud
318e65e0ae Fix missing inclusion of Eigen/Core 2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases. 2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133 Refactor the unsupported CXX11/Core module to internal headers only. 2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46 Fixed the partial evaluation of non vectorizable tensor subexpressions 2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260 Refined the cost of the striding operation. 2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e Provide access to the base threadpool classes 2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a Added the ability to switch to the new thread pool with a #define 2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b Fixed several compilation warnings 2016-04-21 11:03:02 -07:00
Benoit Steiner
2dde1b1028 Don't crash when attempting to reduce empty tensors. 2016-04-20 18:08:20 -07:00
Benoit Steiner
c7c2054bb5 Started to implement a portable way to yield. 2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028 Implemented a more portable version of thread local variables 2016-04-19 15:56:02 -07:00
Benoit Steiner
5b1106c56b Fixed a compilation error with nvcc 7. 2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db Simplified the code that launches cuda kernels. 2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d Don't take the address of a kernel on CUDA devices that don't support this feature. 2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058 Use numext::ceil instead of std::ceil 2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b Avoid an unnecessary copy of the evaluator. 2016-04-19 13:54:28 -07:00
Benoit Steiner
50968a0a3e Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. 2016-04-19 11:53:58 -07:00
Benoit Steiner
c8e8f93d6c Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. 2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a Deleted unnecessary variable 2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a Fixed a few compilation warnings 2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f Merged in rmlarsen/eigen (pull request PR-178)
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b Get rid of void* casting when calling EvalRange::run. 2016-04-15 12:51:33 -07:00
Benoit Steiner
a62e924656 Added ability to access the cache sizes from the tensor devices 2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426 Added support for exclusive or 2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02 Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. 2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d Added missing definition of PacketSize in the gpu evaluator of convolution 2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e Merged in rmlarsen/eigen (pull request PR-177)
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
9c064b5a97 Cleanup 2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41 Prepared the migration to the new non blocking thread pool 2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b Improvements to cost model. 2016-04-14 15:52:58 -07:00
Benoit Steiner
78a51abc12 Added a more scalable non blocking thread pool 2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7 Merge upstream updates. 2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6 Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. 2016-04-14 13:57:35 -07:00
Benoit Steiner
5912ad877c Silenced a compilation warning 2016-04-14 11:40:14 -07:00
Benoit Steiner
c7167fee0e Added support for fp16 to the sigmoid function 2016-04-14 10:08:33 -07:00
Benoit Steiner
3b76df64fc Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on device of capability >= 5.3 2016-04-12 10:58:51 -07:00
Benoit Steiner
7d5b17087f Added missing EIGEN_DEVICE_FUNC to the tensor conversion code. 2016-04-07 20:01:19 -07:00
Benoit Steiner
48308ed801 Added support for isinf, isnan, and isfinite checks to the tensor api 2016-04-07 09:48:36 -07:00
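A minimal sketch of the new element-wise checks, assuming they yield boolean tensor expressions:

```cpp
#include <limits>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> t(3);
  t.setValues({1.0f, std::numeric_limits<float>::quiet_NaN(),
               std::numeric_limits<float>::infinity()});
  Eigen::Tensor<bool, 1> nan_mask = t.isnan();     // 0 1 0
  Eigen::Tensor<bool, 1> inf_mask = t.isinf();     // 0 0 1
  Eigen::Tensor<bool, 1> fin_mask = t.isfinite();  // 1 0 0
  return 0;
}
```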
Benoit Steiner
cfb34d808b Fixed a possible integer overflow. 2016-04-07 08:46:52 -07:00
Benoit Steiner
7be1eaad1e Fixed typos in the implementation of the zeta and polygamma ops. 2016-04-06 14:15:37 -07:00
tillahoffmann
726bd5f077 Merged eigen/eigen into default 2016-04-05 18:21:05 +01:00
Gael Guennebaud
4d7e230d2f bug #1189: fix pow/atan2 compilation for AutoDiffScalar 2016-04-05 14:49:41 +02:00
Till Hoffmann
80eba21ad0 Merge upstream. 2016-04-01 18:18:49 +01:00
Till Hoffmann
ffd770ce94 Fixed CUDA signature. 2016-04-01 17:58:24 +01:00
tillahoffmann
49960adbdd Merged eigen/eigen into default 2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81 Added polygamma function. 2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf Added zeta function. 2016-04-01 13:32:29 +01:00
Benoit Steiner
3da495e6b9 Relaxed the condition used to gate the fft code. 2016-03-31 18:11:51 -07:00
Benoit Steiner
0f5cc504fe Properly gate the fft code 2016-03-31 12:59:39 -07:00
Benoit Steiner
af4ef540bf Fixed an off-by-one bug in a debug assertion 2016-03-30 18:37:19 -07:00
Benoit Steiner
791e5cfb69 Added NumTraits for type2index. 2016-03-30 18:36:36 -07:00
Benoit Steiner
483aaad10a Fixed compilation warning 2016-03-30 17:08:13 -07:00
Benoit Steiner
1b40abbf99 Added missing assignment operator to the TensorUInt128 class, and made misc small improvements 2016-03-30 13:17:03 -07:00
Benoit Steiner
aa45ad2aac Fixed the formatting of the README. 2016-03-29 15:06:13 -07:00
Benoit Steiner
56df5ef1d7 Attempt to fix the formatting of the README 2016-03-29 15:03:38 -07:00
Benoit Steiner
c38295f0a0 Added support for fmod 2016-03-28 15:53:02 -07:00
Benoit Steiner
6772f653c3 Made it possible to customize the threadpool 2016-03-28 10:01:04 -07:00
Benoit Steiner
1bc81f7889 Fixed compilation warnings on arm 2016-03-28 09:21:04 -07:00
Benoit Steiner
78f83d6f6a Prevent potential overflow. 2016-03-28 09:18:04 -07:00
Benoit Steiner
74f91ed06c Improved support for integer modulo 2016-03-25 17:21:56 -07:00
Benoit Steiner
41434a8a85 Avoid unnecessary conversions 2016-03-23 16:52:38 -07:00
Benoit Steiner
92693b50eb Fixed compilation warning 2016-03-23 16:40:36 -07:00
Benoit Steiner
393bc3b16b Added comment 2016-03-23 16:22:15 -07:00
Christoph Hertzberg
9642fd7a93 Replace all M_PI by EIGEN_PI and add a check to the testsuite. 2016-03-23 15:37:45 +01:00
Benoit Steiner
3d1e857327 Fixed compilation error 2016-03-22 15:48:28 -07:00
Benoit Steiner
de7d92c259 Pulled latest updates from trunk 2016-03-22 15:24:49 -07:00
Benoit Steiner
002cf0d1c9 Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead 2016-03-22 15:24:23 -07:00
Benoit Steiner
bc2b802751 Fixed a couple of typos 2016-03-22 14:27:34 -07:00
Benoit Steiner
6a31b7be3e Avoid using std::vector whenever possible 2016-03-22 14:02:50 -07:00
Benoit Steiner
65a7113a36 Use an enum instead of a static const int to prevent possible link error 2016-03-22 09:33:54 -07:00
Benoit Steiner
f9ad25e4d8 Fixed contractions of 16 bit floats 2016-03-22 09:30:23 -07:00
Benoit Steiner
8ef3181f15 Worked around a constness related issue 2016-03-21 11:24:05 -07:00
Benoit Steiner
7a07d6aa2b Small cleanup 2016-03-21 11:12:17 -07:00
Benoit Steiner
e91f255301 Marked variables that are only used in debug mode as such 2016-03-21 10:02:00 -07:00
Benoit Steiner
db5c14de42 Explicitly cast the default value into the proper scalar type. 2016-03-21 09:52:58 -07:00
Benoit Steiner
8e03333f06 Renamed some class members to make the code more readable. 2016-03-18 15:21:04 -07:00
Benoit Steiner
6c08943d9f Fixed a bug in the padding of extracted image patches. 2016-03-18 15:19:10 -07:00
Benoit Steiner
9a7ece9caf Worked around constness issue 2016-03-18 10:38:29 -07:00
Benoit Steiner
edc679f6c6 Fixed compilation warning 2016-03-18 07:12:34 -07:00
Benoit Steiner
70eb70f5f8 Avoid mutable class members when possible 2016-03-17 21:47:18 -07:00
Benoit Steiner
95b8961a9b Allocate the mersenne twister used by the random number generators on the heap instead of on the stack since they tend to keep a lot of state (i.e. about 5k) around. 2016-03-17 15:23:51 -07:00
Benoit Steiner
f7329619da Fix bug in tensor contraction. The code assumes that the contraction axis indices for the LHS (after possibly swapping to ColMajor!) are increasing. Explicitly sort the contraction axis pairs to make it so. 2016-03-17 15:08:02 -07:00
Christoph Hertzberg
46aa9772fc Merged in ebrevdo/eigen (pull request PR-169)
Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.
2016-03-16 21:59:08 +01:00
Benoit Steiner
b72ffcb05e Made the comparison of Eigen::array GPU friendly 2016-03-11 16:37:59 -08:00
Benoit Steiner
25f69cb932 Added a comparison operator for Eigen::array
Alias Eigen::array to std::array when compiling with Visual Studio 2015
2016-03-11 15:20:37 -08:00
Benoit Steiner
86d45a3c83 Worked around visual studio compilation warnings. 2016-03-09 21:29:39 -08:00
Benoit Steiner
8fd4241377 Fixed a typo. 2016-03-10 02:28:46 +00:00
Benoit Steiner
a685a6beed Made the list reductions less ambiguous. 2016-03-09 17:41:52 -08:00
Benoit Steiner
3149b5b148 Avoid implicit cast 2016-03-09 17:35:17 -08:00
Benoit Steiner
b2100b83ad Made sure to include the <random> header file when compiling with visual studio 2016-03-09 16:03:16 -08:00
Benoit Steiner
f05fb449b8 Avoid unnecessary conversion from 32bit int to 64bit unsigned int 2016-03-09 15:27:45 -08:00
Benoit Steiner
1d566417d2 Enable the random number generators when compiling with visual studio 2016-03-09 10:55:11 -08:00
Benoit Steiner
b084133dbf Fixed the integer division code on windows 2016-03-09 07:06:36 -08:00
Benoit Steiner
6d30683113 Fixed static assertion 2016-03-08 21:02:51 -08:00
Eugene Brevdo
5e7de771e3 Properly fix merge issues. 2016-03-08 17:35:05 -08:00
Benoit Steiner
46177c8d64 Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues. 2016-03-08 16:37:27 -08:00
Benoit Steiner
6d6413f768 Simplified the full reduction code 2016-03-08 16:02:00 -08:00
Benoit Steiner
5a427a94a9 Fixed the tensor generator code 2016-03-08 13:28:06 -08:00
Benoit Steiner
a81b88bef7 Fixed the tensor concatenation code 2016-03-08 12:30:19 -08:00
Benoit Steiner
551ff11d0d Fixed the tensor layout swapping code 2016-03-08 12:28:10 -08:00
Benoit Steiner
8768c063f5 Fixed the tensor chipping code. 2016-03-08 12:26:49 -08:00
Benoit Steiner
e09eb835db Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit. 2016-03-08 12:07:33 -08:00
Benoit Steiner
3b614a2358 Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly. 2016-03-07 17:53:28 -08:00
Benoit Steiner
769685e74e Added the ability to pad a tensor using a non-zero value 2016-03-07 14:45:37 -08:00
Benoit Steiner
7f87cc3a3b Fix a couple of typos in the code. 2016-03-07 14:31:27 -08:00
Eugene Brevdo
5707004d6b Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.

GPU tests are still failing for igamma/igammac.

1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
9a54c3e32b Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be. 2016-03-06 09:38:56 -08:00
Benoit Steiner
05bbca079a Turn on some of the cxx11 features when compiling with visual studio 2015 2016-03-05 10:52:08 -08:00
Benoit Steiner
23aed8f2e4 Use EIGEN_PI instead of redefining our own constant PI 2016-03-05 08:04:45 -08:00
Benoit Steiner
ec35068edc Don't rely on the M_PI constant since not all compilers provide it. 2016-03-04 16:42:38 -08:00
Benoit Steiner
60d9df11c1 Fixed the computation of leading zeros when compiling with msvc. 2016-03-04 16:27:02 -08:00
Benoit Steiner
c561eeb7bf Don't use implicit type conversions in initializer lists since not all compilers support them. 2016-03-04 14:12:45 -08:00
Benoit Steiner
2c50fc878e Fixed a typo 2016-03-04 14:09:38 -08:00
Benoit Steiner
5cf4558c0a Added support for rounding, flooring, and ceiling to the tensor api 2016-03-03 12:36:55 -08:00
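A minimal sketch of the three new element-wise ops:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> t(3);
  t.setValues({1.2f, -2.7f, 3.5f});
  Eigen::Tensor<float, 1> r = t.round();  //  1 -3  4
  Eigen::Tensor<float, 1> f = t.floor();  //  1 -3  3
  Eigen::Tensor<float, 1> c = t.ceil();   //  2 -2  4
  return 0;
}
```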
Benoit Steiner
68ac5c1738 Improved the performance of large outer reductions on cuda 2016-02-29 18:11:58 -08:00
Benoit Steiner
b2075cb7a2 Made the signature of the inner and outer reducers consistent 2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045 Optimized the performance of narrow reductions on CUDA devices 2016-02-29 10:48:16 -08:00
Benoit Steiner
609b3337a7 Print some information to stderr when a CUDA kernel fails 2016-02-27 20:42:57 +00:00
Benoit Steiner
ac2e6e0d03 Properly vectorized the random number generators 2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag 2016-02-26 12:38:18 -08:00
Benoit Steiner
2cd32cad27 Reverted previous commit since it caused more problems than it solved 2016-02-26 13:21:44 +00:00
Benoit Steiner
d9d05dd96e Fixed handling of long doubles on aarch64 2016-02-26 04:13:58 -08:00
Benoit Steiner
c36c09169e Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices. 2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b Marked the And and Or reducers as stateless. 2016-02-24 16:43:01 -08:00
Benoit Steiner
1d9256f7db Updated the padding code to work with half floats 2016-02-23 05:51:22 +00:00
Benoit Steiner
72d2cf642e Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode. 2016-02-22 15:29:41 -08:00
Benoit Steiner
5cd00068c0 include <iostream> in the tensor header since we now use it to better report cuda initialization errors 2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463 Fixed compilation warning generated by clang 2016-02-21 22:43:37 -08:00
Benoit Steiner
96a24b05cc Optimized casting of tensors in the case where the casting happens to be a no-op 2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f Prevent unnecessary Index to int conversions 2016-02-21 08:49:36 -08:00
Rasmus Munk Larsen
8eb127022b Get rid of duplicate code. 2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447 Speed up tensor FFT by up to ~25-50%.
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8            132       134     -1.5%
BM_tensor_fft_single_1D_cpu/9           1162      1229     -5.8%
BM_tensor_fft_single_1D_cpu/16           199       195     +2.0%
BM_tensor_fft_single_1D_cpu/17          2587      2267    +12.4%
BM_tensor_fft_single_1D_cpu/32           373       341     +8.6%
BM_tensor_fft_single_1D_cpu/33          5922      4879    +17.6%
BM_tensor_fft_single_1D_cpu/64           797       675    +15.3%
BM_tensor_fft_single_1D_cpu/65         13580     10481    +22.8%
BM_tensor_fft_single_1D_cpu/128         1753      1375    +21.6%
BM_tensor_fft_single_1D_cpu/129        31426     22789    +27.5%
BM_tensor_fft_single_1D_cpu/256         4005      3008    +24.9%
BM_tensor_fft_single_1D_cpu/257        70910     49549    +30.1%
BM_tensor_fft_single_1D_cpu/512         8989      6524    +27.4%
BM_tensor_fft_single_1D_cpu/513       165402    107751    +34.9%
BM_tensor_fft_single_1D_cpu/999       198293    115909    +41.5%
BM_tensor_fft_single_1D_cpu/1ki        21289     14143    +33.6%
BM_tensor_fft_single_1D_cpu/1k        361980    233355    +35.5%
BM_tensor_fft_double_1D_cpu/8            138       131     +5.1%
BM_tensor_fft_double_1D_cpu/9           1253      1133     +9.6%
BM_tensor_fft_double_1D_cpu/16           218       200     +8.3%
BM_tensor_fft_double_1D_cpu/17          2770      2392    +13.6%
BM_tensor_fft_double_1D_cpu/32           406       368     +9.4%
BM_tensor_fft_double_1D_cpu/33          6418      5153    +19.7%
BM_tensor_fft_double_1D_cpu/64           856       728    +15.0%
BM_tensor_fft_double_1D_cpu/65         14666     11148    +24.0%
BM_tensor_fft_double_1D_cpu/128         1913      1502    +21.5%
BM_tensor_fft_double_1D_cpu/129        36414     24072    +33.9%
BM_tensor_fft_double_1D_cpu/256         4226      3216    +23.9%
BM_tensor_fft_double_1D_cpu/257        86638     52059    +39.9%
BM_tensor_fft_double_1D_cpu/512         9397      6939    +26.2%
BM_tensor_fft_double_1D_cpu/513       203208    114090    +43.9%
BM_tensor_fft_double_1D_cpu/999       237841    125583    +47.2%
BM_tensor_fft_double_1D_cpu/1ki        20921     15392    +26.4%
BM_tensor_fft_double_1D_cpu/1k        455183    250763    +44.9%
BM_tensor_fft_single_2D_cpu/8           1051      1005     +4.4%
BM_tensor_fft_single_2D_cpu/9          16784     14837    +11.6%
BM_tensor_fft_single_2D_cpu/16          4074      3772     +7.4%
BM_tensor_fft_single_2D_cpu/17         75802     63884    +15.7%
BM_tensor_fft_single_2D_cpu/32         20580     16931    +17.7%
BM_tensor_fft_single_2D_cpu/33        345798    278579    +19.4%
BM_tensor_fft_single_2D_cpu/64         97548     81237    +16.7%
BM_tensor_fft_single_2D_cpu/65       1592701   1227048    +23.0%
BM_tensor_fft_single_2D_cpu/128       472318    384303    +18.6%
BM_tensor_fft_single_2D_cpu/129      7038351   5445308    +22.6%
BM_tensor_fft_single_2D_cpu/256      2309474   1850969    +19.9%
BM_tensor_fft_single_2D_cpu/257     31849182  23797538    +25.3%
BM_tensor_fft_single_2D_cpu/512     10395194   8077499    +22.3%
BM_tensor_fft_single_2D_cpu/513     144053843  104242541    +27.6%
BM_tensor_fft_single_2D_cpu/999     279885833  208389718    +25.5%
BM_tensor_fft_single_2D_cpu/1ki     45967677  36070985    +21.5%
BM_tensor_fft_single_2D_cpu/1k      619727095  456489500    +26.3%
BM_tensor_fft_double_2D_cpu/8           1110      1016     +8.5%
BM_tensor_fft_double_2D_cpu/9          17957     15768    +12.2%
BM_tensor_fft_double_2D_cpu/16          4558      4000    +12.2%
BM_tensor_fft_double_2D_cpu/17         79237     66901    +15.6%
BM_tensor_fft_double_2D_cpu/32         21494     17699    +17.7%
BM_tensor_fft_double_2D_cpu/33        357962    290357    +18.9%
BM_tensor_fft_double_2D_cpu/64        105179     87435    +16.9%
BM_tensor_fft_double_2D_cpu/65       1617143   1288006    +20.4%
BM_tensor_fft_double_2D_cpu/128       512848    419397    +18.2%
BM_tensor_fft_double_2D_cpu/129      7271322   5636884    +22.5%
BM_tensor_fft_double_2D_cpu/256      2415529   1922032    +20.4%
BM_tensor_fft_double_2D_cpu/257     32517952  24462177    +24.8%
BM_tensor_fft_double_2D_cpu/512     10724898   8287617    +22.7%
BM_tensor_fft_double_2D_cpu/513     146007419  108603266    +25.6%
BM_tensor_fft_double_2D_cpu/999     296351330  221885776    +25.1%
BM_tensor_fft_double_2D_cpu/1ki     59334166  48357539    +18.5%
BM_tensor_fft_double_2D_cpu/1k      666660132  483840349    +27.4%
2016-02-19 16:29:23 -08:00
Benoit Steiner
46fc23f91c Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. 2016-02-19 13:44:22 -08:00
Benoit Steiner
670db7988d Updated the contraction code to make it compatible with half floats. 2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a Added support for tensor reductions on half floats 2016-02-19 10:05:59 -08:00
Benoit Steiner
f268db1c4b Added the ability to query the minor version of a cuda device 2016-02-19 16:31:04 +00:00
Benoit Steiner
f3352e0fb0 Don't make the array constructors explicit 2016-02-19 15:58:57 +00:00
Benoit Steiner
cd042dbbfd Fixed a bug in the tensor type converter 2016-02-19 15:03:26 +00:00
Benoit Steiner
de345eff2e Added a method to conjugate the content of a tensor or the result of a tensor expression. 2016-02-11 16:34:07 -08:00
Benoit Steiner
9a21b38ccc Worked around a few clang compilation warnings 2016-02-10 08:02:04 -08:00
Benoit Steiner
72ab7879f7 Fixed clang compilation warnings 2016-02-10 06:48:28 -08:00
Benoit Steiner
e88535634d Fixed some clang compilation warnings 2016-02-09 23:32:41 -08:00
Benoit Steiner
d69946183d Updated the TensorIntDivisor code to work properly on LLP64 systems 2016-02-08 21:03:59 -08:00
Benoit Steiner
4d4211c04e Avoid unnecessary type conversions 2016-02-05 18:19:41 -08:00
Benoit Steiner
f535378995 Added support for vectorized type casting of int to char. 2016-02-03 18:58:29 -08:00
Benoit Steiner
4ab63a3f6f Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements. 2016-02-03 17:23:07 -08:00
Benoit Steiner
1cbb79cdfd Made sure the dummy element of the size-0 array is always initialized to silence some compiler warnings 2016-02-03 15:58:26 -08:00
Benoit Steiner
dc413dbe8a Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6 Use EIGEN_STATIC_ASSERT for backward compatibility. 2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341 Don't try to use direct offsets when computing a tensor product, since the required stride isn't available. 2016-02-02 11:06:53 -08:00
Ville Kallioniemi
aedea349aa Replace separate low word constructors with a single templated constructor. 2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f Rebase to latest. 2016-02-01 19:32:31 -07:00
Benoit Steiner
6b5dff875e Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. 2016-02-01 12:46:32 -08:00
Benoit Steiner
e80ed948e1 Fixed a number of compilation warnings generated by the cuda tests 2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf Fixed a few compilation warnings 2016-01-31 16:48:50 -08:00
Benoit Steiner
963f2d2a8f Marked several methods EIGEN_DEVICE_FUNC 2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0 Fixed a couple of compilation warnings. 2016-01-28 23:15:45 -08:00
Gael Guennebaud
ddf64babde merge 2016-01-28 13:21:48 +01:00
Benoit Steiner
4bf9eaf77a Deleted an invalid assertion that prevented the assignment of empty tensors. 2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885 Fixed some compilation problems with nvcc + clang 2016-01-27 15:37:03 -08:00
Gael Guennebaud
9c8f7dfe94 bug #1156: fix several function declarations whose arguments were passed by value instead of being passed by reference 2016-01-27 18:34:42 +01:00
Ville Kallioniemi
02db1228ed Add constructor for long types. 2016-01-26 23:41:01 -07:00
Hauke Heibel
5eb2790be0 Fixed minor typo in SplineFitting. 2016-01-25 22:17:52 +01:00
Benoit Steiner
e3a15a03a4 Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression 2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e Added missing EIGEN_DEVICE_FUNC qualifier 2016-01-24 20:36:05 -08:00
Benoit Steiner
cb4e53ff7f Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
Add ctor for long
2016-01-22 19:11:31 -08:00
Ville Kallioniemi
9f94e030c1 Re-add executable flags to minimize changeset. 2016-01-22 20:08:45 -07:00
Benoit Steiner
3aeeca32af Leverage the new blocking code in the tensor contraction code. 2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27 Created a mechanism to enable contraction mappers to determine the best blocking strategy. 2016-01-22 14:37:26 -08:00
Gael Guennebaud
6a44ccb58b Backout changeset 690bc950f7 2016-01-22 15:03:53 +01:00
Ville Kallioniemi
9b6c72958a Update to latest default branch 2016-01-21 23:08:54 -07:00
Benoit Steiner
c33479324c Fixed a constness bug 2016-01-21 17:08:11 -08:00
Jan Prach
690bc950f7 fix clang warnings
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
7ce932edd3 Small cleanup and small fix to the contraction of row major tensors 2016-01-20 18:12:08 -08:00
Benoit Steiner
47076bf00e Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%. 2016-01-20 14:51:48 -08:00
Ville Kallioniemi
915e7667cd Remove executable bit from header files 2016-01-19 21:17:29 -07:00
Ville Kallioniemi
2832175a68 Use explicitly 32 bit integer types in constructors. 2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901 Improved the formatting of the code 2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375 Moved the contraction mapping code to its own file to make the code more manageable. 2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f Improved code indentation 2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33 Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. 2016-01-19 17:05:10 -08:00