Commit Graph

2109 Commits

Author SHA1 Message Date
Gael Guennebaud
fa39f81b48 Fix instantiation of ScalarBinaryOpTraits for AutoDiff. 2016-06-24 11:33:30 +02:00
Rasmus Munk Larsen
a9c1e4d7b7 Return -1 from CurrentThreadId when called by thread outside the pool. 2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2 Resolve merge. 2016-06-23 15:08:03 -07:00
Gael Guennebaud
361dbd246d Add unit test for printing empty tensors 2016-06-23 18:54:30 +02:00
Gael Guennebaud
360a743a10 bug #1241: does not emmit anything for empty tensors 2016-06-23 18:47:31 +02:00
Gael Guennebaud
7c6561485a merge PR 194 2016-06-23 15:29:57 +02:00
Benoit Steiner
a29a2cb4ff Silenced a couple of compilation warnings generated by xcode 2016-06-22 16:43:02 -07:00
Benoit Steiner
f8fcd6b32d Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers 2016-06-22 16:03:11 -07:00
Benoit Steiner
c58df31747 Handle empty tensors in the print functions 2016-06-21 09:22:43 -07:00
Benoit Steiner
de32f8d656 Fixed the printing of rank-0 tensors 2016-06-20 10:46:45 -07:00
Tal Hadad
8e198d6835 Complete docs and add ostream operator for EulerAngles. 2016-06-19 20:42:45 +03:00
Geoffrey Lalonde
72c95383e0 Add autodiff coverage for standard library hyperbolic functions, and tests.
* * *
Corrected tanh derivatived, moved test definitions.
* * *
Added more test cases, removed lingering lines
2016-06-15 23:33:19 -07:00
Benoit Steiner
7d495d890a Merged in ibab/eigen (pull request PR-197)
Implement exclusive scan option for Tensor library
2016-06-14 17:54:59 -07:00
Benoit Steiner
aedc5be1d6 Avoid generating pseudo random numbers that are multiple of 5: this helps
spread the load over multiple cpus without havind to rely on work stealing.
2016-06-14 17:51:47 -07:00
Igor Babuschkin
c4d10e921f Implement exclusive scan option 2016-06-14 19:44:07 +01:00
Gael Guennebaud
76236cdea4 merge 2016-06-14 15:33:47 +02:00
Gael Guennebaud
62134082aa Update AutoDiffScalar wrt to scalar-multiple. 2016-06-14 15:06:35 +02:00
Gael Guennebaud
5d38203735 Update Tensor module to use bind1st_op and bind2nd_op 2016-06-14 15:06:03 +02:00
Gael Guennebaud
f925dba3d9 Fix compilation of BVH example 2016-06-14 11:32:09 +02:00
Tal Hadad
6edfe8771b Little bit docs 2016-06-13 22:03:19 +03:00
Tal Hadad
6e1c086593 Add static assertion 2016-06-13 21:55:17 +03:00
Gael Guennebaud
3c12e24164 Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them. 2016-06-13 16:18:59 +02:00
Tal Hadad
06206482d9 More docs, and minor code fixes 2016-06-12 23:40:17 +03:00
Benoit Steiner
65d33e5898 Merged in ibab/eigen (pull request PR-195)
Add small fixes to TensorScanOp
2016-06-10 19:31:17 -07:00
Benoit Steiner
a05607875a Don't refer to the half2 type unless it's been defined 2016-06-10 11:53:56 -07:00
Igor Babuschkin
86aedc9282 Add small fixes to TensorScanOp 2016-06-07 20:06:38 +01:00
Christoph Hertzberg
db0118342c Fixed compilation of BVH_Example (required for make doc) 2016-06-07 19:17:18 +02:00
Benoit Steiner
84b2060a9e Fixed compilation error with gcc 4.4 2016-06-06 17:16:19 -07:00
Benoit Steiner
7ef9f47b58 Misc small improvements to the reduction code. 2016-06-06 14:09:46 -07:00
Tal Hadad
e30133e439 Doc EulerAngles class, and minor fixes. 2016-06-06 22:01:40 +03:00
Benoit Steiner
9137f560f0 Moved assertions to the constructor to make the code more portable 2016-06-06 07:26:48 -07:00
Gael Guennebaud
66e99ab6a1 Relax mixing-type constraints for binary coefficient-wise operators:
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binray operators (comparisons,pow,etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
2016-06-06 15:11:41 +02:00
Rasmus Munk Larsen
f1f2ff8208 size_t -> int 2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen
76308e7fd2 Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool. 2016-06-03 16:28:58 -07:00
Benoit Steiner
37638dafd7 Simplified the code that dispatches vectorized reductions on GPU 2016-06-09 10:29:52 -07:00
Benoit Steiner
66796e843d Fixed definition of some of the reducer_traits 2016-06-09 08:50:01 -07:00
Benoit Steiner
14a112ee15 Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression. 2016-06-09 08:25:22 -07:00
Benoit Steiner
8f92c26319 Improved code formatting 2016-06-09 08:23:42 -07:00
Benoit Steiner
aa33446dac Improved support for vectorization of 16-bit floats 2016-06-09 08:22:27 -07:00
Benoit Steiner
d6d39c7ddb Added missing EIGEN_DEVICE_FUNC 2016-06-07 14:35:08 -07:00
Gael Guennebaud
e8b922ca63 Fix MatrixFunctions module. 2016-06-03 09:21:35 +02:00
Benoit Steiner
c3c8ad8046 Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing 2016-06-02 21:17:41 -07:00
Eugene Brevdo
39baff850c Add TernaryFunctors and the betainc SpecialFunction.
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.

Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.

Added unit tests in array.cpp and cxx11_tensor_cuda.cu


Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc().  Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe.  and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
2016-06-02 17:04:19 -07:00
Benoit Steiner
02db4e1a82 Disable the tensor tests when using msvc since older versions of the compiler fail to handle this code 2016-06-04 08:21:17 -07:00
Benoit Steiner
c21eaedce6 Use array_prod to compute the number of elements contained in the input tensor expression 2016-06-04 07:47:04 -07:00
Benoit Steiner
36a4500822 Merged in ibab/eigen (pull request PR-192)
Add generic scan method
2016-06-03 17:28:33 -07:00
Benoit Steiner
c2a102345f Improved the performance of full reductions.
AFTER:
BM_fullReduction/10        4541       4543     154017  21.0M items/s
BM_fullReduction/64        5191       5193     100000  752.5M items/s
BM_fullReduction/512       9588       9588      71361  25.5G items/s
BM_fullReduction/4k      244314     244281       2863  64.0G items/s
BM_fullReduction/5k      359382     359363       1946  64.8G items/s

BEFORE:
BM_fullReduction/10        9085       9087      74395  10.5M items/s
BM_fullReduction/64        9478       9478      72014  412.1M items/s
BM_fullReduction/512      14643      14646      46902  16.7G items/s
BM_fullReduction/4k      260338     260384       2678  60.0G items/s
BM_fullReduction/5k      385076     385178       1818  60.5G items/s
2016-06-03 17:27:08 -07:00
Igor Babuschkin
dc03b8f3a1 Add generic scan method 2016-06-03 17:37:04 +01:00
Rasmus Munk Larsen
811aadbe00 Add syntactic sugar to Eigen tensors to allow more natural syntax.
Specifically, this enables expressions involving:

scalar + tensor
scalar * tensor
scalar / tensor
scalar - tensor
2016-06-02 12:41:28 -07:00
Tal Hadad
52e4cbf539 Merged eigen/eigen into default 2016-06-02 22:15:20 +03:00
Tal Hadad
2aaaf22623 Fix Gael reports (except documention)
- "Scalar angle(int) const"  should be  "const Vector& angles() const"
- then method "coeffs" could be removed.
- avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;)
- about the "fromRotation" methods:
 - replace the ones which are not static by operator= (as in Quaternion)
 - the others are actually static methods: use a capital F: FromRotation
- method "invert" should be removed.
- use a macro to define both float and double EulerAnglesXYZ* typedefs
- AddConstIf -> not used
- no needs for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants:
  if(IsHeadingOpposite-=IsEven) res.alpha() = -res.alpha();
2016-06-02 22:12:57 +03:00
Igor Babuschkin
fbd7ed6ff7 Add tensor scan op
This is the initial implementation a generic scan operation.
Based on this, cumsum and cumprod method have been added to TensorBase.
2016-06-02 13:35:47 +01:00
Benoit Steiner
0ed08fd281 Use a single PacketSize variable 2016-06-01 21:19:05 -07:00
Benoit Steiner
8f6fedc55f Fixed compilation warning 2016-06-01 21:14:46 -07:00
Benoit Steiner
c3cada38e2 Speedup a test 2016-06-01 21:13:00 -07:00
Benoit Steiner
873e6ac54b Silenced compilation warning generated by nvcc. 2016-06-01 14:20:50 -07:00
Benoit Steiner
d27b0ad4c8 Added support for mean reductions on fp16 2016-06-01 11:12:07 -07:00
Benoit Steiner
5aeb3687c4 Only enable optimized reductions of fp16 if the reduction functor supports them 2016-05-31 10:33:40 -07:00
Benoit Steiner
e2946d962d Reimplement clamp as a static function. 2016-05-27 12:58:43 -07:00
Benoit Steiner
e96d36d4cd Use NULL instead of nullptr to preserve the compatibility with cxx03 2016-05-27 12:54:06 -07:00
Benoit Steiner
abc815798b Added a new operation to enable more powerful tensorindexing. 2016-05-27 12:22:25 -07:00
Benoit Steiner
5707537592 Fixed option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr' warning generated by nvcc 7.5 2016-05-27 10:47:53 -07:00
Gael Guennebaud
22a035db95 Fix compilation when defaulting to row-major 2016-05-27 10:31:11 +02:00
Benoit Steiner
1ae2567861 Fixed some compilation warnings 2016-05-26 15:57:19 -07:00
Benoit Steiner
1a47844529 Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g fp16 to float) 2016-05-26 14:37:09 -07:00
Benoit Steiner
36369ab63c Resolved merge conflicts 2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a Merged latest reduction improvements 2016-05-26 12:19:33 -07:00
Benoit Steiner
c1c7f06c35 Improved the performance of inner reductions. 2016-05-26 11:53:59 -07:00
Benoit Steiner
22d02c9855 Improved the coverage of the fp16 reduction tests 2016-05-26 11:12:16 -07:00
Benoit Steiner
8288b0aec2 Code cleanup. 2016-05-26 09:00:04 -07:00
Benoit Steiner
2d7ed54ba2 Made the static storage class qualifier come first. 2016-05-25 22:16:15 -07:00
Benoit Steiner
e1fca8866e Deleted unnecessary explicit qualifiers. 2016-05-25 22:15:26 -07:00
Benoit Steiner
9b0aaf5113 Don't mark inline functions as static since it confuses the ICC compiler 2016-05-25 22:10:11 -07:00
Benoit Steiner
037a463fd5 Marked unused variables as such 2016-05-25 22:07:48 -07:00
Benoit Steiner
3ac4045272 Made the IndexPair code compile in non cxx11 mode 2016-05-25 15:15:12 -07:00
Benoit Steiner
66556d0e05 Made the index pair list code more portable accross various compilers 2016-05-25 14:34:27 -07:00
Benoit Steiner
034aa3b2c0 Improved the performance of tensor padding 2016-05-25 11:43:08 -07:00
Benoit Steiner
58026905ae Added support for statically known lists of pairs of indices 2016-05-25 11:04:14 -07:00
Benoit Steiner
0835667329 There is no need to make the fp16 full reduction kernel a static function. 2016-05-24 23:11:56 -07:00
Benoit Steiner
b5d6b52a4d Fixed compilation warning 2016-05-24 23:10:57 -07:00
Benoit Steiner
a09cbf9905 Merged in rmlarsen/eigen (pull request PR-188)
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-23 12:55:12 -07:00
Christoph Hertzberg
718521d5cf Silenced several double-promotion warnings 2016-05-22 18:17:04 +02:00
Christoph Hertzberg
b5a7603822 fixed macro name 2016-05-22 16:49:29 +02:00
Christoph Hertzberg
25a03c02d6 Fix some sign-compare warnings 2016-05-22 16:42:27 +02:00
Gael Guennebaud
ccaace03c9 Make EIGEN_HAS_CONSTEXPR user configurable 2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable 2016-05-20 15:05:38 +02:00
Gael Guennebaud
48bf5ec216 Make EIGEN_HAS_RVALUE_REFERENCES user configurable 2016-05-20 14:54:20 +02:00
Gael Guennebaud
f43ae88892 Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES 2016-05-20 14:48:51 +02:00
Gael Guennebaud
2f656ce447 Remove std:: to enable custom scalar types. 2016-05-19 23:13:47 +02:00
Rasmus Larsen
b1e080c752 Merged eigen/eigen into default 2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen
5624219b6b Merge. 2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen
7df811cfe5 Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. 2016-05-18 15:09:48 -07:00
Benoit Steiner
bb3ff8e9d9 Advertize the packet api of the tensor reducers iff the corresponding packet primitives are available. 2016-05-18 14:52:49 -07:00
Gael Guennebaud
548a487800 bug #1229: bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway. 2016-05-18 16:44:05 +02:00
Gael Guennebaud
43790e009b Pass argument by const ref instead of by value in pow(AutoDiffScalar...) 2016-05-18 16:28:02 +02:00
Gael Guennebaud
1fbfab27a9 bug #1223: fix compilation of AutoDiffScalar's min/max operators, and add regression unit test. 2016-05-18 16:26:26 +02:00
Gael Guennebaud
448d9d943c bug #1222: fix compilation in AutoDiffScalar and add respective unit test 2016-05-18 16:00:11 +02:00
Rasmus Munk Larsen
f519fca72b Reduce overhead for small tensors and cheap ops by short-circuiting the const computation and block size calculation in parallelFor. 2016-05-17 16:06:00 -07:00
Benoit Steiner
86ae94462e #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. 2016-05-17 14:06:15 -07:00
Benoit Steiner
997c335970 Fixed compilation error 2016-05-17 12:54:18 -07:00
Benoit Steiner
ebf6ada5ee Fixed compilation error in the tensor thread pool 2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen
0bb61b04ca Merge upstream. 2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen
0dbd68145f Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h. 2016-05-17 10:25:19 -07:00
Rasmus Larsen
00228f2506 Merged eigen/eigen into default 2016-05-17 09:49:31 -07:00
Benoit Steiner
e7e64c3277 Enable the use of the packet api to evaluate tensor broadcasts. This speed things up quite a bit:
Before"
M_broadcasting/10        500000       3690    27.10 MFlops/s
BM_broadcasting/80        500000       4014  1594.24 MFlops/s
BM_broadcasting/640       100000      14770 27731.35 MFlops/s
BM_broadcasting/4K          5000     632711 39512.48 MFlops/s
After:
BM_broadcasting/10        500000       4287    23.33 MFlops/s
BM_broadcasting/80        500000       4455  1436.41 MFlops/s
BM_broadcasting/640       200000      10195 40173.01 MFlops/s
BM_broadcasting/4K          5000     423746 58997.57 MFlops/s
2016-05-17 09:24:35 -07:00
Benoit Steiner
5fa27574dd Allow vectorized padding on GPU. This helps speed things up a little
Before:
BM_padding/10            5000000        460   217.03 MFlops/s
BM_padding/80            5000000        460 13899.40 MFlops/s
BM_padding/640           5000000        461 888421.17 MFlops/s
BM_padding/4K            5000000        460 54316322.55 MFlops/s
After:
BM_padding/10            5000000        454   220.20 MFlops/s
BM_padding/80            5000000        455 14039.86 MFlops/s
BM_padding/640           5000000        452 904968.83 MFlops/s
BM_padding/4K            5000000        411 60750049.21 MFlops/s
2016-05-17 09:17:26 -07:00
Benoit Steiner
a910bcee43 Merged latest updates from trunk 2016-05-17 09:14:22 -07:00
Benoit Steiner
8d06c02ffd Allow vectorized padding on GPU. This helps speed things up a little.
Before:
BM_padding/10            5000000        460   217.03 MFlops/s
BM_padding/80            5000000        460 13899.40 MFlops/s
BM_padding/640           5000000        461 888421.17 MFlops/s
BM_padding/4K            5000000        460 54316322.55 MFlops/s
After:
BM_padding/10            5000000        454   220.20 MFlops/s
BM_padding/80            5000000        455 14039.86 MFlops/s
BM_padding/640           5000000        452 904968.83 MFlops/s
BM_padding/4K            5000000        411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00
Benoit Steiner
86da77cb9b Pulled latest updates from trunk. 2016-05-17 07:21:48 -07:00
Benoit Steiner
92fc6add43 Don't rely on c++11 extension when we don't have to. 2016-05-17 07:21:22 -07:00
David Dement
ccc7563ac5 made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process 2016-05-16 14:26:41 -04:00
Benoit Steiner
a80d875916 Added missing costPerCoeff method 2016-05-16 09:31:10 -07:00
Benoit Steiner
83ef39e055 Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions.
Before:
BM_colReduction_12T/10       1000000       1949    51.29 MFlops/s
BM_colReduction_12T/80        100000      15636   409.29 MFlops/s
BM_colReduction_12T/640        20000      95100  4307.01 MFlops/s
BM_colReduction_12T/4K           500    4573423  5466.36 MFlops/s
BM_colReduction_4T/10        1000000       1867    53.56 MFlops/s
BM_colReduction_4T/80         500000       5288  1210.11 MFlops/s
BM_colReduction_4T/640         10000     106924  3830.75 MFlops/s
BM_colReduction_4T/4K            500    9946374  2513.48 MFlops/s
BM_colReduction_8T/10        1000000       1912    52.30 MFlops/s
BM_colReduction_8T/80         200000       8354   766.09 MFlops/s
BM_colReduction_8T/640         20000      85063  4815.22 MFlops/s
BM_colReduction_8T/4K            500    5445216  4591.19 MFlops/s
BM_rowReduction_12T/10       1000000       2041    48.99 MFlops/s
BM_rowReduction_12T/80        100000      15426   414.87 MFlops/s
BM_rowReduction_12T/640        50000      39117 10470.98 MFlops/s
BM_rowReduction_12T/4K           500    3034298  8239.14 MFlops/s
BM_rowReduction_4T/10        1000000       1834    54.51 MFlops/s
BM_rowReduction_4T/80         500000       5406  1183.81 MFlops/s
BM_rowReduction_4T/640         50000      35017 11697.16 MFlops/s
BM_rowReduction_4T/4K            500    3428527  7291.76 MFlops/s
BM_rowReduction_8T/10        1000000       1925    51.95 MFlops/s
BM_rowReduction_8T/80         200000       8519   751.23 MFlops/s
BM_rowReduction_8T/640         50000      33441 12248.42 MFlops/s
BM_rowReduction_8T/4K           1000    2852841  8763.19 MFlops/s


After:
BM_colReduction_12T/10      50000000         59  1678.30 MFlops/s
BM_colReduction_12T/80       5000000        725  8822.71 MFlops/s
BM_colReduction_12T/640        20000      90882  4506.93 MFlops/s
BM_colReduction_12T/4K           500    4668855  5354.63 MFlops/s
BM_colReduction_4T/10       50000000         59  1687.37 MFlops/s
BM_colReduction_4T/80        5000000        737  8681.24 MFlops/s
BM_colReduction_4T/640         50000     108637  3770.34 MFlops/s
BM_colReduction_4T/4K            500    7912954  3159.38 MFlops/s
BM_colReduction_8T/10       50000000         60  1657.21 MFlops/s
BM_colReduction_8T/80        5000000        726  8812.48 MFlops/s
BM_colReduction_8T/640         20000      91451  4478.90 MFlops/s
BM_colReduction_8T/4K            500    5441692  4594.16 MFlops/s
BM_rowReduction_12T/10      20000000         93  1065.28 MFlops/s
BM_rowReduction_12T/80       2000000        950  6730.96 MFlops/s
BM_rowReduction_12T/640        50000      38196 10723.48 MFlops/s
BM_rowReduction_12T/4K           500    3019217  8280.29 MFlops/s
BM_rowReduction_4T/10       20000000         93  1064.30 MFlops/s
BM_rowReduction_4T/80        2000000        959  6667.71 MFlops/s
BM_rowReduction_4T/640         50000      37433 10941.96 MFlops/s
BM_rowReduction_4T/4K            500    3036476  8233.23 MFlops/s
BM_rowReduction_8T/10       20000000         93  1072.47 MFlops/s
BM_rowReduction_8T/80        2000000        959  6670.04 MFlops/s
BM_rowReduction_8T/640         50000      38069 10759.37 MFlops/s
BM_rowReduction_8T/4K           1000    2758988  9061.29 MFlops/s
2016-05-16 08:55:21 -07:00
Benoit Steiner
b789a26804 Fixed syntax error 2016-05-16 08:51:08 -07:00
Benoit Steiner
83dfb40f66 Turnon the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define. 2016-05-13 17:23:15 -07:00
Benoit Steiner
97605c7b27 New multithreaded contraction that doesn't rely on the thread pool to run the closure in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closure in any order. 2016-05-13 17:11:29 -07:00
Benoit Steiner
c4fc8b70ec Removed unnecessary thread synchronization 2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31 Fixed compilation errors triggered by old versions of gcc 2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen
5005b27fc8 Diasbled cost model by accident. Revert. 2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen
989e419328 Address comments by bsteiner. 2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen
e55deb21c5 Improvements to parallelFor.
Move some scalar functors from TensorFunctors. to Eigen core.
2016-05-12 14:07:22 -07:00
Benoit Steiner
ae9688f313 Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel. 2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45 Fixed potential race condition in the non blocking thread pool 2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec Replace implicit cast with an explicit one 2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061 Worked around compilation errors with older versions of gcc 2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82 Improved the portability of the tensor code 2016-05-11 23:29:09 -07:00
Benoit Steiner
fae0493f98 Fixed a couple of bugs related to the Pascalfamily of GPUs
H: Enter commit message.  Lines beginning with 'HG:' are removed.
2016-05-11 23:02:26 -07:00
Benoit Steiner
886445ce4d Avoid unnecessary conversions between floats and doubles 2016-05-11 23:00:03 -07:00
Benoit Steiner
595e890391 Added more tests for half floats 2016-05-11 21:27:15 -07:00
Benoit Steiner
b6a517c47d Added the ability to load fp16 using the texture path.
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Christoph Hertzberg
1a1ce6ff61 Removed deprecated flag (which apparently was ignored anyway) 2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65 fixed some double-promotion and sign-compare warnings 2016-05-11 23:02:26 +02:00
Benoit Steiner
217d984abc Fixed a typo in my previous commit 2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48 Fix potential race condition in the CUDA reduction code. 2016-05-11 10:08:51 -07:00
Benoit Steiner
cbb14ed47e Added a few tests to validate the generation of random tensors on GPU. 2016-05-11 10:05:56 -07:00
Benoit Steiner
6a5717dc74 Explicitely initialize all the atomic variables. 2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1 Properly gate the use of half2. 2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092 Added support for fp16 to the sigmoid functor. 2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552 Small improvement to the full reduction of fp16 2016-05-10 11:58:18 -07:00
Benoit Steiner
6bf8273bc0 Added a test to validate the new non blocking thread pool 2016-05-10 10:49:34 -07:00
Benoit Steiner
4013b8feca Simplified the reduction code a little. 2016-05-10 09:40:42 -07:00
Benoit Steiner
75bd2bd32d Fixed compilation warning 2016-05-09 19:24:41 -07:00
Benoit Steiner
4670d7d5ce Improved the performance of full reductions on GPU:
Before:
BM_fullReduction/10       200000      11751     8.51 MFlops/s
BM_fullReduction/80         5000     523385    12.23 MFlops/s
BM_fullReduction/640          50   36179326    11.32 MFlops/s
BM_fullReduction/4K            1 2173517195    11.50 MFlops/s

After:
BM_fullReduction/10       500000       5987    16.70 MFlops/s
BM_fullReduction/80       200000      10636   601.73 MFlops/s
BM_fullReduction/640       50000      58428  7010.31 MFlops/s
BM_fullReduction/4K         1000    2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58 Added the ability to use a scratch buffer in cuda kernels 2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2 Added a new parallelFor api to the thread pool device. 2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7 Optimized the non blocking thread pool:
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
 * Directly pop from a non-empty queue when we are waiting for work,
instead of first noticing that there is a non-empty queue and
then doing another round of random stealing to re-discover the non-empty
queue.
 * Steal only 1 task from a remote queue instead of half of tasks.
2016-05-09 10:17:17 -07:00
Benoit Steiner
691614bd2c Worked around a bug in nvcc on tegra x1 2016-05-07 13:28:53 -07:00
Benoit Steiner
c54ae65c83 Marked a few tensor operations as read only 2016-05-05 17:18:47 -07:00
Benoit Steiner
69a8a4e1f3 Added a test to validate full reduction on tensor of half floats 2016-05-05 16:52:50 -07:00
Benoit Steiner
678a17ba79 Made the testing of contractions on fp16 more robust 2016-05-05 16:36:39 -07:00
Benoit Steiner
e3d053e14e Refined the testing of log and exp on fp16 2016-05-05 16:24:15 -07:00
Benoit Steiner
9a48688d37 Further improved the testing of fp16 2016-05-05 15:58:05 -07:00
Benoit Steiner
910e013506 Relaxed an assertion that was tighter that necessary. 2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658 Fixed some incorrect assertions 2016-05-05 10:02:26 -07:00
Benoit Steiner
2aba40d208 Avoid unecessary type promotion 2016-05-05 09:26:57 -07:00
Benoit Steiner
a4d6e8fef0 Strongly hint but don't force the compiler to unroll a some loops in the tensor executor. This results in up to 27% faster code. 2016-05-05 09:25:55 -07:00
Benoit Steiner
7875437ca0 Avoided unecessary type promotion 2016-05-05 09:08:42 -07:00
Benoit Steiner
f363e533aa Added tests for full contractions using thread pools and gpu devices.
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58 Updated the contraction code to ensure that full contraction return a tensor of rank 0 2016-05-05 08:37:47 -07:00
Christoph Hertzberg
b300a84989 Fixed some singed/unsigned comparison warnings 2016-05-05 13:36:28 +02:00
Christoph Hertzberg
dacb469bc9 Enable and fix -Wdouble-conversion warnings 2016-05-05 13:35:45 +02:00
Benoit Steiner
62b710072e Reduced the memory footprint of the cxx11_tensor_image_patch test 2016-05-04 21:08:22 -07:00
Benoit Steiner
dd2b45feed Removed extraneous 'explicit' keywords 2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae Use numext::isfinite instead of std::isfinite 2016-05-03 19:56:40 -07:00
Benoit Steiner
2c5568a757 Added a test to validate the computation of exp and log on 16bit floats 2016-05-03 12:06:07 -07:00
Benoit Steiner
aad9a04da4 Deleted superfluous explicit keyword. 2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b Fixed compilation error 2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8 Added missing accessors to fixed sized tensors 2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e Deleted trailing commas 2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2 Deleted useless trailing commas 2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb Deleted unnecessary trailing commas. 2016-04-29 18:33:46 -07:00
Benoit Steiner
2b890ae618 Fixed compilation errors generated by clang 2016-04-29 18:30:40 -07:00
Benoit Steiner
d217217842 Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed 2016-04-29 18:15:34 -07:00
Benoit Steiner
f100d1494c Return the proper size (ie 1) for tensors of rank 0 2016-04-29 18:14:33 -07:00
Benoit Steiner
d14105f158 Made several tensor tests compatible with cxx03 2016-04-29 17:22:37 -07:00
Benoit Steiner
c0882ef4d9 Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section 2016-04-29 17:13:51 -07:00
Benoit Steiner
9d1dbd1ec0 Fixed teh cxx11_tensor_empty test to compile without requiring cxx11 support 2016-04-29 16:53:55 -07:00
Benoit Steiner
a8c0405cf5 Deleted unused default values for template parameters 2016-04-29 16:34:43 -07:00
Benoit Steiner
4f53178e62 Made a coupe of tensor tests compile without requiring c++11 support. 2016-04-29 16:09:54 -07:00
Benoit Steiner
1131a984a6 Made the cxx11_tensor_forced_eval compile without c++11. 2016-04-29 15:48:59 -07:00
Benoit Steiner
c07404f6a1 Restore Tensor support for non c++11 compilers 2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021 Fixed include path 2016-04-29 15:11:09 -07:00
Benoit Steiner
a524a26fdc Fixed a few memory leaks 2016-04-28 18:55:53 -07:00
Gael Guennebaud
318e65e0ae Fix missing inclusion of Eigen/Core 2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases. 2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133 Refactor the unsupported CXX11/Core module to internal headers only. 2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46 Fixed the partial evaluation of non vectorizable tensor subexpressions 2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260 Refined the cost of the striding operation. 2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e Provide access to the base threadpool classes 2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a Added the ability to switch to the new thread pool with a #define 2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b Fixed several compilation warnings 2016-04-21 11:03:02 -07:00
Benoit Steiner
32ffce04fc Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable. 2016-04-21 08:47:28 -07:00
Benoit Steiner
2dde1b1028 Don't crash when attempting to reduce empty tensors. 2016-04-20 18:08:20 -07:00
Benoit Steiner
a792cd357d Added more tests 2016-04-20 17:33:58 -07:00
Benoit Steiner
c7c2054bb5 Started to implement a portable way to yield. 2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028 Implemented a more portable version of thread local variables 2016-04-19 15:56:02 -07:00
Benoit Steiner
04f954956d Fixed a few typos 2016-04-19 15:27:09 -07:00
Benoit Steiner
5b1106c56b Fixed a compilation error with nvcc 7. 2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db Simplified the code that launches cuda kernels. 2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d Don't take the address of a kernel on CUDA devices that don't support this feature. 2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058 Use numext::ceil instead of std::ceil 2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b Avoid an unnecessary copy of the evaluator. 2016-04-19 13:54:28 -07:00
Benoit Steiner
f953c60705 Fixed 2 recent regression tests 2016-04-19 12:57:39 -07:00
Benoit Steiner
50968a0a3e Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. 2016-04-19 11:53:58 -07:00
Benoit Steiner
84543c8be2 Worked around the lack of a rand_r function on windows systems 2016-04-17 19:29:27 -07:00
Benoit Steiner
5fbcfe5eb4 Worked around the lack of a rand_r function on windows systems 2016-04-17 18:42:31 -07:00
Benoit Steiner
c8e8f93d6c Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. 2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a Deleted unnecessary variable 2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a Fixed a few compilation warnings 2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f Merged in rmlarsen/eigen (pull request PR-178)
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b Get rid of void* casting when calling EvalRange::run. 2016-04-15 12:51:33 -07:00
Benoit Steiner
40c9923a8a Fixed compilation errors with msvc 2016-04-15 11:27:52 -07:00
Benoit Steiner
a62e924656 Added ability to access the cache sizes from the tensor devices 2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426 Added support for exclusive or 2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02 Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. 2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d Added missing definition of PacketSize in the gpu evaluator of convolution 2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e Merged in rmlarsen/eigen (pull request PR-177)
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
bebb89acfa Enabled the new threadpool tests 2016-04-14 16:44:10 -07:00
Benoit Steiner
9c064b5a97 Cleanup 2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41 Prepared the migration to the new non blocking thread pool 2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b Improvements to cost model. 2016-04-14 15:52:58 -07:00
Benoit Steiner
a8e8837ba7 Added tests for the non blocking thread pool 2016-04-14 15:23:49 -07:00
Benoit Steiner
78a51abc12 Added a more scalable non blocking thread pool 2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7 Merge upstream updates. 2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6 Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. 2016-04-14 13:57:35 -07:00
Benoit Steiner
5912ad877c Silenced a compilation warning 2016-04-14 11:40:14 -07:00
Benoit Steiner
2b6e3de02f Added tests to validate flooring and ceiling of fp16 2016-04-14 11:39:18 -07:00
Benoit Steiner
6f23e945f6 Added simple test for numext::sqrt and numext::pow on fp16 2016-04-14 10:32:52 -07:00
Benoit Steiner
72510c80e1 Added basic test for trigonometric functions on fp16 2016-04-14 10:27:24 -07:00
Benoit Steiner
c7167fee0e Added support for fp16 to the sigmoid function 2016-04-14 10:08:33 -07:00
Benoit Steiner
f6003f0873 Made the test msvc friendly 2016-04-14 09:47:26 -07:00
Gael Guennebaud
7d1391d049 Turn a converge check to a warning 2016-04-13 22:50:54 +02:00
Benoit Steiner
e9b12cc1f7 Fixed compilation warnings generated by clang 2016-04-12 20:53:18 -07:00
Benoit Steiner
e3a184785c Fixed the zeta test 2016-04-12 11:12:36 -07:00
Benoit Steiner
3b76df64fc Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on device of capability >= 5.3 2016-04-12 10:58:51 -07:00
Gael Guennebaud
af2161cdb4 bug #1197: fix/relax some LM unit tests 2016-04-09 11:14:02 +02:00
Gael Guennebaud
a05a683d83 bug #1160: fix and relax some lm unit tests by turning faillures to warnings 2016-04-09 10:49:19 +02:00
Benoit Steiner
995f202cea Disabled the use of half2 on cuda devices of compute capability < 5.3 2016-04-08 14:43:36 -07:00
Benoit Steiner
0d2a532fc3 Created the new EIGEN_TEST_CUDA_CLANG option to compile the CUDA tests using clang instead of nvcc 2016-04-08 13:16:08 -07:00
Benoit Steiner
2d072b38c1 Don't test the division by 0 on float16 when compiling with msvc since msvc detects and errors out on divisions by 0. 2016-04-08 12:50:25 -07:00
Benoit Steiner
d962fe6a99 Renamed float16 into cxx11_float16 since the test relies on c++11 features 2016-04-07 20:28:32 -07:00
Benoit Steiner
7d5b17087f Added missing EIGEN_DEVICE_FUNC to the tensor conversion code. 2016-04-07 20:01:19 -07:00
Benoit Steiner
a02ec09511 Worked around numerical noise in the test for the zeta function. 2016-04-07 12:11:02 -07:00
Benoit Steiner
c912b1d28c Fixed a typo in the polygamma test. 2016-04-07 11:51:07 -07:00
Benoit Steiner
dc45aaeb93 Added tests for float16 2016-04-07 11:18:05 -07:00
Benoit Steiner
8db269e055 Fixed a typo in a test 2016-04-07 10:41:51 -07:00
Benoit Steiner
48308ed801 Added support for isinf, isnan, and isfinite checks to the tensor api 2016-04-07 09:48:36 -07:00
Benoit Steiner
cfb34d808b Fixed a possible integer overflow. 2016-04-07 08:46:52 -07:00
Benoit Steiner
165150e896 Fixed the tests for the zeta and polygamma functions 2016-04-06 14:31:01 -07:00
Benoit Steiner
7be1eaad1e Fixed typos in the implementation of the zeta and polygamma ops. 2016-04-06 14:15:37 -07:00