Gael Guennebaud
22a035db95
Fix compilation when defaulting to row-major
2016-05-27 10:31:11 +02:00
Benoit Steiner
1ae2567861
Fixed some compilation warnings
2016-05-26 15:57:19 -07:00
Benoit Steiner
1a47844529
Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g. fp16 to float)
2016-05-26 14:37:09 -07:00
Benoit Steiner
36369ab63c
Resolved merge conflicts
2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a
Merged latest reduction improvements
2016-05-26 12:19:33 -07:00
Benoit Steiner
c1c7f06c35
Improved the performance of inner reductions.
2016-05-26 11:53:59 -07:00
Benoit Steiner
22d02c9855
Improved the coverage of the fp16 reduction tests
2016-05-26 11:12:16 -07:00
Benoit Steiner
8288b0aec2
Code cleanup.
2016-05-26 09:00:04 -07:00
Benoit Steiner
2d7ed54ba2
Made the static storage class qualifier come first.
2016-05-25 22:16:15 -07:00
Benoit Steiner
e1fca8866e
Deleted unnecessary explicit qualifiers.
2016-05-25 22:15:26 -07:00
Benoit Steiner
9b0aaf5113
Don't mark inline functions as static since it confuses the ICC compiler
2016-05-25 22:10:11 -07:00
Benoit Steiner
037a463fd5
Marked unused variables as such
2016-05-25 22:07:48 -07:00
Benoit Steiner
3ac4045272
Made the IndexPair code compile in non cxx11 mode
2016-05-25 15:15:12 -07:00
Benoit Steiner
66556d0e05
Made the index pair list code more portable across various compilers
2016-05-25 14:34:27 -07:00
Benoit Steiner
034aa3b2c0
Improved the performance of tensor padding
2016-05-25 11:43:08 -07:00
Benoit Steiner
58026905ae
Added support for statically known lists of pairs of indices
2016-05-25 11:04:14 -07:00
Benoit Steiner
0835667329
There is no need to make the fp16 full reduction kernel a static function.
2016-05-24 23:11:56 -07:00
Benoit Steiner
b5d6b52a4d
Fixed compilation warning
2016-05-24 23:10:57 -07:00
Benoit Steiner
a09cbf9905
Merged in rmlarsen/eigen (pull request PR-188)
...
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-23 12:55:12 -07:00
Christoph Hertzberg
718521d5cf
Silenced several double-promotion warnings
2016-05-22 18:17:04 +02:00
Christoph Hertzberg
b5a7603822
fixed macro name
2016-05-22 16:49:29 +02:00
Christoph Hertzberg
25a03c02d6
Fix some sign-compare warnings
2016-05-22 16:42:27 +02:00
Gael Guennebaud
ccaace03c9
Make EIGEN_HAS_CONSTEXPR user configurable
2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd
Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable
2016-05-20 15:05:38 +02:00
Gael Guennebaud
48bf5ec216
Make EIGEN_HAS_RVALUE_REFERENCES user configurable
2016-05-20 14:54:20 +02:00
Gael Guennebaud
f43ae88892
Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES
2016-05-20 14:48:51 +02:00
Gael Guennebaud
2f656ce447
Remove std:: to enable custom scalar types.
2016-05-19 23:13:47 +02:00
Rasmus Larsen
b1e080c752
Merged eigen/eigen into default
2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen
5624219b6b
Merge.
2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen
7df811cfe5
Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-18 15:09:48 -07:00
Benoit Steiner
bb3ff8e9d9
Advertise the packet api of the tensor reducers iff the corresponding packet primitives are available.
2016-05-18 14:52:49 -07:00
Gael Guennebaud
548a487800
bug #1229 : bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway.
2016-05-18 16:44:05 +02:00
Gael Guennebaud
43790e009b
Pass argument by const ref instead of by value in pow(AutoDiffScalar...)
2016-05-18 16:28:02 +02:00
Gael Guennebaud
1fbfab27a9
bug #1223 : fix compilation of AutoDiffScalar's min/max operators, and add regression unit test.
2016-05-18 16:26:26 +02:00
Gael Guennebaud
448d9d943c
bug #1222 : fix compilation in AutoDiffScalar and add respective unit test
2016-05-18 16:00:11 +02:00
Rasmus Munk Larsen
f519fca72b
Reduce overhead for small tensors and cheap ops by short-circuiting the const computation and block size calculation in parallelFor.
2016-05-17 16:06:00 -07:00
Benoit Steiner
86ae94462e
#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly.
2016-05-17 14:06:15 -07:00
Benoit Steiner
997c335970
Fixed compilation error
2016-05-17 12:54:18 -07:00
Benoit Steiner
ebf6ada5ee
Fixed compilation error in the tensor thread pool
2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen
0bb61b04ca
Merge upstream.
2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen
0dbd68145f
Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h.
2016-05-17 10:25:19 -07:00
Rasmus Larsen
00228f2506
Merged eigen/eigen into default
2016-05-17 09:49:31 -07:00
Benoit Steiner
e7e64c3277
Enable the use of the packet api to evaluate tensor broadcasts. This speeds things up quite a bit:
...
Before:
BM_broadcasting/10 500000 3690 27.10 MFlops/s
BM_broadcasting/80 500000 4014 1594.24 MFlops/s
BM_broadcasting/640 100000 14770 27731.35 MFlops/s
BM_broadcasting/4K 5000 632711 39512.48 MFlops/s
After:
BM_broadcasting/10 500000 4287 23.33 MFlops/s
BM_broadcasting/80 500000 4455 1436.41 MFlops/s
BM_broadcasting/640 200000 10195 40173.01 MFlops/s
BM_broadcasting/4K 5000 423746 58997.57 MFlops/s
2016-05-17 09:24:35 -07:00
Benoit Steiner
5fa27574dd
Allow vectorized padding on GPU. This helps speed things up a little
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:17:26 -07:00
Benoit Steiner
a910bcee43
Merged latest updates from trunk
2016-05-17 09:14:22 -07:00
Benoit Steiner
8d06c02ffd
Allow vectorized padding on GPU. This helps speed things up a little.
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00
Benoit Steiner
86da77cb9b
Pulled latest updates from trunk.
2016-05-17 07:21:48 -07:00
Benoit Steiner
92fc6add43
Don't rely on c++11 extension when we don't have to.
2016-05-17 07:21:22 -07:00
David Dement
ccc7563ac5
made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process
2016-05-16 14:26:41 -04:00
Benoit Steiner
a80d875916
Added missing costPerCoeff method
2016-05-16 09:31:10 -07:00
Benoit Steiner
83ef39e055
Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions.
...
Before:
BM_colReduction_12T/10 1000000 1949 51.29 MFlops/s
BM_colReduction_12T/80 100000 15636 409.29 MFlops/s
BM_colReduction_12T/640 20000 95100 4307.01 MFlops/s
BM_colReduction_12T/4K 500 4573423 5466.36 MFlops/s
BM_colReduction_4T/10 1000000 1867 53.56 MFlops/s
BM_colReduction_4T/80 500000 5288 1210.11 MFlops/s
BM_colReduction_4T/640 10000 106924 3830.75 MFlops/s
BM_colReduction_4T/4K 500 9946374 2513.48 MFlops/s
BM_colReduction_8T/10 1000000 1912 52.30 MFlops/s
BM_colReduction_8T/80 200000 8354 766.09 MFlops/s
BM_colReduction_8T/640 20000 85063 4815.22 MFlops/s
BM_colReduction_8T/4K 500 5445216 4591.19 MFlops/s
BM_rowReduction_12T/10 1000000 2041 48.99 MFlops/s
BM_rowReduction_12T/80 100000 15426 414.87 MFlops/s
BM_rowReduction_12T/640 50000 39117 10470.98 MFlops/s
BM_rowReduction_12T/4K 500 3034298 8239.14 MFlops/s
BM_rowReduction_4T/10 1000000 1834 54.51 MFlops/s
BM_rowReduction_4T/80 500000 5406 1183.81 MFlops/s
BM_rowReduction_4T/640 50000 35017 11697.16 MFlops/s
BM_rowReduction_4T/4K 500 3428527 7291.76 MFlops/s
BM_rowReduction_8T/10 1000000 1925 51.95 MFlops/s
BM_rowReduction_8T/80 200000 8519 751.23 MFlops/s
BM_rowReduction_8T/640 50000 33441 12248.42 MFlops/s
BM_rowReduction_8T/4K 1000 2852841 8763.19 MFlops/s
After:
BM_colReduction_12T/10 50000000 59 1678.30 MFlops/s
BM_colReduction_12T/80 5000000 725 8822.71 MFlops/s
BM_colReduction_12T/640 20000 90882 4506.93 MFlops/s
BM_colReduction_12T/4K 500 4668855 5354.63 MFlops/s
BM_colReduction_4T/10 50000000 59 1687.37 MFlops/s
BM_colReduction_4T/80 5000000 737 8681.24 MFlops/s
BM_colReduction_4T/640 50000 108637 3770.34 MFlops/s
BM_colReduction_4T/4K 500 7912954 3159.38 MFlops/s
BM_colReduction_8T/10 50000000 60 1657.21 MFlops/s
BM_colReduction_8T/80 5000000 726 8812.48 MFlops/s
BM_colReduction_8T/640 20000 91451 4478.90 MFlops/s
BM_colReduction_8T/4K 500 5441692 4594.16 MFlops/s
BM_rowReduction_12T/10 20000000 93 1065.28 MFlops/s
BM_rowReduction_12T/80 2000000 950 6730.96 MFlops/s
BM_rowReduction_12T/640 50000 38196 10723.48 MFlops/s
BM_rowReduction_12T/4K 500 3019217 8280.29 MFlops/s
BM_rowReduction_4T/10 20000000 93 1064.30 MFlops/s
BM_rowReduction_4T/80 2000000 959 6667.71 MFlops/s
BM_rowReduction_4T/640 50000 37433 10941.96 MFlops/s
BM_rowReduction_4T/4K 500 3036476 8233.23 MFlops/s
BM_rowReduction_8T/10 20000000 93 1072.47 MFlops/s
BM_rowReduction_8T/80 2000000 959 6670.04 MFlops/s
BM_rowReduction_8T/640 50000 38069 10759.37 MFlops/s
BM_rowReduction_8T/4K 1000 2758988 9061.29 MFlops/s
2016-05-16 08:55:21 -07:00
Benoit Steiner
b789a26804
Fixed syntax error
2016-05-16 08:51:08 -07:00
Benoit Steiner
83dfb40f66
Turn on the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define.
2016-05-13 17:23:15 -07:00
Benoit Steiner
97605c7b27
New multithreaded contraction that doesn't rely on the thread pool to run the closures in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closures in any order.
2016-05-13 17:11:29 -07:00
Benoit Steiner
c4fc8b70ec
Removed unnecessary thread synchronization
2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31
Fixed compilation errors triggered by old versions of gcc
2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen
5005b27fc8
Disabled cost model by accident. Revert.
2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen
989e419328
Address comments by bsteiner.
2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen
e55deb21c5
Improvements to parallelFor.
...
Move some scalar functors from TensorFunctors.h to Eigen core.
2016-05-12 14:07:22 -07:00
Benoit Steiner
ae9688f313
Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel.
2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45
Fixed potential race condition in the non blocking thread pool
2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec
Replace implicit cast with an explicit one
2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061
Worked around compilation errors with older versions of gcc
2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82
Improved the portability of the tensor code
2016-05-11 23:29:09 -07:00
Benoit Steiner
fae0493f98
Fixed a couple of bugs related to the Pascal family of GPUs
2016-05-11 23:02:26 -07:00
Benoit Steiner
886445ce4d
Avoid unnecessary conversions between floats and doubles
2016-05-11 23:00:03 -07:00
Benoit Steiner
595e890391
Added more tests for half floats
2016-05-11 21:27:15 -07:00
Benoit Steiner
b6a517c47d
Added the ability to load fp16 using the texture path.
...
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Christoph Hertzberg
1a1ce6ff61
Removed deprecated flag (which apparently was ignored anyway)
2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65
fixed some double-promotion and sign-compare warnings
2016-05-11 23:02:26 +02:00
Benoit Steiner
217d984abc
Fixed a typo in my previous commit
2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48
Fix potential race condition in the CUDA reduction code.
2016-05-11 10:08:51 -07:00
Benoit Steiner
cbb14ed47e
Added a few tests to validate the generation of random tensors on GPU.
2016-05-11 10:05:56 -07:00
Benoit Steiner
6a5717dc74
Explicitly initialize all the atomic variables.
2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1
Properly gate the use of half2.
2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092
Added support for fp16 to the sigmoid functor.
2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552
Small improvement to the full reduction of fp16
2016-05-10 11:58:18 -07:00
Benoit Steiner
6bf8273bc0
Added a test to validate the new non blocking thread pool
2016-05-10 10:49:34 -07:00
Benoit Steiner
4013b8feca
Simplified the reduction code a little.
2016-05-10 09:40:42 -07:00
Benoit Steiner
75bd2bd32d
Fixed compilation warning
2016-05-09 19:24:41 -07:00
Benoit Steiner
4670d7d5ce
Improved the performance of full reductions on GPU:
...
Before:
BM_fullReduction/10 200000 11751 8.51 MFlops/s
BM_fullReduction/80 5000 523385 12.23 MFlops/s
BM_fullReduction/640 50 36179326 11.32 MFlops/s
BM_fullReduction/4K 1 2173517195 11.50 MFlops/s
After:
BM_fullReduction/10 500000 5987 16.70 MFlops/s
BM_fullReduction/80 200000 10636 601.73 MFlops/s
BM_fullReduction/640 50000 58428 7010.31 MFlops/s
BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58
Added the ability to use a scratch buffer in cuda kernels
2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2
Added a new parallelFor api to the thread pool device.
2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7
Optimized the non blocking thread pool:
...
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work,
instead of first noticing that there is a non-empty queue and
then doing another round of random stealing to re-discover the non-empty
queue.
* Steal only 1 task from a remote queue instead of half of tasks.
2016-05-09 10:17:17 -07:00
Benoit Steiner
691614bd2c
Worked around a bug in nvcc on tegra x1
2016-05-07 13:28:53 -07:00
Benoit Steiner
c54ae65c83
Marked a few tensor operations as read only
2016-05-05 17:18:47 -07:00
Benoit Steiner
69a8a4e1f3
Added a test to validate full reduction on tensor of half floats
2016-05-05 16:52:50 -07:00
Benoit Steiner
678a17ba79
Made the testing of contractions on fp16 more robust
2016-05-05 16:36:39 -07:00
Benoit Steiner
e3d053e14e
Refined the testing of log and exp on fp16
2016-05-05 16:24:15 -07:00
Benoit Steiner
9a48688d37
Further improved the testing of fp16
2016-05-05 15:58:05 -07:00
Benoit Steiner
910e013506
Relaxed an assertion that was tighter than necessary.
2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658
Fixed some incorrect assertions
2016-05-05 10:02:26 -07:00
Benoit Steiner
2aba40d208
Avoid unnecessary type promotion
2016-05-05 09:26:57 -07:00
Benoit Steiner
a4d6e8fef0
Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code.
2016-05-05 09:25:55 -07:00
Benoit Steiner
7875437ca0
Avoided unnecessary type promotion
2016-05-05 09:08:42 -07:00
Benoit Steiner
f363e533aa
Added tests for full contractions using thread pools and gpu devices.
...
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58
Updated the contraction code to ensure that full contractions return a tensor of rank 0
2016-05-05 08:37:47 -07:00
Christoph Hertzberg
b300a84989
Fixed some signed/unsigned comparison warnings
2016-05-05 13:36:28 +02:00
Christoph Hertzberg
dacb469bc9
Enable and fix -Wdouble-conversion warnings
2016-05-05 13:35:45 +02:00
Benoit Steiner
62b710072e
Reduced the memory footprint of the cxx11_tensor_image_patch test
2016-05-04 21:08:22 -07:00
Benoit Steiner
dd2b45feed
Removed extraneous 'explicit' keywords
2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae
Use numext::isfinite instead of std::isfinite
2016-05-03 19:56:40 -07:00
Benoit Steiner
2c5568a757
Added a test to validate the computation of exp and log on 16bit floats
2016-05-03 12:06:07 -07:00
Benoit Steiner
aad9a04da4
Deleted superfluous explicit keyword.
2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b
Fixed compilation error
2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8
Added missing accessors to fixed sized tensors
2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e
Deleted trailing commas
2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2
Deleted useless trailing commas
2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb
Deleted unnecessary trailing commas.
2016-04-29 18:33:46 -07:00
Benoit Steiner
2b890ae618
Fixed compilation errors generated by clang
2016-04-29 18:30:40 -07:00
Benoit Steiner
d217217842
Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed
2016-04-29 18:15:34 -07:00
Benoit Steiner
f100d1494c
Return the proper size (i.e. 1) for tensors of rank 0
2016-04-29 18:14:33 -07:00
Benoit Steiner
d14105f158
Made several tensor tests compatible with cxx03
2016-04-29 17:22:37 -07:00
Benoit Steiner
c0882ef4d9
Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section
2016-04-29 17:13:51 -07:00
Benoit Steiner
9d1dbd1ec0
Fixed the cxx11_tensor_empty test to compile without requiring cxx11 support
2016-04-29 16:53:55 -07:00
Benoit Steiner
a8c0405cf5
Deleted unused default values for template parameters
2016-04-29 16:34:43 -07:00
Benoit Steiner
4f53178e62
Made a couple of tensor tests compile without requiring c++11 support.
2016-04-29 16:09:54 -07:00
Benoit Steiner
1131a984a6
Made the cxx11_tensor_forced_eval test compile without c++11.
2016-04-29 15:48:59 -07:00
Benoit Steiner
c07404f6a1
Restore Tensor support for non c++11 compilers
2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021
Fixed include path
2016-04-29 15:11:09 -07:00
Benoit Steiner
a524a26fdc
Fixed a few memory leaks
2016-04-28 18:55:53 -07:00
Gael Guennebaud
318e65e0ae
Fix missing inclusion of Eigen/Core
2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe
Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases.
2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133
Refactor the unsupported CXX11/Core module to internal headers only.
2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46
Fixed the partial evaluation of non vectorizable tensor subexpressions
2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260
Refined the cost of the striding operation.
2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e
Provide access to the base threadpool classes
2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a
Added the ability to switch to the new thread pool with a #define
2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b
Fixed several compilation warnings
2016-04-21 11:03:02 -07:00
Benoit Steiner
32ffce04fc
Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable.
2016-04-21 08:47:28 -07:00
Benoit Steiner
2dde1b1028
Don't crash when attempting to reduce empty tensors.
2016-04-20 18:08:20 -07:00
Benoit Steiner
a792cd357d
Added more tests
2016-04-20 17:33:58 -07:00
Benoit Steiner
c7c2054bb5
Started to implement a portable way to yield.
2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028
Implemented a more portable version of thread local variables
2016-04-19 15:56:02 -07:00
Benoit Steiner
04f954956d
Fixed a few typos
2016-04-19 15:27:09 -07:00
Benoit Steiner
5b1106c56b
Fixed a compilation error with nvcc 7.
2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db
Simplified the code that launches cuda kernels.
2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d
Don't take the address of a kernel on CUDA devices that don't support this feature.
2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058
Use numext::ceil instead of std::ceil
2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b
Avoid an unnecessary copy of the evaluator.
2016-04-19 13:54:28 -07:00
Benoit Steiner
f953c60705
Fixed 2 recent regression tests
2016-04-19 12:57:39 -07:00
Benoit Steiner
50968a0a3e
Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors.
2016-04-19 11:53:58 -07:00
Benoit Steiner
84543c8be2
Worked around the lack of a rand_r function on windows systems
2016-04-17 19:29:27 -07:00
Benoit Steiner
5fbcfe5eb4
Worked around the lack of a rand_r function on windows systems
2016-04-17 18:42:31 -07:00
Benoit Steiner
c8e8f93d6c
Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators.
2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a
Deleted unnecessary variable
2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a
Fixed a few compilation warnings
2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f
Merged in rmlarsen/eigen (pull request PR-178)
...
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b
Get rid of void* casting when calling EvalRange::run.
2016-04-15 12:51:33 -07:00
Benoit Steiner
40c9923a8a
Fixed compilation errors with msvc
2016-04-15 11:27:52 -07:00