Benoit Steiner
217d984abc
Fixed a typo in my previous commit
2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48
Fix potential race condition in the CUDA reduction code.
2016-05-11 10:08:51 -07:00
Benoit Steiner
cbb14ed47e
Added a few tests to validate the generation of random tensors on GPU.
2016-05-11 10:05:56 -07:00
Benoit Steiner
6a5717dc74
Explicitely initialize all the atomic variables.
2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1
Properly gate the use of half2.
2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092
Added support for fp16 to the sigmoid functor.
2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552
Small improvement to the full reduction of fp16
2016-05-10 11:58:18 -07:00
Benoit Steiner
6bf8273bc0
Added a test to validate the new non blocking thread pool
2016-05-10 10:49:34 -07:00
Benoit Steiner
4013b8feca
Simplified the reduction code a little.
2016-05-10 09:40:42 -07:00
Benoit Steiner
75bd2bd32d
Fixed compilation warning
2016-05-09 19:24:41 -07:00
Benoit Steiner
4670d7d5ce
Improved the performance of full reductions on GPU:
...
Before:
BM_fullReduction/10 200000 11751 8.51 MFlops/s
BM_fullReduction/80 5000 523385 12.23 MFlops/s
BM_fullReduction/640 50 36179326 11.32 MFlops/s
BM_fullReduction/4K 1 2173517195 11.50 MFlops/s
After:
BM_fullReduction/10 500000 5987 16.70 MFlops/s
BM_fullReduction/80 200000 10636 601.73 MFlops/s
BM_fullReduction/640 50000 58428 7010.31 MFlops/s
BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58
Added the ability to use a scratch buffer in cuda kernels
2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2
Added a new parallelFor api to the thread pool device.
2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7
Optimized the non blocking thread pool:
...
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work,
instead of first noticing that there is a non-empty queue and
then doing another round of random stealing to re-discover the non-empty
queue.
* Steal only 1 task from a remote queue instead of half of tasks.
2016-05-09 10:17:17 -07:00
Benoit Steiner
691614bd2c
Worked around a bug in nvcc on tegra x1
2016-05-07 13:28:53 -07:00
Benoit Steiner
c54ae65c83
Marked a few tensor operations as read only
2016-05-05 17:18:47 -07:00
Benoit Steiner
69a8a4e1f3
Added a test to validate full reduction on tensor of half floats
2016-05-05 16:52:50 -07:00
Benoit Steiner
678a17ba79
Made the testing of contractions on fp16 more robust
2016-05-05 16:36:39 -07:00
Benoit Steiner
e3d053e14e
Refined the testing of log and exp on fp16
2016-05-05 16:24:15 -07:00
Benoit Steiner
9a48688d37
Further improved the testing of fp16
2016-05-05 15:58:05 -07:00
Benoit Steiner
910e013506
Relaxed an assertion that was tighter that necessary.
2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658
Fixed some incorrect assertions
2016-05-05 10:02:26 -07:00
Benoit Steiner
2aba40d208
Avoid unecessary type promotion
2016-05-05 09:26:57 -07:00
Benoit Steiner
a4d6e8fef0
Strongly hint but don't force the compiler to unroll a some loops in the tensor executor. This results in up to 27% faster code.
2016-05-05 09:25:55 -07:00
Benoit Steiner
7875437ca0
Avoided unecessary type promotion
2016-05-05 09:08:42 -07:00
Benoit Steiner
f363e533aa
Added tests for full contractions using thread pools and gpu devices.
...
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58
Updated the contraction code to ensure that full contraction return a tensor of rank 0
2016-05-05 08:37:47 -07:00
Christoph Hertzberg
b300a84989
Fixed some singed/unsigned comparison warnings
2016-05-05 13:36:28 +02:00
Christoph Hertzberg
dacb469bc9
Enable and fix -Wdouble-conversion warnings
2016-05-05 13:35:45 +02:00
Benoit Steiner
62b710072e
Reduced the memory footprint of the cxx11_tensor_image_patch test
2016-05-04 21:08:22 -07:00
Benoit Steiner
dd2b45feed
Removed extraneous 'explicit' keywords
2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae
Use numext::isfinite instead of std::isfinite
2016-05-03 19:56:40 -07:00
Benoit Steiner
2c5568a757
Added a test to validate the computation of exp and log on 16bit floats
2016-05-03 12:06:07 -07:00
Benoit Steiner
aad9a04da4
Deleted superfluous explicit keyword.
2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b
Fixed compilation error
2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8
Added missing accessors to fixed sized tensors
2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e
Deleted trailing commas
2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2
Deleted useless trailing commas
2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb
Deleted unnecessary trailing commas.
2016-04-29 18:33:46 -07:00
Benoit Steiner
2b890ae618
Fixed compilation errors generated by clang
2016-04-29 18:30:40 -07:00
Benoit Steiner
d217217842
Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed
2016-04-29 18:15:34 -07:00
Benoit Steiner
f100d1494c
Return the proper size (ie 1) for tensors of rank 0
2016-04-29 18:14:33 -07:00
Benoit Steiner
d14105f158
Made several tensor tests compatible with cxx03
2016-04-29 17:22:37 -07:00
Benoit Steiner
c0882ef4d9
Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section
2016-04-29 17:13:51 -07:00
Benoit Steiner
9d1dbd1ec0
Fixed teh cxx11_tensor_empty test to compile without requiring cxx11 support
2016-04-29 16:53:55 -07:00
Benoit Steiner
a8c0405cf5
Deleted unused default values for template parameters
2016-04-29 16:34:43 -07:00
Benoit Steiner
4f53178e62
Made a coupe of tensor tests compile without requiring c++11 support.
2016-04-29 16:09:54 -07:00
Benoit Steiner
1131a984a6
Made the cxx11_tensor_forced_eval compile without c++11.
2016-04-29 15:48:59 -07:00
Benoit Steiner
c07404f6a1
Restore Tensor support for non c++11 compilers
2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021
Fixed include path
2016-04-29 15:11:09 -07:00
Benoit Steiner
a524a26fdc
Fixed a few memory leaks
2016-04-28 18:55:53 -07:00
Gael Guennebaud
318e65e0ae
Fix missing inclusion of Eigen/Core
2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe
Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases.
2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133
Refactor the unsupported CXX11/Core module to internal headers only.
2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46
Fixed the partial evaluation of non vectorizable tensor subexpressions
2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260
Refined the cost of the striding operation.
2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e
Provide access to the base threadpool classes
2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a
Added the ability to switch to the new thread pool with a #define
2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b
Fixed several compilation warnings
2016-04-21 11:03:02 -07:00
Benoit Steiner
32ffce04fc
Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable.
2016-04-21 08:47:28 -07:00
Benoit Steiner
2dde1b1028
Don't crash when attempting to reduce empty tensors.
2016-04-20 18:08:20 -07:00
Benoit Steiner
a792cd357d
Added more tests
2016-04-20 17:33:58 -07:00
Benoit Steiner
c7c2054bb5
Started to implement a portable way to yield.
2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028
Implemented a more portable version of thread local variables
2016-04-19 15:56:02 -07:00
Benoit Steiner
04f954956d
Fixed a few typos
2016-04-19 15:27:09 -07:00
Benoit Steiner
5b1106c56b
Fixed a compilation error with nvcc 7.
2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db
Simplified the code that launches cuda kernels.
2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d
Don't take the address of a kernel on CUDA devices that don't support this feature.
2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058
Use numext::ceil instead of std::ceil
2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b
Avoid an unnecessary copy of the evaluator.
2016-04-19 13:54:28 -07:00
Benoit Steiner
f953c60705
Fixed 2 recent regression tests
2016-04-19 12:57:39 -07:00
Benoit Steiner
50968a0a3e
Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors.
2016-04-19 11:53:58 -07:00
Benoit Steiner
84543c8be2
Worked around the lack of a rand_r function on windows systems
2016-04-17 19:29:27 -07:00
Benoit Steiner
5fbcfe5eb4
Worked around the lack of a rand_r function on windows systems
2016-04-17 18:42:31 -07:00
Benoit Steiner
c8e8f93d6c
Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators.
2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a
Deleted unnecessary variable
2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a
Fixed a few compilation warnings
2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f
Merged in rmlarsen/eigen (pull request PR-178)
...
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b
Get rid of void* casting when calling EvalRange::run.
2016-04-15 12:51:33 -07:00
Benoit Steiner
40c9923a8a
Fixed compilation errors with msvc
2016-04-15 11:27:52 -07:00
Benoit Steiner
a62e924656
Added ability to access the cache sizes from the tensor devices
2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426
Added support for exclusive or
2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default.
2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d
Added missing definition of PacketSize in the gpu evaluator of convolution
2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e
Merged in rmlarsen/eigen (pull request PR-177)
...
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
bebb89acfa
Enabled the new threadpool tests
2016-04-14 16:44:10 -07:00
Benoit Steiner
9c064b5a97
Cleanup
2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41
Prepared the migration to the new non blocking thread pool
2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b
Improvements to cost model.
2016-04-14 15:52:58 -07:00
Benoit Steiner
a8e8837ba7
Added tests for the non blocking thread pool
2016-04-14 15:23:49 -07:00
Benoit Steiner
78a51abc12
Added a more scalable non blocking thread pool
2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7
Merge upstream updates.
2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6
Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions.
2016-04-14 13:57:35 -07:00
Benoit Steiner
5912ad877c
Silenced a compilation warning
2016-04-14 11:40:14 -07:00
Benoit Steiner
2b6e3de02f
Added tests to validate flooring and ceiling of fp16
2016-04-14 11:39:18 -07:00
Benoit Steiner
6f23e945f6
Added simple test for numext::sqrt and numext::pow on fp16
2016-04-14 10:32:52 -07:00
Benoit Steiner
72510c80e1
Added basic test for trigonometric functions on fp16
2016-04-14 10:27:24 -07:00
Benoit Steiner
c7167fee0e
Added support for fp16 to the sigmoid function
2016-04-14 10:08:33 -07:00
Benoit Steiner
f6003f0873
Made the test msvc friendly
2016-04-14 09:47:26 -07:00
Gael Guennebaud
7d1391d049
Turn a converge check to a warning
2016-04-13 22:50:54 +02:00