Benoit Steiner
97605c7b27
New multithreaded contraction that doesn't rely on the thread pool to run the closure in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closure in any order.
2016-05-13 17:11:29 -07:00
Benoit Steiner
c4fc8b70ec
Removed unnecessary thread synchronization
2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31
Fixed compilation errors triggered by old versions of gcc
2016-05-12 18:59:04 -07:00
Benoit Steiner
ae9688f313
Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel.
2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45
Fixed potential race condition in the non blocking thread pool
2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec
Replace implicit cast with an explicit one
2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061
Worked around compilation errors with older versions of gcc
2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82
Improved the portability of the tensor code
2016-05-11 23:29:09 -07:00
Benoit Steiner
b6a517c47d
Added the ability to load fp16 using the texture path.
...
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Christoph Hertzberg
1a1ce6ff61
Removed deprecated flag (which apparently was ignored anyway)
2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65
fixed some double-promotion and sign-compare warnings
2016-05-11 23:02:26 +02:00
Benoit Steiner
217d984abc
Fixed a typo in my previous commit
2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48
Fix potential race condition in the CUDA reduction code.
2016-05-11 10:08:51 -07:00
Benoit Steiner
6a5717dc74
Explicitely initialize all the atomic variables.
2016-05-11 10:04:41 -07:00
Benoit Steiner
4ede059de1
Properly gate the use of half2.
2016-05-10 17:04:01 -07:00
Benoit Steiner
661e710092
Added support for fp16 to the sigmoid functor.
2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552
Small improvement to the full reduction of fp16
2016-05-10 11:58:18 -07:00
Benoit Steiner
4013b8feca
Simplified the reduction code a little.
2016-05-10 09:40:42 -07:00
Benoit Steiner
4670d7d5ce
Improved the performance of full reductions on GPU:
...
Before:
BM_fullReduction/10 200000 11751 8.51 MFlops/s
BM_fullReduction/80 5000 523385 12.23 MFlops/s
BM_fullReduction/640 50 36179326 11.32 MFlops/s
BM_fullReduction/4K 1 2173517195 11.50 MFlops/s
After:
BM_fullReduction/10 500000 5987 16.70 MFlops/s
BM_fullReduction/80 200000 10636 601.73 MFlops/s
BM_fullReduction/640 50000 58428 7010.31 MFlops/s
BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58
Added the ability to use a scratch buffer in cuda kernels
2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2
Added a new parallelFor api to the thread pool device.
2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7
Optimized the non blocking thread pool:
...
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work,
instead of first noticing that there is a non-empty queue and
then doing another round of random stealing to re-discover the non-empty
queue.
* Steal only 1 task from a remote queue instead of half of tasks.
2016-05-09 10:17:17 -07:00
Benoit Steiner
c54ae65c83
Marked a few tensor operations as read only
2016-05-05 17:18:47 -07:00
Benoit Steiner
910e013506
Relaxed an assertion that was tighter that necessary.
2016-05-05 15:38:16 -07:00
Benoit Steiner
28d5572658
Fixed some incorrect assertions
2016-05-05 10:02:26 -07:00
Benoit Steiner
a4d6e8fef0
Strongly hint but don't force the compiler to unroll a some loops in the tensor executor. This results in up to 27% faster code.
2016-05-05 09:25:55 -07:00
Benoit Steiner
f363e533aa
Added tests for full contractions using thread pools and gpu devices.
...
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58
Updated the contraction code to ensure that full contraction return a tensor of rank 0
2016-05-05 08:37:47 -07:00
Christoph Hertzberg
dacb469bc9
Enable and fix -Wdouble-conversion warnings
2016-05-05 13:35:45 +02:00
Benoit Steiner
dd2b45feed
Removed extraneous 'explicit' keywords
2016-05-04 16:57:52 -07:00
Benoit Steiner
968ec1c2ae
Use numext::isfinite instead of std::isfinite
2016-05-03 19:56:40 -07:00
Benoit Steiner
aad9a04da4
Deleted superfluous explicit keyword.
2016-05-03 09:37:19 -07:00
Benoit Steiner
8a9228ed9b
Fixed compilation error
2016-05-01 14:48:01 -07:00
Benoit Steiner
d6c9596fd8
Added missing accessors to fixed sized tensors
2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e
Deleted trailing commas
2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2
Deleted useless trailing commas
2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb
Deleted unnecessary trailing commas.
2016-04-29 18:33:46 -07:00
Benoit Steiner
f100d1494c
Return the proper size (ie 1) for tensors of rank 0
2016-04-29 18:14:33 -07:00
Benoit Steiner
a8c0405cf5
Deleted unused default values for template parameters
2016-04-29 16:34:43 -07:00
Benoit Steiner
c07404f6a1
Restore Tensor support for non c++11 compilers
2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021
Fixed include path
2016-04-29 15:11:09 -07:00
Gael Guennebaud
318e65e0ae
Fix missing inclusion of Eigen/Core
2016-04-27 23:05:40 +02:00
Rasmus Munk Larsen
463738ccbe
Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases.
2016-04-27 12:26:18 -07:00
Gael Guennebaud
3dddd34133
Refactor the unsupported CXX11/Core module to internal headers only.
2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46
Fixed the partial evaluation of non vectorizable tensor subexpressions
2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260
Refined the cost of the striding operation.
2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e
Provide access to the base threadpool classes
2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a
Added the ability to switch to the new thread pool with a #define
2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b
Fixed several compilation warnings
2016-04-21 11:03:02 -07:00
Benoit Steiner
2dde1b1028
Don't crash when attempting to reduce empty tensors.
2016-04-20 18:08:20 -07:00