Benoit Steiner
|
a20b58845f
|
CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running.
|
2016-08-03 10:00:43 -07:00 |
|
Benoit Steiner
|
fd220dd8b0
|
Use numext::conj instead of std::conj
|
2016-08-01 18:16:16 -07:00 |
|
Benoit Steiner
|
e256acec7c
|
Avoid unnecessary object copies
|
2016-08-01 17:03:39 -07:00 |
|
Benoit Steiner
|
2693fd54bf
|
bug #1266: half implementation has been moved to half_impl namespace
|
2016-07-29 13:45:56 -07:00 |
|
Benoit Steiner
|
3d3d34e442
|
Deleted dead code.
|
2016-07-25 08:53:37 -07:00 |
|
Gael Guennebaud
|
6d5daf32f5
|
bug #1255: comment out broken and unused line.
|
2016-07-25 14:48:30 +02:00 |
|
Gael Guennebaud
|
9908020d36
|
Add minimal support for Array<string>, and fix Tensor<string>
|
2016-07-25 14:25:56 +02:00 |
|
Benoit Steiner
|
c6b0de2c21
|
Improved partial reductions in more cases
|
2016-07-22 17:18:20 -07:00 |
|
Gael Guennebaud
|
0f350a8b7e
|
Fix CUDA compilation
|
2016-07-21 18:47:07 +02:00 |
|
Benoit Steiner
|
20f7ef2f89
|
An evalTo expression is aligned if and only if both the lhs and the rhs are aligned.
|
2016-07-12 10:56:42 -07:00 |
|
Benoit Steiner
|
3a2dd352ae
|
Improved the contraction mapper to properly support tensor products
|
2016-07-11 13:43:41 -07:00 |
|
Benoit Steiner
|
0bc020be9d
|
Improved the detection of packet size in the tensor scan evaluator.
|
2016-07-11 12:14:56 -07:00 |
|
Gael Guennebaud
|
194daa3048
|
Fix assertion (it did not make sense for static_val types)
|
2016-07-11 11:39:27 +02:00 |
|
Gael Guennebaud
|
18c35747ce
|
Emulate _BitScanReverse64 for 32-bit builds
|
2016-07-11 11:38:04 +02:00 |
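
MSVC's `_BitScanReverse64` intrinsic is only available when targeting 64-bit platforms, so a 32-bit build needs a fallback. One way to emulate it, sketched below as an assumption rather than the commit's actual code, is to scan the high 32 bits first and add 32 to the result, falling back to the low 32 bits when the high word is zero. The names `bit_scan_reverse32` and `bit_scan_reverse64` are hypothetical, and a plain loop stands in for the 32-bit `_BitScanReverse` intrinsic so the sketch compiles with any compiler.

```cpp
#include <cstdint>

// Hypothetical portable stand-in for a 32-bit bit-scan-reverse intrinsic:
// stores the index of the most significant set bit of `mask` in `*index`
// and returns true, or returns false (leaving `*index` untouched) if `mask` is 0.
bool bit_scan_reverse32(unsigned long* index, std::uint32_t mask) {
  if (mask == 0) return false;
  unsigned long i = 31;
  while ((mask & (std::uint32_t(1) << i)) == 0) --i;
  *index = i;
  return true;
}

// Hypothetical 64-bit emulation built from two 32-bit scans: the high word
// is checked first and its bit index is offset by 32; otherwise the low
// word decides the result.
bool bit_scan_reverse64(unsigned long* index, std::uint64_t mask) {
  const std::uint32_t high = static_cast<std::uint32_t>(mask >> 32);
  if (bit_scan_reverse32(index, high)) {
    *index += 32;
    return true;
  }
  return bit_scan_reverse32(index, static_cast<std::uint32_t>(mask));
}
```

Scanning the high word first matters: a set bit there is always more significant than any bit in the low word, so the low word only needs to be examined when the high word is empty.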
|
Gael Guennebaud
|
599f8ba617
|
Change runtime to compile-time conditional.
|
2016-07-08 11:39:43 +02:00 |
|
Gael Guennebaud
|
544935101a
|
Fix warnings
|
2016-07-08 11:38:52 +02:00 |
|
Gael Guennebaud
|
179ebb88f9
|
Fix warning
|
2016-07-07 09:16:40 +02:00 |
|
Gael Guennebaud
|
ce9fc0ce14
|
fix clang compilation
|
2016-07-04 12:59:02 +02:00 |
|
Gael Guennebaud
|
440020474c
|
Workaround compilation issue with msvc
|
2016-07-04 12:49:19 +02:00 |
|
Benoit Steiner
|
cb2d8b8fa6
|
Made it possible to compile reductions for an old CUDA architecture and run them on a recent GPU.
|
2016-06-29 15:42:01 -07:00 |
|
Benoit Steiner
|
b2a47641ce
|
Made the code compile when using CUDA architecture < 300
|
2016-06-29 15:32:47 -07:00 |
|
Igor Babuschkin
|
85699850d9
|
Add missing CUDA kernel to tensor scan op
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
|
2016-06-29 11:54:35 +01:00 |
|
Benoit Steiner
|
75c333f94c
|
Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor.
Also avoid taking references to values that may become stale after a copy construction.
|
2016-06-27 10:32:38 -07:00 |
|
Rasmus Munk Larsen
|
a9c1e4d7b7
|
Return -1 from CurrentThreadId when called by thread outside the pool.
|
2016-06-23 16:40:07 -07:00 |
|
Rasmus Munk Larsen
|
d39df320d2
|
Resolve merge.
|
2016-06-23 15:08:03 -07:00 |
|
Gael Guennebaud
|
360a743a10
|
bug #1241: does not emit anything for empty tensors
|
2016-06-23 18:47:31 +02:00 |
|
Gael Guennebaud
|
7c6561485a
|
merge PR 194
|
2016-06-23 15:29:57 +02:00 |
|
Benoit Steiner
|
a29a2cb4ff
|
Silenced a couple of compilation warnings generated by xcode
|
2016-06-22 16:43:02 -07:00 |
|
Benoit Steiner
|
f8fcd6b32d
|
Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers
|
2016-06-22 16:03:11 -07:00 |
|
Benoit Steiner
|
c58df31747
|
Handle empty tensors in the print functions
|
2016-06-21 09:22:43 -07:00 |
|
Benoit Steiner
|
de32f8d656
|
Fixed the printing of rank-0 tensors
|
2016-06-20 10:46:45 -07:00 |
|
Benoit Steiner
|
7d495d890a
|
Merged in ibab/eigen (pull request PR-197)
Implement exclusive scan option for Tensor library
|
2016-06-14 17:54:59 -07:00 |
|
Benoit Steiner
|
aedc5be1d6
|
Avoid generating pseudo random numbers that are multiples of 5: this helps
spread the load over multiple cpus without having to rely on work stealing.
|
2016-06-14 17:51:47 -07:00 |
|
Igor Babuschkin
|
c4d10e921f
|
Implement exclusive scan option
|
2016-06-14 19:44:07 +01:00 |
|
Gael Guennebaud
|
76236cdea4
|
merge
|
2016-06-14 15:33:47 +02:00 |
|
Gael Guennebaud
|
5d38203735
|
Update Tensor module to use bind1st_op and bind2nd_op
|
2016-06-14 15:06:03 +02:00 |
|
Benoit Steiner
|
65d33e5898
|
Merged in ibab/eigen (pull request PR-195)
Add small fixes to TensorScanOp
|
2016-06-10 19:31:17 -07:00 |
|
Benoit Steiner
|
a05607875a
|
Don't refer to the half2 type unless it's been defined
|
2016-06-10 11:53:56 -07:00 |
|
Igor Babuschkin
|
86aedc9282
|
Add small fixes to TensorScanOp
|
2016-06-07 20:06:38 +01:00 |
|
Benoit Steiner
|
84b2060a9e
|
Fixed compilation error with gcc 4.4
|
2016-06-06 17:16:19 -07:00 |
|
Benoit Steiner
|
7ef9f47b58
|
Misc small improvements to the reduction code.
|
2016-06-06 14:09:46 -07:00 |
|
Benoit Steiner
|
9137f560f0
|
Moved assertions to the constructor to make the code more portable
|
2016-06-06 07:26:48 -07:00 |
|
Rasmus Munk Larsen
|
f1f2ff8208
|
size_t -> int
|
2016-06-03 18:06:37 -07:00 |
|
Rasmus Munk Larsen
|
76308e7fd2
|
Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool.
|
2016-06-03 16:28:58 -07:00 |
|
Benoit Steiner
|
37638dafd7
|
Simplified the code that dispatches vectorized reductions on GPU
|
2016-06-09 10:29:52 -07:00 |
|
Benoit Steiner
|
66796e843d
|
Fixed definition of some of the reducer_traits
|
2016-06-09 08:50:01 -07:00 |
|
Benoit Steiner
|
14a112ee15
|
Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression.
|
2016-06-09 08:25:22 -07:00 |
|
Benoit Steiner
|
8f92c26319
|
Improved code formatting
|
2016-06-09 08:23:42 -07:00 |
|
Benoit Steiner
|
aa33446dac
|
Improved support for vectorization of 16-bit floats
|
2016-06-09 08:22:27 -07:00 |
|
Benoit Steiner
|
d6d39c7ddb
|
Added missing EIGEN_DEVICE_FUNC
|
2016-06-07 14:35:08 -07:00 |
|