Benoit Steiner
|
37c2c516a6
|
Cleaned up the sycl device code
|
2016-11-18 12:38:06 -08:00 |
|
Mehdi Goli
|
15e226d7d3
|
adding Benoit changes on the TensorDeviceSycl.h
|
2016-11-18 16:34:54 +00:00 |
|
Mehdi Goli
|
622805a0c5
|
Modifying TensorDeviceSycl.h to always create buffers of type uint8_t and convert them to the actual type at execution time on the device; adding a queue interface class to separate the lifespan of the sycl queue, and of the buffers created for that queue, from Eigen::SyclDevice; modifying the sycl tests to evaluate the results for both row-major and column-major data layouts on all devices supported by SYCL (CPU, GPU, and Host).
|
2016-11-18 16:20:42 +00:00 |
|
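The "always allocate bytes, pick the element type at use" idea in the entry above can be illustrated on the host. This is a minimal sketch under stated assumptions, not the TensorDeviceSycl.h code: plain host memory stands in for a SYCL buffer, and `ByteBuffer`, `load` and `store` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: storage is always raw bytes (uint8_t); the element type
// is only chosen where the data is read or written. Host memory stands in for
// a SYCL buffer here.
class ByteBuffer {
 public:
  explicit ByteBuffer(std::size_t num_bytes) : bytes_(num_bytes) {}

  // Number of whole elements of type T that fit in the buffer.
  template <typename T>
  std::size_t count() const { return bytes_.size() / sizeof(T); }

  // memcpy reinterprets the bytes without type-punning through reinterpret_cast.
  template <typename T>
  T load(std::size_t i) const {
    assert((i + 1) * sizeof(T) <= bytes_.size());
    T value;
    std::memcpy(&value, bytes_.data() + i * sizeof(T), sizeof(T));
    return value;
  }

  template <typename T>
  void store(std::size_t i, const T& value) {
    assert((i + 1) * sizeof(T) <= bytes_.size());
    std::memcpy(bytes_.data() + i * sizeof(T), &value, sizeof(T));
  }

 private:
  std::vector<std::uint8_t> bytes_;
};
```

Keeping a single untyped buffer type means the code that owns buffers does not need to be templated on every scalar type; the cost is that the type must be supplied correctly at each access.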
Benoit Steiner
|
7c30078b9f
|
Merged eigen/eigen into default
|
2016-11-17 22:53:37 -08:00 |
|
Benoit Steiner
|
553f50b246
|
Added a way to detect errors generated by the opencl device from the host
|
2016-11-17 21:51:48 -08:00 |
|
Benoit Steiner
|
72a45d32e9
|
Cleanup
|
2016-11-17 21:29:15 -08:00 |
|
Benoit Steiner
|
4349fc640e
|
Created a test to check that the sycl runtime can successfully report errors (like division by 0).
Small cleanup
|
2016-11-17 20:27:54 -08:00 |
|
Benoit Steiner
|
a6a3fd0703
|
Made TensorDeviceCuda.h compile on windows
|
2016-11-17 16:15:27 -08:00 |
|
Luke Iwanski
|
c5130dedbe
|
Specialised basic math functions for SYCL device.
|
2016-11-17 11:47:13 +00:00 |
|
Benoit Steiner
|
b5c75351e3
|
Merged eigen/eigen into default
|
2016-11-14 15:54:44 -08:00 |
|
Rasmus Munk Larsen
|
32df1b1046
|
Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the schedule path to push both recursive calls onto another thread-queue in the binary tree, but instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics).
|
2016-11-14 14:18:16 -08:00 |
|
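The splitting strategy described in the parallelFor entry above can be sketched as follows. This is a minimal illustration, not Eigen's code: `std::async` stands in for `thread_pool.Schedule()`, and `handle_range`, `RangeFn` and the block-size parameter are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>

// Hypothetical sketch of recursive range splitting. std::async stands in for
// Eigen's thread_pool.Schedule(); it is not Eigen's API.
using RangeFn = std::function<void(std::int64_t, std::int64_t)>;

void handle_range(std::int64_t first, std::int64_t last, std::int64_t block_size,
                  const RangeFn& f) {
  assert(block_size > 0);
  if (last - first <= block_size) {
    f(first, last);  // Leaf: process this block on the current thread.
    return;
  }
  const std::int64_t mid = first + (last - first) / 2;
  // Schedule only the upper half on another thread...
  std::future<void> upper = std::async(std::launch::async, [=, &f] {
    handle_range(mid, last, block_size, f);
  });
  // ...and recurse into the lower half on the current thread instead of
  // scheduling it as well.
  handle_range(first, mid, block_size, f);
  upper.get();  // Wait for the scheduled half before returning.
}
```

Each internal node of the recursion tree now schedules one task instead of two, which is where the saving comes from; the leaves still end up spread across threads.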
Mehdi Goli
|
05e8c2a1d9
|
Adding an extra non-fixed-size test for broadcast; replacing stcl with sycl.
|
2016-11-14 18:13:53 +00:00 |
|
Mehdi Goli
|
f8ca893976
|
Adding TensorFixsize; adding sycl device memcpy; adding the initial stage of slicing.
|
2016-11-14 17:51:57 +00:00 |
|
Mehdi Goli
|
a5c3f15682
|
Adding comment to TensorDeviceSycl.h and cleaning the code.
|
2016-11-11 19:06:34 +00:00 |
|
Mehdi Goli
|
3be3963021
|
Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor.
|
2016-11-10 19:16:31 +00:00 |
|
Mehdi Goli
|
12387abad5
|
adding the missing in eigen_assert!
|
2016-11-10 18:58:08 +00:00 |
|
Mehdi Goli
|
2e704d4257
|
Adding Memset; optimising memcpyDeviceToHost by removing the double copy.
|
2016-11-10 18:45:12 +00:00 |
|
Benoit Steiner
|
dcc14bee64
|
Fixed the formatting of the code
|
2016-11-08 14:24:46 -08:00 |
|
Luke Iwanski
|
912cb3d660
|
#if EIGEN_EXCEPTION -> #ifdef EIGEN_EXCEPTIONS.
|
2016-11-08 22:01:14 +00:00 |
|
Luke Iwanski
|
1b345b0895
|
Fix for SYCL queue initialisation.
|
2016-11-08 21:56:31 +00:00 |
|
Luke Iwanski
|
1b95717358
|
Use try/catch only when exceptions are enabled.
|
2016-11-08 21:08:53 +00:00 |
|
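The two exception-related entries above (switching to `#ifdef EIGEN_EXCEPTIONS` and using try/catch only when exceptions are enabled) follow a common pattern, sketched below. `#if NAME` treats an undefined name as 0, so a misspelled macro such as `EIGEN_EXCEPTION` silently disables the guarded code, whereas `#ifdef` only tests whether the macro is defined. `run_task` is a hypothetical helper, and deriving `EIGEN_EXCEPTIONS` from the standard `__cpp_exceptions` feature-test macro is an assumption of this sketch, not how Eigen configures it.

```cpp
#include <cassert>
#include <cstdio>

// Assumption for this sketch: derive the flag from the compiler's feature-test
// macro. Eigen sets EIGEN_EXCEPTIONS in its own configuration headers.
#if defined(__cpp_exceptions) && !defined(EIGEN_EXCEPTIONS)
#define EIGEN_EXCEPTIONS
#endif

#ifdef EIGEN_EXCEPTIONS
#include <stdexcept>
#endif

// Hypothetical helper: runs a task and reports whether it succeeded. The
// try/catch is only emitted when exceptions are enabled, so the same code
// also compiles with -fno-exceptions.
template <typename Task>
bool run_task(Task task) {
#ifdef EIGEN_EXCEPTIONS
  try {
    task();
  } catch (const std::exception& e) {
    std::fprintf(stderr, "task failed: %s\n", e.what());
    return false;
  }
  return true;
#else
  task();  // Without exceptions, failures must be reported some other way.
  return true;
#endif
}
```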
Mehdi Goli
|
d57430dd73
|
Converting all sycl buffers to uninitialised device-only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creation to the device based on Benoit's suggestion; removing the sycl-specific condition for returning m_result in TensorReduction.h, also per Benoit's suggestion.
|
2016-11-08 17:08:02 +00:00 |
|
Benoit Steiner
|
dad177be01
|
Added missing includes
|
2016-11-05 10:04:42 -07:00 |
|
Mehdi Goli
|
0ebe3808ca
|
Removed the sycl include from Eigen/Core and moved it to unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size.
|
2016-11-04 18:18:19 +00:00 |
|
Benoit Steiner
|
0585b2965d
|
Disable vectorization on device only when compiling for sycl
|
2016-11-02 11:44:27 -07:00 |
|
Mehdi Goli
|
51af6ae971
|
Fixed the ambiguity in calling make_tuple for the sycl backend.
|
2016-10-31 16:35:51 +00:00 |
|
Benoit Steiner
|
0a9ad6fc72
|
Worked around Visual Studio compilation errors
|
2016-10-28 07:54:27 -07:00 |
|
Benoit Steiner
|
b0c5bfdf78
|
Added missing template parameters
|
2016-10-28 03:43:41 +00:00 |
|
Gael Guennebaud
|
530f20c21a
|
Workaround MSVC issue.
|
2016-10-27 21:51:37 +02:00 |
|
Benoit Steiner
|
0a4c4d40b4
|
Removed a template parameter for fixed sized tensors
|
2016-10-26 18:47:37 -07:00 |
|
Benoit Steiner
|
5f2dd503ff
|
Replaced tabs with spaces
|
2016-10-25 20:40:58 -07:00 |
|
Benoit Steiner
|
1644bafe29
|
Code cleanup
|
2016-10-25 20:36:14 -07:00 |
|
Benoit Steiner
|
cf20b30d65
|
Merge latest updates from trunk
|
2016-10-20 09:42:05 -07:00 |
|
Luke Iwanski
|
03b63e182c
|
Added SYCL include in Tensor.
|
2016-10-20 15:32:44 +01:00 |
|
Benoit Steiner
|
d3943cd50c
|
Fixed a few typos in the ternary tensor expressions types
|
2016-10-19 12:56:12 -07:00 |
|
Mehdi Goli
|
e36cb91c99
|
Fixing the code indentation in the TensorReduction.h file.
|
2016-10-14 18:03:00 +01:00 |
|
Luke Iwanski
|
e742da8b28
|
Merged ComputeCpp into default.
|
2016-10-14 13:36:51 +01:00 |
|
Mehdi Goli
|
524fa4c46f
|
Reducing the code by generalising sycl backend functions/structs.
|
2016-10-14 12:09:55 +01:00 |
|
Benoit Steiner
|
7e4a6754b2
|
Merged eigen/eigen into default
|
2016-10-12 22:42:33 -07:00 |
|
Benoit Steiner
|
7f0599b6eb
|
Manually define int16_t and uint16_t when compiling with Visual Studio
|
2016-10-08 22:56:32 -07:00 |
|
Benoit Steiner
|
5c68051cd7
|
Merge the content of the ComputeCpp branch into the default branch
|
2016-10-07 11:04:16 -07:00 |
|
RJ Ryan
|
e2e9cdd169
|
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
|
2016-10-06 10:49:48 -07:00 |
|
Benoit Steiner
|
ae1385c7e4
|
Pull the latest updates from trunk
|
2016-10-05 14:54:36 -07:00 |
|
Benoit Steiner
|
c84084c0c0
|
Fixed compilation warning
|
2016-10-05 14:15:41 -07:00 |
|
Benoit Steiner
|
8b69d5d730
|
::rand() returns a signed integer on win32
|
2016-10-05 08:55:02 -07:00 |
|
Benoit Steiner
|
ed7a220b04
|
Fixed a typo that impacts windows builds
|
2016-10-05 08:51:31 -07:00 |
|
Benoit Steiner
|
ceee1c008b
|
Silenced compilation warning
|
2016-10-04 18:47:53 -07:00 |
|
Benoit Steiner
|
6af5ac7e27
|
Cleanup the cuda executor code.
|
2016-10-04 08:52:13 -07:00 |
|
Benoit Steiner
|
2f6d1607c8
|
Cleaned up the random number generation code.
|
2016-10-04 08:38:23 -07:00 |
|
Benoit Steiner
|
2bda1b0d93
|
Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.
|
2016-09-28 17:08:41 -07:00 |
|
Mehdi Goli
|
dd602e62c8
|
Converting alias template to nested struct in order to be compatible with C++03
|
2016-09-27 16:21:19 +01:00 |
|
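The conversion in the entry above can be illustrated generically: alias templates (`template <...> using`) only exist from C++11, while a class template exposing a nested `typedef` works in C++03. The names `add_const_ptr` and `as_const_ptr` are hypothetical and are not the actual Eigen SYCL traits.

```cpp
#include <cassert>

// C++11 only (alias templates do not exist in C++03):
//   template <typename T> using add_const_ptr_t = const T*;

// C++03-compatible equivalent: a nested typedef inside a class template.
template <typename T>
struct add_const_ptr {
  typedef const T* type;
};

// In a dependent context the nested form needs the `typename` keyword.
template <typename T>
typename add_const_ptr<T>::type as_const_ptr(T* p) {
  return p;
}
```

The price of the C++03 form is the extra `typename ...::type` at every use site, which the alias template would have hidden.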
Benoit Steiner
|
6565f8d60f
|
Made the initialization of a CUDA device thread safe.
|
2016-09-26 11:00:32 -07:00 |
|
Benoit Steiner
|
f6ac51a054
|
Made TensorEvalTo compatible with c++0x again.
|
2016-09-23 16:45:17 -07:00 |
|
Benoit Steiner
|
00d4e65f00
|
Deleted unused TensorMap data member
|
2016-09-23 16:44:45 -07:00 |
|
Benoit Steiner
|
1301d744f8
|
Made the gaussian generator usable on GPU
|
2016-09-22 19:04:44 -07:00 |
|
Benoit Steiner
|
c3ca9b1e76
|
Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC
|
2016-09-19 11:33:39 -07:00 |
|
Luke Iwanski
|
b91e021172
|
Merged with default.
|
2016-09-19 14:03:54 +01:00 |
|
Luke Iwanski
|
cb81975714
|
Partial OpenCL support via SYCL compatible with ComputeCpp CE.
|
2016-09-19 12:44:13 +01:00 |
|
Gael Guennebaud
|
18f6e47815
|
Fix order of "static inline".
|
2016-09-16 11:32:54 +02:00 |
|
Benoit Steiner
|
488ad7dd1b
|
Added missing EIGEN_DEVICE_FUNC qualifiers
|
2016-09-14 13:35:00 -07:00 |
|
Benoit Steiner
|
028e299577
|
Fixed a bug impacting some outer reductions on GPU
|
2016-09-12 18:36:52 -07:00 |
|
Benoit Steiner
|
8321dcce76
|
Merged latest updates from trunk
|
2016-09-12 10:33:05 -07:00 |
|
Benoit Steiner
|
eb6ba00cc8
|
Properly size the list of waiters
|
2016-09-12 10:31:55 -07:00 |
|
Benoit Steiner
|
a618094b62
|
Added a resize method to MaxSizeVector
|
2016-09-12 10:30:53 -07:00 |
|
Gael Guennebaud
|
471eac5399
|
bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)
|
2016-09-08 08:36:27 +02:00 |
|
Benoit Steiner
|
13df3441ae
|
Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instructions to initialize it. This can result in random crashes when compiling with AVX instructions enabled.
|
2016-09-02 19:25:47 -07:00 |
|
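The point of the MaxSizeVector entry above is that the container controls the alignment of its own buffer instead of relying on what std::vector's allocator happens to return. Below is a minimal sketch of such a fixed-capacity vector in the spirit of MaxSizeVector; it is not Eigen's implementation, and `AlignedFixedVector` and its 32-byte default (the alignment AVX aligned loads and stores expect) are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical fixed-capacity vector with an explicitly aligned buffer.
template <typename T, std::size_t Alignment = 32>
class AlignedFixedVector {
  static_assert((Alignment & (Alignment - 1)) == 0, "alignment must be a power of two");
  static_assert(Alignment % alignof(T) == 0, "alignment must suit T");

 public:
  explicit AlignedFixedVector(std::size_t capacity) : capacity_(capacity), size_(0) {
    std::size_t space = capacity * sizeof(T) + Alignment;  // Room to shift the start.
    raw_ = std::malloc(space);
    if (raw_ == nullptr) throw std::bad_alloc();
    void* p = raw_;
    // Advance p to the next Alignment boundary inside the raw block.
    data_ = static_cast<T*>(std::align(Alignment, capacity * sizeof(T), p, space));
  }
  ~AlignedFixedVector() {
    for (std::size_t i = 0; i < size_; ++i) data_[i].~T();
    std::free(raw_);
  }
  AlignedFixedVector(const AlignedFixedVector&) = delete;
  AlignedFixedVector& operator=(const AlignedFixedVector&) = delete;

  void push_back(const T& value) {
    assert(size_ < capacity_ && "capacity is fixed at construction");
    new (data_ + size_) T(value);  // Construct in place in the aligned storage.
    ++size_;
  }
  T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
  std::size_t size() const { return size_; }
  const T* data() const { return data_; }

 private:
  std::size_t capacity_;
  std::size_t size_;
  void* raw_;  // Pointer returned by malloc, needed for free().
  T* data_;    // Aligned start of the element storage.
};
```

Fixing the capacity up front means the buffer is never reallocated, so the alignment guarantee holds for the container's whole lifetime.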
Benoit Steiner
|
cadd124d73
|
Pulled latest update from trunk
|
2016-09-02 15:30:02 -07:00 |
|
Benoit Steiner
|
05b0518077
|
Made the index type an explicit template parameter to help some compilers compile the code.
|
2016-09-02 15:29:34 -07:00 |
|
Benoit Steiner
|
adf864fec0
|
Merged in rmlarsen/eigen (pull request PR-222)
Fix CUDA build broken by changes to min and max reduction.
|
2016-09-02 14:11:20 -07:00 |
|
Rasmus Munk Larsen
|
13e93ca8b7
|
Fix CUDA build broken by changes to min and max reduction.
|
2016-09-02 13:41:36 -07:00 |
|
Benoit Steiner
|
c53f783705
|
Updated the contraction code to support constant inputs.
|
2016-09-01 11:41:27 -07:00 |
|
Gael Guennebaud
|
46475eff9a
|
Adjust Tensor module wrt recent change in nullary functor
|
2016-09-01 13:40:45 +02:00 |
|
Rasmus Munk Larsen
|
a1e092d1e8
|
Fix bugs to make min- and max reducers work correctly with IEEE infinities.
|
2016-08-31 15:04:16 -07:00 |
|
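One way min and max reductions can go wrong with IEEE infinities is a bad starting value: if a max reduction starts from `numeric_limits<T>::lowest()`, reducing `{-inf}` returns `lowest()` instead of `-inf`. The sketch below shows reducers whose initial value is the true identity. It assumes this is the kind of bug the entry refers to and is not the actual patch; `MaxReducer`, `MinReducer` and `reduce_all` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical reducers: for floating-point types the identity of max is
// -infinity (and of min is +infinity); integer types fall back to lowest()/max().
template <typename T>
struct MaxReducer {
  static T initialize() {
    return std::numeric_limits<T>::has_infinity ? -std::numeric_limits<T>::infinity()
                                                : std::numeric_limits<T>::lowest();
  }
  static void reduce(T x, T* accum) { if (x > *accum) *accum = x; }
};

template <typename T>
struct MinReducer {
  static T initialize() {
    return std::numeric_limits<T>::has_infinity ? std::numeric_limits<T>::infinity()
                                                : std::numeric_limits<T>::max();
  }
  static void reduce(T x, T* accum) { if (x < *accum) *accum = x; }
};

// Sequential full reduction over n elements.
template <typename Reducer, typename T>
T reduce_all(const T* data, std::size_t n) {
  T accum = Reducer::initialize();
  for (std::size_t i = 0; i < n; ++i) Reducer::reduce(data[i], &accum);
  return accum;
}
```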
Gael Guennebaud
|
35a8e94577
|
bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command.
|
2016-08-29 10:59:37 +02:00 |
|
Gael Guennebaud
|
965e595f02
|
Add missing log1p method
|
2016-08-26 14:55:00 +02:00 |
|
Benoit Steiner
|
34ae80179a
|
Use array_prod instead of calling TotalSize since TotalSize is only available on DSize.
|
2016-08-15 10:29:14 -07:00 |
|
Benoit Steiner
|
fe73648c98
|
Fixed a bug in the documentation.
|
2016-08-12 10:00:43 -07:00 |
|
Benoit Steiner
|
64e68cbe87
|
Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything.
|
2016-08-08 19:29:59 -07:00 |
|
Benoit Steiner
|
ca2cee2739
|
Merged in ibab/eigen (pull request PR-206)
Expose real and imag methods on Tensors
|
2016-08-03 11:53:04 -07:00 |
|
Benoit Steiner
|
a20b58845f
|
__CUDA_ARCH__ isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us which hardware version we're running on.
|
2016-08-03 10:00:43 -07:00 |
|
Benoit Steiner
|
fd220dd8b0
|
Use numext::conj instead of std::conj
|
2016-08-01 18:16:16 -07:00 |
|
Benoit Steiner
|
e256acec7c
|
Avoid unnecessary object copies
|
2016-08-01 17:03:39 -07:00 |
|
Benoit Steiner
|
2693fd54bf
|
bug #1266: half implementation has been moved to half_impl namespace
|
2016-07-29 13:45:56 -07:00 |
|
Benoit Steiner
|
3d3d34e442
|
Deleted dead code.
|
2016-07-25 08:53:37 -07:00 |
|
Gael Guennebaud
|
6d5daf32f5
|
bug #1255: comment out broken and unused line.
|
2016-07-25 14:48:30 +02:00 |
|
Gael Guennebaud
|
9908020d36
|
Add minimal support for Array<string>, and fix Tensor<string>
|
2016-07-25 14:25:56 +02:00 |
|
Benoit Steiner
|
c6b0de2c21
|
Improved partial reductions in more cases
|
2016-07-22 17:18:20 -07:00 |
|
Gael Guennebaud
|
0f350a8b7e
|
Fix CUDA compilation
|
2016-07-21 18:47:07 +02:00 |
|
Benoit Steiner
|
20f7ef2f89
|
An evalTo expression is aligned iff both the lhs and the rhs are aligned.
|
2016-07-12 10:56:42 -07:00 |
|
Benoit Steiner
|
3a2dd352ae
|
Improved the contraction mapper to properly support tensor products
|
2016-07-11 13:43:41 -07:00 |
|
Benoit Steiner
|
0bc020be9d
|
Improved the detection of packet size in the tensor scan evaluator.
|
2016-07-11 12:14:56 -07:00 |
|
Gael Guennebaud
|
fd60966310
|
merge
|
2016-07-11 18:11:47 +02:00 |
|
Gael Guennebaud
|
194daa3048
|
Fix assertion (it did not make sense for static_val types)
|
2016-07-11 11:39:27 +02:00 |
|
Gael Guennebaud
|
18c35747ce
|
Emulate _BitScanReverse64 for 32-bit builds
|
2016-07-11 11:38:04 +02:00 |
|
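The emulation mentioned in the entry above can be done by splitting the 64-bit mask into two 32-bit halves and scanning the high half first. This is a portable sketch, not Eigen's code: `bit_scan_reverse32` and `bit_scan_reverse64` are hypothetical names, and the loop in `bit_scan_reverse32` stands in for MSVC's 32-bit `_BitScanReverse` intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for a 32-bit "bit scan reverse": for a nonzero mask, writes
// the index of the highest set bit and returns true; returns false for 0.
inline bool bit_scan_reverse32(std::uint32_t* index, std::uint32_t mask) {
  if (mask == 0) return false;
  std::uint32_t i = 31;
  while ((mask >> i) == 0) --i;  // Terminates because some bit is set.
  *index = i;
  return true;
}

// 64-bit scan built from two 32-bit scans, as a 32-bit build must do when
// _BitScanReverse64 is unavailable.
inline bool bit_scan_reverse64(std::uint32_t* index, std::uint64_t mask) {
  const std::uint32_t high = static_cast<std::uint32_t>(mask >> 32);
  if (bit_scan_reverse32(index, high)) {
    *index += 32;  // The highest set bit lives in the upper half.
    return true;
  }
  return bit_scan_reverse32(index, static_cast<std::uint32_t>(mask));
}
```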
Gael Guennebaud
|
599f8ba617
|
Change runtime to compile-time conditional.
|
2016-07-08 11:39:43 +02:00 |
|
Gael Guennebaud
|
544935101a
|
Fix warnings
|
2016-07-08 11:38:52 +02:00 |
|
Gael Guennebaud
|
2f7e2614e7
|
bug #1232: refactor special functions as a new SpecialFunctions module, currently in unsupported/.
|
2016-07-08 11:13:55 +02:00 |
|
Gael Guennebaud
|
179ebb88f9
|
Fix warning
|
2016-07-07 09:16:40 +02:00 |
|
Gael Guennebaud
|
ce9fc0ce14
|
fix clang compilation
|
2016-07-04 12:59:02 +02:00 |
|
Gael Guennebaud
|
440020474c
|
Workaround compilation issue with msvc
|
2016-07-04 12:49:19 +02:00 |
|