Mehdi Goli
b42d775f13
Temporary branch for sync with upstream
2017-05-23 10:51:14 +01:00
Benoit Steiner
0d08165a7f
Merged in benoitsteiner/opencl (pull request PR-309)
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
068cc09708
Preserve file naming conventions
2017-04-04 10:09:10 -07:00
Benoit Steiner
c302ea7bc4
Deleted empty line of code
2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1
Guard sycl specific code under an EIGEN_USE_SYCL ifdef
2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7
Code cleanup
2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd
Guard the sycl specific code with EIGEN_USE_SYCL
2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a
Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL
2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666
Gate the sycl specific code under an EIGEN_USE_SYCL define
2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0
Fixed compilation error when sycl is enabled.
2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96
fix typos in the Tensor readme
2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6
Restored code compatibility with compilers that don't support C++11
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3
Restore the old constructors to retain compatibility with non c++11 compilers.
2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f
Gate the sycl specific code under #ifdef sycl
2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555
Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.
2017-03-28 16:50:34 +01:00
Luke Iwanski
a91417a7a5
Introduces align allocator for SYCL buffer
2017-03-20 14:48:54 +00:00
Gael Guennebaud
aae19c70ac
update has_ReturnType to be more consistent with other has_ helpers
2017-03-17 17:33:15 +01:00
Benoit Steiner
f8a622ef3c
Merged eigen/eigen into default
2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b
Silenced compilation warning
2017-03-15 20:02:39 -07:00
Luke Iwanski
9597d6f6ab
Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing a zombie thread
2017-03-15 19:28:09 +00:00
Luke Iwanski
c06861d15e
Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration
2017-03-15 19:26:08 +00:00
Benoit Steiner
7f31bb6822
Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304)
Fixed compilation with cuda-clang
2017-03-15 16:48:52 +00:00
Gael Guennebaud
89fd0c3881
better check array index before using it
2017-03-15 15:18:03 +01:00
Benoit Jacob
61160a21d2
ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.
2017-03-15 06:57:25 -04:00
Benoit Steiner
f0f3591118
Made the reduction code compile with cuda-clang
2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496
Adding synchronisation to convolution kernel for sycl backend.
2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b
Get rid of Init().
2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094
Use C++11 ctor forwarding to simplify code a bit.
2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6
Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.
* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.
* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.
* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
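Note on the spin-count change described above: a minimal sketch of a spin count that is inversely proportional to the number of threads and can be turned off with a hint. The names SpinCount and kSpinBudget, and the budget value itself, are assumptions made for illustration; the commit message does not state the constant that is actually used.

```cpp
#include <cstddef>

// Hypothetical total spin budget; the real value is not given in the commit message.
constexpr std::size_t kSpinBudget = 5000;

// Number of passes through the steal loop a worker makes before it stops spinning.
// Each pass costs time roughly proportional to num_threads, so dividing the
// budget by num_threads keeps the total spin time roughly constant.
inline std::size_t SpinCount(std::size_t num_threads, bool allow_spinning) {
  if (!allow_spinning || num_threads == 0) {
    return 0;  // Spinning disabled by the hint, or no worker threads at all.
  }
  return kSpinBudget / num_threads;
}
```

Under these assumptions, a pool of 4 threads spins for a quarter as many iterations as a single-thread pool, and a pool created with spinning disabled does not spin at all.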
Luke Iwanski
1b32a10053
Use name to distinguish name instead of the vendor
2017-03-08 18:26:34 +00:00
Mehdi Goli
aadb7405a7
Fixing typo in sycl Benchmark.
2017-03-08 18:20:06 +00:00
Gael Guennebaud
970ff78294
bug #1401 : fix compilation of "cond ? x : -x" with x an AutoDiffScalar
2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a
Adding sycl Benchmarks.
2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533
Fixing potential race condition on sycl device.
2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95
Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.
2017-03-07 14:27:10 +00:00
Gael Guennebaud
e5156e4d25
fix typo
2017-03-07 11:25:58 +01:00
Gael Guennebaud
5694315fbb
remove UTF8 symbol
2017-03-07 10:53:47 +01:00
Gael Guennebaud
e958c2baac
remove UTF8 symbols
2017-03-07 10:47:40 +01:00
Gael Guennebaud
d967718525
do not include std header within extern C
2017-03-07 10:16:39 +01:00
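Note on the extern C commit above: a minimal sketch of the include ordering it refers to, with the standard header pulled in before the extern "C" block is opened. The function sketch_buffer_size is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>  // Standard header included outside any extern "C" block.

extern "C" {
// Hypothetical C-linkage declaration; only declarations meant to have
// C linkage belong inside the block.
std::size_t sketch_buffer_size(std::size_t n);
}

// The definition keeps the C linkage given by the declaration above.
std::size_t sketch_buffer_size(std::size_t n) {
  return n * sizeof(double);
}
```

Placing the include inside the block could give C linkage to templates and overloaded functions declared by the standard header, which some compilers reject.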
Gael Guennebaud
659087b622
bug #1400 : fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY
2017-03-07 10:02:34 +01:00
Ilya Biryukov
1c03d43a5c
Fixed compilation with cuda-clang
2017-03-06 12:01:12 +01:00
Benoit Steiner
a71943b9a4
Made the Tensor code compile with clang 3.9
2017-03-02 10:47:29 -08:00
Benoit Steiner
09ae0e6586
Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that:
* they're used consistently between the declaration and the definition of a function
* we avoid calling host only methods from host device methods.
2017-03-01 11:47:47 -08:00
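Note on the EIGEN_DEVICE_FUNC commit above: a minimal sketch of keeping the qualifier identical on a declaration and its out-of-line definition. SKETCH_DEVICE_FUNC is a stand-in macro defined here for illustration and Scaler is a hypothetical type; the real EIGEN_DEVICE_FUNC is defined in Eigen's own headers.

```cpp
// Stand-in for EIGEN_DEVICE_FUNC: expands to host/device qualifiers only
// when a CUDA compiler is in use, and to nothing otherwise.
#if defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Hypothetical type used only for this illustration.
struct Scaler {
  float factor;
  // The declaration carries the qualifier.
  SKETCH_DEVICE_FUNC float apply(float x) const;
};

// The out-of-line definition repeats the same qualifier, so the declaration
// and the definition agree when compiled for a device.
SKETCH_DEVICE_FUNC inline float Scaler::apply(float x) const {
  return factor * x;
}
```

On a host-only compiler the macro expands to nothing, so the same source builds unchanged in both configurations.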
Benoit Steiner
1e2d046651
Silenced a couple of compilation warnings
2017-03-01 10:13:42 -08:00
Benoit Steiner
c1d87ec110
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-03-01 10:08:50 -08:00
Benoit Steiner
3a3f040baa
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 17:06:15 -08:00
Benoit Steiner
7b61944669
Made most of the packet math primitives usable within CUDA kernel when compiling with clang
2017-02-28 17:05:28 -08:00
Benoit Steiner
c92406d613
Silenced clang compilation warning.
2017-02-28 17:03:11 -08:00
Benoit Steiner
857adbbd52
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 14:58:45 -08:00