Mehdi Goli
bd64ee8555
Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.
2017-03-28 16:50:34 +01:00
Luke Iwanski
a91417a7a5
Introduces aligned allocator for SYCL buffer
2017-03-20 14:48:54 +00:00
Benoit Steiner
f8a622ef3c
Merged eigen/eigen into default
2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b
Silenced compilation warning
2017-03-15 20:02:39 -07:00
Luke Iwanski
9597d6f6ab
Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing a zombie thread
2017-03-15 19:28:09 +00:00
Luke Iwanski
c06861d15e
Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration
2017-03-15 19:26:08 +00:00
Benoit Steiner
7f31bb6822
Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304)
Fixed compilation with cuda-clang
2017-03-15 16:48:52 +00:00
Gael Guennebaud
89fd0c3881
better check array index before using it
2017-03-15 15:18:03 +01:00
Benoit Jacob
61160a21d2
ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.
2017-03-15 06:57:25 -04:00
Benoit Steiner
f0f3591118
Made the reduction code compile with cuda-clang
2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496
Adding synchronisation to convolution kernel for sycl backend.
2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b
Get rid of Init().
2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094
Use C++11 ctor forwarding to simplify code a bit.
2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6
Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.
* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.
* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.
* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
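The two scheduling points in the message above can be sketched in a few lines. This is a minimal illustration, not the actual ThreadPool code: `kSpinBudget`, `SpinCount`, and `DrainFromBack` are hypothetical names, the value 5000 is an arbitrary assumption, and the second helper assumes tasks were appended at the back of the queue in scheduling order.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical total spin budget; the commit message does not state the real constant.
constexpr int kSpinBudget = 5000;

// Number of passes through the steal loop before a worker yields.
// One pass costs time proportional to num_threads, so dividing the budget
// by num_threads keeps the total spin time roughly constant.
// With spinning disabled (the new hint), the worker does not spin at all.
inline int SpinCount(bool allow_spinning, int num_threads) {
  assert(num_threads > 0);
  if (!allow_spinning) return 0;
  return kSpinBudget / num_threads;
}

// Drains a queue from the back, the way a PopBack()-style steal would.
// If tasks were pushed at the back in scheduling order, this returns them
// in reverse order, which is why a single-thread pool is better served by
// a plain worker loop that skips stealing.
inline std::vector<int> DrainFromBack(std::deque<int> queue) {
  std::vector<int> order;
  while (!queue.empty()) {
    order.push_back(queue.back());
    queue.pop_back();
  }
  return order;
}
```

Under these assumptions, a pool of 4 threads would spin a quarter as many iterations as a pool of 1, and draining tasks 1, 2, 3 from the back yields 3, 2, 1.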
Luke Iwanski
1b32a10053
Use name to distinguish name instead of the vendor
2017-03-08 18:26:34 +00:00
Mehdi Goli
aadb7405a7
Fixing typo in sycl Benchmark.
2017-03-08 18:20:06 +00:00
Gael Guennebaud
970ff78294
bug #1401 : fix compilation of "cond ? x : -x" with x an AutoDiffScalar
2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a
Adding sycl Benchmarks.
2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533
Fixing potential race condition on sycl device.
2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95
Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.
2017-03-07 14:27:10 +00:00
Gael Guennebaud
e5156e4d25
fix typo
2017-03-07 11:25:58 +01:00
Gael Guennebaud
5694315fbb
remove UTF8 symbol
2017-03-07 10:53:47 +01:00
Gael Guennebaud
e958c2baac
remove UTF8 symbols
2017-03-07 10:47:40 +01:00
Gael Guennebaud
d967718525
do not include std header within extern C
2017-03-07 10:16:39 +01:00
Gael Guennebaud
659087b622
bug #1400 : fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY
2017-03-07 10:02:34 +01:00
Ilya Biryukov
1c03d43a5c
Fixed compilation with cuda-clang
2017-03-06 12:01:12 +01:00
Benoit Steiner
a71943b9a4
Made the Tensor code compile with clang 3.9
2017-03-02 10:47:29 -08:00
Benoit Steiner
09ae0e6586
Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that:
* they're used consistently between the declaration and the definition of a function
* we avoid calling host-only methods from host-device methods.
2017-03-01 11:47:47 -08:00
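The two rules in the message above might look like this in practice. This is a sketch under stated assumptions: `Scaler` and `scale_twice` are hypothetical, and the empty fallback definition of `EIGEN_DEVICE_FUNC` is only a stand-in so the snippet compiles without Eigen or a CUDA compiler (Eigen supplies the real macro).

```cpp
// Stand-in so the sketch builds on a plain host compiler; Eigen defines the real macro.
#ifndef EIGEN_DEVICE_FUNC
#define EIGEN_DEVICE_FUNC
#endif

// Hypothetical type used only for illustration.
struct Scaler {
  float factor;
  // Rule 1: the qualifier appears on the declaration...
  EIGEN_DEVICE_FUNC float apply(float x) const;
};

// ...and again on the out-of-line definition, so both agree.
EIGEN_DEVICE_FUNC inline float Scaler::apply(float x) const {
  return factor * x;
}

// Rule 2: a host-device function calls only functions that are themselves
// host-device. Calling an unqualified (host-only) method from here is what
// the commit avoids.
EIGEN_DEVICE_FUNC inline float scale_twice(const Scaler& s, float x) {
  return s.apply(s.apply(x));
}
```

On a host-only build the macro expands to nothing, so the qualifiers cost nothing there; the consistency matters once the same header is compiled for a device.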
Benoit Steiner
1e2d046651
Silenced a couple of compilation warnings
2017-03-01 10:13:42 -08:00
Benoit Steiner
c1d87ec110
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-03-01 10:08:50 -08:00
Benoit Steiner
3a3f040baa
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 17:06:15 -08:00
Benoit Steiner
7b61944669
Made most of the packet math primitives usable within CUDA kernel when compiling with clang
2017-02-28 17:05:28 -08:00
Benoit Steiner
c92406d613
Silenced clang compilation warning.
2017-02-28 17:03:11 -08:00
Benoit Steiner
857adbbd52
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 14:58:45 -08:00
Benoit Steiner
4a7df114c8
Added missing EIGEN_DEVICE_FUNC
2017-02-28 14:00:15 -08:00
Benoit Steiner
de7b0fdea9
Made the TensorStorage class compile with clang 3.9
2017-02-28 13:52:22 -08:00
Benoit Steiner
765f4cc4b4
Deleted extra EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet.
2017-02-28 11:57:00 -08:00
Benoit Steiner
e993c94f07
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:56:45 -08:00
Benoit Steiner
33443ec2b0
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:50:10 -08:00
Benoit Steiner
f3e9c42876
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:46:30 -08:00
Mehdi Goli
8296b87d7b
Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.
2017-02-28 17:16:14 +00:00
Gael Guennebaud
4e98a7b2f0
bug #1396 : add some missing EIGEN_DEVICE_FUNC
2017-02-28 09:47:38 +01:00
Gael Guennebaud
478a9f53be
Fix typo.
2017-02-28 09:32:45 +01:00
Benoit Steiner
889c606f8f
Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops
2017-02-27 17:17:47 -08:00
Benoit Steiner
193939d6aa
Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods.
2017-02-27 17:11:47 -08:00
Benoit Steiner
ed4dc9d01a
Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel
2017-02-27 16:57:01 -08:00
Benoit Steiner
b1fc7c9a09
Added missing EIGEN_DEVICE_FUNC qualifiers.
2017-02-27 16:48:30 -08:00
Benoit Steiner
554116bec1
Added EIGEN_DEVICE_FUNC to make the prototype of the EigenBase override match that of DenseBase
2017-02-27 16:45:31 -08:00
Benoit Steiner
34d9fce93b
Avoid unnecessary float to double conversions.
2017-02-27 16:33:33 -08:00