Commit Graph

1031 Commits

Author SHA1 Message Date
Benoit Steiner
a1304b95b7 Code cleanup 2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd Guard the sycl specific code with EIGEN_USE_SYCL 2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL 2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666 iGate the sycl specific code under a EIGEN_USE_SYCL define 2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0 Fixed compilation error when sycl is enabled. 2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96 fix typos in the Tensor readme 2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6 Restored code compatibility with compilers that dont support c++11
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3 Restore the old constructors to retain compatibility with non c++11 compilers. 2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f Gate the sycl specific code under #ifdef sycl 2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555 Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax. 2017-03-28 16:50:34 +01:00
Luke Iwanski
a91417a7a5 Introduces align allocator for SYCL buffer 2017-03-20 14:48:54 +00:00
Benoit Steiner
f8a622ef3c Merged eigen/eigen into default 2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b Silenced compilation warning 2017-03-15 20:02:39 -07:00
Luke Iwanski
c06861d15e Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration 2017-03-15 19:26:08 +00:00
Benoit Steiner
f0f3591118 Made the reduction code compile with cuda-clang 2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496 Adding synchronisation to convolution kernel for sycl backend. 2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b Get rid of Init(). 2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094 Use C++11 ctor forwarding to simplify code a bit. 2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6 Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.

* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.

* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.

* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
Luke Iwanski
1b32a10053 Use name to distinguish name instead of the vendor 2017-03-08 18:26:34 +00:00
Mehdi Goli
5e9a1e7a7a Adding sycl Benchmarks. 2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533 Fixing potential race condition on sycl device. 2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95 Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data missmatch. 2017-03-07 14:27:10 +00:00
Benoit Steiner
a71943b9a4 Made the Tensor code compile with clang 3.9 2017-03-02 10:47:29 -08:00
Benoit Steiner
1e2d046651 Silenced a couple of compilation warnings 2017-03-01 10:13:42 -08:00
Benoit Steiner
c92406d613 Silenced clang compilation warning. 2017-02-28 17:03:11 -08:00
Benoit Steiner
de7b0fdea9 Made the TensorStorage class compile with clang 3.9 2017-02-28 13:52:22 -08:00
Mehdi Goli
8296b87d7b Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used. 2017-02-28 17:16:14 +00:00
Gael Guennebaud
478a9f53be Fix typo. 2017-02-28 09:32:45 +01:00
Benoit Steiner
e0bd6f5738 Merged eigen/eigen into default 2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9 Adding TensorVolumePatchOP.h for sycl 2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137 Converting fixed float type into template type for TensorContraction. 2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae Adding Sycl Backend for TensorGenerator.h. 2017-02-22 16:36:24 +00:00
Mehdi Goli
79ebc8f761 Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h. 2017-02-20 12:11:05 +00:00
Gael Guennebaud
a811a04696 Silent warning. 2017-02-20 10:14:21 +01:00
Gael Guennebaud
f8a55cc062 Fix compilation. 2017-02-18 10:08:13 +01:00
Benoit Steiner
cfa0568ef7 Size indices are signed. 2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0 Adding TensorLayoutSwapOp for sycl. 2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6 Adding TensorPatch.h for sycl backend. 2017-02-15 10:13:01 +00:00
Mehdi Goli
0d153ded29 Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test 2017-02-13 17:25:12 +00:00
Benoit Steiner
769208a17f Pulled latest updates from upstream 2017-02-10 13:11:40 -08:00
Mehdi Goli
0ee97b60c2 Adding mean to TensorReductionSycl.h 2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b Fixing TensorReductionSycl for min and max. 2017-02-06 18:05:23 +00:00
Mehdi Goli
bc128f9f3b Reducing the warnings in Sycl backend. 2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30 Silenced several compilation warnings 2017-02-01 15:50:58 -08:00
Mehdi Goli
bab29936a1 Reducing warnings in Sycl backend. 2017-02-01 15:29:53 +00:00
Mehdi Goli
48a20b7d95 Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h 2017-01-31 14:06:36 +00:00
Benoit Steiner
fbc39fd02c Merge latest changes from upstream 2017-01-30 15:25:57 -08:00
Mehdi Goli
82ce92419e Fixing the buffer type in memcpy. 2017-01-30 11:38:20 +00:00
Rasmus Munk Larsen
edaa0fc5d1 Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warning everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579 Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
607be65a03 Fix duplicates of array_size bewteen unsupported and Core 2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
e6b1020221 Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.

2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:

Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                          3.48      2.39    +31.3%
BM_memcpy_1T/8                          12.3      6.51    +47.0%
BM_memcpy_1T/64                          371       383     -3.2%
BM_memcpy_1T/512                       66922     66720     +0.3%
BM_memcpy_1T/4k                      9892867   6849682    +30.8%
BM_memcpy_1T/5k                     14951099  10332856    +30.9%
BM_memcpy_2T/2                          3.50      2.46    +29.7%
BM_memcpy_2T/8                          12.3      7.66    +37.7%
BM_memcpy_2T/64                          371       376     -1.3%
BM_memcpy_2T/512                       66652     66788     -0.2%
BM_memcpy_2T/4k                      6145012   6117776     +0.4%
BM_memcpy_2T/5k                      9181478   9010942     +1.9%
BM_memcpy_4T/2                          3.47      2.47    +31.0%
BM_memcpy_4T/8                          12.3      6.67    +45.8
BM_memcpy_4T/64                          374       376     -0.5%
BM_memcpy_4T/512                       67833     68019     -0.3%
BM_memcpy_4T/4k                      5057425   5188253     -2.6%
BM_memcpy_4T/5k                      7555638   7779468     -3.0%
BM_memcpy_6T/2                          3.51      2.50    +28.8%
BM_memcpy_6T/8                          12.3      7.61    +38.1%
BM_memcpy_6T/64                          373       378     -1.3%
BM_memcpy_6T/512                       66871     66774     +0.1%
BM_memcpy_6T/4k                      5112975   5233502     -2.4%
BM_memcpy_6T/5k                      7614180   7772246     -2.1%
BM_memcpy_8T/2                          3.47      2.41    +30.5%
BM_memcpy_8T/8                          12.4      10.5    +15.3%
BM_memcpy_8T/64                          372       388     -4.3%
BM_memcpy_8T/512                       67373     66588     +1.2%
BM_memcpy_8T/4k                      5148462   5254897     -2.1%
BM_memcpy_8T/5k                      7660989   7799058     -1.8%
BM_memcpy_12T/2                         3.50      2.40    +31.4%
BM_memcpy_12T/8                         12.4      7.55    +39.1
BM_memcpy_12T/64                         374       378     -1.1%
BM_memcpy_12T/512                      67132     66683     +0.7%
BM_memcpy_12T/4k                     5185125   5292920     -2.1%
BM_memcpy_12T/5k                     7717284   7942684     -2.9%
BM_slicingSmallPieces_1T/2              47.3      47.5     +0.4%
BM_slicingSmallPieces_1T/8              53.6      52.3     +2.4%
BM_slicingSmallPieces_1T/64              491       476     +3.1%
BM_slicingSmallPieces_1T/512           21734     18814    +13.4%
BM_slicingSmallPieces_1T/4k           394660    396760     -0.5%
BM_slicingSmallPieces_1T/5k           218722    209244     +4.3%
BM_slicingSmallPieces_2T/2              80.7      79.9     +1.0%
BM_slicingSmallPieces_2T/8              54.2      53.1     +2.0
BM_slicingSmallPieces_2T/64              497       477     +4.0%
BM_slicingSmallPieces_2T/512           21732     18822    +13.4%
BM_slicingSmallPieces_2T/4k           392885    390490     +0.6%
BM_slicingSmallPieces_2T/5k           221988    208678     +6.0%
BM_slicingSmallPieces_4T/2              80.8      80.1     +0.9%
BM_slicingSmallPieces_4T/8              54.1      53.2     +1.7%
BM_slicingSmallPieces_4T/64              493       476     +3.4%
BM_slicingSmallPieces_4T/512           21702     18758    +13.6%
BM_slicingSmallPieces_4T/4k           393962    404023     -2.6%
BM_slicingSmallPieces_4T/5k           249667    211732    +15.2%
BM_slicingSmallPieces_6T/2              80.5      80.1     +0.5%
BM_slicingSmallPieces_6T/8              54.4      53.4     +1.8%
BM_slicingSmallPieces_6T/64              488       478     +2.0%
BM_slicingSmallPieces_6T/512           21719     18841    +13.3%
BM_slicingSmallPieces_6T/4k           394950    397583     -0.7%
BM_slicingSmallPieces_6T/5k           223080    210148     +5.8%
BM_slicingSmallPieces_8T/2              81.2      80.4     +1.0%
BM_slicingSmallPieces_8T/8              58.1      53.5     +7.9%
BM_slicingSmallPieces_8T/64              489       480     +1.8%
BM_slicingSmallPieces_8T/512           21586     18798    +12.9%
BM_slicingSmallPieces_8T/4k           394592    400165     -1.4%
BM_slicingSmallPieces_8T/5k           219688    208301     +5.2%
BM_slicingSmallPieces_12T/2             80.2      79.8     +0.7%
BM_slicingSmallPieces_12T/8             54.4      53.4     +1.8
BM_slicingSmallPieces_12T/64             488       476     +2.5%
BM_slicingSmallPieces_12T/512          21931     18831    +14.1%
BM_slicingSmallPieces_12T/4k          393962    396541     -0.7%
BM_slicingSmallPieces_12T/5k          218803    207965     +5.0%
2017-01-24 13:55:18 -08:00
Luke Iwanski
bf44fed9b7 Allows AMD APU 2017-01-23 15:56:45 +00:00
Mehdi Goli
602f8c27f5 Reverting back to the previous TensorDeviceSycl.h as the total number of buffer is not enough for tensorflow. 2017-01-20 18:23:20 +00:00
Mehdi Goli
77cc4d06c7 Removing unused variables 2017-01-19 17:06:21 +00:00
Mehdi Goli
837fdbdcb2 Merging with Benoit's upstream. 2017-01-19 11:34:34 +00:00
Mehdi Goli
6bdd15f572 Adding non-deferrenciable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class. 2017-01-19 11:30:59 +00:00
Mehdi Goli
c6f7b33834 Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy. 2017-01-18 10:45:28 +00:00
Mehdi Goli
e46e722381 Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree. 2017-01-16 13:58:49 +00:00
Gael Guennebaud
bbd97b4095 Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases 2017-07-17 01:02:51 +02:00
Luke Iwanski
90c5bc8d64 Fixes auto appearance in functor template argument for reduction. 2017-01-04 22:18:44 +00:00
Mehdi Goli
8b1c2108ba Reverting asynchronous exec to Synchronous exec regarding random race condition. 2016-12-22 16:45:38 +00:00
Benoit Steiner
660da83e18 Pulled latest update from trunk 2016-12-21 16:43:27 -08:00
Benoit Steiner
4236aebe10 Simplified the contraction code` 2016-12-21 16:42:56 -08:00
Benoit Steiner
3cfa16f41d Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
2016-12-21 15:08:54 -08:00
Benoit Steiner
519d63d350 Added support for libxsmm kernel in multithreaded contractions 2016-12-21 15:06:06 -08:00
Benoit Steiner
f9eff17e91 Leverage libxsmm kernels within signle threaded contractions 2016-12-21 12:32:06 -08:00
Luke Iwanski
c55ecfd820 Fix for auto appearing in functor template argument. 2016-12-21 15:42:51 +00:00
Luke Iwanski
29186f766f Fixed order of initialisation in ExecExprFunctorKernel functor. 2016-12-20 21:32:42 +00:00
Luke Iwanski
8245851d1b Matching parameters order between lambda and the functor. 2016-12-20 16:18:15 +00:00
Benoit Steiner
70d0172f0c Merged eigen/eigen into default 2016-12-16 17:37:04 -08:00
Benoit Steiner
8910442e19 Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl. 2016-12-16 15:45:04 -08:00
Luke Iwanski
54db66c5df struct -> class in order to silence compilation warning. 2016-12-16 20:25:20 +00:00
Mehdi Goli
35bae513a0 Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding tensorConcatinationOp backend for sycl. 2016-12-16 19:46:45 +00:00
Mehdi Goli
c5e8546306 Adding asynchandler to sycl queue as lack of it can cause undefined behaviour. 2016-12-15 16:59:57 +00:00
Benoit Steiner
2c2e218471 Avoid using #define since they can conflict with user code 2016-12-14 19:49:15 -08:00
Benoit Steiner
3beb180ee5 Don't call EnvThread::OnCancel by default since it doesn't do anything. 2016-12-14 18:33:39 -08:00
Benoit Steiner
9ff5d0f821 Merged eigen/eigen into default 2016-12-14 17:32:16 -08:00
Mehdi Goli
730eb9fe1c Adding asynchronous execution as it improves the performance. 2016-12-14 17:38:53 +00:00
Mehdi Goli
2d4a091beb Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h 2016-12-14 15:30:37 +00:00
Benoit Steiner
a432fc102d Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool 2016-12-12 15:24:16 -08:00
Benoit Steiner
8ae68924ed Made ThreadPoolInterface::Cancel() an optional functionality 2016-12-12 11:58:38 -08:00
Benoit Steiner
76fca22134 Use a more accurate timer to sleep on Linux systems. 2016-12-09 15:12:24 -08:00
Benoit Steiner
4deafd35b7 Introduce a portable EIGEN_SLEEP macro. 2016-12-09 14:52:15 -08:00
Benoit Steiner
aafa97f4d2 Fixed build error with MSVC 2016-12-09 14:42:32 -08:00
Benoit Steiner
2f5b7a199b Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms. 2016-12-09 13:05:14 -08:00
Benoit Steiner
28ee8f42b2 Added a Flush method to the RunQueue 2016-12-08 14:07:56 -08:00
Benoit Steiner
69ef267a77 Added the new threadpool cancel method to the threadpool interface based class. 2016-12-08 14:03:25 -08:00
Benoit Steiner
7bfff85355 Added support for thread cancellation on Linux 2016-12-08 08:12:49 -08:00
Benoit Steiner
462c28e77a Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
4465d20403 Add missing generic load methods. 2016-12-03 21:25:04 +01:00
Srinivas Vasudevan
218764ee1f Added support for expm1 in Eigen. 2016-12-02 14:13:01 -08:00
Mehdi Goli
592acc5bfa Makingt default numeric_list works with sycl. 2016-12-02 17:58:30 +00:00
Mehdi Goli
79aa2b784e Adding sycl backend for TensorPadding.h; disbaling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP ; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code. 2016-12-01 13:02:27 +00:00
Benoit Steiner
a70393fd02 Cleaned up forward declarations 2016-11-30 21:59:07 -08:00
Benoit Steiner
e073de96dc Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again 2016-11-30 21:36:52 -08:00
Benoit Steiner
fca27350eb Added the deallocate_all() method back 2016-11-30 20:45:20 -08:00
Benoit Steiner
e633a8371f Simplified includes 2016-11-30 20:21:18 -08:00
Benoit Steiner
7cd33df4ce Improved formatting 2016-11-30 20:20:44 -08:00
Benoit Steiner
f5107010ee Udated the Sizes class to work on AMD gpus without requiring a separate implementation 2016-11-30 19:57:28 -08:00
Benoit Steiner
e37c2c52d3 Added an implementation of numeric_list that works with sycl 2016-11-30 19:55:15 -08:00
Luke Iwanski
26fff1c5b1 Added EIGEN_STRONG_INLINE to get_sycl_supported_device(). 2016-11-30 16:55:22 +00:00
Mehdi Goli
577ce78085 Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend. 2016-11-29 15:30:42 +00:00
Benoit Steiner
02080e2b67 Merged eigen/eigen into default 2016-11-27 07:27:30 -08:00
Benoit Steiner
9fd081cddc Fixed compilation warnings 2016-11-26 20:22:25 -08:00
Benoit Steiner
9f8fbd9434 Merged eigen/eigen into default 2016-11-26 11:28:25 -08:00
Benoit Steiner
67b2c41f30 Avoided unnecessary type conversion 2016-11-26 11:27:29 -08:00
Benoit Steiner
7fe704596a Added missing array_get method for numeric_list 2016-11-26 11:26:07 -08:00
Mehdi Goli
7318daf887 Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h. 2016-11-25 16:19:07 +00:00
Benoit Steiner
7ad37606dd Fixed the documentation of Scalar Tensors 2016-11-24 12:31:43 -08:00
Mehdi Goli
b8cc5635d5 Removing unsupported device from test case; cleaning the tensor device sycl. 2016-11-23 16:30:41 +00:00
Benoit Steiner
f11da1d83b Made the QueueInterface thread safe 2016-11-20 13:17:08 -08:00
Benoit Steiner
6d781e3e52 Merged eigen/eigen into default 2016-11-20 10:12:54 -08:00
Benoit Steiner
79a07b891b Fixed a typo 2016-11-20 07:07:41 -08:00
Benoit Steiner
81151bd474 Fixed merge conflicts 2016-11-19 19:12:59 -08:00
Benoit Steiner
9265ca707e Made it possible to check the state of a sycl device without synchronization 2016-11-19 10:56:24 -08:00
Benoit Steiner
2d1aec15a7 Added missing include 2016-11-19 08:09:54 -08:00
Benoit Steiner
1bdf1b9ce0 Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
dc601d79d1 Added the ability to run test exclusively OpenCL devices that are listed by sycl::device::get_devices(). 2016-11-18 16:26:50 -08:00
Benoit Steiner
110b7f8d9f Deleted unnecessary semicolons 2016-11-18 14:06:17 -08:00
Benoit Steiner
37c2c516a6 Cleaned up the sycl device code 2016-11-18 12:38:06 -08:00
Mehdi Goli
15e226d7d3 adding Benoit changes on the TensorDeviceSycl.h 2016-11-18 16:34:54 +00:00
Mehdi Goli
622805a0c5 Modifying TensorDeviceSycl.h to always create buffer of type uint8_t and convert them to the actual type at the execution on the device; adding the queue interface class to separate the lifespan of sycl queue and buffers,created for that queue, from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row major and column major data layout on all different devices that are supported by Sycl{CPU; GPU; and Host}. 2016-11-18 16:20:42 +00:00
Benoit Steiner
7c30078b9f Merged eigen/eigen into default 2016-11-17 22:53:37 -08:00
Benoit Steiner
553f50b246 Added a way to detect errors generated by the opencl device from the host 2016-11-17 21:51:48 -08:00
Benoit Steiner
72a45d32e9 Cleanup 2016-11-17 21:29:15 -08:00
Benoit Steiner
4349fc640e Created a test to check that the sycl runtime can successfully report errors (like ivision by 0).
Small cleanup
2016-11-17 20:27:54 -08:00
Benoit Steiner
a6a3fd0703 Made TensorDeviceCuda.h compile on windows 2016-11-17 16:15:27 -08:00
Luke Iwanski
c5130dedbe Specialised basic math functions for SYCL device. 2016-11-17 11:47:13 +00:00
Benoit Steiner
b5c75351e3 Merged eigen/eigen into default 2016-11-14 15:54:44 -08:00
Rasmus Munk Larsen
32df1b1046 Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the scedule path to push both recursive calls onto another thread-queue in the binary tree, but instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics). 2016-11-14 14:18:16 -08:00
Mehdi Goli
05e8c2a1d9 Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl. 2016-11-14 18:13:53 +00:00
Mehdi Goli
f8ca893976 Adding TensorFixsize; adding sycl device memcpy; adding insial stage of slicing. 2016-11-14 17:51:57 +00:00
Mehdi Goli
a5c3f15682 Adding comment to TensorDeviceSycl.h and cleaning the code. 2016-11-11 19:06:34 +00:00
Mehdi Goli
3be3963021 Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor. 2016-11-10 19:16:31 +00:00
Mehdi Goli
12387abad5 adding the missing in eigen_assert! 2016-11-10 18:58:08 +00:00
Mehdi Goli
2e704d4257 Adding Memset; optimising MecopyDeviceToHost by removing double copying; 2016-11-10 18:45:12 +00:00
Benoit Steiner
dcc14bee64 Fixed the formatting of the code 2016-11-08 14:24:46 -08:00
Luke Iwanski
912cb3d660 #if EIGEN_EXCEPTION -> #ifdef EIGEN_EXCEPTIONS. 2016-11-08 22:01:14 +00:00
Luke Iwanski
1b345b0895 Fix for SYCL queue initialisation. 2016-11-08 21:56:31 +00:00
Luke Iwanski
1b95717358 Use try/catch only when exceptions are enabled. 2016-11-08 21:08:53 +00:00
Mehdi Goli
d57430dd73 Converting all sycl buffers to uninitialised device only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creating to the device based on Benoit suggestion; removing the sycl specefic condition for returning m_result in TensorReduction.h according to Benoit suggestion. 2016-11-08 17:08:02 +00:00
Benoit Steiner
dad177be01 Added missing includes 2016-11-05 10:04:42 -07:00
Mehdi Goli
0ebe3808ca Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size; 2016-11-04 18:18:19 +00:00
Benoit Steiner
0585b2965d Disable vectorization on device only when compiling for sycl 2016-11-02 11:44:27 -07:00
Mehdi Goli
51af6ae971 Fixed the ambiguity in callig make_tuple for sycl backend. 2016-10-31 16:35:51 +00:00
Benoit Steiner
0a9ad6fc72 Worked around Visual Studio compilation errors 2016-10-28 07:54:27 -07:00
Benoit Steiner
b0c5bfdf78 Added missing template parameters 2016-10-28 03:43:41 +00:00
Gael Guennebaud
530f20c21a Workaround MSVC issue. 2016-10-27 21:51:37 +02:00