Gael Guennebaud
|
a811a04696
|
Silent warning.
|
2017-02-20 10:14:21 +01:00 |
|
Gael Guennebaud
|
f8a55cc062
|
Fix compilation.
|
2017-02-18 10:08:13 +01:00 |
|
Benoit Steiner
|
cfa0568ef7
|
Size indices are signed.
|
2017-02-16 10:13:34 -08:00 |
|
Mehdi Goli
|
91982b91c0
|
Adding TensorLayoutSwapOp for sycl.
|
2017-02-15 16:28:12 +00:00 |
|
Mehdi Goli
|
b1e312edd6
|
Adding TensorPatch.h for sycl backend.
|
2017-02-15 10:13:01 +00:00 |
|
Mehdi Goli
|
0d153ded29
|
Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test
|
2017-02-13 17:25:12 +00:00 |
|
Benoit Steiner
|
769208a17f
|
Pulled latest updates from upstream
|
2017-02-10 13:11:40 -08:00 |
|
Mehdi Goli
|
0ee97b60c2
|
Adding mean to TensorReductionSycl.h
|
2017-02-07 15:43:17 +00:00 |
|
Mehdi Goli
|
42bd5c4e7b
|
Fixing TensorReductionSycl for min and max.
|
2017-02-06 18:05:23 +00:00 |
|
Mehdi Goli
|
bc128f9f3b
|
Reducing the warnings in Sycl backend.
|
2017-02-02 10:43:47 +00:00 |
|
Benoit Steiner
|
442e9cbb30
|
Silenced several compilation warnings
|
2017-02-01 15:50:58 -08:00 |
|
Mehdi Goli
|
ff53050034
|
Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests.
|
2017-02-01 15:36:03 +00:00 |
|
Mehdi Goli
|
bab29936a1
|
Reducing warnings in Sycl backend.
|
2017-02-01 15:29:53 +00:00 |
|
Mehdi Goli
|
48a20b7d95
|
Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h
|
2017-01-31 14:06:36 +00:00 |
|
Benoit Steiner
|
fbc39fd02c
|
Merge latest changes from upstream
|
2017-01-30 15:25:57 -08:00 |
|
Gael Guennebaud
|
63de19c000
|
bug #1380: fix matrix exponential with Map<>
|
2017-01-30 13:55:27 +01:00 |
|
Mehdi Goli
|
82ce92419e
|
Fixing the buffer type in memcpy.
|
2017-01-30 11:38:20 +00:00 |
|
Rasmus Munk Larsen
|
edaa0fc5d1
|
Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warning everywhere.
|
2017-01-26 12:46:06 -08:00 |
|
Gael Guennebaud
|
25a1703579
|
Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
|
2017-01-26 08:04:23 +00:00 |
|
Gael Guennebaud
|
607be65a03
|
Fix duplicates of array_size bewteen unsupported and Core
|
2017-01-25 22:53:58 +01:00 |
|
Benoit Steiner
|
e96c77668d
|
Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
|
2017-01-25 00:14:04 +00:00 |
|
Rasmus Munk Larsen
|
e6b1020221
|
Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2 3.48 2.39 +31.3%
BM_memcpy_1T/8 12.3 6.51 +47.0%
BM_memcpy_1T/64 371 383 -3.2%
BM_memcpy_1T/512 66922 66720 +0.3%
BM_memcpy_1T/4k 9892867 6849682 +30.8%
BM_memcpy_1T/5k 14951099 10332856 +30.9%
BM_memcpy_2T/2 3.50 2.46 +29.7%
BM_memcpy_2T/8 12.3 7.66 +37.7%
BM_memcpy_2T/64 371 376 -1.3%
BM_memcpy_2T/512 66652 66788 -0.2%
BM_memcpy_2T/4k 6145012 6117776 +0.4%
BM_memcpy_2T/5k 9181478 9010942 +1.9%
BM_memcpy_4T/2 3.47 2.47 +31.0%
BM_memcpy_4T/8 12.3 6.67 +45.8
BM_memcpy_4T/64 374 376 -0.5%
BM_memcpy_4T/512 67833 68019 -0.3%
BM_memcpy_4T/4k 5057425 5188253 -2.6%
BM_memcpy_4T/5k 7555638 7779468 -3.0%
BM_memcpy_6T/2 3.51 2.50 +28.8%
BM_memcpy_6T/8 12.3 7.61 +38.1%
BM_memcpy_6T/64 373 378 -1.3%
BM_memcpy_6T/512 66871 66774 +0.1%
BM_memcpy_6T/4k 5112975 5233502 -2.4%
BM_memcpy_6T/5k 7614180 7772246 -2.1%
BM_memcpy_8T/2 3.47 2.41 +30.5%
BM_memcpy_8T/8 12.4 10.5 +15.3%
BM_memcpy_8T/64 372 388 -4.3%
BM_memcpy_8T/512 67373 66588 +1.2%
BM_memcpy_8T/4k 5148462 5254897 -2.1%
BM_memcpy_8T/5k 7660989 7799058 -1.8%
BM_memcpy_12T/2 3.50 2.40 +31.4%
BM_memcpy_12T/8 12.4 7.55 +39.1
BM_memcpy_12T/64 374 378 -1.1%
BM_memcpy_12T/512 67132 66683 +0.7%
BM_memcpy_12T/4k 5185125 5292920 -2.1%
BM_memcpy_12T/5k 7717284 7942684 -2.9%
BM_slicingSmallPieces_1T/2 47.3 47.5 +0.4%
BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4%
BM_slicingSmallPieces_1T/64 491 476 +3.1%
BM_slicingSmallPieces_1T/512 21734 18814 +13.4%
BM_slicingSmallPieces_1T/4k 394660 396760 -0.5%
BM_slicingSmallPieces_1T/5k 218722 209244 +4.3%
BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0%
BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0
BM_slicingSmallPieces_2T/64 497 477 +4.0%
BM_slicingSmallPieces_2T/512 21732 18822 +13.4%
BM_slicingSmallPieces_2T/4k 392885 390490 +0.6%
BM_slicingSmallPieces_2T/5k 221988 208678 +6.0%
BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9%
BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7%
BM_slicingSmallPieces_4T/64 493 476 +3.4%
BM_slicingSmallPieces_4T/512 21702 18758 +13.6%
BM_slicingSmallPieces_4T/4k 393962 404023 -2.6%
BM_slicingSmallPieces_4T/5k 249667 211732 +15.2%
BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5%
BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_6T/64 488 478 +2.0%
BM_slicingSmallPieces_6T/512 21719 18841 +13.3%
BM_slicingSmallPieces_6T/4k 394950 397583 -0.7%
BM_slicingSmallPieces_6T/5k 223080 210148 +5.8%
BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0%
BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9%
BM_slicingSmallPieces_8T/64 489 480 +1.8%
BM_slicingSmallPieces_8T/512 21586 18798 +12.9%
BM_slicingSmallPieces_8T/4k 394592 400165 -1.4%
BM_slicingSmallPieces_8T/5k 219688 208301 +5.2%
BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7%
BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8
BM_slicingSmallPieces_12T/64 488 476 +2.5%
BM_slicingSmallPieces_12T/512 21931 18831 +14.1%
BM_slicingSmallPieces_12T/4k 393962 396541 -0.7%
BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
|
2017-01-24 13:55:18 -08:00 |
|
Rasmus Munk Larsen
|
5e144bbaa4
|
Make NaN propagatation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
|
2017-01-24 13:32:50 -08:00 |
|
Luke Iwanski
|
bf44fed9b7
|
Allows AMD APU
|
2017-01-23 15:56:45 +00:00 |
|
Mehdi Goli
|
602f8c27f5
|
Reverting back to the previous TensorDeviceSycl.h as the total number of buffer is not enough for tensorflow.
|
2017-01-20 18:23:20 +00:00 |
|
Mehdi Goli
|
77cc4d06c7
|
Removing unused variables
|
2017-01-19 17:06:21 +00:00 |
|
Mehdi Goli
|
837fdbdcb2
|
Merging with Benoit's upstream.
|
2017-01-19 11:34:34 +00:00 |
|
Mehdi Goli
|
6bdd15f572
|
Adding non-deferrenciable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class.
|
2017-01-19 11:30:59 +00:00 |
|
Mehdi Goli
|
c6f7b33834
|
Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy.
|
2017-01-18 10:45:28 +00:00 |
|
Mehdi Goli
|
e46e722381
|
Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.
|
2017-01-16 13:58:49 +00:00 |
|
Gael Guennebaud
|
bbd97b4095
|
Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases
|
2017-07-17 01:02:51 +02:00 |
|
Luke Iwanski
|
90c5bc8d64
|
Fixes auto appearance in functor template argument for reduction.
|
2017-01-04 22:18:44 +00:00 |
|
NeroBurner
|
c4fc2611ba
|
add cmake-option to enable/disable creation of tests
* * *
disable unsupportet/test when test are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
|
2017-01-02 09:09:21 +01:00 |
|
Mehdi Goli
|
8b1c2108ba
|
Reverting asynchronous exec to Synchronous exec regarding random race condition.
|
2016-12-22 16:45:38 +00:00 |
|
Benoit Steiner
|
660da83e18
|
Pulled latest update from trunk
|
2016-12-21 16:43:27 -08:00 |
|
Benoit Steiner
|
4236aebe10
|
Simplified the contraction code`
|
2016-12-21 16:42:56 -08:00 |
|
Benoit Steiner
|
3cfa16f41d
|
Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
|
2016-12-21 15:08:54 -08:00 |
|
Benoit Steiner
|
519d63d350
|
Added support for libxsmm kernel in multithreaded contractions
|
2016-12-21 15:06:06 -08:00 |
|
Benoit Steiner
|
0657228569
|
Simplified the way we link libxsmm
|
2016-12-21 14:40:08 -08:00 |
|
Benoit Steiner
|
f9eff17e91
|
Leverage libxsmm kernels within signle threaded contractions
|
2016-12-21 12:32:06 -08:00 |
|
Benoit Steiner
|
c19fe5e9ed
|
Added support for libxsmm in the eigen makefiles
|
2016-12-21 10:43:40 -08:00 |
|
Luke Iwanski
|
c55ecfd820
|
Fix for auto appearing in functor template argument.
|
2016-12-21 15:42:51 +00:00 |
|
Benoit Steiner
|
0f577d4744
|
Merged eigen/eigen into default
|
2016-12-20 17:02:06 -08:00 |
|
Luke Iwanski
|
29186f766f
|
Fixed order of initialisation in ExecExprFunctorKernel functor.
|
2016-12-20 21:32:42 +00:00 |
|
Gael Guennebaud
|
e8d6862f14
|
Properly adjust precision when saving to Market format.
|
2016-12-20 22:10:33 +01:00 |
|
Gael Guennebaud
|
e2f4ee1c2b
|
Speed up parsing of sparse Market file.
|
2016-12-20 21:56:21 +01:00 |
|
Luke Iwanski
|
8245851d1b
|
Matching parameters order between lambda and the functor.
|
2016-12-20 16:18:15 +00:00 |
|
Benoit Steiner
|
548ed30a1c
|
Added an OpenCL regression test
|
2016-12-19 18:56:26 -08:00 |
|
Benoit Steiner
|
27ceb43bf6
|
Fixed race condition in the tensor_shuffling_sycl test
|
2016-12-19 15:34:42 -08:00 |
|
Benoit Steiner
|
70d0172f0c
|
Merged eigen/eigen into default
|
2016-12-16 17:37:04 -08:00 |
|
Benoit Steiner
|
8910442e19
|
Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl.
|
2016-12-16 15:45:04 -08:00 |
|
Luke Iwanski
|
54db66c5df
|
struct -> class in order to silence compilation warning.
|
2016-12-16 20:25:20 +00:00 |
|
Mehdi Goli
|
35bae513a0
|
Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding tensorConcatinationOp backend for sycl.
|
2016-12-16 19:46:45 +00:00 |
|
Mehdi Goli
|
c5e8546306
|
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
|
2016-12-15 16:59:57 +00:00 |
|
Benoit Steiner
|
2c2e218471
|
Avoid using #define since they can conflict with user code
|
2016-12-14 19:49:15 -08:00 |
|
Benoit Steiner
|
3beb180ee5
|
Don't call EnvThread::OnCancel by default since it doesn't do anything.
|
2016-12-14 18:33:39 -08:00 |
|
Benoit Steiner
|
9ff5d0f821
|
Merged eigen/eigen into default
|
2016-12-14 17:32:16 -08:00 |
|
Mehdi Goli
|
730eb9fe1c
|
Adding asynchronous execution as it improves the performance.
|
2016-12-14 17:38:53 +00:00 |
|
Mehdi Goli
|
2d4a091beb
|
Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h
|
2016-12-14 15:30:37 +00:00 |
|
Benoit Steiner
|
a432fc102d
|
Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool
|
2016-12-12 15:24:16 -08:00 |
|
Benoit Steiner
|
8ae68924ed
|
Made ThreadPoolInterface::Cancel() an optional functionality
|
2016-12-12 11:58:38 -08:00 |
|
Benoit Steiner
|
76fca22134
|
Use a more accurate timer to sleep on Linux systems.
|
2016-12-09 15:12:24 -08:00 |
|
Benoit Steiner
|
4deafd35b7
|
Introduce a portable EIGEN_SLEEP macro.
|
2016-12-09 14:52:15 -08:00 |
|
Benoit Steiner
|
aafa97f4d2
|
Fixed build error with MSVC
|
2016-12-09 14:42:32 -08:00 |
|
Benoit Steiner
|
2f5b7a199b
|
Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.
|
2016-12-09 13:05:14 -08:00 |
|
Benoit Steiner
|
3d59a47720
|
Added a message to ease the detection of platforms on which thread cancellation isn't supported.
|
2016-12-08 14:51:46 -08:00 |
|
Benoit Steiner
|
28ee8f42b2
|
Added a Flush method to the RunQueue
|
2016-12-08 14:07:56 -08:00 |
|
Benoit Steiner
|
69ef267a77
|
Added the new threadpool cancel method to the threadpool interface based class.
|
2016-12-08 14:03:25 -08:00 |
|
Benoit Steiner
|
7bfff85355
|
Added support for thread cancellation on Linux
|
2016-12-08 08:12:49 -08:00 |
|
Benoit Steiner
|
462c28e77a
|
Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
|
2016-12-05 02:31:11 +00:00 |
|
Gael Guennebaud
|
4465d20403
|
Add missing generic load methods.
|
2016-12-03 21:25:04 +01:00 |
|
Srinivas Vasudevan
|
218764ee1f
|
Added support for expm1 in Eigen.
|
2016-12-02 14:13:01 -08:00 |
|
Mehdi Goli
|
592acc5bfa
|
Makingt default numeric_list works with sycl.
|
2016-12-02 17:58:30 +00:00 |
|
Mehdi Goli
|
79aa2b784e
|
Adding sycl backend for TensorPadding.h; disbaling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP ; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
|
2016-12-01 13:02:27 +00:00 |
|
Benoit Steiner
|
a70393fd02
|
Cleaned up forward declarations
|
2016-11-30 21:59:07 -08:00 |
|
Benoit Steiner
|
e073de96dc
|
Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again
|
2016-11-30 21:36:52 -08:00 |
|
Benoit Steiner
|
fca27350eb
|
Added the deallocate_all() method back
|
2016-11-30 20:45:20 -08:00 |
|
Benoit Steiner
|
e633a8371f
|
Simplified includes
|
2016-11-30 20:21:18 -08:00 |
|
Benoit Steiner
|
7cd33df4ce
|
Improved formatting
|
2016-11-30 20:20:44 -08:00 |
|
Benoit Steiner
|
fd1dc3363e
|
Merged eigen/eigen into default
|
2016-11-30 20:16:17 -08:00 |
|
Benoit Steiner
|
f5107010ee
|
Udated the Sizes class to work on AMD gpus without requiring a separate implementation
|
2016-11-30 19:57:28 -08:00 |
|
Benoit Steiner
|
e37c2c52d3
|
Added an implementation of numeric_list that works with sycl
|
2016-11-30 19:55:15 -08:00 |
|
Benoit Steiner
|
df3da0780d
|
Updated customIndices2Array to handle various index sizes.
|
2016-11-30 09:30:12 -08:00 |
|
Luke Iwanski
|
26fff1c5b1
|
Added EIGEN_STRONG_INLINE to get_sycl_supported_device().
|
2016-11-30 16:55:22 +00:00 |
|
Mehdi Goli
|
577ce78085
|
Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend.
|
2016-11-29 15:30:42 +00:00 |
|
Benoit Steiner
|
3011dc94ef
|
Call internal::array_prod to compute the total size of the tensor.
|
2016-11-28 09:00:31 -08:00 |
|
Benoit Steiner
|
02080e2b67
|
Merged eigen/eigen into default
|
2016-11-27 07:27:30 -08:00 |
|
Benoit Steiner
|
9fd081cddc
|
Fixed compilation warnings
|
2016-11-26 20:22:25 -08:00 |
|
Benoit Steiner
|
9f8fbd9434
|
Merged eigen/eigen into default
|
2016-11-26 11:28:25 -08:00 |
|
Benoit Steiner
|
67b2c41f30
|
Avoided unnecessary type conversion
|
2016-11-26 11:27:29 -08:00 |
|
Benoit Steiner
|
7fe704596a
|
Added missing array_get method for numeric_list
|
2016-11-26 11:26:07 -08:00 |
|
Mehdi Goli
|
7318daf887
|
Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.
|
2016-11-25 16:19:07 +00:00 |
|
Benoit Steiner
|
7ad37606dd
|
Fixed the documentation of Scalar Tensors
|
2016-11-24 12:31:43 -08:00 |
|
Gael Guennebaud
|
308961c05e
|
Fix compilation.
|
2016-11-23 22:17:52 +01:00 |
|
Mehdi Goli
|
b8cc5635d5
|
Removing unsupported device from test case; cleaning the tensor device sycl.
|
2016-11-23 16:30:41 +00:00 |
|
Gael Guennebaud
|
7f6333c32b
|
Merged in tal500/eigen-eulerangles (pull request PR-237)
Euler angles
|
2016-11-23 15:17:38 +00:00 |
|
Gael Guennebaud
|
f12b368417
|
Extend polynomial solver unit tests to complexes
|
2016-11-23 16:05:45 +01:00 |
|
Gael Guennebaud
|
56e5ec07c6
|
Automatically switch between EigenSolver and ComplexEigenSolver, and fix a few Real versus Scalar issues.
|
2016-11-23 16:05:10 +01:00 |
|
Gael Guennebaud
|
9246587122
|
Patch from Oleg Shirokobrod to extend polynomial solver to complexes
|
2016-11-23 15:42:26 +01:00 |
|
Benoit Steiner
|
f11da1d83b
|
Made the QueueInterface thread safe
|
2016-11-20 13:17:08 -08:00 |
|
Benoit Steiner
|
6d781e3e52
|
Merged eigen/eigen into default
|
2016-11-20 10:12:54 -08:00 |
|
Benoit Steiner
|
79a07b891b
|
Fixed a typo
|
2016-11-20 07:07:41 -08:00 |
|
Benoit Steiner
|
81151bd474
|
Fixed merge conflicts
|
2016-11-19 19:12:59 -08:00 |
|
Benoit Steiner
|
9265ca707e
|
Made it possible to check the state of a sycl device without synchronization
|
2016-11-19 10:56:24 -08:00 |
|
Benoit Steiner
|
2d1aec15a7
|
Added missing include
|
2016-11-19 08:09:54 -08:00 |
|
Luke Iwanski
|
af67335e0e
|
Added test for cwiseMin, cwiseMax and operator%.
|
2016-11-19 13:37:27 +00:00 |
|
Benoit Steiner
|
1bdf1b9ce0
|
Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
|
2016-11-19 04:44:43 +00:00 |
|
Benoit Steiner
|
a357fe1fb9
|
Code cleanup
|
2016-11-18 16:58:09 -08:00 |
|
Benoit Steiner
|
1c6eafb46b
|
Updated cxx11_tensor_device_sycl to run only on the OpenCL devices available on the host
|
2016-11-18 16:43:27 -08:00 |
|
Benoit Steiner
|
ca754caa23
|
Only runs the cxx11_tensor_reduction_sycl on devices that are available.
|
2016-11-18 16:31:14 -08:00 |
|
Benoit Steiner
|
dc601d79d1
|
Added the ability to run test exclusively OpenCL devices that are listed by sycl::device::get_devices().
|
2016-11-18 16:26:50 -08:00 |
|
Benoit Steiner
|
110b7f8d9f
|
Deleted unnecessary semicolons
|
2016-11-18 14:06:17 -08:00 |
|
Benoit Steiner
|
b5e3285e16
|
Test broadcasting on OpenCL devices with 64 bit indexing
|
2016-11-18 13:44:20 -08:00 |
|
Benoit Steiner
|
37c2c516a6
|
Cleaned up the sycl device code
|
2016-11-18 12:38:06 -08:00 |
|
Benoit Steiner
|
7335c49204
|
Fixed the cxx11_tensor_device_sycl test
|
2016-11-18 12:37:13 -08:00 |
|
Mehdi Goli
|
15e226d7d3
|
adding Benoit changes on the TensorDeviceSycl.h
|
2016-11-18 16:34:54 +00:00 |
|
Mehdi Goli
|
622805a0c5
|
Modifying TensorDeviceSycl.h to always create buffer of type uint8_t and convert them to the actual type at the execution on the device; adding the queue interface class to separate the lifespan of sycl queue and buffers,created for that queue, from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row major and column major data layout on all different devices that are supported by Sycl{CPU; GPU; and Host}.
|
2016-11-18 16:20:42 +00:00 |
|
Luke Iwanski
|
5159675c33
|
Added isnan, isfinite and isinf for SYCL device. Plus test for that.
|
2016-11-18 16:01:48 +00:00 |
|
Tal Hadad
|
76b2a3e6e7
|
Allow to construct EulerAngles from 3D vector directly.
Using assignment template struct to distinguish between 3D vector and 3D rotation matrix.
|
2016-11-18 15:01:06 +02:00 |
|
Luke Iwanski
|
927bd62d2a
|
Now testing out (+=, =) in.FUNC() and out (+=, =) out.FUNC()
|
2016-11-18 11:16:42 +00:00 |
|
Benoit Steiner
|
7c30078b9f
|
Merged eigen/eigen into default
|
2016-11-17 22:53:37 -08:00 |
|
Benoit Steiner
|
553f50b246
|
Added a way to detect errors generated by the opencl device from the host
|
2016-11-17 21:51:48 -08:00 |
|
Benoit Steiner
|
72a45d32e9
|
Cleanup
|
2016-11-17 21:29:15 -08:00 |
|
Benoit Steiner
|
4349fc640e
|
Created a test to check that the sycl runtime can successfully report errors (like ivision by 0).
Small cleanup
|
2016-11-17 20:27:54 -08:00 |
|
Benoit Steiner
|
a6a3fd0703
|
Made TensorDeviceCuda.h compile on windows
|
2016-11-17 16:15:27 -08:00 |
|
Benoit Steiner
|
004344cf54
|
Avoid calling log(0) or 1/0
|
2016-11-17 11:56:44 -08:00 |
|
Luke Iwanski
|
7878756dea
|
Fixed existing test.
|
2016-11-17 17:46:55 +00:00 |
|
Luke Iwanski
|
c5130dedbe
|
Specialised basic math functions for SYCL device.
|
2016-11-17 11:47:13 +00:00 |
|
Benoit Steiner
|
b5c75351e3
|
Merged eigen/eigen into default
|
2016-11-14 15:54:44 -08:00 |
|
Rasmus Munk Larsen
|
32df1b1046
|
Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the scedule path to push both recursive calls onto another thread-queue in the binary tree, but instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics).
|
2016-11-14 14:18:16 -08:00 |
|
Mehdi Goli
|
05e8c2a1d9
|
Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl.
|
2016-11-14 18:13:53 +00:00 |
|
Mehdi Goli
|
f8ca893976
|
Adding TensorFixsize; adding sycl device memcpy; adding insial stage of slicing.
|
2016-11-14 17:51:57 +00:00 |
|
Mehdi Goli
|
a5c3f15682
|
Adding comment to TensorDeviceSycl.h and cleaning the code.
|
2016-11-11 19:06:34 +00:00 |
|
Mehdi Goli
|
3be3963021
|
Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor.
|
2016-11-10 19:16:31 +00:00 |
|
Mehdi Goli
|
12387abad5
|
adding the missing in eigen_assert!
|
2016-11-10 18:58:08 +00:00 |
|
Mehdi Goli
|
2e704d4257
|
Adding Memset; optimising MecopyDeviceToHost by removing double copying;
|
2016-11-10 18:45:12 +00:00 |
|
Benoit Steiner
|
75c080b176
|
Added a test to validate memory transfers between host and sycl device
|
2016-11-09 06:23:42 -08:00 |
|
Benoit Steiner
|
db3903498d
|
Merged in benoitsteiner/opencl (pull request PR-246)
Improved support for OpenCL
|
2016-11-08 22:28:44 +00:00 |
|
Benoit Steiner
|
dcc14bee64
|
Fixed the formatting of the code
|
2016-11-08 14:24:46 -08:00 |
|
Luke Iwanski
|
912cb3d660
|
#if EIGEN_EXCEPTION -> #ifdef EIGEN_EXCEPTIONS.
|
2016-11-08 22:01:14 +00:00 |
|
Luke Iwanski
|
1b345b0895
|
Fix for SYCL queue initialisation.
|
2016-11-08 21:56:31 +00:00 |
|
Luke Iwanski
|
1b95717358
|
Use try/catch only when exceptions are enabled.
|
2016-11-08 21:08:53 +00:00 |
|
Mehdi Goli
|
d57430dd73
|
Converting all sycl buffers to uninitialised device only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creating to the device based on Benoit suggestion; removing the sycl specefic condition for returning m_result in TensorReduction.h according to Benoit suggestion.
|
2016-11-08 17:08:02 +00:00 |
|
Benoit Steiner
|
ad086b03e4
|
Removed unnecessary statement
|
2016-11-05 12:43:27 -07:00 |
|
Benoit Steiner
|
dad177be01
|
Added missing includes
|
2016-11-05 10:04:42 -07:00 |
|
Gael Guennebaud
|
55b4fd1d40
|
Extend mpreal unit test to check LLT with complexes.
|
2016-11-05 11:28:53 +01:00 |
|
Benoit Steiner
|
d46a36cc84
|
Merged eigen/eigen into default
|
2016-11-04 18:22:55 -07:00 |
|
Mehdi Goli
|
0ebe3808ca
|
Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;
|
2016-11-04 18:18:19 +00:00 |
|
Benoit Steiner
|
3e37166d0b
|
Merged in benoitsteiner/opencl (pull request PR-244)
Disable vectorization on device only when compiling for sycl
|
2016-11-02 22:01:03 +00:00 |
|
Benoit Steiner
|
0585b2965d
|
Disable vectorization on device only when compiling for sycl
|
2016-11-02 11:44:27 -07:00 |
|