Mehdi Goli
00f32752f7
[SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch.
...
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when the local memory version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization on the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Mehdi Goli
f499fe9496
Adding synchronisation to convolution kernel for sycl backend.
2017-03-13 09:18:37 +00:00
Mehdi Goli
aadb7405a7
Fixing typo in sycl Benchmark.
2017-03-08 18:20:06 +00:00
Mehdi Goli
5e9a1e7a7a
Adding sycl Benchmarks.
2017-03-08 14:17:48 +00:00
Benoit Steiner
3eda02d78d
Fixed the sycl benchmarking code
2016-12-22 10:37:05 -08:00
Luke Iwanski
cb81975714
Partial OpenCL support via SYCL compatible with ComputeCpp CE.
2016-09-19 12:44:13 +01:00
Benoit Steiner
457204cb83
Updated the README file for the tensor benchmarks
2016-05-25 16:13:41 -07:00
Benoit Steiner
034aa3b2c0
Improved the performance of tensor padding
2016-05-25 11:43:08 -07:00
Benoit Steiner
069a0b04d7
Added benchmarks for contraction on CPU.
2016-05-13 14:32:17 -07:00
Benoit Steiner
f81e413180
Added a benchmark to measure the performance of full reductions of 16 bit floats
2016-05-05 14:15:11 -07:00
Benoit Steiner
79b900375f
Use index list for the striding benchmarks
2016-04-21 11:58:27 -07:00
Benoit Steiner
eaeb6ca93a
Enable the benchmarks for algebraic and transcendental functions on fp16.
2016-04-12 16:29:00 -07:00
Benoit Steiner
53121c0119
Turned on the contraction benchmarks for fp16
2016-04-12 14:11:52 -07:00
Benoit Steiner
63102ee43d
Turn on the coeffWise benchmarks on fp16
2016-04-07 23:05:20 -07:00
Benoit Steiner
7c47d3e663
Fixed the type casting benchmarks for fp16
2016-04-07 22:50:25 -07:00
Benoit Steiner
a6d08be9b2
Fixed the benchmarking of fp16 coefficient wise operations
2016-04-07 17:13:44 -07:00
Benoit Steiner
0968e925a0
Updated the benchmarking code to use Eigen::half instead of half
2016-03-24 18:00:33 -07:00
Benoit Steiner
7168afde5e
Made the tensor benchmarks compile on MacOS
2016-03-23 14:21:04 -07:00
Benoit Steiner
56a3ada670
Added benchmarks for full reduction
2016-02-29 14:57:52 -08:00
Benoit Steiner
1031b31571
Improved the README
2016-02-27 20:22:04 +00:00
Benoit Steiner
93485d86bc
Added benchmarks for type casting of float16
2016-02-26 12:24:58 -08:00
Benoit Steiner
002824e32d
Added benchmarks for fp16
2016-02-26 12:21:25 -08:00
Benoit Steiner
8cb9bfab87
Extended the tensor benchmark suite to support types other than floats
2016-02-23 05:28:02 +00:00
Benoit Steiner
f442a5a5b3
Updated the tensor benchmarking code to work with compilers that don't support cxx11.
2016-02-23 04:15:48 +00:00
Benoit Steiner
4281eb1e2c
Added 2 benchmarks to the suite of tensor benchmarks running on GPU
2016-01-30 10:20:43 -08:00
Benoit Steiner
e4f83bae5d
Fixed the tensor benchmarks on apple devices
2016-01-28 21:08:07 -08:00
Benoit Steiner
10bea90c4a
Fixed clang related compilation error
2016-01-28 20:52:08 -08:00
Benoit Steiner
211d350fc3
Fixed a typo
2016-01-28 17:13:04 -08:00
Benoit Steiner
bd2e5a788a
Made sure the number of floating point operations done by a benchmark is computed using 64 bit integers to avoid overflows.
2016-01-28 17:10:40 -08:00
Benoit Steiner
120e13b1b6
Added a readme to explain how to compile the tensor benchmarks.
2016-01-28 17:06:00 -08:00
Benoit Steiner
a68864b6bc
Updated the benchmarking code to print the number of flops processed instead of the number of bytes.
2016-01-28 16:51:40 -08:00
Benoit Steiner
c8d5f21941
Added extra tensor benchmarks
2016-01-28 16:20:36 -08:00
Yangqing Jia
270c4e1ecd
bugfix
2016-01-28 11:11:45 -08:00
Yangqing Jia
c4e47630b1
benchmark modifications to make it compilable in a standalone fashion.
2016-01-28 10:35:14 -08:00
Benoit Steiner
46fc881e4a
Added a few benchmarks for the tensor code
2015-01-26 17:46:40 -08:00