Commit Graph

1951 Commits

Author SHA1 Message Date
Gael Guennebaud
82f0ce2726 Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
This provide several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
Eugene Zhulenev
43206ac4de Call OutputKernel in evalGemv 2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf Remove SimpleThreadPool and always use {NonBlocking}ThreadPool 2018-07-16 15:06:57 -07:00
Eugene Zhulenev
01fd4096d3 Fuse computations into the Tensor contractions using output kernel 2018-07-10 13:16:38 -07:00
Gael Guennebaud
5539587b1f Some warning fixes 2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2018-01-30 20:22:12 +00:00
Lee.Deokjae
5b3c367926 Fix typos in the contraction example of tensor README 2018-01-06 14:36:19 +09:00
RJ Ryan
59985cfd26 Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689 2017-12-31 10:44:56 -05:00
Gael Guennebaud
73214c4bd0 Workaround nvcc 9.0 issue. See PR 351.
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Yangzihao Wang
3122477c86 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2017-12-12 11:15:24 -08:00
Rasmus Munk Larsen
e900b010c8 Improve robustness of igamma and igammac to bad inputs.
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.

Original change by mfirgunov@google.com
2018-03-19 09:04:54 -07:00
Gael Guennebaud
00bc67c374 Move KLU support to official 2017-11-10 14:11:22 +01:00
Gael Guennebaud
b82cd93c01 KLU: truely disable unimplemented code, add proper static assertions in solve 2017-11-10 14:09:01 +01:00
Gael Guennebaud
8cf63ccb99 Merged in kylemacfarlan/eigen (pull request PR-337)
Add support for SuiteSparse's KLU routines
2017-11-10 10:43:17 +00:00
Gael Guennebaud
1495b98a8e Merged in spraetor/eigen (pull request PR-305)
Issue with mpreal and std::numeric_limits::digits
2017-11-10 10:28:54 +00:00
Gael Guennebaud
fc45324380 Merged in jkflying/eigen-fix-scaling (pull request PR-302)
Make scaling work with non-square matrices
2017-11-10 10:11:36 +00:00
Gael Guennebaud
1b2dcf9a47 Check that Schur decomposition succeed. 2017-11-10 10:26:09 +01:00
Gael Guennebaud
0a1cc73942 bug #1484: restore deleted line for 128 bits long doubles, and improve dispatching logic. 2017-11-10 10:25:41 +01:00
Benoit Steiner
3949615176 Merged in JonasMu/eigen (pull request PR-329)
Added an example for a contraction to a scalar value to README.md

Approved-by: Jonas Harsch <jonas.harsch@gmail.com>
2017-10-27 07:27:46 +00:00
Benoit Steiner
8eb4b9d254 Merged in benoitsteiner/opencl (pull request PR-341) 2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen
f349507e02 Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example this reduces binary size of an TensorFlow demo app for Android by about 2.5%. 2017-10-13 15:58:12 -07:00
Kyle Vedder
c0e1d510fd Add support for SuiteSparse's KLU routines 2017-10-04 21:01:23 -05:00
Mehdi Goli
2062ac9958 Changes required for new ComputeCpp CE version. 2017-09-18 18:17:39 +01:00
Rasmus Munk Larsen
1b7294f6fc Fix cut-and-paste error. 2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen
94e2213b38 Avoid undefined behavior in Eigen::TensorCostModel::numThreads.
If the cost is large enough then the thread count can be larger than the maximum
representable int, so just casting it to an int is undefined behavior.

Contributed by phurst@google.com.
2017-09-08 15:49:55 -07:00
Gael Guennebaud
a91918a105 Merged in infinitei/eigen (pull request PR-328)
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.

Approved-by: Tal Hadad <tal_hd@hotmail.com>
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>
2017-09-06 08:42:14 +00:00
Jonas Harsch
a991c80365 Added an example for a contraction to a scalar value, e.g. a double contraction of two second order tensors and how you can get the value of the result. I lost one day to get this doen so I think it will help some guys. I also added Eigen:: to the IndexPair and and array in the same example. 2017-09-01 11:30:26 +00:00
Benoit Steiner
a4089991eb Added support for CUDA 9.0. 2017-08-31 02:49:39 +00:00
Abhijit Kundu
6d991a9595 bug #1464 : Fixes construction of EulerAngles from 3D vector expression. 2017-08-30 13:26:30 -04:00
Benoit Steiner
84d7be103a Fixing Argmax that was breaking upstream TensorFlow. 2017-07-22 03:19:34 +00:00
Benoit Steiner
f0b154a4b0 Code cleanup 2017-07-10 09:54:09 -07:00
Benoit Steiner
575cda76b3 Fixed syntax errors generated by xcode 2017-07-09 11:39:01 -07:00
Benoit Steiner
5ac27d5b51 Avoid relying on cxx11 features when possible. 2017-07-08 21:58:44 -07:00
Benoit Steiner
c5a241ab9b Merged in benoitsteiner/opencl (pull request PR-323)
Improved support for OpenCL
2017-07-07 16:27:33 +00:00
Benoit Steiner
b7ae4dd9ef Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315)
Add labels to #ifdef, in TensorReductionCuda.h
2017-07-07 04:23:52 +00:00
Benoit Steiner
9daed67952 Merged in tntnatbry/eigen (pull request PR-319)
Tensor Trace op
2017-07-07 04:18:03 +00:00
Benoit Steiner
6795512e59 Improved the randomness of the tensor random generator 2017-07-06 21:12:45 -07:00
Benoit Steiner
dc524ac716 Fixed compilation warning 2017-07-06 21:11:15 -07:00
Benoit Steiner
62b4634ebe Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14)
Applying Benoit's comment for Fixing ImageVolumePatch.

* Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file.

* Fixing dealocation of the memory in ImagePatch test for SYCL.

* Fixing the automerge issue.
2017-07-06 05:08:13 +00:00
Benoit Steiner
53725c10b8 Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10)
DataDependancy

* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.

* Applying Ronnan's Comments.

* Applying benoit's comments
2017-06-28 17:55:23 +00:00
Benoit Steiner
b8e805497e Merged in benoitsteiner/opencl (pull request PR-318)
Improved support for OpenCL
2017-06-13 05:01:10 +00:00
Hugh Perkins
9341f258d4 Add labels to #ifdef, in TensorReductionCuda.h 2017-06-06 15:51:06 +01:00
Benoit Steiner
1e736b9ead Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7)
Fixing SYCL alignment issue required by TensorFlow.
2017-05-26 17:23:00 +00:00
Benoit Steiner
9dee55ec33 Merged eigen/eigen into default 2017-05-26 09:01:04 -07:00
Mehdi Goli
0370d3576e Applying Ronnan's comments. 2017-05-26 16:01:48 +01:00
Mehdi Goli
e3f964ed55 Applying Benoit's comment;removing dead code. 2017-05-25 11:17:26 +01:00
a-doumoulakis
fb853a857a Restore misplaced comment 2017-05-24 17:50:15 +01:00
a-doumoulakis
7a8ba565f8 Merge changed from upstream 2017-05-24 17:45:29 +01:00
Mmanu Chaturvedi
2971503fed Specializing numeric_limits For AutoDiffScalar 2017-05-23 17:12:36 -04:00
Gael Guennebaud
26e8f9171e Fix compilation of matrix log with Map as input 2017-06-07 10:51:23 +02:00
Mehdi Goli
76c0fc1f95 Fixing SYCL alignment issue required by TensorFlow. 2017-05-22 16:49:32 +01:00
Mehdi Goli
2d17128d6f Fixing suported device list. 2017-05-22 16:40:33 +01:00
a-doumoulakis
052426b824 Add support for triSYCL
Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options

Fix contraction kernel with correct nd_item dimension
2017-05-05 19:26:27 +01:00
RJ Ryan
949a2da38c Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer.
Improves support for std::complex types when compiling for CUDA.

Expands on e2e9cdd169
 and 2bda1b0d93
.
2017-04-14 13:23:35 -07:00
Benoit Steiner
0d08165a7f Merged in benoitsteiner/opencl (pull request PR-309)
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
c302ea7bc4 Deleted empty line of code 2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1 Guard sycl specific code under a EIGEN_USE_SYCL ifdef 2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7 Code cleanup 2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd Guard the sycl specific code with EIGEN_USE_SYCL 2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL 2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666 iGate the sycl specific code under a EIGEN_USE_SYCL define 2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0 Fixed compilation error when sycl is enabled. 2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96 fix typos in the Tensor readme 2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6 Restored code compatibility with compilers that dont support c++11
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3 Restore the old constructors to retain compatibility with non c++11 compilers. 2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f Gate the sycl specific code under #ifdef sycl 2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555 Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax. 2017-03-28 16:50:34 +01:00
Simon Praetorius
511810797e Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() traits in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport. 2017-03-24 17:45:56 +01:00
Luke Iwanski
a91417a7a5 Introduces align allocator for SYCL buffer 2017-03-20 14:48:54 +00:00
Benoit Steiner
f8a622ef3c Merged eigen/eigen into default 2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b Silenced compilation warning 2017-03-15 20:02:39 -07:00
Luke Iwanski
c06861d15e Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration 2017-03-15 19:26:08 +00:00
Benoit Steiner
f0f3591118 Made the reduction code compile with cuda-clang 2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496 Adding synchronisation to convolution kernel for sycl backend. 2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b Get rid of Init(). 2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094 Use C++11 ctor forwarding to simplify code a bit. 2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6 Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.

* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.

* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.

* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
Luke Iwanski
1b32a10053 Use name to distinguish name instead of the vendor 2017-03-08 18:26:34 +00:00
Gael Guennebaud
970ff78294 bug #1401: fix compilation of "cond ? x : -x" with x an AutoDiffScalar 2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a Adding sycl Benchmarks. 2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533 Fixing potential race condition on sycl device. 2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95 Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data missmatch. 2017-03-07 14:27:10 +00:00
Julian Kent
bbe717fa2f Make scaling work with non-square matrices 2017-03-03 12:58:51 +01:00
Benoit Steiner
a71943b9a4 Made the Tensor code compile with clang 3.9 2017-03-02 10:47:29 -08:00
Benoit Steiner
1e2d046651 Silenced a couple of compilation warnings 2017-03-01 10:13:42 -08:00
Benoit Steiner
c92406d613 Silenced clang compilation warning. 2017-02-28 17:03:11 -08:00
Benoit Steiner
de7b0fdea9 Made the TensorStorage class compile with clang 3.9 2017-02-28 13:52:22 -08:00
Mehdi Goli
8296b87d7b Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used. 2017-02-28 17:16:14 +00:00
Gael Guennebaud
478a9f53be Fix typo. 2017-02-28 09:32:45 +01:00
Benoit Steiner
e0bd6f5738 Merged eigen/eigen into default 2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9 Adding TensorVolumePatchOP.h for sycl 2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137 Converting fixed float type into template type for TensorContraction. 2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae Adding Sycl Backend for TensorGenerator.h. 2017-02-22 16:36:24 +00:00
Gael Guennebaud
d8b1f6cebd bug #1380: for Map<> as input of matrix exponential 2017-02-20 14:06:06 +01:00
Mehdi Goli
79ebc8f761 Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h. 2017-02-20 12:11:05 +00:00
Gael Guennebaud
a811a04696 Silent warning. 2017-02-20 10:14:21 +01:00
Gael Guennebaud
f8a55cc062 Fix compilation. 2017-02-18 10:08:13 +01:00
Benoit Steiner
cfa0568ef7 Size indices are signed. 2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0 Adding TensorLayoutSwapOp for sycl. 2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6 Adding TensorPatch.h for sycl backend. 2017-02-15 10:13:01 +00:00
Mehdi Goli
0d153ded29 Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test 2017-02-13 17:25:12 +00:00
Benoit Steiner
769208a17f Pulled latest updates from upstream 2017-02-10 13:11:40 -08:00
Mehdi Goli
0ee97b60c2 Adding mean to TensorReductionSycl.h 2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b Fixing TensorReductionSycl for min and max. 2017-02-06 18:05:23 +00:00
Mehdi Goli
bc128f9f3b Reducing the warnings in Sycl backend. 2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30 Silenced several compilation warnings 2017-02-01 15:50:58 -08:00
Mehdi Goli
bab29936a1 Reducing warnings in Sycl backend. 2017-02-01 15:29:53 +00:00
Mehdi Goli
48a20b7d95 Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h 2017-01-31 14:06:36 +00:00
Benoit Steiner
fbc39fd02c Merge latest changes from upstream 2017-01-30 15:25:57 -08:00
Gael Guennebaud
63de19c000 bug #1380: fix matrix exponential with Map<> 2017-01-30 13:55:27 +01:00
Mehdi Goli
82ce92419e Fixing the buffer type in memcpy. 2017-01-30 11:38:20 +00:00
Rasmus Munk Larsen
edaa0fc5d1 Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warning everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579 Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
607be65a03 Fix duplicates of array_size bewteen unsupported and Core 2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
e6b1020221 Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.

2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:

Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                          3.48      2.39    +31.3%
BM_memcpy_1T/8                          12.3      6.51    +47.0%
BM_memcpy_1T/64                          371       383     -3.2%
BM_memcpy_1T/512                       66922     66720     +0.3%
BM_memcpy_1T/4k                      9892867   6849682    +30.8%
BM_memcpy_1T/5k                     14951099  10332856    +30.9%
BM_memcpy_2T/2                          3.50      2.46    +29.7%
BM_memcpy_2T/8                          12.3      7.66    +37.7%
BM_memcpy_2T/64                          371       376     -1.3%
BM_memcpy_2T/512                       66652     66788     -0.2%
BM_memcpy_2T/4k                      6145012   6117776     +0.4%
BM_memcpy_2T/5k                      9181478   9010942     +1.9%
BM_memcpy_4T/2                          3.47      2.47    +31.0%
BM_memcpy_4T/8                          12.3      6.67    +45.8
BM_memcpy_4T/64                          374       376     -0.5%
BM_memcpy_4T/512                       67833     68019     -0.3%
BM_memcpy_4T/4k                      5057425   5188253     -2.6%
BM_memcpy_4T/5k                      7555638   7779468     -3.0%
BM_memcpy_6T/2                          3.51      2.50    +28.8%
BM_memcpy_6T/8                          12.3      7.61    +38.1%
BM_memcpy_6T/64                          373       378     -1.3%
BM_memcpy_6T/512                       66871     66774     +0.1%
BM_memcpy_6T/4k                      5112975   5233502     -2.4%
BM_memcpy_6T/5k                      7614180   7772246     -2.1%
BM_memcpy_8T/2                          3.47      2.41    +30.5%
BM_memcpy_8T/8                          12.4      10.5    +15.3%
BM_memcpy_8T/64                          372       388     -4.3%
BM_memcpy_8T/512                       67373     66588     +1.2%
BM_memcpy_8T/4k                      5148462   5254897     -2.1%
BM_memcpy_8T/5k                      7660989   7799058     -1.8%
BM_memcpy_12T/2                         3.50      2.40    +31.4%
BM_memcpy_12T/8                         12.4      7.55    +39.1
BM_memcpy_12T/64                         374       378     -1.1%
BM_memcpy_12T/512                      67132     66683     +0.7%
BM_memcpy_12T/4k                     5185125   5292920     -2.1%
BM_memcpy_12T/5k                     7717284   7942684     -2.9%
BM_slicingSmallPieces_1T/2              47.3      47.5     +0.4%
BM_slicingSmallPieces_1T/8              53.6      52.3     +2.4%
BM_slicingSmallPieces_1T/64              491       476     +3.1%
BM_slicingSmallPieces_1T/512           21734     18814    +13.4%
BM_slicingSmallPieces_1T/4k           394660    396760     -0.5%
BM_slicingSmallPieces_1T/5k           218722    209244     +4.3%
BM_slicingSmallPieces_2T/2              80.7      79.9     +1.0%
BM_slicingSmallPieces_2T/8              54.2      53.1     +2.0
BM_slicingSmallPieces_2T/64              497       477     +4.0%
BM_slicingSmallPieces_2T/512           21732     18822    +13.4%
BM_slicingSmallPieces_2T/4k           392885    390490     +0.6%
BM_slicingSmallPieces_2T/5k           221988    208678     +6.0%
BM_slicingSmallPieces_4T/2              80.8      80.1     +0.9%
BM_slicingSmallPieces_4T/8              54.1      53.2     +1.7%
BM_slicingSmallPieces_4T/64              493       476     +3.4%
BM_slicingSmallPieces_4T/512           21702     18758    +13.6%
BM_slicingSmallPieces_4T/4k           393962    404023     -2.6%
BM_slicingSmallPieces_4T/5k           249667    211732    +15.2%
BM_slicingSmallPieces_6T/2              80.5      80.1     +0.5%
BM_slicingSmallPieces_6T/8              54.4      53.4     +1.8%
BM_slicingSmallPieces_6T/64              488       478     +2.0%
BM_slicingSmallPieces_6T/512           21719     18841    +13.3%
BM_slicingSmallPieces_6T/4k           394950    397583     -0.7%
BM_slicingSmallPieces_6T/5k           223080    210148     +5.8%
BM_slicingSmallPieces_8T/2              81.2      80.4     +1.0%
BM_slicingSmallPieces_8T/8              58.1      53.5     +7.9%
BM_slicingSmallPieces_8T/64              489       480     +1.8%
BM_slicingSmallPieces_8T/512           21586     18798    +12.9%
BM_slicingSmallPieces_8T/4k           394592    400165     -1.4%
BM_slicingSmallPieces_8T/5k           219688    208301     +5.2%
BM_slicingSmallPieces_12T/2             80.2      79.8     +0.7%
BM_slicingSmallPieces_12T/8             54.4      53.4     +1.8
BM_slicingSmallPieces_12T/64             488       476     +2.5%
BM_slicingSmallPieces_12T/512          21931     18831    +14.1%
BM_slicingSmallPieces_12T/4k          393962    396541     -0.7%
BM_slicingSmallPieces_12T/5k          218803    207965     +5.0%
2017-01-24 13:55:18 -08:00
Luke Iwanski
bf44fed9b7 Allows AMD APU 2017-01-23 15:56:45 +00:00
Mehdi Goli
602f8c27f5 Reverting back to the previous TensorDeviceSycl.h as the total number of buffer is not enough for tensorflow. 2017-01-20 18:23:20 +00:00
Mehdi Goli
77cc4d06c7 Removing unused variables 2017-01-19 17:06:21 +00:00
Mehdi Goli
837fdbdcb2 Merging with Benoit's upstream. 2017-01-19 11:34:34 +00:00
Mehdi Goli
6bdd15f572 Adding non-deferrenciable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class. 2017-01-19 11:30:59 +00:00
Mehdi Goli
c6f7b33834 Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy. 2017-01-18 10:45:28 +00:00
Mehdi Goli
e46e722381 Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree. 2017-01-16 13:58:49 +00:00
Gael Guennebaud
bbd97b4095 Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases 2017-07-17 01:02:51 +02:00
Luke Iwanski
90c5bc8d64 Fixes auto appearance in functor template argument for reduction. 2017-01-04 22:18:44 +00:00
Mehdi Goli
8b1c2108ba Reverting asynchronous exec to Synchronous exec regarding random race condition. 2016-12-22 16:45:38 +00:00
Benoit Steiner
660da83e18 Pulled latest update from trunk 2016-12-21 16:43:27 -08:00
Benoit Steiner
4236aebe10 Simplified the contraction code` 2016-12-21 16:42:56 -08:00
Benoit Steiner
3cfa16f41d Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
2016-12-21 15:08:54 -08:00
Benoit Steiner
519d63d350 Added support for libxsmm kernel in multithreaded contractions 2016-12-21 15:06:06 -08:00
Benoit Steiner
f9eff17e91 Leverage libxsmm kernels within signle threaded contractions 2016-12-21 12:32:06 -08:00
Luke Iwanski
c55ecfd820 Fix for auto appearing in functor template argument. 2016-12-21 15:42:51 +00:00
Benoit Steiner
0f577d4744 Merged eigen/eigen into default 2016-12-20 17:02:06 -08:00
Luke Iwanski
29186f766f Fixed order of initialisation in ExecExprFunctorKernel functor. 2016-12-20 21:32:42 +00:00
Gael Guennebaud
e8d6862f14 Properly adjust precision when saving to Market format. 2016-12-20 22:10:33 +01:00
Gael Guennebaud
e2f4ee1c2b Speed up parsing of sparse Market file. 2016-12-20 21:56:21 +01:00
Luke Iwanski
8245851d1b Matching parameters order between lambda and the functor. 2016-12-20 16:18:15 +00:00
Benoit Steiner
70d0172f0c Merged eigen/eigen into default 2016-12-16 17:37:04 -08:00
Benoit Steiner
8910442e19 Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl. 2016-12-16 15:45:04 -08:00
Luke Iwanski
54db66c5df struct -> class in order to silence compilation warning. 2016-12-16 20:25:20 +00:00
Mehdi Goli
35bae513a0 Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding tensorConcatinationOp backend for sycl. 2016-12-16 19:46:45 +00:00
Mehdi Goli
c5e8546306 Adding asynchandler to sycl queue as lack of it can cause undefined behaviour. 2016-12-15 16:59:57 +00:00
Benoit Steiner
2c2e218471 Avoid using #define since they can conflict with user code 2016-12-14 19:49:15 -08:00
Benoit Steiner
3beb180ee5 Don't call EnvThread::OnCancel by default since it doesn't do anything. 2016-12-14 18:33:39 -08:00
Benoit Steiner
9ff5d0f821 Merged eigen/eigen into default 2016-12-14 17:32:16 -08:00
Mehdi Goli
730eb9fe1c Adding asynchronous execution as it improves the performance. 2016-12-14 17:38:53 +00:00
Mehdi Goli
2d4a091beb Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h 2016-12-14 15:30:37 +00:00
Benoit Steiner
a432fc102d Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool 2016-12-12 15:24:16 -08:00
Benoit Steiner
8ae68924ed Made ThreadPoolInterface::Cancel() an optional functionality 2016-12-12 11:58:38 -08:00
Benoit Steiner
76fca22134 Use a more accurate timer to sleep on Linux systems. 2016-12-09 15:12:24 -08:00
Benoit Steiner
4deafd35b7 Introduce a portable EIGEN_SLEEP macro. 2016-12-09 14:52:15 -08:00