Gael Guennebaud
6190aa5632
bug #1567 : add optimized path for tensor broadcasting and 'Channel First' shape
2018-07-09 11:23:16 +02:00
Deven Desai
1bb6fa99a3
merging the CUDA and HIP implementation for the Tensor directory and the unit tests
2018-06-20 16:44:58 -04:00
Deven Desai
cfdabbcc8f
removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories
2018-06-20 12:57:02 -04:00
Deven Desai
7e41c8f1a9
renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories
2018-06-20 12:52:30 -04:00
Deven Desai
b6cc0961b1
updates based on PR feedback
...
There are two major changes (and a few minor ones, not listed here; see the PR discussion for details):
1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
2. New macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added, and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 || EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
Deven Desai
d1d22ef0f4
syncing this fork with upstream
2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d
Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
...
Derivative of the incomplete Gamma function and the sample of a Gamma random variable
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-11 17:57:47 +00:00
Jonathan Liu
b7689bded9
Use std::complex constructor instead of assignment from scalar
...
Fixes the GCC "conversion to non-scalar type requested" compile error when
using boost::multiprecision::cpp_dec_float_50 as the scalar type.
2018-06-28 00:32:37 +10:00
Rasmus Munk Larsen
5418154a45
Fix oversharding bug in parallelFor.
2018-06-20 17:51:48 -07:00
Gael Guennebaud
7933267c67
fix prototype
2018-06-08 09:56:01 +02:00
Michael Figurnov
30fa3d0454
Merge from eigen/eigen
2018-06-07 17:57:56 +01:00
Michael Figurnov
6c71c7d360
Merge from eigen/eigen.
2018-06-07 15:54:18 +01:00
Gael Guennebaud
37348d03ae
Fix int versus Index
2018-06-07 15:56:43 +02:00
Michael Figurnov
aa813d417b
Fix compilation of special functions without C99 math.
...
The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly,
causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not
actually require C99 math, so now they are always available.
2018-06-07 14:35:07 +01:00
Gael Guennebaud
b3fd93207b
Fix typos found using codespell
2018-06-07 14:43:02 +02:00
Michael Figurnov
5172a32849
Updated the stopping criteria in igammac_cf_impl.
...
Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.
2018-06-07 12:03:58 +01:00
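The stopping rule described in commit 5172a32849 (relative threshold when computing the value, absolute threshold when computing the derivative) can be sketched as a small predicate. The function name and the tolerance argument are hypothetical; this is not Eigen's igammac_cf_impl code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical convergence test for one step of a continued-fraction loop.
// delta:   change contributed by the latest step
// current: current estimate of the quantity being computed
bool has_converged(double delta, double current, double tol,
                   bool computing_derivative) {
  assert(tol > 0.0);
  if (computing_derivative) {
    // Absolute threshold: the derivative is not expected to be near zero.
    return std::abs(delta) <= tol;
  }
  // Relative threshold for the function value (unchanged behavior).
  return std::abs(delta) <= tol * std::abs(current);
}
```

For a large estimate the relative test stops earlier than the absolute one, and for a tiny estimate the relative test is stricter. That is the trade-off the commit message describes.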
Michael Figurnov
4bd158fa37
Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
...
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter
The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
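The "mode"-parameterized forward-mode differentiation described in commit 4bd158fa37 can be illustrated on a toy series. This is not Eigen's igamma code. The series S(a, x) = sum of t_n with t_0 = 1 and t_n = t_{n-1} * x / (a + n) resembles the series part of igamma. The Mode enum and the series function are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Which quantity the caller wants returned.
enum class Mode { Value, Derivative };

// Always propagates both the value and its derivative with respect to a,
// but returns only the one selected by the compile-time mode. A compiler
// can then typically drop the computation that is never used.
template <Mode mode>
double series(double a, double x, int terms = 200) {
  double t = 1.0, dt = 0.0;      // t_n and d t_n / da
  double sum = 1.0, dsum = 0.0;  // S and dS / da
  for (int n = 1; n < terms; ++n) {
    const double r = x / (a + n);
    // d/da [t * x / (a + n)] = dt * r - t * x / (a + n)^2  (uses the old t)
    dt = dt * r - t * r / (a + n);
    t *= r;
    sum += t;
    dsum += dt;
  }
  return mode == Mode::Value ? sum : dsum;
}
```

A central finite difference of `series<Mode::Value>` should agree with `series<Mode::Derivative>`. For a = 1, the value reduces to (e^x - 1) / x.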
Deven Desai
8fbd47052b
Adding support for using Eigen in HIP kernels.
...
This commit enables the use of Eigen in HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVIDIA GPUs.
Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compilation, irrespective of whether or not the underlying compiler is CUDACC/NVCC (e.g. Eigen/src/Core/arch/CUDA/Half.h). To maintain this behavior, the EIGEN_USE_HIP macro is used to switch to the HIP versions of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).
Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
Benoit Steiner
e206f8d4a4
Merged in mfigurnov/eigen (pull request PR-400)
...
Exponentially scaled modified Bessel functions of order zero and one.
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-05 17:05:21 +00:00
Penporn Koanantakool
e2ed0cf8ab
Add a ThreadPoolInterface* getter for ThreadPoolDevice.
2018-06-02 12:07:49 -07:00
Michael Figurnov
f216854453
Exponentially scaled modified Bessel functions of order zero and one.
...
The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x)
The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
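The relation i0(x) = exp(|x|) i0e(x) from commit f216854453 can be checked with a plain power series. This sketch only illustrates the scaling. Cephes (and the ported Eigen code) use Chebyshev expansions, which is what makes i0e stable for large |x|. The names bessel_i0 and bessel_i0e are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// I0(x) = sum_{k>=0} (x/2)^(2k) / (k!)^2, summed until terms are negligible.
// Adequate for moderate |x| only.
double bessel_i0(double x) {
  const double q = 0.25 * x * x;  // (x/2)^2
  double term = 1.0, sum = 1.0;
  for (int k = 1; k < 500 && term > 1e-17 * sum; ++k) {
    term *= q / (static_cast<double>(k) * k);
    sum += term;
  }
  return sum;
}

// Exponentially scaled version: i0e(x) = exp(-|x|) * I0(x).
// Computed this way, it still overflows wherever I0 does (|x| above
// roughly 700), unlike a dedicated i0e implementation.
double bessel_i0e(double x) { return std::exp(-std::abs(x)) * bessel_i0(x); }
```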
Katrin Leinweber
ea94543190
Hyperlink DOIs against preferred resolver
2018-05-24 18:55:40 +02:00
Vamsi Sripathi
6293ad3f39
Performance improvements to tensor broadcast operation
...
1. Added new packet functions using SIMD for NByOne, OneByN cases
2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions
2018-05-23 14:02:05 -07:00
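The two broadcast shapes named in commit 6293ad3f39 can be sketched with scalar loops over row-major data. This assumes that "NByOne" means an N x 1 input repeated along the second dimension and that "OneByN" means a 1 x N input repeated along the first. The real code works on SIMD packets, and the function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// N x 1 column repeated across m columns -> N x m (row-major).
// Each output row is one input value repeated, which suits a packet "set1".
std::vector<float> broadcast_n_by_one(const std::vector<float>& col, std::size_t m) {
  std::vector<float> out(col.size() * m);
  for (std::size_t i = 0; i < col.size(); ++i)
    for (std::size_t j = 0; j < m; ++j) out[i * m + j] = col[i];
  return out;
}

// 1 x N row repeated across m rows -> m x N (row-major).
// Each output row is a contiguous copy of the input row.
std::vector<float> broadcast_one_by_n(const std::vector<float>& row, std::size_t m) {
  std::vector<float> out;
  out.reserve(row.size() * m);
  for (std::size_t i = 0; i < m; ++i) out.insert(out.end(), row.begin(), row.end());
  return out;
}
```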
Benoit Steiner
0371380d5b
Merged in rmlarsen/eigen2 (pull request PR-393)
...
Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.
2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen
b8d36774fa
Rename clip2 to clamp.
2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen
812480baa3
Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow.
2018-05-16 09:49:24 -07:00
Benoit Steiner
1403c2c15b
Merged in didierjansen/eigen (pull request PR-360)
...
Fix bugs and typos in the contraction example of the tensor README
2018-05-16 01:16:36 +00:00
Rasmus Munk Larsen
b8c8e5f436
Add vectorized clip functor for Eigen Tensors.
2018-05-14 16:07:13 -07:00
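A clamp functor in the spirit of scalar_clamp_op (commits b8c8e5f436 and b8d36774fa) maps each element into [lo, hi]. The struct and helper names below are illustrative and are not Eigen's definitions. A vectorized version would use packet-wide min/max instead of the scalar std::min/std::max.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element-wise clamp functor.
template <typename T>
struct clamp_functor {
  T lo, hi;
  T operator()(const T& x) const { return std::min(std::max(x, lo), hi); }
};

// Applies the functor to every element of a copy of v.
template <typename T>
std::vector<T> clamp_all(std::vector<T> v, T lo, T hi) {
  assert(lo <= hi);
  std::transform(v.begin(), v.end(), v.begin(), clamp_functor<T>{lo, hi});
  return v;
}
```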
Benoit Steiner
6118c6ff4f
Enable RawAccess to tensor slices whenever possible.
...
Avoid 32-bit integer overflow in TensorSlicingOp
2018-04-30 11:28:12 -07:00
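The 32-bit overflow mentioned in commit 6118c6ff4f arises when a linear offset such as row * num_cols + col exceeds 2^31 - 1. The sketch below widens the operands to 64 bits before multiplying. It is a generic illustration, not TensorSlicingOp's code, and slice_offset is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Offset of element (row, col) in a row-major 2-D tensor. All arithmetic is
// done in 64 bits, so products beyond INT32_MAX are represented exactly.
std::int64_t slice_offset(std::int64_t row, std::int64_t col, std::int64_t num_cols) {
  assert(row >= 0 && col >= 0 && col < num_cols);
  return row * num_cols + col;
}
```

With 32-bit `int` operands, `70000 * 70000` alone would already overflow (undefined behavior for signed types).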
Weiming Zhao
b0eda3cb9f
Avoid using memcpy for non-POD elements
2018-04-11 11:37:06 +02:00
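The idea in commit b0eda3cb9f, using memcpy only for element types where a bitwise copy is valid, can be sketched with tag dispatch on std::is_trivially_copyable. Trivially copyable is the precise standard requirement for memcpy, and it is a bit broader than "POD". The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Bitwise copy: valid only for trivially copyable T.
template <typename T>
void copy_elements_impl(const T* src, T* dst, std::size_t n, std::true_type) {
  std::memcpy(dst, src, n * sizeof(T));
}

// Element-wise copy assignment: runs T's copy semantics (e.g. std::string
// allocates its own buffer instead of sharing the source's pointer).
template <typename T>
void copy_elements_impl(const T* src, T* dst, std::size_t n, std::false_type) {
  std::copy(src, src + n, dst);
}

template <typename T>
void copy_elements(const T* src, T* dst, std::size_t n) {
  copy_elements_impl(src, dst, n, std::is_trivially_copyable<T>{});
}
```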
Gael Guennebaud
67bac6368c
protect calls to isnan
2018-04-03 14:19:04 +02:00
Gael Guennebaud
524119d32a
Fix uninitialized output argument.
2018-04-03 10:56:10 +02:00
Viktor Csomor
000840cae0
Added a move constructor and move assignment operator to Tensor and wrote some tests.
2018-02-07 19:10:54 +01:00
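Move construction and move assignment, as added to Tensor in commit 000840cae0, let a container hand over its heap buffer instead of copying every element. The minimal owning buffer below is hypothetical and is not Eigen's TensorStorage. It only shows the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
 public:
  explicit Buffer(std::size_t n) : data_(new float[n]()), size_(n) {}
  ~Buffer() { delete[] data_; }

  // Copy: allocates new storage and duplicates every element.
  Buffer(const Buffer& other) : data_(new float[other.size_]), size_(other.size_) {
    for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
  }
  Buffer& operator=(const Buffer& other) {
    if (this != &other) {
      Buffer tmp(other);
      swap(tmp);
    }
    return *this;
  }

  // Move: steals the pointer; the source is left empty but destructible.
  Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
    other.data_ = nullptr;
    other.size_ = 0;
  }
  // Move assignment by swap: the source now owns our old storage and frees it.
  Buffer& operator=(Buffer&& other) noexcept {
    swap(other);
    return *this;
  }

  void swap(Buffer& other) noexcept {
    std::swap(data_, other.data_);
    std::swap(size_, other.size_);
  }
  std::size_t size() const { return size_; }
  float* data() { return data_; }

 private:
  float* data_;
  std::size_t size_;
};
```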
Eugene Zhulenev
c95aacab90
Fix TensorContractionOp evaluators for GPU and SYCL
2018-07-17 14:09:37 -07:00
Deven Desai
f124f07965
applying EIGEN_DECLARE_TEST to *gpu* tests
...
Also, a few minor fixes for GPU tests running in HIP mode.
1. Adding an include for hip/hip_runtime.h in the Macros.h file
For HIP, __host__ and __device__ are macros defined in the HIP headers,
so their definitions need to be included before they are used in the file.
2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
"Fuse computations into the Tensor contractions using output kernel"
3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
82f0ce2726
Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
...
This provides several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
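A macro of the form used in commit 82f0ce2726 can both define a test function and register it. Registration is what allows several test files to be glued into one binary. The registry and the DECLARE_TEST macro below are a hypothetical sketch, not how EIGEN_DECLARE_TEST is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using TestList = std::vector<std::pair<std::string, std::function<void()>>>;

// Function-local static avoids static-initialization-order problems.
TestList& registered_tests() {
  static TestList list;
  return list;
}

struct TestRegistrar {
  TestRegistrar(const char* name, void (*fn)()) {
    registered_tests().emplace_back(name, fn);
  }
};

// Declares test_<name>(), registers it before main() runs, and leaves the
// user's braces to supply the function body.
#define DECLARE_TEST(name)                                    \
  void test_##name();                                         \
  static TestRegistrar registrar_##name(#name, &test_##name); \
  void test_##name()

int g_calls = 0;

DECLARE_TEST(increments_counter) { ++g_calls; }

// Runs every registered test and returns how many ran.
int run_all_tests() {
  for (auto& entry : registered_tests()) entry.second();
  return static_cast<int>(registered_tests().size());
}
```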
Eugene Zhulenev
43206ac4de
Call OutputKernel in evalGemv
2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf
Remove SimpleThreadPool and always use {NonBlocking}ThreadPool
2018-07-16 15:06:57 -07:00
Eugene Zhulenev
01fd4096d3
Fuse computations into the Tensor contractions using output kernel
2018-07-10 13:16:38 -07:00
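The output-kernel idea in commit 01fd4096d3 applies a user-supplied callable to each finished block of the contraction result while that block is still hot in cache, instead of running a separate pass afterwards. The blocking scheme and all names below are hypothetical, and a plain triple-loop matrix product stands in for Eigen's contraction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = A (m x k) * B (k x n), row-major. After each block of output rows is
// complete, kernel(block_ptr, rows, cols) is called on it (e.g. bias + ReLU).
template <typename OutputKernel>
std::vector<float> matmul_with_output_kernel(const std::vector<float>& a,
                                             const std::vector<float>& b,
                                             std::size_t m, std::size_t k,
                                             std::size_t n, std::size_t block_rows,
                                             OutputKernel kernel) {
  assert(a.size() == m * k && b.size() == k * n && n > 0 && block_rows > 0);
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t r0 = 0; r0 < m; r0 += block_rows) {
    const std::size_t r1 = r0 + block_rows < m ? r0 + block_rows : m;
    for (std::size_t i = r0; i < r1; ++i)
      for (std::size_t p = 0; p < k; ++p)
        for (std::size_t j = 0; j < n; ++j) c[i * n + j] += a[i * k + p] * b[p * n + j];
    kernel(&c[r0 * n], r1 - r0, n);  // fused post-processing of this block
  }
  return c;
}
```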
Gael Guennebaud
5539587b1f
Some warning fixes
2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57
Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
2018-01-30 20:22:12 +00:00
Lee.Deokjae
5b3c367926
Fix typos in the contraction example of tensor README
2018-01-06 14:36:19 +09:00
RJ Ryan
59985cfd26
Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689
2017-12-31 10:44:56 -05:00
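Commit 59985cfd26 disables the recurrence for computing FFT twiddle factors. The sketch below contrasts the two approaches and assumes the replacement computes each factor w_k = exp(-2*pi*i*k/n) directly. The recurrence multiplies by w_1 repeatedly, so its rounding error grows with k. The direct form's error stays at a few ulps. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// w_k = w_{k-1} * w_1: cheap, but errors accumulate over k multiplications.
std::vector<std::complex<double>> twiddles_recurrence(std::size_t n) {
  const double pi = std::acos(-1.0);
  const std::complex<double> w1 = std::polar(1.0, -2.0 * pi / n);
  std::vector<std::complex<double>> w(n);
  w[0] = 1.0;
  for (std::size_t k = 1; k < n; ++k) w[k] = w[k - 1] * w1;
  return w;
}

// Each factor from its own cos/sin evaluation: error does not grow with k.
std::vector<std::complex<double>> twiddles_direct(std::size_t n) {
  const double pi = std::acos(-1.0);
  std::vector<std::complex<double>> w(n);
  for (std::size_t k = 0; k < n; ++k)
    w[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / n);
  return w;
}
```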
Gael Guennebaud
73214c4bd0
Workaround nvcc 9.0 issue. See PR 351.
...
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Yangzihao Wang
3122477c86
Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
2017-12-12 11:15:24 -08:00
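The TensorFlow convention that commits 3122477c86 and 8f55956a57 align with can be written out for one spatial dimension. Output size is ceil(in / stride). Total padding is max((out - 1) * stride + kernel - in, 0), and any odd leftover goes after (bottom/right). This sketch follows TensorFlow's documented rule and is not copied from Eigen's source. SamePadding and same_padding are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct SamePadding {
  int out, before, after;
};

SamePadding same_padding(int in, int kernel, int stride) {
  assert(in > 0 && kernel > 0 && stride > 0);
  const int out = (in + stride - 1) / stride;  // integer ceil(in / stride)
  const int total = std::max((out - 1) * stride + kernel - in, 0);
  return {out, total / 2, total - total / 2};  // extra padding goes after
}
```

For example, in = 4, kernel = 3, stride = 2 gives out = 2 and a single padding element, placed after the input.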
Rasmus Munk Larsen
e900b010c8
Improve robustness of igamma and igammac to bad inputs.
...
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.
Original change by mfirgunov@google.com
2018-03-19 09:04:54 -07:00
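The two safeguards in commit e900b010c8 (propagate NaN inputs immediately and cap the internal loop at 2000 iterations) can be sketched on the power series for the regularized lower incomplete gamma function P(a, x). Eigen also uses a continued fraction (igammac_cf_impl) for large x, which this sketch omits. igamma_series is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n))
double igamma_series(double a, double x) {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(a) || std::isnan(x)) return nan;  // propagate NaN at once
  if (a <= 0.0 || x < 0.0) return nan;             // outside this sketch's domain
  if (x == 0.0) return 0.0;

  const int kMaxIterations = 2000;  // same cap as scipy.special.gammainc
  double term = 1.0, sum = 1.0;
  for (int n = 1; n <= kMaxIterations; ++n) {
    term *= x / (a + n);
    sum += term;
    if (term <= sum * std::numeric_limits<double>::epsilon()) break;
  }
  const double log_prefactor = a * std::log(x) - x - std::lgamma(a + 1.0);
  return std::exp(log_prefactor) * sum;
}
```

For a = 1 this reduces to 1 - e^{-x}, and for a = 2 to 1 - (1 + x) e^{-x}. Both make convenient checks.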
Gael Guennebaud
00bc67c374
Move KLU support to official
2017-11-10 14:11:22 +01:00
Gael Guennebaud
b82cd93c01
KLU: truly disable unimplemented code, add proper static assertions in solve
2017-11-10 14:09:01 +01:00
Gael Guennebaud
8cf63ccb99
Merged in kylemacfarlan/eigen (pull request PR-337)
...
Add support for SuiteSparse's KLU routines
2017-11-10 10:43:17 +00:00
Gael Guennebaud
1495b98a8e
Merged in spraetor/eigen (pull request PR-305)
...
Issue with mpreal and std::numeric_limits::digits
2017-11-10 10:28:54 +00:00
Gael Guennebaud
fc45324380
Merged in jkflying/eigen-fix-scaling (pull request PR-302)
...
Make scaling work with non-square matrices
2017-11-10 10:11:36 +00:00
Gael Guennebaud
1b2dcf9a47
Check that the Schur decomposition succeeded.
2017-11-10 10:26:09 +01:00
Gael Guennebaud
0a1cc73942
bug #1484 : restore deleted line for 128-bit long doubles, and improve dispatching logic.
2017-11-10 10:25:41 +01:00
Benoit Steiner
3949615176
Merged in JonasMu/eigen (pull request PR-329)
...
Added an example for a contraction to a scalar value to README.md
Approved-by: Jonas Harsch <jonas.harsch@gmail.com>
2017-10-27 07:27:46 +00:00
Benoit Steiner
8eb4b9d254
Merged in benoitsteiner/opencl (pull request PR-341)
2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen
f349507e02
Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example, this reduces the binary size of a TensorFlow demo app for Android by about 2.5%.
2017-10-13 15:58:12 -07:00
Kyle Vedder
c0e1d510fd
Add support for SuiteSparse's KLU routines
2017-10-04 21:01:23 -05:00
Mehdi Goli
2062ac9958
Changes required for new ComputeCpp CE version.
2017-09-18 18:17:39 +01:00
Rasmus Munk Larsen
1b7294f6fc
Fix cut-and-paste error.
2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen
94e2213b38
Avoid undefined behavior in Eigen::TensorCostModel::numThreads.
...
If the cost is large enough then the thread count can be larger than the maximum
representable int, so just casting it to an int is undefined behavior.
Contributed by phurst@google.com.
2017-09-08 15:49:55 -07:00
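The fix in commit 94e2213b38 addresses a specific problem: converting a floating-point value larger than INT_MAX to int is undefined behavior. One way to avoid that is to clamp in floating point before the cast. The function below is a hypothetical sketch, not TensorCostModel::numThreads itself, and it truncates rather than rounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Clamp a floating-point thread estimate to [1, max_threads] before casting,
// so the conversion to int is always well defined.
int safe_num_threads(double estimated_threads, int max_threads) {
  assert(max_threads >= 1);
  if (!(estimated_threads >= 1.0)) return 1;  // also catches NaN
  const double clamped = std::min(estimated_threads, static_cast<double>(max_threads));
  return static_cast<int>(clamped);
}
```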
Gael Guennebaud
a91918a105
Merged in infinitei/eigen (pull request PR-328)
...
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.
Approved-by: Tal Hadad <tal_hd@hotmail.com>
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>
2017-09-06 08:42:14 +00:00
Jonas Harsch
a991c80365
Added an example for a contraction to a scalar value, e.g. a double contraction of two second-order tensors, and how to get the value of the result. I lost a day getting this done, so I think it will help others. I also added Eigen:: to the IndexPair and array in the same example.
2017-09-01 11:30:26 +00:00
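The double contraction documented in commit a991c80365 is A : B = sum over i, j of A_ij * B_ij. It reduces both indices and leaves a scalar. With the Tensor module, that corresponds to contracting over both index pairs and reading the rank-0 result with a zero-argument operator(). The plain-C++ sketch below computes the same quantity over fixed 3x3 arrays for reference. Mat3 and double_contraction are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat3 = std::array<std::array<double, 3>, 3>;

// A : B = sum_ij A_ij * B_ij
double double_contraction(const Mat3& a, const Mat3& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 3; ++j) s += a[i][j] * b[i][j];
  return s;
}
```

For the identity, I : I equals the dimension (3), which gives a quick sanity check for an Eigen-based version.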
Benoit Steiner
a4089991eb
Added support for CUDA 9.0.
2017-08-31 02:49:39 +00:00
Abhijit Kundu
6d991a9595
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.
2017-08-30 13:26:30 -04:00
Benoit Steiner
84d7be103a
Fixing Argmax that was breaking upstream TensorFlow.
2017-07-22 03:19:34 +00:00
Benoit Steiner
f0b154a4b0
Code cleanup
2017-07-10 09:54:09 -07:00
Benoit Steiner
575cda76b3
Fixed syntax errors generated by xcode
2017-07-09 11:39:01 -07:00
Benoit Steiner
5ac27d5b51
Avoid relying on cxx11 features when possible.
2017-07-08 21:58:44 -07:00
Benoit Steiner
c5a241ab9b
Merged in benoitsteiner/opencl (pull request PR-323)
...
Improved support for OpenCL
2017-07-07 16:27:33 +00:00
Benoit Steiner
b7ae4dd9ef
Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315)
...
Add labels to #ifdef, in TensorReductionCuda.h
2017-07-07 04:23:52 +00:00
Benoit Steiner
9daed67952
Merged in tntnatbry/eigen (pull request PR-319)
...
Tensor Trace op
2017-07-07 04:18:03 +00:00
Benoit Steiner
6795512e59
Improved the randomness of the tensor random generator
2017-07-06 21:12:45 -07:00
Benoit Steiner
dc524ac716
Fixed compilation warning
2017-07-06 21:11:15 -07:00
Benoit Steiner
62b4634ebe
Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14)
...
Applying Benoit's comment for Fixing ImageVolumePatch.
* Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file.
* Fixing deallocation of the memory in ImagePatch test for SYCL.
* Fixing the automerge issue.
2017-07-06 05:08:13 +00:00
Benoit Steiner
53725c10b8
Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10)
...
DataDependancy
* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.
* Applying Ronnan's Comments.
* Applying benoit's comments
2017-06-28 17:55:23 +00:00
Benoit Steiner
b8e805497e
Merged in benoitsteiner/opencl (pull request PR-318)
...
Improved support for OpenCL
2017-06-13 05:01:10 +00:00
Hugh Perkins
9341f258d4
Add labels to #ifdef, in TensorReductionCuda.h
2017-06-06 15:51:06 +01:00
Benoit Steiner
1e736b9ead
Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7)
...
Fixing SYCL alignment issue required by TensorFlow.
2017-05-26 17:23:00 +00:00
Benoit Steiner
9dee55ec33
Merged eigen/eigen into default
2017-05-26 09:01:04 -07:00
Mehdi Goli
0370d3576e
Applying Ronnan's comments.
2017-05-26 16:01:48 +01:00
Mehdi Goli
e3f964ed55
Applying Benoit's comment; removing dead code.
2017-05-25 11:17:26 +01:00
a-doumoulakis
fb853a857a
Restore misplaced comment
2017-05-24 17:50:15 +01:00
a-doumoulakis
7a8ba565f8
Merge changes from upstream
2017-05-24 17:45:29 +01:00
Mmanu Chaturvedi
2971503fed
Specializing numeric_limits For AutoDiffScalar
2017-05-23 17:12:36 -04:00
Gael Guennebaud
26e8f9171e
Fix compilation of matrix log with Map as input
2017-06-07 10:51:23 +02:00
Mehdi Goli
76c0fc1f95
Fixing SYCL alignment issue required by TensorFlow.
2017-05-22 16:49:32 +01:00
Mehdi Goli
2d17128d6f
Fixing supported device list.
2017-05-22 16:40:33 +01:00
a-doumoulakis
052426b824
Add support for triSYCL
...
Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options
Fix contraction kernel with correct nd_item dimension
2017-05-05 19:26:27 +01:00
RJ Ryan
949a2da38c
Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer.
...
Improves support for std::complex types when compiling for CUDA.
Expands on e2e9cdd169 and 2bda1b0d93.
2017-04-14 13:23:35 -07:00
Benoit Steiner
0d08165a7f
Merged in benoitsteiner/opencl (pull request PR-309)
...
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
c302ea7bc4
Deleted empty line of code
2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1
Guard sycl specific code under a EIGEN_USE_SYCL ifdef
2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7
Code cleanup
2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd
Guard the sycl specific code with EIGEN_USE_SYCL
2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a
Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL
2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666
Gate the sycl specific code under a EIGEN_USE_SYCL define
2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0
Fixed compilation error when sycl is enabled.
2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96
fix typos in the Tensor readme
2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6
Restored code compatibility with compilers that don't support C++11
...
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3
Restore the old constructors to retain compatibility with non c++11 compilers.
2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f
Gate the sycl specific code under #ifdef sycl
2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555
Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.
2017-03-28 16:50:34 +01:00
Simon Praetorius
511810797e
Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() traits in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport.
2017-03-24 17:45:56 +01:00
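The digits() trait described in commit 511810797e can be sketched as a function with a fallback to the static std::numeric_limits constant. A specialization then covers a type whose precision is only known at run time, as with mpreal. Traits and VarPrecision are hypothetical stand-ins, not Eigen's NumTraits or MPRealSupport.

```cpp
#include <cassert>
#include <limits>

// Default: forward to the compile-time constant.
template <typename T>
struct Traits {
  static int digits() { return std::numeric_limits<T>::digits; }
};

// Stand-in for a type whose precision can change at run time.
struct VarPrecision {
  static int& precision_bits() {
    static int bits = 128;
    return bits;
  }
};

// Specialization: query the current run-time precision instead.
template <>
struct Traits<VarPrecision> {
  static int digits() { return VarPrecision::precision_bits(); }
};
```

Making digits() a function rather than a constant is what lets the same generic code handle both kinds of type.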
Luke Iwanski
a91417a7a5
Introduces align allocator for SYCL buffer
2017-03-20 14:48:54 +00:00
Benoit Steiner
f8a622ef3c
Merged eigen/eigen into default
2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b
Silenced compilation warning
2017-03-15 20:02:39 -07:00
Luke Iwanski
c06861d15e
Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration
2017-03-15 19:26:08 +00:00
Benoit Steiner
f0f3591118
Made the reduction code compile with cuda-clang
2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496
Adding synchronisation to convolution kernel for sycl backend.
2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b
Get rid of Init().
2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094
Use C++11 ctor forwarding to simplify code a bit.
2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6
Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
...
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.
* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.
* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.
* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
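The spin-budget arithmetic in commit 344c2694a6 works out as follows. Each pass through the steal loop costs time roughly proportional to the number of threads. Dividing a fixed budget by the thread count therefore keeps the total spin time roughly constant, and the new hint can switch spinning off for I/O-bound pools. kSpinBudget is a hypothetical constant, and spin_iterations is not the ThreadPool's actual code.

```cpp
#include <cassert>

// Number of steal-loop passes a worker spins through before blocking.
int spin_iterations(int num_threads, bool allow_spinning) {
  assert(num_threads >= 1);
  const int kSpinBudget = 5000;  // hypothetical total budget
  if (!allow_spinning) return 0;  // hint: no spin waiting at all
  return kSpinBudget / num_threads;
}
```

Under these assumptions, 4 threads get 1250 passes each, and each pass is about 4 times as expensive as with a single thread. The product, and so the spin time, stays near the budget.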
Luke Iwanski
1b32a10053
Use name to distinguish name instead of the vendor
2017-03-08 18:26:34 +00:00
Gael Guennebaud
970ff78294
bug #1401 : fix compilation of "cond ? x : -x" with x an AutoDiffScalar
2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a
Adding sycl Benchmarks.
2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533
Fixing potential race condition on sycl device.
2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95
Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.
2017-03-07 14:27:10 +00:00
Julian Kent
bbe717fa2f
Make scaling work with non-square matrices
2017-03-03 12:58:51 +01:00
Benoit Steiner
a71943b9a4
Made the Tensor code compile with clang 3.9
2017-03-02 10:47:29 -08:00
Benoit Steiner
1e2d046651
Silenced a couple of compilation warnings
2017-03-01 10:13:42 -08:00
Benoit Steiner
c92406d613
Silenced clang compilation warning.
2017-02-28 17:03:11 -08:00
Benoit Steiner
de7b0fdea9
Made the TensorStorage class compile with clang 3.9
2017-02-28 13:52:22 -08:00
Mehdi Goli
8296b87d7b
Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.
2017-02-28 17:16:14 +00:00
Gael Guennebaud
478a9f53be
Fix typo.
2017-02-28 09:32:45 +01:00
Benoit Steiner
e0bd6f5738
Merged eigen/eigen into default
2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9
Adding TensorVolumePatchOP.h for sycl
2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137
Converting fixed float type into template type for TensorContraction.
2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae
Adding Sycl Backend for TensorGenerator.h.
2017-02-22 16:36:24 +00:00
Gael Guennebaud
d8b1f6cebd
bug #1380 : fix Map<> as input of matrix exponential
2017-02-20 14:06:06 +01:00
Mehdi Goli
79ebc8f761
Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h.
2017-02-20 12:11:05 +00:00
Gael Guennebaud
a811a04696
Silence warning.
2017-02-20 10:14:21 +01:00
Gael Guennebaud
f8a55cc062
Fix compilation.
2017-02-18 10:08:13 +01:00
Benoit Steiner
cfa0568ef7
Size indices are signed.
2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0
Adding TensorLayoutSwapOp for sycl.
2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6
Adding TensorPatch.h for sycl backend.
2017-02-15 10:13:01 +00:00
Mehdi Goli
0d153ded29
Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test
2017-02-13 17:25:12 +00:00
Benoit Steiner
769208a17f
Pulled latest updates from upstream
2017-02-10 13:11:40 -08:00
Mehdi Goli
0ee97b60c2
Adding mean to TensorReductionSycl.h
2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b
Fixing TensorReductionSycl for min and max.
2017-02-06 18:05:23 +00:00
Mehdi Goli
bc128f9f3b
Reducing the warnings in Sycl backend.
2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30
Silenced several compilation warnings
2017-02-01 15:50:58 -08:00
Mehdi Goli
bab29936a1
Reducing warnings in Sycl backend.
2017-02-01 15:29:53 +00:00
Mehdi Goli
48a20b7d95
Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h
2017-01-31 14:06:36 +00:00
Benoit Steiner
fbc39fd02c
Merge latest changes from upstream
2017-01-30 15:25:57 -08:00
Gael Guennebaud
63de19c000
bug #1380 : fix matrix exponential with Map<>
2017-01-30 13:55:27 +01:00
Mehdi Goli
82ce92419e
Fixing the buffer type in memcpy.
2017-01-30 11:38:20 +00:00
Rasmus Munk Larsen
edaa0fc5d1
Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
...
This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579
Merged in ggael/eigen-flexidexing (pull request PR-294)
...
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
607be65a03
Fix duplicates of array_size between unsupported and Core
2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
e6b1020221
Adds a fast memcpy function to Eigen. This takes advantage of the following:
...
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                       Base (ns)   New (ns)  Improvement
-----------------------------------------------------------------
BM_memcpy_1T/2                       3.48       2.39       +31.3%
BM_memcpy_1T/8                       12.3       6.51       +47.0%
BM_memcpy_1T/64                       371        383        -3.2%
BM_memcpy_1T/512                    66922      66720        +0.3%
BM_memcpy_1T/4k                   9892867    6849682       +30.8%
BM_memcpy_1T/5k                  14951099   10332856       +30.9%
BM_memcpy_2T/2                       3.50       2.46       +29.7%
BM_memcpy_2T/8                       12.3       7.66       +37.7%
BM_memcpy_2T/64                       371        376        -1.3%
BM_memcpy_2T/512                    66652      66788        -0.2%
BM_memcpy_2T/4k                   6145012    6117776        +0.4%
BM_memcpy_2T/5k                   9181478    9010942        +1.9%
BM_memcpy_4T/2                       3.47       2.47       +31.0%
BM_memcpy_4T/8                       12.3       6.67       +45.8%
BM_memcpy_4T/64                       374        376        -0.5%
BM_memcpy_4T/512                    67833      68019        -0.3%
BM_memcpy_4T/4k                   5057425    5188253        -2.6%
BM_memcpy_4T/5k                   7555638    7779468        -3.0%
BM_memcpy_6T/2                       3.51       2.50       +28.8%
BM_memcpy_6T/8                       12.3       7.61       +38.1%
BM_memcpy_6T/64                       373        378        -1.3%
BM_memcpy_6T/512                    66871      66774        +0.1%
BM_memcpy_6T/4k                   5112975    5233502        -2.4%
BM_memcpy_6T/5k                   7614180    7772246        -2.1%
BM_memcpy_8T/2                       3.47       2.41       +30.5%
BM_memcpy_8T/8                       12.4       10.5       +15.3%
BM_memcpy_8T/64                       372        388        -4.3%
BM_memcpy_8T/512                    67373      66588        +1.2%
BM_memcpy_8T/4k                   5148462    5254897        -2.1%
BM_memcpy_8T/5k                   7660989    7799058        -1.8%
BM_memcpy_12T/2                      3.50       2.40       +31.4%
BM_memcpy_12T/8                      12.4       7.55       +39.1%
BM_memcpy_12T/64                      374        378        -1.1%
BM_memcpy_12T/512                   67132      66683        +0.7%
BM_memcpy_12T/4k                  5185125    5292920        -2.1%
BM_memcpy_12T/5k                  7717284    7942684        -2.9%
BM_slicingSmallPieces_1T/2           47.3       47.5        +0.4%
BM_slicingSmallPieces_1T/8           53.6       52.3        +2.4%
BM_slicingSmallPieces_1T/64           491        476        +3.1%
BM_slicingSmallPieces_1T/512        21734      18814       +13.4%
BM_slicingSmallPieces_1T/4k        394660     396760        -0.5%
BM_slicingSmallPieces_1T/5k        218722     209244        +4.3%
BM_slicingSmallPieces_2T/2           80.7       79.9        +1.0%
BM_slicingSmallPieces_2T/8           54.2       53.1        +2.0%
BM_slicingSmallPieces_2T/64           497        477        +4.0%
BM_slicingSmallPieces_2T/512        21732      18822       +13.4%
BM_slicingSmallPieces_2T/4k        392885     390490        +0.6%
BM_slicingSmallPieces_2T/5k        221988     208678        +6.0%
BM_slicingSmallPieces_4T/2           80.8       80.1        +0.9%
BM_slicingSmallPieces_4T/8           54.1       53.2        +1.7%
BM_slicingSmallPieces_4T/64           493        476        +3.4%
BM_slicingSmallPieces_4T/512        21702      18758       +13.6%
BM_slicingSmallPieces_4T/4k        393962     404023        -2.6%
BM_slicingSmallPieces_4T/5k        249667     211732       +15.2%
BM_slicingSmallPieces_6T/2           80.5       80.1        +0.5%
BM_slicingSmallPieces_6T/8           54.4       53.4        +1.8%
BM_slicingSmallPieces_6T/64           488        478        +2.0%
BM_slicingSmallPieces_6T/512        21719      18841       +13.3%
BM_slicingSmallPieces_6T/4k        394950     397583        -0.7%
BM_slicingSmallPieces_6T/5k        223080     210148        +5.8%
BM_slicingSmallPieces_8T/2           81.2       80.4        +1.0%
BM_slicingSmallPieces_8T/8           58.1       53.5        +7.9%
BM_slicingSmallPieces_8T/64           489        480        +1.8%
BM_slicingSmallPieces_8T/512        21586      18798       +12.9%
BM_slicingSmallPieces_8T/4k        394592     400165        -1.4%
BM_slicingSmallPieces_8T/5k        219688     208301        +5.2%
BM_slicingSmallPieces_12T/2          80.2       79.8        +0.7%
BM_slicingSmallPieces_12T/8          54.4       53.4        +1.8%
BM_slicingSmallPieces_12T/64          488        476        +2.5%
BM_slicingSmallPieces_12T/512       21931      18831       +14.1%
BM_slicingSmallPieces_12T/4k       393962     396541        -0.7%
BM_slicingSmallPieces_12T/5k       218803     207965        +5.0%
2017-01-24 13:55:18 -08:00