eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	6190aa5632	bug #1567 : add optimized path for tensor broadcasting and 'Channel First' shape	2018-07-09 11:23:16 +02:00
Deven Desai	1bb6fa99a3	merging the CUDA and HIP implementation for the Tensor directory and the unit tests	2018-06-20 16:44:58 -04:00
Deven Desai	cfdabbcc8f	removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	2018-06-20 12:57:02 -04:00
Deven Desai	7e41c8f1a9	renaming Cuda files to Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	2018-06-20 12:52:30 -04:00
Deven Desai	b6cc0961b1	updates based on PR feedback There are two major changes (and a few minor ones which are not listed here...see PR discussion for details) 1. Eigen::half implementations for HIP and CUDA have been merged. This means that - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h` - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h` - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h` After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install. 2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate. - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)` - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)` - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`	2018-06-14 10:21:54 -04:00
Deven Desai	d1d22ef0f4	syncing this fork with upstream	2018-06-13 12:09:52 -04:00
Benoit Steiner	d3a380af4d	Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403) Derivative of the incomplete Gamma function and the sample of a Gamma random variable Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>	2018-06-11 17:57:47 +00:00
Jonathan Liu	b7689bded9	Use std::complex constructor instead of assignment from scalar Fixes GCC conversion to non-scalar type requested compile error when using boost::multiprecision::cpp_dec_float_50 as scalar type.	2018-06-28 00:32:37 +10:00
Rasmus Munk Larsen	5418154a45	Fix oversharding bug in parallelFor.	2018-06-20 17:51:48 -07:00
Gael Guennebaud	7933267c67	fix prototype	2018-06-08 09:56:01 +02:00
Michael Figurnov	30fa3d0454	Merge from eigen/eigen	2018-06-07 17:57:56 +01:00
Michael Figurnov	6c71c7d360	Merge from eigen/eigen.	2018-06-07 15:54:18 +01:00
Gael Guennebaud	37348d03ae	Fix int versus Index	2018-06-07 15:56:43 +02:00
Michael Figurnov	aa813d417b	Fix compilation of special functions without C99 math. The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly, causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not actually require C99 math, so now they are always available.	2018-06-07 14:35:07 +01:00
Gael Guennebaud	b3fd93207b	Fix typos found using codespell	2018-06-07 14:43:02 +02:00
Michael Figurnov	5172a32849	Updated the stopping criteria in igammac_cf_impl. Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.	2018-06-07 12:03:58 +01:00
Michael Figurnov	4bd158fa37	Derivative of the incomplete Gamma function and the sample of a Gamma random variable. In addition to igamma(a, x), this code implements: * igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter * gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.	2018-06-06 18:49:26 +01:00
Deven Desai	8fbd47052b	Adding support for using Eigen in HIP kernels. This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs. Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor) Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.	2018-06-06 10:12:58 -04:00
Benoit Steiner	e206f8d4a4	Merged in mfigurnov/eigen (pull request PR-400) Exponentially scaled modified Bessel functions of order zero and one. Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>	2018-06-05 17:05:21 +00:00
Penporn Koanantakool	e2ed0cf8ab	Add a ThreadPoolInterface* getter for ThreadPoolDevice.	2018-06-02 12:07:49 -07:00
Michael Figurnov	f216854453	Exponentially scaled modified Bessel functions of order zero and one. The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x) The code is ported from Cephes and tested against SciPy.	2018-05-31 15:34:53 +01:00
Katrin Leinweber	ea94543190	Hyperlink DOIs against preferred resolver	2018-05-24 18:55:40 +02:00
Vamsi Sripathi	6293ad3f39	Performance improvements to tensor broadcast operation 1. Added new packet functions using SIMD for NByOne, OneByN cases 2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD 3. Added 4 test cases to cover the new packet functions	2018-05-23 14:02:05 -07:00
Benoit Steiner	0371380d5b	Merged in rmlarsen/eigen2 (pull request PR-393) Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.	2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen	b8d36774fa	Rename clip2 to clamp.	2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen	812480baa3	Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow.	2018-05-16 09:49:24 -07:00
Benoit Steiner	1403c2c15b	Merged in didierjansen/eigen (pull request PR-360) Fix bugs and typos in the contraction example of the tensor README	2018-05-16 01:16:36 +00:00
Rasmus Munk Larsen	b8c8e5f436	Add vectorized clip functor for Eigen Tensors.	2018-05-14 16:07:13 -07:00
Benoit Steiner	6118c6ff4f	Enable RawAccess to tensor slices whenever possible. Avoid 32-bit integer overflow in TensorSlicingOp	2018-04-30 11:28:12 -07:00
Weiming Zhao	b0eda3cb9f	Avoid using memcpy for non-POD elements	2018-04-11 11:37:06 +02:00
Gael Guennebaud	67bac6368c	protect calls to isnan	2018-04-03 14:19:04 +02:00
Gael Guennebaud	524119d32a	Fix uninitialized output argument.	2018-04-03 10:56:10 +02:00
Viktor Csomor	000840cae0	Added a move constructor and move assignment operator to Tensor and wrote some tests.	2018-02-07 19:10:54 +01:00
Eugene Zhulenev	c95aacab90	Fix TensorContractionOp evaluators for GPU and SYCL	2018-07-17 14:09:37 -07:00
Deven Desai	f124f07965	applying EIGEN_DECLARE_TEST to gpu tests Also, a few minor fixes for GPU tests running in HIP mode. 1. Adding an include for hip/hip_runtime.h in the Macros.h file For HIP __host__ and __device__ are macros which are defined in hip headers. Their definitions need to be included before their use in the file. 2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel" 3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit	2018-07-17 14:16:48 -04:00
Gael Guennebaud	82f0ce2726	Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }. This provides several advantages: - more flexibility in designing unit tests - unit tests can be glued to speed up compilation - unit tests are compiled with same predefined macros, which is a requirement for zapcc	2018-07-17 14:46:15 +02:00
Eugene Zhulenev	43206ac4de	Call OutputKernel in evalGemv	2018-07-12 14:52:23 -07:00
Eugene Zhulenev	e204ecdaaf	Remove SimpleThreadPool and always use {NonBlocking}ThreadPool	2018-07-16 15:06:57 -07:00
Eugene Zhulenev	01fd4096d3	Fuse computations into the Tensor contractions using output kernel	2018-07-10 13:16:38 -07:00
Gael Guennebaud	5539587b1f	Some warning fixes	2018-07-17 10:29:12 +02:00
Benoit Steiner	8f55956a57	Update the padding computation for PADDING_SAME to be consistent with TensorFlow.	2018-01-30 20:22:12 +00:00
Lee.Deokjae	5b3c367926	Fix typos in the contraction example of tensor README	2018-01-06 14:36:19 +09:00
RJ Ryan	59985cfd26	Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689	2017-12-31 10:44:56 -05:00
Gael Guennebaud	73214c4bd0	Workaround nvcc 9.0 issue. See PR 351. https://bitbucket.org/eigen/eigen/pull-requests/351	2017-12-15 14:10:59 +01:00
Yangzihao Wang	3122477c86	Update the padding computation for PADDING_SAME to be consistent with TensorFlow.	2017-12-12 11:15:24 -08:00
Rasmus Munk Larsen	e900b010c8	Improve robustness of igamma and igammac to bad inputs. Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments. Original change by mfirgunov@google.com	2018-03-19 09:04:54 -07:00
Gael Guennebaud	00bc67c374	Move KLU support to official	2017-11-10 14:11:22 +01:00
Gael Guennebaud	b82cd93c01	KLU: truly disable unimplemented code, add proper static assertions in solve	2017-11-10 14:09:01 +01:00
Gael Guennebaud	8cf63ccb99	Merged in kylemacfarlan/eigen (pull request PR-337) Add support for SuiteSparse's KLU routines	2017-11-10 10:43:17 +00:00
Gael Guennebaud	1495b98a8e	Merged in spraetor/eigen (pull request PR-305) Issue with mpreal and std::numeric_limits::digits	2017-11-10 10:28:54 +00:00
Gael Guennebaud	fc45324380	Merged in jkflying/eigen-fix-scaling (pull request PR-302) Make scaling work with non-square matrices	2017-11-10 10:11:36 +00:00
Gael Guennebaud	1b2dcf9a47	Check that Schur decomposition succeeds.	2017-11-10 10:26:09 +01:00
Gael Guennebaud	0a1cc73942	bug #1484 : restore deleted line for 128 bits long doubles, and improve dispatching logic.	2017-11-10 10:25:41 +01:00
Benoit Steiner	3949615176	Merged in JonasMu/eigen (pull request PR-329) Added an example for a contraction to a scalar value to README.md Approved-by: Jonas Harsch <jonas.harsch@gmail.com>	2017-10-27 07:27:46 +00:00
Benoit Steiner	8eb4b9d254	Merged in benoitsteiner/opencl (pull request PR-341)	2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen	f349507e02	Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example this reduces binary size of a TensorFlow demo app for Android by about 2.5%.	2017-10-13 15:58:12 -07:00
Kyle Vedder	c0e1d510fd	Add support for SuiteSparse's KLU routines	2017-10-04 21:01:23 -05:00
Mehdi Goli	2062ac9958	Changes required for new ComputeCpp CE version.	2017-09-18 18:17:39 +01:00
Rasmus Munk Larsen	1b7294f6fc	Fix cut-and-paste error.	2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen	94e2213b38	Avoid undefined behavior in Eigen::TensorCostModel::numThreads. If the cost is large enough then the thread count can be larger than the maximum representable int, so just casting it to an int is undefined behavior. Contributed by phurst@google.com.	2017-09-08 15:49:55 -07:00
Gael Guennebaud	a91918a105	Merged in infinitei/eigen (pull request PR-328) bug #1464 : Fixes construction of EulerAngles from 3D vector expression. Approved-by: Tal Hadad <tal_hd@hotmail.com> Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>	2017-09-06 08:42:14 +00:00
Jonas Harsch	a991c80365	Added an example for a contraction to a scalar value, e.g. a double contraction of two second order tensors and how you can get the value of the result. I lost one day to get this done so I think it will help some guys. I also added Eigen:: to the IndexPair and array in the same example.	2017-09-01 11:30:26 +00:00
Benoit Steiner	a4089991eb	Added support for CUDA 9.0.	2017-08-31 02:49:39 +00:00
Abhijit Kundu	6d991a9595	bug #1464 : Fixes construction of EulerAngles from 3D vector expression.	2017-08-30 13:26:30 -04:00
Benoit Steiner	84d7be103a	Fixing Argmax that was breaking upstream TensorFlow.	2017-07-22 03:19:34 +00:00
Benoit Steiner	f0b154a4b0	Code cleanup	2017-07-10 09:54:09 -07:00
Benoit Steiner	575cda76b3	Fixed syntax errors generated by xcode	2017-07-09 11:39:01 -07:00
Benoit Steiner	5ac27d5b51	Avoid relying on cxx11 features when possible.	2017-07-08 21:58:44 -07:00
Benoit Steiner	c5a241ab9b	Merged in benoitsteiner/opencl (pull request PR-323) Improved support for OpenCL	2017-07-07 16:27:33 +00:00
Benoit Steiner	b7ae4dd9ef	Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315) Add labels to #ifdef, in TensorReductionCuda.h	2017-07-07 04:23:52 +00:00
Benoit Steiner	9daed67952	Merged in tntnatbry/eigen (pull request PR-319) Tensor Trace op	2017-07-07 04:18:03 +00:00
Benoit Steiner	6795512e59	Improved the randomness of the tensor random generator	2017-07-06 21:12:45 -07:00
Benoit Steiner	dc524ac716	Fixed compilation warning	2017-07-06 21:11:15 -07:00
Benoit Steiner	62b4634ebe	Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14) Applying Benoit's comment for Fixing ImageVolumePatch. * Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file. * Fixing deallocation of the memory in ImagePatch test for SYCL. * Fixing the automerge issue.	2017-07-06 05:08:13 +00:00
Benoit Steiner	53725c10b8	Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10) DataDependancy * Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code. * Applying Ronnan's Comments. * Applying benoit's comments	2017-06-28 17:55:23 +00:00
Benoit Steiner	b8e805497e	Merged in benoitsteiner/opencl (pull request PR-318) Improved support for OpenCL	2017-06-13 05:01:10 +00:00
Hugh Perkins	9341f258d4	Add labels to #ifdef, in TensorReductionCuda.h	2017-06-06 15:51:06 +01:00
Benoit Steiner	1e736b9ead	Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7) Fixing SYCL alignment issue required by TensorFlow.	2017-05-26 17:23:00 +00:00
Benoit Steiner	9dee55ec33	Merged eigen/eigen into default	2017-05-26 09:01:04 -07:00
Mehdi Goli	0370d3576e	Applying Ronnan's comments.	2017-05-26 16:01:48 +01:00
Mehdi Goli	e3f964ed55	Applying Benoit's comment;removing dead code.	2017-05-25 11:17:26 +01:00
a-doumoulakis	fb853a857a	Restore misplaced comment	2017-05-24 17:50:15 +01:00
a-doumoulakis	7a8ba565f8	Merge changed from upstream	2017-05-24 17:45:29 +01:00
Mmanu Chaturvedi	2971503fed	Specializing numeric_limits For AutoDiffScalar	2017-05-23 17:12:36 -04:00
Gael Guennebaud	26e8f9171e	Fix compilation of matrix log with Map as input	2017-06-07 10:51:23 +02:00
Mehdi Goli	76c0fc1f95	Fixing SYCL alignment issue required by TensorFlow.	2017-05-22 16:49:32 +01:00
Mehdi Goli	2d17128d6f	Fixing supported device list.	2017-05-22 16:40:33 +01:00
a-doumoulakis	052426b824	Add support for triSYCL Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options Fix contraction kernel with correct nd_item dimension	2017-05-05 19:26:27 +01:00
RJ Ryan	949a2da38c	Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer. Improves support for std::complex types when compiling for CUDA. Expands on `e2e9cdd169` and `2bda1b0d93` .	2017-04-14 13:23:35 -07:00
Benoit Steiner	0d08165a7f	Merged in benoitsteiner/opencl (pull request PR-309) OpenCL improvements	2017-04-05 14:28:08 +00:00
Benoit Steiner	c302ea7bc4	Deleted empty line of code	2017-04-04 10:05:16 -07:00
Benoit Steiner	a5a0c8fac1	Guard sycl specific code under a EIGEN_USE_SYCL ifdef	2017-04-04 10:03:21 -07:00
Benoit Steiner	a1304b95b7	Code cleanup	2017-04-04 10:00:46 -07:00
Benoit Steiner	66c63826bd	Guard the sycl specific code with EIGEN_USE_SYCL	2017-04-04 09:59:09 -07:00
Benoit Steiner	e3e343390a	Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL	2017-04-04 09:56:33 -07:00
Benoit Steiner	63840d4666	Gate the sycl specific code under a EIGEN_USE_SYCL define	2017-04-04 09:54:31 -07:00
Benoit Steiner	bc050ea9f0	Fixed compilation error when sycl is enabled.	2017-04-04 09:47:04 -07:00
Gagan Goel	4910630c96	fix typos in the Tensor readme	2017-03-31 20:32:16 -04:00
Benoit Steiner	c1b3d5ecb6	Restored code compatibility with compilers that don't support c++11 Gated more sycl code under #ifdef sycl	2017-03-31 08:31:28 -07:00
Benoit Steiner	e2d5d4e7b3	Restore the old constructors to retain compatibility with non c++11 compilers.	2017-03-31 08:26:13 -07:00
Benoit Steiner	73fcaa319f	Gate the sycl specific code under #ifdef sycl	2017-03-31 08:22:25 -07:00
Mehdi Goli	bd64ee8555	Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.	2017-03-28 16:50:34 +01:00
Simon Praetorius	511810797e	Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() traits in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport.	2017-03-24 17:45:56 +01:00
Luke Iwanski	a91417a7a5	Introduces align allocator for SYCL buffer	2017-03-20 14:48:54 +00:00
Benoit Steiner	f8a622ef3c	Merged eigen/eigen into default	2017-03-15 20:06:19 -07:00
Benoit Steiner	fd7db52f9b	Silenced compilation warning	2017-03-15 20:02:39 -07:00
Luke Iwanski	c06861d15e	Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration	2017-03-15 19:26:08 +00:00
Benoit Steiner	f0f3591118	Made the reduction code compile with cuda-clang	2017-03-14 14:16:53 -07:00
Mehdi Goli	f499fe9496	Adding synchronisation to convolution kernel for sycl backend.	2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen	bfd7bf9c5b	Get rid of Init().	2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen	d56ab01094	Use C++11 ctor forwarding to simplify code a bit.	2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen	344c2694a6	Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases. * Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O. * This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time. * Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for. * Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().	2017-03-09 15:41:03 -08:00
Luke Iwanski	1b32a10053	Use name to distinguish name instead of the vendor	2017-03-08 18:26:34 +00:00
Gael Guennebaud	970ff78294	bug #1401 : fix compilation of "cond ? x : -x" with x an AutoDiffScalar	2017-03-08 16:16:53 +01:00
Mehdi Goli	5e9a1e7a7a	Adding sycl Benchmarks.	2017-03-08 14:17:48 +00:00
Mehdi Goli	e2e3f78533	Fixing potential race condition on sycl device.	2017-03-07 17:48:15 +00:00
Mehdi Goli	f84963ed95	Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.	2017-03-07 14:27:10 +00:00
Julian Kent	bbe717fa2f	Make scaling work with non-square matrices	2017-03-03 12:58:51 +01:00
Benoit Steiner	a71943b9a4	Made the Tensor code compile with clang 3.9	2017-03-02 10:47:29 -08:00
Benoit Steiner	1e2d046651	Silenced a couple of compilation warnings	2017-03-01 10:13:42 -08:00
Benoit Steiner	c92406d613	Silenced clang compilation warning.	2017-02-28 17:03:11 -08:00
Benoit Steiner	de7b0fdea9	Made the TensorStorage class compile with clang 3.9	2017-02-28 13:52:22 -08:00
Mehdi Goli	8296b87d7b	Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.	2017-02-28 17:16:14 +00:00
Gael Guennebaud	478a9f53be	Fix typo.	2017-02-28 09:32:45 +01:00
Benoit Steiner	e0bd6f5738	Merged eigen/eigen into default	2017-02-26 10:02:14 -08:00
Mehdi Goli	2fa2b617a9	Adding TensorVolumePatchOP.h for sycl	2017-02-24 19:16:24 +00:00
Mehdi Goli	0b7875f137	Converting fixed float type into template type for TensorContraction.	2017-02-24 18:13:30 +00:00
Mehdi Goli	89dfd51fae	Adding Sycl Backend for TensorGenerator.h.	2017-02-22 16:36:24 +00:00
Gael Guennebaud	d8b1f6cebd	bug #1380 : for Map<> as input of matrix exponential	2017-02-20 14:06:06 +01:00
Mehdi Goli	79ebc8f761	Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h.	2017-02-20 12:11:05 +00:00
Gael Guennebaud	a811a04696	Silent warning.	2017-02-20 10:14:21 +01:00
Gael Guennebaud	f8a55cc062	Fix compilation.	2017-02-18 10:08:13 +01:00
Benoit Steiner	cfa0568ef7	Size indices are signed.	2017-02-16 10:13:34 -08:00
Mehdi Goli	91982b91c0	Adding TensorLayoutSwapOp for sycl.	2017-02-15 16:28:12 +00:00
Mehdi Goli	b1e312edd6	Adding TensorPatch.h for sycl backend.	2017-02-15 10:13:01 +00:00
Mehdi Goli	0d153ded29	Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test	2017-02-13 17:25:12 +00:00
Benoit Steiner	769208a17f	Pulled latest updates from upstream	2017-02-10 13:11:40 -08:00
Mehdi Goli	0ee97b60c2	Adding mean to TensorReductionSycl.h	2017-02-07 15:43:17 +00:00
Mehdi Goli	42bd5c4e7b	Fixing TensorReductionSycl for min and max.	2017-02-06 18:05:23 +00:00
Mehdi Goli	bc128f9f3b	Reducing the warnings in Sycl backend.	2017-02-02 10:43:47 +00:00
Benoit Steiner	442e9cbb30	Silenced several compilation warnings	2017-02-01 15:50:58 -08:00
Mehdi Goli	bab29936a1	Reducing warnings in Sycl backend.	2017-02-01 15:29:53 +00:00
Mehdi Goli	48a20b7d95	Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h	2017-01-31 14:06:36 +00:00
Benoit Steiner	fbc39fd02c	Merge latest changes from upstream	2017-01-30 15:25:57 -08:00
Gael Guennebaud	63de19c000	bug #1380 : fix matrix exponential with Map<>	2017-01-30 13:55:27 +01:00
Mehdi Goli	82ce92419e	Fixing the buffer type in memcpy.	2017-01-30 11:38:20 +00:00
Rasmus Munk Larsen	edaa0fc5d1	Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.	2017-01-26 12:46:06 -08:00
Gael Guennebaud	25a1703579	Merged in ggael/eigen-flexidexing (pull request PR-294) generalized operator() for indexed access and slicing	2017-01-26 08:04:23 +00:00
Gael Guennebaud	607be65a03	Fix duplicates of array_size between unsupported and Core	2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen	e6b1020221	Adds a fast memcpy function to Eigen. This takes advantage of the following: 1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster. 2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation. The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}. 
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                      Base (ns)  New (ns)  Improvement
---------------------------------------------------------------
BM_memcpy_1T/2                      3.48      2.39       +31.3%
BM_memcpy_1T/8                      12.3      6.51       +47.0%
BM_memcpy_1T/64                      371       383        -3.2%
BM_memcpy_1T/512                   66922     66720        +0.3%
BM_memcpy_1T/4k                  9892867   6849682       +30.8%
BM_memcpy_1T/5k                 14951099  10332856       +30.9%
BM_memcpy_2T/2                      3.50      2.46       +29.7%
BM_memcpy_2T/8                      12.3      7.66       +37.7%
BM_memcpy_2T/64                      371       376        -1.3%
BM_memcpy_2T/512                   66652     66788        -0.2%
BM_memcpy_2T/4k                  6145012   6117776        +0.4%
BM_memcpy_2T/5k                  9181478   9010942        +1.9%
BM_memcpy_4T/2                      3.47      2.47       +31.0%
BM_memcpy_4T/8                      12.3      6.67       +45.8%
BM_memcpy_4T/64                      374       376        -0.5%
BM_memcpy_4T/512                   67833     68019        -0.3%
BM_memcpy_4T/4k                  5057425   5188253        -2.6%
BM_memcpy_4T/5k                  7555638   7779468        -3.0%
BM_memcpy_6T/2                      3.51      2.50       +28.8%
BM_memcpy_6T/8                      12.3      7.61       +38.1%
BM_memcpy_6T/64                      373       378        -1.3%
BM_memcpy_6T/512                   66871     66774        +0.1%
BM_memcpy_6T/4k                  5112975   5233502        -2.4%
BM_memcpy_6T/5k                  7614180   7772246        -2.1%
BM_memcpy_8T/2                      3.47      2.41       +30.5%
BM_memcpy_8T/8                      12.4      10.5       +15.3%
BM_memcpy_8T/64                      372       388        -4.3%
BM_memcpy_8T/512                   67373     66588        +1.2%
BM_memcpy_8T/4k                  5148462   5254897        -2.1%
BM_memcpy_8T/5k                  7660989   7799058        -1.8%
BM_memcpy_12T/2                     3.50      2.40       +31.4%
BM_memcpy_12T/8                     12.4      7.55       +39.1%
BM_memcpy_12T/64                     374       378        -1.1%
BM_memcpy_12T/512                  67132     66683        +0.7%
BM_memcpy_12T/4k                 5185125   5292920        -2.1%
BM_memcpy_12T/5k                 7717284   7942684        -2.9%
BM_slicingSmallPieces_1T/2          47.3      47.5        +0.4%
BM_slicingSmallPieces_1T/8          53.6      52.3        +2.4%
BM_slicingSmallPieces_1T/64          491       476        +3.1%
BM_slicingSmallPieces_1T/512       21734     18814       +13.4%
BM_slicingSmallPieces_1T/4k       394660    396760        -0.5%
BM_slicingSmallPieces_1T/5k       218722    209244        +4.3%
BM_slicingSmallPieces_2T/2          80.7      79.9        +1.0%
BM_slicingSmallPieces_2T/8          54.2      53.1        +2.0%
BM_slicingSmallPieces_2T/64          497       477        +4.0%
BM_slicingSmallPieces_2T/512       21732     18822       +13.4%
BM_slicingSmallPieces_2T/4k       392885    390490        +0.6%
BM_slicingSmallPieces_2T/5k       221988    208678        +6.0%
BM_slicingSmallPieces_4T/2          80.8      80.1        +0.9%
BM_slicingSmallPieces_4T/8          54.1      53.2        +1.7%
BM_slicingSmallPieces_4T/64          493       476        +3.4%
BM_slicingSmallPieces_4T/512       21702     18758       +13.6%
BM_slicingSmallPieces_4T/4k       393962    404023        -2.6%
BM_slicingSmallPieces_4T/5k       249667    211732       +15.2%
BM_slicingSmallPieces_6T/2          80.5      80.1        +0.5%
BM_slicingSmallPieces_6T/8          54.4      53.4        +1.8%
BM_slicingSmallPieces_6T/64          488       478        +2.0%
BM_slicingSmallPieces_6T/512       21719     18841       +13.3%
BM_slicingSmallPieces_6T/4k       394950    397583        -0.7%
BM_slicingSmallPieces_6T/5k       223080    210148        +5.8%
BM_slicingSmallPieces_8T/2          81.2      80.4        +1.0%
BM_slicingSmallPieces_8T/8          58.1      53.5        +7.9%
BM_slicingSmallPieces_8T/64          489       480        +1.8%
BM_slicingSmallPieces_8T/512       21586     18798       +12.9%
BM_slicingSmallPieces_8T/4k       394592    400165        -1.4%
BM_slicingSmallPieces_8T/5k       219688    208301        +5.2%
BM_slicingSmallPieces_12T/2         80.2      79.8        +0.7%
BM_slicingSmallPieces_12T/8         54.4      53.4        +1.8%
BM_slicingSmallPieces_12T/64         488       476        +2.5%
BM_slicingSmallPieces_12T/512      21931     18831       +14.1%
BM_slicingSmallPieces_12T/4k      393962    396541        -0.7%
BM_slicingSmallPieces_12T/5k      218803    207965        +5.0%	2017-01-24 13:55:18 -08:00
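
Illustration for commit f216854453 in the list above, which notes that the exponentially scaled form is more numerically stable and that i0(x) = exp(|x|) i0e(x). The following is a minimal Python sketch of that relation, not Eigen's implementation (which, per the commit message, is ported from Cephes). It assumes the standard integral representation I0(x) = (1/pi) * integral from 0 to pi of exp(x cos t) dt and a plain trapezoid rule; the function names and the step count n are hypothetical and chosen for this example only.

```python
import math


def i0e(x, n=2000):
    """Sketch of exp(-|x|) * I0(x), the exponentially scaled Bessel function.

    Assumption: uses the integral representation
        I0(x) = (1/pi) * integral_0^pi exp(x * cos(t)) dt
    with the factor exp(-|x|) folded into the integrand, so every
    exponent is <= 0 and no intermediate value can overflow.
    """
    ax = abs(x)  # I0 is an even function, so only |x| matters
    h = math.pi / n
    # Trapezoid rule: the two endpoints (t = 0 and t = pi) get half weight.
    total = 0.5 * (1.0 + math.exp(-2.0 * ax))
    for k in range(1, n):
        total += math.exp(ax * (math.cos(k * h) - 1.0))
    return total * h / math.pi


def i0(x):
    """Unscaled I0 recovered through the relation i0(x) = exp(|x|) * i0e(x)."""
    return math.exp(abs(x)) * i0e(x)


if __name__ == "__main__":
    print(i0(1.0))
    print(i0e(1000.0))
```

Because every exponent inside i0e is non-positive, i0e(1000.0) evaluates without trouble, whereas i0(1000.0) raises OverflowError in Python since exp(1000) exceeds the double-precision range. That is the practical sense in which the scaled variant is the more stable one to expose.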

1986 Commits