eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Luke Iwanski	1b32a10053	Use name to distinguish name instead of the vendor	2017-03-08 18:26:34 +00:00
Mehdi Goli	5e9a1e7a7a	Adding sycl Benchmarks.	2017-03-08 14:17:48 +00:00
Mehdi Goli	e2e3f78533	Fixing potential race condition on sycl device.	2017-03-07 17:48:15 +00:00
Mehdi Goli	f84963ed95	Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.	2017-03-07 14:27:10 +00:00
Benoit Steiner	a71943b9a4	Made the Tensor code compile with clang 3.9	2017-03-02 10:47:29 -08:00
Benoit Steiner	1e2d046651	Silenced a couple of compilation warnings	2017-03-01 10:13:42 -08:00
Benoit Steiner	c92406d613	Silenced clang compilation warning.	2017-02-28 17:03:11 -08:00
Benoit Steiner	de7b0fdea9	Made the TensorStorage class compile with clang 3.9	2017-02-28 13:52:22 -08:00
Mehdi Goli	8296b87d7b	Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.	2017-02-28 17:16:14 +00:00
Gael Guennebaud	478a9f53be	Fix typo.	2017-02-28 09:32:45 +01:00
Benoit Steiner	e0bd6f5738	Merged eigen/eigen into default	2017-02-26 10:02:14 -08:00
Mehdi Goli	2fa2b617a9	Adding TensorVolumePatchOP.h for sycl	2017-02-24 19:16:24 +00:00
Mehdi Goli	0b7875f137	Converting fixed float type into template type for TensorContraction.	2017-02-24 18:13:30 +00:00
Mehdi Goli	89dfd51fae	Adding Sycl Backend for TensorGenerator.h.	2017-02-22 16:36:24 +00:00
Mehdi Goli	79ebc8f761	Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h.	2017-02-20 12:11:05 +00:00
Gael Guennebaud	a811a04696	Silent warning.	2017-02-20 10:14:21 +01:00
Gael Guennebaud	f8a55cc062	Fix compilation.	2017-02-18 10:08:13 +01:00
Benoit Steiner	cfa0568ef7	Size indices are signed.	2017-02-16 10:13:34 -08:00
Mehdi Goli	91982b91c0	Adding TensorLayoutSwapOp for sycl.	2017-02-15 16:28:12 +00:00
Mehdi Goli	b1e312edd6	Adding TensorPatch.h for sycl backend.	2017-02-15 10:13:01 +00:00
Mehdi Goli	0d153ded29	Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test	2017-02-13 17:25:12 +00:00
Benoit Steiner	769208a17f	Pulled latest updates from upstream	2017-02-10 13:11:40 -08:00
Mehdi Goli	0ee97b60c2	Adding mean to TensorReductionSycl.h	2017-02-07 15:43:17 +00:00
Mehdi Goli	42bd5c4e7b	Fixing TensorReductionSycl for min and max.	2017-02-06 18:05:23 +00:00
Mehdi Goli	bc128f9f3b	Reducing the warnings in Sycl backend.	2017-02-02 10:43:47 +00:00
Benoit Steiner	442e9cbb30	Silenced several compilation warnings	2017-02-01 15:50:58 -08:00
Mehdi Goli	bab29936a1	Reducing warnings in Sycl backend.	2017-02-01 15:29:53 +00:00
Mehdi Goli	48a20b7d95	Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h	2017-01-31 14:06:36 +00:00
Benoit Steiner	fbc39fd02c	Merge latest changes from upstream	2017-01-30 15:25:57 -08:00
Mehdi Goli	82ce92419e	Fixing the buffer type in memcpy.	2017-01-30 11:38:20 +00:00
Rasmus Munk Larsen	edaa0fc5d1	Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.	2017-01-26 12:46:06 -08:00
Gael Guennebaud	25a1703579	Merged in ggael/eigen-flexidexing (pull request PR-294) generalized operator() for indexed access and slicing	2017-01-26 08:04:23 +00:00
Gael Guennebaud	607be65a03	Fix duplicates of array_size between unsupported and Core	2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen	e6b1020221	Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation. The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                     Base (ns)   New (ns)  Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                     3.48       2.39   +31.3%
BM_memcpy_1T/8                     12.3       6.51   +47.0%
BM_memcpy_1T/64                     371        383    -3.2%
BM_memcpy_1T/512                  66922      66720    +0.3%
BM_memcpy_1T/4k                 9892867    6849682   +30.8%
BM_memcpy_1T/5k                14951099   10332856   +30.9%
BM_memcpy_2T/2                     3.50       2.46   +29.7%
BM_memcpy_2T/8                     12.3       7.66   +37.7%
BM_memcpy_2T/64                     371        376    -1.3%
BM_memcpy_2T/512                  66652      66788    -0.2%
BM_memcpy_2T/4k                 6145012    6117776    +0.4%
BM_memcpy_2T/5k                 9181478    9010942    +1.9%
BM_memcpy_4T/2                     3.47       2.47   +31.0%
BM_memcpy_4T/8                     12.3       6.67   +45.8%
BM_memcpy_4T/64                     374        376    -0.5%
BM_memcpy_4T/512                  67833      68019    -0.3%
BM_memcpy_4T/4k                 5057425    5188253    -2.6%
BM_memcpy_4T/5k                 7555638    7779468    -3.0%
BM_memcpy_6T/2                     3.51       2.50   +28.8%
BM_memcpy_6T/8                     12.3       7.61   +38.1%
BM_memcpy_6T/64                     373        378    -1.3%
BM_memcpy_6T/512                  66871      66774    +0.1%
BM_memcpy_6T/4k                 5112975    5233502    -2.4%
BM_memcpy_6T/5k                 7614180    7772246    -2.1%
BM_memcpy_8T/2                     3.47       2.41   +30.5%
BM_memcpy_8T/8                     12.4       10.5   +15.3%
BM_memcpy_8T/64                     372        388    -4.3%
BM_memcpy_8T/512                  67373      66588    +1.2%
BM_memcpy_8T/4k                 5148462    5254897    -2.1%
BM_memcpy_8T/5k                 7660989    7799058    -1.8%
BM_memcpy_12T/2                    3.50       2.40   +31.4%
BM_memcpy_12T/8                    12.4       7.55   +39.1%
BM_memcpy_12T/64                    374        378    -1.1%
BM_memcpy_12T/512                 67132      66683    +0.7%
BM_memcpy_12T/4k                5185125    5292920    -2.1%
BM_memcpy_12T/5k                7717284    7942684    -2.9%
BM_slicingSmallPieces_1T/2         47.3       47.5    +0.4%
BM_slicingSmallPieces_1T/8         53.6       52.3    +2.4%
BM_slicingSmallPieces_1T/64         491        476    +3.1%
BM_slicingSmallPieces_1T/512      21734      18814   +13.4%
BM_slicingSmallPieces_1T/4k      394660     396760    -0.5%
BM_slicingSmallPieces_1T/5k      218722     209244    +4.3%
BM_slicingSmallPieces_2T/2         80.7       79.9    +1.0%
BM_slicingSmallPieces_2T/8         54.2       53.1    +2.0%
BM_slicingSmallPieces_2T/64         497        477    +4.0%
BM_slicingSmallPieces_2T/512      21732      18822   +13.4%
BM_slicingSmallPieces_2T/4k      392885     390490    +0.6%
BM_slicingSmallPieces_2T/5k      221988     208678    +6.0%
BM_slicingSmallPieces_4T/2         80.8       80.1    +0.9%
BM_slicingSmallPieces_4T/8         54.1       53.2    +1.7%
BM_slicingSmallPieces_4T/64         493        476    +3.4%
BM_slicingSmallPieces_4T/512      21702      18758   +13.6%
BM_slicingSmallPieces_4T/4k      393962     404023    -2.6%
BM_slicingSmallPieces_4T/5k      249667     211732   +15.2%
BM_slicingSmallPieces_6T/2         80.5       80.1    +0.5%
BM_slicingSmallPieces_6T/8         54.4       53.4    +1.8%
BM_slicingSmallPieces_6T/64         488        478    +2.0%
BM_slicingSmallPieces_6T/512      21719      18841   +13.3%
BM_slicingSmallPieces_6T/4k      394950     397583    -0.7%
BM_slicingSmallPieces_6T/5k      223080     210148    +5.8%
BM_slicingSmallPieces_8T/2         81.2       80.4    +1.0%
BM_slicingSmallPieces_8T/8         58.1       53.5    +7.9%
BM_slicingSmallPieces_8T/64         489        480    +1.8%
BM_slicingSmallPieces_8T/512      21586      18798   +12.9%
BM_slicingSmallPieces_8T/4k      394592     400165    -1.4%
BM_slicingSmallPieces_8T/5k      219688     208301    +5.2%
BM_slicingSmallPieces_12T/2        80.2       79.8    +0.7%
BM_slicingSmallPieces_12T/8        54.4       53.4    +1.8%
BM_slicingSmallPieces_12T/64        488        476    +2.5%
BM_slicingSmallPieces_12T/512     21931      18831   +14.1%
BM_slicingSmallPieces_12T/4k     393962     396541    -0.7%
BM_slicingSmallPieces_12T/5k     218803     207965    +5.0%	2017-01-24 13:55:18 -08:00
Luke Iwanski	bf44fed9b7	Allows AMD APU	2017-01-23 15:56:45 +00:00
Mehdi Goli	602f8c27f5	Reverting back to the previous TensorDeviceSycl.h as the total number of buffer is not enough for tensorflow.	2017-01-20 18:23:20 +00:00
Mehdi Goli	77cc4d06c7	Removing unused variables	2017-01-19 17:06:21 +00:00
Mehdi Goli	837fdbdcb2	Merging with Benoit's upstream.	2017-01-19 11:34:34 +00:00
Mehdi Goli	6bdd15f572	Adding non-dereferenceable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos; modifying TensorDeviceSycl to use the LegacyPointer class.	2017-01-19 11:30:59 +00:00
Mehdi Goli	c6f7b33834	Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy.	2017-01-18 10:45:28 +00:00
Mehdi Goli	e46e722381	Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.	2017-01-16 13:58:49 +00:00
Gael Guennebaud	bbd97b4095	Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases	2017-07-17 01:02:51 +02:00
Luke Iwanski	90c5bc8d64	Fixes auto appearance in functor template argument for reduction.	2017-01-04 22:18:44 +00:00
Mehdi Goli	8b1c2108ba	Reverting asynchronous exec to Synchronous exec regarding random race condition.	2016-12-22 16:45:38 +00:00
Benoit Steiner	660da83e18	Pulled latest update from trunk	2016-12-21 16:43:27 -08:00
Benoit Steiner	4236aebe10	Simplified the contraction code	2016-12-21 16:42:56 -08:00
Benoit Steiner	3cfa16f41d	Merged in benoitsteiner/opencl (pull request PR-279) Fix for auto appearing in functor template argument.	2016-12-21 15:08:54 -08:00
Benoit Steiner	519d63d350	Added support for libxsmm kernel in multithreaded contractions	2016-12-21 15:06:06 -08:00
Benoit Steiner	f9eff17e91	Leverage libxsmm kernels within single threaded contractions	2016-12-21 12:32:06 -08:00
Luke Iwanski	c55ecfd820	Fix for auto appearing in functor template argument.	2016-12-21 15:42:51 +00:00
Luke Iwanski	29186f766f	Fixed order of initialisation in ExecExprFunctorKernel functor.	2016-12-20 21:32:42 +00:00
Luke Iwanski	8245851d1b	Matching parameters order between lambda and the functor.	2016-12-20 16:18:15 +00:00
Benoit Steiner	70d0172f0c	Merged eigen/eigen into default	2016-12-16 17:37:04 -08:00
Benoit Steiner	8910442e19	Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl.	2016-12-16 15:45:04 -08:00
Luke Iwanski	54db66c5df	struct -> class in order to silence compilation warning.	2016-12-16 20:25:20 +00:00
Mehdi Goli	35bae513a0	Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding tensorConcatinationOp backend for sycl.	2016-12-16 19:46:45 +00:00
Mehdi Goli	c5e8546306	Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.	2016-12-15 16:59:57 +00:00
Benoit Steiner	2c2e218471	Avoid using #define since they can conflict with user code	2016-12-14 19:49:15 -08:00
Benoit Steiner	3beb180ee5	Don't call EnvThread::OnCancel by default since it doesn't do anything.	2016-12-14 18:33:39 -08:00
Benoit Steiner	9ff5d0f821	Merged eigen/eigen into default	2016-12-14 17:32:16 -08:00
Mehdi Goli	730eb9fe1c	Adding asynchronous execution as it improves the performance.	2016-12-14 17:38:53 +00:00
Mehdi Goli	2d4a091beb	Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h	2016-12-14 15:30:37 +00:00
Benoit Steiner	a432fc102d	Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool	2016-12-12 15:24:16 -08:00
Benoit Steiner	8ae68924ed	Made ThreadPoolInterface::Cancel() an optional functionality	2016-12-12 11:58:38 -08:00
Benoit Steiner	76fca22134	Use a more accurate timer to sleep on Linux systems.	2016-12-09 15:12:24 -08:00
Benoit Steiner	4deafd35b7	Introduce a portable EIGEN_SLEEP macro.	2016-12-09 14:52:15 -08:00
Benoit Steiner	aafa97f4d2	Fixed build error with MSVC	2016-12-09 14:42:32 -08:00
Benoit Steiner	2f5b7a199b	Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.	2016-12-09 13:05:14 -08:00
Benoit Steiner	28ee8f42b2	Added a Flush method to the RunQueue	2016-12-08 14:07:56 -08:00
Benoit Steiner	69ef267a77	Added the new threadpool cancel method to the threadpool interface based class.	2016-12-08 14:03:25 -08:00
Benoit Steiner	7bfff85355	Added support for thread cancellation on Linux	2016-12-08 08:12:49 -08:00
Benoit Steiner	462c28e77a	Merged in srvasude/eigen (pull request PR-265) Add Expm1 support to Eigen.	2016-12-05 02:31:11 +00:00
Gael Guennebaud	4465d20403	Add missing generic load methods.	2016-12-03 21:25:04 +01:00
Srinivas Vasudevan	218764ee1f	Added support for expm1 in Eigen.	2016-12-02 14:13:01 -08:00
Mehdi Goli	592acc5bfa	Making default numeric_list work with sycl.	2016-12-02 17:58:30 +00:00
Mehdi Goli	79aa2b784e	Adding sycl backend for TensorPadding.h; disabling __unit128 for sycl in TensorIntDiv.h; disabling cache size for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.	2016-12-01 13:02:27 +00:00
Benoit Steiner	a70393fd02	Cleaned up forward declarations	2016-11-30 21:59:07 -08:00
Benoit Steiner	e073de96dc	Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again	2016-11-30 21:36:52 -08:00
Benoit Steiner	fca27350eb	Added the deallocate_all() method back	2016-11-30 20:45:20 -08:00
Benoit Steiner	e633a8371f	Simplified includes	2016-11-30 20:21:18 -08:00
Benoit Steiner	7cd33df4ce	Improved formatting	2016-11-30 20:20:44 -08:00
Benoit Steiner	f5107010ee	Updated the Sizes class to work on AMD gpus without requiring a separate implementation	2016-11-30 19:57:28 -08:00
Benoit Steiner	e37c2c52d3	Added an implementation of numeric_list that works with sycl	2016-11-30 19:55:15 -08:00
Luke Iwanski	26fff1c5b1	Added EIGEN_STRONG_INLINE to get_sycl_supported_device().	2016-11-30 16:55:22 +00:00
Mehdi Goli	577ce78085	Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend.	2016-11-29 15:30:42 +00:00
Benoit Steiner	02080e2b67	Merged eigen/eigen into default	2016-11-27 07:27:30 -08:00
Benoit Steiner	9fd081cddc	Fixed compilation warnings	2016-11-26 20:22:25 -08:00
Benoit Steiner	9f8fbd9434	Merged eigen/eigen into default	2016-11-26 11:28:25 -08:00
Benoit Steiner	67b2c41f30	Avoided unnecessary type conversion	2016-11-26 11:27:29 -08:00
Benoit Steiner	7fe704596a	Added missing array_get method for numeric_list	2016-11-26 11:26:07 -08:00
Mehdi Goli	7318daf887	Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.	2016-11-25 16:19:07 +00:00
Benoit Steiner	7ad37606dd	Fixed the documentation of Scalar Tensors	2016-11-24 12:31:43 -08:00
Mehdi Goli	b8cc5635d5	Removing unsupported device from test case; cleaning the tensor device sycl.	2016-11-23 16:30:41 +00:00
Benoit Steiner	f11da1d83b	Made the QueueInterface thread safe	2016-11-20 13:17:08 -08:00
Benoit Steiner	6d781e3e52	Merged eigen/eigen into default	2016-11-20 10:12:54 -08:00
Benoit Steiner	79a07b891b	Fixed a typo	2016-11-20 07:07:41 -08:00
Benoit Steiner	81151bd474	Fixed merge conflicts	2016-11-19 19:12:59 -08:00
Benoit Steiner	9265ca707e	Made it possible to check the state of a sycl device without synchronization	2016-11-19 10:56:24 -08:00
Benoit Steiner	2d1aec15a7	Added missing include	2016-11-19 08:09:54 -08:00
Benoit Steiner	1bdf1b9ce0	Merged in benoitsteiner/opencl (pull request PR-253) OpenCL improvements	2016-11-19 04:44:43 +00:00
Benoit Steiner	dc601d79d1	Added the ability to run tests exclusively on OpenCL devices that are listed by sycl::device::get_devices().	2016-11-18 16:26:50 -08:00
Benoit Steiner	110b7f8d9f	Deleted unnecessary semicolons	2016-11-18 14:06:17 -08:00
Benoit Steiner	37c2c516a6	Cleaned up the sycl device code	2016-11-18 12:38:06 -08:00
Mehdi Goli	15e226d7d3	adding Benoit changes on the TensorDeviceSycl.h	2016-11-18 16:34:54 +00:00
Mehdi Goli	622805a0c5	Modifying TensorDeviceSycl.h to always create buffer of type uint8_t and convert them to the actual type at the execution on the device; adding the queue interface class to separate the lifespan of sycl queue and buffers,created for that queue, from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row major and column major data layout on all different devices that are supported by Sycl{CPU; GPU; and Host}.	2016-11-18 16:20:42 +00:00
Benoit Steiner	7c30078b9f	Merged eigen/eigen into default	2016-11-17 22:53:37 -08:00
Benoit Steiner	553f50b246	Added a way to detect errors generated by the opencl device from the host	2016-11-17 21:51:48 -08:00
Benoit Steiner	72a45d32e9	Cleanup	2016-11-17 21:29:15 -08:00
Benoit Steiner	4349fc640e	Created a test to check that the sycl runtime can successfully report errors (like division by 0). Small cleanup	2016-11-17 20:27:54 -08:00
Benoit Steiner	a6a3fd0703	Made TensorDeviceCuda.h compile on windows	2016-11-17 16:15:27 -08:00
Luke Iwanski	c5130dedbe	Specialised basic math functions for SYCL device.	2016-11-17 11:47:13 +00:00
Benoit Steiner	b5c75351e3	Merged eigen/eigen into default	2016-11-14 15:54:44 -08:00
Rasmus Munk Larsen	32df1b1046	Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the schedule path to push both recursive calls onto another thread-queue in the binary tree, but instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics).	2016-11-14 14:18:16 -08:00
Mehdi Goli	05e8c2a1d9	Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl.	2016-11-14 18:13:53 +00:00
Mehdi Goli	f8ca893976	Adding TensorFixsize; adding sycl device memcpy; adding initial stage of slicing.	2016-11-14 17:51:57 +00:00
Mehdi Goli	a5c3f15682	Adding comment to TensorDeviceSycl.h and cleaning the code.	2016-11-11 19:06:34 +00:00
Mehdi Goli	3be3963021	Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor.	2016-11-10 19:16:31 +00:00
Mehdi Goli	12387abad5	adding the missing in eigen_assert!	2016-11-10 18:58:08 +00:00
Mehdi Goli	2e704d4257	Adding Memset; optimising memcpyDeviceToHost by removing double copying;	2016-11-10 18:45:12 +00:00
Benoit Steiner	dcc14bee64	Fixed the formatting of the code	2016-11-08 14:24:46 -08:00
Luke Iwanski	912cb3d660	#if EIGEN_EXCEPTION -> #ifdef EIGEN_EXCEPTIONS.	2016-11-08 22:01:14 +00:00
Luke Iwanski	1b345b0895	Fix for SYCL queue initialisation.	2016-11-08 21:56:31 +00:00
Luke Iwanski	1b95717358	Use try/catch only when exceptions are enabled.	2016-11-08 21:08:53 +00:00
Mehdi Goli	d57430dd73	Converting all sycl buffers to uninitialised device only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creating to the device based on Benoit suggestion; removing the sycl specific condition for returning m_result in TensorReduction.h according to Benoit suggestion.	2016-11-08 17:08:02 +00:00
Benoit Steiner	dad177be01	Added missing includes	2016-11-05 10:04:42 -07:00
Mehdi Goli	0ebe3808ca	Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;	2016-11-04 18:18:19 +00:00
Benoit Steiner	0585b2965d	Disable vectorization on device only when compiling for sycl	2016-11-02 11:44:27 -07:00
Mehdi Goli	51af6ae971	Fixed the ambiguity in calling make_tuple for sycl backend.	2016-10-31 16:35:51 +00:00
Benoit Steiner	0a9ad6fc72	Worked around Visual Studio compilation errors	2016-10-28 07:54:27 -07:00
Benoit Steiner	b0c5bfdf78	Added missing template parameters	2016-10-28 03:43:41 +00:00
Gael Guennebaud	530f20c21a	Workaround MSVC issue.	2016-10-27 21:51:37 +02:00
Benoit Steiner	0a4c4d40b4	Removed a template parameter for fixed sized tensors	2016-10-26 18:47:37 -07:00
Benoit Steiner	5f2dd503ff	Replaced tabs with spaces	2016-10-25 20:40:58 -07:00
Benoit Steiner	1644bafe29	Code cleanup	2016-10-25 20:36:14 -07:00
Benoit Steiner	cf20b30d65	Merge latest updates from trunk	2016-10-20 09:42:05 -07:00
Luke Iwanski	03b63e182c	Added SYCL include in Tensor.	2016-10-20 15:32:44 +01:00
Benoit Steiner	d3943cd50c	Fixed a few typos in the ternary tensor expressions types	2016-10-19 12:56:12 -07:00
Mehdi Goli	e36cb91c99	Fixing the code indentation in the TensorReduction.h file.	2016-10-14 18:03:00 +01:00
Luke Iwanski	e742da8b28	Merged ComputeCpp into default.	2016-10-14 13:36:51 +01:00
Mehdi Goli	524fa4c46f	Reducing the code by generalising sycl backend functions/structs.	2016-10-14 12:09:55 +01:00
Benoit Steiner	7e4a6754b2	Merged eigen/eigen into default	2016-10-12 22:42:33 -07:00
Benoit Steiner	7f0599b6eb	Manually define int16_t and uint16_t when compiling with Visual Studio	2016-10-08 22:56:32 -07:00
Benoit Steiner	5c68051cd7	Merge the content of the ComputeCpp branch into the default branch	2016-10-07 11:04:16 -07:00
RJ Ryan	e2e9cdd169	Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.	2016-10-06 10:49:48 -07:00
Benoit Steiner	ae1385c7e4	Pull the latest updates from trunk	2016-10-05 14:54:36 -07:00
Benoit Steiner	c84084c0c0	Fixed compilation warning	2016-10-05 14:15:41 -07:00
Benoit Steiner	8b69d5d730	::rand() returns a signed integer on win32	2016-10-05 08:55:02 -07:00
Benoit Steiner	ed7a220b04	Fixed a typo that impacts windows builds	2016-10-05 08:51:31 -07:00
Benoit Steiner	ceee1c008b	Silenced compilation warning	2016-10-04 18:47:53 -07:00
Benoit Steiner	6af5ac7e27	Cleanup the cuda executor code.	2016-10-04 08:52:13 -07:00
Benoit Steiner	2f6d1607c8	Cleaned up the random number generation code.	2016-10-04 08:38:23 -07:00
Benoit Steiner	2bda1b0d93	Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.	2016-09-28 17:08:41 -07:00
Mehdi Goli	dd602e62c8	Converting alias template to nested struct in order to be compatible with CXX-03	2016-09-27 16:21:19 +01:00
Benoit Steiner	6565f8d60f	Made the initialization of a CUDA device thread safe.	2016-09-26 11:00:32 -07:00
Benoit Steiner	f6ac51a054	Made TensorEvalTo compatible with c++0x again.	2016-09-23 16:45:17 -07:00
Benoit Steiner	00d4e65f00	Deleted unused TensorMap data member	2016-09-23 16:44:45 -07:00
Benoit Steiner	1301d744f8	Made the gaussian generator usable on GPU	2016-09-22 19:04:44 -07:00
Benoit Steiner	c3ca9b1e76	Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC	2016-09-19 11:33:39 -07:00
Luke Iwanski	b91e021172	Merged with default.	2016-09-19 14:03:54 +01:00
Luke Iwanski	cb81975714	Partial OpenCL support via SYCL compatible with ComputeCpp CE.	2016-09-19 12:44:13 +01:00
Gael Guennebaud	18f6e47815	Fix order of "static inline".	2016-09-16 11:32:54 +02:00
Benoit Steiner	488ad7dd1b	Added missing EIGEN_DEVICE_FUNC qualifiers	2016-09-14 13:35:00 -07:00
Benoit Steiner	028e299577	Fixed a bug impacting some outer reductions on GPU	2016-09-12 18:36:52 -07:00
Benoit Steiner	8321dcce76	Merged latest updates from trunk	2016-09-12 10:33:05 -07:00
Benoit Steiner	eb6ba00cc8	Properly size the list of waiters	2016-09-12 10:31:55 -07:00
Benoit Steiner	a618094b62	Added a resize method to MaxSizeVector	2016-09-12 10:30:53 -07:00
Gael Guennebaud	471eac5399	bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)	2016-09-08 08:36:27 +02:00
Benoit Steiner	13df3441ae	Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instruction to initialize it. This can result in random crashes when compiling with AVX instructions enabled.	2016-09-02 19:25:47 -07:00
Benoit Steiner	cadd124d73	Pulled latest update from trunk	2016-09-02 15:30:02 -07:00
Benoit Steiner	05b0518077	Made the index type an explicit template parameter to help some compilers compile the code.	2016-09-02 15:29:34 -07:00
Benoit Steiner	adf864fec0	Merged in rmlarsen/eigen (pull request PR-222) Fix CUDA build broken by changes to min and max reduction.	2016-09-02 14:11:20 -07:00
Rasmus Munk Larsen	13e93ca8b7	Fix CUDA build broken by changes to min and max reduction.	2016-09-02 13:41:36 -07:00
Benoit Steiner	c53f783705	Updated the contraction code to support constant inputs.	2016-09-01 11:41:27 -07:00
Gael Guennebaud	46475eff9a	Adjust Tensor module wrt recent change in nullary functor	2016-09-01 13:40:45 +02:00
Rasmus Munk Larsen	a1e092d1e8	Fix bugs to make min- and max reducers work correctly with IEEE infinities.	2016-08-31 15:04:16 -07:00
Gael Guennebaud	35a8e94577	bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.	2016-08-29 10:59:37 +02:00
Gael Guennebaud	965e595f02	Add missing log1p method	2016-08-26 14:55:00 +02:00
Benoit Steiner	34ae80179a	Use array_prod instead of calling TotalSize since TotalSize is only available on DSize.	2016-08-15 10:29:14 -07:00
Benoit Steiner	fe73648c98	Fixed a bug in the documentation.	2016-08-12 10:00:43 -07:00
Benoit Steiner	64e68cbe87	Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything.	2016-08-08 19:29:59 -07:00
Benoit Steiner	ca2cee2739	Merged in ibab/eigen (pull request PR-206) Expose real and imag methods on Tensors	2016-08-03 11:53:04 -07:00
Benoit Steiner	a20b58845f	CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running.	2016-08-03 10:00:43 -07:00
Benoit Steiner	fd220dd8b0	Use numext::conj instead of std::conj	2016-08-01 18:16:16 -07:00
Benoit Steiner	e256acec7c	Avoid unnecessary object copies	2016-08-01 17:03:39 -07:00
Benoit Steiner	2693fd54bf	bug #1266 : half implementation has been moved to half_impl namespace	2016-07-29 13:45:56 -07:00
Benoit Steiner	3d3d34e442	Deleted dead code.	2016-07-25 08:53:37 -07:00
Gael Guennebaud	6d5daf32f5	bug #1255 : comment out broken and unused line.	2016-07-25 14:48:30 +02:00
Gael Guennebaud	9908020d36	Add minimal support for Array<string>, and fix Tensor<string>	2016-07-25 14:25:56 +02:00
Benoit Steiner	c6b0de2c21	Improved partial reductions in more cases	2016-07-22 17:18:20 -07:00
Gael Guennebaud	0f350a8b7e	Fix CUDA compilation	2016-07-21 18:47:07 +02:00
Benoit Steiner	20f7ef2f89	An evalTo expression is only aligned iff both the lhs and the rhs are aligned.	2016-07-12 10:56:42 -07:00
Benoit Steiner	3a2dd352ae	Improved the contraction mapper to properly support tensor products	2016-07-11 13:43:41 -07:00
Benoit Steiner	0bc020be9d	Improved the detection of packet size in the tensor scan evaluator.	2016-07-11 12:14:56 -07:00
Gael Guennebaud	fd60966310	merge	2016-07-11 18:11:47 +02:00
Gael Guennebaud	194daa3048	Fix assertion (it did not make sense for static_val types)	2016-07-11 11:39:27 +02:00
Gael Guennebaud	18c35747ce	Emulate _BitScanReverse64 for 32 bits builds	2016-07-11 11:38:04 +02:00
Gael Guennebaud	599f8ba617	Change runtime to compile-time conditional.	2016-07-08 11:39:43 +02:00
Gael Guennebaud	544935101a	Fix warnings	2016-07-08 11:38:52 +02:00
Gael Guennebaud	2f7e2614e7	bug #1232 : refactor special functions as a new SpecialFunctions module, currently in unsupported/.	2016-07-08 11:13:55 +02:00
Gael Guennebaud	179ebb88f9	Fix warning	2016-07-07 09:16:40 +02:00
Gael Guennebaud	ce9fc0ce14	fix clang compilation	2016-07-04 12:59:02 +02:00
Gael Guennebaud	440020474c	Workaround compilation issue with msvc	2016-07-04 12:49:19 +02:00
Igor Babuschkin	78f37ca03c	Expose real and imag methods on Tensors	2016-07-01 17:34:31 +01:00
Benoit Steiner	cb2d8b8fa6	Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu.	2016-06-29 15:42:01 -07:00
Benoit Steiner	b2a47641ce	Made the code compile when using CUDA architecture < 300	2016-06-29 15:32:47 -07:00
Igor Babuschkin	85699850d9	Add missing CUDA kernel to tensor scan op. The TensorScanOp implementation was missing a CUDA kernel launch. This adds a simple placeholder implementation.	2016-06-29 11:54:35 +01:00
Benoit Steiner	75c333f94c	Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor. Also avoid taking references to values that may become stale after a copy construction.	2016-06-27 10:32:38 -07:00
Benoit Steiner	7944d4431f	Made the cost model cwiseMax and cwiseMin methods consts to help the PowerPC cuda compiler compile this code.	2016-08-18 13:46:36 -07:00
Benoit Steiner	647a51b426	Force the inlining of a simple accessor.	2016-08-18 12:31:02 -07:00
Benoit Steiner	a452dedb4f	Merged in ibab/eigen/double-tensor-reduction (pull request PR-216) Enable efficient Tensor reduction for doubles on the GPU (continued)	2016-08-18 12:29:54 -07:00
Igor Babuschkin	18c67df31c	Fix remaining CUDA >= 300 checks	2016-08-18 17:18:30 +01:00
Igor Babuschkin	1569a7d7ab	Add the necessary CUDA >= 300 checks back	2016-08-18 17:15:12 +01:00
Benoit Steiner	2b17f34574	Properly detect the type of the result of a contraction.	2016-08-16 16:00:30 -07:00
Igor Babuschkin	841e075154	Remove CUDA >= 300 checks and enable outer reduction for doubles	2016-08-06 18:07:50 +01:00
Igor Babuschkin	0425118e2a	Merge upstream changes	2016-08-05 14:34:57 +01:00
Igor Babuschkin	9537e8b118	Make use of atomicExch for atomicExchCustom	2016-08-05 14:29:58 +01:00
Igor Babuschkin	eeb0d880ee	Enable efficient Tensor reduction for doubles	2016-07-01 19:08:26 +01:00
Rasmus Munk Larsen	a9c1e4d7b7	Return -1 from CurrentThreadId when called by thread outside the pool.	2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen	d39df320d2	Resolve merge.	2016-06-23 15:08:03 -07:00
Gael Guennebaud	360a743a10	bug #1241 : does not emit anything for empty tensors	2016-06-23 18:47:31 +02:00
Gael Guennebaud	7c6561485a	merge PR 194	2016-06-23 15:29:57 +02:00
Benoit Steiner	a29a2cb4ff	Silenced a couple of compilation warnings generated by xcode	2016-06-22 16:43:02 -07:00
Benoit Steiner	f8fcd6b32d	Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers	2016-06-22 16:03:11 -07:00
Benoit Steiner	c58df31747	Handle empty tensors in the print functions	2016-06-21 09:22:43 -07:00
Benoit Steiner	de32f8d656	Fixed the printing of rank-0 tensors	2016-06-20 10:46:45 -07:00
Benoit Steiner	7d495d890a	Merged in ibab/eigen (pull request PR-197) Implement exclusive scan option for Tensor library	2016-06-14 17:54:59 -07:00
Benoit Steiner	aedc5be1d6	Avoid generating pseudo random numbers that are multiple of 5: this helps spread the load over multiple cpus without having to rely on work stealing.	2016-06-14 17:51:47 -07:00
Igor Babuschkin	c4d10e921f	Implement exclusive scan option	2016-06-14 19:44:07 +01:00
Gael Guennebaud	76236cdea4	merge	2016-06-14 15:33:47 +02:00
Gael Guennebaud	5d38203735	Update Tensor module to use bind1st_op and bind2nd_op	2016-06-14 15:06:03 +02:00
Benoit Steiner	65d33e5898	Merged in ibab/eigen (pull request PR-195) Add small fixes to TensorScanOp	2016-06-10 19:31:17 -07:00
Benoit Steiner	a05607875a	Don't refer to the half2 type unless it's been defined	2016-06-10 11:53:56 -07:00
Igor Babuschkin	86aedc9282	Add small fixes to TensorScanOp	2016-06-07 20:06:38 +01:00
Benoit Steiner	84b2060a9e	Fixed compilation error with gcc 4.4	2016-06-06 17:16:19 -07:00
Benoit Steiner	7ef9f47b58	Misc small improvements to the reduction code.	2016-06-06 14:09:46 -07:00
Benoit Steiner	9137f560f0	Moved assertions to the constructor to make the code more portable	2016-06-06 07:26:48 -07:00
Rasmus Munk Larsen	f1f2ff8208	size_t -> int	2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen	76308e7fd2	Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool.	2016-06-03 16:28:58 -07:00
Benoit Steiner	37638dafd7	Simplified the code that dispatches vectorized reductions on GPU	2016-06-09 10:29:52 -07:00
Benoit Steiner	66796e843d	Fixed definition of some of the reducer_traits	2016-06-09 08:50:01 -07:00
Benoit Steiner	14a112ee15	Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression.	2016-06-09 08:25:22 -07:00
Benoit Steiner	8f92c26319	Improved code formatting	2016-06-09 08:23:42 -07:00
Benoit Steiner	aa33446dac	Improved support for vectorization of 16-bit floats	2016-06-09 08:22:27 -07:00
Benoit Steiner	d6d39c7ddb	Added missing EIGEN_DEVICE_FUNC	2016-06-07 14:35:08 -07:00
Benoit Steiner	c3c8ad8046	Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing	2016-06-02 21:17:41 -07:00
Eugene Brevdo	39baff850c	Add TernaryFunctors and the betainc SpecialFunction. TernaryFunctors and their executors allow operations on 3-tuples of inputs. API fully implemented for Arrays and Tensors based on binary functors. Ported the cephes betainc function (regularized incomplete beta integral) to Eigen, with support for CPU and GPU, floats, doubles, and half types. Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe. and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.	2016-06-02 17:04:19 -07:00
Benoit Steiner	c21eaedce6	Use array_prod to compute the number of elements contained in the input tensor expression	2016-06-04 07:47:04 -07:00
Benoit Steiner	36a4500822	Merged in ibab/eigen (pull request PR-192) Add generic scan method	2016-06-03 17:28:33 -07:00
Benoit Steiner	c2a102345f	Improved the performance of full reductions.
AFTER:
BM_fullReduction/10      4541    4543  154017   21.0M items/s
BM_fullReduction/64      5191    5193  100000  752.5M items/s
BM_fullReduction/512     9588    9588   71361   25.5G items/s
BM_fullReduction/4k    244314  244281    2863   64.0G items/s
BM_fullReduction/5k    359382  359363    1946   64.8G items/s
BEFORE:
BM_fullReduction/10      9085    9087   74395   10.5M items/s
BM_fullReduction/64      9478    9478   72014  412.1M items/s
BM_fullReduction/512    14643   14646   46902   16.7G items/s
BM_fullReduction/4k    260338  260384    2678   60.0G items/s
BM_fullReduction/5k    385076  385178    1818   60.5G items/s	2016-06-03 17:27:08 -07:00
Igor Babuschkin	dc03b8f3a1	Add generic scan method	2016-06-03 17:37:04 +01:00

1112 Commits