eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Christoph Hertzberg	b786ce8c72	Fix conversion warning ... again	2018-10-02 18:35:25 +02:00
Christoph Hertzberg	564ca71e39	Merged in deven-amd/eigen/HIP_fixes (pull request PR-518) PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)	2018-10-01 16:51:04 +00:00
Deven Desai	94898488a6	This commit contains the following (HIP specific) updates: - unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h Changing "pass-by-reference" argument to be "pass-by-value" instead (in a __global__ function decl). "pass-by-reference" arguments to __global__ functions are unwise, and will be explicitly flagged as errors by the newer versions of HIP. - Eigen/src/Core/util/Memory.h - unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h Changes introduced in recent commits break the HIP compile. Adding EIGEN_DEVICE_FUNC attribute to some functions and calling ::malloc/free instead of the corresponding std:: versions to get the HIP compile working again - unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h A change introduced in a recent commit breaks the HIP compile (link stage errors out due to failure to inline a function). Disabling the recently introduced code (only for HIP compile), to get the eigen nightly testing going again. Will submit another PR once we have the proper fix. - Eigen/src/Core/util/ConfigureVectorization.h Enabling GPU VECTOR support when HIP compiler is in use (for both the host and device compile phases)	2018-10-01 14:28:37 +00:00
Rasmus Munk Larsen	2088c0897f	Merged eigen/eigen into default	2018-09-28 16:00:46 -07:00
Rasmus Munk Larsen	31629bb964	Get rid of unused variable warning.	2018-09-28 16:00:09 -07:00
Eugene Zhulenev	bb13d5d917	Fix bug in copy optimization in Tensor slicing.	2018-09-28 14:34:42 -07:00
Rasmus Munk Larsen	104e8fa074	Fix a few warnings and rename a variable to not shadow "last".	2018-09-28 12:00:08 -07:00
Rasmus Munk Larsen	7c1b47840a	Merged in ezhulenev/eigen-01 (pull request PR-514) Add tests for evalShardedByInnerDim contraction + fix bugs	2018-09-28 18:37:54 +00:00
Eugene Zhulenev	524c81f3fa	Add tests for evalShardedByInnerDim contraction + fix bugs	2018-09-28 11:24:08 -07:00
Christoph Hertzberg	86ba50be39	Fix integer conversion warnings	2018-09-28 19:33:39 +02:00
Eugene Zhulenev	e95696acb3	Optimize TensorBlockCopyOp	2018-09-27 14:49:26 -07:00
Eugene Zhulenev	9f33e71e9d	Revert code lost in merge	2018-09-27 12:08:17 -07:00
Eugene Zhulenev	a7a3e9f2b6	Merge with eigen/eigen default	2018-09-27 12:05:06 -07:00
Eugene Zhulenev	9f4988959f	Remove explicit mkldnn support and redundant TensorContractionKernelBlocking	2018-09-27 11:49:19 -07:00
Rasmus Munk Larsen	d956204ab2	Remove "false &&" left over from test.	2018-09-26 17:03:30 -07:00
Rasmus Munk Larsen	3815aeed7a	Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%. Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.
Benchmark                 Base (ns)  New (ns)  Improvement
------------------------------------------------------------------
BM_Matmul_1_80_13522_1    387457     396013    -2.2%
BM_Matmul_1_80_13522_2    406487     230789    +43.2%
BM_Matmul_1_80_13522_4    395821     123211    +68.9%
BM_Matmul_1_80_13522_6    391625     97002     +75.2%
BM_Matmul_1_80_13522_8    408986     113828    +72.2%
BM_Matmul_1_80_13522_16   399988     67600     +83.1%
BM_Matmul_1_80_13522_22   411546     60044     +85.4%
BM_Matmul_1_80_13522_32   393528     57312     +85.4%
BM_Matmul_1_80_13522_44   390047     63525     +83.7%
BM_Matmul_1_80_13522_88   387876     63592     +83.6%
BM_Matmul_1_1500_500_1    245359     248119    -1.1%
BM_Matmul_1_1500_500_2    401833     143271    +64.3%
BM_Matmul_1_1500_500_4    210519     100231    +52.4%
BM_Matmul_1_1500_500_6    251582     86575     +65.6%
BM_Matmul_1_1500_500_8    211499     80444     +62.0%
BM_Matmul_3_250_512_1     70297      68551     +2.5%
BM_Matmul_3_250_512_2     70141      52450     +25.2%
BM_Matmul_3_250_512_4     67872      58204     +14.2%
BM_Matmul_3_250_512_6     71378      63340     +11.3%
BM_Matmul_3_250_512_8     69595      41652     +40.2%
BM_Matmul_3_250_512_16    72055      42549     +40.9%
BM_Matmul_3_250_512_22    70158      54023     +23.0%
BM_Matmul_3_250_512_32    71541      56042     +21.7%
BM_Matmul_3_250_512_44    71843      57019     +20.6%
BM_Matmul_3_250_512_88    69951      54045     +22.7%
BM_Matmul_3_1500_512_1    369328     374284    -1.4%
BM_Matmul_3_1500_512_2    428656     223603    +47.8%
BM_Matmul_3_1500_512_4    205599     139508    +32.1%
BM_Matmul_3_1500_512_6    214278     139071    +35.1%
BM_Matmul_3_1500_512_8    184149     142338    +22.7%
BM_Matmul_3_1500_512_16   156462     156983    -0.3%
BM_Matmul_3_1500_512_22   163905     158259    +3.4%
BM_Matmul_3_1500_512_32   155314     157662    -1.5%
BM_Matmul_3_1500_512_44   235434     158657    +32.6%
BM_Matmul_3_1500_512_88   156779     160275    -2.2%
BM_Matmul_1500_4_512_1    363358     349528    +3.8%
BM_Matmul_1500_4_512_2    303134     263319    +13.1%
BM_Matmul_1500_4_512_4    176208     130086    +26.2%
BM_Matmul_1500_4_512_6    148026     115449    +22.0%
BM_Matmul_1500_4_512_8    131656     98421     +25.2%
BM_Matmul_1500_4_512_16   134011     82861     +38.2%
BM_Matmul_1500_4_512_22   134950     85685     +36.5%
BM_Matmul_1500_4_512_32   133165     90081     +32.4%
BM_Matmul_1500_4_512_44   133203     90644     +32.0%
BM_Matmul_1500_4_512_88   134106     100566    +25.0%
BM_Matmul_4_1500_512_1    439243     435058    +1.0%
BM_Matmul_4_1500_512_2    451830     257032    +43.1%
BM_Matmul_4_1500_512_4    276434     164513    +40.5%
BM_Matmul_4_1500_512_6    182542     144827    +20.7%
BM_Matmul_4_1500_512_8    179411     166256    +7.3%
BM_Matmul_4_1500_512_16   158101     155560    +1.6%
BM_Matmul_4_1500_512_22   152435     155448    -1.9%
BM_Matmul_4_1500_512_32   155150     149538    +3.6%
BM_Matmul_4_1500_512_44   193842     149777    +22.7%
BM_Matmul_4_1500_512_88   149544     154468    -3.3%	2018-09-26 16:47:13 -07:00
Eugene Zhulenev	71cd3fbd6a	Support multiple contraction kernel types in TensorContractionThreadPool	2018-09-26 11:08:47 -07:00
Christoph Hertzberg	2c083ace3e	Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides	2018-09-24 18:01:17 +02:00
Gael Guennebaud	c696dbcaa6	Fix shadowing of last and all	2018-09-21 23:02:33 +02:00
Rasmus Munk Larsen	8e2be7777e	Merged eigen/eigen into default	2018-09-20 11:41:15 -07:00
Rasmus Munk Larsen	5d2e759329	Initialize BlockIteratorState in a C++03 compatible way.	2018-09-20 11:40:43 -07:00
Gael Guennebaud	e04faca930	merge	2018-09-20 18:33:54 +02:00
Gael Guennebaud	3c6dc93f99	Fix GPU support.	2018-09-20 18:29:21 +02:00
Gael Guennebaud	9419f506d0	Fix regression introduced by the previous fix for AVX512. It broke the complex-complex case on SSE.	2018-09-20 17:32:34 +02:00
Christoph Hertzberg	a0166ab651	Workaround for spurious "array subscript is above array bounds" warnings with g++4.x	2018-09-20 17:08:43 +02:00
Gael Guennebaud	71496b0e25	Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512). This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.	2018-09-20 17:01:24 +02:00
Rasmus Munk Larsen	44d8274383	Cast to longer type.	2018-09-19 13:31:42 -07:00
Rasmus Munk Larsen	d638b62dda	Silence compiler warning.	2018-09-19 13:27:55 -07:00
Rasmus Munk Larsen	db9c9df59a	Silence more compiler warnings.	2018-09-19 11:50:27 -07:00
Rasmus Munk Larsen	febd09dcc0	Silence compiler warnings in ThreadPoolInterface.h.	2018-09-19 11:11:04 -07:00
luz.paz	f67b19a884	[PATCH 1/2] Misc. typos From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001 Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelist consists of: ``` als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch ``` --- CMakeLists.txt \| 26 +++++++++---------- Eigen/src/Core/GenericPacketMath.h \| 2 +- Eigen/src/SparseLU/SparseLU.h \| 2 +- bench/bench_norm.cpp \| 2 +- doc/HiPerformance.dox \| 2 +- doc/QuickStartGuide.dox \| 2 +- .../Eigen/CXX11/src/Tensor/TensorChipping.h \| 6 ++--- .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h \| 2 +- .../src/Tensor/TensorForwardDeclarations.h \| 4 +-- .../src/Tensor/TensorGpuHipCudaDefines.h \| 2 +- .../Eigen/CXX11/src/Tensor/TensorReduction.h \| 2 +- .../CXX11/src/Tensor/TensorReductionGpu.h \| 2 +- .../test/cxx11_tensor_concatenation.cpp \| 2 +- unsupported/test/cxx11_tensor_executor.cpp \| 2 +- 14 files changed, 29 insertions(+), 29 deletions(-)	2018-09-18 04:15:01 -04:00
Eugene Zhulenev	c4627039ac	Support static dimensions (aka IndexList) in Tensor::resize(...)	2018-09-18 14:25:21 -07:00
Eugene Zhulenev	218a7b9840	Enable DSizes type promotion with c++03 compilers	2018-09-18 10:57:00 -07:00
Ravi Kiran	1f0c941c3d	Collapsed revision * Merged eigen/eigen into default	2018-09-17 18:29:12 -07:00
Rasmus Munk Larsen	03a88c57e1	Merged in ezhulenev/eigen-02 (pull request PR-498) Add DSizes index type promotion	2018-09-17 21:58:38 +00:00
Rasmus Munk Larsen	5ca0e4a245	Merged in ezhulenev/eigen-01 (pull request PR-497) Fix warnings in IndexList array_prod	2018-09-17 20:15:06 +00:00
Eugene Zhulenev	a5cd4e9ad1	Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList	2018-09-17 10:58:07 -07:00
Gael Guennebaud	b311bfb752	bug #1596 : fix inclusion of Eigen's header within unsupported modules.	2018-09-17 09:54:29 +02:00
Gael Guennebaud	72f19c827a	typo	2018-09-16 22:10:34 +02:00
Eugene Zhulenev	66f056776f	Add DSizes index type promotion	2018-09-15 15:17:38 -07:00
Eugene Zhulenev	f313126dab	Fix warnings in IndexList array_prod	2018-09-15 13:47:54 -07:00
Christoph Hertzberg	42705ba574	Fix weird error for building with g++-4.7 in C++03 mode.	2018-09-15 12:43:41 +02:00
Rasmus Munk Larsen	c2383f95af	Merged in ezhulenev/eigen/fix_dsizes (pull request PR-494) Fix DSizes IndexList constructor	2018-09-15 02:36:19 +00:00
Rasmus Munk Larsen	30290cdd56	Merged in ezhulenev/eigen/moar_eigen_fixes_3 (pull request PR-493) Const cast scalar pointer in TensorSlicingOp evaluator Approved-by: Sameer Agarwal <sameeragarwal@google.com>	2018-09-15 02:35:07 +00:00
Eugene Zhulenev	f7d0053cf0	Fix DSizes IndexList constructor	2018-09-14 19:19:13 -07:00
Rasmus Munk Larsen	601e289d27	Merged in ezhulenev/eigen/moar_eigen_fixes_1 (pull request PR-492) Explicitly construct tensor block dimensions from evaluator dimensions	2018-09-15 01:36:21 +00:00
Eugene Zhulenev	71070a1e84	Const cast scalar pointer in TensorSlicingOp evaluator	2018-09-14 17:17:50 -07:00
Eugene Zhulenev	4863375723	Explicitly construct tensor block dimensions from evaluator dimensions	2018-09-14 16:55:05 -07:00
Rasmus Munk Larsen	14e35855e1	Merged in chtz/eigen-maxsizevector (pull request PR-490) Let MaxSizeVector respect alignment of objects Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-09-14 23:29:24 +00:00
Eugene Zhulenev	1b8d70a22b	Support reshaping with static shapes and dimensions conversion in tensor broadcasting	2018-09-14 15:25:27 -07:00
Christoph Hertzberg	007f165c69	bug #1598 : Let MaxSizeVector respect alignment of objects and add a unit test Also revert `8b3d9ed081`	2018-09-14 20:21:56 +02:00
Rasmus Munk Larsen	6313dde390	Fix merge error.	2018-09-13 16:42:05 -07:00
Rasmus Munk Larsen	0db590d22d	Backed out changeset `01197e4452`	2018-09-13 16:20:57 -07:00
Rasmus Munk Larsen	b3f4c067d9	Merge	2018-09-13 16:18:52 -07:00
Rasmus Munk Larsen	2b07018140	Enable vectorized version on GPUs. The underlying bug has been fixed.	2018-09-13 16:12:22 -07:00
Rasmus Munk Larsen	53568e3549	Merged in ezhulenev/eigen/tiled_evalution_support (pull request PR-444) Tiled evaluation for Tensor ops Approved-by: Rasmus Munk Larsen <rmlarsen@google.com> Approved-by: Gael Guennebaud <g.gael@free.fr>	2018-09-13 22:05:47 +00:00
Eugene Zhulenev	01197e4452	Fix warnings	2018-09-13 15:03:36 -07:00
Gael Guennebaud	7f3b17e403	MSVC 2015 supports c++11 thread-local-storage	2018-09-13 18:15:07 +02:00
Rasmus Munk Larsen	e289f44c56	Don't vectorize the MeanReducer unless pdiv is available.	2018-09-11 14:09:00 -07:00
Eugene Zhulenev	55bb7e7935	Merge with upstream eigen/default	2018-09-11 13:33:06 -07:00
Eugene Zhulenev	81b38a155a	Fix compilation of tiled evaluation code with c++03	2018-09-11 13:32:32 -07:00
Rasmus Munk Larsen	46f88fc454	Use numerically stable tree reduction in TensorReduction.	2018-09-11 10:08:10 -07:00
Deven Desai	c64fe9ea1f	Updates to fix HIP-clang specific compile errors. Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"	2018-08-30 20:22:16 +00:00
Rasmus Munk Larsen	8b3d9ed081	Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs.	2018-09-05 11:20:06 -07:00
Eugene Zhulenev	c144bb355b	Merge with upstream eigen/default	2018-08-27 14:34:07 -07:00
Christoph Hertzberg	b1653d1599	Fix some trivial C++11 vs C++03 compatibility warnings	2018-08-25 12:21:00 +02:00
Christoph Hertzberg	5aaedbeced	Fixed more sign-compare and type-limits warnings	2018-08-24 23:54:12 +02:00
Christoph Hertzberg	f7675b826b	Fix several integer conversion and sign-compare warnings	2018-08-24 22:58:55 +02:00
Rasmus Munk Larsen	744e2fe0de	Address comments about EIGEN_THREAD_LOCAL.	2018-08-24 10:24:54 -07:00
Rasmus Munk Larsen	8d9bc5cc02	Fix g++ compilation.	2018-08-23 13:06:39 -07:00
Rasmus Munk Larsen	e9f9d70611	Don't rely on __had_feature for g++. Don't use __thread. Only use thread_local for gcc 4.8 or newer.	2018-08-23 12:59:46 -07:00
Rasmus Munk Larsen	668690978f	Pad PerThread when we emulate thread_local to prevent false sharing.	2018-08-23 12:54:33 -07:00
Rasmus Munk Larsen	6cedc5a9b3	rename mu.	2018-08-23 12:11:58 -07:00
Rasmus Munk Larsen	6e0464004a	Store std::unique_ptr instead of raw pointers in per_thread_map_.	2018-08-23 12:10:08 -07:00
Rasmus Munk Larsen	e51d9e473a	Protect #undef max with #ifdef max.	2018-08-23 11:42:05 -07:00
Rasmus Munk Larsen	d35880ed91	merge	2018-08-23 11:36:49 -07:00
Christoph Hertzberg	a709c8efb4	Replace pointers by values or unique_ptr for better leak-safety	2018-08-23 19:41:59 +02:00
Christoph Hertzberg	39335cf51e	Make MaxSizeVector leak-safe	2018-08-23 19:37:56 +02:00
Benoit Steiner	19df4d5752	Merged in codeplaysoftware/eigen-upstream-pure/Pointer_type_creation (pull request PR-461) Creating a pointer type in TensorCustomOp.h	2018-08-16 18:28:33 +00:00
Benoit Steiner	f641cf1253	Adding missing at method in Eigen::array	2018-08-16 11:24:37 -07:00
Benoit Steiner	e23c8c294e	Use actual types instead of the auto keyword to make the code more portable	2018-08-16 10:41:01 -07:00
Mehdi Goli	80f1a76dec	removing the noises.	2018-08-16 13:33:24 +01:00
Mehdi Goli	d0b01ebbf6	Reverting the unintended delete from the code.	2018-08-16 13:21:36 +01:00
Mehdi Goli	161dcbae9b	Using PointerType struct and specializing it per device for TensorCustomOp.h	2018-08-16 00:07:02 +01:00
Sameer Agarwal	f197c3f55b	Removed an unused variable (PacketSize) from TensorExecutor	2018-08-15 11:24:57 -07:00
Benoit Steiner	4181556907	Fixed the tensor contraction code.	2018-08-15 09:34:47 -07:00
Benoit Steiner	fbb834144d	Fixed more compilation errors	2018-08-15 08:52:58 -07:00
Benoit Steiner	ab3f481141	Cleaned up the code and make it compile with more compilers	2018-08-14 14:05:46 -07:00
Rasmus Munk Larsen	fa0bcbf230	merge	2018-08-14 12:18:31 -07:00
Rasmus Munk Larsen	15d4f515e2	Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h overwrites eigen_assert with a throwing version.	2018-08-14 12:17:46 -07:00
Rasmus Munk Larsen	2a98bd9c8e	Merged eigen/eigen into default	2018-08-14 12:02:09 -07:00
Benoit Steiner	59bba77ead	Fixed compilation errors with gcc 4.7 and 4.8	2018-08-14 10:54:48 -07:00
Mehdi Goli	8ba799805b	Merge with upstream	2018-08-14 09:43:45 +01:00
Rasmus Munk Larsen	6d6e7b7027	merge	2018-08-13 15:34:50 -07:00
Rasmus Munk Larsen	9bb75d8d31	Add Barrier.h.	2018-08-13 15:34:03 -07:00
Rasmus Munk Larsen	2e1adc0324	Merged eigen/eigen into default	2018-08-13 15:32:00 -07:00
Rasmus Munk Larsen	8278ae6313	Add support for thread local support on platforms that do not support it through emulation using a hash map.	2018-08-13 15:31:23 -07:00
Benoit Steiner	501be70b27	Code cleanup	2018-08-13 15:16:40 -07:00
Gael Guennebaud	3ec60215df	Merged in rmlarsen/eigen2 (pull request PR-466) Move sigmoid functor to core and rename it to 'logistic'.	2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen	0f1b2e08a5	Call logistic functor from Tensor::sigmoid.	2018-08-13 11:52:58 -07:00
Benoit Steiner	26239ee580	Use NULL instead of nullptr to avoid adding a cxx11 requirement.	2018-08-13 11:05:51 -07:00
Benoit Steiner	3810ec228f	Don't use the auto keyword since it's not always supported properly.	2018-08-13 10:46:09 -07:00
Benoit Steiner	e6d5be811d	Fixed syntax of nested templates chevrons to make it compatible with c++98 mode.	2018-08-13 10:29:21 -07:00
Mehdi Goli	1aa86aad14	Merge with upstream.	2018-08-13 15:40:31 +01:00
Eugene Zhulenev	35d90e8960	Fix BlockAccess enum in CwiseUnaryOp evaluator	2018-08-10 17:37:58 -07:00
Eugene Zhulenev	855b68896b	Merge with eigen/default	2018-08-10 17:18:42 -07:00
Eugene Zhulenev	f2209d06e4	Add block evaluation to CwiseUnaryOp and add PreferBlockAccess enum to all evaluators	2018-08-10 16:53:36 -07:00
Benoit Steiner	c8ea398675	Avoided language features that are only available in cxx11 mode.	2018-08-10 13:02:41 -07:00
Benoit Steiner	4be4286224	Made the code compile with gcc 5.4.	2018-08-10 11:32:58 -07:00
Eugene Zhulenev	cfaedb38cd	Fix bug in a test + compilation errors	2018-08-09 09:44:07 -07:00
Mehdi Goli	ea8fa5e86f	Merge with upstream	2018-08-09 14:07:56 +01:00
Mehdi Goli	8c083bfd0e	Properly fixing the PointerType for TensorCustomOp.h, as the output type here should be based on CoeffReturnType, not the Scalar type. Therefore, similar to the reduction and evalTo functions, it should have its own MakePointer class. In this case, for other devices the type is defaulted to CoeffReturnType and no change is required in users' code. However, in SYCL, on the device, we can reconstruct the device Type.	2018-08-09 13:57:43 +01:00
Eugene Zhulenev	1c8b9e10a7	Merged with upstream eigen	2018-08-08 16:57:58 -07:00
Benoit Steiner	131ed1191f	Merged in codeplaysoftware/eigen-upstream-pure/Fixing_compiler_warning (pull request PR-462) Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.	2018-08-08 18:14:15 +00:00
Mehdi Goli	532a0be05c	Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.	2018-08-08 12:12:26 +01:00
Mehdi Goli	3055e3a7c2	Creating a pointer type in TensorCustomOp.h	2018-08-08 11:19:02 +01:00
Rasmus Munk Larsen	693fb1d41e	Fix init order.	2018-08-07 17:18:51 -07:00
Benoit Steiner	10d286f55b	Silenced a couple of compilation warnings.	2018-08-06 16:00:29 -07:00
Benoit Steiner	d011d05fd6	Fixed compilation errors.	2018-08-06 13:40:51 -07:00
Rasmus Munk Larsen	36e7e7dd8f	Forward declare NoOpOutputKernel as struct rather than class to be consistent with implementation.	2018-08-06 13:16:32 -07:00
Rasmus Munk Larsen	fa68342ef8	Move sigmoid functor to core.	2018-08-03 17:31:23 -07:00
Christoph Hertzberg	023ed6b9a8	Product of empty array must be 1 and not 0.	2018-08-30 17:14:52 +02:00
Christoph Hertzberg	c2f4e8c08e	Fix integer conversion warning	2018-08-30 17:12:53 +02:00
Deven Desai	946c3e2544	adding EIGEN_DEVICE_FUNC attribute to fix some GPU unit tests that are broken in HIP mode	2018-08-27 23:04:08 +00:00
Christoph Hertzberg	73ca600bca	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
Eugene Zhulenev	1b0373ae10	Replace all using declarations with typedefs in Tensor ops	2018-08-01 15:55:46 -07:00
Rasmus Munk Larsen	bcb29f890c	Fix initialization order.	2018-08-03 10:18:53 -07:00
Mehdi Goli	3074b1ff9e	Fixing the compilation error.	2018-08-03 17:13:44 +01:00
Mehdi Goli	01358300d5	Creating separate SYCL required PR for uncontroversial files.	2018-08-03 16:59:15 +01:00
Eugene Zhulenev	64abdf1d7e	Fix typo + get rid of redundant member variables for block sizes	2018-08-01 12:35:19 -07:00
Benoit Steiner	93b9e36e10	Merged in paultucker/eigen (pull request PR-431) Optional ThreadPoolDevice allocator Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>	2018-08-01 19:14:34 +00:00
Eugene Zhulenev	385b3ff12f	Merged latest changes from upstream/eigen	2018-08-01 11:59:04 -07:00
Benoit Steiner	17221115c9	Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447) Adding variadic version of assert which can take a parameter pack as its input.	2018-08-01 16:41:54 +00:00
Benoit Steiner	0360c36170	Merged in codeplaysoftware/eigen-upstream-pure/separating_internal_memory_allocation (pull request PR-446) Distinguishing between internal memory allocation/deallocation from explicit user memory allocation/deallocation.	2018-08-01 16:13:15 +00:00
Mehdi Goli	c6a5c70712	Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h	2018-08-01 16:56:26 +01:00
Benoit Steiner	45f75f1ace	Merged in codeplaysoftware/eigen-upstream-pure/using_PacketType_class (pull request PR-449) Enabling per device specialisation of packetSize.	2018-08-01 15:43:03 +00:00
Mehdi Goli	af96018b49	Using the suggested modification.	2018-08-01 16:04:44 +01:00
Mehdi Goli	b512a9536f	Enabling per device specialisation of packetsize.	2018-08-01 13:39:13 +01:00
Mehdi Goli	3a197a60e6	variadic version of assert which can take a parameter pack as its input.	2018-08-01 12:19:14 +01:00
Mehdi Goli	d7a8414848	Distinguishing between internal memory allocation/deallocation from explicit user memory allocation/deallocation.	2018-08-01 11:56:30 +01:00
Mehdi Goli	9e219bb3d3	Converting ad-hoc inline keyword to EIGEN_STRONG_INLINE MACRO.	2018-08-01 10:47:49 +01:00
Eugene Zhulenev	83c0a16baf	Add block evaluation support to TensorOps	2018-07-31 15:56:31 -07:00
Benoit Steiner	edf46bd7a2	Merged in yuefengz/eigen (pull request PR-370) Use device's allocate function instead of internal::aligned_malloc.	2018-07-31 22:38:28 +00:00
Paul Tucker	385f7b8d0c	Change getAllocator() to allocator() in ThreadPoolDevice.	2018-07-31 13:52:18 -07:00
Mark D Ryan	6f5b126e6d	Fix tensor contraction for AVX512 machines This patch modifies the TensorContraction class to ensure that the kc_ field is always a multiple of the packet_size, if the packet_size is > 8. Without this change spatial convolutions in Tensorflow do not work properly as the code that re-arranges the input matrices can assert if kc_ is not a multiple of the packet_size. This leads to a unit test failure, //tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.	2018-07-31 09:33:37 +01:00
Gael Guennebaud	678a0dcb12	Merged in ezhulenev/eigen/tiling_3 (pull request PR-438) Tiled tensor executor	2018-07-31 08:13:00 +00:00
Gael Guennebaud	679eece876	Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437.	2018-07-31 10:10:14 +02:00
Eugene Zhulenev	966c2a7bb6	Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible	2018-07-27 12:45:17 -07:00
Eugene Zhulenev	6913221c43	Add tiled evaluation support to TensorExecutor	2018-07-25 13:51:10 -07:00
Rasmus Munk Larsen	e478532625	Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.	2018-07-27 12:36:34 -07:00
Eugene Zhulenev	d55efa6f0f	TensorBlockIO	2018-07-23 15:50:55 -07:00
Eugene Zhulenev	34a75c3c5c	Initial support of TensorBlock	2018-07-20 17:37:20 -07:00
Paul Tucker	d4afccde5a	Add test coverage for ThreadPoolDevice optional allocator.	2018-07-19 17:43:44 -07:00
Eugene Zhulenev	c58b874727	PR430: Convert count to the reducer type in MeanReducer Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails. cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this) ->Eigen::internal::MeanReducer<Eigen::half>::packetCount_' (type 'const DenseIndex {aka const long int}') to type 'const type& {aka const Eigen::half&}' return pdiv(vaccum, pset1<Packet>(packetCount_)); Honestly I’m not sure why it works in Eigen tests, because Eigen::half constructor is explicit, and why it stopped working in TF, I didn’t find any relevant changes since previous Eigen upgrade. static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.	2018-07-19 17:37:03 -07:00
Paul Tucker	4e9848fa86	Actually add optional Allocator* arg to ThreadPoolDevice().	2018-07-16 17:53:36 -07:00
Paul Tucker	b3e7c9132d	Add optional Allocator argument to ThreadPoolDevice constructor. When supplied, this allocator will be used in place of internal::aligned_malloc. This permits e.g. use of a NUMA-node specific allocator where the thread-pool is also restricted to a single NUMA-node.	2018-07-16 17:26:05 -07:00
Rasmus Munk Larsen	3a9cf4e290	Get rid of alias for m_broadcast.	2018-07-13 16:24:48 -07:00
Rasmus Munk Larsen	4222550e17	Optimize the case where broadcasting is a no-op.	2018-07-13 16:12:38 -07:00
Gael Guennebaud	06eb24cf4d	Introduce gpu_assert for assertion in device-code, and disable them with clang-cuda.	2018-07-13 16:04:27 +02:00
Eugene Zhulenev	6e654f3379	Reduce number of allocations in TensorContractionThreadPool.	2018-07-16 14:26:39 -07:00
Gael Guennebaud	7ccb623746	bug #1569 : fix Tensor<half>::mean() on AVX with respective unit test.	2018-07-19 13:15:40 +02:00
Eugene Zhulenev	e3c2d61739	Assert that no output kernel is defined for GPU contraction	2018-07-18 14:34:22 -07:00
Eugene Zhulenev	79d4129cce	Specify default output kernel for TensorContractionOp	2018-07-18 14:21:01 -07:00
Yuefeng Zhou	1eff6cf8a7	Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances.	2018-02-20 16:50:05 -08:00
Gael Guennebaud	6cd6551b26	Add deprecated header files for TensorFlow	2018-07-12 10:50:53 +02:00
Deven Desai	876f392c39	Updates corresponding to the latest round of PR feedback The major changes are 1. Moving CUDA/PacketMath.h to GPU/PacketMath.h 2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h 3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h The above three changes effectively enable the Eigen "Packet" layer for the HIP platform 4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic") 5. Updating the "EIGEN_DEVICE_FUNC" marking in some places The change has been tested on the HIP and CUDA platforms.	2018-07-11 10:39:54 -04:00
Deven Desai	38807a2575	merging updates from upstream	2018-07-11 09:17:33 -04:00
Gael Guennebaud	6190aa5632	bug #1567 : add optimized path for tensor broadcasting and 'Channel First' shape	2018-07-09 11:23:16 +02:00
Deven Desai	1bb6fa99a3	merging the CUDA and HIP implementation for the Tensor directory and the unit tests	2018-06-20 16:44:58 -04:00
Deven Desai	cfdabbcc8f	removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	2018-06-20 12:57:02 -04:00
Deven Desai	7e41c8f1a9	renaming Cuda files to Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	2018-06-20 12:52:30 -04:00
Deven Desai	b6cc0961b1	updates based on PR feedback There are two major changes (and a few minor ones which are not listed here...see PR discussion for details) 1. Eigen::half implementations for HIP and CUDA have been merged. This means that - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h` - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h` - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h` After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install. 2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate. - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC \|\| EIGEN_HIPCC)` - `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH \|\| EIGEN_HIP_DEVICE_COMPILE)` - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`	2018-06-14 10:21:54 -04:00
Deven Desai	d1d22ef0f4	syncing this fork with upstream	2018-06-13 12:09:52 -04:00
Rasmus Munk Larsen	5418154a45	Fix oversharding bug in parallelFor.	2018-06-20 17:51:48 -07:00
Michael Figurnov	30fa3d0454	Merge from eigen/eigen	2018-06-07 17:57:56 +01:00
Gael Guennebaud	b3fd93207b	Fix typos found using codespell	2018-06-07 14:43:02 +02:00
Michael Figurnov	4bd158fa37	Derivative of the incomplete Gamma function and the sample of a Gamma random variable. In addition to igamma(a, x), this code implements: * igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter * gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.	2018-06-06 18:49:26 +01:00
Deven Desai	8fbd47052b	Adding support for using Eigen in HIP kernels. This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs. Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor) Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.	2018-06-06 10:12:58 -04:00
Benoit Steiner	e206f8d4a4	Merged in mfigurnov/eigen (pull request PR-400) Exponentially scaled modified Bessel functions of order zero and one. Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>	2018-06-05 17:05:21 +00:00
Penporn Koanantakool	e2ed0cf8ab	Add a ThreadPoolInterface* getter for ThreadPoolDevice.	2018-06-02 12:07:49 -07:00
Michael Figurnov	f216854453	Exponentially scaled modified Bessel functions of order zero and one. The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(\|x\|) i0e(x) The code is ported from Cephes and tested against SciPy.	2018-05-31 15:34:53 +01:00
Katrin Leinweber	ea94543190	Hyperlink DOIs against preferred resolver	2018-05-24 18:55:40 +02:00
Vamsi Sripathi	6293ad3f39	Performance improvements to tensor broadcast operation 1. Added new packet functions using SIMD for NByOne, OneByN cases 2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD 3. Added 4 test cases to cover the new packet functions	2018-05-23 14:02:05 -07:00
Benoit Steiner	0371380d5b	Merged in rmlarsen/eigen2 (pull request PR-393) Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.	2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen	b8d36774fa	Rename clip2 to clamp.	2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen	812480baa3	Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow.	2018-05-16 09:49:24 -07:00
Benoit Steiner	1403c2c15b	Merged in didierjansen/eigen (pull request PR-360) Fix bugs and typos in the contraction example of the tensor README	2018-05-16 01:16:36 +00:00
Rasmus Munk Larsen	b8c8e5f436	Add vectorized clip functor for Eigen Tensors.	2018-05-14 16:07:13 -07:00
Benoit Steiner	6118c6ff4f	Enable RawAccess to tensor slices whenever possible. Avoid 32-bit integer overflow in TensorSlicingOp	2018-04-30 11:28:12 -07:00
Weiming Zhao	b0eda3cb9f	Avoid using memcpy for non-POD elements	2018-04-11 11:37:06 +02:00
Viktor Csomor	000840cae0	Added a move constructor and move assignment operator to Tensor and wrote some tests.	2018-02-07 19:10:54 +01:00
Eugene Zhulenev	c95aacab90	Fix TensorContractionOp evaluators for GPU and SYCL	2018-07-17 14:09:37 -07:00
Deven Desai	f124f07965	applying EIGEN_DECLARE_TEST to gpu tests Also, a few minor fixes for GPU tests running in HIP mode. 1. Adding an include for hip/hip_runtime.h in the Macros.h file For HIP __host__ and __device__ are macros which are defined in hip headers. Their definitions need to be included before their use in the file. 2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel" 3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit	2018-07-17 14:16:48 -04:00
Eugene Zhulenev	43206ac4de	Call OutputKernel in evalGemv	2018-07-12 14:52:23 -07:00
Eugene Zhulenev	e204ecdaaf	Remove SimpleThreadPool and always use {NonBlocking}ThreadPool	2018-07-16 15:06:57 -07:00
Eugene Zhulenev	01fd4096d3	Fuse computations into the Tensor contractions using output kernel	2018-07-10 13:16:38 -07:00
Benoit Steiner	8f55956a57	Update the padding computation for PADDING_SAME to be consistent with TensorFlow.	2018-01-30 20:22:12 +00:00
Lee.Deokjae	5b3c367926	Fix typos in the contraction example of tensor README	2018-01-06 14:36:19 +09:00
RJ Ryan	59985cfd26	Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689	2017-12-31 10:44:56 -05:00
Gael Guennebaud	73214c4bd0	Workaround nvcc 9.0 issue. See PR 351. https://bitbucket.org/eigen/eigen/pull-requests/351	2017-12-15 14:10:59 +01:00
Yangzihao Wang	3122477c86	Update the padding computation for PADDING_SAME to be consistent with TensorFlow.	2017-12-12 11:15:24 -08:00
Benoit Steiner	3949615176	Merged in JonasMu/eigen (pull request PR-329) Added an example for a contraction to a scalar value to README.md Approved-by: Jonas Harsch <jonas.harsch@gmail.com>	2017-10-27 07:27:46 +00:00
Benoit Steiner	8eb4b9d254	Merged in benoitsteiner/opencl (pull request PR-341)	2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen	f349507e02	Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example this reduces binary size of a TensorFlow demo app for Android by about 2.5%.	2017-10-13 15:58:12 -07:00
Mehdi Goli	2062ac9958	Changes required for new ComputeCpp CE version.	2017-09-18 18:17:39 +01:00
Rasmus Munk Larsen	1b7294f6fc	Fix cut-and-paste error.	2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen	94e2213b38	Avoid undefined behavior in Eigen::TensorCostModel::numThreads. If the cost is large enough then the thread count can be larger than the maximum representable int, so just casting it to an int is undefined behavior. Contributed by phurst@google.com.	2017-09-08 15:49:55 -07:00
Jonas Harsch	a991c80365	Added an example for a contraction to a scalar value, e.g. a double contraction of two second order tensors and how you can get the value of the result. I lost one day to get this done so I think it will help some guys. I also added Eigen:: to the IndexPair and array in the same example.	2017-09-01 11:30:26 +00:00
Benoit Steiner	a4089991eb	Added support for CUDA 9.0.	2017-08-31 02:49:39 +00:00
Benoit Steiner	84d7be103a	Fixing Argmax that was breaking upstream TensorFlow.	2017-07-22 03:19:34 +00:00
Benoit Steiner	f0b154a4b0	Code cleanup	2017-07-10 09:54:09 -07:00
Benoit Steiner	575cda76b3	Fixed syntax errors generated by xcode	2017-07-09 11:39:01 -07:00
Benoit Steiner	5ac27d5b51	Avoid relying on cxx11 features when possible.	2017-07-08 21:58:44 -07:00
Benoit Steiner	c5a241ab9b	Merged in benoitsteiner/opencl (pull request PR-323) Improved support for OpenCL	2017-07-07 16:27:33 +00:00
Benoit Steiner	b7ae4dd9ef	Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315) Add labels to #ifdef, in TensorReductionCuda.h	2017-07-07 04:23:52 +00:00
Benoit Steiner	9daed67952	Merged in tntnatbry/eigen (pull request PR-319) Tensor Trace op	2017-07-07 04:18:03 +00:00
Benoit Steiner	6795512e59	Improved the randomness of the tensor random generator	2017-07-06 21:12:45 -07:00
Benoit Steiner	dc524ac716	Fixed compilation warning	2017-07-06 21:11:15 -07:00
Benoit Steiner	62b4634ebe	Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14) Applying Benoit's comment for Fixing ImageVolumePatch. * Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file. * Fixing deallocation of the memory in ImagePatch test for SYCL. * Fixing the automerge issue.	2017-07-06 05:08:13 +00:00
Benoit Steiner	53725c10b8	Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10) DataDependancy * Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code. * Applying Ronnan's Comments. * Applying benoit's comments	2017-06-28 17:55:23 +00:00
Hugh Perkins	9341f258d4	Add labels to #ifdef, in TensorReductionCuda.h	2017-06-06 15:51:06 +01:00
Benoit Steiner	1e736b9ead	Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7) Fixing SYCL alignment issue required by TensorFlow.	2017-05-26 17:23:00 +00:00
Benoit Steiner	9dee55ec33	Merged eigen/eigen into default	2017-05-26 09:01:04 -07:00
Mehdi Goli	0370d3576e	Applying Ronnan's comments.	2017-05-26 16:01:48 +01:00
Mehdi Goli	e3f964ed55	Applying Benoit's comment;removing dead code.	2017-05-25 11:17:26 +01:00
a-doumoulakis	fb853a857a	Restore misplaced comment	2017-05-24 17:50:15 +01:00
a-doumoulakis	7a8ba565f8	Merge changes from upstream	2017-05-24 17:45:29 +01:00
Mehdi Goli	76c0fc1f95	Fixing SYCL alignment issue required by TensorFlow.	2017-05-22 16:49:32 +01:00
Mehdi Goli	2d17128d6f	Fixing supported device list.	2017-05-22 16:40:33 +01:00
a-doumoulakis	052426b824	Add support for triSYCL Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options Fix contraction kernel with correct nd_item dimension	2017-05-05 19:26:27 +01:00
RJ Ryan	949a2da38c	Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer. Improves support for std::complex types when compiling for CUDA. Expands on `e2e9cdd169` and `2bda1b0d93`.	2017-04-14 13:23:35 -07:00
Benoit Steiner	0d08165a7f	Merged in benoitsteiner/opencl (pull request PR-309) OpenCL improvements	2017-04-05 14:28:08 +00:00
Benoit Steiner	c302ea7bc4	Deleted empty line of code	2017-04-04 10:05:16 -07:00
Benoit Steiner	a5a0c8fac1	Guard sycl specific code under a EIGEN_USE_SYCL ifdef	2017-04-04 10:03:21 -07:00
Benoit Steiner	a1304b95b7	Code cleanup	2017-04-04 10:00:46 -07:00
Benoit Steiner	66c63826bd	Guard the sycl specific code with EIGEN_USE_SYCL	2017-04-04 09:59:09 -07:00
Benoit Steiner	e3e343390a	Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL	2017-04-04 09:56:33 -07:00
Benoit Steiner	63840d4666	Gate the sycl specific code under a EIGEN_USE_SYCL define	2017-04-04 09:54:31 -07:00
Benoit Steiner	bc050ea9f0	Fixed compilation error when sycl is enabled.	2017-04-04 09:47:04 -07:00
Gagan Goel	4910630c96	fix typos in the Tensor readme	2017-03-31 20:32:16 -04:00
Benoit Steiner	c1b3d5ecb6	Restored code compatibility with compilers that don't support c++11. Gated more sycl code under #ifdef sycl	2017-03-31 08:31:28 -07:00
Benoit Steiner	e2d5d4e7b3	Restore the old constructors to retain compatibility with non c++11 compilers.	2017-03-31 08:26:13 -07:00
Benoit Steiner	73fcaa319f	Gate the sycl specific code under #ifdef sycl	2017-03-31 08:22:25 -07:00
Mehdi Goli	bd64ee8555	Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.	2017-03-28 16:50:34 +01:00
Luke Iwanski	a91417a7a5	Introduces align allocator for SYCL buffer	2017-03-20 14:48:54 +00:00
Benoit Steiner	f8a622ef3c	Merged eigen/eigen into default	2017-03-15 20:06:19 -07:00
Benoit Steiner	fd7db52f9b	Silenced compilation warning	2017-03-15 20:02:39 -07:00
Luke Iwanski	c06861d15e	Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration	2017-03-15 19:26:08 +00:00
Benoit Steiner	f0f3591118	Made the reduction code compile with cuda-clang	2017-03-14 14:16:53 -07:00
Mehdi Goli	f499fe9496	Adding synchronisation to convolution kernel for sycl backend.	2017-03-13 09:18:37 +00:00

1365 Commits