Commit Graph

9784 Commits

Author SHA1 Message Date
Mehdi Goli
b512a9536f Enabling per device specialisation of packetsize. 2018-08-01 13:39:13 +01:00
Benoit Steiner
edf46bd7a2 Merged in yuefengz/eigen (pull request PR-370)
Use device's allocate function instead of internal::aligned_malloc.
2018-07-31 22:38:28 +00:00
Gael Guennebaud
678a0dcb12 Merged in ezhulenev/eigen/tiling_3 (pull request PR-438)
Tiled tensor executor
2018-07-31 08:13:00 +00:00
Gael Guennebaud
679eece876 Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437. 2018-07-31 10:10:14 +02:00
Gael Guennebaud
723856dec1 bug #1577: fix msvc compilation of unit test, msvc defines ptrdiff_t as long long 2018-07-30 14:52:15 +02:00
Eugene Zhulenev
966c2a7bb6 Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible 2018-07-27 12:45:17 -07:00
Eugene Zhulenev
6913221c43 Add tiled evaluation support to TensorExecutor 2018-07-25 13:51:10 -07:00
Alexey Frunze
7b91c11207 bug #1578: Improve prefetching in matrix multiplication on MIPS. 2018-07-24 18:36:44 -07:00
Patrik Huber
f5cace5e9f Fix two small typos in the documentation 2018-07-26 19:55:19 +00:00
Gael Guennebaud
34539c4af4 Merged in rmlarsen/eigen1 (pull request PR-441)
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-30 11:26:24 +00:00
Mark D Ryan
bc615e4585 Re-enable FMA for fast sqrt functions 2018-07-30 13:21:00 +02:00
Mark D Ryan
96b030a8e4 Re-enable FMA for fast sqrt functions
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms.  The float32
version is now 88% the speed of the original function, while the
double version is 90%.
2018-07-30 10:19:51 +01:00
Rasmus Munk Larsen
e478532625 Reduce the number of template specializations of classes related to tensor contraction to reduce binary size. 2018-07-27 12:36:34 -07:00
Rasmus Munk Larsen
2ebcb911b2 Add pcast packet op for NEON. 2018-07-26 14:28:48 -07:00
Christoph Hertzberg
397b0547e1 DIsable static assertions only when necessary and disable double-promotion warnings in that case as well 2018-07-26 00:01:24 +02:00
Christoph Hertzberg
5e79402b4a fix warnings for doc-eigen-prerequisites 2018-07-24 21:59:15 +02:00
Christoph Hertzberg
5f79b7f9a9 Removed several shadowing types and use global Index typedef everywhere 2018-07-25 21:47:45 +02:00
Christoph Hertzberg
44ee201337 Rename variable which shadows class name 2018-07-25 20:26:15 +02:00
Gustavo Lima Chaves
705f66a9ca Account for missing change on commit "Remove SimpleThreadPool and..."
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was me the default/only one, but a reference to the old
name was left unmodified. Fix that.
2018-07-23 16:29:09 -07:00
Christoph Hertzberg
fd4fe7cbc5 Fixed issue which made documentation not getting built anymore 2018-07-24 22:56:15 +02:00
Christoph Hertzberg
636126ef40 Allow to filter out build-error messages 2018-07-24 20:12:49 +02:00
Eugene Zhulenev
d55efa6f0f TensorBlockIO 2018-07-23 15:50:55 -07:00
Eugene Zhulenev
34a75c3c5c Initial support of TensorBlock 2018-07-20 17:37:20 -07:00
Gael Guennebaud
2c2de9da7d Merged in glchaves/eigen (pull request PR-433)
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded  block
2018-07-23 19:38:55 +00:00
Gael Guennebaud
4ca3e48f42 fix typo 2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a Add lastN shorcuts to seq/seqN. 2018-07-23 16:20:25 +02:00
Gustavo Lima Chaves
02eaaacbc5 Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded
block

Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
2018-07-20 16:08:40 -07:00
Eugene Zhulenev
2bf864f1eb Disable type traits for stdlibc++ <= 4.9.3 2018-07-20 10:11:44 -07:00
Gael Guennebaud
de70671937 Oopps, EIGEN_COMP_MSVC is not available before including Eigen. 2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build. 2018-07-20 08:36:38 -07:00
Eugene Zhulenev
c58b874727 PR430: Convert count to the reducer type in MeanReducer
Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails.

cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
  ->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
     return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly I’m not sure why it works in Eigen tests, because Eigen::half constructor is explicit, and why it stopped working in TF, I didn’t find any relevant changes since previous Eigen upgrade.

static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.
2018-07-19 17:37:03 -07:00
Gael Guennebaud
2424e3b7ac Pass by const ref. 2018-07-19 18:48:19 +02:00
Gael Guennebaud
509a5fa77f Fix IsRelocatable without C++11 2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009 Fix determination of EIGEN_HAS_TYPE_TRAITS 2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f Fix stupid error in Quaternion move ctor 2018-07-19 18:33:53 +02:00
David Hyde
d908afe35f bug #1558: fix a corner case in MINRES when both v_new and w_new vanish. 2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379 Reduce number of allocations in TensorContractionThreadPool. 2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746 bug #1569: fix Tensor<half>::mean() on AVX with respective unit test. 2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304 Add MIPS changes missing from previous merge. 2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739 Assert that no output kernel is defined for GPU contraction 2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85 Disable type traits for GCC < 5.1.0 2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce Specify default output kernel for TensorContractionOp 2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f Add regression for bugs #1573 and #1575 2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88 bug #1432: fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable 2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72 Generalize ScalarWithExceptions to a full non-copyable and trowing scalar type to be used in other unit tests. 2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725 bug #1575: fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted. 2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9 More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA 2018-07-18 13:51:36 +02:00
Mark D Ryan
e79c5149bf Fix AVX512 implementations of psqrt
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb
 fixed the AVX2 version of this function.  The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN.  Fixing the issues requires adding
some additional instructions that slow down the algorithms.  A
similar test to the one used in 3ed67cb0bb
 shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packed8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Yuefeng Zhou
1eff6cf8a7 Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances. 2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e Fix implicit conversion from 0.0 to scalar 2018-02-16 22:26:01 +04:00