Mehdi Goli
b512a9536f
Enabling per device specialisation of packetsize.
2018-08-01 13:39:13 +01:00
Benoit Steiner
edf46bd7a2
Merged in yuefengz/eigen (pull request PR-370)
...
Use device's allocate function instead of internal::aligned_malloc.
2018-07-31 22:38:28 +00:00
Gael Guennebaud
678a0dcb12
Merged in ezhulenev/eigen/tiling_3 (pull request PR-438)
...
Tiled tensor executor
2018-07-31 08:13:00 +00:00
Gael Guennebaud
679eece876
Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437.
2018-07-31 10:10:14 +02:00
Gael Guennebaud
723856dec1
bug #1577 : fix msvc compilation of unit test, msvc defines ptrdiff_t as long long
2018-07-30 14:52:15 +02:00
Eugene Zhulenev
966c2a7bb6
Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible
2018-07-27 12:45:17 -07:00
Eugene Zhulenev
6913221c43
Add tiled evaluation support to TensorExecutor
2018-07-25 13:51:10 -07:00
Alexey Frunze
7b91c11207
bug #1578 : Improve prefetching in matrix multiplication on MIPS.
2018-07-24 18:36:44 -07:00
Patrik Huber
f5cace5e9f
Fix two small typos in the documentation
2018-07-26 19:55:19 +00:00
Gael Guennebaud
34539c4af4
Merged in rmlarsen/eigen1 (pull request PR-441)
...
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-30 11:26:24 +00:00
Mark D Ryan
bc615e4585
Re-enable FMA for fast sqrt functions
2018-07-30 13:21:00 +02:00
Mark D Ryan
96b030a8e4
Re-enable FMA for fast sqrt functions
...
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms. The float32
version is now 88% the speed of the original function, while the
double version is 90%.
2018-07-30 10:19:51 +01:00
Rasmus Munk Larsen
e478532625
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-27 12:36:34 -07:00
Rasmus Munk Larsen
2ebcb911b2
Add pcast packet op for NEON.
2018-07-26 14:28:48 -07:00
Christoph Hertzberg
397b0547e1
DIsable static assertions only when necessary and disable double-promotion warnings in that case as well
2018-07-26 00:01:24 +02:00
Christoph Hertzberg
5e79402b4a
fix warnings for doc-eigen-prerequisites
2018-07-24 21:59:15 +02:00
Christoph Hertzberg
5f79b7f9a9
Removed several shadowing types and use global Index typedef everywhere
2018-07-25 21:47:45 +02:00
Christoph Hertzberg
44ee201337
Rename variable which shadows class name
2018-07-25 20:26:15 +02:00
Gustavo Lima Chaves
705f66a9ca
Account for missing change on commit "Remove SimpleThreadPool and..."
...
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was me the default/only one, but a reference to the old
name was left unmodified. Fix that.
2018-07-23 16:29:09 -07:00
Christoph Hertzberg
fd4fe7cbc5
Fixed issue which made documentation not getting built anymore
2018-07-24 22:56:15 +02:00
Christoph Hertzberg
636126ef40
Allow to filter out build-error messages
2018-07-24 20:12:49 +02:00
Eugene Zhulenev
d55efa6f0f
TensorBlockIO
2018-07-23 15:50:55 -07:00
Eugene Zhulenev
34a75c3c5c
Initial support of TensorBlock
2018-07-20 17:37:20 -07:00
Gael Guennebaud
2c2de9da7d
Merged in glchaves/eigen (pull request PR-433)
...
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block
2018-07-23 19:38:55 +00:00
Gael Guennebaud
4ca3e48f42
fix typo
2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a
Add lastN shorcuts to seq/seqN.
2018-07-23 16:20:25 +02:00
Gustavo Lima Chaves
02eaaacbc5
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded
...
block
Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
2018-07-20 16:08:40 -07:00
Eugene Zhulenev
2bf864f1eb
Disable type traits for stdlibc++ <= 4.9.3
2018-07-20 10:11:44 -07:00
Gael Guennebaud
de70671937
Oopps, EIGEN_COMP_MSVC is not available before including Eigen.
2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc
Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build.
2018-07-20 08:36:38 -07:00
Eugene Zhulenev
c58b874727
PR430: Convert count to the reducer type in MeanReducer
...
Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails.
cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly I’m not sure why it works in Eigen tests, because Eigen::half constructor is explicit, and why it stopped working in TF, I didn’t find any relevant changes since previous Eigen upgrade.
static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.
2018-07-19 17:37:03 -07:00
Gael Guennebaud
2424e3b7ac
Pass by const ref.
2018-07-19 18:48:19 +02:00
Gael Guennebaud
509a5fa77f
Fix IsRelocatable without C++11
2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009
Fix determination of EIGEN_HAS_TYPE_TRAITS
2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f
Fix stupid error in Quaternion move ctor
2018-07-19 18:33:53 +02:00
David Hyde
d908afe35f
bug #1558 : fix a corner case in MINRES when both v_new and w_new vanish.
2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379
Reduce number of allocations in TensorContractionThreadPool.
2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746
bug #1569 : fix Tensor<half>::mean() on AVX with respective unit test.
2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304
Add MIPS changes missing from previous merge.
2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739
Assert that no output kernel is defined for GPU contraction
2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85
Disable type traits for GCC < 5.1.0
2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce
Specify default output kernel for TensorContractionOp
2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f
Add regression for bugs #1573 and #1575
2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88
bug #1432 : fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable
2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72
Generalize ScalarWithExceptions to a full non-copyable and trowing scalar type to be used in other unit tests.
2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725
bug #1575 : fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted.
2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9
More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA
2018-07-18 13:51:36 +02:00
Mark D Ryan
e79c5149bf
Fix AVX512 implementations of psqrt
...
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb
fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issues requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb
shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packed8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Yuefeng Zhou
1eff6cf8a7
Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances.
2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e
Fix implicit conversion from 0.0 to scalar
2018-02-16 22:26:01 +04:00