Commit Graph

10188 Commits

Author SHA1 Message Date
Mark D Ryan
36f8f6d0be Fix evalShardedByInnerDim for AVX512 builds
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size.  While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16.  The result is slightly
incorrect float32 contractions on AVX512 builds.

This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
Mark D Ryan
6f5b126e6d Fix tensor contraction for AVX512 machines
This patch modifies the TensorContraction class to ensure that the kc_ field is
always a multiple of the packet_size, if the packet_size is > 8.  Without this
change spatial convolutions in Tensorflow do not work properly as the code that
re-arranges the input matrices can assert if kc_ is not a multiple of the
packet_size.  This leads to a unit test failure,
//tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.
2018-07-31 09:33:37 +01:00
Rasmus Munk Larsen
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. 2018-11-12 13:42:24 -08:00
Gael Guennebaud
c81bdbdadc Add manual doc on STL-compatible iterators 2018-11-12 22:06:33 +01:00
Gael Guennebaud
0105146915 Fix warning in c++03 2018-11-10 09:11:38 +01:00
Rasmus Munk Larsen
93f9988a7e A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles. 2018-11-09 14:15:32 -08:00
Gael Guennebaud
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators 2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba bug #1619: make const and non-const iterators compatible 2018-11-09 16:49:19 +01:00
Gael Guennebaud
fbd6e7b025 add missing ref to a.zeta(b) 2018-11-09 13:53:42 +01:00
Gael Guennebaud
dffd1e11de Limit the size of the toc 2018-11-09 13:52:34 +01:00
Gael Guennebaud
a88e0a0e95 Update doxy hacks wrt doxygen 1.8.13/14 2018-11-09 13:52:10 +01:00
Gael Guennebaud
bd9a00718f Let doxygen sees lastN 2018-11-09 11:35:48 +01:00
Gael Guennebaud
d7c644213c Add and update manual pages for slicing, indexing, and reshaping. 2018-11-09 11:35:27 +01:00
Gael Guennebaud
a368848473 Recent xcode versions does support EIGEN_HAS_STATIC_ARRAY_TEMPLATE 2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6 Fix max-size in indexed-view 2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff Merged in glchaves/eigen (pull request PR-539)
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gael Guennebaud
995730fc6c Add option to disable plot generation 2018-11-07 00:41:16 +01:00
Gustavo Lima Chaves
4ad359237a Vectorize row-by-row gebp loop iterations on 16 packets as well
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Gael Guennebaud
9d318b92c6 add unit tests for bug #1619 2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d bug #1618: Use different power-of-2 check to avoid MSVC warning 2018-11-01 13:23:19 +01:00
Rasmus Munk Larsen
07fcdd1438 Merged in ezhulenev/eigen-02 (pull request PR-534)
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 18:34:35 +00:00
Eugene Zhulenev
8a977c1f46 Fix cxx11_tensor_{block_access, reduction} tests 2018-10-25 11:31:29 -07:00
Halie Murray-Davis
fb62d6d96e Fix typo in tutorial documentation. 2018-10-25 04:55:34 +00:00
Christoph Hertzberg
b5f077d22c Document EIGEN_NO_IO preprocessor directive 2018-10-25 16:49:25 +02:00
Christian von Schultz
4a40b3785d Collapsed revision (based on pull request PR-325)
* Support compiling without IO streams

Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen
14054e217f Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC. This breaks several target in the TensorFlow Cuda build, e.g.,
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code

4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
954b4ca9d0 Suppress compiler warning about unused global variable. 2018-10-22 13:48:56 -07:00
Rasmus Munk Larsen
9caafca550 Merged in rmlarsen/eigen (pull request PR-532)
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672 Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
39fec15d5c Merged eigen/eigen into default 2018-10-19 09:48:19 -07:00
Christoph Hertzberg
40fa6f98bf bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
Grafted manually from a4afa90d16
2018-10-19 17:20:51 +02:00
Rasmus Munk Larsen
d8f285852b Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available. 2018-10-18 16:55:02 -07:00
Rasmus Munk Larsen
dda68f56ec Fix GPU build due to gpu_assert not always being defined. 2018-10-18 16:29:29 -07:00
Gael Guennebaud
1dcf5a6ed8 fix typo in doc 2018-10-17 09:29:36 +02:00
Eugene Zhulenev
9e96e91936 Move from rvalue arguments in ThreadPool enqueue* methods 2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816 Reduce thread scheduling overhead in parallelFor 2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f Merged in ezhulenev/eigen-02 (pull request PR-528)
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-10-16 15:39:40 +00:00
Gael Guennebaud
0f780bb0b4 Fix float-to-double warning 2018-10-16 09:19:45 +02:00
Eugene Zhulenev
900c7c61bb Check if it's allowed to squueze inner dimensions in TensorBlockIO 2018-10-15 16:52:33 -07:00
Gael Guennebaud
a39e0f7438 bug #1612: fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>) 2018-10-16 01:04:25 +02:00
Gael Guennebaud
e3b85771d7 Show call stack in case of failing sparse solving. 2018-10-16 00:43:44 +02:00
Gael Guennebaud
d2d570c116 Remove useless (and broken) resize 2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d Iterative solvers: unify and fix handling of multiple rhs.
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve 2018-10-15 23:46:00 +02:00
Gael Guennebaud
d835a0bf53 relax number of iterations checks to avoid false negatives 2018-10-15 10:23:32 +02:00
Gael Guennebaud
3a33db4de5 merge 2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1 Suppress unused variable compiler warning in sparse subtest 3. 2018-10-12 13:41:57 -07:00
Mark D Ryan
aa110e681b PR 526: Speed up multiplication of small, dynamically sized matrices
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop.  Using these types is likely to lead to little or no
vectorization.  Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.

This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions.  It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.

I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds.  Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.


Disabling Packed8d on AVX512 builds:

Total Cases:             969
Better:                  511
Worse:                   85
Same:                    373
Max Improvement:         169.00% (4 8 6)
Max Degradation:         36.50% (8 5 3)
Median Improvement:      35.46%
Median Degradation:      17.41%
Total FLOPs Improvement: 19.42%


Disabling Packet16f and Packed8f on AVX512 builds:

Total Cases:             969
Better:                  658
Worse:                   5
Same:                    306
Max Improvement:         214.05% (8 6 5)
Max Degradation:         22.26% (16 2 1)
Median Improvement:      60.05%
Median Degradation:      13.32%
Total FLOPs Improvement: 59.58%


Disabling Packed8f on AVX builds:

Total Cases:             969
Better:                  663
Worse:                   96
Same:                    210
Max Improvement:         155.29% (4 10 5)
Max Degradation:         35.12% (8 3 2)
Median Improvement:      34.28%
Median Degradation:      15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
Eugene Zhulenev
d9392f9e55 Fix code format 2018-11-02 14:51:35 -07:00