Commit Graph

2122 Commits

Author SHA1 Message Date
Eugene Zhulenev
5d9a6686ed Block evaluation for TensorGeneratorOp 2019-03-05 16:35:21 -08:00
Eugene Zhulenev
a407e022e6 Tune tensor contraction threadpool heuristics 2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82 Add an extra check for the RunQueue size estimate 2019-03-05 11:51:26 -08:00
Eugene Zhulenev
efb5080d31 Do not initialize invalid fast_strides in TensorGeneratorOp 2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2 Add tiled evaluation for TensorForcedEvalOp 2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd Use fast divisors in TensorGeneratorOp 2019-03-04 11:10:21 -08:00
Rasmus Munk Larsen
cf4a1c81fa Fix specialization for conjugate on non-complex types in TensorBase.h. 2019-03-01 14:21:09 -08:00
Rasmus Munk Larsen
6560692c67 Improve EventCount used by the non-blocking threadpool.
The current algorithm requires threads to commit/cancel waiting in order
they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
Gael Guennebaud
9ac1634fdf Fix conversion warnings 2019-02-19 21:59:53 +01:00
Rasmus Munk Larsen
071629a440 Fix incorrect value of NumDimensions in TensorContraction traits.
Reported here: #1671
2019-02-19 10:49:54 -08:00
Rasmus Larsen
efeabee445 Merged in ezhulenev/eigen-01 (pull request PR-590)
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7 Fix signed-unsigned return in RuqQueue 2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265 Fix signed-unsigned comparison warning in RunQueue 2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a Do not generate no-op cast() and conjugate() expressions 2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790 Speedup Tensor ThreadPool RunQueu::Empty() 2019-02-13 10:20:53 -08:00
Eugene Zhulenev
21eb97d3e0 Add PacketConv implementation for non-vectorizable src expressions 2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1 Optimize TensorConversion evaluator: do not convert same type 2019-02-08 15:13:24 -08:00
Steven Peters
953ca5ba2f Spline.h: fix spelling "spang" -> "span" 2019-02-08 06:23:24 +00:00
Eugene Zhulenev
59998117bb Don't do parallel_pack if we can use thread_local memory in tensor contractions 2019-02-07 09:21:25 -08:00
Eugene Zhulenev
8491127082 Do not reduce parallelism too much in contractions with small number of threads 2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769 Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing 2019-02-04 10:43:16 -08:00
Gael Guennebaud
d586686924 Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper. 2019-01-30 16:48:01 +01:00
Christoph Hertzberg
a7779a9b42 Hide some annoying unused variable warnings in g++8.1 2019-01-29 16:48:21 +01:00
Christoph Hertzberg
c9825b967e Renaming even more I identifiers 2019-01-26 13:22:13 +01:00
Christoph Hertzberg
934b8a1304 Avoid I as an identifier, since it may clash with the C-header complex.h 2019-01-25 14:54:39 +01:00
Eugene Zhulenev
1e6d15b55b Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-11 11:41:53 -08:00
Eugene Zhulenev
0abe03764c Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-10 10:27:55 -08:00
Gael Guennebaud
d812f411c3 bug #1654: fix compilation with cuda and no c++11 2019-01-09 18:00:05 +01:00
Eugene Zhulenev
e70ffef967 Optimize evalShardedByInnerDim 2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
dd6d65898a Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0. 2018-12-12 14:45:31 -08:00
Gael Guennebaud
cf697272e1 Remove debug code. 2018-12-09 23:05:46 +01:00
Gael Guennebaud
450dc97c6b Various fixes in polynomial solver and its unit tests:
- cleanup noise in imaginary part of real roots
 - take into account the magnitude of the derivative to check roots.
 - use <= instead of < at appropriate places
2018-12-09 22:54:39 +01:00
Rasmus Munk Larsen
8a02883d58 Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
Fix tensor contraction on AVX512 builds

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-05 18:19:32 +00:00
Mark D Ryan
36f8f6d0be Fix evalShardedByInnerDim for AVX512 builds
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size.  While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16.  The result is slightly
incorrect float32 contractions on AVX512 builds.

This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
Christoph Hertzberg
0ec8afde57 Fixed most conversion warnings in MatrixFunctions module 2018-11-20 16:23:28 +01:00
Rasmus Munk Larsen
72928a2c8a Merged in rmlarsen/eigen2 (pull request PR-543)
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626 Remove accidental changes. 2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65 Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. 2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen
93f9988a7e A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles. 2018-11-09 14:15:32 -08:00
Christoph Hertzberg
449ff74672 Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
dda68f56ec Fix GPU build due to gpu_assert not always being defined. 2018-10-18 16:29:29 -07:00
Eugene Zhulenev
9e96e91936 Move from rvalue arguments in ThreadPool enqueue* methods 2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816 Reduce thread scheduling overhead in parallelFor 2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f Merged in ezhulenev/eigen-02 (pull request PR-528)
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-10-16 15:39:40 +00:00
Eugene Zhulenev
900c7c61bb Check if it's allowed to squueze inner dimensions in TensorBlockIO 2018-10-15 16:52:33 -07:00
Gael Guennebaud
f0fb95135d Iterative solvers: unify and fix handling of multiple rhs.
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve 2018-10-15 23:46:00 +02:00
Christoph Hertzberg
3f2c8b7ff0 Fix a lot of Doxygen warnings in Tensor module 2018-10-09 20:22:47 +02:00
Rasmus Munk Larsen
d16634c4d4 Fix out-of bounds access in TensorArgMax.h. 2018-10-08 16:41:36 -07:00
Gael Guennebaud
64b1a15318 Workaround stupid warning 2018-10-08 12:01:18 +02:00