Commit Graph

2671 Commits

Author SHA1 Message Date
Deven Desai
2dbea5510f Merged eigen/eigen into default 2019-03-19 16:52:38 -04:00
David Tellenbach
bd9c2ae3fd Fix include guard comments 2019-03-15 15:29:17 +01:00
Eugene Zhulenev
001f10e3c9 Fix segfaults with cuda compilation 2019-03-11 09:43:33 -07:00
Eugene Zhulenev
899c16fa2c Fix a bug in TensorGenerator for 1d tensors 2019-03-11 09:42:01 -07:00
Eugene Zhulenev
0f8bfff23d Fix a data race in NonBlockingThreadPool 2019-03-11 09:38:44 -07:00
Gael Guennebaud
2df4f00246 Change license from LGPL to MPL2 with agreement from David Harmon. 2019-03-07 18:17:10 +01:00
Rasmus Munk Larsen
3c3f639fe2 Merge. 2019-03-06 11:54:30 -08:00
Rasmus Munk Larsen
f4ec8edea8 Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local. 2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen
41cdc370d0 Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
Found with -Wundefined-func-template.

Author: tkoeppe@google.com
2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen
cc407c9d4d Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
Found with -Wundefined-func-template.

Author: tkoeppe@google.com
2019-03-06 11:40:06 -08:00
Eugene Zhulenev
1bc2a0a57c Add missing return to NonBlockingThreadPool::LocalSteal 2019-03-06 10:49:49 -08:00
Eugene Zhulenev
4e4dcd9026 Remove redundant steal loop 2019-03-06 10:39:07 -08:00
Eugene Zhulenev
25abaa2e41 Check that inner block dimension is continuous 2019-03-05 17:34:35 -08:00
Eugene Zhulenev
5d9a6686ed Block evaluation for TensorGeneratorOp 2019-03-05 16:35:21 -08:00
Eugene Zhulenev
a407e022e6 Tune tensor contraction threadpool heuristics 2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82 Add an extra check for the RunQueue size estimate 2019-03-05 11:51:26 -08:00
Eugene Zhulenev
b1a8627493 Do not create Tensor<const T> in cxx11_tensor_forced_eval test 2019-03-05 11:19:25 -08:00
Eugene Zhulenev
efb5080d31 Do not initialize invalid fast_strides in TensorGeneratorOp 2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2 Add tiled evaluation for TensorForcedEvalOp 2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd Use fast divisors in TensorGeneratorOp 2019-03-04 11:10:21 -08:00
Rasmus Munk Larsen
cf4a1c81fa Fix specialization for conjugate on non-complex types in TensorBase.h. 2019-03-01 14:21:09 -08:00
Rasmus Munk Larsen
6560692c67 Improve EventCount used by the non-blocking threadpool.
The current algorithm requires threads to commit/cancel waiting in order
they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
Gael Guennebaud
9ac1634fdf Fix conversion warnings 2019-02-19 21:59:53 +01:00
Rasmus Munk Larsen
071629a440 Fix incorrect value of NumDimensions in TensorContraction traits.
Reported here: #1671
2019-02-19 10:49:54 -08:00
Rasmus Larsen
efeabee445 Merged in ezhulenev/eigen-01 (pull request PR-590)
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7 Fix signed-unsigned return in RuqQueue 2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265 Fix signed-unsigned comparison warning in RunQueue 2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a Do not generate no-op cast() and conjugate() expressions 2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790 Speedup Tensor ThreadPool RunQueu::Empty() 2019-02-13 10:20:53 -08:00
Eugene Zhulenev
21eb97d3e0 Add PacketConv implementation for non-vectorizable src expressions 2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1 Optimize TensorConversion evaluator: do not convert same type 2019-02-08 15:13:24 -08:00
Steven Peters
953ca5ba2f Spline.h: fix spelling "spang" -> "span" 2019-02-08 06:23:24 +00:00
Eugene Zhulenev
59998117bb Don't do parallel_pack if we can use thread_local memory in tensor contractions 2019-02-07 09:21:25 -08:00
Eugene Zhulenev
8491127082 Do not reduce parallelism too much in contractions with small number of threads 2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769 Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing 2019-02-04 10:43:16 -08:00
Gael Guennebaud
d586686924 Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper. 2019-01-30 16:48:01 +01:00
Christoph Hertzberg
a7779a9b42 Hide some annoying unused variable warnings in g++8.1 2019-01-29 16:48:21 +01:00
Christoph Hertzberg
c9825b967e Renaming even more I identifiers 2019-01-26 13:22:13 +01:00
Christoph Hertzberg
934b8a1304 Avoid I as an identifier, since it may clash with the C-header complex.h 2019-01-25 14:54:39 +01:00
Rasmus Munk Larsen
ee550a2ac3 Fix flaky test for tensor fft. 2019-01-16 14:03:12 -08:00
Eugene Zhulenev
1e6d15b55b Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-11 11:41:53 -08:00
Eugene Zhulenev
0abe03764c Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-10 10:27:55 -08:00
Gael Guennebaud
d812f411c3 bug #1654: fix compilation with cuda and no c++11 2019-01-09 18:00:05 +01:00
Eugene Zhulenev
e70ffef967 Optimize evalShardedByInnerDim 2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
dd6d65898a Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0. 2018-12-12 14:45:31 -08:00
Gael Guennebaud
cf697272e1 Remove debug code. 2018-12-09 23:05:46 +01:00
Gael Guennebaud
450dc97c6b Various fixes in polynomial solver and its unit tests:
- cleanup noise in imaginary part of real roots
 - take into account the magnitude of the derivative to check roots.
 - use <= instead of < at appropriate places
2018-12-09 22:54:39 +01:00
Rasmus Munk Larsen
8a02883d58 Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
Fix tensor contraction on AVX512 builds

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-05 18:19:32 +00:00
Mark D Ryan
36f8f6d0be Fix evalShardedByInnerDim for AVX512 builds
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size.  While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16.  The result is slightly
incorrect float32 contractions on AVX512 builds.

This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
Christoph Hertzberg
0ec8afde57 Fixed most conversion warnings in MatrixFunctions module 2018-11-20 16:23:28 +01:00