Commit Graph

1547 Commits

Author SHA1 Message Date
Benoit Steiner
dc413dbe8a Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6 Use EIGEN_STATIC_ASSERT for backward compatibility. 2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341 Don't try to use direct offsets when computing a tensor product, since the required stride isn't available. 2016-02-02 11:06:53 -08:00
Ville Kallioniemi
aedea349aa Replace separate low word constructors with a single templated constructor. 2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f Rebase to latest. 2016-02-01 19:32:31 -07:00
Benoit Steiner
64ce78c2ec Cleaned up a tensor contraction test 2016-02-01 13:57:41 -08:00
Benoit Steiner
0ce5d32be5 Sharded the cxx11_tensor_contract_cuda test 2016-02-01 13:33:23 -08:00
Benoit Steiner
922b5f527b Silenced a few compilation warnings 2016-02-01 13:30:49 -08:00
Benoit Steiner
6b5dff875e Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makesit possible to set aside streaming multiprocessors for other computations. 2016-02-01 12:46:32 -08:00
Benoit Steiner
264f8141f8 Shared the tensor reduction test 2016-02-01 07:44:31 -08:00
Benoit Steiner
11bb71c8fc Sharded the tensor device test 2016-02-01 07:34:59 -08:00
Benoit Steiner
e80ed948e1 Fixed a number of compilation warnings generated by the cuda tests 2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf Fixed a few compilation warnings 2016-01-31 16:48:50 -08:00
Benoit Steiner
4a2ddfb81d Sharded the CUDA argmax tensor test 2016-01-31 10:44:15 -08:00
Benoit Steiner
483082ef6e Fixed a few memory leaks in the cuda tests 2016-01-30 11:59:22 -08:00
Benoit Steiner
bd21aba181 Sharded the cxx11_tensor_cuda test and fixed a memory leak 2016-01-30 11:47:09 -08:00
Benoit Steiner
9de155d153 Added a test to cover threaded tensor shuffling 2016-01-30 10:56:47 -08:00
Benoit Steiner
32088c06a1 Made the comparison between single and multithreaded contraction results more resistant to numerical noise to prevent spurious test failures. 2016-01-30 10:51:14 -08:00
Benoit Steiner
2053478c56 Made sure to use a tensor of rank 0 to store the result of a full reduction in the tensor thread pool test 2016-01-30 10:46:36 -08:00
Benoit Steiner
d0db95f730 Sharded the tensor thread pool test 2016-01-30 10:43:57 -08:00
Benoit Steiner
ba27c8a7de Made the CUDA contract test more robust to numerical noise. 2016-01-30 10:28:43 -08:00
Benoit Steiner
963f2d2a8f Marked several methods EIGEN_DEVICE_FUNC 2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0 Fixed a couple of compilation warnings. 2016-01-28 23:15:45 -08:00
Benoit Steiner
7b3044d086 Made sure to call nvcc with the relaxed-constexpr flag. 2016-01-28 15:36:34 -08:00
Gael Guennebaud
ddf64babde merge 2016-01-28 13:21:48 +01:00
Gael Guennebaud
7802a6bb1c Fix unit test filename. 2016-01-28 09:35:37 +01:00
Benoit Steiner
4bf9eaf77a Deleted an invalid assertion that prevented the assignment of empty tensors. 2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885 Fixed some compilation problems with nvcc + clang 2016-01-27 15:37:03 -08:00
Benoit Steiner
47ca9dc809 Fixed the tensor_cuda test 2016-01-27 14:58:48 -08:00
Benoit Steiner
55a5204319 Fixed the flags passed to nvcc to compile the tensor code. 2016-01-27 14:46:34 -08:00
Benoit Steiner
9dfbd4fe8d Made the cuda tests compile using make check 2016-01-27 12:22:17 -08:00
Benoit Steiner
5973bcf939 Properly specify the namespace when calling cout/endl 2016-01-27 12:04:42 -08:00
Gael Guennebaud
9c8f7dfe94 bug #1156: fix several function declarations whose arguments were passed by value instead of being passed by reference 2016-01-27 18:34:42 +01:00
Ville Kallioniemi
02db1228ed Add constructor for long types. 2016-01-26 23:41:01 -07:00
Hauke Heibel
5eb2790be0 Fixed minor typo in SplineFitting. 2016-01-25 22:17:52 +01:00
Benoit Steiner
e3a15a03a4 Don't explicitely evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression 2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e Added missing EIGEN_DEVICE_FUNC qualifier 2016-01-24 20:36:05 -08:00
Benoit Steiner
cb4e53ff7f Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
Add ctor for long
2016-01-22 19:11:31 -08:00
Ville Kallioniemi
9f94e030c1 Re-add executable flags to minimize changeset. 2016-01-22 20:08:45 -07:00
Benoit Steiner
3aeeca32af Leverage the new blocking code in the tensor contraction code. 2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27 Created a mechanism to enable contraction mappers to determine the best blocking strategy. 2016-01-22 14:37:26 -08:00
Gael Guennebaud
6a44ccb58b Backout changeset 690bc950f7 2016-01-22 15:03:53 +01:00
Ville Kallioniemi
9b6c72958a Update to latest default branch 2016-01-21 23:08:54 -07:00
Benoit Steiner
c33479324c Fixed a constness bug 2016-01-21 17:08:11 -08:00
Jan Prach
690bc950f7 fix clang warnings
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
7ce932edd3 Small cleanup and small fix to the contraction of row major tensors 2016-01-20 18:12:08 -08:00
Benoit Steiner
47076bf00e Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%. 2016-01-20 14:51:48 -08:00
Ville Kallioniemi
915e7667cd Remove executable bit from header files 2016-01-19 21:17:29 -07:00
Ville Kallioniemi
2832175a68 Use explicitly 32 bit integer types in constructors. 2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901 Improved the formatting of the code 2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375 Moved the contraction mapping code to its own file to make the code more manageable. 2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f Improved code indentation 2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33 Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. 2016-01-19 17:05:10 -08:00
Ville Kallioniemi
63fb66f53a Add ctor for long 2016-01-17 21:25:36 -07:00
Benoit Steiner
34057cff23 Fixed a race condition that could affect some reductions on CUDA devices. 2016-01-15 15:11:56 -08:00
Benoit Steiner
0461f0153e Made it possible to compare tensor dimensions inside a CUDA kernel. 2016-01-15 11:22:16 -08:00
Benoit Steiner
aed4cb1269 Use warp shuffles instead of shared memory access to speedup the inner reduction kernel. 2016-01-14 21:45:14 -08:00
Benoit Steiner
8fe2532e70 Fixed a boundary condition bug in the outer reduction kernel 2016-01-14 09:29:48 -08:00
Benoit Steiner
9f013a9d86 Properly record the rank of reduced tensors in the tensor traits. 2016-01-13 14:24:37 -08:00
Benoit Steiner
79b69b7444 Trigger the optimized matrix vector path more conservatively. 2016-01-12 15:21:09 -08:00
Benoit Steiner
d920d57f38 Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speedup LSTM neural networks. 2016-01-12 11:32:27 -08:00
Benoit Steiner
bd7d901da9 Reverted a previous change that tripped nvcc when compiling in debug mode. 2016-01-11 17:49:44 -08:00
Benoit Steiner
c5e6900400 Silenced a few compilation warnings. 2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61 Updated the tensor traits: the alignment is not part of the Flags enum anymore 2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c Enabled the use of fixed dimensions from within a cuda kernel. 2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6 Deleted unused variable. 2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7 Silenced a nvcc compilation warning 2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24 Silenced several compilation warnings triggered by nvcc. 2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634 Fixed a bug in the dispatch of optimized reduction kernels. 2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e Re-enabled the optimized reduction CUDA code. 2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a Cleaned up double-defined macro from last commit 2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3 Alternative way of forcing instantiation of device kernels without
causing warnings or requiring device to device kernel invocations.

This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Benoit Steiner
e76904af1b Simplified the dispatch code. 2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac Made it possible to use array of size 0 on CUDA devices 2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases 2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415 Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures. 2016-01-08 13:53:40 -08:00
Benoit Steiner
6639b7d6e8 Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2 Fixed a typo. 2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818 Optimized the performance of broadcasting of scalars. 2016-01-06 18:47:45 -08:00
Benoit Steiner
cfff40b1d4 Improved the performance of reductions on CUDA devices 2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf Added a 'divup' util to compute the floor of the quotient of two integers 2016-01-04 16:29:26 -08:00
Gael Guennebaud
8b0d1eb0f7 Fix numerous doxygen shortcomings, and workaround some clang -Wdocumentation warnings 2016-01-01 21:45:06 +01:00
Gael Guennebaud
978c379ed7 Add missing ctor from uint 2015-12-30 12:52:38 +01:00
Eugene Brevdo
f7362772e3 Add digamma for CPU + CUDA. Includes tests. 2015-12-24 21:15:38 -08:00
Benoit Steiner
bdcbc66a5c Don't attempt to vectorize mean reductions of integers since we can't use
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5 Optimized the configuration of the outer reduction cuda kernel 2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b Added missing define 2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810 Made sure the optimized gpu reduction code is actually compiled. 2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a Optimized outer reduction on GPUs. 2015-12-22 15:06:17 -08:00
Benoit Steiner
1c3e78319d Added missing const 2015-12-21 15:05:01 -08:00
Benoit Steiner
1b82969559 Add alignment requirement for local buffer used by the slicing op. 2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919 Doubled the speed of full reductions on GPUs. 2015-12-18 14:07:31 -08:00
Benoit Steiner
8dd17cbe80 Fixed a clang compilation warning triggered by the use of arrays of size 0. 2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684 Silenced some compilation warnings triggered by nvcc 2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3 Made it possible to run tensor chipping operations on CUDA devices 2015-12-17 13:29:08 -08:00
Benoit Steiner
2ca55a3ae4 Fixed some compilation error triggered by the tensor code with msvc 2008 2015-12-16 20:45:58 -08:00
Gael Guennebaud
35d8725c73 Disable AutoDiffScalar generic copy ctor for non compatible scalar types (fix ambiguous template instantiation) 2015-12-16 10:14:24 +01:00
Christoph Hertzberg
92655e7215 bug #1136: Protect isinf for Intel compilers. Also don't distinguish GCC from ICC and don't rely on EIGEN_NOT_A_MACRO, which might not be defined when including this. 2015-12-15 11:34:52 +01:00
Benoit Steiner
17352e2792 Made the entire TensorFixedSize api callable from a CUDA kernel. 2015-12-14 15:20:31 -08:00