Benoit Steiner
dc413dbe8a
Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
...
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6
Use EIGEN_STATIC_ASSERT for backward compatibility.
2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341
Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.
2016-02-02 11:06:53 -08:00
Ville Kallioniemi
aedea349aa
Replace separate low word constructors with a single templated constructor.
2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f
Rebase to latest.
2016-02-01 19:32:31 -07:00
Benoit Steiner
64ce78c2ec
Cleaned up a tensor contraction test
2016-02-01 13:57:41 -08:00
Benoit Steiner
0ce5d32be5
Sharded the cxx11_tensor_contract_cuda test
2016-02-01 13:33:23 -08:00
Benoit Steiner
922b5f527b
Silenced a few compilation warnings
2016-02-01 13:30:49 -08:00
Benoit Steiner
6b5dff875e
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makesit possible to set aside streaming multiprocessors for other computations.
2016-02-01 12:46:32 -08:00
Benoit Steiner
264f8141f8
Shared the tensor reduction test
2016-02-01 07:44:31 -08:00
Benoit Steiner
11bb71c8fc
Sharded the tensor device test
2016-02-01 07:34:59 -08:00
Benoit Steiner
e80ed948e1
Fixed a number of compilation warnings generated by the cuda tests
2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf
Fixed a few compilation warnings
2016-01-31 16:48:50 -08:00
Benoit Steiner
4a2ddfb81d
Sharded the CUDA argmax tensor test
2016-01-31 10:44:15 -08:00
Benoit Steiner
483082ef6e
Fixed a few memory leaks in the cuda tests
2016-01-30 11:59:22 -08:00
Benoit Steiner
bd21aba181
Sharded the cxx11_tensor_cuda test and fixed a memory leak
2016-01-30 11:47:09 -08:00
Benoit Steiner
9de155d153
Added a test to cover threaded tensor shuffling
2016-01-30 10:56:47 -08:00
Benoit Steiner
32088c06a1
Made the comparison between single and multithreaded contraction results more resistant to numerical noise to prevent spurious test failures.
2016-01-30 10:51:14 -08:00
Benoit Steiner
2053478c56
Made sure to use a tensor of rank 0 to store the result of a full reduction in the tensor thread pool test
2016-01-30 10:46:36 -08:00
Benoit Steiner
d0db95f730
Sharded the tensor thread pool test
2016-01-30 10:43:57 -08:00
Benoit Steiner
ba27c8a7de
Made the CUDA contract test more robust to numerical noise.
2016-01-30 10:28:43 -08:00
Benoit Steiner
963f2d2a8f
Marked several methods EIGEN_DEVICE_FUNC
2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0
Fixed a couple of compilation warnings.
2016-01-28 23:15:45 -08:00
Benoit Steiner
7b3044d086
Made sure to call nvcc with the relaxed-constexpr flag.
2016-01-28 15:36:34 -08:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Gael Guennebaud
7802a6bb1c
Fix unit test filename.
2016-01-28 09:35:37 +01:00
Benoit Steiner
4bf9eaf77a
Deleted an invalid assertion that prevented the assignment of empty tensors.
2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885
Fixed some compilation problems with nvcc + clang
2016-01-27 15:37:03 -08:00
Benoit Steiner
47ca9dc809
Fixed the tensor_cuda test
2016-01-27 14:58:48 -08:00
Benoit Steiner
55a5204319
Fixed the flags passed to nvcc to compile the tensor code.
2016-01-27 14:46:34 -08:00
Benoit Steiner
9dfbd4fe8d
Made the cuda tests compile using make check
2016-01-27 12:22:17 -08:00
Benoit Steiner
5973bcf939
Properly specify the namespace when calling cout/endl
2016-01-27 12:04:42 -08:00
Gael Guennebaud
9c8f7dfe94
bug #1156 : fix several function declarations whose arguments were passed by value instead of being passed by reference
2016-01-27 18:34:42 +01:00
Ville Kallioniemi
02db1228ed
Add constructor for long types.
2016-01-26 23:41:01 -07:00
Hauke Heibel
5eb2790be0
Fixed minor typo in SplineFitting.
2016-01-25 22:17:52 +01:00
Benoit Steiner
e3a15a03a4
Don't explicitely evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e
Added missing EIGEN_DEVICE_FUNC qualifier
2016-01-24 20:36:05 -08:00
Benoit Steiner
cb4e53ff7f
Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
...
Add ctor for long
2016-01-22 19:11:31 -08:00
Ville Kallioniemi
9f94e030c1
Re-add executable flags to minimize changeset.
2016-01-22 20:08:45 -07:00
Benoit Steiner
3aeeca32af
Leverage the new blocking code in the tensor contraction code.
2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27
Created a mechanism to enable contraction mappers to determine the best blocking strategy.
2016-01-22 14:37:26 -08:00
Gael Guennebaud
6a44ccb58b
Backout changeset 690bc950f7
2016-01-22 15:03:53 +01:00
Ville Kallioniemi
9b6c72958a
Update to latest default branch
2016-01-21 23:08:54 -07:00
Benoit Steiner
c33479324c
Fixed a constness bug
2016-01-21 17:08:11 -08:00
Jan Prach
690bc950f7
fix clang warnings
...
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
7ce932edd3
Small cleanup and small fix to the contraction of row major tensors
2016-01-20 18:12:08 -08:00
Benoit Steiner
47076bf00e
Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.
2016-01-20 14:51:48 -08:00
Ville Kallioniemi
915e7667cd
Remove executable bit from header files
2016-01-19 21:17:29 -07:00
Ville Kallioniemi
2832175a68
Use explicitly 32 bit integer types in constructors.
2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901
Improved the formatting of the code
2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375
Moved the contraction mapping code to its own file to make the code more manageable.
2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f
Improved code indentation
2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
2016-01-19 17:05:10 -08:00
Ville Kallioniemi
63fb66f53a
Add ctor for long
2016-01-17 21:25:36 -07:00
Benoit Steiner
34057cff23
Fixed a race condition that could affect some reductions on CUDA devices.
2016-01-15 15:11:56 -08:00
Benoit Steiner
0461f0153e
Made it possible to compare tensor dimensions inside a CUDA kernel.
2016-01-15 11:22:16 -08:00
Benoit Steiner
aed4cb1269
Use warp shuffles instead of shared memory access to speedup the inner reduction kernel.
2016-01-14 21:45:14 -08:00
Benoit Steiner
8fe2532e70
Fixed a boundary condition bug in the outer reduction kernel
2016-01-14 09:29:48 -08:00
Benoit Steiner
9f013a9d86
Properly record the rank of reduced tensors in the tensor traits.
2016-01-13 14:24:37 -08:00
Benoit Steiner
79b69b7444
Trigger the optimized matrix vector path more conservatively.
2016-01-12 15:21:09 -08:00
Benoit Steiner
d920d57f38
Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speedup LSTM neural networks.
2016-01-12 11:32:27 -08:00
Benoit Steiner
bd7d901da9
Reverted a previous change that tripped nvcc when compiling in debug mode.
2016-01-11 17:49:44 -08:00
Benoit Steiner
c5e6900400
Silenced a few compilation warnings.
2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61
Updated the tensor traits: the alignment is not part of the Flags enum anymore
2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c
Enabled the use of fixed dimensions from within a cuda kernel.
2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6
Deleted unused variable.
2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7
Silenced a nvcc compilation warning
2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24
Silenced several compilation warnings triggered by nvcc.
2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
...
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634
Fixed a bug in the dispatch of optimized reduction kernels.
2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e
Re-enabled the optimized reduction CUDA code.
2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a
Cleaned up double-defined macro from last commit
2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3
Alternative way of forcing instantiation of device kernels without
...
causing warnings or requiring device to device kernel invocations.
This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Benoit Steiner
e76904af1b
Simplified the dispatch code.
2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac
Made it possible to use array of size 0 on CUDA devices
2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd
Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases
2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures.
2016-01-08 13:53:40 -08:00
Benoit Steiner
6639b7d6e8
Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
...
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2
Fixed a typo.
2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818
Optimized the performance of broadcasting of scalars.
2016-01-06 18:47:45 -08:00
Benoit Steiner
cfff40b1d4
Improved the performance of reductions on CUDA devices
2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf
Added a 'divup' util to compute the floor of the quotient of two integers
2016-01-04 16:29:26 -08:00
Gael Guennebaud
8b0d1eb0f7
Fix numerous doxygen shortcomings, and workaround some clang -Wdocumentation warnings
2016-01-01 21:45:06 +01:00
Gael Guennebaud
978c379ed7
Add missing ctor from uint
2015-12-30 12:52:38 +01:00
Eugene Brevdo
f7362772e3
Add digamma for CPU + CUDA. Includes tests.
2015-12-24 21:15:38 -08:00
Benoit Steiner
bdcbc66a5c
Don't attempt to vectorize mean reductions of integers since we can't use
...
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5
Optimized the configuration of the outer reduction cuda kernel
2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b
Added missing define
2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810
Made sure the optimized gpu reduction code is actually compiled.
2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a
Optimized outer reduction on GPUs.
2015-12-22 15:06:17 -08:00
Benoit Steiner
1c3e78319d
Added missing const
2015-12-21 15:05:01 -08:00
Benoit Steiner
1b82969559
Add alignment requirement for local buffer used by the slicing op.
2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919
Doubled the speed of full reductions on GPUs.
2015-12-18 14:07:31 -08:00
Benoit Steiner
8dd17cbe80
Fixed a clang compilation warning triggered by the use of arrays of size 0.
2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684
Silenced some compilation warnings triggered by nvcc
2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3
Made it possible to run tensor chipping operations on CUDA devices
2015-12-17 13:29:08 -08:00
Benoit Steiner
2ca55a3ae4
Fixed some compilation error triggered by the tensor code with msvc 2008
2015-12-16 20:45:58 -08:00
Gael Guennebaud
35d8725c73
Disable AutoDiffScalar generic copy ctor for non compatible scalar types (fix ambiguous template instantiation)
2015-12-16 10:14:24 +01:00
Christoph Hertzberg
92655e7215
bug #1136 : Protect isinf for Intel compilers. Also don't distinguish GCC from ICC and don't rely on EIGEN_NOT_A_MACRO, which might not be defined when including this.
2015-12-15 11:34:52 +01:00
Benoit Steiner
17352e2792
Made the entire TensorFixedSize api callable from a CUDA kernel.
2015-12-14 15:20:31 -08:00