Benoit Steiner
f268db1c4b
Added the ability to query the minor version of a cuda device
2016-02-19 16:31:04 +00:00
Benoit Steiner
f3352e0fb0
Don't make the array constructors explicit
2016-02-19 15:58:57 +00:00
Benoit Steiner
cd042dbbfd
Fixed a bug in the tensor type converter
2016-02-19 15:03:26 +00:00
Benoit Steiner
de345eff2e
Added a method to conjugate the content of a tensor or the result of a tensor expression.
2016-02-11 16:34:07 -08:00
Benoit Steiner
9a21b38ccc
Worked around a few clang compilation warnings
2016-02-10 08:02:04 -08:00
Benoit Steiner
72ab7879f7
Fixed clang compilation warnings
2016-02-10 06:48:28 -08:00
Benoit Steiner
e88535634d
Fixed some clang compilation warnings
2016-02-09 23:32:41 -08:00
Benoit Steiner
d69946183d
Updated the TensorIntDivisor code to work properly on LLP64 systems
2016-02-08 21:03:59 -08:00
Benoit Steiner
4d4211c04e
Avoid unnecessary type conversions
2016-02-05 18:19:41 -08:00
Benoit Steiner
f535378995
Added support for vectorized type casting of int to char.
2016-02-03 18:58:29 -08:00
Benoit Steiner
4ab63a3f6f
Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements.
2016-02-03 17:23:07 -08:00
Benoit Steiner
1cbb79cdfd
Made sure the dummy element of the size-0 array is always initialized to silence some compiler warnings
2016-02-03 15:58:26 -08:00
Benoit Steiner
dc413dbe8a
Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
...
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6
Use EIGEN_STATIC_ASSERT for backward compatibility.
2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341
Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.
2016-02-02 11:06:53 -08:00
Ville Kallioniemi
aedea349aa
Replace separate low word constructors with a single templated constructor.
2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f
Rebase to latest.
2016-02-01 19:32:31 -07:00
Benoit Steiner
6b5dff875e
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations.
2016-02-01 12:46:32 -08:00
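A minimal sketch of how such a block cap might factor into a launch configuration; the helper below and the maxBlocks() accessor are assumed names for illustration, not necessarily the exact Eigen API.

    // Cap the number of CUDA blocks so that some streaming multiprocessors
    // stay free for other work. 'max_blocks' is the configured limit (assumed name).
    static int numBlocksWithCap(int num_items, int threads_per_block, int max_blocks) {
      const int wanted = (num_items + threads_per_block - 1) / threads_per_block;  // round up
      return wanted < max_blocks ? wanted : max_blocks;  // never exceed the cap
    }
    // Hypothetical use: kernel<<<numBlocksWithCap(n, 256, device.maxBlocks()), 256>>>(...);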
Benoit Steiner
e80ed948e1
Fixed a number of compilation warnings generated by the cuda tests
2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf
Fixed a few compilation warnings
2016-01-31 16:48:50 -08:00
Benoit Steiner
963f2d2a8f
Marked several methods EIGEN_DEVICE_FUNC
2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0
Fixed a couple of compilation warnings.
2016-01-28 23:15:45 -08:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Benoit Steiner
4bf9eaf77a
Deleted an invalid assertion that prevented the assignment of empty tensors.
2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885
Fixed some compilation problems with nvcc + clang
2016-01-27 15:37:03 -08:00
Ville Kallioniemi
02db1228ed
Add constructor for long types.
2016-01-26 23:41:01 -07:00
Benoit Steiner
e3a15a03a4
Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e
Added missing EIGEN_DEVICE_FUNC qualifier
2016-01-24 20:36:05 -08:00
Benoit Steiner
cb4e53ff7f
Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
...
Add ctor for long
2016-01-22 19:11:31 -08:00
Benoit Steiner
3aeeca32af
Leverage the new blocking code in the tensor contraction code.
2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27
Created a mechanism to enable contraction mappers to determine the best blocking strategy.
2016-01-22 14:37:26 -08:00
Gael Guennebaud
6a44ccb58b
Backout changeset 690bc950f7
2016-01-22 15:03:53 +01:00
Ville Kallioniemi
9b6c72958a
Update to latest default branch
2016-01-21 23:08:54 -07:00
Benoit Steiner
c33479324c
Fixed a constness bug
2016-01-21 17:08:11 -08:00
Jan Prach
690bc950f7
fix clang warnings
...
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
7ce932edd3
Small cleanup and small fix to the contraction of row major tensors
2016-01-20 18:12:08 -08:00
Benoit Steiner
47076bf00e
Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.
2016-01-20 14:51:48 -08:00
Ville Kallioniemi
2832175a68
Use explicitly 32 bit integer types in constructors.
2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901
Improved the formatting of the code
2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375
Moved the contraction mapping code to its own file to make the code more manageable.
2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f
Improved code indentation
2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
2016-01-19 17:05:10 -08:00
Ville Kallioniemi
63fb66f53a
Add ctor for long
2016-01-17 21:25:36 -07:00
Benoit Steiner
34057cff23
Fixed a race condition that could affect some reductions on CUDA devices.
2016-01-15 15:11:56 -08:00
Benoit Steiner
0461f0153e
Made it possible to compare tensor dimensions inside a CUDA kernel.
2016-01-15 11:22:16 -08:00
Benoit Steiner
aed4cb1269
Use warp shuffles instead of shared memory access to speedup the inner reduction kernel.
2016-01-14 21:45:14 -08:00
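For context, the warp-shuffle idiom referenced here trades shared-memory traffic for register-to-register exchanges; a minimal device-code sketch, using the pre-CUDA-9 __shfl_down intrinsic that was current at the time:

    // Sum-reduce a value across the 32 lanes of a warp without touching shared memory.
    __device__ float warpReduceSum(float val) {
      for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        val += __shfl_down(val, offset);  // fetch the value held 'offset' lanes away
      }
      return val;  // lane 0 now holds the warp-wide sum
    }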
Benoit Steiner
8fe2532e70
Fixed a boundary condition bug in the outer reduction kernel
2016-01-14 09:29:48 -08:00
Benoit Steiner
9f013a9d86
Properly record the rank of reduced tensors in the tensor traits.
2016-01-13 14:24:37 -08:00
Benoit Steiner
79b69b7444
Trigger the optimized matrix vector path more conservatively.
2016-01-12 15:21:09 -08:00
Benoit Steiner
d920d57f38
Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks.
2016-01-12 11:32:27 -08:00
Benoit Steiner
bd7d901da9
Reverted a previous change that tripped nvcc when compiling in debug mode.
2016-01-11 17:49:44 -08:00
Benoit Steiner
c5e6900400
Silenced a few compilation warnings.
2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61
Updated the tensor traits: the alignment is not part of the Flags enum anymore
2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c
Enabled the use of fixed dimensions from within a cuda kernel.
2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6
Deleted unused variable.
2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7
Silenced an nvcc compilation warning
2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24
Silenced several compilation warnings triggered by nvcc.
2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
...
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634
Fixed a bug in the dispatch of optimized reduction kernels.
2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e
Re-enabled the optimized reduction CUDA code.
2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a
Cleaned up double-defined macro from last commit
2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3
Alternative way of forcing instantiation of device kernels without
...
causing warnings or requiring device-to-device kernel invocations.
This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Benoit Steiner
e76904af1b
Simplified the dispatch code.
2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac
Made it possible to use arrays of size 0 on CUDA devices
2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd
Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases
2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures.
2016-01-08 13:53:40 -08:00
Benoit Steiner
6639b7d6e8
Removed a couple of partial specializations that confuse nvcc and result in errors such as this:
...
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2
Fixed a typo.
2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818
Optimized the performance of broadcasting of scalars.
2016-01-06 18:47:45 -08:00
Benoit Steiner
cfff40b1d4
Improved the performance of reductions on CUDA devices
2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf
Added a 'divup' util to compute the ceiling of the quotient of two integers
2016-01-04 16:29:26 -08:00
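divup is the usual round-up division helper; a sketch of the standard pattern (the exact Eigen signature and qualifiers may differ):

    // Round-up integer division: divup(x, y) == ceil(x / y) for positive y.
    template <typename T>
    T divup(const T x, const T y) {
      return (x + y - 1) / y;  // e.g. divup(10, 4) == 3, divup(8, 4) == 2
    }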
Gael Guennebaud
978c379ed7
Add missing ctor from uint
2015-12-30 12:52:38 +01:00
Eugene Brevdo
f7362772e3
Add digamma for CPU + CUDA. Includes tests.
2015-12-24 21:15:38 -08:00
Benoit Steiner
bdcbc66a5c
Don't attempt to vectorize mean reductions of integers since we can't use
...
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5
Optimized the configuration of the outer reduction cuda kernel
2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b
Added missing define
2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810
Made sure the optimized gpu reduction code is actually compiled.
2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a
Optimized outer reduction on GPUs.
2015-12-22 15:06:17 -08:00
Benoit Steiner
1c3e78319d
Added missing const
2015-12-21 15:05:01 -08:00
Benoit Steiner
1b82969559
Add alignment requirement for local buffer used by the slicing op.
2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919
Doubled the speed of full reductions on GPUs.
2015-12-18 14:07:31 -08:00
Benoit Steiner
8dd17cbe80
Fixed a clang compilation warning triggered by the use of arrays of size 0.
2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684
Silenced some compilation warnings triggered by nvcc
2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3
Made it possible to run tensor chipping operations on CUDA devices
2015-12-17 13:29:08 -08:00
Benoit Steiner
2ca55a3ae4
Fixed some compilation errors triggered by the tensor code with MSVC 2008
2015-12-16 20:45:58 -08:00
Benoit Steiner
17352e2792
Made the entire TensorFixedSize api callable from a CUDA kernel.
2015-12-14 15:20:31 -08:00
Benoit Steiner
75e19fc7ca
Marked the tensor constructors as EIGEN_DEVICE_FUNC: This makes it possible to call them from a CUDA kernel.
2015-12-14 15:12:55 -08:00
Gael Guennebaud
ca39b1546e
Merged in ebrevdo/eigen (pull request PR-148)
...
Add special functions to eigen: lgamma, erf, erfc.
2015-12-11 11:52:09 +01:00
Benoit Steiner
6af52a1227
Fixed a typo in the constructor of tensors of rank 5.
2015-12-10 23:31:12 -08:00
Benoit Steiner
8e00ea9a92
Fixed the coefficient accessors used for the 2d and 3d cases when compiling without C++11 support.
2015-12-10 22:45:10 -08:00
Eugene Brevdo
fa4f933c0f
Add special functions to Eigen: lgamma, erf, erfc.
...
Includes CUDA support and unit tests.
2015-12-07 15:24:49 -08:00
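A hedged usage sketch, assuming the special functions are exposed as element-wise methods on tensor expressions like the other coefficient-wise unary ops:

    #include <unsupported/Eigen/CXX11/Tensor>

    void specialFunctionsExample() {
      Eigen::Tensor<float, 2> t(3, 3);
      t.setRandom();
      // Element-wise special functions (method names assumed to follow the cwise pattern).
      Eigen::Tensor<float, 2> a = t.abs().lgamma();
      Eigen::Tensor<float, 2> b = t.erf();
      Eigen::Tensor<float, 2> c = t.erfc();
    }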
Benoit Steiner
7dfe75f445
Fixed compilation warnings
2015-12-07 08:12:30 -08:00
Benoit Steiner
f4ca8ad917
Use signed integers instead of unsigned ones more consistently in the codebase.
2015-12-04 18:14:16 -08:00
Benoit Steiner
490d26e4c1
Use integers instead of std::size_t to encode the number of dimensions in the Tensor class since most of the code currently already uses integers.
2015-12-04 10:15:11 -08:00
Benoit Steiner
d20efc974d
Made it possible to use the sigmoid functor within a CUDA kernel.
2015-12-04 09:38:15 -08:00
Benoit Steiner
029052d276
Deleted redundant code
2015-12-03 17:08:47 -08:00
Mark Borgerding
7ddcf97da7
Added scalar_sign_op (both real and complex)
2015-11-24 17:15:07 -05:00
Benoit Steiner
44848ac39b
Fixed a bug in TensorArgMax.h
2015-11-23 15:58:47 -08:00
Benoit Steiner
547a8608e5
Fixed the implementation of Eigen::internal::count_leading_zeros for MSVC.
...
Also updated the code to silence bogus warnings generated by nvcc when compiling this function.
2015-11-23 12:17:45 -08:00
Benoit Steiner
562078780a
Don't create more cuda blocks than necessary
2015-11-23 11:00:10 -08:00
Benoit Steiner
df31ca3b9e
Made it possible to refer to a GPUDevice from code compiled with a regular C++ compiler
2015-11-23 10:03:53 -08:00
Benoit Steiner
1e04059012
Deleted unused variable.
2015-11-23 08:36:54 -08:00
Benoit Steiner
9fa65d3838
Split TensorDeviceType.h into 3 files to make it more manageable
2015-11-20 17:42:50 -08:00
Benoit Steiner
a367804856
Added option to force the usage of the Eigen array class instead of the std::array class.
2015-11-20 12:41:40 -08:00
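If the opt-out works through a preprocessor define (the macro name EIGEN_AVOID_STL_ARRAY below is an assumption based on common downstream usage, not verified against this commit), it would be set before including the Tensor header:

    // Hypothetical: force Eigen's own array emulation instead of std::array.
    #define EIGEN_AVOID_STL_ARRAY 1
    #include <unsupported/Eigen/CXX11/Tensor>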
Benoit Steiner
383d1cc2ed
Added proper support for fast 64-bit integer division on CUDA
2015-11-20 11:09:46 -08:00
Benoit Steiner
f37a5f1c53
Fixed compilation error triggered by nvcc
2015-11-19 14:34:26 -08:00
Benoit Steiner
f8df393165
Added support for 128-bit integers on CUDA devices.
2015-11-19 13:57:27 -08:00
Benoit Steiner
1dd444ea71
Avoid using the version of TensorIntDiv optimized for 32-bit integers when the divisor can be equal to one, since that case isn't supported.
2015-11-18 11:37:58 -08:00
Benoit Steiner
f1fbd74db9
Added sanity check
2015-11-13 09:07:27 -08:00
Benoit Steiner
7815b84be4
Fixed a compilation warning
2015-11-12 20:16:59 -08:00
Benoit Steiner
10a91930cc
Fixed a compilation warning triggered by nvcc
2015-11-12 20:10:52 -08:00
Benoit Steiner
ed4b37de02
Fixed a few compilation warnings
2015-11-12 20:08:01 -08:00
Benoit Steiner
b69248fa2a
Added a couple of missing EIGEN_DEVICE_FUNC
2015-11-12 20:01:50 -08:00
Benoit Steiner
0aaa5941df
Silenced some compilation warnings triggered by nvcc
2015-11-12 19:11:43 -08:00
Benoit Steiner
2c73633b28
Fixed a few more typos
2015-11-12 18:39:19 -08:00
Benoit Steiner
be08e82953
Fixed typos
2015-11-12 18:37:40 -08:00
Benoit Steiner
150c12e138
Completed the IndexList rewrite
2015-11-12 18:11:56 -08:00
Benoit Steiner
8037826367
Simplified more of the IndexList code.
2015-11-12 17:19:45 -08:00
Benoit Steiner
e9ecfad796
Started to make the IndexList code compile with more compilers
2015-11-12 16:41:14 -08:00
Benoit Steiner
7a1316fcc5
Fixed compilation error with xcode.
2015-11-12 11:05:54 -08:00
Benoit Steiner
737d237722
Made it possible to run some of the CXXMeta functions on a CUDA device.
2015-11-12 09:02:59 -08:00
Benoit Steiner
1e072424e8
Moved the array code into its own file.
2015-11-12 08:57:04 -08:00
Benoit Steiner
aa5f1ca714
gen_numeric_list takes a size_t, not an int
2015-11-12 08:30:10 -08:00
Benoit Steiner
9fa10fe52d
Don't use std::array when compiling with nvcc since NVIDIA doesn't support the use of STL containers on the GPU.
2015-11-11 15:38:30 -08:00
Benoit Steiner
c587293e48
Fixed a compilation warning
2015-11-11 15:35:12 -08:00
Benoit Steiner
7f1c29fb0c
Make it possible for a vectorized tensor expression to be executed in a CUDA kernel.
2015-11-11 15:22:50 -08:00
Benoit Steiner
99f4778506
Disable SFINAE when compiling with nvcc
2015-11-11 15:04:58 -08:00
Benoit Steiner
5cb18e5b5e
Fixed CUDA compilation errors
2015-11-11 14:36:33 -08:00
Benoit Steiner
228edfe616
Use Eigen::NumTraits instead of std::numeric_limits
2015-11-11 09:26:23 -08:00
Benoit Steiner
20e2ab1121
Fixed another compilation warning
2015-12-07 16:17:57 -08:00
Benoit Steiner
d573efe303
Code cleanup
2015-11-06 14:54:28 -08:00
Benoit Steiner
9fa283339f
Silenced a compilation warning
2015-11-06 11:44:22 -08:00
Benoit Steiner
53432a17b2
Added static assertions to avoid misuses of padding, broadcasting and concatenation ops.
2015-11-06 10:26:19 -08:00
Benoit Steiner
6857a35a11
Fixed typos
2015-11-06 09:42:05 -08:00
Benoit Steiner
33cbdc2d15
Added more missing EIGEN_DEVICE_FUNC
2015-11-06 09:29:59 -08:00
Benoit Steiner
ed1962b464
Reimplement the tensor comparison operators by using the scalar_cmp_op functors. This makes them more CUDA-friendly.
2015-11-06 09:18:43 -08:00
Benoit Steiner
29038b982d
Added support for modulo operation
2015-11-05 19:39:48 -08:00
Benoit Steiner
c75a19f815
Misc fixes to full reductions
2015-11-05 14:21:20 -08:00
Benoit Steiner
ec5a81b45a
Fixed a bug in the extraction of sizes of fixed sized tensors of rank 0
2015-11-05 13:39:48 -08:00
Benoit Steiner
beedd9630d
Updated the reduction code so that full reductions now return a tensor of rank 0.
2015-11-04 13:57:36 -08:00
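In practice this means a full reduction can be read back through a rank-0 tensor; a small sketch, assuming the usual TensorBase reduction API:

    #include <unsupported/Eigen/CXX11/Tensor>

    void fullReductionExample() {
      Eigen::Tensor<float, 2> t(2, 3);
      t.setConstant(1.0f);
      // A full reduction now yields a rank-0 tensor rather than a size-1 vector.
      Eigen::Tensor<float, 0> total = t.sum();
      float value = total();  // rank-0 tensors are read with the nullary call operator
    }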
Benoit Steiner
6a02c2a85d
Fixed a compilation warning
2015-10-29 20:21:29 -07:00
Benoit Steiner
ca12d4c3b3
Pulled latest updates from trunk
2015-10-29 17:57:48 -07:00
Benoit Steiner
ce19e38c1f
Added support for tensor maps of rank 0.
2015-10-29 17:49:04 -07:00
Benoit Steiner
3785c69294
Added support for fixed sized tensors of rank 0
2015-10-29 17:31:03 -07:00
Benoit Steiner
0d7a23d34e
Extended the reduction code so that reducing an empty set returns the neutral element for the operation
2015-10-29 17:29:49 -07:00
Benoit Steiner
1b0685d09a
Added support for rank-0 tensors
2015-10-29 17:27:38 -07:00
Benoit Steiner
c444a0a8c3
Consistently use the same index type in the FFT codebase.
2015-10-29 16:39:47 -07:00
Benoit Steiner
09ea3a7acd
Silenced a few more compilation warnings
2015-10-29 16:22:52 -07:00
Benoit Steiner
0974a57910
Silenced compiler warning
2015-10-29 15:00:06 -07:00
Benoit Steiner
1c8312c811
Started to add support for tensors of rank 0
2015-10-26 14:29:26 -07:00