Benoit Steiner
|
46fc23f91c
|
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
|
2016-02-19 13:44:22 -08:00 |
|
Benoit Steiner
|
670db7988d
|
Updated the contraction code to make it compatible with half floats.
|
2016-02-19 13:03:26 -08:00 |
|
Benoit Steiner
|
180156ba1a
|
Added support for tensor reductions on half floats
|
2016-02-19 10:05:59 -08:00 |
|
Benoit Steiner
|
f268db1c4b
|
Added the ability to query the minor version of a CUDA device
|
2016-02-19 16:31:04 +00:00 |
|
Benoit Steiner
|
f3352e0fb0
|
Don't make the array constructors explicit
|
2016-02-19 15:58:57 +00:00 |
|
Benoit Steiner
|
cd042dbbfd
|
Fixed a bug in the tensor type converter
|
2016-02-19 15:03:26 +00:00 |
|
Benoit Steiner
|
de345eff2e
|
Added a method to conjugate the content of a tensor or the result of a tensor expression.
|
2016-02-11 16:34:07 -08:00 |
|
Benoit Steiner
|
9a21b38ccc
|
Worked around a few clang compilation warnings
|
2016-02-10 08:02:04 -08:00 |
|
Benoit Steiner
|
72ab7879f7
|
Fixed clang compilation warnings
|
2016-02-10 06:48:28 -08:00 |
|
Benoit Steiner
|
e88535634d
|
Fixed some clang compilation warnings
|
2016-02-09 23:32:41 -08:00 |
|
Benoit Steiner
|
d69946183d
|
Updated the TensorIntDivisor code to work properly on LLP64 systems
|
2016-02-08 21:03:59 -08:00 |
|
Benoit Steiner
|
4d4211c04e
|
Avoid unnecessary type conversions
|
2016-02-05 18:19:41 -08:00 |
|
Benoit Steiner
|
f535378995
|
Added support for vectorized type casting of int to char.
|
2016-02-03 18:58:29 -08:00 |
|
Benoit Steiner
|
4ab63a3f6f
|
Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements.
|
2016-02-03 17:23:07 -08:00 |
|
Benoit Steiner
|
1cbb79cdfd
|
Made sure the dummy element of a size 0 array is always initialized to silence some compiler warnings
|
2016-02-03 15:58:26 -08:00 |
|
Benoit Steiner
|
dc413dbe8a
|
Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
Add constructor for long types.
|
2016-02-02 20:58:06 -08:00 |
|
Ville Kallioniemi
|
783018d8f6
|
Use EIGEN_STATIC_ASSERT for backward compatibility.
|
2016-02-02 16:45:12 -07:00 |
|
Benoit Steiner
|
99cde88341
|
Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.
|
2016-02-02 11:06:53 -08:00 |
|
Ville Kallioniemi
|
aedea349aa
|
Replace separate low word constructors with a single templated constructor.
|
2016-02-01 20:25:02 -07:00 |
|
Ville Kallioniemi
|
f0fdefa96f
|
Rebase to latest.
|
2016-02-01 19:32:31 -07:00 |
|
Benoit Steiner
|
6b5dff875e
|
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations.
|
2016-02-01 12:46:32 -08:00 |
|
Benoit Steiner
|
e80ed948e1
|
Fixed a number of compilation warnings generated by the CUDA tests
|
2016-01-31 20:09:41 -08:00 |
|
Benoit Steiner
|
6720b38fbf
|
Fixed a few compilation warnings
|
2016-01-31 16:48:50 -08:00 |
|
Benoit Steiner
|
963f2d2a8f
|
Marked several methods EIGEN_DEVICE_FUNC
|
2016-01-28 23:37:48 -08:00 |
|
Benoit Steiner
|
c5d25bf1d0
|
Fixed a couple of compilation warnings.
|
2016-01-28 23:15:45 -08:00 |
|
Gael Guennebaud
|
ddf64babde
|
merge
|
2016-01-28 13:21:48 +01:00 |
|
Benoit Steiner
|
4bf9eaf77a
|
Deleted an invalid assertion that prevented the assignment of empty tensors.
|
2016-01-27 17:09:30 -08:00 |
|
Benoit Steiner
|
291069e885
|
Fixed some compilation problems with nvcc + clang
|
2016-01-27 15:37:03 -08:00 |
|
Ville Kallioniemi
|
02db1228ed
|
Add constructor for long types.
|
2016-01-26 23:41:01 -07:00 |
|
Benoit Steiner
|
e3a15a03a4
|
Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
|
2016-01-24 23:04:50 -08:00 |
|
Benoit Steiner
|
bd207ce11e
|
Added missing EIGEN_DEVICE_FUNC qualifier
|
2016-01-24 20:36:05 -08:00 |
|
Benoit Steiner
|
cb4e53ff7f
|
Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
Add ctor for long
|
2016-01-22 19:11:31 -08:00 |
|
Benoit Steiner
|
3aeeca32af
|
Leverage the new blocking code in the tensor contraction code.
|
2016-01-22 16:36:30 -08:00 |
|
Benoit Steiner
|
4beb447e27
|
Created a mechanism to enable contraction mappers to determine the best blocking strategy.
|
2016-01-22 14:37:26 -08:00 |
|
Gael Guennebaud
|
6a44ccb58b
|
Backout changeset 690bc950f7
|
2016-01-22 15:03:53 +01:00 |
|
Ville Kallioniemi
|
9b6c72958a
|
Update to latest default branch
|
2016-01-21 23:08:54 -07:00 |
|
Benoit Steiner
|
c33479324c
|
Fixed a constness bug
|
2016-01-21 17:08:11 -08:00 |
|
Jan Prach
|
690bc950f7
|
fix clang warnings
"braces around scalar initializer"
|
2016-01-20 19:35:59 -08:00 |
|
Benoit Steiner
|
7ce932edd3
|
Small cleanup and small fix to the contraction of row major tensors
|
2016-01-20 18:12:08 -08:00 |
|
Benoit Steiner
|
47076bf00e
|
Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.
|
2016-01-20 14:51:48 -08:00 |
|
Ville Kallioniemi
|
2832175a68
|
Use explicitly 32 bit integer types in constructors.
|
2016-01-19 20:12:17 -07:00 |
|
Benoit Steiner
|
df79c00901
|
Improved the formatting of the code
|
2016-01-19 17:24:08 -08:00 |
|
Benoit Steiner
|
6d472d8375
|
Moved the contraction mapping code to its own file to make the code more manageable.
|
2016-01-19 17:22:05 -08:00 |
|
Benoit Steiner
|
b3b722905f
|
Improved code indentation
|
2016-01-19 17:09:47 -08:00 |
|
Benoit Steiner
|
5b7713dd33
|
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
|
2016-01-19 17:05:10 -08:00 |
|
Ville Kallioniemi
|
63fb66f53a
|
Add ctor for long
|
2016-01-17 21:25:36 -07:00 |
|
Benoit Steiner
|
34057cff23
|
Fixed a race condition that could affect some reductions on CUDA devices.
|
2016-01-15 15:11:56 -08:00 |
|
Benoit Steiner
|
0461f0153e
|
Made it possible to compare tensor dimensions inside a CUDA kernel.
|
2016-01-15 11:22:16 -08:00 |
|
Benoit Steiner
|
aed4cb1269
|
Use warp shuffles instead of shared memory access to speed up the inner reduction kernel.
|
2016-01-14 21:45:14 -08:00 |
|
Benoit Steiner
|
8fe2532e70
|
Fixed a boundary condition bug in the outer reduction kernel
|
2016-01-14 09:29:48 -08:00 |
|