Benoit Steiner
bd7d901da9
Reverted a previous change that tripped nvcc when compiling in debug mode.
2016-01-11 17:49:44 -08:00
Benoit Steiner
bbdabbb379
Made the blas utils usable from within a cuda kernel
2016-01-11 17:26:56 -08:00
Benoit Steiner
c5e6900400
Silenced a few compilation warnings.
2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61
Updated the tensor traits: the alignment is not part of the Flags enum anymore
2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c
Enabled the use of fixed dimensions from within a cuda kernel.
2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6
Deleted unused variable.
2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7
Silenced a nvcc compilation warning
2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24
Silenced several compilation warnings triggered by nvcc.
2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
...
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634
Fixed a bug in the dispatch of optimized reduction kernels.
2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e
Re-enabled the optimized reduction CUDA code.
2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a
Cleaned up double-defined macro from last commit
2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3
Alternative way of forcing instantiation of device kernels without
...
causing warnings or requiring device to device kernel invocations.
This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Gael Guennebaud
b557662e58
merge
2016-01-09 08:37:01 +01:00
Gael Guennebaud
8b9dc9f0df
bug #1144 : fix regression in x=y+A*x (aliasing), and move evaluator_traits::AssumeAliasing to evaluator_assume_aliasing.
2016-01-09 08:30:38 +01:00
Benoit Steiner
e76904af1b
Simplified the dispatch code.
2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac
Made it possible to use array of size 0 on CUDA devices
2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd
Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases
2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures.
2016-01-08 13:53:40 -08:00
Gael Guennebaud
f9d71a1729
extend matlab conversion table
2016-01-08 22:24:45 +01:00
Benoit Steiner
6639b7d6e8
Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
...
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2
Fixed a typo.
2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818
Optimized the performance of broadcasting of scalars.
2016-01-06 18:47:45 -08:00
Gael Guennebaud
ee738321aa
rm remaining debug code
2016-01-06 14:49:40 +01:00
Christoph Hertzberg
54bf582303
bug #1143 : Work-around gcc bug
2016-01-06 11:59:24 +01:00
Benoit Steiner
cfff40b1d4
Improved the performance of reductions on CUDA devices
2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf
Added a 'divup' util to compute the floor of the quotient of two integers
2016-01-04 16:29:26 -08:00
Gael Guennebaud
715f6f049f
Improve inline documentation of SparseCompressedBase and its derived classes
2016-01-03 21:56:30 +01:00
Gael Guennebaud
8b0d1eb0f7
Fix numerous doxygen shortcomings, and workaround some clang -Wdocumentation warnings
2016-01-01 21:45:06 +01:00
Gael Guennebaud
9900782e88
Mark AlignedBit and EvalBeforeNestingBit with deprecated attribute, and remove the remaining usages of EvalBeforeNestingBit.
2015-12-30 16:47:49 +01:00
Gael Guennebaud
70404e07c2
Workaround clang -Wdocumentation warning about "/*<"
2015-12-30 16:46:45 +01:00
Gael Guennebaud
addb7066e8
Workaround "empty paragraph" warning with clang -Wdocumentation
2015-12-30 16:45:44 +01:00
Gael Guennebaud
eadc377b3f
Add missing doc of Derived template parameter
2015-12-30 16:43:19 +01:00
Gael Guennebaud
29bb599e03
Fix numerous doxygen issues in auto-link generation
2015-12-30 16:04:24 +01:00
Gael Guennebaud
162ccb2938
Fix links to Eigen2-to-Eigen3 porting helpers
2015-12-30 16:03:14 +01:00
Gael Guennebaud
5fae3750b5
Recent versions of doxygen miss-parsed Eigen/* headers
2015-12-30 16:02:05 +01:00
Gael Guennebaud
b84cefe61d
Add missing snippets for erf/erfc/lgamma functions.
2015-12-30 15:12:15 +01:00
Gael Guennebaud
16dd82ed51
Add missing snippet for sign/cwiseSign functions.
2015-12-30 15:11:42 +01:00
Gael Guennebaud
978c379ed7
Add missing ctor from uint
2015-12-30 12:52:38 +01:00
Gael Guennebaud
25f2b8d824
bug #1141 : add missing initialization of CholmodBase::m_*IsOk
2015-12-29 15:50:11 +01:00
Gael Guennebaud
d2e288ae50
Workaround compilers that do not even define _mm256_set_m128.
2015-12-24 16:53:43 +01:00
Benoit Steiner
bdcbc66a5c
Don't attempt to vectorize mean reductions of integers since we can't use
...
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5
Optimized the configuration of the outer reduction cuda kernel
2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b
Added missing define
2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810
Made sure the optimized gpu reduction code is actually compiled.
2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a
Optimized outer reduction on GPUs.
2015-12-22 15:06:17 -08:00
Benoit Steiner
3504ae47ca
Made it possible to run the lgamma, erf, and erfc functors on a CUDA gpu.
2015-12-21 15:20:06 -08:00
Benoit Steiner
1c3e78319d
Added missing const
2015-12-21 15:05:01 -08:00
Benoit Steiner
b407948a77
Merged in connor-k/eigen (pull request PR-149)
...
[doc] Remove extra ';' in Advanced Initialization sample
2015-12-21 09:44:25 -08:00
Benoit Steiner
a6c243617b
Fixed a typo in previous change.
2015-12-21 09:05:45 -08:00