Commit Graph

7247 Commits

Author SHA1 Message Date
Benoit Steiner
bd7d901da9 Reverted a previous change that tripped nvcc when compiling in debug mode. 2016-01-11 17:49:44 -08:00
Benoit Steiner
bbdabbb379 Made the blas utils usable from within a cuda kernel 2016-01-11 17:26:56 -08:00
Benoit Steiner
c5e6900400 Silenced a few compilation warnings. 2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61 Updated the tensor traits: the alignment is not part of the Flags enum anymore 2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c Enabled the use of fixed dimensions from within a cuda kernel. 2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6 Deleted unused variable. 2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7 Silenced a nvcc compilation warning 2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24 Silenced several compilation warnings triggered by nvcc. 2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634 Fixed a bug in the dispatch of optimized reduction kernels. 2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e Re-enabled the optimized reduction CUDA code. 2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a Cleaned up double-defined macro from last commit 2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3 Alternative way of forcing instantiation of device kernels without
causing warnings or requiring device to device kernel invocations.

This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Gael Guennebaud
b557662e58 merge 2016-01-09 08:37:01 +01:00
Gael Guennebaud
8b9dc9f0df bug #1144: fix regression in x=y+A*x (aliasing), and move evaluator_traits::AssumeAliasing to evaluator_assume_aliasing. 2016-01-09 08:30:38 +01:00
Benoit Steiner
e76904af1b Simplified the dispatch code. 2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac Made it possible to use array of size 0 on CUDA devices 2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases 2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415 Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures. 2016-01-08 13:53:40 -08:00
Gael Guennebaud
f9d71a1729 extend matlab conversion table 2016-01-08 22:24:45 +01:00
Benoit Steiner
6639b7d6e8 Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2 Fixed a typo. 2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818 Optimized the performance of broadcasting of scalars. 2016-01-06 18:47:45 -08:00
Gael Guennebaud
ee738321aa rm remaining debug code 2016-01-06 14:49:40 +01:00
Christoph Hertzberg
54bf582303 bug #1143: Work-around gcc bug 2016-01-06 11:59:24 +01:00
Benoit Steiner
cfff40b1d4 Improved the performance of reductions on CUDA devices 2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf Added a 'divup' util to compute the floor of the quotient of two integers 2016-01-04 16:29:26 -08:00
Gael Guennebaud
715f6f049f Improve inline documentation of SparseCompressedBase and its derived classes 2016-01-03 21:56:30 +01:00
Gael Guennebaud
8b0d1eb0f7 Fix numerous doxygen shortcomings, and workaround some clang -Wdocumentation warnings 2016-01-01 21:45:06 +01:00
Gael Guennebaud
9900782e88 Mark AlignedBit and EvalBeforeNestingBit with deprecated attribute, and remove the remaining usages of EvalBeforeNestingBit. 2015-12-30 16:47:49 +01:00
Gael Guennebaud
70404e07c2 Workaround clang -Wdocumentation warning about "/*<" 2015-12-30 16:46:45 +01:00
Gael Guennebaud
addb7066e8 Workaround "empty paragraph" warning with clang -Wdocumentation 2015-12-30 16:45:44 +01:00
Gael Guennebaud
eadc377b3f Add missing doc of Derived template parameter 2015-12-30 16:43:19 +01:00
Gael Guennebaud
29bb599e03 Fix numerous doxygen issues in auto-link generation 2015-12-30 16:04:24 +01:00
Gael Guennebaud
162ccb2938 Fix links to Eigen2-to-Eigen3 porting helpers 2015-12-30 16:03:14 +01:00
Gael Guennebaud
5fae3750b5 Recent versions of doxygen miss-parsed Eigen/* headers 2015-12-30 16:02:05 +01:00
Gael Guennebaud
b84cefe61d Add missing snippets for erf/erfc/lgamma functions. 2015-12-30 15:12:15 +01:00
Gael Guennebaud
16dd82ed51 Add missing snippet for sign/cwiseSign functions. 2015-12-30 15:11:42 +01:00
Gael Guennebaud
978c379ed7 Add missing ctor from uint 2015-12-30 12:52:38 +01:00
Gael Guennebaud
25f2b8d824 bug #1141: add missing initialization of CholmodBase::m_*IsOk 2015-12-29 15:50:11 +01:00
Gael Guennebaud
d2e288ae50 Workaround compilers that do not even define _mm256_set_m128. 2015-12-24 16:53:43 +01:00
Benoit Steiner
bdcbc66a5c Don't attempt to vectorize mean reductions of integers since we can't use
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5 Optimized the configuration of the outer reduction cuda kernel 2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b Added missing define 2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810 Made sure the optimized gpu reduction code is actually compiled. 2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a Optimized outer reduction on GPUs. 2015-12-22 15:06:17 -08:00
Benoit Steiner
3504ae47ca Made it possible to run the lgamma, erf, and erfc functors on a CUDA gpu. 2015-12-21 15:20:06 -08:00
Benoit Steiner
1c3e78319d Added missing const 2015-12-21 15:05:01 -08:00
Benoit Steiner
b407948a77 Merged in connor-k/eigen (pull request PR-149)
[doc] Remove extra ';' in Advanced Initialization sample
2015-12-21 09:44:25 -08:00
Benoit Steiner
a6c243617b Fixed a typo in previous change. 2015-12-21 09:05:45 -08:00