eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Benoit Steiner	b3b722905f	Improved code indentation	2016-01-19 17:09:47 -08:00
Benoit Steiner	5b7713dd33	Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.	2016-01-19 17:05:10 -08:00
Benoit Steiner	34057cff23	Fixed a race condition that could affect some reductions on CUDA devices.	2016-01-15 15:11:56 -08:00
Benoit Steiner	0461f0153e	Made it possible to compare tensor dimensions inside a CUDA kernel.	2016-01-15 11:22:16 -08:00
Benoit Steiner	aed4cb1269	Use warp shuffles instead of shared memory access to speed up the inner reduction kernel.	2016-01-14 21:45:14 -08:00
Benoit Steiner	8fe2532e70	Fixed a boundary condition bug in the outer reduction kernel	2016-01-14 09:29:48 -08:00
Benoit Steiner	9f013a9d86	Properly record the rank of reduced tensors in the tensor traits.	2016-01-13 14:24:37 -08:00
Benoit Steiner	79b69b7444	Trigger the optimized matrix vector path more conservatively.	2016-01-12 15:21:09 -08:00
Benoit Steiner	d920d57f38	Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks.	2016-01-12 11:32:27 -08:00
Benoit Steiner	bd7d901da9	Reverted a previous change that tripped nvcc when compiling in debug mode.	2016-01-11 17:49:44 -08:00
Benoit Steiner	c5e6900400	Silenced a few compilation warnings.	2016-01-11 17:06:39 -08:00
Benoit Steiner	f894736d61	Updated the tensor traits: the alignment is not part of the Flags enum anymore	2016-01-11 16:42:18 -08:00
Benoit Steiner	4f7714d72c	Enabled the use of fixed dimensions from within a cuda kernel.	2016-01-11 16:01:00 -08:00
Benoit Steiner	01c55d37e6	Deleted unused variable.	2016-01-11 15:53:19 -08:00
Benoit Steiner	0504c56ea7	Silenced an nvcc compilation warning	2016-01-11 15:49:21 -08:00
Benoit Steiner	b523771a24	Silenced several compilation warnings triggered by nvcc.	2016-01-11 14:25:43 -08:00
Benoit Steiner	2c3b13eded	Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152) Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.	2016-01-11 11:43:37 -08:00
Benoit Steiner	2ccb1c8634	Fixed a bug in the dispatch of optimized reduction kernels.	2016-01-11 10:36:37 -08:00
Benoit Steiner	780623261e	Re-enabled the optimized reduction CUDA code.	2016-01-11 09:07:14 -08:00
Jeremy Barnes	91678f489a	Cleaned up double-defined macro from last commit	2016-01-10 22:44:45 -05:00
Jeremy Barnes	403a7cb6c3	Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows Tensorflow to work on SM 3.0 (i.e., Amazon EC2) machines.	2016-01-10 22:39:13 -05:00
Benoit Steiner	e76904af1b	Simplified the dispatch code.	2016-01-08 16:50:57 -08:00
Benoit Steiner	d726e864ac	Made it possible to use array of size 0 on CUDA devices	2016-01-08 16:38:14 -08:00
Benoit Steiner	3358dfd5dd	Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases	2016-01-08 16:28:53 -08:00
Benoit Steiner	53749ff415	Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings, but it's much better than having to deal with random assertion failures.	2016-01-08 13:53:40 -08:00
Benoit Steiner	6639b7d6e8	Removed a couple of partial specializations that confuse nvcc and result in errors such as this: error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"	2016-01-07 18:45:19 -08:00
Benoit Steiner	0cb2ca5de2	Fixed a typo.	2016-01-06 18:50:28 -08:00
Benoit Steiner	213459d818	Optimized the performance of broadcasting of scalars.	2016-01-06 18:47:45 -08:00
Benoit Steiner	cfff40b1d4	Improved the performance of reductions on CUDA devices	2016-01-04 17:25:00 -08:00
Benoit Steiner	515dee0baf	Added a 'divup' util to compute the ceiling of the quotient of two integers	2016-01-04 16:29:26 -08:00
Gael Guennebaud	978c379ed7	Add missing ctor from uint	2015-12-30 12:52:38 +01:00
Benoit Steiner	bdcbc66a5c	Don't attempt to vectorize mean reductions of integers since we can't use SSE or AVX instructions to divide 2 integers.	2015-12-22 17:51:55 -08:00
Benoit Steiner	a1e08fb2a5	Optimized the configuration of the outer reduction cuda kernel	2015-12-22 16:30:10 -08:00
Benoit Steiner	9c7d96697b	Added missing define	2015-12-22 16:11:07 -08:00
Benoit Steiner	e7e6d01810	Made sure the optimized gpu reduction code is actually compiled.	2015-12-22 15:07:33 -08:00
Benoit Steiner	b5d2078c4a	Optimized outer reduction on GPUs.	2015-12-22 15:06:17 -08:00
Benoit Steiner	1c3e78319d	Added missing const	2015-12-21 15:05:01 -08:00
Benoit Steiner	1b82969559	Add alignment requirement for local buffer used by the slicing op.	2015-12-18 14:36:35 -08:00
Benoit Steiner	75a7fa1919	Doubled the speed of full reductions on GPUs.	2015-12-18 14:07:31 -08:00
Benoit Steiner	8dd17cbe80	Fixed a clang compilation warning triggered by the use of arrays of size 0.	2015-12-17 14:00:33 -08:00
Benoit Steiner	4aac55f684	Silenced some compilation warnings triggered by nvcc	2015-12-17 13:39:01 -08:00
Benoit Steiner	40e6250fc3	Made it possible to run tensor chipping operations on CUDA devices	2015-12-17 13:29:08 -08:00
Benoit Steiner	2ca55a3ae4	Fixed some compilation errors triggered by the tensor code with msvc 2008	2015-12-16 20:45:58 -08:00
Benoit Steiner	17352e2792	Made the entire TensorFixedSize api callable from a CUDA kernel.	2015-12-14 15:20:31 -08:00
Benoit Steiner	75e19fc7ca	Marked the tensor constructors as EIGEN_DEVICE_FUNC: This makes it possible to call them from a CUDA kernel.	2015-12-14 15:12:55 -08:00
Gael Guennebaud	ca39b1546e	Merged in ebrevdo/eigen (pull request PR-148) Add special functions to eigen: lgamma, erf, erfc.	2015-12-11 11:52:09 +01:00
Benoit Steiner	6af52a1227	Fixed a typo in the constructor of tensors of rank 5.	2015-12-10 23:31:12 -08:00
Benoit Steiner	8e00ea9a92	Fixed the coefficient accessors used for the 2d and 3d case when compiling without cxx11 support.	2015-12-10 22:45:10 -08:00
Eugene Brevdo	fa4f933c0f	Add special functions to Eigen: lgamma, erf, erfc. Includes CUDA support and unit tests.	2015-12-07 15:24:49 -08:00
Benoit Steiner	7dfe75f445	Fixed compilation warnings	2015-12-07 08:12:30 -08:00
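
Commit 515dee0baf above adds a 'divup' utility. Helpers with this name usually perform rounding-up integer division, for instance to work out how many fixed-size blocks are needed to cover a range of elements. The following is a minimal sketch under that assumption; the names divup_sketch and num_blocks_sketch and the block size of 256 are hypothetical, and this is not the code from the commit.

```cpp
#include <cstddef>

// Hypothetical helper for illustration, not the code from the commit.
// Rounds the integer quotient x / y up to the next integer.
// Assumes x >= 0, y > 0, and that x + y - 1 does not overflow T.
template <typename T>
constexpr T divup_sketch(T x, T y) {
  return (x + y - 1) / y;
}

// Example use: number of blocks of 256 elements needed to cover n elements.
// The block size 256 is an arbitrary value chosen for this sketch.
inline std::size_t num_blocks_sketch(std::size_t n) {
  return divup_sketch<std::size_t>(n, 256);
}
```

Adding y - 1 before dividing keeps the computation in integer arithmetic, so no floating-point conversion is needed; the trade-off is that the intermediate sum can overflow for values close to the maximum of T.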

400 Commits