Commit Graph

707 Commits

Author SHA1 Message Date
Benoit Steiner
fd9401f260 Refined the cost of the striding operation. 2016-04-25 09:16:08 -07:00
Benoit Steiner
4bbc97be5e Provide access to the base threadpool classes 2016-04-21 17:59:33 -07:00
Benoit Steiner
33adce5c3a Added the ability to switch to the new thread pool with a #define 2016-04-21 11:59:58 -07:00
Benoit Steiner
f670613e4b Fixed several compilation warnings 2016-04-21 11:03:02 -07:00
Benoit Steiner
2dde1b1028 Don't crash when attempting to reduce empty tensors. 2016-04-20 18:08:20 -07:00
Benoit Steiner
c7c2054bb5 Started to implement a portable way to yield. 2016-04-19 17:59:58 -07:00
Benoit Steiner
2b72163028 Implemented a more portable version of thread local variables 2016-04-19 15:56:02 -07:00
Benoit Steiner
5b1106c56b Fixed a compilation error with nvcc 7. 2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db Simplified the code that launches cuda kernels. 2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d Don't take the address of a kernel on CUDA devices that don't support this feature. 2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058 Use numext::ceil instead of std::ceil 2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b Avoid an unnecessary copy of the evaluator. 2016-04-19 13:54:28 -07:00
Benoit Steiner
50968a0a3e Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. 2016-04-19 11:53:58 -07:00
Benoit Steiner
c8e8f93d6c Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. 2016-04-15 16:48:10 -07:00
Benoit Steiner
7cff898e0a Deleted unnecessary variable 2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a Fixed a few compilation warnings 2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f Merged in rmlarsen/eigen (pull request PR-178)
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen
3718bf654b Get rid of void* casting when calling EvalRange::run. 2016-04-15 12:51:33 -07:00
Benoit Steiner
a62e924656 Added ability to access the cache sizes from the tensor devices 2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426 Added support for exclusive or 2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02 Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. 2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d Added missing definition of PacketSize in the gpu evaluator of convolution 2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e Merged in rmlarsen/eigen (pull request PR-177)
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
1372156c41 Prepared the migration to the new non-blocking thread pool 2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b Improvements to cost model. 2016-04-14 15:52:58 -07:00
Benoit Steiner
78a51abc12 Added a more scalable non-blocking thread pool 2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7 Merge upstream updates. 2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6 Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. 2016-04-14 13:57:35 -07:00
Benoit Steiner
5912ad877c Silenced a compilation warning 2016-04-14 11:40:14 -07:00
Benoit Steiner
c7167fee0e Added support for fp16 to the sigmoid function 2016-04-14 10:08:33 -07:00
Benoit Steiner
3b76df64fc Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide whether to vectorize based on the capability of the target CUDA architecture. In particular, this enables us to vectorize the processing of fp16 when running on devices of compute capability >= 5.3 2016-04-12 10:58:51 -07:00
Benoit Steiner
7d5b17087f Added missing EIGEN_DEVICE_FUNC to the tensor conversion code. 2016-04-07 20:01:19 -07:00
Benoit Steiner
48308ed801 Added support for isinf, isnan, and isfinite checks to the tensor api 2016-04-07 09:48:36 -07:00
Benoit Steiner
7be1eaad1e Fixed typos in the implementation of the zeta and polygamma ops. 2016-04-06 14:15:37 -07:00
Till Hoffmann
80eba21ad0 Merge upstream. 2016-04-01 18:18:49 +01:00
Till Hoffmann
ffd770ce94 Fixed CUDA signature. 2016-04-01 17:58:24 +01:00
tillahoffmann
49960adbdd Merged eigen/eigen into default 2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81 Added polygamma function. 2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf Added zeta function. 2016-04-01 13:32:29 +01:00
Benoit Steiner
3da495e6b9 Relaxed the condition used to gate the fft code. 2016-03-31 18:11:51 -07:00
Benoit Steiner
0f5cc504fe Properly gate the fft code 2016-03-31 12:59:39 -07:00
Benoit Steiner
af4ef540bf Fixed an off-by-one bug in a debug assertion 2016-03-30 18:37:19 -07:00
Benoit Steiner
791e5cfb69 Added NumTraits for type2index. 2016-03-30 18:36:36 -07:00
Benoit Steiner
483aaad10a Fixed compilation warning 2016-03-30 17:08:13 -07:00
Benoit Steiner
1b40abbf99 Added missing assignment operator to the TensorUInt128 class, and made misc small improvements 2016-03-30 13:17:03 -07:00
Benoit Steiner
aa45ad2aac Fixed the formatting of the README. 2016-03-29 15:06:13 -07:00
Benoit Steiner
56df5ef1d7 Attempt to fix the formatting of the README 2016-03-29 15:03:38 -07:00
Benoit Steiner
c38295f0a0 Added support for fmod 2016-03-28 15:53:02 -07:00
Benoit Steiner
6772f653c3 Made it possible to customize the threadpool 2016-03-28 10:01:04 -07:00
Benoit Steiner
1bc81f7889 Fixed compilation warnings on arm 2016-03-28 09:21:04 -07:00
Benoit Steiner
78f83d6f6a Prevent potential overflow. 2016-03-28 09:18:04 -07:00
Benoit Steiner
74f91ed06c Improved support for integer modulo 2016-03-25 17:21:56 -07:00
Benoit Steiner
41434a8a85 Avoid unnecessary conversions 2016-03-23 16:52:38 -07:00
Benoit Steiner
92693b50eb Fixed compilation warning 2016-03-23 16:40:36 -07:00
Benoit Steiner
393bc3b16b Added comment 2016-03-23 16:22:15 -07:00
Benoit Steiner
3d1e857327 Fixed compilation error 2016-03-22 15:48:28 -07:00
Benoit Steiner
de7d92c259 Pulled latest updates from trunk 2016-03-22 15:24:49 -07:00
Benoit Steiner
002cf0d1c9 Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead 2016-03-22 15:24:23 -07:00
Benoit Steiner
bc2b802751 Fixed a couple of typos 2016-03-22 14:27:34 -07:00
Benoit Steiner
6a31b7be3e Avoid using std::vector whenever possible 2016-03-22 14:02:50 -07:00
Benoit Steiner
65a7113a36 Use an enum instead of a static const int to prevent possible link error 2016-03-22 09:33:54 -07:00
Benoit Steiner
f9ad25e4d8 Fixed contractions of 16 bit floats 2016-03-22 09:30:23 -07:00
Benoit Steiner
8ef3181f15 Worked around a constness related issue 2016-03-21 11:24:05 -07:00
Benoit Steiner
7a07d6aa2b Small cleanup 2016-03-21 11:12:17 -07:00
Benoit Steiner
e91f255301 Marked variables that are only used in debug mode as such 2016-03-21 10:02:00 -07:00
Benoit Steiner
db5c14de42 Explicitly cast the default value into the proper scalar type. 2016-03-21 09:52:58 -07:00
Benoit Steiner
8e03333f06 Renamed some class members to make the code more readable. 2016-03-18 15:21:04 -07:00
Benoit Steiner
6c08943d9f Fixed a bug in the padding of extracted image patches. 2016-03-18 15:19:10 -07:00
Benoit Steiner
9a7ece9caf Worked around constness issue 2016-03-18 10:38:29 -07:00
Benoit Steiner
edc679f6c6 Fixed compilation warning 2016-03-18 07:12:34 -07:00
Benoit Steiner
70eb70f5f8 Avoid mutable class members when possible 2016-03-17 21:47:18 -07:00
Benoit Steiner
95b8961a9b Allocate the Mersenne Twister used by the random number generators on the heap instead of on the stack, since these generators tend to keep a lot of state (about 5 KB) around. 2016-03-17 15:23:51 -07:00
Benoit Steiner
f7329619da Fix bug in tensor contraction. The code assumes that contraction axis indices for the LHS (after possibly swapping to ColMajor!) are increasing. Explicitly sort the contraction axis pairs to make it so. 2016-03-17 15:08:02 -07:00
Christoph Hertzberg
46aa9772fc Merged in ebrevdo/eigen (pull request PR-169)
Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.
2016-03-16 21:59:08 +01:00
Benoit Steiner
b72ffcb05e Made the comparison of Eigen::array GPU friendly 2016-03-11 16:37:59 -08:00
Benoit Steiner
25f69cb932 Added a comparison operator for Eigen::array
Alias Eigen::array to std::array when compiling with Visual Studio 2015
2016-03-11 15:20:37 -08:00
Benoit Steiner
86d45a3c83 Worked around visual studio compilation warnings. 2016-03-09 21:29:39 -08:00
Benoit Steiner
8fd4241377 Fixed a typo. 2016-03-10 02:28:46 +00:00
Benoit Steiner
a685a6beed Made the list reductions less ambiguous. 2016-03-09 17:41:52 -08:00
Benoit Steiner
3149b5b148 Avoid implicit cast 2016-03-09 17:35:17 -08:00
Benoit Steiner
f05fb449b8 Avoid unnecessary conversion from 32bit int to 64bit unsigned int 2016-03-09 15:27:45 -08:00
Benoit Steiner
1d566417d2 Enable the random number generators when compiling with visual studio 2016-03-09 10:55:11 -08:00
Benoit Steiner
b084133dbf Fixed the integer division code on windows 2016-03-09 07:06:36 -08:00
Benoit Steiner
6d30683113 Fixed static assertion 2016-03-08 21:02:51 -08:00
Eugene Brevdo
5e7de771e3 Properly fix merge issues. 2016-03-08 17:35:05 -08:00
Benoit Steiner
46177c8d64 Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues. 2016-03-08 16:37:27 -08:00
Benoit Steiner
6d6413f768 Simplified the full reduction code 2016-03-08 16:02:00 -08:00
Benoit Steiner
5a427a94a9 Fixed the tensor generator code 2016-03-08 13:28:06 -08:00
Benoit Steiner
a81b88bef7 Fixed the tensor concatenation code 2016-03-08 12:30:19 -08:00
Benoit Steiner
551ff11d0d Fixed the tensor layout swapping code 2016-03-08 12:28:10 -08:00
Benoit Steiner
8768c063f5 Fixed the tensor chipping code. 2016-03-08 12:26:49 -08:00
Benoit Steiner
e09eb835db Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit. 2016-03-08 12:07:33 -08:00
Benoit Steiner
3b614a2358 Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly. 2016-03-07 17:53:28 -08:00
Benoit Steiner
769685e74e Added the ability to pad a tensor using a non-zero value 2016-03-07 14:45:37 -08:00
Benoit Steiner
7f87cc3a3b Fix a couple of typos in the code. 2016-03-07 14:31:27 -08:00
Eugene Brevdo
5707004d6b Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.

GPU tests are still failing for igamma/igammac.

1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
9a54c3e32b Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be. 2016-03-06 09:38:56 -08:00
Benoit Steiner
23aed8f2e4 Use EIGEN_PI instead of redefining our own constant PI 2016-03-05 08:04:45 -08:00
Benoit Steiner
ec35068edc Don't rely on the M_PI constant since not all compilers provide it. 2016-03-04 16:42:38 -08:00
Benoit Steiner
60d9df11c1 Fixed the computation of leading zeros when compiling with msvc. 2016-03-04 16:27:02 -08:00
Benoit Steiner
c561eeb7bf Don't use implicit type conversions in initializer lists since not all compilers support them. 2016-03-04 14:12:45 -08:00
Benoit Steiner
2c50fc878e Fixed a typo 2016-03-04 14:09:38 -08:00
Benoit Steiner
5cf4558c0a Added support for rounding, flooring, and ceiling to the tensor api 2016-03-03 12:36:55 -08:00
Benoit Steiner
68ac5c1738 Improved the performance of large outer reductions on cuda 2016-02-29 18:11:58 -08:00
Benoit Steiner
b2075cb7a2 Made the signature of the inner and outer reducers consistent 2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045 Optimized the performance of narrow reductions on CUDA devices 2016-02-29 10:48:16 -08:00
Benoit Steiner
609b3337a7 Print some information to stderr when a CUDA kernel fails 2016-02-27 20:42:57 +00:00
Benoit Steiner
ac2e6e0d03 Properly vectorized the random number generators 2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag 2016-02-26 12:38:18 -08:00
Benoit Steiner
c36c09169e Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices. 2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b Marked the And and Or reducers as stateless. 2016-02-24 16:43:01 -08:00
Benoit Steiner
1d9256f7db Updated the padding code to work with half floats 2016-02-23 05:51:22 +00:00
Benoit Steiner
72d2cf642e Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode. 2016-02-22 15:29:41 -08:00
Benoit Steiner
257b640463 Fixed compilation warning generated by clang 2016-02-21 22:43:37 -08:00
Benoit Steiner
96a24b05cc Optimized casting of tensors in the case where the casting happens to be a no-op 2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f Prevent unnecessary Index to int conversions 2016-02-21 08:49:36 -08:00
Rasmus Munk Larsen
8eb127022b Get rid of duplicate code. 2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447 Speed up tensor FFT by up ~25-50%.
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8            132       134     -1.5%
BM_tensor_fft_single_1D_cpu/9           1162      1229     -5.8%
BM_tensor_fft_single_1D_cpu/16           199       195     +2.0%
BM_tensor_fft_single_1D_cpu/17          2587      2267    +12.4%
BM_tensor_fft_single_1D_cpu/32           373       341     +8.6%
BM_tensor_fft_single_1D_cpu/33          5922      4879    +17.6%
BM_tensor_fft_single_1D_cpu/64           797       675    +15.3%
BM_tensor_fft_single_1D_cpu/65         13580     10481    +22.8%
BM_tensor_fft_single_1D_cpu/128         1753      1375    +21.6%
BM_tensor_fft_single_1D_cpu/129        31426     22789    +27.5%
BM_tensor_fft_single_1D_cpu/256         4005      3008    +24.9%
BM_tensor_fft_single_1D_cpu/257        70910     49549    +30.1%
BM_tensor_fft_single_1D_cpu/512         8989      6524    +27.4%
BM_tensor_fft_single_1D_cpu/513       165402    107751    +34.9%
BM_tensor_fft_single_1D_cpu/999       198293    115909    +41.5%
BM_tensor_fft_single_1D_cpu/1ki        21289     14143    +33.6%
BM_tensor_fft_single_1D_cpu/1k        361980    233355    +35.5%
BM_tensor_fft_double_1D_cpu/8            138       131     +5.1%
BM_tensor_fft_double_1D_cpu/9           1253      1133     +9.6%
BM_tensor_fft_double_1D_cpu/16           218       200     +8.3%
BM_tensor_fft_double_1D_cpu/17          2770      2392    +13.6%
BM_tensor_fft_double_1D_cpu/32           406       368     +9.4%
BM_tensor_fft_double_1D_cpu/33          6418      5153    +19.7%
BM_tensor_fft_double_1D_cpu/64           856       728    +15.0%
BM_tensor_fft_double_1D_cpu/65         14666     11148    +24.0%
BM_tensor_fft_double_1D_cpu/128         1913      1502    +21.5%
BM_tensor_fft_double_1D_cpu/129        36414     24072    +33.9%
BM_tensor_fft_double_1D_cpu/256         4226      3216    +23.9%
BM_tensor_fft_double_1D_cpu/257        86638     52059    +39.9%
BM_tensor_fft_double_1D_cpu/512         9397      6939    +26.2%
BM_tensor_fft_double_1D_cpu/513       203208    114090    +43.9%
BM_tensor_fft_double_1D_cpu/999       237841    125583    +47.2%
BM_tensor_fft_double_1D_cpu/1ki        20921     15392    +26.4%
BM_tensor_fft_double_1D_cpu/1k        455183    250763    +44.9%
BM_tensor_fft_single_2D_cpu/8           1051      1005     +4.4%
BM_tensor_fft_single_2D_cpu/9          16784     14837    +11.6%
BM_tensor_fft_single_2D_cpu/16          4074      3772     +7.4%
BM_tensor_fft_single_2D_cpu/17         75802     63884    +15.7%
BM_tensor_fft_single_2D_cpu/32         20580     16931    +17.7%
BM_tensor_fft_single_2D_cpu/33        345798    278579    +19.4%
BM_tensor_fft_single_2D_cpu/64         97548     81237    +16.7%
BM_tensor_fft_single_2D_cpu/65       1592701   1227048    +23.0%
BM_tensor_fft_single_2D_cpu/128       472318    384303    +18.6%
BM_tensor_fft_single_2D_cpu/129      7038351   5445308    +22.6%
BM_tensor_fft_single_2D_cpu/256      2309474   1850969    +19.9%
BM_tensor_fft_single_2D_cpu/257     31849182  23797538    +25.3%
BM_tensor_fft_single_2D_cpu/512     10395194   8077499    +22.3%
BM_tensor_fft_single_2D_cpu/513     144053843  104242541    +27.6%
BM_tensor_fft_single_2D_cpu/999     279885833  208389718    +25.5%
BM_tensor_fft_single_2D_cpu/1ki     45967677  36070985    +21.5%
BM_tensor_fft_single_2D_cpu/1k      619727095  456489500    +26.3%
BM_tensor_fft_double_2D_cpu/8           1110      1016     +8.5%
BM_tensor_fft_double_2D_cpu/9          17957     15768    +12.2%
BM_tensor_fft_double_2D_cpu/16          4558      4000    +12.2%
BM_tensor_fft_double_2D_cpu/17         79237     66901    +15.6%
BM_tensor_fft_double_2D_cpu/32         21494     17699    +17.7%
BM_tensor_fft_double_2D_cpu/33        357962    290357    +18.9%
BM_tensor_fft_double_2D_cpu/64        105179     87435    +16.9%
BM_tensor_fft_double_2D_cpu/65       1617143   1288006    +20.4%
BM_tensor_fft_double_2D_cpu/128       512848    419397    +18.2%
BM_tensor_fft_double_2D_cpu/129      7271322   5636884    +22.5%
BM_tensor_fft_double_2D_cpu/256      2415529   1922032    +20.4%
BM_tensor_fft_double_2D_cpu/257     32517952  24462177    +24.8%
BM_tensor_fft_double_2D_cpu/512     10724898   8287617    +22.7%
BM_tensor_fft_double_2D_cpu/513     146007419  108603266    +25.6%
BM_tensor_fft_double_2D_cpu/999     296351330  221885776    +25.1%
BM_tensor_fft_double_2D_cpu/1ki     59334166  48357539    +18.5%
BM_tensor_fft_double_2D_cpu/1k      666660132  483840349    +27.4%
2016-02-19 16:29:23 -08:00
Benoit Steiner
46fc23f91c Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. 2016-02-19 13:44:22 -08:00
Benoit Steiner
670db7988d Updated the contraction code to make it compatible with half floats. 2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a Added support for tensor reductions on half floats 2016-02-19 10:05:59 -08:00
Benoit Steiner
f268db1c4b Added the ability to query the minor version of a cuda device 2016-02-19 16:31:04 +00:00
Benoit Steiner
f3352e0fb0 Don't make the array constructors explicit 2016-02-19 15:58:57 +00:00
Benoit Steiner
cd042dbbfd Fixed a bug in the tensor type converter 2016-02-19 15:03:26 +00:00
Benoit Steiner
de345eff2e Added a method to conjugate the content of a tensor or the result of a tensor expression. 2016-02-11 16:34:07 -08:00
Benoit Steiner
9a21b38ccc Worked around a few clang compilation warnings 2016-02-10 08:02:04 -08:00
Benoit Steiner
72ab7879f7 Fixed clang compilation warnings 2016-02-10 06:48:28 -08:00
Benoit Steiner
e88535634d Fixed some clang compilation warnings 2016-02-09 23:32:41 -08:00
Benoit Steiner
d69946183d Updated the TensorIntDivisor code to work properly on LLP64 systems 2016-02-08 21:03:59 -08:00
Benoit Steiner
4d4211c04e Avoid unnecessary type conversions 2016-02-05 18:19:41 -08:00
Benoit Steiner
f535378995 Added support for vectorized type casting of int to char. 2016-02-03 18:58:29 -08:00
Benoit Steiner
4ab63a3f6f Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements. 2016-02-03 17:23:07 -08:00
Benoit Steiner
1cbb79cdfd Made sure the dummy element of the size-0 array is always initialized to silence some compiler warnings 2016-02-03 15:58:26 -08:00
Benoit Steiner
dc413dbe8a Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6 Use EIGEN_STATIC_ASSERT for backward compatibility. 2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341 Don't try to use direct offsets when computing a tensor product, since the required stride isn't available. 2016-02-02 11:06:53 -08:00
Ville Kallioniemi
aedea349aa Replace separate low word constructors with a single templated constructor. 2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f Rebase to latest. 2016-02-01 19:32:31 -07:00
Benoit Steiner
6b5dff875e Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. 2016-02-01 12:46:32 -08:00
Benoit Steiner
e80ed948e1 Fixed a number of compilation warnings generated by the cuda tests 2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf Fixed a few compilation warnings 2016-01-31 16:48:50 -08:00
Benoit Steiner
963f2d2a8f Marked several methods EIGEN_DEVICE_FUNC 2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0 Fixed a couple of compilation warnings. 2016-01-28 23:15:45 -08:00
Gael Guennebaud
ddf64babde merge 2016-01-28 13:21:48 +01:00
Benoit Steiner
4bf9eaf77a Deleted an invalid assertion that prevented the assignment of empty tensors. 2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885 Fixed some compilation problems with nvcc + clang 2016-01-27 15:37:03 -08:00
Ville Kallioniemi
02db1228ed Add constructor for long types. 2016-01-26 23:41:01 -07:00
Benoit Steiner
e3a15a03a4 Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression 2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e Added missing EIGEN_DEVICE_FUNC qualifier 2016-01-24 20:36:05 -08:00
Benoit Steiner
cb4e53ff7f Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
Add ctor for long
2016-01-22 19:11:31 -08:00
Benoit Steiner
3aeeca32af Leverage the new blocking code in the tensor contraction code. 2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27 Created a mechanism to enable contraction mappers to determine the best blocking strategy. 2016-01-22 14:37:26 -08:00
Gael Guennebaud
6a44ccb58b Backout changeset 690bc950f7 2016-01-22 15:03:53 +01:00
Ville Kallioniemi
9b6c72958a Update to latest default branch 2016-01-21 23:08:54 -07:00
Benoit Steiner
c33479324c Fixed a constness bug 2016-01-21 17:08:11 -08:00
Jan Prach
690bc950f7 fix clang warnings
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
7ce932edd3 Small cleanup and small fix to the contraction of row major tensors 2016-01-20 18:12:08 -08:00
Benoit Steiner
47076bf00e Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%. 2016-01-20 14:51:48 -08:00
Ville Kallioniemi
2832175a68 Use explicitly 32 bit integer types in constructors. 2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901 Improved the formatting of the code 2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375 Moved the contraction mapping code to its own file to make the code more manageable. 2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f Improved code indentation 2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33 Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. 2016-01-19 17:05:10 -08:00
Ville Kallioniemi
63fb66f53a Add ctor for long 2016-01-17 21:25:36 -07:00
Benoit Steiner
34057cff23 Fixed a race condition that could affect some reductions on CUDA devices. 2016-01-15 15:11:56 -08:00
Benoit Steiner
0461f0153e Made it possible to compare tensor dimensions inside a CUDA kernel. 2016-01-15 11:22:16 -08:00
Benoit Steiner
aed4cb1269 Use warp shuffles instead of shared memory access to speedup the inner reduction kernel. 2016-01-14 21:45:14 -08:00
Benoit Steiner
8fe2532e70 Fixed a boundary condition bug in the outer reduction kernel 2016-01-14 09:29:48 -08:00
Benoit Steiner
9f013a9d86 Properly record the rank of reduced tensors in the tensor traits. 2016-01-13 14:24:37 -08:00
Benoit Steiner
79b69b7444 Trigger the optimized matrix vector path more conservatively. 2016-01-12 15:21:09 -08:00
Benoit Steiner
d920d57f38 Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speedup LSTM neural networks. 2016-01-12 11:32:27 -08:00
Benoit Steiner
bd7d901da9 Reverted a previous change that tripped nvcc when compiling in debug mode. 2016-01-11 17:49:44 -08:00
Benoit Steiner
c5e6900400 Silenced a few compilation warnings. 2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61 Updated the tensor traits: the alignment is not part of the Flags enum anymore 2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c Enabled the use of fixed dimensions from within a cuda kernel. 2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6 Deleted unused variable. 2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7 Silenced a nvcc compilation warning 2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24 Silenced several compilation warnings triggered by nvcc. 2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634 Fixed a bug in the dispatch of optimized reduction kernels. 2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e Re-enabled the optimized reduction CUDA code. 2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a Cleaned up double-defined macro from last commit 2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3 Alternative way of forcing instantiation of device kernels without
causing warnings or requiring device to device kernel invocations.

This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Benoit Steiner
e76904af1b Simplified the dispatch code. 2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac Made it possible to use array of size 0 on CUDA devices 2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases 2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415 Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. 2016-01-08 13:53:40 -08:00
Benoit Steiner
6639b7d6e8 Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
            "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2 Fixed a typo. 2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818 Optimized the performance of broadcasting of scalars. 2016-01-06 18:47:45 -08:00
Benoit Steiner
cfff40b1d4 Improved the performance of reductions on CUDA devices 2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf Added a 'divup' util to compute the ceiling of the quotient of two integers 2016-01-04 16:29:26 -08:00
Gael Guennebaud
978c379ed7 Add missing ctor from uint 2015-12-30 12:52:38 +01:00
Eugene Brevdo
f7362772e3 Add digamma for CPU + CUDA. Includes tests. 2015-12-24 21:15:38 -08:00
Benoit Steiner
bdcbc66a5c Don't attempt to vectorize mean reductions of integers since we can't use
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5 Optimized the configuration of the outer reduction cuda kernel 2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b Added missing define 2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810 Made sure the optimized gpu reduction code is actually compiled. 2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a Optimized outer reduction on GPUs. 2015-12-22 15:06:17 -08:00
Benoit Steiner
1c3e78319d Added missing const 2015-12-21 15:05:01 -08:00