eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Benoit Steiner	bc2b802751	Fixed a couple of typos	2016-03-22 14:27:34 -07:00
Benoit Steiner	e7a468c5b7	Filter some compilation flags that nvcc warns about.	2016-03-22 14:26:50 -07:00
Benoit Steiner	6a31b7be3e	Avoid using std::vector whenever possible	2016-03-22 14:02:50 -07:00
Benoit Steiner	65a7113a36	Use an enum instead of a static const int to prevent possible link error	2016-03-22 09:33:54 -07:00
Benoit Steiner	f9ad25e4d8	Fixed contractions of 16 bit floats	2016-03-22 09:30:23 -07:00
Benoit Steiner	8ef3181f15	Worked around a constness related issue	2016-03-21 11:24:05 -07:00
Benoit Steiner	7a07d6aa2b	Small cleanup	2016-03-21 11:12:17 -07:00
Benoit Steiner	e91f255301	Marked variables that are only used in debug mode as such	2016-03-21 10:02:00 -07:00
Benoit Steiner	db5c14de42	Explicitly cast the default value into the proper scalar type.	2016-03-21 09:52:58 -07:00
Benoit Steiner	8e03333f06	Renamed some class members to make the code more readable.	2016-03-18 15:21:04 -07:00
Benoit Steiner	6c08943d9f	Fixed a bug in the padding of extracted image patches.	2016-03-18 15:19:10 -07:00
Benoit Steiner	bb0e73c191	Gate all the CUDA tests under the EIGEN_TEST_NVCC option	2016-03-18 12:17:37 -07:00
Benoit Steiner	2db4a04827	Fixed a typo	2016-03-18 12:08:01 -07:00
Benoit Steiner	dd514de8a9	Added a test to validate the fallback path for half floats	2016-03-18 12:02:39 -07:00
Benoit Steiner	9a7ece9caf	Worked around constness issue	2016-03-18 10:38:29 -07:00
Benoit Steiner	edc679f6c6	Fixed compilation warning	2016-03-18 07:12:34 -07:00
Benoit Steiner	53d498ef06	Fixed compilation warnings in the cuda tests	2016-03-18 07:04:54 -07:00
Benoit Steiner	70eb70f5f8	Avoid mutable class members when possible	2016-03-17 21:47:18 -07:00
Benoit Steiner	95b8961a9b	Allocate the mersenne twister used by the random number generators on the heap instead of on the stack since they tend to keep a lot of state (i.e. about 5k) around.	2016-03-17 15:23:51 -07:00
Benoit Steiner	f7329619da	Fix bug in tensor contraction. The code assumes that contraction axis indices for the LHS (after possibly swapping to ColMajor!) is increasing. Explicitly sort the contraction axis pairs to make it so.	2016-03-17 15:08:02 -07:00
Christoph Hertzberg	46aa9772fc	Merged in ebrevdo/eigen (pull request PR-169) Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.	2016-03-16 21:59:08 +01:00
Benoit Steiner	ab9b749b45	Improved a test	2016-03-14 20:03:13 -07:00
Benoit Steiner	048c4d6efd	Made half floats usable on hardware that doesn't support them natively.	2016-03-11 17:21:42 -08:00
Benoit Steiner	b72ffcb05e	Made the comparison of Eigen::array GPU friendly	2016-03-11 16:37:59 -08:00
Benoit Steiner	25f69cb932	Added a comparison operator for Eigen::array. Alias Eigen::array to std::array when compiling with Visual Studio 2015	2016-03-11 15:20:37 -08:00
Benoit Steiner	c5b98a58b8	Updated the cxx11_meta test to work on the Eigen::array class when std::array isn't available.	2016-03-11 11:53:38 -08:00
Benoit Steiner	86d45a3c83	Worked around visual studio compilation warnings.	2016-03-09 21:29:39 -08:00
Benoit Steiner	8fd4241377	Fixed a typo.	2016-03-10 02:28:46 +00:00
Benoit Steiner	a685a6beed	Made the list reductions less ambiguous.	2016-03-09 17:41:52 -08:00
Benoit Steiner	3149b5b148	Avoid implicit cast	2016-03-09 17:35:17 -08:00
Benoit Steiner	b2100b83ad	Made sure to include the <random> header file when compiling with visual studio	2016-03-09 16:03:16 -08:00
Benoit Steiner	f05fb449b8	Avoid unnecessary conversion from 32bit int to 64bit unsigned int	2016-03-09 15:27:45 -08:00
Benoit Steiner	1d566417d2	Enable the random number generators when compiling with visual studio	2016-03-09 10:55:11 -08:00
Benoit Steiner	b084133dbf	Fixed the integer division code on windows	2016-03-09 07:06:36 -08:00
Benoit Steiner	6d30683113	Fixed static assertion	2016-03-08 21:02:51 -08:00
Eugene Brevdo	5e7de771e3	Properly fix merge issues.	2016-03-08 17:35:05 -08:00
Eugene Brevdo	73220d2bb0	Resolve bad merge.	2016-03-08 17:28:21 -08:00
Benoit Steiner	46177c8d64	Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues.	2016-03-08 16:37:27 -08:00
Benoit Steiner	6d6413f768	Simplified the full reduction code	2016-03-08 16:02:00 -08:00
Benoit Steiner	5a427a94a9	Fixed the tensor generator code	2016-03-08 13:28:06 -08:00
Benoit Steiner	a81b88bef7	Fixed the tensor concatenation code	2016-03-08 12:30:19 -08:00
Benoit Steiner	551ff11d0d	Fixed the tensor layout swapping code	2016-03-08 12:28:10 -08:00
Benoit Steiner	8768c063f5	Fixed the tensor chipping code.	2016-03-08 12:26:49 -08:00
Benoit Steiner	e09eb835db	Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit.	2016-03-08 12:07:33 -08:00
Benoit Steiner	3b614a2358	Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly.	2016-03-07 17:53:28 -08:00
Eugene Brevdo	0bb5de05a1	Finishing touches on igamma/igammac for GPU. Tests now pass.	2016-03-07 15:35:09 -08:00
Benoit Steiner	769685e74e	Added the ability to pad a tensor using a non-zero value	2016-03-07 14:45:37 -08:00
Benoit Steiner	7f87cc3a3b	Fix a couple of typos in the code.	2016-03-07 14:31:27 -08:00
Eugene Brevdo	5707004d6b	Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes. 0. Prior to this PR, not a single sharded CUDA test was actually being run. Fixed that. GPU tests are still failing for igamma/igammac. 1. Add calls for igamma/igammac to TensorBase 2. Fix up CUDA-specific calls of igamma/igammac 3. Add unit tests for digamma, igamma, igammac in CUDA.	2016-03-07 14:08:56 -08:00
Benoit Steiner	e5f25622e2	Added a test to validate the behavior of some of the tensor syntactic sugar.	2016-03-07 09:04:27 -08:00
Benoit Steiner	9f5740cbc1	Added missing include	2016-03-06 22:03:18 -08:00
Benoit Steiner	5238e03fe1	Don't try to compile the uint128 test with compilers that don't support uint128	2016-03-06 21:59:40 -08:00
Benoit Steiner	9a54c3e32b	Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be.	2016-03-06 09:38:56 -08:00
Benoit Steiner	05bbca079a	Turn on some of the cxx11 features when compiling with visual studio 2015	2016-03-05 10:52:08 -08:00
Benoit Steiner	6093eb9ff5	Don't test our 128bit emulation code when compiling with msvc	2016-03-05 10:37:11 -08:00
Benoit Steiner	57b263c5b9	Avoid using initializer lists in test since not all versions of msvc support them	2016-03-05 08:35:26 -08:00
Benoit Steiner	23aed8f2e4	Use EIGEN_PI instead of redefining our own constant PI	2016-03-05 08:04:45 -08:00
Benoit Steiner	c23e0be18f	Use the CMAKE_CXX_STANDARD variable to turn on cxx11	2016-03-04 20:18:01 -08:00
Benoit Steiner	ec35068edc	Don't rely on the M_PI constant since not all compilers provide it.	2016-03-04 16:42:38 -08:00
Benoit Steiner	60d9df11c1	Fixed the computation of leading zeros when compiling with msvc.	2016-03-04 16:27:02 -08:00
Benoit Steiner	4e49fd5eb9	MSVC uses __uint128 while other compilers use __uint128_t to encode 128bit unsigned integers. Make the cxx11_tensor_uint128.cpp test work in both cases.	2016-03-04 14:49:18 -08:00
Benoit Steiner	667fcc2b53	Fixed syntax error	2016-03-04 14:37:51 -08:00
Benoit Steiner	4416a5dcff	Added missing include	2016-03-04 14:35:43 -08:00
Benoit Steiner	c561eeb7bf	Don't use implicit type conversions in initializer lists since not all compilers support them.	2016-03-04 14:12:45 -08:00
Benoit Steiner	174edf976b	Made the contraction test more portable	2016-03-04 14:11:13 -08:00
Benoit Steiner	2c50fc878e	Fixed a typo	2016-03-04 14:09:38 -08:00
Benoit Steiner	deea866bbd	Added tests to cover the new rounding, flooring and ceiling tensor operations.	2016-03-03 12:38:02 -08:00
Benoit Steiner	5cf4558c0a	Added support for rounding, flooring, and ceiling to the tensor api	2016-03-03 12:36:55 -08:00
Benoit Steiner	dac58d7c35	Added a test to validate the conversion of half floats into floats on Kepler GPUs. Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5.	2016-03-03 10:37:25 -08:00
Benoit Steiner	68ac5c1738	Improved the performance of large outer reductions on cuda	2016-02-29 18:11:58 -08:00
Benoit Steiner	b2075cb7a2	Made the signature of the inner and outer reducers consistent	2016-02-29 10:53:38 -08:00
Benoit Steiner	3284842045	Optimized the performance of narrow reductions on CUDA devices	2016-02-29 10:48:16 -08:00
Benoit Steiner	609b3337a7	Print some information to stderr when a CUDA kernel fails	2016-02-27 20:42:57 +00:00
Benoit Steiner	ac2e6e0d03	Properly vectorized the random number generators	2016-02-26 13:52:24 -08:00
Benoit Steiner	caa54d888f	Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag	2016-02-26 12:38:18 -08:00
Benoit Steiner	2cd32cad27	Reverted previous commit since it caused more problems than it solved	2016-02-26 13:21:44 +00:00
Benoit Steiner	d9d05dd96e	Fixed handling of long doubles on aarch64	2016-02-26 04:13:58 -08:00
Benoit Steiner	af199b4658	Made the CUDA architecture level a build setting.	2016-02-25 09:06:18 -08:00
Benoit Steiner	c36c09169e	Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices.	2016-02-24 17:07:25 -08:00
Benoit Steiner	7a01cb8e4b	Marked the And and Or reducers as stateless.	2016-02-24 16:43:01 -08:00
Benoit Steiner	1d9256f7db	Updated the padding code to work with half floats	2016-02-23 05:51:22 +00:00
Benoit Steiner	72d2cf642e	Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode.	2016-02-22 15:29:41 -08:00
Benoit Steiner	5cd00068c0	include <iostream> in the tensor header since we now use it to better report cuda initialization errors	2016-02-22 13:59:03 -08:00
Benoit Steiner	257b640463	Fixed compilation warning generated by clang	2016-02-21 22:43:37 -08:00
Benoit Steiner	e644f60907	Pulled latest updates from trunk	2016-02-21 20:24:59 +00:00
Benoit Steiner	95fceb6452	Added the ability to compute the absolute value of a half float	2016-02-21 20:24:11 +00:00
Benoit Steiner	ed69cbeef0	Added some debugging information to the test to figure out why it fails sometimes	2016-02-21 11:20:20 -08:00
Benoit Steiner	96a24b05cc	Optimized casting of tensors in the case where the casting happens to be a no-op	2016-02-21 11:16:15 -08:00
Benoit Steiner	203490017f	Prevent unnecessary Index to int conversions	2016-02-21 08:49:36 -08:00
Benoit Steiner	1e6fe6f046	Fixed the float16 tensor test.	2016-02-20 07:44:17 +00:00
Rasmus Munk Larsen	8eb127022b	Get rid of duplicate code.	2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen	d5e2ec7447	Speed up tensor FFT by up to ~25-50%. Benchmark results are listed below.	2016-02-19 16:29:23 -08:00
    Benchmark	Base (ns)	New (ns)	Improvement
    BM_tensor_fft_single_1D_cpu/8	132	134	-1.5%
    BM_tensor_fft_single_1D_cpu/9	1162	1229	-5.8%
    BM_tensor_fft_single_1D_cpu/16	199	195	+2.0%
    BM_tensor_fft_single_1D_cpu/17	2587	2267	+12.4%
    BM_tensor_fft_single_1D_cpu/32	373	341	+8.6%
    BM_tensor_fft_single_1D_cpu/33	5922	4879	+17.6%
    BM_tensor_fft_single_1D_cpu/64	797	675	+15.3%
    BM_tensor_fft_single_1D_cpu/65	13580	10481	+22.8%
    BM_tensor_fft_single_1D_cpu/128	1753	1375	+21.6%
    BM_tensor_fft_single_1D_cpu/129	31426	22789	+27.5%
    BM_tensor_fft_single_1D_cpu/256	4005	3008	+24.9%
    BM_tensor_fft_single_1D_cpu/257	70910	49549	+30.1%
    BM_tensor_fft_single_1D_cpu/512	8989	6524	+27.4%
    BM_tensor_fft_single_1D_cpu/513	165402	107751	+34.9%
    BM_tensor_fft_single_1D_cpu/999	198293	115909	+41.5%
    BM_tensor_fft_single_1D_cpu/1ki	21289	14143	+33.6%
    BM_tensor_fft_single_1D_cpu/1k	361980	233355	+35.5%
    BM_tensor_fft_double_1D_cpu/8	138	131	+5.1%
    BM_tensor_fft_double_1D_cpu/9	1253	1133	+9.6%
    BM_tensor_fft_double_1D_cpu/16	218	200	+8.3%
    BM_tensor_fft_double_1D_cpu/17	2770	2392	+13.6%
    BM_tensor_fft_double_1D_cpu/32	406	368	+9.4%
    BM_tensor_fft_double_1D_cpu/33	6418	5153	+19.7%
    BM_tensor_fft_double_1D_cpu/64	856	728	+15.0%
    BM_tensor_fft_double_1D_cpu/65	14666	11148	+24.0%
    BM_tensor_fft_double_1D_cpu/128	1913	1502	+21.5%
    BM_tensor_fft_double_1D_cpu/129	36414	24072	+33.9%
    BM_tensor_fft_double_1D_cpu/256	4226	3216	+23.9%
    BM_tensor_fft_double_1D_cpu/257	86638	52059	+39.9%
    BM_tensor_fft_double_1D_cpu/512	9397	6939	+26.2%
    BM_tensor_fft_double_1D_cpu/513	203208	114090	+43.9%
    BM_tensor_fft_double_1D_cpu/999	237841	125583	+47.2%
    BM_tensor_fft_double_1D_cpu/1ki	20921	15392	+26.4%
    BM_tensor_fft_double_1D_cpu/1k	455183	250763	+44.9%
    BM_tensor_fft_single_2D_cpu/8	1051	1005	+4.4%
    BM_tensor_fft_single_2D_cpu/9	16784	14837	+11.6%
    BM_tensor_fft_single_2D_cpu/16	4074	3772	+7.4%
    BM_tensor_fft_single_2D_cpu/17	75802	63884	+15.7%
    BM_tensor_fft_single_2D_cpu/32	20580	16931	+17.7%
    BM_tensor_fft_single_2D_cpu/33	345798	278579	+19.4%
    BM_tensor_fft_single_2D_cpu/64	97548	81237	+16.7%
    BM_tensor_fft_single_2D_cpu/65	1592701	1227048	+23.0%
    BM_tensor_fft_single_2D_cpu/128	472318	384303	+18.6%
    BM_tensor_fft_single_2D_cpu/129	7038351	5445308	+22.6%
    BM_tensor_fft_single_2D_cpu/256	2309474	1850969	+19.9%
    BM_tensor_fft_single_2D_cpu/257	31849182	23797538	+25.3%
    BM_tensor_fft_single_2D_cpu/512	10395194	8077499	+22.3%
    BM_tensor_fft_single_2D_cpu/513	144053843	104242541	+27.6%
    BM_tensor_fft_single_2D_cpu/999	279885833	208389718	+25.5%
    BM_tensor_fft_single_2D_cpu/1ki	45967677	36070985	+21.5%
    BM_tensor_fft_single_2D_cpu/1k	619727095	456489500	+26.3%
    BM_tensor_fft_double_2D_cpu/8	1110	1016	+8.5%
    BM_tensor_fft_double_2D_cpu/9	17957	15768	+12.2%
    BM_tensor_fft_double_2D_cpu/16	4558	4000	+12.2%
    BM_tensor_fft_double_2D_cpu/17	79237	66901	+15.6%
    BM_tensor_fft_double_2D_cpu/32	21494	17699	+17.7%
    BM_tensor_fft_double_2D_cpu/33	357962	290357	+18.9%
    BM_tensor_fft_double_2D_cpu/64	105179	87435	+16.9%
    BM_tensor_fft_double_2D_cpu/65	1617143	1288006	+20.4%
    BM_tensor_fft_double_2D_cpu/128	512848	419397	+18.2%
    BM_tensor_fft_double_2D_cpu/129	7271322	5636884	+22.5%
    BM_tensor_fft_double_2D_cpu/256	2415529	1922032	+20.4%
    BM_tensor_fft_double_2D_cpu/257	32517952	24462177	+24.8%
    BM_tensor_fft_double_2D_cpu/512	10724898	8287617	+22.7%
    BM_tensor_fft_double_2D_cpu/513	146007419	108603266	+25.6%
    BM_tensor_fft_double_2D_cpu/999	296351330	221885776	+25.1%
    BM_tensor_fft_double_2D_cpu/1ki	59334166	48357539	+18.5%
    BM_tensor_fft_double_2D_cpu/1k	666660132	483840349	+27.4%
Benoit Steiner	46fc23f91c	Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.	2016-02-19 13:44:22 -08:00
Benoit Steiner	670db7988d	Updated the contraction code to make it compatible with half floats.	2016-02-19 13:03:26 -08:00
Benoit Steiner	180156ba1a	Added support for tensor reductions on half floats	2016-02-19 10:05:59 -08:00
Benoit Steiner	f268db1c4b	Added the ability to query the minor version of a cuda device	2016-02-19 16:31:04 +00:00
Benoit Steiner	a08d2ff0c9	Started to work on contractions and reductions using half floats	2016-02-19 15:59:59 +00:00
Benoit Steiner	f3352e0fb0	Don't make the array constructors explicit	2016-02-19 15:58:57 +00:00
Benoit Steiner	cd042dbbfd	Fixed a bug in the tensor type converter	2016-02-19 15:03:26 +00:00
Benoit Steiner	ac5d706a94	Added support for simple coefficient wise tensor expression using half floats on CUDA devices	2016-02-19 08:19:12 +00:00
Benoit Steiner	0606a0a39b	FP16 on CUDA are only available starting with cuda 7.5. Disable them when using an older version of CUDA	2016-02-18 23:15:23 -08:00
Benoit Steiner	f36c0c2c65	Added regression test for float16	2016-02-19 06:23:28 +00:00
Benoit Steiner	7151bd8768	Reverted unintended changes introduced by a bad merge	2016-02-19 06:20:50 +00:00
Benoit Steiner	17b9fbed34	Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa	2016-02-19 06:16:07 +00:00
Benoit Steiner	9e3f3a2d27	Deleted outdated comment	2016-02-11 17:27:35 -08:00
Benoit Steiner	de345eff2e	Added a method to conjugate the content of a tensor or the result of a tensor expression.	2016-02-11 16:34:07 -08:00
Benoit Steiner	9a21b38ccc	Worked around a few clang compilation warnings	2016-02-10 08:02:04 -08:00
Benoit Steiner	72ab7879f7	Fixed clang compilation warnings	2016-02-10 06:48:28 -08:00
Benoit Steiner	e88535634d	Fixed some clang compilation warnings	2016-02-09 23:32:41 -08:00
Benoit Steiner	6323851ea9	Fixed compilation warning	2016-02-09 20:43:41 -08:00
Benoit Steiner	d69946183d	Updated the TensorIntDivisor code to work properly on LLP64 systems	2016-02-08 21:03:59 -08:00
Benoit Steiner	4d4211c04e	Avoid unnecessary type conversions	2016-02-05 18:19:41 -08:00
Benoit Steiner	d2cba52015	Only enable the cxx11_tensor_uint128 test on 64 bit machines since 32 bit systems don't support the __uint128_t type	2016-02-05 18:14:23 -08:00
Benoit Steiner	fb00a4af2b	Made the tensor fft test compile on tegra x1	2016-02-06 01:42:14 +00:00
Benoit Steiner	f535378995	Added support for vectorized type casting of int to char.	2016-02-03 18:58:29 -08:00
Benoit Steiner	4ab63a3f6f	Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements.	2016-02-03 17:23:07 -08:00
Benoit Steiner	1cbb79cdfd	Made sure the dummy element of size 0 array is always initialized to silence some compiler warnings	2016-02-03 15:58:26 -08:00
Benoit Steiner	5d82e47ef6	Properly disable nvcc warning messages in user code.	2016-02-03 14:10:06 -08:00
Benoit Steiner	af8436b196	Silenced the "calling a __host__ function from a __host__ __device__ function is not allowed" messages	2016-02-03 13:48:36 -08:00
Benoit Steiner	dc413dbe8a	Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158) Add constructor for long types.	2016-02-02 20:58:06 -08:00
Ville Kallioniemi	783018d8f6	Use EIGEN_STATIC_ASSERT for backward compatibility.	2016-02-02 16:45:12 -07:00
Benoit Steiner	99cde88341	Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.	2016-02-02 11:06:53 -08:00
Ville Kallioniemi	aedea349aa	Replace separate low word constructors with a single templated constructor.	2016-02-01 20:25:02 -07:00
Ville Kallioniemi	f0fdefa96f	Rebase to latest.	2016-02-01 19:32:31 -07:00
Benoit Steiner	64ce78c2ec	Cleaned up a tensor contraction test	2016-02-01 13:57:41 -08:00
Benoit Steiner	0ce5d32be5	Sharded the cxx11_tensor_contract_cuda test	2016-02-01 13:33:23 -08:00
Benoit Steiner	922b5f527b	Silenced a few compilation warnings	2016-02-01 13:30:49 -08:00
Benoit Steiner	6b5dff875e	Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations.	2016-02-01 12:46:32 -08:00
Benoit Steiner	264f8141f8	Sharded the tensor reduction test	2016-02-01 07:44:31 -08:00
Benoit Steiner	11bb71c8fc	Sharded the tensor device test	2016-02-01 07:34:59 -08:00
Benoit Steiner	e80ed948e1	Fixed a number of compilation warnings generated by the cuda tests	2016-01-31 20:09:41 -08:00
Benoit Steiner	6720b38fbf	Fixed a few compilation warnings	2016-01-31 16:48:50 -08:00
Benoit Steiner	4a2ddfb81d	Sharded the CUDA argmax tensor test	2016-01-31 10:44:15 -08:00
Benoit Steiner	483082ef6e	Fixed a few memory leaks in the cuda tests	2016-01-30 11:59:22 -08:00
Benoit Steiner	bd21aba181	Sharded the cxx11_tensor_cuda test and fixed a memory leak	2016-01-30 11:47:09 -08:00
Benoit Steiner	9de155d153	Added a test to cover threaded tensor shuffling	2016-01-30 10:56:47 -08:00
Benoit Steiner	32088c06a1	Made the comparison between single and multithreaded contraction results more resistant to numerical noise to prevent spurious test failures.	2016-01-30 10:51:14 -08:00
Benoit Steiner	2053478c56	Made sure to use a tensor of rank 0 to store the result of a full reduction in the tensor thread pool test	2016-01-30 10:46:36 -08:00
Benoit Steiner	d0db95f730	Sharded the tensor thread pool test	2016-01-30 10:43:57 -08:00
Benoit Steiner	ba27c8a7de	Made the CUDA contract test more robust to numerical noise.	2016-01-30 10:28:43 -08:00
Benoit Steiner	963f2d2a8f	Marked several methods EIGEN_DEVICE_FUNC	2016-01-28 23:37:48 -08:00
Benoit Steiner	c5d25bf1d0	Fixed a couple of compilation warnings.	2016-01-28 23:15:45 -08:00
Benoit Steiner	7b3044d086	Made sure to call nvcc with the relaxed-constexpr flag.	2016-01-28 15:36:34 -08:00
Gael Guennebaud	ddf64babde	merge	2016-01-28 13:21:48 +01:00
Gael Guennebaud	7802a6bb1c	Fix unit test filename.	2016-01-28 09:35:37 +01:00
Benoit Steiner	4bf9eaf77a	Deleted an invalid assertion that prevented the assignment of empty tensors.	2016-01-27 17:09:30 -08:00
Benoit Steiner	291069e885	Fixed some compilation problems with nvcc + clang	2016-01-27 15:37:03 -08:00
Benoit Steiner	47ca9dc809	Fixed the tensor_cuda test	2016-01-27 14:58:48 -08:00
Benoit Steiner	55a5204319	Fixed the flags passed to nvcc to compile the tensor code.	2016-01-27 14:46:34 -08:00
Benoit Steiner	9dfbd4fe8d	Made the cuda tests compile using make check	2016-01-27 12:22:17 -08:00
Benoit Steiner	5973bcf939	Properly specify the namespace when calling cout/endl	2016-01-27 12:04:42 -08:00
Gael Guennebaud	9c8f7dfe94	bug #1156 : fix several function declarations whose arguments were passed by value instead of being passed by reference	2016-01-27 18:34:42 +01:00
Ville Kallioniemi	02db1228ed	Add constructor for long types.	2016-01-26 23:41:01 -07:00
Hauke Heibel	5eb2790be0	Fixed minor typo in SplineFitting.	2016-01-25 22:17:52 +01:00
Benoit Steiner	e3a15a03a4	Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression	2016-01-24 23:04:50 -08:00
Benoit Steiner	bd207ce11e	Added missing EIGEN_DEVICE_FUNC qualifier	2016-01-24 20:36:05 -08:00
Benoit Steiner	cb4e53ff7f	Merged in ville-k/eigen/tensorflow_fix (pull request PR-153) Add ctor for long	2016-01-22 19:11:31 -08:00
Ville Kallioniemi	9f94e030c1	Re-add executable flags to minimize changeset.	2016-01-22 20:08:45 -07:00
Benoit Steiner	3aeeca32af	Leverage the new blocking code in the tensor contraction code.	2016-01-22 16:36:30 -08:00
Benoit Steiner	4beb447e27	Created a mechanism to enable contraction mappers to determine the best blocking strategy.	2016-01-22 14:37:26 -08:00
Gael Guennebaud	6a44ccb58b	Backout changeset `690bc950f7`	2016-01-22 15:03:53 +01:00
Ville Kallioniemi	9b6c72958a	Update to latest default branch	2016-01-21 23:08:54 -07:00
Benoit Steiner	c33479324c	Fixed a constness bug	2016-01-21 17:08:11 -08:00
Jan Prach	690bc950f7	fix clang warnings "braces around scalar initializer"	2016-01-20 19:35:59 -08:00
Benoit Steiner	7ce932edd3	Small cleanup and small fix to the contraction of row major tensors	2016-01-20 18:12:08 -08:00
Benoit Steiner	47076bf00e	Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.	2016-01-20 14:51:48 -08:00
Ville Kallioniemi	915e7667cd	Remove executable bit from header files	2016-01-19 21:17:29 -07:00
Ville Kallioniemi	2832175a68	Use explicitly 32 bit integer types in constructors.	2016-01-19 20:12:17 -07:00
Benoit Steiner	df79c00901	Improved the formatting of the code	2016-01-19 17:24:08 -08:00
Benoit Steiner	6d472d8375	Moved the contraction mapping code to its own file to make the code more manageable.	2016-01-19 17:22:05 -08:00
Benoit Steiner	b3b722905f	Improved code indentation	2016-01-19 17:09:47 -08:00
Benoit Steiner	5b7713dd33	Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.	2016-01-19 17:05:10 -08:00
Ville Kallioniemi	63fb66f53a	Add ctor for long	2016-01-17 21:25:36 -07:00
Benoit Steiner	34057cff23	Fixed a race condition that could affect some reductions on CUDA devices.	2016-01-15 15:11:56 -08:00
Benoit Steiner	0461f0153e	Made it possible to compare tensor dimensions inside a CUDA kernel.	2016-01-15 11:22:16 -08:00
Benoit Steiner	aed4cb1269	Use warp shuffles instead of shared memory access to speed up the inner reduction kernel.	2016-01-14 21:45:14 -08:00
Benoit Steiner	8fe2532e70	Fixed a boundary condition bug in the outer reduction kernel	2016-01-14 09:29:48 -08:00
Benoit Steiner	9f013a9d86	Properly record the rank of reduced tensors in the tensor traits.	2016-01-13 14:24:37 -08:00
Benoit Steiner	79b69b7444	Trigger the optimized matrix vector path more conservatively.	2016-01-12 15:21:09 -08:00
Benoit Steiner	d920d57f38	Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks.	2016-01-12 11:32:27 -08:00
Benoit Steiner	bd7d901da9	Reverted a previous change that tripped nvcc when compiling in debug mode.	2016-01-11 17:49:44 -08:00
Benoit Steiner	c5e6900400	Silenced a few compilation warnings.	2016-01-11 17:06:39 -08:00
Benoit Steiner	f894736d61	Updated the tensor traits: the alignment is not part of the Flags enum anymore	2016-01-11 16:42:18 -08:00
Benoit Steiner	4f7714d72c	Enabled the use of fixed dimensions from within a cuda kernel.	2016-01-11 16:01:00 -08:00
Benoit Steiner	01c55d37e6	Deleted unused variable.	2016-01-11 15:53:19 -08:00
Benoit Steiner	0504c56ea7	Silenced a nvcc compilation warning	2016-01-11 15:49:21 -08:00
Benoit Steiner	b523771a24	Silenced several compilation warnings triggered by nvcc.	2016-01-11 14:25:43 -08:00
Benoit Steiner	2c3b13eded	Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152) Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.	2016-01-11 11:43:37 -08:00
Benoit Steiner	2ccb1c8634	Fixed a bug in the dispatch of optimized reduction kernels.	2016-01-11 10:36:37 -08:00
Benoit Steiner	780623261e	Re-enabled the optimized reduction CUDA code.	2016-01-11 09:07:14 -08:00
Jeremy Barnes	91678f489a	Cleaned up double-defined macro from last commit	2016-01-10 22:44:45 -05:00
Jeremy Barnes	403a7cb6c3	Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.	2016-01-10 22:39:13 -05:00
Benoit Steiner	e76904af1b	Simplified the dispatch code.	2016-01-08 16:50:57 -08:00
Benoit Steiner	d726e864ac	Made it possible to use array of size 0 on CUDA devices	2016-01-08 16:38:14 -08:00
Benoit Steiner	3358dfd5dd	Reworked the dispatch of optimized cuda reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases	2016-01-08 16:28:53 -08:00
Benoit Steiner	53749ff415	Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures.	2016-01-08 13:53:40 -08:00
Benoit Steiner	6639b7d6e8	Removed a couple of partial specializations that confuse nvcc and result in errors such as this: error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"	2016-01-07 18:45:19 -08:00
Benoit Steiner	0cb2ca5de2	Fixed a typo.	2016-01-06 18:50:28 -08:00
Benoit Steiner	213459d818	Optimized the performance of broadcasting of scalars.	2016-01-06 18:47:45 -08:00
Benoit Steiner	cfff40b1d4	Improved the performance of reductions on CUDA devices	2016-01-04 17:25:00 -08:00

1766 Commits