Benoit Steiner
|
68ac5c1738
|
Improved the performance of large outer reductions on CUDA
|
2016-02-29 18:11:58 -08:00 |
|
Benoit Steiner
|
b2075cb7a2
|
Made the signature of the inner and outer reducers consistent
|
2016-02-29 10:53:38 -08:00 |
|
Benoit Steiner
|
3284842045
|
Optimized the performance of narrow reductions on CUDA devices
|
2016-02-29 10:48:16 -08:00 |
|
Benoit Steiner
|
609b3337a7
|
Print some information to stderr when a CUDA kernel fails
|
2016-02-27 20:42:57 +00:00 |
|
Benoit Steiner
|
ac2e6e0d03
|
Properly vectorized the random number generators
|
2016-02-26 13:52:24 -08:00 |
|
Benoit Steiner
|
caa54d888f
|
Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag
|
2016-02-26 12:38:18 -08:00 |
|
Benoit Steiner
|
2cd32cad27
|
Reverted previous commit since it caused more problems than it solved
|
2016-02-26 13:21:44 +00:00 |
|
Benoit Steiner
|
d9d05dd96e
|
Fixed handling of long doubles on aarch64
|
2016-02-26 04:13:58 -08:00 |
|
Benoit Steiner
|
c36c09169e
|
Fixed a typo in the reduction code that could prevent large full reductions from running properly on old CUDA devices.
|
2016-02-24 17:07:25 -08:00 |
|
Benoit Steiner
|
7a01cb8e4b
|
Marked the And and Or reducers as stateless.
|
2016-02-24 16:43:01 -08:00 |
|
Benoit Steiner
|
1d9256f7db
|
Updated the padding code to work with half floats
|
2016-02-23 05:51:22 +00:00 |
|
Benoit Steiner
|
72d2cf642e
|
Deleted the coordinate-based evaluation of tensor expressions, since it's hardly ever used and started to cause issues with some versions of Xcode.
|
2016-02-22 15:29:41 -08:00 |
|
Benoit Steiner
|
5cd00068c0
|
Include <iostream> in the tensor header since we now use it to better report CUDA initialization errors
|
2016-02-22 13:59:03 -08:00 |
|
Benoit Steiner
|
257b640463
|
Fixed compilation warning generated by clang
|
2016-02-21 22:43:37 -08:00 |
|
Benoit Steiner
|
96a24b05cc
|
Optimized casting of tensors in the case where the casting happens to be a no-op
|
2016-02-21 11:16:15 -08:00 |
|
Benoit Steiner
|
203490017f
|
Prevent unnecessary Index to int conversions
|
2016-02-21 08:49:36 -08:00 |
|
Rasmus Munk Larsen
|
8eb127022b
|
Get rid of duplicate code.
|
2016-02-19 16:33:30 -08:00 |
|
Rasmus Munk Larsen
|
d5e2ec7447
|
Speed up tensor FFT by ~25-50%.
Benchmark                         Base (ns)   New (ns)  Improvement
-------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8           132        134        -1.5%
BM_tensor_fft_single_1D_cpu/9          1162       1229        -5.8%
BM_tensor_fft_single_1D_cpu/16          199        195        +2.0%
BM_tensor_fft_single_1D_cpu/17         2587       2267       +12.4%
BM_tensor_fft_single_1D_cpu/32          373        341        +8.6%
BM_tensor_fft_single_1D_cpu/33         5922       4879       +17.6%
BM_tensor_fft_single_1D_cpu/64          797        675       +15.3%
BM_tensor_fft_single_1D_cpu/65        13580      10481       +22.8%
BM_tensor_fft_single_1D_cpu/128        1753       1375       +21.6%
BM_tensor_fft_single_1D_cpu/129       31426      22789       +27.5%
BM_tensor_fft_single_1D_cpu/256        4005       3008       +24.9%
BM_tensor_fft_single_1D_cpu/257       70910      49549       +30.1%
BM_tensor_fft_single_1D_cpu/512        8989       6524       +27.4%
BM_tensor_fft_single_1D_cpu/513      165402     107751       +34.9%
BM_tensor_fft_single_1D_cpu/999      198293     115909       +41.5%
BM_tensor_fft_single_1D_cpu/1ki       21289      14143       +33.6%
BM_tensor_fft_single_1D_cpu/1k       361980     233355       +35.5%
BM_tensor_fft_double_1D_cpu/8           138        131        +5.1%
BM_tensor_fft_double_1D_cpu/9          1253       1133        +9.6%
BM_tensor_fft_double_1D_cpu/16          218        200        +8.3%
BM_tensor_fft_double_1D_cpu/17         2770       2392       +13.6%
BM_tensor_fft_double_1D_cpu/32          406        368        +9.4%
BM_tensor_fft_double_1D_cpu/33         6418       5153       +19.7%
BM_tensor_fft_double_1D_cpu/64          856        728       +15.0%
BM_tensor_fft_double_1D_cpu/65        14666      11148       +24.0%
BM_tensor_fft_double_1D_cpu/128        1913       1502       +21.5%
BM_tensor_fft_double_1D_cpu/129       36414      24072       +33.9%
BM_tensor_fft_double_1D_cpu/256        4226       3216       +23.9%
BM_tensor_fft_double_1D_cpu/257       86638      52059       +39.9%
BM_tensor_fft_double_1D_cpu/512        9397       6939       +26.2%
BM_tensor_fft_double_1D_cpu/513      203208     114090       +43.9%
BM_tensor_fft_double_1D_cpu/999      237841     125583       +47.2%
BM_tensor_fft_double_1D_cpu/1ki       20921      15392       +26.4%
BM_tensor_fft_double_1D_cpu/1k       455183     250763       +44.9%
BM_tensor_fft_single_2D_cpu/8          1051       1005        +4.4%
BM_tensor_fft_single_2D_cpu/9         16784      14837       +11.6%
BM_tensor_fft_single_2D_cpu/16         4074       3772        +7.4%
BM_tensor_fft_single_2D_cpu/17        75802      63884       +15.7%
BM_tensor_fft_single_2D_cpu/32        20580      16931       +17.7%
BM_tensor_fft_single_2D_cpu/33       345798     278579       +19.4%
BM_tensor_fft_single_2D_cpu/64        97548      81237       +16.7%
BM_tensor_fft_single_2D_cpu/65      1592701    1227048       +23.0%
BM_tensor_fft_single_2D_cpu/128      472318     384303       +18.6%
BM_tensor_fft_single_2D_cpu/129     7038351    5445308       +22.6%
BM_tensor_fft_single_2D_cpu/256     2309474    1850969       +19.9%
BM_tensor_fft_single_2D_cpu/257    31849182   23797538       +25.3%
BM_tensor_fft_single_2D_cpu/512    10395194    8077499       +22.3%
BM_tensor_fft_single_2D_cpu/513   144053843  104242541       +27.6%
BM_tensor_fft_single_2D_cpu/999   279885833  208389718       +25.5%
BM_tensor_fft_single_2D_cpu/1ki    45967677   36070985       +21.5%
BM_tensor_fft_single_2D_cpu/1k    619727095  456489500       +26.3%
BM_tensor_fft_double_2D_cpu/8          1110       1016        +8.5%
BM_tensor_fft_double_2D_cpu/9         17957      15768       +12.2%
BM_tensor_fft_double_2D_cpu/16         4558       4000       +12.2%
BM_tensor_fft_double_2D_cpu/17        79237      66901       +15.6%
BM_tensor_fft_double_2D_cpu/32        21494      17699       +17.7%
BM_tensor_fft_double_2D_cpu/33       357962     290357       +18.9%
BM_tensor_fft_double_2D_cpu/64       105179      87435       +16.9%
BM_tensor_fft_double_2D_cpu/65      1617143    1288006       +20.4%
BM_tensor_fft_double_2D_cpu/128      512848     419397       +18.2%
BM_tensor_fft_double_2D_cpu/129     7271322    5636884       +22.5%
BM_tensor_fft_double_2D_cpu/256     2415529    1922032       +20.4%
BM_tensor_fft_double_2D_cpu/257    32517952   24462177       +24.8%
BM_tensor_fft_double_2D_cpu/512    10724898    8287617       +22.7%
BM_tensor_fft_double_2D_cpu/513   146007419  108603266       +25.6%
BM_tensor_fft_double_2D_cpu/999   296351330  221885776       +25.1%
BM_tensor_fft_double_2D_cpu/1ki    59334166   48357539       +18.5%
BM_tensor_fft_double_2D_cpu/1k    666660132  483840349       +27.4%
|
2016-02-19 16:29:23 -08:00 |
|
Benoit Steiner
|
46fc23f91c
|
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
|
2016-02-19 13:44:22 -08:00 |
|
Benoit Steiner
|
670db7988d
|
Updated the contraction code to make it compatible with half floats.
|
2016-02-19 13:03:26 -08:00 |
|
Benoit Steiner
|
180156ba1a
|
Added support for tensor reductions on half floats
|
2016-02-19 10:05:59 -08:00 |
|
Benoit Steiner
|
f268db1c4b
|
Added the ability to query the minor version of a CUDA device
|
2016-02-19 16:31:04 +00:00 |
|
Benoit Steiner
|
f3352e0fb0
|
Don't make the array constructors explicit
|
2016-02-19 15:58:57 +00:00 |
|
Benoit Steiner
|
cd042dbbfd
|
Fixed a bug in the tensor type converter
|
2016-02-19 15:03:26 +00:00 |
|
Benoit Steiner
|
de345eff2e
|
Added a method to conjugate the content of a tensor or the result of a tensor expression.
|
2016-02-11 16:34:07 -08:00 |
|
Benoit Steiner
|
9a21b38ccc
|
Worked around a few clang compilation warnings
|
2016-02-10 08:02:04 -08:00 |
|
Benoit Steiner
|
72ab7879f7
|
Fixed clang compilation warnings
|
2016-02-10 06:48:28 -08:00 |
|
Benoit Steiner
|
e88535634d
|
Fixed some clang compilation warnings
|
2016-02-09 23:32:41 -08:00 |
|
Benoit Steiner
|
d69946183d
|
Updated the TensorIntDivisor code to work properly on LLP64 systems
|
2016-02-08 21:03:59 -08:00 |
|
Benoit Steiner
|
4d4211c04e
|
Avoid unnecessary type conversions
|
2016-02-05 18:19:41 -08:00 |
|
Benoit Steiner
|
f535378995
|
Added support for vectorized type casting of int to char.
|
2016-02-03 18:58:29 -08:00 |
|
Benoit Steiner
|
4ab63a3f6f
|
Fixed the initialization of the dummy member of the array class to make it compatible with pairs of elements.
|
2016-02-03 17:23:07 -08:00 |
|
Benoit Steiner
|
1cbb79cdfd
|
Made sure the dummy element of a size 0 array is always initialized to silence some compiler warnings
|
2016-02-03 15:58:26 -08:00 |
|
Benoit Steiner
|
dc413dbe8a
|
Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
Add constructor for long types.
|
2016-02-02 20:58:06 -08:00 |
|
Ville Kallioniemi
|
783018d8f6
|
Use EIGEN_STATIC_ASSERT for backward compatibility.
|
2016-02-02 16:45:12 -07:00 |
|
Benoit Steiner
|
99cde88341
|
Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.
|
2016-02-02 11:06:53 -08:00 |
|
Ville Kallioniemi
|
aedea349aa
|
Replace separate low word constructors with a single templated constructor.
|
2016-02-01 20:25:02 -07:00 |
|
Ville Kallioniemi
|
f0fdefa96f
|
Rebase to latest.
|
2016-02-01 19:32:31 -07:00 |
|
Benoit Steiner
|
6b5dff875e
|
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations.
|
2016-02-01 12:46:32 -08:00 |
|
Benoit Steiner
|
e80ed948e1
|
Fixed a number of compilation warnings generated by the CUDA tests
|
2016-01-31 20:09:41 -08:00 |
|
Benoit Steiner
|
6720b38fbf
|
Fixed a few compilation warnings
|
2016-01-31 16:48:50 -08:00 |
|
Benoit Steiner
|
963f2d2a8f
|
Marked several methods EIGEN_DEVICE_FUNC
|
2016-01-28 23:37:48 -08:00 |
|
Benoit Steiner
|
c5d25bf1d0
|
Fixed a couple of compilation warnings.
|
2016-01-28 23:15:45 -08:00 |
|
Gael Guennebaud
|
ddf64babde
|
merge
|
2016-01-28 13:21:48 +01:00 |
|
Benoit Steiner
|
4bf9eaf77a
|
Deleted an invalid assertion that prevented the assignment of empty tensors.
|
2016-01-27 17:09:30 -08:00 |
|
Benoit Steiner
|
291069e885
|
Fixed some compilation problems with nvcc + clang
|
2016-01-27 15:37:03 -08:00 |
|
Gael Guennebaud
|
9c8f7dfe94
|
bug #1156: fix several function declarations whose arguments were passed by value instead of being passed by reference
|
2016-01-27 18:34:42 +01:00 |
|
Ville Kallioniemi
|
02db1228ed
|
Add constructor for long types.
|
2016-01-26 23:41:01 -07:00 |
|
Hauke Heibel
|
5eb2790be0
|
Fixed minor typo in SplineFitting.
|
2016-01-25 22:17:52 +01:00 |
|
Benoit Steiner
|
e3a15a03a4
|
Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
|
2016-01-24 23:04:50 -08:00 |
|