Benoit Steiner
|
f05fb449b8
|
Avoid unnecessary conversion from 32bit int to 64bit unsigned int
|
2016-03-09 15:27:45 -08:00 |
|
Benoit Steiner
|
1d566417d2
|
Enable the random number generators when compiling with visual studio
|
2016-03-09 10:55:11 -08:00 |
|
Benoit Steiner
|
b084133dbf
|
Fixed the integer division code on windows
|
2016-03-09 07:06:36 -08:00 |
|
Benoit Steiner
|
6d30683113
|
Fixed static assertion
|
2016-03-08 21:02:51 -08:00 |
|
Benoit Steiner
|
46177c8d64
|
Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues.
|
2016-03-08 16:37:27 -08:00 |
|
Benoit Steiner
|
6d6413f768
|
Simplified the full reduction code
|
2016-03-08 16:02:00 -08:00 |
|
Benoit Steiner
|
5a427a94a9
|
Fixed the tensor generator code
|
2016-03-08 13:28:06 -08:00 |
|
Benoit Steiner
|
a81b88bef7
|
Fixed the tensor concatenation code
|
2016-03-08 12:30:19 -08:00 |
|
Benoit Steiner
|
551ff11d0d
|
Fixed the tensor layout swapping code
|
2016-03-08 12:28:10 -08:00 |
|
Benoit Steiner
|
8768c063f5
|
Fixed the tensor chipping code.
|
2016-03-08 12:26:49 -08:00 |
|
Benoit Steiner
|
e09eb835db
|
Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit.
|
2016-03-08 12:07:33 -08:00 |
|
Benoit Steiner
|
3b614a2358
|
Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly.
|
2016-03-07 17:53:28 -08:00 |
|
Benoit Steiner
|
769685e74e
|
Added the ability to pad a tensor using a non-zero value
|
2016-03-07 14:45:37 -08:00 |
|
Benoit Steiner
|
7f87cc3a3b
|
Fix a couple of typos in the code.
|
2016-03-07 14:31:27 -08:00 |
|
Benoit Steiner
|
9a54c3e32b
|
Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be.
|
2016-03-06 09:38:56 -08:00 |
|
Benoit Steiner
|
05bbca079a
|
Turn on some of the cxx11 features when compiling with visual studio 2015
|
2016-03-05 10:52:08 -08:00 |
|
Benoit Steiner
|
23aed8f2e4
|
Use EIGEN_PI instead of redefining our own constant PI
|
2016-03-05 08:04:45 -08:00 |
|
Benoit Steiner
|
ec35068edc
|
Don't rely on the M_PI constant since not all compilers provide it.
|
2016-03-04 16:42:38 -08:00 |
|
Benoit Steiner
|
60d9df11c1
|
Fixed the computation of leading zeros when compiling with msvc.
|
2016-03-04 16:27:02 -08:00 |
|
Benoit Steiner
|
c561eeb7bf
|
Don't use implicit type conversions in initializer lists since not all compilers support them.
|
2016-03-04 14:12:45 -08:00 |
|
Benoit Steiner
|
2c50fc878e
|
Fixed a typo
|
2016-03-04 14:09:38 -08:00 |
|
Benoit Steiner
|
5cf4558c0a
|
Added support for rounding, flooring, and ceiling to the tensor api
|
2016-03-03 12:36:55 -08:00 |
|
Benoit Steiner
|
68ac5c1738
|
Improved the performance of large outer reductions on cuda
|
2016-02-29 18:11:58 -08:00 |
|
Benoit Steiner
|
b2075cb7a2
|
Made the signature of the inner and outer reducers consistent
|
2016-02-29 10:53:38 -08:00 |
|
Benoit Steiner
|
3284842045
|
Optimized the performance of narrow reductions on CUDA devices
|
2016-02-29 10:48:16 -08:00 |
|
Benoit Steiner
|
609b3337a7
|
Print some information to stderr when a CUDA kernel fails
|
2016-02-27 20:42:57 +00:00 |
|
Benoit Steiner
|
ac2e6e0d03
|
Properly vectorized the random number generators
|
2016-02-26 13:52:24 -08:00 |
|
Benoit Steiner
|
caa54d888f
|
Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag
|
2016-02-26 12:38:18 -08:00 |
|
Benoit Steiner
|
2cd32cad27
|
Reverted previous commit since it caused more problems than it solved
|
2016-02-26 13:21:44 +00:00 |
|
Benoit Steiner
|
d9d05dd96e
|
Fixed handling of long doubles on aarch64
|
2016-02-26 04:13:58 -08:00 |
|
Benoit Steiner
|
c36c09169e
|
Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices.
|
2016-02-24 17:07:25 -08:00 |
|
Benoit Steiner
|
7a01cb8e4b
|
Marked the And and Or reducers as stateless.
|
2016-02-24 16:43:01 -08:00 |
|
Benoit Steiner
|
1d9256f7db
|
Updated the padding code to work with half floats
|
2016-02-23 05:51:22 +00:00 |
|
Benoit Steiner
|
72d2cf642e
|
Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode.
|
2016-02-22 15:29:41 -08:00 |
|
Benoit Steiner
|
5cd00068c0
|
Include <iostream> in the tensor header since we now use it to better report cuda initialization errors
|
2016-02-22 13:59:03 -08:00 |
|
Benoit Steiner
|
257b640463
|
Fixed compilation warning generated by clang
|
2016-02-21 22:43:37 -08:00 |
|
Benoit Steiner
|
96a24b05cc
|
Optimized casting of tensors in the case where the casting happens to be a no-op
|
2016-02-21 11:16:15 -08:00 |
|
Benoit Steiner
|
203490017f
|
Prevent unnecessary Index to int conversions
|
2016-02-21 08:49:36 -08:00 |
|
Rasmus Munk Larsen
|
8eb127022b
|
Get rid of duplicate code.
|
2016-02-19 16:33:30 -08:00 |
|
Rasmus Munk Larsen
|
d5e2ec7447
|
Speed up tensor FFT by ~25-50%.
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8 132 134 -1.5%
BM_tensor_fft_single_1D_cpu/9 1162 1229 -5.8%
BM_tensor_fft_single_1D_cpu/16 199 195 +2.0%
BM_tensor_fft_single_1D_cpu/17 2587 2267 +12.4%
BM_tensor_fft_single_1D_cpu/32 373 341 +8.6%
BM_tensor_fft_single_1D_cpu/33 5922 4879 +17.6%
BM_tensor_fft_single_1D_cpu/64 797 675 +15.3%
BM_tensor_fft_single_1D_cpu/65 13580 10481 +22.8%
BM_tensor_fft_single_1D_cpu/128 1753 1375 +21.6%
BM_tensor_fft_single_1D_cpu/129 31426 22789 +27.5%
BM_tensor_fft_single_1D_cpu/256 4005 3008 +24.9%
BM_tensor_fft_single_1D_cpu/257 70910 49549 +30.1%
BM_tensor_fft_single_1D_cpu/512 8989 6524 +27.4%
BM_tensor_fft_single_1D_cpu/513 165402 107751 +34.9%
BM_tensor_fft_single_1D_cpu/999 198293 115909 +41.5%
BM_tensor_fft_single_1D_cpu/1ki 21289 14143 +33.6%
BM_tensor_fft_single_1D_cpu/1k 361980 233355 +35.5%
BM_tensor_fft_double_1D_cpu/8 138 131 +5.1%
BM_tensor_fft_double_1D_cpu/9 1253 1133 +9.6%
BM_tensor_fft_double_1D_cpu/16 218 200 +8.3%
BM_tensor_fft_double_1D_cpu/17 2770 2392 +13.6%
BM_tensor_fft_double_1D_cpu/32 406 368 +9.4%
BM_tensor_fft_double_1D_cpu/33 6418 5153 +19.7%
BM_tensor_fft_double_1D_cpu/64 856 728 +15.0%
BM_tensor_fft_double_1D_cpu/65 14666 11148 +24.0%
BM_tensor_fft_double_1D_cpu/128 1913 1502 +21.5%
BM_tensor_fft_double_1D_cpu/129 36414 24072 +33.9%
BM_tensor_fft_double_1D_cpu/256 4226 3216 +23.9%
BM_tensor_fft_double_1D_cpu/257 86638 52059 +39.9%
BM_tensor_fft_double_1D_cpu/512 9397 6939 +26.2%
BM_tensor_fft_double_1D_cpu/513 203208 114090 +43.9%
BM_tensor_fft_double_1D_cpu/999 237841 125583 +47.2%
BM_tensor_fft_double_1D_cpu/1ki 20921 15392 +26.4%
BM_tensor_fft_double_1D_cpu/1k 455183 250763 +44.9%
BM_tensor_fft_single_2D_cpu/8 1051 1005 +4.4%
BM_tensor_fft_single_2D_cpu/9 16784 14837 +11.6%
BM_tensor_fft_single_2D_cpu/16 4074 3772 +7.4%
BM_tensor_fft_single_2D_cpu/17 75802 63884 +15.7%
BM_tensor_fft_single_2D_cpu/32 20580 16931 +17.7%
BM_tensor_fft_single_2D_cpu/33 345798 278579 +19.4%
BM_tensor_fft_single_2D_cpu/64 97548 81237 +16.7%
BM_tensor_fft_single_2D_cpu/65 1592701 1227048 +23.0%
BM_tensor_fft_single_2D_cpu/128 472318 384303 +18.6%
BM_tensor_fft_single_2D_cpu/129 7038351 5445308 +22.6%
BM_tensor_fft_single_2D_cpu/256 2309474 1850969 +19.9%
BM_tensor_fft_single_2D_cpu/257 31849182 23797538 +25.3%
BM_tensor_fft_single_2D_cpu/512 10395194 8077499 +22.3%
BM_tensor_fft_single_2D_cpu/513 144053843 104242541 +27.6%
BM_tensor_fft_single_2D_cpu/999 279885833 208389718 +25.5%
BM_tensor_fft_single_2D_cpu/1ki 45967677 36070985 +21.5%
BM_tensor_fft_single_2D_cpu/1k 619727095 456489500 +26.3%
BM_tensor_fft_double_2D_cpu/8 1110 1016 +8.5%
BM_tensor_fft_double_2D_cpu/9 17957 15768 +12.2%
BM_tensor_fft_double_2D_cpu/16 4558 4000 +12.2%
BM_tensor_fft_double_2D_cpu/17 79237 66901 +15.6%
BM_tensor_fft_double_2D_cpu/32 21494 17699 +17.7%
BM_tensor_fft_double_2D_cpu/33 357962 290357 +18.9%
BM_tensor_fft_double_2D_cpu/64 105179 87435 +16.9%
BM_tensor_fft_double_2D_cpu/65 1617143 1288006 +20.4%
BM_tensor_fft_double_2D_cpu/128 512848 419397 +18.2%
BM_tensor_fft_double_2D_cpu/129 7271322 5636884 +22.5%
BM_tensor_fft_double_2D_cpu/256 2415529 1922032 +20.4%
BM_tensor_fft_double_2D_cpu/257 32517952 24462177 +24.8%
BM_tensor_fft_double_2D_cpu/512 10724898 8287617 +22.7%
BM_tensor_fft_double_2D_cpu/513 146007419 108603266 +25.6%
BM_tensor_fft_double_2D_cpu/999 296351330 221885776 +25.1%
BM_tensor_fft_double_2D_cpu/1ki 59334166 48357539 +18.5%
BM_tensor_fft_double_2D_cpu/1k 666660132 483840349 +27.4%
|
2016-02-19 16:29:23 -08:00 |
|
Benoit Steiner
|
46fc23f91c
|
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
|
2016-02-19 13:44:22 -08:00 |
|
Benoit Steiner
|
670db7988d
|
Updated the contraction code to make it compatible with half floats.
|
2016-02-19 13:03:26 -08:00 |
|
Benoit Steiner
|
180156ba1a
|
Added support for tensor reductions on half floats
|
2016-02-19 10:05:59 -08:00 |
|
Benoit Steiner
|
f268db1c4b
|
Added the ability to query the minor version of a cuda device
|
2016-02-19 16:31:04 +00:00 |
|
Benoit Steiner
|
f3352e0fb0
|
Don't make the array constructors explicit
|
2016-02-19 15:58:57 +00:00 |
|
Benoit Steiner
|
cd042dbbfd
|
Fixed a bug in the tensor type converter
|
2016-02-19 15:03:26 +00:00 |
|
Benoit Steiner
|
de345eff2e
|
Added a method to conjugate the content of a tensor or the result of a tensor expression.
|
2016-02-11 16:34:07 -08:00 |
|
Benoit Steiner
|
9a21b38ccc
|
Worked around a few clang compilation warnings
|
2016-02-10 08:02:04 -08:00 |
|
Benoit Steiner
|
72ab7879f7
|
Fixed clang compilation warnings
|
2016-02-10 06:48:28 -08:00 |
|
Benoit Steiner
|
e88535634d
|
Fixed some clang compilation warnings
|
2016-02-09 23:32:41 -08:00 |
|