Commit Graph

1558 Commits

Author SHA1 Message Date
Benoit Steiner
ec35068edc Don't rely on the M_PI constant since not all compilers provide it. 2016-03-04 16:42:38 -08:00
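A minimal sketch of the kind of portability fix this commit message describes: M_PI is a POSIX extension that the C++ standard does not require `<cmath>` to define, so the value is spelled out locally instead. The constant name `kPi` and the function are invented for illustration.

```cpp
namespace {
// M_PI is not guaranteed by the C++ standard, so spell out the value
// instead of relying on the macro being present.
const double kPi = 3.141592653589793238462643383279502884;
}

double circle_area(double radius) {
  return kPi * radius * radius;  // no dependency on M_PI
}
```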
Benoit Steiner
60d9df11c1 Fixed the computation of leading zeros when compiling with msvc. 2016-03-04 16:27:02 -08:00
Benoit Steiner
4e49fd5eb9 MSVC uses __uint128 while other compilers use __uint128_t to encode 128bit unsigned integers. Make the cxx11_tensor_uint128.cpp test work in both cases. 2016-03-04 14:49:18 -08:00
Benoit Steiner
667fcc2b53 Fixed syntax error 2016-03-04 14:37:51 -08:00
Benoit Steiner
4416a5dcff Added missing include 2016-03-04 14:35:43 -08:00
Benoit Steiner
c561eeb7bf Don't use implicit type conversions in initializer lists since not all compilers support them. 2016-03-04 14:12:45 -08:00
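A hedged illustration of the portability issue this message refers to: some compilers reject implicit (narrowing) conversions inside braced initializer lists, so the conversion is written out explicitly. The function and names below are invented for this sketch.

```cpp
#include <array>

// Converting a wider type implicitly inside a braced list, e.g.
//   std::array<int, 2> a = { n, n };   // n is long long
// is a narrowing conversion that strict compilers reject, so the
// cast is made explicit instead.
std::array<int, 2> to_int_pair(long long n) {
  return { static_cast<int>(n), static_cast<int>(n) };
}
```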
Benoit Steiner
174edf976b Made the contraction test more portable 2016-03-04 14:11:13 -08:00
Benoit Steiner
2c50fc878e Fixed a typo 2016-03-04 14:09:38 -08:00
Benoit Steiner
deea866bbd Added tests to cover the new rounding, flooring and ceiling tensor operations. 2016-03-03 12:38:02 -08:00
Benoit Steiner
5cf4558c0a Added support for rounding, flooring, and ceiling to the tensor api 2016-03-03 12:36:55 -08:00
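A hedged usage sketch of the coefficient-wise operations this commit message describes, assuming the unsupported CXX11 Tensor module header path; treat it as illustrative rather than the commit's own test code.

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> t(4);
  t.setValues({1.2f, 2.5f, -0.7f, 3.9f});

  // Coefficient-wise rounding operations on a tensor expression.
  Eigen::Tensor<float, 1> rounded = t.round();  // nearest integer
  Eigen::Tensor<float, 1> floored = t.floor();  // toward -infinity
  Eigen::Tensor<float, 1> ceiled  = t.ceil();   // toward +infinity
  return 0;
}
```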
Benoit Steiner
dac58d7c35 Added a test to validate the conversion of half floats into floats on Kepler GPUs.
Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5.
2016-03-03 10:37:25 -08:00
Benoit Steiner
68ac5c1738 Improved the performance of large outer reductions on cuda 2016-02-29 18:11:58 -08:00
Benoit Steiner
b2075cb7a2 Made the signature of the inner and outer reducers consistent 2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045 Optimized the performance of narrow reductions on CUDA devices 2016-02-29 10:48:16 -08:00
Benoit Steiner
609b3337a7 Print some information to stderr when a CUDA kernel fails 2016-02-27 20:42:57 +00:00
Benoit Steiner
ac2e6e0d03 Properly vectorized the random number generators 2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag 2016-02-26 12:38:18 -08:00
Benoit Steiner
2cd32cad27 Reverted previous commit since it caused more problems than it solved 2016-02-26 13:21:44 +00:00
Benoit Steiner
d9d05dd96e Fixed handling of long doubles on aarch64 2016-02-26 04:13:58 -08:00
Benoit Steiner
af199b4658 Made the CUDA architecture level a build setting. 2016-02-25 09:06:18 -08:00
Benoit Steiner
c36c09169e Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices. 2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b Marked the And and Or reducers as stateless. 2016-02-24 16:43:01 -08:00
Benoit Steiner
1d9256f7db Updated the padding code to work with half floats 2016-02-23 05:51:22 +00:00
Benoit Steiner
72d2cf642e Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode. 2016-02-22 15:29:41 -08:00
Benoit Steiner
5cd00068c0 include <iostream> in the tensor header since we now use it to better report cuda initialization errors 2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463 Fixed compilation warning generated by clang 2016-02-21 22:43:37 -08:00
Benoit Steiner
e644f60907 Pulled latest updates from trunk 2016-02-21 20:24:59 +00:00
Benoit Steiner
95fceb6452 Added the ability to compute the absolute value of a half float 2016-02-21 20:24:11 +00:00
Benoit Steiner
ed69cbeef0 Added some debugging information to the test to figure out why it fails sometimes 2016-02-21 11:20:20 -08:00
Benoit Steiner
96a24b05cc Optimized casting of tensors in the case where the casting happens to be a no-op 2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f Prevent unnecessary Index to int conversions 2016-02-21 08:49:36 -08:00
Benoit Steiner
1e6fe6f046 Fixed the float16 tensor test. 2016-02-20 07:44:17 +00:00
Rasmus Munk Larsen
8eb127022b Get rid of duplicate code. 2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447 Speed up tensor FFT by up to ~25-50%.
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8            132       134     -1.5%
BM_tensor_fft_single_1D_cpu/9           1162      1229     -5.8%
BM_tensor_fft_single_1D_cpu/16           199       195     +2.0%
BM_tensor_fft_single_1D_cpu/17          2587      2267    +12.4%
BM_tensor_fft_single_1D_cpu/32           373       341     +8.6%
BM_tensor_fft_single_1D_cpu/33          5922      4879    +17.6%
BM_tensor_fft_single_1D_cpu/64           797       675    +15.3%
BM_tensor_fft_single_1D_cpu/65         13580     10481    +22.8%
BM_tensor_fft_single_1D_cpu/128         1753      1375    +21.6%
BM_tensor_fft_single_1D_cpu/129        31426     22789    +27.5%
BM_tensor_fft_single_1D_cpu/256         4005      3008    +24.9%
BM_tensor_fft_single_1D_cpu/257        70910     49549    +30.1%
BM_tensor_fft_single_1D_cpu/512         8989      6524    +27.4%
BM_tensor_fft_single_1D_cpu/513       165402    107751    +34.9%
BM_tensor_fft_single_1D_cpu/999       198293    115909    +41.5%
BM_tensor_fft_single_1D_cpu/1ki        21289     14143    +33.6%
BM_tensor_fft_single_1D_cpu/1k        361980    233355    +35.5%
BM_tensor_fft_double_1D_cpu/8            138       131     +5.1%
BM_tensor_fft_double_1D_cpu/9           1253      1133     +9.6%
BM_tensor_fft_double_1D_cpu/16           218       200     +8.3%
BM_tensor_fft_double_1D_cpu/17          2770      2392    +13.6%
BM_tensor_fft_double_1D_cpu/32           406       368     +9.4%
BM_tensor_fft_double_1D_cpu/33          6418      5153    +19.7%
BM_tensor_fft_double_1D_cpu/64           856       728    +15.0%
BM_tensor_fft_double_1D_cpu/65         14666     11148    +24.0%
BM_tensor_fft_double_1D_cpu/128         1913      1502    +21.5%
BM_tensor_fft_double_1D_cpu/129        36414     24072    +33.9%
BM_tensor_fft_double_1D_cpu/256         4226      3216    +23.9%
BM_tensor_fft_double_1D_cpu/257        86638     52059    +39.9%
BM_tensor_fft_double_1D_cpu/512         9397      6939    +26.2%
BM_tensor_fft_double_1D_cpu/513       203208    114090    +43.9%
BM_tensor_fft_double_1D_cpu/999       237841    125583    +47.2%
BM_tensor_fft_double_1D_cpu/1ki        20921     15392    +26.4%
BM_tensor_fft_double_1D_cpu/1k        455183    250763    +44.9%
BM_tensor_fft_single_2D_cpu/8           1051      1005     +4.4%
BM_tensor_fft_single_2D_cpu/9          16784     14837    +11.6%
BM_tensor_fft_single_2D_cpu/16          4074      3772     +7.4%
BM_tensor_fft_single_2D_cpu/17         75802     63884    +15.7%
BM_tensor_fft_single_2D_cpu/32         20580     16931    +17.7%
BM_tensor_fft_single_2D_cpu/33        345798    278579    +19.4%
BM_tensor_fft_single_2D_cpu/64         97548     81237    +16.7%
BM_tensor_fft_single_2D_cpu/65       1592701   1227048    +23.0%
BM_tensor_fft_single_2D_cpu/128       472318    384303    +18.6%
BM_tensor_fft_single_2D_cpu/129      7038351   5445308    +22.6%
BM_tensor_fft_single_2D_cpu/256      2309474   1850969    +19.9%
BM_tensor_fft_single_2D_cpu/257     31849182  23797538    +25.3%
BM_tensor_fft_single_2D_cpu/512     10395194   8077499    +22.3%
BM_tensor_fft_single_2D_cpu/513     144053843  104242541    +27.6%
BM_tensor_fft_single_2D_cpu/999     279885833  208389718    +25.5%
BM_tensor_fft_single_2D_cpu/1ki     45967677  36070985    +21.5%
BM_tensor_fft_single_2D_cpu/1k      619727095  456489500    +26.3%
BM_tensor_fft_double_2D_cpu/8           1110      1016     +8.5%
BM_tensor_fft_double_2D_cpu/9          17957     15768    +12.2%
BM_tensor_fft_double_2D_cpu/16          4558      4000    +12.2%
BM_tensor_fft_double_2D_cpu/17         79237     66901    +15.6%
BM_tensor_fft_double_2D_cpu/32         21494     17699    +17.7%
BM_tensor_fft_double_2D_cpu/33        357962    290357    +18.9%
BM_tensor_fft_double_2D_cpu/64        105179     87435    +16.9%
BM_tensor_fft_double_2D_cpu/65       1617143   1288006    +20.4%
BM_tensor_fft_double_2D_cpu/128       512848    419397    +18.2%
BM_tensor_fft_double_2D_cpu/129      7271322   5636884    +22.5%
BM_tensor_fft_double_2D_cpu/256      2415529   1922032    +20.4%
BM_tensor_fft_double_2D_cpu/257     32517952  24462177    +24.8%
BM_tensor_fft_double_2D_cpu/512     10724898   8287617    +22.7%
BM_tensor_fft_double_2D_cpu/513     146007419  108603266    +25.6%
BM_tensor_fft_double_2D_cpu/999     296351330  221885776    +25.1%
BM_tensor_fft_double_2D_cpu/1ki     59334166  48357539    +18.5%
BM_tensor_fft_double_2D_cpu/1k      666660132  483840349    +27.4%
2016-02-19 16:29:23 -08:00
Benoit Steiner
46fc23f91c Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. 2016-02-19 13:44:22 -08:00
Benoit Steiner
670db7988d Updated the contraction code to make it compatible with half floats. 2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a Added support for tensor reductions on half floats 2016-02-19 10:05:59 -08:00
Benoit Steiner
f268db1c4b Added the ability to query the minor version of a cuda device 2016-02-19 16:31:04 +00:00
Benoit Steiner
a08d2ff0c9 Started to work on contractions and reductions using half floats 2016-02-19 15:59:59 +00:00
Benoit Steiner
f3352e0fb0 Don't make the array constructors explicit 2016-02-19 15:58:57 +00:00
Benoit Steiner
cd042dbbfd Fixed a bug in the tensor type converter 2016-02-19 15:03:26 +00:00
Benoit Steiner
ac5d706a94 Added support for simple coefficient-wise tensor expressions using half floats on CUDA devices 2016-02-19 08:19:12 +00:00
Benoit Steiner
0606a0a39b FP16 on CUDA is only available starting with CUDA 7.5. Disable it when using an older version of CUDA 2016-02-18 23:15:23 -08:00
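A hedged sketch (not Eigen's actual configuration logic) of the kind of version guard this message describes: the half-float header only ships with CUDA 7.5 and later, so older toolkits compile the fp16 paths out. The macro name SKETCH_HAS_CUDA_FP16 is invented for this illustration.

```cpp
#include <cuda.h>  // defines CUDA_VERSION, e.g. 7050 for CUDA 7.5

#if defined(__CUDACC__) && CUDA_VERSION >= 7050
  #include <cuda_fp16.h>           // __half, __float2half, __half2float
  #define SKETCH_HAS_CUDA_FP16 1   // fp16 code paths may be compiled
#else
  #define SKETCH_HAS_CUDA_FP16 0   // pre-7.5 toolkit: stick to float paths
#endif
```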
Benoit Steiner
f36c0c2c65 Added regression test for float16 2016-02-19 06:23:28 +00:00
Benoit Steiner
7151bd8768 Reverted unintended changes introduced by a bad merge 2016-02-19 06:20:50 +00:00
Benoit Steiner
17b9fbed34 Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa 2016-02-19 06:16:07 +00:00
Benoit Steiner
9e3f3a2d27 Deleted outdated comment 2016-02-11 17:27:35 -08:00
Benoit Steiner
de345eff2e Added a method to conjugate the content of a tensor or the result of a tensor expression. 2016-02-11 16:34:07 -08:00
Benoit Steiner
9a21b38ccc Worked around a few clang compilation warnings 2016-02-10 08:02:04 -08:00
Benoit Steiner
72ab7879f7 Fixed clang compilation warnings 2016-02-10 06:48:28 -08:00