Benoit Steiner
|
584832cb3c
|
Implemented the ptranspose function on half floats
|
2016-02-21 12:44:53 -08:00 |
|
Benoit Steiner
|
e644f60907
|
Pulled latest updates from trunk
|
2016-02-21 20:24:59 +00:00 |
|
Benoit Steiner
|
95fceb6452
|
Added the ability to compute the absolute value of a half float
|
2016-02-21 20:24:11 +00:00 |
|
Benoit Steiner
|
ed69cbeef0
|
Added some debugging information to the test to figure out why it fails sometimes
|
2016-02-21 11:20:20 -08:00 |
|
Benoit Steiner
|
96a24b05cc
|
Optimized casting of tensors in the case where the casting happens to be a no-op
|
2016-02-21 11:16:15 -08:00 |
|
Benoit Steiner
|
203490017f
|
Prevent unecessary Index to int conversions
|
2016-02-21 08:49:36 -08:00 |
|
Benoit Steiner
|
9ff269a1d3
|
Moved some of the fp16 operators outside the Eigen namespace to workaround some nvcc limitations.
|
2016-02-20 07:47:23 +00:00 |
|
Benoit Steiner
|
1e6fe6f046
|
Fixed the float16 tensor test.
|
2016-02-20 07:44:17 +00:00 |
|
Rasmus Munk Larsen
|
8eb127022b
|
Get rid of duplicate code.
|
2016-02-19 16:33:30 -08:00 |
|
Rasmus Munk Larsen
|
d5e2ec7447
|
Speed up tensor FFT by up ~25-50%.
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8 132 134 -1.5%
BM_tensor_fft_single_1D_cpu/9 1162 1229 -5.8%
BM_tensor_fft_single_1D_cpu/16 199 195 +2.0%
BM_tensor_fft_single_1D_cpu/17 2587 2267 +12.4%
BM_tensor_fft_single_1D_cpu/32 373 341 +8.6%
BM_tensor_fft_single_1D_cpu/33 5922 4879 +17.6%
BM_tensor_fft_single_1D_cpu/64 797 675 +15.3%
BM_tensor_fft_single_1D_cpu/65 13580 10481 +22.8%
BM_tensor_fft_single_1D_cpu/128 1753 1375 +21.6%
BM_tensor_fft_single_1D_cpu/129 31426 22789 +27.5%
BM_tensor_fft_single_1D_cpu/256 4005 3008 +24.9%
BM_tensor_fft_single_1D_cpu/257 70910 49549 +30.1%
BM_tensor_fft_single_1D_cpu/512 8989 6524 +27.4%
BM_tensor_fft_single_1D_cpu/513 165402 107751 +34.9%
BM_tensor_fft_single_1D_cpu/999 198293 115909 +41.5%
BM_tensor_fft_single_1D_cpu/1ki 21289 14143 +33.6%
BM_tensor_fft_single_1D_cpu/1k 361980 233355 +35.5%
BM_tensor_fft_double_1D_cpu/8 138 131 +5.1%
BM_tensor_fft_double_1D_cpu/9 1253 1133 +9.6%
BM_tensor_fft_double_1D_cpu/16 218 200 +8.3%
BM_tensor_fft_double_1D_cpu/17 2770 2392 +13.6%
BM_tensor_fft_double_1D_cpu/32 406 368 +9.4%
BM_tensor_fft_double_1D_cpu/33 6418 5153 +19.7%
BM_tensor_fft_double_1D_cpu/64 856 728 +15.0%
BM_tensor_fft_double_1D_cpu/65 14666 11148 +24.0%
BM_tensor_fft_double_1D_cpu/128 1913 1502 +21.5%
BM_tensor_fft_double_1D_cpu/129 36414 24072 +33.9%
BM_tensor_fft_double_1D_cpu/256 4226 3216 +23.9%
BM_tensor_fft_double_1D_cpu/257 86638 52059 +39.9%
BM_tensor_fft_double_1D_cpu/512 9397 6939 +26.2%
BM_tensor_fft_double_1D_cpu/513 203208 114090 +43.9%
BM_tensor_fft_double_1D_cpu/999 237841 125583 +47.2%
BM_tensor_fft_double_1D_cpu/1ki 20921 15392 +26.4%
BM_tensor_fft_double_1D_cpu/1k 455183 250763 +44.9%
BM_tensor_fft_single_2D_cpu/8 1051 1005 +4.4%
BM_tensor_fft_single_2D_cpu/9 16784 14837 +11.6%
BM_tensor_fft_single_2D_cpu/16 4074 3772 +7.4%
BM_tensor_fft_single_2D_cpu/17 75802 63884 +15.7%
BM_tensor_fft_single_2D_cpu/32 20580 16931 +17.7%
BM_tensor_fft_single_2D_cpu/33 345798 278579 +19.4%
BM_tensor_fft_single_2D_cpu/64 97548 81237 +16.7%
BM_tensor_fft_single_2D_cpu/65 1592701 1227048 +23.0%
BM_tensor_fft_single_2D_cpu/128 472318 384303 +18.6%
BM_tensor_fft_single_2D_cpu/129 7038351 5445308 +22.6%
BM_tensor_fft_single_2D_cpu/256 2309474 1850969 +19.9%
BM_tensor_fft_single_2D_cpu/257 31849182 23797538 +25.3%
BM_tensor_fft_single_2D_cpu/512 10395194 8077499 +22.3%
BM_tensor_fft_single_2D_cpu/513 144053843 104242541 +27.6%
BM_tensor_fft_single_2D_cpu/999 279885833 208389718 +25.5%
BM_tensor_fft_single_2D_cpu/1ki 45967677 36070985 +21.5%
BM_tensor_fft_single_2D_cpu/1k 619727095 456489500 +26.3%
BM_tensor_fft_double_2D_cpu/8 1110 1016 +8.5%
BM_tensor_fft_double_2D_cpu/9 17957 15768 +12.2%
BM_tensor_fft_double_2D_cpu/16 4558 4000 +12.2%
BM_tensor_fft_double_2D_cpu/17 79237 66901 +15.6%
BM_tensor_fft_double_2D_cpu/32 21494 17699 +17.7%
BM_tensor_fft_double_2D_cpu/33 357962 290357 +18.9%
BM_tensor_fft_double_2D_cpu/64 105179 87435 +16.9%
BM_tensor_fft_double_2D_cpu/65 1617143 1288006 +20.4%
BM_tensor_fft_double_2D_cpu/128 512848 419397 +18.2%
BM_tensor_fft_double_2D_cpu/129 7271322 5636884 +22.5%
BM_tensor_fft_double_2D_cpu/256 2415529 1922032 +20.4%
BM_tensor_fft_double_2D_cpu/257 32517952 24462177 +24.8%
BM_tensor_fft_double_2D_cpu/512 10724898 8287617 +22.7%
BM_tensor_fft_double_2D_cpu/513 146007419 108603266 +25.6%
BM_tensor_fft_double_2D_cpu/999 296351330 221885776 +25.1%
BM_tensor_fft_double_2D_cpu/1ki 59334166 48357539 +18.5%
BM_tensor_fft_double_2D_cpu/1k 666660132 483840349 +27.4%
|
2016-02-19 16:29:23 -08:00 |
|
Gael Guennebaud
|
d90a2dac5e
|
merge
|
2016-02-19 23:01:27 +01:00 |
|
Gael Guennebaud
|
485823b5f5
|
Add COD and BDCSVD in list of benched solvers.
|
2016-02-19 23:00:33 +01:00 |
|
Gael Guennebaud
|
2af04f1a57
|
Extend unit test to stress smart_copy with empty input/output.
|
2016-02-19 22:59:28 +01:00 |
|
Gael Guennebaud
|
6fa35bbd28
|
bug #1170: skip calls to memcpy/memmove for empty imput.
|
2016-02-19 22:58:52 +01:00 |
|
Benoit Steiner
|
46fc23f91c
|
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
|
2016-02-19 13:44:22 -08:00 |
|
Gael Guennebaud
|
6f0992c05b
|
Fix nesting type and complete reflection methods of Block expressions.
|
2016-02-19 22:21:02 +01:00 |
|
Gael Guennebaud
|
f3643eec57
|
Add typedefs for the return type of all block methods.
|
2016-02-19 22:15:01 +01:00 |
|
Benoit Steiner
|
670db7988d
|
Updated the contraction code to make it compatible with half floats.
|
2016-02-19 13:03:26 -08:00 |
|
Benoit Steiner
|
180156ba1a
|
Added support for tensor reductions on half floats
|
2016-02-19 10:05:59 -08:00 |
|
Benoit Steiner
|
5c4901b83a
|
Implemented the scalar division of 2 half floats
|
2016-02-19 10:03:19 -08:00 |
|
Benoit Steiner
|
f268db1c4b
|
Added the ability to query the minor version of a cuda device
|
2016-02-19 16:31:04 +00:00 |
|
Benoit Steiner
|
a08d2ff0c9
|
Started to work on contractions and reductions using half floats
|
2016-02-19 15:59:59 +00:00 |
|
Benoit Steiner
|
f3352e0fb0
|
Don't make the array constructors explicit
|
2016-02-19 15:58:57 +00:00 |
|
Benoit Steiner
|
f7cb755299
|
Added support for operators +=, -=, *= and /= on CUDA half floats
|
2016-02-19 15:57:26 +00:00 |
|
Benoit Steiner
|
dc26459b99
|
Implemented protate() for CUDA
|
2016-02-19 15:16:54 +00:00 |
|
Benoit Steiner
|
cd042dbbfd
|
Fixed a bug in the tensor type converter
|
2016-02-19 15:03:26 +00:00 |
|
Benoit Steiner
|
ac5d706a94
|
Added support for simple coefficient wise tensor expression using half floats on CUDA devices
|
2016-02-19 08:19:12 +00:00 |
|
Benoit Steiner
|
0606a0a39b
|
FP16 on CUDA are only available starting with cuda 7.5. Disable them when using an older version of CUDA
|
2016-02-18 23:15:23 -08:00 |
|
Benoit Steiner
|
f36c0c2c65
|
Added regression test for float16
|
2016-02-19 06:23:28 +00:00 |
|
Benoit Steiner
|
7151bd8768
|
Reverted unintended changes introduced by a bad merge
|
2016-02-19 06:20:50 +00:00 |
|
Benoit Steiner
|
1304e1fb5e
|
Pulled latest updates from trunk
|
2016-02-19 06:17:02 +00:00 |
|
Benoit Steiner
|
17b9fbed34
|
Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa
|
2016-02-19 06:16:07 +00:00 |
|
Benoit Steiner
|
8ce46f9d89
|
Improved implementation of ptanh for SSE and AVX
|
2016-02-18 13:24:34 -08:00 |
|
Eugene Brevdo
|
832380c455
|
Merged eigen/eigen into default
|
2016-02-17 14:44:06 -08:00 |
|
Eugene Brevdo
|
06a2bc7c9c
|
Tiny bugfix in SpecialFunctions: some compilers don't like doubles
implicitly downcast to floats in an array constructor.
|
2016-02-17 14:41:59 -08:00 |
|
Gael Guennebaud
|
f6f057bb7d
|
bug #1166: fix shortcomming in gemv when the destination is not a vector at compile-time.
|
2016-02-15 21:43:07 +01:00 |
|
Gael Guennebaud
|
8e1f1ba6a6
|
Import wiki's paragraph: "I disabled vectorization, but I'm still getting annoyed about alignment issues"
|
2016-02-12 22:16:59 +01:00 |
|
Gael Guennebaud
|
c8b4c4b48a
|
bug #795: mention allocate_shared as a condidate for aligned_allocator.
|
2016-02-12 22:09:16 +01:00 |
|
Gael Guennebaud
|
6eff3e5185
|
Fix triangularView versus triangularPart.
|
2016-02-12 17:09:28 +01:00 |
|
Gael Guennebaud
|
4252af6897
|
Remove dead code.
|
2016-02-12 16:13:35 +01:00 |
|
Gael Guennebaud
|
2f5f56a820
|
Fix usage of evaluator in sparse * permutation products.
|
2016-02-12 16:13:16 +01:00 |
|
Gael Guennebaud
|
0a537cb2d8
|
bug #901: fix triangular-view with unit diagonal of sparse rectangular matrices.
|
2016-02-12 15:58:31 +01:00 |
|
Gael Guennebaud
|
b35d1a122e
|
Fix unit test: accessing elements in a deque by offsetting a pointer to another element causes undefined behavior.
|
2016-02-12 15:31:16 +01:00 |
|
Benoit Steiner
|
9e3f3a2d27
|
Deleted outdated comment
|
2016-02-11 17:27:35 -08:00 |
|
Benoit Steiner
|
de345eff2e
|
Added a method to conjugate the content of a tensor or the result of a tensor expression.
|
2016-02-11 16:34:07 -08:00 |
|
Benoit Steiner
|
17e93ba148
|
Pulled latest updates from trunk
|
2016-02-11 15:05:38 -08:00 |
|
Benoit Steiner
|
3628f7655d
|
Made it possible to run the scalar_binary_pow_op functor on GPU
|
2016-02-11 15:05:03 -08:00 |
|
Hauke Heibel
|
eeac46f980
|
bug #774: re-added comment referencing equations in the original paper
|
2016-02-11 19:38:37 +01:00 |
|
Benoit Steiner
|
c569cfe12a
|
Inline the +=, -=, *= and /= operators consistently between DenseBase.h and SelfCwiseBinaryOp.h
|
2016-02-11 09:33:32 -08:00 |
|
Gael Guennebaud
|
8cc9232b9a
|
bug #774: fix a numerical issue producing unwanted reflections.
|
2016-02-11 15:32:56 +01:00 |
|