Benoit Steiner
|
667fcc2b53
|
Fixed syntax error
|
2016-03-04 14:37:51 -08:00 |
|
Benoit Steiner
|
4416a5dcff
|
Added missing include
|
2016-03-04 14:35:43 -08:00 |
|
Benoit Steiner
|
c561eeb7bf
|
Don't use implicit type conversions in initializer lists since not all compilers support them.
|
2016-03-04 14:12:45 -08:00 |
|
Benoit Steiner
|
174edf976b
|
Made the contraction test more portable
|
2016-03-04 14:11:13 -08:00 |
|
Benoit Steiner
|
2c50fc878e
|
Fixed a typo
|
2016-03-04 14:09:38 -08:00 |
|
Benoit Steiner
|
deea866bbd
|
Added tests to cover the new rounding, flooring and ceiling tensor operations.
|
2016-03-03 12:38:02 -08:00 |
|
Benoit Steiner
|
5cf4558c0a
|
Added support for rounding, flooring, and ceiling to the tensor api
|
2016-03-03 12:36:55 -08:00 |
|
Benoit Steiner
|
dac58d7c35
|
Added a test to validate the conversion of half floats into floats on Kepler GPUs.
Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5.
|
2016-03-03 10:37:25 -08:00 |
|
Benoit Steiner
|
1032441c6f
|
Enable partial support for half floats on Kepler GPUs.
|
2016-03-03 10:34:20 -08:00 |
|
Benoit Steiner
|
1da10a7358
|
Enable the conversion between floats and half floats on older GPUs that support it.
|
2016-03-03 10:33:20 -08:00 |
|
Benoit Steiner
|
2de8cc9122
|
Merged in ebrevdo/eigen (pull request PR-167)
Add infinity() support to numext::numeric_limits, use it in lgamma.
I tested the code on my gtx-titan-black gpu, and it appears to work as expected.
|
2016-03-03 09:42:12 -08:00 |
|
Eugene Brevdo
|
ab3dc0b0fe
|
Small bugfix to numeric_limits for CUDA.
|
2016-03-02 21:48:46 -08:00 |
|
Eugene Brevdo
|
6afea46838
|
Add infinity() support to numext::numeric_limits, use it in lgamma.
This makes the infinity access a __device__ function, removing
nvcc warnings.
|
2016-03-02 21:35:48 -08:00 |
|
Gael Guennebaud
|
3fccef6f50
|
bug #537: fix compilation with Apple's compiler
|
2016-03-02 13:22:46 +01:00 |
|
Benoit Steiner
|
fedaf19262
|
Pulled latest updates from trunk
|
2016-03-01 06:15:44 -08:00 |
|
Gael Guennebaud
|
dfa80b2060
|
Compilation fix
|
2016-03-01 12:48:56 +01:00 |
|
Gael Guennebaud
|
bee9efc203
|
Compilation fix
|
2016-03-01 12:47:27 +01:00 |
|
Benoit Steiner
|
68ac5c1738
|
Improved the performance of large outer reductions on cuda
|
2016-02-29 18:11:58 -08:00 |
|
Benoit Steiner
|
56a3ada670
|
Added benchmarks for full reduction
|
2016-02-29 14:57:52 -08:00 |
|
Benoit Steiner
|
b2075cb7a2
|
Made the signature of the inner and outer reducers consistent
|
2016-02-29 10:53:38 -08:00 |
|
Benoit Steiner
|
3284842045
|
Optimized the performance of narrow reductions on CUDA devices
|
2016-02-29 10:48:16 -08:00 |
|
Gael Guennebaud
|
e9bea614ec
|
Fix shortcoming in fixed-value deduction of startRow/startCol
|
2016-02-29 10:31:27 +01:00 |
|
Benoit Steiner
|
609b3337a7
|
Print some information to stderr when a CUDA kernel fails
|
2016-02-27 20:42:57 +00:00 |
|
Benoit Steiner
|
1031b31571
|
Improved the README
|
2016-02-27 20:22:04 +00:00 |
|
Gael Guennebaud
|
8e6faab51e
|
bug #1172: make valuePtr and innerIndexPtr properly return null for empty matrices.
|
2016-02-27 14:55:40 +01:00 |
|
Benoit Steiner
|
ac2e6e0d03
|
Properly vectorized the random number generators
|
2016-02-26 13:52:24 -08:00 |
|
Benoit Steiner
|
caa54d888f
|
Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag
|
2016-02-26 12:38:18 -08:00 |
|
Benoit Steiner
|
93485d86bc
|
Added benchmarks for type casting of float16
|
2016-02-26 12:24:58 -08:00 |
|
Benoit Steiner
|
002824e32d
|
Added benchmarks for fp16
|
2016-02-26 12:21:25 -08:00 |
|
Benoit Steiner
|
2cd32cad27
|
Reverted previous commit since it caused more problems than it solved
|
2016-02-26 13:21:44 +00:00 |
|
Benoit Steiner
|
d9d05dd96e
|
Fixed handling of long doubles on aarch64
|
2016-02-26 04:13:58 -08:00 |
|
Benoit Steiner
|
af199b4658
|
Made the CUDA architecture level a build setting.
|
2016-02-25 09:06:18 -08:00 |
|
Benoit Steiner
|
c36c09169e
|
Fixed a typo in the reduction code that could prevent large full reductions from running properly on old cuda devices.
|
2016-02-24 17:07:25 -08:00 |
|
Benoit Steiner
|
7a01cb8e4b
|
Marked the And and Or reducers as stateless.
|
2016-02-24 16:43:01 -08:00 |
|
Gael Guennebaud
|
91e1375ba9
|
merge
|
2016-02-23 11:09:05 +01:00 |
|
Gael Guennebaud
|
055000a424
|
Fix startRow()/startCol() for dense Block with direct access:
the initial implementation failed for empty rows/columns, for which the start indices are ambiguous.
|
2016-02-23 11:07:59 +01:00 |
|
Benoit Steiner
|
1d9256f7db
|
Updated the padding code to work with half floats
|
2016-02-23 05:51:22 +00:00 |
|
Benoit Steiner
|
8cb9bfab87
|
Extended the tensor benchmark suite to support types other than floats
|
2016-02-23 05:28:02 +00:00 |
|
Benoit Steiner
|
f442a5a5b3
|
Updated the tensor benchmarking code to work with compilers that don't support cxx11.
|
2016-02-23 04:15:48 +00:00 |
|
Benoit Steiner
|
72d2cf642e
|
Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode.
|
2016-02-22 15:29:41 -08:00 |
|
Benoit Steiner
|
6270d851e3
|
Declare the half float type as arithmetic.
|
2016-02-22 13:59:33 -08:00 |
|
Benoit Steiner
|
5cd00068c0
|
include <iostream> in the tensor header since we now use it to better report cuda initialization errors
|
2016-02-22 13:59:03 -08:00 |
|
Benoit Steiner
|
257b640463
|
Fixed compilation warning generated by clang
|
2016-02-21 22:43:37 -08:00 |
|
Benoit Steiner
|
584832cb3c
|
Implemented the ptranspose function on half floats
|
2016-02-21 12:44:53 -08:00 |
|
Benoit Steiner
|
e644f60907
|
Pulled latest updates from trunk
|
2016-02-21 20:24:59 +00:00 |
|
Benoit Steiner
|
95fceb6452
|
Added the ability to compute the absolute value of a half float
|
2016-02-21 20:24:11 +00:00 |
|
Benoit Steiner
|
ed69cbeef0
|
Added some debugging information to the test to figure out why it fails sometimes
|
2016-02-21 11:20:20 -08:00 |
|
Benoit Steiner
|
96a24b05cc
|
Optimized casting of tensors in the case where the casting happens to be a no-op
|
2016-02-21 11:16:15 -08:00 |
|
Benoit Steiner
|
203490017f
|
Prevent unnecessary Index to int conversions
|
2016-02-21 08:49:36 -08:00 |
|
Benoit Steiner
|
9ff269a1d3
|
Moved some of the fp16 operators outside the Eigen namespace to work around some nvcc limitations.
|
2016-02-20 07:47:23 +00:00 |
|