Benoit Steiner | 2dde1b1028 | Don't crash when attempting to reduce empty tensors. | 2016-04-20 18:08:20 -07:00
Benoit Steiner | c7c2054bb5 | Started to implement a portable way to yield. | 2016-04-19 17:59:58 -07:00
Benoit Steiner | 2b72163028 | Implemented a more portable version of thread local variables | 2016-04-19 15:56:02 -07:00
Benoit Steiner | 5b1106c56b | Fixed a compilation error with nvcc 7. | 2016-04-19 14:57:57 -07:00
Benoit Steiner | 7129d998db | Simplified the code that launches cuda kernels. | 2016-04-19 14:55:21 -07:00
Benoit Steiner | b9ea40c30d | Don't take the address of a kernel on CUDA devices that don't support this feature. | 2016-04-19 14:35:11 -07:00
Benoit Steiner | 884c075058 | Use numext::ceil instead of std::ceil | 2016-04-19 14:33:30 -07:00
Benoit Steiner | a278414d1b | Avoid an unnecessary copy of the evaluator. | 2016-04-19 13:54:28 -07:00
Benoit Steiner | 50968a0a3e | Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. | 2016-04-19 11:53:58 -07:00
Benoit Steiner | c8e8f93d6c | Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. | 2016-04-15 16:48:10 -07:00
Benoit Steiner | 7cff898e0a | Deleted unnecessary variable | 2016-04-15 15:46:14 -07:00
Benoit Steiner | 6c43c49e4a | Fixed a few compilation warnings | 2016-04-15 15:34:34 -07:00
Benoit Steiner | eb669f989f | Merged in rmlarsen/eigen (pull request PR-178): Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. | 2016-04-15 14:53:15 -07:00
Rasmus Munk Larsen | 3718bf654b | Get rid of void* casting when calling EvalRange::run. | 2016-04-15 12:51:33 -07:00
Benoit Steiner | a62e924656 | Added ability to access the cache sizes from the tensor devices | 2016-04-14 21:25:06 -07:00
Benoit Steiner | 18e6f67426 | Added support for exclusive or | 2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen | 07ac4f7e02 | Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. | 2016-04-14 18:28:23 -07:00
Benoit Steiner | 9624a1ea3d | Added missing definition of PacketSize in the gpu evaluator of convolution | 2016-04-14 17:16:58 -07:00
Benoit Steiner | 6fbedf5a4e | Merged in rmlarsen/eigen (pull request PR-177): Eigen Tensor cost model part 1. | 2016-04-14 17:13:19 -07:00
Benoit Steiner | 1372156c41 | Prepared the migration to the new non blocking thread pool | 2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen | aeb5494a0b | Improvements to cost model. | 2016-04-14 15:52:58 -07:00
Benoit Steiner | 78a51abc12 | Added a more scalable non blocking thread pool | 2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen | d2e95492e7 | Merge upstream updates. | 2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen | 235e83aba6 | Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. | 2016-04-14 13:57:35 -07:00
Benoit Steiner | 5912ad877c | Silenced a compilation warning | 2016-04-14 11:40:14 -07:00
Benoit Steiner | c7167fee0e | Added support for fp16 to the sigmoid function | 2016-04-14 10:08:33 -07:00
Benoit Steiner | 3b76df64fc | Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on device of capability >= 5.3 | 2016-04-12 10:58:51 -07:00
Benoit Steiner | 7d5b17087f | Added missing EIGEN_DEVICE_FUNC to the tensor conversion code. | 2016-04-07 20:01:19 -07:00
Benoit Steiner | 48308ed801 | Added support for isinf, isnan, and isfinite checks to the tensor api | 2016-04-07 09:48:36 -07:00
Benoit Steiner | 7be1eaad1e | Fixed typos in the implementation of the zeta and polygamma ops. | 2016-04-06 14:15:37 -07:00
Till Hoffmann | 80eba21ad0 | Merge upstream. | 2016-04-01 18:18:49 +01:00
Till Hoffmann | ffd770ce94 | Fixed CUDA signature. | 2016-04-01 17:58:24 +01:00
tillahoffmann | 49960adbdd | Merged eigen/eigen into default | 2016-04-01 14:36:15 +01:00
Till Hoffmann | 57239f4a81 | Added polygamma function. | 2016-04-01 14:35:21 +01:00
Till Hoffmann | dd5d390daf | Added zeta function. | 2016-04-01 13:32:29 +01:00
Benoit Steiner | 3da495e6b9 | Relaxed the condition used to gate the fft code. | 2016-03-31 18:11:51 -07:00
Benoit Steiner | 0f5cc504fe | Properly gate the fft code | 2016-03-31 12:59:39 -07:00
Benoit Steiner | af4ef540bf | Fixed a off-by-one bug in a debug assertion | 2016-03-30 18:37:19 -07:00
Benoit Steiner | 791e5cfb69 | Added NumTraits for type2index. | 2016-03-30 18:36:36 -07:00
Benoit Steiner | 483aaad10a | Fixed compilation warning | 2016-03-30 17:08:13 -07:00
Benoit Steiner | 1b40abbf99 | Added missing assignment operator to the TensorUInt128 class, and made misc small improvements | 2016-03-30 13:17:03 -07:00
Benoit Steiner | aa45ad2aac | Fixed the formatting of the README. | 2016-03-29 15:06:13 -07:00
Benoit Steiner | 56df5ef1d7 | Attempt to fix the formatting of the README | 2016-03-29 15:03:38 -07:00
Benoit Steiner | c38295f0a0 | Added support for fmod | 2016-03-28 15:53:02 -07:00
Benoit Steiner | 6772f653c3 | Made it possible to customize the threadpool | 2016-03-28 10:01:04 -07:00
Benoit Steiner | 1bc81f7889 | Fixed compilation warnings on arm | 2016-03-28 09:21:04 -07:00
Benoit Steiner | 78f83d6f6a | Prevent potential overflow. | 2016-03-28 09:18:04 -07:00
Benoit Steiner | 74f91ed06c | Improved support for integer modulo | 2016-03-25 17:21:56 -07:00
Benoit Steiner | 41434a8a85 | Avoid unnecessary conversions | 2016-03-23 16:52:38 -07:00
Benoit Steiner | 92693b50eb | Fixed compilation warning | 2016-03-23 16:40:36 -07:00