Benoit Steiner
|
f629fe95c8
|
Made the index type a template parameter to evaluateProductBlockingSizes
Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.
|
2016-04-27 13:11:19 -07:00 |
|
Benoit Steiner
|
66b215b742
|
Merged latest updates from trunk
|
2016-04-27 12:57:48 -07:00 |
|
Benoit Steiner
|
25141b69d4
|
Improved support for min and max on 16-bit floats when running on recent CUDA GPUs
|
2016-04-27 12:57:21 -07:00 |
|
Rasmus Larsen
|
ff33798acd
|
Merged eigen/eigen into default
|
2016-04-27 12:27:00 -07:00 |
|
Rasmus Munk Larsen
|
463738ccbe
|
Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases.
|
2016-04-27 12:26:18 -07:00 |
|
Benoit Steiner
|
6744d776ba
|
Added support for fpclassify in Eigen::Numext
|
2016-04-27 12:10:25 -07:00 |
|
Rasmus Munk Larsen
|
1f48f47ab7
|
Implement stricter argument checking for SYRK and SYR2K and real matrices. To implement the BLAS API they should return info=2 if op='C' is passed for a complex matrix. Without this change, the Eigen BLAS fails the strict zblat3 and cblat3 tests in LAPACK 3.5.
|
2016-04-27 19:59:44 +02:00 |
|
Gael Guennebaud
|
3dddd34133
|
Refactor the unsupported CXX11/Core module to internal headers only.
|
2016-04-26 11:20:25 +02:00 |
|
Benoit Steiner
|
4a164d2c46
|
Fixed the partial evaluation of non vectorizable tensor subexpressions
|
2016-04-25 10:43:03 -07:00 |
|
Benoit Steiner
|
fd9401f260
|
Refined the cost of the striding operation.
|
2016-04-25 09:16:08 -07:00 |
|
Benoit Steiner
|
5c372d19e3
|
Merged in rmlarsen/eigen (pull request PR-179)
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
|
2016-04-21 18:06:36 -07:00 |
|
Benoit Steiner
|
4bbc97be5e
|
Provide access to the base threadpool classes
|
2016-04-21 17:59:33 -07:00 |
|
Rasmus Munk Larsen
|
a3256d78d8
|
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
|
2016-04-21 16:49:28 -07:00 |
|
Benoit Steiner
|
33adce5c3a
|
Added the ability to switch to the new thread pool with a #define
|
2016-04-21 11:59:58 -07:00 |
|
Benoit Steiner
|
79b900375f
|
Use index list for the striding benchmarks
|
2016-04-21 11:58:27 -07:00 |
|
Benoit Steiner
|
f670613e4b
|
Fixed several compilation warnings
|
2016-04-21 11:03:02 -07:00 |
|
Benoit Steiner
|
6015422ee6
|
Added an option to enable the use of the F16C instruction set
|
2016-04-21 10:30:29 -07:00 |
|
Benoit Steiner
|
32ffce04fc
|
Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable.
|
2016-04-21 08:47:28 -07:00 |
|
Benoit Steiner
|
2dde1b1028
|
Don't crash when attempting to reduce empty tensors.
|
2016-04-20 18:08:20 -07:00 |
|
Benoit Steiner
|
a792cd357d
|
Added more tests
|
2016-04-20 17:33:58 -07:00 |
|
Benoit Steiner
|
80200a1828
|
Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instructions.
|
2016-04-20 12:10:27 -07:00 |
|
Benoit Steiner
|
c7c2054bb5
|
Started to implement a portable way to yield.
|
2016-04-19 17:59:58 -07:00 |
|
Benoit Steiner
|
1d0238375d
|
Made sure all the required header files are included when trying to use fp16
|
2016-04-19 17:44:12 -07:00 |
|
Benoit Steiner
|
2b72163028
|
Implemented a more portable version of thread local variables
|
2016-04-19 15:56:02 -07:00 |
|
Benoit Steiner
|
04f954956d
|
Fixed a few typos
|
2016-04-19 15:27:09 -07:00 |
|
Benoit Steiner
|
5b1106c56b
|
Fixed a compilation error with nvcc 7.
|
2016-04-19 14:57:57 -07:00 |
|
Benoit Steiner
|
7129d998db
|
Simplified the code that launches CUDA kernels.
|
2016-04-19 14:55:21 -07:00 |
|
Benoit Steiner
|
b9ea40c30d
|
Don't take the address of a kernel on CUDA devices that don't support this feature.
|
2016-04-19 14:35:11 -07:00 |
|
Benoit Steiner
|
884c075058
|
Use numext::ceil instead of std::ceil
|
2016-04-19 14:33:30 -07:00 |
|
Benoit Steiner
|
a278414d1b
|
Avoid an unnecessary copy of the evaluator.
|
2016-04-19 13:54:28 -07:00 |
|
Benoit Steiner
|
f953c60705
|
Fixed 2 recent regression tests
|
2016-04-19 12:57:39 -07:00 |
|
Benoit Steiner
|
50968a0a3e
|
Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors.
|
2016-04-19 11:53:58 -07:00 |
|
Benoit Steiner
|
84543c8be2
|
Worked around the lack of a rand_r function on Windows systems
|
2016-04-17 19:29:27 -07:00 |
|
Benoit Steiner
|
5fbcfe5eb4
|
Worked around the lack of a rand_r function on Windows systems
|
2016-04-17 18:42:31 -07:00 |
|
Gael Guennebaud
|
e4fe611e2c
|
Enable lazy-coeff-based-product for vector*(1x1) products
|
2016-04-16 15:17:39 +02:00 |
|
Benoit Steiner
|
c8e8f93d6c
|
Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators.
|
2016-04-15 16:48:10 -07:00 |
|
Benoit Steiner
|
1a16fb1532
|
Deleted extraneous comma.
|
2016-04-15 15:50:13 -07:00 |
|
Benoit Steiner
|
7cff898e0a
|
Deleted unnecessary variable
|
2016-04-15 15:46:14 -07:00 |
|
Benoit Steiner
|
6c43c49e4a
|
Fixed a few compilation warnings
|
2016-04-15 15:34:34 -07:00 |
|
Benoit Steiner
|
eb669f989f
|
Merged in rmlarsen/eigen (pull request PR-178)
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
|
2016-04-15 14:53:15 -07:00 |
|
Gael Guennebaud
|
2a7115daca
|
bug #1203: bypass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small
|
2016-04-15 22:34:11 +02:00 |
|
Rasmus Munk Larsen
|
3718bf654b
|
Get rid of void* casting when calling EvalRange::run.
|
2016-04-15 12:51:33 -07:00 |
|
Benoit Steiner
|
40c9923a8a
|
Fixed compilation errors with MSVC
|
2016-04-15 11:27:52 -07:00 |
|
Benoit Steiner
|
1d23430628
|
Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g. on Haswell CPUs).
|
2016-04-15 10:53:31 -07:00 |
|
Gael Guennebaud
|
1e80bddde3
|
Fix trmv for mixing types.
|
2016-04-15 17:58:36 +02:00 |
|
Benoit Steiner
|
a62e924656
|
Added ability to access the cache sizes from the tensor devices
|
2016-04-14 21:25:06 -07:00 |
|
Benoit Steiner
|
18e6f67426
|
Added support for exclusive or
|
2016-04-14 20:37:46 -07:00 |
|
Rasmus Munk Larsen
|
07ac4f7e02
|
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default.
|
2016-04-14 18:28:23 -07:00 |
|
Benoit Steiner
|
9624a1ea3d
|
Added missing definition of PacketSize in the gpu evaluator of convolution
|
2016-04-14 17:16:58 -07:00 |
|
Benoit Steiner
|
6fbedf5a4e
|
Merged in rmlarsen/eigen (pull request PR-177)
Eigen Tensor cost model part 1.
|
2016-04-14 17:13:19 -07:00 |
|