Benoit Steiner
|
e4d4d15588
|
Register the cxx11_tensor_device test only for recent cuda architectures (i.e. >= 3.0) since the test instantiates contractions that require a modern gpu.
|
2016-09-12 19:01:52 -07:00 |
|
Benoit Steiner
|
4dfd888c92
|
CUDA contractions require arch >= 3.0: don't compile the cuda contraction tests on older architectures.
|
2016-09-12 18:49:01 -07:00 |
|
Benoit Steiner
|
028e299577
|
Fixed a bug impacting some outer reductions on GPU
|
2016-09-12 18:36:52 -07:00 |
|
Benoit Steiner
|
5f50f12d2c
|
Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem.
|
2016-09-12 13:46:13 -07:00 |
|
Benoit Steiner
|
8321dcce76
|
Merged latest updates from trunk
|
2016-09-12 10:33:05 -07:00 |
|
Benoit Steiner
|
eb6ba00cc8
|
Properly size the list of waiters
|
2016-09-12 10:31:55 -07:00 |
|
Benoit Steiner
|
a618094b62
|
Added a resize method to MaxSizeVector
|
2016-09-12 10:30:53 -07:00 |
|
Gael Guennebaud
|
471eac5399
|
bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)
|
2016-09-08 08:36:27 +02:00 |
|
Gael Guennebaud
|
e1642f485c
|
bug #1288: fix memory leak in arpack wrapper.
|
2016-09-05 18:01:30 +02:00 |
|
Gael Guennebaud
|
dabc81751f
|
Fix compilation when cuda_fp16.h does not exist.
|
2016-09-05 17:14:20 +02:00 |
|
Benoit Steiner
|
87a8a1975e
|
Fixed a regression test
|
2016-09-02 19:29:33 -07:00 |
|
Benoit Steiner
|
13df3441ae
|
Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instructions to initialize it. This can result in random crashes when compiling with AVX instructions enabled.
|
2016-09-02 19:25:47 -07:00 |
|
Benoit Steiner
|
cadd124d73
|
Pulled latest update from trunk
|
2016-09-02 15:30:02 -07:00 |
|
Benoit Steiner
|
05b0518077
|
Made the index type an explicit template parameter to help some compilers compile the code.
|
2016-09-02 15:29:34 -07:00 |
|
Benoit Steiner
|
adf864fec0
|
Merged in rmlarsen/eigen (pull request PR-222)
Fix CUDA build broken by changes to min and max reduction.
|
2016-09-02 14:11:20 -07:00 |
|
Rasmus Munk Larsen
|
13e93ca8b7
|
Fix CUDA build broken by changes to min and max reduction.
|
2016-09-02 13:41:36 -07:00 |
|
Benoit Steiner
|
6c05c3dd49
|
Fix the cxx11_tensor_cuda.cu test on 32bit platforms.
|
2016-09-02 11:12:16 -07:00 |
|
Benoit Steiner
|
039e225f7f
|
Added a test for nullary expressions on CUDA
Also check that we can mix 64 and 32 bit indices in the same compilation unit
|
2016-09-01 13:28:12 -07:00 |
|
Benoit Steiner
|
c53f783705
|
Updated the contraction code to support constant inputs.
|
2016-09-01 11:41:27 -07:00 |
|
Gael Guennebaud
|
46475eff9a
|
Adjust Tensor module wrt recent change in nullary functor
|
2016-09-01 13:40:45 +02:00 |
|
Gael Guennebaud
|
72a4d49315
|
Fix compilation with CUDA 8
|
2016-09-01 13:39:33 +02:00 |
|
Rasmus Munk Larsen
|
a1e092d1e8
|
Fix bugs to make min and max reducers work correctly with IEEE infinities.
|
2016-08-31 15:04:16 -07:00 |
|
Gael Guennebaud
|
1f84f0d33a
|
merge EulerAngles module
|
2016-08-30 10:01:53 +02:00 |
|
Gael Guennebaud
|
e074f720c7
|
Include missing forward declaration of SparseMatrix
|
2016-08-29 18:56:46 +02:00 |
|
Gael Guennebaud
|
6cd7b9ea6b
|
Fix compilation with cuda 8
|
2016-08-29 11:06:08 +02:00 |
|
Gael Guennebaud
|
35a8e94577
|
bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command.
|
2016-08-29 10:59:37 +02:00 |
|
Gael Guennebaud
|
0f56b5a6de
|
enable vectorization path when testing half on cuda, and add test for log1p
|
2016-08-26 14:55:51 +02:00 |
|
Gael Guennebaud
|
965e595f02
|
Add missing log1p method
|
2016-08-26 14:55:00 +02:00 |
|
Benoit Steiner
|
7944d4431f
|
Made the cost model cwiseMax and cwiseMin methods const to help the PowerPC cuda compiler compile this code.
|
2016-08-18 13:46:36 -07:00 |
|
Benoit Steiner
|
647a51b426
|
Force the inlining of a simple accessor.
|
2016-08-18 12:31:02 -07:00 |
|
Benoit Steiner
|
a452dedb4f
|
Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
Enable efficient Tensor reduction for doubles on the GPU (continued)
|
2016-08-18 12:29:54 -07:00 |
|
Igor Babuschkin
|
18c67df31c
|
Fix remaining CUDA >= 300 checks
|
2016-08-18 17:18:30 +01:00 |
|
Igor Babuschkin
|
1569a7d7ab
|
Add the necessary CUDA >= 300 checks back
|
2016-08-18 17:15:12 +01:00 |
|
Benoit Steiner
|
2b17f34574
|
Properly detect the type of the result of a contraction.
|
2016-08-16 16:00:30 -07:00 |
|
Benoit Steiner
|
34ae80179a
|
Use array_prod instead of calling TotalSize since TotalSize is only available on DSize.
|
2016-08-15 10:29:14 -07:00 |
|
Benoit Steiner
|
fe73648c98
|
Fixed a bug in the documentation.
|
2016-08-12 10:00:43 -07:00 |
|
Benoit Steiner
|
e3a8dfb02f
|
std::erfcf doesn't exist: use numext::erfc instead
|
2016-08-11 15:24:06 -07:00 |
|
Benoit Steiner
|
64e68cbe87
|
Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything.
|
2016-08-08 19:29:59 -07:00 |
|
Igor Babuschkin
|
841e075154
|
Remove CUDA >= 300 checks and enable outer reduction for doubles
|
2016-08-06 18:07:50 +01:00 |
|
Igor Babuschkin
|
0425118e2a
|
Merge upstream changes
|
2016-08-05 14:34:57 +01:00 |
|
Igor Babuschkin
|
9537e8b118
|
Make use of atomicExch for atomicExchCustom
|
2016-08-05 14:29:58 +01:00 |
|
Benoit Steiner
|
5eea1c7f97
|
Fixed cut-and-paste bug in debug message
|
2016-08-04 17:34:13 -07:00 |
|
Benoit Steiner
|
b50d8f8c4a
|
Extended a regression test to validate that basic fp16 support works with cuda 7.0
|
2016-08-03 16:50:13 -07:00 |
|
Benoit Steiner
|
fad9828769
|
Deleted redundant regression test.
|
2016-08-03 16:08:37 -07:00 |
|
Benoit Steiner
|
ca2cee2739
|
Merged in ibab/eigen (pull request PR-206)
Expose real and imag methods on Tensors
|
2016-08-03 11:53:04 -07:00 |
|
Benoit Steiner
|
d92df04ce8
|
Cleaned up the new float16 test a bit
|
2016-08-03 11:50:07 -07:00 |
|
Benoit Steiner
|
81099ef482
|
Added a test for fp16
|
2016-08-03 11:41:17 -07:00 |
|
Benoit Steiner
|
a20b58845f
|
CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running.
|
2016-08-03 10:00:43 -07:00 |
|
Benoit Steiner
|
fd220dd8b0
|
Use numext::conj instead of std::conj
|
2016-08-01 18:16:16 -07:00 |
|
Benoit Steiner
|
e256acec7c
|
Avoid unnecessary object copies
|
2016-08-01 17:03:39 -07:00 |
|