Commit Graph

1999 Commits

Author SHA1 Message Date
Benoit Steiner
cadd124d73 Pulled latest update from trunk 2016-09-02 15:30:02 -07:00
Benoit Steiner
05b0518077 Made the index type an explicit template parameter to help some compilers compile the code. 2016-09-02 15:29:34 -07:00
Benoit Steiner
adf864fec0 Merged in rmlarsen/eigen (pull request PR-222)
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 14:11:20 -07:00
Rasmus Munk Larsen
13e93ca8b7 Fix CUDA build broken by changes to min and max reduction. 2016-09-02 13:41:36 -07:00
Benoit Steiner
6c05c3dd49 Fix the cxx11_tensor_cuda.cu test on 32bit platforms. 2016-09-02 11:12:16 -07:00
Benoit Steiner
039e225f7f Added a test for nullary expressions on CUDA
Also check that we can mix 64 and 32 bit indices in the same compilation unit
2016-09-01 13:28:12 -07:00
Benoit Steiner
c53f783705 Updated the contraction code to support constant inputs. 2016-09-01 11:41:27 -07:00
Gael Guennebaud
46475eff9a Adjust Tensor module wrt recent change in nullary functor 2016-09-01 13:40:45 +02:00
Gael Guennebaud
72a4d49315 Fix compilation with CUDA 8 2016-09-01 13:39:33 +02:00
Rasmus Munk Larsen
a1e092d1e8 Fix bugs to make min- and max reducers work correctly with IEEE infinities. 2016-08-31 15:04:16 -07:00
Gael Guennebaud
1f84f0d33a merge EulerAngles module 2016-08-30 10:01:53 +02:00
Gael Guennebaud
e074f720c7 Include missing forward declaration of SparseMatrix 2016-08-29 18:56:46 +02:00
Gael Guennebaud
6cd7b9ea6b Fix compilation with cuda 8 2016-08-29 11:06:08 +02:00
Gael Guennebaud
35a8e94577 bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. 2016-08-29 10:59:37 +02:00
Gael Guennebaud
0f56b5a6de enable vectorization path when testing half on cuda, and add test for log1p 2016-08-26 14:55:51 +02:00
Gael Guennebaud
965e595f02 Add missing log1p method 2016-08-26 14:55:00 +02:00
Benoit Steiner
7944d4431f Made the cost model cwiseMax and cwiseMin methods const to help the PowerPC cuda compiler compile this code. 2016-08-18 13:46:36 -07:00
Benoit Steiner
647a51b426 Force the inlining of a simple accessor. 2016-08-18 12:31:02 -07:00
Benoit Steiner
a452dedb4f Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
Enable efficient Tensor reduction for doubles on the GPU (continued)
2016-08-18 12:29:54 -07:00
Igor Babuschkin
18c67df31c Fix remaining CUDA >= 300 checks 2016-08-18 17:18:30 +01:00
Igor Babuschkin
1569a7d7ab Add the necessary CUDA >= 300 checks back 2016-08-18 17:15:12 +01:00
Benoit Steiner
2b17f34574 Properly detect the type of the result of a contraction. 2016-08-16 16:00:30 -07:00
Benoit Steiner
34ae80179a Use array_prod instead of calling TotalSize since TotalSize is only available on DSize. 2016-08-15 10:29:14 -07:00
Benoit Steiner
fe73648c98 Fixed a bug in the documentation. 2016-08-12 10:00:43 -07:00
Benoit Steiner
e3a8dfb02f std::erfcf doesn't exist: use numext::erfc instead 2016-08-11 15:24:06 -07:00
Benoit Steiner
64e68cbe87 Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything. 2016-08-08 19:29:59 -07:00
Igor Babuschkin
841e075154 Remove CUDA >= 300 checks and enable outer reduction for doubles 2016-08-06 18:07:50 +01:00
Igor Babuschkin
0425118e2a Merge upstream changes 2016-08-05 14:34:57 +01:00
Igor Babuschkin
9537e8b118 Make use of atomicExch for atomicExchCustom 2016-08-05 14:29:58 +01:00
Benoit Steiner
5eea1c7f97 Fixed cut and paste bug in debug message 2016-08-04 17:34:13 -07:00
Benoit Steiner
b50d8f8c4a Extended a regression test to validate that basic fp16 support works with cuda 7.0 2016-08-03 16:50:13 -07:00
Benoit Steiner
fad9828769 Deleted redundant regression test. 2016-08-03 16:08:37 -07:00
Benoit Steiner
ca2cee2739 Merged in ibab/eigen (pull request PR-206)
Expose real and imag methods on Tensors
2016-08-03 11:53:04 -07:00
Benoit Steiner
d92df04ce8 Cleaned up the new float16 test a bit 2016-08-03 11:50:07 -07:00
Benoit Steiner
81099ef482 Added a test for fp16 2016-08-03 11:41:17 -07:00
Benoit Steiner
a20b58845f CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running. 2016-08-03 10:00:43 -07:00
Benoit Steiner
fd220dd8b0 Use numext::conj instead of std::conj 2016-08-01 18:16:16 -07:00
Benoit Steiner
e256acec7c Avoid unnecessary object copies 2016-08-01 17:03:39 -07:00
Benoit Steiner
2693fd54bf bug #1266: half implementation has been moved to half_impl namespace 2016-07-29 13:45:56 -07:00
Gael Guennebaud
cc2f6d68b1 bug #1264: fix compilation 2016-07-27 23:30:47 +02:00
Gael Guennebaud
8972323c08 bug #1261: add missing max(ADS,ADS) overload (same for min) 2016-07-27 14:52:48 +02:00
Gael Guennebaud
5d94dc85e5 bug #1260: add regression test 2016-07-27 14:38:30 +02:00
Gael Guennebaud
0d7039319c bug #1260: remove doubtful specializations of ScalarBinaryOpTraits 2016-07-27 14:35:52 +02:00
Benoit Steiner
3d3d34e442 Deleted dead code. 2016-07-25 08:53:37 -07:00
Gael Guennebaud
6d5daf32f5 bug #1255: comment out broken and unused line. 2016-07-25 14:48:30 +02:00
Gael Guennebaud
f9598d73b5 bug #1250: fix pow() for AutoDiffScalar with custom nested scalar type. 2016-07-25 14:42:19 +02:00
Gael Guennebaud
fd1117f2be Implement digits10 for mpreal 2016-07-25 14:38:55 +02:00
Gael Guennebaud
9908020d36 Add minimal support for Array<string>, and fix Tensor<string> 2016-07-25 14:25:56 +02:00
Benoit Steiner
c6b0de2c21 Improved partial reductions in more cases 2016-07-22 17:18:20 -07:00
Gael Guennebaud
32d95e86c9 merge 2016-07-22 16:43:12 +02:00