Commit Graph

2154 Commits

Author SHA1 Message Date
Mehdi Goli
e36cb91c99 Fixing the code indentation in the TensorReduction.h file. 2016-10-14 18:03:00 +01:00
Tal Hadad
078a202621 Merge Hongkai Dai correct range calculation, and remove ranges from API.
Docs updated.
2016-10-14 16:03:28 +03:00
Luke Iwanski
e742da8b28 Merged ComputeCpp into default. 2016-10-14 13:36:51 +01:00
Mehdi Goli
524fa4c46f Reducing the code by generalising sycl backend functions/structs. 2016-10-14 12:09:55 +01:00
Hongkai Dai
014d9f1d9b implement euler angles with the right ranges 2016-10-13 14:45:51 -07:00
Benoit Steiner
d0ee2267d6 Relaxed the resizing checks so that they don't fail with gcc >= 5.3 2016-10-13 10:59:46 -07:00
Benoit Steiner
7e4a6754b2 Merged eigen/eigen into default 2016-10-12 22:42:33 -07:00
Gael Guennebaud
091d373ee9 Fix outer-stride. 2016-10-12 21:47:52 +02:00
Benoit Steiner
7f0599b6eb Manually define int16_t and uint16_t when compiling with Visual Studio 2016-10-08 22:56:32 -07:00
Benoit Steiner
5266ff8966 Cleaned up a regression test 2016-10-08 19:12:44 +00:00
Benoit Steiner
5c68051cd7 Merge the content of the ComputeCpp branch into the default branch 2016-10-07 11:04:16 -07:00
RJ Ryan
bfc264abe8 Add a test that GPU complex product reductions match CPU reductions. 2016-10-06 11:10:14 -07:00
RJ Ryan
e2e9cdd169 Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*. 2016-10-06 10:49:48 -07:00
Benoit Steiner
d7f9679a34 Fixed a couple of compilation warnings 2016-10-05 15:00:32 -07:00
Benoit Steiner
ae1385c7e4 Pull the latest updates from trunk 2016-10-05 14:54:36 -07:00
Benoit Steiner
73b0012945 Fixed compilation warnings 2016-10-05 14:24:24 -07:00
Benoit Steiner
c84084c0c0 Fixed compilation warning 2016-10-05 14:15:41 -07:00
Benoit Steiner
4387433acf Increased the robustness of the reduction tests on fp16 2016-10-05 10:42:41 -07:00
Benoit Steiner
aad20d700d Increase the tolerance to numerical noise. 2016-10-05 10:39:24 -07:00
Benoit Steiner
8b69d5d730 ::rand() returns a signed integer on win32 2016-10-05 08:55:02 -07:00
Benoit Steiner
ed7a220b04 Fixed a typo that impacts windows builds 2016-10-05 08:51:31 -07:00
Benoit Steiner
ceee1c008b Silenced compilation warning 2016-10-04 18:47:53 -07:00
Benoit Steiner
6af5ac7e27 Cleanup the cuda executor code. 2016-10-04 08:52:13 -07:00
Benoit Steiner
2f6d1607c8 Cleaned up the random number generation code. 2016-10-04 08:38:23 -07:00
Benoit Steiner
616a7a1912 Improved support for compiling CUDA code with clang as the host compiler 2016-10-03 17:09:33 -07:00
Benoit Steiner
422530946f Renamed the SYCL tests to follow the standard naming convention. 2016-09-30 08:22:10 -07:00
Benoit Steiner
2bda1b0d93 Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus. 2016-09-28 17:08:41 -07:00
Mehdi Goli
dd602e62c8 Converting alias template to nested struct in order to be compatible with CXX-03 2016-09-27 16:21:19 +01:00
Benoit Steiner
6565f8d60f Made the initialization of a CUDA device thread safe. 2016-09-26 11:00:32 -07:00
Benoit Steiner
f6ac51a054 Made TensorEvalTo compatible with c++0x again. 2016-09-23 16:45:17 -07:00
Benoit Steiner
00d4e65f00 Deleted unused TensorMap data member 2016-09-23 16:44:45 -07:00
Benoit Steiner
1301d744f8 Made the gaussian generator usable on GPU 2016-09-22 19:04:44 -07:00
RJ Ryan
608b1acd6d Don't use c++11 features and fix include. 2016-09-20 07:49:05 -07:00
RJ Ryan
b2c6dc48d9 Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op. 2016-09-20 07:18:20 -07:00
Gael Guennebaud
3ada6e4bed Merged hongkai-dai/eigen/tip into default (bug #1298) 2016-09-19 22:08:06 +02:00
Benoit Steiner
c3ca9b1e76 Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC 2016-09-19 11:33:39 -07:00
Hongkai Dai
5dcc6d301a remove ternary operator in euler angles 2016-09-19 10:30:30 -07:00
Luke Iwanski
c771df6bc3 Updated the owners of the file. 2016-09-19 14:09:25 +01:00
Luke Iwanski
b91e021172 Merged with default. 2016-09-19 14:03:54 +01:00
Luke Iwanski
cb81975714 Partial OpenCL support via SYCL compatible with ComputeCpp CE. 2016-09-19 12:44:13 +01:00
Emil Fresk
6edd2e2851 Made AutoDiffJacobian more intuitive to use and updated for C++11
Changes:
* Removed unnecessary types from the Functor by inferring from its types
* Removed inputs() function reference, replaced with .rows()
* Updated the forward constructor to use variadic templates
* Added optional parameters to the Functor for passing parameters,
  control signals, etc
* Has been tested with fixed size and dynamic matrices

Amendment by chtz: overload operator() for compatibility with not fully conforming compilers
2016-09-16 14:03:55 +02:00
Gael Guennebaud
18f6e47815 Fix order of "static inline". 2016-09-16 11:32:54 +02:00
Benoit Steiner
488ad7dd1b Added missing EIGEN_DEVICE_FUNC qualifiers 2016-09-14 13:35:00 -07:00
Benoit Steiner
e4d4d15588 Register the cxx11_tensor_device only for recent cuda architectures (i.e. >= 3.0) since the test instantiates contractions that require a modern gpu. 2016-09-12 19:01:52 -07:00
Benoit Steiner
4dfd888c92 CUDA contractions require arch >= 3.0: don't compile the cuda contraction tests on older architectures. 2016-09-12 18:49:01 -07:00
Benoit Steiner
028e299577 Fixed a bug impacting some outer reductions on GPU 2016-09-12 18:36:52 -07:00
Benoit Steiner
5f50f12d2c Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem. 2016-09-12 13:46:13 -07:00
Benoit Steiner
8321dcce76 Merged latest updates from trunk 2016-09-12 10:33:05 -07:00
Benoit Steiner
eb6ba00cc8 Properly size the list of waiters 2016-09-12 10:31:55 -07:00
Benoit Steiner
a618094b62 Added a resize method to MaxSizeVector 2016-09-12 10:30:53 -07:00
Gael Guennebaud
471eac5399 bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) 2016-09-08 08:36:27 +02:00
Gael Guennebaud
e1642f485c bug #1288: fix memory leak in arpack wrapper. 2016-09-05 18:01:30 +02:00
Gael Guennebaud
dabc81751f Fix compilation when cuda_fp16.h does not exist. 2016-09-05 17:14:20 +02:00
Benoit Steiner
87a8a1975e Fixed a regression test 2016-09-02 19:29:33 -07:00
Benoit Steiner
13df3441ae Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instructions to initialize it. This can result in random crashes when compiling with AVX instructions enabled. 2016-09-02 19:25:47 -07:00
Benoit Steiner
cadd124d73 Pulled latest update from trunk 2016-09-02 15:30:02 -07:00
Benoit Steiner
05b0518077 Made the index type an explicit template parameter to help some compilers compile the code. 2016-09-02 15:29:34 -07:00
Benoit Steiner
adf864fec0 Merged in rmlarsen/eigen (pull request PR-222)
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 14:11:20 -07:00
Rasmus Munk Larsen
13e93ca8b7 Fix CUDA build broken by changes to min and max reduction. 2016-09-02 13:41:36 -07:00
Benoit Steiner
6c05c3dd49 Fix the cxx11_tensor_cuda.cu test on 32bit platforms. 2016-09-02 11:12:16 -07:00
Benoit Steiner
039e225f7f Added a test for nullary expressions on CUDA
Also check that we can mix 64 and 32 bit indices in the same compilation unit
2016-09-01 13:28:12 -07:00
Benoit Steiner
c53f783705 Updated the contraction code to support constant inputs. 2016-09-01 11:41:27 -07:00
Gael Guennebaud
46475eff9a Adjust Tensor module wrt recent change in nullary functor 2016-09-01 13:40:45 +02:00
Gael Guennebaud
72a4d49315 Fix compilation with CUDA 8 2016-09-01 13:39:33 +02:00
Rasmus Munk Larsen
a1e092d1e8 Fix bugs to make min- and max reducers work correctly with IEEE infinities. 2016-08-31 15:04:16 -07:00
Gael Guennebaud
1f84f0d33a merge EulerAngles module 2016-08-30 10:01:53 +02:00
Gael Guennebaud
e074f720c7 Include missing forward declaration of SparseMatrix 2016-08-29 18:56:46 +02:00
Gael Guennebaud
6cd7b9ea6b Fix compilation with cuda 8 2016-08-29 11:06:08 +02:00
Gael Guennebaud
35a8e94577 bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. 2016-08-29 10:59:37 +02:00
Gael Guennebaud
0f56b5a6de enable vectorization path when testing half on cuda, and add test for log1p 2016-08-26 14:55:51 +02:00
Gael Guennebaud
965e595f02 Add missing log1p method 2016-08-26 14:55:00 +02:00
Benoit Steiner
34ae80179a Use array_prod instead of calling TotalSize since TotalSize is only available on DSize. 2016-08-15 10:29:14 -07:00
Benoit Steiner
fe73648c98 Fixed a bug in the documentation. 2016-08-12 10:00:43 -07:00
Benoit Steiner
e3a8dfb02f std::erfcf doesn't exist: use numext::erfc instead 2016-08-11 15:24:06 -07:00
Benoit Steiner
64e68cbe87 Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything. 2016-08-08 19:29:59 -07:00
Benoit Steiner
5eea1c7f97 Fixed cut and paste bug in debug message 2016-08-04 17:34:13 -07:00
Benoit Steiner
b50d8f8c4a Extended a regression test to validate that basic fp16 support works with cuda 7.0 2016-08-03 16:50:13 -07:00
Benoit Steiner
fad9828769 Deleted redundant regression test. 2016-08-03 16:08:37 -07:00
Benoit Steiner
ca2cee2739 Merged in ibab/eigen (pull request PR-206)
Expose real and imag methods on Tensors
2016-08-03 11:53:04 -07:00
Benoit Steiner
d92df04ce8 Cleaned up the new float16 test a bit 2016-08-03 11:50:07 -07:00
Benoit Steiner
81099ef482 Added a test for fp16 2016-08-03 11:41:17 -07:00
Benoit Steiner
a20b58845f CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running. 2016-08-03 10:00:43 -07:00
Benoit Steiner
fd220dd8b0 Use numext::conj instead of std::conj 2016-08-01 18:16:16 -07:00
Benoit Steiner
e256acec7c Avoid unnecessary object copies 2016-08-01 17:03:39 -07:00
Benoit Steiner
2693fd54bf bug #1266: half implementation has been moved to half_impl namespace 2016-07-29 13:45:56 -07:00
Gael Guennebaud
cc2f6d68b1 bug #1264: fix compilation 2016-07-27 23:30:47 +02:00
Gael Guennebaud
8972323c08 bug #1261: add missing max(ADS,ADS) overload (same for min) 2016-07-27 14:52:48 +02:00
Gael Guennebaud
5d94dc85e5 bug #1260: add regression test 2016-07-27 14:38:30 +02:00
Gael Guennebaud
0d7039319c bug #1260: remove doubtful specializations of ScalarBinaryOpTraits 2016-07-27 14:35:52 +02:00
Benoit Steiner
3d3d34e442 Deleted dead code. 2016-07-25 08:53:37 -07:00
Gael Guennebaud
6d5daf32f5 bug #1255: comment out broken and unused line. 2016-07-25 14:48:30 +02:00
Gael Guennebaud
f9598d73b5 bug #1250: fix pow() for AutoDiffScalar with custom nested scalar type. 2016-07-25 14:42:19 +02:00
Gael Guennebaud
fd1117f2be Implement digits10 for mpreal 2016-07-25 14:38:55 +02:00
Gael Guennebaud
9908020d36 Add minimal support for Array<string>, and fix Tensor<string> 2016-07-25 14:25:56 +02:00
Benoit Steiner
c6b0de2c21 Improved partial reductions in more cases 2016-07-22 17:18:20 -07:00
Gael Guennebaud
32d95e86c9 merge 2016-07-22 16:43:12 +02:00
Gael Guennebaud
d7a0e52478 Fix testing of log nearby 1 2016-07-22 15:44:26 +02:00
Gael Guennebaud
7acf23c14c Truly split unit test. 2016-07-22 15:41:23 +02:00
Gael Guennebaud
d075d122ea Move half unit test from unsupported to main tests 2016-07-22 14:34:19 +02:00
Gael Guennebaud
0f350a8b7e Fix CUDA compilation 2016-07-21 18:47:07 +02:00
Gael Guennebaud
82798162c0 Extend unit testing of half with ADL and arrays. 2016-07-21 15:47:21 +02:00
Yi Lin
7b4abc2b1d Fixed a code comment error 2016-07-20 22:28:54 +08:00
Benoit Steiner
20f7ef2f89 An evalTo expression is only aligned iff both the lhs and the rhs are aligned. 2016-07-12 10:56:42 -07:00
Gael Guennebaud
c98bac2966 Manually add -std=c++11 to nvcc for old cmake versions 2016-07-12 09:29:18 +02:00
Benoit Steiner
40eb97516c reverted unintended change. 2016-07-11 14:28:03 -07:00
Benoit Steiner
03b71c273e Made the packetmath test compile again. A better fix would be to move the special function tests to the unsupported directory where the code now resides. 2016-07-11 13:50:24 -07:00
Benoit Steiner
3a2dd352ae Improved the contraction mapper to properly support tensor products 2016-07-11 13:43:41 -07:00
Benoit Steiner
0bc020be9d Improved the detection of packet size in the tensor scan evaluator. 2016-07-11 12:14:56 -07:00
Gael Guennebaud
a96a7ce3f7 Move CUDA's special functions to SpecialFunctions module. 2016-07-11 18:39:11 +02:00
Gael Guennebaud
fd60966310 merge 2016-07-11 18:11:47 +02:00
Gael Guennebaud
7d636349dc Fix configuration of CUDA:
- preserve user defined CUDA_NVCC_FLAGS
- remove the -ansi flag that conflicts with -std=c++11
- do not add -std=c++11 if already there
2016-07-11 18:09:04 +02:00
Gael Guennebaud
131ee4bb8e Split test_slice_in_expr which seems to be huge for visual 2016-07-11 11:46:55 +02:00
Gael Guennebaud
194daa3048 Fix assertion (it did not make sense for static_val types) 2016-07-11 11:39:27 +02:00
Gael Guennebaud
18c35747ce Emulate _BitScanReverse64 for 32 bits builds 2016-07-11 11:38:04 +02:00
Gael Guennebaud
599f8ba617 Change runtime to compile-time conditional. 2016-07-08 11:39:43 +02:00
Gael Guennebaud
544935101a Fix warnings 2016-07-08 11:38:52 +02:00
Gael Guennebaud
59bf2774a3 Fix warnings 2016-07-08 11:38:11 +02:00
Gael Guennebaud
2f7e2614e7 bug #1232: refactor special functions as a new SpecialFunctions module, currently in unsupported/. 2016-07-08 11:13:55 +02:00
Gael Guennebaud
8b7431d8fd fix compilation with c++11 2016-07-07 15:18:23 +02:00
Gael Guennebaud
69378eed0b Split huge unit test 2016-07-07 15:18:04 +02:00
Gael Guennebaud
179ebb88f9 Fix warning 2016-07-07 09:16:40 +02:00
Gael Guennebaud
5d2dada197 Fix warnings 2016-07-07 09:05:15 +02:00
Gael Guennebaud
f5e780fb05 split huge unit test 2016-07-07 08:59:59 +02:00
Gael Guennebaud
ce9fc0ce14 fix clang compilation 2016-07-04 12:59:02 +02:00
Gael Guennebaud
440020474c Workaround compilation issue with msvc 2016-07-04 12:49:19 +02:00
Igor Babuschkin
78f37ca03c Expose real and imag methods on Tensors 2016-07-01 17:34:31 +01:00
Benoit Steiner
cb2d8b8fa6 Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu. 2016-06-29 15:42:01 -07:00
Benoit Steiner
b2a47641ce Made the code compile when using CUDA architecture < 300 2016-06-29 15:32:47 -07:00
Igor Babuschkin
85699850d9 Add missing CUDA kernel to tensor scan op
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
2016-06-29 11:54:35 +01:00
Benoit Steiner
1a9f92e781 Added a test to validate the tensor scan evaluation on GPU. The test is currently disabled since the code segfaults. 2016-06-27 16:02:52 -07:00
Benoit Steiner
75c333f94c Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor.
Also avoid taking references to values that may become stale after a copy construction.
2016-06-27 10:32:38 -07:00
Benoit Steiner
7944d4431f Made the cost model cwiseMax and cwiseMin methods const to help the PowerPC cuda compiler compile this code. 2016-08-18 13:46:36 -07:00
Benoit Steiner
647a51b426 Force the inlining of a simple accessor. 2016-08-18 12:31:02 -07:00
Benoit Steiner
a452dedb4f Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
Enable efficient Tensor reduction for doubles on the GPU (continued)
2016-08-18 12:29:54 -07:00
Igor Babuschkin
18c67df31c Fix remaining CUDA >= 300 checks 2016-08-18 17:18:30 +01:00
Igor Babuschkin
1569a7d7ab Add the necessary CUDA >= 300 checks back 2016-08-18 17:15:12 +01:00
Benoit Steiner
2b17f34574 Properly detect the type of the result of a contraction. 2016-08-16 16:00:30 -07:00
Igor Babuschkin
841e075154 Remove CUDA >= 300 checks and enable outer reduction for doubles 2016-08-06 18:07:50 +01:00
Igor Babuschkin
0425118e2a Merge upstream changes 2016-08-05 14:34:57 +01:00
Igor Babuschkin
9537e8b118 Make use of atomicExch for atomicExchCustom 2016-08-05 14:29:58 +01:00
Igor Babuschkin
eeb0d880ee Enable efficient Tensor reduction for doubles 2016-07-01 19:08:26 +01:00
Gael Guennebaud
cfff370549 Fix hyperbolic functions for autodiff. 2016-06-24 23:21:35 +02:00
Gael Guennebaud
3852351793 merge pull request 198 2016-06-24 11:48:17 +02:00
Gael Guennebaud
6dd9077070 Fix some unused typedef warnings. 2016-06-24 11:34:21 +02:00
Gael Guennebaud
ce90647fa5 Fix NumTraits<AutoDiff> 2016-06-24 11:34:02 +02:00
Gael Guennebaud
fa39f81b48 Fix instantiation of ScalarBinaryOpTraits for AutoDiff. 2016-06-24 11:33:30 +02:00
Rasmus Munk Larsen
a9c1e4d7b7 Return -1 from CurrentThreadId when called by thread outside the pool. 2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2 Resolve merge. 2016-06-23 15:08:03 -07:00
Gael Guennebaud
361dbd246d Add unit test for printing empty tensors 2016-06-23 18:54:30 +02:00
Gael Guennebaud
360a743a10 bug #1241: does not emit anything for empty tensors 2016-06-23 18:47:31 +02:00