Benoit Steiner
4deafd35b7
Introduce a portable EIGEN_SLEEP macro.
2016-12-09 14:52:15 -08:00
Benoit Steiner
2f5b7a199b
Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.
2016-12-09 13:05:14 -08:00
Benoit Steiner
3d59a47720
Added a message to ease the detection of platforms on which thread cancellation isn't supported.
2016-12-08 14:51:46 -08:00
Benoit Steiner
7bfff85355
Added support for thread cancellation on Linux
2016-12-08 08:12:49 -08:00
Srinivas Vasudevan
218764ee1f
Added support for expm1 in Eigen.
2016-12-02 14:13:01 -08:00
Mehdi Goli
79aa2b784e
Adding sycl backend for TensorPadding.h; disbaling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP ; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
2016-12-01 13:02:27 +00:00
Benoit Steiner
fd1dc3363e
Merged eigen/eigen into default
2016-11-30 20:16:17 -08:00
Mehdi Goli
577ce78085
Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend.
2016-11-29 15:30:42 +00:00
Benoit Steiner
3011dc94ef
Call internal::array_prod to compute the total size of the tensor.
2016-11-28 09:00:31 -08:00
Benoit Steiner
9f8fbd9434
Merged eigen/eigen into default
2016-11-26 11:28:25 -08:00
Mehdi Goli
7318daf887
Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.
2016-11-25 16:19:07 +00:00
Mehdi Goli
b8cc5635d5
Removing unsupported device from test case; cleaning the tensor device sycl.
2016-11-23 16:30:41 +00:00
Gael Guennebaud
7f6333c32b
Merged in tal500/eigen-eulerangles (pull request PR-237)
...
Euler angles
2016-11-23 15:17:38 +00:00
Gael Guennebaud
f12b368417
Extend polynomial solver unit tests to complexes
2016-11-23 16:05:45 +01:00
Luke Iwanski
af67335e0e
Added test for cwiseMin, cwiseMax and operator%.
2016-11-19 13:37:27 +00:00
Benoit Steiner
a357fe1fb9
Code cleanup
2016-11-18 16:58:09 -08:00
Benoit Steiner
1c6eafb46b
Updated cxx11_tensor_device_sycl to run only on the OpenCL devices available on the host
2016-11-18 16:43:27 -08:00
Benoit Steiner
ca754caa23
Only runs the cxx11_tensor_reduction_sycl on devices that are available.
2016-11-18 16:31:14 -08:00
Benoit Steiner
dc601d79d1
Added the ability to run test exclusively OpenCL devices that are listed by sycl::device::get_devices().
2016-11-18 16:26:50 -08:00
Benoit Steiner
b5e3285e16
Test broadcasting on OpenCL devices with 64 bit indexing
2016-11-18 13:44:20 -08:00
Benoit Steiner
7335c49204
Fixed the cxx11_tensor_device_sycl test
2016-11-18 12:37:13 -08:00
Mehdi Goli
15e226d7d3
adding Benoit changes on the TensorDeviceSycl.h
2016-11-18 16:34:54 +00:00
Mehdi Goli
622805a0c5
Modifying TensorDeviceSycl.h to always create buffer of type uint8_t and convert them to the actual type at the execution on the device; adding the queue interface class to separate the lifespan of sycl queue and buffers,created for that queue, from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row major and column major data layout on all different devices that are supported by Sycl{CPU; GPU; and Host}.
2016-11-18 16:20:42 +00:00
Luke Iwanski
5159675c33
Added isnan, isfinite and isinf for SYCL device. Plus test for that.
2016-11-18 16:01:48 +00:00
Luke Iwanski
927bd62d2a
Now testing out (+=, =) in.FUNC() and out (+=, =) out.FUNC()
2016-11-18 11:16:42 +00:00
Benoit Steiner
553f50b246
Added a way to detect errors generated by the opencl device from the host
2016-11-17 21:51:48 -08:00
Benoit Steiner
4349fc640e
Created a test to check that the sycl runtime can successfully report errors (like ivision by 0).
...
Small cleanup
2016-11-17 20:27:54 -08:00
Benoit Steiner
004344cf54
Avoid calling log(0) or 1/0
2016-11-17 11:56:44 -08:00
Luke Iwanski
7878756dea
Fixed existing test.
2016-11-17 17:46:55 +00:00
Luke Iwanski
c5130dedbe
Specialised basic math functions for SYCL device.
2016-11-17 11:47:13 +00:00
Mehdi Goli
05e8c2a1d9
Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl.
2016-11-14 18:13:53 +00:00
Mehdi Goli
f8ca893976
Adding TensorFixsize; adding sycl device memcpy; adding insial stage of slicing.
2016-11-14 17:51:57 +00:00
Mehdi Goli
3be3963021
Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor.
2016-11-10 19:16:31 +00:00
Mehdi Goli
2e704d4257
Adding Memset; optimising MecopyDeviceToHost by removing double copying;
2016-11-10 18:45:12 +00:00
Benoit Steiner
75c080b176
Added a test to validate memory transfers between host and sycl device
2016-11-09 06:23:42 -08:00
Benoit Steiner
db3903498d
Merged in benoitsteiner/opencl (pull request PR-246)
...
Improved support for OpenCL
2016-11-08 22:28:44 +00:00
Mehdi Goli
d57430dd73
Converting all sycl buffers to uninitialised device only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creating to the device based on Benoit suggestion; removing the sycl specefic condition for returning m_result in TensorReduction.h according to Benoit suggestion.
2016-11-08 17:08:02 +00:00
Benoit Steiner
ad086b03e4
Removed unnecessary statement
2016-11-05 12:43:27 -07:00
Gael Guennebaud
55b4fd1d40
Extend mpreal unit test to check LLT with complexes.
2016-11-05 11:28:53 +01:00
Mehdi Goli
0ebe3808ca
Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;
2016-11-04 18:18:19 +00:00
Benoit Steiner
d5f88e2357
Sharded the tensor_image_patch test to help it run on low power devices
2016-10-27 21:48:21 -07:00
Benoit Steiner
0b4b0f11e8
Fixed a few more compilation warnings
2016-10-28 04:01:01 +00:00
Benoit Steiner
306daa24a3
Fixed a compilation warning
2016-10-28 03:50:31 +00:00
Benoit Steiner
8471cf1996
Fixed compilation warning
2016-10-28 03:46:08 +00:00
Benoit Steiner
cf20b30d65
Merge latest updates from trunk
2016-10-20 09:42:05 -07:00
Tal Hadad
15eca2432a
Euler tests: Tighter precision when no roll exists and clean code.
2016-10-18 23:24:57 +03:00
Tal Hadad
6f4f12d1ed
Add isApprox() and cast() functions.
...
test cases included
2016-10-17 22:23:47 +03:00
Tal Hadad
7402cfd4cc
Add safty for near pole cases and test them better.
2016-10-17 20:42:08 +03:00
Tal Hadad
58f5d7d058
Fix calc bug, docs and better testing.
...
Test code changes:
* better coded
* rand and manual numbers
* singularity checking
2016-10-16 14:39:26 +03:00
Tal Hadad
078a202621
Merge Hongkai Dai correct range calculation, and remove ranges from API.
...
Docs updated.
2016-10-14 16:03:28 +03:00
Luke Iwanski
e742da8b28
Merged ComputeCpp into default.
2016-10-14 13:36:51 +01:00
Mehdi Goli
524fa4c46f
Reducing the code by generalising sycl backend functions/structs.
2016-10-14 12:09:55 +01:00
Hongkai Dai
014d9f1d9b
implement euler angles with the right ranges
2016-10-13 14:45:51 -07:00
Benoit Steiner
d0ee2267d6
Relaxed the resizing checks so that they don't fail with gcc >= 5.3
2016-10-13 10:59:46 -07:00
Benoit Steiner
7e4a6754b2
Merged eigen/eigen into default
2016-10-12 22:42:33 -07:00
Benoit Steiner
5266ff8966
Cleaned up a regression test
2016-10-08 19:12:44 +00:00
Benoit Steiner
5c68051cd7
Merge the content of the ComputeCpp branch into the default branch
2016-10-07 11:04:16 -07:00
RJ Ryan
bfc264abe8
Add a test that GPU complex product reductions match CPU reductions.
2016-10-06 11:10:14 -07:00
Benoit Steiner
d7f9679a34
Fixed a couple of compilation warnings
2016-10-05 15:00:32 -07:00
Benoit Steiner
ae1385c7e4
Pull the latest updates from trunk
2016-10-05 14:54:36 -07:00
Benoit Steiner
73b0012945
Fixed compilation warnings
2016-10-05 14:24:24 -07:00
Benoit Steiner
4387433acf
Increased the robustness of the reduction tests on fp16
2016-10-05 10:42:41 -07:00
Benoit Steiner
aad20d700d
Increase the tolerance to numerical noise.
2016-10-05 10:39:24 -07:00
Benoit Steiner
616a7a1912
Improved support for compiling CUDA code with clang as the host compiler
2016-10-03 17:09:33 -07:00
Benoit Steiner
422530946f
Renamed the SYCL tests to follow the standard naming convention.
2016-09-30 08:22:10 -07:00
Benoit Steiner
2bda1b0d93
Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.
2016-09-28 17:08:41 -07:00
RJ Ryan
608b1acd6d
Don't use c++11 features and fix include.
2016-09-20 07:49:05 -07:00
RJ Ryan
b2c6dc48d9
Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.
2016-09-20 07:18:20 -07:00
Luke Iwanski
c771df6bc3
Updated the owners of the file.
2016-09-19 14:09:25 +01:00
Luke Iwanski
b91e021172
Merged with default.
2016-09-19 14:03:54 +01:00
Luke Iwanski
cb81975714
Partial OpenCL support via SYCL compatible with ComputeCpp CE.
2016-09-19 12:44:13 +01:00
Emil Fresk
6edd2e2851
Made AutoDiffJacobian more intuitive to use and updated for C++11
...
Changes:
* Removed unnecessary types from the Functor by inferring from its types
* Removed inputs() function reference, replaced with .rows()
* Updated the forward constructor to use variadic templates
* Added optional parameters to the Fuctor for passing parameters,
control signals, etc
* Has been tested with fixed size and dynamic matricies
Ammendment by chtz: overload operator() for compatibility with not fully conforming compilers
2016-09-16 14:03:55 +02:00
Benoit Steiner
e4d4d15588
Register the cxx11_tensor_device only for recent cuda architectures (i.e. >= 3.0) since the test instantiate contractions that require a modern gpu.
2016-09-12 19:01:52 -07:00
Benoit Steiner
4dfd888c92
CUDA contractions require arch >= 3.0: don't compile the cuda contraction tests on older architectures.
2016-09-12 18:49:01 -07:00
Benoit Steiner
028e299577
Fixed a bug impacting some outer reductions on GPU
2016-09-12 18:36:52 -07:00
Benoit Steiner
5f50f12d2c
Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem.
2016-09-12 13:46:13 -07:00
Benoit Steiner
8321dcce76
Merged latest updates from trunk
2016-09-12 10:33:05 -07:00
Benoit Steiner
eb6ba00cc8
Properly size the list of waiters
2016-09-12 10:31:55 -07:00
Gael Guennebaud
dabc81751f
Fix compilation when cuda_fp16.h does not exist.
2016-09-05 17:14:20 +02:00
Benoit Steiner
87a8a1975e
Fixed a regression test
2016-09-02 19:29:33 -07:00
Benoit Steiner
6c05c3dd49
Fix the cxx11_tensor_cuda.cu test on 32bit platforms.
2016-09-02 11:12:16 -07:00
Benoit Steiner
039e225f7f
Added a test for nullary expressions on CUDA
...
Also check that we can mix 64 and 32 bit indices in the same compilation unit
2016-09-01 13:28:12 -07:00
Benoit Steiner
c53f783705
Updated the contraction code to support constant inputs.
2016-09-01 11:41:27 -07:00
Gael Guennebaud
72a4d49315
Fix compilation with CUDA 8
2016-09-01 13:39:33 +02:00
Gael Guennebaud
1f84f0d33a
merge EulerAngles module
2016-08-30 10:01:53 +02:00
Gael Guennebaud
e074f720c7
Include missing forward declaration of SparseMatrix
2016-08-29 18:56:46 +02:00
Gael Guennebaud
6cd7b9ea6b
Fix compilation with cuda 8
2016-08-29 11:06:08 +02:00
Gael Guennebaud
0f56b5a6de
enable vectorization path when testing half on cuda, and add test for log1p
2016-08-26 14:55:51 +02:00
Benoit Steiner
5eea1c7f97
Fixed cut and paste bug in debud message
2016-08-04 17:34:13 -07:00
Benoit Steiner
b50d8f8c4a
Extended a regression test to validate that we basic fp16 support works with cuda 7.0
2016-08-03 16:50:13 -07:00
Benoit Steiner
fad9828769
Deleted redundant regression test.
2016-08-03 16:08:37 -07:00
Benoit Steiner
d92df04ce8
Cleaned up the new float16 test a bit
2016-08-03 11:50:07 -07:00
Benoit Steiner
81099ef482
Added a test for fp16
2016-08-03 11:41:17 -07:00
Gael Guennebaud
cc2f6d68b1
bug #1264 : fix compilation
2016-07-27 23:30:47 +02:00
Gael Guennebaud
8972323c08
Big 1261: add missing max(ADS,ADS) overload (same for min)
2016-07-27 14:52:48 +02:00
Gael Guennebaud
5d94dc85e5
bug #1260 : add regression test
2016-07-27 14:38:30 +02:00
Gael Guennebaud
0d7039319c
bug #1260 : remove doubtful specializations of ScalarBinaryOpTraits
2016-07-27 14:35:52 +02:00
Gael Guennebaud
fd1117f2be
Implement digits10 for mpreal
2016-07-25 14:38:55 +02:00
Benoit Steiner
c6b0de2c21
Improved partial reductions in more cases
2016-07-22 17:18:20 -07:00
Gael Guennebaud
32d95e86c9
merge
2016-07-22 16:43:12 +02:00
Gael Guennebaud
d7a0e52478
Fix testing of log nearby 1
2016-07-22 15:44:26 +02:00
Gael Guennebaud
7acf23c14c
Truely split unit test.
2016-07-22 15:41:23 +02:00
Gael Guennebaud
d075d122ea
Move half unit test from unsupported to main tests
2016-07-22 14:34:19 +02:00
Gael Guennebaud
82798162c0
Extend unit testing of half with ADL and arrays.
2016-07-21 15:47:21 +02:00
Gael Guennebaud
c98bac2966
Manually add -stdd=c++11 to nvcc for old cmake versions
2016-07-12 09:29:18 +02:00
Benoit Steiner
40eb97516c
reverted unintended change.
2016-07-11 14:28:03 -07:00
Benoit Steiner
03b71c273e
Made the packetmath test compile again. A better fix would be to move the special function tests to the unsupported directory where the code now resides.
2016-07-11 13:50:24 -07:00
Gael Guennebaud
fd60966310
merge
2016-07-11 18:11:47 +02:00
Gael Guennebaud
7d636349dc
Fix configuration of CUDA:
...
- preserve user defined CUDA_NVCC_FLAGS
- remove the -ansi flag that conflicts with -std=c++11
- do not add -std=c++11 if already there
2016-07-11 18:09:04 +02:00
Gael Guennebaud
131ee4bb8e
Split test_slice_in_expr which seems to be huge for visual
2016-07-11 11:46:55 +02:00
Gael Guennebaud
544935101a
Fix warnings
2016-07-08 11:38:52 +02:00
Gael Guennebaud
59bf2774a3
Fix warnings
2016-07-08 11:38:11 +02:00
Gael Guennebaud
2f7e2614e7
bug #1232 : refactor special functions as a new SpecialFunctions module, currently in unsupported/.
2016-07-08 11:13:55 +02:00
Gael Guennebaud
8b7431d8fd
fix compilation with c++11
2016-07-07 15:18:23 +02:00
Gael Guennebaud
69378eed0b
Split huge unit test
2016-07-07 15:18:04 +02:00
Gael Guennebaud
5d2dada197
Fix warnings
2016-07-07 09:05:15 +02:00
Gael Guennebaud
f5e780fb05
split huge unit test
2016-07-07 08:59:59 +02:00
Igor Babuschkin
85699850d9
Add missing CUDA kernel to tensor scan op
...
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
2016-06-29 11:54:35 +01:00
Benoit Steiner
1a9f92e781
Added a test to validate the tensor scan evaluation on GPU. The test is currently disabled since the code segfaults.
2016-06-27 16:02:52 -07:00
Igor Babuschkin
0425118e2a
Merge upstream changes
2016-08-05 14:34:57 +01:00
Igor Babuschkin
eeb0d880ee
Enable efficient Tensor reduction for doubles
2016-07-01 19:08:26 +01:00
Gael Guennebaud
3852351793
merge pull request 198
2016-06-24 11:48:17 +02:00
Rasmus Munk Larsen
a9c1e4d7b7
Return -1 from CurrentThreadId when called by thread outside the pool.
2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2
Resolve merge.
2016-06-23 15:08:03 -07:00
Gael Guennebaud
361dbd246d
Add unit test for printing empty tensors
2016-06-23 18:54:30 +02:00
Benoit Steiner
de32f8d656
Fixed the printing of rank-0 tensors
2016-06-20 10:46:45 -07:00
Geoffrey Lalonde
72c95383e0
Add autodiff coverage for standard library hyperbolic functions, and tests.
...
* * *
Corrected tanh derivatived, moved test definitions.
* * *
Added more test cases, removed lingering lines
2016-06-15 23:33:19 -07:00
Igor Babuschkin
c4d10e921f
Implement exclusive scan option
2016-06-14 19:44:07 +01:00
Rasmus Munk Larsen
f1f2ff8208
size_t -> int
2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen
76308e7fd2
Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool.
2016-06-03 16:28:58 -07:00
Eugene Brevdo
39baff850c
Add TernaryFunctors and the betainc SpecialFunction.
...
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.
Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe. and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
2016-06-02 17:04:19 -07:00
Benoit Steiner
02db4e1a82
Disable the tensor tests when using msvc since older versions of the compiler fail to handle this code
2016-06-04 08:21:17 -07:00
Rasmus Munk Larsen
811aadbe00
Add syntactic sugar to Eigen tensors to allow more natural syntax.
...
Specifically, this enables expressions involving:
scalar + tensor
scalar * tensor
scalar / tensor
scalar - tensor
2016-06-02 12:41:28 -07:00
Tal Hadad
52e4cbf539
Merged eigen/eigen into default
2016-06-02 22:15:20 +03:00
Tal Hadad
2aaaf22623
Fix Gael reports (except documention)
...
- "Scalar angle(int) const" should be "const Vector& angles() const"
- then method "coeffs" could be removed.
- avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;)
- about the "fromRotation" methods:
- replace the ones which are not static by operator= (as in Quaternion)
- the others are actually static methods: use a capital F: FromRotation
- method "invert" should be removed.
- use a macro to define both float and double EulerAnglesXYZ* typedefs
- AddConstIf -> not used
- no needs for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants:
if(IsHeadingOpposite-=IsEven) res.alpha() = -res.alpha();
2016-06-02 22:12:57 +03:00
Igor Babuschkin
fbd7ed6ff7
Add tensor scan op
...
This is the initial implementation a generic scan operation.
Based on this, cumsum and cumprod method have been added to TensorBase.
2016-06-02 13:35:47 +01:00
Benoit Steiner
c3cada38e2
Speedup a test
2016-06-01 21:13:00 -07:00
Benoit Steiner
abc815798b
Added a new operation to enable more powerful tensorindexing.
2016-05-27 12:22:25 -07:00
Benoit Steiner
5707537592
Fixed option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr' warning generated by nvcc 7.5
2016-05-27 10:47:53 -07:00
Benoit Steiner
36369ab63c
Resolved merge conflicts
2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a
Merged latest reduction improvements
2016-05-26 12:19:33 -07:00
Benoit Steiner
c1c7f06c35
Improved the performance of inner reductions.
2016-05-26 11:53:59 -07:00
Benoit Steiner
22d02c9855
Improved the coverage of the fp16 reduction tests
2016-05-26 11:12:16 -07:00
Benoit Steiner
58026905ae
Added support for statically known lists of pairs of indices
2016-05-25 11:04:14 -07:00
Christoph Hertzberg
718521d5cf
Silenced several double-promotion warnings
2016-05-22 18:17:04 +02:00
Christoph Hertzberg
b5a7603822
fixed macro name
2016-05-22 16:49:29 +02:00
Gael Guennebaud
ccaace03c9
Make EIGEN_HAS_CONSTEXPR user configurable
2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd
Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable
2016-05-20 15:05:38 +02:00
Benoit Steiner
a910bcee43
Merged latest updates from trunk
2016-05-17 09:14:22 -07:00
Benoit Steiner
8d06c02ffd
Allow vectorized padding on GPU. This helps speed things up a little.
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00