Mehdi Goli
8fb162fc85
Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core.
2016-10-16 12:52:34 +01:00
Mehdi Goli
e36cb91c99
Fixing the code indentation in the TensorReduction.h file.
2016-10-14 18:03:00 +01:00
Luke Iwanski
2e188dd4d4
Merged ComputeCpp to default.
2016-10-14 16:47:40 +01:00
Mehdi Goli
15380f9a87
Applyiing Benoit's comment to return the missing line back in Eigen/Core
2016-10-14 16:39:41 +01:00
Luke Iwanski
e742da8b28
Merged ComputeCpp into default.
2016-10-14 13:36:51 +01:00
Mehdi Goli
524fa4c46f
Reducing the code by generalising sycl backend functions/structs.
2016-10-14 12:09:55 +01:00
Benoit Steiner
7e4a6754b2
Merged eigen/eigen into default
2016-10-12 22:42:33 -07:00
Gael Guennebaud
e74612b9a0
Remove double ;;
2016-10-12 22:49:47 +02:00
Gael Guennebaud
f939c351cb
Fix SPQR for rectangular matrices
2016-10-12 22:39:33 +02:00
Gael Guennebaud
091d373ee9
Fix outer-stride.
2016-10-12 21:47:52 +02:00
Gael Guennebaud
5c366fe1d7
Merged in rmlarsen/eigen (pull request PR-230)
...
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
2016-10-12 16:30:51 +00:00
Rasmus Munk Larsen
47150af1c8
Fix copy-paste error: Must use _mm256_cmp_ps for AVX.
2016-10-12 08:34:39 -07:00
Gael Guennebaud
89e315152c
bug #1325 : fix compilation on NEON with clang
2016-10-12 16:55:47 +02:00
Benoit Steiner
7f0599b6eb
Manually define int16_t and uint16_t when compiling with Visual Studio
2016-10-08 22:56:32 -07:00
Benoit Steiner
5727e4d89c
Reenabled the use of variadic templates on tegra x1 provides that the latest version (i.e. JetPack 2.3) is used.
2016-10-08 22:19:03 +00:00
Benoit Steiner
5266ff8966
Cleaned up a regression test
2016-10-08 19:12:44 +00:00
Benoit Steiner
5c68051cd7
Merge the content of the ComputeCpp branch into the default branch
2016-10-07 11:04:16 -07:00
Gael Guennebaud
4860727ac2
Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload)
2016-10-07 09:21:12 +02:00
Benoit Steiner
33fba3f08d
Merged in rryan/eigen/tensorfunctors (pull request PR-233)
...
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
2016-10-06 12:29:19 -07:00
RJ Ryan
bfc264abe8
Add a test that GPU complex product reductions match CPU reductions.
2016-10-06 11:10:14 -07:00
RJ Ryan
e2e9cdd169
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
2016-10-06 10:49:48 -07:00
Benoit Steiner
d485d12c51
Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.
2016-10-06 10:41:03 -07:00
Gael Guennebaud
80b5133789
Fix compilation of qr.inverse() for column and full pivoting variants.
2016-10-06 09:55:50 +02:00
Benoit Steiner
d7f9679a34
Fixed a couple of compilation warnings
2016-10-05 15:00:32 -07:00
Benoit Steiner
ae1385c7e4
Pull the latest updates from trunk
2016-10-05 14:54:36 -07:00
Benoit Steiner
73b0012945
Fixed compilation warnings
2016-10-05 14:24:24 -07:00
Benoit Steiner
c84084c0c0
Fixed compilation warning
2016-10-05 14:15:41 -07:00
Benoit Steiner
4387433acf
Increased the robustness of the reduction tests on fp16
2016-10-05 10:42:41 -07:00
Benoit Steiner
aad20d700d
Increase the tolerance to numerical noise.
2016-10-05 10:39:24 -07:00
Benoit Steiner
8b69d5d730
::rand() returns a signed integer on win32
2016-10-05 08:55:02 -07:00
Benoit Steiner
ed7a220b04
Fixed a typo that impacts windows builds
2016-10-05 08:51:31 -07:00
Benoit Steiner
ceee1c008b
Silenced compilation warning
2016-10-04 18:47:53 -07:00
Benoit Steiner
698ff69450
Properly characterize the CUDA packet primitives for fp16 as device only
2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen
7f67e6dfdb
Update comment for fast sqrt.
2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen
765615609d
Update comment for fast sqrt.
2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb
Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
...
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
SSE AVX
Fast=1 2.529G 4.380G
Fast=0 1.944G 1.898G
Fast=1 fixed 2.214G 3.739G
This table illustrates the worst case in terms speed impact: It was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the different versions almost negligible.
2016-10-04 14:22:56 -07:00
Benoit Steiner
6af5ac7e27
Cleanup the cuda executor code.
2016-10-04 08:52:13 -07:00
Benoit Steiner
2f6d1607c8
Cleaned up the random number generation code.
2016-10-04 08:38:23 -07:00
Benoit Steiner
881b90e984
Use explicit type casting to generate packets of zeros.
2016-10-04 08:23:38 -07:00
Benoit Steiner
616a7a1912
Improved support for compiling CUDA code with clang as the host compiler
2016-10-03 17:09:33 -07:00
Benoit Steiner
409e887d78
Added support for constand std::complex numbers on GPU
2016-10-03 11:06:24 -07:00
Gael Guennebaud
9d6d0dff8f
bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code.
...
The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).
2016-10-01 15:37:00 +02:00
Gael Guennebaud
8b84801f7f
bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous
2016-09-30 22:49:59 +02:00
Benoit Steiner
422530946f
Renamed the SYCL tests to follow the standard naming convention.
2016-09-30 08:22:10 -07:00
Gael Guennebaud
67b4f45836
Fix angle range
2016-09-30 12:46:33 +02:00
Gael Guennebaud
27f3970453
Remove std:: prefix
2016-09-30 12:40:41 +02:00
Gael Guennebaud
3860a0bc8f
bug #1312 : Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative.
2016-09-29 23:23:35 +02:00
Gael Guennebaud
33500050c3
bug #1308 : fix compilation of some small products involving nullary-expressions.
2016-09-29 09:40:44 +02:00
Benoit Steiner
27d7628f16
Updated the list of warnings to reflect the new message ids introduced in cuda 8.0
2016-09-28 17:42:59 -07:00
Benoit Steiner
2bda1b0d93
Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.
2016-09-28 17:08:41 -07:00