Commit Graph

5283 Commits

Author SHA1 Message Date
Gael Guennebaud
a6b971e291 Fix memory leak in Ref<Sparse> 2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65 Optimize SparseLU::solve for rhs vectors 2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903 remove temporary in SparseLU::solve 2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4 bug #1356: fix calls to evaluator::coeffRef(0,0) to get the address of the destination
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
Gael Guennebaud
91003f3b86 typo 2016-12-05 13:51:07 +01:00
Gael Guennebaud
e3f613cbd4 Improve performance of row-major-dense-matrix * vector products for recent CPUs.
This revised version does not bother about aligned loads/stores,
and rather processes 8 rows at once for better instruction pipelining.
2016-12-05 13:02:01 +01:00
Gael Guennebaud
3abc827354 Clean debugging code 2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
6a5fe86098 Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
The previous code has been optimized for Intel core2 for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
Christoph Hertzberg
22f7d398e2 bug #1355: Fixed wrong line-endings on two files 2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4 Clean up SparseCore module regarding ReverseInnerIterator 2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09 typo UIntPtr
(grafted from b6f04a2dd4)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655 fix two warnings (unused typedef, unused variable) and a typo
(grafted from a9aa3bcf50)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb fix member order 2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae Merged in rmlarsen/eigen (pull request PR-256)
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Benoit Steiner
7ff26ddcbb Merged eigen/eigen into default 2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d Fix misleading-indentation warnings. 2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e Adding sycl backend for TensorPadding.h; disabling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code. 2016-12-01 13:02:27 +00:00
Benoit Steiner
fd1dc3363e Merged eigen/eigen into default 2016-11-30 20:16:17 -08:00
Gael Guennebaud
8df272af88 Fix selection of product implementation for dynamic size matrices with fixed max size. 2016-11-30 22:21:33 +01:00
Gael Guennebaud
c927af60ed Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. 2016-11-30 17:59:13 +01:00
Gael Guennebaud
ab4ef5e66e bug #1351: fix compilation of random with old compilers 2016-11-30 17:37:53 +01:00
Rasmus Munk Larsen
a0329f64fb Add a default constructor for the "fake" __half class when not using the
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Benoit Steiner
9f8fbd9434 Merged eigen/eigen into default 2016-11-26 11:28:25 -08:00
Mehdi Goli
7318daf887 Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h. 2016-11-25 16:19:07 +00:00
Benoit Steiner
3be1afca11 Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5 2016-11-23 18:49:51 -08:00
Mehdi Goli
b8cc5635d5 Removing unsupported device from test case; cleaning the tensor device sycl. 2016-11-23 16:30:41 +00:00
Gael Guennebaud
e340866c81 Fix compilation with gcc and old ABI version 2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98 Fix compilation issue with MSVC:
MSVC always messes up with shadowed template arguments, for instance in:
  struct B { typedef float T; };
  template<typename T> struct A : B {
    T g;
  };
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
Gael Guennebaud
74637fa4e3 Optimize predux<Packet8f> (AVX) 2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856 Disable usage of SSE3 _mm_hadd_ps that is extremely slow. 2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e Optimize predux<Packet4d> (AVX) 2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940 Disable usage of SSE3 haddpd that is extremely slow. 2016-11-22 16:58:31 +01:00
Gael Guennebaud
6a84246a6a Fix regression in assignment of sparse block to sparse block. 2016-11-21 21:46:42 +01:00
Benoit Steiner
ed839c5851 Enable the use of constant expressions with clang >= 3.6 2016-11-20 10:34:49 -08:00
Gael Guennebaud
465ede0f20 Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d3)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474 Fixed merge conflicts 2016-11-19 19:12:59 -08:00
Benoit Steiner
1bdf1b9ce0 Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
8649e16c2a Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio 2016-11-18 14:18:34 -08:00
Gael Guennebaud
164414c563 Merged in ChunW/eigen (pull request PR-252)
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Luke Iwanski
5159675c33 Added isnan, isfinite and isinf for SYCL device. Plus test for that. 2016-11-18 16:01:48 +00:00
Gael Guennebaud
8193ffb3d3 bug #1343: fix compilation regression in mat+=selfadjoint_view.
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2 bug #1343: fix compilation regression in array = matrix_product 2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f Merged eigen/eigen into default 2016-11-17 22:53:37 -08:00
Chun Wang
0d0948c3b9 Workaround for error in VS2012 with /clr 2016-11-17 17:54:27 -05:00
Konstantinos Margaritis
672aa97d4d implement float/std::complex<float> for ZVector as well, minor fixes to ZVector 2016-11-17 13:27:33 -05:00
Luke Iwanski
c5130dedbe Specialised basic math functions for SYCL device. 2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256 Enable the use of AVX512 instruction by default 2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c bump default branch to 3.3.90 2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4 Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs 2016-11-16 09:01:51 -08:00
Gael Guennebaud
0ee92aa38e Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products. 2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0 bug #426: move operator && and || to MatrixBase and SparseMatrixBase. 2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c Merged in olesalscheider/eigen (pull request PR-248)
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba Fix regression in SparseMatrix::ReverseInnerIterator 2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408 Make sure not to call numext::maxi on expression templates 2016-11-12 12:20:57 +01:00
Gael Guennebaud
eeac81b8c0 bump to 3.3.0 2016-11-10 13:55:14 +01:00
Gael Guennebaud
e80bc2ddb0 Fix printing of sparse expressions 2016-11-10 10:35:32 +01:00
Benoit Steiner
db3903498d Merged in benoitsteiner/opencl (pull request PR-246)
Improved support for OpenCL
2016-11-08 22:28:44 +00:00
Gael Guennebaud
436a111792 Generalize Cholmod support to handle any sparse type as the rhs and result of the solve method 2016-11-06 20:29:23 +01:00
Gael Guennebaud
afc55b1885 Generalize IterativeSolverBase::solve to handle any sparse type as the result (instead of SparseMatrix only) 2016-11-06 20:28:18 +01:00
Gael Guennebaud
a5c2d8a3cc Generalize solve_sparse_through_dense_panels to handle SparseVector. 2016-11-06 15:20:58 +01:00
Gael Guennebaud
f8bfe10613 Add missing friend declaration 2016-11-06 15:20:30 +01:00
Gael Guennebaud
fc7180cda8 Add a default ctor to evaluator<SparseVector>.
Needed for evaluator<Solve>.
2016-11-06 15:20:00 +01:00
Gael Guennebaud
4d226ab5b5 Enable swapping between SparseMatrix and SparseVector 2016-11-06 15:15:03 +01:00
Gael Guennebaud
a354c3ca59 Fix compilation of LLT with complex<mpreal>. 2016-11-05 11:28:29 +01:00
Benoit Steiner
d46a36cc84 Merged eigen/eigen into default 2016-11-04 18:22:55 -07:00
Mehdi Goli
0ebe3808ca Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size; 2016-11-04 18:18:19 +00:00
Gael Guennebaud
ba05572dcb bump to 3.3-rc2 2016-11-04 09:09:06 +01:00
Benoit Steiner
5c3995769c Improved AVX512 configuration 2016-11-03 04:50:28 -07:00
Benoit Steiner
ca0ba0d9a4 Improved AVX512 support 2016-11-03 04:00:49 -07:00
Benoit Steiner
c80587c92b Merged eigen/eigen into default 2016-11-03 03:55:11 -07:00
Gael Guennebaud
3f1d0cdc22 bug #1337: improve doc of homogeneous() and hnormalized() 2016-11-03 11:03:08 +01:00
Gael Guennebaud
78e93ac1ad bug #1330: Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement. 2016-11-03 10:21:59 +01:00
Benoit Steiner
3e37166d0b Merged in benoitsteiner/opencl (pull request PR-244)
Disable vectorization on device only when compiling for sycl
2016-11-02 22:01:03 +00:00
Benoit Steiner
0585b2965d Disable vectorization on device only when compiling for sycl 2016-11-02 11:44:27 -07:00
Gael Guennebaud
a07bb428df bug #1004: improve accuracy of LinSpaced for abs(low) >> abs(high). 2016-11-02 11:34:38 +01:00
Gael Guennebaud
598de8b193 Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX. 2016-11-02 10:38:13 +01:00
Benoit Steiner
7a0e96b80d Gate the code that refers to cuda fp16 primitives more thoroughly 2016-11-01 12:08:09 -07:00
Gael Guennebaud
3ecb343dc3 Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression. 2016-10-26 22:50:41 +02:00
Gael Guennebaud
97feea9d39 add a generic EIGEN_HAS_CXX11 2016-10-26 15:53:13 +02:00
Gael Guennebaud
ca6a2a5248 Fix warning with ICC 2016-10-26 14:13:05 +02:00
Gael Guennebaud
b15a5dc3f4 Fix ICC warnings 2016-10-25 22:20:24 +02:00
Gael Guennebaud
aad72f3c6d Add missing inline keywords 2016-10-25 20:20:09 +02:00
Benoit Steiner
3e194a6a73 Fixed a typo 2016-10-25 08:42:15 -07:00
Gael Guennebaud
58146be99b bug #1004: one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity.
This version simply does low+i*step plus a branch to return high if i==size-1.
Vectorization is accomplished with a branch and the help of pinsertlast.
Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.
2016-10-25 16:53:09 +02:00
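The scalar formula described in the entry above (low+i*step, with a branch that returns high at i==size-1) can be illustrated with a minimal stand-alone sketch. It does not use Eigen and is not Eigen's implementation; the name `linspaced_sketch` and the use of `std::vector<double>` are assumptions made for illustration only, and the vectorized path with pinsertlast is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone helper; it is NOT Eigen's implementation.
// Entry i is low + i*step, except the last entry, which a branch sets
// to `high` so that the upper bound is reproduced exactly.
std::vector<double> linspaced_sketch(std::size_t size, double low, double high) {
    assert(size >= 1 && "this sketch assumes at least one entry");
    std::vector<double> out(size);
    // With a single entry there is no spacing; the branch below yields `high`.
    const double step =
        (size == 1) ? 0.0 : (high - low) / static_cast<double>(size - 1);
    for (std::size_t i = 0; i < size; ++i) {
        out[i] = (i == size - 1) ? high
                                 : low + static_cast<double>(i) * step;
    }
    return out;
}
```

Calling `linspaced_sketch(5, 0.0, 1.0)` fills five entries with a step of 0.25, and the last entry is assigned `high` directly rather than computed from the step.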
Gael Guennebaud
13fc18d3a2 Add a pinsertlast function replacing the last entry of a packet by a scalar.
(useful to vectorize LinSpaced)
2016-10-25 16:48:49 +02:00
Gael Guennebaud
2634f9386c bug #1333: fix bad usage of const_cast_derived. Better use .data() for that purpose. 2016-10-24 22:22:35 +02:00
Gael Guennebaud
9e8f07d7b5 Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors. 2016-10-24 22:16:48 +02:00
Gael Guennebaud
b027d7a8cf bug #1004: remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term.
Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.
2016-10-24 20:27:21 +02:00
Benoit Steiner
b11aab5fcc Merged in benoitsteiner/opencl (pull request PR-238)
Added support for OpenCL to the Tensor Module
2016-10-24 15:30:45 +00:00
Gael Guennebaud
53c77061f0 bug #698: rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible.
Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution.
This changeset also disables vectorization for this integer path.
2016-10-24 15:50:27 +02:00
Gael Guennebaud
40f62974b7 bug #1328: workaround a compilation issue with gcc 4.2 2016-10-20 19:19:37 +02:00
Benoit Steiner
cf20b30d65 Merge latest updates from trunk 2016-10-20 09:42:05 -07:00
Benoit Steiner
d3943cd50c Fixed a few typos in the ternary tensor expressions types 2016-10-19 12:56:12 -07:00
Mehdi Goli
8fb162fc85 Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core. 2016-10-16 12:52:34 +01:00
Luke Iwanski
2e188dd4d4 Merged ComputeCpp to default. 2016-10-14 16:47:40 +01:00
Mehdi Goli
15380f9a87 Applying Benoit's comment to return the missing line back in Eigen/Core 2016-10-14 16:39:41 +01:00
Gael Guennebaud
692b30ca95 Fix previous merge. 2016-10-14 17:16:28 +02:00
Gael Guennebaud
050c681bdd Merged in rmlarsen/eigen2 (pull request PR-232)
Improve performance of parallelized matrix multiply for rectangular matrices
2016-10-14 14:51:09 +00:00
Luke Iwanski
e742da8b28 Merged ComputeCpp into default. 2016-10-14 13:36:51 +01:00
Mehdi Goli
524fa4c46f Reducing the code by generalising sycl backend functions/structs. 2016-10-14 12:09:55 +01:00
Benoit Steiner
737e4152c3 Merged in lukier/eigen (pull request PR-234)
Enabling CUDA in Geometry
2016-10-13 18:09:28 +00:00
Robert Lukierski
a94791b69a Fixes for min and abs after Benoit's comments, switched to numext. 2016-10-13 15:00:22 +01:00
Avi Ginsburg
ac63d6891c Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure
whether to limit the check to this compiler combination
(` || (EIGEN_COMP_MSVC == 1900 &&  __CUDACC_VER__) `)
or to leave it as it is. I also don't know if this will have any effect on
including Eigen in device code (I'm not in my current project).
2016-10-13 08:47:32 +00:00
Benoit Steiner
7e4a6754b2 Merged eigen/eigen into default 2016-10-12 22:42:33 -07:00
Benoit Steiner
38b6048e14 Deleted redundant implementation of predux 2016-10-12 14:37:56 -07:00
Gael Guennebaud
e74612b9a0 Remove double ;; 2016-10-12 22:49:47 +02:00
Benoit Steiner
78d2926508 Merged eigen/eigen into default 2016-10-12 13:46:29 -07:00
Benoit Steiner
2e2f48e30e Take advantage of AVX512 instructions whenever possible to speedup the processing of 16 bit floats. 2016-10-12 13:45:39 -07:00
Gael Guennebaud
f939c351cb Fix SPQR for rectangular matrices 2016-10-12 22:39:33 +02:00
Robert Lukierski
471075f7ad Fixes min() warnings. 2016-10-12 18:59:05 +01:00
Gael Guennebaud
5c366fe1d7 Merged in rmlarsen/eigen (pull request PR-230)
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
2016-10-12 16:30:51 +00:00
Robert Lukierski
86711497c4 Adding EIGEN_DEVICE_FUNC in the Geometry module.
Additional CUDA necessary fixes in the Core (mostly usage of
EIGEN_USING_STD_MATH).
2016-10-12 16:35:17 +01:00
Rasmus Munk Larsen
47150af1c8 Fix copy-paste error: Must use _mm256_cmp_ps for AVX. 2016-10-12 08:34:39 -07:00
Gael Guennebaud
89e315152c bug #1325: fix compilation on NEON with clang 2016-10-12 16:55:47 +02:00
Benoit Steiner
5727e4d89c Reenabled the use of variadic templates on tegra x1 provided that the latest version (i.e. JetPack 2.3) is used. 2016-10-08 22:19:03 +00:00
Benoit Steiner
5c68051cd7 Merge the content of the ComputeCpp branch into the default branch 2016-10-07 11:04:16 -07:00
Gael Guennebaud
4860727ac2 Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload) 2016-10-07 09:21:12 +02:00
Benoit Steiner
507b661106 Renamed predux_half into predux_downto4 2016-10-06 17:57:04 -07:00
Benoit Steiner
a498ff7df6 Fixed incorrect comment 2016-10-06 15:27:27 -07:00
Benoit Steiner
a7473d6d5a Fixed compilation error with gcc >= 5.3 2016-10-06 14:33:22 -07:00
Benoit Steiner
5e64cea896 Silenced a compilation warning 2016-10-06 14:24:17 -07:00
Benoit Steiner
d485d12c51 Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code. 2016-10-06 10:41:03 -07:00
Rasmus Munk Larsen
48c635e223 Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small.
Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.

Improvements in Wall time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1                    3088      1610    +47.9%
BM_OuterishProd/64/4                    3562      2414    +32.2%
BM_OuterishProd/64/32                   8861      7815    +11.8%
BM_OuterishProd/128/1                  11363      6504    +42.8%
BM_OuterishProd/128/4                  11128      9794    +12.0%
BM_OuterishProd/128/64                 27691     27396     +1.1%
BM_OuterishProd/256/1                  33214     28123    +15.3%
BM_OuterishProd/256/4                  34312     36818     -7.3%
BM_OuterishProd/256/128               174866    176398     -0.9%
BM_OuterishProd/512/1                7963684    104224    +98.7%
BM_OuterishProd/512/4                7987913    112867    +98.6%
BM_OuterishProd/512/256              8198378   1306500    +84.1%
BM_OuterishProd/1k/1                 7356256    324432    +95.6%
BM_OuterishProd/1k/4                 8129616    331621    +95.9%
BM_OuterishProd/1k/512              27265418   7517538    +72.4%

Improvements in CPU time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1                    6169      1608    +73.9%
BM_OuterishProd/64/4                    7117      2412    +66.1%
BM_OuterishProd/64/32                  17702     15616    +11.8%
BM_OuterishProd/128/1                  45415      6498    +85.7%
BM_OuterishProd/128/4                  44459      9786    +78.0%
BM_OuterishProd/128/64                110657    109489     +1.1%
BM_OuterishProd/256/1                 265158     28101    +89.4%
BM_OuterishProd/256/4                 274234    183885    +32.9%
BM_OuterishProd/256/128              1397160   1408776     -0.8%
BM_OuterishProd/512/1               78947048    520703    +99.3%
BM_OuterishProd/512/4               86955578   1349742    +98.4%
BM_OuterishProd/512/256             74701613  15584661    +79.1%
BM_OuterishProd/1k/1                78352601   3877911    +95.1%
BM_OuterishProd/1k/4                78521643   3966221    +94.9%
BM_OuterishProd/1k/512              258104736  89480530    +65.3%
2016-10-06 10:33:10 -07:00
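The "Improvement" column in the two tables above appears consistent with the relative reduction (base - new) / base, expressed in percent. The commit message does not state the formula, so the following is a sketch under that assumption, and `improvement_percent` is a hypothetical helper name.

```cpp
#include <cassert>

// Hypothetical helper: relative improvement in percent.
// Positive when the new time is lower than the base time,
// negative when the new time is higher (a regression).
double improvement_percent(double base_ns, double new_ns) {
    assert(base_ns > 0.0 && "base time must be positive");
    return 100.0 * (base_ns - new_ns) / base_ns;
}
```

For the wall-time row BM_OuterishProd/64/1, `improvement_percent(3088, 1610)` evaluates to roughly 47.9, and for BM_OuterishProd/256/4, `improvement_percent(34312, 36818)` evaluates to roughly -7.3. Both match the values listed in the table.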
Benoit Steiner
9f3276981c Enabling AVX512 should also enable AVX2. 2016-10-06 10:29:48 -07:00
Gael Guennebaud
80b5133789 Fix compilation of qr.inverse() for column and full pivoting variants. 2016-10-06 09:55:50 +02:00
Benoit Steiner
4131074818 Deleted unnecessary CMakeLists.txt file 2016-10-05 18:54:35 -07:00
Benoit Steiner
cb5cd69872 Silenced a compilation warning. 2016-10-05 18:50:53 -07:00
Benoit Steiner
78b569f685 Merged latest updates from trunk 2016-10-05 18:48:55 -07:00
Benoit Steiner
9c2b6c049b Silenced a few compilation warnings 2016-10-05 18:37:31 -07:00
Benoit Steiner
ae1385c7e4 Pull the latest updates from trunk 2016-10-05 14:54:36 -07:00
Benoit Steiner
698ff69450 Properly characterize the CUDA packet primitives for fp16 as device only 2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen
7f67e6dfdb Update comment for fast sqrt. 2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen
765615609d Update comment for fast sqrt. 2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
                    SSE        AVX
Fast=1              2.529G     4.380G
Fast=0              1.944G     1.898G
Fast=1 fixed        2.214G     3.739G

This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.
2016-10-04 14:22:56 -07:00
Benoit Steiner
881b90e984 Use explicit type casting to generate packets of zeros. 2016-10-04 08:23:38 -07:00
Benoit Steiner
409e887d78 Added support for constant std::complex numbers on GPU 2016-10-03 11:06:24 -07:00
Gael Guennebaud
9d6d0dff8f bug #1317: fix performance regression with some Block expressions and clang by helping it to remove dead code.
The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).
2016-10-01 15:37:00 +02:00
Gael Guennebaud
8b84801f7f bug #1310: workaround a compilation regression from 3.2 regarding triangular * homogeneous 2016-09-30 22:49:59 +02:00
Gael Guennebaud
67b4f45836 Fix angle range 2016-09-30 12:46:33 +02:00
Gael Guennebaud
27f3970453 Remove std:: prefix 2016-09-30 12:40:41 +02:00
Gael Guennebaud
3860a0bc8f bug #1312: Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative. 2016-09-29 23:23:35 +02:00
Gael Guennebaud
33500050c3 bug #1308: fix compilation of some small products involving nullary-expressions. 2016-09-29 09:40:44 +02:00
Benoit Steiner
27d7628f16 Updated the list of warnings to reflect the new message ids introduced in cuda 8.0 2016-09-28 17:42:59 -07:00
Gael Guennebaud
f3a00dd2b5 Merged in sergiu/eigen (pull request PR-229)
Disabled MSVC level 4 warning C4714
2016-09-27 09:28:08 +02:00
Gael Guennebaud
892afb9416 Add debug info. 2016-09-26 23:53:57 +02:00
Gael Guennebaud
779774f98c bug #1311: fix alignment logic in some cases of (scalar*small).lazyProduct(small) 2016-09-26 23:53:40 +02:00
Gael Guennebaud
48dfe98abd bug #1308: fix compilation of vector * rowvector::nullary. 2016-09-25 14:54:35 +02:00
Sergiu Deitsch
fe29157d02 disabled MSVC level 4 warning C4714
The level 4 warning (/W4) warns about functions marked as __forceinline not
inlined, and generates a lot of noise.
2016-09-25 14:25:47 +02:00
Gael Guennebaud
86caba838d bug #1304: fix Projective * scaling and Projective *= scaling 2016-09-23 13:41:21 +02:00
Benoit Steiner
2a69290ddb Added a specialization of Eigen::numext::real and Eigen::numext::imag for std::complex<T> to be used when compiling a cuda kernel. This is unfortunately necessary to be able to process complex numbers from a CUDA kernel on MacOS. 2016-09-22 15:52:23 -07:00
Gael Guennebaud
77e27fbeee bump to 3.3-rc1 2016-09-22 22:37:39 +02:00
Gael Guennebaud
2ada122bc6 merge 2016-09-22 22:33:18 +02:00
Gael Guennebaud
8f2bdde373 merge 2016-09-22 22:32:55 +02:00
Gael Guennebaud
ba0f844d6b Backout changeset ce3557ca69 2016-09-22 22:28:51 +02:00
Benoit Steiner
50e3bbfc90 Calls x.imag() instead of imag(x) when x is a complex number since the former
is a constexpr while the latter isn't. This fixes compilation errors triggered by nvcc on Mac.
2016-09-22 13:17:25 -07:00
Gael Guennebaud
ca3746c6f8 Bypass identity reflectors. 2016-09-22 22:07:13 +02:00
Felix Gruber
8bde7da086 fix documentation of LinSpaced
The index of the highest value in a LinSpace is size-1.
2016-09-22 14:50:07 +02:00
Gael Guennebaud
66cbabafed Add a note regarding gcc bug #72867 2016-09-22 11:18:52 +02:00
Gael Guennebaud
9fa2c8650e Fix alignment of statically allocated temporaries in symv and trmv. 2016-09-21 17:34:24 +02:00
Gael Guennebaud
ac5377e161 Improve cost estimation of complex division 2016-09-21 17:26:04 +02:00
Benoit Steiner
26f9907542 Added missing typedefs 2016-09-20 12:58:03 -07:00
RJ Ryan
b2c6dc48d9 Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op. 2016-09-20 07:18:20 -07:00
Benoit Steiner
8a66ca4b10 Pulled latest updates from trunk 2016-09-19 14:13:55 -07:00
Benoit Steiner
59e9edfbf1 Removed EIGEN_DEVICE_FUNC qualifiers for the lu(), fullPivLu(), partialPivLu(), and inverse() functions since they aren't ready to run on GPU 2016-09-19 14:13:20 -07:00
Hongkai Dai
5dcc6d301a remove ternary operator in euler angles 2016-09-19 10:30:30 -07:00
Luke Iwanski
b91e021172 Merged with default. 2016-09-19 14:03:54 +01:00
Luke Iwanski
cb81975714 Partial OpenCL support via SYCL compatible with ComputeCpp CE. 2016-09-19 12:44:13 +01:00
Gael Guennebaud
4cc2c73e6a Fix alignment of statically allocated temporaries in gemv. 2016-09-17 12:52:27 +02:00
Christoph Hertzberg
ce3557ca69 Make makeHouseholder more stable for cases where real(c0) is not very small (but the rest is). 2016-09-16 14:24:47 +02:00
Gael Guennebaud
ee62f168e6 Doc: add link from block methods to respective tutorial section. 2016-09-16 11:26:25 +02:00
Gael Guennebaud
ca7f061a5f bug #828: clarify documentation of SparseMatrixBase's methods returning a sub-matrix. 2016-09-16 11:23:19 +02:00
Gael Guennebaud
50e203c717 bug #828: clarify documentation of SparseMatrixBase's unary methods. 2016-09-16 10:40:50 +02:00
Gael Guennebaud
fa9049a544 Let's be consistent and consider any denormal number as zero. 2016-09-15 11:24:03 +02:00
Gael Guennebaud
b33144e4df merge 2016-09-15 11:22:16 +02:00
Benoit Steiner
c0d56a543e Added several missing EIGEN_DEVICE_FUNC qualifiers 2016-09-14 14:06:21 -07:00
Benoit Steiner
779faaaeba Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling the EIGEN_THROW macro 2016-09-14 09:56:11 -07:00
Gael Guennebaud
1c8347e554 Fix product for custom complex type. (conjugation was ignored) 2016-09-14 18:28:49 +02:00
Benoit Steiner
ff47717f25 Suppress warning 2527 and 2529, which correspond to the "calling a __host__ function from a __host__ __device__ function is not allowed" message in nvcc 6.5. 2016-09-13 12:49:40 -07:00
Benoit Steiner
309190cf02 Suppress message 1222 when compiling with nvcc: this ensures that we don't get warnings about unknown warning messages when compiling with older versions of nvcc 2016-09-13 12:42:13 -07:00
Gael Guennebaud
c10620b2b0 Fix typo in doc. 2016-09-13 09:25:07 +02:00
Gael Guennebaud
73c8f2f697 bug #1285: fix regression introduced in changeset 00c29c2cae 2016-09-13 07:58:39 +02:00
Benoit Steiner
5f50f12d2c Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem. 2016-09-12 13:46:13 -07:00
Gael Guennebaud
228ae29591 Fix compilation on 32 bits systems. 2016-09-09 22:34:38 +02:00
Gael Guennebaud
471eac5399 bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) 2016-09-08 08:36:27 +02:00
Gael Guennebaud
d780983f59 Doc: explain minimal requirements on nullary functors 2016-09-06 23:14:52 +02:00
Gael Guennebaud
85fb517eaf Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only). 2016-09-06 17:23:15 +02:00
Gael Guennebaud
447f269561 Disable previous workaround. 2016-09-06 15:49:02 +02:00
Gael Guennebaud
b046a3f87d Workaround MSVC instantiation failure of has_*ary_operator at the level of traits<Ref>::match so that the has_*ary_operator are really properly instantiated throughout the compilation unit. 2016-09-06 15:47:04 +02:00
Gael Guennebaud
3cb914f332 bug #1266: remove CUDA guards on MatrixBase::<decomposition> definitions. (those used to break old nvcc versions that we probably don't care about anymore) 2016-09-06 09:55:50 +02:00
Gael Guennebaud
19a95b3309 Fix shadowing wrt Eigen::Index 2016-09-05 17:19:47 +02:00
Gael Guennebaud
e13071dd13 Workaround a weird msvc 2012 compilation error. 2016-09-05 15:50:41 +02:00
Gael Guennebaud
d123717e21 Fix for msvc 2012 and older 2016-09-05 15:26:56 +02:00
Benoit Steiner
373c340b71 Fixed a typo 2016-09-02 15:41:17 -07:00
Benoit Steiner
5a6be66cef Turned the Index type used by the nullary wrapper into a template parameter. 2016-09-02 14:10:29 -07:00
Gael Guennebaud
d6c8366d84 Fix compilation with MSVC 2012 2016-09-02 15:23:32 +02:00
Gael Guennebaud
ef54723dbe One more msvc fix iteration, the previous one was over-simplified for visual 2016-09-01 15:04:53 +02:00
Gael Guennebaud
f9f32e9e2d Fix compilation with nvcc 2016-09-01 13:06:14 +02:00
Gael Guennebaud
3d946e42b3 Fix compilation with visual studio 2016-09-01 12:59:32 +02:00
Gael Guennebaud
836fa25a82 Make sure sizeof is truly needed, thus improving SFINAE portability. 2016-08-31 23:40:18 +02:00