eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-21 07:19:46 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	a6b971e291	Fix memory leak in Ref<Sparse>	2016-12-05 16:59:30 +01:00
Gael Guennebaud	8640ffac65	Optimize SparseLU::solve for rhs vectors	2016-12-05 15:41:14 +01:00
Gael Guennebaud	62acd67903	remove temporary in SparseLU::solve	2016-12-05 15:11:57 +01:00
Gael Guennebaud	0db6d5b3f4	bug #1356 : fix calls to evaluator::coeffRef(0,0) to get the address of the destination by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).	2016-12-05 15:08:09 +01:00
Gael Guennebaud	91003f3b86	typo	2016-12-05 13:51:07 +01:00
Gael Guennebaud	e3f613cbd4	Improve performance of row-major-dense-matrix * vector products for recent CPUs. This revised version does not bother about aligned loads/stores, and rather processes 8 rows at once for better instruction pipelining.	2016-12-05 13:02:01 +01:00
Gael Guennebaud	3abc827354	Clean debugging code	2016-12-05 12:59:32 +01:00
Benoit Steiner	462c28e77a	Merged in srvasude/eigen (pull request PR-265) Add Expm1 support to Eigen.	2016-12-05 02:31:11 +00:00
Gael Guennebaud	6a5fe86098	Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs. The previous code had been optimized for Intel core2, for which unaligned loads/stores were prohibitively expensive. This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA. According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast. Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching. We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panels of 4 columns is probably not optimal anymore).	2016-12-03 21:14:14 +01:00
Christoph Hertzberg	22f7d398e2	bug #1355 : Fixed wrong line-endings on two files	2016-12-02 11:22:05 +01:00
Gael Guennebaud	27873008d4	Clean up SparseCore module regarding ReverseInnerIterator	2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris	8c24723a09	typo UIntPtr (grafted from `b6f04a2dd4` )	2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris	aeba0d8655	fix two warnings(unused typedef, unused variable) and a typo (grafted from `a9aa3bcf50` )	2016-12-01 21:23:43 +01:00
Gael Guennebaud	181138a1cb	fix member order	2016-12-01 17:06:20 +01:00
Gael Guennebaud	9f297d57ae	Merged in rmlarsen/eigen (pull request PR-256) Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.	2016-12-01 15:27:33 +00:00
Benoit Steiner	7ff26ddcbb	Merged eigen/eigen into default	2016-12-01 07:13:17 -08:00
Gael Guennebaud	037b46762d	Fix misleading-indentation warnings.	2016-12-01 16:05:42 +01:00
Mehdi Goli	79aa2b784e	Adding sycl backend for TensorPadding.h; disabling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.	2016-12-01 13:02:27 +00:00
Benoit Steiner	fd1dc3363e	Merged eigen/eigen into default	2016-11-30 20:16:17 -08:00
Gael Guennebaud	8df272af88	Fix selection of product implementation for dynamic size matrices with fixed max size.	2016-11-30 22:21:33 +01:00
Gael Guennebaud	c927af60ed	Fix a performance regression in (matmat)vec for which mat*mat was evaluated multiple times.	2016-11-30 17:59:13 +01:00
Gael Guennebaud	ab4ef5e66e	bug #1351 : fix compilation of random with old compilers	2016-11-30 17:37:53 +01:00
Rasmus Munk Larsen	a0329f64fb	Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.	2016-11-29 13:18:09 -08:00
Benoit Steiner	9f8fbd9434	Merged eigen/eigen into default	2016-11-26 11:28:25 -08:00
Mehdi Goli	7318daf887	Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.	2016-11-25 16:19:07 +00:00
Benoit Steiner	3be1afca11	Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5	2016-11-23 18:49:51 -08:00
Mehdi Goli	b8cc5635d5	Removing unsupported device from test case; cleaning the tensor device sycl.	2016-11-23 16:30:41 +00:00
Gael Guennebaud	e340866c81	Fix compilation with gcc and old ABI version	2016-11-23 14:04:57 +01:00
Gael Guennebaud	a91de27e98	Fix compilation issue with MSVC: MSVC always messes up with shadowed template arguments, for instance in: struct B { typedef float T; }; template<typename T> struct A : B { T g; }; The type of A<double>::g will be float and not double.	2016-11-23 12:24:48 +01:00
Gael Guennebaud	74637fa4e3	Optimize predux<Packet8f> (AVX)	2016-11-22 21:57:52 +01:00
Gael Guennebaud	178c084856	Disable usage of SSE3 _mm_hadd_ps that is extremely slow.	2016-11-22 21:53:14 +01:00
Gael Guennebaud	7dd894e40e	Optimize predux<Packet4d> (AVX)	2016-11-22 21:41:30 +01:00
Gael Guennebaud	f3fb0a1940	Disable usage of SSE3 haddpd that is extremely slow.	2016-11-22 16:58:31 +01:00
Gael Guennebaud	6a84246a6a	Fix regression in assignment of sparse block to sparse block.	2016-11-21 21:46:42 +01:00
Benoit Steiner	ed839c5851	Enable the use of constant expressions with clang >= 3.6	2016-11-20 10:34:49 -08:00
Gael Guennebaud	465ede0f20	Fix compilation issue in mat = permutation (regression introduced in `8193ffb3d3` )	2016-11-20 09:41:37 +01:00
Benoit Steiner	81151bd474	Fixed merge conflicts	2016-11-19 19:12:59 -08:00
Benoit Steiner	1bdf1b9ce0	Merged in benoitsteiner/opencl (pull request PR-253) OpenCL improvements	2016-11-19 04:44:43 +00:00
Benoit Steiner	8649e16c2a	Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio	2016-11-18 14:18:34 -08:00
Gael Guennebaud	164414c563	Merged in ChunW/eigen (pull request PR-252) Workaround for error in VS2012 with /clr	2016-11-18 21:07:29 +00:00
Luke Iwanski	5159675c33	Added isnan, isfinite and isinf for SYCL device. Plus test for that.	2016-11-18 16:01:48 +00:00
Gael Guennebaud	8193ffb3d3	bug #1343 : fix compilation regression in mat+=selfadjoint_view. Generic EigenBase2EigenBase assignment was incomplete.	2016-11-18 10:17:34 +01:00
Gael Guennebaud	cebff7e3a2	bug #1343 : fix compilation regression in array = matrix_product	2016-11-18 10:09:33 +01:00
Benoit Steiner	7c30078b9f	Merged eigen/eigen into default	2016-11-17 22:53:37 -08:00
Chun Wang	0d0948c3b9	Workaround for error in VS2012 with /clr	2016-11-17 17:54:27 -05:00
Konstantinos Margaritis	672aa97d4d	implement float/std::complex<float> for ZVector as well, minor fixes to ZVector	2016-11-17 13:27:33 -05:00
Luke Iwanski	c5130dedbe	Specialised basic math functions for SYCL device.	2016-11-17 11:47:13 +00:00
Benoit Steiner	f2e8b73256	Enable the use of AVX512 instruction by default	2016-11-16 21:28:04 -08:00
Gael Guennebaud	7b09e4dd8c	bump default branch to 3.3.90	2016-11-16 22:20:58 +01:00
Benoit Steiner	dff9a049c4	Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs	2016-11-16 09:01:51 -08:00
Gael Guennebaud	0ee92aa38e	Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products.	2016-11-14 18:47:41 +01:00
Gael Guennebaud	2e334f5da0	bug #426 : move operator && and || to MatrixBase and SparseMatrixBase.	2016-11-14 18:47:02 +01:00
Gael Guennebaud	a048aba14c	Merged in olesalscheider/eigen (pull request PR-248) Make sure not to call numext::maxi on expression templates	2016-11-14 13:25:53 +00:00
Gael Guennebaud	eedb87f4ba	Fix regression in SparseMatrix::ReverseInnerIterator	2016-11-14 14:05:53 +01:00
Niels Ole Salscheider	51fef87408	Make sure not to call numext::maxi on expression templates	2016-11-12 12:20:57 +01:00
Gael Guennebaud	eeac81b8c0	bump to 3.3.0	2016-11-10 13:55:14 +01:00
Gael Guennebaud	e80bc2ddb0	Fix printing of sparse expressions	2016-11-10 10:35:32 +01:00
Benoit Steiner	db3903498d	Merged in benoitsteiner/opencl (pull request PR-246) Improved support for OpenCL	2016-11-08 22:28:44 +00:00
Gael Guennebaud	436a111792	Generalize Cholmod support to handle any sparse type as the rhs and result of the solve method	2016-11-06 20:29:23 +01:00
Gael Guennebaud	afc55b1885	Generalize IterativeSolverBase::solve to handle any sparse type as the result (instead of SparseMatrix only)	2016-11-06 20:28:18 +01:00
Gael Guennebaud	a5c2d8a3cc	Generalize solve_sparse_through_dense_panels to handle SparseVector.	2016-11-06 15:20:58 +01:00
Gael Guennebaud	f8bfe10613	Add missing friend declaration	2016-11-06 15:20:30 +01:00
Gael Guennebaud	fc7180cda8	Add a default ctor to evaluator<SparseVector>. Needed for evaluator<Solve>.	2016-11-06 15:20:00 +01:00
Gael Guennebaud	4d226ab5b5	Enable swapping between SparseMatrix and SparseVector	2016-11-06 15:15:03 +01:00
Gael Guennebaud	a354c3ca59	Fix compilation of LLT with complex<mpreal>.	2016-11-05 11:28:29 +01:00
Benoit Steiner	d46a36cc84	Merged eigen/eigen into default	2016-11-04 18:22:55 -07:00
Mehdi Goli	0ebe3808ca	Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;	2016-11-04 18:18:19 +00:00
Gael Guennebaud	ba05572dcb	bump to 3.3-rc2	2016-11-04 09:09:06 +01:00
Benoit Steiner	5c3995769c	Improved AVX512 configuration	2016-11-03 04:50:28 -07:00
Benoit Steiner	ca0ba0d9a4	Improved AVX512 support	2016-11-03 04:00:49 -07:00
Benoit Steiner	c80587c92b	Merged eigen/eigen into default	2016-11-03 03:55:11 -07:00
Gael Guennebaud	3f1d0cdc22	bug #1337 : improve doc of homogeneous() and hnormalized()	2016-11-03 11:03:08 +01:00
Gael Guennebaud	78e93ac1ad	bug #1330 : Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement.	2016-11-03 10:21:59 +01:00
Benoit Steiner	3e37166d0b	Merged in benoitsteiner/opencl (pull request PR-244) Disable vectorization on device only when compiling for sycl	2016-11-02 22:01:03 +00:00
Benoit Steiner	0585b2965d	Disable vectorization on device only when compiling for sycl	2016-11-02 11:44:27 -07:00
Gael Guennebaud	a07bb428df	bug #1004 : improve accuracy of LinSpaced for abs(low) >> abs(high).	2016-11-02 11:34:38 +01:00
Gael Guennebaud	598de8b193	Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.	2016-11-02 10:38:13 +01:00
Benoit Steiner	7a0e96b80d	Gate the code that refers to cuda fp16 primitives more thoroughly	2016-11-01 12:08:09 -07:00
Gael Guennebaud	3ecb343dc3	Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression.	2016-10-26 22:50:41 +02:00
Gael Guennebaud	97feea9d39	add a generic EIGEN_HAS_CXX11	2016-10-26 15:53:13 +02:00
Gael Guennebaud	ca6a2a5248	Fix warning with ICC	2016-10-26 14:13:05 +02:00
Gael Guennebaud	b15a5dc3f4	Fix ICC warnings	2016-10-25 22:20:24 +02:00
Gael Guennebaud	aad72f3c6d	Add missing inline keywords	2016-10-25 20:20:09 +02:00
Benoit Steiner	3e194a6a73	Fixed a typo	2016-10-25 08:42:15 -07:00
Gael Guennebaud	58146be99b	bug #1004 : one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity. This version simply does low+i*step plus a branch to return high if i==size-1. Vectorization is accomplished with a branch and the help of pinsertlast. Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.	2016-10-25 16:53:09 +02:00
Gael Guennebaud	13fc18d3a2	Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced)	2016-10-25 16:48:49 +02:00
Gael Guennebaud	2634f9386c	bug #1333 : fix bad usage of const_cast_derived. Better use .data() for that purpose.	2016-10-24 22:22:35 +02:00
Gael Guennebaud	9e8f07d7b5	Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors.	2016-10-24 22:16:48 +02:00
Gael Guennebaud	b027d7a8cf	bug #1004 : remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term. Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.	2016-10-24 20:27:21 +02:00
Benoit Steiner	b11aab5fcc	Merged in benoitsteiner/opencl (pull request PR-238) Added support for OpenCL to the Tensor Module	2016-10-24 15:30:45 +00:00
Gael Guennebaud	53c77061f0	bug #698 : rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible. Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution. This changeset also disable vectorization for this integer path.	2016-10-24 15:50:27 +02:00
Gael Guennebaud	40f62974b7	bug #1328 : workaround a compilation issue with gcc 4.2	2016-10-20 19:19:37 +02:00
Benoit Steiner	cf20b30d65	Merge latest updates from trunk	2016-10-20 09:42:05 -07:00
Benoit Steiner	d3943cd50c	Fixed a few typos in the ternary tensor expressions types	2016-10-19 12:56:12 -07:00
Mehdi Goli	8fb162fc85	Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core.	2016-10-16 12:52:34 +01:00
Luke Iwanski	2e188dd4d4	Merged ComputeCpp to default.	2016-10-14 16:47:40 +01:00
Mehdi Goli	15380f9a87	Applying Benoit's comment to return the missing line back in Eigen/Core	2016-10-14 16:39:41 +01:00
Gael Guennebaud	692b30ca95	Fix previous merge.	2016-10-14 17:16:28 +02:00
Gael Guennebaud	050c681bdd	Merged in rmlarsen/eigen2 (pull request PR-232) Improve performance of parallelized matrix multiply for rectangular matrices	2016-10-14 14:51:09 +00:00
Luke Iwanski	e742da8b28	Merged ComputeCpp into default.	2016-10-14 13:36:51 +01:00
Mehdi Goli	524fa4c46f	Reducing the code by generalising sycl backend functions/structs.	2016-10-14 12:09:55 +01:00
Benoit Steiner	737e4152c3	Merged in lukier/eigen (pull request PR-234) Enabling CUDA in Geometry	2016-10-13 18:09:28 +00:00
Robert Lukierski	a94791b69a	Fixes for min and abs after Benoit's comments, switched to numext.	2016-10-13 15:00:22 +01:00
Avi Ginsburg	ac63d6891c	Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure whether to limit the check to this compiler combination (` || (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `) or to leave it as it is. I also don't know if this will have any effect on including Eigen in device code (I'm not in my current project).	2016-10-13 08:47:32 +00:00
Benoit Steiner	7e4a6754b2	Merged eigen/eigen into default	2016-10-12 22:42:33 -07:00
Benoit Steiner	38b6048e14	Deleted redundant implementation of predux	2016-10-12 14:37:56 -07:00
Gael Guennebaud	e74612b9a0	Remove double ;;	2016-10-12 22:49:47 +02:00
Benoit Steiner	78d2926508	Merged eigen/eigen into default	2016-10-12 13:46:29 -07:00
Benoit Steiner	2e2f48e30e	Take advantage of AVX512 instructions whenever possible to speedup the processing of 16 bit floats.	2016-10-12 13:45:39 -07:00
Gael Guennebaud	f939c351cb	Fix SPQR for rectangular matrices	2016-10-12 22:39:33 +02:00
Robert Lukierski	471075f7ad	Fixes min() warnings.	2016-10-12 18:59:05 +01:00
Gael Guennebaud	5c366fe1d7	Merged in rmlarsen/eigen (pull request PR-230) Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1	2016-10-12 16:30:51 +00:00
Robert Lukierski	86711497c4	Adding EIGEN_DEVICE_FUNC in the Geometry module. Additional CUDA necessary fixes in the Core (mostly usage of EIGEN_USING_STD_MATH).	2016-10-12 16:35:17 +01:00
Rasmus Munk Larsen	47150af1c8	Fix copy-paste error: Must use _mm256_cmp_ps for AVX.	2016-10-12 08:34:39 -07:00
Gael Guennebaud	89e315152c	bug #1325 : fix compilation on NEON with clang	2016-10-12 16:55:47 +02:00
Benoit Steiner	5727e4d89c	Reenabled the use of variadic templates on tegra x1 provided that the latest version (i.e. JetPack 2.3) is used.	2016-10-08 22:19:03 +00:00
Benoit Steiner	5c68051cd7	Merge the content of the ComputeCpp branch into the default branch	2016-10-07 11:04:16 -07:00
Gael Guennebaud	4860727ac2	Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload)	2016-10-07 09:21:12 +02:00
Benoit Steiner	507b661106	Renamed predux_half into predux_downto4	2016-10-06 17:57:04 -07:00
Benoit Steiner	a498ff7df6	Fixed incorrect comment	2016-10-06 15:27:27 -07:00
Benoit Steiner	a7473d6d5a	Fixed compilation error with gcc >= 5.3	2016-10-06 14:33:22 -07:00
Benoit Steiner	5e64cea896	Silenced a compilation warning	2016-10-06 14:24:17 -07:00
Benoit Steiner	d485d12c51	Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.	2016-10-06 10:41:03 -07:00
Rasmus Munk Larsen	48c635e223	Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small. Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.

Improvements in Wall time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                  Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1            3088        1610    +47.9%
BM_OuterishProd/64/4            3562        2414    +32.2%
BM_OuterishProd/64/32           8861        7815    +11.8%
BM_OuterishProd/128/1          11363        6504    +42.8%
BM_OuterishProd/128/4          11128        9794    +12.0%
BM_OuterishProd/128/64         27691       27396     +1.1%
BM_OuterishProd/256/1          33214       28123    +15.3%
BM_OuterishProd/256/4          34312       36818     -7.3%
BM_OuterishProd/256/128       174866      176398     -0.9%
BM_OuterishProd/512/1        7963684      104224    +98.7%
BM_OuterishProd/512/4        7987913      112867    +98.6%
BM_OuterishProd/512/256      8198378     1306500    +84.1%
BM_OuterishProd/1k/1         7356256      324432    +95.6%
BM_OuterishProd/1k/4         8129616      331621    +95.9%
BM_OuterishProd/1k/512      27265418     7517538    +72.4%

Improvements in CPU time:

Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                  Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1            6169        1608    +73.9%
BM_OuterishProd/64/4            7117        2412    +66.1%
BM_OuterishProd/64/32          17702       15616    +11.8%
BM_OuterishProd/128/1          45415        6498    +85.7%
BM_OuterishProd/128/4          44459        9786    +78.0%
BM_OuterishProd/128/64        110657      109489     +1.1%
BM_OuterishProd/256/1         265158       28101    +89.4%
BM_OuterishProd/256/4         274234      183885    +32.9%
BM_OuterishProd/256/128      1397160     1408776     -0.8%
BM_OuterishProd/512/1       78947048      520703    +99.3%
BM_OuterishProd/512/4       86955578     1349742    +98.4%
BM_OuterishProd/512/256     74701613    15584661    +79.1%
BM_OuterishProd/1k/1        78352601     3877911    +95.1%
BM_OuterishProd/1k/4        78521643     3966221    +94.9%
BM_OuterishProd/1k/512     258104736    89480530    +65.3%	2016-10-06 10:33:10 -07:00
Benoit Steiner	9f3276981c	Enabling AVX512 should also enable AVX2.	2016-10-06 10:29:48 -07:00
Gael Guennebaud	80b5133789	Fix compilation of qr.inverse() for column and full pivoting variants.	2016-10-06 09:55:50 +02:00
Benoit Steiner	4131074818	Deleted unnecessary CMakeLists.txt file	2016-10-05 18:54:35 -07:00
Benoit Steiner	cb5cd69872	Silenced a compilation warning.	2016-10-05 18:50:53 -07:00
Benoit Steiner	78b569f685	Merged latest updates from trunk	2016-10-05 18:48:55 -07:00
Benoit Steiner	9c2b6c049b	Silenced a few compilation warnings	2016-10-05 18:37:31 -07:00
Benoit Steiner	ae1385c7e4	Pull the latest updates from trunk	2016-10-05 14:54:36 -07:00
Benoit Steiner	698ff69450	Properly characterize the CUDA packet primitives for fp16 as device only	2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen	7f67e6dfdb	Update comment for fast sqrt.	2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen	765615609d	Update comment for fast sqrt.	2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen	3ed67cb0bb	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.

Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
                SSE       AVX
Fast=1          2.529G    4.380G
Fast=0          1.944G    1.898G
Fast=1 fixed    2.214G    3.739G

This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.	2016-10-04 14:22:56 -07:00
Benoit Steiner	881b90e984	Use explicit type casting to generate packets of zeros.	2016-10-04 08:23:38 -07:00
Benoit Steiner	409e887d78	Added support for constant std::complex numbers on GPU	2016-10-03 11:06:24 -07:00
Gael Guennebaud	9d6d0dff8f	bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code. The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).	2016-10-01 15:37:00 +02:00
Gael Guennebaud	8b84801f7f	bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous	2016-09-30 22:49:59 +02:00
Gael Guennebaud	67b4f45836	Fix angle range	2016-09-30 12:46:33 +02:00
Gael Guennebaud	27f3970453	Remove std:: prefix	2016-09-30 12:40:41 +02:00
Gael Guennebaud	3860a0bc8f	bug #1312 : Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative.	2016-09-29 23:23:35 +02:00
Gael Guennebaud	33500050c3	bug #1308 : fix compilation of some small products involving nullary-expressions.	2016-09-29 09:40:44 +02:00
Benoit Steiner	27d7628f16	Updated the list of warnings to reflect the new message ids introduced in cuda 8.0	2016-09-28 17:42:59 -07:00
Gael Guennebaud	f3a00dd2b5	Merged in sergiu/eigen (pull request PR-229) Disabled MSVC level 4 warning C4714	2016-09-27 09:28:08 +02:00
Gael Guennebaud	892afb9416	Add debug info.	2016-09-26 23:53:57 +02:00
Gael Guennebaud	779774f98c	bug #1311 : fix alignment logic in some cases of (scalar*small).lazyProduct(small)	2016-09-26 23:53:40 +02:00
Gael Guennebaud	48dfe98abd	bug #1308 : fix compilation of vector * rowvector::nullary.	2016-09-25 14:54:35 +02:00
Sergiu Deitsch	fe29157d02	disabled MSVC level 4 warning C4714. The level 4 warning (/W4) warns about functions marked as __forceinline not inlined, and generates a lot of noise.	2016-09-25 14:25:47 +02:00
Gael Guennebaud	86caba838d	bug #1304 : fix Projective * scaling and Projective *= scaling	2016-09-23 13:41:21 +02:00
Benoit Steiner	2a69290ddb	Added a specialization of Eigen::numext::real and Eigen::numext::imag for std::complex<T> to be used when compiling a cuda kernel. This is unfortunately necessary to be able to process complex numbers from a CUDA kernel on MacOS.	2016-09-22 15:52:23 -07:00
Gael Guennebaud	77e27fbeee	bump to 3.3-rc1	2016-09-22 22:37:39 +02:00
Gael Guennebaud	2ada122bc6	merge	2016-09-22 22:33:18 +02:00
Gael Guennebaud	8f2bdde373	merge	2016-09-22 22:32:55 +02:00
Gael Guennebaud	ba0f844d6b	Backout changeset `ce3557ca69`	2016-09-22 22:28:51 +02:00
Benoit Steiner	50e3bbfc90	Calls x.imag() instead of imag(x) when x is a complex number since the former is a constexpr while the latter isn't. This fixes compilation errors triggered by nvcc on Mac.	2016-09-22 13:17:25 -07:00
Gael Guennebaud	ca3746c6f8	Bypass identity reflectors.	2016-09-22 22:07:13 +02:00
Felix Gruber	8bde7da086	fix documentation of LinSpaced. The index of the highest value in a LinSpace is size-1.	2016-09-22 14:50:07 +02:00
Gael Guennebaud	66cbabafed	Add a note regarding gcc bug #72867	2016-09-22 11:18:52 +02:00
Gael Guennebaud	9fa2c8650e	Fix alignment of statically allocated temporaries in symv, and trmv.	2016-09-21 17:34:24 +02:00
Gael Guennebaud	ac5377e161	Improve cost estimation of complex division	2016-09-21 17:26:04 +02:00
Benoit Steiner	26f9907542	Added missing typedefs	2016-09-20 12:58:03 -07:00
RJ Ryan	b2c6dc48d9	Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.	2016-09-20 07:18:20 -07:00
Benoit Steiner	8a66ca4b10	Pulled latest updates from trunk	2016-09-19 14:13:55 -07:00
Benoit Steiner	59e9edfbf1	Removed EIGEN_DEVICE_FUNC qualifiers for the lu(), fullPivLu(), partialPivLu(), and inverse() functions since they aren't ready to run on GPU	2016-09-19 14:13:20 -07:00
Hongkai Dai	5dcc6d301a	remove ternary operator in euler angles	2016-09-19 10:30:30 -07:00
Luke Iwanski	b91e021172	Merged with default.	2016-09-19 14:03:54 +01:00
Luke Iwanski	cb81975714	Partial OpenCL support via SYCL compatible with ComputeCpp CE.	2016-09-19 12:44:13 +01:00
Gael Guennebaud	4cc2c73e6a	Fix alignment of statically allocated temporaries in gemv.	2016-09-17 12:52:27 +02:00
Christoph Hertzberg	ce3557ca69	Make makeHouseholder more stable for cases where real(c0) is not very small (but the rest is).	2016-09-16 14:24:47 +02:00
Gael Guennebaud	ee62f168e6	Doc: add link from block methods to respective tutorial section.	2016-09-16 11:26:25 +02:00
Gael Guennebaud	ca7f061a5f	bug #828 : clarify documentation of SparseMatrixBase's methods returning a sub-matrix.	2016-09-16 11:23:19 +02:00
Gael Guennebaud	50e203c717	bug #828 : clarify documentation of SparseMatrixBase's unary methods.	2016-09-16 10:40:50 +02:00
Gael Guennebaud	fa9049a544	Let's be consistent and consider any denormal number as zero.	2016-09-15 11:24:03 +02:00
Gael Guennebaud	b33144e4df	merge	2016-09-15 11:22:16 +02:00
Benoit Steiner	c0d56a543e	Added several missing EIGEN_DEVICE_FUNC qualifiers	2016-09-14 14:06:21 -07:00
Benoit Steiner	779faaaeba	Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling the EIGEN_THROW macro	2016-09-14 09:56:11 -07:00
Gael Guennebaud	1c8347e554	Fix product for custom complex type. (conjugation was ignored)	2016-09-14 18:28:49 +02:00
Benoit Steiner	ff47717f25	Suppress warning 2527 and 2529, which correspond to the "calling a __host__ function from a __host__ __device__ function is not allowed" message in nvcc 6.5.	2016-09-13 12:49:40 -07:00
Benoit Steiner	309190cf02	Suppress message 1222 when compiling with nvcc: this ensures that we don't get warnings about unknown warning messages when compiling with older versions of nvcc	2016-09-13 12:42:13 -07:00
Gael Guennebaud	c10620b2b0	Fix typo in doc.	2016-09-13 09:25:07 +02:00
Gael Guennebaud	73c8f2f697	bug #1285 : fix regression introduced in changeset `00c29c2cae`	2016-09-13 07:58:39 +02:00
Benoit Steiner	5f50f12d2c	Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem.	2016-09-12 13:46:13 -07:00
Gael Guennebaud	228ae29591	Fix compilation on 32 bits systems.	2016-09-09 22:34:38 +02:00
Gael Guennebaud	471eac5399	bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)	2016-09-08 08:36:27 +02:00
Gael Guennebaud	d780983f59	Doc: explain minimal requirements on nullary functors	2016-09-06 23:14:52 +02:00
Gael Guennebaud	85fb517eaf	Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only).	2016-09-06 17:23:15 +02:00
Gael Guennebaud	447f269561	Disable previous workaround.	2016-09-06 15:49:02 +02:00
Gael Guennebaud	b046a3f87d	Workaround MSVC instantiation failure of has_ary_operator at the level of traits<Ref>::match so that the has_ary_operator are really properly instantiated throughout the compilation unit.	2016-09-06 15:47:04 +02:00
Gael Guennebaud	3cb914f332	bug #1266 : remove CUDA guards on MatrixBase::<decomposition> definitions. (those used to break old nvcc versions that we probably don't care about anymore)	2016-09-06 09:55:50 +02:00
Gael Guennebaud	19a95b3309	Fix shadowing wrt Eigen::Index	2016-09-05 17:19:47 +02:00
Gael Guennebaud	e13071dd13	Workaround a weird msvc 2012 compilation error.	2016-09-05 15:50:41 +02:00
Gael Guennebaud	d123717e21	Fix for msvc 2012 and older	2016-09-05 15:26:56 +02:00
Benoit Steiner	373c340b71	Fixed a typo	2016-09-02 15:41:17 -07:00
Benoit Steiner	5a6be66cef	Turned the Index type used by the nullary wrapper into a template parameter.	2016-09-02 14:10:29 -07:00
Gael Guennebaud	d6c8366d84	Fix compilation with MSVC 2012	2016-09-02 15:23:32 +02:00
Gael Guennebaud	ef54723dbe	One more msvc fix iteration, the previous one was over-simplified for visual	2016-09-01 15:04:53 +02:00
Gael Guennebaud	f9f32e9e2d	Fix compilation with nvcc	2016-09-01 13:06:14 +02:00
Gael Guennebaud	3d946e42b3	Fix compilation with visual studio	2016-09-01 12:59:32 +02:00
Gael Guennebaud	836fa25a82	Make sure sizeof is truly needed, thus improving SFINAE portability.	2016-08-31 23:40:18 +02:00

5283 Commits
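
Commit 58146be99b in the log above describes the floating-point LinSpaced rewrite as computing low+i*step, with a branch that returns high when i==size-1. A minimal standalone sketch of that idea follows. It is not Eigen's implementation: the function name `lin_spaced_sketch` is hypothetical, it uses `std::vector<double>` instead of Eigen types, it leaves out the vectorized path (pinsertlast), and its handling of size 0 and size 1 is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper illustrating the scheme named in commit 58146be99b:
// every entry is low + i*step, except the last one, which is set to `high`
// by an explicit branch so the upper bound is reproduced exactly.
std::vector<double> lin_spaced_sketch(std::size_t size, double low, double high) {
    std::vector<double> out(size);
    if (size == 0) return out;          // assumption: empty result for size 0
    if (size == 1) {                    // assumption: a single entry holds `high`
        out[0] = high;
        return out;
    }
    const double step = (high - low) / static_cast<double>(size - 1);
    for (std::size_t i = 0; i < size; ++i) {
        // The branch guards against rounding error in low + (size-1)*step.
        out[i] = (i == size - 1) ? high : low + static_cast<double>(i) * step;
    }
    return out;
}
```

In this sketch the first entry equals low because i*step is zero at i=0, and the last entry equals high because of the branch, so both bounds are reproduced without relying on floating-point rounding.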