eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-01-06 14:14:46 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	3ecb343dc3	Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression.	2016-10-26 22:50:41 +02:00
Gael Guennebaud	97feea9d39	add a generic EIGEN_HAS_CXX11	2016-10-26 15:53:13 +02:00
Gael Guennebaud	ca6a2a5248	Fix warning with ICC	2016-10-26 14:13:05 +02:00
Gael Guennebaud	b15a5dc3f4	Fix ICC warnings	2016-10-25 22:20:24 +02:00
Gael Guennebaud	aad72f3c6d	Add missing inline keywords	2016-10-25 20:20:09 +02:00
Benoit Steiner	3e194a6a73	Fixed a typo	2016-10-25 08:42:15 -07:00
Gael Guennebaud	58146be99b	bug #1004 : one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity. This version simply does low+i*step plus a branch to return high if i==size-1. Vectorization is accomplished with a branch and the help of pinsertlast. Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.	2016-10-25 16:53:09 +02:00
Gael Guennebaud	13fc18d3a2	Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced)	2016-10-25 16:48:49 +02:00
Gael Guennebaud	2634f9386c	bug #1333 : fix bad usage of const_cast_derived. Better use .data() for that purpose.	2016-10-24 22:22:35 +02:00
Gael Guennebaud	9e8f07d7b5	Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors.	2016-10-24 22:16:48 +02:00
Gael Guennebaud	b027d7a8cf	bug #1004 : remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term. Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.	2016-10-24 20:27:21 +02:00
Benoit Steiner	b11aab5fcc	Merged in benoitsteiner/opencl (pull request PR-238) Added support for OpenCL to the Tensor Module	2016-10-24 15:30:45 +00:00
Gael Guennebaud	53c77061f0	bug #698 : rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible. Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution. This changeset also disables vectorization for this integer path.	2016-10-24 15:50:27 +02:00
Gael Guennebaud	40f62974b7	bug #1328 : workaround a compilation issue with gcc 4.2	2016-10-20 19:19:37 +02:00
Benoit Steiner	cf20b30d65	Merge latest updates from trunk	2016-10-20 09:42:05 -07:00
Benoit Steiner	d3943cd50c	Fixed a few typos in the ternary tensor expressions types	2016-10-19 12:56:12 -07:00
Mehdi Goli	8fb162fc85	Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core.	2016-10-16 12:52:34 +01:00
Luke Iwanski	2e188dd4d4	Merged ComputeCpp to default.	2016-10-14 16:47:40 +01:00
Mehdi Goli	15380f9a87	Applying Benoit's comment to return the missing line back in Eigen/Core	2016-10-14 16:39:41 +01:00
Gael Guennebaud	692b30ca95	Fix previous merge.	2016-10-14 17:16:28 +02:00
Gael Guennebaud	050c681bdd	Merged in rmlarsen/eigen2 (pull request PR-232) Improve performance of parallelized matrix multiply for rectangular matrices	2016-10-14 14:51:09 +00:00
Luke Iwanski	e742da8b28	Merged ComputeCpp into default.	2016-10-14 13:36:51 +01:00
Mehdi Goli	524fa4c46f	Reducing the code by generalising sycl backend functions/structs.	2016-10-14 12:09:55 +01:00
Benoit Steiner	737e4152c3	Merged in lukier/eigen (pull request PR-234) Enabling CUDA in Geometry	2016-10-13 18:09:28 +00:00
Robert Lukierski	a94791b69a	Fixes for min and abs after Benoit's comments, switched to numext.	2016-10-13 15:00:22 +01:00
Avi Ginsburg	ac63d6891c	Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure whether to limit the check to this compiler combination (` || (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `) or to leave it as it is. I also don't know if this will have any effect on including Eigen in device code (I'm not in my current project).	2016-10-13 08:47:32 +00:00
Benoit Steiner	7e4a6754b2	Merged eigen/eigen into default	2016-10-12 22:42:33 -07:00
Gael Guennebaud	e74612b9a0	Remove double ;;	2016-10-12 22:49:47 +02:00
Gael Guennebaud	f939c351cb	Fix SPQR for rectangular matrices	2016-10-12 22:39:33 +02:00
Robert Lukierski	471075f7ad	Fixes min() warnings.	2016-10-12 18:59:05 +01:00
Gael Guennebaud	5c366fe1d7	Merged in rmlarsen/eigen (pull request PR-230) Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1	2016-10-12 16:30:51 +00:00
Robert Lukierski	86711497c4	Adding EIGEN_DEVICE_FUNC in the Geometry module. Additional CUDA necessary fixes in the Core (mostly usage of EIGEN_USING_STD_MATH).	2016-10-12 16:35:17 +01:00
Rasmus Munk Larsen	47150af1c8	Fix copy-paste error: Must use _mm256_cmp_ps for AVX.	2016-10-12 08:34:39 -07:00
Gael Guennebaud	89e315152c	bug #1325 : fix compilation on NEON with clang	2016-10-12 16:55:47 +02:00
Benoit Steiner	5727e4d89c	Reenabled the use of variadic templates on tegra x1 provided that the latest version (i.e. JetPack 2.3) is used.	2016-10-08 22:19:03 +00:00
Benoit Steiner	5c68051cd7	Merge the content of the ComputeCpp branch into the default branch	2016-10-07 11:04:16 -07:00
Gael Guennebaud	4860727ac2	Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload)	2016-10-07 09:21:12 +02:00
Benoit Steiner	d485d12c51	Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.	2016-10-06 10:41:03 -07:00
Rasmus Munk Larsen	48c635e223	Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small. Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.	2016-10-06 10:33:10 -07:00
    Improvements in Wall time:
    Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
    CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
    Benchmark                  Base (ns)   New (ns)  Improvement
    ------------------------------------------------------------
    BM_OuterishProd/64/1            3088       1610       +47.9%
    BM_OuterishProd/64/4            3562       2414       +32.2%
    BM_OuterishProd/64/32           8861       7815       +11.8%
    BM_OuterishProd/128/1          11363       6504       +42.8%
    BM_OuterishProd/128/4          11128       9794       +12.0%
    BM_OuterishProd/128/64         27691      27396        +1.1%
    BM_OuterishProd/256/1          33214      28123       +15.3%
    BM_OuterishProd/256/4          34312      36818        -7.3%
    BM_OuterishProd/256/128       174866     176398        -0.9%
    BM_OuterishProd/512/1        7963684     104224       +98.7%
    BM_OuterishProd/512/4        7987913     112867       +98.6%
    BM_OuterishProd/512/256      8198378    1306500       +84.1%
    BM_OuterishProd/1k/1         7356256     324432       +95.6%
    BM_OuterishProd/1k/4         8129616     331621       +95.9%
    BM_OuterishProd/1k/512      27265418    7517538       +72.4%

    Improvements in CPU time:
    Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
    CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
    Benchmark                  Base (ns)   New (ns)  Improvement
    ------------------------------------------------------------
    BM_OuterishProd/64/1            6169       1608       +73.9%
    BM_OuterishProd/64/4            7117       2412       +66.1%
    BM_OuterishProd/64/32          17702      15616       +11.8%
    BM_OuterishProd/128/1          45415       6498       +85.7%
    BM_OuterishProd/128/4          44459       9786       +78.0%
    BM_OuterishProd/128/64        110657     109489        +1.1%
    BM_OuterishProd/256/1         265158      28101       +89.4%
    BM_OuterishProd/256/4         274234     183885       +32.9%
    BM_OuterishProd/256/128      1397160    1408776        -0.8%
    BM_OuterishProd/512/1       78947048     520703       +99.3%
    BM_OuterishProd/512/4       86955578    1349742       +98.4%
    BM_OuterishProd/512/256     74701613   15584661       +79.1%
    BM_OuterishProd/1k/1        78352601    3877911       +95.1%
    BM_OuterishProd/1k/4        78521643    3966221       +94.9%
    BM_OuterishProd/1k/512     258104736   89480530       +65.3%
Gael Guennebaud	80b5133789	Fix compilation of qr.inverse() for column and full pivoting variants.	2016-10-06 09:55:50 +02:00
Benoit Steiner	ae1385c7e4	Pull the latest updates from trunk	2016-10-05 14:54:36 -07:00
Benoit Steiner	698ff69450	Properly characterize the CUDA packet primitives for fp16 as device only	2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen	7f67e6dfdb	Update comment for fast sqrt.	2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen	765615609d	Update comment for fast sqrt.	2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen	3ed67cb0bb	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.	2016-10-04 14:22:56 -07:00
    Benchmark speed in Giga-sqrts/s
    Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
    -----------------------------------------
                    SSE       AVX
    Fast=1          2.529G    4.380G
    Fast=0          1.944G    1.898G
    Fast=1 fixed    2.214G    3.739G

    This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.
Benoit Steiner	881b90e984	Use explicit type casting to generate packets of zeros.	2016-10-04 08:23:38 -07:00
Benoit Steiner	409e887d78	Added support for constant std::complex numbers on GPU	2016-10-03 11:06:24 -07:00
Gael Guennebaud	9d6d0dff8f	bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code. The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).	2016-10-01 15:37:00 +02:00
Gael Guennebaud	8b84801f7f	bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous	2016-09-30 22:49:59 +02:00
Gael Guennebaud	67b4f45836	Fix angle range	2016-09-30 12:46:33 +02:00


5016 Commits