Mehdi Goli
524fa4c46f
Reducing the code by generalising sycl backend functions/structs.
2016-10-14 12:09:55 +01:00
Benoit Steiner
737e4152c3
Merged in lukier/eigen (pull request PR-234)
...
Enabling CUDA in Geometry
2016-10-13 18:09:28 +00:00
Robert Lukierski
a94791b69a
Fixes for min and abs after Benoit's comments, switched to numext.
2016-10-13 15:00:22 +01:00
Avi Ginsburg
ac63d6891c
Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included.
I'm not sure whether to limit the check to this compiler combination
(` || (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `)
or to leave it as it is. I also don't know if this will have any effect on
including Eigen in device code (I don't do that in my current project).
2016-10-13 08:47:32 +00:00
Benoit Steiner
7e4a6754b2
Merged eigen/eigen into default
2016-10-12 22:42:33 -07:00
Benoit Steiner
38b6048e14
Deleted redundant implementation of predux
2016-10-12 14:37:56 -07:00
Gael Guennebaud
e74612b9a0
Remove double ;;
2016-10-12 22:49:47 +02:00
Benoit Steiner
78d2926508
Merged eigen/eigen into default
2016-10-12 13:46:29 -07:00
Benoit Steiner
2e2f48e30e
Take advantage of AVX512 instructions whenever possible to speed up the processing of 16 bit floats.
2016-10-12 13:45:39 -07:00
Gael Guennebaud
f939c351cb
Fix SPQR for rectangular matrices
2016-10-12 22:39:33 +02:00
Robert Lukierski
471075f7ad
Fixes min() warnings.
2016-10-12 18:59:05 +01:00
Gael Guennebaud
5c366fe1d7
Merged in rmlarsen/eigen (pull request PR-230)
...
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
2016-10-12 16:30:51 +00:00
Robert Lukierski
86711497c4
Adding EIGEN_DEVICE_FUNC in the Geometry module.
...
Additional CUDA necessary fixes in the Core (mostly usage of
EIGEN_USING_STD_MATH).
2016-10-12 16:35:17 +01:00
Rasmus Munk Larsen
47150af1c8
Fix copy-paste error: Must use _mm256_cmp_ps for AVX.
2016-10-12 08:34:39 -07:00
Gael Guennebaud
89e315152c
bug #1325 : fix compilation on NEON with clang
2016-10-12 16:55:47 +02:00
Benoit Steiner
5727e4d89c
Re-enabled the use of variadic templates on tegra x1, provided that the latest version (i.e. JetPack 2.3) is used.
2016-10-08 22:19:03 +00:00
Benoit Steiner
5c68051cd7
Merge the content of the ComputeCpp branch into the default branch
2016-10-07 11:04:16 -07:00
Gael Guennebaud
4860727ac2
Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload)
2016-10-07 09:21:12 +02:00
Benoit Steiner
507b661106
Renamed predux_half into predux_downto4
2016-10-06 17:57:04 -07:00
Benoit Steiner
a498ff7df6
Fixed incorrect comment
2016-10-06 15:27:27 -07:00
Benoit Steiner
a7473d6d5a
Fixed compilation error with gcc >= 5.3
2016-10-06 14:33:22 -07:00
Benoit Steiner
5e64cea896
Silenced a compilation warning
2016-10-06 14:24:17 -07:00
Benoit Steiner
d485d12c51
Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.
2016-10-06 10:41:03 -07:00
Rasmus Munk Larsen
48c635e223
Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small.
...
Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
Improvements in Wall time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                     Base (ns)     New (ns)   Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1               3088         1610        +47.9%
BM_OuterishProd/64/4               3562         2414        +32.2%
BM_OuterishProd/64/32              8861         7815        +11.8%
BM_OuterishProd/128/1             11363         6504        +42.8%
BM_OuterishProd/128/4             11128         9794        +12.0%
BM_OuterishProd/128/64            27691        27396         +1.1%
BM_OuterishProd/256/1             33214        28123        +15.3%
BM_OuterishProd/256/4             34312        36818         -7.3%
BM_OuterishProd/256/128          174866       176398         -0.9%
BM_OuterishProd/512/1           7963684       104224        +98.7%
BM_OuterishProd/512/4           7987913       112867        +98.6%
BM_OuterishProd/512/256         8198378      1306500        +84.1%
BM_OuterishProd/1k/1            7356256       324432        +95.6%
BM_OuterishProd/1k/4            8129616       331621        +95.9%
BM_OuterishProd/1k/512         27265418      7517538        +72.4%
Improvements in CPU time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                     Base (ns)     New (ns)   Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1               6169         1608        +73.9%
BM_OuterishProd/64/4               7117         2412        +66.1%
BM_OuterishProd/64/32             17702        15616        +11.8%
BM_OuterishProd/128/1             45415         6498        +85.7%
BM_OuterishProd/128/4             44459         9786        +78.0%
BM_OuterishProd/128/64           110657       109489         +1.1%
BM_OuterishProd/256/1            265158        28101        +89.4%
BM_OuterishProd/256/4            274234       183885        +32.9%
BM_OuterishProd/256/128         1397160      1408776         -0.8%
BM_OuterishProd/512/1          78947048       520703        +99.3%
BM_OuterishProd/512/4          86955578      1349742        +98.4%
BM_OuterishProd/512/256        74701613     15584661        +79.1%
BM_OuterishProd/1k/1           78352601      3877911        +95.1%
BM_OuterishProd/1k/4           78521643      3966221        +94.9%
BM_OuterishProd/1k/512        258104736     89480530        +65.3%
2016-10-06 10:33:10 -07:00
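The idea of such a cost model can be sketched as follows. This is only an illustration under stated assumptions, not the heuristic the commit implements: the function name `chooseNumThreads` and the constant `kMinWorkPerThread` are hypothetical, and the threshold value is arbitrary.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: cap the thread count so that every thread receives at
// least kMinWorkPerThread multiply-adds. The constant is an assumption made
// for illustration; it is not the value used by Eigen.
inline int chooseNumThreads(std::int64_t rows, std::int64_t cols,
                            std::int64_t depth, int maxThreads) {
  const std::int64_t kMinWorkPerThread = 50000;
  // Number of multiply-adds in a (rows x depth) * (depth x cols) product.
  const std::int64_t work = rows * cols * depth;
  // Threads that can be kept busy, but never fewer than one.
  const std::int64_t byWork =
      std::max<std::int64_t>(1, work / kMinWorkPerThread);
  return static_cast<int>(std::min<std::int64_t>(maxThreads, byWork));
}
```

With these assumed numbers, a 64x1 * 1x64 product stays on a single thread, while a 1024x512 * 512x1024 product may still use all available threads.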
Benoit Steiner
9f3276981c
Enabling AVX512 should also enable AVX2.
2016-10-06 10:29:48 -07:00
Gael Guennebaud
80b5133789
Fix compilation of qr.inverse() for column and full pivoting variants.
2016-10-06 09:55:50 +02:00
Benoit Steiner
4131074818
Deleted unnecessary CMakeLists.txt file
2016-10-05 18:54:35 -07:00
Benoit Steiner
cb5cd69872
Silenced a compilation warning.
2016-10-05 18:50:53 -07:00
Benoit Steiner
78b569f685
Merged latest updates from trunk
2016-10-05 18:48:55 -07:00
Benoit Steiner
9c2b6c049b
Silenced a few compilation warnings
2016-10-05 18:37:31 -07:00
Benoit Steiner
ae1385c7e4
Pull the latest updates from trunk
2016-10-05 14:54:36 -07:00
Benoit Steiner
698ff69450
Properly characterize the CUDA packet primitives for fp16 as device only
2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen
7f67e6dfdb
Update comment for fast sqrt.
2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen
765615609d
Update comment for fast sqrt.
2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb
Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
...
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
                 SSE       AVX
Fast=1           2.529G    4.380G
Fast=0           1.944G    1.898G
Fast=1 fixed     2.214G    3.739G
This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.
2016-10-04 14:22:56 -07:00
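A scalar sketch of how a masking step can turn NaN into -0.0 for negative inputs. The vectorized code is not shown in this log, so both functions below are assumptions made for illustration: `rsqrtRef` stands in for the hardware reciprocal square root estimate, and the function names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Stand-in for the hardware reciprocal square root estimate plus refinement.
inline float rsqrtRef(float x) { return 1.0f / std::sqrt(x); }

// Assumed pre-fix behaviour: every input below the smallest normal float,
// including negative ones, has its reciprocal square root masked to zero.
inline float fastSqrtBefore(float x) {
  const float r = (x >= std::numeric_limits<float>::min()) ? rsqrtRef(x) : 0.0f;
  return x * r;  // a negative x times +0 gives -0.0
}

// Assumed post-fix behaviour: only non-negative tiny inputs are flushed, so
// the NaN produced for negative inputs propagates to the result.
inline float fastSqrtAfter(float x) {
  const bool flush = (x >= 0.0f) && (x < std::numeric_limits<float>::min());
  const float r = flush ? 0.0f : rsqrtRef(x);
  return x * r;
}
```

The extra comparison in the second variant is consistent with the small slowdown of the "Fast=1 fixed" row relative to "Fast=1".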
Benoit Steiner
881b90e984
Use explicit type casting to generate packets of zeros.
2016-10-04 08:23:38 -07:00
Benoit Steiner
409e887d78
Added support for constant std::complex numbers on GPU
2016-10-03 11:06:24 -07:00
Gael Guennebaud
9d6d0dff8f
bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code.
...
The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).
2016-10-01 15:37:00 +02:00
Gael Guennebaud
8b84801f7f
bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous
2016-09-30 22:49:59 +02:00
Gael Guennebaud
67b4f45836
Fix angle range
2016-09-30 12:46:33 +02:00
Gael Guennebaud
27f3970453
Remove std:: prefix
2016-09-30 12:40:41 +02:00
Gael Guennebaud
3860a0bc8f
bug #1312 : Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative.
2016-09-29 23:23:35 +02:00
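One way to obtain a bounded angle from a unit quaternion is sketched below in plain C++, without Eigen types. The struct `AxisAngle` and the function `toAxisAngle` are hypothetical names for this illustration, and the sketch actually yields an angle in [0, pi], which lies inside the range stated in the commit message. Using atan2 on the vector norm and |w| avoids the loss of accuracy that acos suffers when |w| is close to 1.

```cpp
#include <cmath>

// Result of the conversion: rotation angle (radians) and unit axis.
struct AxisAngle {
  double angle;
  double axis[3];
};

// Converts a unit quaternion (w, x, y, z) to axis-angle form.
inline AxisAngle toAxisAngle(double w, double x, double y, double z) {
  double n = std::sqrt(x * x + y * y + z * z);  // norm of the vector part
  if (n == 0.0) {
    // No rotation: the axis is arbitrary, pick the x axis.
    return AxisAngle{0.0, {1.0, 0.0, 0.0}};
  }
  // Using |w| keeps the angle in [0, pi].
  const double angle = 2.0 * std::atan2(n, std::fabs(w));
  // q and -q encode the same rotation, so flip the axis when w is negative.
  if (w < 0.0) n = -n;
  return AxisAngle{angle, {x / n, y / n, z / n}};
}
```

For example, a rotation of 3*pi/2 about +z has a negative w and comes out as a rotation of pi/2 about -z.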
Gael Guennebaud
33500050c3
bug #1308 : fix compilation of some small products involving nullary-expressions.
2016-09-29 09:40:44 +02:00
Benoit Steiner
27d7628f16
Updated the list of warnings to reflect the new message ids introduced in cuda 8.0
2016-09-28 17:42:59 -07:00
Gael Guennebaud
f3a00dd2b5
Merged in sergiu/eigen (pull request PR-229)
...
Disabled MSVC level 4 warning C4714
2016-09-27 09:28:08 +02:00
Gael Guennebaud
892afb9416
Add debug info.
2016-09-26 23:53:57 +02:00
Gael Guennebaud
779774f98c
bug #1311 : fix alignment logic in some cases of (scalar*small).lazyProduct(small)
2016-09-26 23:53:40 +02:00
Gael Guennebaud
48dfe98abd
bug #1308 : fix compilation of vector * rowvector::nullary.
2016-09-25 14:54:35 +02:00
Sergiu Deitsch
fe29157d02
disabled MSVC level 4 warning C4714
...
The level 4 warning (/W4) warns about functions marked as __forceinline not
inlined, and generates a lot of noise.
2016-09-25 14:25:47 +02:00
Gael Guennebaud
86caba838d
bug #1304 : fix Projective * scaling and Projective *= scaling
2016-09-23 13:41:21 +02:00
Benoit Steiner
2a69290ddb
Added a specialization of Eigen::numext::real and Eigen::numext::imag for std::complex<T> to be used when compiling a cuda kernel. This is unfortunately necessary to be able to process complex numbers from a CUDA kernel on MacOS.
2016-09-22 15:52:23 -07:00
Gael Guennebaud
77e27fbeee
bump to 3.3-rc1
2016-09-22 22:37:39 +02:00
Gael Guennebaud
2ada122bc6
merge
2016-09-22 22:33:18 +02:00
Gael Guennebaud
8f2bdde373
merge
2016-09-22 22:32:55 +02:00
Gael Guennebaud
ba0f844d6b
Backout changeset ce3557ca69
2016-09-22 22:28:51 +02:00
Benoit Steiner
50e3bbfc90
Calls x.imag() instead of imag(x) when x is a complex number since the former is a constexpr while the latter isn't. This fixes compilation errors triggered by nvcc on Mac.
2016-09-22 13:17:25 -07:00
Gael Guennebaud
ca3746c6f8
Bypass identity reflectors.
2016-09-22 22:07:13 +02:00
Felix Gruber
8bde7da086
fix documentation of LinSpaced
...
The index of the highest value in a LinSpace is size-1.
2016-09-22 14:50:07 +02:00
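The indexing can be illustrated with a plain C++ sketch of a linearly spaced sequence. The function `linSpaced` below is a hypothetical stand-in, not Eigen's implementation, and its handling of `size == 1` (returning `low`) is a choice of this sketch only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: `size` values from `low` to `high`, both included.
// Element i is low + i * step, so the highest value sits at index size - 1.
inline std::vector<double> linSpaced(std::size_t size, double low, double high) {
  std::vector<double> v(size);
  // With a single element there is no step; this sketch then returns `low`.
  const double step = size > 1 ? (high - low) / static_cast<double>(size - 1) : 0.0;
  for (std::size_t i = 0; i < size; ++i) {
    v[i] = low + static_cast<double>(i) * step;
  }
  return v;
}
```

For `linSpaced(5, 0.0, 1.0)` the step is 0.25 and the value 1.0 is stored at index 4.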
Gael Guennebaud
66cbabafed
Add a note regarding gcc bug #72867
2016-09-22 11:18:52 +02:00
Gael Guennebaud
9fa2c8650e
Fix alignment of statically allocated temporaries in symv, and trmv.
2016-09-21 17:34:24 +02:00
Gael Guennebaud
ac5377e161
Improve cost estimation of complex division
2016-09-21 17:26:04 +02:00
Benoit Steiner
26f9907542
Added missing typedefs
2016-09-20 12:58:03 -07:00
RJ Ryan
b2c6dc48d9
Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.
2016-09-20 07:18:20 -07:00
Benoit Steiner
8a66ca4b10
Pulled latest updates from trunk
2016-09-19 14:13:55 -07:00
Benoit Steiner
59e9edfbf1
Removed EIGEN_DEVICE_FUNC qualifiers for the lu(), fullPivLu(), partialPivLu(), and inverse() functions since they aren't ready to run on GPU
2016-09-19 14:13:20 -07:00
Hongkai Dai
5dcc6d301a
remove ternary operator in euler angles
2016-09-19 10:30:30 -07:00
Luke Iwanski
b91e021172
Merged with default.
2016-09-19 14:03:54 +01:00
Luke Iwanski
cb81975714
Partial OpenCL support via SYCL compatible with ComputeCpp CE.
2016-09-19 12:44:13 +01:00
Gael Guennebaud
4cc2c73e6a
Fix alignment of statically allocated temporaries in gemv.
2016-09-17 12:52:27 +02:00
Christoph Hertzberg
ce3557ca69
Make makeHouseholder more stable for cases where real(c0) is not very small (but the rest is).
2016-09-16 14:24:47 +02:00
Gael Guennebaud
ee62f168e6
Doc: add link from block methods to respective tutorial section.
2016-09-16 11:26:25 +02:00
Gael Guennebaud
ca7f061a5f
bug #828 : clarify documentation of SparseMatrixBase's methods returning a sub-matrix.
2016-09-16 11:23:19 +02:00
Gael Guennebaud
50e203c717
bug #828 : clarify documentation of SparseMatrixBase's unary methods.
2016-09-16 10:40:50 +02:00
Gael Guennebaud
fa9049a544
Let's be consistent and consider any denormal number as zero.
2016-09-15 11:24:03 +02:00
Gael Guennebaud
b33144e4df
merge
2016-09-15 11:22:16 +02:00
Benoit Steiner
c0d56a543e
Added several missing EIGEN_DEVICE_FUNC qualifiers
2016-09-14 14:06:21 -07:00
Benoit Steiner
779faaaeba
Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling the EIGEN_THROW macro
2016-09-14 09:56:11 -07:00
Gael Guennebaud
1c8347e554
Fix product for custom complex type. (conjugation was ignored)
2016-09-14 18:28:49 +02:00
Benoit Steiner
ff47717f25
Suppress warning 2527 and 2529, which correspond to the "calling a __host__ function from a __host__ __device__ function is not allowed" message in nvcc 6.5.
2016-09-13 12:49:40 -07:00
Benoit Steiner
309190cf02
Suppress message 1222 when compiling with nvcc: this ensures that we don't get warnings about unknown warning messages when compiling with older versions of nvcc
2016-09-13 12:42:13 -07:00
Gael Guennebaud
c10620b2b0
Fix typo in doc.
2016-09-13 09:25:07 +02:00
Gael Guennebaud
73c8f2f697
bug #1285 : fix regression introduced in changeset 00c29c2cae
2016-09-13 07:58:39 +02:00
Benoit Steiner
5f50f12d2c
Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem.
2016-09-12 13:46:13 -07:00
Gael Guennebaud
228ae29591
Fix compilation on 32 bits systems.
2016-09-09 22:34:38 +02:00
Gael Guennebaud
471eac5399
bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)
2016-09-08 08:36:27 +02:00
Gael Guennebaud
d780983f59
Doc: explain minimal requirements on nullary functors
2016-09-06 23:14:52 +02:00
Gael Guennebaud
85fb517eaf
Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only).
2016-09-06 17:23:15 +02:00
Gael Guennebaud
447f269561
Disable previous workaround.
2016-09-06 15:49:02 +02:00
Gael Guennebaud
b046a3f87d
Workaround MSVC instantiation failure of has_*ary_operator at the level of traits<Ref>::match so that the has_*ary_operator are really properly instantiated throughout the compilation unit.
2016-09-06 15:47:04 +02:00
Gael Guennebaud
3cb914f332
bug #1266 : remove CUDA guards on MatrixBase::<decomposition> definitions. (those used to break old nvcc versions that we probably don't care about anymore)
2016-09-06 09:55:50 +02:00
Gael Guennebaud
19a95b3309
Fix shadowing wrt Eigen::Index
2016-09-05 17:19:47 +02:00
Gael Guennebaud
e13071dd13
Workaround a weird msvc 2012 compilation error.
2016-09-05 15:50:41 +02:00
Gael Guennebaud
d123717e21
Fix for msvc 2012 and older
2016-09-05 15:26:56 +02:00
Benoit Steiner
373c340b71
Fixed a typo
2016-09-02 15:41:17 -07:00
Benoit Steiner
5a6be66cef
Turned the Index type used by the nullary wrapper into a template parameter.
2016-09-02 14:10:29 -07:00
Gael Guennebaud
d6c8366d84
Fix compilation with MSVC 2012
2016-09-02 15:23:32 +02:00
Gael Guennebaud
ef54723dbe
One more msvc fix iteration, the previous one was over-simplified for visual
2016-09-01 15:04:53 +02:00
Gael Guennebaud
f9f32e9e2d
Fix compilation with nvcc
2016-09-01 13:06:14 +02:00
Gael Guennebaud
3d946e42b3
Fix compilation with visual studio
2016-09-01 12:59:32 +02:00
Gael Guennebaud
836fa25a82
Make sure sizeof is truly needed, thus improving SFINAE portability.
2016-08-31 23:40:18 +02:00
Gael Guennebaud
84cf6e42ca
minor tweaks in has_* helpers
2016-08-31 23:04:14 +02:00
Gael Guennebaud
218c37beb4
bug #1286 : automatically detect the available prototypes of functors passed to CwiseNullaryExpr such that functors have only to implement the operators that matter among:
...
operator()()
operator()(i)
operator()(i,j)
Linear access is also automatically detected based on the availability of operator()(i,j).
2016-08-31 15:45:25 +02:00
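Detecting which call operators a functor provides can be sketched with standard type traits. This sketch assumes C++17 and uses `std::is_invocable`, which is not the technique the commit can rely on for older compilers; the trait and functor names below are illustrative only.

```cpp
#include <cstddef>
#include <type_traits>

using Index = std::ptrdiff_t;

// True if a const functor F can be called as f(), f(i) or f(i, j).
template <typename F>
constexpr bool has_nullary_operator = std::is_invocable_v<const F&>;
template <typename F>
constexpr bool has_unary_operator = std::is_invocable_v<const F&, Index>;
template <typename F>
constexpr bool has_binary_operator = std::is_invocable_v<const F&, Index, Index>;

// Example functors, each implementing exactly one of the three prototypes.
struct ConstantFunctor {
  double operator()() const { return 1.0; }
};
struct RampFunctor {
  double operator()(Index i) const { return static_cast<double>(i); }
};
struct GridFunctor {
  double operator()(Index i, Index j) const { return static_cast<double>(i + j); }
};

static_assert(has_nullary_operator<ConstantFunctor>, "callable as f()");
static_assert(has_unary_operator<RampFunctor>, "callable as f(i)");
static_assert(has_binary_operator<GridFunctor>, "callable as f(i, j)");
```

A wrapper can then dispatch on these flags at compile time and call whichever operator the functor actually implements.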
Gael Guennebaud
3456247437
bug #1283 : quick fix for products involving uncommon general block access to vectors.
2016-08-31 08:17:15 +02:00
Gael Guennebaud
8c48d42530
Fix 4x4 inverse with non-linear destination
2016-08-30 23:16:38 +02:00
Gael Guennebaud
e7fbbc2748
Doc: add links and discourage users from writing their own expressions (better use CwiseNullaryOp)
2016-08-30 15:57:46 +02:00
Gael Guennebaud
9c9e23858e
Doc: split customizing-eigen page into sub-pages and re-structure a bit the different topics
2016-08-30 11:10:08 +02:00
Gael Guennebaud
cffe8bbff7
Doc: add link to example
2016-08-30 10:45:27 +02:00
Gael Guennebaud
68e803a26e
Fix warning
2016-08-30 09:21:57 +02:00
Gael Guennebaud
2915e1fc5d
Revert part of changeset 5b3a6f51d3
...
to keep accuracy of smallest eigenvalues.
2016-08-29 14:14:18 +02:00
Gael Guennebaud
7e029d1d6e
bug #1271 : add SparseMatrix::coeffs() methods returning a 1D view of the non zero coefficients.
2016-08-29 12:06:37 +02:00
Gael Guennebaud
8f4b4ad5fb
use ::hlog if available.
2016-08-29 11:05:32 +02:00
Gael Guennebaud
35a8e94577
bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.
2016-08-29 10:59:37 +02:00
Gael Guennebaud
0decc31aa8
Add generic implementation of conj_helper for custom complex types.
2016-08-29 09:42:29 +02:00
Gael Guennebaud
fd9caa1bc2
bug #1282 : fix implicit double to float conversion warning
2016-08-28 22:45:56 +02:00
Gael Guennebaud
68d1897e8a
Make sure that our log1p implementation is called as a last resort only.
2016-08-26 15:30:55 +02:00
Gael Guennebaud
fe60856fed
Add overload of numext::log1p for float/double in CUDA
2016-08-26 15:28:59 +02:00
Gael Guennebaud
1329c55875
Fix compilation with boost::multiprec.
2016-08-25 14:54:39 +02:00
Gael Guennebaud
441b7eaab2
Add support for non trivial scalar factor in sparse selfadjoint * dense products, and enable +=/-= assignment for such products.
...
This changeset also improves the performance by working on a column of the result at once.
2016-08-24 13:06:34 +02:00
Gael Guennebaud
8132a12625
bug #1268 : detect failures in LDLT and report them through info()
2016-08-23 23:15:55 +02:00
Gael Guennebaud
bde9b456dc
Typo
2016-08-23 21:36:36 +02:00
Gael Guennebaud
ea2e968257
Address several implicit scalar conversions.
2016-08-23 18:44:33 +02:00
Gael Guennebaud
0a6a50d1b0
Cleanup eigenvector extraction: leverage matrix products and compile-time sizes, remove numerous useless temporaries.
2016-08-23 18:14:37 +02:00
Gael Guennebaud
00b2666853
bug #645 : patch from Tobias Wood implementing the extraction of eigenvectors in GeneralizedEigenSolver
2016-08-23 17:37:38 +02:00
Gael Guennebaud
504a4404f1
Optimize expression matching "d?=a-b*c" as "d?=a; d?=b*c;"
2016-08-23 16:52:22 +02:00
Gael Guennebaud
e47a8928ec
Fix compilation in check_for_aliasing due to ambiguous specializations
2016-08-23 16:19:10 +02:00
Gael Guennebaud
ef3de20481
Cleanup cost of tanh
2016-08-23 14:39:55 +02:00
Gael Guennebaud
b3151bca40
Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.
2016-08-23 14:24:08 +02:00
Gael Guennebaud
a4c266f827
Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.
2016-08-23 14:23:08 +02:00
Gael Guennebaud
82147cefff
Fix possible overflow and bias in integer random generator
2016-08-23 13:25:31 +02:00
Gael Guennebaud
581b6472d1
bug #1265 : remove outdated notes
2016-08-22 23:25:39 +02:00
Igor Babuschkin
59bacfe520
Fix compilation on CUDA 8 by removing call to h2log1p
2016-08-15 23:38:05 +01:00
Christoph Hertzberg
c83b754ee0
bug #1272 : Disable assertion when total number of columns is zero.
...
Also moved assertion to finished() method and adapted unit-test
2016-08-12 15:15:34 +02:00
Igor Babuschkin
aee693ac52
Add log1p support for CUDA and half floats
2016-08-08 20:24:59 +01:00
Benoit Steiner
72096f3bd4
Merged in suiyuan2009/eigen/fix_tanh_inconsistent_for_tensorflow (pull request PR-215)
...
Fix_tanh_inconsistent_for_tensorflow
2016-08-08 09:06:45 -07:00
Christoph Hertzberg
3e4a33d4ba
bug #1272 : Let CommaInitializer work for more border cases (enhances fix of bug #1242 ).
...
The unit test tests all combinations of 2x2 block-sizes from 0 to 3.
2016-08-08 17:26:48 +02:00
Ziming Dong
1031223c09
fix tanh inconsistent
2016-08-06 19:48:50 +08:00
Benoit Steiner
fe778427f2
Fixed the constructors of the new half_base class.
2016-08-04 18:32:26 -07:00
Benoit Steiner
9506343349
Fixed the isnan, isfinite and isinf operations on GPU
2016-08-04 17:25:53 -07:00
Gael Guennebaud
17b9a55d98
Move Eigen::half_impl::half to Eigen::half while preserving the free functions to the Eigen::half_impl namespace together with ADL
2016-08-04 00:00:43 +02:00
Gael Guennebaud
7995cec90c
Fix vectorization logic for coeff-based product for some corner cases.
2016-07-31 15:20:22 +02:00
Benoit Steiner
02fe89f5ef
half implementation has been moved to half_impl namespace
2016-07-29 15:09:34 -07:00
Christoph Hertzberg
c5b893f434
bug #1266 : half implementation has been moved to half_impl namespace
2016-07-29 18:36:08 +02:00
klimpel
ca5effa16c
MSVC-2010 is making problems with SFINAE again. But restricting to the variant for very old compilers (enum, template<typename C> for both function definitions) fixes the problem.
2016-07-28 15:58:17 +01:00
Gael Guennebaud
4057f9b1fc
Enable slice-vectorization+inner-unrolling when unaligned vectorization is allowed. For instance, this allows vectorizing 5x5 matrices (including product)
2016-07-28 13:47:33 +02:00
Gael Guennebaud
a72752caac
Vectorize more small product expressions by letting the general assignment logic decide on the sizes that are OK for vectorization.
2016-07-28 11:21:07 +02:00
Christoph Hertzberg
d3d7c6245d
Add brackets to block matrix and fixed some typos
2016-07-27 09:55:39 +02:00
Gael Guennebaud
f6b3cf8de9
Bump to 3.3-beta2
2016-07-26 23:51:59 +02:00
Gael Guennebaud
95113cb15c
Improve robustness of 2x2 eigenvalue with shifting and scaling
2016-07-26 14:43:54 +02:00
Gael Guennebaud
7f7e84aa36
Fix compilation with MKL support
2016-07-26 13:31:29 +02:00
Gael Guennebaud
c581c8fa79
Fix with expression template scalar types.
2016-07-26 11:33:28 +02:00
Gael Guennebaud
757971e7ea
bug #1258 : fix compilation of Map<SparseMatrix>::coeffRef
2016-07-26 09:40:19 +02:00
Gael Guennebaud
9c663e4ee8
Clean references to MKL in LAPACKe support.
2016-07-25 18:20:08 +02:00
Gael Guennebaud
0c06077efa
Rename MKL files
2016-07-25 18:00:47 +02:00
Gael Guennebaud
4d54e3dd33
bug #173 : remove dependency to MKL for LAPACKe backend.
2016-07-25 17:55:07 +02:00
Gael Guennebaud
34b483e25d
bug #1249 : enable use of __builtin_prefetch for GCC, clang, and ICC only.
2016-07-25 15:17:45 +02:00
Gael Guennebaud
9908020d36
Add minimal support for Array<string>, and fix Tensor<string>
2016-07-25 14:25:56 +02:00
Gael Guennebaud
1b2049fbda
Enforce scalar types in calls to max/min (helps with expression template scalar types)
2016-07-25 12:35:10 +02:00
Gael Guennebaud
b118bc76eb
Add digits10 overload for complex.
2016-07-25 12:33:21 +02:00
Gael Guennebaud
c96af5381f
Remove custom complex division function cdiv.
2016-07-25 12:31:58 +02:00
Gael Guennebaud
e1c7c5968a
Update doc.
2016-07-25 11:18:04 +02:00
Gael Guennebaud
8fffc81606
Add NumTraits::digits10() function based on numeric_limits::digits10 and make use of it for printing matrices.
2016-07-25 11:13:01 +02:00
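Choosing a print precision from `std::numeric_limits<T>::digits10` can be sketched in standard C++ as follows. The helper `formatWithDigits10` is a hypothetical name for this illustration and is not part of Eigen.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a value with as many significant decimal
// digits as the type T is guaranteed to represent (digits10).
template <typename T>
std::string formatWithDigits10(T value) {
  std::ostringstream os;
  os << std::setprecision(std::numeric_limits<T>::digits10) << value;
  return os.str();
}
```

With IEEE floats this prints 6 significant digits for `float` and 15 for `double`.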
Gael Guennebaud
1b0353c659
Fix misuse of dummy_precision in eigenvalue solvers
2016-07-23 17:52:31 +02:00
Gael Guennebaud
72744d93ef
Allows the compiler to inline outer products (the change from default to dont-inline in changeset 737bed19c1 was not motivated)
2016-07-22 17:02:28 +02:00
Gael Guennebaud
395c835f4b
Fix CUDA compilation
2016-07-22 15:30:24 +02:00
Gael Guennebaud
47afc9a365
More cleaning in half:
...
- put its definition and functions in its own half_impl namespace such that the free functions do not pollute the Eigen namespace while still making them visible for half through ADL.
- expose Eigen::half through a using statement
- move operator<< from std to half_float namespace
2016-07-22 14:33:28 +02:00
Gael Guennebaud
0f350a8b7e
Fix CUDA compilation
2016-07-21 18:47:07 +02:00
Gael Guennebaud
bf91a44f4a
Use ADL and log10 for printing matrices.
2016-07-21 15:48:24 +02:00
Gael Guennebaud
87fbda812f
Add missing log10 and random generator for half.
2016-07-21 15:46:45 +02:00
Gael Guennebaud
01d12d3e82
Some cleanup in Half: standard functions should be defined in the namespace of the class half to make ADL work, and thus the global is* functions can be removed.
2016-07-21 15:10:48 +02:00
Gael Guennebaud
7722913475
Fix ambiguous specialization with custom scalar type
2016-07-20 15:13:44 +02:00
Gael Guennebaud
fd057f86b3
Complete the coeff-wise math function table.
2016-07-20 12:14:10 +02:00
Gael Guennebaud
9e8476ef22
Add missing Eigen::rsqrt global function
2016-07-20 11:59:49 +02:00
Gael Guennebaud
4b4c296d6e
Simplify ScalarBinaryOpTraits by removing the Defined enum, and extend its documentation.
2016-07-20 09:56:39 +02:00
Gael Guennebaud
e3bf874c83
Workaround MSVC 2010 compilation issue.
2016-07-18 15:17:25 +02:00
Gael Guennebaud
0f89c6d6b5
Add a summary of possible values for EIGEN_COMP_MSVC
2016-07-18 15:16:13 +02:00
Gael Guennebaud
18884f17d7
Remove static constant declaration: this forces the compiler to generate costly code for thread safety.
2016-07-18 15:05:17 +02:00
Gael Guennebaud
79574e384e
Make scalar_product_op the default (instead of void)
2016-07-18 12:03:05 +02:00
Gael Guennebaud
6a3c451c1c
Permits call to explicit ctor.
2016-07-18 12:02:20 +02:00
Gael Guennebaud
0c3fe4aca5
merge
2016-07-18 10:44:15 +02:00
Gael Guennebaud
db9b154193
Add missing non-const reverse method in VectorwiseOp.
2016-07-16 15:19:28 +02:00
Gael Guennebaud
461cd819c2
Workaround VS2015 bug
2016-07-13 18:46:01 +02:00
Gael Guennebaud
5ea0864c81
Fix regression in a previous commit: some diagonal entry might not be treated by the 2x2 real preconditioner.
2016-07-13 18:37:54 +02:00
Gael Guennebaud
b4343aa67e
Avoid division by very small entries when extracting singular values, and explicitly handle the 1x1 complex case.
2016-07-12 17:22:03 +02:00
Gael Guennebaud
e2aa58b631
Consider denormals as zero in makeJacobi and 2x2 SVD.
...
This also fixes serious issues with x387 for which values can be much smaller than the smallest denormal!
2016-07-12 17:21:03 +02:00
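Treating denormals as zero amounts to comparing the magnitude against the smallest normal value of the type. A minimal sketch in standard C++, where the helper name `isZeroOrDenormal` is hypothetical:

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if x is zero or denormal, i.e. its magnitude is
// below the smallest positive normal value of T. NaN yields false.
template <typename T>
bool isZeroOrDenormal(T x) {
  return std::fabs(x) < std::numeric_limits<T>::min();
}
```

A rotation or 2x2 SVD step can use such a test to skip entries that are too small to divide by safely.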
klimpel
8b3fc31b55
compile fix (SFINAE variant apparently didn't work for all compilers) for the following compiler/platform:
...
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.
2016-07-11 17:42:22 +02:00
Gael Guennebaud
a96a7ce3f7
Move CUDA's special functions to SpecialFunctions module.
2016-07-11 18:39:11 +02:00
Gael Guennebaud
fd60966310
merge
2016-07-11 18:11:47 +02:00
Gael Guennebaud
3e348fdcf9
Workaround MSVC bug
2016-07-11 15:24:52 +02:00
Konstantinos Margaritis
ef05463fcf
Merged kmargar/eigen/tip into default, Altivec/VSX port should be working ok now.
2016-07-10 16:11:46 +03:00
Konstantinos Margaritis
9f7caa7e7d
minor fixes for big endian altivec/vsx
2016-07-10 07:05:10 -03:00
Christoph Hertzberg
3c795c6923
bug #1119 : Adjust call to ?gssvx for SuperLU 5
...
Also improved corresponding cmake module to detect versions 5.x
Based on patch by Christoph Grüninger.
2016-07-10 02:29:57 +02:00
Gael Guennebaud
2f7e2614e7
bug #1232 : refactor special functions as a new SpecialFunctions module, currently in unsupported/.
2016-07-08 11:13:55 +02:00
Gael Guennebaud
66917299a9
Add debug output
2016-07-06 22:27:15 +02:00
Gael Guennebaud
c3b23d7dbf
Fix support of Intel's VML
2016-07-06 14:07:32 +02:00
Gael Guennebaud
8ec4d6480d
Fix compilation with recent updates of icc 2016
2016-07-06 14:07:14 +02:00
Gael Guennebaud
5b3a6f51d3
Improve numerical robustness of RealSchur: add scaling and compare sub-diag entries to largest diagonal entry instead of the 2 neighbors.
2016-07-06 13:45:30 +02:00
Gael Guennebaud
367ef66af3
Re-enable some specializations for Assignment<.,Product<>>
2016-07-05 22:58:14 +02:00
Gael Guennebaud
155d8d8603
Fix compilation with msvc
2016-07-05 14:43:42 +02:00
Gael Guennebaud
b39fd8217f
Fix nesting of SolveWithGuess, and add unit test.
2016-07-04 17:47:47 +02:00
Gael Guennebaud
ec02af1047
Fix template resolution.
2016-07-04 17:37:33 +02:00
Gael Guennebaud
fbcfc2f862
Add unit test for solveWithGuess, and fix template resolution.
2016-07-04 17:19:38 +02:00
Gael Guennebaud
7f7839c12f
Add documentation and examples for inplace decomposition.
2016-07-04 17:18:26 +02:00
Gael Guennebaud
32a41ee659
bug #707 : add inplace decomposition through Ref<> for Cholesky, LU and QR decompositions.
2016-07-04 15:13:35 +02:00
Gael Guennebaud
91b3039013
Change the semantic of the last template parameter of Assignment from "Scalar" to "SFINAE" only.
...
The previous "Scalar" semantic was obsolete since we allow for different scalar types in the source and destination expressions.
One can still specialize on scalar types through SFINAE and/or assignment functor.
2016-07-04 11:02:00 +02:00
Gael Guennebaud
0fa9e4a15c
Fix performance regression in dgemm introduced by changeset 5d51a7f12c
2016-07-02 17:35:08 +02:00
Gael Guennebaud
672076db5d
Fix performance regression introduced in changeset e56aabf205.
...
Register blocking sizes are better handled by the cache size heuristics.
The current code introduced very small blocks, for instance for 9x9 matrix,
thus killing performance.
2016-07-02 15:40:56 +02:00
Justin Carpentier
6126886a67
Use complete nested namespace Eigen::internal
2016-06-28 20:09:25 +02:00
Benoit Jacob
328c5d876a
Undo changes in AltiVec --- I don't have any way to test there.
2016-06-28 11:15:25 -04:00
Benoit Jacob
38fb606052
Avoid global variables with static constructors in NEON/Complex.h
2016-06-28 11:12:49 -04:00
Gael Guennebaud
d937a420a2
Fix compilation with MSVC by using our portable numext::log1p implementation.
2016-08-22 15:44:21 +02:00
Gael Guennebaud
2d5731e40a
bug #1270 : bypass custom asm for pmadd and recent clang version
2016-08-22 15:38:03 +02:00
Gael Guennebaud
49b005181a
Define EIGEN_COMP_CLANG to clang version as major*100+minor (e.g., 307 corresponds to clang 3.7)
2016-08-22 15:37:05 +02:00
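The encoding is simple arithmetic and can be checked with a small sketch. The function `encodeClangVersion` is a hypothetical name used only to illustrate the major*100+minor scheme; the macro itself is defined by the library headers.

```cpp
// Hypothetical helper mirroring the major*100+minor encoding.
constexpr int encodeClangVersion(int major, int minor) {
  return major * 100 + minor;
}

// clang 3.7 is encoded as 307, as stated in the commit message.
static_assert(encodeClangVersion(3, 7) == 307, "3.7 -> 307");
```

Version checks then reduce to integer comparisons, for example testing whether the value is at least 309 to detect clang 3.9 or newer.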
Gael Guennebaud
130f891bb0
bug #1278 : ease parsing
2016-08-22 15:00:29 +02:00
Gael Guennebaud
d476cadbb8
bug #1247 : fix regression in compilation of pow(integer,integer), and add respective unit tests.
2016-06-25 10:12:06 +02:00
Gael Guennebaud
c50c73cae2
Fix missing specialization.
2016-06-24 23:10:39 +02:00
Gael Guennebaud
cd577a275c
Relax promote_scalar_arg logic to enable promotion to Expr::Scalar if conversion to Expr::Literal fails.
...
This is useful to cancel expression template at the scalar level, e.g. with AutoDiff<AutoDiff<>>.
This patch also defers calls to NumTraits in cases for which types are not directly compatible.
2016-06-24 11:28:54 +02:00
Gael Guennebaud
deb45ad4bc
bug #1245 : fix compilation with msvc
2016-06-24 09:52:25 +02:00
Gael Guennebaud
55fc04e8b5
Fix operator priority
2016-06-23 15:36:42 +02:00
Gael Guennebaud
bf2d5edecc
Fix warning.
2016-06-23 15:35:17 +02:00
Gael Guennebaud
7c6561485a
merge PR 194
2016-06-23 15:29:57 +02:00
Konstantinos Margaritis
be107e387b
fix compilation with clang 3.9, fix performance with pset1, use vector operators instead of intrinsics in some cases
2016-06-23 10:19:05 -03:00
Gael Guennebaud
76faf4a965
Introduce a NumTraits<T>::Literal type to be used for literals, and improve mixed-type support in operations between arrays and scalars:
- 2 * ArrayXcf is now optimized in the sense that the integer 2 is properly promoted to a float instead of a complex<float> (fixes a regression)
- 2.1 * ArrayXi is now forbidden (previously, 2.1 was converted to 2)
- This mechanism should be applicable to any custom scalar type, assuming NumTraits<T>::Literal is properly defined (it defaults to T)
2016-06-23 14:27:20 +02:00
Gael Guennebaud
a3f7edf7e7
bug #1242 : fix comma init with empty matrices.
2016-06-23 10:25:04 +02:00
Konstantinos Margaritis
8c34b5a0e3
mostly cleanups and modernizing code
2016-06-19 16:13:17 -03:00
Konstantinos Margaritis
b410d46482
mostly cleanups and modernizing code
2016-06-19 16:12:52 -03:00
Konstantinos Margaritis
b80379bda0
fixed pexp<Packet2d>, was failing tests
2016-06-19 16:11:58 -03:00
Benoit Steiner
b055590e91
Made log1p_impl usable inside a GPU kernel
2016-06-16 11:37:40 -07:00
Gael Guennebaud
67c12531e5
Fix warnings with gcc
2016-06-15 18:11:33 +02:00
Gael Guennebaud
eb91345d64
Move scalar/expr to ArrayBase and fix documentation
2016-06-15 15:22:03 +02:00
Gael Guennebaud
4794834397
Propagate functor to ScalarBinaryOpTraits
2016-06-15 09:58:49 +02:00
Gael Guennebaud
c55035b9c0
Include the cost of stores in unrolling of triangular expressions.
2016-06-15 09:57:33 +02:00
Gael Guennebaud
4e7c3af874
Cleanup useless helper: internal::product_result_scalar
2016-06-15 00:04:10 +02:00
Gael Guennebaud
101ea26f5e
Include the cost of stores in unrolling (also fix infinite unrolling with expression costing 0 like Constant)
2016-06-15 00:01:16 +02:00
Gael Guennebaud
76236cdea4
merge
2016-06-14 15:33:47 +02:00
Gael Guennebaud
1004c4df99
Cleanup unused functors.
2016-06-14 15:27:28 +02:00
Gael Guennebaud
70dad84b73
Generalize expr/expr and scalar/expr wrt scalar types.
2016-06-14 15:26:37 +02:00
Gael Guennebaud
62134082aa
Update AutoDiffScalar wrt to scalar-multiple.
2016-06-14 15:06:35 +02:00
Gael Guennebaud
396d9cfb6e
Generalize expr.pow(scalar), pow(expr,scalar) and pow(scalar,expr).
...
Internal: scalar_pow_op (unary) is removed, and scalar_binary_pow_op is renamed scalar_pow_op.
2016-06-14 14:10:07 +02:00
Gael Guennebaud
a8c08e8b8e
Implement expr+scalar, scalar+expr, expr-scalar, and scalar-expr as binary expressions, and generalize supported scalar types.
...
The following functors are now deprecated: scalar_add_op, scalar_sub_op, and scalar_rsub_op.
2016-06-14 12:06:10 +02:00
Gael Guennebaud
756ac4a93d
Fix doc.
2016-06-14 12:03:39 +02:00
Gael Guennebaud
bcc0f38f98
Add unittesting plugins to scalar_product_op and scalar_quotient_op to help checking that types are properly propagated.
2016-06-14 11:31:27 +02:00
Gael Guennebaud
f57fd78e30
Generalize coeff-wise sparse products to support different scalar types
2016-06-14 11:29:54 +02:00
Gael Guennebaud
f5b1c73945
Set cost of constant expression to 0 (the cost should be amortized through the expression)
2016-06-14 11:29:06 +02:00
Gael Guennebaud
deb8306e60
Move MatrixBase::operator*(UniformScaling) as a free function in Scaling.h, and fix return type.
2016-06-14 11:28:03 +02:00
Gael Guennebaud
64fcfd314f
Implement scalar multiples and division by a scalar as a binary-expression with a constant expression.
...
This slightly complexifies the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expression (e.g., see BlasUtil.h), but this brings several advantages:
- it makes it clear on which side the scalar is applied,
- it clearly reflects that we are dealing with a binary-expression,
- the complexity of the type is hidden through macros defined at the end of Macros.h,
- distinguishing between "scalar op expr" and "expr op scalar" is important to support non commutative fields (like quaternions)
- "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr"
- scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor)
2016-06-14 11:26:57 +02:00
Gael Guennebaud
3c12e24164
Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them.
2016-06-13 16:18:59 +02:00
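Binding one argument of a binary functor to obtain a unary one can be sketched as below. The structs `Bind1st`, `Bind2nd` and `Quotient` are hypothetical stand-ins written for this illustration (assuming C++14 return type deduction); they are not the library's bind1st_op and bind2nd_op.

```cpp
// Hypothetical sketch: fix the first argument of a binary functor.
template <typename BinaryOp, typename Scalar>
struct Bind1st {
  BinaryOp op;
  Scalar first;
  template <typename T>
  auto operator()(const T& b) const { return op(first, b); }
};

// Hypothetical sketch: fix the second argument of a binary functor.
template <typename BinaryOp, typename Scalar>
struct Bind2nd {
  BinaryOp op;
  Scalar second;
  template <typename T>
  auto operator()(const T& a) const { return op(a, second); }
};

// A non-commutative binary functor, so the two bindings differ.
struct Quotient {
  double operator()(double a, double b) const { return a / b; }
};
```

Binding 6.0 as the first argument of `Quotient` gives the unary functor x -> 6.0 / x, while binding it as the second argument gives x -> x / 6.0.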
Gael Guennebaud
7a9ef7bbb4
Add default template parameters for the second scalar type of binary functors.
...
This enhances backward compatibility.
2016-06-13 16:17:23 +02:00
Gael Guennebaud
4c61f00838
Add missing explicit scalar conversion
2016-06-12 22:42:13 +02:00
Gael Guennebaud
83904a21c1
Make sure T(i+1,i)==0 when diagonalizing T(i:i+1,i:i+1)
2016-06-11 14:41:36 +02:00
Gael Guennebaud
fabae6c9a1
Cleanup
2016-06-10 15:58:33 +02:00