eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2024-12-27 07:29:52 +08:00

Author	SHA1	Message	Date
derekjchow	66ca41bd47	Add support for vectorizing logical comparisons.	2021-07-23 20:07:48 +00:00
arthurfeeney	a77638387d	Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.	2021-07-20 20:11:22 +00:00
Antonio Sanchez	297f0f563d	Fix explicit default cache size typo.	2021-07-20 11:40:17 -07:00
Antonio Sanchez	1fd5ce1002	For GpuDevice::fill, use a single memset if all bytes are equal. The original `fill` implementation introduced a 5x regression on my nvidia Quadro K1200. @rohitsan reported up to 100x regression for HIP. This restores performance.	2021-07-10 13:37:16 +00:00
Antonio Sanchez	9c22795d65	Put attach/detach buffer back in for TensorDeviceSycl. Also added a test to verify the original buffer is updated correctly.	2021-07-09 10:00:05 -07:00
Rohit Santhanam	beea14a18f	Enable extract et al. for HIP GPU.	2021-07-09 14:58:07 +00:00
Rasmus Munk Larsen	0c361c4899	Defer to std::fill_n when filling a dense object with a constant value.	2021-07-09 03:59:35 +00:00
Antonio Sanchez	1e6c6c1576	Replace memset with fill to work for non-trivial scalars. For custom scalars, zero is not necessarily represented by a zeroed-out memory block (e.g. gnu MPFR). We therefore cannot rely on `memset` if we want to fill a matrix or tensor with zeroes. Instead, we should rely on `fill`, which for trivial types does end up getting converted to a `memset` under-the-hood (at least with gcc/clang). Requires adding a `fill(begin, end, v)` to `TensorDevice`. Replaced all potentially bad instances of memset with fill. Fixes #2245.	2021-07-08 18:34:41 +00:00
Jonas Harsch	e9c9a3130b	Removed superfluous boolean `degenerate` in TensorMorphing.h.	2021-07-08 18:02:58 +00:00
Guoqiang QI	4bcd42c271	Make a copy of the input matrix when computing the inverse in place. Fixes #2285.	2021-07-08 17:05:26 +00:00
Kolja Brix	a59cf78c8d	Add Doxygen-style documentation to main.h.	2021-07-07 18:23:59 +00:00
Antonio Sanchez	f44f05532d	Fix CMake directory issues. Allows absolute and relative paths for `INCLUDE_INSTALL_DIR`, `CMAKEPACKAGE_INSTALL_DIR`, and `PKGCONFIG_INSTALL_DIR`. Type should be `PATH` not `STRING`. Contrary to !211, these don't seem to be made absolute if user-defined. According to the doc, any directories should use `PATH` type, which allows a file dialog to be used via the GUI. It also better handles file separators. If the user provides an absolute path, it will be made relative to `CMAKE_INSTALL_PREFIX` so that `configure_package_config_file` will work. Fixes #2155 and #2269.	2021-07-07 17:24:57 +00:00
Antonio Sanchez	f5a9873bbb	Fix Tensor documentation page. The extra [TOC] tag is generating a huge floating duplicated table-of-contents, which obscures the majority of the page (see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html). Remove it. Also, headers do not support markup (see [doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so backticks like ``` ``` end up generating titles that look like ``` Constructor <tt>Tensor<double,2></tt> ``` Removing backticks for now. To generate properly formatted headers, we must directly use html instead of markdown, i.e. ``` <h2>Constructor <code>Tensor<double,2></code></h2> ``` which is ugly. Fixes #2254.	2021-07-03 04:39:22 +00:00
Rasmus Munk Larsen	7b35638ddb	Fix breakage of conj_helper in conjunction with custom types introduced in !537.	2021-07-02 20:42:15 +00:00
Jonas Harsch	aab747021b	Don't crash when attempting to shuffle an empty tensor.	2021-07-02 20:33:52 +00:00
Rasmus Munk Larsen	bbfc4d54cd	Use `padd` instead of `+`.	2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen	9312a5bf5c	Implement a generic vectorized version of Smith's algorithms for complex division.	2021-07-01 23:31:12 +00:00
Antonio Sanchez	6035da5283	Fix compile issues for gcc 4.8. Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. Also, gcc 4.8 has a parsing bug in `a < b->c`, thinking `b-` is a template parameter.	2021-07-01 22:58:14 +00:00
Antonio Sanchez	154f00e9ea	Fix inverse nullptr/asan errors for LU. For empty or single-column matrices, `PartialPivLU` currently dereferences a `nullptr` or accesses memory out-of-bounds. Here we adjust the checks to avoid this.	2021-07-01 13:41:04 -07:00
Dan Miller	eb04775903	Fix duplicate definitions on Mac	2021-07-01 14:54:12 +00:00
Chip Kerchner	91e99ec1e0	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow	2021-06-30 23:05:04 +00:00
Alexander Karatarakis	60400334a9	Make DenseStorage<> trivially_copyable	2021-06-30 04:27:51 +00:00
大河メタル	c81da59a25	Correct declarations for aarch64-pc-windows-msvc	2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen	5aebbe9098	Get rid of redundant `pabs` instruction in complex square root.	2021-06-29 23:26:15 +00:00
Antonio Sanchez	3a087ccb99	Modify tensor argmin/argmax to always return first occurrence. As written, depending on multithreading/gpu, the returned index from `argmin`/`argmax` is not currently stable. Here we modify the functors to always keep the first occurrence (i.e. if the value is equal to the current min/max, then keep the one with the smallest index). This is otherwise causing unpredictable results in some TF tests.	2021-06-29 10:36:20 -07:00
Rohit Santhanam	2d132d1736	Commit `52a5f982` broke conjhelper functionality for HIP GPUs. This commit addresses this.	2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen	bffd267d17	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.	2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen	52a5f98212	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.	2021-06-24 15:47:48 -07:00
Rasmus Munk Larsen	4ad30a73fc	Use internal::ref_selector to avoid holding a reference to a RHS expression.	2021-06-22 14:31:32 +00:00
Rasmus Munk Larsen	ea62c937ed	Update ComplexEigenSolver_eigenvectors.cpp	2021-06-21 19:06:25 +00:00
Rasmus Munk Larsen	c8a2b4d20a	Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp	2021-06-21 19:06:04 +00:00
Antonio Sanchez	e9ab4278b7	Rewrite balancer to avoid overflows. The previous balancer overflowed for large row/column norms. Modified to prevent that. Fixes #2273.	2021-06-21 17:29:55 +00:00
Antonio Sanchez	35a367d557	Fix fix<> for gcc-4.9.3. There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes #2267.	2021-06-18 13:22:54 -07:00
Antonio Sanchez	12e8d57108	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.	2021-06-16 18:41:17 -07:00
Chip-Kerchner	ef1fd341a8	EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.	2021-06-16 16:30:31 +00:00
jenswehner	175f0cc1e9	changed documentation to make example compile	2021-06-16 11:45:06 +02:00
Antonio Sanchez	9e94c59570	Add missing ppc pcmp_lt_or_nan<Packet8bf>	2021-06-15 13:42:17 -07:00
Antonio Sanchez	954879183b	Fix placement of permanent GPU defines.	2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen	13fb5ab92c	Fix more enum arithmetic.	2021-06-15 09:09:31 -07:00
Antonio Sanchez	ad82d20cf6	Fix checking of version number for mingw. MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this.	2021-06-11 23:19:10 +00:00
Antonio Sanchez	514977f31b	Add ability to permanently enable HIP/CUDA gpu* defines. When using Eigen for gpu, these simplify portability. If `EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then we do not undefine them.	2021-06-11 17:19:54 +00:00
Antonio Sanchez	6aec83263d	Allow custom TENSOR_CONTRACTION_DISPATCH macro. Currently TF lite needs to hack around with the Tensor headers in order to customize the contraction dispatch method. Here we add simple `#ifndef` guards to allow them to provide their own dispatch prior to inclusion.	2021-06-11 17:02:19 +00:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.	2021-06-11 02:35:53 +00:00
Rasmus Munk Larsen	f64b2954c7	Fix c++20 warnings about using enums in arithmetic expressions.	2021-06-10 17:17:39 -07:00
Nicolas Cornu	001a57519a	Fix parsing of version for nvhpc. The first line of the version output is empty, which crashes the parsing, so delete the first line if it is empty.	2021-06-10 18:30:53 +00:00
Rohit Santhanam	c8d40a7bf1	Removed dead code from GPU float16 unit test.	2021-05-28 20:06:48 +00:00
Cyril Kaiser	91cd67f057	Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.	2021-05-26 19:28:13 +00:00
Antonio Sanchez	dba753a986	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`.	2021-05-25 18:25:35 +00:00
Antonio Sanchez	ebb300d0b4	Modify Unary/Binary/TernaryOp evaluators to work for non-class types. This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. This was changed in commit `11f55b29` to optimize the evaluator: > `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. though I cannot reproduce the 16 byte result. Both before the change and after, with multiple compilers/versions, I always get a result of 40 bytes. https://godbolt.org/z/MsjTc1PGe This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251	2021-05-23 12:44:37 -07:00
Jakub Lichman	12471fcb5d	predux_half_dowto4 test extended to all applicable packets	2021-05-21 16:42:19 +00:00
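
The tie-breaking rule described in commit 3a087ccb99 (keep the smallest index when values are equal) can be sketched in plain C++. This is only an illustration of the idea, not Eigen's actual functor code: `IndexedValue`, `combine_min`, and `argmin_chunk` are hypothetical names, and the chunked reduction merely stands in for the multithreaded/GPU partial reductions the message mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical (index, value) pair; not Eigen's actual reducer type.
struct IndexedValue {
  std::size_t index;
  double value;
};

// Merge two partial results. On equal values the smaller index wins, so the
// outcome does not depend on the order in which partial results are merged.
inline IndexedValue combine_min(const IndexedValue& a, const IndexedValue& b) {
  if (b.value < a.value) return b;
  if (b.value == a.value && b.index < a.index) return b;
  return a;
}

// Reduce the non-empty half-open range [begin, end) of `data`.
inline IndexedValue argmin_chunk(const std::vector<double>& data,
                                 std::size_t begin, std::size_t end) {
  assert(begin < end && end <= data.size());
  IndexedValue best{begin, data[begin]};
  for (std::size_t i = begin + 1; i < end; ++i) {
    best = combine_min(best, IndexedValue{i, data[i]});
  }
  return best;
}
```

Because `combine_min` gives the same answer for either argument order, merging per-chunk results in any order yields the first occurrence of the minimum.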

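Commit 9312a5bf5c refers to Smith's algorithm for complex division. The log does not show the implementation, so the following is only a scalar sketch of the textbook algorithm, not Eigen's vectorized packet code; `smith_divide` is a hypothetical name. The idea is to scale by the ratio of the smaller to the larger denominator component so that `c*c + d*d` is never formed directly.

```cpp
#include <cmath>
#include <complex>

// Scalar sketch of Smith's algorithm for (a + bi) / (c + di).
// Hypothetical helper for illustration; not part of Eigen's API.
template <typename T>
std::complex<T> smith_divide(const std::complex<T>& x,
                             const std::complex<T>& y) {
  const T a = x.real(), b = x.imag();
  const T c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    // |d/c| <= 1, so the ratio cannot overflow.
    const T r = d / c;
    const T den = c + d * r;
    return std::complex<T>((a + b * r) / den, (b - a * r) / den);
  } else {
    // |c/d| < 1 in this branch.
    const T r = c / d;
    const T den = c * r + d;
    return std::complex<T>((a * r + b) / den, (b * r - a) / den);
  }
}
```

For operands with components near 1e200, squaring the denominator components would overflow in double precision, whereas the scaled form above stays finite.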
11564 Commits