Commit Graph

5999 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
fcfced13ed Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate. 2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
e15bb785ad Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5 Fix typo. 2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526 Collapsed revision
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514 Undo useless diffs. 2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8 Simplify a bit. 2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6 Add packet up "pones". Write pnot(a) as pxor(pones(a), a). 2019-01-09 16:17:08 -08:00
Rasmus Larsen
cb3c059fa4 Merged eigen/eigen into default 2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3 bug #1654: fix compilation with cuda and no c++11 2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74 fix plog(+inf) with AVX512 2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7 Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSE 2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e fix warning 2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b Add missing pcmp_lt and others for AVX512 2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd bug #1652: implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
  - FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen
055f0b73db Add support for pcmp_eq and pnot, including for complex types. 2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41 Explicitly set fill character when printing aligned data to ostream 2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd PR560: Fix the AVX512f only builds
Commit c53eececb0
 introduced AVX512 support for complex numbers but required
avx512dq to build.  Commit 1d683ae2f5
 fixed some but not, it would seem all,
of the hard avx512dq dependencies.  Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3.  Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
 calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps.  This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
60d3fe9a89 One more stupid AVX 512 fix (I don't have direct access to AVX512 machines) 2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510 Add EIGEN_STRONG_INLINE where required 2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8 Add missing pcmp_lt_or_nan for AVX512 2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a Implement a faster fix for sin/cos of large entries that also correctly handle INF input. 2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8 Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate) 2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb Fix plog(+INF): it returned ~87 instead of +INF 2018-12-23 15:40:52 +01:00
Christoph Hertzberg
6dd93f7e3b Make code compile again for older compilers.
See https://stackoverflow.com/questions/7411515/
2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves
1024a70e82 gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The patch works by altering the gebp lhs packing routines to also
consider ½ and ¼ packet lenght rows when packing, besides the original
whole package and row-by-row attempts. Finally, gebp itself will try
to fit a fraction of a packet at a time if:

i) ½ and/or ¼ packets are available for the current context (e.g. AVX2
   and SSE-sized SIMD register for x86)

ii) The matrix's height is favorable to it (it may be it's too small
    in that dimension to take full advantage of the current/maximum
    packet width or it may be the case that last rows may take
    advantage of smaller packets before gebp goes row-by-row)

This helps mitigate huge slowdowns one had on AVX512 builds when
compared to AVX2 ones, for some dimensions. Gains top at an extra 1x
in throughput. This patch is a complement to changeset 4ad359237a
.

Since packing is changed, Eigen users which would go for very
low-level API usage, like TensorFlow, will have to be adapted to work
fine with the changes.
2018-12-21 11:03:18 -08:00
Gustavo Lima Chaves
e763fcd09e Introducing "vectorized" byte on unpacket_traits structs
This is a preparation to a change on gebp_traits, where a new template
argument will be introduced to dictate the packet size, so it won't be
bound to the current/max packet size only anymore.

By having packet types defined early on gebp_traits, one has now to
act on packet types, not scalars anymore, for the enum values defined
on that class. One approach for reaching the vectorizable/size
properties one needs there could be getting the packet's scalar again
with unpacket_traits<>, then the size/Vectorizable enum entries from
packet_traits<>. It turns out guards like "#ifndef
EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet
variations of packet_traits<> for some types (and it makes sense to
keep that). In other words, one can't go back to the scalar and create
a new PacketType, as this will always lead to the maximum packet type
for the architecture.

The less costly/invasive solution for that, thus, is to add the
vectorizable info on every unpacket_traits struct as well.
2018-12-19 14:24:44 -08:00
Gael Guennebaud
efa4c9c40f bug #1615: slightly increase the default unrolling limit to compensate for changeset 101ea26f5e
.
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
f582ea3579 Fix compilation with expression template scalar type. 2018-12-12 22:47:00 +01:00
Gael Guennebaud
2de8da70fd bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal. 2018-12-12 17:30:08 +01:00
Gael Guennebaud
37c91e1836 bug #1644: fix warning 2018-12-11 22:07:20 +01:00
Gael Guennebaud
f159cf3d75 Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd Properly set the number of registers for AVX512 2018-12-11 15:33:17 +01:00
Gael Guennebaud
7166496f70 bug #1643: fix compilation issue with gcc and no optimizaion 2018-12-11 13:24:42 +01:00
Gael Guennebaud
0d90637838 enable spilling workaround on architectures with SSE/AVX 2018-12-10 23:22:44 +01:00
Gael Guennebaud
bff90bf270 workaround "may be used uninitialized" warning 2018-12-08 18:58:28 +01:00
Gael Guennebaud
81c27325ae bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 2018-12-08 14:27:48 +01:00
Gael Guennebaud
426bce7529 fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target 2018-12-08 09:44:21 +01:00
Gael Guennebaud
956678a4ef bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. 2018-12-07 18:03:36 +01:00
Gael Guennebaud
7b6d0ff1f6 Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also has to turn the #warning regarding AVX512-FMA to a #error. 2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA 2018-12-07 10:01:09 +01:00
Gael Guennebaud
ae59a7652b bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA 2018-12-07 09:23:28 +01:00
Gael Guennebaud
4e7746fe22 bug #1636: fix gemm performance issue with gcc>=6 and no FMA 2018-12-07 09:15:46 +01:00
Gael Guennebaud
cbf2f4b7a0 AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only 2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5 Fix compilation with avx512f only, i.e., no AVX512DQ 2018-12-06 18:11:07 +01:00
Gael Guennebaud
c53eececb0 Implement AVX512 vectorization of std::complex<float/double> 2018-12-06 15:58:06 +01:00
Gael Guennebaud
3fba59ea59 temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! 2018-12-06 00:13:26 +01:00
Gael Guennebaud
1ac2695ef7 bug #1636: fix compilation with some ABI versions. 2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen
47d8b741b2 #elif -> #else to fix GPU build. 2018-12-05 13:19:31 -08:00
Christoph Hertzberg
c1d356e8b4 bug #1635: Use infinity from Numtraits instead of creating it manually. 2018-12-05 15:01:04 +01:00
Rasmus Munk Larsen
b57b31cce9 Merged in ezhulenev/eigen-01 (pull request PR-553)
Do not disable alignment with EIGEN_GPUCC

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-04 23:47:19 +00:00
Eugene Zhulenev
0bb15bb6d6 Update checks in ConfigureVectorization.h 2018-12-03 17:10:40 -08:00
Eugene Zhulenev
fd0fbfa9b5 Do not disable alignment with EIGEN_GPUCC 2018-12-03 15:54:10 -08:00
Christoph Hertzberg
919414b9fe bug #785: Make Cholesky decomposition work for empty matrices 2018-12-03 16:18:15 +01:00
Gael Guennebaud
0ea7ae7213 Add missing padd for Packet8i (it was implicitly generated by clang and gcc) 2018-11-30 21:52:25 +01:00
Gael Guennebaud
ab4df3e6ff bug #1634: remove double copy in move-ctor of non movable Matrix/Array 2018-11-30 21:25:51 +01:00
Gael Guennebaud
c785464430 Add packet sin and cos to Altivec/VSX and NEON 2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be Several improvements regarding packet-bitwise operations:
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876 Add psin/pcos on AVX512 -> almost for free, at last! 2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a Cleanup 2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303 Fix pandnot order in AVX512 2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6 Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) 2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) 2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7 same for pmax 2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6 pmin/pmax o SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions 2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b Add missing SSE/AVX type-casting in AVX512 mode 2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. 2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35 Use explicit packet type in SSE/PacketMath pldexp 2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08 do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered used-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use only the 4 first). 2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24 bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. 2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21 Update pshiftleft to pass the shift as a true compile-time integer. 2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda Unify SSE/AVX psin functions.
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
 - add packet-float to packet-int type traits
 - add packet float<->int reinterpret casts
 - add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Benoit Jacob
7b1cb8a440 fix the build on 64-bit ARM when NEON is disabled 2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008 Unify Altivec/VSX pexp(double) with default implementation 2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e cleanup 2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10 Unify SSE and AVX pexp for double. 2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054 Unify NEON's pexp with generic implementation 2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc Unify Altivec/VSX's pexp with generic implementation 2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5 Unify SSE and AVX implementation of pexp 2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47 Unify Altivec/VSX's plog with generic implementation, and enable it! 2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8 Unify NEON's plog with generic implementation 2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114 First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" 2018-11-26 14:14:07 +01:00
Gael Guennebaud
0836a715d6 bug #1611: fix plog(0) on NEON 2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4 Fix typos 2018-11-23 22:22:14 +00:00
Gael Guennebaud
ccabdd88c9 Fix reserved usage of double __ in macro names 2018-11-23 16:01:47 +01:00
Gael Guennebaud
a7842daef2 Fix several uninitialized member from ctor 2018-11-23 15:10:28 +01:00
Gael Guennebaud
a476054879 bug #1624: improve matrix-matrix product on ARM 64, 20% speedup 2018-11-23 10:25:19 +01:00
Gael Guennebaud
4b2cebade8 Workaround weird MSVC bug 2018-11-21 15:53:37 +01:00
Deven Desai
e7e6809e6b ROCm/HIP specfic fixes + updates
1. Eigen/src/Core/arch/GPU/Half.h

   Updating the HIPCC implementation half so that it can declared as a __shared__ variable


2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h

   introducing a EIGEN_USE_STD(func) macro that calls
   - std::func be default
   - ::func when eigen is being compiled with HIPCC

   This change was requested in the previous HIP PR
   (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff)


3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h

   Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC


4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h

   Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
Gael Guennebaud
6a510fe69c Make MaxPacketSize a true upper bound, even for fixed-size inputs 2018-11-16 11:25:32 +01:00
Mark D Ryan
670d56441c PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
Commit aa110e681b
 optimised the multiplication of small dyanmically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation.  However, it
introduced a mismatch between the packet size and the requestedAlignment.  This
mismatch can lead to crashes when the destination is not aligned.  This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize

This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment

A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046 Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES 2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f typo 2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a help doxygen linking to DenseBase::NulllaryExpr 2018-11-14 14:42:59 +01:00
luz.paz"
f67b19a884 [PATCH 1/2] Misc. typos
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
 CMakeLists.txt                                | 26 +++++++++----------
 Eigen/src/Core/GenericPacketMath.h            |  2 +-
 Eigen/src/SparseLU/SparseLU.h                 |  2 +-
 bench/bench_norm.cpp                          |  2 +-
 doc/HiPerformance.dox                         |  2 +-
 doc/QuickStartGuide.dox                       |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorChipping.h   |  6 ++---
 .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h  |  2 +-
 .../src/Tensor/TensorForwardDeclarations.h    |  4 +--
 .../src/Tensor/TensorGpuHipCudaDefines.h      |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorReduction.h  |  2 +-
 .../CXX11/src/Tensor/TensorReductionGpu.h     |  2 +-
 .../test/cxx11_tensor_concatenation.cpp       |  2 +-
 unsupported/test/cxx11_tensor_executor.cpp    |  2 +-
 14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Rasmus Munk Larsen
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. 2018-11-12 13:42:24 -08:00
Gael Guennebaud
0105146915 Fix warning in c++03 2018-11-10 09:11:38 +01:00
Gael Guennebaud
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators 2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba bug #1619: make const and non-const iterators compatible 2018-11-09 16:49:19 +01:00
Gael Guennebaud
bd9a00718f Let doxygen sees lastN 2018-11-09 11:35:48 +01:00
Gael Guennebaud
a368848473 Recent xcode versions does support EIGEN_HAS_STATIC_ARRAY_TEMPLATE 2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6 Fix max-size in indexed-view 2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff Merged in glchaves/eigen (pull request PR-539)
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gustavo Lima Chaves
4ad359237a Vectorize row-by-row gebp loop iterations on 16 packets as well
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Matthieu Vigne
8d7a73e48e bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d bug #1618: Use different power-of-2 check to avoid MSVC warning 2018-11-01 13:23:19 +01:00
Christian von Schultz
4a40b3785d Collapsed revision (based on pull request PR-325)
* Support compiling without IO streams

Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen
14054e217f Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC. This breaks several target in the TensorFlow Cuda build, e.g.,
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code

4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
9caafca550 Merged in rmlarsen/eigen (pull request PR-532)
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672 Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
d8f285852b Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available. 2018-10-18 16:55:02 -07:00
Gael Guennebaud
0f780bb0b4 Fix float-to-double warning 2018-10-16 09:19:45 +02:00
Gael Guennebaud
a39e0f7438 bug #1612: fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>) 2018-10-16 01:04:25 +02:00
Gael Guennebaud
d2d570c116 Remove useless (and broken) resize 2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d Iterative solvers: unify and fix handling of multiple rhs.
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
3a33db4de5 merge 2018-10-15 09:22:27 +02:00
Mark D Ryan
aa110e681b PR 526: Speed up multiplication of small, dynamically sized matrices
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop.  Using these types is likely to lead to little or no
vectorization.  Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.

This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions.  It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.

I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds.  Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.


Disabling Packed8d on AVX512 builds:

Total Cases:             969
Better:                  511
Worse:                   85
Same:                    373
Max Improvement:         169.00% (4 8 6)
Max Degradation:         36.50% (8 5 3)
Median Improvement:      35.46%
Median Degradation:      17.41%
Total FLOPs Improvement: 19.42%


Disabling Packet16f and Packed8f on AVX512 builds:

Total Cases:             969
Better:                  658
Worse:                   5
Same:                    306
Max Improvement:         214.05% (8 6 5)
Max Degradation:         22.26% (16 2 1)
Median Improvement:      60.05%
Median Degradation:      13.32%
Total FLOPs Improvement: 59.58%


Disabling Packed8f on AVX builds:

Total Cases:             969
Better:                  663
Worse:                   96
Same:                    210
Max Improvement:         155.29% (4 10 5)
Max Degradation:         35.12% (8 3 2)
Median Improvement:      34.28%
Median Degradation:      15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
Eugene Zhulenev
d9392f9e55 Fix code format 2018-11-02 14:51:35 -07:00
Eugene Zhulenev
118520f04a Workaround nbcc+msvc compiler bug 2018-11-02 14:48:28 -07:00
Christoph Hertzberg
24dc076519 Explicitly convert 0 to Scalar for custom types 2018-10-12 10:22:19 +02:00
Gael Guennebaud
43633fbaba Fix warning with AVX512f 2018-10-11 10:13:48 +02:00
Gael Guennebaud
97e2c808e9 Fix avx512 plog(NaN) to return NaN instead of +inf 2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5 Enable avx512 plog with clang 2018-10-11 10:12:21 +02:00
Gael Guennebaud
f0aa7e40fc Fix regression in changeset 5335659c47 2018-10-10 23:47:30 +02:00
Gael Guennebaud
ce243ee45b bug #520: add diagmat +/- diagmat operators. 2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47 Merged in ezhulenev/eigen-02 (pull request PR-525)
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688 bug #632: add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
They are rewritten as two compound assignment to by-pass hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d Fix bug in partial reduction of expressions requiring evaluation 2018-10-10 13:23:52 -07:00
Eugene Zhulenev
2bf1a31d81 Use void type if stl-style iterators are not supported 2018-10-10 10:31:40 -07:00
Rasmus Munk Larsen
e8918743c1 Merged in ezhulenev/eigen-01 (pull request PR-523)
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
c0ca8a9fa3 Compile time detection for unimplemented stl-style iterators 2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454 bug #65: add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean() 2018-10-09 23:36:50 +02:00
Gael Guennebaud
bfa2a81a50 Make redux_vec_unroller more flexible regarding packet-type 2018-10-09 23:30:41 +02:00
Christoph Hertzberg
f6359ad795 Small Doxygen fixes 2018-10-09 19:33:35 +02:00
Gael Guennebaud
7a882c05ab Fix compilation on CUDA 2018-10-09 17:02:16 +02:00
Gael Guennebaud
e00487f7d2 bug #1603: add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy. 2018-10-08 22:27:04 +02:00
Gael Guennebaud
649d4758a6 merge 2018-10-08 17:35:18 +02:00
Gael Guennebaud
774bb9d6f7 fix a doxygen issue 2018-10-08 09:30:15 +02:00
Gael Guennebaud
bcb7c66b53 Workaround gcc's alloc-size-larger-than= warning 2018-10-07 21:55:59 +02:00
Gael Guennebaud
6512c5e136 Implement a better workaround for GCC's bug #87544 2018-10-07 15:00:05 +02:00
Gael Guennebaud
409132bb81 Workaround gcc bug making it trigger an invalid warning 2018-10-07 09:23:15 +02:00
Gael Guennebaud
c6a1ab4036 Workaround MSVC compilation issue 2018-10-06 13:49:17 +02:00
Gael Guennebaud
e21766c6f5 Clarify doc of rowwise/colwise/vectorwise. 2018-10-05 23:12:09 +02:00
Gael Guennebaud
d92f004ab7 Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns 2018-10-05 23:11:21 +02:00
Gael Guennebaud
3e64b1fc86 Move iterators to internal, improve doc, make unit test c++03 friendly 2018-10-03 15:13:15 +02:00
Gael Guennebaud
2b2b4d0580 fix unused warning 2018-10-03 14:16:21 +02:00
Gael Guennebaud
5f26f57598 Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
This means that now AutoOrder is allowed, and it always return a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25 Add pointer-based iterator for direct-access expressions 2018-10-02 23:44:36 +02:00
Gael Guennebaud
8c38528168 Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index) 2018-10-02 14:03:26 +02:00
Gael Guennebaud
12487531ce Add templated subVector<Vertical/Horizonal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns) 2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893 Use Index instead of ptrdiff_t or int, fix random-accessors. 2018-10-02 13:29:32 +02:00
Gael Guennebaud
de2efbc43c bug #1605: workaround ABI issue with vector types (aka __m128) versus scalar types (aka float) 2018-10-01 23:45:55 +02:00
Gael Guennebaud
b0c66adfb1 bug #231: initial implementation of STL iterators for dense expressions 2018-10-01 23:21:37 +02:00
Christoph Hertzberg
564ca71e39 Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6 This commit contains the following (HIP specific) updates:
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
  Changing "pass-by-reference" argument to be "pass-by-value" instead
  (in a  __global__ function decl).
  "pass-by-reference" arguments to __global__ functions are unwise,
  and will be explicitly flagged as errors by the newer versions of HIP.

- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
  Changes introduced in recent commits breaks the HIP compile.
  Adding EIGEN_DEVICE_FUNC attribute to some functions and
  calling ::malloc/free instead of the corresponding std:: versions
  to get the HIP compile working again

- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
  Change introduced a recent commit breaks the HIP compile
  (link stage errors out due to failure to inline a function).
  Disabling the recently introduced code (only for HIP compile), to get
  the eigen nightly testing going again.
  Will submit another PR once we have te proper fix.

- Eigen/src/Core/util/ConfigureVectorization.h
  Enabling GPU VECTOR support when HIP compiler is in use
  (for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Gael Guennebaud
af3ad4b513 oops, I've been too fast in previous copy/paste 2018-09-27 09:28:57 +02:00
Gael Guennebaud
24b163a877 #pragma GCC diagnostic push/pop is not supported prioro to gcc 4.6 2018-09-27 09:23:54 +02:00
Gael Guennebaud
41c3a2ffc1 Fix documentation of reshape to vectors. 2018-09-25 16:35:44 +02:00
Christoph Hertzberg
2c083ace3e Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides 2018-09-24 18:01:17 +02:00
Gael Guennebaud
626942d9dd fix alignment issue in ploaddup for AVX512 2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36 Merge with default. 2018-09-23 21:52:58 +02:00
Gael Guennebaud
795e12393b Fix logic in diagonal*dense product in a corner case.
The problem was for: diag(1x1) * mat(1,n)
2018-09-22 16:44:33 +02:00
Gael Guennebaud
bac36d0996 Demangle Travseral and Unrolling in Redux 2018-09-21 23:03:45 +02:00
Gael Guennebaud
1bf12880ae Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all) 2018-09-21 16:50:04 +02:00
Gael Guennebaud
371068992a Add more debug output 2018-09-21 14:32:39 +02:00
Gael Guennebaud
b00e48a867 Improve slice-vectorization logic for redux (significant speed-up for reduxion of blocks) 2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787 merge with default Eigen 2018-09-21 11:51:49 +02:00
Gael Guennebaud
47720e7970 Doc fixes 2018-09-21 11:48:22 +02:00
Gael Guennebaud
3ec2985914 Merged indexing cleanup (pull request PR-506) 2018-09-21 09:36:05 +00:00
Gael Guennebaud
651e5d4866 Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
This change also make it future proof for AVX1024
2018-09-21 10:33:22 +02:00
Gael Guennebaud
f0ef3467de Fix doc 2018-09-20 22:57:28 +02:00
Gael Guennebaud
617f75f117 Add indexing namespace 2018-09-20 22:57:10 +02:00
Gael Guennebaud
0c56d22e2e Fix shadowing 2018-09-20 22:56:21 +02:00
Gael Guennebaud
9419f506d0 Fix regression introduced by the previous fix for AVX512.
It brokes the complex-complex case on SSE.
2018-09-20 17:32:34 +02:00
Gael Guennebaud
e38d1ab4d1 Workaround increases required alignment warning 2018-09-20 17:07:33 +02:00
Gael Guennebaud
71496b0e25 Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
2018-09-20 17:01:24 +02:00
Gael Guennebaud
5a30eed17e Fix warnings in AVX512 2018-09-20 16:58:51 +02:00
Gael Guennebaud
c3a19527a2 Fix doc wrt previous change 2018-09-19 11:49:26 +02:00
Gael Guennebaud
dfa8439e4d Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
The API changed from A.respahed(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
Gael Guennebaud
297ca62319 ease transition by adding placeholders::all/last/and as deprecated 2018-09-17 16:24:52 +02:00
Gael Guennebaud
2014c7ae28 Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end. 2018-09-15 14:35:10 +02:00
Gael Guennebaud
82772e8d9d Rename Symbolic namespace to symbolic to be consistent with numext namespace 2018-09-15 14:16:20 +02:00
Gael Guennebaud
3e8188fc77 bug #1600: initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert) 2018-09-18 21:24:48 +02:00
Christoph Hertzberg
d7378aae8e Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment. 2018-09-14 20:17:47 +02:00
Gael Guennebaud
1141bcf794 Fix conjugate-gradient for very small rhs 2018-09-13 23:53:28 +02:00
Deven Desai
c64fe9ea1f Updates to fix HIP-clang specific compile errors.
Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
Gael Guennebaud
5927eef612 Enable std::result_of for msvc 2015 and later 2018-09-13 09:44:46 +02:00
Christoph Hertzberg
3adece4827 Fix misleading indentation of errorCode and make it loop-local 2018-09-12 14:41:38 +02:00
Christoph Hertzberg
7e9c9fbb2d Disable type-limits warnings for g++ < 4.8 2018-09-12 14:40:39 +02:00
Justin Carpentier
4827bec776 LLT: correct doc and add missing reference for the return type of rankUpdate
---
 Eigen/src/Cholesky/LLT.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
2018-09-11 09:33:21 +02:00
luz.paz"
43fd42a33b Fix doxy and misc. typos
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
 Eigen/src/Core/ProductEvaluators.h |  4 ++--
 Eigen/src/Core/arch/GPU/Half.h     |  2 +-
 Eigen/src/Core/util/Memory.h       |  2 +-
 Eigen/src/Geometry/Hyperplane.h    |  2 +-
 Eigen/src/Geometry/Transform.h     |  2 +-
 Eigen/src/Geometry/Translation.h   | 12 ++++++------
 doc/PreprocessorDirectives.dox     |  2 +-
 doc/TutorialGeometry.dox           |  2 +-
 test/boostmultiprec.cpp            |  2 +-
 test/triangular.cpp                |  2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Jiandong Ruan
6dcd2642aa bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above 2018-09-08 12:05:33 -07:00
Alexey Frunze
ec38f07b79 bug #1595: Don't use C++11's std::isnan() in MIPS/MSA packet math.
This removes reliance on C++11 and improves generated code.
2018-09-06 15:40:09 -07:00
cgs1019
c6066ac411 Make param name and docs constistent for JacobiRotation::makeGivens
Previously the rendered math in the doc string called the optional return value
'r', while the actual parameter and the doc string text referred to the
parameter as 'z'. This changeset renames all the z's to r's to match the math.
2018-09-06 11:04:17 -04:00
Christoph Hertzberg
ddbc564386 Fixed a few more shadowing warnings when compiling with g++ (and c++03) 2018-08-30 16:33:03 +02:00
Mehdi Goli
7ec8b40ad9 Collapsed revision
* Separating SYCL math function.
* Converting function overload to function specialisation.
* Applying the suggested design.
2018-08-28 14:20:48 +01:00
Christoph Hertzberg
73ca600bca Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
Christoph Hertzberg
ef4d79fed8 Disable/ReenableStupidWarnings did not work properly, when included recursively 2018-08-28 18:26:22 +02:00
Gael Guennebaud
befaf83f5f bug #1590: fix collision with some system headers defining the macro FP32 2018-08-28 13:21:28 +02:00
Christoph Hertzberg
42f3ee4fb8 Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop
Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers
2018-08-28 11:44:15 +02:00