Commit Graph

5789 Commits

Author SHA1 Message Date
Gael Guennebaud
c68bd2fa7a Cleanup 2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303 Fix pandnot order in AVX512 2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6 Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) 2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d Disable gcc's fma workaround for gcc >= 8 (based on GEMM benchmarks) 2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7 same for pmax 2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6 pmin/pmax on SSE: make sure to use AVX instructions when AVX is enabled, and disable the gcc workaround for fixed gcc versions 2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b Add missing SSE/AVX type-casting in AVX512 mode 2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. 2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35 Use explicit packet type in SSE/PacketMath pldexp 2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08 do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32, but doing so here without the overhead of initializing the unused lane would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal both before and after this change: we should be reading either 2 or 4 float32 values per load instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before and after this patch alike, we are only loading 4 bytes of useful data here (even if, before this patch, we were technically loading 8 only to use the first 4). 2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24 bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. 2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21 Update pshiftleft to pass the shift as a true compile-time integer. 2018-11-27 22:57:30 +01:00
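Taking the shift count as a template parameter makes it a compile-time constant inside the function, which is what immediate-count shift instructions need and what lets the count be validated with `static_assert`. A minimal scalar sketch of that calling convention follows; the packet type `Packet4i_sketch` and the function `pshiftleft_sketch` are hypothetical stand-ins, not Eigen's actual types or signature.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane integer "packet", for illustration only.
using Packet4i_sketch = std::array<std::int32_t, 4>;

// The shift amount N is a template parameter, so it is a compile-time
// constant inside the function and can be range-checked at compile time.
template <int N>
Packet4i_sketch pshiftleft_sketch(const Packet4i_sketch& a) {
  static_assert(N >= 0 && N < 32, "shift must be in [0, 31]");
  Packet4i_sketch r{};
  for (int i = 0; i < 4; ++i) {
    // Shift as unsigned to avoid undefined behavior on negative lanes.
    r[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) << N);
  }
  return r;
}
```

For example, `pshiftleft_sketch<3>(p)` multiplies every lane by 8, while a call with a runtime shift count does not compile, which is the point of passing the shift as a true compile-time integer.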
Gael Guennebaud
fa7fd61eda Unify SSE/AVX psin functions.
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
 - add packet-float to packet-int type traits
 - add packet float<->int reinterpret casts
 - add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
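The `pselect` mentioned above is a lane-wise select between two packets under a mask. A portable bitwise sketch of the same operation is below, assuming masks whose lanes are either all ones or all zeros (as packet comparisons produce); `Packet4u_sketch` and `pselect_sketch` are hypothetical names. The AVX version instead relies on a blendv instruction, which looks only at the most significant bit of each mask lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane packet of raw 32-bit lanes, for illustration only.
using Packet4u_sketch = std::array<std::uint32_t, 4>;

// Lane-wise select: (mask & a) | (~mask & b).
// Each mask lane must be all ones (pick a) or all zeros (pick b).
Packet4u_sketch pselect_sketch(const Packet4u_sketch& mask,
                               const Packet4u_sketch& a,
                               const Packet4u_sketch& b) {
  Packet4u_sketch r{};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] == 0u || mask[i] == 0xFFFFFFFFu);
    r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
  }
  return r;
}
```

For all-ones/all-zeros masks both formulations give the same result; a blend instruction simply does in one operation what the bitwise form does in three.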
Benoit Jacob
7b1cb8a440 fix the build on 64-bit ARM when NEON is disabled 2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008 Unify Altivec/VSX pexp(double) with default implementation 2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e cleanup 2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10 Unify SSE and AVX pexp for double. 2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054 Unify NEON's pexp with generic implementation 2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc Unify Altivec/VSX's pexp with generic implementation 2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5 Unify SSE and AVX implementation of pexp 2018-11-26 16:36:19 +01:00
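Vectorized exp implementations of this kind typically follow a range-reduction structure: write x = n*ln2 + r with n an integer and |r| at most about ln2/2, approximate exp(r) with a polynomial, and rebuild the result as 2^n * exp(r). The scalar sketch below illustrates that idea with a plain degree-7 Taylor polynomial; `exp_sketch` is a hypothetical name, and its coefficients, accuracy and overflow handling are simplifications, not Eigen's code.

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of range-reduced exp: x = n*ln2 + r with |r| <= ~ln2/2,
// exp(x) = 2^n * exp(r). Assumes a moderate input (|x| < 700).
double exp_sketch(double x) {
  assert(std::fabs(x) < 700.0);
  const double ln2 = 0.6931471805599453;
  const double n = std::nearbyint(x / ln2);  // nearest integer exponent
  const double r = x - n * ln2;              // reduced argument
  // Degree-7 Taylor polynomial for exp(r), evaluated with Horner's scheme.
  double p = 1.0 / 5040.0;
  p = p * r + 1.0 / 720.0;
  p = p * r + 1.0 / 120.0;
  p = p * r + 1.0 / 24.0;
  p = p * r + 1.0 / 6.0;
  p = p * r + 0.5;
  p = p * r + 1.0;
  p = p * r + 1.0;
  return std::ldexp(p, static_cast<int>(n));  // p * 2^n
}
```

Keeping |r| small is what allows a short polynomial; the final scaling by 2^n is exact, so it adds no approximation error.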
Gael Guennebaud
c2f35b1b47 Unify Altivec/VSX's plog with generic implementation, and enable it! 2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8 Unify NEON's plog with generic implementation 2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114 First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
To this end, I added the following functions: pzero, pcmp_*, pfrexp, and pset1frombits.
2018-11-26 14:21:24 +01:00
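`pfrexp` plays the role, for packets, that `std::frexp` plays for scalars: it splits x into a mantissa m in [0.5, 1) and an integer exponent e with x = m * 2^e, so that log(x) = log(m) + e*ln2. The scalar sketch below shows that decomposition; `log_sketch` is a hypothetical name, and it delegates the mantissa part to `std::log1p` where a packet implementation would typically evaluate a polynomial.

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of log via exponent/mantissa decomposition:
// x = m * 2^e with m in [0.5, 1), so log(x) = log(m) + e * ln2.
double log_sketch(double x) {
  assert(x > 0.0 && std::isfinite(x));
  int e = 0;
  double m = std::frexp(x, &e);  // scalar analogue of pfrexp
  // Re-center m into [sqrt(0.5), sqrt(2)) so that log(m) stays near zero.
  if (m < std::sqrt(0.5)) {
    m *= 2.0;
    e -= 1;
  }
  // A packet implementation would use a polynomial here instead of log1p.
  return std::log1p(m - 1.0) + e * 0.6931471805599453;
}
```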
Gael Guennebaud
5f6045077c Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" 2018-11-26 14:14:07 +01:00
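The argument-order issue comes from x86's ANDNOT instructions (for example `_mm_andnot_ps`), which complement their first operand and compute (~x) & y, whereas the generic `pandnot(A,B)` means A & ~B. The portable sketch below emulates both on plain 32-bit lanes to show that a correct mapping must swap the operands; `Packet4u_sketch`, `andnot_x86_order` and `pandnot_sketch` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane packet of raw 32-bit lanes, for illustration only.
using Packet4u_sketch = std::array<std::uint32_t, 4>;

// Mimics the operand order of x86 ANDNOT instructions: (~x) & y,
// i.e. the FIRST operand is the one that gets complemented.
Packet4u_sketch andnot_x86_order(const Packet4u_sketch& x,
                                 const Packet4u_sketch& y) {
  Packet4u_sketch r{};
  for (int i = 0; i < 4; ++i) r[i] = ~x[i] & y[i];
  return r;
}

// Generic semantics: pandnot(a, b) == a & ~b ("a and not b").
// Mapping it onto the x86 order therefore requires swapping the operands.
Packet4u_sketch pandnot_sketch(const Packet4u_sketch& a,
                               const Packet4u_sketch& b) {
  return andnot_x86_order(b, a);
}
```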
Gael Guennebaud
0836a715d6 bug #1611: fix plog(0) on NEON 2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4 Fix typos 2018-11-23 22:22:14 +00:00
Gael Guennebaud
ccabdd88c9 Fix reserved usage of double underscores (__) in macro names 2018-11-23 16:01:47 +01:00
Gael Guennebaud
a7842daef2 Fix several members left uninitialized by constructors 2018-11-23 15:10:28 +01:00
Gael Guennebaud
a476054879 bug #1624: improve matrix-matrix product on ARM 64, 20% speedup 2018-11-23 10:25:19 +01:00
Gael Guennebaud
4b2cebade8 Workaround weird MSVC bug 2018-11-21 15:53:37 +01:00
Gael Guennebaud
6a510fe69c Make MaxPacketSize a true upper bound, even for fixed-size inputs 2018-11-16 11:25:32 +01:00
Mark D Ryan
670d56441c PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
Commit aa110e681b optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize

This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment

A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046 Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES 2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f typo 2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a help doxygen linking to DenseBase::NullaryExpr 2018-11-14 14:42:59 +01:00
luz.paz
f67b19a884 [PATCH 1/2] Misc. typos
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelist consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
 CMakeLists.txt                                | 26 +++++++++----------
 Eigen/src/Core/GenericPacketMath.h            |  2 +-
 Eigen/src/SparseLU/SparseLU.h                 |  2 +-
 bench/bench_norm.cpp                          |  2 +-
 doc/HiPerformance.dox                         |  2 +-
 doc/QuickStartGuide.dox                       |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorChipping.h   |  6 ++---
 .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h  |  2 +-
 .../src/Tensor/TensorForwardDeclarations.h    |  4 +--
 .../src/Tensor/TensorGpuHipCudaDefines.h      |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorReduction.h  |  2 +-
 .../CXX11/src/Tensor/TensorReductionGpu.h     |  2 +-
 .../test/cxx11_tensor_concatenation.cpp       |  2 +-
 unsupported/test/cxx11_tensor_executor.cpp    |  2 +-
 14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Rasmus Munk Larsen
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. 2018-11-12 13:42:24 -08:00
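The commit does not spell out the optimized approximation. For reference, a straightforward numerically stable scalar logistic function is sketched below; it is a baseline, not the optimized float kernel, and `logistic_sketch` is a hypothetical name. Branching on the sign of x keeps the argument of `std::exp` non-positive, so it cannot overflow.

```cpp
#include <cassert>
#include <cmath>

// Numerically stable scalar logistic function 1 / (1 + exp(-x)).
// Both branches call std::exp with a non-positive argument.
float logistic_sketch(float x) {
  if (x >= 0.0f) {
    return 1.0f / (1.0f + std::exp(-x));
  }
  const float e = std::exp(x);  // e in (0, 1) for finite negative x
  return e / (1.0f + e);
}
```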
Gael Guennebaud
0105146915 Fix warning in c++03 2018-11-10 09:11:38 +01:00
Gael Guennebaud
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators 2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba bug #1619: make const and non-const iterators compatible 2018-11-09 16:49:19 +01:00
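A common C++ pattern for making const and non-const iterators interoperate is to template one iterator class on constness, allow the implicit non-const to const conversion (but not the reverse), and define comparisons across both variants. The sketch below illustrates that pattern on a `std::vector<int>`; `VecIterSketch` is hypothetical and this is not Eigen's iterator implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical iterator over std::vector<int>, templated on constness.
template <bool IsConst>
class VecIterSketch {
 public:
  using Container = typename std::conditional<IsConst, const std::vector<int>,
                                               std::vector<int>>::type;
  using Reference = typename std::conditional<IsConst, const int&, int&>::type;

  VecIterSketch(Container* c, std::size_t i) : c_(c), i_(i) {}

  // Converting constructor: non-const -> const only (SFINAE-disabled otherwise).
  template <bool OtherConst,
            typename = typename std::enable_if<IsConst && !OtherConst>::type>
  VecIterSketch(const VecIterSketch<OtherConst>& other)
      : c_(other.c_), i_(other.i_) {}

  Reference operator*() const { return (*c_)[i_]; }
  VecIterSketch& operator++() {
    ++i_;
    return *this;
  }

  // Comparisons accept any mix of const and non-const iterators.
  template <bool OtherConst>
  bool operator==(const VecIterSketch<OtherConst>& other) const {
    return c_ == other.c_ && i_ == other.i_;
  }
  template <bool OtherConst>
  bool operator!=(const VecIterSketch<OtherConst>& other) const {
    return !(*this == other);
  }

 private:
  template <bool>
  friend class VecIterSketch;  // lets each variant read the other's members
  Container* c_;
  std::size_t i_;
};
```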
Gael Guennebaud
bd9a00718f Let doxygen see lastN 2018-11-09 11:35:48 +01:00
Gael Guennebaud
a368848473 Recent Xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE 2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6 Fix max-size in indexed-view 2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff Merged in glchaves/eigen (pull request PR-539)
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gustavo Lima Chaves
4ad359237a Vectorize row-by-row gebp loop iterations on 16 packets as well
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Matthieu Vigne
8d7a73e48e bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d bug #1618: Use different power-of-2 check to avoid MSVC warning 2018-11-01 13:23:19 +01:00
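The commit does not say which check replaced the old one or which warning it triggered. One widely used power-of-2 test relies on the fact that a power of two has exactly one bit set, so clearing its lowest set bit with n & (n - 1) yields zero. The sketch below uses unsigned arithmetic and a hypothetical name, and is not necessarily the check Eigen adopted.

```cpp
#include <cassert>

// A positive integer is a power of two exactly when it has a single bit set,
// i.e. when n & (n - 1) == 0. Zero is excluded explicitly.
constexpr bool is_power_of_two_sketch(unsigned int n) {
  return n != 0u && (n & (n - 1u)) == 0u;
}

static_assert(is_power_of_two_sketch(64u), "64 is a power of two");
static_assert(!is_power_of_two_sketch(48u), "48 is not a power of two");
```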
Christian von Schultz
4a40b3785d Collapsed revision (based on pull request PR-325)
* Support compiling without IO streams

Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
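A sketch of how such a switch is typically honored in a header: the stream headers and any `operator<<` overloads are compiled only when `EIGEN_NO_IO` is not defined. The macro name comes from the commit; `Vec2Sketch` and the guard placement are illustrative assumptions, not Eigen's actual layout.

```cpp
#include <cassert>

// Stream headers are pulled in only when EIGEN_NO_IO is not defined.
#ifndef EIGEN_NO_IO
#include <ostream>
#include <sstream>  // for std::ostringstream in the usage example
#endif

// Hypothetical library type, for illustration only.
struct Vec2Sketch {
  double x, y;
};

#ifndef EIGEN_NO_IO
// Printing support exists only in builds that allow IO streams.
inline std::ostream& operator<<(std::ostream& os, const Vec2Sketch& v) {
  return os << v.x << ' ' << v.y;
}
#endif
```

Defining the macro before inclusion, e.g. with `#define EIGEN_NO_IO` or `-DEIGEN_NO_IO` on the compiler command line, removes both the includes and the overload, so nothing in the header depends on the IO streams part of the standard library.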
Rasmus Munk Larsen
14054e217f Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC). This breaks several targets in the TensorFlow Cuda build, e.g.,
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code

4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
9caafca550 Merged in rmlarsen/eigen (pull request PR-532)
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00