Commit Graph

6119 Commits

Author SHA1 Message Date
Gael Guennebaud
c8b2c603b0 Fix speed issue with SimplicialLDLT for complexes: the diagonal is real! 2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen
13ef08e5ac Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h. 2019-09-27 13:56:04 -07:00
Eugene Zhulenev
0c845e28c9 Fix erf in c++03 2019-09-25 11:31:45 -07:00
Deven Desai
5e186b1987 Fix for the HIP build+test errors.
The errors were introduced by this commit : d38e6fbc27


After the above mentioned commit, some of the tests started failing with the following error


```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
  return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
                     ^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
      ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                      ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                                ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
3 errors generated.
```


This PR fixes the compile error by removing the "old" implementation for "erf" (assuming that the "new" implementation is what we want going forward. from a GPU point-of-view both implementations are the same).

This PR also fixes what seems like a cut-n-paste error in the aforementioned commit
2019-09-25 15:39:13 +00:00
Rasmus Larsen
d38e6fbc27 Merged in rmlarsen/eigen (pull request PR-704)
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Eugene Zhulenev
ef9dfee7bd Tensor block evaluation V2 support for unary/binary/broadcsting 2019-09-24 12:52:45 -07:00
Christoph Hertzberg
efd9867ff0 bug #1746: Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types 2019-09-24 11:09:58 +02:00
Rasmus Munk Larsen
6de5ed08d8 Add generic PacketMath implementation of the Error Function (erf). 2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen
28b6786498 Fix build on setups without AVX512DQ. 2019-09-19 12:36:09 -07:00
Deven Desai
e02d429637 Fix for the HIP build+test errors.
The errors were introduced by this commit : 6e215cf109


The fix is switching to using ::<math_func> instead std::<math_func> when compiling for GPU
2019-09-18 18:44:20 +00:00
Srinivas Vasudevan
6e215cf109 Add Bessel functions to SpecialFunctions.
- Split SpecialFunctions files in to a separate BesselFunctions file.

In particular add:
    - Modified bessel functions of the second kind k0, k1, k0e, k1e
    - Bessel functions of the first kind j0, j1
    - Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Srinivas Vasudevan
facdec5aa7 Add packetized versions of i0e and i1e special functions.
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
  - Move chebevl to GenericPacketMathFunctions.


A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

Before:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            57.3           57.3     10000000
BM_eigen_i0e_double/8           398            398        1748554
BM_eigen_i0e_double/64         3184           3184         218961
BM_eigen_i0e_double/512       25579          25579          27330
BM_eigen_i0e_double/4k       205043         205042           3418
BM_eigen_i0e_double/32k     1646038        1646176            422
BM_eigen_i0e_double/256k   13180959       13182613             53
BM_eigen_i0e_double/1M     52684617       52706132             10
BM_eigen_i0e_float/1             28.4           28.4     24636711
BM_eigen_i0e_float/8             75.7           75.7      9207634
BM_eigen_i0e_float/64           512            512        1000000
BM_eigen_i0e_float/512         4194           4194         166359
BM_eigen_i0e_float/4k         32756          32761          21373
BM_eigen_i0e_float/32k       261133         261153           2678
BM_eigen_i0e_float/256k     2087938        2088231            333
BM_eigen_i0e_float/1M       8380409        8381234             84
BM_eigen_i1e_double/1            56.3           56.3     10000000
BM_eigen_i1e_double/8           397            397        1772376
BM_eigen_i1e_double/64         3114           3115         223881
BM_eigen_i1e_double/512       25358          25361          27761
BM_eigen_i1e_double/4k       203543         203593           3462
BM_eigen_i1e_double/32k     1613649        1613803            428
BM_eigen_i1e_double/256k   12910625       12910374             54
BM_eigen_i1e_double/1M     51723824       51723991             10
BM_eigen_i1e_float/1             28.3           28.3     24683049
BM_eigen_i1e_float/8             74.8           74.9      9366216
BM_eigen_i1e_float/64           505            505        1000000
BM_eigen_i1e_float/512         4068           4068         171690
BM_eigen_i1e_float/4k         31803          31806          21948
BM_eigen_i1e_float/32k       253637         253692           2763
BM_eigen_i1e_float/256k     2019711        2019918            346
BM_eigen_i1e_float/1M       8238681        8238713             86


After:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            15.8           15.8     44097476
BM_eigen_i0e_double/8            99.3           99.3      7014884
BM_eigen_i0e_double/64          777            777         886612
BM_eigen_i0e_double/512        6180           6181         100000
BM_eigen_i0e_double/4k        48136          48140          14678
BM_eigen_i0e_double/32k      385936         385943           1801
BM_eigen_i0e_double/256k    3293324        3293551            228
BM_eigen_i0e_double/1M     12423600       12424458             57
BM_eigen_i0e_float/1             16.3           16.3     43038042
BM_eigen_i0e_float/8             30.1           30.1     23456931
BM_eigen_i0e_float/64           169            169        4132875
BM_eigen_i0e_float/512         1338           1339         516860
BM_eigen_i0e_float/4k         10191          10191          68513
BM_eigen_i0e_float/32k        81338          81337           8531
BM_eigen_i0e_float/256k      651807         651984           1000
BM_eigen_i0e_float/1M       2633821        2634187            268
BM_eigen_i1e_double/1            16.2           16.2     42352499
BM_eigen_i1e_double/8           110            110        6316524
BM_eigen_i1e_double/64          822            822         851065
BM_eigen_i1e_double/512        6480           6481         100000
BM_eigen_i1e_double/4k        51843          51843          10000
BM_eigen_i1e_double/32k      414854         414852           1680
BM_eigen_i1e_double/256k    3320001        3320568            212
BM_eigen_i1e_double/1M     13442795       13442391             53
BM_eigen_i1e_float/1             17.6           17.6     41025735
BM_eigen_i1e_float/8             35.5           35.5     19597891
BM_eigen_i1e_float/64           240            240        2924237
BM_eigen_i1e_float/512         1424           1424         485953
BM_eigen_i1e_float/4k         10722          10723          65162
BM_eigen_i1e_float/32k        86286          86297           8048
BM_eigen_i1e_float/256k      691821         691868           1000
BM_eigen_i1e_float/1M       2777336        2777747            256


This shows anywhere from a 50% to 75% improvement on these operations.

I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).

Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Srinivas Vasudevan
b052ec6992 Merged eigen/eigen into default 2019-09-11 18:01:54 -07:00
Deven Desai
cdb377d0cb Fix for the HIP build+test errors introduced by the ndtri support.
The fixes needed are
 * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
 * switching to using ::<math_func> instead std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
 * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Gael Guennebaud
747c6a51ca bug #1736: fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i) 2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d bug #1741: fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride 2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08 Fix compilation of BLAS backend and frontend 2019-09-11 10:02:37 +02:00
Gael Guennebaud
afa8d13532 Fix some implicit literal to Scalar conversions in SparseCore 2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115 bug #1741: fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride 2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956 bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1 2019-09-10 16:25:24 +02:00
Gael Guennebaud
17226100c5 Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
Another solution would have been to make pshift* fully generic template functions with
partial specialization which is always a mess in c++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3 Fix compilation without vector engine available (e.g., x86 with SSE disabled):
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7 Merged eigen/eigen 2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd Fix doc issues regarding ndtri 2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926 Fix possible warning regarding strict equality comparisons 2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615 Merging from eigen/eigen. 2019-09-03 15:34:47 -04:00
Gael Guennebaud
8e7e3d9bc8 Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688) 2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27 PR 681: Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Eugene Zhulenev
f59bed7a13 Change typedefs from private to protected to fix MSVC compilation 2019-09-03 19:11:36 -07:00
Srinivas Vasudevan
18ceb3413d Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
Add specializations for complex types since std::log1p and std::exp1m do not support complex.
2019-08-08 16:27:32 -07:00
Gael Guennebaud
15f3d9d272 More colamd cleanup:
- Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- move some ugly free functions as member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d Eigen_Colamd.h updated to replace constexpr with consts and enums. 2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face Ordering.h edited to fix dependencies on Eigen_Colamd.h 2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2 Eigen_Colamd.h edited replacing macros with constexprs and functions. 2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions 2019-07-21 04:53:31 +00:00
Michael Grupp
6e17491f45 Fix typo in Umeyama method documentation 2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456 Remove {} accidentally added in previous commit 2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN block, to make it actually appear in the generated documentation. 2019-07-12 19:46:37 +02:00
Christoph Hertzberg
c2671e5315 Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Rasmus Munk Larsen
23b958818e Fix compiler for unsigned integers. 2019-07-09 11:18:25 -07:00
Anshul Jaiswal
fab51d133e Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD
to prevent conflicts with other libraries / code.
2019-06-08 21:09:06 +00:00
Rasmus Munk Larsen
f6c51d9209 Fix missing header inclusion and colliding definitions for half type casting, which broke
build with -march=native on Haswell/Skylake.
2019-08-30 14:03:29 -07:00
Rasmus Munk Larsen
1187bb65ad Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf. 2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen
9aba527405 Revert changes to std_falback::log1p that broke handling of arguments less than -1. Fix packet op accordingly. 2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories. 2019-08-27 11:30:31 -07:00
Christoph Hertzberg
2fb24384c9 Merged in jaopaulolc/eigen (pull request PR-679)
Fixes for Altivec/VSX and compilation with clang on PowerPC
2019-08-22 15:57:33 +00:00
João P. L. de Carvalho
5ac7984ffa Fix debug macros in p{load,store}u 2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40 Add missing pcmp_XX methods for double/Packet2d
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen
a3298b22ec Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)

The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.

Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
João P. L. de Carvalho
787f6ef025 Fix packed load/store for PowerPC's VSX
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.

For double/Packet2d vec_xl/vec_xst should be prefered over vec_ld/vec_st, although the latter works when casted to float/Packet4f.

Silencing some weird warning with throw but some GCC versions. Such warning are not thrown by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294 Fix offset argument of ploadu/pstoreu for Altivec
If no offset is given, them it should be zero.

Also passes full address to vec_vsx_ld/st builtins.

Removes userless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.

Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e bug #1718: Add cast to successfully compile with clang on PowerPC
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Justin Carpentier
ffaf658ecd PR 655: Fix missing Eigen namespace in Macros 2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c [SYCL] Adding the SYCL memory model. The SYCL memory model provides :
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
  * an interface for placeholder accessor to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Munk Larsen
8053eeb51e Fix CUDA compilation error for pselect<half>. 2019-06-28 12:07:29 -07:00
Mehdi Goli
16a56b2ddd [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL  backend in Core
*  Supporting Vectorization
2019-06-27 12:25:09 +01:00
Deven Desai
ba506d5bd2 fix for a ROCm/HIP specificcompile errror introduced by a recent commit. 2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e Remove extra "one" in comment. 2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb Update comment as suggested by tra@google.com. 2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad Fix grammar. 2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause. 2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1 Fix CUDA build on Mac. 2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730 Various fixes for packet ops.
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Christoph Hertzberg
e0be7f30e1 bug #1724: Mask buggy warnings with g++-7
(grafted from 427f2f66d6
)
2019-06-14 14:57:46 +02:00
Rasmus Munk Larsen
6d432eae5d Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off. 2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816 Add workaround for choosing the right include files with FP16C support with clang. 2019-06-05 13:36:37 -07:00
Rasmus Munk Larsen
b08527b0c1 Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. 2019-05-31 15:26:06 -07:00
Deven Desai
2c38930161 fix for HIP build errors that were introduced by a commit earlier this week 2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves
56bc4974fb GEMV: remove double declaration of constant.
That was hurting users with compilers that would object to proceed with
that:

"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
         LhsPacketSize = Traits::LhsPacketSize,
         ^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
  static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
2019-05-23 14:50:29 -07:00
Christoph Hertzberg
ac21a08c13 Cast Index to RealScalar
This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
Reported by Peter Anderson-Sprecher via eMail
2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen
3eb5ad0ed0 Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177
and are part of LLVM 3.8.0.
2019-05-20 17:19:20 -07:00
Rasmus Larsen
e92486b8c3 Merged in rmlarsen/eigen (pull request PR-643)
Make Eigen build with cuda 10 and clang.

Approved-by: Justin Lebar <justin.lebar@gmail.com>
2019-05-20 17:02:39 +00:00
Gael Guennebaud
cc7ecbb124 Merged in scramsby/eigen (pull request PR-646)
Eigen: Fix MSVC C++17 language standard detection logic
2019-05-20 07:19:10 +00:00
Rasmus Larsen
bf9cbed8d0 Merged in glchaves/eigen (pull request PR-635)
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-17 19:40:50 +00:00
Rasmus Munk Larsen
ab0a30e429 Make Eigen build with cuda 10 and clang. 2019-05-15 13:32:15 -07:00
Christoph Hertzberg
5f32b79edc Collapsed revision from PR-641
* SparseLU.h - corrected example, it didn't compile
* Changed encoding back to UTF8
2019-05-13 19:02:30 +02:00
Anuj Rawat
ad372084f5 Removing unused API to fix compile error in TensorFlow due to
AVX512VL, AVX512BW usage
2019-05-12 14:43:10 +00:00
Christoph Hertzberg
4ccd1ece92 bug #1707: Fix deprecation warnings, or disable warnings when testing deprecated functions 2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen
d3ef7cf03e Fix build with clang on Windows. 2019-05-09 11:07:04 -07:00
Eugene Zhulenev
45b40d91ca Fix AVX512 & GCC 6.3 compilation 2019-05-07 16:44:55 -07:00
Christoph Hertzberg
cca76c272c Restore C++03 compatibility 2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7 Fix traits for scalar_logistic_op. 2019-05-03 15:49:09 -07:00
Scott Ramsby
ff06ef7584 Eigen: Fix MSVC C++17 language standard detection logic
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
2019-05-03 14:14:09 -07:00
Eugene Zhulenev
e9f0eb8a5e Add masked_store_available to unpacket_traits 2019-05-02 14:52:58 -07:00
Eugene Zhulenev
96e30e936a Add masked pstoreu for Packet16h 2019-05-02 14:11:01 -07:00
Eugene Zhulenev
b4010f02f9 Add masked pstoreu to AVX and AVX512 PacketMath 2019-05-02 13:14:18 -07:00
Gael Guennebaud
578407f42f Fix regression in changeset ae33e866c7 2019-05-02 15:45:21 +02:00
Gustavo Lima Chaves
d4dcb71bcb Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
We take advantage of smaller SIMD registers as well, in that case.

Gains up to 3x for select input sizes.
2019-04-26 14:12:39 -07:00
Andy May
ae33e866c7 Fix compilation with PGI version 19 2019-04-25 21:23:19 +01:00
Gael Guennebaud
665ac22cc6 Merged in ezhulenev/eigen-01 (pull request PR-632)
Fix doxygen warnings
2019-04-25 20:02:20 +00:00
Eugene Zhulenev
8ead5bb3d8 Fix doxygen warnings to enable statis code analysis 2019-04-24 12:42:28 -07:00
Eugene Zhulenev
07355d47c6 Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h 2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen
144ca33321 Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings. 2019-04-24 08:50:07 -07:00
Eugene Zhulenev
a7b7f3ca8a Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings 2019-04-23 17:23:19 -07:00
Eugene Zhulenev
68a2a8c445 Use packet ops instead of AVX2 intrinsics 2019-04-23 11:41:02 -07:00
Anuj Rawat
8c7a6feb8e Adding lowlevel APIs for optimized RHS packet load in TensorFlow
SpatialConvolution

Low-level APIs are added in order to optimized packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for scenario when a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.

This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.

Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim

Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

AVX512:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128,   24x24,  1, 64,   8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
 32,   24x24,  3, 64,   5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128,   24x24,  3, 64,   3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
 32,   14x14, 24, 64,   5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128,  3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X

AVX2:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
 32,   24x24,  3, 64,   5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128,   24x24,  1, 64,   5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128,   24x24,  3, 64,   3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128,  3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).

On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:

AVX512:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  |  41350                     | 15073                   | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |   7277                     |  7341                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |   8675                     |  8681                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  |  24155                     | 16079                   | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  |  25052                     | 17152                   | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) |  18269                     | 18345                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) |  19468                     | 19872                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 156060                     | 42432                   | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 132701                     | 36944                   | 3.59X

AVX2:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 26233                      | 12393                   | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |  6091                      |  6062                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |  7427                      |  7408                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 23453                      | 20826                   | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 23167                      | 22091                   | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422                      | 23682                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165                      | 23663                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 72689                      | 44969                   | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 61732                      | 39779                   | 1.55X

All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
Gael Guennebaud
45e65fbb77 bug #1695: fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign. 2019-03-27 20:16:58 +01:00
William D. Irons
8de66719f9 Collapsed revision from PR-619
* Add support for pcmp_eq in AltiVec/Complex.h
* Fixed implementation of pcmp_eq for double

The new logic is based on the logic from NEON for double.
2019-03-26 18:14:49 +00:00
Gael Guennebaud
f11364290e ICC does not support -fno-unsafe-math-optimizations 2019-03-22 09:26:24 +01:00