Commit Graph

10634 Commits

Author SHA1 Message Date
João P. L. de Carvalho
5ac7984ffa Fix debug macros in p{load,store}u 2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40 Add missing pcmp_XX methods for double/Packet2d
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
2019-08-14 10:37:39 -06:00
João P. L. de Carvalho
787f6ef025 Fix packed load/store for PowerPC's VSX
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.

For double/Packet2d vec_xl/vec_xst should be prefered over vec_ld/vec_st, although the latter works when casted to float/Packet4f.

Silencing some weird warning with throw but some GCC versions. Such warning are not thrown by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294 Fix offset argument of ploadu/pstoreu for Altivec
If no offset is given, them it should be zero.

Also passes full address to vec_vsx_ld/st builtins.

Removes userless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.

Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e bug #1718: Add cast to successfully compile with clang on PowerPC
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Rasmus Munk Larsen
d55d392e7b Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
Add specializations for complex types since std::log1p and std::exp1m do not support complex.
2019-08-08 16:27:32 -07:00
Rasmus Munk Larsen
85928e5f47 Guard against repeated definition of EIGEN_MPL2_ONLY 2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen
facc4e4536 Disable tests for contraction with output kernels when using libxsmm, which does not support this. 2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen
eab7e52db2 [Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
A few benchmark numbers:

name                              old time/op             new time/op             delta
BM_Xent_16_10000_cpu              448µs ± 3%              389µs ± 2%  -13.21%
(p=0.008 n=5+5)
BM_Xent_32_10000_cpu              575µs ± 6%              454µs ± 3%  -21.00%          (p=0.008 n=5+5)
BM_Xent_64_10000_cpu              933µs ± 4%              712µs ± 1%  -23.71%          (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen
0987126165 Clean up unnecessary namespace specifiers in TensorBlock.h. 2019-08-07 12:12:52 -07:00
Gael Guennebaud
0050644b23 Fix doc regarding alignment and c++17 2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen
e2999d4c38 Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662.
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:


Benchmark                       Time(ns)        CPU(ns)     Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     128173         231326           2922  1.610G items/s

VS

Benchmark                       Time(ns)        CPU(ns)     Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     146683         246914           2719  1.509G items/s
2019-08-02 11:18:13 -07:00
Kyle Vedder
f22b7283a3 Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code. 2019-07-18 18:12:14 +00:00
Michael Grupp
6e17491f45 Fix typo in Umeyama method documentation 2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456 Remove {} accidentally added in previous commit 2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN block, to make it actually appear in the generated documentation. 2019-07-12 19:46:37 +02:00
Christoph Hertzberg
9237883ff1 Escape \# inside doxygen docu 2019-07-12 19:45:13 +02:00
Christoph Hertzberg
c2671e5315 Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Eugene Zhulenev
3cd148f983 Fix expression evaluation heuristic for TensorSliceOp 2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen
23b958818e Fix compiler for unsigned integers. 2019-07-09 11:18:25 -07:00
Eugene Zhulenev
6083014594 Add outer/inner chipping optimization for chipping dimension specified at runtime 2019-07-03 11:35:25 -07:00
Deven Desai
7eb2e0a95b adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.
Not having this attribute results in the following failures in the `--config=rocm` TF build.

```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error:  'Eigen::constCast':  no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
    typename Storage::Type result = constCast(m_impl.data());
                                    ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error:  'Eigen::constCast':  no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::Gpu\
Device>::data' requested here
    return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());

```

Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
2019-07-02 20:02:46 +00:00
Gael Guennebaud
ef8aca6a89 Merged in codeplaysoftware/eigen (pull request PR-667)
[SYCL] :

Approved-by: Gael Guennebaud <g.gael@free.fr>
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-07-02 12:45:23 +00:00
Eugene Zhulenev
4ac93f8edc Allocate non-const scalar buffer for block evaluation with DefaultDevice 2019-07-01 10:55:19 -07:00
Mehdi Goli
9ea490c82c [SYCL] :
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
  * Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
  * Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
  * Fixing the SYCL device macro in SpecialFunctionsImpl.h.
2019-07-01 16:27:28 +01:00
Eugene Zhulenev
81a03bec75 Fix TensorReverse on GPU with m_stride[i]==0 2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen
8053eeb51e Fix CUDA compilation error for pselect<half>. 2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen
74a9dd1102 Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit. 2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen
70d4020ad9 Remove comma causing warning in c++03 mode. 2019-06-28 11:39:45 -07:00
Eugene Zhulenev
6e7c76481a Merge with Eigen head 2019-06-28 11:22:46 -07:00
Eugene Zhulenev
878845cb25 Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred 2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen
1f61aee5ca [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:11:56 -07:00
Mehdi Goli
7d08fa805a [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:08:23 +01:00
Mehdi Goli
16a56b2ddd [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL  backend in Core
*  Supporting Vectorization
2019-06-27 12:25:09 +01:00
Christoph Hertzberg
adec097c61 Remove extra comma (causes warnings in C++03) 2019-06-26 16:14:28 +02:00
Eugene Zhulenev
229db81572 Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp 2019-06-25 15:41:37 -07:00
Deven Desai
ba506d5bd2 fix for a ROCm/HIP specificcompile errror introduced by a recent commit. 2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e Remove extra "one" in comment. 2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb Update comment as suggested by tra@google.com. 2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad Fix grammar. 2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause. 2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1 Fix CUDA build on Mac. 2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730 Various fixes for packet ops.
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Christoph Hertzberg
e0be7f30e1 bug #1724: Mask buggy warnings with g++-7
(grafted from 427f2f66d6
)
2019-06-14 14:57:46 +02:00
Rasmus Munk Larsen
6d432eae5d Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off. 2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816 Add workaround for choosing the right include files with FP16C support with clang. 2019-06-05 13:36:37 -07:00
Justin Carpentier
ffaf658ecd PR 655: Fix missing Eigen namespace in Macros 2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c [SYCL] Adding the SYCL memory model. The SYCL memory model provides :
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
  * an interface for placeholder accessor to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Larsen
c1b0aea653 Merged in Artem-B/eigen (pull request PR-654)
Minor build improvements

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1 Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. 2019-05-31 15:26:06 -07:00