Gael Guennebaud
031f17117d
bug #1741 : fix self-adjoint*matrix, triangular*matrix, and triangular^-1*matrix with a destination having a non-trivial inner-stride
2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08
Fix compilation of BLAS backend and frontend
2019-09-11 10:02:37 +02:00
Rasmus Larsen
97f1e1d89f
Merged in ezhulenev/eigen-01 (pull request PR-698)
...
ThreadLocal container that does not rely on thread local storage
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-09-10 23:19:33 +00:00
Eugene Zhulenev
d918bd9a8b
Update ThreadLocal to use separate Initialize/Release callables
2019-09-10 16:13:32 -07:00
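The idea behind a thread-local container that avoids thread local storage, with separate Initialize/Release callables, might be sketched as follows. This is a simplified illustration only: `PerThreadMap` is a hypothetical name, and the mutex-guarded map is an assumption made for brevity, not a description of how Eigen's ThreadLocal is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in, not Eigen's ThreadLocal: one value per thread,
// keyed by std::thread::id instead of relying on thread_local storage.
template <typename T>
class PerThreadMap {
 public:
  PerThreadMap(std::function<void(T&)> initialize,
               std::function<void(T&)> release)
      : initialize_(std::move(initialize)), release_(std::move(release)) {
    assert(initialize_ && release_);
  }

  // Run the release callable on every per-thread value.
  ~PerThreadMap() {
    for (auto& entry : values_) release_(entry.second);
  }

  PerThreadMap(const PerThreadMap&) = delete;
  PerThreadMap& operator=(const PerThreadMap&) = delete;

  // Return the calling thread's value, creating and initializing it on first use.
  // References into std::unordered_map stay valid across later insertions.
  T& local() {
    std::lock_guard<std::mutex> lock(mutex_);
    const std::thread::id id = std::this_thread::get_id();
    auto it = values_.find(id);
    if (it == values_.end()) {
      it = values_.emplace(id, T()).first;
      initialize_(it->second);
    }
    return it->second;
  }

  // Number of threads that have requested a value so far.
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return values_.size();
  }

 private:
  std::function<void(T&)> initialize_;
  std::function<void(T&)> release_;
  std::mutex mutex_;
  std::unordered_map<std::thread::id, T> values_;
};
```

Keeping initialization and release as separate callables lets the container own the lifetime of each value without knowing anything about what the value holds.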
Gael Guennebaud
afa8d13532
Fix some implicit literal to Scalar conversions in SparseCore
2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115
bug #1741 : fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride
2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956
bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1
2019-09-10 16:25:24 +02:00
Eugene Zhulenev
e3dec4dcc1
ThreadLocal container that does not rely on thread local storage
2019-09-09 15:18:14 -07:00
Gael Guennebaud
17226100c5
Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
...
Another solution would have been to make pshift* fully generic template functions with
partial specialization, which is always a mess in C++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3
Fix compilation without vector engine available (e.g., x86 with SSE disabled):
...
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7
Merged eigen/eigen
2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd
Fix doc issues regarding ndtri
2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926
Fix possible warning regarding strict equality comparisons
2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615
Merging from eigen/eigen.
2019-09-03 15:34:47 -04:00
Eugene Zhulenev
a8d264fa9c
Add test for const TensorMap underlying data mutation
2019-09-03 11:38:39 -07:00
Eugene Zhulenev
f68f2bba09
TensorMap constness should not change underlying storage constness
2019-09-03 11:08:09 -07:00
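The constness rule named in this commit resembles pointer semantics: a const pointer to mutable data can still write through. A minimal sketch under that assumption, where `BufferView` is a hypothetical stand-in for a non-owning map and not Eigen's TensorMap:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over external storage.
template <typename T>
class BufferView {
 public:
  BufferView(T* data, std::size_t size) : data_(data), size_(size) {}

  // The accessor is const-qualified yet returns T&: constness of the view
  // itself does not make the underlying storage const.
  T& operator()(std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};
```

A read-only view would instead be spelled `BufferView<const float>`, which puts the constness on the element type rather than on the view object.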
Gael Guennebaud
8e7e3d9bc8
Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)
2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27
PR 681: Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
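As a reference for what ndtri computes: it returns x such that the standard normal CDF at x equals p. The sketch below finds x by bisection, which is an assumption chosen for clarity and not the algorithm added in this commit. The names `normal_cdf` and `ndtri_bisect` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF written with the complementary error function.
double normal_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Hypothetical reference implementation: invert normal_cdf by bisection.
// Slow compared with a polynomial approximation, but easy to verify.
double ndtri_bisect(double p) {
  assert(p > 0.0 && p < 1.0);
  double lo = -40.0;  // normal_cdf(lo) underflows to 0
  double hi = 40.0;   // normal_cdf(hi) rounds to 1
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (normal_cdf(mid) < p) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return 0.5 * (lo + hi);
}
```

A slow reference like this can be useful for spot-checking a faster approximation at a handful of probabilities.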
Eugene Zhulenev
f59bed7a13
Change typedefs from private to protected to fix MSVC compilation
2019-09-03 19:11:36 -07:00
Eugene Zhulenev
47fefa235f
Allow move-only done callback in TensorAsyncDevice
2019-09-03 17:20:56 -07:00
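A move-only callback (for example, a lambda that captures a std::unique_ptr) cannot be stored in a std::function, which requires a copyable callable. One common way to accept such callbacks is to take the callable as a template parameter and move it. The sketch below illustrates that pattern only; `run_then_notify` is a hypothetical helper, and the commit does not say which mechanism TensorAsyncDevice uses.

```cpp
#include <memory>
#include <utility>

// Hypothetical helper: do some work, then pass the result to a completion
// callback. Taking the callable by value and moving it means that move-only
// callables are accepted as well as copyable ones.
template <typename DoneCallback>
void run_then_notify(int input, DoneCallback done) {
  const int result = input * 2;  // placeholder for the real work
  std::move(done)(result);
}
```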
Srinivas Vasudevan
18ceb3413d
Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b
Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
...
Add specializations for complex types since std::log1p and std::expm1 do not support complex.
2019-08-08 16:27:32 -07:00
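Because the standard library provides no std::complex overloads of log1p and expm1, complex arguments need their own code path. A naive sketch of such a fallback is given below; the helper names are hypothetical, and the commit does not state which formulas the Eigen specializations use.

```cpp
#include <cmath>
#include <complex>

// Hypothetical fallback: log1p(z) computed as log(1 + z) for complex z.
// This does not keep the extra accuracy near zero that the real std::log1p has.
template <typename T>
std::complex<T> log1p_complex(const std::complex<T>& z) {
  return std::log(std::complex<T>(T(1), T(0)) + z);
}

// Hypothetical fallback: expm1(z) computed as exp(z) - 1 for complex z.
template <typename T>
std::complex<T> expm1_complex(const std::complex<T>& z) {
  return std::exp(z) - std::complex<T>(T(1), T(0));
}
```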
Rasmus Munk Larsen
85928e5f47
Guard against repeated definition of EIGEN_MPL2_ONLY
2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen
facc4e4536
Disable tests for contraction with output kernels when using libxsmm, which does not support this.
2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen
eab7e52db2
[Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
...
A few benchmark numbers:
name                  old time/op  new time/op  delta
BM_Xent_16_10000_cpu  448µs ± 3%   389µs ± 2%   -13.21%  (p=0.008 n=5+5)
BM_Xent_32_10000_cpu  575µs ± 6%   454µs ± 3%   -21.00%  (p=0.008 n=5+5)
BM_Xent_64_10000_cpu  933µs ± 4%   712µs ± 1%   -23.71%  (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen
0987126165
Clean up unnecessary namespace specifiers in TensorBlock.h.
2019-08-07 12:12:52 -07:00
Gael Guennebaud
0050644b23
Fix doc regarding alignment and c++17
2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen
e2999d4c38
Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662 .
...
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 128173 231326 2922 1.610G items/s
VS
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 146683 246914 2719 1.509G items/s
2019-08-02 11:18:13 -07:00
Alberto Luaces
c694be1214
Fixed Tensor documentation formatting.
2019-07-23 09:24:06 +00:00
Gael Guennebaud
15f3d9d272
More colamd cleanup:
...
- Move colamd implementation into its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- Move some ugly free functions to member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d
Eigen_Colamd.h updated to replace constexpr with consts and enums.
2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face
Ordering.h edited to fix dependencies on Eigen_Colamd.h
2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2
Eigen_Colamd.h edited replacing macros with constexprs and functions.
2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf
Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions
2019-07-21 04:53:31 +00:00
Kyle Vedder
f22b7283a3
Added a leading asterisk for Doxygen to consume, as it was removing an asterisk intended to be part of the code.
2019-07-18 18:12:14 +00:00
Michael Grupp
6e17491f45
Fix typo in Umeyama method documentation
2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456
Remove {} accidentally added in previous commit
2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f
Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN
block, to make it actually appear in the generated documentation.
2019-07-12 19:46:37 +02:00
Christoph Hertzberg
9237883ff1
Escape \# inside doxygen docu
2019-07-12 19:45:13 +02:00
Christoph Hertzberg
c2671e5315
Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
...
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Eugene Zhulenev
3cd148f983
Fix expression evaluation heuristic for TensorSliceOp
2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen
23b958818e
Fix compiler for unsigned integers.
2019-07-09 11:18:25 -07:00
Eugene Zhulenev
6083014594
Add outer/inner chipping optimization for chipping dimension specified at runtime
2019-07-03 11:35:25 -07:00
Deven Desai
7eb2e0a95b
adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.
...
Not having this attribute results in the following failures in the `--config=rocm` TF build.
```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
typename Storage::Type result = constCast(m_impl.data());
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::GpuDevice>::data' requested here
return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
```
Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
2019-07-02 20:02:46 +00:00
Gael Guennebaud
ef8aca6a89
Merged in codeplaysoftware/eigen (pull request PR-667)
...
[SYCL] :
Approved-by: Gael Guennebaud <g.gael@free.fr>
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-07-02 12:45:23 +00:00
Eugene Zhulenev
4ac93f8edc
Allocate non-const scalar buffer for block evaluation with DefaultDevice
2019-07-01 10:55:19 -07:00
Mehdi Goli
9ea490c82c
[SYCL] :
...
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
* Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
* Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
* Fixing the SYCL device macro in SpecialFunctionsImpl.h.
2019-07-01 16:27:28 +01:00
Eugene Zhulenev
81a03bec75
Fix TensorReverse on GPU with m_stride[i]==0
2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen
8053eeb51e
Fix CUDA compilation error for pselect<half>.
2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen
74a9dd1102
Fix preprocessor condition to only generate a warning when calling Eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.
2019-06-28 11:56:21 -07:00