* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when local memory is version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
* Adding Test for SYCL
* Modifying SYCL CMake
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
* The specialization of array class in the different namespace for GCC<=6.4
* The implicit call to `std::array` constructor using the initializer list for GCC <=6.1
The errors were introduced by this commit :
After the above mentioned commit, some of the tests started failing with the following error
```
Built target cxx11_tensor_reduction
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible
DestinationBufferKind m_kind;
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible
DestinationBuffer m_destination;
^
```
For some reason HIPCC does not like device code to contain enum types which do not have the base-type explicitly declared. The fix is trivial, explicitly state "int" as the basetype
The errors were introduced by this commit : d38e6fbc27
After the above mentioned commit, some of the tests started failing with the following error
```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
3 errors generated.
```
This PR fixes the compile error by removing the "old" implementation for "erf" (assuming that the "new" implementation is what we want going forward. from a GPU point-of-view both implementations are the same).
This PR also fixes what seems like a cut-n-paste error in the aforementioned commit