* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when local memory is version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
* Adding Test for SYCL
* Modifying SYCL CMake
- remove most of the metaprogramming kung fu in MathFunctions.h (only keep functions that differs from the std)
- remove the overloads for array expression that were in the std namespace
* convenient functions:
- Horner and stabilized Horner evaluation
- polynomial coefficients from a set of given roots
- Cauchy bounds
* a QR based polynomial solver
This does the job but it is only a first version. Further plans:
improved docs, more tests, improve code by refactoring, add convenience
functions for sine, cosine, sinh, cosh, and (eventually) add the matrix
logarithm.
* Enable compilation of examples for unsupported.
* Fix use of std::vector in BVH example.
* Add an example for the matrix exponential.
* Bug fixes in unsupported/doc/{examples,snippets}/CMakeLists.txt .