Commit Graph

1124 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
d4a727d092 Disable min/max NaN propagation in test cxx11_tensor_expr
The current pmin/pmax implementation for Arm Neon propagates NaNs
differently than std::min/std::max.

See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
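
A minimal sketch of why NaN handling in min/max is easy to get wrong, using only the C++ standard library (it does not call Eigen's `pmin`/`pmax`, and the helper names are hypothetical). `std::min(a, b)` is specified as `(b < a) ? b : a`, so its result depends on which argument holds the NaN, whereas `std::fmin` returns the non-NaN operand when exactly one input is NaN:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if std::min(a, b) yields NaN.
// std::min(a, b) is (b < a) ? b : a, and every comparison with NaN is false,
// so the first argument is returned whenever either operand is NaN.
inline bool std_min_is_nan(double a, double b) {
  return std::isnan(std::min(a, b));
}

// Hypothetical helper: true if std::fmin(a, b) yields NaN.
// std::fmin returns the other operand when exactly one input is NaN.
inline bool fmin_is_nan(double a, double b) {
  return std::isnan(std::fmin(a, b));
}

// Self-check of the behaviour described above (assumes IEEE 754 doubles and
// no -ffast-math style flags).
inline void check_nan_behaviour() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(std_min_is_nan(nan, 1.0));    // NaN first: NaN comes back
  assert(!std_min_is_nan(1.0, nan));   // NaN second: 1.0 comes back
  assert(!fmin_is_nan(nan, 1.0));      // fmin ignores a single NaN
  assert(!fmin_is_nan(1.0, nan));
  assert(fmin_is_nan(nan, nan));       // both NaN: result is NaN
}
```

A test that compares a vectorized min/max against `std::min`/`std::max` therefore bakes in one particular argument-order convention, which may not match what a given SIMD instruction set does.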
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen::bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unused `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Thales Sabino
1fcaaf460f Update FindComputeCpp.cmake to fix build problems on Windows
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
2020-06-05 20:51:20 +00:00
Antonio Sánchez
8719b9c5bc Disable test for 32-bit systems (e.g. ARM, i386)
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.

https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined1
2020-05-28 17:40:15 +00:00
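
The guard described in this commit message can be sketched as follows. This is an illustration only: `kHasUint128` and `mul_high` are hypothetical names, not Eigen code, and the sketch assumes a GCC- or Clang-compatible compiler, where `__SIZEOF_INT128__` is predefined on targets that provide `__uint128_t`.

```cpp
#include <cstdint>

#if defined(__SIZEOF_INT128__)
// 128-bit integers are available (typical for 64-bit GCC/Clang targets).
constexpr bool kHasUint128 = true;

// Upper 64 bits of the 128-bit product a * b, using the compiler extension.
inline std::uint64_t mul_high(std::uint64_t a, std::uint64_t b) {
  return static_cast<std::uint64_t>((static_cast<__uint128_t>(a) * b) >> 64);
}
#else
// No __uint128_t (e.g. i386 or 32-bit ARM): use a portable fallback.
constexpr bool kHasUint128 = false;

// Upper 64 bits of a * b, computed from four 32x32-bit partial products.
inline std::uint64_t mul_high(std::uint64_t a, std::uint64_t b) {
  const std::uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  const std::uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
  const std::uint64_t lo_lo = a_lo * b_lo;
  const std::uint64_t hi_lo = a_hi * b_lo;
  const std::uint64_t lo_hi = a_lo * b_hi;
  const std::uint64_t hi_hi = a_hi * b_hi;
  // Sum of the middle terms plus the carry out of the low word; this cannot
  // overflow 64 bits.
  const std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xffffffffu) + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
#endif
```

Code or tests that need the 128-bit type can then be compiled only when the macro is defined, instead of being keyed on a list of architectures.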
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* Work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
Aaron Franke
5c22c7a7de Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
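
As a rough sketch of the byte-level properties listed in this commit (no BOM, LF line endings, newline at the end of the file), a checker over a file's contents might look like the code below. The names `TextFormatReport` and `check_text_format` are hypothetical, the sketch does not validate UTF-8 itself, and treating an empty file as acceptable is an assumption.

```cpp
#include <string>

// Hypothetical report type: one flag per property checked.
struct TextFormatReport {
  bool has_bom;              // starts with the UTF-8 byte order mark EF BB BF
  bool has_carriage_return;  // contains '\r', i.e. CRLF or bare CR line endings
  bool ends_with_newline;    // last byte is '\n' (empty input counts as fine)

  bool ok() const {
    return !has_bom && !has_carriage_return && ends_with_newline;
  }
};

// Inspect raw file bytes. UTF-8 validity is NOT checked here.
inline TextFormatReport check_text_format(const std::string& bytes) {
  TextFormatReport report;
  report.has_bom = bytes.compare(0, 3, "\xEF\xBB\xBF") == 0;
  report.has_carriage_return = bytes.find('\r') != std::string::npos;
  report.ends_with_newline = bytes.empty() || bytes.back() == '\n';
  return report;
}
```

For example, `check_text_format("int x;\r\n").ok()` is false because of the carriage return, while `check_text_format("int x;\n").ok()` is true.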
Srinivas Vasudevan
f6c6de5d63 Ensure Igamma does not produce NaN or Inf for large values. 2020-01-14 21:32:48 +00:00
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Christoph Hertzberg
1e9664b147 Bug #1796: Make matrix square root usable for Map and Ref types 2019-12-20 18:10:22 +01:00
Christoph Hertzberg
c21771ac04 Use double-braces initialization (as everywhere else in the test-suite). 2019-12-19 19:20:48 +01:00
Eugene Zhulenev
ae07801dd8 Tensor block evaluation cost model 2019-12-18 20:07:00 +00:00
Eugene Zhulenev
1c879eb010 Remove V2 suffix from TensorBlock 2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880 Remove TensorBlock.h and old TensorBlock/BlockMapper 2019-12-10 14:31:44 -08:00
Janek Kozicki
11d6465326 Fix AlignedVector3 interface being inconsistent with other Vector classes: the default constructor and operator- were missing. 2019-12-06 21:07:39 +01:00
Rasmus Munk Larsen
366cf005b0 Add missing initialization in cxx11_tensor_trace.cpp. 2019-12-04 23:56:37 +00:00
Mehdi Goli
00f32752f7 [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch.
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when the local memory version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Hans Johnson
6fb3e5f176 STYLE: Remove CMake-language block-end command arguments
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Gael Guennebaud
b9837ca9ae bug #1281: fix AutoDiffScalar's make_coherent for nested expression of constant ADs. 2019-11-14 14:58:08 +01:00
Eugene Zhulenev
13c3327f5c Remove legacy block evaluation support 2019-11-12 10:12:28 -08:00
Rasmus Munk Larsen
ebf04fb3e8 Fix data race in cxx11_tensor_notification test. 2019-11-08 17:44:50 -08:00
Rasmus Munk Larsen
97c0c5d485 Add block evaluation V2 to TensorAsyncExecutor.
Add async evaluation to a number of ops.
2019-10-22 12:42:44 -07:00
Rasmus Munk Larsen
668ab3fc47 Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers. 2019-10-18 16:42:00 -07:00
Eugene Zhulenev
0d2a14ce11 Cleanup Tensor block destination and materialized block storage allocation 2019-10-16 17:14:37 -07:00
Eugene Zhulenev
02431cbe71 TensorBroadcasting support for random/uniform blocks 2019-10-16 13:26:28 -07:00
Eugene Zhulenev
d380c23b2c Block evaluation for TensorGenerator/TensorReverse/TensorShuffling 2019-10-14 14:31:59 -07:00
Eugene Zhulenev
a411e9f344 Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op 2019-10-10 10:56:58 -07:00
Eugene Zhulenev
33e1746139 Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing 2019-10-09 12:45:31 -07:00
Gael Guennebaud
f0a4642bab Implement c++03 compatible fix for changeset 7a43af1a33 2019-10-09 16:00:57 +02:00
Gael Guennebaud
7a43af1a33 Fix compilation of FFTW unit test 2019-10-08 08:58:35 +02:00
Eugene Zhulenev
f74ab8cb8d Add block evaluation to TensorEvalTo and fix a few small bugs 2019-10-07 15:34:26 -07:00
Eugene Zhulenev
98bdd7252e Fix compilation warnings and errors with clang in TensorBlockV2 code and tests 2019-10-04 10:15:33 -07:00
Eugene Zhulenev
60ae24ee1a Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect 2019-10-02 12:44:06 -07:00
Eugene Zhulenev
7c8bc0d928 Fix cxx11_tensor_block_io test 2019-09-25 11:48:11 -07:00
Eugene Zhulenev
71d5bedf72 Fix compilation warnings and errors with clang in TensorBlockV2 2019-09-25 11:25:22 -07:00
Eugene Zhulenev
c97b208468 Add new TensorBlock api implementation + tests 2019-09-24 15:17:35 -07:00
Eugene Zhulenev
ef9dfee7bd Tensor block evaluation V2 support for unary/binary/broadcasting 2019-09-24 12:52:45 -07:00
Rasmus Munk Larsen
1d5af0693c Add support for asynchronous evaluation of tensor casting expressions. 2019-09-19 13:54:49 -07:00
Srinivas Vasudevan
df0816b71f Merging eigen/eigen. 2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109 Add Bessel functions to SpecialFunctions.
- Split SpecialFunctions files into a separate BesselFunctions file.

In particular add:
    - Modified Bessel functions of the second kind k0, k1, k0e, k1e
    - Bessel functions of the first kind j0, j1
    - Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Deven Desai
cdb377d0cb Fix for the HIP build+test errors introduced by the ndtri support.
The fixes needed are
 * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
 * switching to using ::<math_func> instead std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
 * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Eugene Zhulenev
d918bd9a8b Update ThreadLocal to use separate Initialize/Release callables 2019-09-10 16:13:32 -07:00
Eugene Zhulenev
e3dec4dcc1 ThreadLocal container that does not rely on thread local storage 2019-09-09 15:18:14 -07:00
Srinivas Vasudevan
e38dd48a27 PR 681: Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Eugene Zhulenev
47fefa235f Allow move-only done callback in TensorAsyncDevice 2019-09-03 17:20:56 -07:00
Eugene Zhulenev
a8d264fa9c Add test for const TensorMap underlying data mutation 2019-09-03 11:38:39 -07:00