Rasmus Munk Larsen
|
ab773c7e91
|
Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* Work around a bug in slicing of Tensor<bool>.
* Add tensor tests
This speeds up matmul for boolean matrices of size 32 and larger by about 7x; the size-8 case becomes about 79% slower:
name                  old time/op   new time/op   delta
BM_MatMul<bool>/8     267ns ± 0%    479ns ± 0%    +79.25%  (p=0.008 n=5+5)
BM_MatMul<bool>/32    6.42µs ± 0%   0.87µs ± 0%   -86.50%  (p=0.008 n=5+5)
BM_MatMul<bool>/64    43.3µs ± 0%   5.9µs ± 0%    -86.42%  (p=0.008 n=5+5)
BM_MatMul<bool>/128   315µs ± 0%    44µs ± 0%     -85.98%  (p=0.008 n=5+5)
BM_MatMul<bool>/256   2.41ms ± 0%   0.34ms ± 0%   -85.68%  (p=0.008 n=5+5)
BM_MatMul<bool>/512   18.8ms ± 0%   2.7ms ± 0%    -85.53%  (p=0.008 n=5+5)
BM_MatMul<bool>/1k    149ms ± 0%    22ms ± 0%     -85.40%  (p=0.008 n=5+5)
|
2020-04-28 16:12:47 +00:00 |
|
Aaron Franke
|
5c22c7a7de
|
Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
|
2020-03-23 18:09:02 +00:00 |
|
Eugene Zhulenev
|
ae07801dd8
|
Tensor block evaluation cost model
|
2019-12-18 20:07:00 +00:00 |
|
Eugene Zhulenev
|
1c879eb010
|
Remove V2 suffix from TensorBlock
|
2019-12-10 15:40:23 -08:00 |
|
Eugene Zhulenev
|
dbca11e880
|
Remove TensorBlock.h and old TensorBlock/BlockMapper
|
2019-12-10 14:31:44 -08:00 |
|
Eugene Zhulenev
|
7c8bc0d928
|
Fix cxx11_tensor_block_io test
|
2019-09-25 11:48:11 -07:00 |
|
Eugene Zhulenev
|
c97b208468
|
Add new TensorBlock API implementation + tests
|
2019-09-24 15:17:35 -07:00 |
|