Eugene Zhulenev
|
524c81f3fa
|
Add tests for evalShardedByInnerDim contraction + fix bugs
|
2018-09-28 11:24:08 -07:00 |
|
Eugene Zhulenev
|
e95696acb3
|
Optimize TensorBlockCopyOp
|
2018-09-27 14:49:26 -07:00 |
|
Eugene Zhulenev
|
9f33e71e9d
|
Revert code lost in merge
|
2018-09-27 12:08:17 -07:00 |
|
Eugene Zhulenev
|
a7a3e9f2b6
|
Merge with eigen/eigen default
|
2018-09-27 12:05:06 -07:00 |
|
Eugene Zhulenev
|
9f4988959f
|
Remove explicit mkldnn support and redundant TensorContractionKernelBlocking
|
2018-09-27 11:49:19 -07:00 |
|
Rasmus Munk Larsen
|
d956204ab2
|
Remove "false &&" left over from test.
|
2018-09-26 17:03:30 -07:00 |
|
Rasmus Munk Larsen
|
3815aeed7a
|
Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.
Benchmark                  Base (ns)   New (ns)  Improvement
------------------------------------------------------------
BM_Matmul_1_80_13522_1        387457     396013        -2.2%
BM_Matmul_1_80_13522_2        406487     230789       +43.2%
BM_Matmul_1_80_13522_4        395821     123211       +68.9%
BM_Matmul_1_80_13522_6        391625      97002       +75.2%
BM_Matmul_1_80_13522_8        408986     113828       +72.2%
BM_Matmul_1_80_13522_16       399988      67600       +83.1%
BM_Matmul_1_80_13522_22       411546      60044       +85.4%
BM_Matmul_1_80_13522_32       393528      57312       +85.4%
BM_Matmul_1_80_13522_44       390047      63525       +83.7%
BM_Matmul_1_80_13522_88       387876      63592       +83.6%
BM_Matmul_1_1500_500_1        245359     248119        -1.1%
BM_Matmul_1_1500_500_2        401833     143271       +64.3%
BM_Matmul_1_1500_500_4        210519     100231       +52.4%
BM_Matmul_1_1500_500_6        251582      86575       +65.6%
BM_Matmul_1_1500_500_8        211499      80444       +62.0%
BM_Matmul_3_250_512_1          70297      68551        +2.5%
BM_Matmul_3_250_512_2          70141      52450       +25.2%
BM_Matmul_3_250_512_4          67872      58204       +14.2%
BM_Matmul_3_250_512_6          71378      63340       +11.3%
BM_Matmul_3_250_512_8          69595      41652       +40.2%
BM_Matmul_3_250_512_16         72055      42549       +40.9%
BM_Matmul_3_250_512_22         70158      54023       +23.0%
BM_Matmul_3_250_512_32         71541      56042       +21.7%
BM_Matmul_3_250_512_44         71843      57019       +20.6%
BM_Matmul_3_250_512_88         69951      54045       +22.7%
BM_Matmul_3_1500_512_1        369328     374284        -1.4%
BM_Matmul_3_1500_512_2        428656     223603       +47.8%
BM_Matmul_3_1500_512_4        205599     139508       +32.1%
BM_Matmul_3_1500_512_6        214278     139071       +35.1%
BM_Matmul_3_1500_512_8        184149     142338       +22.7%
BM_Matmul_3_1500_512_16       156462     156983        -0.3%
BM_Matmul_3_1500_512_22       163905     158259        +3.4%
BM_Matmul_3_1500_512_32       155314     157662        -1.5%
BM_Matmul_3_1500_512_44       235434     158657       +32.6%
BM_Matmul_3_1500_512_88       156779     160275        -2.2%
BM_Matmul_1500_4_512_1        363358     349528        +3.8%
BM_Matmul_1500_4_512_2        303134     263319       +13.1%
BM_Matmul_1500_4_512_4        176208     130086       +26.2%
BM_Matmul_1500_4_512_6        148026     115449       +22.0%
BM_Matmul_1500_4_512_8        131656      98421       +25.2%
BM_Matmul_1500_4_512_16       134011      82861       +38.2%
BM_Matmul_1500_4_512_22       134950      85685       +36.5%
BM_Matmul_1500_4_512_32       133165      90081       +32.4%
BM_Matmul_1500_4_512_44       133203      90644       +32.0%
BM_Matmul_1500_4_512_88       134106     100566       +25.0%
BM_Matmul_4_1500_512_1        439243     435058        +1.0%
BM_Matmul_4_1500_512_2        451830     257032       +43.1%
BM_Matmul_4_1500_512_4        276434     164513       +40.5%
BM_Matmul_4_1500_512_6        182542     144827       +20.7%
BM_Matmul_4_1500_512_8        179411     166256        +7.3%
BM_Matmul_4_1500_512_16       158101     155560        +1.6%
BM_Matmul_4_1500_512_22       152435     155448        -1.9%
BM_Matmul_4_1500_512_32       155150     149538        +3.6%
BM_Matmul_4_1500_512_44       193842     149777       +22.7%
BM_Matmul_4_1500_512_88       149544     154468        -3.3%
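
For illustration only, the idea of sharding a contraction over the inner dimension can be sketched in plain C++ as follows. This is not the Eigen implementation: the function name `matmul_sharded_by_inner_dim`, the use of `std::thread`, and the row-major `std::vector<double>` layout are all assumptions made for this sketch. Each shard multiplies a contiguous slice of k into its own partial m x n buffer, and the partial buffers are summed at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Eigen): computes row-major
// C[m x n] = A[m x k] * B[k x n] by splitting the inner dimension k
// into `num_shards` contiguous ranges, one thread per range.
std::vector<double> matmul_sharded_by_inner_dim(
    const std::vector<double>& a, const std::vector<double>& b,
    std::size_t m, std::size_t k, std::size_t n, std::size_t num_shards) {
  assert(a.size() == m * k && b.size() == k * n);
  // Use at least one shard and never more shards than there are k indices.
  num_shards = std::max<std::size_t>(1, std::min(num_shards, k));
  const std::size_t chunk = (k + num_shards - 1) / num_shards;

  // One private m x n accumulator per shard, so threads never share writes.
  std::vector<std::vector<double>> partial(
      num_shards, std::vector<double>(m * n, 0.0));

  std::vector<std::thread> workers;
  for (std::size_t s = 0; s < num_shards; ++s) {
    const std::size_t k_begin = std::min(k, s * chunk);
    const std::size_t k_end = std::min(k, k_begin + chunk);
    workers.emplace_back([&, s, k_begin, k_end] {
      std::vector<double>& c = partial[s];
      for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = k_begin; p < k_end; ++p) {
          const double a_ip = a[i * k + p];
          for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] += a_ip * b[p * n + j];
          }
        }
      }
    });
  }
  for (std::thread& w : workers) w.join();

  // Reduce the per-shard partial results into the final output.
  std::vector<double> c(m * n, 0.0);
  for (const std::vector<double>& part : partial) {
    for (std::size_t idx = 0; idx < m * n; ++idx) c[idx] += part[idx];
  }
  return c;
}
```

The extra cost of this scheme is the final reduction over `num_shards` buffers of size m x n, which is cheap when m and n are small and k is large; that is consistent with the shapes where the table above shows the largest gains.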
|
2018-09-26 16:47:13 -07:00 |
|
Eugene Zhulenev
|
71cd3fbd6a
|
Support multiple contraction kernel types in TensorContractionThreadPool
|
2018-09-26 11:08:47 -07:00 |
|
Gael Guennebaud
|
c696dbcaa6
|
Fix shadowing of last and all
|
2018-09-21 23:02:33 +02:00 |
|
Gael Guennebaud
|
4291f167ee
|
Add missing plugins to DynamicSparseMatrix -- fix sparse_extra_3
|
2018-09-21 14:53:43 +02:00 |
|
Rasmus Munk Larsen
|
8e2be7777e
|
Merged eigen/eigen into default
|
2018-09-20 11:41:15 -07:00 |
|
Rasmus Munk Larsen
|
5d2e759329
|
Initialize BlockIteratorState in a C++03 compatible way.
|
2018-09-20 11:40:43 -07:00 |
|
Gael Guennebaud
|
e04faca930
|
merge
|
2018-09-20 18:33:54 +02:00 |
|
Gael Guennebaud
|
d37188b9c1
|
Fix MPrealSupport
|
2018-09-20 18:30:10 +02:00 |
|
Gael Guennebaud
|
3c6dc93f99
|
Fix GPU support.
|
2018-09-20 18:29:21 +02:00 |
|
Gael Guennebaud
|
9419f506d0
|
Fix regression introduced by the previous fix for AVX512.
It broke the complex-complex case on SSE.
|
2018-09-20 17:32:34 +02:00 |
|
Christoph Hertzberg
|
a0166ab651
|
Workaround for spurious "array subscript is above array bounds" warnings with g++4.x
|
2018-09-20 17:08:43 +02:00 |
|
Gael Guennebaud
|
71496b0e25
|
Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
This commit also removes "half-packet" from data-mappers: it was not used and was conceptually broken anyway.
|
2018-09-20 17:01:24 +02:00 |
|
Rasmus Munk Larsen
|
44d8274383
|
Cast to longer type.
|
2018-09-19 13:31:42 -07:00 |
|
Rasmus Munk Larsen
|
d638b62dda
|
Silence compiler warning.
|
2018-09-19 13:27:55 -07:00 |
|
Rasmus Munk Larsen
|
db9c9df59a
|
Silence more compiler warnings.
|
2018-09-19 11:50:27 -07:00 |
|
Rasmus Munk Larsen
|
febd09dcc0
|
Silence compiler warnings in ThreadPoolInterface.h.
|
2018-09-19 11:11:04 -07:00 |
|
Eugene Zhulenev
|
c4627039ac
|
Support static dimensions (aka IndexList) in Tensor::resize(...)
|
2018-09-18 14:25:21 -07:00 |
|
Eugene Zhulenev
|
218a7b9840
|
Enable DSizes type promotion with c++03 compilers
|
2018-09-18 10:57:00 -07:00 |
|
Ravi Kiran
|
1f0c941c3d
|
Collapsed revision
* Merged eigen/eigen into default
|
2018-09-17 18:29:12 -07:00 |
|
Rasmus Munk Larsen
|
03a88c57e1
|
Merged in ezhulenev/eigen-02 (pull request PR-498)
Add DSizes index type promotion
|
2018-09-17 21:58:38 +00:00 |
|
Rasmus Munk Larsen
|
5ca0e4a245
|
Merged in ezhulenev/eigen-01 (pull request PR-497)
Fix warnings in IndexList array_prod
|
2018-09-17 20:15:06 +00:00 |
|
Eugene Zhulenev
|
a5cd4e9ad1
|
Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList
|
2018-09-17 10:58:07 -07:00 |
|
Gael Guennebaud
|
b311bfb752
|
bug #1596: fix inclusion of Eigen's header within unsupported modules.
|
2018-09-17 09:54:29 +02:00 |
|
Gael Guennebaud
|
72f19c827a
|
typo
|
2018-09-16 22:10:34 +02:00 |
|
Eugene Zhulenev
|
66f056776f
|
Add DSizes index type promotion
|
2018-09-15 15:17:38 -07:00 |
|
Eugene Zhulenev
|
f313126dab
|
Fix warnings in IndexList array_prod
|
2018-09-15 13:47:54 -07:00 |
|
Christoph Hertzberg
|
42705ba574
|
Fix weird error for building with g++-4.7 in C++03 mode.
|
2018-09-15 12:43:41 +02:00 |
|
Rasmus Munk Larsen
|
c2383f95af
|
Merged in ezhulenev/eigen/fix_dsizes (pull request PR-494)
Fix DSizes IndexList constructor
|
2018-09-15 02:36:19 +00:00 |
|
Rasmus Munk Larsen
|
30290cdd56
|
Merged in ezhulenev/eigen/moar_eigen_fixes_3 (pull request PR-493)
Const cast scalar pointer in TensorSlicingOp evaluator
Approved-by: Sameer Agarwal <sameeragarwal@google.com>
|
2018-09-15 02:35:07 +00:00 |
|
Eugene Zhulenev
|
f7d0053cf0
|
Fix DSizes IndexList constructor
|
2018-09-14 19:19:13 -07:00 |
|
Rasmus Munk Larsen
|
601e289d27
|
Merged in ezhulenev/eigen/moar_eigen_fixes_1 (pull request PR-492)
Explicitly construct tensor block dimensions from evaluator dimensions
|
2018-09-15 01:36:21 +00:00 |
|
Eugene Zhulenev
|
71070a1e84
|
Const cast scalar pointer in TensorSlicingOp evaluator
|
2018-09-14 17:17:50 -07:00 |
|
Eugene Zhulenev
|
4863375723
|
Explicitly construct tensor block dimensions from evaluator dimensions
|
2018-09-14 16:55:05 -07:00 |
|
Rasmus Munk Larsen
|
14e35855e1
|
Merged in chtz/eigen-maxsizevector (pull request PR-490)
Let MaxSizeVector respect alignment of objects
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
|
2018-09-14 23:29:24 +00:00 |
|
Eugene Zhulenev
|
1b8d70a22b
|
Support reshaping with static shapes and dimensions conversion in tensor broadcasting
|
2018-09-14 15:25:27 -07:00 |
|
Christoph Hertzberg
|
007f165c69
|
bug #1598: Let MaxSizeVector respect alignment of objects and add a unit test
Also revert 8b3d9ed081
|
2018-09-14 20:21:56 +02:00 |
|
Rasmus Munk Larsen
|
6313dde390
|
Fix merge error.
|
2018-09-13 16:42:05 -07:00 |
|
Rasmus Munk Larsen
|
0db590d22d
|
Backed out changeset 01197e4452
|
2018-09-13 16:20:57 -07:00 |
|
Rasmus Munk Larsen
|
b3f4c067d9
|
Merge
|
2018-09-13 16:18:52 -07:00 |
|
Rasmus Munk Larsen
|
2b07018140
|
Enable vectorized version on GPUs. The underlying bug has been fixed.
|
2018-09-13 16:12:22 -07:00 |
|
Rasmus Munk Larsen
|
53568e3549
|
Merged in ezhulenev/eigen/tiled_evalution_support (pull request PR-444)
Tiled evaluation for Tensor ops
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
Approved-by: Gael Guennebaud <g.gael@free.fr>
|
2018-09-13 22:05:47 +00:00 |
|
Eugene Zhulenev
|
01197e4452
|
Fix warnings
|
2018-09-13 15:03:36 -07:00 |
|
Gael Guennebaud
|
7f3b17e403
|
MSVC 2015 supports c++11 thread-local-storage
|
2018-09-13 18:15:07 +02:00 |
|
Rasmus Munk Larsen
|
e289f44c56
|
Don't vectorize the MeanReducer unless pdiv is available.
|
2018-09-11 14:09:00 -07:00 |
|