Rasmus Munk Larsen
|
31629bb964
|
Get rid of unused variable warning.
|
2018-09-28 16:00:09 -07:00 |
|
Eugene Zhulenev
|
bb13d5d917
|
Fix bug in copy optimization in Tensor slicing.
|
2018-09-28 14:34:42 -07:00 |
|
Rasmus Munk Larsen
|
104e8fa074
|
Fix a few warnings and rename a variable to not shadow "last".
|
2018-09-28 12:00:08 -07:00 |
|
Rasmus Munk Larsen
|
7c1b47840a
|
Merged in ezhulenev/eigen-01 (pull request PR-514)
Add tests for evalShardedByInnerDim contraction + fix bugs
|
2018-09-28 18:37:54 +00:00 |
|
Eugene Zhulenev
|
524c81f3fa
|
Add tests for evalShardedByInnerDim contraction + fix bugs
|
2018-09-28 11:24:08 -07:00 |
|
Christoph Hertzberg
|
86ba50be39
|
Fix integer conversion warnings
|
2018-09-28 19:33:39 +02:00 |
|
Eugene Zhulenev
|
e95696acb3
|
Optimize TensorBlockCopyOp
|
2018-09-27 14:49:26 -07:00 |
|
Eugene Zhulenev
|
9f33e71e9d
|
Revert code lost in merge
|
2018-09-27 12:08:17 -07:00 |
|
Eugene Zhulenev
|
a7a3e9f2b6
|
Merge with eigen/eigen default
|
2018-09-27 12:05:06 -07:00 |
|
Eugene Zhulenev
|
9f4988959f
|
Remove explicit mkldnn support and redundant TensorContractionKernelBlocking
|
2018-09-27 11:49:19 -07:00 |
|
Rasmus Munk Larsen
|
1e5750a5b8
|
Merged in rmlarsen/eigen4 (pull request PR-511)
Parallelize tensor contraction over the inner dimension.
|
2018-09-27 17:18:32 +00:00 |
|
Gael Guennebaud
|
af3ad4b513
|
oops, I've been too fast in previous copy/paste
|
2018-09-27 09:28:57 +02:00 |
|
Gael Guennebaud
|
24b163a877
|
#pragma GCC diagnostic push/pop is not supported prior to gcc 4.6
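As a rough illustration of the kind of guard this message implies, the sketch below only emits the push/pop pragmas when the compiler is GCC 4.6 or newer (or clang, which also accepts them). The macro name `SKETCH_HAS_DIAG_PUSH_POP` and the function `shadow_demo` are hypothetical and are not taken from the Eigen sources.

```cpp
#include <cassert>

// Hypothetical feature macro: true for clang, or for GCC >= 4.6.
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
#define SKETCH_HAS_DIAG_PUSH_POP 1
#else
#define SKETCH_HAS_DIAG_PUSH_POP 0
#endif

#if SKETCH_HAS_DIAG_PUSH_POP
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"
#endif

// The inner declaration shadows the parameter on purpose; with the guard
// active, -Wshadow stays silent for this function only.
inline int shadow_demo(int x) {
  int y = x + 1;
  {
    int x = y * 2;  // shadows the parameter x
    y = x;
  }
  return y;
}

#if SKETCH_HAS_DIAG_PUSH_POP
#pragma GCC diagnostic pop  // restore the caller's warning settings
#endif

// Small self-check: (3 + 1) * 2 == 8.
inline void shadow_demo_self_check() { assert(shadow_demo(3) == 8); }
```

On older compilers the pragmas are simply skipped, so the code still builds but the warning is not suppressed.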
|
2018-09-27 09:23:54 +02:00 |
|
Eugene Zhulenev
|
b314376f9c
|
Test mkldnn pack for doubles
|
2018-09-26 18:22:24 -07:00 |
|
Eugene Zhulenev
|
22ed98a331
|
Conditionally add mkldnn test
|
2018-09-26 17:57:37 -07:00 |
|
Rasmus Munk Larsen
|
d956204ab2
|
Remove "false &&" left over from test.
|
2018-09-26 17:03:30 -07:00 |
|
Rasmus Munk Larsen
|
3815aeed7a
|
Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.
Benchmark                 Base (ns)   New (ns)   Improvement
------------------------------------------------------------
BM_Matmul_1_80_13522_1       387457     396013         -2.2%
BM_Matmul_1_80_13522_2       406487     230789        +43.2%
BM_Matmul_1_80_13522_4       395821     123211        +68.9%
BM_Matmul_1_80_13522_6       391625      97002        +75.2%
BM_Matmul_1_80_13522_8       408986     113828        +72.2%
BM_Matmul_1_80_13522_16      399988      67600        +83.1%
BM_Matmul_1_80_13522_22      411546      60044        +85.4%
BM_Matmul_1_80_13522_32      393528      57312        +85.4%
BM_Matmul_1_80_13522_44      390047      63525        +83.7%
BM_Matmul_1_80_13522_88      387876      63592        +83.6%
BM_Matmul_1_1500_500_1       245359     248119         -1.1%
BM_Matmul_1_1500_500_2       401833     143271        +64.3%
BM_Matmul_1_1500_500_4       210519     100231        +52.4%
BM_Matmul_1_1500_500_6       251582      86575        +65.6%
BM_Matmul_1_1500_500_8       211499      80444        +62.0%
BM_Matmul_3_250_512_1         70297      68551         +2.5%
BM_Matmul_3_250_512_2         70141      52450        +25.2%
BM_Matmul_3_250_512_4         67872      58204        +14.2%
BM_Matmul_3_250_512_6         71378      63340        +11.3%
BM_Matmul_3_250_512_8         69595      41652        +40.2%
BM_Matmul_3_250_512_16        72055      42549        +40.9%
BM_Matmul_3_250_512_22        70158      54023        +23.0%
BM_Matmul_3_250_512_32        71541      56042        +21.7%
BM_Matmul_3_250_512_44        71843      57019        +20.6%
BM_Matmul_3_250_512_88        69951      54045        +22.7%
BM_Matmul_3_1500_512_1       369328     374284         -1.4%
BM_Matmul_3_1500_512_2       428656     223603        +47.8%
BM_Matmul_3_1500_512_4       205599     139508        +32.1%
BM_Matmul_3_1500_512_6       214278     139071        +35.1%
BM_Matmul_3_1500_512_8       184149     142338        +22.7%
BM_Matmul_3_1500_512_16      156462     156983         -0.3%
BM_Matmul_3_1500_512_22      163905     158259         +3.4%
BM_Matmul_3_1500_512_32      155314     157662         -1.5%
BM_Matmul_3_1500_512_44      235434     158657        +32.6%
BM_Matmul_3_1500_512_88      156779     160275         -2.2%
BM_Matmul_1500_4_512_1       363358     349528         +3.8%
BM_Matmul_1500_4_512_2       303134     263319        +13.1%
BM_Matmul_1500_4_512_4       176208     130086        +26.2%
BM_Matmul_1500_4_512_6       148026     115449        +22.0%
BM_Matmul_1500_4_512_8       131656      98421        +25.2%
BM_Matmul_1500_4_512_16      134011      82861        +38.2%
BM_Matmul_1500_4_512_22      134950      85685        +36.5%
BM_Matmul_1500_4_512_32      133165      90081        +32.4%
BM_Matmul_1500_4_512_44      133203      90644        +32.0%
BM_Matmul_1500_4_512_88      134106     100566        +25.0%
BM_Matmul_4_1500_512_1       439243     435058         +1.0%
BM_Matmul_4_1500_512_2       451830     257032        +43.1%
BM_Matmul_4_1500_512_4       276434     164513        +40.5%
BM_Matmul_4_1500_512_6       182542     144827        +20.7%
BM_Matmul_4_1500_512_8       179411     166256         +7.3%
BM_Matmul_4_1500_512_16      158101     155560         +1.6%
BM_Matmul_4_1500_512_22      152435     155448         -1.9%
BM_Matmul_4_1500_512_32      155150     149538         +3.6%
BM_Matmul_4_1500_512_44      193842     149777        +22.7%
BM_Matmul_4_1500_512_88      149544     154468         -3.3%
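
The general idea of sharding a contraction over the inner dimension can be sketched as follows. This is a minimal illustration, not Eigen's implementation: the function name `matmul_sharded_by_k`, the row-major layout, and the even split of k are all assumptions made for the example. Each shard accumulates into its own private buffer and the buffers are summed at the end. The shards run one after another here to keep the example dependency-free, but they share no mutable state, so each could be handed to a separate worker thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major C(m x n) = A(m x k) * B(k x n),
// with the inner dimension k split into num_shards contiguous ranges.
std::vector<float> matmul_sharded_by_k(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t m, std::size_t k,
                                       std::size_t n, std::size_t num_shards) {
  assert(a.size() == m * k && b.size() == k * n && num_shards > 0);

  // One private m x n accumulator per shard, so shards never share writes.
  std::vector<std::vector<float> > partial(num_shards,
                                           std::vector<float>(m * n, 0.0f));
  const std::size_t chunk = (k + num_shards - 1) / num_shards;

  // Each iteration is independent of the others and could run on its own thread.
  for (std::size_t s = 0; s < num_shards; ++s) {
    const std::size_t k_begin = std::min(k, s * chunk);
    const std::size_t k_end = std::min(k, k_begin + chunk);
    std::vector<float>& c = partial[s];
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t p = k_begin; p < k_end; ++p)
        for (std::size_t j = 0; j < n; ++j)
          c[i * n + j] += a[i * k + p] * b[p * n + j];
  }

  // Final reduction: add the per-shard partial results together.
  std::vector<float> result(m * n, 0.0f);
  for (std::size_t s = 0; s < num_shards; ++s)
    for (std::size_t idx = 0; idx < m * n; ++idx)
      result[idx] += partial[s][idx];
  return result;
}
```

The extra cost of this scheme is the per-shard buffers and the final reduction, both proportional to m*n, which is why it is most attractive when m and n are small and k is large.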
|
2018-09-26 16:47:13 -07:00 |
|
Eugene Zhulenev
|
71cd3fbd6a
|
Support multiple contraction kernel types in TensorContractionThreadPool
|
2018-09-26 11:08:47 -07:00 |
|
Christoph Hertzberg
|
0a3356f4ec
|
Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang)
|
2018-09-25 20:26:16 +02:00 |
|
Gael Guennebaud
|
41c3a2ffc1
|
Fix documentation of reshape to vectors.
|
2018-09-25 16:35:44 +02:00 |
|
Christoph Hertzberg
|
2c083ace3e
|
Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides
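Macros of this kind are commonly defined so that they expand to the C++11 `override` and `final` specifiers when available and to nothing otherwise. The sketch below shows that pattern under hypothetical names (`SKETCH_OVERRIDE`, `SKETCH_FINAL`); it does not reproduce the actual Eigen definitions, and the `__cplusplus` check is a simplification that some compilers need adjusted.

```cpp
#include <cassert>

// Hypothetical macros: use the C++11 specifiers when the compiler reports
// C++11 or later, otherwise expand to nothing so C++03 builds still work.
#if __cplusplus >= 201103L
#define SKETCH_OVERRIDE override
#define SKETCH_FINAL final
#else
#define SKETCH_OVERRIDE
#define SKETCH_FINAL
#endif

struct Shape {
  virtual ~Shape() {}
  virtual int sides() const { return 0; }
};

// SKETCH_OVERRIDE makes the compiler reject the declaration if it does not
// actually override a base-class virtual; SKETCH_FINAL forbids further
// derivation from Triangle.
struct Triangle SKETCH_FINAL : Shape {
  int sides() const SKETCH_OVERRIDE { return 3; }
};

// Calls through the base interface to exercise virtual dispatch.
inline int count_sides(const Shape& s) { return s.sides(); }

inline void override_self_check() {
  Triangle t;
  assert(count_sides(t) == 3);
}
```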
|
2018-09-24 18:01:17 +02:00 |
|
Gael Guennebaud
|
626942d9dd
|
fix alignment issue in ploaddup for AVX512
|
2018-09-28 16:57:32 +02:00 |
|
Gael Guennebaud
|
84a1101b36
|
Merge with default.
|
2018-09-23 21:52:58 +02:00 |
|
Gael Guennebaud
|
795e12393b
|
Fix logic in diagonal*dense product in a corner case.
The problem was for: diag(1x1) * mat(1,n)
|
2018-09-22 16:44:33 +02:00 |
|
Gael Guennebaud
|
bac36d0996
|
Demangle Traversal and Unrolling in Redux
|
2018-09-21 23:03:45 +02:00 |
|
Gael Guennebaud
|
c696dbcaa6
|
Fix shadowing of last and all
|
2018-09-21 23:02:33 +02:00 |
|
Christoph Hertzberg
|
e3c8289047
|
Replace unused PREDICATE by corresponding STATIC_ASSERT
|
2018-09-21 21:15:51 +02:00 |
|
Gael Guennebaud
|
1bf12880ae
|
Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all)
|
2018-09-21 16:50:04 +02:00 |
|
Gael Guennebaud
|
4291f167ee
|
Add missing plugins to DynamicSparseMatrix -- fix sparse_extra_3
|
2018-09-21 14:53:43 +02:00 |
|
Gael Guennebaud
|
03a0cb2b72
|
fix unalignedcount for avx512
|
2018-09-21 14:40:26 +02:00 |
|
Gael Guennebaud
|
371068992a
|
Add more debug output
|
2018-09-21 14:32:39 +02:00 |
|
Gael Guennebaud
|
91716f03a7
|
Fix vectorization logic unit test for AVX512
|
2018-09-21 14:32:24 +02:00 |
|
Gael Guennebaud
|
b00e48a867
|
Improve slice-vectorization logic for redux (significant speed-up for reduction of blocks)
|
2018-09-21 13:45:56 +02:00 |
|
Gael Guennebaud
|
a488d59787
|
merge with default Eigen
|
2018-09-21 11:51:49 +02:00 |
|
Gael Guennebaud
|
47720e7970
|
Doc fixes
|
2018-09-21 11:48:22 +02:00 |
|
Gael Guennebaud
|
3ec2985914
|
Merged indexing cleanup (pull request PR-506)
|
2018-09-21 09:36:05 +00:00 |
|
Gael Guennebaud
|
651e5d4866
|
Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
This change also makes it future-proof for AVX1024
|
2018-09-21 10:33:22 +02:00 |
|
Eugene Zhulenev
|
719e438a20
|
Collapsed revision
* Split cxx11_tensor_executor test
* Register test parts with EIGEN_SUFFIXES
* Fix EIGEN_SUFFIXES in cxx11_tensor_executor test
|
2018-09-20 15:19:12 -07:00 |
|
Gael Guennebaud
|
f0ef3467de
|
Fix doc
|
2018-09-20 22:57:28 +02:00 |
|
Gael Guennebaud
|
617f75f117
|
Add indexing namespace
|
2018-09-20 22:57:10 +02:00 |
|
Gael Guennebaud
|
0c56d22e2e
|
Fix shadowing
|
2018-09-20 22:56:21 +02:00 |
|
Rasmus Munk Larsen
|
8e2be7777e
|
Merged eigen/eigen into default
|
2018-09-20 11:41:15 -07:00 |
|
Rasmus Munk Larsen
|
5d2e759329
|
Initialize BlockIteratorState in a C++03 compatible way.
|
2018-09-20 11:40:43 -07:00 |
|
Gael Guennebaud
|
e04faca930
|
merge
|
2018-09-20 18:33:54 +02:00 |
|
Gael Guennebaud
|
d37188b9c1
|
Fix MPrealSupport
|
2018-09-20 18:30:10 +02:00 |
|
Gael Guennebaud
|
3c6dc93f99
|
Fix GPU support.
|
2018-09-20 18:29:21 +02:00 |
|
Gael Guennebaud
|
e0f6d352fb
|
Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header.
|
2018-09-20 18:07:32 +02:00 |
|
Gael Guennebaud
|
eeeb18814f
|
Fix warning
|
2018-09-20 17:48:56 +02:00 |
|
Gael Guennebaud
|
9419f506d0
|
Fix regression introduced by the previous fix for AVX512.
It broke the complex-complex case on SSE.
|
2018-09-20 17:32:34 +02:00 |
|
Christoph Hertzberg
|
a0166ab651
|
Workaround for spurious "array subscript is above array bounds" warnings with g++4.x
|
2018-09-20 17:08:43 +02:00 |
|