Commit Graph

2620 Commits

Author SHA1 Message Date
Christoph Hertzberg
0ec8afde57 Fixed most conversion warnings in MatrixFunctions module 2018-11-20 16:23:28 +01:00
Rasmus Munk Larsen
72928a2c8a Merged in rmlarsen/eigen2 (pull request PR-543)
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626 Remove accidental changes. 2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65 Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. 2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen
93f9988a7e A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles. 2018-11-09 14:15:32 -08:00
Rasmus Munk Larsen
07fcdd1438 Merged in ezhulenev/eigen-02 (pull request PR-534)
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 18:34:35 +00:00
Eugene Zhulenev
8a977c1f46 Fix cxx11_tensor_{block_access, reduction} tests 2018-10-25 11:31:29 -07:00
Christoph Hertzberg
449ff74672 Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Christoph Hertzberg
40fa6f98bf bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
Grafted manually from a4afa90d16
2018-10-19 17:20:51 +02:00
Rasmus Munk Larsen
dda68f56ec Fix GPU build due to gpu_assert not always being defined. 2018-10-18 16:29:29 -07:00
Eugene Zhulenev
9e96e91936 Move from rvalue arguments in ThreadPool enqueue* methods 2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816 Reduce thread scheduling overhead in parallelFor 2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f Merged in ezhulenev/eigen-02 (pull request PR-528)
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-10-16 15:39:40 +00:00
Eugene Zhulenev
900c7c61bb Check if it's allowed to squueze inner dimensions in TensorBlockIO 2018-10-15 16:52:33 -07:00
Gael Guennebaud
f0fb95135d Iterative solvers: unify and fix handling of multiple rhs.
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve 2018-10-15 23:46:00 +02:00
Gael Guennebaud
d835a0bf53 relax number of iterations checks to avoid false negatives 2018-10-15 10:23:32 +02:00
Gael Guennebaud
8214cf1896 Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways 2018-10-11 10:27:23 +02:00
Christoph Hertzberg
3f2c8b7ff0 Fix a lot of Doxygen warnings in Tensor module 2018-10-09 20:22:47 +02:00
Gael Guennebaud
93a6192e98 fix mpreal for mpfr<4.0.0 2018-10-09 09:15:22 +02:00
Rasmus Munk Larsen
d16634c4d4 Fix out-of bounds access in TensorArgMax.h. 2018-10-08 16:41:36 -07:00
Rasmus Munk Larsen
1a737e1d6a Fix contraction test. 2018-10-08 16:37:07 -07:00
Gael Guennebaud
2eda9783de typo 2018-10-08 21:37:46 +02:00
Gael Guennebaud
6cc9b2c831 fix warning in mpreal.h 2018-10-08 18:25:37 +02:00
Gael Guennebaud
e29bfe8479 Update included mpreal header to 3.6.5 and fix deprecated warnings. 2018-10-08 17:09:23 +02:00
Gael Guennebaud
64b1a15318 Workaround stupid warning 2018-10-08 12:01:18 +02:00
Christoph Hertzberg
c5f1d0a72a Fix shadow warning 2018-10-02 19:01:08 +02:00
Christoph Hertzberg
b92c71235d Move struct outside of method for C++03 compatibility. 2018-10-02 18:59:10 +02:00
Christoph Hertzberg
051f9c1aff Make code compile in C++03 mode again 2018-10-02 18:36:30 +02:00
Christoph Hertzberg
b786ce8c72 Fix conversion warning ... again 2018-10-02 18:35:25 +02:00
Christoph Hertzberg
564ca71e39 Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6 This commit contains the following (HIP specific) updates:
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
  Changing "pass-by-reference" argument to be "pass-by-value" instead
  (in a  __global__ function decl).
  "pass-by-reference" arguments to __global__ functions are unwise,
  and will be explicitly flagged as errors by the newer versions of HIP.

- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
  Changes introduced in recent commits breaks the HIP compile.
  Adding EIGEN_DEVICE_FUNC attribute to some functions and
  calling ::malloc/free instead of the corresponding std:: versions
  to get the HIP compile working again

- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
  Change introduced a recent commit breaks the HIP compile
  (link stage errors out due to failure to inline a function).
  Disabling the recently introduced code (only for HIP compile), to get
  the eigen nightly testing going again.
  Will submit another PR once we have te proper fix.

- Eigen/src/Core/util/ConfigureVectorization.h
  Enabling GPU VECTOR support when HIP compiler is in use
  (for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Rasmus Munk Larsen
2088c0897f Merged eigen/eigen into default 2018-09-28 16:00:46 -07:00
Rasmus Munk Larsen
31629bb964 Get rid of unused variable warning. 2018-09-28 16:00:09 -07:00
Eugene Zhulenev
bb13d5d917 Fix bug in copy optimization in Tensor slicing. 2018-09-28 14:34:42 -07:00
Rasmus Munk Larsen
104e8fa074 Fix a few warnings and rename a variable to not shadow "last". 2018-09-28 12:00:08 -07:00
Rasmus Munk Larsen
7c1b47840a Merged in ezhulenev/eigen-01 (pull request PR-514)
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 18:37:54 +00:00
Eugene Zhulenev
524c81f3fa Add tests for evalShardedByInnerDim contraction + fix bugs 2018-09-28 11:24:08 -07:00
Christoph Hertzberg
86ba50be39 Fix integer conversion warnings 2018-09-28 19:33:39 +02:00
Eugene Zhulenev
e95696acb3 Optimize TensorBlockCopyOp 2018-09-27 14:49:26 -07:00
Eugene Zhulenev
9f33e71e9d Revert code lost in merge 2018-09-27 12:08:17 -07:00
Eugene Zhulenev
a7a3e9f2b6 Merge with eigen/eigen default 2018-09-27 12:05:06 -07:00
Eugene Zhulenev
9f4988959f Remove explicit mkldnn support and redundant TensorContractionKernelBlocking 2018-09-27 11:49:19 -07:00
Eugene Zhulenev
b314376f9c Test mkldnn pack for doubles 2018-09-26 18:22:24 -07:00
Eugene Zhulenev
22ed98a331 Conditionally add mkldnn test 2018-09-26 17:57:37 -07:00
Rasmus Munk Larsen
d956204ab2 Remove "false &&" left over from test. 2018-09-26 17:03:30 -07:00
Rasmus Munk Larsen
3815aeed7a Parallelize tensor contraction over the inner dimension in cases where where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.

Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_Matmul_1_80_13522_1                  387457    396013     -2.2%
BM_Matmul_1_80_13522_2                  406487    230789    +43.2%
BM_Matmul_1_80_13522_4                  395821    123211    +68.9%
BM_Matmul_1_80_13522_6                  391625     97002    +75.2%
BM_Matmul_1_80_13522_8                  408986    113828    +72.2%
BM_Matmul_1_80_13522_16                 399988     67600    +83.1%
BM_Matmul_1_80_13522_22                 411546     60044    +85.4%
BM_Matmul_1_80_13522_32                 393528     57312    +85.4%
BM_Matmul_1_80_13522_44                 390047     63525    +83.7%
BM_Matmul_1_80_13522_88                 387876     63592    +83.6%
BM_Matmul_1_1500_500_1                  245359    248119     -1.1%
BM_Matmul_1_1500_500_2                  401833    143271    +64.3%
BM_Matmul_1_1500_500_4                  210519    100231    +52.4%
BM_Matmul_1_1500_500_6                  251582     86575    +65.6%
BM_Matmul_1_1500_500_8                  211499     80444    +62.0%
BM_Matmul_3_250_512_1                    70297     68551     +2.5%
BM_Matmul_3_250_512_2                    70141     52450    +25.2%
BM_Matmul_3_250_512_4                    67872     58204    +14.2%
BM_Matmul_3_250_512_6                    71378     63340    +11.3%
BM_Matmul_3_250_512_8                    69595     41652    +40.2%
BM_Matmul_3_250_512_16                   72055     42549    +40.9%
BM_Matmul_3_250_512_22                   70158     54023    +23.0%
BM_Matmul_3_250_512_32                   71541     56042    +21.7%
BM_Matmul_3_250_512_44                   71843     57019    +20.6%
BM_Matmul_3_250_512_88                   69951     54045    +22.7%
BM_Matmul_3_1500_512_1                  369328    374284     -1.4%
BM_Matmul_3_1500_512_2                  428656    223603    +47.8%
BM_Matmul_3_1500_512_4                  205599    139508    +32.1%
BM_Matmul_3_1500_512_6                  214278    139071    +35.1%
BM_Matmul_3_1500_512_8                  184149    142338    +22.7%
BM_Matmul_3_1500_512_16                 156462    156983     -0.3%
BM_Matmul_3_1500_512_22                 163905    158259     +3.4%
BM_Matmul_3_1500_512_32                 155314    157662     -1.5%
BM_Matmul_3_1500_512_44                 235434    158657    +32.6%
BM_Matmul_3_1500_512_88                 156779    160275     -2.2%
BM_Matmul_1500_4_512_1                  363358    349528     +3.8%
BM_Matmul_1500_4_512_2                  303134    263319    +13.1%
BM_Matmul_1500_4_512_4                  176208    130086    +26.2%
BM_Matmul_1500_4_512_6                  148026    115449    +22.0%
BM_Matmul_1500_4_512_8                  131656     98421    +25.2%
BM_Matmul_1500_4_512_16                 134011     82861    +38.2%
BM_Matmul_1500_4_512_22                 134950     85685    +36.5%
BM_Matmul_1500_4_512_32                 133165     90081    +32.4%
BM_Matmul_1500_4_512_44                 133203     90644    +32.0%
BM_Matmul_1500_4_512_88                 134106    100566    +25.0%
BM_Matmul_4_1500_512_1                  439243    435058     +1.0%
BM_Matmul_4_1500_512_2                  451830    257032    +43.1%
BM_Matmul_4_1500_512_4                  276434    164513    +40.5%
BM_Matmul_4_1500_512_6                  182542    144827    +20.7%
BM_Matmul_4_1500_512_8                  179411    166256     +7.3%
BM_Matmul_4_1500_512_16                 158101    155560     +1.6%
BM_Matmul_4_1500_512_22                 152435    155448     -1.9%
BM_Matmul_4_1500_512_32                 155150    149538     +3.6%
BM_Matmul_4_1500_512_44                 193842    149777    +22.7%
BM_Matmul_4_1500_512_88                 149544    154468     -3.3%
2018-09-26 16:47:13 -07:00
Eugene Zhulenev
71cd3fbd6a Support multiple contraction kernel types in TensorContractionThreadPool 2018-09-26 11:08:47 -07:00
Christoph Hertzberg
0a3356f4ec Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang) 2018-09-25 20:26:16 +02:00
Christoph Hertzberg
2c083ace3e Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides 2018-09-24 18:01:17 +02:00