mirror of
https://gitlab.com/libeigen/eigen.git
synced 2024-12-21 07:19:46 +08:00
191 lines
6.7 KiB
Plaintext
191 lines
6.7 KiB
Plaintext
|
|
namespace Eigen {
|
|
|
|
/** \page HiPerformance Advanced - Using Eigen with high performance
|
|
|
|
In general achieving good performance with Eigen does no require any special effort:
|
|
simply write your expressions in the most high level way. This is especially true
|
|
for small fixed size matrices. For large matrices, however, it might useful to
|
|
take some care when writing your expressions in order to minimize useless evaluations
|
|
and optimize the performance.
|
|
In this page we will give a brief overview of the Eigen's internal mechanism to simplify
|
|
and evaluate complex expressions, and discuss the current limitations.
|
|
In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e,
|
|
all kind of matrix products and triangular solvers.
|
|
|
|
Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar
|
|
to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and
|
|
natural API. Each of these routines can perform in a single evaluation a wide variety of expressions.
|
|
Given an expression, the challenge is then to map it to a minimal set of primitives.
|
|
As explained latter, this mechanism has some limitations, and knowing them will allow
|
|
you to write faster code by making your expressions more Eigen friendly.
|
|
|
|
\section GEMM General Matrix-Matrix product (GEMM)
|
|
|
|
Let's start with the most common primitive: the matrix product of general dense matrices.
|
|
In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
|
|
perform the following operation:
|
|
\f$ C += \alpha op1(A) * op2(B) \f$
|
|
where A, B, and C are column and/or row major matrices (or sub-matrices),
|
|
alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
|
|
When Eigen detects a matrix product, it analyzes both sides of the product to extract a
|
|
unique scalar factor alpha, and for each side its effective storage (order and shape) and conjugate state.
|
|
More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple,
|
|
negate and conjugate. Transpose and Block expressions are not evaluated and only modify the storage order
|
|
and shape. All other expressions are immediately evaluated.
|
|
For instance, the following expression:
|
|
\code m1 -= (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2)).lazy() \endcode
|
|
is automatically simplified to:
|
|
\code m1 += (s1*s2*conj(s3)) * m2.adjoint() * m3.conjugate() \endcode
|
|
which exactly matches our GEMM routine.
|
|
|
|
\subsection GEMM_Limitations Limitations
|
|
Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be
|
|
handled by a single GEMM-like call are correctly detected.
|
|
<table class="tutorial_code">
|
|
<tr>
|
|
<td>Not optimal expression</td>
|
|
<td>Evaluated as</td>
|
|
<td>Optimal version (single evaluation)</td>
|
|
<td>Comments</td>
|
|
</tr>
|
|
<tr>
|
|
<td>\code m1 += m2 * m3; \endcode</td>
|
|
<td>\code temp = m2 * m3; m1 += temp; \endcode</td>
|
|
<td>\code m1 += (m2 * m3).lazy(); \endcode</td>
|
|
<td>Use .lazy() to tell Eigen the result and right-hand-sides do not alias.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>\code m1 += (s1 * (m2 * m3)).lazy(); \endcode</td>
|
|
<td>\code temp = (m2 * m3).lazy(); m1 += s1 * temp; \endcode</td>
|
|
<td>\code m1 += (s1 * m2 * m3).lazy(); \endcode</td>
|
|
<td>This is because m2 * m3 is immediately evaluated by the scalar product. <br>
|
|
Make sure the matrix product is the top most expression.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>\code m1 += s1 * (m2 * m3).lazy(); \endcode</td>
|
|
<td>\code m1 += s1 * m2 * m3; // using a naive product \endcode</td>
|
|
<td>\code m1 += (s1 * m2 * m3).lazy(); \endcode</td>
|
|
<td>Even though this expression is evaluated without temporary, it is actually even
|
|
worse than the previous case because here the .lazy() enforces Eigen to use a
|
|
naive (and slow) evaluation of the product.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>\code m1 = m1 + m2 * m3; \endcode</td>
|
|
<td>\code temp = (m2 * m3).lazy(); m1 = m1 + temp; \endcode</td>
|
|
<td>\code m1 += (m2 * m3).lazy(); \endcode</td>
|
|
<td>Here there is no way to detect at compile time that the two m1 are the same,
|
|
and so the matrix product will be immediately evaluated.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>\code m1 += ((s1*m2).block(....) * m3).lazy(); \endcode</td>
|
|
<td>\code temp = (s1*m2).block(....); m1 += (temp * m3).lazy(); \endcode</td>
|
|
<td>\code m1 += (s1 * m2.block(....) * m3).lazy(); \endcode</td>
|
|
<td>This is because our expression analyzer is currently not able to extract trivial
|
|
expressions nested in a Block expression. Therefore the nested scalar
|
|
multiple cannot be properly extracted.</td>
|
|
</tr>
|
|
</table>
|
|
|
|
Of course all these remarks hold for all other kind of products that we will describe in the following paragraphs.
|
|
|
|
|
|
|
|
|
|
<table class="tutorial_code">
|
|
<tr>
|
|
<td>BLAS equivalent routine</td>
|
|
<td>Efficient version <br> (compile to a single optimized evaluation)</td>
|
|
<td>Less efficient equivalent version <br> (requires multiple evaluations)</td>
|
|
<td>comments</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GEMM</td>
|
|
<td>m1 = s1 * m2 * m3</td>
|
|
<td>m1 = s1 * (m2 * m3)</td>
|
|
<td>This is because m2 * m3 is evaluated by the scalar product.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GEMM</td>
|
|
<td>m1 += s1 * m2.adjoint() * m3</td>
|
|
<td>m1 += (s1 * m2).adjoint() * m3</td>
|
|
<td>This is because our expression analyzer stops at the first transpose expression and cannot extract the nested scalar multiple.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GEMM</td>
|
|
<td>m1 += m2.adjoint() * m3</td>
|
|
<td>m1 += m2.conjugate().transpose() * m3</td>
|
|
<td>For the same reason. Use .adjoint() or .transpose().conjugate()</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GEMM</td>
|
|
<td>m1 -= (-(s0*m2).conjugate()*s1) * (s2 * m3.adjoint() * s3)</td>
|
|
<td></td>
|
|
<td>Note that s0 is automatically conjugated during the simplification of the expression.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SYR</td>
|
|
<td>m.sefadjointView<LowerTriangular>().rankUpdate(v,s)</td>
|
|
<td></td>
|
|
<td>Computes m += s * v * v.adjoint()</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SYR2</td>
|
|
<td>m.sefadjointView<LowerTriangular>().rankUpdate(u,v,s)</td>
|
|
<td></td>
|
|
<td>Computes m += s * u * v.adjoint() + s * v * u.adjoint()</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SYRK</td>
|
|
<td>m1.sefadjointView<UpperTriangular>().rankUpdate(m2.adjoint(),s)</td>
|
|
<td></td>
|
|
<td>Computes m1 += s * m2.adjoint() * m2</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SYMM/HEMM</td>
|
|
<td>m3 -= s1 * m1.sefadjointView<UpperTriangular>() * m2.adjoint()</td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>SYMM/HEMM</td>
|
|
<td>m3 += s1 * m2.transpose() * m1.conjugate().sefadjointView<UpperTriangular>()</td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>TRMM</td>
|
|
<td>m3 -= s1 * m1.triangularView<UnitUpperTriangular>() * m2.adjoint()</td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>TRSV / TRSM</td>
|
|
<td>m1.adjoint().triangularView<UnitLowerTriangular>().solveInPlace(m2)</td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td></td>
|
|
<td></td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td></td>
|
|
<td></td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td></td>
|
|
<td></td>
|
|
<td></td>
|
|
<td></td>
|
|
</tr>
|
|
</table>
|
|
|
|
*/
|
|
|
|
}
|