namespace Eigen {

/** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions

In general, achieving good performance with Eigen does not require any special effort:
simply write your expressions in the most high-level way. This is especially true
for small fixed-size matrices. For large matrices, however, it may be useful to
take some care when writing your expressions in order to minimize useless evaluations
and optimize performance.
In this page we give a brief overview of Eigen's internal mechanism to simplify
and evaluate complex product expressions, and discuss its current limitations.
In particular, we focus on expressions matching level 2 and 3 BLAS routines, i.e.,
all kinds of matrix products and triangular solvers.

Indeed, Eigen implements a set of highly optimized routines which are very similar
to those of BLAS. Unlike BLAS, these routines are made available to the user via a high-level and
natural API. Each of these routines can compute a wide variety of expressions in a single evaluation.
Given an expression, the challenge is then to map it to a minimal set of routines.
As explained later, this mechanism has some limitations, and knowing them will allow
you to write faster code by making your expressions more Eigen-friendly.

\section GEMM General Matrix-Matrix product (GEMM)

Let's start with the most common primitive: the matrix product of general dense matrices.
In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
perform the following operation:
\f$ C \mathrel{+}= \alpha \, \mathrm{op1}(A) \, \mathrm{op2}(B) \f$ (with C updated in place, as with C.noalias()),
where A, B, and C are column- and/or row-major matrices (or sub-matrices),
alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
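To make the semantics of this operation concrete, the following standalone C++ sketch computes C += alpha * op1(A) * op2(B) with naive loops. It is only a reference for what the operation means, not Eigen's implementation, which relies on cache-friendly blocking and vectorization. The Op enum, the Mat struct, and the referenceGemm function are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Hypothetical tag describing which op is applied to an operand.
enum class Op { Identity, Transpose, Conjugate, Adjoint };

// Minimal column-major dense matrix, for illustration only.
struct Mat {
  int rows, cols;
  std::vector<Cplx> data;  // value-initialized to zero
  Mat(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c) {}
  Cplx& operator()(int i, int j) { return data[static_cast<std::size_t>(j) * rows + i]; }
  Cplx operator()(int i, int j) const { return data[static_cast<std::size_t>(j) * rows + i]; }
};

// True if the op swaps rows and columns.
bool swapsDims(Op op) { return op == Op::Transpose || op == Op::Adjoint; }

// Coefficient (i, j) of op(m), read directly from m: op(m) is never materialized.
Cplx opCoeff(const Mat& m, Op op, int i, int j) {
  switch (op) {
    case Op::Identity:  return m(i, j);
    case Op::Transpose: return m(j, i);
    case Op::Conjugate: return std::conj(m(i, j));
    case Op::Adjoint:   return std::conj(m(j, i));
  }
  return Cplx(0.0);
}

// Naive reference for C += alpha * op1(A) * op2(B).
void referenceGemm(Mat& C, Cplx alpha, const Mat& A, Op op1, const Mat& B, Op op2) {
  const int n = swapsDims(op1) ? A.cols : A.rows;  // rows of op1(A)
  const int k = swapsDims(op1) ? A.rows : A.cols;  // inner dimension
  const int m = swapsDims(op2) ? B.rows : B.cols;  // cols of op2(B)
  assert((swapsDims(op2) ? B.cols : B.rows) == k);
  assert(C.rows == n && C.cols == m);
  for (int j = 0; j < m; ++j)
    for (int i = 0; i < n; ++i) {
      Cplx acc(0.0);
      for (int p = 0; p < k; ++p) acc += opCoeff(A, op1, i, p) * opCoeff(B, op2, p, j);
      C(i, j) += alpha * acc;
    }
}
```

With op1 = Op::Transpose, the loop simply reads A(p, i) instead of A(i, p): the transposition changes how the operand is indexed rather than producing a new matrix, which mirrors how Transpose expressions only change the effective storage order.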
When Eigen detects a matrix product, it analyzes both sides of the product to extract a
unique scalar factor alpha and, for each side, its effective storage order, shape, and conjugation state.
More precisely, each side is simplified by iteratively removing trivial expressions such as scalar multiples,
negations, and conjugations. Transpose and Block expressions are not evaluated; they only modify the storage order
and shape. All other expressions are immediately evaluated.
For instance, the following expression:
\code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2))  \endcode
is automatically simplified to:
\code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode
which exactly matches our GEMM routine.

\subsection GEMM_Limitations Limitations
Unfortunately, this simplification mechanism is not yet perfect, and not all expressions that could be
handled by a single GEMM-like call are correctly detected.
<table class="manual" style="width:100%">
<tr>
<th>Suboptimal expression</th>
<th>Evaluated as</th>
<th>Optimal version (single evaluation)</th>
<th>Comments</th>
</tr>
<tr>
<td>\code
m1 += m2 * m3; \endcode</td>
<td>\code
temp = m2 * m3;
m1 += temp; \endcode</td>
<td>\code
m1.noalias() += m2 * m3; \endcode</td>
<td>Use .noalias() to tell Eigen that the result and the right-hand side do not alias.
    Otherwise the product m2 * m3 is evaluated into a temporary.</td>
</tr>
<tr class="alt">
<td></td>
<td></td>
<td>\code
m1.noalias() += s1 * (m2 * m3); \endcode</td>
<td>This is a special feature of Eigen. Here the product between a scalar
    and a matrix product does not evaluate the matrix product but instead it
    returns a matrix product expression tracking the scalar scaling factor. <br>
    Without this optimization, the matrix product would be evaluated into a
    temporary as in the next example.</td>
</tr>
<tr>
<td>\code
m1.noalias() += (m2 * m3).adjoint(); \endcode</td>
<td>\code
temp = m2 * m3;
m1 += temp.adjoint(); \endcode</td>
<td>\code
m1.noalias() += m3.adjoint() * m2.adjoint(); \endcode</td>
<td>This is because the product expression has the EvalBeforeNesting bit, which
    forces the Transpose expression to evaluate the product into a temporary.</td>
</tr>
<tr class="alt">
<td>\code
m1 = m1 + m2 * m3; \endcode</td>
<td>\code
temp = m2 * m3;
m1 = m1 + temp; \endcode</td>
<td>\code m1.noalias() += m2 * m3; \endcode</td>
<td>Here there is no way to detect at compile time that the two occurrences of m1 are the same object,
    so the matrix product is immediately evaluated.</td>
</tr>
<tr>
<td>\code
m1.noalias() = m4 + m2 * m3; \endcode</td>
<td>\code
temp = m2 * m3;
m1 = m4 + temp; \endcode</td>
<td>\code
m1 = m4;
m1.noalias() += m2 * m3; \endcode</td>
<td>First of all, the .noalias() in the first expression is useless here because
    m2*m3 will be evaluated anyway. However, note how this expression can be rewritten
    so that no temporary is required. (Tip: for very small fixed-size matrices,
    it is slightly better to rewrite it as: m1.noalias() = m2 * m3; m1 += m4;)</td>
</tr>
<tr class="alt">
<td>\code
m1.noalias() += (s1*m2).block(..) * m3; \endcode</td>
<td>\code
temp = (s1*m2).block(..);
m1 += temp * m3; \endcode</td>
<td>\code
m1.noalias() += s1 * m2.block(..) * m3; \endcode</td>
<td>This is because our expression analyzer is currently not able to extract trivial
    expressions nested in a Block expression. Therefore, the nested scalar
    multiple cannot be properly extracted.</td>
</tr>
</table>

Of course, all these remarks also hold for the other kinds of products involving triangular or selfadjoint matrices.

*/

}