Mirror of https://gitlab.com/libeigen/eigen.git (synced 2024-12-27 07:29:52 +08:00)

Commit 45e67a6fda
This seems to be the recommended approach for doing type punning in CUDA. See for example:

- https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union
- https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/

(the latter puns a double to an `int2`). The issue is that for CUDA, the `memcpy` is not elided, and ends up being an expensive operation. We already have similar `reinterpret_cast`s across the Eigen codebase for GPU (as does TensorFlow).