eigen/unsupported/Eigen/CXX11
Rasmus Munk Larsen eab7e52db2 [Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
A few benchmark numbers:

name                              old time/op             new time/op             delta
BM_Xent_16_10000_cpu              448µs ± 3%              389µs ± 2%  -13.21%
(p=0.008 n=5+5)
BM_Xent_32_10000_cpu              575µs ± 6%              454µs ± 3%  -21.00%          (p=0.008 n=5+5)
BM_Xent_64_10000_cpu              933µs ± 4%              712µs ± 1%  -23.71%          (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
..
src [Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX. 2019-08-07 12:57:42 -07:00
CMakeLists.txt
Tensor Fix GPU build due to gpu_assert not always being defined. 2018-10-18 16:29:29 -07:00
TensorSymmetry bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
ThreadPool bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00