Gael Guennebaud
cd8b996f99
Extend unit test and documentation of SelfAdjointEigenSolver::computeDirect
2015-06-08 16:16:42 +02:00
Gael Guennebaud
8f031a3cee
bug #997 : add missing evaluators for m.lazyProduct(v.homogeneous())
2015-06-08 15:43:41 +02:00
Gael Guennebaud
274b1f5d7e
Fix homogeneous() for 1x1 matrix: in this case, homogeneous follows the storage order guaranteeing that v.transpose().homogeneous() == v.homogeneous().transpose()
2015-06-08 15:40:51 +02:00
Gael Guennebaud
cbe3a1a83e
Add missing accessors for 1D index based access to Replicate<> expressions.
2015-06-08 15:39:09 +02:00
Gael Guennebaud
a7ae628c9f
bug #1005 : fix regression regarding sparse coeff-wise binary operator that did not trigger a static assertion for mismatched storage
2015-06-08 10:14:08 +02:00
Gael Guennebaud
0a9b5d1396
bug #705 : fix handling of Lapack potrf return code
2015-06-05 15:59:13 +02:00
Gael Guennebaud
d0b7b5cb55
minor documentation fixes
2015-06-05 14:40:07 +02:00
Gael Guennebaud
56d4ef7ad6
BiCGSTAB: set default guess to 0, and improve restart mechanism by recomputing the accurate residual.
2015-06-05 14:37:57 +02:00
Gael Guennebaud
d457734a19
Avoid calling smart_copy with null pointers.
2015-05-25 22:30:56 +02:00
Benoit Jacob
051d5325cc
Abandon blocking size lookup table approach. Not performing as well in real world as in microbenchmark.
2015-05-19 11:03:59 -04:00
Christoph Hertzberg
ebea530782
bug #1014 : More stable direct computation of eigenvalues and -vectors for 3x3 matrices
2015-05-17 21:54:32 +02:00
Benoit Jacob
c88e1abaf3
also uninitialized here, see previous cset
2015-05-15 11:34:57 -04:00
Benoit Jacob
807793ec3b
Fix uninitialized var warning. The compiler was clearing the register anyway, so this does not change resulting code
2015-05-15 11:15:53 -04:00
Pete Warden
140f85bb99
Check for the macro __ARM_NEON__ (with two underscores at the end) as well as __ARM_NEON. The second macro is correct according to the ARM language extensions specification, but historically the first one has been more common. Some older compilers (e.g. gcc v4.6 on a Beaglebone Black) only define the first, so without this patch NEON isn't enabled.
2015-05-12 16:03:43 -07:00
Gael Guennebaud
ef81730625
Ignore denormal numbers in selfadjoint eigensolver.
2015-05-12 18:38:43 +02:00
Christoph Hertzberg
494fa991c3
bug #872 : Avoid deprecated binder1st/binder2nd usage by providing custom functors for comparison operators
2015-05-07 17:28:40 +02:00
Gael Guennebaud
4a936974a5
bug #1013 : fix 2x2 direct eigensolver for identical eiegnvalues
2015-05-07 15:55:12 +02:00
Gael Guennebaud
ebf8ca4fa8
Fix bug #1010 : m_isInitialized was improperly updated
2015-05-07 14:20:42 +02:00
Konstantinos Margaritis
dd698e6680
Merged in doug_kwan/eigen (pull request PR-103)
...
Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
2015-05-05 20:50:14 +03:00
Benoit Steiner
1dded10cb7
Added a double-precision implementation of the exp() function for AVX.
2015-05-04 10:42:51 -07:00
Christoph Hertzberg
28a4c92cbf
bug #998 : Started fixing doxygen warnings
2015-05-01 22:10:41 +02:00
Christoph Hertzberg
173b34e9ab
bug #999 : clarify that behavior of empty AlignedBoxes is undefined, and further improvements in documentation
2015-04-30 19:30:36 +02:00
Gael Guennebaud
de18cd413d
Disable posix_memalign on Solaris and SunOS, and allows to by-pass built-in posix_memalign detection rules.
2015-04-24 11:26:51 +02:00
Gael Guennebaud
40258078c6
bug #360 : add value_type typedef to DenseBase/SparseMatrixBase
2015-04-24 09:44:24 +02:00
Christoph Hertzberg
c460af414e
Fix bug #1000 : Manually inherit assignment operators for MSVC 2013 and later (as required by the standard).
2015-04-23 13:39:03 +02:00
Gael Guennebaud
dbd12b4cda
Make sure that BlockImpl<const SparseMatrix> ctor is called with the right type
2015-04-21 10:15:36 +02:00
Gael Guennebaud
d6a8b43b39
Fix typo in the definition of EIGEN_COMP_GNUC_STRICT
2015-04-21 10:12:38 +02:00
Deanna Hood
e5048b5501
Use std::isfinite when available
2015-04-20 14:59:57 -04:00
Deanna Hood
249c48ba00
Incorporate C++11 check into EIGEN_HAS_C99_MATH macro
2015-04-20 14:57:04 -04:00
Deanna Hood
0250f4a9f2
Merged default into unary-array-cwise-functors
2015-04-20 14:01:35 -04:00
Deanna Hood
0339502a4f
Only use std::isnan and std::isinf if they are available
2015-04-20 13:14:06 -04:00
Gael Guennebaud
fc2d5b86ce
simplify previous changeset: sub-expressions are nested by value
2015-04-18 22:50:16 +02:00
Gael Guennebaud
5a3c48e3c6
bug #942 : fix dangling references in evaluator of diagonal * sparse products.
2015-04-18 22:43:27 +02:00
Christoph Hertzberg
4f126b862d
Add internal assertions to purely fixed-size DenseStorage, mark optional variables always as unused
2015-04-17 11:36:21 +02:00
Christoph Hertzberg
9d7843d0d0
Add internal assertions to DenseStorage constructor
2015-04-16 15:47:06 +02:00
Christoph Hertzberg
3be9f5c4d7
Constructing a Matrix/Array with implicit transpose could lead to memory leaks.
...
Also reduced code duplication for Matrix/Array constructors
2015-04-16 13:25:20 +02:00
Gael Guennebaud
e0cff9ae0d
Fix bug #996 : fix comparisons to 0 instead of Scalar(0)
2015-04-15 14:48:53 +02:00
Gael Guennebaud
5dbe758dc3
Backed out changeset 04c8c5d9ef
2015-04-15 14:47:08 +02:00
Gael Guennebaud
04c8c5d9ef
Fix bug #996 : fix comparisons to 0 instead of Scalar(0)
2015-04-15 14:44:57 +02:00
Benoit Steiner
0f82399fe9
Pulled latest changes from trunk
2015-04-14 19:13:34 -07:00
Christoph Hertzberg
761691f18d
Make conversion from 0 to Scalar explicit (issue reported by Brad Bell)
2015-04-13 17:15:00 +02:00
Benoit Steiner
5401fbcc50
Improved the blocking strategy to speedup multithreaded tensor contractions.
2015-04-09 16:44:10 -07:00
Deanna Hood
085aa8e601
Don't use M_PI since it's only guaranteed to be defined in Eigen/Geometry
2015-04-08 13:59:18 -05:00
Gael Guennebaud
0eb220c00d
add a note on bug #992
2015-04-08 09:25:34 +02:00
Benoit Jacob
d7f51feb07
bug #992 : don't select a 3p GEMM path with non-vectorizable scalar types, this hits unsupported paths in symm/triangular products code
2015-04-07 15:13:55 -04:00
Benoit Steiner
7c18ab921c
Pulled latest updates from trunk
2015-04-04 20:07:04 -07:00
Gael Guennebaud
15b5adb327
Fix regression in DynamicSparseMatrix and SuperLUSupport wrt recent change on nonZeros/nonZerosEstimate
2015-04-02 22:21:41 +02:00
Benoit Steiner
74e558cfa8
Pulled latest updates from trunk
2015-04-01 23:24:11 -07:00
Gael Guennebaud
5861cfb55e
Remove unused GenericSparseBlockInnerIteratorImpl code.
2015-04-01 22:29:29 +02:00
Gael Guennebaud
3105986e71
bug #875 : remove broken SparseMatrixBase::nonZeros and introduce a nonZerosEstimate() method to sparse evaluators for internal uses.
...
Factorize some code in SparseCompressedBase.
2015-04-01 22:27:34 +02:00
Gael Guennebaud
39dcd01b0a
bug #973 : enable alignment of multiples of half-packet size (e.g., Vector6d with AVX)
2015-04-01 13:55:09 +02:00
Gael Guennebaud
8481dc21ea
bug #986 : add support for coefficient-based product with 0 depth.
2015-04-01 13:15:23 +02:00
Gael Guennebaud
79b4e6acaf
Fix bug #987 : wrong alignement guess in diagonal product.
2015-03-31 23:35:12 +02:00
Gael Guennebaud
3c38589984
Remove most of the dynamic memory allocations that occured in D&C SVD. Still remains the calls to JacobiSVD and UpperBidiagonalization.
2015-03-31 22:54:47 +02:00
Gael Guennebaud
8313fb7df7
Add row/column-wise reverseInPlace feature.
2015-03-31 21:35:53 +02:00
Gael Guennebaud
dfb674a25e
Make reverseInPlace really work in-place.
2015-03-31 20:17:10 +02:00
Gael Guennebaud
20d030f207
Fix vectorization of swap for non trivial expressions
2015-03-31 20:16:02 +02:00
Benoit Jacob
73cdeae1d3
Only use blocking sizes LUTs for single-thread products for now
2015-03-31 11:17:23 -04:00
Benoit Jacob
0cbd5ae3cb
Correctly detect Android with ndk_build
2015-03-31 11:17:21 -04:00
Gael Guennebaud
ae01c05e18
Fix computeProductBlockingSizes with m==0, and add respective unit test.
2015-03-31 15:19:57 +02:00
Gael Guennebaud
bd76d837e6
Fix sign of SuperLU::determinant
2015-03-31 14:57:32 +02:00
Gael Guennebaud
35d3053d55
Fix regression introduced in 3b169d792d
2015-03-31 09:23:53 +02:00
Christoph Hertzberg
3b169d792d
Suppress unused variable warning
2015-03-31 00:49:08 +02:00
Christoph Hertzberg
1efae98fee
bug #985 : RealQZ failed when either matrix had zero rows or columns (report and patch by Ben Goodrich)
...
Also added a regression test
2015-03-30 23:56:20 +02:00
Benoit Steiner
35722fa022
Made the index type a template parameter of the tensor class instead of encoding it in the options.
2015-03-30 14:55:54 -07:00
Christoph Hertzberg
58af8bf90c
bug #982 : Make sure numext::maxi and numext::mini are called correctly, in case Scalar expressions return expression templates.
2015-03-30 16:47:22 +02:00
Gael Guennebaud
2adbf6b8ca
fix stupid warning with old GCC
2015-03-28 22:34:54 +01:00
Gael Guennebaud
41e20248f8
merge
2015-03-28 14:43:35 +01:00
Christoph Hertzberg
09a5361d1b
bug #983 : Pass Vector3 by const reference and not by value
2015-03-28 12:36:24 +01:00
Gael Guennebaud
eb7e4c2b9c
Pass Vector3 type by reference
2015-03-27 12:11:24 +01:00
Gael Guennebaud
79cb875249
merge
2015-03-27 10:56:04 +01:00
Gael Guennebaud
1b8cc9af43
Slight numerical stability improvement in 2x2 svd
2015-03-27 10:55:00 +01:00
Gael Guennebaud
3d59ae0203
Fix hypot(0,0).
2015-03-27 09:59:24 +01:00
Benoit Steiner
d3f7915aeb
Pulled latest update from the eigen main codebase
2015-03-24 13:12:14 -07:00
Benoit Steiner
abdbe8562e
Fixed the CUDA packet primitives
2015-03-24 10:45:46 -07:00
Gael Guennebaud
29eaa2b0f1
Make MatrixBase::is* methods aware of nested_eval.
2015-03-24 13:42:42 +01:00
Gael Guennebaud
d27968eb7e
D&C SVD: directly falls back to JacobiSVD for very small problems (by-pass upper-bidiagonalization)
2015-03-24 13:38:07 +01:00
Gael Guennebaud
4472f3e578
Avoid SVD: consider denormalized small numbers as zero when computing the rank of the matrix
2015-03-23 09:40:21 +01:00
Deanna Hood
83e5b7656b
Use M_PI instead of acos(-1) for pi
2015-03-22 06:04:31 +10:00
Deanna Hood
4bab4790c0
Add \sa tags of isFinite/isInf for each other
2015-03-22 05:39:08 +10:00
Gael Guennebaud
4e2b18d909
Update approx. minimum ordering method to push and keep structural empty diagonal elements to the bottom-right part of the matrix
2015-03-20 16:33:48 +01:00
Gael Guennebaud
d6b2f300db
Fix MSVC compilation: aligned type must be passed by reference
2015-03-19 17:28:32 +01:00
Gael Guennebaud
61c45d7cfd
Fix comparison warning
2015-03-19 17:13:22 +01:00
Gael Guennebaud
f329d0908a
Improve random number generation for integer and add unit test
2015-03-19 15:10:36 +01:00
Benoit Jacob
dc04f12967
use unsigned short instead of uint16_t which doesn't exist in c++98
2015-03-17 10:31:45 -04:00
Deanna Hood
596be3cd86
Use std::arg for real numbers when c++11 is used, custom implementation otherwise
2015-03-17 15:28:12 +10:00
Deanna Hood
e26134ed62
Use std::round when c++11 is used, custom implementation otherwise
2015-03-17 14:55:14 +10:00
Deanna Hood
e21e29a088
Update cost of arg call to depend on if the scalar is complex or not
2015-03-17 14:04:33 +10:00
Deanna Hood
447a5a6b01
Fix VML declarations to only be for real/complex as appropriate
2015-03-17 13:33:31 +10:00
Deanna Hood
f52b78491c
Remove packet isNaN, isInf, isFinite
2015-03-17 09:26:24 +10:00
Deanna Hood
1c78d6f2a6
Add boolean not operator (!) array support
2015-03-17 08:29:57 +10:00
Benoit Jacob
364cfd529d
Similar to cset 3589a9c115
...
, also in 2px4 kernel: actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment
2015-03-16 16:28:44 -04:00
Deanna Hood
e1d6e6c972
Make cube, inverse and abs2 free-functions
2015-03-17 06:25:24 +10:00
Benoit Jacob
577056aa94
Include stdint.h. Not going for cstdint because it is a C++11 addition. Needed for uint16_t at least, in lookup-table code.
2015-03-16 16:21:50 -04:00
Benoit Jacob
eb6929cb19
fix bug in maxsize calculation, which would cause products of size > 2048 to address the lookup table out of bounds
2015-03-16 16:15:47 -04:00
Deanna Hood
fef4e071d7
Rename isinf to isInf
2015-03-17 05:58:47 +10:00
Deanna Hood
46cf9cda32
Add isfinite array support as isFinite
2015-03-17 04:33:12 +10:00
Deanna Hood
1d76ceab55
Remove floor, ceil, round for complex numbers
2015-03-17 02:36:07 +10:00
Deanna Hood
717b7954ce
Update cost of coeff-wise arg call
2015-03-17 02:11:57 +10:00
Deanna Hood
fb68b149cb
Rename isnan to isNaN
2015-03-17 02:04:42 +10:00
Benoit Jacob
35c3a8bb84
Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance.
2015-03-16 11:05:51 -04:00
Benoit Jacob
e274607d7f
fix compilation with GCC 4.8
2015-03-16 10:48:27 -04:00
Benoit Jacob
151b8b95c6
Fix bug in case where EIGEN_TEST_SPECIFIC_BLOCKING_SIZE is defined but false
2015-03-15 19:10:51 -04:00
Benoit Jacob
02babb9c0f
Provide a empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now.
2015-03-15 18:13:12 -04:00
Benoit Jacob
3589a9c115
actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment
2015-03-15 18:12:18 -04:00
Benoit Jacob
1dd3d89818
Fix a unused-var warning
2015-03-15 18:07:19 -04:00
Benoit Jacob
e56aabf205
Refactor computeProductBlockingSizes to make room for the possibility of using lookup tables
2015-03-15 18:05:12 -04:00
Benoit Jacob
488c15615a
organize a little our default cache sizes, and use a saner default L1 outside of x86 (10% faster on Nexus 5)
2015-03-13 14:51:26 -07:00
Gael Guennebaud
1330f8bbd1
bug #973 , improve AVX support by enabling vectorization of Vector4i-like types, and enforcing alignement of Vector4f/Vector2d-like types to preserve compatibility with SSE and future Eigen versions that will vectorize them with AVX enabled.
2015-03-13 21:15:50 +01:00
Gael Guennebaud
d99ab35f9e
Fix internal::random(x,y) for integer types. The previous implementation could return y+1. The new implementation uses rejection sampling to get an unbiased behabior.
2015-03-13 21:12:46 +01:00
Gael Guennebaud
8580eb6808
bug #949 : add static assertion for incompatible scalar types in dense end-user decompositions.
2015-03-13 21:06:20 +01:00
Gael Guennebaud
a9df28c95b
SparseMatrix::insert: switch to a fully uncompressed mode if sequential insertion is not possible (otherwise an arbitrary large amount of memory was preallocated in some cases)
2015-03-13 21:00:21 +01:00
Gael Guennebaud
5ffe29cb9f
Bound pre-allocation to the maximal size representable by StorageIndex and throw bad_alloc if that's not possible.
2015-03-13 20:57:33 +01:00
Gael Guennebaud
2f6f8bf31c
Add missing coeff/coeffRef members to Block<sparse>, and extend unit tests.
2015-03-13 16:24:40 +01:00
Doug Kwan
657407227e
Fix bug in pdiv<Packet1cd> which swaps 32-bit halves of a pair of
...
doubles instead of swapping the doubles.
2015-03-11 15:13:37 -07:00
Deanna Hood
f89fcefa79
Add hyperbolic trigonometric functions from std array support
2015-03-11 13:13:30 +10:00
Deanna Hood
a5e49976f5
Add log10 array support
2015-03-11 08:56:42 +10:00
Deanna Hood
19a71056ae
Allow calling of square(array) in addition to array.square()
2015-03-11 06:59:28 +10:00
Deanna Hood
31fdd67756
Additional unary coeff-wise functors (isnan, round, arg, e.g.)
2015-03-11 06:39:23 +10:00
Gael Guennebaud
fd78874888
Fix compilation of iterative solvers with dense matrices
2015-03-09 21:31:03 +01:00
Gael Guennebaud
d4317a85e8
Add typedefs for return types of SparseMatrixBase::selfadjointView
2015-03-09 21:29:46 +01:00
Gael Guennebaud
9e885fb766
Add unit tests for CG and sparse-LLT for long int as storage-index
2015-03-09 14:33:15 +01:00
Gael Guennebaud
224a1fe4c6
bug #963 : make IncompleteLUT compatible with non-default storage index types.
2015-03-09 13:55:20 +01:00
Gael Guennebaud
0ee391863e
Avoid undeflow when blocking size are tuned manually.
2015-03-06 21:51:09 +01:00
Gael Guennebaud
14a5f135a3
bug #969 : workaround abiguous calls to Ref using enable_if.
2015-03-06 17:51:31 +01:00
Gael Guennebaud
87681e508f
bug #978 : early return for vanishing products
2015-03-06 16:11:22 +01:00
Gael Guennebaud
cd3bbffa73
Improve blocking heuristic: if the lhs fit within L1, then block on the rhs in L1 (allows to keep packed rhs in L1)
2015-03-06 14:31:39 +01:00
Gael Guennebaud
58740ce4c6
Improve product kernel: replace the previous dynamic loop swaping strategy by a more general one:
...
It consists in increasing the actual number of rows of lhs's micro horizontal panel for small depth such that L1 cache is fully exploited.
2015-03-06 10:30:35 +01:00
Gael Guennebaud
4c8b95d5c5
Rename LSCG to LeastSquaresConjugateGradient
2015-03-05 10:16:32 +01:00
Gael Guennebaud
7550107028
Product optimization: implement a dynamic loop-swapping startegy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"
2015-03-05 10:03:46 +01:00
Gael Guennebaud
2dc968e453
bug #824 : improve accuracy of Quaternion::angularDistance using atan2 instead of acos.
2015-03-04 17:03:13 +01:00
Benoit Steiner
0196141938
Fixed the optimized AVX implementation of the fast rsqrt function
2015-03-02 13:49:39 -08:00
Benoit Steiner
4fd7f47692
Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined.
2015-03-02 09:38:47 -08:00
Benoit Steiner
fb53384b0f
Improved the default implementation of prsqrt
2015-02-28 01:51:26 -08:00
Benoit Steiner
306fceccbe
Pulled latest updates from trunk
2015-02-27 13:05:26 -08:00
Benoit Steiner
2386fc8528
Added support for 32bit index on a per tensor/tensor expression. This enables us to use 32bit indices to evaluate expressions on GPU faster while keeping the ability to use 64 bit indices to manipulate large tensors on CPU in the same binary.
2015-02-27 12:57:13 -08:00
Benoit Jacob
6466fa63be
Reimplement the selection between rotating and non-rotating kernels
...
using templates instead of macros and if()'s.
That was needed to fix the build of unit tests on ARM, which I had
broken. My bad for not testing earlier.
2015-02-27 15:30:10 -05:00
Benoit Steiner
05089aba75
Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts
2015-02-27 09:27:30 -08:00
Benoit Steiner
573b377110
Added support for vectorized type casting of tensors
2015-02-27 08:46:04 -08:00
Benoit Jacob
2fc3b484d7
remove trailing comma
2015-02-27 11:37:45 -05:00
Benoit Jacob
33669348c4
Disable Packet2f/2i halfpacket support in NEON.
...
I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented,
and code trying to use halfpackets just fails to compile on NEON, as it tries to use the
default implementation of pload/pstore and the types don't match.
2015-02-27 11:35:37 -05:00
Benoit Jacob
b7fc8746e0
Replace a static assert by a runtime one, fixes the build of unit tests on ARM
...
Also safely assert in the non-implemented path that should never be taken in practice,
and would return wrong results.
2015-02-27 10:01:59 -05:00
Benoit Steiner
f41b1f1666
Added support for fast reciprocal square root computation.
2015-02-26 09:42:41 -08:00
Gael Guennebaud
bcf9bb5c1f
Avoid packing rhs multiple-times when blocking on the lhs only.
2015-02-26 17:01:33 +01:00
Gael Guennebaud
4ec3f04b3a
Make sure that the block size computation is tested by our unit test.
2015-02-26 17:00:36 +01:00
Gael Guennebaud
a8ad8887bf
Implement a more generic blocking-size selection algorithm. See explanations inlines.
...
It performs extremely well on Haswell. The main issue is to reliably and quickly find the
actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
2015-02-26 16:04:35 +01:00
Gael Guennebaud
400becc591
Fix typos in block-size testing code, and set peeling on k to 8.
2015-02-26 15:57:06 +01:00
Benoit Jacob
692136350b
So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).
...
On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.
I could not see any significant impact of this offset.
On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.
So let's just go with 0!
Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
2015-02-25 12:37:14 -05:00
Christoph Hertzberg
531fa9de77
bug #970 : Add EIGEN_DEVICE_FUNC to RValue functions, in case Cuda supports RValue-references.
2015-02-24 21:03:28 +01:00
Benoit Jacob
26275b250a
Fix my recent prefetch changes:
...
- the first prefetch is actually harmful on Haswell with FMA,
but it is the most beneficial on ARM.
- the second prefetch... I was very stupid and multiplied by sizeof(scalar)
and offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8.
So this effectively restores the older offset. Actually, there were
two prefetches here, one with offset 48 and one with offset 64. I could not
confirm any benefit from this strange 48 offset on either the haswell or
my ARM device.
2015-02-23 16:55:17 -05:00