using a macro and _Pragma.
- use OpenMP also in cacheOptimalProduct and in the
vectorized paths as well
- kill the vector assignment unroller. implement in
operator= the logic for assigning a row-vector in
a col-vector.
- CMakeLists support for building tests/examples
with -fopenmp and/or -msse2
- updates in bench/, especially replace identity()
by ones() which prevents underflows from perturbing
bench results.