- removed the ugly X11 and PNG gnuplots terminals
- use enhanced postscript terminal
- use imagemagick to generate the png files (with compression)
- disable the fortran impl by default since it is as meaningless as a "C impl"
- update line settings
It basically performs 4 dot products at once reducing loads of the vector and improving
instructions scheduling. With 3 cache friendly algorithms, we now handle all product
configurations with outstanding perf for large matrices.