We could generate the MRI version (SSE 4.1) instead of the RMI
(SSE 2) version of these instructions if a 64-bit register was given
as the destination.
Reported-by: Vasiliy Olekhov <olekhov@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>