This provide several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
1. Added new packet functions using SIMD for NByOne, OneByN cases
2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions