In this patch, we will support SM4 AVX10.2 extension part. It is
a promotion from VEX encoding to EVEX encoding. The EVEX encoding
is based on AVX10.2, which is the same as the upcoming MOVRS ISA.
Thus, we decide to pull AVX10.2 out to CPU_COMMON_FLAGS.
While I have also tried to merge the table like AVX/AVX512, I
choose to just templatize the table. I am okay to go either way,
but slightly prefer the templatizing one since probably SM4 would
be the only ISA with AVX10.2 needs such VEX to EVEX extension (MOVRS
does not need that). Also, it is a tendancy that we will directly
provide EVEX encodings and no VEX encodings for vector instructions
since AVX10. This will make the adding in gas/config/tc-i386.c not
that worthy.
gas/ChangeLog:
* NEWS: Support Intel SM4 EVEX instructions.
* config/tc-i386.c (_is_cpu): Handle AVX10.2.
* testsuite/gas/i386/i386.exp: Run SM4 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-256-sm4-intel.d: Add SM4 tests.
* testsuite/gas/i386/avx10_2-256-sm4.d: Ditto.
* testsuite/gas/i386/avx10_2-256-sm4.s: Ditto.
* testsuite/gas/i386/avx10_2-512-sm4-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-512-sm4.d: Ditto.
* testsuite/gas/i386/avx10_2-512-sm4.s: Ditto.
* testsuite/gas/i386/avx10_2-sm4-inval.l: Ditto.
* testsuite/gas/i386/avx10_2-sm4-inval.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-sm4-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-sm4.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-sm4.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-sm4-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-sm4.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-sm4.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-sm4-inval.l: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-sm4-inval.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex.h: Add evex table entry for SM4.
* i386-dis.h: Ditto.
* i386-opc.h: (i386_cpu): Move AVX10.2 to CPU_FLAGS_COMMON.
* i386-opc.tbl: Add SM4 EVEX instructions.
* i386-init.h: Regenerated.
* i386-tbl.h: Ditto.
Using r0 as a base address register in the ROP hashst and hashchk instructions
is invalid. Modify the assembler to catch that illegal use and emit an error.
opcodes/
* ppc-opc.c (insert_ras): Update error message and function comment.
(powerpc_opcodes) <hashst, hashstp, hashchk, hashchkp>: Use RAS.
.cfi directives only support the use of register numbers and not
register names or aliases.
This commit adds support for 4 formats, for example:
.cfi_offset r1, 8
.cfi_offset ra, 8
.cfi_offset $r1,8
.cfi_offset $ra,8
The above .cfi directives are equivalent and all represent dwarf
register number 1.
Display register aliases as specified in the psABI during disassembly.
In this patch, we will support AVX10.2 satcvt instructions. All of them
are new instruction forms. In current documentation, it is still
VCVTTNEBF162I[,U]BS, but it will change to VCVTTBF162I[,U]BS eventually.
In table part, we used temporary <sign> iterator to reduce redundancy.
It definitely could be done for legacy cvt insns, but it is out of this
patch's scope.
gas/ChangeLog:
* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-512-satcvt-intel.d: New test.
* testsuite/gas/i386/avx10_2-512-satcvt.d: Ditto.
* testsuite/gas/i386/avx10_2-512-satcvt.s: Ditto.
* testsuite/gas/i386/avx10_2-256-satcvt-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-256-satcvt.d: Ditto.
* testsuite/gas/i386/avx10_2-256-satcvt.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-satcvt-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-satcvt.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-satcvt.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-satcvt-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-satcvt.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-satcvt.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex-prefix.h: Add PREFIX_EVEX_MAP5_68, PREFIX_EVEX_MAP5_69,
PREFIX_EVEX_MAP5_6A, PREFIX_EVEX_MAP5_6B, PREFIX_EVEX_MAP5_6C,
PREFIX_EVEX_MAP5_6D.
* i386-dis-evex-w.h: Add EVEX_W_MAP5_6C_P_0, EVEX_W_MAP5_6C_P_2,
EVEX_W_MAP5_6D_P_0, EVEX_W_MAP5_6D_P_2.
* i386-dis-evex.h (prefix_table): Add PREFIX_EVEX_MAP5_68,
* PREFIX_EVEX_MAP5_69, PREFIX_EVEX_MAP5_6A, PREFIX_EVEX_MAP5_6B.
* i386-dis.c: (PREFIX_EVEX_MAP5_68): New.
(PREFIX_EVEX_MAP5_69): Ditto.
(PREFIX_EVEX_MAP5_6A): Ditto.
(PREFIX_EVEX_MAP5_6B): Ditto.
(PREFIX_EVEX_MAP5_6C): Ditto.
(PREFIX_EVEX_MAP5_6D): Ditto.
(EVEX_MAP5_6C_P_0): Ditto.
(EVEX_MAP5_6C_P_2): Ditto.
(EVEX_MAP5_6D_P_0): Ditto.
(EVEX_MAP5_6D_P_2): Ditto.
* i386-opc.tbl: Add AVX10.2 instructions.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Ditto.
Co-authored-by: Zewei Mo <zewei.mo@intel.com>
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Co-authored-by: Levy Hsu <admin@levyhsu.com>
For several instructions including vps{l,r}l{d,q,w,dq} and vpsra{d,w},
their VEX part do not have the following version:
vpsrlw $0x1f,(%r15,%rcx,4),%xmm0
Thus, {evex} prefix should not be inserted when their second operand is
memory, while we still need them for register as second operand. Add a
new macro %ME to solve this problem.
For vpsraq, there is no VEX version, so the {evex} prefix should always
be eliminated.
gas/ChangeLog:
PR binutils/32403
* testsuite/gas/i386/i386.exp: Run new test.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/evex-only.d: New test.
* testsuite/gas/i386/evex-only.s: Ditto.
* testsuite/gas/i386/x86-64-evex-only.d: Ditto.
* testsuite/gas/i386/x86-64-evex-only.s: Ditto.
opcodes/ChangeLog:
PR binutils/32403
* i386-dis-evex-reg.h: Use %ME instead of %XE for vps{l,r}l{w,dq}
and vpsraw. Split table for vpsra{d,q}.
* i386-dis-evex-w.h: Use %ME instead of %XE for vps{l,r}l{d,q}
and vpsrad. Eliminate vpsraq {evex} prefix.
* i386-dis-evex.h: Split table for vpsra{d,q}.
* i386-dis.c: (EVEX_W_0F72_R_4): New.
(EVEX_W_0FE2): Ditto.
(struct dis386): Add comment for %ME.
(putop): Handle %ME.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reuse logic introduced with the preceding commit in the assembler to
treat addressing operand sequences D(X,B), D(B), and D(L,B) as one
with regards to optional last operands (i.e. optparm and optparm2).
With this "nop" now disassembles into "nop" instead of "nop 0".
opcodes/
* s390-dis.c (operand_count): New helper to count the remaining
operands, treating D(X,B), D(B), and D(L,B) as one.
(skip_optargs_p): New helper to test whether remaining operands
are optional.
(skip_optargs_zero_p): New helper to test whether remaining
operands are optional and their values are zero.
(s390_print_insn_with_opcode): Use skip_optargs_zero_p to skip
optional last operands with a value of zero.
gas/testsuite/
* gas/s390/zarch-optargs.d (nop): Adjust test case accordingly.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
"nop D1(B1)" erroneously disassembled into "nop D1(B1" (missing
closing parenthesis). "nop D1(X1,0)" and "nop D1(X1,)" erroneously
disassembled into "nop D1(X1)" (missing zero base register) instead
of "nop D1(X1,0)".
Do not skip disassembly of optional operands if they are index (X)
or base (B) registers or length (L) in an addressing operand sequence
"D(X,B)", "D(B)", or "D(L,B). Index and base register operand values
of zero are being handled separately, as they may not be omitted
unconditionally. For instance a base register value of zero must be
printed in above mentioned case, to distinguish the index from the
base register. This also ensures proper formatting of addressing
operand sequences.
While at it add further test cases for instructions with optional
operands.
opcodes/
* s390-dis.c (s390_print_insn_with_opcode): Do not
unconditionally skip disassembly of optional operands with a
value of zero, if within an addressing operand sequence.
gas/testsuite/
* gas/s390/zarch-optargs.d: Add further test cases for
instructions with optional operands.
* gas/s390/zarch-optargs.s: Likewise.
Reported-by: Florian Krohm <flo2030@eich-krohm.de>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
The Nios II architecture has been EOL'ed by the vendor. This patch
removes all binutils, bfd, gas, binutils, and opcodes support for this
target with the exception of the readelf utility. (The ELF EM_*
number remains valid and the relocation definitions from the Nios II
ABI will never change in future, so retaining the readelf support
seems consistent with its purpose as a utility that tries to parse the
headers in any ELF file provided as an argument regardless of target.)
I build gdb on arm-linux and ran into:
...
CC riscv-dis.lo
opcodes/riscv-dis.c: In function ‘print_insn_args’:
opcodes/riscv-dis.c:743:29: error: format ‘%lu’ expects argument of type \
‘long unsigned int’, but argument 4 has type ‘insn_t’ \
{aka ‘long long unsigned int’} [-Werror=format=]
743 | "%lu", EXTRACT_ZCMT_INDEX (l));
| ~~^
| |
| long unsigned int
| %llu
...
Fix this by printing the insn_t value, which is a uint64_t, using PRIu64.
Tested by finishing the build.
Add support for pac_key_[pu]_[0-3](_ns)? register operands for the MRS and MSR
instructions when assembling for Armv8.1-M Mainline, as well as adding the
corresponding support for disassembling instructions that use it.
This patch supports Zcmt[1] instruction 'cm.jt' and 'cm.jalt'.
Add new CSR jvt for tablejump using. Since 'cm.jt' and 'cm.jalt'
have the same instructiong encoding, use 'match_cm_jt' and 'match_cm_jalt'
check the 'zcmt_index' field to distinguish them.
[1] https://github.com/riscvarchive/riscv-code-size-reduction/releases
Co-Authored by: Charlie Keaney <charlie.keaney@embecosm.com>
Co-Authored by: Mary Bennett <mary.bennett@embecosm.com>
Co-Authored by: Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Co-Authored by: Sinan Lin <sinan.lin@linux.alibaba.com>
Co-Authored by: Simon Cook <simon.cook@embecosm.com>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yulong Shi <yulong@iscas.ac.cn>
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): New extension.
(riscv_multi_subset_supports_ext): Ditto.
gas/ChangeLog:
* config/tc-riscv.c (enum riscv_csr_class): New CSR.
(riscv_csr_address): Ditto.
(validate_riscv_insn): New operand.
(riscv_ip): Ditto.
* testsuite/gas/riscv/csr-version-1p10.d: New CSR.
* testsuite/gas/riscv/csr-version-1p10.l: Ditto.
* testsuite/gas/riscv/csr-version-1p11.d: Ditto.
* testsuite/gas/riscv/csr-version-1p11.l: Ditto.
* testsuite/gas/riscv/csr-version-1p12.d: Ditto.
* testsuite/gas/riscv/csr-version-1p12.l: Ditto.
* testsuite/gas/riscv/csr.s: Ditto.
* testsuite/gas/riscv/march-help.l: New extension.
* testsuite/gas/riscv/zcmt-fail.d: New test.
* testsuite/gas/riscv/zcmt-fail.l: New test.
* testsuite/gas/riscv/zcmt-fail.s: New test.
* testsuite/gas/riscv/zcmt.d: New test.
* testsuite/gas/riscv/zcmt.s: New test.
include/ChangeLog:
* opcode/riscv-opc.h (MATCH_CM_JT): New opcode.
(MASK_CM_JT): New mask.
(MATCH_CM_JALT): New opcode.
(MASK_CM_JALT): New mask.
(CSR_JVT): New CSR.
(DECLARE_INSN): New declaration.
(DECLARE_CSR): Ditto.
* opcode/riscv.h (EXTRACT_ZCMT_INDEX): New marco.
(ENCODE_ZCMT_INDEX): Ditto.
(enum riscv_insn_class): New class.
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): New operand.
* riscv-opc.c (match_cm_jt): New function.
(match_cm_jalt): Ditto.
Map7 already has dual purpose for USER-MSR (and is to gain more for
MSR-IMM), while Map5 is about to gain VEX uses for AMX extensions. Drop
the not really meaningful infixes and (in the opcode table) prefixes,
retaining merely EVexMap4 for encoding EVex128 at the same time.
Much like AVX512-{4FMAPS,4VNNIW} have a constraint on their register
source, there's a constraint (need to be even) on the destination
register here.
Adjust "good" test cases accordingly, and add a new test case to check
the warning.
static_assert is declared in C23 so we can't reuse that identifier:
* Define our own static_assert conditionally;
* Rename "static assert" hacks to _N as we do already in some places
to avoid a conflict.
ChangeLog:
PR ld/32372
* i386-gen.c (static_assert): Define conditionally.
* mips-formats.h (MAPPED_INT): Rename identifier.
(MAPPED_REG): Rename identifier.
(OPTIONAL_MAPPED_REG): Rename identifier.
* s390-opc.c (static_assert): Define conditionally.
This patch introduces a new operand flag OPD_F_UNSIGNED to signal that
the immediate value should be treated as an unsigned value. The default
signedness of immediate operands is signed.
The current space optmization on enum aarch64_opn_qualifier forced its
encoding using an unsigned char. This "hard-coded" optimization has the
bad consequence of making the array of such enums being completely
unreadable when debugging with GDB because the enum type is lost along
the way.
Keeping this space optimization, and the enum type as well, is possible
when the declaration of the enum is tagged with attribute((packed)).
attribute((packed)) is a GNU extension, and is wrapped in the macro
ATTRIBUTE_PACKED (defined in ansidecl.h), and should be used instead.
For any arm elf target, disable an old piece of code that forced disassembly to
disassemble for 'unknown architecture' which once upon a time meant it would
disassemble ANY arm instruction. This is no longer true with the addition of
Armv8.1-M Mainline, as there are conflicting encodings for different thumb
instructions.
BFD however can detect what architecture the object file was assembled for
using information in the notes section. So if available, we use that,
otherwise we default to the old 'unknown' behaviour.
With the changes above code, a mode changing 'bx lr' assembled for armv4 with
the option --fix-v4bx will result in an object file that is recognized by bfd
as one for the armv4 architecture. The disassembler now disassembles this
encoding as a BX even for Armv4 architectures, but warns the user when
disassembling for Armv4 that this instruction is only valid from Armv4T
onwards.
Remove the unused and wrongfully defined ARM_ARCH_V8A_CRC, and
define and use a ARM_ARCH_V8R_CRC to make sure instructions enabled by
-march=armv8-r+crc are disassembled correctly.
Patch up some of the tests cases, see a brief explanation for each below.
inst.d:
This test checks the assembly & disassembly of basic instructions in armv3m. I
changed the expected behaviour for teqp, cmnp cmpp and testp instructions to
properly print p when disassembling, whereas before, in the 'unknown' case it
would disassemble these as UNPREDICTABLE as they were changed in later
architectures.
nops.d:
Was missing an -march, added one to make sure we were testing the right
behavior of NOP<c> instructions.
unpredictable.d:
Was missing an -march, added armv6 as that reproduced the behaviour being
tested.
Since QEMU have supported -Max option to to enable all normal extensions,
the dis-assembler should also add an option, -M,max to do the same thing.
For the instruction, which have overlapped encodings like zfinx, will not
be considered by the -M,max option.
opcodes/
* riscv-dis.c (all_ext): New static boolean. If set, disassemble
without checking architectire string.
(riscv_disassemble_insn): Likewise.
(parse_riscv_dis_option_without_args): Recognized -M,max option.
binutils/
* NEWS: Updated.
Without this APX support isn't really complete.
For Intel syntax displacement form is needed, such that symbolic
operands won't need prefixing by "offset". (The other form is actually
not used at all in Intel syntax.)
For the record: To restrict displacement form to Intel syntax is not
something I actually agree with.
As soon as I committed Zhaoxin's patch, I realized that I did not
include the regen file. Regenerate them and commit as obvious.
opcodes/ChangeLog:
* i386-tbl.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-init.h: Ditto.
In this patch, we will support AVX10.2 convert instructions. All
of them are new instruction forms.
Among all the instructions, vcvtbiasph2[b,h]f8[,s] needs extra care.
Since Operand 2 could indicate memory size, we do not need suffix
under ATTmode. However, we could not fold all three templates but only
XMM/YMM since the dst operand size are the same for them. Also, a new
iterator <cvt8> is added to reduce redundancy.
gas/
* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-256-cvt-intel.d: New.
* testsuite/gas/i386/avx10_2-256-cvt.d: Ditto.
* testsuite/gas/i386/avx10_2-256-cvt.s: Ditto.
* testsuite/gas/i386/avx10_2-512-cvt-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-512-cvt.d: Ditto.
* testsuite/gas/i386/avx10_2-512-cvt.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-cvt-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-cvt.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-cvt.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-cvt-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-cvt.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-cvt.s: Ditto.
opcodes/
* i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F3874,
PREFIX_EVEX_MAP5_18, PREFIX_EVEX_MAP5_1B,
PREFIX_EVEX_MAP5_1E and PREFIX_EVEX_MAP5_74.
* i386-dis-evex.h: Add table pass for AVX10.2
instructions.
* i386-dis.c (MOD_EVEX_0F38B1): New.
(PREFIX_EVEX_0F3874): Ditto.
(PREFIX_EVEX_MAP5_18): Ditto.
(PREFIX_EVEX_MAP5_1B): Ditto.
(PREFIX_EVEX_MAP5_1E): Ditto.
(PREFIX_EVEX_MAP5_74): Ditto.
* i386-opc.tbl: Add AVX10.2 instructions.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Ditto.
Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
So far template expansion was limited to fields other than the insn
mnemonic. In order to be able to use <fop> also for AVX10.2 we want the
trailing mnemonic part to also be expanded. Split out the respective
piece of code into a helper function, which is then invoked twice.
The enum BFD_RELOC_[32/64] was mistakenly used in the macro instead
of the relocation in fixp. This can cause the second relocation
of a pair to be deleted when -mthin-add-sub is enabled. Apply the
correct macro to fix this.
Also sets the initial value of -mthin-add-sub.
In disassembler part, for vnni instructions, we extended previous
VEX part using %XE in disassembler to promote them to EVEX by reusing
the original VEX table. For vmpsadbw, we will also use %XE. However,
it is hard to reuse the VEX table, so we are using new ones.
In assmbler part, we put the vnni table entries with previous vnni
instructions since they are just promotion from AVX-VNNI-INT{8,16}.
Since we will prefer VEX encoding, we need to use the different table
order in template <vnni>, which prefers EVEX due to earlier introduction
for AVX512_VNNI than AVX_VNNI. This means a new <vnni>. For vdpphps
and vmpsadbw, we put them at the end of the table, with future AVX10.2
instructions.
Nit: I will remove the arch requirement for avx_vnni_int{8,16} in
evex-promote testcases after AVX10.2 implies AVX-VNNI-INT{8,16}.
gas/Changelog:
* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-256-1-intel.d: New.
* testsuite/gas/i386/avx10_2-256-1.d: Ditto.
* testsuite/gas/i386/avx10_2-256-1.s: Ditto.
* testsuite/gas/i386/avx10_2-512-1-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-512-1.d: Ditto.
* testsuite/gas/i386/avx10_2-512-1.s: Ditto.
* testsuite/gas/i386/avx10_2-promote.d: Ditto.
* testsuite/gas/i386/avx10_2-promote.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-1-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-1.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-1.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-1-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-1.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-1.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-promote.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-promote.s: Ditto.
opcodes/Changelog:
* i386-dis-evex-prefix.h: Adjust PREFIX_EVEX_0F3852.
Add PREFIX_EVEX_0F3A42_W_0.
* i386-dis-evex-w.h: Adjust EVEX_W_0F3A42.
* i386-dis-evex.h: Add table pass for AVX10.2
instructions.
* i386-dis.c: Adjust PREFIX_VEX_0F3850_W_0, PREFIX_VEX_0F3851_W_0,
PREFIX_VEX_0F38D2_W_0 and PREFIX_VEX_0F38D3_W_0.
* i386-opc.tbl: Add AVX10.2 instructions.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Ditto.
Co-authored-by: Lili Cui <lili.cui@intel.com>
opcodes/
* m68k-dis.c (m68k_opcode_to_insn_type): Define.
(match_insn_m68k): Call it to set insn_type.
(print_insn_arg) [case 'B']: Set branch target address.
(print_insn_m68k): Set insn_info_valid.
.insn or data emitted inside text sections can lead to positions not
being at insn granularity. In such situations using alignment
directives should reliably enforce the requested alignment.
Specifically requests to align back to insn granularity may not be
ignored (where, as a subcase thereof, the ordering of ".option norvc"
and e.g. ".p2align 2" should not matter; so far the alignment directive
needs to come first to have any effect). Similarly ahead of emitting
NOPs alignment first needs to be forced back to insn granularity.
The new testcases actually point out a corner case issue in the
disassembler as well, which is being corrected at the same time: We
don't want to print "0x" without any subsequent digits.