6731 Commits

Author SHA1 Message Date
Jan Beulich
4a9843c301 gas: drop unnecessary use of tc_comment_chars
The override is necessary only when a target needs other than an array
of const char.

For cris drop redundant sibling declarations at the same time.
2024-08-02 09:44:53 +02:00
Indu Bhagat
d56083b504 gas: x86: ginsn: handle previously missed indirect call and jmp ops
Some flavors of indirect call and jmp instructions were not being
handled earlier, leading to a GAS error (#1):
  (#1) "Error: SCFI: unhandled op 0xff may cause incorrect CFI"

Not handling jmp/call (direct or indirect) ops is an error (as shown
above) because SCFI needs an accurate CFG to synthesize CFI correctly.
Recall that the presence of indirect jmp/call, however, does make the
CFG ineligible for SCFI. In other words, generating the ginsns for them
now, will eventually cause SCFI to bail out later with an error (#2)
anyway:
  (#2) "Error: untraceable control flow for func 'XXX'"

The first error (#1) gives the impression of missing functionality in
GAS.  So, it seems cleaner to synthesize a GINSN_TYPE_JUMP /
GINSN_TYPE_CALL now in the backend, and let SCFI machinery complain with
the error as expected.

The handling for these indirect jmp/call instructions is similar, so
reuse the code by carving out a function for the same.

Adjust the testcase to include the now handled jmp/call instructions as
well.

gas/
	* config/tc-i386-ginsn.c (x86_ginsn_indirect_branch): New
	function.
	(x86_ginsn_new): Refactor out functionality to above.

gas/testsuite/
	* gas/scfi/x86_64/ginsn-cofi-1.l: Adjust the output.
	* gas/scfi/x86_64/ginsn-cofi-1.s: Add further varieties of
	jmp/call opcodes.
2024-08-01 10:07:07 -07:00
Jan Beulich
c39fbc749a x86: move ginsn stuff
This had been badly inserted between md_assemble() and its helpers
anyway. Follow what was done for Arm64 and move the code to its own
file, #include-d as appropriate.
2024-07-31 12:04:03 +02:00
YunQiang Su
08e6af1bac microMIPS: Add MT ASE instruction set support
Add the MT ASE instruction operand types and encodings to the microMIPS
opcode table and enable the assembly of these instructions in GAS from
MIPSr2 onwards.  Update the binutils and GAS testsuites accordingly.

References:

"MIPS Architecture for Programmers, Volume IV-f: The MIPS MT Module for
the microMIPS32 Architecture", MIPS Technologies, Inc., Document Number:
MD00768, Revision 1.12, July 16, 2013

Co-Authored-By: Maciej W. Rozycki <macro@redhat.com>
2024-07-26 18:01:09 +01:00
Jan Beulich
c97f0d71ea x86: accept whitespace around prefix separator
... and prediction suffix comma. Other than documented /**/ comments
currently aren't really converted to a single space, at least not for
x86 in its most common configurations. That'll be fixed subsequently, at
which point blanks may appear where so far none were expected.
Furthermore not permitting blanks around these separators wasn't quite
logical anyway - such constructs are composite ones, and hence
components ought to have been permitted to be separated by whitespace
from the very beginning. Furthermore note how, due to the scrubber being
overly aggressive in removing whitespace, some similar constructs with a
prefix were already accepted.

Note how certain other checks in parse_insn() can be simplified as a
result.

While there for the prediction suffix also make checks case-insensitive
and check for a proper trailing separator.
2024-07-26 07:59:53 +02:00
Jan Beulich
1cd36be7c9 x86/APX: optimize certain {nf}-form insns to BMI2 ones
..., as those leave EFLAGS untouched anyway. That's a shorter encoding,
available as long as no eGPR is in use anywhere.
2024-07-26 07:59:04 +02:00
Jan Beulich
1cc4b7d755 bfin: drop _ASSIGN_BANG
A few testcases demonstrate that "=!" isn't supposed to be an
individual token, since "= !" is used in a number of places. So far
lexing that to a single token worked because of the scrubber being
overly aggressive in removing whitespace. As that's going to change,
replace uses by separate ASSIGN and BANG.
2024-07-19 11:56:46 +02:00
Jan Beulich
3fea91b17d x86: accept whitespace inside curly braces
Other than documented /**/ comments currently aren't really converted to
a single space, at least not for x86 in its most common configurations.
That'll be fixed subsequently, at which point blanks may appear where so
far none were expected. Furthermore not permitting blanks immediately
inside curly braces wasn't quite logical anyway - such constructs are
composite ones, and hence components ought to have been permitted to be
separated by whitespace from the very beginning.

With this we also don't care anymore whether the scrubber would remove
whitespace around curly braces, so move them from extra_symbol_chars[]
to operand_special_chars[].

Note: The new testcase doesn't actually exercise much (if any) of the
added code. It is being put in place to ensure that subsequently, when
that code actually comes into play, behavior remains the same.
2024-07-19 11:52:21 +02:00
Jan Beulich
0ff4e567db x86: undo '{' being a symbol-start character
Having it that way has undue side effects, in permitting not only
pseudo-prefixes to be parsed correctly, but also permitting odd symbol
names which ought to be possible only when quoted.  Borrow what other
architectures do: Put in place an "unrecognized line" hook to parse off
any pseudo prefixes, while using the "start of line" hook to reject ones
not actually followed by an insn. For that parsing re-use parse_insn()
in yet a slightly different mode (dealing with only pseudo-prefixes).

With that, pp may no longer be cleared from init_globals(), but instead
needs clearing after a line was fully processed. Since md_assemble() has
pretty many return paths, convert that into a local helper, with a
trivial wrapper around it.

Similarly pp may no longer be updated (by check_register()) when
processing anything other than insn operands. To be able to (easily)
recognize the case, clear current_templates.start when done with an insn
(or with .insn).
2024-07-19 11:44:07 +02:00
Jan Beulich
e3bfcef3f2 x86: split pseudo-prefix state from i386_insn
Subsequently we will want to update that ahead of md_assemble(), with
that function needing to take into account such earlier updating.
Therefore it'll want resetting separately from i.
2024-07-19 11:43:37 +02:00
Jan Beulich
8ba953169c x86/APX: add CMPcc/CTESTcc cases to noreg64 tests
This was missed when support for the insns was added. Just like for
DATA16, in

	rex64 neg (%rax)
	rex64 neg (%r16)
	rex64 {nf} neg (%rax)

it is not logical why the last one shouldn't be permitted. Bypassing
that check requires other adjustments, though, to actually properly
consume (and then squash) the prefix.
2024-07-19 10:54:22 +02:00
zhangxianting
88e7d674ef bfin: free the allocated memory
2024-07-19 10:53:12 +02:00
Maciej W. Rozycki
875ac09b12 MIPS/GAS: Handle --trap command-line option dynamically
We have an ISA check for the '--trap' command-line option that reports
its incompatibility with the MIPS I architecture.  It doesn't prevent
trap instructions from being enabled though, so when an attempt is made to
emit one in an expansion of one of the division or multiplication macros
an assertion failure triggers:

.../gas/testsuite/gas/mips/brtr-opt.s: Assembler messages:
.../gas/testsuite/gas/mips/brtr-opt.s:3: Error: trap exception not supported at ISA 1
.../gas/testsuite/gas/mips/brtr-opt.s:9: Warning: divide by zero
.../gas/testsuite/gas/mips/brtr-opt.s:9: Internal error in macro_build at .../gas/config/tc-mips.c:9064.
Please report this bug.

The same assertion failure triggers without an earlier error message
when the initial ISA is compatible with '--trap', but by the time an
attempt is made to emit a trap instruction from a division or
multiplication macro the ISA has been changed by a '.set' pseudo-op to
an incompatible one.

With the way the situations are mishandled it seems unlikely that anyone
relies on the current semantics and a sane approach is to decide on the
fly according to the currently selected ISA as to whether to emit trap
or breakpoint instructions in the case where '--trap' has been used.

Change our code to do so then and clarify that in the manual, which is
not explicit about how '--trap' is handled with a changing ISA.  Mention
the change in NEWS too since it applies to a user option.
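
As a rough illustration of the "decide on the fly" approach (a sketch
only, not the tc-mips.c code): the names select_div_check,
CHECK_WITH_BREAK and CHECK_WITH_TRAP are hypothetical, and ISA levels
are modeled as plain integers, with 1 standing for MIPS I.

```c
#include <assert.h>
#include <stdbool.h>

enum div_check_insn { CHECK_WITH_BREAK, CHECK_WITH_TRAP };

/* Hypothetical sketch: pick the instruction kind used for the divide
   check at the moment a macro is expanded.  MIPS I (level 1 here)
   lacks the trap instructions.  */
static enum div_check_insn
select_div_check (bool trap_requested, int current_isa)
{
  assert (current_isa >= 1);
  /* Consult the currently selected ISA on every expansion, so that a
     '.set' changing the ISA in the middle of a file is honored.  */
  if (trap_requested && current_isa >= 2)
    return CHECK_WITH_TRAP;
  return CHECK_WITH_BREAK;
}
```

With this shape, a '--trap' request silently falls back to breakpoint
instructions while an incompatible ISA is selected, instead of tripping
an assertion.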
2024-07-19 09:42:56 +01:00
Indu Bhagat
29085f7243 gas: aarch64: add experimental support for SCFI
For synthesizing CFI (SCFI) for hand-written asm, the SCFI machinery in
GAS works on the generic GAS insns (ginsns).  This patch adds support in
the aarch64 backend to create ginsns for a subset of the supported
machine instructions.  The subset includes the minimal necessary
instructions to ensure SCFI correctness:

- Any potential register saves and unsaves.  Hence, process instructions
  belonging to a variety of iclasses involving str, ldr, stp, ldp.
- Any change of flow instructions.  This includes all conditional and
  unconditional branches, call (bl, blr, etc.) and return.
- Most importantly, any instruction that could affect the two registers
  of interest: REG_SP, REG_FP.  This set includes all pre-indexed and
  post-indexed memory operations, with writeback, on the stack.  This
  set must also include other instructions (e.g., arithmetic insns)
  where the destination register is one of the afore-mentioned registers.

With respect to callee-saved registers in AArch64, FP/Advanced SIMD
registers D8-D15 are included along with the relevant GPRs.  Calculating
offsets for loads and stores especially for Q registers needs special
attention here.

As an example,
   str q8, [sp, #16]
On big-endian:
   STR Qn stores as a 128-bit integer (MSB first), hence, should record
   D8 as being saved at sp+24 rather than sp+16.
On little-endian:
   should record D8 as being saved at sp+16

D8-D15 are the low 64 bits of Q8-Q15, and of Z8-Z15 if SVE is used;
hence, they remain "interesting" for SCFI purposes in such cases.  A CFI
save slot always represents the low 64 bits, regardless of whether a
save occurs on D, Q or Z registers.  Currently, the ginsn creation
machinery can handle D and Q registers on little-endian and big-endian.
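
The offset rule from the str q8 example can be sketched as follows.
This is an illustration under stated assumptions, not the
tc-aarch64-ginsn.c code; low64_save_offset is a hypothetical helper
name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: byte offset (relative to the base register) at
   which the low 64 bits of a REG_BYTES-wide register end up when the
   register is stored at STORE_OFFSET.  */
static int
low64_save_offset (int store_offset, int reg_bytes, bool big_endian)
{
  assert (reg_bytes >= 8);
  /* A big-endian store writes the most significant bytes first, so the
     low 64 bits sit at the end of the stored value.  */
  return big_endian ? store_offset + (reg_bytes - 8) : store_offset;
}
```

For str q8, [sp, #16] this gives sp+24 on big-endian and sp+16 on
little-endian, matching the example above; a D register store is
unaffected because reg_bytes is 8.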

Apart from creating ginsn, another key responsibility of the backend is
to make sure there are safeguards in place to detect and alert if an
instruction of interest may have been skipped.  This is done via
aarch64_ginsn_unhandled () (similar to the x86 backend).  This function
is hence also intended to alert when future ISA changes may otherwise
render SCFI results incorrect, because of missing ginsns for the newly
added machine instructions.

Because of the complexities wrt endianness in handling Z register
usage, skip the sve_misc opclass altogether for now.  The SCFI
machinery will error out (using the aarch64_ginsn_unhandled () code
path) though if Z register usage affects correctness.

The SCFI machinery does not currently synthesize the PAC-related,
aarch64-specific CFI directive .cfi_b_key_frame.  Support for this is
planned for the near future.

SCFI is enabled for ELF targets only.

gas/
	* config/tc-aarch64-ginsn.c: New file.
	* config/tc-aarch64.c (md_assemble): Include tc-aarch64-ginsn.c
	file.  Invoke aarch64_ginsn_new.
	* config/tc-aarch64.h (TARGET_USE_GINSN): Define for SCFI
	enablement.
	(TARGET_USE_SCFI): Likewise.
	(SCFI_MAX_REG_ID): New definition.
	(REG_FP): Likewise.
	(REG_LR): Likewise.
	(REG_SP): Likewise.
	(SCFI_INIT_CFA_OFFSET): Likewise.
	(SCFI_CALLEE_SAVED_REG_P): Likewise.
	(aarch64_scfi_callee_saved_p): New declaration.
2024-07-18 20:54:14 -07:00
Maciej W. Rozycki
61022df13c Revert "MIPS/GAS: Omit LI 0 for condition trap"
This reverts commit bfa257b407.  It was
applied unapproved.
2024-07-13 06:00:43 +01:00
Srinath Parvathaneni
6ab366f264 aarch64: Add support for sme2.1 movaz instructions.
This patch adds support for the following sme2.1 movaz instructions;
the spec is available here [1].

1. MOVAZ (array to vector, two registers).
2. MOVAZ (array to vector, four registers).
3. MOVAZ (tile to vector, single).

[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
2024-07-12 15:40:48 +01:00
Jan Beulich
3367789048 x86/APX: remove two inconsistencies
As indicated in earlier discussion, permitting GOTTPOFF uniformly for
all legacy non-SIMD insns while at the same time restricting to just
certain ADD forms when EVEX-encoded is inconsistent. Make promoted insns
"equal" to their legacy original ones. Doing that adjustment prevents
another inconsistency, too: In

	data16 neg (%rax)
	data16 neg (%r16)
	data16 {nf} neg (%rax)

it is not logical why the last one shouldn't be permitted. Bypassing
that check requires other adjustments, though, to actually properly
consume (and then squash) the data size prefix.

While there also add the missing CMP and TEST cases to the test case
being modified.
2024-07-12 12:28:03 +02:00
Jan Beulich
eb81ff85a0 x86/APX: correct TEST/CTESTcc with 1st operand being a memory one
While they properly inherited D and C, code processing the reversal of
operands wasn't updated accordingly (and "reversed" operands also
weren't tested anywhere).
2024-07-12 12:27:19 +02:00
YunQiang Su
bfa257b407 MIPS/GAS: Omit LI 0 for condition trap
MIPSr6 removes the conditional trap instructions that take an
immediate, so we expand an instruction like "tne $2,IMM" to
	li	$at,IMM
	tne	$2,$at
If IMM is 0, however, we can use just
	tne	$2,$zero
instead.
2024-07-12 18:19:35 +08:00
Matthieu Longo
0d988fbb4e aarch64: disable feature b16b16
Feature b16b16 is currently incomplete and requires re-work.

Disable the command line option for b16b16, and mark the associated
tests as XFAIL.
2024-07-12 11:05:35 +01:00
srinath
de7a30ceaa aarch64: Add support for sve2p1 pmov instruction.
This patch adds support for the following SVE2p1 instructions; the spec is available here [1].

1. PMOV (to vector)
2. PMOV (to predicate)

Both pmov (to vector) and pmov (to predicate) take a scalable vector register
operand (the destination and the source, respectively) with no suffix and an
optional index. To handle this case we have added 8 new operands in this
patch.

AARCH64_OPND_SVE_Zn0_INDEX,      /* Zn[index], bits [9:5].  */
AARCH64_OPND_SVE_Zn1_17_INDEX,    /* Zn[index], bits [9:5,17].  */
AARCH64_OPND_SVE_Zn2_18_INDEX,    /* Zn[index], bits [9:5,18:17].  */
AARCH64_OPND_SVE_Zn3_22_INDEX,    /* Zn[index], bits [9:5,18:17,22].  */
AARCH64_OPND_SVE_Zd0_INDEX,      /* Zn[index], bits [4:0].  */
AARCH64_OPND_SVE_Zd1_17_INDEX,    /* Zn[index], bits [4:0,17].  */
AARCH64_OPND_SVE_Zd2_18_INDEX,    /* Zn[index], bits [4:0,18:17].  */
AARCH64_OPND_SVE_Zd3_22_INDEX,    /* Zn[index], bits [4:0,18:17,22].  */

Since the index of the <Zd> operand is optional, the index part is
dropped in disassembly in both the cases of "no index" or "zero index".

As per spec: PMOV <Zd>{[<imm>]}, <Pn>.D
             PMOV <Pn>.D, <Zd>{[<imm>]}

Example1:
	Assembly: pmov z5[0], p6.d
	Disassembly: pmov z5, p6.d

        Assembly: pmov z5, p6.d
        Disassembly: pmov z5, p6.d

Example2:
	Assembly: pmov p4.b, z5[0]
	Disassembly: pmov p4.b, z5

        Assembly: pmov p4.b, z5
        Disassembly: pmov p4.b, z5
[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
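
The disassembly rule shown in the examples (an index of zero prints the
same as no index) can be sketched like this. It is a minimal
illustration, not the opcodes/ implementation; format_zreg_index is a
hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: print a Z register operand with an optional
   index, dropping the index when it is zero.  */
static void
format_zreg_index (char *buf, size_t len, unsigned regno, unsigned index)
{
  assert (regno < 32);
  if (index == 0)
    snprintf (buf, len, "z%u", regno);
  else
    snprintf (buf, len, "z%u[%u]", regno, index);
}
```

Both "pmov z5[0], p6.d" and "pmov z5, p6.d" therefore round-trip to the
same text, since the two encode identically.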
2024-07-08 17:48:23 +01:00
Lingling Kong
97bf50bb61 x86-64: Fix support for APX NF TLS IE with 2 operands
Added the restriction in assemble for APX TLS IE that the destination
can only be a register.

gas/

      * config/tc-i386.c (md_assemble): Added stricter restrictions
      for APX TLS IE.
2024-07-05 18:25:59 +08:00
Jan Beulich
e6292a4b2c aarch64: fix build with old glibc
As was pointed out several times before, old glibc declares index(),
resulting in warnings from -Wshadow, in turn failing the build due to
-Werror.
2024-07-05 08:38:39 +02:00
Sun Sunny
c83ea305e2 RISC-V: Fix BFD_RELOC_RISCV_PCREL_LO12_S patch issue
In commit dff565fcca, the fixups
for PCREL_LO12_I and PCREL_LO12_S were mixed up, so the "IMM"
field was applied to the incorrect position, which caused incorrect
src registers to be encoded.

gas/
	* config/tc-riscv.c (md_apply_fix): Fix PCREL_LO12_S issue.
	* testsuite/gas/riscv/ixup-local.s: Updated for PCREL_LO12_S cases.
	* testsuite/gas/riscv/fixup-local-relax.d: Likewise.
	* testsuite/gas/riscv/fixup-local-norelax.d: Likewise.

Signed-off-by: Jianwei Sun <sunny.sun@corelabtech.com>
2024-07-04 21:36:48 +08:00
Lifang Xia
f9d218de5c RISC-V: hash with segment id and pcrel_hi address while recording pcrel_hi
When the same address across different segments (sections) needs to be
recorded, it will overwrite the slot, leading to a memory leak. To ensure
uniqueness, the segment (section) ID needs to be included in the hash key
calculation.
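
A minimal sketch of the idea, assuming a simplified entry type: struct
pcrel_hi_entry, pcrel_hi_hash and pcrel_hi_eq are hypothetical
stand-ins rather than the tc-riscv.c definitions, and the mixing
constant is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a recorded pcrel_hi fixup.  */
struct pcrel_hi_entry
{
  unsigned int sec_id;	/* ID of the section holding the fixup.  */
  uint64_t address;	/* Address of the pcrel_hi instruction.  */
};

/* Mix the section ID into the hash, so that equal addresses in
   different sections no longer share a key.  */
static uint64_t
pcrel_hi_hash (const struct pcrel_hi_entry *e)
{
  assert (e != NULL);
  return ((uint64_t) e->sec_id * UINT64_C (0x9e3779b97f4a7c15)) ^ e->address;
}

/* Entries match only if both the section and the address match.  */
static bool
pcrel_hi_eq (const struct pcrel_hi_entry *a, const struct pcrel_hi_entry *b)
{
  assert (a != NULL && b != NULL);
  return a->sec_id == b->sec_id && a->address == b->address;
}
```

With only the address as key, an entry for address 0x1000 in one section
would replace the entry for 0x1000 in another; comparing the section ID
as well keeps both.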

gas/
	* config/tc-riscv.c (riscv_pcrel_hi_fixup): New "const asection *sec".
	(riscv_pcrel_fixup_hash): Use sec->id and e->address as the
	hash key.
	(riscv_pcrel_fixup_eq): Check sec->id at first.
	(riscv_record_pcrel_fixup): New member "sec".
	(md_apply_fix) <case BFD_RELOC_RISCV_PCREL_HI20>: Likewise.
	(md_apply_fix) <case BFD_RELOC_RISCV_PCREL_LO12_I>: Likewise.
2024-07-04 21:36:21 +08:00
Andre Vieira
433e2bef4a mve: Fix encoding for vcvt[bt] single-half float conversion instructions
The encoding was previously not taking into account that the Quad vector
registers were being encoded using their Q-register numbers rather than their
D-register equivalent (multiply by 2).

gas/

	* config/tc-arm.c (do_neon_cvttb_1): Use Q-register vector number
	rather than their D-register equivalent.

gas/testsuite/

	* gas/arm/mve-vcvt-3.d: Correct expected values in test.
2024-07-04 13:48:26 +01:00
Jens Remus
8086fe0b3d gas: Enhance arch-specific SFrame configuration descriptions
Explicitly mention "SFrame" in the descriptions for the architecture-
specific SFrame configuration macros, variables, and functions.

Use the term "frame pointer" (FP) instead of "base pointer". This aligns
with the terminology used in the SFrame specification. Additionally it
helps not to confuse "base-pointer register" with the term "BASE_REG"
used in the specification to denote either the SP or FP register.

Specify what the SFRAME_CFA_*_REG register numbers are used for:
- SP (stack pointer): CFA tracking
- FP (frame pointer): CFA and FP tracking
- RA (return address): RA tracking

Align the descriptions for definitions in the source files to the
declarations in the header files.

gas/
	* config/tc-aarch64.h: Enhance architecture-specific SFrame
	configuration descriptions.
	* config/tc-aarch64.c: Likewise.
	* config/tc-i386.h: Likewise.
	* config/tc-i386.c: Likewise.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
2024-07-04 10:34:12 +02:00
Jens Remus
73b56ef2fd x86: Remove unused SFrame CFI RA register variable
gas/
	* config/tc-i386.c (x86_sframe_cfa_ra_reg): Remove.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
2024-07-04 10:34:12 +02:00
Cui, Lili
b0dd832fa4 Support APX CFCMOV
The CMOVcc instruction proposed by EVEX has four different forms,
corresponding to the four possible combinations of EVEX.ND and EVEX.NF
values.

In the encoder part, when the CFCMOV template supports EVEX_NF, it means that
it requires EVEX.NF to be 1.

In the decoder part, CFCMOV_Fixup is used to reverse source and destination
operands in the 2-operand case.

gas/ChangeLog:

        * config/tc-i386.c (build_apx_evex_prefix): Set NF bit for cfcmov
        when the insn template supports EVEX_NF.
        * testsuite/gas/i386/x86-64-apx-inval.l: Add invalid tests for cfcmov.
        * testsuite/gas/i386/x86-64-apx-inval.s: Ditto.
        * testsuite/gas/i386/x86-64.exp: Add tests for cfcmov and cmov.
        * testsuite/gas/i386/x86-64-apx-cfcmov-intel.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-cfcmov.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-cfcmov.s: Ditto.

opcodes/ChangeLog:

        * i386-dis-evex-prefix.h: Add cfcmov instructions.
        * i386-dis.c (CFCMOV_Fixup): Special handling of cfcmov.
        (putop): Print 'cf' for cfcmov instructions.
        * i386-opc.h (EVEX_NF): New.
        * i386-opc.tbl: Add cfcmov instructions.
        * i386-mnem.h: Regenerated.
        * i386-tbl.h: Regenerated.
2024-07-04 15:55:00 +08:00
Lingling Kong
2d5428d8cd x86-64: Support APX NF TLS IE with 2 operands
Support APX NF TLS IE with 2 operands. Verify it with ld and gold.

gas/

	* config/tc-i386.c (md_assemble): Allow APX NF TLS IE with
	2 operands.
	* testsuite/gas/i386/x86-64-gottpoff.d: Updated.
	* testsuite/gas/i386/x86-64-gottpoff.s: Add APX NF TLS IE
	tests with 2 operands.

gold/

	* testsuite/x86_64_ie_to_le.s: Add APX NF TLS IE tests with
	2 operands.
	* testsuite/x86_64_ie_to_le.sh: Updated.

ld/

	* testsuite/ld-x86-64/tlsbindesc.s: Add APX NF TLS IE tests
	with 2 operands.
	* testsuite/ld-x86-64/tlsbindesc.d: Updated.
	* testsuite/ld-x86-64/tlsbindesc.rd: Likewise.
2024-07-03 10:18:36 +08:00
Claudio Bantaloukas
032eb4f718 aarch64: Add support for Armv9.5-A architecture
The new -march=armv9.5-a flag enables access to the
mandatory cpa, lut and faminmax extensions.
Existing test cases for features are extended to verify they
work without additional flags.
2024-06-28 14:52:30 +01:00
Jan Beulich
2513312930 x86/APX: apply NDD-to-legacy transformation to further CMOVcc forms
With both sources being registers, these insns are almost commutative;
the only extra adjustment needed is inversion of the encoded condition.
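
Why swapping the sources needs only an inverted condition can be
sketched in C. This is a model of the semantics, not assembler code;
invert_cc and cmov_select are hypothetical names. It relies on the x86
convention that the 4-bit condition encodings come in pairs differing
in bit 0 (e.g. 0x4 = E, 0x5 = NE).

```c
#include <assert.h>
#include <stdbool.h>

/* Flip bit 0 of a 4-bit x86 condition encoding to obtain the inverse
   condition.  */
static unsigned
invert_cc (unsigned cc)
{
  assert (cc < 16);
  return cc ^ 1;
}

/* Model of a two-source conditional move: pick SRC_IF_TRUE when COND
   holds, else SRC_IF_FALSE.  */
static int
cmov_select (bool cond, int src_if_true, int src_if_false)
{
  return cond ? src_if_true : src_if_false;
}
```

Since cmov_select (c, a, b) equals cmov_select (!c, b, a) for any
inputs, the two register sources can trade places as long as the
encoded condition is inverted along with them.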
2024-06-28 08:24:45 +02:00
Jan Beulich
7add993917 x86/APX: extend TEST-by-imm7 optimization to CTESTcc
The same properties apply there.
2024-06-28 08:24:12 +02:00
Jan Beulich
82e06fa803 x86/APX: optimize {nf}-form IMUL-by-power-of-2 to SHL
..., for differing only in the resulting EFLAGS, which are left
untouched anyway. That's a shorter encoding, available as long as
certain constraints on operands are met; see code comments. (SHL-by-1
forms may then be subject to further optimization that was introduced
earlier.)

Note that kind of as a side effect this also converts multiplication by
1 to shift by 0, which is a plain move or even no-op anyway. That could
be further shrunk (as could be presence of shifts/rotates by 0 in the
original code as well as a fair set of other {nf}-form insns), yet the
expectation (for now) is that people won't write such code in the first
place.
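
The core of the transformation is recognizing a power-of-2 immediate
and deriving the shift count. A minimal sketch follows;
imul_imm_to_shift is a hypothetical name, and the operand constraints
mentioned above are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Return the shift count equivalent to multiplying by IMM, or -1 when
   IMM is not a power of two.  */
static int
imul_imm_to_shift (uint64_t imm)
{
  int count = 0;

  /* A power of two has exactly one bit set.  */
  if (imm == 0 || (imm & (imm - 1)) != 0)
    return -1;
  while ((imm >>= 1) != 0)
    count++;
  assert (count < 64);
  return count;
}
```

Multiplication by 1 yields a count of 0 here, which is the
"shift by 0" side effect described above.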
2024-06-28 08:22:39 +02:00
Jan Beulich
2a7f257afb x86-64: restrict by-imm31 optimization
Avoid changing the encoding when there's no size gain: If there's a REX
or REX2 prefix anyway and the base opcode wouldn't be changed, dropping
just REX.W / REX2.W has no (size) effect. (Same for the AND-by-imm7 case
in the same big conditional.)

While there also pull out the .qword check: For the 2-register-operands
case whether that's done on the 1st or 2nd operand doesn't matter. Due
to reduction in necessary parentheses this improves readability a tiny
bit.
2024-06-28 08:21:48 +02:00
Jan Beulich
27ef4876f7 x86/APX: optimize certain {nf}-form insns to LEA
..., as that leaves EFLAGS untouched anyway. That's a shorter encoding,
available as long as certain constraints on operand size and registers
are met; see code comments.

Note that this requires deferring to derive encoding_evex from {nf}
presence, as in optimize_encoding() we want to avoid touching the insns
when {evex} was also used.

Note further that this requires want_disp32() to now also consider the
opcode: We don't want to replace i.tm.mnem_off, for diagnostics to still
report the original mnemonic (or else things can get confusing). While
there, correct adjacent mis-indentation.
2024-06-28 08:19:59 +02:00
Jan Beulich
c7eae03eab x86/APX: optimize {nf}-form rotate-by-width-less-1
Unlike for the legacy forms, where there's a difference in the resulting
EFLAGS.CF, for the NF variants the immediate can be got rid of in that
case by switching to a 1-bit rotate in the opposite direction.
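
The equivalence behind this can be checked with a small C model of
32-bit rotates (rol32 and ror32 are helper names chosen for this
sketch). Only the rotated value is modeled, which is all that matters
once the flags are left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 32).  */
static uint32_t
rol32 (uint32_t x, unsigned n)
{
  assert (n < 32);
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Rotate X right by N bits (0 <= N < 32).  */
static uint32_t
ror32 (uint32_t x, unsigned n)
{
  assert (n < 32);
  return n == 0 ? x : (x >> n) | (x << (32 - n));
}
```

For any x, rol32 (x, 31) equals ror32 (x, 1) and ror32 (x, 31) equals
rol32 (x, 1), so the by-31 immediate form can become a by-1 rotate in
the other direction.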
2024-06-28 08:19:32 +02:00
Jan Beulich
0868b8999b x86/APX: optimize {nf} forms of ADD/SUB with specific immediates
Unlike for the legacy forms, where there's a difference in the resulting
EFLAGS, for the NF variants we can safely replace ones using 0x80 by the
respectively other insn while negating the immediate, saving 3 immediate
bytes (just 1 though for 16-bit operand size). Similarly we can replace
ones using 1 / -1 by INC/DEC (eliminating the immediate).
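
The byte savings come from the sign-extended imm8 form: 0x80 (128) does
not fit in a signed byte, while -0x80 does. A rough sketch of the
arithmetic, with imm_bytes as a hypothetical helper covering only 16-,
32- and 64-bit operand sizes:

```c
#include <assert.h>
#include <stdint.h>

/* Immediate bytes needed by an ADD/SUB with an operand size of
   OPERAND_BYTES (2, 4 or 8): one byte when the value fits a
   sign-extended imm8, otherwise a full immediate (capped at 4 bytes,
   as these insns have no 64-bit immediate form).  */
static int
imm_bytes (int64_t imm, int operand_bytes)
{
  assert (operand_bytes == 2 || operand_bytes == 4 || operand_bytes == 8);
  if (imm >= -128 && imm <= 127)
    return 1;
  return operand_bytes < 4 ? operand_bytes : 4;
}
```

Replacing an ADD of 0x80 by a SUB of -0x80 thus shrinks the immediate
from 4 bytes to 1 for 32-bit operands, and from 2 bytes to 1 for 16-bit
operands.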
2024-06-28 08:18:40 +02:00
Jens Remus
da47588db1 aarch64: Treat operand ADDR_SIMPLE as address with base register
The AArch64 instruction table (aarch64-tbl.h) defines the operand
ADDR_SIMPLE as "address with base register (no offset)". During assembly
it is correctly encoded as address with base register (addr.base_regno)
in parse_operands. In warn_unpredictable_ldst it is erroneously treated
as register number (reg.regno).

This resolves an erroneous failure of the assembler test case
"Diagnostics Quality" when the union in struct aarch64_opnd_info is
changed to a struct for debugging purposes.

gas/
	* config/tc-aarch64.c: Treat operand ADDR_SIMPLE as address with
	base register.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
2024-06-25 17:25:55 +02:00
Srinath Parvathaneni
4f2cb9d129 aarch64: Fix sve2p1 ld[1-4]/st[1-4]q instruction operands.
This patch fixes encoding and syntax for sve2p1 instructions ld[1-4]q/st[1-4]q
as mentioned below, for the issues reported here.
https://sourceware.org/pipermail/binutils/2024-February/132408.html

1) Previously all the ld[1-4]q/st[1-4]q instructions were wrongly added as
predicated instructions; this is fixed in this patch by replacing
"SVE2p1_INSNC" with the "SVE2p1_INSN" macro.
2) Wrong first operand in all the ld[1-4]q/st[1-4]q instructions is fixed
by replacing "SVE_Zt" with "SVE_ZtxN".
3) Wrong operand qualifiers in ld1q and st1q instructions are also fixed in
this patch.
4) In ld1q/st1q the index in the second argument is optional and if index
   is xzr and is skipped in the assembly, the index field is ignored by the
   disassembler.

Fixing the above mentioned issues helps with the following:
1) The ld1q and st1q first register operand accepts enclosing curly braces.
2) The ld2q, ld3q, ld4q, st2q, st3q, and st4q instructions accept a wrapping
   sequence of vector registers.

For the instructions ld[2-4]q/st[2-4]q, tests for wrapping sequence of vector
registers are added along with short-form of operands for non-wrapping sequence.

I have added tests using the following logic:
ld2q {Z0.Q, Z1.Q}, p0/Z, [x0,  #0, MUL VL]  //raw insn encoding (all zeroes)
ld2q {Z31.Q, Z0.Q}, p0/Z, [x0,  #0, MUL VL] // encoding of <Zt1>
ld2q {Z0.Q, Z1.Q}, p7/Z, [x0,  #0, MUL VL] // encoding of <Pg>
ld2q {Z0.Q, Z1.Q}, p0/Z, [x30,  #0, MUL VL] // encoding of <Xm>
ld2q {Z0.Q, Z1.Q}, p0/Z, [x0,  #-16, MUL VL] // encoding of <imm> (low value)
ld2q {Z0.Q, Z1.Q}, p0/Z, [x0,  #14, MUL VL] // encoding of <imm> (high value)
ld2q {Z31.Q, Z0.Q}, p7/Z, [x30,  #-16, MUL VL] // encoding of all fields (all ones)
ld2q {Z30.Q, Z31.Q}, p1/Z, [x3,  #-2, MUL VL] // random encoding.

For all the above forms of instructions the hyphenated form is preferred for
disassembly if there are more than two registers in the list, and the register
numbers are monotonically increasing in increments of one.
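
The preference rule for printing register lists can be sketched as
below. This is an illustration, not the opcodes/ printer:
format_zq_list is a hypothetical name, and the exact output spelling
("{z0.q-z2.q}" versus "{z31.q, z0.q}") is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical formatter: print COUNT consecutive Z registers starting
   at FIRST (register numbers wrap modulo 32) with a .q qualifier.  */
static void
format_zq_list (char *buf, size_t len, unsigned first, unsigned count)
{
  unsigned last = (first + count - 1) % 32;
  bool wraps = first + count > 32;
  size_t pos;
  unsigned i;

  /* 32 bytes are enough for the longest list, so nothing truncates.  */
  assert (len >= 32 && first < 32 && count >= 1 && count <= 4);
  if (count > 2 && !wraps)
    {
      /* More than two registers, increasing by one: range form.  */
      snprintf (buf, len, "{z%u.q-z%u.q}", first, last);
      return;
    }
  /* Otherwise list every register, e.g. for a wrapping sequence.  */
  pos = (size_t) snprintf (buf, len, "{");
  for (i = 0; i < count; i++)
    pos += (size_t) snprintf (buf + pos, len - pos, "%sz%u.q",
			      i ? ", " : "", (first + i) % 32);
  snprintf (buf + pos, len - pos, "}");
}
```

A wrapping list such as z31, z0, z1 is not monotonically increasing, so
it falls through to the comma-separated form.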
2024-06-25 13:38:48 +01:00
Srinath Parvathaneni
f50b1a3c1f aarch64: Fix sve2p1 extq instruction operands.
This patch fixes the syntax of the sve2p1 "extq" instruction by changing the operand
count to 4. A new operand AARCH64_OPND_SVE_UIMM4 is defined to handle the 4th
argument, a 4-bit unsigned immediate, of the extq instruction. The instruction encoding
is updated to use constraint C_SCAN_MOVPRFX, to allow the "extq" instruction to be
immediately preceded in program order by a MOVPRFX instruction. Also removed the unused
operand AARCH64_OPND_SVE_Zm_imm4.

This issue was reported here:
 https://sourceware.org/pipermail/binutils/2024-February/132408.html
2024-06-25 13:38:48 +01:00
Andrew Carlotti
a6e529673a aarch64: Add SME FP8 multiplication instructions
This includes:
- FEAT_SME_F8F32 (+sme-f8f32)
- FEAT_SME_F8F16 (+sme-f8f16)

The FP16 addition/subtraction instructions originally added by
FEAT_SME_F16F16 haven't been added to Binutils yet.  They are also
required to be enabled if FEAT_SME_F8F16 is present, so they are
included in this patch.
2024-06-24 16:50:28 +01:00
Andrew Carlotti
59b78ab1c1 aarch64: Add FP8 Neon and SVE multiplication instructions
This includes all the instructions under the following features:
- FEAT_FP8FMA (+fp8fma)
- FEAT_FP8DOT4 (+fp8dot4)
- FEAT_FP8DOT2 (+fp8dot2)
- FEAT_SSVE_FP8FMA (+ssve-fp8fma)
- FEAT_SSVE_FP8DOT4 (+ssve-fp8dot4)
- FEAT_SSVE_FP8DOT2 (+ssve-fp8dot2)
2024-06-24 16:50:28 +01:00
Andrew Carlotti
05f15256d0 aarch64: Add support for virtual features
These features will be used to gate instructions that can be enabled by
either of two (or more) different sets of command line feature flags.

This patch adds a postprocessing step to the feature parsing code to
set the value of the virtual bits.
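
A minimal sketch of such a postprocessing step, assuming a 64-bit
feature mask. All names here (FEAT_A, FEAT_B, FEAT_C, FEAT_VIRT_X,
set_virtual_bits) are hypothetical and do not correspond to real
AArch64 feature bits.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
#define FEAT_A		(UINT64_C (1) << 0)
#define FEAT_B		(UINT64_C (1) << 1)
#define FEAT_C		(UINT64_C (1) << 2)
/* Virtual bit: enabled by FEAT_A alone, or by FEAT_B plus FEAT_C.  */
#define FEAT_VIRT_X	(UINT64_C (1) << 63)

/* Derive the virtual bit from the real feature bits after parsing.  */
static uint64_t
set_virtual_bits (uint64_t features)
{
  /* Recompute from scratch so a stale virtual bit never survives.  */
  features &= ~FEAT_VIRT_X;
  if ((features & FEAT_A) != 0
      || (features & (FEAT_B | FEAT_C)) == (FEAT_B | FEAT_C))
    features |= FEAT_VIRT_X;
  return features;
}
```

An instruction gated on FEAT_VIRT_X then needs a single feature test,
whichever combination of command line flags enabled it.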
2024-06-24 16:49:52 +01:00
Andrew Carlotti
92d37320d5 aarch64: Move struct definition towards its usage
2024-06-24 16:49:52 +01:00
saurabh.jha@arm.com
adea87e275 gas, aarch64: Add SME2 lutv2 extension
Introduces instructions for the SME2 lutv2 extension for AArch64. They
are documented in the following document:

  * ARM DDI0602

For both luti4 instructions, we introduced an operand called
SME_Znx2_BIT_INDEX. We use the existing function parse_vector_reg_list
for parsing but modified that function so that it can accept operands
without qualifiers and rejects instructions that have operands with
qualifiers but are not supposed to have operands with qualifiers.
For disassembly, we modified print_register_list so that it could
accept register lists without qualifiers.

For one luti4 instruction, we introduced an operand called
SME_Zdnx4_STRIDED. It is similar to SME_Ztx4_STRIDED and we could use
existing code for parsing, encoding, and disassembly.

For movt instruction, we introduced an operand called SME_ZT0_INDEX2_12.
This is a ZT0 register with a bit index encoded in [13:12]. It is
similar to SME_ZT0_INDEX.

We also introduced an iclass named sme_size_12_b so that we can encode
size bits [13:12] correctly when only 'b' is allowed as qualifier.
2024-06-24 15:00:40 +01:00
Jan Beulich
f4a966a91d x86: optimize {,V}PEXTR{D,Q} with immediate of 0
Such are equivalent to simple moves, which are up to 3 bytes shorter to
encode (and perhaps also cheaper to execute).
2024-06-21 14:40:44 +02:00
Jan Beulich
fa2c4239f1 x86: optimize left-shift-by-1
These can be replaced by adds when acting on a register operand.

While for the scalar forms there's no gain in encoding size, ADD
generally has higher throughput than SHL. EFLAGS set by ADD are a
superset of those set by SHL (AF in particular is undefined there).

For the SIMD cases the transformation also reduces code size, by
eliminating the 1-byte immediate from the resulting encoding. Note
that this transformation is not applied by gcc13 (according to my
observations), so would - as of now - even improve compiler generated
code.
2024-06-21 14:39:52 +02:00
Jan Beulich
8bcda53caa x86: %riz, %rip, and %eip don't require REX
While these can't be used as register operands, they can be used for
memory operand addressing. Such uses do not prevent conversion: The
RegRex64 checks in check_Rex_required() for base and index registers
were simply wrong. They specifically also aren't needed for byte
registers, as those won't pass i386_index_check() anyway.
2024-06-21 08:35:23 +02:00
Jan Beulich
c68a6e5cad x86: don't suppress errors when optimizing
Blindly ignoring any mnemonic suffix can't be quite right: Bad suffix /
operand combinations still want flagging. Simply avoid optimizing in
such situations.
2024-06-21 08:33:57 +02:00