Right now the opcode table has entries with ISA restrictions of the form
FEAT1|FEAT2, the meaning of which depends on context and requires
special treatment in tc-i386.c: Sometimes this means "both features
requires", whereas originally it was intended to solely mean "all of
these features required". Split the field, with the original one
regaining its original meaning. The new field now truly means "any of
these". The combination of both fields is still and &&-type check, i.e.
(all of these) && (any of these). In the opcode table more involved
combinations of features then also need expressing this way: "all"
entities first, follow by "any" entities enclosed in parentheses, e.g.
x64&(AVX|AVX512F). If the "all" part is empty, parentheses may not be
added around the "any" part (unless parsing logic was further relaxed).
Note that this way AVX512VL no longer needs as much special treatment,
and hence templates previously using AVX512F|AVX512VL are switched to
just AVX512VL.
Note further that this requires FMA handling as resulting from
da0784f961 ("x86: fold FMA VEX and EVEX templates") to be slightly
re-done: FMA now becomes more similar to AVX and AVX2.
First of all we want to also accumulate its reverse dependencies, such
that we can use them in cpu_flags_match(). This is in particular in
preparation of APX additions, such that e.g. BMI VEX-encoding templates
can become combined VEX/EVEX ones.
Once we have the reverse dependencies, we can further leverage them to
omit explicit "&x64" from any insn templates dealing with 64-bit-mode-
only ISA extensions. Besides helping readability for several insn
templates we already have, this will also help with what is going to be
added for APX (as all of the new templates would otherwise need to have
"&x64").
Note that rather than leaving a meaningless CPU_64_FLAGS (which is
unused anyway), its emitting is now also suppressed.
Enable the `+lse128' feature modifier which, together with new
internal feature flags, enables LSE128 instructions, which are
represented via the new `_LSE128_INSN' macro.
gas/ChangeLog:
* config/tc-aarch64.c (aarch64_features): Add new "lse128"
entry.
include/ChangeLog:
* include/opcode/aarch64.h (enum aarch64_feature_bit): New
AARCH64_FEATURE_LSE128 feature bit.
(enum aarch64_insn_class): New lse128_atomic instruction class.
opcodes/ChangeLog:
* opcodes/aarch64-tbl.h (aarch64_feature_lse128): New.
(LSE128): Likewise.
(_LSE128_INSN): Likewise.
Given the particular encoding of the LSE128 instructions, create the
necessary shared input+output operand register description and
handling in the code to allow for the encoding of the LSE128 128-bit
atomic operations.
gas/ChangeLog:
* config/tc-aarch64.c (parse_operands):
include/ChangeLog:
* opcode/aarch64.h (enum aarch64_opnd):
opcodes/ChangeLog:
* aarch64-opc.c (fields):
(aarch64_print_operand):
* aarch64-opc.h (enum aarch64_field_kind):
* aarch64-tbl.h (AARCH64_OPERANDS):
In preparation for the implementation of 128-bit system register
support across the toolchain, this patch adds the feature flag
F_REG_128 and adds it to relevant system registers in
`aarch64-sys-regs.def'.
Given the shared nature of this file, this change is made necessary
initially to implement argument validation in the `__arm_rsr128' and
`__armwsr128' ACLE intrinsics in GCC, but will be of subsequent use in
the binutils implementation of the corresponding `mrrs' and `msrr'
instructions.
Regression tested on aarch64-linux-gnu, no regressions.
opcodes/ChangeLog:
* aarch64-opc.h (F_REG_128): New flag.
* aarch64-sys-regs.def (par_el1): Add F_REG_128 flag.
(rcwmask_el1): Likewise.
(rcwsmask_el1): Likewise.
(ttbr0_el1): Likewise.
(ttbr0_el12): Likewise.
(ttbr0_el2): Likewise.
(ttbr1_el1): Likewise.
(ttbr1_el12): Likewise.
(ttbr1_el2): Likewise.
(vttbr_el2): Likewise.
Add Binutils support for system registers associated with the
Translation Hardening Extension (THE).
In doing so, we also add core feature support for THE, enabling its
associated feature flag and implementing the necessary
feature-checking machinery.
Regression tested on aarch64-linux-gnu, no regressions.
gas/ChangeLog:
* config/tc-aarch64.c (aarch64_features): Add "+the" feature modifier.
* doc/c-aarch64.texi (AArch64 Extensions): Update
documentation for `the' option.
* testsuite/gas/aarch64/sysreg-8.s: Add tests for `the'
associated system registers.
* testsuite/gas/aarch64/sysreg-8.d: Likewise.
include/ChangeLog:
* opcode/aarch64.h (enum aarch64_feature_bit): Add
AARCH64_FEATURE_THE.
opcode/ChangeLog:
* aarch64-opc.c (aarch64_sys_ins_reg_supported_p): Add `the'
system register check support.
* aarch64-sys-regs.def: Add `rcwmask_el1' and `rcwsmask_el1'
* aarch64-tbl.h: Define `THE' preprocessor macro.
Spec: https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest/instruction_set_extensions.html
Contributors:
Mary Bennett <mary.bennett@embecosm.com>
Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Pietra Ferreira <pietra.ferreira@embecosm.com>
Charlie Keaney
Jessica Mills
Craig Blackmore <craig.blackmore@embecosm.com>
Simon Cook <simon.cook@embecosm.com>
Jeremy Bennett <jeremy.bennett@embecosm.com>
Helene Chelin <helene.chelin@embecosm.com>
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Added `xcvalu`
instruction class.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* config/tc-riscv.c (validate_riscv_insn): Added the necessary
operands for the extension.
(riscv_ip): Likewise.
* doc/c-riscv.texi: Noted XCValu as an additional ISA extension
for CORE-V.
* testsuite/gas/riscv/cv-alu-boundaries.d: New test.
* testsuite/gas/riscv/cv-alu-boundaries.l: New test.
* testsuite/gas/riscv/cv-alu-boundaries.s: New test.
* testsuite/gas/riscv/cv-alu-fail-march.d: New test.
* testsuite/gas/riscv/cv-alu-fail-march.l: New test.
* testsuite/gas/riscv/cv-alu-fail-march.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-01.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-01.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-01.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-02.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-02.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-02.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-03.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-03.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-03.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-04.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-04.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-04.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-05.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-05.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-05.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-06.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-06.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-06.s: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-07.d: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-07.l: New test.
* testsuite/gas/riscv/cv-alu-fail-operand-07.s: New test.
* testsuite/gas/riscv/cv-alu-insns.d: New test.
* testsuite/gas/riscv/cv-alu-insns.s: New test.
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): Disassemble xcb operand.
* riscv-opc.c: Defined the MASK and added XCValu instructions.
include/ChangeLog:
* opcode/riscv-opc.h: Added corresponding MATCH and MASK macros
for XCValu.
* opcode/riscv.h: Added corresponding EXTRACT and ENCODE macros
for XCValu.
(enum riscv_insn_class): Added the XCValu instruction class.
Spec: https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest/instruction_set_extensions.html
Contributors:
Mary Bennett <mary.bennett@embecosm.com>
Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Pietra Ferreira <pietra.ferreira@embecosm.com>
Charlie Keaney
Jessica Mills
Craig Blackmore <craig.blackmore@embecosm.com>
Simon Cook <simon.cook@embecosm.com>
Jeremy Bennett <jeremy.bennett@embecosm.com>
Helene Chelin <helene.chelin@embecosm.com>
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Added `xcvmac`
instruction class.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* config/tc-riscv.c (validate_riscv_insn): Added the necessary
operands for the extension.
(riscv_ip): Likewise.
* doc/c-riscv.texi: Noted XCVmac as an additional ISA extension
for CORE-V.
* testsuite/gas/riscv/cv-mac-fail-march.d: New test.
* testsuite/gas/riscv/cv-mac-fail-march.l: New test.
* testsuite/gas/riscv/cv-mac-fail-march.s: New test.
* testsuite/gas/riscv/cv-mac-fail-operand.d: New test.
* testsuite/gas/riscv/cv-mac-fail-operand.l: New test.
* testsuite/gas/riscv/cv-mac-fail-operand.s: New test.
* testsuite/gas/riscv/cv-mac-insns.d: New test.
* testsuite/gas/riscv/cv-mac-insns.s: New test.
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): Disassemble information with
the EXTRACT macro implemented.
* riscv-opc.c: Defined the MASK and added
XCVmac instructions.
include/ChangeLog:
* opcode/riscv-opc.h: Added corresponding MATCH and MASK macros
for XCVmac.
* opcode/riscv.h: Added corresponding EXTRACT and ENCODE macros
for uimm.
(enum riscv_insn_class): Added the XCVmac instruction class.
Within the groups L{B,BU,H,HU,W,WU,D}, S{B,H,W,D}, FL{H,W,D,Q}, and
FS{H,W,D,Q} the sole difference between the handling is the insn
mnemonic passed to the common handling functions. The intended mnemonic,
however, can easily be retrieved. Furthermore leverags that Sx and FSx
are then handled identically, too, and hence their cases can also be
folded.
This patch adds support for 10 new AArch64 system registers
(gcscre0_el1, gcscr_el1, gcscr_el12, gcscr_el2, gcscr_el3,
gcspr_el0, gcspr_el1 ,gcspr_el12, gcspr_el2 and gcspr_el3),
which are enabled on using Guarded Control Stack (+gcs flag)
feature.
This patch adds support for Guarded control stack data synchronization
instruction (GCSB DSYNC). This instruction is allocated to existing
HINT space and uses the HINT number 19 and to match this an entry is
added to the aarch64_hint_options array.
This patch adds for Guarded Control Stack Extension (GCS) extension. GCS feature is
optional from Armv9.4-A architecture and enabled by passing +gcs option to -march
(eg: -march=armv9.4-a+gcs) or using ".arch_extension gcs" directive in the assembly file.
Also this patch adds support for GCS instructions gcspushx, gcspopcx, gcspopx,
gcsss1, gcsss2, gcspushm, gcspopm, gcsstr and gcssttr.
This patch adds support for Check Feature Status Extension (CHK) which
is mandatory from Armv8.0-A. Also this patch supports "chkfeat" instruction
(hint #40).
Given the shared use of the aarch64-sys-regs.def file across Binutils
and GCC, add instructions for keeping the file synchronized across the
two codebases.
Namely, it should be made clear that all changes are first to be made
in Binutils and the updated file copied across to GCC.
opcodes/ChangeLog
* opcodes/aarch64-sys-regs.def: Update file-description header
comment.
There is currently a bug in the bit masking for the barrel shift
instructions because the bit mask is not including all of the
register bits which must be zero. With this patch, the disassembler
can be sure that the 32-bit value is indeed a barrel shift instruction
and not a data value in memory.
This fix can be verified by assembling and disassembling the following:
.text
.long 0x65005f5f
With this patch, the bug is fixed, and the objdump will know that
0x65005f5f is not a barrel shift instruction.
Signed-off-by: Neal Frager <neal.frager@amd.com>
Signed-off-by: Michael J. Eager <eager@eagercon.com>
This patches adds new bsefi and bsifi instructions.
BSEFI- The instruction shall extract a bit field from a
register and place it right-adjusted in the destination register.
The other bits in the destination register shall be set to zero.
BSIFI- The instruction shall insert a right-adjusted bit field
from a register at another position in the destination register.
The rest of the bits in the destination register shall be unchanged.
Further documentation of these instructions can be found here:
https://docs.xilinx.com/v/u/en-US/ug984-vivado-microblaze-ref
With version 6 of the patch, no new relocation types are added as
this was unnecessary for adding the bsefi and bsifi instructions.
FIXED: Segfault caused by incorrect termination of microblaze_opcodes.
Signed-off-by: nagaraju <nagaraju.mekala@amd.com>
Signed-off-by: Ibai Erkiaga <ibai.erkiaga-elorza@amd.com>
Signed-off-by: Neal Frager <neal.frager@amd.com>
Signed-off-by: Michael J. Eager <eager@eagercon.com>
The elf psabi allows for mapping symbols to be of the form $x<ISA>.<any>
opcodes/
* riscv-dis.c (riscv_get_map_state): allow mapping symbol to
be suffixed by a unique identifier .<any>
This patches adds new bsefi and bsifi instructions.
BSEFI- The instruction shall extract a bit field from a
register and place it right-adjusted in the destination register.
The other bits in the destination register shall be set to zero.
BSIFI- The instruction shall insert a right-adjusted bit field
from a register at another position in the destination register.
The rest of the bits in the destination register shall be unchanged.
Further documentation of these instructions can be found here:
https://docs.xilinx.com/v/u/en-US/ug984-vivado-microblaze-ref
This patch has been tested for years of AMD Xilinx Yocto
releases as part of the following patch set:
https://github.com/Xilinx/meta-xilinx/tree/master/meta-microblaze/recipes-devtools/binutils/binutils
Signed-off-by: nagaraju <nagaraju.mekala@amd.com>
Signed-off-by: Ibai Erkiaga <ibai.erkiaga-elorza@amd.com>
Signed-off-by: Neal Frager <neal.frager@amd.com>
Signed-off-by: Michael J. Eager <eager@eagercon.com>
This patch moves instances of system register definitions, represented
by the SYSREG macro, out of their original place in `aarch64-opc.c'
and into a dedicated .def file, `aarch64-sys-regs.def'.
System register entries in this new file are ordered alphabetically by
name. This choice is made to enable the use of fast search algorithms
such as binary search when validating register names.
The SYSREG macro, defined as SYSREG (name, encoding, flags, features)
is kept as is and used in the def file, but all other SR_* macros
which previously served as indirections to SYSREG are removed.
opcodes/ChangeLog:
* aarch64-opc.c (SR_CORE): Macro definition and uses deleted.
(SR_FEAT): Likewise.
(SR_FEAT2): Likewise.
(SR_V8_1_A): Likewise.
(SR_V8_4_A): Likewise.
(SR_V8A): Likewise.
(SR_V8R): Likewise.
(SR_V8_1A): Likewise.
(SR_V8_2A): Likewise.
(SR_V8_3A): Likewise.
(SR_V8_4A): Likewise.
(SR_V8_6A): Likewise.
(SR_V8_7A): Likewise.
(SR_V8_8A): Likewise.
(SR_GIC): Likewise.
(SR_AMU): Likewise.
(SR_LOR): Likewise.
(SR_PAN): Likewise.
(SR_RAS): Likewise.
(SR_RNG): Likewise.
(SR_SME): Likewise.
(SR_SSBS): Likewise.
(SR_SVE): Likewise.
(SR_ID_PFR2): Likewise.
(SR_PROFILE): Likewise.
(SR_MEMTAG): Likewise.
(SR_SCXTNUM): Likewise.
(SR_EXPAND_ELx): Likewise.
(SR_EXPAND_EL12): Likewise.
* opcodes/aarch64-sys-regs.def: New.
This patch adds a mechanism for system register name alias detection
to register-matching mechanisms.
A new `F_REG_ALIAS' flag is added to the set of register flags and
used to label which entries in aarch64_sys_regs[] correspond to
aliases (and thus which CPENC values are non-unique in this array).
Where this is used is, for example, in `aarch64_print_operand' where,
in the case of system register decoding, the aarch64_sys_regs[] array
is iterated through until a match in CPENC value is made and the first
match accepted. If insufficient care is given in the ordering of
system registers in this array, the alias is encountered before the
"real" register and used incorrectly as the register name in the
disassembled output.
With this flag and the new `aarch64_sys_reg_alias_p' test, search
candidates corresponding to aliases can be conveniently skipped over.
One concrete example of where this is useful is with the
`trcextinselr0' system register. It was initially placed in the
system register list before `trcextinselr', in contrast to a more
natural alphabetical order.
include/ChangeLog:
* opcode/aarch64.h: add `aarch64_sys_reg_alias_p' prototype.
opcodes/ChangeLog:
* aarch64-opc.c (aarch64_sys_reg_alias_p): New.
(aarch64_print_operand): add aarch64_sys_reg_alias_p check.
(aarch64_sys_regs): Add F_REG_ALIAS flag to "trcextinselr"
entry.
* aarch64-opc.h (F_REG_ALIAS): New.
Following the folding of some generic AVX/AVX2 templates with their
AVX512F counterpart ones, do this for FMA ones as well, requiring one
further adjustment to cpu_flags_match().
In anticipation of APX introduce logic to reduce the number of templates
we have now, allowing to limit some the number of ones we then need to
gain.
The fundamental requirements are that
- attributes be compatible, which specifically means VexW needs to be
the same in the templates (which often isn't the case, for VEX
encodings having far more WIG tha, EVEX ones),
- the EVEX form being AVX512F (with or without AVX512VL), not any of its
extensions (the same will then be required for APX - it'll need to be
APX_F).
Note that in check_register() there's now a redundant zmm check. Since
this logic will need revisiting for APX anyway, I'd like to keep it that
way for now. (Similarly a couple of if()-s which could be folded are
kept separate, to reduce code churn when adding APX support.)
Add a macro pcaddi instruction to support "pcaddi rd, symbol".
pcaddi has a 20-bit signed immediate, it can address a +/- 2MB pc relative
address, and the address should be 4-byte aligned.
The AArch64 feature-flag code is currently limited to a maximum
of 64 features. This patch reworks it so that the limit can be
increased more easily. The basic idea is:
(1) Turn the ARM_FEATURE_FOO macros into an enum, with the enum
counting bit positions.
(2) Make the feature-list macros take an array index argument
(currently always 0). The macros then return the
aarch64_feature_set contents for that array index.
An N-element array would then be initialised as:
{ MACRO (0), ..., MACRO (N - 1) }
(3) Provide convenience macros for initialising an
aarch64_feature_set for:
- a single feature
- a list of individual features
- an architecture version
- an architecture version + a list of additional features
(2) and (3) use the preprocessor to generate static initialisers.
The main restriction was that uses of the same preprocessor macro
cannot be nested. So if a macro wants to do something for N individual
arguments, it needs to use a chain of N macros to do it. There then
needs to be a way of deriving N, as a preprocessor token suitable for
pasting.
The easiest way of doing that was to precede each list of features
by the number of features in the list. So an aarch64_feature_set
initialiser for three features A, B and C would be written:
AARCH64_FEATURES (3, A, B, C)
This scheme makes it difficult to keep AARCH64_FEATURE_CRYPTO as a
synonym for SHA2+AES, so the patch expands the former to the latter.
Now that CpuLM is used solely in cpu_arch_flags and cpu_arch[] while
Cpu64 is solely used in insn templates, they no longer need to be
treated different from other "ordinary" flags; the only "unusual" one
left if CpuNo64. Fold both, leaving just Cpu64.
Looking at the VEX and EVEX forms of vcvtneps2bf16 I noticed that
operand purpose isn't properly reflected in Vxy's definition. Rename
"dst" to "src", thus bringing things in line with Exy.
Recognize "/<number>" suffixes on both -march=+avx10.1 and the
corresponding .arch directive, setting an upper bound on the vector size
that insns may use. Such a restriction can be reset by setting a new base
architecture, by using a suffix-less form, by disabling AVX10, or by
enabling any other VEX/EVEX-based vector extension.
While for most insns we can suppress their use with too wide operands
via registers becoming unavailable (or in Intel syntax memory operand
size specifiers not being recognized), mask register insns have to have
their minimum required vector size specified in a new attribute. (Of
course this new attribute could also be used on other insns.)
Note that .insn continues to be permitted to emit EVEX{512,256} (and
VEX256 ones) encodings regardless of vector size restrictions in place.
Of course these can't be expressed using zmm (or ymm) operands then,
but need using the EVEX.512.* forms (broadcast forms may be usable right
now, but this may go away so shouldn't be relied upon). This is why no
assertions should be added to build_{e,}vex_prefix().
Since this is merely a re-branding of certain AVX512* features, there's
little code to be added.
The main aspect here are new testcases. In order to be able to re-use
some of the existing testcases, several of them need their start symbols
adjusted. Note that 256- and 128-bit tests want adding here, as these
need to work right away. Subsequently they'll gain vector length
constraints.
Since it was missing and is wanted here, also add an AVX512VL+VPOPCNTDQ
test.
These probably should have been put in place already anyway, but they're
very much wanted in order to then put AVX10.1 support on top. Note that
to avoid reverse dependencies towards SSE (just like we already do for
AVX and XOP), add_isa_dependencies() needs some further tweaking.
While there also address a related anomaly: Disabling AES but neither
AVX nor VAES (similarly for {,V}PCLMULQDQ) would better keep the 128-bit
VEX-encoded forms available. Note that for this the VAES insns are moved
past the AVX+AES ones, to avoid the property-11 test suddenly failing.
The test really is wrong, but let's not also make things inconsistent:
Without the movement, YMM use would be correctly recorded for the
128-bit forms simply because the first template already matches, as long
as VAES wasn't disabled. Yet it still wouldn't be if only AVX+AES were
enabled. Nor would behavior here then be the same as for VPCLMUL* insns.
gprofng uses insn_type in print_address_func().
But insn_type is always zero on aarch64.
opcodes/ChangeLog:
2023-09-07 Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
* opcodes/aarch64-dis.c (print_insn_aarch64_word): Set insn_type for
branch instructions.
While the patch already committed for pr30793 prevents the asan error,
there is a problem: Now the last element of bundle_words never gets
written. That's very likely wrong, or KVXMAXBUNDLEWORDS is too big.
So this patch rearranges things a little to support writing of all of
bundle_words and does the parallel bit checking only when filling
bundle_words. In the normal case, kvx_reassemble_bundle will see
bundle_words[word_count-1] with the parallel bit clear and all other
words having it set. In the error case where all words in
bundle_words have the parallel bit set, kvx_reassemble_bundle will be
passed a wordcount of KVXMAXBUNDLEWORDS + 1. I've also made
kvx_reassemble_bundle return true for success rather than zero, and
removed the unnecessary check for zero wordcount.
PR 30793
* kvx-dis.c (kvx_reassemble_bundle): Return bool, true on success.
Fail if wordcount is too large. Don't check for wordcount zero.
Don't check kvx_has_parallel_bit.
(print_insn_kvx): Rewrite code reading bundle_words as a for loop.
Don't stop reading at KVXMAXBUNDLEWORDS - 1.
(decode_prologue_epilogue_bundle): Similarly.
The vendor operands should be named starting with `X', and preferably the
second letter (or multiple following letters) is enough to differentiate
them from other vendors.
Therefore, added letter `t' after `X' for t-head operands, to differentiate
from future different vendor's operands.
bfd/
* elfxx-riscv.c (riscv_supported_vendor_x_ext): Removed the vendor
document link since it should already be recorded in the
gas/doc/c-riscv.texi.
gas/
* config/tc-riscv.c (validate_riscv_insn): Added `t' after `X' for
t-head operands. Minor updates for indents and comments.
(riscv_ip): Likewise.
* doc/c-riscv.texi: Minor updates.
opcodes/
* riscv-dis.c (print_insn_args): Added `t' after `X' for t-head
operands. Minor updates for indents and comments.
* riscv-opc.c (riscv_opcode): Likewise.
There's no need to have almost identical code twice. Do away with
M_VMSGEU and instead simply use an unused (for these macros) field to
tell apart both variants.
The name we use internally isn't in line with the SDM, and also isn't in
line with CpuVPCLMULQDQ. Add the missing suffix, but of course leave
alone user facing names.
Commit 916fae9135 ("Add Size64 to movq/vmovq with Reg64 operand" was
right in adding the attribute to MOVQ, but there was no need to add it
to VMOVQ. (See also the AVX512F form, which doesn't have the attribute
either.)
For disassembly to only use spec-mandated aliases, respective non-alias
entries need to come ahead of their alias ones. Since identical
mnemonics need to stay together, whole groups are moved up where
necessary.
This partly reverts 839189bc93 ("RISC-V: re-arrange opcode table for
consistent alias handling"), but then also goes beyond a plain revert.
Reviewed-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Although XVentanaCondOps instructions are XLEN-agonistic, Ventana's manual
only defines them only for RV64 (because all Ventana's processors implement
RV64).
This commit limits XVentanaCondOps instructions RV64-only to match the
behavior of the manual and LLVM.
Note that this commit alone will not make XVentanaCondOps extension with
RV32 invalid (it just makes XVentanaCondOps on RV32 empty).
opcodes/ChangeLog:
* riscv-opc.c (riscv_opcodes): Restrict "vt.maskc" and "vt.maskcn"
to XLEN=64.
gas/ChangeLog:
* testsuite/gas/riscv/x-ventana-condops-32.d: New failure test.
* testsuite/gas/riscv/x-ventana-condops-32.l: Likewise.
This patch sets GUILE to just plain 'guile'.
In the distant ("devo") past, the top-level build did support building
Guile in-tree. However, I don't think this really works any more.
For one thing, there are no build dependencies on it, so there's no
guarantee it would actually be built before the uses.
This patch also removes the use of "-s" as an option to cgen scheme
scripts. With my latest patch upstream, this is no longer needed.
After the upstream changes, either Guile 2 or Guile 3 will work, with
or without the compiler enabled.
2023-08-24 Tom Tromey <tom@tromey.com>
* cgen.sh: Don't pass "-s" to cgen.
* Makefile.in: Rebuild.
* Makefile.am (GUILE): Simplify.
i386: warning: format ‘%u’ expects argument of type ‘unsigned int’,
but argument 4 has type ‘size_t’ {aka ‘long unsigned int’} [-Wformat=]
ia64: warning: ignoring return value of ‘fgets’
* i386-gen.c (process_i386_opcodes): Correct format string.
* ia64-gen.c (load_insn_classes, load_depfile): Don't ignore
fgets return value.
opcodes/
* kvx-dis.c (print_insn_kvx): Change the loop condition so that
wordcount is always less than KVXMAXBUNDLEWORDS.
(decode_prologue_epilogue_bundle): Likewise.
Historically, flags and variables relating to architectural revisions
for the A-profile architecture omitted the trailing `A' such that, for
example, assembling for `-march=armv8.4-a' set the `AARCH64_ARCH_V8_4'
flag in the assembler.
This leads to some ambiguity, since Binutils also targets the
R-profile Arm architecture. Therefore, it seems prudent to have
everything associated with the A-profile cores end in `A' and likewise
`R' for the R-profile. Referring back to the example above, the flag
set for `-march=armv8.4-a' is better characterized if labeled
`AARCH64_ARCH_V8_4A'.
The only exception to the rule of appending `A' to variables is found
in the handling of the `AARCH64_FEATURE_V8' macro, as it is the
baseline from which ALL processors derive and should therefore be left
unchanged.
In reflecting the `ARM' architectural nomenclature choices, where we
have `ARM_ARCH_V8A' and `ARM_ARCH_V8R', the choice is made to not have
an underscore separating the numerical revision number and the
A/R-profile indicator suffix. This has meant that renaming of
R-profile related flags and variables was warranted, thus going from
`.*_[vV]8_[rR]' to `.*_[vV]8[rR]'.
Finally, this is more in line with conventions within GCC and adds consistency
across the toolchain.
gas/ChangeLog:
* gas/config/tc-aarch64.c:
(aarch64_cpus): Reference to arch feature macros updated.
(aarch64_archs): Likewise.
include/ChangeLog:
* include/opcode/aarch64.h:
(AARCH64_FEATURE_V8A): Updated name: V8_A -> V8A.
(AARCH64_FEATURE_V8_1A): A-suffix added.
(AARCH64_FEATURE_V8_2A): Likewise.
(AARCH64_FEATURE_V8_3A): Likewise.
(AARCH64_FEATURE_V8_4A): Likewise.
(AARCH64_FEATURE_V8_5A): Likewise.
(AARCH64_FEATURE_V8_6A): Likewise.
(AARCH64_FEATURE_V8_7A): Likewise.
(AARCH64_FEATURE_V8_8A):Likewise.
(AARCH64_FEATURE_V9A): Likewise.
(AARCH64_FEATURE_V8R): Updated name: V8_R -> V8R.
(AARCH64_ARCH_V8A_FEATURES): Updated name: V8_A -> V8A.
(AARCH64_ARCH_V8_1A_FEATURES): A-suffix added.
(AARCH64_ARCH_V8_2A_FEATURES): Likewise.
(AARCH64_ARCH_V8_3A_FEATURES): Likewise.
(AARCH64_ARCH_V8_4A_FEATURES): Likewise.
(AARCH64_ARCH_V8_5A_FEATURES): Likewise.
(AARCH64_ARCH_V8_6A_FEATURES): Likewise.
(AARCH64_ARCH_V8_7A_FEATURES): Likewise.
(AARCH64_ARCH_V8_8A_FEATURES): Likewise.
(AARCH64_ARCH_V9A_FEATURES): Likewise.
(AARCH64_ARCH_V9_1A_FEATURES): Likewise.
(AARCH64_ARCH_V9_2A_FEATURES): Likewise.
(AARCH64_ARCH_V9_3A_FEATURES): Likewise.
(AARCH64_ARCH_V8A): Updated name: V8_A -> V8A.
(AARCH64_ARCH_V8_1A): A-suffix added.
(AARCH64_ARCH_V8_2A): Likewise.
(AARCH64_ARCH_V8_3A): Likewise.
(AARCH64_ARCH_V8_4A): Likewise.
(AARCH64_ARCH_V8_5A): Likewise.
(AARCH64_ARCH_V8_6A): Likewise.
(AARCH64_ARCH_V8_7A): Likewise.
(AARCH64_ARCH_V8_8A): Likewise.
(AARCH64_ARCH_V9A): Likewise.
(AARCH64_ARCH_V9_1A): Likewise.
(AARCH64_ARCH_V9_2A): Likewise.
(AARCH64_ARCH_V9_3A): Likewise.
(AARCH64_ARCH_V8_R): Updated name: V8_R -> V8R.
opcodes/ChangeLog:
* opcodes/aarch64-opc.c (SR_V8A): Updated name: V8_A -> V8A.
(SR_V8_1A): A-suffix added.
(SR_V8_2A): Likewise.
(SR_V8_3A): Likewise.
(SR_V8_4A): Likewise.
(SR_V8_6A): Likewise.
(SR_V8_7A): Likewise.
(SR_V8_8A): Likewise.
(aarch64_sys_regs): Reference to arch feature macros updated.
(aarch64_pstatefields): Reference to arch feature macros updated.
(aarch64_sys_ins_reg_supported_p): Reference to arch feature macros
updated.
* opcodes/aarch64-tbl.h:
(aarch64_feature_v8_2a): a-suffix added.
(aarch64_feature_v8_3a): Likewise.
(aarch64_feature_fp_v8_3a): Likewise.
(aarch64_feature_v8_4a): Likewise.
(aarch64_feature_fp_16_v8_2a): Likewise.
(aarch64_feature_v8_5a): Likewise.
(aarch64_feature_v8_6a): Likewise.
(aarch64_feature_v8_7a): Likewise.
(aarch64_feature_v8r): Updated name: v8_r-> v8r.
(ARMV8R): Updated name: V8_R-> V8R.
(ARMV8_2A): A-suffix added.
(ARMV8_3A): Likewise.
(FP_V8_3A): Likewise.
(ARMV8_4A): Likewise.
(FP_F16_V8_2A): Likewise.
(ARMV8_5): Likewise.
(ARMV8_6A): Likewise.
(ARMV8_6A_SVE): Likewise.
(ARMV8_7A): Likewise.
(V8_2A_INSN): `A' added to macro symbol.
(V8_3A_INSN): Likewise.
(V8_4A_INSN): Likewise.
(FP16_V8_2A_INSN): Likewise.
(V8_5A_INSN): Likewise.
(V8_6A_INSN): Likewise.
(V8_7A_INSN): Likewise.
(V8R_INSN): Updated name: V8_R-> V8R.
kvx_dis_init currently always returns true, but error conditions do so
by "return -1" which converts to true. The return status is ignored
anyway, and it doesn't make much sense to error on unexpected arch or
mach: If print_insn_kvx is called then the atch is known to be kvx,
and it's better to choose some default for a user passing an unknown
mach value rather than segfaulting in decode_insn when env.opc_table
is NULL.
I've chosen the default mach to be bfd_mach_kv3_1, the default in
bfd/cpu-kvx.c, not that it matters very much. In normal objdump/gdb
usage, info->mach won't be an unexpected value.
* kvx-dis.c (kvx_dis_init): Return void. Don't error on
unexpected arch or mach. Default to bfd_mach_kv3_1 for
unknown mach. Don't clear info->disassembler_options.
The neg/neg32 BPF instructions always use BPF_SRC_K (=0) in their header
source bit, despite operating on registers. If BPF_SRC_X (=1) is set,
the instructions are rejected by the kernel.
Because of this there are also no neg/neg32 instructions which operate
on immediates, so remove them.
bd434cc4d9 was a similar fix in the old
CGEN-based port, but was not carried forward in the new port.
include/
* opcode/bpf.h (enum bpf_insn_id): Remove spurious entries
BPF_INSN_NEGI and BPF_INSN_NEG32I.
opcodes/
* bpf-opc.c (bpf_opcodes): Remove erroneous NEGI and NEG32I
instructions.
gas/
* doc/c-bpf.texi (BPF Instructions): Remove erroneous neg and
neg32 instructions operating on immediates.
* testsuite/gas/bpf/alu.s: Adapt accordingly.
* testsuite/gas/bpf/alu.d: Likewise.
* testsuite/gas/bpf/alu-be.d: Likewise
* testsuite/gas/bpf/alu32.s: Likewise.
* testsuite/gas/bpf/alu32.d: Likewise.
* testsuite/gas/bpf/alu32-be.d: Likewise.
* testsuite/gas/bpf/alu-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu-be-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-be-pseudoc.d: Likewise.
I had reason yesterday to want to regenerate configury files which I
do with --enable-maintainer-mode, and added --enable-cgen-maint
accidentally. The first problem I hit is that sim looks for cgen in a
different directory by default than opcodes, and I had my source
layout set up for opcodes rather than sim. Fix that by making both
use ../cgen first, then ../../cgen relative to sim/ and opcodes/. The
next problem was that various sim local.mk files expected generated
sources in the build dir rather than the source dir. Fix that by
adding $(srcdir) to paths. Finally, the generated iq2000 files had a
compile error, fixed by the cpu/iq2000.cpu patch.
cpu/
* iq2000.cpu (syscall): Add pc arg.
opcodes/
* configure.ac (cgendir): Default to ../../cgen, but use ../cgen
if found there.
* configure: Regenerate.
sim/m4/
* sim_ac_option_cgen_maint.m4 (cgendir): Look in ../cgen too.
sim/
* cris/local.mk: Add $(srcdir) to paths for regenerated source.
* frv/local.mk: Likewise.
* iq2000/local.mk: Likewise.
* lm32/local.mk: Likewise.
* m32r/local.mk: Likewise.
* or1k/local.mk: Likewise.
* Makefile.in: Regenerate.
* configure: Regenerate.
The documentation of the 'Zfa' extension states that "fli.h" is available
"if the Zfh or Zvfh extension is implemented" (both the latest and the
oldest editions are checked).
This fact was not reflected in Binutils ('Zvfh' implies 'Zfhmin', not full
'Zfh' extension and "fli.h" required 'Zfh' and 'Zfa' extensions).
This commit makes "fli.h" also available when both 'Zfa' and 'Zvfh'
extensions are implemented.
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Add new
instruction class handling.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* testsuite/gas/riscv/zfa-zvfh.s: New test.
* testsuite/gas/riscv/zfa-zvfh.d: Ditto.
include/ChangeLog:
* opcode/riscv.h (enum riscv_insn_class): Add new instruction
class.
opcodes/ChangeLog:
* riscv-opc.c (riscv_opcodes): Change instruction class of "fli.h"
from INSN_CLASS_ZFH_AND_ZFA to new INSN_CLASS_ZFH_OR_ZVFH_AND_ZFA.
This commit adds 'Zihintntl' extension and its hint instructions.
This is based on:
<0dc91f505e>,
the first ISA Manual noting that the 'Zihintntl' extension is ratified.
Note that compressed 'Zihintntl' hints require either 'C' or
'Zca' extension.
Co-authored-by: Nelson Chu <nelson@rivosinc.com>
bfd/ChangeLog:
* elfxx-riscv.c (riscv_supported_std_z_ext): Add 'Zihintntl'
standard hint 'Z' extension.
(riscv_multi_subset_supports): Support new instruction classes.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* testsuite/gas/riscv/zihintntl.s: New test for 'Zihintntl'
including auto-compression without C prefix and explicit C prefix.
* testsuite/gas/riscv/zihintntl.d: Likewise.
* testsuite/gas/riscv/zihintntl-na.d: Likewise.
* testsuite/gas/riscv/zihintntl-base.s: New test for correspondence
between 'Zihintntl' and base 'I' or 'C' instructions.
* testsuite/gas/riscv/zihintntl-base.d: Likewise.
include/ChangeLog:
* opcode/riscv.h (enum riscv_insn_class): Add new instruction
classes: INSN_CLASS_ZIHINTNTL and INSN_CLASS_ZIHINTNTL_AND_C.
(MASK_NTL_P1, MATCH_NTL_P1, MASK_NTL_PALL,
MATCH_NTL_PALL, MASK_NTL_S1, MATCH_NTL_S1, MASK_NTL_ALL,
MATCH_NTL_ALL, MASK_C_NTL_P1, MATCH_C_NTL_P1, MASK_C_NTL_PALL,
MATCH_C_NTL_PALL, MASK_C_NTL_S1, MATCH_C_NTL_S1, MASK_C_NTL_ALL,
MATCH_C_NTL_ALL): New.
opcodes/ChangeLog:
* riscv-opc.c (riscv_opcodes): Add instructions from the
'Zihintntl' extension.
The longest register name is 4 characters (plus a nul one), so using a
4- or 8-byte pointer to get at it is neither space nor time efficient.
Embed the names right into the array. For PIE this also reduces the
number of base relocations in the final image.
To avoid old gcc, when generating 32-bit code, bogusly warning about
bounds being exceeded in the code processing Cs/Cw, Ct/Cx, and CD,
an adjustment to EXTRACT_BITS() is needed: This macro shouldn't supply
a 64-bit value, and it also doesn't need to - all operand fields to
date are far more narrow than 32 bits. This in turn allows dropping a
number of casts elsewhere.
This regenerates config files changed by the previous 44 commits.
Note that subject lines in these commits mostly match the gcc git
originating commit.
The table constantly growing in two dimensions (number of table entries
times number of ISA extension flags) doesn't scale very well. Use a more
compact representation: Only identifiers which need to combine with
other identifiers retain individual flag bits. All others are combined
into an enum, with a new helper added to transform the table entries
into the original i386_cpu_flags layout. This way the table in the final
binary shrinks by almost a third (the generated source code shrinks by
about half), and isn't likely to grow again in that dimension any time
soon.
While moving the 3DNow! fields, drop the stray inner 'a' from their
names.
Their check_func should be "match_never", not "match_opcode". The reasons
this error did not cause any disassembler errors are:
1. The problem will not reproduce if "no-aliases" is specified
(because macro instructions are handled as aliases).
2. If not, all affected compressed instructions or their aliases
precede before "vmsge{,u}.vx" macro instructions.
However, it'll easily break if we reorder opcode entries. This commit
fixes this issue before the *accident* occurs.
opcodes/ChangeLog:
* riscv-opc.c (riscv_opcodes): Make sure that we never match to
vmsge{,u}.vx instructions unless specified in the assembler.
The 32-bit non-fetching atomic instructions treat the source register as
32-bits, which means in the pseudo-c syntax the "w" registers should be
used rather than the "r" registers.
opcodes/
* bpf-opc-c (bpf_opcodes): Use %sw for AAD32, AOR32, AAND32
and AXOR32 pseudo-c dialect asm templates.
gas/
* testsuite/gas/bpf/atomic-be-pseudoc.d: Use "w" for source reg
in non-fetching 32-bit atomic instructions.
* testsuite/gas/bpf/atomic-pseudoc.d: Likewise.
* testsuite/gas/bpf/atomic-pseudoc.s: Likewise.
It makes little sense to have this comment meanwhile over a hundred
lines ahead of the array. In fact until spotting the comment, I was
wondering why those pretty important aspects aren't spelled out
anywhere.
Since I was poking at cris-dis.c to avoid the sanitizer warning,
I figure I might as well make use of stpcpy and sprintf return value
in other places in this file.
* cris-dis.c (format_hex): Use sprintf return value.
(format_reg): Use stpcpy and sprintf return, avoiding strlen.
(format_sup_reg): Likewise.
Simplify the sprintf calls, and use sprintf return value. Older code
in binutils avoided using the sprintf return count of chars printed,
because with some older C libraries it wasn't reliable. Nowadays it
should be OK to use (and we already use the return value elsewhere).
sprintf can't return an error status of -1 here.
* cris-dis.c (format_dec): Avoid sanitizer warning. Use sprintf
return value rather than calling strlen.
This patch fixes a regression recently introduced in the BPF
disassembler, that was assuming an abfd was always available in
info->section->owner. Apparently this is not so in GDB, and therefore
https://sourceware.org/bugzilla/show_bug.cgi?id=30705.
Tested in bpf-unkonwn-none.
opcodes/ChangeLog:
2023-07-31 Jose E. Marchesi <jose.marchesi@oracle.com>
PR 30705
* bpf-dis.c (print_insn_bpf): Check that info->section->owner is
actually available before using it.
This patch adds support for EF_BPF_CPUVER bits in the ELF
machine-dependent header flags. These bits encode the BPF CPU
version for which the object file has been compiled for.
The BPF assembler is updated so it annotates the object files it
generates with these bits.
The BPF disassembler is updated so it honors EF_BPF_CPUVER to use the
appropriate ISA version if the user didn't specify an explicit ISA
version in the command line. Note that a value of zero in
EF_BPF_CPUVER is interpreted by the disassembler as "use the later
supported version" (the BPF CPU versions start with v1.)
The readelf utility is updated to pretty print EF_BPF_CPUVER when it
prints out the ELF header:
$ readelf -h a.out
ELF Header:
...
Flags: 0x4, CPU Version: 4
Tested in bpf-unknown-none.
include/ChangeLog:
2023-07-30 Jose E. Marchesi <jose.marchesi@oracle.com>
* elf/bpf.h (EF_BPF_CPUVER): Define.
* opcode/bpf.h (BPF_XBPF): Change from 0xf to 0xff so it fits in
EF_BPF_CPUVER.
binutils/ChangeLog:
2023-07-30 Jose E. Marchesi <jose.marchesi@oracle.com>
* readelf.c (get_machine_flags): Recognize and pretty print BPF
machine flags.
opcodes/ChangeLog:
2023-07-30 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-dis.c: Initialize asm_bpf_version to -1.
(print_insn_bpf): Set BPF ISA version from the cpu version ELF
header flags if no explicit version set in the command line.
* disassemble.c (disassemble_init_for_target): Remove unused code.
gas/ChangeLog:
2023-07-30 Jose E. Marchesi <jose.marchesi@oracle.com>
* config/tc-bpf.h (elf_tc_final_processing): Define.
* config/tc-bpf.c (bpf_elf_final_processing): New function.
This patch fixes the BPF_INSN_NEGR and BPF_INSN_NEG32R BPF
instructions to not use their source registers.
Tested in bpf-unknown-none.
opcodes/ChangeLog:
2023-07-26 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-opc.c (bpf_opcodes): Fix BPF_INSN_NEGR to not use a src
register.
gas/ChangeLog:
2023-07-26 Jose E. Marchesi <jose.marchesi@oracle.com>
* testsuite/gas/bpf/alu.s: The register neg instruction gets only
one argument.
* testsuite/gas/bpf/alu32-be-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu-be-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu-be.d: Likewise.
* testsuite/gas/bpf/alu.d: Likewise.
* testsuite/gas/bpf/alu32-be.d: Likewise.
* testsuite/gas/bpf/alu32.d: Likewise.
* testsuite/gas/bpf/alu32.s: Likewise.
* doc/c-bpf.texi (BPF Instructions): Update accordingly.
This patch adds support for the BPF V4 ISA byte swap instructions to
opcodes, assembler and disassembler.
Tested in bpf-unknown-none.
include/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* opcode/bpf.h (BPF_IMM32_BSWAP16): Define.
(BPF_IMM32_BSWAP32): Likewise.
(BPF_IMM32_BSWAP64): Likewise.
(enum bpf_insn_id): New entries BPF_INSN_BSWAP{16,32,64}.
opcodes/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-opc.c (bpf_opcodes): Add entries for the BSWAP*
instructions.
gas/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* doc/c-bpf.texi (BPF Instructions): Document BSWAP* instructions.
* testsuite/gas/bpf/alu.s: Test BSWAP{16,32,64} instructions.
* testsuite/gas/bpf/alu.d: Likewise.
* testsuite/gas/bpf/alu-be.d: Likewise.
* testsuite/gas/bpf/alu-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu-be-pseudoc.d: Likewise.
This patch fixes the pseudoc syntax of the V4 instructions MOVS* and
LDXS* in order to reflect https://reviews.llvm.org/D144829.
opcodes/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-opc.c (bpf_opcodes): Fix pseudo-c syntax for MOVS* and LDXS*
instructions.
gas/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* doc/c-bpf.texi (BPF Instructions): Fix pseudoc syntax for MOVS*
and LDXS* instructions.
* testsuite/gas/bpf/mem-pseudoc.d: Likewise.
* testsuite/gas/bpf/mem-be-pseudoc.d: Likewise.
* testsuite/gas/bpf/mem-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu-be-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-be-pseudoc.d: Likewise.
This patch adds support for the V4 BPF instruction jal/gotol, which is
like ja/goto but it supports a signed 32-bit PC-relative (in number of
64-bit words minus one) target operand instead of the 16-bit signed
operand of the other instruction. This greatly increases the jump
range in BPF programs.
Tested in bpf-unkown-none.
bfd/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* reloc.c: New reloc BFD_RELOC_BPF_DISPCALL32.
* elf64-bpf.c (bpf_reloc_type_lookup): Handle the new reloc.
* libbfd.h (bfd_reloc_code_real_names): Regenerate.
gas/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* config/tc-bpf.c (struct bpf_insn): New field `id'.
(md_assemble): Save the ids of successfully parsed instructions
and use the new BFD_RELOC_BPF_DISPCALL32 whenever appropriate.
(md_apply_fix): Adapt to the new BFD reloc.
* testsuite/gas/bpf/jump.s: Test JAL.
* testsuite/gas/bpf/jump.d: Likewise.
* testsuite/gas/bpf/jump-pseudoc.d: Likewise.
* testsuite/gas/bpf/jump-be.d: Likewise.
* testsuite/gas/bpf/jump-be-pseudoc.d: Likewise.
* doc/c-bpf.texi (BPF Instructions): Document new instruction
jal/gotol.
Document new operand type disp32.
include/ChangeLog:
2023-07-24 Jose E. Marchesi <jose.marchesi@oracle.com>
* opcode/bpf.h (enum bpf_insn_id): Add entry BPF_INSN_JAL.
(enum bpf_insn_id): Remove spurious entry BPF_INSN_CALLI.
opcodes/ChangeLog:
2023-07-23 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-opc.c (bpf_opcodes): Add entry for jal.
This tiny patch makes the BPF disassembler to emit, e.g.
ldxdw %r1, [%r0+0]
instead of
ldxdw %r1, [%r00]
when the offset is 0, to avoid confusion.
opcodes/
* bpf-dis.c (print_insn_bpf): Print offsets with value 0 as "+0".
gas/
* testsuite/gas/bpf/mem.s: Add tests with offset 0.
* testsuite/gas/bpf/mem-pseudoc.s: Likewise.
* testsuite/gas/bpf/mem.d: Update accordingly.
* testsuite/gas/bpf/mem-be.d: Likewise.
* testsuite/gas/bpf/mem-pseudoc.d: Likewise.
* testsuite/gas/bpf/mem-be-pseudoc.d: Likewise.
This commit adds the signed load to register (ldxs*) instructions
introduced in the BPF ISA version 4, including opcodes and assembler
tests.
Tested in bpf-unknown-none.
include/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* opcode/bpf.h (enum bpf_insn_id): Add entries for signed load
instructions.
(BPF_MODE_SMEM): Define.
opcodes/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-opc.c (bpf_opcodes): Add entries for LDXS{B,W,H,DW}
instructions.
gas/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* testsuite/gas/bpf/mem.s: Add signed load instructions.
* testsuite/gas/bpf/mem-pseudoc.s: Likewise.
* testsuite/gas/bpf/mem.d: Likewise.
* testsuite/gas/bpf/mem-pseudoc.d: Likewise.
* testsuite/gas/bpf/mem-be.d: Likewise.
* doc/c-bpf.texi (BPF Instructions): Document the signed load
instructions.
This commit adds the signed register move (movs) instructions
introduced in the BPF ISA version 4, including opcodes and assembler
tests.
Tested in bpf-unknown-none.
include/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* opcode/bpf.h (BPF_OFFSET16_MOVS8): Define.
(BPF_OFFSET16_MOVS16): Likewise.
(BPF_OFFSET16_MOVS32): Likewise.
(enum bpf_insn_id): Add entries for MOVS{8,16,32}R and
MOVS32{8,16,32}R.
opcodes/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* bpf-opc.c (bpf_opcodes): Add entries for MOVS{8,16,32}R and
MOVS32{8,16,32}R instructions. and MOVS32I instructions.
gas/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* testsuite/gas/bpf/alu.s: Test movs instructions.
* testsuite/gas/bpf/alu-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu32.s: Likewise for movs32 instruction.
* testsuite/gas/bpf/alu32-pseudoc.s: Likewise.
* testsuite/gas/bpf/alu.d: Add expected results.
* testsuite/gas/bpf/alu32.d: Likewise.
* testsuite/gas/bpf/alu-be.d: Likewise.
* testsuite/gas/bpf/alu32-be.d: Likewise.
* testsuite/gas/bpf/alu-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu-be-pseudoc.d: Likewise.
* testsuite/gas/bpf/alu32-be-pseudoc.d: Likewise.
This was breaking --enable-targets=all builds.
opcodes/ChangeLog:
2023-07-21 Jose E. Marchesi <jose.marchesi@oracle.com>
* Makefile.am (TARGET64_LIBOPCODES_CFILES): Add missing bpf-dis.c
* Makefile.in: Regenerate.
CGEN is cool, but the BPF architecture is simply too bizarre for it.
The weird way of BPF to handle endianness in instruction encoding, the
weird C-like alternative assembly syntax, the weird abuse of
multi-byte (or infra-byte) instruction fields as opcodes, the unusual
presence of opcodes beyond the first 32-bits of some instructions, are
all examples of what makes it a PITA to continue using CGEN for this
port. The bpf.cpu file is becoming so complex and so nested with
p-macros that it is very difficult to read, and quite challenging to
update. Also, every time we are forced to change something in CGEN to
accommodate BPF requirements (which is often) we have to do extensive
testing to make sure we do not break any other target using CGEN.
This is getting un-maintenable.
So I have decided to bite the bullet and revamp/rewrite the port so it
no longer uses CGEN. Overall, this involved:
* To remove the cpu/bpf.{cpu,opc} descriptions.
* To remove the CGEN generated files.
* To replace the CGEN generated opcodes table with a new hand-written
opcodes table for BPF.
* To replace the CGEN generated disassembler wih a new disassembler
that uses the new opcodes.
* To replace the CGEN generated assembler with a new assembler that uses the
new opcodes.
* To replace the CGEN generated simulator with a new simulator that uses the
new opcodes. [This is pushed in GDB in another patch.]
* To adapt the build systems to the new situation.
Additionally, this patch introduces some extensions and improvements:
* A new BPF relocation BPF_RELOC_BPF_DISP16 plus corresponding ELF
relocation R_BPF_GNU_64_16 are added to the BPF BFD port. These
relocations are used for section-relative 16-bit offsets used in
load/store instructions.
* The disassembler now has support for the "pseudo-c" assembly syntax of
BPF. What dialect to use when disassembling is controlled by a command
line option.
* The disassembler now has support for dumping instruction immediates in
either octal, hexadecimal or decimal. The used output base is controlled
by a new command-line option.
* The GAS BPF test suite has been re-structured and expanded in order to
test the disassembler pseudoc syntax support. Minor bugs have been also
fixed there. The assembler generic tests that were disabled for bpf-*-*
targets due to the previous implementation of pseudoc syntax are now
re-enabled. Additional tests have been added to test the new features of
the assembler. .dump files are no longer used.
* The linker BPF test suite has been adapted to the command line options
used by the new disassembler.
The result is very satisfactory. This patchs adds 3448 lines of code
and removes 10542 lines of code.
Tested in:
* Target bpf-unknown-none with 64-bit little-endian host and 32-bit
little-endian host.
* Target x86-64-linux-gnu with --enable-targets=all
Note that I have not tested in a big-endian host yet. I will do so
once this lands upstream so I can use the GCC compiler farm.
I have not included ChangeLog entries in this patch: these would be
massive and not very useful, considering this is pretty much a rewrite
of the port. I beg the indulgence of the global maintainers.
Bring disassembly back in line with what the assembler accepts, thus
also making it self-consistent (with, in particular selector load/store
insns). While there further add D to all affected insns except ARPL
(where S is used, matching LAR/LSL), to also behave correctly in suffix-
always mode.
While there also hook up the Intel variant of the LKGS test.
For whatever reason in c9f5b96bda ("x86: correct handling of LAR and
LSL") I didn't realize that we can easily use Sv instead of going
through mod_table[]. Redo this aspect of that change.
This patch support Zcb extension, contains new compressed instructions,
some instructions depend on other existed extension, like 'zba', 'zbb'
and 'zmmul'. Zcb also imply Zca extension to enable the compressing
features.
Co-Authored by: Charlie Keaney <charlie.keaney@embecosm.com>
Co-Authored by: Mary Bennett <mary.bennett@embecosm.com>
Co-Authored by: Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Co-Authored by: Sinan Lin <sinan.lin@linux.alibaba.com>
Co-Authored by: Simon Cook <simon.cook@embecosm.com>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yulong Shi <yulong@iscas.ac.cn>
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): New extension.
(riscv_multi_subset_supports_ext): Ditto.
gas/ChangeLog:
* config/tc-riscv.c (validate_riscv_insn): New operators.
(riscv_ip): Ditto.
* testsuite/gas/riscv/zcb.d: New test.
* testsuite/gas/riscv/zcb.s: New test.
include/ChangeLog:
* opcode/riscv-opc.h (MATCH_C_LBU): New opcode.
(MASK_C_LBU): New mask.
(MATCH_C_LHU): New opcode.
(MASK_C_LHU): New mask.
(MATCH_C_LH): New opcode.
(MASK_C_LH): New mask.
(MATCH_C_SB): New opcode.
(MASK_C_SB): New mask.
(MATCH_C_SH): New opcode.
(MASK_C_SH): New mask.
(MATCH_C_ZEXT_B): New opcode.
(MASK_C_ZEXT_B): New mask.
(MATCH_C_SEXT_B): New opcode.
(MASK_C_SEXT_B): New mask.
(MATCH_C_ZEXT_H): New opcode.
(MASK_C_ZEXT_H): New mask.
(MATCH_C_SEXT_H): New opcode.
(MASK_C_SEXT_H): New mask.
(MATCH_C_ZEXT_W): New opcode.
(MASK_C_ZEXT_W): New mask.
(MATCH_C_NOT): New opcode.
(MASK_C_NOT): New mask.
(MATCH_C_MUL): New opcode.
(MASK_C_MUL): New mask.
(DECLARE_INSN): New opcode.
* opcode/riscv.h (EXTRACT_ZCB_BYTE_UIMM): New inline func.
(EXTRACT_ZCB_HALFWORD_UIMM): Ditto.
(ENCODE_ZCB_BYTE_UIMM): Ditto.
(ENCODE_ZCB_HALFWORD_UIMM): Ditto.
(VALID_ZCB_BYTE_UIMM): Ditto.
(VALID_ZCB_HALFWORD_UIMM): Ditto.
(enum riscv_insn_class): New extension class.
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): New operators.
* riscv-opc.c: New instructions.
First of all it is entirely unclear why THREE_BYTE_TABLE_PREFIX() was
introduced by bf890a93a7. Nothing uses the .prefix_requirement values
from the two relevant entries.
And then having VEX_Cn_TABLE() and friends take arguments is misleading.
These aren't used (or pointlessly used in the case of VEX_C5_TABLE); the
respective table index is decoded from the insn (or implied in the case
of VEX_C5_TABLE).
Several already use OP_R(), which rejects the memory forms of insns, and
a few others can easily be converted to do so as well. Note that for it
to be able to use BadOp() without forward declaration, OP_Skip_MODRM() is
moved down.
While there add the previously missing PREFIX_OPCODE to legacy opcode
0FD7.
Now that we have OP_R(), use it here as well, while wiring memory-only
operands to OP_M() at the same time. To keep the number of consumed
opcode bytes similar to before, make BadOp() also account for VEX/XOP/
EVEX prefix bytes. To keep that change simple, convert need_vex to an
actual count of prefix bytes (keeping intact all prior boolean uses of
the field).
Note how this improves disassembly of such bad encodings, by at least
leaving a hint towards what a "nearby" instruction is. (For KSHIFT*
change the immediates test testcases use, such that disassembly remains
sufficiently in sync.)
While there also use Ux for VPMOV{B,W,D,Q}2M, where decoding through
mod_table[] was missing in the earlier scheme.
Fold OP_MS() and OP_XS() into OP_R(), paralleling OP_M(). Use operand
names (largely) matching those in the SDM. For 128-bit-only forms use
Uxmm though, marking 256-bit forms as bad. This then allows no longer
going through vex_len_table[] for two of the insns.
Specifically _do not_ continue to mis-use v_mode.
Several already use OP_M(), which rejects the register forms of insns,
and a few others can easily be converted to do so as well. (Note that
FXSAVE_Fixup() wires through to OP_M(). Note further that OP_IndirE(),
which wasn't placed very well anyway, is moved down to avoid the need to
forward-declare BadOp().)
Also adjust formatting of and drop PREFIX_OPCODE from a few adjacent
entries.
Most of them use Mx already for the memory operand, which rejects the
register form of the insn. Use that operand type also for the two EVEX
forms which so far have used EXEvexXNoBcst (and thus failed to reject
the register forms), compensating by flagging broadcast as bad for all
Mx. This way several other insns which don't permit embedded broadcast
either are also covered at the same time.
By changing decode order to do ModR/M.mod last (rather than VEX.L), the
VEX entries (which are already reused by EVEX decoding) can be folded
with their legacy counterparts as well. Note how this change of decode
order also allows removing two auxiliary #define-s, which were
introduced during earlier folding (because of that unhelpful order of
steps).
Introduce macro V to expand to 'v' in the VEX/EVEX case, and replace a
couple of abort()s where legacy code can now legitimately make it. While
there for {,V}LDDQU drop hoing through mod_table[] - OP_M() rejects
register operands quite fine.
Masking is not permitted for certain further insns, not falling in any
of the earlier categories. Introduce the Y macro (not expanding to any
output) to flag such cases.
Note that in a few cases entries already covered otherwise are converted
as well, to continue to allow sharing of the string literals.
Masking is not permitted in this case. See the code comment for how this
is being dealt with.
To avoid excess special casing of modes, have OP_M() call OP_E_memory()
directly.
While only zeroing-masking is possible in this case, this still requires
EVEX.z to be clear. Introduce a "global" flag right here, to be re-used
by checks which need to live in specific operand handlers.
Rather than corrupting disassmbly altogether, flag EVEX.z set as bad
when masking isn't in effect in the first place at the time the
destination operand is actually processed.
opcodes/ChangeLog:
* loongarch-opc.c: Mark the offset operands as "so" for
{,x}v{ld,st}, {,x}v{ldrepl,stelm}.[bhwd], and {ld,st}[lr].[wd].
Signed-off-by: WANG Xuerui <git@xen0n.name>
Zvksh is part of the vector crypto extensions.
This extension adds the following instructions:
- vsm3me.vv
- vsm3c.vi
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Add instruction
class support for Zvksh.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* testsuite/gas/riscv/zvksh.d: New test.
* testsuite/gas/riscv/zvksh.s: New test.
include/ChangeLog:
* opcode/riscv-opc.h (MATCH_VSM3C_VI): New.
(MASK_VSM3C_VI): New.
(MATCH_VSM3ME_VV): New.
(MASK_VSM3ME_VV): New.
(DECLARE_INSN): New.
* opcode/riscv.h (enum riscv_insn_class): Add instruction class
support for Zvksh.
opcodes/ChangeLog:
* riscv-opc.c: Add Zvksh instructions.
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Zvksed is part of the vector crypto extensions.
This extension adds the following instructions:
- vsm4k.vi
- vsm4r.[vv,vs]
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Add instruction
class support for Zvksed.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* testsuite/gas/riscv/zvksed.d: New test.
* testsuite/gas/riscv/zvksed.s: New test.
include/ChangeLog:
* opcode/riscv-opc.h (MATCH_VSM4K_VI): New.
(MASK_VSM4K_VI): New.
(MATCH_VSM4R_VS): New.
(MASK_VSM4R_VS): New.
(MATCH_VSM4R_VV): New.
(MASK_VSM4R_VV): New.
(DECLARE_INSN): New.
* opcode/riscv.h (enum riscv_insn_class): Add instruction class
support for Zvksed.
opcodes/ChangeLog:
* riscv-opc.c: Add Zvksed instructions.
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Zvknh[a,b] are parts of the vector crypto extensions.
This extension adds the following instructions:
- vsha2ms.vv
- vsha2c[hl].vv
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Add instruction
class support for Zvknh[a,b].
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* testsuite/gas/riscv/zvknha.d: New test.
* testsuite/gas/riscv/zvknha_zvknhb.s: New test.
* testsuite/gas/riscv/zvknhb.d: New test.
include/ChangeLog:
* opcode/riscv-opc.h (MATCH_VSHA2CH_VV): New.
(MASK_VSHA2CH_VV): New.
(MATCH_VSHA2CL_VV): New.
(MASK_VSHA2CL_VV): New.
(MATCH_VSHA2MS_VV): New.
(MASK_VSHA2MS_VV): New.
(DECLARE_INSN): New.
* opcode/riscv.h (enum riscv_insn_class): Add instruction class
support for Zvknh[a,b].
opcodes/ChangeLog:
* riscv-opc.c: Add Zvknh[a,b] instructions.
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Zvkg is part of the vector crypto extensions.
This extension adds the following instructions:
- vghsh.vv
- vgmul.vv
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Add instruction
class support for Zvkg.
(riscv_multi_subset_supports_ext): Likewise.
gas/ChangeLog:
* testsuite/gas/riscv/zvkg.d: New test.
* testsuite/gas/riscv/zvkg.s: New test.
include/ChangeLog:
* opcode/riscv-opc.h (MATCH_VGHSH_VV): New.
(MASK_VGHSH_VV): New.
(MATCH_VGMUL_VV): New.
(MASK_VGMUL_VV): New.
(DECLARE_INSN): New.
* opcode/riscv.h (enum riscv_insn_class): Add instruction class
support for Zvkg.
opcodes/ChangeLog:
* riscv-opc.c: Add Zvkg instructions.
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
This patch adds support for the RISC-V Zfa extension,
which introduces additional floating-point instructions:
* fli (load-immediate) with pre-defined immediates
* fminm/fmaxm (like fmin/fmax but with different NaN behaviour)
* fround/froundmx (round to integer)
* fcvtmod.w.d (Modular Convert-to-Integer)
* fmv* to access high bits of FP registers in case XLEN < FLEN
* fleq/fltq (quiet comparison instructions)
Zfa defines its instructions in combination with the following
extensions:
* single-precision floating-point (F)
* double-precision floating-point (D)
* quad-precision floating-point (Q)
* half-precision floating-point (Zfh)
This patch is based on an earlier version from Tsukasa OI:
https://sourceware.org/pipermail/binutils/2022-September/122939.html
Most significant change to that commit is the switch from the rs1-field
value to the actual floating-point value in the last operand of the fli*
instructions. Everything that strtof() can parse is accepted and
the '%a' printf specifier is used to output hex floating-point literals
in the disassembly.
The Zfa specification is frozen (and has passed public review). It is
available as a chapter in "The RISC-V Instruction Set Manual: Volume 1":
https://github.com/riscv/riscv-isa-manual/releases
bfd/ChangeLog:
* elfxx-riscv.c (riscv_multi_subset_supports): Add instruction
class support for 'Zfa' extension.
(riscv_multi_subset_supports_ext): Likewise.
(riscv_implicit_subsets): Add 'Zfa' -> 'F' dependency.
gas/ChangeLog:
* config/tc-riscv.c (flt_lookup): New helper to lookup a float value
in an array.
(validate_riscv_insn): Add 'Wfv' as new format string directive.
(riscv_ip): Likewise.
* doc/c-riscv.texi: Add floating-point chapter and describe
limiations of the Zfa FP literal parsing.
* testsuite/gas/riscv/zfa-32.d: New test.
* testsuite/gas/riscv/zfa-32.s: New test.
* testsuite/gas/riscv/zfa-64.d: New test.
* testsuite/gas/riscv/zfa-64.s: New test.
* testsuite/gas/riscv/zfa-fail.d: New test.
* testsuite/gas/riscv/zfa-fail.l: New test.
* testsuite/gas/riscv/zfa-fail.s: New test.
* testsuite/gas/riscv/zfa.d: New test.
* testsuite/gas/riscv/zfa.s: New test.
* testsuite/gas/riscv/zfa.s: New test.
* opcode/riscv-opc.h (MATCH_FLI_H): New.
(MASK_FLI_H): New.
(MATCH_FMINM_H): New.
(MASK_FMINM_H): New.
(MATCH_FMAXM_H): New.
(MASK_FMAXM_H): New.
(MATCH_FROUND_H): New.
(MASK_FROUND_H): New.
(MATCH_FROUNDNX_H): New.
(MASK_FROUNDNX_H): New.
(MATCH_FLTQ_H): New.
(MASK_FLTQ_H): New.
(MATCH_FLEQ_H): New.
(MASK_FLEQ_H): New.
(MATCH_FLI_S): New.
(MASK_FLI_S): New.
(MATCH_FMINM_S): New.
(MASK_FMINM_S): New.
(MATCH_FMAXM_S): New.
(MASK_FMAXM_S): New.
(MATCH_FROUND_S): New.
(MASK_FROUND_S): New.
(MATCH_FROUNDNX_S): New.
(MASK_FROUNDNX_S): New.
(MATCH_FLTQ_S): New.
(MASK_FLTQ_S): New.
(MATCH_FLEQ_S): New.
(MASK_FLEQ_S): New.
(MATCH_FLI_D): New.
(MASK_FLI_D): New.
(MATCH_FMINM_D): New.
(MASK_FMINM_D): New.
(MATCH_FMAXM_D): New.
(MASK_FMAXM_D): New.
(MATCH_FROUND_D): New.
(MASK_FROUND_D): New.
(MATCH_FROUNDNX_D): New.
(MASK_FROUNDNX_D): New.
(MATCH_FLTQ_D): New.
(MASK_FLTQ_D): New.
(MATCH_FLEQ_D): New.
(MASK_FLEQ_D): New.
(MATCH_FLI_Q): New.
(MASK_FLI_Q): New.
(MATCH_FMINM_Q): New.
(MASK_FMINM_Q): New.
(MATCH_FMAXM_Q): New.
(MASK_FMAXM_Q): New.
(MATCH_FROUND_Q): New.
(MASK_FROUND_Q): New.
(MATCH_FROUNDNX_Q): New.
(MASK_FROUNDNX_Q): New.
(MATCH_FLTQ_Q): New.
(MASK_FLTQ_Q): New.
(MATCH_FLEQ_Q): New.
(MASK_FLEQ_Q): New.
(MATCH_FCVTMOD_W_D): New.
(MASK_FCVTMOD_W_D): New.
(MATCH_FMVH_X_D): New.
(MASK_FMVH_X_D): New.
(MATCH_FMVH_X_Q): New.
(MASK_FMVH_X_Q): New.
(MATCH_FMVP_D_X): New.
(MASK_FMVP_D_X): New.
(MATCH_FMVP_Q_X): New.
(MASK_FMVP_Q_X): New.
(DECLARE_INSN): New.
* opcode/riscv.h (enum riscv_insn_class): Add instruction
classes for the Zfa extension.
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): Add support for
new format string directive 'Wfv'.
* riscv-opc.c: Add Zfa instructions.
Co-Developed-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-Developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
As outlined in the LoongArch ELF psABI spec [1], it is actually already
2 versions after the initial LoongArch support, and the $v[01] and
$fv[01] names should really get sunset by now.
In addition, the "$x" name for $r21 was never included in any released
version of the ABI spec, and such usages are all fixed to say just $r21
for every project I could think of that accepted a LoongArch port.
Plus, the upcoming LSX/LASX support makes use of registers named
"$vrNN" and "$xrNN", so having "$vN" and "$x" alongside would almost
certainly create confusion for developers.
Issue warnings for such usages per the deprecation procedure detailed
in the spec, so we can finally remove support in the next release cycle
after this.
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
gas/ChangeLog:
* config/tc-loongarch.c: Init canonical register ABI name
mappings and deprecated register names.
(loongarch_args_parser_can_match_arg_helper): Warn in case of
deprecated register name usage.
* testsuite/gas/loongarch/deprecated_reg_aliases.d: New test.
* testsuite/gas/loongarch/deprecated_reg_aliases.l: Likewise.
* testsuite/gas/loongarch/deprecated_reg_aliases.s: Likewise.
include/ChangeLog:
* opcode/loongarch.h: Rename global variables.
opcodes/ChangeLog:
* loongarch-opc.c: Rename the alternate/deprecated register name
mappings, and move $x to the deprecated name map.
Signed-off-by: WANG Xuerui <git@xen0n.name>
For better round-trip fidelity and readability in general.
gas/ChangeLog:
* testsuite/gas/loongarch/uleb128.d: Update test case.
* testsuite/gas/loongarch/raw-insn.d: New test.
* testsuite/gas/loongarch/raw-insn.s: Likewise.
opcodes/ChangeLog:
* loongarch-dis.c (disassemble_one): Print ".word" if !opc.
Signed-off-by: WANG Xuerui <git@xen0n.name>
The additional hex notation was minimally useful when one had to
inspect code with heavy bit manipulation, or of unclear signedness, but
it clutters the output, and the style is not regular assembly language
syntax either.
Precisely how one approaches the original use case is not taken care of
in this patch (maybe we want a disassembler option forcing a certain
style for immediates, like for example printing every immediate in
decimal or hexadecimal notation), but at least let's stop the current
practice.
ChangeLog:
* testsuite/gas/loongarch/imm_ins.d: Update test case.
* testsuite/gas/loongarch/imm_ins_32.d: Likewise.
* testsuite/gas/loongarch/imm_op.d: Likewise.
* testsuite/gas/loongarch/jmp_op.d: Likewise.
* testsuite/gas/loongarch/load_store_op.d: Likewise.
* testsuite/gas/loongarch/macro_op.d: Likewise.
* testsuite/gas/loongarch/macro_op_32.d: Likewise.
* testsuite/gas/loongarch/privilege_op.d: Likewise.
* testsuite/gas/loongarch/uleb128.d: Likewise.
* testsuite/gas/loongarch/vector.d: Likewise.
ld/ChangeLog:
* testsuite/ld-loongarch-elf/jmp_op.d: Update test case.
* testsuite/ld-loongarch-elf/macro_op.d: Likewise.
* testsuite/ld-loongarch-elf/macro_op_32.d: Likewise.
opcodes/ChangeLog:
* loongarch-dis.c (dis_one_arg): Remove the "(0x%x)" part from
disassembly output of signed immediate operands.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add a modifier char 'o' telling the disassembler to print the immediate
using the address offset style, and mark the memory access instructions'
offset operands as such.
opcodes/ChangeLog:
* loongarch-dis.c (dis_one_arg): Style disassembled address
offsets as such when the operand has a modifier char 'o'.
* loongarch-opc.c: Add 'o' to operands that represent address
offsets.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Update the LoongArch disassembler to supply style information to the
disassembler output. The output formatting remains unchanged.
opcodes/ChangeLog:
* disassemble.c: Mark LoongArch as created_styled_output=true.
* loongarch-dis.c (dis_one_arg): Use fprintf_styled_func
throughout with proper styles.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add a flag in the pinfo field for being able to mark certain specialized
matchers as disassembler-only, so some degree of isolation between
assembler-side and disassembler-side can be achieved.
This isolation is necessary, firstly because some pseudo-instructions
cannot be fully described in the opcode table, like `li.[wd]`, so the
corresponding opcode entry cannot have meaningful match/mask values.
Secondly, some of these pseudo-instructions can be realized in more than
one plausible ways; e.g. `li.w rd, <something between 0 and 0x7ff>` can
be realized on LA64 with any of `addi.w`, `addi.d` or `ori`. If we tie
disassembly of such aliases with the corresponding GAS support, only one
canonical form among the above would be recognized as `li.w`, and it
would mildly impact the readability of disassembly output.
People wanting the exact disassembly can always set `-M no-aliases` to
get the original behavior back.
In addition, in certain cases, information is irreversibly lost after
assembling, so perfect round-trip would not be possible in such cases.
For example, `li.w` and `li.d` of immediates within int32_t range
produce the same code; in this patch, `addi.d rd, $zero, imm` is treated
as `li.d`, while `addi.w` and `ori` immediate loads are shown as `li.w`,
due to the expressible value range well within 32 bits.
gas/ChangeLog:
* config/tc-loongarch.c (get_loongarch_opcode): Ignore
disassembler-only aliases.
* testsuite/gas/loongarch/64_pcrel.d: Update test case.
* testsuite/gas/loongarch/imm_ins.d: Likewise.
* testsuite/gas/loongarch/imm_ins_32.d: Likewise.
* testsuite/gas/loongarch/jmp_op.d: Likewise.
* testsuite/gas/loongarch/li.d: Likewise.
* testsuite/gas/loongarch/macro_op.d: Likewise.
* testsuite/gas/loongarch/macro_op_32.d: Likewise.
* testsuite/gas/loongarch/macro_op_large_abs.d: Likewise.
* testsuite/gas/loongarch/macro_op_large_pc.d: Likewise.
* testsuite/gas/loongarch/nop.d: Likewise.
* testsuite/gas/loongarch/relax_align.d: Likewise.
* testsuite/gas/loongarch/reloc.d: Likewise.
include/ChangeLog:
* opcode/loongarch.h (INSN_DIS_ALIAS): Add.
ld/ChangeLog:
* testsuite/ld-loongarch-elf/jmp_op.d: Update test case.
* testsuite/ld-loongarch-elf/macro_op.d: Likewise.
* testsuite/ld-loongarch-elf/macro_op_32.d: Likewise.
* testsuite/ld-loongarch-elf/relax-align.dd: Likewise.
opcodes/ChangeLog:
* loongarch-dis.c: Move register name map declarations to top.
(get_loongarch_opcode_by_binfmt): Consider aliases when
disassembling without the no-aliases option.
(parse_loongarch_dis_option): Support the no-aliases option.
* loongarch-opc.c: Collect pseudo instructions into a new
dedicated table.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Many instructions were enabled only when both a feature flag and a minimum
architecture version are specified. This behaviour differs from GCC, which (in
most cases) allows features to be enabled at any architecture version.
There is no need for the toolchain to restrict combinations of unrelated
features in this way, so this patch removes the unnecessary dependencies.
Previously, FCSRs were referred to as $rX, which seemed strange.
We refer to FCSRs as $fcsrX, which ensures compatibility with LLVM
IAS as well.
gas/ChangeLog:
* config/tc-loongarch.c:
(loongarch_fc_normal_name): New definition.
(loongarch_fc_numeric_name): New definition.
(loongarch_single_float_opcodes): Modify `movgr2fcsr` and
`movfcsr2gr`.
testsuite/gas/loongarch/float_op.d: Likewise.
testsuite/gas/loongarch/float_op.s: Likewise.
include/ChangeLog:
* opcode/loongarch.h:
(loongarch_fc_normal_name): New extern.
(loongarch_fc_numeric_name): New extern.
opcodes/ChangeLog:
* opcodes/loongarch-dis.c (loongarch_after_parse_args): Support
referring to FCSRs as $fcsrX.
* opcodes/loongarch-opc.c (loongarch_args_parser_can_match_arg_helper):
Likewise.
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
The logic can actually be expressed with less code that way, utilizing
that there are common patterns of when which form of masking is
permitted. This then also eliminates the large set of open-codings of
BOTH_MASKING in the opcode table.
Allegrex supports some MIPS32 and MIPS32r2 instructions (albeit with
some encoding differences) such as bit manipulation (ins/ext) and MLA
(madd/msub). It also features some new instructions like wsbw and
min/max or device-specific ones such as mfic.
Signed-off-by: David Guillen Fandos <david@davidgf.net>
The Allegrex CPU supports bit rotation instructions as described in the
MIPS32 release 2 CPU (even though it is a MIPS-2 based CPU).
Signed-off-by: David Guillen Fandos <david@davidgf.net>
The Allegrex CPU was created by Sony Interactive Entertainment to power
their portable console, the PlayStation Portable.
The pspdev organization maintains all sorts of tools to create software
for said device including documentation.
Signed-off-by: David Guillen Fandos <david@davidgf.net>
We should try our best to make mips32 using the same
oprand char with micromips. So for mips32, we use:
^ is added for 5bit sa oprand for some new DSPr2 instructions:
APPEND, PREPEND, PRECR_SRA[_R].PH.W
the LSB bit is 11, like RD.
+t is removed for coprocessor 0 destination register.
'E' does the samething.
+t is now used for RX oprand for MFTR/MTTR (MT ASE)
? is added for sel oprand for MFTR/MTTR (MT ASE)
For mips32, the position of sel in MFTR/MTTR is same with mfc0 etc,
while for micromips, they are different.
We also add an extesion format of cftc2/cttc2/mftc2/mfthc2/mttc2/mtthc2:
concatenating rs with rx as the index of control or data.
In commit 1a3b4f90bc ("x86: convert two pointers to (indexing)
integers") I neglected the fact that compilers may warn about comparing
ptrdiff_t (signed long) with size_t (unsigned long) values. Since just
before we've checked that the value is positive, simply add a cast
(despite my dislike for casts).
This in particular reduces the number of pointers to non-const that we
have (and that could potentially be used for undue modification of
state). As a result, fetch_code()'s 2nd parameter can then also become
pointer-to-const.
The present way of dealing with them - misusing MAX_MNEM_SIZE, which has
nothing to do with insn length - leads to inconsistent results. Since we
allow for up to MAX_CODE_LENGTH - 1 prefix bytes (which then could be
followed by another MAX_CODE_LENGTH "normal" insn bytes until we're done
decoding), size the_buffer[] accordingly.
Move struct dis_private down to be able to use MAX_CODE_LENGTH without
moving its #define. While doing this also alter the order to have the
potentially large array last.
This first of all removes a dependency on bfd_byte and unsigned char
being the same types. It further eliminates the need to mask by 0xff
when fetching values (which wasn't done fully consistently anyway),
improving code legibility.
While there, where possible add const.
* Extract all private_data initializations into riscv_init_disasm_info, which
called from print_insn_riscv rather than riscv_disassemble_insn.
* The disassemble_free_target seems like the right place to release all target
private_data, also including the internal data structures, like riscv_subsets.
Therefore, add a new function, disassemble_free_riscv, to release them for safe.
opcodes/
* disassemble.c (disassemble_free_target): Called disassemble_free_riscv
for riscv to release private_data and internal data structures.
* disassemble.h: Added extern disassemble_free_riscv.
* riscv-dis.c (riscv_init_disasm_info): New function, used to init
riscv_private_data.
(riscv_disassemble_insn): Moved riscv_private_data initializations
into riscv_init_disasm_info.
(print_insn_riscv): Called riscv_init_disasm_info to init
riscv_private_data once time.
(disassemble_free_riscv): New function, used to free the internal data
structures, like riscv_subsets.
Trying to build binutils with an older gcc currently fails. Working
around these gcc bugs is not onerous so let's fix them.
bfd/
* elf32-csky.c (csky_elf_size_dynamic_sections): Don't type-pun
pointer.
* elf32-rl78.c (rl78_compute_complex_reloc): Rename "stat"
variable to "status".
gas/
* compress-debug.c (compress_finish): Supply all fields in
ZSTD_inBuffer initialisation.
include/
* xtensa-dynconfig.h (xtensa_isa_internal): Delete unnecessary
forward declaration.
opcodes/
* loongarch-opc.c: Supply all fields of zero struct initialisation
in various opcode tables.
Consistently do 64-bit first, VEX.L second, VEX.W third, ModR/M fourth,
and only then prefix, resulting in fewer table entries. Note that in the
course of the re-work
- TILEZERO has a previously missing decode step through rm_table[]
added,
- a wrong M_0 suffix for TILEZERO is also corrected to be M_1 (now an
infix).
Consistently do 64-bit first, ModR/M second, VEX.L third, VEX.W fourth,
and prefix last, resulting in fewer table entries. Note that in the
course of the re-work wrong M_0 suffixes are also corrected to be M_1
(partly infixes now).
Since it ended up confusing while testing the change, also adjust the
test name in x86-64-amx-bad.d (to be distinct from x86-64-amx.d's).
Ventana Micro has published the specification for their
XVentanaCondOps ("conditional ops") extension at
https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf
which contains two new instructions
- vt.maskc
- vt.maskcn
that can be used in constructing branchless sequences for
various conditional-arithmetic, conditional-logical, and
conditional-select operations.
To support such vendor-defined instructions in the mainline binutils,
this change also adds a riscv_supported_vendor_x_ext secondary
dispatch table (but also keeps the behaviour of allowing any unknow
X-extension to be specified in addition to the known ones from this
table).
As discussed, this change already includes the planned/agreed future
requirements for X-extensions (which are likely to be captured in the
riscv-toolchain-conventions repository):
- a public specification document is available (see above) and is
referenced from the gas-documentation
- the naming follows chapter 27 of the RISC-V ISA specification
- instructions are prefixed by a vendor-prefix (vt for Ventana)
to ensure that they neither conflict with future standard
extensions nor clash with other vendors
bfd/ChangeLog:
* elfxx-riscv.c (riscv_get_default_ext_version): Add riscv_supported_vendor_x_ext.
(riscv_multi_subset_supports): Recognize INSN_CLASS_XVENTANACONDOPS.
gas/ChangeLog:
* doc/c-riscv.texi: Add section to list custom extensions and
their documentation URLs.
* testsuite/gas/riscv/x-ventana-condops.d: New test.
* testsuite/gas/riscv/x-ventana-condops.s: New test.
include/ChangeLog:
* opcode/riscv-opc.h Add vt.maskc and vt.maskcn.
* opcode/riscv.h (enum riscv_insn_class): Add INSN_CLASS_XVENTANACONDOPS.
opcodes/ChangeLog:
* riscv-opc.c: Add vt.maskc and vt.maskcn.
Series-version: 1
Series-to: binutils@sourceware.org
Series-cc: Kito Cheng <kito.cheng@sifive.com>
Series-cc: Nelson Chu <nelson.chu@sifive.com>
Series-cc: Greg Favor <gfavor@ventanamicro.com>
Series-cc: Christoph Muellner <cmuellner@gcc.gnu.org>
1) i386-dis.c:12055:11: runtime error: left shift of negative value -1
Bit twiddling is best done unsigned, due to UB on overflow of signed
expressions. Fix this by using bfd_vma rather than bfd_signed_vma
everywhere in i386-dis.c except print_displacement.
2) Return get32s and get16 value in a bfd_vma, reducing the need for
temp variables.
3) Introduce get16s and get8s functions to simplify the code.
4) With some optimisation options gcc-13 legitimately complains about
a fall-through in OP_I. Fix that. OP_I also doesn't need to use
"mask" which was wrong for w_mode anyway.
5) Masking with & 0xffffffff is better than casting to unsigned. We
don't know for sure that unsigned int is 32-bit.
6) We also don't know that unsigned char is 8 bits. Mask codep
accesses everywhere. I don't expect binutils will work on anything
other than an 8-bit char host, but if we are masking codep accesses in
some places we might as well be consistent. (Better would be to use
stdint.h types more in binutils.)
opcodes/i386-dis.c: In function ‘print_insn’:
opcodes/i386-dis.c:9865:22: error: storing the address of local
variable ‘priv’ in ‘*info.private_data’ [-Werror=dangling-pointer=]
* i386-dis.c (print_insn): Clear info->private_data before
returning.
For quite come time print_insn() has been storing the address of a local
variable into info->private_data. Since the compiler can't know that the
field won't be accessed again after print_insn() returns, it may kind of
legitimately diagnose this. And recent enough gcc does as of the
introduction of the fetch_error() return paths (replacing setjmp()-based
error handling).
Utilizing that neither prefix_name() nor i386_dis_printf() actually use
info->private_data, zap the pointer in fetch_error(), after having
retrieved it for local use.
A recent change in opcodes/i386-dis.c caused a build failure on my
x86-64 Fedora 36 system, which uses:
$ gcc --version
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
[...]
The error is:
../../binutils-gdb/opcodes/i386-dis.c: In function ‘OP_J’:
../../binutils-gdb/opcodes/i386-dis.c:12705:22: error: ‘val’ may be used uninitialized [-Werror=maybe-uninitialized]
12705 | disp = val & 0x8000 ? val - 0x10000 : val;
| ~~~~^~~~~~~~
This patch fixes the warning.
opcodes/ChangeLog
2023-04-21 Tom Tromey <tromey@adacore.com>
* i386-dis.c (OP_J): Check result of get16.
get64() is unreachable when !BFD64 (due to a check relatively early in
print_insn()). Let's avoid the associated #ifdef-ary (or else we should
extend it to remove more dead code).
Make them return boolean and convert FETCH_DATA() uses to fetch_code().
With this no further users of FETCH_DATA() remain, so the macro and its
backing function are dropped as well.
Leave value types as they were for the helper functions, even if I don't
think that beyond get64() use of bfd_{,signed_}vma is really necessary.
With type change of "disp" in OP_E_memory(), change the 2nd parameter of
print_displacement() to a signed type as well, though (eliminating the
need for a local variable of signed type). This also eliminates the need
for custom printing of '-' in Intel syntax displacement expressions.
While there drop forward declarations which aren't really needed.
Use a tristate (enum) return value type to be able to express all three
cases which are of interest to the (sole) caller. This also allows doing
away with the abuse of "rex_used".
... and its direct helper get_sib(). Using setjmp()/longjmp() for fetch
error handling is problematic, as per
https://sourceware.org/pipermail/binutils/2023-March/126687.html. Start
using more conventional error handling instead.
Also introduce a fetch_modrm() helper, for subsequent re-use.
... such that it can be used from other than the setjmp() error handling
path.
Since I'd like the function's parameter to be pointer-to-const, two
other functions need respective constification then, too (along with
needing to be forward-declared).
This issue was reported from https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1188
Current flow:
1) Scan any mapping symbol less than this instruciton.
2) If not found, did a backward search.
The flow seems not big issue, let run an example here:
$x:
0x0 a <--- Found at step 1
0x4 b <--- Not found in step 1, but found at step 2
0x8 c <--- Not found in step 1, but found at step 2
$d
0x12 .word 1234 <-- Found at step 1
The instruciton didn't have the same address with mapping symbol will
still did backward search again and again.
So the new flow is:
1) Use the last mapping symbol status if the address is still within the range
of the current mapping symbol.
2) Scan any mapping symbol less than this instruciton.
3) If not found, did a backward search.
4) If a proper mapping symbol is found in either step 2 or 3, find its boundary,
and cache that.
Use the same example to run the new flow again:
$x:
0x0 a <--- Found at step 2, the boundary is 0x12
0x4 b <--- Cache hit at step 1, within the boundary.
0x8 c <--- Cache hit at step 1, within the boundary.
$d
0x12 .word 1234 <-- Found at step 2, the boundary is the end of section.
The disassemble time of the test cases has been reduced from ~20 minutes to ~4
seconds.
opcode/ChangeLog
PR 30282
* riscv-dis.c (last_map_symbol_boundary): New.
(last_map_state): New.
(last_map_section): New.
(riscv_search_mapping_symbol): Cache the result of latest
mapping symbol.
While I was working on the disassembler styling for ARM I noticed that
the whitespace in the cpsie instruction was inconsistent with most of
the other ARM disassembly output, the disassembly for cpsie looks like
this:
cpsie if,#10
notice there's no space before the '#10' immediate, most other ARM
instructions have a space before each operand.
This commit updates the disassembler to add the missing space, and
updates the tests I found that tested this instruction.
This commit intends to move operands that require very special handling or
operand types that are so minor (e.g. only useful on a few instructions)
under "W". I also intend this "W" to be "temporary" operand storage until
we can find good two character (or less) operand type.
In this commit, prefetch offset operand "f" for 'Zicbop' extension is moved
to "Wif" because of its special handling (and allocating single character
"f" for this operand type seemed too much).
Current expected allocation guideline is as follows:
1. 'W'
2. The most closely related single-letter extension in lowercase
(strongly recommended but not mandatory)
3. Identify operand type
The author currently plans to allocate following three-character operand
types (for operands including instructions from unratified extensions).
1. "Wif" ('Zicbop': fetch offset)
2. "Wfv" (unratified 'Zfa': value operand from FLI.[HSDQ] instructions)
3. "Wfm" / "WfM"
'Zfh', 'F', 'D', 'Q': rounding modes "m" with special handling
solely for widening conversion instructions.
gas/ChangeLog:
* config/tc-riscv.c (validate_riscv_insn, riscv_ip): Move from
"f" to "Wif".
opcodes/ChangeLog:
* riscv-dis.c (print_insn_args): Move from "f" to "Wif".
* riscv-opc.c (riscv_opcodes): Reflect new operand type.
All encoding spaces can be used this way; there's a certain risk that
the bits presently reserved could be used for other purposes down the
road, but people using .insn are expected to know what they're doing
anyway. Plus this way there's at least _some_ way to have those bits
set.
For now this will only allow operand-less insns to be encoded this way.
This patch adds the RPRFM (range prefetch) instruction.
It was introduced as part of SME2, but it belongs to the
prefetch hint space and so doesn't require any specific
ISA flags.
The aarch64_rprfmop_array initialiser (deliberately) only
fills in the leading non-null elements.
This patch adds the SVE FDOT, SDOT and UDOT instructions,
which are available when FEAT_SME2 is implemented. The patch
also reorders the existing SVE_Zm3_22_INDEX to keep the
operands numerically sorted.
This patch adds SUNPK and UUNPK, which unpack one register's
worth of elements to two registers' worth, or two registers'
worth to four registers' worth.
There are two instruction formats here:
- SQRSHR, SQRSHRU and UQRSHR, which operate on lists of two
or four registers.
- SQRSHRN, SQRSHRUN and UQRSHRN, which operate on lists of
four registers.
These are the first SME2 instructions to have immediate operands.
The patch makes sure that, when parsing SME2 instructions with
immediate operands, the new predicate-as-counter registers are
parsed as registers rather than as #-less immediates.
There are two instruction formats here:
- SQCVT, SQCVTU and UQCVT, which operate on lists of two or
four registers.
- SQCVTN, SQCVTUN and UQCVTN, which operate on lists of
four registers.
This patch adds the SME2 versions of the FP<->integer conversion
instructions FCVT* and *CVTF. It also adds FP rounding instructions
FRINT*, which share the same format.
BFDOT, FDOT and USDOT share the same instruction format.
SDOT and UDOT share a different format. SUDOT does not
have the multi vector x multi vector forms, since they
would be redundant with USDOT.
SMLALL, SMLSLL, UMLALL and UMLSLL have the same format.
USMLALL and SUMLALL allow the same operand types as those
instructions, except that SUMLALL does not have the multi-vector
x multi-vector forms (which would be redundant with USMLALL).
The {BF,F,S,U}MLAL and {BF,F,S,U}MLSL instructions share the same
encoding. They are the first instance of a ZA (as opposed to ZA tile)
operand having a range of offsets. As with ZA tiles, the expected
range size is encoded in the operand-specific data field.
This patch adds the SME2 multi-register forms of F{MAX,MIN}{,NM}
and {S,U}{MAX,MIN}. SQDMULH, SRSHL and URSHL have the same form
as SMAX etc., so the patch adds them too.
Add support for the SME2 ADD. SUB, FADD and FSUB instructions.
SUB and FSUB have the same form as ADD and FADD, except that
ADD also has a 2-operand accumulating form.
The 64-bit ADD/SUB instructions require FEAT_SME_I16I64 and the
64-bit FADD/FSUB instructions require FEAT_SME_F64F64.
These are the first instructions to have tied register list
operands, as opposed to tied single registers.
The parse_operands change prevents unsuffixed Z registers (width==-1)
from being treated as though they had an Advanced SIMD-style suffix
(.4s etc.). It means that:
Error: expected element type rather than vector type at operand 2 -- `add za\.s\[w8,0\],{z0-z1}'
becomes:
Error: missing type suffix at operand 2 -- `add za\.s\[w8,0\],{z0-z1}'
SME2 adds lookup table instructions for quantisation. They use
a new lookup table register called ZT0.
LUTI2 takes an unsuffixed SVE vector index of the form Zn[<imm>],
which is the first time that this syntax has been used.
Implementation-wise, the main things to note here are:
- the WHILE* instructions have forms that return a pair of predicate
registers. This is the first time that we've had lists of predicate
registers, and they wrap around after register 15 rather than after
register 31.
- the predicate-as-counter WHILE* instructions have a fourth operand
that specifies the vector length. We can treat this as an enumeration,
except that immediate values aren't allowed.
- PEXT takes an unsuffixed predicate index of the form PN<n>[<imm>].
This is the first instance of a vector/predicate index having
no suffix.
SME2 adds LD1 and ST1 variants for lists of 2 and 4 registers.
The registers can be consecutive or strided. In the strided case,
2-register lists have a stride of 8, starting at register x0xxx.
4-register lists have a stride of 4, starting at register x00xx.
The instructions are predicated on a predicate-as-counter register in
the range pn8-pn15. Although we already had register fields with upper
bounds of 7 and 15, this is the first plain register operand to have a
nonzero lower bound. The patch uses the operand-specific data field
to record the minimum value, rather than having separate inserters
and extractors for each lower bound. This in turn required adding
an extra bit to the field.
SME2 defines new MOVA instructions for moving multiple registers
to and from ZA. As with SME, the instructions are also available
through MOV aliases.
One notable feature of these instructions (and many other SME2
instructions) is that some register lists must start at a multiple
of the list's size. The patch uses the general error "start register
out of range" when this constraint isn't met, rather than an error
specifically about multiples. This ensures that the error is
consistent between these simple consecutive lists and later
strided lists, for which the requirements aren't a simple multiple.
SME2 adds a new format for the existing SVE predicate registers:
predicates as counters rather than predicates as masks. In assembly
code, operands that interpret predicates as counters are written
pn<N> rather than p<N>.
This patch adds support for these registers and extends some
existing instructions to support them. Since the new forms
are just a programmer convenience, there's no need to make them
more restrictive than the earlier predicate-as-mask forms.
Some SME2 instructions operate on a range of consecutive ZA vectors.
This is indicated by syntax such as:
za[<Wv>, <imml>:<immh>]
Like with the earlier vgx2 and vgx4 support, we get better error
messages if the parser allows all ZA indices to have a range.
We can then reject invalid cases during constraint checking.
Many SME2 instructions operate on groups of 2 or 4 ZA vectors.
This is indicated by adding a "vgx2" or "vgx4" group size to the
ZA index. The group size is optional in assembly but preferred
for disassembly.
There is not a binary distinction between mnemonics that have
group sizes and mnemonics that don't, nor between mnemonics that
take vgx2 and mnemonics that take vgx4. We therefore get better
error messages if we allow any ZA index to have a group size
during parsing, and wait until constraint checking to reject
invalid sizes.
A quirk of the way errors are reported means that if an instruction
is wrong both in its qualifiers and its use of a group size, we'll
print suggested alternative instructions that also have an incorrect
group size. But that's a general property that also applies to
things like out-of-range immediates. It's also not obviously the
wrong thing to do. We need to be relatively confident that we're
looking at the right opcode before reporting detailed operand-specific
errors, so doing qualifier checking first seems resonable.
SME2 adds various new fields that are similar to
AARCH64_OPND_SME_ZA_array, but are distinguished by the size of
their offset fields. This patch adds _off4 to the name of the
field that we already have.
Until now, binutils has supported register ranges such
as { v0.4s - v3.4s } as an unofficial shorthand for
{ v0.4s, v1.4s, v2.4s, v3.4s }. The SME2 ISA embraces this form
and makes it the preferred disassembly. It also embraces wrapped
lists such as { z31.s - z2.s }, which is something that binutils
didn't previously allow.
The range form was already binutils's preferred disassembly for 3- and
4-register lists. This patch prefers it for 2-register lists too.
The patch also adds support for wrap-around.
SME2 has instructions that accept strided register lists,
such as { z0.s, z4.s, z8.s, z12.s }. The purpose of this
patch is to extend binutils to support such lists.
The parsing code already had (unused) support for strides of 2.
The idea here is instead to accept all strides during parsing
and reject invalid strides during constraint checking.
The SME2 instructions that accept strided operands also have
non-strided forms. The errors about invalid strides therefore
take a bitmask of acceptable strides, which allows multiple
possibilities to be summed up in a single message.
I've tried to update all code that handles register lists.
Some FLD_imm* suffixes used a counting scheme such as FLD_immN,
FLD_immN_2, FLD_immN_3, etc., while others used the lsb as the
suffix. The latter seems more mnemonic, and was a big help
in doing the SME2 work.
Similarly, the _10 suffix on FLD_SME_size_10 was nonobvious.
Presumably it indicated a 2-bit field, but it actually starts
in bit 22.
Quite a lot of SME2 instructions have an opcode bit that selects
between 32-bit and 64-bit forms of an instruction, with the 32-bit
forms being part of base SME2 and with the 64-bit forms being part
of an optional extension. It's nevertheless useful to have a single
opcode entry for both forms since (a) that matches the ISA definition
and (b) it tends to improve error reporting.
This patch therefore adds a libopcodes function called
aarch64_cpu_supports_inst_p that tests whether the target
supports a particular instruction. In future it will depend
on internal libopcodes routines.
This patch renames the OP_SME_* macros in aarch64-tbl.h so that
they follow the same scheme as the OP_SVE_* ones. It also uses
OP_SVE_ as the prefix, since there is no real distinction between
the SVE and SME uses of qualifiers: a macro defined for one can
be useful for the other too.
If an instruction has invalid qualifiers, GAS would report the
error against the final opcode entry that got to the qualifier-
checking stage. It seems better to report the error against
the opcode entry that had the closest match, just like we
pick the closest match within an opcode entry for the
"did you mean this?" message.
This patch adds the number of invalid operands as an
argument to AARCH64_OPDE_INVALID_VARIANT and then picks the
AARCH64_OPDE_INVALID_VARIANT with the lowest argument.
AARCH64_OPDE_REG_LIST took a single operand that specified the
expected number of registers. However, there are quite a few
SME2 instructions that have both 2-register forms and (separate)
4-register forms. If the user tries to use a 3-register list,
it isn't obvious which opcode entry they meant. Saying that we
expect 2 registers and saying that we expect 4 registers would
both be wrong.
This patch therefore switches the operand to a bitfield. If a
AARCH64_OPDE_REG_LIST is reported against multiple opcode entries,
the patch ORs up the expected lengths.
This has no user-visible effect yet. A later patch adds more error
strings, alongside tests that use them.
SVE register lists were classified as SVE_REG, since there had been
no particular reason to separate them out. However, some SME2
instructions have tied register list operands, and so we need to
distinguish registers and register lists when checking whether two
operands match.
Also, the register list operands used a general error message,
even though we already have a dedicated error code for register
lists that are the wrong length.
libopcodes currently reports out-of-range registers as a general
AARCH64_OPDE_OTHER_ERROR. However, this means that each register
range needs its own hard-coded string, which is a bit cumbersome
if the range is determined programmatically. This patch therefore
adds a dedicated error type for out-of-range errors.
In SME, the vector select register had to be in the range
w12-w15, so it made sense to enforce that during parsing.
However, SME2 adds instructions for which the range is
w8-w11 instead.
This patch therefore moves the range check from the parsing
stage to the constraint-checking stage.
Also, the previous error used a capitalised range W12-W15,
whereas other register range errors used lowercase ranges
like p0-p7. A quick internal poll showed a preference for
the lowercase form, so the patch uses that.
The patch uses "selection register" rather than "vector
select register" so that the terminology extends more
naturally to PSEL.
This patch moves the range checks on ZA vector select offsets from
gas to libopcodes. Doing the checks there means that the error
messages contain the expected range. It also fits in better
with the error severity scheme, which becomes important later.
(This is because out-of-range indices are treated as more severe than
syntax errors, on the basis that parsing must have succeeded if we get
to the point of checking the completed opcode.)
The patch also adds a new check_za_access function for checking
ZA accesses. That's a bit over the top for one offset check, but the
function becomes more complex with later patches.
sme-9-illegal.s checked for an invalid .q suffix using:
psel p1, p15, p3.q[w15]
but this is doubly invalid because it misses the immediate part
of the index. The patch keeps that test but adds another with
a zero index, so that .q is the only thing wrong.
The aarch64-tbl.h change includes neatening up the backslash
positions.
A later patch moves the range checking for ZA vector select
offsets from gas to libopcodes. That in turn requires the
immediate field to be big enough to support all parsed values.
This shouldn't be a particularly size-sensitive structure,
so there should be no memory problems with doing this.
za_tile_vector is also used for indexing ZA as a whole, rather than
just for indexing tiles. The former is more common than the latter
in SME2, so this patch generalises the name to "indexed_za".
The patch also names the associated structure, so that later patches
can reuse it during parsing.
We already treat the ZA tiles ZA0-ZA15 as registers. This patch
does the same for ZA itself. parse_sme_zero_mask can then parse
ZA tiles and ZA in the same way, through parsed_type_reg.
One important effect of going through parsed_type_reg (in general)
is that it allows ZA to take qualifiers. This is necessary for many
SME2 instructions.
However, to support existing unqualified uses of ZA, parse_reg_with_qual
needs to treat the qualiier as optional. Hopefully the net effect is
to give better error messages, since now that SME2 makes "za.<T>"
valid in some contexts, it might be natural to use it (incorrectly)
in ZERO too.
While there, the patch also tweaks the error messages for invalid
ZA tiles, to try to make some cases more specific.
For now, parse_sme_za_array just uses parse_reg, rather than
parse_typed_reg/parse_reg_with_qual. A later patch consolidates
the parsing further.
This patch makes all SME instructions use F_STRICT, so that qualifiers
have to be provided explicitly rather than being inferred from other
operands. The main change is to move the qualifier setting from the
operand-level decoders to the opcode level.
This is one step towards consolidating the ZA parsing code and
extending it to handle SME2.
In the register-index forms of PRFM, the unallocated prefetch opcodes
24-31 have been reused for the encoding of the new RPRFM instruction.
The PRFM opcode space is now capped at 23 for these forms. The other
forms of PRFM are unaffected.
Most extension flags are named after the associated architectural
FEAT_* flags, but sme-i64 and sme-f64 were exceptions. This patch
adds sme-i16i64 and sme-f64f64 aliases, but keeps the old names too
for compatibility.
With VexVVVV only being boolean, the SSE shift-by-immediate instructions
don't need special casing anymore for SSE2AVX handling. Simplify the two
respective templates. (No change to generated tables.)
With the SDM long having dropped the NDS/NDD/DDS concept of identifying
encoding variants, we can finally do away with this concept as well. Of
the few consumers of the attribute, only an assertion was still checking
for a particular value, which we don't really need to retain.
When touching lines anyway, modernize other aspects as well. This often
improves similarity to adjacent lines.