With the general use of C99 there's no need anymore to have i386-gen
produce these. For more frequently used ones introduce local #define-s,
while others are simply spelled out directly. While doing this move
some static constants into more narrow scopes.
Note that as a "side effect" this corrects type_names[]'es imm8s entry.
There are just 4 templates using it, which can be easily identified by
other means, as D is set only on a very limited number of FPU templates.
Also move the respective conditional out of the code path taken by all
"reverse match" insns (it probably should have been this way already
before, to avoid the one conditional in the common case).
With this the templates which had FloatR dropped no longer differ from
their AT&T syntax + mnemonic counterparts - the only difference is now
which of the two would be recognized. For this, however, we don't need
two templates - we can simply arrange the condition for setting
Opcode_FloatR accordingly.
Attributes which aren't used together in any single insn template can be
converted from individual booleans to a single enum, as was done for a few
other attributes before. This is more space efficient. Collect together
all attributes which express special operand constraints (and which fit
the criteria for folding).
Earlier tidying still missed an opportunity: There's no need for the
"anyimm" static variable. Instead of using it in the loop to mask
"allowed" (which is necessary to satisfy operand_type_or()'s assertions)
simply use "mask", requiring it to be calculated first. That way the
post-loop masking by "mask" ahead of the operand_type_all_zero() can be
dropped.
By putting the templates after their AVX512 counterparts, the AVX512
flavors will be picked by default. That way the need to always use {vex}
ceases to exist once respective CPU features (AVX512-VNNI or AVX512VL as
a whole) have been disabled. This way the need for the PseudoVexPrefix
attribute also disappears.
AMX-TILE is a prereq to these, as already correctly expressed by
CPU_ANY_AMX_TILE_FLAGS. Express the dependency also in the reverse
("positive") direction.
While in some cases deriving an AT&T-style suffix from an Intel syntax
memory operand size specifier is necessary, in many cases this is not
only pointless, but has led to the introduction of various workarounds:
Excessive use of IgnoreSize and NoRex64 as well as the ToDword and
ToQword attributes. Suppress suffix derivation when we can clearly tell
that the memory operand's size isn't going to be needed to infer the
possible need for the low byte/word opcode bit or an operand size prefix
(0x66 or REX.W).
As a result ToDword and ToQword can be dropped entirely, plus a fair
number of IgnoreSize and NoRex64 can also be got rid of. Note that
IgnoreSize needs to remain on legacy encoded SIMD insns with GPR
operand, to avoid emitting an operand size prefix in 16-bit mode. (Since
16-bit code using SIMD insns isn't well tested, clone an existing
testcase just enough to cover a few insns which are potentially
problematic but are being touched here.)
Note that while folding the VCVT{,T}S{S,D}2SI templates, VCVT{,T}SH2SI
isn't included there. This is to fulfill the request of not allowing L
and Q suffixes there, despite the inconsistency with VCVT{,T}S{S,D}2SI.
Many of the vector conversion insns come with X/Y/Z suffixed forms, for
disambiguation purposes in AT&T syntax. All of these gorups follow
certain patterns. Introduce "xy" and "xyz" templates to reduce
redundancy.
To facilitate using a uniform name for both AVX and AVX512, further
introduce a means to purge a previously defined template: A standalone
<name> will be recognized to have this effect.
Note that in the course of the conversion VFPCLASSPH is properly split
to separate AT&T and Intel syntax forms, matching VFPCLASSP{S,D} and
yielding the intended "ambiguous operand size" diagnostic in Intel mode.
The vast majority of vector FP insns comes in single/double pairs. Many
pairs follow certain encoding patterns. Introduce an "sd" template to
reduce redundancy. Similarly, to further cover similarities between
AVX512F and AVX512-FP16, introduce an "sdh" template.
For element-size Disp8 shift generalize i386-gen's broadcast size
determination, allowing Disp8MemShift to be specified without an operand
in the affected templated templates. While doing the adjustment also
eliminate an unhelpful (lost information) diagnostic combined with a use
after free in what is now get_element_size().
Note that in the course of the conversion
- the AVX512F form of VMOVUPD has a stray (leftover) Load attribute
dropped,
- VMOVSH has a benign IgnoreSize added (the attribute is still strictly
necessary for VMOVSD, and necessary for VMOVSS as long as we permit
strange combinations like "-march=i286+avx"),
- VFPCLASSPH is properly split to separate AT&T and Intel syntax forms,
matching VFPCLASSP{S,D}.
This reverts commit 384f368958, which
broke i386-gen's emitting of diagnostics. As a replacement to address
the original issue of newer gcc no longer splicing lines when dropping
the line continuation backslashes, switch to using + as the line
continuation character, doing the line splicing in i386-gen.
This saves quite a number of shift instructions: The "operands" field
can now be retrieved by just masking (no shift), and extracting the
"extension_opcode" field now only requires a (signed) right shift, with
no prereq left one. (Of course there may be architectures where, in a
cross build, there might be no difference at all, e.g. when there are
suitable bitfield extraction insns.)
The only case where 64-bit code uses non-sign-extended (can also be
considered zero-extended) displacements is when an address size override
is in place for a memory operand (i.e. particularly excluding
displacements of direct branches, which - if at all - are controlled by
operand size, and then are still sign-extended, just from 16 bits).
Hence the distinction in templates is unnecessary, allowing code to be
simplified in a number of places. The only place where logic becomes
more complicated is when signed-ness of relocations is determined in
output_disp().
The other caveat is that Disp64 cannot be specified anymore in an insn
template at the same time as Disp32. Unlike for non-64-bit mode,
templates don't specify displacements for both possible addressing
modes; the necessary adjustment to the expected ones has already been
done in match_template() anyway (but of course the logic there needs
tweaking now). Hence the single template so far doing so is split.
Commit 7d5e4556a3 rendered the check near the end of what is now
i386_finalize_displacement() entirely dead for AT&T mode, since for
operands involving a displacement .unspecified will always be set. But
the logic there is bogus anyway - Intel syntax operand size specifiers
are of no interest there either. The only thing which matters in the
"displacement only" determination is .baseindex.
Of course when masking displacement kinds we should not at the same time
also mask off other attributes.
Furthermore the type mask returned by lex_got() also needs to be
adjusted: The only case where we want Disp32 (rather than Disp32S) is
when dealing with 32-bit addressing mode in 64-bit code.
Setting this field risks cpu_flags_all_zero() mistakenly returning
"false" when the object passed in was e.g. the result of ANDing together
two objects which had the bit set, or ANDNing together an object with
the field set and one with the field clear.
While there also avoid setting CpuNo64: Like Cpu64 this is driven
differently anyway and hence shouldn't be set anywhere by default.
Note that the moving of the two items in i386-gen.c's cpu_flags[] is
only for documentation purposes (and slight reducing of overhead), as
the fields are sorted anyway upon program start.
There's no need for the arbitrary special "unknown" token: Simply
recognize the leading ~ and process everything else the same, merely
recording whether to set individual fields to 1 or 0.
While there exclude CpuIAMCU from CPU_UNKNOWN_FLAGS - CPU_IAMCU_FLAGS
override cpu_arch_flags anyway when -march=iamcu is passed, and there's
no reason to have the stray flag set even if no insn actually is keyed
to it.
The checks done by check_cpu_arch_compatible() were halfway sensible
only at the time where only L1OM support was there. The purpose,
however, has always been to prevent bad uses of .arch (turning off the
base CPU "feature" flag) while at the same time permitting extensions to
be enabled / disabled. In order to achieve this (and to prevent
regressions when L1OM and K1OM support are removed)
- set CpuIAMCU in CPU_IAMCU_FLAGS,
- adjust the IAMCU check in the function itself (the other two similarly
broken checks aren't adjusted as they're slated to be removed anyway),
- avoid calling the function for extentions (which would never have the
base "feature" flag set),
- add a new testcase actually exercising ".arch iamcu" (which would also
regress with the planned removal).
To avoid issues like that addressed by 6e3e5c9e41 ("x86: extend SSE
check to PCLMULQDQ, AES, and GFNI insns"), base the check on opcode
attributes and operand types.
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.
The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
Intel AVX512 FP16 instructions use maps 3, 5 and 6. Maps 5 and 6 use 3 bits
in the EVEX.mmm field (0b101, 0b110). Map 5 is for instructions that were FP32
in map 1 (0Fxx). Map 6 is for instructions that were FP32 in map 2 (0F38xx).
There are some exceptions to this rule. Some things in map 1 (0Fxx) with imm8
operands predated our current conventions; those instructions moved to map 3.
FP32 things in map 3 (0F3Axx) found new opcodes in map3 for FP16 because map3
is very sparsely populated. Most of the FP16 instructions share opcodes and
prefix (EVEX.pp) bits with the related FP32 operations.
Intel AVX512 FP16 instructions has new displacements scaling rules, please refer
to the public software developer manual for detail information.
gas/
2021-08-05 Igor Tsimbalist <igor.v.tsimbalist@intel.com>
H.J. Lu <hongjiu.lu@intel.com>
Wei Xiao <wei3.xiao@intel.com>
Lili Cui <lili.cui@intel.com>
* config/tc-i386.c (struct Broadcast_Operation): Adjust comment.
(cpu_arch): Add .avx512_fp16.
(cpu_noarch): Add noavx512_fp16.
(pte): Add evexmap5 and evexmap6.
(build_evex_prefix): Handle EVEXMAP5 and EVEXMAP6.
(check_VecOperations): Handle {1to32}.
(check_VecOperands): Handle CheckRegNumb.
(check_word_reg): Handle Toqword.
(i386_error): Add invalid_dest_and_src_register_set.
(match_template): Handle invalid_dest_and_src_register_set.
* doc/c-i386.texi: Document avx512_fp16, noavx512_fp16.
opcodes/
2021-08-05 Igor Tsimbalist <igor.v.tsimbalist@intel.com>
H.J. Lu <hongjiu.lu@intel.com>
Wei Xiao <wei3.xiao@intel.com>
Lili Cui <lili.cui@intel.com>
* i386-dis.c (EXwScalarS): New.
(EXxh): Ditto.
(EXxhc): Ditto.
(EXxmmqh): Ditto.
(EXxmmqdh): Ditto.
(EXEvexXwb): Ditto.
(DistinctDest_Fixup): Ditto.
(enum): Add xh_mode, evex_half_bcst_xmmqh_mode, evex_half_bcst_xmmqdh_mode
and w_swap_mode.
(enum): Add PREFIX_EVEX_0F3A08_W_0, PREFIX_EVEX_0F3A0A_W_0,
PREFIX_EVEX_0F3A26, PREFIX_EVEX_0F3A27, PREFIX_EVEX_0F3A56,
PREFIX_EVEX_0F3A57, PREFIX_EVEX_0F3A66, PREFIX_EVEX_0F3A67,
PREFIX_EVEX_0F3AC2, PREFIX_EVEX_MAP5_10, PREFIX_EVEX_MAP5_11,
PREFIX_EVEX_MAP5_1D, PREFIX_EVEX_MAP5_2A, PREFIX_EVEX_MAP5_2C,
PREFIX_EVEX_MAP5_2D, PREFIX_EVEX_MAP5_2E, PREFIX_EVEX_MAP5_2F,
PREFIX_EVEX_MAP5_51, PREFIX_EVEX_MAP5_58, PREFIX_EVEX_MAP5_59,
PREFIX_EVEX_MAP5_5A_W_0, PREFIX_EVEX_MAP5_5A_W_1,
PREFIX_EVEX_MAP5_5B_W_0, PREFIX_EVEX_MAP5_5B_W_1,
PREFIX_EVEX_MAP5_5C, PREFIX_EVEX_MAP5_5D, PREFIX_EVEX_MAP5_5E,
PREFIX_EVEX_MAP5_5F, PREFIX_EVEX_MAP5_78, PREFIX_EVEX_MAP5_79,
PREFIX_EVEX_MAP5_7A, PREFIX_EVEX_MAP5_7B, PREFIX_EVEX_MAP5_7C,
PREFIX_EVEX_MAP5_7D_W_0, PREFIX_EVEX_MAP6_13, PREFIX_EVEX_MAP6_56,
PREFIX_EVEX_MAP6_57, PREFIX_EVEX_MAP6_D6, PREFIX_EVEX_MAP6_D7
(enum): Add EVEX_MAP5 and EVEX_MAP6.
(enum): Add EVEX_W_MAP5_5A, EVEX_W_MAP5_5B,
EVEX_W_MAP5_78_P_0, EVEX_W_MAP5_78_P_2, EVEX_W_MAP5_79_P_0,
EVEX_W_MAP5_79_P_2, EVEX_W_MAP5_7A_P_2, EVEX_W_MAP5_7A_P_3,
EVEX_W_MAP5_7B_P_2, EVEX_W_MAP5_7C_P_0, EVEX_W_MAP5_7C_P_2,
EVEX_W_MAP5_7D, EVEX_W_MAP6_13_P_0, EVEX_W_MAP6_13_P_2,
(get_valid_dis386): Properly handle new instructions.
(intel_operand_size): Handle new modes.
(OP_E_memory): Ditto.
(OP_EX): Ditto.
* i386-dis-evex.h: Updated for AVX512_FP16.
* i386-dis-evex-mod.h: Updated for AVX512_FP16.
* i386-dis-evex-prefix.h: Updated for AVX512_FP16.
* i386-dis-evex-reg.h : Updated for AVX512_FP16.
* i386-dis-evex-w.h : Updated for AVX512_FP16.
* i386-gen.c (cpu_flag_init): Add CPU_AVX512_FP16_FLAGS,
and CPU_ANY_AVX512_FP16_FLAGS. Update CPU_ANY_AVX512F_FLAGS
and CPU_ANY_AVX512BW_FLAGS.
(cpu_flags): Add CpuAVX512_FP16.
(opcode_modifiers): Add DistinctDest.
* i386-opc.h (enum): (AVX512_FP16): New.
(i386_opcode_modifier): Add reqdistinctreg.
(i386_cpu_flags): Add cpuavx512_fp16.
(EVEXMAP5): Defined as a macro.
(EVEXMAP6): Ditto.
* i386-opc.tbl: Add Intel AVX512_FP16 instructions.
* i386-init.h: Regenerated.
* i386-tbl.h: Ditto.
This way not only the overall (source) table size shrinks by quite a
bit and the risk of related templates going out of sync with one another
gets lowered, but also (dis)similarities between neighboring templates
become easier to spot.
Note that for certain SSE2AVX templates this results in benign attribute
changes:
- LDMXCSR and STMXCSR: NoAVX gets set,
- MOVMSKPS, PMOVMSKB, PEXTR{B,W} (register destination), and PINSR{B,W}
(register source): IgnoreSize and NoRex64 get set,
- CVT{DQ,PS}2PD, CVTSD2SS, MOVMSKPD, MOVDDUP, PMOV{S,Z}X{BW,WD,DQ}, and
ROUNDSD: NoRex64 gets set,
- CVTSS2SD, INSERTPS, PEXTRW (memory destination), PINSRW (memory
source), and PMOV{S,Z}X{BD,WQ,BQ}: IgnoreSize gets set.
Similarly the "normal" (non-SSE2AVX)
- non-64-bit CVTS{I,S}2SD forms get NoRex64 set,
- CMP{EQ,ORD,NEQ,UNORD}{P,S}{S,D} forms get C set,
all again in a benign way.
The remaining differences in the generated table are due to re-ordering
of entries in the course of being folded into templates.
The table entries are more natural to read (and slightly shorter) when
the prefixes, like is the case for VEX/XOP/EVEX-encoded entries, are
specified as part of the opcode. This is particularly noticable for
side-by-side legacy and SSE2AVX entries.
An implication is that we now need to use "unsigned long long" for the
initially parsed opcode in i386-gen. I don't expect this to be an issue.
Just like is already done for VEX/XOP/EVEX encoded insns, record the
encoding space information in the respective opcode modifier field. Do
this again without changing the source table, but rather by deriving the
values from their existing source representation.
In the majority of cases we can easily determine the length from the
encoding, irrespective of whether a prefix is specified there as well.
We further don't even need to record the value in the table entries, as
it's easy enough to determine it (without any guesswork, unless an insn
with major opcode 00 appeared that requires a 2nd opcode byte to be
specified explicitly) when installing the chosen template for further
processing.
Should an encoding appear which
- has a major opcode byte of 66, F3, or F2,
- requires a 2nd opcode byte to be specified explicitly,
- doesn't have a mandatory prefix
we'd need to convert all templates presently encoding a mandatory prefix
this way to the Prefix_0X<nn> model to eliminate the respective guessing
i386-gen does.
Just like is already done for legacy encoded insns, record the mandatory
prefix information in the respective opcode modifier field. Do this
without changing the source table, but rather by deriving the values from
their existing source representation.
This is in preparation of opcode_length going away as a field in the
templates. Identify pseudo prefixes by a base opcode of zero instead:
No real prefix has an opcode of zero. This at the same time allows
dropping a curious special case from i386-gen.
Since most attributes are identical for all pseudo prefixes, take the
opportunity and also template them.
Commit 8b65b8953a ("x86: Remove the prefix byte from non-VEX/EVEX
base_opcode") used the opcodeprefix field for two distinct purposes. In
preparation of having VEX/XOP/EVEX and non-VEX templates become similar
in the representatioon of both encoding space and opcode prefixes, split
the field to have a separate one holding an insn's opcode space.
RepPrefixOk, HLEPrefixOk, and NoTrackPrefixOk can't be specified
together, so can share an enum-like field. IsLockable can be inferred
from HLE setting and hence only needs specifying when neither of them
is present.
Having this count explicitly in the table is redundant and (even if just
slightly) disturbs clarity. Infer the count from the number of operands
actually found.
Also convert the "no operands" indicator from "{ 0 }" to just "{}", as
that (now) ends up being easier to parse.