Make our way through the AVX instructions: conversions.
This is all I have time for now... hopefully this can serve as a
generous source of examples.
Accept the gas mnemonics "ud2a" and "ud2b" for the instructions we
call ud2 and ud1 respectively, and Intel calls ud2 and undocumented :)
Also, 0F FF is ud0 regardless of prefixes, at least as far as we know.
Use compiler-generated bytecodes for the AVX instruction demos. This
should make it a lot easier for other people (HINT, HINT) to add the
instruction table.
First cut at AVX machinery support. The only instruction implemented
is VPERMIL2PS, and it's probably buggy. I'm checking this in with the
hope that other people can start helping out with (a) testing this,
and (b) adding instructions.
NDISASM support is not there yet.
Add "MOV reg64,imm32" as a special rule, to handle the case of
"mov rax,dword <foo>", where <foo> is sign-extended; this is a 7-byte
form, as opposed to "mov eax,<foo>" (5 bytes) and "mov rax,<foo>" (10
bytes).
At some point, the optimizer needs to be able to handle these.
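The three sizes mentioned above can be illustrated with a minimal Python sketch. The function names are hypothetical and only the rax/eax case is covered; the opcode bytes (B8+r, REX.W C7 /0, REX.W B8+r) are the standard x86-64 encodings, not a description of how the assembler itself is structured.

```python
import struct

def mov_eax_imm32(value):
    # B8+rd id: "mov eax,<foo>", 5 bytes (zero-extends into rax in 64-bit mode)
    return bytes([0xB8]) + struct.pack("<I", value & 0xFFFFFFFF)

def mov_rax_simm32(value):
    # REX.W C7 /0 id: "mov rax,dword <foo>", 7 bytes, immediate is sign-extended
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a signed 32-bit immediate")
    return bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", value)

def mov_rax_imm64(value):
    # REX.W B8+rd io: "mov rax,<foo>", 10 bytes, full 64-bit immediate
    return bytes([0x48, 0xB8]) + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)
```

An optimizer could pick the shortest form that still produces the requested 64-bit value; that selection logic is not shown here.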
Add the XSAVE group of instructions: XSAVE, XRSTOR, XGETBV, XSETBV.
The CPU feature information is bogus, but so is our entire handling of
CPU feature sets for anything but the bare necessities (long jump
emulation, etc.)
1. Allow included files in rdsrc.pl
2. New program inslist.pl to generate instruction list from insns.dat
3. Mark certain comments in insns.dat as documentation subheaders
4. Add Instruction List appendix to nasmdoc.src
5. Update build process to invoke inslist.pl
Mark MMX instructions with \323 (do not add REX.W) unless they involve
the integer instruction file.
Change SM -> SQ for MMX instructions.
The attached change is not complete, but my understanding is:
mmxreg,mmxrm needs SQ
Something like xmmreg,reg32 needs SD
xmmreg,xmmrm needs SO
The CMPSW/CMPSD/CMPSQ instructions were broken by checkin
a30cc07224 due to an incorrect removal
of \1 (should only have been removed after \144-147 and \154-157). I
have verified that no other instructions were affected.
- Correct the building of the disassembler decision tree.
- Handle SSE instructions with F2 prefix (\332) correctly.
- Mark instructions which are now used as prefixes with ND.
(In a future version when we have better CPU version handling,
we should probably build the decision tree at runtime based on
the selected CPU feature sets.)
- Sanitize the handling of \144-147 and \154-157 in both the assembler
and disassembler. They take an opcode byte as argument; don't
pretend they don't.
Support the zero-operand form of floating-point instructions. Note
that in most cases, the form generated is actually the "popping" form,
e.g. "FADD" becomes "FADDP st0,st1". This is in accordance with the
Intel documentation. "FADDP" is also supported.
Un-special-case "xchg rax,rax"; allow it to be encoded as 48 90 for
orthogonality's sake. It's a no-op, to be sure, but so are many other
instructions.
"xchg eax,eax" is still special-cased in 64-bit mode since it is not a
no-op; unadorned opcode 90 is now simply "nop" and nothing else.
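As a rough illustration of the distinction above, here is a small Python sketch of the byte sequences involved in 64-bit mode. The helper name is hypothetical, and the choice of 87 C0 (XCHG r/m32,r32 with ModRM C0) for "xchg eax,eax" is an assumption about one encoding that keeps the zero-extension side effect; it is not taken from the text above.

```python
# Unadorned 90 is plain "nop".
NOP = bytes([0x90])

def encode_self_xchg(reg):
    """Sketch: encode 'xchg reg,reg' for the accumulator in 64-bit mode."""
    encodings = {
        # REX.W + 90: architecturally a no-op, allowed for orthogonality.
        "rax": bytes([0x48, 0x90]),
        # A 32-bit write zero-extends into the upper half of rax, so this
        # cannot be encoded as bare 90; use the generic 87 /r form instead.
        "eax": bytes([0x87, 0xC0]),
    }
    return encodings[reg]
```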
Make the disassembler detect unused REX.W and display them as an "o64"
prefix.
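A minimal sketch of the idea, in Python: decode the REX byte and report "o64" when REX.W is set but the instruction does not consume it. The `insn_uses_w` flag is a hypothetical stand-in for whatever the instruction tables say; the real disassembler logic is not shown here.

```python
def decode_rex(byte):
    """Return the REX bit fields, or None if the byte is not a REX prefix."""
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": bool(byte & 0x08),  # 64-bit operand size
        "R": bool(byte & 0x04),  # extends ModRM.reg
        "X": bool(byte & 0x02),  # extends SIB.index
        "B": bool(byte & 0x01),  # extends ModRM.rm / SIB.base / opcode reg
    }

def unused_w_prefix(rex_byte, insn_uses_w):
    """Return 'o64' if REX.W is present but the instruction ignores it."""
    rex = decode_rex(rex_byte)
    if rex is not None and rex["W"] and not insn_uses_w:
        return "o64"
    return ""
```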
Revamp the address- and prefix-handling code to make more sense in
64-bit mode. We are now a lot closer to where we want to be, but
we're not quite there yet.
ndisasm may very well have problems, or give counterintuitive output.
However, I am checking it in so we can make forward progress.
0F 18-1F are reserved for hinting NOPs; they all take a single memory
operand which may be sized. Allow the use of systematic names; this
also makes sure they get sensibly disassembled.
INVLPGA is defined as taking rax,ecx but "the portion of rax used to
form the address is determined by the effective address size", so it
is really ax/eax/rax.
Auto-generate 0x67 prefixes without the need for \30x codes; the
prefix is automatically added when there is a memory operand with
address size differing from the current address size (and impossible
combinations are checked for).
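The rule can be sketched in Python as follows. The function name is hypothetical; the set of legal combinations reflects the architecture (no 64-bit addressing outside 64-bit mode, no 16-bit addressing inside it), not the assembler's actual code.

```python
def needs_a67(mode_bits, addr_bits):
    """Return True if a 0x67 prefix is needed for this memory operand.

    mode_bits: current address size (16, 32 or 64)
    addr_bits: address size implied by the memory operand
    Raises ValueError for combinations that cannot be encoded.
    """
    sizes = (16, 32, 64)
    if mode_bits not in sizes or addr_bits not in sizes:
        raise ValueError("address sizes must be 16, 32 or 64")
    if addr_bits == mode_bits:
        return False  # default address size, no prefix
    if mode_bits == 64 and addr_bits == 32:
        return True   # 0x67 selects 32-bit addressing in 64-bit mode
    if mode_bits in (16, 32) and addr_bits in (16, 32):
        return True   # 0x67 toggles between 16- and 32-bit addressing
    raise ValueError("impossible address size for this mode")
```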
The UMOV opcodes have been recycled; tag UMOV as ND until we have a
better way to specify to the disassembler exactly how it wants
instructions interpreted.
0F 1F /0 is documented as an EA-taking NOP since the P6.
0F 18..1F + EA are all "hinting nops" (instructions which, when
unimplemented, have no effect rather than #UD) but 0F 1F /0
specifically has no operation whatsoever.
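For illustration, a Python sketch that builds the bytes for one of these opcodes with a simple register-indirect memory operand. The function names are hypothetical, and only ModRM mod=00 forms without SIB or displacement are handled.

```python
def modrm(mod, reg, rm):
    # Pack the three ModRM fields into one byte.
    return (mod << 6) | (reg << 3) | rm

def hinting_nop(second_opcode, ext, rm=0):
    """Encode 0F 18..1F /ext with a [reg] memory operand (mod=00)."""
    if not 0x18 <= second_opcode <= 0x1F:
        raise ValueError("hinting NOPs live in 0F 18..1F")
    if not 0 <= ext <= 7:
        raise ValueError("/ext must be 0..7")
    if rm not in (0, 1, 2, 3, 6, 7):
        # rm=4 needs a SIB byte and rm=5 a disp32 (in 32/64-bit
        # addressing); both are outside this sketch.
        raise ValueError("unsupported rm for this sketch")
    return bytes([0x0F, second_opcode, modrm(0, ext, rm)])
```

With 32- or 64-bit addressing, `hinting_nop(0x1F, 0)` gives 0F 1F 00, the EA-taking NOP on [eax] or [rax].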
Implement oword, reso, do, as well as the SO flag to instructions. No
instructions are actually flagged with SO yet, but this allows us to
specify 128-bit sizes in instruction patterns.
This checkin completes what is required to actually generate SSE5
instructions. No support in the disassembler yet.
This checkin covers:
- Support for actually generating DREX prefixes.
- Support for operand-matching constraints ("operand X must match Y").
Add the SSSE3, SSE4.1 and SSE4.2 instruction sets. Change \332 to be
a literal 0xF2 prefix, by analogy with \333 for 0xF3 prefix (the
previous \332 flag changed to \335). This is necessary to get the REX
prefix in the right place for instructions that use it.
We are going to have to go in and change existing instruction patterns
which use these, as well.
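Why the prefix has to be literal can be seen from the byte order: the F2/F3 byte must come before REX, and REX must sit immediately before the opcode. A minimal Python sketch, with a hypothetical helper name, using the SSE4.2 CRC32 r64,r/m64 encoding (F2 REX.W 0F 38 F1 /r) as the example:

```python
def emit(mandatory_prefix, rex, opcode, modrm):
    """Assemble bytes in the order: mandatory prefix, REX, opcode, ModRM."""
    out = bytearray()
    if mandatory_prefix is not None:
        out.append(mandatory_prefix)  # F2/F3 must precede REX
    if rex is not None:
        out.append(rex)               # REX must immediately precede the opcode
    out += opcode
    out.append(modrm)
    return bytes(out)

# crc32 rax,rbx: ModRM = 11 000 011 (reg=rax, rm=rbx) = C3
CRC32_RAX_RBX = emit(0xF2, 0x48, bytes([0x0F, 0x38, 0xF1]), 0xC3)
```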
Use a script to find \321's that should be \324's. This is not in any
way guaranteed to be an exhaustive list; however, I have manually verified
that all the items that *were* changed *should* be changed.
- MOV gpr,CRx or MOV CRx,gpr can access high control registers with a LOCK
prefix; handle that in both the assembler and disassembler.
- Get a saner error message when trying to access high resources in
non-64-bit mode.
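A Python sketch of the "MOV gpr,CRx" case (opcode 0F 20 /r, with the control register in ModRM.reg). The function name is hypothetical, only low GPRs (0..7) are handled, and the policy of using REX.R in 64-bit mode and LOCK elsewhere is an assumption of this sketch rather than a statement of what the assembler does.

```python
def mov_gpr_from_cr(cr, gpr, mode_bits):
    """Encode 'mov gpr,crN' for N in 0..15 and a low GPR (0..7)."""
    if not 0 <= cr <= 15 or not 0 <= gpr <= 7:
        raise ValueError("register number out of range for this sketch")
    prefix = b""
    if cr >= 8:
        if mode_bits == 64:
            prefix = bytes([0x44])  # REX.R extends ModRM.reg
        else:
            prefix = bytes([0xF0])  # LOCK reaches the high control registers
    modrm = 0xC0 | ((cr & 7) << 3) | gpr  # mod=11, reg=CR, rm=GPR
    return prefix + bytes([0x0F, 0x20, modrm])
```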
The assembler doesn't seem to care, but for the disassembler, it's
vitally important that we get our operand-size hints correct. We
probably need to audit insns.dat for this kind of error.
CR8 is not special in any way as far as the assembler is concerned. It's
listed as having a special form in the Intel documentation, but that is
only because there are no other CRs which require a REX prefix.
MOV to CR8 is special in the sense that it's a non-serializing
instruction, but that's irrelevant to the assembler.
Furthermore, it's totally unclear how TRs should be handled in long mode;
there are no CPUs which use TRs and also have long mode, so the easiest
thing is to simply mark those instructions NOLONG.
Finally, add PRIV to some privileged instructions.
a) Automatically generate dependencies for all Makefiles;
b) Move register definitions to a separate .dat file;
c) Add support for "unimplemented but there in theory" registers.