mirror of
https://github.com/netwide-assembler/nasm.git
synced 2024-12-21 09:19:31 +08:00
6733 lines
263 KiB
Plaintext
|
\A{iref} x86 Instruction Reference
|
||
|
|
||
|
This appendix provides a complete list of the machine instructions
|
||
|
which NASM will assemble, and a short description of the function of
|
||
|
each one.
|
||
|
|
||
|
It is not intended to be exhaustive documentation of the fine
|
||
|
details of the instructions' function, such as which exceptions they
|
||
|
can trigger: for such documentation, you should go to Intel's Web
|
||
|
site, \W{http://developer.intel.com/design/Pentium4/manuals/}\c{http://developer.intel.com/design/Pentium4/manuals/}.
|
||
|
|
||
|
Instead, this appendix is intended primarily to provide
|
||
|
documentation on the way the instructions may be used within NASM.
|
||
|
For example, looking up \c{LOOP} will tell you that NASM allows
|
||
|
\c{CX} or \c{ECX} to be specified as an optional second argument to
|
||
|
the \c{LOOP} instruction, to enforce which of the two possible
|
||
|
counter registers should be used if the default is not the one
|
||
|
desired.
|
||
|
|
||
|
The instructions are not quite listed in alphabetical order, since
|
||
|
groups of instructions with similar functions are lumped together in
|
||
|
the same entry. Most of them don't move very far from their
|
||
|
alphabetic position because of this.
|
||
|
|
||
|
|
||
|
\H{iref-opr} Key to Operand Specifications
|
||
|
|
||
|
The instruction descriptions in this appendix specify their operands
|
||
|
using the following notation:
|
||
|
|
||
|
\b Registers: \c{reg8} denotes an 8-bit \i{general purpose
|
||
|
register}, \c{reg16} denotes a 16-bit general purpose register,
|
||
|
\c{reg32} a 32-bit one and \c{reg64} a 64-bit one. \c{fpureg} denotes
|
||
|
one of the eight FPU stack registers, \c{mmxreg} denotes one of the
|
||
|
eight 64-bit MMX registers, and \c{segreg} denotes a segment register.
|
||
|
\c{xmmreg} denotes one of the eight (or, in x64 long mode, sixteen) SSE XMM registers.
|
||
|
In addition, some registers (such as \c{AL}, \c{DX}, \c{ECX} or \c{RAX})
|
||
|
may be specified explicitly.
|
||
|
|
||
|
\b Immediate operands: \c{imm} denotes a generic \i{immediate operand}.
|
||
|
\c{imm8}, \c{imm16} and \c{imm32} are used when the operand is
|
||
|
intended to be a specific size. For some of these instructions, NASM
|
||
|
needs an explicit specifier: for example, \c{ADD ESP,16} could be
|
||
|
interpreted as either \c{ADD r/m32,imm32} or \c{ADD r/m32,imm8}.
|
||
|
NASM chooses the former by default, and so you must specify \c{ADD
|
||
|
ESP,BYTE 16} for the latter. As a special case, certain x64 forms of
the \c{MOV} instruction also accept an \c{imm64}.
|
||
|
|
||
|
\b Memory references: \c{mem} denotes a generic \i{memory reference};
|
||
|
\c{mem8}, \c{mem16}, \c{mem32}, \c{mem64} and \c{mem80} are used
|
||
|
when the operand needs to be a specific size. Again, a specifier is
|
||
|
needed in some cases: \c{DEC [address]} is ambiguous and will be
|
||
|
rejected by NASM. You must specify \c{DEC BYTE [address]}, \c{DEC
|
||
|
WORD [address]} or \c{DEC DWORD [address]} instead.
|
||
|
|
||
|
\b \i{Restricted memory references}: one form of the \c{MOV}
|
||
|
instruction allows a memory address to be specified \e{without}
|
||
|
allowing the normal range of register combinations and effective
|
||
|
address processing. This is denoted by \c{memoffs8}, \c{memoffs16},
|
||
|
\c{memoffs32} or \c{memoffs64}.
|
||
|
|
||
|
\b Register or memory choices: many instructions can accept either a
|
||
|
register \e{or} a memory reference as an operand. \c{r/m8} is
|
||
|
shorthand for \c{reg8/mem8}; similarly \c{r/m16} and \c{r/m32}.
|
||
|
In legacy x86 modes, \c{r/m64} is MMX-related, and is shorthand for
|
||
|
\c{mmxreg/mem64}. When utilizing the x86-64 architecture extension,
|
||
|
\c{r/m64} denotes use of a 64-bit GPR as well, and is shorthand for
|
||
|
\c{reg64/mem64}.
|
||
|
|
||
|
|
||
|
\H{iref-opc} Key to Opcode Descriptions
|
||
|
|
||
|
This appendix also provides the opcodes which NASM will generate for
|
||
|
each form of each instruction. The opcodes are listed in the
|
||
|
following way:
|
||
|
|
||
|
\b A hex number, such as \c{3F}, indicates a fixed byte containing
|
||
|
that number.
|
||
|
|
||
|
\b A hex number followed by \c{+r}, such as \c{C8+r}, indicates that
|
||
|
one of the operands to the instruction is a register, and the
|
||
|
`register value' of that register should be added to the hex number
|
||
|
to produce the generated byte. For example, EDX has register value
|
||
|
2, so the code \c{C8+r}, when the register operand is EDX, generates
|
||
|
the hex byte \c{CA}. Register values for specific registers are
|
||
|
given in \k{iref-rv}.
|
||
|
|
||
|
\b A hex number followed by \c{+cc}, such as \c{40+cc}, indicates
|
||
|
that the instruction name has a condition code suffix, and the
|
||
|
numeric representation of the condition code should be added to the
|
||
|
hex number to produce the generated byte. For example, the code
|
||
|
\c{40+cc}, when the instruction contains the \c{NE} condition,
|
||
|
generates the hex byte \c{45}. Condition codes and their numeric
|
||
|
representations are given in \k{iref-cc}.
|
||
|
|
||
|
\b A slash followed by a digit, such as \c{/2}, indicates that one
|
||
|
of the operands to the instruction is a memory address or register
|
||
|
(denoted \c{mem} or \c{r/m}, with an optional size). This is to be
|
||
|
encoded as an effective address, with a \i{ModR/M byte}, an optional
|
||
|
\i{SIB byte}, and an optional displacement, and the spare (register)
|
||
|
field of the ModR/M byte should be the digit given (which will be
|
||
|
from 0 to 7, so it fits in three bits). The encoding of effective
|
||
|
addresses is given in \k{iref-ea}.
|
||
|
|
||
|
\b The code \c{/r} combines the above two: it indicates that one of
|
||
|
the operands is a memory address or \c{r/m}, and another is a
|
||
|
register, and that an effective address should be generated with the
|
||
|
spare (register) field in the ModR/M byte being equal to the
|
||
|
`register value' of the register operand. The encoding of effective
|
||
|
addresses is given in \k{iref-ea}; register values are given in
|
||
|
\k{iref-rv}.
|
||
|
|
||
|
\b The codes \c{ib}, \c{iw} and \c{id} indicate that one of the
|
||
|
operands to the instruction is an immediate value, and that this is
|
||
|
to be encoded as a byte, little-endian word or little-endian
|
||
|
doubleword respectively.
|
||
|
|
||
|
\b The codes \c{rb}, \c{rw} and \c{rd} indicate that one of the
|
||
|
operands to the instruction is an immediate value, and that the
|
||
|
\e{difference} between this value and the address of the end of the
|
||
|
instruction is to be encoded as a byte, word or doubleword
|
||
|
respectively. Where the form \c{rw/rd} appears, it indicates that
|
||
|
either \c{rw} or \c{rd} should be used according to whether assembly
|
||
|
is being performed in \c{BITS 16} or \c{BITS 32} state respectively.
|
||
|
|
||
|
\b The codes \c{ow} and \c{od} indicate that one of the operands to
|
||
|
the instruction is a reference to the contents of a memory address
|
||
|
specified as an immediate value: this encoding is used in some forms
|
||
|
of the \c{MOV} instruction in place of the standard
|
||
|
effective-address mechanism. The displacement is encoded as a word
|
||
|
or doubleword. Again, \c{ow/od} denotes that \c{ow} or \c{od} should
|
||
|
be chosen according to the \c{BITS} setting.
|
||
|
|
||
|
\b The codes \c{o16} and \c{o32} indicate that the given form of the
|
||
|
instruction should be assembled with operand size 16 or 32 bits. In
|
||
|
other words, \c{o16} indicates a \c{66} prefix in \c{BITS 32} state,
|
||
|
but generates no code in \c{BITS 16} state; and \c{o32} indicates a
|
||
|
\c{66} prefix in \c{BITS 16} state but generates nothing in \c{BITS
|
||
|
32}.
|
||
|
|
||
|
\b The codes \c{a16} and \c{a32}, similarly to \c{o16} and \c{o32},
|
||
|
indicate the address size of the given form of the instruction.
|
||
|
Where this does not match the \c{BITS} setting, a \c{67} prefix is
|
||
|
required. Please note that \c{a16} is useless in 64-bit long mode, as
16-bit addressing is not available there.
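As a worked illustration of the \c{+r} and \c{+cc} notations described above, the following Python sketch computes the generated byte for the two examples given in the text. The helper names \c{plus_r} and \c{plus_cc} and the lookup tables are hypothetical, written only for this example; they are not part of NASM.

```python
# Hypothetical helpers illustrating the +r and +cc opcode notations.
# The register values and condition codes are copied from this appendix.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
COND = {"O": 0, "NO": 1, "B": 2, "AE": 3, "E": 4, "NE": 5,
        "BE": 6, "A": 7, "S": 8, "NS": 9, "P": 10, "NP": 11,
        "L": 12, "GE": 13, "LE": 14, "G": 15}

def plus_r(base, reg):
    """Byte generated by a 'base+r' opcode for a 32-bit register."""
    return base + REG32[reg]

def plus_cc(base, cond):
    """Byte generated by a 'base+cc' opcode for a condition code."""
    return base + COND[cond]

print(f"{plus_r(0xC8, 'EDX'):02X}")   # C8+r with EDX -> CA
print(f"{plus_cc(0x40, 'NE'):02X}")   # 40+cc with NE -> 45
```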
|
||
|
|
||
|
|
||
|
\S{iref-rv} Register Values
|
||
|
|
||
|
Where an instruction requires a register value, it is already
|
||
|
implicit in the encoding of the rest of the instruction what type of
|
||
|
register is intended: an 8-bit general-purpose register, a segment
|
||
|
register, a debug register, an MMX register, or whatever. Therefore
|
||
|
there is no problem with registers of different types sharing an
|
||
|
encoding value.
|
||
|
|
||
|
Please note that for the register classes listed below, the register
extensions (REX) classes require the use of the REX prefix, which
is only available in long mode on x86-64 processors. This applies
to any register with a number higher than 7, as well as to the 8-bit
registers \c{SPL}, \c{BPL}, \c{SIL} and \c{DIL}.
|
||
|
|
||
|
The encodings for the various classes of register are:
|
||
|
|
||
|
\b 8-bit general registers: \c{AL} is 0, \c{CL} is 1, \c{DL} is 2,
|
||
|
\c{BL} is 3, \c{AH} is 4, \c{CH} is 5, \c{DH} is 6 and \c{BH} is
|
||
|
7. Please note that \c{AH}, \c{BH}, \c{CH} and \c{DH} are not
|
||
|
addressable when using the REX prefix in long mode.
|
||
|
|
||
|
\b 8-bit general register extensions (REX): \c{SPL} is 4, \c{BPL} is 5,
|
||
|
\c{SIL} is 6, \c{DIL} is 7, \c{R8B} is 8, \c{R9B} is 9, \c{R10B} is 10,
|
||
|
\c{R11B} is 11, \c{R12B} is 12, \c{R13B} is 13, \c{R14B} is 14 and
|
||
|
\c{R15B} is 15.
|
||
|
|
||
|
\b 16-bit general registers: \c{AX} is 0, \c{CX} is 1, \c{DX} is 2,
|
||
|
\c{BX} is 3, \c{SP} is 4, \c{BP} is 5, \c{SI} is 6, and \c{DI} is 7.
|
||
|
|
||
|
\b 16-bit general register extensions (REX): \c{R8W} is 8, \c{R9W} is 9,
|
||
|
\c{R10W} is 10, \c{R11W} is 11, \c{R12W} is 12, \c{R13W} is 13, \c{R14W}
|
||
|
is 14 and \c{R15W} is 15.
|
||
|
|
||
|
\b 32-bit general registers: \c{EAX} is 0, \c{ECX} is 1, \c{EDX} is
|
||
|
2, \c{EBX} is 3, \c{ESP} is 4, \c{EBP} is 5, \c{ESI} is 6, and
|
||
|
\c{EDI} is 7.
|
||
|
|
||
|
\b 32-bit general register extensions (REX): \c{R8D} is 8, \c{R9D} is 9,
|
||
|
\c{R10D} is 10, \c{R11D} is 11, \c{R12D} is 12, \c{R13D} is 13, \c{R14D}
|
||
|
is 14 and \c{R15D} is 15.
|
||
|
|
||
|
\b 64-bit general register extensions (REX): \c{RAX} is 0, \c{RCX} is 1,
|
||
|
\c{RDX} is 2, \c{RBX} is 3, \c{RSP} is 4, \c{RBP} is 5, \c{RSI} is 6,
|
||
|
\c{RDI} is 7, \c{R8} is 8, \c{R9} is 9, \c{R10} is 10, \c{R11} is 11,
|
||
|
\c{R12} is 12, \c{R13} is 13, \c{R14} is 14 and \c{R15} is 15.
|
||
|
|
||
|
\b \i{Segment registers}: \c{ES} is 0, \c{CS} is 1, \c{SS} is 2, \c{DS}
|
||
|
is 3, \c{FS} is 4, and \c{GS} is 5.
|
||
|
|
||
|
\b \I{floating-point, registers}Floating-point registers: \c{ST0}
|
||
|
is 0, \c{ST1} is 1, \c{ST2} is 2, \c{ST3} is 3, \c{ST4} is 4,
|
||
|
\c{ST5} is 5, \c{ST6} is 6, and \c{ST7} is 7.
|
||
|
|
||
|
\b 64-bit \i{MMX registers}: \c{MM0} is 0, \c{MM1} is 1, \c{MM2} is 2,
|
||
|
\c{MM3} is 3, \c{MM4} is 4, \c{MM5} is 5, \c{MM6} is 6, and \c{MM7}
|
||
|
is 7.
|
||
|
|
||
|
\b 128-bit \i{XMM (SSE) registers}: \c{XMM0} is 0, \c{XMM1} is 1,
|
||
|
\c{XMM2} is 2, \c{XMM3} is 3, \c{XMM4} is 4, \c{XMM5} is 5, \c{XMM6} is
|
||
|
6 and \c{XMM7} is 7.
|
||
|
|
||
|
\b 128-bit \i{XMM (SSE) register} extensions (REX): \c{XMM8} is 8,
|
||
|
\c{XMM9} is 9, \c{XMM10} is 10, \c{XMM11} is 11, \c{XMM12} is 12,
|
||
|
\c{XMM13} is 13, \c{XMM14} is 14 and \c{XMM15} is 15.
|
||
|
|
||
|
\b \i{Control registers}: \c{CR0} is 0, \c{CR2} is 2, \c{CR3} is 3,
|
||
|
and \c{CR4} is 4.
|
||
|
|
||
|
\b \i{Control register} extensions: \c{CR8} is 8.
|
||
|
|
||
|
\b \i{Debug registers}: \c{DR0} is 0, \c{DR1} is 1, \c{DR2} is 2,
|
||
|
\c{DR3} is 3, \c{DR6} is 6, and \c{DR7} is 7.
|
||
|
|
||
|
\b \i{Test registers}: \c{TR3} is 3, \c{TR4} is 4, \c{TR5} is 5,
|
||
|
\c{TR6} is 6, and \c{TR7} is 7.
|
||
|
|
||
|
(Note that wherever a register name contains a number, that number
|
||
|
is also the register value for that register.)
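Register values from 8 to 15 do not fit in a three-bit field, so the fourth bit has to travel in the REX prefix (see \k{iref-rex}). The short Python sketch below shows that split; the function name \c{split_register_value} is a hypothetical helper for this illustration only.

```python
def split_register_value(value):
    """Split a register value (0-15) into its 3-bit field and REX bit."""
    if not 0 <= value <= 15:
        raise ValueError("register value must be in 0..15")
    low3 = value & 7        # stored in the ModR/M, SIB or opcode field
    rex_bit = value >> 3    # 1 means a REX prefix must supply bit 3
    return low3, rex_bit

for name, value in (("EDX", 2), ("R10D", 10), ("XMM15", 15)):
    print(name, split_register_value(value))
```

Note that this sketch covers only the numeric split; it does not model \c{SPL}, \c{BPL}, \c{SIL} and \c{DIL}, which need a REX prefix even though their values are below 8.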
|
||
|
|
||
|
|
||
|
\S{iref-cc} \i{Condition Codes}
|
||
|
|
||
|
The available condition codes are given here, along with their
|
||
|
numeric representations as part of opcodes. Many of these condition
|
||
|
codes have synonyms, so several will be listed at a time.
|
||
|
|
||
|
In the following descriptions, the word `either', when applied to two
|
||
|
possible trigger conditions, is used to mean `either or both'. If
|
||
|
`either but not both' is meant, the phrase `exactly one of' is used.
|
||
|
|
||
|
\b \c{O} is 0 (trigger if the overflow flag is set); \c{NO} is 1.
|
||
|
|
||
|
\b \c{B}, \c{C} and \c{NAE} are 2 (trigger if the carry flag is
|
||
|
set); \c{AE}, \c{NB} and \c{NC} are 3.
|
||
|
|
||
|
\b \c{E} and \c{Z} are 4 (trigger if the zero flag is set); \c{NE}
|
||
|
and \c{NZ} are 5.
|
||
|
|
||
|
\b \c{BE} and \c{NA} are 6 (trigger if either of the carry or zero
|
||
|
flags is set); \c{A} and \c{NBE} are 7.
|
||
|
|
||
|
\b \c{S} is 8 (trigger if the sign flag is set); \c{NS} is 9.
|
||
|
|
||
|
\b \c{P} and \c{PE} are 10 (trigger if the parity flag is set);
|
||
|
\c{NP} and \c{PO} are 11.
|
||
|
|
||
|
\b \c{L} and \c{NGE} are 12 (trigger if exactly one of the sign and
|
||
|
overflow flags is set); \c{GE} and \c{NL} are 13.
|
||
|
|
||
|
\b \c{LE} and \c{NG} are 14 (trigger if either the zero flag is set,
|
||
|
or exactly one of the sign and overflow flags is set); \c{G} and
|
||
|
\c{NLE} are 15.
|
||
|
|
||
|
Note that in all cases, the sense of a condition code may be
|
||
|
reversed by changing the low bit of the numeric representation.
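The low-bit rule can be checked with a few lines of Python. This is a minimal sketch: the list \c{CODES} holds one name per numeric value (synonyms omitted), and \c{invert_condition} is a hypothetical helper name.

```python
# One name per numeric condition code, in order 0..15 (synonyms omitted).
CODES = ["O", "NO", "B", "AE", "E", "NE", "BE", "A",
         "S", "NS", "P", "NP", "L", "GE", "LE", "G"]

def invert_condition(name):
    """Return the condition whose sense is the reverse of `name`."""
    return CODES[CODES.index(name) ^ 1]   # flip the low bit

for name in ("E", "B", "L", "G"):
    print(name, "->", invert_condition(name))
```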
|
||
|
|
||
|
For details of when an instruction sets each of the status flags,
|
||
|
see the individual instruction, plus the Status Flags reference
|
||
|
in \k{iref-Flags}.
|
||
|
|
||
|
|
||
|
\S{iref-SSE-cc} \i{SSE Condition Predicates}
|
||
|
|
||
|
The condition predicates for SSE comparison instructions are the
|
||
|
codes used as part of the opcode, to determine what form of
|
||
|
comparison is being carried out. In each case, the imm8 value is
|
||
|
the final byte of the opcode encoding, and the predicate is the
|
||
|
code used as part of the mnemonic for the instruction (equivalent
|
||
|
to the "cc" in an integer instruction that used a condition code).
|
||
|
The instructions that use this will give details of what the various
mnemonics are; this table is intended to help you work out the details
of what is happening.
|
||
|
|
||
|
\c Predi-  imm8    Description     Relation where:   Emula-     Result   QNaN
\c cate    Encod-                  A Is 1st Operand  tion       if NaN   Signal
\c         ing                     B Is 2nd Operand             Operand  Invalid
\c
\c EQ      000B    equal           A = B                        False    No
\c
\c LT      001B    less-than       A < B                        False    Yes
\c
\c LE      010B    less-than-      A <= B                       False    Yes
\c                 or-equal
\c
\c ---     ----    greater         A > B             Swap       False    Yes
\c                 than                              Operands,
\c                                                   Use LT
\c
\c ---     ----    greater-        A >= B            Swap       False    Yes
\c                 than-or-equal                     Operands,
\c                                                   Use LE
\c
\c UNORD   011B    unordered       A, B = Unordered             True     No
\c
\c NEQ     100B    not-equal       A != B                       True     No
\c
\c NLT     101B    not-less-       NOT(A < B)                   True     Yes
\c                 than
\c
\c NLE     110B    not-less-       NOT(A <= B)                  True     Yes
\c                 than-or-
\c                 equal
\c
\c ---     ----    not-greater     NOT(A > B)        Swap       True     Yes
\c                 than                              Operands,
\c                                                   Use NLT
\c
\c ---     ----    not-greater     NOT(A >= B)       Swap       True     Yes
\c                 than-                             Operands,
\c                 or-equal                          Use NLE
\c
\c ORD     111B    ordered         A, B = Ordered               False    No
|
||
|
|
||
|
The unordered relationship is true when at least one of the two
|
||
|
values being compared is a NaN or in an unsupported format.
|
||
|
|
||
|
Note that the comparisons which are listed as not having a predicate
|
||
|
or encoding can only be achieved through software emulation, as
|
||
|
described in the "emulation" column. Note in particular that a
comparison such as \c{greater-than} is not the same as \c{NLE}, as,
|
||
|
unlike with the \c{CMP} instruction, it has to take into account the
|
||
|
possibility of one operand containing a NaN or an unsupported numeric
|
||
|
format.
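The difference between an emulated \c{greater-than} and \c{NLE} can be seen in a small Python model. This is only a sketch of the truth values in the table: it relies on Python's float comparisons returning false whenever a NaN is involved, it does not model the invalid-operation signalling, and the function names are hypothetical.

```python
import math

def cmp_lt(a, b):
    """LT predicate: false if either operand is a NaN."""
    return a < b

def cmp_nle(a, b):
    """NLE predicate: NOT(A <= B), so true if either operand is a NaN."""
    return not (a <= b)

def cmp_gt(a, b):
    """Emulated greater-than: swap the operands and use LT."""
    return cmp_lt(b, a)

nan = math.nan
print(cmp_gt(2.0, 1.0), cmp_nle(2.0, 1.0))   # agree for ordinary numbers
print(cmp_gt(nan, 1.0), cmp_nle(nan, 1.0))   # differ when a NaN is involved
```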
|
||
|
|
||
|
|
||
|
\S{iref-Flags} \i{Status Flags}
|
||
|
|
||
|
The status flags provide some information about the result of the
|
||
|
arithmetic instructions. This information can be used by conditional
|
||
|
instructions (such as \c{Jcc} and \c{CMOVcc}) as well as by some of
|
||
|
the other instructions (such as \c{ADC} and \c{INTO}).
|
||
|
|
||
|
There are 6 status flags:
|
||
|
|
||
|
\c CF - Carry flag.
|
||
|
|
||
|
Set if an arithmetic operation generates a
|
||
|
carry or a borrow out of the most-significant bit of the result;
|
||
|
cleared otherwise. This flag indicates an overflow condition for
|
||
|
unsigned-integer arithmetic. It is also used in multiple-precision
|
||
|
arithmetic.
|
||
|
|
||
|
\c PF - Parity flag.
|
||
|
|
||
|
Set if the least-significant byte of the result contains an even
|
||
|
number of 1 bits; cleared otherwise.
|
||
|
|
||
|
\c AF - Adjust flag.
|
||
|
|
||
|
Set if an arithmetic operation generates a carry or a borrow
|
||
|
out of bit 3 of the result; cleared otherwise. This flag is used
|
||
|
in binary-coded decimal (BCD) arithmetic.
|
||
|
|
||
|
\c ZF - Zero flag.
|
||
|
|
||
|
Set if the result is zero; cleared otherwise.
|
||
|
|
||
|
\c SF - Sign flag.
|
||
|
|
||
|
Set equal to the most-significant bit of the result, which is the
|
||
|
sign bit of a signed integer. (0 indicates a positive value and 1
|
||
|
indicates a negative value.)
|
||
|
|
||
|
\c OF - Overflow flag.
|
||
|
|
||
|
Set if the integer result is too large a positive number or too
|
||
|
small a negative number (excluding the sign-bit) to fit in the
|
||
|
destination operand; cleared otherwise. This flag indicates an
|
||
|
overflow condition for signed-integer (two's complement) arithmetic.
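To make the flag definitions above concrete, here is a hedged Python sketch that models an 8-bit addition and derives the six status flags from it. The function \c{add8_flags} is a hypothetical helper, it assumes both inputs are already in the range 0 to 255, and it models only addition, not every arithmetic instruction.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) per the rules above."""
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                       # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),       # carry out of bit 3
        "ZF": int(result == 0),
        "SF": result >> 7,                            # most-significant bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add8_flags(0x7F, 0x01))   # signed overflow: 127 + 1
print(add8_flags(0xFF, 0x01))   # unsigned carry: 255 + 1
```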
|
||
|
|
||
|
|
||
|
\S{iref-ea} Effective Address Encoding: \i{ModR/M} and \i{SIB}
|
||
|
|
||
|
An \i{effective address} is encoded in up to three parts: a ModR/M
|
||
|
byte, an optional SIB byte, and an optional byte, word or doubleword
|
||
|
displacement field.
|
||
|
|
||
|
The ModR/M byte consists of three fields: the \c{mod} field, ranging
|
||
|
from 0 to 3, in the upper two bits of the byte, the \c{r/m} field,
|
||
|
ranging from 0 to 7, in the lower three bits, and the spare
|
||
|
(register) field in the middle (bit 3 to bit 5). The spare field is
|
||
|
not relevant to the effective address being encoded, and either
|
||
|
contains an extension to the instruction opcode or the register
|
||
|
value of another operand.
|
||
|
|
||
|
The ModR/M system can be used to encode a direct register reference
|
||
|
rather than a memory access. This is always done by setting the
|
||
|
\c{mod} field to 3 and the \c{r/m} field to the register value of
|
||
|
the register in question (it must be a general-purpose register, and
|
||
|
the size of the register must already be implicit in the encoding of
|
||
|
the rest of the instruction). In this case, the SIB byte and
|
||
|
displacement field are both absent.
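The field layout can be illustrated with a short Python sketch that packs a ModR/M byte and uses it for a register-direct operand. The helper name \c{modrm} is hypothetical, and the byte sequence shown is one valid encoding derived from the \c{01 /r} form listed under \c{ADD}, assuming \c{BITS 32} so that no operand-size prefix is needed.

```python
def modrm(mod, spare, rm):
    """Pack the mod, spare (register) and r/m fields into one byte."""
    if not (0 <= mod <= 3 and 0 <= spare <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (spare << 3) | rm

# ADD r/m32,reg32 is listed as "o32 01 /r". For ADD EBX,ECX the r/m
# operand is EBX (register value 3) and the register operand is ECX (1).
# mod = 3 selects a direct register reference, so no SIB or displacement.
encoding = bytes([0x01, modrm(3, 1, 3)])
print(encoding.hex())
```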
|
||
|
|
||
|
In 16-bit addressing mode (either \c{BITS 16} with no \c{67} prefix,
|
||
|
or \c{BITS 32} with a \c{67} prefix), the SIB byte is never used.
|
||
|
The general rules for \c{mod} and \c{r/m} (there is an exception,
|
||
|
given below) are:
|
||
|
|
||
|
\b The \c{mod} field gives the length of the displacement field: 0
|
||
|
means no displacement, 1 means one byte, and 2 means two bytes.
|
||
|
|
||
|
\b The \c{r/m} field encodes the combination of registers to be
|
||
|
added to the displacement to give the accessed address: 0 means
|
||
|
\c{BX+SI}, 1 means \c{BX+DI}, 2 means \c{BP+SI}, 3 means \c{BP+DI},
|
||
|
4 means \c{SI} only, 5 means \c{DI} only, 6 means \c{BP} only, and 7
|
||
|
means \c{BX} only.
|
||
|
|
||
|
However, there is a special case:
|
||
|
|
||
|
\b If \c{mod} is 0 and \c{r/m} is 6, the effective address encoded
|
||
|
is not \c{[BP]} as the above rules would suggest, but instead
|
||
|
\c{[disp16]}: the displacement field is present and is two bytes
|
||
|
long, and no registers are added to the displacement.
|
||
|
|
||
|
Therefore the effective address \c{[BP]} cannot be encoded as
|
||
|
efficiently as \c{[BX]}; so if you code \c{[BP]} in a program, NASM
|
||
|
adds a notional 8-bit zero displacement, and sets \c{mod} to 1,
|
||
|
\c{r/m} to 6, and the one-byte displacement field to 0.
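The 16-bit rules, including the \c{[BP]} special case, can be sketched in Python as follows. This is an illustration under stated assumptions rather than NASM's actual algorithm: \c{encode_ea16} is a hypothetical helper, it always picks the shortest displacement that fits, and it returns only the ModR/M byte and displacement bytes.

```python
# r/m values for the 16-bit register combinations listed above.
RM16 = {("BX", "SI"): 0, ("BX", "DI"): 1, ("BP", "SI"): 2, ("BP", "DI"): 3,
        ("SI",): 4, ("DI",): 5, ("BP",): 6, ("BX",): 7}

def encode_ea16(regs, disp=0, spare=0):
    """Return ModR/M plus displacement bytes for a 16-bit address.

    regs is a tuple such as ("BX", "SI") or ("BP",), or () for [disp16].
    """
    if not regs:
        mod, rm, size = 0, 6, 2           # special case: [disp16]
    else:
        rm = RM16[tuple(regs)]
        if disp == 0 and rm != 6:
            mod, size = 0, 0              # no displacement needed
        elif -128 <= disp <= 127:
            mod, size = 1, 1              # [BP] becomes [BP+0] here
        else:
            mod, size = 2, 2
    byte = (mod << 6) | (spare << 3) | rm
    disp_bytes = (disp & 0xFFFF).to_bytes(2, "little")[:size]
    return bytes([byte]) + disp_bytes

print(encode_ea16(("BX",)).hex())         # [BX]
print(encode_ea16(("BP",)).hex())         # [BP], stored as [BP+0]
print(encode_ea16((), 0x1234).hex())      # [0x1234]
```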
|
||
|
|
||
|
In 32-bit addressing mode (either \c{BITS 16} with a \c{67} prefix,
|
||
|
or \c{BITS 32} with no \c{67} prefix) the general rules (again,
|
||
|
there are exceptions) for \c{mod} and \c{r/m} are:
|
||
|
|
||
|
\b The \c{mod} field gives the length of the displacement field: 0
|
||
|
means no displacement, 1 means one byte, and 2 means four bytes.
|
||
|
|
||
|
\b If only one register is to be added to the displacement, and it
|
||
|
is not \c{ESP}, the \c{r/m} field gives its register value, and the
|
||
|
SIB byte is absent. If the \c{r/m} field is 4 (which would encode
|
||
|
\c{ESP}), the SIB byte is present and gives the combination and
|
||
|
scaling of registers to be added to the displacement.
|
||
|
|
||
|
If the SIB byte is present, it describes the combination of
|
||
|
registers (an optional base register, and an optional index register
|
||
|
scaled by multiplication by 1, 2, 4 or 8) to be added to the
|
||
|
displacement. The SIB byte is divided into the \c{scale} field, in
|
||
|
the top two bits, the \c{index} field in the next three, and the
|
||
|
\c{base} field in the bottom three. The general rules are:
|
||
|
|
||
|
\b The \c{base} field encodes the register value of the base
|
||
|
register.
|
||
|
|
||
|
\b The \c{index} field encodes the register value of the index
|
||
|
register, unless it is 4, in which case no index register is used
|
||
|
(so \c{ESP} cannot be used as an index register).
|
||
|
|
||
|
\b The \c{scale} field encodes the multiplier by which the index
|
||
|
register is scaled before adding it to the base and displacement: 0
|
||
|
encodes a multiplier of 1, 1 encodes 2, 2 encodes 4 and 3 encodes 8.
|
||
|
|
||
|
The exceptions to the 32-bit encoding rules are:
|
||
|
|
||
|
\b If \c{mod} is 0 and \c{r/m} is 5, the effective address encoded
|
||
|
is not \c{[EBP]} as the above rules would suggest, but instead
|
||
|
\c{[disp32]}: the displacement field is present and is four bytes
|
||
|
long, and no registers are added to the displacement.
|
||
|
|
||
|
\b If \c{mod} is 0, \c{r/m} is 4 (meaning the SIB byte is present)
|
||
|
and \c{base} is 5, the effective address encoded is not
|
||
|
\c{[EBP+index]} as the above rules would suggest, but instead
|
||
|
\c{[disp32+index]}: the displacement field is present and is four
|
||
|
bytes long, and there is no base register (but the index register is
|
||
|
still processed in the normal way).
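The 32-bit rules and their two exceptions can be sketched in Python in the same spirit. The helper \c{encode_ea32} is hypothetical and is not NASM's implementation: it assumes the shortest displacement is always chosen, and it returns only the ModR/M byte, the optional SIB byte and the displacement bytes.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SCALE = {1: 0, 2: 1, 4: 2, 8: 3}

def encode_ea32(base=None, index=None, scale=1, disp=0, spare=0):
    """Return ModR/M, optional SIB and displacement for a 32-bit address."""
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    need_sib = index is not None or base == "ESP"
    if base is None:
        # [disp32] or [disp32+index*scale]: mod 0 with a base/r/m value of 5
        mod, size, base_val = 0, 4, 5
    else:
        base_val = REG32[base]
        if disp == 0 and base != "EBP":
            mod, size = 0, 0
        elif -128 <= disp <= 127:
            mod, size = 1, 1              # [EBP] becomes [EBP+0] here
        else:
            mod, size = 2, 4
    out = []
    if need_sib:
        index_val = 4 if index is None else REG32[index]   # 4 = no index
        out.append((mod << 6) | (spare << 3) | 4)          # r/m 4: SIB follows
        out.append((SCALE[scale] << 6) | (index_val << 3) | base_val)
    else:
        out.append((mod << 6) | (spare << 3) | base_val)
    return bytes(out) + (disp & 0xFFFFFFFF).to_bytes(4, "little")[:size]

print(encode_ea32(base="ESP").hex())                                 # [ESP]
print(encode_ea32(base="EBX", index="ESI", scale=4, disp=8).hex())   # [EBX+ESI*4+8]
```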
|
||
|
|
||
|
|
||
|
\S{iref-rex} Register Extensions: The \i{REX} Prefix
|
||
|
|
||
|
The Register Extensions prefix, or \i{REX} for short, is the means
of accessing extended registers on the x86-64 architecture. \i{REX}
|
||
|
is considered an instruction prefix, but is required to be after
|
||
|
all other prefixes and thus immediately before the first instruction
|
||
|
opcode itself. So overall, \i{REX} can be thought of as an "Opcode
|
||
|
Prefix" instead. The \i{REX} prefix itself is indicated by a value
|
||
|
of 0x4X, where X is one of 16 different combinations of the actual
|
||
|
\i{REX} flags.
|
||
|
|
||
|
The \i{REX} prefix flags consist of four 1-bit extension fields.
|
||
|
These flags are found in the lower nibble of the actual \i{REX}
|
||
|
prefix opcode. Below is the list of \i{REX} prefix flags, from
|
||
|
high bit to low bit.
|
||
|
|
||
|
\c{REX.W}: When set, this flag indicates the use of a 64-bit operand,
|
||
|
as opposed to the default of using 32-bit operands as found in 32-bit
|
||
|
Protected Mode.
|
||
|
|
||
|
\c{REX.R}: When set, this flag extends the \c{reg (spare)} field of
|
||
|
the \c{ModRM} byte. Overall, this raises the number of addressable
|
||
|
registers in this field from 8 to 16.
|
||
|
|
||
|
\c{REX.X}: When set, this flag extends the \c{index} field of the
|
||
|
\c{SIB} byte. Overall, this raises the number of addressable
|
||
|
registers in this field from 8 to 16.
|
||
|
|
||
|
\c{REX.B}: When set, this flag extends the \c{r/m} field of the
\c{ModRM} byte or the \c{base} field of the \c{SIB} byte. This flag
can also represent an extension to the register encoded in the opcode
byte itself (the \c{+r} forms). The determination of which is used
varies depending on which instruction is used. Overall, this raises
the number of addressable registers in these fields from 8 to 16.
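Since the four flags occupy the low nibble of a byte whose high nibble is 4, the prefix byte can be computed directly. The following Python sketch shows this; \c{rex_prefix} is a hypothetical helper name and each argument is assumed to be 0 or 1.

```python
def rex_prefix(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte laid out as 0100WRXB (high bit to low bit)."""
    for flag in (w, r, x, b):
        if flag not in (0, 1):
            raise ValueError("each REX flag must be 0 or 1")
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

print(hex(rex_prefix(w=1)))         # 64-bit operand, no register extension
print(hex(rex_prefix(w=1, b=1)))    # 64-bit operand, extended r/m register
```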
|
||
|
|
||
|
Internal use of the \i{REX} prefix by the processor is consistent,
yet non-trivial. Most instructions use the \i{REX} prefix as
indicated by the above flags. Some instructions require the \i{REX}
prefix to be present even if the flags are empty. Some instructions
default to a 64-bit operand and require the \i{REX} prefix only for
actual register extensions, and thus ignore the \c{REX.W} field
completely.
|
||
|
|
||
|
At any rate, NASM is designed to handle, and fully supports, the
|
||
|
\i{REX} prefix internally. Please read the appropriate processor
|
||
|
documentation for further information on the \i{REX} prefix.
|
||
|
|
||
|
You may have noticed that opcodes 0x40 through 0x4F are actually
|
||
|
opcodes for the INC/DEC instructions for each General Purpose
|
||
|
Register. This is, of course, correct... for legacy x86. While
|
||
|
in long mode, opcodes 0x40 through 0x4F are reserved for use as
|
||
|
the REX prefix. The other opcode forms of the INC/DEC instructions
|
||
|
are used instead.
|
||
|
|
||
|
|
||
|
\H{iref-flg} Key to Instruction Flags
|
||
|
|
||
|
Given along with each instruction in this appendix is a set of
|
||
|
flags, denoting the type of the instruction. The types are as follows:
|
||
|
|
||
|
\b \c{8086}, \c{186}, \c{286}, \c{386}, \c{486}, \c{PENT} and \c{P6}
|
||
|
denote the lowest processor type that supports the instruction. Most
|
||
|
instructions run on all processors above the given type; those that
|
||
|
do not are documented. The Pentium II contains no additional
|
||
|
instructions beyond the P6 (Pentium Pro); from the point of view of
|
||
|
its instruction set, it can be thought of as a P6 with MMX
|
||
|
capability.
|
||
|
|
||
|
\b \c{3DNOW} indicates that the instruction is a 3DNow! one, and will
|
||
|
run on the AMD K6-2 and later processors. ATHLON extensions to the
|
||
|
3DNow! instruction set are documented as such.
|
||
|
|
||
|
\b \c{CYRIX} indicates that the instruction is specific to Cyrix
|
||
|
processors, for example the extra MMX instructions in the Cyrix
|
||
|
extended MMX instruction set.
|
||
|
|
||
|
\b \c{FPU} indicates that the instruction is a floating-point one,
|
||
|
and will only run on machines with a coprocessor (automatically
|
||
|
including 486DX, Pentium and above).
|
||
|
|
||
|
\b \c{KATMAI} indicates that the instruction was introduced as part
|
||
|
of the Katmai New Instruction set. These instructions are available
|
||
|
on the Pentium III and later processors. Those which are not
|
||
|
specifically SSE instructions are also available on the AMD Athlon.
|
||
|
|
||
|
\b \c{MMX} indicates that the instruction is an MMX one, and will
|
||
|
run on MMX-capable Pentium processors and the Pentium II.
|
||
|
|
||
|
\b \c{PRIV} indicates that the instruction is a protected-mode
|
||
|
management instruction. Many of these may only be used in protected
|
||
|
mode, or only at privilege level zero.
|
||
|
|
||
|
\b \c{SSE} and \c{SSE2} indicate that the instruction is a Streaming
|
||
|
SIMD Extension instruction. These instructions operate on multiple
|
||
|
values in a single operation. SSE was introduced with the Pentium III
|
||
|
and SSE2 was introduced with the Pentium 4.
|
||
|
|
||
|
\b \c{UNDOC} indicates that the instruction is an undocumented one,
|
||
|
and not part of the official Intel Architecture; it may or may not
|
||
|
be supported on any given machine.
|
||
|
|
||
|
\b \c{WILLAMETTE} indicates that the instruction was introduced as
|
||
|
part of the new instruction set in the Pentium 4 and Intel Xeon
|
||
|
processors. These instructions are also known as SSE2 instructions.
|
||
|
|
||
|
\b \c{X64} indicates that the instruction was introduced as part of
|
||
|
the new instruction set in the x86-64 architecture extension,
|
||
|
commonly referred to as x64, AMD64 or EM64T.
|
||
|
|
||
|
|
||
|
\H{iref-inst} x86 Instruction Set
|
||
|
|
||
|
|
||
|
\S{insAAA} \i\c{AAA}, \i\c{AAS}, \i\c{AAM}, \i\c{AAD}: ASCII
|
||
|
Adjustments
|
||
|
|
||
|
\c AAA ; 37 [8086]
|
||
|
|
||
|
\c AAS ; 3F [8086]
|
||
|
|
||
|
\c AAD ; D5 0A [8086]
|
||
|
\c AAD imm ; D5 ib [8086]
|
||
|
|
||
|
\c AAM ; D4 0A [8086]
|
||
|
\c AAM imm ; D4 ib [8086]
|
||
|
|
||
|
These instructions are used in conjunction with the add, subtract,
|
||
|
multiply and divide instructions to perform binary-coded decimal
|
||
|
arithmetic in \e{unpacked} (one BCD digit per byte - easy to
|
||
|
translate to and from \c{ASCII}, hence the instruction names) form.
|
||
|
There are also packed BCD instructions \c{DAA} and \c{DAS}: see
|
||
|
\k{insDAA}.
|
||
|
|
||
|
\b \c{AAA} (ASCII Adjust After Addition) should be used after a
|
||
|
one-byte \c{ADD} instruction whose destination was the \c{AL}
|
||
|
register: by means of examining the value in the low nibble of
|
||
|
\c{AL} and also the auxiliary carry flag \c{AF}, it determines
|
||
|
whether the addition has overflowed, and adjusts it (and sets
|
||
|
the carry flag) if so. You can add long BCD strings together
|
||
|
by doing \c{ADD}/\c{AAA} on the low digits, then doing
|
||
|
\c{ADC}/\c{AAA} on each subsequent digit.
|
||
|
|
||
|
\b \c{AAS} (ASCII Adjust AL After Subtraction) works similarly to
|
||
|
\c{AAA}, but is for use after \c{SUB} instructions rather than
|
||
|
\c{ADD}.
|
||
|
|
||
|
\b \c{AAM} (ASCII Adjust AX After Multiply) is for use after you
|
||
|
have multiplied two decimal digits together and left the result
|
||
|
in \c{AL}: it divides \c{AL} by ten and stores the quotient in
|
||
|
\c{AH}, leaving the remainder in \c{AL}. The divisor 10 can be
|
||
|
changed by specifying an operand to the instruction: a particularly
|
||
|
handy use of this is \c{AAM 16}, causing the two nibbles in \c{AL}
|
||
|
to be separated into \c{AH} and \c{AL}.
|
||
|
|
||
|
\b \c{AAD} (ASCII Adjust AX Before Division) performs the inverse
|
||
|
operation to \c{AAM}: it multiplies \c{AH} by ten, adds it to
|
||
|
\c{AL}, and sets \c{AH} to zero. Again, the multiplier 10 can
|
||
|
be changed.
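The arithmetic performed by \c{AAM} and \c{AAD} as described above can be modelled in a few lines of Python. This is a sketch of the register effects only (the flags are not modelled), and the function names \c{aam} and \c{aad} are hypothetical helpers that return the pair (\c{AH}, \c{AL}).

```python
def aam(al, base=10):
    """AAM: divide AL by base; quotient goes to AH, remainder to AL."""
    return al // base, al % base           # (AH, AL)

def aad(ah, al, base=10):
    """AAD: AL = AH*base + AL (kept to 8 bits), then AH = 0."""
    return 0, (ah * base + al) & 0xFF      # (AH, AL)

print(aam(63))          # 7 * 9 = 63 becomes unpacked digits 6 and 3
print(aam(0x3F, 16))    # AAM 16 separates the two nibbles of AL
print(aad(6, 3))        # the inverse: digits 6 and 3 become 63
```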
|
||
|
|
||
|
|
||
|
\S{insADC} \i\c{ADC}: Add with Carry
|
||
|
|
||
|
\c ADC r/m8,reg8 ; 10 /r [8086]
|
||
|
\c ADC r/m16,reg16 ; o16 11 /r [8086]
|
||
|
\c ADC r/m32,reg32 ; o32 11 /r [386]
|
||
|
|
||
|
\c ADC reg8,r/m8 ; 12 /r [8086]
|
||
|
\c ADC reg16,r/m16 ; o16 13 /r [8086]
|
||
|
\c ADC reg32,r/m32 ; o32 13 /r [386]
|
||
|
|
||
|
\c ADC r/m8,imm8 ; 80 /2 ib [8086]
|
||
|
\c ADC r/m16,imm16 ; o16 81 /2 iw [8086]
|
||
|
\c ADC r/m32,imm32 ; o32 81 /2 id [386]
|
||
|
|
||
|
\c ADC r/m16,imm8 ; o16 83 /2 ib [8086]
|
||
|
\c ADC r/m32,imm8 ; o32 83 /2 ib [386]
|
||
|
|
||
|
\c ADC AL,imm8 ; 14 ib [8086]
|
||
|
\c ADC AX,imm16 ; o16 15 iw [8086]
|
||
|
\c ADC EAX,imm32 ; o32 15 id [386]
|
||
|
|
||
|
\c{ADC} performs integer addition: it adds its two operands
|
||
|
together, plus the value of the carry flag, and leaves the result in
|
||
|
its destination (first) operand. The destination operand can be a
|
||
|
register or a memory location. The source operand can be a register,
|
||
|
a memory location or an immediate value.
|
||
|
|
||
|
The flags are set according to the result of the operation: in
|
||
|
particular, the carry flag is affected and can be used by a
|
||
|
subsequent \c{ADC} instruction.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
|
||
|
|
||
|
To add two numbers without also adding the contents of the carry
|
||
|
flag, use \c{ADD} (\k{insADD}).
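
A typical use is multi-precision addition. The following sketch,
which assumes a 64-bit value held in \c{EDX:EAX} and another in
\c{ECX:EBX} (the register assignment is only illustrative), adds the
second to the first:

\c         add     eax,ebx         ; add the low doublewords; carry out goes to CF
\c         adc     edx,ecx         ; add the high doublewords plus CF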
|
||
|
|
||
|
|
||
|
\S{insADD} \i\c{ADD}: Add Integers
|
||
|
|
||
|
\c ADD r/m8,reg8 ; 00 /r [8086]
|
||
|
\c ADD r/m16,reg16 ; o16 01 /r [8086]
|
||
|
\c ADD r/m32,reg32 ; o32 01 /r [386]
|
||
|
|
||
|
\c ADD reg8,r/m8 ; 02 /r [8086]
|
||
|
\c ADD reg16,r/m16 ; o16 03 /r [8086]
|
||
|
\c ADD reg32,r/m32 ; o32 03 /r [386]
|
||
|
|
||
|
\c ADD r/m8,imm8 ; 80 /0 ib [8086]
|
||
|
\c ADD r/m16,imm16 ; o16 81 /0 iw [8086]
|
||
|
\c ADD r/m32,imm32 ; o32 81 /0 id [386]
|
||
|
|
||
|
\c ADD r/m16,imm8 ; o16 83 /0 ib [8086]
|
||
|
\c ADD r/m32,imm8 ; o32 83 /0 ib [386]
|
||
|
|
||
|
\c ADD AL,imm8 ; 04 ib [8086]
|
||
|
\c ADD AX,imm16 ; o16 05 iw [8086]
|
||
|
\c ADD EAX,imm32 ; o32 05 id [386]
|
||
|
|
||
|
\c{ADD} performs integer addition: it adds its two operands
|
||
|
together, and leaves the result in its destination (first) operand.
|
||
|
The destination operand can be a register or a memory location.
|
||
|
The source operand can be a register, a memory location or an
|
||
|
immediate value.
|
||
|
|
||
|
The flags are set according to the result of the operation: in
|
||
|
particular, the carry flag is affected and can be used by a
|
||
|
subsequent \c{ADC} instruction.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
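
As an illustration of the sign-extended immediate form (a sketch
only; the register and constants are arbitrary):

\c         add     ebx,byte 4      ; 8-bit immediate, sign-extended to 32 bits
\c         add     ebx,byte -1     ; the byte 0xFF is sign-extended, so EBX decreases by 1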
|
||
|
|
||
|
|
||
|
\S{insADDPD} \i\c{ADDPD}: ADD Packed Double-Precision FP Values
|
||
|
|
||
|
\c ADDPD xmm1,xmm2/mem128 ; 66 0F 58 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{ADDPD} performs addition on each of two packed double-precision
|
||
|
FP value pairs.
|
||
|
|
||
|
\c dst[0-63] := dst[0-63] + src[0-63],
|
||
|
\c dst[64-127] := dst[64-127] + src[64-127].
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
|
||
|
\S{insADDPS} \i\c{ADDPS}: ADD Packed Single-Precision FP Values
|
||
|
|
||
|
\c ADDPS xmm1,xmm2/mem128 ; 0F 58 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{ADDPS} performs addition on each of four packed single-precision
|
||
|
FP value pairs.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] + src[0-31],
|
||
|
\c dst[32-63] := dst[32-63] + src[32-63],
|
||
|
\c dst[64-95] := dst[64-95] + src[64-95],
|
||
|
\c dst[96-127] := dst[96-127] + src[96-127].
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
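
A minimal sketch of element-wise addition of two arrays of four
floats follows. The labels \c{vec_a} and \c{vec_b} are hypothetical,
and both are assumed to be 16-byte aligned, as required by
\c{MOVAPS} and by the memory form of \c{ADDPS}:

\c         movaps  xmm0,[vec_a]    ; load four single-precision values
\c         addps   xmm0,[vec_b]    ; add the four values at vec_b, element by element
\c         movaps  [vec_a],xmm0    ; store the four sums back to vec_a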
|
||
|
|
||
|
|
||
|
\S{insADDSD} \i\c{ADDSD}: ADD Scalar Double-Precision FP Values
|
||
|
|
||
|
\c ADDSD xmm1,xmm2/mem64 ; F2 0F 58 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{ADDSD} adds the low double-precision FP values from the source
|
||
|
and destination operands and stores the double-precision FP result
|
||
|
in the destination operand.
|
||
|
|
||
|
\c dst[0-63] := dst[0-63] + src[0-63],
|
||
|
\c dst[64-127] remains unchanged.
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 64-bit memory location.
|
||
|
|
||
|
|
||
|
\S{insADDSS} \i\c{ADDSS}: ADD Scalar Single-Precision FP Values
|
||
|
|
||
|
\c ADDSS xmm1,xmm2/mem32 ; F3 0F 58 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{ADDSS} adds the low single-precision FP values from the source
|
||
|
and destination operands and stores the single-precision FP result
|
||
|
in the destination operand.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] + src[0-31],
|
||
|
\c dst[32-127] remains unchanged.
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 32-bit memory location.
|
||
|
|
||
|
|
||
|
\S{insAND} \i\c{AND}: Bitwise AND
|
||
|
|
||
|
\c AND r/m8,reg8 ; 20 /r [8086]
|
||
|
\c AND r/m16,reg16 ; o16 21 /r [8086]
|
||
|
\c AND r/m32,reg32 ; o32 21 /r [386]
|
||
|
|
||
|
\c AND reg8,r/m8 ; 22 /r [8086]
|
||
|
\c AND reg16,r/m16 ; o16 23 /r [8086]
|
||
|
\c AND reg32,r/m32 ; o32 23 /r [386]
|
||
|
|
||
|
\c AND r/m8,imm8 ; 80 /4 ib [8086]
|
||
|
\c AND r/m16,imm16 ; o16 81 /4 iw [8086]
|
||
|
\c AND r/m32,imm32 ; o32 81 /4 id [386]
|
||
|
|
||
|
\c AND r/m16,imm8 ; o16 83 /4 ib [8086]
|
||
|
\c AND r/m32,imm8 ; o32 83 /4 ib [386]
|
||
|
|
||
|
\c AND AL,imm8 ; 24 ib [8086]
|
||
|
\c AND AX,imm16 ; o16 25 iw [8086]
|
||
|
\c AND EAX,imm32 ; o32 25 id [386]
|
||
|
|
||
|
\c{AND} performs a bitwise AND operation between its two operands
|
||
|
(i.e. each bit of the result is 1 if and only if the corresponding
|
||
|
bits of the two inputs were both 1), and stores the result in the
|
||
|
destination (first) operand. The destination operand can be a
|
||
|
register or a memory location. The source operand can be a register,
|
||
|
a memory location or an immediate value.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
|
||
|
|
||
|
The \c{MMX} instruction \c{PAND} (see \k{insPAND}) performs the same
|
||
|
operation on the 64-bit \c{MMX} registers.
|
||
|
|
||
|
|
||
|
\S{insANDNPD} \i\c{ANDNPD}: Bitwise Logical AND NOT of
|
||
|
Packed Double-Precision FP Values
|
||
|
|
||
|
\c ANDNPD xmm1,xmm2/mem128 ; 66 0F 55 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{ANDNPD} inverts the bits of the two double-precision
|
||
|
floating-point values in the destination register, and then
|
||
|
performs a logical AND between the two double-precision
|
||
|
floating-point values in the source operand and the temporary
|
||
|
inverted result, storing the result in the destination register.
|
||
|
|
||
|
\c dst[0-63] := src[0-63] AND NOT dst[0-63],
|
||
|
\c dst[64-127] := src[64-127] AND NOT dst[64-127].
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
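
Note that it is the destination, not the source, that is inverted.
As a sketch of one common use, the following computes the absolute
values of the two doubles in \c{XMM1}. The label \c{sign_mask} is
hypothetical: it is assumed to be 16-byte aligned and to hold two
quadwords of \c{0x8000000000000000}.

\c         movapd  xmm0,[sign_mask] ; XMM0 = sign-bit mask in both quadwords
\c         andnpd  xmm0,xmm1        ; XMM0 = XMM1 AND NOT mask: sign bits cleared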
|
||
|
|
||
|
|
||
|
\S{insANDNPS} \i\c{ANDNPS}: Bitwise Logical AND NOT of
|
||
|
Packed Single-Precision FP Values
|
||
|
|
||
|
\c ANDNPS xmm1,xmm2/mem128 ; 0F 55 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{ANDNPS} inverts the bits of the four single-precision
|
||
|
floating-point values in the destination register, and then
|
||
|
performs a logical AND between the four single-precision
|
||
|
floating-point values in the source operand and the temporary
|
||
|
inverted result, storing the result in the destination register.
|
||
|
|
||
|
\c dst[0-31] := src[0-31] AND NOT dst[0-31],
|
||
|
\c dst[32-63] := src[32-63] AND NOT dst[32-63],
|
||
|
\c dst[64-95] := src[64-95] AND NOT dst[64-95],
|
||
|
\c dst[96-127] := src[96-127] AND NOT dst[96-127].
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
|
||
|
\S{insANDPD} \i\c{ANDPD}: Bitwise Logical AND of Packed Double-Precision FP Values
|
||
|
|
||
|
\c ANDPD xmm1,xmm2/mem128 ; 66 0F 54 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{ANDPD} performs a bitwise logical AND of the two double-precision
|
||
|
floating-point values in the source and destination operands, and
|
||
|
stores the result in the destination register.
|
||
|
|
||
|
\c dst[0-63] := src[0-63] AND dst[0-63],
|
||
|
\c dst[64-127] := src[64-127] AND dst[64-127].
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
|
||
|
\S{insANDPS} \i\c{ANDPS}: Bitwise Logical AND of Packed Single-Precision FP Values
|
||
|
|
||
|
\c ANDPS xmm1,xmm2/mem128 ; 0F 54 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{ANDPS} performs a bitwise logical AND of the four single-precision
|
||
|
floating-point values in the source and destination operands, and
|
||
|
stores the result in the destination register.
|
||
|
|
||
|
\c dst[0-31] := src[0-31] AND dst[0-31],
|
||
|
\c dst[32-63] := src[32-63] AND dst[32-63],
|
||
|
\c dst[64-95] := src[64-95] AND dst[64-95],
|
||
|
\c dst[96-127] := src[96-127] AND dst[96-127].
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
|
||
|
\S{insARPL} \i\c{ARPL}: Adjust RPL Field of Selector
|
||
|
|
||
|
\c ARPL r/m16,reg16 ; 63 /r [286,PRIV]
|
||
|
|
||
|
\c{ARPL} expects its two word operands to be segment selectors. It
|
||
|
adjusts the \i\c{RPL} (requested privilege level - stored in the bottom
|
||
|
two bits of the selector) field of the destination (first) operand
|
||
|
to ensure that it is no less than (i.e. no more privileged than) the \c{RPL}
|
||
|
field of the source operand. The zero flag is set if and only if a
|
||
|
change had to be made.
|
||
|
|
||
|
|
||
|
\S{insBOUND} \i\c{BOUND}: Check Array Index against Bounds
|
||
|
|
||
|
\c BOUND reg16,mem ; o16 62 /r [186]
|
||
|
\c BOUND reg32,mem ; o32 62 /r [386]
|
||
|
|
||
|
\c{BOUND} expects its second operand to point to an area of memory
|
||
|
containing two signed values of the same size as its first operand
|
||
|
(i.e. two words for the 16-bit form; two doublewords for the 32-bit
|
||
|
form). It performs two signed comparisons: if the value in the
|
||
|
register passed as its first operand is less than the first of the
|
||
|
in-memory values, or is greater than the second, it
|
||
|
throws a \c{BR} exception. Otherwise, it does nothing.
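
A minimal sketch for 16-bit code follows. The labels \c{limits} and
\c{index} are hypothetical, and the bound values are arbitrary:

\c limits: dw      0, 99           ; lower bound, then upper bound
\c
\c         mov     ax,[index]      ; the array index to check
\c         bound   ax,[limits]     ; raises BR if AX lies outside the bounds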
|
||
|
|
||
|
|
||
|
\S{insBSF} \i\c{BSF}, \i\c{BSR}: Bit Scan
|
||
|
|
||
|
\c BSF reg16,r/m16 ; o16 0F BC /r [386]
|
||
|
\c BSF reg32,r/m32 ; o32 0F BC /r [386]
|
||
|
|
||
|
\c BSR reg16,r/m16 ; o16 0F BD /r [386]
|
||
|
\c BSR reg32,r/m32 ; o32 0F BD /r [386]
|
||
|
|
||
|
\b \c{BSF} searches for the least significant set bit in its source
|
||
|
(second) operand, and if it finds one, stores the index in
|
||
|
its destination (first) operand. If no set bit is found, the
|
||
|
contents of the destination operand are undefined. If the source
|
||
|
operand is zero, the zero flag is set.
|
||
|
|
||
|
\b \c{BSR} performs the same function, but searches from the top
|
||
|
instead, so it finds the most significant set bit.
|
||
|
|
||
|
Bit indices are from 0 (least significant) to 15 or 31 (most
|
||
|
significant). The destination operand can only be a register.
|
||
|
The source operand can be a register or a memory location.
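
For example (a sketch with an arbitrary test value):

\c         mov     eax,0x00000028  ; bits 3 and 5 are set
\c         bsf     ecx,eax         ; ECX = 3, zero flag clear
\c         bsr     edx,eax         ; EDX = 5, zero flag clear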
|
||
|
|
||
|
|
||
|
\S{insBSWAP} \i\c{BSWAP}: Byte Swap
|
||
|
|
||
|
\c BSWAP reg32 ; o32 0F C8+r [486]
|
||
|
|
||
|
\c{BSWAP} swaps the order of the four bytes of a 32-bit register:
|
||
|
bits 0-7 exchange places with bits 24-31, and bits 8-15 swap with
|
||
|
bits 16-23. There is no explicit 16-bit equivalent: to byte-swap
|
||
|
\c{AX}, \c{BX}, \c{CX} or \c{DX}, \c{XCHG} can be used. When \c{BSWAP}
|
||
|
is used with a 16-bit register, the result is undefined.
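
For example (a sketch with an arbitrary test value):

\c         mov     eax,0x12345678
\c         bswap   eax             ; EAX = 0x78563412
\c         xchg    al,ah           ; byte-swap AX only: EAX = 0x78561234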
|
||
|
|
||
|
|
||
|
\S{insBT} \i\c{BT}, \i\c{BTC}, \i\c{BTR}, \i\c{BTS}: Bit Test
|
||
|
|
||
|
\c BT r/m16,reg16 ; o16 0F A3 /r [386]
|
||
|
\c BT r/m32,reg32 ; o32 0F A3 /r [386]
|
||
|
\c BT r/m16,imm8 ; o16 0F BA /4 ib [386]
|
||
|
\c BT r/m32,imm8 ; o32 0F BA /4 ib [386]
|
||
|
|
||
|
\c BTC r/m16,reg16 ; o16 0F BB /r [386]
|
||
|
\c BTC r/m32,reg32 ; o32 0F BB /r [386]
|
||
|
\c BTC r/m16,imm8 ; o16 0F BA /7 ib [386]
|
||
|
\c BTC r/m32,imm8 ; o32 0F BA /7 ib [386]
|
||
|
|
||
|
\c BTR r/m16,reg16 ; o16 0F B3 /r [386]
|
||
|
\c BTR r/m32,reg32 ; o32 0F B3 /r [386]
|
||
|
\c BTR r/m16,imm8 ; o16 0F BA /6 ib [386]
|
||
|
\c BTR r/m32,imm8 ; o32 0F BA /6 ib [386]
|
||
|
|
||
|
\c BTS r/m16,reg16 ; o16 0F AB /r [386]
|
||
|
\c BTS r/m32,reg32 ; o32 0F AB /r [386]
|
||
|
\c BTS r/m16,imm8 ; o16 0F BA /5 ib [386]
|
||
|
\c BTS r/m32,imm8 ; o32 0F BA /5 ib [386]
|
||
|
|
||
|
These instructions all test one bit of their first operand, whose
|
||
|
index is given by the second operand, and store the value of that
|
||
|
bit into the carry flag. Bit indices are from 0 (least significant)
|
||
|
to 15 or 31 (most significant).
|
||
|
|
||
|
In addition to storing the original value of the bit into the carry
|
||
|
flag, \c{BTR} also resets (clears) the bit in the operand itself.
|
||
|
\c{BTS} sets the bit, and \c{BTC} complements the bit. \c{BT} does
|
||
|
not modify its operands.
|
||
|
|
||
|
The destination can be a register or a memory location. The source can
|
||
|
be a register or an immediate value.
|
||
|
|
||
|
If the destination operand is a register, the bit offset should be
|
||
|
in the range 0-15 (for 16-bit operands) or 0-31 (for 32-bit operands).
|
||
|
An immediate value outside these ranges will be taken modulo 16/32
|
||
|
by the processor.
|
||
|
|
||
|
If the destination operand is a memory location, then an immediate
|
||
|
bit offset follows the same rules as for a register. If the bit offset
|
||
|
is in a register, then it can be anything within the signed range of
|
||
|
the register used (i.e. for a 32-bit operand, it can be from -2^31 to 2^31 - 1).
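
The four instructions can be compared side by side in the following
sketch (the starting value is arbitrary):

\c         mov     eax,0x00000005  ; bits 0 and 2 are set
\c         bt      eax,2           ; CF = 1, EAX unchanged
\c         btr     eax,2           ; CF = 1, EAX = 0x00000001
\c         bts     eax,4           ; CF = 0, EAX = 0x00000011
\c         btc     eax,0           ; CF = 1, EAX = 0x00000010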
|
||
|
|
||
|
|
||
|
\S{insCALL} \i\c{CALL}: Call Subroutine
|
||
|
|
||
|
\c CALL imm ; E8 rw/rd [8086]
|
||
|
\c CALL imm:imm16 ; o16 9A iw iw [8086]
|
||
|
\c CALL imm:imm32 ; o32 9A id iw [386]
|
||
|
\c CALL FAR mem16 ; o16 FF /3 [8086]
|
||
|
\c CALL FAR mem32 ; o32 FF /3 [386]
|
||
|
\c CALL r/m16 ; o16 FF /2 [8086]
|
||
|
\c CALL r/m32 ; o32 FF /2 [386]
|
||
|
|
||
|
\c{CALL} calls a subroutine, by means of pushing the current
|
||
|
instruction pointer (\c{IP}) and optionally \c{CS} as well on the
|
||
|
stack, and then jumping to a given address.
|
||
|
|
||
|
\c{CS} is pushed as well as \c{IP} if and only if the call is a far
|
||
|
call, i.e. a destination segment address is specified in the
|
||
|
instruction. The forms involving two colon-separated arguments are
|
||
|
far calls; so are the \c{CALL FAR mem} forms.
|
||
|
|
||
|
The immediate \i{near call} takes one of two forms (\c{CALL imm16/imm32}),
|
||
|
determined by the current operand size. For 16-bit operands,
|
||
|
you would use \c{CALL 0x1234}, and for 32-bit operands you would use
|
||
|
\c{CALL 0x12345678}. The value passed as an operand is a relative offset.
|
||
|
|
||
|
You can choose between the two immediate \i{far call} forms
|
||
|
(\c{CALL imm:imm}) by the use of the \c{WORD} and \c{DWORD} keywords:
|
||
|
\c{CALL WORD 0x1234:0x5678} or \c{CALL DWORD 0x1234:0x56789abc}.
|
||
|
|
||
|
The \c{CALL FAR mem} forms execute a far call by loading the
|
||
|
destination address out of memory. The address loaded consists of 16
|
||
|
or 32 bits of offset (depending on the operand size), and 16 bits of
|
||
|
segment. The operand size may be overridden using \c{CALL WORD FAR
|
||
|
mem} or \c{CALL DWORD FAR mem}.
|
||
|
|
||
|
The \c{CALL r/m} forms execute a \i{near call} (within the same
|
||
|
segment), loading the destination address out of memory or out of a
|
||
|
register. The keyword \c{NEAR} may be specified, for clarity, in
|
||
|
these forms, but is not necessary. Again, operand size can be
|
||
|
overridden using \c{CALL WORD mem} or \c{CALL DWORD mem}.
|
||
|
|
||
|
As a convenience, NASM does not require you to call a far procedure
|
||
|
symbol by coding the cumbersome \c{CALL SEG routine:routine}, but
|
||
|
instead allows the easier synonym \c{CALL FAR routine}.
|
||
|
|
||
|
The \c{CALL r/m} forms given above are near calls; NASM will accept
|
||
|
the \c{NEAR} keyword (e.g. \c{CALL NEAR [address]}), even though it
|
||
|
is not strictly necessary.
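
The forms described above might be written as follows in 16-bit
code. This is only a sketch: the labels \c{routine}, \c{table} and
\c{farptr} are hypothetical, and the \c{CALL FAR routine} line
assumes an output format that supports segment references.

\c         call    routine         ; near call, relative offset
\c         call    far routine     ; far call, synonym for CALL SEG routine:routine
\c         call    0x1234:0x5678   ; far call, immediate segment:offset
\c         call    [table+bx]      ; near call, offset loaded from memory
\c         call    far [farptr]    ; far call, offset and segment loaded from memory
\c         call    ax              ; near call, offset taken from a register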
|
||
|
|
||
|
|
||
|
\S{insCBW} \i\c{CBW}, \i\c{CWD}, \i\c{CDQ}, \i\c{CWDE}: Sign Extensions
|
||
|
|
||
|
\c CBW ; o16 98 [8086]
|
||
|
\c CWDE ; o32 98 [386]
|
||
|
|
||
|
\c CWD ; o16 99 [8086]
|
||
|
\c CDQ ; o32 99 [386]
|
||
|
|
||
|
All these instructions sign-extend a short value into a longer one,
|
||
|
by replicating the top bit of the original value to fill the
|
||
|
extended one.
|
||
|
|
||
|
\c{CBW} extends \c{AL} into \c{AX} by repeating the top bit of
|
||
|
\c{AL} in every bit of \c{AH}. \c{CWDE} extends \c{AX} into
|
||
|
\c{EAX}. \c{CWD} extends \c{AX} into \c{DX:AX} by repeating
|
||
|
the top bit of \c{AX} throughout \c{DX}, and \c{CDQ} extends
|
||
|
\c{EAX} into \c{EDX:EAX}.
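
For example (a sketch with arbitrary values):

\c         mov     al,0x80         ; -128 as a signed byte
\c         cbw                     ; AX = 0xFF80
\c         cwd                     ; DX:AX = 0xFFFF:0xFF80
\c         mov     ax,0x1234       ; a positive word
\c         cwde                    ; EAX = 0x00001234
\c         cdq                     ; EDX:EAX = 0x00000000:0x00001234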
|
||
|
|
||
|
|
||
|
\S{insCLC} \i\c{CLC}, \i\c{CLD}, \i\c{CLI}, \i\c{CLTS}: Clear Flags
|
||
|
|
||
|
\c CLC ; F8 [8086]
|
||
|
\c CLD ; FC [8086]
|
||
|
\c CLI ; FA [8086]
|
||
|
\c CLTS ; 0F 06 [286,PRIV]
|
||
|
|
||
|
These instructions clear various flags. \c{CLC} clears the carry
|
||
|
flag; \c{CLD} clears the direction flag; \c{CLI} clears the
|
||
|
interrupt flag (thus disabling interrupts); and \c{CLTS} clears the
|
||
|
task-switched (\c{TS}) flag in \c{CR0}.
|
||
|
|
||
|
To set the carry, direction, or interrupt flags, use the \c{STC},
|
||
|
\c{STD} and \c{STI} instructions (\k{insSTC}). To invert the carry
|
||
|
flag, use \c{CMC} (\k{insCMC}).
|
||
|
|
||
|
|
||
|
\S{insCLFLUSH} \i\c{CLFLUSH}: Flush Cache Line
|
||
|
|
||
|
\c CLFLUSH mem ; 0F AE /7 [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CLFLUSH} invalidates the cache line that contains the linear address
|
||
|
specified by the source operand from all levels of the processor cache
|
||
|
hierarchy (data and instruction). If, at any level of the cache
|
||
|
hierarchy, the line is inconsistent with memory (dirty) it is written
|
||
|
to memory before invalidation. The source operand points to a
|
||
|
byte-sized memory location.
|
||
|
|
||
|
Although \c{CLFLUSH} is flagged \c{SSE2} and above, it may not be
|
||
|
present on all processors which have \c{SSE2} support, and it may be
|
||
|
supported on other processors; the \c{CPUID} instruction (\k{insCPUID})
|
||
|
will return a bit which indicates support for the \c{CLFLUSH} instruction.
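
A minimal sketch of such a check follows. The feature bit tested is
the CLFSH flag, bit 19 of \c{EDX} after \c{CPUID} function 1; the
labels \c{buffer} and \c{no_clflush} are hypothetical.

\c         mov     eax,1
\c         cpuid                   ; overwrites EAX, EBX, ECX and EDX
\c         bt      edx,19          ; copy the CLFSH feature bit into CF
\c         jnc     no_clflush      ; CF clear: CLFLUSH is not available
\c         clflush [buffer]        ; flush the cache line containing buffer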
|
||
|
|
||
|
|
||
|
\S{insCMC} \i\c{CMC}: Complement Carry Flag
|
||
|
|
||
|
\c CMC ; F5 [8086]
|
||
|
|
||
|
\c{CMC} changes the value of the carry flag: if it was 0, it sets it
|
||
|
to 1, and vice versa.
|
||
|
|
||
|
|
||
|
\S{insCMOVcc} \i\c{CMOVcc}: Conditional Move
|
||
|
|
||
|
\c CMOVcc reg16,r/m16 ; o16 0F 40+cc /r [P6]
|
||
|
\c CMOVcc reg32,r/m32 ; o32 0F 40+cc /r [P6]
|
||
|
|
||
|
\c{CMOV} moves its source (second) operand into its destination
|
||
|
(first) operand if the given condition code is satisfied; otherwise
|
||
|
it does nothing.
|
||
|
|
||
|
For a list of condition codes, see \k{iref-cc}.
|
||
|
|
||
|
Although the \c{CMOV} instructions are flagged \c{P6} and above, they
|
||
|
may not be supported by all Pentium Pro processors; the \c{CPUID}
|
||
|
instruction (\k{insCPUID}) will return a bit which indicates whether
|
||
|
conditional moves are supported.
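
As a sketch, the following leaves the larger of two signed values in
\c{EAX} without using a branch (the choice of registers is
arbitrary):

\c         cmp     eax,ebx
\c         cmovl   eax,ebx         ; if EAX < EBX (signed), copy EBX into EAX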
|
||
|
|
||
|
|
||
|
\S{insCMP} \i\c{CMP}: Compare Integers
|
||
|
|
||
|
\c CMP r/m8,reg8 ; 38 /r [8086]
|
||
|
\c CMP r/m16,reg16 ; o16 39 /r [8086]
|
||
|
\c CMP r/m32,reg32 ; o32 39 /r [386]
|
||
|
|
||
|
\c CMP reg8,r/m8 ; 3A /r [8086]
|
||
|
\c CMP reg16,r/m16 ; o16 3B /r [8086]
|
||
|
\c CMP reg32,r/m32 ; o32 3B /r [386]
|
||
|
|
||
|
\c CMP r/m8,imm8 ; 80 /7 ib [8086]
|
||
|
\c CMP r/m16,imm16 ; o16 81 /7 iw [8086]
|
||
|
\c CMP r/m32,imm32 ; o32 81 /7 id [386]
|
||
|
|
||
|
\c CMP r/m16,imm8 ; o16 83 /7 ib [8086]
|
||
|
\c CMP r/m32,imm8 ; o32 83 /7 ib [386]
|
||
|
|
||
|
\c CMP AL,imm8 ; 3C ib [8086]
|
||
|
\c CMP AX,imm16 ; o16 3D iw [8086]
|
||
|
\c CMP EAX,imm32 ; o32 3D id [386]
|
||
|
|
||
|
\c{CMP} performs a `mental' subtraction of its second operand from
|
||
|
its first operand, and affects the flags as if the subtraction had
|
||
|
taken place, but does not store the result of the subtraction
|
||
|
anywhere.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
|
||
|
|
||
|
The destination operand can be a register or a memory location. The
|
||
|
source can be a register, memory location or an immediate value of
|
||
|
the same size as the destination.
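
\c{CMP} is normally followed by a conditional jump or another
instruction that reads the flags. A minimal sketch, in which the
labels \c{flag}, \c{smaller} and \c{is_zero} are hypothetical:

\c         cmp     eax,10          ; flags as for EAX - 10; EAX is unchanged
\c         jl      smaller         ; taken if EAX < 10 as signed values
\c         cmp     byte [flag],0   ; memory destination, 8-bit immediate
\c         je      is_zero         ; taken if the byte at flag is zero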
|
||
|
|
||
|
|
||
|
\S{insCMPccPD} \i\c{CMPccPD}: Packed Double-Precision FP Compare
|
||
|
\I\c{CMPEQPD} \I\c{CMPLTPD} \I\c{CMPLEPD} \I\c{CMPUNORDPD}
|
||
|
\I\c{CMPNEQPD} \I\c{CMPNLTPD} \I\c{CMPNLEPD} \I\c{CMPORDPD}
|
||
|
|
||
|
\c CMPPD xmm1,xmm2/mem128,imm8 ; 66 0F C2 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c CMPEQPD xmm1,xmm2/mem128 ; 66 0F C2 /r 00 [WILLAMETTE,SSE2]
|
||
|
\c CMPLTPD xmm1,xmm2/mem128 ; 66 0F C2 /r 01 [WILLAMETTE,SSE2]
|
||
|
\c CMPLEPD xmm1,xmm2/mem128 ; 66 0F C2 /r 02 [WILLAMETTE,SSE2]
|
||
|
\c CMPUNORDPD xmm1,xmm2/mem128 ; 66 0F C2 /r 03 [WILLAMETTE,SSE2]
|
||
|
\c CMPNEQPD xmm1,xmm2/mem128 ; 66 0F C2 /r 04 [WILLAMETTE,SSE2]
|
||
|
\c CMPNLTPD xmm1,xmm2/mem128 ; 66 0F C2 /r 05 [WILLAMETTE,SSE2]
|
||
|
\c CMPNLEPD xmm1,xmm2/mem128 ; 66 0F C2 /r 06 [WILLAMETTE,SSE2]
|
||
|
\c CMPORDPD xmm1,xmm2/mem128 ; 66 0F C2 /r 07 [WILLAMETTE,SSE2]
|
||
|
|
||
|
The \c{CMPccPD} instructions compare the two packed double-precision
|
||
|
FP values in the source and destination operands, and return the
|
||
|
result of the comparison in the destination register. The result of
|
||
|
each comparison is a quadword mask of all 1s (comparison true) or
|
||
|
all 0s (comparison false).
|
||
|
|
||
|
The destination is an \c{XMM} register. The source can be either an
|
||
|
\c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
The third operand is an 8-bit immediate value, of which the low 3
|
||
|
bits define the type of comparison. For ease of programming, the
|
||
|
8 two-operand pseudo-instructions are provided, with the third
|
||
|
operand already filled in. The \I{Condition Predicates}
|
||
|
\c{Condition Predicates} are:
|
||
|
|
||
|
\c EQ 0 Equal
|
||
|
\c LT 1 Less-than
|
||
|
\c LE 2 Less-than-or-equal
|
||
|
\c UNORD 3 Unordered
|
||
|
\c NEQ 4 Not-equal
|
||
|
\c NLT 5 Not-less-than
|
||
|
\c NLE 6 Not-less-than-or-equal
|
||
|
\c ORD 7 Ordered
|
||
|
|
||
|
For more details of the comparison predicates, and details of how
|
||
|
to emulate the "greater-than" equivalents, see \k{iref-SSE-cc}.
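
As a sketch of the equivalence between the two notations, the two
lines below perform the same comparison on different registers, one
written as a pseudo-instruction and one with an explicit predicate:

\c         cmpltpd xmm0,xmm1       ; each quadword of XMM0 = all 1s if XMM0 < XMM1, else 0
\c         cmppd   xmm2,xmm3,1     ; the same comparison, written with predicate 1 (LT)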
|
||
|
|
||
|
|
||
|
\S{insCMPccPS} \i\c{CMPccPS}: Packed Single-Precision FP Compare
|
||
|
\I\c{CMPEQPS} \I\c{CMPLTPS} \I\c{CMPLEPS} \I\c{CMPUNORDPS}
|
||
|
\I\c{CMPNEQPS} \I\c{CMPNLTPS} \I\c{CMPNLEPS} \I\c{CMPORDPS}
|
||
|
|
||
|
\c CMPPS xmm1,xmm2/mem128,imm8 ; 0F C2 /r ib [KATMAI,SSE]
|
||
|
|
||
|
\c CMPEQPS xmm1,xmm2/mem128 ; 0F C2 /r 00 [KATMAI,SSE]
|
||
|
\c CMPLTPS xmm1,xmm2/mem128 ; 0F C2 /r 01 [KATMAI,SSE]
|
||
|
\c CMPLEPS xmm1,xmm2/mem128 ; 0F C2 /r 02 [KATMAI,SSE]
|
||
|
\c CMPUNORDPS xmm1,xmm2/mem128 ; 0F C2 /r 03 [KATMAI,SSE]
|
||
|
\c CMPNEQPS xmm1,xmm2/mem128 ; 0F C2 /r 04 [KATMAI,SSE]
|
||
|
\c CMPNLTPS xmm1,xmm2/mem128 ; 0F C2 /r 05 [KATMAI,SSE]
|
||
|
\c CMPNLEPS xmm1,xmm2/mem128 ; 0F C2 /r 06 [KATMAI,SSE]
|
||
|
\c CMPORDPS xmm1,xmm2/mem128 ; 0F C2 /r 07 [KATMAI,SSE]
|
||
|
|
||
|
The \c{CMPccPS} instructions compare the four packed single-precision
|
||
|
FP values in the source and destination operands, and return the
|
||
|
result of the comparison in the destination register. The result of
|
||
|
each comparison is a doubleword mask of all 1s (comparison true) or
|
||
|
all 0s (comparison false).
|
||
|
|
||
|
The destination is an \c{XMM} register. The source can be either an
|
||
|
\c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
The third operand is an 8-bit immediate value, of which the low 3
|
||
|
bits define the type of comparison. For ease of programming, the
|
||
|
8 two-operand pseudo-instructions are provided, with the third
|
||
|
operand already filled in. The \I{Condition Predicates}
|
||
|
\c{Condition Predicates} are:
|
||
|
|
||
|
\c EQ 0 Equal
|
||
|
\c LT 1 Less-than
|
||
|
\c LE 2 Less-than-or-equal
|
||
|
\c UNORD 3 Unordered
|
||
|
\c NEQ 4 Not-equal
|
||
|
\c NLT 5 Not-less-than
|
||
|
\c NLE 6 Not-less-than-or-equal
|
||
|
\c ORD 7 Ordered
|
||
|
|
||
|
For more details of the comparison predicates, and details of how
|
||
|
to emulate the "greater-than" equivalents, see \k{iref-SSE-cc}.
|
||
|
|
||
|
|
||
|
\S{insCMPSB} \i\c{CMPSB}, \i\c{CMPSW}, \i\c{CMPSD}: Compare Strings
|
||
|
|
||
|
\c CMPSB ; A6 [8086]
|
||
|
\c CMPSW ; o16 A7 [8086]
|
||
|
\c CMPSD ; o32 A7 [386]
|
||
|
|
||
|
\c{CMPSB} compares the byte at \c{[DS:SI]} or \c{[DS:ESI]} with the
|
||
|
byte at \c{[ES:DI]} or \c{[ES:EDI]}, and sets the flags accordingly.
|
||
|
It then increments or decrements (depending on the direction flag:
|
||
|
increments if the flag is clear, decrements if it is set) \c{SI} and
|
||
|
\c{DI} (or \c{ESI} and \c{EDI}).
|
||
|
|
||
|
The registers used are \c{SI} and \c{DI} if the address size is 16
|
||
|
bits, and \c{ESI} and \c{EDI} if it is 32 bits. If you need to use
|
||
|
an address size not equal to the current \c{BITS} setting, you can
|
||
|
use an explicit \i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
The segment register used to load from \c{[SI]} or \c{[ESI]} can be
|
||
|
overridden by using a segment register name as a prefix (for
|
||
|
example, \c{ES CMPSB}). The use of \c{ES} for the load from \c{[DI]}
|
||
|
or \c{[EDI]} cannot be overridden.
|
||
|
|
||
|
\c{CMPSW} and \c{CMPSD} work in the same way, but they compare a
|
||
|
word or a doubleword instead of a byte, and increment or decrement
|
||
|
the addressing registers by 2 or 4 instead of 1.
|
||
|
|
||
|
The \c{REPE} and \c{REPNE} prefixes (equivalently, \c{REPZ} and
|
||
|
\c{REPNZ}) may be used to repeat the instruction up to \c{CX} (or
|
||
|
\c{ECX} - again, the address size chooses which) times until the
|
||
|
first unequal or equal byte is found.
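
A minimal sketch of comparing two buffers in 32-bit code follows.
The labels \c{string1}, \c{string2} and \c{different} and the
constant \c{len} are hypothetical; \c{DS} and \c{ES} are assumed to
be set up already, and \c{len} is assumed to be non-zero (with a
zero count the flags are left unchanged).

\c         cld                     ; clear the direction flag: compare forwards
\c         mov     esi,string1     ; first buffer, addressed through DS
\c         mov     edi,string2     ; second buffer, addressed through ES
\c         mov     ecx,len         ; maximum number of bytes to compare
\c         repe    cmpsb           ; stop at the first mismatch or when ECX = 0
\c         jne     different       ; zero flag clear: a byte differed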
|
||
|
|
||
|
|
||
|
\S{insCMPccSD} \i\c{CMPccSD}: Scalar Double-Precision FP Compare
|
||
|
\I\c{CMPEQSD} \I\c{CMPLTSD} \I\c{CMPLESD} \I\c{CMPUNORDSD}
|
||
|
\I\c{CMPNEQSD} \I\c{CMPNLTSD} \I\c{CMPNLESD} \I\c{CMPORDSD}
|
||
|
|
||
|
\c CMPSD xmm1,xmm2/mem64,imm8 ; F2 0F C2 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c CMPEQSD xmm1,xmm2/mem64 ; F2 0F C2 /r 00 [WILLAMETTE,SSE2]
|
||
|
\c CMPLTSD xmm1,xmm2/mem64 ; F2 0F C2 /r 01 [WILLAMETTE,SSE2]
|
||
|
\c CMPLESD xmm1,xmm2/mem64 ; F2 0F C2 /r 02 [WILLAMETTE,SSE2]
|
||
|
\c CMPUNORDSD xmm1,xmm2/mem64 ; F2 0F C2 /r 03 [WILLAMETTE,SSE2]
|
||
|
\c CMPNEQSD xmm1,xmm2/mem64 ; F2 0F C2 /r 04 [WILLAMETTE,SSE2]
|
||
|
\c CMPNLTSD xmm1,xmm2/mem64 ; F2 0F C2 /r 05 [WILLAMETTE,SSE2]
|
||
|
\c CMPNLESD xmm1,xmm2/mem64 ; F2 0F C2 /r 06 [WILLAMETTE,SSE2]
|
||
|
\c CMPORDSD xmm1,xmm2/mem64 ; F2 0F C2 /r 07 [WILLAMETTE,SSE2]
|
||
|
|
||
|
The \c{CMPccSD} instructions compare the low-order double-precision
|
||
|
FP values in the source and destination operands, and return the
|
||
|
result of the comparison in the destination register. The result of
|
||
|
the comparison is a quadword mask of all 1s (comparison true) or
|
||
|
all 0s (comparison false).
|
||
|
|
||
|
The destination is an \c{XMM} register. The source can be either an
|
||
|
\c{XMM} register or a 64-bit memory location.
|
||
|
|
||
|
The third operand is an 8-bit immediate value, of which the low 3
|
||
|
bits define the type of comparison. For ease of programming, the
|
||
|
8 two-operand pseudo-instructions are provided, with the third
|
||
|
operand already filled in. The \I{Condition Predicates}
|
||
|
\c{Condition Predicates} are:
|
||
|
|
||
|
\c EQ 0 Equal
|
||
|
\c LT 1 Less-than
|
||
|
\c LE 2 Less-than-or-equal
|
||
|
\c UNORD 3 Unordered
|
||
|
\c NEQ 4 Not-equal
|
||
|
\c NLT 5 Not-less-than
|
||
|
\c NLE 6 Not-less-than-or-equal
|
||
|
\c ORD 7 Ordered
|
||
|
|
||
|
For more details of the comparison predicates, and details of how
|
||
|
to emulate the "greater-than" equivalents, see \k{iref-SSE-cc}.
|
||
|
|
||
|
|
||
|
\S{insCMPccSS} \i\c{CMPccSS}: Scalar Single-Precision FP Compare
|
||
|
\I\c{CMPEQSS} \I\c{CMPLTSS} \I\c{CMPLESS} \I\c{CMPUNORDSS}
|
||
|
\I\c{CMPNEQSS} \I\c{CMPNLTSS} \I\c{CMPNLESS} \I\c{CMPORDSS}
|
||
|
|
||
|
\c CMPSS xmm1,xmm2/mem32,imm8 ; F3 0F C2 /r ib [KATMAI,SSE]
|
||
|
|
||
|
\c CMPEQSS xmm1,xmm2/mem32 ; F3 0F C2 /r 00 [KATMAI,SSE]
|
||
|
\c CMPLTSS xmm1,xmm2/mem32 ; F3 0F C2 /r 01 [KATMAI,SSE]
|
||
|
\c CMPLESS xmm1,xmm2/mem32 ; F3 0F C2 /r 02 [KATMAI,SSE]
|
||
|
\c CMPUNORDSS xmm1,xmm2/mem32 ; F3 0F C2 /r 03 [KATMAI,SSE]
|
||
|
\c CMPNEQSS xmm1,xmm2/mem32 ; F3 0F C2 /r 04 [KATMAI,SSE]
|
||
|
\c CMPNLTSS xmm1,xmm2/mem32 ; F3 0F C2 /r 05 [KATMAI,SSE]
|
||
|
\c CMPNLESS xmm1,xmm2/mem32 ; F3 0F C2 /r 06 [KATMAI,SSE]
|
||
|
\c CMPORDSS xmm1,xmm2/mem32 ; F3 0F C2 /r 07 [KATMAI,SSE]
|
||
|
|
||
|
The \c{CMPccSS} instructions compare the low-order single-precision
|
||
|
FP values in the source and destination operands, and return the
|
||
|
result of the comparison in the destination register. The result of
|
||
|
the comparison is a doubleword mask of all 1s (comparison true) or
|
||
|
all 0s (comparison false).
|
||
|
|
||
|
The destination is an \c{XMM} register. The source can be either an
|
||
|
\c{XMM} register or a 32-bit memory location.
|
||
|
|
||
|
The third operand is an 8-bit immediate value, of which the low 3
|
||
|
bits define the type of comparison. For ease of programming, the
|
||
|
8 two-operand pseudo-instructions are provided, with the third
|
||
|
operand already filled in. The \I{Condition Predicates}
|
||
|
\c{Condition Predicates} are:
|
||
|
|
||
|
\c EQ 0 Equal
|
||
|
\c LT 1 Less-than
|
||
|
\c LE 2 Less-than-or-equal
|
||
|
\c UNORD 3 Unordered
|
||
|
\c NEQ 4 Not-equal
|
||
|
\c NLT 5 Not-less-than
|
||
|
\c NLE 6 Not-less-than-or-equal
|
||
|
\c ORD 7 Ordered
|
||
|
|
||
|
For more details of the comparison predicates, and details of how
|
||
|
to emulate the "greater-than" equivalents, see \k{iref-SSE-cc}.
|
||
|
|
||
|
|
||
|
\S{insCMPXCHG} \i\c{CMPXCHG}, \i\c{CMPXCHG486}: Compare and Exchange
|
||
|
|
||
|
\c CMPXCHG r/m8,reg8 ; 0F B0 /r [PENT]
|
||
|
\c CMPXCHG r/m16,reg16 ; o16 0F B1 /r [PENT]
|
||
|
\c CMPXCHG r/m32,reg32 ; o32 0F B1 /r [PENT]
|
||
|
|
||
|
\c CMPXCHG486 r/m8,reg8 ; 0F A6 /r [486,UNDOC]
|
||
|
\c CMPXCHG486 r/m16,reg16 ; o16 0F A7 /r [486,UNDOC]
|
||
|
\c CMPXCHG486 r/m32,reg32 ; o32 0F A7 /r [486,UNDOC]
|
||
|
|
||
|
These two instructions perform exactly the same operation; however,
|
||
|
apparently some (not all) 486 processors support it under a
|
||
|
non-standard opcode, so NASM provides the undocumented
|
||
|
\c{CMPXCHG486} form to generate the non-standard opcode.
|
||
|
|
||
|
\c{CMPXCHG} compares its destination (first) operand to the value in
|
||
|
\c{AL}, \c{AX} or \c{EAX} (depending on the operand size of the
|
||
|
instruction). If they are equal, it copies its source (second)
|
||
|
operand into the destination and sets the zero flag. Otherwise, it
|
||
|
clears the zero flag and copies the destination operand to \c{AL}, \c{AX} or \c{EAX}.
|
||
|
|
||
|
The destination can be either a register or a memory location. The
|
||
|
source is a register.
|
||
|
|
||
|
\c{CMPXCHG} is intended to be used for atomic operations in
|
||
|
multitasking or multiprocessor environments. To safely update a
|
||
|
value in shared memory, for example, you might load the value into
|
||
|
\c{EAX}, load the updated value into \c{EBX}, and then execute the
|
||
|
instruction \c{LOCK CMPXCHG [value],EBX}. If \c{value} has not
|
||
|
changed since being loaded, it is updated with your desired new
|
||
|
value, and the zero flag is set to let you know it has worked. (The
|
||
|
\c{LOCK} prefix prevents another processor doing anything in the
|
||
|
middle of this operation: it guarantees atomicity.) However, if
|
||
|
another processor has modified the value in between your load and
|
||
|
your attempted store, the store does not happen, and you are
|
||
|
notified of the failure by a cleared zero flag, so you can go round
|
||
|
and try again.
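
As a minimal sketch of the retry loop just described, assuming a
32-bit variable at the hypothetical label \c{value} and an update
that simply adds one to it:

\c         MOV     EAX,[value]        ; load the current value
\c retry:  MOV     EBX,EAX            ; derive the new value from the old one
\c         INC     EBX                ; (here: old value plus one)
\c         LOCK CMPXCHG [value],EBX   ; store EBX only if [value] equals EAX
\c         JNZ     retry              ; ZF clear: EAX now holds the fresh value

On failure \c{CMPXCHG} has already copied the current contents of
\c{value} into \c{EAX}, so the loop does not need to reload it
before trying again.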
|
||
|
|
||
|
|
||
|
\S{insCMPXCHG8B} \i\c{CMPXCHG8B}: Compare and Exchange Eight Bytes
|
||
|
|
||
|
\c CMPXCHG8B mem ; 0F C7 /1 [PENT]
|
||
|
|
||
|
This is a larger and more unwieldy version of \c{CMPXCHG}: it
|
||
|
compares the 64-bit (eight-byte) value stored at \c{[mem]} with the
|
||
|
value in \c{EDX:EAX}. If they are equal, it sets the zero flag and
|
||
|
stores \c{ECX:EBX} into the memory area. If they are unequal, it
|
||
|
clears the zero flag and stores the memory contents into \c{EDX:EAX}.
|
||
|
|
||
|
\c{CMPXCHG8B} can be used with the \c{LOCK} prefix, to allow atomic
|
||
|
execution. This is useful in multi-processor and multi-tasking
|
||
|
environments.
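
The same retry pattern used with \c{CMPXCHG} works for a 64-bit
quantity. The following sketch, which assumes a hypothetical 8-byte
variable \c{counter}, increments it atomically:

\c         MOV     EAX,[counter]      ; expected value, low doubleword
\c         MOV     EDX,[counter+4]    ; expected value, high doubleword
\c again:  MOV     EBX,EAX
\c         MOV     ECX,EDX
\c         ADD     EBX,1              ; ECX:EBX = EDX:EAX + 1
\c         ADC     ECX,0
\c         LOCK CMPXCHG8B [counter]   ; store ECX:EBX if [counter] = EDX:EAX
\c         JNZ     again              ; else EDX:EAX was reloaded; retry

The two initial loads are not atomic as a pair, but that should be
harmless here: an inconsistent pair simply makes the first comparison
fail, after which \c{EDX:EAX} holds a consistent copy.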
|
||
|
|
||
|
|
||
|
\S{insCOMISD} \i\c{COMISD}: Scalar Ordered Double-Precision FP Compare and Set EFLAGS
|
||
|
|
||
|
\c COMISD xmm1,xmm2/mem64 ; 66 0F 2F /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{COMISD} compares the low-order double-precision FP value in the
|
||
|
two source operands. ZF, PF and CF are set according to the result.
|
||
|
OF, SF and AF are cleared. The unordered result is returned if either
|
||
|
source is a NaN (QNaN or SNaN).
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be either
|
||
|
an \c{XMM} register or a memory location.
|
||
|
|
||
|
The flags are set according to the following rules:
|
||
|
|
||
|
\c Result Flags Values
|
||
|
|
||
|
\c UNORDERED: ZF,PF,CF <-- 111;
|
||
|
\c GREATER_THAN: ZF,PF,CF <-- 000;
|
||
|
\c LESS_THAN: ZF,PF,CF <-- 001;
|
||
|
\c EQUAL: ZF,PF,CF <-- 100;
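
Because the three flags are laid out as for an unsigned integer
comparison, the result can be tested with the ordinary conditional
jumps. A sketch, in which the four labels are hypothetical:

\c         COMISD  XMM0,XMM1          ; compare low double-precision values
\c         JP      unordered          ; PF=1: at least one operand is a NaN
\c         JA      greater            ; ZF=0 and CF=0: XMM0 > XMM1
\c         JB      less               ; CF=1: XMM0 < XMM1
\c         JMP     equal              ; only ZF=1 remains: XMM0 = XMM1

The unordered case is tested first because it also sets ZF and CF,
and would otherwise be taken for `equal' or `less-than'.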
|
||
|
|
||
|
|
||
|
\S{insCOMISS} \i\c{COMISS}: Scalar Ordered Single-Precision FP Compare and Set EFLAGS
|
||
|
|
||
|
\c COMISS xmm1,xmm2/mem32 ; 0F 2F /r [KATMAI,SSE]
|
||
|
|
||
|
\c{COMISS} compares the low-order single-precision FP value in the
|
||
|
two source operands. ZF, PF and CF are set according to the result.
|
||
|
OF, SF and AF are cleared. The unordered result is returned if either
|
||
|
source is a NaN (QNaN or SNaN).
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be either
|
||
|
an \c{XMM} register or a memory location.
|
||
|
|
||
|
The flags are set according to the following rules:
|
||
|
|
||
|
\c Result Flags Values
|
||
|
|
||
|
\c UNORDERED: ZF,PF,CF <-- 111;
|
||
|
\c GREATER_THAN: ZF,PF,CF <-- 000;
|
||
|
\c LESS_THAN: ZF,PF,CF <-- 001;
|
||
|
\c EQUAL: ZF,PF,CF <-- 100;
|
||
|
|
||
|
|
||
|
\S{insCPUID} \i\c{CPUID}: Get CPU Identification Code
|
||
|
|
||
|
\c CPUID ; 0F A2 [PENT]
|
||
|
|
||
|
\c{CPUID} returns various information about the processor it is
|
||
|
being executed on. It fills the four registers \c{EAX}, \c{EBX},
|
||
|
\c{ECX} and \c{EDX} with information, which varies depending on the
|
||
|
input contents of \c{EAX}.
|
||
|
|
||
|
\c{CPUID} also acts as a barrier to serialize instruction execution:
|
||
|
executing the \c{CPUID} instruction guarantees that all the effects
|
||
|
(memory modification, flag modification, register modification) of
|
||
|
previous instructions have been completed before the next
|
||
|
instruction gets fetched.
|
||
|
|
||
|
The information returned is as follows:
|
||
|
|
||
|
\b If \c{EAX} is zero on input, \c{EAX} on output holds the maximum
|
||
|
acceptable input value of \c{EAX}, and \c{EBX:EDX:ECX} contain the
|
||
|
string \c{"GenuineIntel"} (or not, if you have a clone processor).
|
||
|
That is to say, \c{EBX} contains \c{"Genu"} (in NASM's own sense of
|
||
|
character constants, described in \k{chrconst}), \c{EDX} contains
|
||
|
\c{"ineI"} and \c{ECX} contains \c{"ntel"}.
|
||
|
|
||
|
\b If \c{EAX} is one on input, \c{EAX} on output contains version
|
||
|
information about the processor, and \c{EDX} contains a set of
|
||
|
feature flags, showing the presence and absence of various features.
|
||
|
For example, bit 8 is set if the \c{CMPXCHG8B} instruction
|
||
|
(\k{insCMPXCHG8B}) is supported, bit 15 is set if the conditional
|
||
|
move instructions (\k{insCMOVcc} and \k{insFCMOVB}) are supported,
|
||
|
and bit 23 is set if \c{MMX} instructions are supported.
|
||
|
|
||
|
\b If \c{EAX} is two on input, \c{EAX}, \c{EBX}, \c{ECX} and \c{EDX}
|
||
|
all contain information about caches and TLBs (Translation Lookaside
|
||
|
Buffers).
|
||
|
|
||
|
For more information on the data returned from \c{CPUID}, see the
|
||
|
documentation from Intel and other processor manufacturers.
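
For instance, the vendor string and the \c{MMX} feature bit described
above might be read as follows. This is a sketch only: \c{vendor} is
a hypothetical 12-byte buffer, \c{no_mmx} a hypothetical label, and
the code assumes the processor is already known to support \c{CPUID}.

\c         XOR     EAX,EAX            ; EAX = 0: vendor string
\c         CPUID
\c         MOV     [vendor],EBX       ; "Genu" on an Intel processor
\c         MOV     [vendor+4],EDX     ; "ineI"
\c         MOV     [vendor+8],ECX     ; "ntel"
\c         MOV     EAX,1              ; EAX = 1: version and feature flags
\c         CPUID
\c         TEST    EDX,1 << 23        ; bit 23 of EDX: MMX supported?
\c         JZ      no_mmx

Note that \c{CPUID} overwrites all four of \c{EAX}, \c{EBX}, \c{ECX}
and \c{EDX}, so any live values in them need saving first.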
|
||
|
|
||
|
|
||
|
\S{insCVTDQ2PD} \i\c{CVTDQ2PD}:
|
||
|
Packed Signed INT32 to Packed Double-Precision FP Conversion
|
||
|
|
||
|
\c CVTDQ2PD xmm1,xmm2/mem64 ; F3 0F E6 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTDQ2PD} converts two packed signed doublewords from the source
|
||
|
operand to two packed double-precision FP values in the destination
|
||
|
operand.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location. If the
|
||
|
source is a register, the packed integers are in the low quadword.
|
||
|
|
||
|
|
||
|
\S{insCVTDQ2PS} \i\c{CVTDQ2PS}:
|
||
|
Packed Signed INT32 to Packed Single-Precision FP Conversion
|
||
|
|
||
|
\c CVTDQ2PS xmm1,xmm2/mem128 ; 0F 5B /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTDQ2PS} converts four packed signed doublewords from the source
|
||
|
operand to four packed single-precision FP values in the destination
|
||
|
operand.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPD2DQ} \i\c{CVTPD2DQ}:
|
||
|
Packed Double-Precision FP to Packed Signed INT32 Conversion
|
||
|
|
||
|
\c CVTPD2DQ xmm1,xmm2/mem128 ; F2 0F E6 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTPD2DQ} converts two packed double-precision FP values from the
|
||
|
source operand to two packed signed doublewords in the low quadword
|
||
|
of the destination operand. The high quadword of the destination is
|
||
|
set to all 0s.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPD2PI} \i\c{CVTPD2PI}:
|
||
|
Packed Double-Precision FP to Packed Signed INT32 Conversion
|
||
|
|
||
|
\c CVTPD2PI mm,xmm/mem128 ; 66 0F 2D /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTPD2PI} converts two packed double-precision FP values from the
|
||
|
source operand to two packed signed doublewords in the destination
|
||
|
operand.
|
||
|
|
||
|
The destination operand is an \c{MMX} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPD2PS} \i\c{CVTPD2PS}:
|
||
|
Packed Double-Precision FP to Packed Single-Precision FP Conversion
|
||
|
|
||
|
\c CVTPD2PS xmm1,xmm2/mem128 ; 66 0F 5A /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTPD2PS} converts two packed double-precision FP values from the
|
||
|
source operand to two packed single-precision FP values in the low
|
||
|
quadword of the destination operand. The high quadword of the
|
||
|
destination is set to all 0s.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPI2PD} \i\c{CVTPI2PD}:
|
||
|
Packed Signed INT32 to Packed Double-Precision FP Conversion
|
||
|
|
||
|
\c CVTPI2PD xmm,mm/mem64 ; 66 0F 2A /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTPI2PD} converts two packed signed doublewords from the source
|
||
|
operand to two packed double-precision FP values in the destination
|
||
|
operand.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{MMX} register or a 64-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPI2PS} \i\c{CVTPI2PS}:
|
||
|
Packed Signed INT32 to Packed Single-Precision FP Conversion
|
||
|
|
||
|
\c CVTPI2PS xmm,mm/mem64 ; 0F 2A /r [KATMAI,SSE]
|
||
|
|
||
|
\c{CVTPI2PS} converts two packed signed doublewords from the source
|
||
|
operand to two packed single-precision FP values in the low quadword
|
||
|
of the destination operand. The high quadword of the destination
|
||
|
remains unchanged.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{MMX} register or a 64-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPS2DQ} \i\c{CVTPS2DQ}:
|
||
|
Packed Single-Precision FP to Packed Signed INT32 Conversion
|
||
|
|
||
|
\c CVTPS2DQ xmm1,xmm2/mem128 ; 66 0F 5B /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTPS2DQ} converts four packed single-precision FP values from the
|
||
|
source operand to four packed signed doublewords in the destination operand.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPS2PD} \i\c{CVTPS2PD}:
|
||
|
Packed Single-Precision FP to Packed Double-Precision FP Conversion
|
||
|
|
||
|
\c CVTPS2PD xmm1,xmm2/mem64 ; 0F 5A /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTPS2PD} converts two packed single-precision FP values from the
|
||
|
source operand to two packed double-precision FP values in the destination
|
||
|
operand.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location. If the source
|
||
|
is a register, the input values are in the low quadword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTPS2PI} \i\c{CVTPS2PI}:
|
||
|
Packed Single-Precision FP to Packed Signed INT32 Conversion
|
||
|
|
||
|
\c CVTPS2PI mm,xmm/mem64 ; 0F 2D /r [KATMAI,SSE]
|
||
|
|
||
|
\c{CVTPS2PI} converts two packed single-precision FP values from
|
||
|
the source operand to two packed signed doublewords in the destination
|
||
|
operand.
|
||
|
|
||
|
The destination operand is an \c{MMX} register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location. If the
|
||
|
source is a register, the input values are in the low quadword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTSD2SI} \i\c{CVTSD2SI}:
|
||
|
Scalar Double-Precision FP to Signed INT32 Conversion
|
||
|
|
||
|
\c CVTSD2SI reg32,xmm/mem64 ; F2 0F 2D /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTSD2SI} converts a double-precision FP value from the source
|
||
|
operand to a signed doubleword in the destination operand.
|
||
|
|
||
|
The destination operand is a general purpose register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location. If the
|
||
|
source is a register, the input value is in the low quadword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTSD2SS} \i\c{CVTSD2SS}:
|
||
|
Scalar Double-Precision FP to Scalar Single-Precision FP Conversion
|
||
|
|
||
|
\c CVTSD2SS xmm1,xmm2/mem64 ; F2 0F 5A /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTSD2SS} converts a double-precision FP value from the source
|
||
|
operand to a single-precision FP value in the low doubleword of the
|
||
|
destination operand. The upper 3 doublewords are left unchanged.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location. If the
|
||
|
source is a register, the input value is in the low quadword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTSI2SD} \i\c{CVTSI2SD}:
|
||
|
Signed INT32 to Scalar Double-Precision FP Conversion
|
||
|
|
||
|
\c CVTSI2SD xmm,r/m32 ; F2 0F 2A /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTSI2SD} converts a signed doubleword from the source operand to
|
||
|
a double-precision FP value in the low quadword of the destination
|
||
|
operand. The high quadword is left unchanged.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be either
|
||
|
a general purpose register or a 32-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTSI2SS} \i\c{CVTSI2SS}:
|
||
|
Signed INT32 to Scalar Single-Precision FP Conversion
|
||
|
|
||
|
\c CVTSI2SS xmm,r/m32 ; F3 0F 2A /r [KATMAI,SSE]
|
||
|
|
||
|
\c{CVTSI2SS} converts a signed doubleword from the source operand to a
|
||
|
single-precision FP value in the low doubleword of the destination operand.
|
||
|
The upper 3 doublewords are left unchanged.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be either
|
||
|
a general purpose register or a 32-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTSS2SD} \i\c{CVTSS2SD}:
|
||
|
Scalar Single-Precision FP to Scalar Double-Precision FP Conversion
|
||
|
|
||
|
\c CVTSS2SD xmm1,xmm2/mem32 ; F3 0F 5A /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTSS2SD} converts a single-precision FP value from the source operand
|
||
|
to a double-precision FP value in the low quadword of the destination
|
||
|
operand. The upper quadword is left unchanged.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be either
|
||
|
an \c{XMM} register or a 32-bit memory location. If the source is a
|
||
|
register, the input value is contained in the low doubleword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTSS2SI} \i\c{CVTSS2SI}:
|
||
|
Scalar Single-Precision FP to Signed INT32 Conversion
|
||
|
|
||
|
\c CVTSS2SI reg32,xmm/mem32 ; F3 0F 2D /r [KATMAI,SSE]
|
||
|
|
||
|
\c{CVTSS2SI} converts a single-precision FP value from the source
|
||
|
operand to a signed doubleword in the destination operand.
|
||
|
|
||
|
The destination operand is a general purpose register. The source can be
|
||
|
either an \c{XMM} register or a 32-bit memory location. If the
|
||
|
source is a register, the input value is in the low doubleword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTTPD2DQ} \i\c{CVTTPD2DQ}:
|
||
|
Packed Double-Precision FP to Packed Signed INT32 Conversion with Truncation
|
||
|
|
||
|
\c CVTTPD2DQ xmm1,xmm2/mem128 ; 66 0F E6 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTTPD2DQ} converts two packed double-precision FP values in the source
|
||
|
operand to two packed signed doublewords in the destination operand.
|
||
|
If the result is inexact, it is truncated (rounded toward zero). The high
|
||
|
quadword is set to all 0s.
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTTPD2PI} \i\c{CVTTPD2PI}:
|
||
|
Packed Double-Precision FP to Packed Signed INT32 Conversion with Truncation
|
||
|
|
||
|
\c CVTTPD2PI mm,xmm/mem128 ; 66 0F 2C /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTTPD2PI} converts two packed double-precision FP values in the source
|
||
|
operand to two packed signed doublewords in the destination operand.
|
||
|
If the result is inexact, it is truncated (rounded toward zero).
|
||
|
|
||
|
The destination operand is an \c{MMX} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTTPS2DQ} \i\c{CVTTPS2DQ}:
|
||
|
Packed Single-Precision FP to Packed Signed INT32 Conversion with Truncation
|
||
|
|
||
|
\c CVTTPS2DQ xmm1,xmm2/mem128 ; F3 0F 5B /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTTPS2DQ} converts four packed single-precision FP values in the source
|
||
|
operand to four packed signed doublewords in the destination operand.
|
||
|
If the result is inexact, it is truncated (rounded toward zero).
|
||
|
|
||
|
The destination operand is an \c{XMM} register. The source can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTTPS2PI} \i\c{CVTTPS2PI}:
|
||
|
Packed Single-Precision FP to Packed Signed INT32 Conversion with Truncation
|
||
|
|
||
|
\c CVTTPS2PI mm,xmm/mem64 ; 0F 2C /r [KATMAI,SSE]
|
||
|
|
||
|
\c{CVTTPS2PI} converts two packed single-precision FP values in the source
|
||
|
operand to two packed signed doublewords in the destination operand.
|
||
|
If the result is inexact, it is truncated (rounded toward zero). If
|
||
|
the source is a register, the input values are in the low quadword.
|
||
|
|
||
|
The destination operand is an \c{MMX} register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insCVTTSD2SI} \i\c{CVTTSD2SI}:
|
||
|
Scalar Double-Precision FP to Signed INT32 Conversion with Truncation
|
||
|
|
||
|
\c CVTTSD2SI reg32,xmm/mem64 ; F2 0F 2C /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{CVTTSD2SI} converts a double-precision FP value in the source operand
|
||
|
to a signed doubleword in the destination operand. If the result is
|
||
|
inexact, it is truncated (rounded toward zero).
|
||
|
|
||
|
The destination operand is a general purpose register. The source can be
|
||
|
either an \c{XMM} register or a 64-bit memory location. If the source is a
|
||
|
register, the input value is in the low quadword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
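
The difference from \c{CVTSD2SI} (\k{insCVTSD2SI}) shows up only for
inexact results. As a sketch, assuming a hypothetical variable
\c{val} defined with \c{DQ 2.7} and the default round-to-nearest mode
in \c{MXCSR}:

\c         MOVSD     XMM0,[val]       ; XMM0 = 2.7
\c         CVTSD2SI  EAX,XMM0         ; EAX = 3: rounded per MXCSR
\c         CVTTSD2SI ECX,XMM0         ; ECX = 2: always rounded toward zero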
|
||
|
|
||
|
|
||
|
\S{insCVTTSS2SI} \i\c{CVTTSS2SI}:
|
||
|
Scalar Single-Precision FP to Signed INT32 Conversion with Truncation
|
||
|
|
||
|
\c CVTTSS2SI reg32,xmm/mem32 ; F3 0F 2C /r [KATMAI,SSE]
|
||
|
|
||
|
\c{CVTTSS2SI} converts a single-precision FP value in the source operand
|
||
|
to a signed doubleword in the destination operand. If the result is
|
||
|
inexact, it is truncated (rounded toward zero).
|
||
|
|
||
|
The destination operand is a general purpose register. The source can be
|
||
|
either an \c{XMM} register or a 32-bit memory location. If the source is a
|
||
|
register, the input value is in the low doubleword.
|
||
|
|
||
|
For more details of this instruction, see the Intel Processor manuals.
|
||
|
|
||
|
|
||
|
\S{insDAA} \i\c{DAA}, \i\c{DAS}: Decimal Adjustments
|
||
|
|
||
|
\c DAA ; 27 [8086]
|
||
|
\c DAS ; 2F [8086]
|
||
|
|
||
|
These instructions are used in conjunction with the add and subtract
|
||
|
instructions to perform binary-coded decimal arithmetic in
|
||
|
\e{packed} (one BCD digit per nibble) form. For the unpacked
|
||
|
equivalents, see \k{insAAA}.
|
||
|
|
||
|
\c{DAA} should be used after a one-byte \c{ADD} instruction whose
|
||
|
destination was the \c{AL} register: by means of examining the value
|
||
|
in \c{AL} and also the auxiliary carry flag \c{AF}, it
|
||
|
determines whether either digit of the addition has overflowed, and
|
||
|
adjusts it (and sets the carry and auxiliary-carry flags) if so. You
|
||
|
can add long BCD strings together by doing \c{ADD}/\c{DAA} on the
|
||
|
low two digits, then doing \c{ADC}/\c{DAA} on each subsequent pair
|
||
|
of digits.
|
||
|
|
||
|
\c{DAS} works similarly to \c{DAA}, but is for use after \c{SUB}
|
||
|
instructions rather than \c{ADD}.
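
A small worked example of \c{DAA}, adding the packed BCD values 38
and 45:

\c         MOV     AL,0x38            ; packed BCD 38
\c         ADD     AL,0x45            ; binary sum is 0x7D, not valid BCD
\c         DAA                        ; low digit D > 9, so 6 is added: AL = 0x83

The result 0x83 is the packed BCD form of 83, and the carry flag is
left clear because the sum did not exceed 99.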
|
||
|
|
||
|
|
||
|
\S{insDEC} \i\c{DEC}: Decrement Integer
|
||
|
|
||
|
\c DEC reg16 ; o16 48+r [8086]
|
||
|
\c DEC reg32 ; o32 48+r [386]
|
||
|
\c DEC r/m8 ; FE /1 [8086]
|
||
|
\c DEC r/m16 ; o16 FF /1 [8086]
|
||
|
\c DEC r/m32 ; o32 FF /1 [386]
|
||
|
|
||
|
\c{DEC} subtracts 1 from its operand. It does \e{not} affect the
|
||
|
carry flag: to affect the carry flag, use \c{SUB something,1} (see
|
||
|
\k{insSUB}). \c{DEC} affects all the other flags according to the result.
|
||
|
|
||
|
This instruction can be used with a \c{LOCK} prefix to allow atomic
|
||
|
execution.
|
||
|
|
||
|
See also \c{INC} (\k{insINC}).
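
Leaving the carry flag alone is what makes \c{DEC} usable as a loop
counter inside a multi-precision addition. A sketch, assuming
\c{ESI} and \c{EDI} point to two little-endian numbers of \c{ECX}
doublewords each (with \c{ECX} nonzero):

\c         CLC                        ; no carry into the lowest doubleword
\c next:   MOV     EAX,[ESI]
\c         ADC     [EDI],EAX          ; add, including carry from last round
\c         LEA     ESI,[ESI+4]        ; LEA advances the pointers without
\c         LEA     EDI,[EDI+4]        ; touching any flags
\c         DEC     ECX                ; sets ZF for the loop, preserves CF
\c         JNZ     next

Using \c{SUB ECX,1} here instead would destroy the carry needed by
the next \c{ADC}.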
|
||
|
|
||
|
|
||
|
\S{insDIV} \i\c{DIV}: Unsigned Integer Divide
|
||
|
|
||
|
\c DIV r/m8 ; F6 /6 [8086]
|
||
|
\c DIV r/m16 ; o16 F7 /6 [8086]
|
||
|
\c DIV r/m32 ; o32 F7 /6 [386]
|
||
|
|
||
|
\c{DIV} performs unsigned integer division. The explicit operand
|
||
|
provided is the divisor; the dividend and destination operands are
|
||
|
implicit, in the following way:
|
||
|
|
||
|
\b For \c{DIV r/m8}, \c{AX} is divided by the given operand; the
|
||
|
quotient is stored in \c{AL} and the remainder in \c{AH}.
|
||
|
|
||
|
\b For \c{DIV r/m16}, \c{DX:AX} is divided by the given operand; the
|
||
|
quotient is stored in \c{AX} and the remainder in \c{DX}.
|
||
|
|
||
|
\b For \c{DIV r/m32}, \c{EDX:EAX} is divided by the given operand;
|
||
|
the quotient is stored in \c{EAX} and the remainder in \c{EDX}.
|
||
|
|
||
|
Signed integer division is performed by the \c{IDIV} instruction:
|
||
|
see \k{insIDIV}.
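
As a small worked example, dividing 100 by 7 using the 32-bit form:

\c         MOV     EAX,100            ; low half of the dividend
\c         XOR     EDX,EDX            ; high half must be cleared explicitly
\c         MOV     ECX,7              ; divisor
\c         DIV     ECX                ; EAX = 14 (quotient), EDX = 2 (remainder)

Whatever \c{EDX} holds is treated as the upper 32 bits of the
dividend, so it has to be cleared before dividing a 32-bit value.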
|
||
|
|
||
|
|
||
|
\S{insDIVPD} \i\c{DIVPD}: Packed Double-Precision FP Divide
|
||
|
|
||
|
\c DIVPD xmm1,xmm2/mem128 ; 66 0F 5E /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{DIVPD} divides the two packed double-precision FP values in
|
||
|
the destination operand by the two packed double-precision FP
|
||
|
values in the source operand, and stores the packed double-precision
|
||
|
results in the destination register.
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
\c dst[0-63] := dst[0-63] / src[0-63],
|
||
|
\c dst[64-127] := dst[64-127] / src[64-127].
|
||
|
|
||
|
|
||
|
\S{insDIVPS} \i\c{DIVPS}: Packed Single-Precision FP Divide
|
||
|
|
||
|
\c DIVPS xmm1,xmm2/mem128 ; 0F 5E /r [KATMAI,SSE]
|
||
|
|
||
|
\c{DIVPS} divides the four packed single-precision FP values in
|
||
|
the destination operand by the four packed single-precision FP
|
||
|
values in the source operand, and stores the packed single-precision
|
||
|
results in the destination register.
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 128-bit memory location.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] / src[0-31],
|
||
|
\c dst[32-63] := dst[32-63] / src[32-63],
|
||
|
\c dst[64-95] := dst[64-95] / src[64-95],
|
||
|
\c dst[96-127] := dst[96-127] / src[96-127].
|
||
|
|
||
|
|
||
|
\S{insDIVSD} \i\c{DIVSD}: Scalar Double-Precision FP Divide
|
||
|
|
||
|
\c DIVSD xmm1,xmm2/mem64 ; F2 0F 5E /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{DIVSD} divides the low-order double-precision FP value in the
|
||
|
destination operand by the low-order double-precision FP value in
|
||
|
the source operand, and stores the double-precision result in the
|
||
|
destination register.
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 64-bit memory location.
|
||
|
|
||
|
\c dst[0-63] := dst[0-63] / src[0-63],
|
||
|
\c dst[64-127] remains unchanged.
|
||
|
|
||
|
|
||
|
\S{insDIVSS} \i\c{DIVSS}: Scalar Single-Precision FP Divide
|
||
|
|
||
|
\c DIVSS xmm1,xmm2/mem32 ; F3 0F 5E /r [KATMAI,SSE]
|
||
|
|
||
|
\c{DIVSS} divides the low-order single-precision FP value in the
|
||
|
destination operand by the low-order single-precision FP value in
|
||
|
the source operand, and stores the single-precision result in the
|
||
|
destination register.
|
||
|
|
||
|
The destination is an \c{XMM} register. The source operand can be
|
||
|
either an \c{XMM} register or a 32-bit memory location.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] / src[0-31],
|
||
|
\c dst[32-127] remains unchanged.
|
||
|
|
||
|
|
||
|
\S{insEMMS} \i\c{EMMS}: Empty MMX State
|
||
|
|
||
|
\c EMMS ; 0F 77 [PENT,MMX]
|
||
|
|
||
|
\c{EMMS} sets the FPU tag word (marking which floating-point registers
|
||
|
are available) to all ones, meaning all registers are available for
|
||
|
the FPU to use. It should be used after executing \c{MMX} instructions
|
||
|
and before executing any subsequent floating-point operations.
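
A typical sequence might look like the following sketch, where \c{x}
is a hypothetical 32-bit floating-point variable:

\c         PADDW   MM0,MM1            ; MMX work, using the FPU register file
\c         EMMS                       ; mark all FPU registers empty again
\c         FLD     DWORD [x]          ; the FPU stack is now safe to use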
|
||
|
|
||
|
|
||
|
\S{insENTER} \i\c{ENTER}: Create Stack Frame
|
||
|
|
||
|
\c ENTER imm,imm ; C8 iw ib [186]
|
||
|
|
||
|
\c{ENTER} constructs a \i{stack frame} for a high-level language
|
||
|
procedure call. The first operand (the \c{iw} in the opcode
|
||
|
definition above refers to the first operand) gives the amount of
|
||
|
stack space to allocate for local variables; the second (the \c{ib}
|
||
|
above) gives the nesting level of the procedure (for languages like
|
||
|
Pascal, with nested procedures).
|
||
|
|
||
|
The function of \c{ENTER}, with a nesting level of zero, is
|
||
|
equivalent to
|
||
|
|
||
|
\c PUSH EBP ; or PUSH BP in 16 bits
|
||
|
\c MOV EBP,ESP ; or MOV BP,SP in 16 bits
|
||
|
\c SUB ESP,operand1 ; or SUB SP,operand1 in 16 bits
|
||
|
|
||
|
This creates a stack frame with the procedure parameters accessible
|
||
|
upwards from \c{EBP}, and local variables accessible downwards from
|
||
|
\c{EBP}.
|
||
|
|
||
|
With a nesting level of one, the stack frame created is 4 (or 2)
|
||
|
bytes bigger, and the value of the final frame pointer \c{EBP} is
|
||
|
accessible in memory at \c{[EBP-4]}.
|
||
|
|
||
|
This allows \c{ENTER}, when called with a nesting level of two, to
|
||
|
look at the stack frame described by the \e{previous} value of
|
||
|
\c{EBP}, find the frame pointer at offset -4 from that, and push it
|
||
|
along with its new frame pointer, so that when a level-two procedure
|
||
|
is called from within a level-one procedure, \c{[EBP-4]} holds the
|
||
|
frame pointer of the most recent level-one procedure call and
|
||
|
\c{[EBP-8]} holds that of the most recent level-two call. And so on,
|
||
|
for nesting levels up to 31.
|
||
|
|
||
|
Stack frames created by \c{ENTER} can be destroyed by the \c{LEAVE}
|
||
|
instruction: see \k{insLEAVE}.
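
As an illustration, a procedure with no nesting that reserves eight
bytes of locals might be sketched like this, assuming 32-bit code, a
near call, and a caller that pushed one doubleword parameter (the
label \c{myproc} is hypothetical):

\c myproc: ENTER   8,0                ; PUSH EBP / MOV EBP,ESP / SUB ESP,8
\c         MOV     EAX,[EBP+8]        ; the parameter, above the return address
\c         MOV     [EBP-4],EAX        ; first local variable
\c         LEAVE                      ; release the frame and restore EBP
\c         RET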
|
||
|
|
||
|
|
||
|
\S{insF2XM1} \i\c{F2XM1}: Calculate 2**X-1
|
||
|
|
||
|
\c F2XM1 ; D9 F0 [8086,FPU]
|
||
|
|
||
|
\c{F2XM1} raises 2 to the power of \c{ST0}, subtracts one, and
|
||
|
stores the result back into \c{ST0}. The initial contents of \c{ST0}
|
||
|
must be a number in the range -1.0 to +1.0.
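
Since \c{F2XM1} yields one less than the power of two, obtaining
2**X itself takes one more addition. A sketch, with \c{x} a
hypothetical 32-bit floating-point variable in the permitted range:

\c         FLD     DWORD [x]          ; ST0 = x
\c         F2XM1                      ; ST0 = 2**x - 1
\c         FLD1                       ; ST0 = 1.0, ST1 = 2**x - 1
\c         FADDP   ST1                ; ST1 = ST1 + ST0, pop: ST0 = 2**x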
|
||
|
|
||
|
|
||
|
\S{insFABS} \i\c{FABS}: Floating-Point Absolute Value
|
||
|
|
||
|
\c FABS ; D9 E1 [8086,FPU]
|
||
|
|
||
|
\c{FABS} computes the absolute value of \c{ST0}, by clearing the sign
|
||
|
bit, and stores the result back in \c{ST0}.
|
||
|
|
||
|
|
||
|
\S{insFADD} \i\c{FADD}, \i\c{FADDP}: Floating-Point Addition
|
||
|
|
||
|
\c FADD mem32 ; D8 /0 [8086,FPU]
|
||
|
\c FADD mem64 ; DC /0 [8086,FPU]
|
||
|
|
||
|
\c FADD fpureg ; D8 C0+r [8086,FPU]
|
||
|
\c FADD ST0,fpureg ; D8 C0+r [8086,FPU]
|
||
|
|
||
|
\c FADD TO fpureg ; DC C0+r [8086,FPU]
|
||
|
\c FADD fpureg,ST0 ; DC C0+r [8086,FPU]
|
||
|
|
||
|
\c FADDP fpureg ; DE C0+r [8086,FPU]
|
||
|
\c FADDP fpureg,ST0 ; DE C0+r [8086,FPU]
|
||
|
|
||
|
\b \c{FADD}, given one operand, adds the operand to \c{ST0} and stores
|
||
|
the result back in \c{ST0}. If the operand has the \c{TO} modifier,
|
||
|
the result is stored in the register given rather than in \c{ST0}.
|
||
|
|
||
|
\b \c{FADDP} performs the same function as \c{FADD TO}, but pops the
|
||
|
register stack after storing the result.
|
||
|
|
||
|
The given two-operand forms are synonyms for the one-operand forms.
|
||
|
|
||
|
To add an integer value to \c{ST0}, use the \c{FIADD} instruction
|
||
|
(\k{insFIADD}).
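
The \c{FADD} and \c{FADDP} forms listed above might be used as
follows (a sketch; \c{x} is a hypothetical 32-bit floating-point
variable):

\c         FADD    DWORD [x]          ; ST0 := ST0 + [x]
\c         FADD    ST1                ; ST0 := ST0 + ST1
\c         FADD    TO ST1             ; ST1 := ST1 + ST0
\c         FADDP   ST1                ; ST1 := ST1 + ST0, then pop the stack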
|
||
|
|
||
|
|
||
|
\S{insFBLD} \i\c{FBLD}, \i\c{FBSTP}: BCD Floating-Point Load and Store
|
||
|
|
||
|
\c FBLD mem80 ; DF /4 [8086,FPU]
|
||
|
\c FBSTP mem80 ; DF /6 [8086,FPU]
|
||
|
|
||
|
\c{FBLD} loads an 80-bit (ten-byte) packed binary-coded decimal
|
||
|
number from the given memory address, converts it to a real, and
|
||
|
pushes it on the register stack. \c{FBSTP} stores the value of
|
||
|
\c{ST0}, in packed BCD, at the given address and then pops the
|
||
|
register stack.
|
||
|
|
||
|
|
||
|
\S{insFCHS} \i\c{FCHS}: Floating-Point Change Sign
|
||
|
|
||
|
\c FCHS ; D9 E0 [8086,FPU]
|
||
|
|
||
|
\c{FCHS} negates the number in \c{ST0}, by inverting the sign bit:
|
||
|
negative numbers become positive, and vice versa.
|
||
|
|
||
|
|
||
|
\S{insFCLEX} \i\c{FCLEX}, \i\c{FNCLEX}: Clear Floating-Point Exceptions
|
||
|
|
||
|
\c FCLEX ; 9B DB E2 [8086,FPU]
|
||
|
\c FNCLEX ; DB E2 [8086,FPU]
|
||
|
|
||
|
\c{FCLEX} clears any floating-point exceptions which may be pending.
|
||
|
\c{FNCLEX} does the same thing but doesn't wait for previous
|
||
|
floating-point operations (including the \e{handling} of pending
exceptions) to finish first.


\S{insFCMOVB} \i\c{FCMOVcc}: Floating-Point Conditional Move

\c FCMOVB fpureg                   ; DA C0+r                [P6,FPU]
\c FCMOVB ST0,fpureg               ; DA C0+r                [P6,FPU]

\c FCMOVE fpureg                   ; DA C8+r                [P6,FPU]
\c FCMOVE ST0,fpureg               ; DA C8+r                [P6,FPU]

\c FCMOVBE fpureg                  ; DA D0+r                [P6,FPU]
\c FCMOVBE ST0,fpureg              ; DA D0+r                [P6,FPU]

\c FCMOVU fpureg                   ; DA D8+r                [P6,FPU]
\c FCMOVU ST0,fpureg               ; DA D8+r                [P6,FPU]

\c FCMOVNB fpureg                  ; DB C0+r                [P6,FPU]
\c FCMOVNB ST0,fpureg              ; DB C0+r                [P6,FPU]

\c FCMOVNE fpureg                  ; DB C8+r                [P6,FPU]
\c FCMOVNE ST0,fpureg              ; DB C8+r                [P6,FPU]

\c FCMOVNBE fpureg                 ; DB D0+r                [P6,FPU]
\c FCMOVNBE ST0,fpureg             ; DB D0+r                [P6,FPU]

\c FCMOVNU fpureg                  ; DB D8+r                [P6,FPU]
\c FCMOVNU ST0,fpureg              ; DB D8+r                [P6,FPU]

The \c{FCMOV} instructions perform conditional move operations: each
of them moves the contents of the given register into \c{ST0} if its
condition is satisfied, and does nothing if not.

The conditions are not the same as the standard condition codes used
with conditional jump instructions. The conditions \c{B}, \c{BE},
\c{NB}, \c{NBE}, \c{E} and \c{NE} are exactly as normal, but none of
the other standard ones are supported. Instead, the condition \c{U}
and its counterpart \c{NU} are provided; the \c{U} condition is
satisfied if the last two floating-point numbers compared were
\e{unordered}, i.e. they were not equal but neither one could be
said to be greater than the other, for example if they were NaNs.
(The flag state which signals this is the setting of the parity
flag: so the \c{U} condition is notionally equivalent to \c{PE}, and
\c{NU} is equivalent to \c{PO}.)

The \c{FCMOV} conditions test the main processor's status flags, not
the FPU status flags, so using \c{FCMOV} directly after \c{FCOM}
will not work. Instead, you should either use \c{FCOMI}, which writes
directly to the main CPU flags word, or use \c{FSTSW} (\k{insFSTSW})
to extract the FPU flags into \c{AX} and then \c{SAHF} to copy them
into the CPU flags.
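
As a sketch of the \c{FCOMI} route, the following fragment leaves
the larger of two values in \c{ST0}. The labels \c{a} and \c{b} are
hypothetical 64-bit variables, and the fragment assumes that neither
value is a NaN (an unordered result also sets the carry flag, so the
move would be taken in that case too):

\c         fld     qword [b]       ; ST0 = b
\c         fld     qword [a]       ; ST0 = a, ST1 = b
\c         fcomi   st0,st1         ; compare a with b, result in CPU flags
\c         fcmovb  st0,st1         ; if a < b then ST0 = b
\c         fstp    st1             ; drop the spare copy: ST0 = max(a,b)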

Although the \c{FCMOV} instructions are flagged \c{P6} above, they
may not be supported by all Pentium Pro processors; the \c{CPUID}
instruction (\k{insCPUID}) will return a bit which indicates whether
conditional moves are supported.


\S{insFCOM} \i\c{FCOM}, \i\c{FCOMP}, \i\c{FCOMPP}, \i\c{FCOMI},
\i\c{FCOMIP}: Floating-Point Compare

\c FCOM mem32                      ; D8 /2                  [8086,FPU]
\c FCOM mem64                      ; DC /2                  [8086,FPU]
\c FCOM fpureg                     ; D8 D0+r                [8086,FPU]
\c FCOM ST0,fpureg                 ; D8 D0+r                [8086,FPU]

\c FCOMP mem32                     ; D8 /3                  [8086,FPU]
\c FCOMP mem64                     ; DC /3                  [8086,FPU]
\c FCOMP fpureg                    ; D8 D8+r                [8086,FPU]
\c FCOMP ST0,fpureg                ; D8 D8+r                [8086,FPU]

\c FCOMPP                          ; DE D9                  [8086,FPU]

\c FCOMI fpureg                    ; DB F0+r                [P6,FPU]
\c FCOMI ST0,fpureg                ; DB F0+r                [P6,FPU]

\c FCOMIP fpureg                   ; DF F0+r                [P6,FPU]
\c FCOMIP ST0,fpureg               ; DF F0+r                [P6,FPU]

\c{FCOM} compares \c{ST0} with the given operand, and sets the FPU
flags accordingly. \c{ST0} is treated as the left-hand side of the
comparison, so that the carry flag is set (for a `less-than' result)
if \c{ST0} is less than the given operand.

\c{FCOMP} does the same as \c{FCOM}, but pops the register stack
afterwards. \c{FCOMPP} compares \c{ST0} with \c{ST1} and then pops
the register stack twice.

\c{FCOMI} and \c{FCOMIP} work like the corresponding forms of
\c{FCOM} and \c{FCOMP}, but write their results directly to the CPU
flags register rather than the FPU status word, so they can be
immediately followed by conditional jump or conditional move
instructions.
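
By contrast, here is a sketch of the older sequence needed with
plain \c{FCOM}, which only sets the FPU flags. The labels \c{x},
\c{y} and \c{.less} are hypothetical:

\c         fld     dword [x]       ; ST0 = x
\c         fcom    dword [y]       ; compare x with y, result in FPU flags
\c         fnstsw  ax              ; copy the FPU status word into AX
\c         sahf                    ; AH to CPU flags: C0->CF, C2->PF, C3->ZF
\c         jb      .less           ; taken if x < y (or if unordered)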

The \c{FCOM} instructions differ from the \c{FUCOM} instructions
(\k{insFUCOM}) only in the way they handle quiet NaNs: \c{FUCOM}
will handle them silently and set the condition code flags to an
`unordered' result, whereas \c{FCOM} will generate an exception.


\S{insFCOS} \i\c{FCOS}: Cosine

\c FCOS                            ; D9 FF                  [386,FPU]

\c{FCOS} computes the cosine of \c{ST0} (in radians), and stores the
result in \c{ST0}. The absolute value of \c{ST0} must be less than 2**63.

See also \c{FSINCOS} (\k{insFSIN}).


\S{insFDECSTP} \i\c{FDECSTP}: Decrement Floating-Point Stack Pointer

\c FDECSTP                         ; D9 F6                  [8086,FPU]

\c{FDECSTP} decrements the `top' field in the floating-point status
word. This has the effect of rotating the FPU register stack by one,
as if the contents of \c{ST7} had been pushed on the stack. See also
\c{FINCSTP} (\k{insFINCSTP}).


\S{insFDISI} \i\c{FxDISI}, \i\c{FxENI}: Disable and Enable Floating-Point Interrupts

\c FDISI                           ; 9B DB E1               [8086,FPU]
\c FNDISI                          ; DB E1                  [8086,FPU]

\c FENI                            ; 9B DB E0               [8086,FPU]
\c FNENI                           ; DB E0                  [8086,FPU]

\c{FDISI} and \c{FENI} disable and enable floating-point interrupts.
These instructions are only meaningful on original 8087 processors:
the 287 and above treat them as no-operation instructions.

\c{FNDISI} and \c{FNENI} do the same thing as \c{FDISI} and \c{FENI}
respectively, but without waiting for the floating-point processor
to finish what it was doing first.


\S{insFDIV} \i\c{FDIV}, \i\c{FDIVP}, \i\c{FDIVR}, \i\c{FDIVRP}: Floating-Point Division

\c FDIV mem32                      ; D8 /6                  [8086,FPU]
\c FDIV mem64                      ; DC /6                  [8086,FPU]

\c FDIV fpureg                     ; D8 F0+r                [8086,FPU]
\c FDIV ST0,fpureg                 ; D8 F0+r                [8086,FPU]

\c FDIV TO fpureg                  ; DC F8+r                [8086,FPU]
\c FDIV fpureg,ST0                 ; DC F8+r                [8086,FPU]

\c FDIVR mem32                     ; D8 /7                  [8086,FPU]
\c FDIVR mem64                     ; DC /7                  [8086,FPU]

\c FDIVR fpureg                    ; D8 F8+r                [8086,FPU]
\c FDIVR ST0,fpureg                ; D8 F8+r                [8086,FPU]

\c FDIVR TO fpureg                 ; DC F0+r                [8086,FPU]
\c FDIVR fpureg,ST0                ; DC F0+r                [8086,FPU]

\c FDIVP fpureg                    ; DE F8+r                [8086,FPU]
\c FDIVP fpureg,ST0                ; DE F8+r                [8086,FPU]

\c FDIVRP fpureg                   ; DE F0+r                [8086,FPU]
\c FDIVRP fpureg,ST0               ; DE F0+r                [8086,FPU]

\b \c{FDIV} divides \c{ST0} by the given operand and stores the result
back in \c{ST0}, unless the \c{TO} qualifier is given, in which case
it divides the given operand by \c{ST0} and stores the result in the
operand.

\b \c{FDIVR} does the same thing, but does the division the other way
up: so if \c{TO} is not given, it divides the given operand by
\c{ST0} and stores the result in \c{ST0}, whereas if \c{TO} is given
it divides \c{ST0} by its operand and stores the result in the
operand.

\b \c{FDIVP} operates like \c{FDIV TO}, but pops the register stack
once it has finished.

\b \c{FDIVRP} operates like \c{FDIVR TO}, but pops the register stack
once it has finished.
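
A sketch of how the operand order works out in practice, computing
\c{a/b} for two hypothetical 64-bit variables \c{a} and \c{b}:

\c         fld     qword [a]       ; ST0 = a
\c         fld     qword [b]       ; ST0 = b, ST1 = a
\c         fdivp   st1,st0         ; ST1 = ST1 / ST0 = a/b, then pop

After the pop the quotient is in \c{ST0}. Had the two values been
loaded in the opposite order, \c{FDIVRP} would give the same
quotient.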

For FP/Integer divisions, see \c{FIDIV} (\k{insFIDIV}).


\S{insFEMMS} \i\c{FEMMS}: Faster Enter/Exit of the MMX or floating-point state

\c FEMMS                           ; 0F 0E                  [PENT,3DNOW]

\c{FEMMS} can be used in place of the \c{EMMS} instruction on
processors which support the 3DNow! instruction set. Following
execution of \c{FEMMS}, the state of the \c{MMX/FP} registers
is undefined, and this allows a faster context switch between
\c{FP} and \c{MMX} instructions. The \c{FEMMS} instruction can
also be used \e{before} executing \c{MMX} instructions.


\S{insFFREE} \i\c{FFREE}: Flag Floating-Point Register as Unused

\c FFREE fpureg                    ; DD C0+r                [8086,FPU]
\c FFREEP fpureg                   ; DF C0+r                [286,FPU,UNDOC]

\c{FFREE} marks the given register as being empty.

\c{FFREEP} marks the given register as being empty, and then
pops the register stack.


\S{insFIADD} \i\c{FIADD}: Floating-Point/Integer Addition

\c FIADD mem16                     ; DE /0                  [8086,FPU]
\c FIADD mem32                     ; DA /0                  [8086,FPU]

\c{FIADD} adds the 16-bit or 32-bit integer stored in the given
memory location to \c{ST0}, storing the result in \c{ST0}.


\S{insFICOM} \i\c{FICOM}, \i\c{FICOMP}: Floating-Point/Integer Compare

\c FICOM mem16                     ; DE /2                  [8086,FPU]
\c FICOM mem32                     ; DA /2                  [8086,FPU]

\c FICOMP mem16                    ; DE /3                  [8086,FPU]
\c FICOMP mem32                    ; DA /3                  [8086,FPU]

\c{FICOM} compares \c{ST0} with the 16-bit or 32-bit integer stored
in the given memory location, and sets the FPU flags accordingly.
\c{FICOMP} does the same, but pops the register stack afterwards.


\S{insFIDIV} \i\c{FIDIV}, \i\c{FIDIVR}: Floating-Point/Integer Division

\c FIDIV mem16                     ; DE /6                  [8086,FPU]
\c FIDIV mem32                     ; DA /6                  [8086,FPU]

\c FIDIVR mem16                    ; DE /7                  [8086,FPU]
\c FIDIVR mem32                    ; DA /7                  [8086,FPU]

\c{FIDIV} divides \c{ST0} by the 16-bit or 32-bit integer stored in
the given memory location, and stores the result in \c{ST0}.
\c{FIDIVR} does the division the other way up: it divides the
integer by \c{ST0}, but still stores the result in \c{ST0}.


\S{insFILD} \i\c{FILD}, \i\c{FIST}, \i\c{FISTP}: Floating-Point/Integer Conversion

\c FILD mem16                      ; DF /0                  [8086,FPU]
\c FILD mem32                      ; DB /0                  [8086,FPU]
\c FILD mem64                      ; DF /5                  [8086,FPU]

\c FIST mem16                      ; DF /2                  [8086,FPU]
\c FIST mem32                      ; DB /2                  [8086,FPU]

\c FISTP mem16                     ; DF /3                  [8086,FPU]
\c FISTP mem32                     ; DB /3                  [8086,FPU]
\c FISTP mem64                     ; DF /7                  [8086,FPU]

\c{FILD} loads an integer out of a memory location, converts it to a
real, and pushes it on the FPU register stack. \c{FIST} converts
\c{ST0} to an integer and stores that in memory; \c{FISTP} does the
same as \c{FIST}, but pops the register stack afterwards.
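
A minimal sketch of a round trip through the FPU, scaling an integer
by a real factor. The labels \c{n} (a 32-bit integer) and \c{factor}
(a 64-bit real) are hypothetical; the rounding applied by \c{FISTP}
follows the rounding mode currently set in the FPU control word:

\c         fild    dword [n]       ; push n, converted to a real
\c         fmul    qword [factor]  ; ST0 = n * factor
\c         fistp   dword [n]       ; convert back to integer, store and pop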


\S{insFIMUL} \i\c{FIMUL}: Floating-Point/Integer Multiplication

\c FIMUL mem16                     ; DE /1                  [8086,FPU]
\c FIMUL mem32                     ; DA /1                  [8086,FPU]

\c{FIMUL} multiplies \c{ST0} by the 16-bit or 32-bit integer stored
in the given memory location, and stores the result in \c{ST0}.


\S{insFINCSTP} \i\c{FINCSTP}: Increment Floating-Point Stack Pointer

\c FINCSTP                         ; D9 F7                  [8086,FPU]

\c{FINCSTP} increments the `top' field in the floating-point status
word. This has the effect of rotating the FPU register stack by one,
as if the register stack had been popped; however, unlike the
popping of the stack performed by many FPU instructions, it does not
flag the new \c{ST7} (previously \c{ST0}) as empty. See also
\c{FDECSTP} (\k{insFDECSTP}).


\S{insFINIT} \i\c{FINIT}, \i\c{FNINIT}: Initialize Floating-Point Unit

\c FINIT                           ; 9B DB E3               [8086,FPU]
\c FNINIT                          ; DB E3                  [8086,FPU]

\c{FINIT} initializes the FPU to its default state. It flags all
registers as empty, without actually changing their values, and
clears the top of stack pointer. \c{FNINIT} does the same, without
first waiting for pending exceptions to clear.


\S{insFISUB} \i\c{FISUB}: Floating-Point/Integer Subtraction

\c FISUB mem16                     ; DE /4                  [8086,FPU]
\c FISUB mem32                     ; DA /4                  [8086,FPU]

\c FISUBR mem16                    ; DE /5                  [8086,FPU]
\c FISUBR mem32                    ; DA /5                  [8086,FPU]

\c{FISUB} subtracts the 16-bit or 32-bit integer stored in the given
memory location from \c{ST0}, and stores the result in \c{ST0}.
\c{FISUBR} does the subtraction the other way round, i.e. it
subtracts \c{ST0} from the given integer, but still stores the
result in \c{ST0}.


\S{insFLD} \i\c{FLD}: Floating-Point Load

\c FLD mem32                       ; D9 /0                  [8086,FPU]
\c FLD mem64                       ; DD /0                  [8086,FPU]
\c FLD mem80                       ; DB /5                  [8086,FPU]
\c FLD fpureg                      ; D9 C0+r                [8086,FPU]

\c{FLD} loads a floating-point value out of the given register or
memory location, and pushes it on the FPU register stack.


\S{insFLD1} \i\c{FLDxx}: Floating-Point Load Constants

\c FLD1                            ; D9 E8                  [8086,FPU]
\c FLDL2E                          ; D9 EA                  [8086,FPU]
\c FLDL2T                          ; D9 E9                  [8086,FPU]
\c FLDLG2                          ; D9 EC                  [8086,FPU]
\c FLDLN2                          ; D9 ED                  [8086,FPU]
\c FLDPI                           ; D9 EB                  [8086,FPU]
\c FLDZ                            ; D9 EE                  [8086,FPU]

These instructions push specific standard constants on the FPU
register stack.

\c  Instruction    Constant pushed

\c  FLD1           1
\c  FLDL2E         base-2 logarithm of e
\c  FLDL2T         base-2 log of 10
\c  FLDLG2         base-10 log of 2
\c  FLDLN2         base-e log of 2
\c  FLDPI          pi
\c  FLDZ           zero


\S{insFLDCW} \i\c{FLDCW}: Load Floating-Point Control Word

\c FLDCW mem16                     ; D9 /5                  [8086,FPU]

\c{FLDCW} loads a 16-bit value out of memory and stores it into the
FPU control word (governing things like the rounding mode, the
precision, and the exception masks). See also \c{FSTCW}
(\k{insFSTCW}). If exceptions are enabled and you don't want to
generate one, use \c{FCLEX} or \c{FNCLEX} (\k{insFCLEX}) before
loading the new control word.
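
As an illustration, the following sketch switches the rounding mode
to round-towards-zero and later restores the previous setting. The
labels \c{oldcw} and \c{newcw} are hypothetical 16-bit variables;
the rounding-control field occupies bits 10 and 11 of the control
word:

\c         fnstcw  word [oldcw]    ; save the current control word
\c         mov     ax,[oldcw]
\c         or      ax,0x0C00       ; rounding control = 11b (towards zero)
\c         mov     [newcw],ax
\c         fldcw   word [newcw]    ; activate the new rounding mode
\c         ; ... code which relies on truncation goes here ...
\c         fldcw   word [oldcw]    ; restore the original control word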


\S{insFLDENV} \i\c{FLDENV}: Load Floating-Point Environment

\c FLDENV mem                      ; D9 /4                  [8086,FPU]

\c{FLDENV} loads the FPU operating environment (control word, status
word, tag word, instruction pointer, data pointer and last opcode)
from memory. The memory area is 14 or 28 bytes long, depending on
the CPU mode at the time. See also \c{FSTENV} (\k{insFSTENV}).


\S{insFMUL} \i\c{FMUL}, \i\c{FMULP}: Floating-Point Multiply

\c FMUL mem32                      ; D8 /1                  [8086,FPU]
\c FMUL mem64                      ; DC /1                  [8086,FPU]

\c FMUL fpureg                     ; D8 C8+r                [8086,FPU]
\c FMUL ST0,fpureg                 ; D8 C8+r                [8086,FPU]

\c FMUL TO fpureg                  ; DC C8+r                [8086,FPU]
\c FMUL fpureg,ST0                 ; DC C8+r                [8086,FPU]

\c FMULP fpureg                    ; DE C8+r                [8086,FPU]
\c FMULP fpureg,ST0                ; DE C8+r                [8086,FPU]

\c{FMUL} multiplies \c{ST0} by the given operand, and stores the
result in \c{ST0}, unless the \c{TO} qualifier is used in which case
it stores the result in the operand. \c{FMULP} performs the same
operation as \c{FMUL TO}, and then pops the register stack.


\S{insFNOP} \i\c{FNOP}: Floating-Point No Operation

\c FNOP                            ; D9 D0                  [8086,FPU]

\c{FNOP} does nothing.


\S{insFPATAN} \i\c{FPATAN}, \i\c{FPTAN}: Arctangent and Tangent

\c FPATAN                          ; D9 F3                  [8086,FPU]
\c FPTAN                           ; D9 F2                  [8086,FPU]

\c{FPATAN} computes the arctangent, in radians, of the result of
dividing \c{ST1} by \c{ST0}, stores the result in \c{ST1}, and pops
the register stack. It works like the C \c{atan2} function, in that
changing the sign of both \c{ST0} and \c{ST1} changes the output
value by pi (so it performs true rectangular-to-polar coordinate
conversion, with \c{ST1} being the Y coordinate and \c{ST0} being
the X coordinate, not merely an arctangent).
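
A sketch of the coordinate conversion just described, for
hypothetical 64-bit variables \c{x} and \c{y}:

\c         fld     qword [y]       ; ST0 = y
\c         fld     qword [x]       ; ST0 = x, ST1 = y
\c         fpatan                  ; ST0 = angle of the point (x,y) in radians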

\c{FPTAN} computes the tangent of the value in \c{ST0} (in radians),
stores the result back into \c{ST0}, and then pushes the constant
1.0 on the register stack, so that the tangent ends up in \c{ST1}.

The absolute value of \c{ST0} must be less than 2**63.


\S{insFPREM} \i\c{FPREM}, \i\c{FPREM1}: Floating-Point Partial Remainder

\c FPREM                           ; D9 F8                  [8086,FPU]
\c FPREM1                          ; D9 F5                  [386,FPU]

These instructions both produce the remainder obtained by dividing
\c{ST0} by \c{ST1}. This is calculated, notionally, by dividing
\c{ST0} by \c{ST1}, rounding the result to an integer, multiplying
by \c{ST1} again, and computing the value which would need to be
added back on to the result to get back to the original value in
\c{ST0}.

The two instructions differ in the way the notional round-to-integer
operation is performed. \c{FPREM} does it by rounding towards zero,
so that the remainder it returns always has the same sign as the
original value in \c{ST0}; \c{FPREM1} does it by rounding to the
nearest integer, so that the remainder always has at most half the
magnitude of \c{ST1}.

Both instructions calculate \e{partial} remainders, meaning that
they may not manage to provide the final result, but might leave
intermediate results in \c{ST0} instead. If this happens, they will
set the C2 flag in the FPU status word; therefore, to calculate a
remainder, you should repeatedly execute \c{FPREM} or \c{FPREM1}
until C2 becomes clear.
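
Such a loop might be sketched as follows, assuming the dividend is
already in \c{ST0} and the divisor in \c{ST1}; the label \c{.again}
is hypothetical. C2 is bit 10 of the status word, which becomes
bit 2 of \c{AH} once the word has been stored in \c{AX}:

\c .again: fprem                   ; one partial-remainder step
\c         fnstsw  ax              ; fetch the FPU status word
\c         test    ah,0x04         ; is C2 still set?
\c         jnz     .again          ; yes: reduction not yet complete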


\S{insFRNDINT} \i\c{FRNDINT}: Floating-Point Round to Integer

\c FRNDINT                         ; D9 FC                  [8086,FPU]

\c{FRNDINT} rounds the contents of \c{ST0} to an integer, according
to the current rounding mode set in the FPU control word, and stores
the result back in \c{ST0}.


\S{insFRSTOR} \i\c{FSAVE}, \i\c{FRSTOR}: Save/Restore Floating-Point State

\c FSAVE mem                       ; 9B DD /6               [8086,FPU]
\c FNSAVE mem                      ; DD /6                  [8086,FPU]

\c FRSTOR mem                      ; DD /4                  [8086,FPU]

\c{FSAVE} saves the entire floating-point unit state, including all
the information saved by \c{FSTENV} (\k{insFSTENV}) plus the
contents of all the registers, to a 94 or 108 byte area of memory
(depending on the CPU mode). \c{FRSTOR} restores the floating-point
state from the same area of memory.

\c{FNSAVE} does the same as \c{FSAVE}, without first waiting for
pending floating-point exceptions to clear.


\S{insFSCALE} \i\c{FSCALE}: Scale Floating-Point Value by Power of Two

\c FSCALE                          ; D9 FD                  [8086,FPU]

\c{FSCALE} scales a number by a power of two: it rounds \c{ST1}
towards zero to obtain an integer, then multiplies \c{ST0} by two to
the power of that integer, and stores the result in \c{ST0}.
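
For instance, a sketch which computes \c{x} times two to the power
\c{n}, where \c{x} (a 64-bit real) and \c{n} (a 32-bit integer) are
hypothetical variables:

\c         fild    dword [n]       ; ST0 = n
\c         fld     qword [x]       ; ST0 = x, ST1 = n
\c         fscale                  ; ST0 = x * 2**n (ST1 is left in place)
\c         fstp    st1             ; drop n, leaving the result in ST0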


\S{insFSETPM} \i\c{FSETPM}: Set Protected Mode

\c FSETPM                          ; DB E4                  [286,FPU]

This instruction initializes protected mode on the 287 floating-point
coprocessor. It is only meaningful on that processor: the 387 and
above treat the instruction as a no-operation.


\S{insFSIN} \i\c{FSIN}, \i\c{FSINCOS}: Sine and Cosine

\c FSIN                            ; D9 FE                  [386,FPU]
\c FSINCOS                         ; D9 FB                  [386,FPU]

\c{FSIN} calculates the sine of \c{ST0} (in radians) and stores the
result in \c{ST0}. \c{FSINCOS} does the same, but then pushes the
cosine of the same value on the register stack, so that the sine
ends up in \c{ST1} and the cosine in \c{ST0}. \c{FSINCOS} is faster
than executing \c{FSIN} and \c{FCOS} (see \k{insFCOS}) in succession.

The absolute value of \c{ST0} must be less than 2**63.


\S{insFSQRT} \i\c{FSQRT}: Floating-Point Square Root

\c FSQRT                           ; D9 FA                  [8086,FPU]

\c{FSQRT} calculates the square root of \c{ST0} and stores the
result in \c{ST0}.


\S{insFST} \i\c{FST}, \i\c{FSTP}: Floating-Point Store

\c FST mem32                       ; D9 /2                  [8086,FPU]
\c FST mem64                       ; DD /2                  [8086,FPU]
\c FST fpureg                      ; DD D0+r                [8086,FPU]

\c FSTP mem32                      ; D9 /3                  [8086,FPU]
\c FSTP mem64                      ; DD /3                  [8086,FPU]
\c FSTP mem80                      ; DB /7                  [8086,FPU]
\c FSTP fpureg                     ; DD D8+r                [8086,FPU]

\c{FST} stores the value in \c{ST0} into the given memory location
or other FPU register. \c{FSTP} does the same, but then pops the
register stack.


\S{insFSTCW} \i\c{FSTCW}: Store Floating-Point Control Word

\c FSTCW mem16                     ; 9B D9 /7               [8086,FPU]
\c FNSTCW mem16                    ; D9 /7                  [8086,FPU]

\c{FSTCW} stores the \c{FPU} control word (governing things like the
rounding mode, the precision, and the exception masks) into a 2-byte
memory area. See also \c{FLDCW} (\k{insFLDCW}).

\c{FNSTCW} does the same thing as \c{FSTCW}, without first waiting
for pending floating-point exceptions to clear.


\S{insFSTENV} \i\c{FSTENV}: Store Floating-Point Environment

\c FSTENV mem                      ; 9B D9 /6               [8086,FPU]
\c FNSTENV mem                     ; D9 /6                  [8086,FPU]

\c{FSTENV} stores the \c{FPU} operating environment (control word,
status word, tag word, instruction pointer, data pointer and last
opcode) into memory. The memory area is 14 or 28 bytes long,
depending on the CPU mode at the time. See also \c{FLDENV}
(\k{insFLDENV}).

\c{FNSTENV} does the same thing as \c{FSTENV}, without first waiting
for pending floating-point exceptions to clear.


\S{insFSTSW} \i\c{FSTSW}: Store Floating-Point Status Word

\c FSTSW mem16                     ; 9B DD /7               [8086,FPU]
\c FSTSW AX                        ; 9B DF E0               [286,FPU]

\c FNSTSW mem16                    ; DD /7                  [8086,FPU]
\c FNSTSW AX                       ; DF E0                  [286,FPU]

\c{FSTSW} stores the \c{FPU} status word into \c{AX} or into a 2-byte
memory area.

\c{FNSTSW} does the same thing as \c{FSTSW}, without first waiting
for pending floating-point exceptions to clear.


\S{insFSUB} \i\c{FSUB}, \i\c{FSUBP}, \i\c{FSUBR}, \i\c{FSUBRP}: Floating-Point Subtract

\c FSUB mem32                      ; D8 /4                  [8086,FPU]
\c FSUB mem64                      ; DC /4                  [8086,FPU]

\c FSUB fpureg                     ; D8 E0+r                [8086,FPU]
\c FSUB ST0,fpureg                 ; D8 E0+r                [8086,FPU]

\c FSUB TO fpureg                  ; DC E8+r                [8086,FPU]
\c FSUB fpureg,ST0                 ; DC E8+r                [8086,FPU]

\c FSUBR mem32                     ; D8 /5                  [8086,FPU]
\c FSUBR mem64                     ; DC /5                  [8086,FPU]

\c FSUBR fpureg                    ; D8 E8+r                [8086,FPU]
\c FSUBR ST0,fpureg                ; D8 E8+r                [8086,FPU]

\c FSUBR TO fpureg                 ; DC E0+r                [8086,FPU]
\c FSUBR fpureg,ST0                ; DC E0+r                [8086,FPU]

\c FSUBP fpureg                    ; DE E8+r                [8086,FPU]
\c FSUBP fpureg,ST0                ; DE E8+r                [8086,FPU]

\c FSUBRP fpureg                   ; DE E0+r                [8086,FPU]
\c FSUBRP fpureg,ST0               ; DE E0+r                [8086,FPU]

\b \c{FSUB} subtracts the given operand from \c{ST0} and stores the
result back in \c{ST0}, unless the \c{TO} qualifier is given, in
which case it subtracts \c{ST0} from the given operand and stores
the result in the operand.

\b \c{FSUBR} does the same thing, but does the subtraction the other
way up: so if \c{TO} is not given, it subtracts \c{ST0} from the given
operand and stores the result in \c{ST0}, whereas if \c{TO} is given
it subtracts its operand from \c{ST0} and stores the result in the
operand.

\b \c{FSUBP} operates like \c{FSUB TO}, but pops the register stack
once it has finished.

\b \c{FSUBRP} operates like \c{FSUBR TO}, but pops the register stack
once it has finished.


\S{insFTST} \i\c{FTST}: Test \c{ST0} Against Zero

\c FTST                            ; D9 E4                  [8086,FPU]

\c{FTST} compares \c{ST0} with zero and sets the FPU flags
accordingly. \c{ST0} is treated as the left-hand side of the
comparison, so that a `less-than' result is generated if \c{ST0} is
negative.


\S{insFUCOM} \i\c{FUCOMxx}: Floating-Point Unordered Compare

\c FUCOM fpureg                    ; DD E0+r                [386,FPU]
\c FUCOM ST0,fpureg                ; DD E0+r                [386,FPU]

\c FUCOMP fpureg                   ; DD E8+r                [386,FPU]
\c FUCOMP ST0,fpureg               ; DD E8+r                [386,FPU]

\c FUCOMPP                         ; DA E9                  [386,FPU]

\c FUCOMI fpureg                   ; DB E8+r                [P6,FPU]
\c FUCOMI ST0,fpureg               ; DB E8+r                [P6,FPU]

\c FUCOMIP fpureg                  ; DF E8+r                [P6,FPU]
\c FUCOMIP ST0,fpureg              ; DF E8+r                [P6,FPU]

\b \c{FUCOM} compares \c{ST0} with the given operand, and sets the
FPU flags accordingly. \c{ST0} is treated as the left-hand side of
the comparison, so that the carry flag is set (for a `less-than'
result) if \c{ST0} is less than the given operand.

\b \c{FUCOMP} does the same as \c{FUCOM}, but pops the register stack
afterwards. \c{FUCOMPP} compares \c{ST0} with \c{ST1} and then pops
the register stack twice.

\b \c{FUCOMI} and \c{FUCOMIP} work like the corresponding forms of
\c{FUCOM} and \c{FUCOMP}, but write their results directly to the CPU
flags register rather than the FPU status word, so they can be
immediately followed by conditional jump or conditional move
instructions.

The \c{FUCOM} instructions differ from the \c{FCOM} instructions
(\k{insFCOM}) only in the way they handle quiet NaNs: \c{FUCOM} will
handle them silently and set the condition code flags to an
`unordered' result, whereas \c{FCOM} will generate an exception.


\S{insFXAM} \i\c{FXAM}: Examine Class of Value in \c{ST0}

\c FXAM                            ; D9 E5                  [8086,FPU]

\c{FXAM} sets the FPU flags \c{C3}, \c{C2} and \c{C0} depending on
the type of value stored in \c{ST0}:

\c  Register contents     Flags

\c  Unsupported format    000
\c  NaN                   001
\c  Finite number         010
\c  Infinity              011
\c  Zero                  100
\c  Empty register        101
\c  Denormal              110

Additionally, the \c{C1} flag is set to the sign of the number.
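
As a sketch, the table above can be used to test whether \c{ST0}
holds a NaN. \c{C0}, \c{C2} and \c{C3} are bits 8, 10 and 14 of the
status word, and so appear as bits 0, 2 and 6 of \c{AH} once the
word has been stored in \c{AX}; the label \c{.is_nan} is
hypothetical:

\c         fxam                    ; classify ST0
\c         fnstsw  ax              ; fetch the FPU status word
\c         and     ah,0x45         ; keep only C3, C2 and C0
\c         cmp     ah,0x01         ; C3=0, C2=0, C0=1 means NaN
\c         je      .is_nan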


\S{insFXCH} \i\c{FXCH}: Floating-Point Exchange

\c FXCH                            ; D9 C9                  [8086,FPU]
\c FXCH fpureg                     ; D9 C8+r                [8086,FPU]
\c FXCH fpureg,ST0                 ; D9 C8+r                [8086,FPU]
\c FXCH ST0,fpureg                 ; D9 C8+r                [8086,FPU]

\c{FXCH} exchanges \c{ST0} with a given FPU register. The no-operand
form exchanges \c{ST0} with \c{ST1}.


\S{insFXRSTOR} \i\c{FXRSTOR}: Restore \c{FP}, \c{MMX} and \c{SSE} State

\c FXRSTOR memory                  ; 0F AE /1               [P6,SSE,FPU]

The \c{FXRSTOR} instruction reloads the \c{FPU}, \c{MMX} and \c{SSE}
state (environment and registers), from the 512 byte memory area defined
by the source operand. This data should have been written by a previous
\c{FXSAVE}.


\S{insFXSAVE} \i\c{FXSAVE}: Store \c{FP}, \c{MMX} and \c{SSE} State

\c FXSAVE memory                   ; 0F AE /0               [P6,SSE,FPU]

The \c{FXSAVE} instruction writes the current \c{FPU}, \c{MMX}
and \c{SSE} technology states (environment and registers), to the
512 byte memory area defined by the destination operand. It does this
without checking for pending unmasked floating-point exceptions
(similar to the operation of \c{FNSAVE}).

Unlike the \c{FSAVE/FNSAVE} instructions, the processor retains the
contents of the \c{FPU}, \c{MMX} and \c{SSE} state in the processor
after the state has been saved. This instruction has been optimized
to maximize floating-point save performance.
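
A sketch of a matching save and restore. The label \c{fxarea} is
hypothetical; note that the processor requires the 512-byte area to
be aligned on a 16-byte boundary:

\c         section .bss
\c         alignb  16
\c fxarea: resb    512             ; save area for FXSAVE/FXRSTOR

\c         section .text
\c         fxsave  [fxarea]        ; save the FPU, MMX and SSE state
\c         ; ... code which modifies that state goes here ...
\c         fxrstor [fxarea]        ; restore the saved state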


\S{insFXTRACT} \i\c{FXTRACT}: Extract Exponent and Significand

\c FXTRACT                         ; D9 F4                  [8086,FPU]

\c{FXTRACT} separates the number in \c{ST0} into its exponent and
significand (mantissa), stores the exponent back into \c{ST0}, and
then pushes the significand on the register stack (so that the
significand ends up in \c{ST0}, and the exponent in \c{ST1}).


\S{insFYL2X} \i\c{FYL2X}, \i\c{FYL2XP1}: Compute Y times Log2(X) or Log2(X+1)

\c FYL2X                           ; D9 F1                  [8086,FPU]
\c FYL2XP1                         ; D9 F9                  [8086,FPU]

\c{FYL2X} multiplies \c{ST1} by the base-2 logarithm of \c{ST0},
stores the result in \c{ST1}, and pops the register stack (so that
the result ends up in \c{ST0}). \c{ST0} must be non-zero and
positive.
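
For example, since the natural logarithm of \c{x} equals ln(2) times
the base-2 logarithm of \c{x}, it can be sketched with \c{FLDLN2}
(\k{insFLD1}) supplying the multiplier; \c{x} is a hypothetical
64-bit variable holding a positive value:

\c         fldln2                  ; ST0 = ln(2)
\c         fld     qword [x]       ; ST0 = x, ST1 = ln(2)
\c         fyl2x                   ; ST0 = ln(2) * log2(x) = ln(x)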

\c{FYL2XP1} works the same way, but replacing the base-2 log of
\c{ST0} with that of \c{ST0} plus one. This time, \c{ST0} must have
magnitude no greater than 1 minus half the square root of two.


\S{insHLT} \i\c{HLT}: Halt Processor

\c HLT                             ; F4                     [8086,PRIV]

\c{HLT} puts the processor into a halted state, where it will
perform no more operations until restarted by an interrupt or a
reset.

On the 286 and later processors, this is a privileged instruction.


\S{insIBTS} \i\c{IBTS}: Insert Bit String

\c IBTS r/m16,reg16                ; o16 0F A7 /r           [386,UNDOC]
\c IBTS r/m32,reg32                ; o32 0F A7 /r           [386,UNDOC]

The implied operation of this instruction is:

\c IBTS r/m16,AX,CL,reg16
\c IBTS r/m32,EAX,CL,reg32

\c{IBTS} writes a bit string from the source operand to the destination.
\c{CL} indicates the number of bits to be copied, from the low bits
of the source. \c{(E)AX} indicates the low order bit offset in the
destination that is written to. For example, if \c{CL} is set to 4
and \c{AX} (for 16-bit code) is set to 5, bits 0-3 of \c{src} will
be copied to bits 5-8 of \c{dst}. This instruction is very poorly
documented, and I have been unable to find any official source of
documentation on it.

\c{IBTS} is supported only on the early Intel 386s, and conflicts
with the opcodes for \c{CMPXCHG486} (on early Intel 486s). NASM
supports it only for completeness. Its counterpart is \c{XBTS}
(see \k{insXBTS}).


\S{insIDIV} \i\c{IDIV}: Signed Integer Divide

\c IDIV r/m8                       ; F6 /7                  [8086]
\c IDIV r/m16                      ; o16 F7 /7              [8086]
\c IDIV r/m32                      ; o32 F7 /7              [386]

\c{IDIV} performs signed integer division. The explicit operand
provided is the divisor; the dividend and destination operands
are implicit, in the following way:

\b For \c{IDIV r/m8}, \c{AX} is divided by the given operand;
the quotient is stored in \c{AL} and the remainder in \c{AH}.

\b For \c{IDIV r/m16}, \c{DX:AX} is divided by the given operand;
the quotient is stored in \c{AX} and the remainder in \c{DX}.

\b For \c{IDIV r/m32}, \c{EDX:EAX} is divided by the given operand;
the quotient is stored in \c{EAX} and the remainder in \c{EDX}.
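
Because the dividend is twice as wide as the divisor, it usually
has to be sign-extended first. A sketch for the 32-bit case, with
hypothetical variables \c{dividend} and \c{divisor}:

\c         mov     eax,[dividend]
\c         cdq                     ; sign-extend EAX into EDX:EAX
\c         idiv    dword [divisor] ; EAX = quotient, EDX = remainder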

Unsigned integer division is performed by the \c{DIV} instruction:
see \k{insDIV}.


\S{insIMUL} \i\c{IMUL}: Signed Integer Multiply

\c IMUL r/m8                       ; F6 /5                  [8086]
\c IMUL r/m16                      ; o16 F7 /5              [8086]
\c IMUL r/m32                      ; o32 F7 /5              [386]

\c IMUL reg16,r/m16                ; o16 0F AF /r           [386]
\c IMUL reg32,r/m32                ; o32 0F AF /r           [386]

\c IMUL reg16,imm8                 ; o16 6B /r ib           [186]
\c IMUL reg16,imm16                ; o16 69 /r iw           [186]
\c IMUL reg32,imm8                 ; o32 6B /r ib           [386]
\c IMUL reg32,imm32                ; o32 69 /r id           [386]

\c IMUL reg16,r/m16,imm8           ; o16 6B /r ib           [186]
\c IMUL reg16,r/m16,imm16          ; o16 69 /r iw           [186]
\c IMUL reg32,r/m32,imm8           ; o32 6B /r ib           [386]
\c IMUL reg32,r/m32,imm32          ; o32 69 /r id           [386]

\c{IMUL} performs signed integer multiplication. For the
single-operand form, the other operand and destination are
implicit, in the following way:

\b For \c{IMUL r/m8}, \c{AL} is multiplied by the given operand;
the product is stored in \c{AX}.

\b For \c{IMUL r/m16}, \c{AX} is multiplied by the given operand;
the product is stored in \c{DX:AX}.

\b For \c{IMUL r/m32}, \c{EAX} is multiplied by the given operand;
the product is stored in \c{EDX:EAX}.

The two-operand form multiplies its two operands and stores the
result in the destination (first) operand. The three-operand
form multiplies its last two operands and stores the result in
the first operand.

The two-operand form with an immediate second operand is in
fact a shorthand for the three-operand form, as can be seen by
examining the opcode descriptions: in the two-operand form, the
code \c{/r} takes both its register and \c{r/m} parts from the
same operand (the first one).

In the forms with an 8-bit immediate operand and another longer
source operand, the immediate operand is considered to be signed,
and is sign-extended to the length of the other source operand.
In these cases, the \c{BYTE} qualifier is necessary to force
NASM to generate this form of the instruction.
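
A few of these forms, sketched with arbitrarily chosen registers
and constants:

\c         imul    ebx             ; EDX:EAX = EAX * EBX
\c         imul    eax,ebx         ; EAX = EAX * EBX
\c         imul    ecx,edx,1000    ; ECX = EDX * 1000
\c         imul    eax,byte 10     ; EAX = EAX * 10, 8-bit immediate form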

Unsigned integer multiplication is performed by the \c{MUL}
instruction: see \k{insMUL}.


\S{insIN} \i\c{IN}: Input from I/O Port

\c IN AL,imm8                      ; E4 ib                  [8086]
\c IN AX,imm8                      ; o16 E5 ib              [8086]
\c IN EAX,imm8                     ; o32 E5 ib              [386]
\c IN AL,DX                        ; EC                     [8086]
\c IN AX,DX                        ; o16 ED                 [8086]
\c IN EAX,DX                       ; o32 ED                 [386]

\c{IN} reads a byte, word or doubleword from the specified I/O port,
and stores it in the given destination register. The port number may
be specified as an immediate value if it is between 0 and 255, and
otherwise must be stored in \c{DX}. See also \c{OUT} (\k{insOUT}).
|
||
|
|
||
|
|
||
|
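For example, a minimal sketch of the two ways of naming the port
(the port numbers \c{0x60} and \c{0x3F8} are picked purely for
illustration):

\c         in      al,0x60             ; port number fits in 8 bits: immediate form
\c         mov     dx,0x3f8            ; port number above 255: must go through DX
\c         in      al,dx

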
\S{insINC} \i\c{INC}: Increment Integer

\c INC reg16 ; o16 40+r [8086]
\c INC reg32 ; o32 40+r [386]
\c INC r/m8 ; FE /0 [8086]
\c INC r/m16 ; o16 FF /0 [8086]
\c INC r/m32 ; o32 FF /0 [386]

\c{INC} adds 1 to its operand. It does \e{not} affect the carry
flag: to affect the carry flag, use \c{ADD something,1} (see
\k{insADD}). \c{INC} affects all the other flags according to the result.

This instruction can be used with a \c{LOCK} prefix to allow atomic execution.

See also \c{DEC} (\k{insDEC}).


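The difference in carry-flag behaviour can be seen in a short
sketch such as the following (\c{counter} is a hypothetical
doubleword variable, assumed to be defined elsewhere):

\c         mov     al,0xff
\c         clc                         ; CF = 0
\c         inc     al                  ; AL = 0, ZF = 1, CF unchanged (still 0)
\c         mov     al,0xff
\c         add     al,1                ; AL = 0, ZF = 1, CF = 1
\c         lock inc dword [counter]    ; atomic increment of a memory operand

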
\S{insINSB} \i\c{INSB}, \i\c{INSW}, \i\c{INSD}: Input String from I/O Port

\c INSB ; 6C [186]
\c INSW ; o16 6D [186]
\c INSD ; o32 6D [386]

\c{INSB} inputs a byte from the I/O port specified in \c{DX} and
stores it at \c{[ES:DI]} or \c{[ES:EDI]}. It then increments or
decrements (depending on the direction flag: increments if the flag
is clear, decrements if it is set) \c{DI} or \c{EDI}.

The register used is \c{DI} if the address size is 16 bits, and
\c{EDI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

Segment override prefixes have no effect for this instruction: the
use of \c{ES} for the store to \c{[DI]} or \c{[EDI]} cannot be
overridden.

\c{INSW} and \c{INSD} work in the same way, but they input a word or
a doubleword instead of a byte, and increment or decrement the
addressing register by 2 or 4 instead of 1.

The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
\c{ECX} - again, the address size chooses which) times.

See also \c{OUTSB}, \c{OUTSW} and \c{OUTSD} (\k{insOUTSB}).


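A typical use is a repeated block read. The following is only a
sketch, assuming \c{BITS 16}: the port number \c{0x1F0} and the
label \c{buffer} are invented for the example, and \c{ES} is
assumed to point at the segment containing \c{buffer} already.

\c         mov     dx,0x1f0            ; I/O port to read from (illustrative)
\c         mov     di,buffer           ; ES:DI -> destination buffer
\c         mov     cx,256              ; number of words to transfer
\c         cld                         ; direction flag clear: DI counts upwards
\c         rep insw                    ; read CX words, DI advances by 2 each time

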
\S{insINT} \i\c{INT}: Software Interrupt

\c INT imm8 ; CD ib [8086]

\c{INT} causes a software interrupt through a specified vector
number from 0 to 255.

The code generated by the \c{INT} instruction is always two bytes
long: although there are short forms for some \c{INT} instructions,
NASM does not generate them when it sees the \c{INT} mnemonic. In
order to generate single-byte breakpoint instructions, use the
\c{INT3} or \c{INT1} instructions (see \k{insINT1}) instead.


\S{insINT1} \i\c{INT3}, \i\c{INT1}, \i\c{ICEBP}, \i\c{INT01}: Breakpoints

\c INT1 ; F1 [P6]
\c ICEBP ; F1 [P6]
\c INT01 ; F1 [P6]

\c INT3 ; CC [8086]
\c INT03 ; CC [8086]

\c{INT1} and \c{INT3} are short one-byte forms of the instructions
\c{INT 1} and \c{INT 3} (see \k{insINT}). They perform a similar
function to their longer counterparts, but take up less code space.
They are used as breakpoints by debuggers.

\b \c{INT1}, and its alternative synonyms \c{INT01} and \c{ICEBP}, is
an instruction used by in-circuit emulators (ICEs). It is present,
though not documented, on some processors down to the 286, but is
only documented for the Pentium Pro. \c{INT3} is the instruction
normally used as a breakpoint by debuggers.

\b \c{INT3}, and its synonym \c{INT03}, is not precisely equivalent to
\c{INT 3}: the short form, since it is designed to be used as a
breakpoint, bypasses the normal \c{IOPL} checks in virtual-8086 mode,
and also does not go through interrupt redirection.


\S{insINTO} \i\c{INTO}: Interrupt if Overflow

\c INTO ; CE [8086]

\c{INTO} performs an \c{INT 4} software interrupt (see \k{insINT})
if and only if the overflow flag is set.


\S{insINVD} \i\c{INVD}: Invalidate Internal Caches

\c INVD ; 0F 08 [486]

\c{INVD} invalidates and empties the processor's internal caches,
and causes the processor to instruct external caches to do the same.
It does not write the contents of the caches back to memory first:
any modified data held in the caches will be lost. To write the data
back first, use \c{WBINVD} (\k{insWBINVD}).


\S{insINVLPG} \i\c{INVLPG}: Invalidate TLB Entry

\c INVLPG mem ; 0F 01 /7 [486]

\c{INVLPG} invalidates the translation lookaside buffer (TLB) entry
associated with the supplied memory address.


\S{insIRET} \i\c{IRET}, \i\c{IRETW}, \i\c{IRETD}: Return from Interrupt

\c IRET ; CF [8086]
\c IRETW ; o16 CF [8086]
\c IRETD ; o32 CF [386]

\c{IRET} returns from an interrupt (hardware or software) by means
of popping \c{IP} (or \c{EIP}), \c{CS} and the flags off the stack
and then continuing execution from the new \c{CS:IP}.

\c{IRETW} pops \c{IP}, \c{CS} and the flags as 2 bytes each, taking
6 bytes off the stack in total. \c{IRETD} pops \c{EIP} as 4 bytes,
pops a further 4 bytes of which the top two are discarded and the
bottom two go into \c{CS}, and pops the flags as 4 bytes as well,
taking 12 bytes off the stack.

\c{IRET} is a shorthand for either \c{IRETW} or \c{IRETD}, depending
on the default \c{BITS} setting at the time.


\S{insJcc} \i\c{Jcc}: Conditional Branch

\c Jcc imm ; 70+cc rb [8086]
\c Jcc NEAR imm ; 0F 80+cc rw/rd [386]

The \i{conditional jump} instructions execute a near (same segment)
jump if and only if their conditions are satisfied. For example,
\c{JNZ} jumps only if the zero flag is not set.

The ordinary form of the instructions has only a 128-byte range; the
\c{NEAR} form is a 386 extension to the instruction set, and can
span the full size of a segment. NASM will not override your choice
of jump instruction: if you want \c{Jcc NEAR}, you have to use the
\c{NEAR} keyword.

The \c{SHORT} keyword is allowed on the first form of the
instruction, for clarity, but is not necessary.

For details of the condition codes, see \k{iref-cc}.


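As an illustration of the two forms (the labels \c{nearby} and
\c{far_away} are hypothetical, standing for a target within short
range and one outside it):

\c         cmp     eax,ebx
\c         jne     nearby              ; ordinary form: 8-bit displacement
\c         jne     short nearby        ; the same, with SHORT given for clarity
\c         jne     near far_away       ; 386 form: 16- or 32-bit displacement

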
\S{insJCXZ} \i\c{JCXZ}, \i\c{JECXZ}: Jump if CX/ECX Zero

\c JCXZ imm ; a16 E3 rb [8086]
\c JECXZ imm ; a32 E3 rb [386]

\c{JCXZ} performs a short jump (with maximum range 128 bytes) if and
only if the contents of the \c{CX} register is 0. \c{JECXZ} does the
same thing, but with \c{ECX}.


\S{insJMP} \i\c{JMP}: Jump

\c JMP imm ; E9 rw/rd [8086]
\c JMP SHORT imm ; EB rb [8086]
\c JMP imm:imm16 ; o16 EA iw iw [8086]
\c JMP imm:imm32 ; o32 EA id iw [386]
\c JMP FAR mem ; o16 FF /5 [8086]
\c JMP FAR mem32 ; o32 FF /5 [386]
\c JMP r/m16 ; o16 FF /4 [8086]
\c JMP r/m32 ; o32 FF /4 [386]

\c{JMP} jumps to a given address. The address may be specified as an
absolute segment and offset, or as a relative jump within the
current segment.

\c{JMP SHORT imm} has a maximum range of 128 bytes, since the
displacement is specified as only 8 bits, but takes up less code
space. NASM does not choose when to generate \c{JMP SHORT} for you:
you must explicitly code \c{SHORT} every time you want a short jump.

You can choose between the two immediate \i{far jump} forms (\c{JMP
imm:imm}) by the use of the \c{WORD} and \c{DWORD} keywords: \c{JMP
WORD 0x1234:0x5678} or \c{JMP DWORD 0x1234:0x56789abc}.

The \c{JMP FAR mem} forms execute a far jump by loading the
destination address out of memory. The address loaded consists of 16
or 32 bits of offset (depending on the operand size), and 16 bits of
segment. The operand size may be overridden using \c{JMP WORD FAR
mem} or \c{JMP DWORD FAR mem}.

The \c{JMP r/m} forms execute a \i{near jump} (within the same
segment), loading the destination address out of memory or out of a
register. The keyword \c{NEAR} may be specified, for clarity, in
these forms, but is not necessary. Again, operand size can be
overridden using \c{JMP WORD mem} or \c{JMP DWORD mem}.

As a convenience, NASM does not require you to jump to a far symbol
by coding the cumbersome \c{JMP SEG routine:routine}, but instead
allows the easier synonym \c{JMP FAR routine}.

The \c{JMP r/m} forms given above are near jumps; NASM will accept
the \c{NEAR} keyword (e.g. \c{JMP NEAR [address]}), even though it
is not strictly necessary.


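The various forms might be written as in the following sketch,
which assumes \c{BITS 32}. The labels \c{done}, \c{routine},
\c{vector} and \c{table} are hypothetical and assumed to be defined
elsewhere; the far address is the one used in the text above.

\c         jmp     short done          ; EB rb: 8-bit displacement
\c         jmp     done                ; E9 rw/rd: full-size displacement
\c         jmp     word 0x1234:0x5678  ; far jump, 16-bit offset
\c         jmp     far routine         ; same as JMP SEG routine:routine
\c         jmp     far [vector]        ; far jump, address loaded from memory
\c         jmp     [table+ebx*4]       ; near jump through a memory operand
\c         jmp     eax                 ; near jump through a register

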
\S{insLAHF} \i\c{LAHF}: Load AH from Flags

\c LAHF ; 9F [8086]

\c{LAHF} sets the \c{AH} register according to the contents of the
low byte of the flags word.

The operation of \c{LAHF} is:

\c AH <-- SF:ZF:0:AF:0:PF:1:CF

See also \c{SAHF} (\k{insSAHF}).


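Reading the layout above from left (bit 7) to right (bit 0), the
zero flag ends up in bit 6 of \c{AH}. A small sketch of testing it
(the label \c{was_zero} is invented for the example):

\c         lahf                        ; AH = SF:ZF:0:AF:0:PF:1:CF
\c         test    ah,0x40             ; bit 6 of AH holds the saved ZF
\c         jnz     was_zero            ; taken if ZF was set when LAHF ran

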
\S{insLAR} \i\c{LAR}: Load Access Rights

\c LAR reg16,r/m16 ; o16 0F 02 /r [286,PRIV]
\c LAR reg32,r/m32 ; o32 0F 02 /r [386,PRIV]

\c{LAR} takes the segment selector specified by its source (second)
operand, finds the corresponding segment descriptor in the GDT or
LDT, and loads the access-rights byte of the descriptor into its
destination (first) operand.


\S{insLDMXCSR} \i\c{LDMXCSR}: Load Streaming SIMD Extension Control/Status

\c LDMXCSR mem32 ; 0F AE /2 [KATMAI,SSE]

\c{LDMXCSR} loads 32 bits of data from the specified memory location
into the \c{MXCSR} control/status register. \c{MXCSR} is used to
enable masked/unmasked exception handling, to set rounding modes,
to set flush-to-zero mode, and to view exception status flags.

For details of the \c{MXCSR} register, see the Intel processor docs.

See also \c{STMXCSR} (\k{insSTMXCSR}).


\S{insLDS} \i\c{LDS}, \i\c{LES}, \i\c{LFS}, \i\c{LGS}, \i\c{LSS}: Load Far Pointer

\c LDS reg16,mem ; o16 C5 /r [8086]
\c LDS reg32,mem ; o32 C5 /r [386]

\c LES reg16,mem ; o16 C4 /r [8086]
\c LES reg32,mem ; o32 C4 /r [386]

\c LFS reg16,mem ; o16 0F B4 /r [386]
\c LFS reg32,mem ; o32 0F B4 /r [386]

\c LGS reg16,mem ; o16 0F B5 /r [386]
\c LGS reg32,mem ; o32 0F B5 /r [386]

\c LSS reg16,mem ; o16 0F B2 /r [386]
\c LSS reg32,mem ; o32 0F B2 /r [386]

These instructions load an entire far pointer (16 or 32 bits of
offset, plus 16 bits of segment) out of memory in one go. \c{LDS},
for example, loads 16 or 32 bits from the given memory address into
the given register (depending on the size of the register), then
loads the \e{next} 16 bits from memory into \c{DS}. \c{LES},
\c{LFS}, \c{LGS} and \c{LSS} work in the same way but use the other
segment registers.


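The memory layout this implies can be sketched as follows, assuming
\c{BITS 16}. The label \c{farptr} and the pointer value
\c{0x1234:0x5678} are made up for the example.

\c         lds     si,[farptr]         ; SI = 0x5678, DS = 0x1234
\c
\c farptr: dw      0x5678              ; offset comes first in memory
\c         dw      0x1234              ; segment occupies the next 16 bits

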
\S{insLEA} \i\c{LEA}: Load Effective Address

\c LEA reg16,mem ; o16 8D /r [8086]
\c LEA reg32,mem ; o32 8D /r [386]

\c{LEA}, despite its syntax, does not access memory. It calculates
the effective address specified by its second operand as if it were
going to load or store data from it, but instead it stores the
calculated address into the register specified by its first operand.
This can be used to perform quite complex calculations (e.g. \c{LEA
EAX,[EBX+ECX*4+100]}) in one instruction.

\c{LEA}, despite being a purely arithmetic instruction which
accesses no memory, still requires square brackets around its second
operand, as if it were a memory reference.

The size of the calculation is the current \e{address} size, and the
size that the result is stored as is the current \e{operand} size.
If the address size and operand size differ, then: if the address
size is 32 bits, only the low 16 bits of the result are stored; if
the address size is 16 bits, the result is zero-extended to 32 bits
before being stored.


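Two small illustrations of \c{LEA} used as an arithmetic
instruction (assuming \c{BITS 32}; the second line is a common
idiom rather than anything specific to NASM):

\c         lea     eax,[ebx+ecx*4+100] ; EAX = EBX + ECX*4 + 100, no memory access
\c         lea     eax,[eax+eax*4]     ; EAX = EAX * 5

Unlike \c{ADD} or \c{IMUL}, neither line alters the flags.

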
\S{insLEAVE} \i\c{LEAVE}: Destroy Stack Frame

\c LEAVE ; C9 [186]

\c{LEAVE} destroys a stack frame of the form created by the
\c{ENTER} instruction (see \k{insENTER}). It is functionally
equivalent to \c{MOV ESP,EBP} followed by \c{POP EBP} (or \c{MOV
SP,BP} followed by \c{POP BP} in 16-bit mode).


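\c{LEAVE} is equally usable with a frame that was set up by hand
rather than by \c{ENTER}. A minimal sketch, assuming \c{BITS 32}
(the procedure name \c{proc} and the size of the local area are
arbitrary):

\c proc:   push    ebp                 ; save the caller's frame pointer
\c         mov     ebp,esp             ; establish the new frame
\c         sub     esp,8               ; reserve two doubleword locals
\c         mov     dword [ebp-4],0     ; use a local variable
\c         leave                       ; MOV ESP,EBP followed by POP EBP
\c         ret

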
\S{insLFENCE} \i\c{LFENCE}: Load Fence

\c LFENCE ; 0F AE /5 [WILLAMETTE,SSE2]

\c{LFENCE} performs a serialising operation on all loads from memory
that were issued before the \c{LFENCE} instruction. This guarantees that
all memory reads before the \c{LFENCE} instruction are visible before any
reads after the \c{LFENCE} instruction.

\c{LFENCE} is ordered with respect to other \c{LFENCE} instructions,
\c{MFENCE}, any memory read and any other serialising instruction
(such as \c{CPUID}).

Weakly ordered memory types can be used to achieve higher processor
performance through such techniques as out-of-order issue and
speculative reads. The degree to which a consumer of data recognizes
or knows that the data is weakly ordered varies among applications
and may be unknown to the producer of this data. The \c{LFENCE}
instruction provides a performance-efficient way of ensuring load
ordering between routines that produce weakly-ordered results and
routines that consume that data.

\c{LFENCE} uses the following ModRM encoding:

\c Mod (7:6) = 11B
\c Reg/Opcode (5:3) = 101B
\c R/M (2:0) = 000B

All other ModRM encodings are defined to be reserved, and use
of these encodings risks incompatibility with future processors.

See also \c{SFENCE} (\k{insSFENCE}) and \c{MFENCE} (\k{insMFENCE}).


\S{insLGDT} \i\c{LGDT}, \i\c{LIDT}, \i\c{LLDT}: Load Descriptor Tables

\c LGDT mem ; 0F 01 /2 [286,PRIV]
\c LIDT mem ; 0F 01 /3 [286,PRIV]
\c LLDT r/m16 ; 0F 00 /2 [286,PRIV]

\c{LGDT} and \c{LIDT} both take a 6-byte memory area as an operand:
they load a 16-bit size limit (from the first two bytes) and a 32-bit
linear address (from the following four bytes) into the \c{GDTR}
(global descriptor table register) or \c{IDTR} (interrupt descriptor
table register). These are the only instructions which directly use
\e{linear} addresses, rather than segment/offset pairs.

\c{LLDT} takes a segment selector as an operand. The processor looks
up that selector in the GDT and stores the limit and base address
given there into the \c{LDTR} (local descriptor table register).

See also \c{SGDT}, \c{SIDT} and \c{SLDT} (\k{insSGDT}).


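The 6-byte operand might be laid out as in the sketch below. The
labels \c{gdt}, \c{gdt_end} and \c{gdtr} are hypothetical, the
table holds only a null descriptor for brevity, and the example
assumes that the offset of \c{gdt} is also its linear address
(that is, that the code is addressed through a segment with a base
of zero); otherwise the base would have to be computed at run time.

\c gdt:    dq      0                   ; first entry: the null descriptor
\c gdt_end:
\c
\c gdtr:   dw      gdt_end - gdt - 1   ; 16-bit limit: table size minus one
\c         dd      gdt                 ; 32-bit linear base address
\c
\c         lgdt    [gdtr]              ; load GDTR from the 6-byte area

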
\S{insLMSW} \i\c{LMSW}: Load Machine Status Word

\c LMSW r/m16 ; 0F 01 /6 [286,PRIV]

\c{LMSW} loads the bottom four bits of the source operand into the
bottom four bits of the \c{CR0} control register (or the Machine
Status Word, on 286 processors). See also \c{SMSW} (\k{insSMSW}).


\S{insLOADALL} \i\c{LOADALL}, \i\c{LOADALL286}: Load Processor State

\c LOADALL ; 0F 07 [386,UNDOC]
\c LOADALL286 ; 0F 05 [286,UNDOC]

This instruction, in its two different-opcode forms, is apparently
supported on most 286 processors, some 386 and possibly some 486.
The opcode differs between the 286 and the 386.

The function of the instruction is to load all information relating
to the state of the processor out of a block of memory: on the 286,
this block is located implicitly at absolute address \c{0x800}, and
on the 386 and 486 it is at \c{[ES:EDI]}.


\S{insLODSB} \i\c{LODSB}, \i\c{LODSW}, \i\c{LODSD}: Load from String

\c LODSB ; AC [8086]
\c LODSW ; o16 AD [8086]
\c LODSD ; o32 AD [386]

\c{LODSB} loads a byte from \c{[DS:SI]} or \c{[DS:ESI]} into \c{AL}.
It then increments or decrements (depending on the direction flag:
increments if the flag is clear, decrements if it is set) \c{SI} or
\c{ESI}.

The register used is \c{SI} if the address size is 16 bits, and
\c{ESI} if it is 32 bits. If you need to use an address size not
equal to the current \c{BITS} setting, you can use an explicit
\i\c{a16} or \i\c{a32} prefix.

The segment register used to load from \c{[SI]} or \c{[ESI]} can be
overridden by using a segment register name as a prefix (for
example, \c{ES LODSB}).

\c{LODSW} and \c{LODSD} work in the same way, but they load a
word or a doubleword instead of a byte, and increment or decrement
the addressing registers by 2 or 4 instead of 1.


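As a sketch of typical use, the loop below walks along a
zero-terminated byte string. It assumes \c{BITS 32}; \c{string}
and \c{next} are hypothetical labels.

\c         mov     esi,string          ; DS:ESI -> first byte
\c         cld                         ; direction flag clear: ESI counts upwards
\c next:   lodsb                       ; AL = byte at [DS:ESI], then ESI = ESI + 1
\c         test    al,al               ; reached the terminating zero?
\c         jnz     next                ; no: fetch the next byte

On exit, \c{ESI} points one byte past the terminator.

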
\S{insLOOP} \i\c{LOOP}, \i\c{LOOPE}, \i\c{LOOPZ}, \i\c{LOOPNE}, \i\c{LOOPNZ}: Loop with Counter

\c LOOP imm ; E2 rb [8086]
\c LOOP imm,CX ; a16 E2 rb [8086]
\c LOOP imm,ECX ; a32 E2 rb [386]

\c LOOPE imm ; E1 rb [8086]
\c LOOPE imm,CX ; a16 E1 rb [8086]
\c LOOPE imm,ECX ; a32 E1 rb [386]
\c LOOPZ imm ; E1 rb [8086]
\c LOOPZ imm,CX ; a16 E1 rb [8086]
\c LOOPZ imm,ECX ; a32 E1 rb [386]

\c LOOPNE imm ; E0 rb [8086]
\c LOOPNE imm,CX ; a16 E0 rb [8086]
\c LOOPNE imm,ECX ; a32 E0 rb [386]
\c LOOPNZ imm ; E0 rb [8086]
\c LOOPNZ imm,CX ; a16 E0 rb [8086]
\c LOOPNZ imm,ECX ; a32 E0 rb [386]

\c{LOOP} decrements its counter register (either \c{CX} or \c{ECX} -
if one is not specified explicitly, the \c{BITS} setting dictates
which is used) by one, and if the counter does not become zero as a
result of this operation, it jumps to the given label. The jump has
a range of 128 bytes.

\c{LOOPE} (or its synonym \c{LOOPZ}) adds the additional condition
that it only jumps if the counter is nonzero \e{and} the zero flag
is set. Similarly, \c{LOOPNE} (and \c{LOOPNZ}) jumps only if the
counter is nonzero and the zero flag is clear.


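For instance, the following sketch (assuming \c{BITS 32}, with the
invented labels \c{again} and \c{again16}) runs each loop body ten
times, the second time forcing the 16-bit counter:

\c         mov     ecx,10
\c again:  nop                         ; loop body goes here
\c         loop    again               ; ECX = ECX - 1, jump while ECX is nonzero
\c
\c         mov     cx,10
\c again16: nop                        ; loop body goes here
\c         loop    again16,cx          ; CX named explicitly: a16 prefix emitted

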
\S{insLSL} \i\c{LSL}: Load Segment Limit

\c LSL reg16,r/m16 ; o16 0F 03 /r [286,PRIV]
\c LSL reg32,r/m32 ; o32 0F 03 /r [386,PRIV]

\c{LSL} is given a segment selector in its source (second) operand;
it computes the segment limit value by loading the segment limit
field from the associated segment descriptor in the \c{GDT} or \c{LDT}.
(This involves shifting left by 12 bits if the segment limit is
page-granular, and not if it is byte-granular; so you end up with a
byte limit in either case.) The segment limit obtained is then
loaded into the destination (first) operand.


\S{insLTR} \i\c{LTR}: Load Task Register

\c LTR r/m16 ; 0F 00 /3 [286,PRIV]

\c{LTR} looks up the segment base and limit in the GDT or LDT
descriptor specified by the segment selector given as its operand,
and loads them into the Task Register.


\S{insMASKMOVDQU} \i\c{MASKMOVDQU}: Byte Mask Write

\c MASKMOVDQU xmm1,xmm2 ; 66 0F F7 /r [WILLAMETTE,SSE2]

\c{MASKMOVDQU} stores data from xmm1 to the location specified by
\c{DS:(E)DI}. The size of the store depends on the address-size
attribute. The most significant bit in each byte of the mask
register xmm2 is used to selectively write the data (0 = no write,
1 = write) on a per-byte basis.


\S{insMASKMOVQ} \i\c{MASKMOVQ}: Byte Mask Write

\c MASKMOVQ mm1,mm2 ; 0F F7 /r [KATMAI,MMX]

\c{MASKMOVQ} stores data from mm1 to the location specified by
\c{DS:(E)DI}. The size of the store depends on the address-size
attribute. The most significant bit in each byte of the mask
register mm2 is used to selectively write the data (0 = no write,
1 = write) on a per-byte basis.


\S{insMAXPD} \i\c{MAXPD}: Return Packed Double-Precision FP Maximum

\c MAXPD xmm1,xmm2/m128 ; 66 0F 5F /r [WILLAMETTE,SSE2]

\c{MAXPD} performs a SIMD compare of the packed double-precision
FP numbers from xmm1 and xmm2/mem, and stores the maximum values
of each pair of values in xmm1. If the values being compared are
both zeroes, source2 (xmm2/m128) is returned. If source2
(xmm2/m128) is an SNaN, this SNaN is forwarded unchanged to the
destination (i.e., a QNaN version of the SNaN is not returned).


\S{insMAXPS} \i\c{MAXPS}: Return Packed Single-Precision FP Maximum

\c MAXPS xmm1,xmm2/m128 ; 0F 5F /r [KATMAI,SSE]

\c{MAXPS} performs a SIMD compare of the packed single-precision
FP numbers from xmm1 and xmm2/mem, and stores the maximum values
of each pair of values in xmm1. If the values being compared are
both zeroes, source2 (xmm2/m128) is returned. If source2
(xmm2/m128) is an SNaN, this SNaN is forwarded unchanged to the
destination (i.e., a QNaN version of the SNaN is not returned).


\S{insMAXSD} \i\c{MAXSD}: Return Scalar Double-Precision FP Maximum

\c MAXSD xmm1,xmm2/m64 ; F2 0F 5F /r [WILLAMETTE,SSE2]

\c{MAXSD} compares the low-order double-precision FP numbers from
xmm1 and xmm2/mem, and stores the maximum value in xmm1. If the
values being compared are both zeroes, source2 (xmm2/m64) is
returned. If source2 (xmm2/m64) is an SNaN, this SNaN is
forwarded unchanged to the destination (i.e., a QNaN version of
the SNaN is not returned). The high quadword of the destination
is left unchanged.


\S{insMAXSS} \i\c{MAXSS}: Return Scalar Single-Precision FP Maximum

\c MAXSS xmm1,xmm2/m32 ; F3 0F 5F /r [KATMAI,SSE]

\c{MAXSS} compares the low-order single-precision FP numbers from
xmm1 and xmm2/mem, and stores the maximum value in xmm1. If the
values being compared are both zeroes, source2 (xmm2/m32) is
returned. If source2 (xmm2/m32) is an SNaN, this SNaN is
forwarded unchanged to the destination (i.e., a QNaN version of
the SNaN is not returned). The high three doublewords of the
destination are left unchanged.


\S{insMFENCE} \i\c{MFENCE}: Memory Fence

\c MFENCE ; 0F AE /6 [WILLAMETTE,SSE2]

\c{MFENCE} performs a serialising operation on all loads from memory
and writes to memory that were issued before the \c{MFENCE} instruction.
This guarantees that all memory reads and writes before the \c{MFENCE}
instruction are completed before any reads and writes after the
\c{MFENCE} instruction.

\c{MFENCE} is ordered with respect to other \c{MFENCE} instructions,
\c{LFENCE}, \c{SFENCE}, any memory read and any other serialising
instruction (such as \c{CPUID}).

Weakly ordered memory types can be used to achieve higher processor
performance through such techniques as out-of-order issue, speculative
reads, write-combining, and write-collapsing. The degree to which a
consumer of data recognizes or knows that the data is weakly ordered
varies among applications and may be unknown to the producer of this
data. The \c{MFENCE} instruction provides a performance-efficient way
of ensuring load and store ordering between routines that produce
weakly-ordered results and routines that consume that data.

\c{MFENCE} uses the following ModRM encoding:

\c Mod (7:6) = 11B
\c Reg/Opcode (5:3) = 110B
\c R/M (2:0) = 000B

All other ModRM encodings are defined to be reserved, and use
of these encodings risks incompatibility with future processors.

See also \c{LFENCE} (\k{insLFENCE}) and \c{SFENCE} (\k{insSFENCE}).


\S{insMINPD} \i\c{MINPD}: Return Packed Double-Precision FP Minimum

\c MINPD xmm1,xmm2/m128 ; 66 0F 5D /r [WILLAMETTE,SSE2]

\c{MINPD} performs a SIMD compare of the packed double-precision
FP numbers from xmm1 and xmm2/mem, and stores the minimum values
of each pair of values in xmm1. If the values being compared are
both zeroes, source2 (xmm2/m128) is returned. If source2
(xmm2/m128) is an SNaN, this SNaN is forwarded unchanged to the
destination (i.e., a QNaN version of the SNaN is not returned).


\S{insMINPS} \i\c{MINPS}: Return Packed Single-Precision FP Minimum

\c MINPS xmm1,xmm2/m128 ; 0F 5D /r [KATMAI,SSE]

\c{MINPS} performs a SIMD compare of the packed single-precision
FP numbers from xmm1 and xmm2/mem, and stores the minimum values
of each pair of values in xmm1. If the values being compared are
both zeroes, source2 (xmm2/m128) is returned. If source2
(xmm2/m128) is an SNaN, this SNaN is forwarded unchanged to the
destination (i.e., a QNaN version of the SNaN is not returned).


\S{insMINSD} \i\c{MINSD}: Return Scalar Double-Precision FP Minimum

\c MINSD xmm1,xmm2/m64 ; F2 0F 5D /r [WILLAMETTE,SSE2]

\c{MINSD} compares the low-order double-precision FP numbers from
xmm1 and xmm2/mem, and stores the minimum value in xmm1. If the
values being compared are both zeroes, source2 (xmm2/m64) is
returned. If source2 (xmm2/m64) is an SNaN, this SNaN is
forwarded unchanged to the destination (i.e., a QNaN version of
the SNaN is not returned). The high quadword of the destination
is left unchanged.


\S{insMINSS} \i\c{MINSS}: Return Scalar Single-Precision FP Minimum

\c MINSS xmm1,xmm2/m32 ; F3 0F 5D /r [KATMAI,SSE]

\c{MINSS} compares the low-order single-precision FP numbers from
xmm1 and xmm2/mem, and stores the minimum value in xmm1. If the
values being compared are both zeroes, source2 (xmm2/m32) is
returned. If source2 (xmm2/m32) is an SNaN, this SNaN is
forwarded unchanged to the destination (i.e., a QNaN version of
the SNaN is not returned). The high three doublewords of the
destination are left unchanged.


\S{insMOV} \i\c{MOV}: Move Data

\c MOV r/m8,reg8 ; 88 /r [8086]
\c MOV r/m16,reg16 ; o16 89 /r [8086]
\c MOV r/m32,reg32 ; o32 89 /r [386]
\c MOV reg8,r/m8 ; 8A /r [8086]
\c MOV reg16,r/m16 ; o16 8B /r [8086]
\c MOV reg32,r/m32 ; o32 8B /r [386]

\c MOV reg8,imm8 ; B0+r ib [8086]
\c MOV reg16,imm16 ; o16 B8+r iw [8086]
\c MOV reg32,imm32 ; o32 B8+r id [386]
\c MOV r/m8,imm8 ; C6 /0 ib [8086]
\c MOV r/m16,imm16 ; o16 C7 /0 iw [8086]
\c MOV r/m32,imm32 ; o32 C7 /0 id [386]

\c MOV AL,memoffs8 ; A0 ow/od [8086]
\c MOV AX,memoffs16 ; o16 A1 ow/od [8086]
\c MOV EAX,memoffs32 ; o32 A1 ow/od [386]
\c MOV memoffs8,AL ; A2 ow/od [8086]
\c MOV memoffs16,AX ; o16 A3 ow/od [8086]
\c MOV memoffs32,EAX ; o32 A3 ow/od [386]

\c MOV r/m16,segreg ; o16 8C /r [8086]
\c MOV r/m32,segreg ; o32 8C /r [386]
\c MOV segreg,r/m16 ; o16 8E /r [8086]
\c MOV segreg,r/m32 ; o32 8E /r [386]

\c MOV reg32,CR0/2/3/4 ; 0F 20 /r [386]
\c MOV reg32,DR0/1/2/3/6/7 ; 0F 21 /r [386]
\c MOV reg32,TR3/4/5/6/7 ; 0F 24 /r [386]
\c MOV CR0/2/3/4,reg32 ; 0F 22 /r [386]
\c MOV DR0/1/2/3/6/7,reg32 ; 0F 23 /r [386]
\c MOV TR3/4/5/6/7,reg32 ; 0F 26 /r [386]

\c{MOV} copies the contents of its source (second) operand into its
destination (first) operand.

In all forms of the \c{MOV} instruction, the two operands are the
same size, except for moving between a segment register and an
\c{r/m32} operand. These instructions are treated exactly like the
corresponding 16-bit equivalent (so that, for example, \c{MOV
DS,EAX} functions identically to \c{MOV DS,AX} but saves a prefix
when in 32-bit mode), except that when a segment register is moved
into a 32-bit destination, the top two bytes of the result are
undefined.

\c{MOV} may not use \c{CS} as a destination.

\c{CR4} is only a supported register on the Pentium and above.

Test registers are supported on 386/486 processors and on some
non-Intel Pentium class processors.


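A few of the forms listed above, as they might appear in a program.
This is only a sketch: the value \c{0x1234} is arbitrary and
\c{flag} is a hypothetical byte variable assumed to be defined
elsewhere.

\c         mov     ax,0x1234           ; reg16,imm16
\c         mov     ds,ax               ; segreg,r/m16: no immediate form exists
\c         mov     al,[flag]           ; AL,memoffs8
\c         mov     byte [flag],1       ; r/m8,imm8: the size must be stated

Since the list contains no form that takes an immediate source and
a segment register destination, the value for \c{DS} has to pass
through a general-purpose register or memory first.

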
\S{insMOVAPD} \i\c{MOVAPD}: Move Aligned Packed Double-Precision FP Values

\c MOVAPD xmm1,xmm2/mem128 ; 66 0F 28 /r [WILLAMETTE,SSE2]
\c MOVAPD xmm1/mem128,xmm2 ; 66 0F 29 /r [WILLAMETTE,SSE2]

\c{MOVAPD} moves a double quadword containing 2 packed double-precision
FP values from the source operand to the destination. When the source
or destination operand is a memory location, it must be aligned on a
16-byte boundary.

To move data in and out of memory locations that are not known to be on
16-byte boundaries, use the \c{MOVUPD} instruction (\k{insMOVUPD}).


\S{insMOVAPS} \i\c{MOVAPS}: Move Aligned Packed Single-Precision FP Values

\c MOVAPS xmm1,xmm2/mem128 ; 0F 28 /r [KATMAI,SSE]
\c MOVAPS xmm1/mem128,xmm2 ; 0F 29 /r [KATMAI,SSE]

\c{MOVAPS} moves a double quadword containing 4 packed single-precision
FP values from the source operand to the destination. When the source
or destination operand is a memory location, it must be aligned on a
16-byte boundary.

To move data in and out of memory locations that are not known to be on
16-byte boundaries, use the \c{MOVUPS} instruction (\k{insMOVUPS}).


\S{insMOVD} \i\c{MOVD}: Move Doubleword to/from MMX Register

\c MOVD mm,r/m32 ; 0F 6E /r [PENT,MMX]
\c MOVD r/m32,mm ; 0F 7E /r [PENT,MMX]
\c MOVD xmm,r/m32 ; 66 0F 6E /r [WILLAMETTE,SSE2]
\c MOVD r/m32,xmm ; 66 0F 7E /r [WILLAMETTE,SSE2]

\c{MOVD} copies 32 bits from its source (second) operand into its
destination (first) operand. When the destination is a 64-bit \c{MMX}
register or a 128-bit \c{XMM} register, the input value is zero-extended
to fill the destination register.


\S{insMOVDQ2Q} \i\c{MOVDQ2Q}: Move Quadword from XMM to MMX Register

\c MOVDQ2Q mm,xmm ; F2 0F D6 /r [WILLAMETTE,SSE2]

\c{MOVDQ2Q} moves the low quadword from the source operand to the
destination operand.


\S{insMOVDQA} \i\c{MOVDQA}: Move Aligned Double Quadword

\c MOVDQA xmm1,xmm2/m128 ; 66 0F 6F /r [WILLAMETTE,SSE2]
\c MOVDQA xmm1/m128,xmm2 ; 66 0F 7F /r [WILLAMETTE,SSE2]

\c{MOVDQA} moves a double quadword from the source operand to the
destination operand. When the source or destination operand is a
memory location, it must be aligned to a 16-byte boundary.

To move a double quadword to or from unaligned memory locations,
use the \c{MOVDQU} instruction (\k{insMOVDQU}).


\S{insMOVDQU} \i\c{MOVDQU}: Move Unaligned Double Quadword
|
||
|
|
||
|
\c MOVDQU xmm1,xmm2/m128         ; F3 0F 6F /r     [WILLAMETTE,SSE2]
|
||
|
\c MOVDQU xmm1/m128,xmm2         ; F3 0F 7F /r     [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVDQU} moves a double quadword from the source operand to the
|
||
|
destination operand. When the source or destination operand is a
|
||
|
memory location, the memory may be unaligned.
|
||
|
|
||
|
To move a double quadword to or from known aligned memory locations,
|
||
|
use the \c{MOVDQA} instruction (\k{insMOVDQA}).
|
||
|
|
||
|
|
||
|
\S{insMOVHLPS} \i\c{MOVHLPS}: Move Packed Single-Precision FP High to Low
|
||
|
|
||
|
\c MOVHLPS xmm1,xmm2             ; 0F 12 /r        [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVHLPS} moves the two packed single-precision FP values from the
|
||
|
high quadword of the source register xmm2 to the low quadword of the
|
||
|
destination register, xmm1. The upper quadword of xmm1 is left unchanged.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[0-63] := src[64-127],
|
||
|
\c dst[64-127] remains unchanged.
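
As an illustration, write each register as four single-precision
elements with element 0 in bits 0-31. The names \c{a0}..\c{a3} and
\c{b0}..\c{b3} are placeholders for arbitrary values:

\c         ; before: xmm0 = a0 a1 a2 a3,  xmm1 = b0 b1 b2 b3
\c         movhlps xmm0,xmm1
\c         ; after:  xmm0 = b2 b3 a2 a3,  xmm1 is unchanged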
|
||
|
|
||
|
|
||
|
\S{insMOVHPD} \i\c{MOVHPD}: Move High Packed Double-Precision FP
|
||
|
|
||
|
\c MOVHPD xmm,m64               ; 66 0F 16 /r      [WILLAMETTE,SSE2]
|
||
|
\c MOVHPD m64,xmm               ; 66 0F 17 /r      [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVHPD} moves a double-precision FP value between the source and
|
||
|
destination operands. One of the operands is a 64-bit memory location,
|
||
|
the other is the high quadword of an \c{XMM} register.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c mem[0-63] := xmm[64-127];
|
||
|
|
||
|
or
|
||
|
|
||
|
\c xmm[0-63] remains unchanged;
|
||
|
\c xmm[64-127] := mem[0-63].
|
||
|
|
||
|
|
||
|
\S{insMOVHPS} \i\c{MOVHPS}: Move High Packed Single-Precision FP
|
||
|
|
||
|
\c MOVHPS xmm,m64 ; 0F 16 /r [KATMAI,SSE]
|
||
|
\c MOVHPS m64,xmm ; 0F 17 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVHPS} moves two packed single-precision FP values between the source
|
||
|
and destination operands. One of the operands is a 64-bit memory location,
|
||
|
the other is the high quadword of an \c{XMM} register.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c mem[0-63] := xmm[64-127];
|
||
|
|
||
|
or
|
||
|
|
||
|
\c xmm[0-63] remains unchanged;
|
||
|
\c xmm[64-127] := mem[0-63].
|
||
|
|
||
|
|
||
|
\S{insMOVLHPS} \i\c{MOVLHPS}: Move Packed Single-Precision FP Low to High
|
||
|
|
||
|
\c MOVLHPS xmm1,xmm2             ; 0F 16 /r         [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVLHPS} moves the two packed single-precision FP values from the
|
||
|
low quadword of the source register xmm2 to the high quadword of the
|
||
|
destination register, xmm1. The low quadword of xmm1 is left unchanged.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[0-63] remains unchanged;
|
||
|
\c dst[64-127] := src[0-63].
|
||
|
|
||
|
\S{insMOVLPD} \i\c{MOVLPD}: Move Low Packed Double-Precision FP
|
||
|
|
||
|
\c MOVLPD xmm,m64                ; 66 0F 12 /r     [WILLAMETTE,SSE2]
|
||
|
\c MOVLPD m64,xmm                ; 66 0F 13 /r     [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVLPD} moves a double-precision FP value between the source and
|
||
|
destination operands. One of the operands is a 64-bit memory location,
|
||
|
the other is the low quadword of an \c{XMM} register.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c    mem[0-63]   := xmm[0-63];
|
||
|
|
||
|
or
|
||
|
|
||
|
\c    xmm[0-63]   := mem[0-63];
|
||
|
\c    xmm[64-127] remains unchanged.
|
||
|
|
||
|
\S{insMOVLPS} \i\c{MOVLPS}: Move Low Packed Single-Precision FP
|
||
|
|
||
|
\c MOVLPS xmm,m64                ; 0F 12 /r        [KATMAI,SSE]
|
||
|
\c MOVLPS m64,xmm                ; 0F 13 /r        [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVLPS} moves two packed single-precision FP values between the source
|
||
|
and destination operands. One of the operands is a 64-bit memory location,
|
||
|
the other is the low quadword of an \c{XMM} register.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c    mem[0-63]   := xmm[0-63];
|
||
|
|
||
|
or
|
||
|
|
||
|
\c    xmm[0-63]   := mem[0-63];
|
||
|
\c    xmm[64-127] remains unchanged.
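
One possible use, sketched here under the assumption that \c{ESI}
points at 16 readable bytes with no particular alignment, is to fill
an \c{XMM} register in two halves by pairing \c{MOVLPS} with
\c{MOVHPS} (\k{insMOVHPS}):

\c         movlps  xmm0,[esi]      ; low quadword  := 8 bytes at [esi]
\c         movhps  xmm0,[esi+8]    ; high quadword := 8 bytes at [esi+8]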
|
||
|
|
||
|
|
||
|
\S{insMOVMSKPD} \i\c{MOVMSKPD}: Extract Packed Double-Precision FP Sign Mask
|
||
|
|
||
|
\c MOVMSKPD reg32,xmm ; 66 0F 50 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVMSKPD} inserts a 2-bit mask in r32, formed of the most significant
|
||
|
bits of each double-precision FP number of the source operand.
|
||
|
|
||
|
|
||
|
\S{insMOVMSKPS} \i\c{MOVMSKPS}: Extract Packed Single-Precision FP Sign Mask
|
||
|
|
||
|
\c MOVMSKPS reg32,xmm ; 0F 50 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVMSKPS} inserts a 4-bit mask in r32, formed of the most significant
|
||
|
bits of each single-precision FP number of the source operand.
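
A small sketch of the resulting mask, assuming 32-bit code. The label
\c{vals} is hypothetical. Bit 0 of the mask comes from the lowest
element:

\c         section .data
\c vals:   dd      -1.0, 2.0, -3.0, 4.0
\c
\c         section .text
\c         movups  xmm0,[vals]
\c         movmskps eax,xmm0       ; eax = 0101b: elements 0 and 2 are negative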
|
||
|
|
||
|
|
||
|
\S{insMOVNTDQ} \i\c{MOVNTDQ}: Move Double Quadword Non Temporal
|
||
|
|
||
|
\c MOVNTDQ m128,xmm ; 66 0F E7 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVNTDQ} moves the double quadword from the \c{XMM} source
|
||
|
register to the destination memory location, using a non-temporal
|
||
|
hint. This store instruction minimizes cache pollution.
|
||
|
|
||
|
|
||
|
\S{insMOVNTI} \i\c{MOVNTI}: Move Doubleword Non Temporal
|
||
|
|
||
|
\c MOVNTI m32,reg32 ; 0F C3 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVNTI} moves the doubleword in the source register
|
||
|
to the destination memory location, using a non-temporal
|
||
|
hint. This store instruction minimizes cache pollution.
|
||
|
|
||
|
|
||
|
\S{insMOVNTPD} \i\c{MOVNTPD}: Move Aligned Two Packed Double-Precision
|
||
|
FP Values Non Temporal
|
||
|
|
||
|
\c MOVNTPD m128,xmm ; 66 0F 2B /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVNTPD} moves the double quadword from the \c{XMM} source
|
||
|
register to the destination memory location, using a non-temporal
|
||
|
hint. This store instruction minimizes cache pollution. The memory
|
||
|
location must be aligned to a 16-byte boundary.
|
||
|
|
||
|
|
||
|
\S{insMOVNTPS} \i\c{MOVNTPS}: Move Aligned Four Packed Single-Precision
|
||
|
FP Values Non Temporal
|
||
|
|
||
|
\c MOVNTPS m128,xmm ; 0F 2B /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVNTPS} moves the double quadword from the \c{XMM} source
|
||
|
register to the destination memory location, using a non-temporal
|
||
|
hint. This store instruction minimizes cache pollution. The memory
|
||
|
location must be aligned to a 16-byte boundary.
|
||
|
|
||
|
|
||
|
\S{insMOVNTQ} \i\c{MOVNTQ}: Move Quadword Non Temporal
|
||
|
|
||
|
\c MOVNTQ m64,mm ; 0F E7 /r [KATMAI,MMX]
|
||
|
|
||
|
\c{MOVNTQ} moves the quadword in the \c{MMX} source register
|
||
|
to the destination memory location, using a non-temporal
|
||
|
hint. This store instruction minimizes cache pollution.
|
||
|
|
||
|
|
||
|
\S{insMOVQ} \i\c{MOVQ}: Move Quadword to/from MMX Register
|
||
|
|
||
|
\c MOVQ mm1,mm2/m64 ; 0F 6F /r [PENT,MMX]
|
||
|
\c MOVQ mm1/m64,mm2 ; 0F 7F /r [PENT,MMX]
|
||
|
|
||
|
\c MOVQ xmm1,xmm2/m64 ; F3 0F 7E /r [WILLAMETTE,SSE2]
|
||
|
\c MOVQ xmm1/m64,xmm2 ; 66 0F D6 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVQ} copies 64 bits from its source (second) operand into its
|
||
|
destination (first) operand. When the source is an \c{XMM} register,
|
||
|
the low quadword is moved. When the destination is an \c{XMM} register,
|
||
|
the destination is the low quadword, and the high quadword is cleared.
|
||
|
|
||
|
|
||
|
\S{insMOVQ2DQ} \i\c{MOVQ2DQ}: Move Quadword from MMX to XMM register.
|
||
|
|
||
|
\c MOVQ2DQ xmm,mm                ; F3 0F D6 /r     [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVQ2DQ} moves the quadword from the source operand to the low
|
||
|
quadword of the destination operand, and clears the high quadword.
|
||
|
|
||
|
|
||
|
\S{insMOVSB} \i\c{MOVSB}, \i\c{MOVSW}, \i\c{MOVSD}: Move String
|
||
|
|
||
|
\c MOVSB ; A4 [8086]
|
||
|
\c MOVSW ; o16 A5 [8086]
|
||
|
\c MOVSD ; o32 A5 [386]
|
||
|
|
||
|
\c{MOVSB} copies the byte at \c{[DS:SI]} or \c{[DS:ESI]} to
|
||
|
\c{[ES:DI]} or \c{[ES:EDI]}. It then increments or decrements
|
||
|
(depending on the direction flag: increments if the flag is clear,
|
||
|
decrements if it is set) \c{SI} and \c{DI} (or \c{ESI} and \c{EDI}).
|
||
|
|
||
|
The registers used are \c{SI} and \c{DI} if the address size is 16
|
||
|
bits, and \c{ESI} and \c{EDI} if it is 32 bits. If you need to use
|
||
|
an address size not equal to the current \c{BITS} setting, you can
|
||
|
use an explicit \i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
The segment register used to load from \c{[SI]} or \c{[ESI]} can be
|
||
|
overridden by using a segment register name as a prefix (for
|
||
|
example, \c{es movsb}). The use of \c{ES} for the store to \c{[DI]}
|
||
|
or \c{[EDI]} cannot be overridden.
|
||
|
|
||
|
\c{MOVSW} and \c{MOVSD} work in the same way, but they copy a word
|
||
|
or a doubleword instead of a byte, and increment or decrement the
|
||
|
addressing registers by 2 or 4 instead of 1.
|
||
|
|
||
|
The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
|
||
|
\c{ECX} - again, the address size chooses which) times.
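
A typical block copy can be sketched as follows, assuming \c{BITS 32}
code in a flat memory model. The names \c{src}, \c{dst} and \c{len}
are hypothetical symbols defined elsewhere in the program:

\c         cld                     ; clear DF so that ESI and EDI increment
\c         mov     esi,src
\c         mov     edi,dst
\c         mov     ecx,len
\c         rep movsb               ; copy len bytes from [DS:ESI] to [ES:EDI]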
|
||
|
|
||
|
|
||
|
\S{insMOVSD} \i\c{MOVSD}: Move Scalar Double-Precision FP Value
|
||
|
|
||
|
\c MOVSD xmm1,xmm2/m64 ; F2 0F 10 /r [WILLAMETTE,SSE2]
|
||
|
\c MOVSD xmm1/m64,xmm2 ; F2 0F 11 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVSD} moves a double-precision FP value from the source operand
|
||
|
to the destination operand. When the source or destination is a
|
||
|
register, the low-order FP value is read or written.
|
||
|
|
||
|
|
||
|
\S{insMOVSS} \i\c{MOVSS}: Move Scalar Single-Precision FP Value
|
||
|
|
||
|
\c MOVSS xmm1,xmm2/m32 ; F3 0F 10 /r [KATMAI,SSE]
|
||
|
\c MOVSS xmm1/m32,xmm2 ; F3 0F 11 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVSS} moves a single-precision FP value from the source operand
|
||
|
to the destination operand. When the source or destination is a
|
||
|
register, the low-order FP value is read or written.
|
||
|
|
||
|
|
||
|
\S{insMOVSX} \i\c{MOVSX}, \i\c{MOVZX}: Move Data with Sign or Zero Extend
|
||
|
|
||
|
\c MOVSX reg16,r/m8 ; o16 0F BE /r [386]
|
||
|
\c MOVSX reg32,r/m8 ; o32 0F BE /r [386]
|
||
|
\c MOVSX reg32,r/m16 ; o32 0F BF /r [386]
|
||
|
|
||
|
\c MOVZX reg16,r/m8 ; o16 0F B6 /r [386]
|
||
|
\c MOVZX reg32,r/m8 ; o32 0F B6 /r [386]
|
||
|
\c MOVZX reg32,r/m16 ; o32 0F B7 /r [386]
|
||
|
|
||
|
\c{MOVSX} sign-extends its source (second) operand to the length of
|
||
|
its destination (first) operand, and copies the result into the
|
||
|
destination operand. \c{MOVZX} does the same, but zero-extends
|
||
|
rather than sign-extending.
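
The difference between the two is easiest to see with a source value
whose top bit is set. A minimal sketch:

\c         mov     bl,0F0h
\c         movsx   eax,bl          ; eax = 0FFFFFFF0h (sign bit copied upwards)
\c         movzx   ecx,bl          ; ecx = 000000F0h  (upper bits cleared)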
|
||
|
|
||
|
|
||
|
\S{insMOVUPD} \i\c{MOVUPD}: Move Unaligned Packed Double-Precision FP Values
|
||
|
|
||
|
\c MOVUPD xmm1,xmm2/mem128 ; 66 0F 10 /r [WILLAMETTE,SSE2]
|
||
|
\c MOVUPD xmm1/mem128,xmm2 ; 66 0F 11 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MOVUPD} moves a double quadword containing 2 packed double-precision
|
||
|
FP values from the source operand to the destination. This instruction
|
||
|
makes no assumptions about alignment of memory operands.
|
||
|
|
||
|
To move data in and out of memory locations that are known to be on 16-byte
|
||
|
boundaries, use the \c{MOVAPD} instruction (\k{insMOVAPD}).
|
||
|
|
||
|
|
||
|
\S{insMOVUPS} \i\c{MOVUPS}: Move Unaligned Packed Single-Precision FP Values
|
||
|
|
||
|
\c MOVUPS xmm1,xmm2/mem128 ; 0F 10 /r [KATMAI,SSE]
|
||
|
\c MOVUPS xmm1/mem128,xmm2 ; 0F 11 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MOVUPS} moves a double quadword containing 4 packed single-precision
|
||
|
FP values from the source operand to the destination. This instruction
|
||
|
makes no assumptions about alignment of memory operands.
|
||
|
|
||
|
To move data in and out of memory locations that are known to be on 16-byte
|
||
|
boundaries, use the \c{MOVAPS} instruction (\k{insMOVAPS}).
|
||
|
|
||
|
|
||
|
\S{insMUL} \i\c{MUL}: Unsigned Integer Multiply
|
||
|
|
||
|
\c MUL r/m8 ; F6 /4 [8086]
|
||
|
\c MUL r/m16 ; o16 F7 /4 [8086]
|
||
|
\c MUL r/m32 ; o32 F7 /4 [386]
|
||
|
|
||
|
\c{MUL} performs unsigned integer multiplication. The other operand
|
||
|
to the multiplication, and the destination operand, are implicit, in
|
||
|
the following way:
|
||
|
|
||
|
\b For \c{MUL r/m8}, \c{AL} is multiplied by the given operand; the
|
||
|
product is stored in \c{AX}.
|
||
|
|
||
|
\b For \c{MUL r/m16}, \c{AX} is multiplied by the given operand;
|
||
|
the product is stored in \c{DX:AX}.
|
||
|
|
||
|
\b For \c{MUL r/m32}, \c{EAX} is multiplied by the given operand;
|
||
|
the product is stored in \c{EDX:EAX}.
|
||
|
|
||
|
Signed integer multiplication is performed by the \c{IMUL}
|
||
|
instruction: see \k{insIMUL}.
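
As a worked example of the 32-bit form, where the product no longer
fits in \c{EAX} alone (the choice of \c{EBX} as the operand is
arbitrary):

\c         mov     eax,80000000h
\c         mov     ebx,4
\c         mul     ebx             ; product is 200000000h:
\c                                 ; edx = 00000002h, eax = 00000000h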
|
||
|
|
||
|
|
||
|
\S{insMULPD} \i\c{MULPD}: Packed Double-FP Multiply
|
||
|
|
||
|
\c MULPD xmm1,xmm2/mem128 ; 66 0F 59 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MULPD} performs a SIMD multiply of the packed double-precision FP
|
||
|
values in both operands, and stores the results in the destination register.
|
||
|
|
||
|
|
||
|
\S{insMULPS} \i\c{MULPS}: Packed Single-FP Multiply
|
||
|
|
||
|
\c MULPS xmm1,xmm2/mem128 ; 0F 59 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MULPS} performs a SIMD multiply of the packed single-precision FP
|
||
|
values in both operands, and stores the results in the destination register.
|
||
|
|
||
|
|
||
|
\S{insMULSD} \i\c{MULSD}: Scalar Double-FP Multiply
|
||
|
|
||
|
\c MULSD xmm1,xmm2/mem64         ; F2 0F 59 /r     [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{MULSD} multiplies the lowest double-precision FP values of both
|
||
|
operands, and stores the result in the low quadword of xmm1.
|
||
|
|
||
|
|
||
|
\S{insMULSS} \i\c{MULSS}: Scalar Single-FP Multiply
|
||
|
|
||
|
\c MULSS xmm1,xmm2/mem32 ; F3 0F 59 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{MULSS} multiplies the lowest single-precision FP values of both
|
||
|
operands, and stores the result in the low doubleword of xmm1.
|
||
|
|
||
|
|
||
|
\S{insNEG} \i\c{NEG}, \i\c{NOT}: Two's and One's Complement
|
||
|
|
||
|
\c NEG r/m8 ; F6 /3 [8086]
|
||
|
\c NEG r/m16 ; o16 F7 /3 [8086]
|
||
|
\c NEG r/m32 ; o32 F7 /3 [386]
|
||
|
|
||
|
\c NOT r/m8 ; F6 /2 [8086]
|
||
|
\c NOT r/m16 ; o16 F7 /2 [8086]
|
||
|
\c NOT r/m32 ; o32 F7 /2 [386]
|
||
|
|
||
|
\c{NEG} replaces the contents of its operand by the two's complement
|
||
|
negation (invert all the bits and then add one) of the original
|
||
|
value. \c{NOT}, similarly, performs one's complement (inverts all
|
||
|
the bits).
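
A minimal sketch contrasting the two on the same input value:

\c         mov     al,5
\c         neg     al              ; al = 0FBh, i.e. -5
\c         mov     bl,5
\c         not     bl              ; bl = 0FAh, every bit inverted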
|
||
|
|
||
|
|
||
|
\S{insNOP} \i\c{NOP}: No Operation
|
||
|
|
||
|
\c NOP ; 90 [8086]
|
||
|
|
||
|
\c{NOP} performs no operation. Its opcode is the same as that
|
||
|
generated by \c{XCHG AX,AX} or \c{XCHG EAX,EAX} (depending on the
|
||
|
processor mode; see \k{insXCHG}).
|
||
|
|
||
|
|
||
|
\S{insOR} \i\c{OR}: Bitwise OR
|
||
|
|
||
|
\c OR r/m8,reg8 ; 08 /r [8086]
|
||
|
\c OR r/m16,reg16 ; o16 09 /r [8086]
|
||
|
\c OR r/m32,reg32 ; o32 09 /r [386]
|
||
|
|
||
|
\c OR reg8,r/m8 ; 0A /r [8086]
|
||
|
\c OR reg16,r/m16 ; o16 0B /r [8086]
|
||
|
\c OR reg32,r/m32 ; o32 0B /r [386]
|
||
|
|
||
|
\c OR r/m8,imm8 ; 80 /1 ib [8086]
|
||
|
\c OR r/m16,imm16 ; o16 81 /1 iw [8086]
|
||
|
\c OR r/m32,imm32 ; o32 81 /1 id [386]
|
||
|
|
||
|
\c OR r/m16,imm8 ; o16 83 /1 ib [8086]
|
||
|
\c OR r/m32,imm8 ; o32 83 /1 ib [386]
|
||
|
|
||
|
\c OR AL,imm8 ; 0C ib [8086]
|
||
|
\c OR AX,imm16 ; o16 0D iw [8086]
|
||
|
\c OR EAX,imm32 ; o32 0D id [386]
|
||
|
|
||
|
\c{OR} performs a bitwise OR operation between its two operands
|
||
|
(i.e. each bit of the result is 1 if and only if at least one of the
|
||
|
corresponding bits of the two inputs was 1), and stores the result
|
||
|
in the destination (first) operand.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
|
||
|
|
||
|
The MMX instruction \c{POR} (see \k{insPOR}) performs the same
|
||
|
operation on the 64-bit MMX registers.
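
The effect of the \c{BYTE} qualifier can be sketched as follows,
assuming \c{BITS 32} code. The byte sequences in the comments are the
encodings expected from the opcode table above. Some versions of
NASM, depending on the optimisation setting, may select the short
form on their own, so treat this as an illustration rather than a
guarantee:

\c         or      ecx,byte 1      ; 83 C9 01
\c         or      ecx,byte -1     ; 83 C9 FF: sets ecx to 0FFFFFFFFh
\c         or      ecx,1000h       ; 81 C9 00 10 00 00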
|
||
|
|
||
|
|
||
|
\S{insORPD} \i\c{ORPD}: Bit-wise Logical OR of Double-Precision FP Data
|
||
|
|
||
|
\c ORPD xmm1,xmm2/m128 ; 66 0F 56 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{ORPD} performs a bit-wise logical OR between xmm1 and xmm2/mem,
|
||
|
and stores the result in xmm1. If the source operand is a memory
|
||
|
location, it must be aligned to a 16-byte boundary.
|
||
|
|
||
|
|
||
|
\S{insORPS} \i\c{ORPS}: Bit-wise Logical OR of Single-Precision FP Data
|
||
|
|
||
|
\c ORPS xmm1,xmm2/m128 ; 0F 56 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{ORPS} performs a bit-wise logical OR between xmm1 and xmm2/mem,
|
||
|
and stores the result in xmm1. If the source operand is a memory
|
||
|
location, it must be aligned to a 16-byte boundary.
|
||
|
|
||
|
|
||
|
\S{insOUT} \i\c{OUT}: Output Data to I/O Port
|
||
|
|
||
|
\c OUT imm8,AL ; E6 ib [8086]
|
||
|
\c OUT imm8,AX ; o16 E7 ib [8086]
|
||
|
\c OUT imm8,EAX ; o32 E7 ib [386]
|
||
|
\c OUT DX,AL ; EE [8086]
|
||
|
\c OUT DX,AX ; o16 EF [8086]
|
||
|
\c OUT DX,EAX ; o32 EF [386]
|
||
|
|
||
|
\c{OUT} writes the contents of the given source register to the
|
||
|
specified I/O port. The port number may be specified as an immediate
|
||
|
value if it is between 0 and 255, and otherwise must be stored in
|
||
|
\c{DX}. See also \c{IN} (\k{insIN}).
|
||
|
|
||
|
|
||
|
\S{insOUTSB} \i\c{OUTSB}, \i\c{OUTSW}, \i\c{OUTSD}: Output String to I/O Port
|
||
|
|
||
|
\c OUTSB ; 6E [186]
|
||
|
\c OUTSW ; o16 6F [186]
|
||
|
\c OUTSD ; o32 6F [386]
|
||
|
|
||
|
\c{OUTSB} loads a byte from \c{[DS:SI]} or \c{[DS:ESI]} and writes
|
||
|
it to the I/O port specified in \c{DX}. It then increments or
|
||
|
decrements (depending on the direction flag: increments if the flag
|
||
|
is clear, decrements if it is set) \c{SI} or \c{ESI}.
|
||
|
|
||
|
The register used is \c{SI} if the address size is 16 bits, and
|
||
|
\c{ESI} if it is 32 bits. If you need to use an address size not
|
||
|
equal to the current \c{BITS} setting, you can use an explicit
|
||
|
\i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
The segment register used to load from \c{[SI]} or \c{[ESI]} can be
|
||
|
overridden by using a segment register name as a prefix (for
|
||
|
example, \c{es outsb}).
|
||
|
|
||
|
\c{OUTSW} and \c{OUTSD} work in the same way, but they output a
|
||
|
word or a doubleword instead of a byte, and increment or decrement
|
||
|
the addressing registers by 2 or 4 instead of 1.
|
||
|
|
||
|
The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
|
||
|
\c{ECX} - again, the address size chooses which) times.
|
||
|
|
||
|
|
||
|
\S{insPACKSSDW} \i\c{PACKSSDW}, \i\c{PACKSSWB}, \i\c{PACKUSWB}: Pack Data
|
||
|
|
||
|
\c PACKSSDW mm1,mm2/m64 ; 0F 6B /r [PENT,MMX]
|
||
|
\c PACKSSWB mm1,mm2/m64 ; 0F 63 /r [PENT,MMX]
|
||
|
\c PACKUSWB mm1,mm2/m64 ; 0F 67 /r [PENT,MMX]
|
||
|
|
||
|
\c PACKSSDW xmm1,xmm2/m128 ; 66 0F 6B /r [WILLAMETTE,SSE2]
|
||
|
\c PACKSSWB xmm1,xmm2/m128 ; 66 0F 63 /r [WILLAMETTE,SSE2]
|
||
|
\c PACKUSWB xmm1,xmm2/m128 ; 66 0F 67 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
All these instructions start by combining the source and destination
|
||
|
operands, and then split the result into smaller sections, which they
then pack into the destination register. The \c{MMX} versions pack
|
||
|
two 64-bit operands into one 64-bit register, while the \c{SSE}
|
||
|
versions pack two 128-bit operands into one 128-bit register.
|
||
|
|
||
|
\b \c{PACKSSWB} splits the combined value into words, and then reduces
|
||
|
the words to bytes, using signed saturation. It then packs the bytes
|
||
|
into the destination register in the same order the words were in.
|
||
|
|
||
|
\b \c{PACKSSDW} performs the same operation as \c{PACKSSWB}, except that
|
||
|
it reduces doublewords to words, then packs them into the destination
|
||
|
register.
|
||
|
|
||
|
\b \c{PACKUSWB} performs the same operation as \c{PACKSSWB}, except that
|
||
|
it uses unsigned saturation when reducing the size of the elements.
|
||
|
|
||
|
In signed saturation, a number that is too large is replaced by the largest
|
||
|
signed number (\c{7FFFh} or \c{7Fh}) that \e{will} fit, and if it is too
|
||
|
small it is replaced by the smallest signed number (\c{8000h} or
|
||
|
\c{80h}) that will fit. In unsigned saturation, a negative input is
replaced by zero, and an input that is too large is replaced by the
largest unsigned number (\c{FFh}) that will fit.
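
Signed saturation can be illustrated with \c{PACKSSWB}. In this
sketch, which assumes 32-bit code and a hypothetical label
\c{words}, the same register is used as both source and destination,
so the four packed results appear twice:

\c         section .data
\c words:  dw      1, -1, 300, -300
\c
\c         section .text
\c         movq    mm0,[words]
\c         packsswb mm0,mm0        ; bytes of mm0, lowest first:
\c                                 ; 01h 0FFh 7Fh 80h 01h 0FFh 7Fh 80h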
|
||
|
|
||
|
|
||
|
\S{insPADDB} \i\c{PADDB}, \i\c{PADDW}, \i\c{PADDD}: Add Packed Integers
|
||
|
|
||
|
\c PADDB mm1,mm2/m64 ; 0F FC /r [PENT,MMX]
|
||
|
\c PADDW mm1,mm2/m64 ; 0F FD /r [PENT,MMX]
|
||
|
\c PADDD mm1,mm2/m64 ; 0F FE /r [PENT,MMX]
|
||
|
|
||
|
\c PADDB xmm1,xmm2/m128 ; 66 0F FC /r [WILLAMETTE,SSE2]
|
||
|
\c PADDW xmm1,xmm2/m128 ; 66 0F FD /r [WILLAMETTE,SSE2]
|
||
|
\c PADDD xmm1,xmm2/m128 ; 66 0F FE /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PADDx} performs packed addition of the two operands, storing the
|
||
|
result in the destination (first) operand.
|
||
|
|
||
|
\b \c{PADDB} treats the operands as packed bytes, and adds each byte
|
||
|
individually;
|
||
|
|
||
|
\b \c{PADDW} treats the operands as packed words;
|
||
|
|
||
|
\b \c{PADDD} treats its operands as packed doublewords.
|
||
|
|
||
|
When an individual result is too large to fit in its destination, it
|
||
|
is wrapped around and the low bits are stored, with the carry bit
|
||
|
discarded.
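
A minimal sketch of the wrap-around behaviour, looking only at the
lowest byte (all other bytes are zero in both operands):

\c         mov     eax,0FAh        ; 250 in the low byte
\c         mov     ebx,0Ah         ; 10 in the low byte
\c         movd    mm0,eax
\c         movd    mm1,ebx
\c         paddb   mm0,mm1         ; low byte: 0FAh + 0Ah = 104h, stored as 04h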
|
||
|
|
||
|
|
||
|
\S{insPADDQ} \i\c{PADDQ}: Add Packed Quadword Integers
|
||
|
|
||
|
\c PADDQ mm1,mm2/m64             ; 0F D4 /r        [WILLAMETTE,MMX]
|
||
|
|
||
|
\c PADDQ xmm1,xmm2/m128 ; 66 0F D4 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PADDQ} adds the quadwords in the source and destination operands, and
|
||
|
stores the result in the destination register.
|
||
|
|
||
|
When an individual result is too large to fit in its destination, it
|
||
|
is wrapped around and the low bits are stored, with the carry bit
|
||
|
discarded.
|
||
|
|
||
|
|
||
|
\S{insPADDSB} \i\c{PADDSB}, \i\c{PADDSW}: Add Packed Signed Integers With Saturation
|
||
|
|
||
|
\c PADDSB mm1,mm2/m64 ; 0F EC /r [PENT,MMX]
|
||
|
\c PADDSW mm1,mm2/m64 ; 0F ED /r [PENT,MMX]
|
||
|
|
||
|
\c PADDSB xmm1,xmm2/m128 ; 66 0F EC /r [WILLAMETTE,SSE2]
|
||
|
\c PADDSW xmm1,xmm2/m128 ; 66 0F ED /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PADDSx} performs packed addition of the two operands, storing the
|
||
|
result in the destination (first) operand.
|
||
|
\c{PADDSB} treats the operands as packed bytes, and adds each byte
|
||
|
individually; and \c{PADDSW} treats the operands as packed words.
|
||
|
|
||
|
When an individual result is too large to fit in its destination, a
|
||
|
saturated value is stored. The resulting value is the value with the
|
||
|
largest magnitude of the same sign as the result which will fit in
|
||
|
the available space.
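
As a sketch of saturation in both directions, the following adds a
register to a copy of itself. Byte 0 holds +100 and byte 1 holds
-100:

\c         mov     eax,9C64h       ; byte 0 = 64h (+100), byte 1 = 9Ch (-100)
\c         movd    mm0,eax
\c         movd    mm1,eax
\c         paddsb  mm0,mm1         ; byte 0: 100 + 100 saturates to 7Fh
\c                                 ; byte 1: -100 - 100 saturates to 80h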
|
||
|
|
||
|
|
||
|
\S{insPADDSIW} \i\c{PADDSIW}: MMX Packed Addition to Implicit Destination
|
||
|
|
||
|
\c PADDSIW mmxreg,r/m64 ; 0F 51 /r [CYRIX,MMX]
|
||
|
|
||
|
\c{PADDSIW}, specific to the Cyrix extensions to the MMX instruction
|
||
|
set, performs the same function as \c{PADDSW}, except that the result
|
||
|
is placed in an implied register.
|
||
|
|
||
|
To work out the implied register, invert the lowest bit in the register
|
||
|
number. So \c{PADDSIW MM0,MM2} would put the result in \c{MM1}, but
|
||
|
\c{PADDSIW MM1,MM2} would put the result in \c{MM0}.
|
||
|
|
||
|
|
||
|
\S{insPADDUSB} \i\c{PADDUSB}, \i\c{PADDUSW}: Add Packed Unsigned Integers With Saturation
|
||
|
|
||
|
\c PADDUSB mm1,mm2/m64 ; 0F DC /r [PENT,MMX]
|
||
|
\c PADDUSW mm1,mm2/m64 ; 0F DD /r [PENT,MMX]
|
||
|
|
||
|
\c PADDUSB xmm1,xmm2/m128 ; 66 0F DC /r [WILLAMETTE,SSE2]
|
||
|
\c PADDUSW xmm1,xmm2/m128 ; 66 0F DD /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PADDUSx} performs packed addition of the two operands, storing the
|
||
|
result in the destination (first) operand.
|
||
|
\c{PADDUSB} treats the operands as packed bytes, and adds each byte
|
||
|
individually; and \c{PADDUSW} treats the operands as packed words.
|
||
|
|
||
|
When an individual result is too large to fit in its destination, a
|
||
|
saturated value is stored. The resulting value is the maximum value
|
||
|
that will fit in the available space.
|
||
|
|
||
|
|
||
|
\S{insPAND} \i\c{PAND}, \i\c{PANDN}: MMX Bitwise AND and AND-NOT
|
||
|
|
||
|
\c PAND mm1,mm2/m64 ; 0F DB /r [PENT,MMX]
|
||
|
\c PANDN mm1,mm2/m64 ; 0F DF /r [PENT,MMX]
|
||
|
|
||
|
\c PAND xmm1,xmm2/m128 ; 66 0F DB /r [WILLAMETTE,SSE2]
|
||
|
\c PANDN xmm1,xmm2/m128 ; 66 0F DF /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
|
||
|
\c{PAND} performs a bitwise AND operation between its two operands
|
||
|
(i.e. each bit of the result is 1 if and only if the corresponding
|
||
|
bits of the two inputs were both 1), and stores the result in the
|
||
|
destination (first) operand.
|
||
|
|
||
|
\c{PANDN} performs the same operation, but performs a one's
|
||
|
complement operation on the destination (first) operand first.
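
In the notation used elsewhere in this appendix, the two operations
can be summarised as:

\c    PAND:   dst := dst AND src;
\c    PANDN:  dst := (NOT dst) AND src.

A short sketch of \c{PANDN}, with arbitrary register choices:

\c         mov     eax,0F0h
\c         mov     ebx,0FFh
\c         movd    mm0,eax
\c         movd    mm1,ebx
\c         pandn   mm0,mm1         ; mm0 = (NOT mm0) AND mm1 = 0Fh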
|
||
|
|
||
|
|
||
|
\S{insPAUSE} \i\c{PAUSE}: Spin Loop Hint
|
||
|
|
||
|
\c PAUSE ; F3 90 [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PAUSE} provides a hint to the processor that the following code
|
||
|
is a spin loop. This improves processor performance by bypassing
|
||
|
possible memory order violations. On older processors, this instruction
|
||
|
operates as a \c{NOP}.
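
A spin-wait loop using \c{PAUSE} might look like the following
sketch. The variable \c{flag} is hypothetical and is assumed to be
set to a non-zero value by another processor:

\c spin:   pause                   ; hint that this is a spin-wait loop
\c         cmp     dword [flag],0
\c         je      spin            ; keep waiting while flag is zero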
|
||
|
|
||
|
|
||
|
\S{insPAVEB} \i\c{PAVEB}: MMX Packed Average
|
||
|
|
||
|
\c PAVEB mmxreg,r/m64 ; 0F 50 /r [CYRIX,MMX]
|
||
|
|
||
|
\c{PAVEB}, specific to the Cyrix MMX extensions, treats its two
|
||
|
operands as vectors of eight unsigned bytes, and calculates the
|
||
|
average of the corresponding bytes in the operands. The resulting
|
||
|
vector of eight averages is stored in the first operand.
|
||
|
|
||
|
This opcode maps to \c{MOVMSKPS r32, xmm} on processors that support
|
||
|
the SSE instruction set.
|
||
|
|
||
|
|
||
|
\S{insPAVGB} \i\c{PAVGB}, \i\c{PAVGW}: Average Packed Integers
|
||
|
|
||
|
\c PAVGB mm1,mm2/m64 ; 0F E0 /r [KATMAI,MMX]
|
||
|
\c PAVGW mm1,mm2/m64 ; 0F E3 /r [KATMAI,MMX,SM]
|
||
|
|
||
|
\c PAVGB xmm1,xmm2/m128 ; 66 0F E0 /r [WILLAMETTE,SSE2]
|
||
|
\c PAVGW xmm1,xmm2/m128 ; 66 0F E3 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PAVGB} and \c{PAVGW} add the unsigned data elements of the source
|
||
|
operand to the unsigned data elements of the destination register,
|
||
|
then add 1 to the temporary results. The results of the add are then
|
||
|
each independently right-shifted by one bit position. The high order
|
||
|
bits of each element are filled with the carry bits of the corresponding
|
||
|
sum.
|
||
|
|
||
|
\b \c{PAVGB} operates on packed unsigned bytes, and
|
||
|
|
||
|
\b \c{PAVGW} operates on packed unsigned words.
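
In other words, each result element is \c{(dst + src + 1) >> 1},
computed without losing the carry. A minimal sketch with two
non-zero bytes:

\c         mov     eax,0FF01h      ; byte 0 = 01h, byte 1 = 0FFh
\c         mov     ebx,0FF02h      ; byte 0 = 02h, byte 1 = 0FFh
\c         movd    mm0,eax
\c         movd    mm1,ebx
\c         pavgb   mm0,mm1         ; byte 0: (1 + 2 + 1) >> 1 = 02h
\c                                 ; byte 1: (255 + 255 + 1) >> 1 = 0FFh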
|
||
|
|
||
|
|
||
|
\S{insPAVGUSB} \i\c{PAVGUSB}: Average of unsigned packed 8-bit values
|
||
|
|
||
|
\c PAVGUSB mm1,mm2/m64 ; 0F 0F /r BF [PENT,3DNOW]
|
||
|
|
||
|
\c{PAVGUSB} adds the unsigned data elements of the source operand to
|
||
|
the unsigned data elements of the destination register, then adds 1
|
||
|
to the temporary results. The results of the add are then each
|
||
|
independently right-shifted by one bit position. The high order bits
|
||
|
of each element are filled with the carry bits of the corresponding
|
||
|
sum.
|
||
|
|
||
|
This instruction performs exactly the same operations as the \c{PAVGB}
|
||
|
\c{MMX} instruction (\k{insPAVGB}).
|
||
|
|
||
|
|
||
|
\S{insPCMPEQB} \i\c{PCMPxx}: Compare Packed Integers.
|
||
|
|
||
|
\c PCMPEQB mm1,mm2/m64 ; 0F 74 /r [PENT,MMX]
|
||
|
\c PCMPEQW mm1,mm2/m64 ; 0F 75 /r [PENT,MMX]
|
||
|
\c PCMPEQD mm1,mm2/m64 ; 0F 76 /r [PENT,MMX]
|
||
|
|
||
|
\c PCMPGTB mm1,mm2/m64 ; 0F 64 /r [PENT,MMX]
|
||
|
\c PCMPGTW mm1,mm2/m64 ; 0F 65 /r [PENT,MMX]
|
||
|
\c PCMPGTD mm1,mm2/m64 ; 0F 66 /r [PENT,MMX]
|
||
|
|
||
|
\c PCMPEQB xmm1,xmm2/m128 ; 66 0F 74 /r [WILLAMETTE,SSE2]
|
||
|
\c PCMPEQW xmm1,xmm2/m128 ; 66 0F 75 /r [WILLAMETTE,SSE2]
|
||
|
\c PCMPEQD xmm1,xmm2/m128 ; 66 0F 76 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PCMPGTB xmm1,xmm2/m128 ; 66 0F 64 /r [WILLAMETTE,SSE2]
|
||
|
\c PCMPGTW xmm1,xmm2/m128 ; 66 0F 65 /r [WILLAMETTE,SSE2]
|
||
|
\c PCMPGTD xmm1,xmm2/m128 ; 66 0F 66 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
The \c{PCMPxx} instructions all treat their operands as vectors of
|
||
|
bytes, words, or doublewords; corresponding elements of the source
|
||
|
and destination are compared, and the corresponding element of the
|
||
|
destination (first) operand is set to all zeros or all ones
|
||
|
depending on the result of the comparison.
|
||
|
|
||
|
\b \c{PCMPxxB} treats the operands as vectors of bytes;
|
||
|
|
||
|
\b \c{PCMPxxW} treats the operands as vectors of words;
|
||
|
|
||
|
\b \c{PCMPxxD} treats the operands as vectors of doublewords;
|
||
|
|
||
|
\b \c{PCMPEQx} sets the corresponding element of the destination
|
||
|
operand to all ones if the two elements compared are equal;
|
||
|
|
||
|
\b \c{PCMPGTx} sets the destination element to all ones if the element
|
||
|
of the first (destination) operand is greater (treated as a signed
|
||
|
integer) than that of the second (source) operand.
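
The following sketch shows both kinds of comparison. Comparing a
register with itself is one way to set it to all ones, and the second
comparison illustrates that \c{PCMPGTx} treats its elements as
signed:

\c         pcmpeqd mm0,mm0         ; every element equals itself: mm0 = all ones
\c         pxor    mm1,mm1         ; mm1 = 0
\c         pcmpgtw mm1,mm0         ; signed: 0 > -1, so each word of mm1 = 0FFFFh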
|
||
|
|
||
|
|
||
|
\S{insPDISTIB} \i\c{PDISTIB}: MMX Packed Distance and Accumulate
|
||
|
with Implied Register
|
||
|
|
||
|
\c PDISTIB mm,m64 ; 0F 54 /r [CYRIX,MMX]
|
||
|
|
||
|
\c{PDISTIB}, specific to the Cyrix MMX extensions, treats its two
|
||
|
input operands as vectors of eight unsigned bytes. For each byte
|
||
|
position, it finds the absolute difference between the bytes in that
|
||
|
position in the two input operands, and adds that value to the byte
|
||
|
in the same position in the implied output register. The addition is
|
||
|
saturated to an unsigned byte in the same way as \c{PADDUSB}.
|
||
|
|
||
|
To work out the implied register, invert the lowest bit in the register
|
||
|
number. So \c{PDISTIB MM0,M64} would put the result in \c{MM1}, but
|
||
|
\c{PDISTIB MM1,M64} would put the result in \c{MM0}.
|
||
|
|
||
|
Note that \c{PDISTIB} cannot take a register as its second source
|
||
|
operand.
|
||
|
|
||
|
Operation:
|
||
|
|
||
|
\c dstI[0-7] := dstI[0-7] + ABS(src0[0-7] - src1[0-7]),
|
||
|
\c dstI[8-15] := dstI[8-15] + ABS(src0[8-15] - src1[8-15]),
|
||
|
\c .......
|
||
|
\c .......
|
||
|
\c dstI[56-63] := dstI[56-63] + ABS(src0[56-63] - src1[56-63]).
|
||
|
|
||
|
|
||
|
\S{insPEXTRW} \i\c{PEXTRW}: Extract Word
|
||
|
|
||
|
\c PEXTRW reg32,mm,imm8 ; 0F C5 /r ib [KATMAI,MMX]
|
||
|
\c PEXTRW reg32,xmm,imm8 ; 66 0F C5 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PEXTRW} moves the word in the source register (second operand)
|
||
|
that is pointed to by the count operand (third operand), into the
|
||
|
lower half of a 32-bit general purpose register. The upper half of
|
||
|
the register is cleared to all 0s.
|
||
|
|
||
|
When the source operand is an \c{MMX} register, the two least
|
||
|
significant bits of the count specify the source word. When it is
|
||
|
an \c{SSE} register, the three least significant bits specify the
|
||
|
word location.
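
For example, with arbitrary register choices:

\c         pextrw  eax,mm0,2       ; eax = bits 32-47 of mm0, zero-extended
\c         pextrw  eax,xmm1,7      ; eax = bits 112-127 of xmm1, zero-extended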
|
||
|
|
||
|
|
||
|
\S{insPF2ID} \i\c{PF2ID}: Packed Single-Precision FP to Integer Convert
|
||
|
|
||
|
\c PF2ID mm1,mm2/m64 ; 0F 0F /r 1D [PENT,3DNOW]
|
||
|
|
||
|
\c{PF2ID} converts two single-precision FP values in the source operand
|
||
|
to signed 32-bit integers, using truncation, and stores them in the
|
||
|
destination operand. Source values that are outside the range supported
|
||
|
by the destination are saturated to the largest absolute value of the
|
||
|
same sign.
|
||
|
|
||
|
|
||
|
\S{insPF2IW} \i\c{PF2IW}: Packed Single-Precision FP to Integer Word Convert
|
||
|
|
||
|
\c PF2IW mm1,mm2/m64 ; 0F 0F /r 1C [PENT,3DNOW]
|
||
|
|
||
|
\c{PF2IW} converts two single-precision FP values in the source operand
|
||
|
to signed 16-bit integers, using truncation, and stores them in the
|
||
|
destination operand. Source values that are outside the range supported
|
||
|
by the destination are saturated to the largest absolute value of the
|
||
|
same sign.
|
||
|
|
||
|
\b In the K6-2 and K6-III, the 16-bit value is zero-extended to 32 bits
|
||
|
before storing.
|
||
|
|
||
|
\b In the K6-2+, K6-III+ and Athlon processors, the value is sign-extended
|
||
|
to 32 bits before storing.
|
||
|
|
||
|
|
||
|
\S{insPFACC} \i\c{PFACC}: Packed Single-Precision FP Accumulate
|
||
|
|
||
|
\c PFACC mm1,mm2/m64 ; 0F 0F /r AE [PENT,3DNOW]
|
||
|
|
||
|
\c{PFACC} adds the two single-precision FP values from the destination
|
||
|
operand together, then adds the two single-precision FP values from the
|
||
|
source operand, and places the results in the low and high doublewords
|
||
|
of the destination operand.
|
||
|
|
||
|
The operation is:
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] + dst[32-63],
|
||
|
\c dst[32-63] := src[0-31] + src[32-63].
|
||
|
|
||
|
|
||
|
\S{insPFADD} \i\c{PFADD}: Packed Single-Precision FP Addition
|
||
|
|
||
|
\c PFADD mm1,mm2/m64 ; 0F 0F /r 9E [PENT,3DNOW]
|
||
|
|
||
|
\c{PFADD} performs addition on each of two packed single-precision
|
||
|
FP value pairs.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] + src[0-31],
|
||
|
\c dst[32-63] := dst[32-63] + src[32-63].
|
||
|
|
||
|
|
||
|
\S{insPFCMP} \i\c{PFCMPxx}: Packed Single-Precision FP Compare
|
||
|
\I\c{PFCMPEQ} \I\c{PFCMPGE} \I\c{PFCMPGT}
|
||
|
|
||
|
\c PFCMPEQ mm1,mm2/m64 ; 0F 0F /r B0 [PENT,3DNOW]
|
||
|
\c PFCMPGE mm1,mm2/m64 ; 0F 0F /r 90 [PENT,3DNOW]
|
||
|
\c PFCMPGT mm1,mm2/m64 ; 0F 0F /r A0 [PENT,3DNOW]
|
||
|
|
||
|
The \c{PFCMPxx} instructions compare the packed single-precision FP values
|
||
|
in the source and destination operands, and set the destination
|
||
|
according to the result. If the condition is true, the destination is
|
||
|
set to all 1s, otherwise it's set to all 0s.
|
||
|
|
||
|
\b \c{PFCMPEQ} tests whether dst == src;
|
||
|
|
||
|
\b \c{PFCMPGE} tests whether dst >= src;
|
||
|
|
||
|
\b \c{PFCMPGT} tests whether dst > src.
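The comparison is made separately for each of the two packed elements, so the result is a pair of 32-bit masks. A minimal sketch, assuming operands are given as (low, high) pairs of Python floats and using the hypothetical name \c{pfcmp}:

```python
import operator

# Condition suffix -> comparison applied as test(dst, src)
_TESTS = {"EQ": operator.eq, "GE": operator.ge, "GT": operator.gt}

def pfcmp(cond, dst, src):
    """Return a 32-bit mask per element: all 1s if true, all 0s if false."""
    test = _TESTS[cond]
    return tuple(0xFFFFFFFF if test(d, s) else 0 for d, s in zip(dst, src))
```

Masks of this kind are typically combined with bitwise instructions to select between two values without branching.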
|
||
|
|
||
|
|
||
|
\S{insPFMAX} \i\c{PFMAX}: Packed Single-Precision FP Maximum
|
||
|
|
||
|
\c PFMAX mm1,mm2/m64 ; 0F 0F /r A4 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFMAX} returns the higher of each pair of single-precision FP values.
|
||
|
If the higher value is zero, it is returned as positive zero.
|
||
|
|
||
|
|
||
|
\S{insPFMIN} \i\c{PFMIN}: Packed Single-Precision FP Minimum
|
||
|
|
||
|
\c PFMIN mm1,mm2/m64 ; 0F 0F /r 94 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFMIN} returns the lower of each pair of single-precision FP values.
|
||
|
If the lower value is zero, it is returned as positive zero.
|
||
|
|
||
|
|
||
|
\S{insPFMUL} \i\c{PFMUL}: Packed Single-Precision FP Multiply
|
||
|
|
||
|
\c PFMUL mm1,mm2/m64 ; 0F 0F /r B4 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFMUL} returns the product of each pair of single-precision FP values.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] * src[0-31],
|
||
|
\c dst[32-63] := dst[32-63] * src[32-63].
|
||
|
|
||
|
|
||
|
\S{insPFNACC} \i\c{PFNACC}: Packed Single-Precision FP Negative Accumulate
|
||
|
|
||
|
\c PFNACC mm1,mm2/m64 ; 0F 0F /r 8A [PENT,3DNOW]
|
||
|
|
||
|
\c{PFNACC} performs a negative accumulate of the two single-precision
|
||
|
FP values in the source and destination registers. The result of the
|
||
|
accumulate from the destination register is stored in the low doubleword
|
||
|
of the destination, and the result of the source accumulate is stored in
|
||
|
the high doubleword of the destination register.
|
||
|
|
||
|
The operation is:
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] - dst[32-63],
|
||
|
\c dst[32-63] := src[0-31] - src[32-63].
|
||
|
|
||
|
|
||
|
\S{insPFPNACC} \i\c{PFPNACC}: Packed Single-Precision FP Mixed Accumulate
|
||
|
|
||
|
\c PFPNACC mm1,mm2/m64 ; 0F 0F /r 8E [PENT,3DNOW]
|
||
|
|
||
|
\c{PFPNACC} performs a positive accumulate of the two single-precision
|
||
|
FP values in the source register and a negative accumulate of the
|
||
|
destination register. The result of the accumulate from the destination
|
||
|
register is stored in the low doubleword of the destination, and the
|
||
|
result of the source accumulate is stored in the high doubleword of the
|
||
|
destination register.
|
||
|
|
||
|
The operation is:
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] - dst[32-63],
|
||
|
\c dst[32-63] := src[0-31] + src[32-63].
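The three accumulate instructions (\c{PFACC}, \c{PFNACC} and \c{PFPNACC}) differ only in the sign applied within each operand. The sketch below restates their operation listings side by side; it assumes operands are (low, high) pairs of Python floats, does not model single-precision rounding, and uses function names chosen for illustration.

```python
def pfacc(dst, src):
    """Add within each operand."""
    return (dst[0] + dst[1], src[0] + src[1])

def pfnacc(dst, src):
    """Subtract high from low within each operand."""
    return (dst[0] - dst[1], src[0] - src[1])

def pfpnacc(dst, src):
    """Subtract within the destination, add within the source."""
    return (dst[0] - dst[1], src[0] + src[1])
```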
|
||
|
|
||
|
|
||
|
\S{insPFRCP} \i\c{PFRCP}: Packed Single-Precision FP Reciprocal Approximation
|
||
|
|
||
|
\c PFRCP mm1,mm2/m64 ; 0F 0F /r 96 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFRCP} performs a low precision estimate of the reciprocal of the
|
||
|
low-order single-precision FP value in the source operand, storing the
|
||
|
result in both halves of the destination register. The result is accurate
|
||
|
to 14 bits.
|
||
|
|
||
|
For higher precision reciprocals, this instruction should be followed by
|
||
|
two more instructions: \c{PFRCPIT1} (\k{insPFRCPIT1}) and \c{PFRCPIT2}
|
||
|
(\k{insPFRCPIT2}). This gives 24-bit accuracy. For more details,
|
||
|
see the AMD 3DNow! technology manual.
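Refinement of a reciprocal estimate is conventionally described as a Newton-Raphson iteration. The sketch below shows one such step in ordinary floating point, to illustrate why the follow-up instructions improve the estimate. It is an assumption-laden illustration, not a model of how the hardware divides the work between \c{PFRCPIT1} and \c{PFRCPIT2}; the name \c{refine_reciprocal} is hypothetical.

```python
def refine_reciprocal(w, x0):
    """One Newton-Raphson step for 1/w, starting from the estimate x0."""
    return x0 * (2.0 - w * x0)
```

Each step roughly squares the relative error, which is why a single refinement of a 14-bit estimate is enough to reach single-precision accuracy.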
|
||
|
|
||
|
|
||
|
\S{insPFRCPIT1} \i\c{PFRCPIT1}: Packed Single-Precision FP Reciprocal,
|
||
|
First Iteration Step
|
||
|
|
||
|
\c PFRCPIT1 mm1,mm2/m64 ; 0F 0F /r A6 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFRCPIT1} performs the first intermediate step in the calculation of
|
||
|
the reciprocal of a single-precision FP value. The first source value
|
||
|
(\c{mm1}) is the original value, and the second source value (\c{mm2/m64})
|
||
|
is the result of a \c{PFRCP} instruction.
|
||
|
|
||
|
For the final step in a reciprocal, returning the full 24-bit accuracy
|
||
|
of a single-precision FP value, see \c{PFRCPIT2} (\k{insPFRCPIT2}). For
|
||
|
more details, see the AMD 3DNow! technology manual.
|
||
|
|
||
|
|
||
|
\S{insPFRCPIT2} \i\c{PFRCPIT2}: Packed Single-Precision FP
|
||
|
Reciprocal/Reciprocal Square Root, Second Iteration Step
|
||
|
|
||
|
\c PFRCPIT2 mm1,mm2/m64 ; 0F 0F /r B6 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFRCPIT2} performs the second and final intermediate step in the
|
||
|
calculation of a reciprocal or reciprocal square root, refining the
|
||
|
values returned by the \c{PFRCP} and \c{PFRSQRT} instructions,
|
||
|
respectively.
|
||
|
|
||
|
The first source value (\c{mm1}) is the output of either a \c{PFRCPIT1}
|
||
|
or a \c{PFRSQIT1} instruction, and the second source is the output of
|
||
|
either the \c{PFRCP} or the \c{PFRSQRT} instruction. For more details,
|
||
|
see the AMD 3DNow! technology manual.
|
||
|
|
||
|
|
||
|
\S{insPFRSQIT1} \i\c{PFRSQIT1}: Packed Single-Precision FP Reciprocal
|
||
|
Square Root, First Iteration Step
|
||
|
|
||
|
\c PFRSQIT1 mm1,mm2/m64 ; 0F 0F /r A7 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFRSQIT1} performs the first intermediate step in the calculation of
|
||
|
the reciprocal square root of a single-precision FP value. The first
|
||
|
source value (\c{mm1}) is the square of the result of a \c{PFRSQRT}
|
||
|
instruction, and the second source value (\c{mm2/m64}) is the original
|
||
|
value.
|
||
|
|
||
|
For the final step in a calculation, returning the full 24-bit accuracy
|
||
|
of a single-precision FP value, see \c{PFRCPIT2} (\k{insPFRCPIT2}). For
|
||
|
more details, see the AMD 3DNow! technology manual.
|
||
|
|
||
|
|
||
|
\S{insPFRSQRT} \i\c{PFRSQRT}: Packed Single-Precision FP Reciprocal
|
||
|
Square Root Approximation
|
||
|
|
||
|
\c PFRSQRT mm1,mm2/m64 ; 0F 0F /r 97 [PENT,3DNOW]
|
||
|
|
||
|
\c{PFRSQRT} performs a low precision estimate of the reciprocal square
|
||
|
root of the low-order single-precision FP value in the source operand,
|
||
|
storing the result in both halves of the destination register. The result
|
||
|
is accurate to 15 bits.
|
||
|
|
||
|
For higher precision reciprocal square roots, this instruction should be followed by
|
||
|
two more instructions: \c{PFRSQIT1} (\k{insPFRSQIT1}) and \c{PFRCPIT2}
|
||
|
(\k{insPFRCPIT2}). This gives 24-bit accuracy. For more details,
|
||
|
see the AMD 3DNow! technology manual.
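As with the plain reciprocal, the refinement can be pictured as a Newton-Raphson step, here for the reciprocal square root. The sketch uses ordinary floating point and a hypothetical function name; it illustrates the mathematics only and does not claim to reproduce the exact intermediate values computed by \c{PFRSQIT1} and \c{PFRCPIT2}.

```python
def refine_rsqrt(w, x0):
    """One Newton-Raphson step for 1/sqrt(w), starting from the estimate x0."""
    return x0 * (3.0 - w * x0 * x0) / 2.0
```

Note that the step needs the square of the estimate, which is consistent with \c{PFRSQIT1} taking the squared \c{PFRSQRT} result as one of its inputs.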
|
||
|
|
||
|
|
||
|
\S{insPFSUB} \i\c{PFSUB}: Packed Single-Precision FP Subtract
|
||
|
|
||
|
\c PFSUB mm1,mm2/m64 ; 0F 0F /r 9A [PENT,3DNOW]
|
||
|
|
||
|
\c{PFSUB} subtracts the single-precision FP values in the source from
|
||
|
those in the destination, and stores the result in the destination
|
||
|
operand.
|
||
|
|
||
|
\c dst[0-31] := dst[0-31] - src[0-31],
|
||
|
\c dst[32-63] := dst[32-63] - src[32-63].
|
||
|
|
||
|
|
||
|
\S{insPFSUBR} \i\c{PFSUBR}: Packed Single-Precision FP Reverse Subtract
|
||
|
|
||
|
\c PFSUBR mm1,mm2/m64 ; 0F 0F /r AA [PENT,3DNOW]
|
||
|
|
||
|
\c{PFSUBR} subtracts the single-precision FP values in the destination
|
||
|
from those in the source, and stores the result in the destination
|
||
|
operand.
|
||
|
|
||
|
\c dst[0-31] := src[0-31] - dst[0-31],
|
||
|
\c dst[32-63] := src[32-63] - dst[32-63].
|
||
|
|
||
|
|
||
|
\S{insPI2FD} \i\c{PI2FD}: Packed Doubleword Integer to Single-Precision FP Convert
|
||
|
|
||
|
\c PI2FD mm1,mm2/m64 ; 0F 0F /r 0D [PENT,3DNOW]
|
||
|
|
||
|
\c{PI2FD} converts two signed 32-bit integers in the source operand
|
||
|
to single-precision FP values, using truncation of significant digits,
|
||
|
and stores them in the destination operand.
|
||
|
|
||
|
|
||
|
\S{insPI2FW} \i\c{PI2FW}: Packed Word Integer to Single-Precision FP Convert
|
||
|
|
||
|
\c PI2FW mm1,mm2/m64 ; 0F 0F /r 0C [PENT,3DNOW]
|
||
|
|
||
|
\c{PI2FW} converts two signed 16-bit integers in the source operand
|
||
|
to single-precision FP values, and stores them in the destination
|
||
|
operand. The input values are in the low word of each doubleword.
|
||
|
|
||
|
|
||
|
\S{insPINSRW} \i\c{PINSRW}: Insert Word
|
||
|
|
||
|
\c PINSRW mm,r16/r32/m16,imm8 ;0F C4 /r ib [KATMAI,MMX]
|
||
|
\c PINSRW xmm,r16/r32/m16,imm8 ;66 0F C4 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PINSRW} loads a word from a 16-bit register (or the low half of a
|
||
|
32-bit register), or from memory, and inserts it at the word position
|
||
|
in the destination register, pointed at by the count operand (third
|
||
|
operand). If the destination is an \c{MMX} register, the low two bits
|
||
|
of the count byte are used; if it is an \c{XMM} register, the low 3
|
||
|
bits are used. The insertion is done in such a way that the other
|
||
|
words from the destination register are left untouched.
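The insertion can be sketched as a mask-and-merge on the destination value. This is a minimal model, assuming registers are passed as unsigned Python integers; \c{pinsrw} and \c{is_xmm} are illustrative names.

```python
def pinsrw(dst, word, count, is_xmm=False):
    """Model of PINSRW: replace one word of dst, leaving the others intact."""
    index = count & (0x7 if is_xmm else 0x3)   # low 3 bits for XMM, 2 for MMX
    shift = 16 * index
    cleared = dst & ~(0xFFFF << shift)         # blank out the target word
    return cleared | ((word & 0xFFFF) << shift)  # only the low 16 bits are used
```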
|
||
|
|
||
|
|
||
|
\S{insPMACHRIW} \i\c{PMACHRIW}: Packed Multiply and Accumulate with Rounding
|
||
|
|
||
|
\c PMACHRIW mm,m64 ; 0F 5E /r [CYRIX,MMX]
|
||
|
|
||
|
\c{PMACHRIW} takes two packed 16-bit integer inputs, multiplies the
|
||
|
values in the inputs, rounds on bit 15 of each result, then adds bits
|
||
|
15-30 of each result to the corresponding position of the \e{implied}
|
||
|
destination register.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dstI[0-15] := dstI[0-15] + (mm[0-15] *m64[0-15]
|
||
|
\c + 0x00004000)[15-30],
|
||
|
\c dstI[16-31] := dstI[16-31] + (mm[16-31]*m64[16-31]
|
||
|
\c + 0x00004000)[15-30],
|
||
|
\c dstI[32-47] := dstI[32-47] + (mm[32-47]*m64[32-47]
|
||
|
\c + 0x00004000)[15-30],
|
||
|
\c dstI[48-63] := dstI[48-63] + (mm[48-63]*m64[48-63]
|
||
|
\c + 0x00004000)[15-30].
|
||
|
|
||
|
Note that \c{PMACHRIW} cannot take a register as its second source
|
||
|
operand.
|
||
|
|
||
|
|
||
|
\S{insPMADDWD} \i\c{PMADDWD}: MMX Packed Multiply and Add
|
||
|
|
||
|
\c PMADDWD mm1,mm2/m64 ; 0F F5 /r [PENT,MMX]
|
||
|
\c PMADDWD xmm1,xmm2/m128 ; 66 0F F5 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMADDWD} treats its two inputs as vectors of signed words. It
|
||
|
multiplies corresponding elements of the two operands, giving doubleword
|
||
|
results. These are then added together in pairs and stored in the
|
||
|
destination operand.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[0-31] := (dst[0-15] * src[0-15])
|
||
|
\c + (dst[16-31] * src[16-31]);
|
||
|
\c dst[32-63] := (dst[32-47] * src[32-47])
|
||
|
\c + (dst[48-63] * src[48-63]);
|
||
|
|
||
|
The following apply to the \c{SSE2} version of the instruction:
|
||
|
|
||
|
\c dst[64-95] := (dst[64-79] * src[64-79])
|
||
|
\c + (dst[80-95] * src[80-95]);
|
||
|
\c dst[96-127] := (dst[96-111] * src[96-111])
|
||
|
\c + (dst[112-127] * src[112-127]).
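The multiply-and-add pattern above can be sketched for either register width. The model below assumes each operand is given as a list of 16-bit words, lowest word first, and returns the sums as signed Python integers; it does not model the 32-bit wrap-around that occurs in the one case where all four input words are 8000h. The function names are hypothetical.

```python
def to_signed16(w):
    """Interpret the low 16 bits of w as a signed word."""
    w &= 0xFFFF
    return w - 0x10000 if w & 0x8000 else w

def pmaddwd(dst_words, src_words):
    """Multiply corresponding signed words, then add the products in pairs."""
    prods = [to_signed16(d) * to_signed16(s)
             for d, s in zip(dst_words, src_words)]
    return [prods[i] + prods[i + 1] for i in range(0, len(prods), 2)]
```

With four words per operand this corresponds to the MMX form, and with eight words to the 128-bit form.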
|
||
|
|
||
|
|
||
|
\S{insPMAGW} \i\c{PMAGW}: MMX Packed Magnitude
|
||
|
|
||
|
\c PMAGW mm1,mm2/m64 ; 0F 52 /r [CYRIX,MMX]
|
||
|
|
||
|
\c{PMAGW}, specific to the Cyrix MMX extensions, treats both its
|
||
|
operands as vectors of four signed words. It compares the absolute
|
||
|
values of the words in corresponding positions, and sets each word
|
||
|
of the destination (first) operand to whichever of the two words in
|
||
|
that position had the larger absolute value.
|
||
|
|
||
|
|
||
|
\S{insPMAXSW} \i\c{PMAXSW}: Packed Signed Integer Word Maximum
|
||
|
|
||
|
\c PMAXSW mm1,mm2/m64 ; 0F EE /r [KATMAI,MMX]
|
||
|
\c PMAXSW xmm1,xmm2/m128 ; 66 0F EE /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMAXSW} compares each pair of words in the two source operands, and
|
||
|
for each pair it stores the maximum value in the destination register.
|
||
|
|
||
|
|
||
|
\S{insPMAXUB} \i\c{PMAXUB}: Packed Unsigned Integer Byte Maximum
|
||
|
|
||
|
\c PMAXUB mm1,mm2/m64 ; 0F DE /r [KATMAI,MMX]
|
||
|
\c PMAXUB xmm1,xmm2/m128 ; 66 0F DE /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMAXUB} compares each pair of bytes in the two source operands, and
|
||
|
for each pair it stores the maximum value in the destination register.
|
||
|
|
||
|
|
||
|
\S{insPMINSW} \i\c{PMINSW}: Packed Signed Integer Word Minimum
|
||
|
|
||
|
\c PMINSW mm1,mm2/m64 ; 0F EA /r [KATMAI,MMX]
|
||
|
\c PMINSW xmm1,xmm2/m128 ; 66 0F EA /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMINSW} compares each pair of words in the two source operands, and
|
||
|
for each pair it stores the minimum value in the destination register.
|
||
|
|
||
|
|
||
|
\S{insPMINUB} \i\c{PMINUB}: Packed Unsigned Integer Byte Minimum
|
||
|
|
||
|
\c PMINUB mm1,mm2/m64 ; 0F DA /r [KATMAI,MMX]
|
||
|
\c PMINUB xmm1,xmm2/m128 ; 66 0F DA /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMINUB} compares each pair of bytes in the two source operands, and
|
||
|
for each pair it stores the minimum value in the destination register.
|
||
|
|
||
|
|
||
|
\S{insPMOVMSKB} \i\c{PMOVMSKB}: Move Byte Mask To Integer
|
||
|
|
||
|
\c PMOVMSKB reg32,mm ; 0F D7 /r [KATMAI,MMX]
|
||
|
\c PMOVMSKB reg32,xmm ; 66 0F D7 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMOVMSKB} returns an 8-bit or 16-bit mask formed of the most
|
||
|
significant bits of each byte of the source operand (8 bits for an
|
||
|
\c{MMX} register, 16 bits for an \c{XMM} register).
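The mask construction can be sketched as follows, assuming the source register is passed as an unsigned Python integer and \c{nbytes} is 8 for an MMX source or 16 for an XMM source. The name \c{pmovmskb} is reused here only as a label for the model.

```python
def pmovmskb(src, nbytes=8):
    """Collect the most significant bit of each byte into a bit mask."""
    mask = 0
    for i in range(nbytes):
        msb = (src >> (8 * i + 7)) & 1   # top bit of byte i
        mask |= msb << i                 # becomes bit i of the result
    return mask
```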
|
||
|
|
||
|
|
||
|
\S{insPMULHRW} \i\c{PMULHRWC}, \i\c{PMULHRIW}: Multiply Packed 16-bit Integers
|
||
|
With Rounding, and Store High Word
|
||
|
|
||
|
\c PMULHRWC mm1,mm2/m64 ; 0F 59 /r [CYRIX,MMX]
|
||
|
\c PMULHRIW mm1,mm2/m64 ; 0F 5D /r [CYRIX,MMX]
|
||
|
|
||
|
These instructions take two packed 16-bit integer inputs, multiply the
|
||
|
values in the inputs, round on bit 15 of each result, then store bits
|
||
|
15-30 of each result to the corresponding position of the destination
|
||
|
register.
|
||
|
|
||
|
\b For \c{PMULHRWC}, the destination is the first source operand.
|
||
|
|
||
|
\b For \c{PMULHRIW}, the destination is an implied register (worked out
|
||
|
as described for \c{PADDSIW} (\k{insPADDSIW})).
|
||
|
|
||
|
The operation of these instructions is:
|
||
|
|
||
|
\c dst[0-15] := (src1[0-15] *src2[0-15] + 0x00004000)[15-30]
|
||
|
\c dst[16-31] := (src1[16-31]*src2[16-31] + 0x00004000)[15-30]
|
||
|
\c dst[32-47] := (src1[32-47]*src2[32-47] + 0x00004000)[15-30]
|
||
|
\c dst[48-63] := (src1[48-63]*src2[48-63] + 0x00004000)[15-30]
|
||
|
|
||
|
See also \c{PMULHRWA} (\k{insPMULHRWA}) for a 3DNow! version of this
|
||
|
instruction.
|
||
|
|
||
|
|
||
|
\S{insPMULHRWA} \i\c{PMULHRWA}: Multiply Packed 16-bit Integers
|
||
|
With Rounding, and Store High Word
|
||
|
|
||
|
\c PMULHRWA mm1,mm2/m64 ; 0F 0F /r B7 [PENT,3DNOW]
|
||
|
|
||
|
\c{PMULHRWA} takes two packed 16-bit integer inputs, multiplies
|
||
|
the values in the inputs, rounds on bit 16 of each result, then
|
||
|
stores bits 16-31 of each result to the corresponding position
|
||
|
of the destination register.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[0-15] := (src1[0-15] *src2[0-15] + 0x00008000)[16-31];
|
||
|
\c dst[16-31] := (src1[16-31]*src2[16-31] + 0x00008000)[16-31];
|
||
|
\c dst[32-47] := (src1[32-47]*src2[32-47] + 0x00008000)[16-31];
|
||
|
\c dst[48-63] := (src1[48-63]*src2[48-63] + 0x00008000)[16-31].
|
||
|
|
||
|
See also \c{PMULHRWC} (\k{insPMULHRW}) for a Cyrix version of this
|
||
|
instruction.
|
||
|
|
||
|
|
||
|
\S{insPMULHUW} \i\c{PMULHUW}: Multiply Packed 16-bit Integers,
|
||
|
and Store High Word
|
||
|
|
||
|
\c PMULHUW mm1,mm2/m64 ; 0F E4 /r [KATMAI,MMX]
|
||
|
\c PMULHUW xmm1,xmm2/m128 ; 66 0F E4 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMULHUW} takes two packed unsigned 16-bit integer inputs, multiplies
|
||
|
the values in the inputs, then stores bits 16-31 of each result to the
|
||
|
corresponding position of the destination register.
|
||
|
|
||
|
|
||
|
\S{insPMULHW} \i\c{PMULHW}, \i\c{PMULLW}: Multiply Packed 16-bit Integers,
|
||
|
and Store
|
||
|
|
||
|
\c PMULHW mm1,mm2/m64 ; 0F E5 /r [PENT,MMX]
|
||
|
\c PMULLW mm1,mm2/m64 ; 0F D5 /r [PENT,MMX]
|
||
|
|
||
|
\c PMULHW xmm1,xmm2/m128 ; 66 0F E5 /r [WILLAMETTE,SSE2]
|
||
|
\c PMULLW xmm1,xmm2/m128 ; 66 0F D5 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMULxW} takes two packed signed 16-bit integer inputs, and
|
||
|
multiplies the values in the inputs, forming doubleword results.
|
||
|
|
||
|
\b \c{PMULHW} then stores the top 16 bits of each doubleword in the
|
||
|
destination (first) operand;
|
||
|
|
||
|
\b \c{PMULLW} stores the bottom 16 bits of each doubleword in the
|
||
|
destination operand.
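The split between the two instructions can be sketched per element. The model treats each word as a signed value, which is the behaviour Intel documents for \c{PMULHW} (\c{PMULHUW}, in \k{insPMULHUW}, is the unsigned variant). The function names are illustrative.

```python
def to_signed16(w):
    """Interpret the low 16 bits of w as a signed word."""
    w &= 0xFFFF
    return w - 0x10000 if w & 0x8000 else w

def pmulhw(a, b):
    """Top 16 bits of the signed 32-bit product of two words."""
    return ((to_signed16(a) * to_signed16(b)) >> 16) & 0xFFFF

def pmullw(a, b):
    """Bottom 16 bits of the product of two words."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFF
```

Using both instructions on the same inputs yields the two halves of the full 32-bit product.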
|
||
|
|
||
|
|
||
|
\S{insPMULUDQ} \i\c{PMULUDQ}: Multiply Packed Unsigned
|
||
|
32-bit Integers, and Store
|
||
|
|
||
|
\c PMULUDQ mm1,mm2/m64 ; 0F F4 /r [WILLAMETTE,SSE2]
|
||
|
\c PMULUDQ xmm1,xmm2/m128 ; 66 0F F4 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PMULUDQ} takes two packed unsigned 32-bit integer inputs, and
|
||
|
multiplies the values in the inputs, forming quadword results. The
|
||
|
source is either an unsigned doubleword in the low doubleword of a
|
||
|
64-bit operand, or two unsigned doublewords in the first and
|
||
|
third doublewords of a 128-bit operand. This produces either one or
|
||
|
two 64-bit results, which are stored in the respective quadword
|
||
|
locations of the destination register.
|
||
|
|
||
|
The operation is:
|
||
|
|
||
|
\c dst[0-63] := dst[0-31] * src[0-31];
|
||
|
\c dst[64-127] := dst[64-95] * src[64-95].
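The operation above can be sketched as follows, assuming the registers are passed as unsigned Python integers. The \c{is_xmm} flag (an illustrative name) selects the 128-bit form, which adds the second product.

```python
def pmuludq(dst, src, is_xmm=False):
    """Multiply the low doubleword of each quadword, giving quadword results."""
    result = (dst & 0xFFFFFFFF) * (src & 0xFFFFFFFF)
    if is_xmm:
        # Third doubleword of each operand, bits 64-95
        high = ((dst >> 64) & 0xFFFFFFFF) * ((src >> 64) & 0xFFFFFFFF)
        result |= high << 64
    return result
```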
|
||
|
|
||
|
|
||
|
\S{insPMVccZB} \i\c{PMVccZB}: MMX Packed Conditional Move
|
||
|
|
||
|
\c PMVZB mmxreg,mem64 ; 0F 58 /r [CYRIX,MMX]
|
||
|
\c PMVNZB mmxreg,mem64 ; 0F 5A /r [CYRIX,MMX]
|
||
|
\c PMVLZB mmxreg,mem64 ; 0F 5B /r [CYRIX,MMX]
|
||
|
\c PMVGEZB mmxreg,mem64 ; 0F 5C /r [CYRIX,MMX]
|
||
|
|
||
|
These instructions, specific to the Cyrix MMX extensions, perform
|
||
|
parallel conditional moves. The two input operands are treated as
|
||
|
vectors of eight bytes. Each byte of the destination (first) operand
|
||
|
is either written from the corresponding byte of the source (second)
|
||
|
operand, or left alone, depending on the value of the byte in the
|
||
|
\e{implied} operand (specified in the same way as \c{PADDSIW}, in
|
||
|
\k{insPADDSIW}).
|
||
|
|
||
|
\b \c{PMVZB} performs each move if the corresponding byte in the
|
||
|
implied operand is zero;
|
||
|
|
||
|
\b \c{PMVNZB} moves if the byte is non-zero;
|
||
|
|
||
|
\b \c{PMVLZB} moves if the byte is less than zero;
|
||
|
|
||
|
\b \c{PMVGEZB} moves if the byte is greater than or equal to zero.
|
||
|
|
||
|
Note that these instructions cannot take a register as their second
|
||
|
source operand.
|
||
|
|
||
|
|
||
|
\S{insPOP} \i\c{POP}: Pop Data from Stack
|
||
|
|
||
|
\c POP reg16 ; o16 58+r [8086]
|
||
|
\c POP reg32 ; o32 58+r [386]
|
||
|
|
||
|
\c POP r/m16 ; o16 8F /0 [8086]
|
||
|
\c POP r/m32 ; o32 8F /0 [386]
|
||
|
|
||
|
\c POP CS ; 0F [8086,UNDOC]
|
||
|
\c POP DS ; 1F [8086]
|
||
|
\c POP ES ; 07 [8086]
|
||
|
\c POP SS ; 17 [8086]
|
||
|
\c POP FS ; 0F A1 [386]
|
||
|
\c POP GS ; 0F A9 [386]
|
||
|
|
||
|
\c{POP} loads a value from the stack (from \c{[SS:SP]} or
|
||
|
\c{[SS:ESP]}) and then increments the stack pointer.
|
||
|
|
||
|
The address-size attribute of the instruction determines whether
|
||
|
\c{SP} or \c{ESP} is used as the stack pointer: to deliberately
|
||
|
override the default given by the \c{BITS} setting, you can use an
|
||
|
\i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
The operand-size attribute of the instruction determines whether the
|
||
|
stack pointer is incremented by 2 or 4: this means that segment
|
||
|
register pops in \c{BITS 32} mode will pop 4 bytes off the stack and
|
||
|
discard the upper two of them. If you need to override that, you can
|
||
|
use an \i\c{o16} or \i\c{o32} prefix.
|
||
|
|
||
|
The above opcode listings give two forms for general-purpose
|
||
|
register pop instructions: for example, \c{POP BX} has the two forms
|
||
|
\c{5B} and \c{8F C3}. NASM will always generate the shorter form
|
||
|
when given \c{POP BX}. NDISASM will disassemble both.
|
||
|
|
||
|
\c{POP CS} is not a documented instruction, and is not supported on
|
||
|
any processor above the 8086 (since they use \c{0Fh} as an opcode
|
||
|
prefix for instruction set extensions). However, at least some 8086
|
||
|
processors do support it, and so NASM generates it for completeness.
|
||
|
|
||
|
|
||
|
\S{insPOPA} \i\c{POPAx}: Pop All General-Purpose Registers
|
||
|
|
||
|
\c POPA ; 61 [186]
|
||
|
\c POPAW ; o16 61 [186]
|
||
|
\c POPAD ; o32 61 [386]
|
||
|
|
||
|
\b \c{POPAW} pops a word from the stack into each of, successively,
|
||
|
\c{DI}, \c{SI}, \c{BP}, nothing (it discards a word from the stack
|
||
|
which was a placeholder for \c{SP}), \c{BX}, \c{DX}, \c{CX} and
|
||
|
\c{AX}. It is intended to reverse the operation of \c{PUSHAW} (see
|
||
|
\k{insPUSHA}), but it ignores the value for \c{SP} that was pushed
|
||
|
on the stack by \c{PUSHAW}.
|
||
|
|
||
|
\b \c{POPAD} pops twice as much data, and places the results in
|
||
|
\c{EDI}, \c{ESI}, \c{EBP}, nothing (placeholder for \c{ESP}),
|
||
|
\c{EBX}, \c{EDX}, \c{ECX} and \c{EAX}. It reverses the operation of
|
||
|
\c{PUSHAD}.
|
||
|
|
||
|
\c{POPA} is an alias mnemonic for either \c{POPAW} or \c{POPAD},
|
||
|
depending on the current \c{BITS} setting.
|
||
|
|
||
|
Note that the registers are popped in reverse order of their numeric
|
||
|
values in opcodes (see \k{iref-rv}).
|
||
|
|
||
|
|
||
|
\S{insPOPF} \i\c{POPFx}: Pop Flags Register
|
||
|
|
||
|
\c POPF ; 9D [8086]
|
||
|
\c POPFW ; o16 9D [8086]
|
||
|
\c POPFD ; o32 9D [386]
|
||
|
|
||
|
\b \c{POPFW} pops a word from the stack and stores it in the bottom 16
|
||
|
bits of the flags register (or the whole flags register, on
|
||
|
processors below a 386).
|
||
|
|
||
|
\b \c{POPFD} pops a doubleword and stores it in the entire flags register.
|
||
|
|
||
|
\c{POPF} is an alias mnemonic for either \c{POPFW} or \c{POPFD},
|
||
|
depending on the current \c{BITS} setting.
|
||
|
|
||
|
See also \c{PUSHF} (\k{insPUSHF}).
|
||
|
|
||
|
|
||
|
\S{insPOR} \i\c{POR}: MMX Bitwise OR
|
||
|
|
||
|
\c POR mm1,mm2/m64 ; 0F EB /r [PENT,MMX]
|
||
|
\c POR xmm1,xmm2/m128 ; 66 0F EB /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{POR} performs a bitwise OR operation between its two operands
|
||
|
(i.e. each bit of the result is 1 if and only if at least one of the
|
||
|
corresponding bits of the two inputs was 1), and stores the result
|
||
|
in the destination (first) operand.
|
||
|
|
||
|
|
||
|
\S{insPREFETCH} \i\c{PREFETCH}: Prefetch Data Into Caches
|
||
|
|
||
|
\c PREFETCH mem8 ; 0F 0D /0 [PENT,3DNOW]
|
||
|
\c PREFETCHW mem8 ; 0F 0D /1 [PENT,3DNOW]
|
||
|
|
||
|
\c{PREFETCH} and \c{PREFETCHW} fetch the line of data from memory that
|
||
|
contains the specified byte. \c{PREFETCHW} behaves differently on the
Athlon than on earlier processors.
|
||
|
|
||
|
For more details, see the 3DNow! Technology Manual.
|
||
|
|
||
|
|
||
|
\S{insPREFETCHh} \i\c{PREFETCHh}: Prefetch Data Into Caches
|
||
|
\I\c{PREFETCHNTA} \I\c{PREFETCHT0} \I\c{PREFETCHT1} \I\c{PREFETCHT2}
|
||
|
|
||
|
\c PREFETCHNTA m8 ; 0F 18 /0 [KATMAI]
|
||
|
\c PREFETCHT0 m8 ; 0F 18 /1 [KATMAI]
|
||
|
\c PREFETCHT1 m8 ; 0F 18 /2 [KATMAI]
|
||
|
\c PREFETCHT2 m8 ; 0F 18 /3 [KATMAI]
|
||
|
|
||
|
The \c{PREFETCHh} instructions fetch the line of data from memory
|
||
|
that contains the specified byte. It is placed in the cache
|
||
|
according to the rules selected by the locality hint \c{h}.
|
||
|
|
||
|
The hints are:
|
||
|
|
||
|
\b \c{T0} (temporal data) - prefetch data into all levels of the
|
||
|
cache hierarchy.
|
||
|
|
||
|
\b \c{T1} (temporal data with respect to first level cache) -
|
||
|
prefetch data into level 2 cache and higher.
|
||
|
|
||
|
\b \c{T2} (temporal data with respect to second level cache) -
|
||
|
prefetch data into level 2 cache and higher.
|
||
|
|
||
|
\b \c{NTA} (non-temporal data with respect to all cache levels) -
|
||
|
prefetch data into non-temporal cache structure and into a
|
||
|
location close to the processor, minimizing cache pollution.
|
||
|
|
||
|
Note that this group of instructions doesn't provide a guarantee
|
||
|
that the data will be in the cache when it is needed. For more
|
||
|
details, see the Intel IA32 Software Developer Manual, Volume 2.
|
||
|
|
||
|
|
||
|
\S{insPSADBW} \i\c{PSADBW}: Packed Sum of Absolute Differences
|
||
|
|
||
|
\c PSADBW mm1,mm2/m64 ; 0F F6 /r [KATMAI,MMX]
|
||
|
\c PSADBW xmm1,xmm2/m128 ; 66 0F F6 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSADBW} computes the absolute value of the
|
||
|
difference of the packed unsigned bytes in the two source operands.
|
||
|
These differences are then summed to produce a word result in the lower
|
||
|
16-bit field of the destination register; the rest of the register is
|
||
|
cleared. The destination operand is an \c{MMX} or an \c{XMM} register.
|
||
|
The source operand can either be a register or a memory operand.
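The sum of absolute differences over one 64-bit group can be sketched as below. This models the MMX form only; for the XMM form, Intel's documentation describes each 64-bit half being summed separately, which is not modelled here. The name \c{psadbw64} is hypothetical.

```python
def psadbw64(dst, src):
    """Sum of absolute differences of the eight unsigned bytes in each operand."""
    total = 0
    for i in range(8):
        a = (dst >> (8 * i)) & 0xFF
        b = (src >> (8 * i)) & 0xFF
        total += abs(a - b)
    return total   # at most 8 * 255, so it always fits in 16 bits
```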
|
||
|
|
||
|
|
||
|
\S{insPSHUFD} \i\c{PSHUFD}: Shuffle Packed Doublewords
|
||
|
|
||
|
\c PSHUFD xmm1,xmm2/m128,imm8 ; 66 0F 70 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSHUFD} shuffles the doublewords in the source (second) operand
|
||
|
according to the encoding specified by imm8, and stores the result
|
||
|
in the destination (first) operand.
|
||
|
|
||
|
Bits 0 and 1 of imm8 encode the source position of the doubleword to
|
||
|
be copied to position 0 in the destination operand. Bits 2 and 3
|
||
|
encode for position 1, bits 4 and 5 encode for position 2, and bits
|
||
|
6 and 7 encode for position 3. For example, an encoding of binary 10 in
|
||
|
bits 0 and 1 of imm8 indicates that the doubleword at bits 64-95 of
|
||
|
the source operand will be copied to bits 0-31 of the destination.
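The decoding of imm8 can be sketched with a short model. It assumes the source is given as a list of four elements with position 0 first, and the name \c{shuffle4} is illustrative; the same two-bits-per-position scheme is described for the word shuffles such as \c{PSHUFW}.

```python
def shuffle4(elements, imm8):
    """Pick a source element for each of the four destination positions."""
    # Bits 2*pos and 2*pos+1 of imm8 hold the source index for position pos.
    return [elements[(imm8 >> (2 * pos)) & 3] for pos in range(4)]
```

An imm8 of binary 11100100 therefore leaves the order unchanged, while binary 00011011 reverses it.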
|
||
|
|
||
|
|
||
|
\S{insPSHUFHW} \i\c{PSHUFHW}: Shuffle Packed High Words
|
||
|
|
||
|
\c PSHUFHW xmm1,xmm2/m128,imm8 ; F3 0F 70 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSHUFHW} shuffles the words in the high quadword of the source
|
||
|
(second) operand according to the encoding specified by imm8, and
|
||
|
stores the result in the high quadword of the destination (first)
|
||
|
operand.
|
||
|
|
||
|
The operation of this instruction is similar to the \c{PSHUFW}
|
||
|
instruction, except that the source and destination are the top
|
||
|
quadword of a 128-bit operand, instead of being 64-bit operands.
|
||
|
The low quadword is copied from the source to the destination
|
||
|
without any changes.
|
||
|
|
||
|
|
||
|
\S{insPSHUFLW} \i\c{PSHUFLW}: Shuffle Packed Low Words
|
||
|
|
||
|
\c PSHUFLW xmm1,xmm2/m128,imm8 ; F2 0F 70 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSHUFLW} shuffles the words in the low quadword of the source
|
||
|
(second) operand according to the encoding specified by imm8, and
|
||
|
stores the result in the low quadword of the destination (first)
|
||
|
operand.
|
||
|
|
||
|
The operation of this instruction is similar to the \c{PSHUFW}
|
||
|
instruction, except that the source and destination are the low
|
||
|
quadword of a 128-bit operand, instead of being 64-bit operands.
|
||
|
The high quadword is copied from the source to the destination
|
||
|
without any changes.
|
||
|
|
||
|
|
||
|
\S{insPSHUFW} \i\c{PSHUFW}: Shuffle Packed Words
|
||
|
|
||
|
\c PSHUFW mm1,mm2/m64,imm8 ; 0F 70 /r ib [KATMAI,MMX]
|
||
|
|
||
|
\c{PSHUFW} shuffles the words in the source (second) operand
|
||
|
according to the encoding specified by imm8, and stores the result
|
||
|
in the destination (first) operand.
|
||
|
|
||
|
Bits 0 and 1 of imm8 encode the source position of the word to be
|
||
|
copied to position 0 in the destination operand. Bits 2 and 3 encode
|
||
|
for position 1, bits 4 and 5 encode for position 2, and bits 6 and 7
|
||
|
encode for position 3. For example, an encoding of binary 10 in bits 0 and 1
|
||
|
of imm8 indicates that the word at bits 32-47 of the source operand
|
||
|
will be copied to bits 0-15 of the destination.
|
||
|
|
||
|
|
||
|
\S{insPSLLD} \i\c{PSLLx}: Packed Data Bit Shift Left Logical
|
||
|
|
||
|
\c PSLLW mm1,mm2/m64 ; 0F F1 /r [PENT,MMX]
|
||
|
\c PSLLW mm,imm8 ; 0F 71 /6 ib [PENT,MMX]
|
||
|
|
||
|
\c PSLLW xmm1,xmm2/m128 ; 66 0F F1 /r [WILLAMETTE,SSE2]
|
||
|
\c PSLLW xmm,imm8 ; 66 0F 71 /6 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSLLD mm1,mm2/m64 ; 0F F2 /r [PENT,MMX]
|
||
|
\c PSLLD mm,imm8 ; 0F 72 /6 ib [PENT,MMX]
|
||
|
|
||
|
\c PSLLD xmm1,xmm2/m128 ; 66 0F F2 /r [WILLAMETTE,SSE2]
|
||
|
\c PSLLD xmm,imm8 ; 66 0F 72 /6 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSLLQ mm1,mm2/m64 ; 0F F3 /r [PENT,MMX]
|
||
|
\c PSLLQ mm,imm8 ; 0F 73 /6 ib [PENT,MMX]
|
||
|
|
||
|
\c PSLLQ xmm1,xmm2/m128 ; 66 0F F3 /r [WILLAMETTE,SSE2]
|
||
|
\c PSLLQ xmm,imm8 ; 66 0F 73 /6 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSLLDQ xmm1,imm8 ; 66 0F 73 /7 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSLLx} performs logical left shifts of the data elements in the
|
||
|
destination (first) operand, moving each bit in the separate elements
|
||
|
left by the number of bits specified in the source (second) operand,
|
||
|
clearing the low-order bits as they are vacated. \c{PSLLDQ}
|
||
|
shifts bytes, not bits.
|
||
|
|
||
|
\b \c{PSLLW} shifts word sized elements.
|
||
|
|
||
|
\b \c{PSLLD} shifts doubleword sized elements.
|
||
|
|
||
|
\b \c{PSLLQ} shifts quadword sized elements.
|
||
|
|
||
|
\b \c{PSLLDQ} shifts double quadword sized elements.
|
||
|
|
||
|
|
||
|
\S{insPSRAD} \i\c{PSRAx}: Packed Data Bit Shift Right Arithmetic
|
||
|
|
||
|
\c PSRAW mm1,mm2/m64 ; 0F E1 /r [PENT,MMX]
|
||
|
\c PSRAW mm,imm8 ; 0F 71 /4 ib [PENT,MMX]
|
||
|
|
||
|
\c PSRAW xmm1,xmm2/m128 ; 66 0F E1 /r [WILLAMETTE,SSE2]
|
||
|
\c PSRAW xmm,imm8 ; 66 0F 71 /4 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSRAD mm1,mm2/m64 ; 0F E2 /r [PENT,MMX]
|
||
|
\c PSRAD mm,imm8 ; 0F 72 /4 ib [PENT,MMX]
|
||
|
|
||
|
\c PSRAD xmm1,xmm2/m128 ; 66 0F E2 /r [WILLAMETTE,SSE2]
|
||
|
\c PSRAD xmm,imm8 ; 66 0F 72 /4 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSRAx} performs arithmetic right shifts of the data elements in the
|
||
|
destination (first) operand, moving each bit in the separate elements
|
||
|
right by the number of bits specified in the source (second) operand,
|
||
|
setting the high-order bits to the value of the original sign bit.
|
||
|
|
||
|
\b \c{PSRAW} shifts word sized elements.
|
||
|
|
||
|
\b \c{PSRAD} shifts doubleword sized elements.
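The difference between an arithmetic and a logical right shift can be sketched on a single word element. The handling of counts of 16 or more follows Intel's description of these instructions and is an assumption relative to the text above; the function names are illustrative.

```python
def psraw_element(word, count):
    """Arithmetic right shift of one 16-bit element: the sign bit is replicated."""
    word &= 0xFFFF
    value = word - 0x10000 if word & 0x8000 else word
    # Counts above 15 fill the whole element with the sign bit.
    return (value >> min(count, 15)) & 0xFFFF

def psrlw_element(word, count):
    """Logical right shift of one 16-bit element: vacated bits are cleared."""
    return (word & 0xFFFF) >> count if count < 16 else 0
```

Shifting 8000h right by 4 gives F800h arithmetically but 0800h logically.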
|
||
|
|
||
|
|
||
|
\S{insPSRLD} \i\c{PSRLx}: Packed Data Bit Shift Right Logical
|
||
|
|
||
|
\c PSRLW mm1,mm2/m64 ; 0F D1 /r [PENT,MMX]
|
||
|
\c PSRLW mm,imm8 ; 0F 71 /2 ib [PENT,MMX]
|
||
|
|
||
|
\c PSRLW xmm1,xmm2/m128 ; 66 0F D1 /r [WILLAMETTE,SSE2]
|
||
|
\c PSRLW xmm,imm8 ; 66 0F 71 /2 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSRLD mm1,mm2/m64 ; 0F D2 /r [PENT,MMX]
|
||
|
\c PSRLD mm,imm8 ; 0F 72 /2 ib [PENT,MMX]
|
||
|
|
||
|
\c PSRLD xmm1,xmm2/m128 ; 66 0F D2 /r [WILLAMETTE,SSE2]
|
||
|
\c PSRLD xmm,imm8 ; 66 0F 72 /2 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSRLQ mm1,mm2/m64 ; 0F D3 /r [PENT,MMX]
|
||
|
\c PSRLQ mm,imm8 ; 0F 73 /2 ib [PENT,MMX]
|
||
|
|
||
|
\c PSRLQ xmm1,xmm2/m128 ; 66 0F D3 /r [WILLAMETTE,SSE2]
|
||
|
\c PSRLQ xmm,imm8 ; 66 0F 73 /2 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSRLDQ xmm1,imm8 ; 66 0F 73 /3 ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSRLx} performs logical right shifts of the data elements in the
|
||
|
destination (first) operand, moving each bit in the separate elements
|
||
|
right by the number of bits specified in the source (second) operand,
|
||
|
clearing the high-order bits as they are vacated. \c{PSRLDQ}
|
||
|
shifts bytes, not bits.
|
||
|
|
||
|
\b \c{PSRLW} shifts word sized elements.
|
||
|
|
||
|
\b \c{PSRLD} shifts doubleword sized elements.
|
||
|
|
||
|
\b \c{PSRLQ} shifts quadword sized elements.
|
||
|
|
||
|
\b \c{PSRLDQ} shifts double quadword sized elements.
|
||
|
|
||
|
|
||
|
\S{insPSUBB} \i\c{PSUBx}: Subtract Packed Integers
|
||
|
|
||
|
\c PSUBB mm1,mm2/m64 ; 0F F8 /r [PENT,MMX]
|
||
|
\c PSUBW mm1,mm2/m64 ; 0F F9 /r [PENT,MMX]
|
||
|
\c PSUBD mm1,mm2/m64 ; 0F FA /r [PENT,MMX]
|
||
|
\c PSUBQ mm1,mm2/m64 ; 0F FB /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSUBB xmm1,xmm2/m128 ; 66 0F F8 /r [WILLAMETTE,SSE2]
|
||
|
\c PSUBW xmm1,xmm2/m128 ; 66 0F F9 /r [WILLAMETTE,SSE2]
|
||
|
\c PSUBD xmm1,xmm2/m128 ; 66 0F FA /r [WILLAMETTE,SSE2]
|
||
|
\c PSUBQ xmm1,xmm2/m128 ; 66 0F FB /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSUBx} subtracts packed integers in the source operand from those
|
||
|
in the destination operand. It doesn't differentiate between signed
|
||
|
and unsigned integers, and doesn't set any of the flags.
|
||
|
|
||
|
\b \c{PSUBB} operates on byte sized elements.
|
||
|
|
||
|
\b \c{PSUBW} operates on word sized elements.
|
||
|
|
||
|
\b \c{PSUBD} operates on doubleword sized elements.
|
||
|
|
||
|
\b \c{PSUBQ} operates on quadword sized elements.
|
||
|
|
||
|
|
||
|
\S{insPSUBSB} \i\c{PSUBSxx}, \i\c{PSUBUSx}: Subtract Packed Integers With Saturation
|
||
|
|
||
|
\c PSUBSB mm1,mm2/m64 ; 0F E8 /r [PENT,MMX]
|
||
|
\c PSUBSW mm1,mm2/m64 ; 0F E9 /r [PENT,MMX]
|
||
|
|
||
|
\c PSUBSB xmm1,xmm2/m128 ; 66 0F E8 /r [WILLAMETTE,SSE2]
|
||
|
\c PSUBSW xmm1,xmm2/m128 ; 66 0F E9 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PSUBUSB mm1,mm2/m64 ; 0F D8 /r [PENT,MMX]
|
||
|
\c PSUBUSW mm1,mm2/m64 ; 0F D9 /r [PENT,MMX]
|
||
|
|
||
|
\c PSUBUSB xmm1,xmm2/m128 ; 66 0F D8 /r [WILLAMETTE,SSE2]
|
||
|
\c PSUBUSW xmm1,xmm2/m128 ; 66 0F D9 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PSUBSx} and \c{PSUBUSx} subtract packed integers in the source
|
||
|
operand from those in the destination operand, and use saturation for
|
||
|
results that are outside the range supported by the destination operand.
|
||
|
|
||
|
\b \c{PSUBSB} operates on signed bytes, and uses signed saturation on the
|
||
|
results.
|
||
|
|
||
|
\b \c{PSUBSW} operates on signed words, and uses signed saturation on the
|
||
|
results.
|
||
|
|
||
|
\b \c{PSUBUSB} operates on unsigned bytes, and uses unsigned saturation on
|
||
|
the results.
|
||
|
|
||
|
\b \c{PSUBUSW} operates on unsigned words, and uses unsigned saturation on
|
||
|
the results.
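
Saturation means that a result outside the representable range is clamped to the nearest limit instead of wrapping round. A minimal Python sketch of this, with the hypothetical helper \c{psub_sat} standing in for all four instructions, might look like this:

```python
def psub_sat(dst, src, elem_bits, signed, total_bits=64):
    """Model PSUBSB/PSUBSW (signed=True) and PSUBUSB/PSUBUSW (signed=False)."""
    mask = (1 << elem_bits) - 1
    if signed:
        lo, hi = -(1 << (elem_bits - 1)), (1 << (elem_bits - 1)) - 1
    else:
        lo, hi = 0, mask
    result = 0
    for lane in range(total_bits // elem_bits):
        a = (dst >> (lane * elem_bits)) & mask
        b = (src >> (lane * elem_bits)) & mask
        if signed:
            # Reinterpret raw lane values above the signed maximum as negative.
            a = a - (1 << elem_bits) if a > hi else a
            b = b - (1 << elem_bits) if b > hi else b
        diff = max(lo, min(hi, a - b))  # clamp instead of wrapping
        result |= (diff & mask) << (lane * elem_bits)
    return result
```

With signed bytes, \c{0x80 - 0x01} stays at \c{0x80} (the most negative value) rather than wrapping to \c{0x7F}; with unsigned bytes, \c{0x10 - 0x20} yields zero.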
|
||
|
|
||
|
|
||
|
\S{insPSUBSIW} \i\c{PSUBSIW}: MMX Packed Subtract with Saturation to
|
||
|
Implied Destination
|
||
|
|
||
|
\c PSUBSIW mm1,mm2/m64 ; 0F 55 /r [CYRIX,MMX]
|
||
|
|
||
|
\c{PSUBSIW}, specific to the Cyrix extensions to the MMX instruction
|
||
|
set, performs the same function as \c{PSUBSW}, except that the
|
||
|
result is not placed in the register specified by the first operand,
|
||
|
but instead in the implied destination register, specified as for
|
||
|
\c{PADDSIW} (\k{insPADDSIW}).
|
||
|
|
||
|
|
||
|
\S{insPSWAPD} \i\c{PSWAPD}: Swap Packed Data
|
||
|
\I\c{PSWAPW}
|
||
|
|
||
|
\c PSWAPD mm1,mm2/m64 ; 0F 0F /r BB [PENT,3DNOW]
|
||
|
|
||
|
\c{PSWAPD} swaps the packed doublewords in the source operand, and
|
||
|
stores the result in the destination operand.
|
||
|
|
||
|
In the \c{K6-2} and \c{K6-III} processors, this opcode uses the
|
||
|
mnemonic \c{PSWAPW}, and it swaps the order of words when copying
|
||
|
from the source to the destination.
|
||
|
|
||
|
The operation in the \c{K6-2} and \c{K6-III} processors is:
|
||
|
|
||
|
\c dst[0-15] = src[48-63];
|
||
|
\c dst[16-31] = src[32-47];
|
||
|
\c dst[32-47] = src[16-31];
|
||
|
\c dst[48-63] = src[0-15].
|
||
|
|
||
|
The operation in the \c{K6-x+}, \c{ATHLON} and later processors is:
|
||
|
|
||
|
\c dst[0-31] = src[32-63];
|
||
|
\c dst[32-63] = src[0-31].
|
||
|
|
||
|
|
||
|
\S{insPUNPCKHBW} \i\c{PUNPCKxxx}: Unpack and Interleave Data
|
||
|
|
||
|
\c PUNPCKHBW mm1,mm2/m64 ; 0F 68 /r [PENT,MMX]
|
||
|
\c PUNPCKHWD mm1,mm2/m64 ; 0F 69 /r [PENT,MMX]
|
||
|
\c PUNPCKHDQ mm1,mm2/m64 ; 0F 6A /r [PENT,MMX]
|
||
|
|
||
|
\c PUNPCKHBW xmm1,xmm2/m128 ; 66 0F 68 /r [WILLAMETTE,SSE2]
|
||
|
\c PUNPCKHWD xmm1,xmm2/m128 ; 66 0F 69 /r [WILLAMETTE,SSE2]
|
||
|
\c PUNPCKHDQ xmm1,xmm2/m128 ; 66 0F 6A /r [WILLAMETTE,SSE2]
|
||
|
\c PUNPCKHQDQ xmm1,xmm2/m128 ; 66 0F 6D /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c PUNPCKLBW mm1,mm2/m32 ; 0F 60 /r [PENT,MMX]
|
||
|
\c PUNPCKLWD mm1,mm2/m32 ; 0F 61 /r [PENT,MMX]
|
||
|
\c PUNPCKLDQ mm1,mm2/m32 ; 0F 62 /r [PENT,MMX]
|
||
|
|
||
|
\c PUNPCKLBW xmm1,xmm2/m128 ; 66 0F 60 /r [WILLAMETTE,SSE2]
|
||
|
\c PUNPCKLWD xmm1,xmm2/m128 ; 66 0F 61 /r [WILLAMETTE,SSE2]
|
||
|
\c PUNPCKLDQ xmm1,xmm2/m128 ; 66 0F 62 /r [WILLAMETTE,SSE2]
|
||
|
\c PUNPCKLQDQ xmm1,xmm2/m128 ; 66 0F 6C /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PUNPCKxx} all treat their operands as vectors, and produce a new
|
||
|
vector generated by interleaving elements from the two inputs. The
|
||
|
\c{PUNPCKHxx} instructions start by throwing away the bottom half of
|
||
|
each input operand, and the \c{PUNPCKLxx} instructions throw away
|
||
|
the top half.
|
||
|
|
||
|
The remaining elements are then interleaved into the destination,
|
||
|
alternating elements from the second (source) operand and the first
|
||
|
(destination) operand: so the leftmost part of each element in the
|
||
|
result always comes from the second operand, and the rightmost from
|
||
|
the destination.
|
||
|
|
||
|
\b \c{PUNPCKxBW} works a byte at a time, producing word sized output
|
||
|
elements.
|
||
|
|
||
|
\b \c{PUNPCKxWD} works a word at a time, producing doubleword sized
|
||
|
output elements.
|
||
|
|
||
|
\b \c{PUNPCKxDQ} works a doubleword at a time, producing quadword sized
|
||
|
output elements.
|
||
|
|
||
|
\b \c{PUNPCKxQDQ} works a quadword at a time, producing double quadword
|
||
|
sized output elements.
|
||
|
|
||
|
So, for example, for \c{MMX} operands, if the first operand held
|
||
|
\c{0x7A6A5A4A3A2A1A0A} and the second held \c{0x7B6B5B4B3B2B1B0B},
|
||
|
then:
|
||
|
|
||
|
\b \c{PUNPCKHBW} would return \c{0x7B7A6B6A5B5A4B4A}.
|
||
|
|
||
|
\b \c{PUNPCKHWD} would return \c{0x7B6B7A6A5B4B5A4A}.
|
||
|
|
||
|
\b \c{PUNPCKHDQ} would return \c{0x7B6B5B4B7A6A5A4A}.
|
||
|
|
||
|
\b \c{PUNPCKLBW} would return \c{0x3B3A2B2A1B1A0B0A}.
|
||
|
|
||
|
\b \c{PUNPCKLWD} would return \c{0x3B2B3A2A1B0B1A0A}.
|
||
|
|
||
|
\b \c{PUNPCKLDQ} would return \c{0x3B2B1B0B3A2A1A0A}.
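
The interleaving rule and the six results listed above can be reproduced with a short Python model. The function \c{punpck} and its parameters are hypothetical names used only for this sketch; \c{high} selects between the \c{PUNPCKHxx} and \c{PUNPCKLxx} behaviour.

```python
def punpck(dst, src, elem_bits, high, total_bits=64):
    """Model PUNPCKH*/PUNPCKL*: interleave half of dst with half of src."""
    mask = (1 << elem_bits) - 1
    lanes = total_bits // elem_bits
    start = lanes // 2 if high else 0  # which half of each input is kept
    result = 0
    for i in range(lanes // 2):
        d = (dst >> ((start + i) * elem_bits)) & mask
        s = (src >> ((start + i) * elem_bits)) & mask
        # The destination element lands in the lower slot of each pair,
        # the source element in the upper slot.
        result |= d << ((2 * i) * elem_bits)
        result |= s << ((2 * i + 1) * elem_bits)
    return result
```

Calling \c{punpck(0x7A6A5A4A3A2A1A0A, 0x7B6B5B4B3B2B1B0B, 8, True)} models \c{PUNPCKHBW} on the operands of the example above.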
|
||
|
|
||
|
|
||
|
\S{insPUSH} \i\c{PUSH}: Push Data on Stack
|
||
|
|
||
|
\c PUSH reg16 ; o16 50+r [8086]
|
||
|
\c PUSH reg32 ; o32 50+r [386]
|
||
|
|
||
|
\c PUSH r/m16 ; o16 FF /6 [8086]
|
||
|
\c PUSH r/m32 ; o32 FF /6 [386]
|
||
|
|
||
|
\c PUSH CS ; 0E [8086]
|
||
|
\c PUSH DS ; 1E [8086]
|
||
|
\c PUSH ES ; 06 [8086]
|
||
|
\c PUSH SS ; 16 [8086]
|
||
|
\c PUSH FS ; 0F A0 [386]
|
||
|
\c PUSH GS ; 0F A8 [386]
|
||
|
|
||
|
\c PUSH imm8 ; 6A ib [186]
|
||
|
\c PUSH imm16 ; o16 68 iw [186]
|
||
|
\c PUSH imm32 ; o32 68 id [386]
|
||
|
|
||
|
\c{PUSH} decrements the stack pointer (\c{SP} or \c{ESP}) by 2 or 4,
|
||
|
and then stores the given value at \c{[SS:SP]} or \c{[SS:ESP]}.
|
||
|
|
||
|
The address-size attribute of the instruction determines whether
|
||
|
\c{SP} or \c{ESP} is used as the stack pointer: to deliberately
|
||
|
override the default given by the \c{BITS} setting, you can use an
|
||
|
\i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
The operand-size attribute of the instruction determines whether the
|
||
|
stack pointer is decremented by 2 or 4: this means that segment
|
||
|
register pushes in \c{BITS 32} mode will push 4 bytes on the stack,
|
||
|
of which the upper two are undefined. If you need to override that,
|
||
|
you can use an \i\c{o16} or \i\c{o32} prefix.
|
||
|
|
||
|
The above opcode listings give two forms for general-purpose
|
||
|
\i{register push} instructions: for example, \c{PUSH BX} has the two
|
||
|
forms \c{53} and \c{FF F3}. NASM will always generate the shorter
|
||
|
form when given \c{PUSH BX}. NDISASM will disassemble both.
|
||
|
|
||
|
Unlike the undocumented and barely supported \c{POP CS}, \c{PUSH CS}
|
||
|
is a perfectly valid and sensible instruction, supported on all
|
||
|
processors.
|
||
|
|
||
|
The instruction \c{PUSH SP} may be used to distinguish an 8086 from
|
||
|
later processors: on an 8086, the value of \c{SP} stored is the
|
||
|
value it has \e{after} the push instruction, whereas on later
|
||
|
processors it is the value \e{before} the push instruction.
|
||
|
|
||
|
|
||
|
\S{insPUSHA} \i\c{PUSHAx}: Push All General-Purpose Registers
|
||
|
|
||
|
\c PUSHA ; 60 [186]
|
||
|
\c PUSHAD ; o32 60 [386]
|
||
|
\c PUSHAW ; o16 60 [186]
|
||
|
|
||
|
\c{PUSHAW} pushes, in succession, \c{AX}, \c{CX}, \c{DX}, \c{BX},
|
||
|
\c{SP}, \c{BP}, \c{SI} and \c{DI} on the stack, decrementing the
|
||
|
stack pointer by a total of 16.
|
||
|
|
||
|
\c{PUSHAD} pushes, in succession, \c{EAX}, \c{ECX}, \c{EDX},
|
||
|
\c{EBX}, \c{ESP}, \c{EBP}, \c{ESI} and \c{EDI} on the stack,
|
||
|
decrementing the stack pointer by a total of 32.
|
||
|
|
||
|
In both cases, the value of \c{SP} or \c{ESP} pushed is its
|
||
|
\e{original} value, as it had before the instruction was executed.
|
||
|
|
||
|
\c{PUSHA} is an alias mnemonic for either \c{PUSHAW} or \c{PUSHAD},
|
||
|
depending on the current \c{BITS} setting.
|
||
|
|
||
|
Note that the registers are pushed in order of their numeric values
|
||
|
in opcodes (see \k{iref-rv}).
|
||
|
|
||
|
See also \c{POPA} (\k{insPOPA}).
|
||
|
|
||
|
|
||
|
\S{insPUSHF} \i\c{PUSHFx}: Push Flags Register
|
||
|
|
||
|
\c PUSHF ; 9C [8086]
|
||
|
\c PUSHFD ; o32 9C [386]
|
||
|
\c PUSHFW ; o16 9C [8086]
|
||
|
|
||
|
\b \c{PUSHFW} pushes the bottom 16 bits of the flags register
|
||
|
(or the whole flags register, on processors below a 386) onto
|
||
|
the stack.
|
||
|
|
||
|
\b \c{PUSHFD} pushes the entire flags register onto the stack.
|
||
|
|
||
|
\c{PUSHF} is an alias mnemonic for either \c{PUSHFW} or \c{PUSHFD},
|
||
|
depending on the current \c{BITS} setting.
|
||
|
|
||
|
See also \c{POPF} (\k{insPOPF}).
|
||
|
|
||
|
|
||
|
\S{insPXOR} \i\c{PXOR}: MMX Bitwise XOR
|
||
|
|
||
|
\c PXOR mm1,mm2/m64 ; 0F EF /r [PENT,MMX]
|
||
|
\c PXOR xmm1,xmm2/m128 ; 66 0F EF /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{PXOR} performs a bitwise XOR operation between its two operands
|
||
|
(i.e. each bit of the result is 1 if and only if exactly one of the
|
||
|
corresponding bits of the two inputs was 1), and stores the result
|
||
|
in the destination (first) operand.
|
||
|
|
||
|
|
||
|
\S{insRCL} \i\c{RCL}, \i\c{RCR}: Bitwise Rotate through Carry Bit
|
||
|
|
||
|
\c RCL r/m8,1 ; D0 /2 [8086]
|
||
|
\c RCL r/m8,CL ; D2 /2 [8086]
|
||
|
\c RCL r/m8,imm8 ; C0 /2 ib [186]
|
||
|
\c RCL r/m16,1 ; o16 D1 /2 [8086]
|
||
|
\c RCL r/m16,CL ; o16 D3 /2 [8086]
|
||
|
\c RCL r/m16,imm8 ; o16 C1 /2 ib [186]
|
||
|
\c RCL r/m32,1 ; o32 D1 /2 [386]
|
||
|
\c RCL r/m32,CL ; o32 D3 /2 [386]
|
||
|
\c RCL r/m32,imm8 ; o32 C1 /2 ib [386]
|
||
|
|
||
|
\c RCR r/m8,1 ; D0 /3 [8086]
|
||
|
\c RCR r/m8,CL ; D2 /3 [8086]
|
||
|
\c RCR r/m8,imm8 ; C0 /3 ib [186]
|
||
|
\c RCR r/m16,1 ; o16 D1 /3 [8086]
|
||
|
\c RCR r/m16,CL ; o16 D3 /3 [8086]
|
||
|
\c RCR r/m16,imm8 ; o16 C1 /3 ib [186]
|
||
|
\c RCR r/m32,1 ; o32 D1 /3 [386]
|
||
|
\c RCR r/m32,CL ; o32 D3 /3 [386]
|
||
|
\c RCR r/m32,imm8 ; o32 C1 /3 ib [386]
|
||
|
|
||
|
\c{RCL} and \c{RCR} perform a 9-bit, 17-bit or 33-bit bitwise
|
||
|
rotation operation, involving the given source/destination (first)
|
||
|
operand and the carry bit. Thus, for example, in the operation
|
||
|
\c{RCL AL,1}, a 9-bit rotation is performed in which \c{AL} is
|
||
|
shifted left by 1, the top bit of \c{AL} moves into the carry flag,
|
||
|
and the original value of the carry flag is placed in the low bit of
|
||
|
\c{AL}.
|
||
|
|
||
|
The number of bits to rotate by is given by the second operand. Only
|
||
|
the bottom five bits of the rotation count are considered by
|
||
|
processors above the 8086.
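
The rotation through carry can be pictured as an ordinary rotation of a value one bit wider than the operand, with the carry flag as the extra top bit. The Python sketch below models \c{RCL} that way; the name \c{rcl} and its parameters are hypothetical, and the reduction of the masked count modulo the rotation width follows Intel's documented behaviour for 8-bit and 16-bit operands.

```python
def rcl(value, carry, count, bits=8):
    """Model RCL: rotate value left through the carry flag.

    Returns a (new_value, new_carry) pair.
    """
    width = bits + 1                       # operand plus the carry bit
    combined = (carry << bits) | value     # carry sits just above the top bit
    count = (count & 31) % width           # only the bottom five bits are used
    full_mask = (1 << width) - 1
    combined = ((combined << count) | (combined >> (width - count))) & full_mask
    return combined & ((1 << bits) - 1), combined >> bits
```

\c{RCR} is the mirror image: the same widened value is rotated to the right instead.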
|
||
|
|
||
|
You can force the longer (186 and upwards, beginning with a \c{C1}
|
||
|
byte) form of \c{RCL foo,1} by using a \c{BYTE} prefix: \c{RCL
|
||
|
foo,BYTE 1}. Similarly with \c{RCR}.
|
||
|
|
||
|
|
||
|
\S{insRCPPS} \i\c{RCPPS}: Packed Single-Precision FP Reciprocal
|
||
|
|
||
|
\c RCPPS xmm1,xmm2/m128 ; 0F 53 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{RCPPS} returns an approximation of the reciprocal of the packed
|
||
|
single-precision FP values from xmm2/m128. The maximum error for this
|
||
|
approximation is: |Error| <= 1.5 x 2^-12
|
||
|
|
||
|
|
||
|
\S{insRCPSS} \i\c{RCPSS}: Scalar Single-Precision FP Reciprocal
|
||
|
|
||
|
\c RCPSS xmm1,xmm2/m32 ; F3 0F 53 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{RCPSS} returns an approximation of the reciprocal of the lower
|
||
|
single-precision FP value from xmm2/m32; the upper three fields are
|
||
|
passed through from xmm1. The maximum error for this approximation is:
|
||
|
|Error| <= 1.5 x 2^-12
|
||
|
|
||
|
|
||
|
\S{insRDMSR} \i\c{RDMSR}: Read Model-Specific Registers
|
||
|
|
||
|
\c RDMSR ; 0F 32 [PENT,PRIV]
|
||
|
|
||
|
\c{RDMSR} reads the processor Model-Specific Register (MSR) whose
|
||
|
index is stored in \c{ECX}, and stores the result in \c{EDX:EAX}.
|
||
|
See also \c{WRMSR} (\k{insWRMSR}).
|
||
|
|
||
|
|
||
|
\S{insRDPMC} \i\c{RDPMC}: Read Performance-Monitoring Counters
|
||
|
|
||
|
\c RDPMC ; 0F 33 [P6]
|
||
|
|
||
|
\c{RDPMC} reads the processor performance-monitoring counter whose
|
||
|
index is stored in \c{ECX}, and stores the result in \c{EDX:EAX}.
|
||
|
|
||
|
This instruction is available on P6 and later processors and on MMX
|
||
|
class processors.
|
||
|
|
||
|
|
||
|
\S{insRDSHR} \i\c{RDSHR}: Read SMM Header Pointer Register
|
||
|
|
||
|
\c RDSHR r/m32 ; 0F 36 /0 [386,CYRIX,SMM]
|
||
|
|
||
|
\c{RDSHR} reads the contents of the SMM header pointer register and
|
||
|
saves it to the destination operand, which can be either a 32 bit
|
||
|
memory location or a 32 bit register.
|
||
|
|
||
|
See also \c{WRSHR} (\k{insWRSHR}).
|
||
|
|
||
|
|
||
|
\S{insRDTSC} \i\c{RDTSC}: Read Time-Stamp Counter
|
||
|
|
||
|
\c RDTSC ; 0F 31 [PENT]
|
||
|
|
||
|
\c{RDTSC} reads the processor's time-stamp counter into \c{EDX:EAX}.
|
||
|
|
||
|
|
||
|
\S{insRET} \i\c{RET}, \i\c{RETF}, \i\c{RETN}: Return from Procedure Call
|
||
|
|
||
|
\c RET ; C3 [8086]
|
||
|
\c RET imm16 ; C2 iw [8086]
|
||
|
|
||
|
\c RETF ; CB [8086]
|
||
|
\c RETF imm16 ; CA iw [8086]
|
||
|
|
||
|
\c RETN ; C3 [8086]
|
||
|
\c RETN imm16 ; C2 iw [8086]
|
||
|
|
||
|
\b \c{RET}, and its exact synonym \c{RETN}, pop \c{IP} or \c{EIP} from
|
||
|
the stack and transfer control to the new address. Optionally, if a
|
||
|
numeric operand is provided, they increment the stack pointer
|
||
|
by a further \c{imm16} bytes after popping the return address.
|
||
|
|
||
|
\b \c{RETF} executes a far return: after popping \c{IP}/\c{EIP}, it
|
||
|
then pops \c{CS}, and \e{then} increments the stack pointer by the
|
||
|
optional argument if present.
|
||
|
|
||
|
|
||
|
\S{insROL} \i\c{ROL}, \i\c{ROR}: Bitwise Rotate
|
||
|
|
||
|
\c ROL r/m8,1 ; D0 /0 [8086]
|
||
|
\c ROL r/m8,CL ; D2 /0 [8086]
|
||
|
\c ROL r/m8,imm8 ; C0 /0 ib [186]
|
||
|
\c ROL r/m16,1 ; o16 D1 /0 [8086]
|
||
|
\c ROL r/m16,CL ; o16 D3 /0 [8086]
|
||
|
\c ROL r/m16,imm8 ; o16 C1 /0 ib [186]
|
||
|
\c ROL r/m32,1 ; o32 D1 /0 [386]
|
||
|
\c ROL r/m32,CL ; o32 D3 /0 [386]
|
||
|
\c ROL r/m32,imm8 ; o32 C1 /0 ib [386]
|
||
|
|
||
|
\c ROR r/m8,1 ; D0 /1 [8086]
|
||
|
\c ROR r/m8,CL ; D2 /1 [8086]
|
||
|
\c ROR r/m8,imm8 ; C0 /1 ib [186]
|
||
|
\c ROR r/m16,1 ; o16 D1 /1 [8086]
|
||
|
\c ROR r/m16,CL ; o16 D3 /1 [8086]
|
||
|
\c ROR r/m16,imm8 ; o16 C1 /1 ib [186]
|
||
|
\c ROR r/m32,1 ; o32 D1 /1 [386]
|
||
|
\c ROR r/m32,CL ; o32 D3 /1 [386]
|
||
|
\c ROR r/m32,imm8 ; o32 C1 /1 ib [386]
|
||
|
|
||
|
\c{ROL} and \c{ROR} perform a bitwise rotation operation on the given
|
||
|
source/destination (first) operand. Thus, for example, in the
|
||
|
operation \c{ROL AL,1}, an 8-bit rotation is performed in which
|
||
|
\c{AL} is shifted left by 1 and the original top bit of \c{AL} moves
|
||
|
round into the low bit.
|
||
|
|
||
|
The number of bits to rotate by is given by the second operand. Only
|
||
|
the bottom five bits of the rotation count are considered by processors
|
||
|
above the 8086.
|
||
|
|
||
|
You can force the longer (186 and upwards, beginning with a \c{C1}
|
||
|
byte) form of \c{ROL foo,1} by using a \c{BYTE} prefix: \c{ROL
|
||
|
foo,BYTE 1}. Similarly with \c{ROR}.
|
||
|
|
||
|
|
||
|
\S{insRSDC} \i\c{RSDC}: Restore Segment Register and Descriptor
|
||
|
|
||
|
\c RSDC segreg,m80 ; 0F 79 /r [486,CYRIX,SMM]
|
||
|
|
||
|
\c{RSDC} restores a segment register (DS, ES, FS, GS, or SS) from mem80,
|
||
|
and sets up its descriptor.
|
||
|
|
||
|
|
||
|
\S{insRSLDT} \i\c{RSLDT}: Restore LDTR and Descriptor
|
||
|
|
||
|
\c RSLDT m80 ; 0F 7B /0 [486,CYRIX,SMM]
|
||
|
|
||
|
\c{RSLDT} restores the Local Descriptor Table Register (LDTR) from mem80.
|
||
|
|
||
|
|
||
|
\S{insRSM} \i\c{RSM}: Resume from System-Management Mode
|
||
|
|
||
|
\c RSM ; 0F AA [PENT]
|
||
|
|
||
|
\c{RSM} returns the processor to its normal operating mode when it
|
||
|
was in System-Management Mode.
|
||
|
|
||
|
|
||
|
\S{insRSQRTPS} \i\c{RSQRTPS}: Packed Single-Precision FP Square Root Reciprocal
|
||
|
|
||
|
\c RSQRTPS xmm1,xmm2/m128 ; 0F 52 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{RSQRTPS} computes the approximate reciprocals of the square
|
||
|
roots of the packed single-precision floating-point values in the
|
||
|
source and stores the results in xmm1. The maximum error for this
|
||
|
approximation is: |Error| <= 1.5 x 2^-12
|
||
|
|
||
|
|
||
|
\S{insRSQRTSS} \i\c{RSQRTSS}: Scalar Single-Precision FP Square Root Reciprocal
|
||
|
|
||
|
\c RSQRTSS xmm1,xmm2/m32 ; F3 0F 52 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{RSQRTSS} returns an approximation of the reciprocal of the
|
||
|
square root of the lowest order single-precision FP value from
|
||
|
the source, and stores it in the low doubleword of the destination
|
||
|
register. The upper three fields of xmm1 are preserved. The maximum
|
||
|
error for this approximation is: |Error| <= 1.5 x 2^-12
|
||
|
|
||
|
|
||
|
\S{insRSTS} \i\c{RSTS}: Restore TSR and Descriptor
|
||
|
|
||
|
\c RSTS m80 ; 0F 7D /0 [486,CYRIX,SMM]
|
||
|
|
||
|
\c{RSTS} restores the Task State Register (TSR) from mem80.
|
||
|
|
||
|
|
||
|
\S{insSAHF} \i\c{SAHF}: Store AH to Flags
|
||
|
|
||
|
\c SAHF ; 9E [8086]
|
||
|
|
||
|
\c{SAHF} sets the low byte of the flags word according to the
|
||
|
contents of the \c{AH} register.
|
||
|
|
||
|
The operation of \c{SAHF} is:
|
||
|
|
||
|
\c AH --> SF:ZF:0:AF:0:PF:1:CF
|
||
|
|
||
|
See also \c{LAHF} (\k{insLAHF}).
|
||
|
|
||
|
|
||
|
\S{insSAL} \i\c{SAL}, \i\c{SAR}: Bitwise Arithmetic Shifts
|
||
|
|
||
|
\c SAL r/m8,1 ; D0 /4 [8086]
|
||
|
\c SAL r/m8,CL ; D2 /4 [8086]
|
||
|
\c SAL r/m8,imm8 ; C0 /4 ib [186]
|
||
|
\c SAL r/m16,1 ; o16 D1 /4 [8086]
|
||
|
\c SAL r/m16,CL ; o16 D3 /4 [8086]
|
||
|
\c SAL r/m16,imm8 ; o16 C1 /4 ib [186]
|
||
|
\c SAL r/m32,1 ; o32 D1 /4 [386]
|
||
|
\c SAL r/m32,CL ; o32 D3 /4 [386]
|
||
|
\c SAL r/m32,imm8 ; o32 C1 /4 ib [386]
|
||
|
|
||
|
\c SAR r/m8,1 ; D0 /7 [8086]
|
||
|
\c SAR r/m8,CL ; D2 /7 [8086]
|
||
|
\c SAR r/m8,imm8 ; C0 /7 ib [186]
|
||
|
\c SAR r/m16,1 ; o16 D1 /7 [8086]
|
||
|
\c SAR r/m16,CL ; o16 D3 /7 [8086]
|
||
|
\c SAR r/m16,imm8 ; o16 C1 /7 ib [186]
|
||
|
\c SAR r/m32,1 ; o32 D1 /7 [386]
|
||
|
\c SAR r/m32,CL ; o32 D3 /7 [386]
|
||
|
\c SAR r/m32,imm8 ; o32 C1 /7 ib [386]
|
||
|
|
||
|
\c{SAL} and \c{SAR} perform an arithmetic shift operation on the given
|
||
|
source/destination (first) operand. The vacated bits are filled with
|
||
|
zero for \c{SAL}, and with copies of the original high bit of the
|
||
|
source operand for \c{SAR}.
|
||
|
|
||
|
\c{SAL} is a synonym for \c{SHL} (see \k{insSHL}). NASM will
|
||
|
assemble either one to the same code, but NDISASM will always
|
||
|
disassemble that code as \c{SHL}.
|
||
|
|
||
|
The number of bits to shift by is given by the second operand. Only
|
||
|
the bottom five bits of the shift count are considered by processors
|
||
|
above the 8086.
|
||
|
|
||
|
You can force the longer (186 and upwards, beginning with a \c{C1}
|
||
|
byte) form of \c{SAL foo,1} by using a \c{BYTE} prefix: \c{SAL
|
||
|
foo,BYTE 1}. Similarly with \c{SAR}.
|
||
|
|
||
|
|
||
|
\S{insSALC} \i\c{SALC}: Set AL from Carry Flag
|
||
|
|
||
|
\c SALC ; D6 [8086,UNDOC]
|
||
|
|
||
|
\c{SALC} is an early undocumented instruction similar in concept to
|
||
|
\c{SETcc} (\k{insSETcc}). Its function is to set \c{AL} to zero if
|
||
|
the carry flag is clear, or to \c{0xFF} if it is set.
|
||
|
|
||
|
|
||
|
\S{insSBB} \i\c{SBB}: Subtract with Borrow
|
||
|
|
||
|
\c SBB r/m8,reg8 ; 18 /r [8086]
|
||
|
\c SBB r/m16,reg16 ; o16 19 /r [8086]
|
||
|
\c SBB r/m32,reg32 ; o32 19 /r [386]
|
||
|
|
||
|
\c SBB reg8,r/m8 ; 1A /r [8086]
|
||
|
\c SBB reg16,r/m16 ; o16 1B /r [8086]
|
||
|
\c SBB reg32,r/m32 ; o32 1B /r [386]
|
||
|
|
||
|
\c SBB r/m8,imm8 ; 80 /3 ib [8086]
|
||
|
\c SBB r/m16,imm16 ; o16 81 /3 iw [8086]
|
||
|
\c SBB r/m32,imm32 ; o32 81 /3 id [386]
|
||
|
|
||
|
\c SBB r/m16,imm8 ; o16 83 /3 ib [8086]
|
||
|
\c SBB r/m32,imm8 ; o32 83 /3 ib [386]
|
||
|
|
||
|
\c SBB AL,imm8 ; 1C ib [8086]
|
||
|
\c SBB AX,imm16 ; o16 1D iw [8086]
|
||
|
\c SBB EAX,imm32 ; o32 1D id [386]
|
||
|
|
||
|
\c{SBB} performs integer subtraction: it subtracts its second
|
||
|
operand, plus the value of the carry flag, from its first, and
|
||
|
leaves the result in its destination (first) operand. The flags are
|
||
|
set according to the result of the operation: in particular, the
|
||
|
carry flag is affected and can be used by a subsequent \c{SBB}
|
||
|
instruction.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
|
||
|
|
||
|
To subtract one number from another without also subtracting the
|
||
|
contents of the carry flag, use \c{SUB} (\k{insSUB}).
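
The usual reason to use \c{SBB} is multi-word subtraction: the low words are subtracted first, and the borrow left in the carry flag is consumed when subtracting the high words. The Python sketch below models this for a 64-bit subtraction built from two 32-bit steps; \c{sbb} and \c{sub64} are hypothetical helper names.

```python
def sbb(a, b, carry, bits=32):
    """Model SBB: compute a - (b + carry); returns (result, carry_out)."""
    mask = (1 << bits) - 1
    diff = a - b - carry
    # The carry flag is set when a borrow out of the top bit was needed.
    return diff & mask, 1 if diff < 0 else 0


def sub64(a, b):
    """Subtract two 64-bit values using two 32-bit steps."""
    # With carry-in 0 the first step behaves like a plain SUB.
    lo, borrow = sbb(a & 0xFFFFFFFF, b & 0xFFFFFFFF, 0)
    # The second step consumes the borrow, as SBB would.
    hi, borrow = sbb(a >> 32, b >> 32, borrow)
    return (hi << 32) | lo, borrow
```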
|
||
|
|
||
|
|
||
|
\S{insSCASB} \i\c{SCASB}, \i\c{SCASW}, \i\c{SCASD}: Scan String
|
||
|
|
||
|
\c SCASB ; AE [8086]
|
||
|
\c SCASW ; o16 AF [8086]
|
||
|
\c SCASD ; o32 AF [386]
|
||
|
|
||
|
\c{SCASB} compares the byte in \c{AL} with the byte at \c{[ES:DI]}
|
||
|
or \c{[ES:EDI]}, and sets the flags accordingly. It then increments
|
||
|
or decrements (depending on the direction flag: increments if the
|
||
|
flag is clear, decrements if it is set) \c{DI} (or \c{EDI}).
|
||
|
|
||
|
The register used is \c{DI} if the address size is 16 bits, and
|
||
|
\c{EDI} if it is 32 bits. If you need to use an address size not
|
||
|
equal to the current \c{BITS} setting, you can use an explicit
|
||
|
\i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
Segment override prefixes have no effect for this instruction: the
|
||
|
use of \c{ES} for the load from \c{[DI]} or \c{[EDI]} cannot be
|
||
|
overridden.
|
||
|
|
||
|
\c{SCASW} and \c{SCASD} work in the same way, but they compare a
|
||
|
word to \c{AX} or a doubleword to \c{EAX} instead of a byte to
|
||
|
\c{AL}, and increment or decrement the addressing registers by 2 or
|
||
|
4 instead of 1.
|
||
|
|
||
|
The \c{REPE} and \c{REPNE} prefixes (equivalently, \c{REPZ} and
|
||
|
\c{REPNZ}) may be used to repeat the instruction up to \c{CX} (or
|
||
|
\c{ECX} - again, the address size chooses which) times until the
|
||
|
first unequal or equal byte is found.
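
A common use of \c{REPNE SCASB} is finding a terminating byte in a string. The Python sketch below models that loop for a clear direction flag; the function \c{repne_scasb} is a hypothetical name, \c{memory} stands in for the \c{ES} segment, and the model simplifies the real instruction by reporting \c{ZF} as 0 when the count is initially zero (the processor would leave the flags unchanged).

```python
def repne_scasb(memory, di, al, cx):
    """Model REPNE SCASB with the direction flag clear.

    Returns (di, cx, zf) as they would stand after the instruction.
    """
    zf = 0
    while cx != 0:
        zf = 1 if memory[di] == al else 0  # the compare sets the flags
        di += 1                            # DI advances even on a match
        cx -= 1
        if zf:                             # REPNE stops once ZF is set
            break
    return di, cx, zf
```

Scanning \c{b"hello\\x00"} for a zero byte with an initial count of 100 consumes six bytes, so the string length can be recovered as the initial count, minus the final count, minus one.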
|
||
|
|
||
|
|
||
|
\S{insSETcc} \i\c{SETcc}: Set Register from Condition
|
||
|
|
||
|
\c SETcc r/m8 ; 0F 90+cc /2 [386]
|
||
|
|
||
|
\c{SETcc} sets the given 8-bit operand to zero if its condition is
|
||
|
not satisfied, and to 1 if it is.
|
||
|
|
||
|
|
||
|
\S{insSFENCE} \i\c{SFENCE}: Store Fence
|
||
|
|
||
|
\c SFENCE ; 0F AE /7 [KATMAI]
|
||
|
|
||
|
\c{SFENCE} performs a serialising operation on all writes to memory
|
||
|
that were issued before the \c{SFENCE} instruction. This guarantees that
|
||
|
all memory writes before the \c{SFENCE} instruction are visible before any
|
||
|
writes after the \c{SFENCE} instruction.
|
||
|
|
||
|
\c{SFENCE} is ordered with respect to other \c{SFENCE} instructions, \c{MFENCE},
|
||
|
any memory write and any other serialising instruction (such as \c{CPUID}).
|
||
|
|
||
|
Weakly ordered memory types can be used to achieve higher processor
|
||
|
performance through such techniques as out-of-order issue,
|
||
|
write-combining, and write-collapsing. The degree to which a consumer
|
||
|
of data recognizes or knows that the data is weakly ordered varies
|
||
|
among applications and may be unknown to the producer of this data.
|
||
|
The \c{SFENCE} instruction provides a performance-efficient way of
|
||
|
ensuring store ordering between routines that produce weakly-ordered
|
||
|
results and routines that consume this data.
|
||
|
|
||
|
\c{SFENCE} uses the following ModRM encoding:
|
||
|
|
||
|
\c Mod (7:6) = 11B
|
||
|
\c Reg/Opcode (5:3) = 111B
|
||
|
\c R/M (2:0) = 000B
|
||
|
|
||
|
All other ModRM encodings are defined to be reserved, and use
|
||
|
of these encodings risks incompatibility with future processors.
|
||
|
|
||
|
See also \c{LFENCE} (\k{insLFENCE}) and \c{MFENCE} (\k{insMFENCE}).
|
||
|
|
||
|
|
||
|
\S{insSGDT} \i\c{SGDT}, \i\c{SIDT}, \i\c{SLDT}: Store Descriptor Table Pointers
|
||
|
|
||
|
\c SGDT mem ; 0F 01 /0 [286,PRIV]
|
||
|
\c SIDT mem ; 0F 01 /1 [286,PRIV]
|
||
|
\c SLDT r/m16 ; 0F 00 /0 [286,PRIV]
|
||
|
|
||
|
\c{SGDT} and \c{SIDT} both take a 6-byte memory area as an operand:
they store the contents of the GDTR (global descriptor table
register) or IDTR (interrupt descriptor table register) into that
area as a 16-bit size limit followed by a 32-bit linear address.
These are the only instructions which directly
use \e{linear} addresses, rather than segment/offset pairs.
|
||
|
|
||
|
\c{SLDT} stores the segment selector corresponding to the LDT (local
|
||
|
descriptor table) into the given operand.
|
||
|
|
||
|
See also \c{LGDT}, \c{LIDT} and \c{LLDT} (\k{insLGDT}).
|
||
|
|
||
|
|
||
|
\S{insSHL} \i\c{SHL}, \i\c{SHR}: Bitwise Logical Shifts
|
||
|
|
||
|
\c SHL r/m8,1 ; D0 /4 [8086]
|
||
|
\c SHL r/m8,CL ; D2 /4 [8086]
|
||
|
\c SHL r/m8,imm8 ; C0 /4 ib [186]
|
||
|
\c SHL r/m16,1 ; o16 D1 /4 [8086]
|
||
|
\c SHL r/m16,CL ; o16 D3 /4 [8086]
|
||
|
\c SHL r/m16,imm8 ; o16 C1 /4 ib [186]
|
||
|
\c SHL r/m32,1 ; o32 D1 /4 [386]
|
||
|
\c SHL r/m32,CL ; o32 D3 /4 [386]
|
||
|
\c SHL r/m32,imm8 ; o32 C1 /4 ib [386]
|
||
|
|
||
|
\c SHR r/m8,1 ; D0 /5 [8086]
|
||
|
\c SHR r/m8,CL ; D2 /5 [8086]
|
||
|
\c SHR r/m8,imm8 ; C0 /5 ib [186]
|
||
|
\c SHR r/m16,1 ; o16 D1 /5 [8086]
|
||
|
\c SHR r/m16,CL ; o16 D3 /5 [8086]
|
||
|
\c SHR r/m16,imm8 ; o16 C1 /5 ib [186]
|
||
|
\c SHR r/m32,1 ; o32 D1 /5 [386]
|
||
|
\c SHR r/m32,CL ; o32 D3 /5 [386]
|
||
|
\c SHR r/m32,imm8 ; o32 C1 /5 ib [386]
|
||
|
|
||
|
\c{SHL} and \c{SHR} perform a logical shift operation on the given
|
||
|
source/destination (first) operand. The vacated bits are filled with
|
||
|
zero.
|
||
|
|
||
|
A synonym for \c{SHL} is \c{SAL} (see \k{insSAL}). NASM will
|
||
|
assemble either one to the same code, but NDISASM will always
|
||
|
disassemble that code as \c{SHL}.
|
||
|
|
||
|
The number of bits to shift by is given by the second operand. Only
|
||
|
the bottom five bits of the shift count are considered by processors
|
||
|
above the 8086.
|
||
|
|
||
|
You can force the longer (186 and upwards, beginning with a \c{C1}
|
||
|
byte) form of \c{SHL foo,1} by using a \c{BYTE} prefix: \c{SHL
|
||
|
foo,BYTE 1}. Similarly with \c{SHR}.
|
||
|
|
||
|
|
||
|
\S{insSHLD} \i\c{SHLD}, \i\c{SHRD}: Bitwise Double-Precision Shifts
|
||
|
|
||
|
\c SHLD r/m16,reg16,imm8 ; o16 0F A4 /r ib [386]
|
||
|
\c SHLD r/m32,reg32,imm8 ; o32 0F A4 /r ib [386]
|
||
|
\c SHLD r/m16,reg16,CL ; o16 0F A5 /r [386]
|
||
|
\c SHLD r/m32,reg32,CL ; o32 0F A5 /r [386]
|
||
|
|
||
|
\c SHRD r/m16,reg16,imm8 ; o16 0F AC /r ib [386]
|
||
|
\c SHRD r/m32,reg32,imm8 ; o32 0F AC /r ib [386]
|
||
|
\c SHRD r/m16,reg16,CL ; o16 0F AD /r [386]
|
||
|
\c SHRD r/m32,reg32,CL ; o32 0F AD /r [386]
|
||
|
|
||
|
\b \c{SHLD} performs a double-precision left shift. It notionally
|
||
|
places its second operand to the right of its first, then shifts
|
||
|
the entire bit string thus generated to the left by a number of
|
||
|
bits specified in the third operand. It then updates only the
|
||
|
\e{first} operand according to the result of this. The second
|
||
|
operand is not modified.
|
||
|
|
||
|
\b \c{SHRD} performs the corresponding right shift: it notionally
|
||
|
places the second operand to the \e{left} of the first, shifts the
|
||
|
whole bit string right, and updates only the first operand.
|
||
|
|
||
|
For example, if \c{EAX} holds \c{0x01234567} and \c{EBX} holds
|
||
|
\c{0x89ABCDEF}, then the instruction \c{SHLD EAX,EBX,4} would update
|
||
|
\c{EAX} to hold \c{0x12345678}. Under the same conditions, \c{SHRD
|
||
|
EAX,EBX,4} would update \c{EAX} to hold \c{0xF0123456}.
|
||
|
|
||
|
The number of bits to shift by is given by the third operand. Only
|
||
|
the bottom five bits of the shift count are considered.
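
The notional concatenation of the two operands can be written out directly. The Python sketch below models both instructions and reproduces the \c{EAX}/\c{EBX} example above; \c{shld} and \c{shrd} are hypothetical helper names, and the model is only meant for counts smaller than the operand size.

```python
def shld(dst, src, count, bits=32):
    """Model SHLD: shift dst left, filling from the top bits of src."""
    count &= 31                       # only the bottom five bits are used
    mask = (1 << bits) - 1
    combined = (dst << bits) | src    # src placed to the right of dst
    return ((combined << count) >> bits) & mask


def shrd(dst, src, count, bits=32):
    """Model SHRD: shift dst right, filling from the low bits of src."""
    count &= 31
    mask = (1 << bits) - 1
    combined = (src << bits) | dst    # src placed to the left of dst
    return (combined >> count) & mask
```

In both cases only the new value of the first operand is returned, mirroring the fact that the second operand is left unmodified.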
|
||
|
|
||
|
|
||
|
\S{insSHUFPD} \i\c{SHUFPD}: Shuffle Packed Double-Precision FP Values
|
||
|
|
||
|
\c SHUFPD xmm1,xmm2/m128,imm8 ; 66 0F C6 /r ib [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{SHUFPD} moves one of the packed double-precision FP values from
|
||
|
the destination operand into the low quadword of the destination
|
||
|
operand; the upper quadword is generated by moving one of the
|
||
|
double-precision FP values from the source operand into the
|
||
|
destination. The select (third) operand selects which of the values
|
||
|
are moved to the destination register.
|
||
|
|
||
|
The select operand is an 8-bit immediate: bit 0 selects which value
|
||
|
is moved from the destination operand to the result (where 0 selects
|
||
|
the low quadword and 1 selects the high quadword) and bit 1 selects
|
||
|
which value is moved from the source operand to the result.
|
||
|
Bits 2 through 7 of the shuffle operand are reserved.
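
The selection rule can be sketched in a couple of lines of Python. Here \c{shufpd} is a hypothetical helper, and each operand is modelled as a two-element list whose index 0 is the low quadword.

```python
def shufpd(dst, src, imm8):
    """Model SHUFPD on two-element lists (index 0 = low quadword)."""
    low = dst[imm8 & 1]            # bit 0 picks from the destination
    high = src[(imm8 >> 1) & 1]    # bit 1 picks from the source
    return [low, high]
```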
|
||
|
|
||
|
|
||
|
\S{insSHUFPS} \i\c{SHUFPS}: Shuffle Packed Single-Precision FP Values
|
||
|
|
||
|
\c SHUFPS xmm1,xmm2/m128,imm8 ; 0F C6 /r ib [KATMAI,SSE]
|
||
|
|
||
|
\c{SHUFPS} moves two of the packed single-precision FP values from
|
||
|
the destination operand into the low quadword of the destination
|
||
|
operand; the upper quadword is generated by moving two of the
|
||
|
single-precision FP values from the source operand into the
|
||
|
destination. The select (third) operand selects which of the
|
||
|
values are moved to the destination register.
|
||
|
|
||
|
The select operand is an 8-bit immediate: bits 0 and 1 select the
value to be moved from the destination operand to the low doubleword
of the result, bits 2 and 3 select the value to be moved from the
destination operand to the second doubleword of the result, bits 4
and 5 select the value to be moved from the source operand to the
third doubleword of the result, and bits 6 and 7 select the value to
be moved from the source operand to the high doubleword of the result.
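The four two-bit fields of the select operand can be sketched in Python as follows. Registers are modelled as four-element lists with index 0 as the low doubleword; the helper name \c{shufps} is hypothetical.

```python
def shufps(dst, src, imm8):
    """dst and src are lists of four values, index 0 = low doubleword."""
    return [
        dst[imm8 & 3],           # bits 0-1: low doubleword, from destination
        dst[(imm8 >> 2) & 3],    # bits 2-3: second doubleword, from destination
        src[(imm8 >> 4) & 3],    # bits 4-5: third doubleword, from source
        src[(imm8 >> 6) & 3],    # bits 6-7: high doubleword, from source
    ]

print(shufps([0, 1, 2, 3], [4, 5, 6, 7], 0x1B))   # [3, 2, 5, 4]
```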
|
||
|
|
||
|
|
||
|
\S{insSMI} \i\c{SMI}: System Management Interrupt
|
||
|
|
||
|
\c SMI ; F1 [386,UNDOC]
|
||
|
|
||
|
\c{SMI} puts some AMD processors into SMM mode. It is available on some
|
||
|
386 and 486 processors, and is only available when DR7 bit 12 is set,
|
||
|
otherwise it generates an Int 1.
|
||
|
|
||
|
|
||
|
\S{insSMINT} \i\c{SMINT}, \i\c{SMINTOLD}: Software SMM Entry (CYRIX)
|
||
|
|
||
|
\c SMINT ; 0F 38 [PENT,CYRIX]
|
||
|
\c SMINTOLD ; 0F 7E [486,CYRIX]
|
||
|
|
||
|
\c{SMINT} puts the processor into SMM mode. The CPU state information is
|
||
|
saved in the SMM memory header, and then execution begins at the SMM base
|
||
|
address.
|
||
|
|
||
|
\c{SMINTOLD} is the same as \c{SMINT}, but was the opcode used on the 486.
|
||
|
|
||
|
This pair of opcodes is specific to the Cyrix and compatible range of
|
||
|
processors (Cyrix, IBM, Via).
|
||
|
|
||
|
|
||
|
\S{insSMSW} \i\c{SMSW}: Store Machine Status Word
|
||
|
|
||
|
\c SMSW r/m16 ; 0F 01 /4 [286,PRIV]
|
||
|
|
||
|
\c{SMSW} stores the bottom half of the \c{CR0} control register (or
|
||
|
the Machine Status Word, on 286 processors) into the destination
|
||
|
operand. See also \c{LMSW} (\k{insLMSW}).
|
||
|
|
||
|
For 32-bit code, this would store all of \c{CR0} in the specified
|
||
|
register (or the bottom 16 bits if the destination is a memory location),
|
||
|
without needing an operand size override byte.
|
||
|
|
||
|
|
||
|
\S{insSQRTPD} \i\c{SQRTPD}: Packed Double-Precision FP Square Root
|
||
|
|
||
|
\c SQRTPD xmm1,xmm2/m128 ; 66 0F 51 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{SQRTPD} calculates the square root of the packed double-precision
|
||
|
FP values in the source operand, and stores the double-precision
|
||
|
results in the destination register.
|
||
|
|
||
|
|
||
|
\S{insSQRTPS} \i\c{SQRTPS}: Packed Single-Precision FP Square Root
|
||
|
|
||
|
\c SQRTPS xmm1,xmm2/m128 ; 0F 51 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{SQRTPS} calculates the square root of the packed single-precision
|
||
|
FP values in the source operand, and stores the single-precision
|
||
|
results in the destination register.
|
||
|
|
||
|
|
||
|
\S{insSQRTSD} \i\c{SQRTSD}: Scalar Double-Precision FP Square Root
|
||
|
|
||
|
\c SQRTSD xmm1,xmm2/m64 ; F2 0F 51 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{SQRTSD} calculates the square root of the low-order double-precision
|
||
|
FP value from the source operand, and stores the double-precision
|
||
|
result in the destination register. The high-quadword remains unchanged.
|
||
|
|
||
|
|
||
|
\S{insSQRTSS} \i\c{SQRTSS}: Scalar Single-Precision FP Square Root
|
||
|
|
||
|
\c SQRTSS xmm1,xmm2/m32 ; F3 0F 51 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{SQRTSS} calculates the square root of the low-order single-precision
|
||
|
FP value from the source operand, and stores the single-precision
|
||
|
result in the destination register. The three high doublewords remain
|
||
|
unchanged.
|
||
|
|
||
|
|
||
|
\S{insSTC} \i\c{STC}, \i\c{STD}, \i\c{STI}: Set Flags
|
||
|
|
||
|
\c STC ; F9 [8086]
|
||
|
\c STD ; FD [8086]
|
||
|
\c STI ; FB [8086]
|
||
|
|
||
|
These instructions set various flags. \c{STC} sets the carry flag;
|
||
|
\c{STD} sets the direction flag; and \c{STI} sets the interrupt flag
|
||
|
(thus enabling interrupts).
|
||
|
|
||
|
To clear the carry, direction, or interrupt flags, use the \c{CLC},
|
||
|
\c{CLD} and \c{CLI} instructions (\k{insCLC}). To invert the carry
|
||
|
flag, use \c{CMC} (\k{insCMC}).
|
||
|
|
||
|
|
||
|
\S{insSTMXCSR} \i\c{STMXCSR}: Store Streaming SIMD Extension
|
||
|
Control/Status
|
||
|
|
||
|
\c STMXCSR m32 ; 0F AE /3 [KATMAI,SSE]
|
||
|
|
||
|
\c{STMXCSR} stores the contents of the \c{MXCSR} control/status
|
||
|
register to the specified memory location. \c{MXCSR} is used to
|
||
|
enable masked/unmasked exception handling, to set rounding modes,
|
||
|
to set flush-to-zero mode, and to view exception status flags.
|
||
|
The reserved bits in the \c{MXCSR} register are stored as 0s.
|
||
|
|
||
|
For details of the \c{MXCSR} register, see the Intel processor docs.
|
||
|
|
||
|
See also \c{LDMXCSR} (\k{insLDMXCSR}).
|
||
|
|
||
|
|
||
|
\S{insSTOSB} \i\c{STOSB}, \i\c{STOSW}, \i\c{STOSD}: Store Byte to String
|
||
|
|
||
|
\c STOSB ; AA [8086]
|
||
|
\c STOSW ; o16 AB [8086]
|
||
|
\c STOSD ; o32 AB [386]
|
||
|
|
||
|
\c{STOSB} stores the byte in \c{AL} at \c{[ES:DI]} or \c{[ES:EDI]};
no flags are affected. It then increments or decrements
|
||
|
(depending on the direction flag: increments if the flag is clear,
|
||
|
decrements if it is set) \c{DI} (or \c{EDI}).
|
||
|
|
||
|
The register used is \c{DI} if the address size is 16 bits, and
|
||
|
\c{EDI} if it is 32 bits. If you need to use an address size not
|
||
|
equal to the current \c{BITS} setting, you can use an explicit
|
||
|
\i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
Segment override prefixes have no effect for this instruction: the
|
||
|
use of \c{ES} for the store to \c{[DI]} or \c{[EDI]} cannot be
|
||
|
overridden.
|
||
|
|
||
|
\c{STOSW} and \c{STOSD} work in the same way, but they store the
|
||
|
word in \c{AX} or the doubleword in \c{EAX} instead of the byte in
|
||
|
\c{AL}, and increment or decrement the addressing registers by 2 or
|
||
|
4 instead of 1.
|
||
|
|
||
|
The \c{REP} prefix may be used to repeat the instruction \c{CX} (or
|
||
|
\c{ECX} - again, the address size chooses which) times.
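The effect of \c{REP STOSB} can be sketched with a short Python model. Memory is represented by a \c{bytearray}, segment handling and address wrap-around are left out, and the function name \c{rep_stosb} is hypothetical.

```python
def rep_stosb(memory, di, al, cx, df=0):
    """Store AL at memory[di] CX times; return the final (DI, CX)."""
    step = -1 if df else 1        # DF clear: increment, DF set: decrement
    while cx != 0:
        memory[di] = al & 0xFF    # store the byte in AL
        di += step                # advance the index register
        cx -= 1                   # REP counts CX down to zero
    return di, cx

buf = bytearray(8)
print(rep_stosb(buf, 2, 0x41, 4))   # (6, 0)
print(buf.hex())                    # 0000414141410000
```

For \c{STOSW} and \c{STOSD} the same loop would store 2 or 4 bytes and step the index by 2 or 4.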
|
||
|
|
||
|
|
||
|
\S{insSTR} \i\c{STR}: Store Task Register
|
||
|
|
||
|
\c STR r/m16 ; 0F 00 /1 [286,PRIV]
|
||
|
|
||
|
\c{STR} stores the segment selector corresponding to the contents of
|
||
|
the Task Register into its operand. When the operand size is 32 bits and
the destination is a register, the upper 16 bits are cleared to 0s.
|
||
|
When the destination operand is a memory location, 16 bits are
|
||
|
written regardless of the operand size.
|
||
|
|
||
|
|
||
|
\S{insSUB} \i\c{SUB}: Subtract Integers
|
||
|
|
||
|
\c SUB r/m8,reg8 ; 28 /r [8086]
|
||
|
\c SUB r/m16,reg16 ; o16 29 /r [8086]
|
||
|
\c SUB r/m32,reg32 ; o32 29 /r [386]
|
||
|
|
||
|
\c SUB reg8,r/m8 ; 2A /r [8086]
|
||
|
\c SUB reg16,r/m16 ; o16 2B /r [8086]
|
||
|
\c SUB reg32,r/m32 ; o32 2B /r [386]
|
||
|
|
||
|
\c SUB r/m8,imm8 ; 80 /5 ib [8086]
|
||
|
\c SUB r/m16,imm16 ; o16 81 /5 iw [8086]
|
||
|
\c SUB r/m32,imm32 ; o32 81 /5 id [386]
|
||
|
|
||
|
\c SUB r/m16,imm8 ; o16 83 /5 ib [8086]
|
||
|
\c SUB r/m32,imm8 ; o32 83 /5 ib [386]
|
||
|
|
||
|
\c SUB AL,imm8 ; 2C ib [8086]
|
||
|
\c SUB AX,imm16 ; o16 2D iw [8086]
|
||
|
\c SUB EAX,imm32 ; o32 2D id [386]
|
||
|
|
||
|
\c{SUB} performs integer subtraction: it subtracts its second
|
||
|
operand from its first, and leaves the result in its destination
|
||
|
(first) operand. The flags are set according to the result of the
|
||
|
operation: in particular, the carry flag is affected and can be used
|
||
|
by a subsequent \c{SBB} instruction (\k{insSBB}).
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
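To illustrate the sign extension of an 8-bit immediate and the borrow reported in the carry flag, here is a minimal Python sketch. The helper names \c{sign_extend8} and \c{sub} are hypothetical, and only the result and the carry flag are modelled.

```python
def sign_extend8(imm8, bits):
    """Sign-extend an 8-bit immediate to the given operand width."""
    imm8 &= 0xFF
    if imm8 & 0x80:               # top bit set: the value is negative
        imm8 -= 0x100
    return imm8 & ((1 << bits) - 1)

def sub(dst, src, bits):
    """Return (result, carry) for dst - src at the given width."""
    mask = (1 << bits) - 1
    dst &= mask
    src &= mask
    return (dst - src) & mask, int(src > dst)   # carry is the unsigned borrow

print(hex(sign_extend8(0xFF, 32)))          # 0xffffffff
print(sub(5, sign_extend8(0xFF, 32), 32))   # (6, 1)
```

Subtracting the immediate \c{0xFF} from a 32-bit operand therefore subtracts -1, that is, adds 1.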
|
||
|
|
||
|
|
||
|
\S{insSUBPD} \i\c{SUBPD}: Packed Double-Precision FP Subtract
|
||
|
|
||
|
\c SUBPD xmm1,xmm2/m128 ; 66 0F 5C /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{SUBPD} subtracts the packed double-precision FP values of
|
||
|
the source operand from those of the destination operand, and
|
||
|
stores the result in the destination operand.
|
||
|
|
||
|
|
||
|
\S{insSUBPS} \i\c{SUBPS}: Packed Single-Precision FP Subtract
|
||
|
|
||
|
\c SUBPS xmm1,xmm2/m128 ; 0F 5C /r [KATMAI,SSE]
|
||
|
|
||
|
\c{SUBPS} subtracts the packed single-precision FP values of
|
||
|
the source operand from those of the destination operand, and
|
||
|
stores the result in the destination operand.
|
||
|
|
||
|
|
||
|
\S{insSUBSD} \i\c{SUBSD}: Scalar Double-Precision FP Subtract
|
||
|
|
||
|
\c SUBSD xmm1,xmm2/m64 ; F2 0F 5C /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{SUBSD} subtracts the low-order double-precision FP value of
|
||
|
the source operand from that of the destination operand, and
|
||
|
stores the result in the destination operand. The high
|
||
|
quadword is unchanged.
|
||
|
|
||
|
|
||
|
\S{insSUBSS} \i\c{SUBSS}: Scalar Single-Precision FP Subtract
|
||
|
|
||
|
\c SUBSS xmm1,xmm2/m32 ; F3 0F 5C /r [KATMAI,SSE]
|
||
|
|
||
|
\c{SUBSS} subtracts the low-order single-precision FP value of
|
||
|
the source operand from that of the destination operand, and
|
||
|
stores the result in the destination operand. The three high
|
||
|
doublewords are unchanged.
|
||
|
|
||
|
|
||
|
\S{insSVDC} \i\c{SVDC}: Save Segment Register and Descriptor
|
||
|
|
||
|
\c SVDC m80,segreg ; 0F 78 /r [486,CYRIX,SMM]
|
||
|
|
||
|
\c{SVDC} saves a segment register (DS, ES, FS, GS, or SS) and its
|
||
|
descriptor to mem80.
|
||
|
|
||
|
|
||
|
\S{insSVLDT} \i\c{SVLDT}: Save LDTR and Descriptor
|
||
|
|
||
|
\c SVLDT m80 ; 0F 7A /0 [486,CYRIX,SMM]
|
||
|
|
||
|
\c{SVLDT} saves the Local Descriptor Table Register (LDTR) and its
descriptor to mem80.
|
||
|
|
||
|
|
||
|
\S{insSVTS} \i\c{SVTS}: Save TSR and Descriptor
|
||
|
|
||
|
\c SVTS m80 ; 0F 7C /0 [486,CYRIX,SMM]
|
||
|
|
||
|
\c{SVTS} saves the Task State Register (TSR) to mem80.
|
||
|
|
||
|
|
||
|
\S{insSYSCALL} \i\c{SYSCALL}: Call Operating System
|
||
|
|
||
|
\c SYSCALL ; 0F 05 [P6,AMD]
|
||
|
|
||
|
\c{SYSCALL} provides a fast method of transferring control to a fixed
|
||
|
entry point in an operating system.
|
||
|
|
||
|
\b The \c{EIP} register is copied into the \c{ECX} register.
|
||
|
|
||
|
\b Bits [31-0] of the 64-bit SYSCALL/SYSRET Target Address Register
|
||
|
(\c{STAR}) are copied into the \c{EIP} register.
|
||
|
|
||
|
\b Bits [47-32] of the \c{STAR} register specify the selector that is
|
||
|
copied into the \c{CS} register.
|
||
|
|
||
|
\b Bits [47-32]+1000b of the \c{STAR} register specify the selector that
|
||
|
is copied into the SS register.
|
||
|
|
||
|
The \c{CS} and \c{SS} registers should not be modified by the operating
|
||
|
system between the execution of the \c{SYSCALL} instruction and its
|
||
|
corresponding \c{SYSRET} instruction.
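The way \c{SYSCALL} splits the \c{STAR} register, as listed above, can be sketched in Python. The function name \c{syscall_targets} and the sample \c{STAR} value are made up for illustration, and only the field extraction is modelled.

```python
def syscall_targets(star):
    """Split a 64-bit STAR value into the fields SYSCALL uses."""
    eip = star & 0xFFFFFFFF        # bits 31-0: target EIP
    cs = (star >> 32) & 0xFFFF     # bits 47-32: CS selector
    ss = (cs + 0b1000) & 0xFFFF    # CS selector + 1000b: SS selector
    return eip, cs, ss

# Hypothetical STAR contents, chosen only to show the field layout.
star = (0x0023 << 48) | (0x0008 << 32) | 0x00100000
print([hex(v) for v in syscall_targets(star)])   # ['0x100000', '0x8', '0x10']
```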
|
||
|
|
||
|
For more information, see the \c{SYSCALL and SYSRET Instruction Specification}
|
||
|
(AMD document number 21086.pdf).
|
||
|
|
||
|
|
||
|
\S{insSYSENTER} \i\c{SYSENTER}: Fast System Call
|
||
|
|
||
|
\c SYSENTER ; 0F 34 [P6]
|
||
|
|
||
|
\c{SYSENTER} executes a fast call to a level 0 system procedure or
|
||
|
routine. Before using this instruction, various MSRs need to be set
|
||
|
up:
|
||
|
|
||
|
\b \c{SYSENTER_CS_MSR} contains the 32-bit segment selector for the
|
||
|
privilege level 0 code segment. (This value is also used to compute
|
||
|
the segment selector of the privilege level 0 stack segment.)
|
||
|
|
||
|
\b \c{SYSENTER_EIP_MSR} contains the 32-bit offset into the privilege
|
||
|
level 0 code segment to the first instruction of the selected operating
system procedure or routine.
|
||
|
|
||
|
\b \c{SYSENTER_ESP_MSR} contains the 32-bit stack pointer for the
|
||
|
privilege level 0 stack.
|
||
|
|
||
|
\c{SYSENTER} performs the following sequence of operations:
|
||
|
|
||
|
\b Loads the segment selector from the \c{SYSENTER_CS_MSR} into the
|
||
|
\c{CS} register.
|
||
|
|
||
|
\b Loads the instruction pointer from the \c{SYSENTER_EIP_MSR} into
|
||
|
the \c{EIP} register.
|
||
|
|
||
|
\b Adds 8 to the value in \c{SYSENTER_CS_MSR} and loads it into the
|
||
|
\c{SS} register.
|
||
|
|
||
|
\b Loads the stack pointer from the \c{SYSENTER_ESP_MSR} into the
|
||
|
\c{ESP} register.
|
||
|
|
||
|
\b Switches to privilege level 0.
|
||
|
|
||
|
\b Clears the \c{VM} flag in the \c{EFLAGS} register, if the flag
|
||
|
is set.
|
||
|
|
||
|
\b Begins executing the selected system procedure.
|
||
|
|
||
|
In particular, note that this instruction does not save the values of
|
||
|
\c{CS} or \c{(E)IP}. If you need to return to the calling code, you
|
||
|
need to write your code to cater for this.
|
||
|
|
||
|
For more information, see the Intel Architecture Software Developer's
|
||
|
Manual, Volume 2.
|
||
|
|
||
|
|
||
|
\S{insSYSEXIT} \i\c{SYSEXIT}: Fast Return From System Call
|
||
|
|
||
|
\c SYSEXIT ; 0F 35 [P6,PRIV]
|
||
|
|
||
|
\c{SYSEXIT} executes a fast return to privilege level 3 user code.
|
||
|
This instruction is a companion instruction to the \c{SYSENTER}
|
||
|
instruction, and can only be executed by privilege level 0 code.
|
||
|
Various registers need to be set up before calling this instruction:
|
||
|
|
||
|
\b \c{SYSENTER_CS_MSR} contains the 32-bit segment selector for the
|
||
|
privilege level 0 code segment in which the processor is currently
|
||
|
executing. (This value is used to compute the segment selectors for
|
||
|
the privilege level 3 code and stack segments.)
|
||
|
|
||
|
\b \c{EDX} contains the 32-bit offset into the privilege level 3 code
|
||
|
segment to the first instruction to be executed in the user code.
|
||
|
|
||
|
\b \c{ECX} contains the 32-bit stack pointer for the privilege level 3
|
||
|
stack.
|
||
|
|
||
|
\c{SYSEXIT} performs the following sequence of operations:
|
||
|
|
||
|
\b Adds 16 to the value in \c{SYSENTER_CS_MSR} and loads the sum into
|
||
|
the \c{CS} selector register.
|
||
|
|
||
|
\b Loads the instruction pointer from the \c{EDX} register into the
|
||
|
\c{EIP} register.
|
||
|
|
||
|
\b Adds 24 to the value in \c{SYSENTER_CS_MSR} and loads the sum
|
||
|
into the \c{SS} selector register.
|
||
|
|
||
|
\b Loads the stack pointer from the \c{ECX} register into the \c{ESP}
|
||
|
register.
|
||
|
|
||
|
\b Switches to privilege level 3.
|
||
|
|
||
|
\b Begins executing the user code at the \c{EIP} address.
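The selector arithmetic in the steps above, together with the matching \c{SYSENTER} arithmetic (\k{insSYSENTER}), can be sketched in Python. The function names and the sample MSR value are hypothetical, and the sketch models only the additions described in the text, not privilege checks or descriptor loading.

```python
def sysenter_selectors(cs_msr):
    """Return (CS, SS) as loaded by SYSENTER: MSR and MSR + 8."""
    return cs_msr & 0xFFFF, (cs_msr + 8) & 0xFFFF

def sysexit_selectors(cs_msr):
    """Return (CS, SS) as loaded by SYSEXIT: MSR + 16 and MSR + 24."""
    return (cs_msr + 16) & 0xFFFF, (cs_msr + 24) & 0xFFFF

# Hypothetical SYSENTER_CS_MSR value of 0x0008.
print(sysenter_selectors(0x0008))   # (8, 16)
print(sysexit_selectors(0x0008))    # (24, 32)
```

The four segments therefore have to occupy consecutive descriptor slots, in the order level 0 code, level 0 stack, level 3 code, level 3 stack.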
|
||
|
|
||
|
For more information on the use of the \c{SYSENTER} and \c{SYSEXIT}
|
||
|
instructions, see the Intel Architecture Software Developer's
|
||
|
Manual, Volume 2.
|
||
|
|
||
|
|
||
|
\S{insSYSRET} \i\c{SYSRET}: Return From Operating System
|
||
|
|
||
|
\c SYSRET ; 0F 07 [P6,AMD,PRIV]
|
||
|
|
||
|
\c{SYSRET} is the return instruction used in conjunction with the
|
||
|
\c{SYSCALL} instruction to provide fast entry/exit to an operating system.
|
||
|
|
||
|
\b The \c{ECX} register, which points to the next sequential instruction
|
||
|
after the corresponding \c{SYSCALL} instruction, is copied into the \c{EIP}
|
||
|
register.
|
||
|
|
||
|
\b Bits [63-48] of the \c{STAR} register specify the selector that is copied
|
||
|
into the \c{CS} register.
|
||
|
|
||
|
\b Bits [63-48]+1000b of the \c{STAR} register specify the selector that is
|
||
|
copied into the \c{SS} register.
|
||
|
|
||
|
\b Bits [1-0] of the \c{SS} register are set to 11b (RPL of 3) regardless of
|
||
|
the value of bits [49-48] of the \c{STAR} register.
|
||
|
|
||
|
The \c{CS} and \c{SS} registers should not be modified by the operating
|
||
|
system between the execution of the \c{SYSCALL} instruction and its
|
||
|
corresponding \c{SYSRET} instruction.
|
||
|
|
||
|
For more information, see the \c{SYSCALL and SYSRET Instruction Specification}
|
||
|
(AMD document number 21086.pdf).
|
||
|
|
||
|
|
||
|
\S{insTEST} \i\c{TEST}: Test Bits (notional bitwise AND)
|
||
|
|
||
|
\c TEST r/m8,reg8 ; 84 /r [8086]
|
||
|
\c TEST r/m16,reg16 ; o16 85 /r [8086]
|
||
|
\c TEST r/m32,reg32 ; o32 85 /r [386]
|
||
|
|
||
|
\c TEST r/m8,imm8 ; F6 /0 ib [8086]
|
||
|
\c TEST r/m16,imm16 ; o16 F7 /0 iw [8086]
|
||
|
\c TEST r/m32,imm32 ; o32 F7 /0 id [386]
|
||
|
|
||
|
\c TEST AL,imm8 ; A8 ib [8086]
|
||
|
\c TEST AX,imm16 ; o16 A9 iw [8086]
|
||
|
\c TEST EAX,imm32 ; o32 A9 id [386]
|
||
|
|
||
|
\c{TEST} performs a `mental' bitwise AND of its two operands, and
|
||
|
affects the flags as if the operation had taken place, but does not
|
||
|
store the result of the operation anywhere.
|
||
|
|
||
|
|
||
|
\S{insUCOMISD} \i\c{UCOMISD}: Unordered Scalar Double-Precision FP
|
||
|
compare and set EFLAGS
|
||
|
|
||
|
\c UCOMISD xmm1,xmm2/m64 ; 66 0F 2E /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{UCOMISD} compares the low-order double-precision FP numbers in the
|
||
|
two operands, and sets the \c{ZF}, \c{PF} and \c{CF} bits in the
|
||
|
\c{EFLAGS} register. In addition, the \c{OF}, \c{SF} and \c{AF} bits
|
||
|
in the \c{EFLAGS} register are zeroed out. The unordered predicate
|
||
|
(\c{ZF}, \c{PF} and \c{CF} all set) is returned if either source
|
||
|
operand is a \c{NaN} (\c{qNaN} or \c{sNaN}).
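The flag settings for the four possible outcomes can be sketched in Python, using ordinary floats to stand in for the low-order values. The function name \c{ucomisd_flags} is hypothetical; the first argument plays the role of the first operand.

```python
import math

def ucomisd_flags(a, b):
    """Return (ZF, PF, CF) for comparing a (first operand) with b."""
    if math.isnan(a) or math.isnan(b):
        return (1, 1, 1)      # unordered: all three flags set
    if a > b:
        return (0, 0, 0)      # first operand greater
    if a < b:
        return (0, 0, 1)      # first operand less
    return (1, 0, 0)          # equal

print(ucomisd_flags(1.0, 2.0))            # (0, 0, 1)
print(ucomisd_flags(float("nan"), 2.0))   # (1, 1, 1)
```

Because only \c{ZF}, \c{PF} and \c{CF} carry information, the result is normally tested with the unsigned conditions (for example \c{JA}, \c{JB}, \c{JE}) plus \c{JP} for the unordered case.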
|
||
|
|
||
|
|
||
|
\S{insUCOMISS} \i\c{UCOMISS}: Unordered Scalar Single-Precision FP
|
||
|
compare and set EFLAGS
|
||
|
|
||
|
\c UCOMISS xmm1,xmm2/m32 ; 0F 2E /r [KATMAI,SSE]
|
||
|
|
||
|
\c{UCOMISS} compares the low-order single-precision FP numbers in the
|
||
|
two operands, and sets the \c{ZF}, \c{PF} and \c{CF} bits in the
|
||
|
\c{EFLAGS} register. In addition, the \c{OF}, \c{SF} and \c{AF} bits
|
||
|
in the \c{EFLAGS} register are zeroed out. The unordered predicate
|
||
|
(\c{ZF}, \c{PF} and \c{CF} all set) is returned if either source
|
||
|
operand is a \c{NaN} (\c{qNaN} or \c{sNaN}).
|
||
|
|
||
|
|
||
|
\S{insUD2} \i\c{UD0}, \i\c{UD1}, \i\c{UD2}: Undefined Instruction
|
||
|
|
||
|
\c UD0 ; 0F FF [186,UNDOC]
|
||
|
\c UD1 ; 0F B9 [186,UNDOC]
|
||
|
\c UD2 ; 0F 0B [186]
|
||
|
|
||
|
\c{UDx} can be used to generate an invalid opcode exception, for testing
|
||
|
purposes.
|
||
|
|
||
|
\c{UD0} is specifically documented by AMD as being reserved for this
|
||
|
purpose.
|
||
|
|
||
|
\c{UD1} is documented by Intel as being available for this purpose.
|
||
|
|
||
|
\c{UD2} is specifically documented by Intel as being reserved for this
|
||
|
purpose. Intel document this as the preferred method of generating an
|
||
|
invalid opcode exception.
|
||
|
|
||
|
All these opcodes can be used to generate invalid opcode exceptions on
|
||
|
all currently available processors.
|
||
|
|
||
|
|
||
|
\S{insUMOV} \i\c{UMOV}: User Move Data
|
||
|
|
||
|
\c UMOV r/m8,reg8 ; 0F 10 /r [386,UNDOC]
|
||
|
\c UMOV r/m16,reg16 ; o16 0F 11 /r [386,UNDOC]
|
||
|
\c UMOV r/m32,reg32 ; o32 0F 11 /r [386,UNDOC]
|
||
|
|
||
|
\c UMOV reg8,r/m8 ; 0F 12 /r [386,UNDOC]
|
||
|
\c UMOV reg16,r/m16 ; o16 0F 13 /r [386,UNDOC]
|
||
|
\c UMOV reg32,r/m32 ; o32 0F 13 /r [386,UNDOC]
|
||
|
|
||
|
This undocumented instruction is used by in-circuit emulators to
|
||
|
access user memory (as opposed to host memory). It is used just like
|
||
|
an ordinary memory/register or register/register \c{MOV}
|
||
|
instruction, but accesses user space.
|
||
|
|
||
|
This instruction is only available on some AMD and IBM 386 and 486
|
||
|
processors.
|
||
|
|
||
|
|
||
|
\S{insUNPCKHPD} \i\c{UNPCKHPD}: Unpack and Interleave High Packed
|
||
|
Double-Precision FP Values
|
||
|
|
||
|
\c UNPCKHPD xmm1,xmm2/m128 ; 66 0F 15 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{UNPCKHPD} performs an interleaved unpack of the high-order data
|
||
|
elements of the source and destination operands, saving the result
|
||
|
in \c{xmm1}. It ignores the lower half of the sources.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[63-0] := dst[127-64];
|
||
|
\c dst[127-64] := src[127-64].
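The same operation can be written as a one-line Python model, with each register held as a \c{[low, high]} pair. The function name \c{unpckhpd} is hypothetical.

```python
def unpckhpd(dst, src):
    """dst and src are [low, high] pairs; returns the new destination."""
    return [dst[1], src[1]]     # high of dst goes low, high of src goes high

print(unpckhpd([1.0, 2.0], [3.0, 4.0]))   # [2.0, 4.0]
```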
|
||
|
|
||
|
|
||
|
\S{insUNPCKHPS} \i\c{UNPCKHPS}: Unpack and Interleave High Packed
|
||
|
Single-Precision FP Values
|
||
|
|
||
|
\c UNPCKHPS xmm1,xmm2/m128 ; 0F 15 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{UNPCKHPS} performs an interleaved unpack of the high-order data
|
||
|
elements of the source and destination operands, saving the result
|
||
|
in \c{xmm1}. It ignores the lower half of the sources.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[31-0] := dst[95-64];
|
||
|
\c dst[63-32] := src[95-64];
|
||
|
\c dst[95-64] := dst[127-96];
|
||
|
\c dst[127-96] := src[127-96].
|
||
|
|
||
|
|
||
|
\S{insUNPCKLPD} \i\c{UNPCKLPD}: Unpack and Interleave Low Packed
|
||
|
Double-Precision FP Data
|
||
|
|
||
|
\c UNPCKLPD xmm1,xmm2/m128 ; 66 0F 14 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{UNPCKLPD} performs an interleaved unpack of the low-order data
|
||
|
elements of the source and destination operands, saving the result
|
||
|
in \c{xmm1}. It ignores the upper half of the sources.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[63-0] := dst[63-0];
|
||
|
\c dst[127-64] := src[63-0].
|
||
|
|
||
|
|
||
|
\S{insUNPCKLPS} \i\c{UNPCKLPS}: Unpack and Interleave Low Packed
|
||
|
Single-Precision FP Data
|
||
|
|
||
|
\c UNPCKLPS xmm1,xmm2/m128 ; 0F 14 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{UNPCKLPS} performs an interleaved unpack of the low-order data
|
||
|
elements of the source and destination operands, saving the result
|
||
|
in \c{xmm1}. It ignores the upper half of the sources.
|
||
|
|
||
|
The operation of this instruction is:
|
||
|
|
||
|
\c dst[31-0] := dst[31-0];
|
||
|
\c dst[63-32] := src[31-0];
|
||
|
\c dst[95-64] := dst[63-32];
|
||
|
\c dst[127-96] := src[63-32].
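The interleave can be sketched in Python with each register held as a four-element list, index 0 being the low doubleword. The function name \c{unpcklps} is hypothetical.

```python
def unpcklps(dst, src):
    """dst and src are lists of four values, index 0 = low doubleword."""
    # Only elements 0 and 1 of each input are read.
    return [dst[0], src[0], dst[1], src[1]]

print(unpcklps([0, 1, 2, 3], [4, 5, 6, 7]))   # [0, 4, 1, 5]
```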
|
||
|
|
||
|
|
||
|
\S{insVERR} \i\c{VERR}, \i\c{VERW}: Verify Segment Readability/Writability
|
||
|
|
||
|
\c VERR r/m16 ; 0F 00 /4 [286,PRIV]
|
||
|
|
||
|
\c VERW r/m16 ; 0F 00 /5 [286,PRIV]
|
||
|
|
||
|
\b \c{VERR} sets the zero flag if the segment specified by the selector
|
||
|
in its operand can be read from at the current privilege level.
|
||
|
Otherwise it is cleared.
|
||
|
|
||
|
\b \c{VERW} sets the zero flag if the segment can be written.
|
||
|
|
||
|
|
||
|
\S{insWAIT} \i\c{WAIT}: Wait for Floating-Point Processor
|
||
|
|
||
|
\c WAIT ; 9B [8086]
|
||
|
\c FWAIT ; 9B [8086]
|
||
|
|
||
|
\c{WAIT}, on 8086 systems with a separate 8087 FPU, waits for the
|
||
|
FPU to have finished any operation it is engaged in before
|
||
|
continuing main processor operations, so that (for example) an FPU
|
||
|
store to main memory can be guaranteed to have completed before the
|
||
|
CPU tries to read the result back out.
|
||
|
|
||
|
On higher processors, \c{WAIT} is unnecessary for this purpose, and
|
||
|
it has the alternative purpose of ensuring that any pending unmasked
|
||
|
FPU exceptions have happened before execution continues.
|
||
|
|
||
|
|
||
|
\S{insWBINVD} \i\c{WBINVD}: Write Back and Invalidate Cache
|
||
|
|
||
|
\c WBINVD ; 0F 09 [486]
|
||
|
|
||
|
\c{WBINVD} invalidates and empties the processor's internal caches,
|
||
|
and causes the processor to instruct external caches to do the same.
|
||
|
It writes the contents of the caches back to memory first, so no
|
||
|
data is lost. To flush the caches quickly without bothering to write
|
||
|
the data back first, use \c{INVD} (\k{insINVD}).
|
||
|
|
||
|
|
||
|
\S{insWRMSR} \i\c{WRMSR}: Write Model-Specific Registers
|
||
|
|
||
|
\c WRMSR ; 0F 30 [PENT]
|
||
|
|
||
|
\c{WRMSR} writes the value in \c{EDX:EAX} to the processor
|
||
|
Model-Specific Register (MSR) whose index is stored in \c{ECX}.
|
||
|
See also \c{RDMSR} (\k{insRDMSR}).
|
||
|
|
||
|
|
||
|
\S{insWRSHR} \i\c{WRSHR}: Write SMM Header Pointer Register
|
||
|
|
||
|
\c WRSHR r/m32 ; 0F 37 /0 [386,CYRIX,SMM]
|
||
|
|
||
|
\c{WRSHR} loads the contents of either a 32-bit memory location or a
|
||
|
32-bit register into the SMM header pointer register.
|
||
|
|
||
|
See also \c{RDSHR} (\k{insRDSHR}).
|
||
|
|
||
|
|
||
|
\S{insXADD} \i\c{XADD}: Exchange and Add
|
||
|
|
||
|
\c XADD r/m8,reg8 ; 0F C0 /r [486]
|
||
|
\c XADD r/m16,reg16 ; o16 0F C1 /r [486]
|
||
|
\c XADD r/m32,reg32 ; o32 0F C1 /r [486]
|
||
|
|
||
|
\c{XADD} exchanges the values in its two operands, and then adds
|
||
|
them together and writes the result into the destination (first)
|
||
|
operand. This instruction can be used with a \c{LOCK} prefix for
|
||
|
multi-processor synchronisation purposes.
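A minimal Python sketch of the exchange-and-add step follows. The function name \c{xadd} is hypothetical, flags are not modelled, and the atomicity provided by \c{LOCK} is outside the scope of the model.

```python
def xadd(dst, src, bits=32):
    """Return (new_dst, new_src) after XADD dst,src."""
    mask = (1 << bits) - 1
    # The destination receives the sum; the source receives the old destination.
    return (dst + src) & mask, dst & mask

print(xadd(10, 3))   # (13, 10)
```

The old destination value left in the source register is what makes the instruction useful as a fetch-and-add primitive.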
|
||
|
|
||
|
|
||
|
\S{insXBTS} \i\c{XBTS}: Extract Bit String
|
||
|
|
||
|
\c XBTS reg16,r/m16 ; o16 0F A6 /r [386,UNDOC]
|
||
|
\c XBTS reg32,r/m32 ; o32 0F A6 /r [386,UNDOC]
|
||
|
|
||
|
The implied operation of this instruction is:
|
||
|
|
||
|
\c XBTS r/m16,reg16,AX,CL
|
||
|
\c XBTS r/m32,reg32,EAX,CL
|
||
|
|
||
|
\c{XBTS} writes a bit string from the source operand to the destination. \c{CL}
|
||
|
indicates the number of bits to be copied, and \c{(E)AX} indicates the
|
||
|
low order bit offset in the source. The bits are written to the low
|
||
|
order bits of the destination register. For example, if \c{CL} is set
|
||
|
to 4 and \c{AX} (for 16-bit code) is set to 5, bits 5-8 of \c{src} will
|
||
|
be copied to bits 0-3 of \c{dst}. This instruction is very poorly
|
||
|
documented, and I have been unable to find any official source of
|
||
|
documentation on it.
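Taking the description above at face value, the extraction can be sketched in Python. This models only what the text says, not verified hardware behaviour; the function name \c{xbts} is hypothetical.

```python
def xbts(src, offset, count):
    """Extract `count` bits of src starting at bit `offset`."""
    # Shift the wanted field down to bit 0, then mask off everything above it.
    return (src >> offset) & ((1 << count) - 1)

# CL = 4, AX = 5: bits 5-8 of the source land in bits 0-3 of the result.
print(bin(xbts(0b101100000, 5, 4)))   # 0b1011
```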
|
||
|
|
||
|
\c{XBTS} is supported only on the early Intel 386s, and conflicts with
|
||
|
the opcodes for \c{CMPXCHG486} (on early Intel 486s). NASM supports it
|
||
|
only for completeness. Its counterpart is \c{IBTS} (see \k{insIBTS}).
|
||
|
|
||
|
|
||
|
\S{insXCHG} \i\c{XCHG}: Exchange
|
||
|
|
||
|
\c XCHG reg8,r/m8 ; 86 /r [8086]
|
||
|
\c XCHG reg16,r/m16 ; o16 87 /r [8086]
|
||
|
\c XCHG reg32,r/m32 ; o32 87 /r [386]
|
||
|
|
||
|
\c XCHG r/m8,reg8 ; 86 /r [8086]
|
||
|
\c XCHG r/m16,reg16 ; o16 87 /r [8086]
|
||
|
\c XCHG r/m32,reg32 ; o32 87 /r [386]
|
||
|
|
||
|
\c XCHG AX,reg16 ; o16 90+r [8086]
|
||
|
\c XCHG EAX,reg32 ; o32 90+r [386]
|
||
|
\c XCHG reg16,AX ; o16 90+r [8086]
|
||
|
\c XCHG reg32,EAX ; o32 90+r [386]
|
||
|
|
||
|
\c{XCHG} exchanges the values in its two operands. It can be used
|
||
|
with a \c{LOCK} prefix for purposes of multi-processor
|
||
|
synchronisation.
|
||
|
|
||
|
\c{XCHG AX,AX} or \c{XCHG EAX,EAX} (depending on the \c{BITS}
|
||
|
setting) generates the opcode \c{90h}, and so is a synonym for
|
||
|
\c{NOP} (\k{insNOP}).
|
||
|
|
||
|
|
||
|
\S{insXLATB} \i\c{XLATB}: Translate Byte in Lookup Table
|
||
|
|
||
|
\c XLAT ; D7 [8086]
|
||
|
\c XLATB ; D7 [8086]
|
||
|
|
||
|
\c{XLATB} adds the value in \c{AL}, treated as an unsigned byte, to
|
||
|
\c{BX} or \c{EBX}, and loads the byte from the resulting address (in
|
||
|
the segment specified by \c{DS}) back into \c{AL}.
|
||
|
|
||
|
The base register used is \c{BX} if the address size is 16 bits, and
|
||
|
\c{EBX} if it is 32 bits. If you need to use an address size not
|
||
|
equal to the current \c{BITS} setting, you can use an explicit
|
||
|
\i\c{a16} or \i\c{a32} prefix.
|
||
|
|
||
|
The segment register used to load from \c{[BX+AL]} or \c{[EBX+AL]}
|
||
|
can be overridden by using a segment register name as a prefix (for
|
||
|
example, \c{es xlatb}).
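The table lookup performed by \c{XLATB} can be sketched in Python, with the data segment represented by a \c{bytes} object. The function name \c{xlatb} and the sample table are hypothetical, and segment handling is left out.

```python
def xlatb(memory, bx, al):
    """Return the new AL: the byte at memory[BX + AL], AL being unsigned."""
    return memory[bx + (al & 0xFF)]

# A hypothetical 16-entry table mapping 0-15 to ASCII hex digits.
table = b"0123456789ABCDEF"
print(xlatb(table, 0, 0x0A))   # 65
```

The result 65 is the ASCII code for \c{A}, the sort of nibble-to-character translation the instruction is typically used for.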
|
||
|
|
||
|
|
||
|
\S{insXOR} \i\c{XOR}: Bitwise Exclusive OR
|
||
|
|
||
|
\c XOR r/m8,reg8 ; 30 /r [8086]
|
||
|
\c XOR r/m16,reg16 ; o16 31 /r [8086]
|
||
|
\c XOR r/m32,reg32 ; o32 31 /r [386]
|
||
|
|
||
|
\c XOR reg8,r/m8 ; 32 /r [8086]
|
||
|
\c XOR reg16,r/m16 ; o16 33 /r [8086]
|
||
|
\c XOR reg32,r/m32 ; o32 33 /r [386]
|
||
|
|
||
|
\c XOR r/m8,imm8 ; 80 /6 ib [8086]
|
||
|
\c XOR r/m16,imm16 ; o16 81 /6 iw [8086]
|
||
|
\c XOR r/m32,imm32 ; o32 81 /6 id [386]
|
||
|
|
||
|
\c XOR r/m16,imm8 ; o16 83 /6 ib [8086]
|
||
|
\c XOR r/m32,imm8 ; o32 83 /6 ib [386]
|
||
|
|
||
|
\c XOR AL,imm8 ; 34 ib [8086]
|
||
|
\c XOR AX,imm16 ; o16 35 iw [8086]
|
||
|
\c XOR EAX,imm32 ; o32 35 id [386]
|
||
|
|
||
|
\c{XOR} performs a bitwise XOR operation between its two operands
|
||
|
(i.e. each bit of the result is 1 if and only if exactly one of the
|
||
|
corresponding bits of the two inputs was 1), and stores the result
|
||
|
in the destination (first) operand.
|
||
|
|
||
|
In the forms with an 8-bit immediate second operand and a longer
|
||
|
first operand, the second operand is considered to be signed, and is
|
||
|
sign-extended to the length of the first operand. In these cases,
|
||
|
the \c{BYTE} qualifier is necessary to force NASM to generate this
|
||
|
form of the instruction.
|
||
|
|
||
|
The \c{MMX} instruction \c{PXOR} (see \k{insPXOR}) performs the same
|
||
|
operation on the 64-bit \c{MMX} registers.
|
||
|
|
||
|
|
||
|
\S{insXORPD} \i\c{XORPD}: Bitwise Logical XOR of Double-Precision FP Values
|
||
|
|
||
|
\c XORPD xmm1,xmm2/m128 ; 66 0F 57 /r [WILLAMETTE,SSE2]
|
||
|
|
||
|
\c{XORPD} returns a bit-wise logical XOR between the source and
|
||
|
destination operands, storing the result in the destination operand.
|
||
|
|
||
|
|
||
|
\S{insXORPS} \i\c{XORPS}: Bitwise Logical XOR of Single-Precision FP Values
|
||
|
|
||
|
\c XORPS xmm1,xmm2/m128 ; 0F 57 /r [KATMAI,SSE]
|
||
|
|
||
|
\c{XORPS} returns a bit-wise logical XOR between the source and
|
||
|
destination operands, storing the result in the destination operand.
|
||
|
|
||
|
|