Unify all the standard IEEE formats into one function, add support for
IEEE standard 128-bit floating point numbers.
The 80-bit format is still special since it explicitly represents the
integer portion.
0F 1F /0 is documented as an EA-taking NOP since the P6.
0F 18..1F + EA are all "hinting nops" (instructions which, when
unimplemented, have no effect rather than #UD) but 0F 1F /0
specifically has no operation whatsoever.
0FC2 is not really an instruction prefix; it's the opcode for
CMPPS/CMPSS, which takes a control immediate which Intel chose to have
opcode aliases for. However, we can't dispatch on a tail byte, so
it's useless.
Modify the disassembler so that we can have separate instruction
tables for prefixed instructions. As it was, all instructions which
started with 0F were linearly searched, and that is by now more than
half the instruction set.
Implement oword, reso, do, as well as the SO flag to instructions. No
instructions are actually flagged with SO yet, but this allows us to
specify 128-bit sizes in instruction patterns.
Combining arithmetric (add) and bitwise (xor) mixing seems to give
better result than either.
With the new prehash function, we find a valid hash much quicker.
This checkin completes what is required to actually generate SSE5
instructions. No support in the disassembler yet.
This checkin covers:
- Support for actually generating DREX prefixes.
- Support for matching operand "operand X must match Y"
Sort the dependency lists generated by "mkdep.pl", to make sure that
re-running "make alldeps" doesn't change anything unless there has
been real dependency changes. The previous version could produce
different output between runs and across platforms.
Update compilation instructions for MSVC++, and point out that it's
not just Unix systems which can use the GNU instructions -- it also
applies to MacOS X and Windows with either Cygwin or MinGW.
Minor fixes to make it possible to compile with MS Visual C++ 2005.
Unfortunately, MSVC++ is not fully C99 compliant; in particular, it
doesn't handle interspersed declarations and other code. Furthermore,
it chokes on some expressions in outelf64.c, which fortunately can be
easily substituted with simpler expressions.
Switch the preprocessor over to using the hash table library. On my
system, this improves the runtime of the output of test/pref/macro.pl
from over 600 seconds to 7 seconds.
Macros have an odd mix of case-sensitive and case-insensitive
behaviour, plus there are matching parameters for arguments, etc. As
a result, we use case-insensitive hash tables and use a linked list to
store all the possible isomorphs.
Use the new hash table function library to store labels. When
compiling on my 64-bit system, it reduces the assembly time for the
output of test/perf/label.pl from 73 to 7 seconds.
Define a proper hash table library, instead of the current ad hoc stuff
used for both labels and macros. This only implements the actual
library; it is not yet used.
We use a CRC64 as a prehash. This is almost certainly overkill,
although it is rather efficient (except, arguably, the table lookup)
on 64-bit platforms, and not all that bad on 32-bit platforms. All we
really need is a function which produces two independent 32-bit
results which are used as the primary and secondary hash
respectively. Either way, the prehash function is easily replacable
if/when we have a quicker alternative.
Simple scripts to generate performance benchmarks for label,
macro and token lookups. The label and macro lookups are simple
numerical sequences; it may be desirable to add some more
sophisticated algorithms for producing tokens in case we want to
compare different hash functions against each other.
Add the SSSE3, SSE4.1 and SSE4.2 instruction sets. Change \332 to be
a literal 0xF2 prefix, by analog with \333 for 0xF3 prefix (the
previous \332 flag changed to \335). This is necessary to get the REX
prefix in the right place for instructions that use it.
We are going to have to go in and change existing instruction patterns
which use these, as well.
Support r/m operands for non-integer operands types, i.e. mmx or xmm
operands. This allows mmx and xmm operands to be written more
compactly, speeding up the assembler.
We have a lot of enumerations; by declaring fields as such, we make it
easier when debugging, since the debugger can display the enumerations
in cleartext. However, make sure exceptional values (like -1) are
included in the enumeration, since the compiler otherwise may not
include it in the valid range of the enumeration.
Speed up pptok.c by just doing |= 0x20 instead of calling tolower() for
every character during prehashing. This is good enough for our needs,
since we don't have any tokens containing the characters @ [ \ ] _ nor
any high-bit characters (in which case we'd have to worry about multibyte
anyway.)