ctype functions take an *int*, which the user is expected to have
taken the input character from getc() and friends, or taken a
character and cast it to (unsigned char).
We don't care about EOF (-1), so use macros that cast to (unsigned
char) for us.
On some platforms, tolower() is implemented as a function call, in
order to handle locale support. We never change locales, so can the
result of tolower() into a table, so we don't have to sit through the
function call every time.
~1.3% overall performance improvement on a macro-heavy benchmark under
Linux x86-64.
Make strings a proper, first-class token type, instead of relying on
the "TOKEN_NUM with tv_charptr" hack. Only convert a string to a
number if requested in an expression context; this also makes it
possible to actually issue a warning when it overflows.
Ownership of the filename string was a bit fuzzy, with the result that
we were freeing it even though it was retained for use by __FILE__.
Clean up a number of other memory management issues with the new
quoting code, and change the stdscan implementation to one pass over
the string.
For consistency, support binary and octal floating-point, and accept
a "0d" or "0t" prefix for decimal floating-point. However, we do not
accept a binary exponent (p) for a decimal mantissa, or vice versa.
Allow any radix letter from the set [bydtoqhx] to be used either
"Intel-style" (0...x) or "C-style" (0x...). In Intel style, the
leading 0 remains optional as long as the first digit is in the range
0-9.
As a consequence, allow the prefix "0h" for hexadecimal floating
point.
Since we allow the prefix $ instead of 0x for integer constants, do
the same for floating point. No suffix support at this time; we may
want to consider if that would be appropriate.
1e30 is a floating-point constant, but 1e30h is not. The scanner
won't know that until it sees the "h", so make sure we keep enough
state to be able to distinguish "1e30" (a possible hex constant) from
"1.e30", "1e+30" or "1.0" (unabiguously floating-point.)
- Allow underscores as group separators in numbers, for example:
0x1234_5678 is now a legal number. The underscore is just ignored,
it adds no meaning.
- Recognize dotless floating-point numbers, such as "1e30". This
entails distinguishing hexadecimal numbers in the scanner, since
e.g. 0x1e30 is a perfectly legitimate hex constant.
Proper use of bool and enum makes code easier to debug. Do more of
it. In particular, we really should stomp out any residual uses of
magic constants that aren't enums or, in some cases, even #defines.
Both C and C++ have "bool", "true" and "false" in lower case; C
requires <stdbool.h> for this, in C++ it is an inherent type built
into the compiler. Use those instead of the old macros; emulate with
a simple typedef enum if unavailable.
Concentrate compiler dependencies to compiler.h; make sure compiler.h
is included first in every .c file (since some prototypes may depend
on the presence of feature request macros.)
Actually use the conditional inclusion of various functions (totally
broken in previous releases.)
Instead of returning -1 from nasm_token_hash, set tv->t_type to
TOKEN_ID and return TOKEN_ID, since that's what stdscan.c wants to do
with it anyway. This allows us to simply tailcall nasm_token_hash().
Finish the perfect hash tokenizer, and actually enable it.
Move stdscan() et al to a separate file, since it's not needed in any
of the clients of nasmlib other than nasm itself.
Run make alldeps.