doc: document preprocessor functions

Add documentation for preprocessor functions, as well as the flow of
preprocessor expansion.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This commit is contained in:
H. Peter Anvin 2022-11-11 20:25:49 -08:00
parent 3fe5b3f5a1
commit 392b2b18a0
2 changed files with 289 additions and 17 deletions


@ -20,6 +20,10 @@ filename information anyway.
\b Fix handling of MASM-syntax reserved memory (e.g. \c{dw ?}) when
used in structure definitions.
\b The preprocessor now supports functions, which can be less verbose
and more convenient than the equivalent code implemented using
directives. See \k{ppfunc}.
\S{cl-2.15.06} Version 2.15.06


@ -1,7 +1,7 @@
\# --------------------------------------------------------------------------
\#
\# Copyright 1996-2022 The NASM Authors - All Rights Reserved
\M{year}{1996-2020}
\M{year}{1996-2022}
\# See the file AUTHORS included with the NASM distribution for
\# the specific copyright holders.
\#
@ -84,7 +84,8 @@
\IR{-w} \c{-w} option
\IR{-Z} \c{-Z} option
\IR{!=} \c{!=} operator
\IR{$, here} \c{$}, Here token
\IR{$, here} \c{$}, current address
\IR{$, here} here token
\IR{$, prefix} \c{$}, prefix
\IR{$$} \c{$$} token
\IR{%} \c{%} operator
@ -118,7 +119,6 @@
\IR{^^} \c{^^} operator
\IR{|} \c{|} operator
\IR{||} \c{||} operator
\IR{~} \c{~} operator
\IR{%$} \c{%$} and \c{%$$} prefixes
\IA{%$$}{%$}
\IR{+ opaddition} \c{+} operator, binary
@ -127,6 +127,8 @@
\IR{- opsubtraction} \c{-} operator, binary
\IR{- opunary} \c{-} operator, unary
\IR{! opunary} \c{!} operator
\IA{~}{~ opunary}
\IR{~ opunary} \c{~} operator
\IA{A16}{a16}
\IA{A32}{a32}
\IA{A64}{a64}
@ -153,12 +155,16 @@ variables
\IR{c calling convention} C calling convention
\IR{c symbol names} C symbol names
\IA{critical expressions}{critical expression}
\IA{command line}{command-line}
\IA{command-line}{command line}
\IA{comments}{comment}
\IR{ccomment} comment, ending in \c{\\}
\IA{case sensitivity}{case sensitive}
\IA{case-sensitive}{case sensitive}
\IA{case-insensitive}{case sensitive}
\IA{character constants}{character constant}
\IR{codeview debugging format} CodeView debugging format
\IR{continuation line} continuation line
\IR{continuation line} preprocessor, continuation line
\IR{common object file format} Common Object File Format
\IR{common variables, alignment in elf} common variables, alignment in ELF
\IR{common, elf extensions to} \c{COMMON}, ELF extensions to
@ -170,8 +176,8 @@ variables
\IR{dll symbols, exporting} DLL symbols, exporting
\IR{dll symbols, importing} DLL symbols, importing
\IR{dos} DOS
\IA{effective address}{effective addresses}
\IA{effective-address}{effective addresses}
\IA{effective addresses}{effective address}
\IA{effective-address}{effective address}
\IR{elf} ELF
\IR{elf, 16-bit code} ELF, 16-bit code
\IR{elf, debug formats} ELF, debug formats
@ -241,9 +247,13 @@ variables
\IR{plt} PLT
\IR{plt} \c{PLT} relocations
\IA{pre-defining macros}{pre-define}
\IA{preprocessor expressions}{preprocessor, expressions}
\IA{preprocessor loops}{preprocessor, loops}
\IA{preprocessor variables}{preprocessor, variables}
\IR{preprocessor conditionals} preprocessor, conditionals
\IR{preprocessor expansions} preprocessor, expansions
\IR{preprocessor expressions} preprocessor, expressions
\IR{preprocessor loops} preprocessor, loops
\IR{preprocessor variables} preprocessor, variables
\IR{preprocessor variables} variables, preprocessor
\IA{comments}{comment}
\IR{relocations, pic-specific} relocations, PIC-specific
\IA{repeating}{repeating code}
\IR{section alignment, in elf} section alignment, in ELF
@ -1164,9 +1174,9 @@ is a macro, a preprocessor directive or an assembler directive: see
\c label: instruction operands ; comment
As usual, most of these fields are optional; the presence or absence
of any combination of a label, an instruction and a comment is allowed.
Of course, the operand field is either required or forbidden by the
presence and nature of the instruction field.
of any combination of a label, an instruction and a \i{comment} is
allowed. Of course, the operand field is either required or forbidden
by the presence and nature of the instruction field.
NASM uses backslash (\\) as the line continuation character; if a line
ends with backslash, the next line is considered to be a part of the
@ -2166,10 +2176,23 @@ NASM contains a powerful \i{macro processor}, which supports
conditional assembly, multi-level file inclusion, two forms of macro
(single-line and multi-line), and a `context stack' mechanism for
extra macro power. Preprocessor directives all begin with a \c{%}
sign.
sign. As a result, some care needs to be taken when using the \c{%}
arithmetic operator to avoid it being confused with a preprocessor
directive; it is recommended that it always be surrounded by
whitespace.
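As a small illustration of the recommended spacing (the operand values are arbitrary and chosen only for this sketch):
\c         mov     ax, 100 % 7     ; modulo operator with whitespace on both sides; loads 2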
The preprocessor collapses all lines which end with a backslash (\\)
character into a single line. Thus:
The NASM preprocessor borrows concepts from both the C preprocessor
and the macro facilities of many other assemblers.
\H{pcsteps} \i{Preprocessor Expansions}
The input to the preprocessor is expanded in the following ways in the
order specified here.
\S{pcbackslash} \i{Continuation Line} Collapsing
The preprocessor first collapses all lines which end with a backslash
(\c{\\}) character into a single line. Thus:
\c %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \\
\c THIS_VALUE
@ -2177,8 +2200,122 @@ character into a single line. Thus:
will work like a single-line macro without the backslash-newline
sequence.
\IR{comment removal} comment, removal
\IR{comment removal} preprocessor, comment removal
\S{pccomment} \i{Comment Removal}
After concatenation, comments are removed.
\I{comment, syntax}\i{Comments}
begin with the character \c{;} unless contained
inside a quoted string or a handful of other special contexts.
\I{ccomment}Note that this is applied \e{after} \i{continuation lines}
are collapsed. This means that
\c add al,'\\' ; Add the ASCII code for \\
\c mov [ecx],al ; Save the character
will probably not do what you expect, as the second line will be
considered part of the preceding comment. Although this behavior is
sometimes confusing, it has been the behavior of NASM since the very
first version, and it is also the behavior of the C preprocessor.
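One way to sidestep this, shown here only as a sketch and not as an official recommendation, is to make sure the comment does not end with a backslash, for example by adding some text after it:
\c add al,'\\' ; Add the ASCII code for \\ (backslash)
\c mov [ecx],al ; Save the character
With the comment ending in a closing parenthesis, the first line is no longer a continuation line, so the second line should be assembled as a separate instruction.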
\S{pcline}\i\c{%line} directives
In this step, \i\c{%line} directives are processed. See \k{line}.
\S{pccond}\I{preprocessor conditionals}\I{preprocessor loops}
Conditionals, Loops and \i{Multi-Line Macro} Definitions
In this step, the following \i{preprocessor directives} are processed:
\b \i{Multi-line macro} definitions, specified by the \i\c{%macro} and
\i\c{%imacro} directives. The body of a multi-line macro is stored and
is not further expanded at this time. See \k{mlmacro}.
\b \i{Conditional assembly}, specified by the \i\c{%if} family of preprocessor
directives. Disabled parts of the source code are discarded and are not
further expanded. See \k{condasm}.
\b \i{Preprocessor loops}, specified by the \i\c{%rep} preprocessor
directive. A preprocessor loop is very similar to a multi-line macro
and as such the body is stored and is not further expanded at this
time. See \k{rep}.
These constructs are required to be balanced, so that the ending of a
block can be detected, but no further processing is done at this time;
stored blocks will be inserted at this step when they are expanded
(see below.)
It is specific to each directive to what extent \i{inline expansions}
and \i{detokenization} are performed for the arguments of the
directives.
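A minimal sketch of the constructs handled in this step; the \c{DEBUG} symbol is a hypothetical name used only for this example:
\c %ifdef DEBUG
\c         int3                    ; kept only if DEBUG is defined
\c %endif
\c %rep 4
\c         nop                     ; body is stored, then emitted four times
\c %endrep
Each opening directive is matched by its closing directive (\c{%endif}, \c{%endrep}), which is what allows the end of each block to be detected.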
\S{pcsmacro} \i{Inline expansions} and other \I{preprocessor directives}directives
In this step, the following expansions are performed on each line:
\b \i{Single-line macros} are expanded. See \k{slmacro}.
\b \i{Preprocessor functions} are expanded. See \k{ppfunc}.
\b If this line is the result of \i{multi-line macro} expansions (see
below), the parameters to that macro are expanded at this time. See
\k{mlmacro}.
\b \i{Macro indirection}, using the \i\c{%[]} construct, is expanded. See
\k{indmacro}.
\b Token \i{concatenation} is performed, using either the \i\c{%+}
operator (see \k{concat%+}) or implicitly (see \k{indmacro} and \k{concat}).
\b \i{Macro-local labels} are converted into unique strings. See
\k{maclocal}.
\b Remaining preprocessor \i{directives} are processed. It is specific
to each directive to what extent the above expansions or the ones
specified in \k{pcfinal} are performed on their arguments.
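As an illustration of the first of these expansions, here is a sketch using two hypothetical single-line macros, \c{base} and \c{slot}:
\c %define base    0x1000
\c %define slot(n) (base + (n)*4)
\c         mov     eax, [slot(2)]  ; expected to expand to mov eax, [(0x1000 + (2)*4)]
The expansion happens inline, within the line that uses the macro; no new lines of source are created.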
\S{pcmmacro} \i{Multi-Line Macro Expansion}
In this step, \i{multi-line macros} are expanded into new lines of
source, like the typical macro feature of many other assemblers. See
\k{mlmacro}.
After expansion, the newly injected lines of source are processed
starting with the step defined in \k{pccond}.
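The following sketch uses a hypothetical macro named \c{clear} to illustrate this: the \c{%ifidni} conditional in the macro body is only stored when the macro is defined, and is evaluated each time the macro is expanded.
\c %macro clear 1
\c %ifidni %1, eax
\c         xor     eax, eax
\c %else
\c         mov     %1, 0
\c %endif
\c %endmacro
\c         clear   eax             ; expected to inject: xor eax, eax
\c         clear   ebx             ; expected to inject: mov ebx, 0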
\S{pcfinal} \i{Detokenization}
In this step, the final line of source code is produced. It performs
the following operations:
\b Environment variables specified using the \i\c{%!} construct are
expanded. See \k{getenv}.
\b \i{Context-local labels} are expanded into unique strings. See
\k{ctxlocal}.
\b All tokens are converted to their text representation. Unlike the C
preprocessor, the NASM preprocessor does not insert whitespace between
adjacent tokens unless present in the source code. See \k{concat}.
The resulting line of text is sent either to the assembler or, if
running in preprocessor-only mode, to the output file (see \k{opt-E}),
prefixed if necessary by a newly inserted \i\c{%line} directive.
\H{slmacro} \i{Single-Line Macros}
Single-line macros are expanded inline, much like macros in the C
preprocessor.
\S{define} The Normal Way: \I\c{%idefine}\i\c{%define}
Single-line macros are defined using the \c{%define} preprocessor
@ -2528,6 +2665,8 @@ The expression passed to \c{%assign} is a \i{critical expression}
a relocatable reference such as a code or data address, or anything
involving a register).
See also the \i\c{%eval()} preprocessor function, \k{f_eval}.
\S{defstr} Defining Strings: \I\c{%idefstr}\i\c{%defstr}
@ -2549,6 +2688,8 @@ This can be used, for example, with the \c{%!} construct (see
\c %defstr PATH %!PATH ; The operating system PATH variable
See also the \i\c{%str()} preprocessor function, \k{f_str}.
\S{deftok} Defining Tokens: \I\c{%ideftok}\i\c{%deftok}
@ -2564,6 +2705,8 @@ is equivalent to
\c %define test TEST
See also the \i\c{%tok()} preprocessor function, \k{f_tok}.
\S{defalias} Defining Aliases: \I\c{%idefalias}\i\c{%defalias}
@ -2628,6 +2771,9 @@ or a numeric value) to a single-line macro. When producing a string
value, it may change the style of quoting of the input string or
strings, and possibly use \c{\\}-escapes inside \c{`}-quoted strings.
These directives are also available as \i{preprocessor functions}, see
\k{ppfunc}.
\S{strcat} \i{Concatenating Strings}: \i\c{%strcat}
The \c{%strcat} operator concatenates quoted strings and assigns them to
@ -2646,6 +2792,9 @@ Similarly:
The use of commas to separate strings is permitted but optional.
The corresponding preprocessor function is \c{%strcat()}, see
\k{f_strcat}.
\S{strlen} \i{String Length}: \i\c{%strlen}
@ -2665,6 +2814,9 @@ macro that expands to a string, as in the following example:
As in the first case, this would result in \c{charcnt} being
assigned the value of 9.
The corresponding preprocessor function is \c{%strlen()}, see
\k{f_strlen}.
\S{substr} \i{Extracting Substrings}: \i\c{%substr}
@ -2689,11 +2841,126 @@ values out of range result in an empty string. A negative length
means "until N-1 characters before the end of string", i.e. \c{-1}
means until end of string, \c{-2} until one character before, etc.
The corresponding preprocessor function is \c{%substr()}, see
\k{f_substr}.
\H{ppfunc} \i{Preprocessor Functions}
Preprocessor functions are, fundamentally, a kind of built-in
single-line macro. They expand to a string that depends on their
arguments, and can be used in any context where single-line macro
expansion would be performed. Preprocessor functions were introduced
in NASM 2.16.
\S{f_eval} \i\c{%eval()} Function
The \c{%eval()} function evaluates its argument as a numeric
expression in much the same way the \i\c{%assign} directive would, see
\k{assign}. Unlike \c{%assign}, \c{%eval()} supports more than one
argument; if more than one argument is specified, it is expanded to a
comma-separated list of values.
\c %assign a 2
\c %assign b 3
\c %defstr what %eval(a+b,a*b) ; equivalent to %define what "5,6"
The expressions passed to \c{%eval()} are \i{critical expressions},
see \k{crit}.
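Because a function can be used wherever single-line macro expansion takes place, \c{%eval()} can also appear directly in a line of source, without an intermediate \c{%assign}. A minimal sketch, with \c{n} as a hypothetical variable:
\c %assign n 4
\c         dd %eval(n*n)           ; expected to expand to dd 16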
\S{f_is} \i\c{%is()} Family Functions
Each \i\c{%if} family directive (see \k{condasm}) has an equivalent
\c{%is()} family function, that expands to \c{1} if the equivalent
\c{%if} directive would process as true, and \c{0} if the equivalent
\c{%if} directive would process as false.
\c ; Instead of !%isidn() could have used %isnidn()
\c %if %isdef(foo) && !%isidn(foo,bar)
\c db "foo is defined, but not as 'bar'"
\c %endif
Note that, being functions, the arguments (before expansion) will
always need to have balanced parentheses so that the end of the
argument list can be determined. This means that the syntax of
e.g. \c{%istoken()} and \c{%isidn()} is somewhat stricter than that of
their corresponding \c{%if} directives; it may be necessary to escape
the argument to the conditional using \c{\{\}}:
\c ; Instead of !%isidn() could have used %isnidn()
\c %if %isdef(foo) && !%isidn({foo,)})
\c db "foo is defined, but not as ')'"
\c %endif
\S{f_str} \i\c{%str()} Function
The \c{%str()} function converts its argument, including any commas,
to a quoted string, similar to the way the \i\c{%defstr} directive
would, see \k{defstr}.
Being a function, the argument will need to have balanced parentheses
or be escaped using \c{\{\}}.
\c ; The following lines are all equivalent
\c %define test 'TEST'
\c %defstr test TEST
\c %define test %str(TEST)
\S{f_strcat} \i\c{%strcat()} Function
The \c{%strcat()} function concatenates a list of quoted strings, in
the same way the \i\c{%strcat} directive would, see \k{strcat}.
\c ; The following lines are all equivalent
\c %define alpha 'Alpha: 12" screen'
\c %strcat alpha "Alpha: ", '12" screen'
\c %define alpha %strcat("Alpha: ", '12" screen')
\S{f_strlen} \i\c{%strlen()} Function
The \c{%strlen()} function expands to the length of a quoted string,
in the same way the \i\c{%strlen} directive would, see \k{strlen}.
\c ; The following lines are all equivalent
\c %define charcnt 9
\c %strlen charcnt 'my string'
\c %define charcnt %strlen('my string')
\S{f_substr} \i\c{%substr()} Function
The \c{%substr()} function extracts a substring of a quoted string, in
the same way the \i\c{%substr} directive would, see \k{substr}. Note
that unlike the \c{%substr} directive, a comma is required after the
string argument.
\c ; The following lines are all equivalent
\c %define mychar 'yzw'
\c %substr mychar 'xyzw' 2,-1
\c %define mychar %substr('xyzw',2,-1)
\S{f_tok} \i\c{%tok()} Function
The \c{%tok()} function converts a quoted string into a sequence of
tokens, in the same way the \i\c{%deftok} directive would, see
\k{deftok}.
\c ; The following lines are all equivalent
\c %define test TEST
\c %deftok test 'TEST'
\c %define test %tok('TEST')
\H{mlmacro} \i{Multi-Line Macros}: \I\c{%imacro}\i\c{%macro}
Multi-line macros are much more like the type of macro seen in MASM
and TASM: a multi-line macro definition in NASM looks something like
Multi-line macros are much like the type of macro seen in MASM
and TASM, and expand to a new set of lines of source code.
A multi-line macro definition in NASM looks something like
this.
\c %macro prologue 1
@ -4614,6 +4881,7 @@ It is still possible to turn in on again by
Note that \c{SECTALIGN <ON|OFF>} affects only the \c{ALIGN}/\c{ALIGNB} directives,
not an explicit \c{SECTALIGN} directive.
\C{macropkg} \i{Standard Macro Packages}
The \i\c{%use} directive (see \k{use}) includes one of the standard