doc: document preprocessor functions

Add documentation for preprocessor functions, as well as the flow of preprocessor expansion.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2025-02-17 17:19:35 +08:00 · 2022-11-11 20:25:49 -08:00 · 2022-11-11 20:25:49 -08:00 · 392b2b18a0
commit 392b2b18a0
parent 3fe5b3f5a1
2 changed files with 289 additions and 17 deletions
--- a/doc/changes.src
+++ b/doc/changes.src
@ -20,6 +20,10 @@ filename information anyway.
 \b Fix handling of MASM-syntax reserved memory (e.g. \c{dw ?}) when
 used in structure definitions.

+\b The preprocessor now supports functions, which can be less verbose
+and more convenient than the equivalent code implemented using
+directives. See \k{ppfunc}.
+

 \S{cl-2.15.06} Version 2.15.06

--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@ -1,7 +1,7 @@
 \# --------------------------------------------------------------------------
 \#
 \#   Copyright 1996-2022 The NASM Authors - All Rights Reserved
-\M{year}{1996-2020}
+\M{year}{1996-2022}
 \#   See the file AUTHORS included with the NASM distribution for
 \#   the specific copyright holders.
 \#
@ -84,7 +84,8 @@
 \IR{-w} \c{-w} option
 \IR{-Z} \c{-Z} option
 \IR{!=} \c{!=} operator
-\IR{$, here} \c{$}, Here token
+\IR{$, here} \c{$}, current address
+\IR{$, here} here token
 \IR{$, prefix} \c{$}, prefix
 \IR{$$} \c{$$} token
 \IR{%} \c{%} operator
@ -118,7 +119,6 @@
 \IR{^^} \c{^^} operator
 \IR{|} \c{|} operator
 \IR{||} \c{||} operator
-\IR{~} \c{~} operator
 \IR{%$} \c{%$} and \c{%$$} prefixes
 \IA{%$$}{%$}
 \IR{+ opaddition} \c{+} operator, binary
@ -127,6 +127,8 @@
 \IR{- opsubtraction} \c{-} operator, binary
 \IR{- opunary} \c{-} operator, unary
 \IR{! opunary} \c{!} operator
+\IA{~}{~ opunary}
+\IR{~ opunary} \c{~} operator
 \IA{A16}{a16}
 \IA{A32}{a32}
 \IA{A64}{a64}
@ -153,12 +155,16 @@ variables
 \IR{c calling convention} C calling convention
 \IR{c symbol names} C symbol names
 \IA{critical expressions}{critical expression}
-\IA{command line}{command-line}
+\IA{command-line}{command line}
+\IA{comments}{comment}
+\IR{ccomment} comment, ending in \c{\\}
 \IA{case sensitivity}{case sensitive}
 \IA{case-sensitive}{case sensitive}
 \IA{case-insensitive}{case sensitive}
 \IA{character constants}{character constant}
 \IR{codeview debugging format} CodeView debugging format
+\IR{continuation line} continuation line
+\IR{continuation line} preprocessor, continuation line
 \IR{common object file format} Common Object File Format
 \IR{common variables, alignment in elf} common variables, alignment in ELF
 \IR{common, elf extensions to} \c{COMMON}, ELF extensions to
@ -170,8 +176,8 @@ variables
 \IR{dll symbols, exporting} DLL symbols, exporting
 \IR{dll symbols, importing} DLL symbols, importing
 \IR{dos} DOS
-\IA{effective address}{effective addresses}
-\IA{effective-address}{effective addresses}
+\IA{effective addresses}{effective address}
+\IA{effective-address}{effective address}
 \IR{elf} ELF
 \IR{elf, 16-bit code} ELF, 16-bit code
 \IR{elf, debug formats} ELF, debug formats
@ -241,9 +247,13 @@ variables
 \IR{plt} PLT
 \IR{plt} \c{PLT} relocations
 \IA{pre-defining macros}{pre-define}
-\IA{preprocessor expressions}{preprocessor, expressions}
-\IA{preprocessor loops}{preprocessor, loops}
-\IA{preprocessor variables}{preprocessor, variables}
+\IR{preprocessor conditionals} preprocessor, conditionals
+\IR{preprocessor expansions} preprocessor, expansions
+\IR{preprocessor expressions} preprocessor, expressions
+\IR{preprocessor loops} preprocessor, loops
+\IR{preprocessor variables} preprocessor, variables
+\IR{preprocessor variables} variables, preprocessor
+\IA{comments}{comment}
 \IR{relocations, pic-specific} relocations, PIC-specific
 \IA{repeating}{repeating code}
 \IR{section alignment, in elf} section alignment, in ELF
@ -1164,9 +1174,9 @@ is a macro, a preprocessor directive or an assembler directive: see
 \c label:    instruction operands        ; comment

 As usual, most of these fields are optional; the presence or absence
-of any combination of a label, an instruction and a comment is allowed.
-Of course, the operand field is either required or forbidden by the
-presence and nature of the instruction field.
+of any combination of a label, an instruction and a \i{comment} is
+allowed.  Of course, the operand field is either required or forbidden
+by the presence and nature of the instruction field.

 NASM uses backslash (\\) as the line continuation character; if a line
 ends with backslash, the next line is considered to be a part of the
@ -2166,10 +2176,23 @@ NASM contains a powerful \i{macro processor}, which supports
 conditional assembly, multi-level file inclusion, two forms of macro
 (single-line and multi-line), and a `context stack' mechanism for
 extra macro power. Preprocessor directives all begin with a \c{%}
-sign.
+sign. As a result, some care needs to be taken when using the \c{%}
+arithmetic operator to avoid it being confused with a preprocessor
+directive; it is recommended that it always be surrounded by
+whitespace.

-The preprocessor collapses all lines which end with a backslash (\\)
-character into a single line.  Thus:
+The NASM preprocessor borrows concepts from both the C preprocessor
+and the macro facilities of many other assemblers.
+
+\H{pcsteps} \i{Preprocessor Expansions}
+
+The input to the preprocessor is expanded in the following ways in the
+order specified here.
+
+\S{pcbackslash} \i{Continuation Line} Collapsing
+
+The preprocessor first collapses all lines which end with a backslash
+(\c{\\}) character into a single line.  Thus:

 \c %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \\
 \c         THIS_VALUE
@ -2177,8 +2200,122 @@ character into a single line.  Thus:
 will work like a single-line macro without the backslash-newline
 sequence.

+\IR{comment removal} comment, removal
+\IR{comment removal} preprocessor, comment removal
+
+\S{pccomment} \i{Comment Removal}
+
+After concatenation, comments are removed.
+\I{comment, syntax}\i{Comments}
+begin with the character \c{;} unless contained
+inside a quoted string or a handful of other special contexts.
+
+\I{ccomment}Note that this is applied \e{after} \i{continuation lines}
+are collapsed. This means that
+
+\c       add al,'\\'	 ; Add the ASCII code for \\
+\c	 mov [ecx],al	; Save the character
+
+will probably not do what you expect, as the second line will be
+considered part of the preceding comment. Although this behavior is
+sometimes confusing, it is both the behavior of NASM since the very
+first version as well as the behavior of the C preprocessor.
+
+
+\S{pcline}\i\c{%line} directives
+
+In this step, \i\c{%line} directives are processed. See \k{line}.
+
+
+\S{pccond}\I{preprocessor conditionals}\I{preprocessor loops}
+Conditionals, Loops and \i{Multi-Line Macro} Definitions
+
+In this step, the following \i{preprocessor directives} are processed:
+
+\b \i{Multi-line macro} definitions, specified by the \i\c{%macro} and
+\i\c{%imacro} directives. The body of a multi-line macro is stored and
+is not further expanded at this time. See \k{mlmacro}.
+
+\b \i{Conditional assembly}, specified by the \i\c{%if} family of preprocessor
+directives. Disabled parts of the source code are discarded and are not
+further expanded. See \k{condasm}.
+
+\b \i{Preprocessor loops}, specified by the \i\c{%rep} preprocessor
+directive. A preprocessor loop is very similar to a multi-line macro
+and as such the body is stored and is not further expanded at this
+time. See \k{rep}.
+
+These constructs are required to be balanced, so that the ending of a
+block can be detected, but no further processing is done at this time;
+stored blocks will be inserted at this step when they are expanded
+(see below.)
+
+It is specific to each directive to what extent \i{inline expansions}
+and \i{detokenization} are performed for the arguments of the
+directives.
+
+
+\S{pcsmacro} \i{Inline expansions} and other \I{preprocessor directives}directives
+
+In this step, the following expansions are performed on each line:
+
+\b \i{Single-line macros} are expanded. See \k{slmacro}.
+
+\b \i{Preprocessor functions} are expanded. See \k{ppfunc}.
+
+\b If this line is the result of \i{multi-line macro} expansions (see
+below), the parameters to that macro are expanded at this time. See
+\k{mlmacro}.
+
+\b \i{Macro indirection}, using the \i\c{%[]} construct, is expanded. See
+\k{indmacro}.
+
+\b Token \i{concatenation} using either the \i\c{%+} operator (see
+\k{concat%+}) or implicitly (see \k{indmacro} and \k{concat}.)
+
+\b \i{Macro-local labels} are converted into unique strings, see
+\k{maclocal}.
+
+\b Remaining preprocessor \i{directives} are processed. It is specific
+to each directive to what extent the above expansions or the ones
+specified in \k{pcfinal} are performed on their arguments.
+
+
+\S{pcmmacro} \i{Multi-Line Macro Expansion}
+
+In this step, \i{multi-line macros} are expanded into new lines of
+source, like the typical macro feature of many other assemblers. See
+\k{mlmacro}.
+
+After expansion, the newly injected lines of source are processed
+starting with the step defined in \k{pccond}.
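+
+For example, in the following minimal sketch (the macro name \c{pad}
+is hypothetical), the \c{%if} inside the macro body is not evaluated
+when the macro is defined, but each time the expanded lines are
+processed:
+
+\c %macro pad 1
+\c   %if %1 > 0
+\c         times %1 db 0
+\c   %endif
+\c %endmacro
+\c         pad 4           ; emits: times 4 db 0
+\c         pad 0           ; emits nothing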
+
+
+\S{pcfinal} \i{Detokenization}
+
+In this step, the final line of source code is produced. It performs
+the following operations:
+
+\b Environment variables specified using the \i\c{%!} construct are
+expanded. See \k{ctxlocal}.
+
+\b \i{Context-local labels} are expanded into unique strings. See
+\k{ctxlocal}.
+
+\b All tokens are converted to their text representation. Unlike the C
+preprocessor, the NASM preprocessor does not insert whitespace between
+adjacent tokens unless present in the source code. See \k{concat}.
+
+The resulting line of text is sent either to the assembler or, if
+running in preprocessor-only mode, to the output file (see \k{opt-E}),
+prefixed if necessary by a newly inserted \i\c{%line} directive.
+
+
 \H{slmacro} \i{Single-Line Macros}

+Single-line macros are expanded inline, much like macros in the C
+preprocessor.
+
 \S{define} The Normal Way: \I\c{%idefine}\i\c{%define}

 Single-line macros are defined using the \c{%define} preprocessor
@ -2528,6 +2665,8 @@ The expression passed to \c{%assign} is a \i{critical expression}
 a relocatable reference such as a code or data address, or anything
 involving a register).

+See also the \i\c{%eval()} preprocessor function, \k{f_eval}.
+

 \S{defstr} Defining Strings: \I\c{%idefstr}\i\c{%defstr}

@ -2549,6 +2688,8 @@ This can be used, for example, with the \c{%!} construct (see

 \c %defstr PATH %!PATH          ; The operating system PATH variable

+See also the \i\c{%str()} preprocessor function, \k{f_str}.
+

 \S{deftok} Defining Tokens: \I\c{%ideftok}\i\c{%deftok}

@ -2564,6 +2705,8 @@ is equivalent to

 \c %define test TEST

+See also the \i\c{%tok()} preprocessor function, \k{f_tok}.
+

 \S{defalias} Defining Aliases: \I\c{%idefalias}\i\c{%defalias}

@ -2628,6 +2771,9 @@ or a numeric value) to a single-line macro.  When producing a string
 value, it may change the style of quoting of the input string or
 strings, and possibly use \c{\\}-escapes inside \c{`}-quoted strings.

+These directives are also available as \i{preprocessor functions}, see
+\k{ppfunc}.
+
 \S{strcat} \i{Concatenating Strings}: \i\c{%strcat}

 The \c{%strcat} operator concatenates quoted strings and assign them to
@ -2646,6 +2792,9 @@ Similarly:

 The use of commas to separate strings is permitted but optional.

+The corresponding preprocessor function is \c{%strcat()}, see
+\k{f_strcat}.
+

 \S{strlen} \i{String Length}: \i\c{%strlen}

@ -2665,6 +2814,9 @@ macro that expands to a string, as in the following example:
 As in the first case, this would result in \c{charcnt} being
 assigned the value of 9.

+The corresponding preprocessor function is \c{%strlen()}, see
+\k{f_strlen}.
+

 \S{substr} \i{Extracting Substrings}: \i\c{%substr}

@ -2689,11 +2841,126 @@ values out of range result in an empty string.  A negative length
 means "until N-1 characters before the end of string", i.e. \c{-1}
 means until end of string, \c{-2} until one character before, etc.

+The corresponding preprocessor function is \c{%substr()}, see
+\k{f_substr}.
+
+
+\H{ppfunc} \i{Preprocessor Functions}
+
+Preprocessor functions are, fundamentally, a kind of built-in
+single-line macro. They expand to a string depending on their
+arguments, and can be used in any context where single-line macro
+expansion would be performed. Preprocessor functions were introduced
+in NASM 2.16.
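+
+As a minimal sketch of the difference (the macro name \c{len} and the
+string are arbitrary examples), a length-prefixed string can be
+written with the \c{%strlen} directive and a temporary macro, or with
+the \c{%strlen()} function expanded inline:
+
+\c %strlen len 'my string'
+\c         db len, 'my string'                  ; directive and temporary macro
+\c         db %strlen('my string'), 'my string' ; function, expanded inline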
+
+\S{f_eval} \i\c{%eval()} Function
+
+The \c{%eval()} function evaluates its argument as a numeric
+expression in much the same way the \i\c{%assign} directive would, see
+\k{assign}. Unlike \c{%assign}, \c{%eval()} supports more than one
+argument; if more than one argument is specified, it is expanded to a
+comma-separated list of values.
+
+\c %assign a    2
+\c %assign b    3
+\c %defstr what %eval(a+b,a*b)	; equivalent to %define what "5,6"
+
+The expressions passed to \c{%eval()} are \i{critical expressions},
+see \k{crit}.
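+
+A single-argument call expands to a single number, which can be used
+directly in the source line. In this minimal sketch, \c{n} is an
+arbitrary example name:
+
+\c %assign n 4
+\c         times %eval(n*2) db 0   ; expands to: times 8 db 0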
+
+
+\S{f_is} \i\c{%is()} Family Functions
+
+Each \i\c{%if} family directive (see \k{condasm}) has an equivalent
+\c{%is()} family function, that expands to \c{1} if the equivalent
+\c{%if} directive would process as true, and \c{0} if the equivalent
+\c{%if} directive would process as false.
+
+\c ; Instead of !%isidn() could have used %isnidn()
+\c %if %isdef(foo) && !%isidn(foo,bar)
+\c       db "foo is defined, but not as 'bar'"
+\c %endif
+
+Note that, being functions, the arguments (before expansion) will
+always need to have balanced parentheses so that the end of the
+argument list can be determined. This means that the syntax of
+e.g. \c{%istoken()} and \c{%isidn()} is somewhat stricter than their
+corresponding \c{%if} directives; it may be necessary to escape the
+argument to the conditional using \c{\{\}}:
+
+\c ; Instead of !%isidn() could have used %isnidn()
+\c %if %isdef(foo) && !%isidn({foo,)})
+\c       db "foo is defined, but not as ')'"
+\c %endif
+
+
+\S{f_str} \i\c{%str()} Function
+
+The \c{%str()} function converts its argument, including any commas,
+to a quoted string, similar to the way the \i\c{%defstr} directive
+would, see \k{defstr}.
+
+Being a function, the argument will need to have balanced parentheses
+or be escaped using \c{\{\}}.
+
+\c ; The following lines are all equivalent
+\c %define test 'TEST'
+\c %defstr test TEST
+\c %define test %str(TEST)
+
+
+\S{f_strcat} \i\c{%strcat()} Function
+
+The \c{%strcat()} function concatenates a list of quoted strings, in
+the same way the \i\c{%strcat} directive would, see \k{strcat}.
+
+\c ; The following lines are all equivalent
+\c %define alpha 'Alpha: 12" screen'
+\c %strcat alpha "Alpha: ", '12" screen'
+\c %define alpha %strcat("Alpha: ", '12" screen')
+
+
+\S{f_strlen} \i\c{%strlen()} Function
+
+The \c{%strlen()} function expands to the length of a quoted string,
+in the same way the \i\c{%strlen} directive would, see \k{strlen}.
+
+\c ; The following lines are all equivalent
+\c %define charcnt 9
+\c %strlen charcnt 'my string'
+\c %define charcnt %strlen('my string')
+
+
+\S{f_substr} \i\c{%substr()} Function
+
+The \c{%substr()} function extracts a substring of a quoted string, in
+the same way the \i\c{%substr} directive would, see \k{substr}. Note
+that unlike the \c{%substr} directive, a comma is required after the
+string argument.
+
+\c ; The following lines are all equivalent
+\c %define mychar 'yzw'
+\c %substr mychar 'xyzw' 2,-1
+\c %define mychar %substr('xyzw',2,-1)
+
+
+\S{f_tok} \i\c{%tok()} Function
+
+The \c{%tok()} function converts a quoted string into a sequence of
+tokens, in the same way the \i\c{%deftok} directive would, see
+\k{deftok}.
+
+\c ; The following lines are all equivalent
+\c %define test TEST
+\c %deftok test 'TEST'
+\c %define test %tok('TEST')
+

 \H{mlmacro} \i{Multi-Line Macros}: \I\c{%imacro}\i\c{%macro}

-Multi-line macros are much more like the type of macro seen in MASM
-and TASM: a multi-line macro definition in NASM looks something like
+Multi-line macros are much like the type of macro seen in MASM
+and TASM, and expand to a new set of lines of source code.
+A multi-line macro definition in NASM looks something like
 this.

 \c %macro  prologue 1
@ -4614,6 +4881,7 @@ It is still possible to turn in on again by
 Note that \c{SECTALIGN <ON|OFF>} affects only the \c{ALIGN}/\c{ALIGNB} directives,
 not an explicit \c{SECTALIGN} directive.

+
 \C{macropkg} \i{Standard Macro Packages}

 The \i\c{%use} directive (see \k{use}) includes one of the standard