diff --git a/doc/changes.src b/doc/changes.src index 4c45f18b..accc5058 100644 --- a/doc/changes.src +++ b/doc/changes.src @@ -20,6 +20,10 @@ filename information anyway. \b Fix handling of MASM-syntax reserved memory (e.g. \c{dw ?}) when used in structure definitions. +\b The preprocessor now supports functions, which can be less verbose +and more convenient than the equivalent code implemented using +directives. See \k{ppfunc}. + \S{cl-2.15.06} Version 2.15.06 diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src index f2756072..d850df9e 100644 --- a/doc/nasmdoc.src +++ b/doc/nasmdoc.src @@ -1,7 +1,7 @@ \# -------------------------------------------------------------------------- \# \# Copyright 1996-2022 The NASM Authors - All Rights Reserved -\M{year}{1996-2020} +\M{year}{1996-2022} \# See the file AUTHORS included with the NASM distribution for \# the specific copyright holders. \# @@ -84,7 +84,8 @@ \IR{-w} \c{-w} option \IR{-Z} \c{-Z} option \IR{!=} \c{!=} operator -\IR{$, here} \c{$}, Here token +\IR{$, here} \c{$}, current address +\IR{$, here} here token \IR{$, prefix} \c{$}, prefix \IR{$$} \c{$$} token \IR{%} \c{%} operator @@ -118,7 +119,6 @@ \IR{^^} \c{^^} operator \IR{|} \c{|} operator \IR{||} \c{||} operator -\IR{~} \c{~} operator \IR{%$} \c{%$} and \c{%$$} prefixes \IA{%$$}{%$} \IR{+ opaddition} \c{+} operator, binary @@ -127,6 +127,8 @@ \IR{- opsubtraction} \c{-} operator, binary \IR{- opunary} \c{-} operator, unary \IR{! 
opunary} \c{!} operator +\IA{~}{~ opunary} +\IR{~ opunary} \c{~} operator \IA{A16}{a16} \IA{A32}{a32} \IA{A64}{a64} @@ -153,12 +155,16 @@ variables \IR{c calling convention} C calling convention \IR{c symbol names} C symbol names \IA{critical expressions}{critical expression} -\IA{command line}{command-line} +\IA{command-line}{command line} +\IA{comments}{comment} +\IR{ccomment} comment, ending in \c{\\} \IA{case sensitivity}{case sensitive} \IA{case-sensitive}{case sensitive} \IA{case-insensitive}{case sensitive} \IA{character constants}{character constant} \IR{codeview debugging format} CodeView debugging format +\IR{continuation line} continuation line +\IR{continuation line} preprocessor, continuation line \IR{common object file format} Common Object File Format \IR{common variables, alignment in elf} common variables, alignment in ELF \IR{common, elf extensions to} \c{COMMON}, ELF extensions to @@ -170,8 +176,8 @@ variables \IR{dll symbols, exporting} DLL symbols, exporting \IR{dll symbols, importing} DLL symbols, importing \IR{dos} DOS -\IA{effective address}{effective addresses} -\IA{effective-address}{effective addresses} +\IA{effective addresses}{effective address} +\IA{effective-address}{effective address} \IR{elf} ELF \IR{elf, 16-bit code} ELF, 16-bit code \IR{elf, debug formats} ELF, debug formats @@ -241,9 +247,13 @@ variables \IR{plt} PLT \IR{plt} \c{PLT} relocations \IA{pre-defining macros}{pre-define} -\IA{preprocessor expressions}{preprocessor, expressions} -\IA{preprocessor loops}{preprocessor, loops} -\IA{preprocessor variables}{preprocessor, variables} +\IR{preprocessor conditionals} preprocessor, conditionals +\IR{preprocessor expansions} preprocessor, expansions +\IR{preprocessor expressions} preprocessor, expressions +\IR{preprocessor loops} preprocessor, loops +\IR{preprocessor variables} preprocessor, variables +\IR{preprocessor variables} variables, preprocessor +\IA{comments}{comment} \IR{relocations, pic-specific} relocations, 
PIC-specific \IA{repeating}{repeating code} \IR{section alignment, in elf} section alignment, in ELF @@ -1164,9 +1174,9 @@ is a macro, a preprocessor directive or an assembler directive: see \c label: instruction operands ; comment As usual, most of these fields are optional; the presence or absence -of any combination of a label, an instruction and a comment is allowed. -Of course, the operand field is either required or forbidden by the -presence and nature of the instruction field. +of any combination of a label, an instruction and a \i{comment} is +allowed. Of course, the operand field is either required or forbidden +by the presence and nature of the instruction field. NASM uses backslash (\\) as the line continuation character; if a line ends with backslash, the next line is considered to be a part of the @@ -2166,10 +2176,23 @@ NASM contains a powerful \i{macro processor}, which supports conditional assembly, multi-level file inclusion, two forms of macro (single-line and multi-line), and a `context stack' mechanism for extra macro power. Preprocessor directives all begin with a \c{%} -sign. +sign. As a result, some care needs to be taken when using the \c{%} +arithmetic operator to avoid it being confused with a preprocessor +directive; it is recommended that it always be surrounded by +whitespace. -The preprocessor collapses all lines which end with a backslash (\\) -character into a single line. Thus: +The NASM preprocessor borrows concepts from both the C preprocessor +and the macro facilities of many other assemblers. + +\H{pcsteps} \i{Preprocessor Expansions} + +The input to the preprocessor is expanded in the following ways in the +order specified here. + +\S{pcbackslash} \i{Continuation Line} Collapsing + +The preprocessor first collapses all lines which end with a backslash +(\c{\\}) character into a single line. Thus: \c %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \\ \c THIS_VALUE @@ -2177,8 +2200,122 @@ character into a single line. 
Thus: will work like a single-line macro without the backslash-newline sequence. +\IR{comment removal} comment, removal +\IR{comment removal} preprocessor, comment removal + +\S{pccomment} \i{Comment Removal} + +After concatenation, comments are removed. +\I{comment, syntax}\i{Comments} +begin with the character \c{;} unless contained +inside a quoted string or a handful of other special contexts. + +\I{ccomment}Note that this is applied \e{after} \i{continuation lines} +are collapsed. This means that + +\c add al,'\\' ; Add the ASCII code for \\ +\c mov [ecx],al ; Save the character + +will probably not do what you expect, as the second line will be +considered part of the preceding comment. Although this behavior is +sometimes confusing, it is both the behavior of NASM since the very +first version and the behavior of the C preprocessor. + + +\S{pcline}\i\c{%line} directives + +In this step, \i\c{%line} directives are processed. See \k{line}. + + +\S{pccond}\I{preprocessor conditionals}\I{preprocessor loops} +Conditionals, Loops and \i{Multi-Line Macro} Definitions + +In this step, the following \i{preprocessor directives} are processed: + +\b \i{Multi-line macro} definitions, specified by the \i\c{%macro} and +\i\c{%imacro} directives. The body of a multi-line macro is stored and +is not further expanded at this time. See \k{mlmacro}. + +\b \i{Conditional assembly}, specified by the \i\c{%if} family of preprocessor +directives. Disabled parts of the source code are discarded and are not +further expanded. See \k{condasm}. + +\b \i{Preprocessor loops}, specified by the \i\c{%rep} preprocessor +directive. A preprocessor loop is very similar to a multi-line macro +and as such the body is stored and is not further expanded at this +time. See \k{rep}. 
+ +These constructs are required to be balanced, so that the ending of a +block can be detected, but no further processing is done at this time; +stored blocks will be inserted at this step when they are expanded +(see below.) + +It is specific to each directive to what extent \i{inline expansions} +and \i{detokenization} are performed for the arguments of the +directives. + + +\S{pcsmacro} \i{Inline expansions} and other \I{preprocessor directives}directives + +In this step, the following expansions are performed on each line: + +\b \i{Single-line macros} are expanded. See \k{slmacro}. + +\b \i{Preprocessor functions} are expanded. See \k{ppfunc}. + +\b If this line is the result of \i{multi-line macro} expansions (see +below), the parameters to that macro are expanded at this time. See +\k{mlmacro}. + +\b \i{Macro indirection}, using the \i\c{%[]} construct, is expanded. See +\k{indmacro}. + +\b Token \i{concatenation} is performed, using either the \i\c{%+} operator (see +\k{concat%+}) or implicitly (see \k{indmacro} and \k{concat}.) + +\b \i{Macro-local labels} are converted into unique strings, see +\k{maclocal}. + +\b Remaining preprocessor \i{directives} are processed. It is specific +to each directive to what extent the above expansions or the ones +specified in \k{pcfinal} are performed on their arguments. + + +\S{pcmmacro} \i{Multi-Line Macro Expansion} + +In this step, \i{multi-line macros} are expanded into new lines of +source, like the typical macro feature of many other assemblers. See +\k{mlmacro}. + +After expansion, the newly injected lines of source are processed +starting with the step defined in \k{pccond}. + + +\S{pcfinal} \i{Detokenization} + +In this step, the final line of source code is produced. It performs +the following operations: + +\b Environment variables specified using the \i\c{%!} construct are +expanded. See \k{ctxlocal}. + +\b \i{Context-local labels} are expanded into unique strings. See +\k{ctxlocal}. 
+ +\b All tokens are converted to their text representation. Unlike the C +preprocessor, the NASM preprocessor does not insert whitespace between +adjacent tokens unless present in the source code. See \k{concat}. + +The resulting line of text either is sent to the assembler, or, if +running in preprocessor-only mode, to the output file (see \k{opt-E}); +if necessary prefixed by a newly inserted \i\c{%line} directive. + + \H{slmacro} \i{Single-Line Macros} +Single-line macros are expanded inline, much like macros in the C +preprocessor. + \S{define} The Normal Way: \I\c{%idefine}\i\c{%define} Single-line macros are defined using the \c{%define} preprocessor @@ -2528,6 +2665,8 @@ The expression passed to \c{%assign} is a \i{critical expression} a relocatable reference such as a code or data address, or anything involving a register). +See also the \i\c{%eval()} preprocessor function, \k{f_eval}. + \S{defstr} Defining Strings: \I\c{%idefstr}\i\c{%defstr} @@ -2549,6 +2688,8 @@ This can be used, for example, with the \c{%!} construct (see \c %defstr PATH %!PATH ; The operating system PATH variable +See also the \i\c{%str()} preprocessor function, \k{f_str}. + \S{deftok} Defining Tokens: \I\c{%ideftok}\i\c{%deftok} @@ -2564,6 +2705,8 @@ is equivalent to \c %define test TEST +See also the \i\c{%tok()} preprocessor function, \k{f_tok}. + \S{defalias} Defining Aliases: \I\c{%idefalias}\i\c{%defalias} @@ -2628,6 +2771,9 @@ or a numeric value) to a single-line macro. When producing a string value, it may change the style of quoting of the input string or strings, and possibly use \c{\\}-escapes inside \c{`}-quoted strings. +These directives are also available as \i{preprocessor functions}, see +\k{ppfunc}. + \S{strcat} \i{Concatenating Strings}: \i\c{%strcat} The \c{%strcat} operator concatenates quoted strings and assign them to @@ -2646,6 +2792,9 @@ Similarly: The use of commas to separate strings is permitted but optional. 
+The corresponding preprocessor function is \c{%strcat()}, see +\k{f_strcat}. + \S{strlen} \i{String Length}: \i\c{%strlen} @@ -2665,6 +2814,9 @@ macro that expands to a string, as in the following example: As in the first case, this would result in \c{charcnt} being assigned the value of 9. +The corresponding preprocessor function is \c{%strlen()}, see +\k{f_strlen}. + \S{substr} \i{Extracting Substrings}: \i\c{%substr} @@ -2689,11 +2841,126 @@ values out of range result in an empty string. A negative length means "until N-1 characters before the end of string", i.e. \c{-1} means until end of string, \c{-2} until one character before, etc. +The corresponding preprocessor function is \c{%substr()}, see +\k{f_substr}. + + +\H{ppfunc} \i{Preprocessor Functions} + +Preprocessor functions are, fundamentally, a kind of built-in +single-line macro. Each expands to a string that depends on its +arguments, and can be used in any context where single-line macro +expansion would be performed. Preprocessor functions were introduced +in NASM 2.16. + +\S{f_eval} \i\c{%eval()} Function + +The \c{%eval()} function evaluates its argument as a numeric +expression in much the same way the \i\c{%assign} directive would, see +\k{assign}. Unlike \c{%assign}, \c{%eval()} supports more than one +argument; if more than one argument is specified, it is expanded to a +comma-separated list of values. + +\c %assign a 2 +\c %assign b 3 +\c %defstr what %eval(a+b,a*b) ; equivalent to %define what "5,6" + +The expressions passed to \c{%eval()} are \i{critical expressions}, +see \k{crit}. + + +\S{f_is} \i\c{%is()} Family Functions + +Each \i\c{%if} family directive (see \k{condasm}) has an equivalent +\c{%is()} family function, that expands to \c{1} if the equivalent +\c{%if} directive would process as true, and \c{0} if the equivalent +\c{%if} directive would process as false. 
+ +\c ; Instead of !%isidn() could have used %isnidn() +\c %if %isdef(foo) && !%isidn(foo,bar) +\c db "foo is defined, but not as 'bar'" +\c %endif + +Note that, being functions, the arguments (before expansion) will +always need to have balanced parentheses so that the end of the +argument list can be determined. This means that the syntax of +e.g. \c{%istoken()} and \c{%isidn()} is somewhat stricter than that of their +corresponding \c{%if} directives; it may be necessary to escape the +argument to the conditional using \c{\{\}}: + +\c ; Instead of !%isidn() could have used %isnidn() +\c %if %isdef(foo) && !%isidn(foo,{)}) +\c db "foo is defined, but not as ')'" +\c %endif + + +\S{f_str} \i\c{%str()} Function + +The \c{%str()} function converts its argument, including any commas, +to a quoted string, similar to the way the \i\c{%defstr} directive +would, see \k{defstr}. + +Being a function, the argument will need to have balanced parentheses +or be escaped using \c{\{\}}. + +\c ; The following lines are all equivalent +\c %define test 'TEST' +\c %defstr test TEST +\c %define test %str(TEST) + + +\S{f_strcat} \i\c{%strcat()} Function + +The \c{%strcat()} function concatenates a list of quoted strings, in +the same way the \i\c{%strcat} directive would, see \k{strcat}. + +\c ; The following lines are all equivalent +\c %define alpha 'Alpha: 12" screen' +\c %strcat alpha "Alpha: ", '12" screen' +\c %define alpha %strcat("Alpha: ", '12" screen') + + +\S{f_strlen} \i\c{%strlen()} Function + +The \c{%strlen()} function expands to the length of a quoted string, +in the same way the \i\c{%strlen} directive would, see \k{strlen}. + +\c ; The following lines are all equivalent +\c %define charcnt 9 +\c %strlen charcnt 'my string' +\c %define charcnt %strlen('my string') + + +\S{f_substr} \i\c{%substr()} Function + +The \c{%substr()} function extracts a substring of a quoted string, in +the same way the \i\c{%substr} directive would, see \k{substr}. 
Note +that unlike the \c{%substr} directive, a comma is required after the +string argument. + +\c ; The following lines are all equivalent +\c %define mychar 'yzw' +\c %substr mychar 'xyzw' 2,-1 +\c %define mychar %substr('xyzw',2,-1) + + +\S{f_tok} \i\c{%tok()} Function + +The \c{%tok()} function converts a quoted string into a sequence of +tokens, in the same way the \i\c{%deftok} directive would, see +\k{deftok}. + +\c ; The following lines are all equivalent +\c %define test TEST +\c %deftok test 'TEST' +\c %define test %tok('TEST') + \H{mlmacro} \i{Multi-Line Macros}: \I\c{%imacro}\i\c{%macro} -Multi-line macros are much more like the type of macro seen in MASM -and TASM: a multi-line macro definition in NASM looks something like +Multi-line macros are much like the type of macro seen in MASM +and TASM, and expand to a new set of lines of source code. +A multi-line macro definition in NASM looks something like this. \c %macro prologue 1 @@ -4614,6 +4881,7 @@ It is still possible to turn in on again by Note that \c{SECTALIGN } affects only the \c{ALIGN}/\c{ALIGNB} directives, not an explicit \c{SECTALIGN} directive. + \C{macropkg} \i{Standard Macro Packages} The \i\c{%use} directive (see \k{use}) includes one of the standard