From d83bb9f778da2a861a6aebd527b61ec6c495d494 Mon Sep 17 00:00:00 2001 From: Neil Booth Date: Sat, 5 Jan 2002 10:00:24 +0000 Subject: [PATCH] * doc/cppinternals.texi: Update. From-SVN: r48562 --- gcc/ChangeLog | 4 + gcc/doc/cppinternals.texi | 150 ++++++++++++++++++++++++++++++++++---- 2 files changed, 141 insertions(+), 13 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index c4b13aa255c..ebcf344b534 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +2002-01-05 Neil Booth + + * doc/cppinternals.texi: Update. + 2002-01-05 Hans-Peter Nilsson * doc/invoke.texi (Option Summary) : Document diff --git a/gcc/doc/cppinternals.texi b/gcc/doc/cppinternals.texi index a831922a51e..df036380cb2 100644 --- a/gcc/doc/cppinternals.texi +++ b/gcc/doc/cppinternals.texi @@ -16,7 +16,7 @@ @ifinfo This file documents the internals of the GNU C Preprocessor. -Copyright 2000, 2001 Free Software Foundation, Inc. +Copyright 2000, 2001, 2002 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice @@ -68,7 +68,7 @@ into another language, under the above conditions for modified versions. @node Top @top -@chapter Cpplib---the core of the GNU C Preprocessor +@chapter Cpplib---the GNU C Preprocessor The GNU C preprocessor in GCC 3.x has been completely rewritten. It is now implemented as a library, @dfn{cpplib}, so it can be easily shared between @@ -469,26 +469,26 @@ enum stored in its hash node, so that directive lookup is also O(1). @unnumbered Macro Expansion Algorithm @cindex macro expansion -Macro expansion is a surprisingly tricky operation, fraught with nasty -corner cases and situations that render what you thought was a nifty -way to optimize the preprocessor's expansion algorithm wrong in quite -subtle ways. +Macro expansion is a tricky operation, fraught with nasty corner cases +and situations that render what you thought was a nifty way to +optimize the preprocessor's expansion algorithm wrong in quite subtle +ways. I strongly recommend you have a good grasp of how the C and C++ standards require macros to be expanded before diving into this section, let alone the code!. If you don't have a clear mental picture of how things like nested macro expansion, stringification and -token pasting are supposed to work, damage to you sanity can quickly +token pasting are supposed to work, damage to your sanity can quickly result. -@section Internal representation of Macros +@section Internal representation of macros @cindex macro representation (internal) The preprocessor stores macro expansions in tokenized form. This saves repeated lexing passes during expansion, at the cost of a small increase in memory consumption on average. The tokens are stored contiguously in memory, so a pointer to the first one and a token -count is all we need. +count is all you need to get the replacement list of a macro. If the macro is a function-like macro the preprocessor also stores its parameters, in the form of an ordered list of pointers to the hash @@ -502,13 +502,137 @@ the original parameters to the macro, both for dumping with e.g., @option{-dD}, and to warn about non-trivial macro redefinitions when the parameter names have changed. -@section Nested object-like macros +@section Macro expansion overview +The preprocessor maintains a @dfn{context stack}, implemented as a +linked list of @code{cpp_context} structures, which together represent +the macro expansion state at any one time. The @code{struct +cpp_reader} member variable @code{context} points to the current top +of this stack. The top normally holds the unexpanded replacement list +of the innermost macro under expansion, except when cpplib is about to +pre-expand an argument, in which case it holds that argument's +unexpanded tokens. -@c TODO +When there are no macros under expansion, cpplib is in @dfn{base +context}. All contexts other than the base context contain a +contiguous list of tokens delimited by a starting and ending token. +When not in base context, cpplib obtains the next token from the list +of the top context. If there are no tokens left in the list, it pops +that context off the stack, and subsequent ones if necessary, until an +unexhausted context is found or it returns to base context. In base +context, cpplib reads tokens directly from the lexer. -@section Function-like macros +If it encounters an identifier that is both a macro and enabled for +expansion, cpplib prepares to push a new context for that macro on the +stack by calling the routine @code{enter_macro_context}. When this +routine returns, the new context will contain the unexpanded tokens of +the replacement list of that macro. In the case of function-like +macros, @code{enter_macro_context} also replaces any parameters in the +replacement list, stored as @code{CPP_MACRO_ARG} tokens, with the +appropriate macro argument. If the standard requires that the +parameter be replaced with its expanded argument, the argument will +have been fully macro expanded first. -@c TODO +@code{enter_macro_context} also handles special macros like +@code{__LINE__}. Although these macros expand to a single token which +cannot contain any further macros, for reasons of token spacing +(@pxref{Token Spacing}) and simplicity of implementation, cpplib +handles these special macros by pushing a context containing just that +one token. + +The final thing that @code{enter_macro_context} does before returning +is to mark the macro disabled for expansion (except for special macros +like @code{__TIME__}). The macro is re-enabled when its context is +later popped from the context stack, as described above. This strict +ordering ensures that a macro is disabled whilst its expansion is +being scanned, but that it is @emph{not} disabled whilst any arguments +to it are being expanded. + +@section Scanning the replacement list for macros to expand +The C standard states that, after any parameters have been replaced +with their possibly-expanded arguments, the replacement list is +scanned for nested macros. Further, any identifiers in the +replacement list that are not expanded during this scan are never +again eligible for expansion in the future, if the reason they were +not expanded is that the macro in question was disabled. + +Clearly this latter condition can only apply to tokens resulting from +argument pre-expansion. Other tokens never have an opportunity to be +re-tested for expansion. It is possible for identifiers that are +function-like macros to not expand initially but to expand during a +later scan. This occurs when the identifier is the last token of an +argument (and therefore originally followed by a comma or a closing +parenthesis in its macro's argument list), and when it replaces its +parameter in the macro's replacement list, the subsequent token +happens to be an opening parenthesis (itself possibly the first token +of an argument). + +It is important to note that when cpplib reads the last token of a +given context, that context still remains on the stack. Only when +looking for the @emph{next} token do we pop it off the stack and drop +to a lower context. This makes backing up by one token easy, but more +importantly ensures that the macro corresponding to the current +context is still disabled when we are considering the last token of +its replacement list for expansion (or indeed expanding it). As an +example, which illustrates many of the points above, consider + +@smallexample +#define foo(x) bar x +foo(foo) (2) +@end smallexample + +@noindent which fully expands to @samp{bar foo (2)}. During pre-expansion +of the argument, @samp{foo} does not expand even though the macro is +enabled, since it has no following parenthesis [pre-expansion of an +argument only uses tokens from that argument; it cannot take tokens +from whatever follows the macro invocation]. This still leaves the +argument token @samp{foo} eligible for future expansion. Then, when +re-scanning after argument replacement, the token @samp{foo} is +rejected for expansion, and marked ineligible for future expansion, +since the macro is now disabled. It is disabled because the +replacement list @samp{bar foo} of the macro is still on the context +stack. + +If instead the algorithm looked for an opening parenthesis first and +then tested whether the macro were disabled it would be subtly wrong. +In the example above, the replacement list of @samp{foo} would be +popped in the process of finding the parenthesis, re-enabling +@samp{foo} and expanding it a second time. + +@section Looking for a function-like macro's opening parenthesis +Function-like macros only expand when immediately followed by a +parenthesis. To do this cpplib needs to temporarily disable macros +and read the next token. Unfortunately, because of spacing issues +(@pxref{Token Spacing}), there can be fake padding tokens in-between, +and if the next real token is not a parenthesis cpplib needs to be +able to back up that one token as well as retain the information in +any intervening padding tokens. + +Backing up more than one token when macros are involved is not +permitted by cpplib, because in general it might involve issues like +restoring popped contexts onto the context stack, which are too hard. +Instead, searching for the parenthesis is handled by a special +function, @code{funlike_invocation_p}, which remembers padding +information as it reads tokens. If the next real token is not an +opening parenthesis, it backs up that one token, and then pushes an +extra context just containing the padding information if necessary. + +@section Marking tokens ineligible for future expansion +As discussed above, cpplib needs a way of marking tokens as +unexpandable. Since the tokens cpplib handles are read-only once they +have been lexed, it instead makes a copy of the token and adds the +flag @code{NO_EXPAND} to the copy. + +For efficiency and to simplify memory management by avoiding having to +remember to free these tokens, they are allocated as temporary tokens +from the lexer's current token run (@pxref{Lexing a line}) using the +function @code{_cpp_temp_token}. The tokens are then re-used once the +current line of tokens has been read in. + +This might sound unsafe. However, tokens runs are not re-used at the +end of a line if it happens to be in the middle of a macro argument +list, and cpplib only wants to back-up more than one lexer token in +situations where no macro expansion is involved, so the optimization +is safe. @node Token Spacing @unnumbered Token Spacing