mirror of
git://gcc.gnu.org/git/gcc.git
synced 2024-12-30 07:44:45 +08:00
* doc/cppinternals.texi: Update.
From-SVN: r48562
parent
4fb535d7d7
commit
d83bb9f778
@ -1,3 +1,7 @@
|
||||
2002-01-05 Neil Booth <neil@daikokuya.demon.co.uk>
|
||||
|
||||
* doc/cppinternals.texi: Update.
|
||||
|
||||
2002-01-05 Hans-Peter Nilsson <hp@bitrange.com>
|
||||
|
||||
* doc/invoke.texi (Option Summary) <MMIX Options>: Document
|
||||
|
@ -16,7 +16,7 @@
|
||||
@ifinfo
|
||||
This file documents the internals of the GNU C Preprocessor.
|
||||
|
||||
Copyright 2000, 2001 Free Software Foundation, Inc.
|
||||
Copyright 2000, 2001, 2002 Free Software Foundation, Inc.
|
||||
|
||||
Permission is granted to make and distribute verbatim copies of
|
||||
this manual provided the copyright notice and this permission notice
|
||||
@ -68,7 +68,7 @@ into another language, under the above conditions for modified versions.
|
||||
|
||||
@node Top
|
||||
@top
|
||||
@chapter Cpplib---the core of the GNU C Preprocessor
|
||||
@chapter Cpplib---the GNU C Preprocessor
|
||||
|
||||
The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
|
||||
now implemented as a library, @dfn{cpplib}, so it can be easily shared between
|
||||
@ -469,26 +469,26 @@ enum stored in its hash node, so that directive lookup is also O(1).
|
||||
@unnumbered Macro Expansion Algorithm
|
||||
@cindex macro expansion
|
||||
|
||||
Macro expansion is a surprisingly tricky operation, fraught with nasty
|
||||
corner cases and situations that render what you thought was a nifty
|
||||
way to optimize the preprocessor's expansion algorithm wrong in quite
|
||||
subtle ways.
|
||||
Macro expansion is a tricky operation, fraught with nasty corner cases
|
||||
and situations that render what you thought was a nifty way to
|
||||
optimize the preprocessor's expansion algorithm wrong in quite subtle
|
||||
ways.
|
||||
|
||||
I strongly recommend you have a good grasp of how the C and C++
|
||||
standards require macros to be expanded before diving into this
|
||||
section, let alone the code!  If you don't have a clear mental
|
||||
picture of how things like nested macro expansion, stringification and
|
||||
token pasting are supposed to work, damage to you sanity can quickly
|
||||
token pasting are supposed to work, damage to your sanity can quickly
|
||||
result.
|
||||
|
||||
@section Internal representation of Macros
|
||||
@section Internal representation of macros
|
||||
@cindex macro representation (internal)
|
||||
|
||||
The preprocessor stores macro expansions in tokenized form. This
|
||||
saves repeated lexing passes during expansion, at the cost of a small
|
||||
increase in memory consumption on average. The tokens are stored
|
||||
contiguously in memory, so a pointer to the first one and a token
|
||||
count is all we need.
|
||||
count is all you need to get the replacement list of a macro.
|
||||
|
||||
If the macro is a function-like macro the preprocessor also stores its
|
||||
parameters, in the form of an ordered list of pointers to the hash
|
||||
@ -502,13 +502,137 @@ the original parameters to the macro, both for dumping with e.g.,
|
||||
@option{-dD}, and to warn about non-trivial macro redefinitions when
|
||||
the parameter names have changed.
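
As a rough illustration of the representation described above, a macro can be modelled as a pointer to its first replacement token, a token count, and an ordered array of parameter hash nodes.  This is only a sketch: every type and field name below (@code{sketch_macro}, @code{exp}, @code{paramc} and so on) is hypothetical and chosen for this example, and cpplib's real definitions differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration only; they are not
   cpplib's actual definitions.  */
typedef struct sketch_hashnode {
    const char *name;               /* spelling of the identifier */
} sketch_hashnode;

typedef struct sketch_token {
    int type;                       /* token kind, e.g. identifier */
    const char *spelling;
} sketch_token;

typedef struct sketch_macro {
    const sketch_token *exp;        /* first token of the replacement list */
    size_t count;                   /* number of tokens in that list */
    sketch_hashnode *const *params; /* ordered parameters, function-like only */
    size_t paramc;                  /* number of parameters */
    int fun_like;                   /* nonzero for function-like macros */
} sketch_macro;

/* Return the i-th replacement token, or NULL when i is out of range.
   Contiguous storage makes this a simple array index.  */
static const sketch_token *
sketch_macro_token (const sketch_macro *m, size_t i)
{
    return i < m->count ? &m->exp[i] : NULL;
}
```

Because the tokens are contiguous, walking the replacement list is plain pointer arithmetic and needs no per-token links.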
|
||||
|
||||
@section Nested object-like macros
|
||||
@section Macro expansion overview
|
||||
The preprocessor maintains a @dfn{context stack}, implemented as a
|
||||
linked list of @code{cpp_context} structures, which together represent
|
||||
the macro expansion state at any one time. The @code{struct
|
||||
cpp_reader} member variable @code{context} points to the current top
|
||||
of this stack. The top normally holds the unexpanded replacement list
|
||||
of the innermost macro under expansion, except when cpplib is about to
|
||||
pre-expand an argument, in which case it holds that argument's
|
||||
unexpanded tokens.
|
||||
|
||||
@c TODO
|
||||
When there are no macros under expansion, cpplib is in @dfn{base
|
||||
context}. All contexts other than the base context contain a
|
||||
contiguous list of tokens delimited by a starting and ending token.
|
||||
When not in base context, cpplib obtains the next token from the list
|
||||
of the top context. If there are no tokens left in the list, it pops
|
||||
that context off the stack, and subsequent ones if necessary, until an
|
||||
unexhausted context is found or it returns to base context. In base
|
||||
context, cpplib reads tokens directly from the lexer.
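
The token-fetching rule just described can be sketched as follows.  This is a simplified model, not cpplib's code: tokens are plain integers, the lexer is a zero-terminated array, and all the @code{sk_} names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a context: a contiguous token list.  */
typedef struct sk_context {
    struct sk_context *prev;  /* next context down; NULL marks base context */
    const int *first;         /* next unread token */
    const int *last;          /* one past the final token */
} sk_context;

typedef struct sk_reader {
    sk_context base;          /* base context */
    sk_context *context;      /* current top of the context stack */
    const int *lexer_pos;     /* stand-in lexer: a 0-terminated array */
} sk_reader;

static void
sk_init (sk_reader *r, const int *lexer_input)
{
    r->base.prev = NULL;
    r->base.first = r->base.last = NULL;
    r->context = &r->base;
    r->lexer_pos = lexer_input;
}

/* Push a context holding the N tokens starting at TOKS.  */
static void
sk_push (sk_reader *r, sk_context *c, const int *toks, size_t n)
{
    c->prev = r->context;
    c->first = toks;
    c->last = toks + n;
    r->context = c;
}

/* Return the next token, popping exhausted contexts on the way.
   Returns 0 once the stand-in lexer runs out of input.  */
static int
sk_next_token (sk_reader *r)
{
    for (;;) {
        sk_context *c = r->context;
        if (c->prev == NULL)          /* base context: ask the lexer */
            return *r->lexer_pos ? *r->lexer_pos++ : 0;
        if (c->first != c->last)      /* tokens left in the top context */
            return *c->first++;
        r->context = c->prev;         /* exhausted: pop and try again */
    }
}
```

Note that a context is popped only when a further token is requested, never at the moment its last token is handed out.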
|
||||
|
||||
@section Function-like macros
|
||||
If it encounters an identifier that is both a macro and enabled for
|
||||
expansion, cpplib prepares to push a new context for that macro on the
|
||||
stack by calling the routine @code{enter_macro_context}. When this
|
||||
routine returns, the new context will contain the unexpanded tokens of
|
||||
the replacement list of that macro. In the case of function-like
|
||||
macros, @code{enter_macro_context} also replaces any parameters in the
|
||||
replacement list, stored as @code{CPP_MACRO_ARG} tokens, with the
|
||||
appropriate macro argument. If the standard requires that the
|
||||
parameter be replaced with its expanded argument, the argument will
|
||||
have been fully macro expanded first.
|
||||
|
||||
@c TODO
|
||||
@code{enter_macro_context} also handles special macros like
|
||||
@code{__LINE__}. Although these macros expand to a single token which
|
||||
cannot contain any further macros, for reasons of token spacing
|
||||
(@pxref{Token Spacing}) and simplicity of implementation, cpplib
|
||||
handles these special macros by pushing a context containing just that
|
||||
one token.
|
||||
|
||||
The final thing that @code{enter_macro_context} does before returning
|
||||
is to mark the macro disabled for expansion (except for special macros
|
||||
like @code{__TIME__}). The macro is re-enabled when its context is
|
||||
later popped from the context stack, as described above. This strict
|
||||
ordering ensures that a macro is disabled whilst its expansion is
|
||||
being scanned, but that it is @emph{not} disabled whilst any arguments
|
||||
to it are being expanded.
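
The visible effect of this ordering can be checked from ordinary C.  The names @code{counter}, @code{twice} and the two helper functions are made up for this example; the behavior shown is what the C standard requires of any conforming preprocessor.

```c
#include <assert.h>

static int counter = 1;

/* The macro names itself in its own replacement list.  While that
   list is being scanned the macro is disabled, so the inner
   `counter' is left alone and refers to the variable above.  */
#define counter (counter + 1)

static int
read_counter (void)
{
    return counter;            /* expands once, to (counter + 1) */
}

#define twice(x) ((x) * 2)

/* The inner `twice' appears in an argument.  Arguments are expanded
   before the macro is disabled, so it is expanded normally.  */
static int
nested_twice (void)
{
    return twice (twice (3));  /* ((((3) * 2)) * 2) */
}
```

Here @code{read_counter} returns 2 rather than recursing forever, and @code{nested_twice} returns 12 because the argument was expanded while @code{twice} was still enabled.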
|
||||
|
||||
@section Scanning the replacement list for macros to expand
|
||||
The C standard states that, after any parameters have been replaced
|
||||
with their possibly-expanded arguments, the replacement list is
|
||||
scanned for nested macros. Further, any identifiers in the
|
||||
replacement list that are not expanded during this scan are never
|
||||
again eligible for expansion in the future, if the reason they were
|
||||
not expanded is that the macro in question was disabled.
|
||||
|
||||
Clearly this latter condition can only apply to tokens resulting from
|
||||
argument pre-expansion. Other tokens never have an opportunity to be
|
||||
re-tested for expansion. It is possible for identifiers that are
|
||||
function-like macros to not expand initially but to expand during a
|
||||
later scan. This occurs when the identifier is the last token of an
|
||||
argument (and therefore originally followed by a comma or a closing
|
||||
parenthesis in its macro's argument list), and when it replaces its
|
||||
parameter in the macro's replacement list, the subsequent token
|
||||
happens to be an opening parenthesis (itself possibly the first token
|
||||
of an argument).
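
A small case of such a late expansion, using the hypothetical macros @code{add_one} and @code{call_with_ten}:

```c
#include <assert.h>

#define add_one(x) ((x) + 1)
#define call_with_ten(m) m (10)

/* During pre-expansion of the argument, `add_one' is followed by the
   closing parenthesis of the argument list, so it does not expand.
   After it replaces `m', the next token is `(', and the rescan
   expands it as add_one (10).  */
static int
late_expansion (void)
{
    return call_with_ten (add_one);
}
```

The function returns 11, showing that the identifier remained eligible for expansion after the failed first attempt.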
|
||||
|
||||
It is important to note that when cpplib reads the last token of a
|
||||
given context, that context still remains on the stack. Only when
|
||||
looking for the @emph{next} token do we pop it off the stack and drop
|
||||
to a lower context. This makes backing up by one token easy, but more
|
||||
importantly ensures that the macro corresponding to the current
|
||||
context is still disabled when we are considering the last token of
|
||||
its replacement list for expansion (or indeed expanding it). As an
|
||||
example, which illustrates many of the points above, consider
|
||||
|
||||
@smallexample
|
||||
#define foo(x) bar x
|
||||
foo(foo) (2)
|
||||
@end smallexample
|
||||
|
||||
@noindent which fully expands to @samp{bar foo (2)}. During pre-expansion
|
||||
of the argument, @samp{foo} does not expand even though the macro is
|
||||
enabled, since it has no following parenthesis [pre-expansion of an
|
||||
argument only uses tokens from that argument; it cannot take tokens
|
||||
from whatever follows the macro invocation]. This still leaves the
|
||||
argument token @samp{foo} eligible for future expansion. Then, when
|
||||
re-scanning after argument replacement, the token @samp{foo} is
|
||||
rejected for expansion, and marked ineligible for future expansion,
|
||||
since the macro is now disabled. It is disabled because the
|
||||
replacement list @samp{bar foo} of the macro is still on the context
|
||||
stack.
|
||||
|
||||
If instead the algorithm looked for an opening parenthesis first and
|
||||
then tested whether the macro was disabled, it would be subtly wrong.
|
||||
In the example above, the replacement list of @samp{foo} would be
|
||||
popped in the process of finding the parenthesis, re-enabling
|
||||
@samp{foo} and expanding it a second time.
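
The result of the example can be observed by stringifying the fully expanded text.  The helper macros @code{STR} and @code{XSTR} and the function @code{expanded_text} are additions for this sketch, and the exact spacing inside the resulting string may vary between preprocessors.

```c
#include <assert.h>
#include <string.h>

#define foo(x) bar x

/* XSTR fully expands its argument before STR turns it into a string.  */
#define STR(x) #x
#define XSTR(x) STR(x)

/* If `foo' were wrongly expanded a second time, the text would
   contain no `foo' at all (it would read like `bar bar 2').  */
static const char *
expanded_text (void)
{
    return XSTR (foo(foo) (2));
}
```

With a correct implementation the string still contains @samp{foo}, followed by @samp{(2)}.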
|
||||
|
||||
@section Looking for a function-like macro's opening parenthesis
|
||||
Function-like macros only expand when immediately followed by a
|
||||
parenthesis.  To check for it, cpplib needs to temporarily disable macros
|
||||
and read the next token. Unfortunately, because of spacing issues
|
||||
(@pxref{Token Spacing}), there can be fake padding tokens in-between,
|
||||
and if the next real token is not a parenthesis cpplib needs to be
|
||||
able to back up that one token as well as retain the information in
|
||||
any intervening padding tokens.
|
||||
|
||||
Backing up more than one token when macros are involved is not
|
||||
permitted by cpplib, because in general it might involve issues like
|
||||
restoring popped contexts onto the context stack, which are too hard.
|
||||
Instead, searching for the parenthesis is handled by a special
|
||||
function, @code{funlike_invocation_p}, which remembers padding
|
||||
information as it reads tokens. If the next real token is not an
|
||||
opening parenthesis, it backs up that one token, and then pushes an
|
||||
extra context just containing the padding information if necessary.
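
The search can be modelled roughly as below.  This is a sketch under simplifying assumptions, not the real @code{funlike_invocation_p}: tokens are reduced to an enumeration, the input is an array ending in @code{PK_EOF}, and the extra padding context is represented by a simple count.  All @code{pk_} names are invented.

```c
#include <assert.h>
#include <stddef.h>

enum pk_kind { PK_PADDING, PK_OPEN_PAREN, PK_OTHER, PK_EOF };

typedef struct pk_stream {
    const enum pk_kind *toks;  /* input, terminated by PK_EOF */
    size_t pos;                /* index of the next unread token */
    size_t pending_padding;    /* stand-in for the pushed padding context */
} pk_stream;

/* Return 1 and consume the parenthesis if the next real token is an
   opening parenthesis.  Otherwise back up that one token, remember
   how much padding was skipped, and return 0.  */
static int
pk_funlike_invocation_p (pk_stream *s)
{
    size_t padding = 0;
    enum pk_kind k;

    for (;;) {
        k = s->toks[s->pos++];
        if (k != PK_PADDING)
            break;
        padding++;             /* remember padding as we skip it */
    }

    if (k == PK_OPEN_PAREN)
        return 1;

    s->pos--;                  /* back up the one real token */
    s->pending_padding = padding;
    return 0;
}
```

Only a single real token is ever un-read, which is why no popped context has to be restored.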
|
||||
|
||||
@section Marking tokens ineligible for future expansion
|
||||
As discussed above, cpplib needs a way of marking tokens as
|
||||
unexpandable. Since the tokens cpplib handles are read-only once they
|
||||
have been lexed, it instead makes a copy of the token and adds the
|
||||
flag @code{NO_EXPAND} to the copy.
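
In outline, the copy-and-flag step looks like this.  The token layout, the flag value @code{SK_NO_EXPAND} and the function @code{nx_mark_no_expand} are assumptions made for the sketch; in particular, the caller supplies the storage here, whereas cpplib takes it from the lexer's token run.

```c
#include <assert.h>

#define SK_NO_EXPAND 0x1u      /* hypothetical flag value */

typedef struct nx_token {
    int type;
    unsigned flags;
    const char *spelling;
} nx_token;

/* Copy the read-only token SRC into STORAGE and mark the copy as
   ineligible for expansion.  SRC itself is never modified.  */
static const nx_token *
nx_mark_no_expand (const nx_token *src, nx_token *storage)
{
    *storage = *src;
    storage->flags |= SK_NO_EXPAND;
    return storage;
}
```

The original token keeps its flags, so any other use of it elsewhere is unaffected.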
|
||||
|
||||
For efficiency and to simplify memory management by avoiding having to
|
||||
remember to free these tokens, they are allocated as temporary tokens
|
||||
from the lexer's current token run (@pxref{Lexing a line}) using the
|
||||
function @code{_cpp_temp_token}. The tokens are then re-used once the
|
||||
current line of tokens has been read in.
|
||||
|
||||
This might sound unsafe.  However, token runs are not re-used at the
|
||||
end of a line if it happens to be in the middle of a macro argument
|
||||
list, and cpplib only wants to back up more than one lexer token in
|
||||
situations where no macro expansion is involved, so the optimization
|
||||
is safe.
|
||||
|
||||
@node Token Spacing
|
||||
@unnumbered Token Spacing
|
||||
|