author | Neil Booth <neil@daikokuya.demon.co.uk> | 2001-10-06 11:29:51 +0000 |
---|---|---|
committer | Neil Booth <neil@gcc.gnu.org> | 2001-10-06 11:29:51 +0000 |
commit | 5b810d3c839d7b5b208bd036a7bfc947830e611b (patch) | |
tree | 75faf39e4eedd33b096e8fbef9e57ab4111e9e87 | |
parent | d644be7b4c9aba239a4ce9b29375cbff27705746 (diff) | |
* doc/cppinternals.texi: Update.
From-SVN: r46050
-rw-r--r-- | gcc/ChangeLog | 4 |
-rw-r--r-- | gcc/doc/cppinternals.texi | 112 |
2 files changed, 67 insertions, 49 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8f5899df428..45fb7e255ea 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +2001-10-06 Neil Booth <neil@daikokuya.demon.co.uk> + + * doc/cppinternals.texi: Update. + 2001-10-06 Zack Weinberg <zack@codesourcery.com> * gcc.c (main): Set this_file_error if the appropriate diff --git a/gcc/doc/cppinternals.texi b/gcc/doc/cppinternals.texi index dee2dea5133..95c4ceba9fa 100644 --- a/gcc/doc/cppinternals.texi +++ b/gcc/doc/cppinternals.texi @@ -41,7 +41,7 @@ into another language, under the above conditions for modified versions. @titlepage @c @finalout @title Cpplib Internals -@subtitle Last revised September 2001 +@subtitle Last revised October 2001 @subtitle for GCC version 3.1 @author Neil Booth @page @@ -71,7 +71,7 @@ into another language, under the above conditions for modified versions. @chapter Cpplib---the core of the GNU C Preprocessor The GNU C preprocessor in GCC 3.x has been completely rewritten. It is -now implemented as a library, cpplib, so it can be easily shared between +now implemented as a library, @dfn{cpplib}, so it can be easily shared between a stand-alone preprocessor, and a preprocessor integrated with the C, C++ and Objective-C front ends. It is also available for use by other programs, though this is not recommended as its exposed interface has @@ -498,12 +498,13 @@ both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles. -For now, just notice that the only places we need to be careful about -@dfn{paste avoidance} are when tokens are added (or removed) from the -original token stream. This only occurs because of macro expansion, but -care is needed in many places: before @strong{and} after each macro -replacement, each argument replacement, and additionally each token -created by the @samp{#} and @samp{##} operators. 
+For now, just notice that when tokens are added (or removed, as shown by +the @code{EMPTY} example) from the original lexed token stream, we need +to check for accidental token pasting. We call this @dfn{paste +avoidance}. Token addition and removal can only occur because of macro +expansion, but accidental pasting can occur in many places: both before +and after each macro replacement, each argument replacement, and +additionally each token created by the @samp{#} and @samp{##} operators. Let's look at how the preprocessor gets whitespace output correct normally. The @code{cpp_token} structure contains a flags byte, and one @@ -512,7 +513,7 @@ indicates that the token was preceded by whitespace of some form other than a new line. The stand-alone preprocessor can use this flag to decide whether to insert a space between tokens in the output. -Now consider the following: +Now consider the result of the following macro expansion: @smallexample #define add(x, y, z) x + y +z; @@ -524,20 +525,21 @@ The interesting thing here is that the tokens @samp{1} and @samp{2} are output with a preceding space, and @samp{3} is output without a preceding space, but when lexed none of these tokens had that property. Careful consideration reveals that @samp{1} gets its preceding -whitespace from the space preceding @samp{add} in the macro -@emph{invocation}, @samp{2} gets its whitespace from the space preceding -the parameter @samp{y} in the macro @emph{replacement list}, and -@samp{3} has no preceding space because parameter @samp{z} has none in -the replacement list. +whitespace from the space preceding @samp{add} in the macro invocation, +@emph{not} replacement list. @samp{2} gets its whitespace from the +space preceding the parameter @samp{y} in the macro replacement list, +and @samp{3} has no preceding space because parameter @samp{z} has none +in the replacement list. 
Once lexed, tokens are effectively fixed and cannot be altered, since pointers to them might be held in many places, in particular by in-progress macro expansions. So instead of modifying the two tokens above, the preprocessor inserts a special token, which I call a -@dfn{padding token}, into the token stream in front of every macro -expansion and expanded macro argument, to indicate that the subsequent -token should assume its @code{PREV_WHITE} flag from a different -@dfn{source token}. In the above example, the source tokens are +@dfn{padding token}, into the token stream to indicate that spacing of +the subsequent token is special. The preprocessor inserts padding +tokens in front of every macro expansion and expanded macro argument. +These point to a @dfn{source token} from which the subsequent real token +should inherit its spacing. In the above example, the source tokens are @samp{add} in the macro invocation, and @samp{y} and @samp{z} in the macro replacement list, respectively. @@ -551,10 +553,14 @@ a macro's first replacement token expands straight into another macro. @expansion{} [baz] @end smallexample -Here, two padding tokens with sources @samp{foo} between the brackets, -and @samp{bar} from foo's replacement list, are generated. Clearly the -first padding token is the one that matters. But what if we happen to -leave a macro expansion? Adjusting the above example slightly: +Here, two padding tokens are generated with sources the @samp{foo} token +between the brackets, and the @samp{bar} token from foo's replacement +list, respectively. Clearly the first padding token is the one we +should use, so our output code should contain a rule that the first +padding token in a sequence is the one that matters. + +But what if we happen to leave a macro expansion? Adjusting the above +example slightly: @smallexample #define foo bar @@ -564,33 +570,41 @@ leave a macro expansion? 
Adjusting the above example slightly: @expansion{} [ baz] ; @end smallexample -As shown, now there should be a space before baz and the semicolon. Our -initial algorithm fails for the former, because we would see three -padding tokens, one per macro invocation, followed by @samp{baz}, which -would have inherit its spacing from the original source, @samp{foo}, -which has no leading space. Note that it is vital that cpplib get -spacing correct in these examples, since any of these macro expansions -could be stringified, where spacing matters. - -So, I have demonstrated that not just entering macro and argument -expansions, but leaving them requires special handling too. So cpplib -inserts a padding token with a @code{NULL} source token when leaving -macro expansions and after each replaced argument in a macro's -replacement list. It also inserts appropriate padding tokens on either -side of tokens created by the @samp{#} and @samp{##} operators. - -Now we can see the relationship with paste avoidance: we have to be -careful about paste avoidance in exactly the same locations we take care -to get white space correct. This makes implementation of paste -avoidance easy: wherever the stand-alone preprocessor is fixing up -spacing because of padding tokens, and it turns out that no space is -needed, it has to take the extra step to check that a space is not -needed after all to avoid an accidental paste. The function -@code{cpp_avoid_paste} advises whether a space is required between two -consecutive tokens. To avoid excessive spacing, it tries hard to only -require a space if one is likely to be necessary, but for reasons of -efficiency it is slightly conservative and might recommend a space where -one is not strictly needed. +As shown, now there should be a space before @samp{baz} and the +semicolon in the output. + +The rules we decided above fail for @samp{baz}: we generate three +padding tokens, one per macro invocation, before the token @samp{baz}. 
+We would then have it take its spacing from the first of these, which +carries source token @samp{foo} with no leading space. + +It is vital that cpplib get spacing correct in these examples since any +of these macro expansions could be stringified, where spacing matters. + +So, this demonstrates that not just entering macro and argument +expansions, but leaving them requires special handling too. I made +cpplib insert a padding token with a @code{NULL} source token when +leaving macro expansions, as well as after each replaced argument in a +macro's replacement list. It also inserts appropriate padding tokens on +either side of tokens created by the @samp{#} and @samp{##} operators. +I expanded the rule so that, if we see a padding token with a +@code{NULL} source token, @emph{and} that source token has no leading +space, then we behave as if we have seen no padding tokens at all. A +quick check shows this rule will then get the above example correct as +well. + +Now a relationship with paste avoidance is apparent: we have to be +careful about paste avoidance in exactly the same locations we have +padding tokens in order to get white space correct. This makes +implementation of paste avoidance easy: wherever the stand-alone +preprocessor is fixing up spacing because of padding tokens, and it +turns out that no space is needed, it has to take the extra step to +check that a space is not needed after all to avoid an accidental paste. +The function @code{cpp_avoid_paste} advises whether a space is required +between two consecutive tokens. To avoid excessive spacing, it tries +hard to only require a space if one is likely to be necessary, but for +reasons of efficiency it is slightly conservative and might recommend a +space where one is not strictly needed. @node Line Numbering @unnumbered Line numbering |
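The padding-token rules described in the updated text can be sketched as a small output loop. This is a minimal illustration under stated assumptions, not cpplib's actual code: `struct tok`, `render`, `would_paste`, `demo_nested` and `demo_leaving` are hypothetical names, the token streams are built by hand rather than produced by a real macro expander, and `would_paste` is only a crude stand-in for `cpp_avoid_paste`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token model, not cpplib's cpp_token.  A padding token
   has text == NULL and may carry a source token (or NULL).  */
struct tok {
  const char *text;
  bool prev_white;          /* stands in for the PREV_WHITE flag */
  const struct tok *source; /* only meaningful for padding tokens */
};

static bool is_word(unsigned char c) { return isalnum(c) || c == '_'; }

/* Crude stand-in for cpp_avoid_paste: only guards word-like neighbours. */
static bool would_paste(const struct tok *a, const struct tok *b) {
  size_t n = strlen(a->text);
  return n > 0 && is_word((unsigned char)a->text[n - 1])
         && is_word((unsigned char)b->text[0]);
}

/* Write the spelling of the stream into out, applying the two rules:
   the first padding token in a run supplies the source, and a
   NULL-source padding token resets a source that has no leading space. */
static void render(const struct tok *toks, size_t n, char *out, size_t cap) {
  const struct tok *source = NULL, *prev = NULL;
  bool saw_padding = false;
  size_t len = 0;

  assert(cap > 0);
  for (size_t i = 0; i < n; i++) {
    const struct tok *t = &toks[i];
    if (t->text == NULL) { /* padding token */
      if (source == NULL || (!source->prev_white && t->source == NULL))
        source = t->source;
      saw_padding = true;
      continue;
    }
    bool space;
    if (saw_padding) {
      /* No usable source: fall back to the token's own flag. */
      const struct tok *s = source ? source : t;
      space = s->prev_white || (prev != NULL && would_paste(prev, t));
    } else {
      space = t->prev_white;
    }
    size_t tlen = strlen(t->text);
    if (len + tlen + (space ? 1 : 0) >= cap)
      break; /* truncate rather than overflow */
    if (space)
      out[len++] = ' ';
    memcpy(out + len, t->text, tlen);
    len += tlen;
    saw_padding = false;
    source = NULL;
    prev = t;
  }
  out[len] = '\0';
}

/* "[foo]" with foo -> bar -> baz, as a hand-built stream. */
static void demo_nested(char *out, size_t cap) {
  static const struct tok foo_inv = {"foo", false, NULL};
  static const struct tok bar_repl = {"bar", false, NULL};
  const struct tok stream[] = {
    {"[", false, NULL},
    {NULL, false, &foo_inv}, {NULL, false, &bar_repl},
    {"baz", false, NULL},
    {NULL, false, NULL}, {NULL, false, NULL}, /* leaving bar, foo */
    {"]", false, NULL},
  };
  render(stream, sizeof stream / sizeof stream[0], out, cap);
}

/* "[foo] EMPTY;" with foo -> bar, bar -> EMPTY baz, EMPTY -> nothing. */
static void demo_leaving(char *out, size_t cap) {
  static const struct tok foo_inv = {"foo", false, NULL};
  static const struct tok bar_repl = {"bar", false, NULL};
  static const struct tok empty_repl = {"EMPTY", false, NULL};
  static const struct tok empty_inv = {"EMPTY", true, NULL};
  const struct tok stream[] = {
    {"[", false, NULL},
    {NULL, false, &foo_inv}, {NULL, false, &bar_repl},
    {NULL, false, &empty_repl},
    {NULL, false, NULL},                      /* leaving EMPTY */
    {"baz", true, NULL},
    {NULL, false, NULL}, {NULL, false, NULL}, /* leaving bar, foo */
    {"]", false, NULL},
    {NULL, false, &empty_inv}, {NULL, false, NULL},
    {";", false, NULL},
  };
  render(stream, sizeof stream / sizeof stream[0], out, cap);
}
```

Calling `demo_nested` and `demo_leaving` with a caller-supplied buffer renders the two hand-built streams. In the second one, the `NULL`-source padding token emitted when leaving `EMPTY` discards the `foo` source, so `baz` falls back to its own leading-space flag.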