diff options
Diffstat (limited to 'pod/perlinterp.pod')
-rw-r--r-- | pod/perlinterp.pod | 742 |
1 files changed, 742 insertions, 0 deletions
diff --git a/pod/perlinterp.pod b/pod/perlinterp.pod new file mode 100644 index 0000000000..5d16e8b3bc --- /dev/null +++ b/pod/perlinterp.pod @@ -0,0 +1,742 @@ +=encoding utf8 + +=for comment +Consistent formatting of this file is achieved with: + perl ./Porting/podtidy pod/perlinterp.pod + +=head1 NAME + +perlinterp - An overview of the Perl interpreter + +=head1 DESCRIPTION + +This document provides an overview of how the Perl interpreter works at +the level of C code, along with pointers to the relevant C source code +files. + +=head1 ELEMENTS OF THE INTERPRETER + +The work of the interpreter has two main stages: compiling the code +into the internal representation, or bytecode, and then executing it. +L<perlguts/Compiled code> explains exactly how the compilation stage +happens. + +Here is a short breakdown of perl's operation: + +=head2 Startup + +The action begins in F<perlmain.c>. (or F<miniperlmain.c> for miniperl) +This is very high-level code, enough to fit on a single screen, and it +resembles the code found in L<perlembed>; most of the real action takes +place in F<perl.c> + +F<perlmain.c> is generated by C<ExtUtils::Miniperl> from +F<miniperlmain.c> at make time, so you should make perl to follow this +along. + +First, F<perlmain.c> allocates some memory and constructs a Perl +interpreter, along these lines: + + 1 PERL_SYS_INIT3(&argc,&argv,&env); + 2 + 3 if (!PL_do_undump) { + 4 my_perl = perl_alloc(); + 5 if (!my_perl) + 6 exit(1); + 7 perl_construct(my_perl); + 8 PL_perl_destruct_level = 0; + 9 } + +Line 1 is a macro, and its definition is dependent on your operating +system. Line 3 references C<PL_do_undump>, a global variable - all +global variables in Perl start with C<PL_>. This tells you whether the +current running program was created with the C<-u> flag to perl and +then F<undump>, which means it's going to be false in any sane context. + +Line 4 calls a function in F<perl.c> to allocate memory for a Perl +interpreter. It's quite a simple function, and the guts of it looks +like this: + + my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter)); + +Here you see an example of Perl's system abstraction, which we'll see +later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's +own C<malloc> as defined in F<malloc.c> if you selected that option at +configure time. + +Next, in line 7, we construct the interpreter using perl_construct, +also in F<perl.c>; this sets up all the special variables that Perl +needs, the stacks, and so on. + +Now we pass Perl the command line options, and tell it to go: + + exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL); + if (!exitstatus) + perl_run(my_perl); + + exitstatus = perl_destruct(my_perl); + + perl_free(my_perl); + +C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined +in F<perl.c>, which processes the command line options, sets up any +statically linked XS modules, opens the program and calls C<yyparse> to +parse it. + +=head2 Parsing + +The aim of this stage is to take the Perl source, and turn it into an +op tree. We'll see what one of those looks like later. Strictly +speaking, there's three things going on here. + +C<yyparse>, the parser, lives in F<perly.c>, although you're better off +reading the original YACC input in F<perly.y>. (Yes, Virginia, there +B<is> a YACC grammar for Perl!) The job of the parser is to take your +code and "understand" it, splitting it into sentences, deciding which +operands go with which operators and so on. + +The parser is nobly assisted by the lexer, which chunks up your input +into tokens, and decides what type of thing each token is: a variable +name, an operator, a bareword, a subroutine, a core function, and so +on. The main point of entry to the lexer is C<yylex>, and that and its +associated routines can be found in F<toke.c>. Perl isn't much like +other computer languages; it's highly context sensitive at times, it +can be tricky to work out what sort of token something is, or where a +token ends. As such, there's a lot of interplay between the tokeniser +and the parser, which can get pretty frightening if you're not used to +it. + +As the parser understands a Perl program, it builds up a tree of +operations for the interpreter to perform during execution. The +routines which construct and link together the various operations are +to be found in F<op.c>, and will be examined later. + +=head2 Optimization + +Now the parsing stage is complete, and the finished tree represents the +operations that the Perl interpreter needs to perform to execute our +program. Next, Perl does a dry run over the tree looking for +optimisations: constant expressions such as C<3 + 4> will be computed +now, and the optimizer will also see if any multiple operations can be +replaced with a single one. For instance, to fetch the variable +C<$foo>, instead of grabbing the glob C<*foo> and looking at the scalar +component, the optimizer fiddles the op tree to use a function which +directly looks up the scalar in question. The main optimizer is C<peep> +in F<op.c>, and many ops have their own optimizing functions. + +=head2 Running + +Now we're finally ready to go: we have compiled Perl byte code, and all +that's left to do is run it. The actual execution is done by the +C<runops_standard> function in F<run.c>; more specifically, it's done +by these three innocent looking lines: + + while ((PL_op = PL_op->op_ppaddr(aTHX))) { + PERL_ASYNC_CHECK(); + } + +You may be more comfortable with the Perl version of that: + + PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}}; + +Well, maybe not. Anyway, each op contains a function pointer, which +stipulates the function which will actually carry out the operation. +This function will return the next op in the sequence - this allows for +things like C<if> which choose the next op dynamically at run time. The +C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt +execution if required. + +The actual functions called are known as PP code, and they're spread +between four files: F<pp_hot.c> contains the "hot" code, which is most +often used and highly optimized, F<pp_sys.c> contains all the +system-specific functions, F<pp_ctl.c> contains the functions which +implement control structures (C<if>, C<while> and the like) and F<pp.c> +contains everything else. These are, if you like, the C code for Perl's +built-in functions and operators. + +Note that each C<pp_> function is expected to return a pointer to the +next op. Calls to perl subs (and eval blocks) are handled within the +same runops loop, and do not consume extra space on the C stack. For +example, C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or +C<CxEVAL> block struct onto the context stack which contain the address +of the op following the sub call or eval. They then return the first op +of that sub or eval block, and so execution continues of that sub or +block. Later, a C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> +or C<CxEVAL>, retrieves the return op from it, and returns it. + +=head2 Exception handing + +Perl's exception handing (i.e. C<die> etc.) is built on top of the +low-level C<setjmp()>/C<longjmp()> C-library functions. These basically +provide a way to capture the current PC and SP registers and later +restore them; i.e. a C<longjmp()> continues at the point in code where +a previous C<setjmp()> was done, with anything further up on the C +stack being lost. This is why code should always save values using +C<SAVE_FOO> rather than in auto variables. + +The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and +C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and +C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while +C<die> within C<eval> does a C<JMPENV_JUMP(3)>. + +At entry points to perl, such as C<perl_parse()>, C<perl_run()> and +C<call_sv(cv, G_EVAL)> each does a C<JMPENV_PUSH>, then enter a runops +loop or whatever, and handle possible exception returns. For a 2 +return, final cleanup is performed, such as popping stacks and calling +C<CHECK> or C<END> blocks. Amongst other things, this is how scope +cleanup still occurs during an C<exit>. + +If a C<die> can find a C<CxEVAL> block on the context stack, then the +stack is popped to that level and the return op in that block is +assigned to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. +This normally passes control back to the guard. In the case of +C<perl_run> and C<call_sv>, a non-null C<PL_restartop> triggers +re-entry to the runops loop. The is the normal way that C<die> or +C<croak> is handled within an C<eval>. + +Sometimes ops are executed within an inner runops loop, such as tie, +sort or overload code. In this case, something like + + sub FETCH { eval { die } } + +would cause a longjmp right back to the guard in C<perl_run>, popping +both runops loops, which is clearly incorrect. One way to avoid this is +for the tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in +the inner runops loop, but for efficiency reasons, perl in fact just +sets a flag, using C<CATCH_SET(TRUE)>. The C<pp_require>, +C<pp_entereval> and C<pp_entertry> ops check this flag, and if true, +they call C<docatch>, which does a C<JMPENV_PUSH> and starts a new +runops level to execute the code, rather than doing it on the current +loop. + +As a further optimisation, on exit from the eval block in the C<FETCH>, +execution of the code following the block is still carried on in the +inner loop. When an exception is raised, C<docatch> compares the +C<JMPENV> level of the C<CxEVAL> with C<PL_top_env> and if they differ, +just re-throws the exception. In this way any inner loops get popped. + +Here's an example. + + 1: eval { tie @a, 'A' }; + 2: sub A::TIEARRAY { + 3: eval { die }; + 4: die; + 5: } + +To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> +then enters a runops loop. This loop executes the eval and tie ops on +line 1, with the eval pushing a C<CxEVAL> onto the context stack. + +The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops +loop to execute the body of C<TIEARRAY>. When it executes the entertry +op on line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> +which does a C<JMPENV_PUSH> and starts a third runops loop, which then +executes the die op. At this point the C call stack looks like this: + + Perl_pp_die + Perl_runops # third loop + S_docatch_body + S_docatch + Perl_pp_entertry + Perl_runops # second loop + S_call_body + Perl_call_sv + Perl_pp_tie + Perl_runops # first loop + S_run_body + perl_run + main + +and the context and data stacks, as shown by C<-Dstv>, look like: + + STACK 0: MAIN + CX 0: BLOCK => + CX 1: EVAL => AV() PV("A"\0) + retop=leave + STACK 1: MAGIC + CX 0: SUB => + retop=(null) + CX 1: EVAL => * + retop=nextstate + +The die pops the first C<CxEVAL> off the context stack, sets +C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns +to the top C<docatch>. This then starts another third-level runops +level, which executes the nextstate, pushmark and die ops on line 4. At +the point that the second C<pp_die> is called, the C call stack looks +exactly like that above, even though we are no longer within an inner +eval; this is because of the optimization mentioned earlier. However, +the context stack now looks like this, ie with the top CxEVAL popped: + + STACK 0: MAIN + CX 0: BLOCK => + CX 1: EVAL => AV() PV("A"\0) + retop=leave + STACK 1: MAGIC + CX 0: SUB => + retop=(null) + +The die on line 4 pops the context stack back down to the CxEVAL, +leaving it as: + + STACK 0: MAIN + CX 0: BLOCK => + +As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a +C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch: + + S_docatch + Perl_pp_entertry + Perl_runops # second loop + S_call_body + Perl_call_sv + Perl_pp_tie + Perl_runops # first loop + S_run_body + perl_run + main + +In this case, because the C<JMPENV> level recorded in the C<CxEVAL> +differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)> +and the C stack unwinds to: + + perl_run + main + +Because C<PL_restartop> is non-null, C<run_body> starts a new runops +loop and execution continues. + +=head2 INTERNAL VARIABLE TYPES + +You should by now have had a look at L<perlguts>, which tells you about +Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do +that now. + +These variables are used not only to represent Perl-space variables, +but also any constants in the code, as well as some structures +completely internal to Perl. The symbol table, for instance, is an +ordinary Perl hash. Your code is represented by an SV as it's read into +the parser; any program files you call are opened via ordinary Perl +filehandles, and so on. + +The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a +Perl program. Let's see, for instance, how Perl treats the constant +C<"hello">. + + % perl -MDevel::Peek -e 'Dump("hello")' + 1 SV = PV(0xa041450) at 0xa04ecbc + 2 REFCNT = 1 + 3 FLAGS = (POK,READONLY,pPOK) + 4 PV = 0xa0484e0 "hello"\0 + 5 CUR = 5 + 6 LEN = 6 + +Reading C<Devel::Peek> output takes a bit of practise, so let's go +through it line by line. + +Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in +memory. SVs themselves are very simple structures, but they contain a +pointer to a more complex structure. In this case, it's a PV, a +structure which holds a string value, at location C<0xa041450>. Line 2 +is the reference count; there are no other references to this data, so +it's 1. + +Line 3 are the flags for this SV - it's OK to use it as a PV, it's a +read-only SV (because it's a constant) and the data is a PV internally. +Next we've got the contents of the string, starting at location +C<0xa0484e0>. + +Line 5 gives us the current length of the string - note that this does +B<not> include the null terminator. Line 6 is not the length of the +string, but the length of the currently allocated buffer; as the string +grows, Perl automatically extends the available storage via a routine +called C<SvGROW>. + +You can get at any of these quantities from C very easily; just add +C<Sv> to the name of the field shown in the snippet, and you've got a +macro which will return the value: C<SvCUR(sv)> returns the current +length of the string, C<SvREFCOUNT(sv)> returns the reference count, +C<SvPV(sv, len)> returns the string itself with its length, and so on. +More macros to manipulate these properties can be found in L<perlguts>. + +Let's take an example of manipulating a PV, from C<sv_catpvn>, in +F<sv.c> + + 1 void + 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len) + 3 { + 4 STRLEN tlen; + 5 char *junk; + + 6 junk = SvPV_force(sv, tlen); + 7 SvGROW(sv, tlen + len + 1); + 8 if (ptr == junk) + 9 ptr = SvPVX(sv); + 10 Move(ptr,SvPVX(sv)+tlen,len,char); + 11 SvCUR(sv) += len; + 12 *SvEND(sv) = '\0'; + 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */ + 14 SvTAINT(sv); + 15 } + +This is a function which adds a string, C<ptr>, of length C<len> onto +the end of the PV stored in C<sv>. The first thing we do in line 6 is +make sure that the SV B<has> a valid PV, by calling the C<SvPV_force> +macro to force a PV. As a side effect, C<tlen> gets set to the current +value of the PV, and the PV itself is returned to C<junk>. + +In line 7, we make sure that the SV will have enough room to +accommodate the old string, the new string and the null terminator. If +C<LEN> isn't big enough, C<SvGROW> will reallocate space for us. + +Now, if C<junk> is the same as the string we're trying to add, we can +grab the string directly from the SV; C<SvPVX> is the address of the PV +in the SV. + +Line 10 does the actual catenation: the C<Move> macro moves a chunk of +memory around: we move the string C<ptr> to the end of the PV - that's +the start of the PV plus its current length. We're moving C<len> bytes +of type C<char>. After doing so, we need to tell Perl we've extended +the string, by altering C<CUR> to reflect the new length. C<SvEND> is a +macro which gives us the end of the string, so that needs to be a +C<"\0">. + +Line 13 manipulates the flags; since we've changed the PV, any IV or NV +values will no longer be valid: if we have C<$a=10; $a.="6";> we don't +want to use the old IV of 10. C<SvPOK_only_utf8> is a special +UTF-8-aware version of C<SvPOK_only>, a macro which turns off the IOK +and NOK flags and turns on POK. The final C<SvTAINT> is a macro which +launders tainted data if taint mode is turned on. + +AVs and HVs are more complicated, but SVs are by far the most common +variable type being thrown around. Having seen something of how we +manipulate these, let's go on and look at how the op tree is +constructed. + +=head1 OP TREES + +First, what is the op tree, anyway? The op tree is the parsed +representation of your program, as we saw in our section on parsing, +and it's the sequence of operations that Perl goes through to execute +your program, as we saw in L</Running>. + +An op is a fundamental operation that Perl can perform: all the +built-in functions and operators are ops, and there are a series of ops +which deal with concepts the interpreter needs internally - entering +and leaving a block, ending a statement, fetching a variable, and so +on. + +The op tree is connected in two ways: you can imagine that there are +two "routes" through it, two orders in which you can traverse the tree. +First, parse order reflects how the parser understood the code, and +secondly, execution order tells perl what order to perform the +operations in. + +The easiest way to examine the op tree is to stop Perl after it has +finished parsing, and get it to dump out the tree. This is exactly what +the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise> +and L<B::Debug|B::Debug> do. + +Let's have a look at how Perl sees C<$a = $b + $c>: + + % perl -MO=Terse -e '$a=$b+$c' + 1 LISTOP (0x8179888) leave + 2 OP (0x81798b0) enter + 3 COP (0x8179850) nextstate + 4 BINOP (0x8179828) sassign + 5 BINOP (0x8179800) add [1] + 6 UNOP (0x81796e0) null [15] + 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b + 8 UNOP (0x81797e0) null [15] + 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c + 10 UNOP (0x816b4f0) null [15] + 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a + +Let's start in the middle, at line 4. This is a BINOP, a binary +operator, which is at location C<0x8179828>. The specific operator in +question is C<sassign> - scalar assignment - and you can find the code +which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a +binary operator, it has two children: the add operator, providing the +result of C<$b+$c>, is uppermost on line 5, and the left hand side is +on line 10. + +Line 10 is the null op: this does exactly nothing. What is that doing +there? If you see the null op, it's a sign that something has been +optimized away after parsing. As we mentioned in L</Optimization>, the +optimization stage sometimes converts two operations into one, for +example when fetching a scalar variable. When this happens, instead of +rewriting the op tree and cleaning up the dangling pointers, it's +easier just to replace the redundant operation with the null op. +Originally, the tree would have looked like this: + + 10 SVOP (0x816b4f0) rv2sv [15] + 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a + +That is, fetch the C<a> entry from the main symbol table, and then look +at the scalar component of it: C<gvsv> (C<pp_gvsv> into F<pp_hot.c>) +happens to do both these things. + +The right hand side, starting at line 5 is similar to what we've just +seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add +together two C<gvsv>s. + +Now, what's this about? + + 1 LISTOP (0x8179888) leave + 2 OP (0x81798b0) enter + 3 COP (0x8179850) nextstate + +C<enter> and C<leave> are scoping ops, and their job is to perform any +housekeeping every time you enter and leave a block: lexical variables +are tidied up, unreferenced variables are destroyed, and so on. Every +program will have those first three lines: C<leave> is a list, and its +children are all the statements in the block. Statements are delimited +by C<nextstate>, so a block is a collection of C<nextstate> ops, with +the ops to be performed for each statement being the children of +C<nextstate>. C<enter> is a single op which functions as a marker. + +That's how Perl parsed the program, from top to bottom: + + Program + | + Statement + | + = + / \ + / \ + $a + + / \ + $b $c + +However, it's impossible to B<perform> the operations in this order: +you have to find the values of C<$b> and C<$c> before you add them +together, for instance. So, the other thread that runs through the op +tree is the execution order: each op has a field C<op_next> which +points to the next op to be run, so following these pointers tells us +how perl executes the code. We can traverse the tree in this order +using the C<exec> option to C<B::Terse>: + + % perl -MO=Terse,exec -e '$a=$b+$c' + 1 OP (0x8179928) enter + 2 COP (0x81798c8) nextstate + 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b + 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c + 5 BINOP (0x8179878) add [1] + 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a + 7 BINOP (0x81798a0) sassign + 8 LISTOP (0x8179900) leave + +This probably makes more sense for a human: enter a block, start a +statement. Get the values of C<$b> and C<$c>, and add them together. +Find C<$a>, and assign one to the other. Then leave. + +The way Perl builds up these op trees in the parsing process can be +unravelled by examining F<perly.y>, the YACC grammar. Let's take the +piece we need to construct the tree for C<$a = $b + $c> + + 1 term : term ASSIGNOP term + 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); } + 3 | term ADDOP term + 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } + +If you're not used to reading BNF grammars, this is how it works: +You're fed certain things by the tokeniser, which generally end up in +upper case. Here, C<ADDOP>, is provided when the tokeniser sees C<+> in +your code. C<ASSIGNOP> is provided when C<=> is used for assigning. +These are "terminal symbols", because you can't get any simpler than +them. + +The grammar, lines one and three of the snippet above, tells you how to +build up more complex forms. These complex forms, "non-terminal +symbols" are generally placed in lower case. C<term> here is a +non-terminal symbol, representing a single expression. + +The grammar gives you the following rule: you can make the thing on the +left of the colon if you see all the things on the right in sequence. +This is called a "reduction", and the aim of parsing is to completely +reduce the input. There are several different ways you can perform a +reduction, separated by vertical bars: so, C<term> followed by C<=> +followed by C<term> makes a C<term>, and C<term> followed by C<+> +followed by C<term> can also make a C<term>. + +So, if you see two terms with an C<=> or C<+>, between them, you can +turn them into a single expression. When you do this, you execute the +code in the block on the next line: if you see C<=>, you'll do the code +in line 2. If you see C<+>, you'll do the code in line 4. It's this +code which contributes to the op tree. + + | term ADDOP term + { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } + +What this does is creates a new binary op, and feeds it a number of +variables. The variables refer to the tokens: C<$1> is the first token +in the input, C<$2> the second, and so on - think regular expression +backreferences. C<$$> is the op returned from this reduction. So, we +call C<newBINOP> to create a new binary operator. The first parameter +to C<newBINOP>, a function in F<op.c>, is the op type. It's an addition +operator, so we want the type to be C<ADDOP>. We could specify this +directly, but it's right there as the second token in the input, so we +use C<$2>. The second parameter is the op's flags: 0 means "nothing +special". Then the things to add: the left and right hand side of our +expression, in scalar context. + +=head1 STACKS + +When perl executes something like C<addop>, how does it pass on its +results to the next op? The answer is, through the use of stacks. Perl +has a number of stacks to store things it's currently working on, and +we'll look at the three most important ones here. + +=head2 Argument stack + +Arguments are passed to PP code and returned from PP code using the +argument stack, C<ST>. The typical way to handle arguments is to pop +them off the stack, deal with them how you wish, and then push the +result back onto the stack. This is how, for instance, the cosine +operator works: + + NV value; + value = POPn; + value = Perl_cos(value); + XPUSHn(value); + +We'll see a more tricky example of this when we consider Perl's macros +below. C<POPn> gives you the NV (floating point value) of the top SV on +the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and +push the result back as an NV. The C<X> in C<XPUSHn> means that the +stack should be extended if necessary - it can't be necessary here, +because we know there's room for one more item on the stack, since +we've just removed one! The C<XPUSH*> macros at least guarantee safety. + +Alternatively, you can fiddle with the stack directly: C<SP> gives you +the first element in your portion of the stack, and C<TOP*> gives you +the top SV/IV/NV/etc. on the stack. So, for instance, to do unary +negation of an integer: + + SETi(-TOPi); + +Just set the integer value of the top stack entry to its negation. + +Argument stack manipulation in the core is exactly the same as it is in +XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer +description of the macros used in stack manipulation. + +=head2 Mark stack + +I say "your portion of the stack" above because PP code doesn't +necessarily get the whole stack to itself: if your function calls +another function, you'll only want to expose the arguments aimed for +the called function, and not (necessarily) let it get at your own data. +The way we do this is to have a "virtual" bottom-of-stack, exposed to +each function. The mark stack keeps bookmarks to locations in the +argument stack usable by each function. For instance, when dealing with +a tied variable, (internally, something with "P" magic) Perl has to +call methods for accesses to the tied variables. However, we need to +separate the arguments exposed to the method to the argument exposed to +the original function - the store or fetch or whatever it may be. +Here's roughly how the tied C<push> is implemented; see C<av_push> in +F<av.c>: + + 1 PUSHMARK(SP); + 2 EXTEND(SP,2); + 3 PUSHs(SvTIED_obj((SV*)av, mg)); + 4 PUSHs(val); + 5 PUTBACK; + 6 ENTER; + 7 call_method("PUSH", G_SCALAR|G_DISCARD); + 8 LEAVE; + +Let's examine the whole implementation, for practice: + + 1 PUSHMARK(SP); + +Push the current state of the stack pointer onto the mark stack. This +is so that when we've finished adding items to the argument stack, Perl +knows how many things we've added recently. + + 2 EXTEND(SP,2); + 3 PUSHs(SvTIED_obj((SV*)av, mg)); + 4 PUSHs(val); + +We're going to add two more items onto the argument stack: when you +have a tied array, the C<PUSH> subroutine receives the object and the +value to be pushed, and that's exactly what we have here - the tied +object, retrieved with C<SvTIED_obj>, and the value, the SV C<val>. + + 5 PUTBACK; + +Next we tell Perl to update the global stack pointer from our internal +variable: C<dSP> only gave us a local copy, not a reference to the +global. + + 6 ENTER; + 7 call_method("PUSH", G_SCALAR|G_DISCARD); + 8 LEAVE; + +C<ENTER> and C<LEAVE> localise a block of code - they make sure that +all variables are tidied up, everything that has been localised gets +its previous value returned, and so on. Think of them as the C<{> and +C<}> of a Perl block. + +To actually do the magic method call, we have to call a subroutine in +Perl space: C<call_method> takes care of that, and it's described in +L<perlcall>. We call the C<PUSH> method in scalar context, and we're +going to discard its return value. The call_method() function removes +the top element of the mark stack, so there is nothing for the caller +to clean up. + +=head2 Save stack + +C doesn't have a concept of local scope, so perl provides one. We've +seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save +stack implements the C equivalent of, for example: + + { + local $foo = 42; + ... + } + +See L<perlguts/Localising Changes> for how to use the save stack. + +=head1 MILLIONS OF MACROS + +One thing you'll notice about the Perl source is that it's full of +macros. Some have called the pervasive use of macros the hardest thing +to understand, others find it adds to clarity. Let's take an example, +the code which implements the addition operator: + + 1 PP(pp_add) + 2 { + 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); + 4 { + 5 dPOPTOPnnrl_ul; + 6 SETn( left + right ); + 7 RETURN; + 8 } + 9 } + +Every line here (apart from the braces, of course) contains a macro. +The first line sets up the function declaration as Perl expects for PP +code; line 3 sets up variable declarations for the argument stack and +the target, the return value of the operation. Finally, it tries to see +if the addition operation is overloaded; if so, the appropriate +subroutine is called. + +Line 5 is another variable declaration - all variable declarations +start with C<d> - which pops from the top of the argument stack two NVs +(hence C<nn>) and puts them into the variables C<right> and C<left>, +hence the C<rl>. These are the two operands to the addition operator. +Next, we call C<SETn> to set the NV of the return value to the result +of adding the two values. This done, we return - the C<RETURN> macro +makes sure that our return value is properly handled, and we pass the +next operator to run back to the main run loop. + +Most of these macros are explained in L<perlapi>, and some of the more +important ones are explained in L<perlxs> as well. Pay special +attention to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for +information on the C<[pad]THX_?> macros. + +=head1 FURTHER READING + +For more information on the Perl internals, please see the documents +listed at L<perl/Internals and C Language Interface>. |