author Father Chrysostomos <sprout@cpan.org> 2016-08-10 23:43:34 -0700
committer Father Chrysostomos <sprout@cpan.org> 2016-08-25 21:53:23 -0700
commit 65169990ec2fa183dd798b11e833db0f15b2dc24
tree 155cf3681646477bde189f989e4d07bc78a55766 /pod/perlinterp.pod
parent 1fdb5498519a40e7ce6b5adced61c49638141e25
perlinterp.pod: Expand the op tree section
based on things that came up in the thread starting at <20160808225325.79944d95@shy.leonerd.org.uk>.
Diffstat (limited to 'pod/perlinterp.pod')
-rw-r--r-- pod/perlinterp.pod 89
1 files changed, 84 insertions, 5 deletions
diff --git a/pod/perlinterp.pod b/pod/perlinterp.pod
index 5c41e29391..e1af33370a 100644
--- a/pod/perlinterp.pod
+++ b/pod/perlinterp.pod
@@ -531,8 +531,45 @@ statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.
The way Perl builds up these op trees in the parsing process can be
-unravelled by examining F<perly.y>, the YACC grammar. Let's take the
-piece we need to construct the tree for C<$a = $b + $c>
+unravelled by examining F<toke.c>, the lexer, and F<perly.y>, the YACC
+grammar. Let's look at the code that constructs the tree for C<$a = $b +
+$c>.
+
+First, we'll look at the C<Perl_yylex> function in the lexer. We want to
+look for C<case 'x'>, where x is the first character of the operator.
+(Incidentally, when looking for the code that handles a keyword, you'll
+want to search for C<KEY_foo>, where "foo" is the keyword.) Here is the code
+that handles assignment (there are quite a few operators beginning with
+C<=>, so most of it is omitted for brevity):
+
+ 1 case '=':
+ 2 s++;
+ ... code that handles == => etc. and pod ...
+ 3 pl_yylval.ival = 0;
+ 4 OPERATOR(ASSIGNOP);
+
+We can see on line 4 that our token type is C<ASSIGNOP>. (C<OPERATOR> is a
+macro, defined in F<toke.c>, that returns the token type; among other
+things, it sets C<PL_expect> to C<XTERM>, since an operator is normally
+followed by a term.) Now for C<+>:
+
+ 1 case '+':
+ 2 {
+ 3 const char tmp = *s++;
+ ... code for ++ ...
+ 4 if (PL_expect == XOPERATOR) {
+ ...
+ 5 Aop(OP_ADD);
+ 6 }
+ ...
+ 7 }
+
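+Line 4 checks what type of token we are expecting. A C<+> that appears
+where an operator is expected is binary addition; otherwise it is unary
+plus, which is handled separately. C<Aop> returns a token: if you search
+for its definition in F<toke.c>, you will see that it returns an
+C<ADDOP> token.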
+
+Now that we know the two token types we want to look for in the parser,
+let's take the piece of F<perly.y> we need to construct the tree for
+C<$a = $b + $c>:
1 term : term ASSIGNOP term
2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
@@ -541,9 +578,8 @@ piece we need to construct the tree for C<$a = $b + $c>
If you're not used to reading BNF grammars, this is how it works:
You're fed certain things by the tokeniser, which generally end up in
-upper case. Here, C<ADDOP>, is provided when the tokeniser sees C<+> in
-your code. C<ASSIGNOP> is provided when C<=> is used for assigning.
-These are "terminal symbols", because you can't get any simpler than
+upper case. C<ADDOP> and C<ASSIGNOP> are examples of "terminal symbols",
+because you can't get any simpler than
them.
The grammar, lines one and three of the snippet above, tells you how to
@@ -580,6 +616,49 @@ use C<$2>. The second parameter is the op's flags: 0 means "nothing
special". Then the things to add: the left and right hand side of our
expression, in scalar context.
+The functions that create ops, which have names like C<newUNOP> and
+C<newBINOP>, call a "check" function associated with the op type before
+returning the op. The check functions can mangle the op as they see fit,
+and even replace it with an entirely new one. These functions are defined
+in F<op.c>, and have a C<Perl_ck_> prefix. You can find out which
+check function is used for a particular op type by looking in
+F<regen/opcodes>. Take C<OP_ADD>, for example. (C<OP_ADD> is the token
+value from C<Aop(OP_ADD)> in F<toke.c>, which the parser passes to
+C<newBINOP> as its first argument.) Here is the relevant line:
+
+ add addition (+) ck_null IfsT2 S S
+
+The check function in this case is C<Perl_ck_null>, which does nothing.
+Let's look at a more interesting case:
+
+ readline <HANDLE> ck_readline t% F?
+
+And here is the function from F<op.c>:
+
+ 1 OP *
+ 2 Perl_ck_readline(pTHX_ OP *o)
+ 3 {
+ 4 PERL_ARGS_ASSERT_CK_READLINE;
+ 5
+ 6 if (o->op_flags & OPf_KIDS) {
+ 7 OP *kid = cLISTOPo->op_first;
+ 8 if (kid->op_type == OP_RV2GV)
+ 9 kid->op_private |= OPpALLOW_FAKE;
+ 10 }
+ 11 else {
+ 12 OP * const newop
+ 13 = newUNOP(OP_READLINE, 0, newGVOP(OP_GV, 0,
+ 14 PL_argvgv));
+ 15 op_free(o);
+ 16 return newop;
+ 17 }
+ 18 return o;
+ 19 }
+
+One particularly interesting aspect is that if the op has no kids (i.e.,
+C<readline()> or C<< <> >>), the op is freed and replaced with an entirely
+new one that references C<*ARGV> (lines 12-16).
+
=head1 STACKS
When perl executes something like C<addop>, how does it pass on its