summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPaul "LeoNerd" Evans <leonerd@leonerd.org.uk>2022-12-10 21:15:19 +0000
committerPaul "LeoNerd" Evans <leonerd@leonerd.org.uk>2023-02-10 14:38:44 +0000
commitb45c1f8aeb97b765fe1c30fe69a0b0bbff9d9925 (patch)
treeba05d926896c60283e0d2e9582a0aa52a7d8ce6c
parentd8b29a3430b219e3ab3dae2947a0ff22885c1b5e (diff)
downloadperl-b45c1f8aeb97b765fe1c30fe69a0b0bbff9d9925.tar.gz
Many documentation updates for new class syntax
* perlclass.pod: Document field initialising expressions and :param attribute * perlclass.pod: Add a TODO section to explain what more is to be added * Added pod/perlclassguts.pod which contains internal implementation notes and details about how the class system works.
-rw-r--r--MANIFEST1
-rw-r--r--pod/perl.pod1
-rw-r--r--pod/perlclass.pod110
-rw-r--r--pod/perlclassguts.pod409
-rw-r--r--win32/pod.mak4
5 files changed, 516 insertions, 9 deletions
diff --git a/MANIFEST b/MANIFEST
index 949ed2a0d6..193d703ec6 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5355,6 +5355,7 @@ pod/perlbot.pod
pod/perlcall.pod Perl calling conventions from C
pod/perlcheat.pod Perl cheat sheet
pod/perlclass.pod Perl class syntax
+pod/perlclassguts.pod Internals of class syntax
pod/perlclib.pod Internal replacements for standard C library functions
pod/perlcommunity.pod Perl community information
pod/perldata.pod Perl data structures
diff --git a/pod/perl.pod b/pod/perl.pod
index f75ec3b81e..58fa27a0ca 100644
--- a/pod/perl.pod
+++ b/pod/perl.pod
@@ -161,6 +161,7 @@ aux h2ph h2xs perlbug pl2pm pod2html pod2man splain xsubpp
perlmroapi Perl method resolution plugin interface
perlreapi Perl regular expression plugin interface
perlreguts Perl regular expression engine internals
+ perlclassguts Internals of class syntax
perlapi Perl API listing (autogenerated)
perlintern Perl internal functions (autogenerated)
diff --git a/pod/perlclass.pod b/pod/perlclass.pod
index b990f665c4..ad035c9218 100644
--- a/pod/perlclass.pod
+++ b/pod/perlclass.pod
@@ -74,6 +74,12 @@ Additionally, in the class BLOCK you are allowed to declare fields and methods.
field VARIABLE_NAME;
+ field VARIABLE_NAME = EXPR;
+
+ field VARIABLE_NAME : ATTRIBUTES;
+
+ field VARIABLE_NAME : ATTRIBUTES = EXPR;
+
Fields are variables which are visible in the scope of the class - more
specifically within L</method> and C<ADJUST> blocks. Each class instance get
their own storage of fields, independent of each other.
@@ -84,17 +90,28 @@ accessible from the outside). The main difference is that different instances
access different values in the same scope.
class WithFields {
- field $scalar;
- field @array;
- field %hash;
+ field $scalar = 42;
+ field @array = qw(this is just an array);
+ field %hash = (species => 'Martian', planet => 'Mars');
+ }
- ADJUST {
- $scalar = 42;
- @array = qw(this is just an array);
- %hash = (species => 'Marsian', planet => 'Mars');
- }
+Fields may optionally have initializing expressions. If present, the expression
+will be evaluated within the constructor of each object instance. During each
+evaluation, the expression can use the value of any previously-set field, as
+well as see any other variables in scope.
+
+ class WithACounter {
+ my $next_count = 1;
+ field $count = $next_count++;
}
+When combined with the C<:param> field attribute, the defaulting expression can
+use any of the C<=>, C<//=> or C<||=> operators. Expressions using C<=> will
+apply whenever the caller did not pass the corresponding parameter to the
+constructor at all. Expressions using C<//=> will also apply if the caller did
+pass the parameter but the value was undefined, and expressions using C<||=>
+will apply if the value was false.
+
=head2 method
method METHOD_NAME SIGNATURE BLOCK
@@ -167,7 +184,19 @@ already loaded.
=head2 Field attributes
-None yet.
+=head3 :param
+
+A scalar field with a C<:param> attribute will take its value from a named
+parameter passed to the constructor. By default the parameter will have the
+same name as the field (minus its leading C<$> sigil), but a different name
+can be specified in the attribute.
+
+ field $x :param;
+ field $y :param(the_y_value);
+
+If there is no defaulting expression then the parameter is required by the
+constructor; the caller must pass it or an exception is thrown. With a
+defaulting expression this becomes optional.
=head2 Method attributes
@@ -209,6 +238,69 @@ C<'OBJECT'>.
Just like with other references, when object reference count reaches zero it
will automatically be destroyed.
+=head1 TODO
+
+This feature is still experimental and very incomplete. The following list
+gives some overview of the kinds of work still to be added or changed:
+
+=over 4
+
+=item * Roles
+
+Some syntax for declaring a role (likely a C<role> keyword), and for consuming
+a role into a class (likely a C<:does()> attribute).
+
+=item * Parameters to ADJUST blocks
+
+Some syntax for declaring that an C<ADJUST> block can consume named
+parameters, which become part of the class constructor's API. This might be
+inspired by a similar plan to add named arguments to subroutine signatures.
+
+ class X {
+ ADJUST (:$alpha, :$beta = 123) {
+ ...
+ }
+ }
+
+ my $obj = X->new(alpha => 456);
+
+=item * ADJUST blocks as true blocks
+
+Currently, every ADJUST block is wrapped in its own CV that gets invoked with
+the full ENTERSUB overhead. It should be possible to use the same mechanism
+that makes all field initializer expressions appear within the same CV on
+ADJUST blocks as well, merging them all into a single CV per class. This will
+make it faster to invoke if a class has more than one of them.
+
+=item * Accessor generator attributes
+
+Attributes to request that accessor methods be generated for fields. Likely
+C<:reader> and C<:writer>.
+
+ class X {
+ field $name :reader;
+ }
+
+Equivalent to
+
+ class X {
+ field $name;
+ method name { return $name; }
+ }
+
+=item * Metaprogramming
+
+An extension of the metaprogramming API (currently proposed by
+L<RFC0022|https://github.com/Perl/RFCs/pull/25>) which adds knowledge of
+classes, methods, fields, ADJUST blocks, and other such class-related details.
+
+=item * Extension Customisation
+
+Ways in which out-of-core modules can interact with the class system,
+including an ability for them to provide new class or field attributes.
+
+=back
+
=head1 AUTHORS
Paul Evans
diff --git a/pod/perlclassguts.pod b/pod/perlclassguts.pod
new file mode 100644
index 0000000000..acecc815cf
--- /dev/null
+++ b/pod/perlclassguts.pod
@@ -0,0 +1,409 @@
+=head1 NAME
+
+perlclassguts - Internals of how C<feature 'class'> and class syntax works
+
+=head1 DESCRIPTION
+
+This document provides in-depth information about the way in which the perl
+interpreter implements the C<feature 'class'> syntax and overall behaviour.
+It is not intended as an end-user guide on how to use the feature. For that,
+see L<perlclass>.
+
+The reader is assumed to be generally familiar with the perl interpreter
+internals overall. For a more general overview of these details, see also
+L<perlguts>.
+
+=head1 DATA STORAGE
+
+=head2 Classes
+
+A class is fundamentally a package, and exists in the symbol table as an HV
+with an aux structure in exactly the same way as a non-class package. It is
+distinguished from a non-class package by the fact that the
+C<HvSTASH_IS_CLASS()> macro will return true on it.
+
+Extra information relating to it being a class is stored in the
+C<struct xpvhv_aux> structure attached to the stash, in the following fields:
+
+ HV *xhv_class_superclass;
+ CV *xhv_class_initfields_cv;
+ AV *xhv_class_adjust_blocks;
+ PADNAMELIST *xhv_class_fields;
+ PADOFFSET xhv_class_next_fieldix;
+ HV *xhv_class_param_map;
+
+=over 4
+
+=item *
+
+C<xhv_class_superclass> will be C<NULL> for a class with no superclass. It
+will point directly to the stash of the parent class if one has been set with
+the C<:isa()> class attribute.
+
+=item *
+
+C<xhv_class_initfields_cv> will contain a C<CV *> pointing to a function to be
+invoked as part of the constructor of this class or any subclass thereof. This
+CV is responsible for initializing all the fields defined by this class for a
+new instance. This CV will be an anonymous real function - i.e. while it has no
+name and no GV, it is I<not> a protosub and may be directly invoked.
+
+=item *
+
+C<xhv_class_adjust_blocks> may point to an AV containing CV pointers to each of
+the C<ADJUST> blocks defined on the class. If the class has a superclass, this
+array will additionally contain duplicate pointers of the CVs of its parent
+class. The AV is created lazily the first time an element is pushed to it; it
+is valid for there not to be one, and this pointer will be C<NULL> in that
+case.
+
+The CVs are stored directly, not via RVs. Each CV will be an anonymous real
+function.
+
+=item *
+
+C<xhv_class_fields> will point to a C<PADNAMELIST> containing C<PADNAME>s,
+each being one defined field of the class. They are stored in order of
+declaration. Note however, that the index into this array will not necessarily
+be equal to the C<fieldix> of each field, because in the case of a subclass,
+the array will begin at zero but the index of the first field in it will be
+non-zero if its parent class contains any fields at all.
+
+For more information on how individual fields are represented, see L</Fields>.
+
+=item *
+
+C<xhv_class_next_fieldix> gives the field index that will be assigned to the
+next field to be added to the class. It is only useful at compile-time.
+
+=item *
+
+C<xhv_class_param_map> may point to an HV which maps field C<:param> attribute
+names to the field index of the field with that name. This mapping is copied
+from parent classes; each class will contain the sum total of all its parents
+in addition to its own.
+
+=back
+
+=head2 Fields
+
+A field is still fundamentally a lexical variable declared in a scope, and
+exists in the C<PADNAMELIST> of its corresponding CV. Methods and other
+method-like CVs can still capture them exactly as they can with regular
+lexicals. A field is distinguished from other kinds of pad entry in that the
+C<PadnameIsFIELD()> macro will return true on it.
+
+Extra information relating to it being a field is stored in an additional
+structure accessible via the C<PadnameFIELDINFO()> macro on the padname. This
+structure has the following fields:
+
+ PADOFFSET fieldix;
+ HV *fieldstash;
+ OP *defop;
+ SV *paramname;
+ bool def_if_undef;
+ bool def_if_false;
+
+=over 4
+
+=item *
+
+C<fieldix> stores the "field index" of the field; that is, the index into the
+instance field array where this field's value will be stored. Note that the
+first index in the array is not specially reserved. The first field in a class
+will start from field index 0.
+
+=item *
+
+C<fieldstash> stores a pointer to the stash of the class that defined this
+field. This is necessary in case there are multiple classes defined within the
+same scope; it is used to disambiguate the fields of each.
+
+ {
+ class C1; field $x;
+ class C2; field $x;
+ }
+
+=item *
+
+C<defop> may store a pointer to a defaulting expression optree for this field.
+Defaulting expressions are optional; this field may be C<NULL>.
+
+=item *
+
+C<paramname> may point to a regular string SV containing the C<:param> name
+attribute given to the field. If none, it will be C<NULL>.
+
+=item *
+
+One of C<def_if_undef> and C<def_if_false> will be true if the defaulting
+expression was set using the C<//=> or C<||=> operators respectively.
+
+=back
+
+=head2 Methods
+
+A method is still fundamentally a CV, and has the same basic representation as
+one. It has an optree and a pad, and is stored via a GV in the stash of its
+containing package. It is distinguished from a non-method CV by the fact that
+the C<CvIsMETHOD()> macro will return true on it.
+
+(Note: This macro should not be confused with the one that was previously
+called C<CvMETHOD()>. That one does not relate to the class system, and was
+renamed to C<CvNOWARN_AMBIGUOUS()> to avoid this confusion.)
+
+There is currently no extra information that needs to be stored about a method
+CV, so the structure does not add any new fields.
+
+=head2 Instances
+
+Object instances are represented by an entirely new SV type, whose base type
+is C<SVt_PVOBJ>. This should still be blessed into its class stash and wrapped
+in an RV in the usual manner for classical object.
+
+As these are their own unique container type, distinct from hashes or arrays,
+the core C<builtin::reftype> function returns a new value when asked about
+these. That value is C<"OBJECT">.
+
+Internally, such an object is an array of SV pointers whose size is fixed at
+creation time (because the number of fields in a class is known after
+compilation). An object instance stores the max field index within it (for
+basic error-checking on access), and a fixed-size array of SV pointers storing
+the individual field values.
+
+Fields of array and hash type directly store AV or HV pointers into the array;
+they are not stored via an intervening RV.
+
+=head1 API
+
+The data structures described above are supported by the following API
+functions.
+
+=head2 Class Manipulation
+
+=head3 class_setup_stash
+
+ void class_setup_stash(HV *stash);
+
+Called by the parser on encountering the C<class> keyword. It upgrades the
+stash into being a class and prepares it for receiving class-specific items
+like methods and fields.
+
+=head3 class_seal_stash
+
+ void class_seal_stash(HV *stash);
+
+Called by the parser at the end of a C<class> block, or for unit classes its
+containing scope. This function performs various finalisation activities that
+are required before instances of the class can be constructed, but could not
+have been done until all the information about the members of the class is
+known.
+
+Any additions to or modifications of the class under compilation must be
+performed between these two function calls. Classes cannot be modified once
+they have been sealed.
+
+=head3 class_add_field
+
+ void class_add_field(HV *stash, PADNAME *pn);
+
+Called by F<pad.c> as part of defining a new field name in the current pad.
+Note that this function does I<not> create the padname; that must already be
+done by F<pad.c>. This API function simply informs the class that the new
+field name has been created and is now available for it.
+
+=head3 class_add_ADJUST
+
+ void class_add_ADJUST(HV *stash, CV *cv);
+
+Called by the parser once it has parsed and constructed a CV for a new
+C<ADJUST> block. This gets added to the list stored by the class.
+
+=head2 Field Manipulation
+
+=head3 class_prepare_initfield_parse
+
+ void class_prepare_initfield_parse();
+
+Called by the parser just before parsing an initializing expression for a
+field variable. This makes use of a suspended compcv to combine all the field
+initializing expressions into the same CV.
+
+=head3 class_set_field_defop
+
+ void class_set_field_defop(PADNAME *pn, OPCODE defmode, OP *defop);
+
+Called by the parser after it has parsed an initializing expression for the
+field. Sets the defaulting expression and mode of application. C<defmode>
+should either be zero, or one of C<OP_ORASSIGN> or C<OP_DORASSIGN> depending
+on the defaulting mode.
+
+=head3 padadd_FIELD
+
+ #define padadd_FIELD
+
+This flag constant tells the C<pad_add_name_*> family of functions that the
+new name should be added as a field. There is no need to call
+C<class_add_field()>; this will be done automatically.
+
+=head2 Method Manipulation
+
+=head3 class_prepare_method_parse
+
+ void class_prepare_method_parse(CV *cv);
+
+Called by the parser after C<start_subparse()> but immediately before doing
+anything else. This prepares the C<PL_compcv> for parsing a method; arranging
+for the C<CvIsMETHOD> test to be true, adding the C<$self> lexical, and any
+other activities that may be required.
+
+=head3 class_wrap_method_body
+
+ OP *class_wrap_method_body(OP *o);
+
+Called by the parser at the end of parsing a method body into an optree but
+just before wrapping it in the eventual CV. This function inserts extra ops
+into the optree to make the method work correctly.
+
+=head2 Object Instances
+
+=head3 SVt_PVOBJ
+
+ #define SVt_PVOBJ
+
+An SV type constant used for comparison with the C<SvTYPE()> macro.
+
+=head3 ObjectMAXFIELD
+
+ SSize_t ObjectMAXFIELD(sv);
+
+A function-like macro that obtains the maximum valid field index that can be
+accessed from the C<ObjectFIELDS> array.
+
+=head3 ObjectFIELDS
+
+ SV **ObjectFIELDS(sv);
+
+A function-like macro that obtains the fields array directly out of an object
+instance. Fields can be accessed by their field index, from 0 up to the maximum
+valid index given by C<ObjectMAXFIELD>.
+
+=head1 OPCODES
+
+=head2 OP_METHSTART
+
+ newUNOP_AUX(OP_METHSTART, ...);
+
+An C<OP_METHSTART> is an C<UNOP_AUX> which must be present at the start of a
+method CV in order to make it work properly. This is inserted by
+C<class_wrap_method_body()>, and even appears before any optree fragment
+associated with signature argument checking or extraction.
+
+This op is responsible for shifting the value of C<$self> out of the arguments
+list and binding any field variables that the method requires access to into
+the pad. The AUX vector will contain details of the field/pad index pairings
+required.
+
+This op also performs sanity checking on the invocant value. It checks that it
+is definitely an object reference of a compatible class type. If not, an
+exception is thrown.
+
+If the C<op_private> field includes the C<OPpINITFIELDS> flag, this indicates
+that the op begins the special C<xhv_class_initfields_cv> CV. In this case it
+should additionally take the second value from the arguments list, which
+should be a plain HV pointer (I<directly>, not via RV). and bind it to the
+second pad slot, where the generated optree will expect to find it.
+
+=head2 OP_INITFIELD
+
+An C<OP_INITFIELD> is only invoked as part of the C<xhv_class_initfields_cv>
+CV during the construction phase of an instance. This is the time that the
+individual SVs that make up the mutable fields of the instance (including AVs
+and HVs) are actually assigned into the C<ObjectFIELDS> array. The
+C<OPpINITFIELD_AV> and C<OPpINITFIELD_HV> private flags indicate whether it is
+creating an AV or HV; if neither is set then an SV is created.
+
+If the op has the C<OPf_STACKED> flag it expects to find an initializing value
+on the stack. For SVs this is the topmost SV on the data stack. For AVs and
+HVs it expects a marked list.
+
+=head1 COMPILE-TIME BEHAVIOUR
+
+=head2 C<ADJUST> Phasers
+
+During compiletime, parsing of an C<ADJUST> phaser is handled in a
+fundamentally different way to the existing perl phasers (C<BEGIN>, etc...)
+
+Rather than taking the usual route, the tokenizer recognises that the
+C<ADJUST> keyword introduces a phaser block. The parser then parses the body
+of this block similarly to how it would parse an (anonymous) method body,
+creating a CV that has no name GV. This is then inserted directly into the
+class information by calling C<class_add_ADJUST>, entirely bypassing the
+symbol table.
+
+=head2 Attributes
+
+During compilation, attributes of both classes and fields are handled in a
+different way to existing perl attributes on subroutines and lexical
+variables.
+
+The parser still forms an C<OP_LIST> optree of C<OP_CONST> nodes, but these
+are passed to the C<class_apply_attributes> or C<class_apply_field_attributes>
+functions. Rather than using a class lookup for a method in the class being
+parsed, a fixed internal list of known attributes is used to find functions to
+apply the attribute to the class or field. In future this may support
+user-supplied extension attribute, though at present it only recognises ones
+defined by the core itself.
+
+=head2 Field Initializing Expressions
+
+During compilation, the parser makes use of a suspended compcv when parsing
+the defaulting expression for a field. All the expressions for all the fields
+in the class share the same suspended compcv, which is then compiled up into
+the same internal CV called by the constructor to intialize all the fields
+provided by that class.
+
+=head1 RUNTIME BEHAVIOUR
+
+=head2 Constructor
+
+The generated constructor for a class itself is an XSUB which performs three
+tasks in order: it creates the instance SV itself, invokes the field
+initializers, then invokes the ADJUST block CVs. The constructor for any class
+is always the same basic shape, regardless of whether the class has a
+superclass or not.
+
+The field initializers are collected into a generated optree-based CV called
+the field initializer CV. This is the CV which contains all the optree
+fragments for the field initializing expressions. When invoked, the field
+initializer CV might make a chained call to the superclass initializer if one
+exists, before invoking all of the individual field initialization ops. The
+field initializer CV is invoked with two items on the stack; being the
+instance SV and a direct HV containing the constructor parameters. Note
+carefully: this HV is passed I<directly>, not via an RV reference. This is
+permitted because both the caller and the callee are directly generated code
+and not arbitrary pure-perl subroutines.
+
+The ADJUST block CVs are all collected into a single flat list, merging all of
+the ones defined by the superclass as well. They are all invoked in order,
+after the field initializer CV.
+
+=head2 C<$self> Access During Methods
+
+When C<class_prepare_method_parse()> is called, it arranges that the pad of
+the new CV body will begin with a lexical called C<$self>. Because the pad
+should be freshly-created at this point, this will have the pad index of 1.
+The function checks this and aborts if that is not true.
+
+Because of this fact, code within the body of a method or method-like CV can
+reliably use pad index 1 to obtain the invocant reference. The C<OP_INITFIELD>
+opcode also relies on this fact.
+
+In similar fashion, during the C<xhv_class_initfields_cv> the next pad slot is
+relied on to store the constructor parameters HV, at pad index 2.
+
+=head1 AUTHORS
+
+Paul Evans
+
+=cut
diff --git a/win32/pod.mak b/win32/pod.mak
index 597e688aa2..f5b9830aaa 100644
--- a/win32/pod.mak
+++ b/win32/pod.mak
@@ -98,6 +98,7 @@ POD = perl.pod \
perlcall.pod \
perlcheat.pod \
perlclass.pod \
+ perlclassguts.pod \
perlclib.pod \
perlcommunity.pod \
perldata.pod \
@@ -274,6 +275,7 @@ MAN = perl.man \
perlcall.man \
perlcheat.man \
perlclass.man \
+ perlclassguts.man \
perlclib.man \
perlcommunity.man \
perldata.man \
@@ -450,6 +452,7 @@ HTML = perl.html \
perlcall.html \
perlcheat.html \
perlclass.html \
+ perlclassguts.html \
perlclib.html \
perlcommunity.html \
perldata.html \
@@ -626,6 +629,7 @@ TEX = perl.tex \
perlcall.tex \
perlcheat.tex \
perlclass.tex \
+ perlclassguts.tex \
perlclib.tex \
perlcommunity.tex \
perldata.tex \