Diffstat (limited to 'internals')
-rw-r--r-- | internals | 295 |
1 files changed, 295 insertions, 0 deletions
diff --git a/internals b/internals
new file mode 100644
index 0000000000..471ad95c08
--- /dev/null
+++ b/internals
@@ -0,0 +1,295 @@
+Newsgroups: comp.lang.perl
+Subject: Re: perl5a4: tie ref restriction?
+Summary:
+Expires:
+References: <2h7b64$aai@jethro.Corp.Sun.COM>
+Sender:
+Followup-To:
+Distribution: world
+Organization: NetLabs, Inc.
+Keywords:
+
+In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
+: Darn:
+:         tie ( @a, TST_tie, "arg1", "arg2" );
+:         $a[2]=[1];
+:
+: produces:
+:
+:         Can't assign a reference to a magical variable at ./tsttie line 12.
+:
+: I'm all agog about the "tie" function, but ... if this restriction
+: wasn't there, I think I would be able to tie a top level
+: reference/variable to my own package, and then automatically tie in all
+: subsequently linked vars/references so that I could "tie" any arbitrary thing
+: like:
+:         $r->{key}[el]{key}
+:
+: to a DBM or other type storage area.
+:
+: Is the restriction necessary?
+
+In the current storage scheme, yes, but as I mentioned in the other
+article, I can and probably should relax that. That code is some of
+the oldest Perl 5 code, and I didn't see some things then that I do
+now.
+
+Ok, let me explain some things about how values are stored. Consider
+this a little design document.
+
+Internally everything is unified to look like a scalar, regardless of
+its type. There's a type-invariant part of every value, and a
+type-variant part. When we modify the type of a value, we can do it in
+place because all references point to the invariant part. All we do is
+swap the variant part for a different part and change that ANY pointer
+in the invariant part to point to the new variant.
+
+The invariant part looks like this:
+
+struct sv {
+    void*       sv_any;         /* pointer to something */
+    U32         sv_refcnt;      /* how many references to us */
+    SVTYPE      sv_type;        /* what sort of thing pointer points to */
+    U8          sv_flags;       /* extra flags, some depending on type */
+    U8          sv_storage;     /* storage class */
+    U8          sv_private;     /* extra value, depending on type */
+};
+
+This is typedefed to SV. There are other structurally equivalent
+types, AV, HV and CV, that are there merely to help gdb know what kind
+of pointer sv_any is, and provide a little bit of C type-checking.
+Here's a key to Perl naming:
+
+    SV  scalar value
+    AV  array value
+    HV  hash value
+    CV  code value
+
+Additionally I often use names containing
+
+    IV  integer value
+    NV  numeric value (double)
+    PV  pointer value
+    LV  lvalue, such as a substr() or vec() being assigned to
+    BM  a string containing a Boyer-Moore compiled pattern
+    FM  a format line program
+
+You'll notice that in SV there's an sv_type field. This contains one
+of the following values, which gives the interpretation of sv_any.
+
+typedef enum {
+        SVt_NULL,
+        SVt_REF,
+        SVt_IV,
+        SVt_NV,
+        SVt_PV,
+        SVt_PVIV,
+        SVt_PVNV,
+        SVt_PVMG,
+        SVt_PVLV,
+        SVt_PVAV,
+        SVt_PVHV,
+        SVt_PVCV,
+        SVt_PVGV,
+        SVt_PVBM,
+        SVt_PVFM,
+} svtype;
+
+These are arranged ROUGHLY in order of increasing complexity, though
+there are some discontinuities. Many of them indicate that sv_any
+points to a struct of a similar name with an X on the front. They can
+be classified like this:
+
+    SVt_NULL
+        The sv_any doesn't point to anything meaningful.
+
+    SVt_REF
+        The sv_any points to another SV. (This is what we're talking
+        about changing to work more like IV and NV below.)
+
+    SVt_IV
+    SVt_NV
+        These are a little tricky in order to be efficient in both
+        memory and time. The sv_any pointer indicates the location of
+        a solitary integer(double), but not directly. The pointer is
+        really a pointer to an XPVIV(XPVNV), so that if there's a valid
+        integer(double) the same code works regardless of the type of
+        the SV. They have special allocators that guarantee that, even
+        though sv_any is pointing to a location several words earlier
+        than the integer(double), it never points to unallocated
+        memory. This does waste a few allocated integers(doubles) at
+        the beginning, but it's probably an overall win.
+
+    SVt_PV
+    SVt_PVIV
+    SVt_PVNV
+    SVt_PVMG
+        These are pretty ordinary, and each is "derived" from the
+        previous in the sense that it just adds more data to the
+        previous structure.
+
+        struct xpv {
+            char *      xpv_pv;         /* pointer to malloced string */
+            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
+            STRLEN      xpv_len;        /* allocated size */
+        };
+
+        This is your basic string scalar that is never used numerically
+        or magically.
+
+        struct xpviv {
+            char *      xpv_pv;         /* pointer to malloced string */
+            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
+            STRLEN      xpv_len;        /* allocated size */
+            I32         xiv_iv;         /* integer value or pv offset */
+        };
+
+        This is a string scalar that has either been used as an
+        integer, or an integer that has been used in a string
+        context, or has had the front trimmed off of it, in which
+        case xiv_iv contains how far xpv_pv has been incremented
+        from the original allocated value.
+
+        struct xpvnv {
+            char *      xpv_pv;         /* pointer to malloced string */
+            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
+            STRLEN      xpv_len;        /* allocated size */
+            I32         xiv_iv;         /* integer value or pv offset */
+            double      xnv_nv;         /* numeric value, if any */
+        };
+
+        This is a string or integer scalar that has been used in a
+        numeric context, or a number that has been used in a string
+        or integer context.
+
+        struct xpvmg {
+            char *      xpv_pv;         /* pointer to malloced string */
+            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
+            STRLEN      xpv_len;        /* allocated size */
+            I32         xiv_iv;         /* integer value or pv offset */
+            double      xnv_nv;         /* numeric value, if any */
+            MAGIC*      xmg_magic;      /* linked list of magicalness */
+            HV*         xmg_stash;      /* class package */
+        };
+
+        This is the top of the line for ordinary scalars. This scalar
+        has been charmed with one or more kinds of magical or object
+        behavior. In addition it can contain any or all of integer,
+        double or string.
+
+    SVt_PVLV
+    SVt_PVAV
+    SVt_PVHV
+    SVt_PVCV
+    SVt_PVGV
+    SVt_PVBM
+    SVt_PVFM
+        These are specialized forms that are never directly visible to
+        the Perl script. They are independent of each other, and may
+        not be promoted to any other type.
+
+There are several additional data values in the SV structure. The sv_refcnt
+gives the number of references to this SV. Some of these references may be
+actual Perl language references, but many others are just internal pointers,
+from a symbol table, or from the syntax tree, for example. When sv_refcnt
+goes to zero, the value can be safely deallocated.
+
+The sv_storage byte is not very well thought out, but tends to indicate
+something about where the scalar lives. It's used in allocating
+lexical storage, and at runtime contains an 'O' if the value has been
+blessed as an object. There may be some conflicts lurking in here, and
+I may eventually claim some of the bits for other purposes.
+
+The sv_flags are currently as follows. Most of these are set and cleared
+by macros to guarantee their consistency, and you should always use the
+proper macro rather than accessing them directly.
+
+#define SVf_IOK         1               /* has valid integer value */
+#define SVf_NOK         2               /* has valid numeric value */
+#define SVf_POK         4               /* has valid pointer value */
+    These tell whether an integer, double or string value is
+    immediately available without further consideration. All tainting
+    and magic (but not objecthood) works by turning off these bits and
+    forcing a routine to be executed to discover the real value. The
+    SvIV(), SvNV() and SvPV() macros that fetch values are smart about
+    all this, and should always be used if possible. Most of the stuff
+    mentioned below you really don't have to deal with directly. (Values
+    aren't stored using macros, but using functions sv_setiv(), sv_setnv()
+    and sv_setpv(), plus variants. You should never have to explicitly
+    follow the sv_any pointer to any X structure in your code.)
+
+#define SVf_OOK         8               /* has valid offset value */
+    This is only on when SVf_IOK is off, and indicates that the unused
+    integer storage is holding an offset for the string pointer value
+    because you've done something like s/^prefix//.
+
+#define SVf_MAGICAL     16              /* has special methods */
+    This indicates not only that sv_type is at least SVt_PVMG, but
+    also that the linked list of magical behaviors is not empty.
+
+#define SVf_OK          32              /* has defined value */
+    This indicates that the value is defined. Currently it means either
+    that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
+    is set.
+
+#define SVf_TEMP        64              /* eventually in sv_private? */
+    This indicates that the string is a temporary allocated by one of
+    the sv_mortal functions, and that any string value may be stolen
+    from it without copying. (It's important not to steal the value if
+    the temporary will continue to require the value, however.)
+
+#define SVf_READONLY    128             /* may not be modified */
+    This scalar value may not be modified. Any function that might modify
+    a scalar should check for this first, and reject the operation when
+    inappropriate. Currently only the builtin values for sv_undef, sv_yes
+    and sv_no are marked readonly, but eventually we may provide a language
+    to set this bit.
+
+The sv_private byte contains some additional bits that apply across the
+board. Really private bits (that depend on the type) are allocated from
+128 down.
+
+#define SVp_IOK         1               /* has valid non-public integer value */
+#define SVp_NOK         2               /* has valid non-public numeric value */
+#define SVp_POK         4               /* has valid non-public pointer value */
+    These shadow the bits in sv_flags for tainted variables, indicating that
+    there really is a valid value available, but you have to set the global
+    tainted flag if you access them.
+
+#define SVp_SCREAM      8               /* has been studied? */
+    Indicates that a study was done on this string. A studied string is
+    magical and automatically unstudies itself when modified.
+
+#define SVp_TAINTEDDIR  16              /* PATH component is a security risk */
+    A special flag for $ENV{PATH} that indicates that, while the value
+    as a whole may be untainted, some path component names an insecure
+    directory.
+
+#define SVpfm_COMPILED  128
+    For a format, whether its picture has been "compiled" yet. This
+    cannot be done until runtime because the user has access to the
+    internal formline function, and may supply a variable as the
+    picture.
+
+#define SVpbm_VALID     128
+#define SVpbm_CASEFOLD  64
+#define SVpbm_TAIL      32
+    For a Boyer-Moore pattern, whether the search string has been invalidated
+    by modification (can happen to $pat between calls to index($string,$pat)),
+    whether case folding is in force for regexp matching, and whether we're
+    trying to match something like /foo$/.
+
+#define SVpgv_MULTI     128
+    For a symbol table entry, set when we've decided that this symbol is
+    probably not a typo. Suspected typos can be reported by -w.
+
+
+Well, that's probably enough for now. As you can see, we could turn
+references into something more like an integer or a pointer value. In
+fact, I suspect the right thing to do is say that a reference is just
+a funny type of string pointer that isn't allocated the same way.
+This would let us not only have references to scalars, but might provide
+a way to have scalars that point to non-malloced memory. Hmm. I'll
+have to think about that s'more. You can think about it too.
+
+Larry
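
The in-place type change described in the post (swap the variant part, repoint the ANY pointer, leave every reference to the invariant part untouched) can be illustrated with a small C sketch. Everything here is a hypothetical stand-in and not Perl's actual code: the `toy_*` names, the two-level `T_PV`/`T_PVIV` hierarchy, and the `TOYf_*` flag values (chosen only to mirror `SVf_IOK` and `SVf_POK`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the structs in the post. */
typedef enum { T_NULL, T_PV, T_PVIV } toy_type;

#define TOYf_IOK 1   /* mirrors SVf_IOK: valid integer value */
#define TOYf_POK 4   /* mirrors SVf_POK: valid string value  */

/* Variant parts: the larger one extends the smaller one. */
typedef struct { char *pv; size_t cur; size_t len; } toy_xpv;
typedef struct { char *pv; size_t cur; size_t len; long iv; } toy_xpviv;

/* Invariant part: the only thing other code ever points at. */
typedef struct {
    void         *any;     /* points to the current variant part */
    unsigned      refcnt;
    toy_type      type;
    unsigned char flags;
} toy_sv;

/* Create a string-only value (variant part is a toy_xpv). */
toy_sv *toy_new_pv(const char *s)
{
    toy_sv  *sv   = malloc(sizeof *sv);
    toy_xpv *body = malloc(sizeof *body);
    if (!sv || !body) abort();
    body->cur = strlen(s);
    body->len = body->cur + 1;
    body->pv  = malloc(body->len);
    if (!body->pv) abort();
    memcpy(body->pv, s, body->len);
    sv->any = body;
    sv->refcnt = 1;
    sv->type = T_PV;
    sv->flags = TOYf_POK;
    return sv;
}

/* Upgrade PV -> PVIV in place: allocate the bigger variant, move the
 * shared fields across, then repoint `any`. The toy_sv address never
 * changes, so existing pointers to it stay valid. */
void toy_upgrade_to_pviv(toy_sv *sv)
{
    assert(sv->type == T_PV || sv->type == T_PVIV);
    if (sv->type == T_PVIV) return;
    toy_xpv   *old  = sv->any;
    toy_xpviv *body = malloc(sizeof *body);
    if (!body) abort();
    body->pv  = old->pv;   /* string buffer is handed over, not copied */
    body->cur = old->cur;
    body->len = old->len;
    body->iv  = 0;
    free(old);
    sv->any  = body;
    sv->type = T_PVIV;
}

/* Store an integer, upgrading first if the variant has no room for it. */
void toy_set_iv(toy_sv *sv, long iv)
{
    toy_upgrade_to_pviv(sv);
    ((toy_xpviv *)sv->any)->iv = iv;
    sv->flags |= TOYf_IOK;
}

/* Release the value; both variants start with the pv pointer. */
void toy_free(toy_sv *sv)
{
    free(((toy_xpv *)sv->any)->pv);
    free(sv->any);
    free(sv);
}
```

If a caller keeps a second pointer to the same `toy_sv` and then calls `toy_set_iv()` through the first, the second pointer sees the upgraded type and the new integer without being touched, which is the property the post relies on. The sketch casts to the variant struct directly for brevity, whereas the post advises going through the accessor macros and the sv_set functions instead.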