Diffstat (limited to 'internals')
-rw-r--r-- internals 295
1 files changed, 295 insertions, 0 deletions
diff --git a/internals b/internals
new file mode 100644
index 0000000000..471ad95c08
--- /dev/null
+++ b/internals
@@ -0,0 +1,295 @@
+Newsgroups: comp.lang.perl
+Subject: Re: perl5a4: tie ref restriction?
+Summary:
+Expires:
+References: <2h7b64$aai@jethro.Corp.Sun.COM>
+Sender:
+Followup-To:
+Distribution: world
+Organization: NetLabs, Inc.
+Keywords:
+
+In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
+: Darn:
+: tie ( @a, TST_tie, "arg1", "arg2" );
+: $a[2]=[1];
+:
+: produces:
+:
+: Can't assign a reference to a magical variable at ./tsttie line 12.
+:
+: I'm all agog about the "tie" function, but ... if this restriction
+: wasn't there, I think I would be able to tie a top level
+: reference/variable to my own package, and then automatically tie in all
+: subsequently linked vars/references so that I could "tie" any arbitrary thing
+: like:
+: $r->{key}[el]{key}
+:
+: to a DBM or other type storage area.
+:
+: Is the restriction necessary?
+
+In the current storage scheme, yes, but as I mentioned in the other
+article, I can and probably should relax that. That code is some of
+the oldest Perl 5 code, and I didn't see some things then that I do
+now.
+
+Ok, let me explain some things about how values are stored. Consider
+this a little design document.
+
+Internally everything is unified to look like a scalar, regardless of
+its type. There's a type-invariant part of every value, and a
+type-variant part. When we modify the type of a value, we can do it in
+place because all references point to the invariant part. All we do is
+swap the variant part for a different one and change the sv_any pointer
+in the invariant part to point to the new variant.
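
A minimal sketch of that in-place type change, assuming a simplified layout. The names toy_sv, toy_xpv, toy_xpviv, toy_newpv, toy_upgrade_to_pviv, toy_pv and toy_free are hypothetical and are not taken from the Perl source. It only illustrates the idea that every holder of a pointer to the invariant head sees the new type after the variant body is swapped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum toy_type { TOY_PV, TOY_PVIV };

/* String-only variant body (hypothetical). */
struct toy_xpv {
    char  *xpv_pv;
    size_t xpv_cur;
    size_t xpv_len;
};

/* String-plus-integer body; it embeds the smaller body as its first
 * member, so a pointer to either can be read as a struct toy_xpv. */
struct toy_xpviv {
    struct toy_xpv pv;
    int32_t        xiv_iv;
};

/* Invariant head: the only part other code holds pointers to. */
struct toy_sv {
    void         *sv_any;
    uint32_t      sv_refcnt;
    enum toy_type sv_type;
};

/* Create a string-only value (assumed constructor). */
static struct toy_sv *toy_newpv(const char *s)
{
    size_t len = strlen(s);
    struct toy_sv  *sv   = malloc(sizeof *sv);
    struct toy_xpv *body = malloc(sizeof *body);
    char           *buf  = malloc(len + 1);

    if (!sv || !body || !buf) {
        free(sv); free(body); free(buf);
        return NULL;
    }
    memcpy(buf, s, len + 1);
    body->xpv_pv = buf; body->xpv_cur = len; body->xpv_len = len + 1;
    sv->sv_any = body; sv->sv_refcnt = 1; sv->sv_type = TOY_PV;
    return sv;
}

/* Swap the variant body for a larger one; the head never moves. */
static int toy_upgrade_to_pviv(struct toy_sv *sv)
{
    struct toy_xpviv *bigger;

    if (sv->sv_type == TOY_PVIV)
        return 1;
    bigger = malloc(sizeof *bigger);
    if (!bigger)
        return 0;
    bigger->pv = *(struct toy_xpv *)sv->sv_any;  /* carry the string over */
    bigger->xiv_iv = 0;
    free(sv->sv_any);                            /* discard the old body */
    sv->sv_any = bigger;
    sv->sv_type = TOY_PVIV;
    return 1;
}

/* Works for both body types because the string fields come first. */
static const char *toy_pv(const struct toy_sv *sv)
{
    return ((const struct toy_xpv *)sv->sv_any)->xpv_pv;
}

static void toy_free(struct toy_sv *sv)
{
    free(((struct toy_xpv *)sv->sv_any)->xpv_pv);
    free(sv->sv_any);
    free(sv);
}
```

Any second pointer to the same toy_sv taken before the upgrade remains valid afterwards, since only sv_any and sv_type were rewritten.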
+
+The invariant part looks like this:
+
+struct sv {
+ void* sv_any; /* pointer to something */
+ U32 sv_refcnt; /* how many references to us */
+ SVTYPE sv_type; /* what sort of thing pointer points to */
+ U8 sv_flags; /* extra flags, some depending on type */
+ U8 sv_storage; /* storage class */
+ U8 sv_private; /* extra value, depending on type */
+};
+
+This is typedefed to SV. There are other structurally equivalent
+types, AV, HV and CV, that are there merely to help gdb know what kind
+of pointer sv_any is, and provide a little bit of C type-checking.
+Here's a key to Perl naming:
+
+ SV scalar value
+ AV array value
+ HV hash value
+ CV code value
+
+Additionally I often use names containing
+
+ IV integer value
+ NV numeric value (double)
+ PV pointer value
+ LV lvalue, such as a substr() or vec() being assigned to
+ BM a string containing a Boyer-Moore compiled pattern
+ FM a format line program
+
+You'll notice that in SV there's an sv_type field. This contains one
+of the following values, which gives the interpretation of sv_any.
+
+typedef enum {
+ SVt_NULL,
+ SVt_REF,
+ SVt_IV,
+ SVt_NV,
+ SVt_PV,
+ SVt_PVIV,
+ SVt_PVNV,
+ SVt_PVMG,
+ SVt_PVLV,
+ SVt_PVAV,
+ SVt_PVHV,
+ SVt_PVCV,
+ SVt_PVGV,
+ SVt_PVBM,
+ SVt_PVFM,
+} svtype;
+
+These are arranged ROUGHLY in order of increasing complexity, though
+there are some discontinuities. Many of them indicate that sv_any
+points to a struct of a similar name with an X on the front. They can
+be classified like this:
+
+ SVt_NULL
+ The sv_any doesn't point to anything meaningful.
+
+ SVt_REF
+ The sv_any points to another SV. (This is what we're talking
+ about changing to work more like IV and NV below.)
+
+ SVt_IV
+ SVt_NV
+ These are a little tricky in order to be efficient in both
+ memory and time. The sv_any pointer indicates the location of
+ a solitary integer(double), but not directly. The pointer is
+ really a pointer to an XPVIV(XPVNV), so that if there's a valid
+ integer(double) the same code works regardless of the type of
+ the SV. They have special allocators that guarantee that, even
+ though sv_any is pointing to a location several words earlier
+ than the integer(double), it never points to unallocated
+ memory. This does waste a few allocated integers(doubles) at
+ the beginning, but it's probably an overall win.
+
+ SVt_PV
+ SVt_PVIV
+ SVt_PVNV
+ SVt_PVMG
+ These are pretty ordinary, and each is "derived" from the
+ previous in the sense that it just adds more data to the
+ previous structure.
+
+ struct xpv {
+ char * xpv_pv; /* pointer to malloced string */
+ STRLEN xpv_cur; /* length of xpv_pv as a C string */
+ STRLEN xpv_len; /* allocated size */
+ };
+
+ This is your basic string scalar that is never used numerically
+ or magically.
+
+ struct xpviv {
+ char * xpv_pv; /* pointer to malloced string */
+ STRLEN xpv_cur; /* length of xpv_pv as a C string */
+ STRLEN xpv_len; /* allocated size */
+ I32 xiv_iv; /* integer value or pv offset */
+ };
+
+ This is a string scalar that has either been used as an
+ integer, or an integer that has been used in a string
+ context, or has had the front trimmed off of it, in which
+ case xiv_iv contains how far xpv_pv has been incremented
+ from the original allocated value.
+
+ struct xpvnv {
+ char * xpv_pv; /* pointer to malloced string */
+ STRLEN xpv_cur; /* length of xpv_pv as a C string */
+ STRLEN xpv_len; /* allocated size */
+ I32 xiv_iv; /* integer value or pv offset */
+ double xnv_nv; /* numeric value, if any */
+ };
+
+ This is a string or integer scalar that has been used in a
+ numeric context, or a number that has been used in a string
+ or integer context.
+
+ struct xpvmg {
+ char * xpv_pv; /* pointer to malloced string */
+ STRLEN xpv_cur; /* length of xpv_pv as a C string */
+ STRLEN xpv_len; /* allocated size */
+ I32 xiv_iv; /* integer value or pv offset */
+ double xnv_nv; /* numeric value, if any */
+ MAGIC* xmg_magic; /* linked list of magicalness */
+ HV* xmg_stash; /* class package */
+ };
+
+ This is the top of the line for ordinary scalars. This scalar
+ has been charmed with one or more kinds of magical or object
+ behavior. In addition it can contain any or all of integer,
+ double or string.
+
+ SVt_PVLV
+ SVt_PVAV
+ SVt_PVHV
+ SVt_PVCV
+ SVt_PVGV
+ SVt_PVBM
+ SVt_PVFM
+ These are specialized forms that are never directly visible to
+ the Perl script. They are independent of each other, and may
+ not be promoted to any other type.
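
The pointer trick described under SVt_IV and SVt_NV above can be sketched roughly as follows. This is only an illustration under stated assumptions: iv_xpviv, iv_arena_new, iv_slot_any, any_iv and any_set_iv are hypothetical names, and the real allocators may well differ. The point is that the body pointer is backed up by the offset of the integer field, so code written for the full structure finds the integer at its usual offset, while the pointer still lands inside the allocated arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the xpviv layout. */
struct iv_xpviv {
    char   *xpv_pv;
    size_t  xpv_cur;
    size_t  xpv_len;
    int32_t xiv_iv;
};

/* Byte offset of the integer inside the full structure. */
#define IV_OFFSET offsetof(struct iv_xpviv, xiv_iv)

/* Leading arena slots that must stay unused so that backing a pointer
 * up by IV_OFFSET never leaves the arena. */
#define IV_WASTED ((IV_OFFSET + sizeof(int32_t) - 1) / sizeof(int32_t))

/* Allocate an arena of bare integers (assumed allocator). */
static int32_t *iv_arena_new(size_t nslots)
{
    return calloc(nslots, sizeof(int32_t));
}

/* Body pointer for slot i, or NULL if slot i is one of the wasted ones. */
static void *iv_slot_any(int32_t *arena, size_t i)
{
    if (i < IV_WASTED)
        return NULL;
    return (char *)(arena + i) - IV_OFFSET;
}

/* Read the integer the same way for a bare slot or a full structure. */
static int32_t any_iv(const void *any)
{
    return *(const int32_t *)((const char *)any + IV_OFFSET);
}

/* Write the integer the same way for a bare slot or a full structure. */
static void any_set_iv(void *any, int32_t v)
{
    *(int32_t *)((char *)any + IV_OFFSET) = v;
}
```

The accessors use byte arithmetic rather than a struct member access so that the sketch never pretends a whole structure exists where only a single integer was allocated.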
+
+There are several additional data values in the SV structure. The sv_refcnt
+gives the number of references to this SV. Some of these references may be
+actual Perl language references, but many others are just internal pointers,
+from a symbol table, or from the syntax tree, for example. When sv_refcnt
+goes to zero, the value can be safely deallocated.
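
As a rough illustration of that rule, here is a minimal reference-counting sketch. The names rc_sv, rc_new, rc_inc, rc_dec and rc_live are hypothetical, and starting a new value at a count of 1 is an assumption of the sketch rather than something stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Cut-down head with just the fields the sketch needs. */
struct rc_sv {
    void    *sv_any;
    uint32_t sv_refcnt;
};

/* Number of values currently allocated; only here so the effect of the
 * final decrement can be observed. */
static int rc_live = 0;

/* New value, owned by its creator (assumed initial count of 1). */
static struct rc_sv *rc_new(void)
{
    struct rc_sv *sv = malloc(sizeof *sv);
    if (!sv)
        return NULL;
    sv->sv_any = NULL;
    sv->sv_refcnt = 1;
    rc_live++;
    return sv;
}

/* Record one more pointer to the value, whether a Perl-level reference
 * or an internal one such as a symbol table entry. */
static struct rc_sv *rc_inc(struct rc_sv *sv)
{
    sv->sv_refcnt++;
    return sv;
}

/* Drop one pointer; deallocate when nobody is left. */
static void rc_dec(struct rc_sv *sv)
{
    if (--sv->sv_refcnt == 0) {
        free(sv->sv_any);
        free(sv);
        rc_live--;
    }
}
```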
+
+The sv_storage byte is not very well thought out, but tends to indicate
+something about where the scalar lives. It's used in allocating
+lexical storage, and at runtime contains an 'O' if the value has been
+blessed as an object. There may be some conflicts lurking in here, and
+I may eventually claim some of the bits for other purposes.
+
+The sv_flags are currently as follows. Most of these are set and cleared
+by macros to guarantee their consistency, and you should always use the
+proper macro rather than accessing them directly.
+
+#define SVf_IOK 1 /* has valid integer value */
+#define SVf_NOK 2 /* has valid numeric value */
+#define SVf_POK 4 /* has valid pointer value */
+ These tell whether an integer, double or string value is
+ immediately available without further consideration. All tainting
+ and magic (but not objecthood) works by turning off these bits and
+ forcing a routine to be executed to discover the real value. The
+ SvIV(), SvNV() and SvPV() macros that fetch values are smart about
+ all this, and should always be used if possible. Most of the stuff
+ mentioned below you really don't have to deal with directly. (Values
+ aren't stored using macros, but using functions sv_setiv(), sv_setnv()
+ and sv_setpv(), plus variants. You should never have to explicitly
+ follow the sv_any pointer to any X structure in your code.)
+
+#define SVf_OOK 8 /* has valid offset value */
+ This is only on when SVf_IOK is off, and indicates that the unused
+ integer storage is holding an offset for the string pointer value
+ because you've done something like s/^prefix//.
+
+#define SVf_MAGICAL 16 /* has special methods */
+ This indicates not only that sv_type is at least SVt_PVMG, but
+ also that the linked list of magical behaviors is not empty.
+
+#define SVf_OK 32 /* has defined value */
+ This indicates that the value is defined. Currently it means either
+ that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
+ is set.
+
+#define SVf_TEMP 64 /* eventually in sv_private? */
+ This indicates that the string is a temporary allocated by one of
+ the sv_mortal functions, and that any string value may be stolen
+ from it without copying. (It's important not to steal the value if
+ the temporary will continue to require the value, however.)
+
+#define SVf_READONLY 128 /* may not be modified */
+ This scalar value may not be modified. Any function that might modify
+ a scalar should check for this first, and reject the operation when
+ inappropriate. Currently only the builtin values for sv_undef, sv_yes
+ and sv_no are marked readonly, but eventually we may provide a language
+ construct to set this bit.
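
To make the interplay of SVf_IOK, SVf_POK and SVf_OOK a little more concrete, here is a small sketch. The three flag values are the ones listed above; everything else (fl_sv, fl_setpv, fl_chop, fl_off_ook, fl_iv, fl_free) is hypothetical, and caching the converted integer or dropping SVf_IOK when the string changes are assumptions of the sketch, not claims about the real macros and functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SVf_IOK 1   /* has valid integer value */
#define SVf_POK 4   /* has valid pointer value */
#define SVf_OOK 8   /* has valid offset value */

/* Flattened toy scalar: string fields, one integer slot, flags. */
struct fl_sv {
    char   *pv;
    size_t  cur;
    size_t  len;
    int32_t iv;      /* integer value, or string offset when OOK is on */
    uint8_t flags;
};

static int fl_setpv(struct fl_sv *sv, const char *s)
{
    size_t n = strlen(s);
    sv->pv = malloc(n + 1);
    if (!sv->pv)
        return 0;
    memcpy(sv->pv, s, n + 1);
    sv->cur = n; sv->len = n + 1; sv->iv = 0; sv->flags = SVf_POK;
    return 1;
}

/* Drop n leading bytes, as s/^prefix// might, without moving the string:
 * advance the pointer and remember how far in the idle integer slot. */
static int fl_chop(struct fl_sv *sv, size_t n)
{
    if (!(sv->flags & SVf_POK) || n > sv->cur)
        return 0;
    sv->flags &= (uint8_t)~SVf_IOK;   /* assumed: cached integer is stale */
    if (!(sv->flags & SVf_OOK)) {
        sv->iv = 0;
        sv->flags |= SVf_OOK;
    }
    sv->pv += n; sv->cur -= n; sv->len -= n; sv->iv += (int32_t)n;
    return 1;
}

/* Undo the offset so the integer slot can hold an integer again. */
static void fl_off_ook(struct fl_sv *sv)
{
    if (sv->flags & SVf_OOK) {
        char *start = sv->pv - sv->iv;
        memmove(start, sv->pv, sv->cur + 1);
        sv->pv = start;
        sv->len += (size_t)sv->iv;
        sv->iv = 0;
        sv->flags &= (uint8_t)~SVf_OOK;
    }
}

/* Fetch the integer: fast path when IOK is set, conversion otherwise. */
static long fl_iv(struct fl_sv *sv)
{
    if (sv->flags & SVf_IOK)
        return sv->iv;
    fl_off_ook(sv);
    sv->iv = (sv->flags & SVf_POK) ? (int32_t)strtol(sv->pv, NULL, 10) : 0;
    sv->flags |= SVf_IOK;
    return sv->iv;
}

/* Free from the original allocation start, not the advanced pointer. */
static void fl_free(struct fl_sv *sv)
{
    free((sv->flags & SVf_OOK) ? sv->pv - sv->iv : sv->pv);
    sv->pv = NULL;
}
```

Because the offset and the integer share one slot, the sketch has to move the string back before it can cache an integer, which is one way to read the rule that SVf_OOK is only on when SVf_IOK is off.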
+
+The sv_private byte contains some additional bits that apply across the
+board. Really private bits (that depend on the type) are allocated from
+128 down.
+
+#define SVp_IOK 1 /* has valid non-public integer value */
+#define SVp_NOK 2 /* has valid non-public numeric value */
+#define SVp_POK 4 /* has valid non-public pointer value */
+ These shadow the bits in sv_flags for tainted variables, indicating that
+ there really is a valid value available, but you have to set the global
+ tainted flag if you access them.
+
+#define SVp_SCREAM 8 /* has been studied? */
+ Indicates that a study was done on this string. A studied string is
+ magical and automatically unstudies itself when modified.
+
+#define SVp_TAINTEDDIR 16 /* PATH component is a security risk */
+ A special flag for $ENV{PATH} that indicates that, while the value
+ as a whole may be untainted, some path component names an insecure
+ directory.
+
+#define SVpfm_COMPILED 128
+ For a format, whether its picture has been "compiled" yet. This
+ cannot be done until runtime because the user has access to the
+ internal formline function, and may supply a variable as the
+ picture.
+
+#define SVpbm_VALID 128
+#define SVpbm_CASEFOLD 64
+#define SVpbm_TAIL 32
+ For a Boyer-Moore pattern: whether the compiled search string is still valid
+ (modifying $pat between calls to index($string,$pat) invalidates it),
+ whether case folding is in force for regexp matching, and whether we're
+ trying to match something like /foo$/.
+
+#define SVpgv_MULTI 128
+ For a symbol table entry, set when we've decided that this symbol is
+ probably not a typo. Suspected typos can be reported by -w.
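
The shadowing of SVf_IOK by SVp_IOK for tainted values might look roughly like this. The two flag values come from the listings above; tn_sv, tn_taint, tn_iv and tn_tainted are hypothetical, and the exact moment at which the real code moves the bit is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SVf_IOK 1   /* public: has valid integer value */
#define SVp_IOK 1   /* private: has valid non-public integer value */

/* Stand-in for the global tainted flag. */
static int tn_tainted = 0;

struct tn_sv {
    long    iv;
    uint8_t sv_flags;
    uint8_t sv_private;
};

/* Mark a value tainted: hide the public bit, keep a private shadow. */
static void tn_taint(struct tn_sv *sv)
{
    if (sv->sv_flags & SVf_IOK) {
        sv->sv_flags &= (uint8_t)~SVf_IOK;
        sv->sv_private |= SVp_IOK;
    }
}

/* Fetch the integer into *out. Returns 1 if a value was available.
 * Going through the private bit sets the global tainted flag. */
static int tn_iv(struct tn_sv *sv, long *out)
{
    if (sv->sv_flags & SVf_IOK) {
        *out = sv->iv;
        return 1;
    }
    if (sv->sv_private & SVp_IOK) {
        tn_tainted = 1;
        *out = sv->iv;
        return 1;
    }
    return 0;
}
```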
+
+
+Well, that's probably enough for now. As you can see, we could turn
+references into something more like an integer or a pointer value. In
+fact, I suspect the right thing to do is say that a reference is just
+a funny type of string pointer that isn't allocated the same way.
+This would let us not only have references to scalars, but might provide
+a way to have scalars that point to non-malloced memory. Hmm. I'll
+have to think about that s'more. You can think about it too.
+
+Larry