Newsgroups: comp.lang.perl
Subject: Re: perl5a4:  tie ref restriction?
Summary: 
Expires: 
References: <2h7b64$aai@jethro.Corp.Sun.COM>
Sender: 
Followup-To: 
Distribution: world
Organization: NetLabs, Inc.
Keywords: 

In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
: Darn:
: 	tie ( @a, TST_tie, "arg1", "arg2" ); 
: 	$a[2]=[1];
: 
: produces:
: 
: 	Can't assign a reference to a magical variable at ./tsttie line 12.
: 
: I'm all agog about the "tie" function, but ... if this restriction
: wasn't there, I think I would be able to tie a top level
: reference/variable to my own package, and then automatically tie in all
: subsequently linked vars/references so that I could "tie" any arbitrary thing
: like:
: 	$r->{key}[el]{key}
: 
: to a DBM or other type storage area.
: 
: Is the restriction necessary?

In the current storage scheme, yes, but as I mentioned in the other
article, I can and probably should relax that.  That code is some of
the oldest Perl 5 code, and I didn't see some things then that I do
now.

[I did relax that.]

Ok, let me explain some things about how values are stored.  Consider
this a little design document.

Internally everything is unified to look like a scalar, regardless of
its type.  There's a type-invariant part of every value, and a
type-variant part.  When we modify the type of a value, we can do it in
place because all references point to the invariant part.  All we do is
swap the variant part for a different part and change that ANY pointer
in the invariant part to point to the new variant.
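
To make the in-place type change concrete, here is a minimal sketch in C.
It is not the actual Perl source: toy_sv, toy_iv_body, toy_pv_body and the
two functions are hypothetical names made up for illustration, and the
only assumption carried over from the text is that outside code holds
pointers to the fixed head, never to the body.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of the head/body split; names are illustrative. */
enum toy_type { TOY_IV, TOY_PV };

struct toy_sv {
    void         *any;   /* pointer to the type-variant body */
    enum toy_type type;  /* says how to interpret `any` */
};

struct toy_iv_body { long iv; };
struct toy_pv_body { long iv; char *pv; };

/* Create a head whose body holds only an integer. */
struct toy_sv *toy_new_iv(long value) {
    struct toy_sv *sv = malloc(sizeof *sv);
    struct toy_iv_body *body = malloc(sizeof *body);
    if (sv == NULL || body == NULL) { free(sv); free(body); return NULL; }
    body->iv = value;
    sv->any = body;
    sv->type = TOY_IV;
    return sv;
}

/* Upgrade in place: build the bigger body, carry the old data over,
   then repoint `any`.  The head's own address never changes. */
int toy_upgrade_to_pv(struct toy_sv *sv, const char *text) {
    assert(sv != NULL && sv->type == TOY_IV);
    struct toy_iv_body *old_body = sv->any;
    struct toy_pv_body *new_body = malloc(sizeof *new_body);
    if (new_body == NULL) return -1;
    new_body->pv = malloc(strlen(text) + 1);
    if (new_body->pv == NULL) { free(new_body); return -1; }
    strcpy(new_body->pv, text);
    new_body->iv = old_body->iv;   /* keep the integer that was there */
    free(old_body);
    sv->any = new_body;
    sv->type = TOY_PV;
    return 0;
}
```

Anything that saved the toy_sv pointer before the upgrade still sees a
valid value afterwards, because only the body was swapped.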

The invariant part looks like this:

struct sv {
    void*	sv_any;		/* pointer to something */
    U32		sv_refcnt;	/* how many references to us */
    SVTYPE	sv_type;	/* what sort of thing pointer points to */
    U8		sv_flags;	/* extra flags, some depending on type */
    U8		sv_storage;	/* storage class */
    U8		sv_private;	/* extra value, depending on type */
};

[The last 4 bytes have been combined into a single U32.]

This is typedefed to SV.  There are other structurally equivalent
types, AV, HV and CV, that are there merely to help gdb know what kind
of pointer sv_any is, and provide a little bit of C type-checking.
Here's a key to Perl naming:

	SV	scalar value
	AV	array value
	HV	hash value
	CV	code value

Additionally I often use names containing

	IV	integer value
	NV	numeric value (double)
	PV	pointer value
	RV	reference value
	LV	lvalue, such as a substr() or vec() being assigned to
	BM	a string containing a Boyer-Moore compiled pattern
	FM	a format line program

You'll notice that in SV there's an sv_type field.  This contains one
of the following values, which gives the interpretation of sv_any.

typedef enum {
	SVt_NULL,
	SVt_REF,
	SVt_IV,
	SVt_NV,
	SVt_PV,
	SVt_PVIV,
	SVt_PVNV,
	SVt_PVMG,
	SVt_PVLV,
	SVt_PVAV,
	SVt_PVHV,
	SVt_PVCV,
	SVt_PVGV,
	SVt_PVBM,
	SVt_PVFM,
} svtype;

[There is no longer a REF type.  There's an RV type that holds a minimal ref
value but other types can also hold an RV.  This was to allow magical refs.]

These are arranged ROUGHLY in order of increasing complexity, though
there are some discontinuities.  Many of them indicate that sv_any
points to a struct of a similar name with an X on the front.  They can
be classified like this:

    SVt_NULL
	The sv_any doesn't point to anything meaningful.

    SVt_REF
	The sv_any points to another SV.  (This is what we're talking
	about changing to work more like IV and NV below.)  [And that's what
	I did.]

    SVt_IV
    SVt_NV
	These are a little tricky in order to be efficient in both
	memory and time.  The sv_any pointer indicates the location of
	a solitary integer(double), but not directly.  The pointer is
	really a pointer to an XPVIV(XPVNV), so that if there's a valid
	integer(double) the same code works regardless of the type of
	the SV.  They have special allocators that guarantee that, even
	though sv_any is pointing to a location several words earlier
	than the integer(double), it never points to unallocated
	memory.  This does waste a few allocated integers(doubles) at
	the beginning, but it's probably an overall win.

    [SVt_RV probably belongs here.]
    SVt_PV
    SVt_PVIV
    SVt_PVNV
    SVt_PVMG
	These are pretty ordinary, and each is "derived" from the
	previous in the sense that it just adds more data to the
	previous structure.
[ Need to add this:
	struct xrv {
	    SV *	xrv_rv;		/* pointer to another SV */
	};

	    A reference value.  In the following structs its space is reserved
	    as a char* xpv_pv, but if SvROK() is true, xpv_pv is pointing to
	    another SV, not a string.
]

	struct xpv {
	    char *      xpv_pv;		/* pointer to malloced string */
	    STRLEN      xpv_cur;	/* length of xpv_pv as a C string */
	    STRLEN      xpv_len;	/* allocated size */
	};

	    This is your basic string scalar that is never used numerically
	    or magically.

	struct xpviv {
	    char *      xpv_pv;		/* pointer to malloced string */
	    STRLEN      xpv_cur;	/* length of xpv_pv as a C string */
	    STRLEN      xpv_len;	/* allocated size */
	    I32		xiv_iv;		/* integer value or pv offset */
	};

	    This is a string scalar that has either been used as an
	    integer, or an integer that has been used in a string
	    context, or has had the front trimmed off of it, in which
	    case xiv_iv contains how far xpv_pv has been incremented
	    from the original allocated value.

	struct xpvnv {
	    char *      xpv_pv;		/* pointer to malloced string */
	    STRLEN      xpv_cur;	/* length of xpv_pv as a C string */
	    STRLEN      xpv_len;	/* allocated size */
	    I32		xiv_iv;		/* integer value or pv offset */
	    double      xnv_nv;		/* numeric value, if any */
	};

	    This is a string or integer scalar that has been used in a
	    numeric context, or a number that has been used in a string
	    or integer context.

	struct xpvmg {
	    char *      xpv_pv;		/* pointer to malloced string */
	    STRLEN      xpv_cur;	/* length of xpv_pv as a C string */
	    STRLEN      xpv_len;	/* allocated size */
	    I32		xiv_iv;		/* integer value or pv offset */
	    double      xnv_nv;		/* numeric value, if any */
	    MAGIC*	xmg_magic;	/* linked list of magicalness */
	    HV*		xmg_stash;	/* class package */
	};

	    This is the top of the line for ordinary scalars.  This scalar
	    has been charmed with one or more kinds of magical or object
	    behavior.  In addition it can contain any or all of integer,
	    double or string.

    SVt_PVLV
    SVt_PVAV
    SVt_PVHV
    SVt_PVCV
    SVt_PVGV
    SVt_PVBM
    SVt_PVFM
	These are specialized forms that are never directly visible to
	the Perl script.  They are independent of each other, and may
	not be promoted to any other type.
	[Actually, PVBM doesn't belong here, but in the previous section.
	Saying index($foo,$bar) will in fact turn $bar into a PVBM so that
	it can do Boyer-Moore searching.]
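
The SVt_IV/SVt_NV pointer trick described above is easier to see in code.
What follows is only a sketch under assumptions, not the real allocator:
toy_xpviv, toy_new_iv_body, TOY_IVX and the fixed-size arena are
hypothetical, and the integer field is an intptr_t rather than an I32 so
that the backed-up pointer stays suitably aligned.  Reaching a lone
integer through a struct pointer like this also leans on behavior the C
standard does not strictly promise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy body layout; field names follow the post, the integer type
   is an assumption made for alignment. */
struct toy_xpviv {
    char    *xpv_pv;
    size_t   xpv_cur;
    size_t   xpv_len;
    intptr_t xiv_iv;
};

#define ARENA_SLOTS 64
/* Leading slots sacrificed so that a pointer backed up by the offset
   of xiv_iv still lands inside the arena. */
#define WASTED_SLOTS \
    ((offsetof(struct toy_xpviv, xiv_iv) + sizeof(intptr_t) - 1) \
     / sizeof(intptr_t))

static intptr_t *arena;      /* one arena only, to keep the sketch short */
static size_t    next_slot;

/* Hand out a body that consists of a single integer, but return the
   address a full toy_xpviv would have if its xiv_iv were that integer. */
void *toy_new_iv_body(void) {
    if (arena == NULL) {
        arena = calloc(ARENA_SLOTS, sizeof *arena);
        if (arena == NULL) return NULL;
        next_slot = WASTED_SLOTS;
    }
    if (next_slot >= ARENA_SLOTS) return NULL;
    char *slot = (char *)&arena[next_slot++];
    return slot - offsetof(struct toy_xpviv, xiv_iv);
}

/* One accessor for every body type that has an integer in it. */
#define TOY_IVX(any) (((struct toy_xpviv *)(any))->xiv_iv)
```

TOY_IVX() works unchanged on a full struct toy_xpviv and on a bare body
from toy_new_iv_body(), which is the point of the exercise.  The same
prefix layout is what lets code written for xpv also read an xpvmg.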

There are several additional data values in the SV structure.  The sv_refcnt
gives the number of references to this SV.  Some of these references may be
actual Perl language references, but many others are just internal pointers,
from a symbol table, or from the syntax tree, for example.  When sv_refcnt
goes to zero, the value can be safely deallocated.  Must be, in fact.
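
A minimal sketch of that reference-counting discipline, assuming a
cut-down head with just a body pointer and a count.  The names rc_sv,
rc_new, rc_inc, rc_dec and rc_live are hypothetical, and rc_live exists
only so the sketch can show that the value really was freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cut-down head: a body pointer and a reference count. */
struct rc_sv {
    void    *any;
    unsigned refcnt;
};

int rc_live = 0;   /* bookkeeping for the sketch only */

/* A new value starts out owned by whoever created it. */
struct rc_sv *rc_new(void) {
    struct rc_sv *sv = calloc(1, sizeof *sv);
    if (sv == NULL) return NULL;
    sv->refcnt = 1;
    rc_live++;
    return sv;
}

/* Every extra holder (a Perl reference, a symbol table slot, a syntax
   tree node) bumps the count. */
struct rc_sv *rc_inc(struct rc_sv *sv) {
    assert(sv != NULL);
    sv->refcnt++;
    return sv;
}

/* Dropping the last reference frees the body and the head. */
void rc_dec(struct rc_sv *sv) {
    assert(sv != NULL && sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        free(sv->any);
        free(sv);
        rc_live--;
    }
}
```

The count does not care whether a holder is a script-level reference or
an internal pointer.  Each one pairs an rc_inc() with an rc_dec().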

The sv_storage byte is not very well thought out, but tends to indicate
something about where the scalar lives.  It's used in allocating
lexical storage, and at runtime contains an 'O' if the value has been
blessed as an object.  There may be some conflicts lurking in here, and
I may eventually claim some of the bits for other purposes.  [I did,
with a vengeance.]

The sv_flags are currently as follows.  Most of these are set and cleared
by macros to guarantee their consistency, and you should always use the
proper macro rather than accessing them directly.

[Most of these numbers have changed, and there are some new flags.
And they're all stuffed into a single U32.]

#define SVf_IOK		1		/* has valid integer value */
#define SVf_NOK		2		/* has valid numeric value */
#define SVf_POK		4		/* has valid pointer value */
    These tell whether an integer, double or string value is
    immediately available without further consideration.  All tainting
    and magic (but not objecthood) works by turning off these bits and
    forcing a routine to be executed to discover the real value.  The
    SvIV(), SvNV() and SvPV() macros that fetch values are smart about
    all this, and should always be used if possible.  Most of the stuff
    mentioned below you really don't have to deal with directly.  (Values
    aren't stored using macros, but using functions sv_setiv(), sv_setnv()
    and sv_setpv(), plus variants.  You should never have to explicitly
    follow the sv_any pointer to any X structure in your code.)

#define SVf_OOK		8		/* has valid offset value */
    This is only on when SVf_IOK is off, and indicates that the unused
    integer storage is holding an offset for the string pointer value
    because you've done something like s/^prefix//.

#define SVf_MAGICAL	16		/* has special methods */
    This indicates not only that sv_type is at least SVt_PVMG, but
    also that the linked list of magical behaviors is not empty.

#define SVf_OK		32		/* has defined value */
    This indicates that the value is defined.  Currently it means either
    that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
    is set.

#define SVf_TEMP	64		/* eventually in sv_private? */
    This indicates that the string is a temporary allocated by one of
    the sv_mortal functions, and that any string value may be stolen
    from it without copying.  (It's important not to steal the value if
    the temporary will continue to require the value, however.)

#define SVf_READONLY	128		/* may not be modified */
    This scalar value may not be modified.  Any function that might modify
    a scalar should check for this first, and reject the operation when
    inappropriate.  Currently only the builtin values for sv_undef, sv_yes
    and sv_no are marked readonly, but eventually we may provide a language
    construct to set this bit.
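
Of the flags above, SVf_OOK is probably the least obvious, so here is a
sketch of the idea.  It assumes a flattened toy scalar whose integer slot
is otherwise unused.  toy_str and its functions are hypothetical names,
not the real sv_* routines, and only the flag value 8 comes from the
listing above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OOK 8u   /* same value as SVf_OOK in the listing above */

/* Hypothetical flattened scalar: flags and string body in one struct. */
struct toy_str {
    char    *pv;     /* current start of the string */
    size_t   cur;    /* current length */
    size_t   len;    /* allocated size, counted from pv */
    long     iv;     /* with TOY_OOK set: bytes chopped off the front */
    unsigned flags;
};

int toy_str_init(struct toy_str *s, const char *text) {
    s->cur = strlen(text);
    s->len = s->cur + 1;
    s->pv = malloc(s->len);
    if (s->pv == NULL) return -1;
    memcpy(s->pv, text, s->len);
    s->iv = 0;
    s->flags = 0;
    return 0;
}

/* Drop n leading bytes without moving any data: bump the pointer and
   remember in the spare integer slot how far it has moved. */
void toy_str_chop(struct toy_str *s, size_t n) {
    assert(n <= s->cur);
    if (!(s->flags & TOY_OOK)) {
        s->iv = 0;
        s->flags |= TOY_OOK;
    }
    s->pv  += n;
    s->cur -= n;
    s->len -= n;
    s->iv  += (long)n;
}

/* Undo the offset: slide the data back to the start of the allocation
   so pv is once again the pointer malloc() returned. */
void toy_str_backoff(struct toy_str *s) {
    if (s->flags & TOY_OOK) {
        char *base = s->pv - s->iv;
        memmove(base, s->pv, s->cur + 1);
        s->len += (size_t)s->iv;
        s->pv = base;
        s->iv = 0;
        s->flags &= ~TOY_OOK;
    }
}

void toy_str_free(struct toy_str *s) {
    toy_str_backoff(s);   /* free() needs the original pointer */
    free(s->pv);
    s->pv = NULL;
}
```

Chopping a prefix is then a few additions, and the copy is paid only
when the original pointer is needed again, for instance before freeing
or reallocating.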

The sv_private byte contains some additional bits that apply across the
board.  Really private bits (that depend on the type) are allocated from
128 down.

#define SVp_IOK		1		/* has valid non-public integer value */
#define SVp_NOK		2		/* has valid non-public numeric value */
#define SVp_POK		4		/* has valid non-public pointer value */
    These shadow the bits in sv_flags for tainted variables, indicating that
    there really is a valid value available, but you have to set the global
    tainted flag if you access them.

#define SVp_SCREAM	8		/* has been studied? */
    Indicates that a study was done on this string.  A studied string is
    magical and automatically unstudies itself when modified.

#define SVp_TAINTEDDIR	16		/* PATH component is a security risk */
    A special flag for $ENV{PATH} that indicates that, while the value
    as a whole may be untainted, some path component names an insecure
    directory.

#define SVpfm_COMPILED	128
    For a format, whether its picture has been "compiled" yet.  This
    cannot be done until runtime because the user has access to the
    internal formline function, and may supply a variable as the
    picture.

#define SVpbm_VALID	128
#define SVpbm_CASEFOLD	64
#define SVpbm_TAIL	32
    For a Boyer-Moore pattern, whether the search string has been invalidated
    by modification (can happen to $pat between calls to index($string,$pat)),
    whether case folding is in force for regexp matching, and whether we're
    trying to match something like /foo$/.

#define SVpgv_MULTI	128
    For a symbol table entry, set when we've decided that this symbol is
    probably not a typo.  Suspected typos can be reported by -w.
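
The way the SVp_*OK shadow bits cooperate with the public bits for
tainting can be sketched roughly as follows.  This is an illustration
under assumptions, not the real SvIV() logic: toy_scalar, toy_taint,
toy_get_iv and toy_tainted are hypothetical names, and only the bit
values are taken from the listings above.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values mirror the listings; everything else is hypothetical. */
#define TOY_SVf_IOK 1u   /* public: integer may be used directly */
#define TOY_SVp_IOK 1u   /* private: integer is valid, slow path only */

struct toy_scalar {
    long     iv;
    unsigned flags;      /* public bits  */
    unsigned private_;   /* private bits */
};

int toy_tainted = 0;     /* stand-in for the global tainted flag */

/* Taint a value: hide the public bit but keep a private shadow of it. */
void toy_taint(struct toy_scalar *sv) {
    if (sv->flags & TOY_SVf_IOK) {
        sv->flags    &= ~TOY_SVf_IOK;
        sv->private_ |= TOY_SVp_IOK;
    }
}

/* Fetch the integer.  Returns 1 and stores it in *out if one is valid,
   0 otherwise.  The slow path records that tainted data was touched. */
int toy_get_iv(struct toy_scalar *sv, long *out) {
    assert(sv != NULL && out != NULL);
    if (sv->flags & TOY_SVf_IOK) {        /* fast path: public bit on */
        *out = sv->iv;
        return 1;
    }
    if (sv->private_ & TOY_SVp_IOK) {     /* slow path: shadow bit on */
        toy_tainted = 1;
        *out = sv->iv;
        return 1;
    }
    return 0;
}
```

Code that only tests the public bit never sees a tainted value by
accident.  It has to go through the fetch routine, which is where the
global flag gets set.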


Well, that's probably enough for now.  As you can see, we could turn
references into something more like an integer or a pointer value.  In
fact, I suspect the right thing to do is say that a reference is just
a funny type of string pointer that isn't allocated the same way.
This would let us not only have references to scalars, but might provide
a way to have scalars that point to non-malloced memory.  Hmm.  I'll
have to think about that s'more.  You can think about it too.

Larry