Newsgroups: comp.lang.perl
Subject: Re: perl5a4: tie ref restriction?
Summary:
Expires:
References: <2h7b64$aai@jethro.Corp.Sun.COM>
Sender:
Followup-To:
Distribution: world
Organization: NetLabs, Inc.
Keywords:
In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
: Darn:
: tie ( @a, TST_tie, "arg1", "arg2" );
: $a[2]=[1];
:
: produces:
:
: Can't assign a reference to a magical variable at ./tsttie line 12.
:
: I'm all agog about the "tie" function, but ... if this restriction
: wasn't there, I think I would be able to tie a top level
: reference/variable to my own package, and then automatically tie in all
: subsequently linked vars/references so that I could "tie" any arbitrary thing
: like:
: $r->{key}[el]{key}
:
: to a DBM or other type storage area.
:
: Is the restriction necessary?
In the current storage scheme, yes, but as I mentioned in the other
article, I can and probably should relax that. That code is some of
the oldest Perl 5 code, and I didn't see some things then that I do
now.
[I did relax that.]
Ok, let me explain some things about how values are stored. Consider
this a little design document.
Internally everything is unified to look like a scalar, regardless of
its type. There's a type-invariant part of every value, and a
type-variant part. When we modify the type of a value, we can do it in
place because all references point to the invariant part. All we do is
swap the variant part for a different part and change that ANY pointer
in the invariant part to point to the new variant.
The invariant part looks like this:
struct sv {
    void*   sv_any;      /* pointer to something */
    U32     sv_refcnt;   /* how many references to us */
    SVTYPE  sv_type;     /* what sort of thing pointer points to */
    U8      sv_flags;    /* extra flags, some depending on type */
    U8      sv_storage;  /* storage class */
    U8      sv_private;  /* extra value, depending on type */
};
[The last 4 bytes have been combined into a single U32.]
This is typedefed to SV. There are other structurally equivalent
types, AV, HV and CV, that are there merely to help gdb know what kind
of pointer sv_any is, and provide a little bit of C type-checking.
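As a rough illustration of the in-place type change described above, here is a minimal sketch in C. Everything named toy_* is made up for the example and is much simpler than the real structures; the only point is that the head stays at the same address while the body behind it is swapped, so anything holding a pointer to the head sees the new type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types, not the real Perl structures. */
typedef enum { TOY_IV, TOY_PVIV } toy_type;

typedef struct toy_head {
    void    *any;     /* pointer to the type-variant body */
    unsigned refcnt;  /* how many references to this head */
    toy_type type;    /* what sort of body `any` points to */
} toy_head;

typedef struct { long iv; } toy_body_iv;
typedef struct { char *pv; size_t cur; long iv; } toy_body_pviv;

/* Create an integer-only value. Returns NULL on allocation failure. */
toy_head *toy_new_iv(long value)
{
    toy_head *sv = malloc(sizeof *sv);
    toy_body_iv *body = malloc(sizeof *body);
    if (!sv || !body) { free(sv); free(body); return NULL; }
    body->iv = value;
    sv->any = body;
    sv->refcnt = 1;
    sv->type = TOY_IV;
    return sv;
}

/* Upgrade in place: build the bigger body, carry the integer over,
 * then swap the `any` pointer. The head itself never moves. */
int toy_upgrade_to_pviv(toy_head *sv, const char *str)
{
    if (sv->type != TOY_IV) return -1;
    size_t n = strlen(str);
    toy_body_pviv *body = malloc(sizeof *body);
    char *copy = malloc(n + 1);
    if (!body || !copy) { free(body); free(copy); return -1; }
    memcpy(copy, str, n + 1);
    body->pv = copy;
    body->cur = n;
    body->iv = ((toy_body_iv *)sv->any)->iv;
    free(sv->any);          /* discard the old variant part */
    sv->any = body;         /* point the head at the new one */
    sv->type = TOY_PVIV;
    return 0;
}

/* Release the head and whatever body it currently has. */
void toy_free(toy_head *sv)
{
    if (!sv) return;
    if (sv->type == TOY_PVIV) free(((toy_body_pviv *)sv->any)->pv);
    free(sv->any);
    free(sv);
}
```

A second pointer to the same head taken before the upgrade still works afterwards, which is the whole reason for splitting the value in two.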
Here's a key to Perl naming:
SV scalar value
AV array value
HV hash value
CV code value
Additionally I often use names containing
IV integer value
NV numeric value (double)
PV pointer value
RV reference value
LV lvalue, such as a substr() or vec() being assigned to
BM a string containing a Boyer-Moore compiled pattern
FM a format line program
You'll notice that in SV there's an sv_type field. This contains one
of the following values, which gives the interpretation of sv_any.
typedef enum {
    SVt_NULL,
    SVt_REF,
    SVt_IV,
    SVt_NV,
    SVt_PV,
    SVt_PVIV,
    SVt_PVNV,
    SVt_PVMG,
    SVt_PVLV,
    SVt_PVAV,
    SVt_PVHV,
    SVt_PVCV,
    SVt_PVGV,
    SVt_PVBM,
    SVt_PVFM,
} svtype;
[There is no longer a REF type. There's an RV type that holds a minimal ref
value but other types can also hold an RV. This was to allow magical refs.]
These are arranged ROUGHLY in order of increasing complexity, though
there are some discontinuities. Many of them indicate that sv_any
points to a struct of a similar name with an X on the front. They can
be classified like this:
SVt_NULL
The sv_any doesn't point to anything meaningful.
SVt_REF
The sv_any points to another SV. (This is what we're talking
about changing to work more like IV and NV below.) [And that's what
I did.]
SVt_IV
SVt_NV
These are a little tricky in order to be efficient in both
memory and time. The sv_any pointer indicates the location of
a solitary integer(double), but not directly. The pointer is
really a pointer to an XPVIV(XPVNV), so that if there's a valid
integer(double) the same code works regardless of the type of
the SV. They have special allocators that guarantee that, even
though sv_any is pointing to a location several words earlier
than the integer(double), it never points to unallocated
memory. This does waste a few allocated integers(doubles) at
the beginning, but it's probably an overall win.
[SVt_RV probably belongs here.]
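The offset-pointer trick for SVt_IV can be sketched roughly as follows. This is an illustration under assumptions, not the actual allocator: the toy_* names are invented, the arena is a single malloc'd block, and the two static assertions spell out layout assumptions that hold on common Unix platforms but are not guaranteed everywhere. The arena sacrifices a few leading slots so that the fake body pointer never falls outside the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for XPVIV: the integer is the last member. */
typedef struct toy_xpviv {
    char  *xpv_pv;
    size_t xpv_cur;
    size_t xpv_len;
    long   xiv_iv;
} toy_xpviv;

#define TOY_IV_OFFSET offsetof(toy_xpviv, xiv_iv)

/* Layout assumptions this sketch depends on (checked at compile time). */
_Static_assert(_Alignof(toy_xpviv) == _Alignof(long),
               "sketch assumes the struct is no more aligned than long");
_Static_assert(TOY_IV_OFFSET % sizeof(long) == 0,
               "sketch assumes the offset is a whole number of longs");

typedef struct toy_iv_arena {
    long  *slots;   /* one malloc'd block of bare integers */
    size_t nslots;  /* total slots, including the wasted ones */
    size_t next;    /* index of the next free slot */
} toy_iv_arena;

/* Reserve `usable` integers plus enough leading slots to cover the
 * offset, so that (slot - TOY_IV_OFFSET) stays inside the block. */
int toy_arena_init(toy_iv_arena *a, size_t usable)
{
    size_t wasted = TOY_IV_OFFSET / sizeof(long);
    a->slots = malloc((wasted + usable) * sizeof(long));
    if (!a->slots) return -1;
    a->nslots = wasted + usable;
    a->next = wasted;
    return 0;
}

/* Hand out a solitary integer, but return it disguised as a pointer
 * to a toy_xpviv whose xiv_iv member lands exactly on that integer. */
toy_xpviv *toy_arena_alloc_iv(toy_iv_arena *a, long value)
{
    if (a->next >= a->nslots) return NULL;
    long *slot = &a->slots[a->next++];
    *slot = value;
    return (toy_xpviv *)((char *)slot - TOY_IV_OFFSET);
}

/* The same accessor works for a real toy_xpviv and for a fake one. */
long toy_fetch_iv(const toy_xpviv *body)
{
    return body->xiv_iv;
}
```

Only xiv_iv may be touched through such a pointer; the other members overlap neighbouring integers or the wasted slots.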
SVt_PV
SVt_PVIV
SVt_PVNV
SVt_PVMG
These are pretty ordinary, and each is "derived" from the
previous in the sense that it just adds more data to the
previous structure.
[ Need to add this:
struct xrv {
SV * xrv_rv; /* pointer to another SV */
};
A reference value. In the following structs its space is reserved
as a char* xpv_pv, but if SvROK() is true, xpv_pv is pointing to
another SV, not a string.
]
struct xpv {
    char *  xpv_pv;   /* pointer to malloced string */
    STRLEN  xpv_cur;  /* length of xpv_pv as a C string */
    STRLEN  xpv_len;  /* allocated size */
};
This is your basic string scalar that is never used numerically
or magically.
struct xpviv {
    char *  xpv_pv;   /* pointer to malloced string */
    STRLEN  xpv_cur;  /* length of xpv_pv as a C string */
    STRLEN  xpv_len;  /* allocated size */
    I32     xiv_iv;   /* integer value or pv offset */
};
This is a string scalar that has either been used as an
integer, or an integer that has been used in a string
context, or has had the front trimmed off of it, in which
case xiv_iv contains how far xpv_pv has been incremented
from the original allocated value.
struct xpvnv {
    char *  xpv_pv;   /* pointer to malloced string */
    STRLEN  xpv_cur;  /* length of xpv_pv as a C string */
    STRLEN  xpv_len;  /* allocated size */
    I32     xiv_iv;   /* integer value or pv offset */
    double  xnv_nv;   /* numeric value, if any */
};
This is a string or integer scalar that has been used in a
numeric context, or a number that has been used in a string
or integer context.
struct xpvmg {
    char *  xpv_pv;     /* pointer to malloced string */
    STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
    STRLEN  xpv_len;    /* allocated size */
    I32     xiv_iv;     /* integer value or pv offset */
    double  xnv_nv;     /* numeric value, if any */
    MAGIC*  xmg_magic;  /* linked list of magicalness */
    HV*     xmg_stash;  /* class package */
};
This is the top of the line for ordinary scalars. This scalar
has been charmed with one or more kinds of magical or object
behavior. In addition it can contain any or all of integer,
double or string.
SVt_PVLV
SVt_PVAV
SVt_PVHV
SVt_PVCV
SVt_PVGV
SVt_PVBM
SVt_PVFM
These are specialized forms that are never directly visible to
the Perl script. They are independent of each other, and may
not be promoted to any other type.
[Actually, PVBM doesn't belong here, but in the previous section.
Saying index($foo,$bar) will in fact turn $bar into a PVBM so that
it can do Boyer-Moore searching.]
There are several additional data values in the SV structure. The sv_refcnt
gives the number of references to this SV. Some of these references may be
actual Perl language references, but many others are just internal pointers,
from a symbol table, or from the syntax tree, for example. When sv_refcnt
goes to zero, the value can be safely deallocated. Must be, in fact.
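The reference-counting rule can be sketched like this. It is a toy under assumptions: the toy_* names and the live-object counter are invented for the example, and the real code has far more to clean up than a single body pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy value head: just a body pointer and a reference count. */
typedef struct toy_rc {
    void    *any;
    unsigned refcnt;
} toy_rc;

/* Hypothetical bookkeeping so the example can observe deallocation. */
int toy_live_count = 0;

/* A new value starts with one reference, held by its creator. */
toy_rc *toy_rc_new(void)
{
    toy_rc *sv = malloc(sizeof *sv);
    if (!sv) return NULL;
    sv->any = NULL;
    sv->refcnt = 1;
    toy_live_count++;
    return sv;
}

/* Anyone who stores a pointer to the value (a Perl-level reference,
 * a symbol table slot, a syntax tree node) takes a reference. */
toy_rc *toy_rc_inc(toy_rc *sv)
{
    if (sv) sv->refcnt++;
    return sv;
}

/* Dropping the last reference frees the value immediately. */
void toy_rc_dec(toy_rc *sv)
{
    if (!sv) return;
    if (--sv->refcnt == 0) {
        free(sv->any);
        free(sv);
        toy_live_count--;
    }
}
```

Every toy_rc_inc() must be paired with exactly one toy_rc_dec(); a missing decrement leaks the value and an extra one frees it while it is still in use.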
The sv_storage byte is not very well thought out, but tends to indicate
something about where the scalar lives. It's used in allocating
lexical storage, and at runtime contains an 'O' if the value has been
blessed as an object. There may be some conflicts lurking in here, and
I may eventually claim some of the bits for other purposes. [I did,
with a vengeance.]
The sv_flags are currently as follows. Most of these are set and cleared
by macros to guarantee their consistency, and you should always use the
proper macro rather than accessing them directly.
[Most of these numbers have changed, and there are some new flags.
And they're all stuffed into a single U32.]
#define SVf_IOK 1 /* has valid integer value */
#define SVf_NOK 2 /* has valid numeric value */
#define SVf_POK 4 /* has valid pointer value */
These tell whether an integer, double or string value is
immediately available without further consideration. All tainting
and magic (but not objecthood) works by turning off these bits and
forcing a routine to be executed to discover the real value. The
SvIV(), SvNV() and SvPV() macros that fetch values are smart about
all this, and should always be used if possible. Most of the stuff
mentioned below you really don't have to deal with directly. (Values
aren't stored using macros, but using functions sv_setiv(), sv_setnv()
and sv_setpv(), plus variants. You should never have to explicitly
follow the sv_any pointer to any X structure in your code.)
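A rough sketch of how a fetch routine can consult these bits, using invented toy_* names rather than the real SvIV() and sv_setpv(). Caching the converted integer and turning the integer bit on afterwards is an assumption of this sketch; the text above only says that the bits tell whether a value is immediately available.

```c
#include <assert.h>
#include <stdlib.h>

#define TOYf_IOK 1u  /* has valid integer value */
#define TOYf_NOK 2u  /* has valid numeric value */
#define TOYf_POK 4u  /* has valid pointer (string) value */

typedef struct toy_scalar {
    const char *pv;    /* string value, borrowed from the caller */
    long        iv;    /* integer value */
    double      nv;    /* numeric value */
    unsigned    flags; /* which of the above are currently valid */
} toy_scalar;

/* Storing goes through a function, which resets the flags so that
 * stale integer or numeric values are never trusted. */
void toy_setpv(toy_scalar *s, const char *str)
{
    s->pv = str;
    s->iv = 0;
    s->nv = 0.0;
    s->flags = TOYf_POK;
}

/* Fetching checks the flag first and converts only when it must.
 * The double-to-long cast assumes the value fits in a long. */
long toy_iv(toy_scalar *s)
{
    if (s->flags & TOYf_IOK)
        return s->iv;                      /* immediately available */
    if (s->flags & TOYf_NOK)
        s->iv = (long)s->nv;
    else if (s->flags & TOYf_POK)
        s->iv = strtol(s->pv, NULL, 10);
    else
        s->iv = 0;                         /* undefined counts as 0 */
    s->flags |= TOYf_IOK;                  /* remember the result */
    return s->iv;
}
```

After the first toy_iv() call both the string and the integer are valid, which corresponds to a string scalar that has been used as an integer.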
#define SVf_OOK 8 /* has valid offset value */
This is only on when SVf_IOK is off, and indicates that the unused
integer storage is holding an offset for the string pointer value
because you've done something like s/^prefix//.
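The offset trick can be sketched as below, with invented toy_* names and a separate ook field standing in for the flag bit. Shrinking the recorded allocation size along with the pointer is an assumption of this sketch. Trimming the front costs no copying; the price is that the original pointer has to be reconstructed before the buffer can be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_str {
    char  *pv;   /* current start of the string */
    size_t cur;  /* current length */
    size_t len;  /* allocated size, counted from pv */
    long   iv;   /* when ook is set: how far pv has been advanced */
    int    ook;  /* nonzero if iv holds an offset, not an integer */
} toy_str;

/* Copy src into a freshly malloc'd buffer. Returns 0 on success. */
int toy_str_init(toy_str *s, const char *src)
{
    size_t n = strlen(src);
    s->pv = malloc(n + 1);
    if (!s->pv) return -1;
    memcpy(s->pv, src, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->iv = 0;
    s->ook = 0;
    return 0;
}

/* Drop the first n bytes by advancing the pointer, not by copying. */
void toy_str_chop(toy_str *s, size_t n)
{
    if (n > s->cur) n = s->cur;
    s->pv  += n;
    s->cur -= n;
    s->len -= n;
    s->iv  += (long)n;
    s->ook  = 1;
}

/* free() needs the address malloc() returned, so undo the offset. */
void toy_str_free(toy_str *s)
{
    free(s->ook ? s->pv - s->iv : s->pv);
    s->pv = NULL;
}
```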
#define SVf_MAGICAL 16 /* has special methods */
This indicates not only that sv_type is at least SVt_PVMG, but
also that the linked list of magical behaviors is not empty.
#define SVf_OK 32 /* has defined value */
This indicates that the value is defined. Currently it means either
that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
is set.
#define SVf_TEMP 64 /* eventually in sv_private? */
This indicates that the string is a temporary allocated by one of
the sv_mortal functions, and that any string value may be stolen
from it without copying. (It's important not to steal the value if
the temporary will continue to require the value, however.)
#define SVf_READONLY 128 /* may not be modified */
This scalar value may not be modified. Any function that might modify
a scalar should check for this first, and reject the operation when
inappropriate. Currently only the builtin values for sv_undef, sv_yes
and sv_no are marked readonly, but eventually we may provide a language
construct to set this bit.
The sv_private byte contains some additional bits that apply across the
board. Really private bits (that depend on the type) are allocated from
128 down.
#define SVp_IOK 1 /* has valid non-public integer value */
#define SVp_NOK 2 /* has valid non-public numeric value */
#define SVp_POK 4 /* has valid non-public pointer value */
These shadow the bits in sv_flags for tainted variables, indicating that
there really is a valid value available, but you have to set the global
tainted flag if you access them.
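One way to picture the shadow bits, as a sketch with invented toy_* names and a plain global standing in for the tainted flag. How the real code moves bits between the two bytes is not spelled out here, so the toy_taint() step is an assumption.

```c
#include <assert.h>

#define TOYf_IOK 1u  /* public: integer is valid and freely usable */
#define TOYp_IOK 1u  /* private: integer is valid but tainted */

typedef struct toy_tsv {
    long     iv;
    unsigned flags;          /* public bits */
    unsigned private_flags;  /* shadow bits */
} toy_tsv;

/* Hypothetical stand-in for the global tainted flag. */
int toy_tainted = 0;

/* Tainting hides the value from the fast path by moving the
 * "valid" bit from the public byte to the private one. */
void toy_taint(toy_tsv *sv)
{
    if (sv->flags & TOYf_IOK) {
        sv->flags &= ~TOYf_IOK;
        sv->private_flags |= TOYp_IOK;
    }
}

/* The fetch routine still finds the value through the shadow bit,
 * but records that tainted data has been touched. */
long toy_get_iv(toy_tsv *sv)
{
    if (sv->flags & TOYf_IOK)
        return sv->iv;               /* untainted fast path */
    if (sv->private_flags & TOYp_IOK) {
        toy_tainted = 1;             /* caller now handles tainted data */
        return sv->iv;
    }
    return 0;                        /* no valid integer at all */
}
```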
#define SVp_SCREAM 8 /* has been studied? */
Indicates that a study was done on this string. A studied string is
magical and automatically unstudies itself when modified.
#define SVp_TAINTEDDIR 16 /* PATH component is a security risk */
A special flag for $ENV{PATH} that indicates that, while the value
as a whole may be untainted, some path component names an insecure
directory.
#define SVpfm_COMPILED 128
For a format, whether its picture has been "compiled" yet. This
cannot be done until runtime because the user has access to the
internal formline function, and may supply a variable as the
picture.
#define SVpbm_VALID 128
#define SVpbm_CASEFOLD 64
#define SVpbm_TAIL 32
For a Boyer-Moore pattern, whether the search string has been invalidated
by modification (can happen to $pat between calls to index($string,$pat)),
whether case folding is in force for regexp matching, and whether we're
trying to match something like /foo$/.
#define SVpgv_MULTI 128
For a symbol table entry, set when we've decided that this symbol is
probably not a typo. Suspected typos can be reported by -w.
Well, that's probably enough for now. As you can see, we could turn
references into something more like an integer or a pointer value. In
fact, I suspect the right thing to do is say that a reference is just
a funny type of string pointer that isn't allocated the same way.
This would let us not only have references to scalars, but might provide
a way to have scalars that point to non-malloced memory. Hmm. I'll
have to think about that s'more. You can think about it too.
Larry