author | Jarkko Hietaniemi <jhi@iki.fi> | 2002-02-28 05:43:45 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2002-02-28 05:43:45 +0000 |
commit | e05949c7fbf3ae0363947bc70c1c662248b91b93 | |
tree | a4673734f9526bd3b579333443c956499954741f /hv.h | |
parent | 4379a6f8153cde10f045a82b5434d852f701ae7a | |
Make shared hash keys \0-terminated:
one possible resolution for
"UTF-8, weird \w behaviour after HASH-KEY-ification"
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-01/msg01327.html
The hash keys were shared (SvLEN(sv) = 0 was the giveaway).
The hash keys weren't \0-terminated. This meant that the end-of-line
check ($) in regmatch() read nextchr from beyond the last character.
Since the keys were UTF-8, the byte right after the key (the UTF-8
flag byte that HEK_UTF8 read) was \1, not the usual string-terminating
\0. Wham, no match.
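To make the failure concrete, here is a minimal C sketch of the pre-patch layout. It assumes a simplified stand-in for `struct hek`; `demo_hek`, `old_layout_new()` and `peek_past_end()` are hypothetical names, not Perl internals. The key bytes are followed directly by the UTF-8 flag byte with no terminator, so a reader that looks one byte past the key sees the flag rather than `\0`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct hek (not Perl's definition).
   hv.h uses the pre-C99 "char hek_key[1]" idiom; a flexible array member
   is used here to keep the sketch well-defined. */
struct demo_hek {
    int  len;   /* length of key in bytes */
    char key[]; /* key bytes, followed by trailing metadata */
};

/* Pre-patch layout: key bytes, then the UTF-8 flag byte, no '\0'.
   This mirrors the old HEK_UTF8(hek), i.e. *(HEK_KEY(hek) + HEK_LEN(hek)). */
static struct demo_hek *old_layout_new(const char *s, int len, char is_utf8)
{
    assert(len >= 0);
    struct demo_hek *h = malloc(sizeof *h + (size_t)len + 1); /* key + flag */
    if (h == NULL)
        return NULL;
    h->len = len;
    memcpy(h->key, s, (size_t)len);
    h->key[len] = is_utf8; /* flag sits exactly where a '\0' would be */
    return h;
}

/* What a scanner sees if it reads one byte past the end of the key,
   as the $ check in regmatch() effectively did. */
static unsigned char peek_past_end(const struct demo_hek *h)
{
    return (unsigned char)h->key[h->len];
}
```

With a UTF-8 key such as `"caf\xc3\xa9"` and the flag set, `peek_past_end()` returns 1; with the flag clear it happens to return 0, which is presumably why only UTF-8 keys tripped the `$` match.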
I think another possible resolution would be to stop the nextchr
computation in regmatch() from peeking beyond the last character
of the string:

    nextchr = locinput < PL_regeol ? UCHARAT(locinput) : 0;
p4raw-id: //depot/perl@14908
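The alternative suggested in the message, never reading past `PL_regeol`, can be sketched as a small bounds-checked helper. This is not the change the commit made. `next_char()` is a hypothetical name, and `UCHARAT` is redefined locally as an assumption (read one byte as an unsigned value) rather than taken from Perl's headers.

```c
#include <assert.h>

/* Assumption: a local UCHARAT that reads one byte as unsigned, in the
   spirit of the macro used in regmatch(); not Perl's own definition. */
#define UCHARAT(p) ((int)*(const unsigned char *)(p))

/* Hypothetical helper: return the byte at locinput, or 0 once locinput
   has reached regeol (the end of the string), instead of reading past it. */
static int next_char(const char *locinput, const char *regeol)
{
    assert(locinput <= regeol);
    return locinput < regeol ? UCHARAT(locinput) : 0;
}
```

Because the check compares `locinput` against the end pointer, the helper yields 0 at the end of the string whatever byte follows it in memory, so it would no longer matter whether the key is terminated.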
Diffstat (limited to 'hv.h')
-rw-r--r-- | hv.h | 4 |
1 files changed, 3 insertions, 1 deletions
```diff
@@ -23,6 +23,8 @@ struct hek {
     U32     hek_hash;       /* hash of key */
     I32     hek_len;        /* length of hash key */
     char    hek_key[1];     /* variable-length hash key */
+    /* the hash-key is \0-terminated */
+    /* after the \0 there is a byte telling whether the key is UTF8 */
 };
 
 /* hash structure: */
@@ -211,7 +213,7 @@ C<SV*>.
 #define HEK_HASH(hek)           (hek)->hek_hash
 #define HEK_LEN(hek)            (hek)->hek_len
 #define HEK_KEY(hek)            (hek)->hek_key
-#define HEK_UTF8(hek)           (*(HEK_KEY(hek)+HEK_LEN(hek)))
+#define HEK_UTF8(hek)           (*(HEK_KEY(hek)+HEK_LEN(hek)+1))
 
 /* calculate HV array allocation */
 #if defined(STRANGE_MALLOC) || defined(MYMALLOC)
```
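A minimal sketch of the layout after this patch, again using a simplified stand-in for `struct hek` rather than Perl's real definition (`demo_hek`, `demo_hek_new()` and the `DEMO_HEK_*` macros are hypothetical). The key is followed by `\0` and then the flag byte, which is why `HEK_UTF8` now adds 1 to `HEK_KEY(hek)+HEK_LEN(hek)`. The allocation must reserve the extra byte; the diffstat above is limited to `hv.h`, so any matching allocation change is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct hek (not Perl's definition);
   a flexible array member replaces hv.h's "char hek_key[1]" idiom. */
struct demo_hek {
    unsigned hash;  /* hash of key (unused in this sketch) */
    int      len;   /* length of hash key */
    char     key[]; /* key, then '\0', then the UTF-8 flag byte */
};

/* Accessors modelled on the patched hv.h macros. */
#define DEMO_HEK_LEN(h)  ((h)->len)
#define DEMO_HEK_KEY(h)  ((h)->key)
#define DEMO_HEK_UTF8(h) (*(DEMO_HEK_KEY(h) + DEMO_HEK_LEN(h) + 1))

/* Post-patch layout: key bytes, '\0', UTF-8 flag. */
static struct demo_hek *demo_hek_new(const char *s, int len, char is_utf8)
{
    assert(len >= 0);
    /* len key bytes + 1 terminator + 1 flag byte */
    struct demo_hek *h = malloc(sizeof *h + (size_t)len + 2);
    if (h == NULL)
        return NULL;
    h->hash = 0;
    h->len = len;
    memcpy(h->key, s, (size_t)len);
    h->key[len] = '\0';         /* key is now a proper C string */
    DEMO_HEK_UTF8(h) = is_utf8; /* flag one byte after the terminator */
    return h;
}
```

Because `key[len]` is now `\0`, code that reads one byte past the key stops as it would on any C string, and `strlen(DEMO_HEK_KEY(h))` equals the key length (assuming, as in this sketch, that the key contains no embedded `\0`).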