Commit message | Author | Age | Files | Lines
* Stop inadvertently skipping Spec.t on VMS. | Craig A. Berry | 2012-01-14 | 1 | -4/+4
| ae5a807c7dcf moved a check against $@ away from the eval it was checking and inserted another eval in between, the effect of which was to make the tests that can only run on VMS get skipped there too. Ouch.
| There are other problems with ae5a807c7dcf, but this is a start.
* pp_sys.c: goto mustn’t skip initialisation | Father Chrysostomos | 2012-01-14 | 1 | -1/+2
|
* perldelta up to 7c2b3c783b | Father Chrysostomos | 2012-01-14 | 1 | -2/+25
|
* magic.t: Correct miniperl skip count | Father Chrysostomos | 2012-01-14 | 1 | -1/+1
|
* -T "unreadable file" should set stat info consistently | Father Chrysostomos | 2012-01-14 | 3 | -3/+18
| This was mentioned in ticket #77388. It turns out to be related to #4253.
| If the file cannot be opened, -T and -B on filenames set the last handle to null and set the last stat type to stat, but leave the actual stat buffer and success status as they were. That means that stat(_) will continue to return the previous buffer, but lstat(_) will no longer work. This is another of those inconsistent cases where the internal stat info is only partially set.
| Originally, this code would set PL_laststatval (the success status) to -1. Commit 25988e07 (the patch in ticket #4253) intentionally changed this to make -T _ less surprising on read-only files. But the patch ended up affecting -T with an explicit file name, too. It also only partially fixed things for -T _, because the last stat type *was* still being set.
| This commit changes it to set all the stat info, for explicit file names, or no stat info, for _ (if the previous stat was with a file name).
* stat.t: Add bug number | Father Chrysostomos | 2012-01-14 | 1 | -0/+1
|
* Don’t emit unopened warning for other stat(HANDLE) error | Father Chrysostomos | 2012-01-14 | 1 | -1/+4
| -r or -T on a GV with no IO or on an IO with no fp (or dirp for -r) will produce an ‘unopened’ warning. stat() on a filehandle will warn about an unopened filehandle not only if there is no fp, but also if the fstat call fails (with errno containing EBADF, EFAULT or EIO, at least on Darwin).
| I don’t know if there is a way to test this. (But pp_stat and my_stat_flags are getting closer, so this must be correct. :-)
* Make -T BADHANDLE set errno with fatal warnings | Father Chrysostomos | 2012-01-14 | 2 | -2/+13
| Due to the order of the statements, SETERRNO would never be reached with fatal warnings.
| I’ve added another SETERRNO out of paranoia. If there is a nicely-behaved __WARN__ handler, we should still be setting errno just before -T returns, in case the handler changed it. We can’t do much in the case of fatal handlers that do system calls. (Is $! localised for those?)
* Make -l HANDLE set PL_laststatval with fatal warnings | Father Chrysostomos | 2012-01-14 | 2 | -2/+9
| Fatal warnings were preventing it from being set, because the warning came first. (PL_laststatval records the success status of the previous stat.)
* Make -T HANDLE and -B HANDLE always set last stat type | Father Chrysostomos | 2012-01-13 | 2 | -3/+9
| -T and -B on handles always set PL_laststatval (which indicates the success of the previous stat). But they don’t set the last stat type (PL_laststype) for closed filehandles. Those two should always go together. stat and -r, -w etc., always set PL_laststype for a closed or missing filehandle.
* pp_sys.c:pp_fttest: Don’t set PL_statname to SvPV(PL_statname) | Father Chrysostomos | 2012-01-13 | 1 | -2/+1
| This is a waste of CPU cycles. PL_statname is always a PV.
* Make -T _ and -B _ always set PL_laststatval | Father Chrysostomos | 2012-01-13 | 2 | -3/+9
| -T _ and -B _ always do another stat() on the previous file handle or filename, unless it is a handle that has been closed.
| Normally, the internal stat buffer, status, etc., are reset even for _. This happens even on a failed fstat().
| -T HANDLE and -B HANDLE currently *do* reset the stat status (PL_laststatval) if there is no IO thingy, so having -T _ and -B _ not do that makes things needlessly inconsistent.
* pp_sys.c: Remove space from lstat($ioref) warning | Father Chrysostomos | 2012-01-13 | 3 | -2/+6
| This was emitting two spaces before the ‘at’:
|   lstat() on filehandle  at -e line 1.
* pp_sys.c:pp_fttext: Don’t extend the stack after popping | Father Chrysostomos | 2012-01-13 | 1 | -1/+3
|
* Squash repetitititive code in doio.c:my_stat_flags | Father Chrysostomos | 2012-01-13 | 1 | -8/+3
|
* Make failed filetests consistent with & w/out fatal warnings | Father Chrysostomos | 2012-01-13 | 2 | -3/+32
| The result of stat(_) after a failed -r HANDLE would differ depending on whether fatal warnings are on. This corrects that, by setting the internal status before warning about an unopened filehandle.
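The ordering problem described in this and the neighbouring fatal-warnings commits can be sketched in Python. This is only an analogy, not the C code in pp_sys.c: `StatState`, `filetest_warn_first`, `filetest_status_first` and `run_with_fatal_warnings` are hypothetical names, and Python's "error" warning filter stands in for Perl's fatal warnings.

```python
import warnings


class StatState:
    """Hypothetical stand-in for the interpreter's 'last stat' status."""

    def __init__(self):
        self.last_stat_ok = True


def filetest_warn_first(state, handle_is_open):
    # Old ordering: warn, then record the failure.
    if not handle_is_open:
        warnings.warn("filetest on unopened filehandle")  # may raise
        state.last_stat_ok = False  # skipped when the warning is fatal
        return None
    state.last_stat_ok = True
    return True


def filetest_status_first(state, handle_is_open):
    # New ordering: record the failure before anything can raise.
    if not handle_is_open:
        state.last_stat_ok = False
        warnings.warn("filetest on unopened filehandle")
        return None
    state.last_stat_ok = True
    return True


def run_with_fatal_warnings(filetest):
    """Run a failing filetest with warnings promoted to exceptions."""
    state = StatState()
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # analogous to fatal warnings
        try:
            filetest(state, handle_is_open=False)
        except UserWarning:
            pass
    return state.last_stat_ok
```

With the warn-first ordering the recorded status stays stale once the warning becomes an exception; with the status-first ordering the state is the same whether or not warnings are fatal.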
* stat $ioref should record the handle for -T _ | Father Chrysostomos | 2012-01-13 | 4 | -5/+23
| stat $gv records the handle so that -T _ can use it. But stat $ioref hasn’t been doing that, until this commit. PL_statgv can now hold an SVt_PVIO instead of a SVt_PVGV.
* stat $ioref should reset the internal stat type | Father Chrysostomos | 2012-01-13 | 2 | -1/+11
| In addition to a stat buffer, Perl keeps track internally of which type of stat was done last, either stat or lstat, so that lstat _ can die if the previous type was stat. This was not being reset for stat $ioref. Filetest ops were fine.
* Set PL_statgv to null when freed or coerced | Father Chrysostomos | 2012-01-13 | 2 | -3/+26
| If PL_statgv is not set to null when freed, that same SV could be reused for another GV, in which case -T _ will then use another handle unrelated to the previous stat.
| Similarly, if PL_statgv points to a fake glob that gets coerced into a non-glob before it is freed, it will not follow the code path in sv_free that sets PL_statgv to null. Furthermore, if it becomes a GV again, it could be a completely different filehandle, unrelated to the previous stat.
* Suppress confusing uninit warning from -T _ | Father Chrysostomos | 2012-01-13 | 2 | -2/+5
| -T _ uses the file name saved by a preceding stat. If there was no preceding stat, the internal sv used to store the file name is undefined, so SvPV produces an uninitialized warning. Normally a failed -T will just return undefined and set $!. Normally stat on a filehandle will set the internal stat file name to "".
| This commit sets the internal file name to "" initially on startup, instead of creating an undefined scalar.
* defined *{"+"} should not stop %+ from working | Father Chrysostomos | 2012-01-13 | 2 | -1/+23
| The same applies to %-.
| This is something I broke when merging is_magical_gv with gv_fetchpvn_flags.
| gv_fetchpvn_flags must make sure its *+ glob is present in the symbol table when it loads Tie::Hash::NamedCapture. If it adds it afterwards it will clobber another *+ that Tie::Hash::NamedCapture has autovivified and tied in the meantime.
* defined *{"!"} should not stop %! from working | Father Chrysostomos | 2012-01-13 | 2 | -2/+14
| This is something I broke when merging is_magical_gv with gv_fetchpvn_flags.
| gv_fetchpvn_flags must make sure its *! glob is present in the symbol table when it loads Errno. If it adds it afterwards it will clobber another *! that Errno has autovivified and tied in the meantime.
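The clobbering described in the two commits above comes down to the order of "install entry" and "load module". A loose Python analogy follows, with a plain dict standing in for the symbol table. `load_errno`, `fetch_glob_add_after` and `fetch_glob_add_before` are hypothetical names, and nothing here reflects the real gv_fetchpvn_flags code.

```python
def load_errno(symtab):
    """Hypothetical module load: creates the '!' entry if absent, then ties it."""
    entry = symtab.setdefault("!", {"tied": False})
    entry["tied"] = True


def fetch_glob_add_after(symtab):
    # Broken ordering: the module creates and ties its own entry,
    # then the caller overwrites it with an untied one.
    glob = {"tied": False}
    load_errno(symtab)
    symtab["!"] = glob
    return symtab["!"]


def fetch_glob_add_before(symtab):
    # Fixed ordering: the entry is already present, so the module
    # ties that same entry instead of creating a second one.
    glob = {"tied": False}
    symtab["!"] = glob
    load_errno(symtab)
    return symtab["!"]
```

In the first ordering the entry left in the table is the untied one; in the second it is the one the module tied.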
* Squash repetititive code in util.c:report_evil_fh | Father Chrysostomos | 2012-01-13 | 1 | -17/+9
|
* perldelta for Unicode property performance changes | Karl Williamson | 2012-01-13 | 1 | -0/+20
| I put this under a major change, but it would be fine if it is moved to an =item change.
* util.c: Silence compiler warning | Karl Williamson | 2012-01-13 | 1 | -0/+1
| cc on Solaris is smart enough to figure out that this return isn't reached.
* regcomp.c: Compile inverted character classes with \p{} | Karl Williamson | 2012-01-13 | 1 | -5/+1
| This commit causes character classes of the form [^\p{...}] to have their code points known at compile time instead of runtime. This allows for better optimization and runtime execution speed.
* regcomp.c: Prepare for allowing [^\p{...}] | Karl Williamson | 2012-01-13 | 1 | -8/+52
| It turns out that this code is buggy, except for the fact that <nonbitmap> currently can't contain conflicts. The trouble would have started when Unicode properties were moved to being looked at at compile time -- except when the class is to be inverted, so there isn't a problem. But in preparation for handling this case, we fix the potential bugs, as specified in the comments.
* regcomp.c; Use Latin1 \p{} in optimization | Karl Williamson | 2012-01-13 | 1 | -5/+57
| This commit causes any Latin1-range characters from Unicode properties to be placed at compile time into the bitmap of the ANYOF node that implements those properties, and to remove the flag that says they should be looked for at run time. This causes the optimizer to generate a better start class, as it knows more fully which characters can be and can't be in the start class, and speeds up runtime checking, as it can just do a bitmap test for these, instead of having to go look at the swash.
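The bitmap-first lookup described above can be illustrated with a small Python sketch. It assumes a 256-bit bitmap for code points 0..255 and a slower fallback for everything else. `build_bitmap`, `class_matches` and the `is_upper` predicate are hypothetical, and the real ANYOF node layout in regcomp.c is not shown.

```python
def build_bitmap(matches):
    """Precompute, once, a 256-bit bitmap for code points 0..255."""
    bitmap = bytearray(32)
    for cp in range(256):
        if matches(cp):
            bitmap[cp >> 3] |= 1 << (cp & 7)
    return bitmap


def class_matches(bitmap, slow_lookup, cp):
    """Test the bitmap for Latin1-range code points, else fall back."""
    if cp < 256:
        return bool(bitmap[cp >> 3] & (1 << (cp & 7)))
    return slow_lookup(cp)  # stands in for the swash lookup


def is_upper(cp):
    """Hypothetical property definition used for the demonstration."""
    return chr(cp).isupper()


UPPER_BITMAP = build_bitmap(is_upper)
```

For example, `class_matches(UPPER_BITMAP, is_upper, 0xC9)` is answered from the bitmap alone, while a code point such as 0x391 still goes through the slow path.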
* regcomp.c: Better optimize [classes] under /aa. | Karl Williamson | 2012-01-13 | 1 | -5/+9
| An optimization introduced in 5.14 is for bracketed character classes of the very special form like [Bb]. These can be optimized into an EXACTFish node. In this case, they can be optimized to an EXACTFA node since they are ASCII characters. If the surrounding options are /aa, it is likely that any adjacent EXACTFish nodes will be EXACTFA, so optimize to that node instead of the previous EXACTFU. This will allow the optimizer to collapse any adjacent nodes.
| For example qr/a[B]c/aai will now get optimized to an EXACTFA of "abc". Previously it would have gotten optimized to EXACTFA<a> . EXACTFU<b> . EXACTFA<c>.
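Why the node type matters for collapsing can be shown with a simplified Python sketch. It assumes nodes are plain (type, text) pairs and that only adjacent nodes of the same type may be joined. `join_adjacent` is a hypothetical helper, and the real optimizer applies further conditions that are omitted here.

```python
def join_adjacent(nodes):
    """Merge runs of adjacent nodes that share the same node type."""
    merged = []
    for node_type, text in nodes:
        if merged and merged[-1][0] == node_type:
            # Same type as the previous node: extend it in place.
            merged[-1] = (node_type, merged[-1][1] + text)
        else:
            merged.append((node_type, text))
    return merged


# Node sequences for qr/a[B]c/aai, before and after the change.
before = [("EXACTFA", "a"), ("EXACTFU", "b"), ("EXACTFA", "c")]
after = [("EXACTFA", "a"), ("EXACTFA", "b"), ("EXACTFA", "c")]
```

Calling `join_adjacent(before)` leaves three nodes, because the EXACTFU in the middle blocks the merge, while `join_adjacent(after)` yields a single EXACTFA node for "abc".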
* regcomp.c: Avoid unnecessary runtime fold checking | Karl Williamson | 2012-01-13 | 1 | -2/+9
| Since 5.14, the single-char folds have been calculated at compile time, either by doing it there, or for properties, setting the swash name to include a folded or non-folded version of the property. Thus this patch could have been done much earlier. Now, most of the properties are actually computed at compile time by previous patches, but that isn't relevant to this one.
| Thus there really doesn't need to be runtime folding for things that aren't in the bitmap, except for those things under /d that match only if the string is in UTF8.
* regcomp.c: Change loop variable name, associated changes | Karl Williamson | 2012-01-13 | 1 | -7/+21
| The variable 'value' is already used for something else. Using it as a loop variable corrupts the other use. This commit changes to a different name, and adds code to keep 'value' and 'prevvalue' in sync with their other meanings.
* regexec.c: Use shared swash in bracketed character classes | Karl Williamson | 2012-01-13 | 1 | -1/+1
| This takes advantage of an earlier commit to use a swash that may be shared across multiple character class instances. That means that if a match in another class has to look up a value, then that same value is automatically available without further lookup to all character classes that share the swash. This means that the lookup result only needs to be cached once for all instances in the thread, saving time and memory.
| Note that currently the only swashes that are shared are those that consist solely of a single Unicode property definition. Some sort of checksum would have to be computed if this were to be extended to custom classes. But what this does is cause sharing for all Unicode properties that aren't in bracketed classes (as they are implemented as a bracketed class with a single element), as well as the few cases where someone explicitly writes [\p{foo}] without anything else in the class.
* regexec.c: Allow for returning shared swash | Karl Williamson | 2012-01-13 | 4 | -5/+22
| This changes the function that returns the swash associated with a bracketed character class so that it returns the original swash and not a copy. The function is renamed and made accessible only from within regexec.c, and a new wrapper function with the original name is created that just calls the other one and returns a copy of the swash.
| Thus, all access from outside regexec.c will use a copy which, if overwritten, will not harm others, while the option exists from within regexec.c to use a shared version.
* regcomp.c: Clean up comment | Karl Williamson | 2012-01-13 | 1 | -11/+13
|
* perlunicode: Discourage use of is_utf8_char() | Karl Williamson | 2012-01-13 | 1 | -4/+5
|
* perlop: Typos, too long lines, corrections | Karl Williamson | 2012-01-13 | 2 | -7/+7
|
* intrpvar.h: clarification in comment | Karl Williamson | 2012-01-13 | 1 | -1/+1
|
* utf8.c: fix typo in pod | Karl Williamson | 2012-01-13 | 1 | -1/+1
|
* regcomp.c: Avoid leaking a scalar | Karl Williamson | 2012-01-13 | 1 | -0/+1
|
* regcomp.c: truncate long debug dump output | Karl Williamson | 2012-01-13 | 1 | -1/+12
| What an ANYOF node matches could theoretically be millions of characters long; this only outputs the first portion of very long ones.
* regcomp.c: in debug output, don't duplicate code points | Karl Williamson | 2012-01-13 | 1 | -1/+7
| The non-bitmap portion of an ANYOF node may also be in the bitmap portion. There is no sense in having duplicate output.
* regcomp.c: Change debug dump of bitmap/non-bitmap | Karl Williamson | 2012-01-13 | 1 | -2/+7
| Instead of '...' separating the two components of the output, change it to a single space, which is output only if the first component isn't null.
* regcomp.c: Change \t to a - in debug dumping ranges | Karl Williamson | 2012-01-13 | 1 | -0/+4
| This changes the separator in the output of a range from a tab to a hyphen, which is clearer.
* regcomp.c: White-space only | Karl Williamson | 2012-01-13 | 1 | -38/+36
| Remove trailing tabs
* regcomp.c: put_byte wants an ord, not a utf8 char | Karl Williamson | 2012-01-13 | 1 | -11/+3
| These were calling put_byte() incorrectly, with a utf8 char instead of the ordinal.
* regcomp.c: White-space only | Karl Williamson | 2012-01-13 | 1 | -2/+2
| These lines were indented one stop too many for the enclosing block
* regcomp.c: Don't read beyond input | Karl Williamson | 2012-01-13 | 1 | -2/+5
| This code was assuming that there were several more bytes in the input stream, when there may not be. This was discovered by valgrind.
* regcomp.c: Optimize a single Unicode property in a [character class] | Karl Williamson | 2012-01-13 | 4 | -12/+29
| All Unicode properties actually turn into bracketed character classes, whether explicitly done or not. A swash is generated for each property in the class. If that is the only thing not in the class's bitmap, it specifies completely the non-bitmap behavior of the class, and can be passed explicitly to regexec.c. This avoids having to regenerate the swash. It also means that the same swash is used for multiple instances of a property. And that means the number of duplicated data structures is greatly reduced.
| This currently doesn't extend to cases where multiple Unicode properties are used in the same class: [\p{greek}\p{latin}] will not share the same swash as another character class with the same components. This is because I don't know of an efficient method to determine if a new class being parsed has the same components as one already generated. I suppose some sort of checksum could be generated, but that is for future consideration.
* Move Unicode property defn processing to compile time | Karl Williamson | 2012-01-13 | 1 | -14/+139
| This patch moves the processing of most Unicode property definitions from execution (regexec.c) to compilation (regcomp.c).
| There is a cost to do this. By deferring it to execution, it may be that the affected path will never be taken, and hence the work won't have to be done; whereas, it's always done if it gets done at compilation.
| However, doing it at compilation has many advantages. We can't optimize what we don't know about, so this allows for better optimization, as well as feature enhancements, such as set manipulations, restricting matches to certain scripts, etc. A big one, about to be committed, allows for significantly reducing the number of copies of the data structure used for each property. (Currently, every mention in every regular expression of a given property will generate a new instance of its hash, and so results of look-ups of code points in one instance aren't automatically known to other instances, so the code point has to be looked up again.)
| This commit leaves the processing to execution time when the class is to be inverted. This was done purely to make the commit smaller, and will be removed in a future commit; hence the redundant test here will be removed shortly.
| It also has to leave to execution time processing of properties whose definition is not known yet. That can happen when the property is user-defined. We call _core_swash_init(), and if it fails, we assume that it's because it's such a property, and if it turns out that it was an unknown property, we leave to execution time the raising of a warning for it, just as before.
| Currently, the processing of properties in inverted character classes is also left to execution time. This restriction will be lifted in a future commit, and this patch assumes that, and doesn't indent some code that it otherwise would, in anticipation of the surrounding 'if' tests being removed.
* regcomp.c: Pass inversion list directly to regexec.c | Karl Williamson | 2012-01-13 | 1 | -21/+17
| Currently, any generated inversion list is stringified and passed in the data structure to regexec.c as such. regexec.c then calls _core_swash_init() to convert it into a swash and back into an inversion list. This intermediate step is wasteful, and this commit dispenses with it, based on preparatory commits in regexec.c and utf8.c.
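For readers unfamiliar with the term, an inversion list is a sorted list of code points at which membership flips. A minimal Python sketch of a lookup follows. It assumes the common convention that even indexes start a matching range and odd indexes end it, and it does not reproduce Perl's in-memory layout. `in_inversion_list` and `DIGITS_AND_UPPER` are hypothetical names.

```python
from bisect import bisect_right


def in_inversion_list(inv_list, code_point):
    """Return True if code_point falls inside a range of the list.

    inv_list is sorted. Elements at even indexes start a matching range
    and elements at odd indexes start the following non-matching range.
    """
    # Count the boundaries <= code_point. An odd count means the most
    # recent boundary opened a matching range.
    return bisect_right(inv_list, code_point) % 2 == 1


# ASCII digits and uppercase letters: [0x30, 0x3A) and [0x41, 0x5B)
DIGITS_AND_UPPER = [0x30, 0x3A, 0x41, 0x5B]
```

Because the list is already sorted and compact, a structure like this can be handed over and searched as-is, which is the kind of direct use the commit above makes possible by skipping the stringify-and-reparse step.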