path: root/dquote.c
Commit message | Author | Age | Files | Lines
* Base *.[ch] files: Replace leading tabs with blanks | Michael G Schwern | 2021-05-31 | 1 | -11/+11
  This is a rebasing by @khw of part of GH #18792, which I needed to get in now to proceed with other commits. It also strips trailing white space from the affected files.
* Allow blanks within and adjacent to {...} constructs | Karl Williamson | 2021-01-20 | 1 | -9/+56
  This was the consensus in http://nntp.perl.org/group/perl.perl5.porters/258489
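As a rough illustration of what tolerating blanks inside the braces involves, the sketch below narrows a span to its non-blank contents before any digits are parsed. The helper name `trim_blanks` and the choice of space and tab as the only blanks are assumptions made for this example; this is not the dquote.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: shrink the span [*start, *end) so that it no longer
 * begins or ends with a blank.  Only ' ' and '\t' are treated as blanks here,
 * which is an assumption of this sketch. */
static void
trim_blanks(const char **start, const char **end)
{
    assert(start != NULL && end != NULL);
    assert(*start != NULL && *end != NULL && *start <= *end);

    /* Drop leading blanks, never moving past the end of the span */
    while (*start < *end && (**start == ' ' || **start == '\t'))
        (*start)++;

    /* Drop trailing blanks, never moving before the (new) start */
    while (*end > *start && ((*end)[-1] == ' ' || (*end)[-1] == '\t'))
        (*end)--;
}
```

With the contents of `\x{ 41 }` as input, the span would shrink to just `41`; a span holding only blanks shrinks to an empty one, which a caller can then treat like an empty `{}`.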
* dquote.c: Change variable name | Karl Williamson | 2021-01-20 | 1 | -26/+26
  A future commit will need it to represent just the meaning of the new name.
* style: Detabify indentation of the C code maintained by the core. | Michael G. Schwern | 2021-01-17 | 1 | -11/+11
  This just detabifies to get rid of the mixed tab/space indentation. Applying consistent indentation and dealing with other tabs are another issue. Done with `expand -i`.
  * vutil.* left alone, it's part of version.
  * Left regen managed files alone for now.
* Remove dquote_inline.h | Karl Williamson | 2020-01-23 | 1 | -1/+0
  The remaining function in this file is moved to inline.h, just to not have an extra file lying around with hardly anything in it.
* Revise \o{ missing '}' error message | Karl Williamson | 2020-01-23 | 1 | -4/+1
  All the other messages raised when a construct is expecting a terminating '}' but none is found include the '}' in the message. '\o{' did not. Since these diagnostics are getting revised anyway, and I didn't find any CPAN modules relying on the wording, this commit makes the messages consistent by adding the '}' to the \o message.
* Restructure grok_bslash_[ox] | Karl Williamson | 2020-01-23 | 1 | -62/+273
  This commit causes these functions to allow a caller to request any messages generated to be returned to the caller, instead of always being handled within these functions. The messages are somewhat changed from previously to be clearer. I did not find any code in CPAN that relied on the previous message text.

  Like the previous commit for grok_bslash_c, here are two reasons to do this, repeated here.

  1) In pattern compilation this brings these messages into conformity with the other ones that get generated in pattern compilation, where there is a particular syntax, including marking the exact position in the parse where the problem occurred.

  2) These could generate truncated messages due to the (mostly) single-pass nature of pattern compilation that is now in effect. It keeps track of where during a parse a message has been output, and won't output it again if a second parsing pass turns out to be necessary. Prior to this commit, it had to assume that a message from one of these functions did get output, and this caused some out-of-bounds reads when a subparse (using a constructed pattern) was executed. The possibility of those went away in commit 5d894ca5213, which guarantees it won't try to read outside bounds, but that may still mean it is outputting text from the wrong parse, giving meaningless results. This commit should stop that possibility.
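The "return the message to the caller on request" pattern described above can be sketched with an optional out-parameter. Everything here is hypothetical and chosen for illustration (the function `parse_octal_digit`, its signature, and the message text); it is not the grok_bslash_o interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: convert one octal digit.  On failure, the diagnostic
 * is handed back through 'message' when the caller supplied one; otherwise
 * it is emitted here, as a function that always reports for itself would. */
static bool
parse_octal_digit(char c, unsigned *value, const char **message)
{
    assert(value != NULL);

    if (c >= '0' && c <= '7') {
        *value = (unsigned) (c - '0');
        if (message != NULL)
            *message = NULL;        /* nothing to report */
        return true;
    }

    if (message != NULL)
        *message = "not an octal digit";    /* caller decides how to report */
    else
        fprintf(stderr, "not an octal digit\n");

    return false;
}
```

A caller such as a pattern compiler would pass a non-NULL `message` and attach its own position marker before reporting; a caller with no special needs would pass NULL and keep the old behaviour.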
* Restructure grok_bslash_c | Karl Williamson | 2020-01-23 | 1 | -21/+45
  This commit causes this function to allow a caller to request any messages generated to be returned to the caller, instead of always being handled within this function.

  Like the previous commit for grok_bslash_c, here are two reasons to do this, repeated here.

  1) In pattern compilation this brings these messages into conformity with the other ones that get generated in pattern compilation, where there is a particular syntax, including marking the exact position in the parse where the problem occurred.

  2) The messages could be truncated due to the (mostly) single-pass nature of pattern compilation that is now in effect. It keeps track of where during a parse a message has been output, and won't output it again if a second parsing pass turns out to be necessary. Prior to this commit, it had to assume that a message from one of these functions did get output, and this caused some out-of-bounds reads when a subparse (using a constructed pattern) was executed. The possibility of those went away in commit 5d894ca5213, which guarantees it won't try to read outside bounds, but that may still mean it is outputting text from the wrong parse, giving meaningless results. This commit should stop that possibility.
* dquote.c: Change parameter name | Karl Williamson | 2020-01-23 | 1 | -14/+14
  In two functions, future commits will generalize this parameter to be possibly a warning message instead of only an error message. Change its name to reflect the added meaning.
* Hoist code point portability warnings | Karl Williamson | 2020-01-23 | 1 | -16/+3
* PATCH: [perl #133937] Assertion failure | Karl Williamson | 2019-03-19 | 1 | -0/+13
  This recently added assertion actually caught an error, which is a potential read beyond end of buffer. This doesn't actually happen in this case because this is a regular expression pattern, and the toker makes sure there is a trailing NUL (that isn't counted). The solution is to check the bounds before reading.
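"Check the bounds before reading" amounts to ordering the comparison ahead of the dereference. A minimal sketch, using a hypothetical helper `next_is_open_brace` that is not taken from dquote.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: report whether the byte at 's' is '{'.  The byte is
 * read only after confirming that 's' lies before 'send' (one past the last
 * valid byte); '&&' short-circuits, so '*s' is never evaluated at 'send'. */
static bool
next_is_open_brace(const char *s, const char *send)
{
    assert(s != NULL && send != NULL && s <= send);
    return s < send && *s == '{';
}
```

Writing the test as `*s == '{' && s < send` would perform the read first, which is the kind of potential out-of-bounds access the commit describes.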
* dquote.c: Prevent possible out-of-bounds read | Karl Williamson | 2019-03-19 | 1 | -1/+1
  This code read a byte that was potentially out-of-bounds. I don't know how it could get this far, but maybe some fuzzing code could get it.
* Change error wording for empty \x{} | Karl Williamson | 2019-03-19 | 1 | -1/+1
  An empty \x{} is unfortunately legal (returning a NUL) except in the scope of "use re 'strict'". Since this is an experimental feature, things like wording changes are allowed. It is unlikely anyone is relying on the precise wording of this fatal error under 'strict', and now all the messages for similar errors are similarly worded.
* Change error wording for \o{} | Karl Williamson | 2019-03-19 | 1 | -1/+1
  An empty \o{} no longer says "Number with no digits" in favor of "Empty \o{}" which is more consistent with errors raised for things like \b{}, \P{}. There is a small risk of breakage with this change, as with any diagnostic wording change. However, this construct is relatively new and rarely used, and this is a fatal error, not a warning you might want to trap on. There are no empty \o{} instances in CPAN.
* dquote.c: Use UTF8_SAFE_SKIP | Karl Williamson | 2019-03-13 | 1 | -3/+3
  Otherwise malformed input could cause this to return a pointer outside its buffer.
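The hazard is that a lead byte can claim more continuation bytes than the buffer actually holds. The sketch below shows the general idea of a clamped skip; `utf8_len_from_lead` and `skip_char_safe` are hypothetical names for this example, cover only standard 1 to 4 byte UTF-8, and are not Perl's macros.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by a UTF-8 lead byte (standard 1-4 byte forms
 * only).  Continuation or invalid bytes yield 1 so a caller always advances. */
static size_t
utf8_len_from_lead(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Advance past the character at 's', but never beyond 'send' (one past the
 * last valid byte), even if the lead byte promises more bytes than remain. */
static const char *
skip_char_safe(const char *s, const char *send)
{
    size_t len, avail;

    assert(s != NULL && send != NULL && s <= send);
    if (s == send)
        return send;            /* nothing left to skip */

    len   = utf8_len_from_lead((unsigned char) *s);
    avail = (size_t) (send - s);
    return s + (len < avail ? len : avail);
}
```

An unclamped `s + len` on a truncated sequence would yield a pointer past `send`, which is the outcome the commit guards against.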
* dquote.c: Use memchr() instead of strchr() | Karl Williamson | 2017-11-06 | 1 | -18/+26
  This allows \x and \o to work properly in the face of embedded NULs. A limit parameter is added to each function, and that is passed to memchr (which replaces strchr). See the branch merge message for more information.
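The difference between the two library calls can be seen in a small sketch: `strchr()` stops at the first NUL, while `memchr()` searches exactly the number of bytes it is given. The helper `find_closing_brace` and its `send` limit parameter are hypothetical names for this example, not the dquote.c signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first '}' in [s, send), or
 * NULL if there is none.  Because memchr() is bounded by an explicit length,
 * an embedded NUL does not end the search, and bytes at or after 'send' are
 * never examined. */
static const char *
find_closing_brace(const char *s, const char *send)
{
    assert(s != NULL && send != NULL && s <= send);
    return (const char *) memchr(s, '}', (size_t) (send - s));
}
```

For the six bytes `{`, `4`, NUL, `1`, `}`, `x`, this finds the brace at offset 4, whereas `strchr()` on the same buffer gives up at the NUL at offset 2.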
* dquote.c: Rmv extraneous #ifdef; add assertions | Karl Williamson | 2017-10-24 | 1 | -12/+5
  assert() already does nothing unless -DDEBUGGING; no need to enclose them in #ifdef DEBUGGING. And this adds another assertion that is required to be true on entry to the function.
* Revert "Deprecating the use of C<< \cI<X> >> to specify a printable character." | Sawyer X | 2017-02-12 | 1 | -4/+4
  This reverts commit bfdc8cd3d5a81ab176f7d530d2e692897463c97d.
* Deprecating the use of C<< \cI<X> >> to specify a printable character. | Abigail | 2017-01-16 | 1 | -4/+4
  Starting in 5.14, we deprecated the use of "\cI<X>" when this results in a printable character. For instance, "\c:" is just a fancy way of writing "z". Starting in 5.28, this will be a fatal error. This also includes certain usage in regular expressions with the experimental (?[ ]) construct, or when "use re 'strict'" is in effect (also experimental).
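The "\c:" example follows from the usual ASCII rule for "\cX": uppercase X, then XOR with 64. The sketch below assumes an ASCII platform (EBCDIC is not handled), and `ctrl_of` is a hypothetical name rather than the function dquote.c uses.

```c
#include <assert.h>
#include <ctype.h>

/* ASCII-only sketch of the "\cX" mapping: uppercase X, then flip the bit
 * with value 64.  For letters this gives the familiar control characters;
 * for some punctuation it gives a printable character instead. */
static unsigned char
ctrl_of(unsigned char c)
{
    assert(c < 0x80);   /* the mapping is only meaningful for ASCII input */
    return (unsigned char) (toupper(c) ^ 64);
}
```

`ctrl_of('A')` is 1 (Control-A), while `ctrl_of(':')` is 0x3A ^ 0x40 = 0x7A, the printable letter `z`, which is the case the deprecation targets.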
* regcomp.c, toke.c: swap functions being inline static | Karl Williamson | 2016-02-18 | 1 | -21/+113
  grok_bslash_x() is so large that no compiler will inline it. Move it to dquote.c from dq_inline.c. Conversely, move form_octal_warning() to dq_inline.c. It is so tiny that the function call overhead is scarcely smaller than the function body. This also moves things in embed.fnc so all these functions are not visible outside the few files they are supposed to be used in.
* toke.c: Remove soon-to-be invalid assumption | Karl Williamson | 2015-11-25 | 1 | -4/+0
  The code in toke.c assumes that the UTF8 expansion of the string "\x{foo}" takes no more bytes than the original input text, which includes the 4 bytes of overhead "\x{}". Similarly for "\o{}". The functions that convert to the code point actually now assert for this. The next commit will make this assumption definitely invalid on EBCDIC platforms. Remove the assertions, and actually handle the case properly. The other places that call the conversion functions do not make this assumption, so there is no harm in removing them from there.

  Since we believe that this can't happen except on EBCDIC, we could #ifdef this code and use just an assert on non-EBCDIC. But it's easier to maintain if #ifdef's are minimized. Parsing is not a time-critical operation, like being in an inner loop, and the extra test gives a branch prediction hint to the compiler.
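Handling the case "properly" rather than asserting means comparing the encoded length with the length of the source text and letting the caller react when it does not fit. A minimal sketch for an ASCII platform follows; the helpers `utf8_bytes_for` and `expansion_fits` are hypothetical, and the limit to code points up to 0x10FFFF is a simplification of this example (the \x{} construct accepts larger values).

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes standard UTF-8 needs for a code point.  Restricted to
 * U+0000..U+10FFFF, which is an assumption of this sketch. */
static size_t
utf8_bytes_for(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL)    return 1;
    if (cp < 0x800UL)   return 2;
    if (cp < 0x10000UL) return 3;
    return 4;
}

/* Return nonzero when the UTF-8 form of 'cp' fits in the 'source_len' bytes
 * occupied by the escape in the input.  A caller would grow its output
 * buffer when this returns zero, instead of asserting that it never does. */
static int
expansion_fits(unsigned long cp, size_t source_len)
{
    return utf8_bytes_for(cp) <= source_len;
}
```

For example, `\x{E9}` occupies 6 bytes of input and encodes to 2 bytes, so it fits; the check exists for encodings where that relationship is not guaranteed.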
* Change to use UVCHR_SKIP over UNI_SKIP | Karl Williamson | 2015-09-04 | 1 | -1/+1
  UNI_SKIP is somewhat ambiguous. Perl has long used 'uvchr' as part of a name to mean the unsigned values using the native character set plus Unicode values for those above 255. This also changes two calls (one in dquote_static.c and one in dquote_inline.h) to use UVCHR_SKIP; they should not have been OFFUNI, as they are dealing with native values.
* dquote_static.c -> dquote.c | Jarkko Hietaniemi | 2015-07-22 | 1 | -0/+199
  Instead of #include-ing the C file, compile it normally.