author Karl Williamson <public@khwilliamson.com> 2014-02-18 12:59:26 -0700
committer Karl Williamson <public@khwilliamson.com> 2014-02-19 14:31:38 -0700
commit 63baef57e83f77e202ae14ef902a6615cf69c8a2
tree e9ab630270d9070ec495be7115931cb6a3551939
parent fdf73a7f7fb994c00e17a01f146018fcb3c47ffb
Make taint checking regex compile time instead of runtime

See discussion at https://rt.perl.org/Ticket/Display.html?id=120675

There are several unresolved items in this discussion, but we did agree
that tainting should be dependent only on the regex pattern, and not the
particular input string being matched against:

"The bottom line is we are moving to the policy that tainting is based
on the operation being in locale, without regard to the particular
operand's contents passed this time to the operation. This means simpler
core code and more consistent tainting results. And it lessens the
likelihood that there are paths in the core that should taint but don't"

This commit does the minimal work to change regex pattern matching to
determine tainting at pattern compilation time. Simply put, if a
pattern contains a regnode whose match/not match depends on the run-time
locale, any attempt to match against that pattern will taint, regardless
of the actual target string or runtime locale in effect. Given this
change, there are optimizations that can be made to avoid runtime work,
but these are deferred until later.

Note that just because a regular expression is compiled under locale
doesn't mean that the generated pattern will be tainted. It depends on
the actual pattern. For example, the pattern /(.)/ doesn't taint
because it will match exactly one character of the input, regardless of
locale settings.

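
As a rough illustration of the policy described above, the following standalone C sketch models a pattern as a list of nodes and decides once, at "compile" time, whether matches against it will taint. It is not the code from regcomp.c or pp_hot.c: the types, the node kinds other than the POSIXL name, the SKETCH_* flag and all function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds. Only the POSIXL name is taken from the
 * pp_hot.c comment; the real regnode set is much larger. */
typedef enum { NODE_ANY, NODE_EXACT, NODE_POSIXL } node_kind;

/* Hypothetical flag bit standing in for RXf_TAINTED_SEEN. */
#define SKETCH_TAINTED_SEEN 0x1u

typedef struct {
    const node_kind *nodes;  /* compiled program, one entry per node */
    size_t n_nodes;
    unsigned flags;          /* decided once, at compile time */
} sketch_regex;

/* Assumption: a node is locale-dependent when its match/no-match
 * result can change with the run-time locale. */
bool sketch_node_is_locale_dependent(node_kind k)
{
    return k == NODE_POSIXL;
}

/* "Compile" a pattern: set the taint flag if any node depends on the
 * locale. Neither the target string nor the current locale is consulted. */
sketch_regex sketch_compile(const node_kind *nodes, size_t n_nodes)
{
    sketch_regex rx = { nodes, n_nodes, 0u };
    for (size_t i = 0; i < n_nodes; i++) {
        if (sketch_node_is_locale_dependent(nodes[i])) {
            rx.flags |= SKETCH_TAINTED_SEEN;
            break;
        }
    }
    return rx;
}

/* Would a match against this pattern taint its results? The answer
 * depends only on the compiled pattern. */
bool sketch_match_taints(const sketch_regex *rx)
{
    return (rx->flags & SKETCH_TAINTED_SEEN) != 0;
}

/* Small self-check mirroring the /(.)/ example from the commit message. */
void sketch_self_check(void)
{
    const node_kind dot_only[] = { NODE_ANY };
    const node_kind with_posixl[] = { NODE_EXACT, NODE_POSIXL };
    sketch_regex a = sketch_compile(dot_only, 1);
    sketch_regex b = sketch_compile(with_posixl, 2);
    assert(!sketch_match_taints(&a));
    assert(sketch_match_taints(&b));
}
```

In this model a pattern made only of locale-independent nodes never taints, even if it was compiled under locale, while a pattern containing a single locale-dependent node taints every match.
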
Diffstat (limited to 'pp_hot.c')
-rw-r--r-- pp_hot.c 7
1 files changed, 2 insertions, 5 deletions
diff --git a/pp_hot.c b/pp_hot.c
index 79b77abb24..fb22b3897e 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1951,17 +1951,14 @@ While the pattern is being assembled/concatenated and then compiled,
PL_tainted will get set (via TAINT_set) if any component of the pattern
is tainted, e.g. /.*$tainted/. At the end of pattern compilation,
the RXf_TAINTED flag is set on the pattern if PL_tainted is set (via
-TAINT_get).
+TAINT_get). Also, if any component of the pattern matches based on
+locale-dependent behavior, the RXf_TAINTED_SEEN flag is set.
When the pattern is copied, e.g. $r = qr/..../, the SV holding the ref to
the pattern is marked as tainted. This means that subsequent usage, such
as /x$r/, will set PL_tainted using TAINT_set, and thus RXf_TAINTED,
on the new pattern too.
-At the start of execution of a pattern, the RXf_TAINTED_SEEN flag on the
-regex is cleared; during execution, locale-variant ops such as POSIXL may
-set RXf_TAINTED_SEEN.
-
RXf_TAINTED_SEEN is used post-execution by the get magic code
of $1 et al to indicate whether the returned value should be tainted.
It is the responsibility of the caller of the pattern (i.e. pp_match,