Fix uninitialized memory use when writing study data to file if no starting
byte set exists.

git-svn-id: svn://vcs.exim.org/pcre/code/trunk@787 2f5784b3-3f2a-0410-8824-cb99058d5e15
author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-12-06 15:37:24 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2011-12-06 15:37:24 +0000
commit: 475e97e3c2ef83094b3b2239b7cf4ffcc2c05f68 (patch)
tree: e29e71d1cf791f5a6bb5ad746bde31fb71bf8903
parent: fe230b59c018dd441d38ccc8eff23f35fd009a03 (diff)
download: pcre-475e97e3c2ef83094b3b2239b7cf4ffcc2c05f68.tar.gz
2 files changed, 62 insertions, 52 deletions
diff --git a/ChangeLog b/ChangeLog
index c75bcad..bdd2619 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -22,87 +22,92 @@ Version 8.21 05-Dec-2011
 
 6.  Lookbehinds such as (?<=a{2}b) that contained a fixed repetition were
     erroneously being rejected as "not fixed length" if PCRE_CASELESS was set.
-    This bug was probably introduced by change 9 of 8.13. 
-    
+    This bug was probably introduced by change 9 of 8.13.
+
 7.  While fixing 6 above, I noticed that a number of other items were being
-    incorrectly rejected as "not fixed length". This arose partly because newer 
+    incorrectly rejected as "not fixed length". This arose partly because newer
     opcodes had not been added to the fixed-length checking code. I have (a)
     corrected the bug and added tests for these items, and (b) arranged for an
     error to occur if an unknown opcode is encountered while checking for fixed
-    length instead of just assuming "not fixed length". The items that were 
-    rejected were: (*ACCEPT), (*COMMIT), (*FAIL), (*MARK), (*PRUNE), (*SKIP), 
-    (*THEN), \h, \H, \v, \V, and single character negative classes with fixed 
+    length instead of just assuming "not fixed length". The items that were
+    rejected were: (*ACCEPT), (*COMMIT), (*FAIL), (*MARK), (*PRUNE), (*SKIP),
+    (*THEN), \h, \H, \v, \V, and single character negative classes with fixed
     repetitions, e.g. [^a]{3}, with and without PCRE_CASELESS.
-    
+
 8.  A possessively repeated conditional subpattern such as (?(?=c)c|d)++ was
-    being incorrectly compiled and would have given unpredicatble results. 
-    
-9.  A possessively repeated subpattern with minimum repeat count greater than 
+    being incorrectly compiled and would have given unpredicatble results.
+
+9.  A possessively repeated subpattern with minimum repeat count greater than
     one behaved incorrectly. For example, (A){2,}+ behaved as if it was
-    (A)(A)++ which meant that, after a subsequent mismatch, backtracking into 
-    the first (A) could occur when it should not. 
-    
-10. Add a cast and remove a redundant test from the code. 
+    (A)(A)++ which meant that, after a subsequent mismatch, backtracking into
+    the first (A) could occur when it should not.
+
+10. Add a cast and remove a redundant test from the code.
 
 11. JIT should use pcre_malloc/pcre_free for allocation.
 
 12. Updated pcre-config so that it no longer shows -L/usr/lib, which seems
-    best practice nowadays, and helps with cross-compiling. (If the exec_prefix 
-    is anything other than /usr, -L is still shown). 
-    
+    best practice nowadays, and helps with cross-compiling. (If the exec_prefix
+    is anything other than /usr, -L is still shown).
+
 13. In non-UTF-8 mode, \C is now supported in lookbehinds and DFA matching.
 
 14. Perl does not support \N without a following name in a [] class; PCRE now
     also gives an error.
-    
+
 15. If a forward reference was repeated with an upper limit of around 2000,
-    it caused the error "internal error: overran compiling workspace". The 
+    it caused the error "internal error: overran compiling workspace". The
     maximum number of forward references (including repeats) was limited by the
-    internal workspace, and dependent on the LINK_SIZE. The code has been 
-    rewritten so that the workspace expands (via pcre_malloc) if necessary, and 
-    the default depends on LINK_SIZE. There is a new upper limit (for safety) 
-    of around 200,000 forward references. While doing this, I also speeded up 
-    the filling in of repeated forward references. 
-    
+    internal workspace, and dependent on the LINK_SIZE. The code has been
+    rewritten so that the workspace expands (via pcre_malloc) if necessary, and
+    the default depends on LINK_SIZE. There is a new upper limit (for safety)
+    of around 200,000 forward references. While doing this, I also speeded up
+    the filling in of repeated forward references.
+
 16. A repeated forward reference in a pattern such as (a)(?2){2}(.) was
     incorrectly expecting the subject to contain another "a" after the start.
-    
-17. When (*SKIP:name) is activated without a corresponding (*MARK:name) earlier 
-    in the match, the SKIP should be ignored. This was not happening; instead 
-    the SKIP was being treated as NOMATCH. For patterns such as 
-    /A(*MARK:A)A+(*SKIP:B)Z|AAC/ this meant that the AAC branch was never 
-    tested. 
-    
+
+17. When (*SKIP:name) is activated without a corresponding (*MARK:name) earlier
+    in the match, the SKIP should be ignored. This was not happening; instead
+    the SKIP was being treated as NOMATCH. For patterns such as
+    /A(*MARK:A)A+(*SKIP:B)Z|AAC/ this meant that the AAC branch was never
+    tested.
+
 18. The behaviour of (*MARK), (*PRUNE), and (*THEN) has been reworked and is
     now much more compatible with Perl, in particular in cases where the result
     is a non-match for a non-anchored pattern. For example, if
     /b(*:m)f|a(*:n)w/ is matched against "abc", the non-match returns the name
-    "m", where previously it did not return a name. A side effect of this 
-    change is that for partial matches, the last encountered mark name is 
+    "m", where previously it did not return a name. A side effect of this
+    change is that for partial matches, the last encountered mark name is
     returned, as for non matches. A number of tests that were previously not
     Perl-compatible have been moved into the Perl-compatible test files. The
     refactoring has had the pleasing side effect of removing one argument from
     the match() function, thus reducing its stack requirements.
-    
-19. If the /S+ option was used in pcretest to study a pattern using JIT, 
+
+19. If the /S+ option was used in pcretest to study a pattern using JIT,
     subsequent uses of /S (without +) incorrectly behaved like /S+.
-    
+
 21. Retrieve executable code size support for the JIT compiler and fixing
     some warnings.
-    
-22. A caseless match of a UTF-8 character whose other case uses fewer bytes did 
-    not work when the shorter character appeared right at the end of the 
+
+22. A caseless match of a UTF-8 character whose other case uses fewer bytes did
+    not work when the shorter character appeared right at the end of the
     subject string.
-    
-23. Added some (int) casts to non-JIT modules to reduce warnings on 64-bit 
-    systems. 
-    
-24. Added PCRE_INFO_JITSIZE to pass on the value from (21) above, and also 
-    output it when the /M option is used in pcretest. 
-    
+
+23. Added some (int) casts to non-JIT modules to reduce warnings on 64-bit
+    systems.
+
+24. Added PCRE_INFO_JITSIZE to pass on the value from (21) above, and also
+    output it when the /M option is used in pcretest.
+
 25. The CheckMan script was not being included in the distribution. Also, added
-    an explicit "perl" to run Perl scripts from the PrepareRelease script 
-    because this is reportedly needed in Windows. 
+    an explicit "perl" to run Perl scripts from the PrepareRelease script
+    because this is reportedly needed in Windows.
+    
+26. If study data was being saved in a file and studying had not found a set of
+    "starts with" bytes for the pattern, the data written to the file (though 
+    never used) was taken from uninitialized memory and so caused valgrind to
+    complain.  
 
 
 Version 8.20 21-Oct-2011
diff --git a/pcre_study.c b/pcre_study.c
index 9da92bf..20e064a 100644
--- a/pcre_study.c
+++ b/pcre_study.c
@@ -286,8 +286,8 @@ for (;;)
     cc++;
     break;
 
-    /* The single-byte matcher means we can't proceed in UTF-8 mode. (In 
-    non-UTF-8 mode \C will actually be turned into OP_ALLANY, so won't ever 
+    /* The single-byte matcher means we can't proceed in UTF-8 mode. (In
+    non-UTF-8 mode \C will actually be turned into OP_ALLANY, so won't ever
     appear, but leave the code, just in case.) */
 
     case OP_ANYBYTE:
@@ -1321,12 +1321,17 @@ if (bits_set || min > 0
 
   study->size = sizeof(pcre_study_data);
   study->flags = 0;
+  
+  /* Set the start bits always, to avoid unset memory errors if the
+  study data is written to a file, but set the flag only if any of the bits
+  are set, to save time looking when none are. */  
 
-  if (bits_set)
+  if (bits_set) 
     {
     study->flags |= PCRE_STUDY_MAPPED;
     memcpy(study->start_bits, start_bits, sizeof(start_bits));
     }
+  else memset(study->start_bits, 0, 32 * sizeof(uschar));
 
   /* Always set the minlength value in the block, because the JIT compiler
   makes use of it. However, don't set the bit unless the length is greater than
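
The pcre_study.c hunk above can be illustrated outside PCRE with a minimal sketch. The `demo_study_data` type, the `DEMO_STUDY_MAPPED` flag and the `demo_fill_study()` helper are hypothetical stand-ins invented for this example and do not reflect PCRE's real structure layout. It assumes only what the commit describes: a block containing a 32-byte start bitmap is later written out whole, so the bitmap is zeroed when no starting bytes were found, and the flag is set only when the bitmap carries information.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag and block layout, for illustration only. */
#define DEMO_STUDY_MAPPED 0x01u

typedef struct {
  uint32_t size;
  uint32_t flags;
  uint8_t  start_bits[32];   /* 256-bit "starts with" bitmap */
  uint32_t minlength;
} demo_study_data;

/* Fill every field of the block so that all of it is initialized before
   it is written to a file. start_bits may be NULL when no starting byte
   set was found. */
static void demo_fill_study(demo_study_data *study,
                            const uint8_t *start_bits,
                            uint32_t minlength)
{
  study->size = (uint32_t)sizeof(demo_study_data);
  study->flags = 0;

  if (start_bits != NULL)
    {
    /* A bitmap exists: advertise it and copy it in. */
    study->flags |= DEMO_STUDY_MAPPED;
    memcpy(study->start_bits, start_bits, sizeof(study->start_bits));
    }
  else
    {
    /* No bitmap: zero the field anyway so no unset bytes reach the file.
       The flag stays clear, so readers never consult these bytes. */
    memset(study->start_bits, 0, sizeof(study->start_bits));
    }

  study->minlength = minlength;
}
```

The sketch sizes the `memset` with `sizeof(study->start_bits)` rather than a literal 32, which is a choice made for this example and not something the commit does.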