PATCH: [perl #72998] regex looping

If a character folds to multiple characters under case-insensitive matching, a match against just one of those characters must not be reported, or the regular expression engine can loop. For example, \N{LATIN SMALL LIGATURE FF} folds to 'ff', so "\N{LATIN SMALL LIGATURE FF}" =~ /f+/i should match.

Prior to this patch, Perl_ibcmp_utf8() reported a match but left the string pointer at the beginning of the "\N{LATIN SMALL LIGATURE FF}", because it doesn't make sense to match just half a character, and at this level the function doesn't know about the '+'. That left things in an inconsistent state: a match was reported but the input pointer was unchanged, and the result was a loop.

I don't know how to fix this so that it matches correctly, and there are semantic issues with doing so. For example, if "\N{LATIN SMALL LIGATURE FF}" =~ /ff/i matches, then one would think that "\N{LATIN SMALL LIGATURE FF}" =~ /(f)(f)/i should match too. But $1 and $2 don't really make sense there, since each would refer to half of the same character.

So this patch just returns failure if only part of a character is matched. That leaves things consistent and stops the looping, so that Perl no longer hangs on such a construct, but it leaves the ultimate solution for another day.
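The failure mode described above can be reproduced in a small toy model. The following is only a sketch, not the code in utf8.c: the name toy_fold_match, the reject_partial flag, and the use of the ASCII byte '#' as a stand-in for the ligature (folding to "ff") are all assumptions made for illustration, and n1/n2 are modeled simply as the number of folded characters still pending on each side.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for \N{LATIN SMALL LIGATURE FF}: folds to "ff". */
#define TOY_LIGATURE '#'

/* Toy case fold: writes 1 or 2 folded characters to out, returns the count. */
static size_t toy_fold(char c, char out[2])
{
    if (c == TOY_LIGATURE) {
        out[0] = 'f';
        out[1] = 'f';
        return 2;
    }
    out[0] = (char)tolower((unsigned char)c);
    return 1;
}

/* Compare text against pat under the toy fold. The pattern must be fully
 * consumed; the text may be consumed only in part. On a match, *consumed
 * receives the number of whole text characters used. With reject_partial
 * set, a match that stops in the middle of a multi-character fold is
 * refused, which is the idea behind the n1 == 0 && n2 == 0 test. */
static bool toy_fold_match(const char *text, size_t tlen,
                           const char *pat, size_t plen,
                           bool reject_partial, size_t *consumed)
{
    char b1[2], b2[2];       /* fold buffers for each side */
    size_t n1 = 0, n2 = 0;   /* folded characters still pending */
    size_t q1 = 0, q2 = 0;   /* positions inside the fold buffers */
    size_t p1 = 0, p2 = 0;   /* positions in text and pattern */
    bool match;

    assert(consumed != NULL);

    while (p1 < tlen && p2 < plen) {
        if (n1 == 0) { n1 = toy_fold(text[p1], b1); q1 = 0; }
        if (n2 == 0) { n2 = toy_fold(pat[p2], b2);  q2 = 0; }

        while (n1 && n2) {
            if (b1[q1] != b2[q2])
                return false;
            q1++; q2++; n1--; n2--;
        }

        /* Only step over a source character once its whole fold is used. */
        if (n1 == 0) p1++;
        if (n2 == 0) p2++;
    }

    match = (p2 == plen);
    if (reject_partial)
        match = match && n1 == 0 && n2 == 0;
    if (match)
        *consumed = p1;
    return match;
}
```

In this model, matching the text "#" against the pattern "f" with reject_partial set to false reports success while consuming zero characters, which is the state that lets a quantifier such as '+' retry forever. With reject_partial set to true the same call fails, while "#" against "ff" still succeeds and consumes one character.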
author: Karl Williamson <khw@khw-desktop.(none)> 2010-04-13 21:25:36 -0600
committer: Rafael Garcia-Suarez <rgs@consttype.org> 2010-04-15 10:30:52 +0200
commit: 7dcb3b25fc4113f0eeb68d0d3c47ccedd5ff3f2a (patch)
tree: 32735be6d32f1eb0c931d46a202b2a555b8c4f13 /utf8.c
parent: cfbab81b96edaf7de871d0fa306f1723e15a56d7 (diff)
download: perl-7dcb3b25fc4113f0eeb68d0d3c47ccedd5ff3f2a.tar.gz
1 files changed, 2 insertions, 1 deletions
diff --git a/utf8.c b/utf8.c
index 9ed0663e19..1a6077c8d2 100644
--- a/utf8.c
+++ b/utf8.c
@@ -2609,7 +2609,8 @@ Perl_ibcmp_utf8(pTHX_ const char *s1, char **pe1, register UV l1, bool u1, const
 
      /* A match is defined by all the scans that specified
       * an explicit length reaching their final goals. */
-     match = (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2);
+     match = (n1 == 0 && n2 == 0    /* Must not match partial char; Bug #72998 */
+	     && (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2));
 
      if (match) {
 	  if (pe1)