is_hfs_dotgit: loosen over-eager match of \u{..47}

Our is_hfs_dotgit function relies on the hackily-implemented next_hfs_char to give us the next character that an HFS+ filename comparison would look at. It's hacky because it doesn't implement the full case-folding table of HFS+; it gives us just enough to see if the path matches ".git".

At the end of next_hfs_char, we use tolower() to convert our 32-bit code point to lowercase. Our tolower() implementation only takes an 8-bit char, though; it throws away the upper 24 bits. This means we can't have any false negatives for is_hfs_dotgit. We only care about matching 7-bit ASCII characters in ".git", and we will correctly process 'G' or 'g'.

However, we _can_ have false positives. Because we throw away the upper bits, code point \u{0147} (for example) will look like 'G' and get downcased to 'g'. It's not known whether a sequence of code points whose truncation ends up as ".git" is meaningful in any language, but it does not hurt to be more accurate here.

We can just pass out the full 32-bit code point, and compare it manually to the upper and lowercase characters we care about.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
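
The false positive described in the message can be reproduced in isolation. The following is a minimal sketch, not code from the patch: codepoint_t, matches_g_truncating, matches_g_full and self_check are hypothetical names, and the 8-bit truncation is modelled explicitly with a mask rather than through git's own tolower() implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-in for git's ucs_char_t: a 32-bit code point. */
typedef uint32_t codepoint_t;

/*
 * Models the pre-patch behavior: the code point is narrowed to 8 bits
 * before lowercasing, so the upper 24 bits are lost.
 */
static int matches_g_truncating(codepoint_t c)
{
	unsigned char low = (unsigned char)(c & 0xff);
	return tolower(low) == 'g';
}

/* Models the post-patch behavior: compare the full code point. */
static int matches_g_full(codepoint_t c)
{
	return c == 'g' || c == 'G';
}

/* Exercises both variants on ASCII input and on U+0147. */
void self_check(void)
{
	/* Plain ASCII is handled identically by both variants. */
	assert(matches_g_truncating('G') && matches_g_full('G'));
	assert(matches_g_truncating('g') && matches_g_full('g'));

	/* U+0147: the low byte is 0x47 ('G'), so truncation matches wrongly. */
	assert(matches_g_truncating(0x0147));
	assert(!matches_g_full(0x0147));
}
```

Both variants agree on every ASCII input, which is why the old code had no false negatives; they differ only for code points above 0xff whose low byte happens to be one of the letters in ".git".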
author: Jeff King <peff@peff.net> 2014-12-23 03:45:36 -0500
committer: Junio C Hamano <gitster@pobox.com> 2014-12-29 12:06:27 -0800
commit: 6aaf956b08cfab2dcaa1a1afe4192390d0ef14fd (patch)
tree: c7922e942a9ba4ee433446465d6ca418881c1b82 /utf8.c
parent: d08c13b947335cc48ecc1a8453d97b7147c2d6d6 (diff)
download: git-6aaf956b08cfab2dcaa1a1afe4192390d0ef14fd.tar.gz
1 files changed, 20 insertions, 12 deletions
diff --git a/utf8.c b/utf8.c
index 2c6442cc11..9c9fa3a757 100644
--- a/utf8.c
+++ b/utf8.c
@@ -630,8 +630,8 @@ int mbs_chrlen(const char **text, size_t *remainder_p, const char *encoding)
 }
 
 /*
- * Pick the next char from the stream, folding as an HFS+ filename comparison
- * would. Note that this is _not_ complete by any means. It's just enough
+ * Pick the next char from the stream, ignoring codepoints an HFS+ would.
+ * Note that this is _not_ complete by any means. It's just enough
  * to make is_hfs_dotgit() work, and should not be used otherwise.
  */
 static ucs_char_t next_hfs_char(const char **in)
@@ -668,12 +668,7 @@ static ucs_char_t next_hfs_char(const char **in)
 			continue;
 		}
 
-		/*
-		 * there's a great deal of other case-folding that occurs,
-		 * but this is enough to catch anything that will convert
-		 * to ".git"
-		 */
-		return tolower(out);
+		return out;
 	}
 }
 
@@ -681,10 +676,23 @@ int is_hfs_dotgit(const char *path)
 {
 	ucs_char_t c;
 
-	if (next_hfs_char(&path) != '.' ||
-	    next_hfs_char(&path) != 'g' ||
-	    next_hfs_char(&path) != 'i' ||
-	    next_hfs_char(&path) != 't')
+	c = next_hfs_char(&path);
+	if (c != '.')
+		return 0;
+	c = next_hfs_char(&path);
+
+	/*
+	 * there's a great deal of other case-folding that occurs
+	 * in HFS+, but this is enough to catch anything that will
+	 * convert to ".git"
+	 */
+	if (c != 'g' && c != 'G')
+		return 0;
+	c = next_hfs_char(&path);
+	if (c != 'i' && c != 'I')
+		return 0;
+	c = next_hfs_char(&path);
+	if (c != 't' && c != 'T')
 		return 0;
 	c = next_hfs_char(&path);
 	if (c && !is_dir_sep(c))