http: extract type/subtype portion of content-type

When we get a content-type from curl, we get the whole header line, including any parameters, and without any normalization (like downcasing or whitespace) applied. If we later try to match it with strcmp() or even strcasecmp(), we may get false negatives.

This could cause two visible behaviors:

  1. We might fail to recognize a smart-http server by its content-type.

  2. We might fail to relay text/plain error messages to users (especially if they contain a charset parameter).

This patch teaches the http code to extract and normalize just the type/subtype portion of the string. This is technically passing out less information to the callers, who can no longer see the parameters. But none of the current callers cares, and a future patch will add back an easier-to-use method for accessing those parameters.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Jeff King <peff@peff.net> 2014-05-22 05:29:47 -0400
committer: Junio C Hamano <gitster@pobox.com> 2014-05-27 09:57:00 -0700
commit: bf197fd7eebcb3579dd659af35822ce88adc66c8 (patch)
tree: 746ca7087ca1a5e48a30a20262e831ce2f83c426 /http.c
parent: dbcf2bd3dec1244fdbafb3ec7312ed14d83c0025 (diff)
download: git-bf197fd7eebcb3579dd659af35822ce88adc66c8.tar.gz
1 files changed, 35 insertions, 3 deletions
diff --git a/http.c b/http.c
index 94e1afdee7..6bfd0934b3 100644
--- a/http.c
+++ b/http.c
@@ -906,6 +906,35 @@ static CURLcode curlinfo_strbuf(CURL *curl, CURLINFO info, struct strbuf *buf)
 	return ret;
 }
 
+/*
+ * Extract a normalized version of the content type, with any
+ * spaces suppressed, all letters lowercased, and no trailing ";"
+ * or parameters.
+ *
+ * Note that we will silently remove even invalid whitespace. For
+ * example, "text / plain" is specifically forbidden by RFC 2616,
+ * but "text/plain" is the only reasonable output, and this keeps
+ * our code simple.
+ *
+ * Example:
+ *   "TEXT/PLAIN; charset=utf-8" -> "text/plain"
+ *   "text / plain" -> "text/plain"
+ */
+static void extract_content_type(struct strbuf *raw, struct strbuf *type)
+{
+	const char *p;
+
+	strbuf_reset(type);
+	strbuf_grow(type, raw->len);
+	for (p = raw->buf; *p; p++) {
+		if (isspace(*p))
+			continue;
+		if (*p == ';')
+			break;
+		strbuf_addch(type, tolower(*p));
+	}
+}
+
 /* http_request() targets */
 #define HTTP_REQUEST_STRBUF	0
 #define HTTP_REQUEST_FILE	1
@@ -957,9 +986,12 @@ static int http_request(const char *url,
 
 	ret = run_one_slot(slot, &results);
 
-	if (options && options->content_type)
-		curlinfo_strbuf(slot->curl, CURLINFO_CONTENT_TYPE,
-				options->content_type);
+	if (options && options->content_type) {
+		struct strbuf raw = STRBUF_INIT;
+		curlinfo_strbuf(slot->curl, CURLINFO_CONTENT_TYPE, &raw);
+		extract_content_type(&raw, options->content_type);
+		strbuf_release(&raw);
+	}
 
 	if (options && options->effective_url)
 		curlinfo_strbuf(slot->curl, CURLINFO_EFFECTIVE_URL,
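
For readers who want to try the normalization outside of git, here is a minimal standalone sketch of the same loop. It is not part of the patch: it uses a fixed-size caller buffer instead of struct strbuf, and the names normalize_content_type() and content_type_is() are hypothetical helpers invented for illustration. It also casts to unsigned char before calling the standard <ctype.h> functions, on the assumption that they are not replaced by locale-independent macros as they are inside git.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical standalone counterpart of extract_content_type():
 * copy the type/subtype of "raw" into "out", dropping all whitespace,
 * lowercasing letters, and stopping at the first ';'.
 * The result is truncated to fit "outsz" bytes and is always
 * NUL-terminated.
 */
static void normalize_content_type(const char *raw, char *out, size_t outsz)
{
	size_t n = 0;
	const char *p;

	assert(raw && out && outsz > 0);
	for (p = raw; *p && *p != ';'; p++) {
		unsigned char c = (unsigned char)*p;

		if (isspace(c))
			continue;
		if (n + 1 < outsz)
			out[n++] = (char)tolower(c);
	}
	out[n] = '\0';
}

/*
 * Return 1 if the normalized type/subtype of "raw" equals "want".
 * "want" is assumed to be lowercase already, with no whitespace.
 */
static int content_type_is(const char *raw, const char *want)
{
	char type[128];

	normalize_content_type(raw, type, sizeof(type));
	return !strcmp(type, want);
}
```

With this sketch, content_type_is("TEXT/PLAIN; charset=utf-8", "text/plain") returns 1, whereas calling strcasecmp() on the raw header value would report a mismatch because of the trailing parameter.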