BUG: http: tighten the list of allowed characters in a URI

The HTTP request parser used to consider any non-LWS char to be part
of the URI. Unfortunately, this allows control chars to be sent in
the URI, sometimes resulting in backend servers misbehaving, for
instance when they interpret \0 as an end of string and respond with
plain HTTP/0.9 without headers, which haproxy then blocks as an
invalid response.

RFC3986 clearly states the list of characters allowed in a URI; even
non-ASCII chars are not allowed. Unfortunately, after having run for
10 years with these chars allowed, we can't block them right now
without an optional workaround. So the first step consists in
blocking only control chars. A later patch will allow non-ASCII chars
only when an appropriate option is enabled on the frontend.

Control chars are 0..31 and 127, with the exception of 9, 10 and 13
(\t, \n, \r).
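
As a side note, the single-comparison test used below relies on
unsigned wrap-around: subtracting 33 maps the printable range 33..126
onto 0..93, so everything else (0..32, 127 and, for now, bytes >= 128)
falls above 93. A minimal standalone sketch of the idiom (the helper
name and test program are made up for illustration, they are not part
of the patch):

    #include <stdio.h>

    /* Returns non-zero for chars the parser keeps eating as URI
     * chars: printable ASCII 33..126. Space (32), control chars
     * (0..31, 127) and bytes >= 128 all fail this test and are
     * handled by the subsequent checks in the parser.
     */
    static int uri_char_ok(unsigned char c)
    {
            return (unsigned char)(c - 33) <= 93; /* 33..126 inclusive */
    }

    int main(void)
    {
            printf("%d %d %d %d\n",
                   uri_char_ok('a'),    /* 1: printable */
                   uri_char_ok(' '),    /* 0: SP delimits the URI */
                   uri_char_ok('\0'),   /* 0: control char */
                   uri_char_ok(0x7f));  /* 0: DEL */
            return 0;
    }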
diff --git a/src/proto_http.c b/src/proto_http.c
index c6cfb35..462126a 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -1449,7 +1449,7 @@
 
 	case HTTP_MSG_RQURI:
 	http_msg_rquri:
-		if (likely(!HTTP_IS_LWS(*ptr)))
+		if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 inclusive */
 			EAT_AND_JUMP_OR_RETURN(http_msg_rquri, HTTP_MSG_RQURI);
 
 		if (likely(HTTP_IS_SPHT(*ptr))) {
@@ -1457,8 +1457,21 @@
 			EAT_AND_JUMP_OR_RETURN(http_msg_rquri_sp, HTTP_MSG_RQURI_SP);
 		}
 
-		/* so it's a CR/LF, meaning an HTTP 0.9 request */
-		goto http_msg_req09_uri_e;
+		if (likely((unsigned char)*ptr >= 128)) {
+			/* FIXME: we should check whether we want to allow such
+			 * non-ASCII chars, but until now they were accepted.
+			 */
+			EAT_AND_JUMP_OR_RETURN(http_msg_rquri, HTTP_MSG_RQURI);
+		}
+
+		if (likely(HTTP_IS_CRLF(*ptr))) {
+			/* so it's a CR/LF, meaning an HTTP 0.9 request */
+			goto http_msg_req09_uri_e;
+		}
+
+		/* remaining chars are forbidden: 0..31 or 127 */
+		state = HTTP_MSG_ERROR;
+		break;
 
 	case HTTP_MSG_RQURI_SP:
 	http_msg_rquri_sp: