[MAJOR] implement parameter hashing for POST requests This patch extends the "url_param" load balancing method by introducing the "check_post" option. Using this option enables analysis of the beginning of POST requests to search for the specified URL parameter. The patch also fixes a few minor typos in comments that were discovered during code review.

commit: 1c2ab96be55d3e7d481ec1cc64970cdf6cf506b2 [log] [tgz]
author: matt.farnsworth@nokia.com <matt.farnsworth@nokia.com> Mon Apr 14 20:47:37 2008 +0200
committer: Willy Tarreau <w@1wt.eu> Tue Apr 15 15:30:41 2008 +0200
tree: dc10fac3f0cbdd6c9e740dce58ab377a1f3c5a52
parent: a532324128cd3591439489739a8fb8688ac1dea2 [diff]
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 7093c78..7073a02 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt

@@ -696,6 +696,7 @@
 
 
 balance <algorithm> [ <arguments> ]
+balance url_param <param> [check_post [<max_wait>]]
   Define the load balancing algorithm to be used in a backend.
   May be used in sections :   defaults | frontend | listen | backend
                                  yes   |    no    |   yes  |   yes
@@ -745,22 +746,47 @@
                   effect.
 
       url_param   The URL parameter specified in argument will be looked up in
-                  the query string of each HTTP request. If it is found
-                  followed by an equal sign ('=') and a value, then the value
-                  is hashed and divided by the total weight of the running
-                  servers. The result designates which server will receive the
-                  request. This is used to track user identifiers in requests
-                  and ensure that a same user ID will always be sent to the
-                  same server as long as no server goes up or down. If no value
-                  is found or if the parameter is not found, then a round robin
-                  algorithm is applied. Note that this algorithm may only be
-                  used in an HTTP backend. This algorithm is static, which
-                  means that changing a server's weight on the fly will have no
-                  effect.
+                  the query string of each HTTP GET request.
+
+                  If the modifier "check_post" is used, then an HTTP POST
+		  request entity will be searched for the parameter argument,
+		  when the question mark indicating a query string ('?') is not
+		  present in the URL. Optionally, specify a number of octets to
+		  wait for before attempting to search the message body. If the
+		  entity can not be searched, then round robin is used for each
+		  request. For instance, if your clients always send the LB
+		  parameter in the first 128 bytes, then specify that. The
+		  default is 48. The entity data will not be scanned until the
+		  required number of octets have arrived at the gateway, this
+		  is the minimum of: (default/max_wait, Content-Length or first
+		  chunk length). If Content-Length is missing or zero, it does
+		  not need to wait for more data than the client promised to
+		  send. When Content-Length is present and larger than
+		  <max_wait>, then waiting is limited to <max_wait> and it is
+		  assumed that this will be enough data to search for the
+		  presence of the parameter. In the unlikely event that
+		  Transfer-Encoding: chunked is used, only the first chunk is
+		  scanned. Parameter values separated by a chunk boundary, may
+		  be randomly balanced if at all.
+
+                  If the parameter is found followed by an equal sign ('=') and
+                  a value, then the value is hashed and divided by the total
+                  weight of the running servers. The result designates which
+                  server will receive the request.
+
+                  This is used to track user identifiers in requests and ensure
+                  that a same user ID will always be sent to the same server as
+                  long as no server goes up or down. If no value is found or if
+                  the parameter is not found, then a round robin algorithm is
+                  applied. Note that this algorithm may only be used in an HTTP
+                  backend. This algorithm is static, which means that changing a
+                  server's weight on the fly will have no effect.
 
     <arguments> is an optional list of arguments which may be needed by some
                 algorithms. Right now, only the "url_param" algorithm supports
-                a mandatory argument.
+                an optional argument.
+
+                balance url_param <param> [check_post [<max_wait>]]
 
   The definition of the load balancing algorithm is mandatory for a backend
   and limited to one per backend.
@@ -768,6 +794,39 @@
   Examples :
         balance roundrobin
         balance url_param userid
+        balance url_param session_id check_post 64
+
+  Note: the following caveats and limitations on using the "check_post"
+  extension with "url_param" must be considered :
+
+    - all POST requests are eligable for consideration, because there is no way
+      to determine if the parameters will be found in the body or entity which
+      may contain binary data. Therefore another method may be required to
+      restrict consideration of POST requests that have no URL parameters in
+      the body. (see acl reqideny http_end)
+
+    - using a <max_wait> value larger than the request buffer size does not
+      make sense and is useless. The buffer size is set at build time, and
+      defaults to 16 kB.
+
+    - Content-Encoding is not supported, the parameter search will probably
+      fail; and load balancing will fall back to Round Robin.
+
+    - Expect: 100-continue is not supported, load balancing will fall back to
+      Round Robin.
+
+    - Transfer-Encoding (RFC2616 3.6.1) is only supported in the first chunk.
+      If the entire parameter value is not present in the first chunk, the
+      selection of server is undefined (actually, defined by how little
+      actually appeared in the first chunk).
+
+    - This feature does not support generation of a 100, 411 or 501 response.
+
+    - In some cases, requesting "check_post" MAY attempt to scan the entire
+      contents of a message body. Scaning normally terminates when linear
+      white space or control characters are found, indicating the end of what
+      might be a URL parameter list. This is probably not a concern with SGML
+      type message bodies.
 
   See also : "dispatch", "cookie", "appsession", "transparent" and "http_proxy".
 

diff --git a/doc/haproxy-en.txt b/doc/haproxy-en.txt
index 20cadd8..503e021 100644
--- a/doc/haproxy-en.txt
+++ b/doc/haproxy-en.txt

@@ -999,6 +999,48 @@
 but may be able to look for a parameter passed in the URL. If the parameter is
 missing from the URL, then the 'round robin' method applies.
 
+A modifier may be added to specify that parameters in POST requests may be
+found in the messsage body if the URL lacks a '?' separator character.
+A wait limit may also be applied, if no limit is requested then
+the default value is 48 octets, the minimum is 3.  HAProxy may wait, until 48
+octets are received. If Content-Length is missing, or zero it need not
+wait for more data then the client promissed to send. When Content-Length is
+present, and more than <max_wait>; then waiting is limited and it is assumed this
+will be enough data to search for the presence of a parameter.  If
+Transfer-Encoding: chunked is used (unlikely), then the length of the first chunk
+is the maximum number of bytes to wait for.
+
+balance url_param <param> [check_post [<max_wait>]]
+
+Caveats for using the check_post extension:
+
+  - all POST requests are eligable for consideration, because there is
+    no way to determine if the parameters will be found in the body or
+    entity which may contain binary data. Therefore another method may be
+    required to restrict consideration of POST requests that have no URL
+    parameters in the body. (see acl reqideny http_end)
+
+Limitations on inspecting the entity body of a POST:
+
+  - Content-Encoding is not supported, the parameter search will probably fail;
+    and load balancing will fall back to Round Robin.
+
+  - Expect: 100-continue is not supported, load balancing will fall back to
+    Round Robin.
+
+  - Transfer-Encoding(RFC2616 3.6.1) is only supported in the first chunk. If
+    the entire parameter value is not present in the first chunk, the selection
+    of server is undefined (actually, defined by how little actually appeared in
+    the first chunk).
+
+  - This feature does not support generation of a 100, 411 or 501 response.
+
+  - In some cases, requesting check_post MAY attempt to scan the entire contents
+    of a message body.  Scaning normally terminates when linear white space or
+    control characters are found, indicating the end of what might be a URL parameter
+    list.  This is probably not a concern with SGML type message bodies.
+
+
 Example :
 ---------
 

diff --git a/include/proto/proto_http.h b/include/proto/proto_http.h
index 523471d..68b3f11 100644
--- a/include/proto/proto_http.h
+++ b/include/proto/proto_http.h

@@ -81,6 +81,9 @@
 void check_response_for_cacheability(struct session *t, struct buffer *rtr);
 int stats_check_uri_auth(struct session *t, struct proxy *backend);
 void init_proto_http();
+int http_find_header2(const char *name, int len,
+		      const char *sol, struct hdr_idx *idx,
+		      struct hdr_ctx *ctx);
 
 #endif /* _PROTO_PROTO_HTTP_H */
 

diff --git a/include/types/backend.h b/include/types/backend.h
index 2d62722..6d576b8 100644
--- a/include/types/backend.h
+++ b/include/types/backend.h

@@ -1,6 +1,6 @@
 /*
   include/types/backend.h
-  This file rassembles definitions for backends
+  This file assembles definitions for backends
 
   Copyright (C) 2000-2008 Willy Tarreau - w@1wt.eu
   
@@ -46,7 +46,7 @@
 
 /* various constants */
 
-/* The scale factor between user weight an effective weight allows smooth
+/* The scale factor between user weight and effective weight allows smooth
  * weight modulation even with small weights (eg: 1). It should not be too high
  * though because it limits the number of servers in FWRR mode in order to
  * prevent any integer overflow. The max number of servers per backend is

diff --git a/include/types/proto_http.h b/include/types/proto_http.h
index 50d12d6..885fbc6 100644
--- a/include/types/proto_http.h
+++ b/include/types/proto_http.h

@@ -29,29 +29,18 @@
 
 /*
  * FIXME: break this into HTTP state and TCP socket state.
- * See server.h for the other end.
- */
-
-/* different possible states for the client side */
-#define CL_STHEADERS	0
-#define CL_STDATA	1
-#define CL_STSHUTR	2
-#define CL_STSHUTW	3
-#define CL_STCLOSE	4
-
-/*
- * FIXME: break this into HTTP state and TCP socket state.
  * See client.h for the other end.
  */
 
 /* different possible states for the server side */
 #define SV_STIDLE	0
-#define SV_STCONN	1
-#define SV_STHEADERS	2
-#define SV_STDATA	3
-#define SV_STSHUTR	4
-#define SV_STSHUTW	5
-#define SV_STCLOSE	6
+#define SV_STANALYZE	1  /* this server state is set by the client to study the body for server assignment */
+#define SV_STCONN	2
+#define SV_STHEADERS	3
+#define SV_STDATA	4
+#define SV_STSHUTR	5
+#define SV_STSHUTW	6
+#define SV_STCLOSE	7
 
 /*
  * Transaction flags moved from session
@@ -204,27 +193,28 @@
  *                             which marks the end of the line (LF or CRLF).
  */
 struct http_msg {
-	unsigned int msg_state;         /* where we are in the current message parsing */
-	char *sol;                      /* start of line, also start of message when fully parsed */
-	char *eol;                      /* end of line */
-	unsigned int som;		/* Start Of Message, relative to buffer */
-	unsigned int col, sov;		/* current header: colon, start of value */
-	unsigned int eoh;		/* End Of Headers, relative to buffer */
-	char **cap;			/* array of captured headers (may be NULL) */
-	union {				/* useful start line pointers, relative to buffer */
+	unsigned int msg_state;                /* where we are in the current message parsing */
+	char *sol;                             /* start of line, also start of message when fully parsed */
+	char *eol;                             /* end of line */
+	unsigned int som;                      /* Start Of Message, relative to buffer */
+	unsigned int col, sov;                 /* current header: colon, start of value */
+	unsigned int eoh;                      /* End Of Headers, relative to buffer */
+	char **cap;                            /* array of captured headers (may be NULL) */
+	union {                                /* useful start line pointers, relative to buffer */
 		struct {
-			int l;		/* request line length (not including CR) */
-			int m_l;	/* METHOD length (method starts at ->som) */
-			int u, u_l;	/* URI, length */
-			int v, v_l;	/* VERSION, length */
-		} rq;			/* request line : field, length */
+			int l;                 /* request line length (not including CR) */
+			int m_l;               /* METHOD length (method starts at ->som) */
+			int u, u_l;            /* URI, length */
+			int v, v_l;            /* VERSION, length */
+		} rq;                          /* request line : field, length */
 		struct {
-			int l;		/* status line length (not including CR) */
-			int v_l;	/* VERSION length (version starts at ->som) */
-			int c, c_l;	/* CODE, length */
-			int r, r_l;	/* REASON, length */
-		} st;			/* status line : field, length */
-	} sl;				/* start line */
+			int l;                 /* status line length (not including CR) */
+			int v_l;               /* VERSION length (version starts at ->som) */
+			int c, c_l;            /* CODE, length */
+			int r, r_l;            /* REASON, length */
+		} st;                          /* status line : field, length */
+	} sl;                                  /* start line */
+	unsigned long long hdr_content_len;    /* cache for parsed header value */
 };
 
 /* This is an HTTP transaction. It contains both a request message and a

diff --git a/include/types/proxy.h b/include/types/proxy.h
index 98baf53..091be57 100644
--- a/include/types/proxy.h
+++ b/include/types/proxy.h

@@ -165,6 +165,7 @@
 	int  cookie_len;			/* strlen(cookie_name), computed only once */
 	char *url_param_name;			/* name of the URL parameter used for hashing */
 	int  url_param_len;			/* strlen(url_param_name), computed only once */
+	unsigned url_param_post_limit;		/* if checking POST body for URI parameter, max body to wait for */
 	char *appsession_name;			/* name of the cookie to look for */
 	int  appsession_name_len;		/* strlen(appsession_name), computed only once */
 	int  appsession_len;			/* length of the appsession cookie value to be used */

diff --git a/src/backend.c b/src/backend.c
index 436a4d9..38f81d2 100644
--- a/src/backend.c
+++ b/src/backend.c

@@ -16,6 +16,7 @@
 #include <stdlib.h>
 #include <syslog.h>
 #include <string.h>
+#include <ctype.h>
 
 #include <common/compat.h>
 #include <common/config.h>
@@ -1122,42 +1123,41 @@
  * are shared but cookies are not usable. If the parameter is not found, NULL
  * is returned. If any server is found, it will be returned. If no valid server
  * is found, NULL is returned.
- *
  */
 struct server *get_server_ph(struct proxy *px, const char *uri, int uri_len)
 {
 	unsigned long hash = 0;
-	char *p;
+	const char *p;
+	const char *params;
 	int plen;
 
+	/* when tot_weight is 0 then so is srv_count */
 	if (px->lbprm.tot_weight == 0)
 		return NULL;
 
+	if ((p = memchr(uri, '?', uri_len)) == NULL)
+		return NULL;
+
 	if (px->lbprm.map.state & PR_MAP_RECALC)
 		recalc_server_map(px);
 
-	p = memchr(uri, '?', uri_len);
-	if (!p)
-		return NULL;
 	p++;
 
 	uri_len -= (p - uri);
 	plen = px->url_param_len;
-
-	if (uri_len <= plen)
-		return NULL;
+	params = p;
 
 	while (uri_len > plen) {
 		/* Look for the parameter name followed by an equal symbol */
-		if (p[plen] == '=') {
-			/* skip the equal symbol */
-			uri = p;
-			p += plen + 1;
-			uri_len -= plen + 1;
-			if (memcmp(uri, px->url_param_name, plen) == 0) {
-				/* OK, we have the parameter here at <uri>, and
+		if (params[plen] == '=') {
+			if (memcmp(params, px->url_param_name, plen) == 0) {
+				/* OK, we have the parameter here at <params>, and
 				 * the value after the equal sign, at <p>
+				 * skip the equal symbol
 				 */
+				p += plen + 1;
+				uri_len -= plen + 1;
+
 				while (uri_len && *p != '&') {
 					hash = *p + (hash << 6) + (hash << 16) - hash;
 					uri_len--;
@@ -1165,19 +1165,117 @@
 				}
 				return px->lbprm.map.srv[hash % px->lbprm.tot_weight];
 			}
+		}
+		/* skip to next parameter */
+		p = memchr(params, '&', uri_len);
+		if (!p)
+			return NULL;
+		p++;
+		uri_len -= (p - params);
+		params = p;
+	}
+	return NULL;
+}
+
+/*
+ * this does the same as the previous server_ph, but check the body contents
+ */
+struct server *get_server_ph_post(struct session *s)
+{
+	unsigned long    hash = 0;
+	struct http_txn *txn  = &s->txn;
+	struct buffer   *req  = s->req;
+	struct http_msg *msg  = &txn->req;
+	struct proxy    *px   = s->be;
+	unsigned int     plen = px->url_param_len;
+
+	/* tot_weight appears to mean srv_count */
+	if (px->lbprm.tot_weight == 0)
+		return NULL;
+
+        unsigned long body = msg->sol[msg->eoh] == '\r' ? msg->eoh + 2 : msg->eoh + 1;
+        unsigned long len  = req->total - body;
+        const char *params = req->data + body;
+
+	if ( len == 0 )
+		return NULL;
+
+	if (px->lbprm.map.state & PR_MAP_RECALC)
+		recalc_server_map(px);
+
+	struct hdr_ctx ctx;
+	ctx.idx = 0;
+
+	/* if the message is chunked, we skip the chunk size, but use the value as len */
+	http_find_header2("Transfer-Encoding", 17, msg->sol, &txn->hdr_idx, &ctx);
+	if ( ctx.idx && strncasecmp(ctx.line+ctx.val,"chunked",ctx.vlen)==0) {
+		unsigned int chunk = 0;
+		while ( params < req->rlim && !HTTP_IS_CRLF(*params)) {
+			char c = *params;
+			if (ishex(c)) {
+				unsigned int hex = toupper(c) - '0';
+				if ( hex > 9 )
+					hex -= 'A' - '9' - 1;
+				chunk = (chunk << 4) | hex;
+			}
+			else
+				return NULL;
+			params++;
+			len--;
 		}
+		/* spec says we get CRLF */
+		if (HTTP_IS_CRLF(*params) && HTTP_IS_CRLF(params[1]))
+			params += 2;
+		else
+			return NULL;
+		/* ok we have some encoded length, just inspect the first chunk */
+		len = chunk;
+	}
 
+	const char   *p = params;
+
+	while (len > plen) {
+		/* Look for the parameter name followed by an equal symbol */
+		if (params[plen] == '=') {
+			if (memcmp(params, px->url_param_name, plen) == 0) {
+				/* OK, we have the parameter here at <params>, and
+				 * the value after the equal sign, at <p>
+				 * skip the equal symbol
+				 */
+				p += plen + 1;
+				len -= plen + 1;
+
+				while (len && *p != '&') {
+					if (unlikely(!HTTP_IS_TOKEN(*p))) {
+					/* if in a POST, body must be URI encoded or its not a URI.
+					 * Do not interprete any possible binary data as a parameter.
+					 */
+						if (likely(HTTP_IS_LWS(*p))) /* eol, uncertain uri len */
+							break;
+						return NULL;                 /* oh, no; this is not uri-encoded.
+									      * This body does not contain parameters.
+									      */
+					}
+					hash = *p + (hash << 6) + (hash << 16) - hash;
+					len--;
+					p++;
+					/* should we break if vlen exceeds limit? */
+				}
+				return px->lbprm.map.srv[hash % px->lbprm.tot_weight];
+			}
+		}
 		/* skip to next parameter */
-		uri = p;
-		p = memchr(uri, '&', uri_len);
+		p = memchr(params, '&', len);
 		if (!p)
 			return NULL;
 		p++;
-		uri_len -= (p - uri);
+		len -= (p - params);
+		params = p;
 	}
 	return NULL;
 }
 
+
 /*
  * This function marks the session as 'assigned' in direct or dispatch modes,
  * or tries to assign one in balance mode, according to the algorithm. It does
@@ -1254,9 +1352,15 @@
 				break;
 			case BE_LB_ALGO_PH:
 				/* URL Parameter hashing */
-				s->srv = get_server_ph(s->be,
-						       s->txn.req.sol + s->txn.req.sl.rq.u,
-						       s->txn.req.sl.rq.u_l);
+				if (s->txn.meth == HTTP_METH_POST &&
+                                    memchr(s->txn.req.sol + s->txn.req.sl.rq.u, '&',
+                                           s->txn.req.sl.rq.u_l ) == NULL)
+					s->srv = get_server_ph_post(s);
+				else
+					s->srv = get_server_ph(s->be,
+							       s->txn.req.sol + s->txn.req.sl.rq.u,
+							       s->txn.req.sl.rq.u_l);
+
 				if (!s->srv) {
 					/* parameter not found, fall back to round robin on the map */
 					s->srv = get_server_rr_with_conns(s->be, srvtoavoid);
@@ -1620,7 +1724,7 @@
 			return SN_ERR_RESOURCE;
 		}
 	}
-	
+
 	if ((connect(fd, (struct sockaddr *)&s->srv_addr, sizeof(s->srv_addr)) == -1) &&
 	    (errno != EINPROGRESS) && (errno != EALREADY) && (errno != EISCONN)) {
 
@@ -1879,6 +1983,21 @@
 			free(curproxy->url_param_name);
 		curproxy->url_param_name = strdup(args[1]);
 		curproxy->url_param_len = strlen(args[1]);
+		if ( *args[2] ) {
+			if (strcmp(args[2], "check_post")) {
+				snprintf(err, errlen, "'balance url_param' only accepts check_post modifier.");
+				return -1;
+			}
+			if (*args[3]) {
+				/* TODO: maybe issue a warning if there is no value, no digits or too long */
+				curproxy->url_param_post_limit = str2ui(args[3]);
+			}
+			/* if no limit, or faul value in args[3], then default to a moderate wordlen */
+			if (!curproxy->url_param_post_limit)
+				curproxy->url_param_post_limit = 48;
+			else if ( curproxy->url_param_post_limit < 3 )
+				curproxy->url_param_post_limit = 3; /* minimum example: S=3 or \r\nS=6& */
+		}
 	}
 	else {
 		snprintf(err, errlen, "'balance' only supports 'roundrobin', 'leastconn', 'source', 'uri' and 'url_param' options.");

diff --git a/src/client.c b/src/client.c
index bff5cd9..410c3f0 100644
--- a/src/client.c
+++ b/src/client.c

@@ -232,7 +232,8 @@
 
 		if (p->mode == PR_MODE_HTTP) {
 			txn->status = -1;
-
+			txn->req.hdr_content_len = 0LL;
+			txn->rsp.hdr_content_len = 0LL;
 			txn->req.msg_state = HTTP_MSG_RQBEFORE; /* at the very beginning of the request */
 			txn->rsp.msg_state = HTTP_MSG_RPBEFORE; /* at the very beginning of the response */
 			txn->req.sol = txn->req.eol = NULL;

diff --git a/src/ev_poll.c b/src/ev_poll.c
index 0166bd6..54cd138 100644
--- a/src/ev_poll.c
+++ b/src/ev_poll.c

@@ -102,8 +102,8 @@
 #define FDSETS_ARE_INT_ALIGNED
 #ifdef FDSETS_ARE_INT_ALIGNED
 
-#define WE_REALLY_NOW_THAT_FDSETS_ARE_INTS
-#ifdef WE_REALLY_NOW_THAT_FDSETS_ARE_INTS
+#define WE_REALLY_KNOW_THAT_FDSETS_ARE_INTS
+#ifdef WE_REALLY_KNOW_THAT_FDSETS_ARE_INTS
 				sr = (rn >> count) & 1;
 				sw = (wn >> count) & 1;
 #else

diff --git a/src/proto_http.c b/src/proto_http.c
index 3f8e0ac..2c07030 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c

@@ -287,7 +287,7 @@
 };
 
 /* It is about twice as fast on recent architectures to lookup a byte in a
- * table than two perform a boolean AND or OR between two tests. Refer to
+ * table than to perform a boolean AND or OR between two tests. Refer to
  * RFC2616 for those chars.
  */
 
@@ -2065,6 +2065,83 @@
 				goto return_bad_req;
 			t->flags |= SN_CONN_CLOSED;
 		}
+		/* Before we switch to data, was assignment set in manage_client_side_cookie?
+		 * If not assigned, perhaps we are balancing on url_param, but this is a
+		 * POST; and the parameters are in the body, maybe scan there to find our server.
+		 * (unless headers overflowed the buffer?)
+                 */
+		if (!(t->flags & (SN_ASSIGNED|SN_DIRECT)) &&
+		     t->txn.meth == HTTP_METH_POST && t->be->url_param_name != NULL &&
+		     t->be->url_param_post_limit != 0 && req->total < BUFSIZE &&
+		     memchr(msg->sol + msg->sl.rq.u, '?', msg->sl.rq.u_l) == NULL) {
+			/* are there enough bytes here? total == l || r || rlim ?
+			 * len is unsigned, but eoh is int,
+			 * how many bytes of body have we received?
+			 * eoh is the first empty line of the header
+			 */
+                        /* already established CRLF or LF at eoh, move to start of message, find message length in buffer */
+			unsigned long len = req->total - (msg->sol[msg->eoh] == '\r' ? msg->eoh + 2 : msg->eoh + 1);
+
+			/* If we have HTTP/1.1 and Expect: 100-continue, then abort.
+			 * We can't assume responsibility for the server's decision,
+			 * on this URI and header set. See rfc2616: 14.20, 8.2.3,
+			 * We also can't change our mind later, about which server to choose, so round robin.
+			 */
+			if ((likely(msg->sl.rq.v_l == 8) && req->data[msg->som + msg->sl.rq.v + 7] == '1')) {
+				struct hdr_ctx ctx;
+				ctx.idx = 0;
+				/* Expect is allowed in 1.1, look for it */
+				http_find_header2("Expect", 6, msg->sol, &txn->hdr_idx, &ctx);
+				if (ctx.idx != 0  &&
+                                    unlikely(ctx.vlen == 12 && strncasecmp(ctx.line+ctx.val,"100-continue",12)==0))
+					/* We can't reliablly stall and wait for data, because of
+					 * .NET clients that don't conform to rfc2616; so, no need for
+					 * the next block to check length expectations.
+                                         * We could send 100 status back to the client, but then we need to
+                                         * re-write headers, and send the message. And this isn't the right
+                                         * place for that action.
+                                         * TODO: support Expect elsewhere and delete this block.
+					 */
+					goto end_check_maybe_wait_for_body;
+			}
+			if ( likely(len > t->be->url_param_post_limit) ) {
+				/* nothing to do, we got enough */
+			} else {
+				/* limit implies we are supposed to need this many bytes
+				 * to find the parameter. Let's see how many bytes we can wait for.
+				 */
+				long long hint = len;
+				struct hdr_ctx ctx;
+				ctx.idx = 0;
+				http_find_header2("Transfer-Encoding", 17, msg->sol, &txn->hdr_idx, &ctx);
+				if (unlikely(ctx.idx && strncasecmp(ctx.line+ctx.val,"chunked",7)==0)) {
+					t->srv_state = SV_STANALYZE;
+				} else {
+					ctx.idx = 0;
+					http_find_header2("Content-Length", 14, msg->sol, &txn->hdr_idx, &ctx);
+					/* now if we have a length, we'll take the hint */
+					if ( ctx.idx ) {
+						/* We have Content-Length */
+						if ( strl2llrc(ctx.line+ctx.val,ctx.vlen, &hint) )
+							hint = 0;         /* parse failure, untrusted client */
+						else {
+							if ( hint > 0 )
+								msg->hdr_content_len = hint;
+							else
+								hint = 0; /* bad client, sent negative length */
+						}
+					}
+					/* but limited to what we care about, maybe we don't expect any entity data (hint == 0) */
+					if ( t->be->url_param_post_limit < hint )
+						hint = t->be->url_param_post_limit;
+					/* now do we really need to buffer more data? */
+					if ( len < hint )
+						t->srv_state = SV_STANALYZE;
+					/* else... There are no body bytes to wait for */
+				}
+			}
+		}
+        end_check_maybe_wait_for_body:
 
 		/*************************************************************
 		 * OK, that's finished for the headers. We have done what we *
@@ -2436,7 +2513,12 @@
 	//EV_FD_ISSET(t->srv_fd, DIR_RD), EV_FD_ISSET(t->srv_fd, DIR_WR)
 	//);
 	if (s == SV_STIDLE) {
-		if (c == CL_STHEADERS)
+		/* NOTE: The client processor may switch to SV_STANALYZE, which switches back SV_STIDLE.
+		 *       This is logcially after CL_STHEADERS completed, CL_STDATA has started, but
+		 *       we need to defer server selection until more data arrives, if possible.
+                 *       This is rare, and only if balancing on parameter hash with values in the entity of a POST
+                 */
+		if (c == CL_STHEADERS )
 			return 0;	/* stay in idle, waiting for data to reach the client side */
 		else if (c == CL_STCLOSE || c == CL_STSHUTW ||
 			 (c == CL_STSHUTR &&
@@ -3531,6 +3613,60 @@
 		}
 		return 0;
 	}
+	else if (s == SV_STANALYZE){
+		/* this server state is set by the client to study the body for server assignment */
+
+		/* Have we been through this long enough to timeout? */
+		if (!tv_isle(&req->rex, &now)) {
+			/* balance url_param check_post should have been the only to get into this.
+			 * just wait for data, check to compare how much
+			 */
+			struct http_msg * msg = &t->txn.req;
+                        unsigned long body = msg->sol[msg->eoh] == '\r' ? msg->eoh + 2 :msg->eoh + 1;
+                        unsigned long len  = req->total - body;
+			long long limit = t->be->url_param_post_limit;
+			struct hdr_ctx ctx;
+			ctx.idx = 0;
+			/* now if we have a length, we'll take the hint */
+			http_find_header2("Transfer-Encoding", 17, msg->sol, &txn->hdr_idx, &ctx);
+			if ( ctx.idx && strncasecmp(ctx.line+ctx.val,"chunked",ctx.vlen)==0) {
+				unsigned int chunk = 0;
+			        while ( body < req->total && !HTTP_IS_CRLF(msg->sol[body])) {
+					char c = msg->sol[body];
+					if (ishex(c)) {
+						unsigned int hex = toupper(c) - '0';
+						if ( hex > 9 )
+							hex -= 'A' - '9' - 1;
+						chunk = (chunk << 4) | hex;
+					}
+					else break;
+					body++;
+					len--;
+				}
+				if ( body == req->total )
+					return 0; /* end of buffer? data missing! */
+
+				if ( memcmp(msg->sol+body, "\r\n", 2) != 0 )
+					return 0; /* chunked encoding len ends with CRLF, and we don't have it yet */
+
+				/* if we support more then one chunk here, we have to do it again when assigning server
+				   1. how much entity data do we have? new var
+				   2. should save entity_start, entity_cursor, elen & rlen in req; so we don't repeat scanning here
+				   3. test if elen > limit, or set new limit to elen if 0 (end of entity found)
+				*/
+
+				if ( chunk < limit )
+					limit = chunk;                  /* only reading one chunk */
+			} else {
+				if ( msg->hdr_content_len < limit )
+					limit = msg->hdr_content_len;
+			}
+			if ( len < limit )
+				return 0;
+		}
+		t->srv_state=SV_STIDLE;
+		return 1;
+	}
 	else { /* SV_STCLOSE : nothing to do */
 		if ((global.mode & MODE_DEBUG) && (!(global.mode & MODE_QUIET) || (global.mode & MODE_VERBOSE))) {
 			int len;
@@ -3549,7 +3685,7 @@
  * called with s->cli_state == CL_STSHUTR. Right now, only statistics can be
  * produced. It stops by itself by unsetting the SN_SELF_GEN flag from the
  * session, which it uses to keep on being called when there is free space in
- * the buffer, of simply by letting an empty buffer upon return. It returns 1
+ * the buffer, or simply by letting an empty buffer upon return. It returns 1
  * if it changes the session state from CL_STSHUTR, otherwise 0.
  */
 int produce_content(struct session *s)
@@ -3640,7 +3776,7 @@
 				/* Swithing Proxy */
 				t->be = (struct proxy *) exp->replace;
 
-				/* right now, the backend switch is not too much complicated
+				/* right now, the backend switch is not overly complicated
 				 * because we have associated req_cap and rsp_cap to the
 				 * frontend, and the beconn will be updated later.
 				 */
commit	1c2ab96be55d3e7d481ec1cc64970cdf6cf506b2	[log] [tgz]
author	matt.farnsworth@nokia.com <matt.farnsworth@nokia.com>	Mon Apr 14 20:47:37 2008 +0200
committer	Willy Tarreau <w@1wt.eu>	Tue Apr 15 15:30:41 2008 +0200
tree	dc10fac3f0cbdd6c9e740dce58ab377a1f3c5a52
parent	a532324128cd3591439489739a8fb8688ac1dea2 [diff]