Diff - 3de5bd603c7d1e6af979c8067c313847262f6d78^! - haproxy

commit	3de5bd603c7d1e6af979c8067c313847262f6d78	[log] [tgz]
author	Willy Tarreau <w@1wt.eu>	Mon May 02 16:07:07 2016 +0200
committer	Willy Tarreau <w@1wt.eu>	Mon May 02 16:39:22 2016 +0200
tree	e7f19fccbc60e971520af091402f43ad51f1249e
parent	55e58f23348c2f9fa918e7370c0ddb98b307ee28 [diff] [blame]

BUG/MEDIUM: http: fix risk of CPU spikes with pipelined requests from dead client

Since client-side HTTP keep-alive was introduced in 1.4-dev, a good
number of corner cases had to be dealt with. One of them remained and
caused some occasional CPU spikes that Cyril Bonté had the opportunity
to observe from time to time and even recently to capture for deeper
analysis.

What happens is the following scenario :
  1) a client sends a first request which causes the server to respond
     using chunked encoding

  2) the server starts to respond with a large response that doesn't
     fit into a single buffer

  3) the response buffer fills up with the response

  4) the client reads the response while it is being sent by the server.

  5) haproxy eventually receives the end of the response from the server,
     which occupies the whole response buffer (must at least override the
     reserve), parses it and prepares to receive a second request while
     sending the last data blocks to the client. It then reinitializes the
     whole http_txn and calls http_wait_for_request(), which arms the
     http-request timeout.

  6) at this exact moment the client emits a second request, probably
     asking for an object referenced in the first page while parsing it
     on the fly, and silently disappears from the internet (internet
     access cut or software having trouble with pipelined request).

  7) the second request arrives into the request buffer and the response
     data stall in the response buffer, preventing the reserve from being
     used for anything else.

  8) haproxy calls http_wait_for_request() to parse the next request which
     has just arrived, but since it sees the response buffer is full, it
     refrains from doing so and waits for some data to leave or a timeout
     to strike.

  9) the http-request timeout strikes, causing http_wait_for_request() to
     be called again, and the function immediately returns since it cannot
     even produce an error message, and the loop is maintained here.

 10) the client timeout strikes, aborting the loop.

At first glance a check for timeout would be needed *before* considering
the buffer in http_wait_for_request(), but the issue is not there in fact,
because when doing so we see in the logs a client timeout error while
waiting for a request, which is wrong. The real issue is that we must not
consider the first transaction terminated until at least the reserve is
released so that http_wait_for_request() has no problem starting to process
a new request. This allows the first response to be reported in timeout and
not the second request. A more restrictive control could consist in not
considering the request complete until the response buffer has no more
outgoing data but this brings no added value and needlessly increases the
number of wake-ups when dealing with pipelining.

Note that the same issue exists with the request, we must ensure that any
POST data are finished being forwarded to the server before accepting a new
request. This case is much harder to trigger however as servers rarely
disappear and if they do so, they impact all their sessions at once.

No specific reproducer is usable because the bug is very hard to reproduce
and depends on the system as well with settings varying across reboots. On
a test machine, socket buffers were reduced to 4096 (tcp_rmem and tcp_wmem),
haproxy's buffers were set to 16384 and tune.maxrewrite to 12288. The proxy
must work in http-server-close mode, with a request timeout smaller than the
client timeout. The test is run like this :

  $ (printf "GET /15000 HTTP/1.1\r\n\r\n"; usleep 100000; \
     printf "GET / HTTP/1.0\r\n\r\n"; sleep 15) | nc6 --send-only 0 8002

The server returns chunks of the requested size (15000 bytes here, but
78000 in a previous test was the only working value). Strace must show the
last recvfrom() succeed and the last sendto() being shorter than desired or
better, not being called.

This fix must be backported to 1.6, 1.5 and 1.4. It was made in a way that
should make it possible to backport it. It depends on channel_congested()
which also needs to be backported. Further cleanup of http_wait_for_request()
could be made given that some of the tests are now useless.

diff --git a/src/proto_http.c b/src/proto_http.c
index a5334b5..21ad131 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c

@@ -5442,9 +5442,18 @@
 		  (txn->flags & TX_CON_WANT_MSK) == TX_CON_WANT_KAL)) {
 		/* server-close/keep-alive: terminate this transaction,
 		 * possibly killing the server connection and reinitialize
-		 * a fresh-new transaction.
+		 * a fresh-new transaction, but only once we're sure there's
+		 * enough room in the request and response buffer to process
+		 * another request. The request buffer must not hold any
+		 * pending output data and the request buffer must not have
+		 * output data occupying the reserve.
 		 */
-		http_end_txn_clean_session(s);
+		if (s->req.buf->o)
+			s->req.flags |= CF_WAKE_WRITE;
+		else if (channel_congested(&s->res))
+			s->res.flags |= CF_WAKE_WRITE;
+		else
+			http_end_txn_clean_session(s);
 	}
 
 	return txn->req.msg_state != old_req_state ||