61d39a0e2a047df78f7f3bfcf5584090913cdc65 - haproxy

commit	61d39a0e2a047df78f7f3bfcf5584090913cdc65	[log] [tgz]
author	Willy Tarreau <w@1wt.eu>	Thu Jul 18 21:49:32 2013 +0200
committer	Willy Tarreau <w@1wt.eu>	Mon Jul 22 09:31:55 2013 +0200
tree	7411233b7739a0ed5805374b0ed87ec6c6cc5b49
parent	a34bdc0ea402ea5be1e9d7f80eaddec772b94393 [diff]

BUG/MEDIUM: splicing: fix abnormal CPU usage with splicing

Mark Janssen reported an issue in 1.5-dev19 which was introduced
in 1.5-dev12 by commit 96199b10. From time to time, randomly, the
CPU usage spikes to 100% for seconds to minutes.

A deep analysis of the traces provided shows that it happens when
waiting for the response to a second pipelined HTTP request, or
when trying to handle the received shutdown advertised by epoll()
after the last block of data. Each time, splice() was involved with
data pending in the pipe.

The cause of this was that such events could not be taken into account
by splice nor by recv and were left pending :

  - the transfer of the last block of data, optionally with a shutdown
    was not handled by splice() because of the validation that to_forward
    is higher than MIN_SPLICE_FORWARD ;

  - the next recv() call was inhibited because of the test on presence
    of data in the pipe. This is also what prevented the recv() call
    from handling a response to a pipelined request until the client
    had ACKed the previous response.

No less than 4 different methods were experimented to fix this, and the
current one was finally chosen. The principle is that if an event is not
caught by splice(), then it MUST be caught by recv(). So we remove the
condition on the pipe's emptiness to perform an recv(), and in order to
prevent recv() from being used in the middle of a transfer, we mark
supposedly full pipes with CO_FL_WAIT_ROOM, which makes sense because
the reason for stopping a splice()-based receive is that the pipe is
supposed to be full.

The net effect is that we don't wake up and sleep in loops during these
transient states. This happened much more often than expected, sometimes
for a few cycles at end of transfers, but rarely long enough to be
noticed, unless a client timed out with data pending in the pipe. The
effect on CPU usage is visible even when transfering 1MB objects in
pipeline, where the CPU usage drops from 10 to 6% on a small machine at
medium bandwidth.

Some further improvements are needed :
  - the last chunk of a splice() transfer is never done using splice due
    to the test on to_forward. This is wrong and should be performed with
    splice if the pipe has not yet been emptied ;

  - si_chk_snd() should not be called when the write event is already being
    polled, otherwise we're almost certain to get EAGAIN.

Many thanks to Mark for all the traces he cared to provide, they were
essential for understanding this issue which was not reproducible
without.

Only 1.5-dev is affected, no backport is needed.

2 files changed

tree: 7411233b7739a0ed5805374b0ed87ec6c6cc5b49