BUG/MAJOR: connection: always recompute polling status upon I/O
Bryan Berry and Baptiste Assmann both reported some occasional CPU
spinning loops where haproxy was still processing I/O but burning
CPU for apparently uncaught events.
What happens is the following sequence :
- proxy is in TCP mode
- a connection from a client initiates a connection to a server
- the connection to the server does not immediately happen and is
polled for
- in the mean time, the client speaks and the stream interface
calls ->chk_snd() on the peer connection to send the new data
- chk_snd() calls send_loop() to send the data. This last one
makes the connection succeed and empties the buffer, so it
disables polling on the connection and on the FD by creating
an update entry.
- before the update is processed, poll() succeeds and reports
a write event for this fd. The poller does fd_ev_set() on the
FD to switch it to speculative mode
- the IO handler is called with a connection which has no write
flag but an FD which is enabled in speculative mode.
- the connection does nothing useful.
- conn_update_polling() at the end of conn_fd_handler() cannot
disable the FD because there were no changes on this FD.
- the handler is left with speculative polling still enabled on
the FD, and will be called over and over until a poll event is
needed to transfer data.
There is no perfectly elegant solution to this. At least we should
update the flags indicating the current polling status to reflect
what is being done at the FD level. This will allow to detect that
the FD needs to be disabled upon exit.
chk_snd() also needs minor changes to correctly switch to speculative
polling before calling send_loop(), and to reflect this in the connection
flags. This is needed so that no event remains stuck there without any
polling. In fact, chk_snd() and chk_rcv() should perform the same number
of preparations and cleanups as conn_fd_handler().
2 files changed