BUG/MEDIUM: h2: prevent orphaned streams from blocking a connection forever
Some h2 connections remaining in CLOSE_WAIT state forever have been
reported for a while. Thanks to detailed captures provided by Milan
Petruzelka, the sequence where this happens became clearer :
1) multiple streams compete for the mux and are queued in the send_list
2) at this point the mux has to emit a GOAWAY for any reason (for
example because it received a bad message)
3) the streams are woken up, notified about the error
4) h2_detach() is called for each of them
5) the CS they are detached from the H2S
6) since the streams are marked as blocked for some room, they are
orphaned and nothing more is done on them.
7) at this point, any activity on the connection goes through h2_wake()
which sees the conneciton in ERROR2 state, tries again to release
the streams, cannot, and stops polling (thus even connection errors
cannot be detected anymore).
=> from this point, no more events can be received on the connection, and
the streams remain orphaned forever.
This patch makes sure that we never return without doing anything once
an error was met. It has to act both on the h2_detach() side (for h2
streams being detached after the error was emitted) and on the h2_wake()
side (for errors reported after h2s have already been orphaned).
Many thanks to Milan Petruzelka and Janusz Dziemidowicz for their
awesome work on this issue, collecting traces and testing patches,
and to Olivier Doucet for extra testing and confirming the fix.
This fix must be backported to 1.8.
1 file changed