MEDIUM: global: Add a "close-spread-time" option to spread soft-stop on time window
The new 'close-spread-time' global option can be used to spread idle and
active HTTP connection closing after a SIGUSR1 signal is received. This
makes it possible to limit bursts of reconnections when too many idle
connections are closed at once. Indeed, without this new mechanism, in
case of soft-stop, all the idle connections would be closed at once
(after the grace period is over), and all active HTTP connections would
be closed by appending a "Connection: close" header to the next response
that goes over them (or via a GOAWAY frame in the case of HTTP/2).
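As an illustration, a minimal configuration sketch enabling the option
in the global section (the 30s value is only an example):

    global
        # spread soft-stop connection closings over a 30-second window
        close-spread-time 30s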
This patch adds support for this new option for HTTP as well as HTTP/2
connections. It works differently on active and idle connections.
On active connections, instead of systematically sending the GOAWAY
frame or adding the 'Connection: close' header as before once the
soft-stop has started, a random value based on the remainder of the
close window is calculated, and depending on its result we may decide to
keep the connection alive. The random value is recalculated for every
subsequent request/response on this connection, so the GOAWAY will still
end up being sent, but possibly only after a few more round trips. This
ensures that GOAWAYs are distributed over a longer time window than
before.
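To make that distribution concrete, here is a small standalone C sketch
of the decision, with rand() standing in for HAProxy's
statistical_prng_range() and made-up variable names; it illustrates the
technique and is not the actual mux code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Standalone illustration: decide whether to close an active
     * connection now, given how much of the close-spread window remains.
     * HAProxy uses statistical_prng_range(); plain rand() is used here.
     */
    static int should_send_close(int remaining_window, int spread_time)
    {
        /* The smaller the remainder, the more likely the random draw
         * reaches it, so the closing rate ramps up over the window.
         */
        return remaining_window <= rand() % spread_time;
    }

    int main(void)
    {
        int spread_time = 30000; /* 30s window, in milliseconds */

        srand((unsigned)time(NULL));
        for (int remaining = spread_time; remaining > 0; remaining -= 5000) {
            int closed = 0;
            for (int i = 0; i < 100000; i++)
                closed += should_send_close(remaining, spread_time);
            printf("remaining=%5dms -> ~%d%% closed\n", remaining, closed / 1000);
        }
        return 0;
    }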
On idle connections, a random factor is used when determining the expire
field of the connection's task, which should naturally spread connection
closings over the time window (see h2c_update_timeout).
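The same idea, sketched as a standalone C snippet (again with rand() in
place of statistical_prng_range() and simplified tick arithmetic):
picking each expire date uniformly within whatever remains of the window
spreads the wake-ups, and thus the closings, roughly evenly:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Standalone illustration: give each idle connection an expiration
     * date somewhere in [now, now + remaining_window) instead of closing
     * them all at once when the soft-stop starts.
     */
    int main(void)
    {
        unsigned int now_ms = 1000000;         /* pretend current tick */
        unsigned int remaining_window = 30000; /* 30s left in the window */

        srand((unsigned)time(NULL));
        for (int i = 0; i < 5; i++) {
            unsigned int expire = now_ms + rand() % remaining_window;
            printf("idle conn %d expires at tick %u\n", i, expire);
        }
        return 0;
    }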
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on "BUG/MEDIUM:
mux-h2: make use of http-request and keep-alive timeouts", which
refactored the timeout management of HTTP/2 connections.
diff --git a/src/mux_h1.c b/src/mux_h1.c
index 53660f3..ecdb381 100644
--- a/src/mux_h1.c
+++ b/src/mux_h1.c
@@ -591,6 +591,8 @@
/* Refresh the h1c task timeout if necessary */
static void h1_refresh_timeout(struct h1c *h1c)
{
+ int is_idle_conn = 0;
+
if (h1c->task) {
if (!(h1c->flags & H1C_F_ST_ALIVE) || (h1c->flags & H1C_F_ST_SHUTDOWN)) {
/* half-closed or dead connections : switch to clientfin/serverfin
@@ -599,6 +601,7 @@
*/
h1c->task->expire = tick_add(now_ms, h1c->shut_timeout);
TRACE_DEVEL("refreshing connection's timeout (dead or half-closed)", H1_EV_H1C_SEND|H1_EV_H1C_RECV, h1c->conn);
+ is_idle_conn = 1;
}
else if (b_data(&h1c->obuf)) {
/* connection with pending outgoing data, need a timeout (server or client). */
@@ -609,6 +612,10 @@
/* front connections waiting for a fully usable stream need a timeout. */
h1c->task->expire = tick_add(now_ms, h1c->timeout);
TRACE_DEVEL("refreshing connection's timeout (alive front h1c but not ready)", H1_EV_H1C_SEND|H1_EV_H1C_RECV, h1c->conn);
+ /* A frontend connection not yet ready could be treated the same way as an idle
+ * one in case of soft-close.
+ */
+ is_idle_conn = 1;
}
else {
/* alive back connections of front connections with a conn-stream attached */
@@ -618,6 +625,36 @@
/* Finally set the idle expiration date if shorter */
h1c->task->expire = tick_first(h1c->task->expire, h1c->idle_exp);
+
+ if ((h1c->px->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) &&
+ is_idle_conn && tick_isset(global.close_spread_end)) {
+ /* If a soft-stop is in progress and a close-spread-time
+ * is set, we want to spread idle connection closing roughly
+ * evenly across the defined window. This should only
+ * act on idle frontend connections.
+ * If the window end is already in the past, we wake the
+ * timeout task up immediately so that it can be closed.
+ */
+ int remaining_window = tick_remain(now_ms, global.close_spread_end);
+ if (remaining_window) {
+ /* We don't need to reset the expire if it would
+ * already happen before the close window end.
+ */
+ if (tick_is_le(global.close_spread_end, h1c->task->expire)) {
+ /* Set an expire value shorter than the current value
+ * because the close spread window end comes earlier.
+ */
+ h1c->task->expire = tick_add(now_ms, statistical_prng_range(remaining_window));
+ TRACE_DEVEL("connection timeout set to value before close-spread window end", H1_EV_H1C_SEND|H1_EV_H1C_RECV, h1c->conn);
+ }
+ }
+ else {
+ /* We are past the soft close window end, wake the timeout
+ * task up immediately.
+ */
+ task_wakeup(h1c->task, TASK_WOKEN_TIMER);
+ }
+ }
TRACE_DEVEL("new expiration date", H1_EV_H1C_SEND|H1_EV_H1C_RECV, h1c->conn, 0, 0, (size_t[]){h1c->task->expire});
task_queue(h1c->task);
}
@@ -3011,8 +3048,25 @@
if (!(h1c->flags & H1C_F_IS_BACK)) {
if (unlikely(h1c->px->flags & (PR_FL_DISABLED|PR_FL_STOPPED))) {
if (!(h1c->px->options & PR_O_IDLE_CLOSE_RESP) &&
- h1c->flags & H1C_F_WAIT_NEXT_REQ)
- goto release;
+ h1c->flags & H1C_F_WAIT_NEXT_REQ) {
+
+ int send_close = 1;
+ /* If a close-spread-time option is set, we want to avoid
+ * closing all the idle HTTP connections at once so we add a
+ * random factor that will spread the closing.
+ */
+ if (tick_isset(global.close_spread_end)) {
+ int remaining_window = tick_remain(now_ms, global.close_spread_end);
+ if (remaining_window) {
+ /* This should increase the closing rate the
+ * further along the window we are.
+ */
+ send_close = (remaining_window <= statistical_prng_range(global.close_spread_time));
+ }
+ }
+ if (send_close)
+ goto release;
+ }
}
}