MINOR: ssl: mark the SSL handshake tasklet as heavy
There's a fairness issue between SSL and clear text. A full end-to-end
cleartext connection can require up to ~7.7 wakeups on average, plus 3.3
for the SSL tasklet, one of which is particularly expensive. So if we
accept to process many handshakes taking 1ms each, we significantly
increase the processing time of regular tasks just by adding an extra
delay between their calls. Ideally in order to be fair we should have a
1:18 call ratio, but this requires a bit more accounting. With very little
effort we can mark the SSL handshake tasklet as TASK_HEAVY until the
handshake completes, and remove it once done.
Doing so reduces from 14 to 3.0 ms the total response time experienced
by HTTP clients running in parallel to 1000 SSL clients doing full
handshakes in loops. Better, when tune.sched.low-latency is set to "on",
the latency further drops to 1.8 ms.
The tasks latency distribution explain pretty well what is happening:
Without the patch:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
ssl_sock_io_cb 2785375 19.35m 416.9us 5.401h 6.980ms
h1_io_cb 1868949 9.853s 5.271us 4.829h 9.302ms
process_stream 1864066 7.582s 4.067us 2.058h 3.974ms
si_cs_io_cb 1733808 1.932s 1.114us 26.83m 928.5us
h1_timeout_task 935760 - - 1.033h 3.975ms
accept_queue_process 303606 4.627s 15.24us 16.65m 3.291ms
srv_cleanup_toremove_connections452 64.31ms 142.3us 2.447s 5.415ms
task_run_applet 47 5.149ms 109.6us 57.09ms 1.215ms
srv_cleanup_idle_connections 34 2.210ms 65.00us 87.49ms 2.573ms
With the patch:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
ssl_sock_io_cb 3000365 21.08m 421.6us 20.30h 24.36ms
h1_io_cb 2031932 9.278s 4.565us 46.70m 1.379ms
process_stream 2010682 7.391s 3.675us 22.83m 681.2us
si_cs_io_cb 1702070 1.571s 922.0ns 8.732m 307.8us
h1_timeout_task 1009594 - - 17.63m 1.048ms
accept_queue_process 339595 4.792s 14.11us 3.714m 656.2us
srv_cleanup_toremove_connections779 75.42ms 96.81us 438.3ms 562.6us
srv_cleanup_idle_connections 48 2.498ms 52.05us 178.1us 3.709us
task_run_applet 17 1.738ms 102.3us 11.29ms 663.9us
other 1 947.8us 947.8us 202.6us 202.6us
=> h1_io_cb() and process_stream() are divided by 6 while ssl_sock_io_cb() is
multipled by 4
And with low-latency on:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
ssl_sock_io_cb 3000565 20.96m 419.1us 20.74h 24.89ms
h1_io_cb 2019702 9.294s 4.601us 49.22m 1.462ms
process_stream 2009755 6.570s 3.269us 1.493m 44.57us
si_cs_io_cb 1997820 1.566s 783.0ns 2.985m 89.66us
h1_timeout_task 1009742 - - 1.647m 97.86us
accept_queue_process 494509 4.697s 9.498us 1.240m 150.4us
srv_cleanup_toremove_connections1120 92.32ms 82.43us 463.0ms 413.4us
srv_cleanup_idle_connections 70 2.703ms 38.61us 204.5us 2.921us
task_run_applet 13 1.303ms 100.3us 85.12us 6.548us
=> process_stream() is divided by 100 while ssl_sock_io_cb() is
multipled by 4
Interestingly, the total HTTPS response time doesn't increase and even very
slightly decreases, with an overall ~1% higher request rate. The net effect
here is a redistribution of the CPU resources between internal tasks, and
in the case of SSL, handshakes wait bit more but everything after completes
faster.
This was made simple enough to be backportable if it helps some users
suffering from high latencies in mixed traffic.
(cherry picked from commit 9205ab31d2383f33612e27d8ce74ab62a27695fc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 327335c..7aad499 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -4916,6 +4916,7 @@
}
ctx->wait_event.tasklet->process = ssl_sock_io_cb;
ctx->wait_event.tasklet->context = ctx;
+ ctx->wait_event.tasklet->state |= TASK_HEAVY; // assign it to the bulk queue during handshake
ctx->wait_event.events = 0;
ctx->sent_early_data = 0;
ctx->early_buf = BUF_NULL;
@@ -5513,8 +5514,13 @@
MT_LIST_DEL(&conn->list);
HA_SPIN_UNLOCK(OTHER_LOCK, &idle_conns[tid].takeover_lock);
/* First if we're doing an handshake, try that */
- if (ctx->conn->flags & CO_FL_SSL_WAIT_HS)
+ if (ctx->conn->flags & CO_FL_SSL_WAIT_HS) {
ssl_sock_handshake(ctx->conn, CO_FL_SSL_WAIT_HS);
+ if (!(ctx->conn->flags & CO_FL_SSL_WAIT_HS)) {
+ /* handshake completed, leave the bulk queue */
+ _HA_ATOMIC_AND(&tl->state, ~TASK_SELF_WAKING);
+ }
+ }
/* If we had an error, or the handshake is done and I/O is available,
* let the upper layer know.
* If no mux was set up yet, then call conn_create_mux()