BUG/MEDIUM: sched: allow a bit more TASK_HEAVY to be processed when needed
As reported in github issue #1881, there are situations where an excess
of TLS handshakes can cause a livelock. What's happening is that normally
we process at most one TLS handshake per loop iteration to keep latency
low. This is done by tagging them with TASK_HEAVY and queuing these
tasklets in the TL_HEAVY queue. But if something slows down the loop, such
as a connect() call when no more ports are available, we could end up
processing no more than a few hundred or a few thousand handshakes per
second. If this limit becomes lower than the rate of incoming handshakes,
we will
accumulate them and at some point users will get impatient and give up or
retry. Then a new problem happens: the queue fills up with even more
handshake attempts, only one of which will be handled per iteration, so
we can end up processing only outdated handshakes at a low rate, with
basically nothing else in the queue. This can for example happen in
parallel with health checks, which do not need incoming handshakes to
succeed and thus keep generating the kind of activity that maintains the
high loop latency.
Here we're taking a slightly different approach. First, instead of always
allowing only one handshake per loop (which is usually critical for
latency), we take the current situation into account (a small sketch
after the list below illustrates the resulting budgets):
- if configured with tune.sched.low-latency, the limit remains 1
- if there are other non-heavy tasks, we set the limit to 1 + one
per 1024 tasks, so that a heavily loaded queue of 4k handshakes
  per thread will be able to drain them at ~4 per loop with a
limited impact on latency
- if there are no other tasks, the limit grows to 1 + one per 128
tasks, so that a heavily loaded queue of 4k handshakes per thread
will be able to drain them at ~32 per loop with still a very
limited impact on latency since only I/O will get delayed.
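For illustration only, here is a small standalone C sketch of the budgets
these rules produce; the helper name heavy_budget() and the sample queue
size of 4096 tasks are made up for the example, only the formulas come
from the patch below:

    #include <stdio.h>

    /* Hypothetical helper mirroring the budget rules described above:
     * low-latency mode pins the budget to 1, otherwise it grows with the
     * number of queued tasks, and faster when only heavy tasks remain.
     */
    static unsigned heavy_budget(unsigned rq_total, int low_latency,
                                 int other_classes_active)
    {
        if (low_latency)
            return 1;
        if (other_classes_active)
            return 1 + rq_total / 1024;
        return 1 + rq_total / 128;
    }

    int main(void)
    {
        /* example: 4096 queued handshakes on one thread */
        printf("low-latency:      %u\n", heavy_budget(4096, 1, 0)); /* 1 */
        printf("mixed workload:   %u\n", heavy_budget(4096, 0, 1)); /* 5, i.e. ~4-5 per loop */
        printf("only heavy tasks: %u\n", heavy_budget(4096, 0, 0)); /* 33, i.e. ~32 per loop */
        return 0;
    }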
It was verified on a 56-core Xeon-8480 that this did not degrade
latency; all requests remained below 1ms end-to-end in full close +
handshake, and even below 500us under low-lat + busy-polling.
This must be backported to 2.4.
(cherry picked from commit ba4c7a15978deaf74b6af09d2a13b4fff7ccea74)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit e5713fb24194166e273ece9c58eddfad8ca39627)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a7e743d153a5cb65164478c5e9d835eb88422e54)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 8f054662ffb6227cfdaceb94a218152b233177e4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
diff --git a/src/task.c b/src/task.c
index 3f51321..3bd4a24 100644
--- a/src/task.c
+++ b/src/task.c
@@ -736,11 +736,26 @@
for (queue = 0; queue < TL_CLASSES; queue++)
max[queue] = ((unsigned)max_processed * max[queue] + max_total - 1) / max_total;
- /* The heavy queue must never process more than one task at once
- * anyway.
+ /* The heavy queue must never process more than very few tasks at once
+ * anyway. We set the limit to 1 if running on low_latency scheduling,
+ * given that we know that other values can have an impact on latency
+ * (~500us end-to-end connection achieved at 130kcps in SSL), 1 + one
+ * per 1024 tasks if there is at least one non-heavy task while still
+ * respecting the ratios above, or 1 + one per 128 tasks if only heavy
+ * tasks are present. This allows to drain excess SSL handshakes more
+ * efficiently if the queue becomes congested.
*/
- if (max[TL_HEAVY] > 1)
- max[TL_HEAVY] = 1;
+ if (max[TL_HEAVY] > 1) {
+ if (global.tune.options & GTUNE_SCHED_LOW_LATENCY)
+ budget = 1;
+ else if (tt->tl_class_mask & ~(1 << TL_HEAVY))
+ budget = 1 + tt->rq_total / 1024;
+ else
+ budget = 1 + tt->rq_total / 128;
+
+ if (max[TL_HEAVY] > budget)
+ max[TL_HEAVY] = budget;
+ }
lrq = grq = NULL;
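Side note on the class test used above: tt->tl_class_mask holds one bit
per tasklet class, so masking out the TL_HEAVY bit yields a non-zero
value exactly when some non-heavy class still has queued tasklets. A
minimal sketch of that bit trick (the class values are assumed here to
match those declared in include/haproxy/task-t.h):

    #include <stdio.h>

    /* class indexes as assumed from task-t.h; only TL_HEAVY's value matters here */
    enum { TL_URGENT = 0, TL_NORMAL = 1, TL_BULK = 2, TL_HEAVY = 3 };

    int main(void)
    {
        /* one bit per class with queued tasklets, as in tt->tl_class_mask */
        unsigned heavy_only = 1 << TL_HEAVY;
        unsigned mixed      = (1 << TL_HEAVY) | (1 << TL_NORMAL);

        /* non-zero when anything other than the heavy class is pending */
        printf("heavy only: other classes pending = %d\n",
               !!(heavy_only & ~(1 << TL_HEAVY))); /* 0 -> use the /128 rule */
        printf("mixed:      other classes pending = %d\n",
               !!(mixed & ~(1 << TL_HEAVY)));      /* 1 -> use the /1024 rule */
        return 0;
    }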