MINOR: task: limit the number of subsequent heavy tasks with flag TASK_HEAVY

While the scheduler is priority-aware and class-aware, and consistently
tries to maintain fairness between all classes, it doesn't make use of a
fine-grained execution budget to compensate for high-latency tasks such
as TLS handshakes. As a result, several such calls in a row can add
multiple milliseconds of latency between the various steps of other
tasklets that don't even depend on them.

An ideal solution would be to add a 4th queue, have all tasks announce
their estimated cost upfront and let the scheduler maintain an auto-
refilling budget to pick from the most suitable queue.
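
As an illustration only, such a picker could look like the sketch
below; the cost field, the BUDGET_REFILL constant and the pick_next()
function are hypothetical and do not exist in the tree:

    #include <stddef.h>

    /* Hypothetical sketch, not part of this patch: each task would
     * announce an estimated cost upfront, and an auto-refilling budget
     * would decide when to stop dequeuing and return to polling.
     */
    #define BUDGET_REFILL 1000      /* arbitrary cost units per round */

    struct cost_task {
        struct cost_task *next;
        int cost;                   /* estimated cost announced upfront */
    };

    static int budget;

    static struct cost_task *pick_next(struct cost_task *queues[], int nb)
    {
        int q;

        if (budget <= 0)
            budget += BUDGET_REFILL; /* refill once per full round */

        for (q = 0; q < nb; q++) {
            struct cost_task *t = queues[q];

            if (t && t->cost <= budget) {
                budget -= t->cost;
                queues[q] = t->next; /* dequeue from this class */
                return t;
            }
        }
        return NULL; /* budget exhausted: go back to polling */
    }
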
But it turns out that a very simplified version of this already provides
impressive gains with very tiny changes and could easily be backported.

The principle is to reserve a new task flag "TASK_HEAVY" that indicates
that a task is expected to take a lot of time without yielding (e.g. an
SSL handshake typically takes 700 microseconds of crypto computation).
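
A subsystem owning such a tasklet then only needs to set the flag
before waking it up. A minimal sketch, assuming "tl" points to the
subsystem's tasklet (the call site below is illustrative, not part of
this patch):

    /* Mark the tasklet as heavy so the scheduler throttles it; the
     * atomic OR matches how other task state flags are set.
     */
    HA_ATOMIC_OR(&tl->state, TASK_HEAVY);
    tasklet_wakeup(tl);
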
When the scheduler sees this flag while queuing a tasklet, it will
place it into the bulk queue. And during dequeuing, only one such task
is accepted per full round. This means that the first one will be
accepted and will not prevent other lower-priority tasks from running,
but if a second one arrives, the round stops there and goes back to
polling. This allows more important updates for other tasks to be
collected and batched before the next call to a heavy task.

Preliminary tests consisting of placing this flag on the SSL handshake
tasklet show that response times under SSL stress fell from 14 ms
before the patch to 3.0 ms with it, and even to 1.8 ms when
tune.sched.low-latency is set to "on".
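
For reference, that setting is enabled from the global section of the
configuration:

    global
        tune.sched.low-latency on
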
diff --git a/include/haproxy/task-t.h b/include/haproxy/task-t.h
index 34d9a45..c46cdcc 100644
--- a/include/haproxy/task-t.h
+++ b/include/haproxy/task-t.h
@@ -40,6 +40,7 @@
 #define TASK_SELF_WAKING  0x0010  /* task/tasklet found waking itself */
 #define TASK_KILLED       0x0020  /* task/tasklet killed, may now be freed */
 #define TASK_IN_LIST      0x0040  /* tasklet is in a tasklet list */
+#define TASK_HEAVY        0x0080  /* this task/tasklet is extremely heavy */
 
 #define TASK_WOKEN_INIT   0x0100  /* woken up for initialisation purposes */
 #define TASK_WOKEN_TIMER  0x0200  /* woken up because of expired timer */
diff --git a/src/task.c b/src/task.c
index a9a6cc1..1cb8352 100644
--- a/src/task.c
+++ b/src/task.c
@@ -114,7 +114,7 @@
 {
 	if (likely(thr < 0)) {
 		/* this tasklet runs on the caller thread */
-		if (tl->state & TASK_SELF_WAKING) {
+		if (tl->state & (TASK_SELF_WAKING|TASK_HEAVY)) {
 			LIST_ADDQ(&sched->tasklets[TL_BULK], &tl->list);
 			sched->tl_class_mask |= 1 << TL_BULK;
 		}
@@ -437,6 +437,7 @@
 	unsigned int done = 0;
 	unsigned int queue;
 	unsigned short state;
+	char heavy_calls = 0;
 	void *ctx;
 
 	for (queue = 0; queue < TL_CLASSES;) {
@@ -483,7 +484,20 @@
 		budgets[queue]--;
 		t = (struct task *)LIST_ELEM(tl_queues[queue].n, struct tasklet *, list);
-		state = t->state & (TASK_SHARED_WQ|TASK_SELF_WAKING|TASK_KILLED);
+		state = t->state & (TASK_SHARED_WQ|TASK_SELF_WAKING|TASK_HEAVY|TASK_KILLED);
+
+		if (state & TASK_HEAVY) {
+			/* This is a heavy task. We'll call no more than one
+			 * per function call. If we called one already, we'll
+			 * return and announce the max possible weight so that
+			 * the caller doesn't come back too soon.
+			 */
+			if (heavy_calls) {
+				done = INT_MAX; // 11ms instead of 3 without this
+				break; // too many heavy tasks processed already
+			}
+			heavy_calls = 1;
+		}
 
 		ti->flags &= ~TI_FL_STUCK; // this thread is still running
 		activity[tid].ctxsw++;