MEDIUM: tasks: also process late wakeups in process_runnable_tasks()
Since version 1.8, we've started to use tasks and tasklets more
extensively to defer I/O processing. Originally, with the simple
scheduler, a task waking another one up using task_wakeup() would
have caused it to be processed right after the list of runnable ones.
With the introduction of tasklets, we've started to spill running
tasks from the run queues to the tasklet queues, so if a task wakes
another one up, it will only be executed on the next call to
process_runnable_tasks(), which means after yet another round of the
polling loop.
This is particularly visible with I/Os hitting muxes: poll() reports
a read event, the connection layer performs a tasklet_wakeup() on the
mux subscribed to this I/O, and this mux in turn signals the upper
layer stream using task_wakeup(). The process goes back to poll() with
a null timeout since there's one active task, then back to checking all
possibly expired events, and finally back to process_runnable_tasks()
again. Worse, under high I/O activity, this pushes the task's execution
further away from the tasklet's, which both increases the total
processing latency and reduces the cache hit ratio.
This patch brings back the original spirit of process_runnable_tasks(),
which is to execute runnable tasks as long as the execution budget is
not exhausted. By doing so, we immediately cut in half the number of
calls to all functions called by run_poll_loop(), including poll().
Furthermore, calling poll() less often means purging FD updates less
often and offering more chances to merge them.
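The patch's restart logic can be illustrated with a toy chain in which each task wakes the next one up. In the patch, max_processed is decremented by what was run, and the code jumps back to not_done_yet while budget remains and thread_has_tasks() is true. This is a sketch under simplifying assumptions, not the real scheduler. A single counter stands for all queues, and toy_sched, toy_run_pass(), toy_process() and toy_rounds() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state (hypothetical): a chain where each task wakes the next one. */
struct toy_sched {
	int chain_left;  /* tasks of the chain not executed yet */
	int queued;      /* tasks currently runnable */
};

/* One pass: runs at most 'max' of the tasks queued on entry. Each executed
 * task wakes the next one of the chain; that late wakeup is only visible
 * to a later pass. Returns the number of tasks run. */
static int toy_run_pass(struct toy_sched *s, int max)
{
	int ready = s->queued < max ? s->queued : max;
	int done;

	s->queued -= ready;
	for (done = 0; done < ready; done++) {
		s->chain_left--;
		if (s->chain_left > 0)
			s->queued++;
	}
	return done;
}

/* One scheduler call with budget 'budget'. With 'loop' set, it goes back
 * while budget remains and tasks are queued, mirroring the patch's
 * "if (max_processed && thread_has_tasks()) goto not_done_yet;". */
static void toy_process(struct toy_sched *s, int budget, bool loop)
{
	int max_processed = budget;

	assert(budget > 0);
again:
	max_processed -= toy_run_pass(s, max_processed);
	if (loop && max_processed && s->queued)
		goto again;
}

/* Returns how many polling-loop rounds a chain of 'n' tasks needs. */
static int toy_rounds(int n, int budget, bool loop)
{
	struct toy_sched s = { .chain_left = n, .queued = n > 0 };
	int rounds = 0;

	while (s.queued) {
		rounds++;  /* stands for poll() + timers + one scheduler call */
		toy_process(&s, budget, loop);
	}
	return rounds;
}
```

For a chain of 4 tasks, the single-pass variant (loop false) needs 4 polling rounds. The looping variant needs 1 round with a budget of 200 and 2 rounds with a budget of 2. In this model, the budget rather than the wakeup horizon bounds the work done per call.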
This also has the nice effect of making tune.runqueue-depth effective
again, as it used to be quickly bounded by this artificial event
horizon, which prevented the remaining tasks from being executed. On
certain workloads, we can see a 2-3% performance increase.
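The per-call budget used in the hunks below is derived from tune.runqueue-depth. It is cut to a quarter, rounded up, when niced tasks are present. Each pass then runs at most a third of the remaining budget, also rounded up, as urgent tasklets. Because the patch computes the budget before the not_done_yet label, a single budget covers every restart within one call. The helper names below are hypothetical; only the two formulas come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-call budget as computed in the patch: the configured runqueue depth,
 * reduced to a quarter (rounded up) when niced tasks are present. */
static unsigned int toy_budget(unsigned int runqueue_depth, bool niced_tasks)
{
	unsigned int max_processed = runqueue_depth;

	assert(runqueue_depth > 0);
	if (niced_tasks)
		max_processed = (max_processed + 3) / 4;
	return max_processed;
}

/* Maximum number of urgent tasklets run in one pass: a third of the
 * remaining budget, rounded up. */
static unsigned int toy_urgent_share(unsigned int max_processed)
{
	return (max_processed + 2) / 3;
}
```

For example, with a depth of 200 the budget is 200 without niced tasks and 50 with them. A pass starting with a budget of 50 runs at most 17 urgent tasklets.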
diff --git a/src/task.c b/src/task.c
index 1a7f44d..500223f 100644
--- a/src/task.c
+++ b/src/task.c
@@ -427,10 +427,19 @@
ti->flags &= ~TI_FL_STUCK; // this thread is still running
+ tasks_run_queue_cur = tasks_run_queue; /* keep a copy for reporting */
+ nb_tasks_cur = nb_tasks;
+ max_processed = global.tune.runqueue_depth;
+
+ if (likely(niced_tasks))
+ max_processed = (max_processed + 3) / 4;
+
if (!thread_has_tasks()) {
activity[tid].empty_rq++;
return;
}
+
+ not_done_yet:
/* Merge the list of tasklets waken up by other threads to the
* main list.
*/
@@ -438,13 +447,6 @@
if (tmp_list)
LIST_SPLICE_END_DETACHED(&sched->tasklets[TL_URGENT], (struct list *)tmp_list);
- tasks_run_queue_cur = tasks_run_queue; /* keep a copy for reporting */
- nb_tasks_cur = nb_tasks;
- max_processed = global.tune.runqueue_depth;
-
- if (likely(niced_tasks))
- max_processed = (max_processed + 3) / 4;
-
/* run up to max_processed/3 urgent tasklets */
done = run_tasks_from_list(&tt->tasklets[TL_URGENT], (max_processed + 2) / 3);
max_processed -= done;
@@ -523,6 +525,10 @@
done = run_tasks_from_list(&tt->tasklets[TL_BULK], max_processed);
max_processed -= done;
+ /* some tasks may have woken other ones up */
+ if (max_processed && thread_has_tasks())
+ goto not_done_yet;
+
if (!LIST_ISEMPTY(&sched->tasklets[TL_URGENT]) |
!LIST_ISEMPTY(&sched->tasklets[TL_NORMAL]) |
!LIST_ISEMPTY(&sched->tasklets[TL_BULK]))