OPTIM: task: limit the impact of memory barriers in task_remove_from_task_list()
In this function we end up with successive locked operations followed by
a store barrier, and in addition the compiler has to emit less efficient
code because of a longer jump. There is no strict need to update the
tasks_run_queue counter before clearing the task's leaf pointer, so
let's swap the two operations and rely on a single barrier wherever
possible. This code is on the hot path, and the change yields about a
0.5% improvement with 8 threads.
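The reordering can be illustrated with a minimal C11 sketch. This is not
HAProxy code: struct sketch_task, remove_old() and remove_new() are
hypothetical names, and C11 <stdatomic.h> operations stand in for
HAProxy's HA_ATOMIC_STORE(), HA_ATOMIC_SUB() and __ha_barrier_store()
macros, whose exact per-architecture expansion is assumed rather than
reproduced here.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task: only the fields touched
 * by this patch are modelled. */
struct sketch_task {
    _Atomic(void *) leaf_p;  /* run-queue leaf pointer, NULL when not queued */
    int is_tasklet;          /* tasklets have no run-queue node */
};

/* Hypothetical global run-queue counter (stands in for tasks_run_queue). */
static atomic_uint tasks_run_queue;

/* Old ordering: atomic decrement first, then a plain (relaxed) store of
 * the leaf pointer followed by a separate release fence acting as the
 * store barrier. */
static void remove_old(struct sketch_task *t)
{
    atomic_fetch_sub(&tasks_run_queue, 1);
    if (!t->is_tasklet) {
        atomic_store_explicit(&t->leaf_p, NULL, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);
    }
}

/* New ordering: clear the leaf pointer first with an atomic (seq_cst)
 * store, then decrement the counter; no separate fence is issued. */
static void remove_new(struct sketch_task *t)
{
    if (!t->is_tasklet)
        atomic_store(&t->leaf_p, NULL);
    atomic_fetch_sub(&tasks_run_queue, 1);
}
```

Both versions leave the same final state (leaf_p cleared, counter
decremented by one); only the ordering and the number of explicit
barriers differ. Whether this saves actual instructions depends on the
target architecture and compiler.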
diff --git a/include/proto/task.h b/include/proto/task.h
index c90a369..f11b445 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -273,11 +273,9 @@
{
LIST_DEL_INIT(&((struct tasklet *)t)->list);
task_per_thread[tid].task_list_size--;
+ if (!TASK_IS_TASKLET(t))
+ HA_ATOMIC_STORE(&t->rq.node.leaf_p, NULL); // was 0x1
HA_ATOMIC_SUB(&tasks_run_queue, 1);
- if (!TASK_IS_TASKLET(t)) {
- t->rq.node.leaf_p = NULL; // was 0x1
- __ha_barrier_store();
- }
}
/*