BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once

There is a theorical problem in the wait queue, which is that with many
threads, one could spend a lot of time looping on the newly expired tasks,
causing a lot of contention on the global wq_lock and on the global
rq_lock. This initially sounds bening, but if another thread does just
a task_schedule() or task_queue(), it might end up waiting for a long
time on this lock, and this wait time will count on its execution budget,
degrading the end user's experience and possibly risking to trigger the
watchdog if that lasts too long.

The simplest (and backportable) solution here consists in bounding the
number of expired tasks that may be picked from the global wait queue at
once by a thread, given that all other ones will do it as well anyway.

We don't need to pick more than global.tune.runqueue_depth tasks at once
as we won't process more, so this counter is updated for both the local
and the global queues: threads with more local expired tasks will pick
less global tasks and conversely, keeping the load balanced between all
threads. This will guarantee a much lower latency if/when wakeup storms
happen (e.g. hundreds of thousands of synchronized health checks).

Note that some crashes have been witnessed with 1/4 of the threads in
wake_expired_tasks() and, while the issue might or might not be related,
not having reasonable bounds here definitely justifies why we can spend
so much time there.

This patch should be backported, probably as far as 2.0 (maybe with
some adaptations).

(cherry picked from commit 3cfaa8d1e075b5037b15775ec6b81ef16c4d81ce)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 563eede2418274f28a72a926362b11e21341c32f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit e45f17962d02c9029794b4952fed0e5984bc0396)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
diff --git a/src/task.c b/src/task.c
index c688c16..0dd060b 100644
--- a/src/task.c
+++ b/src/task.c
@@ -159,12 +159,13 @@
  */
 int wake_expired_tasks()
 {
+	int max_processed = global.tune.runqueue_depth;
 	struct task *task;
 	struct eb32_node *eb;
 	int ret = TICK_ETERNITY;
 	__decl_hathreads(int key);
 
-	while (1) {
+	while (max_processed-- > 0) {
   lookup_next_local:
 		eb = eb32_lookup_ge(&task_per_thread[tid].timers, now_ms - TIMER_LOOK_BACK);
 		if (!eb) {
@@ -235,6 +236,8 @@
 	while (1) {
 		HA_RWLOCK_WRLOCK(TASK_WQ_LOCK, &wq_lock);
   lookup_next:
+		if (max_processed-- <= 0)
+			break;
 		eb = eb32_lookup_ge(&timers, now_ms - TIMER_LOOK_BACK);
 		if (!eb) {
 			/* we might have reached the end of the tree, typically because