BUG/MAJOR: sched: prevent rare concurrent wakeup of multi-threaded tasks

Since the relaxation of the run-queue locks in 2.0 there has been a
very small but existing race between expired tasks and running tasks:
a task might be expiring and being woken up at the same time, on
different threads. This is protected against via the TASK_QUEUED and
TASK_RUNNING flags, but just after the task finishes executing, it
releases it TASK_RUNNING bit an only then it may go to task_queue().
This one will do nothing if the task's ->expire field is zero, but
if the field turns to zero between this test and the call to
__task_queue() then three things may happen:
  - the task may remain in the WQ until the 24 next days if it's in
    the future;
  - the task may prevent any other task after it from expiring during
    the 24 next days once it's queued
  - if DEBUG_STRICT is set on 2.4 and above, an abort may happen
  - since 2.2, if the task got killed in between, then we may
    even requeue a freed task, causing random behaviour next time
    it's found there, or possibly corrupting the tree if it gets
    reinserted later.

The peers code is one call path that easily reproduces the case with
the ->expire field being reset, because it starts by setting it to
TICK_ETERNITY as the first thing when entering the task handler. But
other code parts also use multi-threaded tasks and rightfully expect
to be able to touch their expire field without causing trouble. No
trivial code path was found that would destroy such a shared task at
runtime, which already limits the risks.

This must be backported to 2.0.

(cherry picked from commit 6c8babf6c4d8cb33ff0f5220177207610c22ffcb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 713184d12728b67154a92565b87daca244680154)
Signed-off-by: Willy Tarreau <w@1wt.eu>
2 files changed