BUG/MINOR: debug: fix a small race in the thread dumping code

If a thread dump is requested from a signal handler, the handler may
interrupt a thread that is already waiting for a dump to complete. It
may then see the threads_to_dump variable go to zero while other threads
are still waiting, steal the lock, and prevent those threads from ever
completing. This tends to happen when dumping many threads upon a
watchdog timeout, and it affects threads waiting for their turn.

Instead, we now proceed in two steps:
  1) the last dumped thread sets all bits again
  2) all threads only wait for their own bit to appear, then clear it
     and quit

This way there is no risk that a bit flips twice within the same loop,
and threads can no longer get stuck here.
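
The two steps above can be sketched as follows. This is only an
illustration, not the patched code: it assumes C11 atomics and pthreads
instead of HAProxy's own macros, it does not model signal delivery, and
apart from threads_to_dump every name (all_threads_mask, dumps_done,
dump_participant, run_dump_round, MAX_THREADS) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 8                  /* hypothetical limit for the sketch */

static atomic_ulong threads_to_dump;   /* one bit per thread still involved */
static unsigned long all_threads_mask; /* hypothetical: all participants */
static atomic_int dumps_done;          /* hypothetical: counts fake dumps */

static void *dump_participant(void *arg)
{
    unsigned long tid_bit = 1UL << (uintptr_t)arg;

    /* wait for our turn, i.e. until all lower bits are gone */
    while (atomic_load(&threads_to_dump) & (tid_bit - 1))
        sched_yield();

    /* the real code would dump this thread's state here */
    atomic_fetch_add(&dumps_done, 1);

    /* step 1: the last dumped thread sets all bits again, the others
     * only remove their own bit to let the next thread go.
     */
    if (atomic_load(&threads_to_dump) == tid_bit)
        atomic_store(&threads_to_dump, all_threads_mask);
    else
        atomic_fetch_and(&threads_to_dump, ~tid_bit);

    /* step 2: wait for our own bit to appear, then clear it and quit.
     * The mask only reaches zero once every thread has left this loop.
     */
    while (!(atomic_load(&threads_to_dump) & tid_bit))
        sched_yield();
    atomic_fetch_and(&threads_to_dump, ~tid_bit);
    return NULL;
}

/* Runs one dump round with <nthreads> threads and returns the final mask. */
unsigned long run_dump_round(int nthreads)
{
    pthread_t th[MAX_THREADS];
    int i;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    all_threads_mask = (1UL << nthreads) - 1;
    atomic_store(&dumps_done, 0);
    atomic_store(&threads_to_dump, all_threads_mask);

    for (i = 0; i < nthreads; i++)
        pthread_create(&th[i], NULL, dump_participant, (void *)(uintptr_t)i);
    for (i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);

    return atomic_load(&threads_to_dump);
}
```

In this sketch, run_dump_round(4) is expected to return 0 once all four
threads have dumped and left. Each bit is cleared once per phase and only
by its owner, so no waiter can observe the mask at zero while another
thread is still looping.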

This should be backported to 2.0 as it clarifies stack traces.

(cherry picked from commit c07736209db764fb2aef6f18ed3687a504c35771)
Signed-off-by: Willy Tarreau <w@1wt.eu>
1 file changed