MEDIUM: debug: make the thread dumper not rely on a thread mask anymore

The thread mask is too short to dump more than 64 bits. Thus here we're
using a different approach with two counters, one for the next thread ID
to dump (which always exists, as it's looked up), and the second one for
the number of threads done dumping. This allows to dump threads in ascending
order then to let them wait for all others to be done, then to leave without
the risk of an overlapping dump until the done count is null again.

This allows to remove threads_to_dump which was the last non-FD variable
using a global thread mask.
1 file changed