MINOR: threads/init: synchronize the threads startup

It's a bit dangerous to let threads initialize at different speeds on
startup: some are still in their init functions while others are already
running. This has already led to race conditions such as the one fixed
by commit 1605c7ae6 ("BUG/MEDIUM: threads/mworker: fix a race on
startup").

To secure all this, we take a very simple approach which uses half of
the rendez-vous point, which is made exactly for this purpose: before
creating the threads, we initialize the mask of the threads requesting
a rendez-vous to the mask of all threads, and each thread simply calls
thread_release() once its own init is complete. This guarantees that no
thread will go further than the initialization code until all threads
have finished initializing.
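
The principle can be sketched outside of haproxy as below. This is only
an illustration and not the code from this patch: startup_mask,
startup_release() and run_startup_demo() are hypothetical names standing
in for threads_want_rdv_mask and thread_release(), and the wait loop is
an assumption about how such a release may be implemented.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 16

/* hypothetical stand-in for threads_want_rdv_mask: one bit per thread
 * which has not finished its initialization yet.
 */
static atomic_ulong startup_mask;
static atomic_int init_done;      /* threads whose init is complete */
static atomic_int early_starters; /* threads which left the barrier too early */
static int expected;              /* total number of threads */

/* hypothetical stand-in for thread_release(): clear our own bit, then
 * wait until every other thread has cleared its bit as well.
 */
static void startup_release(unsigned long bit)
{
	atomic_fetch_and(&startup_mask, ~bit);
	while (atomic_load(&startup_mask) != 0)
		sched_yield();
}

static void *worker(void *arg)
{
	unsigned long id = (unsigned long)(uintptr_t)arg;
	unsigned long i;

	/* simulated per-thread init, slower for higher thread numbers */
	for (i = 0; i < id * 1000; i++)
		sched_yield();
	atomic_fetch_add(&init_done, 1);

	startup_release(1UL << id);

	/* past this point, all threads must have completed their init */
	if (atomic_load(&init_done) != expected)
		atomic_fetch_add(&early_starters, 1);
	return NULL;
}

/* returns the number of threads which started to run before all the
 * other ones were initialized, or -1 if a thread could not be created.
 */
int run_startup_demo(int nthreads)
{
	pthread_t th[MAX_THREADS];
	int i, created, ret;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	expected = nthreads;
	atomic_store(&init_done, 0);
	atomic_store(&early_starters, 0);

	/* the mask must be set before the first thread is created */
	atomic_store(&startup_mask, (1UL << nthreads) - 1);

	for (created = 0; created < nthreads; created++)
		if (pthread_create(&th[created], NULL, worker,
		                   (void *)(uintptr_t)created) != 0)
			break;

	/* on failure, clear the bits of the missing threads so that the
	 * already created ones do not wait forever.
	 */
	ret = 0;
	if (created < nthreads) {
		atomic_store(&startup_mask, 0);
		ret = -1;
	}

	for (i = 0; i < created; i++)
		pthread_join(th[i], NULL);

	return ret ? ret : atomic_load(&early_starters);
}
```

Since each thread only clears its bit after its init is done, a zero
mask implies that all inits are complete, which is why the mask has to
be set before the first pthread_create() call.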

This could even safely be backported if any other issue related to an
init race were discovered in a stable release.
diff --git a/src/haproxy.c b/src/haproxy.c
index dcb515d..2b3cf2e 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2500,6 +2500,10 @@
 	ha_set_tid((unsigned long)data);
 	tv_update_date(-1,-1);
 
+	/* per-thread init calls performed here are not allowed to snoop on
+	 * other threads, so they are free to initialize at their own rhythm
+	 * as long as they act as if they were alone.
+	 */
 	list_for_each_entry(ptif, &per_thread_init_list, list) {
 		if (!ptif->fct()) {
 			ha_alert("failed to initialize thread %u.\n", tid);
@@ -2513,6 +2517,11 @@
 		HA_SPIN_UNLOCK(START_LOCK, &start_lock);
 	}
 
+	/* broadcast that we are ready and wait for other threads to finish
+	 * their initialization.
+	 */
+	thread_release();
+
 	protocol_enable_all();
 	run_poll_loop();
 
@@ -3153,6 +3162,12 @@
 		sigdelset(&blocked_sig, SIGSEGV);
 		pthread_sigmask(SIG_SETMASK, &blocked_sig, &old_sig);
 
+		/* mark the fact that threads must wait for each other
+		 * during startup. Once initialized, they just have to
+		 * call thread_release().
+		 */
+		threads_want_rdv_mask = all_threads_mask;
+
 		/* Create nbthread-1 thread. The first thread is the current process */
 		threads[0] = pthread_self();
 		for (i = 1; i < global.nbthread; i++)