MEDIUM: init/threads: don't use spinlocks during the init phase

PiBa-NL found some pathological cases where starting threads can hinder
each other and cause a measurable slow down. This problem is reproducible
with the following config (haproxy must be built with -DDEBUG_DEV) :

    global
	stats socket /tmp/sock1 mode 666 level admin
	nbthread 64

    backend stopme
	timeout server  1s
	option tcp-check
	tcp-check send "debug dev exit\n"
	server cli unix@/tmp/sock1 check

This will cause the process to be stopped once the checks are ready to
start. Binding all these to just a few cores magnifies the problem.
Starting them in loops shows a significant time difference among the
commits :

  # before startup serialization
  $ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e186161 -db -f slow-init.cfg >/dev/null 2>&1; done

  real    0m1.581s
  user    0m0.621s
  sys     0m5.339s

  # after startup serialization
  $ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e4d7c9dd -db -f slow-init.cfg >/dev/null 2>&1; done

  real    0m2.366s
  user    0m0.894s
  sys     0m8.238s

In order to address this, let's use plain mutexes and cond_wait during
the init phase. With this done, waiting threads now sleep and the problem
completely disappeared :

  $ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy -db -f slow-init.cfg >/dev/null 2>&1; done

  real    0m0.161s
  user    0m0.079s
  sys     0m0.149s
diff --git a/src/haproxy.c b/src/haproxy.c
index a8898b7..c9370a3 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2562,6 +2562,9 @@
 	struct per_thread_init_fct   *ptif;
 	struct per_thread_deinit_fct *ptdf;
 	struct per_thread_free_fct   *ptff;
+	static int init_left = 0;
+	__decl_hathreads(static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER);
+	__decl_hathreads(static pthread_cond_t  init_cond  = PTHREAD_COND_INITIALIZER);
 
 	ha_set_tid((unsigned long)data);
 
@@ -2577,7 +2580,13 @@
 	 * after reallocating them locally. This will also ensure there is
 	 * no race on file descriptors allocation.
 	 */
-	thread_isolate();
+#ifdef USE_THREAD
+	pthread_mutex_lock(&init_mutex);
+#endif
+	/* The first thread must set the number of threads left */
+	if (!init_left)
+		init_left = global.nbthread;
+	init_left--;
 
 	tv_update_date(-1,-1);
 
@@ -2606,16 +2615,21 @@
 
 	/* enabling protocols will result in fd_insert() calls to be performed,
 	 * we want all threads to have already allocated their local fd tables
-	 * before doing so.
+	 * before doing so, thus only the last thread does it.
 	 */
-	thread_sync_release();
-	thread_isolate();
-
-	if (tid == 0)
+	if (init_left == 0)
 		protocol_enable_all();
 
-	/* done initializing this thread, don't start before others are done */
-	thread_sync_release();
+#ifdef USE_THREAD
+	pthread_cond_broadcast(&init_cond);
+	pthread_mutex_unlock(&init_mutex);
+
+	/* now wait for other threads to finish starting */
+	pthread_mutex_lock(&init_mutex);
+	while (init_left)
+		pthread_cond_wait(&init_cond, &init_mutex);
+	pthread_mutex_unlock(&init_mutex);
+#endif
 
 	run_poll_loop();