MINOR: listener: refine the default MAX_ACCEPT from 64 to 4

The maximum number of connections accepted at once by a thread for a single
listener used to default to 64 divided by the number of processes, but the
tasklet-based model is much more scalable and benefits from smaller values.
Experimentation has shown that 4 gives the highest accept rate for all
thread counts tested, with 3 and 5 coming very close, as shown below
(HTTP/1 connections forwarded per second at multi-accept 4 and 64):

 maxaccept \ threads |     1     2     4     8    16
 --------------------+------------------------------
                   4 |   80k  106k  168k  270k  336k
                  64 |   63k   89k  145k  230k  274k

Some tests were also run over SSL, and no change at all was observed.

The value was moved into a define, MAX_ACCEPT in include/haproxy/defaults.h,
because the hard-coded 64 used to be spread all over the code.

It might be useful at some point to backport this to 2.3 and 2.2 to help
users who have observed performance regressions compared to 1.6.
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 65aabd8..6b2eea2 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -2403,14 +2403,15 @@
 tune.maxaccept <number>
   Sets the maximum number of consecutive connections a process may accept in a
   row before switching to other work. In single process mode, higher numbers
-  give better performance at high connection rates. However in multi-process
-  modes, keeping a bit of fairness between processes generally is better to
-  increase performance. This value applies individually to each listener, so
-  that the number of processes a listener is bound to is taken into account.
-  This value defaults to 64. In multi-process mode, it is divided by twice
-  the number of processes the listener is bound to. Setting this value to -1
-  completely disables the limitation. It should normally not be needed to tweak
-  this value.
+  used to give better performance at high connection rates, though this is not
+  the case anymore with the multi-queue. This value applies individually to
+  each listener, so that the number of processes a listener is bound to is
+  taken into account. This value defaults to 4, which showed the best results.
+  If a significantly higher value was inherited from an old configuration, it
+  may be worth removing it, as this will both increase performance and lower
+  response time. In multi-process mode, it is divided by twice the number of
+  processes the listener is bound to. Setting this value to -1 completely
+  disables the limitation. There should normally be no need to tweak this value.
 
 tune.maxpollevents <number>
   Sets the maximum amount of events that can be processed at once in a call to
diff --git a/include/haproxy/defaults.h b/include/haproxy/defaults.h
index b023827..3a87c1b 100644
--- a/include/haproxy/defaults.h
+++ b/include/haproxy/defaults.h
@@ -170,6 +170,22 @@
 #define MAX_POLL_EVENTS 200
 #endif
 
+// The maximum number of connections accepted at once by a thread for a single
+// listener. It used to default to 64 divided by the number of processes but
+// the tasklet-based model is much more scalable and benefits from smaller
+// values. Experimentation has shown that 4 gives the highest accept rate for
+// all thread counts, and that 3 and 5 come very close, as shown below (HTTP/1
+// connections forwarded per second at multi-accept 4 and 64):
+//
+// ac\thr|     1     2     4     8    16
+// ------+------------------------------
+//      4|   80k  106k  168k  270k  336k
+//     64|   63k   89k  145k  230k  274k
+//
+#ifndef MAX_ACCEPT
+#define MAX_ACCEPT 4
+#endif
+
 // the max number of tasks to run at once. Tests have shown the following
 // number of requests/s for 1 to 16 threads (1c1t, 1c2t, 2c4t, 4c8t, 4c16t):
 //
diff --git a/src/cfgparse.c b/src/cfgparse.c
index 8ceecf2..317ef2a 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -3431,7 +3431,7 @@
 			if (curproxy->options & PR_O_TCP_NOLING)
 				listener->options |= LI_O_NOLINGER;
 			if (!listener->maxaccept)
-				listener->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+				listener->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 
 			/* we want to have an optimal behaviour on single process mode to
 			 * maximize the work at once, but in multi-process we want to keep
diff --git a/src/listener.c b/src/listener.c
index 9ca910b..48ba8e0 100644
--- a/src/listener.c
+++ b/src/listener.c
@@ -131,7 +131,7 @@
 	/* if global.tune.maxaccept is -1, then max_accept is UINT_MAX. It
 	 * is not really illimited, but it is probably enough.
 	 */
-	max_accept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+	max_accept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 	for (; max_accept; max_accept--) {
 		conn = accept_queue_pop_sc(ring);
 		if (!conn)
diff --git a/src/log.c b/src/log.c
index c24fdbd..bc586e1 100644
--- a/src/log.c
+++ b/src/log.c
@@ -3923,7 +3923,7 @@
 			}
 		}
 		list_for_each_entry(l, &bind_conf->listeners, by_bind) {
-			l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+			l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 			l->accept = session_accept_fd;
 			l->analysers |=  cfg_log_forward->fe_req_ana;
 			l->default_target = cfg_log_forward->default_target;
@@ -3991,7 +3991,7 @@
 		}
 		list_for_each_entry(l, &bind_conf->listeners, by_bind) {
 			/* the fact that the sockets are of type dgram is guaranteed by str2receiver() */
-			l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+			l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 			l->rx.iocb   = syslog_fd_handler;
 			global.maxsock++;
 		}