BUG/MEDIUM: fd: always align fdtab[] to 64 bytes

There's a risk that fdtab is not 64-byte aligned. The first effect is that
entries may straddle cache lines, causing false sharing and contention
when adjacent FDs are used by different threads. The second is related
to what is explained in commit "BUG/MAJOR: compiler: relax alignment
constraints on certain structures", i.e. that modern compilers might
make use of aligned vector operations to zero some entries, and would
crash. We do not use memset() or anything similar on fdtab, so the risk is
almost nonexistent, but that's no reason to violate valid assumptions.
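
The first effect can be illustrated with a small sketch. It assumes that
entries are exactly 64 bytes long (ENTRY_SIZE is an assumption made for the
illustration, not a value taken from this patch), and the helper names
cache_line_of() and neighbours_share_line() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64
#define ENTRY_SIZE 64  /* assumed entry size, for illustration only */

/* an aligned table avoids sharing only if entries fill whole cache lines */
static_assert(ENTRY_SIZE % CACHE_LINE == 0, "entry must be a multiple of a cache line");

/* Returns the index of the cache line containing byte address <addr>. */
uintptr_t cache_line_of(uintptr_t addr)
{
	return addr / CACHE_LINE;
}

/* Returns non-zero if entries <fd> and <fd>+1 of a table starting at
 * <base> have a cache line in common, i.e. if two threads working on
 * these two FDs would contend for the same line.
 */
int neighbours_share_line(uintptr_t base, size_t fd)
{
	uintptr_t last_of_fd    = base + (fd + 1) * ENTRY_SIZE - 1;
	uintptr_t first_of_next = base + (fd + 1) * ENTRY_SIZE;

	return cache_line_of(last_of_fd) == cache_line_of(first_of_next);
}
```

With a base of 0x1000 (64-byte aligned) no two neighbours share a line,
while with a base of 0x1010 (only 16-byte aligned) every entry shares a
line with the next one.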

This patch addresses this by allocating 64 extra bytes and aligning the
structure manually (this is an extremely cheap solution for this specific
case). The original address is stored in a new variable "fdtab_addr" and
is the one that gets freed. This remains extremely simple and should be
easily backportable. A dedicated aligned allocator later would help, of
course.
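
The manual alignment described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code from
this patch: struct entry, alloc_aligned_entries() and CACHE_LINE are
hypothetical names, and the sketch adds the padding once to the total size,
since one spare cache line covers the worst-case shift of 63 bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64

/* Hypothetical entry type standing in for struct fdtab. */
struct entry {
	unsigned long state;
	void *owner;
};

/* Allocates <count> zeroed entries starting on a 64-byte boundary.
 * The pointer returned by calloc() is stored into <*raw>; it is the
 * only one that may later be passed to free(). Returns NULL on
 * allocation failure. Overflow of the size computation is not checked
 * in this sketch.
 */
struct entry *alloc_aligned_entries(size_t count, void **raw)
{
	uintptr_t addr;

	*raw = calloc(1, count * sizeof(struct entry) + CACHE_LINE);
	if (!*raw)
		return NULL;

	/* round up to the next multiple of 64 (no-op if already aligned) */
	addr = ((uintptr_t)*raw + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
	assert((addr & (CACHE_LINE - 1)) == 0);
	return (struct entry *)addr;
}
```

The caller keeps both pointers: the aligned one to access the entries and
the raw one to release the area, which mirrors the fdtab / fdtab_addr
split.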

This needs to be backported as far as 2.2. No issue related to this has
been reported yet, but it could very well happen as compilers evolve. In
addition, this should preserve high performance across restarts (i.e.
no more dependency on the allocator's alignment).

(cherry picked from commit 97ea9c49f1d95c7e91e544e8ad9ba09bffbcc023)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 6cebd0fd75e3ef511d5227df476c6d8c6c2d3378)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/fd.c b/src/fd.c
index 733658c..5f465f0 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -114,6 +114,7 @@
 int poller_wr_pipe[MAX_THREADS] __read_mostly; // Pipe to wake the threads
 
 volatile int ha_used_fds = 0; // Number of FD we're currently using
+static struct fdtab *fdtab_addr;  /* address of the allocated area containing fdtab */
 
 #define _GET_NEXT(fd, off) ((volatile struct fdlist_entry *)(void *)((char *)(&fdtab[fd]) + off))->next
 #define _GET_PREV(fd, off) ((volatile struct fdlist_entry *)(void *)((char *)(&fdtab[fd]) + off))->prev
@@ -685,11 +686,14 @@
 	int p;
 	struct poller *bp;
 
-	if ((fdtab = calloc(global.maxsock, sizeof(*fdtab))) == NULL) {
+	if ((fdtab_addr = calloc(1, global.maxsock * sizeof(*fdtab) + 64)) == NULL) {
 		ha_alert("Not enough memory to allocate %d entries for fdtab!\n", global.maxsock);
 		goto fail_tab;
 	}
 
+	/* always provide an aligned fdtab */
+	fdtab = (struct fdtab*)((((size_t)fdtab_addr) + 63) & -(size_t)64);
+
 	if ((polled_mask = calloc(global.maxsock, sizeof(*polled_mask))) == NULL) {
 		ha_alert("Not enough memory to allocate %d entries for polled_mask!\n", global.maxsock);
 		goto fail_polledmask;
@@ -726,7 +730,7 @@
  fail_info:
 	free(polled_mask);
  fail_polledmask:
-	free(fdtab);
+	free(fdtab_addr);
  fail_tab:
 	return 0;
 }
@@ -747,7 +751,7 @@
 	}
 
 	ha_free(&fdinfo);
-	ha_free(&fdtab);
+	ha_free(&fdtab_addr);
 	ha_free(&polled_mask);
 }