BUG/MEDIUM: fd: avoid infinite loops in fd_add_to_fd_list and fd_rm_from_fd_list
With 4d9888c ("CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros") some
"volatile" keywords were dropped at various assignment places in fd's code.
In fd_add_to_fd_list() and fd_add_to_fd_list(), because of the absence of
the "volatile" keyword: the compiler was able to perform some code
optimizations that prevented prev and next variables from being reloaded
between locking attempts (goto loop).
The result was that fd_add_to_fd_list() and fd_rm_from_fd_list() could enter in
infinite loops, preventing other threads from working further and ultimately
resulting in the watchdog being triggered as described in GH #2011.
To fix this, we made sure to re-audit 4d9888c in order to restore the required
memory barriers / compilers hints to prevent the compiler from mis-optimizing
the code around the fd's locks.
That is: using atomic loads to fetch the prev and next values, and restoring
the "volatile" cast for cur_list.ent variable assignment in fd_rm_from_fd_list()
Big thanks to @xanaxalan for his help and patience and to @wtarreau for his
findings and explanations in regard to compiler's optimizations.
This must be backported in 2.7 with 4d9888c ("CLEANUP: fd: get rid of the
__GET_{NEXT,PREV} macros")
diff --git a/src/fd.c b/src/fd.c
index 4d4700f..aaf0e38 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -127,7 +127,7 @@
int last;
redo_next:
- next = fdtab[fd].update.next;
+ next = HA_ATOMIC_LOAD(&fdtab[fd].update.next);
/* Check that we're not already in the cache, and if not, lock us. */
if (next > -2)
goto done;
@@ -194,7 +194,7 @@
lock_self:
#if (defined(HA_CAS_IS_8B) || defined(HA_HAVE_CAS_DW))
next_list.ent.next = next_list.ent.prev = -2;
- cur_list.ent = fdtab[fd].update;
+ cur_list.ent = *(volatile typeof(fdtab->update)*)&fdtab[fd].update;
/* First, attempt to lock our own entries */
do {
/* The FD is not in the FD cache, give up */
@@ -214,7 +214,7 @@
#else
lock_self_next:
- next = fdtab[fd].update.next;
+ next = HA_ATOMIC_LOAD(&fdtab[fd].update.next);
if (next == -2)
goto lock_self_next;
if (next <= -3)
@@ -222,7 +222,7 @@
if (unlikely(!_HA_ATOMIC_CAS(&fdtab[fd].update.next, &next, -2)))
goto lock_self_next;
lock_self_prev:
- prev = fdtab[fd].update.prev;
+ prev = HA_ATOMIC_LOAD(&fdtab[fd].update.prev);
if (prev == -2)
goto lock_self_prev;
if (unlikely(!_HA_ATOMIC_CAS(&fdtab[fd].update.prev, &prev, -2)))