BUG/MEDIUM: shctx: really check the lock's value while waiting

Jérôme reported an amazing crash in the spinlock version of
_shctx_wait4lock() with an extremely high <count> value of 32M! The
root cause is that the function cannot deal with contention on the lock
at all because it forgets to check whether the lock's value has changed!
As such, every time it's called due to contention, it waits twice as
long as the previous time before trying again, and leaves it to the
caller to check for the contention by itself.

The correct thing to do is to compare the value again at each loop
iteration. This way the function mostly performs read accesses on the
shared cache line without writing too often, and is ready fast enough
to try to grab the lock. And we must not increase the count on success
either!

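As an illustration only, the corrected wait loop can be sketched as a
standalone C11 function. This is a hedged sketch, not the actual HAProxy
code: wait4lock_sketch() and relax_hint() are hypothetical names, the
real function returns void and uses relax() on a plain pointer, and the
return value is added here only to make the behaviour observable.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for relax(): only a compiler barrier here. */
static inline void relax_hint(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Spin for up to *count iterations while *uaddr still holds <value>.
 * Returns 1 as soon as the lock word changes, leaving *count untouched,
 * or 0 after a full unsuccessful round, with *count doubled.
 */
static int wait4lock_sketch(unsigned int *count,
                            _Atomic unsigned int *uaddr,
                            unsigned int value)
{
	unsigned int i;

	assert(count && uaddr);
	for (i = 0; i < *count; i++) {
		relax_hint();
		/* read-only access to the shared cache line */
		if (atomic_load_explicit(uaddr, memory_order_relaxed) != value)
			return 1;
	}
	*count <<= 1; /* back off longer only when the lock did not change */
	return 0;
}
```

With a lock word that never changes, each call doubles the count; once
the word changes, the call returns at the first check and the count is
left as it was.
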
I had expected to see a performance boost on the cache with this, but
unfortunately there was absolutely no change, so it's very likely that
these issues only happen once in a while and are sufficient to derail
the process when they strike, but not to have a permanent performance
impact.

The bug was introduced with the shctx entries in 1.5 so the fix must
be backported to all versions. Before 1.8 the function was called
_shared_context_wait4lock() and was in shctx.c.

(cherry picked from commit 3801bdc3fc1c2c6d596b4c4ee3449a76f2da8654)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit def4271bb22207983d0d17a8f02e99dcb6b60fa7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/include/proto/shctx.h b/include/proto/shctx.h
index f5448e0..ea62a12 100644
--- a/include/proto/shctx.h
+++ b/include/proto/shctx.h
@@ -93,6 +93,8 @@
 	for (i = 0; i < *count; i++) {
 		relax();
 		relax();
+		if (*uaddr != value)
+			return;
 	}
 	*count = *count << 1;
 }