MINOR: atomic/arm64: detect and use builtins for the double-word CAS
Gcc 10.2 implements outline atomics on aarch64. The replace all inline
atomic ops with a function call that checks if the machine supports LSE
atomics. This comes with a small cost but allows modern machines to scale
much better than with the old LL/SC ones even when built for full 8.0
compatibility.
This patch enables the use of the __atomic_compare_exchange() builtin
for the double-word CAS when detected as available instead of using the
hand-written LL/SC version. The extra cost is negligible because we do
very few DWCAS operations (essentially FD migrations and shared pools)
so the cost is low but under high contention it can still be beneficial.
As expected no performance difference was measured in either direction
on 4-core machines with this change.
This could be backported to 2.3 if it was shown that FD migrations were
representing a significant source of contention, but for now it does
not appear to be needed.
diff --git a/include/haproxy/atomic.h b/include/haproxy/atomic.h
index cc06162..8608b12 100644
--- a/include/haproxy/atomic.h
+++ b/include/haproxy/atomic.h
@@ -550,8 +550,27 @@
return ret;
}
-#else // no ARMv8.1-A atomics
+#elif defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16) // no ARMv8.1-A atomics but 128-bit atomics
+
+/* According to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
+ * we can use atomics on __int128. The availability of CAS is defined there:
+ * https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
+ * However these usually involve a function call which can be expensive for some
+ * cases, but gcc 10.2 and above can reroute the function call to either LL/SC for
+ * v8.0 or LSE for v8.1+, which allows to use a more scalable version on v8.1+ at
+ * the extra cost of a function call.
+ */
+
+/* returns 0 on failure, non-zero on success */
+static __inline int __ha_cas_dw(void *target, void *compare, const void *set)
+{
+ return __atomic_compare_exchange((__int128*)target, (__int128*)compare, (const __int128*)set,
+ 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+#else // neither ARMv8.1-A atomics nor 128-bit atomics
+/* returns 0 on failure, non-zero on success */
static __inline int __ha_cas_dw(void *target, void *compare, void *set)
{
void *value[2];