MEDIUM: stick-table: requeue the expiration task out of the exclusive lock

With 48 threads, a heavily loaded table with plenty of trackers and
rules and a short expiration timer of 10ms saturates the CPU at 232k
rps. By carefully using atomic ops we can make sure that t->exp_next
and t->exp_task->expire converge to the earliest next expiration date
without taking any lock. That's what this patch does in
stktable_touch_with_exp(). This is sufficient to double the performance
and reach 470k rps.

It's worth noting that __stktable_store() uses a mix of eb32_insert()
and task_queue(), and that the latter part could possibly benefit from
the same approach, even though it is sometimes called with the lock
already held.
diff --git a/src/stick_table.c b/src/stick_table.c
index f4f1de1..b114466 100644
--- a/src/stick_table.c
+++ b/src/stick_table.c
@@ -378,14 +378,29 @@
 {
 	struct eb32_node * eb;
 	int locked = 0;
+	int old_exp, new_exp;
 
 	if (expire != HA_ATOMIC_LOAD(&ts->expire)) {
 		/* we'll need to set the expiration and to wake up the expiration timer .*/
 		HA_ATOMIC_STORE(&ts->expire, expire);
 		if (t->expire) {
-			if (!locked++)
-				HA_RWLOCK_WRLOCK(STK_TABLE_LOCK, &t->lock);
-			t->exp_task->expire = t->exp_next = tick_first(expire, t->exp_next);
+			/* set both t->exp_next and the task's expire to the earliest
+			 * expiration date.
+			 */
+			old_exp = HA_ATOMIC_LOAD(&t->exp_next);
+			do {
+				new_exp = tick_first(expire, old_exp);
+			} while (new_exp != old_exp &&
+				 !HA_ATOMIC_CAS(&t->exp_next, &old_exp, new_exp) &&
+				 __ha_cpu_relax());
+
+			old_exp = HA_ATOMIC_LOAD(&t->exp_task->expire);
+			do {
+				new_exp = HA_ATOMIC_LOAD(&t->exp_next);
+			} while (new_exp != old_exp &&
+				 !HA_ATOMIC_CAS(&t->exp_task->expire, &old_exp, new_exp) &&
+				 __ha_cpu_relax());
+
 			task_queue(t->exp_task);
 			/* keep the lock */
 		}