BUG/MINOR: task: do not set TASK_F_USR1 for no reason

This application-specific flag was added in 2.4-dev by commit 6fa8bcdc7
("MINOR: task: add an application specific flag to the state: TASK_F_USR1")
to help preserve the idle connections status across wakeup calls. While
the code to do this was OK for tasklets, it was wrong for tasks: in an
effort not to lose it when setting the RUNNING flag (that tasklets don't
have), it ended up being unconditionally set. It just happens that for now
no regular tasks use it, only tasklets.

This fix makes sure we always atomically perform ((state & flags) | running)
there, using a CAS. It also does it for tasklets because it was possible
to lose some such flags if set by another thread, even though this should
not happen with current code. In order to make the code more readable (and
avoid the previous mistake of repeated flags in the bit field), a new
TASK_PERSISTENT aggregate was declared in task.h for this.

In practice the CAS is cheap here because task states are stable or
convergent so the loop will almost never be taken.
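
For illustration only, the CAS pattern described above can be sketched
with plain C11 atomics. This is not the HAProxy code: the EX_* flag values
and ex_enter_running() are hypothetical names made up for this example,
and atomic_compare_exchange_weak() stands in for _HA_ATOMIC_CAS().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values for illustration; the real ones live in task.h. */
#define EX_RUNNING    0x0001u      /* transient: task is currently executing */
#define EX_WOKEN_MSG  0x0002u      /* transient: wakeup reason, consumed on run */
#define EX_F_USR1     0x0100u      /* persistent: application-specific flag */
#define EX_PERSISTENT (EX_F_USR1)  /* aggregate of flags that survive a run */

/* Atomically replace *state with ((old & EX_PERSISTENT) | EX_RUNNING) and
 * return the old value, so that the caller can inspect the wakeup flags.
 */
static unsigned int ex_enter_running(atomic_uint *state)
{
	unsigned int old = atomic_load(state);

	/* On failure the CAS reloads <old> with the current value, so the
	 * new state is recomputed from fresh data on each retry.
	 */
	while (!atomic_compare_exchange_weak(state, &old,
	                                     (old & EX_PERSISTENT) | EX_RUNNING))
		;
	return old;
}
```

With this shape, a persistent flag is kept only if it was already present
in the old state, and a flag set concurrently by another thread makes the
CAS fail and be retried instead of being silently overwritten.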

This should be backported to 2.4.

(cherry picked from commit 3193eb9907dd97d0521d796967958e4716e2ce35)
[wt: minor ctx adjustment]
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/task.c b/src/task.c
index 8b08675..84873f0 100644
--- a/src/task.c
+++ b/src/task.c
@@ -490,11 +490,10 @@
 		}
 
 		budgets[queue]--;
-		t = (struct task *)LIST_ELEM(tl_queues[queue].n, struct tasklet *, list);
-		state = t->state & (TASK_SHARED_WQ|TASK_SELF_WAKING|TASK_HEAVY|TASK_F_TASKLET|TASK_KILLED|TASK_F_USR1);
-
 		ti->flags &= ~TI_FL_STUCK; // this thread is still running
 		activity[tid].ctxsw++;
+
+		t = (struct task *)LIST_ELEM(tl_queues[queue].n, struct tasklet *, list);
 		ctx = t->context;
 		process = t->process;
 		t->calls++;
@@ -502,7 +501,7 @@
 
 		_HA_ATOMIC_DEC(&sched->rq_total);
 
-		if (state & TASK_F_TASKLET) {
+		if (t->state & TASK_F_TASKLET) {
 			uint64_t before = 0;
 
 			LIST_DEL_INIT(&((struct tasklet *)t)->list);
@@ -519,7 +518,7 @@
 #endif
 			}
 
-			state = _HA_ATOMIC_XCHG(&t->state, state);
+			state = _HA_ATOMIC_FETCH_AND(&t->state, TASK_PERSISTENT);
 			__ha_barrier_atomic_store();
 
 			process(t, ctx, state);
@@ -537,7 +536,11 @@
 
 		LIST_DEL_INIT(&((struct tasklet *)t)->list);
 		__ha_barrier_store();
-		state = _HA_ATOMIC_XCHG(&t->state, state|TASK_RUNNING|TASK_F_USR1);
+
+		state = t->state;
+		while (!_HA_ATOMIC_CAS(&t->state, &state, (state & TASK_PERSISTENT) | TASK_RUNNING))
+			;
+
 		__ha_barrier_atomic_store();
 
 		/* OK then this is a regular task */