BUG/MINOR: task: close a tiny race in the inter-thread wakeup

__task_wakeup() takes care of a small race that exists between threads,
but the store barrier it uses is not sufficient: a store barrier only
orders stores against other stores, so the read of the task's state that
follows the clearing of the leaf_p pointer is not ordered after that
store and apparently sometimes returns an incorrect value. This results
in missed wakeups between threads competing at a high rate. Let's use a
full barrier instead to serialize the store and the subsequent load.
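
For illustration only, the ordering requirement can be sketched with C11
atomics. This is not the code from src/task.c: struct demo_task, the
DEMO_TASK_* flags and demo_must_requeue() are hypothetical names, and
atomic_thread_fence(memory_order_seq_cst) merely stands in for the role
played by __ha_barrier_full() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not HAProxy's definitions. */
#define DEMO_TASK_RUNNING 0x0001
#define DEMO_TASK_WOKEN   0x0002

struct demo_task {
	_Atomic(void *) leaf_p;  /* non-NULL while linked in a run queue */
	atomic_ushort state;     /* DEMO_TASK_* flags */
};

/*
 * Mirrors the shape of the patched hunk: clear leaf_p, then re-read the
 * state. Returns 1 when the re-read state is non-zero and no longer has
 * the running bit set (the condition tested in the hunk), 0 otherwise.
 */
static int demo_must_requeue(struct demo_task *t)
{
	unsigned short state;

	assert(t != NULL);
	atomic_store_explicit(&t->leaf_p, (void *)NULL, memory_order_relaxed);

	/*
	 * Full fence: the store above must be ordered before the load
	 * below. A store-only (release) fence does not forbid the load
	 * from being performed ahead of the store.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	state = atomic_load_explicit(&t->state, memory_order_relaxed);
	return state != 0 && !(state & DEMO_TASK_RUNNING);
}
```

Called on a task whose state still has DEMO_TASK_RUNNING set, the helper
returns 0. It returns 1 only once another thread has dropped that bit
while leaving some other flag set.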

This may be backported to 1.9 though it's extremely unlikely that this
bug will ever manifest itself there.
diff --git a/src/task.c b/src/task.c
index e7ea0db..826e212 100644
--- a/src/task.c
+++ b/src/task.c
@@ -107,7 +107,7 @@
 	if (((volatile unsigned short)(t->state)) & TASK_RUNNING) {
 		unsigned short state;
 		t->rq.node.leaf_p = NULL;
-		__ha_barrier_store();
+		__ha_barrier_full();
 
 		state = (volatile unsigned short)(t->state);
 		if (unlikely(state != 0 && !(state & TASK_RUNNING)))