BUG/MEDIUM: tasklet: properly compute the sleeping threads mask in tasklet_wakeup()

The use of ~(1 << tid) to compute the sleeping_thread_mask in
tasklet_wakeup() will result in breakage above 32 threads. (1 << tid) is
an int, so (1 << 31) = 0x80000000 sign-extends to 0xFFFFFFFF80000000 when
combined with the unsigned long mask, and ~(1 << 31) = 0x7FFFFFFF also
clears the bits of all threads above 31. Larger values of tid lead to
theoretically undefined results, but in practice the shift wraps around
to 0x1 through 0x80000000 again and indicates wrong sleeping masks. The
main visible effect seems to be extra latency on some threads or short
CPU loops on others.
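The widening problem can be reproduced outside HAProxy with a minimal sketch. It assumes an LP64 platform (64-bit unsigned long) and GCC/Clang behaviour, where 1 << 31 evaluates to INT_MIN; since ISO C leaves that shift undefined, the sketch starts from INT_MIN instead of writing the shift. sleeping_mask, buggy_bit31(), buggy_clear31() and clear_if_sleeping() are hypothetical names for illustration, not HAProxy code.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in the mask type (64 on LP64 platforms). */
#define MASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for sleeping_thread_mask: one bit per thread. */
static unsigned long sleeping_mask;

/* What the old test effectively used for tid == 31 on GCC/Clang:
 * INT_MIN, which sign-extends when converted to unsigned long. */
static unsigned long buggy_bit31(void)
{
    return (unsigned long)INT_MIN;      /* 0xFFFFFFFF80000000 on LP64 */
}

/* The old clear mask for tid == 31: ~INT_MIN is INT_MAX, which
 * zero-extends and therefore also clears every bit above 31. */
static unsigned long buggy_clear31(void)
{
    return (unsigned long)~INT_MIN;     /* 0x000000007FFFFFFF on LP64 */
}

/* Fixed logic from the patch: 1UL has the width of the mask, so each
 * tid below MASK_BITS selects exactly one bit. Returns 1 if the thread
 * was marked sleeping (the real code calls wake_thread() there). */
static int clear_if_sleeping(unsigned int tid)
{
    assert(tid < MASK_BITS);            /* larger shifts are undefined */
    if (sleeping_mask & (1UL << tid)) {
        sleeping_mask &= ~(1UL << tid);
        return 1;
    }
    return 0;
}
```

With only thread 40 asleep, buggy_bit31() still matches bit 40, so the old code would wrongly wake thread 31, and buggy_clear31() would erase thread 40's bit; clear_if_sleeping(31) leaves the mask untouched.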

No backport is needed.
diff --git a/include/proto/task.h b/include/proto/task.h
index 2258448..6ec2846 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -236,8 +236,8 @@
 	} else {
 		if (MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list) == 1) {
 			_HA_ATOMIC_ADD(&tasks_run_queue, 1);
-			if (sleeping_thread_mask & (1 << tl->tid)) {
-				_HA_ATOMIC_AND(&sleeping_thread_mask, ~(1 << tl->tid));
+			if (sleeping_thread_mask & (1UL << tl->tid)) {
+				_HA_ATOMIC_AND(&sleeping_thread_mask, ~(1UL << tl->tid));
 				wake_thread(tl->tid);
 			}
 		}