[OPTIM] task: reduce the number of calls to task_queue()

Most of the time, task_queue() returns immediately. By moving its
preliminary checks into an inline function, we significantly reduce
the number of calls to the function itself, and most of the tests can
be optimized away by the compiler thanks to the caller's context.
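
As an illustration of the inline fast path described above, here is a
minimal sketch. It is not the code from this patch: the demo_* names,
the structure fields and the two checks are assumptions chosen only to
show the shape of the technique.

```c
#include <stddef.h>

/* Hypothetical task: field names are assumptions for this sketch only. */
struct demo_task {
	int expire;     /* requested expiration date, 0 = no timeout set */
	int queued_at;  /* date the task is currently queued for, 0 = none */
};

/* Counts how often the out-of-line function is really entered. */
static int demo_slow_calls;

/* Out-of-line slow path: stands in for the real (re)queuing work. */
void demo_task_queue_slow(struct demo_task *t)
{
	demo_slow_calls++;
	t->queued_at = t->expire;
}

/* Inline fast path: the cheap preliminary checks live in the caller,
 * so the function call only happens when there is work to do.
 */
static inline void demo_task_queue(struct demo_task *t)
{
	if (!t->expire)
		return;                 /* nothing to queue */
	if (t->queued_at == t->expire)
		return;                 /* already queued at the right date */
	demo_task_queue_slow(t);
}
```

When the caller already knows that a timeout is set, for example, the
compiler is free to drop the first test after inlining.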

Another minor improvement in process_runnable_tasks() takes advantage
of the processor's branch prediction unit by special-casing the
process_session() callback, which is by far the most common one.
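
The special case described above can be pictured roughly as follows.
This is a hedged sketch, not the code from this patch: all demo_*
names and the callback signature are hypothetical. The idea is that
comparing the function pointer against the most common callback turns
an indirect call into a well-predicted conditional branch followed by
a direct call.

```c
#include <stddef.h>

struct demo_task;
typedef int (*demo_cb)(struct demo_task *);

/* Hypothetical task carrying its processing callback. */
struct demo_task {
	demo_cb process;
	int runs;
};

/* Stand-in for the most common callback. */
int demo_process_session(struct demo_task *t)
{
	t->runs++;
	return 1;
}

/* Stand-in for any other, rarer callback. */
int demo_process_other(struct demo_task *t)
{
	t->runs += 10;
	return 2;
}

/* Run one task: direct call for the common callback, indirect call
 * through the pointer for everything else.
 */
static inline int demo_run_task(struct demo_task *t)
{
	if (t->process == demo_process_session)
		return demo_process_session(t);
	return t->process(t);
}
```

Both paths produce the same result as a plain t->process(t); only the
way the call is issued differs.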

Together, these changes improved performance by about 1%, mainly on
the call made from process_runnable_tasks().
diff --git a/include/common/ticks.h b/include/common/ticks.h
index f3c1a7d..4587d56 100644
--- a/include/common/ticks.h
+++ b/include/common/ticks.h
@@ -113,6 +113,17 @@
 		return t2;
 }
 
+/* return the first one of the two timers, where only the first one may be infinite */
+static inline int tick_first_2nz(int t1, int t2)
+{
+	if (!tick_isset(t1))
+		return t2;
+	if ((t1 - t2) <= 0)
+		return t1;
+	else
+		return t2;
+}
+
 /* return the number of ticks remaining from <now> to <exp>, or zero if expired */
 static inline int tick_remain(int now, int exp)
 {