MEDIUM: stream: make a full process_stream() loop when completing I/O on exit

During the 1.9 development cycle, a shortcut was taken in process_stream()
to update the analysers immediately when an I/O event was detected on the
send() path while leaving the function. In order to prevent a single stream
from abusing this and stealing all the CPU, the loop didn't cover the
initial recv() call, so that events would ultimately converge.

This has caused a number of issues over time because the conditions used to
decide whether to loop are a bit tricky. For example, the CF_READ_PARTIAL
flag is not immediately removed from rqf_last and may remain visible at
this point for a long time, sometimes causing loops to last much longer
than intended.

Another unexpected side effect is that all analysers are called again with
no data to process, just because CF_WRITE_PARTIAL is present. We cannot get
rid of this event, even though it is very rarely used, because some analysers
might wait for some data to leave a buffer before proceeding. With a full
loop, this event would have been merged with a subsequent recv(), allowing
analysers to do something more useful than just acknowledge an event they
don't care about.

While during early 1.9-dev it was very important to be kind to the
scheduler, nowadays it is lock-free for local tasks, so this optimization
is much less interesting for I/Os, especially if we factor in the trouble
it causes.

This patch thus removes the use of the loop for regular I/Os and instead
performs a task_wakeup() with an I/O event so that the task will be
scheduled after all other ones and will have a chance to perform another
recv() and possibly gather more I/O events to be processed at once.
Synchronous errors and transitions to SI_ST_DIS, however, are still handled
by the loop.
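
Schematically, the exit path changes as follows (a simplified before/after
sketch, where state_changed_or_error is just a placeholder for the unchanged
SI_ST_DIS / state-change / SI_FL_ERR conditions visible in the hunk below):

	/* before: a mere analyser-visible flag change was enough to loop */
	if (state_changed_or_error ||
	    (((req->flags ^ rqf_last) | (res->flags ^ rpf_last)) & CF_MASK_ANALYSER))
		goto redo;

	/* after: only state changes and errors still loop synchronously;
	 * plain I/O reports instead wake the task up again so that the
	 * next call also performs a recv() and aggregates more events
	 */
	if (state_changed_or_error)
		goto redo;

	if (((req->flags ^ rqf_last) | (res->flags ^ rpf_last)) & CF_MASK_ANALYSER)
		task_wakeup(s->task, TASK_WOKEN_IO);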

Doing so significantly reduces the average number of calls to the analysers
(these are typically halved when compression is enabled in legacy mode) and,
as a side benefit, increases H1 performance by about 1%.
diff --git a/src/stream.c b/src/stream.c
index e653034..a29b775 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -2582,15 +2582,25 @@
 		if ((sess->fe->options & PR_O_CONTSTATS) && (s->flags & SF_BE_ASSIGNED))
 			stream_process_counters(s);
 
+		/* take the exact same flags si_update_both() will have before
+		 * trying to update again.
+		 */
+		rqf_last = req->flags & ~(CF_READ_NULL|CF_READ_PARTIAL|CF_READ_ATTACHED|CF_WRITE_NULL|CF_WRITE_PARTIAL);
+		rpf_last = res->flags & ~(CF_READ_NULL|CF_READ_PARTIAL|CF_READ_ATTACHED|CF_WRITE_NULL|CF_WRITE_PARTIAL);
+
 		si_update_both(si_f, si_b);
 
+		/* changes requiring immediate attention are processed right now */
 		if (si_f->state == SI_ST_DIS || si_f->state != si_f_prev_state ||
 		    si_b->state == SI_ST_DIS || si_b->state != si_b_prev_state ||
 		    ((si_f->flags & SI_FL_ERR) && si_f->state != SI_ST_CLO) ||
-		    ((si_b->flags & SI_FL_ERR) && si_b->state != SI_ST_CLO) ||
-		    (((req->flags ^ rqf_last) | (res->flags ^ rpf_last)) & CF_MASK_ANALYSER))
+		    ((si_b->flags & SI_FL_ERR) && si_b->state != SI_ST_CLO))
 			goto redo;
 
+		/* I/O events (mostly CF_WRITE_PARTIAL) are aggregated with other I/Os */
+		if (((req->flags ^ rqf_last) | (res->flags ^ rpf_last)) & CF_MASK_ANALYSER)
+			task_wakeup(s->task, TASK_WOKEN_IO);
+
 		/* Trick: if a request is being waiting for the server to respond,
 		 * and if we know the server can timeout, we don't want the timeout
 		 * to expire on the client side first, but we're still interested