MEDIUM: queue: update px->served and lb's take_conn once per loop

There's no point doing atomic incs over px->served/px->totpend under the
locks from the inner loop, as this value is used by the LB algorithms but
not during the dequeuing step. In addition, the LB algo's take_conn()
doesn't need to be refreshed for each and every connection taken
under the lock, it can be performed once at the end and out of the
lock.

While the gain on roundrobin is not noticeable (only the atomic inc),
on leastconn which uses take_conn(), the performance increases from
355k to 362k req/s on 16 threads.
1 file changed