BUG/MINOR: listener: do not immediately resume on transient error

The listener supports a "transient error" situation, which corresponds
to those cases where accept() fails but poll() still reports an event.
This happens for example when a listener is paused, or when the process
runs out of file descriptors. The same mechanism is used when facing a
maxconn or maxsessrate limitation. When this happens, the listener is
disabled for up to 100ms and put back into the global listener queue so
that it automatically wakes up again as soon as conditions change,
either because an existing connection releases a resource or because
the system recovers from the transient issue.

The listener_accept() function has a bug in its exit path causing a
freshly limited listener to be immediately enabled again because all
the conditions are met (connection count < max). It doesn't take into
account the fact that the listener might have been queued and must
first wait for the timeout to expire before doing so. The impact is
that upon certain errors, the faulty process will busy loop on the
accept code without sleeping. This is the scenario reported and
diagnosed by @hedong0411 in issue #382.

This commit fixes it by verifying that the global queue's delay has
expired before deciding to resume the listener. An alternative would
be to add an extra state such as LI_DELAY for situations where only a
delay is acceptable, but this would probably bring nothing except more
complex code.
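
For illustration only, the decision described above can be modelled as
follows. This is a minimal sketch and not the patched code: every name
in it (sim_listener, sim_tick_expired, sim_may_resume, the use of 0 to
mean "no expiration date set") is hypothetical and only stands in for
the listener state, the connection limit and the global queue's
expiration date.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not HAProxy's types: 0 means "no delay armed". */
#define SIM_TICK_ETERNITY 0u

struct sim_listener {
    bool limited;      /* listener was disabled after an error or a limit */
    uint32_t nbconn;   /* current connection count */
    uint32_t maxconn;  /* configured connection limit */
};

/* Wrap-safe check that <expire> is not in the future of <now_ms>. */
static bool sim_tick_expired(uint32_t expire, uint32_t now_ms)
{
    return (int32_t)(now_ms - expire) >= 0;
}

/* Returns true when the exit path may re-enable the listener. */
static bool sim_may_resume(const struct sim_listener *l,
                           uint32_t queue_expire, uint32_t now_ms)
{
    assert(l != NULL);

    if (!l->limited)
        return false;   /* nothing to resume */

    if (l->nbconn >= l->maxconn)
        return false;   /* still at or above the limit */

    /* The extra condition: a delay armed on the queue must have
     * expired, otherwise the listener has to keep waiting.
     */
    if (queue_expire != SIM_TICK_ETERNITY &&
        !sim_tick_expired(queue_expire, now_ms))
        return false;

    return true;
}
```

Without the last test, a listener limited at 1000ms with a delay ending
at 1100ms would be resumed on the spot as long as nbconn < maxconn,
which is the busy loop described above.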

This issue was introduced with the lock-free listener accept code
(commits 3f0d02b and 82c9789a), which was backported to 1.8.20+ and
1.9.7+, so this fix must be backported to the relevant branches.