BUG/MEDIUM: clock: detect and cover jumps during execution

After commit e8b1ad4c2 ("BUG/MEDIUM: clock: also update the date offset
on time jumps"), @firexinghe mentioned that the issue was still present
in their case. In fact it depends on the load, which affects the
probability that the time changes between two poll() calls rather than
during poll(). The time correction code used to deal only with the
latter. But under load, if the time changes between two poll() calls,
before_poll is already off, and after returning from poll() the date
is within the bounds defined by before_poll, so no correction is
applied.

After many tests, it turns out that the most reliable solution without
using CLOCK_MONOTONIC has two parts. First, prevent before_poll from
being earlier than the previous after_poll (trivial). Second, to cover
forward jumps, enforce a margin. Given that the watchdog kills a looping
task within 2 seconds and that no sane setup triggers it, 2 seconds
seems to remain a safe enough margin. This means that in the worst case,
some forward jumps of up to 2 seconds will not be corrected, making time
appear to run fast and measured rates appear low. But this is supposed
to be an exceptional event anyway (typically an admin or crontab running
ntpdate).
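
As an illustration only, the two rules above can be sketched as a
clamp applied to the date sampled before poll(). This is a minimal
sketch and not the patch itself: the function name clamp_before_poll(),
the JUMP_MARGIN_US constant and the use of plain microsecond counters
are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed margin: 2 seconds, expressed in microseconds. */
#define JUMP_MARGIN_US 2000000ULL

/* Hypothetical helper: returns the value to keep as before_poll, given
 * the wall-clock date just sampled and the previous after_poll date,
 * both in microseconds.
 */
static uint64_t clamp_before_poll(uint64_t sampled, uint64_t prev_after_poll)
{
    /* Backward jump between two poll() calls: never go earlier than
     * the last date we knew.
     */
    if (sampled < prev_after_poll)
        return prev_after_poll;

    /* Forward jump larger than the margin: cap it so that the date
     * read after poll() falls outside the expected bounds and gets
     * corrected there.
     */
    if (sampled - prev_after_poll > JUMP_MARGIN_US)
        return prev_after_poll + JUMP_MARGIN_US;

    /* Within the margin: indistinguishable from normal processing
     * time, so the sampled date is kept as-is.
     */
    return sampled;
}
```

With such a clamp, a forward jump smaller than the margin goes
unnoticed, which matches the worst case described above.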

For future versions, given that we now opportunistically call
now_mono_time() before and after poll(), and that it returns zero when
not supported, we could imagine relying on it for the thread's local
time whenever it is non-zero.
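
A rough sketch of what that could look like, purely as an assumption
about a possible future direction: mono_time_ns() below is a
hypothetical stand-in for now_mono_time(), and elapsed_ns() is an
invented helper that trusts the monotonic samples only when both are
non-zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for now_mono_time(): nanoseconds read from the
 * monotonic clock, or 0 when that clock is not available.
 */
static uint64_t mono_time_ns(void)
{
#if defined(CLOCK_MONOTONIC)
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
    return 0;
}

/* Hypothetical helper: time spent between two sampling points. The
 * monotonic samples are used when both are non-zero and ordered,
 * otherwise the wall-clock difference is used, floored at zero.
 */
static uint64_t elapsed_ns(uint64_t mono_before, uint64_t mono_after,
                           uint64_t wall_before, uint64_t wall_after)
{
    if (mono_before && mono_after && mono_after >= mono_before)
        return mono_after - mono_before;

    return wall_after >= wall_before ? wall_after - wall_before : 0;
}
```

The wall-clock fallback remains exposed to jumps, so it would still
need the margin-based protection described above.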

(cherry picked from commit ef8d8215de2ed03a08172bbccb1146e5b867ce74)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 4c656153b34b2105c1c89ab9437ee18567493e6a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 1ca44aa13c93b376a7a909f9ad80b91fd9e8f7e8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
1 file changed