BUG/MINOR: resolvers: Don't wait periodic resolution on healthcheck failure
DNS resoltions may be triggered via a "do-resolve" action or when a connection
failure is experienced during a healthcheck. Cached valid responses are used, if
possible. But if the entry is expired or if there is no valid response, a new
reolution should be performed. However, an resolution is only performed if the
"resolve" timeout is expired. Thus, when this comes from a healthcheck, it means
no extra resolution is performed at all.
Now, when the resolution is performed for a server (SRV or SRVEQ) and no valid
response is found, the resolution timer is reset (last_resolution is set to
TICK_ETERNITY). Of course, it is only performed if no resolution is already
running.
Note that this feature was broken 5 years ago when the resolvers code was
refactored (67957bd59e).
This patch should fix the issue #1906. It affects all stable versions. However,
it is probably a good idea to not backport it too far (2.6, maybe 2.4) and with
some delay.
diff --git a/src/resolvers.c b/src/resolvers.c
index 4bbc6e5..d930780 100644
--- a/src/resolvers.c
+++ b/src/resolvers.c
@@ -460,8 +460,17 @@
* valid */
exp = tick_add(res->last_resolution, resolvers->hold.valid);
if (resolvers->t && (res->status != RSLV_STATUS_VALID ||
- !tick_isset(res->last_resolution) || tick_is_expired(exp, now_ms)))
+ !tick_isset(res->last_resolution) || tick_is_expired(exp, now_ms))) {
+ /* If the resolution is not running and the requester is a
+ * server, reset the resoltion timer to force a quick
+ * resolution.
+ */
+ if (res->step == RSLV_STEP_NONE &&
+ (obj_type(req->owner) == OBJ_TYPE_SERVER ||
+ obj_type(req->owner) == OBJ_TYPE_SRVRQ))
+ res->last_resolution = TICK_ETERNITY;
task_wakeup(resolvers->t, TASK_WOKEN_OTHER);
+ }
leave_resolver_code();
}