[MEDIUM] stats: report server and backend cumulated downtime
Hello,
This patch implements new statistics for SLA calculation. It adds a new
field, 'Dwntime', with the total downtime since restart (both HTTP and CSV),
and extends the status field (HTTP) or inserts a new one (CSV) with the time
each server/backend has spent in its current state. Additionally, down
transitions are also counted and displayed for backends, so it is possible
to know how many times a given backend was down and generating the "No
server is available to handle this request." error.
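The accounting described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the struct, its field names and the helper functions are all hypothetical, and the clock is passed in as a plain time_t.

```c
#include <time.h>

/* Hypothetical per-server bookkeeping; names are illustrative only. */
struct srv_stats {
	int    is_down;      /* non-zero while the server is considered down */
	time_t last_change;  /* when the current state was entered */
	long   down_time;    /* seconds spent down in earlier, finished outages */
	long   down_trans;   /* number of up -> down transitions */
};

/* Seconds spent in the current state; negative intervals are clamped to 0. */
static long state_age(const struct srv_stats *s, time_t now)
{
	long d = (long)(now - s->last_change);

	return d < 0 ? 0 : d;
}

/* Total downtime so far, including an outage that is still in progress. */
static long total_downtime(const struct srv_stats *s, time_t now)
{
	return s->down_time + (s->is_down ? state_age(s, now) : 0);
}

/* Record a state change; does nothing if the state is unchanged. */
static void set_state(struct srv_stats *s, int down, time_t now)
{
	if (s->is_down == down)
		return;
	if (s->is_down)
		s->down_time += state_age(s, now); /* close the finished outage */
	else
		s->down_trans++;                   /* up -> down */
	s->is_down = down;
	s->last_change = now;
}
```

With a layout like this, the CSV output could print total_downtime() and state_age() directly as seconds, while the HTTP page would pass the same values through human_time().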
The new information is presented in two different ways:
- for HTTP: a "human-readable form", one of "100000d23h", "23h59m" or
"59m59s"
- for CSV: seconds
I believe that one-second resolution is enough.
As there are more columns in the status page I decided to shrink some
names to make more space:
- Weight -> Wght
- Check -> Chk
- Down -> Dwn
While making these changes I also made some improvements and fixed some
small bugs:
- don't increment s->health above 's->rise + s->fall - 1'. Previously it
was incremented and then (re)set to 's->rise + s->fall - 1'.
- do not set server down if it is down already
- do not set server up if it is up already
- fix colspan in multiple places (mostly introduced by my previous patch)
- add missing "status" header to CSV
- fix order of retries/redispatches in server (CSV)
- s/Tthen/Then/
- s/server/backend/ in DATA_ST_PX_BE (dumpstats.c)
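The s->health fix in the list above amounts to testing the limit before incrementing instead of correcting the value afterwards. A minimal sketch, using a hypothetical stand-in struct rather than the real server structure:

```c
/* Hypothetical stand-in for the health-check fields of a server. */
struct srv_health {
	int health; /* current health counter */
	int rise;   /* configured "rise" value */
	int fall;   /* configured "fall" value */
};

/* Count one successful check without ever exceeding rise + fall - 1. */
static void health_inc(struct srv_health *s)
{
	if (s->health < s->rise + s->fall - 1)
		s->health++;
}
```

Written this way, the counter never holds an out-of-range value, even briefly, so code reading it between the increment and the reset cannot observe one.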
Changes from previous version:
- deal with negative time intervals
- don't rely on s->state (SRV_RUNNING)
- slightly reworked human_time + compacted format (no spaces). If needed, it
can be reused in the future for other purposes by optionally making "cnt"
an argument
- leave set_server_down mostly unchanged
- only slightly reworked "process_chk: 9"
- additional fields in CSV are appended to the right
- fix "SEC" macro
- named arguments (human_time, be_downtime, srv_downtime)
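As for making "cnt" an argument, one possible shape is sketched below. It is not part of this patch: human_time_n, its caller-supplied buffer and the local DAY/HOUR/MINUTE definitions are assumptions made only for this illustration, and the input is taken to be in seconds already.

```c
#include <stdio.h>

#define MINUTE 60
#define HOUR   (60 * MINUTE)
#define DAY    (24 * HOUR)

/*
 * Hypothetical variant of human_time(): formats 't' seconds into 'buf'
 * using at most 'cnt' non-zero units, and returns 'buf'.
 */
static char *human_time_n(char *buf, size_t len, int t, int cnt)
{
	static const struct { int div; char suffix; } units[] = {
		{ DAY, 'd' }, { HOUR, 'h' }, { MINUTE, 'm' }, { 1, 's' },
	};
	size_t off = 0;
	int rest = t;
	int i, n;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	if (t < 0 || cnt <= 0) {
		snprintf(buf, len, "?");
		return buf;
	}

	for (i = 0; i < 4 && cnt > 0; i++) {
		int v = rest / units[i].div;

		rest %= units[i].div;
		/* skip empty units, but still print "0s" for t == 0 */
		if (v == 0 && !(t == 0 && units[i].suffix == 's'))
			continue;
		n = snprintf(buf + off, len - off, "%d%c", v, units[i].suffix);
		if (n < 0 || (size_t)n >= len - off)
			break; /* output truncated, stop here */
		off += (size_t)n;
		cnt--;
	}
	return buf;
}
```

Called with cnt = 2, this gives the same kind of output as the patch ("23h59m" for 86399 seconds, "59m59s" for 3599); a larger cnt yields more detail, for example "1d1h1m1s" for 90061 seconds with cnt = 4.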
Hope it is OK. If only cosmetic changes are needed, please feel free
to make them. However, if bigger changes are required, I would
like to discuss them first, or at least to know exactly what was changed,
especially since I have already put this patch on my production server. :)
Thank you,
Best regards,
Krzysztof Oledzki
diff --git a/src/time.c b/src/time.c
index b80dca9..bb98d98 100644
--- a/src/time.c
+++ b/src/time.c
@@ -142,6 +142,40 @@
 	return __tv_isgt(tv1, tv2);
 }
+char *human_time(int t, short hz) {
+	static char rv[sizeof("24855d23h")+1]; // longest output, e.g. "24855d23h"; others look like "23h59m" or "59m59s"
+	char *p = rv;
+	int cnt=2; // print two numbers
+
+	if (unlikely(t < 0 || hz <= 0)) {
+		sprintf(p, "?");
+		return rv;
+	}
+
+	if (unlikely(hz > 1))
+		t /= hz;
+
+	if (t >= DAY) {
+		p += sprintf(p, "%dd", t / DAY);
+		cnt--;
+	}
+
+	if (cnt && t % DAY / HOUR) {
+		p += sprintf(p, "%dh", t % DAY / HOUR);
+		cnt--;
+	}
+
+	if (cnt && t % HOUR / MINUTE) {
+		p += sprintf(p, "%dm", t % HOUR / MINUTE);
+		cnt--;
+	}
+
+	if ((cnt && t % MINUTE) || !t) // also display '0s'
+		p += sprintf(p, "%ds", t % MINUTE / SEC);
+
+	return rv;
+}
+
/*
* Local variables:
* c-indent-level: 8