MEDIUM: streams: Add the ability to retry a request on L7 failure.

When running in HTX mode, if we sent the request but failed to get a
usable answer, because the server just closed its socket, we hit a server
timeout, or we got a 404, 408, 425, 500, 501, 502, 503 or 504 error,
attempt to retry the request, exactly as if we had just failed to connect
to the server.

To do so, add a new backend keyword, "retry-on".

It accepts a list of keywords, which can be "none" (never retry),
"conn-failure" (we failed to connect, or to do the SSL handshake),
"empty-response" (the server closed the connection without answering),
"response-timeout" (we timed out while waiting for the server response),
or "404", "408", "425", "500", "501", "502", "503" and "504".

The default is "conn-failure".
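
As an illustration, a backend using the new keyword might look like the
sketch below. The backend name, server names and addresses are
placeholders that are not part of this patch, and "option http-use-htx"
is spelled out only because the L7 events are ignored without HTX.

```
# Hypothetical backend: names and addresses are examples only.
backend app
    mode http
    option http-use-htx
    # Budget shared by connection retries and L7 retries.
    retries 3
    # Allow a retry to go to another server.
    option redispatch
    # Retry on connect failure, on an empty response, and on a 503.
    retry-on conn-failure empty-response 503
    server s1 192.0.2.10:8080 check
    server s2 192.0.2.11:8080 check
```

L7 retries are taken from the same "retries" counter as connection
retries, so the value above bounds both kinds of attempts.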
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 4088291..f6269d7 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -2352,6 +2352,7 @@
 -- keyword -------------------------- defaults - frontend - listen -- backend -
 reqtarpit                                 -          X         X         X
 retries                                   X          -         X         X
+retry-on                                  X          -         X         X
 rspadd                                    -          X         X         X
 rspdel                                    -          X         X         X
 rspdeny                                   -          X         X         X
@@ -8004,6 +8005,70 @@
   See also : "option redispatch"
 
 
+retry-on [list of keywords]
+  Specify when to attempt to automatically retry a failed request
+  May be used in sections:    defaults | frontend | listen | backend
+                                 yes   |    no    |   yes  |   yes
+  Arguments :
+    <keywords>  is a list of keywords or HTTP status codes, each representing a
+                type of failure event on which an attempt to retry the request
+                is desired. Please read the notes at the bottom before changing
+                this setting. The following keywords are supported :
+
+      none              never retry
+
+      conn-failure      retry when the connection or the SSL handshake failed
+                        and the request could not be sent. This is the default.
+
+      empty-response    retry when the server connection was closed after part
+                        of the request was sent, and nothing was received from
+                        the server. This type of failure may be caused by the
+                        request timeout on the server side, poor network
+                        condition, or a server crash or restart while
+                        processing the request.
+
+      response-timeout  the server timeout struck while waiting for the server
+                        to respond to the request. This may be caused by poor
+                        network condition, the reuse of an idle connection
+                        which has expired on the path, or by the request being
+                        extremely expensive to process. It generally is a bad
+                        idea to retry on such events on servers dealing with
+                        heavy database processing (full scans, etc) as it may
+                        amplify denial of service attacks.
+
+      <status>          any HTTP status code among "404" (Not Found), "408"
+                        (Request Timeout), "425" (Too Early), "500" (Server
+                        Error), "501" (Not Implemented), "502" (Bad Gateway),
+                        "503" (Service Unavailable), "504" (Gateway Timeout).
+
+  Using this directive replaces any previous settings with the new ones; it is
+  not cumulative.
+
+  Please note that using anything other than "none" and "conn-failure" requires
+  allocating a buffer and copying the whole request into it, so it has memory
+  and performance impacts. Requests not fitting in a single buffer will never
+  be retried (see the global tune.bufsize setting).
+
+  You have to make sure the application has a replay protection mechanism built
+  in such as unique transaction IDs passed in requests, or that replaying the
+  same request has no consequence, or it is very dangerous to use any retry-on
+  value besides "conn-failure" and "none". Static file servers and caches are
+  generally considered safe against any type of retry. Using a status code can
+  be useful to quickly leave a server showing an abnormal behavior (out of
+  memory, file system issues, etc), but in this case it may be a good idea to
+  immediately redispatch the connection to another server (please see "option
+  redispatch" for this). Last, it is important to understand that most causes
+  of failures are the requests themselves and that retrying a request causing a
+  server to misbehave will often make the situation even worse for this server,
+  or for the whole service in case of redispatch.
+
+  Unless you know exactly how the application deals with replayed requests, you
+  should not use this directive.
+
+  The default is "conn-failure".
+
+  See also: "retries", "option redispatch", "tune.bufsize"
+
 rspadd <string> [{if | unless} <cond>]
   Add a header at the end of the HTTP response
   May be used in sections :   defaults | frontend | listen | backend
diff --git a/doc/internals/filters.txt b/doc/internals/filters.txt
index 09090e5..2cb0eed 100644
--- a/doc/internals/filters.txt
+++ b/doc/internals/filters.txt
@@ -1170,9 +1170,12 @@
 Then, to finish, there are 2 informational callbacks:
 
   * 'flt_ops.http_reset': This callback is called when a HTTP message is
-    reset. This only happens when a '100-continue' response is received. It
+    reset. This happens either when a '100-continue' response is received, or
+    if we're retrying to send the request to the server after it failed. It
     could be useful to reset the filter context before receiving the true
     response.
+    You can tell why the callback is called by checking s->txn->status. If it's
+    10X, we're called because of a '100-continue'; if not, it's an L7 retry.
 
   * 'flt_ops.http_reply': This callback is called when, at any time, HAProxy
     decides to stop the processing on a HTTP message and to send an internal
diff --git a/include/proto/proxy.h b/include/proto/proxy.h
index 172c3d5..c75c0da 100644
--- a/include/proto/proxy.h
+++ b/include/proto/proxy.h
@@ -155,6 +155,35 @@
 			     update_freq_ctr(&fe->fe_req_per_sec, 1));
 }
 
+/* Returns non-zero if the proxy is configured to retry a request if we got that status, 0 otherwise */
+static inline int l7_status_match(struct proxy *p, int status)
+{
+	/* Just return 0 if no retry was configured for any status */
+	if (!(p->retry_type & PR_RE_STATUS_MASK))
+		return 0;
+
+	switch (status) {
+	case 404:
+		return (p->retry_type & PR_RE_404);
+	case 408:
+		return (p->retry_type & PR_RE_408);
+	case 425:
+		return (p->retry_type & PR_RE_425);
+	case 500:
+		return (p->retry_type & PR_RE_500);
+	case 501:
+		return (p->retry_type & PR_RE_501);
+	case 502:
+		return (p->retry_type & PR_RE_502);
+	case 503:
+		return (p->retry_type & PR_RE_503);
+	case 504:
+		return (p->retry_type & PR_RE_504);
+	default:
+		break;
+	}
+	return 0;
+}
 #endif /* _PROTO_PROXY_H */
 
 /*
diff --git a/include/types/filters.h b/include/types/filters.h
index f52592d..1dd3398 100644
--- a/include/types/filters.h
+++ b/include/types/filters.h
@@ -137,7 +137,9 @@
  *                          it needs to wait for some reason, any other value
  *                          otherwise.
  *  - http_reset          : Called when the HTTP message is reseted. It happens
- *                          when a 100-continue response is received.
+ *                          either when a 100-continue response is received,
+ *                          which can be detected if s->txn->status is 10X, or
+ *                          if we're attempting an L7 retry.
  *                          Returns nothing.
  *  - http_reply          : Called when, at any time, HA proxy decides to stop
  *                          the HTTP message's processing and to send a message
diff --git a/include/types/proxy.h b/include/types/proxy.h
index 765e81d..5f19414 100644
--- a/include/types/proxy.h
+++ b/include/types/proxy.h
@@ -153,6 +153,7 @@
 
 #define PR_O2_FAKE_KA   0x00200000      /* pretend we do keep-alive with server eventhough we close */
 #define PR_O2_USE_HTX   0x00400000      /* use the HTX representation for the HTTP protocol */
+
 #define PR_O2_EXP_NONE  0x00000000      /* http-check : no expect rule */
 #define PR_O2_EXP_STS   0x00800000      /* http-check expect status */
 #define PR_O2_EXP_RSTS  0x01000000      /* http-check expect rstatus */
@@ -202,6 +203,21 @@
 #define PR_FBM_MISMATCH_NAME      0x02
 #define PR_FBM_MISMATCH_PROXYTYPE 0x04
 
+/* Bits for the different retry causes */
+#define PR_RE_CONN_FAILED         0x00000001 /* Retry if we failed to connect */
+#define PR_RE_DISCONNECTED        0x00000002 /* Retry if we got disconnected with no answer */
+#define PR_RE_TIMEOUT             0x00000004 /* Retry if we got a server timeout before we got any data */
+#define PR_RE_404                 0x00000008 /* Retry if we got a 404 */
+#define PR_RE_408                 0x00000010 /* Retry if we got a 408 */
+#define PR_RE_425                 0x00000020 /* Retry if we got a 425 */
+#define PR_RE_500                 0x00000040 /* Retry if we got a 500 */
+#define PR_RE_501                 0x00000080 /* Retry if we got a 501 */
+#define PR_RE_502                 0x00000100 /* Retry if we got a 502 */
+#define PR_RE_503                 0x00000200 /* Retry if we got a 503 */
+#define PR_RE_504                 0x00000400 /* Retry if we got a 504 */
+#define PR_RE_STATUS_MASK         (PR_RE_404 | PR_RE_408 | PR_RE_425 | \
+                                   PR_RE_500 | PR_RE_501 | PR_RE_502 | \
+                                   PR_RE_503 | PR_RE_504)
 struct stream;
 
 struct http_snapshot {
@@ -364,6 +380,7 @@
 	char *server_id_hdr_name;                   /* the header to use to send the server id (name) */
 	int server_id_hdr_len;                      /* the length of the id (name) header... name */
 	int conn_retries;			/* maximum number of connect retries */
+	unsigned int retry_type;                /* Type of retry allowed */
 	int redispatch_after;			/* number of retries before redispatch */
 	unsigned down_trans;			/* up-down transitions */
 	unsigned down_time;			/* total time the proxy was down */
diff --git a/include/types/stream_interface.h b/include/types/stream_interface.h
index 61937e0..6b30de5 100644
--- a/include/types/stream_interface.h
+++ b/include/types/stream_interface.h
@@ -83,6 +83,7 @@
 	SI_FL_RXBLK_CONN = 0x00100000,  /* other side is not connected */
 	SI_FL_RXBLK_ANY  = 0x001F0000,  /* any of the RXBLK flags above */
 	SI_FL_RX_WAIT_EP = 0x00200000,  /* stream-int waits for more data from the end point */
+	SI_FL_L7_RETRY   = 0x01000000,  /* The stream interface may attempt L7 retries */
 };
 
 /* A stream interface has 3 parts :
@@ -111,6 +112,7 @@
 	int conn_retries;	/* number of connect retries left */
 	unsigned int hcto;      /* half-closed timeout (0 = unset) */
 	struct wait_event wait_event; /* We're in a wait list */
+	struct buffer l7_buffer; /* To store the data, in case we have to retry */
 };
 
 /* operations available on a stream-interface */
diff --git a/src/cfgparse-listen.c b/src/cfgparse-listen.c
index 5f44cfd..7c64be1 100644
--- a/src/cfgparse-listen.c
+++ b/src/cfgparse-listen.c
@@ -395,6 +395,7 @@
 		curproxy->except_mask = defproxy.except_mask;
 		curproxy->except_to = defproxy.except_to;
 		curproxy->except_mask_to = defproxy.except_mask_to;
+		curproxy->retry_type = defproxy.retry_type;
 
 		if (defproxy.fwdfor_hdr_len) {
 			curproxy->fwdfor_hdr_len  = defproxy.fwdfor_hdr_len;
diff --git a/src/cfgparse.c b/src/cfgparse.c
index 48d53e9..dd99bbb 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -2446,6 +2446,12 @@
 			}
 		}
 
+		if ((curproxy->retry_type &~ PR_RE_CONN_FAILED) &&
+		    !(curproxy->options2 & PR_O2_USE_HTX)) {
+			ha_warning("Proxy '%s' : retry-on with any other keywords than 'conn-failure' will be ignored, requires 'option http-use-htx'.\n", curproxy->id);
+			err_code |= ERR_WARN;
+			curproxy->retry_type &= PR_RE_CONN_FAILED;
+		}
 		if (curproxy->email_alert.set) {
 		    if (!(curproxy->email_alert.mailers.name && curproxy->email_alert.from && curproxy->email_alert.to)) {
 			    ha_warning("config : 'email-alert' will be ignored for %s '%s' (the presence any of "
diff --git a/src/proto_htx.c b/src/proto_htx.c
index ee9a271..d8363a2 100644
--- a/src/proto_htx.c
+++ b/src/proto_htx.c
@@ -1386,6 +1386,45 @@
 	return 0;
 }
 
+/* Reset the stream and the backend stream_interface to a situation suitable for attempting a connection */
+/* Returns 0 if we can attempt to retry, -1 otherwise */
+static __inline int do_l7_retry(struct stream *s, struct stream_interface *si)
+{
+	struct channel *req, *res;
+	int co_data;
+
+	si->conn_retries--;
+	if (si->conn_retries < 0)
+		return -1;
+
+	req = &s->req;
+	res = &s->res;
+	/* Remove any write error from the request, and read error from the response */
+	req->flags &= ~(CF_WRITE_ERROR | CF_WRITE_TIMEOUT | CF_SHUTW | CF_SHUTW_NOW);
+	res->flags &= ~(CF_READ_ERROR | CF_READ_TIMEOUT | CF_SHUTR | CF_EOI | CF_READ_NULL | CF_SHUTR_NOW);
+	res->analysers = 0;
+	si->flags &= ~(SI_FL_ERR | SI_FL_EXP | SI_FL_RXBLK_SHUT);
+	si->state = SI_ST_REQ;
+	si->exp = TICK_ETERNITY;
+	res->rex = TICK_ETERNITY;
+	res->to_forward = 0;
+	res->analyse_exp = TICK_ETERNITY;
+	res->total = 0;
+	s->flags &= ~(SF_ASSIGNED | SF_ADDR_SET | SF_ERR_SRVTO | SF_ERR_SRVCL);
+	si_release_endpoint(&s->si[1]);
+	b_free(&req->buf);
+	/* Swap the L7 buffer with the channel buffer */
+	/* We know we stored the co_data as b_data, so get it there */
+	co_data = b_data(&si->l7_buffer);
+	b_set_data(&si->l7_buffer, b_size(&si->l7_buffer));
+	b_xfer(&req->buf, &si->l7_buffer, b_data(&si->l7_buffer));
+
+	co_set_data(req, co_data);
+	b_reset(&res->buf);
+	co_set_data(res, 0);
+	return 0;
+}
+
 /* This stream analyser waits for a complete HTTP response. It returns 1 if the
  * processing can continue on next analysers, or zero if it either needs more
  * data or wants to immediately abort the response (eg: timeout, error, ...). It
@@ -1406,6 +1445,7 @@
 	struct http_txn *txn = s->txn;
 	struct http_msg *msg = &txn->rsp;
 	struct htx *htx;
+	struct stream_interface *si_b = &s->si[1];
 	struct connection *srv_conn;
 	struct htx_sl *sl;
 	int n;
@@ -1453,6 +1493,17 @@
 			if (txn->flags & TX_NOT_FIRST)
 				goto abort_keep_alive;
 
+			if (si_b->flags & SI_FL_L7_RETRY) {
+				/* If we arrive here, then CF_READ_ERROR was
+				 * set by si_cs_recv() because we matched a
+				 * status, otherwise it would have removed
+				 * the SI_FL_L7_RETRY flag, so it's ok not
+				 * to check s->be->retry_type.
+				 */
+				if (co_data(rep) || do_l7_retry(s, si_b) == 0)
+					return 0;
+			}
+
 			_HA_ATOMIC_ADD(&s->be->be_counters.failed_resp, 1);
 			if (objt_server(s->target)) {
 				_HA_ATOMIC_ADD(&__objt_server(s->target)->counters.failed_resp, 1);
@@ -1484,6 +1535,11 @@
 
 		/* 2: read timeout : return a 504 to the client. */
 		else if (rep->flags & CF_READ_TIMEOUT) {
+			if ((si_b->flags & SI_FL_L7_RETRY) &&
+			    (s->be->retry_type & PR_RE_TIMEOUT)) {
+				if (co_data(rep) || do_l7_retry(s, si_b) == 0)
+					return 0;
+			}
 			_HA_ATOMIC_ADD(&s->be->be_counters.failed_resp, 1);
 			if (objt_server(s->target)) {
 				_HA_ATOMIC_ADD(&__objt_server(s->target)->counters.failed_resp, 1);
@@ -1527,6 +1583,12 @@
 			if (txn->flags & TX_NOT_FIRST)
 				goto abort_keep_alive;
 
+			if ((si_b->flags & SI_FL_L7_RETRY) &&
+			    (s->be->retry_type & PR_RE_DISCONNECTED)) {
+				if (co_data(rep) || do_l7_retry(s, si_b) == 0)
+					return 0;
+			}
+
 			_HA_ATOMIC_ADD(&s->be->be_counters.failed_resp, 1);
 			if (objt_server(s->target)) {
 				_HA_ATOMIC_ADD(&__objt_server(s->target)->counters.failed_resp, 1);
diff --git a/src/proxy.c b/src/proxy.c
index a3f355f..6e804a9 100644
--- a/src/proxy.c
+++ b/src/proxy.c
@@ -501,6 +501,62 @@
 	}
 }
 
+/* This function parses a "retry-on" statement */
+static int
+proxy_parse_retry_on(char **args, int section, struct proxy *curpx,
+                               struct proxy *defpx, const char *file, int line,
+                               char **err)
+{
+	int i;
+
+	if (!(*args[1])) {
+		memprintf(err, "'%s' needs at least one keyword to specify when to retry", args[0]);
+		return -1;
+	}
+	if (!(curpx->cap & PR_CAP_BE)) {
+		memprintf(err, "'%s' only available in backend or listen section", args[0]);
+		return -1;
+	}
+	curpx->retry_type = 0;
+	for (i = 1; *(args[i]); i++) {
+		if (!strcmp(args[i], "conn-failure"))
+			curpx->retry_type |= PR_RE_CONN_FAILED;
+		else if (!strcmp(args[i], "empty-response"))
+			curpx->retry_type |= PR_RE_DISCONNECTED;
+		else if (!strcmp(args[i], "response-timeout"))
+			curpx->retry_type |= PR_RE_TIMEOUT;
+		else if (!strcmp(args[i], "404"))
+			curpx->retry_type |= PR_RE_404;
+		else if (!strcmp(args[i], "408"))
+			curpx->retry_type |= PR_RE_408;
+		else if (!strcmp(args[i], "425"))
+			curpx->retry_type |= PR_RE_425;
+		else if (!strcmp(args[i], "500"))
+			curpx->retry_type |= PR_RE_500;
+		else if (!strcmp(args[i], "501"))
+			curpx->retry_type |= PR_RE_501;
+		else if (!strcmp(args[i], "502"))
+			curpx->retry_type |= PR_RE_502;
+		else if (!strcmp(args[i], "503"))
+			curpx->retry_type |= PR_RE_503;
+		else if (!strcmp(args[i], "504"))
+			curpx->retry_type |= PR_RE_504;
+		else if (!strcmp(args[i], "none")) {
+			if (i != 1 || *args[i + 1]) {
+				memprintf(err, "'%s' 'none' keyword only usable alone", args[0]);
+				return -1;
+			}
+		} else {
+			memprintf(err, "'%s': unknown keyword '%s'", args[0], args[i]);
+			return -1;
+		}
+
+	}
+
+
+	return 0;
+}
+
 /* This function inserts proxy <px> into the tree of known proxies. The proxy's
  * name is used as the storing key so it must already have been initialized.
  */
@@ -823,6 +879,9 @@
 	/* HTX is the default mode, for HTTP and TCP */
 	p->options2 |= PR_O2_USE_HTX;
 
+	/* Default to only allow L4 retries */
+	p->retry_type = PR_RE_CONN_FAILED;
+
 	HA_SPIN_INIT(&p->lock);
 }
 
@@ -1590,6 +1649,7 @@
 	{ CFG_LISTEN, "rate-limit", proxy_parse_rate_limit },
 	{ CFG_LISTEN, "max-keep-alive-queue", proxy_parse_max_ka_queue },
 	{ CFG_LISTEN, "declare", proxy_parse_declare },
+	{ CFG_LISTEN, "retry-on", proxy_parse_retry_on },
 	{ 0, NULL, NULL },
 }};
 
diff --git a/src/stream.c b/src/stream.c
index b3573c8..8c2ea55 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -323,6 +323,7 @@
 	if (flt_stream_init(s) < 0 || flt_stream_start(s) < 0)
 		goto out_fail_accept;
 
+	s->si[1].l7_buffer = BUF_NULL;
 	/* finish initialization of the accepted file descriptor */
 	if (appctx)
 		si_want_get(&s->si[0]);
@@ -475,6 +476,7 @@
 	tasklet_free(s->si[0].wait_event.task);
 	tasklet_free(s->si[1].wait_event.task);
 
+	b_free(&s->si[1].l7_buffer);
 	if (must_free_sess) {
 		sess->origin = NULL;
 		session_free(sess);
@@ -769,7 +771,7 @@
 
 	/* ensure that we have enough retries left */
 	si->conn_retries--;
-	if (si->conn_retries < 0) {
+	if (si->conn_retries < 0 || !(s->be->retry_type & PR_RE_CONN_FAILED)) {
 		if (!si->err_type) {
 			si->err_type = SI_ET_CONN_ERR;
 		}
@@ -2322,6 +2324,8 @@
 				 */
 				si_b->state = SI_ST_REQ; /* new connection requested */
 				si_b->conn_retries = s->be->conn_retries;
+				if (s->be->retry_type &~ PR_RE_CONN_FAILED)
+					si_b->flags |= SI_FL_L7_RETRY;
 			}
 		}
 		else {
diff --git a/src/stream_interface.c b/src/stream_interface.c
index 1e50c1f..731df38 100644
--- a/src/stream_interface.c
+++ b/src/stream_interface.c
@@ -30,8 +30,10 @@
 #include <proto/applet.h>
 #include <proto/channel.h>
 #include <proto/connection.h>
+#include <proto/http_htx.h>
 #include <proto/mux_pt.h>
 #include <proto/pipe.h>
+#include <proto/proxy.h>
 #include <proto/stream.h>
 #include <proto/stream_interface.h>
 #include <proto/task.h>
@@ -685,6 +687,34 @@
 		if (oc->flags & CF_STREAMER)
 			send_flag |= CO_SFL_STREAMER;
 
+		if ((si->flags & SI_FL_L7_RETRY) && !b_data(&si->l7_buffer)) {
+			/* If we want to be able to do L7 retries, copy
+			 * the data we're about to send, so that we are able
+			 * to resend them if needed
+			 */
+			/* Try to allocate a buffer if we had none.
+			 * If it fails, the next test will just
+			 * disable the L7 retries by clearing
+			 * SI_FL_L7_RETRY.
+			 */
+			if (!(oc->flags & CF_EOI))
+				si->flags &= ~SI_FL_L7_RETRY;
+			else {
+				if (b_is_null(&si->l7_buffer))
+					b_alloc(&si->l7_buffer);
+				if (b_is_null(&si->l7_buffer))
+					si->flags &= ~SI_FL_L7_RETRY;
+				else {
+					memcpy(b_orig(&si->l7_buffer),
+					       b_orig(&oc->buf),
+					       b_size(&oc->buf));
+					si->l7_buffer.head = co_data(oc);
+					b_add(&si->l7_buffer, co_data(oc));
+				}
+
+			}
+		}
+
 		ret = cs->conn->mux->snd_buf(cs, &oc->buf, co_data(oc), send_flag);
 		if (ret > 0) {
 			did_send = 1;
@@ -1268,6 +1298,27 @@
 			break;
 		}
 
+		if (si->flags & SI_FL_L7_RETRY) {
+			struct htx *htx;
+			struct htx_sl *sl;
+
+			htx = htxbuf(&ic->buf);
+			if (htx) {
+				sl = http_find_stline(htx);
+				if (sl && l7_status_match(si_strm(si)->be,
+				    sl->info.res.status)) {
+					/* If we got a status for which we would
+					 * like to retry the request, empty
+					 * the buffer and pretend there's an
+					 * error on the channel.
+					 */
+					ic->flags |= CF_READ_ERROR;
+					htx_reset(htx);
+					return 1;
+				}
+			}
+			si->flags &= ~SI_FL_L7_RETRY;
+		}
 		cur_read += ret;
 
 		/* if we're allowed to directly forward data, we must update ->o */