BUG/MEDIUM: channel: fix miscalculation of available buffer space (3rd try) Latest fix 8a32106 ("BUG/MEDIUM: channel: fix miscalculation of available buffer space (2nd try)") did happen to fix some observable issues but not all of them in fact, some corner cases still remained and at least one user reported a busy loop that appeared possible, though not easily reproducible under experimental conditions. The remaining issue is that we still consider min(i, to_fwd) as the number of bytes in transit, but in fact <i> is not relevant here. Indeed, what matters is that we can read everything we want at once provided that at the end, <i> cannot be larger than <size-maxrw> (if it was not already). This is visible in two cases : - let's have i=o=max/2 and to_fwd=0. Then i+o >= max indicates that the buffer is already full, while it is not since once <o> is forwarded, some space remains. - when to_fwd is much larger than i, it's obvious that we can fill the buffer. The only relevant part in fact is o + to_fwd. to_fwd will ensure that at least this many bytes will be moved from <i> to <o> hence will leave the buffer, whatever the number of rounds it takes. Interestingly, the fix applied here ensures that channel_recv_max() will now equal (size - maxrw - i + to_fwd), which is indeed what remains available below maxrw after to_fwd bytes are forwarded from i to o and leave the buffer. Additionally, the latest fix made it possible to meet an integer overflow that was not caught by the range test when forwarding in TCP or tunnel mode due to to_forward being added to an existing value, causing the buffer size to be limited when it should not have been, resulting in 2 to 3 recv() calls when a single one was enough. The first one was limited to the unreserved buffer size, the second one to the size of the reserve minus 1, and the last one to the last byte. Eg with a 2kB buffer : recvfrom(22, "HTTP/1.1 200\r\nConnection: close\r"..., 1024, 0, NULL, NULL) = 1024 recvfrom(22, "23456789.123456789.123456789.123"..., 1023, 0, NULL, NULL) = 1023 recvfrom(22, "5", 1, 0, NULL, NULL) = 1 This bug is still present in 1.6 and 1.5 so the fix should be backported there. (cherry picked from commit 169c47028a0e897fff12d7454d8254c17b68562e) (cherry picked from commit ff9c7e24fbbc33074e5257297e38473a3411f407) [wt: in 1.5 the function is called buffer_max_len and is identical]

commit: 467c14797f9b18ae07aafa391e467f245eca9716 [log] [tgz]
author: Willy Tarreau <w@1wt.eu> Wed Apr 20 18:05:17 2016 +0200
committer: Willy Tarreau <w@1wt.eu> Mon Apr 25 17:50:38 2016 +0200
tree: bafc6c1d50fe4260ac7ea366170590677a946376
parent: fdc90ca66612c6070e108553bb53c992aab67f67 [diff] [blame]
diff --git a/include/proto/channel.h b/include/proto/channel.h
index 739d04c..fbdf872 100644
--- a/include/proto/channel.h
+++ b/include/proto/channel.h

@@ -267,26 +267,64 @@
 /* Return the max number of bytes the buffer can contain so that once all the
  * pending bytes are forwarded, the buffer still has global.tune.maxrewrite
  * bytes free. The result sits between chn->size - maxrewrite and chn->size.
- * The principle is the following :
- *   - a non-connected buffer cannot touch the reserve
- *   - infinite forward can fill the buffer
- *   - all output bytes are ignored, they're leaving
- *   - all input bytes covered by to_forward are considered in transit and
- *     virtually don't take room
- *   - the reserve may be covered up to the min of (fwd-transit) since these
- *     bytes will be in transit later thus will only take temporary space.
+ * It is important to mention that if buf->i is already larger than size-maxrw
+ * the condition above cannot be satisfied and the lowest size will be returned
+ * anyway. The principles are the following :
+ *   1) a non-connected buffer cannot touch the reserve
+ *   2) infinite forward can always fill the buffer since all data will leave
+ *   3) all output bytes are considered in transit since they're leaving
+ *   4) all input bytes covered by to_forward are considered in transit since
+ *      they'll be converted to output bytes.
+ *   5) all input bytes not covered by to_forward as considered remaining
+ *   6) all bytes scheduled to be forwarded minus what is already in the input
+ *      buffer will be in transit during future rounds.
+ *   7) 4+5+6 imply that the amount of input bytes (i) is irrelevant to the max
+ *      usable length, only to_forward and output count. The difference is
+ *      visible when to_forward > i.
+ *   8) the reserve may be covered up to the amount of bytes in transit since
+ *      these bytes will only take temporary space.
  *
- * So the formula is to return this limit is :
- *    size - maxrewrite + min(fwd - min(i, fwd), maxrewrite)
- *  = size - maxrewrite + min( min(fwd - i, 0), maxrewrite)
+ * A typical buffer looks like this :
  *
- * The code isn't written the most obvious way because we help the compiler
- * optimise it as it cannot guess how to factor the result out. The most common
- * path is jumpless.
+ *      <-------------- max_len ----------->
+ *      <---- o ----><----- i ----->        <--- 0..maxrewrite --->
+ *      +------------+--------------+-------+----------------------+
+ *      |////////////|\\\\\\\\\\\\\\|xxxxxxx|        reserve       |
+ *      +------------+--------+-----+-------+----------------------+
+ *                   <- fwd ->      <-avail->
+ *
+ * Or when to_forward > i :
+ *
+ *      <-------------- max_len ----------->
+ *      <---- o ----><----- i ----->        <--- 0..maxrewrite --->
+ *      +------------+--------------+-------+----------------------+
+ *      |////////////|\\\\\\\\\\\\\\|xxxxxxx|        reserve       |
+ *      +------------+--------+-----+-------+----------------------+
+ *                                  <-avail->
+ *                   <------------------ fwd ---------------->
+ *
+ * - the amount of buffer bytes in transit is : min(i, fwd) + o
+ * - some scheduled bytes may be in transit (up to fwd - i)
+ * - the reserve is max(0, maxrewrite - transit)
+ * - the maximum usable buffer length is size - reserve.
+ * - the available space is max_len - i - o
+ *
+ * So the formula to compute the buffer's maximum length to protect the reserve
+ * when reading new data is :
+ *
+ *    max = size - maxrewrite + min(maxrewrite, transit)
+ *        = size - max(maxrewrite - transit, 0)
+ *
+ * But WARNING! The conditions might change during the transfer and it could
+ * very well happen that a buffer would contain more bytes than max_len due to
+ * i+o already walking over the reserve (eg: after a header rewrite), including
+ * i or o alone hitting the limit. So it is critical to always consider that
+ * bounds may have already been crossed and that available space may be negative
+ * for example. Due to this it is perfectly possible for this function to return
+ * a value that is lower than current i+o.
  */
 static inline int buffer_max_len(const struct channel *chn)
 {
-	int transit;
 	int reserve;
 
 	/* return size - maxrewrite if we can't send */
@@ -294,21 +332,13 @@
 	if (unlikely(!channel_may_send(chn)))
 		goto end;
 
-	/* This apparently tricky check is just a hint to let the compiler
-	 * optimize all this code away as long as we don't change the types.
+	/* chn->to_forward may cause an integer underflow when equal to
+	 * CHN_INFINITE_FORWARD but we check for it afterwards as it produces
+	 * quite better assembly code in this sensitive code path.
 	 */
-	reserve = 0;
-	if (CHN_INFINITE_FORWARD < MAX_RANGE(typeof(chn->buf->i)) &&
-	    chn->to_forward == CHN_INFINITE_FORWARD)
-		goto end;
-
-	transit = chn->buf->o + chn->to_forward - chn->buf->i;
-	if (transit < 0)
-		transit = 0;
-
-	reserve = global.tune.maxrewrite - transit;
-	if (reserve < 0)
-		reserve = 0;
+	reserve = global.tune.maxrewrite - chn->buf->o - chn->to_forward;
+	if (chn->to_forward == CHN_INFINITE_FORWARD || reserve < 0)
+		return chn->buf->size;
  end:
 	return chn->buf->size - reserve;
 }
commit	467c14797f9b18ae07aafa391e467f245eca9716	[log] [tgz]
author	Willy Tarreau <w@1wt.eu>	Wed Apr 20 18:05:17 2016 +0200
committer	Willy Tarreau <w@1wt.eu>	Mon Apr 25 17:50:38 2016 +0200
tree	bafc6c1d50fe4260ac7ea366170590677a946376
parent	fdc90ca66612c6070e108553bb53c992aab67f67 [diff] [blame]