 2012/02/27 - redesigning buffers for better simplicity - w@1wt.eu

 1) Analysis
 -----------

 Buffer handling becomes complex because buffers are circular but many of their
 users don't support wrapping operations (e.g. HTTP parsing). Due to this fact,
 some buffer operations automatically realign buffers as soon as possible when
 the buffer is empty, which makes it very hard to track buffer pointers outside
 of the buffer struct itself. The buffer contains a pointer to last processed
 data (buf->lr) which is automatically realigned with such operations. But in
 the end, its semantics are often unclear and whether it's safe or not to use it
 isn't always obvious, as it has acquired multiple roles over time.

 A "struct buffer" is declared this way :

     struct buffer {
 	unsigned int flags;             /* BF_* */
 	int rex;                        /* expiration date for a read, in ticks */
 	int wex;                        /* expiration date for a write or connect, in ticks */
 	int rto;                        /* read timeout, in ticks */
 	int wto;                        /* write timeout, in ticks */
 	unsigned int l;                 /* data length */
 	char *r, *w, *lr;               /* read ptr, write ptr, last read */
 	unsigned int size;              /* buffer size in bytes */
 	unsigned int send_max;          /* number of bytes the sender can consume from this buffer, <= l */
 	unsigned int to_forward;        /* number of bytes to forward after send_max without a wake-up */
 	unsigned int analysers;         /* bit field indicating what to do on the buffer */
 	int analyse_exp;                /* expiration date for current analysers (if set) */
 	void (*hijacker)(struct session *, struct buffer *); /* alternative content producer */
 	unsigned char xfer_large;       /* number of consecutive large xfers */
 	unsigned char xfer_small;       /* number of consecutive small xfers */
 	unsigned long long total;       /* total data read */
 	struct stream_interface *prod;  /* producer attached to this buffer */
 	struct stream_interface *cons;  /* consumer attached to this buffer */
 	struct pipe *pipe;		/* non-NULL only when data present */
 	char data[0];                   /* <size> bytes */
     };

 In order to address this, a struct http_msg was created with other pointers to
 the buffer. The issue is that some of these pointers are absolute and others
 are relative, sometimes to one another, sometimes to the beginning of the
 buffer, which doesn't help at all for the case where buffers get realigned.

 A "struct http_msg" is defined this way :

     struct http_msg {
 	unsigned int msg_state;
 	unsigned int flags;
 	unsigned int col, sov;      /* current header: colon, start of value */
 	unsigned int eoh;           /* End Of Headers, relative to buffer */
 	char *sol;                  /* start of line, also start of message when fully parsed */
 	char *eol;                  /* end of line */
 	unsigned int som;           /* Start Of Message, relative to buffer */
 	int err_pos;                /* err handling: -2=block, -1=pass, 0+=detected */
 	union {                     /* useful start line pointers, relative to ->sol */
 		struct {
 			int l;      /* request line length (not including CR) */
 			int m_l;    /* METHOD length (method starts at ->som) */
 			int u, u_l; /* URI, length */
 			int v, v_l; /* VERSION, length */
 		} rq;               /* request line : field, length */
 		struct {
 			int l;      /* status line length (not including CR) */
 			int v_l;    /* VERSION length (version starts at ->som) */
 			int c, c_l; /* CODE, length */
 			int r, r_l; /* REASON, length */
 		} st;               /* status line : field, length */
 	} sl;                       /* start line */
 	unsigned long long chunk_len;
 	unsigned long long body_len;
 	char **cap;
     };


 The first immediate observation is that nothing in a buffer should be relative
 to the beginning of the storage area, everything should be relative to the
 buffer's origin as a floating location. Right now the buffer's origin is equal
 to (buf->w + buf->send_max). It is the place where the first byte of data not
 yet scheduled for being forwarded is found.

   - buf->w is an absolute pointer, just like buf->data.
   - buf->send_max is a relative value which oscillates between 0 when nothing
     has to be forwarded, and buf->l when the whole buffer must be forwarded.
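
 As an illustration only, the current origin could be computed as below. The
 struct is a reduced, hypothetical view of struct buffer (old_buffer and
 old_buffer_origin() are names made up for this note, and data is shown as a
 plain pointer). The wrapping test is an assumption derived from the buffer
 being circular; it is not spelled out above.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical view of the current struct buffer: only the
 * fields needed to locate the origin are kept.
 */
struct old_buffer {
	char *w;                /* write ptr: next byte to be sent */
	unsigned int send_max;  /* bytes scheduled for forwarding, <= l */
	unsigned int l;         /* data length */
	unsigned int size;      /* storage size in bytes */
	char *data;             /* storage area (char data[0] in the real struct) */
};

/* Returns the buffer's origin, i.e. the first byte of data not yet
 * scheduled for being forwarded: (buf->w + buf->send_max), wrapped
 * back into the storage area when it runs past its end (assumption).
 */
static char *old_buffer_origin(const struct old_buffer *buf)
{
	size_t ofs;

	assert(buf->send_max <= buf->l && buf->l <= buf->size);

	ofs = (size_t)(buf->w - buf->data) + buf->send_max;
	if (ofs >= buf->size)
		ofs -= buf->size;
	return buf->data + ofs;
}
```

 The offset is computed relative to buf->data before wrapping so that no
 intermediate pointer ever leaves the storage area.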


 2) Proposal
 -----------

 By having such an origin, we could have everything in http_msg relative to this
 origin. This would resist buffer realigns much better than right now.

 At the moment we have msg->som which is relative to buf->data and which points
 to the beginning of the message. The beginning of the message should *always*
 be the buffer's origin. If data are to be skipped in the message, just wait for
 send_max to become zero and move the origin forwards ; this would definitely get
 rid of msg->som. This is already what is done in the HTTP parser except that it
 has to move both buf->lr and msg->som.

 Following the same principle, we should then have a relative pointer in
 http_msg to replace buf->lr. It would be relative to the buffer's origin and
 would simply recall what location was last visited.
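
 A minimal sketch of why origin-relative positions survive a realign is given
 below. All names (rel_buffer, rel_msg, rel_msg_last(), rel_buffer_realign())
 are hypothetical, msg->last stands for the replacement of buf->lr, and the
 sketch assumes that no output is pending and that the input does not wrap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplified buffer: an absolute origin plus the amount
 * of input data found after it. No pending output, no wrapping.
 */
struct rel_buffer {
	char *origin;           /* absolute pointer to the start of the message */
	unsigned int head;      /* bytes of input data after origin */
	char data[32];          /* storage area */
};

/* Hypothetical message state: only origin-relative positions. */
struct rel_msg {
	unsigned int last;      /* last visited location, relative to origin */
};

/* Translates the relative position into a pointer when it is needed. */
static char *rel_msg_last(const struct rel_buffer *buf,
                          const struct rel_msg *msg)
{
	return buf->origin + msg->last;
}

/* Moves the data back to the beginning of the storage area. Nothing in
 * rel_msg has to be touched since it never refers to buf->data.
 */
static void rel_buffer_realign(struct rel_buffer *buf)
{
	assert(buf->head <= sizeof(buf->data));
	memmove(buf->data, buf->origin, buf->head);
	buf->origin = buf->data;
}
```

 With an absolute pointer such as buf->lr, the realign would have had to fix
 the pointer as well; here only the origin moves.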

 Doing all this could result in more complex operations where more time is spent
 adding buf->w to buf->send_max and then to msg->anything. It would probably make
 more sense to define the buffer's origin as an absolute pointer and to have
 both the buf->h (head) and buf->t (tail) pointers be positive and negative
 positions relative to this origin. Operating on the buffer would then look like
 this :

   - no buf->l anymore. buf->l is replaced by (head + tail)
   - no buf->lr anymore. Use origin + msg->last for instance
   - recv() : head += recv(origin + head);
   - send() : tail -= send(origin - tail, tail);
     thus, buf->t effectively replaces buf->send_max.
   - forward(N) : tail += N; origin += N; head -= N;
   - realign() : origin = data
   - detect risk of wrapping of input : origin + head > data + size

 In general it looks like fewer pointers are manipulated for common operations
 and that maybe an additional wrapping test (hand-made modulo) will have to be
 added to send() and recv() operations.
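
 The operations listed above could be sketched as follows. This is not an
 implementation proposal: struct nbuf and the nbuf_*() functions are
 hypothetical names, nbuf_recv() and nbuf_send() copy from/to memory instead
 of calling recv()/send() on a socket, and the storage is treated as linear
 (the wrapping case is only detected, not handled). forward() also decrements
 head so that head + tail stays constant, which this sketch assumes is what is
 intended.

```c
#include <assert.h>
#include <string.h>

#define NBUF_SIZE 64

/* Hypothetical redesigned buffer: input lies in [origin, origin + head),
 * data scheduled for output lies in [origin - tail, origin).
 */
struct nbuf {
	char *origin;           /* absolute pointer to the first input byte */
	unsigned int head;      /* input bytes, after origin */
	unsigned int tail;      /* output bytes, before origin */
	char data[NBUF_SIZE];   /* storage area */
};

static void nbuf_init(struct nbuf *b)
{
	b->origin = b->data;
	b->head = b->tail = 0;
}

/* buf->l is gone, the total length is (head + tail) */
static unsigned int nbuf_len(const struct nbuf *b)
{
	return b->head + b->tail;
}

/* detect risk of wrapping of input: origin + head + extra > data + size */
static int nbuf_input_would_wrap(const struct nbuf *b, unsigned int extra)
{
	return (unsigned int)(b->origin - b->data) + b->head + extra > NBUF_SIZE;
}

/* recv(): head += recv(origin + head), limited to the contiguous room */
static unsigned int nbuf_recv(struct nbuf *b, const char *src, unsigned int len)
{
	unsigned int room = NBUF_SIZE - (unsigned int)(b->origin - b->data) - b->head;

	if (len > room)
		len = room;
	memcpy(b->origin + b->head, src, len);
	b->head += len;
	return len;
}

/* send(): tail -= send(origin - tail, tail) */
static unsigned int nbuf_send(struct nbuf *b, char *dst, unsigned int len)
{
	if (len > b->tail)
		len = b->tail;
	memcpy(dst, b->origin - b->tail, len);
	b->tail -= len;
	return len;
}

/* forward(N): schedule <n> input bytes for output by moving the origin */
static void nbuf_forward(struct nbuf *b, unsigned int n)
{
	if (n > b->head)
		n = b->head;
	b->origin += n;
	b->tail += n;
	b->head -= n;
}

/* realign(): origin = data, only done here when the buffer is empty */
static void nbuf_realign(struct nbuf *b)
{
	if (nbuf_len(b) == 0)
		b->origin = b->data;
}
```

 For instance, receiving "hello world" and forwarding 5 bytes leaves tail = 5
 and head = 6, and the next nbuf_send() emits "hello" without any pointer
 other than the origin having been moved.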


 3) Caveats
 ----------

 The first caveat is that the elements to modify appear in a very large number
 of places.