blob: 136377fd003d0d0f055c915f205ebeea801448dd [file] [log] [blame]
Willy Tarreaufa9f9cc2018-05-18 16:18:17 +020012018-05-18 - Buffer rework
2
31. Summary
4
5The situation with the current buffer structure is becoming problematic in
6the newly introduced muxes and causes serious difficulties preventing muxes
7from being used on both sides, unless requiring that all code is duplicated
8to use buf->i on the Rx path and buf->o on the Tx path.
9
10
112. History
12
13A very long time ago, buffers were used to receive data using the recv() call,
14to parse them, forward them, and send them over the network using the send()
15call. Buffers were split into (buffer,channel) when some layers started to be
16introduced, and were reorganized a few times to ease content processing and
17rewriting. The current definition of the buffer structure is the following :
18
19 struct buffer {
20 char *p;
21 uint size;
22 uint i;
23 uint o;
24 char data[0];
25 };
26
27 data p
28 | |
29 V V
30 +-----------+--------------------+------------+-------------+
31 | |////////////////////|////////////| |
32 +-----------+--------------------+------------+-------------+
33 <---------------------------------------------------------> size
34 <------------------> <---------->
35 o i
36
37Pointer (p) is initialized to (data) when the buffer is empty. Data are
38received after (p+i), increasing (i) by the number of bytes read. Data are sent
39from (p-o) for up to (o) bytes, decreasing (o) by the number of bytes sent.
40Forwarding data in the channel consists in advancing (p) by the number of bytes
41to forward, increasing (o) and decreasing (i) by the same amount.
42
43This representation is convenient for channel operations because most of them
44require to parse input data between (p) and (p+i), and to have a simple way to
45forward data. Additionally, it's always easy to know if some data are scheduled
46for departure (o), or if the buffer has some room available (size-i-o).
47
48
493. Problems
50
51When applets were introduced, the initial code that was made to write data into
52the output part was modified to send it into the input part since we had to
53rely on the stream code to forward these data via the channel. This explains
54the flood of bi_* functions that were introduced to perform the same operations
55as the initial bo_*, to write into an input buffer from an applet.
56
57Health checks however continue to use output because checks do not use streams
58nor channels. Thus the check buffers use buf->o for requests and buf->i for
59responses.
60
61The introduction of muxes has changed this again by requiring that most request
62code was able to write to buf->i, pretending to be the equivalent of a socket
63recv() call. New bi_* functions had to be created to write headers and chunks
64from the HTTP/2 mux. Conversely, it was made necessary to parse HTTP traffic
65from buf->o while all the original code was made to parse this from buf->i.
66
67Furthermore, implementing an outgoing mux (eg: HTTP/2) will require to
68duplicate a lot of the code to use buf->i instead of buf->o and conversely,
69just because the mux will not be placed on the same side of the buffer. Not
70only it complicates code maintenance but it also emphasizes the risk to use
71the wrong function at any moment.
72
73From a performance perspective, applets have to suffer a useless copy most of
74the time, only due to API limitatoins : it is not possible to write directly to
75an input buffer, one has to write to a chunk and then copy it into a buffer. A
76compatible structure could allow to share the same data between the chunk and
77the buffer without having to perform an extra copy.
78
79
804. Proposal
81
82In checks and muxes, it is obvious that a single "side" of the buffer is used,
83and it generally is the one associated with the I/O to be performed. Only the
84channel requires the distinction between (i) and (o).
85
86The proposal is to remove this distinction from the buffer and move ->o into
87the channel.
88
89A buffer will then become only a linear (possibly wrapping) storage area with
90a beginning, and an end.
91
92Given the experience gathered from past buffer API updates, we know that the
93buffer's end is not as much important as its data length. This will give the
94current representation :
95
96
97 struct buffer {
98 void *area; // start of the storage area
99 uint head; // start offset of remaining data relative to area
100 uint len; // contents length after head
101 uint size; // size of the storage area (wrapping point)
102 };
103
104 area
105 |
106 V
107 +-----------+---------------------------------+-------------+
108 | |/////////////////////////////////| |
109 +-----------+---------------------------------+-------------+
110 <---------------------------------------------------------> size
111 <---------> <------------------------------->
112 head len
113
114The channel will contain an "output" field corresponding to the current buf->o,
115indicating how many bytes of the current buffer are actually scheduled for
116being forwarded and must not be considered anymore. It means that a stream
117parser will simply start to parse from (buf->area + buf->head + chn->output)
118and stop at (buf->area + buf->head + buf->len).
119
120For esnding data, the caller of cs_send() or whatever function will have to
121pass the desired number of bytes to send, and one will not expect anymore that
122all the buffer's contents have to be sent. In general the caller will have
123access to chn->output if it needs to use this (typically from the stream
124interface code at the moment).
125
126
1275. First implementation step
128
129The first step will consist in limiting the changes to the current buffers. The
130buffer structure will still contain both a descriptor and the storage area. A
131buffer will first be declared this way :
132
133 struct buffer {
134 uint head; // start offset of remaining data relative to area
135 uint len; // contents length after head
136 uint size; // size of the storage area (wrapping point)
137 void area[0]; // start of the storage area
138 };
139
140Thanks to this, no changes will have to be performed on memory management, and
141buffers will continue to be allocated from a pool of size (sizeof(buffer) +
142tune.bufsize).
143
144The following translations will have to be performed on the code :
145 - occurrences of (buf->i + buf->o) will have to be replaced with (buf->len)
146 - bi_ptr() -> ci_ptr() ; bi_end() -> b_head()+b_size() ; bi_del() -> b_del()
147 - bo_ptr() -> b_head() ; bo_end() -> co_end()
148 - b_adv() -> c_adv() ; b_rew() -> c_rew()
149 - buf->o will have to be replaced with either chn->output or a function
150 argument containing a copy of chn->output. These ones should cancel out
151 at the end of the operation.
152 - buf->i -> (b_len(buf) - chn->output)
153
154Temporary difficulties :
155 - compression makes use of both (i) and (o), by taking care of only touching
156 (i) and never (o). The filters know how not to touch (o), and the internal
157 compression API needs a small update so that this previous ->o value is
158 passed as an argument that the filter will collect from the channel. If it
159 is simpler (it probably isn't), a temporary "struct oldbuf" could be
160 created to emulate the old behaviour and be fed/used by the filters code.
161
162 - buffer_slow_realign() distinguishes input data from output data so that the
163 output data is always placed at the end, leaving a clean contigous buffer
164 once forwarded. Instead, a "split" argument will have to be added so that
165 the caller may decide where to split the contents. Muxes will pass zero
166 here while channels will pass chn->output.
167
168
1696. Second implementation step
170
171The second step will consist in making "struct buffer" only hold the descriptor
172and not the data anymore. It will then look like this :
173
174 struct buffer {
175 void *area; // start of the storage area
176 uint head; // start offset of remaining data relative to area
177 uint len; // contents length after head
178 uint size; // size of the storage area (wrapping point)
179 };
180
181Each buffer allocation will have to atomically allocate a struct buffer and an
182area. Buffer copies will consist in exchanging the "struct buffer" contents
183only.
184
185The chunk API must then be updated so that some new versions of chunk_putblk(),
186chunk_printf() etc can write to a storage area, and so that bi_putchk() and
187bo_putchk() instead can swap the storage areas when possible.
188
189At this point b_size() will be used to know where to release the allocated
190area. The storage will simply consist in (start,len) which is perfectly suited
191to have slabs. Just like chunks, b_size()==0 can be used to mention that no
192free() must be done on the area. Doing so will make it much simpler to send
193pre-formated messages (eg: error messages) into a buffer because such messages
194will then be stored into a "struct ist" and sending such a message will be as
195simple as doing :
196
197 b->area = ist.str;
198 b->len = ist.len;
199 b->head = 0;
200 b->size = 0;
201
202The chunk struct can then be removed and replaced everywhere with a struct
203buffer. Only the functions will remain, though they will likely have to be
204renamed. Maybe the buffer manipulation functions will have to be split between
205those which support wrapping and those which don't (chunks don't support
206wrapping).
207
208The buf_empty structure will then disappear since a small 20-bytes structure
209will be enough to represent an empty buffer.
210
211
2127. Third implementation step
213
214The third step will consist in placing a struct buffer into the struct channel.
215This way no allocation is needed at all, and any storage can be used to deliver
216contents. This allows to trivially upgrade a buffer on the fly by picking from
217a different slab. It also allows to deliver error messages without ever having
218to perform a buffer allocation. Doing so removes the need for the early buffer
219allocation for the response in process_stream(), as it is only needed to have a
220reliable place to send an error message to. This will ensure the buffer
221allocator can be simplified and made more robust against the risk of deadlock
222on memory shortage.
223
224
2258. Caveats
226
227The following points require extra care :
228 - there will be some subtracts to figure the buffer "input" length (formerly
229 buf->i). In the past it always used to be an unsigned value. Extreme care
230 will have to be taken to always use an inline function to compute this so
231 that it doesn't accidently become signed.
232
233 - supporting buf->size==0 to point to a special string may require some extra
234 checks to avoid causing an integer underflow when calculating (size-len) or
235 (size-len-head) to figure the available room.
236
237 - it is very likely that some further changes will be tempting to do in the
238 channel to better integrate the buffer (which becomes very similar to the
239 pipe), but we must not go as far as removing the visibility of the "struct
240 buffer" because it will be used as entry point for many functions.
241
242 - it is likely that a number of the chunk_*, bi_* and bo_* variants have very
243 minor variations like return codes or error checking that can make their
244 replacement very tricky. Each set of such functions must be studied in
245 advance, and their users as well.