2014/10/23 - design thoughts for HTTP/2

- connections : HTTP/2 depends a lot more on a connection than HTTP/1 because a
  connection holds a compression context (headers table, etc.). We probably
  need to have an h2_conn struct.

- multiple transactions will be handled in parallel for a given h2_conn. They
  are called streams in HTTP/2 terminology.

- multiplexing : for a given client-side h2 connection, we can have multiple
  server-side h2 connections. And for a server-side h2 connection, we can have
  multiple client-side h2 connections. Streams circulate in N-to-N fashion.

- flow control : flow control will be applied between multiple streams. Special
  care must be taken so that an H2 client cannot block some H2 servers by
  sending requests spread over multiple servers to the point where one server
  response is blocked and prevents other responses from the same server from
  reaching their clients. H2 connection buffers must always be empty or nearly
  empty. The per-stream flow control needs to be respected as well as the
  connection's buffers. It is important to implement some fairness between all
  the streams so that it's not always the same stream which gets the bandwidth
  when the connection is congested.

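The accounting described above can be sketched in C. This is a minimal sketch, assuming a hypothetical h2s_fc structure per stream: a stream never sends more than its pending data, its own window and the connection window (RFC 7540 section 6.9), and a round-robin pass with a fixed quantum gives every stream a turn so that the same stream does not always get the bandwidth. The names and the quantum are illustrative, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stream flow-control state. */
struct h2s_fc {
    int64_t pending;   /* DATA bytes waiting to be sent */
    int64_t window;    /* peer's window for this stream */
};

/* Bytes a stream may send right now: bounded by its pending data, its own
 * window and the window shared by the whole connection. */
static int64_t h2_sendable(const struct h2s_fc *s, int64_t conn_window)
{
    int64_t n = s->pending;

    if (n > s->window)
        n = s->window;
    if (n > conn_window)
        n = conn_window;
    return n > 0 ? n : 0;
}

/* One round-robin pass starting at *next: each stream sends at most
 * <quantum> bytes so that a large stream cannot take the whole connection
 * window. The next pass starts one stream further. Returns bytes sent. */
static int64_t h2_send_round(struct h2s_fc *st, size_t nb, size_t *next,
                             int64_t *conn_window, int64_t quantum)
{
    int64_t total = 0;

    if (nb == 0)
        return 0;
    for (size_t i = 0; i < nb && *conn_window > 0; i++) {
        struct h2s_fc *s = &st[(*next + i) % nb];
        int64_t n = h2_sendable(s, *conn_window);

        if (n > quantum)
            n = quantum;
        s->pending -= n;
        s->window -= n;
        *conn_window -= n;
        total += n;
    }
    *next = (*next + 1) % nb;
    return total;
}
```

For example, with a 65535-byte connection window and a 16384-byte quantum, a stream holding 100000 pending bytes only sends 16384 bytes in one pass, and a 10-byte stream behind it is still served during the same pass.
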
- some clients can be H1 with an H2 server (is this really needed ?). Most of
  the initial use cases will be H2 clients to H1 servers. It is important to
  keep in mind that H1 servers do not do flow control and that we don't want
  them to block transfers (eg: POST uploads).

- internal tasks : some H2 clients will be internal tasks (eg: health checks).
  Some H2 servers will be internal tasks (eg: stats, cache). The model must be
  compatible with this use case.

- header indexing : headers are transported compressed, with a reference to a
  static or a dynamic header, or a literal, possibly Huffman-encoded. Indexing
  is specific to the H2 connection. This means that no encoded header block can
  be passed as-is from one side to the other: headers will have to be decoded
  according to the incoming connection's context and re-encoded according to
  the outgoing connection's context, which can significantly differ. In order
  to avoid the parsing trouble we currently face, headers will have to be
  clearly split between name and value. It is worth noting that neither the
  incoming nor the outgoing connections' contexts will be of any use while
  processing the headers. At best we can have some shortcuts for well-known
  names that map well to the static ones (eg: use the first static entry with
  same name), and maybe have a few special cases for static name+value as well.
  Probably we can classify headers in such categories :

    - static name + value
    - static name + other value
    - dynamic name + other value

  This will allow for better processing in some specific cases. Headers
  supporting a single value (:method, :status, :path, ...) should probably
  be stored in a single location with a direct access. That would allow us
  to retrieve a method using hdr[METHOD]. All such indexing must be performed
  while parsing. That also means that HTTP/1 will have to be converted to this
  representation very early in the parser and possibly converted back to H/1
  after processing.

  Header names/values will have to be placed in a small memory area that will
  inevitably get fragmented as headers are rewritten. An automatic packing
  mechanism must be implemented so that when there's no more room, headers are
  simply defragmented/packed into a new table and the old one is released. Just
  like for the static chunks, we need to have a few such tables pre-allocated
  and ready to be swapped at any moment. Repacking must not change any index
  nor affect the way headers are compressed so that it can happen late after a
  retry (send-name-header for example).

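A possible shape for the three categories and for the direct access to single-value headers, as a sketch: the table below is only an excerpt of the HPACK static table (RFC 7541, Appendix A), and hdr_classify(), hdr_slot_of() and the HDR_* constants are hypothetical names. The index returned for a known name with another value is the first static entry with that name, as suggested above.

```c
#include <stddef.h>
#include <string.h>

/* The three categories listed above. */
enum hdr_class { HDR_STATIC_NV, HDR_STATIC_N, HDR_DYNAMIC };

/* Excerpt of the HPACK static table (RFC 7541, Appendix A). */
static const struct { int idx; const char *n, *v; } sttbl[] = {
    { 1, ":authority", "" },      { 2, ":method", "GET" },
    { 3, ":method", "POST" },     { 4, ":path", "/" },
    { 5, ":path", "/index.html" }, { 6, ":scheme", "http" },
    { 7, ":scheme", "https" },    { 8, ":status", "200" },
    { 38, "host", "" },
};

/* Classifies a header and stores in *idx the static index to reference:
 * the exact entry when name and value match, else the first entry with
 * the same name, else 0. */
static enum hdr_class hdr_classify(const char *n, const char *v, int *idx)
{
    *idx = 0;
    for (size_t i = 0; i < sizeof(sttbl) / sizeof(sttbl[0]); i++) {
        if (strcmp(sttbl[i].n, n) != 0)
            continue;
        if (strcmp(sttbl[i].v, v) == 0) {
            *idx = sttbl[i].idx;
            return HDR_STATIC_NV;
        }
        if (!*idx)
            *idx = sttbl[i].idx;
    }
    return *idx ? HDR_STATIC_N : HDR_DYNAMIC;
}

/* Single-value pseudo-headers get a fixed slot, so that hdr[HDR_METHOD]
 * gives direct access to the method. */
enum hdr_slot { HDR_METHOD, HDR_SCHEME, HDR_AUTHORITY, HDR_PATH, HDR_STATUS, HDR_SLOTS };

static int hdr_slot_of(const char *n)
{
    static const char *const names[HDR_SLOTS] = {
        ":method", ":scheme", ":authority", ":path", ":status" };

    for (int i = 0; i < HDR_SLOTS; i++)
        if (strcmp(names[i], n) == 0)
            return i;
    return -1;   /* regular header, stored in the list */
}
```
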
- header processing : can still happen on a (header, value) basis. Reqrep/
  rsprep completely disappear and will have to be replaced with something else
  to support renaming headers and rewriting url/path/...

- push_promise : servers can push dummy requests+responses. They advertise
  the promised stream ID in a PUSH_PROMISE frame sent on the associated stream.
  This means that it is possible to initiate a client-server stream from the
  information coming from the server and make the data flow as if the client
  had made it. It's likely that we'll have to support two types of server
  connections: those which support push and those which do not. That way client
  streams will be distributed to existing server connections based on their
  capabilities. It's important to keep in mind that PUSH will not be rewritten
  in responses.

- stream ID mapping : since the stream ID is per H2 connection, stream IDs will
  have to be mapped. Thus a given stream is an entity with two IDs (one per
  side). Or more precisely a stream has two end points, each one carrying an ID
  when it ends on an HTTP/2 connection. Also, for each stream ID we need to
  quickly find the associated transaction in progress. A small, fast tree with
  unique keys seems appropriate considering the wide range of valid values.

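A sketch of the per-connection lookup, assuming a hypothetical stream_ends structure that carries both IDs. Each connection keeps its own map keyed by its local stream ID. A sorted array searched with bsearch() is swapped in here for the tree to keep the example short; it is not necessarily what an implementation handling many streams would use.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A stream has two end points, each carrying its own ID when it ends on an
 * H2 connection (0 otherwise). */
struct stream_ends {
    uint32_t front_id;   /* ID on the client-side connection */
    uint32_t back_id;    /* ID on the server-side connection */
};

/* Per-connection map from the local stream ID to the stream, kept sorted
 * by ID. The fixed capacity is a simplification for this sketch. */
struct sid_ent { uint32_t id; struct stream_ends *strm; };
struct sid_map { struct sid_ent ent[64]; size_t nb; };

static int sid_cmp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t id = ((const struct sid_ent *)elem)->id;

    return (k > id) - (k < id);
}

/* Returns 0 on success, -1 if the map is full or the ID already exists. */
static int sid_map_insert(struct sid_map *m, uint32_t id, struct stream_ends *strm)
{
    size_t pos = 0;

    if (m->nb == sizeof(m->ent) / sizeof(m->ent[0]))
        return -1;
    while (pos < m->nb && m->ent[pos].id < id)
        pos++;
    if (pos < m->nb && m->ent[pos].id == id)
        return -1;
    if (pos < m->nb)   /* shift larger IDs to keep the array sorted */
        memmove(&m->ent[pos + 1], &m->ent[pos], (m->nb - pos) * sizeof(m->ent[0]));
    m->ent[pos].id = id;
    m->ent[pos].strm = strm;
    m->nb++;
    return 0;
}

static struct stream_ends *sid_map_lookup(const struct sid_map *m, uint32_t id)
{
    const struct sid_ent *e = bsearch(&id, m->ent, m->nb, sizeof(m->ent[0]), sid_cmp);

    return e ? e->strm : NULL;
}
```

The same stream is inserted under ID 1 in the client-side map and under ID 3 in the server-side map, so a frame arriving on either connection finds it with a single lookup.
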
- frame sizes : frames have to be remapped between both sides as multiplexed
  connections won't always have the same characteristics. Thus some frames
  might be merged and others will be split.

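Splitting in one direction can be sketched as follows, assuming a hypothetical h2_slice() helper that only computes frame lengths: the outgoing side must respect the peer's SETTINGS_MAX_FRAME_SIZE, which is 16384 bytes unless the peer advertised a larger value (RFC 7540 section 6.5.2). Merging small frames in the other direction is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: splits <len> bytes of DATA received on one side into
 * frames no larger than the other side's SETTINGS_MAX_FRAME_SIZE. Stores up
 * to <max> frame lengths into <out> and returns the number of frames
 * needed, which may exceed <max>. Returns 0 if <peer_max> is 0. */
static size_t h2_slice(size_t len, uint32_t peer_max, uint32_t *out, size_t max)
{
    size_t nb = 0;

    if (peer_max == 0)
        return 0;
    while (len > 0) {
        uint32_t n = len > peer_max ? peer_max : (uint32_t)len;

        if (nb < max)
            out[nb] = n;
        nb++;
        len -= n;
    }
    return nb;
}
```

40000 bytes of DATA become frames of 16384, 16384 and 7232 bytes for a peer using the default setting, or a single frame for a peer advertising 65536.
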
- error processing : care must be taken to never break a connection unless it
  is dead or corrupt at the protocol level. Stats counters must exist to observe
  the causes. Timeouts are a big problem because silent connections might die
  from inactivity. Ping frames should probably be scheduled a few seconds
  before the connection timeout so that an unused connection is verified before
  being killed. Abnormal requests must be dealt with using RST_STREAM.

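The ping scheduling idea, as a minimal sketch with times in milliseconds: h2_ping_date(), h2_idle_check() and the margin parameter are hypothetical. It assumes that receiving the PING acknowledgement refreshes last_activity, so that a verified connection is not killed.

```c
#include <stdint.h>

/* Date at which a PING should be sent on an idle connection so that the
 * answer can come back before the timeout, leaving <margin> ms. If the
 * timeout is shorter than the margin, ping as soon as the connection is
 * idle. */
static uint64_t h2_ping_date(uint64_t last_activity, uint64_t timeout, uint64_t margin)
{
    if (timeout <= margin)
        return last_activity;
    return last_activity + timeout - margin;
}

enum idle_action { IDLE_WAIT, IDLE_PING, IDLE_KILL };

/* Decides what to do at <now> with a connection idle since <last_activity>.
 * <ping_pending> is set once a PING was sent and not yet acknowledged. */
static enum idle_action h2_idle_check(uint64_t now, uint64_t last_activity,
                                      uint64_t timeout, uint64_t margin,
                                      int ping_pending)
{
    if (now >= last_activity + timeout)
        return IDLE_KILL;
    if (!ping_pending && now >= h2_ping_date(last_activity, timeout, margin))
        return IDLE_PING;
    return IDLE_WAIT;
}
```

With a 30 s timeout and a 5 s margin, a connection idle since t=1 s gets a PING at t=26 s and is only killed at t=31 s if nothing came back.
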
- ALPN : ALPN must be observed on the client side, and transmitted to the server
  side.

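When the protocol observed on the client side has to be offered to the server, it must be encoded as an ALPN protocol list: each name preceded by a one-byte length (RFC 7301, section 3.1). This wire format is also what OpenSSL's SSL_CTX_set_alpn_protos() expects. The alpn_encode() helper below is a hypothetical sketch.

```c
#include <stddef.h>
#include <string.h>

/* Builds an ALPN protocol list in wire format: each name is preceded by a
 * one-byte length. Returns the encoded length, or 0 if a name is empty,
 * longer than 255 bytes, or <out> is too small. */
static size_t alpn_encode(const char *const *names, size_t nb,
                          unsigned char *out, size_t outlen)
{
    size_t pos = 0;

    for (size_t i = 0; i < nb; i++) {
        size_t l = strlen(names[i]);

        if (l == 0 || l > 255 || pos + 1 + l > outlen)
            return 0;
        out[pos++] = (unsigned char)l;
        memcpy(out + pos, names[i], l);
        pos += l;
    }
    return pos;
}
```
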
- proxy protocol : proxy protocol makes little to no sense in a multiplexed
  protocol. A per-stream equivalent will surely be needed if implementations
  do not quickly generalize the use of the Forwarded header.

- simplified protocol for local devices (eg: haproxy->varnish in clear and
  without handshake, and possibly even with splicing if the connection's
  settings are shared).

- logging : logging must report some extra information such as the stream ID,
  and whether the transaction was initiated by the client or by the server
  (which can be deduced from the stream ID's parity). In case of push, the
  number of the associated stream must also be reported.

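The initiator can be derived from the stream ID alone, since client-initiated streams use odd IDs and server-initiated (pushed) ones use even IDs, stream 0 being the connection itself (RFC 7540, section 5.1.1). The log fragment format and function names below are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

static const char *h2_stream_origin(uint32_t id)
{
    if (id == 0)
        return "conn";
    return (id & 1) ? "client" : "server";
}

/* Writes the H2-specific part of a log line. For pushed (even) streams,
 * <assoc> is the associated stream taken from the PUSH_PROMISE frame.
 * Returns the length as snprintf() does. */
static int h2_log_fragment(char *buf, size_t len, uint32_t id, uint32_t assoc)
{
    if (id && !(id & 1))
        return snprintf(buf, len, "sid=%u origin=%s assoc=%u",
                        (unsigned)id, h2_stream_origin(id), (unsigned)assoc);
    return snprintf(buf, len, "sid=%u origin=%s",
                    (unsigned)id, h2_stream_origin(id));
}
```

A request on stream 3 is logged as "sid=3 origin=client", and a response pushed on stream 2 for stream 1 as "sid=2 origin=server assoc=1".
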
- memory usage : H2 increases memory usage by mandating support for a frame
  size of at least 16384 bytes. That means slightly more than 16kB of buffer in
  each direction to process any frame. It will definitely have an impact on the
  deployed maxconn setting in places using less than this (4..8kB are common).
  Also, the header list is persistent per connection, so if we reach the same
  size as the request, that's another 16kB in each direction, resulting in
  about 48kB of memory where 8 were previously used. A more careful encoder
  can work with a much smaller set even if that implies evicting entries
  between multiple headers of the same message.

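The eviction mentioned above follows RFC 7541: each dynamic table entry costs the length of its name plus the length of its value plus 32 bytes (section 4.1), and the oldest entries are evicted until a new one fits (section 4.4). An encoder may use a smaller table than the decoder allows by announcing it with a dynamic table size update (section 6.3). The sketch below only tracks sizes; the hpack_dt structure is hypothetical and assumes a table of at most 4096 bytes.

```c
#include <stddef.h>
#include <string.h>

#define HPACK_ENTRY_OVERHEAD 32  /* RFC 7541, section 4.1 */
#define DT_SLOTS 128             /* enough for 4096 bytes: each entry costs >= 32 */

/* Size accounting of an HPACK dynamic table (the strings themselves are
 * not stored here). Entries form a FIFO, oldest at <head>. <max> is
 * assumed never to exceed 4096 bytes. */
struct hpack_dt {
    size_t sz[DT_SLOTS];
    size_t head, nb;
    size_t used, max;
};

static void dt_evict_oldest(struct hpack_dt *dt)
{
    dt->used -= dt->sz[dt->head];
    dt->head = (dt->head + 1) % DT_SLOTS;
    dt->nb--;
}

/* Adds an entry, evicting the oldest ones until it fits. An entry larger
 * than the whole table empties it and is not added. Returns 1 if the entry
 * was added, 0 otherwise. */
static int dt_add(struct hpack_dt *dt, const char *name, const char *value)
{
    size_t esz = strlen(name) + strlen(value) + HPACK_ENTRY_OVERHEAD;

    while (dt->nb && dt->used + esz > dt->max)
        dt_evict_oldest(dt);
    if (esz > dt->max)
        return 0;
    dt->sz[(dt->head + dt->nb) % DT_SLOTS] = esz;
    dt->nb++;
    dt->used += esz;
    return 1;
}
```

With a 100-byte table, adding "host: example.org" (47 bytes) then an entry of 43 bytes fills 90 bytes; a third entry of 36 bytes evicts the first one.
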
- HTTP/1.0 should be transported over H2 very carefully. Since there's no way
  to pass version information in the protocol, the server could use some
  features of HTTP/1.1 that are unsafe in HTTP/1.0 (compression, trailers,
  ...).

- host / :authority : ":authority" is the norm, and "host" will be absent when
  H2 clients generate :authority. This probably means that a dummy Host header
  will have to be produced internally from :authority and removed when passing
  to H2 behind. This can cause some trouble when passing H2 requests to H1
  proxies, because there's no way to know if the request should contain scheme
  and authority in H1 or not based on the H2 request. Thus a "proxy" option
  will have to be explicitly mentioned on HTTP/1 server lines. One of the
  problems that it creates is that it's no longer possible to pass H/1 requests
  to H/1 proxies without an explicit configuration. Maybe a table of the
  various combinations is needed.

                       :scheme   :authority   host
    HTTP/2 request     present   present      absent
    HTTP/1 server req  absent    absent       present
    HTTP/1 proxy req   present   present      present

  So in the end the issue is only with H/2 requests passed to H/1 proxies.

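The difference between the last two rows of the table is in the request line. A minimal sketch, assuming a hypothetical to_proxy flag standing for the explicit "proxy" option on the server line: origin servers get the path only, proxies get the absolute form with scheme and authority, and in both cases Host is produced from :authority.

```c
#include <stdio.h>

/* Rebuilds the start of an H1 request from H2 pseudo-headers. Returns the
 * length as snprintf() does. */
static int h1_request_head(char *buf, size_t len, const char *method,
                           const char *scheme, const char *authority,
                           const char *path, int to_proxy)
{
    if (to_proxy)   /* absolute form: scheme and authority in the URI */
        return snprintf(buf, len, "%s %s://%s%s HTTP/1.1\r\nHost: %s\r\n",
                        method, scheme, authority, path, authority);
    /* origin form: path only, authority carried by Host */
    return snprintf(buf, len, "%s %s HTTP/1.1\r\nHost: %s\r\n",
                    method, path, authority);
}
```

For :method GET, :scheme https, :authority example.org and :path /, a server receives "GET / HTTP/1.1" while a proxy receives "GET https://example.org/ HTTP/1.1".
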
- ping frames : they are not associated with any stream (their stream ID is
  always 0) so by definition they cannot be forwarded to any server. The H2
  connection should deal with them only.

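Handling PING entirely at the connection level could look like the sketch below: a PING must be carried on stream 0 with an 8-byte payload, and a received PING without the ACK flag is answered with the same payload and the ACK flag set (RFC 7540, section 6.7). h2_handle_ping() is a hypothetical name; the frame layout (24-bit length, type, flags, 31-bit stream ID) is the one from section 4.1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define H2_FT_PING    0x06
#define H2_F_PING_ACK 0x01

/* <frm> holds a complete received frame of <len> bytes. If it is a PING
 * that is not an ACK, the 17-byte answer (9-byte header + same payload,
 * ACK flag set) is written to <out>. Returns the number of bytes to send,
 * 0 if nothing is to be sent, or -1 if the frame is not a valid PING
 * (a connection error). */
static int h2_handle_ping(const uint8_t *frm, size_t len, uint8_t out[17])
{
    uint32_t flen, sid;

    if (len < 9)
        return -1;
    flen = ((uint32_t)frm[0] << 16) | ((uint32_t)frm[1] << 8) | frm[2];
    sid = (((uint32_t)frm[5] << 24) | ((uint32_t)frm[6] << 16) |
           ((uint32_t)frm[7] << 8) | frm[8]) & 0x7fffffff;
    if (frm[3] != H2_FT_PING || sid != 0 || flen != 8 || len < 17)
        return -1;
    if (frm[4] & H2_F_PING_ACK)
        return 0;   /* answer to one of our own PINGs */
    memcpy(out, frm, 17);
    out[4] = H2_F_PING_ACK;
    return 17;
}
```
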
There's a layering problem with H2. The framing layer has to be aware of the
upper layer semantics. We can't simply re-encode HTTP/1 to HTTP/2 then pass
it over a framing layer to mux the streams: the frame type must be passed down
so that frames are properly arranged. Header encoding is connection-based and
all streams using the same connection will interact in the way their headers
are encoded. Thus the encoder *has* to be placed in the h2_conn entity, and
this entity has to know for each stream what its headers are.

We should probably remove *all* headers from transported data and move them
on the fly to a parallel structure that can be shared between H1 and H2 and
consumed at the appropriate level. That means buffers only transport data.
Trailers have to be dealt with differently.

So if we consider an H1 request being forwarded between a client and a server,
it would look approximately like this :

  - request header + body land in a stream's receive buffer
  - headers are indexed and stripped out so that only the body and whatever
    follows remain in the buffer
  - both the header index and the buffer with the body stay attached to the
    stream
  - the sender can rebuild the whole header block. Since the headers are found
    in a table supposed to be stable, it can rebuild them as many times as
    desired and will always get the same result, so it's safe to build them
    into the trash buffer for immediate sending, just as we do for the PROXY
    protocol.
  - the upper protocol should probably provide a build_hdr() callback which,
    when called by the socket layer, builds this header block based on the
    current stream's header list, ready to be sent.
  - the socket layer has to know how many bytes from the headers are left to be
    forwarded prior to processing the body.
  - the socket layer needs to consume only the acceptable part of the body and
    must not release the buffer if any data remains in it (eg: pipelining over
    H1). This is already handled by channel->o and channel->to_forward.
  - we could possibly have another optional callback to send a preamble before
    data, that could be used to send chunk sizes in H1. The danger is that it
    absolutely needs to be stable if it has to be retried. But it could
    considerably simplify de-chunking.

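A minimal sketch of the two callbacks described above, assuming a hypothetical hdr list already split between names and values: h1_build_hdr() serializes the first line and the headers into a caller-provided buffer such as the trash, and h1_chunk_preamble() produces the chunk-size line for chunked output (the CRLF following each chunk's data is not shown). Because the header list is stable, calling h1_build_hdr() again after a partial send returns exactly the same bytes, so only the number of bytes already sent needs to be kept.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical header list attached to the stream. */
struct hdr { const char *n, *v; };

/* Serializes <first_line> and the header list into <out>. Returns the
 * block length, or -1 if it does not fit in <len> bytes. */
static int h1_build_hdr(const char *first_line, const struct hdr *h,
                        size_t nb, char *out, size_t len)
{
    size_t pos;
    int r = snprintf(out, len, "%s\r\n", first_line);

    if (r < 0 || (size_t)r >= len)
        return -1;
    pos = (size_t)r;
    for (size_t i = 0; i < nb; i++) {
        r = snprintf(out + pos, len - pos, "%s: %s\r\n", h[i].n, h[i].v);
        if (r < 0 || (size_t)r >= len - pos)
            return -1;
        pos += (size_t)r;
    }
    if (pos + 2 >= len)
        return -1;
    memcpy(out + pos, "\r\n", 3);   /* empty line ends the block, keeps NUL */
    return (int)(pos + 2);
}

/* Optional preamble: the chunk-size line preceding <data_len> bytes. */
static int h1_chunk_preamble(size_t data_len, char *out, size_t len)
{
    return snprintf(out, len, "%zx\r\n", data_len);
}
```
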
When the request is sent to an H2 server, an H2 stream must be requested from
the server: we look for an existing connection whose settings are compatible
with our needs (eg: tls/clear, push/no-push) and which has a spare stream ID.
If none is found, a new connection must be established, unless maxconn is
reached.

Servers must have a maxstream setting just like they have a maxconn. The same
queue may be used for that.

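The selection logic can be sketched as follows, assuming a hypothetical srv_h2c view of each established connection and treating maxstream as a server-wide limit sharing the maxconn queue: a connection is reused only if its settings match and it still has a spare stream (below the peer's SETTINGS_MAX_CONCURRENT_STREAMS and with IDs left in the 31-bit space), otherwise a new connection is opened unless maxconn is reached. Treating 0 as unlimited is also an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define H2_MAX_SID 0x7fffffffU   /* stream IDs are 31-bit (RFC 7540, 5.1.1) */

/* Hypothetical view of an established server-side H2 connection. */
struct srv_h2c {
    int tls, push;          /* settings the stream must be compatible with */
    uint32_t nb_streams;    /* streams currently open */
    uint32_t max_streams;   /* peer's SETTINGS_MAX_CONCURRENT_STREAMS */
    uint32_t next_sid;      /* next (odd) stream ID to use */
};

enum pick_res { PICK_EXISTING, PICK_NEW_CONN, PICK_QUEUE };

/* <srv_streams> is the number of streams the server currently handles,
 * <maxstream>/<maxconn> are the server limits (0 = unlimited). */
static enum pick_res pick_conn(struct srv_h2c *c, size_t nb, int tls, int push,
                               uint32_t srv_streams, uint32_t maxstream,
                               uint32_t maxconn, struct srv_h2c **out)
{
    *out = NULL;
    if (maxstream && srv_streams >= maxstream)
        return PICK_QUEUE;                  /* same queue as for maxconn */
    for (size_t i = 0; i < nb; i++) {
        if (c[i].tls != tls || c[i].push != push)
            continue;
        if (c[i].nb_streams >= c[i].max_streams || c[i].next_sid > H2_MAX_SID)
            continue;                       /* no spare stream here */
        *out = &c[i];
        return PICK_EXISTING;
    }
    return (!maxconn || nb < maxconn) ? PICK_NEW_CONN : PICK_QUEUE;
}
```
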
The "tcp-request content" ruleset must apply to the TCP layer. But with HTTP/2
that becomes impossible (and useless). We still need something like the
"tcp-request session" hook to apply just after the SSL handshake is done.

It is impossible to defragment the body on the fly in HTTP/2. Since multiple
messages are interleaved, we cannot wait for all of them and block the head of
line. Thus if body analysis is required, it will have to use the stream's
buffer, which necessarily implies a copy. That means that with each H2 end we
necessarily have at least one copy. Sometimes we might be able to "splice" some
bytes from one side to the other without copying into the stream buffer (same
rules as for TCP splicing).

In theory, only data should flow through the channel buffer, so each side's
connector is responsible for encoding data (H1: linear/chunks, H2: frames).
Maybe the same mechanism could be extrapolated to tunnels / TCP.

Since we'd use buffers only for data (and for receipt of headers), we need to
have dynamic buffer allocation.

Thus :
- Tx buffers do not exist. We allocate a buffer on the fly when we're ready to
  send something that we need to build and that needs to be persistent in case
  of partial send. H1 headers are built on the fly from the header table to a
  temporary buffer that is immediately sent and whose amount of sent bytes is
  the only information kept (like for PROXY protocol). H2 headers are more
  complex since the encoding depends on what was successfully sent. Thus we
  need to build them and put them into a temporary buffer that remains
  persistent in case send() fails. It is possible to have a limited pool of
  Tx buffers and refrain from sending if there is no more buffer available in
  the pool. In that case we need a wake-up mechanism once a buffer is
  available. Once the data are sent, the Tx buffer is then immediately recycled
  in its pool. Note that not using any Tx buffer (eg: for headers or control
  frames) means that we have to be able to serialize access to the connection
  and retry with the same stream. It also means that a stream that times out
  while waiting for the connector to read the second half of its request has
  to stay there, or at least needs to be handled gracefully. However if the
  connector cannot read the data to be sent, it means that the buffer is
  congested and the connection is dead, so that probably means it can be
  killed.

- Rx buffers have to be pre-allocated just before calling recv(). A connection
  will first try to pick a buffer and disable reception if it fails, then
  subscribe to the list of tasks waiting for an Rx buffer.

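The allocate-or-subscribe scheme can be sketched with a small pool, assuming hypothetical buf_pool and waiter structures. The same mechanism would serve the limited pool of Tx buffers described above: a released buffer is handed directly to the oldest waiter, whose reception is re-enabled.

```c
#include <stddef.h>

#define MAX_WAITERS 16   /* hypothetical bound on the wait list */

/* Hypothetical connection-side state relevant to buffer allocation. */
struct waiter {
    int has_buf;      /* currently owns a buffer */
    int rx_enabled;   /* reception enabled */
};

struct buf_pool {
    int free;                          /* buffers left in the pool */
    struct waiter *wait[MAX_WAITERS];  /* FIFO of waiting connections */
    size_t wh, wn;                     /* FIFO head and count */
};

/* Called just before recv(): takes a buffer, or disables reception and
 * subscribes to the wait list. Returns 1 if a buffer was obtained, 0 if
 * the connection now waits, -1 if the wait list is full. */
static int pool_get(struct buf_pool *p, struct waiter *w)
{
    if (p->free > 0) {
        p->free--;
        w->has_buf = 1;
        return 1;
    }
    w->rx_enabled = 0;
    if (p->wn == MAX_WAITERS)
        return -1;
    p->wait[(p->wh + p->wn) % MAX_WAITERS] = w;
    p->wn++;
    return 0;
}

/* Releases <w>'s buffer: it goes to the oldest waiter, whose reception is
 * re-enabled, or back to the pool if nobody waits. */
static void pool_put(struct buf_pool *p, struct waiter *w)
{
    w->has_buf = 0;
    if (p->wn) {
        struct waiter *next = p->wait[p->wh];

        p->wh = (p->wh + 1) % MAX_WAITERS;
        p->wn--;
        next->has_buf = 1;
        next->rx_enabled = 1;
        return;
    }
    p->free++;
}
```
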
- full Rx buffers might sometimes be moved around to the next buffer instead of
  being copied. That means that channels and connectors must use the same
  format of buffer, and that only the channel will have to see its pointers
  adjusted.

- Tx of data should be made as much as possible without copying. That possibly
  means directly looking into the connection buffer on the other side if the
  local Tx buffer does not exist and the stream buffer is not allocated, or
  even performing a splice() call between the two sides. One of the problems in
  doing this is that it requires proper ordering of the operations (eg: when
  multiple readers are attached to the same buffer). If the splitting occurs
  upon receipt, there's no problem. If we expect to retrieve data directly from
  the original buffer, it's harder since it contains various things in an order
  which does not even indicate what belongs to whom. Thus possibly the only
  mechanism to implement is the buffer permutation which guarantees zero-copy
  and only in the 100% safe case. Also it's atomic and does not cause HOL
  blocking.

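The buffer permutation can be sketched as a swap of buffer descriptors, assuming a hypothetical buffer structure shared by channels and connectors. The swap only happens in the safe case where the destination is empty and the whole source is to be forwarded; otherwise the caller falls back to a copy.

```c
#include <stddef.h>

/* Hypothetical buffer layout common to channels and connectors. */
struct buffer {
    char *area;     /* storage */
    size_t size;    /* allocated size */
    size_t head;    /* offset of the first byte of data */
    size_t data;    /* number of bytes of data */
};

/* Zero-copy transfer: swaps <dst> and <src> when <dst> is empty and all of
 * <src> is to be forwarded. Returns 1 if swapped, 0 if a copy is needed. */
static int buf_permute(struct buffer *dst, struct buffer *src, size_t to_forward)
{
    struct buffer tmp;

    if (dst->data != 0 || src->data == 0 || to_forward < src->data)
        return 0;
    tmp = *dst;
    *dst = *src;
    *src = tmp;
    return 1;
}
```
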
It makes sense to choose the frontend_accept() function right after the
handshake has ended. It is then possible to check the ALPN, the SNI, the
ciphers and to accept to switch to the h2_conn_accept handler only if
everything is OK. The h2_conn_accept handler will have to deal with the
connection setup, initialization of the header table, exchange of the settings
frames and preparing whatever is needed to fire new streams upon receipt of
unknown stream IDs. Note: most of the time it will not be possible to splice()
because we need to know the amount of bytes in advance to write the frame
header, and here it will not be possible.

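The two checks mentioned above can be sketched as follows. pick_handler() and h2_check_preface() are hypothetical; the ALPN value is assumed to be provided as a pointer and length, as returned by OpenSSL's SSL_get0_alpn_selected(). The client connection preface is the 24-byte string defined in RFC 7540, section 3.5, and is followed by the client's SETTINGS frame.

```c
#include <stddef.h>
#include <string.h>

/* Client connection preface (RFC 7540, section 3.5). */
static const char H2_PREFACE[] = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";
#define H2_PREFACE_LEN (sizeof(H2_PREFACE) - 1)

enum proto_handler { HANDLER_H1, HANDLER_H2 };

/* Selects the H2 handler only when ALPN negotiated "h2". */
static enum proto_handler pick_handler(const unsigned char *alpn, size_t alpn_len)
{
    if (alpn_len == 2 && memcmp(alpn, "h2", 2) == 0)
        return HANDLER_H2;
    return HANDLER_H1;
}

/* First step of the accept handler: checks the received bytes against the
 * preface. Returns 1 when complete and valid, 0 when more bytes are
 * needed, -1 on mismatch. */
static int h2_check_preface(const char *buf, size_t len)
{
    size_t n = len < H2_PREFACE_LEN ? len : H2_PREFACE_LEN;

    if (memcmp(buf, H2_PREFACE, n) != 0)
        return -1;
    return len >= H2_PREFACE_LEN ? 1 : 0;
}
```
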
H2 health checks must be seen as regular transactions/streams. The check runs a
normal client which seeks an available stream from a server. The server then
finds one on an existing connection or initiates a new H2 connection. The H2
checks will have to be configurable for sharing streams or not. Another option
could be to specify how many requests can be made over existing connections
before insisting on getting a separate connection. Note that such separate
connections might end up stacking up once released. So they probably need to
be recycled very quickly (eg: by setting a maximum number of unused ones).
277