| 2014/10/23 - design thoughts for HTTP/2 |
| |
| - connections : HTTP/2 depends a lot more on a connection than HTTP/1 because a |
| connection holds a compression context (headers table, etc...). We probably |
| need to have an h2_conn struct. |
| |
| - multiple transactions will be handled in parallel for a given h2_conn. They |
| are called streams in HTTP/2 terminology. |
| |
| - multiplexing : for a given client-side h2 connection, we can have multiple |
| server-side h2 connections. And for a server-side h2 connection, we can have |
| multiple client-side h2 connections. Streams circulate in N-to-N fashion. |
| |
| - flow control : flow control will be applied between multiple streams. Special |
| care must be taken so that an H2 client cannot block some H2 servers by |
| sending requests spread over multiple servers to the point where one server |
| response is blocked and prevents other responses from the same server from |
| reaching their clients. H2 connection buffers must always be empty or nearly |
| empty. The per-stream flow control needs to be respected as well as the |
| connection's buffers. It is important to implement some fairness between all |
| the streams so that it's not always the same one which gets the bandwidth |
| when the connection is congested. |
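
As a rough illustration of respecting both the per-stream and the connection-level limits, the sketch below computes how many DATA bytes may be emitted for one stream. The struct and function names (h2_conn_fc, h2_stream_fc, h2_sendable, h2_consume) are hypothetical and only serve this example; fairness between streams (eg: round-robin) would be layered on top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical send-side accounting; names are assumptions for this sketch. */
struct h2_conn_fc {
    int32_t send_window;           /* connection-level window */
    uint32_t peer_max_frame_size;  /* peer's advertised max frame size */
};

struct h2_stream_fc {
    int32_t send_window;           /* per-stream window */
};

/* Return how many DATA bytes may be sent now for this stream: the smallest
 * of the pending bytes, the stream window, the connection window and the
 * peer's maximum frame size. Returns 0 when either window is exhausted.
 */
static size_t h2_sendable(const struct h2_conn_fc *c,
                          const struct h2_stream_fc *s, size_t pending)
{
    size_t n = pending;

    assert(c && s);
    if (c->send_window <= 0 || s->send_window <= 0)
        return 0;
    if (n > (size_t)s->send_window)
        n = (size_t)s->send_window;
    if (n > (size_t)c->send_window)
        n = (size_t)c->send_window;
    if (n > c->peer_max_frame_size)
        n = c->peer_max_frame_size;
    return n;
}

/* Account for <n> bytes actually sent, on both windows. */
static void h2_consume(struct h2_conn_fc *c, struct h2_stream_fc *s, size_t n)
{
    c->send_window -= (int32_t)n;
    s->send_window -= (int32_t)n;
}
```

A scheduler would call h2_sendable() for each ready stream in turn, so that a stream with an exhausted window is skipped instead of blocking the others.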
| |
| - some clients can be H1 with an H2 server (is this really needed ?). Most of |
| the initial use case will be H2 clients to H1 servers. It is important to keep |
| in mind that H1 servers do not do flow control and that we don't want them to |
| block transfers (eg: post upload). |
| |
| - internal tasks : some H2 clients will be internal tasks (eg: health checks). |
| Some H2 servers will be internal tasks (eg: stats, cache). The model must be |
| compatible with this use case. |
| |
| - header indexing : headers are transported compressed, with a reference to a |
| static or a dynamic header, or a literal, possibly huffman-encoded. Indexing |
| is specific to the H2 connection. This means there is no way any binary data |
| can flow between both sides: headers will have to be decoded according to the |
| incoming connection's context and re-encoded according to the outgoing |
| connection's context, which can significantly differ. In order to avoid the |
| parsing trouble we currently face, headers will have to be clearly split |
| between name and value. It is worth noting that neither the incoming nor the |
| outgoing connections' contexts will be of any use while processing the |
| headers. At best we can have some shortcuts for well-known names that map |
| well to the static ones (eg: use the first static entry with same name), and |
| maybe have a few special cases for static name+value as well. We can probably |
| classify headers into the following categories : |
| |
| - static name + value |
| - static name + other value |
| - dynamic name + other value |
| |
| This will allow for better processing in some specific cases. Headers |
| supporting a single value (:method, :status, :path, ...) should probably |
| be stored in a single location with a direct access. That would allow us |
| to retrieve a method using hdr[METHOD]. All such indexing must be performed |
| while parsing. That also means that HTTP/1 will have to be converted to this |
| representation very early in the parser and possibly converted back to H/1 |
| after processing. |
| |
| Header names/values will have to be placed in a small memory area that will |
| inevitably get fragmented as headers are rewritten. An automatic packing |
| mechanism must be implemented so that when there's no more room, headers are |
| simply defragmented/packed to a new table and the old one is released. Just |
| like for the static chunks, we need to have a few such tables pre-allocated |
| and ready to be swapped at any moment. Repacking must not change any index |
| nor affect the way headers are compressed so that it can happen late after a |
| retry (send-name-header for example). |
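
The direct-access idea (hdr[METHOD]) could look roughly like the sketch below. Everything in it (hdr_slot, hdr_list, hdr_add, the HDR_MAX limit) is a hypothetical illustration, not an existing API; strings are referenced rather than copied, and the packing/defragmentation of the storage area is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slots for headers that carry a single value. */
enum hdr_slot {
    HDR_METHOD, HDR_SCHEME, HDR_AUTHORITY, HDR_PATH, HDR_STATUS, HDR_SLOTS
};

struct hdr_field {
    const char *name;
    const char *value;
};

#define HDR_MAX 32  /* arbitrary limit for this sketch */

struct hdr_list {
    const char *slot[HDR_SLOTS];      /* direct access, NULL if absent */
    struct hdr_field other[HDR_MAX];  /* other headers, in arrival order */
    size_t nb_other;
};

static const char *const slot_names[HDR_SLOTS] = {
    ":method", ":scheme", ":authority", ":path", ":status",
};

/* Index one decoded (name, value) pair while parsing. Returns 0 on success
 * or -1 if the list is full. The strings are referenced, not copied.
 */
static int hdr_add(struct hdr_list *l, const char *name, const char *value)
{
    int i;

    assert(l && name && value);
    for (i = 0; i < HDR_SLOTS; i++) {
        if (strcmp(name, slot_names[i]) == 0) {
            l->slot[i] = value;
            return 0;
        }
    }
    if (l->nb_other >= HDR_MAX)
        return -1;
    l->other[l->nb_other].name = name;
    l->other[l->nb_other].value = value;
    l->nb_other++;
    return 0;
}
```

With such a layout the method is retrieved with l.slot[HDR_METHOD] without scanning the list, and an HTTP/1 parser could fill the same structure from the request line.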
| |
| - header processing : can still happen on a (header, value) basis. Reqrep/ |
| rsprep completely disappear and will have to be replaced with something else |
| to support renaming headers and rewriting url/path/... |
| |
| - push_promise : servers can push dummy requests+responses. They advertise |
| the promised stream ID in a push_promise frame sent on the associated stream. |
| This means that it is possible to initiate a client-server stream from the |
| information coming from the server and make the data flow as if the client |
| had made it. It's likely that we'll have to support two types of server |
| connections: those which support push and those which do not. That way client |
| streams will be distributed to existing server connections based on their |
| capabilities. It's important to keep in mind that PUSH will not be rewritten |
| in responses. |
| |
| - stream ID mapping : since the stream ID is per H2 connection, stream IDs will |
| have to be mapped. Thus a given stream is an entity with two IDs (one per |
| side). Or more precisely a stream has two end points, each one carrying an ID |
| when it ends on an HTTP2 connection. Also, for each stream ID we need to |
| quickly find the associated transaction in progress. Using a small quick |
| unique tree seems appropriate considering the wide range of valid values. |
| |
| - frame sizes : frames have to be remapped between both sides as multiplexed |
| connections won't always have the same characteristics. Thus some frames |
| might be spliced and others will be sliced. |
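
Slicing can be sketched as follows: a payload received on one side is cut into DATA frames no larger than what the other side accepts. The 9-byte frame header layout (24-bit length, type, flags, reserved bit + 31-bit stream ID) is the one defined by HTTP/2; the function names and the choice to copy into a flat output buffer are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define H2_FRAME_HDR_LEN 9
#define H2_FT_DATA       0x0

/* Write the 9-byte HTTP/2 frame header: 24-bit length, 8-bit type,
 * 8-bit flags, then one reserved bit and the 31-bit stream ID.
 */
static void h2_write_frame_hdr(uint8_t *out, uint32_t len, uint8_t type,
                               uint8_t flags, uint32_t sid)
{
    assert(len < (1U << 24));
    out[0] = (uint8_t)(len >> 16);
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)len;
    out[3] = type;
    out[4] = flags;
    out[5] = (uint8_t)((sid >> 24) & 0x7f);  /* reserved bit cleared */
    out[6] = (uint8_t)(sid >> 16);
    out[7] = (uint8_t)(sid >> 8);
    out[8] = (uint8_t)sid;
}

/* Slice <len> payload bytes into DATA frames carrying at most <max_frame>
 * bytes each and write them to <out>. Returns the number of bytes written,
 * or 0 if <out> is too small (or if <len> is zero).
 */
static size_t h2_slice_data(uint8_t *out, size_t out_size,
                            const uint8_t *payload, size_t len,
                            uint32_t max_frame, uint32_t sid)
{
    size_t used = 0;

    assert(max_frame > 0 && max_frame < (1U << 24));
    while (len > 0) {
        size_t chunk = len < max_frame ? len : max_frame;

        if (out_size - used < H2_FRAME_HDR_LEN + chunk)
            return 0;
        h2_write_frame_hdr(out + used, (uint32_t)chunk, H2_FT_DATA, 0, sid);
        memcpy(out + used + H2_FRAME_HDR_LEN, payload, chunk);
        used += H2_FRAME_HDR_LEN + chunk;
        payload += chunk;
        len -= chunk;
    }
    return used;
}
```

The opposite operation (merging several small incoming frames into one larger outgoing frame) only needs the header to be written once the total length is known.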
| |
| - error processing : care must be taken to never break a connection unless it |
| is dead or corrupt at the protocol level. Stats counters must exist to observe |
| the causes. Timeouts are a great problem because silent connections might |
| die out of inactivity. Ping frames should probably be scheduled a few seconds |
| before the connection timeout so that an unused connection is verified before |
| being killed. Abnormal requests must be dealt with using RST_STREAM. |
| |
| - ALPN : ALPN must be observed on the client side, and transmitted to the server |
| side. |
| |
| - proxy protocol : proxy protocol makes little to no sense in a multiplexed |
| protocol. A per-stream equivalent will surely be needed if implementations |
| do not quickly generalize the use of the Forwarded header. |
| |
| - simplified protocol for local devices (eg: haproxy->varnish in clear and |
| without handshake, and possibly even with splicing if the connection's |
| settings are shared) |
| |
| - logging : logging must report several extra pieces of information such as the |
| stream ID, and whether the transaction was initiated by the client or by the |
| server (which can be deduced from the stream ID's parity). In case of push, |
| the number of the associated stream must also be reported. |
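
Deducing the initiator from the stream ID's parity is straightforward: in HTTP/2, client-initiated streams use odd IDs and server-initiated (pushed) streams use even IDs. The helper names and the log field names below (sid, init, assoc) are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum h2_initiator { H2_INIT_CLIENT, H2_INIT_SERVER };

/* Odd stream IDs are initiated by the client, even ones by the server.
 * Stream 0 designates the connection itself, so it is rejected here.
 */
static enum h2_initiator h2_stream_initiator(uint32_t sid)
{
    assert(sid != 0 && sid <= 0x7fffffffU);
    return (sid & 1) ? H2_INIT_CLIENT : H2_INIT_SERVER;
}

/* Format the extra log fields. <assoc_sid> is the associated stream in
 * case of push, or 0 when there is none. Returns snprintf()'s result.
 */
static int h2_log_fields(char *buf, size_t size, uint32_t sid,
                         uint32_t assoc_sid)
{
    const char *who =
        h2_stream_initiator(sid) == H2_INIT_CLIENT ? "client" : "server";

    if (assoc_sid)
        return snprintf(buf, size, "sid=%u init=%s assoc=%u",
                        (unsigned)sid, who, (unsigned)assoc_sid);
    return snprintf(buf, size, "sid=%u init=%s", (unsigned)sid, who);
}
```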
| |
| - memory usage : H2 increases memory usage by mandating support for frames of |
| at least 16384 bytes. That means slightly more than 16kB of buffer in each |
| direction to process any frame. It will definitely have an impact on the |
| deployed maxconn setting in places using less than this (4..8kB are common). |
| Also, the header list is persistent per connection, so if we reach the same |
| size as the request, that's another 16kB in each direction, resulting in |
| about 48kB of memory where 8 were previously used. A more careful encoder |
| can work with a much smaller set even if that implies evicting entries |
| between multiple headers of the same message. |
| |
| - HTTP/1.0 should very carefully be transported over H2. Since there's no way |
| to pass version information in the protocol, the server could use some |
| features of HTTP/1.1 that are unsafe in HTTP/1.0 (compression, trailers, |
| ...). |
| |
| - host / :authority : ":authority" is the norm, and "host" will be absent when |
| H2 clients generate :authority. This probably means that a dummy Host header |
| will have to be produced internally from :authority and removed when passing |
| to H2 behind. This can cause some trouble when passing H2 requests to H1 |
| proxies, because there's no way to know if the request should contain scheme |
| and authority in H1 or not based on the H2 request. Thus a "proxy" option |
| will have to be explicitly mentioned on HTTP/1 server lines. One of the |
| problems that it creates is that it's no longer possible to pass H/1 requests |
| to H/1 proxies without an explicit configuration. Maybe a table of the |
| various combinations is needed. |
| |
|                       :scheme   :authority   host    |
| HTTP/2 request        present   present      absent  |
| HTTP/1 server req     absent    absent       present |
| HTTP/1 proxy req      present   present      present |
| |
| So in the end the issue is only with H/2 requests passed to H/1 proxies. |
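
The table can be turned into a small sketch that builds the start of an HTTP/1 request from the H2 pseudo-headers. The function name and the <to_proxy> argument (standing for the explicit "proxy" option on the server line) are hypothetical, and special targets such as CONNECT or "OPTIONS *" are deliberately ignored here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the request line and Host header of an HTTP/1 request from H2
 * pseudo-headers. <to_proxy> stands for the hypothetical per-server
 * "proxy" option: when set, scheme and authority are emitted in the
 * request target (absolute form), otherwise only the path is used.
 * Host is produced from :authority in both cases.
 * Returns snprintf()'s result.
 */
static int h1_build_reqline(char *buf, size_t size, const char *method,
                            const char *scheme, const char *authority,
                            const char *path, int to_proxy)
{
    assert(buf && method && scheme && authority && path);
    if (to_proxy)
        return snprintf(buf, size, "%s %s://%s%s HTTP/1.1\r\nHost: %s\r\n",
                        method, scheme, authority, path, authority);
    return snprintf(buf, size, "%s %s HTTP/1.1\r\nHost: %s\r\n",
                    method, path, authority);
}
```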
| |
| - ping frames : they don't indicate any stream ID so by definition they cannot |
| be forwarded to any server. The H2 connection should deal with them only. |
| |
| There's a layering problem with H2. The framing layer has to be aware of the |
| upper layer semantics. We can't simply re-encode HTTP/1 to HTTP/2 then pass |
| it over a framing layer to mux the streams; the frame type must be passed below |
| so that frames are properly arranged. Header encoding is connection-based and |
| all streams using the same connection will interact in the way their headers |
| are encoded. Thus the encoder *has* to be placed in the h2_conn entity, and |
| this entity has to know for each stream what its headers are. |
| |
| We should probably remove *all* headers from transported data and move |
| them on the fly to a parallel structure that can be shared between H1 and H2 |
| and consumed at the appropriate level. That means buffers only transport data. |
| Trailers have to be dealt with differently. |
| |
| So if we consider an H1 request being forwarded between a client and a server, |
| it would look approximately like this : |
| |
| - request header + body land into a stream's receive buffer |
| - headers are indexed and stripped out so that only the body and whatever |
| follows remain in the buffer |
| - both the header index and the buffer with the body stay attached to the |
| stream |
| - the sender can rebuild the whole headers. Since they're found in a table |
| supposed to be stable, it can rebuild them as many times as desired and |
| will always get the same result, so it's safe to build them into the trash |
| buffer for immediate sending, just as we do for the PROXY protocol. |
| - the upper protocol should probably provide a build_hdr() callback which |
| when called by the socket layer, builds this header block based on the |
| current stream's header list, ready to be sent. |
| - the socket layer has to know how many bytes from the headers are left to be |
| forwarded prior to processing the body. |
| - the socket layer needs to consume only the acceptable part of the body and |
| must not release the buffer if any data remains in it (eg: pipelining over |
| H1). This is already handled by channel->o and channel->to_forward. |
| - we could possibly have another optional callback to send a preamble before |
| data, that could be used to send chunk sizes in H1. The danger is that it |
| absolutely needs to be stable if it has to be retried. But it could |
| considerably simplify de-chunking. |
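
The "rebuild as many times as desired and only remember the sent offset" idea could be sketched like this for H1. All names (h1_stream, h1_build_hdr, h1_send_hdr, send_fn, demo_send) are hypothetical; demo_send only simulates a socket accepting a few bytes per call so that the retry path is exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct h1_hdr { const char *name; const char *value; };

struct h1_stream {
    const char *reqline;        /* e.g. "GET / HTTP/1.1" */
    const struct h1_hdr *hdrs;  /* stable header table */
    size_t nb_hdrs;
    size_t hdr_sent;            /* header bytes already accepted */
};

/* Hypothetical build_hdr() callback: serializes the header block into
 * <buf>. Returns its length, or 0 if it does not fit. The output is the
 * same on every call as long as the header table does not change.
 */
static size_t h1_build_hdr(const struct h1_stream *s, char *buf, size_t size)
{
    size_t len, i;
    int ret;

    ret = snprintf(buf, size, "%s\r\n", s->reqline);
    if (ret < 0 || (size_t)ret >= size)
        return 0;
    len = (size_t)ret;
    for (i = 0; i < s->nb_hdrs; i++) {
        ret = snprintf(buf + len, size - len, "%s: %s\r\n",
                       s->hdrs[i].name, s->hdrs[i].value);
        if (ret < 0 || (size_t)ret >= size - len)
            return 0;
        len += (size_t)ret;
    }
    if (size - len < 3)
        return 0;
    memcpy(buf + len, "\r\n", 3);  /* final empty line + trailing NUL */
    return len + 2;
}

/* Pushes bytes to the socket, returns how many were accepted. */
typedef size_t (*send_fn)(const char *data, size_t len, void *ctx);

/* Rebuild the headers into <scratch> and send what is left of them.
 * Returns 1 once everything is sent, 0 to retry later, -1 on failure.
 */
static int h1_send_hdr(struct h1_stream *s, char *scratch, size_t size,
                       send_fn snd, void *ctx)
{
    size_t total = h1_build_hdr(s, scratch, size);

    if (!total)
        return -1;
    assert(s->hdr_sent <= total);
    s->hdr_sent += snd(scratch + s->hdr_sent, total - s->hdr_sent, ctx);
    return s->hdr_sent == total;
}

/* Demo sink: accepts at most <limit> bytes per call, appends to <out>. */
struct demo_sink { char out[256]; size_t len; size_t limit; };

static size_t demo_send(const char *data, size_t len, void *ctx)
{
    struct demo_sink *k = ctx;

    if (len > k->limit)
        len = k->limit;
    if (len > sizeof(k->out) - 1 - k->len)
        len = sizeof(k->out) - 1 - k->len;
    memcpy(k->out + k->len, data, len);
    k->len += len;
    k->out[k->len] = 0;
    return len;
}
```

Since the scratch buffer is rebuilt on each attempt, nothing but hdr_sent needs to survive between two calls, which is what makes the trash buffer usable here.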
| |
| When the request is sent to an H2 server, an H2 stream request must be made |
| to the server: we look for an existing connection whose settings are compatible |
| with our needs (eg: tls/clear, push/no-push), and with a spare stream ID. If |
| none is found, a new connection must be established, unless maxconn is reached. |
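
A minimal sketch of this lookup is shown below. The flags, the h2_srv_conn structure and the rule that "compatible" means an exact match on tls/push are assumptions made for the example; a real implementation would likely use a list or tree rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define H2_F_TLS  0x1  /* connection runs over TLS */
#define H2_F_PUSH 0x2  /* connection accepts server push */

/* Hypothetical server-side connection descriptor. */
struct h2_srv_conn {
    unsigned flags;
    uint32_t next_sid;     /* next stream ID we would open (odd) */
    uint32_t nb_streams;   /* streams currently open */
    uint32_t max_streams;  /* limit of concurrent streams */
};

/* Return the first connection whose settings match <needed> exactly and
 * which still has a spare stream, or NULL if none is found. In the latter
 * case the caller opens a new connection unless maxconn is reached.
 */
static struct h2_srv_conn *h2_pick_conn(struct h2_srv_conn *conns, size_t nb,
                                        unsigned needed)
{
    size_t i;

    assert(conns || nb == 0);
    for (i = 0; i < nb; i++) {
        struct h2_srv_conn *c = &conns[i];

        if ((c->flags & (H2_F_TLS | H2_F_PUSH)) != needed)
            continue;
        if (c->nb_streams >= c->max_streams)
            continue;
        if (c->next_sid > 0x7fffffffU)  /* stream IDs exhausted */
            continue;
        return c;
    }
    return NULL;
}
```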
| |
| Servers must have a maxstream setting just like they have a maxconn. The same |
| queue may be used for that. |
| |
| The "tcp-request content" ruleset must apply to the TCP layer. But with HTTP/2 |
| that becomes impossible (and useless). We still need something like the |
| "tcp-request session" hook to apply just after the SSL handshake is done. |
| |
| It is impossible to defragment the body on the fly in HTTP/2. Since multiple |
| messages are interleaved, we cannot wait for all of them and block the head of |
| line. Thus if body analysis is required, it will have to use the stream's |
| buffer, which necessarily implies a copy. That means that with each H2 end we |
| necessarily have at least one copy. Sometimes we might be able to "splice" some |
| bytes from one side to the other without copying into the stream buffer (same |
| rules as for TCP splicing). |
| |
| In theory, only data should flow through the channel buffer, so each side's |
| connector is responsible for encoding data (H1: linear/chunks, H2: frames). |
| Maybe the same mechanism could be extrapolated to tunnels / TCP. |
| |
| Since we'd use buffers only for data (and for receipt of headers), we need to |
| have dynamic buffer allocation. |
| |
| Thus : |
| - Tx buffers do not exist. We allocate a buffer on the fly when we're ready to |
| send something that we need to build and that needs to be persistent in case |
| of partial send. H1 headers are built on the fly from the header table to a |
| temporary buffer that is immediately sent and whose amount of sent bytes is |
| the only information kept (like for PROXY protocol). H2 headers are more |
| complex since the encoding depends on what was successfully sent. Thus we |
| need to build them and put them into a temporary buffer that remains |
| persistent in case send() fails. It is possible to have a limited pool of |
| Tx buffers and refrain from sending if there is no more buffer available in |
| the pool. In that case we need a wake-up mechanism once a buffer is |
| available. Once the data are sent, the Tx buffer is then immediately recycled |
| in its pool. Note that no tx buffer being used (eg: for hdr or control) means |
| that we have to be able to serialize access to the connection and retry with |
| the same stream. It also means that a stream that times out while waiting for |
| the connector to read the second half of its request has to stay there, or at |
| least needs to be handled gracefully. However if the connector cannot read |
| the data to be sent, it means that the buffer is congested and the connection |
| is dead, so that probably means it can be killed. |
| |
| - Rx buffers have to be pre-allocated just before calling recv(). A connection |
| will first try to pick a buffer and disable reception if it fails, then |
| subscribe to the list of tasks waiting for an Rx buffer. |
| |
| - full Rx buffers might sometimes be moved around to the next buffer instead of |
| experiencing a copy. That means that channels and connectors must use the |
| same format of buffer, and that only the channel will have to see its |
| pointers adjusted. |
| |
| - Tx of data should be made as much as possible without copying. That possibly |
| means by directly looking into the connection buffer on the other side if |
| the local Tx buffer does not exist and the stream buffer is not allocated, or |
| even performing a splice() call between the two sides. One of the problems in |
| doing this is that it requires proper ordering of the operations (eg: when |
| multiple readers are attached to a same buffer). If the splitting occurs upon |
| receipt, there's no problem. If we expect to retrieve data directly from the |
| original buffer, it's harder since it contains various things in an order |
| which does not even indicate what belongs to whom. Thus possibly the only |
| mechanism to implement is the buffer permutation which guarantees zero-copy |
| and only in the 100% safe case. Also it's atomic and does not cause HOL |
| blocking. |
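
Buffer permutation can be sketched as a plain exchange of buffer descriptors, performed only when the destination is empty and both buffers have the same size; otherwise the caller falls back to a copy. The struct buf layout and the function name are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer descriptor shared by channels and connectors. */
struct buf {
    char *area;   /* storage */
    size_t size;  /* allocated size */
    size_t len;   /* bytes currently held */
};

/* Exchange <src> and <dst> when it is safe to do so, i.e. when <dst> is
 * empty and both areas have the same size. No byte is copied. Returns 1
 * if the permutation happened, 0 if the caller has to copy instead.
 */
static int buf_permute(struct buf *src, struct buf *dst)
{
    struct buf tmp;

    assert(src && dst);
    if (dst->len != 0 || dst->size != src->size)
        return 0;
    tmp = *dst;
    *dst = *src;
    *src = tmp;
    return 1;
}
```

After a successful permutation the source holds an empty buffer of the same size, so reception can resume immediately on that side.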
| |
| It makes sense to choose the frontend_accept() function right after the |
| handshake has ended. It is then possible to check the ALPN, the SNI, the ciphers |
| and to accept to switch to the h2_conn_accept handler only if everything is OK. |
| The h2_conn_accept handler will have to deal with the connection setup, |
| initialization of the header table, exchange of the settings frames and |
| preparing whatever is needed to fire new streams upon receipt of unknown |
| stream IDs. Note: most of the time it will not be possible to splice() because |
| we need to know in advance the amount of bytes to write the header, and here it |
| will not be possible. |
| |
| H2 health checks must be seen as regular transactions/streams. The check runs a |
| normal client which seeks an available stream from a server. The server then |
| finds one on an existing connection or initiates a new H2 connection. The H2 |
| checks will have to be configurable for sharing streams or not. Another option |
| could be to specify how many requests can be made over existing connections |
| before insisting on getting a separate connection. Note that such separate |
| connections might end up stacking up once released. So they probably need |
| to be recycled very quickly (eg: fix how many unused ones can exist max). |
| |