Willy Tarreau | 9382cdd | 2018-02-21 18:07:26 +0100 | 1 | 2018-02-21 - Layering in haproxy 1.9 |
| 2 | ------------------------------------ |
| 3 | |
| 4 | 2 main zones : |
| 5 | - application : reads from conn_streams, writes to conn_streams, often uses |
| 6 | streams |
| 7 | |
| 8 | - connection : receives data from the network, presented into buffers |
| 9 | available via conn_streams, sends data to the network |
| 10 | |
| 11 | |
| 12 | The connection zone contains multiple layers which behave independently in each |
| 13 | direction. The Rx direction is activated upon callbacks from the lower layers. |
| 14 | The Tx direction is activated recursively from the upper layers. Between every |
| 15 | two layers there may be a buffer, in each direction. When a buffer is full |
| 16 | in either the Tx or Rx direction, this direction is paused between the network |
| 17 | layer and the location where the congestion is encountered. Upon end of congestion |
| 18 | (cs_recv() from the upper layer, or sendto() at the lower layers), a |
| 19 | tasklet_wakeup() is performed on the blocked layer so that suspended operations |
| 20 | can be resumed. In this case, the Rx side restarts propagating data upwards |
| 21 | from the lowest blocked level, while the Tx side restarts propagating data |
| 22 | downwards from the highest blocked level. Proceeding like this ensures that |
| 23 | information known to the producer may always be used to tailor the buffer sizes |
| 24 | or decide on a strategy to best aggregate data. Additionally, each time a layer |
| 25 | is crossed without transformation, it becomes possible to send without copying. |
| 26 | |
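The pause/resume behaviour described above can be sketched with a single
bounded buffer sitting between two layers. This is only an illustration under
stated assumptions: struct layer_buf, layer_push() and layer_pull() are
hypothetical names, the buffer size is arbitrary, and the wakeup is reduced to
a counter standing in for tasklet_wakeup().

```c
#include <stddef.h>
#include <string.h>

#define LAYER_BUF_SIZE 8  /* deliberately tiny so congestion is easy to trigger */

/* Hypothetical inter-layer buffer, not haproxy's own buffer type. */
struct layer_buf {
    char   data[LAYER_BUF_SIZE];
    size_t len;      /* bytes currently stored */
    int    blocked;  /* producer paused because the buffer was full */
    int    wakeups;  /* counts simulated tasklet_wakeup() calls */
};

/* Lower layer pushes data upwards. Returns the number of bytes accepted
 * and marks the producer as blocked when not everything fits.
 */
static size_t layer_push(struct layer_buf *b, const char *src, size_t n)
{
    size_t room = LAYER_BUF_SIZE - b->len;
    size_t take = n < room ? n : room;

    memcpy(b->data + b->len, src, take);
    b->len += take;
    if (take < n)
        b->blocked = 1;  /* congestion: pause this direction */
    return take;
}

/* Upper layer consumes data (the role played by cs_recv() in the text).
 * Freeing room ends the congestion, so the blocked producer is woken up.
 */
static size_t layer_pull(struct layer_buf *b, char *dst, size_t n)
{
    size_t take = n < b->len ? n : b->len;

    memcpy(dst, b->data, take);
    memmove(b->data, b->data + take, b->len - take);
    b->len -= take;
    if (take && b->blocked) {
        b->blocked = 0;
        b->wakeups++;    /* stand-in for tasklet_wakeup() on the blocked layer */
    }
    return take;
}
```

With several such buffers stacked, the same rule would apply at each level:
only the layer that was actually blocked needs to be woken up, and it resumes
from where it stopped.
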
| 27 | The Rx side notifies the application of data readiness using a wakeup or a |
| 28 | callback. The Tx side notifies the application of room availability once data |
| 29 | have been moved resulting in the uppermost buffer having some free space. |
| 30 | |
| 31 | When crossing a mux downwards, it is possible that the sender is not allowed to |
| 32 | access the buffer because it is not yet its turn. This is not a problem: the data |
| 33 | remain in the conn_stream's buffer (or the stream's one) and the transfer will be |
| 34 | restarted once the mux is ready to consume these data. |
| 35 | |
| 36 | |
| 37 | cs_recv() -------. cs_send() |
| 38 | ^ +--------> |||||| -------------+ ^ |
| 39 | | | -------' | | stream |
| 40 | --|----------|-------------------------------|-------|------------------- |
| 41 | | | V | connection |
| 42 | data .---. | | room |
| 43 | ready! |---| |---| available! |
| 44 | |---| |---| |
| 45 | |---| |---| |
| 46 | | | '---' |
| 47 | ^ +------------+-------+ | |
| 48 | | | ^ | / |
| 49 | / V | V / |
| 50 | / recvfrom() | sendto() | |
| 51 | -------------|----------------|--------------|--------------------------- |
| 52 | | | poll! V kernel |
| 53 | |
| 54 | |
| 55 | The cs_recv() function should act on pointers to buffer pointers, so that the |
| 56 | callee may decide to pass its own buffer directly by simply swapping pointers. |
| 57 | Similarly for cs_send() it is desirable to let the callee steal the buffer by |
| 58 | swapping the pointers. This way it remains possible to implement zero-copy |
| 59 | forwarding. |
| 60 | |
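A minimal sketch of the pointer-swapping idea, assuming a hypothetical
struct buf and a hypothetical helper xfer_buf() (neither is an existing
haproxy function). When the destination buffer is empty, the two pointers are
exchanged and nothing is copied; otherwise the helper falls back to a copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type used only for this sketch. */
struct buf {
    char   area[64];
    size_t len;
};

/* Moves pending data from *src to *dst. When *dst is empty the two
 * pointers are simply exchanged, so no byte is copied. Otherwise the
 * data are appended as far as room permits.
 * Returns the number of bytes made available in *dst.
 */
static size_t xfer_buf(struct buf **dst, struct buf **src)
{
    size_t n = (*src)->len;
    size_t room;

    if ((*dst)->len == 0) {
        struct buf *tmp = *dst;

        *dst = *src;  /* the receiver now owns the producer's buffer */
        *src = tmp;   /* the producer gets the empty one back */
        return n;
    }

    room = sizeof((*dst)->area) - (*dst)->len;
    if (n > room)
        n = room;
    memcpy((*dst)->area + (*dst)->len, (*src)->area, n);
    (*dst)->len += n;
    memmove((*src)->area, (*src)->area + n, (*src)->len - n);
    (*src)->len -= n;
    return n;
}
```

Passing pointers to buffer pointers is what allows the callee to make this
choice on its own: the caller does not need to know whether a swap or a copy
took place.
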
| 61 | Some operation flags will be needed on cs_recv() : |
| 62 | - RECV_ZERO_COPY : refuse to merge new data into the current buffer if it |
| 63 | will result in a data copy (ie the buffer is not empty), unless no more |
| 64 | than XXX bytes have to be copied (eg: copying 2 cache lines may be cheaper |
| 65 | than waiting and playing with pointers) |
| 66 | |
| 67 | - RECV_AT_ONCE : only perform the operation if it will result in the source |
| 68 | buffer becoming empty at the end of the operation so that no two buffers |
| 69 | remain allocated at the end. It will most of the time result in either a |
| 70 | small read or a zero-copy operation. |
| 71 | |
| 72 | - RECV_PEEK : retrieve a copy of pending data without removing these data |
| 73 | from the source buffer. Maybe an alternate solution could consist in |
| 74 | finding the pointer to the source buffer and accessing these data directly, |
| 75 | except that it might be less interesting for the long term, thread-wise. |
| 76 | |
| 77 | - RECV_MIN : receive at least X bytes (or fewer upon a shutdown), or fail. |
| 78 | This should help various protocol parsers which need to receive a complete |
| 79 | frame before proceeding. |
| 80 | |
| 81 | - RECV_ENOUGH : no more data expected after this read if it's of the |
| 82 | requested size, thus no need to re-enable receiving on the lower layers. |
| 83 | |
| 84 | - RECV_ONE_SHOT : perform a single read without re-enabling reading on the |
| 85 | lower layers, like we currently do when receiving an HTTP/1 request. Like |
| 86 | RECV_ENOUGH where any size is enough. The two could probably be merged |
| 87 | (eg: by having a MIN argument like RECV_MIN). |
| 88 | |
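The receive flags above could be expressed as a bit mask. The sketch below
keeps the flag names from these notes but assumes everything else: the numeric
values, the struct src_buf type and the sketch_recv() helper are hypothetical.
Only RECV_MIN, RECV_AT_ONCE and RECV_PEEK are modelled; the other flags are
declared but ignored.

```c
#include <stddef.h>
#include <string.h>

/* Flag names come from the notes above; the values are arbitrary. */
enum recv_flags {
    RECV_ZERO_COPY = 1 << 0,
    RECV_AT_ONCE   = 1 << 1,
    RECV_PEEK      = 1 << 2,
    RECV_MIN       = 1 << 3,
    RECV_ENOUGH    = 1 << 4,
    RECV_ONE_SHOT  = 1 << 5,
};

/* Hypothetical source buffer holding data pending for the upper layer. */
struct src_buf {
    char   area[32];
    size_t len;
    int    shut;  /* set once no more data will ever arrive */
};

/* Reads up to <count> bytes from <src> into <dst>. <min> is only used
 * with RECV_MIN. Returns the number of bytes copied, or 0 when the
 * operation is refused.
 */
static size_t sketch_recv(struct src_buf *src, char *dst, size_t count,
                          size_t min, unsigned int flags)
{
    size_t n = src->len < count ? src->len : count;

    /* RECV_MIN: fail unless <min> bytes are present, except after a shutdown */
    if ((flags & RECV_MIN) && n < min && !src->shut)
        return 0;

    /* RECV_AT_ONCE: refuse an operation that would leave data behind */
    if ((flags & RECV_AT_ONCE) && n < src->len)
        return 0;

    memcpy(dst, src->area, n);

    /* RECV_PEEK: leave the data in the source buffer */
    if (!(flags & RECV_PEEK)) {
        memmove(src->area, src->area + n, src->len - n);
        src->len -= n;
    }
    return n;
}
```
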
| 89 | |
| 90 | Some operation flags will be needed on cs_send() : |
| 91 | - SEND_ZERO_COPY : refuse to merge the presented data with existing data and |
| 92 | prefer to wait for current data to leave and try again, unless the consumer |
| 93 | considers the amount of data acceptable for a copy. |
| 94 | |
| 95 | - SEND_AT_ONCE : only perform the operation if it will result in the source |
| 96 | buffer becoming empty at the end of the operation so that no two buffers |
| 97 | remain allocated at the end. It will most of the time result in either a |
| 98 | small write or a zero-copy operation. |
| 99 | |
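One possible way to read the two send flags is as a decision taken by the
consumer before touching any data. The sketch below is an assumption, not a
description of existing code: send_decide(), enum send_action and the
<copy_max> threshold (the amount the consumer considers acceptable for a copy)
are all hypothetical.

```c
#include <stddef.h>

/* Flag names come from the notes above; the values are arbitrary. */
enum send_flags {
    SEND_ZERO_COPY = 1 << 0,
    SEND_AT_ONCE   = 1 << 1,
};

/* Hypothetical outcome of the consumer's decision. */
enum send_action {
    SEND_SWAP,  /* destination is empty: steal the caller's buffer */
    SEND_COPY,  /* merge the presented data into the existing buffer */
    SEND_WAIT,  /* refuse for now, the caller retries later */
};

/* <pending> : bytes already waiting in the destination buffer
 * <room>    : free space left in the destination buffer
 * <len>     : bytes presented by the caller
 * <copy_max>: largest amount the consumer accepts to copy despite
 *             SEND_ZERO_COPY (an assumed tunable)
 */
static enum send_action send_decide(size_t pending, size_t room, size_t len,
                                    size_t copy_max, unsigned int flags)
{
    if (pending == 0)
        return SEND_SWAP;

    /* SEND_ZERO_COPY: prefer waiting over a large copy */
    if ((flags & SEND_ZERO_COPY) && len > copy_max)
        return SEND_WAIT;

    /* SEND_AT_ONCE: the source buffer must end up empty */
    if ((flags & SEND_AT_ONCE) && len > room)
        return SEND_WAIT;

    return SEND_COPY;
}
```
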
| 100 | |
| 101 | Both operations should return a composite status : |
| 102 | - number of bytes transferred |
| 103 | - status flags (shutr, shutw, reset, empty, full, ...) |
| 104 | |
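A composite status could be returned as a small structure carrying both the
byte count and the flags. The sketch below assumes hypothetical names
(struct xfer_result, the XFER_* flags, xfer_report()); the flag list mirrors
the one given above and the values are arbitrary.

```c
#include <stddef.h>

/* Status flags named after the list above; the values are arbitrary. */
enum xfer_status {
    XFER_SHUTR = 1 << 0,  /* read side was shut */
    XFER_SHUTW = 1 << 1,  /* write side was shut */
    XFER_RESET = 1 << 2,  /* connection was reset */
    XFER_EMPTY = 1 << 3,  /* source buffer is now empty */
    XFER_FULL  = 1 << 4,  /* destination buffer is now full */
};

/* Hypothetical composite return value for cs_recv()/cs_send(). */
struct xfer_result {
    size_t       bytes;   /* number of bytes transferred */
    unsigned int status;  /* OR of enum xfer_status values */
};

/* Builds the result of a transfer from what is left on each side. */
static struct xfer_result xfer_report(size_t moved, size_t src_left,
                                      size_t dst_room, int read_shut)
{
    struct xfer_result r = { moved, 0 };

    if (src_left == 0)
        r.status |= XFER_EMPTY;
    if (dst_room == 0)
        r.status |= XFER_FULL;
    if (read_shut)
        r.status |= XFER_SHUTR;
    return r;
}
```

Returning both pieces together lets the caller tell apart "0 bytes because
nothing is pending" from "0 bytes because the peer has closed" without an
extra call.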