| 2018-02-21 - Layering in haproxy 1.9 |
| ------------------------------------ |
| |
| 2 main zones : |
| - application : reads from conn_streams, writes to conn_streams, often uses |
| streams |
| |
| - connection : receives data from the network, presented into buffers |
| available via conn_streams, sends data to the network |
| |
| |
| The connection zone contains multiple layers which behave independently in each |
| direction. The Rx direction is activated upon callbacks from the lower layers. |
| The Tx direction is activated recursively from the upper layers. Between every |
| two layers there may be a buffer, in each direction. When a buffer is full |
| either in Tx or Rx direction, this direction is paused from the network layer |
| and the location where the congestion is encountered. Upon end of congestion |
| (cs_recv() from the upper layer, or sendto() at the lower layers), a |
| tasklet_wakeup() is performed on the blocked layer so that suspended operations |
| can be resumed. In this case, the Rx side restarts propagating data upwards |
| from the lowest blocked level, while the Tx side restarts propagating data |
| downwards from the highest blocked level. Proceeding like this ensures that |
| information known to the producer may always be used to tailor the buffer sizes |
| or decide on a strategy to best aggregate data. Additionally, each time a layer |
| is crossed without transformation, it becomes possible to send without copying. |
| |
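| The pause and resume logic above can be sketched as follows for a single Rx |
| layer. This is only an illustration under stated assumptions: struct layer, |
| layer_rx() and layer_consume() are hypothetical names, not haproxy code, and |
| the wakeup is simulated by a counter and a direct call where the real code |
| would rely on tasklet_wakeup(). |

```c
#include <stddef.h>

/* Hypothetical model of one layer and the buffer sitting above it. */
struct layer {
    size_t buf_size;  /* capacity of the buffer above this layer */
    size_t buf_data;  /* bytes currently stored in that buffer */
    size_t pending;   /* bytes this layer still has to pass upwards */
    int    blocked;   /* set when the buffer above was found full */
    int    wakeups;   /* counts what would be tasklet_wakeup() calls */
};

/* Rx path: the lower layer delivers <n> new bytes. As much as possible is
 * moved into the upper buffer. If some bytes remain, the layer is marked
 * blocked. Returns the number of bytes moved upwards.
 */
size_t layer_rx(struct layer *l, size_t n)
{
    size_t room, moved;

    l->pending += n;
    room  = l->buf_size - l->buf_data;
    moved = l->pending < room ? l->pending : room;
    l->buf_data += moved;
    l->pending  -= moved;
    l->blocked   = l->pending > 0;
    return moved;
}

/* The upper layer consumes up to <n> bytes (the cs_recv() side). If the
 * layer was blocked, it is woken up and resumes propagating data upwards.
 * Returns the number of bytes consumed.
 */
size_t layer_consume(struct layer *l, size_t n)
{
    if (n > l->buf_data)
        n = l->buf_data;
    l->buf_data -= n;
    if (n && l->blocked) {
        l->wakeups++;     /* assumption: stands in for tasklet_wakeup() */
        layer_rx(l, 0);   /* resume from the blocked level */
    }
    return n;
}
```

| With a 4-byte buffer, delivering 6 bytes moves 4 of them and leaves the layer |
| blocked with 2 pending bytes. Consuming 3 bytes then triggers one wakeup and |
| lets the 2 pending bytes through. |
| |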
| The Rx side notifies the application of data readiness using a wakeup or a |
| callback. The Tx side notifies the application of room availability once data |
| have been moved and the uppermost buffer has some free space again. |
| |
| When crossing a mux downwards, it is possible that the sender is not allowed to |
| access the buffer because it is not yet its turn. This is not a problem: the |
| data remain in the conn_stream's buffer (or the stream's one) and the transfer |
| will be restarted once the mux is ready to consume these data. |
| |
| |
| cs_recv() -------. cs_send() |
| ^ +--------> |||||| -------------+ ^ |
| | | -------' | | stream |
| --|----------|-------------------------------|-------|------------------- |
| | | V | connection |
| data .---. | | room |
| ready! |---| |---| available! |
| |---| |---| |
| |---| |---| |
| | | '---' |
| ^ +------------+-------+ | |
| | | ^ | / |
| / V | V / |
| / recvfrom() | sendto() | |
| -------------|----------------|--------------|--------------------------- |
| | | poll! V kernel |
| |
| |
| The cs_recv() function should act on pointers to buffer pointers, so that the |
| callee may decide to pass its own buffer directly by simply swapping pointers. |
| Similarly for cs_send() it is desirable to let the callee steal the buffer by |
| swapping the pointers. This way it remains possible to implement zero-copy |
| forwarding. |
| |
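| The pointer swapping idea can be illustrated with the sketch below. It does |
| not use haproxy's real buffer type: struct buf and xfer_or_swap() are |
| hypothetical, and the policy (swap when the destination is empty, copy |
| otherwise) is only one possible reading of the text above. |

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal linear buffer, not haproxy's struct buffer. */
struct buf {
    char  *area;  /* storage area */
    size_t size;  /* allocated size */
    size_t data;  /* number of bytes in use, starting at area[0] */
};

/* Transfers data from *src to *dst. Both arguments are pointers to buffer
 * pointers, so when *dst is empty the two buffers are simply swapped and no
 * byte is copied. Otherwise as much data as fits is copied. Returns the
 * number of bytes transferred.
 */
size_t xfer_or_swap(struct buf **dst, struct buf **src)
{
    size_t len;

    if ((*dst)->data == 0) {
        /* zero-copy: the receiver takes the sender's buffer */
        struct buf *tmp = *dst;

        len  = (*src)->data;
        *dst = *src;
        *src = tmp;
        return len;
    }

    /* the destination already holds data: fall back to a copy */
    len = (*dst)->size - (*dst)->data;
    if (len > (*src)->data)
        len = (*src)->data;
    memcpy((*dst)->area + (*dst)->data, (*src)->area, len);
    (*dst)->data += len;

    /* realign what was left in the source */
    memmove((*src)->area, (*src)->area + len, (*src)->data - len);
    (*src)->data -= len;
    return len;
}
```

| The same function serves both directions: for a receive, *dst is the caller's |
| buffer, and for a send it is the callee's. |
| |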
| Some operation flags will be needed on cs_recv() : |
| - RECV_ZERO_COPY : refuse to merge new data into the current buffer if it |
| will result in a data copy (ie the buffer is not empty), unless no more |
| than XXX bytes have to be copied (eg: copying 2 cache lines may be cheaper |
| than waiting and playing with pointers) |
| |
| - RECV_AT_ONCE : only perform the operation if it leaves the source buffer |
| empty at the end of the operation, so that no two buffers remain allocated. |
| Most of the time this will result in either a small read or a zero-copy |
| operation. |
| |
| - RECV_PEEK : retrieve a copy of pending data without removing these data |
| from the source buffer. An alternative could be to find the pointer to the |
| source buffer and access these data directly, though this might be less |
| interesting in the long term, thread-wise. |
| |
| - RECV_MIN : receive at least X bytes (or fewer on a shutdown), or fail. |
| This should help various protocol parsers which need to receive a complete |
| frame before proceeding. |
| |
| - RECV_ENOUGH : no more data expected after this read if it's of the |
| requested size, thus no need to re-enable receiving on the lower layers. |
| |
| - RECV_ONE_SHOT : perform a single read without re-enabling reading on the |
| lower layers, like we currently do when receiving an HTTP/1 request. Like |
| RECV_ENOUGH where any size is enough. The two could probably be merged |
| (eg: by having a MIN argument like RECV_MIN). |
| |
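| As an illustration of how two of these flags could behave, here is a minimal |
| sketch. The flag names come from the list above, but their values, struct src |
| and recv_sketch() are assumptions made for the example. Only RECV_MIN and |
| RECV_PEEK are implemented; the other flags are merely declared. |

```c
#include <stddef.h>
#include <string.h>

/* Flag names from the list above; the bit values are arbitrary. */
enum {
    RECV_ZERO_COPY = 0x01,
    RECV_AT_ONCE   = 0x02,
    RECV_PEEK      = 0x04,
    RECV_MIN       = 0x08,
    RECV_ENOUGH    = 0x10,
    RECV_ONE_SHOT  = 0x20,
};

/* Hypothetical data source standing in for the lower layer's buffer. */
struct src {
    const char *data;  /* pending bytes */
    size_t      len;   /* number of pending bytes */
    int         shut;  /* non-zero once a shutdown was received */
};

/* Copies up to <count> bytes from <s> into <out>.
 *  - RECV_MIN : fail (-1) if fewer than <min> bytes can be delivered, unless
 *               a shutdown was seen, in which case less is accepted.
 *  - RECV_PEEK: leave the delivered bytes in the source.
 * Returns the number of bytes copied, or -1 on failure.
 */
long recv_sketch(struct src *s, char *out, size_t count, size_t min,
                 unsigned int flags)
{
    size_t n = s->len < count ? s->len : count;

    if ((flags & RECV_MIN) && n < min && !s->shut)
        return -1;

    memcpy(out, s->data, n);
    if (!(flags & RECV_PEEK)) {
        s->data += n;
        s->len  -= n;
    }
    return (long)n;
}
```

| A frame parser would typically pass RECV_MIN with the frame size, and retry |
| later upon failure instead of handling partial frames itself. |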
| |
| Some operation flags will be needed on cs_send() : |
| - SEND_ZERO_COPY : refuse to merge the presented data with existing data and |
| prefer to wait for current data to leave and try again, unless the consumer |
| considers the amount of data acceptable for a copy. |
| |
| - SEND_AT_ONCE : only perform the operation if it leaves the source buffer |
| empty at the end of the operation, so that no two buffers remain allocated. |
| Most of the time this will result in either a small write or a zero-copy |
| operation. |
| |
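| The decision a consumer has to take with these two flags might look like the |
| sketch below. The flag names are those listed above; their values, the |
| send_action enum, send_decide() and the copy_max threshold are hypothetical |
| and only meant to make the rules concrete. |

```c
#include <stddef.h>

/* Flag names from the list above; the bit values are arbitrary. */
enum {
    SEND_ZERO_COPY = 0x01,
    SEND_AT_ONCE   = 0x02,
};

/* Hypothetical outcome of a send attempt. */
enum send_action {
    SEND_ACT_WAIT,  /* do nothing now, try again later */
    SEND_ACT_SWAP,  /* steal the source buffer, no copy */
    SEND_ACT_COPY,  /* copy into the existing buffer */
};

/* Decides what to do with <src_data> bytes presented to a consumer whose
 * buffer holds <dst_data> bytes and has <dst_room> bytes free. <copy_max> is
 * the amount the consumer considers acceptable to copy (an assumption).
 */
enum send_action send_decide(unsigned int flags, size_t dst_data,
                             size_t dst_room, size_t src_data,
                             size_t copy_max)
{
    if (dst_data == 0)
        return SEND_ACT_SWAP;  /* empty destination: swap the pointers */

    if (dst_room == 0)
        return SEND_ACT_WAIT;  /* nothing can be merged at all */

    /* merging implies a copy: refuse it unless it is small enough */
    if ((flags & SEND_ZERO_COPY) && src_data > copy_max)
        return SEND_ACT_WAIT;

    /* the source would not be emptied by this operation */
    if ((flags & SEND_AT_ONCE) && src_data > dst_room)
        return SEND_ACT_WAIT;

    return SEND_ACT_COPY;
}
```

| Swapping into an empty destination satisfies both flags at once, since it |
| copies nothing and leaves the source empty. |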
| |
| Both operations should return a composite status : |
| - number of bytes transferred |
| - status flags (shutr, shutw, reset, empty, full, ...) |
| |
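| One possible shape for such a composite status is a small structure returned |
| by value. Everything below is an assumption made for illustration: the XFER_* |
| names, struct xfer_status and both helpers are hypothetical, and "empty" and |
| "full" are taken here to describe the source and the destination buffers |
| respectively. |

```c
#include <stddef.h>

/* Status flags named after the list above; the values are arbitrary. */
enum {
    XFER_SHUTR = 0x01,
    XFER_SHUTW = 0x02,
    XFER_RESET = 0x04,
    XFER_EMPTY = 0x08,  /* assumption: the source buffer is now empty */
    XFER_FULL  = 0x10,  /* assumption: the destination buffer is now full */
};

/* Hypothetical composite status: byte count plus status flags. */
struct xfer_status {
    size_t       bytes;
    unsigned int flags;
};

/* Builds the status reported after moving <moved> bytes, given what is left
 * in the source, the room left in the destination, and whether a read
 * shutdown was seen.
 */
struct xfer_status xfer_report(size_t moved, size_t src_left,
                               size_t dst_room, int shutr)
{
    struct xfer_status st = { moved, 0 };

    if (src_left == 0)
        st.flags |= XFER_EMPTY;
    if (dst_room == 0)
        st.flags |= XFER_FULL;
    if (shutr)
        st.flags |= XFER_SHUTR;
    return st;
}

/* Returns non-zero when no more data can ever be received: either a reset
 * was reported, or the read side is shut and the source is drained.
 */
int xfer_is_final(struct xfer_status st)
{
    if (st.flags & XFER_RESET)
        return 1;
    return (st.flags & XFER_SHUTR) && (st.flags & XFER_EMPTY);
}
```

| Returning both fields together lets the caller tell a short transfer caused by |
| a full buffer apart from one caused by a shutdown without an extra call. |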