2018-02-21 - Layering in haproxy 1.9
------------------------------------

2 main zones :
  - application : reads from conn_streams, writes to conn_streams, often uses
    streams

  - connection : receives data from the network, presented into buffers
    available via conn_streams, sends data to the network


The connection zone contains multiple layers which behave independently in each
direction. The Rx direction is activated upon callbacks from the lower layers.
The Tx direction is activated recursively from the upper layers. Between every
two layers there may be a buffer, in each direction. When a buffer is full
either in Tx or Rx direction, this direction is paused from the network layer
and the location where the congestion is encountered. Upon end of congestion
(cs_recv() from the upper layer, or sendto() at the lower layers), a
tasklet_wakeup() is performed on the blocked layer so that suspended operations
can be resumed. In this case, the Rx side restarts propagating data upwards
from the lowest blocked level, while the Tx side restarts propagating data
downwards from the highest blocked level. Proceeding like this ensures that
information known to the producer may always be used to tailor the buffer sizes
or decide on a strategy to best aggregate data. Additionally, each time a layer
is crossed without transformation, it becomes possible to send without copying.

The Rx side notifies the application of data readiness using a wakeup or a
callback. The Tx side notifies the application of room availability once data
have been moved, resulting in the uppermost buffer having some free space.

When crossing a mux downwards, it is possible that the sender is not allowed to
access the buffer because it is not yet its turn. This is not a problem: the
data remain in the conn_stream's buffer (or the stream's one) and the transfer
will be restarted once the mux is ready to consume these data.


       cs_recv()           -------.              cs_send()
      ^          +--------> ||||||  -------------+       ^
      |          |         -------'              |       |    stream
    --|----------|-------------------------------|-------|-------------------
      |          |                               V       |    connection
     data      .---.                           |   |    room
    ready!     |---|                           |---| available!
               |---|                           |---|
               |---|                           |---|
               |   |                           '---'
                 ^   +------------+-------+         |
                 |   |            ^       |        /
                /    V            |       V       /
               / recvfrom()       |   sendto()   |
    -------------|----------------|--------------|---------------------------
                 |                | poll!        V            kernel


The cs_recv() function should act on pointers to buffer pointers, so that the
callee may decide to pass its own buffer directly by simply swapping pointers.
Similarly for cs_send() it is desirable to let the callee steal the buffer by
swapping the pointers. This way it remains possible to implement zero-copy
forwarding.

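The pointer-swapping idea can be sketched as follows. This is only an
illustration under simplifying assumptions: struct buf and recv_swap_or_copy()
are hypothetical names and do not reflect haproxy's actual buffer API. When the
destination buffer is empty the two buffer pointers are exchanged, otherwise
whatever fits is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal buffer, not haproxy's real buffer structure. */
struct buf {
    char  *area;   /* storage area */
    size_t size;   /* capacity in bytes */
    size_t data;   /* number of bytes currently stored */
};

/* Hypothetical receive helper acting on pointers to buffer pointers.
 * If the destination buffer is empty, both pointers are swapped so that
 * the caller ends up owning the source buffer without any copy. Otherwise
 * as many bytes as fit are copied and removed from the source.
 * Returns the number of bytes made available on the destination side.
 */
static size_t recv_swap_or_copy(struct buf **dst, struct buf **src)
{
    size_t len;

    assert(dst && *dst && src && *src);

    if ((*dst)->data == 0) {
        /* zero-copy path: exchange the two buffers */
        struct buf *tmp = *dst;

        *dst = *src;
        *src = tmp;
        return (*dst)->data;
    }

    /* copy path: append what fits into the destination */
    len = (*dst)->size - (*dst)->data;
    if (len > (*src)->data)
        len = (*src)->data;

    memcpy((*dst)->area + (*dst)->data, (*src)->area, len);
    (*dst)->data += len;

    /* drop the copied bytes from the head of the source */
    memmove((*src)->area, (*src)->area + len, (*src)->data - len);
    (*src)->data -= len;
    return len;
}
```

After a swap the callee is left holding the caller's former (empty) buffer, so
no allocation is needed on either side.
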
Some operation flags will be needed on cs_recv() :
  - RECV_ZERO_COPY : refuse to merge new data into the current buffer if it
    will result in a data copy (i.e. the buffer is not empty), unless no more
    than XXX bytes have to be copied (e.g. copying 2 cache lines may be cheaper
    than waiting and playing with pointers)

  - RECV_AT_ONCE : only perform the operation if it will leave the source
    buffer empty at the end of the operation so that no two buffers remain
    allocated at the end. It will most of the time result in either a small
    read or a zero-copy operation.

  - RECV_PEEK : retrieve a copy of pending data without removing these data
    from the source buffer. An alternate solution could consist in finding
    the pointer to the source buffer and accessing these data directly,
    except that it might be less interesting for the long term, thread-wise.

  - RECV_MIN : receive at least X bytes (or less with a shutdown), or fail.
    This should help various protocol parsers which need to receive a complete
    frame before proceeding.

  - RECV_ENOUGH : no more data expected after this read if it's of the
    requested size, thus no need to re-enable receiving on the lower layers.

  - RECV_ONE_SHOT : perform a single read without re-enabling reading on the
    lower layers, like we currently do when receiving an HTTP/1 request. Like
    RECV_ENOUGH where any size is enough. The two could probably be merged
    (e.g. by having a MIN argument like RECV_MIN).


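The receive flags above could be evaluated before performing a transfer,
roughly as sketched below. The flag names come from the list above, but
everything else is an assumption made for illustration: the bit values, the
recv_check() and recv_rearm() helpers, and the COPY_MAX threshold standing in
for the unspecified "XXX bytes".

```c
#include <assert.h>
#include <stddef.h>

/* Flag names from the notes above; the bit values are arbitrary. */
enum recv_flags {
    RECV_ZERO_COPY = 0x01,
    RECV_AT_ONCE   = 0x02,
    RECV_PEEK      = 0x04,
    RECV_MIN       = 0x08,
    RECV_ENOUGH    = 0x10,
    RECV_ONE_SHOT  = 0x20,
};

/* Assumed stand-in for the "XXX bytes" limit: two 64-byte cache lines. */
#define COPY_MAX 128

/* Hypothetical pre-check returning how many bytes a receive would move,
 * or 0 if the operation must be refused for now.
 *   avail : bytes pending in the source buffer
 *   used  : bytes already present in the destination buffer
 *   room  : free bytes in the destination buffer
 *   min   : minimum wanted size, only used with RECV_MIN
 *   shut  : non-zero once the source is shut (no more data will come)
 */
static size_t recv_check(unsigned int flags, size_t avail, size_t used,
                         size_t room, size_t min, int shut)
{
    size_t len = avail < room ? avail : room;

    assert(!(flags & RECV_MIN) || min > 0);

    /* RECV_MIN: fail below <min>, unless a shutdown ends the data early */
    if ((flags & RECV_MIN) && len < min && !(shut && len == avail))
        return 0;

    /* RECV_AT_ONCE: the source must be left empty */
    if ((flags & RECV_AT_ONCE) && len < avail)
        return 0;

    /* RECV_ZERO_COPY: refuse a merge that copies more than COPY_MAX */
    if ((flags & RECV_ZERO_COPY) && used > 0 && len > COPY_MAX)
        return 0;

    return len;
}

/* Returns non-zero if receiving should be re-enabled on the lower layers
 * after <got> bytes were read out of <want> requested.
 */
static int recv_rearm(unsigned int flags, size_t want, size_t got)
{
    if (flags & RECV_ONE_SHOT)
        return 0;
    if ((flags & RECV_ENOUGH) && got == want)
        return 0;
    return 1;
}
```

With such helpers, RECV_ONE_SHOT behaves like RECV_ENOUGH for any size, which
matches the remark that the two flags could be merged.
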
Some operation flags will be needed on cs_send() :
  - SEND_ZERO_COPY : refuse to merge the presented data with existing data and
    prefer to wait for current data to leave and try again, unless the consumer
    considers the amount of data acceptable for a copy.

  - SEND_AT_ONCE : only perform the operation if it will leave the source
    buffer empty at the end of the operation so that no two buffers remain
    allocated at the end. It will most of the time result in either a small
    write or a zero-copy operation.


Both operations should return a composite status :
  - number of bytes transferred
  - status flags (shutr, shutw, reset, empty, full, ...)

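One possible shape for such a composite status is sketched below. Only the list
of conditions (shutr, shutw, reset, empty, full) comes from the notes above;
the xfer_result structure, the XFER_* names and bit values, and the
xfer_account() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Status conditions from the notes above; the bit values are arbitrary. */
enum xfer_status {
    XFER_SHUTR = 0x01,  /* read side was shut */
    XFER_SHUTW = 0x02,  /* write side was shut */
    XFER_RESET = 0x04,  /* connection or stream was reset */
    XFER_EMPTY = 0x08,  /* source buffer was left empty */
    XFER_FULL  = 0x10,  /* destination buffer was left full */
};

/* Hypothetical composite result, returned by value. */
struct xfer_result {
    size_t       bytes;  /* number of bytes transferred */
    unsigned int flags;  /* OR of XFER_* values */
};

/* Hypothetical helper computing the result of moving up to <avail> pending
 * bytes into <room> bytes of free space.
 */
static struct xfer_result xfer_account(size_t avail, size_t room)
{
    struct xfer_result res = { 0, 0 };

    res.bytes = avail < room ? avail : room;
    if (res.bytes == avail)
        res.flags |= XFER_EMPTY;
    if (res.bytes == room)
        res.flags |= XFER_FULL;

    assert(res.bytes <= avail && res.bytes <= room);
    return res;
}
```

Returning both fields at once lets the caller tell a short transfer caused by a
full destination from one caused by an exhausted source.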