Willy Tarreau | c14b7d9 | 2014-06-19 16:03:41 +0200

2012/07/05 - Connection layering and sequencing


An FD has a state:
  - CLOSED
  - READY
  - ERROR (?)
  - LISTEN (?)

A connection has a state:
  - CLOSED
  - ACCEPTED
  - CONNECTING
  - ESTABLISHED
  - ERROR

A stream interface has a state:
  - INI, REQ, QUE, TAR, ASS, CON, CER, EST, DIS, CLO

Note that CON and CER might be replaced by EST if the connection state is used
instead. CON might even be better suited than EST to indicate that a connection
is known.

si_shutw() must do:

    data_shutw()
    if (shutr) {
        data_close()
        ctrl_shutw()
        ctrl_close()
    }

si_shutr() must do:

    data_shutr()
    if (shutw) {
        data_close()
        ctrl_shutr()
        ctrl_close()
    }

Each of these steps may fail, in which case the failed step must be retained
and the remaining operations postponed in an asynchronous task.
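
The sequence above, with failed steps recorded for an async retry, could be
sketched as follows. This is only an illustration under assumed names
(si_sketch, the PEND_* flags and the step callbacks are all invented here):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch of the si_shutw() sequence: each step may fail, in
 * which case the remaining steps are recorded as pending so that an
 * asynchronous task can retry them later. All names are invented. */
enum {
    PEND_DATA_SHUTW = 0x1,
    PEND_DATA_CLOSE = 0x2,
    PEND_CTRL_SHUTW = 0x4,
    PEND_CTRL_CLOSE = 0x8,
};

struct si_sketch {
    bool shutr_done;   /* other direction already shut: full close wanted */
    unsigned pending;  /* steps left for the async task */
    bool (*data_shutw)(struct si_sketch *);
    bool (*data_close)(struct si_sketch *);
    bool (*ctrl_shutw)(struct si_sketch *);
    bool (*ctrl_close)(struct si_sketch *);
};

/* returns 0 when all requested steps completed, -1 when some were postponed */
static int si_shutw_sketch(struct si_sketch *si)
{
    unsigned close_steps = si->shutr_done ?
        (PEND_DATA_CLOSE | PEND_CTRL_SHUTW | PEND_CTRL_CLOSE) : 0;

    if (!si->data_shutw(si)) {
        si->pending |= PEND_DATA_SHUTW | close_steps;
        return -1;
    }
    if (si->shutr_done) {
        if (!si->data_close(si)) {
            si->pending |= PEND_DATA_CLOSE | PEND_CTRL_SHUTW | PEND_CTRL_CLOSE;
            return -1;
        }
        if (!si->ctrl_shutw(si)) {
            si->pending |= PEND_CTRL_SHUTW | PEND_CTRL_CLOSE;
            return -1;
        }
        if (!si->ctrl_close(si)) {
            si->pending |= PEND_CTRL_CLOSE;
            return -1;
        }
    }
    return 0;
}

/* trivial steps for a usage example */
static bool step_ok(struct si_sketch *si)   { (void)si; return true; }
static bool step_fail(struct si_sketch *si) { (void)si; return false; }
```

The async task then only has to look at ->pending to know which of the
remaining steps still need to be performed.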

The first asynchronous data_shut() might already fail, so it is mandatory to
save the other side's status with the connection in order to let the async
task know whether the 3 next steps must still be performed.

The connection (or perhaps the FD) needs to know:
  - the desired close operations: DSHR, DSHW, CSHR, CSHW
  - the completed close operations: DSHR, DSHW, CSHR, CSHW

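The desired/completed pair maps naturally to two bit fields; the operations
still owed to the async task are then just the difference. A minimal sketch,
with flag names taken from the note above but an invented layout:

```c
#include <assert.h>

/* Sketch of the desired/completed close-operation flags. DSHR/DSHW = data
 * shut read/write, CSHR/CSHW = ctrl shut read/write; the exact bit values
 * are an assumption. */
#define CO_FL_DSHR  0x01  /* data layer shut for reads */
#define CO_FL_DSHW  0x02  /* data layer shut for writes */
#define CO_FL_CSHR  0x04  /* ctrl layer shut for reads */
#define CO_FL_CSHW  0x08  /* ctrl layer shut for writes */

struct close_state {
    unsigned desired;    /* operations that were requested */
    unsigned completed;  /* operations that actually succeeded */
};

/* operations still owed to the async task */
static inline unsigned close_pending(const struct close_state *cs)
{
    return cs->desired & ~cs->completed;
}
```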

On the accept() side, we probably need to know:
  - if a header is expected (eg: accept-proxy)
  - if this header is still being waited for
    => maybe both pieces of info might be combined into one bit

  - if a data-layer accept() is expected
  - if a data-layer accept() has been started
  - if a data-layer accept() has been performed
    => possibly 2 bits, to indicate the need to free()

On the connect() side, we need to know:
  - the desire to send a header (eg: send-proxy)
  - if this header has been sent
    => maybe both pieces of info might be combined

  - if a data-layer connect() is expected
  - if a data-layer connect() has been started
  - if a data-layer connect() has been completed
    => possibly 2 bits, to indicate the need to free()

On the response side, we also need to know:
  - the desire to send a header (eg: health check response for monitor-net)
  - if this header was sent
    => might be the same as sending a header over a new connection

Note: monitor-net has precedence over the proxy protocol and data layers.
Same for health mode.

For multi-step operations, use 2 bits:
  00 = operation not desired, not performed
  10 = operation desired, not started
  11 = operation desired, started but not completed
  01 = operation desired, started and completed

  => X != 00 ==> operation desired
     X & 01  ==> operation at least started
     X & 10  ==> operation not completed
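
The encoding above can be checked with a few lines of C. This is a direct
transcription of the table (the enum and helper names are invented):

```c
#include <assert.h>

/* 2-bit multi-step operation encoding: bit 0x2 = "desired but not yet
 * completed", bit 0x1 = "at least started". */
enum op_state {
    OP_NONE    = 0x0, /* 00: not desired, not performed   */
    OP_DESIRED = 0x2, /* 10: desired, not started         */
    OP_STARTED = 0x3, /* 11: desired, started, incomplete */
    OP_DONE    = 0x1, /* 01: desired, started, completed  */
};

static inline int op_desired(enum op_state x)    { return x != OP_NONE; }
static inline int op_started(enum op_state x)    { return (x & 0x1) != 0; }
static inline int op_unfinished(enum op_state x) { return (x & 0x2) != 0; }
```

Note how the three checks from the table come out as single-instruction
tests, which is the point of this particular bit assignment.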

Note: there is no way to store status information for error reporting.

Note2: it would be nice if "tcp-request connection" rules could work at the
connection level, just after headers! This means support for tracking stick
tables, possibly without too much added complexity.


Proposal for the incoming connection sequence:

  - accept()
  - if monitor-net matches or if mode health => try to send the response
  - if accept-proxy, wait for the proxy request
  - if tcp-request connection, process the tcp rules and possibly keep the
    pointer to the stick-table
  - if SSL is enabled, switch to the SSL handshake
  - then switch to the DATA state and instantiate a session

We just need a map of handshake handlers on the connection. They all manage
the FD status themselves and set the callbacks themselves. If their work
succeeds, they remove themselves from the list. If it fails, they remain
subscribed and enable the required polling until they are woken up again or
the timeout strikes.
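
Such a handler map could be sketched as an ordered list attached to the
connection (all names here are invented for illustration, not an actual API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct connection_sketch;

/* Handshake handlers sit on a list attached to the connection; each
 * successful handler removes itself, a failing one stays subscribed until
 * the next wake-up or the timeout. */
struct hs_handler {
    bool (*run)(struct connection_sketch *conn);  /* true = work completed */
    struct hs_handler *next;
};

struct connection_sketch {
    struct hs_handler *handshakes;  /* pending handlers, in order */
};

/* Runs pending handlers in order; returns true when the handshake phase is
 * over and the connection may switch to the DATA state. */
static bool run_handshakes(struct connection_sketch *conn)
{
    while (conn->handshakes) {
        if (!conn->handshakes->run(conn))
            return false;  /* stay subscribed; polling will wake us up */
        conn->handshakes = conn->handshakes->next;  /* success: remove */
    }
    return true;
}

/* trivial handlers for a usage example */
static bool hs_ok(struct connection_sketch *c)    { (void)c; return true; }
static bool hs_stall(struct connection_sketch *c) { (void)c; return false; }
```

Stopping at the first failing handler preserves the required ordering (eg:
the PROXY line must be parsed before the SSL handshake starts).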

Identified handshake handlers for incoming connections:
  - HH_HEALTH (tries to send OK and dies)
  - HH_MONITOR_IN (matches src IP and adds/removes HH_SEND_OK/HH_SEND_HTTP_OK)
  - HH_SEND_OK (tries to send "OK" and dies)
  - HH_SEND_HTTP_OK (tries to send "HTTP/1.0 200 OK" and dies)
  - HH_ACCEPT_PROXY (waits for the PROXY line and parses it)
  - HH_TCP_RULES (processes TCP rules)
  - HH_SSL_HS (starts the SSL handshake)
  - HH_ACCEPT_SESSION (instantiates a session)

Identified handshake handlers for outgoing connections:
  - HH_SEND_PROXY (tries to build and send the PROXY line)
  - HH_SSL_HS (starts the SSL handshake)

For the pollers, we could check that the handshake handler list is not empty
and decide to call a generic connection handshake handler instead of the
usual callbacks. The problem is that pollers don't know about connections,
they know about FDs. So the entities which manage the handlers should update
the FD callbacks accordingly.

With a bit of care, we could have:
  - HH_SEND_LAST_CHUNK (sends the chunk pointed to by a pointer and dies)
    => merges HEALTH, SEND_OK and SEND_HTTP_OK

It sounds like the ctrl vs data states for the connection are per-direction
(eg: support an async ctrl shutw while still reading data).

Also support shutr/shutw status at L4/L7.

In practice, what we really need is:

    shutdown(conn) =
        conn.data.shut()
        conn.ctrl.shut()
        conn.fd.shut()

    close(conn) =
        conn.data.close()
        conn.ctrl.close()
        conn.fd.close()

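One way to express this layering is an ops vector per layer, walked top-down
by generic helpers. A minimal sketch (the structures and names are assumed
here, not taken from actual code):

```c
#include <assert.h>
#include <stddef.h>

/* One shut/close pair per layer (data, ctrl, fd); the generic helpers walk
 * the layers from the top down. */
struct layer_ops {
    void (*shut)(void *ctx);
    void (*close)(void *ctx);
};

struct conn_layers {
    void *ctx;                       /* shared context, eg. the connection */
    struct layer_ops data, ctrl, fd;
};

static void conn_shutdown(struct conn_layers *c)
{
    c->data.shut(c->ctx);   /* eg: SSL close_notify */
    c->ctrl.shut(c->ctx);   /* eg: shutdown() on the transport */
    c->fd.shut(c->ctx);     /* stop polling the FD */
}

static void conn_close(struct conn_layers *c)
{
    c->data.close(c->ctx);
    c->ctrl.close(c->ctx);
    c->fd.close(c->ctx);
}

/* tiny recorders for a usage example */
static int shut_calls, close_calls;
static void rec_shut(void *ctx)  { (void)ctx; shut_calls++; }
static void rec_close(void *ctx) { (void)ctx; close_calls++; }
```
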
With SSL over Remote TCP (RTCP + RSSL) to reach the server, we would have:

    HTTP -> RTCP+RSSL connection <-> RTCP+RRAW connection -> TCP+SSL connection

The connection has to be closed at 3 places after a successful response:
  - DATA (RSSL over RTCP)
  - CTRL (RTCP to close the connection to the server)
  - SOCK (FD to close the connection to the second process)

Externally, the connection is seen with very few flags:
  - SHR
  - SHW
  - ERR

We don't need a CLOSED flag since a connection must always be detached when
it's closed.

The internal status doesn't need to be exposed:
  - FD allocated       (Y/N)
  - CTRL initialized   (Y/N)
  - CTRL connected     (Y/N)
  - CTRL handlers done (Y/N)
  - CTRL failed        (Y/N)
  - CTRL shutr         (Y/N)
  - CTRL shutw         (Y/N)
  - DATA initialized   (Y/N)
  - DATA connected     (Y/N)
  - DATA handlers done (Y/N)
  - DATA failed        (Y/N)
  - DATA shutr         (Y/N)
  - DATA shutw         (Y/N)

(note that having flags for operations still needing completion might be
easier)
--------------

Maybe we need to be able to call conn->fdset() and conn->fdclr(), but it
sounds very unlikely since the only functions manipulating these are in the
code of the data/ctrl handlers.

FDSET/FDCLR cannot be directly controlled by the stream interface since it
also depends on the DATA layer (WANT_READ/WANT_WRITE).

But FDSET/FDCLR is probably controlled by whoever owns the connection (eg:
DATA).

Example: an SSL connection relies on an FD. The buffer is full and wants the
connection to stop reading. It must not stop the FD itself. It is the read
function which should notice that it has nothing to do with a read wake-up,
and which needs to disable reading.

Conversely, when calling conn->chk_rcv(), the reader might get a WANT_READ or
even WANT_WRITE and adjust the FDs accordingly.
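
The chk_rcv() feedback loop could look like this. The return values and the
fd_events stand-in are invented for illustration; the point is only that the
caller, not the data layer, touches the FD:

```c
#include <assert.h>

/* The data layer reports what it needs (eg: an SSL layer wanting to write
 * during a read) and the caller adjusts FD polling accordingly. */
enum chk_rcv_ret { CR_DONE, CR_WANT_READ, CR_WANT_WRITE };

struct fd_events {
    int want_read, want_write;  /* stand-ins for fdset()/fdclr() on the FD */
};

static void apply_chk_rcv(enum chk_rcv_ret ret, struct fd_events *ev)
{
    switch (ret) {
    case CR_WANT_READ:   /* data layer starved: poll for readability */
        ev->want_read = 1;
        break;
    case CR_WANT_WRITE:  /* eg: SSL renegotiation: poll for writability */
        ev->want_write = 1;
        break;
    case CR_DONE:        /* nothing more to receive for now */
        ev->want_read = 0;
        break;
    }
}
```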

------------------------

OK, the problem is simple: we don't manipulate the FD at the right level.
We should have:
  ->connect(), ->chk_snd(), ->chk_rcv(), ->shutw(), ->shutr(), which are
    called from the upper layer (buffer)
  ->recv(), ->send(), called from the lower layer

Note that the SHR is *reported* by the lower layer but can be forced by the
upper layer. In this case it's like a delayed abort. The difficulty consists
in knowing whether the output data were correctly read. We'd probably need to
drain incoming data past the active shutr().

The only four purposes of the top-down shutr() call are:
  - acknowledge a shut read report: could probably be done better
  - read timeout => disable reading: it's a delayed abort. We want to
    report that the buffer is SHR, maybe even the connection, but the
    FD clearly isn't.
  - read abort due to an error on the other side or a desire to close (eg:
    http-server-close): delayed abort
  - complete abort

The active shutr() is problematic as we can't disable reading if we expect
some exchanges for data acknowledgement. We probably need to drain data only
until the shutw() has been performed and ACKed.

A connection shut down for read would behave like this:

  1) bidir exchanges

  2) shutr() => read_abort_pending=1

  3) drain input, still send output

  4) shutw()

  5) drain input, wait for read0 or ack(shutw)

  6) close()
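
The six steps above form a small state machine, which could be transcribed
like this (state and function names are invented; only the transitions come
from the list):

```c
#include <assert.h>

/* Shut-down-for-read sequence as a state machine. */
enum rd_shut_state {
    ST_BIDIR,        /* 1) bidirectional exchanges             */
    ST_DRAIN_SEND,   /* 3) draining input, still sending       */
    ST_DRAIN_WAIT,   /* 5) draining input, waiting for read0   */
    ST_CLOSED        /* 6) closed                              */
};

struct rd_shut_conn {
    enum rd_shut_state st;
    int read_abort_pending;
};

static void rs_shutr(struct rd_shut_conn *c)         /* step 2 */
{
    if (c->st == ST_BIDIR) {
        c->read_abort_pending = 1;
        c->st = ST_DRAIN_SEND;
    }
}

static void rs_shutw(struct rd_shut_conn *c)         /* step 4 */
{
    if (c->st == ST_DRAIN_SEND)
        c->st = ST_DRAIN_WAIT;
}

static void rs_read0_or_ack(struct rd_shut_conn *c)  /* step 5 -> 6 */
{
    if (c->st == ST_DRAIN_WAIT)
        c->st = ST_CLOSED;  /* close() */
}
```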

--------------------- 2012/07/05 -------------------

Communications must be performed this way:

    connection <-> channel <-> connection

A channel is composed of flags and stats, and may store data in either a
buffer or a pipe. We need low-level operations between sockets and buffers
or pipes. Right now we only support sockets, but later we might support
remote sockets and maybe pipes or shared memory segments.

So we need:

  - raw_sock_to_buf()    => receive raw data from socket into buffer
  - raw_sock_to_pipe()   => receive raw data from socket into pipe (splice in)
  - raw_sock_from_buf()  => send raw data from buffer to socket
  - raw_sock_from_pipe() => send raw data from pipe to socket (splice out)

  - ssl_sock_to_buf()    => receive ssl data from socket into buffer
  - ssl_sock_to_pipe()   => receive ssl data from socket into a pipe (NULL)
  - ssl_sock_from_buf()  => send ssl data from buffer to socket
  - ssl_sock_from_pipe() => send ssl data from pipe to socket (NULL)

These functions should set status flags such as:

    #define ERR_IN   0x01
    #define ERR_OUT  0x02
    #define SHUT_IN  0x04
    #define SHUT_OUT 0x08
    #define EMPTY_IN 0x10
    #define FULL_OUT 0x20

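As an illustration, a receive function along these lines could report its
outcome through those flags. The buffer structure and error handling below
are assumptions for the sketch, not the real raw_sock_to_buf():

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* status flags, as proposed above */
#define ERR_IN   0x01
#define SHUT_IN  0x04

struct flat_buf {
    char   data[16384];
    size_t len;
};

/* Reads from the socket into a flat buffer. Returns the recv() result and
 * updates *flags: read0 means the peer shut its write side (SHUT_IN), a
 * real error (not EAGAIN/EINTR) sets ERR_IN. */
static ssize_t raw_sock_to_buf_sketch(int fd, struct flat_buf *b,
                                      unsigned *flags)
{
    ssize_t ret = recv(fd, b->data + b->len, sizeof(b->data) - b->len, 0);

    if (ret > 0)
        b->len += (size_t)ret;
    else if (ret == 0)
        *flags |= SHUT_IN;   /* read0: peer shut its write side */
    else if (errno != EAGAIN && errno != EINTR)
        *flags |= ERR_IN;    /* real error, not a retry case */
    return ret;
}
```

The ssl_sock_* variants would have the same shape, with recv() replaced by
the SSL layer's read primitive.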