--- Relevant portions of RFC2616 ---

OCTET          = <any 8-bit sequence of data>
CHAR           = <any US-ASCII character (octets 0 - 127)>
UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
LOALPHA        = <any US-ASCII lowercase letter "a".."z">
ALPHA          = UPALPHA | LOALPHA
DIGIT          = <any US-ASCII digit "0".."9">
CTL            = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
SP             = <US-ASCII SP, space (32)>
HT             = <US-ASCII HT, horizontal-tab (9)>
<">            = <US-ASCII double-quote mark (34)>
CRLF           = CR LF
LWS            = [CRLF] 1*( SP | HT )
TEXT           = <any OCTET except CTLs, but including LWS>
HEX            = "A" | "B" | "C" | "D" | "E" | "F"
               | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
separators     = "(" | ")" | "<" | ">" | "@"
               | "," | ";" | ":" | "\" | <">
               | "/" | "[" | "]" | "?" | "="
               | "{" | "}" | SP | HT
token          = 1*<any CHAR except CTLs or separators>

quoted-pair    = "\" CHAR
ctext          = <any TEXT excluding "(" and ")">
qdtext         = <any TEXT except <">>
quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
comment        = "(" *( ctext | quoted-pair | comment ) ")"
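The CTL, separators and token rules above map directly onto small predicates. The following is a minimal C sketch, not HAProxy's actual code: the function names (http_is_ctl(), http_is_sep(), http_is_token_char(), http_is_token()) are hypothetical, and a production parser would more likely use a precomputed 256-entry lookup table than strchr().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* CTL = octets 0-31 and DEL (127). */
static inline bool http_is_ctl(unsigned char c)
{
    return c <= 31 || c == 127;
}

/* separators = ( ) < > @ , ; : \ " / [ ] ? = { } SP HT */
static inline bool http_is_sep(unsigned char c)
{
    /* strchr() would match the terminating NUL, so exclude 0 explicitly */
    return c != 0 && strchr("()<>@,;:\\\"/[]?={} \t", c) != NULL;
}

/* A token octet is any CHAR (0-127) that is neither a CTL nor a separator. */
static inline bool http_is_token_char(unsigned char c)
{
    return c <= 127 && !http_is_ctl(c) && !http_is_sep(c);
}

/* token = 1*<any CHAR except CTLs or separators> */
static inline bool http_is_token(const char *s, size_t len)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++)
        if (!http_is_token_char((unsigned char)s[i]))
            return false;
    return true;
}
```

For example, http_is_token("Content-Length", 14) holds, while "Bad Name" fails because SP is a separator.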



4 HTTP Message
4.1 Message Types

HTTP messages consist of requests from client to server and responses from
server to client. Request (section 5) and Response (section 6) messages use the
generic message format of RFC 822 [9] for transferring entities (the payload of
the message). Both types of message consist of:

 - a start-line
 - zero or more header fields (also known as "headers")
 - an empty line (i.e., a line with nothing preceding the CRLF) indicating the
   end of the header fields
 - and possibly a message-body.


HTTP-message    = Request | Response

start-line      = Request-Line | Status-Line
generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]

In the interest of robustness, servers SHOULD ignore any empty line(s) received
where a Request-Line is expected. In other words, if the server is reading the
protocol stream at the beginning of a message and receives a CRLF first, it
should ignore the CRLF.
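The robustness rule above amounts to skipping leading empty lines before looking for the Request-Line. Here is a minimal C sketch, assuming the received bytes sit in a plain buffer. The name http_skip_empty_lines() is hypothetical. The sketch also accepts a bare LF, in line with the tolerance recommended in section 19.3.

```c
#include <stddef.h>

/* Returns the offset of the first octet following any leading empty lines
 * (CRLF, or bare LF as tolerated by 19.3). A lone CR at the end of the
 * buffer is left in place because its LF may not have been received yet. */
static inline size_t http_skip_empty_lines(const char *buf, size_t len)
{
    size_t i = 0;

    for (;;) {
        if (i + 1 < len && buf[i] == '\r' && buf[i + 1] == '\n')
            i += 2;                 /* skip an empty CRLF line */
        else if (i < len && buf[i] == '\n')
            i += 1;                 /* skip an empty bare-LF line */
        else
            return i;               /* start of the Request-Line, or end of data */
    }
}
```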


4.2 Message headers

- Each header field consists of a name followed by a colon (":") and the field
  value.
- Field names are case-insensitive.
- The field value MAY be preceded by any amount of LWS, though a single SP is
  preferred.
- Header fields can be extended over multiple lines by preceding each extra
  line with at least one SP or HT.


message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )
field-content  = <the OCTETs making up the field-value and consisting of
                 either *TEXT or combinations of token, separators, and
                 quoted-string>
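Splitting a header line according to the rules above could look like the following minimal C sketch. It assumes that the line terminator has already been removed and that continuation lines have already been unfolded. struct hdr_split, hdr_split_line() and hdr_name_is() are hypothetical names. For brevity, the name check only rejects SP/HT and does not validate the full token grammar.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One header line split into name and value (pointers into the line). */
struct hdr_split {
    const char *name;
    size_t name_len;
    const char *value;
    size_t value_len;
};

/* Splits "name:value" at the first colon and skips the LWS preceding the
 * value. The line must already be unfolded and have no terminator.
 * Returns false if there is no colon, the name is empty, or the name
 * contains SP/HT (a simplified stand-in for the full token check). */
static inline bool hdr_split_line(const char *line, size_t len, struct hdr_split *out)
{
    size_t i = 0;

    while (i < len && line[i] != ':') {
        if (line[i] == ' ' || line[i] == '\t')
            return false;
        i++;
    }
    if (i == 0 || i == len)
        return false;

    out->name = line;
    out->name_len = i;

    i++;                                /* skip the colon */
    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;                            /* leading LWS is not field-content */
    out->value = line + i;
    out->value_len = len - i;
    return true;
}

/* Field names are case-insensitive, so compare them with tolower(). */
static inline bool hdr_name_is(const struct hdr_split *h, const char *name)
{
    size_t n = strlen(name);

    if (h->name_len != n)
        return false;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)h->name[i]) != tolower((unsigned char)name[i]))
            return false;
    return true;
}
```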


The field-content does not include any leading or trailing LWS occurring before
the first non-whitespace character of the field-value or after the last
non-whitespace character of the field-value. Such leading or trailing LWS MAY
be removed without changing the semantics of the field value. Any LWS that
occurs between field-content MAY be replaced with a single SP before
interpreting the field value or forwarding the message downstream.
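The LWS rule above can be applied in place on a field-value. This minimal C sketch uses a hypothetical hdr_normalize_lws() function. It assumes that CR and LF only appear in the value as part of line folding. It deliberately does not special-case quoted-strings. A quoted-string is itself field-content, so a strict implementation should arguably leave the LWS inside it untouched.

```c
#include <stdbool.h>
#include <stddef.h>

static inline bool is_lws_octet(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Normalizes a field-value in place: leading and trailing LWS are dropped
 * and each internal LWS run (including a folded CRLF + SP/HT) becomes a
 * single SP. Returns the new length. Quoted-strings are NOT special-cased. */
static inline size_t hdr_normalize_lws(char *v, size_t len)
{
    size_t out = 0;
    bool pending_sp = false;    /* LWS seen since the last content octet */

    for (size_t i = 0; i < len; i++) {
        if (is_lws_octet(v[i])) {
            if (out > 0)
                pending_sp = true;  /* leading LWS is simply skipped */
            continue;
        }
        if (pending_sp) {
            v[out++] = ' ';
            pending_sp = false;
        }
        v[out++] = v[i];        /* out never overtakes i, so this is safe */
    }
    return out;                 /* a trailing pending SP is never written */
}
```

With this sketch, "  text/html;\r\n\t charset=utf-8  " becomes "text/html; charset=utf-8".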


=> header format = 1*(CHAR & !ctl & !sep) ":" *(OCTET & (!ctl | LWS))
=> header-matching regexes apply to field-content, and may use field-value as
   a working area (but preferably after the first SP).

(19.3) The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers, recognize
a single LF as a line terminator and ignore the leading CR.
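Following the 19.3 recommendation, a line reader should accept both CRLF and a bare LF. Below is a minimal C sketch with a hypothetical http_line_len() helper.

```c
#include <stddef.h>

/* Looks for the end of the line starting at buf. If an LF is found, stores
 * the offset of the next line in *next and returns the length of the line
 * content, excluding the LF and, when present, the CR preceding it.
 * Returns -1 if no LF has been received yet. */
static inline long http_line_len(const char *buf, size_t len, size_t *next)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        *next = i + 1;
        if (i > 0 && buf[i - 1] == '\r')
            return (long)(i - 1);   /* CRLF terminator: ignore the CR */
        return (long)i;             /* bare LF terminator */
    }
    return -1;
}
```

A caller can loop on it from the start-line. Each call yields one line, and the first line of length 0 is the empty line that ends the header fields.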



message-body = entity-body
             | <entity-body encoded as per Transfer-Encoding>



5 Request

Request = Request-Line
          *(( general-header
            | request-header
            | entity-header ) CRLF)
          CRLF
          [ message-body ]



5.1 Request line

The elements are separated by SP characters. No CR or LF is allowed except in
the final CRLF sequence.

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

(19.3) Clients SHOULD be tolerant in parsing the Status-Line and servers
tolerant when parsing the Request-Line. In particular, they SHOULD accept any
amount of SP or HT characters between fields, even though only a single SP is
required.
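A tolerant Request-Line splitter following the 19.3 advice could look like this minimal C sketch. struct req_line and parse_request_line() are hypothetical names. The sketch assumes the line terminator has already been stripped. It rejects leading and trailing whitespace, since the tolerance only concerns the separators between fields.

```c
#include <stdbool.h>
#include <stddef.h>

struct req_line {
    const char *meth; size_t meth_len;   /* Method */
    const char *uri;  size_t uri_len;    /* Request-URI */
    const char *ver;  size_t ver_len;    /* HTTP-Version */
};

static inline bool is_sp_ht(char c)
{
    return c == ' ' || c == '\t';
}

/* Splits a Request-Line whose CRLF has already been stripped. Any amount of
 * SP/HT is accepted between the three elements; leading or trailing
 * whitespace, a missing or extra element, and any CR or LF are rejected. */
static inline bool parse_request_line(const char *s, size_t len, struct req_line *rl)
{
    const char *start[3];
    size_t n[3];
    size_t i = 0;

    for (int f = 0; f < 3; f++) {
        if (f > 0) {
            while (i < len && is_sp_ht(s[i]))
                i++;                     /* tolerant separator: 1*(SP|HT) */
        }
        start[f] = s + i;
        while (i < len && !is_sp_ht(s[i])) {
            if (s[i] == '\r' || s[i] == '\n')
                return false;            /* no CR/LF inside the line */
            i++;
        }
        n[f] = (size_t)(s + i - start[f]);
        if (n[f] == 0)
            return false;                /* element missing */
    }
    if (i != len)
        return false;                    /* something follows HTTP-Version */

    rl->meth = start[0]; rl->meth_len = n[0];
    rl->uri  = start[1]; rl->uri_len  = n[1];
    rl->ver  = start[2]; rl->ver_len  = n[2];
    return true;
}
```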

4.5 General headers
Apply to MESSAGE.

general-header = Cache-Control
               | Connection
               | Date
               | Pragma
               | Trailer
               | Transfer-Encoding
               | Upgrade
               | Via
               | Warning

General-header field names can be extended reliably only in combination with a
change in the protocol version. However, new or experimental header fields may
be given the semantics of general header fields if all parties in the
communication recognize them to be general-header fields. Unrecognized header
fields are treated as entity-header fields.



5.3 Request Header Fields

The request-header fields allow the client to pass additional information about
the request, and about the client itself, to the server. These fields act as
request modifiers, with semantics equivalent to the parameters on a programming
language method invocation.

request-header = Accept
               | Accept-Charset
               | Accept-Encoding
               | Accept-Language
               | Authorization
               | Expect
               | From
               | Host
               | If-Match
               | If-Modified-Since
               | If-None-Match
               | If-Range
               | If-Unmodified-Since
               | Max-Forwards
               | Proxy-Authorization
               | Range
               | Referer
               | TE
               | User-Agent

Request-header field names can be extended reliably only in combination with a
change in the protocol version. However, new or experimental header fields MAY
be given the semantics of request-header fields if all parties in the
communication recognize them to be request-header fields. Unrecognized header
fields are treated as entity-header fields.



7.1 Entity header fields

Entity-header fields define metainformation about the entity-body or, if no
body is present, about the resource identified by the request. Some of this
metainformation is OPTIONAL; some might be REQUIRED by portions of this
specification.

entity-header    = Allow
                 | Content-Encoding
                 | Content-Language
                 | Content-Length
                 | Content-Location
                 | Content-MD5
                 | Content-Range
                 | Content-Type
                 | Expires
                 | Last-Modified
                 | extension-header
extension-header = message-header

The extension-header mechanism allows additional entity-header fields to be
defined without changing the protocol, but these fields cannot be assumed to be
recognizable by the recipient. Unrecognized header fields SHOULD be ignored by
the recipient and MUST be forwarded by transparent proxies.
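The general-header and request-header lists above can be combined with the rule that unrecognized names are treated as entity-header fields. Together they give a simple classifier. Below is a minimal C sketch with hypothetical names (enum hdr_class, hdr_classify()). Unknown names fall back to HDR_ENTITY anyway, so the entity-header list does not need to be stored. The linear scan is used for clarity only.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum hdr_class { HDR_GENERAL, HDR_REQUEST, HDR_ENTITY };

static const char *const general_hdrs[] = {
    "Cache-Control", "Connection", "Date", "Pragma", "Trailer",
    "Transfer-Encoding", "Upgrade", "Via", "Warning", NULL
};

static const char *const request_hdrs[] = {
    "Accept", "Accept-Charset", "Accept-Encoding", "Accept-Language",
    "Authorization", "Expect", "From", "Host", "If-Match",
    "If-Modified-Since", "If-None-Match", "If-Range", "If-Unmodified-Since",
    "Max-Forwards", "Proxy-Authorization", "Range", "Referer", "TE",
    "User-Agent", NULL
};

/* Case-insensitive equality of two NUL-terminated header names. */
static inline bool hdr_name_eq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    return *a == *b;            /* both strings must end together */
}

static inline bool hdr_in_list(const char *name, const char *const *list)
{
    for (; *list; list++)
        if (hdr_name_eq(name, *list))
            return true;
    return false;
}

/* Entity headers and unrecognized names both end up as HDR_ENTITY. */
static inline enum hdr_class hdr_classify(const char *name)
{
    if (hdr_in_list(name, general_hdrs))
        return HDR_GENERAL;
    if (hdr_in_list(name, request_hdrs))
        return HDR_REQUEST;
    return HDR_ENTITY;
}
```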

----------------------------------

The format of Request-URI is defined by RFC3986:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   URI-reference = URI / relative-ref

   absolute-URI  = scheme ":" hier-part [ "?" query ]

   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

   relative-part = "//" authority path-abempty
                 / path-absolute
                 / path-noscheme
                 / path-empty

   scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

   authority     = [ userinfo "@" ] host [ ":" port ]
   userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
   host          = IP-literal / IPv4address / reg-name
   port          = *DIGIT

   IP-literal    = "[" ( IPv6address / IPvFuture ) "]"

   IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

   IPv6address   =                            6( h16 ":" ) ls32
                 /                       "::" 5( h16 ":" ) ls32
                 / [               h16 ] "::" 4( h16 ":" ) ls32
                 / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                 / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                 / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                 / [ *4( h16 ":" ) h16 ] "::"              ls32
                 / [ *5( h16 ":" ) h16 ] "::"              h16
                 / [ *6( h16 ":" ) h16 ] "::"

   h16           = 1*4HEXDIG
   ls32          = ( h16 ":" h16 ) / IPv4address
   IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet
   dec-octet     = DIGIT                 ; 0-9
                 / %x31-39 DIGIT         ; 10-99
                 / "1" 2DIGIT            ; 100-199
                 / "2" %x30-34 DIGIT     ; 200-249
                 / "25" %x30-35          ; 250-255

   reg-name      = *( unreserved / pct-encoded / sub-delims )

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

   segment       = *pchar
   segment-nz    = 1*pchar
   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                 ; non-zero-length segment without any colon ":"

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

   query         = *( pchar / "/" / "?" )

   fragment      = *( pchar / "/" / "?" )

   pct-encoded   = "%" HEXDIG HEXDIG

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

=> so the list of allowed characters in a URI is:

   uri-char = unreserved / gen-delims / sub-delims / "%"
            = ALPHA / DIGIT / "-" / "." / "_" / "~"
            / ":" / "/" / "?" / "#" / "[" / "]" / "@"
            / "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "=" / "%"

Note that non-ASCII characters are forbidden! Spaces and CTLs are forbidden too.
Unfortunately, some products such as Apache allow such characters :-/
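A strict check against the uri-char set above could look like the following minimal C sketch. is_uri_char() and uri_chars_valid() are hypothetical names. Beyond the character set, the sketch verifies that each "%" starts a pct-encoded triplet, which follows from the pct-encoded rule. As noted above, such a strict check will reject some requests that other products accept.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* uri-char = unreserved / gen-delims / sub-delims / "%" */
static inline bool is_uri_char(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return true;
    return c != 0 && strchr("-._~:/?#[]@!$&'()*+,;=%", c) != NULL;
}

/* Rejects any octet outside uri-char (non-ASCII, SP, CTLs, ...) and any
 * "%" that is not followed by two HEXDIGs (pct-encoded). */
static inline bool uri_chars_valid(const char *uri, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)uri[i];

        if (!is_uri_char(c))
            return false;
        if (c == '%') {
            if (i + 2 >= len ||
                !isxdigit((unsigned char)uri[i + 1]) ||
                !isxdigit((unsigned char)uri[i + 2]))
                return false;
            i += 2;             /* skip the two hex digits */
        }
    }
    return true;
}
```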

---- The correct way to do it ----

- one http_session
  It is basically any transport session on which we talk HTTP. It may be TCP,
  SSL over TCP, etc... It knows a way to talk to the client, either the socket
  file descriptor or a direct access to the client-side buffer. It should hold
  information about the last accessed server so that we can guarantee that the
  same server can be used during a whole session if needed. A first version
  without optimal support for HTTP pipelining will have the client buffers tied
  to the http_session. This may not be sufficient for full pipelining, but this
  will need further study. The link from the buffers to the backend should be
  managed by the http transaction (http_txn), provided that they are
  serialized. Each http_session has 0 to N http_txn. Each http_txn belongs to
  one and only one http_session.

- each http_txn has 1 request message (http_req), and 0 or 1 response message
  (http_rtr). Each of them belongs to one and only one http_txn. An http_txn
  holds information such as the HTTP method, the URI, the HTTP version, the
  transfer-encoding, the HTTP status, the authorization, the req and rtr
  content-length, the timers, logs, etc... The backend and server which process
  the request are also known from the http_txn.

- both request and response messages hold header and parsing information, such
  as the parsing state, start of headers, start of message, captures, etc...
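The relationships described above can be sketched as C structures: one session holds 0 to N transactions, and each transaction holds exactly one request and at most one response. Everything below is hypothetical and illustrative only. These are not HAProxy's actual definitions, and only a subset of the listed fields is shown. The transaction list is a simple singly-linked FIFO to reflect the serialization requirement.

```c
#include <stddef.h>

struct http_txn;
struct http_session;

/* Parsing information common to http_req and http_rtr. */
struct http_msg {
    struct http_txn *txn;       /* the one and only owning transaction */
    int     state;              /* parsing state */
    size_t  som;                /* start of message in the buffer */
    size_t  soh;                /* start of headers */
    char  **captures;           /* captured header values */
};

struct http_txn {
    struct http_session *sess;  /* the one and only owning session */
    struct http_txn *next;      /* next transaction on the same session */
    struct http_msg  req;       /* exactly one request (http_req) */
    struct http_msg *rtr;       /* 0 or 1 response (http_rtr), NULL if none */
    int     meth;               /* HTTP method */
    char   *uri;
    int     version;            /* HTTP version */
    int     status;             /* HTTP status */
    long long req_cl, rtr_cl;   /* req and rtr content-length */
    void   *backend, *server;   /* who processes the request */
};

struct http_session {
    int     cli_fd;             /* client-side transport (TCP, SSL over TCP...) */
    void   *last_server;        /* reused for the whole session if needed */
    struct http_txn *txns;      /* 0 to N transactions, oldest first */
};

/* Appends a transaction to its session; transactions stay serialized. */
static inline void session_add_txn(struct http_session *s, struct http_txn *t)
{
    struct http_txn **p = &s->txns;

    while (*p)
        p = &(*p)->next;
    t->sess = s;
    t->next = NULL;
    t->req.txn = t;
    *p = t;
}
```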