2022-02-22 - debugging options with pools

Two goals:
 - help developers spot bugs as early as possible

 - make the process more reliable in the field, by killing sick ones as soon
   as possible instead of letting them corrupt data, cause trouble, or even be
   exploited.

An allocated object may exist in 5 forms:
 - in use: currently referenced and used by haproxy, 100% of its size is
   dedicated to the application which can do absolutely anything with it,
   but it may never touch anything before nor after that area.

 - in cache: the object is neither referenced nor used anymore, but it sits
   in a thread's cache. The application may not touch it at all anymore, and
   some parts of it could even be unmapped. Only the current thread may safely
   reach it, though others might find/release it when under thread isolation.
   The thread cache needs some LRU linking that may be stored anywhere, either
   inside the area, or outside. The parts surrounding the <size> parts remain
   invisible to the application layer, and can serve as a protection.

 - in shared cache: the object is neither referenced nor used anymore, but it
   may be reached by any thread. Some parts of it could be unmapped. Any
   thread may pick it but only one may find it, hence once grabbed, it is
   guaranteed no other one will find it. The shared cache needs to set up a
   linked list and a single pointer needs to be stored anywhere, either inside
   or outside the area. The parts surrounding the <size> parts remain
   invisible to the application layer, and can serve as a protection.

 - in the system's memory allocator: the object is no longer known to
   haproxy. It may be reassigned in parts or totally to other pools or other
   subsystems (e.g. crypto library). Some or all of it may be unmapped. The
   areas surrounding the <size> parts are also part of the object from the
   library's point of view and may be delivered to other areas. Tampering
   with these may cause any other part to malfunction in dirty ways.

 - in the OS only: the memory allocator gave it back to the OS.

The following options need to be configurable:
 - detect improper initialization: this is done by poisoning objects before
   delivering them to the application.

 - help figure out where an object was allocated when in use: a pointer to
   the call place will help. Pointing to the last pool_free() as well helps
   for the same reasons when dealing with a use-after-free (UAF).

 - detection of wrong pointer/pool when in use: a pointer to the pool before
   or after the area will definitely help.

 - detection of overflows when in use: a canary at the end of the area
   (closest possible to <size>) will definitely help. The pool pointer above
   can do that job. Ideally, we should fill some data at the end so that even
   unaligned sizes can be checked (e.g. a buffer that gets a zero appended).
   If we just align on 2 pointers, writing the same pointer twice at the end
   may do the job, but we won't necessarily have our bytes. Thus a particular
   end-of-string pattern would be useful (e.g. ff55aa01) to fill it.

 - detection of double free when in cache: similar to detection of wrong
   pointer/pool when in use: the pointer at the end may simply be changed so
   that it cannot match the pool anymore. By using a pointer to the caller of
   the previous free() operation, we have the guarantee to see different
   pointers, and this pointer can be inspected to figure out where the object
   was previously freed. An extra check may even distinguish a perfect
   double-free (same caller) from just a wrong free (pointer differs from
   pool).

 - detection of late corruption when in cache: keeping a copy of the
   checksum of the whole area upon free() will do the job, but requires one
   extra storage area for the checksum. Filling the area with a pattern also
   does the job and doesn't require extra storage, but it loses the contents
   and can be a bit slower. Sometimes losing the contents can be a feature,
   especially when trying to detect late reads. Probably both need to be
   implemented. Note that if contents are not strictly needed, storing a
   checksum inside the area does the job.

 - preserve total contents in cache for debugging: losing some precious
   information can be a problem.

 - pattern filling of the area helps detect use-after-free in read-only mode.

 - allocating cold objects first helps with both cases above.

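The pool-pointer and caller-mark ideas from the list above can be sketched as
follows. This is only an illustration under stated assumptions: the names
dbg_pool, dbg_alloc() and dbg_free_check() are hypothetical and are not
haproxy's API, memory comes straight from malloc(), and the object is never
returned to the allocator by the check (it stays "in cache").

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool descriptor: only the usable <size> matters here. */
struct dbg_pool {
    size_t size;
};

enum dbg_status { DBG_OK = 0, DBG_BAD_TAG = 1 };

/* The tag sits right after <size> and may be unaligned, so it is always
 * accessed through memcpy().
 */
static void dbg_set_tag(const struct dbg_pool *pool, void *area, const void *tag)
{
    memcpy((char *)area + pool->size, &tag, sizeof(tag));
}

static const void *dbg_get_tag(const struct dbg_pool *pool, const void *area)
{
    const void *tag;

    memcpy(&tag, (const char *)area + pool->size, sizeof(tag));
    return tag;
}

/* Allocation: poison the area to expose missing initialization, then store
 * the pool pointer after it, where it doubles as an overflow canary.
 */
static void *dbg_alloc(struct dbg_pool *pool, int poison_byte)
{
    void *area = malloc(pool->size + sizeof(void *));

    if (!area)
        return NULL;
    memset(area, poison_byte, pool->size);
    dbg_set_tag(pool, area, pool);
    return area;
}

/* Release check: the tag must still be the pool pointer, otherwise we are
 * facing an overflow, a wrong pool or a double free. On success the tag is
 * replaced with the caller's mark so that a second free cannot match.
 */
static enum dbg_status dbg_free_check(struct dbg_pool *pool, void *area,
                                      const void *caller)
{
    assert(caller != (const void *)pool); /* the mark must differ from the pool */
    if (dbg_get_tag(pool, area) != (const void *)pool)
        return DBG_BAD_TAG;
    dbg_set_tag(pool, area, caller);
    return DBG_OK;
}
```

When the check fails, the stored tag can be inspected: if it is a code address
it designates the previous free() caller, otherwise it was most likely damaged
by an overflow.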
Uncovered:
 - overflow/underflow when in cache/shared/libc: it belongs to use-after-free
   pattern and such an error during regular use ought to be caught while the
   object was still in use.

 - integrity when in libc: not under our control anymore, this is a libc
   problem.

Arbitrable:
 - integrity when in shared cache: unlikely to happen only then if it could
   have happened in the local cache. Shared cache not often used anymore, thus
   probably not worth the effort.

 - protection against double-free when in shared cache/libc: might be done for
   a cheap price, probably worth being able to quickly tell that such an
   object left the local cache (e.g. the mark points to the caller, but could
   possibly just be incremented, hence still point to the same code location+1
   byte when released). Calls are 4 bytes min on RISC, 5 on x86 so we do have
   some margin by having a caller's location be +0,+1,+2 or +3.

 - underflow when in use: hasn't been really needed over time but may change.

 - detection of late corruption when in shared cache: checksum or area filling
   are possible, but is this as relevant as it used to be, considering the
   less common use of the shared cache?

Design considerations:
 - object allocation when in use must remain minimal

 - when in cache, there are 2 lists which the compiler expects to be at least
   aligned each (e.g. if/when we start to use DWCAS).

 - the original "pool debugging" feature covers pool tracking, double-free
   detection, overflow detection and caller info at the cost of a single
   pointer placed immediately after the area.

 - preserving the contents might be done by placing the cache links and the
   shared cache's list outside of the area (either before or after). Placing
   them before has the merit that the allocated object preserves the 4-ptr
   alignment. But when a larger alignment is desired this often does not work
   anymore. Placing them after requires some dynamic adjustment depending on
   the object's size. If any protection is installed, this protection must be
   placed before the links so that the list doesn't get randomly corrupted and
   in turn corrupt adjacent elements. Note that if protection is desired, the
   extra waste is probably less critical.

 - a link to the last caller might have to be stored somewhere. Without
   preservation the free() caller may be placed anywhere while the alloc()
   caller may only be placed outside. With preservation, again the free()
   caller may be placed either before the object or after the mark at the end.
   There is no particular need that both share the same location though it may
   help. Note that when debugging is enabled, the free() caller doesn't need
   to be duplicated and can continue to serve as the double-free detection.
   Thus maybe in the end we only need to store the caller to the last alloc()
   but not the free() since if we want it, it's available via the pool debug.

 - use-after-free detection: contents may be erased on free() and checked on
   alloc(), but they can also be checksummed on free() and rechecked on
   alloc(). In the latter case we need to store a checksum somewhere. Note
   that with pure checksum we don't know what part was modified, but seeing
   previous contents can be useful.

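The two use-after-free detection approaches from the last point can be
compared with the sketch below. The function names and the 0xA5 fill byte are
arbitrary choices made for this example, and FNV-1a only stands in for
whatever checksum would really be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5  /* arbitrary pattern chosen for this sketch */

/* Approach 1: erase on free(). The previous contents are lost. */
static void uaf_fill(void *area, size_t size)
{
    assert(area != NULL);
    memset(area, FILL_BYTE, size);
}

/* Check on alloc(): returns the offset of the first modified byte, or
 * <size> when the pattern is intact.
 */
static size_t uaf_check_fill(const void *area, size_t size)
{
    const unsigned char *p = area;
    size_t i;

    for (i = 0; i < size; i++)
        if (p[i] != FILL_BYTE)
            break;
    return i;
}

/* Approach 2: checksum on free(), recompute and compare on alloc(). The
 * contents are preserved but the sum must be stored somewhere, and a
 * mismatch does not tell which part was modified.
 */
static uint32_t uaf_sum(const void *area, size_t size)
{
    const unsigned char *p = area;
    uint32_t h = 2166136261u;   /* FNV-1a 32-bit offset basis */
    size_t i;

    for (i = 0; i < size; i++) {
        h ^= p[i];
        h *= 16777619u;         /* FNV-1a 32-bit prime */
    }
    return h;
}
```

The fill variant pinpoints the damaged offset while the checksum variant keeps
the old contents visible for post-mortem analysis, which matches the trade-off
described above.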
Possibilities:

1) Linked lists inside the area:

             V                              size              alloc
          ---+------------------------------+-----------------+--
   in use    |##############################| (Pool) (Tracer) |
          ---+------------------------------+-----------------+--

          ---+--+--+------------------------+-----------------+--
   in cache  |L1|L2|########################| (Caller)  (Sum) |
          ---+--+--+------------------------+-----------------+--
or:
          ---+--+--+------------------------+-----------------+--
   in cache  |L1|L2|###################(sum)| (Caller)        |
          ---+--+--+------------------------+-----------------+--

          ---+-+----------------------------+-----------------+--
   in global |N|XXXX########################| (Caller)        |
          ---+-+----------------------------+-----------------+--


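For option 1, the links overlay the beginning of the area while the object is
cached, so the area must be large enough to hold them. A minimal sketch,
assuming that L1 and L2 are each a two-pointer list node (the struct and
function names below are made up for the example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly-linked list node (two pointers). */
struct list {
    struct list *n, *p;
};

/* In-cache view of the beginning of the area: L1 then L2. */
struct cache_links {
    struct list l1;
    struct list l2;
};

/* Smallest area able to hold the links once the object is in cache. */
static size_t min_area_size(size_t size)
{
    assert(size > 0);
    return size < sizeof(struct cache_links) ? sizeof(struct cache_links) : size;
}

/* Total allocation: the area followed by <extra_ptrs> pointer-sized slots
 * (pool/caller, tracer/sum).
 */
static size_t alloc_size_inside(size_t size, unsigned int extra_ptrs)
{
    return min_area_size(size) + extra_ptrs * sizeof(void *);
}
```

With this layout only the trailing slots add to the allocated size, at the
expense of overwriting the first bytes of the area when the object is cached.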
2) Linked lists before the area leave room for tracer and pool before
   the area, but the canary must remain at the end, however the area will
   be more difficult to keep aligned:

               V   head                           size              alloc
           ----+-+-+------------------------------+-----------------+--
   in use      |T|P|##############################| (canary)        |
           ----+-+-+------------------------------+-----------------+--

           --+-----+------------------------------+-----------------+--
   in cache  |L1|L2|##############################| (Caller)  (Sum) |
           --+-----+------------------------------+-----------------+--

           ------+-+------------------------------+-----------------+--
   in global     |N|##############################| (Caller)        |
           ------+-+------------------------------+-----------------+--


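The alignment concern of option 2 can be quantified with a small sketch. The
helpers below are hypothetical and assume that the allocation itself starts on
an <align> boundary; sizes are in bytes and <align> must be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds <v> up to the next multiple of <align> (a power of two). */
static size_t round_up(size_t v, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

/* Offset of the area from the beginning of the allocation when a head of
 * <head_size> bytes is placed before it.
 */
static size_t area_offset(size_t head_size, size_t align)
{
    return round_up(head_size, align);
}

/* Bytes lost to padding between the head and the area. */
static size_t head_waste(size_t head_size, size_t align)
{
    return area_offset(head_size, align) - head_size;
}
```

For example a 32-byte head (4 pointers on a 64-bit machine) costs nothing with
a 16-byte alignment, but wastes another 32 bytes once the area has to be
aligned on 64 bytes.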
3) Linked lists at the end of the area, might be shared with extra data
   depending on the state:

             V                              size              alloc
          ---+------------------------------+-----------------+--
   in use    |##############################| (Pool) (Tracer) |
          ---+------------------------------+-----------------+--

          ---+------------------------------+--+--+-----------+--
   in cache  |##############################|L1|L2| (Caller) (Sum)
          ---+------------------------------+--+--+-----------+--

          ---+------------------------------+-+---------------+--
   in global |##############################|N| (Caller)      |
          ---+------------------------------+-+---------------+--

This model requires a little bit of alignment at the end of the area, which is
not incompatible with pattern filling and/or checksumming:
 - preserving the area for post-mortem analysis means nothing may be placed
   inside. In this case it could make sense to always store the last releaser.
 - detecting late corruption may be done either with filling or checksumming,
   but the simple fact of assuming a risk of corruption that needs to be
   chased means we must not store the lists nor caller inside the area.

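The "little bit of alignment at the end of the area" needed by option 3 can be
sketched as below. The tail_layout structure and its helper are hypothetical,
and pointer alignment is assumed to be sufficient for the links.

```c
#include <assert.h>
#include <stddef.h>

struct tail_layout {
    size_t links_off;   /* where L1/L2 (or N) start, counted from the area */
    size_t alloc_size;  /* total size to request from the allocator */
};

/* Computes where the trailing links go for an area of <size> bytes followed
 * by <tail_size> bytes of links and debugging data.
 */
static struct tail_layout tail_layout_for(size_t size, size_t tail_size)
{
    const size_t align = sizeof(void *);
    struct tail_layout l;

    assert(size > 0);
    l.links_off  = (size + align - 1) / align * align;
    l.alloc_size = l.links_off + tail_size;
    return l;
}
```

The padding between <size> and links_off depends on each pool's object size,
which is the dynamic adjustment mentioned in the design considerations.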
Some models imply dedicating some place when in cache:
 - preserving contents forces the lists to be prefixed or appended, which
   leaves unused places when in use. Thus we could systematically place the
   pool pointer and the caller in this case.

 - if preserving contents is not desired, almost everything can be stored
   inside when not in use. Then each situation's size should be calculated
   so that the allocated size is known, and entries are filled from the
   beginning while not in use, or after the size when in use.

 - if poisoning is requested, late corruption might be detected but then we
   don't want the list to be stored inside at the risk of being corrupted.

Maybe just implement a few models:
 - compact/optimal: put l1/l2 inside
 - detect late corruption: fill/sum, put l1/l2 out
 - preserve contents: put l1/l2 out
 - corruption+preserve: do not fill, sum out
 - poisoning: not needed on free if pattern filling is done.

try2:
 - poison on alloc to detect missing initialization: yes/no
   (note: nothing to do if filling done)
 - poison on free to detect use-after-free: yes/no
   (note: nothing to do if filling done)
 - check on alloc for corruption-after-free: yes/no
   If content-preserving => sum, otherwise pattern filling; in
   any case, move L1/L2 out.
 - check for overflows: yes/no: use a canary after the area. The
   canary can be the pointer to the pool.
 - check for alloc caller: yes/no => always after the area
 - content preservation: yes/no
   (disables filling, moves lists out)
 - improved caller tracking: used to detect double-free, may benefit
   from content-preserving but not only.
243 from content-preserving but not only.