2022-02-24 - Pools structure and API

1. Background
-------------

Memory allocation is a complex problem covered by a massive amount of
literature. Memory allocators found in the field cover a broad spectrum of
capabilities, performance, fragmentation, efficiency, etc.

The main difficulty of memory allocation comes from finding the optimal chunks
for arbitrarily sized requests, that will still preserve a low fragmentation
level. Doing this well is often expensive in CPU usage and/or memory usage.

In programs like HAProxy that deal with a large number of fixed-size objects,
there is no point in enduring all this risk of fragmentation, and the
associated costs (sometimes up to several milliseconds with certain minimalist
allocators) are simply not acceptable. A better approach consists in grouping
frequently used objects by size, knowing that due to the high repetitiveness of
operations, a freed object will immediately be needed for another operation.

This grouping of objects by size is what is called a pool. Pools are created
for certain frequently allocated objects, are usually merged together when they
are of the same size (or almost the same size), and significantly reduce the
number of calls to the memory allocator.

With the arrival of threads, pools started to become a bottleneck so they now
implement an optional thread-local lockless cache. Finally, with the arrival of
really efficient memory allocators in modern operating systems, the shared part
has also become optional so that it doesn't consume memory if it does not bring
any value.

In 2.6-dev2, a number of debugging options that used to be configured at build
time only were changed to boot-time settings and can be modified using keywords
passed after "-dM" on the command line, which sets or clears bits in the
pool_debugging variable. The build-time options still affect the default
settings however. Default values may be consulted using "haproxy -dMhelp".


2. Principles
-------------

The pools architecture is selected at build time. The main options are:

  - thread-local caches and process-wide shared pool enabled (1)

    This is the default situation on most operating systems. Each thread has
    its own local cache, and when depleted it refills from the process-wide
    pool that avoids calling the standard allocator too often. It is possible
    to force this mode at build time by setting CONFIG_HAP_GLOBAL_POOLS or at
    boot time with "-dMglobal".

  - thread-local caches only are enabled (2)

    This is the situation on operating systems where a fast and modern memory
    allocator is detected and when it is estimated that the process-wide shared
    pool will not bring any benefit. This detection is automatic at build time,
    but may also be forced at build time by setting CONFIG_HAP_NO_GLOBAL_POOLS
    or at boot time with "-dMno-global".

  - pass-through to the standard allocator (3)

    This is used when one absolutely wants to disable pools and rely on regular
    malloc() and free() calls, essentially in order to trace memory allocations
    by call points, either internally via DEBUG_MEM_STATS, or externally via
    tools such as Valgrind. This mode of operation may be forced at build time
    by setting DEBUG_NO_POOLS or at boot time with "-dMno-cache".

  - pass-through to an mmap-based allocator for debugging (4)

    This is used only during deep debugging when trying to detect various
    conditions such as use-after-free. In this case each allocated object's
    size is rounded up to a multiple of a page size (4096 bytes) and an
    integral number of pages is allocated for each object using mmap(),
    surrounded by two inaccessible holes that aim to detect some out-of-bounds
    accesses. Released objects are instantly freed using munmap() so that any
    immediate subsequent access to the memory area crashes the process if the
    area had not been reallocated yet. This mode can be enabled at build time
    by setting DEBUG_UAF, or at run time by disabling pools and enabling UAF
    with "-dMuaf". It tends to consume a lot of memory and not to scale at all
    with concurrent calls, which tends to make the system stall. The watchdog
    may even trigger on some slow allocations.

There are no more provisions for running with a shared pool but no thread-local
cache: the shared pool's main goal is to compensate for the expensive calls to
the memory allocator. This gain may be huge on tiny systems using basic
allocators, but the thread-local cache will already achieve this. And on larger
threaded systems, the shared pool's benefit is visible when the underlying
allocator scales poorly, but in this case the shared pool would suffer from
the same limitations without its thread-local cache and wouldn't provide any
benefit.

Summary of the various operation modes:

                 (1)          (2)          (3)          (4)

                User         User         User         User
                  |            |            |            |
 pool_alloc()     V            V            |            |
             +---------+  +---------+       |            |
             | Thread  |  | Thread  |       |            |
             |  Local  |  |  Local  |       |            |
             |  Cache  |  |  Cache  |       |            |
             +---------+  +---------+       |            |
                  |            |            |            |
pool_refill*()    V            |            |            |
             +---------+       |            |            |
             | Shared  |       |            |            |
             |  Pool   |       |            |            |
             +---------+       |            |            |
                  |            |            |            |
     malloc()     V            V            V            |
             +---------+  +---------+  +---------+       |
             | Library |  | Library |  | Library |       |
             +---------+  +---------+  +---------+       |
                  |            |            |            |
       mmap()     V            V            V            V
             +---------+  +---------+  +---------+  +---------+
             |   OS    |  |   OS    |  |   OS    |  |   OS    |
             +---------+  +---------+  +---------+  +---------+

One extra build define, DEBUG_FAIL_ALLOC, is used to enforce random allocation
failures in pool_alloc() by randomly returning NULL, to test that callers
properly handle allocation failures. It may also be enabled at boot time using
"-dMfail". In this case the desired average rate of allocation failures can be
fixed by the global setting "tune.fail-alloc", expressed in percent.
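
A minimal sketch of how such a failure rate can be applied is shown below. The
names are illustrative only, not HAProxy's internals:

```c
#include <stdlib.h>

/* Hypothetical sketch of the "-dMfail" behavior: make roughly <rate>
 * percent of allocations fail so that error paths get exercised.
 */
static int fail_alloc_rate;   /* counterpart of tune.fail-alloc, in percent */

static void *failable_alloc(size_t size)
{
    /* draw a number in [0,99] and fail when it falls below the rate */
    if (fail_alloc_rate && rand() % 100 < fail_alloc_rate)
        return NULL;
    return malloc(size);
}
```

Callers are then expected to handle the NULL exactly as they would a real
out-of-memory condition.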

The thread-local caches contain the freshest objects. Their total size amounts
to the number of bytes set in global.tune.pool_cache_size and may be adjusted
by the "tune.memory.hot-size" global option, which itself defaults to the
build-time setting CONFIG_HAP_POOL_CACHE_SIZE, which was 1 MB before 2.6 and
512 kB after. The aim is to keep hot objects that still fit in the CPU core's
private L2 cache. Once these objects do not fit into the cache anymore, there's
no benefit keeping them local to the thread, so they'd rather be returned to
the shared pool or the main allocator so that any other thread may make use of
them. Under extreme thread contention the cost of accessing shared structures
in the global cache or in malloc() may still be significant and it may prove
useful to increase the thread-local cache size.


3. Storage in thread-local caches
---------------------------------

This section describes how objects are linked in thread-local caches. This is
not meant to be a concern for users of the pools API but it can be useful when
inspecting post-mortem dumps or when trying to figure out certain size
constraints.

Objects are stored in the local cache using a doubly-linked list. This ensures
that they can be visited in freshness order like a stack, while at the same
time being able to access them from oldest to newest when it is needed to
evict the coldest ones first:

  - releasing an object to the cache always puts it on the top.

  - allocating an object from the cache always takes the topmost one, hence
    the freshest one.

  - scanning for older objects to evict starts from the bottom, where the
    oldest ones are located.

To that end, each thread-local cache keeps a list head in the "list" member of
its "pool_cache_head" descriptor, that links all objects cast to type
"pool_cache_item" via their "by_pool" member.

Note that the mechanism described above only works for a single pool. When
trying to limit the total cache size to a certain value, all pools included,
there is also a need to arrange all objects from all pools together in the
local caches. For this, each thread_ctx maintains a list head of recently
released objects, all pools included, in its member "pool_lru_head". All items
in a thread-local cache are linked there via their "by_lru" member.

This means that releasing an object using pool_free() consists in inserting
it at the beginning of two lists:
  - the local pool_cache_head's "list" list head
  - the thread context's "pool_lru_head" list head

Allocating an object consists in picking the first entry from the pool's
"list" and deleting its "by_pool" and "by_lru" links.

Evicting an object consists in scanning the thread context's "pool_lru_head"
backwards and deleting the object's "by_pool" and "by_lru" links.

Given that entries are both inserted and removed synchronously, we have the
guarantee that the oldest object in the thread's LRU list is always the oldest
object in its pool, and that the next element is the cache's list head. This
is what allows the LRU eviction mechanism to figure out which pool an object
belongs to when releasing it.
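
The dual linking described above can be sketched as follows. This is a
minimal, simplified model with hypothetical helper names, not the actual list
API used by HAProxy:

```c
#include <stddef.h>

/* Simplified pool_cache_item linked both into its pool's list and into a
 * per-thread LRU, as described in the text above.
 */
struct list { struct list *n, *p; };

struct pool_cache_item {
    struct list by_pool;  /* linked into the pool_cache_head's "list" */
    struct list by_lru;   /* linked into the thread's "pool_lru_head" */
};

static void list_init(struct list *l) { l->n = l->p = l; }

/* insert <e> at the head of <l> (freshest position) */
static void list_insert(struct list *l, struct list *e)
{
    e->n = l->n; e->p = l;
    l->n->p = e; l->n = e;
}

static void list_delete(struct list *e)
{
    e->p->n = e->n; e->n->p = e->p;
    e->n = e->p = e;
}

/* pool_free(): push on top of both the pool list and the thread LRU */
static void cache_release(struct list *pool_list, struct list *lru,
                          struct pool_cache_item *it)
{
    list_insert(pool_list, &it->by_pool);
    list_insert(lru, &it->by_lru);
}

/* pool_alloc(): take the topmost (freshest) item and unlink both links */
static struct pool_cache_item *cache_take(struct list *pool_list)
{
    if (pool_list->n == pool_list)
        return NULL;
    struct pool_cache_item *it =
        (struct pool_cache_item *)((char *)pool_list->n -
                                   offsetof(struct pool_cache_item, by_pool));
    list_delete(&it->by_pool);
    list_delete(&it->by_lru);
    return it;
}

/* eviction scans from the tail (lru->p), where the oldest item sits */
static struct pool_cache_item *lru_oldest(struct list *lru)
{
    if (lru->p == lru)
        return NULL;
    return (struct pool_cache_item *)((char *)lru->p -
                                      offsetof(struct pool_cache_item, by_lru));
}
```

The stack-like behavior (last released, first allocated) and the tail-side
eviction both fall out naturally from this double insertion at the head.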

Note:
  | Since a pool_cache_item has two list entries, on 64-bit systems it will
  | be 32 bytes long. This is the smallest size that a pool may be, and any
  | smaller size will automatically be rounded up to this size.

When build option DEBUG_POOL_INTEGRITY is set, or the boot-time option
"-dMintegrity" is passed on the command line, the area of the object between
the two list elements and the end according to pool->size will be filled with
pseudo-random words during pool_put_to_cache(), and these words will be
compared with each other during pool_get_from_cache(); the process will crash
if any bit differs, as this would indicate that the memory area was modified
after the free. The pseudo-random pattern is in fact incremented by (~0)/3
upon each free so that roughly half of the bits change each time and we
maximize the likelihood of detecting a single bit flip in either direction.
In order to avoid an immediate reuse and maximize the time the object spends
in the cache, when this option is set, objects are picked from the cache from
the oldest one instead of the freshest one. This way even late memory
corruptions have a chance to be detected.
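
The principle can be sketched like this. It is a simplification of what the
real code does (which compares pairs of words and crashes on mismatch), kept
only to illustrate the idea of a derivable pattern that changes on every free:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative integrity check: on release, fill the free area with a
 * sequence derived from a seed; on allocation, verify the sequence. The
 * seed is bumped by (~0UL)/3 on each free so that roughly half of the
 * bits flip between two consecutive releases.
 */
static unsigned long fill_seed;

static void integrity_fill(unsigned long *area, size_t words)
{
    fill_seed += (~0UL) / 3;              /* flips ~half of the bits */
    for (size_t i = 0; i < words; i++)
        area[i] = fill_seed + i;          /* cheap derivable sequence */
}

/* returns 1 if intact, 0 if any word was modified after the free; the
 * real implementation would crash the process instead of returning.
 */
static int integrity_check(const unsigned long *area, size_t words)
{
    for (size_t i = 1; i < words; i++)
        if (area[i] - area[0] != i)       /* each word must follow area[0] */
            return 0;
    return 1;
}
```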

When build option DEBUG_MEMORY_POOLS is set, or the boot-time option "-dMtag"
is passed on the executable's command line, pool objects are allocated with
one extra pointer compared to the requested size, so that the bytes that
follow the memory area point to the pool descriptor itself as long as the
object is allocated via pool_alloc(). Upon releasing via pool_free(), the
pointer is compared and the code will crash if it differs. This makes it
possible to detect both memory overflows and objects released to the wrong
pool (typically a code bug resulting from a copy-paste error).
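
A hypothetical sketch of this "tag" mechanism is shown below. Names are
simplified; the real checks live inside pool_alloc()/pool_free() and crash
the process on mismatch instead of returning a status:

```c
#include <stdlib.h>
#include <string.h>

/* One extra pointer is placed right after the user-visible area and must
 * still point to the owning pool when the object is released.
 */
struct pool_head { size_t size; };

static void *tagged_alloc(struct pool_head *pool)
{
    /* reserve room for the trailing tag */
    char *obj = malloc(pool->size + sizeof(void *));
    if (!obj)
        return NULL;
    memcpy(obj + pool->size, &pool, sizeof(pool)); /* write the tag */
    return obj;
}

/* returns 1 if the tag is intact and matches <pool>, 0 otherwise */
static int tagged_check(struct pool_head *pool, void *ptr)
{
    struct pool_head *tag;
    memcpy(&tag, (char *)ptr + pool->size, sizeof(tag));
    return tag == pool;
}
```

An overflow of even one byte past the object clobbers the tag, and releasing
to a different pool compares against the wrong descriptor; both are caught.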

Thus an object will look like this depending on whether it's in the cache or
is currently in use:

                in cache              in use
            +------------+        +------------+
         <--+ by_pool.p  |        |  N bytes   |
            | by_pool.n  +-->     |            |
            +------------+        |N=16 min on |
         <--+ by_lru.p   |        |  32-bit,   |
            | by_lru.n   +-->     |  32 min on |
            +------------+        |  64-bit    |
            :            :        :            :
            |  N bytes   |        |            |
            +------------+        +------------+  \  optional, only if
            :  (unused)  :        :  pool ptr  :   > DEBUG_MEMORY_POOLS
            +------------+        +------------+  /  is set at build time
                                                     or -dMtag at boot time

Right now no provisions are made to return objects aligned on larger
boundaries than those currently covered by malloc() (i.e. two pointers). This
need appears from time to time and the layout above might evolve a little bit
if needed.


4. Storage in the process-wide shared pool
------------------------------------------

In order for the shared pool not to be a contention point in a multi-threaded
environment, objects are allocated from or released to shared pools by
clusters of a few objects at once. The maximum number of objects that may be
moved to or from a shared pool at once is defined by
CONFIG_HAP_POOL_CLUSTER_SIZE at build time, and currently defaults to 8.

In order to remain scalable, the shared pool has to make some tradeoffs to
limit the number of atomic operations and the duration of any locked
operation. As such, it's composed of a singly-linked list of clusters,
themselves made of a singly-linked list of objects.

Clusters and objects are of the same type "pool_item" and are accessed from
the pool's "free_list" member. This member points to the latest pool_item
inserted into the pool by a release operation. And the pool_item's "next"
member points to the next pool_item, which was the one present in the pool's
free_list just before this pool_item was inserted, and the last pool_item in
the list simply has a NULL "next" field.

The pool_item's "down" pointer points down to the next objects that are part
of the same cluster, and that will be released or allocated at the same time
as the first one. Each of these items also has a NULL "next" field, and they
are chained by their respective "down" pointers until the last one is
detected by a NULL value.

This results in the following layout:

       pool        pool_item   pool_item   pool_item
   +-----------+    +------+    +------+    +------+
   | free_list +--> | next +--> | next +--> | NULL |
   +-----------+    +------+    +------+    +------+
                    | down |    | NULL |    | down |
                    +--+---+    +------+    +--+---+
                       |                       |
                       V                       V
                    +------+                +------+
                    | NULL |                | NULL |
                    +------+                +------+
                    | down |                | NULL |
                    +--+---+                +------+
                       |
                       V
                    +------+
                    | NULL |
                    +------+
                    | NULL |
                    +------+

Allocating an entry is only a matter of performing two atomic operations on
the free_list and reading the first pool_item's "next" value:

  - atomically mark the free_list as being updated by writing a "magic"
    pointer
  - read the first pool_item's "next" field
  - atomically replace the free_list with this value

This results in a fast operation that instantly retrieves a whole cluster at
once. Then outside of the critical section entries are walked over and
inserted into the local cache one at a time. In order to keep the code simple
and efficient, objects allocated from the shared pool are all placed into the
local cache, and only then is the first one allocated from the cache. This
operation is performed by the dedicated function
pool_refill_local_from_shared() which is called from pool_get_from_cache()
when the cache is empty. It means there is an overhead of two list
insert/delete operations for the first object and that could be avoided at
the expense of more complex code in the fast path, but this is negligible
since it only concerns objects that need to be visited anyway.

Freeing a group of objects consists in performing the operation the other way
around:

  - atomically mark the free_list as being updated by writing a "magic"
    pointer
  - write the free_list value to the to-be-released item's "next" entry
  - atomically replace the free_list with the pool_item's pointer

The cluster will simply have to be prepared before being sent to the shared
pool. The operation of releasing a cluster at once is performed by function
pool_put_to_shared_cache() which is called from pool_evict_last_items() which
itself is responsible for building the clusters.
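
The push and pop sequences above can be sketched with C11 atomics as follows.
This is a simplified stand-in, not HAProxy's code; the POOL_BUSY value and the
spinning policy are assumptions made for illustration:

```c
#include <stdatomic.h>
#include <stddef.h>

/* A special "magic" pointer value marks the free_list as being updated so
 * that concurrent threads spin instead of observing a half-updated list.
 */
struct pool_item {
    struct pool_item *next;  /* next cluster in the pool's free_list */
    struct pool_item *down;  /* next object in the same cluster */
};

#define POOL_BUSY ((struct pool_item *)1)  /* hypothetical magic value */

/* release one prepared cluster <item> into the shared <free_list> */
static void shared_push(struct pool_item *_Atomic *free_list,
                        struct pool_item *item)
{
    struct pool_item *head;

    /* lock the list by swapping in the magic pointer */
    while ((head = atomic_exchange(free_list, POOL_BUSY)) == POOL_BUSY)
        ;                                  /* another thread is updating */
    item->next = head;                     /* chain the previous head */
    atomic_store(free_list, item);         /* publish and unlock */
}

/* retrieve one whole cluster at once, or NULL if the pool is empty */
static struct pool_item *shared_pop(struct pool_item *_Atomic *free_list)
{
    struct pool_item *head;

    while ((head = atomic_exchange(free_list, POOL_BUSY)) == POOL_BUSY)
        ;
    atomic_store(free_list, head ? head->next : NULL);
    return head;                           /* caller walks ->down links */
}
```

Note how the magic value also sidesteps the classical ABA problem of lockless
stacks: the "next" field is only read while the list is marked busy.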

Due to the way objects are stored, it is important to try to group objects as
much as possible when releasing them because this is what will condition
their retrieval as groups as well. This is the reason why
pool_evict_last_items() uses the LRU to find a first entry but tries to pick
several items at once from a single cache. Tests have shown that
CONFIG_HAP_POOL_CLUSTER_SIZE set to 8 achieves up to 6-6.5 objects on average
per operation, which effectively divides the average time spent per object by
each thread by as much, and pushes the contention point further.

Also, grouping items in clusters is a property of the process-wide shared
pool and not of the thread-local caches. This means that there is no grouped
operation when not using the shared pool (mode "2" in the diagram above).


5. API
------

The following functions are public and available for user code:

struct pool_head *create_pool(char *name, uint size, uint flags)
        Create a new pool named <name> for objects of size <size> bytes. Pool
        names are truncated to their first 11 characters. Pools of very
        similar sizes will usually be merged if both have set the flag
        MEM_F_SHARED in <flags>. When DEBUG_DONT_SHARE_POOLS was set at build
        time, or "-dMno-merge" is passed on the executable's command line,
        the pools also need to have the exact same name to be merged. In
        addition, unless MEM_F_EXACT is set in <flags>, the object size will
        usually be rounded up to the size of pointers (16 or 32 bytes). The
        name that will appear in the pool upon merging is the name of the
        first created pool. The returned pointer is the new (or reused) pool
        head, or NULL upon error. Pools created this way must be destroyed
        using pool_destroy().

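The size rounding described for create_pool() can be illustrated with a
hypothetical helper. The 32-byte minimum mirrors the pool_cache_item
constraint explained in section 3; this is not the actual implementation:

```c
#include <stddef.h>

/* An object must at least hold a pool_cache_item (two list entries, i.e.
 * four pointers), and sizes are padded to a multiple of the pointer size
 * unless an exact size was requested (MEM_F_EXACT).
 */
static size_t pool_rounded_size(size_t size, int exact)
{
    const size_t min_size = 4 * sizeof(void *);  /* two struct list entries */

    if (!exact) {
        /* pad to a multiple of the pointer size */
        size = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
        if (size < min_size)
            size = min_size;            /* 16 on 32-bit, 32 on 64-bit */
    }
    return size;
}
```

Two pools requesting, say, 30 and 32 bytes thus end up with the same rounded
size, which is what makes merging them possible in the first place.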
void *pool_destroy(struct pool_head *pool)
        Destroy pool <pool>, that is, all of its unused objects are freed and
        the structure is freed as well if the pool didn't have any used
        objects anymore. In this case NULL is returned. If some objects
        remain in use, the pool is preserved and its pointer is returned.
        This ought to be used essentially on exit or in rare situations where
        some internal entities that hold pools have to be destroyed.

void pool_destroy_all(void)
        Destroy all pools, without checking which ones still have used
        entries. This is only meant for use on exit.

void *__pool_alloc(struct pool_head *pool, uint flags)
        Allocate an entry from the pool <pool>. The allocator will first look
        for an object in the thread-local cache if enabled, then in the
        shared pool if enabled, then will fall back to the operating system's
        default allocator. NULL is returned if the object couldn't be
        allocated (due to configured limits or lack of memory). Objects
        allocated this way have to be released using pool_free(). Like with
        malloc(), by default the contents of the returned object are
        undefined. If memory poisoning is enabled, the object will be filled
        with the poisoning byte. If the global "tune.fail-alloc" setting is
        non-zero and DEBUG_FAIL_ALLOC is enabled, a random number generator
        will be called to randomly return NULL. The allocator's behavior may
        be adjusted using a few flags passed in <flags>:
          - POOL_F_NO_POISON : when set, disables memory poisoning (e.g.
            when pointless and expensive, like for buffers)
          - POOL_F_MUST_ZERO : when set, the memory area will be zeroed
            before being returned, similar to what calloc() does
          - POOL_F_NO_FAIL : when set, disables the random allocation
            failure, e.g. for use during early init code or critical
            sections.

void *pool_alloc(struct pool_head *pool)
        This is an exact equivalent of __pool_alloc(pool, 0). It is the
        regular way to allocate entries from a pool.

void *pool_alloc_nocache(struct pool_head *pool)
        Allocate an entry from the pool <pool>, bypassing the cache. If
        shared pools are enabled, they will be consulted first. Otherwise the
        object is allocated using the operating system's default allocator.
        This is essentially used during early boot to pre-allocate a number
        of objects for pools which require a minimum number of entries to
        exist.

void *pool_zalloc(struct pool_head *pool)
        This is an exact equivalent of __pool_alloc(pool, POOL_F_MUST_ZERO).

void pool_free(struct pool_head *pool, void *ptr)
        Free an entry allocated from one of the pool_alloc() functions above
        from pool <pool>. The object will be placed into the thread-local
        cache if enabled, or in the shared pool if enabled, or will be
        released using the operating system's default allocator. When a
        local cache is enabled, if the local cache size becomes larger than
        75% of the maximum size configured at build time, some objects will
        be evicted to the shared pool. Such objects are taken first from the
        same pool, but if the total size is really huge, other pools might
        be checked as well. Some checks enabled at build time may cause the
        process to immediately crash if the object was not allocated from
        this pool or experienced an overflow or some memory corruption.

void pool_flush(struct pool_head *pool)
        Free all unused objects from shared pool <pool>. Thread-local caches
        are not affected. This is essentially used when running low on
        memory or when stopping, in order to release a maximum amount of
        memory for the new process.

void pool_gc(struct pool_head *pool)
        Free all unused objects from all pools, but respecting the minimum
        number of spare objects required for each of them. Then, for
        operating systems which support it, indicate to the system that all
        unused memory can be released. Thread-local caches are not affected.
        This operation differs from pool_flush() in that it is run
        locklessly, under thread isolation, and on all pools in a row. It is
        called by the SIGQUIT signal handler and upon exit. Note that the
        obsolete argument <pool> is not used and the convention is to pass
        NULL there.

void dump_pools_to_trash(void)
        Dump the current status of all pools into the trash buffer. This is
        essentially used by the "show pools" CLI command or the SIGQUIT
        signal handler to dump them on stderr. The total report size may not
        exceed the size of the trash buffer. If it does, some entries will
        be missing.

void dump_pools(void)
        Dump the current status of all pools to stderr. This just calls
        dump_pools_to_trash() and writes the trash to stderr.

int pool_total_failures(void)
        Report the total number of failed allocations. This is solely used
        to report the "PoolFailed" metrics of the "show info" output. The
        total is calculated on the fly by summing the number of failures in
        all pools and is only meant to be used as an indicator rather than a
        precise measure.

ullong pool_total_allocated(void)
        Report the total number of bytes allocated in all pools, for
        reporting in the "PoolAlloc_MB" field of the "show info" output. The
        total is calculated on the fly by summing the number of allocated
        bytes in all pools and is only meant to be used as an indicator
        rather than a precise measure.

ullong pool_total_used(void)
        Report the total number of bytes used in all pools, for reporting in
        the "PoolUsed_MB" field of the "show info" output. The total is
        calculated on the fly by summing the number of used bytes in all
        pools and is only meant to be used as an indicator rather than a
        precise measure. Note that objects present in caches are accounted
        as used.

Some other functions exist and are only used by the pools code itself. While
not strictly forbidden to use outside of this code, it is generally
recommended to avoid touching them in order not to create undesired
dependencies that will complicate maintenance.

A few macros exist to ease the declaration of pools:

DECLARE_POOL(ptr, name, size)
        Placed at the top level of a file, this declares a global memory
        pool as variable <ptr>, with name <name> and size <size> bytes per
        element. This is made via a call to REGISTER_POOL() and by assigning
        the resulting pointer to variable <ptr>. <ptr> will be created of
        type "struct pool_head *". If the pool needs to be visible outside
        of the function (which is likely), it will also need to be declared
        somewhere as "extern struct pool_head *<ptr>;". It is recommended to
        place such declarations very early in the source file so that the
        variable is already known to all subsequent functions which may use
        it.

DECLARE_STATIC_POOL(ptr, name, size)
        Placed at the top level of a file, this declares a static memory
        pool as variable <ptr>, with name <name> and size <size> bytes per
        element. This is made via a call to REGISTER_POOL() and by assigning
        the resulting pointer to local variable <ptr>. <ptr> will be created
        of type "static struct pool_head *". It is recommended to place such
        declarations very early in the source file so that the variable is
        already known to all subsequent functions which may use it.
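
To give an idea of how such file-scope declarations can register pools before
main() runs, here is a minimal stand-in built on the GCC/Clang constructor
attribute. It is a sketch under stated assumptions, not HAProxy's actual
REGISTER_POOL() machinery:

```c
#include <stddef.h>

/* Simplified pool descriptor and a global registry linking all pools. */
struct pool_head { const char *name; size_t size; struct pool_head *next; };

static struct pool_head *pools_registry;

static void register_pool(struct pool_head *p)
{
    p->next = pools_registry;
    pools_registry = p;
}

/* DECLARE_POOL-style macro: defines the pool head, exposes <ptr>, and
 * registers the pool from a constructor running before main().
 */
#define DECLARE_POOL(ptr, name_str, sz)                                   \
    static struct pool_head ptr##_head = { .name = name_str, .size = sz };\
    struct pool_head *ptr = &ptr##_head;                                  \
    __attribute__((constructor)) static void ptr##_init(void)             \
    { register_pool(&ptr##_head); }

/* example: the way a source file would typically declare its pool */
DECLARE_POOL(pool_head_demo, "demo", 128);
```

The real macros go through an initcall mechanism rather than a raw
constructor, but the effect is the same: by the time regular code runs, the
pool pointer is valid and the pool is known to the global list.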


6. Build options
----------------

A number of build-time defines allow the pools' behavior to be tuned. All of
them have to be enabled using "-Dxxx" or "-Dxxx=yyy" in the makefile's DEBUG
variable.

DEBUG_NO_POOLS
        When this is set, pools are entirely disabled, and allocations are
        made using malloc() instead. This is not recommended for production
        but may be useful for tracing allocations. It corresponds to
        "-dMno-cache" at boot time.

DEBUG_MEMORY_POOLS
        When this is set, an extra pointer is allocated at the end of each
        object to reference the pool the object was allocated from and to
        detect buffer overflows. Then, pool_free() will provoke a crash in
        case it detects an anomaly (the pointer at the end not matching the
        pool). It corresponds to "-dMtag" at boot time.

DEBUG_FAIL_ALLOC
        When enabled, a global setting "tune.fail-alloc" may be set to a
        non-zero value representing a percentage of memory allocations that
        will be made to fail in order to stress the calling code. It
        corresponds to "-dMfail" at boot time.

DEBUG_DONT_SHARE_POOLS
        When enabled, pools of similar sizes are not merged unless they have
        the exact same name. It corresponds to "-dMno-merge" at boot time.

DEBUG_UAF
        When enabled, pools are disabled and all allocations and releases
        pass through mmap() and munmap(). The memory usage significantly
        inflates and the performance degrades, but this makes it possible to
        detect a lot of use-after-free conditions by crashing the program at
        the first abnormal access. This should not be used in production. It
        corresponds to boot-time option "-dMuaf". Caching is disabled but
        may be re-enabled using "-dMcache".

DEBUG_POOL_INTEGRITY
        When enabled, objects picked from the cache are checked for
        corruption by comparing their contents against a pattern that was
        placed when they were inserted into the cache. Objects are also
        allocated in the reverse order, from the oldest one to the most
        recent, so as to maximize the ability to detect such a corruption.
        The goal is to detect writes after free (or possibly hardware memory
        corruptions). Contrary to DEBUG_UAF, this cannot detect reads after
        free, but may possibly detect later corruptions and will not consume
        extra memory. The CPU usage will increase a bit due to the cost of
        filling/checking the area and the preference for cold cache instead
        of hot cache, though not as much as with DEBUG_UAF. This option is
        meant to be usable in production. It corresponds to boot-time
        options "-dMcold-first,integrity".

DEBUG_POOL_TRACING
        When enabled, the callers of pool_alloc() and pool_free() will be
        recorded into an extra memory area placed after the end of the
        object. This may only be required by developers who want to get a
        few more hints about code paths involved in some crashes, but will
        serve no purpose outside of this. It remains compatible with (and
        complements well) DEBUG_POOL_INTEGRITY above. Such information
        becomes meaningless once the objects leave the thread-local cache.
        It corresponds to boot-time option "-dMcaller".

DEBUG_MEM_STATS
        When enabled, all malloc/calloc/realloc/strdup/free calls are
        accounted for per call place (file+line number), and may be
        displayed or reset on the CLI using "debug dev memstats". This is
        essentially used to detect potential leaks or abnormal usages. When
        pools are enabled (default), such calls are rare and the output will
        mostly contain calls induced by libraries. When pools are disabled,
        almost all calls to pool_alloc() and pool_free() will also appear
        since they will be remapped to standard functions.

CONFIG_HAP_GLOBAL_POOLS
        When enabled, process-wide shared pools will be forcefully enabled
        even if not considered useful on the platform. The default is to let
        haproxy decide based on the OS and C library. It corresponds to
        boot-time option "-dMglobal".

CONFIG_HAP_NO_GLOBAL_POOLS
        When enabled, process-wide shared pools will be forcefully disabled
        even if considered useful on the platform. The default is to let
        haproxy decide based on the OS and C library. It corresponds to
        boot-time option "-dMno-global".

CONFIG_HAP_POOL_CACHE_SIZE
        This allows one to define the default size of the per-thread cache,
        in bytes. The default value is 512 kB (524288). Smaller values will
        use less memory at the expense of a possibly higher CPU usage when
        using many threads. Higher values will give diminishing returns on
        performance while using much more memory. Usually there is no
        benefit in using more than a per-core L2 cache size. It would be
        better not to set this value lower than a few times the size of a
        buffer (bufsize, defaults to 16 kB). In addition, keep in mind that
        this option may be changed at runtime using "tune.memory.hot-size".

CONFIG_HAP_POOL_CLUSTER_SIZE
        This allows one to define the maximum number of objects that will be
        grouped together in an allocation from the shared pool. Values 4 to
        8 have experimentally shown good results with 16 threads. On systems
        with more cores or loosely-coupled caches exhibiting slow atomic
        operations,
        it could possibly make sense to slightly increase this value.