2022-02-24 - Pools structure and API

1. Background
-------------

Memory allocation is a complex problem covered by a massive amount of
literature. Memory allocators found in the field cover a broad spectrum of
capabilities, performance, fragmentation, efficiency, etc.

The main difficulty of memory allocation comes from finding the optimal chunks
for arbitrarily sized requests while still preserving a low fragmentation
level. Doing this well is often expensive in CPU and/or memory usage.

In programs like HAProxy that deal with a large number of fixed-size objects,
there is no point in enduring this risk of fragmentation, and the associated
costs (sometimes up to several milliseconds with certain minimalist
allocators) are simply not acceptable. A better approach consists in grouping
frequently used objects by size, knowing that due to the high repetitiveness of
operations, a freed object will immediately be needed for another operation.

This grouping of objects by size is what is called a pool. Pools are created
for certain frequently allocated objects, are usually merged together when they
are of the same size (or almost the same size), and significantly reduce the
number of calls to the memory allocator.

With the arrival of threads, pools started to become a bottleneck, so they now
implement an optional thread-local lockless cache. Finally, with the arrival of
really efficient memory allocators in modern operating systems, the shared part
has also become optional so that it doesn't consume memory if it does not bring
any value.

In 2.6-dev2, a number of debugging options that used to be configured at build
time only became boot-time options that can be modified using keywords passed
after "-dM" on the command line, which sets or clears bits in the
pool_debugging variable. The build-time options still affect the default
settings, however. Default values may be consulted using "haproxy -dMhelp".

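For example, the following invocations first list the supported debugging
keywords with their defaults, then start the process with allocation failure
injection and pool tagging enabled (the configuration path below is only an
illustration):

    $ haproxy -dMhelp
    $ haproxy -dMfail,tag -f /etc/haproxy/haproxy.cfg
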

2. Principles
-------------

The pools architecture is selected at build time and may be adjusted at boot
time. The main options are:

  - thread-local caches and process-wide shared pool enabled (1)

    This is the default situation on most operating systems. Each thread has
    its own local cache, and when depleted it refills from the process-wide
    pool, which avoids calling the standard allocator too often. It is
    possible to force this mode at build time by setting
    CONFIG_HAP_GLOBAL_POOLS or at boot time with "-dMglobal".

  - thread-local caches only are enabled (2)

    This is the situation on operating systems where a fast and modern memory
    allocator is detected and when it is estimated that the process-wide
    shared pool will not bring any benefit. This detection is automatic at
    build time, but may also be forced at build time by setting
    CONFIG_HAP_NO_GLOBAL_POOLS or at boot time with "-dMno-global".

  - pass-through to the standard allocator (3)

    This is used when one absolutely wants to disable pools and rely on
    regular malloc() and free() calls, essentially in order to trace memory
    allocations by call points, either internally via DEBUG_MEM_STATS, or
    externally via tools such as Valgrind. This mode of operation may be
    forced at build time by setting DEBUG_NO_POOLS or at boot time with
    "-dMno-cache".

  - pass-through to an mmap-based allocator for debugging (4)

    This is used only during deep debugging when trying to detect various
    conditions such as use-after-free. In this case each allocated object's
    size is rounded up to a multiple of a page size (4096 bytes) and an
    integral number of pages is allocated for each object using mmap(),
    surrounded by two inaccessible holes that aim to detect out-of-bounds
    accesses. Released objects are instantly freed using munmap() so that any
    immediate subsequent access to the memory area crashes the process if the
    area had not been reallocated yet. This mode can be enabled at build time
    by setting DEBUG_UAF, or at run time by disabling pools and enabling UAF
    with "-dMuaf". It tends to consume a lot of memory and not to scale at
    all with concurrent calls, which tends to make the system stall. The
    watchdog may even trigger on some slow allocations.

There are no more provisions for running with a shared pool but no thread-local
cache: the shared pool's main goal is to compensate for the expensive calls to
the memory allocator. This gain may be huge on tiny systems using basic
allocators, but the thread-local cache will already achieve this. And on larger
threaded systems, the shared pool's benefit is visible when the underlying
allocator scales poorly, but in this case the shared pool would suffer from
the same limitations without its thread-local cache and wouldn't provide any
benefit.

Summary of the various operation modes:

                  (1)            (2)            (3)            (4)

                 User           User           User           User
                   |              |              |              |
  pool_alloc()     V              V              |              |
              +---------+    +---------+         |              |
              | Thread  |    | Thread  |         |              |
              |  Local  |    |  Local  |         |              |
              |  Cache  |    |  Cache  |         |              |
              +---------+    +---------+         |              |
                   |              |              |              |
  pool_refill*()   V              |              |              |
              +---------+         |              |              |
              | Shared  |         |              |              |
              |  Pool   |         |              |              |
              +---------+         |              |              |
                   |              |              |              |
  malloc()         V              V              V              |
              +---------+    +---------+    +---------+         |
              | Library |    | Library |    | Library |         |
              +---------+    +---------+    +---------+         |
                   |              |              |              |
  mmap()           V              V              V              V
              +---------+    +---------+    +---------+    +---------+
              |   OS    |    |   OS    |    |   OS    |    |   OS    |
              +---------+    +---------+    +---------+    +---------+

One extra build define, DEBUG_FAIL_ALLOC, is used to enforce random allocation
failure in pool_alloc() by randomly returning NULL, to test that callers
properly handle allocation failures. It may also be enabled at boot time using
"-dMfail". In this case the desired average rate of allocation failures can be
set via the global setting "tune.fail-alloc", expressed in percent.

The thread-local caches contain the freshest objects. Their total size amounts
to the number of bytes set in global.tune.pool_cache_size, which may be
adjusted with the "tune.memory.hot-size" global option, itself defaulting to
the build-time setting CONFIG_HAP_POOL_CACHE_SIZE (1 MB before 2.6, 512 kB
since). The aim is to keep hot objects that still fit in the CPU core's
private L2 cache. Once these objects do not fit into the cache anymore,
there's no benefit in keeping them local to the thread, so they'd rather be
returned to the shared pool or the main allocator so that any other thread may
make use of them. Under extreme thread contention the cost of accessing shared
structures in the global cache or in malloc() may still be significant and it
may prove useful to increase the thread-local cache size.
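
For example, on a machine with large per-core caches, one might raise the
per-thread cache to 1 MB from the global section (the value below is only an
illustration):

    global
        tune.memory.hot-size 1048576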


3. Storage in thread-local caches
---------------------------------

This section describes how objects are linked in thread-local caches. This is
not meant to be a concern for users of the pools API but it can be useful when
inspecting post-mortem dumps or when trying to figure out certain size
constraints.

Objects are stored in the local cache using a doubly-linked list. This ensures
that they can be visited by freshness order like a stack, while at the same
time being able to access them from oldest to newest when it is needed to
evict the coldest ones first:

  - releasing an object to the cache always puts it on the top.

  - allocating an object from the cache always takes the topmost one, hence
    the freshest one.

  - scanning for older objects to evict starts from the bottom, where the
    oldest ones are located.

To that end, each thread-local cache keeps a list head in the "list" member of
its "pool_cache_head" descriptor, which links all objects cast to type
"pool_cache_item" via their "by_pool" member.

Note that the mechanism described above only works for a single pool. When
trying to limit the total cache size to a certain value, all pools included,
there is also a need to arrange all objects from all pools together in the
local caches. For this, each thread_ctx maintains a list head of recently
released objects, all pools included, in its member "pool_lru_head". All items
in a thread-local cache are linked there via their "by_lru" member.
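
The following simplified sketch summarizes these structures; the real
definitions in the HAProxy sources carry additional members (counts, sizes,
etc.) and only the fields mentioned above are shown:

    /* simplified sketch, not the exact HAProxy definitions */
    struct list { struct list *n, *p; };   /* doubly-linked list element */

    struct pool_cache_item {
        struct list by_pool;    /* links items of the same pool together */
        struct list by_lru;     /* links all cached items of the thread  */
    };

    struct pool_cache_head {
        struct list list;       /* head of the items cached for this pool */
        /* ... counters ... */
    };

    struct thread_ctx {
        struct list pool_lru_head;   /* thread-wide LRU of cached items */
        /* ... many other per-thread fields ... */
    };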

This means that releasing an object using pool_free() consists in inserting
it at the beginning of two lists:
  - the local pool_cache_head's "list" list head
  - the thread context's "pool_lru_head" list head

Allocating an object consists in picking the first entry from the pool's
"list" and deleting its "by_pool" and "by_lru" links.

Evicting an object consists in scanning the thread context's "pool_lru_head"
backwards and deleting the object's "by_pool" and "by_lru" links.

Given that entries are both inserted and removed synchronously, we have the
guarantee that the oldest object in the thread's LRU list is always the oldest
object in its pool, and that the next element is the cache's list head. This
is what allows the LRU eviction mechanism to figure out what pool an object
belongs to when releasing it.

Note:
| Since a pool_cache_item has two list entries (i.e. four pointers), on 64-bit
| systems it will be 32 bytes long. This is the smallest size that a pool may
| be, and any smaller size will automatically be rounded up to this size.

When the build option DEBUG_POOL_INTEGRITY is set, or the boot-time option
"-dMintegrity" is passed on the command line, the area of the object between
the two list elements and the end according to pool->size will be filled with
pseudo-random words during pool_put_to_cache(), and these words will be
compared against each other during pool_get_from_cache(); the process will
crash if any bit differs, as this would indicate that the memory area was
modified after the free. The pseudo-random pattern is in fact incremented by
(~0)/3 upon each free so that roughly half of the bits change each time and we
maximize the likelihood of detecting a single bit flip in either direction. In
order to avoid an immediate reuse and maximize the time the object spends in
the cache, when this option is set, objects are picked from the cache starting
from the oldest one instead of the freshest one. This way even late memory
corruptions have a chance to be detected.
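
An illustrative sketch of this fill/check principle (not the exact HAProxy
code; the function names are made up and several details are simplified):

    #include <stdlib.h>

    static unsigned long fill_pattern;

    /* called when an object is put back into the local cache */
    static void fill_area(unsigned long *area, size_t words)
    {
        fill_pattern += ~0UL / 3;   /* flips roughly half the bits each time */
        for (size_t i = 0; i < words; i++)
            area[i] = fill_pattern;
    }

    /* called when an object is taken back from the local cache; all words
     * were written with the same value, so any difference between them
     * reveals a write-after-free.
     */
    static void check_area(const unsigned long *area, size_t words)
    {
        for (size_t i = 1; i < words; i++)
            if (area[i] != area[0])
                abort();    /* crash to produce an analyzable core */
    }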

When the build option DEBUG_MEMORY_POOLS is set, or the boot-time option
"-dMtag" is passed on the executable's command line, pool objects are
allocated with one extra pointer compared to the requested size, so that the
bytes that follow the memory area point to the pool descriptor itself as long
as the object is allocated via pool_alloc(). Upon releasing via pool_free(),
the pointer is compared and the code will crash if it differs. This makes it
possible to detect both memory overflows and objects released to the wrong
pool (typically a code bug resulting from a copy-paste error).

Thus an object will look like this depending on whether it's in the cache or
currently in use:

            in cache                in use
         +------------+         +------------+
      <--+ by_pool.p  |         |  N bytes   |
         |  by_pool.n +-->      |            |
         +------------+         |N=16 min on |
      <--+ by_lru.p   |         |  32-bit,   |
         |  by_lru.n  +-->      | 32 min on  |
         +------------+         |   64-bit   |
         :            :         :            :
         |  N bytes   |         |            |
         +------------+         +------------+  \ optional, only if
         :  (unused)  :         :  pool ptr  :   > DEBUG_MEMORY_POOLS
         +------------+         +------------+  / is set at build time
                                                  or -dMtag at boot time
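
A sketch of this tagging principle (illustrative only; the helper names and
the exact offset computation differ in the real implementation):

    #include <stdlib.h>

    /* write the owning pool's pointer right past the object's area */
    static void tag_object(void *obj, size_t size, const void *pool)
    {
        *(const void **)((char *)obj + size) = pool;
    }

    /* verify the tag on release; a mismatch means an overflow occurred
     * or the object is being released to the wrong pool.
     */
    static void check_tag(void *obj, size_t size, const void *pool)
    {
        if (*(const void **)((char *)obj + size) != pool)
            abort();
    }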

Right now no provisions are made to return objects aligned on larger
boundaries than those currently covered by malloc() (i.e. two pointers). This
need appears from time to time and the layout above might evolve a little bit
if needed.


4. Storage in the process-wide shared pool
------------------------------------------

In order for the shared pool not to be a contention point in a multi-threaded
environment, objects are allocated from or released to shared pools by
clusters of a few objects at once. The maximum number of objects that may be
moved to or from a shared pool at once is defined by
CONFIG_HAP_POOL_CLUSTER_SIZE at build time, and currently defaults to 8.

In order to remain scalable, the shared pool has to make some tradeoffs to
limit the number of atomic operations and the duration of any locked
operation. As such, it's composed of a singly-linked list of clusters,
themselves made of a singly-linked list of objects.

Clusters and objects are of the same type "pool_item" and are accessed from
the pool's "free_list" member. This member points to the latest pool_item
inserted into the pool by a release operation. The pool_item's "next" member
points to the next pool_item, which was the one present in the pool's
free_list just before this pool_item was inserted, and the last pool_item in
the list simply has a NULL "next" field.

The pool_item's "down" pointer points down to the next object of the same
cluster, which will be released or allocated at the same time as the first
one. Each of these items also has a NULL "next" field, and they are chained by
their respective "down" pointers until the last one is detected by a NULL
value.

This results in the following layout:

    pool        pool_item   pool_item   pool_item
+-----------+    +------+    +------+    +------+
| free_list +--> | next +--> | next +--> | NULL |
+-----------+    +------+    +------+    +------+
                 | down |    | NULL |    | down |
                 +--+---+    +------+    +--+---+
                    |                       |
                    V                       V
                 +------+                 +------+
                 | NULL |                 | NULL |
                 +------+                 +------+
                 | down |                 | NULL |
                 +--+---+                 +------+
                    |
                    V
                 +------+
                 | NULL |
                 +------+
                 | NULL |
                 +------+

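A minimal sketch of the pool_item type and the free_list head that this layout
implies (simplified; the real "struct pool_head" contains many more fields):

    struct pool_item {
        struct pool_item *next;   /* next cluster in the pool's free_list */
        struct pool_item *down;   /* next object within the same cluster  */
    };

    struct pool_head {
        struct pool_item *free_list;   /* most recently released cluster */
        /* ... sizes, counters, name, etc. ... */
    };
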
Allocating an entry is only a matter of performing two atomic operations on
the free_list and reading the first pool_item's "next" value:

  - atomically mark the free_list as being updated by writing a "magic"
    pointer
  - read the first pool_item's "next" field
  - atomically replace the free_list with this value

This results in a fast operation that instantly retrieves a cluster at once.
Then outside of the critical section entries are walked over and inserted into
the local cache one at a time. In order to keep the code simple and efficient,
objects allocated from the shared pool are all placed into the local cache,
and only then is the first one allocated from the cache. This operation is
performed by the dedicated function pool_refill_local_from_shared(), which is
called from pool_get_from_cache() when the cache is empty. It means there is
an overhead of two list insert/delete operations for the first object, which
could be avoided at the expense of more complex code in the fast path, but
this is negligible since it only concerns objects that need to be visited
anyway.

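The retrieval sequence may be sketched as follows (illustrative only: the
POOL_BUSY marker name, the helper name and the retry loop are simplified
compared to the real code):

    struct pool_item {
        struct pool_item *next;   /* next cluster in the free_list    */
        struct pool_item *down;   /* next object in the same cluster  */
    };

    #define POOL_BUSY ((struct pool_item *)1)   /* the "magic" pointer */

    static struct pool_item *pop_cluster(struct pool_item **free_list)
    {
        struct pool_item *item;

        /* atomically mark the free_list as being updated */
        while ((item = __atomic_exchange_n(free_list, POOL_BUSY,
                                           __ATOMIC_ACQUIRE)) == POOL_BUSY)
            ;   /* another thread is updating it, try again */

        if (!item) {
            /* empty list: restore NULL and report failure */
            __atomic_store_n(free_list, NULL, __ATOMIC_RELEASE);
            return NULL;
        }
        /* atomically replace the free_list with the next cluster */
        __atomic_store_n(free_list, item->next, __ATOMIC_RELEASE);
        return item;   /* follow item->down to visit the whole cluster */
    }
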
Freeing a group of objects consists in performing the operation the other way
around:

  - atomically mark the free_list as being updated by writing a "magic"
    pointer
  - write the free_list value to the to-be-released item's "next" entry
  - atomically replace the free_list with the pool_item's pointer

The cluster will simply have to be prepared before being sent to the shared
pool. The operation of releasing a cluster at once is performed by the
function pool_put_to_shared_cache(), which is called from
pool_evict_last_items(), which itself is responsible for building the
clusters.

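The release path is the mirror image of the retrieval sketch above, reusing
its declarations (again purely illustrative):

    static void push_cluster(struct pool_item **free_list,
                             struct pool_item *item)
    {
        struct pool_item *head;

        /* atomically mark the free_list as being updated */
        while ((head = __atomic_exchange_n(free_list, POOL_BUSY,
                                           __ATOMIC_ACQUIRE)) == POOL_BUSY)
            ;
        /* chain the previous head after the released cluster... */
        item->next = head;
        /* ...and atomically publish the new head, unlocking the list */
        __atomic_store_n(free_list, item, __ATOMIC_RELEASE);
    }
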
Due to the way objects are stored, it is important to try to group objects as
much as possible when releasing them because this is what will condition their
retrieval as groups as well. This is the reason why pool_evict_last_items()
uses the LRU to find a first entry but tries to pick several items at once
from a single cache. Tests have shown that CONFIG_HAP_POOL_CLUSTER_SIZE set to
8 achieves up to 6-6.5 objects on average per operation, which effectively
divides the average time spent per object by each thread by as much and pushes
the contention point further.

Also, grouping items in clusters is a property of the process-wide shared pool
and not of the thread-local caches. This means that there is no grouped
operation when not using the shared pool (mode "2" in the diagram above).


5. API
------

The following functions are public and available for user code:

struct pool_head *create_pool(char *name, uint size, uint flags)
        Create a new pool named <name> for objects of size <size> bytes. Pool
        names are truncated to their first 11 characters. Pools of very
        similar size will usually be merged if both have set the flag
        MEM_F_SHARED in <flags>. When DEBUG_DONT_SHARE_POOLS is set at build
        time, or "-dMno-merge" is passed on the executable's command line,
        the pools also need to have the exact same name to be merged. In
        addition, unless MEM_F_EXACT is set in <flags>, the object size will
        usually be rounded up to the size of pointers (16 or 32 bytes). The
        name that will appear in the pool upon merging is the name of the
        first created pool. The returned pointer is the new (or reused) pool
        head, or NULL upon error. Pools created this way must be destroyed
        using pool_destroy().

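For example, a subsystem may create a mergeable pool for its objects like this
(the type and names below are purely illustrative):

    struct conn { int fd; /* ... */ };

    struct pool_head *pool_head_conn;

    void init_conn_pool(void)
    {
        pool_head_conn = create_pool("conn", sizeof(struct conn),
                                     MEM_F_SHARED);
        if (!pool_head_conn) {
            /* handle the error, e.g. abort startup */
        }
    }
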
void *pool_destroy(struct pool_head *pool)
        Destroy pool <pool>, that is, all of its unused objects are freed and
        the structure is freed as well if the pool didn't have any used
        objects anymore. In this case NULL is returned. If some objects
        remain in use, the pool is preserved and its pointer is returned.
        This ought to be used essentially on exit or in rare situations where
        some internal entities that hold pools have to be destroyed.

void pool_destroy_all(void)
        Destroy all pools, without checking which ones still have used
        entries. This is only meant for use on exit.

void *__pool_alloc(struct pool_head *pool, uint flags)
        Allocate an entry from the pool <pool>. The allocator will first look
        for an object in the thread-local cache if enabled, then in the
        shared pool if enabled, then will fall back to the operating system's
        default allocator. NULL is returned if the object couldn't be
        allocated (due to configured limits or lack of memory). Objects
        allocated this way have to be released using pool_free(). Like with
        malloc(), by default the contents of the returned object are
        undefined. If memory poisoning is enabled, the object will be filled
        with the poisoning byte. If the global "tune.fail-alloc" setting is
        non-zero and DEBUG_FAIL_ALLOC is enabled, a random number generator
        will be called to randomly return NULL. The allocator's behavior may
        be adjusted using a few flags passed in <flags>:
          - POOL_F_NO_POISON : when set, disables memory poisoning (e.g. when
            pointless and expensive, like for buffers)
          - POOL_F_MUST_ZERO : when set, the memory area will be zeroed
            before being returned, similar to what calloc() does
          - POOL_F_NO_FAIL : when set, disables the random allocation
            failure, e.g. for use during early init code or critical
            sections.

void *pool_alloc(struct pool_head *pool)
        This is an exact equivalent of __pool_alloc(pool, 0). It is the
        regular way to allocate entries from a pool.

void *pool_alloc_nocache(struct pool_head *pool)
        Allocate an entry from the pool <pool>, bypassing the cache. If
        shared pools are enabled, they will be consulted first. Otherwise the
        object is allocated using the operating system's default allocator.
        This is essentially used during early boot to pre-allocate a number
        of objects for pools which require a minimum number of entries to
        exist.

void *pool_zalloc(struct pool_head *pool)
        This is an exact equivalent of __pool_alloc(pool, POOL_F_MUST_ZERO).

void pool_free(struct pool_head *pool, void *ptr)
        Free an entry allocated from one of the pool_alloc() functions above
        from pool <pool>. The object will be placed into the thread-local
        cache if enabled, or in the shared pool if enabled, or will be
        released using the operating system's default allocator. When a local
        cache is enabled, if the local cache size becomes larger than 75% of
        the maximum size configured at build time, some objects will be
        evicted to the shared pool. Such objects are taken first from the
        same pool, but if the total size is really huge, other pools might be
        checked as well. Some options enabled at build time may enforce extra
        checks so that the process will immediately crash if the object was
        not allocated from this pool or experienced an overflow or some
        memory corruption.

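A typical allocation/release cycle then looks like this (a sketch reusing the
hypothetical "conn" pool from the create_pool() example above):

    static int conn_example(void)
    {
        struct conn *c = pool_alloc(pool_head_conn);

        if (!c)
            return 0;   /* allocation failures must always be handled */
        c->fd = -1;     /* contents are undefined; initialize as needed */
        /* ... use the object ... */
        pool_free(pool_head_conn, c);   /* usually lands in the local cache */
        return 1;
    }
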
void pool_flush(struct pool_head *pool)
        Free all unused objects from shared pool <pool>. Thread-local caches
        are not affected. This is essentially used when running low on memory
        or when stopping, in order to release a maximum amount of memory for
        the new process.

void pool_gc(struct pool_head *pool)
        Free all unused objects from all pools, but respecting the minimum
        number of spare objects required for each of them. Then, for
        operating systems which support it, indicate to the system that all
        unused memory can be released. Thread-local caches are not affected.
        This operation differs from pool_flush() in that it is run
        locklessly, under thread isolation, and on all pools in a row. It is
        called by the SIGQUIT signal handler and upon exit. Note that the
        obsolete argument <pool> is not used and the convention is to pass
        NULL there.

void dump_pools_to_trash(void)
        Dump the current status of all pools into the trash buffer. This is
        essentially used by the "show pools" CLI command or the SIGQUIT
        signal handler to dump them on stderr. The total report size may not
        exceed the size of the trash buffer. If it does, some entries will be
        missing.

void dump_pools(void)
        Dump the current status of all pools to stderr. This just calls
        dump_pools_to_trash() and writes the trash to stderr.

int pool_total_failures(void)
        Report the total number of failed allocations. This is solely used to
        report the "PoolFailed" metrics of the "show info" output. The total
        is calculated on the fly by summing the number of failures in all
        pools and is only meant to be used as an indicator rather than a
        precise measure.

ullong pool_total_allocated(void)
        Report the total number of bytes allocated in all pools, for
        reporting in the "PoolAlloc_MB" field of the "show info" output. The
        total is calculated on the fly by summing the number of allocated
        bytes in all pools and is only meant to be used as an indicator
        rather than a precise measure.

ullong pool_total_used(void)
        Report the total number of bytes used in all pools, for reporting in
        the "PoolUsed_MB" field of the "show info" output. The total is
        calculated on the fly by summing the number of used bytes in all
        pools and is only meant to be used as an indicator rather than a
        precise measure. Note that objects present in caches are accounted
        as used.

Some other functions exist and are only used by the pools code itself. While
not strictly forbidden to use outside of this code, it is generally
recommended to avoid touching them in order not to create undesired
dependencies that will complicate maintenance.

A few macros exist to ease the declaration of pools:

DECLARE_POOL(ptr, name, size)
        Placed at the top level of a file, this declares a global memory pool
        as variable <ptr>, name <name> and size <size> bytes per element.
        This is made via a call to REGISTER_POOL() and by assigning the
        resulting pointer to variable <ptr>. <ptr> will be created of type
        "struct pool_head *". If the pool needs to be visible outside of the
        file (which is likely), it will also need to be declared somewhere as
        "extern struct pool_head *<ptr>;". It is recommended to place such
        declarations very early in the source file so that the variable is
        already known to all subsequent functions which may use it.

DECLARE_STATIC_POOL(ptr, name, size)
        Placed at the top level of a file, this declares a static memory pool
        as variable <ptr>, name <name> and size <size> bytes per element.
        This is made via a call to REGISTER_POOL() and by assigning the
        resulting pointer to the local variable <ptr>. <ptr> will be created
        of type "static struct pool_head *". It is recommended to place such
        declarations very early in the source file so that the variable is
        already known to all subsequent functions which may use it.

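For instance, a file-local pool for a hypothetical "struct msg" type could be
declared and used as follows (names are illustrative):

    struct msg { char buf[64]; };

    DECLARE_STATIC_POOL(pool_head_msg, "msg", sizeof(struct msg));

    /* later: struct msg *m = pool_alloc(pool_head_msg); ... */
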

6. Build options
----------------

A number of build-time defines allow tuning of the pools behavior. All of
them have to be enabled using "-Dxxx" or "-Dxxx=yyy" in the makefile's DEBUG
variable.

DEBUG_NO_POOLS
        When this is set, pools are entirely disabled, and allocations are
        made using malloc() instead. This is not recommended for production
        but may be useful for tracing allocations. It corresponds to
        "-dMno-cache" at boot time.

DEBUG_MEMORY_POOLS
        When this is set, an extra pointer is allocated at the end of each
        object to reference the pool the object was allocated from and detect
        buffer overflows. Then, pool_free() will provoke a crash in case it
        detects an anomaly (pointer at the end not matching the pool). It
        corresponds to "-dMtag" at boot time.

DEBUG_FAIL_ALLOC
        When enabled, a global setting "tune.fail-alloc" may be set to a non-
        zero value representing a percentage of memory allocations that will
        be made to fail in order to stress the calling code. It corresponds
        to "-dMfail" at boot time.

DEBUG_DONT_SHARE_POOLS
        When enabled, pools of similar sizes are not merged unless they have
        the exact same name. It corresponds to "-dMno-merge" at boot time.

DEBUG_UAF
        When enabled, pools are disabled and all allocations and releases
        pass through mmap() and munmap(). The memory usage significantly
        inflates and the performance degrades, but this allows many
        use-after-free conditions to be detected by crashing the program at
        the first abnormal access. This should not be used in production. It
        corresponds to the boot-time option "-dMuaf". Caching is disabled but
        may be re-enabled using "-dMcache".

DEBUG_POOL_INTEGRITY
        When enabled, objects picked from the cache are checked for
        corruption by comparing their contents against a pattern that was
        placed when they were inserted into the cache. Objects are also
        allocated in the reverse order, from the oldest one to the most
        recent, so as to maximize the ability to detect such a corruption.
        The goal is to detect writes after free (or possibly hardware memory
        corruptions). Contrary to DEBUG_UAF this cannot detect reads after
        free, but may possibly detect later corruptions and will not consume
        extra memory. The CPU usage will increase a bit due to the cost of
        filling/checking the area and the preference for cold cache instead
        of hot cache, though not as much as with DEBUG_UAF. This option is
        meant to be usable in production. It corresponds to the boot-time
        options "-dMcold-first,integrity".

DEBUG_POOL_TRACING
        When enabled, the callers of pool_alloc() and pool_free() will be
        recorded into an extra memory area placed after the end of the
        object. This may only be required by developers who want to get a few
        more hints about code paths involved in some crashes, but will serve
        no purpose outside of this. It remains compatible with (and
        complements well) DEBUG_POOL_INTEGRITY above. Such information
        becomes meaningless once the objects leave the thread-local cache. It
        corresponds to the boot-time option "-dMcaller".

DEBUG_MEM_STATS
        When enabled, all malloc/calloc/realloc/strdup/free calls are
        accounted for per call place (file+line number), and may be displayed
        or reset on the CLI using "debug dev memstats". This is essentially
        used to detect potential leaks or abnormal usages. When pools are
        enabled (default), such calls are rare and the output will mostly
        contain calls induced by libraries. When pools are disabled, nearly
        all calls to pool_alloc() and pool_free() will also appear since they
        will be remapped to standard functions.

CONFIG_HAP_GLOBAL_POOLS
        When enabled, process-wide shared pools will be forcefully enabled
        even if not considered useful on the platform. The default is to let
        haproxy decide based on the OS and C library. It corresponds to the
        boot-time option "-dMglobal".

CONFIG_HAP_NO_GLOBAL_POOLS
        When enabled, process-wide shared pools will be forcefully disabled
        even if considered useful on the platform. The default is to let
        haproxy decide based on the OS and C library. It corresponds to the
        boot-time option "-dMno-global".

CONFIG_HAP_POOL_CACHE_SIZE
        This allows one to define the default size of the per-thread cache,
        in bytes. The default value is 512 kB (524288). Smaller values will
        use less memory at the expense of a possibly higher CPU usage when
        using many threads. Higher values will give diminishing returns on
        performance while using much more memory. Usually there is no
        benefit in using more than a per-core L2 cache size. It would be
        better not to set this value lower than a few times the size of a
        buffer (bufsize, defaults to 16 kB). In addition, keep in mind that
        this option may be changed at runtime using "tune.memory.hot-size".

CONFIG_HAP_POOL_CLUSTER_SIZE
        This allows one to define the maximum number of objects that will be
        grouped together in an allocation from the shared pool. Values 4 to 8
        have experimentally shown good results with 16 threads. On systems
        with more cores or loosely coupled caches exhibiting slow atomic
        operations, it could possibly make sense to slightly increase this
        value.