Blame - doc/coding-style.txt - haproxy

blob: 59f0991fc6f58078da85c274efa5272a944e8ea8 [file] [log] [blame]

Willy Tarreau	bdc6209	2020-07-07 16:21:19 +0200	[diff] [blame]	1	2020/07/07 - HAProxy coding style - Willy Tarreau <w@1wt.eu>
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	2	------------------------------------------------------------
				3
				4	A number of contributors are often embarrassed with coding style issues, they
				5	don't always know if they're doing it right, especially since the coding style
				6	has elvoved along the years. What is explained here is not necessarily what is
				7	applied in the code, but new code should as much as possible conform to this
				8	style. Coding style fixes happen when code is replaced. It is useless to send
				9	patches to fix coding style only, they will be rejected, unless they belong to
				10	a patch series which needs these fixes prior to get code changes. Also, please
				11	avoid fixing coding style in the same patches as functional changes, they make
				12	code review harder.
				13
Willy Tarreau	8f1b35b	2015-09-21 16:36:15 +0200	[diff] [blame]	14	A good way to quickly validate your patch before submitting it is to pass it
				15	through the Linux kernel's checkpatch.pl utility which can be downloaded here :
				16
				17	http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/scripts/checkpatch.pl
				18
				19	Running it with the following options relaxes its checks to accommodate to the
				20	extra degree of freedom that is tolerated in HAProxy's coding style compared to
				21	the stricter style used in the kernel :
				22
				23	checkpatch.pl -q --max-line-length=160 --no-tree --no-signoff \
				24	--ignore=LEADING_SPACE,CODE_INDENT,DEEP_INDENTATION \
				25	--ignore=ELSE_AFTER_BRACE < patch
				26
				27	You can take its output as hints instead of strict rules, but in general its
				28	output will be accurate and it may even spot some real bugs.
				29
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	30	When modifying a file, you must accept the terms of the license of this file
				31	which is recalled at the top of the file, or is explained in the LICENSE file,
				32	or if not stated, defaults to LGPL version 2.1 or later for files in the
				33	'include' directory, and GPL version 2 or later for all other files.
				34
				35	When adding a new file, you must add a copyright banner at the top of the
				36	file with your real name, e-mail address and a reminder of the license.
				37	Contributions under incompatible licenses or too restrictive licenses might
				38	get rejected. If in doubt, please apply the principle above for existing files.
				39
				40	All code examples below will intentionally be prefixed with " \| " to mark
				41	where the code aligns with the first column, and tabs in this document will be
				42	represented as a series of 8 spaces so that it displays the same everywhere.
				43
				44
				45	1) Indentation and alignment
				46	----------------------------
				47
				48	1.1) Indentation
				49	----------------
				50
				51	Indentation and alignment are two completely different things that people often
				52	get wrong. Indentation is used to mark a sub-level in the code. A sub-level
				53	means that a block is executed in the context of another block (eg: a function
				54	or a condition) :
				55
				56	\| main(int argc, char **argv)
				57	\| {
				58	\| int i;
				59	\|
				60	\| if (argc < 2)
				61	\| exit(1);
				62	\| }
				63
				64	In the example above, the code belongs to the main() function and the exit()
				65	call belongs to the if statement. Indentation is made with tabs (\t, ASCII 9),
				66	which allows any developer to configure their preferred editor to use their
				67	own tab size and to still get the text properly indented. Exactly one tab is
				68	used per sub-level. Tabs may only appear at the beginning of a line or after
				69	another tab. It is illegal to put a tab after some text, as it mangles displays
				70	in a different manner for different users (particularly when used to align
				71	comments or values after a #define). If you're tempted to put a tab after some
				72	text, then you're doing it wrong and you need alignment instead (see below).
				73
				74	Note that there are places where the code was not properly indented in the
				75	past. In order to view it correctly, you may have to set your tab size to 8
				76	characters.
				77
				78
				79	1.2) Alignment
				80	--------------
				81
				82	Alignment is used to continue a line in a way to makes things easier to group
				83	together. By definition, alignment is character-based, so it uses spaces. Tabs
				84	would not work because for one tab there would not be as many characters on all
				85	displays. For instance, the arguments in a function declaration may be broken
				86	into multiple lines using alignment spaces :
				87
				88	\| int http_header_match2(const char hdr, const char end,
				89	\| const char *name, int len)
				90	\| {
				91	\| ...
				92	\| }
				93
				94	In this example, the "const char *name" part is aligned with the first
				95	character of the group it belongs to (list of function arguments). Placing it
				96	here makes it obvious that it's one of the function's arguments. Multiple lines
				97	are easy to handle this way. This is very common with long conditions too :
				98
				99	\| if ((len < eol - sol) &&
				100	\| (sol[len] == ':') &&
				101	\| (strncasecmp(sol, name, len) == 0)) {
				102	\| ctx->del = len;
				103	\| }
				104
				105	If we take again the example above marking tabs with "[-Tabs-]" and spaces
				106	with "#", we get this :
				107
				108	\| [-Tabs-]if ((len < eol - sol) &&
				109	\| [-Tabs-]####(sol[len] == ':') &&
				110	\| [-Tabs-]####(strncasecmp(sol, name, len) == 0)) {
				111	\| [-Tabs-][-Tabs-]ctx->del = len;
				112	\| [-Tabs-]}
				113
Ilya Shipitsin	2a950d0	2020-03-06 13:07:38 +0500	[diff] [blame]	114	It is worth noting that some editors tend to confuse indentations and alignment.
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	115	Emacs is notoriously known for this brokenness, and is responsible for almost
				116	all of the alignment mess. The reason is that Emacs only counts spaces, tries
				117	to fill as many as possible with tabs and completes with spaces. Once you know
				118	it, you just have to be careful, as alignment is not used much, so generally it
				119	is just a matter of replacing the last tab with 8 spaces when this happens.
				120
				121	Indentation should be used everywhere there is a block or an opening brace. It
				122	is not possible to have two consecutive closing braces on the same column, it
				123	means that the innermost was not indented.
				124
				125	Right :
				126
				127	\| main(int argc, char **argv)
				128	\| {
				129	\| if (argc > 1) {
				130	\| printf("Hello\n");
				131	\| }
				132	\| exit(0);
				133	\| }
				134
				135	Wrong :
				136
				137	\| main(int argc, char **argv)
				138	\| {
				139	\| if (argc > 1) {
				140	\| printf("Hello\n");
				141	\| }
				142	\| exit(0);
				143	\| }
				144
				145	A special case applies to switch/case statements. Due to my editor's settings,
				146	I've been used to align "case" with "switch" and to find it somewhat logical
				147	since each of the "case" statements opens a sublevel belonging to the "switch"
				148	statement. But indenting "case" after "switch" is accepted too. However in any
				149	case, whatever follows the "case" statement must be indented, whether or not it
				150	contains braces :
				151
				152	\| switch (*arg) {
				153	\| case 'A': {
				154	\| int i;
				155	\| for (i = 0; i < 10; i++)
				156	\| printf("Please stop pressing 'A'!\n");
				157	\| break;
				158	\| }
				159	\| case 'B':
				160	\| printf("You pressed 'B'\n");
				161	\| break;
				162	\| case 'C':
				163	\| case 'D':
				164	\| printf("You pressed 'C' or 'D'\n");
				165	\| break;
				166	\| default:
				167	\| printf("I don't know what you pressed\n");
				168	\| }
				169
				170
				171	2) Braces
				172	---------
				173
				174	Braces are used to delimit multiple-instruction blocks. In general it is
				175	preferred to avoid braces around single-instruction blocks as it reduces the
				176	number of lines :
				177
				178	Right :
				179
				180	\| if (argc >= 2)
				181	\| exit(0);
				182
				183	Wrong :
				184
				185	\| if (argc >= 2) {
				186	\| exit(0);
				187	\| }
				188
				189	But it is not that strict, it really depends on the context. It happens from
				190	time to time that single-instruction blocks are enclosed within braces because
				191	it makes the code more symmetrical, or more readable. Example :
				192
				193	\| if (argc < 2) {
				194	\| printf("Missing argument\n");
				195	\| exit(1);
				196	\| } else {
				197	\| exit(0);
				198	\| }
				199
				200	Braces are always needed to declare a function. A function's opening brace must
				201	be placed at the beginning of the next line :
				202
				203	Right :
				204
				205	\| int main(int argc, char **argv)
				206	\| {
				207	\| exit(0);
				208	\| }
				209
				210	Wrong :
				211
				212	\| int main(int argc, char **argv) {
				213	\| exit(0);
				214	\| }
				215
				216	Note that a large portion of the code still does not conforms to this rule, as
Willy Tarreau	8f1b35b	2015-09-21 16:36:15 +0200	[diff] [blame]	217	it took years to get all authors to adapt to this more common standard which
				218	is now preferred, as it avoids visual confusion when function declarations are
				219	broken on multiple lines :
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	220
				221	Right :
				222
				223	\| int foo(const char hdr, const char end,
				224	\| const char name, const char err,
				225	\| int len)
				226	\| {
				227	\| int i;
				228
				229	Wrong :
				230
				231	\| int foo(const char hdr, const char end,
				232	\| const char name, const char err,
				233	\| int len) {
				234	\| int i;
				235
				236	Braces should always be used where there might be an ambiguity with the code
				237	later. The most common example is the stacked "if" statement where an "else"
				238	may be added later at the wrong place breaking the code, but it also happens
				239	with comments or long arguments in function calls. In general, if a block is
				240	more than one line long, it should use braces.
				241
				242	Dangerous code waiting of a victim :
				243
				244	\| if (argc < 2)
				245	\| /* ret must not be negative here */
				246	\| if (ret < 0)
				247	\| return -1;
				248
				249	Wrong change :
				250
				251	\| if (argc < 2)
				252	\| /* ret must not be negative here */
				253	\| if (ret < 0)
				254	\| return -1;
				255	\| else
				256	\| return 0;
				257
				258	It will do this instead of what your eye seems to tell you :
				259
				260	\| if (argc < 2)
				261	\| /* ret must not be negative here */
				262	\| if (ret < 0)
				263	\| return -1;
				264	\| else
				265	\| return 0;
				266
				267	Right :
				268
				269	\| if (argc < 2) {
				270	\| /* ret must not be negative here */
				271	\| if (ret < 0)
				272	\| return -1;
				273	\| }
				274	\| else
				275	\| return 0;
				276
				277	Similarly dangerous example :
				278
				279	\| if (ret < 0)
				280	\| /* ret must not be negative here */
				281	\| complain();
				282	\| init();
				283
				284	Wrong change to silent the annoying message :
				285
				286	\| if (ret < 0)
				287	\| /* ret must not be negative here */
				288	\| //complain();
				289	\| init();
				290
				291	... which in fact means :
				292
				293	\| if (ret < 0)
				294	\| init();
				295
				296
				297	3) Breaking lines
				298	-----------------
				299
				300	There is no strict rule for line breaking. Some files try to stick to the 80
				301	column limit, but given that various people use various tab sizes, it does not
				302	make much sense. Also, code is sometimes easier to read with less lines, as it
				303	represents less surface on the screen (since each new line adds its tabs and
				304	spaces). The rule is to stick to the average line length of other lines. If you
				305	are working in a file which fits in 80 columns, try to keep this goal in mind.
				306	If you're in a function with 120-chars lines, there is no reason to add many
				307	short lines, so you can make longer lines.
				308
				309	In general, opening a new block should lead to a new line. Similarly, multiple
				310	instructions should be avoided on the same line. But some constructs make it
				311	more readable when those are perfectly aligned :
				312
				313	A copy-paste bug in the following construct will be easier to spot :
				314
				315	\| if (omult % idiv == 0) { omult /= idiv; idiv = 1; }
				316	\| if (idiv % omult == 0) { idiv /= omult; omult = 1; }
				317	\| if (imult % odiv == 0) { imult /= odiv; odiv = 1; }
				318	\| if (odiv % imult == 0) { odiv /= imult; imult = 1; }
				319
				320	than in this one :
				321
				322	\| if (omult % idiv == 0) {
				323	\| omult /= idiv;
				324	\| idiv = 1;
				325	\| }
				326	\| if (idiv % omult == 0) {
				327	\| idiv /= omult;
				328	\| omult = 1;
				329	\| }
				330	\| if (imult % odiv == 0) {
				331	\| imult /= odiv;
				332	\| odiv = 1;
				333	\| }
				334	\| if (odiv % imult == 0) {
				335	\| odiv /= imult;
				336	\| imult = 1;
				337	\| }
				338
				339	What is important is not to mix styles. For instance there is nothing wrong
				340	with having many one-line "case" statements as long as most of them are this
				341	short like below :
				342
				343	\| switch (*arg) {
				344	\| case 'A': ret = 1; break;
				345	\| case 'B': ret = 2; break;
				346	\| case 'C': ret = 4; break;
				347	\| case 'D': ret = 8; break;
				348	\| default : ret = 0; break;
				349	\| }
				350
				351	Otherwise, prefer to have the "case" statement on its own line as in the
				352	example in section 1.2 about alignment. In any case, avoid to stack multiple
				353	control statements on the same line, so that it will never be the needed to
				354	add two tab levels at once :
				355
				356	Right :
				357
				358	\| switch (*arg) {
				359	\| case 'A':
				360	\| if (ret < 0)
				361	\| ret = 1;
				362	\| break;
				363	\| default : ret = 0; break;
				364	\| }
				365
				366	Wrong :
				367
				368	\| switch (*arg) {
				369	\| case 'A': if (ret < 0)
				370	\| ret = 1;
				371	\| break;
				372	\| default : ret = 0; break;
				373	\| }
				374
				375	Right :
				376
				377	\| if (argc < 2)
				378	\| if (ret < 0)
				379	\| return -1;
				380
				381	or Right :
				382
				383	\| if (argc < 2)
				384	\| if (ret < 0) return -1;
				385
				386	but Wrong :
				387
				388	\| if (argc < 2) if (ret < 0) return -1;
				389
				390
				391	When complex conditions or expressions are broken into multiple lines, please
				392	do ensure that alignment is perfectly appropriate, and group all main operators
				393	on the same side (which you're free to choose as long as it does not change for
				394	every block. Putting binary operators on the right side is preferred as it does
				395	not mangle with alignment but various people have their preferences.
				396
				397	Right :
				398
				399	\| if ((txn->flags & TX_NOT_FIRST) &&
				400	\| ((req->flags & BF_FULL) \|\|
				401	\| req->r < req->lr \|\|
				402	\| req->r > req->data + req->size - global.tune.maxrewrite)) {
				403	\| return 0;
				404	\| }
				405
				406	Right :
				407
				408	\| if ((txn->flags & TX_NOT_FIRST)
				409	\| && ((req->flags & BF_FULL)
				410	\| \|\| req->r < req->lr
				411	\| \|\| req->r > req->data + req->size - global.tune.maxrewrite)) {
				412	\| return 0;
				413	\| }
				414
				415	Wrong :
				416
				417	\| if ((txn->flags & TX_NOT_FIRST) &&
				418	\| ((req->flags & BF_FULL) \|\|
				419	\| req->r < req->lr
				420	\| \|\| req->r > req->data + req->size - global.tune.maxrewrite)) {
				421	\| return 0;
				422	\| }
				423
				424	If it makes the result more readable, parenthesis may even be closed on their
				425	own line in order to align with the opening one. Note that should normally not
				426	be needed because such code would be too complex to be digged into.
				427
				428	The "else" statement may either be merged with the closing "if" brace or lie on
				429	its own line. The later is preferred but it adds one extra line to each control
				430	block which is annoying in short ones. However, if the "else" is followed by an
				431	"if", then it should really be on its own line and the rest of the if/else
				432	blocks must follow the same style.
				433
				434	Right :
				435
				436	\| if (a < b) {
				437	\| return a;
				438	\| }
				439	\| else {
				440	\| return b;
				441	\| }
				442
				443	Right :
				444
				445	\| if (a < b) {
				446	\| return a;
				447	\| } else {
				448	\| return b;
				449	\| }
				450
				451	Right :
				452
				453	\| if (a < b) {
				454	\| return a;
				455	\| }
				456	\| else if (a != b) {
				457	\| return b;
				458	\| }
				459	\| else {
				460	\| return 0;
				461	\| }
				462
				463	Wrong :
				464
				465	\| if (a < b) {
				466	\| return a;
				467	\| } else if (a != b) {
				468	\| return b;
				469	\| } else {
				470	\| return 0;
				471	\| }
				472
				473	Wrong :
				474
				475	\| if (a < b) {
				476	\| return a;
				477	\| }
				478	\| else if (a != b) {
				479	\| return b;
				480	\| } else {
				481	\| return 0;
				482	\| }
				483
				484
				485	4) Spacing
				486	----------
				487
				488	Correctly spacing code is very important. When you have to spot a bug at 3am,
				489	you need it to be clear. When you expect other people to review your code, you
				490	want it to be clear and don't want them to get nervous when trying to find what
				491	you did.
				492
				493	Always place spaces around all binary or ternary operators, commas, as well as
				494	after semi-colons and opening braces if the line continues :
				495
				496	Right :
				497
				498	\| int ret = 0;
				499	\| /* if (x >> 4) { x >>= 4; ret += 4; } */
				500	\| ret += (x >> 4) ? (x >>= 4, 4) : 0;
				501	\| val = ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
				502
				503	Wrong :
				504
				505	\| int ret=0;
				506	\| /* if (x>>4) {x>>=4;ret+=4;} */
				507	\| ret+=(x>>4)?(x>>=4,4):0;
				508	\| val=ret+((0xFFFFAA50U>>(x<<1))&3)+1;
				509
				510	Never place spaces after unary operators (&, *, -, !, ~, ++, --) nor cast, as
				511	they might be confused with they binary counterpart, nor before commas or
				512	semicolons :
				513
				514	Right :
				515
				516	\| bit = !!(~len++ ^ -(unsigned char)*x);
				517
				518	Wrong :
				519
				520	\| bit = ! ! (~len++ ^ - (unsigned char) * x) ;
				521
				522	Note that "sizeof" is a unary operator which is sometimes considered as a
Michael Prokop	4438c60	2019-05-24 10:25:45 +0200	[diff] [blame]	523	language keyword, but in no case it is a function. It does not require
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	524	parenthesis so it is sometimes followed by spaces and sometimes not when
				525	there are no parenthesis. Most people do not really care as long as what
				526	is written is unambiguous.
				527
Joseph Herlant	71b4b15	2018-11-13 16:55:16 -0800	[diff] [blame]	528	Braces opening a block must be preceded by one space unless the brace is
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	529	placed on the first column :
				530
				531	Right :
				532
				533	\| if (argc < 2) {
				534	\| }
				535
				536	Wrong :
				537
				538	\| if (argc < 2){
				539	\| }
				540
				541	Do not add unneeded spaces inside parenthesis, they just make the code less
				542	readable.
				543
				544	Right :
				545
				546	\| if (x < 4 && (!y \|\| !z))
				547	\| break;
				548
				549	Wrong :
				550
				551	\| if ( x < 4 && ( !y \|\| !z ) )
				552	\| break;
				553
				554	Language keywords must all be followed by a space. This is true for control
				555	statements (do, for, while, if, else, return, switch, case), and for types
				556	(int, char, unsigned). As an exception, the last type in a cast does not take
				557	a space before the closing parenthesis). The "default" statement in a "switch"
				558	construct is generally just followed by the colon. However the colon after a
				559	"case" or "default" statement must be followed by a space.
				560
				561	Right :
				562
				563	\| if (nbargs < 2) {
				564	\| printf("Missing arg at %c\n", (char )ptr);
				565	\| for (i = 0; i < 10; i++) beep();
				566	\| return 0;
				567	\| }
				568	\| switch (*arg) {
				569
				570	Wrong :
				571
				572	\| if(nbargs < 2){
				573	\| printf("Missing arg at %c\n", (char)ptr);
				574	\| for(i = 0; i < 10; i++)beep();
				575	\| return 0;
				576	\| }
				577	\| switch(*arg) {
				578
				579	Function calls are different, the opening parenthesis is always coupled to the
				580	function name without any space. But spaces are still needed after commas :
				581
				582	Right :
				583
				584	\| if (!init(argc, argv))
				585	\| exit(1);
				586
				587	Wrong :
				588
				589	\| if (!init (argc,argv))
				590	\| exit(1);
				591
				592
				593	5) Excess or lack of parenthesis
				594	--------------------------------
				595
				596	Sometimes there are too many parenthesis in some formulas, sometimes there are
				597	too few. There are a few rules of thumb for this. The first one is to respect
				598	the compiler's advice. If it emits a warning and asks for more parenthesis to
				599	avoid confusion, follow the advice at least to shut the warning. For instance,
				600	the code below is quite ambiguous due to its alignment :
				601
				602	\| if (var1 < 2 \|\| var2 < 2 &&
				603	\| var3 != var4) {
				604	\| /* fail */
				605	\| return -3;
				606	\| }
				607
				608	Note that this code does :
				609
				610	\| if (var1 < 2 \|\| (var2 < 2 && var3 != var4)) {
				611	\| /* fail */
				612	\| return -3;
				613	\| }
				614
				615	But maybe the author meant :
				616
				617	\| if ((var1 < 2 \|\| var2 < 2) && var3 != var4) {
				618	\| /* fail */
				619	\| return -3;
				620	\| }
				621
				622	A second rule to put parenthesis is that people don't always know operators
				623	precedence too well. Most often they have no issue with operators of the same
				624	category (eg: booleans, integers, bit manipulation, assignment) but once these
				625	operators are mixed, it causes them all sort of issues. In this case, it is
				626	wise to use parenthesis to avoid errors. One common error concerns the bit
				627	shift operators because they're used to replace multiplies and divides but
				628	don't have the same precedence :
				629
				630	The expression :
				631
				632	\| x = y * 16 + 5;
				633
				634	becomes :
				635
				636	\| x = y << 4 + 5;
				637
				638	which is wrong because it is equivalent to :
				639
				640	\| x = y << (4 + 5);
				641
				642	while the following was desired instead :
				643
				644	\| x = (y << 4) + 5;
				645
				646	It is generally fine to write boolean expressions based on comparisons without
				647	any parenthesis. But on top of that, integer expressions and assignments should
				648	then be protected. For instance, there is an error in the expression below
				649	which should be safely rewritten :
				650
				651	Wrong :
				652
				653	\| if (var1 > 2 && var1 < 10 \|\|
				654	\| var1 > 2 + 256 && var2 < 10 + 256 \|\|
				655	\| var1 > 2 + 1 << 16 && var2 < 10 + 2 << 16)
				656	\| return 1;
				657
				658	Right (may remove a few parenthesis depending on taste) :
				659
				660	\| if ((var1 > 2 && var1 < 10) \|\|
				661	\| (var1 > (2 + 256) && var2 < (10 + 256)) \|\|
				662	\| (var1 > (2 + (1 << 16)) && var2 < (10 + (1 << 16))))
				663	\| return 1;
				664
				665	The "return" statement is not a function, so it takes no argument. It is a
				666	control statement which is followed by the expression to be returned. It does
				667	not need to be followed by parenthesis :
				668
				669	Wrong :
				670
				671	\| int ret0()
				672	\| {
				673	\| return(0);
				674	\| }
				675
				676	Right :
				677
				678	\| int ret0()
				679	\| {
				680	\| return 0;
				681	\| }
				682
				683	Parenthesisis are also found in type casts. Type casting should be avoided as
				684	much as possible, especially when it concerns pointer types. Casting a pointer
				685	disables the compiler's type checking and is the best way to get caught doing
				686	wrong things with data not the size you expect. If you need to manipulate
				687	multiple data types, you can use a union instead. If the union is really not
				688	convenient and casts are easier, then try to isolate them as much as possible,
				689	for instance when initializing function arguments or in another function. Not
				690	proceeding this way causes huge risks of not using the proper pointer without
				691	any notification, which is especially true during copy-pastes.
				692
				693	Wrong :
				694
				695	\| void check_private_data(void arg1, void *arg2)
				696	\| {
				697	\| char *area;
				698	\|
				699	\| if ((int )arg1 > 1000)
				700	\| return NULL;
				701	\| if (memcmp((const char )arg2, "send(", 5) != 0))
				702	\| return NULL;
				703	\| area = malloc((int )arg1);
				704	\| if (!area)
				705	\| return NULL;
				706	\| memcpy(area, (const char )arg2 + 5, (int )arg1);
				707	\| return area;
				708	\| }
				709
				710	Right :
				711
				712	\| void check_private_data(void arg1, void *arg2)
				713	\| {
				714	\| char *area;
				715	\| int len = (int )arg1;
				716	\| const char *msg = arg2;
				717	\|
				718	\| if (len > 1000)
				719	\| return NULL;
				720	\| if (memcmp(msg, "send(", 5) != 0)
				721	\| return NULL;
				722	\| area = malloc(len);
				723	\| if (!area)
				724	\| return NULL;
				725	\| memcpy(area, msg + 5, len);
				726	\| return area;
				727	\| }
				728
				729
				730	6) Ambiguous comparisons with zero or NULL
				731	------------------------------------------
				732
				733	In C, '0' has no type, or it has the type of the variable it is assigned to.
				734	Comparing a variable or a return value with zero means comparing with the
				735	representation of zero for this variable's type. For a boolean, zero is false.
				736	For a pointer, zero is NULL. Very often, to make things shorter, it is fine to
				737	use the '!' unary operator to compare with zero, as it is shorter and easier to
				738	remind or understand than a plain '0'. Since the '!' operator is read "not", it
				739	helps read code faster when what follows it makes sense as a boolean, and it is
				740	often much more appropriate than a comparison with zero which makes an equal
				741	sign appear at an undesirable place. For instance :
				742
				743	\| if (!isdigit(c) && !isspace(c))
				744	\| break;
				745
				746	is easier to understand than :
				747
				748	\| if (isdigit(c) == 0 && isspace(c) == 0)
				749	\| break;
				750
				751	For a char this "not" operator can be reminded as "no remaining char", and the
				752	absence of comparison to zero implies existence of the tested entity, hence the
				753	simple strcpy() implementation below which automatically stops once the last
				754	zero is copied :
				755
				756	\| void my_strcpy(char d, const char s)
				757	\| {
				758	\| while ((d++ = s++));
				759	\| }
				760
				761	Note the double parenthesis in order to avoid the compiler telling us it looks
				762	like an equality test.
				763
				764	For a string or more generally any pointer, this test may be understood as an
				765	existence test or a validity test, as the only pointer which will fail to
				766	validate equality is the NULL pointer :
				767
				768	\| area = malloc(1000);
				769	\| if (!area)
				770	\| return -1;
				771
				772	However sometimes it can fool the reader. For instance, strcmp() precisely is
				773	one of such functions whose return value can make one think the opposite due to
				774	its name which may be understood as "if strings compare...". Thus it is strongly
				775	recommended to perform an explicit comparison with zero in such a case, and it
				776	makes sense considering that the comparison's operator is the same that is
				777	wanted to compare the strings (note that current config parser lacks a lot in
				778	this regards) :
				779
				780	strcmp(a, b) == 0 <=> a == b
				781	strcmp(a, b) != 0 <=> a != b
				782	strcmp(a, b) < 0 <=> a < b
				783	strcmp(a, b) > 0 <=> a > b
				784
				785	Avoid this :
				786
				787	\| if (strcmp(arg, "test"))
				788	\| printf("this is not a test\n");
				789	\|
				790	\| if (!strcmp(arg, "test"))
				791	\| printf("this is a test\n");
				792
				793	Prefer this :
				794
				795	\| if (strcmp(arg, "test") != 0)
				796	\| printf("this is not a test\n");
				797	\|
				798	\| if (strcmp(arg, "test") == 0)
				799	\| printf("this is a test\n");
				800
				801
				802	7) System call returns
				803	----------------------
				804
				805	This is not directly a matter of coding style but more of bad habits. It is
				806	important to check for the correct value upon return of syscalls. The proper
				807	return code indicating an error is described in its man page. There is no
				808	reason to consider wider ranges than what is indicated. For instance, it is
				809	common to see such a thing :
				810
				811	\| if ((fd = open(file, O_RDONLY)) < 0)
				812	\| return -1;
				813
Joseph Herlant	71b4b15	2018-11-13 16:55:16 -0800	[diff] [blame]	814	This is wrong. The man page says that -1 is returned if an error occurred. It
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	815	does not suggest that any other negative value will be an error. It is possible
				816	that a few such issues have been left in existing code. They are bugs for which
Michael Prokop	4438c60	2019-05-24 10:25:45 +0200	[diff] [blame]	817	fixes are accepted, even though they're currently harmless since open() is not
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	818	known for returning negative values at the moment.
				819
				820
				821	8) Declaring new types, names and values
				822	----------------------------------------
				823
				824	Please refrain from using "typedef" to declare new types, they only obfuscate
				825	the code. The reader never knows whether he's manipulating a scalar type or a
				826	struct. For instance it is not obvious why the following code fails to build :
				827
				828	\| int delay_expired(timer_t exp, timer_us_t now)
				829	\| {
				830	\| return now >= exp;
				831	\| }
				832
				833	With the types declared in another file this way :
				834
				835	\| typedef unsigned int timer_t;
				836	\| typedef struct timeval timer_us_t;
				837
				838	This cannot work because we're comparing a scalar with a struct, which does
				839	not make sense. Without a typedef, the function would have been written this
				840	way without any ambiguity and would not have failed :
				841
				842	\| int delay_expired(unsigned int exp, struct timeval *now)
				843	\| {
				844	\| return now >= exp->tv_sec;
				845	\| }
				846
				847	Declaring special values may be done using enums. Enums are a way to define
				848	structured integer values which are related to each other. They are perfectly
				849	suited for state machines. While the first element is always assigned the zero
				850	value, not everybody knows that, especially people working with multiple
				851	languages all the day. For this reason it is recommended to explicitly force
				852	the first value even if it's zero. The last element should be followed by a
				853	comma if it is planned that new elements might later be added, this will make
				854	later patches shorter. Conversely, if the last element is placed in order to
				855	get the number of possible values, it must not be followed by a comma and must
Joseph Herlant	71b4b15	2018-11-13 16:55:16 -0800	[diff] [blame]	856	be preceded by a comment :
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	857
				858	\| enum {
				859	\| first = 0,
				860	\| second,
				861	\| third,
				862	\| fourth,
				863	\| };
				864
				865
				866	\| enum {
				867	\| first = 0,
				868	\| second,
				869	\| third,
				870	\| fourth,
				871	\| /* nbvalues must always be placed last */
				872	\| nbvalues
				873	\| };
				874
				875	Structure names should be short enough not to mangle function declarations,
				876	and explicit enough to avoid confusion (which is the most important thing).
				877
				878	Wrong :
				879
				880	\| struct request_args { /* arguments on the query string */
				881	\| char *name;
				882	\| char *value;
				883	\| struct misc_args *next;
				884	\| };
				885
				886	Right :
				887
				888	\| struct qs_args { /* arguments on the query string */
				889	\| char *name;
				890	\| char *value;
				891	\| struct qs_args *next;
				892	\| }
				893
				894
				895	When declaring new functions or structures, please do not use CamelCase, which
				896	is a style where upper and lower case are mixed in a single word. It causes a
				897	lot of confusion when words are composed from acronyms, because it's hard to
				898	stick to a rule. For instance, a function designed to generate an ISN (initial
				899	sequence number) for a TCP/IP connection could be called :
				900
				901	- generateTcpipIsn()
				902	- generateTcpIpIsn()
				903	- generateTcpIpISN()
				904	- generateTCPIPISN()
				905	etc...
				906
				907	None is right, none is wrong, these are just preferences which might change
				908	along the code. Instead, please use an underscore to separate words. Lowercase
				909	is preferred for the words, but if acronyms are upcased it's not dramatic. The
				910	real advantage of this method is that it creates unambiguous levels even for
				911	short names.
				912
				913	Valid examples :
				914
				915	- generate_tcpip_isn()
				916	- generate_tcp_ip_isn()
				917	- generate_TCPIP_ISN()
				918	- generate_TCP_IP_ISN()
				919
				920	Another example is easy to understand when 3 arguments are involved in naming
				921	the function :
				922
				923	Wrong (naming conflict) :
				924
				925	\| /* returns A + B * C */
				926	\| int mulABC(int a, int b, int c)
				927	\| {
				928	\| return a + b * c;
				929	\| }
				930	\|
				931	\| /* returns (A + B) * C */
				932	\| int mulABC(int a, int b, int c)
				933	\| {
				934	\| return (a + b) * c;
				935	\| }
				936
				937	Right (unambiguous naming) :
				938
				939	\| /* returns A + B * C */
				940	\| int mul_a_bc(int a, int b, int c)
				941	\| {
				942	\| return a + b * c;
				943	\| }
				944	\|
				945	\| /* returns (A + B) * C */
				946	\| int mul_ab_c(int a, int b, int c)
				947	\| {
				948	\| return (a + b) * c;
				949	\| }
				950
				951	Whenever you manipulate pointers, try to declare them as "const", as it will
				952	save you from many accidental misuses and will only cause warnings to be
				953	emitted when there is a real risk. In the examples below, it is possible to
				954	call my_strcpy() with a const string only in the first declaration. Note that
				955	people who ignore "const" are often the ones who cast a lot and who complain
				956	from segfaults when using strtok() !
				957
				958	Right :
				959
				960	\| void my_strcpy(char d, const char s)
				961	\| {
				962	\| while ((d++ = s++));
				963	\| }
				964	\|
				965	\| void say_hello(char *dest)
				966	\| {
				967	\| my_strcpy(dest, "hello\n");
				968	\| }
				969
				970	Wrong :
				971
				972	\| void my_strcpy(char d, char s)
				973	\| {
				974	\| while ((d++ = s++));
				975	\| }
				976	\|
				977	\| void say_hello(char *dest)
				978	\| {
				979	\| my_strcpy(dest, "hello\n");
				980	\| }
				981
				982
				983	9) Getting macros right
				984	-----------------------
				985
				986	It is very common for macros to do the wrong thing when used in a way their
				987	author did not have in mind. For this reason, macros must always be named with
				988	uppercase letters only. This is the only way to catch the developer's eye when
Jackie Tapia	749f74c	2020-07-22 18:59:40 -0500	[diff] [blame]	989	using them, so that they double-check whether they are taking a risk or not. First,
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	990	macros must never ever be terminated by a semi-colon, or they will close the
				991	wrong block once in a while. For instance, the following will cause a build
				992	error before the "else" due to the double semi-colon :
				993
				994	Wrong :
				995
				996	\| #define WARN printf("warning\n");
				997	\| ...
				998	\| if (a < 0)
				999	\| WARN;
				1000	\| else
				1001	\| a--;
				1002
				1003	Right :
				1004
				1005	\| #define WARN printf("warning\n")
				1006
				1007	If multiple instructions are needed, then use a do { } while (0) block, which
				1008	is the only construct which respects exactly the semantics of a single
				1009	instruction :
				1010
				1011	\| #define WARN do { printf("warning\n"); log("warning\n"); } while (0)
				1012	\| ...
				1013	\|
				1014	\| if (a < 0)
				1015	\| WARN;
				1016	\| else
				1017	\| a--;
				1018
				1019	Second, do not put unprotected control statements in macros, they will
				1020	definitely cause bugs :
				1021
				1022	Wrong :
				1023
				1024	\| #define WARN if (verbose) printf("warning\n")
				1025	\| ...
				1026	\| if (a < 0)
				1027	\| WARN;
				1028	\| else
				1029	\| a--;
				1030
				1031	Which is equivalent to the undesired form below :
				1032
				1033	\| if (a < 0)
				1034	\| if (verbose)
				1035	\| printf("warning\n");
				1036	\| else
				1037	\| a--;
				1038
				1039	Right way to do it :
				1040
				1041	\| #define WARN do { if (verbose) printf("warning\n"); } while (0)
				1042	\| ...
				1043	\| if (a < 0)
				1044	\| WARN;
				1045	\| else
				1046	\| a--;
				1047
				1048	Which is equivalent to :
				1049
				1050	\| if (a < 0)
				1051	\| do { if (verbose) printf("warning\n"); } while (0);
				1052	\| else
				1053	\| a--;
				1054
				1055	Macro parameters must always be surrounded by parenthesis, and must never be
				1056	duplicated in the same macro unless explicitly stated. Also, macros must not be
				1057	defined with operators without surrounding parenthesis. The MIN/MAX macros are
				1058	a pretty common example of multiple misuses, but this happens as early as when
				1059	using bit masks. Most often, in case of any doubt, try to use inline functions
				1060	instead.
				1061
				1062	Wrong :
				1063
				1064	\| #define MIN(a, b) a < b ? a : b
				1065	\|
				1066	\| /* returns 2 * min(a,b) + 1 */
				1067	\| int double_min_p1(int a, int b)
				1068	\| {
				1069	\| return 2 * MIN(a, b) + 1;
				1070	\| }
				1071
				1072	What this will do :
				1073
				1074	\| int double_min_p1(int a, int b)
				1075	\| {
				1076	\| return 2 * a < b ? a : b + 1;
				1077	\| }
				1078
				1079	Which is equivalent to :
				1080
				1081	\| int double_min_p1(int a, int b)
				1082	\| {
				1083	\| return (2 * a) < b ? a : (b + 1);
				1084	\| }
				1085
				1086	The first thing to fix is to surround the macro definition with parenthesis to
				1087	avoid this mistake :
				1088
				1089	\| #define MIN(a, b) (a < b ? a : b)
				1090
				1091	But this is still not enough, as can be seen in this example :
				1092
				1093	\| /* compares either a or b with c */
				1094	\| int min_ab_c(int a, int b, int c)
				1095	\| {
				1096	\| return MIN(a ? a : b, c);
				1097	\| }
				1098
				1099	Which is equivalent to :
				1100
				1101	\| int min_ab_c(int a, int b, int c)
				1102	\| {
				1103	\| return (a ? a : b < c ? a ? a : b : c);
				1104	\| }
				1105
				1106	Which in turn means a totally different thing due to precedence :
				1107
				1108	\| int min_ab_c(int a, int b, int c)
				1109	\| {
				1110	\| return (a ? a : ((b < c) ? (a ? a : b) : c));
				1111	\| }
				1112
				1113	This can be fixed by surrounding each argument in the macro with parenthesis:
				1114
				1115	\| #define MIN(a, b) ((a) < (b) ? (a) : (b))
				1116
				1117	But this is still not enough, as can be seen in this example :
				1118
				1119	\| int min_ap1_b(int a, int b)
				1120	\| {
				1121	\| return MIN(++a, b);
				1122	\| }
				1123
				1124	Which is equivalent to :
				1125
				1126	\| int min_ap1_b(int a, int b)
				1127	\| {
				1128	\| return ((++a) < (b) ? (++a) : (b));
				1129	\| }
				1130
				1131	Again, this is wrong because "a" is incremented twice if below b. The only way
				1132	to fix this is to use a compound statement and to assign each argument exactly
				1133	once to a local variable of the same type :
				1134
				1135	\| #define MIN(a, b) ({ typeof(a) __a = (a); typeof(b) __b = (b); \
				1136	\| ((__a) < (__b) ? (__a) : (__b)); \
				1137	\| })
				1138
				1139	At this point, using static inline functions is much cleaner if a single type
				1140	is to be used :
				1141
				1142	\| static inline int min(int a, int b)
				1143	\| {
				1144	\| return a < b ? a : b;
				1145	\| }
				1146
				1147
				1148	10) Includes
				1149	------------
				1150
				1151	Includes are as much as possible listed in alphabetically ordered groups :
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	1152	- the includes more or less system-specific (sys/, netinet/, ...)
Willy Tarreau	bdc6209	2020-07-07 16:21:19 +0200	[diff] [blame]	1153	- the libc-standard includes (those without any path component)
				1154	- includes from the local "import" subdirectory
				1155	- includes from the local "haproxy" subdirectory
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	1156
				1157	Each section is just visually delimited from the other ones using an empty
				1158	line. The two first ones above may be merged into a single section depending on
				1159	developer's preference. Please do not copy-paste include statements from other
				1160	files. Having too many includes significantly increases build time and makes it
				1161	hard to find which ones are needed later. Just include what you need and if
				1162	possible in alphabetical order so that when something is missing, it becomes
				1163	obvious where to look for it and where to add it.
				1164
Willy Tarreau	bdc6209	2020-07-07 16:21:19 +0200	[diff] [blame]	1165	All files should include <haproxy/api.h> because this is where build options
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	1166	are prepared.
				1167
Willy Tarreau	bdc6209	2020-07-07 16:21:19 +0200	[diff] [blame]	1168	Haproxy header files are split in two, those exporting the types only (named
				1169	with a trailing "-t") and those exporting variables, functions and inline
				1170	functions. Types, structures, enums and #defines must go into the types files
				1171	which are the only ones that may be included by othertype files. Function
				1172	prototypes and inlined functions must go into the main files. This split is
				1173	because of inlined functions which cross-reference types from other files,
				1174	which cause a chicken-and-egg problem if the functions and types are declared
				1175	at the same place.
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	1176
				1177	Include files must be protected against multiple inclusion using the common
				1178	#ifndef/#define/#endif trick with a tag derived from the include file and its
				1179	location.
				1180
				1181
				1182	11) Comments
				1183	------------
				1184
				1185	Comments are preferably of the standard 'C' form using /* */. The C++ form "//"
				1186	are tolerated for very short comments (eg: a word or two) but should be avoided
				1187	as much as possible. Multi-line comments are made with each intermediate line
				1188	starting with a star aligned with the first one, as in this example :
				1189
				1190	\| /*
				1191	\| * This is a multi-line
				1192	\| * comment.
				1193	\| */
				1194
				1195	If multiple code lines need a short comment, try to align them so that you can
				1196	have multi-line sentences. This is rarely needed, only for really complex
				1197	constructs.
				1198
				1199	Do not tell what you're doing in comments, but explain why you're doing it if
				1200	it seems not to be obvious. Also do indicate at the top of function what they
				1201	accept and what they don't accept. For instance, strcpy() only accepts output
				1202	buffers at least as large as the input buffer, and does not support any NULL
				1203	pointer. There is nothing wrong with that if the caller knows it.
				1204
				1205	Wrong use of comments :
				1206
				1207	\| int flsnz8(unsigned int x)
				1208	\| {
				1209	\| int ret = 0; /* initialize ret */
				1210	\| if (x >> 4) { x >>= 4; ret += 4; } /* add 4 to ret if needed */
				1211	\| return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1; /* add ??? */
				1212	\| }
				1213	\| ...
				1214	\| bit = ~len + (skip << 3) + 9; /* update bit */
				1215
				1216	Right use of comments :
				1217
Ilya Shipitsin	2a950d0	2020-03-06 13:07:38 +0500	[diff] [blame]	1218	\| /* This function returns the position of the highest bit set in the lowest
Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	1219	\| * byte of <x>, between 0 and 7. It only works if <x> is non-null. It uses
				1220	\| * a 32-bit value as a lookup table to return one of 4 values for the
				1221	\| * highest 16 possible 4-bit values.
				1222	\| */
				1223	\| int flsnz8(unsigned int x)
				1224	\| {
				1225	\| int ret = 0;
				1226	\| if (x >> 4) { x >>= 4; ret += 4; }
				1227	\| return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
				1228	\| }
				1229	\| ...
				1230	\| bit = ~len + (skip << 3) + 9; /* (skip << 3) + (8 - len), saves 1 cycle */
				1231
				1232
				1233	12) Use of assembly
				1234	-------------------
				1235
				1236	There are many projects where use of assembly code is not welcome. There is no
				1237	problem with use of assembly in haproxy, provided that :
				1238
				1239	a) an alternate C-form is provided for architectures not covered
				1240	b) the code is small enough and well commented enough to be maintained
				1241
				1242	It is important to take care of various incompatibilities between compiler
				1243	versions, for instance regarding output and cloberred registers. There are
				1244	a number of documentations on the subject on the net. Anyway if you are
				1245	fiddling with assembly, you probably know that already.
				1246
				1247	Example :
				1248	\| /* gcc does not know when it can safely divide 64 bits by 32 bits. Use this
				1249	\| * function when you know for sure that the result fits in 32 bits, because
				1250	\| * it is optimal on x86 and on 64bit processors.
				1251	\| */
				1252	\| static inline unsigned int div64_32(unsigned long long o1, unsigned int o2)
				1253	\| {
				1254	\| unsigned int result;
				1255	\| #ifdef __i386__
				1256	\| asm("divl %2"
				1257	\| : "=a" (result)
				1258	\| : "A"(o1), "rm"(o2));
				1259	\| #else
				1260	\| result = o1 / o2;
				1261	\| #endif
				1262	\| return result;
				1263	\| }
				1264
Willy Tarreau	02ec3fe	2020-11-18 19:53:45 +0100	[diff] [blame]	1265
				1266	13) Pointers
				1267	------------
				1268
				1269	A lot could be said about pointers, there's enough to fill entire books. Misuse
				1270	of pointers is one of the primary reasons for bugs in haproxy, and this rate
				1271	has significantly increased with the use of threads. Moreover, bogus pointers
				1272	cause the hardest to analyse bugs, because usually they result in modifications
				1273	to reassigned areas or accesses to unmapped areas, and in each case, bugs that
				1274	strike very far away from where they were located. Some bugs have already taken
				1275	up to 3 weeks of full time analysis, which has a severe impact on the project's
				1276	ability to make forward progress on important features. For this reason, code
				1277	that doesn't look robust enough or that doesn't follow some of the rules below
				1278	will be rejected, and may even be reverted after being merged if the trouble is
				1279	detected late!
				1280
				1281
				1282	13.1) No test before freeing
				1283	----------------------------
				1284
				1285	All platforms where haproxy is supported have a well-defined and documented
				1286	behavior for free(NULL), which is to do nothing at all. In other words, free()
				1287	does test for the pointer's nullity. As such, there is no point in testing
				1288	if a pointer is NULL or not before calling free(). And further, you must not
				1289	do it, because it adds some confusion to the reader during debugging sessions,
				1290	making one think that the code's authors weren't very sure about what they
				1291	were doing. This will not cause a bug but will result in your code to get
				1292	rejected.
				1293
				1294	Wrong call to free :
				1295
				1296	\| static inline int blah_free(struct blah *blah)
				1297	\| {
				1298	\| if (blah->str1)
				1299	\| free(blah->str1);
				1300	\| if (blah->str2)
				1301	\| free(blah->str2);
				1302	\| free(blah);
				1303	\| }
				1304
				1305	Correct call to free :
				1306
				1307	\| static inline int blah_free(struct blah *blah)
				1308	\| {
				1309	\| free(blah->str1);
				1310	\| free(blah->str2);
				1311	\| free(blah);
				1312	\| }
				1313
				1314
				1315	13.2) No dangling pointers
				1316	--------------------------
				1317
				1318	Pointers are very commonly used as booleans: if they're not NULL, then the
				1319	area they point to is valid and may be used. This is convenient for many things
				1320	and is even emphasized with threads where they can atomically be swapped with
				1321	another value (even NULL), and as such provide guaranteed atomic resource
				1322	allocation and sharing.
				1323
				1324	The problem with this is when someone forgets to delete a pointer when an area
				1325	is no longer valid, because this may result in the pointer being accessed later
				1326	and pointing to a wrong location, one that was reallocated for something else
				1327	and causing all sort of nastiness like crashes or memory corruption. Moreover,
				1328	thanks to the memory pools, it is extremely likely that a just released pointer
				1329	will be reassigned to a similar object with comparable values (flags etc) at
				1330	the same positions, making tests apparently succeed for a while. Some such bugs
				1331	have gone undetected for several years.
				1332
				1333	The rule is pretty simple:
				1334
				1335	+-----------------------------------------------------------------+
				1336	\| NO REACHABLE POINTER MAY EVER POINT TO AN UNREACHABLE LOCATION. \|
				1337	+-----------------------------------------------------------------+
				1338
				1339	By "reachable pointer", here we mean a pointer that is accessible from a
				1340	reachable structure or a global variable. This means that any pointer found
				1341	anywhere in any structure in the code may always be dereferenced. This can
				1342	seem obvious but this is not always enforced.
				1343
				1344	This means that when freeing an area, the pointer that was used to find that
				1345	area must be overwritten with NULL, and all other such pointers must as well
				1346	if any. It is one case where one can find more convenient to write the NULL
				1347	on the same line as the call to free() to make things easier to check. Be
				1348	careful about any potential "if" when doing this.
				1349
				1350	Wrong use of free :
				1351
				1352	\| static inline int blah_recycle(struct blah *blah)
				1353	\| {
				1354	\| free(blah->str1);
				1355	\| free(blah->str2);
				1356	\| }
				1357
				1358	Correct use of free :
				1359
				1360	\| static inline int blah_recycle(struct blah *blah)
				1361	\| {
				1362	\| free(blah->str1); blah->str1 = NULL;
				1363	\| free(blah->str2); blah->str2 = NULL;
				1364	\| }
				1365
				1366	Sometimes the code doesn't permit this to be done. It is not a matter of code
				1367	but a matter of architecture. Example:
				1368
				1369	Initialization:
				1370
				1371	\| static struct foo *foo_init()
				1372	\| {
				1373	\| struct foo *foo;
				1374	\| struct bar *bar;
				1375	\|
				1376	\| foo = pool_alloc(foo_head);
				1377	\| bar = pool_alloc(bar_head);
				1378	\| if (!foo \|\| !bar)
				1379	\| goto fail;
				1380	\| foo->bar = bar;
				1381	\| ...
				1382	\| }
				1383
				1384	Scheduled task 1:
				1385
				1386	\| static inline int foo_timeout(struct foo *foo)
				1387	\| {
				1388	\| free(foo->bar);
				1389	\| free(foo);
				1390	\| }
				1391
				1392	Scheduled task 2:
				1393
				1394	\| static inline int bar_timeout(struct bar *bar)
				1395	\| {
				1396	\| free(bar);
				1397	\| }
				1398
				1399	Here it's obvious that if "bar" times out, it will be freed but its pointer in
				1400	"foo" will remain here, and if foo times out just after, it will lead to a
				1401	double free. Or worse, if another instance allocates a pointer and receives bar
				1402	again, when foo times out, it will release the old bar pointer which now points
				1403	to a new object, and the code using that new object will crash much later, or
				1404	even worse, will share the same area as yet another instance having inherited
				1405	that pointer again.
				1406
				1407	Here this simply means that the data model is wrong. If bar may be freed alone,
				1408	it MUST have a pointer to foo so that bar->foo->bar is set to NULL to let foo
				1409	finish its life peacefully. This also means that the code dealing with foo must
				1410	be written in a way to support bar's leaving.
				1411
				1412
				1413	13.3) Don't abuse pointers as booleans
				1414	--------------------------------------
				1415
				1416	Given the common use of a pointer to know if the area it points to is valid,
				1417	there is a big incentive in using such pointers as booleans to describe
				1418	something a bit higher level, like "is the user authenticated". This must not
				1419	be done. The reason stems from the points above. Initially this perfectly
				1420	matches and the code is simple. Then later some extra options need to be added,
				1421	and more pointers are needed, all allocated together. At this point they all
				1422	start to become their own booleans, supposedly always equivalent, but if that
				1423	were true, they would be a single area with a single pointer. And things start
				1424	to fall apart with some code areas relying on one pointer for the condition and
				1425	other ones relying on other pointers. Pointers may be substituted with "flags"
				1426	or "present in list" etc here. And from this point, things quickly degrade with
				1427	pointers needing to remain set even if pointing to wrong areas, just for the
				1428	sake of not being NULL and not breaking some assumptions. At this point the
				1429	bugs are already there and the code is not trustable anymore.
				1430
				1431	The only way to avoid this is to strictly respect this rule: pointers do not
				1432	represent a functionality but a storage area. Of course it is very frequent to
				1433	consider that if an optional string is not set, a feature is not enabled. This
				1434	can be fine to some extents. But as soon as any slightest condition is added
				1435	anywhere into the mux, the code relying on the pointer must be replaced with
				1436	something else so that the pointer may live its own life and be released (and
				1437	reset) earlier if needed.
				1438
				1439
				1440	13.4) Mixing const and non-const
				1441	--------------------------------
				1442
				1443	Something often encountered, especially when assembling error messages, is
				1444	functions that collect strings, assemble them into larger messages and free
				1445	everything. The problem here is that if strings are defined as variables, there
				1446	will rightfully be build warnings when reporting string constants such as bare
				1447	keywords or messages, and if strings are defined as constants, it is not
				1448	possible to free them. The temptation is sometimes huge to force some free()
				1449	calls on casted strings. Do not do that! It will inevitably lead to someone
				1450	getting caught passing a constant string that will make the process crash (if
				1451	lucky). Document the expectations, indicate that all arguments must be freeable
				1452	and that the caller must be capable of strdup(), and make your function support
Ilya Shipitsin	2272d8a	2020-12-21 01:22:40 +0500	[diff] [blame]	1453	NULLs and document it (so that callers can deal with a failing strdup() on
Willy Tarreau	02ec3fe	2020-11-18 19:53:45 +0100	[diff] [blame]	1454	allocation error).
				1455
				1456	One valid alternative is to use a secondary channel to indicate whether the
				1457	message may be freed or not. A flag in a complex structure can be used for this
				1458	purpose, for example. If you are certain that your strings are aligned to a
				1459	certain number of bytes, it can be possible to instrument the code to use the
				1460	lowest bit to indicate the need to free (e.g. by always adding one to every
				1461	const string). But such a solution will require good enough instrumentation so
				1462	that it doesn't constitute a new set of traps.
				1463
				1464
				1465	13.5) No pointer casts
				1466	----------------------
				1467
				1468	Except in rare occasions caused by legacy APIs (e.g. sockaddr) or special cases
				1469	which explicitly require a form of aliasing, there is no valid reason for
				1470	casting pointers, and usually this is used to hide other problems that will
				1471	strike later. The only suitable type of cast is the cast from the generic void*
				1472	used to store a context for example. But in C, there is no need to cast to nor
				1473	from void*, so this is not required. However those coming from C++ tend to be
				1474	used to this practice, and others argue that it makes the intent more visible.
				1475
				1476	As a corollary, do not abuse void. Placing void everywhere to avoid casting
				1477	is a bad practice as well. The use of void* is only for generic functions or
				1478	structures which do not have a limited set of types supported. When only a few
				1479	types are supported, generally their type can be passed using a side channel,
				1480	and the void* can be turned into a union that makes the code more readable and
				1481	more verifiable.
				1482
				1483	An alternative in haproxy is to use a pointer to an obj_type enum. Usually it
				1484	is placed at the beginning of a structure. It works like a void* except that
				1485	the type is read directly from the object. This is convenient when a small set
				1486	of remote objects may be attached to another one because a single of them will
				1487	match a non-null pointer (e.g. a connection or an applet).
				1488
				1489	Example:
				1490
				1491	\| static inline int blah_free(struct blah *blah)
				1492	\| {
				1493	\| /* only one of them (at most) will not be null */
				1494	\| pool_free(pool_head_connection, objt_conn(blah->target));
				1495	\| pool_free(pool_head_appctx, objt_appctx(blah->target));
				1496	\| pool_free(pool_head_stream, objt_stream(blah->target));
				1497	\| blah->target = NULL;
				1498	\| }
				1499
				1500
				1501	13.6) Extreme caution when using non-canonical pointers
				1502	-------------------------------------------------------
				1503
				1504	It can be particularly convenient to embed some logic in the unused bits or
				1505	code points of a pointer. Indeed, when it is known by design that a given
				1506	pointer will always follow a certain alignment, a few lower bits will always
				1507	remain zero, and as such may be used as optional flags. For example, the ebtree
				1508	code uses the lowest bit to differentiate left/right attachments to the parent
				1509	and node/leaf in branches. It is also known that values very close to NULL will
				1510	never represent a valid pointer, and the thread-safe MT_LIST code uses this to
				1511	lock visited pointers.
				1512
				1513	There are a few rules to respect in order to do this:
				1514	- the deviations from the canonical pointers must be exhaustively documented
				1515	where the pointer type is defined, and the whole control logic with its
				1516	implications and possible and impossible cases must be enumerated as well ;
				1517
				1518	- make sure that the operations will work on every supported platform, which
				1519	includes 32-bit platforms where structures may be aligned on as little as
				1520	32-bit. 32-bit alignment leaves only two LSB available. When doing so, make
				1521	sure the target structures are not labelled with the "packed" attribute, or
				1522	that they're always perfectly aligned. All platforms where haproxy runs
				1523	have their NULL pointer mapped at address zero, and use page sizes at least
				1524	4096 bytes large, leaving all values form 1 to 4095 unused. Anything
				1525	outside of this is unsafe. In particular, never use negative numbers to
				1526	represent a supposedly invalid address. On 32-bits platforms it will often
				1527	correspond to a system address or a special page. Always try a variety of
				1528	platforms when doing such a thing.
				1529
				1530	- the code must not use such pointers as booleans anymore even if it is known
				1531	that "it works" because that keeps a doubt open for the reviewer. Only the
				1532	canonical pointer may be tested. There can be a rare exception which is if
				1533	this is on a critical path where severe performance degradation may result
				1534	from this. In this case, each of the checks must be duly documented and
				1535	the equivalent BUG_ON() instances must be placed to prove the claim.
				1536
				1537	- some inline functions (or macros) must be used to turn the pointers to/from
				1538	their canonical form so that the regular code doesn't have to see the
				1539	operations, and so that the representation may be easily adjusted in the
				1540	future. A few comments indicating to a human how to turn a pointer back and
				1541	forth from inside a debugger will be appreciated, as macros often end up
				1542	not being trivially readable nor directly usable.
				1543
				1544	- do not use int types to cast the pointers, this will only work on 32-bit
				1545	platforms. While "long" is usually fine, it is not recommended anymore due
				1546	to the Windows platform being LLP64 and having it set to 32 bits. And
				1547	"long long" isn't good either for always being 64 bits. More suitable types
				1548	are ptrdiff_t or size_t. Note that while those were not available everywhere
				1549	in the early days of hparoxy, size_t is now heavily used and known to work
				1550	everywhere. And do not perform the operations on the pointers, only on the
				1551	integer types (and cast back again). Some compilers such as gcc are
Ilya Shipitsin	2272d8a	2020-12-21 01:22:40 +0500	[diff] [blame]	1552	extremely picky about this and will often emit wrong code when they see
Willy Tarreau	02ec3fe	2020-11-18 19:53:45 +0100	[diff] [blame]	1553	equality conditions they believe is impossible and decide to optimize them
				1554	away.
				1555
				1556
				1557	13.7) Pointers in unions
				1558	------------------------
				1559
				1560	Before placing multiple aliasing pointers inside a same union, there MUST be a
				1561	SINGLE well-defined way to figure them out from each other. It may be thanks to
				1562	a side-channel information (as done in the samples with a defined type), it may
				1563	be based on in-area information (as done using obj_types), or any other trusted
				1564	solution. In any case, if pointers are mixed with any other type (integer or
				1565	float) in a union, there must be a very simple way to distinguish them, and not
				1566	a platform-dependent nor compiler-dependent one.