Blame - doc/coding-style.txt - haproxy

blob: 297e67c88d1c31fb04443be1232999ad5e61687b [file] [log] [blame]

Willy Tarreau	7f051b3	2011-12-30 17:16:22 +0100	[diff] [blame]	1	2011/12/30 - HAProxy coding style - Willy Tarreau <w@1wt.eu>
				2	------------------------------------------------------------
				3
				4	A number of contributors are often embarrassed with coding style issues, they
				5	don't always know if they're doing it right, especially since the coding style
				6	has elvoved along the years. What is explained here is not necessarily what is
				7	applied in the code, but new code should as much as possible conform to this
				8	style. Coding style fixes happen when code is replaced. It is useless to send
				9	patches to fix coding style only, they will be rejected, unless they belong to
				10	a patch series which needs these fixes prior to get code changes. Also, please
				11	avoid fixing coding style in the same patches as functional changes, they make
				12	code review harder.
				13
				14	When modifying a file, you must accept the terms of the license of this file
				15	which is recalled at the top of the file, or is explained in the LICENSE file,
				16	or if not stated, defaults to LGPL version 2.1 or later for files in the
				17	'include' directory, and GPL version 2 or later for all other files.
				18
				19	When adding a new file, you must add a copyright banner at the top of the
				20	file with your real name, e-mail address and a reminder of the license.
				21	Contributions under incompatible licenses or too restrictive licenses might
				22	get rejected. If in doubt, please apply the principle above for existing files.
				23
				24	All code examples below will intentionally be prefixed with " \| " to mark
				25	where the code aligns with the first column, and tabs in this document will be
				26	represented as a series of 8 spaces so that it displays the same everywhere.
				27
				28
				29	1) Indentation and alignment
				30	----------------------------
				31
				32	1.1) Indentation
				33	----------------
				34
				35	Indentation and alignment are two completely different things that people often
				36	get wrong. Indentation is used to mark a sub-level in the code. A sub-level
				37	means that a block is executed in the context of another block (eg: a function
				38	or a condition) :
				39
				40	\| main(int argc, char **argv)
				41	\| {
				42	\| int i;
				43	\|
				44	\| if (argc < 2)
				45	\| exit(1);
				46	\| }
				47
				48	In the example above, the code belongs to the main() function and the exit()
				49	call belongs to the if statement. Indentation is made with tabs (\t, ASCII 9),
				50	which allows any developer to configure their preferred editor to use their
				51	own tab size and to still get the text properly indented. Exactly one tab is
				52	used per sub-level. Tabs may only appear at the beginning of a line or after
				53	another tab. It is illegal to put a tab after some text, as it mangles displays
				54	in a different manner for different users (particularly when used to align
				55	comments or values after a #define). If you're tempted to put a tab after some
				56	text, then you're doing it wrong and you need alignment instead (see below).
				57
				58	Note that there are places where the code was not properly indented in the
				59	past. In order to view it correctly, you may have to set your tab size to 8
				60	characters.
				61
				62
				63	1.2) Alignment
				64	--------------
				65
				66	Alignment is used to continue a line in a way to makes things easier to group
				67	together. By definition, alignment is character-based, so it uses spaces. Tabs
				68	would not work because for one tab there would not be as many characters on all
				69	displays. For instance, the arguments in a function declaration may be broken
				70	into multiple lines using alignment spaces :
				71
				72	\| int http_header_match2(const char hdr, const char end,
				73	\| const char *name, int len)
				74	\| {
				75	\| ...
				76	\| }
				77
				78	In this example, the "const char *name" part is aligned with the first
				79	character of the group it belongs to (list of function arguments). Placing it
				80	here makes it obvious that it's one of the function's arguments. Multiple lines
				81	are easy to handle this way. This is very common with long conditions too :
				82
				83	\| if ((len < eol - sol) &&
				84	\| (sol[len] == ':') &&
				85	\| (strncasecmp(sol, name, len) == 0)) {
				86	\| ctx->del = len;
				87	\| }
				88
				89	If we take again the example above marking tabs with "[-Tabs-]" and spaces
				90	with "#", we get this :
				91
				92	\| [-Tabs-]if ((len < eol - sol) &&
				93	\| [-Tabs-]####(sol[len] == ':') &&
				94	\| [-Tabs-]####(strncasecmp(sol, name, len) == 0)) {
				95	\| [-Tabs-][-Tabs-]ctx->del = len;
				96	\| [-Tabs-]}
				97
				98	It is worth noting that some editors tend to confuse indentations and aligment.
				99	Emacs is notoriously known for this brokenness, and is responsible for almost
				100	all of the alignment mess. The reason is that Emacs only counts spaces, tries
				101	to fill as many as possible with tabs and completes with spaces. Once you know
				102	it, you just have to be careful, as alignment is not used much, so generally it
				103	is just a matter of replacing the last tab with 8 spaces when this happens.
				104
				105	Indentation should be used everywhere there is a block or an opening brace. It
				106	is not possible to have two consecutive closing braces on the same column, it
				107	means that the innermost was not indented.
				108
				109	Right :
				110
				111	\| main(int argc, char **argv)
				112	\| {
				113	\| if (argc > 1) {
				114	\| printf("Hello\n");
				115	\| }
				116	\| exit(0);
				117	\| }
				118
				119	Wrong :
				120
				121	\| main(int argc, char **argv)
				122	\| {
				123	\| if (argc > 1) {
				124	\| printf("Hello\n");
				125	\| }
				126	\| exit(0);
				127	\| }
				128
				129	A special case applies to switch/case statements. Due to my editor's settings,
				130	I've been used to align "case" with "switch" and to find it somewhat logical
				131	since each of the "case" statements opens a sublevel belonging to the "switch"
				132	statement. But indenting "case" after "switch" is accepted too. However in any
				133	case, whatever follows the "case" statement must be indented, whether or not it
				134	contains braces :
				135
				136	\| switch (*arg) {
				137	\| case 'A': {
				138	\| int i;
				139	\| for (i = 0; i < 10; i++)
				140	\| printf("Please stop pressing 'A'!\n");
				141	\| break;
				142	\| }
				143	\| case 'B':
				144	\| printf("You pressed 'B'\n");
				145	\| break;
				146	\| case 'C':
				147	\| case 'D':
				148	\| printf("You pressed 'C' or 'D'\n");
				149	\| break;
				150	\| default:
				151	\| printf("I don't know what you pressed\n");
				152	\| }
				153
				154
				155	2) Braces
				156	---------
				157
				158	Braces are used to delimit multiple-instruction blocks. In general it is
				159	preferred to avoid braces around single-instruction blocks as it reduces the
				160	number of lines :
				161
				162	Right :
				163
				164	\| if (argc >= 2)
				165	\| exit(0);
				166
				167	Wrong :
				168
				169	\| if (argc >= 2) {
				170	\| exit(0);
				171	\| }
				172
				173	But it is not that strict, it really depends on the context. It happens from
				174	time to time that single-instruction blocks are enclosed within braces because
				175	it makes the code more symmetrical, or more readable. Example :
				176
				177	\| if (argc < 2) {
				178	\| printf("Missing argument\n");
				179	\| exit(1);
				180	\| } else {
				181	\| exit(0);
				182	\| }
				183
				184	Braces are always needed to declare a function. A function's opening brace must
				185	be placed at the beginning of the next line :
				186
				187	Right :
				188
				189	\| int main(int argc, char **argv)
				190	\| {
				191	\| exit(0);
				192	\| }
				193
				194	Wrong :
				195
				196	\| int main(int argc, char **argv) {
				197	\| exit(0);
				198	\| }
				199
				200	Note that a large portion of the code still does not conforms to this rule, as
				201	it took years to me to adapt to this more common standard which I now tend to
				202	prefer, as it avoids visual confusion when function declarations are broken on
				203	multiple lines :
				204
				205	Right :
				206
				207	\| int foo(const char hdr, const char end,
				208	\| const char name, const char err,
				209	\| int len)
				210	\| {
				211	\| int i;
				212
				213	Wrong :
				214
				215	\| int foo(const char hdr, const char end,
				216	\| const char name, const char err,
				217	\| int len) {
				218	\| int i;
				219
				220	Braces should always be used where there might be an ambiguity with the code
				221	later. The most common example is the stacked "if" statement where an "else"
				222	may be added later at the wrong place breaking the code, but it also happens
				223	with comments or long arguments in function calls. In general, if a block is
				224	more than one line long, it should use braces.
				225
				226	Dangerous code waiting of a victim :
				227
				228	\| if (argc < 2)
				229	\| /* ret must not be negative here */
				230	\| if (ret < 0)
				231	\| return -1;
				232
				233	Wrong change :
				234
				235	\| if (argc < 2)
				236	\| /* ret must not be negative here */
				237	\| if (ret < 0)
				238	\| return -1;
				239	\| else
				240	\| return 0;
				241
				242	It will do this instead of what your eye seems to tell you :
				243
				244	\| if (argc < 2)
				245	\| /* ret must not be negative here */
				246	\| if (ret < 0)
				247	\| return -1;
				248	\| else
				249	\| return 0;
				250
				251	Right :
				252
				253	\| if (argc < 2) {
				254	\| /* ret must not be negative here */
				255	\| if (ret < 0)
				256	\| return -1;
				257	\| }
				258	\| else
				259	\| return 0;
				260
				261	Similarly dangerous example :
				262
				263	\| if (ret < 0)
				264	\| /* ret must not be negative here */
				265	\| complain();
				266	\| init();
				267
				268	Wrong change to silent the annoying message :
				269
				270	\| if (ret < 0)
				271	\| /* ret must not be negative here */
				272	\| //complain();
				273	\| init();
				274
				275	... which in fact means :
				276
				277	\| if (ret < 0)
				278	\| init();
				279
				280
				281	3) Breaking lines
				282	-----------------
				283
				284	There is no strict rule for line breaking. Some files try to stick to the 80
				285	column limit, but given that various people use various tab sizes, it does not
				286	make much sense. Also, code is sometimes easier to read with less lines, as it
				287	represents less surface on the screen (since each new line adds its tabs and
				288	spaces). The rule is to stick to the average line length of other lines. If you
				289	are working in a file which fits in 80 columns, try to keep this goal in mind.
				290	If you're in a function with 120-chars lines, there is no reason to add many
				291	short lines, so you can make longer lines.
				292
				293	In general, opening a new block should lead to a new line. Similarly, multiple
				294	instructions should be avoided on the same line. But some constructs make it
				295	more readable when those are perfectly aligned :
				296
				297	A copy-paste bug in the following construct will be easier to spot :
				298
				299	\| if (omult % idiv == 0) { omult /= idiv; idiv = 1; }
				300	\| if (idiv % omult == 0) { idiv /= omult; omult = 1; }
				301	\| if (imult % odiv == 0) { imult /= odiv; odiv = 1; }
				302	\| if (odiv % imult == 0) { odiv /= imult; imult = 1; }
				303
				304	than in this one :
				305
				306	\| if (omult % idiv == 0) {
				307	\| omult /= idiv;
				308	\| idiv = 1;
				309	\| }
				310	\| if (idiv % omult == 0) {
				311	\| idiv /= omult;
				312	\| omult = 1;
				313	\| }
				314	\| if (imult % odiv == 0) {
				315	\| imult /= odiv;
				316	\| odiv = 1;
				317	\| }
				318	\| if (odiv % imult == 0) {
				319	\| odiv /= imult;
				320	\| imult = 1;
				321	\| }
				322
				323	What is important is not to mix styles. For instance there is nothing wrong
				324	with having many one-line "case" statements as long as most of them are this
				325	short like below :
				326
				327	\| switch (*arg) {
				328	\| case 'A': ret = 1; break;
				329	\| case 'B': ret = 2; break;
				330	\| case 'C': ret = 4; break;
				331	\| case 'D': ret = 8; break;
				332	\| default : ret = 0; break;
				333	\| }
				334
				335	Otherwise, prefer to have the "case" statement on its own line as in the
				336	example in section 1.2 about alignment. In any case, avoid to stack multiple
				337	control statements on the same line, so that it will never be the needed to
				338	add two tab levels at once :
				339
				340	Right :
				341
				342	\| switch (*arg) {
				343	\| case 'A':
				344	\| if (ret < 0)
				345	\| ret = 1;
				346	\| break;
				347	\| default : ret = 0; break;
				348	\| }
				349
				350	Wrong :
				351
				352	\| switch (*arg) {
				353	\| case 'A': if (ret < 0)
				354	\| ret = 1;
				355	\| break;
				356	\| default : ret = 0; break;
				357	\| }
				358
				359	Right :
				360
				361	\| if (argc < 2)
				362	\| if (ret < 0)
				363	\| return -1;
				364
				365	or Right :
				366
				367	\| if (argc < 2)
				368	\| if (ret < 0) return -1;
				369
				370	but Wrong :
				371
				372	\| if (argc < 2) if (ret < 0) return -1;
				373
				374
				375	When complex conditions or expressions are broken into multiple lines, please
				376	do ensure that alignment is perfectly appropriate, and group all main operators
				377	on the same side (which you're free to choose as long as it does not change for
				378	every block. Putting binary operators on the right side is preferred as it does
				379	not mangle with alignment but various people have their preferences.
				380
				381	Right :
				382
				383	\| if ((txn->flags & TX_NOT_FIRST) &&
				384	\| ((req->flags & BF_FULL) \|\|
				385	\| req->r < req->lr \|\|
				386	\| req->r > req->data + req->size - global.tune.maxrewrite)) {
				387	\| return 0;
				388	\| }
				389
				390	Right :
				391
				392	\| if ((txn->flags & TX_NOT_FIRST)
				393	\| && ((req->flags & BF_FULL)
				394	\| \|\| req->r < req->lr
				395	\| \|\| req->r > req->data + req->size - global.tune.maxrewrite)) {
				396	\| return 0;
				397	\| }
				398
				399	Wrong :
				400
				401	\| if ((txn->flags & TX_NOT_FIRST) &&
				402	\| ((req->flags & BF_FULL) \|\|
				403	\| req->r < req->lr
				404	\| \|\| req->r > req->data + req->size - global.tune.maxrewrite)) {
				405	\| return 0;
				406	\| }
				407
				408	If it makes the result more readable, parenthesis may even be closed on their
				409	own line in order to align with the opening one. Note that should normally not
				410	be needed because such code would be too complex to be digged into.
				411
				412	The "else" statement may either be merged with the closing "if" brace or lie on
				413	its own line. The later is preferred but it adds one extra line to each control
				414	block which is annoying in short ones. However, if the "else" is followed by an
				415	"if", then it should really be on its own line and the rest of the if/else
				416	blocks must follow the same style.
				417
				418	Right :
				419
				420	\| if (a < b) {
				421	\| return a;
				422	\| }
				423	\| else {
				424	\| return b;
				425	\| }
				426
				427	Right :
				428
				429	\| if (a < b) {
				430	\| return a;
				431	\| } else {
				432	\| return b;
				433	\| }
				434
				435	Right :
				436
				437	\| if (a < b) {
				438	\| return a;
				439	\| }
				440	\| else if (a != b) {
				441	\| return b;
				442	\| }
				443	\| else {
				444	\| return 0;
				445	\| }
				446
				447	Wrong :
				448
				449	\| if (a < b) {
				450	\| return a;
				451	\| } else if (a != b) {
				452	\| return b;
				453	\| } else {
				454	\| return 0;
				455	\| }
				456
				457	Wrong :
				458
				459	\| if (a < b) {
				460	\| return a;
				461	\| }
				462	\| else if (a != b) {
				463	\| return b;
				464	\| } else {
				465	\| return 0;
				466	\| }
				467
				468
				469	4) Spacing
				470	----------
				471
				472	Correctly spacing code is very important. When you have to spot a bug at 3am,
				473	you need it to be clear. When you expect other people to review your code, you
				474	want it to be clear and don't want them to get nervous when trying to find what
				475	you did.
				476
				477	Always place spaces around all binary or ternary operators, commas, as well as
				478	after semi-colons and opening braces if the line continues :
				479
				480	Right :
				481
				482	\| int ret = 0;
				483	\| /* if (x >> 4) { x >>= 4; ret += 4; } */
				484	\| ret += (x >> 4) ? (x >>= 4, 4) : 0;
				485	\| val = ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
				486
				487	Wrong :
				488
				489	\| int ret=0;
				490	\| /* if (x>>4) {x>>=4;ret+=4;} */
				491	\| ret+=(x>>4)?(x>>=4,4):0;
				492	\| val=ret+((0xFFFFAA50U>>(x<<1))&3)+1;
				493
				494	Never place spaces after unary operators (&, *, -, !, ~, ++, --) nor cast, as
				495	they might be confused with they binary counterpart, nor before commas or
				496	semicolons :
				497
				498	Right :
				499
				500	\| bit = !!(~len++ ^ -(unsigned char)*x);
				501
				502	Wrong :
				503
				504	\| bit = ! ! (~len++ ^ - (unsigned char) * x) ;
				505
				506	Note that "sizeof" is a unary operator which is sometimes considered as a
				507	langage keyword, but in no case it is a function. It does not require
				508	parenthesis so it is sometimes followed by spaces and sometimes not when
				509	there are no parenthesis. Most people do not really care as long as what
				510	is written is unambiguous.
				511
				512	Braces opening a block must be preceeded by one space unless the brace is
				513	placed on the first column :
				514
				515	Right :
				516
				517	\| if (argc < 2) {
				518	\| }
				519
				520	Wrong :
				521
				522	\| if (argc < 2){
				523	\| }
				524
				525	Do not add unneeded spaces inside parenthesis, they just make the code less
				526	readable.
				527
				528	Right :
				529
				530	\| if (x < 4 && (!y \|\| !z))
				531	\| break;
				532
				533	Wrong :
				534
				535	\| if ( x < 4 && ( !y \|\| !z ) )
				536	\| break;
				537
				538	Language keywords must all be followed by a space. This is true for control
				539	statements (do, for, while, if, else, return, switch, case), and for types
				540	(int, char, unsigned). As an exception, the last type in a cast does not take
				541	a space before the closing parenthesis). The "default" statement in a "switch"
				542	construct is generally just followed by the colon. However the colon after a
				543	"case" or "default" statement must be followed by a space.
				544
				545	Right :
				546
				547	\| if (nbargs < 2) {
				548	\| printf("Missing arg at %c\n", (char )ptr);
				549	\| for (i = 0; i < 10; i++) beep();
				550	\| return 0;
				551	\| }
				552	\| switch (*arg) {
				553
				554	Wrong :
				555
				556	\| if(nbargs < 2){
				557	\| printf("Missing arg at %c\n", (char)ptr);
				558	\| for(i = 0; i < 10; i++)beep();
				559	\| return 0;
				560	\| }
				561	\| switch(*arg) {
				562
				563	Function calls are different, the opening parenthesis is always coupled to the
				564	function name without any space. But spaces are still needed after commas :
				565
				566	Right :
				567
				568	\| if (!init(argc, argv))
				569	\| exit(1);
				570
				571	Wrong :
				572
				573	\| if (!init (argc,argv))
				574	\| exit(1);
				575
				576
				577	5) Excess or lack of parenthesis
				578	--------------------------------
				579
				580	Sometimes there are too many parenthesis in some formulas, sometimes there are
				581	too few. There are a few rules of thumb for this. The first one is to respect
				582	the compiler's advice. If it emits a warning and asks for more parenthesis to
				583	avoid confusion, follow the advice at least to shut the warning. For instance,
				584	the code below is quite ambiguous due to its alignment :
				585
				586	\| if (var1 < 2 \|\| var2 < 2 &&
				587	\| var3 != var4) {
				588	\| /* fail */
				589	\| return -3;
				590	\| }
				591
				592	Note that this code does :
				593
				594	\| if (var1 < 2 \|\| (var2 < 2 && var3 != var4)) {
				595	\| /* fail */
				596	\| return -3;
				597	\| }
				598
				599	But maybe the author meant :
				600
				601	\| if ((var1 < 2 \|\| var2 < 2) && var3 != var4) {
				602	\| /* fail */
				603	\| return -3;
				604	\| }
				605
				606	A second rule to put parenthesis is that people don't always know operators
				607	precedence too well. Most often they have no issue with operators of the same
				608	category (eg: booleans, integers, bit manipulation, assignment) but once these
				609	operators are mixed, it causes them all sort of issues. In this case, it is
				610	wise to use parenthesis to avoid errors. One common error concerns the bit
				611	shift operators because they're used to replace multiplies and divides but
				612	don't have the same precedence :
				613
				614	The expression :
				615
				616	\| x = y * 16 + 5;
				617
				618	becomes :
				619
				620	\| x = y << 4 + 5;
				621
				622	which is wrong because it is equivalent to :
				623
				624	\| x = y << (4 + 5);
				625
				626	while the following was desired instead :
				627
				628	\| x = (y << 4) + 5;
				629
				630	It is generally fine to write boolean expressions based on comparisons without
				631	any parenthesis. But on top of that, integer expressions and assignments should
				632	then be protected. For instance, there is an error in the expression below
				633	which should be safely rewritten :
				634
				635	Wrong :
				636
				637	\| if (var1 > 2 && var1 < 10 \|\|
				638	\| var1 > 2 + 256 && var2 < 10 + 256 \|\|
				639	\| var1 > 2 + 1 << 16 && var2 < 10 + 2 << 16)
				640	\| return 1;
				641
				642	Right (may remove a few parenthesis depending on taste) :
				643
				644	\| if ((var1 > 2 && var1 < 10) \|\|
				645	\| (var1 > (2 + 256) && var2 < (10 + 256)) \|\|
				646	\| (var1 > (2 + (1 << 16)) && var2 < (10 + (1 << 16))))
				647	\| return 1;
				648
				649	The "return" statement is not a function, so it takes no argument. It is a
				650	control statement which is followed by the expression to be returned. It does
				651	not need to be followed by parenthesis :
				652
				653	Wrong :
				654
				655	\| int ret0()
				656	\| {
				657	\| return(0);
				658	\| }
				659
				660	Right :
				661
				662	\| int ret0()
				663	\| {
				664	\| return 0;
				665	\| }
				666
				667	Parenthesisis are also found in type casts. Type casting should be avoided as
				668	much as possible, especially when it concerns pointer types. Casting a pointer
				669	disables the compiler's type checking and is the best way to get caught doing
				670	wrong things with data not the size you expect. If you need to manipulate
				671	multiple data types, you can use a union instead. If the union is really not
				672	convenient and casts are easier, then try to isolate them as much as possible,
				673	for instance when initializing function arguments or in another function. Not
				674	proceeding this way causes huge risks of not using the proper pointer without
				675	any notification, which is especially true during copy-pastes.
				676
				677	Wrong :
				678
				679	\| void check_private_data(void arg1, void *arg2)
				680	\| {
				681	\| char *area;
				682	\|
				683	\| if ((int )arg1 > 1000)
				684	\| return NULL;
				685	\| if (memcmp((const char )arg2, "send(", 5) != 0))
				686	\| return NULL;
				687	\| area = malloc((int )arg1);
				688	\| if (!area)
				689	\| return NULL;
				690	\| memcpy(area, (const char )arg2 + 5, (int )arg1);
				691	\| return area;
				692	\| }
				693
				694	Right :
				695
				696	\| void check_private_data(void arg1, void *arg2)
				697	\| {
				698	\| char *area;
				699	\| int len = (int )arg1;
				700	\| const char *msg = arg2;
				701	\|
				702	\| if (len > 1000)
				703	\| return NULL;
				704	\| if (memcmp(msg, "send(", 5) != 0)
				705	\| return NULL;
				706	\| area = malloc(len);
				707	\| if (!area)
				708	\| return NULL;
				709	\| memcpy(area, msg + 5, len);
				710	\| return area;
				711	\| }
				712
				713
				714	6) Ambiguous comparisons with zero or NULL
				715	------------------------------------------
				716
				717	In C, '0' has no type, or it has the type of the variable it is assigned to.
				718	Comparing a variable or a return value with zero means comparing with the
				719	representation of zero for this variable's type. For a boolean, zero is false.
				720	For a pointer, zero is NULL. Very often, to make things shorter, it is fine to
				721	use the '!' unary operator to compare with zero, as it is shorter and easier to
				722	remind or understand than a plain '0'. Since the '!' operator is read "not", it
				723	helps read code faster when what follows it makes sense as a boolean, and it is
				724	often much more appropriate than a comparison with zero which makes an equal
				725	sign appear at an undesirable place. For instance :
				726
				727	\| if (!isdigit(c) && !isspace(c))
				728	\| break;
				729
				730	is easier to understand than :
				731
				732	\| if (isdigit(c) == 0 && isspace(c) == 0)
				733	\| break;
				734
				735	For a char this "not" operator can be reminded as "no remaining char", and the
				736	absence of comparison to zero implies existence of the tested entity, hence the
				737	simple strcpy() implementation below which automatically stops once the last
				738	zero is copied :
				739
				740	\| void my_strcpy(char d, const char s)
				741	\| {
				742	\| while ((d++ = s++));
				743	\| }
				744
				745	Note the double parenthesis in order to avoid the compiler telling us it looks
				746	like an equality test.
				747
				748	For a string or more generally any pointer, this test may be understood as an
				749	existence test or a validity test, as the only pointer which will fail to
				750	validate equality is the NULL pointer :
				751
				752	\| area = malloc(1000);
				753	\| if (!area)
				754	\| return -1;
				755
				756	However sometimes it can fool the reader. For instance, strcmp() precisely is
				757	one of such functions whose return value can make one think the opposite due to
				758	its name which may be understood as "if strings compare...". Thus it is strongly
				759	recommended to perform an explicit comparison with zero in such a case, and it
				760	makes sense considering that the comparison's operator is the same that is
				761	wanted to compare the strings (note that current config parser lacks a lot in
				762	this regards) :
				763
				764	strcmp(a, b) == 0 <=> a == b
				765	strcmp(a, b) != 0 <=> a != b
				766	strcmp(a, b) < 0 <=> a < b
				767	strcmp(a, b) > 0 <=> a > b
				768
				769	Avoid this :
				770
				771	\| if (strcmp(arg, "test"))
				772	\| printf("this is not a test\n");
				773	\|
				774	\| if (!strcmp(arg, "test"))
				775	\| printf("this is a test\n");
				776
				777	Prefer this :
				778
				779	\| if (strcmp(arg, "test") != 0)
				780	\| printf("this is not a test\n");
				781	\|
				782	\| if (strcmp(arg, "test") == 0)
				783	\| printf("this is a test\n");
				784
				785
				786	7) System call returns
				787	----------------------
				788
				789	This is not directly a matter of coding style but more of bad habits. It is
				790	important to check for the correct value upon return of syscalls. The proper
				791	return code indicating an error is described in its man page. There is no
				792	reason to consider wider ranges than what is indicated. For instance, it is
				793	common to see such a thing :
				794
				795	\| if ((fd = open(file, O_RDONLY)) < 0)
				796	\| return -1;
				797
				798	This is wrong. The man page says that -1 is returned if an error occured. It
				799	does not suggest that any other negative value will be an error. It is possible
				800	that a few such issues have been left in existing code. They are bugs for which
				801	fixes are accepted, eventhough they're currently harmless since open() is not
				802	known for returning negative values at the moment.
				803
				804
				805	8) Declaring new types, names and values
				806	----------------------------------------
				807
				808	Please refrain from using "typedef" to declare new types, they only obfuscate
				809	the code. The reader never knows whether he's manipulating a scalar type or a
				810	struct. For instance it is not obvious why the following code fails to build :
				811
				812	\| int delay_expired(timer_t exp, timer_us_t now)
				813	\| {
				814	\| return now >= exp;
				815	\| }
				816
				817	With the types declared in another file this way :
				818
				819	\| typedef unsigned int timer_t;
				820	\| typedef struct timeval timer_us_t;
				821
				822	This cannot work because we're comparing a scalar with a struct, which does
				823	not make sense. Without a typedef, the function would have been written this
				824	way without any ambiguity and would not have failed :
				825
				826	\| int delay_expired(unsigned int exp, struct timeval *now)
				827	\| {
				828	\| return now >= exp->tv_sec;
				829	\| }
				830
				831	Declaring special values may be done using enums. Enums are a way to define
				832	structured integer values which are related to each other. They are perfectly
				833	suited for state machines. While the first element is always assigned the zero
				834	value, not everybody knows that, especially people working with multiple
				835	languages all the day. For this reason it is recommended to explicitly force
				836	the first value even if it's zero. The last element should be followed by a
				837	comma if it is planned that new elements might later be added, this will make
				838	later patches shorter. Conversely, if the last element is placed in order to
				839	get the number of possible values, it must not be followed by a comma and must
				840	be preceeded by a comment :
				841
				842	\| enum {
				843	\| first = 0,
				844	\| second,
				845	\| third,
				846	\| fourth,
				847	\| };
				848
				849
				850	\| enum {
				851	\| first = 0,
				852	\| second,
				853	\| third,
				854	\| fourth,
				855	\| /* nbvalues must always be placed last */
				856	\| nbvalues
				857	\| };
				858
				859	Structure names should be short enough not to mangle function declarations,
				860	and explicit enough to avoid confusion (which is the most important thing).
				861
				862	Wrong :
				863
				864	\| struct request_args { /* arguments on the query string */
				865	\| char *name;
				866	\| char *value;
				867	\| struct misc_args *next;
				868	\| };
				869
				870	Right :
				871
				872	\| struct qs_args { /* arguments on the query string */
				873	\| char *name;
				874	\| char *value;
				875	\| struct qs_args *next;
				876	\| }
				877
				878
				879	When declaring new functions or structures, please do not use CamelCase, which
				880	is a style where upper and lower case are mixed in a single word. It causes a
				881	lot of confusion when words are composed from acronyms, because it's hard to
				882	stick to a rule. For instance, a function designed to generate an ISN (initial
				883	sequence number) for a TCP/IP connection could be called :
				884
				885	- generateTcpipIsn()
				886	- generateTcpIpIsn()
				887	- generateTcpIpISN()
				888	- generateTCPIPISN()
				889	etc...
				890
				891	None is right, none is wrong, these are just preferences which might change
				892	along the code. Instead, please use an underscore to separate words. Lowercase
				893	is preferred for the words, but if acronyms are upcased it's not dramatic. The
				894	real advantage of this method is that it creates unambiguous levels even for
				895	short names.
				896
				897	Valid examples :
				898
				899	- generate_tcpip_isn()
				900	- generate_tcp_ip_isn()
				901	- generate_TCPIP_ISN()
				902	- generate_TCP_IP_ISN()
				903
				904	Another example is easy to understand when 3 arguments are involved in naming
				905	the function :
				906
				907	Wrong (naming conflict) :
				908
				909	\| /* returns A + B * C */
				910	\| int mulABC(int a, int b, int c)
				911	\| {
				912	\| return a + b * c;
				913	\| }
				914	\|
				915	\| /* returns (A + B) * C */
				916	\| int mulABC(int a, int b, int c)
				917	\| {
				918	\| return (a + b) * c;
				919	\| }
				920
				921	Right (unambiguous naming) :
				922
				923	\| /* returns A + B * C */
				924	\| int mul_a_bc(int a, int b, int c)
				925	\| {
				926	\| return a + b * c;
				927	\| }
				928	\|
				929	\| /* returns (A + B) * C */
				930	\| int mul_ab_c(int a, int b, int c)
				931	\| {
				932	\| return (a + b) * c;
				933	\| }
				934
				935	Whenever you manipulate pointers, try to declare them as "const", as it will
				936	save you from many accidental misuses and will only cause warnings to be
				937	emitted when there is a real risk. In the examples below, it is possible to
				938	call my_strcpy() with a const string only in the first declaration. Note that
				939	people who ignore "const" are often the ones who cast a lot and who complain
				940	from segfaults when using strtok() !
				941
				942	Right :
				943
				944	\| void my_strcpy(char d, const char s)
				945	\| {
				946	\| while ((d++ = s++));
				947	\| }
				948	\|
				949	\| void say_hello(char *dest)
				950	\| {
				951	\| my_strcpy(dest, "hello\n");
				952	\| }
				953
				954	Wrong :
				955
				956	\| void my_strcpy(char d, char s)
				957	\| {
				958	\| while ((d++ = s++));
				959	\| }
				960	\|
				961	\| void say_hello(char *dest)
				962	\| {
				963	\| my_strcpy(dest, "hello\n");
				964	\| }
				965
				966
				967	9) Getting macros right
				968	-----------------------
				969
				970	It is very common for macros to do the wrong thing when used in a way their
				971	author did not have in mind. For this reason, macros must always be named with
				972	uppercase letters only. This is the only way to catch the developer's eye when
				973	using them, so that he double-checks whether he's taking risks or not. First,
				974	macros must never ever be terminated by a semi-colon, or they will close the
				975	wrong block once in a while. For instance, the following will cause a build
				976	error before the "else" due to the double semi-colon :
				977
				978	Wrong :
				979
				980	\| #define WARN printf("warning\n");
				981	\| ...
				982	\| if (a < 0)
				983	\| WARN;
				984	\| else
				985	\| a--;
				986
				987	Right :
				988
				989	\| #define WARN printf("warning\n")
				990
				991	If multiple instructions are needed, then use a do { } while (0) block, which
				992	is the only construct which respects exactly the semantics of a single
				993	instruction :
				994
				995	\| #define WARN do { printf("warning\n"); log("warning\n"); } while (0)
				996	\| ...
				997	\|
				998	\| if (a < 0)
				999	\| WARN;
				1000	\| else
				1001	\| a--;
				1002
				1003	Second, do not put unprotected control statements in macros, they will
				1004	definitely cause bugs :
				1005
				1006	Wrong :
				1007
				1008	\| #define WARN if (verbose) printf("warning\n")
				1009	\| ...
				1010	\| if (a < 0)
				1011	\| WARN;
				1012	\| else
				1013	\| a--;
				1014
				1015	Which is equivalent to the undesired form below :
				1016
				1017	\| if (a < 0)
				1018	\| if (verbose)
				1019	\| printf("warning\n");
				1020	\| else
				1021	\| a--;
				1022
				1023	Right way to do it :
				1024
				1025	\| #define WARN do { if (verbose) printf("warning\n"); } while (0)
				1026	\| ...
				1027	\| if (a < 0)
				1028	\| WARN;
				1029	\| else
				1030	\| a--;
				1031
				1032	Which is equivalent to :
				1033
				1034	\| if (a < 0)
				1035	\| do { if (verbose) printf("warning\n"); } while (0);
				1036	\| else
				1037	\| a--;
				1038
				1039	Macro parameters must always be surrounded by parenthesis, and must never be
				1040	duplicated in the same macro unless explicitly stated. Also, macros must not be
				1041	defined with operators without surrounding parenthesis. The MIN/MAX macros are
				1042	a pretty common example of multiple misuses, but this happens as early as when
				1043	using bit masks. Most often, in case of any doubt, try to use inline functions
				1044	instead.
				1045
				1046	Wrong :
				1047
				1048	\| #define MIN(a, b) a < b ? a : b
				1049	\|
				1050	\| /* returns 2 * min(a,b) + 1 */
				1051	\| int double_min_p1(int a, int b)
				1052	\| {
				1053	\| return 2 * MIN(a, b) + 1;
				1054	\| }
				1055
				1056	What this will do :
				1057
				1058	\| int double_min_p1(int a, int b)
				1059	\| {
				1060	\| return 2 * a < b ? a : b + 1;
				1061	\| }
				1062
				1063	Which is equivalent to :
				1064
				1065	\| int double_min_p1(int a, int b)
				1066	\| {
				1067	\| return (2 * a) < b ? a : (b + 1);
				1068	\| }
				1069
				1070	The first thing to fix is to surround the macro definition with parenthesis to
				1071	avoid this mistake :
				1072
				1073	\| #define MIN(a, b) (a < b ? a : b)
				1074
				1075	But this is still not enough, as can be seen in this example :
				1076
				1077	\| /* compares either a or b with c */
				1078	\| int min_ab_c(int a, int b, int c)
				1079	\| {
				1080	\| return MIN(a ? a : b, c);
				1081	\| }
				1082
				1083	Which is equivalent to :
				1084
				1085	\| int min_ab_c(int a, int b, int c)
				1086	\| {
				1087	\| return (a ? a : b < c ? a ? a : b : c);
				1088	\| }
				1089
				1090	Which in turn means a totally different thing due to precedence :
				1091
				1092	\| int min_ab_c(int a, int b, int c)
				1093	\| {
				1094	\| return (a ? a : ((b < c) ? (a ? a : b) : c));
				1095	\| }
				1096
				1097	This can be fixed by surrounding each argument in the macro with parenthesis:
				1098
				1099	\| #define MIN(a, b) ((a) < (b) ? (a) : (b))
				1100
				1101	But this is still not enough, as can be seen in this example :
				1102
				1103	\| int min_ap1_b(int a, int b)
				1104	\| {
				1105	\| return MIN(++a, b);
				1106	\| }
				1107
				1108	Which is equivalent to :
				1109
				1110	\| int min_ap1_b(int a, int b)
				1111	\| {
				1112	\| return ((++a) < (b) ? (++a) : (b));
				1113	\| }
				1114
				1115	Again, this is wrong because "a" is incremented twice if below b. The only way
				1116	to fix this is to use a compound statement and to assign each argument exactly
				1117	once to a local variable of the same type :
				1118
				1119	\| #define MIN(a, b) ({ typeof(a) __a = (a); typeof(b) __b = (b); \
				1120	\| ((__a) < (__b) ? (__a) : (__b)); \
				1121	\| })
				1122
				1123	At this point, using static inline functions is much cleaner if a single type
				1124	is to be used :
				1125
				1126	\| static inline int min(int a, int b)
				1127	\| {
				1128	\| return a < b ? a : b;
				1129	\| }
				1130
				1131
				1132	10) Includes
				1133	------------
				1134
				1135	Includes are as much as possible listed in alphabetically ordered groups :
				1136	- the libc-standard includes (those without any path component)
				1137	- the includes more or less system-specific (sys/, netinet/, ...)
				1138	- includes from the local "common" subdirectory
				1139	- includes from the local "types" subdirectory
				1140	- includes from the local "proto" subdirectory
				1141
				1142	Each section is just visually delimited from the other ones using an empty
				1143	line. The two first ones above may be merged into a single section depending on
				1144	developer's preference. Please do not copy-paste include statements from other
				1145	files. Having too many includes significantly increases build time and makes it
				1146	hard to find which ones are needed later. Just include what you need and if
				1147	possible in alphabetical order so that when something is missing, it becomes
				1148	obvious where to look for it and where to add it.
				1149
				1150	All files should include <common/config.h> because this is where build options
				1151	are prepared.
				1152
				1153	Header files are split in two directories ("types" and "proto") depending on
				1154	what they provide. Types, structures, enums and #defines must go into the
				1155	"types" directory. Function prototypes and inlined functions must go into the
				1156	"proto" directory. This split is because of inlined functions which
				1157	cross-reference types from other files, which cause a chicken-and-egg problem
				1158	if the functions and types are declared at the same place.
				1159
				1160	All headers which do not depend on anything currently go to the "common"
				1161	subdirectory, but are equally well placed into the "proto" directory. It is
				1162	possible that one day the "common" directory will disappear.
				1163
				1164	Include files must be protected against multiple inclusion using the common
				1165	#ifndef/#define/#endif trick with a tag derived from the include file and its
				1166	location.
				1167
				1168
				1169	11) Comments
				1170	------------
				1171
				1172	Comments are preferably of the standard 'C' form using /* */. The C++ form "//"
				1173	are tolerated for very short comments (eg: a word or two) but should be avoided
				1174	as much as possible. Multi-line comments are made with each intermediate line
				1175	starting with a star aligned with the first one, as in this example :
				1176
				1177	\| /*
				1178	\| * This is a multi-line
				1179	\| * comment.
				1180	\| */
				1181
				1182	If multiple code lines need a short comment, try to align them so that you can
				1183	have multi-line sentences. This is rarely needed, only for really complex
				1184	constructs.
				1185
				1186	Do not tell what you're doing in comments, but explain why you're doing it if
				1187	it seems not to be obvious. Also do indicate at the top of function what they
				1188	accept and what they don't accept. For instance, strcpy() only accepts output
				1189	buffers at least as large as the input buffer, and does not support any NULL
				1190	pointer. There is nothing wrong with that if the caller knows it.
				1191
				1192	Wrong use of comments :
				1193
				1194	\| int flsnz8(unsigned int x)
				1195	\| {
				1196	\| int ret = 0; /* initialize ret */
				1197	\| if (x >> 4) { x >>= 4; ret += 4; } /* add 4 to ret if needed */
				1198	\| return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1; /* add ??? */
				1199	\| }
				1200	\| ...
				1201	\| bit = ~len + (skip << 3) + 9; /* update bit */
				1202
				1203	Right use of comments :
				1204
				1205	\| /* This function returns the positoin of the highest bit set in the lowest
				1206	\| * byte of <x>, between 0 and 7. It only works if <x> is non-null. It uses
				1207	\| * a 32-bit value as a lookup table to return one of 4 values for the
				1208	\| * highest 16 possible 4-bit values.
				1209	\| */
				1210	\| int flsnz8(unsigned int x)
				1211	\| {
				1212	\| int ret = 0;
				1213	\| if (x >> 4) { x >>= 4; ret += 4; }
				1214	\| return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
				1215	\| }
				1216	\| ...
				1217	\| bit = ~len + (skip << 3) + 9; /* (skip << 3) + (8 - len), saves 1 cycle */
				1218
				1219
				1220	12) Use of assembly
				1221	-------------------
				1222
				1223	There are many projects where use of assembly code is not welcome. There is no
				1224	problem with use of assembly in haproxy, provided that :
				1225
				1226	a) an alternate C-form is provided for architectures not covered
				1227	b) the code is small enough and well commented enough to be maintained
				1228
				1229	It is important to take care of various incompatibilities between compiler
				1230	versions, for instance regarding output and cloberred registers. There are
				1231	a number of documentations on the subject on the net. Anyway if you are
				1232	fiddling with assembly, you probably know that already.
				1233
				1234	Example :
				1235	\| /* gcc does not know when it can safely divide 64 bits by 32 bits. Use this
				1236	\| * function when you know for sure that the result fits in 32 bits, because
				1237	\| * it is optimal on x86 and on 64bit processors.
				1238	\| */
				1239	\| static inline unsigned int div64_32(unsigned long long o1, unsigned int o2)
				1240	\| {
				1241	\| unsigned int result;
				1242	\| #ifdef __i386__
				1243	\| asm("divl %2"
				1244	\| : "=a" (result)
				1245	\| : "A"(o1), "rm"(o2));
				1246	\| #else
				1247	\| result = o1 / o2;
				1248	\| #endif
				1249	\| return result;
				1250	\| }
				1251