blob: 24550f1f5dbd721935d9fd209d573c2d5a37cc89 [file] [log] [blame]
Willy Tarreau8f1b35b2015-09-21 16:36:15 +020012015/09/21 - HAProxy coding style - Willy Tarreau <w@1wt.eu>
Willy Tarreau7f051b32011-12-30 17:16:22 +01002------------------------------------------------------------
3
4A number of contributors are often embarrassed with coding style issues, they
5don't always know if they're doing it right, especially since the coding style
6has elvoved along the years. What is explained here is not necessarily what is
7applied in the code, but new code should as much as possible conform to this
8style. Coding style fixes happen when code is replaced. It is useless to send
9patches to fix coding style only, they will be rejected, unless they belong to
10a patch series which needs these fixes prior to get code changes. Also, please
11avoid fixing coding style in the same patches as functional changes, they make
12code review harder.
13
Willy Tarreau8f1b35b2015-09-21 16:36:15 +020014A good way to quickly validate your patch before submitting it is to pass it
15through the Linux kernel's checkpatch.pl utility which can be downloaded here :
16
17 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/scripts/checkpatch.pl
18
19Running it with the following options relaxes its checks to accommodate to the
20extra degree of freedom that is tolerated in HAProxy's coding style compared to
21the stricter style used in the kernel :
22
23 checkpatch.pl -q --max-line-length=160 --no-tree --no-signoff \
24 --ignore=LEADING_SPACE,CODE_INDENT,DEEP_INDENTATION \
25 --ignore=ELSE_AFTER_BRACE < patch
26
27You can take its output as hints instead of strict rules, but in general its
28output will be accurate and it may even spot some real bugs.
29
Willy Tarreau7f051b32011-12-30 17:16:22 +010030When modifying a file, you must accept the terms of the license of this file
31which is recalled at the top of the file, or is explained in the LICENSE file,
32or if not stated, defaults to LGPL version 2.1 or later for files in the
33'include' directory, and GPL version 2 or later for all other files.
34
35When adding a new file, you must add a copyright banner at the top of the
36file with your real name, e-mail address and a reminder of the license.
37Contributions under incompatible licenses or too restrictive licenses might
38get rejected. If in doubt, please apply the principle above for existing files.
39
40All code examples below will intentionally be prefixed with " | " to mark
41where the code aligns with the first column, and tabs in this document will be
42represented as a series of 8 spaces so that it displays the same everywhere.
43
44
451) Indentation and alignment
46----------------------------
47
481.1) Indentation
49----------------
50
51Indentation and alignment are two completely different things that people often
52get wrong. Indentation is used to mark a sub-level in the code. A sub-level
53means that a block is executed in the context of another block (eg: a function
54or a condition) :
55
56 | main(int argc, char **argv)
57 | {
58 | int i;
59 |
60 | if (argc < 2)
61 | exit(1);
62 | }
63
64In the example above, the code belongs to the main() function and the exit()
65call belongs to the if statement. Indentation is made with tabs (\t, ASCII 9),
66which allows any developer to configure their preferred editor to use their
67own tab size and to still get the text properly indented. Exactly one tab is
68used per sub-level. Tabs may only appear at the beginning of a line or after
69another tab. It is illegal to put a tab after some text, as it mangles displays
70in a different manner for different users (particularly when used to align
71comments or values after a #define). If you're tempted to put a tab after some
72text, then you're doing it wrong and you need alignment instead (see below).
73
74Note that there are places where the code was not properly indented in the
75past. In order to view it correctly, you may have to set your tab size to 8
76characters.
77
78
791.2) Alignment
80--------------
81
82Alignment is used to continue a line in a way to makes things easier to group
83together. By definition, alignment is character-based, so it uses spaces. Tabs
84would not work because for one tab there would not be as many characters on all
85displays. For instance, the arguments in a function declaration may be broken
86into multiple lines using alignment spaces :
87
88 | int http_header_match2(const char *hdr, const char *end,
89 | const char *name, int len)
90 | {
91 | ...
92 | }
93
94In this example, the "const char *name" part is aligned with the first
95character of the group it belongs to (list of function arguments). Placing it
96here makes it obvious that it's one of the function's arguments. Multiple lines
97are easy to handle this way. This is very common with long conditions too :
98
99 | if ((len < eol - sol) &&
100 | (sol[len] == ':') &&
101 | (strncasecmp(sol, name, len) == 0)) {
102 | ctx->del = len;
103 | }
104
105If we take again the example above marking tabs with "[-Tabs-]" and spaces
106with "#", we get this :
107
108 | [-Tabs-]if ((len < eol - sol) &&
109 | [-Tabs-]####(sol[len] == ':') &&
110 | [-Tabs-]####(strncasecmp(sol, name, len) == 0)) {
111 | [-Tabs-][-Tabs-]ctx->del = len;
112 | [-Tabs-]}
113
Ilya Shipitsin4f9532d2020-03-06 13:07:38 +0500114It is worth noting that some editors tend to confuse indentations and alignment.
Willy Tarreau7f051b32011-12-30 17:16:22 +0100115Emacs is notoriously known for this brokenness, and is responsible for almost
116all of the alignment mess. The reason is that Emacs only counts spaces, tries
117to fill as many as possible with tabs and completes with spaces. Once you know
118it, you just have to be careful, as alignment is not used much, so generally it
119is just a matter of replacing the last tab with 8 spaces when this happens.
120
121Indentation should be used everywhere there is a block or an opening brace. It
122is not possible to have two consecutive closing braces on the same column, it
123means that the innermost was not indented.
124
125Right :
126
127 | main(int argc, char **argv)
128 | {
129 | if (argc > 1) {
130 | printf("Hello\n");
131 | }
132 | exit(0);
133 | }
134
135Wrong :
136
137 | main(int argc, char **argv)
138 | {
139 | if (argc > 1) {
140 | printf("Hello\n");
141 | }
142 | exit(0);
143 | }
144
145A special case applies to switch/case statements. Due to my editor's settings,
146I've been used to align "case" with "switch" and to find it somewhat logical
147since each of the "case" statements opens a sublevel belonging to the "switch"
148statement. But indenting "case" after "switch" is accepted too. However in any
149case, whatever follows the "case" statement must be indented, whether or not it
150contains braces :
151
152 | switch (*arg) {
153 | case 'A': {
154 | int i;
155 | for (i = 0; i < 10; i++)
156 | printf("Please stop pressing 'A'!\n");
157 | break;
158 | }
159 | case 'B':
160 | printf("You pressed 'B'\n");
161 | break;
162 | case 'C':
163 | case 'D':
164 | printf("You pressed 'C' or 'D'\n");
165 | break;
166 | default:
167 | printf("I don't know what you pressed\n");
168 | }
169
170
1712) Braces
172---------
173
174Braces are used to delimit multiple-instruction blocks. In general it is
175preferred to avoid braces around single-instruction blocks as it reduces the
176number of lines :
177
178Right :
179
180 | if (argc >= 2)
181 | exit(0);
182
183Wrong :
184
185 | if (argc >= 2) {
186 | exit(0);
187 | }
188
189But it is not that strict, it really depends on the context. It happens from
190time to time that single-instruction blocks are enclosed within braces because
191it makes the code more symmetrical, or more readable. Example :
192
193 | if (argc < 2) {
194 | printf("Missing argument\n");
195 | exit(1);
196 | } else {
197 | exit(0);
198 | }
199
200Braces are always needed to declare a function. A function's opening brace must
201be placed at the beginning of the next line :
202
203Right :
204
205 | int main(int argc, char **argv)
206 | {
207 | exit(0);
208 | }
209
210Wrong :
211
212 | int main(int argc, char **argv) {
213 | exit(0);
214 | }
215
216Note that a large portion of the code still does not conforms to this rule, as
Willy Tarreau8f1b35b2015-09-21 16:36:15 +0200217it took years to get all authors to adapt to this more common standard which
218is now preferred, as it avoids visual confusion when function declarations are
219broken on multiple lines :
Willy Tarreau7f051b32011-12-30 17:16:22 +0100220
221Right :
222
223 | int foo(const char *hdr, const char *end,
224 | const char *name, const char *err,
225 | int len)
226 | {
227 | int i;
228
229Wrong :
230
231 | int foo(const char *hdr, const char *end,
232 | const char *name, const char *err,
233 | int len) {
234 | int i;
235
236Braces should always be used where there might be an ambiguity with the code
237later. The most common example is the stacked "if" statement where an "else"
238may be added later at the wrong place breaking the code, but it also happens
239with comments or long arguments in function calls. In general, if a block is
240more than one line long, it should use braces.
241
242Dangerous code waiting of a victim :
243
244 | if (argc < 2)
245 | /* ret must not be negative here */
246 | if (ret < 0)
247 | return -1;
248
249Wrong change :
250
251 | if (argc < 2)
252 | /* ret must not be negative here */
253 | if (ret < 0)
254 | return -1;
255 | else
256 | return 0;
257
258It will do this instead of what your eye seems to tell you :
259
260 | if (argc < 2)
261 | /* ret must not be negative here */
262 | if (ret < 0)
263 | return -1;
264 | else
265 | return 0;
266
267Right :
268
269 | if (argc < 2) {
270 | /* ret must not be negative here */
271 | if (ret < 0)
272 | return -1;
273 | }
274 | else
275 | return 0;
276
277Similarly dangerous example :
278
279 | if (ret < 0)
280 | /* ret must not be negative here */
281 | complain();
282 | init();
283
284Wrong change to silent the annoying message :
285
286 | if (ret < 0)
287 | /* ret must not be negative here */
288 | //complain();
289 | init();
290
291... which in fact means :
292
293 | if (ret < 0)
294 | init();
295
296
2973) Breaking lines
298-----------------
299
300There is no strict rule for line breaking. Some files try to stick to the 80
301column limit, but given that various people use various tab sizes, it does not
302make much sense. Also, code is sometimes easier to read with less lines, as it
303represents less surface on the screen (since each new line adds its tabs and
304spaces). The rule is to stick to the average line length of other lines. If you
305are working in a file which fits in 80 columns, try to keep this goal in mind.
306If you're in a function with 120-chars lines, there is no reason to add many
307short lines, so you can make longer lines.
308
309In general, opening a new block should lead to a new line. Similarly, multiple
310instructions should be avoided on the same line. But some constructs make it
311more readable when those are perfectly aligned :
312
313A copy-paste bug in the following construct will be easier to spot :
314
315 | if (omult % idiv == 0) { omult /= idiv; idiv = 1; }
316 | if (idiv % omult == 0) { idiv /= omult; omult = 1; }
317 | if (imult % odiv == 0) { imult /= odiv; odiv = 1; }
318 | if (odiv % imult == 0) { odiv /= imult; imult = 1; }
319
320than in this one :
321
322 | if (omult % idiv == 0) {
323 | omult /= idiv;
324 | idiv = 1;
325 | }
326 | if (idiv % omult == 0) {
327 | idiv /= omult;
328 | omult = 1;
329 | }
330 | if (imult % odiv == 0) {
331 | imult /= odiv;
332 | odiv = 1;
333 | }
334 | if (odiv % imult == 0) {
335 | odiv /= imult;
336 | imult = 1;
337 | }
338
339What is important is not to mix styles. For instance there is nothing wrong
340with having many one-line "case" statements as long as most of them are this
341short like below :
342
343 | switch (*arg) {
344 | case 'A': ret = 1; break;
345 | case 'B': ret = 2; break;
346 | case 'C': ret = 4; break;
347 | case 'D': ret = 8; break;
348 | default : ret = 0; break;
349 | }
350
351Otherwise, prefer to have the "case" statement on its own line as in the
352example in section 1.2 about alignment. In any case, avoid to stack multiple
353control statements on the same line, so that it will never be the needed to
354add two tab levels at once :
355
356Right :
357
358 | switch (*arg) {
359 | case 'A':
360 | if (ret < 0)
361 | ret = 1;
362 | break;
363 | default : ret = 0; break;
364 | }
365
366Wrong :
367
368 | switch (*arg) {
369 | case 'A': if (ret < 0)
370 | ret = 1;
371 | break;
372 | default : ret = 0; break;
373 | }
374
375Right :
376
377 | if (argc < 2)
378 | if (ret < 0)
379 | return -1;
380
381or Right :
382
383 | if (argc < 2)
384 | if (ret < 0) return -1;
385
386but Wrong :
387
388 | if (argc < 2) if (ret < 0) return -1;
389
390
391When complex conditions or expressions are broken into multiple lines, please
392do ensure that alignment is perfectly appropriate, and group all main operators
393on the same side (which you're free to choose as long as it does not change for
394every block. Putting binary operators on the right side is preferred as it does
395not mangle with alignment but various people have their preferences.
396
397Right :
398
399 | if ((txn->flags & TX_NOT_FIRST) &&
400 | ((req->flags & BF_FULL) ||
401 | req->r < req->lr ||
402 | req->r > req->data + req->size - global.tune.maxrewrite)) {
403 | return 0;
404 | }
405
406Right :
407
408 | if ((txn->flags & TX_NOT_FIRST)
409 | && ((req->flags & BF_FULL)
410 | || req->r < req->lr
411 | || req->r > req->data + req->size - global.tune.maxrewrite)) {
412 | return 0;
413 | }
414
415Wrong :
416
417 | if ((txn->flags & TX_NOT_FIRST) &&
418 | ((req->flags & BF_FULL) ||
419 | req->r < req->lr
420 | || req->r > req->data + req->size - global.tune.maxrewrite)) {
421 | return 0;
422 | }
423
424If it makes the result more readable, parenthesis may even be closed on their
425own line in order to align with the opening one. Note that should normally not
426be needed because such code would be too complex to be digged into.
427
428The "else" statement may either be merged with the closing "if" brace or lie on
429its own line. The later is preferred but it adds one extra line to each control
430block which is annoying in short ones. However, if the "else" is followed by an
431"if", then it should really be on its own line and the rest of the if/else
432blocks must follow the same style.
433
434Right :
435
436 | if (a < b) {
437 | return a;
438 | }
439 | else {
440 | return b;
441 | }
442
443Right :
444
445 | if (a < b) {
446 | return a;
447 | } else {
448 | return b;
449 | }
450
451Right :
452
453 | if (a < b) {
454 | return a;
455 | }
456 | else if (a != b) {
457 | return b;
458 | }
459 | else {
460 | return 0;
461 | }
462
463Wrong :
464
465 | if (a < b) {
466 | return a;
467 | } else if (a != b) {
468 | return b;
469 | } else {
470 | return 0;
471 | }
472
473Wrong :
474
475 | if (a < b) {
476 | return a;
477 | }
478 | else if (a != b) {
479 | return b;
480 | } else {
481 | return 0;
482 | }
483
484
4854) Spacing
486----------
487
488Correctly spacing code is very important. When you have to spot a bug at 3am,
489you need it to be clear. When you expect other people to review your code, you
490want it to be clear and don't want them to get nervous when trying to find what
491you did.
492
493Always place spaces around all binary or ternary operators, commas, as well as
494after semi-colons and opening braces if the line continues :
495
496Right :
497
498 | int ret = 0;
499 | /* if (x >> 4) { x >>= 4; ret += 4; } */
500 | ret += (x >> 4) ? (x >>= 4, 4) : 0;
501 | val = ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
502
503Wrong :
504
505 | int ret=0;
506 | /* if (x>>4) {x>>=4;ret+=4;} */
507 | ret+=(x>>4)?(x>>=4,4):0;
508 | val=ret+((0xFFFFAA50U>>(x<<1))&3)+1;
509
510Never place spaces after unary operators (&, *, -, !, ~, ++, --) nor cast, as
511they might be confused with they binary counterpart, nor before commas or
512semicolons :
513
514Right :
515
516 | bit = !!(~len++ ^ -(unsigned char)*x);
517
518Wrong :
519
520 | bit = ! ! (~len++ ^ - (unsigned char) * x) ;
521
522Note that "sizeof" is a unary operator which is sometimes considered as a
Michael Prokop4438c602019-05-24 10:25:45 +0200523language keyword, but in no case it is a function. It does not require
Willy Tarreau7f051b32011-12-30 17:16:22 +0100524parenthesis so it is sometimes followed by spaces and sometimes not when
525there are no parenthesis. Most people do not really care as long as what
526is written is unambiguous.
527
Joseph Herlant71b4b152018-11-13 16:55:16 -0800528Braces opening a block must be preceded by one space unless the brace is
Willy Tarreau7f051b32011-12-30 17:16:22 +0100529placed on the first column :
530
531Right :
532
533 | if (argc < 2) {
534 | }
535
536Wrong :
537
538 | if (argc < 2){
539 | }
540
541Do not add unneeded spaces inside parenthesis, they just make the code less
542readable.
543
544Right :
545
546 | if (x < 4 && (!y || !z))
547 | break;
548
549Wrong :
550
551 | if ( x < 4 && ( !y || !z ) )
552 | break;
553
554Language keywords must all be followed by a space. This is true for control
555statements (do, for, while, if, else, return, switch, case), and for types
556(int, char, unsigned). As an exception, the last type in a cast does not take
557a space before the closing parenthesis). The "default" statement in a "switch"
558construct is generally just followed by the colon. However the colon after a
559"case" or "default" statement must be followed by a space.
560
561Right :
562
563 | if (nbargs < 2) {
564 | printf("Missing arg at %c\n", *(char *)ptr);
565 | for (i = 0; i < 10; i++) beep();
566 | return 0;
567 | }
568 | switch (*arg) {
569
570Wrong :
571
572 | if(nbargs < 2){
573 | printf("Missing arg at %c\n", *(char*)ptr);
574 | for(i = 0; i < 10; i++)beep();
575 | return 0;
576 | }
577 | switch(*arg) {
578
579Function calls are different, the opening parenthesis is always coupled to the
580function name without any space. But spaces are still needed after commas :
581
582Right :
583
584 | if (!init(argc, argv))
585 | exit(1);
586
587Wrong :
588
589 | if (!init (argc,argv))
590 | exit(1);
591
592
5935) Excess or lack of parenthesis
594--------------------------------
595
596Sometimes there are too many parenthesis in some formulas, sometimes there are
597too few. There are a few rules of thumb for this. The first one is to respect
598the compiler's advice. If it emits a warning and asks for more parenthesis to
599avoid confusion, follow the advice at least to shut the warning. For instance,
600the code below is quite ambiguous due to its alignment :
601
602 | if (var1 < 2 || var2 < 2 &&
603 | var3 != var4) {
604 | /* fail */
605 | return -3;
606 | }
607
608Note that this code does :
609
610 | if (var1 < 2 || (var2 < 2 && var3 != var4)) {
611 | /* fail */
612 | return -3;
613 | }
614
615But maybe the author meant :
616
617 | if ((var1 < 2 || var2 < 2) && var3 != var4) {
618 | /* fail */
619 | return -3;
620 | }
621
622A second rule to put parenthesis is that people don't always know operators
623precedence too well. Most often they have no issue with operators of the same
624category (eg: booleans, integers, bit manipulation, assignment) but once these
625operators are mixed, it causes them all sort of issues. In this case, it is
626wise to use parenthesis to avoid errors. One common error concerns the bit
627shift operators because they're used to replace multiplies and divides but
628don't have the same precedence :
629
630The expression :
631
632 | x = y * 16 + 5;
633
634becomes :
635
636 | x = y << 4 + 5;
637
638which is wrong because it is equivalent to :
639
640 | x = y << (4 + 5);
641
642while the following was desired instead :
643
644 | x = (y << 4) + 5;
645
646It is generally fine to write boolean expressions based on comparisons without
647any parenthesis. But on top of that, integer expressions and assignments should
648then be protected. For instance, there is an error in the expression below
649which should be safely rewritten :
650
651Wrong :
652
653 | if (var1 > 2 && var1 < 10 ||
654 | var1 > 2 + 256 && var2 < 10 + 256 ||
655 | var1 > 2 + 1 << 16 && var2 < 10 + 2 << 16)
656 | return 1;
657
658Right (may remove a few parenthesis depending on taste) :
659
660 | if ((var1 > 2 && var1 < 10) ||
661 | (var1 > (2 + 256) && var2 < (10 + 256)) ||
662 | (var1 > (2 + (1 << 16)) && var2 < (10 + (1 << 16))))
663 | return 1;
664
665The "return" statement is not a function, so it takes no argument. It is a
666control statement which is followed by the expression to be returned. It does
667not need to be followed by parenthesis :
668
669Wrong :
670
671 | int ret0()
672 | {
673 | return(0);
674 | }
675
676Right :
677
678 | int ret0()
679 | {
680 | return 0;
681 | }
682
683Parenthesisis are also found in type casts. Type casting should be avoided as
684much as possible, especially when it concerns pointer types. Casting a pointer
685disables the compiler's type checking and is the best way to get caught doing
686wrong things with data not the size you expect. If you need to manipulate
687multiple data types, you can use a union instead. If the union is really not
688convenient and casts are easier, then try to isolate them as much as possible,
689for instance when initializing function arguments or in another function. Not
690proceeding this way causes huge risks of not using the proper pointer without
691any notification, which is especially true during copy-pastes.
692
693Wrong :
694
695 | void *check_private_data(void *arg1, void *arg2)
696 | {
697 | char *area;
698 |
699 | if (*(int *)arg1 > 1000)
700 | return NULL;
701 | if (memcmp(*(const char *)arg2, "send(", 5) != 0))
702 | return NULL;
703 | area = malloc(*(int *)arg1);
704 | if (!area)
705 | return NULL;
706 | memcpy(area, *(const char *)arg2 + 5, *(int *)arg1);
707 | return area;
708 | }
709
710Right :
711
712 | void *check_private_data(void *arg1, void *arg2)
713 | {
714 | char *area;
715 | int len = *(int *)arg1;
716 | const char *msg = arg2;
717 |
718 | if (len > 1000)
719 | return NULL;
720 | if (memcmp(msg, "send(", 5) != 0)
721 | return NULL;
722 | area = malloc(len);
723 | if (!area)
724 | return NULL;
725 | memcpy(area, msg + 5, len);
726 | return area;
727 | }
728
729
7306) Ambiguous comparisons with zero or NULL
731------------------------------------------
732
733In C, '0' has no type, or it has the type of the variable it is assigned to.
734Comparing a variable or a return value with zero means comparing with the
735representation of zero for this variable's type. For a boolean, zero is false.
736For a pointer, zero is NULL. Very often, to make things shorter, it is fine to
737use the '!' unary operator to compare with zero, as it is shorter and easier to
738remind or understand than a plain '0'. Since the '!' operator is read "not", it
739helps read code faster when what follows it makes sense as a boolean, and it is
740often much more appropriate than a comparison with zero which makes an equal
741sign appear at an undesirable place. For instance :
742
743 | if (!isdigit(*c) && !isspace(*c))
744 | break;
745
746is easier to understand than :
747
748 | if (isdigit(*c) == 0 && isspace(*c) == 0)
749 | break;
750
751For a char this "not" operator can be reminded as "no remaining char", and the
752absence of comparison to zero implies existence of the tested entity, hence the
753simple strcpy() implementation below which automatically stops once the last
754zero is copied :
755
756 | void my_strcpy(char *d, const char *s)
757 | {
758 | while ((*d++ = *s++));
759 | }
760
761Note the double parenthesis in order to avoid the compiler telling us it looks
762like an equality test.
763
764For a string or more generally any pointer, this test may be understood as an
765existence test or a validity test, as the only pointer which will fail to
766validate equality is the NULL pointer :
767
768 | area = malloc(1000);
769 | if (!area)
770 | return -1;
771
772However sometimes it can fool the reader. For instance, strcmp() precisely is
773one of such functions whose return value can make one think the opposite due to
774its name which may be understood as "if strings compare...". Thus it is strongly
775recommended to perform an explicit comparison with zero in such a case, and it
776makes sense considering that the comparison's operator is the same that is
777wanted to compare the strings (note that current config parser lacks a lot in
778this regards) :
779
780 strcmp(a, b) == 0 <=> a == b
781 strcmp(a, b) != 0 <=> a != b
782 strcmp(a, b) < 0 <=> a < b
783 strcmp(a, b) > 0 <=> a > b
784
785Avoid this :
786
787 | if (strcmp(arg, "test"))
788 | printf("this is not a test\n");
789 |
790 | if (!strcmp(arg, "test"))
791 | printf("this is a test\n");
792
793Prefer this :
794
795 | if (strcmp(arg, "test") != 0)
796 | printf("this is not a test\n");
797 |
798 | if (strcmp(arg, "test") == 0)
799 | printf("this is a test\n");
800
801
8027) System call returns
803----------------------
804
805This is not directly a matter of coding style but more of bad habits. It is
806important to check for the correct value upon return of syscalls. The proper
807return code indicating an error is described in its man page. There is no
808reason to consider wider ranges than what is indicated. For instance, it is
809common to see such a thing :
810
811 | if ((fd = open(file, O_RDONLY)) < 0)
812 | return -1;
813
Joseph Herlant71b4b152018-11-13 16:55:16 -0800814This is wrong. The man page says that -1 is returned if an error occurred. It
Willy Tarreau7f051b32011-12-30 17:16:22 +0100815does not suggest that any other negative value will be an error. It is possible
816that a few such issues have been left in existing code. They are bugs for which
Michael Prokop4438c602019-05-24 10:25:45 +0200817fixes are accepted, even though they're currently harmless since open() is not
Willy Tarreau7f051b32011-12-30 17:16:22 +0100818known for returning negative values at the moment.
819
820
8218) Declaring new types, names and values
822----------------------------------------
823
824Please refrain from using "typedef" to declare new types, they only obfuscate
825the code. The reader never knows whether he's manipulating a scalar type or a
826struct. For instance it is not obvious why the following code fails to build :
827
828 | int delay_expired(timer_t exp, timer_us_t now)
829 | {
830 | return now >= exp;
831 | }
832
833With the types declared in another file this way :
834
835 | typedef unsigned int timer_t;
836 | typedef struct timeval timer_us_t;
837
838This cannot work because we're comparing a scalar with a struct, which does
839not make sense. Without a typedef, the function would have been written this
840way without any ambiguity and would not have failed :
841
842 | int delay_expired(unsigned int exp, struct timeval *now)
843 | {
844 | return now >= exp->tv_sec;
845 | }
846
847Declaring special values may be done using enums. Enums are a way to define
848structured integer values which are related to each other. They are perfectly
849suited for state machines. While the first element is always assigned the zero
850value, not everybody knows that, especially people working with multiple
851languages all the day. For this reason it is recommended to explicitly force
852the first value even if it's zero. The last element should be followed by a
853comma if it is planned that new elements might later be added, this will make
854later patches shorter. Conversely, if the last element is placed in order to
855get the number of possible values, it must not be followed by a comma and must
Joseph Herlant71b4b152018-11-13 16:55:16 -0800856be preceded by a comment :
Willy Tarreau7f051b32011-12-30 17:16:22 +0100857
858 | enum {
859 | first = 0,
860 | second,
861 | third,
862 | fourth,
863 | };
864
865
866 | enum {
867 | first = 0,
868 | second,
869 | third,
870 | fourth,
871 | /* nbvalues must always be placed last */
872 | nbvalues
873 | };
874
875Structure names should be short enough not to mangle function declarations,
876and explicit enough to avoid confusion (which is the most important thing).
877
878Wrong :
879
880 | struct request_args { /* arguments on the query string */
881 | char *name;
882 | char *value;
883 | struct misc_args *next;
884 | };
885
886Right :
887
888 | struct qs_args { /* arguments on the query string */
889 | char *name;
890 | char *value;
891 | struct qs_args *next;
892 | }
893
894
895When declaring new functions or structures, please do not use CamelCase, which
896is a style where upper and lower case are mixed in a single word. It causes a
897lot of confusion when words are composed from acronyms, because it's hard to
898stick to a rule. For instance, a function designed to generate an ISN (initial
899sequence number) for a TCP/IP connection could be called :
900
901 - generateTcpipIsn()
902 - generateTcpIpIsn()
903 - generateTcpIpISN()
904 - generateTCPIPISN()
905 etc...
906
907None is right, none is wrong, these are just preferences which might change
908along the code. Instead, please use an underscore to separate words. Lowercase
909is preferred for the words, but if acronyms are upcased it's not dramatic. The
910real advantage of this method is that it creates unambiguous levels even for
911short names.
912
913Valid examples :
914
915 - generate_tcpip_isn()
916 - generate_tcp_ip_isn()
917 - generate_TCPIP_ISN()
918 - generate_TCP_IP_ISN()
919
920Another example is easy to understand when 3 arguments are involved in naming
921the function :
922
923Wrong (naming conflict) :
924
925 | /* returns A + B * C */
926 | int mulABC(int a, int b, int c)
927 | {
928 | return a + b * c;
929 | }
930 |
931 | /* returns (A + B) * C */
932 | int mulABC(int a, int b, int c)
933 | {
934 | return (a + b) * c;
935 | }
936
937Right (unambiguous naming) :
938
939 | /* returns A + B * C */
940 | int mul_a_bc(int a, int b, int c)
941 | {
942 | return a + b * c;
943 | }
944 |
945 | /* returns (A + B) * C */
946 | int mul_ab_c(int a, int b, int c)
947 | {
948 | return (a + b) * c;
949 | }
950
951Whenever you manipulate pointers, try to declare them as "const", as it will
952save you from many accidental misuses and will only cause warnings to be
953emitted when there is a real risk. In the examples below, it is possible to
954call my_strcpy() with a const string only in the first declaration. Note that
955people who ignore "const" are often the ones who cast a lot and who complain
956from segfaults when using strtok() !
957
958Right :
959
960 | void my_strcpy(char *d, const char *s)
961 | {
962 | while ((*d++ = *s++));
963 | }
964 |
965 | void say_hello(char *dest)
966 | {
967 | my_strcpy(dest, "hello\n");
968 | }
969
970Wrong :
971
972 | void my_strcpy(char *d, char *s)
973 | {
974 | while ((*d++ = *s++));
975 | }
976 |
977 | void say_hello(char *dest)
978 | {
979 | my_strcpy(dest, "hello\n");
980 | }
981
982
9839) Getting macros right
984-----------------------
985
986It is very common for macros to do the wrong thing when used in a way their
987author did not have in mind. For this reason, macros must always be named with
988uppercase letters only. This is the only way to catch the developer's eye when
989using them, so that he double-checks whether he's taking risks or not. First,
990macros must never ever be terminated by a semi-colon, or they will close the
991wrong block once in a while. For instance, the following will cause a build
992error before the "else" due to the double semi-colon :
993
994Wrong :
995
996 | #define WARN printf("warning\n");
997 | ...
998 | if (a < 0)
999 | WARN;
1000 | else
1001 | a--;
1002
1003Right :
1004
1005 | #define WARN printf("warning\n")
1006
1007If multiple instructions are needed, then use a do { } while (0) block, which
1008is the only construct which respects *exactly* the semantics of a single
1009instruction :
1010
1011 | #define WARN do { printf("warning\n"); log("warning\n"); } while (0)
1012 | ...
1013 |
1014 | if (a < 0)
1015 | WARN;
1016 | else
1017 | a--;
1018
1019Second, do not put unprotected control statements in macros, they will
1020definitely cause bugs :
1021
1022Wrong :
1023
1024 | #define WARN if (verbose) printf("warning\n")
1025 | ...
1026 | if (a < 0)
1027 | WARN;
1028 | else
1029 | a--;
1030
1031Which is equivalent to the undesired form below :
1032
1033 | if (a < 0)
1034 | if (verbose)
1035 | printf("warning\n");
1036 | else
1037 | a--;
1038
1039Right way to do it :
1040
1041 | #define WARN do { if (verbose) printf("warning\n"); } while (0)
1042 | ...
1043 | if (a < 0)
1044 | WARN;
1045 | else
1046 | a--;
1047
1048Which is equivalent to :
1049
1050 | if (a < 0)
1051 | do { if (verbose) printf("warning\n"); } while (0);
1052 | else
1053 | a--;
1054
1055Macro parameters must always be surrounded by parenthesis, and must never be
1056duplicated in the same macro unless explicitly stated. Also, macros must not be
1057defined with operators without surrounding parenthesis. The MIN/MAX macros are
1058a pretty common example of multiple misuses, but this happens as early as when
1059using bit masks. Most often, in case of any doubt, try to use inline functions
1060instead.
1061
1062Wrong :
1063
1064 | #define MIN(a, b) a < b ? a : b
1065 |
1066 | /* returns 2 * min(a,b) + 1 */
1067 | int double_min_p1(int a, int b)
1068 | {
1069 | return 2 * MIN(a, b) + 1;
1070 | }
1071
1072What this will do :
1073
1074 | int double_min_p1(int a, int b)
1075 | {
1076 | return 2 * a < b ? a : b + 1;
1077 | }
1078
1079Which is equivalent to :
1080
1081 | int double_min_p1(int a, int b)
1082 | {
1083 | return (2 * a) < b ? a : (b + 1);
1084 | }
1085
1086The first thing to fix is to surround the macro definition with parenthesis to
1087avoid this mistake :
1088
1089 | #define MIN(a, b) (a < b ? a : b)
1090
1091But this is still not enough, as can be seen in this example :
1092
1093 | /* compares either a or b with c */
1094 | int min_ab_c(int a, int b, int c)
1095 | {
1096 | return MIN(a ? a : b, c);
1097 | }
1098
1099Which is equivalent to :
1100
1101 | int min_ab_c(int a, int b, int c)
1102 | {
1103 | return (a ? a : b < c ? a ? a : b : c);
1104 | }
1105
1106Which in turn means a totally different thing due to precedence :
1107
1108 | int min_ab_c(int a, int b, int c)
1109 | {
1110 | return (a ? a : ((b < c) ? (a ? a : b) : c));
1111 | }
1112
1113This can be fixed by surrounding *each* argument in the macro with parenthesis:
1114
1115 | #define MIN(a, b) ((a) < (b) ? (a) : (b))
1116
1117But this is still not enough, as can be seen in this example :
1118
1119 | int min_ap1_b(int a, int b)
1120 | {
1121 | return MIN(++a, b);
1122 | }
1123
1124Which is equivalent to :
1125
1126 | int min_ap1_b(int a, int b)
1127 | {
1128 | return ((++a) < (b) ? (++a) : (b));
1129 | }
1130
1131Again, this is wrong because "a" is incremented twice if below b. The only way
1132to fix this is to use a compound statement and to assign each argument exactly
1133once to a local variable of the same type :
1134
1135 | #define MIN(a, b) ({ typeof(a) __a = (a); typeof(b) __b = (b); \
1136 | ((__a) < (__b) ? (__a) : (__b)); \
1137 | })
1138
1139At this point, using static inline functions is much cleaner if a single type
1140is to be used :
1141
1142 | static inline int min(int a, int b)
1143 | {
1144 | return a < b ? a : b;
1145 | }
1146
1147
114810) Includes
1149------------
1150
1151Includes are as much as possible listed in alphabetically ordered groups :
1152 - the libc-standard includes (those without any path component)
1153 - the includes more or less system-specific (sys/*, netinet/*, ...)
1154 - includes from the local "common" subdirectory
1155 - includes from the local "types" subdirectory
1156 - includes from the local "proto" subdirectory
1157
1158Each section is just visually delimited from the other ones using an empty
1159line. The two first ones above may be merged into a single section depending on
1160developer's preference. Please do not copy-paste include statements from other
1161files. Having too many includes significantly increases build time and makes it
1162hard to find which ones are needed later. Just include what you need and if
1163possible in alphabetical order so that when something is missing, it becomes
1164obvious where to look for it and where to add it.
1165
1166All files should include <common/config.h> because this is where build options
1167are prepared.
1168
1169Header files are split in two directories ("types" and "proto") depending on
1170what they provide. Types, structures, enums and #defines must go into the
1171"types" directory. Function prototypes and inlined functions must go into the
1172"proto" directory. This split is because of inlined functions which
1173cross-reference types from other files, which cause a chicken-and-egg problem
1174if the functions and types are declared at the same place.
1175
1176All headers which do not depend on anything currently go to the "common"
1177subdirectory, but are equally well placed into the "proto" directory. It is
1178possible that one day the "common" directory will disappear.
1179
1180Include files must be protected against multiple inclusion using the common
1181#ifndef/#define/#endif trick with a tag derived from the include file and its
1182location.
1183
1184
118511) Comments
1186------------
1187
1188Comments are preferably of the standard 'C' form using /* */. The C++ form "//"
1189are tolerated for very short comments (eg: a word or two) but should be avoided
1190as much as possible. Multi-line comments are made with each intermediate line
1191starting with a star aligned with the first one, as in this example :
1192
1193 | /*
1194 | * This is a multi-line
1195 | * comment.
1196 | */
1197
1198If multiple code lines need a short comment, try to align them so that you can
1199have multi-line sentences. This is rarely needed, only for really complex
1200constructs.
1201
1202Do not tell what you're doing in comments, but explain why you're doing it if
1203it seems not to be obvious. Also *do* indicate at the top of function what they
1204accept and what they don't accept. For instance, strcpy() only accepts output
1205buffers at least as large as the input buffer, and does not support any NULL
1206pointer. There is nothing wrong with that if the caller knows it.
1207
1208Wrong use of comments :
1209
1210 | int flsnz8(unsigned int x)
1211 | {
1212 | int ret = 0; /* initialize ret */
1213 | if (x >> 4) { x >>= 4; ret += 4; } /* add 4 to ret if needed */
1214 | return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1; /* add ??? */
1215 | }
1216 | ...
1217 | bit = ~len + (skip << 3) + 9; /* update bit */
1218
1219Right use of comments :
1220
Ilya Shipitsin4f9532d2020-03-06 13:07:38 +05001221 | /* This function returns the position of the highest bit set in the lowest
Willy Tarreau7f051b32011-12-30 17:16:22 +01001222 | * byte of <x>, between 0 and 7. It only works if <x> is non-null. It uses
1223 | * a 32-bit value as a lookup table to return one of 4 values for the
1224 | * highest 16 possible 4-bit values.
1225 | */
1226 | int flsnz8(unsigned int x)
1227 | {
1228 | int ret = 0;
1229 | if (x >> 4) { x >>= 4; ret += 4; }
1230 | return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
1231 | }
1232 | ...
1233 | bit = ~len + (skip << 3) + 9; /* (skip << 3) + (8 - len), saves 1 cycle */
1234
1235
123612) Use of assembly
1237-------------------
1238
1239There are many projects where use of assembly code is not welcome. There is no
1240problem with use of assembly in haproxy, provided that :
1241
1242 a) an alternate C-form is provided for architectures not covered
1243 b) the code is small enough and well commented enough to be maintained
1244
1245It is important to take care of various incompatibilities between compiler
1246versions, for instance regarding output and cloberred registers. There are
1247a number of documentations on the subject on the net. Anyway if you are
1248fiddling with assembly, you probably know that already.
1249
1250Example :
1251 | /* gcc does not know when it can safely divide 64 bits by 32 bits. Use this
1252 | * function when you know for sure that the result fits in 32 bits, because
1253 | * it is optimal on x86 and on 64bit processors.
1254 | */
1255 | static inline unsigned int div64_32(unsigned long long o1, unsigned int o2)
1256 | {
1257 | unsigned int result;
1258 | #ifdef __i386__
1259 | asm("divl %2"
1260 | : "=a" (result)
1261 | : "A"(o1), "rm"(o2));
1262 | #else
1263 | result = o1 / o2;
1264 | #endif
1265 | return result;
1266 | }
1267