blob: 297e67c88d1c31fb04443be1232999ad5e61687b [file] [log] [blame]
Willy Tarreau7f051b32011-12-30 17:16:22 +010012011/12/30 - HAProxy coding style - Willy Tarreau <w@1wt.eu>
2------------------------------------------------------------
3
4A number of contributors are often embarrassed with coding style issues, they
5don't always know if they're doing it right, especially since the coding style
6has elvoved along the years. What is explained here is not necessarily what is
7applied in the code, but new code should as much as possible conform to this
8style. Coding style fixes happen when code is replaced. It is useless to send
9patches to fix coding style only, they will be rejected, unless they belong to
10a patch series which needs these fixes prior to get code changes. Also, please
11avoid fixing coding style in the same patches as functional changes, they make
12code review harder.
13
14When modifying a file, you must accept the terms of the license of this file
15which is recalled at the top of the file, or is explained in the LICENSE file,
16or if not stated, defaults to LGPL version 2.1 or later for files in the
17'include' directory, and GPL version 2 or later for all other files.
18
19When adding a new file, you must add a copyright banner at the top of the
20file with your real name, e-mail address and a reminder of the license.
21Contributions under incompatible licenses or too restrictive licenses might
22get rejected. If in doubt, please apply the principle above for existing files.
23
24All code examples below will intentionally be prefixed with " | " to mark
25where the code aligns with the first column, and tabs in this document will be
26represented as a series of 8 spaces so that it displays the same everywhere.
27
28
291) Indentation and alignment
30----------------------------
31
321.1) Indentation
33----------------
34
35Indentation and alignment are two completely different things that people often
36get wrong. Indentation is used to mark a sub-level in the code. A sub-level
37means that a block is executed in the context of another block (eg: a function
38or a condition) :
39
40 | main(int argc, char **argv)
41 | {
42 | int i;
43 |
44 | if (argc < 2)
45 | exit(1);
46 | }
47
48In the example above, the code belongs to the main() function and the exit()
49call belongs to the if statement. Indentation is made with tabs (\t, ASCII 9),
50which allows any developer to configure their preferred editor to use their
51own tab size and to still get the text properly indented. Exactly one tab is
52used per sub-level. Tabs may only appear at the beginning of a line or after
53another tab. It is illegal to put a tab after some text, as it mangles displays
54in a different manner for different users (particularly when used to align
55comments or values after a #define). If you're tempted to put a tab after some
56text, then you're doing it wrong and you need alignment instead (see below).
57
58Note that there are places where the code was not properly indented in the
59past. In order to view it correctly, you may have to set your tab size to 8
60characters.
61
62
631.2) Alignment
64--------------
65
66Alignment is used to continue a line in a way to makes things easier to group
67together. By definition, alignment is character-based, so it uses spaces. Tabs
68would not work because for one tab there would not be as many characters on all
69displays. For instance, the arguments in a function declaration may be broken
70into multiple lines using alignment spaces :
71
72 | int http_header_match2(const char *hdr, const char *end,
73 | const char *name, int len)
74 | {
75 | ...
76 | }
77
78In this example, the "const char *name" part is aligned with the first
79character of the group it belongs to (list of function arguments). Placing it
80here makes it obvious that it's one of the function's arguments. Multiple lines
81are easy to handle this way. This is very common with long conditions too :
82
83 | if ((len < eol - sol) &&
84 | (sol[len] == ':') &&
85 | (strncasecmp(sol, name, len) == 0)) {
86 | ctx->del = len;
87 | }
88
89If we take again the example above marking tabs with "[-Tabs-]" and spaces
90with "#", we get this :
91
92 | [-Tabs-]if ((len < eol - sol) &&
93 | [-Tabs-]####(sol[len] == ':') &&
94 | [-Tabs-]####(strncasecmp(sol, name, len) == 0)) {
95 | [-Tabs-][-Tabs-]ctx->del = len;
96 | [-Tabs-]}
97
98It is worth noting that some editors tend to confuse indentations and aligment.
99Emacs is notoriously known for this brokenness, and is responsible for almost
100all of the alignment mess. The reason is that Emacs only counts spaces, tries
101to fill as many as possible with tabs and completes with spaces. Once you know
102it, you just have to be careful, as alignment is not used much, so generally it
103is just a matter of replacing the last tab with 8 spaces when this happens.
104
105Indentation should be used everywhere there is a block or an opening brace. It
106is not possible to have two consecutive closing braces on the same column, it
107means that the innermost was not indented.
108
109Right :
110
111 | main(int argc, char **argv)
112 | {
113 | if (argc > 1) {
114 | printf("Hello\n");
115 | }
116 | exit(0);
117 | }
118
119Wrong :
120
121 | main(int argc, char **argv)
122 | {
123 | if (argc > 1) {
124 | printf("Hello\n");
125 | }
126 | exit(0);
127 | }
128
129A special case applies to switch/case statements. Due to my editor's settings,
130I've been used to align "case" with "switch" and to find it somewhat logical
131since each of the "case" statements opens a sublevel belonging to the "switch"
132statement. But indenting "case" after "switch" is accepted too. However in any
133case, whatever follows the "case" statement must be indented, whether or not it
134contains braces :
135
136 | switch (*arg) {
137 | case 'A': {
138 | int i;
139 | for (i = 0; i < 10; i++)
140 | printf("Please stop pressing 'A'!\n");
141 | break;
142 | }
143 | case 'B':
144 | printf("You pressed 'B'\n");
145 | break;
146 | case 'C':
147 | case 'D':
148 | printf("You pressed 'C' or 'D'\n");
149 | break;
150 | default:
151 | printf("I don't know what you pressed\n");
152 | }
153
154
1552) Braces
156---------
157
158Braces are used to delimit multiple-instruction blocks. In general it is
159preferred to avoid braces around single-instruction blocks as it reduces the
160number of lines :
161
162Right :
163
164 | if (argc >= 2)
165 | exit(0);
166
167Wrong :
168
169 | if (argc >= 2) {
170 | exit(0);
171 | }
172
173But it is not that strict, it really depends on the context. It happens from
174time to time that single-instruction blocks are enclosed within braces because
175it makes the code more symmetrical, or more readable. Example :
176
177 | if (argc < 2) {
178 | printf("Missing argument\n");
179 | exit(1);
180 | } else {
181 | exit(0);
182 | }
183
184Braces are always needed to declare a function. A function's opening brace must
185be placed at the beginning of the next line :
186
187Right :
188
189 | int main(int argc, char **argv)
190 | {
191 | exit(0);
192 | }
193
194Wrong :
195
196 | int main(int argc, char **argv) {
197 | exit(0);
198 | }
199
200Note that a large portion of the code still does not conforms to this rule, as
201it took years to me to adapt to this more common standard which I now tend to
202prefer, as it avoids visual confusion when function declarations are broken on
203multiple lines :
204
205Right :
206
207 | int foo(const char *hdr, const char *end,
208 | const char *name, const char *err,
209 | int len)
210 | {
211 | int i;
212
213Wrong :
214
215 | int foo(const char *hdr, const char *end,
216 | const char *name, const char *err,
217 | int len) {
218 | int i;
219
220Braces should always be used where there might be an ambiguity with the code
221later. The most common example is the stacked "if" statement where an "else"
222may be added later at the wrong place breaking the code, but it also happens
223with comments or long arguments in function calls. In general, if a block is
224more than one line long, it should use braces.
225
226Dangerous code waiting of a victim :
227
228 | if (argc < 2)
229 | /* ret must not be negative here */
230 | if (ret < 0)
231 | return -1;
232
233Wrong change :
234
235 | if (argc < 2)
236 | /* ret must not be negative here */
237 | if (ret < 0)
238 | return -1;
239 | else
240 | return 0;
241
242It will do this instead of what your eye seems to tell you :
243
244 | if (argc < 2)
245 | /* ret must not be negative here */
246 | if (ret < 0)
247 | return -1;
248 | else
249 | return 0;
250
251Right :
252
253 | if (argc < 2) {
254 | /* ret must not be negative here */
255 | if (ret < 0)
256 | return -1;
257 | }
258 | else
259 | return 0;
260
261Similarly dangerous example :
262
263 | if (ret < 0)
264 | /* ret must not be negative here */
265 | complain();
266 | init();
267
268Wrong change to silent the annoying message :
269
270 | if (ret < 0)
271 | /* ret must not be negative here */
272 | //complain();
273 | init();
274
275... which in fact means :
276
277 | if (ret < 0)
278 | init();
279
280
2813) Breaking lines
282-----------------
283
284There is no strict rule for line breaking. Some files try to stick to the 80
285column limit, but given that various people use various tab sizes, it does not
286make much sense. Also, code is sometimes easier to read with less lines, as it
287represents less surface on the screen (since each new line adds its tabs and
288spaces). The rule is to stick to the average line length of other lines. If you
289are working in a file which fits in 80 columns, try to keep this goal in mind.
290If you're in a function with 120-chars lines, there is no reason to add many
291short lines, so you can make longer lines.
292
293In general, opening a new block should lead to a new line. Similarly, multiple
294instructions should be avoided on the same line. But some constructs make it
295more readable when those are perfectly aligned :
296
297A copy-paste bug in the following construct will be easier to spot :
298
299 | if (omult % idiv == 0) { omult /= idiv; idiv = 1; }
300 | if (idiv % omult == 0) { idiv /= omult; omult = 1; }
301 | if (imult % odiv == 0) { imult /= odiv; odiv = 1; }
302 | if (odiv % imult == 0) { odiv /= imult; imult = 1; }
303
304than in this one :
305
306 | if (omult % idiv == 0) {
307 | omult /= idiv;
308 | idiv = 1;
309 | }
310 | if (idiv % omult == 0) {
311 | idiv /= omult;
312 | omult = 1;
313 | }
314 | if (imult % odiv == 0) {
315 | imult /= odiv;
316 | odiv = 1;
317 | }
318 | if (odiv % imult == 0) {
319 | odiv /= imult;
320 | imult = 1;
321 | }
322
323What is important is not to mix styles. For instance there is nothing wrong
324with having many one-line "case" statements as long as most of them are this
325short like below :
326
327 | switch (*arg) {
328 | case 'A': ret = 1; break;
329 | case 'B': ret = 2; break;
330 | case 'C': ret = 4; break;
331 | case 'D': ret = 8; break;
332 | default : ret = 0; break;
333 | }
334
335Otherwise, prefer to have the "case" statement on its own line as in the
336example in section 1.2 about alignment. In any case, avoid to stack multiple
337control statements on the same line, so that it will never be the needed to
338add two tab levels at once :
339
340Right :
341
342 | switch (*arg) {
343 | case 'A':
344 | if (ret < 0)
345 | ret = 1;
346 | break;
347 | default : ret = 0; break;
348 | }
349
350Wrong :
351
352 | switch (*arg) {
353 | case 'A': if (ret < 0)
354 | ret = 1;
355 | break;
356 | default : ret = 0; break;
357 | }
358
359Right :
360
361 | if (argc < 2)
362 | if (ret < 0)
363 | return -1;
364
365or Right :
366
367 | if (argc < 2)
368 | if (ret < 0) return -1;
369
370but Wrong :
371
372 | if (argc < 2) if (ret < 0) return -1;
373
374
375When complex conditions or expressions are broken into multiple lines, please
376do ensure that alignment is perfectly appropriate, and group all main operators
377on the same side (which you're free to choose as long as it does not change for
378every block. Putting binary operators on the right side is preferred as it does
379not mangle with alignment but various people have their preferences.
380
381Right :
382
383 | if ((txn->flags & TX_NOT_FIRST) &&
384 | ((req->flags & BF_FULL) ||
385 | req->r < req->lr ||
386 | req->r > req->data + req->size - global.tune.maxrewrite)) {
387 | return 0;
388 | }
389
390Right :
391
392 | if ((txn->flags & TX_NOT_FIRST)
393 | && ((req->flags & BF_FULL)
394 | || req->r < req->lr
395 | || req->r > req->data + req->size - global.tune.maxrewrite)) {
396 | return 0;
397 | }
398
399Wrong :
400
401 | if ((txn->flags & TX_NOT_FIRST) &&
402 | ((req->flags & BF_FULL) ||
403 | req->r < req->lr
404 | || req->r > req->data + req->size - global.tune.maxrewrite)) {
405 | return 0;
406 | }
407
408If it makes the result more readable, parenthesis may even be closed on their
409own line in order to align with the opening one. Note that should normally not
410be needed because such code would be too complex to be digged into.
411
412The "else" statement may either be merged with the closing "if" brace or lie on
413its own line. The later is preferred but it adds one extra line to each control
414block which is annoying in short ones. However, if the "else" is followed by an
415"if", then it should really be on its own line and the rest of the if/else
416blocks must follow the same style.
417
418Right :
419
420 | if (a < b) {
421 | return a;
422 | }
423 | else {
424 | return b;
425 | }
426
427Right :
428
429 | if (a < b) {
430 | return a;
431 | } else {
432 | return b;
433 | }
434
435Right :
436
437 | if (a < b) {
438 | return a;
439 | }
440 | else if (a != b) {
441 | return b;
442 | }
443 | else {
444 | return 0;
445 | }
446
447Wrong :
448
449 | if (a < b) {
450 | return a;
451 | } else if (a != b) {
452 | return b;
453 | } else {
454 | return 0;
455 | }
456
457Wrong :
458
459 | if (a < b) {
460 | return a;
461 | }
462 | else if (a != b) {
463 | return b;
464 | } else {
465 | return 0;
466 | }
467
468
4694) Spacing
470----------
471
472Correctly spacing code is very important. When you have to spot a bug at 3am,
473you need it to be clear. When you expect other people to review your code, you
474want it to be clear and don't want them to get nervous when trying to find what
475you did.
476
477Always place spaces around all binary or ternary operators, commas, as well as
478after semi-colons and opening braces if the line continues :
479
480Right :
481
482 | int ret = 0;
483 | /* if (x >> 4) { x >>= 4; ret += 4; } */
484 | ret += (x >> 4) ? (x >>= 4, 4) : 0;
485 | val = ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
486
487Wrong :
488
489 | int ret=0;
490 | /* if (x>>4) {x>>=4;ret+=4;} */
491 | ret+=(x>>4)?(x>>=4,4):0;
492 | val=ret+((0xFFFFAA50U>>(x<<1))&3)+1;
493
494Never place spaces after unary operators (&, *, -, !, ~, ++, --) nor cast, as
495they might be confused with they binary counterpart, nor before commas or
496semicolons :
497
498Right :
499
500 | bit = !!(~len++ ^ -(unsigned char)*x);
501
502Wrong :
503
504 | bit = ! ! (~len++ ^ - (unsigned char) * x) ;
505
506Note that "sizeof" is a unary operator which is sometimes considered as a
507langage keyword, but in no case it is a function. It does not require
508parenthesis so it is sometimes followed by spaces and sometimes not when
509there are no parenthesis. Most people do not really care as long as what
510is written is unambiguous.
511
512Braces opening a block must be preceeded by one space unless the brace is
513placed on the first column :
514
515Right :
516
517 | if (argc < 2) {
518 | }
519
520Wrong :
521
522 | if (argc < 2){
523 | }
524
525Do not add unneeded spaces inside parenthesis, they just make the code less
526readable.
527
528Right :
529
530 | if (x < 4 && (!y || !z))
531 | break;
532
533Wrong :
534
535 | if ( x < 4 && ( !y || !z ) )
536 | break;
537
538Language keywords must all be followed by a space. This is true for control
539statements (do, for, while, if, else, return, switch, case), and for types
540(int, char, unsigned). As an exception, the last type in a cast does not take
541a space before the closing parenthesis). The "default" statement in a "switch"
542construct is generally just followed by the colon. However the colon after a
543"case" or "default" statement must be followed by a space.
544
545Right :
546
547 | if (nbargs < 2) {
548 | printf("Missing arg at %c\n", *(char *)ptr);
549 | for (i = 0; i < 10; i++) beep();
550 | return 0;
551 | }
552 | switch (*arg) {
553
554Wrong :
555
556 | if(nbargs < 2){
557 | printf("Missing arg at %c\n", *(char*)ptr);
558 | for(i = 0; i < 10; i++)beep();
559 | return 0;
560 | }
561 | switch(*arg) {
562
563Function calls are different, the opening parenthesis is always coupled to the
564function name without any space. But spaces are still needed after commas :
565
566Right :
567
568 | if (!init(argc, argv))
569 | exit(1);
570
571Wrong :
572
573 | if (!init (argc,argv))
574 | exit(1);
575
576
5775) Excess or lack of parenthesis
578--------------------------------
579
580Sometimes there are too many parenthesis in some formulas, sometimes there are
581too few. There are a few rules of thumb for this. The first one is to respect
582the compiler's advice. If it emits a warning and asks for more parenthesis to
583avoid confusion, follow the advice at least to shut the warning. For instance,
584the code below is quite ambiguous due to its alignment :
585
586 | if (var1 < 2 || var2 < 2 &&
587 | var3 != var4) {
588 | /* fail */
589 | return -3;
590 | }
591
592Note that this code does :
593
594 | if (var1 < 2 || (var2 < 2 && var3 != var4)) {
595 | /* fail */
596 | return -3;
597 | }
598
599But maybe the author meant :
600
601 | if ((var1 < 2 || var2 < 2) && var3 != var4) {
602 | /* fail */
603 | return -3;
604 | }
605
606A second rule to put parenthesis is that people don't always know operators
607precedence too well. Most often they have no issue with operators of the same
608category (eg: booleans, integers, bit manipulation, assignment) but once these
609operators are mixed, it causes them all sort of issues. In this case, it is
610wise to use parenthesis to avoid errors. One common error concerns the bit
611shift operators because they're used to replace multiplies and divides but
612don't have the same precedence :
613
614The expression :
615
616 | x = y * 16 + 5;
617
618becomes :
619
620 | x = y << 4 + 5;
621
622which is wrong because it is equivalent to :
623
624 | x = y << (4 + 5);
625
626while the following was desired instead :
627
628 | x = (y << 4) + 5;
629
630It is generally fine to write boolean expressions based on comparisons without
631any parenthesis. But on top of that, integer expressions and assignments should
632then be protected. For instance, there is an error in the expression below
633which should be safely rewritten :
634
635Wrong :
636
637 | if (var1 > 2 && var1 < 10 ||
638 | var1 > 2 + 256 && var2 < 10 + 256 ||
639 | var1 > 2 + 1 << 16 && var2 < 10 + 2 << 16)
640 | return 1;
641
642Right (may remove a few parenthesis depending on taste) :
643
644 | if ((var1 > 2 && var1 < 10) ||
645 | (var1 > (2 + 256) && var2 < (10 + 256)) ||
646 | (var1 > (2 + (1 << 16)) && var2 < (10 + (1 << 16))))
647 | return 1;
648
649The "return" statement is not a function, so it takes no argument. It is a
650control statement which is followed by the expression to be returned. It does
651not need to be followed by parenthesis :
652
653Wrong :
654
655 | int ret0()
656 | {
657 | return(0);
658 | }
659
660Right :
661
662 | int ret0()
663 | {
664 | return 0;
665 | }
666
667Parenthesisis are also found in type casts. Type casting should be avoided as
668much as possible, especially when it concerns pointer types. Casting a pointer
669disables the compiler's type checking and is the best way to get caught doing
670wrong things with data not the size you expect. If you need to manipulate
671multiple data types, you can use a union instead. If the union is really not
672convenient and casts are easier, then try to isolate them as much as possible,
673for instance when initializing function arguments or in another function. Not
674proceeding this way causes huge risks of not using the proper pointer without
675any notification, which is especially true during copy-pastes.
676
677Wrong :
678
679 | void *check_private_data(void *arg1, void *arg2)
680 | {
681 | char *area;
682 |
683 | if (*(int *)arg1 > 1000)
684 | return NULL;
685 | if (memcmp(*(const char *)arg2, "send(", 5) != 0))
686 | return NULL;
687 | area = malloc(*(int *)arg1);
688 | if (!area)
689 | return NULL;
690 | memcpy(area, *(const char *)arg2 + 5, *(int *)arg1);
691 | return area;
692 | }
693
694Right :
695
696 | void *check_private_data(void *arg1, void *arg2)
697 | {
698 | char *area;
699 | int len = *(int *)arg1;
700 | const char *msg = arg2;
701 |
702 | if (len > 1000)
703 | return NULL;
704 | if (memcmp(msg, "send(", 5) != 0)
705 | return NULL;
706 | area = malloc(len);
707 | if (!area)
708 | return NULL;
709 | memcpy(area, msg + 5, len);
710 | return area;
711 | }
712
713
7146) Ambiguous comparisons with zero or NULL
715------------------------------------------
716
717In C, '0' has no type, or it has the type of the variable it is assigned to.
718Comparing a variable or a return value with zero means comparing with the
719representation of zero for this variable's type. For a boolean, zero is false.
720For a pointer, zero is NULL. Very often, to make things shorter, it is fine to
721use the '!' unary operator to compare with zero, as it is shorter and easier to
722remind or understand than a plain '0'. Since the '!' operator is read "not", it
723helps read code faster when what follows it makes sense as a boolean, and it is
724often much more appropriate than a comparison with zero which makes an equal
725sign appear at an undesirable place. For instance :
726
727 | if (!isdigit(*c) && !isspace(*c))
728 | break;
729
730is easier to understand than :
731
732 | if (isdigit(*c) == 0 && isspace(*c) == 0)
733 | break;
734
735For a char this "not" operator can be reminded as "no remaining char", and the
736absence of comparison to zero implies existence of the tested entity, hence the
737simple strcpy() implementation below which automatically stops once the last
738zero is copied :
739
740 | void my_strcpy(char *d, const char *s)
741 | {
742 | while ((*d++ = *s++));
743 | }
744
745Note the double parenthesis in order to avoid the compiler telling us it looks
746like an equality test.
747
748For a string or more generally any pointer, this test may be understood as an
749existence test or a validity test, as the only pointer which will fail to
750validate equality is the NULL pointer :
751
752 | area = malloc(1000);
753 | if (!area)
754 | return -1;
755
756However sometimes it can fool the reader. For instance, strcmp() precisely is
757one of such functions whose return value can make one think the opposite due to
758its name which may be understood as "if strings compare...". Thus it is strongly
759recommended to perform an explicit comparison with zero in such a case, and it
760makes sense considering that the comparison's operator is the same that is
761wanted to compare the strings (note that current config parser lacks a lot in
762this regards) :
763
764 strcmp(a, b) == 0 <=> a == b
765 strcmp(a, b) != 0 <=> a != b
766 strcmp(a, b) < 0 <=> a < b
767 strcmp(a, b) > 0 <=> a > b
768
769Avoid this :
770
771 | if (strcmp(arg, "test"))
772 | printf("this is not a test\n");
773 |
774 | if (!strcmp(arg, "test"))
775 | printf("this is a test\n");
776
777Prefer this :
778
779 | if (strcmp(arg, "test") != 0)
780 | printf("this is not a test\n");
781 |
782 | if (strcmp(arg, "test") == 0)
783 | printf("this is a test\n");
784
785
7867) System call returns
787----------------------
788
789This is not directly a matter of coding style but more of bad habits. It is
790important to check for the correct value upon return of syscalls. The proper
791return code indicating an error is described in its man page. There is no
792reason to consider wider ranges than what is indicated. For instance, it is
793common to see such a thing :
794
795 | if ((fd = open(file, O_RDONLY)) < 0)
796 | return -1;
797
798This is wrong. The man page says that -1 is returned if an error occured. It
799does not suggest that any other negative value will be an error. It is possible
800that a few such issues have been left in existing code. They are bugs for which
801fixes are accepted, eventhough they're currently harmless since open() is not
802known for returning negative values at the moment.
803
804
8058) Declaring new types, names and values
806----------------------------------------
807
808Please refrain from using "typedef" to declare new types, they only obfuscate
809the code. The reader never knows whether he's manipulating a scalar type or a
810struct. For instance it is not obvious why the following code fails to build :
811
812 | int delay_expired(timer_t exp, timer_us_t now)
813 | {
814 | return now >= exp;
815 | }
816
817With the types declared in another file this way :
818
819 | typedef unsigned int timer_t;
820 | typedef struct timeval timer_us_t;
821
822This cannot work because we're comparing a scalar with a struct, which does
823not make sense. Without a typedef, the function would have been written this
824way without any ambiguity and would not have failed :
825
826 | int delay_expired(unsigned int exp, struct timeval *now)
827 | {
828 | return now >= exp->tv_sec;
829 | }
830
831Declaring special values may be done using enums. Enums are a way to define
832structured integer values which are related to each other. They are perfectly
833suited for state machines. While the first element is always assigned the zero
834value, not everybody knows that, especially people working with multiple
835languages all the day. For this reason it is recommended to explicitly force
836the first value even if it's zero. The last element should be followed by a
837comma if it is planned that new elements might later be added, this will make
838later patches shorter. Conversely, if the last element is placed in order to
839get the number of possible values, it must not be followed by a comma and must
840be preceeded by a comment :
841
842 | enum {
843 | first = 0,
844 | second,
845 | third,
846 | fourth,
847 | };
848
849
850 | enum {
851 | first = 0,
852 | second,
853 | third,
854 | fourth,
855 | /* nbvalues must always be placed last */
856 | nbvalues
857 | };
858
859Structure names should be short enough not to mangle function declarations,
860and explicit enough to avoid confusion (which is the most important thing).
861
862Wrong :
863
864 | struct request_args { /* arguments on the query string */
865 | char *name;
866 | char *value;
867 | struct misc_args *next;
868 | };
869
870Right :
871
872 | struct qs_args { /* arguments on the query string */
873 | char *name;
874 | char *value;
875 | struct qs_args *next;
876 | }
877
878
879When declaring new functions or structures, please do not use CamelCase, which
880is a style where upper and lower case are mixed in a single word. It causes a
881lot of confusion when words are composed from acronyms, because it's hard to
882stick to a rule. For instance, a function designed to generate an ISN (initial
883sequence number) for a TCP/IP connection could be called :
884
885 - generateTcpipIsn()
886 - generateTcpIpIsn()
887 - generateTcpIpISN()
888 - generateTCPIPISN()
889 etc...
890
891None is right, none is wrong, these are just preferences which might change
892along the code. Instead, please use an underscore to separate words. Lowercase
893is preferred for the words, but if acronyms are upcased it's not dramatic. The
894real advantage of this method is that it creates unambiguous levels even for
895short names.
896
897Valid examples :
898
899 - generate_tcpip_isn()
900 - generate_tcp_ip_isn()
901 - generate_TCPIP_ISN()
902 - generate_TCP_IP_ISN()
903
904Another example is easy to understand when 3 arguments are involved in naming
905the function :
906
907Wrong (naming conflict) :
908
909 | /* returns A + B * C */
910 | int mulABC(int a, int b, int c)
911 | {
912 | return a + b * c;
913 | }
914 |
915 | /* returns (A + B) * C */
916 | int mulABC(int a, int b, int c)
917 | {
918 | return (a + b) * c;
919 | }
920
921Right (unambiguous naming) :
922
923 | /* returns A + B * C */
924 | int mul_a_bc(int a, int b, int c)
925 | {
926 | return a + b * c;
927 | }
928 |
929 | /* returns (A + B) * C */
930 | int mul_ab_c(int a, int b, int c)
931 | {
932 | return (a + b) * c;
933 | }
934
935Whenever you manipulate pointers, try to declare them as "const", as it will
936save you from many accidental misuses and will only cause warnings to be
937emitted when there is a real risk. In the examples below, it is possible to
938call my_strcpy() with a const string only in the first declaration. Note that
939people who ignore "const" are often the ones who cast a lot and who complain
940from segfaults when using strtok() !
941
942Right :
943
944 | void my_strcpy(char *d, const char *s)
945 | {
946 | while ((*d++ = *s++));
947 | }
948 |
949 | void say_hello(char *dest)
950 | {
951 | my_strcpy(dest, "hello\n");
952 | }
953
954Wrong :
955
956 | void my_strcpy(char *d, char *s)
957 | {
958 | while ((*d++ = *s++));
959 | }
960 |
961 | void say_hello(char *dest)
962 | {
963 | my_strcpy(dest, "hello\n");
964 | }
965
966
9679) Getting macros right
968-----------------------
969
970It is very common for macros to do the wrong thing when used in a way their
971author did not have in mind. For this reason, macros must always be named with
972uppercase letters only. This is the only way to catch the developer's eye when
973using them, so that he double-checks whether he's taking risks or not. First,
974macros must never ever be terminated by a semi-colon, or they will close the
975wrong block once in a while. For instance, the following will cause a build
976error before the "else" due to the double semi-colon :
977
978Wrong :
979
980 | #define WARN printf("warning\n");
981 | ...
982 | if (a < 0)
983 | WARN;
984 | else
985 | a--;
986
987Right :
988
989 | #define WARN printf("warning\n")
990
991If multiple instructions are needed, then use a do { } while (0) block, which
992is the only construct which respects *exactly* the semantics of a single
993instruction :
994
995 | #define WARN do { printf("warning\n"); log("warning\n"); } while (0)
996 | ...
997 |
998 | if (a < 0)
999 | WARN;
1000 | else
1001 | a--;
1002
1003Second, do not put unprotected control statements in macros, they will
1004definitely cause bugs :
1005
1006Wrong :
1007
1008 | #define WARN if (verbose) printf("warning\n")
1009 | ...
1010 | if (a < 0)
1011 | WARN;
1012 | else
1013 | a--;
1014
1015Which is equivalent to the undesired form below :
1016
1017 | if (a < 0)
1018 | if (verbose)
1019 | printf("warning\n");
1020 | else
1021 | a--;
1022
1023Right way to do it :
1024
1025 | #define WARN do { if (verbose) printf("warning\n"); } while (0)
1026 | ...
1027 | if (a < 0)
1028 | WARN;
1029 | else
1030 | a--;
1031
1032Which is equivalent to :
1033
1034 | if (a < 0)
1035 | do { if (verbose) printf("warning\n"); } while (0);
1036 | else
1037 | a--;
1038
1039Macro parameters must always be surrounded by parenthesis, and must never be
1040duplicated in the same macro unless explicitly stated. Also, macros must not be
1041defined with operators without surrounding parenthesis. The MIN/MAX macros are
1042a pretty common example of multiple misuses, but this happens as early as when
1043using bit masks. Most often, in case of any doubt, try to use inline functions
1044instead.
1045
1046Wrong :
1047
1048 | #define MIN(a, b) a < b ? a : b
1049 |
1050 | /* returns 2 * min(a,b) + 1 */
1051 | int double_min_p1(int a, int b)
1052 | {
1053 | return 2 * MIN(a, b) + 1;
1054 | }
1055
1056What this will do :
1057
1058 | int double_min_p1(int a, int b)
1059 | {
1060 | return 2 * a < b ? a : b + 1;
1061 | }
1062
1063Which is equivalent to :
1064
1065 | int double_min_p1(int a, int b)
1066 | {
1067 | return (2 * a) < b ? a : (b + 1);
1068 | }
1069
1070The first thing to fix is to surround the macro definition with parenthesis to
1071avoid this mistake :
1072
1073 | #define MIN(a, b) (a < b ? a : b)
1074
1075But this is still not enough, as can be seen in this example :
1076
1077 | /* compares either a or b with c */
1078 | int min_ab_c(int a, int b, int c)
1079 | {
1080 | return MIN(a ? a : b, c);
1081 | }
1082
1083Which is equivalent to :
1084
1085 | int min_ab_c(int a, int b, int c)
1086 | {
1087 | return (a ? a : b < c ? a ? a : b : c);
1088 | }
1089
1090Which in turn means a totally different thing due to precedence :
1091
1092 | int min_ab_c(int a, int b, int c)
1093 | {
1094 | return (a ? a : ((b < c) ? (a ? a : b) : c));
1095 | }
1096
1097This can be fixed by surrounding *each* argument in the macro with parenthesis:
1098
1099 | #define MIN(a, b) ((a) < (b) ? (a) : (b))
1100
1101But this is still not enough, as can be seen in this example :
1102
1103 | int min_ap1_b(int a, int b)
1104 | {
1105 | return MIN(++a, b);
1106 | }
1107
1108Which is equivalent to :
1109
1110 | int min_ap1_b(int a, int b)
1111 | {
1112 | return ((++a) < (b) ? (++a) : (b));
1113 | }
1114
1115Again, this is wrong because "a" is incremented twice if below b. The only way
1116to fix this is to use a compound statement and to assign each argument exactly
1117once to a local variable of the same type :
1118
1119 | #define MIN(a, b) ({ typeof(a) __a = (a); typeof(b) __b = (b); \
1120 | ((__a) < (__b) ? (__a) : (__b)); \
1121 | })
1122
1123At this point, using static inline functions is much cleaner if a single type
1124is to be used :
1125
1126 | static inline int min(int a, int b)
1127 | {
1128 | return a < b ? a : b;
1129 | }
1130
1131
113210) Includes
1133------------
1134
1135Includes are as much as possible listed in alphabetically ordered groups :
1136 - the libc-standard includes (those without any path component)
1137 - the includes more or less system-specific (sys/*, netinet/*, ...)
1138 - includes from the local "common" subdirectory
1139 - includes from the local "types" subdirectory
1140 - includes from the local "proto" subdirectory
1141
1142Each section is just visually delimited from the other ones using an empty
1143line. The two first ones above may be merged into a single section depending on
1144developer's preference. Please do not copy-paste include statements from other
1145files. Having too many includes significantly increases build time and makes it
1146hard to find which ones are needed later. Just include what you need and if
1147possible in alphabetical order so that when something is missing, it becomes
1148obvious where to look for it and where to add it.
1149
1150All files should include <common/config.h> because this is where build options
1151are prepared.
1152
1153Header files are split in two directories ("types" and "proto") depending on
1154what they provide. Types, structures, enums and #defines must go into the
1155"types" directory. Function prototypes and inlined functions must go into the
1156"proto" directory. This split is because of inlined functions which
1157cross-reference types from other files, which cause a chicken-and-egg problem
1158if the functions and types are declared at the same place.
1159
1160All headers which do not depend on anything currently go to the "common"
1161subdirectory, but are equally well placed into the "proto" directory. It is
1162possible that one day the "common" directory will disappear.
1163
1164Include files must be protected against multiple inclusion using the common
1165#ifndef/#define/#endif trick with a tag derived from the include file and its
1166location.
1167
1168
116911) Comments
1170------------
1171
1172Comments are preferably of the standard 'C' form using /* */. The C++ form "//"
1173are tolerated for very short comments (eg: a word or two) but should be avoided
1174as much as possible. Multi-line comments are made with each intermediate line
1175starting with a star aligned with the first one, as in this example :
1176
1177 | /*
1178 | * This is a multi-line
1179 | * comment.
1180 | */
1181
1182If multiple code lines need a short comment, try to align them so that you can
1183have multi-line sentences. This is rarely needed, only for really complex
1184constructs.
1185
1186Do not tell what you're doing in comments, but explain why you're doing it if
1187it seems not to be obvious. Also *do* indicate at the top of function what they
1188accept and what they don't accept. For instance, strcpy() only accepts output
1189buffers at least as large as the input buffer, and does not support any NULL
1190pointer. There is nothing wrong with that if the caller knows it.
1191
1192Wrong use of comments :
1193
1194 | int flsnz8(unsigned int x)
1195 | {
1196 | int ret = 0; /* initialize ret */
1197 | if (x >> 4) { x >>= 4; ret += 4; } /* add 4 to ret if needed */
1198 | return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1; /* add ??? */
1199 | }
1200 | ...
1201 | bit = ~len + (skip << 3) + 9; /* update bit */
1202
1203Right use of comments :
1204
1205 | /* This function returns the positoin of the highest bit set in the lowest
1206 | * byte of <x>, between 0 and 7. It only works if <x> is non-null. It uses
1207 | * a 32-bit value as a lookup table to return one of 4 values for the
1208 | * highest 16 possible 4-bit values.
1209 | */
1210 | int flsnz8(unsigned int x)
1211 | {
1212 | int ret = 0;
1213 | if (x >> 4) { x >>= 4; ret += 4; }
1214 | return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
1215 | }
1216 | ...
1217 | bit = ~len + (skip << 3) + 9; /* (skip << 3) + (8 - len), saves 1 cycle */
1218
1219
122012) Use of assembly
1221-------------------
1222
1223There are many projects where use of assembly code is not welcome. There is no
1224problem with use of assembly in haproxy, provided that :
1225
1226 a) an alternate C-form is provided for architectures not covered
1227 b) the code is small enough and well commented enough to be maintained
1228
1229It is important to take care of various incompatibilities between compiler
1230versions, for instance regarding output and cloberred registers. There are
1231a number of documentations on the subject on the net. Anyway if you are
1232fiddling with assembly, you probably know that already.
1233
1234Example :
1235 | /* gcc does not know when it can safely divide 64 bits by 32 bits. Use this
1236 | * function when you know for sure that the result fits in 32 bits, because
1237 | * it is optimal on x86 and on 64bit processors.
1238 | */
1239 | static inline unsigned int div64_32(unsigned long long o1, unsigned int o2)
1240 | {
1241 | unsigned int result;
1242 | #ifdef __i386__
1243 | asm("divl %2"
1244 | : "=a" (result)
1245 | : "A"(o1), "rm"(o2));
1246 | #else
1247 | result = o1 / o2;
1248 | #endif
1249 | return result;
1250 | }
1251