2021-11-08 - Indirect Strings (IST) API


1. Background
-------------

When parsing traffic, most of the standard C string functions are unusable
since they rely on a trailing zero. In addition, for the rare ones that support
a length, we have to constantly maintain both the pointer and the length. But
then, it's easy to end up with complex length and offset calculations all over
the place, making the code hard to read and bugs hard to avoid or spot.

IST provides a solution to this by defining a structure made of exactly two
word-sized elements, which most C ABIs know how to pass in registers when used
as a function argument or a function's return value. The functions are inlined
to give the compiler as many opportunities as possible for optimization and
expression reduction, and as a result they are often inexpensive to use. It is
important however to keep in mind that all of these are designed for minimal
code size when dealing with short strings (i.e. parsing tokens in protocols),
and they are not optimal for processing large blocks.


2. API description
------------------

IST are defined like this:

  struct ist {
      char  *ptr;    // pointer to the string's first byte
      size_t len;    // number of valid bytes starting from ptr
  };

A string is not set if its .ptr member is NULL. In this case .len is
unspecified, and it is recommended to set it to zero.
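
As an illustration of the difference between an unset string and an empty one,
here is a minimal sketch. It does not include the real header: struct ist,
IST_NULL, ist2() and isttest() are re-declared locally as simplified stand-ins
based on the descriptions in this document, and find_value() is a hypothetical
helper written only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins mirroring what this document describes. The real
 * definitions come from the IST header and may differ in detail.
 */
struct ist {
	char  *ptr;    /* pointer to the string's first byte */
	size_t len;    /* number of valid bytes starting from ptr */
};

#define IST_NULL ((struct ist){ .ptr = NULL, .len = 0 })

static inline struct ist ist2(const void *ptr, size_t len)
{
	return (struct ist){ .ptr = (char *)ptr, .len = len };
}

static inline int isttest(struct ist i)
{
	return i.ptr != NULL;
}

/* Hypothetical helper: returns what follows the first '=' in <s>, or an
 * unset IST when <s> contains no '='. For "key=" the result is set but
 * empty (non-NULL pointer, zero length).
 */
static struct ist find_value(const char *s)
{
	const char *eq = strchr(s, '=');

	if (!eq)
		return IST_NULL;
	return ist2(eq + 1, strlen(eq + 1));
}
```

A caller would check isttest() first to tell "no value at all" apart from "an
empty value", and only then look at the length.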

Declaring a function returning an IST:

  struct ist produce_ist(int ok)
  {
      return ok ? IST("OK") : IST("KO");
  }

Declaring a function consuming an IST:

  void say_ist(struct ist i)
  {
      write(1, istptr(i), istlen(i));
  }

Chaining the two:

  void say_ok(int ok)
  {
      say_ist(produce_ist(ok));
  }

Notes:
  - the arguments are passed by value, not by reference, so there's no need
    for any "const" in their declaration (except to catch coding mistakes).
    Pointers to ist may benefit from being marked "const" however.

  - similarly for the return value, there's no point in marking it "const" as
    this would protect the pointer and length, not the data.

  - use ist0() to append a trailing zero to a variable string for use with
    printf()'s "%s" format, or for use with functions that work on
    NUL-terminated strings, but take care never to do this with constants.

  - the API provides a starting pointer and current length, but does not
    provide an allocated size. It remains up to the caller to know how large
    the allocated area is when adding data, though most functions make this
    easy.
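
The last two notes can be sketched as follows. This is only an illustration
under stated assumptions: struct ist, ist2() and ist0() are simplified local
stand-ins for the real ones, and copy_for_printf() is a hypothetical helper. It
shows the caller tracking the allocated size itself and keeping one spare byte
so that ist0() has a writable place for the trailing zero.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins; the real definitions live in the IST header. */
struct ist {
	char  *ptr;
	size_t len;
};

static inline struct ist ist2(const void *ptr, size_t len)
{
	return (struct ist){ .ptr = (char *)ptr, .len = len };
}

/* Stand-in for ist0(): writes a NUL right after the last valid byte and
 * returns the start of the string. The caller must guarantee that this
 * byte is writable and lies inside the underlying buffer.
 */
static inline char *ist0(struct ist i)
{
	i.ptr[i.len] = 0;
	return i.ptr;
}

/* Hypothetical helper: copies <src> into the caller's buffer <buf> of
 * <size> bytes, reserving one byte for the NUL that ist0() may write
 * later. Returns an unset IST if <src> does not fit.
 */
static struct ist copy_for_printf(char *buf, size_t size, struct ist src)
{
	if (!size || src.len > size - 1)
		return ist2(NULL, 0);
	memcpy(buf, src.ptr, src.len);
	return ist2(buf, src.len);
}
```

Calling ist0() directly on a token that points into a string constant would
write into read-only data, which is the pitfall the note above warns about.
Copying the token into a private buffer first avoids it.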

The following macros and functions are defined. Those whose name starts with
underscores require special care and must not be used without being certain
they are properly used (they are typically subject to buffer overflows if
misused). Note that most functions were added over time depending on immediate
needs, and some are very close to each other. Many useful functions are still
missing and would deserve to be added.

Below, arguments "i1","i2" are all of type "ist". Arguments "s" are
NUL-terminated strings of type "char*", and "cs" are of type "const char *".
Arguments "c" are of type "char", and "n" are of type size_t.

  IST(cs):ist              make constant IST from a NUL-terminated const string
  IST_NULL:ist             return an unset IST = ist2(NULL,0)
  __istappend(i1,c):ist    append character <c> at the end of ist <i1>
  ist(s):ist               return an IST from a NUL-terminated string
  ist0(i1):char*           write a \0 at the end of an IST, return the string
  ist2(cs,l):ist           return a variable IST from a const string and length
  ist2bin(s,i1):ist        copy IST into a buffer, return the result
  ist2bin_lc(s,i1):ist     like ist2bin() but turning to lower case
  ist2bin_uc(s,i1):ist     like ist2bin() but turning to upper case
  ist2str(s,i1):ist        copy IST into a buffer, add NUL and return the result
  ist2str_lc(s,i1):ist     like ist2str() but turning to lower case
  ist2str_uc(s,i1):ist     like ist2str() but turning to upper case
  ist_find(i1,c):ist       return first occurrence of char <c> in <i1>
  ist_find_ctl(i1):char*   return pointer to first CTL char in <i1> or NULL
  ist_skip(i1,c):ist       return first occurrence of char not <c> in <i1>
  istadv(i1,n):ist         advance the string by <n> characters
  istalloc(n):ist          return allocated string of zero initial length
  istcat(d,s,n):ssize_t    copy <s> after <d> for <n> chars max, return len or -1
  istchr(i1,c):char*       return pointer to first occurrence of <c> in <i1>
  istclear(i1*):size_t     return previous size and set size to zero
  istcpy(d,s,n):ssize_t    copy <s> over <d> for <n> chars max, return len or -1
  istdiff(i1,i2):int       return the ordinal difference, like strcmp()
  istdup(i1):ist           allocate new ist and copy original one into it
  istend(i1):char*         return pointer to first character after the IST
  isteq(i1,i2):int         return non-zero if strings are equal
  isteqi(i1,i2):int        like isteq() but case-insensitive
  istfree(i1*)             free allocated <i1> (or IST_NULL), set it to IST_NULL
  istissame(i1,i2):int     return true if pointers and lengths are equal
  istist(i1,i2):ist        return first occurrence of <i2> in <i1>
  istlen(i1):size_t        return the length of the IST (number of characters)
  istmatch(i1,i2):int      return non-zero if i1 starts like i2 (empty OK)
  istmatchi(i1,i2):int     like istmatch() but case-insensitive
  istneq(i1,i2,n):int      like isteq() but limited to the first <n> chars
  istnext(i1):ist          return the IST advanced by one character
  istnmatch(i1,i2,n):int   like istmatch() but limited to the first <n> chars
  istpad(s,i1):ist         copy IST into a buffer, add a NUL, return the result
  istptr(i1):char*         return the starting pointer of the IST
  istscat(d,s,n):ssize_t   same as istcat() but always place a NUL at the end
  istscpy(d,s,n):ssize_t   same as istcpy() but always place a NUL at the end
  istshift(i1*):char       return the first character and advance the IST by one
  istsplit(i1*,c):ist      return part before <c>, make ist start after <c>
  iststop(i1,c):ist        truncate ist before first occurrence of <c>
  isttest(i1):int          return true if ist is not NULL, false otherwise
  isttrim(i1,n):ist        return ist trimmed to no more than <n> characters
  istzero(i1,n):ist        trim to <n> chars, trailing zero included.
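
To show how a few of these functions chain together, here is a small sketch
that extracts space-separated fields from a request line. It is written under
stated assumptions: IST(), ist2(), istadv(), iststop(), istsplit() and isteq()
are simplified local stand-ins following the descriptions above (with
istsplit() assumed to leave the remainder starting right after the delimiter),
and nth_field() is a hypothetical helper that exists only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins; the real definitions live in the IST header. */
struct ist {
	char  *ptr;
	size_t len;
};

#define IST(str) ((struct ist){ .ptr = str "", .len = sizeof(str "") - 1 })

static inline struct ist ist2(const void *ptr, size_t len)
{
	return (struct ist){ .ptr = (char *)ptr, .len = len };
}

/* advance by <n> characters, stopping at the end of the string */
static inline struct ist istadv(struct ist i, size_t n)
{
	if (n > i.len)
		n = i.len;
	return ist2(i.ptr + n, i.len - n);
}

/* truncate before the first occurrence of <c>, if any */
static inline struct ist iststop(struct ist i, char c)
{
	size_t len = 0;

	while (len < i.len && i.ptr[len] != c)
		len++;
	return ist2(i.ptr, len);
}

/* return the part before <c>, leave <*i> starting after <c> */
static inline struct ist istsplit(struct ist *i, char c)
{
	struct ist head = iststop(*i, c);

	*i = istadv(*i, head.len + 1);
	return head;
}

static inline int isteq(struct ist a, struct ist b)
{
	return a.len == b.len && (!a.len || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Hypothetical helper: returns field number <n> (starting at 0) of <line>
 * split on <delim>, or an empty string if there are fewer fields. <line> is
 * passed by value, so the caller's copy is left untouched.
 */
static struct ist nth_field(struct ist line, char delim, size_t n)
{
	struct ist field = istsplit(&line, delim);

	while (n--)
		field = istsplit(&line, delim);
	return field;
}
```

No explicit length arithmetic appears in nth_field(): the pointer and length
always travel together and are adjusted by the split itself.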


3. Quick index by typical C construct or function
-------------------------------------------------

Some common C constructs may be adjusted to use ist instead. The mapping is not
always one-to-one, but usually the computations on the length part tend to
disappear in the refactoring, making it possible to chain function calls
directly. The entries below are hints to figure out which function to look for
in order to rewrite some common use cases.

  char*                 IST equivalent

  strchr()              istchr(), ist_find(), iststop()
  strstr()              istist()
  strcpy()              istcpy()
  strscpy()             istscpy()
  strlcpy()             istscpy()
  strcat()              istcat()
  strscat()             istscat()
  strlcat()             istscat()
  strcmp()              istdiff()
  strdup()              istdup()
  !strcmp()             isteq()
  !strncmp()            istneq(), istmatch(), istnmatch()
  !strcasecmp()         isteqi()
  !strncasecmp()        istneqi(), istmatchi()
  strtok()              istsplit()
  return NULL           return IST_NULL
  s = malloc()          s = istalloc()
  free(s); s = NULL     istfree(&s)
  p != NULL             isttest(p)
  c = *(p++)            c = istshift(&p)
  *(p++) = c            __istappend(p, c)
  p += n                istadv(p, n)
  p + strlen(p)         istend(p)
  p[max] = 0            isttrim(p, max)
  p[max+1] = 0          istzero(p, max)
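
As an example of such a rewrite, the sketch below puts a pointer-plus-length
version of a prefix check next to an IST version. Both functions are
hypothetical and written only for this illustration, and IST(), IST_NULL,
ist2(), istadv() and istmatch() are simplified local stand-ins for the real
ones, based on the descriptions in the previous section.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins; the real definitions live in the IST header. */
struct ist {
	char  *ptr;
	size_t len;
};

#define IST(str) ((struct ist){ .ptr = str "", .len = sizeof(str "") - 1 })
#define IST_NULL ((struct ist){ .ptr = NULL, .len = 0 })

static inline struct ist ist2(const void *ptr, size_t len)
{
	return (struct ist){ .ptr = (char *)ptr, .len = len };
}

/* advance by <n> characters, stopping at the end of the string */
static inline struct ist istadv(struct ist i, size_t n)
{
	if (n > i.len)
		n = i.len;
	return ist2(i.ptr + n, i.len - n);
}

/* non-zero if <a> starts like <b>; an empty <b> always matches */
static inline int istmatch(struct ist a, struct ist b)
{
	return a.len >= b.len && (!b.len || memcmp(a.ptr, b.ptr, b.len) == 0);
}

/* char* version: the caller carries <*len> next to <p> and both have to
 * be updated consistently by hand.
 */
static const char *skip_prefix_cstr(const char *p, size_t *len, const char *pfx)
{
	size_t plen = strlen(pfx);

	if (*len < plen || strncmp(p, pfx, plen) != 0)
		return NULL;
	*len -= plen;
	return p + plen;
}

/* IST version: "!strncmp()" becomes istmatch(), "p += n" becomes
 * istadv(), and "return NULL" becomes "return IST_NULL".
 */
static struct ist skip_prefix(struct ist s, struct ist pfx)
{
	if (!istmatch(s, pfx))
		return IST_NULL;
	return istadv(s, pfx.len);
}
```

In the IST version the length bookkeeping is gone from the caller's point of
view, and the result can be passed straight to another function taking an ist.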