2021-11-08 - Indirect Strings (IST) API


1. Background
-------------

When parsing traffic, most of the standard C string functions are unusable
since they rely on a trailing zero. In addition, for the rare ones that support
a length, we have to constantly maintain both the pointer and the length. But
then, it's easy to come up with complex length and offset calculations all
over the place, rendering the code hard to read and bugs hard to avoid or spot.

IST provides a solution to this by defining a structure made of exactly two
word-size elements, which most C ABIs know how to pass in registers when used
as a function argument or a function's return value. The functions are inlined
to leave a maximum set of opportunities to the compiler for optimization and
expression reduction, and as a result they are often inexpensive to use. It is
important however to keep in mind that all of these are designed for minimal
code size when dealing with short strings (e.g. parsing tokens in protocols),
and they are not optimal for processing large blocks.
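The idea of passing the pointer and the length together, by value, can be
sketched as follows. This is only an illustration and not code taken from the
IST header: "struct ist" is redeclared locally with the two-member layout this
document describes, and token_before() is a hypothetical helper invented for
the example.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in with the two-member layout described in this document. */
struct ist {
	char  *ptr;
	size_t len;
};

/* Hypothetical helper: returns the part of <in> located before the first
 * occurrence of <delim>, or <in> unchanged when <delim> is absent. The
 * pointer and the length travel together in both directions, so the caller
 * has no separate offset to maintain.
 */
static inline struct ist token_before(struct ist in, char delim)
{
	char *p = in.len ? memchr(in.ptr, delim, in.len) : NULL;

	if (p)
		in.len = (size_t)(p - in.ptr);
	return in;
}
```

The input does not need to be NUL-terminated, and the result points into the
same memory area as the input: nothing is copied.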


2. API description
------------------

An IST is defined like this:

  struct ist {
      char  *ptr;  // pointer to the string's first byte
      size_t len;  // number of valid bytes starting from ptr
  };

A string is not set if its .ptr member is NULL. In this case .len is undefined,
and it is recommended to set it to zero.
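The difference between an unset string and an empty one can be sketched as
follows. The names "unset" and is_set() are hypothetical and only used for this
illustration; they are local stand-ins, not the definitions from the IST
header.

```c
#include <stddef.h>

/* Local stand-in with the layout shown above. */
struct ist {
	char  *ptr;
	size_t len;
};

/* An unset string: NULL pointer, with the length kept at zero as recommended. */
static const struct ist unset = { NULL, 0 };

/* Hypothetical helper: a string is set as soon as its pointer is not NULL,
 * regardless of its length.
 */
static inline int is_set(struct ist i)
{
	return i.ptr != NULL;
}
```

With these definitions, a zero-length string that points to valid memory is
still considered set, while "unset" is not.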

Declaring a function returning an IST:

  struct ist produce_ist(int ok)
  {
      return ok ? IST("OK") : IST("KO");
  }

Declaring a function consuming an IST:

  void say_ist(struct ist i)
  {
      write(1, istptr(i), istlen(i));
  }

Chaining the two:

  void say_ok(int ok)
  {
      say_ist(produce_ist(ok));
  }

Notes:
  - the arguments are passed by value, not by reference, so there's no need
    for any "const" in their declaration (except to catch coding mistakes).
    Pointers to ist may benefit from being marked "const" however.

  - similarly for the return value, there's no point in marking it "const" as
    this would protect the pointer and length, not the data.

  - use ist0() to append a trailing zero to a variable string for use with
    printf()'s "%s" format, or for use with functions that work on
    NUL-terminated strings, but be careful never to do this with constants.

  - the API provides a starting pointer and current length, but does not
    provide an allocated size. It remains up to the caller to know how large
    the allocated area is when adding data, though most functions make this
    easy.
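The caveat about ist0() comes from the fact that writing the trailing zero
modifies the byte located right after the string. This is not permitted on
constants, and it overwrites data when the IST designates the middle of a
larger buffer. One way around it is to copy the string into a caller-owned
buffer first. The sketch below assumes a hypothetical helper, ist_to_cstr(),
which is not part of the API and takes the destination size explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in with the layout shown above. */
struct ist {
	char  *ptr;
	size_t len;
};

/* Hypothetical helper: copies <src> into <dst>, which is <size> bytes long,
 * truncating if needed, and always adds the trailing NUL when <size> is not
 * zero. Returns the number of bytes copied, NUL excluded. The source is
 * never written to.
 */
static size_t ist_to_cstr(char *dst, size_t size, struct ist src)
{
	size_t n;

	if (!size)
		return 0;

	n = src.len < size - 1 ? src.len : size - 1;
	if (n)
		memcpy(dst, src.ptr, n);
	dst[n] = 0;
	return n;
}
```

The copy can then be passed to printf()'s "%s" or to any function expecting a
NUL-terminated string, while the original data remain untouched.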

The following macros and functions are defined. Those whose name starts with
underscores require special care and must not be used without being certain
they are properly used (typically subject to buffer overflows if misused). Note
that most functions were added over time as immediate needs arose, and some
are very close to each other. Many useful functions are still missing and would
deserve to be added.
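To illustrate why an unchecked function needs special care, here is a sketch of
an append operation that trusts the caller, together with a bounded wrapper.
Both functions are hypothetical stand-ins written for this example (they are
not the header's implementation), and the allocated size is passed separately
by the caller since the structure does not carry it.

```c
#include <stddef.h>

/* Local stand-in with the layout shown above. */
struct ist {
	char  *ptr;
	size_t len;
};

/* Stand-in for an unchecked append: it writes one byte past the current
 * length without verifying that the storage is large enough.
 */
static inline struct ist unchecked_append(struct ist dst, char c)
{
	dst.ptr[dst.len++] = c;
	return dst;
}

/* Hypothetical bounded wrapper: <size> is the allocated size of the area
 * pointed to by dst.ptr. The character is silently dropped when the area
 * is already full.
 */
static inline struct ist append_bounded(struct ist dst, size_t size, char c)
{
	if (dst.len < size)
		dst = unchecked_append(dst, c);
	return dst;
}
```

Since the string is passed by value, the caller has to assign the result back,
for example "s = append_bounded(s, sizeof(buf), c)".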

Below, arguments "i1" and "i2" are all of type "ist". Arguments "s" are
NUL-terminated strings of type "char*", and "cs" are of type "const char *".
Arguments "c" are of type "char", and "n" are of type size_t.

  IST(cs):ist             make constant IST from a NUL-terminated const string
  IST_NULL:ist            return an unset IST = ist2(NULL,0)
  __istappend(i1,c):ist   append character <c> at the end of ist <i1>
  ist(s):ist              return an IST from a NUL-terminated string
  ist0(i1):char*          write a \0 at the end of an IST, return the string
  ist2(cs,l):ist          return a variable IST from a const string and length
  ist2bin(s,i1):ist       copy IST into a buffer, return the result
  ist2bin_lc(s,i1):ist    like ist2bin() but turning to lower case
  ist2bin_uc(s,i1):ist    like ist2bin() but turning to upper case
  ist2str(s,i1):ist       copy IST into a buffer, add NUL and return the result
  ist2str_lc(s,i1):ist    like ist2str() but turning to lower case
  ist2str_uc(s,i1):ist    like ist2str() but turning to upper case
  ist_find(i1,c):ist      return first occurrence of char <c> in <i1>
  ist_find_ctl(i1):char*  return pointer to first CTL char in <i1> or NULL
  ist_skip(i1,c):ist      return first occurrence of char not <c> in <i1>
  istadv(i1,n):ist        advance the string by <n> characters
  istalloc(n):ist         return allocated string of zero initial length
  istcat(d,s,n):ssize_t   copy <s> after <d> for <n> chars max, return len or -1
  istchr(i1,c):char*      return pointer to first occurrence of <c> in <i1>
  istclear(i1*):size_t    return previous size and set size to zero
  istcpy(d,s,n):ssize_t   copy <s> over <d> for <n> chars max, return len or -1
  istdiff(i1,i2):int      return the ordinal difference, like strcmp()
  istdup(i1):ist          allocate new ist and copy original one into it
  istend(i1):char*        return pointer to first character after the IST
  isteq(i1,i2):int        return non-zero if strings are equal
  isteqi(i1,i2):int       like isteq() but case-insensitive
  istfree(i1*)            free allocated <i1>/IST_NULL and set it to IST_NULL
  istissame(i1,i2):int    return true if pointers and lengths are equal
  istist(i1,i2):ist       return first occurrence of <i2> in <i1>
  istlen(i1):size_t       return the length of the IST (number of characters)
  istmatch(i1,i2):int     return non-zero if i1 starts like i2 (empty OK)
  istmatchi(i1,i2):int    like istmatch() but case-insensitive
  istneq(i1,i2,n):int     like isteq() but limited to the first <n> chars
  istnext(i1):ist         return the IST advanced by one character
  istnmatch(i1,i2,n):int  like istmatch() but limited to the first <n> chars
  istpad(s,i1):ist        copy IST into a buffer, add a NUL, return the result
  istptr(i1):char*        return the starting pointer of the IST
  istscat(d,s,n):ssize_t  same as istcat() but always place a NUL at the end
  istscpy(d,s,n):ssize_t  same as istcpy() but always place a NUL at the end
  istshift(i1*):char      return the first character and advance the IST by one
  istsplit(i1*,c):ist     return part before <c>, make ist start from <c>
  iststop(i1,c):ist       truncate ist before first occurrence of <c>
  isttest(i1):int         return true if ist is not NULL, false otherwise
  isttrim(i1,n):ist       return ist trimmed to no more than <n> characters
  istzero(i1,n):ist       trim to <n> chars, trailing zero included
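As an illustration of how such functions chain together, the sketch below
looks for a token in a comma-separated list. The my_iststop(), my_istadv() and
my_isteq() functions are simplified local stand-ins written from the
descriptions in the table above so that the example compiles on its own; they
are assumptions and may differ from the real implementations in corner cases.
list_has_token() is a hypothetical function invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in with the layout shown above. */
struct ist {
	char  *ptr;
	size_t len;
};

/* Stand-in: truncate <i> before the first occurrence of <c>, if any. */
static inline struct ist my_iststop(struct ist i, char c)
{
	char *p = i.len ? memchr(i.ptr, c, i.len) : NULL;

	if (p)
		i.len = (size_t)(p - i.ptr);
	return i;
}

/* Stand-in: advance <i> by <n> characters, never past its end. */
static inline struct ist my_istadv(struct ist i, size_t n)
{
	if (n > i.len)
		n = i.len;
	i.ptr += n;
	i.len -= n;
	return i;
}

/* Stand-in: return non-zero if both strings have the same contents. */
static inline int my_isteq(struct ist a, struct ist b)
{
	return a.len == b.len && (!a.len || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Hypothetical: return non-zero if <token> is one of the items of the
 * comma-separated <list>.
 */
static int list_has_token(struct ist list, struct ist token)
{
	while (list.len) {
		struct ist item = my_iststop(list, ',');

		if (my_isteq(item, token))
			return 1;
		/* skip the item and the comma that follows it */
		list = my_istadv(list, item.len + 1);
	}
	return 0;
}
```

The loop contains a single explicit length computation (item.len + 1) and no
pointer arithmetic, which is the kind of simplification the API aims for.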


3. Quick index by typical C construct or function
-------------------------------------------------

Some common C constructs may be adjusted to use ist instead. The mapping is not
always one-to-one, but usually the computations on the length part tend to
disappear in the refactoring, making it possible to chain function calls
directly. The entries below are hints to figure out which function to look for
in order to rewrite some common use cases.

  char*                   IST equivalent

  strchr()                istchr(), ist_find(), iststop()
  strstr()                istist()
  strcpy()                istcpy()
  strscpy()               istscpy()
  strlcpy()               istscpy()
  strcat()                istcat()
  strscat()               istscat()
  strlcat()               istscat()
  strcmp()                istdiff()
  strdup()                istdup()
  !strcmp()               isteq()
  !strncmp()              istneq(), istmatch(), istnmatch()
  !strcasecmp()           isteqi()
  !strncasecmp()          istneqi(), istmatchi()
  strtok()                istsplit()
  return NULL             return IST_NULL
  s = malloc()            s = istalloc()
  free(s); s = NULL       istfree(&s)
  p != NULL               isttest(p)
  c = *(p++)              c = istshift(&p)
  *(p++) = c              p = __istappend(p, c)
  p += n                  p = istadv(p, n)
  p + strlen(p)           istend(p)
  p[max] = 0              p = isttrim(p, max)
  p[max+1] = 0            p = istzero(p, max)
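A small before/after sketch of such a refactoring is given below: the same
"skip a prefix" operation written on a NUL-terminated string, then on an ist.
my_istmatch() and my_istadv() are simplified local stand-ins written from the
descriptions in section 2, and both skip_prefix_*() functions are hypothetical
names used only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in with the layout shown in section 2. */
struct ist {
	char  *ptr;
	size_t len;
};

/* Stand-in: return non-zero if <s> starts with <pfx> (an empty <pfx> matches). */
static inline int my_istmatch(struct ist s, struct ist pfx)
{
	return s.len >= pfx.len &&
	       (!pfx.len || memcmp(s.ptr, pfx.ptr, pfx.len) == 0);
}

/* Stand-in: advance <i> by <n> characters, never past its end. */
static inline struct ist my_istadv(struct ist i, size_t n)
{
	if (n > i.len)
		n = i.len;
	i.ptr += n;
	i.len -= n;
	return i;
}

/* char* version: requires a NUL-terminated input and an explicit length. */
static const char *skip_prefix_cstr(const char *s, const char *pfx)
{
	size_t n = strlen(pfx);

	return strncmp(s, pfx, n) == 0 ? s + n : s;
}

/* ist version: works on any slice, the length follows the pointer. */
static struct ist skip_prefix_ist(struct ist s, struct ist pfx)
{
	return my_istmatch(s, pfx) ? my_istadv(s, pfx.len) : s;
}
```

The ist version can be applied to a slice taken from the middle of a buffer,
whereas the char* version needs the input to be NUL-terminated.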