2021-11-17 - Scheduler API


1. Background
-------------

The scheduler relies on two major parts:
  - the wait queue or timers queue, which contains an ordered tree of the next
    timers to expire

  - the run queue, which contains tasks that were already woken up and are
    waiting for a CPU slot to execute.

There are two types of schedulable objects in HAProxy:
  - tasks: they contain one timer and can be in the run queue without leaving
    their place in the timers queue.

  - tasklets: they do not have the timers part and are either sleeping or
    running.

Both the timers queue and the run queue exist in two forms: shared between all
threads, and per-thread. A task or tasklet may only be queued in a single one
of each at a time. The thread-local queues are not thread-safe while the shared
ones are. This means that it is only permitted to manipulate an object which
is in the local queue, or one which is in a shared queue but only after
locking it. As such, tasks and tasklets are usually pinned to threads and do
not move, or only in very specific ways not detailed here.

In case of doubt, keep in mind that it's not permitted to manipulate another
thread's private task or tasklet, and that any task held by another thread
might vanish while it's being looked at.

Internally a large part of the task and tasklet struct is shared between
the two types, which reduces code duplication and eases the preservation
of fairness in the run queue by interleaving all of them. As such, some
fields or flags may not always be relevant to tasklets and may be ignored.


Tasklets do not use a thread mask but use a thread ID instead, to which they
are bound. If the thread ID is negative, the tasklet is not bound but may only
be run on the calling thread.


2. API
------

Only a few functions are exposed by the scheduler. A few more are in fact
accessible, but if they are not documented here they should rather be avoided,
or used only when absolutely certain they're suitable, as some have delicate
corner cases. In case of doubt, checking the sched.pdf diagram may help.

int total_run_queues()
        Return the approximate number of tasks in run queues. This is racy
        and a bit inaccurate as it iterates over all queues, but it is
        sufficient for stats reporting.

int task_in_rq(t)
        Return non-zero if the designated task is in the run queue (i.e. it was
        already woken up).

int task_in_wq(t)
        Return non-zero if the designated task is in the timers queue (i.e. it
        has a valid timeout and will eventually expire).

int thread_has_tasks()
        Return non-zero if the current thread has some work to be done in the
        run queue. This is used to decide whether or not to sleep in poll().

void task_wakeup(t, f)
        Make sure task <t> will wake up, that is, will execute at least
        once after this function is called. The task flags <f> will be
        ORed into the task's state, and must be taken among the
        TASK_WOKEN_* flags exclusively. In multi-threaded environments it
        is safe to wake up another thread's task, and even if that thread
        is sleeping it will be woken up. Users have to keep in mind that a
        task running on another thread might very well finish and go back
        to sleep before the function returns. It is permitted to wake the
        current task up, in which case it will be scheduled to run another
        time after it returns to the scheduler.
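
The way wake-up reasons accumulate in the state can be modelled outside of
HAProxy. The sketch below is only an illustration and not HAProxy's
implementation: struct ex_task, the EX_* flags with their values, and
ex_task_wakeup() are hypothetical stand-ins, and the rule that an already
queued task is not inserted a second time is an assumption of this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: names and values are NOT HAProxy's real ones. */
#define EX_TASK_QUEUED  0x0001u   /* task already sits in a run queue */
#define EX_WOKEN_IO     0x0100u
#define EX_WOKEN_MSG    0x0200u
#define EX_WOKEN_ANY    (EX_WOKEN_IO | EX_WOKEN_MSG)

struct ex_task {
    atomic_uint state;        /* flags, possibly updated by several threads */
    unsigned int rq_inserts;  /* times the task entered the run queue */
};

/* Model of a wake-up: OR the reasons <f> into the state, and insert the
 * task into the run queue only if it was not queued yet (assumption of
 * this sketch). Returns 1 if this call queued the task, 0 otherwise.
 */
static int ex_task_wakeup(struct ex_task *t, unsigned int f)
{
    unsigned int old;

    assert((f & ~EX_WOKEN_ANY) == 0);  /* only wake-up reasons accepted */

    old = atomic_fetch_or(&t->state, f | EX_TASK_QUEUED);
    if (old & EX_TASK_QUEUED)
        return 0;

    t->rq_inserts++;  /* a real scheduler would link the task here */
    return 1;
}
```

In this model, two wake-ups issued before the task runs leave both reasons
set in the state while the task is queued only once.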

struct task *task_unlink_wq(t)
        Remove the task from the timers queue if it was in it, and return it.
        It may only be done for the local thread, or for a shared task that
        might be in the shared queue. It must not be done for another thread's
        task.

void task_queue(t)
        Place or update task <t> into the timers queue, where it may already
        be, scheduling it for an expiration at date t->expire. If t->expire is
        infinite, nothing is done, so it's safe to call this function without
        first checking the expiration date. It is only valid to call this
        function for local tasks or for shared tasks which have the calling
        thread in their thread mask.

void task_set_thread(t, id)
        Change task <t>'s thread ID to new value <id>. This may only be
        performed by the task itself while running. This is only used to let a
        task voluntarily migrate to another thread. Thread id -1 is used to
        indicate "any thread". It's ignored and replaced by zero when threads
        are disabled.

void tasklet_wakeup(tl)
        Make sure that tasklet <tl> will wake up, that is, will execute at
        least once. The tasklet will run on its assigned thread, or on any
        thread if its TID is negative.
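
The choice of the target thread can be sketched as a small helper. This is
a hypothetical illustration, not a HAProxy function: the name
ex_tasklet_target_thread() is made up, and the handling of a negative thread
ID follows the Background section's statement that an unbound tasklet may
only be run on the calling thread.

```c
#include <assert.h>

/* Hypothetical helper, not part of HAProxy: returns the thread a tasklet
 * would run on, given its bound thread ID <tl_tid> and the ID of the
 * thread issuing the wake-up <caller_tid>.
 */
static int ex_tasklet_target_thread(int tl_tid, int caller_tid)
{
    assert(caller_tid >= 0);  /* the caller is always a real thread */

    if (tl_tid < 0)
        return caller_tid;    /* unbound: run on the calling thread */
    return tl_tid;            /* bound: run on the assigned thread */
}
```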

void tasklet_wakeup_on(tl, thr)
        Make sure that tasklet <tl> will wake up on thread <thr>, that is, will
        execute at least once. The designated thread may only differ from the
        calling one if the tasklet is already configured to run on another
        thread, and it is not permitted to self-assign a tasklet if its tid is
        negative, as it may already be scheduled to run somewhere else. When in
        doubt, only use tasklet_wakeup(), which will pick the tasklet's
        assigned thread ID.

struct tasklet *tasklet_new()
        Allocate a new tasklet and set it to run by default on the calling
        thread. The caller may change its tid to another one before using it.
        The new tasklet is returned.

struct task *task_new_anywhere()
        Allocate a new task to run on any thread, and return the task, or NULL
        in case of allocation issue. Note that such tasks will be marked as
        shared and will go through the locked queues, thus their activity will
        be heavier than for other ones. See also task_new_here().

struct task *task_new_here()
        Allocate a new task to run on the calling thread, and return the task,
        or NULL in case of allocation issue.

struct task *task_new_on(t)
        Allocate a new task to run on thread <t>, and return the task, or NULL
        in case of allocation issue.

void task_destroy(t)
        Destroy this task. The task will be unlinked from any timers queue,
        and either immediately freed, or asynchronously killed if currently
        running. This may only be done by one of the threads this task is
        allowed to run on. Developers must not forget that the task's memory
        area is not always immediately freed, and that certain misuses might
        only show their effects later down the chain (e.g. use-after-free).

void tasklet_free(tl)
        Free tasklet <tl>, which must not be running. As such, it may only be
        called by the thread responsible for the tasklet, typically from the
        tasklet's process() function itself.

void task_schedule(t, d)
        Schedule task <t> to run no later than date <d>. If the task is already
        running, or scheduled for an earlier instant, nothing is done. If the
        task was not queued or was scheduled to run later, its timer entry
        will be updated. This function assumes that it will never be called
        with a timer in the past nor with TICK_ETERNITY. Only one of the
        threads assigned to the task may call this function.
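
The "no later than" rule can be illustrated with a small stand-alone model.
Everything below is hypothetical and not HAProxy code: struct ex_timer,
ex_is_before() and ex_task_schedule() are made-up names, and the use of a
wrapping 32-bit date compared through a signed difference is an assumption
of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a task's timer, not HAProxy's struct task. */
struct ex_timer {
    bool     queued;   /* present in the timers queue */
    bool     running;  /* currently running */
    uint32_t expire;   /* expiration date, meaningful only when queued */
};

/* Wrapping comparison (assumption): true if date <a> is before date <b>. */
static bool ex_is_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Ask for the task to run no later than <when>. The caller must not pass
 * a date in the past. Returns true if the timer entry was updated, false
 * if nothing had to be done.
 */
static bool ex_task_schedule(struct ex_timer *t, uint32_t when)
{
    if (t->running)
        return false;  /* already running: nothing to do */

    if (t->queued && !ex_is_before(when, t->expire))
        return false;  /* already scheduled for an earlier or equal date */

    t->expire = when;  /* not queued, or queued for a later date */
    t->queued = true;
    return true;
}
```

With this model, scheduling for date 1000, then 2000, then 500 leaves the
timer at 500: the second call changes nothing because the task is already
due earlier.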

The task's ->process() function receives the following arguments:

  - struct task *t: a pointer to the task itself. It is always valid.

  - void *ctx     : a copy of the task's ->context pointer at the moment
                    the ->process() function was called by the scheduler. A
                    function must use this and not task->context, because
                    task->context might possibly be changed by another thread.
                    For instance, the muxes' takeover() functions do this.

  - uint state    : a copy of the task's ->state field at the moment the
                    ->process() function was executed. A function must use
                    this and not task->state as the latter misses the wakeup
                    reasons and may constantly change during execution along
                    concurrent wakeups (threads or signals).
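
The reason for working from the <ctx> copy can be shown with a simplified
model. This is a sketch under assumptions, not HAProxy code: struct ex_task,
ex_run_task() and ex_handler() are hypothetical, the handler's void return
type is a simplification chosen for the example, and the concurrent change
of the context is simulated from within the handler itself.

```c
#include <assert.h>
#include <stddef.h>

struct ex_task;
typedef void (*ex_process_fn)(struct ex_task *t, void *ctx,
                              unsigned int state);

/* Hypothetical stand-in for a task, reduced to the relevant fields. */
struct ex_task {
    ex_process_fn process;
    void *context;
    unsigned int state;
};

/* Example handler: it only works from the <ctx> argument. */
static void ex_handler(struct ex_task *t, void *ctx, unsigned int state)
{
    int *counter = ctx;

    (void)state;
    t->context = NULL;  /* simulates another thread replacing the context */
    (*counter)++;       /* still valid: <ctx> is the copy taken earlier */
}

/* Scheduler side: copy both fields, then call the handler with the copies. */
static void ex_run_task(struct ex_task *t)
{
    void *ctx = t->context;
    unsigned int state = t->state;

    assert(t->process != NULL);
    t->process(t, ctx, state);
}
```

Had the handler dereferenced t->context after the simulated change, it
would have followed a NULL pointer, while the <ctx> copy remains usable.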

The possible state flags to use during a call to task_wakeup() or seen by the
task being called are the following; they're automatically cleaned from the
state field before the call to ->process():

  - TASK_WOKEN_INIT    each creation of a task causes a first wakeup with this
                       flag set. Applications should not set it themselves.

  - TASK_WOKEN_TIMER   this indicates the task's expire date was reached in the
                       timers queue. Applications should not set it themselves.

  - TASK_WOKEN_IO      indicates the wake-up happened due to I/O activity. Now
                       that all low-level I/O processing happens on tasklets,
                       this notion of I/O is application-defined (for
                       example stream-interfaces use it to notify the stream).

  - TASK_WOKEN_SIGNAL  indicates that a signal the task was subscribed to was
                       received. Applications should not set it themselves.

  - TASK_WOKEN_MSG     any application-defined wake-up reason, usually for
                       inter-task communication (e.g. filters vs streams).

  - TASK_WOKEN_RES     a resource the task was waiting for was finally made
                       available, allowing the task to continue its work. This
                       is essentially used by buffers and queues. Applications
                       may carefully use it for their own purpose if they're
                       certain not to rely on existing ones.

  - TASK_WOKEN_OTHER   any other application-defined wake-up reason.
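
A handler typically tests each reason independently in the state copy it
received, since several of them may be reported for a single call. The
sketch below is illustrative only: the EX_* flags and their values, struct
ex_stats and both functions are hypothetical stand-ins and do not reflect
HAProxy's real definitions.

```c
#include <assert.h>

/* Hypothetical flags: names and values are NOT HAProxy's real ones. */
#define EX_WOKEN_TIMER  0x0100u
#define EX_WOKEN_IO     0x0200u
#define EX_WOKEN_MSG    0x0400u
#define EX_WOKEN_ANY    (EX_WOKEN_TIMER | EX_WOKEN_IO | EX_WOKEN_MSG)

struct ex_stats {
    unsigned int timeouts;
    unsigned int io_events;
    unsigned int messages;
};

/* Scheduler side: copy the state, then clean the wake-up reasons from the
 * field itself. The copy is what gets passed to the handler.
 */
static unsigned int ex_take_wakeup_reasons(unsigned int *state_field)
{
    unsigned int copy = *state_field;

    *state_field &= ~EX_WOKEN_ANY;
    return copy;
}

/* Handler side: account for every reason present in the copy. */
static void ex_account_wakeup(unsigned int state, struct ex_stats *st)
{
    assert(st);

    if (state & EX_WOKEN_TIMER)
        st->timeouts++;
    if (state & EX_WOKEN_IO)
        st->io_events++;
    if (state & EX_WOKEN_MSG)
        st->messages++;
}
```

In this model the reasons are only visible in the copy: once it has been
taken, the field itself only retains its other flags.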


In addition, a few persistent flags may be observed or manipulated by the
application, both for tasks and tasklets:

  - TASK_SELF_WAKING   when set, indicates that this task was found waking
                       itself up, and its class will change to bulk processing.
                       If this behavior is under control and only temporarily
                       expected, and is not expected to happen again, it may
                       make sense to reset this flag from the ->process()
                       function itself.

  - TASK_HEAVY         when set, indicates that this task performs such heavy
                       processing that it becomes mandatory to give control
                       back to I/Os, otherwise large latencies might occur. It
                       may be set by an application that expects something
                       heavy to happen (tens to hundreds of microseconds), and
                       reset once finished. One example user is the TLS stack,
                       which sets it when an imminent crypto operation is
                       expected.

  - TASK_F_USR1        This is the first application-defined persistent flag.
                       It is always zero unless the application changes it. One
                       example use case is the I/O handler for backend
                       connections, to mention whether the connection is safe
                       to use or might have recently been migrated.

Finally, when built with -DDEBUG_TASK, an extra sub-structure "debug" is added
to both tasks and tasklets to note the code locations of the last two calls to
task_wakeup() and tasklet_wakeup().