Willy Tarreau | 6232d11 | 2021-11-18 11:26:28 +0100 | [diff] [blame] | 1 | 2021-11-17 - Scheduler API |


1. Background
-------------

The scheduler relies on two major parts:
  - the wait queue or timers queue, which contains an ordered tree of the next
    timers to expire

  - the run queue, which contains tasks that were already woken up and are
    waiting for a CPU slot to execute.

There are two types of schedulable objects in HAProxy:
  - tasks: they contain one timer and can be in the run queue without leaving
    their place in the timers queue.

  - tasklets: they do not have the timers part and are either sleeping or
    running.

Both the timers queue and the run queue exist in two forms: one shared between
all threads, and one per thread. A task or tasklet may only be queued in one
of each at a time. The thread-local queues are not thread-safe while the
shared ones are. This means that it is only permitted to manipulate an object
which is in the local queue, or one which is in a shared queue but then only
after locking it. As such, tasks and tasklets are usually pinned to threads
and do not move, or only in very specific ways not detailed here.

In case of doubt, keep in mind that it's not permitted to manipulate another
thread's private task or tasklet, and that any task held by another thread
might vanish while it's being looked at.

Internally a large part of the task and tasklet struct is shared between
the two types, which reduces code duplication and eases the preservation
of fairness in the run queue by interleaving all of them. As such, some
fields or flags may not always be relevant to tasklets and may be ignored.

Tasklets do not use a thread mask but use a thread ID instead, to which they
are bound. If the thread ID is negative, the tasklet is not bound but may only
be run on the calling thread.


2. API
------

Few functions are exposed by the scheduler. A few more are in fact accessible,
but if they are not documented here they should rather be avoided, or used
only when absolutely certain they're suitable, as some have delicate corner
cases. In case of doubt, checking the sched.pdf diagram may help.

int total_run_queues()
        Return the approximate number of tasks in run queues. This is racy
        and a bit inaccurate as it iterates over all queues, but it is
        sufficient for stats reporting.

int task_in_rq(t)
        Return non-zero if the designated task is in the run queue (i.e. it was
        already woken up).

int task_in_wq(t)
        Return non-zero if the designated task is in the timers queue (i.e. it
        has a valid timeout and will eventually expire).

int thread_has_tasks()
        Return non-zero if the current thread has some work to be done in the
        run queue. This is used to decide whether or not to sleep in poll().

void task_wakeup(t, f)
        Make sure task <t> will wake up, that is, will execute at least once
        after the call to this function has started. The task flags <f> will
        be ORed into the task's state, among TASK_WOKEN_* flags exclusively.
        In multi-threaded environments it is safe to wake up another thread's
        task, and even if the thread is sleeping it will be woken up. Users
        have to keep in mind that a task running on another thread might very
        well finish and go back to sleep before the function returns. It is
        permitted to wake the current task up, in which case it will be
        scheduled to run another time after it returns to the scheduler.
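
The way wake-up reasons accumulate on a task can be pictured with a small
standalone model. This is only a sketch and not HAProxy's implementation:
struct demo_wtask, demo_wakeup() and the DEMO_* values are hypothetical names
chosen for illustration, and the insertion into a real run queue is reduced
to a single "queued" bit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_WOKEN_IO   0x01u
#define DEMO_WOKEN_MSG  0x02u
#define DEMO_WOKEN_ANY  (DEMO_WOKEN_IO | DEMO_WOKEN_MSG)
#define DEMO_QUEUED     0x80u  /* stands for "already in the run queue" */

struct demo_wtask {
    atomic_uint state;
};

/* Record wake-up reason <f> on task <t>. Returns 1 if this call is the one
 * that queued the task, 0 if it was already queued. In both cases the reason
 * is merged with an atomic OR, so concurrent wakers never lose each other's
 * flags.
 */
int demo_wakeup(struct demo_wtask *t, unsigned int f)
{
    unsigned int old;

    assert((f & ~DEMO_WOKEN_ANY) == 0); /* only wake-up reasons allowed */
    old = atomic_fetch_or(&t->state, f | DEMO_QUEUED);
    return !(old & DEMO_QUEUED);
}
```

In this model, waking the same task twice with different reasons leaves both
reasons set, and only the first call reports that it queued the task.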

struct task *task_unlink_wq(t)
        Remove the task from the timers queue if it was in it, and return it.
        It may only be done for a task of the local thread, or for a shared
        task that might be in the shared queue. It must not be done for
        another thread's task.

void task_queue(t)
        Place or update task <t> into the timers queue, where it may already
        be, scheduling it for an expiration at date t->expire. If t->expire is
        infinite, nothing is done, so it's safe to call this function without
        first checking the expiration date. It is only valid to call this
        function for local tasks or for shared tasks which have the calling
        thread in their thread mask.

void task_set_thread(t, id)
        Change task <t>'s thread ID to new value <id>. This may only be
        performed by the task itself while running. This is only used to let a
        task voluntarily migrate to another thread. Thread id -1 is used to
        indicate "any thread". It's ignored and replaced by zero when threads
        are disabled.

void tasklet_wakeup(tl)
        Make sure that tasklet <tl> will wake up, that is, will execute at
        least once. The tasklet will run on its assigned thread, or on any
        thread if its TID is negative.

void tasklet_wakeup_on(tl, thr)
        Make sure that tasklet <tl> will wake up on thread <thr>, that is, will
        execute at least once. The designated thread may only differ from the
        calling one if the tasklet is already configured to run on another
        thread, and it is not permitted to self-assign a tasklet if its tid is
        negative, as it may already be scheduled to run somewhere else. In case
        of doubt, only use tasklet_wakeup(), which will pick the tasklet's
        assigned thread ID.
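
The choice of the thread on which a tasklet runs may be summarized by the
following standalone sketch. It assumes the rule given in the Background
section (a negative thread ID means the tasklet runs on the calling thread);
struct demo_tasklet and demo_tasklet_target() are hypothetical names and do
not exist in HAProxy.

```c
#include <assert.h>

struct demo_tasklet {
    int tid;  /* bound thread ID, or negative when not bound */
};

/* Return the thread on which <tl> would run when woken up by thread
 * <caller>: the bound thread if any, otherwise the calling thread.
 */
int demo_tasklet_target(const struct demo_tasklet *tl, int caller)
{
    assert(caller >= 0);
    return tl->tid >= 0 ? tl->tid : caller;
}
```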

struct tasklet *tasklet_new()
        Allocate a new tasklet and set it to run by default on the calling
        thread. The caller may change its tid to another one before using it.
        The new tasklet is returned.

struct task *task_new_anywhere()
        Allocate a new task to run on any thread, and return the task, or NULL
        in case of allocation issue. Note that such tasks will be marked as
        shared and will go through the locked queues, thus their activity will
        be heavier than for other ones. See also task_new_here().

struct task *task_new_here()
        Allocate a new task to run on the calling thread, and return the task,
        or NULL in case of allocation issue.

struct task *task_new_on(t)
        Allocate a new task to run on thread <t>, and return the task, or NULL
        in case of allocation issue.

void task_destroy(t)
        Destroy this task. The task will be unlinked from any timers queue,
        and either immediately freed, or asynchronously killed if currently
        running. This may only be done by one of the threads this task is
        allowed to run on. Developers must not forget that the task's memory
        area is not always immediately freed, and that certain misuses may
        only show their effects later down the chain (e.g. use-after-free).

void tasklet_free()
        Free this tasklet, which must not be running, so it may only be
        called by the thread responsible for the tasklet, typically the
        tasklet's process() function itself.

void task_schedule(t, d)
        Schedule task <t> to run no later than date <d>. If the task is already
        running, or scheduled for an earlier instant, nothing is done. If the
        task was not queued or was scheduled to run later, its timer entry
        will be updated. This function assumes that it will never be called
        with a timer in the past nor with TICK_ETERNITY. Only one of the
        threads assigned to the task may call this function.
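
The "no later than" rule can be illustrated with a standalone toy model. It
is a sketch under stated assumptions and not the scheduler's code: struct
demo_ttask, demo_tick_lt(), demo_schedule() and DEMO_ETERNITY are hypothetical
names, the timers tree is replaced by a simple counter of updates, and the
tick comparison relies on the usual two's complement conversion.

```c
#include <assert.h>

#define DEMO_ETERNITY 0u  /* hypothetical "no expiration" marker */

struct demo_ttask {
    unsigned int expire;   /* expiration date in ticks, or DEMO_ETERNITY */
    int running;           /* non-zero while in the run queue */
    int requeues;          /* counts timer entry updates, for the demo */
};

/* Wrapping-safe "a is earlier than b" on tick counters. */
int demo_tick_lt(unsigned int a, unsigned int b)
{
    return (int)(a - b) < 0;
}

/* Make sure <t> expires no later than <d>. Nothing is done if the task is
 * already running or already scheduled for an earlier or equal date.
 */
void demo_schedule(struct demo_ttask *t, unsigned int d)
{
    assert(d != DEMO_ETERNITY);  /* callers must pass a real date */
    if (t->running)
        return;
    if (t->expire != DEMO_ETERNITY && !demo_tick_lt(d, t->expire))
        return;
    t->expire = d;
    t->requeues++;
}
```

With this model, scheduling at 100 then 200 keeps the expiration at 100,
while a later call with 50 brings it forward to 50.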

The task's ->process() function receives the following arguments:

  - struct task *t: a pointer to the task itself. It is always valid.

  - void *ctx     : a copy of the task's ->context pointer at the moment
                    the ->process() function was called by the scheduler. A
                    function must use this and not task->context, because
                    task->context might possibly be changed by another thread.
                    For instance, the muxes' takeover() functions do this.

  - uint state    : a copy of the task's ->state field at the moment the
                    ->process() function was executed. A function must use
                    this and not task->state as the latter misses the wakeup
                    reasons and may constantly change during execution along
                    concurrent wakeups (threads or signals).
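
The reason why handlers must rely on their <ctx> and <state> arguments can be
seen in the following standalone model of the calling side. It is a sketch
only: struct demo_ptask, demo_run_one(), demo_handler() and the DEMO_* values
are hypothetical and merely mimic the snapshot behavior described above.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_WOKEN_IO   0x01u   /* hypothetical wake-up reason */
#define DEMO_WOKEN_ANY  0x0fu   /* hypothetical mask of all reasons */
#define DEMO_PERSIST    0x10u   /* hypothetical persistent flag */

struct demo_ptask;
typedef void (*demo_process_fn)(struct demo_ptask *t, void *ctx,
                                unsigned int state);

struct demo_ptask {
    atomic_uint state;
    void *_Atomic context;
    demo_process_fn process;
};

/* Run <t> once: snapshot the context, atomically fetch the state while
 * clearing the wake-up reasons, then hand both copies to the handler.
 */
void demo_run_one(struct demo_ptask *t)
{
    void *ctx = atomic_load(&t->context);
    unsigned int state = atomic_fetch_and(&t->state, ~DEMO_WOKEN_ANY);

    assert(t->process);
    t->process(t, ctx, state);
}

/* Example handler: it only trusts its <ctx> and <state> arguments. */
void demo_handler(struct demo_ptask *t, void *ctx, unsigned int state)
{
    unsigned int *seen = ctx;

    (void)t;
    *seen = state;
}
```

After demo_run_one() returns, the handler has seen the wake-up reason in its
<state> argument while the task's own state field only retains the persistent
flag.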

The possible state flags to use during a call to task_wakeup() or seen by the
task being called are the following; they're automatically cleared from the
state field before the call to ->process():

  - TASK_WOKEN_INIT    each creation of a task causes a first wakeup with this
                       flag set. Applications should not set it themselves.

  - TASK_WOKEN_TIMER   this indicates the task's expire date was reached in the
                       timers queue. Applications should not set it themselves.

  - TASK_WOKEN_IO      indicates the wake-up happened due to I/O activity. Now
                       that all low-level I/O processing happens on tasklets,
                       this notion of I/O is application-defined (for example
                       stream-interfaces use it to notify the stream).

  - TASK_WOKEN_SIGNAL  indicates that a signal the task was subscribed to was
                       received. Applications should not set it themselves.

  - TASK_WOKEN_MSG     any application-defined wake-up reason, usually for
                       inter-task communication (e.g. filters vs streams).

  - TASK_WOKEN_RES     a resource the task was waiting for was finally made
                       available, allowing the task to continue its work. This
                       is essentially used by buffers and queues. Applications
                       may carefully use it for their own purpose if they're
                       certain not to rely on existing ones.

  - TASK_WOKEN_OTHER   any other application-defined wake-up reason.
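
Since wake-up reasons are ORed together, a handler may see several of them in
a single call and should test each bit independently rather than compare the
whole value. The standalone sketch below shows one possible way to do so; the
DEMO_* values, struct demo_counters and demo_account() are hypothetical and
only the flag names mirror the list above.

```c
#include <assert.h>

/* Hypothetical values; only the names mirror the list above. */
#define DEMO_WOKEN_TIMER 0x01u
#define DEMO_WOKEN_IO    0x02u
#define DEMO_WOKEN_MSG   0x04u

struct demo_counters {
    unsigned int timeouts;
    unsigned int io_events;
    unsigned int messages;
};

/* Account for every wake-up reason present in <state>. Several reasons may
 * be reported by a single call, so each bit is tested independently.
 */
void demo_account(struct demo_counters *c, unsigned int state)
{
    assert(c);
    if (state & DEMO_WOKEN_TIMER)
        c->timeouts++;
    if (state & DEMO_WOKEN_IO)
        c->io_events++;
    if (state & DEMO_WOKEN_MSG)
        c->messages++;
}
```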


In addition, a few persistent flags may be observed or manipulated by the
application, both for tasks and tasklets:

  - TASK_SELF_WAKING   when set, indicates that this task was found waking
                       itself up, and its class will change to bulk processing.
                       If this behavior is under control and temporarily
                       expected, and it is not expected to happen again, it may
                       make sense to reset this flag from the ->process()
                       function itself.

  - TASK_HEAVY         when set, indicates that this task performs such heavy
                       processing that it becomes mandatory to give control
                       back to I/Os, otherwise large latencies might occur. It
                       may be set by an application that expects something
                       heavy to happen (tens to hundreds of microseconds), and
                       reset once finished. An example user is the TLS stack,
                       which sets it when an imminent crypto operation is
                       expected.

  - TASK_F_USR1        This is the first application-defined persistent flag.
                       It is always zero unless the application changes it. An
                       example use case is the I/O handler for backend
                       connections, to mention whether the connection is safe
                       to use or might have recently been migrated.
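
The "set before, reset after" usage described for TASK_HEAVY may look like
the following standalone sketch. It does not use HAProxy's types: struct
demo_htask, DEMO_HEAVY and the demo_heavy_*() helpers are hypothetical, and
the atomic operations are only an assumption about how such a flag could be
updated safely.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_HEAVY 0x01u   /* hypothetical stand-in for TASK_HEAVY */

struct demo_htask {
    atomic_uint state;
};

/* Non-zero while the task is flagged as heavy. */
int demo_is_heavy(struct demo_htask *t)
{
    return (atomic_load(&t->state) & DEMO_HEAVY) != 0;
}

/* Announce that the next processing step is expected to be expensive. */
void demo_heavy_begin(struct demo_htask *t)
{
    atomic_fetch_or(&t->state, DEMO_HEAVY);
}

/* Withdraw the announcement once the expensive step is done. */
void demo_heavy_end(struct demo_htask *t)
{
    unsigned int old = atomic_fetch_and(&t->state, ~DEMO_HEAVY);

    assert(old & DEMO_HEAVY);  /* begin/end calls are expected to pair up */
    (void)old;
}
```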

Finally, when built with -DDEBUG_TASK, an extra sub-structure "debug" is added
to both tasks and tasklets to note the code locations of the last two calls to
task_wakeup() and tasklet_wakeup().