Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 1 | .. SPDX-License-Identifier: GPL-2.0+ |
| 2 | .. Copyright (c) 2013 The Chromium OS Authors. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 3 | |
| 4 | Tracing in U-Boot |
| 5 | ================= |
| 6 | |
Heinrich Schuchardt | 8963f19 | 2021-11-16 18:38:47 +0100 | [diff] [blame] | 7 | U-Boot supports a simple tracing feature which allows a record of execution |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 8 | to be collected and sent to a host machine for analysis. At present the |
| 9 | main use for this is to profile boot time. |
| 10 | |
| 11 | |
| 12 | Overview |
| 13 | -------- |
| 14 | |
| 15 | The trace feature uses GCC's instrument-functions feature to trace all |
| 16 | function entry/exit points. These are then recorded in a memory buffer. |
| 17 | The memory buffer can be saved to the host over a network link using |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 18 | tftpput or by writing to an attached storage device such as MMC. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 19 | |
| 20 | On the host, the file is first converted with a tool called 'proftool', |
| 21 | which extracts useful information from it. The resulting trace output |
| 22 | resembles that emitted by Linux's ftrace feature, so can be visually |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 23 | displayed by kernelshark (see kernelshark_) and used with |
| 24 | 'trace-cmd report' (see trace_cmd_). |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 25 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 26 | It is also possible to produce a flame graph for use with flamegraph.pl |
| 27 | (see flamegraph_pl_). |
| 28 | |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 29 | |
| 30 | Quick-start using Sandbox |
| 31 | ------------------------- |
| 32 | |
| 33 | Sandbox is a build of U-Boot that can run under Linux so it is a convenient |
| 34 | way of trying out tracing before you use it on your actual board. To do |
| 35 | this, follow these steps: |
| 36 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 37 | Add the following to `config/sandbox_defconfig`: |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 38 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 39 | .. code-block:: c |
| 40 | |
Simon Glass | 84b7711 | 2021-11-24 09:26:39 -0700 | [diff] [blame] | 41 | CONFIG_TRACE=y |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 42 | |
| 43 | Build sandbox U-Boot with tracing enabled: |
| 44 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 45 | .. code-block:: console |
| 46 | |
| 47 | $ make FTRACE=1 O=sandbox sandbox_config |
| 48 | $ make FTRACE=1 O=sandbox |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 49 | |
| 50 | Run sandbox, wait for a bit of trace information to appear, and then capture |
| 51 | a trace: |
| 52 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 53 | .. code-block:: console |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 54 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 55 | $ ./sandbox/u-boot |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 56 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 57 | U-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 58 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 59 | DRAM: 128 MiB |
| 60 | trace: enabled |
| 61 | Using default environment |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 62 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 63 | In: serial |
| 64 | Out: serial |
| 65 | Err: serial |
| 66 | =>trace stats |
| 67 | 671,406 function sites |
| 68 | 69,712 function calls |
| 69 | 0 untracked function calls |
| 70 | 73,373 traced function calls |
| 71 | 16 maximum observed call depth |
| 72 | 15 call depth limit |
| 73 | 66,491 calls not traced due to depth |
| 74 | =>trace stats |
| 75 | 671,406 function sites |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 76 | 1,279,450 function calls |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 77 | 0 untracked function calls |
| 78 | 950,490 traced function calls (333217 dropped due to overflow) |
| 79 | 16 maximum observed call depth |
| 80 | 15 call depth limit |
| 81 | 1,275,767 calls not traced due to depth |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 82 | =>trace calls 1000000 e00000 |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 83 | Call list dumped to 00000000, size 0xae0a40 |
| 84 | =>print |
| 85 | baudrate=115200 |
| 86 | profbase=0 |
| 87 | profoffset=ae0a40 |
| 88 | profsize=e00000 |
| 89 | stderr=serial |
| 90 | stdin=serial |
| 91 | stdout=serial |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 92 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 93 | Environment size: 117/8188 bytes |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 94 | =>host save hostfs - 1000000 trace ${profoffset} |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 95 | 11405888 bytes written in 10 ms (1.1 GiB/s) |
| 96 | =>reset |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 97 | |
| 98 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 99 | Then run proftool to convert the trace information to ftrace format |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 100 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 101 | .. code-block:: console |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 102 | |
Pavel Skripkin | cdfcfe4 | 2023-04-12 22:05:31 +0300 | [diff] [blame] | 103 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-ftrace -o trace.dat |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 104 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 105 | Finally run kernelshark to display it (note it only works with `.dat` files!): |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 106 | |
| 107 | .. code-block:: console |
| 108 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 109 | $ kernelshark trace.dat |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 110 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 111 | Using this tool you can view the trace records and see the timestamp for each |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 112 | function. |
| 113 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 114 | .. image:: pics/kernelshark.png |
| 115 | :width: 800 |
| 116 | :alt: Kernelshark showing function-trace records |
| 117 | |
| 118 | |
| 119 | To see the records on the console, use trace-cmd: |
| 120 | |
| 121 | .. code-block:: console |
| 122 | |
| 123 | $ trace-cmd report trace.dat | less |
| 124 | cpus=1 |
| 125 | u-boot-1 [000] 3.116364: function: initf_malloc |
| 126 | u-boot-1 [000] 3.116375: function: initf_malloc |
| 127 | u-boot-1 [000] 3.116386: function: initf_bootstage |
| 128 | u-boot-1 [000] 3.116396: function: bootstage_init |
| 129 | u-boot-1 [000] 3.116408: function: malloc |
| 130 | u-boot-1 [000] 3.116418: function: malloc_simple |
| 131 | u-boot-1 [000] 3.116429: function: alloc_simple |
| 132 | u-boot-1 [000] 3.116441: function: alloc_simple |
| 133 | u-boot-1 [000] 3.116449: function: malloc_simple |
| 134 | u-boot-1 [000] 3.116457: function: malloc |
| 135 | |
| 136 | Note that `pytimechart` is obsolete so cannot be used anymore. |
| 137 | |
| 138 | There is a -f option available to select a function graph: |
| 139 | |
| 140 | .. code-block:: console |
| 141 | |
Puhan Zhou | 10af36c | 2023-08-13 13:16:19 +0800 | [diff] [blame] | 142 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace -f funcgraph dump-ftrace -o trace.dat |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 143 | |
| 144 | Again, you can use kernelshark or trace-cmd to look at the output. In this case |
| 145 | you will see the time taken by each function shown against its exit record. |
| 146 | |
| 147 | .. image:: pics/kernelshark_fg.png |
| 148 | :width: 800 |
| 149 | :alt: Kernelshark showing function-graph records |
| 150 | |
| 151 | .. code-block:: console |
| 152 | |
| 153 | $ trace-cmd report trace.dat | less |
| 154 | cpus=1 |
| 155 | u-boot-1 [000] 3.116364: funcgraph_entry: 0.011 us | initf_malloc(); |
| 156 | u-boot-1 [000] 3.116386: funcgraph_entry: | initf_bootstage() { |
| 157 | u-boot-1 [000] 3.116396: funcgraph_entry: | bootstage_init() { |
| 158 | u-boot-1 [000] 3.116408: funcgraph_entry: | malloc() { |
| 159 | u-boot-1 [000] 3.116418: funcgraph_entry: | malloc_simple() { |
| 160 | u-boot-1 [000] 3.116429: funcgraph_entry: 0.012 us | alloc_simple(); |
| 161 | u-boot-1 [000] 3.116449: funcgraph_exit: 0.031 us | } |
| 162 | u-boot-1 [000] 3.116457: funcgraph_exit: 0.049 us | } |
| 163 | u-boot-1 [000] 3.116466: funcgraph_entry: 0.063 us | memset(); |
| 164 | u-boot-1 [000] 3.116539: funcgraph_exit: 0.143 us | } |
| 165 | |
| 166 | Flame graph |
| 167 | ----------- |
| 168 | |
| 169 | Some simple flame graph options are available as well, using the dump-flamegraph |
| 170 | command: |
| 171 | |
| 172 | .. code-block:: console |
| 173 | |
Puhan Zhou | 10af36c | 2023-08-13 13:16:19 +0800 | [diff] [blame] | 174 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph -o trace.fg |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 175 | $ flamegraph.pl trace.fg >trace.svg |
| 176 | |
| 177 | You can load the .svg file into a viewer. If you use Chrome (and some other |
| 178 | programs) you can click around and zoom in and out. |
| 179 | |
| 180 | .. image:: pics/flamegraph.png |
| 181 | :width: 800 |
| 182 | :alt: Chrome showing the flamegraph.pl output |
| 183 | |
| 184 | .. image:: pics/flamegraph_zoom.png |
| 185 | :width: 800 |
| 186 | :alt: Chrome showing zooming into the flamegraph.pl output |
| 187 | |
| 188 | |
| 189 | A timing variant is also available, which gives an idea of how much time is |
| 190 | spend in each call stack: |
| 191 | |
| 192 | .. code-block:: console |
| 193 | |
Puhan Zhou | 10af36c | 2023-08-13 13:16:19 +0800 | [diff] [blame] | 194 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph -f timing -o trace.fg |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 195 | $ flamegraph.pl trace.fg >trace.svg |
| 196 | |
| 197 | Note that trace collection does slow down execution so the timings will be |
| 198 | inflated. They should be used to guide optimisation. For accurate boot timings, |
| 199 | use bootstage. |
| 200 | |
| 201 | .. image:: pics/flamegraph_timing.png |
| 202 | :width: 800 |
| 203 | :alt: Chrome showing flamegraph.pl output with timing |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 204 | |
| 205 | CONFIG Options |
| 206 | -------------- |
| 207 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 208 | CONFIG_TRACE |
| 209 | Enables the trace feature in U-Boot. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 210 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 211 | CONFIG_CMD_TRACE |
| 212 | Enables the trace command. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 213 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 214 | CONFIG_TRACE_BUFFER_SIZE |
| 215 | Size of trace buffer to allocate for U-Boot. This buffer is |
| 216 | used after relocation, as a place to put function tracing |
| 217 | information. The address of the buffer is determined by |
| 218 | the relocation code. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 219 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 220 | CONFIG_TRACE_EARLY |
| 221 | Define this to start tracing early, before relocation. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 222 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 223 | CONFIG_TRACE_EARLY_SIZE |
| 224 | Size of 'early' trace buffer. Before U-Boot has relocated |
| 225 | it doesn't have a proper trace buffer. On many boards |
| 226 | you can define an area of memory to use for the trace |
| 227 | buffer until the 'real' trace buffer is available after |
| 228 | relocation. The contents of this buffer are then copied to |
| 229 | the real buffer. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 230 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 231 | CONFIG_TRACE_EARLY_ADDR |
| 232 | Address of early trace buffer |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 233 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 234 | CONFIG_TRACE_CALL_DEPTH_LIMIT |
| 235 | Sets the limit on trace call-depth. For a broad view, 10 is typically |
| 236 | sufficient. Setting this too large creates enormous traces and distorts |
| 237 | the overall timing considerable. |
| 238 | |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 239 | |
| 240 | Building U-Boot with Tracing Enabled |
| 241 | ------------------------------------ |
| 242 | |
| 243 | Pass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. |
| 244 | This is kept as a separate option so that it is easy to enable/disable |
| 245 | instrumenting from the command line instead of having to change board |
| 246 | config files. |
| 247 | |
| 248 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 249 | Board requirements |
| 250 | ------------------ |
| 251 | |
| 252 | Trace data collection relies on a microsecond timer, accessed through |
| 253 | `timer_get_us()`. So the first thing you should do is make sure that |
| 254 | this produces sensible results for your board. Suitable sources for |
| 255 | this timer include high resolution timers, PWMs or profile timers if |
| 256 | available. Most modern SOCs have a suitable timer for this. |
| 257 | |
| 258 | See `add_ftrace()` for where `timer_get_us()` is called. The `notrace` |
| 259 | attribute must be used on each function called by `timer_get_us()` since |
| 260 | recursive calls to `add_ftrace()` will cause a fault:: |
| 261 | |
| 262 | trace: recursion detected, disabling |
| 263 | |
| 264 | You cannot use driver model to obtain the microsecond timer, since tracing |
| 265 | may be enabled before driver model is set up. Instead, provide a low-level |
| 266 | function which accesses the timer, setting it up if needed. |
| 267 | |
| 268 | |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 269 | Collecting Trace Data |
| 270 | --------------------- |
| 271 | |
| 272 | When you run U-Boot on your board it will collect trace data up to the |
| 273 | limit of the trace buffer size you have specified. Once that is exhausted |
| 274 | no more data will be collected. |
| 275 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 276 | Collecting trace data affects execution time and performance. You |
Heinrich Schuchardt | 8963f19 | 2021-11-16 18:38:47 +0100 | [diff] [blame] | 277 | will notice this particularly with trivial functions - the overhead of |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 278 | recording their execution may even exceed their normal execution time. |
| 279 | In practice this doesn't matter much so long as you are aware of the |
Heinrich Schuchardt | 8963f19 | 2021-11-16 18:38:47 +0100 | [diff] [blame] | 280 | effect. Once you have done your optimizations, turn off tracing before |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 281 | doing end-to-end timing using bootstage. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 282 | |
| 283 | The best time to start tracing is right at the beginning of U-Boot. The |
| 284 | best time to stop tracing is right at the end. In practice it is hard |
| 285 | to achieve these ideals. |
| 286 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 287 | This implementation enables tracing early in `board_init_r()`, or |
| 288 | `board_init_f()` when `TRACE_EARLY` is enabled. This means |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 289 | that it captures most of the board init process, missing only the |
| 290 | early architecture-specific init. However, it also misses the entire |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 291 | SPL stage if there is one. At present tracing is not supported in SPL. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 292 | |
| 293 | U-Boot typically ends with a 'bootm' command which loads and runs an |
| 294 | OS. There is useful trace data in the execution of that bootm |
| 295 | command. Therefore this implementation provides a way to collect trace |
| 296 | data after bootm has finished processing, but just before it jumps to |
| 297 | the OS. In practical terms, U-Boot runs the 'fakegocmd' environment |
| 298 | variable at this point. This variable should have a short script which |
| 299 | collects the trace data and writes it somewhere. |
| 300 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 301 | Controlling the trace |
| 302 | --------------------- |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 303 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 304 | U-Boot provides a command-line interface to the trace system for controlling |
| 305 | tracing and accessing the trace data. See :doc:`../usage/cmd/trace`. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 306 | |
| 307 | |
| 308 | Environment Variables |
| 309 | --------------------- |
| 310 | |
| 311 | The following are used: |
| 312 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 313 | profbase |
| 314 | Base address of trace output buffer |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 315 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 316 | profoffset |
| 317 | Offset of first unwritten byte in trace output buffer |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 318 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 319 | profsize |
| 320 | Size of trace output buffer |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 321 | |
| 322 | All of these are set by the 'trace calls' command. |
| 323 | |
| 324 | These variables keep track of the amount of data written to the trace |
| 325 | output buffer by the 'trace' command. The trace commands which write data |
| 326 | to the output buffer can use these to specify the buffer to write to, and |
| 327 | update profoffset each time. This allows successive commands to append data |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 328 | to the same buffer, for example:: |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 329 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 330 | => trace funclist 10000 e00000 |
| 331 | => trace calls |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 332 | |
| 333 | (the latter command appends more data to the buffer). |
| 334 | |
| 335 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 336 | fakegocmd |
| 337 | Specifies commands to run just before booting the OS. This |
| 338 | is a useful time to write the trace data to the host for |
| 339 | processing. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 340 | |
| 341 | |
| 342 | Writing Out Trace Data |
| 343 | ---------------------- |
| 344 | |
| 345 | Once the trace data is in an output buffer in memory there are various ways |
| 346 | to transmit it to the host. Notably you can use tftput to send the data |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 347 | over a network link:: |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 348 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 349 | fakegocmd=trace pause; usb start; set autoload n; bootp; |
| 350 | trace calls 10000000 1000000; |
| 351 | tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 352 | |
| 353 | This starts up USB (to talk to an attached USB Ethernet dongle), writes |
| 354 | a trace log to address 10000000 and sends it to a host machine using |
| 355 | TFTP. After this, U-Boot will boot the OS normally, albeit a little |
| 356 | later. |
| 357 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 358 | For a filesystem you may do something like:: |
| 359 | |
| 360 | trace calls 10000000 1000000; |
| 361 | save mmc 1:1 10000000 /trace ${profoffset} |
| 362 | |
| 363 | The trace buffer format is internal to the trace system. It consists of a |
| 364 | header, a call count for each function site, followed by a list of trace |
| 365 | records, once for each function call. |
| 366 | |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 367 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 368 | Converting Trace Output Data (proftool) |
| 369 | --------------------------------------- |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 370 | |
| 371 | The trace output data is kept in a binary format which is not documented |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 372 | here. See the `trace.h` header file if you are interested. To convert it into |
| 373 | something useful, you can use proftool. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 374 | |
| 375 | This tool must be given the U-Boot map file and the trace data received |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 376 | from running that U-Boot. It produces a binary output file. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 377 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 378 | It is also possible to provide a configuration file to indicate which functions |
| 379 | should be included or dropped during conversion. This file consists of lines |
| 380 | like:: |
| 381 | |
| 382 | include-func <regex> |
| 383 | exclude-func <regex> |
| 384 | |
| 385 | where <regex> is a regular expression matched against function names. It |
| 386 | allows some functions to be dropped from the trace when producing ftrace |
| 387 | records. |
| 388 | |
| 389 | Options: |
| 390 | |
| 391 | -c <config_file> |
| 392 | Specify the optional configuration file, to control which functions are |
| 393 | included in the output. |
| 394 | |
| 395 | -f <format> |
| 396 | Specifies the format to use (see below) |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 397 | |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 398 | -m <map_file> |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 399 | Specify U-Boot map file (`System.map`) |
| 400 | |
| 401 | -o <output file> |
| 402 | Specify the output filename |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 403 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 404 | -t <trace_file> |
| 405 | Specify trace file, the data saved from U-Boot |
| 406 | |
| 407 | -v <0-4> |
| 408 | Specify the verbosity, where 0 is the minimum and 4 is for debugging. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 409 | |
| 410 | Commands: |
| 411 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 412 | dump-ftrace: |
| 413 | Write a binary dump of the file in Linux ftrace format. Two options are |
| 414 | available: |
| 415 | |
| 416 | function |
| 417 | write function-call records (caller/callee) |
| 418 | |
| 419 | funcgraph |
| 420 | write function entry/exit records (graph) |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 421 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 422 | This format can be used with kernelshark_ and trace_cmd_. |
| 423 | |
| 424 | dump-flamegraph |
| 425 | Write a list of stack records useful for producing a flame graph. Two |
| 426 | options are available: |
| 427 | |
| 428 | calls |
| 429 | create a flamegraph of stack frames |
| 430 | |
| 431 | timing |
| 432 | create a flamegraph of microseconds for each stack frame |
| 433 | |
| 434 | This format can be used with flamegraph_pl_. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 435 | |
| 436 | Viewing the Trace Data |
| 437 | ---------------------- |
| 438 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 439 | You can use kernelshark_ for a GUI, but note that version 2.0.x was broken. If |
| 440 | you have that version you could try building it from source. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 441 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 442 | The file must have a .dat extension or it is ignored. The program has terse |
| 443 | user interface but is very convenient for viewing U-Boot profile information. |
| 444 | |
| 445 | Also available is trace_cmd_ which provides a command-line interface. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 446 | |
| 447 | Workflow Suggestions |
| 448 | -------------------- |
| 449 | |
| 450 | The following suggestions may be helpful if you are trying to reduce boot |
| 451 | time: |
| 452 | |
| 453 | 1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should get |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 454 | you are helpful overall snapshot of the boot time. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 455 | |
| 456 | 2. Build U-Boot with tracing and run it. Note the difference in boot time |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 457 | (it is common for tracing to add 10% to the time) |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 458 | |
Heinrich Schuchardt | 8963f19 | 2021-11-16 18:38:47 +0100 | [diff] [blame] | 459 | 3. Collect the trace information as described above. Use this to find where |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 460 | all the time is being spent. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 461 | |
Heinrich Schuchardt | 8963f19 | 2021-11-16 18:38:47 +0100 | [diff] [blame] | 462 | 4. Take a look at that code and see if you can optimize it. Perhaps it is |
| 463 | possible to speed up the initialization of a device, or remove an unused |
Heinrich Schuchardt | cc26f74 | 2020-12-12 10:14:22 +0100 | [diff] [blame] | 464 | feature. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 465 | |
| 466 | 5. Rebuild, run and collect again. Compare your results. |
| 467 | |
| 468 | 6. Keep going until you run out of steam, or your boot is fast enough. |
| 469 | |
| 470 | |
| 471 | Configuring Trace |
| 472 | ----------------- |
| 473 | |
| 474 | There are a few parameters in the code that you may want to consider. |
| 475 | There is a function call depth limit (set to 15 by default). When the |
| 476 | stack depth goes above this then no tracing information is recorded. |
| 477 | The maximum depth reached is recorded and displayed by the 'trace stats' |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 478 | command. While it might be tempting to set the depth limit quite high, this |
| 479 | can dramatically increase the size of the trace output as well as the execution |
| 480 | time. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 481 | |
| 482 | |
| 483 | Future Work |
| 484 | ----------- |
| 485 | |
| 486 | Tracing could be a little tidier in some areas, for example providing |
| 487 | run-time configuration options for trace. |
| 488 | |
| 489 | Some other features that might be useful: |
| 490 | |
| 491 | - Trace filter to select which functions are recorded |
| 492 | - Sample-based profiling using a timer interrupt |
| 493 | - Better control over trace depth |
| 494 | - Compression of trace information |
| 495 | |
| 496 | |
Simon Glass | b9d1997 | 2023-01-15 14:16:01 -0700 | [diff] [blame] | 497 | .. sectionauthor:: Simon Glass <sjg@chromium.org> |
| 498 | .. April 2013 |
| 499 | .. Updated January 2023 |
| 500 | |
| 501 | .. _kernelshark: https://kernelshark.org/ |
| 502 | .. _trace_cmd: https://www.trace-cmd.org/ |
| 503 | .. _flamegraph_pl: https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl |