Tom Rini | 10e4779 | 2018-05-06 17:58:06 -0400 | [diff] [blame] | 1 | # SPDX-License-Identifier: GPL-2.0+ |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 2 | # |
| 3 | # Copyright (c) 2013 The Chromium OS Authors. |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 4 | |
| 5 | Tracing in U-Boot |
| 6 | ================= |
| 7 | |
| 8 | U-Boot supports a simple tracing feature which allows a record of excecution |
| 9 | to be collected and sent to a host machine for analysis. At present the |
| 10 | main use for this is to profile boot time. |
| 11 | |
| 12 | |
| 13 | Overview |
| 14 | -------- |
| 15 | |
| 16 | The trace feature uses GCC's instrument-functions feature to trace all |
| 17 | function entry/exit points. These are then recorded in a memory buffer. |
| 18 | The memory buffer can be saved to the host over a network link using |
| 19 | tftpput or by writing to an attached memory device such as MMC. |
| 20 | |
| 21 | On the host, the file is first converted with a tool called 'proftool', |
| 22 | which extracts useful information from it. The resulting trace output |
| 23 | resembles that emitted by Linux's ftrace feature, so can be visually |
| 24 | displayed by pytimechart. |
| 25 | |
| 26 | |
| 27 | Quick-start using Sandbox |
| 28 | ------------------------- |
| 29 | |
| 30 | Sandbox is a build of U-Boot that can run under Linux so it is a convenient |
| 31 | way of trying out tracing before you use it on your actual board. To do |
| 32 | this, follow these steps: |
| 33 | |
| 34 | Add the following to include/configs/sandbox.h (if not already there) |
| 35 | |
| 36 | #define CONFIG_TRACE |
| 37 | #define CONFIG_CMD_TRACE |
| 38 | #define CONFIG_TRACE_BUFFER_SIZE (16 << 20) |
| 39 | #define CONFIG_TRACE_EARLY_SIZE (8 << 20) |
| 40 | #define CONFIG_TRACE_EARLY |
| 41 | #define CONFIG_TRACE_EARLY_ADDR 0x00100000 |
| 42 | |
| 43 | Build sandbox U-Boot with tracing enabled: |
| 44 | |
| 45 | $ make FTRACE=1 O=sandbox sandbox_config |
| 46 | $ make FTRACE=1 O=sandbox |
| 47 | |
| 48 | Run sandbox, wait for a bit of trace information to appear, and then capture |
| 49 | a trace: |
| 50 | |
| 51 | $ ./sandbox/u-boot |
| 52 | |
| 53 | |
| 54 | U-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) |
| 55 | |
| 56 | DRAM: 128 MiB |
| 57 | trace: enabled |
| 58 | Using default environment |
| 59 | |
| 60 | In: serial |
| 61 | Out: serial |
| 62 | Err: serial |
| 63 | =>trace stats |
Wolfgang Denk | ec7fbf5 | 2013-10-04 17:43:24 +0200 | [diff] [blame] | 64 | 671,406 function sites |
| 65 | 69,712 function calls |
| 66 | 0 untracked function calls |
| 67 | 73,373 traced function calls |
| 68 | 16 maximum observed call depth |
| 69 | 15 call depth limit |
| 70 | 66,491 calls not traced due to depth |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 71 | =>trace stats |
Wolfgang Denk | ec7fbf5 | 2013-10-04 17:43:24 +0200 | [diff] [blame] | 72 | 671,406 function sites |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 73 | 1,279,450 function calls |
Wolfgang Denk | ec7fbf5 | 2013-10-04 17:43:24 +0200 | [diff] [blame] | 74 | 0 untracked function calls |
| 75 | 950,490 traced function calls (333217 dropped due to overflow) |
| 76 | 16 maximum observed call depth |
| 77 | 15 call depth limit |
Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 78 | 1,275,767 calls not traced due to depth |
| 79 | =>trace calls 0 e00000 |
| 80 | Call list dumped to 00000000, size 0xae0a40 |
| 81 | =>print |
| 82 | baudrate=115200 |
| 83 | profbase=0 |
| 84 | profoffset=ae0a40 |
| 85 | profsize=e00000 |
| 86 | stderr=serial |
| 87 | stdin=serial |
| 88 | stdout=serial |
| 89 | |
| 90 | Environment size: 117/8188 bytes |
| 91 | =>sb save host 0 trace 0 ${profoffset} |
| 92 | 11405888 bytes written in 10 ms (1.1 GiB/s) |
| 93 | =>reset |
| 94 | |
| 95 | |
| 96 | Then run proftool to convert the trace information to ftrace format. |
| 97 | |
| 98 | $ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt |
| 99 | |
| 100 | Finally run pytimechart to display it: |
| 101 | |
| 102 | $ pytimechart trace.txt |
| 103 | |
| 104 | Using this tool you can zoom and pan across the trace, with the function |
| 105 | calls on the left and little marks representing the start and end of each |
| 106 | function. |
| 107 | |
| 108 | |
| 109 | CONFIG Options |
| 110 | -------------- |
| 111 | |
| 112 | - CONFIG_TRACE |
| 113 | Enables the trace feature in U-Boot. |
| 114 | |
| 115 | - CONFIG_CMD_TRACE |
| 116 | Enables the trace command. |
| 117 | |
| 118 | - CONFIG_TRACE_BUFFER_SIZE |
| 119 | Size of trace buffer to allocate for U-Boot. This buffer is |
| 120 | used after relocation, as a place to put function tracing |
| 121 | information. The address of the buffer is determined by |
| 122 | the relocation code. |
| 123 | |
| 124 | - CONFIG_TRACE_EARLY |
| 125 | Define this to start tracing early, before relocation. |
| 126 | |
| 127 | - CONFIG_TRACE_EARLY_SIZE |
| 128 | Size of 'early' trace buffer. Before U-Boot has relocated |
| 129 | it doesn't have a proper trace buffer. On many boards |
| 130 | you can define an area of memory to use for the trace |
| 131 | buffer until the 'real' trace buffer is available after |
| 132 | relocation. The contents of this buffer are then copied to |
| 133 | the real buffer. |
| 134 | |
| 135 | - CONFIG_TRACE_EARLY_ADDR |
| 136 | Address of early trace buffer |
| 137 | |
| 138 | |
| 139 | Building U-Boot with Tracing Enabled |
| 140 | ------------------------------------ |
| 141 | |
| 142 | Pass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. |
| 143 | This is kept as a separate option so that it is easy to enable/disable |
| 144 | instrumenting from the command line instead of having to change board |
| 145 | config files. |
| 146 | |
| 147 | |
| 148 | Collecting Trace Data |
| 149 | --------------------- |
| 150 | |
| 151 | When you run U-Boot on your board it will collect trace data up to the |
| 152 | limit of the trace buffer size you have specified. Once that is exhausted |
| 153 | no more data will be collected. |
| 154 | |
| 155 | Collecting trace data has an affect on execution time/performance. You |
| 156 | will notice this particularly with trvial functions - the overhead of |
| 157 | recording their execution may even exceed their normal execution time. |
| 158 | In practice this doesn't matter much so long as you are aware of the |
| 159 | effect. Once you have done your optimisations, turn off tracing before |
| 160 | doing end-to-end timing. |
| 161 | |
| 162 | The best time to start tracing is right at the beginning of U-Boot. The |
| 163 | best time to stop tracing is right at the end. In practice it is hard |
| 164 | to achieve these ideals. |
| 165 | |
| 166 | This implementation enables tracing early in board_init_f(). This means |
| 167 | that it captures most of the board init process, missing only the |
| 168 | early architecture-specific init. However, it also misses the entire |
| 169 | SPL stage if there is one. |
| 170 | |
| 171 | U-Boot typically ends with a 'bootm' command which loads and runs an |
| 172 | OS. There is useful trace data in the execution of that bootm |
| 173 | command. Therefore this implementation provides a way to collect trace |
| 174 | data after bootm has finished processing, but just before it jumps to |
| 175 | the OS. In practical terms, U-Boot runs the 'fakegocmd' environment |
| 176 | variable at this point. This variable should have a short script which |
| 177 | collects the trace data and writes it somewhere. |
| 178 | |
| 179 | Trace data collection relies on a microsecond timer, accesed through |
| 180 | timer_get_us(). So the first think you should do is make sure that |
| 181 | this produces sensible results for your board. Suitable sources for |
| 182 | this timer include high resolution timers, PWMs or profile timers if |
| 183 | available. Most modern SOCs have a suitable timer for this. Make sure |
| 184 | that you mark this timer (and anything it calls) with |
| 185 | __attribute__((no_instrument_function)) so that the trace library can |
| 186 | use it without causing an infinite loop. |
| 187 | |
| 188 | |
| 189 | Commands |
| 190 | -------- |
| 191 | |
| 192 | The trace command has variable sub-commands: |
| 193 | |
| 194 | - stats |
| 195 | Display tracing statistics |
| 196 | |
| 197 | - pause |
| 198 | Pause tracing |
| 199 | |
| 200 | - resume |
| 201 | Resume tracing |
| 202 | |
| 203 | - funclist [<addr> <size>] |
| 204 | Dump a list of functions into the buffer |
| 205 | |
| 206 | - calls [<addr> <size>] |
| 207 | Dump function call trace into buffer |
| 208 | |
| 209 | If the address and size are not given, these are obtained from environment |
| 210 | variables (see below). In any case the environment variables are updated |
| 211 | after the command runs. |
| 212 | |
| 213 | |
| 214 | Environment Variables |
| 215 | --------------------- |
| 216 | |
| 217 | The following are used: |
| 218 | |
| 219 | - profbase |
| 220 | Base address of trace output buffer |
| 221 | |
| 222 | - profoffset |
| 223 | Offset of first unwritten byte in trace output buffer |
| 224 | |
| 225 | - profsize |
| 226 | Size of trace output buffer |
| 227 | |
| 228 | All of these are set by the 'trace calls' command. |
| 229 | |
| 230 | These variables keep track of the amount of data written to the trace |
| 231 | output buffer by the 'trace' command. The trace commands which write data |
| 232 | to the output buffer can use these to specify the buffer to write to, and |
| 233 | update profoffset each time. This allows successive commands to append data |
| 234 | to the same buffer, for example: |
| 235 | |
| 236 | trace funclist 10000 e00000 |
| 237 | trace calls |
| 238 | |
| 239 | (the latter command appends more data to the buffer). |
| 240 | |
| 241 | |
| 242 | - fakegocmd |
| 243 | Specifies commands to run just before booting the OS. This |
| 244 | is a useful time to write the trace data to the host for |
| 245 | processing. |
| 246 | |
| 247 | |
| 248 | Writing Out Trace Data |
| 249 | ---------------------- |
| 250 | |
| 251 | Once the trace data is in an output buffer in memory there are various ways |
| 252 | to transmit it to the host. Notably you can use tftput to send the data |
| 253 | over a network link: |
| 254 | |
| 255 | fakegocmd=trace pause; usb start; set autoload n; bootp; |
| 256 | trace calls 10000000 1000000; |
| 257 | tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls |
| 258 | |
| 259 | This starts up USB (to talk to an attached USB Ethernet dongle), writes |
| 260 | a trace log to address 10000000 and sends it to a host machine using |
| 261 | TFTP. After this, U-Boot will boot the OS normally, albeit a little |
| 262 | later. |
| 263 | |
| 264 | |
| 265 | Converting Trace Output Data |
| 266 | ---------------------------- |
| 267 | |
| 268 | The trace output data is kept in a binary format which is not documented |
| 269 | here. To convert it into something useful, you can use proftool. |
| 270 | |
| 271 | This tool must be given the U-Boot map file and the trace data received |
| 272 | from running that U-Boot. It produces a text output file. |
| 273 | |
| 274 | Options |
| 275 | -m <map_file> |
| 276 | Specify U-Boot map file |
| 277 | |
| 278 | -p <trace_file> |
| 279 | Specifiy profile/trace file |
| 280 | |
| 281 | Commands: |
| 282 | |
| 283 | - dump-ftrace |
| 284 | Write a text dump of the file in Linux ftrace format to stdout |
| 285 | |
| 286 | |
| 287 | Viewing the Trace Data |
| 288 | ---------------------- |
| 289 | |
| 290 | You can use pytimechart for this (sudo apt-get pytimechart might work on |
| 291 | your Debian-style machine, and use your favourite search engine to obtain |
| 292 | documentation). It expects the file to have a .txt extension. The program |
| 293 | has terse user interface but is very convenient for viewing U-Boot |
| 294 | profile information. |
| 295 | |
| 296 | |
| 297 | Workflow Suggestions |
| 298 | -------------------- |
| 299 | |
| 300 | The following suggestions may be helpful if you are trying to reduce boot |
| 301 | time: |
| 302 | |
| 303 | 1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should get |
| 304 | you are helpful overall snapshot of the boot time. |
| 305 | |
| 306 | 2. Build U-Boot with tracing and run it. Note the difference in boot time |
| 307 | (it is common for tracing to add 10% to the time) |
| 308 | |
| 309 | 3. Collect the trace information as descibed above. Use this to find where |
| 310 | all the time is being spent. |
| 311 | |
| 312 | 4. Take a look at that code and see if you can optimise it. Perhaps it is |
| 313 | possible to speed up the initialisation of a device, or remove an unused |
| 314 | feature. |
| 315 | |
| 316 | 5. Rebuild, run and collect again. Compare your results. |
| 317 | |
| 318 | 6. Keep going until you run out of steam, or your boot is fast enough. |
| 319 | |
| 320 | |
| 321 | Configuring Trace |
| 322 | ----------------- |
| 323 | |
| 324 | There are a few parameters in the code that you may want to consider. |
| 325 | There is a function call depth limit (set to 15 by default). When the |
| 326 | stack depth goes above this then no tracing information is recorded. |
| 327 | The maximum depth reached is recorded and displayed by the 'trace stats' |
| 328 | command. |
| 329 | |
| 330 | |
| 331 | Future Work |
| 332 | ----------- |
| 333 | |
| 334 | Tracing could be a little tidier in some areas, for example providing |
| 335 | run-time configuration options for trace. |
| 336 | |
| 337 | Some other features that might be useful: |
| 338 | |
| 339 | - Trace filter to select which functions are recorded |
| 340 | - Sample-based profiling using a timer interrupt |
| 341 | - Better control over trace depth |
| 342 | - Compression of trace information |
| 343 | |
| 344 | |
| 345 | Simon Glass <sjg@chromium.org> |
| 346 | April 2013 |