Simon Glass | c44b800 | 2013-06-11 11:14:39 -0700 | [diff] [blame] | 1 | # |
| 2 | # Copyright (c) 2013 The Chromium OS Authors. |
| 3 | # |
| 4 | # This program is free software; you can redistribute it and/or |
| 5 | # modify it under the terms of the GNU General Public License as |
| 6 | # published by the Free Software Foundatio; either version 2 of |
| 7 | # the License, or (at your option) any later version. |
| 8 | # |
| 9 | # This program is distributed in the hope that it will be useful, |
| 10 | # but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 11 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 12 | # GNU General Public License for more details. |
| 13 | # |
| 14 | # You should have received a copy of the GNU General Public License |
| 15 | # along with this program; if not, write to the Free Software |
| 16 | # Foundation, Inc., 59 Temple Place, Suite 330, Boston, |
| 17 | # MA 02111-1307 USA |
| 18 | # |
| 19 | |
| 20 | Tracing in U-Boot |
| 21 | ================= |
| 22 | |
| 23 | U-Boot supports a simple tracing feature which allows a record of excecution |
| 24 | to be collected and sent to a host machine for analysis. At present the |
| 25 | main use for this is to profile boot time. |
| 26 | |
| 27 | |
| 28 | Overview |
| 29 | -------- |
| 30 | |
| 31 | The trace feature uses GCC's instrument-functions feature to trace all |
| 32 | function entry/exit points. These are then recorded in a memory buffer. |
| 33 | The memory buffer can be saved to the host over a network link using |
| 34 | tftpput or by writing to an attached memory device such as MMC. |
| 35 | |
| 36 | On the host, the file is first converted with a tool called 'proftool', |
| 37 | which extracts useful information from it. The resulting trace output |
| 38 | resembles that emitted by Linux's ftrace feature, so can be visually |
| 39 | displayed by pytimechart. |
| 40 | |
| 41 | |
| 42 | Quick-start using Sandbox |
| 43 | ------------------------- |
| 44 | |
| 45 | Sandbox is a build of U-Boot that can run under Linux so it is a convenient |
| 46 | way of trying out tracing before you use it on your actual board. To do |
| 47 | this, follow these steps: |
| 48 | |
| 49 | Add the following to include/configs/sandbox.h (if not already there) |
| 50 | |
| 51 | #define CONFIG_TRACE |
| 52 | #define CONFIG_CMD_TRACE |
| 53 | #define CONFIG_TRACE_BUFFER_SIZE (16 << 20) |
| 54 | #define CONFIG_TRACE_EARLY_SIZE (8 << 20) |
| 55 | #define CONFIG_TRACE_EARLY |
| 56 | #define CONFIG_TRACE_EARLY_ADDR 0x00100000 |
| 57 | |
| 58 | Build sandbox U-Boot with tracing enabled: |
| 59 | |
| 60 | $ make FTRACE=1 O=sandbox sandbox_config |
| 61 | $ make FTRACE=1 O=sandbox |
| 62 | |
| 63 | Run sandbox, wait for a bit of trace information to appear, and then capture |
| 64 | a trace: |
| 65 | |
| 66 | $ ./sandbox/u-boot |
| 67 | |
| 68 | |
| 69 | U-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) |
| 70 | |
| 71 | DRAM: 128 MiB |
| 72 | trace: enabled |
| 73 | Using default environment |
| 74 | |
| 75 | In: serial |
| 76 | Out: serial |
| 77 | Err: serial |
| 78 | =>trace stats |
| 79 | 671,406 function sites |
| 80 | 69,712 function calls |
| 81 | 0 untracked function calls |
| 82 | 73,373 traced function calls |
| 83 | 16 maximum observed call depth |
| 84 | 15 call depth limit |
| 85 | 66,491 calls not traced due to depth |
| 86 | =>trace stats |
| 87 | 671,406 function sites |
| 88 | 1,279,450 function calls |
| 89 | 0 untracked function calls |
| 90 | 950,490 traced function calls (333217 dropped due to overflow) |
| 91 | 16 maximum observed call depth |
| 92 | 15 call depth limit |
| 93 | 1,275,767 calls not traced due to depth |
| 94 | =>trace calls 0 e00000 |
| 95 | Call list dumped to 00000000, size 0xae0a40 |
| 96 | =>print |
| 97 | baudrate=115200 |
| 98 | profbase=0 |
| 99 | profoffset=ae0a40 |
| 100 | profsize=e00000 |
| 101 | stderr=serial |
| 102 | stdin=serial |
| 103 | stdout=serial |
| 104 | |
| 105 | Environment size: 117/8188 bytes |
| 106 | =>sb save host 0 trace 0 ${profoffset} |
| 107 | 11405888 bytes written in 10 ms (1.1 GiB/s) |
| 108 | =>reset |
| 109 | |
| 110 | |
| 111 | Then run proftool to convert the trace information to ftrace format. |
| 112 | |
| 113 | $ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt |
| 114 | |
| 115 | Finally run pytimechart to display it: |
| 116 | |
| 117 | $ pytimechart trace.txt |
| 118 | |
| 119 | Using this tool you can zoom and pan across the trace, with the function |
| 120 | calls on the left and little marks representing the start and end of each |
| 121 | function. |
| 122 | |
| 123 | |
| 124 | CONFIG Options |
| 125 | -------------- |
| 126 | |
| 127 | - CONFIG_TRACE |
| 128 | Enables the trace feature in U-Boot. |
| 129 | |
| 130 | - CONFIG_CMD_TRACE |
| 131 | Enables the trace command. |
| 132 | |
| 133 | - CONFIG_TRACE_BUFFER_SIZE |
| 134 | Size of trace buffer to allocate for U-Boot. This buffer is |
| 135 | used after relocation, as a place to put function tracing |
| 136 | information. The address of the buffer is determined by |
| 137 | the relocation code. |
| 138 | |
| 139 | - CONFIG_TRACE_EARLY |
| 140 | Define this to start tracing early, before relocation. |
| 141 | |
| 142 | - CONFIG_TRACE_EARLY_SIZE |
| 143 | Size of 'early' trace buffer. Before U-Boot has relocated |
| 144 | it doesn't have a proper trace buffer. On many boards |
| 145 | you can define an area of memory to use for the trace |
| 146 | buffer until the 'real' trace buffer is available after |
| 147 | relocation. The contents of this buffer are then copied to |
| 148 | the real buffer. |
| 149 | |
| 150 | - CONFIG_TRACE_EARLY_ADDR |
| 151 | Address of early trace buffer |
| 152 | |
| 153 | |
| 154 | Building U-Boot with Tracing Enabled |
| 155 | ------------------------------------ |
| 156 | |
| 157 | Pass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. |
| 158 | This is kept as a separate option so that it is easy to enable/disable |
| 159 | instrumenting from the command line instead of having to change board |
| 160 | config files. |
| 161 | |
| 162 | |
| 163 | Collecting Trace Data |
| 164 | --------------------- |
| 165 | |
| 166 | When you run U-Boot on your board it will collect trace data up to the |
| 167 | limit of the trace buffer size you have specified. Once that is exhausted |
| 168 | no more data will be collected. |
| 169 | |
| 170 | Collecting trace data has an affect on execution time/performance. You |
| 171 | will notice this particularly with trvial functions - the overhead of |
| 172 | recording their execution may even exceed their normal execution time. |
| 173 | In practice this doesn't matter much so long as you are aware of the |
| 174 | effect. Once you have done your optimisations, turn off tracing before |
| 175 | doing end-to-end timing. |
| 176 | |
| 177 | The best time to start tracing is right at the beginning of U-Boot. The |
| 178 | best time to stop tracing is right at the end. In practice it is hard |
| 179 | to achieve these ideals. |
| 180 | |
| 181 | This implementation enables tracing early in board_init_f(). This means |
| 182 | that it captures most of the board init process, missing only the |
| 183 | early architecture-specific init. However, it also misses the entire |
| 184 | SPL stage if there is one. |
| 185 | |
| 186 | U-Boot typically ends with a 'bootm' command which loads and runs an |
| 187 | OS. There is useful trace data in the execution of that bootm |
| 188 | command. Therefore this implementation provides a way to collect trace |
| 189 | data after bootm has finished processing, but just before it jumps to |
| 190 | the OS. In practical terms, U-Boot runs the 'fakegocmd' environment |
| 191 | variable at this point. This variable should have a short script which |
| 192 | collects the trace data and writes it somewhere. |
| 193 | |
| 194 | Trace data collection relies on a microsecond timer, accesed through |
| 195 | timer_get_us(). So the first think you should do is make sure that |
| 196 | this produces sensible results for your board. Suitable sources for |
| 197 | this timer include high resolution timers, PWMs or profile timers if |
| 198 | available. Most modern SOCs have a suitable timer for this. Make sure |
| 199 | that you mark this timer (and anything it calls) with |
| 200 | __attribute__((no_instrument_function)) so that the trace library can |
| 201 | use it without causing an infinite loop. |
| 202 | |
| 203 | |
| 204 | Commands |
| 205 | -------- |
| 206 | |
| 207 | The trace command has variable sub-commands: |
| 208 | |
| 209 | - stats |
| 210 | Display tracing statistics |
| 211 | |
| 212 | - pause |
| 213 | Pause tracing |
| 214 | |
| 215 | - resume |
| 216 | Resume tracing |
| 217 | |
| 218 | - funclist [<addr> <size>] |
| 219 | Dump a list of functions into the buffer |
| 220 | |
| 221 | - calls [<addr> <size>] |
| 222 | Dump function call trace into buffer |
| 223 | |
| 224 | If the address and size are not given, these are obtained from environment |
| 225 | variables (see below). In any case the environment variables are updated |
| 226 | after the command runs. |
| 227 | |
| 228 | |
| 229 | Environment Variables |
| 230 | --------------------- |
| 231 | |
| 232 | The following are used: |
| 233 | |
| 234 | - profbase |
| 235 | Base address of trace output buffer |
| 236 | |
| 237 | - profoffset |
| 238 | Offset of first unwritten byte in trace output buffer |
| 239 | |
| 240 | - profsize |
| 241 | Size of trace output buffer |
| 242 | |
| 243 | All of these are set by the 'trace calls' command. |
| 244 | |
| 245 | These variables keep track of the amount of data written to the trace |
| 246 | output buffer by the 'trace' command. The trace commands which write data |
| 247 | to the output buffer can use these to specify the buffer to write to, and |
| 248 | update profoffset each time. This allows successive commands to append data |
| 249 | to the same buffer, for example: |
| 250 | |
| 251 | trace funclist 10000 e00000 |
| 252 | trace calls |
| 253 | |
| 254 | (the latter command appends more data to the buffer). |
| 255 | |
| 256 | |
| 257 | - fakegocmd |
| 258 | Specifies commands to run just before booting the OS. This |
| 259 | is a useful time to write the trace data to the host for |
| 260 | processing. |
| 261 | |
| 262 | |
| 263 | Writing Out Trace Data |
| 264 | ---------------------- |
| 265 | |
| 266 | Once the trace data is in an output buffer in memory there are various ways |
| 267 | to transmit it to the host. Notably you can use tftput to send the data |
| 268 | over a network link: |
| 269 | |
| 270 | fakegocmd=trace pause; usb start; set autoload n; bootp; |
| 271 | trace calls 10000000 1000000; |
| 272 | tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls |
| 273 | |
| 274 | This starts up USB (to talk to an attached USB Ethernet dongle), writes |
| 275 | a trace log to address 10000000 and sends it to a host machine using |
| 276 | TFTP. After this, U-Boot will boot the OS normally, albeit a little |
| 277 | later. |
| 278 | |
| 279 | |
| 280 | Converting Trace Output Data |
| 281 | ---------------------------- |
| 282 | |
| 283 | The trace output data is kept in a binary format which is not documented |
| 284 | here. To convert it into something useful, you can use proftool. |
| 285 | |
| 286 | This tool must be given the U-Boot map file and the trace data received |
| 287 | from running that U-Boot. It produces a text output file. |
| 288 | |
| 289 | Options |
| 290 | -m <map_file> |
| 291 | Specify U-Boot map file |
| 292 | |
| 293 | -p <trace_file> |
| 294 | Specifiy profile/trace file |
| 295 | |
| 296 | Commands: |
| 297 | |
| 298 | - dump-ftrace |
| 299 | Write a text dump of the file in Linux ftrace format to stdout |
| 300 | |
| 301 | |
| 302 | Viewing the Trace Data |
| 303 | ---------------------- |
| 304 | |
| 305 | You can use pytimechart for this (sudo apt-get pytimechart might work on |
| 306 | your Debian-style machine, and use your favourite search engine to obtain |
| 307 | documentation). It expects the file to have a .txt extension. The program |
| 308 | has terse user interface but is very convenient for viewing U-Boot |
| 309 | profile information. |
| 310 | |
| 311 | |
| 312 | Workflow Suggestions |
| 313 | -------------------- |
| 314 | |
| 315 | The following suggestions may be helpful if you are trying to reduce boot |
| 316 | time: |
| 317 | |
| 318 | 1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should get |
| 319 | you are helpful overall snapshot of the boot time. |
| 320 | |
| 321 | 2. Build U-Boot with tracing and run it. Note the difference in boot time |
| 322 | (it is common for tracing to add 10% to the time) |
| 323 | |
| 324 | 3. Collect the trace information as descibed above. Use this to find where |
| 325 | all the time is being spent. |
| 326 | |
| 327 | 4. Take a look at that code and see if you can optimise it. Perhaps it is |
| 328 | possible to speed up the initialisation of a device, or remove an unused |
| 329 | feature. |
| 330 | |
| 331 | 5. Rebuild, run and collect again. Compare your results. |
| 332 | |
| 333 | 6. Keep going until you run out of steam, or your boot is fast enough. |
| 334 | |
| 335 | |
| 336 | Configuring Trace |
| 337 | ----------------- |
| 338 | |
| 339 | There are a few parameters in the code that you may want to consider. |
| 340 | There is a function call depth limit (set to 15 by default). When the |
| 341 | stack depth goes above this then no tracing information is recorded. |
| 342 | The maximum depth reached is recorded and displayed by the 'trace stats' |
| 343 | command. |
| 344 | |
| 345 | |
| 346 | Future Work |
| 347 | ----------- |
| 348 | |
| 349 | Tracing could be a little tidier in some areas, for example providing |
| 350 | run-time configuration options for trace. |
| 351 | |
| 352 | Some other features that might be useful: |
| 353 | |
| 354 | - Trace filter to select which functions are recorded |
| 355 | - Sample-based profiling using a timer interrupt |
| 356 | - Better control over trace depth |
| 357 | - Compression of trace information |
| 358 | |
| 359 | |
| 360 | Simon Glass <sjg@chromium.org> |
| 361 | April 2013 |