PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of four
Cortex-A53 cores and a cluster of two Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF takes its timestamps from the
generic counter, which runs at 50MHz on Juno.
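
As a rough guide to what the 50MHz counter implies, the sketch below converts a
pair of raw counter samples into microseconds. One tick is 20ns, which sets the
resolution of every latency reported here. The function is a hypothetical helper
written for this illustration, not part of PMF, and it assumes the counter does
not wrap between the two samples.

.. code-block:: python

    # Hypothetical helper, not PMF code: converts the difference between two
    # generic counter samples into microseconds. Assumes the 50MHz Juno
    # counter and no counter wrap between the two samples.
    COUNTER_FREQ_HZ = 50_000_000


    def ticks_to_us(start: int, end: int, freq_hz: int = COUNTER_FREQ_HZ) -> float:
        """Return the time between two counter samples in microseconds."""
        if end < start:
            raise ValueError("end sample precedes start sample")
        return (end - start) * 1_000_000 / freq_hz


    # The smallest measurable interval is a single tick: 0.02us (20ns).
    RESOLUTION_US = ticks_to_us(0, 1)

For example, ``ticks_to_us(0, 151)`` returns 3.02, so 151 ticks is 3.02µs.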

The following source trees and binaries were used:

- `TF-A v2.13-rc0`_
- `TFTF v2.13-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>` page for more details. The tests were
run in CI using the
``tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf``
configuration.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    parallel (v2.13)

    +---------+------+------------------+-------------------+--------------------+
    | Cluster | Core | Powerdown        | Wakeup            | Cache Flush        |
    +---------+------+------------------+-------------------+--------------------+
    | 0       | 0    | 333.0 (-52.92%)  | 23.92 (-40.11%)   | 138.88             |
    +---------+------+------------------+-------------------+--------------------+
    | 0       | 1    | 630.9 (+145.95%) | 253.72 (-46.56%)  | 136.94 (+1987.50%) |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 0    | 184.74 (+71.92%) | 23.16 (-95.39%)   | 80.24 (+1283.45%)  |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 1    | 481.14           | 18.56 (-88.25%)   | 76.5 (+1520.76%)   |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 2    | 933.88 (+67.76%) | 289.58 (+189.64%) | 76.34 (+1510.55%)  |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 3    | 1112.48          | 238.42 (+753.94%) | 76.38              |
    +---------+------+------------------+-------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    parallel (v2.12)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core | Powerdown         | Wakeup           | Cache Flush        |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 0    | 244.52 (-65.43%)  | 26.92 (-32.60%)  | 5.54 (-96.70%)     |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 1    | 526.18 (+105.12%) | 416.1            | 138.52 (+2011.59%) |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 0    | 104.34            | 27.02 (-94.62%)  | 5.32               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 1    | 384.98            | 23.06 (-85.40%)  | 4.48               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 2    | 812.44 (+45.94%)  | 126.78           | 4.54               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 3    | 986.84            | 77.22 (+176.58%) | 79.76              |
    +---------+------+-------------------+------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    serial (v2.13)

    +---------+------+------------------+-----------------+-------------------+
    | Cluster | Core | Powerdown        | Wakeup          | Cache Flush       |
    +---------+------+------------------+-----------------+-------------------+
    | 0       | 0    | 244.08           | 24.48 (-40.00%) | 137.64            |
    +---------+------+------------------+-----------------+-------------------+
    | 0       | 1    | 244.2            | 23.84 (-41.57%) | 137.86            |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 0    | 294.78           | 23.54           | 76.62             |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 1    | 180.1 (+74.72%)  | 21.14           | 77.12 (+1533.90%) |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 2    | 180.54 (+75.25%) | 20.8            | 76.76 (+1554.31%) |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 3    | 180.6 (+75.44%)  | 21.2            | 76.86 (+1542.31%) |
    +---------+------+------------------+-----------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
    serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 236.36    | 27.94 (-31.52%) | 138.0       |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 236.58    | 27.86 (-31.72%) | 138.2       |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 280.68    | 27.02           | 77.6        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 101.4     | 22.52           | 4.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 100.92    | 22.68           | 4.4         |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 100.96    | 22.54           | 4.38        |
    +---------+------+-----------+-----------------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
    parallel (v2.13)

    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 0    | 703.06            | 16.86 (-47.87%) | 7.98        |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 1    | 851.88            | 16.4 (-49.41%)  | 8.04        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 0    | 407.4 (+58.99%)   | 15.1 (-26.20%)  | 7.2         |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 1    | 110.98 (-72.67%)  | 15.46           | 6.56        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 2    | 554.54            | 15.4            | 6.94        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 3    | 258.96 (+143.06%) | 15.56 (-25.05%) | 6.64        |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
    parallel (v2.12)

    +--------------------------------------------------------------------+
    | test_rt_instr_cpu_susp_parallel                                    |
    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 0    | 663.12            | 19.66 (-39.21%) | 8.26        |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 1    | 804.18            | 19.24 (-40.65%) | 8.1         |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 0    | 105.58 (-58.80%)  | 19.68           | 7.42        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 1    | 245.02 (-39.67%)  | 19.8            | 6.82        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 2    | 383.82 (-30.83%)  | 18.84           | 7.06        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 3    | 523.36 (+391.23%) | 19.0            | 7.3         |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 106.12    | 17.1 (-48.24%)  | 5.26        |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 106.88    | 17.06 (-47.08%) | 5.28        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 294.36    | 15.6            | 4.56        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 103.26    | 15.44           | 4.46        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 103.7     | 15.26           | 4.5         |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 103.68    | 15.72           | 4.5         |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 100.04    | 20.32 (-38.50%) | 5.62        |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 99.78     | 20.6 (-36.10%)  | 5.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 278.28    | 19.52           | 4.32        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 97.3      | 19.44           | 4.26        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 97.56     | 19.52           | 4.32        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 97.52     | 19.46           | 4.26        |
    +---------+------+-----------+-----------------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This test calls ``CPU_OFF`` on all non-lead CPUs in sequence, then
``CPU_SUSPEND`` on the lead core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 243.02    | 26.42 (-39.51%) | 137.58      |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 244.24    | 26.32 (-38.93%) | 137.88      |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 182.36    | 23.66           | 78.0        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 108.18    | 22.68           | 4.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 108.34    | 21.72           | 4.24        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 108.22    | 21.68           | 4.34        |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 236.3     | 30.88 (-29.30%) | 137.76      |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 236.66    | 30.5 (-29.23%)  | 138.02      |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 175.9     | 27.0            | 77.86       |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 100.96    | 27.56           | 4.26        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 101.04    | 26.48           | 4.38        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 101.08    | 26.74           | 4.4         |
    +---------+------+-----------+-----------------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +-------------+--------+--------------+
    | 0           | 0      | 1.0          |
    +-------------+--------+--------------+
    | 0           | 1      | 1.06         |
    +-------------+--------+--------------+
    | 1           | 0      | 0.6          |
    +-------------+--------+--------------+
    | 1           | 1      | 1.0          |
    +-------------+--------+--------------+
    | 1           | 2      | 0.98         |
    +-------------+--------+--------------+
    | 1           | 3      | 1.0          |
    +-------------+--------+--------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.12)

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +-------------+--------+--------------+
    | 0           | 0      | 1.0          |
    +-------------+--------+--------------+
    | 0           | 1      | 1.02         |
    +-------------+--------+--------------+
    | 1           | 0      | 0.52         |
    +-------------+--------+--------------+
    | 1           | 1      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 2      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 3      | 0.92         |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.
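
Each of these quantities is the difference between two timestamps captured on
the same CPU. The sketch below shows that bookkeeping for one suspend cycle.
The six capture points, their names and the pairing of them into the three
latencies are assumptions made for illustration; this is not the TF-A or TFTF
code that produced the tables.

.. code-block:: python

    from __future__ import annotations

    from dataclasses import dataclass

    COUNTER_FREQ_HZ = 50_000_000  # Juno generic counter


    @dataclass
    class SuspendTimestamps:
        """Hypothetical per-CPU counter samples (ticks) for one suspend cycle."""
        enter_psci: int        # PSCI call entered at EL3
        enter_cflush: int      # cache maintenance started
        exit_cflush: int       # cache maintenance finished
        enter_hw_low_pwr: int  # about to enter the hardware low-power state
        exit_hw_low_pwr: int   # execution resumed after wakeup
        exit_psci: int         # about to return from the PSCI call


    def _to_us(ticks: int) -> float:
        return ticks * 1_000_000 / COUNTER_FREQ_HZ


    def latencies_us(ts: SuspendTimestamps) -> dict[str, float]:
        """Assumed mapping from timestamp pairs to the reported latencies."""
        return {
            "PSCI_ENTRY": _to_us(ts.enter_hw_low_pwr - ts.enter_psci),
            "PSCI_EXIT": _to_us(ts.exit_psci - ts.exit_hw_low_pwr),
            "CFLUSH_OVERHEAD": _to_us(ts.exit_cflush - ts.enter_cflush),
        }


    # Made-up samples, chosen to reproduce CPU 0 of the first table below.
    example = SuspendTimestamps(0, 500, 750, 1350, 90_000, 91_000)
    print(latencies_us(example))
    # {'PSCI_ENTRY': 27.0, 'PSCI_EXIT': 20.0, 'CFLUSH_OVERHEAD': 5.0}

Note that under this assumed pairing the cache flush happens inside the
``PSCI_ENTRY`` window, so ``CFLUSH_OVERHEAD`` is a component of the powerdown
latency rather than an addition to it.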

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, so both the L1 and L2
caches must be flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache of the big cluster (2MB) is twice the size of that of the
little cluster (1MB).
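
A quick sanity check of this explanation is to compare the ratio of the two
flush times with the ratio of the L2 sizes. The numbers come from this section;
treating flush time as roughly proportional to L2 size is a simplifying
assumption that ignores the L1 flush and any fixed overhead.

.. code-block:: python

    # CFLUSH_OVERHEAD (us) of the last CPU to power down in each cluster,
    # taken from the table above, and the L2 sizes (MB) quoted in the text.
    cflush_us = {"little": 94, "big": 206}  # CPU 3 and CPU 5
    l2_size_mb = {"little": 1, "big": 2}

    flush_ratio = cflush_us["big"] / cflush_us["little"]
    size_ratio = l2_size_mb["big"] / l2_size_mb["little"]

    print(f"flush time ratio: {flush_ratio:.2f}")  # flush time ratio: 2.19
    print(f"L2 size ratio:    {size_ratio:.2f}")   # L2 size ratio:    2.00

The two ratios are close, which is consistent with the L2 flush dominating the
cost for the last CPU in a cluster.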

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.
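
The serialization shows up as a roughly constant step between successive
little-cluster ``PSCI_ENTRY`` times. The snippet below computes those steps
from the table above. Reading the mean step as the time one CPU occupies the
SCP channel is an interpretation, and it assumes CPUs 0-3 were served in index
order, which the monotonic times suggest but do not prove.

.. code-block:: python

    # Little-cluster PSCI_ENTRY times (us) for CPUs 0-3, from the table above.
    entry_us = [116, 204, 287, 376]

    # Extra time each CPU spent compared with the CPU served before it.
    steps = [later - earlier for earlier, later in zip(entry_us, entry_us[1:])]
    mean_step = sum(steps) / len(steps)

    print(steps)               # [88, 83, 89]
    print(f"{mean_step:.1f}")  # 86.7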

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and more consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 (L1) is flushed.

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead
cluster are large because all other CPUs in the same cluster are powered down
during the test. The ``CPU_SUSPEND`` call powers down to the cluster level,
requiring a flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of that of
the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.
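
Why CPU 5 only reaches level 0 follows from PSCI power state coordination: a
cluster is powered down only when every CPU in it permits it. The sketch below
models that rule for Juno's CPU and cluster levels and maps the result to the
caches that need flushing. It is a simplified illustration; the functions and
the ``RUNNING`` marker are hypothetical and do not correspond to TF-A's
implementation, and the system power level is left out.

.. code-block:: python

    from __future__ import annotations

    # Simplified model of PSCI power state coordination within one cluster.
    # Not TF-A code: levels are 0 (CPU) and 1 (cluster); level 2 is omitted.
    CPU_LEVEL = 0
    CLUSTER_LEVEL = 1
    RUNNING = -1  # a running CPU does not request any power down


    def effective_level(requested: int, other_cpus: list[int]) -> int:
        """Deepest level the calling CPU can actually power down to.

        ``other_cpus`` holds the deepest level each other CPU in the cluster
        currently allows (a CPU that is off counts as allowing cluster power
        down). The cluster goes down only if every CPU in it allows it.
        """
        if requested >= CLUSTER_LEVEL and all(
            level >= CLUSTER_LEVEL for level in other_cpus
        ):
            return CLUSTER_LEVEL
        return CPU_LEVEL


    def caches_to_flush(level: int) -> list[str]:
        # Level 0 loses only the private L1; level 1 also loses the shared L2.
        return ["L1"] if level == CPU_LEVEL else ["L1", "L2"]


    # CPU 5 suspends to the deepest level while lead CPU 4 keeps running.
    cpu5 = effective_level(CLUSTER_LEVEL, other_cpus=[RUNNING])
    print(cpu5, caches_to_flush(cpu5))  # 0 ['L1']

    # CPU 4 suspends once CPU 5 is already powered down.
    cpu4 = effective_level(CLUSTER_LEVEL, other_cpus=[CLUSTER_LEVEL])
    print(cpu4, caches_to_flush(cpu4))  # 1 ['L1', 'L2']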

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
those for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
when powering on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to collect its timestamps.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
those for CPUs in the little cluster due to greater CPU performance. These times
are generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round-trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.
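
To put these nanosecond figures in context, they can be converted into CPU
cycles. This sketch assumes the historic run used the same core frequencies as
the Method section (650MHz for the Cortex-A53s, 900MHz for the Cortex-A57s);
the historic results do not state the frequencies they were taken at.

.. code-block:: python

    from __future__ import annotations

    # PSCI_VERSION round-trip times (ns) from the table above.
    little_ns = [3020, 2940, 2980, 3060]  # CPUs 0-3, Cortex-A53
    big_ns = [520, 720]                   # CPUs 4-5, Cortex-A57

    # Assumed core clocks (MHz), borrowed from the Method section.
    A53_MHZ = 650
    A57_MHZ = 900


    def mean(values: list[int]) -> float:
        return sum(values) / len(values)


    def ns_to_cycles(ns: float, mhz: int) -> float:
        # MHz is cycles per microsecond, so cycles = ns * MHz / 1000.
        return ns * mhz / 1000


    little_cycles = ns_to_cycles(mean(little_ns), A53_MHZ)
    big_cycles = ns_to_cycles(mean(big_ns), A57_MHZ)
    print(f"A53: {little_cycles:.0f} cycles, A57: {big_cycles:.0f} cycles")
    # A53: 1950 cycles, A57: 558 cycles

Under these assumptions the A53s take about 3.5 times as many cycles for the
same call, so clock frequency alone does not account for the gap.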

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.13-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.13-rc0
.. _TFTF v2.13-rc0: https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/?h=v2.13-rc0