PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of four
Cortex-A53 cores and a cluster of two Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.

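Every latency reported below is the difference between two raw counter
readings, so the 50MHz counter rate fixes both the unit conversion and the
resolution: one tick is 20 ns. The following Python sketch makes that
arithmetic explicit. The helper names are illustrative and are not part of PMF
or TFTF; the core frequencies are the ones from the table above.

```python
# Illustrative helpers only; these are not PMF or TFTF APIs.
CNTFRQ_HZ = 50_000_000  # generic counter frequency on Juno, as stated above


def ticks_to_ns(ticks: int, freq_hz: int = CNTFRQ_HZ) -> float:
    """Convert a generic counter delta to nanoseconds."""
    return ticks * 1e9 / freq_hz


def ticks_to_us(ticks: int, freq_hz: int = CNTFRQ_HZ) -> float:
    """Convert a generic counter delta to microseconds."""
    return ticks * 1e6 / freq_hz


def core_cycles_per_tick(core_mhz: int, freq_hz: int = CNTFRQ_HZ) -> float:
    """Number of core clock cycles that elapse during one counter tick."""
    return core_mhz * 1e6 / freq_hz


print(ticks_to_ns(1))             # resolution of a single tick: 20.0
print(core_cycles_per_tick(900))  # Cortex-A57 at 900MHz: 18.0
print(core_cycles_per_tick(650))  # Cortex-A53 at 650MHz: 13.0
```

For example, a reported latency of 23.92 µs corresponds to 1196 counter ticks,
and each individual timestamp difference is quantized to 0.02 µs.
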
The following source trees and binaries were used:

- `TF-A v2.13-rc0`_
- `TFTF v2.13-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>` page for more details. The tests were
run using the
``tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf``
configuration in CI.

Results
-------

In the tables below, cluster 0 is the dual-core Cortex-A57 (big) cluster and
cluster 1 is the quad-core Cortex-A53 (little) cluster. The Powerdown, Wakeup
and Cache Flush columns correspond to the ``PSCI_ENTRY``, ``PSCI_EXIT`` and
``CFLUSH_OVERHEAD`` latencies described in the historic results.

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   parallel (v2.13)

   +---------+------+------------------+-------------------+--------------------+
   | Cluster | Core | Powerdown        | Wakeup            | Cache Flush        |
   +---------+------+------------------+-------------------+--------------------+
   | 0       | 0    | 333.0 (-52.92%)  | 23.92 (-40.11%)   | 138.88             |
   +---------+------+------------------+-------------------+--------------------+
   | 0       | 1    | 630.9 (+145.95%) | 253.72 (-46.56%)  | 136.94 (+1987.50%) |
   +---------+------+------------------+-------------------+--------------------+
   | 1       | 0    | 184.74 (+71.92%) | 23.16 (-95.39%)   | 80.24 (+1283.45%)  |
   +---------+------+------------------+-------------------+--------------------+
   | 1       | 1    | 481.14           | 18.56 (-88.25%)   | 76.5 (+1520.76%)   |
   +---------+------+------------------+-------------------+--------------------+
   | 1       | 2    | 933.88 (+67.76%) | 289.58 (+189.64%) | 76.34 (+1510.55%)  |
   +---------+------+------------------+-------------------+--------------------+
   | 1       | 3    | 1112.48          | 238.42 (+753.94%) | 76.38              |
   +---------+------+------------------+-------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   parallel (v2.12)

   +---------+------+-------------------+------------------+--------------------+
   | Cluster | Core | Powerdown         | Wakeup           | Cache Flush        |
   +---------+------+-------------------+------------------+--------------------+
   | 0       | 0    | 244.52 (-65.43%)  | 26.92 (-32.60%)  | 5.54 (-96.70%)     |
   +---------+------+-------------------+------------------+--------------------+
   | 0       | 1    | 526.18 (+105.12%) | 416.1            | 138.52 (+2011.59%) |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 0    | 104.34            | 27.02 (-94.62%)  | 5.32               |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 1    | 384.98            | 23.06 (-85.40%)  | 4.48               |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 2    | 812.44 (+45.94%)  | 126.78           | 4.54               |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 3    | 986.84            | 77.22 (+176.58%) | 79.76              |
   +---------+------+-------------------+------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   serial (v2.13)

   +---------+------+------------------+-----------------+-------------------+
   | Cluster | Core | Powerdown        | Wakeup          | Cache Flush       |
   +---------+------+------------------+-----------------+-------------------+
   | 0       | 0    | 244.08           | 24.48 (-40.00%) | 137.64            |
   +---------+------+------------------+-----------------+-------------------+
   | 0       | 1    | 244.2            | 23.84 (-41.57%) | 137.86            |
   +---------+------+------------------+-----------------+-------------------+
   | 1       | 0    | 294.78           | 23.54           | 76.62             |
   +---------+------+------------------+-----------------+-------------------+
   | 1       | 1    | 180.1 (+74.72%)  | 21.14           | 77.12 (+1533.90%) |
   +---------+------+------------------+-----------------+-------------------+
   | 1       | 2    | 180.54 (+75.25%) | 20.8            | 76.76 (+1554.31%) |
   +---------+------+------------------+-----------------+-------------------+
   | 1       | 3    | 180.6 (+75.44%)  | 21.2            | 76.86 (+1542.31%) |
   +---------+------+------------------+-----------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   serial (v2.12)

   +---------+------+-----------+-----------------+-------------+
   | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 0    | 236.36    | 27.94 (-31.52%) | 138.0       |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 1    | 236.58    | 27.86 (-31.72%) | 138.2       |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 0    | 280.68    | 27.02           | 77.6        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 1    | 101.4     | 22.52           | 4.42        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 2    | 100.92    | 22.68           | 4.4         |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 3    | 100.96    | 22.54           | 4.38        |
   +---------+------+-----------+-----------------+-------------+

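This page does not state what the percentages in parentheses are measured
against. Assuming each one is a relative change from some reference
measurement, that reference can be recovered as value / (1 + pct/100). The
Python sketch below (the function name is illustrative) applies this to the
cluster 0, core 1 Cache Flush cell of the two parallel tables above. Both the
v2.13 and v2.12 figures back out to a reference of about 6.56 µs, which
suggests that both releases are compared against the same reference rather
than against each other.

```python
def implied_reference(value: float, pct_change: float) -> float:
    """Return r such that value == r * (1 + pct_change / 100).

    Assumes the parenthesised figure is a relative change in percent.
    """
    return value / (1.0 + pct_change / 100.0)


# Cluster 0, core 1, Cache Flush, from the parallel CPU_SUSPEND tables above.
ref_v2_13 = implied_reference(136.94, 1987.50)
ref_v2_12 = implied_reference(138.52, 2011.59)
print(f"{ref_v2_13:.2f} {ref_v2_12:.2f}")  # prints 6.56 6.56
```
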
``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
   parallel (v2.13)

   +---------+------+-------------------+-----------------+-------------+
   | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
   +---------+------+-------------------+-----------------+-------------+
   | 0       | 0    | 703.06            | 16.86 (-47.87%) | 7.98        |
   +---------+------+-------------------+-----------------+-------------+
   | 0       | 1    | 851.88            | 16.4 (-49.41%)  | 8.04        |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 0    | 407.4 (+58.99%)   | 15.1 (-26.20%)  | 7.2         |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 1    | 110.98 (-72.67%)  | 15.46           | 6.56        |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 2    | 554.54            | 15.4            | 6.94        |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 3    | 258.96 (+143.06%) | 15.56 (-25.05%) | 6.64        |
   +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
   parallel (v2.12)

   +--------------------------------------------------------------------+
   | test_rt_instr_cpu_susp_parallel                                    |
   +---------+------+-------------------+-----------------+-------------+
   | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
   +---------+------+-------------------+-----------------+-------------+
   | 0       | 0    | 663.12            | 19.66 (-39.21%) | 8.26        |
   +---------+------+-------------------+-----------------+-------------+
   | 0       | 1    | 804.18            | 19.24 (-40.65%) | 8.1         |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 0    | 105.58 (-58.80%)  | 19.68           | 7.42        |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 1    | 245.02 (-39.67%)  | 19.8            | 6.82        |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 2    | 383.82 (-30.83%)  | 18.84           | 7.06        |
   +---------+------+-------------------+-----------------+-------------+
   | 1       | 3    | 523.36 (+391.23%) | 19.0            | 7.3         |
   +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

   +---------+------+-----------+-----------------+-------------+
   | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 0    | 106.12    | 17.1 (-48.24%)  | 5.26        |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 1    | 106.88    | 17.06 (-47.08%) | 5.28        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 0    | 294.36    | 15.6            | 4.56        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 1    | 103.26    | 15.44           | 4.46        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 2    | 103.7     | 15.26           | 4.5         |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 3    | 103.68    | 15.72           | 4.5         |
   +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.12)

   +---------+------+-----------+-----------------+-------------+
   | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 0    | 100.04    | 20.32 (-38.50%) | 5.62        |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 1    | 99.78     | 20.6 (-36.10%)  | 5.42        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 0    | 278.28    | 19.52           | 4.32        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 1    | 97.3      | 19.44           | 4.26        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 2    | 97.56     | 19.52           | 4.32        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 3    | 97.52     | 19.46           | 4.26        |
   +---------+------+-----------+-----------------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This test calls ``CPU_OFF`` on all non-lead CPUs in sequence, then
``CPU_SUSPEND`` on the lead core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

   +---------+------+-----------+-----------------+-------------+
   | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 0    | 243.02    | 26.42 (-39.51%) | 137.58      |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 1    | 244.24    | 26.32 (-38.93%) | 137.88      |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 0    | 182.36    | 23.66           | 78.0        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 1    | 108.18    | 22.68           | 4.42        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 2    | 108.34    | 21.72           | 4.24        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 3    | 108.22    | 21.68           | 4.34        |
   +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.12)

   +---------+------+-----------+-----------------+-------------+
   | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 0    | 236.3     | 30.88 (-29.30%) | 137.76      |
   +---------+------+-----------+-----------------+-------------+
   | 0       | 1    | 236.66    | 30.5 (-29.23%)  | 138.02      |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 0    | 175.9     | 27.0            | 77.86       |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 1    | 100.96    | 27.56           | 4.26        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 2    | 101.04    | 26.48           | 4.38        |
   +---------+------+-----------+-----------------+-------------+
   | 1       | 3    | 101.08    | 26.74           | 4.4         |
   +---------+------+-----------+-----------------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

   +-------------+--------+--------------+
   | Cluster     | Core   | Latency      |
   +-------------+--------+--------------+
   | 0           | 0      | 1.0          |
   +-------------+--------+--------------+
   | 0           | 1      | 1.06         |
   +-------------+--------+--------------+
   | 1           | 0      | 0.6          |
   +-------------+--------+--------------+
   | 1           | 1      | 1.0          |
   +-------------+--------+--------------+
   | 1           | 2      | 0.98         |
   +-------------+--------+--------------+
   | 1           | 3      | 1.0          |
   +-------------+--------+--------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.12)

   +-------------+--------+--------------+
   | Cluster     | Core   | Latency      |
   +-------------+--------+--------------+
   | 0           | 0      | 1.0          |
   +-------------+--------+--------------+
   | 0           | 1      | 1.02         |
   +-------------+--------+--------------+
   | 1           | 0      | 0.52         |
   +-------------+--------+--------------+
   | 1           | 1      | 0.94         |
   +-------------+--------+--------------+
   | 1           | 2      | 0.94         |
   +-------------+--------+--------------+
   | 1           | 3      | 0.92         |
   +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

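Each of these three latencies is the difference of two PMF timestamps taken on
the same CPU. The Python sketch below assumes six timestamps per CPU, named
after TF-A's runtime instrumentation points (PSCI entry and exit, entry to and
exit from the hardware low-power state, and the start and end of the cache
flush). The record type, its field names and the pairings used here are
assumptions made for illustration, not the TFTF implementation.

```python
from dataclasses import dataclass

CNTFRQ_HZ = 50_000_000  # Juno generic counter frequency, per the Method section


@dataclass
class CpuTimestamps:
    """Hypothetical per-CPU record of raw generic counter values."""
    enter_psci: int
    enter_cflush: int
    exit_cflush: int
    enter_hw_low_pwr: int
    exit_hw_low_pwr: int
    exit_psci: int


def ticks_to_us(ticks: int) -> float:
    return ticks * 1e6 / CNTFRQ_HZ


def latencies_us(ts: CpuTimestamps) -> dict:
    """Derive the three reported latencies under the assumed pairings."""
    return {
        # From PSCI entry until the CPU is about to enter the low-power state.
        "PSCI_ENTRY": ticks_to_us(ts.enter_hw_low_pwr - ts.enter_psci),
        # From wake-up out of the low-power state until PSCI returns.
        "PSCI_EXIT": ticks_to_us(ts.exit_psci - ts.exit_hw_low_pwr),
        # Duration of the cache maintenance performed during power down.
        "CFLUSH_OVERHEAD": ticks_to_us(ts.exit_cflush - ts.enter_cflush),
    }


# Made-up counter values, purely to show the arithmetic.
example = CpuTimestamps(enter_psci=0, enter_cflush=100, exit_cflush=350,
                        enter_hw_low_pwr=1_000, exit_hw_low_pwr=50_000,
                        exit_psci=51_000)
print(latencies_us(example))
```

Under this assumed pairing the cache flush falls inside the ``PSCI_ENTRY``
window, which fits the tables below: in every row, ``CFLUSH_OVERHEAD`` is
smaller than ``PSCI_ENTRY``.
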
``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

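To see why a shared lock produces this staircase, consider a toy model in
which the lock is granted strictly in arrival order: each CPU waits for all
CPUs ahead of it and then holds the lock itself. The Python sketch below
implements only that model; the first-come, first-served ordering and the hold
times are assumptions for illustration, not properties of the TF-A lock
implementation.

```python
def fifo_lock_latencies(hold_times_us):
    """Entry latency of each CPU when one lock is granted in arrival order.

    hold_times_us[i] is how long the i-th CPU to arrive holds the lock.
    Each CPU's latency is its wait for the CPUs ahead of it plus its own
    hold time, i.e. a running (prefix) sum of the hold times.
    """
    latencies = []
    elapsed = 0.0
    for hold in hold_times_us:
        elapsed += hold  # time at which this CPU releases the lock
        latencies.append(elapsed)
    return latencies


# Four CPUs with identical, hypothetical 25 us critical sections.
print(fifo_lock_latencies([25, 25, 25, 25]))  # [25.0, 50.0, 75.0, 100.0]
```

In this model the latency grows linearly with arrival position even though
every CPU does the same work, which matches the shape of the ``PSCI_ENTRY``
times for CPUs 0-3 in the table above.
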
The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, so both the L1 and L2
caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is much larger than that for CPU 3
because the L2 cache of the big cluster is much larger (2MB) than that of the
little cluster (1MB).

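A rough check of the cache-size explanation is to divide each cluster's flush
time by its L2 size. The Python sketch below does this with the figures from
the table and text above. It is back-of-the-envelope arithmetic only: it
ignores the L1 flush component and the different core clock frequencies of the
two clusters.

```python
# CFLUSH_OVERHEAD (us) of the last CPU to power down in each cluster,
# from the parallel CPU_SUSPEND table above, and the L2 sizes quoted in the text.
flush_us = {"little (CPU 3)": 94, "big (CPU 5)": 206}
l2_mb = {"little (CPU 3)": 1, "big (CPU 5)": 2}


def flush_cost_per_mb(flush_us, l2_mb):
    """Flush time divided by L2 size; ignores the L1 part and clock speed."""
    return {cpu: flush_us[cpu] / l2_mb[cpu] for cpu in flush_us}


print(flush_cost_per_mb(flush_us, l2_mb))
time_ratio = flush_us["big (CPU 5)"] / flush_us["little (CPU 3)"]
print(f"time ratio {time_ratio:.2f} vs L2 size ratio 2.00")
```

The flush time roughly doubles with the doubled L2 size (about 94 and 103 µs
per MB), which is consistent with, but does not prove, the explanation above.
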
``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

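The serialization shows up in the table above as near-constant steps between
the ``PSCI_ENTRY`` times of the little CPUs. Taking successive differences, as
in the Python sketch below, gives a quick estimate of the extra wait each CPU
incurs behind the previous one. This is arithmetic on the reported figures,
not a direct measurement of the SCP channel.

```python
# PSCI_ENTRY (us) for little CPUs 0-3, from the table above.
entry_us = [116, 204, 287, 376]

# Extra latency each CPU sees relative to the one before it.
steps = [later - earlier for earlier, later in zip(entry_us, entry_us[1:])]
mean_step = sum(steps) / len(steps)
print(steps, f"{mean_step:.1f}")  # [88, 83, 89] 86.7
```
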
On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and more consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster is much larger (2MB) than that of
the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to retrieve its timestamps.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster is much larger (2MB) than that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round-trip latency for handling a fast SMC at EL3 in TF.

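Whether an SMC is a fast call is encoded in its function identifier. Under the
Arm SMC Calling Convention (SMCCC), bit 31 set marks a fast call, bit 30
selects the SMC64 calling convention, bits [29:24] identify the owning service
(4 for standard secure services such as PSCI) and bits [15:0] hold the function
number. The PSCI specification assigns ``PSCI_VERSION`` the identifier
0x84000000 and returns the major version in bits [31:16] and the minor version
in bits [15:0]. The Python sketch below decodes both; the helper names are
illustrative, and 0x00010001 is only an example return value.

```python
PSCI_VERSION_FID = 0x84000000  # function ID for PSCI_VERSION (PSCI spec)


def decode_smc_fid(fid: int) -> dict:
    """Split a 32-bit SMC Function Identifier into its SMCCC fields."""
    return {
        "fast_call": bool((fid >> 31) & 1),  # bit 31: 1 = fast, 0 = yielding
        "smc64": bool((fid >> 30) & 1),      # bit 30: 1 = SMC64, 0 = SMC32
        "owner": (fid >> 24) & 0x3F,         # bits [29:24]: owning entity
        "function": fid & 0xFFFF,            # bits [15:0]: function number
    }


def decode_psci_version(ret: int) -> tuple:
    """Split a PSCI_VERSION return value into (major, minor)."""
    return (ret >> 16) & 0xFFFF, ret & 0xFFFF


print(decode_smc_fid(PSCI_VERSION_FID))
# {'fast_call': True, 'smc64': False, 'owner': 4, 'function': 0}
print(decode_psci_version(0x00010001))  # (1, 1), i.e. PSCI 1.1
```
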
+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are shorter than those for the little CPUs due to
greater CPU performance.

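Converting the times to core clock cycles shows that clock speed alone does
not explain the gap. The Python sketch below assumes the historic runs used
the core frequencies listed in the Method section (650MHz for the Cortex-A53
little CPUs, 900MHz for the Cortex-A57 big CPUs), which this page does not
state explicitly for the 2017 results.

```python
# Core clocks from the Method table; assumed (not stated) to apply to 2017 runs.
FREQ_MHZ = {"little": 650, "big": 900}

# PSCI_VERSION total time per CPU in ns, from the table above.
total_ns = {0: 3020, 1: 2940, 2: 2980, 3: 3060, 4: 520, 5: 720}
cluster_of = {cpu: "little" if cpu <= 3 else "big" for cpu in total_ns}


def ns_to_cycles(ns: float, mhz: int) -> float:
    """Cycles = time (ns) x frequency; 1MHz is 0.001 cycles per ns."""
    return ns * mhz / 1000


cycles = {cpu: ns_to_cycles(ns, FREQ_MHZ[cluster_of[cpu]])
          for cpu, ns in total_ns.items()}


def mean_cycles(name: str) -> float:
    values = [c for cpu, c in cycles.items() if cluster_of[cpu] == name]
    return sum(values) / len(values)


little, big = mean_cycles("little"), mean_cycles("big")
print(f"little {little:.0f} cycles, big {big:.0f} cycles, ratio {little / big:.1f}")
```

On average the little CPUs still need about 3.5 times as many cycles, so the
difference reflects the cores' microarchitecture (the Cortex-A53 is an
in-order design, the Cortex-A57 out-of-order) as well as their clock rates,
although these measurements do not isolate the cause.
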
We suspect the time for lead CPU 4 is shorter than for CPU 5 due to subtle cache
effects, given that these measurements are at the nanosecond level.

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.13-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.13-rc0
.. _TFTF v2.13-rc0: https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/?h=v2.13-rc0