Merge "docs: update feature support overview" into integration
diff --git a/docs/design/firmware-design.rst b/docs/design/firmware-design.rst
index 2c9b76a..14b273e 100644
--- a/docs/design/firmware-design.rst
+++ b/docs/design/firmware-design.rst
@@ -2622,16 +2622,29 @@
 section lists the usage of Architecture Extensions, and build flags
 controlling them.
 
-In general, and unless individually mentioned, the build options
-``ARM_ARCH_MAJOR`` and ``ARM_ARCH_MINOR`` select the Architecture Extension to
-target when building TF-A. Subsequent Arm Architecture Extensions are backward
-compatible with previous versions.
+Build options
+~~~~~~~~~~~~~
+
+``ARM_ARCH_MAJOR`` and ``ARM_ARCH_MINOR``
+
+These build options serve dual purpose
+
+- Determine the architecture extension support in TF-A build: All the mandatory
+  architectural features up to ``ARM_ARCH_MAJOR.ARM_ARCH_MINOR`` are included
+  and unconditionally enabled by TF-A build system.
+
+- Passed to compiler via "-march" option to generate binary target : Tell the
+  compiler to emit instructions upto ``ARM_ARCH_MAJOR.ARM_ARCH_MINOR``
+
+The build system requires that the platform provides a valid numeric value based on
+CPU architecture extension, otherwise it defaults to base Armv8.0-A architecture.
+Subsequent Arm Architecture versions also support extensions which were introduced
+in previous versions.
 
-The build system only requires that ``ARM_ARCH_MAJOR`` and ``ARM_ARCH_MINOR`` have a
-valid numeric value. These build options only control whether or not
-Architecture Extension-specific code is included in the build. Otherwise, TF-A
-targets the base Armv8.0-A architecture; i.e. as if ``ARM_ARCH_MAJOR`` == 8
-and ``ARM_ARCH_MINOR`` == 0, which are also their respective default values.
+**TO-DO** : Its planned to decouple the two functionalities and introduce a new macro
+for compiler usage. The requirement for this decoupling arises becasue TF-A code
+always provides support for the latest and greatest architecture features but this
+is not the case for the target compiler.
 
 .. seealso:: :ref:`Build Options`
 
diff --git a/docs/perf/index.rst b/docs/perf/index.rst
index b83c6e3..0938a17 100644
--- a/docs/perf/index.rst
+++ b/docs/perf/index.rst
@@ -7,6 +7,7 @@
 
    psci-performance-instr
    psci-performance-juno
+   psci-performance-n1sdp
    psci-performance-methodology
    tsp
    performance-monitoring-unit
diff --git a/docs/perf/psci-performance-juno.rst b/docs/perf/psci-performance-juno.rst
index 7418669..7a484b8 100644
--- a/docs/perf/psci-performance-juno.rst
+++ b/docs/perf/psci-performance-juno.rst
@@ -25,62 +25,189 @@
 Juno supports CPU, cluster and system power down states, corresponding to power
 levels 0, 1 and 2 respectively. It does not support any retention states.
 
-We used the upstream `TF master as of 31/01/2017`_, building the platform using
-the ``ENABLE_RUNTIME_INSTRUMENTATION`` option:
+Given that runtime instrumentation using PMF is invasive, there is a small
+(unquantified) overhead on the results. PMF uses the generic counter for
+timestamps, which runs at 50MHz on Juno.
 
-.. code:: shell
+The following source trees and binaries were used:
 
-    make PLAT=juno ENABLE_RUNTIME_INSTRUMENTATION=1 \
-        SCP_BL2=<path/to/scp-fw.bin>                \
-        BL33=<path/to/test-fw.bin>                  \
-        all fip
+- TF-A [`v2.9-rc0`_]
+- TFTF [`v2.9-rc0`_]
 
-When using the debug build of TF, there was no noticeable difference in the
-results.
+Please see the Runtime Instrumentation `Testing Methodology`_ page for more
+details.
 
-The tests are based on an ARM-internal test framework. The release build of this
-framework was used because the results in the debug build became skewed; the
-console output prevented some of the tests from executing in parallel.
+Procedure
+---------
 
-The tests consist of both parallel and sequential tests, which are broadly
-described as follows:
+#. Build TFTF with runtime instrumentation enabled:
 
-- **Parallel Tests** This type of test powers on all the non-lead CPUs and
-  brings them and the lead CPU to a common synchronization point.  The lead CPU
-  then initiates the test on all CPUs in parallel.
+    .. code:: shell
 
-- **Sequential Tests** This type of test powers on each non-lead CPU in
-  sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
-  test to complete before proceeding to the next non-lead CPU. The lead CPU then
-  executes the test on itself.
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
+            TESTS=runtime-instrumentation all
 
-In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
-CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
-CPU.
+#. Fetch Juno's SCP binary from TF-A's archive:
 
-``PSCI_ENTRY`` refers to the time taken from entering the TF PSCI implementation
-to the point the hardware enters the low power state (WFI). Referring to the TF
-runtime instrumentation points, this corresponds to:
-``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.
+    .. code:: shell
 
-``PSCI_EXIT`` refers to the time taken from the point the hardware exits the low
-power state to exiting the TF PSCI implementation. This corresponds to:
-``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.
+        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
+            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin
 
-``CFLUSH_OVERHEAD`` refers to the part of ``PSCI_ENTRY`` taken to flush the
-caches. This corresponds to: ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.
+#. Build TF-A with the following build options:
 
-Note there is very little variance observed in the values given (~1us), although
-the values for each CPU are sometimes interchanged, depending on the order in
-which locks are acquired. Also, there is very little variance observed between
-executing the tests sequentially in a single boot or rebooting between tests.
+    .. code:: shell
 
-Given that runtime instrumentation using PMF is invasive, there is a small
-(unquantified) overhead on the results. PMF uses the generic counter for
-timestamps, which runs at 50MHz on Juno.
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
+            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
+            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip
+
+#. Load the following images onto the development board: ``fip.bin``,
+   ``scp_bl2.bin``.
+
+Results
+-------
+
+``CPU_SUSPEND`` to deepest power level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+        parallel
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |   243.76  |  239.92 |     6.32    |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |   663.5   |  30.32  |    167.82   |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |   105.12  |  22.84  |     5.88    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |   384.16  |  19.06  |     4.7     |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  2   |   523.98  |  270.46 |     4.74    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  3   |   950.54  |  220.9  |     89.2    |
+    +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+        serial
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |   266.96  |  31.74  |    167.92   |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |   266.9   |  31.52  |    167.82   |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |   279.86  |  23.42  |    87.52    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |   101.38  |   18.8  |     4.64    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  2   |   101.18  |  19.28  |     4.64    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  3   |   101.32  |  19.02  |     4.62    |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_SUSPEND`` to power level 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Results and Commentary
-----------------------
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
+        parallel
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    +---------+------+-----------+---------+-------------+
+    |    0    |  0   |   661.94  |  22.88  |     9.66    |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |   801.64  |  23.38  |     9.62    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |   105.56  |  16.02  |     8.12    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |   245.42  |  16.26  |     7.78    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  2   |   384.42  |   16.1  |     7.84    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  3   |   523.74  |   15.4  |     8.02    |
+    +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |   102.16  |  23.64  |     6.7     |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |   101.66  |  23.78  |     6.6     |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |   277.74  |  15.96  |     4.66    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |    98.0   |  15.88  |     4.64    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  2   |   97.66   |  15.88  |     4.62    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  3   |   97.76   |  15.38  |     4.64    |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_OFF`` on all non-lead CPUs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CPU_OFF`` on all non-lead CPUs in sequence then, ``CPU_SUSPEND`` on the lead
+core to the deepest power level.
+
+.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |   265.38  |  34.12  |    167.36   |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |   265.72  |  33.98  |    167.48   |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |   185.3   |  23.18  |    87.42    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |   101.58  |  23.46  |     4.48    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  2   |   101.66  |  22.02  |     4.72    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  3   |   101.48  |  22.22  |     4.52    |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_VERSION`` in parallel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores
+
+    +-------------+--------+--------------+
+    |   Cluster   |  Core  |   Latency    |
+    +=============+========+==============+
+    |      0      |   0    |     1.22     |
+    +-------------+--------+--------------+
+    |      0      |   1    |     1.2      |
+    +-------------+--------+--------------+
+    |      1      |   0    |     0.6      |
+    +-------------+--------+--------------+
+    |      1      |   1    |     1.08     |
+    +-------------+--------+--------------+
+    |      1      |   2    |     1.04     |
+    +-------------+--------+--------------+
+    |      1      |   3    |     1.04     |
+    +-------------+--------+--------------+
+
+Annotated Historic Results
+--------------------------
+
+The following results are based on the upstream `TF master as of 31/01/2017`_.
+TF-A was built using the same build instructions as detailed in the procedure
+above.
+
+In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
+CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
+CPU.
+
+``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup latency, and
+``CFLUSH_OVERHEAD`` the latency of the cache flush operation.
 
 ``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -290,3 +417,5 @@
 
 .. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
 .. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
+.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
+.. _Testing Methodology: ../perf/psci-performance-methodology.html
diff --git a/docs/perf/psci-performance-n1sdp.rst b/docs/perf/psci-performance-n1sdp.rst
new file mode 100644
index 0000000..70a1436
--- /dev/null
+++ b/docs/perf/psci-performance-n1sdp.rst
@@ -0,0 +1,203 @@
+Runtime Instrumentation Testing - N1SDP
+=======================================
+
+For this test we used the N1 System Development Platform (`N1SDP`_), which
+contains an SoC consisting of two dual-core Arm N1 clusters.
+
+The following source trees and binaries were used:
+
+- TF-A [`v2.9-rc0-16-g666aec401`_]
+- TFTF [`v2.9-rc0`_]
+- SCP/MCP `Prebuilt Images`_
+
+Please see the Runtime Instrumentation `Testing Methodology`_ page for more
+details.
+
+Procedure
+---------
+
+#. Build TFTF with runtime instrumentation enabled:
+
+    .. code:: shell
+
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=n1sdp \
+            TESTS=runtime-instrumentation all
+
+#. Build TF-A with the following build options:
+
+    .. code:: shell
+
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=n1sdp \
+            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all
+
+#. Fetch the SCP firmware images:
+
+    .. code:: shell
+
+        curl --fail --connect-timeout 5 --retry 5 \
+            -sLS -o build/n1sdp/release/scp_rom.bin \
+            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/n1sdp/release/n1sdp-bl1.bin
+        curl --fail --connect-timeout 5 \
+            --retry 5 -sLS -o build/n1sdp/release/scp_ram.bin \
+            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/n1sdp/release/n1sdp-bl2.bin
+
+#. Fetch the MCP firmware images:
+
+    .. code:: shell
+
+        curl --fail --connect-timeout 5 --retry 5 \
+            -sLS -o build/n1sdp/release/mcp_rom.bin \
+            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/n1sdp/release/n1sdp-mcp-bl1.bin
+        curl --fail --connect-timeout 5 --retry 5 \
+            -sLS -o build/n1sdp/release/mcp_ram.bin \
+            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/n1sdp/release/n1sdp-mcp-bl2.bin
+
+#. Using the fiptool, create a new FIP package and append the SCP ram image onto
+   it.
+
+    .. code:: shell
+
+        ./tools/fiptool/fiptool create --blob \
+                uuid=cfacc2c4-15e8-4668-82be-430a38fad705,file=build/n1sdp/release/bl1.bin \
+                --scp-fw build/n1sdp/release/scp_ram.bin build/n1sdp/release/scp_fw.bin
+
+#. Append the MCP image to the FIP.
+
+    .. code:: shell
+
+        ./tools/fiptool/fiptool create \
+            --blob uuid=54464222-a4cf-4bf8-b1b6-cee7dade539e,file=build/n1sdp/release/mcp_ram.bin \
+            build/n1sdp/release/mcp_fw.bin
+
+#. Then, add TFTF as the Non-Secure workload in the FIP image:
+
+    .. code:: shell
+
+        make CROSS_COMPILE=aarch64-none-elf- PLAT=n1sdp \
+            ENABLE_RUNTIME_INSTRUMENTATION=1 SCP_BL2=/dev/null \
+            BL33=<path/to/tftf.bin>  fip
+
+#. Load the following images onto the development board: ``fip.bin``,
+   ``scp_rom.bin``, ``scp_ram.bin``, ``mcp_rom.bin``, and ``mcp_ram.bin``.
+
+.. note::
+
+    These instructions presume you have a complete firmware stack. The N1SDP
+    `user guide`_ provides a detailed explanation on how to get setup from
+    scratch.
+
+Results
+-------
+
+``CPU_SUSPEND`` to deepest power level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+        parallel
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |    3.44   |  10.04  |     0.4     |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |    4.98   |  12.72  |     0.16    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |    3.58   |  15.42  |     0.2     |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |    5.24   |  17.78  |     0.18    |
+    +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
+        serial
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |    1.82   |   9.98  |     0.32    |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |    1.96   |   9.96  |     0.18    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |    2.0    |   10.5  |     0.16    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |    2.22   |  10.56  |     0.16    |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_SUSPEND`` to power level 0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
+        parallel
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |    1.52   |  11.84  |     0.34    |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |    1.1    |  13.66  |     0.14    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |    2.18   |   9.48  |     0.18    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |    2.06   |   14.4  |     0.16    |
+    +---------+------+-----------+---------+-------------+
+
+.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |    1.54   |   9.34  |     0.3     |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |    1.88   |   9.5   |     0.16    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |    1.86   |   9.86  |     0.2     |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |    2.02   |   9.64  |     0.18    |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_OFF`` on all non-lead CPUs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CPU_OFF`` on all non-lead CPUs in sequence then, ``CPU_SUSPEND`` on the lead
+core to the deepest power level.
+
+.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs
+
+    +---------+------+-----------+---------+-------------+
+    | Cluster | Core | Powerdown | Wakekup | Cache Flush |
+    +=========+======+===========+=========+=============+
+    |    0    |  0   |    1.86   |   9.88  |     0.32    |
+    +---------+------+-----------+---------+-------------+
+    |    0    |  1   |    21.1   |  12.44  |     0.42    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  0   |   21.22   |   13.2  |     0.32    |
+    +---------+------+-----------+---------+-------------+
+    |    1    |  1   |   21.56   |  13.18  |     0.54    |
+    +---------+------+-----------+---------+-------------+
+
+``CPU_VERSION`` in parallel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores
+
+    +-------------+--------+--------------+
+    |   Cluster   |  Core  |   Latency    |
+    +=============+========+==============+
+    |      0      |   0    |     0.08     |
+    +-------------+--------+--------------+
+    |      0      |   1    |     0.22     |
+    +-------------+--------+--------------+
+    |      1      |   0    |     0.28     |
+    +-------------+--------+--------------+
+    |      1      |   1    |     0.26     |
+    +-------------+--------+--------------+
+
+--------------
+
+*Copyright (c) 2023, Arm Limited. All rights reserved.*
+
+.. _v2.9-rc0-16-g666aec401: https://review.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/heads/v2.9-rc0-16-g666aec401
+.. _v2.9-rc0: https://review.trustedfirmware.org/plugins/gitiles/TF-A/tf-a-tests/+/refs/tags/v2.9-rc0
+.. _user guide: https://gitlab.arm.com/arm-reference-solutions/arm-reference-solutions-docs/-/blob/master/docs/n1sdp/user-guide.rst
+.. _Prebuilt Images:  https://downloads.trustedfirmware.org/tf-a/css_scp_2.11.0/n1sdp/release/
+.. _N1SDP: https://developer.arm.com/documentation/101489/latest
+.. _Testing Methodology: ../perf/psci-performance-methodology.html
\ No newline at end of file