doc: Move documents into subdirectories

This change creates the following directories under docs/
in order to provide a grouping for the content:

- components
- design
- getting_started
- perf
- process

In each of these directories an index.rst file is created
and this serves as an index / landing page for each of the
groups when the pages are compiled. Proper layout of the
top-level table of contents relies on this directory/index
structure.

Without this patch it is possible to build the documents
correctly with Sphinx but the output looks messy because
there is no overall hierarchy.

Change-Id: I3c9f4443ec98571a56a6edf775f2c8d74d7f429f
Signed-off-by: Paul Beesley <paul.beesley@arm.com>
diff --git a/docs/design/firmware-design.rst b/docs/design/firmware-design.rst
new file mode 100644
index 0000000..e7107ba
--- /dev/null
+++ b/docs/design/firmware-design.rst
@@ -0,0 +1,2687 @@
+Trusted Firmware-A design
+=========================
+
+.. contents::
+
+Trusted Firmware-A (TF-A) implements a subset of the Trusted Board Boot
+Requirements (TBBR) Platform Design Document (PDD) [1]_ for Arm reference
+platforms. The TBB sequence starts when the platform is powered on and runs up
+to the stage where it hands off control to firmware running in the normal
+world in DRAM. This is the cold boot path.
+
+TF-A also implements the Power State Coordination Interface (PSCI) PDD [2]_ as
+a runtime service. PSCI is the interface from normal world software to firmware
+implementing power management use-cases (for example, secondary CPU boot,
+hotplug and idle). Normal world software can access TF-A runtime services via
+the Arm SMC (Secure Monitor Call) instruction. The SMC instruction must be
+used as mandated by the SMC Calling Convention [3]_.
+
+TF-A implements a framework for configuring and managing interrupts generated
+in either security state. The details of the interrupt management framework
+and its design can be found in the TF-A Interrupt Management Design guide [4]_.
+
+TF-A also implements a library for setting up and managing the translation
+tables. The details of this library can be found in `Xlat_tables design`_.
+
+TF-A can be built to support either AArch64 or AArch32 execution state.
+
+Cold boot
+---------
+
+The cold boot path starts when the platform is physically turned on. If
+``COLD_BOOT_SINGLE_CPU=0``, one of the CPUs released from reset is chosen as the
+primary CPU, and the remaining CPUs are considered secondary CPUs. The primary
+CPU is chosen through platform-specific means. The cold boot path is mainly
+executed by the primary CPU, other than essential CPU initialization executed by
+all CPUs. The secondary CPUs are kept in a safe platform-specific state until
+the primary CPU has performed enough initialization to boot them.
+
+Refer to the `Reset Design`_ for more information on the effect of the
+``COLD_BOOT_SINGLE_CPU`` platform build option.
+
+The cold boot path in this implementation of TF-A depends on the execution
+state. For AArch64, it is divided into five steps (in order of execution):
+
+-  Boot Loader stage 1 (BL1) *AP Trusted ROM*
+-  Boot Loader stage 2 (BL2) *Trusted Boot Firmware*
+-  Boot Loader stage 3-1 (BL31) *EL3 Runtime Software*
+-  Boot Loader stage 3-2 (BL32) *Secure-EL1 Payload* (optional)
+-  Boot Loader stage 3-3 (BL33) *Non-trusted Firmware*
+
+For AArch32, it is divided into four steps (in order of execution):
+
+-  Boot Loader stage 1 (BL1) *AP Trusted ROM*
+-  Boot Loader stage 2 (BL2) *Trusted Boot Firmware*
+-  Boot Loader stage 3-2 (BL32) *EL3 Runtime Software*
+-  Boot Loader stage 3-3 (BL33) *Non-trusted Firmware*
+
+Arm development platforms (Fixed Virtual Platforms (FVPs) and Juno) implement a
+combination of the following types of memory regions. Each bootloader stage uses
+one or more of these memory regions.
+
+-  Regions accessible from both non-secure and secure states. For example,
+   non-trusted SRAM, ROM and DRAM.
+-  Regions accessible from only the secure state. For example, trusted SRAM and
+   ROM. The FVPs also implement trusted DRAM, which is statically
+   configured. Additionally, the Base FVPs and Juno development platform
+   configure the TrustZone Controller (TZC) to create a region in the DRAM
+   which is accessible only from the secure state.
+
+The sections below provide the following details:
+
+-  dynamic configuration of Boot Loader stages
+-  initialization and execution of the first three stages during cold boot
+-  specification of the EL3 Runtime Software (BL31 for AArch64 and BL32 for
+   AArch32) entrypoint requirements for use by alternative Trusted Boot
+   Firmware in place of the provided BL1 and BL2
+
+Dynamic Configuration during cold boot
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each of the Boot Loader stages may be dynamically configured if required by the
+platform. The Boot Loader stage may optionally specify a firmware
+configuration file and/or hardware configuration file as listed below:
+
+-  HW_CONFIG - The hardware configuration file. Can be shared by all Boot Loader
+   stages and also by the Normal World Rich OS.
+-  TB_FW_CONFIG - Trusted Boot Firmware configuration file. Shared between BL1
+   and BL2.
+-  SOC_FW_CONFIG - SoC Firmware configuration file. Used by BL31.
+-  TOS_FW_CONFIG - Trusted OS Firmware configuration file. Used by Trusted OS
+   (BL32).
+-  NT_FW_CONFIG - Non Trusted Firmware configuration file. Used by Non-trusted
+   firmware (BL33).
+
+The Arm development platforms use the Flattened Device Tree format for the
+dynamic configuration files.
+
+Each Boot Loader stage can pass up to 4 arguments via registers to the next
+stage. BL2 passes the list of the next images to execute to the *EL3 Runtime
+Software* (BL31 for AArch64 and BL32 for AArch32) via ``arg0``. All the other
+arguments are platform defined. The Arm development platforms use the
+following convention (a sketch follows the list):
+
+-  BL1 passes the address of a meminfo_t structure to BL2 via ``arg1``. This
+   structure contains the memory layout available to BL2.
+-  When dynamic configuration files are present, the firmware configuration for
+   the next Boot Loader stage is populated in the first available argument and
+   the generic hardware configuration is passed in the next available
+   argument. For example:
+
+   -  If TB_FW_CONFIG is loaded by BL1, then its address is passed in ``arg0``
+      to BL2.
+   -  If HW_CONFIG is loaded by BL1, then its address is passed in ``arg2`` to
+      BL2. Note, ``arg1`` is already used for meminfo_t.
+   -  If SOC_FW_CONFIG is loaded by BL2, then its address is passed in ``arg1``
+      to BL31. Note, ``arg0`` is used to pass the list of executable images.
+   -  Similarly, if HW_CONFIG is loaded by BL1 or BL2, then its address is
+      passed in ``arg2`` to BL31.
+   -  For other BL3x images, if the firmware configuration file is loaded by
+      BL2, then its address is passed in ``arg0`` and if HW_CONFIG is loaded
+      then its address is passed in ``arg1``.
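+
+As a hedged illustration of this convention (not a quote of any platform
+port), the following sketch shows how BL1 might populate the arguments passed
+to BL2. The helper and variable names (``set_bl2_args``, ``tb_fw_config_base``,
+``hw_config_base`` and ``bl2_mem_layout``) are hypothetical; ``entry_point_info_t``
+and ``meminfo_t`` are the TF-A types referred to above.
+
+.. code:: c
+
+    static void set_bl2_args(entry_point_info_t *bl2_ep,
+                             uintptr_t tb_fw_config_base,
+                             uintptr_t hw_config_base,
+                             meminfo_t *bl2_mem_layout)
+    {
+        /* Firmware configuration for the next stage: first available argument */
+        bl2_ep->args.arg0 = tb_fw_config_base;
+
+        /* arg1 carries the meminfo_t structure describing BL2's memory layout */
+        bl2_ep->args.arg1 = (uintptr_t)bl2_mem_layout;
+
+        /* Generic hardware configuration: next available argument */
+        bl2_ep->args.arg2 = hw_config_base;
+    }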
+
+BL1
+~~~
+
+This stage begins execution from the platform's reset vector at EL3. The reset
+address is platform dependent but it is usually located in a Trusted ROM area.
+The BL1 data section is copied to trusted SRAM at runtime.
+
+On the Arm development platforms, BL1 code starts execution from the reset
+vector defined by the constant ``BL1_RO_BASE``. The BL1 data section is copied
+to the top of trusted SRAM as defined by the constant ``BL1_RW_BASE``.
+
+The functionality implemented by this stage is as follows.
+
+Determination of boot path
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Whenever a CPU is released from reset, BL1 needs to distinguish between a warm
+boot and a cold boot. This is done using platform-specific mechanisms (see the
+``plat_get_my_entrypoint()`` function in the `Porting Guide`_). In the case of a
+warm boot, a CPU is expected to continue execution from a separate
+entrypoint. In the case of a cold boot, the secondary CPUs are placed in a safe
+platform-specific state (see the ``plat_secondary_cold_boot_setup()`` function in
+the `Porting Guide`_) while the primary CPU executes the remaining cold boot path
+as described in the following sections.
+
+This step only applies when ``PROGRAMMABLE_RESET_ADDRESS=0``. Refer to the
+`Reset Design`_ for more information on the effect of the
+``PROGRAMMABLE_RESET_ADDRESS`` platform build option.
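+
+As a hedged sketch (not taken from any platform port), a mailbox-based
+implementation of ``plat_get_my_entrypoint()`` might look as follows;
+``PLAT_MAILBOX_BASE`` is a hypothetical platform define. A zero return value
+indicates a cold boot, while a non-zero value is used as the warm boot
+entrypoint.
+
+.. code:: c
+
+    uintptr_t plat_get_my_entrypoint(void)
+    {
+        /* One mailbox slot per CPU, zeroed at power-on and programmed with
+           the warm boot entrypoint before a warm reset is requested. */
+        uint64_t *mailbox = (uint64_t *)PLAT_MAILBOX_BASE;
+
+        return (uintptr_t)mailbox[plat_my_core_pos()];
+    }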
+
+Architectural initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BL1 performs minimal architectural initialization as follows.
+
+-  Exception vectors
+
+   BL1 sets up simple exception vectors for both synchronous and asynchronous
+   exceptions. The default behavior upon receiving an exception is to populate
+   a status code in the general purpose register ``X0/R0`` and call the
+   ``plat_report_exception()`` function (see the `Porting Guide`_). The status
+   code is one of:
+
+   For AArch64:
+
+   ::
+
+       0x0 : Synchronous exception from Current EL with SP_EL0
+       0x1 : IRQ exception from Current EL with SP_EL0
+       0x2 : FIQ exception from Current EL with SP_EL0
+       0x3 : System Error exception from Current EL with SP_EL0
+       0x4 : Synchronous exception from Current EL with SP_ELx
+       0x5 : IRQ exception from Current EL with SP_ELx
+       0x6 : FIQ exception from Current EL with SP_ELx
+       0x7 : System Error exception from Current EL with SP_ELx
+       0x8 : Synchronous exception from Lower EL using aarch64
+       0x9 : IRQ exception from Lower EL using aarch64
+       0xa : FIQ exception from Lower EL using aarch64
+       0xb : System Error exception from Lower EL using aarch64
+       0xc : Synchronous exception from Lower EL using aarch32
+       0xd : IRQ exception from Lower EL using aarch32
+       0xe : FIQ exception from Lower EL using aarch32
+       0xf : System Error exception from Lower EL using aarch32
+
+   For AArch32:
+
+   ::
+
+       0x10 : User mode
+       0x11 : FIQ mode
+       0x12 : IRQ mode
+       0x13 : SVC mode
+       0x16 : Monitor mode
+       0x17 : Abort mode
+       0x1a : Hypervisor mode
+       0x1b : Undefined mode
+       0x1f : System mode
+
+   The ``plat_report_exception()`` implementation on the Arm FVP port programs
+   the Versatile Express System LED register in the following format to
+   indicate the occurrence of an unexpected exception:
+
+   ::
+
+       SYS_LED[0]   - Security state (Secure=0/Non-Secure=1)
+       SYS_LED[2:1] - Exception Level (EL3=0x3, EL2=0x2, EL1=0x1, EL0=0x0)
+                      For AArch32 it is always 0x0
+       SYS_LED[7:3] - Exception Class (Sync/Async & origin). This is the value
+                      of the status code
+
+   A write to the LED register reflects in the System LEDs (S6LED0..7) in the
+   CLCD window of the FVP.
+
+   BL1 does not expect to receive any exceptions other than the SMC exception.
+   For the latter, BL1 installs a simple stub. The stub expects to receive a
+   limited set of SMC types (determined by their function IDs in the general
+   purpose register ``X0/R0``):
+
+   -  ``BL1_SMC_RUN_IMAGE``: This SMC is raised by BL2 to make BL1 pass control
+      to EL3 Runtime Software.
+   -  All SMCs listed in section "BL1 SMC Interface" in the `Firmware Update`_
+      Design Guide are supported for AArch64 only. These SMCs are currently
+      not supported when BL1 is built for AArch32.
+
+   Any other SMC leads to an assertion failure.
+
+-  CPU initialization
+
+   BL1 calls the ``reset_handler()`` function which in turn calls the CPU
+   specific reset handler function (see the section: "CPU specific operations
+   framework").
+
+-  Control register setup (for AArch64)
+
+   -  ``SCTLR_EL3``. Instruction cache is enabled by setting the ``SCTLR_EL3.I``
+      bit. Alignment and stack alignment checking is enabled by setting the
+      ``SCTLR_EL3.A`` and ``SCTLR_EL3.SA`` bits. Exception endianness is set to
+      little-endian by clearing the ``SCTLR_EL3.EE`` bit.
+
+   -  ``SCR_EL3``. The register width of the next lower exception level is set
+      to AArch64 by setting the ``SCR.RW`` bit. The ``SCR.EA`` bit is set to trap
+      both External Aborts and SError Interrupts in EL3. The ``SCR.SIF`` bit is
+      also set to disable instruction fetches from Non-secure memory when in
+      secure state.
+
+   -  ``CPTR_EL3``. Accesses to the ``CPACR_EL1`` register from EL1 or EL2, or the
+      ``CPTR_EL2`` register from EL2 are configured to not trap to EL3 by
+      clearing the ``CPTR_EL3.TCPAC`` bit. Access to the trace functionality is
+      configured not to trap to EL3 by clearing the ``CPTR_EL3.TTA`` bit.
+      Instructions that access the registers associated with Floating Point
+      and Advanced SIMD execution are configured to not trap to EL3 by
+      clearing the ``CPTR_EL3.TFP`` bit.
+
+   -  ``DAIF``. The SError interrupt is enabled by clearing the SError interrupt
+      mask bit.
+
+   -  ``MDCR_EL3``. The trap controls, ``MDCR_EL3.TDOSA``, ``MDCR_EL3.TDA`` and
+      ``MDCR_EL3.TPM``, are set so that accesses to the registers they control
+      do not trap to EL3. AArch64 Secure self-hosted debug is disabled by
+      setting the ``MDCR_EL3.SDD`` bit. Also ``MDCR_EL3.SPD32`` is set to
+      disable AArch32 Secure self-hosted privileged debug from S-EL1.
+
+-  Control register setup (for AArch32)
+
+   -  ``SCTLR``. Instruction cache is enabled by setting the ``SCTLR.I`` bit.
+      Alignment checking is enabled by setting the ``SCTLR.A`` bit.
+      Exception endianness is set to little-endian by clearing the
+      ``SCTLR.EE`` bit.
+
+   -  ``SCR``. The ``SCR.SIF`` bit is set to disable instruction fetches from
+      Non-secure memory when in secure state.
+
+   -  ``CPACR``. Allow execution of Advanced SIMD instructions at PL0 and PL1,
+      by clearing the ``CPACR.ASEDIS`` bit. Access to the trace functionality
+      is configured not to trap to undefined mode by clearing the
+      ``CPACR.TRCDIS`` bit.
+
+   -  ``NSACR``. Enable non-secure access to Advanced SIMD functionality and
+      system register access to implemented trace registers.
+
+   -  ``FPEXC``. Enable access to the Advanced SIMD and floating-point
+      functionality from all Exception levels.
+
+   -  ``CPSR.A``. The Asynchronous data abort interrupt is enabled by clearing
+      the Asynchronous data abort interrupt mask bit.
+
+   -  ``SDCR``. The ``SDCR.SPD`` field is set to disable AArch32 Secure
+      self-hosted privileged debug.
+
+Platform initialization
+^^^^^^^^^^^^^^^^^^^^^^^
+
+On Arm platforms, BL1 performs the following platform initializations:
+
+-  Enable the Trusted Watchdog.
+-  Initialize the console.
+-  Configure the Interconnect to enable hardware coherency.
+-  Enable the MMU and map the memory it needs to access.
+-  Configure any required platform storage to load the next bootloader image
+   (BL2).
+-  If the BL1 dynamic configuration file, ``TB_FW_CONFIG``, is available, then
+   load it to the platform defined address and make it available to BL2 via
+   ``arg0``.
+-  Configure the system timer and program ``CNTFRQ_EL0`` for use by NS-BL1U
+   and NS-BL2U firmware update images.
+
+Firmware Update detection and execution
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+After performing platform setup, BL1 common code calls
+``bl1_plat_get_next_image_id()`` to determine if `Firmware Update`_ is required or
+to proceed with the normal boot process. If the platform code returns
+``BL2_IMAGE_ID`` then the normal boot sequence is executed as described in the
+next section, else BL1 assumes that `Firmware Update`_ is required and execution
+passes to the first image in the `Firmware Update`_ process. In either case, BL1
+retrieves a descriptor of the next image by calling ``bl1_plat_get_image_desc()``.
+The image descriptor contains an ``entry_point_info_t`` structure, which BL1
+uses to initialize the execution state of the next image.
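+
+The decision described above can be summarised with the following hedged
+sketch (simplified, not the actual BL1 main loop):
+
+.. code:: c
+
+    unsigned int image_id = bl1_plat_get_next_image_id();
+    image_desc_t *image_desc = bl1_plat_get_image_desc(image_id);
+
+    if (image_id == BL2_IMAGE_ID) {
+        /* Normal boot: load and execute BL2 as described below */
+    } else {
+        /* Firmware Update: hand over to the first image in the FWU process */
+    }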
+
+BL2 image load and execution
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the normal boot flow, BL1 execution continues as follows:
+
+#. BL1 prints the following string from the primary CPU to indicate successful
+   execution of the BL1 stage:
+
+   ::
+
+       "Booting Trusted Firmware"
+
+#. BL1 loads a BL2 raw binary image from platform storage, at a
+   platform-specific base address. Prior to the load, BL1 invokes
+   ``bl1_plat_handle_pre_image_load()`` which allows the platform to update or
+   use the image information. If the BL2 image file is not present or if
+   there is not enough free trusted SRAM the following error message is
+   printed:
+
+   ::
+
+       "Failed to load BL2 firmware."
+
+#. BL1 invokes ``bl1_plat_handle_post_image_load()`` which again is intended
+   for platforms to take further action after image load. This function must
+   populate the necessary arguments for BL2, which may also include the memory
+   layout. Further description of the memory layout can be found later
+   in this document.
+
+#. BL1 passes control to the BL2 image at Secure EL1 (for AArch64) or at
+   Secure SVC mode (for AArch32), starting from its load address.
+
+BL2
+~~~
+
+BL1 loads and passes control to BL2 at Secure-EL1 (for AArch64) or at Secure
+SVC mode (for AArch32). BL2 is linked against and loaded at a platform-specific
+base address (more information can be found later in this document).
+The functionality implemented by BL2 is as follows.
+
+Architectural initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For AArch64, BL2 performs the minimal architectural initialization required
+for subsequent stages of TF-A and normal world software. EL1 and EL0 are given
+access to Floating Point and Advanced SIMD registers by clearing the
+``CPACR.FPEN`` bits.
+
+For AArch32, the minimal architectural initialization required for subsequent
+stages of TF-A and normal world software is taken care of in BL1 as both BL1
+and BL2 execute at PL1.
+
+Platform initialization
+^^^^^^^^^^^^^^^^^^^^^^^
+
+On Arm platforms, BL2 performs the following platform initializations:
+
+-  Initialize the console.
+-  Configure any required platform storage to allow loading further bootloader
+   images.
+-  Enable the MMU and map the memory it needs to access.
+-  Perform platform security setup to allow access to controlled components.
+-  Reserve some memory for passing information to the next bootloader image,
+   the EL3 Runtime Software, and populate it.
+-  Define the extents of memory available for loading each subsequent
+   bootloader image.
+-  If BL1 has passed the TB_FW_CONFIG dynamic configuration file in ``arg0``,
+   then parse it.
+
+Image loading in BL2
+^^^^^^^^^^^^^^^^^^^^
+
+BL2 generic code loads the images based on the list of loadable images
+provided by the platform. BL2 passes the list of executable images
+provided by the platform to the next handover BL image.
+
+The list of loadable images provided by the platform may also contain
+dynamic configuration files. The files are loaded and can be parsed as
+needed in the ``bl2_plat_handle_post_image_load()`` function. These
+configuration files can be passed to the next Boot Loader stages as arguments
+by updating the corresponding entrypoint information in this function.
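+
+As a hedged sketch of this mechanism, a platform might forward a loaded
+configuration file to BL31 in its ``bl2_plat_handle_post_image_load()``
+implementation as follows; ``soc_fw_config_base`` is a hypothetical variable
+holding the file's load address.
+
+.. code:: c
+
+    /* Hypothetical: set elsewhere when the configuration file is loaded */
+    static uintptr_t soc_fw_config_base;
+
+    int bl2_plat_handle_post_image_load(unsigned int image_id)
+    {
+        if (image_id == BL31_IMAGE_ID) {
+            bl_mem_params_node_t *params = get_bl_mem_params_node(image_id);
+
+            /* Pass the SoC firmware configuration to BL31 via arg1 */
+            params->ep_info.args.arg1 = soc_fw_config_base;
+        }
+
+        return 0;
+    }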
+
+SCP_BL2 (System Control Processor Firmware) image load
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some systems have a separate System Control Processor (SCP) for power, clock,
+reset and system control. BL2 loads the optional SCP_BL2 image from platform
+storage into a platform-specific region of secure memory. The subsequent
+handling of SCP_BL2 is platform specific. For example, on the Juno Arm
+development platform port, the image is transferred into the SCP's internal
+memory using the Boot Over MHU (BOM) protocol after being loaded into trusted
+SRAM. The SCP executes SCP_BL2 and signals to the Application Processor (AP)
+for BL2 execution to continue.
+
+EL3 Runtime Software image load
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BL2 loads the EL3 Runtime Software image from platform storage into a platform-
+specific address in trusted SRAM. If there is not enough memory to load the
+image, or the image is missing, this leads to an assertion failure.
+
+AArch64 BL32 (Secure-EL1 Payload) image load
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BL2 loads the optional BL32 image from platform storage into a platform-
+specific region of secure memory. The image executes in the secure world. BL2
+relies on BL31 to pass control to the BL32 image, if present. Hence, BL2
+populates a platform-specific area of memory with the entrypoint/load-address
+of the BL32 image. The value of the Saved Program Status Register (``SPSR``)
+for entry into BL32 is not determined by BL2; it is initialized by the
+Secure-EL1 Payload Dispatcher (see later) within BL31, which is responsible for
+managing interaction with BL32. This information is passed to BL31.
+
+BL33 (Non-trusted Firmware) image load
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BL2 loads the BL33 image (e.g. UEFI or other test or boot software) from
+platform storage into non-secure memory as defined by the platform.
+
+BL2 relies on EL3 Runtime Software to pass control to BL33 once secure state
+initialization is complete. Hence, BL2 populates a platform-specific area of
+memory with the entrypoint and Saved Program Status Register (``SPSR``) of the
+normal world software image. The entrypoint is the load address of the BL33
+image. The ``SPSR`` is determined as specified in Section 5.13 of the
+`PSCI PDD`_. This information is passed to the EL3 Runtime Software.
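+
+A hedged sketch of how the BL33 entrypoint information might be populated,
+following those rules: ``el2_present`` is a hypothetical flag standing in for
+the EL2 presence check, and ``bl33_ep_info`` stands for the
+``entry_point_info_t`` describing BL33; the other helpers and constants are
+TF-A ones.
+
+.. code:: c
+
+    /* Enter BL33 at EL2 if implemented, otherwise EL1, exceptions masked */
+    unsigned int mode = el2_present ? MODE_EL2 : MODE_EL1;
+
+    bl33_ep_info->pc = plat_get_ns_image_entrypoint();
+    bl33_ep_info->spsr = SPSR_64(mode, MODE_SP_ELX, DISABLE_ALL_EXCEPTIONS);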
+
+AArch64 BL31 (EL3 Runtime Software) execution
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BL2 execution continues as follows:
+
+#. BL2 passes control back to BL1 by raising an SMC, providing BL1 with the
+   BL31 entrypoint. The exception is handled by the SMC exception handler
+   installed by BL1 (a sketch of this hand-off follows the list).
+
+#. BL1 turns off the MMU and flushes the caches. It clears the
+   ``SCTLR_EL3.M/I/C`` bits, flushes the data cache to the point of coherency
+   and invalidates the TLBs.
+
+#. BL1 passes control to BL31 at the specified entrypoint at EL3.
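+
+As a hedged sketch of step 1 above (simplified from the BL2 hand-off), the
+SMC is raised with the BL31 entry point information as its first argument;
+``bl31_ep_info`` stands for the ``entry_point_info_t`` describing BL31.
+
+.. code:: c
+
+    smc(BL1_SMC_RUN_IMAGE, (unsigned long)bl31_ep_info, 0, 0, 0, 0, 0, 0);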
+
+Running BL2 at EL3 execution level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some platforms have a non-TF-A Boot ROM that expects the next boot stage
+to execute at EL3. On these platforms, TF-A BL1 is a waste of memory
+as its only purpose is to ensure TF-A BL2 is entered at S-EL1. To avoid
+this waste, a special mode enables BL2 to execute at EL3, which allows
+a non-TF-A Boot ROM to load and jump directly to BL2. This mode is selected
+when the build flag ``BL2_AT_EL3`` is enabled. The main differences in this
+mode are:
+
+#. BL2 includes the reset code and the mailbox mechanism to differentiate
+   cold boot and warm boot. It runs at EL3, performing the architectural
+   initialization required for EL3.
+
+#. BL2 no longer receives the meminfo information from BL1. This
+   information can be passed by the Boot ROM or be internal to the
+   BL2 image.
+
+#. Since BL2 executes at EL3, BL2 jumps directly to the next image,
+   instead of invoking the ``RUN_IMAGE`` SMC call.
+
+
+We assume three different types of Boot ROM support on the platform:
+
+#. The Boot ROM always jumps to the same address, for both cold
+   and warm boot. In this case, we will need to keep a resident part
+   of BL2 whose memory cannot be reclaimed by any other image. The
+   linker script defines the symbols ``__TEXT_RESIDENT_START__`` and
+   ``__TEXT_RESIDENT_END__`` that allow the platform to configure
+   the memory map correctly.
+#. The platform has some mechanism to indicate the jump address to the
+   Boot ROM. Platform code can then program the jump address with
+   ``psci_warmboot_entrypoint`` during cold boot.
+#. The platform has some mechanism to program the reset address using
+   the ``PROGRAMMABLE_RESET_ADDRESS`` feature. Platform code can then
+   program the reset address with ``psci_warmboot_entrypoint`` during
+   cold boot, bypassing the Boot ROM for warm boot.
+
+In the last two cases, no part of BL2 needs to remain resident at
+runtime. In the first two cases, we expect the Boot ROM to be able to
+differentiate between warm and cold boot, to avoid loading BL2 again
+during warm boot.
+
+This functionality can be tested on the FVP by loading the image directly
+into memory and changing the address to which the system jumps at reset.
+For example:
+
+::
+
+    -C cluster0.cpu0.RVBAR=0x4022000
+    --data cluster0.cpu0=bl2.bin@0x4022000
+
+With this configuration, the FVP behaves like a platform of the first type,
+where the Boot ROM always jumps to the same address. For simplicity, BL32 is
+loaded into DRAM in this case, to avoid other images reclaiming BL2 memory.
+
+
+AArch64 BL31
+~~~~~~~~~~~~
+
+The image for this stage is loaded by BL2 and BL1 passes control to BL31 at
+EL3. BL31 executes solely in trusted SRAM. BL31 is linked against and
+loaded at a platform-specific base address (more information can be found later
+in this document). The functionality implemented by BL31 is as follows.
+
+Architectural initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Currently, BL31 performs a similar architectural initialization to BL1 as
+far as system register settings are concerned. Since BL1 code resides in ROM,
+architectural initialization in BL31 allows override of any previous
+initialization done by BL1.
+
+BL31 initializes the per-CPU data framework, which provides a cache of
+frequently accessed per-CPU data optimised for fast, concurrent manipulation
+on different CPUs. This buffer includes pointers to per-CPU contexts, crash
+buffer, CPU reset and power down operations, PSCI data, platform data and so on.
+
+It then replaces the exception vectors populated by BL1 with its own. BL31
+exception vectors implement more elaborate support for handling SMCs since this
+is the only mechanism to access the runtime services implemented by BL31 (PSCI
+for example). BL31 checks each SMC for validity as specified by the
+`SMC calling convention PDD`_ before passing control to the required SMC
+handler routine.
+
+BL31 programs the ``CNTFRQ_EL0`` register with the clock frequency of the system
+counter, which is provided by the platform.
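+
+A hedged sketch of this step (close to, but not necessarily a quote of, the
+BL31 source), using the TF-A helpers for the system register write and the
+platform query:
+
+.. code:: c
+
+    write_cntfrq_el0(plat_get_syscnt_freq2());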
+
+Platform initialization
+^^^^^^^^^^^^^^^^^^^^^^^
+
+BL31 performs detailed platform initialization, which enables normal world
+software to function correctly.
+
+On Arm platforms, this consists of the following:
+
+-  Initialize the console.
+-  Configure the Interconnect to enable hardware coherency.
+-  Enable the MMU and map the memory it needs to access.
+-  Initialize the generic interrupt controller.
+-  Initialize the power controller device.
+-  Detect the system topology.
+
+Runtime services initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BL31 is responsible for initializing the runtime services. One of them is PSCI.
+
+As part of the PSCI initializations, BL31 detects the system topology. It also
+initializes the data structures that implement the state machine used to track
+the state of power domain nodes. The state can be one of ``OFF``, ``RUN`` or
+``RETENTION``. All secondary CPUs are initially in the ``OFF`` state. The cluster
+that the primary CPU belongs to is ``ON``; any other cluster is ``OFF``. It also
+initializes the locks that protect them. BL31 accesses the state of a CPU or
+cluster immediately after reset and before the data cache is enabled in the
+warm boot path. It is not currently possible to use 'exclusive'-based
+spinlocks; therefore, BL31 uses locks based on Lamport's Bakery algorithm
+instead.
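+
+As a hedged illustration of the locking pattern (the lock and function names
+are hypothetical), a bakery lock is declared and used as follows:
+
+.. code:: c
+
+    DEFINE_BAKERY_LOCK(psci_node_lock);
+
+    static void update_node_state(void)
+    {
+        bakery_lock_get(&psci_node_lock);
+        /* ... update the power domain node state ... */
+        bakery_lock_release(&psci_node_lock);
+    }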
+
+The runtime service framework and its initialization is described in more
+detail in the "EL3 runtime services framework" section below.
+
+Details about the status of the PSCI implementation are provided in the
+"Power State Coordination Interface" section below.
+
+AArch64 BL32 (Secure-EL1 Payload) image initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If a BL32 image is present then there must be a matching Secure-EL1 Payload
+Dispatcher (SPD) service (see later for details). During initialization
+that service must register a function to carry out initialization of BL32
+once the runtime services are fully initialized. BL31 invokes such a
+registered function to initialize BL32 before running BL33. This initialization
+is not necessary for AArch32 SPs.
+
+Details on BL32 initialization and the SPD's role are described in the
+"Secure-EL1 Payloads and Dispatchers" section below.
+
+BL33 (Non-trusted Firmware) execution
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+EL3 Runtime Software initializes the EL2 or EL1 processor context for normal-
+world cold boot, ensuring that no secure state information finds its way into
+the non-secure execution state. EL3 Runtime Software uses the entrypoint
+information provided by BL2 to jump to the Non-trusted firmware image (BL33)
+at the highest available Exception Level (EL2 if available, otherwise EL1).
+
+Using alternative Trusted Boot Firmware in place of BL1 & BL2 (AArch64 only)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some platforms have existing implementations of Trusted Boot Firmware that
+would like to use TF-A BL31 for the EL3 Runtime Software. To enable this
+firmware architecture it is important to provide a fully documented and stable
+interface between the Trusted Boot Firmware and BL31.
+
+Future changes to the BL31 interface will be done in a backwards compatible
+way, and this enables these firmware components to be independently enhanced/
+updated to develop and exploit new functionality.
+
+Required CPU state when calling ``bl31_entrypoint()`` during cold boot
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This function must only be called by the primary CPU.
+
+On entry to this function the calling primary CPU must be executing in AArch64
+EL3, little-endian data access, and all interrupt sources masked:
+
+::
+
+    PSTATE.EL = 3
+    PSTATE.RW = 1
+    PSTATE.DAIF = 0xf
+    SCTLR_EL3.EE = 0
+
+X0 and X1 can be used to pass information from the Trusted Boot Firmware to the
+platform code in BL31:
+
+::
+
+    X0 : Reserved for common TF-A information
+    X1 : Platform specific information
+
+BL31 zero-init sections (e.g. ``.bss``) should not contain valid data on entry;
+they will be zero-filled prior to invoking platform setup code.
+
+Use of the X0 and X1 parameters
+'''''''''''''''''''''''''''''''
+
+The parameters are platform specific and passed from ``bl31_entrypoint()`` to
+``bl31_early_platform_setup()``. The value of these parameters is never directly
+used by the common BL31 code.
+
+The convention is that ``X0`` conveys information regarding the BL31, BL32 and
+BL33 images from the Trusted Boot firmware and ``X1`` can be used for other
+platform-specific purposes. This convention allows platforms which use TF-A's
+BL1 and BL2 images to transfer additional platform specific information from
+Secure Boot without conflicting with future evolution of TF-A using ``X0`` to
+pass a ``bl31_params`` structure.
+
+BL31 common and SPD initialization code depends on image and entrypoint
+information about BL33 and BL32, which is provided via BL31 platform APIs.
+This information is required until the start of execution of BL33. This
+information can be provided in a platform defined manner, e.g. compiled into
+the platform code in BL31, or provided in a platform defined memory location
+by the Trusted Boot firmware, or passed from the Trusted Boot Firmware via the
+Cold boot Initialization parameters. This data may need to be cleaned out of
+the CPU caches if it is provided by an earlier boot stage and then accessed by
+BL31 platform code before the caches are enabled.
+
+TF-A's BL2 implementation passes a ``bl31_params`` structure in
+``X0`` and the Arm development platforms interpret this in the BL31 platform
+code.
+
+MMU, Data caches & Coherency
+''''''''''''''''''''''''''''
+
+BL31 does not depend on the enabled state of the MMU, data caches or
+interconnect coherency on entry to ``bl31_entrypoint()``. If these are disabled
+on entry, these should be enabled during ``bl31_plat_arch_setup()``.
+
+Data structures used in the BL31 cold boot interface
+''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+These structures are designed to support compatibility and independent
+evolution of the structures and the firmware images. For example, a version of
+BL31 that can interpret the BL3x image information from different versions of
+BL2, a platform that uses an extended ``entry_point_info`` structure to convey
+additional register information to BL31, or an ELF image loader that can convey
+more details about the firmware images.
+
+To support these scenarios the structures are versioned and sized, which enables
+BL31 to detect which information is present and respond appropriately. The
+``param_header`` is defined to capture this information:
+
+.. code:: c
+
+    typedef struct param_header {
+        uint8_t type;       /* type of the structure */
+        uint8_t version;    /* version of this structure */
+        uint16_t size;      /* size of this structure in bytes */
+        uint32_t attr;      /* attributes: unused bits SBZ */
+    } param_header_t;
+
+The structures using this format are ``entry_point_info``, ``image_info`` and
+``bl31_params``. The code that allocates and populates these structures must set
+the header fields appropriately, and the ``SET_PARAM_HEAD()`` macro is defined
+to simplify this action.
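+
+A hedged example of initializing the header of an ``entry_point_info``
+structure for a non-secure image, using the ``SET_PARAM_HEAD()`` macro
+mentioned above together with the related ``SET_SECURITY_STATE()`` helper
+(illustrative only, not taken from a specific platform port):
+
+.. code:: c
+
+    entry_point_info_t bl33_ep_info;
+
+    SET_PARAM_HEAD(&bl33_ep_info, PARAM_EP, VERSION_1, 0);
+    SET_SECURITY_STATE(bl33_ep_info.h.attr, NON_SECURE);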
+
+Required CPU state for BL31 Warm boot initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When requesting a CPU power-on, or suspending a running CPU, TF-A provides
+the platform power management code with a Warm boot initialization
+entry-point, to be invoked by the CPU immediately after the reset handler.
+On entry to the Warm boot initialization function the calling CPU must be in
+AArch64 EL3, little-endian data access and all interrupt sources masked:
+
+::
+
+    PSTATE.EL = 3
+    PSTATE.RW = 1
+    PSTATE.DAIF = 0xf
+    SCTLR_EL3.EE = 0
+
+The PSCI implementation will initialize the processor state and ensure that the
+platform power management code is then invoked as required to initialize all
+necessary system, cluster and CPU resources.
+
+AArch32 EL3 Runtime Software entrypoint interface
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To enable this firmware architecture it is important to provide a fully
+documented and stable interface between the Trusted Boot Firmware and the
+AArch32 EL3 Runtime Software.
+
+Future changes to the entrypoint interface will be done in a backwards
+compatible way, and this enables these firmware components to be independently
+enhanced/updated to develop and exploit new functionality.
+
+Required CPU state when entering during cold boot
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This function must only be called by the primary CPU.
+
+On entry to this function the calling primary CPU must be executing in AArch32
+EL3, little-endian data access, and all interrupt sources masked:
+
+::
+
+    PSTATE.AIF = 0x7
+    SCTLR.EE = 0
+
+R0 and R1 are used to pass information from the Trusted Boot Firmware to the
+platform code in AArch32 EL3 Runtime Software:
+
+::
+
+    R0 : Reserved for common TF-A information
+    R1 : Platform specific information
+
+Use of the R0 and R1 parameters
+'''''''''''''''''''''''''''''''
+
+The parameters are platform specific and the convention is that ``R0`` conveys
+information regarding the BL3x images from the Trusted Boot firmware and ``R1``
+can be used for other platform-specific purposes. This convention allows
+platforms which use TF-A's BL1 and BL2 images to transfer additional platform
+specific information from Secure Boot without conflicting with future
+evolution of TF-A using ``R0`` to pass a ``bl_params`` structure.
+
+The AArch32 EL3 Runtime Software is responsible for entry into BL33. This
+information can be obtained in a platform defined manner, e.g. compiled into
+the AArch32 EL3 Runtime Software, or provided in a platform defined memory
+location by the Trusted Boot firmware, or passed from the Trusted Boot Firmware
+via the Cold boot Initialization parameters. This data may need to be cleaned
+out of the CPU caches if it is provided by an earlier boot stage and then
+accessed by AArch32 EL3 Runtime Software before the caches are enabled.
+
+When using AArch32 EL3 Runtime Software, the Arm development platforms pass a
+``bl_params`` structure in ``R0`` from BL2 to be interpreted by AArch32 EL3 Runtime
+Software platform code.
+
+MMU, Data caches & Coherency
+''''''''''''''''''''''''''''
+
+AArch32 EL3 Runtime Software must not depend on the enabled state of the MMU,
+data caches or interconnect coherency in its entrypoint. They must be explicitly
+enabled if required.
+
+Data structures used in cold boot interface
+'''''''''''''''''''''''''''''''''''''''''''
+
+The AArch32 EL3 Runtime Software cold boot interface uses ``bl_params`` instead
+of ``bl31_params``. The ``bl_params`` structure is based on the convention
+described in AArch64 BL31 cold boot interface section.
+
+Required CPU state for warm boot initialization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When requesting a CPU power-on, or suspending a running CPU, AArch32 EL3
+Runtime Software must ensure execution of a warm boot initialization entrypoint.
+If TF-A BL1 is used and the ``PROGRAMMABLE_RESET_ADDRESS`` build flag is false,
+then AArch32 EL3 Runtime Software must ensure that BL1 branches to the warm
+boot entrypoint by arranging for the BL1 platform function,
+``plat_get_my_entrypoint()``, to return a non-zero value.
+
+In this case, the warm boot entrypoint must be in AArch32 EL3, little-endian
+data access and all interrupt sources masked:
+
+::
+
+    PSTATE.AIF = 0x7
+    SCTLR.EE = 0
+
+The warm boot entrypoint may be implemented by using the TF-A
+``psci_warmboot_entrypoint()`` function. In that case, the platform must
+fulfil the prerequisites mentioned in the `PSCI Library integration guide`_.
+
+EL3 runtime services framework
+------------------------------
+
+Software executing in the non-secure state and in the secure state at exception
+levels lower than EL3 will request runtime services using the Secure Monitor
+Call (SMC) instruction. These requests will follow the convention described in
+the SMC Calling Convention PDD (`SMCCC`_). The `SMCCC`_ assigns function
+identifiers to each SMC request and describes how arguments are passed and
+returned.
+
+The EL3 runtime services framework enables the development of services by
+different providers that can be easily integrated into final product firmware.
+The following sections describe the framework which facilitates the
+registration, initialization and use of runtime services in EL3 Runtime
+Software (BL31).
+
+The design of the runtime services depends heavily on the concepts and
+definitions described in the `SMCCC`_, in particular SMC Function IDs, Owning
+Entity Numbers (OEN), Fast and Yielding calls, and the SMC32 and SMC64 calling
+conventions. Please refer to that document for more detailed explanation of
+these terms.
+
+The following runtime services are expected to be implemented first. They have
+not all been instantiated in the current implementation.
+
+#. Standard service calls
+
+   This service is for management of the entire system. The Power State
+   Coordination Interface (`PSCI`_) is the first set of standard service calls
+   defined by Arm (see PSCI section later).
+
+#. Secure-EL1 Payload Dispatcher service
+
+   If a system runs a Trusted OS or other Secure-EL1 Payload (SP) then
+   it also requires a *Secure Monitor* at EL3 to switch the EL1 processor
+   context between the normal world (EL1/EL2) and trusted world (Secure-EL1).
+   The Secure Monitor will make these world switches in response to SMCs. The
+   `SMCCC`_ provides for such SMCs with the Trusted OS Call and Trusted
+   Application Call OEN ranges.
+
+   The interface between the EL3 Runtime Software and the Secure-EL1 Payload is
+   not defined by the `SMCCC`_ or any other standard. As a result, each
+   Secure-EL1 Payload requires a specific Secure Monitor that runs as a runtime
+   service - within TF-A this service is referred to as the Secure-EL1 Payload
+   Dispatcher (SPD).
+
+   TF-A provides a Test Secure-EL1 Payload (TSP) and its associated Dispatcher
+   (TSPD). Details of SPD design and TSP/TSPD operation are described in the
+   "Secure-EL1 Payloads and Dispatchers" section below.
+
+#. CPU implementation service
+
+   This service will provide an interface to CPU implementation specific
+   services for a given platform e.g. access to processor errata workarounds.
+   This service is currently unimplemented.
+
+Additional services for Arm Architecture, SiP and OEM calls can be implemented.
+Each implemented service handles a range of SMC function identifiers as
+described in the `SMCCC`_.
+
+Registration
+~~~~~~~~~~~~
+
+A runtime service is registered using the ``DECLARE_RT_SVC()`` macro, specifying
+the name of the service, the range of OENs covered, the type of service and
+initialization and call handler functions. This macro instantiates a
+``const struct rt_svc_desc`` for the service with these details (see
+``runtime_svc.h``). This structure is allocated in a special ELF section
+``rt_svc_descs``, enabling the framework to find all service descriptors
+included into BL31.
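+
+As an example, the declaration of the Standard Service (shown here as a
+hedged illustration of the macro's use) covers the Standard Service OEN range
+with a Fast call type:
+
+.. code:: c
+
+    DECLARE_RT_SVC(
+        std_svc,
+        OEN_STD_START,
+        OEN_STD_END,
+        SMC_TYPE_FAST,
+        std_svc_setup,
+        std_svc_smc_handler
+    );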
+
+The specific service for a SMC Function is selected based on the OEN and call
+type of the Function ID, and the framework uses that information in the service
+descriptor to identify the handler for the SMC Call.
+
+The service descriptors do not include information to identify the precise set
+of SMC function identifiers supported by this service implementation, the
+security state from which such calls are valid nor the capability to support
+64-bit and/or 32-bit callers (using SMC32 or SMC64). Responding appropriately
+to these aspects of an SMC call is the responsibility of the service
+implementation; the framework is focused on integration of services from
+different providers and minimizing the time taken by the framework before the
+service handler is invoked.
+
+Details of the parameters, requirements and behavior of the initialization and
+call handling functions are provided in the following sections.
+
+Initialization
+~~~~~~~~~~~~~~
+
+``runtime_svc_init()`` in ``runtime_svc.c`` initializes the runtime services
+framework running on the primary CPU during cold boot as part of the BL31
+initialization. This happens prior to initializing a Trusted OS and running
+Normal world boot firmware that might in turn use these services.
+Initialization involves validating each of the declared runtime service
+descriptors, calling the service initialization function and populating the
+index used for runtime lookup of the service.
+
+The BL31 linker script collects all of the declared service descriptors into a
+single array and defines symbols that allow the framework to locate and traverse
+the array, and determine its size.
+
+The framework does basic validation of each descriptor to halt firmware
+initialization if service declaration errors are detected. The framework does
+not check descriptors for the following error conditions, and may behave in an
+unpredictable manner under such scenarios:
+
+#. Overlapping OEN ranges
+#. Multiple descriptors for the same range of OENs and ``call_type``
+#. Incorrect range of owning entity numbers for a given ``call_type``
+
+Once validated, the service ``init()`` callback is invoked. This function carries
+out any essential EL3 initialization before servicing requests. The ``init()``
+function is only invoked on the primary CPU during cold boot. If the service
+uses per-CPU data this must either be initialized for all CPUs during this call,
+or be done lazily when a CPU first issues an SMC call to that service. If
+``init()`` returns anything other than ``0``, this is treated as an initialization
+error and the service is ignored: this does not cause the firmware to halt.
+
+The OEN and call type fields present in the SMC Function ID cover a total of
+128 distinct services, but in practice a single descriptor can cover a range of
+OENs, e.g. SMCs to call a Trusted OS function. To optimize the lookup of a
+service handler, the framework uses an array of 128 indices that map every
+distinct OEN/call-type combination either to one of the declared services or to
+indicate the service is not handled. This ``rt_svc_descs_indices[]`` array is
+populated for all of the OENs covered by a service after the service ``init()``
+function has reported success. So a service that fails to initialize will never
+have its ``handle()`` function invoked.
+
+The following figure shows how the ``rt_svc_descs_indices[]`` index maps the SMC
+Function ID call type and OEN onto a specific service handler in the
+``rt_svc_descs[]`` array.
+
+|Image 1|
+
+Handling an SMC
+~~~~~~~~~~~~~~~
+
+When the EL3 runtime services framework receives a Secure Monitor Call, the SMC
+Function ID is passed in W0 from the lower exception level (as per the
+`SMCCC`_). If the calling register width is AArch32, it is invalid to invoke an
+SMC Function which indicates the SMC64 calling convention: such calls are
+ignored and return the Unknown SMC Function Identifier result code ``0xFFFFFFFF``
+in R0/X0.
+
+Bit[31] (fast/yielding call) and bits[29:24] (owning entity number) of the SMC
+Function ID are combined to index into the ``rt_svc_descs_indices[]`` array. The
+resulting value might indicate a service that has no handler; in this case, the
+framework will also report an Unknown SMC Function ID. Otherwise, the value is
+used as a further index into the ``rt_svc_descs[]`` array to locate the required
+service and handler.
+
+The service's ``handle()`` callback is provided with five of the SMC parameters
+directly, the others are saved into memory for retrieval (if needed) by the
+handler. The handler is also provided with an opaque ``handle`` for use with the
+supporting library for parameter retrieval, setting return values and context
+manipulation; and with ``flags`` indicating the security state of the caller. The
+framework finally sets up the execution stack for the handler, and invokes the
+service's ``handle()`` function.
+
+On return from the handler the result registers are populated in X0-X3 before
+restoring the stack and CPU state and returning from the original SMC.
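+
+A hedged sketch of a ``handle()`` callback (``my_svc_smc_handler`` and
+``MY_SVC_FUNC_ID`` are hypothetical) using the helpers provided by the
+framework to check the caller's security state and set the return value:
+
+.. code:: c
+
+    static uintptr_t my_svc_smc_handler(uint32_t smc_fid,
+                                        u_register_t x1, u_register_t x2,
+                                        u_register_t x3, u_register_t x4,
+                                        void *cookie, void *handle,
+                                        u_register_t flags)
+    {
+        /* Only accept calls from the non-secure world in this example */
+        if (!is_caller_non_secure(flags))
+            SMC_RET1(handle, SMC_UNK);
+
+        switch (smc_fid) {
+        case MY_SVC_FUNC_ID:
+            /* Populate X0 in the caller's saved context and return */
+            SMC_RET1(handle, x1 + x2);
+        default:
+            SMC_RET1(handle, SMC_UNK);
+        }
+    }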
+
+Exception Handling Framework
+----------------------------
+
+Please refer to the `Exception Handling Framework`_ document.
+
+Power State Coordination Interface
+----------------------------------
+
+TODO: Provide design walkthrough of PSCI implementation.
+
+The PSCI v1.1 specification categorizes APIs as optional and mandatory. All the
+mandatory APIs in PSCI v1.1, PSCI v1.0 and in PSCI v0.2 draft specification
+`Power State Coordination Interface PDD`_ are implemented. The table below
+lists the PSCI v1.1 APIs and their support in generic code.
+
+An API implementation might have a dependency on platform code e.g. CPU_SUSPEND
+requires the platform to export a part of the implementation. Hence the level
+of support of the mandatory APIs depends upon the support exported by the
+platform port as well. The Juno and FVP (all variants) platforms export all the
+required support.
+
++-----------------------------+-------------+-------------------------------+
+| PSCI v1.1 API               | Supported   | Comments                      |
++=============================+=============+===============================+
+| ``PSCI_VERSION``            | Yes         | The version returned is 1.1   |
++-----------------------------+-------------+-------------------------------+
+| ``CPU_SUSPEND``             | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``CPU_OFF``                 | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``CPU_ON``                  | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``AFFINITY_INFO``           | Yes         |                               |
++-----------------------------+-------------+-------------------------------+
+| ``MIGRATE``                 | Yes\*\*     |                               |
++-----------------------------+-------------+-------------------------------+
+| ``MIGRATE_INFO_TYPE``       | Yes\*\*     |                               |
++-----------------------------+-------------+-------------------------------+
+| ``MIGRATE_INFO_CPU``        | Yes\*\*     |                               |
++-----------------------------+-------------+-------------------------------+
+| ``SYSTEM_OFF``              | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``SYSTEM_RESET``            | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``PSCI_FEATURES``           | Yes         |                               |
++-----------------------------+-------------+-------------------------------+
+| ``CPU_FREEZE``              | No          |                               |
++-----------------------------+-------------+-------------------------------+
+| ``CPU_DEFAULT_SUSPEND``     | No          |                               |
++-----------------------------+-------------+-------------------------------+
+| ``NODE_HW_STATE``           | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``SYSTEM_SUSPEND``          | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``PSCI_SET_SUSPEND_MODE``   | No          |                               |
++-----------------------------+-------------+-------------------------------+
+| ``PSCI_STAT_RESIDENCY``     | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``PSCI_STAT_COUNT``         | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``SYSTEM_RESET2``           | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``MEM_PROTECT``             | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+| ``MEM_PROTECT_CHECK_RANGE`` | Yes\*       |                               |
++-----------------------------+-------------+-------------------------------+
+
+\*Note : These PSCI APIs require platform power management hooks to be
+registered with the generic PSCI code to be supported.
+
+\*\*Note : These PSCI APIs require appropriate Secure Payload Dispatcher
+hooks to be registered with the generic PSCI code to be supported.
+
+The PSCI implementation in TF-A is a library which can be integrated with
+AArch64 or AArch32 EL3 Runtime Software for Armv8-A systems. A guide to
+integrating PSCI library with AArch32 EL3 Runtime Software can be found
+`here`_.
+
+Secure-EL1 Payloads and Dispatchers
+-----------------------------------
+
+On a production system that includes a Trusted OS running in Secure-EL1/EL0,
+the Trusted OS is coupled with a companion runtime service in the BL31
+firmware. This service is responsible for the initialisation of the Trusted
+OS and all communications with it. The Trusted OS is the BL32 stage of the
+boot flow in TF-A. The firmware will attempt to locate, load and execute a
+BL32 image.
+
+TF-A uses a more general term for the BL32 software that runs at Secure-EL1,
+the *Secure-EL1 Payload*, as it is not always a Trusted OS.
+
+TF-A provides a Test Secure-EL1 Payload (TSP) and a Test Secure-EL1 Payload
+Dispatcher (TSPD) service as an example of how a Trusted OS is supported on a
+production system using the Runtime Services Framework. On such a system, the
+Test BL32 image and service are replaced by the Trusted OS and its dispatcher
+service. The TF-A build system expects that the dispatcher will define the
+build flag ``NEED_BL32`` to enable it to include the BL32 in the build either
+as a binary or to compile from source depending on whether the ``BL32`` build
+option is specified or not.
+
+The TSP runs in Secure-EL1. It is designed to demonstrate synchronous
+communication with the normal-world software running in EL1/EL2. Communication
+is initiated by the normal-world software:
+
+-  either directly through a Fast SMC (as defined in the `SMCCC`_)
+
+-  or indirectly through a `PSCI`_ SMC. The `PSCI`_ implementation in turn
+   informs the TSPD about the requested power management operation. This allows
+   the TSP to prepare for or respond to the power state change.
+
+The TSPD service is responsible for:
+
+-  Initializing the TSP
+
+-  Routing requests and responses between the secure and the non-secure
+   states during the two types of communication just described.
+
+Initializing a BL32 Image
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Secure-EL1 Payload Dispatcher (SPD) service is responsible for initializing
+the BL32 image. It needs access to the information passed by BL2 to BL31 to do
+so. This is provided by:
+
+.. code:: c
+
+    entry_point_info_t *bl31_plat_get_next_image_ep_info(uint32_t);
+
+which returns a reference to the ``entry_point_info`` structure corresponding to
+the image which will be run in the specified security state. The SPD uses this
+API to get entry point information for the SECURE image, BL32.
+
+In the absence of a BL32 image, BL31 passes control to the normal world
+bootloader image (BL33). When the BL32 image is present, it is typical
+that the SPD wants control to be passed to BL32 first and then later to BL33.
+
+To do this the SPD has to register a BL32 initialization function during
+initialization of the SPD service. The BL32 initialization function has this
+prototype:
+
+.. code:: c
+
+    int32_t init(void);
+
+and is registered using the ``bl31_register_bl32_init()`` function.
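+
+A hedged sketch of an SPD setup function (the ``my_spd_*`` names are
+hypothetical) that retrieves the BL32 entry point information and registers
+its initialization function:
+
+.. code:: c
+
+    static int32_t my_spd_init(void);
+
+    static int32_t my_spd_setup(void)
+    {
+        entry_point_info_t *bl32_ep_info;
+
+        bl32_ep_info = bl31_plat_get_next_image_ep_info(SECURE);
+        if (bl32_ep_info == NULL)
+            return 1;    /* No BL32 image to dispatch */
+
+        /* Ask BL31 to invoke my_spd_init() once runtime services are ready */
+        bl31_register_bl32_init(&my_spd_init);
+
+        return 0;
+    }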
+
+TF-A supports two approaches for the SPD to pass control to BL32 before
+returning through EL3 and running the non-trusted firmware (BL33):
+
+#. In the BL32 setup function, use ``bl31_set_next_image_type()`` to
+   request that the exit from ``bl31_main()`` is to the BL32 entrypoint in
+   Secure-EL1. BL31 will exit to BL32 using the asynchronous method by
+   calling ``bl31_prepare_next_image_entry()`` and ``el3_exit()``.
+
+   When the BL32 has completed initialization at Secure-EL1, it returns to
+   BL31 by issuing an SMC, using a Function ID allocated to the SPD. On
+   receipt of this SMC, the SPD service handler should switch the CPU context
+   from trusted to normal world and use the ``bl31_set_next_image_type()`` and
+   ``bl31_prepare_next_image_entry()`` functions to set up the initial return to
+   the normal world firmware BL33. On return from the handler the framework
+   will exit to EL2 and run BL33.
+
+#. The BL32 setup function registers an initialization function using
+   ``bl31_register_bl32_init()`` which provides a SPD-defined mechanism to
+   invoke a 'world-switch synchronous call' to Secure-EL1 to run the BL32
+   entrypoint.
+   NOTE: The Test SPD service included with TF-A provides one implementation
+   of such a mechanism.
+
+   On completion BL32 returns control to BL31 via an SMC, and on receipt the
+   SPD service handler invokes the synchronous call return mechanism to return
+   to the BL32 initialization function. On return from this function,
+   ``bl31_main()`` will set up the return to the normal world firmware BL33 and
+   continue the boot process in the normal world.
+
+Crash Reporting in BL31
+-----------------------
+
+BL31 implements a scheme for reporting the processor state when an unhandled
+exception is encountered. The reporting mechanism attempts to preserve all the
+register contents and report them via a dedicated UART (PL011 console). BL31
+reports the general purpose, EL3, Secure EL1 and some EL2 state registers.
+
+A dedicated per-CPU crash stack is maintained by BL31 and this is retrieved via
+the per-CPU pointer cache. The implementation attempts to minimise the memory
+required for this feature. The file ``crash_reporting.S`` contains the
+implementation for crash reporting.
+
+The sample crash output is shown below.
+
+::
+
+    x0  :0x000000004F00007C
+    x1  :0x0000000007FFFFFF
+    x2  :0x0000000004014D50
+    x3  :0x0000000000000000
+    x4  :0x0000000088007998
+    x5  :0x00000000001343AC
+    x6  :0x0000000000000016
+    x7  :0x00000000000B8A38
+    x8  :0x00000000001343AC
+    x9  :0x00000000000101A8
+    x10 :0x0000000000000002
+    x11 :0x000000000000011C
+    x12 :0x00000000FEFDC644
+    x13 :0x00000000FED93FFC
+    x14 :0x0000000000247950
+    x15 :0x00000000000007A2
+    x16 :0x00000000000007A4
+    x17 :0x0000000000247950
+    x18 :0x0000000000000000
+    x19 :0x00000000FFFFFFFF
+    x20 :0x0000000004014D50
+    x21 :0x000000000400A38C
+    x22 :0x0000000000247950
+    x23 :0x0000000000000010
+    x24 :0x0000000000000024
+    x25 :0x00000000FEFDC868
+    x26 :0x00000000FEFDC86A
+    x27 :0x00000000019EDEDC
+    x28 :0x000000000A7CFDAA
+    x29 :0x0000000004010780
+    x30 :0x000000000400F004
+    scr_el3 :0x0000000000000D3D
+    sctlr_el3   :0x0000000000C8181F
+    cptr_el3    :0x0000000000000000
+    tcr_el3 :0x0000000080803520
+    daif    :0x00000000000003C0
+    mair_el3    :0x00000000000004FF
+    spsr_el3    :0x00000000800003CC
+    elr_el3 :0x000000000400C0CC
+    ttbr0_el3   :0x00000000040172A0
+    esr_el3 :0x0000000096000210
+    sp_el3  :0x0000000004014D50
+    far_el3 :0x000000004F00007C
+    spsr_el1    :0x0000000000000000
+    elr_el1 :0x0000000000000000
+    spsr_abt    :0x0000000000000000
+    spsr_und    :0x0000000000000000
+    spsr_irq    :0x0000000000000000
+    spsr_fiq    :0x0000000000000000
+    sctlr_el1   :0x0000000030C81807
+    actlr_el1   :0x0000000000000000
+    cpacr_el1   :0x0000000000300000
+    csselr_el1  :0x0000000000000002
+    sp_el1  :0x0000000004028800
+    esr_el1 :0x0000000000000000
+    ttbr0_el1   :0x000000000402C200
+    ttbr1_el1   :0x0000000000000000
+    mair_el1    :0x00000000000004FF
+    amair_el1   :0x0000000000000000
+    tcr_el1 :0x0000000000003520
+    tpidr_el1   :0x0000000000000000
+    tpidr_el0   :0x0000000000000000
+    tpidrro_el0 :0x0000000000000000
+    dacr32_el2  :0x0000000000000000
+    ifsr32_el2  :0x0000000000000000
+    par_el1 :0x0000000000000000
+    far_el1 :0x0000000000000000
+    afsr0_el1   :0x0000000000000000
+    afsr1_el1   :0x0000000000000000
+    contextidr_el1  :0x0000000000000000
+    vbar_el1    :0x0000000004027000
+    cntp_ctl_el0    :0x0000000000000000
+    cntp_cval_el0   :0x0000000000000000
+    cntv_ctl_el0    :0x0000000000000000
+    cntv_cval_el0   :0x0000000000000000
+    cntkctl_el1 :0x0000000000000000
+    sp_el0  :0x0000000004010780
+
+Guidelines for Reset Handlers
+-----------------------------
+
+TF-A implements a framework that allows CPU and platform ports to perform
+actions very early after a CPU is released from reset in both the cold and warm
+boot paths. This is done by calling the ``reset_handler()`` function in both
+the BL1 and BL31 images. It in turn calls the platform and CPU specific reset
+handling functions.
+
+Details for implementing a CPU specific reset handler can be found in
+Section 8. Details for implementing a platform specific reset handler can be
+found in the `Porting Guide`_ (see the ``plat_reset_handler()`` function).
+
+When adding functionality to a reset handler, keep in mind that if different
+reset handling behavior is required between the first and subsequent
+invocations of the reset handling code, this should be detected at runtime.
+In other words, the reset handler should be able to detect whether an action has
+already been performed and act accordingly. Possible courses of action are, for
+example, to skip the action on subsequent invocations, or to undo/redo it.
+
+Configuring secure interrupts
+-----------------------------
+
+The GIC driver is responsible for performing initial configuration of secure
+interrupts on the platform. To this end, the platform is expected to provide the
+GIC driver (either GICv2 or GICv3, as selected by the platform) with the
+interrupt configuration during the driver initialisation.
+
+The secure interrupt configuration is specified as an array of secure interrupt
+properties. In this scheme, in both the GICv2 and GICv3 driver data structures,
+the ``interrupt_props`` member points to an array of interrupt properties. Each
+element of the array specifies the interrupt number and its attributes
+(priority, group, configuration), and shall be populated using the macro
+``INTR_PROP_DESC()``, as illustrated in the sketch after this list. The macro
+takes the following arguments:
+
+- 10-bit interrupt number,
+
+- 8-bit interrupt priority,
+
+- Interrupt type (one of ``INTR_TYPE_EL3``, ``INTR_TYPE_S_EL1``,
+  ``INTR_TYPE_NS``),
+
+- Interrupt configuration (either ``GIC_INTR_CFG_LEVEL`` or
+  ``GIC_INTR_CFG_EDGE``).
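+
+For illustration, a platform might describe its secure interrupts roughly as in
+the sketch below. The interrupt numbers, priority value and array name are
+hypothetical; the ``interrupt_prop_t`` type and the macros come from the GIC
+driver headers, which are assumed to be included.
+
+.. code:: c
+
+    /* Hypothetical interrupt numbers and priority, for illustration only. */
+    #define PLAT_IRQ_SEC_PHY_TIMER    29
+    #define PLAT_IRQ_SEC_SGI_0        8
+    #define PLAT_SEC_IRQ_PRI          0x10
+
+    static const interrupt_prop_t plat_interrupt_props[] = {
+        /* Secure physical timer: level-triggered, handled at EL3. */
+        INTR_PROP_DESC(PLAT_IRQ_SEC_PHY_TIMER, PLAT_SEC_IRQ_PRI,
+                       INTR_TYPE_EL3, GIC_INTR_CFG_LEVEL),
+        /* Secure SGI: edge-triggered, handled in Secure-EL1. */
+        INTR_PROP_DESC(PLAT_IRQ_SEC_SGI_0, PLAT_SEC_IRQ_PRI,
+                       INTR_TYPE_S_EL1, GIC_INTR_CFG_EDGE),
+    };
+
+An array like this, together with its length, is what the platform assigns to
+the ``interrupt_props`` and ``interrupt_props_num`` members of the GIC driver
+data structure before initialising the driver.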
+
+CPU specific operations framework
+---------------------------------
+
+Certain aspects of the Armv8-A architecture are implementation defined,
+that is, certain behaviours are not architecturally defined, but must be
+defined and documented by individual processor implementations. TF-A
+implements a framework which categorises the common implementation defined
+behaviours and allows a processor to export its implementation of that
+behaviour. The categories are:
+
+#. Processor specific reset sequence.
+
+#. Processor specific power down sequences.
+
+#. Processor specific register dumping as a part of crash reporting.
+
+#. Errata status reporting.
+
+Each of the above categories fulfils a different requirement.
+
+#. allows any processor specific initialization before the caches and MMU
+   are turned on, like implementation of errata workarounds, entry into
+   the intra-cluster coherency domain etc.
+
+#. allows each processor to implement the power down sequence mandated in
+   its Technical Reference Manual (TRM).
+
+#. allows a processor to provide additional information to the developer
+   in the event of a crash, for example Cortex-A53 has registers which
+   can expose the data cache contents.
+
+#. allows a processor to define a function that inspects and reports the status
+   of all errata workarounds on that processor.
+
+Please note that only 2. is mandated by the TRM.
+
+The CPU specific operations framework scales to accommodate a large number of
+different CPUs during power down and reset handling. The platform can specify
+any CPU optimization it wants to enable for each CPU. It can also specify
+the CPU errata workarounds to be applied for each CPU type during reset
+handling by defining CPU errata compile time macros. Details on these macros
+can be found in the `cpu-specific-build-macros.rst`_ file.
+
+The CPU specific operations framework depends on the ``cpu_ops`` structure which
+needs to be exported for each type of CPU in the platform. It is defined in
+``include/lib/cpus/aarch64/cpu_macros.S`` and has the following fields: ``midr``,
+``reset_func()``, ``cpu_pwr_down_ops`` (array of power down functions) and
+``cpu_reg_dump()``.
+
+The CPU specific files in ``lib/cpus`` export a ``cpu_ops`` data structure with
+suitable handlers for that CPU. For example, ``lib/cpus/aarch64/cortex_a53.S``
+exports the ``cpu_ops`` for Cortex-A53 CPU. According to the platform
+configuration, these CPU specific files must be included in the build by
+the platform makefile. The generic CPU specific operations framework code exists
+in ``lib/cpus/aarch64/cpu_helpers.S``.
+
+CPU specific Reset Handling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After a reset, the state of the CPU when it calls generic reset handler is:
+MMU turned off, both instruction and data caches turned off and not part
+of any coherency domain.
+
+The BL entrypoint code first invokes the ``plat_reset_handler()`` to allow
+the platform to perform any system initialization required and any system
+errata workarounds that need to be applied. The ``get_cpu_ops_ptr()`` function
+reads the current CPU midr, finds the matching ``cpu_ops`` entry in the
+``cpu_ops`` array and returns it. Note that only the part number and implementer
+fields in midr are used to find the matching ``cpu_ops`` entry. The
+``reset_func()`` in the returned ``cpu_ops`` is then invoked which executes the
+required reset handling for that CPU and also any errata workarounds enabled by
+the platform. This function must preserve the values of general purpose
+registers x20 to x29.
+
+Refer to Section "Guidelines for Reset Handlers" for general guidelines
+regarding placement of code in a reset handler.
+
+CPU specific power down sequence
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+During the BL31 initialization sequence, the pointer to the matching ``cpu_ops``
+entry is stored in per-CPU data by ``init_cpu_ops()`` so that it can be quickly
+retrieved during power down sequences.
+
+Various CPU drivers register handlers to perform power down at certain power
+levels for that specific CPU. The PSCI service, upon receiving a power down
+request, determines the highest power level at which to execute power down
+sequence for a particular CPU. It uses the ``prepare_cpu_pwr_dwn()`` function to
+pick the right power down handler for the requested level. The function
+retrieves the ``cpu_ops`` pointer member of per-CPU data and, from that, the
+``cpu_pwr_down_ops`` array, which it indexes by the required level. If the
+requested power level is higher than what a CPU driver supports, the handler
+registered for the highest level is invoked.
+
+At runtime the platform hooks for power down are invoked by the PSCI service to
+perform platform specific operations during a power down sequence, for example
+turning off CCI coherency during a cluster power down.
+
+CPU specific register reporting during crash
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If crash reporting is enabled in BL31, when a crash occurs, the crash
+reporting framework calls ``do_cpu_reg_dump`` which retrieves the matching
+``cpu_ops`` using the ``get_cpu_ops_ptr()`` function. The ``cpu_reg_dump()`` in
+``cpu_ops`` is invoked, which then returns the CPU specific register values to
+be reported and a pointer to the ASCII list of register names in a format
+expected by the crash reporting framework.
+
+CPU errata status reporting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Errata workarounds for CPUs supported in TF-A are applied during both cold and
+warm boots, shortly after reset. Individual errata workarounds are enabled as
+build options. Some errata workarounds have potential run-time implications;
+therefore some are enabled by default and others are not. Platform ports shall
+override build options to enable or disable errata as appropriate. The CPU
+drivers take care of applying errata workarounds that are enabled and applicable
+to a given CPU. Refer to the section titled *CPU Errata Workarounds* in `CPUBM`_
+for more information.
+
+Functions in CPU drivers that apply errata workarounds must follow the
+conventions listed below.
+
+The errata workaround must be authored as two separate functions:
+
+-  One that checks for the errata. This function must determine whether the
+   errata applies to the current CPU. Typically this involves matching the
+   current CPU's revision and variant against a value that's known to be
+   affected by the errata. If the function determines that the errata applies to
+   this CPU, it must return ``ERRATA_APPLIES``; otherwise, it must return
+   ``ERRATA_NOT_APPLIES``. The utility functions ``cpu_get_rev_var`` and
+   ``cpu_rev_var_ls`` may come in handy for this purpose.
+
+For an errata identified as ``E``, the check function must be named
+``check_errata_E``.
+
+This function will be invoked at different times, both from assembly and from
+C run time. Therefore it must follow AAPCS, and must not use the stack.
+
+-  Another that applies the errata workaround. This function calls the check
+   function described above and applies the errata workaround if required.
+
+CPU drivers that apply errata workarounds can optionally implement an assembly
+function that reports the status of the errata workarounds pertaining to that
+CPU. For example, for a driver that registers the CPU ``cpux`` via the
+``declare_cpu_ops`` macro, the errata reporting function, if it exists, must be
+named ``cpux_errata_report``. This function will always be called with the MMU
+enabled; it must follow AAPCS and may use the stack.
+
+In a debug build of TF-A, on a CPU that comes out of reset, both BL1 and the
+runtime firmware (BL31 in AArch64, and BL32 in AArch32) will invoke the errata
+status reporting function, if one exists, for that type of CPU.
+
+To report the status of each errata workaround, the function shall use the
+assembler macro ``report_errata``, passing it:
+
+-  The build option that enables the errata;
+
+-  The name of the CPU: this must be the same identifier that CPU driver
+   registered itself with, using ``declare_cpu_ops``;
+
+-  And the errata identifier: the identifier must match what's used in the
+   errata's check function described above.
+
+The errata status reporting function will be called once per CPU type/errata
+combination during the software's active lifetime.
+
+It's expected that whenever an errata workaround is submitted to TF-A, the
+errata reporting function is appropriately extended to report its status as
+well.
+
+Reporting the status of errata workarounds is for informational purposes only;
+it has no functional significance.
+
+Memory layout of BL images
+--------------------------
+
+Each bootloader image can be divided into two parts:
+
+-  the static contents of the image. These are data actually stored in the
+   binary on the disk. In the ELF terminology, they are called ``PROGBITS``
+   sections;
+
+-  the run-time contents of the image. These are data that don't occupy any
+   space in the binary on the disk. The ELF binary just contains some
+   metadata indicating where these data will be stored at run-time and the
+   corresponding sections need to be allocated and initialized at run-time.
+   In the ELF terminology, they are called ``NOBITS`` sections.
+
+All PROGBITS sections are grouped together at the beginning of the image,
+followed by all NOBITS sections. This is true for all TF-A images and it is
+governed by the linker scripts. This ensures that the raw binary images are
+as small as possible. If a NOBITS section was inserted in between PROGBITS
+sections then the resulting binary file would contain zero bytes in place of
+this NOBITS section, making the image unnecessarily bigger. Smaller images
+allow faster loading from the FIP to the main memory.
+
+Linker scripts and symbols
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each bootloader stage image layout is described by its own linker script. The
+linker scripts export some symbols into the program symbol table. Their values
+correspond to particular addresses. TF-A code can refer to these symbols to
+figure out the image memory layout.
+
+Linker symbols in TF-A follow the naming convention below.
+
+-  ``__<SECTION>_START__``
+
+   Start address of a given section named ``<SECTION>``.
+
+-  ``__<SECTION>_END__``
+
+   End address of a given section named ``<SECTION>``. If there is an alignment
+   constraint on the section's end address then ``__<SECTION>_END__`` corresponds
+   to the end address of the section's actual contents, rounded up to the right
+   boundary. Refer to the value of ``__<SECTION>_UNALIGNED_END__`` to know the
+   actual end address of the section's contents.
+
+-  ``__<SECTION>_UNALIGNED_END__``
+
+   End address of a given section named ``<SECTION>`` without any padding or
+   rounding up due to some alignment constraint.
+
+-  ``__<SECTION>_SIZE__``
+
+   Size (in bytes) of a given section named ``<SECTION>``. If there is an
+   alignment constraint on the section's end address then ``__<SECTION>_SIZE__``
+   corresponds to the size of the section's actual contents, rounded up to the
+   right boundary. In other words,
+   ``__<SECTION>_SIZE__ = __<SECTION>_END__ - __<SECTION>_START__``. Refer to
+   the value of ``__<SECTION>_UNALIGNED_SIZE__`` to know the actual size of the
+   section's contents.
+
+-  ``__<SECTION>_UNALIGNED_SIZE__``
+
+   Size (in bytes) of a given section named ``<SECTION>`` without any padding or
+   rounding up due to some alignment constraint. In other words,
+   ``__<SECTION>_UNALIGNED_SIZE__ = __<SECTION>_UNALIGNED_END__ - __<SECTION>_START__``.
+
+Some of the linker symbols are mandatory as TF-A code relies on them to be
+defined. They are listed in the following subsections. Some of them must be
+provided for each bootloader stage and some are specific to a given bootloader
+stage.
+
+The linker scripts define some extra, optional symbols. They are not actually
+used by any code but they help in understanding the bootloader images' memory
+layout as they are easy to spot in the link map files.
+
+Common linker symbols
+^^^^^^^^^^^^^^^^^^^^^
+
+All BL images share the following requirements:
+
+-  The BSS section must be zero-initialised before executing any C code.
+-  The coherent memory section (if enabled) must be zero-initialised as well.
+-  The MMU setup code needs to know the extents of the coherent and read-only
+   memory regions to set the right memory attributes. When
+   ``SEPARATE_CODE_AND_RODATA=1``, it needs to know more specifically how the
+   read-only memory region is divided between code and data.
+
+The following linker symbols are defined for this purpose:
+
+-  ``__BSS_START__``
+-  ``__BSS_SIZE__``
+-  ``__COHERENT_RAM_START__`` Must be aligned on a page-size boundary.
+-  ``__COHERENT_RAM_END__`` Must be aligned on a page-size boundary.
+-  ``__COHERENT_RAM_UNALIGNED_SIZE__``
+-  ``__RO_START__``
+-  ``__RO_END__``
+-  ``__TEXT_START__``
+-  ``__TEXT_END__``
+-  ``__RODATA_START__``
+-  ``__RODATA_END__``
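+
+As an example of how such symbols are consumed, the sketch below zeroes the BSS
+using ``__BSS_START__`` and ``__BSS_SIZE__``. In TF-A this is actually done by
+the early entrypoint (assembly) code before any C code runs; the function name
+here is hypothetical and purely illustrative.
+
+.. code:: c
+
+    #include <string.h>
+
+    /* Linker-exported symbols: declaring them as arrays ensures that only
+     * their addresses are used, never any value stored at those addresses. */
+    extern char __BSS_START__[];
+    extern char __BSS_SIZE__[];
+
+    static void zero_bss(void)
+    {
+        /* The "address" of __BSS_SIZE__ encodes the section size. */
+        memset(__BSS_START__, 0, (size_t)__BSS_SIZE__);
+    }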
+
+BL1's linker symbols
+^^^^^^^^^^^^^^^^^^^^
+
+BL1, being the ROM image, has additional requirements. It resides in ROM and is
+entirely executed in place, but it needs some read-write memory for its mutable
+data. Its ``.data`` section (i.e. its allocated read-write data) must be
+relocated from ROM to RAM before executing any C code.
+
+The following additional linker symbols are defined for BL1:
+
+-  ``__BL1_ROM_END__`` End address of BL1's ROM contents, covering its code
+   and ``.data`` section in ROM.
+-  ``__DATA_ROM_START__`` Start address of the ``.data`` section in ROM. Must be
+   aligned on a 16-byte boundary.
+-  ``__DATA_RAM_START__`` Address in RAM where the ``.data`` section should be
+   copied over. Must be aligned on a 16-byte boundary.
+-  ``__DATA_SIZE__`` Size of the ``.data`` section (in ROM or RAM).
+-  ``__BL1_RAM_START__`` Start address of BL1 read-write data.
+-  ``__BL1_RAM_END__`` End address of BL1 read-write data.
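+
+The sketch below illustrates the intent of the ``.data`` relocation symbols. In
+practice the copy is performed by the BL1 entrypoint assembly code before any C
+code executes; the function name is hypothetical.
+
+.. code:: c
+
+    #include <string.h>
+
+    extern char __DATA_ROM_START__[];   /* .data load address in ROM */
+    extern char __DATA_RAM_START__[];   /* .data run-time address in RAM */
+    extern char __DATA_SIZE__[];        /* size of the .data section */
+
+    static void relocate_data_section(void)
+    {
+        memcpy(__DATA_RAM_START__, __DATA_ROM_START__, (size_t)__DATA_SIZE__);
+    }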
+
+How to choose the right base addresses for each bootloader stage image
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There is currently no support for dynamic image loading in TF-A. This means
+that all bootloader images need to be linked against their ultimate runtime
+locations and the base addresses of each image must be chosen carefully such
+that images don't overlap each other in an undesired way. As the code grows,
+the base addresses might need adjustments to cope with the new memory layout.
+
+The memory layout is completely specific to the platform and so there is no
+general recipe for choosing the right base addresses for each bootloader image.
+However, there are tools to aid in understanding the memory layout. These are
+the link map files: ``build/<platform>/<build-type>/bl<x>/bl<x>.map``, with ``<x>``
+being the bootloader stage. They provide a detailed view of the memory usage of
+each image. Among other useful information, they provide the end address of
+each image.
+
+-  The ``bl1.map`` link map file provides the ``__BL1_RAM_END__`` address.
+-  The ``bl2.map`` link map file provides the ``__BL2_END__`` address.
+-  The ``bl31.map`` link map file provides the ``__BL31_END__`` address.
+-  The ``bl32.map`` link map file provides the ``__BL32_END__`` address.
+
+For each bootloader image, the platform code must provide its start address
+as well as a limit address that it must not overstep. The latter is used in the
+linker scripts to check that the image doesn't grow past that address. If that
+happens, the linker will issue a message similar to the following:
+
+::
+
+    aarch64-none-elf-ld: BLx has exceeded its limit.
+
+Additionally, if the platform memory layout implies some image overlaying, as
+on FVP, BL31 and TSP need to know the limit address that their PROGBITS
+sections must not overstep. The platform code must provide those addresses.
+
+TF-A does not provide any mechanism to verify at boot time that the memory
+into which a new image is loaded is free, in order to prevent overwriting a
+previously loaded image. The platform must specify the memory available in the
+system for all the relevant BL images to be loaded.
+
+For example, in the case of BL1 loading BL2, ``bl1_plat_sec_mem_layout()`` will
+return the region defined by the platform where BL1 intends to load BL2. The
+``load_image()`` function performs a bounds check for the image size based on
+the base and maximum image size provided by the platform. Platforms must take
+this behaviour into account when defining the base/size for each of the images.
+
+Memory layout on Arm development platforms
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following list describes the memory layout on the Arm development platforms:
+
+-  A 4KB page of shared memory is used for communication between Trusted
+   Firmware and the platform's power controller. This is located at the base of
+   Trusted SRAM. The amount of Trusted SRAM available to load the bootloader
+   images is reduced by the size of the shared memory.
+
+   The shared memory is used to store the CPUs' entrypoint mailbox. On Juno,
+   this is also used for the MHU payload when passing messages to and from the
+   SCP.
+
+-  Another 4 KB page is reserved for passing memory layout between BL1 and BL2
+   and also the dynamic firmware configurations.
+
+-  On FVP, BL1 is originally sitting in the Trusted ROM at address ``0x0``. On
+   Juno, BL1 resides in flash memory at address ``0x0BEC0000``. BL1 read-write
+   data are relocated to the top of Trusted SRAM at runtime.
+
+-  BL2 is loaded below the BL1 RW data.
+
+-  EL3 Runtime Software, BL31 for AArch64 and BL32 for AArch32 (e.g. SP_MIN),
+   is loaded at the top of the Trusted SRAM, such that its NOBITS sections will
+   overwrite BL1 R/W data and BL2. This implies that BL1 global variables
+   remain valid only until execution reaches the EL3 Runtime Software entry
+   point during a cold boot.
+
+-  On Juno, SCP_BL2 is loaded temporarily into the EL3 Runtime Software memory
+   region and transferred to the SCP before being overwritten by EL3 Runtime
+   Software.
+
+-  BL32 (for AArch64) can be loaded in one of the following locations:
+
+   -  Trusted SRAM
+   -  Trusted DRAM (FVP only)
+   -  Secure region of DRAM (top 16MB of DRAM configured by the TrustZone
+      controller)
+
+   When BL32 (for AArch64) is loaded into Trusted SRAM, it is loaded below
+   BL31.
+
+The location of the BL32 image will result in different memory maps. This is
+illustrated for both FVP and Juno in the following diagrams, using the TSP as
+an example.
+
+Note: Loading the BL32 image in TZC secured DRAM doesn't change the memory
+layout of the other images in Trusted SRAM.
+
+The CONFIG section in the memory layouts shown below contains:
+
+::
+
+    +--------------------+
+    |bl2_mem_params_descs|
+    |--------------------|
+    |     fw_configs     |
+    +--------------------+
+
+``bl2_mem_params_descs`` contains parameters passed from BL2 to the next
+BL image during boot.
+
+``fw_configs`` includes ``soc_fw_config``, ``tos_fw_config`` and ``tb_fw_config``.
+
+**FVP with TSP in Trusted SRAM with firmware configs:**
+(These diagrams only cover the AArch64 case)
+
+::
+
+                   DRAM
+    0xffffffff +----------+
+               :          :
+               |----------|
+               |HW_CONFIG |
+    0x83000000 |----------|  (non-secure)
+               |          |
+    0x80000000 +----------+
+
+               Trusted SRAM
+    0x04040000 +----------+  loaded by BL2  +----------------+
+               | BL1 (rw) |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |  BL31 NOBITS   |
+               |   BL2    |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |----------------|
+               |          |  <<<<<<<<<<<<<  | BL31 PROGBITS  |
+               |          |  <<<<<<<<<<<<<  |----------------|
+               |          |  <<<<<<<<<<<<<  |     BL32       |
+    0x04002000 +----------+                 +----------------+
+               |  CONFIG  |
+    0x04001000 +----------+
+               |  Shared  |
+    0x04000000 +----------+
+
+               Trusted ROM
+    0x04000000 +----------+
+               | BL1 (ro) |
+    0x00000000 +----------+
+
+**FVP with TSP in Trusted DRAM with firmware configs (default option):**
+
+::
+
+                     DRAM
+    0xffffffff +--------------+
+               :              :
+               |--------------|
+               |  HW_CONFIG   |
+    0x83000000 |--------------|  (non-secure)
+               |              |
+    0x80000000 +--------------+
+
+                Trusted DRAM
+    0x08000000 +--------------+
+               |     BL32     |
+    0x06000000 +--------------+
+
+                 Trusted SRAM
+    0x04040000 +--------------+  loaded by BL2  +----------------+
+               |   BL1 (rw)   |  <<<<<<<<<<<<<  |                |
+               |--------------|  <<<<<<<<<<<<<  |  BL31 NOBITS   |
+               |     BL2      |  <<<<<<<<<<<<<  |                |
+               |--------------|  <<<<<<<<<<<<<  |----------------|
+               |              |  <<<<<<<<<<<<<  | BL31 PROGBITS  |
+               |              |                 +----------------+
+               +--------------+
+               |    CONFIG    |
+    0x04001000 +--------------+
+               |    Shared    |
+    0x04000000 +--------------+
+
+                 Trusted ROM
+    0x04000000 +--------------+
+               |   BL1 (ro)   |
+    0x00000000 +--------------+
+
+**FVP with TSP in TZC-Secured DRAM with firmware configs:**
+
+::
+
+                   DRAM
+    0xffffffff +----------+
+               |  BL32    |  (secure)
+    0xff000000 +----------+
+               |          |
+               |----------|
+               |HW_CONFIG |
+    0x83000000 |----------|  (non-secure)
+               |          |
+    0x80000000 +----------+
+
+               Trusted SRAM
+    0x04040000 +----------+  loaded by BL2  +----------------+
+               | BL1 (rw) |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |  BL31 NOBITS   |
+               |   BL2    |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |----------------|
+               |          |  <<<<<<<<<<<<<  | BL31 PROGBITS  |
+               |          |                 +----------------+
+    0x04002000 +----------+
+               |  CONFIG  |
+    0x04001000 +----------+
+               |  Shared  |
+    0x04000000 +----------+
+
+               Trusted ROM
+    0x04000000 +----------+
+               | BL1 (ro) |
+    0x00000000 +----------+
+
+**Juno with BL32 in Trusted SRAM:**
+
+::
+
+                  Flash0
+    0x0C000000 +----------+
+               :          :
+    0x0BED0000 |----------|
+               | BL1 (ro) |
+    0x0BEC0000 |----------|
+               :          :
+    0x08000000 +----------+                  BL31 is loaded
+                                             after SCP_BL2 has
+               Trusted SRAM                  been sent to SCP
+    0x04040000 +----------+  loaded by BL2  +----------------+
+               | BL1 (rw) |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |  BL31 NOBITS   |
+               |   BL2    |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |----------------|
+               | SCP_BL2  |  <<<<<<<<<<<<<  | BL31 PROGBITS  |
+               |----------|  <<<<<<<<<<<<<  |----------------|
+               |          |  <<<<<<<<<<<<<  |     BL32       |
+               |          |                 +----------------+
+               |          |
+    0x04001000 +----------+
+               |   MHU    |
+    0x04000000 +----------+
+
+**Juno with BL32 in TZC-secured DRAM:**
+
+::
+
+                   DRAM
+    0xFFE00000 +----------+
+               |  BL32    |  (secure)
+    0xFF000000 |----------|
+               |          |
+               :          :  (non-secure)
+               |          |
+    0x80000000 +----------+
+
+                  Flash0
+    0x0C000000 +----------+
+               :          :
+    0x0BED0000 |----------|
+               | BL1 (ro) |
+    0x0BEC0000 |----------|
+               :          :
+    0x08000000 +----------+                  BL31 is loaded
+                                             after SCP_BL2 has
+               Trusted SRAM                  been sent to SCP
+    0x04040000 +----------+  loaded by BL2  +----------------+
+               | BL1 (rw) |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |  BL31 NOBITS   |
+               |   BL2    |  <<<<<<<<<<<<<  |                |
+               |----------|  <<<<<<<<<<<<<  |----------------|
+               | SCP_BL2  |  <<<<<<<<<<<<<  | BL31 PROGBITS  |
+               |----------|                 +----------------+
+    0x04001000 +----------+
+               |   MHU    |
+    0x04000000 +----------+
+
+Library at ROM
+---------------
+
+Please refer to the `ROMLIB Design`_ document.
+
+Firmware Image Package (FIP)
+----------------------------
+
+Using a Firmware Image Package (FIP) allows for packing bootloader images (and
+potentially other payloads) into a single archive that can be loaded by TF-A
+from non-volatile platform storage. A driver to load images from a FIP has
+been added to the storage layer and allows a package to be read from supported
+platform storage. A tool to create Firmware Image Packages is also provided
+and described below.
+
+Firmware Image Package layout
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FIP layout consists of a table of contents (ToC) followed by payload data.
+The ToC itself has a header followed by one or more table entries. The ToC is
+terminated by an end marker entry; this entry has a size of 0 bytes and its
+offset field equals the total size of the FIP file. All ToC entries describe
+some payload data that has been appended to the end of the binary package. With
+the information provided in the ToC entry the corresponding payload data can be
+retrieved.
+
+::
+
+    ------------------
+    | ToC Header     |
+    |----------------|
+    | ToC Entry 0    |
+    |----------------|
+    | ToC Entry 1    |
+    |----------------|
+    | ToC End Marker |
+    |----------------|
+    |                |
+    |     Data 0     |
+    |                |
+    |----------------|
+    |                |
+    |     Data 1     |
+    |                |
+    ------------------
+
+The ToC header and entry formats are described in the header file
+``include/tools_share/firmware_image_package.h``. This file is used by both the
+tool and TF-A.
+
+The ToC header has the following fields:
+
+::
+
+    `name`: The name of the ToC. This is currently used to validate the header.
+    `serial_number`: A non-zero number provided by the creation tool
+    `flags`: Flags associated with this data.
+        Bits 0-31: Reserved
+        Bits 32-47: Platform defined
+        Bits 48-63: Reserved
+
+A ToC entry has the following fields:
+
+::
+
+    `uuid`: All files are referred to by a pre-defined Universally Unique
+        IDentifier [UUID]. The UUIDs are defined in
+        `include/tools_share/firmware_image_package.h`. The platform translates
+        the requested image name into the corresponding UUID when accessing the
+        package.
+    `offset_address`: The offset address at which the corresponding payload data
+        can be found. The offset is calculated from the ToC base address.
+    `size`: The size of the corresponding payload data in bytes.
+    `flags`: Flags associated with this entry. None are yet defined.
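+
+As a rough sketch, the ToC header and entry can be pictured as the C structures
+below. The header file mentioned above contains the authoritative definitions;
+the exact types shown here, and the simplified ``uuid_t`` stand-in, are
+assumptions made for illustration.
+
+.. code:: c
+
+    #include <stdint.h>
+
+    /* Simplified stand-in for the real 16-byte UUID type. */
+    typedef struct { uint8_t b[16]; } uuid_t;
+
+    typedef struct fip_toc_header {
+        uint32_t name;
+        uint32_t serial_number;
+        uint64_t flags;
+    } fip_toc_header_t;
+
+    typedef struct fip_toc_entry {
+        uuid_t   uuid;
+        uint64_t offset_address;
+        uint64_t size;
+        uint64_t flags;
+    } fip_toc_entry_t;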
+
+Firmware Image Package creation tool
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The FIP creation tool can be used to pack specified images into a binary
+package that can be loaded by TF-A from platform storage. The tool currently
+only supports packing bootloader images. Additional image definitions can be
+added to the tool as required.
+
+The tool can be found in ``tools/fiptool``.
+
+Loading from a Firmware Image Package (FIP)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Firmware Image Package (FIP) driver can load images from a binary package on
+non-volatile platform storage. For the Arm development platforms, this is
+currently NOR FLASH.
+
+Bootloader images are loaded according to the platform policy as specified by
+the function ``plat_get_image_source()``. For the Arm development platforms, this
+means the platform will attempt to load images from a Firmware Image Package
+located at the start of NOR FLASH0.
+
+The Arm development platforms' policy is to only allow loading of a known set of
+images. The platform policy can be modified to allow additional images.
+
+Use of coherent memory in TF-A
+------------------------------
+
+There might be loss of coherency when physical memory with mismatched
+shareability, cacheability and memory attributes is accessed by multiple CPUs
+(refer to section B2.9 of `Arm ARM`_ for more details). This possibility occurs
+in TF-A during power up/down sequences when coherency, MMU and caches are
+turned on/off incrementally.
+
+TF-A defines coherent memory as a region of memory with Device nGnRE attributes
+in the translation tables. The translation granule size in TF-A is 4KB. This
+is the smallest possible size of the coherent memory region.
+
+By default, all data structures which are susceptible to accesses with
+mismatched attributes from various CPUs are allocated in a coherent memory
+region (refer to section 2.1 of `Porting Guide`_). The coherent memory region
+accesses are Outer Shareable, non-cacheable and they can be accessed
+with the Device nGnRE attributes when the MMU is turned on. Hence, at the
+expense of at least an extra page of memory, TF-A is able to work around
+coherency issues due to mismatched memory attributes.
+
+The alternative to the above approach is to allocate the susceptible data
+structures in Normal WriteBack WriteAllocate Inner shareable memory. This
+approach requires the data structures to be designed so that it is possible to
+work around the issue of mismatched memory attributes by performing software
+cache maintenance on them.
+
+Disabling the use of coherent memory in TF-A
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It might be desirable to avoid the cost of allocating coherent memory on
+platforms which are memory constrained. TF-A enables inclusion of coherent
+memory in firmware images through the build flag ``USE_COHERENT_MEM``.
+This flag is enabled by default. It can be disabled to choose the second
+approach described above.
+
+The following sections analyze the data structures allocated in the coherent
+memory region and the changes required to allocate them in normal memory.
+
+Coherent memory usage in PSCI implementation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``psci_non_cpu_pd_nodes`` data structure stores the platform's power domain
+tree information for state management of power domains. By default, this data
+structure is allocated in the coherent memory region in TF-A because it can be
+accessed by multiple CPUs, either with caches enabled or disabled.
+
+.. code:: c
+
+    typedef struct non_cpu_pwr_domain_node {
+        /*
+         * Index of the first CPU power domain node level 0 which has this node
+         * as its parent.
+         */
+        unsigned int cpu_start_idx;
+
+        /*
+         * Number of CPU power domains which are siblings of the domain indexed
+         * by 'cpu_start_idx' i.e. all the domains in the range 'cpu_start_idx
+         * -> cpu_start_idx + ncpus' have this node as their parent.
+         */
+        unsigned int ncpus;
+
+        /*
+         * Index of the parent power domain node.
+         */
+        unsigned int parent_node;
+
+        plat_local_state_t local_state;
+
+        unsigned char level;
+
+        /* For indexing the psci_lock array*/
+        unsigned char lock_index;
+    } non_cpu_pd_node_t;
+
+In order to move this data structure to normal memory, the use of each of its
+fields must be analyzed. Fields like ``cpu_start_idx``, ``ncpus``, ``parent_node``,
+``level`` and ``lock_index`` are only written once during cold boot. Hence removing
+them from coherent memory involves only doing a clean and invalidate of the
+cache lines after these fields are written.
+
+The field ``local_state`` can be concurrently accessed by multiple CPUs in
+different cache states. A Lamport's Bakery lock ``psci_locks`` is used to ensure
+mutual exclusion to this field and a clean and invalidate is needed after it
+is written.
+
+Bakery lock data
+~~~~~~~~~~~~~~~~
+
+The bakery lock data structure ``bakery_lock_t`` is allocated in coherent memory
+and is accessed by multiple CPUs with mismatched attributes. ``bakery_lock_t`` is
+defined as follows:
+
+.. code:: c
+
+    typedef struct bakery_lock {
+        /*
+         * The lock_data is a bit-field of 2 members:
+         * Bit[0]       : choosing. This field is set when the CPU is
+         *                choosing its bakery number.
+         * Bits[1 - 15] : number. This is the bakery number allocated.
+         */
+        volatile uint16_t lock_data[BAKERY_LOCK_MAX_CPUS];
+    } bakery_lock_t;
+
+It is a characteristic of Lamport's Bakery algorithm that the volatile per-CPU
+fields can be read by all CPUs but only written to by the owning CPU.
+
+Depending upon the data cache line size, the per-CPU fields of the
+``bakery_lock_t`` structure for multiple CPUs may exist on a single cache line.
+These per-CPU fields can be read and written during lock contention by multiple
+CPUs with mismatched memory attributes. Since these fields are a part of the
+lock implementation, they do not have access to any other locking primitive to
+safeguard against the resulting coherency issues. As a result, simple software
+cache maintenance is not enough to allocate them in coherent memory. Consider
+the following example.
+
+CPU0 updates its per-CPU field with data cache enabled. This write updates a
+local cache line which contains a copy of the fields for other CPUs as well. Now
+CPU1 updates its per-CPU field of the ``bakery_lock_t`` structure with data cache
+disabled. CPU1 then issues a DCIVAC operation to invalidate any stale copies of
+its field in any other cache line in the system. This operation will invalidate
+the update made by CPU0 as well.
+
+To use bakery locks when ``USE_COHERENT_MEM`` is disabled, the lock data structure
+has been redesigned. The changes utilise the characteristic of Lamport's Bakery
+algorithm mentioned earlier. The bakery_lock structure only allocates the memory
+for a single CPU. The macro ``DEFINE_BAKERY_LOCK`` allocates all the bakery locks
+needed for a CPU into a section ``bakery_lock``. The linker allocates the memory
+for other cores by using the total size allocated for the bakery_lock section
+and multiplying it by (PLATFORM_CORE_COUNT - 1). This enables software to
+perform software cache maintenance on the lock data structure without running
+into coherency issues associated with mismatched attributes.
+
+The bakery lock data structure ``bakery_info_t`` is defined for use when
+``USE_COHERENT_MEM`` is disabled as follows:
+
+.. code:: c
+
+    typedef struct bakery_info {
+        /*
+         * The lock_data is a bit-field of 2 members:
+         * Bit[0]       : choosing. This field is set when the CPU is
+         *                choosing its bakery number.
+         * Bits[1 - 15] : number. This is the bakery number allocated.
+         */
+         volatile uint16_t lock_data;
+    } bakery_info_t;
+
+The ``bakery_info_t`` represents a single per-CPU field of one lock and
+the combination of corresponding ``bakery_info_t`` structures for all CPUs in the
+system represents the complete bakery lock. The view in memory for a system
+with n bakery locks is as follows:
+
+::
+
+    bakery_lock section start
+    |----------------|
+    | `bakery_info_t`| <-- Lock_0 per-CPU field
+    |    Lock_0      |     for CPU0
+    |----------------|
+    | `bakery_info_t`| <-- Lock_1 per-CPU field
+    |    Lock_1      |     for CPU0
+    |----------------|
+    | ....           |
+    |----------------|
+    | `bakery_info_t`| <-- Lock_N per-CPU field
+    |    Lock_N      |     for CPU0
+    ------------------
+    |    XXXXX       |
+    | Padding to     |
+    | next Cache WB  | <--- Calculate PERCPU_BAKERY_LOCK_SIZE, allocate
+    |  Granule       |       continuous memory for remaining CPUs.
+    ------------------
+    | `bakery_info_t`| <-- Lock_0 per-CPU field
+    |    Lock_0      |     for CPU1
+    |----------------|
+    | `bakery_info_t`| <-- Lock_1 per-CPU field
+    |    Lock_1      |     for CPU1
+    |----------------|
+    | ....           |
+    |----------------|
+    | `bakery_info_t`| <-- Lock_N per-CPU field
+    |    Lock_N      |     for CPU1
+    ------------------
+    |    XXXXX       |
+    | Padding to     |
+    | next Cache WB  |
+    |  Granule       |
+    ------------------
+
+Consider a system of 2 CPUs with 'N' bakery locks as shown above. For an
+operation on Lock_N, the corresponding ``bakery_info_t`` in both the CPU0 and
+CPU1 ``bakery_lock`` sections needs to be fetched and appropriate cache
+operations need to be performed for each access.
+
+On Arm platforms, bakery locks are used in the PSCI implementation
+(``psci_locks``) and in the power controller driver (``arm_lock``).
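+
+A minimal usage sketch, assuming the bakery lock API declared in
+``bakery_lock.h`` (``DEFINE_BAKERY_LOCK``, ``bakery_lock_get()`` and
+``bakery_lock_release()``); the lock and function names are hypothetical and the
+header path is an assumption:
+
+.. code:: c
+
+    #include <bakery_lock.h>
+
+    /* With USE_COHERENT_MEM=0 this places the per-CPU fields of the lock in
+     * the special `bakery_lock` section described above. */
+    DEFINE_BAKERY_LOCK(my_plat_lock);
+
+    static void plat_do_protected_work(void)
+    {
+        bakery_lock_get(&my_plat_lock);
+        /* Work that must be serialised across CPUs goes here. */
+        bakery_lock_release(&my_plat_lock);
+    }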
+
+Non Functional Impact of removing coherent memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Removal of the coherent memory region leads to the additional software overhead
+of performing cache maintenance for the affected data structures. However, since
+the memory where the data structures are allocated is cacheable, the overhead is
+mostly mitigated by an increase in performance.
+
+There is however a performance impact for bakery locks, due to:
+
+-  Additional cache maintenance operations, and
+-  Multiple cache line reads for each lock operation, since the bakery locks
+   for each CPU are distributed across different cache lines.
+
+The implementation has been optimized to minimize this additional overhead.
+Measurements indicate that when bakery locks are allocated in Normal memory, the
+minimum latency of acquiring a lock is on average 3-4 microseconds, whereas in
+Device memory it is 2 microseconds. The measurements were done on the Juno Arm
+development platform.
+
+As mentioned earlier, almost a page of memory can be saved by disabling
+``USE_COHERENT_MEM``. Each platform needs to consider these trade-offs to decide
+whether coherent memory should be used. If a platform disables
+``USE_COHERENT_MEM`` and needs to use bakery locks in the porting layer, it can
+optionally define macro ``PLAT_PERCPU_BAKERY_LOCK_SIZE`` (see the
+`Porting Guide`_). Refer to the reference platform code for examples.
+
+Isolating code and read-only data on separate memory pages
+----------------------------------------------------------
+
+In the Armv8-A VMSA, translation table entries include fields that define the
+properties of the target memory region, such as its access permissions. The
+smallest unit of memory that can be addressed by a translation table entry is
+a memory page. Therefore, if software needs to set different permissions on two
+memory regions then it needs to map them using different memory pages.
+
+The default memory layout for each BL image is as follows:
+
+::
+
+       |        ...        |
+       +-------------------+
+       |  Read-write data  |
+       +-------------------+ Page boundary
+       |     <Padding>     |
+       +-------------------+
+       | Exception vectors |
+       +-------------------+ 2 KB boundary
+       |     <Padding>     |
+       +-------------------+
+       |  Read-only data   |
+       +-------------------+
+       |       Code        |
+       +-------------------+ BLx_BASE
+
+Note: The 2KB alignment for the exception vectors is an architectural
+requirement.
+
+The read-write data start on a new memory page so that they can be mapped with
+read-write permissions, whereas the code and read-only data below are configured
+as read-only.
+
+However, the read-only data are not aligned on a page boundary. They are
+contiguous with the code. Therefore, the end of the code section and the
+beginning of the read-only data section might share a memory page. This forces
+both to be mapped with the same memory attributes. As the code needs to be
+executable, this means that the read-only data stored on the same memory page
+as the code are executable as well. This could potentially be exploited as part
+of a security attack.
+
+TF provides the build flag ``SEPARATE_CODE_AND_RODATA`` to isolate the code and
+read-only data on separate memory pages. This in turn allows independent control
+of the access permissions for the code and read-only data. In this case,
+platform code gets a finer-grained view of the image layout and can
+appropriately map the code region as executable and the read-only data as
+execute-never.
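+
+As an illustration of this finer-grained mapping, platform code along the lines
+of the Arm port might describe the two regions as in the sketch below. This
+assumes the xlat tables v2 API and the ``BL_CODE_*`` / ``BL_RO_DATA_*`` image
+layout macros; the array name is hypothetical.
+
+.. code:: c
+
+    #include <xlat_tables_v2.h>
+
+    #if SEPARATE_CODE_AND_RODATA
+    static const mmap_region_t plat_code_rodata_regions[] = {
+        /* Code: read-only and executable. */
+        MAP_REGION_FLAT(BL_CODE_BASE, BL_CODE_END - BL_CODE_BASE,
+                        MT_CODE | MT_SECURE),
+        /* Read-only data: read-only and execute-never. */
+        MAP_REGION_FLAT(BL_RO_DATA_BASE, BL_RO_DATA_END - BL_RO_DATA_BASE,
+                        MT_RO_DATA | MT_SECURE),
+        {0}
+    };
+    #endif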
+
+This has an impact on memory footprint, as padding bytes need to be introduced
+between the code and read-only data to ensure the segregation of the two. To
+limit the memory cost, this flag also changes the memory layout such that the
+code and exception vectors are now contiguous, like so:
+
+::
+
+       |        ...        |
+       +-------------------+
+       |  Read-write data  |
+       +-------------------+ Page boundary
+       |     <Padding>     |
+       +-------------------+
+       |  Read-only data   |
+       +-------------------+ Page boundary
+       |     <Padding>     |
+       +-------------------+
+       | Exception vectors |
+       +-------------------+ 2 KB boundary
+       |     <Padding>     |
+       +-------------------+
+       |       Code        |
+       +-------------------+ BLx_BASE
+
+With this more condensed memory layout, the separation of read-only data will
+add zero or one page to the memory footprint of each BL image. Each platform
+should consider the trade-off between memory footprint and security.
+
+This build flag is disabled by default, minimising memory footprint. On Arm
+platforms, it is enabled.
+
+Publish and Subscribe Framework
+-------------------------------
+
+The Publish and Subscribe Framework allows EL3 components to define and publish
+events, to which other EL3 components can subscribe.
+
+The following macros are provided by the framework:
+
+-  ``REGISTER_PUBSUB_EVENT(event)``: Defines an event, and takes one argument,
+   the event name, which must be a valid C identifier. All calls to the
+   ``REGISTER_PUBSUB_EVENT`` macro must be placed in the file
+   ``pubsub_events.h``.
+
+-  ``PUBLISH_EVENT_ARG(event, arg)``: Publishes a defined event, by iterating
+   subscribed handlers and calling them in turn. The handlers will be passed the
+   parameter ``arg``. The expected use-case is to broadcast an event.
+
+-  ``PUBLISH_EVENT(event)``: Like ``PUBLISH_EVENT_ARG``, except that the value
+   ``NULL`` is passed to subscribed handlers.
+
+-  ``SUBSCRIBE_TO_EVENT(event, handler)``: Registers the ``handler`` to
+   subscribe to ``event``. The handler will be executed whenever the ``event``
+   is published.
+
+-  ``for_each_subscriber(event, subscriber)``: Iterates through all handlers
+   subscribed for ``event``. ``subscriber`` must be a local variable of type
+   ``pubsub_cb_t *``, and will point to each subscribed handler in turn during
+   iteration. This macro can be used for those patterns that none of the
+   ``PUBLISH_EVENT_*()`` macros cover.
+
+Publishing an event that wasn't defined using ``REGISTER_PUBSUB_EVENT`` will
+result in a build error. Subscribing to an undefined event, however, won't.
+
+Subscribed handlers must be of type ``pubsub_cb_t``, with following function
+signature:
+
+::
+
+   typedef void* (*pubsub_cb_t)(const void *arg);
+
+There may be an arbitrary number of handlers registered to the same event. The
+order in which subscribed handlers are notified when that event is published is
+not defined; handlers should not assume any relative ordering amongst them.
+
+Publishing an event on a PE will result in subscribed handlers executing on that
+PE only; it won't cause handlers to execute on a different PE.
+
+Note that publishing an event on a PE blocks until all the subscribed handlers
+finish executing on the PE.
+
+The TF-A generic code internally publishes and subscribes to a number of
+events. Platform ports are discouraged from subscribing to them. These events
+may be withdrawn, renamed, or have their semantics altered in the future.
+Platforms may however register, publish, and subscribe to platform-specific
+events.
+
+Publish and Subscribe Example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A publisher that wants to publish event ``foo`` would:
+
+-  Define the event ``foo`` in the ``pubsub_events.h``.
+
+   ::
+
+      REGISTER_PUBSUB_EVENT(foo);
+
+-  Depending on the nature of the event, use one of the ``PUBLISH_EVENT_*()``
+   macros to publish the event at the appropriate path and time of execution.
+
+A subscriber that wants to subscribe to event ``foo`` published above would
+implement:
+
+.. code:: c
+
+    void *foo_handler(const void *arg)
+    {
+         void *result;
+
+         /* Do handling ... */
+
+         return result;
+    }
+
+    SUBSCRIBE_TO_EVENT(foo, foo_handler);
+
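+The publishing side is a single macro invocation at the chosen point in the
+code, for example (the function and variable names here are hypothetical):
+
+.. code:: c
+
+    static unsigned int foo_data;
+
+    void trigger_foo(void)
+    {
+        /* Calls every handler subscribed to `foo`, passing &foo_data. */
+        PUBLISH_EVENT_ARG(foo, &foo_data);
+    }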
+
+Reclaiming the BL31 initialization code
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A significant amount of the code used for the initialization of BL31 is never
+needed again after boot time. In order to reduce the runtime memory
+footprint, the memory used for this code can be reclaimed after initialization
+has finished and be used for runtime data.
+
+The build option ``RECLAIM_INIT_CODE`` can be set to mark this boot time code
+with a ``.text.init.*`` attribute which can be filtered and placed suitably
+within the BL image for later reclamation by the platform. The platform can
+specify the filter and the memory region for this init section in BL31 via the
+``plat.ld.S`` linker script. For example, on the FVP, this section is placed
+overlapping the secondary CPU stacks so that after the cold boot is done, this
+memory can be reclaimed for the stacks. The init memory section is initially
+mapped with ``RO``, ``EXECUTE`` attributes. After BL31 initialization has
+completed, the FVP changes the attributes of this section to ``RW``,
+``EXECUTE_NEVER``, allowing it to be used for runtime data. The memory attributes
+are changed within the ``bl31_plat_runtime_setup`` platform hook. The init
+section can be reclaimed for any data which is accessed after cold boot
+initialization; it is up to the platform to make that decision.
+
+Performance Measurement Framework
+---------------------------------
+
+The Performance Measurement Framework (PMF) facilitates collection of
+timestamps by registered services and provides interfaces to retrieve them
+from within TF-A. A platform can choose to expose appropriate SMCs to
+retrieve these collected timestamps.
+
+By default, the global physical counter is used for the timestamp
+value and is read via ``CNTPCT_EL0``. The framework allows timestamps
+captured by other CPUs to be retrieved.
+
+Timestamp identifier format
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A PMF timestamp is uniquely identified across the system via the
+timestamp ID or ``tid``. The ``tid`` is composed as follows:
+
+::
+
+    Bits 0-7: The local timestamp identifier.
+    Bits 8-9: Reserved.
+    Bits 10-15: The service identifier.
+    Bits 16-31: Reserved.
+
+#. The service identifier. Each PMF service is identified by a
+   service name and a service identifier. Both the service name and
+   identifier are unique within the system as a whole.
+
+#. The local timestamp identifier. This identifier is unique within a given
+   service.
+
+Registering a PMF service
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To register a PMF service, the ``PMF_REGISTER_SERVICE()`` macro from ``pmf.h``
+is used. The arguments required are the service name, the service ID,
+the total number of local timestamps to be captured and a set of flags.
+
+The ``flags`` field can be specified as a bitwise-OR of the following values:
+
+::
+
+    PMF_STORE_ENABLE: The timestamp is stored in memory for later retrieval.
+    PMF_DUMP_ENABLE: The timestamp is dumped on the serial console.
+
+The ``PMF_REGISTER_SERVICE()`` macro reserves memory to store captured
+timestamps in a PMF-specific linker section at build time.
+Additionally, it defines the necessary functions to capture and
+retrieve a particular timestamp for the given service at runtime.
+
+The macro ``PMF_REGISTER_SERVICE()`` only enables capturing PMF timestamps
+from within TF-A. In order to retrieve timestamps from outside of TF-A, the
+``PMF_REGISTER_SERVICE_SMC()`` macro must be used instead. This macro
+accepts the same set of arguments as the ``PMF_REGISTER_SERVICE()``
+macro but additionally supports retrieving timestamps using SMCs.
+
+Capturing a timestamp
+~~~~~~~~~~~~~~~~~~~~~
+
+PMF timestamps are stored in a per-service timestamp region. On a
+system with multiple CPUs, each timestamp is captured and stored
+in a per-CPU cache line aligned memory region.
+
+Having registered the service, the ``PMF_CAPTURE_TIMESTAMP()`` macro can be
+used to capture a timestamp at the location where it is used. The macro
+takes the service name, a local timestamp identifier and a flag as arguments.
+
+The ``flags`` field argument can be zero, or ``PMF_CACHE_MAINT`` which
+instructs PMF to do cache maintenance following the capture. Cache
+maintenance is required if any of the service's timestamps are captured
+with data cache disabled.
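+
+Putting the registration and capture macros together, a sketch for a
+hypothetical service might look as follows; the service name, service ID and
+timestamp IDs below are made up for illustration.
+
+.. code:: c
+
+    #include <pmf.h>
+
+    /* Hypothetical service ID and local timestamp IDs. */
+    #define MY_SVC_ID           0x5
+    #define MY_SVC_TOTAL_IDS    2
+    #define MY_TS_BOOT_START    0
+    #define MY_TS_BOOT_END      1
+
+    /* Reserves timestamp storage and generates the capture helpers. */
+    PMF_REGISTER_SERVICE(my_svc, MY_SVC_ID, MY_SVC_TOTAL_IDS, PMF_STORE_ENABLE)
+
+    void my_svc_mark_boot_start(void)
+    {
+        /* Caches are on at this point, so no cache maintenance is needed. */
+        PMF_CAPTURE_TIMESTAMP(my_svc, MY_TS_BOOT_START, 0);
+    }
+
+    void my_svc_mark_boot_end(void)
+    {
+        /* Captured with caches potentially off: request cache maintenance. */
+        PMF_CAPTURE_TIMESTAMP(my_svc, MY_TS_BOOT_END, PMF_CACHE_MAINT);
+    }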
+
+To capture a timestamp in assembly code, the caller should use the
+``pmf_calc_timestamp_addr`` macro (defined in ``pmf_asm_macros.S``) to
+calculate the address of where the timestamp would be stored. The
+caller should then read ``CNTPCT_EL0`` register to obtain the timestamp
+and store it at the determined address for later retrieval.
+
+Retrieving a timestamp
+~~~~~~~~~~~~~~~~~~~~~~
+
+From within TF-A, timestamps for individual CPUs can be retrieved using either
+``PMF_GET_TIMESTAMP_BY_MPIDR()`` or ``PMF_GET_TIMESTAMP_BY_INDEX()`` macros.
+These macros accept the CPU's MPIDR value or its ordinal position,
+respectively.
+
+From outside TF-A, timestamps for individual CPUs can be retrieved by calling
+into ``pmf_smc_handler()``.
+
+.. code:: c
+
+    Interface : pmf_smc_handler()
+    Argument  : unsigned int smc_fid, u_register_t x1,
+                u_register_t x2, u_register_t x3,
+                u_register_t x4, void *cookie,
+                void *handle, u_register_t flags
+    Return    : uintptr_t
+
+    smc_fid: Holds the SMC identifier which is either `PMF_SMC_GET_TIMESTAMP_32`
+        when the caller of the SMC is running in AArch32 mode
+        or `PMF_SMC_GET_TIMESTAMP_64` when the caller is running in AArch64 mode.
+    x1: Timestamp identifier.
+    x2: The `mpidr` of the CPU for which the timestamp has to be retrieved.
+        This can be the `mpidr` of a different core to the one initiating
+        the SMC.  In that case, service specific cache maintenance may be
+        required to ensure the updated copy of the timestamp is returned.
+    x3: A flags value that is either 0 or `PMF_CACHE_MAINT`.  If
+        `PMF_CACHE_MAINT` is passed, then the PMF code will perform a
+        cache invalidate before reading the timestamp.  This ensures
+        an updated copy is returned.
+
+The remaining arguments, ``x4``, ``cookie``, ``handle`` and ``flags`` are unused
+in this implementation.
+
+PMF code structure
+~~~~~~~~~~~~~~~~~~
+
+#. ``pmf_main.c`` consists of core functions that implement service registration,
+   initialization, storing, dumping and retrieving timestamps.
+
+#. ``pmf_smc.c`` contains the SMC handling for registered PMF services.
+
+#. ``pmf.h`` contains the public interface to the Performance Measurement
+   Framework.
+
+#. ``pmf_asm_macros.S`` consists of macros to facilitate capturing timestamps in
+   assembly code.
+
+#. ``pmf_helpers.h`` is an internal header used by ``pmf.h``.
+
+Armv8-A Architecture Extensions
+-------------------------------
+
+TF-A makes use of Armv8-A Architecture Extensions where applicable. This
+section lists the usage of Architecture Extensions and the build flags
+controlling them.
+
+In general, and unless individually mentioned, the build options
+``ARM_ARCH_MAJOR`` and ``ARM_ARCH_MINOR`` select the Architecture Extension to
+target when building TF-A. Subsequent Arm Architecture Extensions are backward
+compatible with previous versions.
+
+The build system only requires that ``ARM_ARCH_MAJOR`` and ``ARM_ARCH_MINOR`` have a
+valid numeric value. These build options only control whether or not
+Architecture Extension-specific code is included in the build. Otherwise, TF-A
+targets the base Armv8.0-A architecture; i.e. as if ``ARM_ARCH_MAJOR`` == 8
+and ``ARM_ARCH_MINOR`` == 0, which are also their respective default values.
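+
+For instance, extension-specific code paths are typically gated on these
+build options. A minimal sketch is shown below, assuming the
+``ARM_ARCH_AT_LEAST()`` helper from ``utils_def.h``; the
+``USE_CAS_SPINLOCKS`` symbol is made up for this example.
+
+.. code:: c
+
+    #include <utils_def.h>  /* Provides ARM_ARCH_AT_LEAST(); header path
+                               may differ between TF-A versions. */
+
+    #if ARM_ARCH_AT_LEAST(8, 1)
+    /* Include the Armv8.1-A (or later) implementation, e.g. CAS-based locks. */
+    #define USE_CAS_SPINLOCKS    1
+    #else
+    /* Fall back to the base Armv8.0-A load-/store-exclusive implementation. */
+    #define USE_CAS_SPINLOCKS    0
+    #endif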
+
+See also the *Summary of build options* in `User Guide`_.
+
+For details on the Architecture Extension and available features, please refer
+to the respective Architecture Extension Supplement.
+
+Armv8.1-A
+~~~~~~~~~
+
+This Architecture Extension is targeted when ``ARM_ARCH_MAJOR`` > 8, or when
+``ARM_ARCH_MAJOR`` == 8 and ``ARM_ARCH_MINOR`` >= 1.
+
+-  The Compare and Swap instruction is used to implement spinlocks. Otherwise,
+   the load-/store-exclusive instruction pair is used.
+
+Armv8.2-A
+~~~~~~~~~
+
+-  The presence of ARMv8.2-TTCNP is detected at runtime. When it is present, the
+   Common not Private (TTBRn_ELx.CnP) bit is enabled to indicate that multiple
+   Processing Elements in the same Inner Shareable domain use the same
+   translation table entries for a given stage of translation for a particular
+   translation regime.
+
+Armv8.3-A
+~~~~~~~~~
+
+-  Pointer authentication features of Armv8.3-A are unconditionally enabled in
+   the Non-secure world so that lower ELs are allowed to use them without
+   causing a trap to EL3.
+
+   In order to enable the Secure world to use it, ``CTX_INCLUDE_PAUTH_REGS``
+   must be set to 1. This will add all pointer authentication system registers
+   to the context that is saved when doing a world switch.
+
+   TF-A itself has support for pointer authentication at runtime. This can
+   be enabled by setting both ``ENABLE_PAUTH`` and ``CTX_INCLUDE_PAUTH_REGS``
+   to 1. This enables pointer authentication in BL1, BL2, BL31, and the TSP
+   if it is used.
+
+   These options are experimental features.
+
+   Note that Pointer Authentication is enabled for Non-secure world irrespective
+   of the value of these build flags if the CPU supports it.
+
+   If ``ARM_ARCH_MAJOR == 8`` and ``ARM_ARCH_MINOR >= 3`` the code footprint of
+   enabling PAuth is lower because the compiler will use the optimized
+   PAuth instructions rather than the backwards-compatible ones.
+
+Armv7-A
+~~~~~~~
+
+This Architecture Extension is targeted when ``ARM_ARCH_MAJOR`` == 7.
+
+Several Armv7-A extensions are available. The TrustZone extension is
+mandatory, as it is required by the TF-A boot loader and runtime services.
+
+Platforms implementing an Armv7-A system can define their target Cortex-A
+architecture through ``ARM_CORTEX_A<X> = yes`` in their ``platform.mk``
+script, for example ``ARM_CORTEX_A15=yes`` for a Cortex-A15 target.
+
+Platforms can also set ``ARM_WITH_NEON=yes`` to enable NEON support. Note
+that using NEON at runtime places constraints on the non-secure world
+context; TF-A does not yet provide VFP context management.
+
+The ``ARM_CORTEX_A<x>`` and ``ARM_WITH_NEON`` directives are used to set
+the toolchain target architecture directive.
+
+Alternatively, a platform may set the toolchain target architecture
+directive directly by defining ``MARCH32_DIRECTIVE``, for example:
+
+::
+
+   MARCH32_DIRECTIVE := -march=armv7-a
+
+Code Structure
+--------------
+
+TF-A code is logically divided between the three boot loader stages mentioned
+in the previous sections. The code is also divided into the following
+categories (present as directories in the source code):
+
+-  **Platform specific.** Choice of architecture specific code depends upon
+   the platform.
+-  **Common code.** This is platform and architecture agnostic code.
+-  **Library code.** This code comprises functionality commonly used by all
+   other code. The PSCI implementation and other EL3 runtime frameworks
+   reside as library components.
+-  **Stage specific.** Code specific to a boot stage.
+-  **Drivers.**
+-  **Services.** EL3 runtime services (e.g. SPD). Specific SPD services
+   reside in the ``services/spd`` directory (e.g. ``services/spd/tspd``).
+
+Each boot loader stage uses code from one or more of the above mentioned
+categories. Based upon the above, the code layout looks like this:
+
+::
+
+    Directory    Used by BL1?    Used by BL2?    Used by BL31?
+    bl1          Yes             No              No
+    bl2          No              Yes             No
+    bl31         No              No              Yes
+    plat         Yes             Yes             Yes
+    drivers      Yes             No              Yes
+    common       Yes             Yes             Yes
+    lib          Yes             Yes             Yes
+    services     No              No              Yes
+
+The build system provides a non-configurable build option ``IMAGE_BLx`` for
+each boot loader stage (where x = BL stage). For example, for BL1,
+``IMAGE_BL1`` will be defined by the build system. This enables TF-A to
+compile certain code only for specific boot loader stages.
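+
+As an illustrative sketch, code that should only be built into the BL1
+image can be guarded with the corresponding preprocessor symbol (the helper
+function below is hypothetical):
+
+.. code:: c
+
+    #ifdef IMAGE_BL1
+    /* Compiled into the BL1 image only. */
+    static void bl1_only_helper(void)
+    {
+        /* BL1-specific work goes here. */
+    }
+    #endif /* IMAGE_BL1 */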
+
+All assembler files have the ``.S`` extension. The linker source files for each
+boot stage have the extension ``.ld.S``. These are processed by GCC to create the
+linker scripts which have the extension ``.ld``.
+
+FDTs provide a description of the hardware platform and are used by the Linux
+kernel at boot time. These can be found in the ``fdts`` directory.
+
+References
+----------
+
+.. [1] `Trusted Board Boot Requirements CLIENT (TBBR-CLIENT) Armv8-A (ARM DEN0006D)`_
+.. [2] `Power State Coordination Interface PDD`_
+.. [3] `SMC Calling Convention PDD`_
+.. [4] `TF-A Interrupt Management Design guide`_
+
+--------------
+
+*Copyright (c) 2013-2019, Arm Limited and Contributors. All rights reserved.*
+
+.. _Reset Design: ./reset-design.rst
+.. _Porting Guide: ../getting_started/porting-guide.rst
+.. _Firmware Update: ./firmware-update.rst
+.. _PSCI PDD: http://infocenter.arm.com/help/topic/com.arm.doc.den0022d/Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf
+.. _SMC calling convention PDD: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
+.. _PSCI Library integration guide: ../getting_started/psci-lib-integration-guide.rst
+.. _SMCCC: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
+.. _PSCI: http://infocenter.arm.com/help/topic/com.arm.doc.den0022d/Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf
+.. _Power State Coordination Interface PDD: http://infocenter.arm.com/help/topic/com.arm.doc.den0022d/Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf
+.. _here: ../getting_started/psci-lib-integration-guide.rst
+.. _cpu-specific-build-macros.rst: ./cpu-specific-build-macros.rst
+.. _CPUBM: ./cpu-specific-build-macros.rst
+.. _Arm ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.e/index.html
+.. _User Guide: ../getting_started/user-guide.rst
+.. _SMC Calling Convention PDD: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
+.. _TF-A Interrupt Management Design guide: ./interrupt-framework-design.rst
+.. _Xlat_tables design: xlat-tables-lib-v2-design.rst
+.. _Exception Handling Framework: exception-handling.rst
+.. _ROMLIB Design: romlib-design.rst
+.. _Trusted Board Boot Requirements CLIENT (TBBR-CLIENT) Armv8-A (ARM DEN0006D): https://developer.arm.com/docs/den0006/latest/trusted-board-boot-requirements-client-tbbr-client-armv8-a
+
+.. |Image 1| image:: diagrams/rt-svc-descs-layout.png?raw=true