Reliability, Availability, and Serviceability (RAS) Extensions
==============================================================

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extensions enables a
firmware-first paradigm for handling platform errors: exceptions resulting from
errors in the Non-secure world are routed to and handled in EL3.
These errors are Synchronous External Aborts (SEA), Asynchronous External
Aborts (signalled as SErrors), and Fault Handling and Error Recovery
interrupts. The |EHF| document mentions various :ref:`error handling
use-cases <delegation-use-cases>`.

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and its terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST_NS`` must also be set to ``1``.
``RAS_TRAP_NS_ERR_REC_ACCESS`` controls access to the RAS error record
registers from the Non-secure world.

.. _ras-figure:

.. image:: ../resources/diagrams/draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to :ref:`RAS Porting Guide <External Abort handling and RAS Support>`.

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

- A handler to probe error records for errors;
- When the probing identifies an error, a handler to handle it;
- For a memory-mapped error record, its base address and size in KB; for a
  system register-accessed record, the start index of the record and the number
  of consecutive records from that index;
- Any node-specific auxiliary data.

With this information supplied, when the run time firmware receives a
notification through one of these mechanisms, the RAS framework can iterate
through and probe the error records for errors, and invoke the appropriate
handler for each error found.

The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its
arguments; that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
            int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
            int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the :ref:`top-level exception handler
<EL3 interrupts>`.

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

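The probe and handler contract described above can be sketched as follows. This
is only an illustration intended to build outside the |TF-A| tree: the structure
layouts and the ``ERR_RECORD_MEMMAP_V1`` definition are simplified stand-ins
(assumptions) for the framework's own, and every ``demo_`` name, the base
addresses, and the dispatch loop are hypothetical.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified stand-ins (assumptions) for the framework's own types. */
    struct err_handler_data {
        unsigned int ea_reason;
        uint64_t syndrome;
    };

    struct err_record_info;

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
            int *probe_data);
    typedef int (*err_record_handler_t)(const struct err_record_info *info,
            int probe_data, const struct err_handler_data *const data);

    struct err_record_info {
        uintptr_t base_addr;
        unsigned int size_num_k;
        err_record_probe_t probe;
        err_record_handler_t handler;
        void *aux_data;
    };

    /* Stand-in for the framework macro, reduced to the fields used here. */
    #define ERR_RECORD_MEMMAP_V1(base, size_k, probe_fn, handler_fn, aux) \
        { .base_addr = (base), .size_num_k = (size_k), .probe = (probe_fn), \
          .handler = (handler_fn), .aux_data = (aux) }

    /* Hypothetical node state; a real probe would read the error record. */
    struct demo_node {
        int failing_record;   /* index of the record in error, or -1 */
        int handled_record;   /* last index seen by the handler, or -1 */
    };

    static int demo_probe(const struct err_record_info *info, int *probe_data)
    {
        const struct demo_node *node = info->aux_data;

        if (node->failing_record < 0)
            return 0;                        /* no error detected */

        *probe_data = node->failing_record;  /* hand the index to the handler */
        return 1;
    }

    static int demo_handler(const struct err_record_info *info, int probe_data,
            const struct err_handler_data *const data)
    {
        struct demo_node *node = info->aux_data;

        (void)data;                          /* syndrome is unused in this sketch */
        node->handled_record = probe_data;
        return 0;
    }

    static struct demo_node demo_nodes[2] = { { -1, -1 }, { 3, -1 } };

    /* On a platform, REGISTER_ERR_RECORD_INFO(demo_records) would follow. */
    static struct err_record_info demo_records[] = {
        ERR_RECORD_MEMMAP_V1(0x50000000U, 4, demo_probe, demo_handler,
                &demo_nodes[0]),
        ERR_RECORD_MEMMAP_V1(0x50010000U, 4, demo_probe, demo_handler,
                &demo_nodes[1]),
    };

    /* Loosely mimics the framework: probe every record, handle each hit. */
    static int demo_dispatch(const struct err_handler_data *data)
    {
        int n_handled = 0;

        for (size_t i = 0; i < sizeof(demo_records) / sizeof(demo_records[0]); i++) {
            const struct err_record_info *info = &demo_records[i];
            int probe_data = 0;

            if (info->probe(info, &probe_data) == 0)
                continue;
            if (info->handler(info, probe_data, data) != 0)
                return -1;
            n_handled++;
        }
        return n_handled;
    }

In this sketch only the second node reports an error, so ``demo_dispatch()``
invokes ``demo_handler`` once, passing the record index that ``demo_probe``
stored in ``probe_data``.
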
Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
            int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
            int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform writing
its own. Both helpers above:

- Return a non-zero value when an error is detected in a Standard Error Record;
- Set ``probe_data`` to the index of the error record upon detecting an error.

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with the |EHF|. See `Interaction
with Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt``, containing:

- Interrupt number;
- The associated error record information (pointer to the corresponding
  ``struct err_record_info``);
- Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.

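The sorting requirement exists so that a binary search can find an entry. A
minimal sketch of such a lookup is shown below; the ``struct ras_interrupt``
layout is an assumed stand-in built from the three fields listed above, and the
interrupt numbers and ``demo_`` functions are hypothetical rather than part of
the framework.

.. code:: c

    #include <stddef.h>

    struct err_record_info;   /* opaque in this sketch */

    /* Assumed layout, based on the fields listed above. */
    struct ras_interrupt {
        struct err_record_info *err_record;
        unsigned int intr_number;
        void *cookie;
    };

    /* Hypothetical interrupt numbers, in increasing order. */
    static struct ras_interrupt demo_interrupts[] = {
        { .intr_number = 32 },
        { .intr_number = 45 },
        { .intr_number = 71 },
    };

    /* Returns 1 if the array is strictly increasing by interrupt number. */
    static int demo_is_sorted(const struct ras_interrupt *arr, size_t n)
    {
        for (size_t i = 1; i < n; i++) {
            if (arr[i - 1].intr_number >= arr[i].intr_number)
                return 0;
        }
        return 1;
    }

    /* Binary search over the sorted array; returns NULL if not found. */
    static const struct ras_interrupt *demo_find(const struct ras_interrupt *arr,
            size_t n, unsigned int intr)
    {
        size_t lo = 0;
        size_t hi = n;

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (arr[mid].intr_number == intr)
                return &arr[mid];
            if (arr[mid].intr_number < intr)
                lo = mid + 1;
            else
                hi = mid;
        }
        return NULL;
    }

A check like ``demo_is_sorted()`` could be run in a debug build to catch an
unsorted array early, since a binary search over unsorted data can silently
miss registered interrupts.
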
Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions that are part of Armv8.4 introduced new architectural
features to deal with Double Fault conditions, specifically, the introduction of
``NMEA`` and ``EASE`` bits to the ``SCR_EL3`` register. These were introduced to
assist EL3 software which runs part of its entry/exit routines with exceptions
momentarily masked, meaning that, in such systems, External Aborts/SErrors are
not immediately handled when they occur, but only after the exceptions are
unmasked again.

|TF-A|, for legacy reasons, executes all of EL3 with all exceptions unmasked.
This means that all exceptions routed to EL3 are handled immediately. |TF-A|
is thus able to detect a Double Fault condition in software, without needing
the intended advantages of the Armv8.4 Double Fault architecture extensions.

Double faults are fatal: handling terminates at the platform double fault
handler, which does not return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

- ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

- ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
  `Interaction with Exception Handling Framework`_;

- ``HANDLE_EA_EL3_FIRST_NS=1`` enables routing of External Aborts and SErrors,
  resulting from errors in NS world, to EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it will first call the ``ras_ea_handler()`` function, which is
the top-level RAS exception handler. ``ras_ea_handler`` is responsible for
iterating through the platform-supplied error records, probing them, and, when
an error is identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.

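A platform override that still engages the RAS framework might look like the
sketch below. The parameter lists are an assumption based on the reason,
syndrome, ``cookie``, ``handle``, and ``flags`` values mentioned earlier, and
``ras_ea_handler`` is replaced by a local stub so that the example builds on its
own; the ``demo_`` counters are hypothetical and exist only to make the control
flow observable.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    static int demo_ras_result;           /* value the stub reports */
    static unsigned int demo_ras_calls;   /* times the RAS handler ran */
    static unsigned int demo_unhandled;   /* errors left to the platform */

    /*
     * Stub standing in for the framework's top-level RAS handler. It is
     * assumed here to return non-zero when the error was handled.
     */
    static int ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
            void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)syndrome;
        (void)cookie;
        (void)handle;
        (void)flags;

        demo_ras_calls++;
        return demo_ras_result;
    }

    /* Platform override: give the RAS framework the first chance. */
    void plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
            void *cookie, void *handle, uint64_t flags)
    {
        if (ras_ea_handler(ea_reason, syndrome, cookie, handle, flags) != 0)
            return;

        /* Platform-specific handling of errors the framework did not claim. */
        demo_unhandled++;
    }

What a platform does with errors the framework did not claim is a platform
decision; the counter above is only a placeholder for it.
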
Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the
platform-supplied sorted array of interrupts to look up the error record
information associated with the interrupt number. The error handler for that
record is then invoked to handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a :ref:`priority level <Partitioning
priority levels>` for handling RAS exceptions. The platform must then define
the macro ``PLAT_RAS_PRI`` to the priority level used for RAS exceptions.
Platforms would typically want to allocate the highest secure priority for
RAS handling.

Handling of both :ref:`interrupt <interrupt-flow>` and :ref:`non-interrupt
<non-interrupt-flow>` exceptions follows the sequences outlined in the |EHF|
documentation. That is, for interrupts, the priority management is implicit;
but for non-interrupt exceptions, it is explicit, using :ref:`EHF APIs
<Activating and Deactivating priorities>`.

--------------

*Copyright (c) 2018-2019, Arm Limited and Contributors. All rights reserved.*