RAS support in Trusted Firmware-A
=================================

.. contents::
    :depth: 2

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables the
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. The errors in question are Synchronous
External Aborts (SEAs), Asynchronous External Aborts (signalled as SErrors), and
Fault Handling and Error Recovery interrupts. The |EHF| document mentions
various `error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: ../draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

- A handler to probe error records for errors;
- When the probing identifies an error, a handler to handle it;
- For a memory-mapped error record, its base address and size in KB; for a
  system register-accessed record, the start index of the record and the number
  of contiguous records from that index;
- Any node-specific auxiliary data.

With this information supplied, when the run time firmware is notified of an
error through one of these mechanisms, the RAS framework can iterate through and
probe error records for errors, and invoke the appropriate handler to handle
each one.

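The probe-then-handle flow described above can be modelled roughly as follows.
This is a minimal, self-contained sketch rather than the framework's code: every
``sketch_`` type and function is a hypothetical stand-in that keeps only what
the loop needs, and treating a non-zero handler return as a failure is an
assumption of this sketch.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    /*
     * Hypothetical, simplified stand-ins for the framework types. They keep
     * only the fields this sketch needs and are NOT the TF-A definitions.
     */
    struct sketch_record;

    typedef int (*sketch_probe_t)(const struct sketch_record *info,
                                  int *probe_data);
    typedef int (*sketch_handler_t)(const struct sketch_record *info,
                                    int probe_data);

    struct sketch_record {
        sketch_probe_t probe;     /* returns non-zero when an error is found */
        sketch_handler_t handler; /* handles the error that probe found */
        int first_idx;            /* index of the first record in this group */
        int *pending;             /* aux data: errors still latched (model) */
    };

    /* Model probe: report an error while the group still has one latched. */
    int sketch_probe(const struct sketch_record *info, int *probe_data)
    {
        if (*info->pending == 0)
            return 0;

        *probe_data = info->first_idx; /* tell the handler which record */
        return 1;
    }

    /* Model handler: "clear" one latched error and report success. */
    int sketch_handle(const struct sketch_record *info, int probe_data)
    {
        (void)probe_data;
        (*info->pending)--;
        return 0;
    }

    /*
     * Probe each record group in turn and invoke its handler for every error
     * found. Returns the number of errors handled, or -1 if a handler returns
     * non-zero (treated as failure; an assumption of this sketch).
     */
    int sketch_dispatch(const struct sketch_record *records, size_t count)
    {
        int n_handled = 0;

        for (size_t i = 0; i < count; i++) {
            const struct sketch_record *info = &records[i];
            int probe_data = 0;

            assert(info->probe != NULL);
            assert(info->handler != NULL);

            /* Keep probing until this group reports no further error. */
            while (info->probe(info, &probe_data) != 0) {
                if (info->handler(info, probe_data) != 0)
                    return -1;
                n_handled++;
            }
        }

        return n_handled;
    }

In this model the same group is probed again after each handled error, so a
handler is expected to clear the condition that its probe detects; otherwise the
loop would not terminate.
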
The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its arguments;
that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                    int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                    int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform writing
its own. Both helpers above:

- Return non-zero value when an error is detected in a Standard Error Record;
- Set ``probe_data`` to the index of the error record upon detecting an error.

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with the |EHF|. See `Interaction
with Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt`` containing:

- Interrupt number;
- The associated error record information (pointer to the corresponding
  ``struct err_record_info``);
- Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in the increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.

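The sorting requirement is what makes a binary search over the array possible.
A minimal sketch of such a lookup is given here for illustration; it is not
taken from the |TF-A| sources, and ``struct sketch_ras_interrupt`` and
``sketch_find_interrupt()`` are hypothetical stand-ins rather than the real
``struct ras_interrupt`` or any framework function.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    /*
     * Hypothetical stand-in for an interrupt registration entry; it is NOT
     * the TF-A definition of struct ras_interrupt.
     */
    struct sketch_ras_interrupt {
        unsigned int intr_number; /* interrupt number (the sort key) */
        const void *err_record;   /* associated error record information */
        void *cookie;             /* optional platform cookie */
    };

    /*
     * Bisect an array sorted by increasing interrupt number. Returns the
     * matching entry, or NULL when the interrupt number is not registered.
     */
    const struct sketch_ras_interrupt *sketch_find_interrupt(
            const struct sketch_ras_interrupt *intrs, size_t count,
            unsigned int intr_number)
    {
        size_t lo = 0;
        size_t hi = count; /* search the half-open range [lo, hi) */

        assert(intrs != NULL || count == 0);

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (intrs[mid].intr_number == intr_number)
                return &intrs[mid];

            if (intrs[mid].intr_number < intr_number)
                lo = mid + 1;
            else
                hi = mid;
        }

        return NULL;
    }

If the array were not sorted, a search like this could miss a registered
interrupt, which is why the ordering is a requirement rather than a
recommendation.
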
Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions in Armv8.4 introduced new architectural features to deal
with Double Fault conditions, specifically the ``NMEA`` and ``EASE`` bits in the
``SCR_EL3`` register. These were introduced to assist EL3 software which runs
part of its entry/exit routines with exceptions momentarily masked; in such
systems, External Aborts/SErrors are not immediately handled when they occur,
but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes the whole of EL3 with all exceptions
unmasked. This means that all exceptions routed to EL3 are handled immediately.
|TF-A| is thus able to detect Double Fault conditions in software, without
needing the intended advantages of the Armv8.4 Double Fault architecture
extensions.

Double faults are fatal: they terminate at the platform double fault handler,
which does not return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

- ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

- ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
  `Interaction with Exception Handling Framework`_;

- ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
  EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler()`` is responsible for
iterating through the platform-supplied error records, probing them, and, when
an error is identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.

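A platform override might therefore be shaped roughly like the sketch below. It
is illustrative only: the prototypes shown, and the convention that a non-zero
return from ``ras_ea_handler()`` means the error was handled, are assumptions to
be checked against the porting guide and headers. The ``ras_ea_handler()`` body
here is a stub so that the sketch builds on its own, and the ``sketch_`` names
are hypothetical.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Stubs so that this sketch builds on its own. In a real port,
     * ras_ea_handler() is provided by the RAS framework; the prototype and
     * the "non-zero means handled" convention are assumptions here.
     */
    int sketch_ras_claims_error; /* knob: what the stub should report */
    int sketch_unhandled_calls;  /* counts aborts left to the platform */

    int ras_ea_handler(unsigned int ea_reason, uint64_t syndrome, void *cookie,
                       void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)syndrome;
        (void)cookie;
        (void)handle;
        (void)flags;
        return sketch_ras_claims_error;
    }

    /* Hypothetical platform fallback for aborts that RAS did not handle. */
    void sketch_plat_unhandled_abort(unsigned int ea_reason, uint64_t syndrome)
    {
        (void)ea_reason;
        (void)syndrome;
        sketch_unhandled_calls++; /* a real platform would report and halt */
    }

    /* Platform override of the External Abort handler. */
    void plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                         void *cookie, void *handle, uint64_t flags)
    {
        /* Hand the exception to the RAS framework first. */
        if (ras_ea_handler(ea_reason, syndrome, cookie, handle, flags) != 0)
            return;

        /* The framework did not handle it: apply platform policy. */
        sketch_plat_unhandled_abort(ea_reason, syndrome);
    }

The ordering is the point of the sketch: the override consults the RAS framework
before applying any platform-specific policy for aborts that remain unhandled.
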
Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, the
priority management is implicit; but for non-interrupt exceptions, it is
explicit, using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*