RAS support in Trusted Firmware-A
=================================

.. section-numbering::
    :suffix: .

.. contents::
    :depth: 2

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables a
firmware-first paradigm for handling platform errors, in which exceptions
resulting from errors (viz. Synchronous External Aborts (SEA), Asynchronous
External Aborts (signalled as SErrors), and Fault Handling and Error Recovery
interrupts) are routed to and handled in EL3. The |EHF| document mentions
various `error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

- A handler to probe error records for errors;
- When the probing identifies an error, a handler to handle it;
- For a memory-mapped error record, its base address and size in KB; for a
  system register-accessed record, the start index of the record and the number
  of contiguous records from that index;
- Any node-specific auxiliary data.

With this information supplied, when the run time firmware is notified through
one of these mechanisms, the RAS framework can iterate through and probe the
error records for errors, and invoke the appropriate handler to handle each one.

The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its
arguments; that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_
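
As an illustration of the probe contract, the sketch below scans a hypothetical
node that exposes one 32-bit status word per record. The reduced
``struct err_record_info``, the ``num_records`` field, and the
``NODE_STATUS_VALID`` bit are assumptions made so that the example compiles on
its own; they are not the framework's definitions.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Reduced stand-in for the framework's struct err_record_info (assumption). */
    struct err_record_info {
        uintptr_t base_addr;        /* base of the node's status words */
        unsigned int num_records;   /* hypothetical field: records in the node */
    };

    /* Hypothetical layout: bit 0 of each status word means "error latched". */
    #define NODE_STATUS_VALID   0x1U

    /* Same shape as the err_record_probe_t prototype. */
    int plat_node_probe(const struct err_record_info *info, int *probe_data)
    {
        const volatile uint32_t *status =
            (const volatile uint32_t *)info->base_addr;

        for (unsigned int i = 0U; i < info->num_records; i++) {
            if ((status[i] & NODE_STATUS_VALID) != 0U) {
                /* Tell the error handler which record signalled the error. */
                *probe_data = (int)i;
                return 1;
            }
        }

        /* No error found; probe_data is left untouched. */
        return 0;
    }

Returning the record index through ``probe_data`` lets the error handler go
straight to the failing record instead of scanning the node again.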

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
viz. the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.
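
A platform file that enumerates and registers its records could be laid out as
in the sketch below. To keep the example self-contained, the structure, the
``_V1`` macros, and ``REGISTER_ERR_RECORD_INFO()`` are simplified stand-ins
written for this illustration, not the framework's own definitions; the base
address, sizes, and ``plat_`` names are likewise hypothetical.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* ---- Simplified stand-ins for framework definitions (assumptions) ---- */
    struct err_record_info;
    struct err_handler_data;    /* opaque in this sketch */

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);
    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

    struct err_record_info {
        uintptr_t base_addr;            /* memory-mapped: base address */
        size_t size_num_k;              /* memory-mapped: size in KB */
        unsigned int idx_start;         /* system register: first index */
        unsigned int num_idx;           /* system register: record count */
        err_record_probe_t probe;
        err_record_handler_t handler;
        const void *aux_data;
    };

    #define ERR_RECORD_MEMMAP_V1(base, size_k, probe_, handler_, aux) \
        { .base_addr = (base), .size_num_k = (size_k), \
          .probe = (probe_), .handler = (handler_), .aux_data = (aux) }

    #define ERR_RECORD_SYSREG_V1(start, num, probe_, handler_, aux) \
        { .idx_start = (start), .num_idx = (num), \
          .probe = (probe_), .handler = (handler_), .aux_data = (aux) }

    struct err_record_mapping {
        const struct err_record_info *err_records;
        size_t num_err_records;
    };

    #define REGISTER_ERR_RECORD_INFO(records) \
        const struct err_record_mapping err_record_mappings = { \
            (records), sizeof(records) / sizeof((records)[0]) }

    /* ---- Platform code (hypothetical names and values) ---- */
    static int plat_probe(const struct err_record_info *info, int *probe_data)
    {
        (void)info;
        (void)probe_data;
        return 0;   /* placeholder: reports no error */
    }

    static int plat_handler(const struct err_record_info *info, int probe_data,
                    const struct err_handler_data *const data)
    {
        (void)info;
        (void)probe_data;
        (void)data;
        return 0;   /* placeholder: nothing to do */
    }

    static const struct err_record_info plat_err_records[] = {
        ERR_RECORD_MEMMAP_V1(0x2a400000UL, 4, plat_probe, plat_handler, NULL),
        ERR_RECORD_SYSREG_V1(0, 2, plat_probe, plat_handler, NULL),
    };

    /* Must live in the same file as the array it registers. */
    REGISTER_ERR_RECORD_INFO(plat_err_records);

The same-file requirement is visible in the stand-in macro: it takes
``sizeof`` of the array, which only works where the array definition is in
scope.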

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                    int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                    int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform rolling
out its own. Both helpers above:

- Return a non-zero value when an error is detected in a Standard Error Record;
- Set ``probe_data`` to the index of the error record upon detecting an error.
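
Conceptually, probing a group of memory-mapped Standard Error Records amounts
to checking the valid bit of each record's status register. The sketch below
illustrates that idea only and does not reproduce the helpers' actual code. The
64-byte record stride, the ``0x10`` status offset, and the position of the V
bit are assumptions based on the Standard Error Record layout and should be
verified against the Arm RAS architecture documentation; the function name and
the explicit ``num_records`` argument are hypothetical.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed Standard Error Record layout; verify against the Arm manual. */
    #define ERR_RECORD_STRIDE   0x40U                   /* bytes per record */
    #define ERR_STATUS_OFFSET   0x10U                   /* ERR<n>STATUS */
    #define ERR_STATUS_V_BIT    (UINT64_C(1) << 30)     /* record is valid */

    /*
     * Scan num_records records starting at base. Returns non-zero and stores
     * the record index in *probe_data when a record has its V bit set.
     */
    int ser_probe_sketch(uintptr_t base, unsigned int num_records,
                    int *probe_data)
    {
        for (unsigned int i = 0U; i < num_records; i++) {
            uintptr_t addr = base + (i * ERR_RECORD_STRIDE) + ERR_STATUS_OFFSET;
            uint64_t status = *(const volatile uint64_t *)addr;

            if ((status & ERR_STATUS_V_BIT) != 0U) {
                *probe_data = (int)i;
                return 1;
            }
        }

        return 0;
    }

In a real port there is no need to write this by hand: passing
``ras_err_ser_probe_memmap`` or ``ras_err_ser_probe_sysreg`` as the probe
argument of the record macros gives the behaviour listed above.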

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with the |EHF|. See `Interaction
with Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt``, which holds:

- Interrupt number;
- The associated error record information (pointer to the corresponding
  ``struct err_record_info``);
- Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in the increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.
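
The sorting requirement exists because a sorted array can be searched by
bisection. The sketch below shows such an array and a binary search over it.
The field names of this stand-in ``struct ras_interrupt``, the interrupt
numbers, and the ``find_ras_interrupt()`` function are assumptions made for
illustration.

.. code:: c

    #include <stddef.h>

    struct err_record_info;     /* opaque in this sketch */

    /* Stand-in with the three members listed above (field names assumed). */
    struct ras_interrupt {
        unsigned int intr_number;
        struct err_record_info *err_record;
        void *cookie;
    };

    /* Hypothetical interrupt numbers, in increasing order. */
    static struct ras_interrupt plat_ras_interrupts[] = {
        { .intr_number = 35U },
        { .intr_number = 48U },
        { .intr_number = 72U },
    };
    /* A real platform would follow this with REGISTER_RAS_INTERRUPTS(). */

    /* Binary search; returns NULL if intr is not in the table. */
    const struct ras_interrupt *find_ras_interrupt(
                    const struct ras_interrupt *table, size_t count,
                    unsigned int intr)
    {
        size_t lo = 0U;
        size_t hi = count;      /* search window is [lo, hi) */

        while (lo < hi) {
            size_t mid = lo + ((hi - lo) / 2U);

            if (table[mid].intr_number == intr)
                return &table[mid];

            if (table[mid].intr_number < intr)
                lo = mid + 1U;
            else
                hi = mid;
        }

        return NULL;
    }

An unsorted array would make this search miss entries silently, which is why
the ordering is a requirement and not a recommendation.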

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions that are part of Armv8.4 introduced new architectural
features to deal with Double Fault conditions, specifically the ``NMEA`` and
``EASE`` bits in the ``SCR_EL3`` register. These were introduced to assist EL3
software which runs part of its entry/exit routines with exceptions momentarily
masked; in such systems, External Aborts/SErrors are not handled immediately
when they occur, but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes all of its EL3 code with all exceptions
unmasked. This means that all exceptions routed to EL3 are handled immediately.
|TF-A| is thus able to detect Double Fault conditions in software, without
needing the intended advantages of the Armv8.4 Double Fault architecture
extensions.

Double faults are fatal: handling terminates at the platform double fault
handler, which doesn't return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

- ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

- ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
  `Interaction with Exception Handling Framework`_;

- ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
  EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through the platform-supplied error records, probing them, and, when an error is
identified, looking up and invoking the corresponding error handler.
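
The probe-then-handle iteration can be pictured with the sketch below. It is a
simplified model written for this document, not the body of
``ras_ea_handler()``: the reduced structures, the ``dispatch_sketch()``
function, its return convention, and the ``demo_`` callbacks are all
assumptions.

.. code:: c

    #include <stddef.h>

    struct err_record_info;

    /* Reduced stand-in; the real structure carries more properties. */
    struct err_handler_data {
        unsigned int ea_reason;
    };

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);
    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

    struct err_record_info {
        err_record_probe_t probe;
        err_record_handler_t handler;
    };

    /* Probe every record; invoke the handler of each one reporting an error.
     * Returns the number of handlers invoked (convention of this sketch). */
    int dispatch_sketch(const struct err_record_info *records, size_t count,
                    const struct err_handler_data *data)
    {
        int handled = 0;

        for (size_t i = 0U; i < count; i++) {
            int probe_data = 0;

            if (records[i].probe(&records[i], &probe_data) == 0)
                continue;   /* nothing signalled by this record */

            (void)records[i].handler(&records[i], probe_data, data);
            handled++;
        }

        return handled;
    }

    /* ---- Demo callbacks (hypothetical) ---- */
    static int demo_last_probe_data = -1;

    static int demo_probe_clean(const struct err_record_info *info,
                    int *probe_data)
    {
        (void)info;
        (void)probe_data;
        return 0;
    }

    static int demo_probe_faulty(const struct err_record_info *info,
                    int *probe_data)
    {
        (void)info;
        *probe_data = 3;    /* pretend record index 3 holds the error */
        return 1;
    }

    static int demo_handler(const struct err_record_info *info, int probe_data,
                    const struct err_handler_data *const data)
    {
        (void)info;
        (void)data;
        demo_last_probe_data = probe_data;
        return 0;
    }

    static const struct err_record_info demo_records[] = {
        { .probe = demo_probe_clean, .handler = demo_handler },
        { .probe = demo_probe_faulty, .handler = demo_handler },
    };

The value that the probe writes to ``probe_data`` is handed unchanged to the
handler, which is how the two callbacks cooperate without shared state.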

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.
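
A platform override might therefore be shaped like the sketch below. The
parameter list is an assumption modelled on the properties mentioned earlier
(reason, syndrome, ``cookie``, ``handle``, and ``flags``); consult the porting
guide for the authoritative prototype. ``ras_ea_handler()`` is replaced here by
a stub so that the example compiles on its own, and ``unhandled_count`` is a
hypothetical placeholder for the platform's fatal-error path.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Stub standing in for the framework's ras_ea_handler(). For this sketch
     * it reports "handled" whenever the syndrome is non-zero.
     */
    static int ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                    void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)cookie;
        (void)handle;
        (void)flags;
        return (syndrome != 0U) ? 1 : 0;
    }

    /* Placeholder: a real platform would typically report and not return. */
    static unsigned int unhandled_count;

    /* Platform override of the External Abort handler (signature assumed). */
    void plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                    void *cookie, void *handle, uint64_t flags)
    {
        /* Platform-specific triage could go here. */

        /* Hand over to the RAS framework explicitly. */
        if (ras_ea_handler(ea_reason, syndrome, cookie, handle, flags) != 0)
            return;     /* an error record was found and handled */

        /* Not a RAS error the framework knows about. */
        unhandled_count++;
    }

Without the explicit call, the registered error records would never be probed
for External Aborts, even though the RAS framework is built in.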

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, the
priority management is implicit; but for non-interrupt exceptions, it is
explicit, using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities
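
For the non-interrupt case, explicit priority management means bracketing the
error handling with activation and deactivation of the RAS priority. The sketch
below models that bracketing only. ``ehf_activate_priority()`` and
``ehf_deactivate_priority()`` are stubbed here and merely record the calls, the
``PLAT_RAS_PRI`` value is hypothetical, and the exact point at which a platform
deactivates the priority is a design decision this sketch does not settle.

.. code:: c

    /* Hypothetical priority value chosen for this sketch. */
    #define PLAT_RAS_PRI    0x10U

    /* Stubs standing in for the EHF priority APIs (assumed signatures). */
    static unsigned int active_pri;     /* 0 means no priority is active */

    static void ehf_activate_priority(unsigned int priority)
    {
        active_pri = priority;
    }

    static void ehf_deactivate_priority(unsigned int priority)
    {
        if (active_pri == priority)
            active_pri = 0U;
    }

    /* Records the priority under which the error was handled. */
    static unsigned int pri_seen_by_handler;

    static void handle_ras_error(void)
    {
        pri_seen_by_handler = active_pri;
    }

    /* Non-interrupt path, for example an SError routed to EL3. */
    void ras_non_interrupt_flow(void)
    {
        ehf_activate_priority(PLAT_RAS_PRI);    /* explicit: no interrupt */
        handle_ras_error();
        ehf_deactivate_priority(PLAT_RAS_PRI);
    }

For RAS interrupts no such calls appear in platform code, because
acknowledging and completing the interrupt manages the priority implicitly.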

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*