Reliability, Availability, and Serviceability (RAS) Extensions
==============================================================

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables a
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. Said errors are Synchronous External
Abort (SEA), Asynchronous External Abort (signalled as SErrors), Fault Handling
and Error Recovery interrupts. The |EHF| document mentions various `error
handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with
the architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.
``RAS_TRAP_LOWER_EL_ERR_ACCESS`` controls the access to the RAS error record
registers from lower ELs.

.. _ras-figure:

.. image:: ../resources/diagrams/draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

- A handler to probe error records for errors;
- When the probing identifies an error, a handler to handle it;
- For a memory-mapped error record, its base address and size in KB; for a
  system register-accessed record, the start index of the record and number of
  contiguous records from that index;
- Any node-specific auxiliary data.

With this information supplied, when the run time firmware receives one of the
notifications above, the RAS framework can iterate through and probe error
records for errors, and invoke the appropriate handler to handle them.

The RAS framework provides the macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. These
macros create a structure of type ``struct err_record_info`` from their
arguments, which is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                   int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.
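As an illustration of the flow described in this section, the following is a
minimal, self-contained sketch of how an array of records could be probed and
dispatched. It is not the |TF-A| implementation: the structure layouts are
simplified stand-ins, and ``demo_node``, ``demo_probe``, ``demo_handler``, and
``demo_dispatch`` are hypothetical names chosen for this example. A real
platform would use the definitions and macros supplied by the RAS framework.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified stand-in: the real structure carries more information. */
    struct err_handler_data {
        unsigned int ea_reason;
        uint64_t syndrome;
    };

    struct err_record_info;

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);
    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

    /* Simplified stand-in: only the fields this sketch needs. */
    struct err_record_info {
        err_record_probe_t probe;
        err_record_handler_t handler;
        void *aux_data;
    };

    /* Hypothetical node state, reached through the auxiliary data pointer. */
    struct demo_node {
        int faulty_idx;     /* index of the record in error, or -1 */
        int handled_idx;    /* index last seen by the handler, or -1 */
    };

    /* Probe: report an error and pass the record index via probe_data. */
    static int demo_probe(const struct err_record_info *info, int *probe_data)
    {
        const struct demo_node *node = info->aux_data;

        if (node->faulty_idx < 0)
            return 0;

        *probe_data = node->faulty_idx;
        return 1;
    }

    /* Handler: note which record was handled, then clear the error. */
    static int demo_handler(const struct err_record_info *info, int probe_data,
                    const struct err_handler_data *const data)
    {
        struct demo_node *node = info->aux_data;

        (void)data;
        node->handled_idx = probe_data;
        node->faulty_idx = -1;
        return 0;
    }

    /*
     * Probe every record and invoke the handler of each one that reports an
     * error. Returns the number of errors handled, or -1 if a handler fails.
     */
    static int demo_dispatch(const struct err_record_info *records,
                    size_t count, const struct err_handler_data *data)
    {
        int n_handled = 0;

        for (size_t i = 0; i < count; i++) {
            int probe_data = 0;

            if (records[i].probe(&records[i], &probe_data) == 0)
                continue;
            if (records[i].handler(&records[i], probe_data, data) != 0)
                return -1;
            n_handled++;
        }

        return n_handled;
    }

In this sketch the probe hands the record index to the handler through
``probe_data``, which is the example use suggested above.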

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                    int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                    int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform rolling
out its own. Both helpers above:

- Return a non-zero value when an error is detected in a Standard Error Record;
- Set ``probe_data`` to the index of the error record upon detecting an error.
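The contract listed above can be sketched as a standalone function. This is
only an approximation under stated assumptions, not the helpers' source: the
status words are taken from a plain array rather than read from memory-mapped
or system registers, the valid (V) flag is assumed to sit at bit 30 of each
status word (please confirm against the Arm Architecture Reference Manual), and
``demo_ser_probe`` and ``DEMO_ERR_STATUS_V`` are hypothetical names.

.. code:: c

    #include <stdint.h>

    /* Assumed position of the "valid" flag in a status word. */
    #define DEMO_ERR_STATUS_V    (UINT64_C(1) << 30)

    /*
     * Scan num_records status words. On the first word with the valid flag
     * set, store its index in *probe_data and return 1; otherwise return 0
     * and leave *probe_data untouched.
     */
    static int demo_ser_probe(const uint64_t *status, int num_records,
                    int *probe_data)
    {
        for (int i = 0; i < num_records; i++) {
            if ((status[i] & DEMO_ERR_STATUS_V) != 0U) {
                *probe_data = i;
                return 1;
            }
        }

        return 0;
    }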

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with |EHF|. See `Interaction with
Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt`` containing:

- Interrupt number;
- The associated error record information (pointer to the corresponding
  ``struct err_record_info``);
- Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in the increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.
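To show why the sort order matters, here is a small sketch of a binary search
over such an array. It is illustrative only: ``struct demo_ras_interrupt`` is a
stand-in whose field names are assumptions made for this example, and
``demo_find_interrupt`` is a hypothetical function, not a framework API.

.. code:: c

    #include <stddef.h>

    /* Stand-in for struct ras_interrupt; field names are assumed. */
    struct demo_ras_interrupt {
        unsigned int intr_number;
        const void *err_record;    /* would point to the error record info */
        void *cookie;
    };

    /*
     * Binary search over an array sorted by increasing intr_number.
     * Returns the matching entry, or NULL if the interrupt is not listed.
     */
    static const struct demo_ras_interrupt *demo_find_interrupt(
                    const struct demo_ras_interrupt *intrs, size_t count,
                    unsigned int intr)
    {
        size_t lo = 0;
        size_t hi = count;

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (intrs[mid].intr_number == intr)
                return &intrs[mid];
            if (intrs[mid].intr_number < intr)
                lo = mid + 1;
            else
                hi = mid;
        }

        return NULL;
    }

If the array were not sorted, this search could miss an entry that is present,
which is why the ordering requirement above is mandatory rather than advisory.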

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions in Armv8.4 introduced new architectural features to deal
with Double Fault conditions, specifically the ``NMEA`` and ``EASE`` bits in the
``SCR_EL3`` register. These were introduced to assist EL3 software which runs
part of its entry/exit routines with exceptions momentarily masked, meaning
that, in such systems, External Aborts/SErrors are not immediately
handled when they occur, but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes the whole of its EL3 code with all
exceptions unmasked. This means that all exceptions routed to EL3 are handled
immediately. |TF-A| is thus able to detect Double Fault conditions in software,
without needing the intended advantages of the Armv8.4 Double Fault
architecture extensions.

Double faults are fatal: handling terminates at the platform double fault
handler, which doesn't return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

- ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

- ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
  `Interaction with Exception Handling Framework`_;

- ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
  EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through platform-supplied error records, probing them, and, when an error is
identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.
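The shape of such an override might look like the sketch below. Everything in
it is an assumption made for illustration: the parameter list is modelled on
the reason, syndrome, ``cookie``, ``handle``, and ``flags`` values mentioned
earlier, a non-zero return is assumed to mean the error was handled, and the
``demo_`` functions are stand-ins so that the example compiles on its own. The
porting guide remains the reference for the real prototypes.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Counts errors that no RAS record claimed (for this sketch only). */
    static int demo_unhandled_count;

    /*
     * Stub standing in for the top-level RAS handler. For the purpose of the
     * sketch it pretends that odd syndromes are handled and even ones are not.
     */
    static int demo_ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                    void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)cookie;
        (void)handle;
        (void)flags;

        return (syndrome & 1U) != 0U;
    }

    /* Platform override: give the RAS framework the first chance. */
    static void demo_plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                    void *cookie, void *handle, uint64_t flags)
    {
        if (demo_ras_ea_handler(ea_reason, syndrome, cookie, handle,
                        flags) != 0)
            return;

        /* Not a RAS error: apply platform policy (a real port might panic). */
        demo_unhandled_count++;
    }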

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. I.e., for interrupts, the
priority management is implicit; but for non-interrupt exceptions, it is
explicit, using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

--------------

*Copyright (c) 2018-2019, Arm Limited and Contributors. All rights reserved.*