+=============================================================================+
| Copyright (c) 2010-2022 Rambus, Inc. and/or its subsidiaries.               |
|                                                                             |
| Subject : PEC API Implementation Notes                                      |
| Product : SLAD API                                                          |
| Date    : 02 December, 2022                                                 |
|                                                                             |
+=============================================================================+

The SLAD API is a set of APIs, one of which is the Packet Engine Control
(PEC) API. The driver implementation specifics of these APIs are described
in short documents that serve as addenda to the API specifications. This
document describes the PEC API.

This document uses the term "configurable" to indicate that a parameter or
option is configurable at build time in the driver. Please refer to the User
Guide of the driver for details.


PEC API
-------

One PEC implementation is available: ARM (Autonomous Ring Mode).

Packet engine device
    The PEC API implementation can support multiple packet processing
    devices. These devices are identified by the interface ID parameter
    in the PEC API functions and are sometimes referred to as rings.

ARM mode
    ARM mode uses DMA, can queue many jobs, handles commands and
    results asynchronously (but always in order) and also supports
    fragmented data buffers (scatter/gather). The implementation
    supports a single multi-threaded application per ring, allowing
    concurrent and independent use of PEC_SA_Register,
    PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get. Neither
    PEC_Packet_Put nor PEC_Packet_Get is re-entrant: for a given ring,
    only one execution context at a time can execute each of them.
    Different threads may still call these functions concurrently; a
    call that finds the function already in use returns
    PEC_STATUS_BUSY (see Concurrent Context Synchronization below).

    Do not reuse or free the Token Data, Packet Data and Context Data
    DMA buffers, either from the execution context that uses the PEC
    and DMABuf API functions or from any other execution context, until
    the engine has fully processed the packet associated with these
    buffers and the result descriptor(s) referring to that packet have
    been returned. This applies not only to in-place packet transforms,
    where the same Packet Data DMA buffer is used as input and output
    buffer, but also when different DMA buffers are used for the input
    and output data.
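
    The rule above amounts to reference counting each buffer by the
    packets still in flight. The sketch below shows such bookkeeping on
    the application side. The type and function names are hypothetical
    and are not part of the PEC or DMABuf API, and the counter is not
    thread-safe, so concurrent callers would need their own locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one Token, Packet or Context Data DMA
 * buffer that the application hands to the engine. Not thread-safe. */
typedef struct {
    const void *buf;    /* start of the DMA buffer */
    unsigned    users;  /* packets submitted but not yet completed */
} hyp_buf_ref_t;

/* Call after a command descriptor referring to the buffer is accepted. */
static void hyp_buf_submitted(hyp_buf_ref_t *ref)
{
    ref->users++;
}

/* Call after the result descriptor for such a packet has been returned. */
static void hyp_buf_completed(hyp_buf_ref_t *ref)
{
    assert(ref->users > 0);
    ref->users--;
}

/* The buffer may be reused or freed only when no packet refers to it,
 * whether it served as an in-place buffer or as separate input/output. */
static bool hyp_buf_can_release(const hyp_buf_ref_t *ref)
{
    return ref->users == 0;
}
```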

Byte ordering (endianness)
    The SA and Token buffers are treated as arrays of 32-bit integers in
    host-native byte order. The driver changes the byte order if this is
    required (configurable). The data buffers are treated as byte arrays
    and the driver does not touch them.
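
    The driver performs this conversion itself when it is configured to
    do so. As an illustration only, the sketch below shows what reversing
    the byte order of an array of 32-bit words means. The function names
    are hypothetical; data buffers, being byte arrays, would never go
    through such a conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t hyp_swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) <<  8) |
           ((w & 0x00FF0000u) >>  8) |
           ((w & 0xFF000000u) >> 24);
}

/* Swap every word of an SA or Token buffer in place. */
static void hyp_swap_words(uint32_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        words[i] = hyp_swap32(words[i]);
}
```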

Bounce Buffers
    Bounce buffer support can be removed from the driver (configurable)
    to reduce its footprint.
    In ARM mode, the implementation can bounce the SA, Token and data
    buffers if these are not DMA-safe. This requires that the buffer was
    registered using DMABuf_Register with an unsupported AllocatorRef.
    When an SA buffer is bounced, it is copied to a bounce buffer in
    PEC_SA_Register and copied back in PEC_SA_UnRegister.
    When a token buffer is bounced, a bounce buffer is created by
    PEC_Packet_Put and released by PEC_Packet_Get.
    When a data buffer is bounced (because either the source or the
    destination buffer requires bouncing), PEC_Packet_Put creates a
    single bounce buffer sized for the larger of the source and
    destination buffers. The engine then performs an in-place operation
    in the bounce buffer. PEC_Packet_Get copies the result to the
    destination buffer and releases the bounce buffer.
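
    The data-buffer path can be modelled in plain C as below. This is a
    hypothetical sketch: malloc stands in for a DMA-safe bounce buffer,
    and hyp_append_trailer stands in for an engine operation that makes
    the packet longer (for example by appending an integrity check
    value), which is why the bounce buffer is sized for the larger of
    the two buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical engine operation: works in place on 'len' bytes of a
 * buffer with room for 'cap' bytes and returns the result length. */
typedef size_t (*hyp_inplace_op_t)(unsigned char *buf, size_t len,
                                   size_t cap);

/* Example operation: appends a 4-byte trailer, growing the packet. */
static size_t hyp_append_trailer(unsigned char *buf, size_t len, size_t cap)
{
    assert(len + 4 <= cap);
    memset(buf + len, 0xAB, 4);
    return len + 4;
}

/* Models the data-buffer bounce path described above. */
static bool hyp_bounce_transform(const unsigned char *src, size_t src_len,
                                 unsigned char *dst, size_t dst_len,
                                 hyp_inplace_op_t op, size_t *out_len)
{
    /* One bounce buffer, sized for the larger of source and destination. */
    size_t bounce_len = src_len > dst_len ? src_len : dst_len;
    unsigned char *bounce = malloc(bounce_len > 0 ? bounce_len : 1);
    if (bounce == NULL)
        return false;

    memcpy(bounce, src, src_len);               /* done in PEC_Packet_Put */
    size_t n = op(bounce, src_len, bounce_len); /* in-place operation     */
    assert(n <= dst_len);
    memcpy(dst, bounce, n);                     /* done in PEC_Packet_Get */
    free(bounce);

    *out_len = n;
    return true;
}
```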

Descriptor grouping
    PEC_Packet_Get and PEC_Packet_Put can process up to a configurable
    number of descriptors in one call.

Queuing
    The ring can queue a configurable number of jobs. This can be set to
    hundreds or even thousands, at the cost of a larger memory footprint.
    A deep ring can make queuing in software unnecessary and can also
    improve performance, because the engine is less likely to become
    idle due to an empty ring.
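
    Descriptor grouping and ring depth both limit how many descriptors a
    single call can accept, so a caller typically loops until all
    descriptors are submitted or the ring is full. The sketch below
    models this with a hypothetical hyp_put stand-in rather than the real
    PEC_Packet_Put signature; HYP_MAX_GROUP represents the configurable
    group size.

```c
#include <assert.h>
#include <stddef.h>

#define HYP_MAX_GROUP 4u   /* stands in for the configurable group size */

/* Hypothetical ring model: how many command descriptors can still be
 * queued. */
typedef struct {
    size_t free_slots;
} hyp_ring_t;

/* Hypothetical put: accepts up to HYP_MAX_GROUP descriptors, fewer if
 * the ring is nearly full, and returns how many it took. */
static size_t hyp_put(hyp_ring_t *ring, size_t count)
{
    size_t n = count < HYP_MAX_GROUP ? count : HYP_MAX_GROUP;
    if (n > ring->free_slots)
        n = ring->free_slots;
    ring->free_slots -= n;
    return n;
}

/* Submit 'total' descriptors in groups. Returns how many were accepted
 * and reports the number of put calls made via *calls. */
static size_t hyp_submit_all(hyp_ring_t *ring, size_t total, size_t *calls)
{
    size_t done = 0;
    *calls = 0;
    while (done < total) {
        size_t n = hyp_put(ring, total - done);
        (*calls)++;
        if (n == 0)
            break;      /* ring full: submit the rest later */
        done += n;
    }
    return done;
}
```

    In this model, the descriptors left over when the ring is full would
    be submitted again once results have been retrieved and ring slots
    have become available.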

Scatter / Gather support
    ARM mode supports the scatter/gather extension (configurable) of the
    PEC API.
    If the SrcPkt_Handle in the command descriptor is a PEC SG_List, the
    packet is assumed to use gather.
    If the DstPkt_Handle in the command descriptor is a PEC SG_List, the
    packet is assumed to use scatter.

    The application is responsible for setting up both the gather list
    in SrcPkt_Handle and the scatter list in DstPkt_Handle for each
    packet, and for allocating and releasing the individual gather and
    scatter buffers.

    The buffers used for scatter and/or gather data are not bounced and
    must be allocated and provided to the driver as DMA-safe buffers.
    This can be achieved by using the driver's DMABuf API.
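
    To illustrate what a gather or scatter list describes, the sketch
    below splits a packet over equal-size buffers. The hyp_particle_t
    type and the function are hypothetical; a real PEC SG_List is built
    through the scatter/gather extension of the PEC API, with each
    fragment backed by a DMA-safe buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical particle: one fragment of a packet in one buffer. */
typedef struct {
    size_t offset;   /* offset of this fragment within the packet */
    size_t len;      /* bytes of packet data held by this buffer */
} hyp_particle_t;

/* Split a packet of 'pkt_len' bytes over buffers of 'buf_size' bytes.
 * Returns the number of particles written to 'out', or 0 if more than
 * 'max' particles would be needed. */
static size_t hyp_build_particles(size_t pkt_len, size_t buf_size,
                                  hyp_particle_t *out, size_t max)
{
    assert(buf_size > 0);
    size_t count = 0;
    for (size_t off = 0; off < pkt_len; off += buf_size) {
        if (count == max)
            return 0;
        size_t left = pkt_len - off;
        out[count].offset = off;
        out[count].len = left < buf_size ? left : buf_size;
        count++;
    }
    return count;
}
```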

Continuous Scatter mode
    ARM mode supports continuous scatter mode, which can be enabled per
    ring. When continuous scatter mode is enabled for a ring, the result
    packets are written to a sequence of destination buffers supplied by
    the function PEC_Scatter_Preload(). Each result packet occupies one
    or more destination buffers (it is scattered), depending on the
    packet length and the sizes of the destination buffers.

    It is not determined in advance which destination buffers will be
    used for the result packet of a given PEC_Packet_Put() call. In a
    typical use case, the application pre-allocates a number of buffers
    of the same size and submits them to PEC_Scatter_Preload(). It then
    calls PEC_Scatter_Preload() regularly to replenish the supply of
    destination buffers as result packets consume them.

    Continuous scatter mode can be used even if the driver is configured
    without scatter/gather support, but in that case each result packet
    must fit into a single destination buffer.
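
    The arithmetic behind buffer consumption can be sketched as follows,
    assuming preloaded buffers of equal size. The result is only a lower
    bound: the NumParticles field of the result descriptor may report
    more buffers than this. The refill policy and all names are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimum number of 'buf_size' buffers a result packet of 'len' bytes
 * occupies. Even a zero-length result uses one buffer. */
static size_t hyp_min_scatter_buffers(size_t len, size_t buf_size)
{
    assert(buf_size > 0);
    if (len == 0)
        return 1;
    return (len + buf_size - 1) / buf_size;   /* ceiling division */
}

/* Without scatter/gather support each result must fit in one buffer. */
static bool hyp_fits_without_sg(size_t len, size_t buf_size)
{
    return hyp_min_scatter_buffers(len, buf_size) == 1;
}

/* Hypothetical refill policy: how many buffers to preload so that the
 * supply is back at 'target' after 'used' of 'available' were consumed. */
static size_t hyp_refill_count(size_t available, size_t used, size_t target)
{
    size_t left = available > used ? available - used : 0;
    return left < target ? target - left : 0;
}
```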

    Continuous scatter mode is not supported with LAC flows.

Redirection
    Some hardware configurations support redirection. A packet
    originally submitted with PEC_Packet_Put() can have its result
    appear on a different ring or on the inline interface. A packet
    originally received on the inline interface can have its result
    appear on a ring.

    Every ring from which packets can be redirected must be configured
    with continuous scatter mode, and so must every ring towards which
    packets can be redirected. When redirection is possible, the
    sequence of packets submitted with PEC_Packet_Put() and the sequence
    of result packets retrieved with PEC_Packet_Get() on the same ring
    are no longer related to one another.

    When a packet submitted with PEC_Packet_Put() is redirected to the
    inline interface, no result descriptor is received with
    PEC_Packet_Get(). When a packet received on the inline interface is
    redirected to a ring, a result descriptor appears that has no
    corresponding command descriptor in PEC_Packet_Put().

Command Descriptor fields
    User_p is fully supported on rings that do not use continuous scatter
    mode and allows the user to match results to commands. User_p is not
    supported on rings with continuous scatter mode.

    The Control1 field is not used by this implementation, and the PEC
    API function PEC_CD_Control_Write is not implemented. Instead, use
    the IOToken API to pass an array of 32-bit words via the
    InputToken_p field. The application is responsible for allocating
    this array. The HW_Services field in the data structure passed to
    the IOToken API specifies the exact packet flow or, alternatively,
    a record invalidation command. The values to be filled in are
    provided by the firmware API.

    Control2 can be used to specify the engine on which a packet must be
    processed. This is useful for protocols like TLS, where subsequent
    packets of the same data stream must be processed on the same engine
    to ensure in-order processing and in-order assignment of sequence
    numbers. To specify the engine, set bit 5 of Control2 and put the
    engine ID in bits 4..0. Otherwise, the Control2 field should be all
    zero.
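
    The Control2 layout described above can be expressed as a pair of
    helpers. The macro and function names are hypothetical; only the bit
    positions (bit 5 as the "engine specified" flag, bits 4..0 as the
    engine ID) come from this document.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_CTRL2_ENGINE_VALID (1u << 5)  /* bit 5: engine is specified */
#define HYP_CTRL2_ENGINE_MASK  0x1Fu      /* bits 4..0: engine ID */

/* Control2 value that pins a packet to engine 'engine_id'. */
static uint32_t hyp_control2_for_engine(unsigned engine_id)
{
    assert(engine_id <= HYP_CTRL2_ENGINE_MASK);
    return HYP_CTRL2_ENGINE_VALID | (engine_id & HYP_CTRL2_ENGINE_MASK);
}

/* Control2 value when no particular engine is required. */
static uint32_t hyp_control2_any_engine(void)
{
    return 0u;
}
```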

    LAC packet flow:
    - A valid Token_Handle must always be provided for each packet, and
      Token_WordCount must be set to the exact size of the token in
      words.
    - SA_Handle1 must refer to the main SA, which must be registered with
      PEC_SA_Register.
    - SA_Handle2 must be a null handle.
    - SA_WordCount is not used.
    - DstPkt_Handle must always be provided, also for input-only
      operations. When no destination buffer is required, it can be set
      to SrcPkt_Handle.
    - The TokenHeaderWord passed to the IOToken API must be filled in.
    - The Offset_ByteCount passed to the IOToken API is not used.

    Other packet flows:
    - Token_Handle is the null handle and Token_WordCount is zero.
    - SA_Handle1 is the null handle if classification is used; otherwise
      it is the DMABuf handle representing a transform record.
    - SA_Handle2 must be a null handle.
    - SA_WordCount is not used.
    - DstPkt_Handle must always be provided (except in continuous scatter
      mode), also for input-only operations. When no destination buffer
      is required, it can be set to SrcPkt_Handle. When continuous
      scatter mode is enabled, a destination handle must not be
      provided.
    - The Offset_ByteCount field passed to the IOToken API specifies the
      number of bytes at the start of each packet that are passed
      unchanged and are not part of the packet to be processed.
    - The NextHeader field passed to the IOToken API specifies the Next
      Header field for IPsec packet flows that do not use network header
      processing.

    Record invalidation commands:
    - Token_Handle is the null handle and Token_WordCount is zero.
    - SA_Handle1 must refer to the SA, transform or flow record to be
      invalidated.
    - SA_Handle2 must be a null handle.
    - SA_WordCount is not used.
    - SrcPkt_Handle and DstPkt_Handle must both be null handles.
    - SrcPkt_ByteCount is zero.

    Note: A record invalidation command may be submitted to the engine
    via the PEC API only when the engine is not processing any packets
    for this record.
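
    The field rules above can be collected into a single check. This
    sketch uses a hypothetical, simplified hyp_cd_t in which each handle
    is reduced to a "null or not" flag; it is not PEC_CommandDescriptor_t.
    It also reads "a destination handle must not be provided" in
    continuous scatter mode as requiring a null DstPkt_Handle, which is
    an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    HYP_FLOW_LAC,          /* LAC packet flow */
    HYP_FLOW_OTHER,        /* other packet flows */
    HYP_FLOW_INVALIDATE    /* record invalidation command */
} hyp_flow_t;

/* Hypothetical, simplified command descriptor: every handle is reduced
 * to a flag telling whether it is the null handle. */
typedef struct {
    bool     token_null;
    unsigned token_wordcount;
    bool     sa1_null;
    bool     sa2_null;
    bool     src_null;
    bool     dst_null;
    size_t   src_bytecount;
} hyp_cd_t;

/* Returns true if 'cd' follows the field rules for 'flow' on a ring
 * with (cont_scatter) or without continuous scatter mode. */
static bool hyp_cd_valid(const hyp_cd_t *cd, hyp_flow_t flow,
                         bool cont_scatter)
{
    assert(cd != NULL);
    if (!cd->sa2_null)                 /* SA_Handle2: always null */
        return false;

    switch (flow) {
    case HYP_FLOW_LAC:
        /* Token with exact size, main SA and destination required;
         * continuous scatter mode is not supported for LAC. */
        return !cont_scatter && !cd->token_null &&
               cd->token_wordcount > 0 && !cd->sa1_null && !cd->dst_null;
    case HYP_FLOW_OTHER:
        if (!cd->token_null || cd->token_wordcount != 0)
            return false;
        /* Destination required, except in continuous scatter mode,
         * where it is assumed to be the null handle. */
        return cont_scatter ? cd->dst_null : !cd->dst_null;
    case HYP_FLOW_INVALIDATE:
        return cd->token_null && cd->token_wordcount == 0 &&
               !cd->sa1_null && cd->src_null && cd->dst_null &&
               cd->src_bytecount == 0;
    }
    return false;
}
```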

Result Descriptor fields
    User_p is fully supported on rings that do not use continuous scatter
    mode and allows the user to match results to commands. User_p is not
    supported on rings with continuous scatter mode.

    SrcPkt_Handle and DstPkt_Handle are the same as provided in the
    command descriptor. DstPkt_p is the host address for DstPkt_Handle.
    On rings with continuous scatter mode, these fields contain the null
    handle and a NULL pointer, and the NumParticles field holds the
    number of scatter buffers used by the result packet. At least one
    scatter buffer is used, even if the result packet has zero length.
    In some cases, the number of scatter buffers is higher than the
    result packet requires.

    DstPkt_ByteCount and Bypass_WordCount are extracted from the engine
    result descriptor as described in the engine datasheet under
    PE_LENGTH. Bypass_WordCount should be the same as was provided in
    the command descriptor.

    For operations that do not require output buffers, such as hash
    operations, applications must set the SrcPkt_Handle and
    DstPkt_Handle parameters in the PEC_CommandDescriptor_t descriptor
    to the same value. The advantage of this approach is that the driver
    still checks that the output buffer handle is not NULL for all
    operations and can detect errors in applications that do not
    provide a correct output buffer handle. The disadvantage is that it
    degrades performance for hash operations, because the driver still
    performs the bounce buffer copy back to the original buffer and the
    PostDMA operation, neither of which is needed.

    Status1 and Status2 reflect up to two words from the result token
    that contain relevant status information. Use the function
    PEC_RD_Status_Read to extract this information in an
    engine-independent form.

    More status information is passed in an array of 32-bit words via
    the OutputToken_p field. The IOToken API can be used to extract
    information from this array. The application is responsible for
    allocating this array before calling PEC_Packet_Get.

Notify Requests (callbacks)
    In ARM mode, two notify requests (commands and results) are
    supported. The result notification callback is only invoked in
    interrupt mode. The command notification callback is invoked
    from within PEC_Packet_Get.

SA Invalidation
    To remove an SA from the system, carry out the following operations
    in the specified order:
    - Submit a special command descriptor with a transform record
      invalidation command via PEC_Packet_Put(). This command removes
      the record from the record cache of the packet engine.
    - Wait until the corresponding result descriptor is received via
      PEC_Packet_Get().
    - Call PEC_SA_UnRegister(). This function takes care of CPU cache
      coherency, endianness conversion and bounce buffers, whichever
      applies.
    - From this point on, the DMA buffer of the SA can be reused for a
      different purpose or freed.
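
    The required ordering, together with the earlier note that a record
    may only be invalidated when no packets for it are being processed,
    can be enforced with a small per-SA state machine. The sketch below
    is hypothetical application-side bookkeeping and calls no PEC
    functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle of one SA, following the steps above. */
typedef enum {
    HYP_SA_REGISTERED,       /* registered, usable by packets */
    HYP_SA_INVALIDATE_SENT,  /* invalidation accepted by PEC_Packet_Put */
    HYP_SA_INVALIDATED,      /* its result descriptor was retrieved */
    HYP_SA_UNREGISTERED      /* PEC_SA_UnRegister done: buffer reusable */
} hyp_sa_state_t;

typedef struct {
    hyp_sa_state_t state;
    unsigned       in_flight;  /* packets using this SA, not yet completed */
} hyp_sa_t;

/* Step 1: only allowed when no packets for this record are in flight. */
static bool hyp_sa_send_invalidate(hyp_sa_t *sa)
{
    if (sa->state != HYP_SA_REGISTERED || sa->in_flight != 0)
        return false;
    sa->state = HYP_SA_INVALIDATE_SENT;
    return true;
}

/* Step 2: the result descriptor of the invalidation command arrived. */
static bool hyp_sa_invalidate_done(hyp_sa_t *sa)
{
    if (sa->state != HYP_SA_INVALIDATE_SENT)
        return false;
    sa->state = HYP_SA_INVALIDATED;
    return true;
}

/* Step 3: unregistering is allowed only after step 2. */
static bool hyp_sa_unregister(hyp_sa_t *sa)
{
    if (sa->state != HYP_SA_INVALIDATED)
        return false;
    sa->state = HYP_SA_UNREGISTERED;
    return true;
}

/* Step 4: the SA buffer can now be reused or freed. */
static bool hyp_sa_buffer_reusable(const hyp_sa_t *sa)
{
    return sa->state == HYP_SA_UNREGISTERED;
}
```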

Banks for DMA resources
    DMA-safe buffers for each data type must be allocated with the
    correct Bank parameter (in DMABuf_Properties_t).
    Buffers for SA records must be allocated with Bank=1; all other
    buffers must be allocated with Bank=0. On 64-bit hosts, SA buffers
    must be allocated within a 4GB memory range, which is taken care of
    by using Bank=1.
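
    A helper like the following keeps the bank choice in one place. The
    enum and function are hypothetical; the returned value is what would
    be stored in the Bank member of DMABuf_Properties_t.

```c
#include <assert.h>

/* Hypothetical classification of DMA buffers by what they hold. */
typedef enum {
    HYP_BUF_SA_RECORD,
    HYP_BUF_TOKEN,
    HYP_BUF_PACKET_DATA,
    HYP_BUF_OTHER
} hyp_buf_kind_t;

/* Bank value for a buffer: SA records use Bank 1 (which also satisfies
 * the 4GB range requirement on 64-bit hosts); all others use Bank 0. */
static unsigned hyp_bank_for(hyp_buf_kind_t kind)
{
    return kind == HYP_BUF_SA_RECORD ? 1u : 0u;
}
```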

SA resources
    The functions PEC_SA_Register and PEC_SA_UnRegister take three
    DMABuf handles as parameters. The first is always the DMABuf handle
    representing the SA, the second is always a null handle, and the
    third is the null handle if the SA does not have an ARC4 state
    record.

    If the SA does have an ARC4 state record, the SA_Handle3 parameter
    represents it. This DMABuf handle must represent the ARC4 state part
    within the SA buffer: the application uses DMABuf_Register
    (AllocatorRef=='R') to register that subset of the SA buffer as a
    separate DMABuf handle.
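
    The sub-range that SA_Handle3 refers to can be computed as below
    before it is registered with DMABuf_Register. The offset and size of
    the ARC4 state within the SA depend on the SA layout and are plain
    parameters here; the 32-bit alignment check is an assumption based on
    the SA being an array of 32-bit words. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the sub-range to be registered. */
typedef struct {
    uint32_t *base;    /* host address of the ARC4 state part */
    size_t    bytes;   /* size of the ARC4 state part */
} hyp_subrange_t;

/* Locate the ARC4 state inside an SA buffer of 'sa_bytes' bytes.
 * Returns false if the range is misaligned or not inside the SA. */
static bool hyp_arc4_subrange(uint32_t *sa, size_t sa_bytes,
                              size_t state_offset, size_t state_bytes,
                              hyp_subrange_t *out)
{
    assert(sa != NULL && out != NULL);
    if (state_offset % sizeof(uint32_t) != 0)    /* assumed alignment */
        return false;
    if (state_offset > sa_bytes || state_bytes > sa_bytes - state_offset)
        return false;                             /* must lie inside SA */
    out->base  = sa + state_offset / sizeof(uint32_t);
    out->bytes = state_bytes;
    return true;
}
```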

Multiple Applications
    The current implementation supports multiple rings (tested with two
    rings), and each ring can be used by a separate application,
    independently of the other rings. The applications can run
    concurrently, as long as each application uses a different ring.
    The use of PEC_Packet_Put and PEC_Packet_Get by different concurrent
    applications requires no locking. The implementation of
    PEC_SA_UnRegister contains the locking required to support multiple
    applications.

Concurrent Context Synchronization (CCS)
    The PEC API implementation supports a single multi-threaded
    application per interface ID, allowing concurrent and independent
    use of PEC_SA_Register, PEC_SA_UnRegister, PEC_Packet_Put and
    PEC_Packet_Get. Multiple applications using the PEC API are also
    supported, but each must use a different interface ID.

    Note: although the PEC_SA_Register and PEC_SA_UnRegister functions
    take InterfaceId as an input parameter, they ignore it, since these
    functions do nothing that is specific to a packet I/O (ring)
    interface.

    The PEC API implementation provides synchronization mechanisms for
    concurrent contexts invoking the API functions. The API can be used
    by multi-threaded user-space and kernel-space applications. The
    latter can invoke the API functions from user process context as
    well as from softirq context, and mixing the two is also supported.
    Both Linux Uni-Processor (UP) and Symmetric Multi-Processor (SMP)
    kernel configurations are supported.

    The PEC API provides non-blocking synchronization for concurrent
    contexts invoking the API functions for different interface IDs.
    The only exceptions are the PEC_Init and PEC_UnInit functions, which
    both allow only one execution context at a time, even for different
    interface IDs. In addition, PEC_UnInit succeeds for an interface ID
    only if no context is executing PEC_Packet_Put or PEC_Packet_Get for
    that same interface ID.

    For optimal utilization of the packet engine, the PEC API user should
    allow PEC_Packet_Put and PEC_Packet_Get to run in concurrent contexts
    for the same interface ID. Note that having multiple concurrent
    contexts invoking PEC_Packet_Put for the same interface ID does not
    improve performance, because this function does not allow more than
    one execution context at a time for one interface ID. The same
    applies to PEC_Packet_Get.

    When a PEC API function detects that it competes for a resource that
    is currently in use by another context executing PEC code, it does
    not block but returns PEC_STATUS_BUSY. The caller should try calling
    the function again shortly afterwards.
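
    A bounded retry loop is one way to handle PEC_STATUS_BUSY. In the
    sketch below, hyp_status_t and its values are hypothetical stand-ins
    for the PEC status codes, and hyp_busy_then_ok is a stub that
    simulates contention. In softirq context the caller must not sleep
    between attempts, so deferring the work may be preferable to spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes mirroring the names used in this document. */
typedef enum { HYP_STATUS_OK, HYP_STATUS_BUSY, HYP_STATUS_ERROR } hyp_status_t;

typedef hyp_status_t (*hyp_call_t)(void *ctx);

/* Call 'fn' until it returns something other than BUSY or 'max_tries'
 * attempts have been made. The number of attempts goes to *tries. */
static hyp_status_t hyp_call_with_retry(hyp_call_t fn, void *ctx,
                                        unsigned max_tries, unsigned *tries)
{
    hyp_status_t st = HYP_STATUS_BUSY;
    *tries = 0;
    while (*tries < max_tries) {
        st = fn(ctx);
        (*tries)++;
        if (st != HYP_STATUS_BUSY)
            break;
    }
    return st;
}

/* Stub: reports BUSY while the counter in *ctx is non-zero, then OK. */
static hyp_status_t hyp_busy_then_ok(void *ctx)
{
    unsigned *busy_left = ctx;
    if (*busy_left > 0) {
        (*busy_left)--;
        return HYP_STATUS_BUSY;
    }
    return HYP_STATUS_OK;
}
```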

Debugging
    The PEC_Put_Dump() and PEC_Get_Dump() functions print the
    administration and cached data of the command ring and the result
    ring, respectively, as well as the content of the ring buffers. A
    slot corresponds to a descriptor in the ring. These functions can be
    used to debug the packet I/O functionality.


<end of document>