+=============================================================================+
| Copyright (c) 2010-2022 Rambus, Inc. and/or its subsidiaries. |
| |
| Subject : PEC API Implementation Notes |
| Product : SLAD API |
| Date : 02 December, 2022 |
| |
+=============================================================================+
The SLAD API is a set of APIs, one of which is the Packet Engine
Control (PEC) API. The driver implementation specifics of these APIs are
described in short documents that serve as an addendum to the API
specifications. This document describes the PEC API.
This document uses the phrase "configurable" to indicate that a parameter or
option is build-time configurable in the driver. Please refer to the User Guide
of the driver for details.
PEC API
-------
One PEC implementation is available: ARM (Autonomous Ring Mode).
Packet engine device
The PEC API implementation can support multiple packet processing
devices. These devices are identified by the interface ID parameter
in the PEC API functions and are sometimes referred to as rings.
ARM mode
ARM mode uses DMA, can queue many jobs, handles commands and
results asynchronously (but always in-order) and also supports
fragmented data buffers (scatter / gather). The implementation
supports a single multi-threaded application per ring, allowing
concurrent and independent use of PEC_SA_Register,
PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get. Neither
PEC_Packet_Put nor PEC_Packet_Get is re-entrant, but separate
threads can call PEC_Packet_Put and PEC_Packet_Get concurrently.
Ensure that the Token Data, Packet Data and Context Data DMA
buffers are not re-used or freed, either by the execution context
using the PEC and DMABuf API functions or by another execution
context, until the result descriptor(s) referring to the packet
associated with these buffers have been fully processed by the
engine. This is required not only for an in-place packet
transform, where the same Packet Data DMA buffer serves as input
and output buffer, but also when different DMA buffers are used
for the packet processing input and output data.
Byte ordering (endianness)
The SA and Token buffers are considered arrays of 32-bit integers,
in host-native byte ordering. The driver will change the byte
order if this is required (configurable). The data
buffers are considered byte arrays and the driver will not touch
these.
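As an illustration of what a byte order change on a word buffer involves,
the following is a minimal sketch and not the driver's own code. The names
swap32 and words_swap_in_place are hypothetical. Each 32-bit word of an SA
or Token buffer is reversed in place, while byte-array data buffers would
be left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

/* Hypothetical helper: swap every word of an SA or Token buffer in place. */
void words_swap_in_place(uint32_t *words, size_t word_count)
{
    assert(words != NULL || word_count == 0);
    for (size_t i = 0; i < word_count; i++)
        words[i] = swap32(words[i]);
}
```

Applying the helper twice restores the original buffer contents.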
Bounce Buffers
Bounce Buffer support can be removed (configurable) from the driver to
reduce footprint.
For ARM mode, the implementation can bounce the SA, Token and data buffers
if these are not DMA-safe. This requires that the buffer was registered
using DMABuf_Register with an unsupported AllocatorRef.
When bouncing an SA buffer, it will be copied to a bounce buffer in
PEC_SA_Register and copied back by PEC_SA_UnRegister.
When bouncing a token buffer, a bounce buffer is created by
PEC_Packet_Put and released by PEC_Packet_Get.
When bouncing a data buffer (because either the source or destination
requires bouncing), a single bounce buffer is created by PEC_Packet_Put
based on the largest of the source and destination buffers. The engine
then performs an in-place operation in the bounce buffer. PEC_Packet_Get
copies the result to the destination buffer and releases the bounce
buffer.
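The data buffer bounce flow described above can be sketched as follows.
This is a simplified model under stated assumptions, not the driver
implementation: bounce_t, bounce_put and bounce_get are hypothetical names,
and plain malloc memory stands in for a DMA-safe allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping for one bounced data buffer. */
typedef struct {
    unsigned char *bounce;  /* stand-in for a DMA-safe buffer */
    size_t size;            /* largest of source and destination sizes */
} bounce_t;

/* Put side: allocate one buffer sized for the larger of source and
 * destination, then copy the source in. Returns 0 on success. */
int bounce_put(bounce_t *b, const unsigned char *src,
               size_t src_len, size_t dst_len)
{
    assert(b != NULL);
    b->size = (src_len > dst_len) ? src_len : dst_len;
    b->bounce = (unsigned char *)malloc(b->size ? b->size : 1);
    if (b->bounce == NULL)
        return -1;
    if (src_len > 0)
        memcpy(b->bounce, src, src_len);
    return 0;
}

/* Get side: copy the in-place result to the destination and release
 * the bounce buffer. */
void bounce_get(bounce_t *b, unsigned char *dst, size_t result_len)
{
    assert(b != NULL && b->bounce != NULL && result_len <= b->size);
    if (result_len > 0)
        memcpy(dst, b->bounce, result_len);
    free(b->bounce);
    b->bounce = NULL;
    b->size = 0;
}
```

Between the two calls the engine would operate in place on the bounce
buffer, which is why a single buffer of the larger size is sufficient.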
Descriptor grouping
PEC_Packet_Get and PEC_Packet_Put can process up to a
configurable number of descriptors in one call.
Queuing
The ring can queue a configurable number of jobs. This can be set to
hundreds or even thousands, at the cost of some memory footprint. This
can avoid queuing in software and can also give better performance
(avoids idle engine due to empty ring).
Scatter / Gather support
The ARM mode supports the scatter/gather extension (configurable) of the
PEC API.
If the SrcPkt_Handle in the command descriptor is a PEC SG_List, a
packet is assumed to use gather.
If the DstPkt_Handle in the command descriptor is a PEC SG_List, a
packet is assumed to use scatter.
The application is responsible for setting up both the gather list
in SrcPkt_Handle and the scatter list in DstPkt_Handle for each packet.
The application is responsible for allocating and releasing the individual
gather and scatter buffers.
The buffers used for the Scatter and/or Gather data are not bounced and
must be allocated and provided to the driver as DMA-safe buffers. This
can be achieved by using the driver's DMABuf API.
Continuous Scatter mode
The ARM mode supports continuous scatter mode, which can be enabled per
ring. When continuous scatter mode is enabled for a ring, the result
packets are written to a sequence of destination buffers supplied by the
function PEC_Scatter_Preload(). Each result packet will occupy one
or more destination buffers (it will be scattered), depending on the
packet length and the sizes of the destination buffers.
It is not determined in advance which destination buffers will be
used for the result packet of a certain PEC_Packet_Put(). In a
typical use case, the application pre-allocates a number of
buffers, each of the same size, and submits them to the
PEC_Scatter_Preload() function. It then calls PEC_Scatter_Preload()
regularly to refill the supply of destination buffers as
buffers are consumed by result packets.
Continuous scatter mode can be used even if the driver is configured
without scatter-gather support, but in that case each packet is required
to fit into a single destination buffer.
Continuous scatter mode is not supported with LAC flows.
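When sizing the pool of preloaded buffers it can help to estimate how many
equally sized destination buffers a result packet needs. The sketch below
is an assumption-based estimate, not driver code, and min_scatter_buffers
is a hypothetical name. It gives a lower bound only: the result descriptor
notes state that at least one buffer is used even for a zero-length result
and that the engine may use more buffers than strictly required.

```c
#include <assert.h>
#include <stddef.h>

/* Lower bound on the number of scatter buffers one result packet occupies,
 * assuming all preloaded buffers have the same size. */
size_t min_scatter_buffers(size_t packet_bytes, size_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    if (packet_bytes == 0)
        return 1;   /* a zero-length result still consumes one buffer */
    /* Ceiling division, written to avoid overflow of the sum. */
    return (packet_bytes - 1) / buffer_bytes + 1;
}
```

If the driver is configured without scatter-gather support, the application
would have to choose a buffer size for which this value is 1 for every
expected result packet.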
Redirection
Some configurations of the hardware support redirection. A packet,
originally submitted with PEC_Packet_Put() can have its result appear
on a different ring or on the inline interface. A packet, originally
received on the inline interface, can have its result appear on
a ring.
Any ring from which packets can be redirected must be configured with
continuous scatter mode. Any ring towards which packets can be redirected
must also be configured with continuous scatter mode. When redirection is
possible, the sequence of packets submitted with PEC_Packet_Put() and
the sequence of result packets retrieved with PEC_Packet_Get()
on the same ring are no longer related to one another.
When a packet submitted with PEC_Packet_Put() is redirected to the
inline interface, no result descriptor will be received with
PEC_Packet_Get(). When a packet is received on the inline interface and
it is redirected to a ring, a result descriptor will appear with no
corresponding command descriptor in PEC_Packet_Put().
Command Descriptor fields
User_p is fully supported on rings that do not use continuous scatter mode
and allows the user to match results to commands. User_p is not supported
on rings with continuous scatter mode.
The Control1 field is not used by this implementation. The PEC
API function PEC_CD_Control_Write is not implemented. Instead use
the IOToken API to pass an array of 32-bit words via the
InputToken_p field. The application is responsible for allocating
this array. The HW_Services field in the data structure passed to
the IOToken API specifies the exact packet flow or alternatively
it can specify a record invalidation command instead. The values
to be filled in are provided by the firmware API.
Control2 can be used to specify the engine on which a packet must be
processed. This can be useful for protocols like TLS, where subsequent
packets of the same data stream must be processed on the same engine to
ensure in-order processing and in-order assignment of the sequence numbers.
Bit 5 in Control2 can be set if the engine is specified. The engine ID
is put in bits 4..0. Otherwise, the Control2 field should be all zero.
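The Control2 layout described above can be sketched as follows. The macro
and function names are hypothetical and are not part of the PEC API. Only
the bit positions (bit 5 as the "engine specified" flag, bits 4..0 as the
engine ID) come from this document.

```c
#include <assert.h>
#include <stdint.h>

#define CONTROL2_ENGINE_SPECIFIED  (1u << 5)  /* hypothetical name, bit 5 */
#define CONTROL2_ENGINE_ID_MASK    0x1Fu      /* hypothetical name, bits 4..0 */

/* Build a Control2 value that pins the packet to one engine. */
uint32_t control2_for_engine(unsigned engine_id)
{
    assert(engine_id <= CONTROL2_ENGINE_ID_MASK);
    return CONTROL2_ENGINE_SPECIFIED | (engine_id & CONTROL2_ENGINE_ID_MASK);
}

/* Build a Control2 value that leaves the engine choice open. */
uint32_t control2_any_engine(void)
{
    return 0u;
}
```

For a TLS stream, an application would typically compute the value once
per connection and reuse it for every packet of that connection.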
LAC packet flow:
- A valid Token_Handle must always be provided for each packet and
Token_WordCount must be set to the exact size in words of the token.
- SA_Handle1 must point to the main SA, which must be registered by
PEC_SA_Register.
- SA_Handle2 must be a null handle.
- SA_WordCount is not used.
- DstPkt_Handle must always be provided, also for input-only operations.
When no destination buffer is required, it can be set to SrcPkt_Handle.
- The TokenHeaderWord passed to the IOToken API must be filled in.
- The Offset_ByteCount passed to the IOToken API is not used.
Other packet flows:
- Token_Handle is the null handle and Token_WordCount is zero.
- SA_Handle1 is the null handle if classification is used, else it is the
DMABuf handle representing a transform record.
- SA_Handle2 must be a null handle.
- SA_WordCount is not used.
- DstPkt_Handle must always be provided (except for continuous
scatter mode), also for input-only operations.
When no destination buffer is required, it can be set to SrcPkt_Handle.
When continuous scatter mode is enabled, no destination handle must be
provided.
- The Offset_ByteCount field passed to the IOToken API specifies the
number of bytes at the start of each packet that will be passed
unchanged and are not part of the packet to be processed.
- The NextHeader field passed to the IOToken API specifies the Next
Header field for IPsec packet flows that do not use network header
processing.
Record invalidation commands:
- Token_Handle is the null handle and Token_WordCount is zero.
- SA_Handle1 must point to the SA, transform or flow record to be
invalidated.
- SA_Handle2 must be a null handle.
- SA_WordCount is not used.
- SrcPkt_Handle and DstPkt_Handle must both be null handles.
- SrcPkt_ByteCount is zero.
Note: A record invalidation command may be submitted via the PEC API
to the engine only when the engine has no packets being processed
for this record.
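The per-flow field rules listed above lend themselves to a sanity check in
the application before a descriptor is submitted. The sketch below uses a
simplified stand-in structure, not the real PEC_CommandDescriptor_t.
Handles are modeled as pointers, with NULL playing the role of the null
handle, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FLOW_LAC, FLOW_OTHER, FLOW_RECORD_INVALIDATE } flow_t;

/* Simplified stand-in for the relevant command descriptor fields. */
typedef struct {
    const void *token;
    unsigned    token_word_count;
    const void *sa1;
    const void *sa2;
    const void *src;
    const void *dst;
    unsigned    src_byte_count;
} cmd_t;

/* Returns true if the descriptor follows the rules for the given flow. */
bool cmd_is_plausible(const cmd_t *c, flow_t flow, bool continuous_scatter)
{
    assert(c != NULL);
    if (c->sa2 != NULL)
        return false;               /* SA_Handle2 is null in every flow */
    switch (flow) {
    case FLOW_LAC:
        if (continuous_scatter)
            return false;           /* not supported with LAC flows */
        return c->token != NULL && c->token_word_count > 0 &&
               c->sa1 != NULL && c->dst != NULL;
    case FLOW_OTHER:
        if (c->token != NULL || c->token_word_count != 0)
            return false;
        /* sa1 may be null (classification) or a transform record.
         * With continuous scatter the destination is not checked here. */
        return continuous_scatter || c->dst != NULL;
    case FLOW_RECORD_INVALIDATE:
        return c->token == NULL && c->token_word_count == 0 &&
               c->sa1 != NULL && c->src == NULL && c->dst == NULL &&
               c->src_byte_count == 0;
    }
    return false;
}
```

Such a check only covers the static field rules. It cannot verify that the
engine has no packets in flight for a record that is being invalidated.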
Result Descriptor fields
User_p is fully supported on rings that do not use continuous scatter mode
and allows the user to match results to commands. User_p is not supported
on rings with continuous scatter mode.
SrcPkt_Handle and DstPkt_Handle are the same as provided in the command
descriptor. DstPkt_p is the host address for DstPkt_Handle.
On rings with continuous scatter mode, these fields are the NULL handle
and NULL pointer. On rings with continuous scatter mode, the NumParticles
field will be the number of scatter buffers used by the result packet.
At least one scatter buffer will be used, even if the result packet
has zero length. In some cases, the number of scatter buffers is
higher than would be required by the result packet.
DstPkt_ByteCount and Bypass_WordCount have been extracted from the engine
result descriptor as described in the engine datasheet under PE_LENGTH.
Bypass_WordCount should be the same as was provided in the command
descriptor.
For operations that do not require output buffers, such as hash operations,
the SrcPkt_Handle and DstPkt_Handle parameters in the
PEC_CommandDescriptor_t descriptor must be set equal by applications.
The advantage of this solution is that the driver still checks that
the output buffer handle is not NULL for all operations and can detect
errors in applications that do not provide a correct output buffer handle.
The disadvantage is that this degrades performance for hash operations:
the driver copies the bounce buffer back to the original buffer and
performs the PostDMA operation, neither of which is needed in
this case.
Status1 and Status2 reflect up to two words from the result token that
contain relevant status information. Use the function
PEC_RD_Status_Read to extract this information in an
engine-independent form.
More status information is passed in an array of 32-bit words via
the OutputToken_p field. The IOToken API can be used to extract
information from this array. The application is responsible for allocating
this array before the call to PEC_Packet_Get.
Notify Requests (callbacks)
In ARM mode, two notify requests (commands and results) are
supported. The result notification callback is only invoked in
interrupt mode. The command notification callback is invoked
from within PEC_Packet_Get.
SA Invalidation
In order to remove an SA from the system, it is required to carry out
the following operations in the specified order.
- Submit a special command descriptor with a transform record invalidation
command via PEC_Packet_Put(). This command will remove the record from
the record cache of the packet engine.
- Wait until the corresponding result descriptor is received via
PEC_Packet_Get().
- Call PEC_SA_UnRegister(). This command will take care of CPU cache
coherency, endianness conversion and bounce buffers, whichever applies.
- At this time the DMA buffer of the SA can be reused for a different
purpose or it can be freed.
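An application can guard against performing these steps out of order with
a small amount of bookkeeping per SA. The sketch below is one possible
approach and is not part of the driver. All type and function names are
hypothetical, and the actual calls to PEC_Packet_Put(), PEC_Packet_Get()
and PEC_SA_UnRegister() are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SA_IN_USE,              /* registered, may be referenced by packets */
    SA_INVALIDATE_PENDING,  /* invalidation command has been submitted */
    SA_INVALIDATED,         /* invalidation result has been received */
    SA_UNREGISTERED         /* buffer may now be reused or freed */
} sa_state_t;

typedef enum {
    STEP_SUBMIT_INVALIDATE, /* after PEC_Packet_Put() of the command */
    STEP_RESULT_RECEIVED,   /* after PEC_Packet_Get() returned its result */
    STEP_UNREGISTER         /* after PEC_SA_UnRegister() */
} sa_step_t;

/* Advance the state if the step is allowed now. Returns 1 if the step was
 * accepted, 0 if it would violate the required order. */
int sa_teardown_step(sa_state_t *state, sa_step_t step)
{
    assert(state != NULL);
    if (step == STEP_SUBMIT_INVALIDATE && *state == SA_IN_USE) {
        *state = SA_INVALIDATE_PENDING;
        return 1;
    }
    if (step == STEP_RESULT_RECEIVED && *state == SA_INVALIDATE_PENDING) {
        *state = SA_INVALIDATED;
        return 1;
    }
    if (step == STEP_UNREGISTER && *state == SA_INVALIDATED) {
        *state = SA_UNREGISTERED;
        return 1;
    }
    return 0;
}

/* The SA buffer may be reused or freed only after the final step. */
int sa_buffer_may_be_freed(sa_state_t state)
{
    return state == SA_UNREGISTERED;
}
```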
Banks for DMA resources
DMA-safe buffers for each data type must be allocated with the
correct Bank parameter (in DMABuf_Properties_t).
Buffers for SA records must be allocated with Bank=1, all other
buffers must be allocated with Bank=0. On 64-bit hosts, SA buffers
must be allocated in a 4GB memory range, which is taken care of by
using Bank=1.
SA resources
The functions PEC_SA_Register and PEC_SA_UnRegister take three
DMABuf handles as parameters. The first of these is always the
DMABuf handle representing the SA, the second is always a null
handle and the third is the null handle if the SA does not have
an ARC4 state record.
If the SA does have an ARC4 state record, the SA_Handle3
parameter represents the ARC4 state record. However, this DMABuf
handle is supposed to represent the ARC4 state part within the SA
buffer. The application is supposed to use DMABuf_Register
(AllocatorRef=='R') to register a subset of the SA buffer as a
DMA Handle.
Multiple Applications
The current implementation supports multiple rings (tested with
two rings) and each ring can be used by a separate application,
independently of other rings. The applications can run
concurrently, as long as each application uses a different
ring. The use of PEC_Packet_Put and PEC_Packet_Get by different
concurrent applications requires no locking. The implementation
of PEC_SA_UnRegister contains the required locking to support
multiple applications.
Concurrent Context Synchronization (CCS)
The PEC API implementation supports a single multi-threaded application
per interface ID, allowing concurrent and independent use of
PEC_SA_Register, PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get.
Multiple applications using the PEC API are also supported, but they must
each use a different interface ID.
Note: although the PEC_SA_Register and PEC_SA_UnRegister functions take
InterfaceId as an input parameter, it is ignored by these functions
because they do nothing that is specific to a packet I/O
(ring) interface.
The PEC API implementation provides synchronization mechanisms for
concurrent contexts invoking the API functions. The API can be used
by multi-threaded user-space and kernel-space applications. The latter
can invoke the API functions from the user process as well as from softirq
contexts. Mixing user process execution with softirq contexts is also
supported. Both Linux Uni-Processor (UP) and Symmetric Multi-Processor
(SMP) kernel configurations are supported.
The PEC API allows non-blocking synchronization of concurrent contexts
invoking the API functions for different interface IDs. The only
exceptions are the PEC_Init and PEC_UnInit functions, which both allow
just one execution context at a time, even for different interface IDs.
Also, no context may be executing the PEC_Packet_Put or PEC_Packet_Get
function code for an interface ID when PEC_UnInit is called for that
interface ID, otherwise PEC_UnInit will not succeed.
For optimal utilization of the packet engine the PEC API user should allow
for concurrent contexts for the PEC_Packet_Put and PEC_Packet_Get
functions for the same interface ID. Note that having multiple concurrent
contexts invoking the PEC_Packet_Put function for the same interface ID
will not improve performance because this function does not allow more
than one execution context at a time for one interface ID. The same
applies for the PEC_Packet_Get function.
When a function from the PEC API detects that it competes for a resource
that is in use by another context executing the PEC code, it does not
block but returns the PEC_STATUS_BUSY return code. The caller should
call the function again shortly after.
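A bounded retry loop is one way for a caller to handle the busy return
code. The sketch below is self-contained and uses hypothetical stand-ins:
status_t, STATUS_BUSY and fake_pec_call model the PEC status type,
PEC_STATUS_BUSY and a PEC function, and are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PEC status codes. */
typedef enum { STATUS_OK = 0, STATUS_BUSY = 1 } status_t;

/* Signature of the wrapped call, which stands in for a PEC API function. */
typedef status_t (*pec_call_t)(void *arg);

/* Call fn until it stops reporting busy or max_attempts is reached.
 * Returns the last status, which is STATUS_BUSY if all attempts failed. */
status_t call_with_retry(pec_call_t fn, void *arg, unsigned max_attempts)
{
    status_t rc = STATUS_BUSY;
    assert(fn != NULL);
    for (unsigned i = 0; i < max_attempts && rc == STATUS_BUSY; i++)
        rc = fn(arg);
    return rc;
}

/* Example stand-in: reports busy until the counter reaches zero. */
status_t fake_pec_call(void *arg)
{
    unsigned *remaining = (unsigned *)arg;
    if (*remaining > 0) {
        (*remaining)--;
        return STATUS_BUSY;
    }
    return STATUS_OK;
}
```

A real caller would normally yield or back off briefly between attempts
rather than spin, and would choose the attempt limit to suit its latency
requirements.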
Debugging
The PEC_Put_Dump() and PEC_Get_Dump() functions can be used to print
the administration and cached data, as well as the content of the ring
buffers, for the command ring and the result ring respectively. A slot
corresponds to a descriptor in the ring. These functions can be used to
debug the packet I/O functionality.
<end of document>