				1	+=============================================================================+
				2	|   Copyright (c) 2010-2022 Rambus, Inc. and/or its subsidiaries.             |
				3	|                                                                             |
				4	|   Subject : PEC API Implementation Notes                                    |
				5	|   Product : SLAD API                                                        |
				6	|   Date    : 02 December, 2022                                               |
				7	|                                                                             |
				8	+=============================================================================+
				9
				10	The SLAD API is a set of APIs, one of which is the Packet Engine
				11	Control (PEC) API. The driver implementation specifics of these APIs are
				12	described in short documents that serve as an addendum to the API
				13	specifications. This document describes the PEC API.
				14
				15	This document uses the term "configurable" to indicate that a parameter or
				16	option is build-time configurable in the driver. Please refer to the User Guide
				17	of the driver for details.
				18
				19
				20	PEC API
				21	-------
				22
				23	One PEC implementation is available: ARM (Autonomous Ring Mode).
				24
				25	Packet engine device
				26	The PEC API implementation can support multiple packet processing
				27	devices. These devices are identified by the interface ID parameter
				28	in the PEC API functions and are sometimes referred to as rings.
				29
				30	ARM mode
				31	ARM mode uses DMA, can queue many jobs, handles commands and
				32	results asynchronously (but always in-order) and also supports
				33	fragmented data buffers (scatter / gather). The implementation
				34	supports a single multi-threaded application per ring, allowing
				35	concurrent and independent use of PEC_SA_Register,
				36	PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get. Neither
				37	PEC_Packet_Put nor PEC_Packet_Get is re-entrant, but one thread
				38	can call PEC_Packet_Put while another calls PEC_Packet_Get.
				39
				40	Ensure that the Token Data, Packet Data and Context Data DMA
				41	buffers are not re-used or freed, either by the execution context
				42	that uses the PEC and DMABuf API functions or by any other
				43	execution context, until the result descriptor(s) for the packet
				44	associated with these buffers have been fully processed by the
				45	engine. This is required not only when an in-place packet
				46	transform uses the same Packet Data DMA buffer for input and
				47	output, but also when different DMA buffers are used for the
				48	packet processing input and output data.
				49
				50	Byte ordering (endianness)
				51	The SA and Token buffers are considered arrays of 32-bit integers,
				52	in host-native byte ordering. The driver will change the byte
				53	order if this is required (configurable). The data
				54	buffers are considered byte arrays and the driver will not touch
				55	these.
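
The difference between a word array and a byte array can be illustrated
with a small sketch. It assumes a conversion that reverses the bytes of each
32-bit word. The name swap32_words is hypothetical and not part of the
driver, which performs any required conversion itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a driver function: reverse the byte order of
 * every 32-bit word in place. Packet data buffers are byte arrays and
 * would never be passed through a routine like this. */
void swap32_words(uint32_t *words, size_t word_count)
{
    assert(words != NULL || word_count == 0);
    for (size_t i = 0; i < word_count; i++) {
        uint32_t w = words[i];
        words[i] = ((w & 0x000000FFu) << 24) |
                   ((w & 0x0000FF00u) << 8)  |
                   ((w & 0x00FF0000u) >> 8)  |
                   ((w & 0xFF000000u) >> 24);
    }
}
```

Applying the helper twice restores the original words, which matches the
idea of converting on the way to the engine and back again afterwards.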
				56
				57	Bounce Buffers
				58	Bounce Buffer support can be removed (configurable) from the driver to
				59	reduce footprint.
				60	For ARM mode, the implementation can bounce the SA, Token and data buffers
				61	if these are not DMA-safe. This requires that the buffer was registered
				62	using DMABuf_Register with an unsupported AllocatorRef.
				63	When bouncing an SA buffer, it will be copied to a bounce buffer in
				64	PEC_SA_Register and copied back by PEC_SA_UnRegister.
				65	When bouncing a token buffer, a bounce buffer is created by
				66	PEC_Packet_Put and released by PEC_Packet_Get.
				67	When bouncing a data buffer (because either the source or destination
				68	requires bouncing), a single bounce buffer is created by PEC_Packet_Put
				69	based on the larger of the source and destination buffers. The engine
				70	then performs an in-place operation in the bounce buffer. PEC_Packet_Get
				71	copies the result to the destination buffer and releases the bounce
				72	buffer.
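
The data bounce sequence described above can be modelled as follows. This is
only a sketch under stated assumptions: bounce_job_t, bounce_put and
bounce_get are hypothetical names, malloc stands in for the driver's
DMA-safe allocation, and the engine's in-place operation is left to the
caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one bounced packet; not a driver structure. */
typedef struct {
    unsigned char *bounce;  /* single buffer shared by source and destination */
    size_t bounce_size;     /* larger of the source and destination sizes */
    size_t dst_size;        /* capacity of the caller's destination buffer */
} bounce_job_t;

/* "Put" side: create one bounce buffer and copy the source data into it.
 * Returns 0 on success, -1 if the allocation fails. */
int bounce_put(bounce_job_t *job, const unsigned char *src,
               size_t src_size, size_t dst_size)
{
    job->dst_size = dst_size;
    job->bounce_size = (src_size > dst_size) ? src_size : dst_size;
    job->bounce = (unsigned char *)malloc(job->bounce_size > 0 ? job->bounce_size : 1);
    if (job->bounce == NULL)
        return -1;
    memcpy(job->bounce, src, src_size);
    return 0;
}

/* "Get" side: copy the in-place result to the destination buffer and
 * release the bounce buffer. */
void bounce_get(bounce_job_t *job, unsigned char *dst, size_t result_size)
{
    assert(result_size <= job->dst_size);
    memcpy(dst, job->bounce, result_size);
    free(job->bounce);
    job->bounce = NULL;
    job->bounce_size = 0;
}
```

Between the two calls the engine would transform the contents of
job->bounce in place.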
				73
				74	Descriptor grouping
				75	PEC_Packet_Get and PEC_Packet_Put can process up to a
				76	configurable number of descriptors in one call.
				77
				78	Queuing
				79	The ring can queue a configurable number of jobs. This can be set to
				80	hundreds or even thousands, at the cost of some memory footprint. This
				81	can avoid queuing in software and can also give better performance
				82	(avoids idle engine due to empty ring).
				83
				84	Scatter / Gather support
				85	The ARM mode supports the scatter/gather extension (configurable) of the
				86	PEC API.
				87	If the SrcPkt_Handle in the command descriptor is a PEC SG_List, the
				88	packet is assumed to use gather.
				89	If the DstPkt_Handle in the command descriptor is a PEC SG_List, the
				90	packet is assumed to use scatter.
				91
				92	The application is responsible for setting up both the gather list
				93	in SrcPkt_Handle and the scatter list in DstPkt_Handle for each packet.
				94	The application is responsible for allocating and releasing the individual
				95	gather and scatter buffers.
				96
				97	The buffers used for the Scatter and/or Gather data are not bounced and
				98	must be allocated and provided to the driver as DMA-safe buffers. This
				99	can be achieved by using the driver's DMABuf API.
				100
				101	Continuous Scatter mode
				102	The ARM mode supports continuous scatter mode, which can be enabled per
				103	ring. When continuous scatter mode is enabled for a ring, the result
				104	packets are written to a sequence of destination buffers supplied by the
				105	function PEC_Scatter_Preload(). Each result packet will occupy one
				106	or more destination buffers (it will be scattered), depending on the
				107	packet length and the sizes of the destination buffers.
				108
				109	It is not determined in advance which destination buffers will be
				110	used for the result packet of a certain PEC_Packet_Put(). In a
				111	typical use case, the application will pre-allocate a number of
				112	buffers, each of the same size, and submit them to the
				113	PEC_Scatter_Preload() function. It will regularly call
				114	PEC_Scatter_Preload() to refill the supply of destination buffers as
				115	buffers are consumed by result packets.
				116
				117	Continuous scatter mode can be used even if the driver is configured
				118	without scatter-gather support, but in that case each packet is required
				119	to fit into a single destination buffer.
				120
				121	Continuous scatter mode is not supported with LAC flows.
				122
				123	Redirection
				124	Some configurations of the hardware support redirection. A packet,
				125	originally submitted with PEC_Packet_Put() can have its result appear
				126	on a different ring or on the inline interface. A packet, originally
				127	received on the inline interface, can have its result appear on
				128	a ring.
				129
				130	Any ring from which packets can be redirected must be configured with
				131	continuous scatter mode. The same applies to any ring towards which
				132	packets can be redirected. When redirection is
				133	possible, the sequence of packets submitted with PEC_Packet_Put() and
				134	the sequence of result packets retrieved with PEC_Packet_Get()
				135	on the same ring are no longer related to one another.
				136
				137	When a packet submitted with PEC_Packet_Put() is redirected to the
				138	inline interface, no result descriptor will be received with
				139	PEC_Packet_Get(). When a packet is received on the inline interface and
				140	it is redirected to a ring, a result descriptor will appear with no
				141	corresponding command descriptor submitted via PEC_Packet_Put().
				142
				143	Command Descriptor fields
				144	User_p is fully supported on rings that do not use continuous scatter mode
				145	and allows the user to match results to commands. User_p is not supported
				146	on rings with continuous scatter mode.
				147
				148	The Control1 field is not used by this implementation. The PEC
				149	API function PEC_CD_Control_Write is not implemented. Instead use
				150	the IOToken API to pass an array of 32-bit words via the
				151	InputToken_p field. The application is responsible for allocating
				152	this array. The HW_Services field in the data structure passed to
				153	the IOToken API specifies the exact packet flow or alternatively
				154	it can specify a record invalidation command instead. The values
				155	to be filled in are provided by the firmware API.
				156
				157	Control2 can be used to specify the engine on which a packet must be
				158	processed. This can be useful for protocols like TLS, where subsequent
				159	packets of the same data stream must be processed on the same engine to
				160	ensure in-order processing and in-order assignment of the sequence numbers.
				161	To specify the engine, set bit 5 in Control2 and put the engine ID
				162	in bits 4..0. Otherwise, the Control2 field should be all zero.
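
The Control2 layout described above can be captured in a pair of helpers.
The macro and function names are hypothetical, and the sketch assumes that
Control2 is held in a 32-bit unsigned field.

```c
#include <assert.h>
#include <stdint.h>

#define CONTROL2_ENGINE_VALID  (1u << 5)  /* bit 5: an engine is specified */
#define CONTROL2_ENGINE_MASK   0x1Fu      /* bits 4..0: engine ID */

/* Hypothetical helper: Control2 value that pins a packet to one engine. */
uint32_t control2_for_engine(unsigned int engine_id)
{
    assert(engine_id <= CONTROL2_ENGINE_MASK);
    return CONTROL2_ENGINE_VALID | (engine_id & CONTROL2_ENGINE_MASK);
}

/* Hypothetical helper: Control2 value when no engine is specified. */
uint32_t control2_any_engine(void)
{
    return 0u;
}
```

For a TLS stream, an application would compute the value once per
connection and reuse it for every packet of that stream.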
				163
				164	LAC packet flow:
				165	- A valid Token_Handle must always be provided for each packet and
				166	Token_WordCount must be set to the exact size in words of the token.
				167	- SA_Handle1 must point to the main SA, which must be registered by
				168	PEC_SA_Register.
				169	- SA_Handle2 must be a null handle.
				170	- SA_WordCount is not used.
				171	- DstPkt_Handle must always be provided, also for input-only operations.
				172	When no destination buffer is required, it can be set to SrcPkt_Handle.
				173	- The TokenHeaderWord passed to the IOToken API must be filled in.
				174	- The Offset_ByteCount passed to the IOToken API is not used.
				175
				176	Other packet flows:
				177	- Token_Handle is the null handle and Token_WordCount is zero.
				178	- SA_Handle1 is the null handle if classification is used, else it is the
				179	DMABuf handle representing a transform record.
				180	- SA_Handle2 must be a null handle.
				181	- SA_WordCount is not used.
				182	- DstPkt_Handle must always be provided (except for continuous
				183	scatter mode), also for input-only operations.
				184	When no destination buffer is required, it can be set to SrcPkt_Handle.
				185	When continuous scatter mode is enabled, a destination handle must not be
				186	provided.
				187	- The Offset_ByteCount field passed to the IOToken API specifies the
				188	number of bytes at the start of each packet that will be passed
				189	unchanged and are not part of the packet to be processed.
				190	- The NextHeader field passed to the IOToken API specifies the Next
				191	Header field for IPsec packet flows that do not use network header
				192	processing.
				193
				194	Record invalidation commands:
				195	- Token_Handle is the null handle and Token_WordCount is zero.
				196	- SA_Handle1 must point to the SA, transform or flow record to be
				197	invalidated.
				198	- SA_Handle2 must be a null handle.
				199	- SA_WordCount is not used.
				200	- SrcPkt_Handle and DstPkt_Handle must both be null handles.
				201	- SrcPkt_ByteCount is zero.
				202
				203	Note: A record invalidation command may be submitted via the PEC API
				204	to the engine only when the engine has no packets being processed
				205	for this record.
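
As an application-side sanity check, the record invalidation rules listed
above could be tested before a command is submitted. The sketch below uses a
simplified stand-in structure (cmd_model_t, hypothetical) in which handles
are plain pointers and NULL plays the role of the null handle. It assumes
that the record to invalidate is passed in the first SA handle, and it does
not use the driver's real descriptor type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a command descriptor. Field names follow the
 * text above; the types are assumptions made for this sketch only. */
typedef struct {
    const void *Token_Handle;
    unsigned int Token_WordCount;
    const void *SA_Handle1;
    const void *SA_Handle2;
    const void *SrcPkt_Handle;
    const void *DstPkt_Handle;
    unsigned int SrcPkt_ByteCount;
} cmd_model_t;

/* Returns true if the model satisfies every rule listed for a record
 * invalidation command. */
bool is_valid_invalidation_cmd(const cmd_model_t *c)
{
    assert(c != NULL);
    return c->Token_Handle == NULL && c->Token_WordCount == 0 &&
           c->SA_Handle1 != NULL && c->SA_Handle2 == NULL &&
           c->SrcPkt_Handle == NULL && c->DstPkt_Handle == NULL &&
           c->SrcPkt_ByteCount == 0;
}
```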
				206
				207	Result Descriptor fields
				208	User_p is fully supported on rings that do not use continuous scatter mode
				209	and allows the user to match results to commands. User_p is not supported
				210	on rings with continuous scatter mode.
				211
				212	SrcPkt_Handle and DstPkt_Handle are the same as provided in the command
				213	descriptor. DstPkt_p is the host address for DstPkt_Handle.
				214	On rings with continuous scatter mode, these fields are the NULL handle
				215	and NULL pointer. On rings with continuous scatter mode, the NumParticles
				216	field will be the number of scatter buffers used by the result packet.
				217	At least one scatter buffer will be used, even if the result packet
				218	has zero length. In some cases, the number of scatter buffers is
				219	higher than would be required by the result packet.
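
When all preloaded destination buffers have the same size, a lower bound
for NumParticles follows directly from the result packet length. The helper
below is a hypothetical sketch of that arithmetic. It is a lower bound only,
since the engine may use more buffers than strictly required.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: minimum number of equally sized scatter buffers
 * that a result packet of packet_bytes can occupy. */
size_t min_scatter_buffers(size_t packet_bytes, size_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    /* Round up without risking overflow in the addition. */
    size_t n = packet_bytes / buffer_bytes +
               ((packet_bytes % buffer_bytes) != 0 ? 1 : 0);
    /* A zero-length result packet still uses one buffer. */
    return (n == 0) ? 1 : n;
}
```

An application could use such a bound to estimate how many buffers to hand
to PEC_Scatter_Preload() for an expected traffic mix.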
				220
				221	DstPkt_ByteCount and Bypass_WordCount have been extracted from the engine
				222	result descriptor as described in the engine datasheet under PE_LENGTH.
				223	Bypass_WordCount should be the same as was provided in the command
				224	descriptor.
				225
				226	For operations that do not require output buffers, such as hash operations,
				227	the SrcPkt_Handle and DstPkt_Handle parameters in the
				228	PEC_CommandDescriptor_t descriptor must be set equal by applications.
				229	The advantage of this solution is that the driver still checks that
				230	the output buffer handle is not NULL for all operations and can detect
				231	errors in applications that do not provide a correct output buffer handle.
				232	The disadvantage is that for hash operations this will degrade performance,
				233	because the driver will have to perform a bounce buffer copy back to
				234	the original buffer, which is not needed, and the driver will also
				235	perform the PostDMA operation, which is also not needed.
				236
				237	Status1 and Status2 reflect up to two words from the result token that
				238	contain relevant status information. Use the function
				239	PEC_RD_Status_Read to extract this information in an
				240	engine-independent form.
				241
				242	More status information is passed in an array of 32-bit words via
				243	the OutputToken_p field. The IOToken API can be used to extract
				244	information from this array. The application is responsible for allocating
				245	this array before the call to PEC_Packet_Get.
				246
				247	Notify Requests (callbacks)
				248	In ARM mode, two notify requests (commands and results) are
				249	supported. The result notification callback is only invoked in
				250	interrupt mode. The command notification callback is invoked
				251	from within PEC_Packet_Get.
				252
				253	SA Invalidation
				254	In order to remove an SA from the system, it is required to carry out
				255	the following operations in the specified order.
				256	- Submit a special command descriptor with a transform record invalidation
				257	command via PEC_Packet_Put(). This command will remove the record from
				258	the record cache of the packet engine.
				259	- Wait until the corresponding result descriptor is received via
				260	PEC_Packet_Get().
				261	- Call PEC_SA_UnRegister(). This command will take care of CPU cache
				262	coherency, endianness conversion and bounce buffers, whichever applies.
				263	- At this time the DMA buffer of the SA can be reused for a different
				264	purpose or it can be freed.
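
An application may want to track the teardown order above explicitly. The
following sketch is a hypothetical bookkeeping state machine; none of these
names exist in the driver. The three events correspond to submitting the
invalidation command with PEC_Packet_Put(), receiving the matching result
with PEC_Packet_Get(), and calling PEC_SA_UnRegister().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SA_IN_USE,             /* registered, may be referenced by packets */
    SA_INVALIDATE_QUEUED,  /* invalidation command has been submitted */
    SA_INVALIDATED,        /* matching result descriptor was received */
    SA_UNREGISTERED        /* unregistered, buffer may be reused or freed */
} sa_state_t;

typedef enum {
    EV_INVALIDATE_PUT,
    EV_INVALIDATE_RESULT,
    EV_UNREGISTER
} sa_event_t;

/* Advance the state only when the event is the next step in the required
 * order. Returns true if the event was accepted. */
bool sa_teardown_step(sa_state_t *state, sa_event_t ev)
{
    assert(state != NULL);
    switch (*state) {
    case SA_IN_USE:
        if (ev == EV_INVALIDATE_PUT) { *state = SA_INVALIDATE_QUEUED; return true; }
        break;
    case SA_INVALIDATE_QUEUED:
        if (ev == EV_INVALIDATE_RESULT) { *state = SA_INVALIDATED; return true; }
        break;
    case SA_INVALIDATED:
        if (ev == EV_UNREGISTER) { *state = SA_UNREGISTERED; return true; }
        break;
    case SA_UNREGISTERED:
        break;
    }
    return false;
}

/* The SA DMA buffer may be reused or freed only after the final step. */
bool sa_buffer_may_be_freed(sa_state_t state)
{
    return state == SA_UNREGISTERED;
}
```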
				265
				266	Banks for DMA resources
				267	DMA-safe buffers for each data type must be allocated with the
				268	correct Bank parameter (in DMABuf_Properties_t).
				269	Buffers for SA records must be allocated with Bank=1, all other
				270	buffers must be allocated with Bank=0. On 64-bit hosts, SA buffers
				271	must be allocated in a 4GB memory range, which is taken care of by
				272	using Bank=1.
				273
				274	SA resources
				275	The functions PEC_SA_Register and PEC_SA_UnRegister take three
				276	DMABuf handles as parameters. The first of these is always the
				277	DMABuf handle representing the SA, the second is always a null
				278	handle and the third is the null handle if the SA does not have
				279	an ARC4 state record.
				280
				281	If the SA does have an ARC4 state record, the SA_Handle3
				282	parameter represents the ARC4 state record. However, this DMABuf
				283	handle is supposed to represent the ARC4 state part within the SA
				284	buffer. The application is supposed to use DMABuf_Register
				285	(AllocatorRef=='R') to register a subset of the SA buffer as a
				286	DMABuf handle.
				287
				288	Multiple Applications
				289	The current implementation supports multiple rings (tested with
				290	two rings) and each ring can be used by a separate application,
				291	independently of other rings. The applications can run
				292	concurrently, as long as each application uses a different
				293	ring. The use of PEC_Packet_Put and PEC_Packet_Get by different
				294	concurrent applications requires no locking. The implementation
				295	of PEC_SA_UnRegister contains the required locking to support
				296	multiple applications.
				297
				298	Concurrent Context Synchronization (CCS)
				299	The PEC API implementation supports a single multi-threaded application
				300	per interface ID, allowing concurrent and independent use of
				301	PEC_SA_Register, PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get.
				302	Multiple applications using the PEC API are also supported, but each must
				303	use a different interface ID.
				304
				305	Note: although the PEC_SA_Register and PEC_SA_UnRegister functions take
				306	InterfaceId as an input parameter, it is ignored by these functions
				307	since they do nothing that is specific to a packet I/O
				308	(ring) interface.
				309
				310	The PEC API implementation provides synchronization mechanisms for
				311	concurrent contexts invoking the API functions. The API can be used
				312	by multi-threaded user-space and kernel-space applications. The latter
				313	can invoke the API functions from the user process as well as from softirq
				314	contexts. Mixing user process execution with softirq contexts is also
				315	supported. Both Linux Uni-Processor (UP) and Symmetric Multi-Processor
				316	(SMP) kernel configurations are supported.
				317
				318	The PEC API allows for non-blocking synchronization of concurrent contexts
				319	invoking the API functions for different interface IDs. The only
				320	exceptions are the PEC_Init and PEC_UnInit functions, which both allow
				321	for just one execution context at a time, even for different interface IDs.
				322	Also, there should be no contexts executing the PEC_Packet_Put
				323	or PEC_Packet_Get function code in order for the PEC_UnInit function
				324	to succeed for the same interface ID.
				325
				326	For optimal utilization of the packet engine the PEC API user should allow
				327	for concurrent contexts for the PEC_Packet_Put and PEC_Packet_Get
				328	functions for the same interface ID. Note that having multiple concurrent
				329	contexts invoking the PEC_Packet_Put function for the same interface ID
				330	will not improve performance because this function does not allow more
				331	than one execution context at a time for one interface ID. The same
				332	applies for the PEC_Packet_Get function.
				333
				334	When a function from the PEC API detects that it competes for a resource
				335	already in use by another context executing the PEC code, it will
				336	not block but will return PEC_STATUS_BUSY. The caller should try
				337	calling the function again shortly after.
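
A bounded retry loop is one way to handle a busy return. The sketch below is
generic: demo_status_t, call_with_retry and busy_n_times are hypothetical
names, DEMO_BUSY stands in for PEC_STATUS_BUSY, and the function pointer
stands in for a PEC API call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes; these are not the driver's values. */
typedef enum { DEMO_OK = 0, DEMO_BUSY = 1, DEMO_ERROR = 2 } demo_status_t;

/* Stand-in for a non-blocking API call. */
typedef demo_status_t (*demo_call_t)(void *ctx);

/* Call the function until it stops reporting busy, at most max_attempts
 * times. Returns the last status, or DEMO_BUSY if max_attempts is zero. */
demo_status_t call_with_retry(demo_call_t call, void *ctx,
                              unsigned int max_attempts)
{
    assert(call != NULL);
    demo_status_t st = DEMO_BUSY;
    for (unsigned int i = 0; i < max_attempts && st == DEMO_BUSY; i++) {
        st = call(ctx);
        /* A real caller would yield or back off briefly before retrying. */
    }
    return st;
}

/* Stub for demonstration: reports busy until the counter reaches zero. */
demo_status_t busy_n_times(void *ctx)
{
    unsigned int *remaining = (unsigned int *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return DEMO_BUSY;
    }
    return DEMO_OK;
}
```

Bounding the number of attempts keeps a softirq caller from spinning
indefinitely when another context holds the resource for longer than
expected.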
				338
				339	Debugging
				340	The PEC_Put_Dump() and PEC_Get_Dump() functions can be used to print
				341	the administration and cached data, as well as the content of the ring
				342	buffers, of the command ring and the result ring respectively. A slot
				343	corresponds to a descriptor in the ring. These functions can be used to
				344	debug the packet I/O functionality.
				345
				346
				347	<end of document>