 +=============================================================================+
 | Copyright (c) 2010-2022 Rambus, Inc. and/or its subsidiaries.               |
 |                                                                             |
 | Subject   : PEC API Implementation Notes                                    |
 | Product   : SLAD API                                                        |
 | Date      : 02 December, 2022                                               |
 |                                                                             |
 +=============================================================================+

 The SLAD API is a set of APIs, one of which is the Packet Engine
 Control (PEC) API. The driver implementation specifics of these APIs are
 described in short documents that serve as an addendum to the API
 specifications. This document describes the PEC API.

 This document uses the phrase "configurable" to indicate that a parameter or
 option is build-time configurable in the driver. Please refer to the User Guide
 of the driver for details.


 PEC API
 -------

 One PEC implementation is available: ARM (Autonomous Ring Mode).

 Packet engine device
      The PEC API implementation can support multiple packet processing
      devices. These devices are identified by the interface ID parameter
      in the PEC API functions and are sometimes referred to as rings.

 ARM mode
      ARM mode uses DMA, can queue many jobs, handles commands and
      results asynchronously (but always in-order) and also supports
      fragmented data buffers (scatter / gather). The implementation
      supports a single multi-threaded application per ring, allowing
      concurrent and independent use of PEC_SA_Register,
      PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get. Neither
      PEC_Packet_Put nor PEC_Packet_Get is re-entrant, but one thread
      can call PEC_Packet_Put while another calls PEC_Packet_Get.

      Ensure that the Token Data, Packet Data and Context Data DMA
      buffers are not re-used or freed, either by the execution context
      that uses the PEC and DMABuf API functions or by any other
      execution context, until the result descriptor(s) referring to
      the packet associated with these buffers has (have) been fully
      processed by the engine. This applies not only to an in-place
      packet transform that uses the same Packet Data DMA buffer for
      input and output, but also when different DMA buffers are used
      for the packet processing input and output data.

 Byte ordering (endianness)
      The SA and Token buffers are considered arrays of 32-bit integers,
      in host-native byte ordering. The driver will change the byte
      order if this is required (configurable). The data buffers are
      considered byte arrays and the driver will not touch these.
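
      The word-wise treatment of the SA and Token buffers can be sketched
      as follows. This is only an illustration of swapping 32-bit words in
      place; the function names are hypothetical and do not come from the
      driver, which may implement the conversion differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

/* Swap every word of an SA or Token buffer in place.
 * The buffer is handled as an array of 32-bit integers, never as bytes. */
void swap_words_in_place(uint32_t *words, size_t word_count)
{
    assert(words != NULL || word_count == 0);
    for (size_t i = 0; i < word_count; i++)
        words[i] = swap32(words[i]);
}
```

      Packet data buffers would never be passed to such a routine, since
      they are byte arrays.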

 Bounce Buffers
      Bounce Buffer support can be removed (configurable) from the driver to
      reduce footprint.
      For ARM mode, the implementation can bounce the SA, Token and data buffers
      if these are not DMA-safe. This requires that the buffer was registered
      using DMABuf_Register with an unsupported AllocatorRef.
      When bouncing an SA buffer, it will be copied to a bounce buffer in
      PEC_SA_Register and copied back by PEC_SA_UnRegister.
      When bouncing a token buffer, a bounce buffer is created by
      PEC_Packet_Put and released by PEC_Packet_Get.
      When bouncing a data buffer (because either the source or destination
      requires bouncing), a single bounce buffer is created by PEC_Packet_Put
      based on the larger of the source and destination buffers. The engine
      then performs an in-place operation in the bounce buffer. PEC_Packet_Get
      copies the result to the destination buffer and releases the bounce
      buffer.
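
      The data buffer case can be modelled with ordinary heap memory, as
      in the sketch below. All names are hypothetical, malloc() stands in
      for a DMA-safe allocation, and the XOR step is only a placeholder
      for the engine's transform; the driver's real code is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping for one bounced packet; not a driver type. */
typedef struct {
    unsigned char *bounce;  /* stand-in for the DMA-safe bounce buffer */
    size_t         size;    /* larger of source and destination sizes  */
} bounce_ctx_t;

/* "Put" side: create one bounce buffer sized for the larger of the two
 * buffers and copy the source data into it. Returns 0 on success. */
int bounce_put(bounce_ctx_t *ctx, const unsigned char *src,
               size_t src_size, size_t dst_size)
{
    assert(ctx != NULL && src != NULL);
    ctx->size = (src_size > dst_size) ? src_size : dst_size;
    ctx->bounce = malloc(ctx->size ? ctx->size : 1);
    if (ctx->bounce == NULL)
        return -1;
    memcpy(ctx->bounce, src, src_size);
    return 0;
}

/* Placeholder for the engine: operates in place on the bounce buffer. */
void fake_engine_in_place(bounce_ctx_t *ctx, size_t len)
{
    assert(len <= ctx->size);
    for (size_t i = 0; i < len; i++)
        ctx->bounce[i] = (unsigned char)(ctx->bounce[i] ^ 0xFFu);
}

/* "Get" side: copy the result to the destination buffer, then release
 * the bounce buffer. */
void bounce_get(bounce_ctx_t *ctx, unsigned char *dst, size_t result_len)
{
    assert(dst != NULL && result_len <= ctx->size);
    memcpy(dst, ctx->bounce, result_len);
    free(ctx->bounce);
    ctx->bounce = NULL;
    ctx->size = 0;
}
```

      Sizing the single buffer for the larger side is what allows the
      in-place operation to serve both the input and the output.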

 Descriptor grouping
      PEC_Packet_Get and PEC_Packet_Put can process up to a
      configurable number of descriptors in one call.

 Queuing
      The ring can queue a configurable number of jobs. This can be set to
      hundreds or even thousands, at the cost of some memory footprint. This
      can avoid queuing in software and can also give better performance
      (avoids idle engine due to empty ring).

 Scatter / Gather support
      The ARM mode supports the scatter/gather extension (configurable) of the
      PEC API.
      If the SrcPkt_Handle in the command descriptor is a PEC SG_List, the
      packet is assumed to use gather.
      If the DstPkt_Handle in the command descriptor is a PEC SG_List, the
      packet is assumed to use scatter.

      The application is responsible for setting up both the gather list
      in SrcPkt_Handle and the scatter list in DstPkt_Handle for each packet.
      The application is responsible for allocating and releasing the individual
      gather and scatter buffers.

      The buffers used for the Scatter and/or Gather data are not bounced and
      must be allocated and provided to the driver as DMA-safe buffers. This
      can be achieved by using the driver's DMABuf API.

 Continuous Scatter mode
      The ARM mode supports continuous scatter mode, which can be enabled per
      ring. When continuous scatter mode is enabled for a ring, the result
      packets are written to a sequence of destination buffers supplied by the
      function PEC_Scatter_Preload(). Each result packet will occupy one
      or more destination buffers (it will be scattered), depending on the
      packet length and the sizes of the destination buffers.

      It is not determined in advance which destination buffers will be
      used for the result packet of a given PEC_Packet_Put(). In a
      typical use case, the application pre-allocates a number of
      buffers, each of the same size, and submits them to the
      PEC_Scatter_Preload() function. It then calls
      PEC_Scatter_Preload() regularly to refill the supply of destination
      buffers as buffers are consumed by result packets.

      Continuous scatter mode can be used even if the driver is configured
      without scatter-gather support, but in that case each packet is required
      to fit into a single destination buffer.
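
      For the typical case of equally sized destination buffers, the
      number of buffers a result packet needs can be estimated as in the
      sketch below. The function names are hypothetical. The value is a
      lower bound only: at least one buffer is consumed even for a
      zero-length result, and the engine may use more buffers than this
      estimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimum number of equal-sized destination buffers occupied by a result
 * packet of result_len bytes. */
size_t min_scatter_buffers(size_t result_len, size_t buf_size)
{
    assert(buf_size > 0);
    if (result_len == 0)
        return 1;                                   /* never zero buffers */
    return (result_len + buf_size - 1) / buf_size;  /* ceiling division   */
}

/* Without scatter-gather support each result must fit in one buffer. */
bool fits_without_sg(size_t result_len, size_t buf_size)
{
    return min_scatter_buffers(result_len, buf_size) == 1;
}
```

      An application could use such an estimate to decide how many buffers
      to keep preloaded for its expected traffic.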

      Continuous scatter mode is not supported with LAC flows.

 Redirection
      Some configurations of the hardware support redirection. A packet
      originally submitted with PEC_Packet_Put() can have its result appear
      on a different ring or on the inline interface. A packet originally
      received on the inline interface can have its result appear on
      a ring.

      Any ring from which or towards which packets can be redirected must
      be configured with continuous scatter mode. When redirection is
      possible, the sequence of packets submitted with PEC_Packet_Put() and
      the sequence of result packets retrieved with PEC_Packet_Get()
      on the same ring are no longer related to one another.

      When a packet submitted with PEC_Packet_Put() is redirected to the
      inline interface, no result descriptor will be received with
      PEC_Packet_Get(). When a packet is received on the inline interface and
      it is redirected to a ring, a result descriptor will appear with no
      corresponding command descriptor in PEC_Packet_Put().

 Command Descriptor fields
      User_p is fully supported on rings that do not use continuous scatter mode
      and allows the user to match results to commands. User_p is not supported
      on rings with continuous scatter mode.

      The Control1 field is not used by this implementation. The PEC
      API function PEC_CD_Control_Write is not implemented. Instead use
      the IOToken API to pass an array of 32-bit words via the
      InputToken_p field. The application is responsible for allocating
      this array. The HW_Services field in the data structure passed to
      the IOToken API specifies the exact packet flow or alternatively
      it can specify a record invalidation command instead.  The values
      to be filled in are provided by the firmware API.

      Control2 can be used to specify the engine on which a packet must be
      processed. This can be useful for protocols like TLS, where subsequent
      packets of the same data stream must be processed on the same engine to
      ensure in-order processing and in-order assignment of the sequence numbers.
      Bit 5 in Control2 can be set if the engine is specified. The engine ID
      is put in bits 4..0. Otherwise, the Control2 field should be all zero.
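
      The Control2 encoding described above can be sketched as follows.
      The macro and function names are hypothetical; only the bit
      positions come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define CONTROL2_ENGINE_VALID  (1u << 5)  /* bit 5: engine is specified */
#define CONTROL2_ENGINE_MASK   0x1Fu      /* bits 4..0: engine ID       */

/* Control2 value that pins a packet to engine_id (0..31). */
uint32_t control2_for_engine(unsigned int engine_id)
{
    assert(engine_id <= CONTROL2_ENGINE_MASK);
    return CONTROL2_ENGINE_VALID | (engine_id & CONTROL2_ENGINE_MASK);
}

/* Control2 value when no engine is specified: all zero. */
uint32_t control2_any_engine(void)
{
    return 0u;
}
```

      For a TLS stream, the application would compute the value once per
      connection and use it for every packet of that connection.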

      LAC packet flow:
      - A valid Token_Handle must always be provided for each packet and
        Token_WordCount must be set to the exact size in words of the token.
      - SA_Handle1 must point to the main SA, which must be registered by
        PEC_SA_Register.
      - SA_Handle2 must be a null handle.
      - SA_WordCount is not used.
      - DstPkt_Handle must always be provided, also for input-only operations.
        When no destination buffer is required, it can be set to SrcPkt_Handle.
      - The TokenHeaderWord passed to the IOToken API must be filled in.
      - The Offset_ByteCount passed to the IOToken API is not used.

      Other packet flows:
      - Token_Handle is the null handle and Token_WordCount is zero.
      - SA_Handle1 is the null handle if classification is used, else it is the
        DMABuf handle representing a transform record.
      - SA_Handle2 must be a null handle.
      - SA_WordCount is not used.
      - DstPkt_Handle must always be provided, also for input-only
        operations, except in continuous scatter mode. When no destination
        buffer is required, it can be set to SrcPkt_Handle. When continuous
        scatter mode is enabled, do not provide a destination handle.
      - The Offset_ByteCount field passed to the IOToken API specifies the
        number of bytes at the start of each packet that will be passed
        unchanged and are not part of the packet to be processed.
      - The NextHeader field passed to the IOToken API specifies the Next
        Header field for IPsec packet flows that do not use network header
        processing.

      Record invalidation commands:
      - Token_Handle is the null handle and Token_WordCount is zero.
      - SA_Handle1 must point to the SA, transform or flow record to be
        invalidated.
      - SA_Handle2 must be a null handle.
      - SA_WordCount is not used.
      - SrcPkt_Handle and DstPkt_Handle must both be null handles.
      - SrcPkt_ByteCount is zero.

      Note: A record invalidation command may be submitted via the PEC API
            to the engine only when the engine has no packets being processed
            for this record.
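
      The field rules for a record invalidation command can be expressed
      as a simple check, sketched below. The struct is a reduced,
      hypothetical mirror of the command descriptor: the field names
      follow the text, but the types and layout are assumptions, NULL
      models the null handle, and the record handle is assumed to live in
      SA_Handle1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMABuf handle; NULL models the null handle. */
typedef void *mock_handle_t;

/* Hypothetical, reduced mirror of the command descriptor. */
typedef struct {
    mock_handle_t Token_Handle;
    unsigned int  Token_WordCount;
    mock_handle_t SA_Handle1;
    mock_handle_t SA_Handle2;
    mock_handle_t SrcPkt_Handle;
    mock_handle_t DstPkt_Handle;
    unsigned int  SrcPkt_ByteCount;
} mock_cmd_desc_t;

/* True if the descriptor satisfies the record invalidation field rules. */
bool is_valid_invalidation_cmd(const mock_cmd_desc_t *cd)
{
    assert(cd != NULL);
    return cd->Token_Handle == NULL && cd->Token_WordCount == 0 &&
           cd->SA_Handle1 != NULL &&   /* record to be invalidated */
           cd->SA_Handle2 == NULL &&
           cd->SrcPkt_Handle == NULL && cd->DstPkt_Handle == NULL &&
           cd->SrcPkt_ByteCount == 0;
}
```

      A check like this cannot verify the condition in the note; whether
      the engine still has packets in flight for the record has to be
      tracked by the application.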

 Result Descriptor fields
      User_p is fully supported on rings that do not use continuous scatter mode
      and allows the user to match results to commands. User_p is not supported
      on rings with continuous scatter mode.

      SrcPkt_Handle and DstPkt_Handle are the same as provided in the command
      descriptor. DstPkt_p is the host address for DstPkt_Handle.
      On rings with continuous scatter mode, these fields are the NULL handle
      and NULL pointer, and the NumParticles field is the number of
      scatter buffers used by the result packet.
      At least one scatter buffer will be used, even if the result packet
      has zero length. In some cases, the number of scatter buffers is
      higher than would be required by the result packet.

      DstPkt_ByteCount and Bypass_WordCount have been extracted from the engine
      result descriptor as described in the engine datasheet under PE_LENGTH.
      Bypass_WordCount should be the same as was provided in the command
      descriptor.

      For operations that do not require output buffers, such as hash
      operations, applications must set the SrcPkt_Handle and DstPkt_Handle
      parameters in the PEC_CommandDescriptor_t descriptor to the same value.
      The advantage of this solution is that the driver still checks that
      the output buffer handle is not NULL for all operations and can detect
      errors in applications that do not provide a correct output buffer
      handle. The disadvantage is that hash operations lose performance:
      the driver performs a bounce buffer copy back to the original buffer
      and a PostDMA operation, neither of which is needed.

      Status1 and Status2 reflect up to two words from the result token that
      contain relevant status information. Use the function
      PEC_RD_Status_Read to extract this information in an
      engine-independent form.

      More status information is passed in an array of 32-bit words via
      the OutputToken_p field. The IOToken API can be used to extract
      information from this array. The application is responsible for allocating
      this array before the call to PEC_Packet_Get.

 Notify Requests (callbacks)
      In ARM mode, two notify requests (commands and results) are
      supported.  The result notification callback is only invoked in
      interrupt mode.  The command notification callback is invoked
      from within PEC_Packet_Get.

 SA Invalidation
      In order to remove an SA from the system, it is required to carry out
      the following operations in the specified order.
      - Submit a special command descriptor with a transform record invalidation
        command via PEC_Packet_Put(). This command will remove the record from
        the record cache of the packet engine.
      - Wait until the corresponding result descriptor is received via
        PEC_Packet_Get().
      - Call PEC_SA_UnRegister(). This command will take care of CPU cache
        coherency, endianness conversion and bounce buffers, whichever applies.
      - At this time the DMA buffer of the SA can be reused for a different
        purpose or it can be freed.
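
      The required ordering can be tracked per SA with a small state
      machine, sketched below. The states, events and function names are
      hypothetical application-side bookkeeping; they are not part of the
      PEC API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lifecycle states of one SA, following the steps above. */
typedef enum {
    SA_IN_USE,              /* registered, may be in the record cache    */
    SA_INVALIDATE_PENDING,  /* invalidation command has been submitted   */
    SA_INVALIDATED,         /* matching result descriptor was received   */
    SA_UNREGISTERED         /* unregistered; buffer may be reused/freed  */
} sa_state_t;

typedef enum {
    SA_EV_INVALIDATE_PUT,   /* invalidation sent via PEC_Packet_Put()    */
    SA_EV_RESULT_GET,       /* result received via PEC_Packet_Get()      */
    SA_EV_UNREGISTER        /* PEC_SA_UnRegister() was called            */
} sa_event_t;

/* Advance the state. Returns false and leaves the state unchanged when
 * the event arrives out of order. */
bool sa_step(sa_state_t *state, sa_event_t ev)
{
    assert(state != NULL);
    if (*state == SA_IN_USE && ev == SA_EV_INVALIDATE_PUT)
        *state = SA_INVALIDATE_PENDING;
    else if (*state == SA_INVALIDATE_PENDING && ev == SA_EV_RESULT_GET)
        *state = SA_INVALIDATED;
    else if (*state == SA_INVALIDATED && ev == SA_EV_UNREGISTER)
        *state = SA_UNREGISTERED;
    else
        return false;
    return true;
}

/* The SA's DMA buffer may be reused or freed only after the last step. */
bool sa_buffer_may_be_freed(sa_state_t state)
{
    return state == SA_UNREGISTERED;
}
```

      Rejecting out-of-order events makes it harder to free an SA buffer
      while the engine may still hold the record in its cache.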

 Banks for DMA resources
      DMA-safe buffers for each data type must be allocated with the
      correct Bank parameter (in DMABuf_Properties_t).
      Buffers for SA records must be allocated with Bank=1, all other
      buffers must be allocated with Bank=0. On 64-bit hosts, SA buffers
      must be allocated in a 4GB memory range, which is taken care of by
      using Bank=1.

 SA resources
      The functions PEC_SA_Register and PEC_SA_UnRegister take three
      DMABuf handles as parameters. The first of these is always the
      DMABuf handle representing the SA, the second is always a null
      handle and the third is the null handle if the SA does not have
      an ARC4 state record.

      If the SA does have an ARC4 state record, the SA_Handle3
      parameter represents the ARC4 state record. However, this DMABuf
      handle is expected to represent the ARC4 state part within the SA
      buffer. The application is expected to use DMABuf_Register
      (AllocatorRef=='R') to register a subset of the SA buffer as a
      DMABuf handle.

 Multiple Applications
      The current implementation supports multiple rings (tested with
      two rings) and each ring can be used by a separate application,
      independently of other rings. The applications can run
      concurrently, as long as each application uses a different
      ring. The use of PEC_Packet_Put and PEC_Packet_Get by different
      concurrent applications requires no locking. The implementation
      of PEC_SA_UnRegister contains the required locking to support
      multiple applications.

 Concurrent Context Synchronization (CCS)
      The PEC API implementation supports a single multi-threaded application
      per interface ID, allowing concurrent and independent use of
      PEC_SA_Register, PEC_SA_UnRegister, PEC_Packet_Put and PEC_Packet_Get.
      Multiple applications using the PEC API are also supported, but each
      must use a different interface ID.

      Note: although the PEC_SA_Register and PEC_SA_UnRegister functions take
            InterfaceId as an input parameter, it is ignored by these functions
            since they do nothing that is specific to a packet I/O
            (ring) interface.

      The PEC API implementation provides synchronization mechanisms for
      concurrent contexts invoking the API functions. The API can be used
      by multi-threaded user-space and kernel-space applications. The latter
      can invoke the API functions from the user process as well as from softirq
      contexts. Mixing user process execution with softirq contexts is also
      supported. Both Linux Uni-Processor (UP) and Symmetric Multi-Processor
      (SMP) kernel configurations are supported.

      The PEC API allows non-blocking synchronization of concurrent contexts
      invoking the API functions for different interface IDs. The only
      exceptions are the PEC_Init and PEC_UnInit functions, which both allow
      just one execution context at a time, even for different interface IDs.
      Also, for the PEC_UnInit function to succeed, no context may be
      executing the PEC_Packet_Put or PEC_Packet_Get function code for
      the same interface ID.

      For optimal utilization of the packet engine the PEC API user should allow
      for concurrent contexts for the PEC_Packet_Put and PEC_Packet_Get
      functions for the same interface ID. Note that having multiple concurrent
      contexts invoking the PEC_Packet_Put function for the same interface ID
      will not improve performance because this function does not allow more
      than one execution context at a time for one interface ID. The same
      applies for the PEC_Packet_Get function.

      When a PEC API function detects that it competes for a resource
      that is in use by another context executing the PEC code, it does
      not block but returns PEC_STATUS_BUSY. The caller should call the
      function again shortly afterwards.
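
      A bounded retry on the busy status could look like the sketch below.
      Everything here is hypothetical: the status enum and its values are
      placeholders (only the name PEC_STATUS_BUSY comes from the text),
      and the operation is passed as a function pointer so that the sketch
      does not depend on the real PEC function signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes; not the driver's definitions. */
typedef enum {
    MOCK_STATUS_OK = 0,
    MOCK_STATUS_BUSY = 1,
    MOCK_STATUS_ERROR = 2
} mock_status_t;

/* Any non-blocking operation that may report BUSY. */
typedef mock_status_t (*pec_op_fn)(void *arg);

/* Call op until it stops reporting BUSY or max_attempts is reached.
 * A real caller would yield or back off briefly between attempts. */
mock_status_t retry_while_busy(pec_op_fn op, void *arg,
                               unsigned int max_attempts,
                               unsigned int *attempts_out)
{
    mock_status_t st = MOCK_STATUS_BUSY;
    unsigned int n = 0;

    assert(op != NULL);
    while (n < max_attempts) {
        n++;
        st = op(arg);
        if (st != MOCK_STATUS_BUSY)
            break;
    }
    if (attempts_out != NULL)
        *attempts_out = n;
    return st;
}

/* Test double: reports BUSY until the counter behind arg reaches zero. */
mock_status_t busy_n_times(void *arg)
{
    unsigned int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return MOCK_STATUS_BUSY;
    }
    return MOCK_STATUS_OK;
}
```

      Bounding the number of attempts keeps a softirq caller from spinning
      indefinitely when another context holds the resource for longer
      than expected.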

 Debugging
      The PEC_Put_Dump() and PEC_Get_Dump() functions can be used to print
      the command ring and result ring administration and cached data as well
      as the content of the ring buffers respectively. A slot corresponds to
      a descriptor in the ring. These functions can be used to debug the packet
      I/O functionality.


 <end of document>