xlat: Introduce MAP_REGION2() macro

The current implementation of the memory mapping API favours mapping
memory regions using the biggest possible block size in order to
reduce the number of translation tables needed.

In some cases, this behaviour might not be desirable. When translation
tables are edited at run-time, coarse-grain mappings like that might
need splitting into finer-grain tables. This operation has a
performance cost.

The MAP_REGION2() macro allows specifying the granularity of
translation tables used for the initial mapping of a memory region.
This might increase performance for memory regions that are likely to
be edited in the future, at the expense of a potentially increased
memory footprint.
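
As a rough illustration of that footprint trade-off, the sketch below
counts the level-3 tables needed to map a region at 4KB granularity.
It assumes a 4KB translation granule, 8-byte descriptors and a region
whose base and size are aligned to 2MB. The helper names are
hypothetical and are not part of the library.

```c
#include <stdint.h>

/* Assumptions: 4KB translation granule and 8-byte table descriptors. */
#define GRANULE_SIZE       UINT64_C(4096)
#define DESCRIPTOR_SIZE    UINT64_C(8)
#define ENTRIES_PER_TABLE  (GRANULE_SIZE / DESCRIPTOR_SIZE)   /* 512 */
#define L2_BLOCK_SIZE      (ENTRIES_PER_TABLE * GRANULE_SIZE) /* 2MB */

/*
 * Hypothetical helper, not part of the library: number of level-3
 * tables needed to map a region entirely with 4KB pages. The region
 * base and size are assumed to be aligned to 2MB.
 */
static uint64_t l3_tables_for_region(uint64_t size)
{
	return size / L2_BLOCK_SIZE;
}

/* Extra memory, in bytes, occupied by those level-3 tables. */
static uint64_t l3_tables_footprint(uint64_t size)
{
	return l3_tables_for_region(size) * GRANULE_SIZE;
}
```

Under these assumptions, a 2MB region mapped at 4KB granularity needs
one level-3 table, that is 4KB more memory than a single level-2
block entry.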

The Translation Tables Library Design Guide has been updated to
explain the use case for this macro. A few intermediate section
titles have also been added to make the guide easier to digest.

Change-Id: I04de9302e0ee3d326b8877043a9f638766b81b7b
Co-authored-by: Sandrine Bailleux <sandrine.bailleux@arm.com>
Co-authored-by: Antonio Nino Diaz <antonio.ninodiaz@arm.com>
Signed-off-by: Antonio Nino Diaz <antonio.ninodiaz@arm.com>
diff --git a/docs/xlat-tables-lib-v2-design.rst b/docs/xlat-tables-lib-v2-design.rst
index 3006ce7..f36ee9b 100644
--- a/docs/xlat-tables-lib-v2-design.rst
+++ b/docs/xlat-tables-lib-v2-design.rst
@@ -66,7 +66,8 @@
 - its physical base address;
 - its virtual base address;
 - its size;
-- its attributes.
+- its attributes;
+- its mapping granularity (optional).
 
 See the ``struct mmap_region`` type in `xlat\_tables\_v2.h`_.
 
@@ -79,6 +80,31 @@
 read-write, executable or not, secure or non-secure, and so on). See the
 ``mmap_attr_t`` enumeration type in `xlat\_tables\_v2.h`_.
 
+The granularity controls the translation table level to go down to when mapping
+the region. For example, assuming the MMU has been configured to use a 4KB
+granule size, the library might map a 2MB memory region using either of the two
+following options:
+
+- using a single level-2 translation table entry;
+- using a level-2 intermediate entry to a level-3 translation table (which
+  contains 512 entries, each mapping 4KB).
+
+The first solution potentially requires fewer translation tables, hence
+potentially less memory. However, if part of this 2MB region is later remapped
+with different memory attributes, the library might need to split the existing
+page tables to refine the mappings. If a single level-2 entry has been used
+here, a level-3 table will need to be allocated on the fly and the level-2
+entry modified to point to this new level-3 table. This has a performance cost
+at run-time.
+
+If the user knows upfront that such a remapping operation is likely to happen,
+they might enforce a 4KB mapping granularity for this 2MB region from the
+beginning; remapping some of these 4KB pages on the fly then becomes a
+lightweight operation.
+
+The region's granularity is an optional field; if it is not specified, the
+library will choose the mapping granularity for this region as it sees fit (more
+details can be found in the section `The memory mapping algorithm`_ below).
 
 Translation Context
 ~~~~~~~~~~~~~~~~~~~
@@ -190,6 +216,11 @@
 compatibility breaks, should the ``mmap_region`` structure type evolve in the
 future.
 
+The ``MAP_REGION()`` and ``MAP_REGION_FLAT()`` macros do not allow specifying a
+mapping granularity, which leaves the library implementation free to choose
+it. However, in cases where a specific granularity is required, the
+``MAP_REGION2()`` macro can be used instead.
+
 As explained earlier in this document, when the dynamic mapping feature is
 disabled, there is no notion of dynamic regions. Conceptually, there are only
 static regions. For this reason (and to retain backward compatibility with the
@@ -265,6 +296,9 @@
 Core module
 ~~~~~~~~~~~
 
+From mmap regions to translation tables
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 All the APIs in this module work on a translation context. The translation
 context contains the list of ``mmap_region``, which holds the information of all
 the regions that are mapped at any given time. Whenever there is a request to
@@ -288,14 +322,18 @@
 be added. Changes to the translation tables (as well as the mmap regions list)
 will take effect immediately.
 
+The memory mapping algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The mapping function is implemented as a recursive algorithm. It is however
 bound by the level of depth of the translation tables (the ARMv8-A architecture
 allows up to 4 lookup levels).
 
-By default, the algorithm will attempt to minimize the number of translation
-tables created to satisfy the user's request. It will favour mapping a region
-using the biggest possible blocks, only creating a sub-table if it is strictly
-necessary. This is to reduce the memory footprint of the firmware.
+By default [#granularity-ref]_, the algorithm will attempt to minimize the
+number of translation tables created to satisfy the user's request. It will
+favour mapping a region using the biggest possible blocks, only creating a
+sub-table if it is strictly necessary. This is to reduce the memory footprint of
+the firmware.
 
 The most common reason for needing a sub-table is when a specific mapping
 requires a finer granularity. Misaligned regions also require a finer
@@ -322,6 +360,12 @@
 refer to the comments in the source code of the core module for more details
 about the sorting algorithm in use.
 
+.. [#granularity-ref] That is, when mmap regions do not enforce their mapping
+                      granularity.
+
+TLB maintenance operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The library takes care of performing TLB maintenance operations when required.
 For example, when the user requests removing a dynamic region, the library
 invalidates all TLB entries associated to that region to ensure that these