xlat: Introduce MAP_REGION2() macro

The current implementation of the memory mapping API favours mapping
memory regions using the biggest possible block size in order to
reduce the number of translation tables needed.

This behaviour is not always desirable. When translation tables are
edited at run-time, such coarse-grained mappings might need to be
split into finer-grained tables, and this operation has a performance
cost.

The MAP_REGION2() macro allows specifying the granularity of the
translation tables used for the initial mapping of a memory region.
This might improve performance for memory regions that are likely to
be edited in the future, at the expense of a potentially larger
memory footprint.
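
As an illustration only (not part of this patch), a platform might
request a 4KB granularity for a 2MB region as sketched below. The
type and macro are simplified stand-ins for the library definitions;
the SZ_* constants, EXAMPLE_ATTR, the addresses and the two helper
functions are hypothetical and assume a 4KB translation granule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size constants, assuming a 4KB translation granule. */
#define SZ_4K	((size_t)0x1000)
#define SZ_2M	((size_t)0x200000)
#define SZ_1G	((size_t)0x40000000)

/* Placeholder attribute value; real code would use mmap_attr_t flags. */
#define EXAMPLE_ATTR	0u

/* Simplified stand-in for the library's mmap_region_t. */
typedef struct mmap_region {
	unsigned long long	base_pa;
	uintptr_t		base_va;
	size_t			size;
	unsigned int		attr;
	size_t			granularity;
} mmap_region_t;

/* Simplified stand-in for the MAP_REGION2() macro. */
#define MAP_REGION2(_pa, _va, _sz, _attr, _gr)		\
	{						\
		.base_pa = (_pa),			\
		.base_va = (_va),			\
		.size = (_sz),				\
		.attr = (_attr),			\
		.granularity = (_gr),			\
	}

/* A 2MB region mapped with 4KB pages from the start (made-up address). */
static const mmap_region_t example_mmap[] = {
	MAP_REGION2(0x80000000ULL, 0x80000000UL, SZ_2M, EXAMPLE_ATTR, SZ_4K),
	{0}	/* A zero-sized entry terminates the list. */
};

/* Accept only a valid page or block size, as the new sanity check does. */
static inline int granularity_is_valid(size_t gr)
{
	return (gr == SZ_4K) || (gr == SZ_2M) || (gr == SZ_1G);
}

/*
 * A block of 'block_size' bytes cannot be used when the region asks
 * for a finer granularity, so a sub-table has to be created instead.
 */
static inline int needs_finer_table(const mmap_region_t *mm,
				    size_t block_size)
{
	return mm->granularity < block_size;
}
```

With this region, a 2MB block entry is rejected in favour of a table
of 4KB pages, so later remapping of individual pages would not need
to split a block.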

The Translation Tables Library Design Guide has been updated to
explain the use case for this macro. A few intermediate titles have
also been added to make the guide easier to digest.

Change-Id: I04de9302e0ee3d326b8877043a9f638766b81b7b
Co-authored-by: Sandrine Bailleux <sandrine.bailleux@arm.com>
Co-authored-by: Antonio Nino Diaz <antonio.ninodiaz@arm.com>
Signed-off-by: Antonio Nino Diaz <antonio.ninodiaz@arm.com>
diff --git a/docs/xlat-tables-lib-v2-design.rst b/docs/xlat-tables-lib-v2-design.rst
index 3006ce7..f36ee9b 100644
--- a/docs/xlat-tables-lib-v2-design.rst
+++ b/docs/xlat-tables-lib-v2-design.rst
@@ -66,7 +66,8 @@
 - its physical base address;
 - its virtual base address;
 - its size;
-- its attributes.
+- its attributes;
+- its mapping granularity (optional).
 
 See the ``struct mmap_region`` type in `xlat\_tables\_v2.h`_.
 
@@ -79,6 +80,31 @@
 read-write, executable or not, secure or non-secure, and so on). See the
 ``mmap_attr_t`` enumeration type in `xlat\_tables\_v2.h`_.
 
+The granularity controls the translation table level to go down to when mapping
+the region. For example, assuming the MMU has been configured to use a 4KB
+granule size, the library might map a 2MB memory region using either of the two
+following options:
+
+- using a single level-2 translation table entry;
+- using a level-2 intermediate entry to a level-3 translation table (which
+  contains 512 entries, each mapping 4KB).
+
+The first solution potentially requires fewer translation tables, hence
+potentially less memory.  However, if part of this 2MB region is later remapped
+with different memory attributes, the library might need to split the existing
+page tables to refine the mappings. If a single level-2 entry has been used
+here, a level-3 table will need to be allocated on the fly and the level-2
+modified to point to this new level-3 table. This has a performance cost at
+run-time.
+
+If the user knows upfront that such a remapping operation is likely to happen
+then they might enforce a 4KB mapping granularity for this 2MB region from the
+beginning; remapping some of these 4KB pages on the fly then becomes a
+lightweight operation.
+
+The region's granularity is an optional field; if it is not specified the
+library will choose the mapping granularity for this region as it sees fit (more
+details can be found in `The memory mapping algorithm`_ section below).
 
 Translation Context
 ~~~~~~~~~~~~~~~~~~~
@@ -190,6 +216,11 @@
 compatibility breaks, should the ``mmap_region`` structure type evolve in the
 future.
 
+The ``MAP_REGION()`` and ``MAP_REGION_FLAT()`` macros do not allow specifying a
+mapping granularity, which leaves the library implementation free to choose
+it. However, in cases where a specific granularity is required, the
+``MAP_REGION2()`` macro might be used instead.
+
 As explained earlier in this document, when the dynamic mapping feature is
 disabled, there is no notion of dynamic regions. Conceptually, there are only
 static regions. For this reason (and to retain backward compatibility with the
@@ -265,6 +296,9 @@
 Core module
 ~~~~~~~~~~~
 
+From mmap regions to translation tables
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 All the APIs in this module work on a translation context. The translation
 context contains the list of ``mmap_region``, which holds the information of all
 the regions that are mapped at any given time. Whenever there is a request to
@@ -288,14 +322,18 @@
 be added. Changes to the translation tables (as well as the mmap regions list)
 will take effect immediately.
 
+The memory mapping algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The mapping function is implemented as a recursive algorithm. It is however
 bound by the level of depth of the translation tables (the ARMv8-A architecture
 allows up to 4 lookup levels).
 
-By default, the algorithm will attempt to minimize the number of translation
-tables created to satisfy the user's request. It will favour mapping a region
-using the biggest possible blocks, only creating a sub-table if it is strictly
-necessary. This is to reduce the memory footprint of the firmware.
+By default [#granularity-ref]_, the algorithm will attempt to minimize the
+number of translation tables created to satisfy the user's request. It will
+favour mapping a region using the biggest possible blocks, only creating a
+sub-table if it is strictly necessary. This is to reduce the memory footprint of
+the firmware.
 
 The most common reason for needing a sub-table is when a specific mapping
 requires a finer granularity. Misaligned regions also require a finer
@@ -322,6 +360,12 @@
 refer to the comments in the source code of the core module for more details
 about the sorting algorithm in use.
 
+.. [#granularity-ref] That is, when mmap regions do not enforce their mapping
+                      granularity.
+
+TLB maintenance operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The library takes care of performing TLB maintenance operations when required.
 For example, when the user requests removing a dynamic region, the library
 invalidates all TLB entries associated to that region to ensure that these
diff --git a/include/lib/xlat_tables/xlat_tables_v2.h b/include/lib/xlat_tables/xlat_tables_v2.h
index 59f0955..a4f0efd 100644
--- a/include/lib/xlat_tables/xlat_tables_v2.h
+++ b/include/lib/xlat_tables/xlat_tables_v2.h
@@ -15,20 +15,36 @@
 #include <xlat_mmu_helpers.h>
 #include <xlat_tables_v2_helpers.h>
 
-/* Helper macro to define entries for mmap_region_t. It creates
- * identity mappings for each region.
+/*
+ * Default granularity size for an mmap_region_t.
+ * Useful when no specific granularity is required.
+ *
+ * By default, choose the biggest possible block size allowed by the
+ * architectural state and granule size in order to minimize the number of page
+ * tables required for the mapping.
  */
-#define MAP_REGION_FLAT(adr, sz, attr) MAP_REGION(adr, adr, sz, attr)
+#define REGION_DEFAULT_GRANULARITY	XLAT_BLOCK_SIZE(MIN_LVL_BLOCK_DESC)
+
+/* Helper macro to define an mmap_region_t. */
+#define MAP_REGION(_pa, _va, _sz, _attr)	\
+	_MAP_REGION_FULL_SPEC(_pa, _va, _sz, _attr, REGION_DEFAULT_GRANULARITY)
 
-/* Helper macro to define entries for mmap_region_t. It allows to
- * re-map address mappings from 'pa' to 'va' for each region.
+/* Helper macro to define an mmap_region_t with an identity mapping. */
+#define MAP_REGION_FLAT(_adr, _sz, _attr)			\
+	MAP_REGION(_adr, _adr, _sz, _attr)
+
+/*
+ * Helper macro to define an mmap_region_t to map with the desired granularity
+ * of translation tables.
+ *
+ * The granularity value passed to this macro must be a valid block or page
+ * size. When using a 4KB translation granule, this might be 4KB, 2MB or 1GB.
+ * Passing REGION_DEFAULT_GRANULARITY is also allowed and means that the library
+ * is free to choose the granularity for this region. In this case, it is
+ * equivalent to the MAP_REGION() macro.
  */
-#define MAP_REGION(_pa, _va, _sz, _attr) {			\
-	.base_pa = (_pa),					\
-	.base_va = (_va),					\
-	.size    = (_sz),					\
-	.attr    = (_attr),					\
-	}
+#define MAP_REGION2(_pa, _va, _sz, _attr, _gr)			\
+	_MAP_REGION_FULL_SPEC(_pa, _va, _sz, _attr, _gr)
 
 /*
  * Shifts and masks to access fields of an mmap_attr_t
@@ -86,6 +102,8 @@
 	uintptr_t		base_va;
 	size_t			size;
 	mmap_attr_t		attr;
+	/* Desired granularity. See the MAP_REGION2() macro for more details. */
+	size_t			granularity;
 } mmap_region_t;
 
 /*
diff --git a/include/lib/xlat_tables/xlat_tables_v2_helpers.h b/include/lib/xlat_tables/xlat_tables_v2_helpers.h
index f5e3100..1ea2fc0 100644
--- a/include/lib/xlat_tables/xlat_tables_v2_helpers.h
+++ b/include/lib/xlat_tables/xlat_tables_v2_helpers.h
@@ -27,6 +27,20 @@
 /* Forward declaration */
 struct mmap_region;
 
+/*
+ * Helper macro to define an mmap_region_t.  This macro allows specifying all
+ * the fields of the structure but its parameter list is not guaranteed to
+ * remain stable as we add members to mmap_region_t.
+ */
+#define _MAP_REGION_FULL_SPEC(_pa, _va, _sz, _attr, _gr)	\
+	{							\
+		.base_pa = (_pa),				\
+		.base_va = (_va),				\
+		.size = (_sz),					\
+		.attr = (_attr),				\
+		.granularity = (_gr),				\
+	}
+
 /* Struct that holds all information about the translation tables. */
 struct xlat_ctx {
 	/*
diff --git a/lib/xlat_tables_v2/xlat_tables_internal.c b/lib/xlat_tables_v2/xlat_tables_internal.c
index da658b1..feca964 100644
--- a/lib/xlat_tables_v2/xlat_tables_internal.c
+++ b/lib/xlat_tables_v2/xlat_tables_internal.c
@@ -417,7 +417,8 @@
 				 * descriptors. If not, create a table instead.
 				 */
 				if ((dest_pa & XLAT_BLOCK_MASK(level)) ||
-				    (level < MIN_LVL_BLOCK_DESC))
+				    (level < MIN_LVL_BLOCK_DESC) ||
+				    (mm->granularity < XLAT_BLOCK_SIZE(level)))
 					return ACTION_CREATE_NEW_TABLE;
 				else
 					return ACTION_WRITE_BLOCK_ENTRY;
@@ -590,9 +591,10 @@
 	mmap_region_t *mm = mmap;
 
 	while (mm->size) {
-		tf_printf(" VA:%p  PA:0x%llx  size:0x%zx  attr:0x%x\n",
+		tf_printf(" VA:%p  PA:0x%llx  size:0x%zx  attr:0x%x",
 				(void *)mm->base_va, mm->base_pa,
 				mm->size, mm->attr);
+		tf_printf(" granularity:0x%zx\n", mm->granularity);
 		++mm;
 	};
 	tf_printf("\n");
@@ -613,7 +615,7 @@
 	unsigned long long base_pa = mm->base_pa;
 	uintptr_t base_va = mm->base_va;
 	size_t size = mm->size;
-	mmap_attr_t attr = mm->attr;
+	size_t granularity = mm->granularity;
 
 	unsigned long long end_pa = base_pa + size - 1;
 	uintptr_t end_va = base_va + size - 1;
@@ -622,6 +624,12 @@
 			!IS_PAGE_ALIGNED(size))
 		return -EINVAL;
 
+	if ((granularity != XLAT_BLOCK_SIZE(1)) &&
+		(granularity != XLAT_BLOCK_SIZE(2)) &&
+		(granularity != XLAT_BLOCK_SIZE(3))) {
+		return -EINVAL;
+	}
+
 	/* Check for overflows */
 	if ((base_pa > end_pa) || (base_va > end_va))
 		return -ERANGE;
@@ -663,11 +671,9 @@
 		if (fully_overlapped_va) {
 
 #if PLAT_XLAT_TABLES_DYNAMIC
-			if ((attr & MT_DYNAMIC) ||
+			if ((mm->attr & MT_DYNAMIC) ||
 						(mm_cursor->attr & MT_DYNAMIC))
 				return -EPERM;
-#else
-			(void)attr;
 #endif /* PLAT_XLAT_TABLES_DYNAMIC */
 			if ((mm_cursor->base_va - mm_cursor->base_pa) !=
 							(base_va - base_pa))