Add support for Branch Target Identification

This patch adds the functionality needed for platforms to provide
Branch Target Identification (BTI) extension, introduced to AArch64
in Armv8.5-A by adding BTI instruction used to mark valid targets
for indirect branches. The patch sets new GP bit [50] to the stage 1
Translation Table Block and Page entries to denote guarded EL3 code
pages which will cause processor to trap instructions in protected
pages trying to perform an indirect branch to any instruction other
than BTI.
BTI feature is selected by BRANCH_PROTECTION option which supersedes
the previous ENABLE_PAUTH used for Armv8.3-A Pointer Authentication
and is disabled by default. Enabling BTI requires compiler support
and was tested with GCC versions 9.0.0, 9.0.1 and 10.0.0.
The assembly macros and helpers are modified to accommodate the BTI
instruction.
This is an experimental feature.
Note. The previous ENABLE_PAUTH build option to enable PAuth in EL3
is now made as an internal flag and BRANCH_PROTECTION flag should be
used instead to enable Pointer Authentication.
Note. USE_LIBROM=1 option is currently not supported.

Change-Id: Ifaf4438609b16647dc79468b70cd1f47a623362e
Signed-off-by: Alexei Fedorov <Alexei.Fedorov@arm.com>
diff --git a/lib/aarch64/cache_helpers.S b/lib/aarch64/cache_helpers.S
index 9c40b9d..9ef8ca7 100644
--- a/lib/aarch64/cache_helpers.S
+++ b/lib/aarch64/cache_helpers.S
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013-2014, ARM Limited and Contributors. All rights reserved.
+ * Copyright (c) 2013-2019, ARM Limited and Contributors. All rights reserved.
  *
  * SPDX-License-Identifier: BSD-3-Clause
  */
@@ -91,6 +91,9 @@
 	cbz	x3, exit
 	adr	x14, dcsw_loop_table	// compute inner loop address
 	add	x14, x14, x0, lsl #5	// inner loop is 8x32-bit instructions
+#if ENABLE_BTI
+	add	x14, x14, x0, lsl #2	// inner loop is + "bti j" instruction
+#endif
 	mov	x0, x9
 	mov	w8, #1
 loop1:
@@ -116,6 +119,9 @@
 	br	x14			// jump to DC operation specific loop
 
 	.macro	dcsw_loop _op
+#if ENABLE_BTI
+	bti	j
+#endif
 loop2_\_op:
 	lsl	w7, w6, w2		// w7 = aligned max set number