rpi4: Accommodate "armstub8.bin" header at the beginning of BL31 image

The Raspberry Pi GPU firmware checks for a magic value at offset 240
(0xf0) of the armstub8.bin image it loads. If that value matches,
it writes the kernel load address and the DTB address into subsequent
memory locations.
We can use these addresses to avoid hardcoding these values into the BL31
image, to make it more flexible and a drop-in replacement for the
official armstub8.bin.

Reserving just 16 bytes at offset 240 of the final image file is not easily
possible, though, as this location is in the middle of the generic BL31
entry point code.
However we can prepend an extra section before the actual BL31 image, to
contain the magic and addresses. This needs to be 4KB, because the
actual BL31 entry point needs to be page aligned.

Use the platform linker script hook that the generic code provides, to
add an almost empty 4KB code block before the entry point code. The very
first word contains a branch instruction to jump over this page, into
the actual entry code.
This also gives us plenty of room for the SMP pens later.

Change-Id: I38caa5e7195fa39cbef8600933a03d86f09263d6
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
diff --git a/plat/rpi/rpi4/aarch64/armstub8_header.S b/plat/rpi/rpi4/aarch64/armstub8_header.S
new file mode 100644
index 0000000..246358d
--- /dev/null
+++ b/plat/rpi/rpi4/aarch64/armstub8_header.S
@@ -0,0 +1,37 @@
+/*
+ * Copyright (c) 2019, ARM Limited and Contributors. All rights reserved.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ */
+
+/*
+ * armstub8.bin header to let the GPU firmware recognise this code.
+ * It will then write the load address of the kernel image and the DT
+ * after the header magic in RAM, so we can read those addresses at runtime.
+ */
+
+.text
+	b	armstub8_end
+
+.global stub_magic
+.global dtb_ptr32
+.global kernel_entry32
+
+.org 0xf0
+armstub8:
+stub_magic:
+	.word 0x5afe570b
+stub_version:
+	.word 0
+dtb_ptr32:
+	.word 0x0
+kernel_entry32:
+	.word 0x0
+
+/*
+ * Technically an offset of 0x100 would suffice, but the follow-up code
+ * (bl31_entrypoint.S at BL31_BASE) needs to be page aligned, so pad here
+ * till the end of the first 4K page.
+ */
+.org 0x1000
+armstub8_end: