board: emulation: Add QEMU sbsa support

Add support for the Arm SBSA [1] v0.3+ machine (sbsa-ref) provided by QEMU [2].

Unlike other Arm-based platforms, the machine provides only a minimal
FDT that contains the number of CPUs, the amount of memory and the
machine version. The boot firmware has to provide ACPI tables to the OS.
Because of this design, a full DTB is added here as well so that U-Boot's
drivers function properly. The DTB is appended to the end of the U-Boot
image and is merged with the DTB provided by QEMU.
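Because sbsa-ref starts U-Boot from XIP flash, that appended DTB must travel with the image when relocate_check in boot0.h copies it to DRAM. The stub copies from __image_copy_start to __image_copy_end plus 1 MiB. The C function below is a rough, hypothetical model of that copy loop. It uses plain buffers in place of the XIP and DRAM addresses. The name relocate_image and its interface are illustrative assumptions and do not exist in U-Boot. It also assumes that __image_copy_start equals _start.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reserve for the DTB appended at _end, mirroring "#0x100000" in boot0.h. */
#define DTB_RESERVE 0x100000u

/*
 * Hypothetical C model of relocate_check in boot0.h (name and
 * buffer-based interface are assumptions for illustration only).
 * run_start:  where the image currently executes (XIP flash)
 * link_start: where it was linked to run (DRAM, _TEXT_BASE)
 * image_len:  __image_copy_end - __image_copy_start
 * Returns the number of bytes copied, or 0 if no relocation was needed.
 * Both buffers must hold image_len + dtb_reserve rounded up to 16 bytes.
 */
static size_t relocate_image(const uint8_t *run_start, uint8_t *link_start,
			     size_t image_len, size_t dtb_reserve)
{
	size_t total = image_len + dtb_reserve;
	size_t off = 0;

	/* "beq reset": already running at the link address, nothing to copy */
	if (run_start == link_start)
		return 0;

	/*
	 * Same shape as the ldp/stp loop: copy one 16-byte pair first and
	 * compare afterwards, so the copy is rounded up to a multiple of 16.
	 */
	do {
		memcpy(link_start + off, run_start + off, 16);
		off += 16;
	} while (off < total);

	/* The real stub now branches to the linked _start ("br x2"). */
	return off;
}
```

For the real image, a call would look like relocate_image(xip_base, dram_base, image_len, DTB_RESERVE), where xip_base and dram_base are hypothetical pointers. Checking after the copy, as the assembly does, keeps the loop to one compare per 16 bytes. The cost is that up to 15 bytes past the end may be copied.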

In addition, provide documentation on how to use the board, enable binman
to generate both ROMs required to boot, and add ACPI tables to make it
fully compatible with the EDK2 reference implementation.

The board was tested using Fedora 40 AArch64 Workstation. It is able
to boot from USB, AHCI or the network.

Tested and found working:
- serial
- PCI
- xHCI
- Bochs display
- AHCI
- network using e1000e
- CPU init
- Booting Fedora 40

[1] Server Base System Architecture (SBSA)
[2] https://www.qemu.org/docs/master/system/arm/sbsa.html

Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com>
Cc: Peter Robinson <pbrobinson@gmail.com>
Cc: Simon Glass <sjg@chromium.org>
Cc: Tom Rini <trini@konsulko.com>
diff --git a/arch/arm/include/asm/arch-qemu-sbsa/boot0.h b/arch/arm/include/asm/arch-qemu-sbsa/boot0.h
new file mode 100644
index 0000000..4a1a254
--- /dev/null
+++ b/arch/arm/include/asm/arch-qemu-sbsa/boot0.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * sbsa-ref starts U-Boot in XIP memory. U-Boot needs to be relocated
+ * to DRAM, which is already initialized. This simple loader is used
+ * instead of SPL.
+ */
+relocate_check:
+	/* x0 contains the pointer to FDT provided by ATF */
+	adr	x1, _start		/* x1 <- Runtime value of _start */
+	ldr	x2, _TEXT_BASE		/* x2 <- Linked value of _start */
+	subs	x9, x1, x2		/* x9 <- Run-vs-link offset */
+	beq	reset
+
+	adrp	x1, __image_copy_start		/* x1 <- address bits [31:12] */
+	add	x1, x1, :lo12:__image_copy_start /* x1 <- address bits [11:00] */
+	adrp	x3, __image_copy_end		/* x3 <- address bits [31:12] */
+	add	x3, x3, :lo12:__image_copy_end	/* x3 <- address bits [11:00] */
+	add	x3, x3, #0x100000		/* 1 MiB for the DTB found at _end */
+
+copy_loop:
+	ldp	x10, x11, [x1], #16	/* copy from source address [x1] */
+	stp	x10, x11, [x2], #16	/* copy to   target address [x2] */
+	cmp	x1, x3			/* until source end address [x3] */
+	b.lo	copy_loop
+
+	isb
+	ldr	x2, _TEXT_BASE		/* x2 <- Linked value of _start */
+	br	x2			/* Jump to linked address */
+	/* Never reaches this point */
+1:
+	wfi
+	b 1b
+
+relocate_done:
\ No newline at end of file