perf(bl31): convert cpu_data fetching to C

The assembly routines are opaque to the compiler, so it cannot inline
them. There is also no requirement for them to run without a stack:
every call site already has a stack available. Convert them to C so
that the compiler is free to inline them.

On AArch32, _cpu_data must still be callable from the entrypoint, so it
stays as the one exception.

We can also straighten out the type of the cpu_ops_ptr member so we
don't have to cast it everywhere.
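
For illustration only, the lookup that the assembly performs
(base of percpu_data plus index times CPU_DATA_SIZE) can be sketched in
C roughly as below. This is not the code from this patch: the struct
layout, the PLATFORM_CORE_COUNT value and the cpu_data_by_index_sketch
name are assumptions made so that the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for the sketch; the real one comes from the platform. */
#define PLATFORM_CORE_COUNT 4U

struct cpu_ops; /* opaque here; only a pointer to it is stored */

/* Hypothetical, simplified layout. The real cpu_data has more members. */
typedef struct cpu_data {
	struct cpu_ops *cpu_ops_ptr; /* typed pointer, so no cast at use sites */
	uint64_t scratch[7];         /* placeholder for the remaining members */
} cpu_data_t;

/* In this sketch the element size is simply sizeof(cpu_data_t). */
#define CPU_DATA_SIZE sizeof(cpu_data_t)

static cpu_data_t percpu_data[PLATFORM_CORE_COUNT];

/*
 * Array indexing performs the same arithmetic as the open-coded
 * assembly: percpu_data + cpu_index * CPU_DATA_SIZE. Being a static
 * inline C function, the compiler can fold it into its callers.
 */
static inline cpu_data_t *cpu_data_by_index_sketch(unsigned int cpu_index)
{
	assert(cpu_index < PLATFORM_CORE_COUNT);
	return &percpu_data[cpu_index];
}
```

The entrypoint macro cannot call such a function before the C runtime
is set up, which is presumably why the same multiply-and-add is written
out by hand in the hunk below.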

Change-Id: I9c2939a955b396edf26b99ef36318eebeaab13e6
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
diff --git a/include/arch/aarch64/el3_common_macros.S b/include/arch/aarch64/el3_common_macros.S
index fce0f2c..ee5d8d9 100644
--- a/include/arch/aarch64/el3_common_macros.S
+++ b/include/arch/aarch64/el3_common_macros.S
@@ -65,7 +65,11 @@
 	 * ---------------------------------------------------------------------
 	 */
 	bl	plat_my_core_pos
-	bl	_cpu_data_by_index
+	/* index into the cpu_data */
+	mov_imm	x1, CPU_DATA_SIZE
+	mul	x0, x0, x1
+	adr_l	x1, percpu_data
+	add	x0, x0, x1
 	msr	tpidr_el3, x0
 #endif /* IMAGE_BL31 */