arm: mvebu: Fix moving internal registers

Commit 5bb2c550b11e ("arm: mvebu: Move internal registers in
arch_very_early_init() function") moved code from file cpu.c to lowlevel.c,
which moves Marvell internal registers from address INTREG_BASE_ADDR_REG to
SOC_REGS_PHY_BASE.

But the steps describing how to do it correctly were documented only in
older U-Boot versions and commit cefd764222ee ("arm: mvebu: Fix internal
register config on A38x") probably unintentionally removed important
details about MMU from code comments around.

Commit 5bb2c550b11e ("arm: mvebu: Move internal registers in
arch_very_early_init() function") implemented code movement according to
(now incomplete) comments which resulted in semi-broken code.

The result is that I-cache is currently disabled for all Armada 38x boards
and maybe there are some other (unreported / undetected) issues.

Reimplement it correctly. First flush all caches, then disable MMU and L2
cache and then move Marvell internal registers. There is no need to
explicitly disable I-cache.

After this change lzmadec command with lzma image of 0x7000000 bytes is
doing decompression just 5 seconds. Before this change it was 30 seconds.

To make lowlevel.S code more readable, extend asm/pl310.h header file to be
compatible with assembler and use macros from this file.

Fixes: 5bb2c550b11e ("arm: mvebu: Move internal registers in arch_very_early_init() function")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Stefan Roese <sr@denx.de>
diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h
index f69e9e4..9d4cd68 100644
--- a/arch/arm/include/asm/pl310.h
+++ b/arch/arm/include/asm/pl310.h
@@ -7,13 +7,12 @@
 #ifndef _PL310_H_
 #define _PL310_H_
 
-#include <linux/types.h>
-
 /* Register bit fields */
 #define PL310_AUX_CTRL_ASSOCIATIVITY_MASK	(1 << 16)
 #define L2X0_DYNAMIC_CLK_GATING_EN		(1 << 1)
 #define L2X0_STNDBY_MODE_EN			(1 << 0)
 #define L2X0_CTRL_EN				1
+#define L2X0_CTRL_OFF				0x100
 
 #define L310_SHARED_ATT_OVERRIDE_ENABLE		(1 << 22)
 #define L310_AUX_CTRL_DATA_PREFETCH_MASK	(1 << 28)
@@ -27,6 +26,10 @@
 #define L2X0_CACHE_ID_RTL_MASK          0x3f
 #define L2X0_CACHE_ID_RTL_R3P2          0x8
 
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+
 struct pl310_regs {
 	u32 pl310_cache_id;
 	u32 pl310_cache_type;
@@ -87,3 +90,5 @@
 void pl310_clean_inval_range(u32 start, u32 end);
 
 #endif
+
+#endif