feat(msm8916): clear CACHE_LOCK for MMU-500 r2p0+

Newer Qualcomm platforms similar to MSM8916 use MMU-500 r2p0+ instead
of MMU-500 r0p0. On these versions it is necessary to clear the
SMMU_sACR.CACHE_LOCK bit to allow the normal world to write to
SMMU_CBn_ACTLR. Without this, Linux shows a warning and is unable to
work around the errata in MMU-500:

  arm-smmu 1e00000.iommu: Failed to disable prefetcher
    [errata #841119 and #826419], check ACR.CACHE_LOCK

Handle this dynamically at runtime by enabling all the necessary SMMU
clocks and checking the IDR7 register for MMU-500 r2p0+. This must be
applied to both SMMUs on the platform: APPS and GPU.
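
As a rough illustration of the version check described above, the
following is a minimal sketch, not the code added by this change. The
register offsets and field positions (SMMU_sACR at 0x010 with CACHE_LOCK
at bit 26, SMMU_IDR7 at 0x03c with the major revision in bits [7:4])
follow the Arm SMMU register layout; the helper names reg_read(),
reg_write() and smmu_cache_unlock() are hypothetical, and the SMMU
clocks are assumed to be enabled already.

```c
#include <stdint.h>

/* Offsets within SMMU global register space 0. */
#define SMMU_SACR		0x010u
#define SMMU_SACR_CACHE_LOCK	(UINT32_C(1) << 26)
#define SMMU_IDR7		0x03cu
#define SMMU_IDR7_MINOR(val)	(((val) >> 0) & 0xfu)
#define SMMU_IDR7_MAJOR(val)	(((val) >> 4) & 0xfu)

/* Hypothetical accessors; real firmware would use its own MMIO helpers. */
static inline uint32_t reg_read(volatile uint32_t *base, uint32_t offset)
{
	return base[offset / 4];
}

static inline void reg_write(volatile uint32_t *base, uint32_t offset,
			     uint32_t val)
{
	base[offset / 4] = val;
}

/*
 * Clear CACHE_LOCK only on MMU-500 r2p0 and newer.
 * Returns 1 if the bit was cleared, 0 if the SMMU was left untouched.
 */
static int smmu_cache_unlock(volatile uint32_t *smmu)
{
	uint32_t version = reg_read(smmu, SMMU_IDR7);

	if (SMMU_IDR7_MAJOR(version) < 2) {
		return 0;
	}

	reg_write(smmu, SMMU_SACR,
		  reg_read(smmu, SMMU_SACR) & ~SMMU_SACR_CACHE_LOCK);
	return 1;
}
```

In this sketch the same function would be called once per SMMU, with the
APPS and GPU base addresses respectively.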

While at it, clean up the clock handling: Leave the SMMU clocks on
because the normal world will need them again while booting. But make
sure the vote register of the RPM co-processor does not keep these
clocks always-on. For some reason, some platforms seem to have a
non-zero reset value for GCC_RPM_SMMU_CLOCK_BRANCH_ENA_VOTE.

Change-Id: I34cf7d3f2db977b0930eb6e64a870ecaf02a7573
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
diff --git a/plat/qti/msm8916/include/msm8916_mmap.h b/plat/qti/msm8916/include/msm8916_mmap.h
index 35e3b86..dc420fc 100644
--- a/plat/qti/msm8916/include/msm8916_mmap.h
+++ b/plat/qti/msm8916/include/msm8916_mmap.h
@@ -22,6 +22,7 @@
 
 #define APPS_SMMU_BASE		(PCNOC_BASE + 0x1e00000)
 #define APPS_SMMU_QCOM		(APPS_SMMU_BASE + 0xf0000)
+#define GPU_SMMU_BASE		(PCNOC_BASE + 0x1f00000)
 
 #define BLSP1_BASE		(PCNOC_BASE + 0x7880000)
 #define BLSP1_UART_BASE(n)	(BLSP1_BASE + 0x2f000 + (((n) - 1) * 0x1000))