Fix the CAS spinlock implementation

Make the spinlock implementation use ARMv8.1-LSE CAS instruction based
on a platform build option. The CAS-based implementation used to be
unconditionally selected for all ARM8.1+ platforms.

The previous CAS spinlock implementation had a bug wherein the spin_unlock()
implementation had an `sev` after `stlr` which is not sufficient. A dsb is
needed to ensure that the stlr completes prior to the sev. Having a dsb is
heavyweight and a better solution would be to use load exclusive semantics
to monitor the lock and wake up from wfe when a store happens to the lock.
The patch implements the same.

Change-Id: I5283ce4a889376e4cc01d1b9d09afa8229a2e522
Signed-off-by: Soby Mathew <soby.mathew@arm.com>
Signed-off-by: Olivier Deprez <olivier.deprez@arm.com>
diff --git a/docs/design/firmware-design.rst b/docs/design/firmware-design.rst
index dc08208..2cbd9c9 100644
--- a/docs/design/firmware-design.rst
+++ b/docs/design/firmware-design.rst
@@ -2540,8 +2540,11 @@
 This Architecture Extension is targeted when ``ARM_ARCH_MAJOR`` >= 8, or when
 ``ARM_ARCH_MAJOR`` == 8 and ``ARM_ARCH_MINOR`` >= 1.
 
--  The Compare and Swap instruction is used to implement spinlocks. Otherwise,
-   the load-/store-exclusive instruction pair is used.
+-  By default, a load-/store-exclusive instruction pair is used to implement
+   spinlocks. The ``USE_SPINLOCK_CAS`` build option when set to 1 selects the
+   spinlock implementation using the ARMv8.1-LSE Compare and Swap instruction.
+   Notice this instruction is only available in AArch64 execution state, so
+   the option is only available to AArch64 builds.
 
 Armv8.2-A
 ~~~~~~~~~
diff --git a/docs/getting_started/user-guide.rst b/docs/getting_started/user-guide.rst
index 44bfb7a..ae276f2 100644
--- a/docs/getting_started/user-guide.rst
+++ b/docs/getting_started/user-guide.rst
@@ -820,6 +820,10 @@
    reduces SRAM usage. Refer to `Library at ROM`_ for further details. Default
    is 0.
 
+-  ``USE_SPINLOCK_CAS``: Setting this build flag to 1 selects the spinlock
+    implementation variant using the ARMv8.1-LSE compare-and-swap instruction.
+    Notice this option is experimental and only available to AArch64 builds.
+
 -  ``V``: Verbose build. If assigned anything other than 0, the build commands
    are printed. Default is 0.