Fix the CAS spinlock implementation

Make the spinlock implementation use ARMv8.1-LSE CAS instruction based
on a platform build option. The CAS-based implementation used to be
unconditionally selected for all ARM8.1+ platforms.

The previous CAS spinlock implementation had a bug wherein the spin_unlock()
implementation had an `sev` after `stlr` which is not sufficient. A dsb is
needed to ensure that the stlr completes prior to the sev. Having a dsb is
heavyweight and a better solution would be to use load exclusive semantics
to monitor the lock and wake up from wfe when a store happens to the lock.
The patch implements the same.

Change-Id: I5283ce4a889376e4cc01d1b9d09afa8229a2e522
Signed-off-by: Soby Mathew <soby.mathew@arm.com>
Signed-off-by: Olivier Deprez <olivier.deprez@arm.com>
diff --git a/docs/getting_started/user-guide.rst b/docs/getting_started/user-guide.rst
index 44bfb7a..ae276f2 100644
--- a/docs/getting_started/user-guide.rst
+++ b/docs/getting_started/user-guide.rst
@@ -820,6 +820,10 @@
    reduces SRAM usage. Refer to `Library at ROM`_ for further details. Default
    is 0.
 
+-  ``USE_SPINLOCK_CAS``: Setting this build flag to 1 selects the spinlock
+    implementation variant using the ARMv8.1-LSE compare-and-swap instruction.
+    Notice this option is experimental and only available to AArch64 builds.
+
 -  ``V``: Verbose build. If assigned anything other than 0, the build commands
    are printed. Default is 0.