BUILD: threads: fix again the __ha_cas_dw() definition

This low-level asm implementation of a double CAS was implemented only
for certain architectures (x86_64, armv7, armv8). When threads are not
used, they were not defined, but since they were called directly from
a few locations, they were causing build issues on certain platforms
with threads disabled. This was addressed in commit f4436e1 ("BUILD:
threads: Add __ha_cas_dw fallback for single threaded builds") by
making it fall back to HA_ATOMIC_CAS() when threads are not defined,
but this actually made the situation worse by breaking other cases.

This patch fixes this by creating a high-level macro HA_ATOMIC_DWCAS()
which is similar to HA_ATOMIC_CAS() except that it's intended to work
on a double word, and which rely on the asm implementations when threads
are in use, and uses its own open-coded implementation when threads are
not used. The 3 call places relying on __ha_cas_dw() were updated to
use HA_ATOMIC_DWCAS() instead.

This change was tested on i586, x86_64, armv7, armv8 with and without
threads with gcc 4.7, armv8 with gcc 5.4 with and without threads, as
well as i586 with gcc-3.4 without threads. It will need to be backported
to 1.9 along with the fix above to fix build on armv7 with threads
disabled.
diff --git a/src/memory.c b/src/memory.c
index ef7ec93..140671b 100644
--- a/src/memory.c
+++ b/src/memory.c
@@ -252,7 +252,7 @@
 				break;
 			new.free_list = *POOL_LINK(entry, cmp.free_list);
 			new.seq = cmp.seq + 1;
-			if (__ha_cas_dw(&entry->free_list, &cmp, &new) == 0)
+			if (HA_ATOMIC_DWCAS(&entry->free_list, &cmp, &new) == 0)
 				continue;
 			free(cmp.free_list);
 			_HA_ATOMIC_SUB(&entry->allocated, 1);