BUG/MINOR: xxhash: make sure armv6 uses memcpy()

There was a special case made to allow ARMv6 to use unaligned accesses
via a cast in xxHash when __ARM_FEATURE_UNALIGNED is defined. But while
ARMv6 (and v7) does support unaligned accesses, it only does so for
32-bit loads, not 64-bit ones, leading to bus errors when the compiler
emits an ldrd instruction and the input (e.g. a pattern) is not aligned,
as in issue #1035.

Note that v7 was properly using the packed approach here and was safe;
however, haproxy versions 2.3 and older use the old r39 xxhash code,
which has the same issue for armv7. A slightly different fix is required
there: using a different definition of packed for 32-bit and 64-bit
accesses.

The problem is really visible when running v7 code on a v8 kernel,
because such kernels do not implement alignment trap emulation, so the
process dies when this happens. This is why, in the issue above, it was
only detected under lxc. The emulation could also have been disabled on
v7 by writing zero to /proc/cpu/alignment.

This commit is a backport of xxhash commit a470f2ef ("update default memory
access for armv6").

Thanks to @srkunze for the report and tests, @stgraber for his help on
setting up an easy reproducer outside of lxc, and @Cyan4973 for the
discussion around the best way to fix this. Details and alternate patches
available on https://github.com/Cyan4973/xxHash/issues/490.

(cherry picked from commit 4acb99f8672232753adb36e57b45e80e5bd87783)
[wt: used the different version suitable for backporting, using the
 distinct packed settings]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 59ad20e080aa9dd9a197c074b18850b99c94b050)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 5b1f60dd07e35539dacc048d9c4f37922c161ade)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 77eed6c7230dbde6f19ff2ed03c0d0a44058f05e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
diff --git a/src/xxhash.c b/src/xxhash.c
index 31424de..60163eb 100644
--- a/src/xxhash.c
+++ b/src/xxhash.c
@@ -39,8 +39,14 @@
 // For others CPU, the compiler will be more cautious, and insert extra code to ensure aligned access is respected.
 // If you know your target CPU supports unaligned memory access, you want to force this option manually to improve performance.
 // You can also enable this parameter if you know your input data will always be aligned (boundaries of 4, for U32).
+// 32-bit ARM is more annoying: modern cores do support unaligned accesses,
+// but not on 64-bit data (the ldrd instruction causes an alignment exception).
+// Because of this we need to split the condition for 32- and 64-bit accesses.
 #if defined(__ARM_FEATURE_UNALIGNED) || defined(__i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)
 #  define XXH_USE_UNALIGNED_ACCESS 1
+#  if !defined(__arm__)
+#    define XXH_USE_UNALIGNED_ACCESS64 1
+#  endif
 #endif
 
 // XXH_ACCEPT_NULL_INPUT_POINTER :
@@ -118,6 +124,12 @@
 #  define _PACKED
 #endif
 
+#if defined(__GNUC__)  && !defined(XXH_USE_UNALIGNED_ACCESS64)
+#  define _PACKED64 __attribute__ ((packed))
+#else
+#  define _PACKED64
+#endif
+
 #if !defined(XXH_USE_UNALIGNED_ACCESS) && !defined(__GNUC__)
 #  ifdef __IBMC__
 #    pragma pack(1)
@@ -133,7 +145,7 @@
 typedef struct _U64_S
 {
     U64 v;
-} _PACKED U64_S;
+} _PACKED64 U64_S;
 
 #if !defined(XXH_USE_UNALIGNED_ACCESS) && !defined(__GNUC__)
 #  pragma pack(pop)
@@ -479,7 +491,7 @@
 #else
     XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN;
 
-#  if !defined(XXH_USE_UNALIGNED_ACCESS)
+#  if !defined(XXH_USE_UNALIGNED_ACCESS64)
     if ((((size_t)input) & 7)==0)   // Input is aligned, let's leverage the speed advantage
     {
         if ((endian_detected==XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT)