crc32: Add crc32 implementation using __builtin_aarch64_crc32b

ARMv8.0 has optional crc32 instruction for crc32 calculation. The
instruction is mandatory since ARMv8.1. The crc32 calculation is
faster using the dedicated instruction, e.g. 1.4 GHz iMX8MN gives:

  => time crc32 0x50000000 0x2000000
  time: 0.126 seconds # crc32 instruction
  time: 0.213 seconds # software crc32

Add implementation using the compiler builtin wrapper for the crc32
instruction and enable it by default, since we don't support any
platforms which do not implement this instruction.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Simon Glass <sjg@chromium.org>
[trini: Make crc32_table guarded by CONFIG_ARM64_CRC32]
Signed-off-by: Tom Rini <trini@konsulko.com>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index c68e598..ce977bf 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -18,7 +18,11 @@
 				 $(call cc-option, -march=armv7))
 arch-$(CONFIG_CPU_V7M)		=-march=armv7-m
 arch-$(CONFIG_CPU_V7R)		=-march=armv7-r
+ifeq ($(CONFIG_ARM64_CRC32),y)
+arch-$(CONFIG_ARM64)		=-march=armv8-a+crc
+else
 arch-$(CONFIG_ARM64)		=-march=armv8-a
+endif
 
 # On Tegra systems we must build SPL for the armv4 core on the device
 # but otherwise we can use the value in CONFIG_SYS_ARM_ARCH