libc/memset: Implement function in assembler

Trace analysis of FVP_Base_AEMv8A model running in
Aarch32 mode with the build options listed below:
TRUSTED_BOARD_BOOT=1 GENERATE_COT=1
ARM_ROTPK_LOCATION=devel_ecdsa KEY_ALG=ecdsa
ROT_KEY=plat/arm/board/common/rotpk/arm_rotprivk_ecdsa.pem
shows that when auth_signature() gets called
71.84% of CPU execution time is spent in memset() function
written in C using single byte write operations,
see lib\libc\memset.c.
This patch replaces C memset() implementation with assembler
version giving the following results:
- for Aarch32 in auth_signature() call memset() CPU time
reduced to 24.84%.
- Number of CPU instructions executed during TF-A
boot stage before start of BL33 in RELEASE builds:
----------------------------------------------
|  Arch   |     C      |  assembler |    %   |
----------------------------------------------
| Aarch32 | 2073275460 | 1487400003 | -28.25 |
| Aarch64 | 2056807158 | 1244898303 | -39.47 |
----------------------------------------------
The patch also replaces memset.c with aarch64/memset.S
in plat\nvidia\tegra\platform.mk.

Change-Id: Ifbf085a2f577a25491e2d28446ee95a4ac891597
Signed-off-by: Alexei Fedorov <Alexei.Fedorov@arm.com>
diff --git a/lib/libc/libc.mk b/lib/libc/libc.mk
index 93d30d0..90a2a1e 100644
--- a/lib/libc/libc.mk
+++ b/lib/libc/libc.mk
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2016-2019, ARM Limited and Contributors. All rights reserved.
+# Copyright (c) 2016-2020, ARM Limited and Contributors. All rights reserved.
 #
 # SPDX-License-Identifier: BSD-3-Clause
 #
@@ -13,7 +13,6 @@
 			memcpy.c			\
 			memmove.c			\
 			memrchr.c			\
-			memset.c			\
 			printf.c			\
 			putchar.c			\
 			puts.c				\
@@ -28,8 +27,14 @@
 
 ifeq (${ARCH},aarch64)
 LIBC_SRCS	+=	$(addprefix lib/libc/aarch64/,	\
+			memset.S			\
 			setjmp.S)
 endif
 
+ifeq (${ARCH},aarch32)
+LIBC_SRCS	+=	$(addprefix lib/libc/aarch32/,	\
+			memset.S)
+endif
+
 INCLUDES	+=	-Iinclude/lib/libc		\
 			-Iinclude/lib/libc/$(ARCH)	\