libc: memset: improve performance by avoiding single byte writes

Currently our memset() implementation is safe, but slow. The main reason
for that seems to be the single byte writes that it issues, which can
show horrible performance, depending on the implementation of the
load/store subsystem.

Improve the algorithm by trying to issue 64-bit writes. As this only
works with aligned pointers, have a head and a tail section which
covers unaligned pointers, and leave the bulk of the work to the middle
section that does use 64-bit writes.

Put through some unit tests, which exercise all combinations of nasty
input parameters (pointers with various alignments, various odd and even
sizes, corner cases of content to write (-1, 256)).

Change-Id: I28ddd3d388cc4989030f1a70447581985368d5bb
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
1 file changed