arm: bitops: fix find_next_zero_bit() for case size < 32

find_next_zero_bit() incorrectly handles cases when:
- total bitmap size < 32
- rest of bits to process

static inline int find_next_zero_bit(void *addr, int size, int offset)
{
	unsigned long *p = ((unsigned long *)addr) + (offset >> 5);
	unsigned long result = offset & ~31UL;
	unsigned long tmp;

	if (offset >= size)
		return size;
	size -= result;
	offset &= 31UL;
	if (offset) {
		tmp = *(p++);
		tmp |= ~0UL >> (32-offset);
		if (size < 32)
[1]
			goto found_first;
		if (~tmp)
			goto found_middle;
		size -= 32;
		result += 32;
	}
	while (size & ~31UL) {
		tmp = *(p++);
		if (~tmp)
			goto found_middle;
		result += 32;
		size -= 32;
	}
[2]
	if (!size)
		return result;
	tmp = *p;

found_first:
[3]  tmp |= ~0UL >> size;

^^^ algo can reach above line from from points:
 [1] offset > 0 and size < 32, tmp[offset-1..0] bits set to 1
 [2] size < 32 - rest of bits to process
 in both cases bits to search are tmp[size-1..0], but line [3] will simply
 set all tmp[31-size..0] bits to 1 and ffz(tmp) below will fail.

example: bitmap size = 16, offset = 0, bitmap is empty.
 code will go through the point [2], tmp = 0x0
 after line [3] => tmp = 0xFFFF and ffz(tmp) will return 16.

found_middle:
	return result + ffz(tmp);
}

Fix it by correctly seting tmp[31..size] bits to 1 in the above case [3].

Fixes: 81e9fe5a2988 ("arm: implement find_next_zero_bit function")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
index f33efeb..2750d9b 100644
--- a/arch/arm/include/asm/bitops.h
+++ b/arch/arm/include/asm/bitops.h
@@ -158,7 +158,7 @@
 	tmp = *p;
 
 found_first:
-	tmp |= ~0UL >> size;
+	tmp |= ~0UL << size;
 found_middle:
 	return result + ffz(tmp);
 }