BUG/MEDIUM: tools: fix direction of my_ffsl()

Commit 27346b01a ("OPTIM: tools: optimize my_ffsl() for x86_64") optimized
my_ffsl() for intensive use cases in the scheduler, but as happens half of
the time, I got the direction wrong and it scanned bits from the wrong end
(bsr finds the highest set bit whereas bsf finds the lowest one). It doesn't
matter for the scheduler nor the fd cache, but it broke cpu-map with threads,
which heavily relies on proper ordering.
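
For illustration only, here is a minimal sketch of the two directions using
compiler builtins instead of inline asm. The helper names lowest_bit_pos()
and highest_bit_pos() are hypothetical and are not part of the tree; the
builtins assume gcc >= 3.4 or clang, and the result is undefined for a == 0:

```c
#include <limits.h>

/* Hypothetical helpers for illustration, not HAProxy code.
 * Both are undefined when a == 0.
 */

/* 1-based position of the lowest set bit: what "bsf" + cnt++ yields */
static unsigned int lowest_bit_pos(unsigned long a)
{
	return (unsigned int)__builtin_ctzl(a) + 1;
}

/* 1-based position of the highest set bit: what "bsr" + cnt++ yields */
static unsigned int highest_bit_pos(unsigned long a)
{
	return (unsigned int)(sizeof(a) * CHAR_BIT) - (unsigned int)__builtin_clzl(a);
}
```

With a mask of 0x28 (bits 3 and 5 set), lowest_bit_pos() returns 4 while
highest_bit_pos() returns 6. Both agree only when a single bit is set.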

We should probably consider dropping support for gcc < 3.4 and switching
to builtins for these functions, though those are often just as ambiguous.

No backport is needed.
diff --git a/include/common/standard.h b/include/common/standard.h
index 3f8d2d0..53d7f9f 100644
--- a/include/common/standard.h
+++ b/include/common/standard.h
@@ -815,7 +815,7 @@
 	unsigned long cnt;
 
 #if defined(__x86_64__)
-	__asm__("bsr %1,%0\n" : "=r" (cnt) : "rm" (a));
+	__asm__("bsf %1,%0\n" : "=r" (cnt) : "rm" (a));
 	cnt++;
 #else