[OPTIM] halog: use a faster zero test in fgets()

A new idea came up to detect the presence of a null byte in a word. It saves several operations compared to the previous test and eliminates the jumps (about 6 instructions, which can run 2-by-2 in parallel). This optimisation alone improved the line count speed by about 30%.

commit: 1769a18f6257ee1025774b0540f580271c75c4fa
author: Willy Tarreau <w@1wt.eu> Tue May 04 10:47:57 2010 +0200
committer: Willy Tarreau <w@1wt.eu> Tue May 04 11:04:54 2010 +0200
tree: 34a0dd1f680fdaf4d2c3cb7177284afe0ef562a2
parent: 0f423a70730bc42de75222b3c00404532f084010
diff --git a/contrib/halog/fgets2.c b/contrib/halog/fgets2.c
index 13918a7..61d9ef5 100644
--- a/contrib/halog/fgets2.c
+++ b/contrib/halog/fgets2.c

@@ -24,12 +24,25 @@
 // return 1 if the integer contains at least one zero byte
 static inline unsigned int has_zero(unsigned int x)
 {
-	if (!(x & 0xFF000000U) ||
-	    !(x & 0xFF0000U) ||
-	    !(x & 0xFF00U) ||
-	    !(x & 0xFFU))
-		return 1;
-	return 0;
+	unsigned int y;
+
+	/* Principle: we want to perform 4 tests on one 32-bit int at once. For
+	 * this, we have to simulate an SIMD instruction which we don't have by
+	 * default. The principle is that a zero byte is the only one which
+	 * will cause a 1 to appear on the upper bit of a byte/word/etc... when
+	 * we subtract 1. So we can detect a zero byte if a one appears at any
+	 * of the bits 7, 15, 23 or 31 where it was not. It takes only one
+	 * instruction to test for the presence of any of these bits, but it is
+	 * still complex to check for their initial absence. Thus, we'll
+	 * proceed differently : we first save and clear only those bits, then
+	 * we check in the final result if one of them is present and was not.
+	 */
+	y = x;
+	x = ~x & 0x80808080; /* save and invert bits 7, 15, 23, 31 */
+	y &= 0x7F7F7F7F;     /* clear them */
+	y -= 0x01010101;     /* generate a carry */
+	y &= x;              /* clear the bits that were already set */
+	return !!y;
 }
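
The following is a minimal sketch, not part of the commit, for comparing the two versions of has_zero() from the diff above. The names has_zero_bytewise, has_zero_masked, count_missed_zeros and has_zero_self_check are hypothetical, and uint32_t is assumed in place of unsigned int so the 32-bit width is explicit. It checks that the masked variant never misses a real zero byte on a small sample set. It also records one property worth knowing: because bits 7, 15, 23 and 31 are cleared before the subtraction, a 0x80 byte sitting just below a 0x01 byte produces a borrow, so a word such as 0x11110180 is reported as containing a zero byte although it has none. A caller would therefore presumably need to confirm a positive result byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: test each byte separately, as the removed lines did. */
static unsigned int has_zero_bytewise(uint32_t x)
{
	return !(x & 0xFF000000U) || !(x & 0x00FF0000U) ||
	       !(x & 0x0000FF00U) || !(x & 0x000000FFU);
}

/* Branch-free variant mirroring the added lines of the diff. */
static unsigned int has_zero_masked(uint32_t x)
{
	uint32_t y = x;

	x = ~x & 0x80808080U; /* keep the inverted bits 7, 15, 23, 31 */
	y &= 0x7F7F7F7FU;     /* clear those bits in the working copy */
	y -= 0x01010101U;     /* a zero byte borrows and sets its top bit */
	y &= x;               /* ignore top bits that were set originally */
	return !!y;
}

/* Hypothetical helper: count samples where a real zero byte is missed. */
static unsigned int count_missed_zeros(void)
{
	static const uint32_t samples[] = {
		0x00000000U, 0x11223344U, 0x11003344U, 0x00223344U,
		0x11223300U, 0x80808080U, 0x80008080U, 0xFFFFFFFFU,
		0x7F7F7F7FU, 0x01010101U,
	};
	unsigned int missed = 0;
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		if (has_zero_bytewise(samples[i]) && !has_zero_masked(samples[i]))
			missed++;
	return missed;
}

/* Hypothetical self-check: aborts if an expectation does not hold. */
static void has_zero_self_check(void)
{
	/* no zero byte is missed in the samples */
	assert(count_missed_zeros() == 0);

	/* a word without zero bytes and without 0x80 bytes is rejected */
	assert(has_zero_masked(0x11223344U) == 0);

	/* false positive: 0x80 below 0x01 borrows into the next byte */
	assert(has_zero_bytewise(0x11110180U) == 0);
	assert(has_zero_masked(0x11110180U) == 1);
}
```

Calling has_zero_self_check() from a small test program is enough to exercise both functions. An exhaustive loop over all 2^32 values would be the natural next step if stronger evidence is wanted.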