MINOR: tools: do not sum squares of differences for word fingerprints
While sums of squares usually give excellent results with fixed-size
patterns, they don't work well for comparing patterns of different
sizes, such as when some sub-words are missing. For example, the word
"server" contains "er" twice, which results in an extra distance of at
least 4 for this e->r transition alone compared to a word that lacks
it, whereas summing absolute differences only adds 2. This is one of
the main reasons why "show conn" only proposes "show info" on the CLI.
An improved approach that uses squares only for words of exactly the
same length might work, but it would still make it difficult to spot
reversed characters.
diff --git a/src/tools.c b/src/tools.c
index ffd167a..f39ec1e 100644
--- a/src/tools.c
+++ b/src/tools.c
@@ -5411,7 +5411,7 @@
/* Return the distance between two word fingerprints created by function
* make_word_fingerprint(). It's a positive integer calculated as the sum of
- * the squares of the differences between each location.
+ * the differences between each location.
*/
int word_fingerprint_distance(const uint8_t *fp1, const uint8_t *fp2)
{
@@ -5419,7 +5419,7 @@
for (i = 0; i < 1024; i++) {
k = (int)fp1[i] - (int)fp2[i];
- dist += k * k;
+ dist += abs(k);
}
return dist;
}