[OPTIM] stream_sock: don't use splice on too small payloads

Calling splice() on short payloads is more expensive than using
recv()+send(). One reason is that a splice() involves allocating a
pipe. Another is that the kernel has to copy the data itself when we
try to splice less than a page. So let's set a minimum threshold of
4kB below which we don't splice.
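
As a minimal sketch of the decision described above (the helper name
worth_splicing() is hypothetical and is not part of this patch; only
MIN_SPLICE_FORWARD comes from it), the check could look like this,
assuming the caller knows how many bytes remain to be forwarded:

```c
#include <stddef.h>

/* Threshold introduced by this patch; can be overridden at build time. */
#ifndef MIN_SPLICE_FORWARD
#define MIN_SPLICE_FORWARD 4096
#endif

/* Hypothetical helper, for illustration only: returns 1 when the number
 * of bytes left to forward is large enough to justify splice(), and 0
 * when the plain recv()+send() path should be used instead.
 */
static int worth_splicing(size_t to_forward)
{
	return to_forward >= MIN_SPLICE_FORWARD;
}
```

With the default threshold, a 1460-byte segment would take the
recv()+send() path, while a 64kB forward would still be spliced.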

A quick test on chunk-encoded data shows that with splice we had
6826 syscalls (1715 splice, 3461 recv, 1650 send), while with this
patch the same transfer resulted in 5793 syscalls (3896 recv, 1897
send).
diff --git a/include/common/defaults.h b/include/common/defaults.h
index d1ce021..96e0f61 100644
--- a/include/common/defaults.h
+++ b/include/common/defaults.h
@@ -97,6 +97,12 @@
 #define MIN_RET_FOR_READ_LOOP 1460
 #endif
 
+// The minimum number of bytes to be forwarded that is worth trying to splice.
+// Below 4kB, it's not worth allocating pipes or pretending to zero-copy.
+#ifndef MIN_SPLICE_FORWARD
+#define MIN_SPLICE_FORWARD 4096
+#endif
+
 // the max number of events returned in one call to poll/epoll. Too small a
 // value will cause lots of calls, and too high a value may cause high latency.
 #ifndef MAX_POLL_EVENTS