OPTIM/MEDIUM: epoll: fuse active events into polled ones during polling changes
When trying to speculatively send data to a server being connected to,
we see the following pattern :
connect() = EINPROGRESS
send() = EAGAIN
epoll_ctl(add, W)
epoll_wait() = EPOLLOUT
send() = success
> epoll_ctl(del, W)
> recv() = EAGAIN
> epoll_ctl(add, R)
recv() = success
epoll_ctl(del, R)
The recv() call fails because the read was marked as speculative while
a polled I/O was already registered on the same file descriptor. When
the write poll is removed, we already know that a read is pending, so
the epoll_ctl() call that drops the write event can register the read
event at the same time. Thus, let's improve this by merging speculative
I/O into polled I/O whenever the polled state changes. The result is
now the following, as expected :
connect() = EINPROGRESS
send() = EAGAIN
epoll_ctl(add, W)
epoll_wait() = EPOLLOUT
send() = success
epoll_ctl(mod, R)
recv() = success
epoll_ctl(del, R)
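The merge described above can be sketched as follows. This is only an
illustration and not the code from the patch: the EV_* flags and both
function names are hypothetical, only the EPOLL* constants come from
<sys/epoll.h>.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical per-fd state bits, for illustration only. */
#define EV_POLL_R  0x01u /* read is registered in epoll */
#define EV_POLL_W  0x02u /* write is registered in epoll */
#define EV_SPEC_R  0x04u /* read is attempted speculatively */
#define EV_SPEC_W  0x08u /* write is attempted speculatively */
#define EV_POLL_RW (EV_POLL_R | EV_POLL_W)

/* If the polled set differs between <prev> and <next>, an epoll_ctl() call
 * is unavoidable, so the speculative events of <next> are turned into
 * polled ones. Otherwise <next> is returned unchanged.
 */
static unsigned int fuse_spec_into_polled(unsigned int prev, unsigned int next)
{
	if (!((prev ^ next) & EV_POLL_RW))
		return next;
	if (next & EV_SPEC_R)
		next = (next & ~EV_SPEC_R) | EV_POLL_R;
	if (next & EV_SPEC_W)
		next = (next & ~EV_SPEC_W) | EV_POLL_W;
	return next;
}

/* Returns the epoll_ctl() opcode needed to go from <prev> to <next>, or -1
 * when the polled set is unchanged. <events> receives the new event mask.
 */
static int epoll_opcode(unsigned int prev, unsigned int next, uint32_t *events)
{
	unsigned int pp = prev & EV_POLL_RW;
	unsigned int np = next & EV_POLL_RW;

	*events = 0;
	if (np & EV_POLL_R)
		*events |= EPOLLIN;
	if (np & EV_POLL_W)
		*events |= EPOLLOUT;

	if (pp == np)
		return -1;
	if (!pp)
		return EPOLL_CTL_ADD;
	if (!np)
		return EPOLL_CTL_DEL;
	return EPOLL_CTL_MOD;
}
```

With prev = EV_POLL_W and next = EV_SPEC_R, epoll_opcode() alone yields
EPOLL_CTL_DEL, which is the first trace. Passing next through
fuse_spec_into_polled() first yields EPOLL_CTL_MOD with EPOLLIN, which
is the second one.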
This is specific to epoll(). It doesn't make much sense at the moment
to do the same for other pollers, because the cost of updating them is
very small.
The average performance gain on small requests is 1.6% in TCP mode,
which is easily explained by the syscall stats below for 10000
forwarded connections :
Before :
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.02    0.024608           0     60000         1 epoll_wait
  2.19    0.000593           0     20000           shutdown
  1.52    0.000412           0     10000     10000 connect
  1.36    0.000367           0     29998      9998 sendto
  1.09    0.000294           0     49993           epoll_ctl
  0.93    0.000252           0     50004     20002 recvfrom
  0.79    0.000214           0     20005           close
  0.62    0.000167           0     20001     10001 accept4
  0.25    0.000067           0     20002           setsockopt
  0.13    0.000035           0     10001           socket
  0.10    0.000028           0     10001           fcntl
After :
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.59    0.024269           0     50012         1 epoll_wait
  3.19    0.000884           0     20000           shutdown
  2.33    0.000646           0     29996      9996 sendto
  2.02    0.000560           0     10005     10003 connect
  1.40    0.000387           0     40013     10013 recvfrom
  1.35    0.000374           0     40000           epoll_ctl
  0.64    0.000178           0     20001     10001 accept4
  0.55    0.000152           0     20005           close
  0.45    0.000124           0     20002           setsockopt
  0.31    0.000086           0     10001           fcntl
  0.17    0.000047           0     10001           socket
Overall :
-16.6% epoll_wait
-20% recvfrom
-20% epoll_ctl
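These figures can be re-derived from the "calls" columns above. A
minimal sketch, where pct_change() is a hypothetical helper and not
part of the patch :

```c
/* Relative change in percent when going from <before> to <after> calls.
 * A negative result means fewer calls.
 */
static double pct_change(long before, long after)
{
	return 100.0 * (double)(after - before) / (double)before;
}
```

For example, pct_change(60000, 50012) is about -16.6 for epoll_wait,
and both pct_change(50004, 40013) for recvfrom and
pct_change(49993, 40000) for epoll_ctl are about -20.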
On HTTP, the gain is even better :
Before :
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.43    0.015386           0     60006         1 epoll_wait
  4.61    0.000882           0     30000     10000 sendto
  3.74    0.000715           0     20001     10001 accept4
  3.35    0.000640           0     10000     10000 connect
  2.66    0.000508           0     20005           close
  1.34    0.000257           0     30002     10002 recvfrom
  1.27    0.000242           0     30005           epoll_ctl
  1.20    0.000230           0     10000           shutdown
  0.62    0.000119           0     20003           setsockopt
  0.40    0.000077           0     10001           socket
  0.39    0.000074           0     10001           fcntl
After :
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.47    0.020301           0     50008         1 epoll_wait
  4.26    0.001036           0     20005           close
  3.30    0.000803           0     30000     10000 sendto
  2.55    0.000621           0     20001     10001 accept4
  1.76    0.000428           0     10000     10000 connect
  1.20    0.000292           0     10000           shutdown
  1.14    0.000278           0     20001         1 recvfrom
  0.86    0.000210           0     20003           epoll_ctl
  0.71    0.000173           0     20003           setsockopt
  0.49    0.000120           0     10001           socket
  0.25    0.000060           0     10001           fcntl
Overall :
-16.6% epoll_wait
-33% recvfrom
-33% epoll_ctl
1 file changed