<p>I've tried duplicating your problem using the following parameters:</p>
<ol>
<li>The server uses epoll to manage connections.</li>
<li>I make 3000 connections.</li>
<li>Connections are blocking.</li>
<li>The server is reduced to handling the connections only and does very little other work.</li>
</ol>
<p>I cannot duplicate the problem. Here is my server source code.</p>
<pre><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#include &lt;errno.h&gt;
#include &lt;netdb.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;sys/epoll.h&gt;
#include &lt;err.h&gt;
#include &lt;sysexits.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

/* Bookkeeping for the epoll set and the buffer handed to epoll_wait(). */
struct {
    int numfds;                  /* descriptors currently in the epoll set */
    int numevents;               /* capacity of the events array */
    struct epoll_event *events;
} connections = { 0, 0, NULL };

static int create_srv_socket(const char *port)
{
    int fd = -1;
    int rc;
    struct addrinfo *ai = NULL, hints;

    memset(&amp;hints, 0, sizeof(hints));
    hints.ai_flags = AI_PASSIVE;
    hints.ai_socktype = SOCK_STREAM;   /* ask for TCP explicitly */

    if ((rc = getaddrinfo(NULL, port, &amp;hints, &amp;ai)) != 0)
        errx(EX_UNAVAILABLE, "Cannot create socket: %s", gai_strerror(rc));

    if ((fd = socket(ai-&gt;ai_family, ai-&gt;ai_socktype, ai-&gt;ai_protocol)) &lt; 0)
        err(EX_OSERR, "Cannot create socket");

    /* SO_REUSEADDR only has an effect if it is set before bind(). */
    rc = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &amp;rc, sizeof(rc)) &lt; 0)
        err(EX_OSERR, "Cannot setup socket options");

    if (bind(fd, ai-&gt;ai_addr, ai-&gt;ai_addrlen) &lt; 0)
        err(EX_OSERR, "Cannot bind to socket");

    freeaddrinfo(ai);

    if (listen(fd, 25) &lt; 0)
        err(EX_OSERR, "Cannot setup listen length on socket");

    return fd;
}

static int create_epoll(void)
{
    int fd;

    if ((fd = epoll_create1(0)) &lt; 0)
        err(EX_OSERR, "Cannot create epoll");

    return fd;
}

static bool epoll_join(int epollfd, int fd, int events)
{
    struct epoll_event ev;

    ev.events = events;
    ev.data.fd = fd;

    /* Grow the event buffer in steps of 1024 entries. */
    if ((connections.numfds+1) &gt;= connections.numevents) {
        connections.numevents += 1024;
        /* Size by the element type, not by the pointer. */
        connections.events = realloc(connections.events,
            sizeof(*connections.events)*connections.numevents);
        if (!connections.events)
            err(EX_OSERR, "Cannot allocate memory for events list");
    }

    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &amp;ev) &lt; 0) {
        warn("Cannot add socket to epoll set");
        return false;
    }

    connections.numfds++;
    return true;
}

static void epoll_leave(int epollfd, int fd)
{
    if (epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, NULL) &lt; 0)
        err(EX_OSERR, "Could not remove entry from epoll set");

    connections.numfds--;
}

/* Shrink the event buffer again once enough connections have gone away. */
static void cleanup_old_events(void)
{
    if ((connections.numevents - 1024) &gt; connections.numfds) {
        connections.numevents -= 1024;
        connections.events = realloc(connections.events,
            sizeof(*connections.events)*connections.numevents);
        if (!connections.events)
            err(EX_OSERR, "Cannot allocate memory for events list");
    }
}

static void disconnect(int fd)
{
    shutdown(fd, SHUT_RDWR);
    close(fd);
}

/* Echo whatever the client sent; returns false if the peer closed or on error. */
static bool read_and_reply(int fd)
{
    char buf[128];
    int rc;

    memset(buf, 0, sizeof(buf));

    if ((rc = recv(fd, buf, sizeof(buf), 0)) &lt;= 0) {
        if (rc &lt; 0)
            warn("Cannot read from socket");
        return false;
    }

    if (send(fd, buf, rc, MSG_NOSIGNAL) &lt; 0) {
        warn("Cannot send to socket");
        return false;
    }

    return true;
}

int main(void)
{
    int srv = create_srv_socket("8558");
    int ep = create_epoll();
    int rc = -1;
    struct epoll_event *ev = NULL;

    if (!epoll_join(ep, srv, EPOLLIN))
        err(EX_OSERR, "Server cannot join epollfd");

    while (1) {
        int i, cli;

        rc = epoll_wait(ep, connections.events, connections.numfds, -1);
        if (rc &lt; 0 &amp;&amp; errno == EINTR)
            continue;
        else if (rc &lt; 0)
            err(EX_OSERR, "Cannot properly perform epoll wait");

        for (i = 0; i &lt; rc; i++) {
            ev = &amp;connections.events[i];

            if (ev-&gt;data.fd != srv) {
                if (ev-&gt;events &amp; EPOLLIN) {
                    if (!read_and_reply(ev-&gt;data.fd)) {
                        epoll_leave(ep, ev-&gt;data.fd);
                        disconnect(ev-&gt;data.fd);
                        /* The fd is closed now; do not touch it again below. */
                        continue;
                    }
                }

                if (ev-&gt;events &amp; EPOLLERR || ev-&gt;events &amp; EPOLLHUP) {
                    if (ev-&gt;events &amp; EPOLLERR)
                        warn("Error in fd: %d", ev-&gt;data.fd);
                    else
                        warn("Closing disconnected fd: %d", ev-&gt;data.fd);

                    epoll_leave(ep, ev-&gt;data.fd);
                    disconnect(ev-&gt;data.fd);
                }
            } else {
                if (ev-&gt;events &amp; EPOLLIN) {
                    if ((cli = accept(srv, NULL, NULL)) &lt; 0) {
                        warn("Could not add socket");
                        continue;
                    }

                    epoll_join(ep, cli, EPOLLIN);
                }

                if (ev-&gt;events &amp; EPOLLERR || ev-&gt;events &amp; EPOLLHUP)
                    err(EX_OSERR, "Server FD %d has failed", ev-&gt;data.fd);
            }
        }

        cleanup_old_events();
    }
}
</code></pre>
<p>Here is the client:</p>
<pre><code>from socket import *
import time

scks = list()
for i in range(0, 3000):
    s = socket(AF_INET, SOCK_STREAM)
    s.connect(("localhost", 8558))
    scks.append(s)

time.sleep(600)
</code></pre>
<p>When running this on my local machine I get 6001 sockets using port 8558 (1 listening, 3000 client-side sockets and 3000 server-side sockets).</p>
<pre><code>$ ss -ant | grep 8558 | wc -l
6001
</code></pre>
<p>When checking the number of IPv4 connections open on the client I get 3000.</p>
<pre><code># lsof -p$(pgrep python) | grep IPv4 | wc -l
3000
</code></pre>
<p>I've also run the test with the server on a remote machine, and it succeeded there too.</p>
<p>I'd suggest you attempt to do the same.</p>
<p>In addition, try turning off iptables completely, just in case it's some connection tracking quirk. Sometimes the netfilter connection tracking option in <code>/proc</code> can help too. So try <code>sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1</code>.</p>
<p><strong>Edit:</strong> I've done another test which produces the output you see on your side. Your problem is that you are shutting down the connection on the server side pre-emptively.</p>
<p>I can duplicate results similar to what you are seeing by doing the following:</p>
<ul>
<li>After reading some data into my server, call <code>shutdown(fd, SHUT_RD)</code>.</li>
<li>Do <code>send(fd, buf, sizeof(buf), 0)</code> on the server.</li>
</ul>
<p>After doing this, the following behaviours are seen.</p>
<ul>
<li>On the client I get 3000 connections open in netstat/ss with ESTABLISHED.</li>
<li>In lsof output I get 2880 connections established (the exact number is a result of how I was doing the shutdown).</li>
<li>The remainder of the connections (<code>lsof -i:8558 | grep -v ES</code>) are in CLOSE_WAIT.</li>
</ul>
<p>This only happens on a half-shutdown connection.</p>
<p>As such I suspect this is a bug in your client or server program. Either you are sending something to the server which the server objects to, or the server is invalidly closing connections down for some reason.</p>
<p>You need to confirm what state the "anomalous" connections are in (CLOSE_WAIT or something else).</p>
<p>At this stage I also consider this a programming problem and not really something that belongs on Server Fault. Without seeing the relevant portions of the source for the client/server it is not going to be possible for anybody to track down the cause of the fault. That said, I am pretty confident this has nothing to do with the way the operating system is handling the connections.</p>
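<p>To confirm which state the "anomalous" connections are in, it can help to tally the <code>ss</code> output by TCP state instead of only counting lines. The following is a minimal Python sketch and was not part of the original test: the function name <code>count_states</code> and the sample text are hypothetical, and it assumes the usual <code>ss -ant</code> column order (state, Recv-Q, Send-Q, local address, peer address).</p>

```python
from collections import Counter


def count_states(ss_output, port):
    """Tally TCP states for sockets whose local or peer port equals `port`.

    Assumes `ss -ant` style rows: State Recv-Q Send-Q Local:Port Peer:Port.
    """
    wanted = str(port)
    states = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 5 or fields[0] == "State":
            continue
        # rsplit keeps this working for IPv6 addresses such as [::1]:8558.
        local_port = fields[3].rsplit(":", 1)[-1]
        peer_port = fields[4].rsplit(":", 1)[-1]
        if wanted in (local_port, peer_port):
            states[fields[0]] += 1
    return states


# Hypothetical sample input; on a real host, feed in the output of `ss -ant`.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN     0      25     0.0.0.0:8558       0.0.0.0:*
ESTAB      0      0      127.0.0.1:8558     127.0.0.1:40000
ESTAB      0      0      127.0.0.1:40000    127.0.0.1:8558
CLOSE-WAIT 1      0      127.0.0.1:40002    127.0.0.1:8558
ESTAB      0      0      127.0.0.1:22       127.0.0.1:51000
"""

if __name__ == "__main__":
    for state, count in sorted(count_states(SAMPLE, 8558).items()):
        print(state, count)
```

<p>On a live system the text would come from <code>ss -ant</code> itself. Note that <code>ss</code> spells the state <code>CLOSE-WAIT</code> while <code>lsof</code> prints <code>CLOSE_WAIT</code>. A socket in that state has received a FIN from its peer but has not yet been closed by the local application.</p>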