<!--#include virtual="header.html" -->
<H1>Connections in the FIN_WAIT_2 state and Apache</H1>
Starting with the Apache 1.2 betas, people are reporting many more
connections in the FIN_WAIT_2 state (as reported by
<code>netstat</code>) than they saw using older versions. When the
server closes a TCP connection, it sends a packet with the FIN bit
set to the client, which then responds with a packet with the ACK bit
set. The client then sends a packet with the FIN bit set to the
server, which responds with an ACK, and the connection is closed. The
state that the connection is in during the period between when the
server gets the ACK from the client and the server gets the FIN from
the client is known as FIN_WAIT_2. See the <A
HREF="ftp://ds.internic.net/rfc/rfc793.txt">TCP RFC</A> for the
technical details of the state transitions.<P>
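The close sequence described above can be sketched as a small state
table. This is only an illustration of the server-side (active close)
transitions from RFC 793; the event names and the <CODE>run()</CODE>
helper are made up for this example, and simultaneous close is left out.
Note that no event other than the client's FIN moves the connection out
of FIN_WAIT_2.<P>

```python
# Sketch of the active-close half of the TCP state machine (RFC 793).
# Keys are (current state, event); values are the next state.
# Event names are illustrative, not taken from any real API.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",    # server sends its FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",  # client ACKs that FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",   # client sends its FIN; server ACKs it
    ("TIME_WAIT", "timeout"): "CLOSED",        # the 2*MSL timer expires
}


def run(events, state="ESTABLISHED"):
    """Apply events in order; events with no transition leave the state unchanged."""
    for event in events:
        state = ACTIVE_CLOSE.get((state, event), state)
    return state


if __name__ == "__main__":
    # A client that ACKs the FIN but never sends its own leaves us here.
    print(run(["close", "recv_ack"]))
```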
The FIN_WAIT_2 state is somewhat unusual in that there is no timeout
defined in the standard for it. This means that on many operating
systems, a connection in the FIN_WAIT_2 state will stay around until
the system is rebooted. If the system does not have a timeout and
too many FIN_WAIT_2 connections build up, they can fill up the space
allocated for storing information about the connections and crash
the kernel. The connections in FIN_WAIT_2 do not tie up an httpd
process.<P>
There are several reasons for it happening, and not all of them are
fully understood by the Apache team yet. What is known follows.<P>
Several clients have a bug which pops up when dealing with
<A HREF="/keepalive.html">persistent connections</A> (aka keepalives).
When the connection is idle and the server closes the connection
(based on the <A HREF="/mod/core.html#keepalivetimeout">
KeepAliveTimeout</A>), the client is programmed so that it does
not send back a FIN and ACK to the server. This means that the
connection stays in the FIN_WAIT_2 state until one of the following
happens:<P>
<UL>
 <LI>The client opens a new connection to the same or a different
 site, which causes it to fully close the older connection on
 that socket.
 <LI>The user exits the client, which on some (most?) clients
 causes the OS to fully shut down the connection.
 <LI>The FIN_WAIT_2 state times out, on servers that have a timeout
 for this state.
</UL>
<P>
If you are lucky, this means that the buggy client will fully close the
connection and release the resources on your server. However, there
are some cases where the socket is never fully closed, such as a dialup
client disconnecting from its provider before closing the client.
In addition, a client might sit idle for days without making another
connection, and thus may hold its end of the socket open for days
even though it has no further use for it.
<STRONG>This is a bug in the browser or in its operating system's
TCP implementation.</STRONG><P>
The clients on which this problem has been verified to exist:<P>
<UL>
 <LI>Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
 <LI>Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386)
 <LI>MSIE 3.01 on the Macintosh
 <LI>MSIE 3.01 on Windows 95
</UL>
<P>
This does not appear to be a problem on:
It is expected that many other clients have the same problem. What a
client <STRONG>should do</STRONG> is periodically check its open
socket(s) to see if they have been closed by the server, and close its
side of the connection if the server has closed. This check need only
occur once every few seconds, and the closure may even be signalled by
the OS on some systems (e.g., Win95 and NT clients have this capability,
but they seem to be ignoring it).<P>
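A minimal sketch of such a periodic check, assuming the client keeps its
idle persistent connections as ordinary stream sockets. The function
names <CODE>peer_has_closed()</CODE> and <CODE>reap_idle()</CODE> are
hypothetical, and this is not how any particular browser is written; it
only shows that a non-blocking poll followed by a peek is enough to
notice the server's FIN.<P>

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed (or reset) the connection on sock."""
    readable, _, _ = select.select([sock], [], [], 0)  # poll without blocking
    if not readable:
        return False  # nothing pending: the connection is simply idle
    try:
        # Peek so that any real response data is left in the buffer.
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionError:
        return True  # a reset also means the connection is gone
    return data == b""  # readable but empty means the peer sent its FIN


def reap_idle(sockets):
    """Close every socket whose peer has closed; return the ones still open."""
    alive = []
    for s in sockets:
        if peer_has_closed(s):
            s.close()  # sends our FIN, so the server can leave FIN_WAIT_2
        else:
            alive.append(s)
    return alive
```

A client could call <CODE>reap_idle()</CODE> on its pool of idle
connections from a timer that fires once every few seconds.<P>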
Apache <STRONG>cannot</STRONG> avoid these FIN_WAIT_2 states unless it
disables persistent connections for the buggy clients, just
like we recommend doing for Navigator 2.x clients due to other bugs.
However, non-persistent connections increase the total number of
connections needed per client and slow retrieval of an image-laden
web page. Since non-persistent connections have their own resource
consumption and a short waiting period after each closure, a busy server
may need persistence in order to best serve its clients.<P>
As far as we know, the client-caused FIN_WAIT_2 problem is present for
all servers that support persistent connections, including Apache 1.1.x
While the above bug is a problem, it is not the whole problem.
Some users have observed no FIN_WAIT_2 problems with Apache 1.1.x,
but with 1.2b enough connections build up in the FIN_WAIT_2 state to
crash their server. We have not yet identified why this would occur
and welcome additional test input.<P>
One possible (and most likely) source for additional FIN_WAIT_2 states
is a function called <CODE>lingering_close()</CODE> which was added
between 1.1 and 1.2. This function is necessary for the proper
handling of persistent connections and any request which includes
content in the message body (e.g., PUTs and POSTs).
It keeps reading any data sent by the client for
a certain time after the server closes its side of the connection. The
exact reasons for doing this are somewhat complicated, but involve what
happens if the client is making a request at the same time the
server sends a response and closes the connection. Without lingering,
the client might be forced to reset its TCP input buffer before it
has a chance to read the server's response, and thus understand why
the connection has closed.
See the <A HREF="#appendix">appendix</A> for more details.<P>
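The idea behind lingering can be sketched as follows. This is not
Apache's <CODE>lingering_close()</CODE> code, only an illustration of
the general technique: close the sending half of the socket, keep
discarding whatever the client still sends for a bounded time, and only
then close completely. The parameter names and the default timeouts are
assumptions chosen for the example.<P>

```python
import select
import socket
import time


def lingering_close(sock, max_linger=30.0, idle_timeout=2.0):
    """Half-close sock, drain what the peer still sends, then close it.

    Returns the number of bytes discarded. Both timeout values are
    illustrative assumptions, not Apache's actual settings.
    """
    discarded = 0
    try:
        sock.shutdown(socket.SHUT_WR)  # send our FIN but keep reading
    except OSError:
        sock.close()  # peer is already gone; nothing to drain
        return discarded
    deadline = time.monotonic() + max_linger
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # total linger time used up
        wait = min(idle_timeout, remaining)
        readable, _, _ = select.select([sock], [], [], wait)
        if not readable:
            break  # peer went quiet; stop waiting
        try:
            chunk = sock.recv(4096)
        except OSError:
            break  # connection was reset while draining
        if not chunk:
            break  # peer sent its FIN; clean close
        discarded += len(chunk)
    sock.close()
    return discarded
```

Because the late data is read rather than left unread in the receive
buffer, the final close does not provoke a reset that could destroy the
response the client has not read yet.<P>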
We have not yet tracked down the exact reason why
<CODE>lingering_close()</CODE> causes problems. Its code has been
thoroughly reviewed and extensively updated in 1.2b6. It is possible
that there is some problem in the BSD TCP stack which is causing the
observed problems. It is also possible that we fixed it in 1.2b6.
Unfortunately, we have not been able to replicate the problem on our
test servers.<P>
<A HREF="http://www.sco.com/cgi-bin/waisgate?WAISdocID=2242622956+0+0+0&WAISaction=retrieve"> SCO's instructions</A>.
at 600 seconds, while in 3.xx it defaults to 600 seconds and
<!--#include virtual="footer.html" -->