<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Connections in FIN_WAIT_2 and Apache</TITLE>
<LINK REV="made" HREF="mailto:marc@apache.org">

</HEAD>

<BODY>
<!--#include virtual="header.html" -->

<H1>Connections in the FIN_WAIT_2 state and Apache</H1>
<OL>
<LI><H2>What is the FIN_WAIT_2 state?</H2>
Starting with the Apache 1.2 betas, people are reporting many more
connections in the FIN_WAIT_2 state (as reported by
<CODE>netstat</CODE>) than they saw using older versions. When the
server closes a TCP connection, it sends a packet with the FIN bit
set to the client, which then responds with a packet with the ACK bit
set. The client then sends a packet with the FIN bit set to the
server, which responds with an ACK and the connection is closed. The
state that the connection is in during the period between when the
server gets the ACK from the client and the server gets the FIN from
the client is known as FIN_WAIT_2. See the <A
HREF="ftp://ds.internic.net/rfc/rfc793.txt">TCP RFC</A> for the
technical details of the state transitions.<P>
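The close sequence described above can be sketched as a small lookup
table. This is only an illustration of the active-close path in the TCP
RFC, seen from the side that closes first; the table name, the event
labels and the <CODE>walk()</CODE> helper are made up for this example,
and the simultaneous-close path is left out.

```python
# Active-close path of the TCP state machine (RFC 793), seen from the
# side that closes first: the server in the description above.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",  # the server ACKs the client's FIN here
    ("TIME_WAIT", "timer expires"): "CLOSED",
}


def walk(state, events):
    """Apply each event in turn and return the list of states visited."""
    visited = [state]
    for event in events:
        state = ACTIVE_CLOSE[(state, event)]
        visited.append(state)
    return visited


# A client that ACKs the server's FIN but never sends its own FIN
# leaves the server parked in FIN_WAIT_2.
print(walk("ESTABLISHED", ["send FIN", "recv ACK"]))
```

Note that the table has no entry that leaves FIN_WAIT_2 without a FIN
from the client; nothing else in the RFC moves the connection on.<P>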

The FIN_WAIT_2 state is somewhat unusual in that there is no timeout
defined in the standard for it. This means that on many operating
systems, a connection in the FIN_WAIT_2 state will stay around until
the system is rebooted. If the system does not have a timeout and
too many FIN_WAIT_2 connections build up, it can fill up the space
allocated for storing information about the connections and crash
the kernel. The connections in FIN_WAIT_2 do not tie up an httpd
process.<P>
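To watch whether such connections are building up, the output of
<CODE>netstat -an</CODE> can be tallied by state. The following is a
minimal sketch, assuming the state is the last column of each TCP row;
the function name and the sample rows (which use documentation-only
addresses) are invented for this example, and real
<CODE>netstat</CODE> output varies between systems.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -an` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "tcp", "tcp4", "tcp6", ... and are assumed
        # to end with the connection state.
        if len(fields) >= 6 and fields[0].lower().startswith("tcp"):
            # Some netstat versions print FIN_WAIT2, others FIN_WAIT_2,
            # so underscores are dropped before counting.
            counts[fields[-1].replace("_", "")] += 1
    return counts


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp        0      0 192.0.2.10.80          198.51.100.7.1034      FIN_WAIT_2
tcp        0      0 192.0.2.10.80          198.51.100.8.1101      ESTABLISHED
tcp        0      0 192.0.2.10.80          198.51.100.9.1207      FIN_WAIT_2
tcp        0      0 *.80                   *.*                    LISTEN
"""

print(count_tcp_states(SAMPLE)["FINWAIT2"])  # prints 2
```

A count that keeps growing over hours, rather than rising and falling
with load, suggests the system has no timeout for this state.<P>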

<LI><H2>But why does it happen?</H2>

There are several reasons for it happening, and not all of them are
fully understood by the Apache team yet. What is known follows.<P>

<H3>Buggy clients and persistent connections</H3>

Several clients have a bug which pops up when dealing with
<A HREF="/keepalive.html">persistent connections</A> (aka keepalives).
When the connection is idle and the server closes the connection
(based on the <A HREF="/mod/core.html#keepalivetimeout">
KeepAliveTimeout</A>), the client is programmed so that it does
not send back a FIN and ACK to the server. This means that the
connection stays in the FIN_WAIT_2 state until one of the following
happens:<P>
<UL>
 <LI>The client opens a new connection to the same or a different
 site, which causes it to fully close the older connection on
 that socket.
 <LI>The user exits the client, which on some (most?) clients
 causes the OS to fully shut down the connection.
 <LI>The FIN_WAIT_2 times out, on servers that have a timeout
 for this state.
</UL><P>
If you are lucky, this means that the buggy client will fully close the
connection and release the resources on your server. However, there
are some cases where the socket is never fully closed, such as a dialup
client disconnecting from their provider before closing the client.
In addition, a client might sit idle for days without making another
connection, and thus may hold its end of the socket open for days
even though it has no further use for it.
<STRONG>This is a bug in the browser or in its operating system's
TCP implementation.</STRONG> <P>

The clients on which this problem has been verified to exist:<P>
<UL>
 <LI>Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
 <LI>Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386)
 <LI>Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m)
 <LI>MSIE 3.01 on the Macintosh
 <LI>MSIE 3.01 on Windows 95
</UL><P>

This does not appear to be a problem on:
<UL>
 <LI>Mozilla/3.01 (Win95; I)
</UL>
<P>

It is expected that many other clients have the same problem. What a
client <STRONG>should do</STRONG> is periodically check its open
socket(s) to see if they have been closed by the server, and close its
side of the connection if the server has closed. This check need only
occur once every few seconds, and may even be detected by an OS signal
on some systems (e.g., Win95 and NT clients have this capability, but
they seem to be ignoring it).<P>
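The periodic check described above could look roughly like the
following. This is a sketch of one way a client might do it, not a
description of any existing browser; the function names
<CODE>server_has_closed()</CODE> and
<CODE>reap_idle_connections()</CODE> are invented for this example.

```python
import select
import socket


def server_has_closed(sock):
    """Return True if the peer has closed its side of an idle connection."""
    readable, _, _ = select.select([sock], [], [], 0)  # poll, do not block
    if not readable:
        return False  # nothing pending, so the connection is still open
    try:
        # Peek so that any real response data stays in the buffer.
        # An empty result means the server's FIN has arrived.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True


def reap_idle_connections(idle_sockets):
    """Close every idle socket whose server side is gone; return the rest."""
    still_open = []
    for sock in idle_sockets:
        if server_has_closed(sock):
            # Closing sends our own FIN, which lets the server
            # leave FIN_WAIT_2.
            sock.close()
        else:
            still_open.append(sock)
    return still_open
```

A client would call <CODE>reap_idle_connections()</CODE> on its pool of
kept-alive sockets from a timer that fires every few seconds.<P>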

Apache <STRONG>cannot</STRONG> avoid these FIN_WAIT_2 states unless it
disables persistent connections for the buggy clients, just
like we recommend doing for Navigator 2.x clients due to other bugs.
However, non-persistent connections increase the total number of
connections needed per client and slow retrieval of an image-laden
web page. Since non-persistent connections have their own resource
consumption and a short waiting period after each closure, a busy server
may need persistence in order to best serve its clients.<P>

As far as we know, the client-caused FIN_WAIT_2 problem is present for
all servers that support persistent connections, including Apache 1.1.x
and 1.2.<P>

<H3>Something in Apache may be broken</H3>

While the above bug is a problem, it is not the whole problem.
Some users have observed no FIN_WAIT_2 problems with Apache 1.1.x,
but with 1.2b enough connections build up in the FIN_WAIT_2 state to
crash their server. We have not yet identified why this would occur
and welcome additional test input.<P>

One possible (and most likely) source for additional FIN_WAIT_2 states
is a function called <CODE>lingering_close()</CODE> which was added
between 1.1 and 1.2. This function is necessary for the proper
handling of persistent connections and any request which includes
content in the message body (e.g., PUTs and POSTs).
What it does is read any data sent by the client for
a certain time after the server closes the connection. The exact
reasons for doing this are somewhat complicated, but involve what
happens if the client is making a request at the same time the
server sends a response and closes the connection. Without lingering,
the client might be forced to reset its TCP input buffer before it
has a chance to read the server's response, and thus understand why
the connection has closed.
See the <A HREF="#appendix">appendix</A> for more details.<P>

We have not yet tracked down the exact reason why
<CODE>lingering_close()</CODE> causes problems. Its code has been
thoroughly reviewed and extensively updated in 1.2b6. It is possible
that there is some problem in the BSD TCP stack which is causing the
observed problems. It is also possible that we fixed it in 1.2b6.
Unfortunately, we have not been able to replicate the problem on our
test servers.<P>

<LI><H2>What can I do about it?</H2>
There are several possible workarounds to the problem, some of
which work better than others.<P>
<H3>Add a timeout for FIN_WAIT_2</H3>
The obvious workaround is to simply have a timeout for the FIN_WAIT_2 state.
This is not specified by the RFC, and could be claimed to be a
violation of the RFC, but it is widely recognized as being necessary.
The following systems are known to have a timeout:
<P>
<UL>
<LI><A HREF="http://www.freebsd.org/">FreeBSD</A> versions starting at 2.0 or possibly earlier.
<LI><A HREF="http://www.netbsd.org/">NetBSD</A> version 1.2(?)
<LI><A HREF="http://www.openbsd.org/">OpenBSD</A> all versions(?)
<LI><A HREF="http://www.bsdi.com/">BSD/OS</A> 2.1, with the
<A HREF="ftp://ftp.bsdi.com/bsdi/patches/patches-2.1/K210-027">
K210-027</A> patch installed.
<LI><A HREF="http://www.sun.com/">Solaris</A> as of around version
2.2. The timeout can be tuned by using <CODE>ndd</CODE> to
modify <CODE>tcp_fin_wait_2_flush_interval</CODE>, but the
default should be appropriate for most servers and improper
tuning can have negative impacts.
<LI><A HREF="http://www.sco.com/">SCO TCP/IP Release 1.2.1</A>
can be modified to have a timeout by following
<A HREF="http://www.sco.com/cgi-bin/waisgate?WAISdocID=2242622956+0+0+0&WAISaction=retrieve"> SCO's instructions</A>.
<LI><A HREF="http://www.linux.org/">Linux</A> 2.0.x and
earlier(?)
<LI><A HREF="http://www.hp.com/">HP-UX</A> 10.x defaults to
terminating connections in the FIN_WAIT_2 state after the
normal keepalive timeouts. This does not
refer to the persistent connection or HTTP keepalive
timeouts, but the <CODE>SO_LINGER</CODE> socket option
which is enabled by Apache. This parameter can be adjusted
by using <CODE>nettune</CODE> to modify parameters such as
<CODE>tcp_keepstart</CODE> and <CODE>tcp_keepstop</CODE>.
In later revisions, there is an explicit timer for
connections in FIN_WAIT_2 that can be modified; contact HP
support for details.
<LI><A HREF="http://www.sgi.com/">SGI IRIX</A> 5.3, 6.2 and 6.3
will gain a timeout for connections in FIN_WAIT_2 in a rollup patch
that was forthcoming as of 97/01; a patch for 6.4 is to follow after
its release. Contact SGI for details.
<LI><A HREF="http://www.ncr.com/">NCR's MP RAS Unix</A> 2.xx and
3.xx both have FIN_WAIT_2 timeouts. In 2.xx it is non-tunable
at 600 seconds, while in 3.xx it defaults to 600 seconds and
is calculated based on the tunable "max keep alive probes"
(default of 8) multiplied by the "keep alive interval" (default
75 seconds).
<LI><A HREF="http://www.sequent.com">Sequent's ptx/TCP/IP for
DYNIX/ptx</A> has had a FIN_WAIT_2 timeout since around
release 4.1 in mid-1994.
</UL>
<P>
The following systems are known to not have a timeout:
<P>
<UL>
<LI><A HREF="http://www.sun.com/">SunOS 4.x</A> does not and
almost certainly never will have one because it is at the
very end of its development cycle for Sun. If you have kernel
source, it should be easy to patch.
</UL>
<P>
There is a
<A HREF="http://www.apache.org/dist/contrib/patches/1.2/fin_wait_2.patch">
patch available</A> for adding a timeout to the FIN_WAIT_2 state; it
was originally intended for BSD/OS, but should be adaptable to most
systems using BSD networking code. You need kernel source code to be
able to use it. If you do adapt it to work for any other systems,
please drop me a note at <A HREF="mailto:marc@apache.org">marc@apache.org</A>.
<P>
<H3>Compile without using <CODE>lingering_close()</CODE></H3>
It is possible to compile Apache 1.2 without using the
<CODE>lingering_close()</CODE> function. This will result in that
section of code being similar to that which was in 1.1. If you do
this, be aware that it can cause problems with PUTs, POSTs and
persistent connections, especially if the client uses pipelining.
That said, it is no worse than on 1.1, and we understand that keeping your
server running is quite important.<P>
To compile without the <CODE>lingering_close()</CODE> function, add
<CODE>-DNO_LINGCLOSE</CODE> to the end of the
<CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> file,
rerun <CODE>Configure</CODE> and rebuild the server.
<P>
<H3>Use <CODE>SO_LINGER</CODE> as an alternative to
<CODE>lingering_close()</CODE></H3>
On most systems, there is an option called <CODE>SO_LINGER</CODE> that
can be set with <CODE>setsockopt(2)</CODE>. It does something very
similar to <CODE>lingering_close()</CODE>, except that it is broken
on many systems so that it causes far more problems than
<CODE>lingering_close()</CODE>. On some systems, it could possibly work
better so it may be worth a try if you have no other alternatives. <P>
To try it, add <CODE>-DUSE_SO_LINGER -DNO_LINGCLOSE</CODE> to the end of the
<CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE>
file, rerun <CODE>Configure</CODE> and rebuild the server. <P>
<STRONG>NOTE:</STRONG> Attempting to use <CODE>SO_LINGER</CODE> and
<CODE>lingering_close()</CODE> at the same time is very likely to do
very bad things, so don't.<P>
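For readers unfamiliar with the socket option itself, the following
sketch shows what setting <CODE>SO_LINGER</CODE> through
<CODE>setsockopt(2)</CODE> amounts to. It is not how Apache enables the
option (that happens in the server's C code when built with
<CODE>-DUSE_SO_LINGER</CODE>); the helper names and the 30 second value
are assumptions made for this example, and the two-integer layout of
<CODE>struct linger</CODE> is assumed to match most Unix systems.

```python
import socket
import struct


def enable_so_linger(sock, seconds):
    """Turn on SO_LINGER with the given timeout, as setsockopt(2) would in C."""
    # Assumed layout: struct linger { int l_onoff; int l_linger; }
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, seconds))


def read_so_linger(sock):
    """Return (enabled, timeout) as currently stored by the kernel."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    onoff, seconds = struct.unpack("ii", raw)
    return bool(onoff), seconds


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_so_linger(sock, 30)
print(read_so_linger(sock))
sock.close()
```

Whether the kernel then honors the timeout on close is exactly the part
that differs between systems, which is why this workaround is only
worth trying when nothing else helps.<P>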
<H3>Increase the amount of memory used for storing connection state</H3>
<DL>
<DT>BSD based networking code:
<DD>BSD stores network data, such as connection states,
in something called an mbuf. When you get so many connections
that the kernel does not have enough mbufs to put them all in, your
kernel will likely crash. You can reduce the effects of the problem
by increasing the number of mbufs that are available; this will not
prevent the problem, it will just make the server go longer before
crashing.<P>
The exact way to increase them may depend on your OS; look
for some reference to the number of "mbufs" or "mbuf clusters". On
many systems, this can be done by adding the line
<CODE>NMBCLUSTERS="n"</CODE> to your kernel config file, where
<CODE>n</CODE> is the number of mbuf clusters you want, and rebuilding
your kernel.<P>
</DL>
<LI><H2>Feedback</H2>
If you have any information to add to this page, please contact me at
<A HREF="mailto:marc@apache.org">marc@apache.org</A>.<P>
<LI><H2><A NAME="appendix">Appendix</A></H2>
<P>
Below is a message from Roy Fielding, one of the authors of HTTP/1.1.
<H3>Why the lingering close functionality is necessary with HTTP</H3>
The need for a server to linger on a socket after a close is noted a couple
times in the HTTP specs, but not explained. This explanation is based on
discussions between myself, Henrik Frystyk, Robert S. Thau, Dave Raggett,
and John C. Mallery in the hallways of MIT while I was at W3C.<P>
If a server closes the input side of the connection while the client
is sending data (or is planning to send data), then the server's TCP
stack will signal an RST (reset) back to the client. Upon
receipt of the RST, the client will flush its own incoming TCP buffer
back to the un-ACKed packet indicated by the RST packet argument.
If the server has sent a message, usually an error response, to the
client just before the close, and the client receives the RST packet
before its application code has read the error message from its incoming
TCP buffer and before the server has received the ACK sent by the client
upon receipt of that buffer, then the RST will flush the error message
before the client application has a chance to see it. The result is
that the client is left thinking that the connection failed for no
apparent reason.<P>
There are two conditions under which this is likely to occur:
<OL>
<LI>sending POST or PUT data without proper authorization
<LI>sending multiple requests before each response (pipelining)
and one of the middle requests resulting in an error or
other break-the-connection result.
</OL>
<P>
The solution in all cases is to send the response, close only the
write half of the connection (what shutdown is supposed to do), and
continue reading on the socket until it is either closed by the
client (signifying it has finally read the response) or a timeout occurs.
That is what the kernel is supposed to do if SO_LINGER is set.
Unfortunately, SO_LINGER has no effect on some systems; on some other
systems, it does not have its own timeout and thus the TCP memory
segments just pile-up until the next reboot (planned or not).<P>
Please note that simply removing the linger code will not solve the
problem -- it only moves it to a different and much harder one to detect.
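<P>
The sequence described in the message (send the response, close only
the write half, keep reading until the client closes or a timeout
expires) can be sketched as follows. This illustrates the technique
only and is not Apache's <CODE>lingering_close()</CODE> code; the
function signature, the 2 second default and the returned byte count
are assumptions made for this example.

```python
import select
import socket
import time


def lingering_close(conn, response, linger_seconds=2.0):
    """Send response, half-close, then drain client data before closing.

    Returns the number of bytes read and discarded while lingering.
    """
    conn.sendall(response)
    conn.shutdown(socket.SHUT_WR)  # close only the write half: sends our FIN

    deadline = time.monotonic() + linger_seconds
    drained = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # the linger timeout expired
        readable, _, _ = select.select([conn], [], [], remaining)
        if not readable:
            break  # the client stayed silent until the timeout
        chunk = conn.recv(4096)
        if not chunk:
            break  # the client closed, so it has seen our FIN
        drained += len(chunk)  # discard data the client was still sending

    conn.close()
    return drained
```

Because the unread client data is consumed before the final close, the
server's TCP stack has no reason to answer it with an RST, so the
response is not flushed from the client's buffer.<P>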
</OL>
<!--#include virtual="footer.html" -->
</BODY>
</HTML>