diff options
Diffstat (limited to 'docs/manual/misc/fin_wait_2.html')
-rw-r--r-- | docs/manual/misc/fin_wait_2.html | 324 |
1 files changed, 0 insertions, 324 deletions
diff --git a/docs/manual/misc/fin_wait_2.html b/docs/manual/misc/fin_wait_2.html deleted file mode 100644 index e1e313d75c..0000000000 --- a/docs/manual/misc/fin_wait_2.html +++ /dev/null @@ -1,324 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Connections in FIN_WAIT_2 and Apache</TITLE> -<LINK REV="made" HREF="mailto:marc@apache.org"> - -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<H1 ALIGN="CENTER">Connections in the FIN_WAIT_2 state and Apache</H1> -<OL> -<LI><H2>What is the FIN_WAIT_2 state?</H2> -Starting with the Apache 1.2 betas, people are reporting many more -connections in the FIN_WAIT_2 state (as reported by -<CODE>netstat</CODE>) than they saw using older versions. When the -server closes a TCP connection, it sends a packet with the FIN bit -sent to the client, which then responds with a packet with the ACK bit -set. The client then sends a packet with the FIN bit set to the -server, which responds with an ACK and the connection is closed. The -state that the connection is in during the period between when the -server gets the ACK from the client and the server gets the FIN from -the client is known as FIN_WAIT_2. See the <A -HREF="ftp://ds.internic.net/rfc/rfc793.txt">TCP RFC</A> for the -technical details of the state transitions.<P> - -The FIN_WAIT_2 state is somewhat unusual in that there is no timeout -defined in the standard for it. This means that on many operating -systems, a connection in the FIN_WAIT_2 state will stay around until -the system is rebooted. If the system does not have a timeout and -too many FIN_WAIT_2 connections build up, it can fill up the space -allocated for storing information about the connections and crash -the kernel. The connections in FIN_WAIT_2 do not tie up an httpd -process.<P> - -<LI><H2>But why does it happen?</H2> - -There are numerous reasons for it happening, some of them may not -yet be fully clear. What is known follows.<P> - -<H3>Buggy clients and persistent connections</H3> - -Several clients have a bug which pops up when dealing with -<A HREF="../keepalive.html">persistent connections</A> (aka keepalives). -When the connection is idle and the server closes the connection -(based on the <A HREF="../mod/core.html#keepalivetimeout"> -KeepAliveTimeout</A>), the client is programmed so that the client does -not send back a FIN and ACK to the server. This means that the -connection stays in the FIN_WAIT_2 state until one of the following -happens:<P> -<UL> - <LI>The client opens a new connection to the same or a different - site, which causes it to fully close the older connection on - that socket. - <LI>The user exits the client, which on some (most?) clients - causes the OS to fully shutdown the connection. - <LI>The FIN_WAIT_2 times out, on servers that have a timeout - for this state. -</UL><P> -If you are lucky, this means that the buggy client will fully close the -connection and release the resources on your server. However, there -are some cases where the socket is never fully closed, such as a dialup -client disconnecting from their provider before closing the client. -In addition, a client might sit idle for days without making another -connection, and thus may hold its end of the socket open for days -even though it has no further use for it. -<STRONG>This is a bug in the browser or in its operating system's -TCP implementation.</STRONG> <P> - -The clients on which this problem has been verified to exist:<P> -<UL> - <LI>Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) - <LI>Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386) - <LI>Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m) - <LI>MSIE 3.01 on the Macintosh - <LI>MSIE 3.01 on Windows 95 -</UL><P> - -This does not appear to be a problem on: -<UL> - <LI>Mozilla/3.01 (Win95; I) -</UL> -<P> - -It is expected that many other clients have the same problem. What a -client <STRONG>should do</STRONG> is periodically check its open -socket(s) to see if they have been closed by the server, and close their -side of the connection if the server has closed. This check need only -occur once every few seconds, and may even be detected by a OS signal -on some systems (<EM>e.g.</EM>, Win95 and NT clients have this capability, but -they seem to be ignoring it).<P> - -Apache <STRONG>cannot</STRONG> avoid these FIN_WAIT_2 states unless it -disables persistent connections for the buggy clients, just -like we recommend doing for Navigator 2.x clients due to other bugs. -However, non-persistent connections increase the total number of -connections needed per client and slow retrieval of an image-laden -web page. Since non-persistent connections have their own resource -consumptions and a short waiting period after each closure, a busy server -may need persistence in order to best serve its clients.<P> - -As far as we know, the client-caused FIN_WAIT_2 problem is present for -all servers that support persistent connections, including Apache 1.1.x -and 1.2.<P> - -<H3>A necessary bit of code introduced in 1.2</H3> - -While the above bug is a problem, it is not the whole problem. -Some users have observed no FIN_WAIT_2 problems with Apache 1.1.x, -but with 1.2b enough connections build up in the FIN_WAIT_2 state to -crash their server. - -The most likely source for additional FIN_WAIT_2 states -is a function called <CODE>lingering_close()</CODE> which was added -between 1.1 and 1.2. This function is necessary for the proper -handling of persistent connections and any request which includes -content in the message body (<EM>e.g.</EM>, PUTs and POSTs). -What it does is read any data sent by the client for -a certain time after the server closes the connection. The exact -reasons for doing this are somewhat complicated, but involve what -happens if the client is making a request at the same time the -server sends a response and closes the connection. Without lingering, -the client might be forced to reset its TCP input buffer before it -has a chance to read the server's response, and thus understand why -the connection has closed. -See the <A HREF="#appendix">appendix</A> for more details.<P> - -The code in <CODE>lingering_close()</CODE> appears to cause problems -for a number of factors, including the change in traffic patterns -that it causes. The code has been thoroughly reviewed and we are -not aware of any bugs in it. It is possible that there is some -problem in the BSD TCP stack, aside from the lack of a timeout -for the FIN_WAIT_2 state, exposed by the <CODE>lingering_close</CODE> -code that causes the observed problems.<P> - -<H2><LI>What can I do about it?</H2> - -There are several possible workarounds to the problem, some of -which work better than others.<P> - -<H3>Add a timeout for FIN_WAIT_2</H3> - -The obvious workaround is to simply have a timeout for the FIN_WAIT_2 state. -This is not specified by the RFC, and could be claimed to be a -violation of the RFC, but it is widely recognized as being necessary. -The following systems are known to have a timeout: -<P> -<UL> - <LI><A HREF="http://www.freebsd.org/">FreeBSD</A> versions starting at - 2.0 or possibly earlier. - <LI><A HREF="http://www.netbsd.org/">NetBSD</A> version 1.2(?) - <LI><A HREF="http://www.openbsd.org/">OpenBSD</A> all versions(?) - <LI><A HREF="http://www.bsdi.com/">BSD/OS</A> 2.1, with the - <A HREF="ftp://ftp.bsdi.com/bsdi/patches/patches-2.1/K210-027"> - K210-027</A> patch installed. - <LI><A HREF="http://www.sun.com/">Solaris</A> as of around version - 2.2. The timeout can be tuned by using <CODE>ndd</CODE> to - modify <CODE>tcp_fin_wait_2_flush_interval</CODE>, but the - default should be appropriate for most servers and improper - tuning can have negative impacts. - <LI><A HREF="http://www.linux.org/">Linux</A> 2.0.x and - earlier(?) - <LI><A HREF="http://www.hp.com/">HP-UX</A> 10.x defaults to - terminating connections in the FIN_WAIT_2 state after the - normal keepalive timeouts. This does not - refer to the persistent connection or HTTP keepalive - timeouts, but the <CODE>SO_LINGER</CODE> socket option - which is enabled by Apache. This parameter can be adjusted - by using <CODE>nettune</CODE> to modify parameters such as - <CODE>tcp_keepstart</CODE> and <CODE>tcp_keepstop</CODE>. - In later revisions, there is an explicit timer for - connections in FIN_WAIT_2 that can be modified; contact HP - support for details. - <LI><A HREF="http://www.sgi.com/">SGI IRIX</A> can be patched to - support a timeout. For IRIX 5.3, 6.2, and 6.3, - use patches 1654, 1703 and 1778 respectively. If you - have trouble locating these patches, please contact your - SGI support channel for help. - <LI><A HREF="http://www.ncr.com/">NCR's MP RAS Unix</A> 2.xx and - 3.xx both have FIN_WAIT_2 timeouts. In 2.xx it is non-tunable - at 600 seconds, while in 3.xx it defaults to 600 seconds and - is calculated based on the tunable "max keep alive probes" - (default of 8) multiplied by the "keep alive interval" (default - 75 seconds). - <LI><A HREF="http://www.sequent.com">Sequent's ptx/TCP/IP for - DYNIX/ptx</A> has had a FIN_WAIT_2 timeout since around - release 4.1 in mid-1994. -</UL> -<P> -The following systems are known to not have a timeout: -<P> -<UL> - <LI><A HREF="http://www.sun.com/">SunOS 4.x</A> does not and - almost certainly never will have one because it as at the - very end of its development cycle for Sun. If you have kernel - source should be easy to patch. -</UL> -<P> -There is a -<A HREF="http://www.apache.org/dist/contrib/patches/1.2/fin_wait_2.patch" ->patch available</A> for adding a timeout to the FIN_WAIT_2 state; it -was originally intended for BSD/OS, but should be adaptable to most -systems using BSD networking code. You need kernel source code to be -able to use it. If you do adapt it to work for any other systems, -please drop me a note at <A HREF="mailto:marc@apache.org">marc@apache.org</A>. -<P> -<H3>Compile without using <CODE>lingering_close()</CODE></H3> - -It is possible to compile Apache 1.2 without using the -<CODE>lingering_close()</CODE> function. This will result in that -section of code being similar to that which was in 1.1. If you do -this, be aware that it can cause problems with PUTs, POSTs and -persistent connections, especially if the client uses pipelining. -That said, it is no worse than on 1.1, and we understand that keeping your -server running is quite important.<P> - -To compile without the <CODE>lingering_close()</CODE> function, add -<CODE>-DNO_LINGCLOSE</CODE> to the end of the -<CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> file, -rerun <CODE>Configure</CODE> and rebuild the server. -<P> -<H3>Use <CODE>SO_LINGER</CODE> as an alternative to -<CODE>lingering_close()</CODE></H3> - -On most systems, there is an option called <CODE>SO_LINGER</CODE> that -can be set with <CODE>setsockopt(2)</CODE>. It does something very -similar to <CODE>lingering_close()</CODE>, except that it is broken -on many systems so that it causes far more problems than -<CODE>lingering_close</CODE>. On some systems, it could possibly work -better so it may be worth a try if you have no other alternatives. <P> - -To try it, add <CODE>-DUSE_SO_LINGER -DNO_LINGCLOSE</CODE> to the end of the -<CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> -file, rerun <CODE>Configure</CODE> and rebuild the server. <P> - -<STRONG>NOTE:</STRONG> Attempting to use <CODE>SO_LINGER</CODE> and -<CODE>lingering_close()</CODE> at the same time is very likely to do -very bad things, so don't.<P> - -<H3>Increase the amount of memory used for storing connection state</H3> -<DL> -<DT>BSD based networking code: -<DD>BSD stores network data, such as connection states, -in something called an mbuf. When you get so many connections -that the kernel does not have enough mbufs to put them all in, your -kernel will likely crash. You can reduce the effects of the problem -by increasing the number of mbufs that are available; this will not -prevent the problem, it will just make the server go longer before -crashing.<P> - -The exact way to increase them may depend on your OS; look -for some reference to the number of "mbufs" or "mbuf clusters". On -many systems, this can be done by adding the line -<CODE>NMBCLUSTERS="n"</CODE>, where <CODE>n</CODE> is the number of -mbuf clusters you want to your kernel config file and rebuilding your -kernel.<P> -</DL> - -<H3>Disable KeepAlive</H3> -<P>If you are unable to do any of the above then you should, as a last -resort, disable KeepAlive. Edit your httpd.conf and change "KeepAlive On" -to "KeepAlive Off". - -<H2><LI>Feedback</H2> - -If you have any information to add to this page, please contact me at -<A HREF="mailto:marc@apache.org">marc@apache.org</A>.<P> - -<H2><A NAME="appendix"><LI>Appendix</A></H2> -<P> -Below is a message from Roy Fielding, one of the authors of HTTP/1.1. - -<H3>Why the lingering close functionality is necessary with HTTP</H3> - -The need for a server to linger on a socket after a close is noted a couple -times in the HTTP specs, but not explained. This explanation is based on -discussions between myself, Henrik Frystyk, Robert S. Thau, Dave Raggett, -and John C. Mallery in the hallways of MIT while I was at W3C.<P> - -If a server closes the input side of the connection while the client -is sending data (or is planning to send data), then the server's TCP -stack will signal an RST (reset) back to the client. Upon -receipt of the RST, the client will flush its own incoming TCP buffer -back to the un-ACKed packet indicated by the RST packet argument. -If the server has sent a message, usually an error response, to the -client just before the close, and the client receives the RST packet -before its application code has read the error message from its incoming -TCP buffer and before the server has received the ACK sent by the client -upon receipt of that buffer, then the RST will flush the error message -before the client application has a chance to see it. The result is -that the client is left thinking that the connection failed for no -apparent reason.<P> - -There are two conditions under which this is likely to occur: -<OL> -<LI>sending POST or PUT data without proper authorization -<LI>sending multiple requests before each response (pipelining) - and one of the middle requests resulting in an error or - other break-the-connection result. -</OL> -<P> -The solution in all cases is to send the response, close only the -write half of the connection (what shutdown is supposed to do), and -continue reading on the socket until it is either closed by the -client (signifying it has finally read the response) or a timeout occurs. -That is what the kernel is supposed to do if SO_LINGER is set. -Unfortunately, SO_LINGER has no effect on some systems; on some other -systems, it does not have its own timeout and thus the TCP memory -segments just pile-up until the next reboot (planned or not).<P> - -Please note that simply removing the linger code will not solve the -problem -- it only moves it to a different and much harder one to detect. -</OL> -<!--#include virtual="footer.html" --> -</BODY> -</HTML> |