Diffstat (limited to 'docs/manual/misc')
25 files changed, 0 insertions, 9922 deletions
diff --git a/docs/manual/misc/API.html b/docs/manual/misc/API.html deleted file mode 100644 index 496be760c9..0000000000 --- a/docs/manual/misc/API.html +++ /dev/null @@ -1,1161 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML><HEAD> -<TITLE>Apache API notes</TITLE> -</HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<blockquote><strong>Warning:</strong> -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. -</blockquote> - -<H1 ALIGN="CENTER">Apache API notes</H1> - -These are some notes on the Apache API and the data structures you -have to deal with, <EM>etc.</EM> They are not yet nearly complete, but -hopefully, they will help you get your bearings. Keep in mind that -the API is still subject to change as we gain experience with it. -(See the TODO file for what <EM>might</EM> be coming). However, -it will be easy to adapt modules to any changes that are made. -(We have more modules to adapt than you do). -<P> - -A few notes on general pedagogical style here. In the interest of -conciseness, all structure declarations here are incomplete --- the -real ones have more slots that I'm not telling you about. For the -most part, these are reserved to one component of the server core or -another, and should be altered by modules with caution. However, in -some cases, they really are things I just haven't gotten around to -yet. 
Welcome to the bleeding edge.<P> - -Finally, here's an outline, to give you some bare idea of what's -coming up, and in what order: - -<UL> -<LI> <A HREF="#basics">Basic concepts.</A> -<MENU> - <LI> <A HREF="#HMR">Handlers, Modules, and Requests</A> - <LI> <A HREF="#moduletour">A brief tour of a module</A> -</MENU> -<LI> <A HREF="#handlers">How handlers work</A> -<MENU> - <LI> <A HREF="#req_tour">A brief tour of the <CODE>request_rec</CODE></A> - <LI> <A HREF="#req_orig">Where request_rec structures come from</A> - <LI> <A HREF="#req_return">Handling requests, declining, and returning error - codes</A> - <LI> <A HREF="#resp_handlers">Special considerations for response handlers</A> - <LI> <A HREF="#auth_handlers">Special considerations for authentication - handlers</A> - <LI> <A HREF="#log_handlers">Special considerations for logging handlers</A> -</MENU> -<LI> <A HREF="#pools">Resource allocation and resource pools</A> -<LI> <A HREF="#config">Configuration, commands and the like</A> -<MENU> - <LI> <A HREF="#per-dir">Per-directory configuration structures</A> - <LI> <A HREF="#commands">Command handling</A> - <LI> <A HREF="#servconf">Side notes --- per-server configuration, - virtual servers, <EM>etc</EM>.</A> -</MENU> -</UL> - -<H2><A NAME="basics">Basic concepts.</A></H2> - -We begin with an overview of the basic concepts behind the -API, and how they are manifested in the code. - -<H3><A NAME="HMR">Handlers, Modules, and Requests</A></H3> - -Apache breaks down request handling into a series of steps, more or -less the same way the Netscape server API does (although this API has -a few more stages than NetSite does, as hooks for stuff I thought -might be useful in the future). These are: - -<UL> - <LI> URI -> Filename translation - <LI> Auth ID checking [is the user who they say they are?] - <LI> Auth access checking [is the user authorized <EM>here</EM>?] 
- <LI> Access checking other than auth - <LI> Determining MIME type of the object requested - <LI> `Fixups' --- there aren't any of these yet, but the phase is - intended as a hook for possible extensions like - <CODE>SetEnv</CODE>, which don't really fit well elsewhere. - <LI> Actually sending a response back to the client. - <LI> Logging the request -</UL> - -These phases are handled by looking at each of a succession of -<EM>modules</EM>, looking to see if each of them has a handler for the -phase, and attempting to invoke it if so. The handler can typically do -one of three things: - -<UL> - <LI> <EM>Handle</EM> the request, and indicate that it has done so - by returning the magic constant <CODE>OK</CODE>. - <LI> <EM>Decline</EM> to handle the request, by returning the magic - integer constant <CODE>DECLINED</CODE>. In this case, the - server behaves in all respects as if the handler simply hadn't - been there. - <LI> Signal an error, by returning one of the HTTP error codes. - This terminates normal handling of the request, although an - ErrorDocument may be invoked to try to mop up, and it will be - logged in any case. -</UL> - -Most phases are terminated by the first module that handles them; -however, for logging, `fixups', and non-access authentication -checking, all handlers always run (barring an error). Also, the -response phase is unique in that modules may declare multiple handlers -for it, via a dispatch table keyed on the MIME type of the requested -object. Modules may declare a response-phase handler which can handle -<EM>any</EM> request, by giving it the key <CODE>*/*</CODE> (<EM>i.e.</EM>, a -wildcard MIME type specification). However, wildcard handlers are -only invoked if the server has already tried and failed to find a more -specific response handler for the MIME type of the requested object -(either none existed, or they all declined).<P> - -The handlers themselves are functions of one argument (a -<CODE>request_rec</CODE> structure. 
vide infra), which return an -integer, as above.<P> - -<H3><A NAME="moduletour">A brief tour of a module</A></H3> - -At this point, we need to explain the structure of a module. Our -candidate will be one of the messier ones, the CGI module --- this -handles both CGI scripts and the <CODE>ScriptAlias</CODE> config file -command. It's actually a great deal more complicated than most -modules, but if we're going to have only one example, it might as well -be the one with its fingers in every place.<P> - -Let's begin with handlers. In order to handle the CGI scripts, the -module declares a response handler for them. Because of -<CODE>ScriptAlias</CODE>, it also has handlers for the name -translation phase (to recognize <CODE>ScriptAlias</CODE>ed URIs) and the -type-checking phase (any <CODE>ScriptAlias</CODE>ed request is typed -as a CGI script).<P> - -The module needs to maintain some per (virtual) -server information, namely, the <CODE>ScriptAlias</CODE>es in effect; -the module structure therefore contains pointers to a function which -builds these structures, and to another which combines two of them (in -case the main server and a virtual server both have -<CODE>ScriptAlias</CODE>es declared).<P> - -Finally, this module contains code to handle the -<CODE>ScriptAlias</CODE> command itself. This particular module only -declares one command, but there could be more, so modules have -<EM>command tables</EM> which declare their commands, and describe -where they are permitted, and how they are to be invoked. <P> - -A final note on the declared types of the arguments of some of these -commands: a <CODE>pool</CODE> is a pointer to a <EM>resource pool</EM> -structure; these are used by the server to keep track of the memory -which has been allocated, files opened, <EM>etc.</EM>, either to service a -particular request, or to handle the process of configuring itself. 
-That way, when the request is over (or, for the configuration pool, -when the server is restarting), the memory can be freed, and the files -closed, <EM>en masse</EM>, without anyone having to write explicit code to -track them all down and dispose of them. Also, a -<CODE>cmd_parms</CODE> structure contains various information about -the config file being read, and other status information, which is -sometimes of use to the function which processes a config-file command -(such as <CODE>ScriptAlias</CODE>). - -With no further ado, the module itself: - -<PRE> -/* Declarations of handlers. */ - -int translate_scriptalias (request_rec *); -int type_scriptalias (request_rec *); -int cgi_handler (request_rec *); - -/* Subsidiary dispatch table for response-phase handlers, by MIME type */ - -handler_rec cgi_handlers[] = { -{ "application/x-httpd-cgi", cgi_handler }, -{ NULL } -}; - -/* Declarations of routines to manipulate the module's configuration - * info. Note that these are returned, and passed in, as void *'s; - * the server core keeps track of them, but it doesn't, and can't, - * know their internal structure. 
- */ - -void *make_cgi_server_config (pool *); -void *merge_cgi_server_config (pool *, void *, void *); - -/* Declarations of routines to handle config-file commands */ - -extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, - char *real); - -command_rec cgi_cmds[] = { -{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, - "a fakename and a realname"}, -{ NULL } -}; - -module cgi_module = { - STANDARD_MODULE_STUFF, - NULL, /* initializer */ - NULL, /* dir config creator */ - NULL, /* dir merger --- default is to override */ - make_cgi_server_config, /* server config */ - merge_cgi_server_config, /* merge server config */ - cgi_cmds, /* command table */ - cgi_handlers, /* handlers */ - translate_scriptalias, /* filename translation */ - NULL, /* check_user_id */ - NULL, /* check auth */ - NULL, /* check access */ - type_scriptalias, /* type_checker */ - NULL, /* fixups */ - NULL, /* logger */ - NULL /* header parser */ -}; -</PRE> - -<H2><A NAME="handlers">How handlers work</A></H2> - -The sole argument to handlers is a <CODE>request_rec</CODE> structure. -This structure describes a particular request which has been made to -the server, on behalf of a client. In most cases, each connection to -the client generates only one <CODE>request_rec</CODE> structure.<P> - -<H3><A NAME="req_tour">A brief tour of the <CODE>request_rec</CODE></A></H3> - -The <CODE>request_rec</CODE> contains pointers to a resource pool -which will be cleared when the server is finished handling the -request; to structures containing per-server and per-connection -information, and most importantly, information on the request itself.<P> - -The most important such information is a small set of character -strings describing attributes of the object being requested, including -its URI, filename, content-type and content-encoding (these being filled -in by the translation and type-check handlers which handle the -request, respectively). 
<P> - -Other commonly used data items are tables giving the MIME headers on -the client's original request, MIME headers to be sent back with the -response (which modules can add to at will), and environment variables -for any subprocesses which are spawned off in the course of servicing -the request. These tables are manipulated using the -<CODE>ap_table_get</CODE> and <CODE>ap_table_set</CODE> routines. <P> -<BLOCKQUOTE> - Note that the <SAMP>Content-type</SAMP> header value <EM>cannot</EM> be - set by module content-handlers using the <SAMP>ap_table_*()</SAMP> - routines. Rather, it is set by pointing the <SAMP>content_type</SAMP> - field in the <SAMP>request_rec</SAMP> structure to an appropriate - string. <EM>E.g.</EM>, - <PRE> - r->content_type = "text/html"; - </PRE> -</BLOCKQUOTE> -Finally, there are pointers to two data structures which, in turn, -point to per-module configuration structures. Specifically, these -hold pointers to the data structures which the module has built to -describe the way it has been configured to operate in a given -directory (via <CODE>.htaccess</CODE> files or -<CODE><Directory></CODE> sections), for private data it has -built in the course of servicing the request (so modules' handlers for -one phase can pass `notes' to their handlers for other phases). There -is another such configuration vector in the <CODE>server_rec</CODE> -data structure pointed to by the <CODE>request_rec</CODE>, which -contains per (virtual) server configuration data.<P> - -Here is an abridged declaration, giving the fields most commonly used:<P> - -<PRE> -struct request_rec { - - pool *pool; - conn_rec *connection; - server_rec *server; - - /* What object is being requested */ - - char *uri; - char *filename; - char *path_info; - char *args; /* QUERY_ARGS, if any */ - struct stat finfo; /* Set by server core; - * st_mode set to zero if no such file */ - - char *content_type; - char *content_encoding; - - /* MIME header environments, in and out. 
Also, an array containing - * environment variables to be passed to subprocesses, so people can - * write modules to add to that environment. - * - * The difference between headers_out and err_headers_out is that - * the latter are printed even on error, and persist across internal - * redirects (so the headers printed for ErrorDocument handlers will - * have them). - */ - - table *headers_in; - table *headers_out; - table *err_headers_out; - table *subprocess_env; - - /* Info about the request itself... */ - - int header_only; /* HEAD request, as opposed to GET */ - char *protocol; /* Protocol, as given to us, or HTTP/0.9 */ - char *method; /* GET, HEAD, POST, <EM>etc.</EM> */ - int method_number; /* M_GET, M_POST, <EM>etc.</EM> */ - - /* Info for logging */ - - char *the_request; - int bytes_sent; - - /* A flag which modules can set, to indicate that the data being - * returned is volatile, and clients should be told not to cache it. - */ - - int no_cache; - - /* Various other config info which may change with .htaccess files - * These are config vectors, with one void* pointer for each module - * (the thing pointed to being the module's business). - */ - - void *per_dir_config; /* Options set in config files, <EM>etc.</EM> */ - void *request_config; /* Notes on *this* request */ - -}; - -</PRE> - -<H3><A NAME="req_orig">Where request_rec structures come from</A></H3> - -Most <CODE>request_rec</CODE> structures are built by reading an HTTP -request from a client, and filling in the fields. However, there are -a few exceptions: - -<UL> - <LI> If the request is to an imagemap, a type map (<EM>i.e.</EM>, a - <CODE>*.var</CODE> file), or a CGI script which returned a - local `Location:', then the resource which the user requested - is going to be ultimately located by some URI other than what - the client originally supplied. 
In this case, the server does - an <EM>internal redirect</EM>, constructing a new - <CODE>request_rec</CODE> for the new URI, and processing it - almost exactly as if the client had requested the new URI - directly. <P> - - <LI> If some handler signaled an error, and an - <CODE>ErrorDocument</CODE> is in scope, the same internal - redirect machinery comes into play.<P> - - <LI> Finally, a handler occasionally needs to investigate `what - would happen if' some other request were run. For instance, - the directory indexing module needs to know what MIME type - would be assigned to a request for each directory entry, in - order to figure out what icon to use.<P> - - Such handlers can construct a <EM>sub-request</EM>, using the - functions <CODE>ap_sub_req_lookup_file</CODE>, - <CODE>ap_sub_req_lookup_uri</CODE>, and - <CODE>ap_sub_req_method_uri</CODE>; these construct a new - <CODE>request_rec</CODE> structure and processes it as you - would expect, up to but not including the point of actually - sending a response. (These functions skip over the access - checks if the sub-request is for a file in the same directory - as the original request).<P> - - (Server-side includes work by building sub-requests and then - actually invoking the response handler for them, via the - function <CODE>ap_run_sub_req</CODE>). -</UL> - -<H3><A NAME="req_return">Handling requests, declining, and returning error - codes</A></H3> - -As discussed above, each handler, when invoked to handle a particular -<CODE>request_rec</CODE>, has to return an <CODE>int</CODE> to -indicate what happened. That can either be - -<UL> - <LI> OK --- the request was handled successfully. This may or may - not terminate the phase. - <LI> DECLINED --- no erroneous condition exists, but the module - declines to handle the phase; the server tries to find another. - <LI> an HTTP error code, which aborts handling of the request. 
-</UL> - -Note that if the error code returned is <CODE>REDIRECT</CODE>, then -the module should put a <CODE>Location</CODE> in the request's -<CODE>headers_out</CODE>, to indicate where the client should be -redirected <EM>to</EM>. <P> - -<H3><A NAME="resp_handlers">Special considerations for response - handlers</A></H3> - -Handlers for most phases do their work by simply setting a few fields -in the <CODE>request_rec</CODE> structure (or, in the case of access -checkers, simply by returning the correct error code). However, -response handlers have to actually send a response back to the client. <P> - -They should begin by sending an HTTP response header, using the -function <CODE>ap_send_http_header</CODE>. (You don't have to do -anything special to skip sending the header for HTTP/0.9 requests; the -function figures out on its own that it shouldn't do anything). If -the request is marked <CODE>header_only</CODE>, that's all they should -do; they should return after that, without attempting any further -output. <P> - -Otherwise, they should produce a response body which responds to the -client as appropriate. The primitives for this are <CODE>ap_rputc</CODE> -and <CODE>ap_rprintf</CODE>, for internally generated output, and -<CODE>ap_send_fd</CODE>, to copy the contents of some <CODE>FILE *</CODE> -straight to the client. <P> - -At this point, you should more or less understand the following piece -of code, which is the handler which handles <CODE>GET</CODE> requests -which have no more specific handler; it also shows how conditional -<CODE>GET</CODE>s can be handled, if it's desirable to do so in a -particular response handler --- <CODE>ap_set_last_modified</CODE> checks -against the <CODE>If-modified-since</CODE> value supplied by the -client, if any, and returns an appropriate code (which will, if -nonzero, be USE_LOCAL_COPY). 
No similar considerations apply for -<CODE>ap_set_content_length</CODE>, but it returns an error code for -symmetry.<P> - -<PRE> -int default_handler (request_rec *r) -{ - int errstatus; - FILE *f; - - if (r->method_number != M_GET) return DECLINED; - if (r->finfo.st_mode == 0) return NOT_FOUND; - - if ((errstatus = ap_set_content_length (r, r->finfo.st_size)) - || (errstatus = ap_set_last_modified (r, r->finfo.st_mtime))) - return errstatus; - - f = fopen (r->filename, "r"); - - if (f == NULL) { - log_reason("file permissions deny server access", - r->filename, r); - return FORBIDDEN; - } - - register_timeout ("send", r); - ap_send_http_header (r); - - if (!r->header_only) send_fd (f, r); - ap_pfclose (r->pool, f); - return OK; -} -</PRE> - -Finally, if all of this is too much of a challenge, there are a few -ways out of it. First off, as shown above, a response handler which -has not yet produced any output can simply return an error code, in -which case the server will automatically produce an error response. -Secondly, it can punt to some other handler by invoking -<CODE>ap_internal_redirect</CODE>, which is how the internal redirection -machinery discussed above is invoked. A response handler which has -internally redirected should always return <CODE>OK</CODE>. <P> - -(Invoking <CODE>ap_internal_redirect</CODE> from handlers which are -<EM>not</EM> response handlers will lead to serious confusion). - -<H3><A NAME="auth_handlers">Special considerations for authentication - handlers</A></H3> - -Stuff that should be discussed here in detail: - -<UL> - <LI> Authentication-phase handlers not invoked unless auth is - configured for the directory. - <LI> Common auth configuration stored in the core per-dir - configuration; it has accessors <CODE>ap_auth_type</CODE>, - <CODE>ap_auth_name</CODE>, and <CODE>ap_requires</CODE>. 
- <LI> Common routines, to handle the protocol end of things, at least - for HTTP basic authentication (<CODE>ap_get_basic_auth_pw</CODE>, - which sets the <CODE>connection->user</CODE> structure field - automatically, and <CODE>ap_note_basic_auth_failure</CODE>, which - arranges for the proper <CODE>WWW-Authenticate:</CODE> header - to be sent back). -</UL> - -<H3><A NAME="log_handlers">Special considerations for logging handlers</A></H3> - -When a request has internally redirected, there is the question of -what to log. Apache handles this by bundling the entire chain of -redirects into a list of <CODE>request_rec</CODE> structures which are -threaded through the <CODE>r->prev</CODE> and <CODE>r->next</CODE> -pointers. The <CODE>request_rec</CODE> which is passed to the logging -handlers in such cases is the one which was originally built for the -initial request from the client; note that the bytes_sent field will -only be correct in the last request in the chain (the one for which a -response was actually sent). - -<H2><A NAME="pools">Resource allocation and resource pools</A></H2> -<P> -One of the problems of writing and designing a server-pool server is -that of preventing leakage, that is, allocating resources (memory, -open files, <EM>etc.</EM>), without subsequently releasing them. The resource -pool machinery is designed to make it easy to prevent this from -happening, by allowing resource to be allocated in such a way that -they are <EM>automatically</EM> released when the server is done with -them. -</P> -<P> -The way this works is as follows: the memory which is allocated, file -opened, <EM>etc.</EM>, to deal with a particular request are tied to a -<EM>resource pool</EM> which is allocated for the request. The pool -is a data structure which itself tracks the resources in question. -</P> -<P> -When the request has been processed, the pool is <EM>cleared</EM>. 
At -that point, all the memory associated with it is released for reuse, -all files associated with it are closed, and any other clean-up -functions which are associated with the pool are run. When this is -over, we can be confident that all the resource tied to the pool have -been released, and that none of them have leaked. -</P> -<P> -Server restarts, and allocation of memory and resources for per-server -configuration, are handled in a similar way. There is a -<EM>configuration pool</EM>, which keeps track of resources which were -allocated while reading the server configuration files, and handling -the commands therein (for instance, the memory that was allocated for -per-server module configuration, log files and other files that were -opened, and so forth). When the server restarts, and has to reread -the configuration files, the configuration pool is cleared, and so the -memory and file descriptors which were taken up by reading them the -last time are made available for reuse. -</P> -<P> -It should be noted that use of the pool machinery isn't generally -obligatory, except for situations like logging handlers, where you -really need to register cleanups to make sure that the log file gets -closed when the server restarts (this is most easily done by using the -function <CODE><A HREF="#pool-files">ap_pfopen</A></CODE>, which also -arranges for the underlying file descriptor to be closed before any -child processes, such as for CGI scripts, are <CODE>exec</CODE>ed), or -in case you are using the timeout machinery (which isn't yet even -documented here). However, there are two benefits to using it: -resources allocated to a pool never leak (even if you allocate a -scratch string, and just forget about it); also, for memory -allocation, <CODE>ap_palloc</CODE> is generally faster than -<CODE>malloc</CODE>. -</P> -<P> -We begin here by describing how memory is allocated to pools, and then -discuss how other resources are tracked by the resource pool -machinery. 
-</P> -<H3>Allocation of memory in pools</H3> -<P> -Memory is allocated to pools by calling the function -<CODE>ap_palloc</CODE>, which takes two arguments, one being a pointer to -a resource pool structure, and the other being the amount of memory to -allocate (in <CODE>char</CODE>s). Within handlers for handling -requests, the most common way of getting a resource pool structure is -by looking at the <CODE>pool</CODE> slot of the relevant -<CODE>request_rec</CODE>; hence the repeated appearance of the -following idiom in module code: -</P> -<PRE> -int my_handler(request_rec *r) -{ - struct my_structure *foo; - ... - - foo = (struct my_structure *)ap_palloc (r->pool, sizeof(struct my_structure)); -} -</PRE> -<P> -Note that <EM>there is no <CODE>ap_pfree</CODE></EM> --- -<CODE>ap_palloc</CODE>ed memory is freed only when the associated -resource pool is cleared. This means that <CODE>ap_palloc</CODE> does not -have to do as much accounting as <CODE>malloc()</CODE>; all it does in -the typical case is to round up the size, bump a pointer, and do a -range check. -</P> -<P> -(It also raises the possibility that heavy use of <CODE>ap_palloc</CODE> -could cause a server process to grow excessively large. There are -two ways to deal with this, which are dealt with below; briefly, you -can use <CODE>malloc</CODE>, and try to be sure that all of the memory -gets explicitly <CODE>free</CODE>d, or you can allocate a sub-pool of -the main pool, allocate your memory in the sub-pool, and clear it out -periodically. The latter technique is discussed in the section on -sub-pools below, and is used in the directory-indexing code, in order -to avoid excessive storage allocation when listing directories with -thousands of files). -</P> -<H3>Allocating initialized memory</H3> -<P> -There are functions which allocate initialized memory, and are -frequently useful. 
The function <CODE>ap_pcalloc</CODE> has the same -interface as <CODE>ap_palloc</CODE>, but clears out the memory it -allocates before it returns it. The function <CODE>ap_pstrdup</CODE> -takes a resource pool and a <CODE>char *</CODE> as arguments, and -allocates memory for a copy of the string the pointer points to, -returning a pointer to the copy. Finally <CODE>ap_pstrcat</CODE> is a -varargs-style function, which takes a pointer to a resource pool, and -at least two <CODE>char *</CODE> arguments, the last of which must be -<CODE>NULL</CODE>. It allocates enough memory to fit copies of each -of the strings, as a unit; for instance: -</P> -<PRE> - ap_pstrcat (r->pool, "foo", "/", "bar", NULL); -</PRE> -<P> -returns a pointer to 8 bytes worth of memory, initialized to -<CODE>"foo/bar"</CODE>. -</P> -<H3><A NAME="pools-used">Commonly-used pools in the Apache Web server</A></H3> -<P> -A pool is really defined by its lifetime more than anything else. There -are some static pools in http_main which are passed to various -non-http_main functions as arguments at opportune times. Here they are: -</P> -<DL COMPACT> - <DT>permanent_pool - </DT> - <DD> - <UL> - <LI>never passed to anything else, this is the ancestor of all pools - </LI> - </UL> - </DD> - <DT>pconf - </DT> - <DD> - <UL> - <LI>subpool of permanent_pool - </LI> - <LI>created at the beginning of a config "cycle"; exists until the - server is terminated or restarts; passed to all config-time - routines, either via cmd->pool, or as the "pool *p" argument on - those which don't take pools - </LI> - <LI>passed to the module init() functions - </LI> - </UL> - </DD> - <DT>ptemp - </DT> - <DD> - <UL> - <LI>sorry I lie, this pool isn't called this currently in 1.3, I - renamed it this in my pthreads development. I'm referring to - the use of ptrans in the parent... contrast this with the later - definition of ptrans in the child. 
- </LI> - <LI>subpool of permanent_pool - </LI> - <LI>created at the beginning of a config "cycle"; exists until the - end of config parsing; passed to config-time routines <EM>via</EM> - cmd->temp_pool. Somewhat of a "bastard child" because it isn't - available everywhere. Used for temporary scratch space which - may be needed by some config routines but which is deleted at - the end of config. - </LI> - </UL> - </DD> - <DT>pchild - </DT> - <DD> - <UL> - <LI>subpool of permanent_pool - </LI> - <LI>created when a child is spawned (or a thread is created); lives - until that child (thread) is destroyed - </LI> - <LI>passed to the module child_init functions - </LI> - <LI>destruction happens right after the child_exit functions are - called... (which may explain why I think child_exit is redundant - and unneeded) - </LI> - </UL> - </DD> - <DT>ptrans - <DT> - <DD> - <UL> - <LI>should be a subpool of pchild, but currently is a subpool of - permanent_pool, see above - </LI> - <LI>cleared by the child before going into the accept() loop to receive - a connection - </LI> - <LI>used as connection->pool - </LI> - </UL> - </DD> - <DT>r->pool - </DT> - <DD> - <UL> - <LI>for the main request this is a subpool of connection->pool; for - subrequests it is a subpool of the parent request's pool. - </LI> - <LI>exists until the end of the request (<EM>i.e.</EM>, - ap_destroy_sub_req, or - in child_main after process_request has finished) - </LI> - <LI>note that r itself is allocated from r->pool; <EM>i.e.</EM>, - r->pool is - first created and then r is the first thing palloc()d from it - </LI> - </UL> - </DD> -</DL> -<P> -For almost everything folks do, r->pool is the pool to use. But you -can see how other lifetimes, such as pchild, are useful to some -modules... such as modules that need to open a database connection once -per child, and wish to clean it up when the child dies. 
-</P> -<P> -You can also see how some bugs have manifested themselves, such as setting -connection->user to a value from r->pool -- in this case -connection exists -for the lifetime of ptrans, which is longer than r->pool (especially if -r->pool is a subrequest!). So the correct thing to do is to allocate -from connection->pool. -</P> -<P> -And there was another interesting bug in mod_include/mod_cgi. You'll see -in those that they do this test to decide if they should use r->pool -or r->main->pool. In this case the resource that they are registering -for cleanup is a child process. If it were registered in r->pool, -then the code would wait() for the child when the subrequest finishes. -With mod_include this could be any old #include, and the delay can be up -to 3 seconds... and happened quite frequently. Instead the subprocess -is registered in r->main->pool which causes it to be cleaned up when -the entire request is done -- <EM>i.e.</EM>, after the output has been sent to -the client and logging has happened. -</P> -<H3><A NAME="pool-files">Tracking open files, etc.</A></H3> -<P> -As indicated above, resource pools are also used to track other sorts -of resources besides memory. The most common are open files. The -routine which is typically used for this is <CODE>ap_pfopen</CODE>, which -takes a resource pool and two strings as arguments; the strings are -the same as the typical arguments to <CODE>fopen</CODE>, <EM>e.g.</EM>, -</P> -<PRE> - ... - FILE *f = ap_pfopen (r->pool, r->filename, "r"); - - if (f == NULL) { ... } else { ... } -</PRE> -<P> -There is also a <CODE>ap_popenf</CODE> routine, which parallels the -lower-level <CODE>open</CODE> system call. Both of these routines -arrange for the file to be closed when the resource pool in question -is cleared. 
-</P> -<P> -Unlike the case for memory, there <EM>are</EM> functions to close -files allocated with <CODE>ap_pfopen</CODE>, and <CODE>ap_popenf</CODE>, -namely <CODE>ap_pfclose</CODE> and <CODE>ap_pclosef</CODE>. (This is -because, on many systems, the number of files which a single process -can have open is quite limited). It is important to use these -functions to close files allocated with <CODE>ap_pfopen</CODE> and -<CODE>ap_popenf</CODE>, since to do otherwise could cause fatal errors on -systems such as Linux, which react badly if the same -<CODE>FILE*</CODE> is closed more than once. -</P> -<P> -(Using the <CODE>close</CODE> functions is not mandatory, since the -file will eventually be closed regardless, but you should consider it -in cases where your module is opening, or could open, a lot of files). -</P> -<H3>Other sorts of resources --- cleanup functions</H3> -<BLOCKQUOTE> -More text goes here. Describe the cleanup primitives in terms of -which the file stuff is implemented; also, <CODE>spawn_process</CODE>. -</BLOCKQUOTE> -<P> -Pool cleanups live until clear_pool() is called: clear_pool(a) recursively -calls destroy_pool() on all subpools of a; then calls all the cleanups for a; -then releases all the memory for a. destroy_pool(a) calls clear_pool(a) -and then releases the pool structure itself. <EM>i.e.</EM>, clear_pool(a) doesn't -delete a, it just frees up all the resources and you can start using it -again immediately. -</P> -<H3>Fine control --- creating and dealing with sub-pools, with a note -on sub-requests</H3> - -On rare occasions, too-free use of <CODE>ap_palloc()</CODE> and the -associated primitives may result in undesirably profligate resource -allocation. You can deal with such a case by creating a -<EM>sub-pool</EM>, allocating within the sub-pool rather than the main -pool, and clearing or destroying the sub-pool, which releases the -resources which were associated with it. 
(This really <EM>is</EM> a -rare situation; the only case in which it comes up in the standard -module set is in case of listing directories, and then only with -<EM>very</EM> large directories. Unnecessary use of the primitives -discussed here can hair up your code quite a bit, with very little -gain). <P> - -The primitive for creating a sub-pool is <CODE>ap_make_sub_pool</CODE>, -which takes another pool (the parent pool) as an argument. When the -main pool is cleared, the sub-pool will be destroyed. The sub-pool -may also be cleared or destroyed at any time, by calling the functions -<CODE>ap_clear_pool</CODE> and <CODE>ap_destroy_pool</CODE>, respectively. -(The difference is that <CODE>ap_clear_pool</CODE> frees resources -associated with the pool, while <CODE>ap_destroy_pool</CODE> also -deallocates the pool itself. In the former case, you can allocate new -resources within the pool, and clear it again, and so forth; in the -latter case, it is simply gone). <P> - -One final note --- sub-requests have their own resource pools, which -are sub-pools of the resource pool for the main request. The polite -way to reclaim the resources associated with a sub request which you -have allocated (using the <CODE>ap_sub_req_...</CODE> functions) -is <CODE>ap_destroy_sub_req</CODE>, which frees the resource pool. -Before calling this function, be sure to copy anything that you care -about which might be allocated in the sub-request's resource pool into -someplace a little less volatile (for instance, the filename in its -<CODE>request_rec</CODE> structure). <P> - -(Again, under most circumstances, you shouldn't feel obliged to call -this function; only 2K of memory or so are allocated for a typical sub -request, and it will be freed anyway when the main request pool is -cleared. It is only when you are allocating many, many sub-requests -for a single main request that you should seriously consider the -<CODE>ap_destroy_...</CODE> functions). 
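<P>
The clear/destroy distinction and the parent/sub-pool relationship described above can be sketched with a toy model. This is only an illustration under stated assumptions: <CODE>toy_pool</CODE> and every <CODE>toy_*</CODE> function are hypothetical names, not part of the Apache API, and the model tracks cleanups only (no memory blocks).
</P>

```c
/* Toy model of pool clear/destroy semantics. NOT the Apache implementation:
 * toy_pool and all toy_* functions are hypothetical, for illustration only. */
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_cleanup_fn)(void *data);

typedef struct toy_cleanup {
    toy_cleanup_fn fn;
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *sibling;
    toy_cleanup *cleanups;
} toy_pool;

/* Create a pool; pass NULL for a top-level pool. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof *p);
    if (p == NULL) return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Arrange for fn(data) to run when the pool is next cleared. */
void toy_register_cleanup(toy_pool *p, void *data, toy_cleanup_fn fn)
{
    toy_cleanup *c = malloc(sizeof *c);
    if (c == NULL) return;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
}

void toy_destroy_pool(toy_pool *p);

/* Destroy all sub-pools, then run this pool's cleanups.
 * The pool itself survives and can be reused immediately. */
void toy_clear_pool(toy_pool *p)
{
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);   /* child unlinks itself */
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Clear the pool, unlink it from its parent, and release the pool itself. */
void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;
        while (*link != p) link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}

/* Example cleanup: count how many times it has run. */
void toy_count_cleanup(void *data) { ++*(int *)data; }
```

<P>
In this sketch, clearing a parent pool runs the cleanups of every sub-pool as well as its own, and the parent can then take new cleanups; destroying it afterwards runs those and frees the pool structure.
</P>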
- -<H2><A NAME="config">Configuration, commands and the like</A></H2> - -One of the design goals for this server was to maintain external -compatibility with the NCSA 1.3 server --- that is, to read the same -configuration files, to process all the directives therein correctly, -and in general to be a drop-in replacement for NCSA. On the other -hand, another design goal was to move as much of the server's -functionality into modules which have as little as possible to do with -the monolithic server core. The only way to reconcile these goals is -to move the handling of most commands from the central server into the -modules. <P> - -However, just giving the modules command tables is not enough to -divorce them completely from the server core. The server has to -remember the commands in order to act on them later. That involves -maintaining data which is private to the modules, and which can be -either per-server, or per-directory. Most things are per-directory, -including in particular access control and authorization information, -but also information on how to determine file types from suffixes, -which can be modified by <CODE>AddType</CODE> and -<CODE>DefaultType</CODE> directives, and so forth. In general, the -governing philosophy is that anything which <EM>can</EM> be made -configurable by directory should be; per-server information is -generally used in the standard set of modules for information like -<CODE>Alias</CODE>es and <CODE>Redirect</CODE>s which come into play -before the request is tied to a particular place in the underlying -file system. <P> - -Another requirement for emulating the NCSA server is being able to -handle the per-directory configuration files, generally called -<CODE>.htaccess</CODE> files, though even in the NCSA server they can -contain directives which have nothing at all to do with access -control. 
Accordingly, after URI -> filename translation, but before -performing any other phase, the server walks down the directory -hierarchy of the underlying filesystem, following the translated -pathname, to read any <CODE>.htaccess</CODE> files which might be -present. The information which is read in then has to be -<EM>merged</EM> with the applicable information from the server's own -config files (either from the <CODE><Directory></CODE> sections -in <CODE>access.conf</CODE>, or from defaults in -<CODE>srm.conf</CODE>, which actually behaves for most purposes almost -exactly like <CODE><Directory /></CODE>).<P> - -Finally, after having served a request which involved reading -<CODE>.htaccess</CODE> files, we need to discard the storage allocated -for handling them. That is solved the same way it is solved wherever -else similar problems come up, by tying those structures to the -per-transaction resource pool. <P> - -<H3><A NAME="per-dir">Per-directory configuration structures</A></H3> - -Let's look at how all of this plays out in <CODE>mod_mime.c</CODE>, -which defines the file typing handler which emulates the NCSA server's -behavior of determining file types from suffixes. What we'll be -looking at, here, is the code which implements the -<CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands. These -commands can appear in <CODE>.htaccess</CODE> files, so they must be -handled in the module's private per-directory data, which, in fact, -consists of two separate <CODE>table</CODE>s for MIME types and -encoding information, and is declared as follows: - -<PRE> -typedef struct { - table *forced_types; /* Additional AddTyped stuff */ - table *encoding_types; /* Added with AddEncoding... */ -} mime_dir_config; -</PRE> - -When the server is reading a configuration file, or -<CODE><Directory></CODE> section, which includes one of the MIME -module's commands, it needs to create a <CODE>mime_dir_config</CODE> -structure, so those commands have something to act on. 
It does this -by invoking the function it finds in the module's `create per-dir -config slot', with two arguments: the name of the directory to which -this configuration information applies (or <CODE>NULL</CODE> for -<CODE>srm.conf</CODE>), and a pointer to a resource pool in which the -allocation should happen. <P> - -(If we are reading a <CODE>.htaccess</CODE> file, that resource pool -is the per-request resource pool for the request; otherwise it is a -resource pool which is used for configuration data, and cleared on -restarts. Either way, it is important for the structure being created -to vanish when the pool is cleared, by registering a cleanup on the -pool if necessary). <P> - -For the MIME module, the per-dir config creation function just -<CODE>ap_palloc</CODE>s the structure above, and creates a couple of -<CODE>table</CODE>s to fill it. That looks like this: - -<PRE> -void *create_mime_dir_config (pool *p, char *dummy) -{ - mime_dir_config *new = - (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config)); - - new->forced_types = ap_make_table (p, 4); - new->encoding_types = ap_make_table (p, 4); - - return new; -} -</PRE> - -Now, suppose we've just read in a <CODE>.htaccess</CODE> file. We -already have the per-directory configuration structure for the next -directory up in the hierarchy. If the <CODE>.htaccess</CODE> file we -just read in didn't have any <CODE>AddType</CODE> or -<CODE>AddEncoding</CODE> commands, its per-directory config structure -for the MIME module is still valid, and we can just use it. -Otherwise, we need to merge the two structures somehow. <P> - -To do that, the server invokes the module's per-directory config merge -function, if one is present. That function takes three arguments: -the two structures being merged, and a resource pool in which to -allocate the result. 
For the MIME module, all that needs to be done -is overlay the tables from the new per-directory config structure with -those from the parent: - -<PRE> -void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) -{ - mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; - mime_dir_config *subdir = (mime_dir_config *)subdirv; - mime_dir_config *new = - (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config)); - - new->forced_types = ap_overlay_tables (p, subdir->forced_types, - parent_dir->forced_types); - new->encoding_types = ap_overlay_tables (p, subdir->encoding_types, - parent_dir->encoding_types); - - return new; -} -</PRE> - -As a note --- if there is no per-directory merge function present, the -server will just use the subdirectory's configuration info, and ignore -the parent's. For some modules, that works just fine (<EM>e.g.</EM>, for the -includes module, whose per-directory configuration information -consists solely of the state of the <CODE>XBITHACK</CODE>), and for -those modules, you can just not declare one, and leave the -corresponding structure slot in the module itself <CODE>NULL</CODE>.<P> - -<H3><A NAME="commands">Command handling</A></H3> - -Now that we have these structures, we need to be able to figure out -how to fill them. That involves processing the actual -<CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands. To find -commands, the server looks in the module's <CODE>command table</CODE>. -That table contains information on how many arguments the commands -take, and in what formats, where it is permitted, and so forth. That -information is sufficient to allow the server to invoke most -command-handling functions with pre-parsed arguments. 
Without further -ado, let's look at the <CODE>AddType</CODE> command handler, which -looks like this (the <CODE>AddEncoding</CODE> command looks basically -the same, and won't be shown here): - -<PRE> -char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) -{ - if (*ext == '.') ++ext; - ap_table_set (m->forced_types, ext, ct); - return NULL; -} -</PRE> - -This command handler is unusually simple. As you can see, it takes -four arguments, two of which are pre-parsed arguments, the third being -the per-directory configuration structure for the module in question, -and the fourth being a pointer to a <CODE>cmd_parms</CODE> structure. -That structure contains a bunch of arguments which are frequently of -use to some, but not all, commands, including a resource pool (from -which memory can be allocated, and to which cleanups should be tied), -and the (virtual) server being configured, from which the module's -per-server configuration data can be obtained if required.<P> - -Another way in which this particular command handler is unusually -simple is that there are no error conditions which it can encounter. -If there were, it could return an error message instead of -<CODE>NULL</CODE>; this causes an error to be printed out on the -server's <CODE>stderr</CODE>, followed by a quick exit, if it is in -the main config files; for a <CODE>.htaccess</CODE> file, the syntax -error is logged in the server error log (along with an indication of -where it came from), and the request is bounced with a server error -response (HTTP error status, code 500). 
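<P>
The convention just described (return <CODE>NULL</CODE> on success, or a message string on a syntax error) can be sketched with a handler that does have error conditions. This is a minimal sketch, not code from <CODE>mod_mime.c</CODE>: <CODE>toy_table</CODE>, <CODE>toy_table_get</CODE>, <CODE>toy_add_type</CODE> and the particular checks are hypothetical stand-ins so the example compiles on its own.
</P>

```c
/* Sketch of the "return an error string instead of NULL" convention.
 * toy_table and the toy_* functions are hypothetical, not Apache types. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

typedef struct {
    const char *key[TOY_MAX_ENTRIES];
    const char *val[TOY_MAX_ENTRIES];
    int n;
} toy_table;

/* Look up a key; returns NULL if it is not present. */
const char *toy_table_get(const toy_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->n; ++i)
        if (strcmp(t->key[i], key) == 0) return t->val[i];
    return NULL;
}

/* Returns NULL on success, or a message describing the syntax error.
 * The caller decides what to do with the message (print and exit for
 * the main config files, log and answer 500 for a .htaccess file). */
const char *toy_add_type(toy_table *types, const char *ct, const char *ext)
{
    int i;

    if (*ext == '.') ++ext;                 /* accept ".html" and "html" */
    if (*ext == '\0')
        return "AddType: empty file extension";
    if (strchr(ct, '/') == NULL)
        return "AddType: type must look like major/minor";

    for (i = 0; i < types->n; ++i) {
        if (strcmp(types->key[i], ext) == 0) {
            types->val[i] = ct;             /* set semantics: overwrite */
            return NULL;
        }
    }
    if (types->n == TOY_MAX_ENTRIES)
        return "AddType: too many types";

    types->key[types->n] = ext;
    types->val[types->n] = ct;
    ++types->n;
    return NULL;
}
```

<P>
The handler itself only reports the problem; printing, logging and the choice between exiting and returning a 500 response stay with the caller, which is what lets the same handler serve both the main config files and <CODE>.htaccess</CODE> files.
</P>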
<P> - -The MIME module's command table has entries for these commands, which -look like this: - -<PRE> -command_rec mime_cmds[] = { -{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, - "a mime type followed by a file extension" }, -{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, - "an encoding (<EM>e.g.</EM>, gzip), followed by a file extension" }, -{ NULL } -}; -</PRE> - -The entries in these tables are: - -<UL> - <LI> The name of the command - <LI> The function which handles it - <LI> a <CODE>(void *)</CODE> pointer, which is passed in the - <CODE>cmd_parms</CODE> structure to the command handler --- - this is useful in case many similar commands are handled by the - same function. - <LI> A bit mask indicating where the command may appear. There are - mask bits corresponding to each <CODE>AllowOverride</CODE> - option, and an additional mask bit, <CODE>RSRC_CONF</CODE>, - indicating that the command may appear in the server's own - config files, but <EM>not</EM> in any <CODE>.htaccess</CODE> - file. - <LI> A flag indicating how many arguments the command handler wants - pre-parsed, and how they should be passed in. - <CODE>TAKE2</CODE> indicates two pre-parsed arguments. Other - options are <CODE>TAKE1</CODE>, which indicates one pre-parsed - argument, <CODE>FLAG</CODE>, which indicates that the argument - should be <CODE>On</CODE> or <CODE>Off</CODE>, and is passed in - as a boolean flag, <CODE>RAW_ARGS</CODE>, which causes the - server to give the command the raw, unparsed arguments - (everything but the command name itself). There is also - <CODE>ITERATE</CODE>, which means that the handler looks the - same as <CODE>TAKE1</CODE>, but that if multiple arguments are - present, it should be called multiple times, and finally - <CODE>ITERATE2</CODE>, which indicates that the command handler - looks like a <CODE>TAKE2</CODE>, but if more arguments are - present, then it should be called multiple times, holding the - first argument constant. 
- <LI> Finally, we have a string which describes the arguments that - should be present. If the arguments in the actual config file - are not as required, this string will be used to help give a - more specific error message. (You can safely leave this - <CODE>NULL</CODE>). -</UL> - -Finally, having set this all up, we have to use it. This is -ultimately done in the module's handlers, specifically for its -file-typing handler, which looks more or less like this; note that the -per-directory configuration structure is extracted from the -<CODE>request_rec</CODE>'s per-directory configuration vector by using -the <CODE>ap_get_module_config</CODE> function. - -<PRE> -int find_ct(request_rec *r) -{ - int i; - char *fn = ap_pstrdup (r->pool, r->filename); - mime_dir_config *conf = (mime_dir_config *) - ap_get_module_config(r->per_dir_config, &mime_module); - char *type; - - if (S_ISDIR(r->finfo.st_mode)) { - r->content_type = DIR_MAGIC_TYPE; - return OK; - } - - if((i=ap_rind(fn,'.')) < 0) return DECLINED; - ++i; - - if ((type = ap_table_get (conf->encoding_types, &fn[i]))) - { - r->content_encoding = type; - - /* go back to previous extension to try to use it as a type */ - - fn[i-1] = '\0'; - if((i=ap_rind(fn,'.')) < 0) return OK; - ++i; - } - - if ((type = ap_table_get (conf->forced_types, &fn[i]))) - { - r->content_type = type; - } - - return OK; -} - -</PRE> - -<H3><A NAME="servconf">Side notes --- per-server configuration, virtual - servers, <EM>etc</EM>.</A></H3> - -The basic ideas behind per-server module configuration are basically -the same as those for per-directory configuration; there is a creation -function and a merge function, the latter being invoked where a -virtual server has partially overridden the base server configuration, -and a combined structure must be computed. 
(As with per-directory -configuration, the default if no merge function is specified, and a -module is configured in some virtual server, is that the base -configuration is simply ignored). <P> - -The only substantial difference is that when a command needs to -configure the per-server private module data, it needs to go to the -<CODE>cmd_parms</CODE> data to get at it. Here's an example, from the -alias module, which also indicates how a syntax error can be returned -(note that the per-directory configuration argument to the command -handler is declared as a dummy, since the module doesn't actually have -per-directory config data): - -<PRE> -char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) -{ - server_rec *s = cmd->server; - alias_server_conf *conf = (alias_server_conf *) - ap_get_module_config(s->module_config,&alias_module); - alias_entry *new = ap_push_array (conf->redirects); - - if (!ap_is_url (url)) return "Redirect to non-URL"; - - new->fake = f; new->real = url; - return NULL; -} -</PRE> -<!--#include virtual="footer.html" --> -</BODY></HTML> diff --git a/docs/manual/misc/FAQ-A.html b/docs/manual/misc/FAQ-A.html deleted file mode 100644 index 504f0aec76..0000000000 --- a/docs/manual/misc/FAQ-A.html +++ /dev/null @@ -1,321 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, 
links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.5 $ ($Date: 2001/02/28 03:35:59 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. --> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. 
- </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="1"><STRONG>Background</STRONG> - <OL> - <LI><A HREF="#what">What is Apache?</A> - </LI> - <LI><A HREF="#why">How and why was Apache created?</A> - </LI> - <LI><A HREF="#name">Why the name "Apache"?</A> - </LI> - <LI><A HREF="#compare">OK, so how does Apache compare to other servers?</A> - </LI> - <LI><A HREF="#tested">How thoroughly tested is Apache?</A> - </LI> - <LI><A HREF="#future">What are the future plans for Apache?</A> - </LI> - <LI><A HREF="#support">Whom do I contact for support?</A> - </LI> - <LI><A HREF="#more">Is there any more information on Apache?</A> - </LI> - <LI><A HREF="#where">Where can I get Apache?</A> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - <H3>A. Background</H3> -<OL> - <LI><A NAME="what"> - <STRONG>What is Apache?</STRONG> - </A> - <P>The Apache httpd server</P> - -<UL> - <LI>is a powerful, flexible, HTTP/1.1 compliant web server - <LI>implements the latest protocols, including HTTP/1.1 (RFC2616) - <LI>is highly configurable and extensible with third-party modules - <LI>can be customised by writing 'modules' using the Apache module API - <LI>provides full source code and comes with an unrestrictive license - <LI>runs on Windows NT/9x, Netware 5.x, OS/2, and most versions of Unix, - as well as several other operating systems - <LI>is actively being developed - <LI>encourages user feedback through new ideas, bug reports and patches - <LI>implements many frequently requested features, including:<BR><BR> - <DL> - <DT>DBM databases for authentication</DT> - <DD>allows you to easily set up password-protected pages with - enormous numbers of authorized users, without bogging down the server. 
- <DT>Customized responses to errors and problems</DT> - <DD>Allows you to set up files, or even CGI scripts, which are - returned by the server in response - to errors and problems, e.g. set up a script to intercept - <STRONG>500 Server Error</STRONG>s and perform on-the-fly diagnostics for - both users and yourself. </DD> - <DT> Multiple DirectoryIndex directives </DT> - <DD> Allows you to say <CODE>DirectoryIndex index.html - index.cgi</CODE>, which instructs the server to either send - back <CODE>index.html</CODE> or run <CODE>index.cgi</CODE> - when a directory URL is requested, whichever it finds in the - directory. - <DT> Unlimited flexible URL rewriting and aliasing </DT> - <DD> Apache has no fixed limit on the number of Aliases and - Redirects which may be declared in the config files. In addition, - a powerful rewriting engine can be used to solve most URL - manipulation problems. - <DT>Content negotiation</DT> - <DD>i.e. the ability to automatically serve clients of varying - sophistication and HTML level compliance, with documents which - offer the best representation of information that the client is - capable of accepting.</DD> - <DT>Virtual Hosts</DT> - <DD>A much requested feature, sometimes known as multi-homed servers. - This allows the server to distinguish between requests made to - different IP addresses or names (mapped to the same machine). Apache - also offers dynamically configurable mass-virtual hosting. - </DD> - <DT>Configurable Reliable Piped Logs</DT> - <DD>You can configure - Apache to generate logs in the format that you want. In addition, on - most Unix architectures, Apache can send log files to a pipe, allowing - for log rotation, hit filtering, real-time splitting of multiple vhosts - into separate logs, and asynchronous DNS resolving on the fly. 
- </DL> -</UL> - - <HR> - </LI> - - <LI><A NAME="why"> - <STRONG>How and why was Apache created?</STRONG> - </A> - <P> - The <A HREF="http://www.apache.org/ABOUT_APACHE.html">About Apache</A> - document explains how the Apache project evolved from its beginnings - as an outgrowth of the NCSA httpd project to its current status as - one of the fastest, most efficient, and most functional web servers - in existence. - </P> - <HR> - </LI> - - <LI><A NAME="name"> - <STRONG>Why the name "Apache"?</STRONG> - </A> - <P> - A cute name which stuck. Apache is "<STRONG>A - PA</STRONG>t<STRONG>CH</STRONG>y server". It was - based on some existing code and a series of "patch files". - </P> - - <P> - For many developers it is also a reverent connotation to the Native - American Indian tribe of Apache, <A - HREF="http://www.indians.org/welker/apache.htm">well-known for their - superior skills in warfare strategy and inexhaustible endurance</A>. - Online information about the Apache Nation is tough to locate; we - suggest searching - <A HREF="http://www.google.com/search?q=Apache+Nation">Google</A>, - <A HREF="http://www.northernlight.com/nlquery.fcg?qr=Apache+Nation">Northernlight</A>, - <A HREF="http://infoseek.go.com/Titles?qt=Apache+Nation">Infoseek</A>, or - <A HREF="http://www.alltheweb.com/cgi-bin/asearch?query=Apache+Nation">AllTheWeb</A>. - </P> - <P> - In addition, <A - HREF="http://www.indian.org/">http://www.indian.org/</A> and <A - HREF="http://www.nativeweb.com/">http://www.nativeweb.com/</A> are - two excellent resources for Native American information. - </P> - <HR> - </LI> - - <LI><A NAME="compare"> - <STRONG>OK, so how does Apache compare to other servers?</STRONG> - </A> - <P> - For an independent assessment, see - <A HREF="http://webcompare.internet.com/chart.html">Web Compare</A>'s - comparison chart. - </P> - <P> - Apache has been shown to be substantially faster, more stable, and - more feature-full than many other web servers. 
Although certain - commercial servers have claimed to surpass Apache's speed (it has - not been demonstrated that any of these "benchmarks" are a - good way of measuring WWW server speed at any rate), we feel that it - is better to have a mostly-fast free server than an extremely-fast - server that costs thousands of dollars. Apache is run on sites that - get millions of hits per day, and they have experienced no - performance difficulties. - </P> - <HR> - </LI> - - <LI><A NAME="tested"> - <STRONG>How thoroughly tested is Apache?</STRONG> - </A> - <P> - Apache is run on over 6 million Internet servers (as of February - 2000). It has been tested thoroughly by both developers and - users. The Apache Group maintains rigorous standards before - releasing new versions of their server, and our server runs without - a hitch on over one half of all WWW servers available on the - Internet. When bugs do show up, we release patches and new versions - as soon as they are available. - </P> - <HR> - </LI> - - <LI><A NAME="future"> - <STRONG>What are the future plans for Apache?</STRONG> - </A> - <P> - <UL> - <LI>to continue to be an "open source" no-charge-for-use HTTP server, - </LI> - <LI>to keep up with advances in HTTP protocol and web developments in - general, - </LI> - <LI>to collect suggestions for fixes/improvements from its users, - </LI> - <LI>to respond to needs of large volume providers as well as - occasional users. - </LI> - </UL> - <P></P> - <HR> - </LI> - - <LI><A NAME="support"> - <STRONG>Whom do I contact for support?</STRONG> - </A> - <P> - There is no official support for Apache. None of the developers want to - be swamped by a flood of trivial questions that can be resolved elsewhere. - Bug reports and suggestions should be sent <EM>via</EM> - <A HREF="http://www.apache.org/bug_report.html">the bug report page</A>. 
- Other questions should be directed to the - <A HREF="news:comp.infosystems.www.servers.unix" - >comp.infosystems.www.servers.unix</A> or <A HREF= - "news:comp.infosystems.www.servers.ms-windows" - >comp.infosystems.www.servers.ms-windows</A> - newsgroup (as appropriate for the platform you use), where some of the - Apache team lurk, in the company of many other httpd gurus who - should be able to help. - </P> - <P> - Commercial support for Apache is, however, available from a number - of third parties. - </P> - <HR> - </LI> - - <LI><A NAME="more"> - <STRONG>Is there any more information available on - Apache?</STRONG> - </A> - <P> - Indeed there is. See the main - <A HREF="http://www.apache.org/httpd">Apache web site</A>. - There is also a regular electronic publication called - <A HREF="http://www.apacheweek.com/" REL="Help"><CITE>Apache Week</CITE></A> - available. Links to relevant <CITE>Apache Week</CITE> articles are - included below where appropriate. There are also some - <A HREF="http://www.apache.org/info/apache_books.html" - >Apache-specific books</A> available. - </P> - <HR> - </LI> - - <LI><A NAME="where"> - <STRONG>Where can I get Apache?</STRONG> - </A> - <P> - You can find out how to download the source for Apache at the - project's - <A HREF="http://www.apache.org/httpd">main web page</A>. - </P> - <HR> - </LI> -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. 
--> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-B.html b/docs/manual/misc/FAQ-B.html deleted file mode 100644 index 054914c415..0000000000 --- a/docs/manual/misc/FAQ-B.html +++ /dev/null @@ -1,441 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.7 $ ($Date: 2001/03/28 21:26:29 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. 
--> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI value="2"><STRONG>General Technical Questions</STRONG> - <OL> - <LI><A HREF="#what2do">"Why can't I ...? Why won't ... - work?" What to do in case of problems</A> - </LI> - <LI><A HREF="#compatible">How compatible is Apache with my existing - NCSA 1.3 setup?</A> - </LI> - <LI><A HREF="#year2000">Is Apache Year 2000 compliant?</A> - </LI> - <LI><A HREF="#submit_patch">How do I submit a patch to the Apache Group?</A> - </LI> - <LI><A HREF="#domination">Why has Apache stolen my favourite site's - Internet address?</A> - </LI> - <LI><A HREF="#apspam">Why am I getting spam mail from the Apache site?</A> - </LI> - <LI><A HREF="#redist">May I include the Apache software on a CD or other - package I'm distributing?</A> - </LI> - <LI><A HREF="#zoom">What's the best hardware/operating system/... How do - I get the most out of my Apache Web server?</A> - </LI> - <LI><A HREF="#regex">What are "regular expressions"?</A> - </LI> - <li><a href="#binaries">Why isn't there a binary for my platform?</a></li> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - - <H3>B. 
General Technical Questions</H3> -<OL> - - <LI><A NAME="what2do"> - <STRONG>"Why can't I ...? Why won't ... work?" What to - do in case of problems</STRONG> - </A> - <P> - If you are having trouble with your Apache server software, you should - take the following steps: - </P> - <OL> - <LI><STRONG>Check the errorlog!</STRONG> - <P> - Apache tries to be helpful when it encounters a problem. In many - cases, it will provide some details by writing one or more messages to - the server error log. Sometimes this is enough for you to diagnose - & fix the problem yourself (such as file permissions or the like). - The default location of the error log is - <SAMP>/usr/local/apache/logs/error_log</SAMP>, but see the - <A HREF="../mod/core.html#errorlog"><SAMP>ErrorLog</SAMP></A> - directive in your config files for the location on your server. - </P> - </LI> - <LI><STRONG>Check the - <A HREF="http://httpd.apache.org/docs/misc/FAQ.html">FAQ</A>!</STRONG> - <P> - The latest version of the Apache Frequently-Asked Questions list can - always be found at the main Apache web site. - </P> - </LI> - <LI><STRONG>Check the Apache bug database</STRONG> - <P> - Most problems that get reported to The Apache Group are recorded in - the - <A HREF="http://bugs.apache.org/">bug database</A>. - <EM><STRONG>Please</STRONG> check the existing reports, open - <STRONG>and</STRONG> closed, before adding one.</EM> If you find - that your issue has already been reported, please <EM>don't</EM> add - a "me, too" report. If the original report isn't closed - yet, we suggest that you check it periodically. You might also - consider contacting the original submitter, because there may be an - email exchange going on about the issue that isn't getting recorded - in the database. 
- </P> - </LI> - <LI><STRONG>Ask in the <SAMP>comp.infosystems.www.servers.unix</SAMP> - or <SAMP>comp.infosystems.www.servers.ms-windows</SAMP> USENET - newsgroup (as appropriate for the platform you use).</STRONG> - <P> - A lot of common problems never make it to the bug database because - there's already high Q&A traffic about them in the - <A HREF="news:comp.infosystems.www.servers.unix" - ><SAMP>comp.infosystems.www.servers.unix</SAMP></A> - newsgroup. Many Apache users, and some of the developers, can be - found roaming its virtual halls, so it is suggested that you seek - wisdom there. The chances are good that you'll get a faster answer - there than from the bug database, even if you <EM>don't</EM> see - your question already posted. - </P> - </LI> - <LI><STRONG>If all else fails, report the problem in the bug - database</STRONG> - <P> - If you've gone through those steps above that are appropriate and - have obtained no relief, then please <EM>do</EM> let The Apache - Group know about the problem by - <A HREF="http://httpd.apache.org/bug_report.html">logging a bug report</A>. - </P> - <P> - If your problem involves the server crashing and generating a core - dump, please include a backtrace (if possible). As an example, - </P> - <P> - <DL> - <DD><CODE># cd <EM>ServerRoot</EM><BR> - # dbx httpd core<BR> - (dbx) where</CODE> - </DD> - </DL> - <P></P> - <P> - (Substitute the appropriate locations for your - <SAMP>ServerRoot</SAMP> and your <SAMP>httpd</SAMP> and - <SAMP>core</SAMP> files. You may have to use <CODE>gdb</CODE> - instead of <CODE>dbx</CODE>.) - </P> - </LI> - </OL> - <HR> - </LI> - - <LI><A NAME="compatible"> - <STRONG>How compatible is Apache with my existing NCSA 1.3 - setup?</STRONG> - </A> - <P> - Apache attempts to offer all the features and configuration options - of NCSA httpd 1.3, as well as many of the additional features found in - NCSA httpd 1.4 and NCSA httpd 1.5. 
- </P> - <P> - NCSA httpd appears to be moving toward adding experimental features - which are not generally required at the moment. Some of the experiments - will succeed while others will inevitably be dropped. The Apache - philosophy is to add what's needed as and when it is needed. - </P> - <P> - Friendly interaction between Apache and NCSA developers should ensure - that fundamental feature enhancements stay consistent between the two - servers for the foreseeable future. - </P> - <HR> - </LI> - - <LI><A NAME="year2000"> - <STRONG>Is Apache Year 2000 compliant?</STRONG> - </A> - <P> - Yes, Apache is Year 2000 compliant. - </P> - <P> - Apache internally never stores years as two digits. - On the HTTP protocol level RFC1123-style dates are generated, - which is the only format an HTTP/1.1-compliant server should - generate. To be compatible with older applications Apache - recognizes ANSI C's <CODE>asctime()</CODE> and - RFC850-/RFC1036-style date formats, too. - The <CODE>asctime()</CODE> format uses four-digit years, - but the RFC850 and RFC1036 date formats only define a two-digit year. - If Apache sees such a date with a value less than 70 it assumes that - the century is <SAMP>20</SAMP> rather than <SAMP>19</SAMP>. - </P> - <P> - Although Apache is Year 2000 compliant, you may still get problems - if the underlying OS has problems with dates past year 2000 - (<EM>e.g.</EM>, OS calls which accept or return year numbers). - Most (UNIX) systems store dates internally as signed 32-bit integers - which contain the number of seconds since 1<SUP>st</SUP> January 1970, so - the magic boundary to worry about is the year 2038 and not 2000. - But modern operating systems shouldn't cause any trouble - at all. - </P> - <p> - The Apache HTTP Server project is an open-source software product of - the Apache Software Foundation. The project and the Foundation - <b>cannot</b> offer legal assurances regarding any suitability - of the software for your application. 
There are several commercial - Apache support organizations and derivative server products available - that may be able to stand behind the software and provide you with - any assurances you may require. You may find links to some of these - vendors at - <samp><<a href="http://httpd.apache.org/info/support.cgi" - >http://httpd.apache.org/info/support.cgi</a>></samp>. - </p> - <p> - The Apache HTTP server software is distributed with the following - disclaimer, found in the software license: - </p> - <pre> - THIS SOFTWARE IS PROVIDED BY THE APACHE GROUP ``AS IS'' AND ANY - EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR - PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE APACHE GROUP OR - ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT - NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; - LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, - STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED - OF THE POSSIBILITY OF SUCH DAMAGE. - </pre> - <HR> - </LI> - - <LI><A NAME="submit_patch"> - <STRONG>How do I submit a patch to the Apache Group?</STRONG></A> - <P> - The Apache Group encourages patches from outside developers. There - are 2 main "types" of patches: small bugfixes and general - improvements. Bugfixes should be submitted using the Apache <A - HREF="http://httpd.apache.org/bug_report.html">bug report page</A>. - Improvements, modifications, and additions should follow the - instructions below. - </P> - <P> - In general, the first course of action is to be a member of the - <SAMP>new-httpd@apache.org</SAMP> mailing list. This indicates to - the Group that you are closely following the latest Apache - developments. 
Your patch file should be generated using either - '<CODE>diff -c</CODE>' or '<CODE>diff -u</CODE>' against - the latest CVS tree. To submit your patch, send email to - <SAMP>new-httpd@apache.org</SAMP> with a <SAMP>Subject:</SAMP> line - that starts with <SAMP>[PATCH]</SAMP> and includes a general - description of the patch. In the body of the message, the patch - should be clearly described and then included at the end of the - message. If the patch-file is long, you can note a URL to the file - instead of the file itself. Use of MIME enclosures/attachments - should be avoided. - </P> - <P> - Be prepared to respond to any questions about your patches and - possibly defend your code. If your patch results in a lot of - discussion, you may be asked to submit an updated patch that - incorporates all changes and suggestions. - </P> - <HR> - </LI> - - <LI><A NAME="domination"><STRONG>Why has Apache stolen my favourite site's - Internet address?</STRONG></A> - <P> - The simple answer is: "It hasn't." This misconception is usually - caused by the site in question having migrated to the Apache Web - server software, but not having migrated the site's content yet. When - Apache is installed, the default page that gets installed tells the - Webmaster the installation was successful. The expectation is that - this default page will be replaced with the site's real content. - If it isn't, complain to the Webmaster, not to the Apache project -- - we just make the software and aren't responsible for what people - do (or don't do) with it. - </P> - <HR> - </LI> - - <LI><A NAME="apspam"><STRONG>Why am I getting spam mail from the - Apache site?</STRONG></A> - <P> - The short answer is: "You aren't." Usually when someone thinks the - Apache site is originating spam, it's because they've traced the - spam to a Web site, and the Web site says it's using Apache. See the - <A HREF="#domination">previous FAQ entry</A> for more details on this - phenomenon. 
- </P> - <P> - No marketing spam originates from the Apache site. The only mail - that comes from the site goes only to addresses that have been - <EM>requested</EM> to receive the mail. - </P> - <HR> - </LI> - - <LI><A NAME="redist"><STRONG>May I include the Apache software on a - CD or other package I'm distributing?</STRONG></A> - <P> - The detailed answer to this question can be found in the - Apache license, which is included in the Apache distribution in - the file <CODE>LICENSE</CODE>. You can also find it on the Web at - <SAMP><<A HREF="http://www.apache.org/LICENSE.txt" - >http://www.apache.org/LICENSE.txt</A>></SAMP>. - </P> - <HR> - </LI> - - <LI><A NAME="zoom"> - <STRONG>What's the best hardware/operating system/... How do - I get the most out of my Apache Web server?</STRONG> - </A> - <P> - Check out Dean Gaudet's - <A HREF="perf-tuning.html">performance tuning page</A>. - </P> - <HR> - </LI> - - <LI><A NAME="regex"> - <STRONG>What are "regular expressions"?</STRONG></A> - <P> - Regular expressions are a way of describing a pattern - for example, "all - the words that begin with the letter A" or "every 10-digit phone number" - or even "Every sentence with two commas in it, and no capital letter Q". - Regular expressions (aka "regex"s) are useful in Apache because they - let you apply certain attributes against collections of files or resources - in very flexible ways - for example, all .gif and .jpg files under - any "images" directory could be written as /\/images\/.*(jpg|gif)$/. - </P> - <P> - The best overview around is probably the one which comes with Perl. - We implement a simple subset of Perl's regex support, but it's - still a good way to learn what they mean. You can start by going - to the <A - HREF="http://www.perl.com/CPAN-local/doc/manual/html/pod/perlre.html#Regular_Expressions" - >CPAN page on regular expressions</A>, and branching out from - there. 
- </P> - <HR> - </LI> - - <li><a name="binaries"> - <b>Why isn't there a binary for my platform?</b></a> - <p> - The developers make sure that the software builds and works - correctly on the platforms available to them; this does - <i>not</i> necessarily mean that <i>your</i> platform - is one of them. In addition, the Apache HTTP server project - is primarily source oriented, meaning that distributing - valid and buildable source code is the purpose of a release, - not making sure that there is a binary package for all of the - supported platforms. - </p> - <p> - If you don't see a kit for your platform listed in the - binary distribution area - (<URL:<a href="http://httpd.apache.org/dist/httpd/binaries/" - >http://httpd.apache.org/dist/httpd/binaries/</a>>), - it means either that the platform isn't available to any of - the developers, or that they just haven't gotten around to - preparing a binary for it. As this is a voluntary project, - they are under no obligation to do so. Users are encouraged - and expected to build the software themselves. - </p> - <p> - The sole exception to these practices is the Windows package. - Unlike most Unix and Unix-like platforms, Windows systems - do not come with a bundled software development environment, - so we <i>do</i> prepare binary kits for Windows when we make - a release. Again, however, it's a voluntary thing and only - a limited number of the developers have the capability to build - the InstallShield package, so the Windows release may lag - somewhat behind the source release. This lag should be - no more than a few days at most. - </p> - <hr> - </li> - -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. 
--> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-C.html b/docs/manual/misc/FAQ-C.html deleted file mode 100644 index 3f226c08f8..0000000000 --- a/docs/manual/misc/FAQ-C.html +++ /dev/null @@ -1,273 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.1 $ ($Date: 1999/06/24 15:02:51 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. 
--> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="3"><STRONG>Building Apache</STRONG> - <OL> - <LI><A HREF="#bind8.1">Why do I get an error about an undefined - reference to "<SAMP>__inet_ntoa</SAMP>" or other - <SAMP>__inet_*</SAMP> symbols?</A> - </LI> - <LI><A HREF="#cantbuild">Why won't Apache compile with my - system's <SAMP>cc</SAMP>?</A> - </LI> - <LI><A HREF="#linuxiovec">Why do I get complaints about redefinition - of "<CODE>struct iovec</CODE>" when compiling under Linux?</A> - </LI> - <LI><A HREF="#broken-gcc">I'm using gcc and I get some compilation errors, - what is wrong?</A> - </LI> - <LI><A HREF="#glibc-crypt">I'm using RedHat Linux 5.0, or some other - <SAMP>glibc</SAMP>-based Linux system, and I get errors with the - <CODE>crypt</CODE> function when I attempt to build Apache 1.2.</A> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - - <H3>C. 
Building Apache</H3> -<OL> - - <LI><A NAME="bind8.1"> - <STRONG>Why do I get an error about an undefined reference to - "<SAMP>__inet_ntoa</SAMP>" or other - <SAMP>__inet_*</SAMP> symbols?</STRONG> - </A> - <P> - If you have installed <A HREF="http://www.isc.org/bind.html">BIND-8</A> - then this is normally due to a conflict between your include files - and your libraries. BIND-8 installs its include files and libraries in - <CODE>/usr/local/include/</CODE> and <CODE>/usr/local/lib/</CODE>, while - the resolver that comes with your system is probably installed in - <CODE>/usr/include/</CODE> and <CODE>/usr/lib/</CODE>. If - your system uses the header files in <CODE>/usr/local/include/</CODE> - before those in <CODE>/usr/include/</CODE> but you do not use the new - resolver library, then the two versions will conflict. - </P> - <P> - To resolve this, you can either make sure you use the include files - and libraries that came with your system or make sure to use the - new include files and libraries. Adding <CODE>-lbind</CODE> to the - <CODE>EXTRA_LDFLAGS</CODE> line in your <SAMP>Configuration</SAMP> - file, then re-running <SAMP>Configure</SAMP>, should resolve the - problem. (Apache versions 1.2.* and earlier use - <CODE>EXTRA_LFLAGS</CODE> instead.) - </P> - <P> - <STRONG>Note:</STRONG> As of BIND 8.1.1, the bind libraries and files are - installed under <SAMP>/usr/local/bind</SAMP> by default, so you - should not run into this problem. 
Should you want to use the bind - resolvers you'll have to add the following to the respective lines: - </P> - <P> - <DL> - <DD><CODE>EXTRA_CFLAGS=-I/usr/local/bind/include - <BR> - EXTRA_LDFLAGS=-L/usr/local/bind/lib - <BR> - EXTRA_LIBS=-lbind</CODE> - </DD> - </DL> - <P></P> - <HR> - </LI> - - <LI><A NAME="cantbuild"> - <STRONG>Why won't Apache compile with my system's - <SAMP>cc</SAMP>?</STRONG> - </A> - <P> - If the server won't compile on your system, it is probably due to one - of the following causes: - </P> - <UL> - <LI><STRONG>The <SAMP>Configure</SAMP> script doesn't recognize your system - environment.</STRONG> - <BR> - This might be either because it's completely unknown or because - the specific environment (include files, OS version, <EM>et - cetera</EM>) isn't explicitly handled. If this happens, you may - need to port the server to your OS yourself. - </LI> - <LI><STRONG>Your system's C compiler is garbage.</STRONG> - <BR> - Some operating systems include a default C compiler that is either - not ANSI C-compliant or suffers from other deficiencies. The usual - recommendation in cases like this is to acquire, install, and use - <SAMP>gcc</SAMP>. - </LI> - <LI><STRONG>Your <SAMP>include</SAMP> files may be confused.</STRONG> - <BR> - In some cases, we have found that a compiler installation or system - upgrade has left the C header files in an inconsistent state. Make - sure that your include directory tree is in sync with the compiler and - the operating system. - </LI> - <LI><STRONG>Your operating system or compiler may be out of - revision.</STRONG> - <BR> - Software vendors (including those that develop operating systems) - issue new releases for a reason; sometimes to add functionality, but - more often to fix bugs that have been discovered. Try upgrading - your compiler and/or your operating system. - </LI> - </UL> - <P> - The Apache Group tests the ability to build the server on many - different platforms. 
Unfortunately, we can't test all of the OS - platforms there are. If you have verified that none of the above - issues is the cause of your problem, and it hasn't been reported - before, please submit a - <A HREF="http://www.apache.org/bug_report.html">problem report</A>. - Be sure to include <EM>complete</EM> details, such as the compiler - & OS versions and exact error messages. - </P> - <HR> - </LI> - - <LI><A NAME="linuxiovec"> - <STRONG>Why do I get complaints about redefinition - of "<CODE>struct iovec</CODE>" when - compiling under Linux?</STRONG> - </A> - <P> - This is a conflict between your C library includes and your kernel - includes. You need to make sure that the versions of both are matched - properly. There are two workarounds, either one will solve the problem: - </P> - <P> - <UL> - <LI>Remove the definition of <CODE>struct iovec</CODE> from your C - library includes. It is located in <CODE>/usr/include/sys/uio.h</CODE>. - <STRONG>Or,</STRONG> - </LI> - <LI>Add <CODE>-DNO_WRITEV</CODE> to the <CODE>EXTRA_CFLAGS</CODE> - line in your <SAMP>Configuration</SAMP> and reconfigure/rebuild. - This hurts performance and should only be used as a last resort. - </LI> - </UL> - <P></P> - <HR> - </LI> - - <LI><A NAME="broken-gcc"><STRONG>I'm using gcc and I get some - compilation errors, what is wrong?</STRONG></A> - <P> - GCC parses your system header files and produces a modified subset which - it uses for compiling. This behaviour ties GCC tightly to the version - of your operating system. So, for example, if you were running IRIX 5.3 - when you built GCC and then upgrade to IRIX 6.2 later, you will have to - rebuild GCC. Similarly for Solaris 2.4, 2.5, or 2.5.1 when you upgrade - to 2.6. Sometimes you can type "gcc -v" and it will tell you the version - of the operating system it was built against. - </P> - <P> - If you fail to do this, then it is very likely that Apache will fail - to build. 
One of the most common errors is with <CODE>readv</CODE>, - <CODE>writev</CODE>, or <CODE>uio.h</CODE>. This is <STRONG>not</STRONG> a - bug with Apache. You will need to re-install GCC. - </P> - <HR> - </LI> - - <LI><A NAME="glibc-crypt"> - <STRONG>I'm using RedHat Linux 5.0, or some other - <SAMP>glibc</SAMP>-based Linux system, and I get errors with the - <CODE>crypt</CODE> function when I attempt to build Apache 1.2.</STRONG> - </A> - - <P> - <SAMP>glibc</SAMP> puts the <CODE>crypt</CODE> function into a separate - library. Edit your <CODE>src/Configuration</CODE> file and set this: - </P> - <DL> - <DD><CODE>EXTRA_LIBS=-lcrypt</CODE> - </DD> - </DL> - <P> - Then re-run <SAMP>src/Configure</SAMP> and re-execute the make. - </P> - <HR> - </LI> - -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. --> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-D.html b/docs/manual/misc/FAQ-D.html deleted file mode 100644 index f54c7c3a5d..0000000000 --- a/docs/manual/misc/FAQ-D.html +++ /dev/null @@ -1,432 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - 
VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.8 $ ($Date: 2001/02/28 03:35:59 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. --> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. 
- </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="4"><STRONG>Error Log Messages and Problems Starting Apache</STRONG> - <OL> - <LI><A HREF="#setgid">Why do I get "<SAMP>setgid: Invalid - argument</SAMP>" at startup?</A> - </LI> - <LI><A HREF="#nodelay">Why am I getting "<SAMP>httpd: could not - set socket option TCP_NODELAY</SAMP>" in my error log?</A> - </LI> - <LI><A HREF="#peerreset">Why am I getting "<SAMP>connection - reset by peer</SAMP>" in my error log?</A> - </LI> - <LI><A HREF="#wheres-the-dump">The errorlog says Apache dumped core, - but where's the dump file?</A> - </LI> - <LI><A HREF="#linux-shmget">When I run it under Linux I get "shmget: - function not found", what should I do?</A> - </LI> - <LI><A HREF="#nfslocking">Server hangs, or fails to start, and/or error log - fills with "<SAMP>fcntl: F_SETLKW: No record locks - available</SAMP>" or similar messages</A> - </LI> - <LI><A HREF="#aixccbug">Why am I getting "<SAMP>Expected </Directory> - but saw </Directory></SAMP>" when I try to start Apache?</A> - </LI> - <LI><A HREF="#redhat">I'm using RedHat Linux and I have problems with httpd - dying randomly or not restarting properly</A> - </LI> - <LI><A HREF="#stopping">I upgraded from an Apache version earlier - than 1.2.0 and suddenly I have problems with Apache dying randomly - or not restarting properly</A> - </LI> - <LI><A HREF="#setservername">When I try to start Apache from a DOS - window, I get a message like "<samp>Cannot determine host name. - Use ServerName directive to set it manually.</samp>" What does - this mean?</A> - </LI> - <LI><A HREF="#ws2_32dll">When I try to start Apache for Windows, I get a message - like "<samp>Unable To Locate WS2_32.DLL...</samp>". What should I do?</A> - </LI> - <LI><A HREF="#WSADuplicateSocket">Apache for Windows does not start. 
- Error log contains this message "<samp>[crit] (10045) The attempted - operation is not supported for the type of object referenced: Parent: - WSADuplicateSocket failed for socket ###</samp>". What does this - mean?</a> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - - <H3>D. Error Log Messages and Problems Starting Apache</H3> -<OL> - - <LI><A NAME="setgid"> - <STRONG>Why do I get "<SAMP>setgid: Invalid - argument</SAMP>" at startup?</STRONG> - </A> - <P> - Your - <A HREF="../mod/mpm_common.html#group"><SAMP>Group</SAMP></A> - directive (probably in <SAMP>conf/httpd.conf</SAMP>) needs to name a - group that actually exists in the <SAMP>/etc/group</SAMP> file (or - your system's equivalent). This problem is also frequently seen when - a negative number is used in the <CODE>Group</CODE> directive - (<EM>e.g.</EM>, "<CODE>Group #-1</CODE>"). Using a group name - -- not group number -- found in your system's group database should - solve this problem in all cases. - </P> - <HR> - </LI> - - <LI><A NAME="nodelay"> - <STRONG>Why am I getting "<SAMP>httpd: could not set socket - option TCP_NODELAY</SAMP>" in my error log?</STRONG> - </A> - <P> - This message almost always indicates that the client disconnected - before Apache reached the point of calling <CODE>setsockopt()</CODE> - for the connection. It shouldn't occur for more than about 1% of the - requests your server handles, and it's advisory only in any case. - </P> - <HR> - </LI> - - <LI><A NAME="peerreset"> - <STRONG>Why am I getting "<SAMP>connection reset by - peer</SAMP>" in my error log?</STRONG> - </A> - <P> - This is a normal message and nothing about which to be alarmed. It simply - means that the client canceled the connection before it had been - completely set up - such as by the end-user pressing the "Stop" - button. 
People's patience being what it is, sites with response-time - problems or slow network links may experience this more than - high-capacity ones or those with large pipes to the network. - </P> - <HR> - </LI> - - <LI><A NAME="wheres-the-dump"> - <STRONG>The errorlog says Apache dumped core, but where's the dump - file?</STRONG> - </A> - <P> - In Apache version 1.2, the error log message - about dumped core includes the directory where the dump file should be - located. However, many Unixes do not allow a process that has - called <CODE>setuid()</CODE> to dump core for security reasons; - the typical Apache setup has the server started as root to bind to - port 80, after which it changes UIDs to a non-privileged user to - serve requests. - </P> - <P> - Dealing with this is extremely operating system-specific, and may - require rebuilding your system kernel. Consult your operating system - documentation or vendor for more information about whether your system - does this and how to bypass it. If there <EM>is</EM> a documented way - of bypassing it, it is recommended that you bypass it only for the - <SAMP>httpd</SAMP> server process if possible. - </P> - <P> - The canonical location for Apache's core-dump files is the - <A HREF="../mod/core.html#serverroot">ServerRoot</A> - directory. As of Apache version 1.3, the location can be set <EM>via</EM> - the - <A HREF="../mod/mpm_common.html#coredumpdirectory" - ><SAMP>CoreDumpDirectory</SAMP></A> - directive to a different directory. Make sure that this directory is - writable by the user the server runs as (as opposed to the user the server - is <EM>started</EM> as). - </P> - <HR> - </LI> - - <LI><A NAME="linux-shmget"> - <STRONG>When I run it under Linux I get "shmget: - function not found", what should I do?</STRONG> - </A> - <P> - Your kernel has been built without SysV IPC support. You will have - to rebuild the kernel with that support enabled (it's under the - "General Setup" submenu). 
Documentation for kernel - building is beyond the scope of this FAQ; you should consult the <A - HREF="http://www.redhat.com/mirrors/LDP/HOWTO/Kernel-HOWTO.html">Kernel - HOWTO</A>, or the documentation provided with your distribution, or - a <A HREF="http://www.redhat.com/mirrors/LDP/HOWTO/META-FAQ.html">Linux - newsgroup/mailing list</A>. As a last-resort workaround, you can - comment out the <CODE>#define USE_SHMGET_SCOREBOARD</CODE> - definition in the <SAMP>LINUX</SAMP> section of - <SAMP>src/conf.h</SAMP> and rebuild the server (prior to 1.3b4, - simply removing <CODE>#define HAVE_SHMGET</CODE> would have - sufficed). This will produce a server which is slower and less - reliable. - </P> - <HR> - </LI> - - <LI><A NAME="nfslocking"> - <STRONG>Server hangs, or fails to start, and/or error log - fills with "<SAMP>fcntl: F_SETLKW: No record locks - available</SAMP>" or similar messages</STRONG> - </A> - - <P> - These are symptoms of a file locking problem, which usually means that - the server is trying to use a synchronization file on an NFS filesystem. - </P> - <P> - Because of its parallel-operation model, the Apache Web server needs to - provide some form of synchronization when accessing certain resources. - One of these synchronization methods involves taking out locks on a file, - which means that the filesystem whereon the lockfile resides must support - locking. In many cases this means it <EM>can't</EM> be kept on an - NFS-mounted filesystem. - </P> - <P> - To cause the Web server to work around the NFS locking limitations, include - a line such as the following in your server configuration files: - </P> - <DL> - <DD><CODE>LockFile /var/run/apache-lock</CODE> - </DD> - </DL> - <P> - The directory should not be generally writable (<EM>e.g.</EM>, don't use - <SAMP>/var/tmp</SAMP>). - See the <A HREF="../mod/mpm_common.html#lockfile"><SAMP>LockFile</SAMP></A> - documentation for more information. 
- </P> - <HR> - </LI> - - <LI><A NAME="aixccbug"><STRONG>Why am I getting "<SAMP>Expected - </Directory> but saw </Directory></SAMP>" when - I try to start Apache?</STRONG></A> - <P> - This is a known problem with certain versions of the AIX C compiler. - IBM are working on a solution, and the issue is being tracked by - <A HREF="http://bugs.apache.org/index/full/2312">problem report #2312</A>. - </P> - <HR> - </LI> - - <LI><A NAME="redhat"> - <STRONG>I'm using RedHat Linux and I have problems with httpd - dying randomly or not restarting properly</STRONG> - </A> - - <P> - RedHat Linux versions 4.x (and possibly earlier) RPMs contain - various nasty scripts which do not stop or restart Apache properly. - These can affect you even if you're not running the RedHat supplied - RPMs. - </P> - <P> - If you're using the default install then you're probably running - Apache 1.1.3, which is outdated. From RedHat's ftp site you can - pick up a more recent RPM for Apache 1.2.x. This will solve one of - the problems. - </P> - <P> - If you're using a custom built Apache rather than the RedHat RPMs - then you should <CODE>rpm -e apache</CODE>. In particular you want - the mildly broken <CODE>/etc/logrotate.d/apache</CODE> script to be - removed, and you want the broken <CODE>/etc/rc.d/init.d/httpd</CODE> - (or <CODE>httpd.init</CODE>) script to be removed. The latter is - actually fixed by the apache-1.2.5 RPMs but if you're building your - own Apache then you probably don't want the RedHat files. - </P> - <P> - We can't stress enough how important it is for folks, <EM>especially - vendors</EM> to follow the <A HREF="../stopping.html">stopping Apache - directions</A> given in our documentation. In RedHat's defense, - the broken scripts were necessary with Apache 1.1.x because the - Linux support in 1.1.x was very poor, and there were various race - conditions on all platforms. None of this should be necessary with - Apache 1.2 and later. 
- </P> - <HR> - </LI> - - <LI><A NAME="stopping"> - <STRONG>I upgraded from an Apache version earlier - than 1.2.0 and suddenly I have problems with Apache dying randomly - or not restarting properly</STRONG> - </A> - - <P> - You should read <A HREF="#redhat">the previous note</A> about - problems with RedHat installations. It is entirely likely that your - installation has start/stop/restart scripts which were built for - an earlier version of Apache. Versions earlier than 1.2.0 had - various race conditions that made it necessary to use - <CODE>kill -9</CODE> at times to take out all the httpd servers. - But that should not be necessary any longer. You should follow - the <A HREF="../stopping.html">directions on how to stop - and restart Apache</A>. - </P> - <P>As of Apache 1.3 there is a script - <CODE>src/support/apachectl</CODE> which, after a bit of - customization, is suitable for starting, stopping, and restarting - your server. - </P> - <HR> - </LI> - - <LI><A name="setservername"> - <b>When I try to start Apache from a DOS - window, I get a message like "<samp>Cannot determine host name. - Use ServerName directive to set it manually.</samp>" What does - this mean?</b></A> - - <p> - It means what it says; the Apache software can't determine the - hostname of your system. Edit your <samp>conf\httpd.conf</samp> - file, look for the string "ServerName", and make sure there's an - uncommented directive such as - </p> - <dl> - <dd><code>ServerName localhost</code></dd> - </dl> - <p> - or - </p> - <dl> - <dd><code>ServerName www.foo.com</code></dd> - </dl> - <p> - in the file. Correct it if there is one there with wrong information, or - add one if you don't already have one. - </p> - <p> - Also, make sure that your Windows system has DNS enabled. See the TCP/IP - setup component of the Networking or Internet Options control panel.
- </p> - <p> - After verifying that DNS is enabled and that you have a valid hostname - in your <samp>ServerName</samp> directive, try to start the server - again. - </p> - <hr> - </LI> - <LI><A name="ws2_32dll"> - <b>When I try to start Apache for Windows, I get a message - like "<samp>Unable To Locate WS2_32.DLL...</samp>". What should I do?</b></A> - <p> - Short answer: You need to install Winsock 2, available from - <A HREF="http://www.microsoft.com/windows95/downloads/">http://www.microsoft.com/windows95/downloads/</A> - </p> - <p> - Detailed answer: Prior to version 1.3.9, Apache for Windows used Winsock 1.1. Beginning with - version 1.3.9, Apache began using Winsock 2 features (specifically, WSADuplicateSocket()). - WS2_32.DLL implements the Winsock 2 API. Winsock 2 ships with Windows NT 4.0 and Windows 98. - Some of the earlier releases of Windows 95 did not include Winsock 2. - </p> - <hr> - </LI> - <LI><A name="WSADuplicateSocket"> - <b>Apache for Windows does not start. Error log contains this message: - "<samp>[crit] (10045) The attempted operation is not supported for - the type of object referenced: Parent: WSADuplicateSocket failed for - socket ###</samp>". What does this mean?</b></A> - <p> - We have seen this problem when Apache is run on systems along with - Virtual Private Networking clients like Aventail Connect. Aventail Connect - is a Layered Service Provider (LSP) that inserts itself, as a "shim," - between the Winsock 2 API and Windows' native Winsock 2 implementation. - The Aventail Connect shim does not implement WSADuplicateSocket, which is - the cause of the failure. - </p> - <p> - The shim is not unloaded when Aventail Connect is shut down. Once - observed, the problem persists until the shim is either explicitly - unloaded or the machine is rebooted.
Instructions for temporarily - removing the Aventail Connect V3.x shim can be found here: - "<a href="http://support.aventail.com/akb/article00386.html" - >How to Remove Aventail Connect v3.x from the LSP Order for Testing - Purposes</a>." - </p> - <p> - Another potential solution (not tested) is to add <code>apache.exe</code> - to the Aventail "Connect Exclusion List". See this link for details: - "<a href="http://support.aventail.com/akb/article00586.html" - >How to Add an Application to Aventail Connect's Application Exclusion - List</a>." - </p> - <hr> - </LI> -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. --> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-E.html b/docs/manual/misc/FAQ-E.html deleted file mode 100644 index 555c7c4c9f..0000000000 --- a/docs/manual/misc/FAQ-E.html +++ /dev/null @@ -1,626 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.7 $ ($Date: 2001/02/28 
03:35:59 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. --> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. 
- </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="5"><STRONG>Configuration Questions</STRONG> - <OL> - <LI><A HREF="#fdlim">Why can't I run more than <<EM>n</EM>> - virtual hosts?</A> - </LI> - <LI><A HREF="#freebsd-setsize">Can I increase <SAMP>FD_SETSIZE</SAMP> - on FreeBSD?</A> - </LI> - <LI><A HREF="#errordoc401">Why doesn't my <CODE>ErrorDocument - 401</CODE> work?</A> - </LI> - <LI><A HREF="#cookies1">Why does Apache send a cookie on every response?</A> - </LI> - <LI><A HREF="#cookies2">Why don't my cookies work, I even compiled in - <SAMP>mod_cookies</SAMP>?</A> - </LI> - <LI><A HREF="#jdk1-and-http1.1">Why do my Java app[let]s give me plain text - when I request an URL from an Apache server?</A> - </LI> - <LI><A HREF="#midi">How do I get Apache to send a MIDI file so the - browser can play it?</A> - </LI> - <LI><A HREF="#addlog">How do I add browsers and referrers to my logs?</A> - </LI> - <LI><A HREF="#set-servername">Why does accessing directories only work - when I include the trailing "/" - (<EM>e.g.</EM>, <SAMP>http://foo.domain.com/~user/</SAMP>) but - not when I omit it - (<EM>e.g.</EM>, <SAMP>http://foo.domain.com/~user</SAMP>)?</A> - </LI> - <LI><A HREF="#no-info-directives">Why doesn't mod_info list any - directives?</A> - </LI> - <LI><A HREF="#namevhost">I upgraded to Apache 1.3 and now my - virtual hosts don't work!</A> - </LI> - <LI><A HREF="#redhat-htm">I'm using RedHat Linux and my .htm files are - showing up as HTML source rather than being formatted!</A> - </LI> - <LI><A HREF="#htaccess-work">My <CODE>.htaccess</CODE> files are being - ignored.</A> - </LI> - <LI><A HREF="#forbidden">Why do I get a - "<SAMP>Forbidden</SAMP>" message whenever I try to - access a particular directory?</A> - <LI><A HREF="#ie-ignores-mime">Why do my files appear correctly in - Internet Explorer, but show up as source or trigger a save - window with Netscape?</A> - </OL> - </LI> -<!--#endif --> 
-<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - - <H3>E. Configuration Questions</H3> -<OL> - - <LI><A NAME="fdlim"> - <STRONG>Why can't I run more than <<EM>n</EM>> - virtual hosts?</STRONG> - </A> - <P> - You are probably running into resource limitations in your - operating system. The most common limitation is the - <EM>per</EM>-process limit on <STRONG>file descriptors</STRONG>, - which is almost always the cause of problems seen when adding - virtual hosts. Apache often does not give an intuitive error - message because it is normally some library routine (such as - <CODE>gethostbyname()</CODE>) which needs file descriptors and - doesn't complain intelligibly when it can't get them. - </P> - <P> - Each log file requires a file descriptor, which means that if you are - using separate access and error logs for each virtual host, each - virtual host needs two file descriptors. Each - <A HREF="../mod/mpm_common.html#listen"><SAMP>Listen</SAMP></A> - directive also needs a file descriptor. - </P> - <P> - Typical values for <<EM>n</EM>> that we've seen are in - the neighborhood of 128 or 250. When the server bumps into the file - descriptor limit, it may dump core with a SIGSEGV, it might just - hang, or it may limp along and you'll see (possibly meaningful) errors - in the error log. One common problem that occurs when you run into - a file descriptor limit is that CGI scripts stop being executed - properly. - </P> - <P> - As to what you can do about this: - </P> - <OL> - <LI>Reduce the number of - <A HREF="../mod/mpm_common.html#listen"><SAMP>Listen</SAMP></A> - directives. If there are no other servers running on the machine - on the same port then you normally don't - need any Listen directives at all. By default Apache listens to - all addresses on port 80. - </LI> - <LI>Reduce the number of log files. 
You can use - <A HREF="../mod/mod_log_config.html"><SAMP>mod_log_config</SAMP></A> - to log all requests to a single log file while including the name - of the virtual host in the log file. You can then write a - script to split the logfile into separate files later if - necessary. Such a script is provided with the Apache 1.3 distribution - in the <SAMP>src/support/split-logfile</SAMP> file. - </LI> - <LI>Increase the number of file descriptors available to the server - (see your system's documentation on the <CODE>limit</CODE> or - <CODE>ulimit</CODE> commands). For some systems, information on - how to do this is available in the - <A HREF="perf-tuning.html">performance hints</A> page. There is a specific - note for <A HREF="#freebsd-setsize">FreeBSD</A> below. - <P> - For Windows 95, try modifying your <SAMP>C:\CONFIG.SYS</SAMP> file to - include a line like - </P> - <DL> - <DD><CODE>FILES=300</CODE> - </DD> - </DL> - <P> - Remember that you'll need to reboot your Windows 95 system in order - for the new value to take effect. - </P> - </LI> - <LI>"Don't do that" - try to run with fewer virtual hosts - </LI> - <LI>Spread your operation across multiple server processes (using - <A HREF="../mod/mpm_common.html#listen"><SAMP>Listen</SAMP></A> - for example, but see the first point) and/or ports. - </LI> - </OL> - <P> - Since this is an operating-system limitation, there's not much else - available in the way of solutions. - </P> - <P> - As of 1.2.1 we have made attempts to work around various limitations - involving running with many descriptors. - <A HREF="descriptors.html">More information is available.</A> - </P> - <HR> - </LI> - - <LI><A NAME="freebsd-setsize"> - <STRONG>Can I increase <SAMP>FD_SETSIZE</SAMP> on FreeBSD?</STRONG> - </A> - <P> - On versions of FreeBSD before 3.0, the <SAMP>FD_SETSIZE</SAMP> define - defaults to 256. This means that you will have trouble usefully using - more than 256 file descriptors in Apache. 
This can be increased, but - doing so can be tricky. - </P> - <P> - If you are using a version prior to 2.2, you need to recompile your - kernel with a larger <SAMP>FD_SETSIZE</SAMP>. This can be done by adding a - line such as: - </P> - <DL> - <DD><CODE>options FD_SETSIZE <EM>nnn</EM></CODE> - </DD> - </DL> - <P> - to your kernel config file. Starting at version 2.2, this is no - longer necessary. - </P> - <P> - If you are using a version of 2.1-stable from after 1997/03/10 or - 2.2 or 3.0-current from before 1997/06/28, there is a limit in - the resolver library that prevents it from using more file descriptors - than what <SAMP>FD_SETSIZE</SAMP> is set to when libc is compiled. To - increase this, you have to recompile libc with a higher - <SAMP>FD_SETSIZE</SAMP>. - </P> - <P> - In FreeBSD 3.0, the default <SAMP>FD_SETSIZE</SAMP> has been increased to - 1024 and the above limitation in the resolver library - has been removed. - </P> - <P> - After you deal with the appropriate changes above, you can increase - the setting of <SAMP>FD_SETSIZE</SAMP> at Apache compilation time - by adding "<SAMP>-DFD_SETSIZE=<EM>nnn</EM></SAMP>" to the - <SAMP>EXTRA_CFLAGS</SAMP> line in your <SAMP>Configuration</SAMP> - file. - </P> - <HR> - </LI> - - <LI><A NAME="errordoc401"> - <STRONG>Why doesn't my <CODE>ErrorDocument 401</CODE> work?</STRONG> - </A> - <P> - You need to use it with a URL in the form - "<SAMP>/foo/bar</SAMP>" and not one with a method and - hostname such as "<SAMP>http://host/foo/bar</SAMP>". See the - <A HREF="../mod/core.html#errordocument"><SAMP>ErrorDocument</SAMP></A> - documentation for details. This was incorrectly documented in the past. 
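As a rough sketch, assuming a hypothetical local document at <SAMP>/errors/auth-required.html</SAMP> (the path is an illustration, not something shipped with Apache), a directive in the accepted local-URL form might look like: <DL> <DD><CODE>ErrorDocument 401 /errors/auth-required.html</CODE> </DD> </DL>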
- </P> - <HR> - </LI> - - <LI><A NAME="cookies1"> - <STRONG>Why does Apache send a cookie on every response?</STRONG> - </A> - <P> - Apache does <EM>not</EM> automatically send a cookie on every - response, unless you have re-compiled it with the - <A HREF="../mod/mod_usertrack.html"><SAMP>mod_usertrack</SAMP></A> - module, and specifically enabled it with the - <A HREF="../mod/mod_usertrack.html#cookietracking" - ><SAMP>CookieTracking</SAMP></A> - directive. - This module has been in Apache since version 1.2. - This module may help track users, and uses cookies to do this. If - you are not using the data generated by <SAMP>mod_usertrack</SAMP>, do - not compile it into Apache. - </P> - <HR> - </LI> - - <LI><A NAME="cookies2"> - <STRONG>Why don't my cookies work, I even compiled in - <SAMP>mod_cookies</SAMP>? - </STRONG> - </A> - <P> - Firstly, you do <EM>not</EM> need to compile in - <SAMP>mod_cookies</SAMP> in order for your scripts to work (see the - <A HREF="#cookies1">previous question</A> - for more about <SAMP>mod_cookies</SAMP>). Apache passes on your - <SAMP>Set-Cookie</SAMP> header fine, with or without this module. If - cookies do not work it will be because your script does not work - properly or your browser does not use cookies or is not set-up to - accept them. - </P> - <HR> - </LI> - - <LI><A NAME="jdk1-and-http1.1"> - <STRONG>Why do my Java app[let]s give me plain text when I request - an URL from an Apache server?</STRONG> - </A> - <P> - As of version 1.2, Apache is an HTTP/1.1 (HyperText Transfer Protocol - version 1.1) server. This fact is reflected in the protocol version - that's included in the response headers sent to a client when - processing a request. 
Unfortunately, low-level Web access classes - included in the Java Development Kit (JDK) version 1.0.2 expect to see - the version string "HTTP/1.0" and do not correctly interpret - the "HTTP/1.1" value Apache is sending (this part of the - response is a declaration of what the server can do rather than a - declaration of the dialect of the response). The result - is that the JDK methods do not correctly parse the headers, and - include them with the document content by mistake. - </P> - <P> - This is definitely a bug in the JDK 1.0.2 foundation classes from Sun, - and it has been fixed in version 1.1. However, the classes in - question are part of the virtual machine environment, which means - they're part of the Web browser (if Java-enabled) or the Java - environment on the client system - so even if you develop - <EM>your</EM> classes with a recent JDK, the eventual users might - encounter the problem. - The classes involved are replaceable by vendors implementing the - Java virtual machine environment, and so even those that are based - upon the 1.0.2 version may not have this problem. - </P> - <P> - In the meantime, a workaround is to tell - Apache to "fake" an HTTP/1.0 response to requests that come - from the JDK methods; this can be done by including a line such as the - following in your server configuration files: - </P> - <P> - <DL> - <DD><CODE>BrowserMatch Java1.0 force-response-1.0 - <BR> - BrowserMatch JDK/1.0 force-response-1.0</CODE> - </DD> - </DL> - <P></P> - <P> - More information about this issue can be found in the - <A HREF="http://www.apache.org/info/jdk-102.html" - ><CITE>Java and HTTP/1.1</CITE></A> - page at the Apache web site. 
- </P> - <HR> - </LI> - - <LI><A NAME="midi"> - <STRONG>How do I get Apache to send a MIDI file so the browser can - play it?</STRONG> - </A> - <P> - Even though the registered MIME type for MIDI files is - <SAMP>audio/midi</SAMP>, some browsers are not set up to recognize it - as such; instead, they look for <SAMP>audio/x-midi</SAMP>. There are - two things you can do to address this: - </P> - <OL> - <LI>Configure your browser to treat documents of type - <SAMP>audio/midi</SAMP> correctly. This is the type that Apache - sends by default. This may not be workable, however, if you have - many client installations to change, or if some or many of the - clients are not under your control. - </LI> - <LI>Instruct Apache to send a different <SAMP>Content-type</SAMP> - header for these files by adding the following line to your server's - configuration files: - <P> - <DL> - <DD><CODE>AddType audio/x-midi .mid .midi .kar</CODE> - </DD> - </DL> - <P></P> - <P> - Note that this may break browsers that <EM>do</EM> recognize the - <SAMP>audio/midi</SAMP> MIME type unless they're prepared to also - handle <SAMP>audio/x-midi</SAMP> the same way. - </P> - </LI> - </OL> - <HR> - </LI> - - <LI><A NAME="addlog"> - <STRONG>How do I add browsers and referrers to my logs?</STRONG> - </A> - <P> - Apache provides a couple of different ways of doing this. The - recommended method is to compile the - <A HREF="../mod/mod_log_config.html"><SAMP>mod_log_config</SAMP></A> - module into your configuration and use the - <A HREF="../mod/mod_log_config.html#customlog"><SAMP>CustomLog</SAMP></A> - directive. - </P> - <P> - You can either log the additional information in files other than your - normal transfer log, or you can add them to the records already being - written. 
For example: - </P> - <P> - <CODE> - CustomLog logs/access_log "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" - </CODE> - </P> - <P> - This will add the values of the <SAMP>User-Agent:</SAMP> and - <SAMP>Referer:</SAMP> headers, which indicate the client and the - referring page, respectively, to the end of each line in the access - log. - </P> - <P> - You may want to check out the <CITE>Apache Week</CITE> article - entitled: - "<A HREF="http://www.apacheweek.com/features/logfiles" REL="Help" - ><CITE>Gathering Visitor Information: Customizing Your - Logfiles</CITE></A>". - </P> - <HR> - </LI> - - <LI><A NAME="set-servername"> - <STRONG>Why does accessing directories only work when I include - the trailing "/" - (<EM>e.g.</EM>, <SAMP>http://foo.domain.com/~user/</SAMP>) - but not when I omit it - (<EM>e.g.</EM>, <SAMP>http://foo.domain.com/~user</SAMP>)?</STRONG> - </A> - <P> - When you access a directory without a trailing "/", Apache needs - to send what is called a redirect to the client to tell it to - add the trailing slash. If it did not do so, relative URLs would - not work properly. When it sends the redirect, it needs to know - the name of the server so that it can include it in the redirect. - There are two ways for Apache to find this out; either it can guess, - or you can tell it. If your DNS is configured correctly, it can - normally guess without any problems. If it is not, however, then - you need to tell it. - </P> - <P> - Add a <A HREF="../mod/core.html#servername">ServerName</A> directive - to the config file to tell it what the domain name of the server is. - </P> - <HR> - </LI> - - <LI><A NAME="no-info-directives"> - <STRONG>Why doesn't mod_info list any directives?</STRONG> - </A> - <P> - The <A HREF="../mod/mod_info.html"><SAMP>mod_info</SAMP></A> - module allows you to use a Web browser to see how your server is - configured. Among the information it displays is the list of modules and - their configuration directives.
The "current" values for - the directives are not necessarily those of the running server; they - are extracted from the configuration files themselves at the time of - the request. If the files have been changed since the server was last - reloaded, the display will not match the values actively in use. - If the files and the path to the files are not readable by the user as - which the server is running (see the - <A HREF="../mod/mpm_common.html#user"><SAMP>User</SAMP></A> - directive), then <SAMP>mod_info</SAMP> cannot read them in order to - list their values. An entry <EM>will</EM> be made in the error log in - this event, however. - </P> - <HR> - </LI> - - <LI><A NAME="namevhost"> - <STRONG>I upgraded to Apache 1.3 and now my virtual hosts don't - work!</STRONG> - </A> - <P> - In versions of Apache prior to 1.3b2, there was a lot of confusion - regarding address-based virtual hosts and (HTTP/1.1) name-based - virtual hosts, and the rules concerning how the server processed - <SAMP><VirtualHost></SAMP> definitions were very complex and not - well documented. - </P> - <P> - Apache 1.3b2 introduced a new directive, - <A HREF="http://www.apache.org/docs/mod/core.html#namevirtualhost" - ><SAMP>NameVirtualHost</SAMP></A>, - which simplifies the rules quite a bit. However, changing the rules - like this means that your existing name-based - <SAMP><VirtualHost></SAMP> containers probably won't work - correctly immediately following the upgrade. - </P> - <P> - To correct this problem, add the following line to the beginning of - your server configuration file, before defining any virtual hosts: - </P> - <DL> - <DD><CODE>NameVirtualHost <EM>n.n.n.n</EM></CODE> - </DD> - </DL> - <P> - Replace the "<SAMP>n.n.n.n</SAMP>" with the IP address to - which the name-based virtual host names resolve; if you have multiple - name-based hosts on multiple addresses, repeat the directive for each - address. 
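As an illustrative sketch only (the address <SAMP>192.0.2.10</SAMP>, the host names, and the paths below are placeholders, not values from any real installation), a minimal name-based setup might look like: <DL> <DD><CODE>NameVirtualHost 192.0.2.10<BR> &lt;VirtualHost 192.0.2.10&gt;<BR> ServerName www.example.com<BR> DocumentRoot /www/example<BR> &lt;/VirtualHost&gt;<BR> &lt;VirtualHost 192.0.2.10&gt;<BR> ServerName www.example.org<BR> DocumentRoot /www/example-org<BR> &lt;/VirtualHost&gt;</CODE> </DD> </DL>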
- </P> - <P> - Make sure that your name-based <SAMP><VirtualHost></SAMP> blocks - contain <SAMP>ServerName</SAMP> and possibly <SAMP>ServerAlias</SAMP> - directives so Apache can be sure to tell them apart correctly. - </P> - <P> - Please see the <A HREF="../vhosts/">Apache - Virtual Host documentation</A> for further details about configuration. - </P> - <HR> - </LI> - - <LI><A NAME="redhat-htm"> - <STRONG>I'm using RedHat Linux and my .htm files are showing - up as HTML source rather than being formatted!</STRONG> - </A> - - <P> - RedHat messed up and forgot to put a content type for <CODE>.htm</CODE> - files into <CODE>/etc/mime.types</CODE>. Edit <CODE>/etc/mime.types</CODE>, - find the line containing <CODE>html</CODE> and add <CODE>htm</CODE> to it. - Then restart your httpd server: - </P> - <DL> - <DD><CODE>kill -HUP `cat /var/run/httpd.pid`</CODE> - </DD> - </DL> - <P> - Then <STRONG>clear your browsers' caches</STRONG>. (Many browsers won't - re-examine the content type after they've reloaded a page.) - </P> - <HR> - </LI> - - <LI><A NAME="htaccess-work"> - <STRONG>My <CODE>.htaccess</CODE> files are being ignored.</STRONG></A> - <P> - This is almost always due to your <A HREF="../mod/core.html#allowoverride"> - AllowOverride</A> directive being set incorrectly for the directory in - question. If it is set to <CODE>None</CODE> then .htaccess files will - not even be looked for. If you do have one that is set, then be certain - it covers the directory you are trying to use the .htaccess file in. - This is normally accomplished by ensuring it is inside the proper - <A HREF="../mod/core.html#directory">Directory</A> container. 
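A minimal sketch, assuming a hypothetical document directory <SAMP>/usr/local/apache/htdocs/private</SAMP> and that only authentication directives need to be overridable from <CODE>.htaccess</CODE>: <DL> <DD><CODE>&lt;Directory /usr/local/apache/htdocs/private&gt;<BR> AllowOverride AuthConfig<BR> &lt;/Directory&gt;</CODE> </DD> </DL>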
- </P> - <HR> - </LI> - <LI><A NAME="forbidden"> - <STRONG>Why do I get a "<SAMP>Forbidden</SAMP>" message - whenever I try to access a particular directory?</STRONG></A> - <P> - This message generally appears because either - </P> - <UL> - <LI>The underlying file system permissions do not allow the - User/Group under which Apache is running to access the necessary - files; or - <LI>The Apache configuration has some access restrictions in - place which forbid access to the files. - </UL> - <P> - You can determine which case applies to your situation by checking the - error log. - </P> - <P> - In the case where file system permissions are at fault, remember - that not only must the directory and files in question be readable, - but also all parent directories must be at least searchable by the - web server in order for the content to be accessible. - </P> - <HR> - </LI> - <LI><A NAME="ie-ignores-mime"> - <STRONG>Why do my files appear correctly in Internet - Explorer, but show up as source or trigger a save window - with Netscape?</STRONG></A> - <P> - Internet Explorer (IE) and Netscape handle mime type detection in different - ways, and therefore will display the document differently. In particular, - IE sometimes relies on the file extension to determine the mime type. This - can happen when the server specifies a mime type of - <CODE>application/octet-stream</CODE> or <CODE>text/plain</CODE>. - (Unfortunately, this behavior makes it impossible to properly send plain - text in some situations unless the file extension is <CODE>txt</CODE>.) - There are more details available on IE's mime type detection behavior in an - <A HREF="http://msdn.microsoft.com/workshop/networking/moniker/overview/appendix_a.asp">MSDN - article</A>. - </P> - - <P> - In order to make all browsers work correctly, you should ensure - that Apache sends the correct mime type for the file.
This is - accomplished by editing the <CODE>mime.types</CODE> file or using - an <A HREF="../mod/mod_mime.html#addtype"><CODE>AddType</CODE></A> - directive in the Apache configuration files. - </P> - <HR> - </LI> -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. --> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-F.html b/docs/manual/misc/FAQ-F.html deleted file mode 100644 index 0f7e7046db..0000000000 --- a/docs/manual/misc/FAQ-F.html +++ /dev/null @@ -1,564 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.9 $ ($Date: 2001/02/28 03:36:00 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. 
- </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. --> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="6"><STRONG>Dynamic Content (CGI and SSI)</STRONG> - <OL> - <LI><A HREF="#CGIoutsideScriptAlias">How do I enable CGI execution - in directories other than the ScriptAlias?</A> - </LI> - <LI><A HREF="#premature-script-headers">What does it mean when my - CGIs fail with "<SAMP>Premature end of script - headers</SAMP>"?</A> - </LI> - <LI><A HREF="#POSTnotallowed">Why do I keep getting "Method Not - Allowed" for form POST requests?</A> - </LI> - <LI><A HREF="#nph-scripts">How can I get my script's output without - Apache buffering it? 
Why doesn't my server push work?</A> - </LI> - <LI><A HREF="#cgi-spec">Where can I find the "CGI - specification"?</A> - </LI> - <LI><A HREF="#fastcgi">Why isn't FastCGI included with Apache any - more?</A> - </LI> - <LI><A HREF="#ssi-part-i">How do I enable SSI (parsed HTML)?</A> - </LI> - <LI><A HREF="#ssi-part-ii">Why don't my parsed files get cached?</A> - </LI> - <LI><A HREF="#ssi-part-iii">How can I have my script output parsed?</A> - </LI> - <LI><A HREF="#ssi-part-iv">SSIs don't work for VirtualHosts and/or - user home directories</A> - </LI> - <LI><A HREF="#errordocssi">How can I use <CODE>ErrorDocument</CODE> - and SSI to simplify customized error messages?</A> - </LI> - <LI><A HREF="#remote-user-var">Why is the environment variable - <SAMP>REMOTE_USER</SAMP> not set?</A> - </LI> - <LI><A HREF="#user-cgi">How do I allow each of my user directories - to have a cgi-bin directory?</A> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - - <H3>F. Dynamic Content (CGI and SSI)</H3> -<OL> - - <LI><A NAME="CGIoutsideScriptAlias"> - <STRONG>How do I enable CGI execution in directories other than - the ScriptAlias?</STRONG> - </A> - <P> - Apache recognizes all files in a directory named as a - <A HREF="../mod/mod_alias.html#scriptalias"><SAMP>ScriptAlias</SAMP></A> - as being eligible for execution rather than processing as normal - documents. This applies regardless of the file name, so scripts in a - ScriptAlias directory don't need to be named - "<SAMP>*.cgi</SAMP>" or "<SAMP>*.pl</SAMP>" or - whatever. In other words, <EM>all</EM> files in a ScriptAlias - directory are scripts, as far as Apache is concerned. - </P> - <P> - To persuade Apache to execute scripts in other locations, such as in - directories where normal documents may also live, you must tell it how - to recognize them - and also that it's okay to execute them. 
For - this, you need to use something like the - <A HREF="../mod/mod_mime.html#addhandler"><SAMP>AddHandler</SAMP></A> - directive. - </P> - <P> - <OL> - <LI>In an appropriate section of your server configuration files, add - a line such as - <P> - <DL> - <DD><CODE>AddHandler cgi-script .cgi</CODE> - </DD> - </DL> - <P></P> - <P> - The server will then recognize that all files in that location (and - its logical descendants) that end in "<SAMP>.cgi</SAMP>" - are script files, not documents. - </P> - </LI> - <LI>Make sure that the directory location is covered by an - <A HREF="../mod/core.html#options"><SAMP>Options</SAMP></A> - declaration that includes the <SAMP>ExecCGI</SAMP> option. - </LI> - </OL> - <P></P> - <P> - In some situations, you might not want to actually - allow all files named "<SAMP>*.cgi</SAMP>" to be executable. - Perhaps all you want is to enable a particular file in a normal directory to - be executable. This can be alternatively accomplished - <EM>via</EM> <A HREF="../mod/mod_rewrite.html"><SAMP>mod_rewrite</SAMP></A> - and the following steps: - </P> - <P> - <OL> - <LI>Locally add to the corresponding <SAMP>.htaccess</SAMP> file a ruleset - similar to this one: - <P> - <DL> - <DD><CODE>RewriteEngine on - <BR> - RewriteBase /~foo/bar/ - <BR> - RewriteRule ^quux\.cgi$ - [T=application/x-httpd-cgi]</CODE> - </DD> - </DL> - <P></P> - </LI> - <LI>Make sure that the directory location is covered by an - <A HREF="../mod/core.html#options"><SAMP>Options</SAMP></A> - declaration that includes the <SAMP>ExecCGI</SAMP> and - <SAMP>FollowSymLinks</SAMP> option. - </LI> - </OL> - <P></P> - <HR> - </LI> - - <LI><A NAME="premature-script-headers"> - <STRONG>What does it mean when my CGIs fail with - "<SAMP>Premature end of script headers</SAMP>"?</STRONG> - </A> - <P> - It means just what it says: the server was expecting a complete set of - HTTP headers (one or more followed by a blank line), and didn't get - them. 
- </P> - <P> - The most common cause of this problem is the script dying before - sending the complete set of headers, or possibly any at all, to the - server. To see if this is the case, try running the script standalone - from an interactive session, rather than as a script under the server. - If you get error messages, this is almost certainly the cause of the - "premature end of script headers" message. Even if the CGI - runs fine from the command line, remember that the environment and - permissions may be different when running under the web server. The - CGI can only access resources allowed for the <A - HREF="../mod/mpm_common.html#user"><CODE>User</CODE></A> and - <A HREF="../mod/mpm_common.html#group"><CODE>Group</CODE></A> specified in - your Apache configuration. In addition, the environment will not be - the same as the one provided on the command line, but it can be - adjusted using the directives provided by <A - HREF="../mod/mod_env.html">mod_env</A>. - </P> - <P> - The second most common cause of this (aside from people not - outputting the required headers at all) is a result of an interaction - with Perl's output buffering. To make Perl flush its buffers - after each output statement, insert the following statements around - the <CODE>print</CODE> or <CODE>write</CODE> statements that send your - HTTP headers: - </P> - <P> - <DL> - <DD><CODE>{<BR> - local ($oldbar) = $|;<BR> - $cfh = select (STDOUT);<BR> - $| = 1;<BR> - #<BR> - # print your HTTP headers here<BR> - #<BR> - $| = $oldbar;<BR> - select ($cfh);<BR> - }</CODE> - </DD> - </DL> - <P></P> - <P> - This is generally only necessary when you are calling external - programs from your script that send output to stdout, or if there will - be a long delay between the time the headers are sent and the actual - content starts being emitted. 
To maximize performance, you should - turn buffer-flushing back <EM>off</EM> (with <CODE>$| = 0</CODE> or the - equivalent) after the statements that send the headers, as shown - above. - </P> - <P> - If your script isn't written in Perl, do the equivalent thing for - whatever language you <EM>are</EM> using (<EM>e.g.</EM>, for C, call - <CODE>fflush()</CODE> after writing the headers). - </P> - <P> - Another cause of the "premature end of script headers" - message is a resource limit set with the RLimitCPU or RLimitMEM - directives. You may get the message if the CGI script was killed for - exceeding such a limit. - </P> - <P> - In addition, a configuration problem in <A - HREF="../suexec.html">suEXEC</A>, mod_perl, or another third-party - module can often interfere with the execution of your CGI and cause - the "premature end of script headers" message. - </P> - <HR> - </LI> - - <LI><A NAME="POSTnotallowed"> - <STRONG>Why do I keep getting "Method Not Allowed" for - form POST requests?</STRONG> - </A> - <P> - This is almost always due to Apache not being configured to treat the - file you are trying to POST to as a CGI script. You cannot POST - to a normal HTML file; the operation has no meaning. See the FAQ - entry on <A HREF="#CGIoutsideScriptAlias">CGIs outside ScriptAliased - directories</A> for details on how to configure Apache to treat the - file in question as a CGI. - </P> - <HR> - </LI> - - <LI><A NAME="nph-scripts"> - <STRONG>How can I get my script's output without Apache buffering - it? Why doesn't my server push work?</STRONG> - </A> - <P> - As of Apache 1.3, CGI scripts are essentially not buffered. Every time - your script does a "flush" to output data, that data gets relayed on to - the client. Some scripting languages, for example Perl, have their own - buffering for output - this can be disabled by setting the <CODE>$|</CODE> - special variable to 1. 
Of course this does increase the overall number - of packets being transmitted, which can result in a sense of slowness for - the end user. - </P> - <P>Prior to 1.3, you needed to use "nph-" scripts to accomplish - non-buffering. Today, the only difference between nph scripts and - normal scripts is that nph scripts require the full HTTP headers to - be sent. - </P> - <HR> - </LI> - - <LI><A NAME="cgi-spec"> - <STRONG>Where can I find the "CGI specification"?</STRONG> - </A> - <P> - The Common Gateway Interface (CGI) specification can be found at - the original NCSA site - <<A HREF="http://hoohoo.ncsa.uiuc.edu/cgi/interface.html"> - <SAMP>http://hoohoo.ncsa.uiuc.edu/cgi/interface.html</SAMP></A>>. - This version hasn't been updated since 1995, and there have been - some efforts to update it. - </P> - <P> - A new draft is being worked on with the intent of making it an informational - RFC; you can find out more about this project at - <<A HREF="http://web.golux.com/coar/cgi/" - ><SAMP>http://web.golux.com/coar/cgi/</SAMP></A>>. - </P> - <HR> - </LI> - - <LI><A NAME="fastcgi"> - <STRONG>Why isn't FastCGI included with Apache any more?</STRONG> - </A> - <P> - The simple answer is that it was becoming too difficult to keep the - version being included with Apache synchronized with the master copy - at the - <A HREF="http://www.fastcgi.com/" - >FastCGI web site</A>. When a new version of Apache was released, the - version of the FastCGI module included with it would soon be out of date. - </P> - <P> - You can still obtain the FastCGI module for Apache from the master - FastCGI web site. - </P> - <HR> - </LI> - - <LI><A NAME="ssi-part-i"> - <STRONG>How do I enable SSI (parsed HTML)?</STRONG> - </A> - <P> - SSI (an acronym for Server-Side Include) directives allow static HTML - documents to be enhanced at run-time (<EM>e.g.</EM>, when delivered to - a client by Apache). 
The format of SSI directives is covered - in the <A HREF="../mod/mod_include.html">mod_include manual</A>; - suffice it to say that Apache supports not only SSI but also - xSSI (eXtended SSI) directives. - </P> - <P> - Processing a document at run-time is called <EM>parsing</EM> it; - hence the term "parsed HTML" sometimes used for documents - that contain SSI instructions. Parsing tends to be - resource-intensive compared to serving static files, and is not - enabled by default. It can also interfere with the cacheability of - your documents, which can put a further load on your server. (See - the <A HREF="#ssi-part-ii">next question</A> for more information - about this.) - </P> - <P> - To enable SSI processing, you need to - </P> - <UL> - <LI>Build your server with the - <A HREF="../mod/mod_include.html"><SAMP>mod_include</SAMP></A> - module. This is normally compiled in by default. - </LI> - <LI>Make sure your server configuration files have an - <A HREF="../mod/core.html#options"><SAMP>Options</SAMP></A> - directive which permits <SAMP>Includes</SAMP>. - </LI> - <LI>Make sure that the directory where you want the SSI documents to - live is covered by the "INCLUDES" output filter, - either explicitly or in some ancestral location. That can be done - with the following - <A HREF="../mod/core.html#setoutputfilter"><SAMP>SetOutputFilter</SAMP></A> - directive: - <P> - <blockquote><code> - <FilesMatch "\.shtml(\..+)?$"><br> - SetOutputFilter INCLUDES<br> - </FilesMatch> - </code></blockquote> - <P></P> - <P> - This indicates that all files with the extension - ".shtml" in that location (or its descendants) should be - parsed. Note that using ".html" will cause all normal - HTML files to be parsed, which may put an inordinate load on your - server. - </P> - </LI> - </UL> - <P> - For additional information, see the <CITE>Apache Week</CITE> article on - <A HREF="http://www.apacheweek.com/features/ssi" REL="Help" - ><CITE>Using Server Side Includes</CITE></A>. 
- </P> - <HR> - </LI> - - <LI><A NAME="ssi-part-ii"> - <STRONG>Why don't my parsed files get cached?</STRONG> - </A> - <P> - Since the server is performing run-time processing of your SSI - directives, which may change the content shipped to the client, it - can't know at the time it starts parsing what the final size of the - result will be, or whether the parsed result will always be the same. - This means that it can't generate <SAMP>Content-Length</SAMP> or - <SAMP>Last-Modified</SAMP> headers. Caches commonly work by comparing - the <SAMP>Last-Modified</SAMP> of what's in the cache with that being - delivered by the server. Since the server isn't sending that header - for a parsed document, whatever's doing the caching can't tell whether - the document has changed or not - and so fetches it again to be on the - safe side. - </P> - <P> - You can work around this in some cases by causing an - <SAMP>Expires</SAMP> header to be generated. (See the - <A HREF="../mod/mod_expires.html" REL="Help"><SAMP>mod_expires</SAMP></A> - documentation for more details.) Another possibility is to use the - <A HREF="../mod/mod_include.html#xbithack" REL="Help" - ><SAMP>XBitHack Full</SAMP></A> - mechanism, which tells Apache to send (under certain circumstances - detailed in the XBitHack directive description) a - <SAMP>Last-Modified</SAMP> header based upon the last modification - time of the file being parsed. Note that this may actually be lying - to the client if the parsed file doesn't change but the SSI-inserted - content does; if the included content changes often, this can result - in stale copies being cached. - </P> - <HR> - </LI> - - <LI><A NAME="ssi-part-iii"> - <STRONG>How can I have my script output parsed?</STRONG> - </A> - <P> - So you want to include SSI directives in the output from your CGI - script, but can't figure out how to do it? - The short answer is "you can't." 
This is potentially - a security liability and, more importantly, it cannot be cleanly - implemented under the current server API. The best workaround - is for your script itself to do what the SSIs would be doing. - After all, it's generating the rest of the content. - </P> - <P> - This is a feature The Apache Group hopes to add in the next major - release after 1.3. - </P> - <HR> - </LI> - - <LI><A NAME="ssi-part-iv"> - <STRONG>SSIs don't work for VirtualHosts and/or - user home directories.</STRONG> - </A> - <P> - This is almost always due to having some setting in your config file that - sets "Options Includes" or some other setting for your DocumentRoot - but not for other directories. If you set it inside a Directory - section, then that setting will only apply to that directory. - </P> - <HR> - </LI> - - <LI><A NAME="errordocssi"> - <STRONG>How can I use <CODE>ErrorDocument</CODE> - and SSI to simplify customized error messages?</STRONG> - </A> - <P> - Have a look at <A HREF="custom_errordocs.html">this document</A>. - It shows in example form how you can use a combination of XSSI and - negotiation to tailor a set of <CODE>ErrorDocument</CODE>s to your - personal taste, and return different internationalized error - responses based on the client's native language. - </P> - <HR> - </LI> - - <LI><A NAME="remote-user-var"> - <STRONG>Why is the environment variable - <SAMP>REMOTE_USER</SAMP> not set?</STRONG> - </A> - <P> - This variable is set and thus available in SSI or CGI scripts <STRONG>if and - only if</STRONG> the requested document was protected by access - authentication. For an explanation of how to implement these restrictions, - see - <A HREF="http://www.apacheweek.com/"><CITE>Apache Week</CITE></A>'s - articles on - <A HREF="http://www.apacheweek.com/features/userauth" - ><CITE>Using User Authentication</CITE></A> - or - <A HREF="http://www.apacheweek.com/features/dbmauth" - ><CITE>DBM User Authentication</CITE></A>. 
- </P> - <P> - Hint: When using a CGI script to receive the data of a HTML <SAMP>FORM</SAMP> - notice that protecting the document containing the <SAMP>FORM</SAMP> is not - sufficient to provide <SAMP>REMOTE_USER</SAMP> to the CGI script. You have - to protect the CGI script, too. Or alternatively only the CGI script (then - authentication happens only after filling out the form). - </P> - <HR> - </LI> - <LI><A NAME="user-cgi"><STRONG>How do I allow each of my user directories - to have a cgi-bin directory?</STRONG></A> - <P> - Remember that CGI execution does not need to be restricted only to - cgi-bin directories. You can <A HREF="#CGIoutsideScriptAlias">allow - CGI script execution in arbitrary parts of your filesystem</A>. - </P> - <P> - There are many ways to give each user directory a cgi-bin directory - such that anything requested as - <SAMP>http://example.com/~user/cgi-bin/program</SAMP> will be - executed as a CGI script. - Two alternatives are: - <OL> - <LI>Place the cgi-bin directory next to the public_html directory: - - <DL> - <DD><CODE>ScriptAliasMatch ^/~([^/]*)/cgi-bin/(.*) /home/$1/cgi-bin/$2</CODE> - </DD> - </DL> - - </LI> - <LI>Place the cgi-bin directory underneath the public_html directory: - <DL> - <DD><CODE><Directory /home/*/public_html/cgi-bin><BR> - Options ExecCGI<BR> - SetHandler cgi-script<BR> - </Directory></CODE> - </DD> - </DL> - </LI> - </OL> - <HR> - </LI> - -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. 
--> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-G.html b/docs/manual/misc/FAQ-G.html deleted file mode 100644 index 3900445f8e..0000000000 --- a/docs/manual/misc/FAQ-G.html +++ /dev/null @@ -1,406 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.5 $ ($Date: 2001/02/28 03:36:00 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. 
--> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="7"><STRONG>Authentication and Access Restrictions</STRONG> - <OL> - <LI><A HREF="#dnsauth">Why isn't restricting access by host or domain name - working correctly?</A> - </LI> - <LI><A HREF="#user-authentication">How do I set up Apache to require - a username and password to access certain documents?</A> - </LI> - <LI><A HREF="#remote-auth-only">How do I set up Apache to allow access - to certain documents only if a site is either a local site - <EM>or</EM> the user supplies a password and username?</A> - </LI> - <LI><A HREF="#authauthoritative">Why does my authentication give - me a server error?</A> - </LI> - <LI><A HREF="#auth-on-same-machine">Do I have to keep the (mSQL) - authentication information on the same machine?</A> - </LI> - <LI><A HREF="#msql-slow">Why is my mSQL authentication terribly slow?</A> - </LI> - <LI><A HREF="#passwdauth">Can I use my <SAMP>/etc/passwd</SAMP> file - for Web page authentication?</A> - </LI> - <LI><A HREF="#prompted-twice">Why does Apache ask for my password - twice before serving a file?</a> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> 
-<!--#endif --> -<!--#if expr="! $TOC" --> - <H3>G. Authentication and Access Restrictions</H3> -<OL> - - <LI><A NAME="dnsauth"> - <STRONG>Why isn't restricting access by host or domain name - working correctly?</STRONG> - </A> - <P> - Two of the most common causes of this are: - </P> - <OL> - <LI><STRONG>An error, inconsistency, or unexpected mapping in the DNS - registration</STRONG> - <BR> - This happens frequently: your configuration restricts access to - <SAMP>Host.FooBar.Com</SAMP>, but you can't get in from that host. - The usual reason for this is that <SAMP>Host.FooBar.Com</SAMP> is - actually an alias for another name, and when Apache performs the - address-to-name lookup it's getting the <EM>real</EM> name, not - <SAMP>Host.FooBar.Com</SAMP>. You can verify this by checking the - reverse lookup yourself. The easiest way to work around it is to - specify the correct host name in your configuration. - </LI> - <LI><STRONG>Inadequate checking and verification in your - configuration of Apache</STRONG> - <BR> - If you intend to perform access checking and restriction based upon - the client's host or domain name, you really need to configure - Apache to double-check the origin information it's supplied. You do - this by adding the <SAMP>-DMAXIMUM_DNS</SAMP> clause to the - <SAMP>EXTRA_CFLAGS</SAMP> definition in your - <SAMP>Configuration</SAMP> file. For example: - <P> - <DL> - <DD><CODE>EXTRA_CFLAGS=-DMAXIMUM_DNS</CODE> - </DD> - </DL> - <P></P> - <P> - This will cause Apache to be very paranoid about making sure a - particular host address is <EM>really</EM> assigned to the name it - claims to be. Note that this <EM>can</EM> incur a significant - performance penalty, however, because of all the name resolution - requests being sent to a nameserver. 
- </P> - </LI> - </OL> - <HR> - </LI> - - <LI><A NAME="user-authentication"> - <STRONG>How do I set up Apache to require a username and - password to access certain documents?</STRONG> - </A> - <P> - There are several ways to do this; some of the more popular - ones are to use the <A HREF="../mod/mod_auth.html">mod_auth</A>, - <A HREF="../mod/mod_auth_db.html">mod_auth_db</A>, or - <A HREF="../mod/mod_auth_dbm.html">mod_auth_dbm</A> modules. - </P> - <P> - For an explanation on how to implement these restrictions, see - <A HREF="http://www.apacheweek.com/"><CITE>Apache Week</CITE></A>'s - articles on - <A HREF="http://www.apacheweek.com/features/userauth" - ><CITE>Using User Authentication</CITE></A> - or - <A HREF="http://www.apacheweek.com/features/dbmauth" - ><CITE>DBM User Authentication</CITE></A>. - </P> - <HR> - </LI> - - <LI><A NAME="remote-auth-only"> - <STRONG>How do I set up Apache to allow access to certain - documents only if a site is either a local site <EM>or</EM> - the user supplies a password and username?</STRONG> - </A> - <P> - Use the <A HREF="../mod/core.html#satisfy">Satisfy</A> directive, - in particular the <CODE>Satisfy Any</CODE> directive, to require - that only one of the access restrictions be met. For example, - adding the following configuration to a <SAMP>.htaccess</SAMP> - or server configuration file would restrict access to people who - either are accessing the site from a host under domain.com or - who can supply a valid username and password: - </P> - <P> - <DL> - <DD><CODE>Deny from all - <BR> - Allow from .domain.com - <BR> - AuthType Basic - <BR> - AuthUserFile /usr/local/apache/conf/htpasswd.users - <BR> - AuthName "special directory" - <BR> - Require valid-user - <BR> - Satisfy any</CODE> - </DD> - </DL> - <P></P> - <P> - See the <A HREF="#user-authentication">user authentication</A> - question and the <A HREF="../mod/mod_access.html">mod_access</A> - module for details on how the above directives work. 
- </P> - <HR> - </LI> - - <LI><A NAME="authauthoritative"> - <STRONG>Why does my authentication give me a server error?</STRONG> - </A> - <P> - Under normal circumstances, the Apache access control modules will - pass unrecognized user IDs on to the next access control module in - line. Only if the user ID is recognized and the password is validated - (or not) will it give the usual success or "authentication - failed" messages. - </P> - <P> - However, if the last access module in line 'declines' the validation - request (because it has never heard of the user ID or because it is not - configured), the <SAMP>http_request</SAMP> handler will give one of - the following, confusing, errors: - </P> - <UL> - <LI><SAMP>check access</SAMP> - </LI> - <LI><SAMP>check user. No user file?</SAMP> - </LI> - <LI><SAMP>check access. No groups file?</SAMP> - </LI> - </UL> - <P> - This does <EM>not</EM> mean that you have to add an - '<SAMP>AuthUserFile /dev/null</SAMP>' line as some magazines suggest! - </P> - <P> - The solution is to ensure that at least the last module is authoritative - and <STRONG>CONFIGURED</STRONG>. By default, <SAMP>mod_auth</SAMP> is - authoritative and will give an OK/Denied, but only if it is configured - with the proper <SAMP>AuthUserFile</SAMP>. Likewise, if a valid group - is required. (Remember that the modules are processed in the reverse - order from that in which they appear in your compile-time - <SAMP>Configuration</SAMP> file.) - </P> - <P> - A typical situation for this error is when you are using the - <SAMP>mod_auth_dbm</SAMP>, <SAMP>mod_auth_msql</SAMP>, - <SAMP>mod_auth_mysql</SAMP>, <SAMP>mod_auth_anon</SAMP> or - <SAMP>mod_auth_cookie</SAMP> modules on their own. These are by - default <STRONG>not</STRONG> authoritative, and this will pass the - buck on to the (non-existent) next authentication module when the - user ID is not in their respective database. 
Just add the appropriate - '<SAMP><EM>XXX</EM>Authoritative yes</SAMP>' line to the configuration. - </P> - <P> - In general it is a good idea (though not terribly efficient) to have the - file-based <SAMP>mod_auth</SAMP> a module of last resort. This allows - you to access the web server with a few special passwords even if the - databases are down or corrupted. This does cost a - file open/seek/close for each request in a protected area. - </P> - <HR> - </LI> - - <LI><A NAME="auth-on-same-machine"> - <STRONG>Do I have to keep the (mSQL) authentication information - on the same machine?</STRONG> - </A> - <P> - Some organizations feel very strongly about keeping the authentication - information on a different machine than the webserver. With the - <SAMP>mod_auth_msql</SAMP>, <SAMP>mod_auth_mysql</SAMP>, and other SQL - modules connecting to (R)DBMses this is quite possible. Just configure - an explicit host to contact. - </P> - <P> - Be aware that with mSQL and Oracle, opening and closing these database - connections is very expensive and time consuming. You might want to - look at the code in the <SAMP>auth_*</SAMP> modules and play with the - compile time flags to alleviate this somewhat, if your RDBMS licences - allow for it. - </P> - <HR> - </LI> - - <LI><A NAME="msql-slow"> - <STRONG>Why is my mSQL authentication terribly slow?</STRONG> - </A> - <P> - You have probably configured the Host by specifying a FQHN, - and thus the <SAMP>libmsql</SAMP> will use a full blown TCP/IP socket - to talk to the database, rather than a fast internal device. The - <SAMP>libmsql</SAMP>, the mSQL FAQ, and the <SAMP>mod_auth_msql</SAMP> - documentation warn you about this. If you have to use different - hosts, check out the <SAMP>mod_auth_msql</SAMP> code for - some compile time flags which might - or might not - suit you. 
- </P> - <HR> - </LI> - - <LI><A NAME="passwdauth"> - <STRONG>Can I use my <SAMP>/etc/passwd</SAMP> file - for Web page authentication?</STRONG> - </A> - <P> - Yes, you can - but it's a <STRONG>very bad idea</STRONG>. Here are - some of the reasons: - </P> - <UL> - <LI>The Web technology provides no governors on how often or how - rapidly password (authentication failure) retries can be made. That - means that someone can hammer away at your system's - <SAMP>root</SAMP> password using the Web, using a dictionary or - similar mass attack, just as fast as the wire and your server can - handle the requests. Most operating systems these days include - attack detection (such as <EM>n</EM> failed passwords for the same - account within <EM>m</EM> seconds) and evasion (breaking the - connection, disabling the account under attack, disabling - <EM>all</EM> logins from that source, <EM>et cetera</EM>), but the - Web does not. - </LI> - <LI>An account under attack isn't notified (unless the server is - heavily modified); there's no "You have 19483 login - failures" message when the legitimate owner logs in. - </LI> - <LI>Without an exhaustive and error-prone examination of the server - logs, you can't tell whether an account has been compromised. - Detecting that an attack has occurred, or is in progress, is fairly - obvious, though - <EM>if</EM> you look at the logs. - </LI> - <LI>Web authentication passwords (at least for Basic authentication) - generally fly across the wire, and through intermediate proxy - systems, in what amounts to plain text. "O'er the net we - go/Caching all the way;/O what fun it is to surf/Giving my password - away!" - </LI> - <LI>Since HTTP is stateless, information about the authentication is - transmitted <EM>each and every time</EM> a request is made to the - server. Essentially, the client caches it after the first - successful access, and transmits it without asking for all - subsequent requests to the same server. 
- </LI> - <LI>It's relatively trivial for someone on your system to put up a - page that will steal the cached password from a client's cache - without them knowing. Can you say "password grabber"? - </LI> - </UL> - <P> - If you still want to do this in light of the above disadvantages, the - method is left as an exercise for the reader. It'll void your Apache - warranty, though, and you'll lose all accumulated UNIX guru points. - </P> - <HR> - </LI> - <LI><A NAME="prompted-twice"><STRONG>Why does Apache ask for my password - twice before serving a file?</STRONG></a> - <P> - If the hostname under which you are accessing the server is - different than the hostname specified in the - <A HREF="../mod/core.html#servername"><CODE>ServerName</CODE></A> - directive, then depending on the setting of the - <A HREF="../mod/core.html#usecanonicalname"><CODE>UseCanonicalName</CODE></A> - directive, Apache will redirect you to a new hostname when - constructing self-referential URLs. This happens, for example, in - the case where you request a directory without including the - trailing slash. - </P> - <P> - When this happens, Apache will ask for authentication once under the - original hostname, perform the redirect, and then ask again under the - new hostname. For security reasons, the browser must prompt again - for the password when the host name changes. - </P> - <P> - To eliminate this problem you should - </P> - - <OL> - <LI>Always use the trailing slash when requesting directories; - <LI>Change the <CODE>ServerName</CODE> to match the name you are - using in the URL; and/or - <LI>Set <CODE>UseCanonicalName off</CODE>. - </OL> - <HR> - </LI> - -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. 
--> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-H.html b/docs/manual/misc/FAQ-H.html deleted file mode 100644 index 778884c74d..0000000000 --- a/docs/manual/misc/FAQ-H.html +++ /dev/null @@ -1,267 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.2 $ ($Date: 2000/09/09 18:19:54 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. 
--> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="8"><STRONG>URL Rewriting</STRONG> - <OL> - <LI><A HREF="#rewrite-more-config">Where can I find mod_rewrite rulesets - which already solve particular URL-related problems?</A> - </LI> - <LI><A HREF="#rewrite-article">Where can I find any published information - about URL-manipulations and mod_rewrite?</A> - </LI> - <LI><A HREF="#rewrite-complexity">Why is mod_rewrite so difficult to learn - and seems so complicated?</A> - </LI> - <LI><A HREF="#rewrite-dontwork">What can I do if my RewriteRules don't work - as expected?</A> - </LI> - <LI><A HREF="#rewrite-prefixdocroot">Why don't some of my URLs get - prefixed with DocumentRoot when using mod_rewrite?</A> - </LI> - <LI><A HREF="#rewrite-nocase">How can I make all my URLs case-insensitive - with mod_rewrite?</A> - </LI> - <LI><A HREF="#rewrite-virthost">Why are RewriteRules in my VirtualHost - parts ignored?</A> - </LI> - <LI><A HREF="#rewrite-envwhitespace">How can I use strings with whitespaces - in RewriteRule's ENV flag?</A> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! $TOC" --> - - <H3>H. 
URL Rewriting</H3> -<OL> - - <LI><A NAME="rewrite-more-config"> - <STRONG>Where can I find mod_rewrite rulesets which already solve - particular URL-related problems?</STRONG> - </A> - <P> - There is a collection of - <A HREF="http://www.engelschall.com/pw/apache/rewriteguide/" - >Practical Solutions for URL-Manipulation</A> - where you can - find all typical solutions the author of - <A HREF="../mod/mod_rewrite.html"><SAMP>mod_rewrite</SAMP></A> - currently knows of. If you have more - interesting rulesets which solve particular problems not currently covered in - this document, send them to - <A HREF="mailto:rse@apache.org">Ralf S. Engelschall</A> - for inclusion. The - other webmasters will thank you for avoiding the reinvention of the wheel. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-article"> - <STRONG>Where can I find any published information about - URL-manipulations and mod_rewrite?</STRONG> - </A> - <P> - There is an article by - <A HREF="mailto:rse@apache.org" - >Ralf S. Engelschall</A> - about URL-manipulations based on - <A HREF="../mod/mod_rewrite.html"><SAMP>mod_rewrite</SAMP></A> - in the "iX Multiuser Multitasking Magazin" issue #12/96. The - German (original) version - can be read online at - <<A HREF="http://www.heise.de/ix/artikel/9612149/" - >http://www.heise.de/ix/artikel/9612149/</A>>; - the English (translated) version can be found at - <<A HREF="http://www.heise.de/ix/artikel/E/9612149/" - >http://www.heise.de/ix/artikel/E/9612149/</A>>. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-complexity"> - <STRONG>Why is mod_rewrite so difficult to learn and seems so - complicated?</STRONG> - </A> - <P> - Hmmm... there are a lot of reasons. First, mod_rewrite itself is a powerful - module which can help you in really <STRONG>all</STRONG> aspects of URL - rewriting, so by definition it cannot be a trivial module.
To accomplish - its hard job it uses software leverage and makes use of a powerful regular - expression - library by Henry Spencer, which has been an integral part of Apache since - version 1.2. Regular expressions themselves can be difficult for newbies, - while providing the most flexible power to the advanced hacker. - </P> - <P> - On the other hand, mod_rewrite has to work inside the Apache API environment - and needs to do some tricks to fit there. For instance, the Apache API as of - 1.x really was not designed for URL rewriting at the <TT>.htaccess</TT> - level of processing. Nor does the API, by design, handle - multiple rewrites in sequence. To provide these features - mod_rewrite has to do some special (but API compliant!) handling, which leads - to difficult processing inside the Apache kernel. While the user usually - doesn't see anything of this processing, it can be difficult to find - problems when some of your RewriteRules seem not to work. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-dontwork"> - <STRONG>What can I do if my RewriteRules don't work as expected? - </STRONG> - </A> - <P> - Use "<SAMP>RewriteLog somefile</SAMP>" and - "<SAMP>RewriteLogLevel 9</SAMP>" and take a close look at the - steps the rewriting engine performs. This is really the best (and only) - way to debug your rewriting configuration. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-prefixdocroot"><STRONG>Why don't some of my URLs - get prefixed with DocumentRoot when using mod_rewrite?</STRONG> - </A> - <P> - If the rule starts with <SAMP>/somedir/...</SAMP>, make sure that - no <SAMP>/somedir</SAMP> really exists on the filesystem if you - don't want the URL to match this directory, <EM>i.e.</EM>, - there must be no root directory named <SAMP>somedir</SAMP> on the - filesystem. If there is such a directory, the URL will not - get prefixed with DocumentRoot.
This behaviour looks ugly, but is - really important for some other aspects of URL rewriting. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-nocase"> - <STRONG>How can I make all my URLs case-insensitive with mod_rewrite? - </STRONG> - </A> - <P> - You can't! There are two reasons. First, case translations for - arbitrary length URLs cannot be done <EM>via</EM> regex patterns and - corresponding substitutions. One needs a per-character translation like - the sed/Perl <SAMP>tr|..|..|</SAMP> feature. Second, just making - URLs always upper or lower case does not solve the whole problem of - case-INSENSITIVE URLs, because URLs actually have to be rewritten to - the correct case-variant for the file residing on the filesystem - in order to allow Apache to access the file. And - the Unix filesystem is always case-SENSITIVE. - </P> - <P> - But there is a module named <CODE><A - HREF="../mod/mod_speling.html">mod_speling.c</A></CODE> in the - Apache distribution. Try this module to help correct people who use - mis-cased URLs. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-virthost"> - <STRONG> Why are RewriteRules in my VirtualHost parts ignored?</STRONG> - </A> - <P> - Because you have to enable the engine for every virtual host explicitly due - to security concerns. Just add a "RewriteEngine on" to your - virtual host configuration parts. - </P> - <HR> - </LI> - - <LI><A NAME="rewrite-envwhitespace"> - <STRONG> How can I use strings with whitespaces in RewriteRule's ENV - flag?</STRONG> - </A> - <P> - There is only one, rather ugly, solution: you have to surround the complete - flag argument with quotation marks (<SAMP>"[E=...]"</SAMP>). Note: - the argument to quote here is not the argument to the E-flag but - the argument as seen by the Apache config file parser, <EM>i.e.</EM>, the - third argument of the RewriteRule here. So you have to write - <SAMP>"[E=any text with whitespaces]"</SAMP>.
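- As a sketch only (the URL pattern and the variable name
- <SAMP>NOTE</SAMP> are hypothetical, and the E flag is assumed to take
- its usual <SAMP>E=VAR:VAL</SAMP> form), a complete rule of this kind
- could read as follows. The <SAMP>-</SAMP> in second position means
- that the URL itself is left unchanged; only the variable is set.
- <DL><DD><CODE>RewriteRule ^/somepath - "[E=NOTE:any text with whitespaces]"</CODE></DD></DL>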
- </P> - <HR> - </LI> - -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. --> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ-I.html b/docs/manual/misc/FAQ-I.html deleted file mode 100644 index 1c1ea79ad1..0000000000 --- a/docs/manual/misc/FAQ-I.html +++ /dev/null @@ -1,271 +0,0 @@ -<!--#if expr="$FAQMASTER" --> - <!--#set var="STANDALONE" value="" --> - <!--#set var="INCLUDED" value="YES" --> - <!--#if expr="$QUERY_STRING = TOC" --> - <!--#set var="TOC" value="YES" --> - <!--#set var="CONTENT" value="" --> - <!--#else --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="YES" --> - <!--#endif --> -<!--#else --> - <!--#set var="STANDALONE" value="YES" --> - <!--#set var="INCLUDED" value="" --> - <!--#set var="TOC" value="" --> - <!--#set var="CONTENT" value="" --> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.9 $ ($Date: 2001/02/28 03:36:00 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. 
There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. --> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<OL TYPE="A"> -<!--#endif --> -<!--#if expr="$TOC || $STANDALONE" --> - <LI VALUE="9"><STRONG>Features</STRONG> - <OL> - <LI><A HREF="#proxy">Does or will Apache act as a Proxy server?</A> - </LI> - <LI><A HREF="#multiviews">What are "multiviews"?</A> - </LI> - <LI><A HREF="#putsupport">Why can't I publish to my Apache server - using PUT on Netscape Gold and other programs?</A> - </LI> - <LI><A HREF="#SSL-i">Why doesn't Apache include SSL?</A> - </LI> - <LI><A HREF="#footer">How can I attach a footer to my documents - without using SSI?</A> - </LI> - <LI><A HREF="#search">Does Apache include a search engine?</A> - </LI> - <LI><A HREF="#rotate">How can I rotate my log files?</A> - </LI> - <LI><A HREF="#conditional-logging">How do I keep certain requests from - appearing in my logs?</A> - </LI> - </OL> - </LI> -<!--#endif --> -<!--#if expr="$STANDALONE" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#endif --> -<!--#if expr="! 
$TOC" --> - - <H3>I. Features</H3> -<OL> - - <LI><A NAME="proxy"> - <STRONG>Does or will Apache act as a Proxy server?</STRONG> - </A> - <P> - Apache version 1.1 and above comes with a - <A HREF="../mod/mod_proxy.html">proxy module</A>. - If compiled in, this will make Apache act as a caching-proxy server. - </P> - <HR> - </LI> - - <LI><A NAME="multiviews"> - <STRONG>What are "multiviews"?</STRONG> - </A> - <P> - "Multiviews" is the general name given to the Apache - server's ability to provide language-specific document variants in - response to a request. This is documented quite thoroughly in the - <A HREF="../content-negotiation.html" REL="Help">content negotiation</A> - description page. In addition, <CITE>Apache Week</CITE> carried an - article on this subject entitled - "<A HREF="http://www.apacheweek.com/features/negotiation" REL="Help" - ><CITE>Content Negotiation Explained</CITE></A>". - </P> - <HR> - </LI> - - <LI><A NAME="putsupport"> - <STRONG>Why can't I publish to my Apache server using PUT on - Netscape Gold and other programs?</STRONG> - </A> - <P> - Because you need to install and configure a script to handle - the uploaded files. This script is often called a "PUT" handler. - There are several available, but they may have security problems. - Using FTP uploads may be easier and more secure, at least for now. - For more information, see the <CITE>Apache Week</CITE> article - <A HREF="http://www.apacheweek.com/features/put" - ><CITE>Publishing Pages with PUT</CITE></A>. - </P> - <HR> - </LI> - - <LI><A NAME="SSL-i"> - <STRONG>Why doesn't Apache include SSL?</STRONG> - </A> - <P> - SSL (Secure Socket Layer) data transport requires encryption, and many - governments have restrictions upon the import, export, and use of - encryption technology. If Apache included SSL in the base package, - its distribution would involve all sorts of legal and bureaucratic - issues, and it would no longer be freely available. 
Also, some of - the technology required to talk to current clients using SSL is - patented by <A HREF="http://www.rsa.com/">RSA Data Security</A>, - who restricts its use without a license. - </P> - <P> - Some SSL implementations of Apache are available, however; see the - "<A HREF="http://www.apache.org/related_projects.html" - >related projects</A>" - page at the main Apache web site. - </P> - <P> - You can find out more about this topic in the <CITE>Apache Week</CITE> - article about - <A HREF="http://www.apacheweek.com/features/ssl" REL="Help" - ><CITE>Apache and Secure Transactions</CITE></A>. - </P> - <HR> - </LI> - <LI><A NAME="footer"> - <STRONG>How can I attach a footer to my documents - without using SSI?</STRONG> - </A> - <P> - You can make arbitrary changes to static documents by configuring an - <A HREF="http://www.apache.org/docs/mod/mod_actions.html#action"> - Action</A> which launches a CGI script. The CGI is then - responsible for setting a content-type and delivering the requested - document (the location of which is passed in the - <SAMP>PATH_TRANSLATED</SAMP> environment variable), along with - whatever footer is needed. - </P> - <P> - Busy sites may not want to run a CGI script on every request, and - should consider using an Apache module to add the footer. There are - several third party modules available through the <A - HREF="http://modules.apache.org/">Apache Module Registry</A> which - will add footers to documents. These include mod_trailer, PHP - (<SAMP>php3_auto_append_file</SAMP>), mod_layout, and mod_perl - (<SAMP>Apache::Sandwich</SAMP>). - </P> - <HR> - </LI> - <LI><A NAME="search"> - <STRONG>Does Apache include a search engine?</STRONG> - </A> - <P>Apache does not include a search engine, but there are many good - commercial and free search engines which can be used easily with - Apache. Some of them are listed on the <A - HREF="http://www.searchtools.com/tools/tools.html">Web Site Search - Tools</A> page. 
Open source search engines that are often used with - Apache include <A HREF="http://www.htdig.org/">ht://Dig</A> and <A - HREF="http://sunsite.berkeley.edu/SWISH-E/">SWISH-E</A>. - </P> - <HR> - </LI> - <LI><A NAME="rotate"> - <STRONG>How can I rotate my log files?</STRONG> - </A> - <P>The simple answer: by piping the transfer log into an appropriate - log file rotation utility.</P> - - <P>The longer answer: In the - src/support/ directory, you will find a utility called <a - href="../programs/rotatelogs.html">rotatelogs</a> which can be used - like this:</p> - - <PRE>TransferLog "|/path/to/rotatelogs - /path/to/logs/access_log 86400"</PRE> - - <p>to enable daily rotation of - the log files.<BR> - A more sophisticated solution of a logfile - rotation utility is available under the name <CODE>cronolog</CODE> - from Andrew Ford's site at <A - HREF="http://www.ford-mason.co.uk/resources/cronolog/" - >http://www.ford-mason.co.uk/resources/cronolog/</A>. It can - automatically create logfile subdirectories based on time and date, - and can have a constant symlink point to the rotating logfiles. (As - of version 1.6.1, cronolog is available under the <A - HREF="../LICENSE">Apache License</A>). Use it like this:</p> - - <PRE>CustomLog "|/path/to/cronolog - --symlink=/usr/local/apache/logs/access_log - /usr/local/apache/logs/%Y/%m/access_log" combined </PRE> - - <HR> - </LI> - <LI><A NAME="conditional-logging"> - <STRONG>How do I keep certain requests from appearing - in my logs?</STRONG></A> - <P> - The maximum flexibility for removing unwanted information from - log files is obtained by post-processing the logs, or using - piped-logs to feed the logs through a program which does whatever - you want. However, Apache does offer the ability to prevent - requests from ever appearing in the log files. 
You can do this by - using the - <A HREF="../mod/mod_setenvif.html#SetEnvIf"><CODE>SetEnvIf</CODE></A> - directive to set an environment variable for certain requests and - then using the conditional - <A HREF="../mod/mod_log_config.html#customlogconditional"><CODE>CustomLog</CODE></A> - syntax to prevent logging when the environment variable is set. - </P> - <HR> - </LI> - -</OL> -<!--#endif --> -<!--#if expr="$STANDALONE" --> - <!-- Don't forget to add HR tags at the end of each list item.. --> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> -<!--#endif --> diff --git a/docs/manual/misc/FAQ.html b/docs/manual/misc/FAQ.html deleted file mode 100644 index adde949ddc..0000000000 --- a/docs/manual/misc/FAQ.html +++ /dev/null @@ -1,111 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Server Frequently Asked Questions</TITLE> -<!--#set var="FAQMASTER" value="YES" --> - </HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Server Frequently Asked Questions</H1> - <P> - $Revision: 1.147 $ ($Date: 2001/02/28 03:36:00 $) - </P> - <P> - The latest version of this FAQ is always available from the main - Apache web site, at - <<A - HREF="http://www.apache.org/docs/misc/FAQ.html" - REL="Help" - ><SAMP>http://www.apache.org/docs/misc/FAQ.html</SAMP></A>>. - </P> -<!-- Notes about changes: --> -<!-- - If adding a relative link to another part of the --> -<!-- documentation, *do* include the ".html" portion. There's a --> -<!-- good chance that the user will be reading the documentation --> -<!-- on his own system, which may not be configured for --> -<!-- multiviews. --> -<!-- - When adding items, make sure they're put in the right place --> -<!-- - verify that the numbering matches up. 
--> -<!-- - *Don't* use <PRE></PRE> blocks - they don't appear --> -<!-- correctly in a reliable way when this is converted to text --> -<!-- with Lynx. Use <DL><DD><CODE>xxx<BR>xx</CODE></DD></DL> --> -<!-- blocks inside a <P></P> instead. This is necessary to get --> -<!-- the horizontal and vertical indenting right. --> -<!-- - Don't forget to include an HR tag after the last /P tag --> -<!-- but before the /LI in an item. --> - <P> - If you are reading a text-only version of this FAQ, you may find numbers - enclosed in brackets (such as "[12]"). These refer to the list of - reference URLs to be found at the end of the document. These references - do not appear, and are not needed, for the hypertext version. - </P> - <H2>The Questions</H2> -<!-- Stuff to Add: --> -<!-- - can't bind to port 80 --> -<!-- - permission denied --> -<!-- - address already in use --> -<!-- - mod_auth & passwd lines "user:pw:.*" - ++1st colon onward is --> -<!-- treated as pw, not just ++1st to \-\-2nd. --> -<!-- - SSL: --> -<!-- - Can I use Apache-SSL for free in Canada? --> -<!-- - Why can't I use Apache-SSL in the U.S.? --> -<!-- - How can I found out how many visitors my site gets? --> -<!-- - How do I add a counter? --> -<!-- - How do I configure Apache as a proxy? --> -<!-- - What browsers support HTTP/1.1? --> -<!-- - What's the point of vhosts-by-name is there aren't any --> -<!-- HTTP/1.1 browsers? --> -<!-- - Is there an Apache for W95/WNT? --> -<!-- - Why does Apache die when a vhost can't be DNS-resolved? --> -<!-- - Why do I get "send lost connection" messages in my error --> -<!-- log? --> -<!-- - specifically consider .pdf files which seem to cause this --> -<!-- a lot when accessed via the plugin ... and also mention --> -<!-- how range-requests can cause bytes served < file size --> -<!-- - Why do directory indexes appear as garbage? (A: -lucb) --> -<!-- - How do I add a footer to all pages offered by my server? 
--> -<!-- - Fix midi question; a bigger problem than midi vs. x-midi is --> -<!-- the simple fact that older versions of Apache (and new ones --> -<!-- that have been upgraded without upgrading the mime.types --> -<!-- file) don't have the type listed at all. --> -<!-- - RewriteRule /~fraggle/* /cgi-bin/fraggle.pl does not work --> -<!-- - how do I disable authentication for a subdirectory? --> -<!-- (A: you can't but "Satisfy any; Allow from all" can be close --> -<!-- - '400 malformed request' on Win32 might mean stale proxy; see --> -<!-- PR #2300. --> -<!-- - how do I tell what version of Apache I am running? --> -<OL TYPE="A"> -<!--#include virtual="FAQ-A.html?TOC" --> -<!--#include virtual="FAQ-B.html?TOC" --> -<!--#include virtual="FAQ-C.html?TOC" --> -<!--#include virtual="FAQ-D.html?TOC" --> -<!--#include virtual="FAQ-E.html?TOC" --> -<!--#include virtual="FAQ-F.html?TOC" --> -<!--#include virtual="FAQ-G.html?TOC" --> -<!--#include virtual="FAQ-H.html?TOC" --> -<!--#include virtual="FAQ-I.html?TOC" --> -</OL> - -<HR> - - <H2>The Answers</H2> -<!--#include virtual="FAQ-A.html?" --> -<!--#include virtual="FAQ-B.html?" --> -<!--#include virtual="FAQ-C.html?" --> -<!--#include virtual="FAQ-D.html?" --> -<!--#include virtual="FAQ-E.html?" --> -<!--#include virtual="FAQ-F.html?" --> -<!--#include virtual="FAQ-G.html?" --> -<!--#include virtual="FAQ-H.html?" --> -<!--#include virtual="FAQ-I.html?" 
--> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/client_block_api.html b/docs/manual/misc/client_block_api.html deleted file mode 100644 index f2b55132a6..0000000000 --- a/docs/manual/misc/client_block_api.html +++ /dev/null @@ -1,95 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Reading Client Input in Apache 1.2</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<blockquote><strong>Warning:</strong> -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. -</blockquote> - -<H1 ALIGN="CENTER">Reading Client Input in Apache 1.2</H1> - -<HR> - -<P>Apache 1.1 and earlier let modules handle POST and PUT requests by -themselves. The module would, on its own, determine whether the -request had an entity, how many bytes it was, and then called a -function (<CODE>read_client_block</CODE>) to get the data. - -<P>However, HTTP/1.1 requires several things of POST and PUT request -handlers that did not fit into this module, and all existing modules -have to be rewritten. The API calls for handling this have been -further abstracted, so that future HTTP protocol changes can be -accomplished while remaining backwards-compatible.</P> - -<HR> - -<H3>The New API Functions</H3> - -<PRE> - int ap_setup_client_block (request_rec *, int read_policy); - int ap_should_client_block (request_rec *); - long ap_get_client_block (request_rec *, char *buffer, int buffer_size); -</PRE> - -<OL> -<LI>Call <CODE>ap_setup_client_block()</CODE> near the beginning of the request - handler. This will set up all the necessary properties, and - will return either OK, or an error code. 
If the latter, - the module should return that error code. The second parameter - selects the policy to apply if the request message indicates a - body, and how a chunked - transfer-coding should be interpreted. Choose one of -<PRE> - REQUEST_NO_BODY Send 413 error if message has any body - REQUEST_CHUNKED_ERROR Send 411 error if body without Content-Length - REQUEST_CHUNKED_DECHUNK If chunked, remove the chunks for me. - REQUEST_CHUNKED_PASS Pass the chunks to me without removal. -</PRE> - In order to use the last two options, the caller MUST provide a buffer - large enough to hold a chunk-size line, including any extensions. - - - -<LI>When you are ready to possibly accept input, call - <CODE>ap_should_client_block()</CODE>. - This will tell the module whether or not to read input. If it is 0, - the module should assume that the input is of a non-entity type - (<EM>e.g.</EM>, a GET request). A nonzero response indicates that the module - should proceed (to step 3). - This step also sends a 100 Continue response - to HTTP/1.1 clients, so should not be called until the module - is <STRONG>*definitely*</STRONG> ready to read content. (otherwise, the - point of the - 100 response is defeated). Never call this function more than once. - -<LI>Finally, call <CODE>ap_get_client_block</CODE> in a loop. Pass it a - buffer and its - size. It will put data into the buffer (not necessarily the full - buffer, in the case of chunked inputs), and return the length of - the input block. When it is done reading, it will - return 0 if EOF, or -1 if there was an error. - -</OL> - -<P>As an example, please look at the code in -<CODE>mod_cgi.c</CODE>. 
This is properly written to the new API -guidelines.</P> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/compat_notes.html b/docs/manual/misc/compat_notes.html deleted file mode 100644 index 438d90ecec..0000000000 --- a/docs/manual/misc/compat_notes.html +++ /dev/null @@ -1,135 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" - "http://www.w3.org/TR/REC-html40/loose.dtd"> -<HTML> -<HEAD> -<TITLE>Apache HTTP Server: Notes about Compatibility with NCSA's Server</TITLE> -</HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<blockquote><strong>Warning:</strong> -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. -</blockquote> - - -<H1 ALIGN="CENTER">Compatibility Notes with NCSA's Server</H1> - -<HR> - -While Apache is for the most part a drop-in replacement for NCSA's -httpd, there are a couple of gotchas to watch out for. These are mostly -due to the fact that the parser for config and access control files -was rewritten from scratch, so certain liberties the earlier servers -took may not be available here. These are all easily fixable. If you -know of other problems that belong here, <A -HREF="http://www.apache.org/bug_report.html">let us know.</A> - -<P>Please also check the <A HREF="known_client_problems.html">known -client problems</A> page. - -<OL> -<LI>As of Apache 1.3.1, methods named in a - <A HREF="../mod/core.html#limit"><SAMP><Limit></SAMP></A> - section <EM>must</EM> be listed in upper-case. Lower- or mixed-case - method names will result in a configuration error.
- <P> - </P> -</LI> - -<LI>The basic mod_auth <CODE>AuthGroupFile</CODE>-specified group file - format allows commas between user names - Apache does not. - -<P> -<LI>If you follow the NCSA guidelines for setting up access - restrictions based on client domain, you may well have added - entries for <CODE>AuthType, AuthName, AuthUserFile</CODE> or - <CODE>AuthGroupFile</CODE>. <STRONG>None</STRONG> of these are - needed (or appropriate) for restricting access based on client - domain. When Apache sees <CODE>AuthType</CODE> it (reasonably) - assumes you are using some authorization type based on username - and password. Please remove <CODE>AuthType</CODE>, it's - unnecessary even for NCSA. - -<P> -<LI><CODE>OldScriptAlias</CODE> is no longer supported. - -<P> -<LI><CODE>exec cgi=""</CODE> produces reasonable <STRONG>malformed - header</STRONG> responses when used to invoke non-CGI scripts.<BR> - The NCSA code ignores the missing header (bad idea). - <BR>Solution: write CGI to the CGI spec and use - <CODE>include virtual</CODE>, or use <CODE>exec cmd=""</CODE> instead. - -<P> -<LI>Icons for FancyIndexing broken - well, no, they're not broken, - we've just upgraded the icons from flat .xbm files to pretty and - much smaller .gif files, courtesy of <A - HREF="mailto:kevinh@eit.com">Kevin Hughes</A> at <A - HREF="http://www.eit.com/">EIT</A>. If you are using the same - srm.conf from an old distribution, make sure you add the new <A - HREF="../mod/mod_autoindex.html#addicon">AddIcon</A>, <A - HREF="../mod/mod_autoindex.html#addiconbytype">AddIconByType</A>, - and <A - HREF="../mod/mod_autoindex.html#defaulticon">DefaultIcon</A> - directives. - -<P> -<LI>Apache versions before 1.2b1 will ignore the last line of configuration - files if the last line does not have a trailing newline. This affects - configuration files (httpd.conf, access.conf and srm.conf), and - htpasswd and htgroup files. - -<P> -<LI>Apache does not permit commas delimiting the methods in <Limit>. 
- -<P> -<LI>Apache's <CODE><VirtualHost></CODE> treats all addresses as - "optional" (<EM>i.e.</EM>, the server should continue booting if it can't - resolve the address). Whereas in NCSA the default is to fail - booting unless an added <CODE>optional</CODE> keyword is included. - -<P> -<LI>Apache does not implement <CODE>OnDeny</CODE>; use - <A HREF="../mod/core.html#errordocument"><CODE>ErrorDocument</CODE></A> - instead. - -<P> -<LI>Apache (as of 1.3) always performs the equivalent of - <CODE>HostnameLookups minimal</CODE>. <CODE>minimal</CODE> is not an - option to <A HREF="../mod/core.html#hostnamelookups"><CODE> - HostnameLookups</CODE></A>. - -<P> -<LI>To embed spaces in directive arguments NCSA used a backslash - before the space. Apache treats backslashes as normal characters. To - embed spaces surround the argument with double-quotes instead. - -<P> -<LI>Apache does not implement the NCSA <CODE>referer</CODE> - directive. See <A HREF="http://bugs.apache.org/index/full/968"> - PR#968</A> for a few brief suggestions on alternative ways to - implement the same thing under Apache. - -<P> -<LI>Apache does not allow ServerRoot settings inside a VirtualHost - container. There is only one global ServerRoot in Apache; any desired - changes in paths for virtual hosts need to be made with the explicit - directives, <em>e.g.</em>, DocumentRoot, TransferLog, <EM>etc.</EM> - -</OL> - -More to come when we notice them.... 
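-
-<P>
-As a rough illustration of several of the notes above, a configuration
-fragment written the Apache way might look like the following. This is
-only a sketch: the directory, the host name <CODE>.example.com</CODE>
-and the document paths are hypothetical placeholders, and the access
-directives assume the standard host-based access module is compiled in.
-</P>
-<PRE>
-  # Host-based restriction: no AuthType, AuthName, AuthUserFile or
-  # AuthGroupFile is needed.  Methods in &lt;Limit&gt; are upper-case
-  # and separated by spaces, not commas.
-  &lt;Directory /usr/local/apache/htdocs/private&gt;
-      &lt;Limit GET POST&gt;
-          order deny,allow
-          deny from all
-          allow from .example.com
-      &lt;/Limit&gt;
-  &lt;/Directory&gt;
-
-  # Spaces in an argument: use double quotes, not a backslash.
-  DocumentRoot "/usr/local/apache/my docs"
-
-  # ErrorDocument takes the place of NCSA's OnDeny.
-  ErrorDocument 403 /errors/denied.html
-</PRE>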
- -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/custom_errordocs.html b/docs/manual/misc/custom_errordocs.html deleted file mode 100644 index c2c2310a13..0000000000 --- a/docs/manual/misc/custom_errordocs.html +++ /dev/null @@ -1,433 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>International Customized Server Error Messages</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<H1 ALIGN="CENTER">Using XSSI and <SAMP>ErrorDocument</SAMP> to configure -customized international server error responses</H1> -<P> -<H2>Index</H2> -<UL> - <LI><A HREF="#intro">Introduction</A> - <LI><A HREF="#createdir">Creating an ErrorDocument directory</A> - <LI><A HREF="#docnames">Naming the individual error document files</A> - <LI><A HREF="#headfoot">The common header and footer files</A> - <LI><A HREF="#createdocs">Creating ErrorDocuments in different languages</A> - <LI><A HREF="#fallback">The fallback language</A> - <LI><A HREF="#proxy">Customizing Proxy Error Messages</A> - <LI><A HREF="#listings">HTML listing of the discussed example</A> -</UL> -<HR> -<H2><A NAME="intro">Introduction</A></H2> -This document describes an easy way to provide your apache WWW server -with a set of customized error messages which take advantage of -<A HREF="../content-negotiation.html">Content Negotiation</A> -and <A HREF="../mod/mod_include.html">eXtended Server Side Includes (XSSI)</A> -to return error messages generated by the server in the client's -native language. 
-</P> -<P> -By using XSSI, all -<A HREF="../mod/core.html#errordocument">customized messages</A> -can share a homogeneous and consistent style and layout, and maintenance work -(changing images, changing links) is kept to a minimum because all layout -information can be kept in a single file.<BR> -Error documents can be shared across different servers, or even hosts, -because all varying information is inserted at the time the error document -is returned on behalf of a failed request. -</P> -<P> -Content Negotiation then selects the appropriate language version of a -particular error message text, honoring the language preferences passed -in the client's request. (Users usually select their favorite languages -in the preferences options menu of today's browsers). When an error -document in the client's primary language version is unavailable, the -secondary languages are tried or a default (fallback) version is used. -</P> -<P> -You have full flexibility in designing your error documents to -your personal taste (or your company's conventions). For demonstration -purposes, we present a simple generic error document scheme. -For this hypothetical server, we assume that all error messages... -<UL> -<LI>possibly are served by different virtual hosts (different host name, - different IP address, or different port) on the server machine, -<LI>show a predefined company logo in the top right of the message - (selectable by virtual host), -<LI>print the error title first, followed by an explanatory text and - (depending on the error context) help on how to resolve the error, -<LI>have some kind of standardized background image, -<LI>display an Apache logo and a feedback email address at the bottom - of the error message. 
-</UL> -</P> - -<P> -An example of a "document not found" message for a german client might -look like this:<BR> -<IMG SRC="../images/custom_errordocs.gif" - ALT="[Needs graphics capability to display]"><BR> -All links in the document as well as links to the server's administrator -mail address, and even the name and port of the serving virtual host -are inserted in the error document at "run-time", <EM>i.e.</EM>, when the error -actually occurs. -</P> - -<H2><A NAME="createdir">Creating an ErrorDocument directory</A></H2> - -For this concept to work as easily as possible, we must take advantage -of as much server support as we can get: -<OL> - <LI>By defining the <A HREF="../mod/core.html#options">MultiViews option</A>, - we enable the language selection of the most appropriate language - alternative (content negotiation). - <LI>By setting the - <A HREF="../mod/mod_negotiation.html#languagepriority" - >LanguagePriority</A> - directive we define a set of default fallback languages in the situation - where the client's browser did not express any preference at all. - <LI>By enabling <A HREF="../mod/mod_include.html">Server Side Includes</A> - (and disallowing execution of cgi scripts for security reasons), - we allow the server to include building blocks of the error message, - and to substitute the value of certain environment variables into the - generated document (dynamic HTML) or even to conditionally include - or omit parts of the text. - <LI>The <A HREF="../mod/mod_mime.html#addhandler">AddHandler</A> and - <A HREF="../mod/mod_mime.html#addtype">AddType</A> directives are useful - for automatically XSSI-expanding all files with a <SAMP>.shtml</SAMP> - suffix to <EM>text/html</EM>. - <LI>By using the <A HREF="../mod/mod_alias.html#alias">Alias</A> directive, - we keep the error document directory outside of the document tree - because it can be regarded more as a server part than part of - the document tree. 
- <LI>The <A HREF="../mod/core.html#directory"><Directory></A>-Block - restricts these "special" settings to the error document directory - and avoids an impact on any of the settings for the regular document tree. - <LI>For each of the error codes to be handled (see RFC2068 for an exact - description of each error code, or look at - <CODE>src/main/http_protocol.c</CODE> - if you wish to see apache's standard messages), an - <A HREF="../mod/core.html#errordocument">ErrorDocument</A> - in the aliased <SAMP>/errordocs</SAMP> directory is defined. - Note that we only define the basename of the document here - because the MultiViews option will select the best candidate - based on the language suffixes and the client's preferences. - Any error situation with an error code <EM>not</EM> handled by a - custom document will be dealt with by the server in the standard way - (<EM>i.e.</EM>, a plain error message in english). - <LI>Finally, the <A HREF="../mod/core.html#allowoverride">AllowOverride</A> - directive tells apache that it is not necessary to look for - a .htaccess file in the /errordocs directory: a minor speed - optimization. -</OL> -The resulting <SAMP>httpd.conf</SAMP> configuration would then look -similar to this: <SMALL>(Note that you can define your own error -messages using this method for only part of the document tree, -e.g., a /~user/ subtree. 
In this case, the configuration could as well -be put into the .htaccess file at the root of the subtree, and -the <Directory> and </Directory> directives -but not -the contained directives- must be omitted.)</SMALL> -<PRE> - LanguagePriority en fr de - Alias /errordocs /usr/local/apache/errordocs - <Directory /usr/local/apache/errordocs> - AllowOverride none - Options MultiViews IncludesNoExec FollowSymLinks - AddType text/html .shtml - <FilesMatch "\.shtml[.$]"> - SetOutputFilter INCLUDES - </FilesMatch> - </Directory> - # "400 Bad Request", - ErrorDocument 400 /errordocs/400 - # "401 Authorization Required", - ErrorDocument 401 /errordocs/401 - # "403 Forbidden", - ErrorDocument 403 /errordocs/403 - # "404 Not Found", - ErrorDocument 404 /errordocs/404 - # "500 Internal Server Error", - ErrorDocument 500 /errordocs/500 -</PRE> -The directory for the error messages (here: -<SAMP>/usr/local/apache/errordocs/</SAMP>) must then be created with the -appropriate permissions (readable and executable by the server uid or gid, -only writable for the administrator). - -<H3><A NAME="docnames">Naming the individual error document files</A></H3> - -By defining the <SAMP>MultiViews</SAMP> option, the server was told to -automatically scan the directory for matching variants (looking at language -and content type suffixes) when a requested document was not found. -In the configuration, we defined the names for the error documents to be -just their error number (without any suffix). -<P> -The names of the individual error documents are now determined like this -(I'm using 403 as an example, think of it as a placeholder for any of -the configured error documents): -<UL> - <LI>No file errordocs/403 should exist. Otherwise, it would be found and - served (with the DefaultType, usually text/plain), all negotiation - would be bypassed. 
- <LI>For each language for which we have an internationalized version - (note that this need not be the same set of languages for each - error code - you can get by with a single language version until - you actually <EM>have</EM> translated versions), a document - <SAMP>errordocs/403.shtml.<EM>lang</EM></SAMP> is created and - filled with the error text in that language (<A HREF="#createdocs">see - below</A>). - <LI>One fallback document called <SAMP>errordocs/403.shtml</SAMP> is - created, usually by creating a symlink to the default language - variant (<A HREF="#fallback">see below</A>). -</UL> - -<H3><A NAME="headfoot">The common header and footer files</A></H3> - -By putting as much layout information in two special "include files", -the error documents can be reduced to a bare minimum. -<P> -One of these layout files defines the HTML document header -and a configurable list of paths to the icons to be shown in the resulting -error document. These paths are exported as a set of XSSI environment -variables and are later evaluated by the "footer" special file. -The title of the current error (which is -put into the TITLE tag and an H1 header) is simply passed in from the main -error document in a variable called <CODE>title</CODE>.<BR> -<STRONG>By changing this file, the layout of all generated error -messages can be changed in a second.</STRONG> -(By exploiting the features of XSSI, you can easily define different -layouts based on the current virtual host, or even based on the -client's domain name). -<P> -The second layout file describes the footer to be displayed at the bottom -of every error message. In this example, it shows an apache logo, the current -server time, the server version string and adds a mail reference to the -site's webmaster. -<P> -For simplicity, the header file is simply called <CODE>head.shtml</CODE> -because it contains server-parsed content but no language specific -information. 
The footer file exists once for each language translation, -plus a symlink for the default language.<P> -<STRONG>Example:</STRONG> for English, French and German versions -(default english)<BR> -<CODE>foot.shtml.en</CODE>,<BR> -<CODE>foot.shtml.fr</CODE>,<BR> -<CODE>foot.shtml.de</CODE>,<BR> -<CODE>foot.shtml</CODE> symlink to <CODE>foot.shtml.en</CODE><P> -Both files are included into the error document by using the -directives <CODE><!--#include virtual="head" --></CODE> -and <CODE><!--#include virtual="foot" --></CODE> -respectively: the rest of the magic occurs in mod_negotiation and -in mod_include. -<P> - -See <A HREF="#listings">the listings below</A> to see an actual HTML -implementation of the discussed example. - - -<H3><A NAME="createdocs">Creating ErrorDocuments in different languages</A> -</H3> - -After all this preparation work, little remains to be said about the -actual documents. They all share a simple common structure: -<PRE> -<!--#set var="title" value="<EM>error description title</EM>" --> -<!--#include virtual="head" --> - <EM>explanatory error text</EM> -<!--#include virtual="foot" --> -</PRE> -In the <A HREF="#listings">listings section</A>, you can see an example -of a [400 Bad Request] error document. Documents as simple as that -certainly cause no problems to translate or expand. - -<H3><A NAME="fallback">The fallback language</A></H3> - -Do we need a special handling for languages other than those we have -translations for? We did set the LanguagePriority, didn't we?! -<P> -Well, the LanguagePriority directive is for the case where the client does -not express any language priority at all. But what -happens in the situation where the client wants one -of the languages we do not have, and none of those we do have? -<P> -Without doing anything, the Apache server will usually return a -[406 no acceptable variant] error, listing the choices from which the client -may select. 
But we're in an error message already, and important error -information might get lost when the client had to choose a language -representation first. -<P> -So, in this situation it appears to be easier to define a fallback language -(by copying or linking, <EM>e.g.</EM>, the english version to a language-less version). -Because the negotiation algorithm prefers "more specialized" variants over -"more generic" variants, these generic alternatives will only be chosen -when the normal negotiation did not succeed. -<P> -A simple shell script to do it (execute within the errordocs/ dir): -<PRE> - for f in *.shtml.en - do - ln -s $f `basename $f .en` - done -</PRE> - -<P> -</P> - -<H2><A NAME="proxy">Customizing Proxy Error Messages</A></H2> - -<P> - As of Apache-1.3, it is possible to use the <CODE>ErrorDocument</CODE> - mechanism for proxy error messages as well (previous versions always - returned fixed predefined error messages). -</P> -<P> - Most proxy errors return an error code of [500 Internal Server Error]. - To find out whether a particular error document was invoked on behalf - of a proxy error or because of some other server error, and what the reason - for the failure was, you can check the contents of the new - <CODE>ERROR_NOTES</CODE> CGI environment variable: - if invoked for a proxy error, this variable will contain the actual proxy - error message text in HTML form. -</P> -<P> - The following excerpt demonstrates how to exploit the <CODE>ERROR_NOTES</CODE> - variable within an error document: -</P> -<PRE> - <!--#if expr="$REDIRECT_ERROR_NOTES = ''" --> - <p> - The server encountered an unexpected condition - which prevented it from fulfilling the request. 
- </p> - <p> - <A HREF="mailto:<!--#echo var="SERVER_ADMIN" -->" - SUBJECT="Error message [<!--#echo var="REDIRECT_STATUS" -->] <!--#echo var="title" --> for <!--#echo var="REQUEST_URI" -->"> - Please forward this error screen to <!--#echo var="SERVER_NAME" -->'s - WebMaster</A>; it includes useful debugging information about - the Request which caused the error. - <pre><!--#printenv --></pre> - </p> - <!--#else --> - <!--#echo var="REDIRECT_ERROR_NOTES" --> - <!--#endif --> -</PRE> - -<H2><A NAME="listings">HTML listing of the discussed example</A></H2> - -So, to summarize our example, here's the complete listing of the -<SAMP>400.shtml.en</SAMP> document. You will notice that it contains -almost nothing but the error text (with conditional additions). -Starting with this example, you will find it easy to add more error -documents, or to translate the error documents to different languages. -<HR><PRE> -<!--#set var="title" value="Bad Request" ---><!--#include virtual="head" --><P> - Your browser sent a request that this server could not understand: - <BLOCKQUOTE> - <STRONG><!--#echo var="REQUEST_URI" --></STRONG> - </BLOCKQUOTE> - The request could not be understood by the server due to malformed - syntax. The client should not repeat the request without - modifications. - </P> - <P> - <!--#if expr="$HTTP_REFERER != ''" --> - Please inform the owner of - <A HREF="<!--#echo var="HTTP_REFERER" -->">the referring page</A> about - the malformed link. - <!--#else --> - Please check your request for typing errors and retry. - <!--#endif --> - </P> -<!--#include virtual="foot" --> -</PRE><HR> - -Here is the complete <SAMP>head.shtml</SAMP> file (the funny line -breaks avoid empty lines in the document after XSSI processing). Note the -configuration section at top. That's where you configure the images and logos -as well as the apache documentation directory. 
Look how this file displays -two different logos depending on the content of the virtual host name -($SERVER_NAME), and that an animated apache logo is shown if the browser -appears to support it (the latter requires server configuration lines -of the form <BR><CODE>BrowserMatch "^Mozilla/[2-4]" anigif</CODE><BR> -for browser types which support animated GIFs). -<HR><PRE> -<!--#if expr="$SERVER_NAME = /.*\.mycompany\.com/" ---><!--#set var="IMG_CorpLogo" - value="http://$SERVER_NAME:$SERVER_PORT/errordocs/CorpLogo.gif" ---><!--#set var="ALT_CorpLogo" value="Powered by Linux!" ---><!--#else ---><!--#set var="IMG_CorpLogo" - value="http://$SERVER_NAME:$SERVER_PORT/errordocs/PrivLogo.gif" ---><!--#set var="ALT_CorpLogo" value="Powered by Linux!" ---><!--#endif ---><!--#set var="IMG_BgImage" value="http://$SERVER_NAME:$SERVER_PORT/errordocs/BgImage.gif" ---><!--#set var="DOC_Apache" value="http://$SERVER_NAME:$SERVER_PORT/Apache/" ---><!--#if expr="$anigif" ---><!--#set var="IMG_Apache" value="http://$SERVER_NAME:$SERVER_PORT/icons/apache_anim.gif" ---><!--#else ---><!--#set var="IMG_Apache" value="http://$SERVER_NAME:$SERVER_PORT/icons/apache_pb.gif" ---><!--#endif ---><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<HTML> - <HEAD> - <TITLE> - [<!--#echo var="REDIRECT_STATUS" -->] <!--#echo var="title" --> - </TITLE> - </HEAD> - <BODY BGCOLOR="white" BACKGROUND="<!--#echo var="IMG_BgImage" -->"><UL> - <H1 ALIGN="center"> - [<!--#echo var="REDIRECT_STATUS" -->] <!--#echo var="title" --> - <IMG SRC="<!--#echo var="IMG_CorpLogo" -->" - ALT="<!--#echo var="ALT_CorpLogo" -->" ALIGN=right> - </H1> - <HR><!-- ======================================================== --> - <DIV> -</PRE><HR> - and this is the <SAMP>foot.shtml.en</SAMP> file: -<HR><PRE> - - </DIV> - <HR> - <DIV ALIGN="right"><SMALL><SUP>Local Server time: - <!--#echo var="DATE_LOCAL" --> - </SUP></SMALL></DIV> - <DIV ALIGN="center"> - <A HREF="<!--#echo var="DOC_Apache" -->"> - <IMG SRC="<!--#echo 
var="IMG_Apache" -->" BORDER=0 ALIGN="bottom" - ALT="Powered by <!--#echo var="SERVER_SOFTWARE" -->"></A><BR> - <SMALL><SUP><!--#set var="var" - value="Powered by $SERVER_SOFTWARE -- File last modified on $LAST_MODIFIED" - --><!--#echo var="var" --></SUP></SMALL> - </DIV> - <ADDRESS>If the indicated error looks like a misconfiguration, please inform - <A HREF="mailto:<!--#echo var="SERVER_ADMIN" -->" - SUBJECT="Feedback about Error message [<!--#echo var="REDIRECT_STATUS" - -->] <!--#echo var="title" -->, req=<!--#echo var="REQUEST_URI" -->"> - <!--#echo var="SERVER_NAME" -->'s WebMaster</A>. - </ADDRESS> - </UL></BODY> -</HTML> -</PRE><HR> - - -<H3>More welcome!</H3> - -If you have tips to contribute, send mail to <A -HREF="mailto:martin@apache.org">martin@apache.org</A> - - <!--#include virtual="footer.html" --> -</BODY> -</HTML> - diff --git a/docs/manual/misc/descriptors.html b/docs/manual/misc/descriptors.html deleted file mode 100644 index 6a68c02251..0000000000 --- a/docs/manual/misc/descriptors.html +++ /dev/null @@ -1,182 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Descriptors and Apache</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> -<H1 ALIGN="CENTER">Descriptors and Apache</H1> - -<P>A <EM>descriptor</EM>, also commonly called a <EM>file handle</EM> is -an object that a program uses to read or write an open file, or open -network socket, or a variety of other devices. It is represented -by an integer, and you may be familiar with <CODE>stdin</CODE>, -<CODE>stdout</CODE>, and <CODE>stderr</CODE> which are descriptors 0, -1, and 2 respectively. -Apache needs a descriptor for each log file, plus one for each -network socket that it listens on, plus a handful of others. 
Libraries -that Apache uses may also require descriptors. Normal programs don't -open up many descriptors at all, and so there are some latent problems -that you may experience should you start running Apache with many -descriptors (<EM>i.e.</EM>, with many virtual hosts). - -<P>The operating system enforces a limit on the number of descriptors -that a program can have open at a time. There are typically three limits -involved here. One is a kernel limitation, depending on your operating -system you will either be able to tune the number of descriptors available -to higher numbers (this is frequently called <EM>FD_SETSIZE</EM>). Or you -may be stuck with a (relatively) low amount. The second limit is called -the <EM>hard resource</EM> limit, and it is sometimes set by root in an -obscure operating system file, but frequently is the same as the kernel -limit. The third limit is called the <EM>soft -resource</EM> limit. The soft limit is always less than or equal to -the hard limit. For example, the hard limit may be 1024, but the soft -limit only 64. Any user can raise their soft limit up to the hard limit. -Root can raise the hard limit up to the system maximum limit. The soft -limit is the actual limit that is used when enforcing the maximum number -of files a process can have open. - -<P>To summarize: - -<CENTER><PRE> - #open files <= soft limit <= hard limit <= kernel limit -</PRE></CENTER> - -<P>You control the hard and soft limits using the <CODE>limit</CODE> (csh) -or <CODE>ulimit</CODE> (sh) directives. See the respective man pages -for more information. For example you can probably use -<CODE>ulimit -n unlimited</CODE> to raise your soft limit up to the -hard limit. You should include this command in a shell script which -starts your webserver. - -<P>Unfortunately, it's not always this simple. As mentioned above, -you will probably run into some system limitations that will need to be -worked around somehow. 
Work was done in version 1.2.1 to improve the -situation somewhat. Here is a partial list of systems and workarounds -(assuming you are using 1.2.1 or later): - -<DL> - - <DT><STRONG>BSDI 2.0</STRONG> - <DD>Under BSDI 2.0 you can build Apache to support more descriptors - by adding <CODE>-DFD_SETSIZE=nnn</CODE> to - <CODE>EXTRA_CFLAGS</CODE> (where nnn is the number of descriptors - you wish to support, keep it less than the hard limit). But it - will run into trouble if more than approximately 240 Listen - directives are used. This may be cured by rebuilding your kernel - with a higher FD_SETSIZE. - <P> - - <DT><STRONG>FreeBSD 2.2, BSDI 2.1+</STRONG> - <DD>Similar to the BSDI 2.0 case, you should define - <CODE>FD_SETSIZE</CODE> and rebuild. But the extra - Listen limitation doesn't exist. - <P> - - <DT><STRONG>Linux</STRONG> - <DD>By default Linux has a kernel maximum of 256 open descriptors - per process. There are several patches available for the - 2.0.x series which raise this to 1024 and beyond, and you - can find them in the "unofficial patches" section of <A - HREF="http://www.linuxhq.com/">the Linux Information HQ</A>. - None of these patches are perfect, and an entirely different - approach is likely to be taken during the 2.1.x development. - Applying these patches will raise the FD_SETSIZE used to compile - all programs, and unless you rebuild all your libraries you should - avoid running any other program with a soft descriptor limit above - 256. As of this writing the patches available for increasing - the number of descriptors do not take this into account. On a - dedicated webserver you probably won't run into trouble. - <P> - - <DT><STRONG>Solaris through 2.5.1</STRONG> - <DD>Solaris has a kernel hard limit of 1024 (may be lower in earlier - versions). But it has a limitation that files using - the stdio library cannot have a descriptor above 255. - Apache uses the stdio library for the ErrorLog directive. 
- When you have more than approximately 110 virtual hosts - (with an error log and an access log each) you will need to - build Apache with <CODE>-DHIGH_SLACK_LINE=256</CODE> added to - <CODE>EXTRA_CFLAGS</CODE>. You will be limited to approximately - 240 error logs if you do this. - <P> - - <DT><STRONG>AIX</STRONG> - <DD>AIX version 3.2?? appears to have a hard limit of 128 descriptors. - End of story. Version 4.1.5 has a hard limit of 2000. - <P> - - <DT><STRONG>SCO OpenServer</STRONG> - <DD>Edit the - <CODE>/etc/conf/cf.d/stune</CODE> file or use - <CODE>/etc/conf/cf.d/configure</CODE> choice 7 - (User and Group configuration) and modify the <CODE>NOFILES</CODE> kernel - parameter to a suitably higher value. SCO recommends a number - between 60 and 11000, the default is 110. Relink and reboot, - and the new number of descriptors will be available. - - <P> - - <DT><STRONG>Compaq Tru64 UNIX/Digital UNIX/OSF</STRONG> - <DD><OL> - <LI>Raise <code>open_max_soft</code> and <code>open_max_hard</code> - to 4096 in the proc subsystem. - Do a man on sysconfig, sysconfigdb, and sysconfigtab. - <LI>Raise <code>max-vnodes</code> to a large number which is greater - than the number of apache processes * 4096 - (Setting it to 250,000 should be good for most people). - Do a man on sysconfig, sysconfigdb, and sysconfigtab. - <LI>If you are using Tru64 5.0, 5.0A, or 5.1, define - <code>NO_SLACK</code> to work around a bug in the OS. - <code>CFLAGS="-DNO_SLACK" ./configure</code> - </OL> - - <P> - - <DT><STRONG>Others</STRONG> - <DD>If you have details on another operating system, please submit - it through our <A HREF="http://www.apache.org/bug_report.html">Bug - Report Page</A>. - <P> - -</DL> - -<P>In addition to the problems described above there are problems with -many libraries that Apache uses. The most common example is the bind -DNS resolver library that is used by pretty much every unix, which -fails if it ends up with a descriptor above 256. 
We suspect there -are other libraries that have similar limitations. So the code as of 1.2.1 -takes a defensive stance and tries to save descriptors less than 16 -for use while processing each request. This is called the <EM>low -slack line</EM>. - -<P>Note that this shouldn't waste descriptors. If you really are pushing -the limits and Apache can't get a descriptor above 16 when it wants -it, it will settle for one below 16. - -<P>In extreme situations you may want to lower the low slack line, -but you shouldn't ever need to. For example, lowering it can -raise the limits of approximately 240 described above under Solaris and BSDI 2.0. -But you'll play a delicate balancing game with the descriptors needed -to serve a request. Should you want to play this game, the compile -time parameter is <CODE>LOW_SLACK_LINE</CODE> and there's a tiny -bit of documentation in the header file <CODE>httpd.h</CODE>. - -<P>Finally, if you suspect that all this slack stuff is causing you -problems, you can disable it. Add <CODE>-DNO_SLACK</CODE> to -<CODE>EXTRA_CFLAGS</CODE> and rebuild. But please report it to -our <A HREF="http://www.apache.org/bug_report.html">Bug -Report Page</A> so that -we can investigate. 
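As a rough illustration of the counting rule given earlier (one descriptor per log file, one per listening socket, plus a handful of others), the following shell sketch estimates how many descriptors a configuration needs and compares that with the soft limit of the shell that starts the server. The function names are hypothetical, the allowance of 16 for the "handful of others" is an assumption chosen only for illustration, and the sketch assumes a shell whose <CODE>ulimit</CODE> builtin accepts <CODE>-n</CODE> (as bash, dash and ksh do).

```shell
#!/bin/sh
# Rough estimate of the descriptors Apache holds open at startup:
# one per log file, one per listening socket, plus "a handful of others".
# The value 16 for the handful is an assumption for illustration only.
estimate_descriptors() {
    vhosts=$1          # number of virtual hosts
    logs_per_vhost=$2  # e.g. 2 for an error log and an access log each
    listeners=$3       # number of listening sockets
    handful=16         # assumed allowance for everything else
    echo $(( $vhosts * $logs_per_vhost + $listeners + $handful ))
}

# Compare the estimate with the soft limit of the current shell.
check_descriptors() {
    needed=$(estimate_descriptors "$1" "$2" "$3")
    soft=$(ulimit -n)   # reports the soft limit; may print "unlimited"
    if [ "$soft" = "unlimited" ] || [ "$needed" -le "$soft" ]; then
        echo "ok: need about $needed, soft limit is $soft"
    else
        echo "too low: need about $needed, soft limit is $soft"
    fi
}
```

For example, <CODE>check_descriptors 110 2 1</CODE> estimates the case of 110 virtual hosts with an error log and an access log each and a single listening socket. The result is only an approximation, since libraries and per-request processing need descriptors as well.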
- -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/fin_wait_2.html b/docs/manual/misc/fin_wait_2.html deleted file mode 100644 index 1ec709153f..0000000000 --- a/docs/manual/misc/fin_wait_2.html +++ /dev/null @@ -1,324 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Connections in FIN_WAIT_2 and Apache</TITLE> -<LINK REV="made" HREF="mailto:marc@apache.org"> - -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<H1 ALIGN="CENTER">Connections in the FIN_WAIT_2 state and Apache</H1> -<OL> -<LI><H2>What is the FIN_WAIT_2 state?</H2> -Starting with the Apache 1.2 betas, people are reporting many more -connections in the FIN_WAIT_2 state (as reported by -<CODE>netstat</CODE>) than they saw using older versions. When the -server closes a TCP connection, it sends a packet with the FIN bit -set to the client, which then responds with a packet with the ACK bit -set. The client then sends a packet with the FIN bit set to the -server, which responds with an ACK and the connection is closed. The -state that the connection is in during the period between when the -server gets the ACK from the client and the server gets the FIN from -the client is known as FIN_WAIT_2. See the <A -HREF="ftp://ds.internic.net/rfc/rfc793.txt">TCP RFC</A> for the -technical details of the state transitions.<P> - -The FIN_WAIT_2 state is somewhat unusual in that there is no timeout -defined in the standard for it. This means that on many operating -systems, a connection in the FIN_WAIT_2 state will stay around until -the system is rebooted. If the system does not have a timeout and -too many FIN_WAIT_2 connections build up, it can fill up the space -allocated for storing information about the connections and crash -the kernel. 
The connections in FIN_WAIT_2 do not tie up an httpd -process.<P> - -<LI><H2>But why does it happen?</H2> - -There are numerous reasons for it happening, some of them may not -yet be fully clear. What is known follows.<P> - -<H3>Buggy clients and persistent connections</H3> - -Several clients have a bug which pops up when dealing with -<A HREF="../keepalive.html">persistent connections</A> (aka keepalives). -When the connection is idle and the server closes the connection -(based on the <A HREF="../mod/core.html#keepalivetimeout"> -KeepAliveTimeout</A>), the client is programmed so that the client does -not send back a FIN and ACK to the server. This means that the -connection stays in the FIN_WAIT_2 state until one of the following -happens:<P> -<UL> - <LI>The client opens a new connection to the same or a different - site, which causes it to fully close the older connection on - that socket. - <LI>The user exits the client, which on some (most?) clients - causes the OS to fully shutdown the connection. - <LI>The FIN_WAIT_2 times out, on servers that have a timeout - for this state. -</UL><P> -If you are lucky, this means that the buggy client will fully close the -connection and release the resources on your server. However, there -are some cases where the socket is never fully closed, such as a dialup -client disconnecting from their provider before closing the client. -In addition, a client might sit idle for days without making another -connection, and thus may hold its end of the socket open for days -even though it has no further use for it. 
-<STRONG>This is a bug in the browser or in its operating system's -TCP implementation.</STRONG> <P> - -The clients on which this problem has been verified to exist:<P> -<UL> - <LI>Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) - <LI>Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386) - <LI>Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m) - <LI>MSIE 3.01 on the Macintosh - <LI>MSIE 3.01 on Windows 95 -</UL><P> - -This does not appear to be a problem on: -<UL> - <LI>Mozilla/3.01 (Win95; I) -</UL> -<P> - -It is expected that many other clients have the same problem. What a -client <STRONG>should do</STRONG> is periodically check its open -socket(s) to see if they have been closed by the server, and close its -side of the connection if the server has closed. This check need only -occur once every few seconds, and may even be detected by an OS signal -on some systems (<EM>e.g.</EM>, Win95 and NT clients have this capability, but -they seem to be ignoring it).<P> - -Apache <STRONG>cannot</STRONG> avoid these FIN_WAIT_2 states unless it -disables persistent connections for the buggy clients, just -like we recommend doing for Navigator 2.x clients due to other bugs. -However, non-persistent connections increase the total number of -connections needed per client and slow retrieval of an image-laden -web page. Since non-persistent connections have their own resource -costs and a short waiting period after each closure, a busy server -may need persistence in order to best serve its clients.<P> - -As far as we know, the client-caused FIN_WAIT_2 problem is present for -all servers that support persistent connections, including Apache 1.1.x -and 1.2.<P> - -<H3>A necessary bit of code introduced in 1.2</H3> - -While the above bug is a problem, it is not the whole problem. -Some users have observed no FIN_WAIT_2 problems with Apache 1.1.x, -but with 1.2b enough connections build up in the FIN_WAIT_2 state to -crash their server. 
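To judge whether FIN_WAIT_2 connections are building up on a given server, the <CODE>netstat</CODE> output mentioned at the start of this document can simply be counted. The following is a minimal shell sketch, not part of Apache: the function name is hypothetical, and it assumes that <CODE>netstat</CODE> prints the state name on each connection line, spelled either FIN_WAIT_2 or FIN_WAIT2 depending on the system.

```shell
#!/bin/sh
# Count the lines on standard input that mention the FIN_WAIT_2 state.
# Some systems spell the state FIN_WAIT_2 and others FIN_WAIT2, so the
# pattern accepts both spellings.
count_fin_wait_2() {
    grep -c -E 'FIN_WAIT_?2'
}

# Typical use (the options accepted by netstat vary between systems):
#   netstat -an | count_fin_wait_2
```

Running this at intervals shows whether the number keeps growing, which is the symptom described above, or levels off because the system times the state out.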
- -The most likely source for additional FIN_WAIT_2 states -is a function called <CODE>lingering_close()</CODE> which was added -between 1.1 and 1.2. This function is necessary for the proper -handling of persistent connections and any request which includes -content in the message body (<EM>e.g.</EM>, PUTs and POSTs). -What it does is read any data sent by the client for -a certain time after the server closes the connection. The exact -reasons for doing this are somewhat complicated, but involve what -happens if the client is making a request at the same time the -server sends a response and closes the connection. Without lingering, -the client might be forced to reset its TCP input buffer before it -has a chance to read the server's response, and thus understand why -the connection has closed. -See the <A HREF="#appendix">appendix</A> for more details.<P> - -The code in <CODE>lingering_close()</CODE> appears to cause problems -for a number of factors, including the change in traffic patterns -that it causes. The code has been thoroughly reviewed and we are -not aware of any bugs in it. It is possible that there is some -problem in the BSD TCP stack, aside from the lack of a timeout -for the FIN_WAIT_2 state, exposed by the <CODE>lingering_close</CODE> -code that causes the observed problems.<P> - -<H2><LI>What can I do about it?</H2> - -There are several possible workarounds to the problem, some of -which work better than others.<P> - -<H3>Add a timeout for FIN_WAIT_2</H3> - -The obvious workaround is to simply have a timeout for the FIN_WAIT_2 state. -This is not specified by the RFC, and could be claimed to be a -violation of the RFC, but it is widely recognized as being necessary. -The following systems are known to have a timeout: -<P> -<UL> - <LI><A HREF="http://www.freebsd.org/">FreeBSD</A> versions starting at - 2.0 or possibly earlier. - <LI><A HREF="http://www.netbsd.org/">NetBSD</A> version 1.2(?) 
- <LI><A HREF="http://www.openbsd.org/">OpenBSD</A> all versions(?) - <LI><A HREF="http://www.bsdi.com/">BSD/OS</A> 2.1, with the - <A HREF="ftp://ftp.bsdi.com/bsdi/patches/patches-2.1/K210-027"> - K210-027</A> patch installed. - <LI><A HREF="http://www.sun.com/">Solaris</A> as of around version - 2.2. The timeout can be tuned by using <CODE>ndd</CODE> to - modify <CODE>tcp_fin_wait_2_flush_interval</CODE>, but the - default should be appropriate for most servers and improper - tuning can have negative impacts. - <LI><A HREF="http://www.linux.org/">Linux</A> 2.0.x and - earlier(?) - <LI><A HREF="http://www.hp.com/">HP-UX</A> 10.x defaults to - terminating connections in the FIN_WAIT_2 state after the - normal keepalive timeouts. This does not - refer to the persistent connection or HTTP keepalive - timeouts, but the <CODE>SO_LINGER</CODE> socket option - which is enabled by Apache. This parameter can be adjusted - by using <CODE>nettune</CODE> to modify parameters such as - <CODE>tcp_keepstart</CODE> and <CODE>tcp_keepstop</CODE>. - In later revisions, there is an explicit timer for - connections in FIN_WAIT_2 that can be modified; contact HP - support for details. - <LI><A HREF="http://www.sgi.com/">SGI IRIX</A> can be patched to - support a timeout. For IRIX 5.3, 6.2, and 6.3, - use patches 1654, 1703 and 1778 respectively. If you - have trouble locating these patches, please contact your - SGI support channel for help. - <LI><A HREF="http://www.ncr.com/">NCR's MP RAS Unix</A> 2.xx and - 3.xx both have FIN_WAIT_2 timeouts. In 2.xx it is non-tunable - at 600 seconds, while in 3.xx it defaults to 600 seconds and - is calculated based on the tunable "max keep alive probes" - (default of 8) multiplied by the "keep alive interval" (default - 75 seconds). - <LI><A HREF="http://www.sequent.com">Sequent's ptx/TCP/IP for - DYNIX/ptx</A> has had a FIN_WAIT_2 timeout since around - release 4.1 in mid-1994. 
-</UL> -<P> -The following systems are known to not have a timeout: -<P> -<UL> - <LI><A HREF="http://www.sun.com/">SunOS 4.x</A> does not and - almost certainly never will have one because it is at the - very end of its development cycle for Sun. If you have kernel - source, it should be easy to patch. -</UL> -<P> -There is a -<A HREF="http://www.apache.org/dist/httpd/contrib/patches/1.2/fin_wait_2.patch" ->patch available</A> for adding a timeout to the FIN_WAIT_2 state; it -was originally intended for BSD/OS, but should be adaptable to most -systems using BSD networking code. You need kernel source code to be -able to use it. If you do adapt it to work for any other systems, -please drop me a note at <A HREF="mailto:marc@apache.org">marc@apache.org</A>. -<P> -<H3>Compile without using <CODE>lingering_close()</CODE></H3> - -It is possible to compile Apache 1.2 without using the -<CODE>lingering_close()</CODE> function. This will result in that -section of code being similar to that which was in 1.1. If you do -this, be aware that it can cause problems with PUTs, POSTs and -persistent connections, especially if the client uses pipelining. -That said, it is no worse than on 1.1, and we understand that keeping your -server running is quite important.<P> - -To compile without the <CODE>lingering_close()</CODE> function, add -<CODE>-DNO_LINGCLOSE</CODE> to the end of the -<CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> file, -rerun <CODE>Configure</CODE> and rebuild the server. -<P> -<H3>Use <CODE>SO_LINGER</CODE> as an alternative to -<CODE>lingering_close()</CODE></H3> - -On most systems, there is an option called <CODE>SO_LINGER</CODE> that -can be set with <CODE>setsockopt(2)</CODE>. It does something very -similar to <CODE>lingering_close()</CODE>, except that it is broken -on many systems so that it causes far more problems than -<CODE>lingering_close</CODE>.
On some systems, it could possibly work -better so it may be worth a try if you have no other alternatives. <P> - -To try it, add <CODE>-DUSE_SO_LINGER -DNO_LINGCLOSE</CODE> to the end of the -<CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> -file, rerun <CODE>Configure</CODE> and rebuild the server. <P> - -<STRONG>NOTE:</STRONG> Attempting to use <CODE>SO_LINGER</CODE> and -<CODE>lingering_close()</CODE> at the same time is very likely to do -very bad things, so don't.<P> - -<H3>Increase the amount of memory used for storing connection state</H3> -<DL> -<DT>BSD based networking code: -<DD>BSD stores network data, such as connection states, -in something called an mbuf. When you get so many connections -that the kernel does not have enough mbufs to put them all in, your -kernel will likely crash. You can reduce the effects of the problem -by increasing the number of mbufs that are available; this will not -prevent the problem, it will just make the server go longer before -crashing.<P> - -The exact way to increase them may depend on your OS; look -for some reference to the number of "mbufs" or "mbuf clusters". On -many systems, this can be done by adding the line -<CODE>NMBCLUSTERS="n"</CODE>, where <CODE>n</CODE> is the number of -mbuf clusters you want to your kernel config file and rebuilding your -kernel.<P> -</DL> - -<H3>Disable KeepAlive</H3> -<P>If you are unable to do any of the above then you should, as a last -resort, disable KeepAlive. Edit your httpd.conf and change "KeepAlive On" -to "KeepAlive Off". - -<H2><LI>Feedback</H2> - -If you have any information to add to this page, please contact me at -<A HREF="mailto:marc@apache.org">marc@apache.org</A>.<P> - -<H2><A NAME="appendix"><LI>Appendix</A></H2> -<P> -Below is a message from Roy Fielding, one of the authors of HTTP/1.1. 
- -<H3>Why the lingering close functionality is necessary with HTTP</H3> - -The need for a server to linger on a socket after a close is noted a couple -times in the HTTP specs, but not explained. This explanation is based on -discussions between myself, Henrik Frystyk, Robert S. Thau, Dave Raggett, -and John C. Mallery in the hallways of MIT while I was at W3C.<P> - -If a server closes the input side of the connection while the client -is sending data (or is planning to send data), then the server's TCP -stack will signal an RST (reset) back to the client. Upon -receipt of the RST, the client will flush its own incoming TCP buffer -back to the un-ACKed packet indicated by the RST packet argument. -If the server has sent a message, usually an error response, to the -client just before the close, and the client receives the RST packet -before its application code has read the error message from its incoming -TCP buffer and before the server has received the ACK sent by the client -upon receipt of that buffer, then the RST will flush the error message -before the client application has a chance to see it. The result is -that the client is left thinking that the connection failed for no -apparent reason.<P> - -There are two conditions under which this is likely to occur: -<OL> -<LI>sending POST or PUT data without proper authorization -<LI>sending multiple requests before each response (pipelining) - and one of the middle requests resulting in an error or - other break-the-connection result. -</OL> -<P> -The solution in all cases is to send the response, close only the -write half of the connection (what shutdown is supposed to do), and -continue reading on the socket until it is either closed by the -client (signifying it has finally read the response) or a timeout occurs. -That is what the kernel is supposed to do if SO_LINGER is set. 
-Unfortunately, SO_LINGER has no effect on some systems; on some other -systems, it does not have its own timeout and thus the TCP memory -segments just pile-up until the next reboot (planned or not).<P> - -Please note that simply removing the linger code will not solve the -problem -- it only moves it to a different and much harder one to detect. -</OL> -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/footer.html b/docs/manual/misc/footer.html deleted file mode 100644 index 1e5f739ebe..0000000000 --- a/docs/manual/misc/footer.html +++ /dev/null @@ -1,8 +0,0 @@ -<HR> - -<H3 ALIGN="CENTER"> - Apache HTTP Server Version 2.0 -</H3> - -<A HREF="./"><IMG SRC="../images/index.gif" ALT="Index"></A> -<A HREF="../"><IMG SRC="../images/home.gif" ALT="Home"></A> diff --git a/docs/manual/misc/header.html b/docs/manual/misc/header.html deleted file mode 100644 index 9533b02bda..0000000000 --- a/docs/manual/misc/header.html +++ /dev/null @@ -1,6 +0,0 @@ -<DIV ALIGN="CENTER"> - <IMG SRC="../images/sub.gif" ALT="[APACHE DOCUMENTATION]"> - <H3> - Apache HTTP Server Version 2.0 - </H3> -</DIV> diff --git a/docs/manual/misc/howto.html b/docs/manual/misc/howto.html deleted file mode 100644 index 57c9f2dfa8..0000000000 --- a/docs/manual/misc/howto.html +++ /dev/null @@ -1,208 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<META NAME="description" - CONTENT="Some 'how to' tips for the Apache httpd server"> -<META NAME="keywords" CONTENT="apache,redirect,robots,rotate,logfiles"> -<TITLE>Apache HOWTO documentation</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> -<H1 ALIGN="CENTER">Apache HOWTO documentation</H1> - -How to: -<UL> -<LI><A HREF="#redirect">redirect an entire server or directory to a single - URL</A> -<LI><A 
HREF="#logreset">reset your log files</A> -<LI><A HREF="#stoprob">stop/restrict robots</A> -<LI><A HREF="#proxyssl">proxy SSL requests <EM>through</EM> your non-SSL - server</A> -</UL> - -<HR> -<H2><A NAME="redirect">How to redirect an entire server or directory to a -single URL</A></H2> - -<P>There are two chief ways to redirect all requests for an entire -server to a single location: one which requires the use of -<CODE>mod_rewrite</CODE>, and another which uses a CGI script. - -<P>First: if all you need to do is migrate a server from one name to -another, simply use the <CODE>Redirect</CODE> directive, as supplied -by <CODE>mod_alias</CODE>: - -<BLOCKQUOTE><PRE> - Redirect / http://www.apache.org/ -</PRE></BLOCKQUOTE> - -<P>Since <CODE>Redirect</CODE> will forward along the complete path, -however, it may not be appropriate - for example, when the directory -structure has changed after the move, and you simply want to direct people -to the home page. - -<P>The best option is to use the standard Apache module -<CODE>mod_rewrite</CODE>. -If that module is compiled in, the following lines - -<BLOCKQUOTE><PRE>RewriteEngine On -RewriteRule /.* http://www.apache.org/ [R] -</PRE></BLOCKQUOTE> - -will send an HTTP 302 Redirect back to the client, and no matter -what they gave in the original URL, they'll be sent to -"http://www.apache.org/". - -<p>The second option is to set up a <CODE>ScriptAlias</CODE> pointing to -a <STRONG>CGI script</STRONG> which outputs a 301 or 302 status and the -location -of the other server.</P> - -<P>By using a <STRONG>CGI script</STRONG> you can intercept various requests -and -treat them specially, <EM>e.g.</EM>, you might want to intercept -<STRONG>POST</STRONG> -requests, so that the client isn't redirected to a script on the other -server which expects POST information (a redirect will lose the POST -information.) You might also want to use a CGI script if you don't -want to compile mod_rewrite into your server. 
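A minimal sketch of the POST interception described above, written as a /bin/sh CGI script. The function name respond, the variable NEW_HOME and the wording of the message are illustrative assumptions, not part of the original how-to; the target URL is borrowed from the Redirect example earlier in this section. POST requests get a plain-text explanation instead of a redirect, since a redirect would lose the posted data; every other method is redirected with a 302.

```shell
#!/bin/sh
# Illustrative CGI sketch (an assumption, not the original author's script):
# redirect every request except POST, whose body a redirect would lose.

# Assumed target; taken from the Redirect example earlier in this how-to.
NEW_HOME="http://www.apache.org/"

# respond METHOD: write a CGI response for the given request method.
respond() {
    if [ "$1" = "POST" ]; then
        # No Status header is sent, so the server uses its default (200).
        printf 'Content-Type: text/plain\r\n\r\n'
        printf 'This site has moved to %s\n' "$NEW_HOME"
        printf 'Please submit the form again from the new location.\n'
    else
        printf 'Status: 302 Moved Temporarily\r\n'
        printf 'Location: %s\r\n\r\n' "$NEW_HOME"
    fi
}

# The server passes the request method to CGI scripts in REQUEST_METHOD.
respond "${REQUEST_METHOD:-GET}"
```

Keeping the decision in one function makes it straightforward to add further special cases (for example, a different message per path) without touching the header output.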
- -<P>Here's how to redirect all requests to a script... In the server -configuration file, -<BLOCKQUOTE><PRE>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script/</PRE> -</BLOCKQUOTE> - -and here's a simple Perl script to redirect requests: - -<BLOCKQUOTE><PRE> -#!/usr/local/bin/perl - -print "Status: 302 Moved Temporarily\r\n" . - "Location: http://www.some.where.else.com/\r\n" . - "\r\n"; - -</PRE></BLOCKQUOTE> - -<HR> - -<H2><A NAME="logreset">How to reset your log files</A></H2> - -<P>Sooner or later, you'll want to reset your log files (access_log and -error_log) because they are too big, or full of old information you don't -need.</P> - -<P><CODE>access_log</CODE> typically grows by 1 MB for each 10,000 requests.</P> - -<P>Most people's first attempt at replacing the logfile is to just move the -logfile or remove the logfile. This doesn't work.</P> - -<P>Apache will continue writing to the logfile at the same offset as before the -logfile moved. This results in a new logfile being created which is just -as big as the old one, but it now contains thousands (or millions) of null -characters.</P> - -<P>The correct procedure is to move the logfile, then signal Apache to tell -it to reopen the logfiles.</P> - -<P>Apache is signaled using the <STRONG>SIGHUP</STRONG> (-1) signal.
-<EM>e.g.</EM> -<BLOCKQUOTE><CODE> -mv access_log access_log.old<BR> -kill -1 `cat httpd.pid` -</CODE></BLOCKQUOTE> - -<P>Note: <CODE>httpd.pid</CODE> is a file containing the -<STRONG>p</STRONG>rocess <STRONG>id</STRONG> -of the Apache httpd daemon, Apache saves this in the same directory as the log -files.</P> - -<P>Many people use this method to replace (and backup) their logfiles on a -nightly or weekly basis.</P> -<HR> - -<H2><A NAME="stoprob">How to stop or restrict robots</A></H2> - -<P>Ever wondered why so many clients are interested in a file called -<CODE>robots.txt</CODE> which you don't have, and never did have?</P> - -<P>These clients are called <STRONG>robots</STRONG> (also known as crawlers, -spiders and other cute names) - special automated clients which -wander around the web looking for interesting resources.</P> - -<P>Most robots are used to generate some kind of <EM>web index</EM> which -is then used by a <EM>search engine</EM> to help locate information.</P> - -<P><CODE>robots.txt</CODE> provides a means to request that robots limit their -activities at the site, or more often than not, to leave the site alone.</P> - -<P>When the first robots were developed, they had a bad reputation for -sending hundreds/thousands of requests to each site, often resulting -in the site being overloaded. Things have improved dramatically since -then, thanks to <A -HREF="http://info.webcrawler.com/mak/projects/robots/guidelines.html"> -Guidelines for Robot Writers</A>, but even so, some robots may exhibit -unfriendly behavior which the webmaster isn't willing to tolerate, and -will want to stop.</P> - -<P>Another reason some webmasters want to block access to robots, is to -stop them indexing dynamic information. 
Many search engines will use the -data collected from your pages for months to come - not much use if you're -serving stock quotes, news, weather reports or anything else that will be -stale by the time people find it in a search engine.</P> - -<P>If you decide to exclude robots completely, or just limit the areas -in which they can roam, create a <CODE>robots.txt</CODE> file; refer -to the <A HREF="http://info.webcrawler.com/mak/projects/robots/robots.html" ->robot information pages</A> provided by Martijn Koster for the syntax.</P> - -<HR> -<H2><A NAME="proxyssl">How to proxy SSL requests <EM>through</EM> - your non-SSL Apache server</A> - <BR> - <SMALL>(<EM>submitted by David Sedlock</EM>)</SMALL> -</H2> -<P> -SSL uses port 443 for requests for secure pages. If your browser just -sits there for a long time when you attempt to access a secure page -over your Apache proxy, then the proxy may not be configured to handle -SSL. You need to instruct Apache to listen on port 443 in addition to -any of the ports on which it is already listening: -</P> -<PRE> - Listen 80 - Listen 443 -</PRE> -<P> -Then set the security proxy in your browser to 443. That might be it! -</P> -<P> -If your proxy is sending requests to another proxy, then you may have -to set the directive ProxyRemote differently. Here are my settings: -</P> -<PRE> - ProxyRemote http://nicklas:80/ http://proxy.mayn.franken.de:8080 - ProxyRemote http://nicklas:443/ http://proxy.mayn.franken.de:443 -</PRE> -<P> -Requests on port 80 of my proxy <SAMP>nicklas</SAMP> are forwarded to -<SAMP>proxy.mayn.franken.de:8080</SAMP>, while requests on port 443 are -forwarded to <SAMP>proxy.mayn.franken.de:443</SAMP>. -If the remote proxy is not set up to -handle port 443, then the last directive can be left out. SSL requests -will only go over the first proxy. -</P> -<P> -Note that your Apache does NOT have to be set up to serve secure pages -with SSL. Proxying SSL is a different thing from using it. 
-</P> -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/index.html b/docs/manual/misc/index.html deleted file mode 100644 index 2893b456c4..0000000000 --- a/docs/manual/misc/index.html +++ /dev/null @@ -1,101 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> - <HEAD> - <TITLE>Apache Miscellaneous Documentation</TITLE> - </HEAD> - - <!-- Background white, links blue (unvisited), navy (visited), red (active) --> - <BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" - > - <!--#include virtual="header.html" --> - <H1 ALIGN="CENTER">Apache Miscellaneous Documentation</H1> - - <P> - Below is a list of additional documentation pages that apply to the - Apache web server development project. - </P> - <DL> - <DT><A - HREF="API.html" - >API</A> - </DT> - <DD>Description of Apache's Application Programming Interface. - </DD> - <DT><A - HREF="FAQ.html" - >FAQ</A> - </DT> - <DD>Frequently-Asked Questions concerning the Apache project and server. - </DD> - <DT><A - HREF="client_block_api.html" - >Reading Client Input in Apache 1.2</A> - </DT> - <DD>Describes differences between Apache 1.1 and 1.2 in how modules - read information from the client. - </DD> - <DT><A - HREF="compat_notes.html" - >Compatibility with NCSA</A> - </DT> - <DD>Notes about Apache's compatibility with the NCSA server. - </DD> - <DT><A HREF="custom_errordocs.html">How to use XSSI and Negotiation - for custom ErrorDocuments</A> - </DT> - <DD>Describes a solution which uses XSSI and negotiation - to custom-tailor the Apache ErrorDocuments to taste, adding the - advantage of returning internationalized versions of the error - messages depending on the client's language preferences. 
- </DD> - <DT><A HREF="descriptors.html">File Descriptor use in Apache</A> - <DD>Describes how Apache uses file descriptors and talks about various - limits imposed on the number of descriptors available by various - operating systems. - </DD> - <DT><A - HREF="fin_wait_2.html" - ><SAMP>FIN_WAIT_2</SAMP></A> - </DT> - <DD>A description of the causes of Apache processes going into the - <SAMP>FIN_WAIT_2</SAMP> state, and what you can do about it. - </DD> - <DT><A - HREF="howto.html" - >"How-To"</A> - </DT> - <DD>Instructions about how to accomplish some commonly-desired server - functionality changes. - </DD> - <DT><A - HREF="known_client_problems.html" - >Known Client Problems</A> - </DT> - <DD>A list of problems in HTTP clients which can be mitigated by Apache. - </DD> - <DT><A - HREF="perf-tuning.html" - >Performance Notes -- Apache Tuning</A> - </DT> - <DD>Notes about how to (run-time and compile-time) configure - Apache for highest performance. Notes explaining why Apache does - some things, and why it doesn't do other things (which make it - slower/faster). - </DD> - <DT><A - HREF="security_tips.html" - >Security Tips</A> - </DT> - <DD>Some "do"s - and "don't"s - for keeping your - Apache web site secure. 
- </DD> - </DL> - - <!--#include virtual="footer.html" --> - </BODY> -</HTML> diff --git a/docs/manual/misc/known_client_problems.html b/docs/manual/misc/known_client_problems.html deleted file mode 100644 index 99e5d68580..0000000000 --- a/docs/manual/misc/known_client_problems.html +++ /dev/null @@ -1,305 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Apache HTTP Server Project</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> -<H1 ALIGN="CENTER">Known Problems in Clients</H1> - -<P>Over time the Apache Group has discovered or been notified of problems -with various clients which we have had to work around, or explain. -This document describes these problems and the workarounds available. -It's not arranged in any particular order. Some familiarity with the -standards is assumed, but not necessary. - -<P>For brevity, <EM>Navigator</EM> will refer to Netscape's Navigator -product (which in later versions was renamed "Communicator" and -various other names), and <EM>MSIE</EM> will refer to Microsoft's -Internet Explorer product. All trademarks and copyrights belong to -their respective companies. We welcome input from the various client -authors to correct inconsistencies in this paper, or to provide us with -exact version numbers where things are broken/fixed. - -<P>For reference, -<A HREF="ftp://ds.internic.net/rfc/rfc1945.txt">RFC1945</A> -defines HTTP/1.0, and -<A HREF="ftp://ds.internic.net/rfc/rfc2068.txt">RFC2068</A> -defines HTTP/1.1. Apache as of version 1.2 is an HTTP/1.1 server (with an -optional HTTP/1.0 proxy). - -<P>Various of these workarounds are triggered by environment variables. -The admin typically controls which are set, and for which clients, by using -<A HREF="../mod/mod_browser.html">mod_browser</A>. 
Unless otherwise -noted all of these workarounds exist in versions 1.2 and later. - -<H3><A NAME="trailing-crlf">Trailing CRLF on POSTs</A></H3> - -<P>This is a legacy issue. The CERN webserver required <CODE>POST</CODE> -data to have an extra <CODE>CRLF</CODE> following it. Thus many -clients send an extra <CODE>CRLF</CODE> that -is not included in the <CODE>Content-Length</CODE> of the request. -Apache works around this problem by eating any empty lines which -appear before a request. - -<H3><A NAME="broken-keepalive">Broken keepalive</A></H3> - -<P>Various clients have had broken implementations of <EM>keepalive</EM> -(persistent connections). In particular the Windows versions of -Navigator 2.0 get very confused when the server times out an -idle connection. The workaround is present in the default config files: -<BLOCKQUOTE><CODE> -BrowserMatch Mozilla/2 nokeepalive -</CODE></BLOCKQUOTE> -Note that this matches some earlier versions of MSIE, which began the -practice of calling themselves <EM>Mozilla</EM> in their user-agent -strings just like Navigator. - -<P>MSIE 4.0b2, which claims to support HTTP/1.1, does not properly -support keepalive when it is used on 301 or 302 (redirect) -responses. Unfortunately Apache's <CODE>nokeepalive</CODE> code -prior to 1.2.2 would not work with HTTP/1.1 clients. You must apply -<A -HREF="http://www.apache.org/dist/httpd/patches/apply_to_1.2.1/msie_4_0b2_fixes.patch" ->this patch</A> to version 1.2.1. Then add this to your config: -<BLOCKQUOTE><CODE> -BrowserMatch "MSIE 4\.0b2;" nokeepalive -</CODE></BLOCKQUOTE> - -<H3><A NAME="force-response-1.0">Incorrect interpretation of -<CODE>HTTP/1.1</CODE> in response</A></H3> - -<P>To quote from section 3.1 of RFC1945: -<BLOCKQUOTE> -HTTP uses a "<MAJOR>.<MINOR>" numbering scheme to indicate versions -of the protocol. 
The protocol versioning policy is intended to allow -the sender to indicate the format of a message and its capacity for -understanding further HTTP communication, rather than the features -obtained via that communication. -</BLOCKQUOTE> -Since Apache is an HTTP/1.1 server, it indicates so as part of its -response. Many client authors mistakenly treat this part of the response -as an indication of the protocol that the response is in, and then refuse -to accept the response. - -<P>The first major indication of this problem was with AOL's proxy servers. -When Apache 1.2 went into beta it was the first wide-spread HTTP/1.1 -server. After some discussion, AOL fixed their proxies. In -anticipation of similar problems, the <CODE>force-response-1.0</CODE> -environment variable was added to Apache. When present Apache will -indicate "HTTP/1.0" in response to an HTTP/1.0 client, -but will not in any other way change the response. - -<P>The pre-1.1 Java Development Kit (JDK) that is used in many clients -(including Navigator 3.x and MSIE 3.x) exhibits this problem. As do some -of the early pre-releases of the 1.1 JDK. We think it is fixed in the -1.1 JDK release. In any event the workaround: -<BLOCKQUOTE><CODE> -BrowserMatch Java/1.0 force-response-1.0 <BR> -BrowserMatch JDK/1.0 force-response-1.0 -</CODE></BLOCKQUOTE> - -<P>RealPlayer 4.0 from Progressive Networks also exhibits this problem. -However they have fixed it in version 4.01 of the player, but version -4.01 uses the same <CODE>User-Agent</CODE> as version 4.0. The -workaround is still: -<BLOCKQUOTE><CODE> -BrowserMatch "RealPlayer 4.0" force-response-1.0 -</CODE></BLOCKQUOTE> - -<H3><A NAME="msie4.0b2">Requests use HTTP/1.1 but responses must be -in HTTP/1.0</A></H3> - -<P>MSIE 4.0b2 has this problem. Its Java VM makes requests in HTTP/1.1 -format but the responses must be in HTTP/1.0 format (in particular, it -does not understand <EM>chunked</EM> responses). 
The workaround -is to fool Apache into believing the request came in HTTP/1.0 format. -<BLOCKQUOTE><CODE> -BrowserMatch "MSIE 4\.0b2;" downgrade-1.0 force-response-1.0 -</CODE></BLOCKQUOTE> -This workaround is available in 1.2.2, and in a -<A -HREF="http://www.apache.org/dist/httpd/patches/apply_to_1.2.1/msie_4_0b2_fixes.patch" ->patch</A> against 1.2.1. - -<H3><A NAME="257th-byte">Boundary problems with header parsing</A></H3> - -<P>All versions of Navigator from 2.0 through 4.0b2 (and possibly later) -have a problem if the trailing CRLF of the response header starts at -offset 256, 257 or 258 of the response. A BrowserMatch for this would -match on nearly every hit, so the workaround is enabled automatically -on all responses. The workaround implemented detects when this condition would -occur in a response and adds extra padding to the header to push the -trailing CRLF past offset 258 of the response. - -<H3><A NAME="boundary-string">Multipart responses and Quoted Boundary -Strings</A></H3> - -<P>On multipart responses some clients will not accept quotes (") -around the boundary string. The MIME standard recommends that -such quotes be used. But the clients were probably written based -on one of the examples in RFC2068, which does not include quotes. -Apache does not include quotes on its boundary strings to workaround -this problem. - -<H3><A NAME="byterange-requests">Byterange requests</A></H3> - -<P>A byterange request is used when the client wishes to retrieve a -portion of an object, not necessarily the entire object. There -was a very old draft which included these byteranges in the URL. -Old clients such as Navigator 2.0b1 and MSIE 3.0 for the MAC -exhibit this behaviour, and -it will appear in the servers' access logs as (failed) attempts to -retrieve a URL with a trailing ";xxx-yyy". Apache does not attempt -to implement this at all. 
- -<P>A subsequent draft of this standard defines a header -<CODE>Request-Range</CODE>, and a response type -<CODE>multipart/x-byteranges</CODE>. The HTTP/1.1 standard includes -this draft with a few fixes, and it defines the header -<CODE>Range</CODE> and type <CODE>multipart/byteranges</CODE>. - -<P>Navigator (versions 2 and 3) sends both <CODE>Range</CODE> and -<CODE>Request-Range</CODE> headers (with the same value), but does not -accept a <CODE>multipart/byteranges</CODE> response. The response must -be <CODE>multipart/x-byteranges</CODE>. As a workaround, if Apache -receives a <CODE>Request-Range</CODE> header it considers it "higher -priority" than a <CODE>Range</CODE> header and in response uses -<CODE>multipart/x-byteranges</CODE>. - -<P>The Adobe Acrobat Reader plugin makes extensive use of byteranges and -prior to version 3.01 supports only the <CODE>multipart/x-byteranges</CODE> -response. Unfortunately there is no clue that it is the plugin -making the request. If the plugin is used with Navigator, the above -workaround works fine. But if the plugin is used with MSIE 3 (on -Windows) the workaround won't work because MSIE 3 doesn't give the -<CODE>Request-Range</CODE> clue that Navigator does. To work around this, -Apache special-cases "MSIE 3" in the <CODE>User-Agent</CODE> and serves -<CODE>multipart/x-byteranges</CODE>. Note that the necessity for this -with MSIE 3 is actually due to the Acrobat plugin, not due to the browser. - -<P>Netscape Communicator appears to not issue the non-standard -<CODE>Request-Range</CODE> header. When an Acrobat plugin prior to -version 3.01 is used with it, it will not properly understand byteranges. -The user must upgrade their Acrobat reader to 3.01. - -<H3><A NAME="cookie-merge"><CODE>Set-Cookie</CODE> header is -unmergeable</A></H3> - -<P>The HTTP specifications say that it is legal to merge headers with -duplicate names into one (separated by commas).
Some browsers -that support Cookies don't like merged headers and prefer that each -<CODE>Set-Cookie</CODE> header is sent separately. When parsing the -headers returned by a CGI, Apache will explicitly avoid merging any -<CODE>Set-Cookie</CODE> headers. - -<H3><A NAME="gif89-expires"><CODE>Expires</CODE> headers and GIF89A -animations</A></H3> - -<P>Navigator versions 2 through 4 will erroneously re-request -GIF89A animations on each loop of the animation if the first -response included an <CODE>Expires</CODE> header. This happens -regardless of how far in the future the expiry time is set. There -is no workaround supplied with Apache; however, there are hacks for <A -HREF="http://www.arctic.org/~dgaudet/patches/apache-1.2-gif89-expires-hack.patch">1.2</A> -and for <A -HREF="http://www.arctic.org/~dgaudet/patches/apache-1.3-gif89-expires-hack.patch">1.3</A>. - -<H3><A NAME="no-content-length"><CODE>POST</CODE> without -<CODE>Content-Length</CODE></A></H3> - -<P>In certain situations Navigator 3.01 through 3.03 appear to incorrectly -issue a POST without the request body. There is no -known workaround. It has been fixed in Navigator 3.04; Netscape -provides some -<A HREF="http://help.netscape.com/kb/client/971014-42.html">information</A>. -There's also -<A HREF="http://www.arctic.org/~dgaudet/apache/no-content-length/"> -some information</A> about the actual problem. - -<H3><A NAME="jdk-12-bugs">JDK 1.2 betas lose parts of responses.</A></H3> - -<P>The HTTP client in the JDK1.2beta2 and beta3 will throw away the first part of -the response body when both the headers and the first part of the body are sent -in the same network packet AND keep-alives are being used. If either condition -is not met then it works fine. - -<P>See also Bug IDs 4124329 and 4125538 at the Java Developer Connection.
- -<P>If you are seeing this bug yourself, you can add the following BrowserMatch -directive to work around it: - -<BLOCKQUOTE><CODE> -BrowserMatch "Java1\.2beta[23]" nokeepalive -</CODE></BLOCKQUOTE> - -<P>We don't advocate this though since bending over backwards for beta software -is usually not a good idea; ideally it gets fixed, new betas or a final release -comes out, and no one uses the broken old software anymore. In theory. - -<H3><A NAME="content-type-persistence"><CODE>Content-Type</CODE> change -is not noticed after reload</A></H3> - -<P>Navigator (all versions?) will cache the <CODE>content-type</CODE> -for an object "forever". Using reload or shift-reload will not cause -Navigator to notice a <CODE>content-type</CODE> change. The only -workaround is for the user to flush their caches (memory and disk). By -way of an example, some folks may be using an old <CODE>mime.types</CODE> -file which does not map <CODE>.htm</CODE> to <CODE>text/html</CODE>; -in this case Apache will default to sending <CODE>text/plain</CODE>. -If the user requests the page, it is served as <CODE>text/plain</CODE>, and -after the admin fixes the server, the user will have to flush their caches -before the object will be shown with the correct <CODE>text/html</CODE> -type. - -<h3><a name="msie-cookie-y2k">MSIE Cookie problem with expiry date in -the year 2000</a></h3> - -<p>MSIE versions 3.00 and 3.02 (without the Y2K patch) do not handle -cookie expiry dates in the year 2000 properly. Years after 2000 and -before 2000 work fine. This is fixed in IE4.01 service pack 1, and in -the Y2K patch for IE3.02. Users should avoid using expiry dates in the -year 2000. - -<h3><a name="lynx-negotiate-trans">Lynx incorrectly asking for transparent -content negotiation</a></h3> - -<p>The Lynx browser versions 2.7 and 2.8 send a "negotiate: trans" header -in their requests, which is an indication the browser supports transparent -content negotiation (TCN).
However the browser does not support TCN. -As of version 1.3.4, Apache supports TCN, and this causes problems with -these versions of Lynx. As a workaround future versions of Apache will -ignore this header when sent by the Lynx client. - -<h3><a name="ie40-vary">MSIE 4.0 mishandles Vary response header</a></h3> - -<p>MSIE 4.0 does not handle a Vary header properly. The Vary header is -generated by mod_rewrite in apache 1.3. The result is an error from MSIE -saying it cannot download the requested file. There are more details -in <a href="http://bugs.apache.org/index/full/4118">PR#4118</a>. -</P> -<P> -A workaround is to add the following to your server's configuration -files: -</P> -<PRE> - BrowserMatch "MSIE 4\.0" force-no-vary -</PRE> -<P> -(This workaround is only available with releases <STRONG>after</STRONG> -1.3.6 of the Apache Web server.) -</P> - - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> - diff --git a/docs/manual/misc/perf-tuning.html b/docs/manual/misc/perf-tuning.html deleted file mode 100644 index ce31fe5653..0000000000 --- a/docs/manual/misc/perf-tuning.html +++ /dev/null @@ -1,906 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> - <TITLE>Apache Performance Notes</TITLE> -</HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<blockquote><strong>Warning:</strong> -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. 
-</blockquote> - -<H1 align="center">Apache Performance Notes</H1> - -<P>Author: Dean Gaudet - -<ul> -<li><a href="#introduction">Introduction</a></li> -<li><a href="#hardware">Hardware and Operating System Issues</a></li> -<li><a href="#runtime">Run-Time Configuration Issues</a></li> -<li><a href="#compiletime">Compile-Time Configuration Issues</a></li> -<li>Appendixes - <ul> - <li><a href="#trace">Detailed Analysis of a Trace</a></li> - <li><a href="#patches">Patches Available</a></li> - <li><a href="#preforking">The Pre-Forking Model</a></li> - </ul></li> -</ul> - -<hr> - -<H3><a name="introduction">Introduction</A></H3> -<P>Apache is a general webserver, which is designed to be correct first, and -fast second. Even so, its performance is quite satisfactory. Most -sites have less than 10Mbits of outgoing bandwidth, which Apache can -fill using only a low end Pentium-based webserver. In practice sites -with more bandwidth require more than one machine to fill the bandwidth -due to other constraints (such as CGI or database transaction overhead). -For these reasons the development focus has been mostly on correctness -and configurability. - -<P>Unfortunately many folks overlook these facts and cite raw performance -numbers as if they are some indication of the quality of a web server -product. There is a bare minimum performance that is acceptable, beyond -that extra speed only caters to a much smaller segment of the market. -But in order to avoid this hurdle to the acceptance of Apache in some -markets, effort was put into Apache 1.3 to bring performance up to a -point where the difference with other high-end webservers is minimal. - -<P>Finally there are the folks who just plain want to see how fast something -can go. The author falls into this category. The rest of this document -is dedicated to these folks who want to squeeze every last bit of -performance out of Apache's current model, and want to understand why -it does some things which slow it down. 
- -<P>Note that this is tailored towards Apache 1.3 on Unix. Some of it applies -to Apache on NT. Apache on NT has not been tuned for performance yet; -in fact it probably performs very poorly because NT performance requires -a different programming model. - -<hr> - -<H3><a name="hardware">Hardware and Operating System Issues</a></H3> - -<P>The single biggest hardware issue affecting webserver performance -is RAM. A webserver should never ever have to swap, swapping increases -the latency of each request beyond a point that users consider "fast -enough". This causes users to hit stop and reload, further increasing -the load. You can, and should, control the <CODE>MaxClients</CODE> -setting so that your server does not spawn so many children it starts -swapping. - -<P>Beyond that the rest is mundane: get a fast enough CPU, a fast enough -network card, and fast enough disks, where "fast enough" is something -that needs to be determined by experimentation. - -<P>Operating system choice is largely a matter of local concerns. But -a general guideline is to always apply the latest vendor TCP/IP patches. -HTTP serving completely breaks many of the assumptions built into Unix -kernels up through 1994 and even 1995. Good choices include -recent FreeBSD, and Linux. - -<hr> - -<H3><a name="runtime">Run-Time Configuration Issues</a></H3> - -<H4>HostnameLookups</H4> -<P>Prior to Apache 1.3, <CODE>HostnameLookups</CODE> defaulted to On. -This adds latency -to every request because it requires a DNS lookup to complete before -the request is finished. In Apache 1.3 this setting defaults to Off. -However (1.3 or later), if you use any <CODE>Allow from domain</CODE> or -<CODE>Deny from domain</CODE> directives then you will pay for a -double reverse DNS lookup (a reverse, followed by a forward to make sure -that the reverse is not being spoofed). So for the highest performance -avoid using these directives (it's fine to use IP addresses rather than -domain names). 
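<P>To make the cost concrete, the double reverse lookup described above can be sketched in C as follows. This is a minimal illustration built on the standard <CODE>getnameinfo</CODE> and <CODE>getaddrinfo</CODE> calls, not Apache's own code; the function name <CODE>double_reverse_ok</CODE> is hypothetical and only IPv4 is handled. It shows that each check costs two resolver round trips: one from address to name, and one from that name back to a list of addresses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper, not Apache's code.  Returns 1 if the dotted-quad
 * IPv4 address reverse-resolves to a name that forward-resolves back to
 * the same address, 0 if the check fails, and -1 on malformed input. */
int double_reverse_ok(const char *ip)
{
    struct sockaddr_in sa;
    struct addrinfo hints, *res, *p;
    char host[1025];
    int ok = 0;

    assert(ip != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    /* Lookup 1: reverse (address -> name); fail if there is no name. */
    if (getnameinfo((struct sockaddr *)&sa, sizeof sa,
                    host, sizeof host, NULL, 0, NI_NAMEREQD) != 0)
        return 0;

    /* Lookup 2: forward (name -> addresses). */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return 0;

    /* The reverse answer is trusted only if one forward answer matches. */
    for (p = res; p != NULL; p = p->ai_next) {
        struct sockaddr_in *cand = (struct sockaddr_in *)p->ai_addr;
        if (cand->sin_addr.s_addr == sa.sin_addr.s_addr) {
            ok = 1;
            break;
        }
    }
    freeaddrinfo(res);
    return ok;
}
```

<P>A server doing this on every request pays for both lookups before it can decide whether to serve the client, which is the latency that address-based <CODE>Allow from</CODE> and <CODE>Deny from</CODE> rules avoid.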
- -<P>Note that it's possible to scope the directives, such as within -a <CODE><Location /server-status></CODE> section. In this -case the DNS lookups are only performed on requests matching the -criteria. Here's an example which disables -lookups except for .html and .cgi files: - -<BLOCKQUOTE><PRE> -HostnameLookups off -<Files ~ "\.(html|cgi)$"> - HostnameLookups on -</Files> -</PRE></BLOCKQUOTE> - -But even still, if you just need DNS names -in some CGIs you could consider doing the -<CODE>gethostbyaddr</CODE> call in the specific CGIs that need it. - -<H4>FollowSymLinks and SymLinksIfOwnerMatch</H4> -<P>Wherever in your URL-space you do not have an -<CODE>Options FollowSymLinks</CODE>, or you do have an -<CODE>Options SymLinksIfOwnerMatch</CODE>, Apache will have to -issue extra system calls to check up on symlinks: one extra call per -filename component. For example, if you had: - -<BLOCKQUOTE><PRE> -DocumentRoot /www/htdocs -<Directory /> - Options SymLinksIfOwnerMatch -</Directory> -</PRE></BLOCKQUOTE> - -and a request is made for the URI <CODE>/index.html</CODE>, -then Apache will perform <CODE>lstat(2)</CODE> on <CODE>/www</CODE>, -<CODE>/www/htdocs</CODE>, and <CODE>/www/htdocs/index.html</CODE>. The -results of these <CODE>lstats</CODE> are never cached, -so they will occur on every single request. If you really desire the -symlinks security checking you can do something like this: - -<BLOCKQUOTE><PRE> -DocumentRoot /www/htdocs -<Directory /> - Options FollowSymLinks -</Directory> -<Directory /www/htdocs> - Options -FollowSymLinks +SymLinksIfOwnerMatch -</Directory> -</PRE></BLOCKQUOTE> - -This at least avoids the extra checks for the <CODE>DocumentRoot</CODE> -path. Note that you'll need to add similar sections if you have any -<CODE>Alias</CODE> or <CODE>RewriteRule</CODE> paths outside of your -document root.
For highest performance, and no symlink protection, -set <CODE>FollowSymLinks</CODE> everywhere, and never set -<CODE>SymLinksIfOwnerMatch</CODE>. - -<H4>AllowOverride</H4> - -<P>Wherever in your URL-space you allow overrides (typically -<CODE>.htaccess</CODE> files) Apache will attempt to open -<CODE>.htaccess</CODE> for each filename component. For example, if you had: - -<BLOCKQUOTE><PRE> -DocumentRoot /www/htdocs -<Directory /> - AllowOverride all -</Directory> -</PRE></BLOCKQUOTE> - -and a request is made for the URI <CODE>/index.html</CODE>, then -Apache will attempt to open <CODE>/.htaccess</CODE>, -<CODE>/www/.htaccess</CODE>, and <CODE>/www/htdocs/.htaccess</CODE>. -The solutions are similar to the previous case of <CODE>Options -FollowSymLinks</CODE>. For highest performance use -<CODE>AllowOverride None</CODE> everywhere in your filesystem. - -<H4>Negotiation</H4> - -<P>If at all possible, avoid content-negotiation if you're really -interested in every last ounce of performance. In practice the -benefits of negotiation outweigh the performance penalties. There's -one case where you can speed up the server. Instead of using -a wildcard such as: - -<BLOCKQUOTE><PRE> -DirectoryIndex index -</PRE></BLOCKQUOTE> - -Use a complete list of options: - -<BLOCKQUOTE><PRE> -DirectoryIndex index.cgi index.pl index.shtml index.html -</PRE></BLOCKQUOTE> - -where you list the most common choice first. - -<H4>Process Creation</H4> - -<P>Prior to Apache 1.3 the <CODE>MinSpareServers</CODE>, -<CODE>MaxSpareServers</CODE>, and <CODE>StartServers</CODE> settings -all had drastic effects on benchmark results. In particular, Apache -required a "ramp-up" period in order to reach a number of children -sufficient to serve the load being applied. After the initial -spawning of <CODE>StartServers</CODE> children, only one child per -second would be created to satisfy the <CODE>MinSpareServers</CODE> -setting.
So a server being accessed by 100 simultaneous clients, -using the default <CODE>StartServers</CODE> of 5, would take on -the order of 95 seconds to spawn enough children to handle the load. This -works fine in practice on real-life servers, because they aren't restarted -frequently, but it does really poorly on benchmarks which might only run -for ten minutes. - -<P>The one-per-second rule was implemented in an effort to avoid -swamping the machine with the startup of new children. If the machine -is busy spawning children it can't service requests. But it has such -a drastic effect on the perceived performance of Apache that it had -to be replaced. As of Apache 1.3, -the code will relax the one-per-second rule. It -will spawn one, wait a second, then spawn two, wait a second, then spawn -four, and it will continue exponentially until it is spawning 32 children -per second. It will stop whenever it satisfies the -<CODE>MinSpareServers</CODE> setting. - -<P>This appears to be responsive enough that it's -almost unnecessary to twiddle the <CODE>MinSpareServers</CODE>, -<CODE>MaxSpareServers</CODE> and <CODE>StartServers</CODE> knobs. When -more than 4 children are spawned per second, a message will be emitted -to the <CODE>ErrorLog</CODE>. If you see a lot of these messages then -consider tuning these settings. Use the <CODE>mod_status</CODE> output -as a guide. - -<P>Related to process creation is process death induced by the -<CODE>MaxRequestsPerChild</CODE> setting. By default this is 0, which -means that there is no limit to the number of requests handled -per child. If your configuration currently has this set to some -very low number, such as 30, you may want to bump this up significantly. -If you are running SunOS or an old version of Solaris, limit this -to 10000 or so because of memory leaks. - -<P>When keep-alives are in use, children will be kept busy -doing nothing waiting for more requests on the already open -connection.
The default <CODE>KeepAliveTimeout</CODE> of -15 seconds attempts to minimize this effect. The tradeoff -here is between network bandwidth and server resources. -In no event should you raise this above about 60 seconds, as -<A HREF="http://www.research.digital.com/wrl/techreports/abstracts/95.4.html" ->most of the benefits are lost</A>. - -<hr> - -<H3><a name="compiletime">Compile-Time Configuration Issues</a></H3> - -<H4>mod_status and ExtendedStatus On</H4> - -<P>If you include <CODE>mod_status</CODE> -and you also set <CODE>ExtendedStatus On</CODE> when building and running -Apache, then on every request Apache will perform two calls to -<CODE>gettimeofday(2)</CODE> (or <CODE>times(2)</CODE> depending -on your operating system), and (pre-1.3) several extra calls to -<CODE>time(2)</CODE>. This is all done so that the status report -contains timing indications. For highest performance, set -<CODE>ExtendedStatus off</CODE> (which is the default). - -<H4>accept Serialization - multiple sockets</H4> - -<P>This discusses a shortcoming in the Unix socket API. -Suppose your -web server uses multiple <CODE>Listen</CODE> statements to listen on -either multiple ports or multiple addresses. In order to test each -socket to see if a connection is ready Apache uses <CODE>select(2)</CODE>. -<CODE>select(2)</CODE> indicates that a socket has <EM>zero</EM> or -<EM>at least one</EM> connection waiting on it. Apache's model includes -multiple children, and all the idle ones test for new connections at the -same time. 
A naive implementation looks something like this -(these examples do not match the code, they're contrived for -pedagogical purposes): - -<BLOCKQUOTE><PRE> - for (;;) { - for (;;) { - fd_set accept_fds; - - FD_ZERO (&accept_fds); - for (i = first_socket; i <= last_socket; ++i) { - FD_SET (i, &accept_fds); - } - rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL); - if (rc < 1) continue; - new_connection = -1; - for (i = first_socket; i <= last_socket; ++i) { - if (FD_ISSET (i, &accept_fds)) { - new_connection = accept (i, NULL, NULL); - if (new_connection != -1) break; - } - } - if (new_connection != -1) break; - } - process the new_connection; - } -</PRE></BLOCKQUOTE> - -But this naive implementation has a serious starvation problem. Recall -that multiple children execute this loop at the same time, and so multiple -children will block at <CODE>select</CODE> when they are in between -requests. All those blocked children will awaken and return from -<CODE>select</CODE> when a single request appears on any socket -(the number of children which awaken varies depending on the operating -system and timing issues). -They will all then fall down into the loop and try to <CODE>accept</CODE> -the connection. But only one will succeed (assuming there's still only -one connection ready), the rest will be <EM>blocked</EM> in -<CODE>accept</CODE>. -This effectively locks those children into serving requests from that -one socket and no other sockets, and they'll be stuck there until enough -new requests appear on that socket to wake them all up. -This starvation problem was first documented in -<A HREF="http://bugs.apache.org/index/full/467">PR#467</A>. There -are at least two solutions. - -<P>One solution is to make the sockets non-blocking. In this case the -<CODE>accept</CODE> won't block the children, and they will be allowed -to continue immediately. But this wastes CPU time. Suppose you have -ten idle children in <CODE>select</CODE>, and one connection arrives. 
-Then nine of those children will wake up, try to <CODE>accept</CODE> the -connection, fail, and loop back into <CODE>select</CODE>, accomplishing -nothing. Meanwhile none of those children are servicing requests that -occurred on other sockets until they get back up to the <CODE>select</CODE> -again. Overall this solution does not seem very fruitful unless you -have as many idle CPUs (in a multiprocessor box) as you have idle children, -not a very likely situation. - -<P>Another solution, the one used by Apache, is to serialize entry into -the inner loop. The loop looks like this (differences highlighted): - -<BLOCKQUOTE><PRE> - for (;;) { - <STRONG>accept_mutex_on ();</STRONG> - for (;;) { - fd_set accept_fds; - - FD_ZERO (&accept_fds); - for (i = first_socket; i <= last_socket; ++i) { - FD_SET (i, &accept_fds); - } - rc = select (last_socket+1, &accept_fds, NULL, NULL, NULL); - if (rc < 1) continue; - new_connection = -1; - for (i = first_socket; i <= last_socket; ++i) { - if (FD_ISSET (i, &accept_fds)) { - new_connection = accept (i, NULL, NULL); - if (new_connection != -1) break; - } - } - if (new_connection != -1) break; - } - <STRONG>accept_mutex_off ();</STRONG> - process the new_connection; - } -</PRE></BLOCKQUOTE> - -<A NAME="serialize">The functions</A> -<CODE>accept_mutex_on</CODE> and <CODE>accept_mutex_off</CODE> -implement a mutual exclusion semaphore. Only one child can have the -mutex at any time. There are several choices for implementing these -mutexes. The choice is defined in <CODE>src/conf.h</CODE> (pre-1.3) or -<CODE>src/include/ap_config.h</CODE> (1.3 or later). Some architectures -do not have any locking choice made, on these architectures it is unsafe -to use multiple <CODE>Listen</CODE> directives. - -<DL> -<DT><CODE>USE_FLOCK_SERIALIZED_ACCEPT</CODE> -<DD>This method uses the <CODE>flock(2)</CODE> system call to lock a -lock file (located by the <CODE>LockFile</CODE> directive). 
- -<DT><CODE>USE_FCNTL_SERIALIZED_ACCEPT</CODE> -<DD>This method uses the <CODE>fcntl(2)</CODE> system call to lock a -lock file (located by the <CODE>LockFile</CODE> directive). - -<DT><CODE>USE_SYSVSEM_SERIALIZED_ACCEPT</CODE> -<DD>(1.3 or later) This method uses SysV-style semaphores to implement the -mutex. Unfortunately SysV-style semaphores have some bad side-effects. -One is that it's possible Apache will die without cleaning up the semaphore -(see the <CODE>ipcs(8)</CODE> man page). The other is that the semaphore -API allows for a denial of service attack by any CGIs running under the -same uid as the webserver (<EM>i.e.</EM>, all CGIs, unless you use something -like suexec or cgiwrapper). For these reasons this method is not used -on any architecture except IRIX (where the previous two are prohibitively -expensive on most IRIX boxes). - -<DT><CODE>USE_USLOCK_SERIALIZED_ACCEPT</CODE> -<DD>(1.3 or later) This method is only available on IRIX, and uses -<CODE>usconfig(2)</CODE> to create a mutex. While this method avoids -the hassles of SysV-style semaphores, it is not the default for IRIX. -This is because on single processor IRIX boxes (5.3 or 6.2) the -uslock code is two orders of magnitude slower than the SysV-semaphore -code. On multi-processor IRIX boxes the uslock code is an order of magnitude -faster than the SysV-semaphore code. Kind of a messed up situation. -So if you're using a multiprocessor IRIX box then you should rebuild your -webserver with <CODE>-DUSE_USLOCK_SERIALIZED_ACCEPT</CODE> on the -<CODE>EXTRA_CFLAGS</CODE>. - -<DT><CODE>USE_PTHREAD_SERIALIZED_ACCEPT</CODE> -<DD>(1.3 or later) This method uses POSIX mutexes and should work on -any architecture implementing the full POSIX threads specification, -however appears to only work on Solaris (2.5 or later), and even then -only in certain configurations. If you experiment with this you should -watch out for your server hanging and not responding. 
Static content -only servers may work just fine. -</DL> - -<P>If your system has another method of serialization which isn't in the -above list then it may be worthwhile adding code for it (and submitting -a patch back to Apache). - -<P>Another solution that has been considered but never implemented is -to partially serialize the loop -- that is, let in a certain number -of processes. This would only be of interest on multiprocessor boxes -where it's possible multiple children could run simultaneously, and the -serialization actually doesn't take advantage of the full bandwidth. -This is a possible area of future investigation, but priority remains -low because highly parallel web servers are not the norm. - -<P>Ideally you should run servers without multiple <CODE>Listen</CODE> -statements if you want the highest performance. But read on. - -<H4>accept Serialization - single socket</H4> - -<P>The above is fine and dandy for multiple socket servers, but what -about single socket servers? In theory they shouldn't experience -any of these same problems because all children can just block in -<CODE>accept(2)</CODE> until a connection arrives, and no starvation -results. In practice this hides almost the same "spinning" behaviour -discussed above in the non-blocking solution. The way that most TCP -stacks are implemented, the kernel actually wakes up all processes blocked -in <CODE>accept</CODE> when a single connection arrives. One of those -processes gets the connection and returns to user-space, the rest spin in -the kernel and go back to sleep when they discover there's no connection -for them. This spinning is hidden from the user-land code, but it's -there nonetheless. This can result in the same load-spiking wasteful -behaviour that a non-blocking solution to the multiple sockets case can. - -<P>For this reason we have found that many architectures behave more -"nicely" if we serialize even the single socket case. 
So this is -actually the default in almost all cases. Crude experiments under -Linux (2.0.30 on a dual Pentium pro 166 w/128Mb RAM) have shown that -the serialization of the single socket case causes less than a 3% -decrease in requests per second over unserialized single-socket. -But unserialized single-socket showed an extra 100ms latency on -each request. This latency is probably a wash on long haul lines, -and only an issue on LANs. If you want to override the single socket -serialization you can define <CODE>SINGLE_LISTEN_UNSERIALIZED_ACCEPT</CODE> -and then single-socket servers will not serialize at all. - -<H4>Lingering Close</H4> - -<P>As discussed in -<A - HREF="http://www.ics.uci.edu/pub/ietf/http/draft-ietf-http-connection-00.txt" ->draft-ietf-http-connection-00.txt</A> section 8, -in order for an HTTP server to <STRONG>reliably</STRONG> implement the protocol -it needs to shutdown each direction of the communication independently -(recall that a TCP connection is bi-directional, each half is independent -of the other). This fact is often overlooked by other servers, but -is correctly implemented in Apache as of 1.2. - -<P>When this feature was added to Apache it caused a flurry of -problems on various versions of Unix because of a shortsightedness. -The TCP specification does not state that the FIN_WAIT_2 state has a -timeout, but it doesn't prohibit it. On systems without the timeout, -Apache 1.2 induces many sockets stuck forever in the FIN_WAIT_2 state. -In many cases this can be avoided by simply upgrading to the latest -TCP/IP patches supplied by the vendor. In cases where the vendor has -never released patches (<EM>i.e.</EM>, SunOS4 -- although folks with a source -license can patch it themselves) we have decided to disable this feature. - -<P>There are two ways of accomplishing this. One is the -socket option <CODE>SO_LINGER</CODE>. But as fate would have it, -this has never been implemented properly in most TCP/IP stacks. 
Even -on those stacks with a proper implementation (<EM>i.e.</EM>, Linux 2.0.31) this -method proves to be more expensive (cputime) than the next solution. - -<P>For the most part, Apache implements this in a function called -<CODE>lingering_close</CODE> (in <CODE>http_main.c</CODE>). The -function looks roughly like this: - -<BLOCKQUOTE><PRE> - void lingering_close (int s) - { - char junk_buffer[2048]; - - /* shutdown the sending side */ - shutdown (s, 1); - - signal (SIGALRM, lingering_death); - alarm (30); - - for (;;) { - select (s for reading, 2 second timeout); - if (error) break; - if (s is ready for reading) { - if (read (s, junk_buffer, sizeof (junk_buffer)) <= 0) { - break; - } - /* just toss away whatever is here */ - } - } - - close (s); - } -</PRE></BLOCKQUOTE> - -This naturally adds some expense at the end of a connection, but it -is required for a reliable implementation. As HTTP/1.1 becomes more -prevalent, and all connections are persistent, this expense will be -amortized over more requests. If you want to play with fire and -disable this feature you can define <CODE>NO_LINGCLOSE</CODE>, but -this is not recommended at all. In particular, as HTTP/1.1 pipelined -persistent connections come into use <CODE>lingering_close</CODE> -is an absolute necessity (and -<A HREF="http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html"> -pipelined connections are faster</A>, so you -want to support them). - -<H4>Scoreboard File</H4> - -<P>Apache's parent and children communicate with each other through -something called the scoreboard. Ideally this should be implemented -in shared memory. For those operating systems that we either have -access to, or have been given detailed ports for, it typically is -implemented using shared memory. The rest default to using an -on-disk file. The on-disk file is not only slow, but it is unreliable -(and less featured). 
Peruse the <CODE>src/main/conf.h</CODE> file -for your architecture and look for either <CODE>USE_MMAP_SCOREBOARD</CODE> or -<CODE>USE_SHMGET_SCOREBOARD</CODE>. Defining one of those two (as -well as their companions <CODE>HAVE_MMAP</CODE> and <CODE>HAVE_SHMGET</CODE> -respectively) enables the supplied shared memory code. If your system has -another type of shared memory, edit the file <CODE>src/main/http_main.c</CODE> -and add the hooks necessary to use it in Apache. (Send us back a patch -too please.) - -<P>Historical note: The Linux port of Apache didn't start to use -shared memory until version 1.2 of Apache. This oversight resulted -in really poor and unreliable behaviour of earlier versions of Apache -on Linux. - -<H4><CODE>DYNAMIC_MODULE_LIMIT</CODE></H4> - -<P>If you have no intention of using dynamically loaded modules -(you probably don't if you're reading this and tuning your -server for every last ounce of performance) then you should add -<CODE>-DDYNAMIC_MODULE_LIMIT=0</CODE> when building your server. -This will save RAM that's allocated only for supporting dynamically -loaded modules. - -<hr> - -<H3><a name="trace">Appendix: Detailed Analysis of a Trace</a></H3> - -Here is a system call trace of Apache 1.3 running on Linux. The run-time -configuration file is essentially the default plus: - -<BLOCKQUOTE><PRE> -<Directory /> - AllowOverride none - Options FollowSymLinks -</Directory> -</PRE></BLOCKQUOTE> - -The file being requested is a static 6K file of no particular content. -Traces of non-static requests or requests with content negotiation -look wildly different (and quite ugly in some cases). First the -entire trace, then we'll examine details. (This was generated by -the <CODE>strace</CODE> program, other similar programs include -<CODE>truss</CODE>, <CODE>ktrace</CODE>, and <CODE>par</CODE>.) 
- -<BLOCKQUOTE><PRE> -accept(15, {sin_family=AF_INET, sin_port=htons(22283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3 -flock(18, LOCK_UN) = 0 -sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0 -getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 -setsockopt(3, IPPROTO_TCP1, [1], 4) = 0 -read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60 -sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0 -time(NULL) = 873959960 -gettimeofday({873959960, 404935}, NULL) = 0 -stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0 -open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4 -mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000 -writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389 -close(4) = 0 -time(NULL) = 873959960 -write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71 -gettimeofday({873959960, 417742}, NULL) = 0 -times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747 -shutdown(3, 1 /* send */) = 0 -oldselect(4, [3], NULL, [3], {2, 0}) = 1 (in [3], left {2, 0}) -read(3, "", 2048) = 0 -close(3) = 0 -sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0 -munmap(0x400ee000, 6144) = 0 -flock(18, LOCK_EX) = 0 -</PRE></BLOCKQUOTE> - -<P>Notice the accept serialization: - -<BLOCKQUOTE><PRE> -flock(18, LOCK_UN) = 0 -... -flock(18, LOCK_EX) = 0 -</PRE></BLOCKQUOTE> - -These two calls can be removed by defining -<CODE>SINGLE_LISTEN_UNSERIALIZED_ACCEPT</CODE> as described earlier. - -<P>Notice the <CODE>SIGUSR1</CODE> manipulation: - -<BLOCKQUOTE><PRE> -sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0 -... -sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0 -... -sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0 -</PRE></BLOCKQUOTE> - -This is caused by the implementation of graceful restarts. 
When the -parent receives a <CODE>SIGUSR1</CODE> it sends a <CODE>SIGUSR1</CODE> -to all of its children (and it also increments a "generation counter" -in shared memory). Any children that are idle (between connections) -will immediately die -off when they receive the signal. Any children that are in keep-alive -connections, but are in between requests will die off immediately. But -any children that have a connection and are still waiting for the first -request will not die off immediately. - -<P>To see why this is necessary, consider how a browser reacts to a closed -connection. If the connection was a keep-alive connection and the request -being serviced was not the first request then the browser will quietly -reissue the request on a new connection. It has to do this because the -server is always free to close a keep-alive connection in between requests -(<EM>i.e.</EM>, due to a timeout or because of a maximum number of requests). -But, if the connection is closed before the first response has been -received the typical browser will display a "document contains no data" -dialogue (or a broken image icon). This is done on the assumption that -the server is broken in some way (or maybe too overloaded to respond -at all). So Apache tries to avoid ever deliberately closing the connection -before it has sent a single response. This is the cause of those -<CODE>SIGUSR1</CODE> manipulations. - -<P>Note that it is theoretically possible to eliminate all three of -these calls. But in rough tests the gain proved to be almost unnoticeable. 
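<P>The pattern behind those three <CODE>sigaction</CODE> calls can be sketched as below. This is a simplified illustration rather than Apache's code: the function names are hypothetical, and the handler only sets a flag where a real idle child would exit. It shows a child ignoring <CODE>SIGUSR1</CODE> while it holds a connection and re-arming the handler once it is idle again.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; a real idle child would simply exit instead. */
static volatile sig_atomic_t restart_pending = 0;

static void usr1_handler(int sig)
{
    (void)sig;
    restart_pending = 1;
}

/* Install the given disposition (a handler or SIG_IGN) for SIGUSR1.
 * This sketch never restores the default action, hence the assert. */
static int set_usr1(void (*disposition)(int))
{
    struct sigaction sa;

    assert(disposition != SIG_DFL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = disposition;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Connection accepted: a restart must not kill us before we respond. */
int child_busy(void) { return set_usr1(SIG_IGN); }

/* Connection finished: react to restart signals immediately again. */
int child_idle(void) { return set_usr1(usr1_handler); }

/* Has a restart signal arrived while the child was idle? */
int restart_requested(void) { return restart_pending != 0; }
```

<P>A signal that arrives while the child is busy is discarded rather than queued, so the busy child has to learn about the restart some other way; the generation counter mentioned above is what serves that purpose in Apache.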
- -<P>In order to implement virtual hosts, Apache needs to know the -local socket address used to accept the connection: - -<BLOCKQUOTE><PRE> -getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 -</PRE></BLOCKQUOTE> - -It is possible to eliminate this call in many situations (such as when -there are no virtual hosts, or when <CODE>Listen</CODE> directives are -used which do not have wildcard addresses). But no effort has yet been -made to do these optimizations. - -<P>Apache turns off the Nagle algorithm: - -<BLOCKQUOTE><PRE> -setsockopt(3, IPPROTO_TCP1, [1], 4) = 0 -</PRE></BLOCKQUOTE> - -because of problems described in -<A HREF="http://www.isi.edu/~johnh/PAPERS/Heidemann97a.html">a -paper by John Heidemann</A>. - -<P>Notice the two <CODE>time</CODE> calls: - -<BLOCKQUOTE><PRE> -time(NULL) = 873959960 -... -time(NULL) = 873959960 -</PRE></BLOCKQUOTE> - -One of these occurs at the beginning of the request, and the other occurs -as a result of writing the log. At least one of these is required to -properly implement the HTTP protocol. The second occurs because the -Common Log Format dictates that the log record include a timestamp of the -end of the request. A custom logging module could eliminate one of the -calls. Or you can use a method which moves the time into shared memory, -see the <A HREF="#patches">patches section below</A>. - -<P>As described earlier, <CODE>ExtendedStatus On</CODE> causes two -<CODE>gettimeofday</CODE> calls and a call to <CODE>times</CODE>: - -<BLOCKQUOTE><PRE> -gettimeofday({873959960, 404935}, NULL) = 0 -... -gettimeofday({873959960, 417742}, NULL) = 0 -times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747 -</PRE></BLOCKQUOTE> - -These can be removed by setting <CODE>ExtendedStatus Off</CODE> (which -is the default). 
- -<P>It might seem odd to call <CODE>stat</CODE>: - -<BLOCKQUOTE><PRE> -stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0 -</PRE></BLOCKQUOTE> - -This is part of the algorithm which calculates the -<CODE>PATH_INFO</CODE> for use by CGIs. In fact if the request had been -for the URI <CODE>/cgi-bin/printenv/foobar</CODE> then there would be -two calls to <CODE>stat</CODE>. The first for -<CODE>/home/dgaudet/ap/apachen/cgi-bin/printenv/foobar</CODE> -which does not exist, and the second for -<CODE>/home/dgaudet/ap/apachen/cgi-bin/printenv</CODE>, which does exist. -Regardless, at least one <CODE>stat</CODE> call is necessary when -serving static files because the file size and modification times are -used to generate HTTP headers (such as <CODE>Content-Length</CODE>, -<CODE>Last-Modified</CODE>) and implement protocol features (such -as <CODE>If-Modified-Since</CODE>). A somewhat more clever server -could avoid the <CODE>stat</CODE> when serving non-static files, -however doing so in Apache is very difficult given the modular structure. - -<P>All static files are served using <CODE>mmap</CODE>: - -<BLOCKQUOTE><PRE> -mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000 -... -munmap(0x400ee000, 6144) = 0 -</PRE></BLOCKQUOTE> - -On some architectures it's slower to <CODE>mmap</CODE> small -files than it is to simply <CODE>read</CODE> them. The define -<CODE>MMAP_THRESHOLD</CODE> can be set to the minimum -size required before using <CODE>mmap</CODE>. By default -it's set to 0 (except on SunOS4 where experimentation has -shown 8192 to be a better value). Using a tool such as <A -HREF="http://www.bitmover.com/lmbench/">lmbench</A> you -can determine the optimal setting for your environment. - -<P>You may also wish to experiment with <CODE>MMAP_SEGMENT_SIZE</CODE> -(default 32768) which determines the maximum number of bytes that -will be written at a time from mmap()d files. 
Apache only resets the -client's <CODE>Timeout</CODE> in between write()s. So setting this -large may lock out low bandwidth clients unless you also increase the -<CODE>Timeout</CODE>. - -<P>It may even be the case that <CODE>mmap</CODE> isn't -used on your architecture; if so then defining <CODE>USE_MMAP_FILES</CODE> -and <CODE>HAVE_MMAP</CODE> might work (if it works then report back to us). - -<P>Apache does its best to avoid copying bytes around in memory. The -first write of any request typically is turned into a <CODE>writev</CODE> -which combines both the headers and the first hunk of data: - -<BLOCKQUOTE><PRE> -writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389 -</PRE></BLOCKQUOTE> - -When doing HTTP/1.1 chunked encoding Apache will generate up to four -element <CODE>writev</CODE>s. The goal is to push the byte copying -into the kernel, where it typically has to happen anyhow (to assemble -network packets). On testing, various Unixes (BSDI 2.x, Solaris 2.5, -Linux 2.0.31+) properly combine the elements into network packets. -Pre-2.0.31 Linux will not combine, and will create a packet for -each element, so upgrading is a good idea. Defining <CODE>NO_WRITEV</CODE> -will disable this combining, but result in very poor chunked encoding -performance. - -<P>The log write: - -<BLOCKQUOTE><PRE> -write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71 -</PRE></BLOCKQUOTE> - -can be deferred by defining <CODE>BUFFERED_LOGS</CODE>. In this case -up to <CODE>PIPE_BUF</CODE> bytes (a POSIX defined constant) of log entries -are buffered before writing. At no time does it split a log entry -across a <CODE>PIPE_BUF</CODE> boundary because those writes may not -be atomic. (<EM>i.e.</EM>, entries from multiple children could become mixed together). -The code does its best to flush this buffer when a child dies. 
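<P>The buffering rule described above (collect up to <CODE>PIPE_BUF</CODE> bytes, never split an entry across that boundary) can be sketched in C as follows. This is an illustration under stated assumptions, not the <CODE>BUFFERED_LOGS</CODE> code itself: the type <CODE>log_buf</CODE> and the functions <CODE>log_buf_append</CODE> and <CODE>log_buf_flush</CODE> are hypothetical names.

```c
#include <assert.h>
#include <limits.h>   /* PIPE_BUF on most systems */
#include <string.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512  /* fallback: the minimum value POSIX allows */
#endif

/* Hypothetical per-child log buffer; the names are illustrative only. */
struct log_buf {
    int    fd;              /* log file descriptor */
    size_t used;            /* bytes currently buffered */
    char   data[PIPE_BUF];
};

/* Write out whatever is buffered with a single write() call. */
static int log_buf_flush(struct log_buf *b)
{
    ssize_t n;

    if (b->used == 0)
        return 0;
    n = write(b->fd, b->data, b->used);
    b->used = 0;
    return n < 0 ? -1 : 0;
}

/* Queue one complete log entry. If the entry does not fit in the space
 * that is left, the buffer is flushed first, so an entry is never split
 * across two writes. */
static int log_buf_append(struct log_buf *b, const char *entry, size_t len)
{
    assert(b->used <= PIPE_BUF);
    if (len > PIPE_BUF) {
        /* Too large to buffer: flush, then write it on its own. */
        if (log_buf_flush(b) < 0)
            return -1;
        return write(b->fd, entry, len) < 0 ? -1 : 0;
    }
    if (b->used + len > PIPE_BUF && log_buf_flush(b) < 0)
        return -1;
    memcpy(b->data + b->used, entry, len);
    b->used += len;
    return 0;
}
```

A child would also call <CODE>log_buf_flush</CODE> on its way out, which corresponds to the "does its best to flush this buffer when a child dies" remark above.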
- -<P>The lingering close code causes four system calls: - -<BLOCKQUOTE><PRE> -shutdown(3, 1 /* send */) = 0 -oldselect(4, [3], NULL, [3], {2, 0}) = 1 (in [3], left {2, 0}) -read(3, "", 2048) = 0 -close(3) = 0 -</PRE></BLOCKQUOTE> - -which were described earlier. - -<P>Let's apply some of these optimizations: -<CODE>-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT -DBUFFERED_LOGS</CODE> and -<CODE>ExtendedStatus Off</CODE>. Here's the final trace: - -<BLOCKQUOTE><PRE> -accept(15, {sin_family=AF_INET, sin_port=htons(22286), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3 -sigaction(SIGUSR1, {SIG_IGN}, {0x8058c98, [], SA_INTERRUPT}) = 0 -getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 -setsockopt(3, IPPROTO_TCP1, [1], 4) = 0 -read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60 -sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0 -time(NULL) = 873961916 -stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0 -open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4 -mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400e3000 -writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389 -close(4) = 0 -time(NULL) = 873961916 -shutdown(3, 1 /* send */) = 0 -oldselect(4, [3], NULL, [3], {2, 0}) = 1 (in [3], left {2, 0}) -read(3, "", 2048) = 0 -close(3) = 0 -sigaction(SIGUSR1, {0x8058c98, [], SA_INTERRUPT}, {SIG_IGN}) = 0 -munmap(0x400e3000, 6144) = 0 -</PRE></BLOCKQUOTE> - -That's 19 system calls, of which 4 remain relatively easy to remove, -but don't seem worth the effort. - -<H3><A NAME="patches">Appendix: Patches Available</A></H3> - -There are -<A HREF="http://www.arctic.org/~dgaudet/apache/1.3/"> -several performance patches available for 1.3.</A> Although they may -not apply cleanly to the current version, -it shouldn't be difficult for someone with a little C knowledge to -update them. 
In particular: - -<UL> -<LI>A -<A HREF="http://www.arctic.org/~dgaudet/apache/1.3/shared_time.patch" ->patch</A> to remove all <CODE>time(2)</CODE> system calls. -<LI>A -<A HREF="http://www.arctic.org/~dgaudet/apache/1.3/mod_include_speedups.patch" ->patch</A> to remove various system calls from <CODE>mod_include</CODE>, -these calls are used by few sites but required for backwards compatibility. -<LI>A -<A HREF="http://www.arctic.org/~dgaudet/apache/1.3/top_fuel.patch" ->patch</A> which integrates the above two plus a few other speedups at the -cost of removing some functionality. -</UL> - -<H3><a name="preforking">Appendix: The Pre-Forking Model</a></H3> - -<P>Apache (on Unix) is a <EM>pre-forking</EM> model server. The -<EM>parent</EM> process is responsible only for forking <EM>child</EM> -processes, it does not serve any requests or service any network -sockets. The child processes actually process connections, they serve -multiple connections (one at a time) before dying. -The parent spawns new or kills off old -children in response to changes in the load on the server (it does so -by monitoring a scoreboard which the children keep up to date). - -<P>This model for servers offers a robustness that other models do -not. In particular, the parent code is very simple, and with a high -degree of confidence the parent will continue to do its job without -error. The children are complex, and when you add in third party -code via modules, you risk segmentation faults and other forms of -corruption. Even should such a thing happen, it only affects one -connection and the server continues serving requests. The parent -quickly replaces the dead child. - -<P>Pre-forking is also very portable across dialects of Unix. -Historically this has been an important goal for Apache, and it continues -to remain so. - -<P>The pre-forking model comes under criticism for various -performance aspects. 
Of particular concern are the overhead -of forking a process, the overhead of context switches between -processes, and the memory overhead of having multiple processes. -Furthermore it does not offer as many opportunities for data-caching -between requests (such as a pool of <CODE>mmapped</CODE> files). -Various other models exist and extensive analysis can be found in the -<A HREF="http://www.cs.wustl.edu/~jxh/research/research.html"> papers -of the JAWS project</A>. In practice all of these costs vary drastically -depending on the operating system. - -<P>Apache's core code is already multithread aware, and Apache version -1.3 is multithreaded on NT. There have been at least two other experimental -implementations of threaded Apache, one using the 1.3 code base on DCE, -and one using a custom user-level threads package and the 1.0 code base; -neither is publicly available. There is also an experimental port of -Apache 1.3 to <A HREF="http://www.mozilla.org/docs/refList/refNSPR/"> -Netscape's Portable Run Time</A>, which -<A HREF="http://www.arctic.org/~dgaudet/apache/2.0/">is available</A> -(but you're encouraged to join the -<A HREF="http://dev.apache.org/mailing-lists">new-httpd mailing list</A> -if you intend to use it). -Part of our redesign for version 2.0 -of Apache will include abstractions of the server model so that we -can continue to support the pre-forking model, and also support various -threaded models. 
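<P>The robustness argument made in this appendix (a crashing child takes down only itself, while the parent keeps running) can be illustrated with a short C sketch. It is heavily simplified and is not Apache code: there is no scoreboard, no socket handling and no respawning, and the names <CODE>run_children</CODE> and <CODE>demo_child</CODE> are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a child's request loop: the child in slot 1
 * dies from SIGSEGV, the way a buggy third-party module might. */
static int demo_child(int slot)
{
    if (slot == 1)
        raise(SIGSEGV);
    return 0;
}

/* Fork nchildren processes, run child_main(slot) in each one, and reap
 * them all. Returns the number of children that exited cleanly, or -1
 * if a fork() failed. */
static int run_children(int nchildren, int (*child_main)(int))
{
    int started = 0, clean = 0, failed = 0;

    assert(nchildren >= 0);
    for (int slot = 0; slot < nchildren; slot++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(child_main(slot));  /* the child never returns to this loop */
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }
    return failed ? -1 : clean;
}
```

Calling <CODE>run_children(3, demo_child)</CODE> starts three children, one of which is killed by the signal; the parent survives and still collects the other two. A real pre-forking parent would go on to fork a replacement at that point.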
- -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/rewriteguide.html b/docs/manual/misc/rewriteguide.html deleted file mode 100644 index 0f469bd8b0..0000000000 --- a/docs/manual/misc/rewriteguide.html +++ /dev/null @@ -1,1906 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML><HEAD> -<TITLE>Apache 1.3 URL Rewriting Guide</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<BLOCKQUOTE> -<!--#include virtual="header.html" --> - -<DIV ALIGN=CENTER> - -<H1> -Apache 1.3<BR> -URL Rewriting Guide<BR> -</H1> - -<ADDRESS>Originally written by<BR> -Ralf S. Engelschall <rse@apache.org><BR> -December 1997</ADDRESS> - -</DIV> - -<P> -This document supplements the mod_rewrite <A -HREF="../mod/mod_rewrite.html">reference documentation</A>. It describes -how one can use Apache's mod_rewrite to solve typical URL-based problems -webmasters are usually confronted with in practice. I give detailed -descriptions on how to solve each problem by configuring URL rewriting -rulesets. - -<H2><A name="ToC1">Introduction to mod_rewrite</A></H2> - -The Apache module mod_rewrite is a killer one, i.e. it is a really -sophisticated module which provides a powerful way to do URL manipulations. -With it you can nearly do all types of URL manipulations you ever dreamed -about. The price you have to pay is to accept complexity, because -mod_rewrite's major drawback is that it is not easy to understand and use for -the beginner. And even Apache experts sometimes discover new aspects where -mod_rewrite can help. -<P> -In other words: With mod_rewrite you either shoot yourself in the foot the -first time and never use it again or love it for the rest of your life because -of its power. 
This paper tries to give you a few initial successes, so that you -avoid the first case, by presenting solutions that have already been worked out. - -<H2><A name="ToC2">Practical Solutions</A></H2> - -Here come a lot of practical solutions I've either invented myself or -collected from other people's solutions in the past. Feel free to learn the -black magic of URL rewriting from these examples. - -<P> -<TABLE BGCOLOR="#FFE0E0" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD> -ATTENTION: Depending on your server configuration it may be necessary to -slightly change the examples for your situation, e.g. by adding the [PT] flag -when additionally using mod_alias and mod_userdir, or by rewriting a ruleset -to fit in <CODE>.htaccess</CODE> context instead of per-server context. Always try -to understand what a particular ruleset really does before you use it. This -avoids problems. -</TD></TR></TABLE> - -<H1>URL Layout</H1> - -<P> -<H2>Canonical URLs</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -On some webservers there is more than one URL for a resource. Usually there -are canonical URLs (which should actually be used and distributed) and those -which are just shortcuts, internal ones, etc. Independent of which URL the user -supplied with the request, he should finally see only the canonical one. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We do an external HTTP redirect for all non-canonical URLs to fix them in the -location view of the browser and for all subsequent requests. In the example -ruleset below we replace <CODE>/~user</CODE> by the canonical <CODE>/u/user</CODE> and -fix a missing trailing slash for <CODE>/u/user</CODE>.
- -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^/<STRONG>~</STRONG>([^/]+)/?(.*) /<STRONG>u</STRONG>/$1/$2 [<STRONG>R</STRONG>] -RewriteRule ^/([uge])/(<STRONG>[^/]+</STRONG>)$ /$1/$2<STRONG>/</STRONG> [<STRONG>R</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Canonical Hostnames</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -... - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC] -RewriteCond %{HTTP_HOST} !^$ -RewriteCond %{SERVER_PORT} !^80$ -RewriteRule ^/(.*) http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R] -RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC] -RewriteCond %{HTTP_HOST} !^$ -RewriteRule ^/(.*) http://fully.qualified.domain.name/$1 [L,R] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Moved DocumentRoot</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Usually the DocumentRoot of the webserver directly relates to the URL -``<CODE>/</CODE>''. But often this data is not really of top-level priority; it is -perhaps just one entity among a lot of data pools. For instance at our Intranet -sites there are <CODE>/e/www/</CODE> (the homepage for WWW), <CODE>/e/sww/</CODE> (the -homepage for the Intranet) etc. Now because the data of the DocumentRoot stays -at <CODE>/e/www/</CODE> we had to make sure that all inlined images and other -stuff inside this data pool work for subsequent requests. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We just redirect the URL <CODE>/</CODE> to <CODE>/e/www/</CODE>. While it seems -trivial, it is actually trivial only with mod_rewrite, because the typical -old mechanisms of URL <EM>Aliases</EM> (as provided by mod_alias and friends) -only used <EM>prefix</EM> matching. With this you cannot do such a redirection -because the DocumentRoot is a prefix of all URLs.
With mod_rewrite it takes -a single rule: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteRule <STRONG>^/$</STRONG> /e/www/ [<STRONG>R</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Trailing Slash Problem</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Every webmaster can sing a song about the problem of the trailing slash on -URLs referencing directories. If it is missing, the server reports an error, -because if you say <CODE>/~quux/foo</CODE> instead of -<CODE>/~quux/foo/</CODE> then the server searches for a <EM>file</EM> named -<CODE>foo</CODE>. And because this file is a directory it complains. Actually -it tries to fix this itself in most cases, but sometimes this mechanism -needs to be emulated by you, for instance after you have done a lot of -complicated URL rewritings to CGI scripts etc. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -The solution to this subtle problem is to let the server add the trailing -slash automatically. To do this correctly we have to use an external redirect, -so the browser correctly requests subsequent images etc. If we only did an -internal rewrite, this would only work for the directory page, but would go -wrong when any images are included into this page with relative URLs, because -the browser would resolve them against the parent directory. For instance, a request for -<CODE>image.gif</CODE> in <CODE>/~quux/foo/index.html</CODE> would become -<CODE>/~quux/image.gif</CODE> without the external redirect! -<P> -So, to do this trick we write: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteRule ^foo<STRONG>$</STRONG> foo<STRONG>/</STRONG> [<STRONG>R</STRONG>] -</PRE></TD></TR></TABLE> - -<P> -The crazy and lazy can even do the following in the top-level -<CODE>.htaccess</CODE> file of their homedir. But notice that this creates some -processing overhead.
- -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteCond %{REQUEST_FILENAME} <STRONG>-d</STRONG> -RewriteRule ^(.+<STRONG>[^/]</STRONG>)$ $1<STRONG>/</STRONG> [R] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Webcluster through Homogeneous URL Layout</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -We want to create a homogeneous and consistent URL layout over all WWW servers -on an Intranet webcluster, i.e. all URLs (per definition server local and thus -server dependent!) actually become server <EM>independent</EM>! What we want is -to give the WWW namespace a consistent server-independent layout: no URL -should have to include any physically correct target server. The cluster -itself should drive us automatically to the physical target host. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -First, the knowledge of the target servers comes from (distributed) external -maps which contain information about where our users, groups and entities stay. -They have the form - -<P><PRE> -user1 server_of_user1 -user2 server_of_user2 -: : -</PRE><P> - -We put them into files <CODE>map.xxx-to-host</CODE>. Second we need to instruct -all servers to redirect URLs of the forms - -<P><PRE> -/u/user/anypath -/g/group/anypath -/e/entity/anypath -</PRE><P> - -to - -<P><PRE> -http://physical-host/u/user/anypath -http://physical-host/g/group/anypath -http://physical-host/e/entity/anypath -</PRE><P> - -when the URL is not locally valid to a server.
The following ruleset does -this for us with the help of the map files (assuming that server0 is a default -server which will be used if a user has no entry in the map): - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on - -RewriteMap user-to-host txt:/path/to/map.user-to-host -RewriteMap group-to-host txt:/path/to/map.group-to-host -RewriteMap entity-to-host txt:/path/to/map.entity-to-host - -RewriteRule ^/u/<STRONG>([^/]+)</STRONG>/?(.*) http://<STRONG>${user-to-host:$1|server0}</STRONG>/u/$1/$2 -RewriteRule ^/g/<STRONG>([^/]+)</STRONG>/?(.*) http://<STRONG>${group-to-host:$1|server0}</STRONG>/g/$1/$2 -RewriteRule ^/e/<STRONG>([^/]+)</STRONG>/?(.*) http://<STRONG>${entity-to-host:$1|server0}</STRONG>/e/$1/$2 - -RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/ -RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\ -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Move Homedirs to Different Webserver</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -A lot of webmasters have asked for a solution to the following situation: They -wanted to redirect all homedirs on a webserver to another webserver. -They usually need such things when establishing a newer webserver which will -replace the old one over time. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -The solution is trivial with mod_rewrite. On the old webserver we just -redirect all <CODE>/~user/anypath</CODE> URLs to -<CODE>http://newserver/~user/anypath</CODE>. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteRule ^/~(.+) http://<STRONG>newserver</STRONG>/~$1 [R,L] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Structured Homedirs</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Some sites with thousands of users use a structured homedir layout, -i.e. each homedir is in a subdirectory which begins for instance with the -first character of the username.
So, <CODE>/~foo/anypath</CODE> is -<CODE>/home/<STRONG>f</STRONG>/foo/.www/anypath</CODE> while <CODE>/~bar/anypath</CODE> is -<CODE>/home/<STRONG>b</STRONG>/bar/.www/anypath</CODE>. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We use the following ruleset to expand the tilde URLs into exactly the above -layout. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteRule ^/~(<STRONG>([a-z])</STRONG>[a-z0-9]+)(.*) /home/<STRONG>$2</STRONG>/$1/.www$3 -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Filesystem Reorganisation</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -This really is a hardcore example: a killer application which heavily uses -per-directory <CODE>RewriteRules</CODE> to get a smooth look and feel on the Web -while its data structure is never touched or adjusted. - -Background: <STRONG><EM>net.sw</EM></STRONG> is my archive of freely available Unix -software packages, which I started to collect in 1992. It is both my hobby and -job to do this, because while I'm studying computer science I have also worked -for many years as a system and network administrator in my spare time.
Every -week I need some sort of software, so I created a deep hierarchy of -directories where I stored the packages: - -<P><PRE> -drwxrwxr-x 2 netsw users 512 Aug 3 18:39 Audio/ -drwxrwxr-x 2 netsw users 512 Jul 9 14:37 Benchmark/ -drwxrwxr-x 12 netsw users 512 Jul 9 00:34 Crypto/ -drwxrwxr-x 5 netsw users 512 Jul 9 00:41 Database/ -drwxrwxr-x 4 netsw users 512 Jul 30 19:25 Dicts/ -drwxrwxr-x 10 netsw users 512 Jul 9 01:54 Graphic/ -drwxrwxr-x 5 netsw users 512 Jul 9 01:58 Hackers/ -drwxrwxr-x 8 netsw users 512 Jul 9 03:19 InfoSys/ -drwxrwxr-x 3 netsw users 512 Jul 9 03:21 Math/ -drwxrwxr-x 3 netsw users 512 Jul 9 03:24 Misc/ -drwxrwxr-x 9 netsw users 512 Aug 1 16:33 Network/ -drwxrwxr-x 2 netsw users 512 Jul 9 05:53 Office/ -drwxrwxr-x 7 netsw users 512 Jul 9 09:24 SoftEng/ -drwxrwxr-x 7 netsw users 512 Jul 9 12:17 System/ -drwxrwxr-x 12 netsw users 512 Aug 3 20:15 Typesetting/ -drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/ -</PRE><P> - -In July 1996 I decided to make this archive public to the world via a -nice Web interface. "Nice" means that I wanted to -offer an interface where you can browse directly through the archive hierarchy. -And "nice" means that I didn't want to change anything inside this hierarchy -- not even by putting some CGI scripts at the top of it. Why? Because the -above structure should be later accessible via FTP as well, and I didn't -want any Web or CGI stuff to be there. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -The solution has two parts: The first is a set of CGI scripts which create all -the pages at all directory levels on-the-fly.
I put them under -<CODE>/e/netsw/.www/</CODE> as follows: - -<P><PRE> --rw-r--r-- 1 netsw users 1318 Aug 1 18:10 .wwwacl -drwxr-xr-x 18 netsw users 512 Aug 5 15:51 DATA/ --rw-rw-rw- 1 netsw users 372982 Aug 5 16:35 LOGFILE --rw-r--r-- 1 netsw users 659 Aug 4 09:27 TODO --rw-r--r-- 1 netsw users 5697 Aug 1 18:01 netsw-about.html --rwxr-xr-x 1 netsw users 579 Aug 2 10:33 netsw-access.pl --rwxr-xr-x 1 netsw users 1532 Aug 1 17:35 netsw-changes.cgi --rwxr-xr-x 1 netsw users 2866 Aug 5 14:49 netsw-home.cgi -drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/ --rwxr-xr-x 1 netsw users 24050 Aug 5 15:49 netsw-lsdir.cgi --rwxr-xr-x 1 netsw users 1589 Aug 3 18:43 netsw-search.cgi --rwxr-xr-x 1 netsw users 1885 Aug 1 17:41 netsw-tree.cgi --rw-r--r-- 1 netsw users 234 Jul 30 16:35 netsw-unlimit.lst -</PRE><P> - -The <CODE>DATA/</CODE> subdirectory holds the above directory structure, i.e. the -real <STRONG><EM>net.sw</EM></STRONG> stuff and gets automatically updated via -<CODE>rdist</CODE> from time to time. - -The second part of the problem remains: how to link these two structures -together into one smooth-looking URL tree? We want to hide the <CODE>DATA/</CODE> -directory from the user while running the appropriate CGI scripts for the -various URLs. - -Here is the solution: first I put the following into the per-directory -configuration file in the Document Root of the server to rewrite the announced -URL <CODE>/net.sw/</CODE> to the internal path <CODE>/e/netsw</CODE>: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^net.sw$ net.sw/ [R] -RewriteRule ^net.sw/(.*)$ e/netsw/$1 -</PRE></TD></TR></TABLE> - -<P> -The first rule is for requests which miss the trailing slash! The second rule -does the real thing. 
And then comes the killer configuration which stays in -the per-directory config file <CODE>/e/netsw/.www/.wwwacl</CODE>: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -Options ExecCGI FollowSymLinks Includes MultiViews - -RewriteEngine on - -# we are reached via /net.sw/ prefix -RewriteBase /net.sw/ - -# first we rewrite the root dir to -# the handling cgi script -RewriteRule ^$ netsw-home.cgi [L] -RewriteRule ^index\.html$ netsw-home.cgi [L] - -# strip out the subdirs when -# the browser requests us from perdir pages -RewriteRule ^.+/(netsw-[^/]+/.+)$ $1 [L] - -# and now break the rewriting for local files -RewriteRule ^netsw-home\.cgi.* - [L] -RewriteRule ^netsw-changes\.cgi.* - [L] -RewriteRule ^netsw-search\.cgi.* - [L] -RewriteRule ^netsw-tree\.cgi$ - [L] -RewriteRule ^netsw-about\.html$ - [L] -RewriteRule ^netsw-img/.*$ - [L] - -# anything else is a subdir which gets handled -# by another cgi script -RewriteRule !^netsw-lsdir\.cgi.* - [C] -RewriteRule (.*) netsw-lsdir.cgi/$1 -</PRE></TD></TR></TABLE> - -<P> -Some hints for interpretation: - <ol> - <li> Notice the L (last) flag and no substitution field ('-') in the - fourth part - <li> Notice the ! (not) character and the C (chain) flag - at the first rule in the last part - <li> Notice the catch-all pattern in the last rule - </ol> - -</DL> - -<P> -<H2>NCSA imagemap to Apache mod_imap</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -When switching from the NCSA webserver to the more modern Apache webserver a -lot of people want a smooth transition. So they want pages which use their old -NCSA <CODE>imagemap</CODE> program to work under Apache with the modern -<CODE>mod_imap</CODE>. The problem is that there are a lot of -hyperlinks around which reference the <CODE>imagemap</CODE> program via -<CODE>/cgi-bin/imagemap/path/to/page.map</CODE>. Under Apache this -has to read just <CODE>/path/to/page.map</CODE>.
- -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We use a global rule to remove the prefix on-the-fly for all requests: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteRule ^/cgi-bin/imagemap(.*) $1 [PT] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Search pages in more than one directory</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Sometimes it is necessary to let the webserver search for pages in more than -one directory. Here MultiViews or other techniques cannot help. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We program an explicit ruleset which searches for the files in the directories. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on - -# first try to find it in dir1/... -# ...and if found stop and be happy: -RewriteCond /your/docroot/<STRONG>dir1</STRONG>/%{REQUEST_FILENAME} -f -RewriteRule ^(.+) /your/docroot/<STRONG>dir1</STRONG>/$1 [L] - -# second try to find it in dir2/... -# ...and if found stop and be happy: -RewriteCond /your/docroot/<STRONG>dir2</STRONG>/%{REQUEST_FILENAME} -f -RewriteRule ^(.+) /your/docroot/<STRONG>dir2</STRONG>/$1 [L] - -# else go on for other Alias or ScriptAlias directives, -# etc. -RewriteRule ^(.+) - [PT] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Set Environment Variables According To URL Parts</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Perhaps you want to keep status information between requests and use the URL -to encode it. But you don't want to use a CGI wrapper for all pages just to -strip out this information. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We use a rewrite rule to strip out the status information and remember it via -an environment variable which can be later dereferenced from within XSSI or -CGI.
This way a URL <CODE>/foo/S=java/bar/</CODE> gets translated to -<CODE>/foo/bar/</CODE> and the environment variable named <CODE>STATUS</CODE> is set -to the value "java". - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteRule ^(.*)/<STRONG>S=([^/]+)</STRONG>/(.*) $1/$3 [E=<STRONG>STATUS:$2</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Virtual User Hosts</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Assume that you want to provide <CODE>www.<STRONG>username</STRONG>.host.domain.com</CODE> -for the homepage of username via just DNS A records to the same machine and -without any virtualhosts on this machine. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -For HTTP/1.0 requests there is no solution, but for HTTP/1.1 requests which -contain a Host: HTTP header we can use the following ruleset to rewrite -<CODE>http://www.username.host.com/anypath</CODE> internally to -<CODE>/home/username/anypath</CODE>: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteCond %{<STRONG>HTTP_HOST</STRONG>} ^www\.<STRONG>[^.]+</STRONG>\.host\.com$ -RewriteRule ^(.+) %{HTTP_HOST}$1 [C] -RewriteRule ^www\.<STRONG>([^.]+)</STRONG>\.host\.com(.*) /home/<STRONG>$1</STRONG>$2 -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Redirect Homedirs For Foreigners</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -We want to redirect homedir URLs to another webserver -<CODE>www.somewhere.com</CODE> when the requesting user does not stay in the local -domain <CODE>ourdomain.com</CODE>. This is sometimes used in virtual host -contexts. 
- -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -Just a rewrite condition: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteCond %{REMOTE_HOST} <STRONG>!^.+\.ourdomain\.com$</STRONG> -RewriteRule ^(/~.+) http://www.somewhere.com/$1 [R,L] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Redirect Failing URLs To Other Webserver</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -A typical FAQ about URL rewriting is how to redirect failing requests on -webserver A to webserver B. Usually this is done via ErrorDocument -CGI-scripts in Perl, but there is also a mod_rewrite solution. But notice that -this performs worse than using an ErrorDocument CGI-script! - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -The first solution has the best performance but less flexibility and is less -error safe: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteCond /your/docroot/%{REQUEST_FILENAME} <STRONG>!-f</STRONG> -RewriteRule ^(.+) http://<STRONG>webserverB</STRONG>.dom/$1 -</PRE></TD></TR></TABLE> - -<P> -The problem here is that this will only work for pages inside the -DocumentRoot. While you can add more Conditions (for instance to also handle -homedirs, etc.) there is a better variant: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteCond %{REQUEST_URI} <STRONG>!-U</STRONG> -RewriteRule ^(.+) http://<STRONG>webserverB</STRONG>.dom/$1 -</PRE></TD></TR></TABLE> - -<P> -This uses the URL look-ahead feature of mod_rewrite. The result is that this -will work for all types of URLs and is a safe way. But it has a performance -impact on the webserver, because for every request there is one more internal -subrequest. So, if your webserver runs on a powerful CPU, use this one. If it -is a slow machine, use the first approach or, better, an ErrorDocument -CGI-script.
- -</DL> - -<P> -<H2>Extended Redirection</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Sometimes we need more control (concerning the character escaping mechanism) -of URLs on redirects. Usually the Apache kernel's URL escape function also -escapes anchors, i.e. URLs like "url#anchor". You cannot use this directly on -redirects with mod_rewrite because the uri_escape() function of Apache would -also escape the hash character. How can we redirect to such a URL? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We have to use a kludge: an NPH-CGI script which does the redirect -itself, because here no escaping is done (NPH=non-parsed headers). First -we introduce a new URL scheme <CODE>xredirect:</CODE> by the following per-server -config-line (should be one of the last rewrite rules): - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^xredirect:(.+) /path/to/nph-xredirect.cgi/$1 \ - [T=application/x-httpd-cgi,L] -</PRE></TD></TR></TABLE> - -<P> -This forces all URLs prefixed with <CODE>xredirect:</CODE> to be piped through the -<CODE>nph-xredirect.cgi</CODE> program. And this program just looks like: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -<PRE> -#!/path/to/perl -## -## nph-xredirect.cgi -- NPH/CGI script for extended redirects -## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
-## - -$| = 1; -$url = $ENV{'PATH_INFO'}; - -print "HTTP/1.0 302 Moved Temporarily\n"; -print "Server: $ENV{'SERVER_SOFTWARE'}\n"; -print "Location: $url\n"; -print "Content-type: text/html\n"; -print "\n"; -print "<html>\n"; -print "<head>\n"; -print "<title>302 Moved Temporarily (EXTENDED)</title>\n"; -print "</head>\n"; -print "<body>\n"; -print "<h1>Moved Temporarily (EXTENDED)</h1>\n"; -print "The document has moved <a HREF=\"$url\">here</a>.<p>\n"; -print "</body>\n"; -print "</html>\n"; - -##EOF## -</PRE> -</PRE></TD></TR></TABLE> - -<P> -This provides you with the functionality to do redirects to all URL schemes, -i.e. including the ones which are not directly accepted by mod_rewrite. For -instance you can now also redirect to <CODE>news:newsgroup</CODE> via - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^anyurl xredirect:news:newsgroup -</PRE></TD></TR></TABLE> - -<P> -Notice: You must not put [R] or [R,L] on the above rule because the -<CODE>xredirect:</CODE> needs to be expanded later by our special "pipe through" -rule above. - -</DL> - -<P> -<H2>Archive Access Multiplexer</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Do you know the great CPAN (Comprehensive Perl Archive Network) under <A -HREF="http://www.perl.com/CPAN">http://www.perl.com/CPAN</A>? This does a -redirect to one of several FTP servers around the world which carry a CPAN -mirror and are approximately near the location of the requesting client. -Actually this can be called an FTP access multiplexing service. While CPAN -runs via CGI scripts, how can a similar approach be implemented via mod_rewrite? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -First we notice that from version 3.0.0 mod_rewrite can also use the "ftp:" -scheme on redirects. And second, the location approximation can be done by a -rewritemap over the top-level domain of the client.
With a tricky chained -ruleset we can use this top-level domain as a key to our multiplexing map. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteMap multiplex txt:/path/to/map.cxan -RewriteRule ^/CxAN/(.*) %{REMOTE_HOST}::$1 [C] -RewriteRule ^.+\.<STRONG>([a-zA-Z]+)</STRONG>::(.*)$ ${multiplex:<STRONG>$1</STRONG>|ftp.default.dom}$2 [R,L] -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## map.cxan -- Multiplexing Map for CxAN -## - -de ftp://ftp.cxan.de/CxAN/ -uk ftp://ftp.cxan.uk/CxAN/ -com ftp://ftp.cxan.com/CxAN/ - : -##EOF## -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Time-Dependent Rewriting</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -For tricks like time-dependent content, a lot of webmasters -still use CGI scripts which, for instance, redirect to specialized pages. -How can it be done via mod_rewrite? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -There are a lot of variables named <CODE>TIME_xxx</CODE> for rewrite conditions. -In conjunction with the special lexicographic comparison patterns <STRING, ->STRING and =STRING we can do time-dependent redirects: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700 -RewriteCond %{TIME_HOUR}%{TIME_MIN} <1900 -RewriteRule ^foo\.html$ foo.day.html -RewriteRule ^foo\.html$ foo.night.html -</PRE></TD></TR></TABLE> - -<P> -This provides the content of <CODE>foo.day.html</CODE> under the URL -<CODE>foo.html</CODE> from 07:00-19:00 and at the remaining time the contents of -<CODE>foo.night.html</CODE>. Just a nice feature for a homepage... 
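<P>
The conditions above work because <CODE>%{TIME_HOUR}%{TIME_MIN}</CODE> yields a zero-padded four-character string, so plain string comparison orders the values the same way as clock time. The following is a minimal Python sketch of that idea, not mod_rewrite code; the helper names <CODE>hhmm</CODE> and <CODE>pick_page</CODE> are hypothetical and only mirror the ruleset above.

```python
from datetime import time


def hhmm(t: time) -> str:
    """Mimic %{TIME_HOUR}%{TIME_MIN}: zero-padded hour and minute, concatenated."""
    return f"{t.hour:02d}{t.minute:02d}"


def pick_page(t: time) -> str:
    """Return the page the ruleset above would be expected to serve at time t."""
    key = hhmm(t)
    # Plain string comparison, in the spirit of the >STRING and <STRING patterns.
    if key > "0700" and key < "1900":
        return "foo.day.html"
    return "foo.night.html"
```

Because both comparisons are strict, this sketch treats exactly 07:00 and exactly 19:00 as night time.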
- -</DL> - -<P> -<H2>Backward Compatibility for YYYY to XXXX migration</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -How can we make URLs backward compatible (still existing virtually) after -migrating document.YYYY to document.XXXX, e.g. after translating a bunch of -.html files to .phtml? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We just rewrite the name to its basename and test for existence of the new -extension. If it exists, we take that name, else we rewrite the URL to its -original state. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -# backward compatibility ruleset for -# rewriting document.html to document.phtml -# when and only when document.phtml exists -# but no longer document.html -RewriteEngine on -RewriteBase /~quux/ -# parse out basename, but remember the fact -RewriteRule ^(.*)\.html$ $1 [C,E=WasHTML:yes] -# rewrite to document.phtml if exists -RewriteCond %{REQUEST_FILENAME}.phtml -f -RewriteRule ^(.*)$ $1.phtml [S=1] -# else reverse the previous basename cutout -RewriteCond %{ENV:WasHTML} ^yes$ -RewriteRule ^(.*)$ $1.html -</PRE></TD></TR></TABLE> - -</DL> - -<H1>Content Handling</H1> - -<P> -<H2>From Old to New (intern)</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Assume we have recently renamed the page <CODE>foo.html</CODE> to -<CODE>bar.html</CODE> and now want to provide the old URL for backward -compatibility. Actually we do not even want users of the old URL to notice -that the page was renamed. 
- -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We rewrite the old URL to the new one internally via the following rule: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteRule ^<STRONG>foo</STRONG>\.html$ <STRONG>bar</STRONG>.html -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>From Old to New (extern)</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Assume again that we have recently renamed the page <CODE>foo.html</CODE> to -<CODE>bar.html</CODE> and now want to provide the old URL for backward -compatibility. But this time we want the users of the old URL to be pointed -to the new one, i.e. their browser's Location field should change, too. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We force an HTTP redirect to the new URL, which changes what the -browser and thus the user sees: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteRule ^<STRONG>foo</STRONG>\.html$ <STRONG>bar</STRONG>.html [<STRONG>R</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Browser Dependent Content</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -At least for important top-level pages it is sometimes necessary to provide -the optimum of browser-dependent content, i.e. one has to provide a maximum -version for the latest Netscape variants, a minimum version for the Lynx -browsers and an average feature version for all others. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We cannot use content negotiation because the browsers do not provide their -type in that form. Instead we have to act on the HTTP header "User-Agent". -The following config does the following: If the HTTP header "User-Agent" -begins with "Mozilla/3", the page <CODE>foo.html</CODE> is rewritten to -<CODE>foo.NS.html</CODE> and the rewriting stops. 
If the browser is "Lynx" or -"Mozilla" of version 1 or 2 the URL becomes <CODE>foo.20.html</CODE>. All other -browsers receive page <CODE>foo.32.html</CODE>. This is done by the following -ruleset: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{HTTP_USER_AGENT} ^<STRONG>Mozilla/3</STRONG>.* -RewriteRule ^foo\.html$ foo.<STRONG>NS</STRONG>.html [<STRONG>L</STRONG>] - -RewriteCond %{HTTP_USER_AGENT} ^<STRONG>Lynx/</STRONG>.* [OR] -RewriteCond %{HTTP_USER_AGENT} ^<STRONG>Mozilla/[12]</STRONG>.* -RewriteRule ^foo\.html$ foo.<STRONG>20</STRONG>.html [<STRONG>L</STRONG>] - -RewriteRule ^foo\.html$ foo.<STRONG>32</STRONG>.html [<STRONG>L</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Dynamic Mirror</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Assume there are nice webpages on remote hosts we want to bring into our -namespace. For FTP servers we would use the <CODE>mirror</CODE> program which -actually maintains an explicit up-to-date copy of the remote data on the local -machine. For a webserver we could use the program <CODE>webcopy</CODE> which acts -similar via HTTP. But both techniques have one major drawback: The local copy -is always just as up-to-date as often we run the program. It would be much -better if the mirror is not a static one we have to establish explicitly. -Instead we want a dynamic mirror with data which gets updated automatically -when there is need (updated data on the remote host). 
- -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -To provide this feature we map the remote webpage or even the complete remote -webarea to our namespace by the use of the <I>Proxy Throughput</I> feature -(flag [P]): - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteRule ^<STRONG>hotsheet/</STRONG>(.*)$ <STRONG>http://www.tstimpreso.com/hotsheet/</STRONG>$1 [<STRONG>P</STRONG>] -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteRule ^<STRONG>usa-news\.html</STRONG>$ <STRONG>http://www.quux-corp.com/news/index.html</STRONG> [<STRONG>P</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Reverse Dynamic Mirror</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -... - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteCond /mirror/of/remotesite/$1 -U -RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1 -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Retrieve Missing Data from Intranet</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -This is a tricky way of virtually running a corporates (external) Internet -webserver (<CODE>www.quux-corp.dom</CODE>), while actually keeping and maintaining -its data on a (internal) Intranet webserver -(<CODE>www2.quux-corp.dom</CODE>) which is protected by a firewall. The -trick is that on the external webserver we retrieve the requested data -on-the-fly from the internal one. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -First, we have to make sure that our firewall still protects the internal -webserver and that only the external webserver is allowed to retrieve data -from it. 
For a packet-filtering firewall we could for instance configure a -firewall ruleset like the following: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -<STRONG>ALLOW</STRONG> Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port <STRONG>80</STRONG> -<STRONG>DENY</STRONG> Host * Port * --> Host www2.quux-corp.dom Port <STRONG>80</STRONG> -</PRE></TD></TR></TABLE> - -<P> -Just adjust it to your actual configuration syntax. Now we can establish the -mod_rewrite rules which request the missing data in the background through the -proxy throughput feature: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2 -RewriteCond %{REQUEST_FILENAME} <STRONG>!-f</STRONG> -RewriteCond %{REQUEST_FILENAME} <STRONG>!-d</STRONG> -RewriteRule ^/home/([^/]+)/.www/?(.*) http://<STRONG>www2</STRONG>.quux-corp.dom/~$1/pub/$2 [<STRONG>P</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Load Balancing</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Suppose we want to load balance the traffic to <CODE>www.foo.com</CODE> over -<CODE>www[0-5].foo.com</CODE> (a total of 6 servers). How can this be done? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -There are a lot of possible solutions for this problem. We will discuss first -a commonly known DNS-based variant and then the special one with mod_rewrite: - -<ol> -<li><STRONG>DNS Round-Robin</STRONG> - -<P> -The simplest method for load-balancing is to use the DNS round-robin feature -of BIND. Here you just configure <CODE>www[0-5].foo.com</CODE> as usual in your -DNS with A (address) records, e.g. 
- -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -www0 IN A 1.2.3.1 -www1 IN A 1.2.3.2 -www2 IN A 1.2.3.3 -www3 IN A 1.2.3.4 -www4 IN A 1.2.3.5 -www5 IN A 1.2.3.6 -</PRE></TD></TR></TABLE> - -<P> -Then you additionally add the following entry: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -www IN CNAME www0.foo.com. - IN CNAME www1.foo.com. - IN CNAME www2.foo.com. - IN CNAME www3.foo.com. - IN CNAME www4.foo.com. - IN CNAME www5.foo.com. -</PRE></TD></TR></TABLE> - -<P> -Notice that this seems wrong, but is actually an intended feature of BIND and -can be used in this way. Now, when <CODE>www.foo.com</CODE> gets resolved, -BIND gives out <CODE>www0-www5</CODE>, but in a slightly permuted/rotated order -every time. This way the clients are spread over the various servers. - -But notice that this is not a perfect load balancing scheme, because DNS resolve -information gets cached by the other nameservers on the net, so once a client -has resolved <CODE>www.foo.com</CODE> to a particular <CODE>wwwN.foo.com</CODE>, all -subsequent requests also go to this particular name <CODE>wwwN.foo.com</CODE>. But -the final result is ok, because the total sum of the requests is really -spread over the various webservers. - -<P> -<li><STRONG>DNS Load-Balancing</STRONG> - -<P> -A sophisticated DNS-based method for load-balancing is to use the program -<CODE>lbnamed</CODE> which can be found at <A -HREF="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</A>. -It is a Perl 5 program which, in conjunction with auxiliary tools, provides -real load-balancing for DNS. - -<P> -<li><STRONG>Proxy Throughput Round-Robin</STRONG> - -<P> -In this variant we use mod_rewrite and its proxy throughput feature. 
First we -dedicate <CODE>www0.foo.com</CODE> to be actually <CODE>www.foo.com</CODE> by using a -single - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -www IN CNAME www0.foo.com. -</PRE></TD></TR></TABLE> - -<P> -entry in the DNS. Then we convert <CODE>www0.foo.com</CODE> to a proxy-only -server, i.e. we configure this machine so all arriving URLs are just pushed -through the internal proxy to one of the 5 other servers (<CODE>www1-www5</CODE>). -To accomplish this we first establish a ruleset which contacts a load -balancing script <CODE>lb.pl</CODE> for all URLs. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteMap lb prg:/path/to/lb.pl -RewriteRule ^/(.+)$ ${lb:$1} [P,L] -</PRE></TD></TR></TABLE> - -<P> -Then we write <CODE>lb.pl</CODE>: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -#!/path/to/perl -## -## lb.pl -- load balancing script -## - -$| = 1; - -$name = "www"; # the hostname base -$first = 1; # the first server (not 0 here, because 0 is myself) -$last = 5; # the last server in the round-robin -$domain = "foo.dom"; # the domainname - -$cnt = 0; -while (<STDIN>) { - $cnt = (($cnt+1) % ($last+1-$first)); - $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain); - print "http://$server/$_"; -} - -##EOF## -</PRE></TD></TR></TABLE> - -<P> -A last notice: Why is this useful? Seems like <CODE>www0.foo.com</CODE> still is -overloaded? The answer is yes, it is overloaded, but with plain proxy -throughput requests, only! All SSI, CGI, ePerl, etc. processing is completely -done on the other machines. This is the essential point. - -<P> -<li><STRONG>Hardware/TCP Round-Robin</STRONG> - -<P> -There is a hardware solution available, too. Cisco has a beast called -LocalDirector which does a load balancing at the TCP/IP level. Actually this -is some sort of a circuit level gateway in front of a webcluster. 
If you have -enough money and really need a solution with high performance, use this one. - -</ol> - -</DL> - -<P> -<H2>Reverse Proxy</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -... - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## apache-rproxy.conf -- Apache configuration for Reverse Proxy Usage -## - -# server type -ServerType standalone -Port 8000 -MinSpareServers 16 -StartServers 16 -MaxSpareServers 16 -MaxClients 16 -MaxRequestsPerChild 100 - -# server operation parameters -KeepAlive on -MaxKeepAliveRequests 100 -KeepAliveTimeout 15 -Timeout 400 -IdentityCheck off -HostnameLookups off - -# paths to runtime files -PidFile /path/to/apache-rproxy.pid -LockFile /path/to/apache-rproxy.lock -ErrorLog /path/to/apache-rproxy.elog -CustomLog /path/to/apache-rproxy.dlog "%{%v/%T}t %h -> %{SERVER}e URL: %U" - -# unused paths -ServerRoot /tmp -DocumentRoot /tmp -CacheRoot /tmp -RewriteLog /dev/null -TransferLog /dev/null -TypesConfig /dev/null -AccessConfig /dev/null -ResourceConfig /dev/null - -# speed up and secure processing -<Directory /> -Options -FollowSymLinks -SymLinksIfOwnerMatch -AllowOverride None -</Directory> - -# the status page for monitoring the reverse proxy -<Location /apache-rproxy-status> -SetHandler server-status -</Location> - -# enable the URL rewriting engine -RewriteEngine on -RewriteLogLevel 0 - -# define a rewriting map with value-lists where -# mod_rewrite randomly chooses a particular value -RewriteMap server rnd:/path/to/apache-rproxy.conf-servers - -# make sure the status page is handled locally -# and make sure no one uses our proxy except ourself -RewriteRule ^/apache-rproxy-status.* - [L] -RewriteRule ^(http|ftp)://.* - [F] - -# now choose the possible servers for particular URL types -RewriteRule ^/(.*\.(cgi|shtml))$ to://${server:dynamic}/$1 [S=1] -RewriteRule ^/(.*)$ to://${server:static}/$1 - -# and delegate the generated URL by 
passing it -# through the proxy module -RewriteRule ^to://([^/]+)/(.*) http://$1/$2 [E=SERVER:$1,P,L] - -# and make really sure all other stuff is forbidden -# when it should survive the above rules... -RewriteRule .* - [F] - -# enable the Proxy module without caching -ProxyRequests on -NoCache * - -# setup URL reverse mapping for redirect reponses -ProxyPassReverse / http://www1.foo.dom/ -ProxyPassReverse / http://www2.foo.dom/ -ProxyPassReverse / http://www3.foo.dom/ -ProxyPassReverse / http://www4.foo.dom/ -ProxyPassReverse / http://www5.foo.dom/ -ProxyPassReverse / http://www6.foo.dom/ -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## apache-rproxy.conf-servers -- Apache/mod_rewrite selection table -## - -# list of backend servers which serve static -# pages (HTML files and Images, etc.) -static www1.foo.dom|www2.foo.dom|www3.foo.dom|www4.foo.dom - -# list of backend servers which serve dynamically -# generated page (CGI programs or mod_perl scripts) -dynamic www5.foo.dom|www6.foo.dom -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>New MIME-type, New Service</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -On the net there are a lot of nifty CGI programs. But their usage is usually -boring, so a lot of webmaster don't use them. Even Apache's Action handler -feature for MIME-types is only appropriate when the CGI programs don't need -special URLs (actually PATH_INFO and QUERY_STRINGS) as their input. - -First, let us configure a new file type with extension <CODE>.scgi</CODE> -(for secure CGI) which will be processed by the popular <CODE>cgiwrap</CODE> -program. The problem here is that for instance we use a Homogeneous URL Layout -(see above) a file inside the user homedirs has the URL -<CODE>/u/user/foo/bar.scgi</CODE>. But <CODE>cgiwrap</CODE> needs the URL in the form -<CODE>/~user/foo/bar.scgi/</CODE>. 
The following rule solves the problem: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^/[uge]/<STRONG>([^/]+)</STRONG>/\.www/(.+)\.scgi(.*) ... -... /internal/cgi/user/cgiwrap/~<STRONG>$1</STRONG>/$2.scgi$3 [NS,<STRONG>T=application/x-httpd-cgi</STRONG>] -</PRE></TD></TR></TABLE> - -<P> -Or assume we have some more nifty programs: -<CODE>wwwlog</CODE> (which displays the <CODE>access.log</CODE> for a URL subtree) and -<CODE>wwwidx</CODE> (which runs Glimpse on a URL subtree). We have to -provide the URL area to these programs so they know which area -they have to act on. But usually this is ugly, because they are -nearly always requested from within that very area, i.e. typically we would run -the <CODE>wwwidx</CODE> program from within <CODE>/u/user/foo/</CODE> via -hyperlink to - -<P><PRE> -/internal/cgi/user/wwwidx?i=/u/user/foo/ -</PRE><P> - -which is ugly, because we have to hard-code <STRONG>both</STRONG> the location of the -area <STRONG>and</STRONG> the location of the CGI inside the hyperlink. When we have to -reorganise the area, we spend a lot of time changing the various hyperlinks. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -The solution here is to provide a special new URL format which automatically -leads to the proper CGI invocation. We configure the following: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/ -RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3 -</PRE></TD></TR></TABLE> - -<P> -Now the hyperlink to search at <CODE>/u/user/foo/</CODE> reads only - -<P><PRE> -HREF="*" -</PRE><P> - -which internally gets automatically transformed to - -<P><PRE> -/internal/cgi/user/wwwidx?i=/u/user/foo/ -</PRE><P> - -The same approach leads to an invocation for the access log CGI -program when the hyperlink <CODE>:log</CODE> gets used. 
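<P>
To see how the two rules above take a URL apart, the following minimal Python sketch applies the same regular expressions with the standard <CODE>re</CODE> module. It only illustrates the pattern matching and is not how mod_rewrite is implemented; the names <CODE>RULES</CODE> and <CODE>rewrite</CODE> are hypothetical.

```python
import re

# (pattern, substitution) pairs mirroring the two RewriteRule lines above.
# \1, \2, \3 play the role of $1, $2, $3.
RULES = [
    (re.compile(r"^/([uge])/([^/]+)(/?.*)/\*"),
     r"/internal/cgi/user/wwwidx?i=/\1/\2\3/"),
    (re.compile(r"^/([uge])/([^/]+)(/?.*):log"),
     r"/internal/cgi/user/wwwlog?f=/\1/\2\3"),
]


def rewrite(url: str) -> str:
    """Return the substitution of the first matching rule, else the URL unchanged."""
    for pattern, template in RULES:
        match = pattern.search(url)
        if match:
            # The whole URL is replaced by the expanded substitution string.
            return match.expand(template)
    return url
```

With these rules, a relative link <CODE>*</CODE> placed on a page under <CODE>/u/user/foo/</CODE> resolves to <CODE>/u/user/foo/*</CODE>, which the first pattern turns into the <CODE>wwwidx</CODE> invocation for that area.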
- -</DL> - -<P> -<H2>From Static to Dynamic</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -How can we transform a static page <CODE>foo.html</CODE> into a dynamic variant -<CODE>foo.cgi</CODE> in a seamless way, i.e. without the browser/user noticing? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We just rewrite the URL to the CGI-script and force the correct MIME-type so -it really gets run as a CGI-script. This way a request to -<CODE>/~quux/foo.html</CODE> internally leads to the invocation of -<CODE>/~quux/foo.cgi</CODE>. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteBase /~quux/ -RewriteRule ^foo\.<STRONG>html</STRONG>$ foo.<STRONG>cgi</STRONG> [T=<STRONG>application/x-httpd-cgi</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>On-the-fly Content-Regeneration</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Here comes a really esoteric feature: Dynamically generated but statically -served pages, i.e. pages should be delivered as pure static pages (read from -the filesystem and just passed through), but they have to be generated -dynamically by the webserver if missing. This way you can have CGI-generated -pages which are statically served unless one (or a cronjob) removes the static -contents. Then the contents get refreshed. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -This is done via the following ruleset: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{REQUEST_FILENAME} <STRONG>!-s</STRONG> -RewriteRule ^page\.<STRONG>html</STRONG>$ page.<STRONG>cgi</STRONG> [T=application/x-httpd-cgi,L] -</PRE></TD></TR></TABLE> - -<P> -Here a request to <CODE>page.html</CODE> leads to an internal run of a -corresponding <CODE>page.cgi</CODE> if <CODE>page.html</CODE> is still missing or has -a file size of zero. 
The trick here is that <CODE>page.cgi</CODE> is a usual CGI script -which (additionally to its STDOUT) writes its output to the file -<CODE>page.html</CODE>. Once it was run, the server sends out the data of -<CODE>page.html</CODE>. When the webmaster wants to force a refresh the contents, -he just removes <CODE>page.html</CODE> (usually done by a cronjob). - -</DL> - -<P> -<H2>Document With Autorefresh</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Wouldn't it be nice while creating a complex webpage if the webbrowser would -automatically refresh the page every time we write a new version from within -our editor? Impossible? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -No! We just combine the MIME multipart feature, the webserver NPH feature and -the URL manipulation power of mod_rewrite. First, we establish a new URL -feature: Adding just <CODE>:refresh</CODE> to any URL causes this to be refreshed -every time it gets updated on the filesystem. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1 -</PRE></TD></TR></TABLE> - -<P> -Now when we reference the URL - -<P><PRE> -/u/foo/bar/page.html:refresh -</PRE><P> - -this leads to the internal invocation of the URL - -<P><PRE> -/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html -</PRE><P> - -The only missing part is the NPH-CGI script. Although one would usually say -"left as an exercise to the reader" ;-) I will provide this, too. - -<P><PRE> -#!/sw/bin/perl -## -## nph-refresh -- NPH/CGI script for auto refreshing pages -## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved. -## -$| = 1; - -# split the QUERY_STRING variable -@pairs = split(/&/, $ENV{'QUERY_STRING'}); -foreach $pair (@pairs) { - ($name, $value) = split(/=/, $pair); - $name =~ tr/A-Z/a-z/; - $name = 'QS_' . 
$name; - $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; - eval "\$$name = \"$value\""; -} -$QS_s = 1 if ($QS_s eq ''); -$QS_n = 3600 if ($QS_n eq ''); -if ($QS_f eq '') { - print "HTTP/1.0 200 OK\n"; - print "Content-type: text/html\n\n"; - print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n"; - exit(0); -} -if (! -f $QS_f) { - print "HTTP/1.0 200 OK\n"; - print "Content-type: text/html\n\n"; - print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n"; - exit(0); -} - -sub print_http_headers_multipart_begin { - print "HTTP/1.0 200 OK\n"; - $bound = "ThisRandomString12345"; - print "Content-type: multipart/x-mixed-replace;boundary=$bound\n"; - &print_http_headers_multipart_next; -} - -sub print_http_headers_multipart_next { - print "\n--$bound\n"; -} - -sub print_http_headers_multipart_end { - print "\n--$bound--\n"; -} - -sub displayhtml { - local($buffer) = @_; - $len = length($buffer); - print "Content-type: text/html\n"; - print "Content-length: $len\n\n"; - print $buffer; -} - -sub readfile { - local($file) = @_; - local(*FP, $size, $buffer, $bytes); - ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file); - $size = sprintf("%d", $size); - open(FP, "&lt;$file"); - $bytes = sysread(FP, $buffer, $size); - close(FP); - return $buffer; -} - -$buffer = &readfile($QS_f); -&print_http_headers_multipart_begin; -&displayhtml($buffer); - -sub mystat { - local($file) = $_[0]; - local($time); - - ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file); - return $mtime; -} - -$mtimeL = &mystat($QS_f); -$mtime = $mtime; -for ($n = 0; $n &lt; $QS_n; $n++) { - while (1) { - $mtime = &mystat($QS_f); - if ($mtime ne $mtimeL) { - $mtimeL = $mtime; - sleep(2); - $buffer = &readfile($QS_f); - &print_http_headers_multipart_next; - &displayhtml($buffer); - sleep(5); - $mtimeL = &mystat($QS_f); - last; - } - sleep($QS_s); - } -} - -&print_http_headers_multipart_end; - -exit(0); - -##EOF## -</PRE> - -</DL> - -<P> -<H2>Mass Virtual Hosting</H2> -<P> - -<DL> 
-<DT><STRONG>Description:</STRONG> -<DD> -The <CODE>&lt;VirtualHost&gt;</CODE> feature of Apache is nice and works great -when you just have a few dozen virtual hosts. But when you are an ISP and -have hundreds of virtual hosts to provide, this feature is not the best choice. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -To provide this feature we look up the hostname from the Host header of each -request in a rewrite map and map the URL to the corresponding document root, -so a single ruleset in the main server serves all virtual hosts: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## vhost.map -## -www.vhost1.dom:80 /path/to/docroot/vhost1 -www.vhost2.dom:80 /path/to/docroot/vhost2 - : -www.vhostN.dom:80 /path/to/docroot/vhostN -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## httpd.conf -## - : -# use the canonical hostname on redirects, etc. -UseCanonicalName on - - : -# add the virtual host in front of the CLF-format -CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b" - : - -# enable the rewriting engine in the main server -RewriteEngine on - -# define two maps: one for fixing the URL and one which defines -# the available virtual hosts with their corresponding -# DocumentRoot. -RewriteMap lowercase int:tolower -RewriteMap vhost txt:/path/to/vhost.map - -# Now do the actual virtual host mapping -# via a huge and complicated single rule: -# -# 1. make sure we don't map for common locations -RewriteCond %{REQUEST_URI} !^/commonurl1/.* -RewriteCond %{REQUEST_URI} !^/commonurl2/.* - : -RewriteCond %{REQUEST_URI} !^/commonurlN/.* -# -# 2. make sure we have a Host header, because -# currently our approach only supports -# virtual hosting through this header -RewriteCond %{HTTP_HOST} !^$ -# -# 3. lowercase the hostname -RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$ -# -# 4. 
lookup this hostname in vhost.map and -# remember it only when it is a path -# (and not "NONE" from above) -RewriteCond ${vhost:%1} ^(/.*)$ -# -# 5. finally we can map the URL to its docroot location -# and remember the virtual host for logging purposes -RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}] - : -</PRE></TD></TR></TABLE> - -</DL> - -<H1>Access Restriction</H1> - -<P> -<H2>Blocking of Robots</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -How can we block a really annoying robot from retrieving pages of a specific -webarea? A <CODE>/robots.txt</CODE> file containing entries of the "Robot -Exclusion Protocol" is typically not enough to get rid of such a robot. - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We use a ruleset which forbids the URLs of the webarea -<CODE>/~quux/foo/arc/</CODE> (perhaps a very deep directory indexed area where the -robot traversal would create big server load). We have to make sure that we -forbid access only to the particular robot, i.e. just forbidding the host -where the robot runs is not enough. This would block users from this host, -too. We accomplish this by also matching the User-Agent HTTP header -information. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{HTTP_USER_AGENT} ^<STRONG>NameOfBadRobot</STRONG>.* -RewriteCond %{REMOTE_ADDR} ^<STRONG>123\.45\.67\.[8-9]</STRONG>$ -RewriteRule ^<STRONG>/~quux/foo/arc/</STRONG>.+ - [<STRONG>F</STRONG>] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Blocked Inline-Images</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Assume we have under http://www.quux-corp.de/~quux/ some pages with inlined -GIF graphics. These graphics are nice, so others directly incorporate them via -hyperlinks to their pages. We don't like this practice because it adds useless -traffic to our server. 
- -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -While we cannot 100% protect the images from inclusion, we -can at least restrict the cases where the browser sends -a HTTP Referer header. - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{HTTP_REFERER} <STRONG>!^$</STRONG> -RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC] -RewriteRule <STRONG>.*\.gif$</STRONG> - [F] -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{HTTP_REFERER} !^$ -RewriteCond %{HTTP_REFERER} !.*/foo-with-gif\.html$ -RewriteRule <STRONG>^inlined-in-foo\.gif$</STRONG> - [F] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Host Deny</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -How can we forbid a list of externally configured hosts from using our server? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> - -For Apache >= 1.3b6: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteMap hosts-deny txt:/path/to/hosts.deny -RewriteCond ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR] -RewriteCond ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND -RewriteRule ^/.* - [F] -</PRE></TD></TR></TABLE><P> - -For Apache <= 1.3b6: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteMap hosts-deny txt:/path/to/hosts.deny -RewriteRule ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1 -RewriteRule !^NOT-FOUND/.* - [F] -RewriteRule ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1 -RewriteRule !^NOT-FOUND/.* - [F] -RewriteRule ^NOT-FOUND/(.*)$ /$1 -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## hosts.deny -## -## ATTENTION! This is a map, not a list, even when we treat it as such. 
-## mod_rewrite parses it for key/value pairs, so at least a -## dummy value "-" must be present for each entry. -## - -193.102.180.41 - -bsdti1.sdm.de - -192.76.162.40 - -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Proxy Deny</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -How can we forbid a certain host or even a user of a special host from using -the Apache proxy? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We first have to make sure mod_rewrite is below(!) mod_proxy in the -<CODE>Configuration</CODE> file when compiling the Apache webserver. This way it -gets called _before_ mod_proxy. Then we configure the following for a -host-dependent deny... - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{REMOTE_HOST} <STRONG>^badhost\.mydomain\.com$</STRONG> -RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F] -</PRE></TD></TR></TABLE> - -<P>...and this one for a user@host-dependent deny: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <STRONG>^badguy@badhost\.mydomain\.com$</STRONG> -RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Special Authentication Variant</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -Sometimes a very special authentication is needed, for instance an -authentication which checks for a set of explicitly configured users. Only -these should receive access and without explicit prompting (which would occur -when using the Basic Auth via mod_auth). 
- -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -We use a list of rewrite conditions to exclude all except our friends: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <STRONG>!^friend1@client1.quux-corp\.com$</STRONG> -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <STRONG>!^friend2</STRONG>@client2.quux-corp\.com$ -RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <STRONG>!^friend3</STRONG>@client3.quux-corp\.com$ -RewriteRule ^/~quux/only-for-friends/ - [F] -</PRE></TD></TR></TABLE> - -</DL> - -<P> -<H2>Referer-based Deflector</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -How can we program a flexible URL Deflector which acts on the "Referer" HTTP -header and can be configured with as many referring pages as we like? - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -Use the following really tricky ruleset... - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteMap deflector txt:/path/to/deflector.map - -RewriteCond %{HTTP_REFERER} !="" -RewriteCond ${deflector:%{HTTP_REFERER}} ^-$ -RewriteRule ^.* %{HTTP_REFERER} [R,L] - -RewriteCond %{HTTP_REFERER} !="" -RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND -RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L] -</PRE></TD></TR></TABLE> - -<P>... -in conjunction with a corresponding rewrite map: - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -## -## deflector.map -## - -http://www.badguys.com/bad/index.html - -http://www.badguys.com/bad/index2.html - -http://www.badguys.com/bad/index3.html http://somewhere.com/ -</PRE></TD></TR></TABLE> - -<P> -This automatically redirects the request back to the referring page (when "-" -is used as the value in the map) or to a specific URL (when a URL is -specified in the map as the second argument). 
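<P>
The decision the two rule blocks make can be modelled outside the server. The following is only a sketch in Python, not part of mod_rewrite: the function name <CODE>deflect</CODE> and the in-memory dictionary standing in for <CODE>deflector.map</CODE> are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical in-memory copy of deflector.map: referring page -> "-" or a target URL.
DEFLECTOR_MAP: Dict[str, str] = {
    "http://www.badguys.com/bad/index.html": "-",
    "http://www.badguys.com/bad/index2.html": "-",
    "http://www.badguys.com/bad/index3.html": "http://somewhere.com/",
}


def deflect(referer: str, table: Dict[str, str] = DEFLECTOR_MAP) -> Optional[str]:
    """Return the redirect target for a request, or None to serve it normally."""
    if referer == "":
        return None      # no Referer header: neither rule block applies
    value = table.get(referer)
    if value is None:
        return None      # referring page is not listed in the map
    if value == "-":
        return referer   # first rule block: send the client back where it came from
    return value         # second rule block: redirect to the URL given in the map
```

<P>
For example, <CODE>deflect("http://www.badguys.com/bad/index3.html")</CODE> yields <CODE>http://somewhere.com/</CODE>, while a referring page that is not in the map yields <CODE>None</CODE> and the request is served as usual.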
- -</DL> - -<H1>Other</H1> - -<P> -<H2>External Rewriting Engine</H2> -<P> - -<DL> -<DT><STRONG>Description:</STRONG> -<DD> -A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems to be no solution -using mod_rewrite... - -<P> -<DT><STRONG>Solution:</STRONG> -<DD> -Use an external rewrite map, i.e. a program which acts like a rewrite map. It -is run once on startup of Apache, receives the requested URLs on STDIN, and has -to put the resulting (usually rewritten) URL on STDOUT (same order!). - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -RewriteEngine on -RewriteMap quux-map <STRONG>prg:</STRONG>/path/to/map.quux.pl -RewriteRule ^/~quux/(.*)$ /~quux/<STRONG>${quux-map:$1}</STRONG> -</PRE></TD></TR></TABLE> - -<P><TABLE BGCOLOR="#E0E5F5" BORDER="0" CELLSPACING="0" CELLPADDING="5"><TR><TD><PRE> -#!/path/to/perl - -# disable buffered I/O which would lead -# to deadlocks for the Apache server -$| = 1; - -# read URLs one per line from stdin and -# generate substitution URL on stdout -while (<>) { - s|^foo/|bar/|; - print $_; -} -</PRE></TD></TR></TABLE> - -<P> -This is a demonstration-only example and just rewrites all URLs -<CODE>/~quux/foo/...</CODE> to <CODE>/~quux/bar/...</CODE>. Actually you can program -whatever you like. But note that while such maps can also be <STRONG>used</STRONG> by -an average user, only the system administrator can <STRONG>define</STRONG> them. 
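<P>
The same line-in, line-out protocol can be written in other languages. Below is a minimal sketch in Python that mirrors the Perl script above; the function names <CODE>rewrite</CODE> and <CODE>serve</CODE> are made up for this illustration, and the stream parameters exist only so the loop can be exercised without a running server.

```python
import sys


def rewrite(key: str) -> str:
    """Map one lookup key to its substitution, mirroring s|^foo/|bar/|."""
    if key.startswith("foo/"):
        return "bar/" + key[len("foo/"):]
    return key


def serve(instream=sys.stdin, outstream=sys.stdout) -> None:
    """Answer exactly one line per lookup, in the order the lookups arrive."""
    for line in instream:
        outstream.write(rewrite(line.rstrip("\n")) + "\n")
        # Flush after every reply so the server is never left waiting on a
        # buffered answer (the counterpart of "$| = 1" in the Perl version).
        outstream.flush()
```

<P>
To use the sketch as a map program, the script would end with a call to <CODE>serve()</CODE> and be referenced from a <CODE>RewriteMap ... prg:</CODE> line in the same way as the Perl script.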
- -</DL> - -<!--#include virtual="footer.html" --> -</BLOCKQUOTE> -</BODY> -</HTML> diff --git a/docs/manual/misc/security_tips.html b/docs/manual/misc/security_tips.html deleted file mode 100644 index 964d7d89bf..0000000000 --- a/docs/manual/misc/security_tips.html +++ /dev/null @@ -1,231 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Apache HTTP Server: Security Tips</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> -<H1 ALIGN="CENTER">Security Tips for Server Configuration</H1> - -<HR> - -<P>Some hints and tips on security issues in setting up a web server. Some of -the suggestions will be general, others specific to Apache. - -<HR> - -<H2><A NAME="serverroot">Permissions on ServerRoot Directories</A></H2> -<P>In typical operation, Apache is started by the root -user, and it switches to the user defined by the <A -HREF="../mod/core.html#user"><STRONG>User</STRONG></A> directive to serve hits. -As is the case with any command that root executes, you must take care -that it is protected from modification by non-root users. Not only -must the files themselves be writeable only by root, but so must the -directories, and parents of all directories. For example, if you -choose to place ServerRoot in <CODE>/usr/local/apache</CODE> then it is -suggested that you create that directory as root, with commands -like these: - -<BLOCKQUOTE><PRE> - mkdir /usr/local/apache - cd /usr/local/apache - mkdir bin conf logs - chown 0 . bin conf logs - chgrp 0 . bin conf logs - chmod 755 . bin conf logs -</PRE></BLOCKQUOTE> - -It is assumed that /, /usr, and /usr/local are only modifiable by root. 
-When you install the httpd executable, you should ensure that it is -similarly protected: - -<BLOCKQUOTE><PRE> - cp httpd /usr/local/apache/bin - chown 0 /usr/local/apache/bin/httpd - chgrp 0 /usr/local/apache/bin/httpd - chmod 511 /usr/local/apache/bin/httpd -</PRE></BLOCKQUOTE> - -<P>You can create an htdocs subdirectory which is modifiable by other -users -- since root never executes any files out of there, and shouldn't -be creating files in there. - -<P>If you allow non-root users to modify any files that root either -executes or writes to then you open your system to root compromises. -For example, someone could replace the httpd binary so that the next -time you start it, it will execute some arbitrary code. If the logs -directory is writeable (by a non-root user), someone -could replace a log file with a symlink to some other system file, -and then root might overwrite that file with arbitrary data. If the -log files themselves are writeable (by a non-root user), then someone -may be able to overwrite the log itself with bogus data. -<P> -<HR> -<H2>Server Side Includes</H2> -<P>Server side includes (SSI) can be configured so that users can execute -arbitrary programs on the server. That thought alone should send a shiver -down the spine of any sys-admin.<P> - -One solution is to disable that part of SSI. To do that you use the -IncludesNOEXEC option to the <A HREF="../mod/core.html#options">Options</A> -directive.<P> - -<HR> - -<H2>Non Script Aliased CGI</H2> -<P>Allowing users to execute <STRONG>CGI</STRONG> scripts in any directory -should only -be considered if: -<OL> - <LI>You trust your users not to write scripts which will deliberately or -accidentally expose your system to an attack. - <LI>You consider security at your site to be so feeble in other areas, as to -make one more potential hole irrelevant. - <LI>You have no users, and nobody ever visits your server. 
-</OL><P> -<HR> - -<H2>Script Alias'ed CGI</H2> -<P>Limiting <STRONG>CGI</STRONG> to special directories gives the admin -control over -what goes into those directories. This is inevitably more secure than -non script aliased CGI, but <STRONG>only if users with write access to the -directories are trusted</STRONG> or the admin is willing to test each new CGI -script/program for potential security holes.<P> - -Most sites choose this option over the non script aliased CGI approach.<P> - -<HR> -<H2>CGI in general</H2> -<P>Always remember that you must trust the writers of the CGI scripts/programs -or your ability to spot potential security holes in CGI, whether they were -deliberate or accidental.<P> - -All the CGI scripts will run as the same user, so they have the potential to -conflict (accidentally or deliberately) with other scripts <EM>e.g.</EM> -User A hates User B, so he writes a script to trash User B's CGI -database. One program which can be used to allow scripts to run -as different users is <A HREF="../suexec.html">suEXEC</A> which is -included with Apache as of 1.2 and is called from special hooks in -the Apache server code. Another popular way of doing this is with -<A HREF="http://wwwcgi.umr.edu/~cgiwrap/">CGIWrap</A>. <P> - -<HR> - - -<H2>Stopping users overriding system wide settings...</H2> -<P>To run a really tight ship, you'll want to stop users from setting -up <CODE>.htaccess</CODE> files which can override security features -you've configured. Here's one way to do it...<P> - -In the server configuration file, put -<BLOCKQUOTE><CODE> -<Directory /> <BR> -AllowOverride None <BR> -Options None <BR> -Allow from all <BR> -</Directory> <BR> -</CODE></BLOCKQUOTE> - -Then set up exceptions for specific directories.<P> - -This stops all overrides, Includes and accesses in all directories apart -from those named.<P> -<HR> -<H2> - Protect server files by default -</H2> -<P> -One aspect of Apache which is occasionally misunderstood is the feature -of default access. 
That is, unless you take steps to change it, if the -server can find its way to a file through normal URL mapping rules, it -can serve it to clients. -</P> -<P> -For instance, consider the following example: -</P> -<OL> - <LI><SAMP># cd /; ln -s / public_html</SAMP> - </LI> - <LI>Accessing <SAMP>http://localhost/~root/</SAMP> - </LI> -</OL> -<P> -This would allow clients to walk through the entire filesystem. To work -around this, add the following block to your server's configuration: -</P> -<PRE> - <Directory /> - Order Deny,Allow - Deny from all - </Directory> -</PRE> -<P> -This will forbid default access to filesystem locations. Add -appropriate -<A - HREF="../mod/core.html#directory" -><SAMP><Directory></SAMP></A> -blocks to allow access only -in those areas you wish. For example, -</P> -<PRE> - <Directory /usr/users/*/public_html> - Order Deny,Allow - Allow from all - </Directory> - <Directory /usr/local/httpd> - Order Deny,Allow - Allow from all - </Directory> -</PRE> -<P> -Pay particular attention to the interactions of -<A - HREF="../mod/core.html#location" -><SAMP><Location></SAMP></A> -and -<A - HREF="../mod/core.html#directory" -><SAMP><Directory></SAMP></A> -directives; for instance, even if <SAMP><Directory /></SAMP> -denies access, a <SAMP><Location /></SAMP> directive might -overturn it. -</P> -<P> -Also be wary of playing games with the -<A - HREF="../mod/mod_userdir.html#userdir" ->UserDir</A> -directive; setting it to something like <SAMP>"./"</SAMP> -would have the same effect, for root, as the first example above. -If you are using Apache 1.3 or above, we strongly recommend that you -include the following line in your server configuration files: -</P> -<DL> - <DD><SAMP>UserDir disabled root</SAMP> - </DD> -</DL> - -<HR> -<P>Please send any other useful security tips to The Apache Group -by filling out a -<A HREF="http://www.apache.org/bug_report.html">problem report</A>. 
-If you are confident you have found a security bug in the Apache -source code itself, <A -HREF="http://www.apache.org/security_report.html">please let us -know</A>. - -<P> - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> diff --git a/docs/manual/misc/tutorials.html b/docs/manual/misc/tutorials.html deleted file mode 100644 index 90bcdb2d15..0000000000 --- a/docs/manual/misc/tutorials.html +++ /dev/null @@ -1,209 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Apache Tutorials</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<blockquote><strong>Warning:</strong> -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. -</blockquote> - - -<H1 ALIGN="CENTER">Apache Tutorials</H1> - -<P>The following documents give you step-by-step instructions on how -to accomplish common tasks with the Apache http server. Many of these -documents are located at external sites and are not the work of the -Apache Software Foundation. Copyright to documents on external sites -is owned by the authors or their assignees. Please consult the <A -HREF="../">official Apache Server documentation</A> to verify what you -read on external sites. 
- - -<H2>Installation & Getting Started</H2> - -<UL> - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-06-1-001-01-NW-DP-LF" ->Getting Started with Apache 1.3</A> (ApacheToday) - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-07-10-001-01-NW-LF-SW" ->Configuring Your Apache Server Installation</A> (ApacheToday) - -<LI><A -HREF="http://oreilly.apacheweek.com/pub/a/apache/2000/02/24/installing_apache.html" ->Getting, Installing, and Running Apache (on Unix)</A> (O'Reilly -Network Apache DevCenter) - -<LI><A HREF="http://www.builder.com/Servers/Apache/ss01.html">Maximum -Apache: Getting Started</A> (CNET Builder.com) - -<LI><A HREF="http://www.devshed.com/Server_Side/Administration/APACHE/" ->How to Build the Apache of Your Dreams</A> (Developer Shed) - -</UL> - - -<H2>Basic Configuration</H2> - -<UL> - -<LI><A -HREF="http://oreilly.apacheweek.com/pub/a/apache/2000/03/02/configuring_apache.html" ->An Amble Through Apache Configuration</A> (O'Reilly Network Apache -DevCenter) - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-07-19-002-01-NW-LF-SW" ->Using .htaccess Files with Apache</A> (ApacheToday) - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-07-17-001-01-PS" ->Setting Up Virtual Hosts</A> (ApacheToday) - -<LI><A HREF="http://www.builder.com/Servers/Apache/ss02.html">Maximum -Apache: Configure Apache</A> (CNET Builder.com) - -<LI>Getting More Out of Apache <A HREF="http://www.devshed.com/Server_Side/Administration/MoreApache/">Part 1</A> - <A HREF="http://www.devshed.com/Server_Side/Administration/MoreApache2/">Part 2</A> (Developer Shed) - -</UL> - -<H2>Security</H2> - -<UL> - -<LI><A -HREF="http://www.linuxplanet.com/linuxplanet/tutorials/1527/1/">Security -and Apache: An Essential Primer</A> (LinuxPlanet) - -<LI><A HREF="http://www.apacheweek.com/features/userauth">Using User -Authentication</A> (Apacheweek) - -<LI><A HREF="http://www.apacheweek.com/features/dbmauth">DBM User -Authentication</A> 
(Apacheweek) - -<LI><A -HREF="http://linux.com/security/newsitem.phtml?sid=12&aid=3549">An -Introduction to Securing Apache</A> (Linux.com) - -<LI><A -HREF="http://linux.com/security/newsitem.phtml?sid=12&aid=3667">Securing -Apache - Access Control</A> (Linux.com) - -<LI>Apache Authentication <A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-07-24-002-01-NW-LF-SW" ->Part 1</A> - <A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-07-31-001-01-NW-DP-LF" ->Part 2</A> - <A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-08-07-001-01-NW-LF-SW" ->Part 3</A> - <A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-08-14-001-01-NW-LF-SW" ->Part 4</A> (ApacheToday) - -<LI><a href="http://apachetoday.com/news_story.php3?ltsn=2000-11-13-003-01-SC-LF-SW" ->mod_access: Restricting Access by Host</a> (ApacheToday) - -</UL> - -<H2>Logging</H2> - -<UL> - -<LI><A -HREF="http://oreilly.apacheweek.com/pub/a/apache/2000/03/10/log_rhythms.html" ->Log Rhythms</A> (O'Reilly Network Apache DevCenter) - -<LI><A HREF="http://www.apacheweek.com/features/logfiles">Gathering -Visitor Information: Customising Your Logfiles</A> (Apacheweek) - -<LI>Apache Guide: Logging -<A HREF="http://apachetoday.com/news_story.php3?ltsn=2000-08-21-003-01-NW-LF-SW" ->Part 1</A> - -<A HREF="http://apachetoday.com/news_story.php3?ltsn=2000-08-28-001-01-NW-LF-SW" ->Part 2</A> - -<A HREF="http://apachetoday.com/news_story.php3?ltsn=2000-09-05-001-01-NW-LF-SW" ->Part 3</A> - -<A HREF="http://apachetoday.com/news_story.php3?ltsn=2000-09-18-003-01-NW-LF-SW" ->Part 4</A> - -<A HREF="http://apachetoday.com/news_story.php3?ltsn=2000-09-25-001-01-NW-LF-SW" ->Part 5</A> (ApacheToday) - -</UL> - -<H2>CGI and SSI</H2> - -<UL> - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-06-05-001-10-NW-LF-SW" ->Dynamic Content with CGI</A> (ApacheToday) - -<LI><A -HREF="http://www.perl.com/CPAN-local/doc/FAQs/cgi/idiots-guide.html">The -Idiot's Guide to Solving Perl CGI Problems</A> (CPAN) - 
-<LI><A -HREF="http://www.linuxplanet.com/linuxplanet/tutorials/1445/1/">Executing -CGI Scripts as Other Users</A> (LinuxPlanet) - -<LI><A HREF="http://www.htmlhelp.org/faq/cgifaq.html">CGI Programming -FAQ</A> (Web Design Group) - -<LI>Introduction to Server Side Includes <A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-06-12-001-01-PS">Part -1</A> - <A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-06-19-002-01-NW-LF-SW" ->Part 2</A> (ApacheToday) - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-06-26-001-01-NW-LF-SW" ->Advanced SSI Techniques</A> (ApacheToday) - -<LI><A -HREF="http://www.builder.com/Servers/ApacheFiles/082400/">Setting up -CGI and SSI with Apache</A> (CNET Builder.com) - -</UL> - -<H2>Other Features</H2> - -<UL> - -<LI><A HREF="http://www.apacheweek.com/features/negotiation">Content -Negotiation Explained</A> (Apacheweek) - -<LI><A HREF="http://www.apacheweek.com/features/imagemaps">Using -Apache Imagemaps</A> (Apacheweek) - -<LI><A -HREF="http://apachetoday.com/news_story.php3?ltsn=2000-06-14-002-01-PS" ->Keeping Your Images from Adorning Other Sites</A> (ApacheToday) - -<LI><A HREF="http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html" ->Language Negotiation Notes</A> (Alan J. Flavell) - -</UL> - - -<P>If you have a pointer to an accurate and well-written tutorial -not included here, please let us know by submitting it to the -<A HREF="http://bugs.apache.org/">Apache Bug Database</A>. - -<!--#include virtual="footer.html" --> -</BODY> -</HTML>