diff options
Diffstat (limited to 'docs/manual/misc/API.html')
-rw-r--r-- | docs/manual/misc/API.html | 1161 |
1 files changed, 0 insertions, 1161 deletions
diff --git a/docs/manual/misc/API.html b/docs/manual/misc/API.html deleted file mode 100644 index 496be760c9..0000000000 --- a/docs/manual/misc/API.html +++ /dev/null @@ -1,1161 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML><HEAD> -<TITLE>Apache API notes</TITLE> -</HEAD> -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> - -<blockquote><strong>Warning:</strong> -This document has not been updated to take into account changes -made in the 2.0 version of the Apache HTTP Server. Some of the -information may still be relevant, but please use it -with care. -</blockquote> - -<H1 ALIGN="CENTER">Apache API notes</H1> - -These are some notes on the Apache API and the data structures you -have to deal with, <EM>etc.</EM> They are not yet nearly complete, but -hopefully, they will help you get your bearings. Keep in mind that -the API is still subject to change as we gain experience with it. -(See the TODO file for what <EM>might</EM> be coming). However, -it will be easy to adapt modules to any changes that are made. -(We have more modules to adapt than you do). -<P> - -A few notes on general pedagogical style here. In the interest of -conciseness, all structure declarations here are incomplete --- the -real ones have more slots that I'm not telling you about. For the -most part, these are reserved to one component of the server core or -another, and should be altered by modules with caution. However, in -some cases, they really are things I just haven't gotten around to -yet. Welcome to the bleeding edge.<P> - -Finally, here's an outline, to give you some bare idea of what's -coming up, and in what order: - -<UL> -<LI> <A HREF="#basics">Basic concepts.</A> -<MENU> - <LI> <A HREF="#HMR">Handlers, Modules, and Requests</A> - <LI> <A HREF="#moduletour">A brief tour of a module</A> -</MENU> -<LI> <A HREF="#handlers">How handlers work</A> -<MENU> - <LI> <A HREF="#req_tour">A brief tour of the <CODE>request_rec</CODE></A> - <LI> <A HREF="#req_orig">Where request_rec structures come from</A> - <LI> <A HREF="#req_return">Handling requests, declining, and returning error - codes</A> - <LI> <A HREF="#resp_handlers">Special considerations for response handlers</A> - <LI> <A HREF="#auth_handlers">Special considerations for authentication - handlers</A> - <LI> <A HREF="#log_handlers">Special considerations for logging handlers</A> -</MENU> -<LI> <A HREF="#pools">Resource allocation and resource pools</A> -<LI> <A HREF="#config">Configuration, commands and the like</A> -<MENU> - <LI> <A HREF="#per-dir">Per-directory configuration structures</A> - <LI> <A HREF="#commands">Command handling</A> - <LI> <A HREF="#servconf">Side notes --- per-server configuration, - virtual servers, <EM>etc</EM>.</A> -</MENU> -</UL> - -<H2><A NAME="basics">Basic concepts.</A></H2> - -We begin with an overview of the basic concepts behind the -API, and how they are manifested in the code. - -<H3><A NAME="HMR">Handlers, Modules, and Requests</A></H3> - -Apache breaks down request handling into a series of steps, more or -less the same way the Netscape server API does (although this API has -a few more stages than NetSite does, as hooks for stuff I thought -might be useful in the future). These are: - -<UL> - <LI> URI -> Filename translation - <LI> Auth ID checking [is the user who they say they are?] - <LI> Auth access checking [is the user authorized <EM>here</EM>?] - <LI> Access checking other than auth - <LI> Determining MIME type of the object requested - <LI> `Fixups' --- there aren't any of these yet, but the phase is - intended as a hook for possible extensions like - <CODE>SetEnv</CODE>, which don't really fit well elsewhere. - <LI> Actually sending a response back to the client. - <LI> Logging the request -</UL> - -These phases are handled by looking at each of a succession of -<EM>modules</EM>, looking to see if each of them has a handler for the -phase, and attempting invoking it if so. The handler can typically do -one of three things: - -<UL> - <LI> <EM>Handle</EM> the request, and indicate that it has done so - by returning the magic constant <CODE>OK</CODE>. - <LI> <EM>Decline</EM> to handle the request, by returning the magic - integer constant <CODE>DECLINED</CODE>. In this case, the - server behaves in all respects as if the handler simply hadn't - been there. - <LI> Signal an error, by returning one of the HTTP error codes. - This terminates normal handling of the request, although an - ErrorDocument may be invoked to try to mop up, and it will be - logged in any case. -</UL> - -Most phases are terminated by the first module that handles them; -however, for logging, `fixups', and non-access authentication -checking, all handlers always run (barring an error). Also, the -response phase is unique in that modules may declare multiple handlers -for it, via a dispatch table keyed on the MIME type of the requested -object. Modules may declare a response-phase handler which can handle -<EM>any</EM> request, by giving it the key <CODE>*/*</CODE> (<EM>i.e.</EM>, a -wildcard MIME type specification). However, wildcard handlers are -only invoked if the server has already tried and failed to find a more -specific response handler for the MIME type of the requested object -(either none existed, or they all declined).<P> - -The handlers themselves are functions of one argument (a -<CODE>request_rec</CODE> structure. vide infra), which returns an -integer, as above.<P> - -<H3><A NAME="moduletour">A brief tour of a module</A></H3> - -At this point, we need to explain the structure of a module. Our -candidate will be one of the messier ones, the CGI module --- this -handles both CGI scripts and the <CODE>ScriptAlias</CODE> config file -command. It's actually a great deal more complicated than most -modules, but if we're going to have only one example, it might as well -be the one with its fingers in every place.<P> - -Let's begin with handlers. In order to handle the CGI scripts, the -module declares a response handler for them. Because of -<CODE>ScriptAlias</CODE>, it also has handlers for the name -translation phase (to recognize <CODE>ScriptAlias</CODE>ed URIs), the -type-checking phase (any <CODE>ScriptAlias</CODE>ed request is typed -as a CGI script).<P> - -The module needs to maintain some per (virtual) -server information, namely, the <CODE>ScriptAlias</CODE>es in effect; -the module structure therefore contains pointers to a functions which -builds these structures, and to another which combines two of them (in -case the main server and a virtual server both have -<CODE>ScriptAlias</CODE>es declared).<P> - -Finally, this module contains code to handle the -<CODE>ScriptAlias</CODE> command itself. This particular module only -declares one command, but there could be more, so modules have -<EM>command tables</EM> which declare their commands, and describe -where they are permitted, and how they are to be invoked. <P> - -A final note on the declared types of the arguments of some of these -commands: a <CODE>pool</CODE> is a pointer to a <EM>resource pool</EM> -structure; these are used by the server to keep track of the memory -which has been allocated, files opened, <EM>etc.</EM>, either to service a -particular request, or to handle the process of configuring itself. -That way, when the request is over (or, for the configuration pool, -when the server is restarting), the memory can be freed, and the files -closed, <EM>en masse</EM>, without anyone having to write explicit code to -track them all down and dispose of them. Also, a -<CODE>cmd_parms</CODE> structure contains various information about -the config file being read, and other status information, which is -sometimes of use to the function which processes a config-file command -(such as <CODE>ScriptAlias</CODE>). - -With no further ado, the module itself: - -<PRE> -/* Declarations of handlers. */ - -int translate_scriptalias (request_rec *); -int type_scriptalias (request_rec *); -int cgi_handler (request_rec *); - -/* Subsidiary dispatch table for response-phase handlers, by MIME type */ - -handler_rec cgi_handlers[] = { -{ "application/x-httpd-cgi", cgi_handler }, -{ NULL } -}; - -/* Declarations of routines to manipulate the module's configuration - * info. Note that these are returned, and passed in, as void *'s; - * the server core keeps track of them, but it doesn't, and can't, - * know their internal structure. - */ - -void *make_cgi_server_config (pool *); -void *merge_cgi_server_config (pool *, void *, void *); - -/* Declarations of routines to handle config-file commands */ - -extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake, - char *real); - -command_rec cgi_cmds[] = { -{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2, - "a fakename and a realname"}, -{ NULL } -}; - -module cgi_module = { - STANDARD_MODULE_STUFF, - NULL, /* initializer */ - NULL, /* dir config creator */ - NULL, /* dir merger --- default is to override */ - make_cgi_server_config, /* server config */ - merge_cgi_server_config, /* merge server config */ - cgi_cmds, /* command table */ - cgi_handlers, /* handlers */ - translate_scriptalias, /* filename translation */ - NULL, /* check_user_id */ - NULL, /* check auth */ - NULL, /* check access */ - type_scriptalias, /* type_checker */ - NULL, /* fixups */ - NULL, /* logger */ - NULL /* header parser */ -}; -</PRE> - -<H2><A NAME="handlers">How handlers work</A></H2> - -The sole argument to handlers is a <CODE>request_rec</CODE> structure. -This structure describes a particular request which has been made to -the server, on behalf of a client. In most cases, each connection to -the client generates only one <CODE>request_rec</CODE> structure.<P> - -<H3><A NAME="req_tour">A brief tour of the <CODE>request_rec</CODE></A></H3> - -The <CODE>request_rec</CODE> contains pointers to a resource pool -which will be cleared when the server is finished handling the -request; to structures containing per-server and per-connection -information, and most importantly, information on the request itself.<P> - -The most important such information is a small set of character -strings describing attributes of the object being requested, including -its URI, filename, content-type and content-encoding (these being filled -in by the translation and type-check handlers which handle the -request, respectively). <P> - -Other commonly used data items are tables giving the MIME headers on -the client's original request, MIME headers to be sent back with the -response (which modules can add to at will), and environment variables -for any subprocesses which are spawned off in the course of servicing -the request. These tables are manipulated using the -<CODE>ap_table_get</CODE> and <CODE>ap_table_set</CODE> routines. <P> -<BLOCKQUOTE> - Note that the <SAMP>Content-type</SAMP> header value <EM>cannot</EM> be - set by module content-handlers using the <SAMP>ap_table_*()</SAMP> - routines. Rather, it is set by pointing the <SAMP>content_type</SAMP> - field in the <SAMP>request_rec</SAMP> structure to an appropriate - string. <EM>E.g.</EM>, - <PRE> - r->content_type = "text/html"; - </PRE> -</BLOCKQUOTE> -Finally, there are pointers to two data structures which, in turn, -point to per-module configuration structures. Specifically, these -hold pointers to the data structures which the module has built to -describe the way it has been configured to operate in a given -directory (via <CODE>.htaccess</CODE> files or -<CODE><Directory></CODE> sections), for private data it has -built in the course of servicing the request (so modules' handlers for -one phase can pass `notes' to their handlers for other phases). There -is another such configuration vector in the <CODE>server_rec</CODE> -data structure pointed to by the <CODE>request_rec</CODE>, which -contains per (virtual) server configuration data.<P> - -Here is an abridged declaration, giving the fields most commonly used:<P> - -<PRE> -struct request_rec { - - pool *pool; - conn_rec *connection; - server_rec *server; - - /* What object is being requested */ - - char *uri; - char *filename; - char *path_info; - char *args; /* QUERY_ARGS, if any */ - struct stat finfo; /* Set by server core; - * st_mode set to zero if no such file */ - - char *content_type; - char *content_encoding; - - /* MIME header environments, in and out. Also, an array containing - * environment variables to be passed to subprocesses, so people can - * write modules to add to that environment. - * - * The difference between headers_out and err_headers_out is that - * the latter are printed even on error, and persist across internal - * redirects (so the headers printed for ErrorDocument handlers will - * have them). - */ - - table *headers_in; - table *headers_out; - table *err_headers_out; - table *subprocess_env; - - /* Info about the request itself... */ - - int header_only; /* HEAD request, as opposed to GET */ - char *protocol; /* Protocol, as given to us, or HTTP/0.9 */ - char *method; /* GET, HEAD, POST, <EM>etc.</EM> */ - int method_number; /* M_GET, M_POST, <EM>etc.</EM> */ - - /* Info for logging */ - - char *the_request; - int bytes_sent; - - /* A flag which modules can set, to indicate that the data being - * returned is volatile, and clients should be told not to cache it. - */ - - int no_cache; - - /* Various other config info which may change with .htaccess files - * These are config vectors, with one void* pointer for each module - * (the thing pointed to being the module's business). - */ - - void *per_dir_config; /* Options set in config files, <EM>etc.</EM> */ - void *request_config; /* Notes on *this* request */ - -}; - -</PRE> - -<H3><A NAME="req_orig">Where request_rec structures come from</A></H3> - -Most <CODE>request_rec</CODE> structures are built by reading an HTTP -request from a client, and filling in the fields. However, there are -a few exceptions: - -<UL> - <LI> If the request is to an imagemap, a type map (<EM>i.e.</EM>, a - <CODE>*.var</CODE> file), or a CGI script which returned a - local `Location:', then the resource which the user requested - is going to be ultimately located by some URI other than what - the client originally supplied. In this case, the server does - an <EM>internal redirect</EM>, constructing a new - <CODE>request_rec</CODE> for the new URI, and processing it - almost exactly as if the client had requested the new URI - directly. <P> - - <LI> If some handler signaled an error, and an - <CODE>ErrorDocument</CODE> is in scope, the same internal - redirect machinery comes into play.<P> - - <LI> Finally, a handler occasionally needs to investigate `what - would happen if' some other request were run. For instance, - the directory indexing module needs to know what MIME type - would be assigned to a request for each directory entry, in - order to figure out what icon to use.<P> - - Such handlers can construct a <EM>sub-request</EM>, using the - functions <CODE>ap_sub_req_lookup_file</CODE>, - <CODE>ap_sub_req_lookup_uri</CODE>, and - <CODE>ap_sub_req_method_uri</CODE>; these construct a new - <CODE>request_rec</CODE> structure and processes it as you - would expect, up to but not including the point of actually - sending a response. (These functions skip over the access - checks if the sub-request is for a file in the same directory - as the original request).<P> - - (Server-side includes work by building sub-requests and then - actually invoking the response handler for them, via the - function <CODE>ap_run_sub_req</CODE>). -</UL> - -<H3><A NAME="req_return">Handling requests, declining, and returning error - codes</A></H3> - -As discussed above, each handler, when invoked to handle a particular -<CODE>request_rec</CODE>, has to return an <CODE>int</CODE> to -indicate what happened. That can either be - -<UL> - <LI> OK --- the request was handled successfully. This may or may - not terminate the phase. - <LI> DECLINED --- no erroneous condition exists, but the module - declines to handle the phase; the server tries to find another. - <LI> an HTTP error code, which aborts handling of the request. -</UL> - -Note that if the error code returned is <CODE>REDIRECT</CODE>, then -the module should put a <CODE>Location</CODE> in the request's -<CODE>headers_out</CODE>, to indicate where the client should be -redirected <EM>to</EM>. <P> - -<H3><A NAME="resp_handlers">Special considerations for response - handlers</A></H3> - -Handlers for most phases do their work by simply setting a few fields -in the <CODE>request_rec</CODE> structure (or, in the case of access -checkers, simply by returning the correct error code). However, -response handlers have to actually send a request back to the client. <P> - -They should begin by sending an HTTP response header, using the -function <CODE>ap_send_http_header</CODE>. (You don't have to do -anything special to skip sending the header for HTTP/0.9 requests; the -function figures out on its own that it shouldn't do anything). If -the request is marked <CODE>header_only</CODE>, that's all they should -do; they should return after that, without attempting any further -output. <P> - -Otherwise, they should produce a request body which responds to the -client as appropriate. The primitives for this are <CODE>ap_rputc</CODE> -and <CODE>ap_rprintf</CODE>, for internally generated output, and -<CODE>ap_send_fd</CODE>, to copy the contents of some <CODE>FILE *</CODE> -straight to the client. <P> - -At this point, you should more or less understand the following piece -of code, which is the handler which handles <CODE>GET</CODE> requests -which have no more specific handler; it also shows how conditional -<CODE>GET</CODE>s can be handled, if it's desirable to do so in a -particular response handler --- <CODE>ap_set_last_modified</CODE> checks -against the <CODE>If-modified-since</CODE> value supplied by the -client, if any, and returns an appropriate code (which will, if -nonzero, be USE_LOCAL_COPY). No similar considerations apply for -<CODE>ap_set_content_length</CODE>, but it returns an error code for -symmetry.<P> - -<PRE> -int default_handler (request_rec *r) -{ - int errstatus; - FILE *f; - - if (r->method_number != M_GET) return DECLINED; - if (r->finfo.st_mode == 0) return NOT_FOUND; - - if ((errstatus = ap_set_content_length (r, r->finfo.st_size)) - || (errstatus = ap_set_last_modified (r, r->finfo.st_mtime))) - return errstatus; - - f = fopen (r->filename, "r"); - - if (f == NULL) { - log_reason("file permissions deny server access", - r->filename, r); - return FORBIDDEN; - } - - register_timeout ("send", r); - ap_send_http_header (r); - - if (!r->header_only) send_fd (f, r); - ap_pfclose (r->pool, f); - return OK; -} -</PRE> - -Finally, if all of this is too much of a challenge, there are a few -ways out of it. First off, as shown above, a response handler which -has not yet produced any output can simply return an error code, in -which case the server will automatically produce an error response. -Secondly, it can punt to some other handler by invoking -<CODE>ap_internal_redirect</CODE>, which is how the internal redirection -machinery discussed above is invoked. A response handler which has -internally redirected should always return <CODE>OK</CODE>. <P> - -(Invoking <CODE>ap_internal_redirect</CODE> from handlers which are -<EM>not</EM> response handlers will lead to serious confusion). - -<H3><A NAME="auth_handlers">Special considerations for authentication - handlers</A></H3> - -Stuff that should be discussed here in detail: - -<UL> - <LI> Authentication-phase handlers not invoked unless auth is - configured for the directory. - <LI> Common auth configuration stored in the core per-dir - configuration; it has accessors <CODE>ap_auth_type</CODE>, - <CODE>ap_auth_name</CODE>, and <CODE>ap_requires</CODE>. - <LI> Common routines, to handle the protocol end of things, at least - for HTTP basic authentication (<CODE>ap_get_basic_auth_pw</CODE>, - which sets the <CODE>connection->user</CODE> structure field - automatically, and <CODE>ap_note_basic_auth_failure</CODE>, which - arranges for the proper <CODE>WWW-Authenticate:</CODE> header - to be sent back). -</UL> - -<H3><A NAME="log_handlers">Special considerations for logging handlers</A></H3> - -When a request has internally redirected, there is the question of -what to log. Apache handles this by bundling the entire chain of -redirects into a list of <CODE>request_rec</CODE> structures which are -threaded through the <CODE>r->prev</CODE> and <CODE>r->next</CODE> -pointers. The <CODE>request_rec</CODE> which is passed to the logging -handlers in such cases is the one which was originally built for the -initial request from the client; note that the bytes_sent field will -only be correct in the last request in the chain (the one for which a -response was actually sent). - -<H2><A NAME="pools">Resource allocation and resource pools</A></H2> -<P> -One of the problems of writing and designing a server-pool server is -that of preventing leakage, that is, allocating resources (memory, -open files, <EM>etc.</EM>), without subsequently releasing them. The resource -pool machinery is designed to make it easy to prevent this from -happening, by allowing resource to be allocated in such a way that -they are <EM>automatically</EM> released when the server is done with -them. -</P> -<P> -The way this works is as follows: the memory which is allocated, file -opened, <EM>etc.</EM>, to deal with a particular request are tied to a -<EM>resource pool</EM> which is allocated for the request. The pool -is a data structure which itself tracks the resources in question. -</P> -<P> -When the request has been processed, the pool is <EM>cleared</EM>. At -that point, all the memory associated with it is released for reuse, -all files associated with it are closed, and any other clean-up -functions which are associated with the pool are run. When this is -over, we can be confident that all the resource tied to the pool have -been released, and that none of them have leaked. -</P> -<P> -Server restarts, and allocation of memory and resources for per-server -configuration, are handled in a similar way. There is a -<EM>configuration pool</EM>, which keeps track of resources which were -allocated while reading the server configuration files, and handling -the commands therein (for instance, the memory that was allocated for -per-server module configuration, log files and other files that were -opened, and so forth). When the server restarts, and has to reread -the configuration files, the configuration pool is cleared, and so the -memory and file descriptors which were taken up by reading them the -last time are made available for reuse. -</P> -<P> -It should be noted that use of the pool machinery isn't generally -obligatory, except for situations like logging handlers, where you -really need to register cleanups to make sure that the log file gets -closed when the server restarts (this is most easily done by using the -function <CODE><A HREF="#pool-files">ap_pfopen</A></CODE>, which also -arranges for the underlying file descriptor to be closed before any -child processes, such as for CGI scripts, are <CODE>exec</CODE>ed), or -in case you are using the timeout machinery (which isn't yet even -documented here). However, there are two benefits to using it: -resources allocated to a pool never leak (even if you allocate a -scratch string, and just forget about it); also, for memory -allocation, <CODE>ap_palloc</CODE> is generally faster than -<CODE>malloc</CODE>. -</P> -<P> -We begin here by describing how memory is allocated to pools, and then -discuss how other resources are tracked by the resource pool -machinery. -</P> -<H3>Allocation of memory in pools</H3> -<P> -Memory is allocated to pools by calling the function -<CODE>ap_palloc</CODE>, which takes two arguments, one being a pointer to -a resource pool structure, and the other being the amount of memory to -allocate (in <CODE>char</CODE>s). Within handlers for handling -requests, the most common way of getting a resource pool structure is -by looking at the <CODE>pool</CODE> slot of the relevant -<CODE>request_rec</CODE>; hence the repeated appearance of the -following idiom in module code: -</P> -<PRE> -int my_handler(request_rec *r) -{ - struct my_structure *foo; - ... - - foo = (foo *)ap_palloc (r->pool, sizeof(my_structure)); -} -</PRE> -<P> -Note that <EM>there is no <CODE>ap_pfree</CODE></EM> --- -<CODE>ap_palloc</CODE>ed memory is freed only when the associated -resource pool is cleared. This means that <CODE>ap_palloc</CODE> does not -have to do as much accounting as <CODE>malloc()</CODE>; all it does in -the typical case is to round up the size, bump a pointer, and do a -range check. -</P> -<P> -(It also raises the possibility that heavy use of <CODE>ap_palloc</CODE> -could cause a server process to grow excessively large. There are -two ways to deal with this, which are dealt with below; briefly, you -can use <CODE>malloc</CODE>, and try to be sure that all of the memory -gets explicitly <CODE>free</CODE>d, or you can allocate a sub-pool of -the main pool, allocate your memory in the sub-pool, and clear it out -periodically. The latter technique is discussed in the section on -sub-pools below, and is used in the directory-indexing code, in order -to avoid excessive storage allocation when listing directories with -thousands of files). -</P> -<H3>Allocating initialized memory</H3> -<P> -There are functions which allocate initialized memory, and are -frequently useful. The function <CODE>ap_pcalloc</CODE> has the same -interface as <CODE>ap_palloc</CODE>, but clears out the memory it -allocates before it returns it. The function <CODE>ap_pstrdup</CODE> -takes a resource pool and a <CODE>char *</CODE> as arguments, and -allocates memory for a copy of the string the pointer points to, -returning a pointer to the copy. Finally <CODE>ap_pstrcat</CODE> is a -varargs-style function, which takes a pointer to a resource pool, and -at least two <CODE>char *</CODE> arguments, the last of which must be -<CODE>NULL</CODE>. It allocates enough memory to fit copies of each -of the strings, as a unit; for instance: -</P> -<PRE> - ap_pstrcat (r->pool, "foo", "/", "bar", NULL); -</PRE> -<P> -returns a pointer to 8 bytes worth of memory, initialized to -<CODE>"foo/bar"</CODE>. -</P> -<H3><A NAME="pools-used">Commonly-used pools in the Apache Web server</A></H3> -<P> -A pool is really defined by its lifetime more than anything else. There -are some static pools in http_main which are passed to various -non-http_main functions as arguments at opportune times. Here they are: -</P> -<DL COMPACT> - <DT>permanent_pool - </DT> - <DD> - <UL> - <LI>never passed to anything else, this is the ancestor of all pools - </LI> - </UL> - </DD> - <DT>pconf - </DT> - <DD> - <UL> - <LI>subpool of permanent_pool - </LI> - <LI>created at the beginning of a config "cycle"; exists until the - server is terminated or restarts; passed to all config-time - routines, either via cmd->pool, or as the "pool *p" argument on - those which don't take pools - </LI> - <LI>passed to the module init() functions - </LI> - </UL> - </DD> - <DT>ptemp - </DT> - <DD> - <UL> - <LI>sorry I lie, this pool isn't called this currently in 1.3, I - renamed it this in my pthreads development. I'm referring to - the use of ptrans in the parent... contrast this with the later - definition of ptrans in the child. - </LI> - <LI>subpool of permanent_pool - </LI> - <LI>created at the beginning of a config "cycle"; exists until the - end of config parsing; passed to config-time routines <EM>via</EM> - cmd->temp_pool. Somewhat of a "bastard child" because it isn't - available everywhere. Used for temporary scratch space which - may be needed by some config routines but which is deleted at - the end of config. - </LI> - </UL> - </DD> - <DT>pchild - </DT> - <DD> - <UL> - <LI>subpool of permanent_pool - </LI> - <LI>created when a child is spawned (or a thread is created); lives - until that child (thread) is destroyed - </LI> - <LI>passed to the module child_init functions - </LI> - <LI>destruction happens right after the child_exit functions are - called... (which may explain why I think child_exit is redundant - and unneeded) - </LI> - </UL> - </DD> - <DT>ptrans - <DT> - <DD> - <UL> - <LI>should be a subpool of pchild, but currently is a subpool of - permanent_pool, see above - </LI> - <LI>cleared by the child before going into the accept() loop to receive - a connection - </LI> - <LI>used as connection->pool - </LI> - </UL> - </DD> - <DT>r->pool - </DT> - <DD> - <UL> - <LI>for the main request this is a subpool of connection->pool; for - subrequests it is a subpool of the parent request's pool. - </LI> - <LI>exists until the end of the request (<EM>i.e.</EM>, - ap_destroy_sub_req, or - in child_main after process_request has finished) - </LI> - <LI>note that r itself is allocated from r->pool; <EM>i.e.</EM>, - r->pool is - first created and then r is the first thing palloc()d from it - </LI> - </UL> - </DD> -</DL> -<P> -For almost everything folks do, r->pool is the pool to use. But you -can see how other lifetimes, such as pchild, are useful to some -modules... such as modules that need to open a database connection once -per child, and wish to clean it up when the child dies. -</P> -<P> -You can also see how some bugs have manifested themself, such as setting -connection->user to a value from r->pool -- in this case -connection exists -for the lifetime of ptrans, which is longer than r->pool (especially if -r->pool is a subrequest!). So the correct thing to do is to allocate -from connection->pool. -</P> -<P> -And there was another interesting bug in mod_include/mod_cgi. You'll see -in those that they do this test to decide if they should use r->pool -or r->main->pool. In this case the resource that they are registering -for cleanup is a child process. If it were registered in r->pool, -then the code would wait() for the child when the subrequest finishes. -With mod_include this could be any old #include, and the delay can be up -to 3 seconds... and happened quite frequently. Instead the subprocess -is registered in r->main->pool which causes it to be cleaned up when -the entire request is done -- <EM>i.e.</EM>, after the output has been sent to -the client and logging has happened. -</P> -<H3><A NAME="pool-files">Tracking open files, etc.</A></H3> -<P> -As indicated above, resource pools are also used to track other sorts -of resources besides memory. The most common are open files. The -routine which is typically used for this is <CODE>ap_pfopen</CODE>, which -takes a resource pool and two strings as arguments; the strings are -the same as the typical arguments to <CODE>fopen</CODE>, <EM>e.g.</EM>, -</P> -<PRE> - ... - FILE *f = ap_pfopen (r->pool, r->filename, "r"); - - if (f == NULL) { ... } else { ... } -</PRE> -<P> -There is also a <CODE>ap_popenf</CODE> routine, which parallels the -lower-level <CODE>open</CODE> system call. Both of these routines -arrange for the file to be closed when the resource pool in question -is cleared. -</P> -<P> -Unlike the case for memory, there <EM>are</EM> functions to close -files allocated with <CODE>ap_pfopen</CODE>, and <CODE>ap_popenf</CODE>, -namely <CODE>ap_pfclose</CODE> and <CODE>ap_pclosef</CODE>. (This is -because, on many systems, the number of files which a single process -can have open is quite limited). It is important to use these -functions to close files allocated with <CODE>ap_pfopen</CODE> and -<CODE>ap_popenf</CODE>, since to do otherwise could cause fatal errors on -systems such as Linux, which react badly if the same -<CODE>FILE*</CODE> is closed more than once. -</P> -<P> -(Using the <CODE>close</CODE> functions is not mandatory, since the -file will eventually be closed regardless, but you should consider it -in cases where your module is opening, or could open, a lot of files). -</P> -<H3>Other sorts of resources --- cleanup functions</H3> -<BLOCKQUOTE> -More text goes here. Describe the the cleanup primitives in terms of -which the file stuff is implemented; also, <CODE>spawn_process</CODE>. -</BLOCKQUOTE> -<P> -Pool cleanups live until clear_pool() is called: clear_pool(a) recursively -calls destroy_pool() on all subpools of a; then calls all the cleanups for a; -then releases all the memory for a. destroy_pool(a) calls clear_pool(a) -and then releases the pool structure itself. <EM>i.e.</EM>, clear_pool(a) doesn't -delete a, it just frees up all the resources and you can start using it -again immediately. -</P> -<H3>Fine control --- creating and dealing with sub-pools, with a note -on sub-requests</H3> - -On rare occasions, too-free use of <CODE>ap_palloc()</CODE> and the -associated primitives may result in undesirably profligate resource -allocation. You can deal with such a case by creating a -<EM>sub-pool</EM>, allocating within the sub-pool rather than the main -pool, and clearing or destroying the sub-pool, which releases the -resources which were associated with it. (This really <EM>is</EM> a -rare situation; the only case in which it comes up in the standard -module set is in case of listing directories, and then only with -<EM>very</EM> large directories. Unnecessary use of the primitives -discussed here can hair up your code quite a bit, with very little -gain). <P> - -The primitive for creating a sub-pool is <CODE>ap_make_sub_pool</CODE>, -which takes another pool (the parent pool) as an argument. When the -main pool is cleared, the sub-pool will be destroyed. The sub-pool -may also be cleared or destroyed at any time, by calling the functions -<CODE>ap_clear_pool</CODE> and <CODE>ap_destroy_pool</CODE>, respectively. -(The difference is that <CODE>ap_clear_pool</CODE> frees resources -associated with the pool, while <CODE>ap_destroy_pool</CODE> also -deallocates the pool itself. In the former case, you can allocate new -resources within the pool, and clear it again, and so forth; in the -latter case, it is simply gone). <P> - -One final note --- sub-requests have their own resource pools, which -are sub-pools of the resource pool for the main request. The polite -way to reclaim the resources associated with a sub request which you -have allocated (using the <CODE>ap_sub_req_...</CODE> functions) -is <CODE>ap_destroy_sub_req</CODE>, which frees the resource pool. -Before calling this function, be sure to copy anything that you care -about which might be allocated in the sub-request's resource pool into -someplace a little less volatile (for instance, the filename in its -<CODE>request_rec</CODE> structure). <P> - -(Again, under most circumstances, you shouldn't feel obliged to call -this function; only 2K of memory or so are allocated for a typical sub -request, and it will be freed anyway when the main request pool is -cleared. It is only when you are allocating many, many sub-requests -for a single main request that you should seriously consider the -<CODE>ap_destroy_...</CODE> functions). - -<H2><A NAME="config">Configuration, commands and the like</A></H2> - -One of the design goals for this server was to maintain external -compatibility with the NCSA 1.3 server --- that is, to read the same -configuration files, to process all the directives therein correctly, -and in general to be a drop-in replacement for NCSA. On the other -hand, another design goal was to move as much of the server's -functionality into modules which have as little as possible to do with -the monolithic server core. The only way to reconcile these goals is -to move the handling of most commands from the central server into the -modules. <P> - -However, just giving the modules command tables is not enough to -divorce them completely from the server core. The server has to -remember the commands in order to act on them later. That involves -maintaining data which is private to the modules, and which can be -either per-server, or per-directory. Most things are per-directory, -including in particular access control and authorization information, -but also information on how to determine file types from suffixes, -which can be modified by <CODE>AddType</CODE> and -<CODE>DefaultType</CODE> directives, and so forth. In general, the -governing philosophy is that anything which <EM>can</EM> be made -configurable by directory should be; per-server information is -generally used in the standard set of modules for information like -<CODE>Alias</CODE>es and <CODE>Redirect</CODE>s which come into play -before the request is tied to a particular place in the underlying -file system. <P> - -Another requirement for emulating the NCSA server is being able to -handle the per-directory configuration files, generally called -<CODE>.htaccess</CODE> files, though even in the NCSA server they can -contain directives which have nothing at all to do with access -control. Accordingly, after URI -> filename translation, but before -performing any other phase, the server walks down the directory -hierarchy of the underlying filesystem, following the translated -pathname, to read any <CODE>.htaccess</CODE> files which might be -present. The information which is read in then has to be -<EM>merged</EM> with the applicable information from the server's own -config files (either from the <CODE><Directory></CODE> sections -in <CODE>access.conf</CODE>, or from defaults in -<CODE>srm.conf</CODE>, which actually behaves for most purposes almost -exactly like <CODE><Directory /></CODE>).<P> - -Finally, after having served a request which involved reading -<CODE>.htaccess</CODE> files, we need to discard the storage allocated -for handling them. That is solved the same way it is solved wherever -else similar problems come up, by tying those structures to the -per-transaction resource pool. <P> - -<H3><A NAME="per-dir">Per-directory configuration structures</A></H3> - -Let's look out how all of this plays out in <CODE>mod_mime.c</CODE>, -which defines the file typing handler which emulates the NCSA server's -behavior of determining file types from suffixes. What we'll be -looking at, here, is the code which implements the -<CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands. These -commands can appear in <CODE>.htaccess</CODE> files, so they must be -handled in the module's private per-directory data, which in fact, -consists of two separate <CODE>table</CODE>s for MIME types and -encoding information, and is declared as follows: - -<PRE> -typedef struct { - table *forced_types; /* Additional AddTyped stuff */ - table *encoding_types; /* Added with AddEncoding... */ -} mime_dir_config; -</PRE> - -When the server is reading a configuration file, or -<CODE><Directory></CODE> section, which includes one of the MIME -module's commands, it needs to create a <CODE>mime_dir_config</CODE> -structure, so those commands have something to act on. It does this -by invoking the function it finds in the module's `create per-dir -config slot', with two arguments: the name of the directory to which -this configuration information applies (or <CODE>NULL</CODE> for -<CODE>srm.conf</CODE>), and a pointer to a resource pool in which the -allocation should happen. <P> - -(If we are reading a <CODE>.htaccess</CODE> file, that resource pool -is the per-request resource pool for the request; otherwise it is a -resource pool which is used for configuration data, and cleared on -restarts. Either way, it is important for the structure being created -to vanish when the pool is cleared, by registering a cleanup on the -pool if necessary). <P> - -For the MIME module, the per-dir config creation function just -<CODE>ap_palloc</CODE>s the structure above, and a creates a couple of -<CODE>table</CODE>s to fill it. That looks like this: - -<PRE> -void *create_mime_dir_config (pool *p, char *dummy) -{ - mime_dir_config *new = - (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config)); - - new->forced_types = ap_make_table (p, 4); - new->encoding_types = ap_make_table (p, 4); - - return new; -} -</PRE> - -Now, suppose we've just read in a <CODE>.htaccess</CODE> file. We -already have the per-directory configuration structure for the next -directory up in the hierarchy. If the <CODE>.htaccess</CODE> file we -just read in didn't have any <CODE>AddType</CODE> or -<CODE>AddEncoding</CODE> commands, its per-directory config structure -for the MIME module is still valid, and we can just use it. -Otherwise, we need to merge the two structures somehow. <P> - -To do that, the server invokes the module's per-directory config merge -function, if one is present. That function takes three arguments: -the two structures being merged, and a resource pool in which to -allocate the result. For the MIME module, all that needs to be done -is overlay the tables from the new per-directory config structure with -those from the parent: - -<PRE> -void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv) -{ - mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv; - mime_dir_config *subdir = (mime_dir_config *)subdirv; - mime_dir_config *new = - (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config)); - - new->forced_types = ap_overlay_tables (p, subdir->forced_types, - parent_dir->forced_types); - new->encoding_types = ap_overlay_tables (p, subdir->encoding_types, - parent_dir->encoding_types); - - return new; -} -</PRE> - -As a note --- if there is no per-directory merge function present, the -server will just use the subdirectory's configuration info, and ignore -the parent's. For some modules, that works just fine (<EM>e.g.</EM>, for the -includes module, whose per-directory configuration information -consists solely of the state of the <CODE>XBITHACK</CODE>), and for -those modules, you can just not declare one, and leave the -corresponding structure slot in the module itself <CODE>NULL</CODE>.<P> - -<H3><A NAME="commands">Command handling</A></H3> - -Now that we have these structures, we need to be able to figure out -how to fill them. That involves processing the actual -<CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands. To find -commands, the server looks in the module's <CODE>command table</CODE>. -That table contains information on how many arguments the commands -take, and in what formats, where it is permitted, and so forth. That -information is sufficient to allow the server to invoke most -command-handling functions with pre-parsed arguments. Without further -ado, let's look at the <CODE>AddType</CODE> command handler, which -looks like this (the <CODE>AddEncoding</CODE> command looks basically -the same, and won't be shown here): - -<PRE> -char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext) -{ - if (*ext == '.') ++ext; - ap_table_set (m->forced_types, ext, ct); - return NULL; -} -</PRE> - -This command handler is unusually simple. As you can see, it takes -four arguments, two of which are pre-parsed arguments, the third being -the per-directory configuration structure for the module in question, -and the fourth being a pointer to a <CODE>cmd_parms</CODE> structure. -That structure contains a bunch of arguments which are frequently of -use to some, but not all, commands, including a resource pool (from -which memory can be allocated, and to which cleanups should be tied), -and the (virtual) server being configured, from which the module's -per-server configuration data can be obtained if required.<P> - -Another way in which this particular command handler is unusually -simple is that there are no error conditions which it can encounter. -If there were, it could return an error message instead of -<CODE>NULL</CODE>; this causes an error to be printed out on the -server's <CODE>stderr</CODE>, followed by a quick exit, if it is in -the main config files; for a <CODE>.htaccess</CODE> file, the syntax -error is logged in the server error log (along with an indication of -where it came from), and the request is bounced with a server error -response (HTTP error status, code 500). <P> - -The MIME module's command table has entries for these commands, which -look like this: - -<PRE> -command_rec mime_cmds[] = { -{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2, - "a mime type followed by a file extension" }, -{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, - "an encoding (<EM>e.g.</EM>, gzip), followed by a file extension" }, -{ NULL } -}; -</PRE> - -The entries in these tables are: - -<UL> - <LI> The name of the command - <LI> The function which handles it - <LI> a <CODE>(void *)</CODE> pointer, which is passed in the - <CODE>cmd_parms</CODE> structure to the command handler --- - this is useful in case many similar commands are handled by the - same function. - <LI> A bit mask indicating where the command may appear. There are - mask bits corresponding to each <CODE>AllowOverride</CODE> - option, and an additional mask bit, <CODE>RSRC_CONF</CODE>, - indicating that the command may appear in the server's own - config files, but <EM>not</EM> in any <CODE>.htaccess</CODE> - file. - <LI> A flag indicating how many arguments the command handler wants - pre-parsed, and how they should be passed in. - <CODE>TAKE2</CODE> indicates two pre-parsed arguments. Other - options are <CODE>TAKE1</CODE>, which indicates one pre-parsed - argument, <CODE>FLAG</CODE>, which indicates that the argument - should be <CODE>On</CODE> or <CODE>Off</CODE>, and is passed in - as a boolean flag, <CODE>RAW_ARGS</CODE>, which causes the - server to give the command the raw, unparsed arguments - (everything but the command name itself). There is also - <CODE>ITERATE</CODE>, which means that the handler looks the - same as <CODE>TAKE1</CODE>, but that if multiple arguments are - present, it should be called multiple times, and finally - <CODE>ITERATE2</CODE>, which indicates that the command handler - looks like a <CODE>TAKE2</CODE>, but if more arguments are - present, then it should be called multiple times, holding the - first argument constant. - <LI> Finally, we have a string which describes the arguments that - should be present. If the arguments in the actual config file - are not as required, this string will be used to help give a - more specific error message. (You can safely leave this - <CODE>NULL</CODE>). -</UL> - -Finally, having set this all up, we have to use it. This is -ultimately done in the module's handlers, specifically for its -file-typing handler, which looks more or less like this; note that the -per-directory configuration structure is extracted from the -<CODE>request_rec</CODE>'s per-directory configuration vector by using -the <CODE>ap_get_module_config</CODE> function. - -<PRE> -int find_ct(request_rec *r) -{ - int i; - char *fn = ap_pstrdup (r->pool, r->filename); - mime_dir_config *conf = (mime_dir_config *) - ap_get_module_config(r->per_dir_config, &mime_module); - char *type; - - if (S_ISDIR(r->finfo.st_mode)) { - r->content_type = DIR_MAGIC_TYPE; - return OK; - } - - if((i=ap_rind(fn,'.')) < 0) return DECLINED; - ++i; - - if ((type = ap_table_get (conf->encoding_types, &fn[i]))) - { - r->content_encoding = type; - - /* go back to previous extension to try to use it as a type */ - - fn[i-1] = '\0'; - if((i=ap_rind(fn,'.')) < 0) return OK; - ++i; - } - - if ((type = ap_table_get (conf->forced_types, &fn[i]))) - { - r->content_type = type; - } - - return OK; -} - -</PRE> - -<H3><A NAME="servconf">Side notes --- per-server configuration, virtual - servers, <EM>etc</EM>.</A></H3> - -The basic ideas behind per-server module configuration are basically -the same as those for per-directory configuration; there is a creation -function and a merge function, the latter being invoked where a -virtual server has partially overridden the base server configuration, -and a combined structure must be computed. (As with per-directory -configuration, the default if no merge function is specified, and a -module is configured in some virtual server, is that the base -configuration is simply ignored). <P> - -The only substantial difference is that when a command needs to -configure the per-server private module data, it needs to go to the -<CODE>cmd_parms</CODE> data to get at it. Here's an example, from the -alias module, which also indicates how a syntax error can be returned -(note that the per-directory configuration argument to the command -handler is declared as a dummy, since the module doesn't actually have -per-directory config data): - -<PRE> -char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url) -{ - server_rec *s = cmd->server; - alias_server_conf *conf = (alias_server_conf *) - ap_get_module_config(s->module_config,&alias_module); - alias_entry *new = ap_push_array (conf->redirects); - - if (!ap_is_url (url)) return "Redirect to non-URL"; - - new->fake = f; new->real = url; - return NULL; -} -</PRE> -<!--#include virtual="footer.html" --> -</BODY></HTML> |