author     Roy T. Fielding <fielding@apache.org>    2000-07-13 08:01:45 +0000
committer  Roy T. Fielding <fielding@apache.org>    2000-07-13 08:01:45 +0000
commit     f599c408c99c0007f00ebb6e3412c8f9751f31c5 (patch)
tree       3b8e2606f0a33d751f9419e9fc5bf464996de311 /buckets
parent     e465ba32742cbdb931dc0da6b38bed1ca98f7c38 (diff)
download   apr-f599c408c99c0007f00ebb6e3412c8f9751f31c5.tar.gz
Gather together the main points of bucket brigades.
git-svn-id: https://svn.apache.org/repos/asf/apr/apr/trunk@60356 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'buckets')
-rw-r--r--  buckets/doc_bucket_brigades.txt  381
1 files changed, 381 insertions, 0 deletions
diff --git a/buckets/doc_bucket_brigades.txt b/buckets/doc_bucket_brigades.txt
new file mode 100644
index 000000000..9fc3c7a03
--- /dev/null
+++ b/buckets/doc_bucket_brigades.txt
@@ -0,0 +1,381 @@
+To: new-httpd@apache.org
+Subject: bucket brigades and IOL
+Date: Fri, 12 Nov 1999 23:57:43 -0800
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <199911122357.aa18914@gremlin-relay.ics.uci.edu>
+
+About two years ago I wasted a lot of time writing an Ada95 library
+called Onions that provides a stackable stream abstraction for files,
+sockets, etc. It is at <http://www.ics.uci.edu/pub/websoft/libwww-ada95/>
+if you want to take a look at it, but I don't recommend looking at the
+code since it is almost all just working around Ada95's lack of a
+system interface. I'll describe the worthwhile bits here.
+
+The heart of Onions is the input and output stream object
+classes and classwide types for building a data stream via a
+stack of stream objects (Input_Pipe and Output_Pipe). Reading
+from the head of an input pipe causes the head stream object
+to read from the next outbound stream object, and on down the line.
+Likewise for writing to the head of an output pipe. One of the
+main features of streams is that they can filter the data as it
+passes, converting, adding to, and/or removing from the data
+before giving it to the next object. Since multiple streams can be
+cascaded, the complete data conversion is the sum of the individual
+data conversions performed by the stream objects.
+
+So far, no big deal -- this can be manually created by stacking ap_iol
+types in a meaningful way. But, the one unique thing I did in Onions was
+abstract the memory handling into something called Buckets and moved them
+around in Bucket_Brigades. A bucket is an allocated segment of memory
+with pointers to its allocation address and current size.
If I were doing
+this in C, I'd also add a pointer to the current start address and allocated
+size, so that a single bucket could be shrunk from both ends without
+copying, and a function pointer for freeing it at the stream end.
+Note that this is the same type of memory structure that IO-Lite uses,
+though developed independently and for different reasons.
+
+A bucket brigade is a list-queue of buckets. Each of the stream read/write
+calls would pass a bucket brigade instead of a single bucket, since this
+made insertion by filters more efficient, with the general idea being that
+the outbound end of the stream would be writing them out using writev
+or reading them in using readv, which is about as efficient as I could
+get with Ada95. [I call it a list-queue instead of just queue because you
+have the choice of removing buckets from (or adding to) the queue one
+bucket at a time or an entire linked list of buckets.]
+
+But we could go one step further. A bucket is an ADT, and as such can
+be used as a general handle for read-only memory, read-write memory,
+cache object, file handle, mmap handle, file name, URL, whatever.
+What if, instead of just a stream of memory, it could pass around a
+stream of memory interspersed with file handles or references to
+remote objects? A filter could then add stuff around the stream without
+causing too much parsing overhead, and if it needed to look at all the
+bytes in the stream it would just replace the bucket handle with a stream
+of memory sucked from that handle. Something like this was talked about
+last year (see threads on "Stacking up Response Handling" on 23 Sep 1998
+and "I/O filters & reference counts" in late December 1998 and January 1999).
+And Dean started something with ap_buf.h, but I don't know how he meant
+to finish it.
+
+What I was thinking of was
+
+    typedef enum {
+        AP_BUCKET_rwmem,
+        AP_BUCKET_rmem,
+        AP_BUCKET_file_t,
+        AP_BUCKET_mmap_t,
+        AP_BUCKET_filename,
+        AP_BUCKET_cached_entity,
+        AP_BUCKET_URI,
+    } ap_bucket_color_t;
+
+    typedef struct ap_bucket_t ap_bucket_t;
+    struct ap_bucket_t {
+        ap_bucket_color_t color;
+        void *content;
+        ap_status_t (*free)(ap_bucket_t *bucket);
+        unsigned int refcount;
+    };
+
+    typedef struct ap_bucket_rwmem_t ap_bucket_rwmem_t;
+    struct ap_bucket_rwmem_t {
+        void *alloc_addr;
+        size_t alloc_len;
+        void *addr;
+        size_t len;
+    };
+
+    typedef struct ap_bucket_rmem_t ap_bucket_rmem_t;
+    struct ap_bucket_rmem_t {
+        void *addr;
+        size_t len;
+    };
+
+    typedef struct ap_bucket_filename ap_bucket_filename;
+    struct ap_bucket_filename {
+        ap_context_t *ctx;
+        char *name;
+        ap_stat_t *stat;   /* useful if already stat'ed */
+        ap_aa_t *conf;     /* access control structure for this file */
+    };
+
+    ...
+
+and then
+
+    typedef struct ap_bucket_list_t ap_bucket_list_t;
+    struct ap_bucket_list_t {
+        ap_bucket_t *bucket;
+        ap_bucket_list_t *prev;
+        ap_bucket_list_t *next;
+    };
+
+    typedef struct ap_brigade_t ap_brigade_t;
+    struct ap_brigade_t {
+        ap_context_t *ctx;
+        ap_bucket_list_t *first;
+        ap_bucket_list_t *last;
+        unsigned int count;
+    };
+
+and then construct the input and output streams as pushing these
+bucket brigades to or from the client. The streams would have to
+be a little more complicated than Onions, since I learned later that
+you also need a parallel stream of header fields (in tokenized form)
+in order for it to work with anything HTTP-like.
+
+Why use an enum instead of a bunch of file pointers for each type
+of bucket, kind of like ap_iol? Because it allows adjacent memory
+buckets (the most frequent kind after a filter operation) to be
+gathered into a single writev. Also, we need a way to be able to
+set up an operation and figure out what it will produce without
+actually performing the operation -- this is for OPTIONS and HEAD.
+
+Note that this would completely change the way we handle internal
+redirects, subrequests, server-side include, mod_proxy, access control, etc.
+And then most of the API hooks would need to change. I think that is why
+Dean was putting it off until 2.1. The annoying thing is that this is the
+most useful rearchitecting of the server -- the MPM, APR, and hook changes
+make 2.0 easier/cleaner/faster to port to other platforms, but layering
+enables in one fell swoop almost every significant non-config feature
+that our users have requested. A cache would just be a hash table or
+btree of file buckets, complete with AA info.
+
+Anyway, that was stuck in the back of my head and had to get out.
+I won't be able to work on it until after the dissertation is done,
+which every day seems to be further away. Maybe 3.0, with rHTTP/2.0.
+
+....Roy
+
+=================================================
+To: new-httpd@apache.org
+Subject: Re: bucket brigades and IOL
+In-reply-to: Your message of "Sat, 13 Nov 1999 20:43:58 GMT."
+             <382DCD8E.881B8468@algroup.co.uk>
+Date: Sun, 14 Nov 1999 22:24:03 -0800
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <199911142224.aa22545@gremlin-relay.ics.uci.edu>
+
+BenL wrote:
+>I've got to say that this is the most coherent suggestion along these
+>lines that I've seen yet. I rather like it. One thing I'd add is that if
+>you are going to have a movable "start of block" pointer, and changeable
+>length, it can be nice to allocate extra around the edges under some
+>circumstances, so that lower layers can expand the block without having
+>to add extra chunks.
+
+Or, alternatively, allocate equal size blocks and just pass around
+a reference pair within the buckets that, when the bucket is freed,
+access a more complicated reference-counting pool. I think that is
+closer to what IO-Lite does.
+
+>Also, the usual objections still apply - i.e.
it is awkward to do things
+>like searching for particular strings, since they may cross boundaries.
+>I'm beginning to think that the right answer to this is to provide nice
+>matching functions that know about the chunked structures, and last
+>resort functions that'll glue it all back into one chunk...
+
+Yep, that's what I ended up doing for Ada95, though in that case there
+were no easier alternatives.
+
+....Roy
+
+=================================================
+To: new-httpd@apache.org
+Subject: Re: layered I/O (was: cvs commit: ...)
+In-reply-to: Your message of "Wed, 29 Mar 2000 01:21:09 PST."
+             <Pine.LNX.4.21.0003290004100.10357-100000@piglet>
+Date: Wed, 29 Mar 2000 02:05:08 -0800
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <200003290205.aa19557@gremlin-relay.ics.uci.edu>
+
+>Selection of IO Layers
+>
+>The core selects a source module and IO layers based on the urlspace
+>configuration. Content might be generated by mod_perl, and the result is
+>piped through mod_chunk, mod_ssl, and mod_net, in turn. When the content
+>generator runs, the core enforces that the module set the content type
+>before the first call to ap_bput. The content type is set by a function
+>call. The function (ap_set_content_type(request_rec *, char *)) examines
+>the content type and adds IO layers as necessary. For server-parsed
+>html, the core might insert mod_include immediately after mod_perl.
+
+The problem of thinking of it that way is that, like Dean mentioned,
+the output of one module may be filtered and the filter indicate that
+content should be embedded from another URL, which turns out to be a
+CGI script that outputs further parseable content. In this instance,
+the goal of layered-IO is to abstract away such behavior so that the
+instance is processed recursively and thus doesn't result in some tangled
+mess of processing code for subrequests.
Doing it requires that each
+layer be able to pass both data and metadata, and have both data and
+metadata be processed at each layer (if desired), rather than call a
+single function that would set the metadata for the entire response.
+
+My "solution" to that is to pass three interlaced streams -- data,
+metadata, and meta-metadata -- through each layer. The metadata
+streams would point to a table of tokenized name-value pairs.
+There are lots of ways to do that, going back to my description of
+bucket brigades long ago. Basically, each block of memory would
+indicate what type of data, with metadata occurring in a block before
+the data block(s) that it describes (just like chunk-size describes
+the subsequent chunk-data) and the layers could be dynamically
+rearranged based on the metadata that passed through them, in
+accordance with the purpose of the filter.
+
+>(Can anyone produce a use case where the IO chain could change after
+>output begins?)
+
+Output is a little easier, but that is the normal case for input.
+We don't know what filters to apply to the request body until after
+we have passed through the HTTP headers, and the HTTP message processor
+is itself a filter in this model.
+
+....Roy
+
+
+=================================================
+To: new-httpd@apache.org
+Subject: Re: filtering patches
+In-reply-to: Your message of "Mon, 10 Jul 2000 15:33:25 PDT."
+             <Pine.LNX.4.21.0007101528180.10878-100000@koj.rkbloom.net>
+Date: Mon, 10 Jul 2000 16:58:00 -0700
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <200007101657.aa21782@gremlin-relay.ics.uci.edu>
+
+[...]
+I meant that the filters, when written to as part of the output stream,
+are treated as a stack (write to the top-most filter without any knowledge
+of what may lie underneath it). So the process of arranging filters
+for a particular response is like dropping them onto a stack.
When a
+filter is done or the stream is closed, each instantiated filter cleans
+up according to its local state and then destroys itself (as it is popped
+off the stack).
+
+This is completely separate from the registration of filters by
+name and purpose, which could be done by hooks. The difference is that
+filters are registered at config time but only instantiated (given local
+storage) and arranged on a per stream basis.
+
+Bucket brigades is simply a way to encapsulate and pass data down the stream
+such that it can be as efficient as the sender desires, while retaining
+a simple interface. The purpose of the bucket is to make handling of the
+data uniform regardless of its type, or make type-specific conversions
+via a single ADT call if and only if they are needed by some filter.
+The purpose of the brigade is to reduce the number of calling arguments
+and linearize the calling sequence for insertion filters. Each filter
+definition is separate from its instantiation on the stream because
+there may be many streams operating at once within a single program.
+Each bucket is independent of the brigade so that the filters can rearrange
+and insert buckets at will. Each data item is isolated by the bucket
+structure, which allows them to be split across child buckets or shared
+with multiple streams (e.g., cached objects). We don't need to implement
+all of this on the first pass -- we just need to implement the ADT external
+interfaces such that they don't have to change as we make the overall
+stream more efficient.
+
+BTW, in case I didn't make this clear in past messages, this design is
+an amalgam of the best aspects of the designs from Henrik's Streams
+(see w3c-libwww), sfio (AT&T Research), IO-Lite (Rice Univ.), and
+libwww-ada95 (UCI). The MIME stuff in MSIE is based on Henrik's streams.
+Henrik's stuff is very fast, but is spaghetti code because it relies on
+callbacks and legacy stuff in libwww.
sfio is great but is intended to
+be a complete replacement for stdio and hence does way too much and is
+subject to a few patents that I don't appreciate. IO-Lite is cool but
+is probably only worth it when the entire OS is based on IO-Lite memory
+management, but regardless the code isn't available for commercial use.
+As Dean has mentioned many times, we already get most of the performance
+benefit of IO-Lite simply by avoiding memory copies on large writes.
+libwww-ada95 was an attempt to make Ada95 suck less for systems programming,
+which was only a partial success (it is very fast compared to other Ada95
+libraries, but memory management became a problem with complex filters).
+
+Writing our own streams library isn't NIH syndrome -- both Dean and I
+have independently investigated the other available alternatives and they
+just aren't suitable for our purpose. Even with all that, my own design
+only truly pays off (versus plain old BUFF) when you make good use of
+sendfile and shared object caching.
+
+[...]
+
+
+=================================================
+Other stuff Roy wrote on new-httpd:
+
+My buckets are passed around in list-queues (really just lists with front
+and back pointers). My buckets carry data and metadata and meta-metadata.
+My buckets are used to indicate stream-end, and the filter configuration
+itself is determined by the stream content. It probably sounds weird, but
+the effects of this interface are completely different than mere content
+filters. They simplify everything. I'm not saying that we have to
+simplify everything right away, but I am saying that it is just as easy
+to implement a fully-general filter using bucket brigades as it is
+to implement string interface filters -- all of the complex parts
+are handled by the ADTs.
+
+...
+
+The real psychedelic stuff happens when you can pass metadata (tokenized
+header fields) as buckets and the filters know how to pass that down the
+chain without treating them as data.
+
+...
+
+The purpose of passing a list of buckets around is to linearize
+the call stack for the frequent case of filtered content
+splitting one large bucket into separate buckets with filtered results
+interspersed in between. The effect is that a filter chain can frequently
+process an entire message in one pass down the chain, which enables the
+stream end to send the entire response in one go, which also allows it
+to do interesting things like provide a content length by summing the
+data length of all the buckets' data, and set a last-modified time
+by picking the most recent time from a set of static file buckets.
+
+I think it would help if we stopped using artificial examples. Let's
+try something simple:
+
+    socket <-- http <-- add_footer <-- add_header <-- send_file
+
+send_file calls its filter with an ap_file_t bucket and End-of-Stream (EOS)
+in the bucket list. add_header sets a flag, prepends another ap_file_t
+bucket to the list and sends the list to its filter. add_footer looks
+at the list, finds the EOS, inserts another ap_file_t bucket in
+front of the EOS, and sends the list on to its filter. http walks through
+the list picking up the (cached) stat values, notes the EOS and, seeing
+that its own flag for headers_sent is false, sets the cumulative metadata
+and sends the header fields, followed by three calls to the kernel to
+send out the three files using whatever mechanism is most efficient.
+
+The point here isn't that this is the only way to implement filters.
+The point is that no other interface can implement them as efficiently.
+Not even close. Yes, there are cases where string filters are just as
+efficient as any other design, but there is no case in which they are
+more efficient than bucket brigades. The reason is that being able
+to process a list of strings in one call more than offsets the extra
+cost of list processing, regardless of the filter type, and allows
+for additional features that have benefits for http processing.
+Like, for example, being able to determine the entire set of resources
+that make up the source of this dynamic resource without teaching every
+filter about WebDAV.
+
+...
+
+Making many small calls down the filter chain is something best
+avoided, which is why the bucket brigades interface consists of
+a linked list of buckets, such that all of the currently available
+data can be passed on in a single call.
+
+Being able to handle sendfile, cached objects and subrequests is very
+effective at improving efficiency, which is why the buckets are typed.
+A filter that needs to do byte-level processing will have to call a
+routine to convert the typed bucket into a data stream, but that decision
+is delayed until no other choice is available and adds no overhead to
+the common cases of non-filtered or pre-/post-filtered objects.
+
+Being able to process header fields (metadata) through the same filter
+set as the data is necessary for correctness and simplicity in the
+proper ordering of independently developed filter modules, which is
+why the buckets can carry metadata on the same stream. Every filter
+has to be knowledgeable about metadata because only the filter knows
+whether or not its actions will change the nature of the data.