author     Roy T. Fielding <fielding@apache.org>  2000-07-13 08:01:45 +0000
committer  Roy T. Fielding <fielding@apache.org>  2000-07-13 08:01:45 +0000
commit     f599c408c99c0007f00ebb6e3412c8f9751f31c5 (patch)
tree       3b8e2606f0a33d751f9419e9fc5bf464996de311 /buckets
parent     e465ba32742cbdb931dc0da6b38bed1ca98f7c38 (diff)
download   apr-f599c408c99c0007f00ebb6e3412c8f9751f31c5.tar.gz
Gather together the main points of bucket brigades.
git-svn-id: https://svn.apache.org/repos/asf/apr/apr/trunk@60356 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'buckets')
-rw-r--r--  buckets/doc_bucket_brigades.txt  381
1 file changed, 381 insertions(+), 0 deletions(-)
diff --git a/buckets/doc_bucket_brigades.txt b/buckets/doc_bucket_brigades.txt
new file mode 100644
index 000000000..9fc3c7a03
--- /dev/null
+++ b/buckets/doc_bucket_brigades.txt
@@ -0,0 +1,381 @@
+To: new-httpd@apache.org
+Subject: bucket brigades and IOL
+Date: Fri, 12 Nov 1999 23:57:43 -0800
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <199911122357.aa18914@gremlin-relay.ics.uci.edu>
+
+About two years ago I wasted a lot of time writing an Ada95 library
+called Onions that provides a stackable stream abstraction for files,
+sockets, etc. It is at <http://www.ics.uci.edu/pub/websoft/libwww-ada95/>
+if you want to take a look at it, but I don't recommend looking at the
+code since it is almost all just working around Ada95's lack of a
+system interface. I'll describe the worthwhile bits here.
+
+The heart of Onions is the input and output stream object
+classes and classwide types for building a data stream via a
+stack of stream objects (Input_Pipe and Output_Pipe). Reading
+from the head of an input pipe causes the head stream object
+to read from the next outbound stream object, and on down the line.
+Likewise for writing to the head of an output pipe. One of the
+main features of streams is that they can filter the data as it
+passes, converting, adding to, and/or removing from the data
+before giving it to the next object. Since multiple streams can be
+cascaded, the complete data conversion is the sum of the individual
+data conversions performed by the stream objects.
+
+So far, no big deal -- this can be manually created by stacking ap_iol
+types in a meaningful way. But, the one unique thing I did in Onions was
+abstract the memory handling into something called Buckets and moved them
+around in Bucket_Brigades. A bucket is an allocated segment of memory
+with pointers to its allocation address and current size. If I were doing
+this in C, I'd also add a pointer to current start address and allocated
+size, so that a single bucket could be shrunk from both ends without
+copying, and a function pointer for freeing it at the stream end.
+Note that this is the same type of memory structure that IO-Lite uses,
+though developed independently and for different reasons.
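+
+As an illustration, a minimal C sketch of that memory structure (the
+names here are mine for the sketch, not an existing API):
+
+    #include <stddef.h>
+
+    /* A bucket remembers both the original allocation (for freeing)
+     * and the current window of live data, so the window can shrink
+     * from either end without copying. */
+    typedef struct bucket {
+        char  *alloc_addr;              /* start of the allocation */
+        size_t alloc_len;               /* size of the allocation */
+        char  *addr;                    /* current start of data */
+        size_t len;                     /* current length of data */
+        void (*free)(struct bucket *);  /* cleanup at stream end */
+    } bucket;
+
+    /* Drop n bytes from the front: just advance the pointer. */
+    static void bucket_pop_front(bucket *b, size_t n)
+    {
+        if (n > b->len) n = b->len;
+        b->addr += n;
+        b->len  -= n;
+    }
+
+    /* Drop n bytes from the back: just reduce the length. */
+    static void bucket_pop_back(bucket *b, size_t n)
+    {
+        if (n > b->len) n = b->len;
+        b->len -= n;
+    }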
+
+A bucket brigade is a list-queue of buckets. Each of the stream read/write
+calls would pass a bucket brigade instead of single bucket, since this
+made insertion by filters more efficient, with the general idea being that
+the outbound end of the stream would be writing them out using writev
+or reading them in using readv, which is about as efficient as I could
+get with Ada95. [I call it a list-queue instead of just a queue because you
+have the choice of removing buckets from (or adding to) the queue one
+bucket at a time or an entire linked list of buckets.]
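+
+In C, the two enqueue granularities might look like this (a sketch;
+the helper names are illustrative, not an existing interface):
+
+    /* Doubly-linked node; the brigade keeps front and back pointers. */
+    typedef struct bucket_link {
+        struct bucket_link *prev, *next;
+        void *bucket;                   /* the bucket itself */
+    } bucket_link;
+
+    typedef struct brigade {
+        bucket_link *first, *last;
+    } brigade;
+
+    /* Enqueue a single bucket. */
+    static void brigade_append(brigade *bb, bucket_link *e)
+    {
+        e->prev = bb->last;
+        e->next = NULL;
+        if (bb->last) bb->last->next = e;
+        else          bb->first = e;
+        bb->last = e;
+    }
+
+    /* Splice an entire chain (head..tail) onto the queue in O(1) --
+     * the "list" half of the list-queue. */
+    static void brigade_splice(brigade *bb, bucket_link *head,
+                               bucket_link *tail)
+    {
+        head->prev = bb->last;
+        if (bb->last) bb->last->next = head;
+        else          bb->first = head;
+        tail->next = NULL;
+        bb->last = tail;
+    }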
+
+But we could go one step further. A bucket is an ADT, and as such can
+be used as a general handle for read-only memory, read-write memory,
+cache object, file handle, mmap handle, file name, URL, whatever.
+What if, instead of just a stream of memory, it could pass around a
+stream of memory interspersed with file handles or references to
+remote objects? A filter could then add stuff around the stream without
+causing too much parsing overhead, and if it needed to look at all the
+bytes in the stream it would just replace the bucket handle with a stream
+of memory sucked from that handle. Something like this was talked about
+last year (see threads on "Stacking up Response Handling" on 23 Sep 1998
+and "I/O filters & reference counts" in late December 1998 and January 1999).
+And Dean started something with ap_buf.h, but I don't know how he meant
+to finish it.
+
+What I was thinking of was
+
+    typedef enum {
+        AP_BUCKET_rwmem,
+        AP_BUCKET_rmem,
+        AP_BUCKET_file_t,
+        AP_BUCKET_mmap_t,
+        AP_BUCKET_filename,
+        AP_BUCKET_cached_entity,
+        AP_BUCKET_URI
+    } ap_bucket_color_t;
+
+    typedef struct ap_bucket_t ap_bucket_t;
+    struct ap_bucket_t {
+        ap_bucket_color_t color;
+        void *content;
+        ap_status_t (*free)(ap_bucket_t *bucket);
+        unsigned int refcount;
+    };
+
+    typedef struct ap_bucket_rwmem_t ap_bucket_rwmem_t;
+    struct ap_bucket_rwmem_t {
+        void *alloc_addr;
+        size_t alloc_len;
+        void *addr;
+        size_t len;
+    };
+
+    typedef struct ap_bucket_rmem_t ap_bucket_rmem_t;
+    struct ap_bucket_rmem_t {
+        void *addr;
+        size_t len;
+    };
+
+    typedef struct ap_bucket_filename ap_bucket_filename;
+    struct ap_bucket_filename {
+        ap_context_t *ctx;
+        char *name;
+        ap_stat_t *stat;    /* useful if already stat'ed */
+        ap_aa_t *conf;      /* access control structure for this file */
+    };
+
+    ...
+
+and then
+
+    typedef struct ap_bucket_list_t ap_bucket_list_t;
+    struct ap_bucket_list_t {
+        ap_bucket_t *bucket;
+        ap_bucket_list_t *prev;
+        ap_bucket_list_t *next;
+    };
+
+    typedef struct ap_brigade_t ap_brigade_t;
+    struct ap_brigade_t {
+        ap_context_t *ctx;
+        ap_bucket_list_t *first;
+        ap_bucket_list_t *last;
+        unsigned int count;
+    };
+
+and then construct the input and output streams as pushing these
+bucket brigades to or from the client. The streams would have to
+be a little more complicated than Onions, since I learned later that
+you also need a parallel stream of header fields (in tokenized form)
+in order for it to work with anything HTTP-like.
+
+Why use an enum instead of a bunch of function pointers for each type
+of bucket, kind of like ap_iol? Because it allows adjacent memory
+buckets (the most frequent kind after a filter operation) to be
+gathered into a single writev. Also, we need a way to be able to
+set up an operation and figure out what it will produce without
+actually performing the operation -- this is for OPTIONS and HEAD.
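+
+For example, the stream end could coalesce runs of adjacent memory
+buckets into a single writev call. A sketch using the structures
+above (only the rwmem case is gathered; error handling and the
+type-specific paths for file/mmap buckets are elided):
+
+    #include <sys/uio.h>    /* writev, struct iovec */
+
+    #define MAX_IOV 16
+
+    static int brigade_flush(int fd, ap_brigade_t *bb)
+    {
+        struct iovec iov[MAX_IOV];
+        int n = 0;
+        ap_bucket_list_t *e;
+
+        for (e = bb->first; e != NULL; e = e->next) {
+            if (e->bucket->color == AP_BUCKET_rwmem && n < MAX_IOV) {
+                ap_bucket_rwmem_t *m = e->bucket->content;
+                iov[n].iov_base = m->addr;
+                iov[n].iov_len  = m->len;
+                n++;
+            }
+            else {
+                /* flush what has been gathered, then send this
+                 * bucket by its own most efficient path (sendfile,
+                 * mmap, ...) */
+                if (n > 0 && writev(fd, iov, n) < 0)
+                    return -1;
+                n = 0;
+            }
+        }
+        return (n > 0 && writev(fd, iov, n) < 0) ? -1 : 0;
+    }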
+
+Note that this would completely change the way we handle internal
+redirects, subrequests, server-side include, mod_proxy, access control, etc.
+And then most of the API hooks would need to change. I think that is why
+Dean was putting it off until 2.1. The annoying thing is that this is the
+most useful rearchitecting of the server -- the MPM, APR, and hook changes
+make 2.0 easier/cleaner/faster to port to other platforms, but layering
+enables in one fell swoop almost every significant non-config feature
+that our users have requested. A cache would just be a hash table or
+btree of file buckets, complete with AA info.
+
+Anyway, that was stuck in the back of my head and had to get out.
+I won't be able to work on it until after the dissertation is done,
+which every day seems to be further away. Maybe 3.0, with rHTTP/2.0.
+
+....Roy
+
+=================================================
+To: new-httpd@apache.org
+Subject: Re: bucket brigades and IOL
+In-reply-to: Your message of "Sat, 13 Nov 1999 20:43:58 GMT."
+ <382DCD8E.881B8468@algroup.co.uk>
+Date: Sun, 14 Nov 1999 22:24:03 -0800
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <199911142224.aa22545@gremlin-relay.ics.uci.edu>
+
+BenL wrote:
+>I've got to say that this is the most coherent suggestion along these
+>lines that I've seen yet. I rather like it. One thing I'd add is that if
+>you are going to have a movable "start of block" pointer, and changeable
+>length, it can be nice to allocate extra around the edges under some
+>circumstances, so that lower layers can expand the block without having
+>to add extra chunks.
+
+Or, alternatively, allocate equal-size blocks and just pass around
+a reference pair within the buckets that, when the bucket is freed,
+access a more complicated reference-counting pool. I think that is
+closer to what IO-Lite does.
+
+>Also, the usual objections still apply - i.e. it is awkward to do things
+>like searching for particular strings, since they may cross boundaries.
+>I'm beginning to think that the right answer to this is to provide nice
+>matching functions that know about the chunked structures, and last
+>resort functions that'll glue it all back into one chunk...
+
+Yep, that's what I ended up doing for Ada95, though in that case there
+were no easier alternatives.
+
+....Roy
+
+=================================================
+To: new-httpd@apache.org
+Subject: Re: layered I/O (was: cvs commit: ...)
+In-reply-to: Your message of "Wed, 29 Mar 2000 01:21:09 PST."
+ <Pine.LNX.4.21.0003290004100.10357-100000@piglet>
+Date: Wed, 29 Mar 2000 02:05:08 -0800
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <200003290205.aa19557@gremlin-relay.ics.uci.edu>
+
+>Selection of IO Layers
+>
+>The core selects a source module and IO layers based on the urlspace
+>configuration. Content might be generated by mod_perl, and the result is
+>piped through mod_chunk, mod_ssl, and mod_net, in turn. When the content
+>generator runs, the core enforces that the module set the content type
+>before the first call to ap_bput. The content type is set by a function
+>call. The function (ap_set_content_type(request_rec *, char *)) examines
+>the content type and adds IO layers as necessary. For server-parsed
+>HTML, the core might insert mod_include immediately after mod_perl.
+
+The problem with thinking of it that way is that, as Dean mentioned,
+the output of one module may be filtered, and the filter may indicate that
+content should be embedded from another URL, which turns out to be a
+CGI script that outputs further parseable content. In this instance,
+the goal of layered-IO is to abstract away such behavior so that the
+instance is processed recursively and thus doesn't result in some tangled
+mess of processing code for subrequests. Doing it requires that each
+layer be able to pass both data and metadata, and have both data and
+metadata be processed at each layer (if desired), rather than call a
+single function that would set the metadata for the entire response.
+
+My "solution" to that is to pass three interlaced streams -- data,
+metadata, and meta-metadata -- through each layer. The metadata
+streams would point to a table of tokenized name-value pairs.
+There are lots of ways to do that, going back to my description of
+bucket brigades long ago. Basically, each block of memory would
+indicate what type of data, with metadata occurring in a block before
+the data block(s) that it describes (just like chunk-size describes
+the subsequent chunk-data) and the layers could be dynamically
+rearranged based on the metadata that passed through them, in
+accordance with the purpose of the filter.
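+
+A sketch of a filter's main loop under that scheme (is_metadata() is
+an assumed helper standing in for a metadata bucket type, which is
+not part of the structures proposed earlier):
+
+    int is_metadata(const ap_bucket_t *b);   /* assumed, not shown */
+
+    static void filter_pass(ap_brigade_t *bb)
+    {
+        ap_bucket_list_t *e;
+        for (e = bb->first; e != NULL; e = e->next) {
+            if (is_metadata(e->bucket)) {
+                /* tokenized name-value pairs arrive ahead of the
+                 * data they describe; a filter may rearrange the
+                 * layer stack here before any data flows through */
+            }
+            else {
+                /* ordinary data: transform it or pass it through */
+            }
+        }
+    }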
+
+>(Can anyone produce a use case where the IO chain could change after
+>output begins?)
+
+Output is a little easier, but that is the normal case for input.
+We don't know what filters to apply to the request body until after
+we have passed through the HTTP headers, and the HTTP message processor
+is itself a filter in this model.
+
+....Roy
+
+
+=================================================
+To: new-httpd@apache.org
+Subject: Re: filtering patches
+In-reply-to: Your message of "Mon, 10 Jul 2000 15:33:25 PDT."
+ <Pine.LNX.4.21.0007101528180.10878-100000@koj.rkbloom.net>
+Date: Mon, 10 Jul 2000 16:58:00 -0700
+From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
+Message-ID: <200007101657.aa21782@gremlin-relay.ics.uci.edu>
+
+[...]
+I meant that the filters, when written to as part of the output stream,
+are treated as a stack (write to the top-most filter without any knowledge
+of what may lie underneath it). So the process of arranging filters
+for a particular response is like dropping them onto a stack. When a
+filter is done or the stream is closed, each instantiated filter cleans
+up according to its local state and then destroys itself (as it is popped
+off the stack).
+
+This is completely separate from the registration of filters by
+name and purpose, which could be done by hooks. The difference is that
+filters are registered at config time but only instantiated (given local
+storage) and arranged on a per-stream basis.
+
+Bucket brigades are simply a way to encapsulate and pass data down the stream
+such that it can be as efficient as the sender desires, while retaining
+a simple interface. The purpose of the bucket is to make handling of the
+data uniform regardless of its type, or make type-specific conversions
+via a single ADT call if and only if they are needed by some filter.
+The purpose of the brigade is to reduce the number of calling arguments
+and linearize the calling sequence for insertion filters. Each filter
+definition is separate from its instantiation on the stream because
+there may be many streams operating at once within a single program.
+Each bucket is independent of the brigade so that the filters can rearrange
+and insert buckets at will. Each data item is isolated by the bucket
+structure, which allows it to be split across child buckets or shared
+with multiple streams (e.g., cached objects). We don't need to implement
+all of this on the first pass -- we just need to implement the ADT external
+interfaces such that they don't have to change as we make the overall
+stream more efficient.
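+
+The sharing falls out of the refcount field in the ADT above; a sketch
+(the helper names are mine):
+
+    /* Share one bucket with another brigade (e.g. a cached object
+     * being sent on two streams at once): no copy, just a new ref. */
+    static ap_bucket_t *bucket_share(ap_bucket_t *b)
+    {
+        b->refcount++;
+        return b;
+    }
+
+    /* Release a reference; only the last holder frees the storage. */
+    static void bucket_release(ap_bucket_t *b)
+    {
+        if (--b->refcount == 0 && b->free != NULL)
+            b->free(b);
+    }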
+
+BTW, in case I didn't make this clear in past messages, this design is
+an amalgam of the best aspects of the designs from Henrik's Streams
+(see w3c-libwww), sfio (AT&T Research), IO-Lite (Rice Univ.), and
+libwww-ada95 (UCI). The MIME stuff in MSIE is based on Henrik's streams.
+Henrik's stuff is very fast, but is spaghetti code because it relies on
+callbacks and legacy stuff in libwww. sfio is great but is intended to
+be a complete replacement for stdio and hence does way too much and is
+subject to a few patents that I don't appreciate. IO-Lite is cool but
+is probably only worth it when the entire OS is based on IO-Lite memory
+management, and in any case the code isn't available for commercial use.
+As Dean has mentioned many times, we already get most of the performance
+benefit of IO-Lite simply by avoiding memory copies on large writes.
+libwww-ada95 was an attempt to make Ada95 suck less for systems programming,
+which was only a partial success (it is very fast compared to other Ada95
+libraries, but memory management became a problem with complex filters).
+
+Writing our own streams library isn't NIH syndrome -- both Dean and I
+have independently investigated the other available alternatives and they
+just aren't suitable for our purpose. Even with all that, my own design
+only truly pays off (versus plain old BUFF) when you make good use of
+sendfile and shared object caching.
+
+[...]
+
+
+=================================================
+Other stuff Roy wrote on new-httpd:
+
+My buckets are passed around in list-queues (really just lists with front
+and back pointers). My buckets carry data and metadata and meta-metadata.
+My buckets are used to indicate stream-end, and the filter configuration
+itself is determined by the stream content. It probably sounds weird, but
+the effects of this interface are completely different from mere content
+filters. They simplify everything. I'm not saying that we have to
+simplify everything right away, but I am saying that it is just as easy
+to implement a fully-general filter using bucket brigades as it is
+to implement string interface filters -- all of the complex parts
+are handled by the ADTs.
+
+...
+
+The real psychedelic stuff happens when you can pass metadata (tokenized
+header fields) as buckets and the filters know how to pass that down the
+chain without treating them as data.
+
+...
+
+The purpose of passing a list of buckets around is to linearize
+the call stack for the frequent case of filtered content
+splitting one large bucket into separate buckets with filtered results
+interspersed in between. The effect is that a filter chain can frequently
+process an entire message in one pass down the chain, which enables the
+stream end to send the entire response in one go, which also allows it
+to do interesting things like provide a content length by summing the
+data length of all the buckets' data, and set a last-modified time
+by picking the most recent time from a set of static file buckets.
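+
+For instance, the content length falls out of one walk down the
+brigade (bucket_len() is an assumed per-type accessor: memory buckets
+know their length, file buckets their stat size):
+
+    size_t bucket_len(const ap_bucket_t *b);   /* assumed, not shown */
+
+    static size_t brigade_length(const ap_brigade_t *bb)
+    {
+        size_t total = 0;
+        const ap_bucket_list_t *e;
+        for (e = bb->first; e != NULL; e = e->next)
+            total += bucket_len(e->bucket);
+        return total;
+    }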
+
+I think it would help if we stopped using artificial examples. Let's
+try something simple:
+
+ socket <-- http <-- add_footer <-- add_header <-- send_file
+
+send_file calls its filter with an ap_file_t bucket and End-of-Stream (EOS)
+in the bucket list. add_header sets a flag, prepends another ap_file_t
+bucket to the list and sends the list to its filter. add_footer looks
+at the list, finds the EOS, inserts another ap_file_t bucket in
+front of the EOS, and sends the list on to its filter. http walks through
+the list picking up the (cached) stat values, notes the EOS and, seeing
+that its own flag for headers_sent is false, sets the cumulative metadata
+and sends the header fields, followed by three calls to the kernel to
+send out the three files using whatever mechanism is most efficient.
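+
+In code, add_footer's part of that walk-through might look like this
+(is_eos(), make_file_bucket() and pass_down() are illustrative
+stand-ins for whatever the real interfaces become):
+
+    int  is_eos(const ap_bucket_t *b);                      /* assumed */
+    ap_bucket_list_t *make_file_bucket(const char *name);   /* assumed */
+    void pass_down(ap_brigade_t *bb);                       /* assumed */
+
+    static void add_footer(ap_brigade_t *bb)
+    {
+        ap_bucket_list_t *e;
+        for (e = bb->first; e != NULL; e = e->next) {
+            if (is_eos(e->bucket)) {
+                /* insert a file bucket just ahead of the EOS */
+                ap_bucket_list_t *f = make_file_bucket("footer.html");
+                f->prev = e->prev;
+                f->next = e;
+                if (e->prev) e->prev->next = f;
+                else         bb->first = f;
+                e->prev = f;
+                bb->count++;
+                break;
+            }
+        }
+        pass_down(bb);    /* one call passes the whole list onward */
+    }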
+
+The point here isn't that this is the only way to implement filters.
+The point is that no other interface can implement them as efficiently.
+Not even close. Yes, there are cases where string filters are just as
+efficient as any other design, but there is no case in which they are
+more efficient than bucket brigades. The reason is that being able
+to process a list of strings in one call more than offsets the extra
+cost of list processing, regardless of the filter type, and allows
+for additional features that have benefits for http processing.
+Like, for example, being able to determine the entire set of resources
+that make up the source of this dynamic resource without teaching every
+filter about WebDAV.
+
+...
+
+Making many small calls down the filter chain is something best
+avoided, which is why the bucket brigades interface consists of
+a linked list of buckets, such that all of the currently available
+data can be passed-on in a single call.
+
+Being able to handle sendfile, cached objects and subrequests is very
+effective at improving efficiency, which is why the buckets are typed.
+A filter that needs to do byte-level processing will have to call a
+routine to convert the typed bucket into a data stream, but that decision
+is delayed until no other choice is available and adds no overhead to
+the common cases of non-filtered or pre-/post-filtered objects.
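+
+A sketch of that deferred conversion (bucket_to_rmem() is an assumed
+routine that reads or maps the typed object into memory):
+
+    void bucket_to_rmem(ap_bucket_t *b);   /* assumed, not shown */
+
+    static void byte_filter(ap_brigade_t *bb)
+    {
+        ap_bucket_list_t *e;
+        for (e = bb->first; e != NULL; e = e->next) {
+            if (e->bucket->color != AP_BUCKET_rwmem &&
+                e->bucket->color != AP_BUCKET_rmem) {
+                /* pay the conversion cost only when the filter truly
+                 * has to see the bytes */
+                bucket_to_rmem(e->bucket);
+            }
+            /* ... examine or transform the bytes in place ... */
+        }
+    }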
+
+Being able to process header fields (metadata) through the same filter
+set as the data is necessary for correctness and simplicity in the
+proper ordering of independently developed filter modules, which is
+why the buckets can carry metadata on the same stream. Every filter
+has to be knowledgeable about metadata because only the filter knows
+whether or not its actions will change the nature of the data.
+
+