Diffstat (limited to 'buckets/doc_wishes.txt')
-rw-r--r-- | buckets/doc_wishes.txt | 269 |
1 files changed, 0 insertions, 269 deletions
diff --git a/buckets/doc_wishes.txt b/buckets/doc_wishes.txt
deleted file mode 100644
index c85d01ae0..000000000
--- a/buckets/doc_wishes.txt
+++ /dev/null
@@ -1,269 +0,0 @@

Wishes -- use cases for layered IO
==================================

[Feel free to add your own]

Dirk's original list:
---------------------

   This file is here so that I do not have to remind myself
   about the reasons for layered IO, apart from the obvious one.

   0. To get away from a 1-to-1 mapping

      i.e. a single URI can cause multiple backend requests, in
      arbitrary configurations, such as in parallel, tunnel/piped,
      or in some sort of funnel mode.  Such multiple backend
      requests, with fully layered IO, can be treated exactly
      like any URI request; and recursion is born :-)

   1. To do on-the-fly charset conversion

      Be able, at least in theory, to send out your content using
      latin1, latin2 or any other charset, generated from static
      _and_ dynamic content in other charsets (typically Unicode
      encoded as UTF-7 or UTF-8).  Such conversion is prompted by
      things like the user-agent string, a cookie, or other hints
      about the capabilities of the OS, language preferences and
      other (in)capabilities of the final recipient.

   2. To be able to do fancy templates

      Have your application/CGI send out an XML structure of
      field/value paired contents, which is substituted into a
      template by the web server, possibly based on information
      accessible/known to the web server which you do not want
      to be known to the backend script.  Ideally that template
      would be just as easy to generate by a backend as well
      (see 0).

   3. On-the-fly translation

      And other general text and output munging, such as
      translating an English page into Spanish while it goes
      through your proxy, or JPEG-ing a GIF generated by
      mod_perl+gd.

   Dw.


Dean's canonical list of use cases
----------------------------------

Date: Mon, 27 Mar 2000 17:37:25 -0800 (PST)
From: Dean Gaudet <dgaudet-list-new-httpd@arctic.org>
To: new-httpd@apache.org
Subject: canonical list of i/o layering use cases
Message-ID: <Pine.LNX.4.21.0003271648270.14812-100000@twinlark.arctic.org>

i really hope this helps this discussion move forward.

the following is the list of all applications i know of which have
been proposed to benefit from i/o layering.

- data sink abstractions:
  - memory destination (for ipc; for caching; or even for abstracting
    things such as strings, which can be treated as an i/o object)
  - pipe/socket destination
  - portability variations on the above

- data source abstractions, such as:
  - file source (includes proxy caching)
  - memory source (includes most dynamic content generation)
  - network source (TCP-to-TCP proxying)
  - database source (which is probably, under the covers, something
    like a memory source mapped from the db process on the same box,
    or from a network source on another box)
  - portability variations on the above sources

- filters:
  - encryption
  - translation (ebcdic, unicode)
  - compression
  - chunking
  - MUX
  - mod_include et al
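
[Aside, for illustration only: a minimal pass-through output filter,
written against the bucket-brigade interface that httpd 2.x eventually
settled on.  The filter name and the absence of any real transformation
are placeholders; the point is only the shape of a filter that consumes
buckets from upstream and hands the brigade to its downstream neighbour.]

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* A do-nothing output filter: reads every bucket it is handed and
     * passes the brigade on unchanged to the next filter in the stack. */
    static apr_status_t noop_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
    {
        apr_bucket *b;

        for (b = APR_BRIGADE_FIRST(bb);
             b != APR_BRIGADE_SENTINEL(bb);
             b = APR_BUCKET_NEXT(b)) {

            const char *data;
            apr_size_t len;

            if (APR_BUCKET_IS_EOS(b)) {
                break;                  /* end of the response stream */
            }
            /* a real filter would transform `data' here */
            apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
        }

        /* hand everything to the downstream neighbour */
        return ap_pass_brigade(f->next, bb);
    }

    /* registration, typically done from a module's register_hooks:
     *   ap_register_output_filter("NOOP", noop_out_filter, NULL,
     *                             AP_FTYPE_RESOURCE);
     */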
and here are some of my thoughts on trying to further quantify filters:

a filter separates two layers and is both a sink and a source.  a
filter takes an input stream of bytes OOOO... and generates an output
stream of bytes which can be broken into blocks such as:

    OOO NNN O NNNNN ...

    where O = an old or original byte copied from the input
    and   N = a new byte generated by the filter

for each filter we can calculate a quantity i'll call the copied-content
ratio, or CCR:

    CCR = nbytes_old / nbytes_new

where:
    nbytes_old = number of bytes in the output of the filter which
        are copied from the input (in zero-copy this would mean
        "copy by reference counting an input buffer")
    nbytes_new = number of bytes generated by the filter which
        weren't present in the input

examples:

CCR = infinity: who cares -- straight through with no transformation.
    the filter shouldn't even be there.

CCR = 0: encryption, translation (ebcdic, unicode), compression.
    these get zero benefit from zero-copy.

CCR > 0: chunking, MUX, mod_include

from the point of view of evaluating the benefit of zero-copy we only
care about filters with CCR > 0 -- because the CCR = 0 cases degenerate
into a single-copy scheme anyhow.

it is worth noting that the large_write heuristic in BUFF fairly
clearly handles zero-copy at very little overhead for CCRs larger than
DEFAULT_BUFSIZE.

what still needs quantifying is what the CCR of mod_include would be.

for a particular zero-copy implementation we can find some threshold k
where filters with CCRs >= k are faster with the zero-copy implementation
and CCRs < k are slower... faster/slower as compared to a baseline
implementation such as the existing BUFF.

it's my opinion that, when you consider the data sources and the filters
listed above, *in general* the existing BUFF heuristics are faster than
a complete zero-copy implementation.

you might ask how this jibes with published research such as the
IO-Lite stuff.  well, when it comes right down to it, the research in
the IO-Lite papers deals with very large CCRs and contrasts them against
a naive buffering implementation such as stdio -- they don't consider
what a few heuristics such as apache's BUFF can do.

Dean
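
[Aside, for illustration only: the CCR is easiest to pin down with a toy
counter.  Nothing below exists in the tree; the 8000/72-byte chunking
example is made up, purely to make the arithmetic concrete.]

    #include <stdio.h>

    /* per-filter byte counters: `old' bytes are copied (by reference, in
     * a zero-copy world) from the input, `new' bytes are generated by
     * the filter itself */
    struct ccr_stats {
        unsigned long nbytes_old;
        unsigned long nbytes_new;
    };

    static void note_copied(struct ccr_stats *s, unsigned long n)
    { s->nbytes_old += n; }

    static void note_generated(struct ccr_stats *s, unsigned long n)
    { s->nbytes_new += n; }

    /* CCR = nbytes_old / nbytes_new; a pure pass-through
     * (nbytes_new == 0) is reported as "infinite" via -1.0 */
    static double ccr(const struct ccr_stats *s)
    {
        if (s->nbytes_new == 0)
            return -1.0;
        return (double)s->nbytes_old / (double)s->nbytes_new;
    }

    int main(void)
    {
        /* e.g. a chunking filter that copied 8000 bytes of body and
         * added 72 bytes of chunk headers/trailers */
        struct ccr_stats chunker = { 0, 0 };
        note_copied(&chunker, 8000);
        note_generated(&chunker, 72);
        printf("chunking CCR = %.1f\n", ccr(&chunker));   /* ~111.1 */
        return 0;
    }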

Jim's summary of a discussion
-----------------------------

   OK, so the main points we wish to address are (in no particular order):

   1. zero-copy
   2. prevent modules/filters from having to glob the entire
      data stream in order to start processing/filtering
   3. the ability to layer and "multiplex" data and meta-data
      in the stream
   4. the ability to perform all HTTP processing at the
      filter level (including proxy), even if not implemented in
      this phase
   5. room for optimization and recursion

   Jim Jagielski


Roy's ramblings
---------------

   Data flow networks are a very well-defined and understood software
   architecture.  They have a single, very important constraint: no filter
   is allowed to know anything about the nature of its upstream or
   downstream neighbors beyond what is defined by the filter's own
   interface.  That constraint is what makes data flow networks highly
   configurable and reusable.  Those are properties that we want from our
   filters.

   ...

   One of the goals of the filter concept was to fix the bird's nest of
   interconnected side-effect conditions that allows buff to perform well,
   without losing that performance.  That's why there is so much
   trepidation about anyone messing with 1.3.x buff.

   ...

   Content filtering is my least important goal.  Completely replacing
   HTTP parsing with a filter is my primary goal, followed by a better
   proxy, then internal memory caches, and finally zero-copy sendfile
   (in order of importance, but in reverse order of likely
   implementation).  Content filtering is something we get for free
   using the bucket brigade interface, but we don't get anything for
   free if we start with an interface that only supports content
   filtering.

   ...

   I don't think it is safe to implement filters in Apache without either
   a smart allocation system or a strict limiting mechanism that prevents
   filters from buffering more than 8KB [or a user-definable amount] of
   memory at a time (for the entire non-flushed stream).  It isn't
   possible to create a robust server implementation using filters that
   allocate memory from a pool (or the heap, or a stack, or whatever)
   without somehow reclaiming and reusing the memory that gets written
   out to the network.  There is a certain level of "optimization" that
   must be present before any filtering mechanism can be in Apache, and
   that means meeting the requirement that the server not keel over and
   die the first time a user requests a large filtered file.  XML tree
   manipulation is an example where that can happen.

   ...

   Disabling content-length just because there are filters in the stream
   is a blatant cop-out.  If you have to do that then the design is wrong.
   At the very least the HTTP filter/buff should be capable of discovering
   whether it knows the content length by examining whether it has the
   whole response in a buffer (or an fd) before it sends out the headers.

   ...

   No layered-IO solution will work with the existing memory allocation
   mechanisms of Apache.  The reason is simply that some filters can
   incrementally process data and some filters cannot, and they often
   won't know the answer until they have processed the data they are
   given.  This means the buffering mechanism needs some form of overflow
   mechanism that diverts parts of the stream into a slower-but-larger
   buffer (file), and the only clean way to do that is to have the memory
   allocator for the stream also do paging to disk.  You can't do this
   within the request pool because each layer may need to allocate more
   total memory than is available on the machine, and you can't depend on
   some parts of the response being written before later parts are
   generated, because some filtering decisions require knowledge of the
   end of the stream before they can process the beginning.

   ...

   The purpose of the filtering mechanism is to provide a useful and
   easy-to-understand means of extending the functionality of
   independent modules (filters) by rearranging them in stacks via a
   uniform interface.
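
[Aside, for illustration only: Roy's overflow point is easier to picture
with a toy "spill buffer" that keeps at most a fixed amount of the
unflushed stream in memory and diverts anything beyond that to a
temporary file.  None of these names exist in Apache; the 8KB limit
simply echoes the figure above.]

    #include <stdio.h>
    #include <string.h>

    #define SPILL_LIMIT 8192          /* echoes the 8KB figure above */

    struct spill_buf {
        char   mem[SPILL_LIMIT];      /* fast path: small output stays here */
        size_t used;
        FILE  *overflow;              /* slow path: disk, created on demand */
    };

    /* append `len' bytes, spilling to a temp file once the memory cap
     * for the non-flushed stream has been reached */
    static int spill_write(struct spill_buf *b, const char *data, size_t len)
    {
        size_t room   = SPILL_LIMIT - b->used;
        size_t to_mem = len < room ? len : room;

        memcpy(b->mem + b->used, data, to_mem);
        b->used += to_mem;

        if (to_mem < len) {           /* the rest goes to disk */
            if (!b->overflow && !(b->overflow = tmpfile()))
                return -1;
            if (fwrite(data + to_mem, 1, len - to_mem, b->overflow)
                != len - to_mem)
                return -1;
        }
        return 0;
    }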

Paul J. Reder's use cases for filters
-------------------------------------

   1) Containing only text.
   2) Containing 10 .gif or .jpg references (perhaps filtering
      from one format to the other).
   3) Containing an exec of a cgi that generates a text only file.
   4) Containing an exec of a cgi that generates an SSI of a text only
      file.
   5) Containing an exec of a cgi that generates an SSI that execs a cgi
      that generates a text only file (that swallows a fly, I don't know
      why).
   6) Containing an SSI that execs a cgi that generates an SSI that
      includes a text only file.
      NOTE: Solutions must be able to handle *both* 5 and 6.  Order
      shouldn't matter.
   7) Containing text that must be altered via a regular expression
      filter to change all occurrences of "rederpj" to "misguided".
   8) Containing text that must be altered via a regular expression
      filter to change all occurrences of "rederpj" to "lost".
   9) Containing perl or php that must be handed off for processing.
  10) A page in ascii that needs to be converted to ebcdic, or from
      one code page to another.
  11) Use the babelfish translation filter to translate text on a
      page from Spanish to Martian-Swahili.
  12) Translate to Esperanto, compress, and encrypt the output from
      a php program generated by a perl script called from a cgi exec
      embedded in a file included by an SSI :)
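
[Aside, for illustration only: none of the specific filters above exist,
so the sketch below is just the shape of use case 12 -- a stack of
stages where each stage knows nothing about its neighbors except a
common "emit downstream" interface.  The translate/compress/encrypt
names are placeholders and the transformations are stubs.]

    #include <stdio.h>
    #include <string.h>

    /* every stage has the same signature: take some bytes, do its work,
     * and hand the result to the next stage through `next' */
    struct stage;
    typedef void (*stage_fn)(struct stage *self, const char *buf, size_t len);

    struct stage {
        stage_fn      run;
        struct stage *next;   /* downstream neighbor, NULL for the sink */
    };

    static void pass_down(struct stage *self, const char *buf, size_t len)
    {
        if (self->next)
            self->next->run(self->next, buf, len);
    }

    /* placeholder "filters" -- real ones would rewrite buf before
     * passing it on */
    static void translate(struct stage *s, const char *buf, size_t len)
    { /* ... to Esperanto ... */ pass_down(s, buf, len); }
    static void compress_(struct stage *s, const char *buf, size_t len)
    { /* ... gzip ... */         pass_down(s, buf, len); }
    static void encrypt_(struct stage *s, const char *buf, size_t len)
    { /* ... cipher ... */       pass_down(s, buf, len); }
    static void network_sink(struct stage *s, const char *buf, size_t len)
    { (void)s; fwrite(buf, 1, len, stdout); }

    int main(void)
    {
        /* stack the stages; rearranging them is just relinking `next' */
        struct stage sink = { network_sink, NULL };
        struct stage enc  = { encrypt_, &sink };
        struct stage zip  = { compress_, &enc };
        struct stage xlat = { translate, &zip };

        const char *body = "output of the php/perl/cgi/SSI tangle\n";
        xlat.run(&xlat, body, strlen(body));
        return 0;
    }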