Wishes -- use cases for layered IO
==================================

[Feel free to add your own]

Dirk's original list:
---------------------

    This file is there so that I do not have to keep reminding myself
    of the reasons for Layered IO, apart from the obvious one.

    0. To get away from a 1-to-1 mapping

       i.e. a single URI can cause multiple backend requests, in
       arbitrary configurations such as parallel, tunneled/piped, or
       some sort of funnel mode. With fully layered IO, such multiple
       backend requests can be treated exactly like any URI request;
       and recursion is born :-)

    1. To do on-the-fly charset conversion

       Be able, at least in theory, to send out your content in
       latin1, latin2 or any other charset, generated from static
       _and_ dynamic content in other charsets (typically Unicode
       encoded as UTF-7 or UTF-8). Such conversion is prompted by
       things like the user-agent string, a cookie, or other hints
       about the capabilities of the OS, language preferences, and
       other (in)capabilities of the final recipient.
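       A rough sketch of such a conversion step, assuming the content
       arrives as a stream of byte chunks (the function below is
       illustrative, not actual Apache code):

       ```python
       import codecs

       def transcode(chunks, src="utf-8", dst="latin-1", errors="replace"):
           """Incrementally re-encode a stream of byte chunks from src to dst.

           An incremental decoder is used so that multi-byte sequences
           split across chunk boundaries are handled correctly.
           """
           decoder = codecs.getincrementaldecoder(src)()
           for chunk in chunks:
               text = decoder.decode(chunk)
               if text:
                   yield text.encode(dst, errors)
           tail = decoder.decode(b"", final=True)
           if tail:
               yield tail.encode(dst, errors)

       # UTF-8 input split in the middle of a character ("café"),
       # re-encoded on the fly as Latin-1.
       out = b"".join(transcode([b"caf\xc3", b"\xa9"]))
       ```

       The point of the incremental decoder is exactly the layered-IO
       concern: a filter cannot assume chunk boundaries line up with
       character boundaries.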

    2. To be able to do fancy templates

       Have your application/CGI send out an XML structure of
       field/value paired contents, which the web server substitutes
       into a template, possibly based on information accessible/known
       to the web server which you do not want to be known to the
       backend script. Ideally that template would be just as easy to
       generate from a backend as well (see 0).
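       A toy sketch of that substitution; both the XML field/value
       shape and the $name placeholder syntax are invented here for
       illustration:

       ```python
       import string
       import xml.etree.ElementTree as ET

       def fill_template(template_text, backend_xml):
           """Substitute field/value pairs from a backend's XML reply
           into a server-side template. Unknown placeholders are left
           alone (safe_substitute) rather than raising."""
           root = ET.fromstring(backend_xml)
           fields = {f.get("name"): (f.text or "") for f in root.iter("field")}
           return string.Template(template_text).safe_substitute(fields)

       # Backend emits field/value pairs; the server owns the template.
       page = fill_template(
           "<h1>$title</h1><p>$body</p>",
           "<response><field name='title'>Hi</field>"
           "<field name='body'>Layered IO</field></response>",
       )
       ```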

    3. On-the-fly translation

       And other general text and output munging, such as translating
       an English page into Spanish as it passes through your proxy,
       or JPEG-ing a GIF generated by mod_perl+gd.

       Dw.


Dean's canonical list of use cases
----------------------------------

Date: Mon, 27 Mar 2000 17:37:25 -0800 (PST)
From: Dean Gaudet <dgaudet-list-new-httpd@arctic.org>
To: new-httpd@apache.org
Subject: canonical list of i/o layering use cases
Message-ID: <Pine.LNX.4.21.0003271648270.14812-100000@twinlark.arctic.org>

i really hope this helps this discussion move forward.

the following is the list of all applications i know of which have been
proposed to benefit from i/o layering.

- data sink abstractions:
  - memory destination (for ipc; for caching; or even for abstracting
    things such as strings, which can be treated as an i/o object)
  - pipe/socket destination
  - portability variations on the above

- data source abstractions, such as:
  - file source (includes proxy caching)
  - memory source (includes most dynamic content generation)
  - network source (TCP-to-TCP proxying)
  - database source (which is probably, under the covers, something like
    a memory source mapped from the db process on the same box, or from
    a network source on another box)
  - portability variations on the above sources

- filters:
  - encryption
  - translation (ebcdic, unicode)
  - compression
  - chunking
  - MUX
  - mod_include et al
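the value of these source/sink abstractions is that one consumer loop
can serve any of them. a minimal sketch, assuming python file-like
objects stand in for the common read interface:

```python
import io

CHUNK = 8192

def chunks(stream, size=CHUNK):
    """Yield successive blocks from any file-like source.

    Memory (io.BytesIO), file, pipe, and socket sources all expose the
    same read() interface, so a single consumer loop serves them all;
    swapping the source never changes the downstream code.
    """
    while True:
        block = stream.read(size)
        if not block:
            return
        yield block

# A memory source and a file source would be consumed identically.
mem_source = io.BytesIO(b"hello from memory")
data = b"".join(chunks(mem_source, size=4))
```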

and here are some of my thoughts on trying to further quantify filters:

a filter separates two layers and is both a sink and a source. a
filter takes an input stream of bytes OOOO... and generates an
output stream of bytes which can be broken into blocks such as:

    OOO NNN O NNNNN ...

    where O = an old or original byte copied from the input
    and   N = a new byte generated by the filter

for each filter we can calculate a quantity i'll call the copied-content
ratio, or CCR:

    CCR = nbytes_old / nbytes_new

where:
    nbytes_old = number of bytes in the output of the filter which are
                 copied from the input (in zero-copy this would mean
                 "copy by reference counting an input buffer")
    nbytes_new = number of bytes generated by the filter which weren't
                 present in the input

examples:

CCR = infinity: who cares -- straight through with no
    transformation. the filter shouldn't even be there.

CCR = 0: encryption, translation (ebcdic, unicode), compression.
    these get zero benefit from zero-copy.

CCR > 0: chunking, MUX, mod_include

from the point of view of evaluating the benefit of zero-copy we only
care about filters with CCR > 0 -- because CCR = 0 cases degenerate into
a single-copy scheme anyhow.
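for a concrete feel for CCR, here is a sketch of an HTTP/1.1-style
chunking filter that counts copied versus generated bytes (an
illustration of the metric, not apache code):

```python
def chunk_encode(blocks):
    """HTTP/1.1-style chunked coding: each block becomes
    <hex length>CRLF <data> CRLF, plus a terminating zero-length chunk.

    Returns (encoded, nbytes_old, nbytes_new): old bytes are copied
    straight from the input, new bytes are the framing the filter adds.
    """
    out, old, new = [], 0, 0
    for b in blocks:
        header = b"%x\r\n" % len(b)
        out += [header, b, b"\r\n"]
        old += len(b)
        new += len(header) + 2          # header plus trailing CRLF
    trailer = b"0\r\n\r\n"
    out.append(trailer)
    new += len(trailer)
    return b"".join(out), old, new

# 1000 payload bytes cost only 12 generated framing bytes:
encoded, nbytes_old, nbytes_new = chunk_encode([b"x" * 1000])
ccr = nbytes_old / nbytes_new   # large CCR: output is mostly copied input
```

a large CCR like this is exactly the case where zero-copy (passing the
payload through by reference) pays off.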

it is worth noting that the large_write heuristic in BUFF fairly
clearly handles zero-copy at very little overhead for CCRs larger than
DEFAULT_BUFSIZE.

what needs further quantification is what the CCR of mod_include would
be.

for a particular zero-copy implementation we can find some threshold k
where filters with CCRs >= k are faster with the zero-copy implementation
and CCRs < k are slower... faster/slower as compared to a baseline
implementation such as the existing BUFF.

it's my opinion that, when you consider the data sources and the filters
listed above, *in general* the existing BUFF heuristics are faster than
a complete zero-copy implementation.

you might ask how this jibes with published research such as the
IO-Lite stuff? well, when it comes right down to it, the research in
the IO-Lite papers deals with very large CCRs and contrasts them against
a naive buffering implementation such as stdio -- they don't consider
what a few heuristics such as apache's BUFF can do.

Dean


Jim's summary of a discussion
-----------------------------

    OK, so the main points we wish to address are (in no particular order):

    1. zero-copy
    2. prevent modules/filters from having to buffer ("glob") the
       entire data stream in order to start processing/filtering
    3. the ability to layer and "multiplex" data and meta-data
       in the stream
    4. the ability to perform all HTTP processing at the filter
       level (including proxy), even if not implemented in this phase
    5. room for optimization and recursion
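    Point 3, multiplexing data and meta-data in one stream, can be
    pictured as a sequence of typed buckets. A Python sketch; the
    two-tuple representation is an illustrative assumption, not the
    eventual API:

    ```python
    # Typed buckets: ("data", bytes) carries payload, ("meta", dict)
    # carries out-of-band information such as headers or flush markers,
    # interleaved in a single ordered stream (a "brigade").
    def payload(brigade):
        """Collect only the data bytes, skipping meta buckets."""
        return b"".join(body for kind, body in brigade if kind == "data")

    def metadata(brigade):
        """Collect only the meta buckets, in stream order."""
        return [body for kind, body in brigade if kind == "meta"]

    brigade = [
        ("meta", {"content-type": "text/html"}),
        ("data", b"<html>"),
        ("meta", {"flush": True}),
        ("data", b"</html>"),
    ]
    ```

    Filters that only care about bytes skip the meta buckets; framing
    layers act on them, and neither disturbs the other.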

    Jim Jagielski


Roy's ramblings
---------------

    Data flow networks are a very well-defined and understood software
    architecture. They have a single, very important constraint: no filter
    is allowed to know anything about the nature of its upstream or downstream
    neighbors beyond what is defined by the filter's own interface.
    That constraint is what makes data flow networks highly configurable and
    reusable. Those are properties that we want from our filters.
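    That constraint can be sketched as filters that consume and produce
    the same stream type and are composed by a neutral driver. The
    filter names below are made up for illustration:

    ```python
    def upper_filter(chunks):
        """A filter sees only an iterable of byte chunks and yields byte
        chunks; it knows nothing about its upstream or downstream
        neighbor beyond this interface."""
        for c in chunks:
            yield c.upper()

    def squeeze_space_filter(chunks):
        """Collapse runs of whitespace within each chunk."""
        for c in chunks:
            yield b" ".join(c.split())

    def pipeline(source, *filters):
        # Any filters, in any order: the uniform interface is what
        # makes the network rearrangeable.
        stream = source
        for f in filters:
            stream = f(stream)
        return b"".join(stream)

    result = pipeline([b"layered   io"], squeeze_space_filter, upper_filter)
    ```

    Because neither filter knows its neighbor, swapping their order
    still yields a valid (here, identical) result.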

    ...

    One of the goals of the filter concept was to untangle the bird's nest
    of interconnected side-effect conditions that allow buff to perform
    well, without losing that performance. That's why there is so much
    trepidation about anyone messing with 1.3.x buff.

    ...

    Content filtering is my least important goal. Completely replacing HTTP
    parsing with a filter is my primary goal, followed by a better proxy,
    then internal memory caches, and finally zero-copy sendfile (in order of
    importance, but in reverse order of likely implementation). Content
    filtering is something we get for free using the bucket brigade interface,
    but we don't get anything for free if we start with an interface that only
    supports content filtering.

    ...

    I don't think it is safe to implement filters in Apache without either
    a smart allocation system or a strict limiting mechanism that prevents
    filters from buffering more than 8KB [or a user-definable amount] of
    memory at a time (for the entire non-flushed stream). It isn't possible
    to create a robust server implementation using filters that allocate
    memory from a pool (or the heap, or a stack, or whatever) without somehow
    reclaiming and reusing the memory that gets written out to the network.
    There is a certain level of "optimization" that must be present before
    any filtering mechanism can be in Apache, and that means meeting the
    requirement that the server not keel over and die the first time a user
    requests a large filtered file. XML tree manipulation is an example of
    where that can happen.

    ...

    Disabling content-length just because there are filters in the stream
    is a blatant cop-out. If you have to do that then the design is wrong.
    At the very least the HTTP filter/buff should be capable of discovering
    whether it knows the content length by examining whether it has the whole
    response in a buffer (or fd) before it sends out the headers.
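    A hedged sketch of that header-time decision; the function name and
    shape are assumptions for illustration, not Apache's API:

    ```python
    def response_framing(buffered_body, complete):
        """Decide HTTP framing just before the headers go out.

        If the whole response is already buffered (or available as a
        single fd), its length is known and Content-Length can be sent;
        otherwise fall back to chunked transfer coding rather than
        blindly disabling content-length for every filtered response.
        """
        if complete:
            return {"Content-Length": str(len(buffered_body))}
        return {"Transfer-Encoding": "chunked"}
    ```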

    ...

    No layered-IO solution will work with the existing memory allocation
    mechanisms of Apache. The reason is simply that some filters can
    incrementally process data and some filters cannot, and they often
    won't know the answer until they have processed the data they are given.
    This means the buffering mechanism needs some form of overflow mechanism
    that diverts parts of the stream into a slower-but-larger buffer (file),
    and the only clean way to do that is to have the memory allocator for the
    stream also do paging to disk. You can't do this within the request pool
    because each layer may need to allocate more total memory than is available
    on the machine, and you can't depend on some parts of the response being
    written before later parts are generated, because some filtering
    decisions require knowledge of the end of the stream before they
    can process the beginning.
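    A minimal sketch of such an overflow buffer, using Python's
    tempfile.SpooledTemporaryFile as the memory-then-disk store; the
    SpillBuffer name is made up, and the 8KB default mirrors the limit
    mentioned above:

    ```python
    import tempfile

    class SpillBuffer:
        """Buffer stream data in memory up to `limit` bytes, then divert
        the stream to a temporary file on disk: the 'slower-but-larger
        buffer' overflow described above, in miniature."""

        def __init__(self, limit=8 * 1024):
            self.buf = tempfile.SpooledTemporaryFile(max_size=limit)

        def write(self, data):
            self.buf.write(data)

        def spilled(self):
            # SpooledTemporaryFile rolls over to a real file once the
            # spooled size exceeds max_size (CPython attribute).
            return self.buf._rolled

        def getvalue(self):
            self.buf.seek(0)
            return self.buf.read()

    b = SpillBuffer(limit=16)
    b.write(b"small")       # stays in memory
    b.write(b"x" * 64)      # exceeds the limit, spills to disk
    ```

    A real allocator would page per-bucket rather than per-stream, but
    the shape of the mechanism is the same.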

    ...

    The purpose of the filtering mechanism is to provide a useful and
    easy-to-understand means of extending the functionality of independent
    modules (filters) by rearranging them in stacks via a uniform interface.


Paul J. Reder's use cases for filters
-------------------------------------

    1) Containing only text.
    2) Containing 10 .gif or .jpg references (perhaps filtering
       from one format to the other).
    3) Containing an exec of a CGI that generates a text-only file.
    4) Containing an exec of a CGI that generates an SSI of a text-only file.
    5) Containing an exec of a CGI that generates an SSI that execs a CGI
       that generates a text-only file (that swallows a fly, I don't know why).
    6) Containing an SSI that execs a CGI that generates an SSI that
       includes a text-only file.
       NOTE: Solutions must be able to handle *both* 5 and 6. Order
       shouldn't matter.
    7) Containing text that must be altered via a regular expression
       filter to change all occurrences of "rederpj" to "misguided".
    8) Containing text that must be altered via a regular expression
       filter to change all occurrences of "rederpj" to "lost".
    9) Containing Perl or PHP that must be handed off for processing.
    10) A page in ASCII that needs to be converted to EBCDIC, or from
        one code page to another.
    11) Use the babelfish translation filter to translate text on a
        page from Spanish to Martian-Swahili.
    12) Translate to Esperanto, compress, and encrypt the output from
        a PHP program generated by a Perl script called from a CGI exec
        embedded in a file included by an SSI :)
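    Whatever generates the content, case 12's stacking reduces to plain
    composition. A sketch where the translator and cipher are toy
    placeholders (the XOR is NOT real encryption); only the ordering
    pattern matters:

    ```python
    import zlib

    def translate(data):
        # Hypothetical "to Esperanto" step, a trivial stand-in.
        return data.replace(b"hello", b"saluton")

    def compress(data):
        return zlib.compress(data)

    def encrypt(data, key=0x5A):
        # Toy XOR cipher, NOT real encryption; illustration only.
        return bytes(byte ^ key for byte in data)

    def decrypt(data, key=0x5A):
        return bytes(byte ^ key for byte in data)

    def stack(data, *filters):
        """Apply filters in order, exactly like a filter stack."""
        for f in filters:
            data = f(data)
        return data

    wire = stack(b"hello world", translate, compress, encrypt)
    plain = zlib.decompress(decrypt(wire))
    ```

    Note the order constraint the wish implies: compress before
    encrypt, since encrypted output no longer compresses.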