Diffstat (limited to 'docs/manual/ebcdic.html')
-rw-r--r-- | docs/manual/ebcdic.html | 505 |
1 files changed, 0 insertions, 505 deletions
diff --git a/docs/manual/ebcdic.html b/docs/manual/ebcdic.html
deleted file mode 100644
index 74d01485cc..0000000000
--- a/docs/manual/ebcdic.html
+++ /dev/null
@@ -1,505 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
-<HTML>
-<HEAD>
-<TITLE>The Apache EBCDIC Port</TITLE>
-</HEAD>
-
-<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
-<BODY
- BGCOLOR="#FFFFFF"
- TEXT="#000000"
- LINK="#0000FF"
- VLINK="#000080"
- ALINK="#FF0000"
->
-<!--#include virtual="header.html" -->
-
-<blockquote><strong>Warning:</strong>
-This document has not been updated to take into account changes
-made in the 2.0 version of the Apache HTTP Server. Some of the
-information may still be relevant, but please use it
-with care.
-</blockquote>
-
-<H1 ALIGN="CENTER">Overview of the Apache EBCDIC Port</H1>
-
- <P>
- Version 1.3 of the Apache HTTP Server is the first version which
- includes a port to a (non-ASCII) mainframe machine which uses
- the EBCDIC character set as its native codeset.<BR>
- (It is the SIEMENS family of mainframes running the
- <A HREF="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD
- operating system</A>. This mainframe OS nowadays features a
- SVR4-derived POSIX subsystem).
- </P>
-
- <P>
- The port was started initially to
- <UL>
- <LI> prove the feasibility of porting
- <A HREF="http://dev.apache.org/">the Apache HTTP server</A>
- to this platform
- <LI> find a "worthy and capable" successor for the venerable
- <A HREF="http://www.w3.org/Daemon/">CERN-3.0</A> daemon
- (which was ported a couple of years ago), and to
- <LI> prove that Apache's preforking process model can on this platform
- easily outperform the accept-fork-serve model used by CERN by a
- factor of 5 or more.
- </UL>
- </P>
-
- <P>
- This document serves as a rationale to describe some of the design
- decisions of the port to this machine.
- </P>
-
- <H2 ALIGN=CENTER>Design Goals</H2>
- <P>
- One objective of the EBCDIC port was to maintain enough backwards
- compatibility with the (EBCDIC) CERN server to make the transition to
- the new server attractive and easy. This required the addition of
- a configurable method to define whether a HTML document was stored
- in ASCII (the only format accepted by the old server) or in EBCDIC
- (the native document format in the POSIX subsystem, and therefore
- the only realistic format in which the other POSIX tools like grep
- or sed could operate on the documents). The current solution to
- this is a "pseudo-MIME-format" which is intercepted and
- interpreted by the Apache server (see below). Future versions
- might solve the problem by defining an "ebcdic-handler" for all
- documents which must be converted.
- </P>
-
- <H2 ALIGN=CENTER>Technical Solution</H2>
- <P>
- Since all Apache input and output is based upon the BUFF data type
- and its methods, the easiest solution was to add the conversion to
- the BUFF handling routines. The conversion must be settable at any
- time, so a BUFF flag was added which defines whether a BUFF object
- has currently enabled conversion or not. This flag is modified at
- several points in the HTTP protocol:
- <UL>
- <LI><STRONG>set</STRONG> before a request is received (because the
- request and the request header lines are always in ASCII
- format)
-
- <LI><STRONG>set/unset</STRONG> when the request body is
- received - depending on the content type of the request body
- (because the request body may contain ASCII text or a binary file)
-
- <LI><STRONG>set</STRONG> before a reply header is sent (because the
- response header lines are always in ASCII format)
-
- <LI><STRONG>set/unset</STRONG> when the response body is
- sent - depending on the content type of the response body
- (because the response body may contain text or a binary file)
- </UL>
- </P>
-
-<H2 ALIGN=CENTER>Porting Notes</H2>
- <P>
- <OL>
- <LI>
- The relevant changes in the source are #ifdef'ed into two
- categories:
- <DL>
- <DT><CODE><STRONG>#ifdef CHARSET_EBCDIC</STRONG></CODE>
- <DD>Code which is needed for any EBCDIC based machine. This
- includes character translations, differences in
- contiguity of the two character sets, flags which
- indicate which part of the HTTP protocol has to be
- converted and which part doesn't <EM>etc.</EM>
- <DT><CODE><STRONG>#ifdef _OSD_POSIX</STRONG></CODE>
- <DD>Code which is needed for the SIEMENS BS2000/OSD
- mainframe platform only. This deals with include file
- differences and socket implementation topics which are
- only required on the BS2000/OSD platform.
- </DL>
- </LI><BR>
-
- <LI>
- The possibility to translate between ASCII and EBCDIC at the
- socket level (on BS2000 POSIX, there is a socket option which
- supports this) was intentionally <EM>not</EM> chosen, because
- the byte stream at the HTTP protocol level consists of a
- mixture of protocol related strings and non-protocol related
- raw file data. HTTP protocol strings are always encoded in
- ASCII (the GET request, any Header: lines, the chunking
- information <EM>etc.</EM>) whereas the file transfer parts (<EM>i.e.</EM>, GIF
- images, CGI output <EM>etc.</EM>) should usually be just "passed through"
- by the server. This separation between "protocol string" and
- "raw data" is reflected in the server code by functions like
- bgets() or rvputs() for strings, and functions like bwrite()
- for binary data. A global translation of everything would
- therefore be inadequate.<BR>
- (In the case of text files of course, provisions must be made so
- that EBCDIC documents are always served in ASCII)
- </LI><BR>
-
- <LI>
- This port therefore features a built-in protocol level conversion
- for the server-internal strings (which the compiler translated to
- EBCDIC strings) and thus for all server-generated documents.
- The hard coded ASCII escapes \012 and \015 which are
- ubiquitous in the server code are an exception: they are
- already the binary encoding of the ASCII \n and \r and must
- not be converted to ASCII a second time. This exception is
- only relevant for server-generated strings; and <EM>external</EM>
- EBCDIC documents are not expected to contain ASCII newline characters.
- </LI><BR>
-
- <LI>
- By examining the call hierarchy for the BUFF management
- routines, I added an "ebcdic/ascii conversion layer" which
- would be crossed on every puts/write/get/gets, and a
- conversion flag which allowed enabling/disabling the
- conversions on-the-fly. Usually, a document crosses this
- layer twice from its origin source (a file or CGI output) to
- its destination (the requesting client): <SAMP>file ->
- Apache</SAMP>, and <SAMP>Apache -> client</SAMP>.<BR>
- The server can now read the header
- lines of a CGI-script output in EBCDIC format, and then find
- out that the remainder of the script's output is in ASCII
- (like in the case of the output of a WWW Counter program: the
- document body contains a GIF image). All header processing is
- done in the native EBCDIC format; the server then determines,
- based on the type of document being served, whether the
- document body (except for the chunking information, of
- course) is in ASCII already or must be converted from EBCDIC.
- </LI><BR>
-
- <LI>
- For Text documents (MIME types text/plain, text/html <EM>etc.</EM>),
- an implicit translation to ASCII can be used, or (if the
- users prefer to store some documents in raw ASCII form for
- faster serving, or because the files reside on a NFS-mounted
- directory tree) can be served without conversion.
- <BR>
- <STRONG>Example:</STRONG><BLOCKQUOTE>
- to serve files with the suffix .ahtml as a raw ASCII text/html
- document without implicit conversion (and suffix .ascii
- as ASCII text/plain), use the directives:<PRE>
- AddType text/x-ascii-html .ahtml
- AddType text/x-ascii-plain .ascii
- </PRE></BLOCKQUOTE>
- Similarly, any text/XXXX MIME type can be served as "raw ASCII" by
- configuring a MIME type "text/x-ascii-XXXX" for it using AddType.
- </LI><BR>
-
- <LI>
- Non-text documents are always served "binary" without conversion.
- This seems to be the most sensible choice for, .<EM>e.g.</EM>, GIF/ZIP/AU
- file types. This of course requires the user to copy them to the
- mainframe host using the "rcp -b" binary switch.
- </LI><BR>
-
- <LI>
- Server parsed files are always assumed to be in native (<EM>i.e.</EM>,
- EBCDIC) format as used on the machine, and are converted after
- processing.
- </LI><BR>
-
- <LI>
- For CGI output, the CGI script determines whether a conversion is
- needed or not: by setting the appropriate Content-Type, text files
- can be converted, or GIF output can be passed through unmodified.
- An example for the latter case is the wwwcount program which we ported
- as well.
- </LI><BR>
- </OL>
- </P>
-
- <H2 ALIGN=CENTER>Document Storage Notes</H2>
- <H3 ALIGN=CENTER>Binary Files</H3>
- <P>
- All files with a <SAMP>Content-Type:</SAMP> which does not
- start with <SAMP>text/</SAMP> are regarded as <EM>binary files</EM>
- by the server and are not subject to any conversion.
- Examples for binary files are GIF images, gzip-compressed
- files and the like.
- </P>
- <P>
- When exchanging binary files between the mainframe host and a
- Unix machine or Windows PC, be sure to use the ftp "binary"
- (<SAMP>TYPE I</SAMP>) command, or use the
- <SAMP>rcp -b</SAMP> command from the mainframe host
- (the -b switch is not supported in unix rcp's).
- </P>
-
- <H3 ALIGN=CENTER>Text Documents</H3>
- <P>
- The default assumption of the server is that Text Files
- (<EM>i.e.</EM>, all files whose <SAMP>Content-Type:</SAMP> starts with
- <SAMP>text/</SAMP>) are stored in the native character
- set of the host, EBCDIC.
- </P>
-
- <H3 ALIGN=CENTER>Server Side Included Documents</H3>
- <P>
- SSI documents must currently be stored in EBCDIC only. No
- provision is made to convert it from ASCII before processing.
- </P>
-
- <H2 ALIGN=CENTER>Apache Modules' Status</H2>
- <TABLE BORDER ALIGN=middle>
- <TR>
- <TH>Module
- <TH>Status
- <TH>Notes
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>http_core
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_access
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_actions
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_alias
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_asis
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_auth
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_auth_anon
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_auth_db
- <TD ALIGN=CENTER>?
- <TD>with own libdb.a
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_auth_dbm
- <TD ALIGN=CENTER>?
- <TD>with own libdb.a
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_autoindex
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_cern_meta
- <TD ALIGN=CENTER>?
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_cgi
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_digest
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_dir
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_so
- <TD ALIGN=CENTER>-
- <TD>no shared libs
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_env
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_example
- <TD ALIGN=CENTER>-
- <TD>(test bed only)
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_expires
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_headers
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_imap
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_include
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_info
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_log_agent
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_log_config
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_log_referer
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_mime
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_mime_magic
- <TD ALIGN=CENTER>?
- <TD>not ported yet
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_negotiation
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_proxy
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_rewrite
- <TD ALIGN=CENTER>+
- <TD>untested
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_setenvif
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_speling
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_status
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_unique_id
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_userdir
- <TD ALIGN=CENTER>+
- <TD>
- </TR>
-
- <TR>
- <TD ALIGN=LEFT>mod_usertrack
- <TD ALIGN=CENTER>?
- <TD>untested
- </TR>
- </TABLE>
-
- <H2 ALIGN=CENTER>Third Party Modules' Status</H2>
- <TABLE BORDER ALIGN=middle>
- <TR>
- <TH>Module
- <TH>Status
- <TH>Notes
- </TR>
-
- <TR>
- <TD ALIGN=LEFT><A HREF="http://java.apache.org/">mod_jserv</A>
- <TD ALIGN=CENTER>-
- <TD>JAVA still being ported.
- </TR>
-
- <TR>
- <TD ALIGN=LEFT><A HREF="http://www.php.net/">mod_php3</A>
- <TD ALIGN=CENTER>+
- <TD>mod_php3 runs fine, with LDAP and GD and FreeType libraries
- </TR>
-
- <TR>
- <TD ALIGN=LEFT
- ><A HREF="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html">mod_put</A>
- <TD ALIGN=CENTER>?
- <TD>untested
- </TR>
-
- <TR>
- <TD ALIGN=LEFT
- ><A HREF="ftp://hachiman.vidya.com/pub/apache/">mod_session</A>
- <TD ALIGN=CENTER>-
- <TD>untested
- </TR>
-
- </TABLE>
-
-<!--#include virtual="footer.html" -->
-</BODY>
-</HTML>
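For readers of this removal, the content-type rules in the deleted page (native text/* documents converted from EBCDIC, "text/x-ascii-*" pseudo-MIME types served raw under the real type, everything else passed through as binary) can be illustrated with a small sketch. This is not Apache code. The function name `prepare_body` and both codec choices are assumptions made only for illustration: Python's `cp037` codec stands in for the BS2000 EBCDIC code set, which differs in detail, and ISO 8859-1 stands in for "ASCII" on the wire.

```python
from typing import Tuple

# Illustrative sketch only; not taken from the Apache source tree.
# Assumption: "cp037" approximates the host EBCDIC code set and
# "latin-1" approximates the ASCII-based encoding sent to clients.
EBCDIC_CODEC = "cp037"
WIRE_CODEC = "latin-1"

# Prefix of the pseudo-MIME types configured via AddType in the document.
RAW_ASCII_PREFIX = "text/x-ascii-"


def prepare_body(content_type: str, body: bytes) -> Tuple[str, bytes]:
    """Return the (content type, bytes) to send for a stored document."""
    if content_type.startswith(RAW_ASCII_PREFIX):
        # Stored as raw ASCII: no conversion, only the type is rewritten
        # (text/x-ascii-html becomes text/html).
        real_type = "text/" + content_type[len(RAW_ASCII_PREFIX):]
        return real_type, body
    if content_type.startswith("text/"):
        # Native text document: convert from EBCDIC before sending.
        return content_type, body.decode(EBCDIC_CODEC).encode(WIRE_CODEC)
    # Anything that is not text/* is treated as binary and passed through.
    return content_type, body
```

In this sketch the decision is made once per document from its content type, which mirrors the "set/unset" step for the response body described in the deleted text. The real port toggled a flag on the BUFF object instead of converting a whole body at once.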