ebcdic.html revision 6f912b4ad14f622aa8d57f887c8c745e13ff6dbf
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<!--#include virtual="header.html" -->
<blockquote>
<strong>Warning:</strong> This document has not been updated
to take into account changes made in the 2.0 version of the
Apache HTTP Server. Some of the information may still be
relevant, but please use it with care.
</blockquote>
<h1 align="CENTER">Overview of the Apache EBCDIC Port</h1>
<p>Version 1.3 of the Apache HTTP Server is the first version
that includes a port to a (non-ASCII) mainframe machine which
uses the EBCDIC character set as its native codeset.<br />
(It is the SIEMENS family of mainframes running the <a
href="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD
operating system</a>. This mainframe OS nowadays features an
SVR4-derived POSIX subsystem.)</p>
href="http://dev.apache.org/">the Apache HTTP server</a> to
this platform</li>
<li>find a "worthy and capable" successor for the venerable
(which was ported a couple of years ago), and to</li>
<li>prove that Apache's preforking process model can on this
platform easily outperform the accept-fork-serve model used
by CERN by a factor of 5 or more.</li>
<p>This document explains the rationale behind some of the
design decisions of the port to this machine.</p>
<p>One objective of the EBCDIC port was to maintain enough
backwards compatibility with the (EBCDIC) CERN server to make
the transition to the new server attractive and easy. This
required the addition of a configurable method to define
whether an HTML document was stored in ASCII (the only format
accepted by the old server) or in EBCDIC (the native document
format in the POSIX subsystem, and therefore the only realistic
format in which the other POSIX tools like grep or sed could
operate on the documents). The current solution to this is a
"pseudo-MIME-format" which is intercepted and interpreted by
the Apache server (see below). Future versions might solve the
problem by defining an "ebcdic-handler" for all documents which
must be converted.</p>
<p>Since all Apache input and output is based upon the BUFF
data type and its methods, the easiest solution was to add the
conversion to the BUFF handling routines. The conversion must
be settable at any time, so a BUFF flag was added which defines
whether conversion is currently enabled for a BUFF object.
This flag is modified at several points in the HTTP
protocol:</p>
(because the request and the request header lines are always
in ASCII format)</li>
received - depending on the content type of the request body
(because the request body may contain ASCII text or a binary
file)</li>
(because the response header lines are always in ASCII
format)</li>
<li><strong>set/unset</strong> when the response body is sent
- depending on the content type of the response body (because
the response body may contain text or a binary file)</li>
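<p>The flag handling listed above can be sketched roughly as follows.
This is only an illustration, not the actual BUFF code: the names
<samp>demo_buff</samp>, <samp>demo_phase</samp> and
<samp>demo_set_conversion</samp> are hypothetical, and the rule
"every <samp>text/</samp> body is converted" is a simplifying
assumption that ignores the raw ASCII text types discussed in the
porting notes.</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for Apache's BUFF: only the conversion flag is modeled. */
typedef struct {
    bool convert;   /* true: translate between EBCDIC and ASCII on I/O */
} demo_buff;

/* Points in the HTTP exchange at which the flag is touched. */
typedef enum {
    PHASE_REQUEST_HEADER,
    PHASE_REQUEST_BODY,
    PHASE_RESPONSE_HEADER,
    PHASE_RESPONSE_BODY
} demo_phase;

/* Simplified assumption: only text/... bodies count as convertible text. */
bool demo_is_text_type(const char *content_type)
{
    return content_type != NULL && strncmp(content_type, "text/", 5) == 0;
}

/* Set or clear the conversion flag for the given protocol phase. */
void demo_set_conversion(demo_buff *b, demo_phase phase, const char *content_type)
{
    switch (phase) {
    case PHASE_REQUEST_HEADER:
    case PHASE_RESPONSE_HEADER:
        /* Header lines are always ASCII on the wire: always convert. */
        b->convert = true;
        break;
    case PHASE_REQUEST_BODY:
    case PHASE_RESPONSE_BODY:
        /* Bodies are converted only when they carry text. */
        b->convert = demo_is_text_type(content_type);
        break;
    }
}
```

<p>With this sketch, a response body of type <samp>image/gif</samp>
leaves the flag cleared, while <samp>text/html</samp> sets it.</p>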
The relevant changes in the source are #ifdef'ed into two
categories:
<dd>Code which is needed for any EBCDIC-based machine.
This includes character translations, differences in
contiguity of the two character sets, flags which
indicate which part of the HTTP protocol has to be
mainframe platform only. This deals with include file
differences and socket implementation topics which are
<li>The option of translating between ASCII and EBCDIC at
the socket level (on BS2000 POSIX, there is a socket option
which supports this) was intentionally <em>not</em> chosen,
because the byte stream at the HTTP protocol level consists
of a mixture of protocol-related strings and non-protocol
related raw file data. HTTP protocol strings are always
encoded in ASCII (the GET request, any Header: lines, the
chunking information <em>etc.</em>) whereas the file transfer
parts (<em>e.g.</em>, GIF images, CGI output <em>etc.</em>)
should usually be just "passed through" by the server. This
separation between "protocol string" and "raw data" is
reflected in the server code by functions like bgets() or
rvputs() for strings, and functions like bwrite() for binary
data. A global translation of everything would therefore be
inadequate.<br />
(In the case of text files, of course, provisions must be
made so that EBCDIC documents are always served in
ASCII.)</li>
<li>This port therefore features a built-in protocol level
conversion for the server-internal strings (which the
compiler translated to EBCDIC strings) and thus for all
server-generated documents. The hard-coded ASCII escapes \012
and \015 which are ubiquitous in the server code are an
exception: they are already the binary encoding of the ASCII
\n and \r and must not be converted to ASCII a second time.
This exception is only relevant for server-generated strings;
<em>external</em> EBCDIC documents are not expected to
contain ASCII newline characters.</li>
<li>By examining the call hierarchy for the BUFF management
routines, I added an "ebcdic/ascii conversion layer" which
conversions on-the-fly. Usually, a document crosses this
layer twice from its original source (a file or CGI output) to
its destination (the requesting client): <samp>file -&gt;
Apache</samp>, and <samp>Apache -&gt; client</samp>.<br />
The server can now read the header lines of a CGI-script
output in EBCDIC format, and then find out that the remainder
of the script's output is in ASCII (as in the case of the
output of a WWW Counter program: the document body contains a
GIF image). All header processing is done in the native
EBCDIC format; the server then determines, based on the type
of document being served, whether the document body (except
for the chunking information, of course) is in ASCII already
or must be converted from EBCDIC.</li>
used, or (if the users prefer to store some documents in
raw ASCII form for faster serving, or because the files
reside on an NFS-mounted directory tree) can be served
without conversion.<br />
<blockquote>
to serve files with the suffix .ahtml as a raw ASCII
</blockquote>
ASCII" by configuring a MIME type "text/x-ascii-foo" for it
using AddType.
<li>Non-text documents are always served "binary" without
conversion. This seems to be the most sensible choice for,
<em>e.g.</em>, GIF/ZIP/AU file types. This of course
requires the user to copy them to the mainframe host using
the "rcp -b" binary switch.</li>
<li>Server-parsed files are always assumed to be in native
(<em>i.e.</em>, EBCDIC) format as used on the machine, and
are converted after processing.</li>
<li>For CGI output, the CGI script determines whether a
conversion is needed or not: by setting the appropriate
Content-Type, text files can be converted, or GIF output can
be passed through unmodified. An example of the latter case
is the wwwcount program which we ported as well.</li>
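<p>The points above about non-contiguous character ranges, the
<samp>\012</samp>/<samp>\015</samp> exception and the difference
between "protocol string" and "raw data" can be illustrated with
a small sketch. It is not the port's translation table: the
function names are hypothetical, only letters, digits, space and
newline are handled, and the native newline value 0x15 is an
assumption made for the illustration.</p>

```c
#include <stddef.h>

/* Translate one EBCDIC byte to ASCII. Numeric constants are used so the
 * sketch behaves the same on an ASCII or an EBCDIC host. Bytes that are
 * not covered here become '?' (0x3F). */
unsigned char demo_ebcdic_to_ascii(unsigned char c)
{
    /* CR and LF written as \015 and \012 are already ASCII:
     * pass them through instead of translating them a second time. */
    if (c == 0x0D || c == 0x0A) return c;
    if (c == 0x15) return 0x0A;   /* assumed native newline -> ASCII LF */
    if (c == 0x40) return 0x20;   /* space */

    /* Letters form three separate runs in EBCDIC but one run in ASCII. */
    if (c >= 0xC1 && c <= 0xC9) return (unsigned char)(0x41 + (c - 0xC1)); /* A-I */
    if (c >= 0xD1 && c <= 0xD9) return (unsigned char)(0x4A + (c - 0xD1)); /* J-R */
    if (c >= 0xE2 && c <= 0xE9) return (unsigned char)(0x53 + (c - 0xE2)); /* S-Z */
    if (c >= 0x81 && c <= 0x89) return (unsigned char)(0x61 + (c - 0x81)); /* a-i */
    if (c >= 0x91 && c <= 0x99) return (unsigned char)(0x6A + (c - 0x91)); /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return (unsigned char)(0x73 + (c - 0xA2)); /* s-z */
    if (c >= 0xF0 && c <= 0xF9) return (unsigned char)(0x30 + (c - 0xF0)); /* 0-9 */
    return 0x3F;
}

/* Copy len bytes to out, translating only when the conversion flag is set.
 * With convert == 0 this acts like a raw, binary write. */
size_t demo_write(unsigned char *out, const unsigned char *in, size_t len, int convert)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = convert ? demo_ebcdic_to_ascii(in[i]) : in[i];
    return len;
}
```

<p>A protocol string such as an EBCDIC "GET" followed by
<samp>\015\012</samp> would go through <samp>demo_write</samp> with
the flag set, while the bytes of a GIF image would be written with
the flag cleared and arrive unchanged.</p>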
<p>All files with a <samp>Content-Type:</samp> which does not
files</em> by the server and are not subject to any conversion.
Examples of binary files are GIF images, gzip-compressed files
and the like.</p>
<p>When exchanging binary files between the mainframe host and
a Unix machine or Windows PC, be sure to use the ftp "binary"
<samp>rcp -b</samp> command from the mainframe host (the
-b switch is not supported by Unix versions of rcp).</p>
<p>The default assumption of the server is that Text Files
(<em>i.e.</em>, all files whose <samp>Content-Type:</samp>
character set of the host, EBCDIC.</p>
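<p>The storage rules described in this document (binary files
passed through, "text/x-ascii-foo" types served as raw ASCII,
other text assumed to be EBCDIC) can be summarized in a short
sketch. The enum and function names are hypothetical, and the
comparison is deliberately simplified: it is case-sensitive and
ignores any parameters after the MIME type.</p>

```c
#include <string.h>

/* How a document is assumed to be stored, judged by its Content-Type. */
typedef enum {
    DEMO_BINARY,       /* not text/...: passed through without conversion */
    DEMO_RAW_ASCII,    /* text/x-ascii-...: stored in ASCII, not converted */
    DEMO_EBCDIC_TEXT   /* other text/...: stored in EBCDIC, converted to ASCII */
} demo_storage;

/* Classify a Content-Type value according to the rules above. */
demo_storage demo_classify(const char *content_type)
{
    if (strncmp(content_type, "text/", 5) != 0)
        return DEMO_BINARY;
    if (strncmp(content_type, "text/x-ascii-", 13) == 0)
        return DEMO_RAW_ASCII;
    return DEMO_EBCDIC_TEXT;
}
```

<p>Checking the more specific "text/x-ascii-" prefix before falling
back to the general text case keeps the raw ASCII documents from
being converted by mistake.</p>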
<p>SSI documents must currently be stored in EBCDIC only. No
provision is made to convert them from ASCII before
processing.</p>
<td align="LEFT"><a href="http://www.php.net/">mod_php3</a>
<td>mod_php3 runs fine, with LDAP and GD and FreeType
libraries</td>
href="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html">
href="ftp://hachiman.vidya.com/pub/apache/">mod_session</a>
<!--#include virtual="footer.html" -->