<html>
<head>
<title>Apache server Content arbitration: MultiViews and *.var files</title>
</head>
<body>
<!--#include virtual="header.html" -->
<h1>Content Arbitration: MultiViews and *.var files</h1>
The HTTP standard allows clients (i.e., browsers like Mosaic or
Netscape) to specify what data formats they are prepared to accept.
The intention is that when information is available in multiple
variants (e.g., in different data formats), servers can use this
information to decide which variant to send. This feature has been
supported in the CERN server for a while, and while it is not yet
supported in the NCSA server, it is likely to assume a new importance
in light of the emergence of HTML3 capable browsers. <p>
The Apache module <A HREF="mod_negotiation.html">mod_negotiation</A> handles
content negotiation in two different ways: special treatment for the
pseudo-mime-type <code>application/x-type-map</code>, and the
MultiViews per-directory Option (which can be set in srm.conf, or in
.htaccess files, as usual). These features are alternate user
interfaces to what amounts to the same piece of code (in the new file
<code>http_mime_db.c</code>) which implements the content negotiation
portion of the HTTP protocol. <p>
Each of these features allows one of several files to satisfy a
request, based on what the client says it's willing to accept; the
differences are in the way the files are identified:
<ul>
<li> A type map (i.e., a <code>*.var</code> file) names the files
containing the variants explicitly.
<li> In a MultiViews search, the server does an implicit filename
pattern match, and chooses from among the results.
</ul>
Apache also supports a new pseudo-MIME type,
<code>text/x-server-parsed-html3</code>, which is treated as
<code>text/html;level=3</code> for purposes of content negotiation, and as
server-side-included HTML elsewhere.
<h3>Type maps (*.var files)</h3>
A type map is a document which is typed by the server (using its
normal suffix-based mechanisms) as
<code>application/x-type-map</code>. Note that to use this feature,
you've got to have an <code>AddType</code> some place which defines a
file suffix as <code>application/x-type-map</code>; the easiest thing
may be to stick a
<pre>
AddType application/x-type-map var
</pre>
in <code>srm.conf</code>. See comments in the sample config files for
details. <p>
Type map files have an entry for each available variant; these entries
consist of contiguous RFC822-format header lines. Entries for
different variants are separated by blank lines. Blank lines are
illegal within an entry. It is conventional to begin a map file with
an entry for the combined entity as a whole, e.g.,
<pre>
URI: foo; vary="type,language"

URI: foo.en.html
Content-type: text/html; level=2
Content-language: en

URI: foo.fr.html
Content-type: text/html; level=2
Content-language: fr
</pre>
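<p>
The entry format described above can be sketched in a few lines of
Python. This is only an illustration of the blank-line-separated
layout, not the server's actual parser: the function name
<code>parse_type_map</code> is made up for this example, and RFC822
continuation lines are not handled.

```python
def parse_type_map(text):
    """Split a type map into entries; each entry maps header name -> value."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        # Header names are stored lower-cased so lookups are case-insensitive.
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


SAMPLE = """\
URI: foo; vary="type,language"

URI: foo.en.html
Content-type: text/html; level=2
Content-language: en

URI: foo.fr.html
Content-type: text/html; level=2
Content-language: fr
"""

entries = parse_type_map(SAMPLE)
for entry in entries:
    print(entry["uri"], entry.get("content-language"))
```

The first entry describes the combined entity, so it has a
<code>URI</code> but no language of its own. <p>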
If the variants have different qualities, that may be indicated by the
"qs" parameter, as in this picture (available as jpeg, gif, or ASCII-art):
<pre>
URI: foo; vary="type,language"

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.txt
Content-type: text/plain; qs=0.01
</pre><p>
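To see how "qs" might interact with what the client says it accepts,
here is a rough Python sketch. It assumes that a variant's score is its
"qs" multiplied by the "q" the client gave for the matching media
range; that rule is an assumption made for illustration and is not
spelled out in this document. The helper names
(<code>parse_accept</code>, <code>client_q</code>,
<code>choose</code>) are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # q defaults to 1.0 when the client does not give one
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        ranges.append((parts[0], q))
    return ranges


def client_q(media_type, ranges):
    """Return the q of the most specific matching range, or 0.0 if none match."""
    major = media_type.split("/")[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            rank = 2          # exact match, e.g. image/gif
        elif media_range == major + "/*":
            rank = 1          # subtype wildcard, e.g. image/*
        elif media_range == "*/*":
            rank = 0          # full wildcard
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def choose(variants, accept_header):
    """variants: list of (uri, media_type, qs). Return the best uri, or None."""
    ranges = parse_accept(accept_header)
    best_uri, best_score = None, 0.0
    for uri, media_type, qs in variants:
        score = qs * client_q(media_type, ranges)  # assumed scoring rule
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


VARIANTS = [
    ("foo.jpeg", "image/jpeg", 0.8),
    ("foo.gif", "image/gif", 0.5),
    ("foo.txt", "text/plain", 0.01),
]

print(choose(VARIANTS, "image/gif, image/jpeg;q=0.5, text/plain"))
```

Under that assumed rule, a client that halves its preference for JPEG
ends up with the GIF (0.5 against 0.8 times 0.5), while a client that
accepts only <code>text/plain</code> gets the ASCII-art version. <p>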
The full list of headers recognized is:
<dl>
<dt> <code>URI:</code>
<dd> URI of the file containing the variant (of the given media
type, encoded with the given content encoding). These are
interpreted as URLs relative to the map file; they must be on
the same server (!), and they must refer to files to which the
client would be granted access if they were to be requested
directly.
<dt> <code>Content-type:</code>
<dd> media type; a level may be specified, along with "qs". These
are often referred to as MIME types; typical media types are
<code>image/gif</code>, <code>text/plain</code>, or
<code>text/html;&nbsp;level=3</code>.
<dt> <code>Content-language:</code>
<dd> The language of the variant, specified as an Internet standard
language code (e.g., <code>en</code> for English,
<code>ko</code> for Korean, etc.).
<dt> <code>Content-encoding:</code>
<dd> If the file is compressed, or otherwise encoded, rather than
containing the actual raw data, this says how that was done.
For compressed files (the only case where this generally comes
up), content encoding should be
<code>x-compress</code> or <code>gzip</code>, as appropriate.
<dt> <code>Content-length:</code>
<dd> The size of the file. Clients can ask to receive a given media
type only if the variant isn't too big; specifying a content
length in the map allows the server to compare against these
thresholds without checking the actual file.
</dl>
<h3>MultiViews</h3>
This is a per-directory option, meaning it can be set with an
<code>Options</code> directive within a <code>&lt;Directory&gt;</code>
section in <code>access.conf</code>, or (if <code>AllowOverride</code>
is properly set) in <code>.htaccess</code> files. Note that
<code>Options All</code> does not set <code>MultiViews</code>; you
have to ask for it by name. (Fixing this is a one-line change to
<code>httpd.h</code>).
<p>
The effect of <code>MultiViews</code> is as follows: if the server
receives a request for <code>/some/dir/foo</code>,
<code>/some/dir</code> has <code>MultiViews</code> enabled, and
<code>/some/dir/foo</code> does <em>not</em> exist, then the server reads the
directory looking for files named <code>foo.*</code>, and effectively fakes up a
type map which names all those files, assigning them the same media
types and content-encodings they would have had if the client had asked for
one of them by name. It then chooses the best match to the client's
requirements, and returns that document.
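<p>
The directory search can be pictured with the following Python sketch.
It is not the server's code: <code>synthesize_type_map</code> is a
made-up name, and Python's <code>mimetypes</code> module stands in for
the server's own suffix-to-type tables.

```python
import mimetypes
import os
import tempfile


def synthesize_type_map(directory, name):
    """Return [(filename, media_type, encoding, quality)] for files named name.*,
    or None if a file called exactly `name` exists (no negotiation needed)."""
    if os.path.exists(os.path.join(directory, name)):
        return None
    variants = []
    for filename in sorted(os.listdir(directory)):
        if filename.startswith(name + "."):
            # Type each file the way it would be typed if requested by name.
            media_type, encoding = mimetypes.guess_type(filename)
            variants.append((filename, media_type, encoding, 1.0))
    return variants


# Build a throwaway directory to try the search on.
with tempfile.TemporaryDirectory() as d:
    for fn in ("foo.html", "foo.txt", "bar.html"):
        open(os.path.join(d, fn), "w").close()
    variants = synthesize_type_map(d, "foo")
    print(variants)
```

Only <code>foo.html</code> and <code>foo.txt</code> end up in the
synthesized map; <code>bar.html</code> does not match the pattern.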
<p>
This applies to searches for the file named by the
<code>DirectoryIndex</code> directive, if the server is trying to
index a directory; if the configuration files specify
<pre>
DirectoryIndex index
</pre> then the server will arbitrate between <code>index.html</code>
and <code>index.html3</code> if both are present. If neither is
present, and <code>index.cgi</code> is there, the server will run it.
<p>
If one of the files found by the globbing is a CGI script, it's not
obvious what should happen. My code gives that case special
treatment: if the request was a POST, or a GET with QUERY_ARGS or
PATH_INFO, the script is given an extremely high quality rating, and
generally invoked; otherwise it is given an extremely low quality
rating, which generally causes one of the other views (if any) to be
retrieved. This is the only jiggering of quality ratings done by the
MultiViews code; aside from that, all qualities in the synthesized
type maps are 1.0.
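<p>
The CGI rule amounts to a small decision function, sketched here in
Python. The two constants are placeholders chosen for this example; the
text above says only "extremely high" and "extremely low", so the
actual values are not known from this document.

```python
CGI_HIGH_QUALITY = 1000.0   # hypothetical stand-in for "extremely high"
CGI_LOW_QUALITY = 0.001     # hypothetical stand-in for "extremely low"


def multiviews_quality(is_cgi, method, query_args="", path_info=""):
    """Quality assigned to one file in a synthesized MultiViews type map."""
    if not is_cgi:
        # Ordinary files always get quality 1.0.
        return 1.0
    if method == "POST" or (method == "GET" and (query_args or path_info)):
        # The request looks like it is meant for a script.
        return CGI_HIGH_QUALITY
    # A plain GET: prefer one of the other views, if any.
    return CGI_LOW_QUALITY


print(multiviews_quality(True, "GET", query_args="q=apache"))
print(multiviews_quality(True, "GET"))
```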
<p>
<B>New as of 0.8:</B> Documents in multiple languages can also be resolved through the use
of the <code>AddLanguage</code> and <code>LanguagePriority</code>
directives:
<pre>
AddLanguage en .en
AddLanguage fr .fr
AddLanguage de .de
AddLanguage da .da
AddLanguage el .el
AddLanguage it .it
# LanguagePriority allows you to give precedence to some languages
# in case of a tie during content negotiation.
# Just list the languages in decreasing order of preference.
LanguagePriority en fr de
</pre>
Here, a request for "foo.html" matched against "foo.html.en" and
"foo.html.fr" would return a French document to a browser that
indicated a preference for French, or an English document otherwise.
In fact, a request for "foo" matched against "foo.html.en",
"foo.html.fr", "foo.ps.en", "foo.pdf.de", and "foo.txt.it" would do
just what you expect: treat those suffixes as a database and compare
the request to it, returning the best match. The languages and data
types share the same suffix name space.
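<p>
As a rough Python illustration of the shared suffix name space and the
<code>LanguagePriority</code> tie-break: the sketch below is a
simplification that ignores quality values, and the names
<code>classify</code> and <code>pick_language</code> are invented for
the example.

```python
LANGUAGES = {"en", "fr", "de", "da", "el", "it"}   # from the AddLanguage lines
LANGUAGE_PRIORITY = ["en", "fr", "de"]             # from LanguagePriority


def classify(filename):
    """Split 'foo.html.en' into (base, type_suffixes, language)."""
    base, *suffixes = filename.split(".")
    language = None
    types = []
    for suffix in suffixes:
        # Languages and data types share one suffix name space.
        if suffix in LANGUAGES:
            language = suffix
        else:
            types.append(suffix)
    return base, types, language


def pick_language(available, accepted, priority=LANGUAGE_PRIORITY):
    """available: languages that have a variant; accepted: languages the
    client asked for, most preferred first (may be empty)."""
    for lang in accepted:
        if lang in available:
            return lang
    # No stated preference matched: fall back to the configured order.
    for lang in priority:
        if lang in available:
            return lang
    return None


print(classify("foo.html.en"))
print(pick_language({"en", "fr"}, ["fr"]))
print(pick_language({"en", "fr"}, []))
```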
<p>
Note that this machinery only comes into play if the file which the
user attempted to retrieve does <em>not</em> exist by that name; if it
does, it is simply retrieved as usual. (So, someone who actually asks
for <code>foo.jpeg</code>, as opposed to <code>foo</code>, never gets
<code>foo.gif</code>).
<!--#include virtual="footer.html" -->
</body> </html>