<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Apache module mod_charset_lite</TITLE>
</HEAD>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<BODY
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#000080"
ALINK="#FF0000"
>
<!--#include virtual="header.html" -->
<H1 ALIGN="CENTER">Module mod_charset_lite</H1>
<P>
This module is implemented in the <CODE>mod_charset_lite.c</CODE> file and is
included with Apache 2.0 and later. It provides the ability to specify
character set translation, or recoding, by directory, location, or virtual
server. It is not compiled into the server by default.
<CODE>mod_charset_lite</CODE> requires that Apache be compiled with
<CODE>APACHE_XLATE</CODE> defined.
</P>
<P>
This module provides a small subset of the configuration mechanisms
implemented by Russian Apache and its associated <CODE>mod_charset</CODE>.
</P>
<H2>Summary</H2>
<P>
This is an <STRONG>experimental</STRONG> module and should be used with
care. Experiment with your <CODE>mod_charset_lite</CODE> configuration to
ensure that it performs the desired function.
</P>
<P>
<CODE>mod_charset_lite</CODE> allows the administrator to specify the
source character set of objects as well as the character set into which
they should be translated before being sent to the client.
<CODE>mod_charset_lite</CODE> does not translate the data itself but
instead tells Apache what translation to perform.
<CODE>mod_charset_lite</CODE> is applicable to EBCDIC and ASCII
host environments. In an EBCDIC environment, Apache normally translates
text content from the code page of the Apache process locale to
ISO-8859-1. <CODE>mod_charset_lite</CODE> can be used to specify that
a different translation is to be performed. In an ASCII environment,
Apache normally performs no translation, so <CODE>mod_charset_lite</CODE>
is needed in order for any translation to take place.
</P>
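<P>
As a rough sketch of the ASCII case, a container along the following lines
would request translation for content served under a single location. The
<CODE>/convert</CODE> location is hypothetical, and the character set names
are assumed to be accepted by the local iconv; substitute values that are
valid on your system.
</P>
<PRE>
# Hypothetical location; charset names are assumed valid for the local iconv.
&lt;Location /convert&gt;
CharsetSourceEnc UTF-16BE
CharsetDefault ISO8859-1
&lt;/Location&gt;
</PRE>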
<H2>Directives</H2>
<UL>
<LI><A HREF="#charsetsourceenc">CharsetSourceEnc</A></LI>
<LI><A HREF="#charsetdefault">CharsetDefault</A></LI>
<LI><A HREF="#charsetoptions">CharsetOptions</A></LI>
</UL>
<HR>
<H2><A NAME="charsetsourceenc">CharsetSourceEnc</A></H2>
<P>
<A
HREF="directive-dict.html#Syntax"
REL="Help"
><STRONG>Syntax:</STRONG></A> CharsetSourceEnc <EM>charset</EM>
<BR>
<A
HREF="directive-dict.html#Default"
REL="Help"
><STRONG>Default:</STRONG></A> <EM>None</EM>
<BR>
<A
HREF="directive-dict.html#Context"
REL="Help"
><STRONG>Context:</STRONG></A> directory, virtual host
<BR>
<A
HREF="directive-dict.html#Override"
REL="Help"
><STRONG>Override:</STRONG></A> <EM>FileInfo</EM>
<BR>
<A
HREF="directive-dict.html#Status"
REL="Help"
><STRONG>Status:</STRONG></A> Experimental
<BR>
<A
HREF="directive-dict.html#Module"
REL="Help"
><STRONG>Module:</STRONG></A> mod_charset_lite
<BR>
<A
HREF="directive-dict.html#Compatibility"
REL="Help"
><STRONG>Compatibility:</STRONG></A> Only available in Apache 2.0 or later
<P>
The <CODE>CharsetSourceEnc</CODE> directive specifies the source charset
of files in the associated container.
</P>
<P>
The value of the <EM>charset</EM> argument must be accepted as a valid
character set name by the character set support in APR. Generally, this
means that it must be supported by iconv.
</P>
Example:
<PRE>
&lt;Directory "/export/home/trawick/apacheinst/htdocs/convert"&gt;
CharsetSourceEnc UTF-16BE
CharsetDefault ISO8859-1
&lt;/Directory&gt;
</PRE>
<P>
The character set names in this example work with the iconv
translation support in Solaris 8.
</P>
<H2><A NAME="charsetdefault">CharsetDefault</A></H2>
<P>
<A
HREF="directive-dict.html#Syntax"
REL="Help"
><STRONG>Syntax:</STRONG></A> CharsetDefault <EM>charset</EM>
<BR>
<A
HREF="directive-dict.html#Default"
REL="Help"
><STRONG>Default:</STRONG></A> <EM>None</EM>
<BR>
<A
HREF="directive-dict.html#Context"
REL="Help"
><STRONG>Context:</STRONG></A> directory, virtual host
<BR>
<A
HREF="directive-dict.html#Override"
REL="Help"
><STRONG>Override:</STRONG></A> <EM>FileInfo</EM>
<BR>
<A
HREF="directive-dict.html#Status"
REL="Help"
><STRONG>Status:</STRONG></A> Experimental
<BR>
<A
HREF="directive-dict.html#Module"
REL="Help"
><STRONG>Module:</STRONG></A> mod_charset_lite
<BR>
<A
HREF="directive-dict.html#Compatibility"
REL="Help"
><STRONG>Compatibility:</STRONG></A> Only available in Apache 2.0 or later
<P>
The <CODE>CharsetDefault</CODE> directive specifies the charset that
content in the associated container should be translated to.
</P>
<P>
The value of the <EM>charset</EM> argument must be accepted as a valid
character set name by the character set support in APR. Generally, this
means that it must be supported by iconv.
</P>
Example:
<PRE>
&lt;Directory "/export/home/trawick/apacheinst/htdocs/convert"&gt;
CharsetSourceEnc UTF-16BE
CharsetDefault ISO8859-1
&lt;/Directory&gt;
</PRE>
<P>
<H2><A NAME="charsetoptions">CharsetOptions</A></H2>
<P>
<A
HREF="directive-dict.html#Syntax"
REL="Help"
><STRONG>Syntax:</STRONG></A> CharsetOptions <EM>option option ...</EM>
<BR>
<A
HREF="directive-dict.html#Default"
REL="Help"
><STRONG>Default:</STRONG></A> <EM>DebugLevel=0</EM> <EM>NoImplicitAdd</EM>
<BR>
<A
HREF="directive-dict.html#Context"
REL="Help"
><STRONG>Context:</STRONG></A> directory, virtual host
<BR>
<A
HREF="directive-dict.html#Override"
REL="Help"
><STRONG>Override:</STRONG></A> <EM>FileInfo</EM>
<BR>
<A
HREF="directive-dict.html#Status"
REL="Help"
><STRONG>Status:</STRONG></A> Experimental
<BR>
<A
HREF="directive-dict.html#Module"
REL="Help"
><STRONG>Module:</STRONG></A> mod_charset_lite
<BR>
<A
HREF="directive-dict.html#Compatibility"
REL="Help"
><STRONG>Compatibility:</STRONG></A> Only available in Apache 2.0 or later
<P>
The <CODE>CharsetOptions</CODE> directive configures certain behaviors
of <CODE>mod_charset_lite</CODE>. Each <EM>option</EM> can be one of the
following:
<DL>
<DT>DebugLevel=<EM>n</EM>
<DD>
The <SAMP>DebugLevel</SAMP> keyword allows you to specify the level of
debug messages generated by <CODE>mod_charset_lite</CODE>. By default, no
messages are generated. This is equivalent to <SAMP>DebugLevel=0</SAMP>.
With higher numbers, more debug messages are generated, and server
performance will be degraded. The actual meanings of the numeric values
are described with the definitions of the DBGLVL_ constants near the
beginning of <CODE>mod_charset_lite.c</CODE>.
<DT>ImplicitAdd | NoImplicitAdd
<DD>
The <SAMP>ImplicitAdd</SAMP> keyword specifies that
<CODE>mod_charset_lite</CODE> should implicitly insert its filter when
the configuration specifies that the character set of content should be
translated. If the filter chain is explicitly configured using the
<CODE>AddOutputFilter</CODE> directive, <SAMP>NoImplicitAdd</SAMP> should be
specified so that <CODE>mod_charset_lite</CODE> does not add its filter.
</DL>
</P>
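<P>
As a minimal sketch of how these options combine with the translation
directives, the following container reuses the directory and character set
names from the earlier examples and asks <CODE>mod_charset_lite</CODE> to
insert its filter implicitly while logging at a low debug level. The value
<SAMP>DebugLevel=1</SAMP> is chosen only for illustration; check the
<CODE>DBGLVL_</CODE> constants for what each level reports.
</P>
<PRE>
&lt;Directory "/export/home/trawick/apacheinst/htdocs/convert"&gt;
CharsetSourceEnc UTF-16BE
CharsetDefault ISO8859-1
# Illustrative values only: insert the filter implicitly, minimal debug output.
CharsetOptions ImplicitAdd DebugLevel=1
&lt;/Directory&gt;
</PRE>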
<H2>Common Problems</H2>
<H3>Invalid character set names</H3>
<P>
The character set name parameters of <CODE>CharsetSourceEnc</CODE> and
<CODE>CharsetDefault</CODE> must be acceptable to the translation mechanism
used by APR on the system where <CODE>mod_charset_lite</CODE> is deployed.
These character set names are not standardized and are usually not the same
as the corresponding values used in HTTP headers. Currently, APR can only use
iconv(3), so you can easily test your character set names using the iconv(1)
program, as follows:
</P>
<PRE>
iconv -f charsetsourceenc-value -t charsetdefault-value
</PRE>
<H3>Mismatch between character set of content and translation rules</H3>
<P>
If the translation rules don't make sense for the content, translation
can fail in various ways, including:
</P>
<UL>
<LI>
The translation mechanism may return an error code, in which case the
connection will be aborted.
<LI>
The translation mechanism may silently place special characters (e.g., question
marks) in the output buffer when it cannot translate the input buffer.
</UL>
<!--#include virtual="footer.html" -->
</BODY>
</HTML>