<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "/style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="/style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->

<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<modulesynopsis metafile="mod_xml2enc.xml.meta">

<name>mod_xml2enc</name>
<description>Enhanced charset/internationalisation support for libxml2-based
filter modules</description>
<status>Base</status>
<sourcefile>mod_xml2enc.c</sourcefile>
<identifier>xml2enc_module</identifier>
<compatibility>Version 2.4 and later. Available as a third-party module
for 2.2.x versions</compatibility>

<summary>
 <p>This module provides enhanced internationalisation support for
 markup-aware filter modules such as <module>mod_proxy_html</module>.
 It can automatically detect the encoding of input data and ensure
 they are correctly processed by the <a href="http://xmlsoft.org/"
 >libxml2</a> parser, including converting to Unicode (UTF-8) where
 necessary. It can also convert data to an encoding of choice
 after markup processing, and will ensure the correct <var>charset</var>
 value is set in the HTTP <var>Content-Type</var> header.</p>
</summary>

<section id="usage"><title>Usage</title>
 <p>There are two usage scenarios: with modules programmed to work
 with mod_xml2enc, and with those that are not aware of it:</p>
 <dl>
 <dt>Filter modules enabled for mod_xml2enc</dt><dd>
 <p>Modules such as <module>mod_proxy_html</module> version 3.1
 and up use the <code>xml2enc_charset</code> optional function to retrieve
 the charset argument to pass to the libxml2 parser, and may use the
 <code>xml2enc_filter</code> optional function to postprocess to another
 encoding. When mod_xml2enc is used with an enabled module, no configuration
 is necessary: the other module will configure mod_xml2enc for you
 (though you may still want to customise it using the configuration
 directives below).</p>
 </dd>
 <dt>Non-enabled modules</dt><dd>
 <p>To use it with a libxml2-based module that isn't explicitly enabled for
 mod_xml2enc, you will have to configure the filter chain yourself.
 For example, to improve the i18n support of a filter foo provided by a
 module mod_foo when it handles HTML and XML, you could use</p>
 <pre><code>
 # mod_filter 2.4 syntax: the third argument is an expression.
 # xml2enc (chain "iconv") runs ahead of foo (chain "markup").
 FilterProvider iconv  xml2enc "%{CONTENT_TYPE} =~ m|text/html|"
 FilterProvider iconv  xml2enc "%{CONTENT_TYPE} =~ m|xml|"
 FilterProvider markup foo     "%{CONTENT_TYPE} =~ m|text/html|"
 FilterProvider markup foo     "%{CONTENT_TYPE} =~ m|xml|"
 FilterChain iconv markup
 </code></pre>
 <p>mod_foo will now support any character set supported by libxml2,
 by apr_xlate/iconv, or by both.</p>
 </dd></dl>
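 <p>For the first scenario, a minimal sketch of a server configuration
 might look like the following. The module file paths are assumptions
 that depend on how the server was built and installed, and
 <directive module="mod_proxy_html">ProxyHTMLEnable</directive> belongs to
 <module>mod_proxy_html</module> rather than to mod_xml2enc. No mod_xml2enc
 directives appear, because the enabled module configures it itself.</p>
 <highlight language="config">
# Paths are illustrative; adjust them to your installation
LoadModule xml2enc_module    modules/mod_xml2enc.so
LoadModule proxy_html_module modules/mod_proxy_html.so

# mod_proxy_html relies on mod_xml2enc for charset handling
ProxyHTMLEnable On
 </highlight>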
</section>

<section id="api"><title>Programming API</title>
 <p>Programmers writing libxml2-based filter modules are encouraged to
 enable them for mod_xml2enc, to provide strong i18n support for their
 users without reinventing the wheel. The programming API is exposed in
 <var>mod_xml2enc.h</var>, and <module>mod_proxy_html</module> serves as
 a usage example.</p>
</section>

<section id="sniffing"><title>Detecting an Encoding</title>
 <p>Unlike <module>mod_charset_lite</module>, mod_xml2enc is designed
 to work with data whose encoding cannot be known in advance and
 therefore cannot be configured. It uses 'sniffing' techniques to detect
 the encoding of HTTP data as follows:</p>
 <ol>
 <li>If the HTTP <var>Content-Type</var> header includes a
 <var>charset</var> parameter, that is used.</li>
 <li>If the data start with an XML Byte Order Mark (BOM) or an
 XML encoding declaration, that is used.</li>
 <li>If an encoding is declared in an HTML <code>&lt;META&gt;</code>
 element, that is used.</li>
 <li>If none of the above match, the default value set by
 <directive>xml2EncDefault</directive> is used.</li>
 </ol>
 <p>The rules are applied in order. As soon as a match is found,
 it is used and detection stops.</p>
</section>

<section id="output"><title>Output Encoding</title>
<p><a href="http://xmlsoft.org/">libxml2</a> always uses UTF-8 (Unicode)
internally, and libxml2-based filter modules will output that by default.
mod_xml2enc can change the output encoding through the API, but there
is currently no way to configure that directly.</p>
<p>Changing the output encoding should (in theory, at least) never be
necessary, and is not recommended due to the extra processing load on
the server of an unnecessary conversion.</p>
</section>

<section id="alias"><title>Unsupported Encodings</title>
<p>If you are working with encodings that are not supported by any of
the conversion methods available on your platform, you can still alias
them to a supported encoding using <directive>xml2EncAlias</directive>.</p>
</section>

<directivesynopsis>
<name>xml2EncDefault</name>
<description>Sets a default encoding to assume when absolutely no information
can be <a href="#sniffing">automatically detected</a></description>
<syntax>xml2EncDefault <var>name</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
<context>.htaccess</context></contextlist>
<compatibility>Version 2.4.0 and later; available as a third-party
module for earlier versions.</compatibility>

<usage>
 <p>If you are processing data whose encoding you know but which carry
 no encoding information, you can set this default to help mod_xml2enc
 process the data correctly. For example, to work with the default value
 of Latin1 (<var>iso-8859-1</var>) specified in HTTP/1.0, use</p>
 <highlight language="config">
xml2EncDefault iso-8859-1
 </highlight>
</usage>
</directivesynopsis>

<directivesynopsis>
<name>xml2EncAlias</name>
<description>Recognise aliases for encoding values</description>
<syntax>xml2EncAlias <var>charset alias [alias ...]</var></syntax>
<contextlist><context>server config</context></contextlist>

<usage>
 <p>This server-wide directive aliases one or more encodings to another
 encoding. This enables encodings not recognised by libxml2 to be handled
 internally by libxml2's encoding support using the translation table for
 a recognised encoding. This serves two purposes: to support character sets
 (or names) not recognised by either libxml2 or iconv, and to skip
 conversion for an encoding where it is known to be unnecessary.</p>
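 <p>As an illustration, assume a backend that labels its responses with
 the non-standard charset name <var>x-sjis</var>. This is a hypothetical
 case, and it also assumes that <var>Shift_JIS</var> is supported by the
 conversion libraries on your platform. The unknown name could then be
 mapped to the supported one:</p>
 <highlight language="config">
# First argument: the supported encoding; remaining arguments: its aliases
xml2EncAlias Shift_JIS x-sjis
 </highlight>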
</usage>
</directivesynopsis>

<directivesynopsis>
<name>xml2StartParse</name>
<description>Advise the parser to skip leading junk.</description>
<syntax>xml2StartParse <var>element [element ...]</var></syntax>
<contextlist><context>server config</context><context>virtual host</context>
<context>directory</context><context>.htaccess</context></contextlist>

<usage>
 <p>Specify that the markup parser should start at the first instance
 of any of the elements specified. This can be used as a workaround
 where a broken backend inserts leading junk that messes up the parser (<a
 href="http://bahumbug.wordpress.com/2006/10/12/mod_proxy_html-revisited/"
 >example here</a>).</p>
 <p>It should never be used for XML or for well-formed HTML.</p>
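 <p>A minimal sketch, assuming a backend that emits stray bytes before
 the document proper. The choice of elements is illustrative, and writing
 the element names without angle brackets is an assumption based on the
 syntax line above:</p>
 <highlight language="config">
# Begin parsing at the first occurrence of either element
xml2StartParse html head
 </highlight>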
</usage>
</directivesynopsis>

</modulesynopsis>