 * reserved comment block
 * DO NOT REMOVE OR ALTER!

 * Copyright 1999-2004 The Apache Software Foundation.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.

 * Provides information about encodings. Depends on the Java runtime
 * to provide writers for the different encodings, but can be used
 * to override encoding names and provide the last printable character
 * @version $Revision: 1.11 $ $Date: 2010-11-01 04:34:44 $
 * @author <a href="mailto:arkin@intalio.com">Assaf Arkin</a>

 * The last printable character for unknown encodings.

 * Standard filename for properties file with encodings data.

 * Standard filename for properties file with encodings data.

 * Returns a writer for the specified encoding based on
 * @param output The output stream
 * @param encoding The encoding
 * @return A suitable writer
 * @throws UnsupportedEncodingException There is no convertor
 * to support this encoding

 * Returns the last printable character for an unspecified
 * @return the default size

 * Returns the EncodingInfo object for the specified
 * This is not a public API.
 * @param encoding The encoding
 * @return The object that is used to determine if
 * characters are in the given encoding.

 // We shouldn't have to do this, but just in case.
 // This may happen if the caller tries to use
 // an encoding that wasn't registered in the
 // (java name)->(preferred mime name) mapping file.
 // In that case we attempt to load the charset for the
 // given encoding, and if that succeeds, we create a new
 // EncodingInfo instance, assuming the canonical name
 // of the charset can be used as the mime name.

 * A fast and cheap way to uppercase a String that is
 * only made of printable ASCII characters.
 * This is not a public API.
 * @param s a String of ASCII characters
 * @return an uppercased version of the input String,
 * possibly the same String.

 // is the character a lower case ASCII one?
 // a cheap and fast way to uppercase that is good enough
 // A little optimization, don't call String.valueOf() if
 // the uppercased string is the same as the input string.

 /** The default encoding, ISO style. */

 * Get the proper mime encoding. From the XSLT recommendation: "The encoding
 * attribute specifies the preferred encoding to use for outputting the result
 * tree. XSLT processors are required to respect values of UTF-8 and UTF-16.
 * For other values, if the XSLT processor does not support the specified
 * encoding it may signal an error; if it does not signal an error it should
 * use UTF-8 or UTF-16 instead. The XSLT processor must not use an encoding
 * whose name does not match the EncName production of the XML Recommendation
 * [XML]. If no encoding attribute is specified, then the XSLT processor should
 * use either UTF-8 or UTF-16."
 * @param encoding Reference to java-style encoding string, which may be null,
 * in which case a default will be found.
 * @return The ISO-style encoding string, or null if failure.

 // Get the default system character encoding.
 // This may be incorrect if they passed in a writer, but right now there
 // seems to be no way to get the encoding from a writer.

 * See if the mime type is equal to UTF8. If you don't
 * do that, then convertJava2MimeEncoding will convert
 * 8859_1 to "ISO-8859-1", which is not what we want,
 * I think, and I don't think I want to alter the tables
 * to convert everything to UTF-8.

 * Try the best we can to convert a Java encoding to an XML-style encoding.
 * @param encoding non-null reference to encoding string, java style.
 * @return ISO-style encoding string.

 * Try the best we can to convert a Java encoding to an XML-style encoding.
 * @param encoding non-null reference to encoding string, java style.
 * @return ISO-style encoding string.

 // Using an inner static class here prevents initialization races
 // where the hash maps could be used before they were populated.

 // These maps are final and not modified after initialization.

 // This map will be added to after initialization: make sure it's
 // thread-safe. This map should not be used frequently - only in cases

 // name mapping and returns it as an InputStream.

 // Loads the Properties resource containing the mapping:
 // java charset name -> preferred mime name

 // Seems to be no real need to force failure here, let the
 // system do its best... The issue is not really very critical,
 // and the output will be in any case _correct_ though maybe not
 // always human-friendly... :)

 // Parses the mime list associated with a java charset name.
 // The first mime name in the list is supposed to be the preferred

 // "Last printable character not defined for encoding " +
 // mimeName + " (" + val + ")" ...
 //lastPrintable = 0x00FF;
 // Integer.decode(val.substring(pos).trim()).intValue();

 // This method here attempts to find the canonical charset name for
 // the given name - which is supposed to be either a java name or a mime
 // For that, it attempts to load the charset using the given name, and
 // then returns the charset's canonical name.
 // If the charset could not be loaded from the given name,
 // the method returns null.

 // This method here attempts to find the canonical charset name for
 // the set javaName+mimeNames - which are supposed to all refer to the
 // For that it attempts to load the charset using the javaName, and if
 // not found, attempts again using each of the mime names in turn.
 // If the charset could be loaded from the javaName, then the javaName
 // itself is returned as charset name. Otherwise, each of the mime names
 // is tried in turn, until a charset can be loaded from one of the names,
 // and the loaded charset's canonical name is returned.
 // If no charset can be loaded from either the javaName or one of the
 // mime names, then null is returned.
 // Note that the returned name is the 'java' name that will be used in
 // instances of EncodingInfo.
 // This is important because EncodingInfo uses that 'java name' later on
 // in calls to String.getBytes(javaName).
 // is known by Charset: sometimes only one of the mime names is known,
 // sometimes only the javaName is known, sometimes all are known.
 // By using this method here, we fix the problem where one of the mime
 // names is known but the javaName is unknown, by associating the charset
 // loaded from one of the mime names with the unrecognized javaName.
 // When none of the mime names or javaName are known - there's not much we can
 // do...
 // It can mean that this encoding is not supported for this
 // OS. If such a charset is ever used it will result in having all characters

 * Loads a list of all the supported encodings.
 * System property "encodings" formatted using URL syntax may define an
 * external encodings list. Thanks to Sergey Ushakov for the code

 // load (java name)->(preferred mime name) mapping.

 // create instances of EncodingInfo from the loaded mapping

 // canonicals will map the charset name to
 // the info containing the preferred mime name
 // (the preferred mime name is the first mime

 // None of the java or mime names on the line were
 // recognized => this charset is not supported?

 // Fix up the _encodingTableKeyJava so that the info mapped to
 // the java name contains the preferred mime name.
 // (a given java name can correspond to several mime names,
 // but we want the _encodingTableKeyJava to point to the
 // preferred mime name).

 * Return true if the character is the high member of a surrogate pair.
 * This is not a public API.
 * @param ch the character to test

 * Return true if the character is the low member of a surrogate pair.
 * This is not a public API.
 * @param ch the character to test

 * Return the unicode code point represented by the high/low surrogate pair.
 * This is not a public API.

 * Return the unicode code point represented by the char.
 * A bit of a dummy method, since all it does is return the char,
 * This is not a public API.
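
The surrogate-pair comments above describe three small operations: testing for a high surrogate, testing for a low surrogate, and combining a pair into one code point. The method bodies do not appear in this listing, so the following is only a minimal sketch using the standard UTF-16 ranges and decoding arithmetic. The class name `SurrogateSketch` and the method names are assumptions chosen for illustration, not the file's confirmed code.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class SurrogateSketch {

    // High (leading) surrogates occupy the range U+D800..U+DBFF.
    static boolean isHighUTF16Surrogate(char ch) {
        return ch >= 0xD800 && ch <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy the range U+DC00..U+DFFF.
    static boolean isLowUTF16Surrogate(char ch) {
        return ch >= 0xDC00 && ch <= 0xDFFF;
    }

    // Combine a high/low pair into a supplementary code point:
    // the high surrogate supplies the upper 10 bits, the low surrogate
    // the lower 10 bits, and the result is offset by 0x10000.
    static int toCodePoint(char highSurrogate, char lowSurrogate) {
        return ((highSurrogate - 0xD800) << 10)
                + (lowSurrogate - 0xDC00)
                + 0x10000;
    }

    // For a single char the code point is simply its numeric value.
    static int toCodePoint(char ch) {
        return ch;
    }
}
```

The JDK offers the same operations as `Character.isHighSurrogate`, `Character.isLowSurrogate`, and `Character.toCodePoint(char, char)`; the hand-written form is shown only to make the arithmetic visible.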