 * Copyright (c) 1997, 2010, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.

 * This is a utility class that provides various MIME related
 * functionality. <p>
 * There is a set of methods to encode and decode MIME headers as
 * per RFC 2047. A brief description on handling such headers is
 * given here: <p>
 * RFC 822 mail headers <strong>must</strong> contain only US-ASCII
 * characters. Headers that contain non US-ASCII characters must be
 * encoded so that they contain only US-ASCII characters. Basically,
 * this process involves using either BASE64 or QP to encode certain
 * characters. RFC 2047 describes this in detail. <p>
 * In Java, Strings contain (16 bit) Unicode characters. ASCII is a
 * subset of Unicode (and occupies the range 0 - 127).
 * A String that contains only ASCII characters is already mail-safe. If the
 * String contains non US-ASCII characters, it must be encoded. An
 * additional complexity in this step is that since Unicode is not
 * yet a widely used charset, one might want to first charset-encode
 * the String into another charset and then do the transfer-encoding. <p>
 * Note that to get the actual bytes of a mail-safe String (say,
 * for sending over SMTP), one must do
 * <p><blockquote><pre>
 *     byte[] bytes = string.getBytes("iso-8859-1");
 * </pre></blockquote><p>
 * The <code>setHeader</code> and <code>addHeader</code> methods
 * on MimeMessage and MimeBodyPart assume that the given header values
 * are Unicode strings that contain only US-ASCII characters. Hence
 * the callers of those methods must ensure that the values they pass
 * do not contain non US-ASCII characters. The methods in this class
 * help to do this. <p>
 * The <code>getHeader</code> family of methods on MimeMessage and
 * MimeBodyPart return the raw header value. These might be encoded
 * as per RFC 2047, and if so, must be decoded into Unicode Strings.
 * The methods in this class help to do this. <p>
 * Several System properties control strict conformance to the MIME
 * spec. Note that these are not session properties but must be set
 * globally as System properties. <p>
 * The <code>mail.mime.decodetext.strict</code> property controls
 * decoding of MIME encoded words. The MIME spec requires that encoded
 * words start at the beginning of a whitespace separated word. Some
 * mailers incorrectly include encoded words in the middle of a word.
 * If the <code>mail.mime.decodetext.strict</code> System property is
 * set to <code>"false"</code>, an attempt will be made to decode these
 * illegal encoded words. The default is true.
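
The difference between strict and lenient handling of encoded words can be sketched with the JDK alone. This is a minimal illustration, not the code this class uses: the class name `EncodedWordSketch`, its methods, and the regular expression are hypothetical, only the "B" (base64) encoding is handled, and the real decoder also deals with "Q" encoding, whitespace between adjacent encoded words, and error cases.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordSketch {
    // Shape of an RFC 2047 encoded word: =?charset?B?base64-text?=
    // (hypothetical pattern; only the "B" encoding is covered here)
    private static final Pattern B_WORD =
        Pattern.compile("=\\?([^?\\s]+)\\?[Bb]\\?([A-Za-z0-9+/=]*)\\?=");

    /** Strict: the whole whitespace-delimited token must be an encoded word. */
    static String decodeStrict(String token) {
        Matcher m = B_WORD.matcher(token);
        return m.matches() ? decode(m) : token;
    }

    /** Lenient: encoded words embedded inside a token are decoded too. */
    static String decodeLenient(String token) {
        Matcher m = B_WORD.matcher(token);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(token, last, m.start()).append(decode(m));
            last = m.end();
        }
        return sb.append(token.substring(last)).toString();
    }

    // Transfer-decode the base64 text, then charset-convert the bytes.
    private static String decode(Matcher m) {
        byte[] raw = Base64.getDecoder().decode(m.group(2));
        return new String(raw, Charset.forName(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(decodeStrict("=?UTF-8?B?aGVsbG8=?="));     // hello
        System.out.println(decodeStrict("abc=?UTF-8?B?aGVsbG8=?="));  // left unchanged
        System.out.println(decodeLenient("abc=?UTF-8?B?aGVsbG8=?=")); // abchello
    }
}
```

In this sketch the strict variant corresponds to the default behavior described above, and the lenient variant corresponds to setting `mail.mime.decodetext.strict` to `"false"`.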
 * <p>
 * The <code>mail.mime.encodeeol.strict</code> property controls the
 * choice of Content-Transfer-Encoding for MIME parts that are not of
 * type "text". Often such parts will contain textual data for which
 * an encoding that allows normal end of line conventions is appropriate.
 * In rare cases, such a part will appear to contain entirely textual
 * data, but will require an encoding that preserves CR and LF characters
 * without change. If the <code>mail.mime.encodeeol.strict</code>
 * System property is set to <code>"true"</code>, such an encoding will
 * be used when necessary. The default is false. <p>
 * In addition, the <code>mail.mime.charset</code> System property can
 * be used to specify the default MIME charset to use for encoded words
 * and text parts that don't otherwise specify a charset. Normally, the
 * default MIME charset is derived from the default Java charset, as
 * specified in the <code>file.encoding</code> System property. Most
 * applications will have no need to explicitly set the default MIME
 * charset. In cases where the default MIME charset to be used for
 * mail messages is different from the charset used for files stored on
 * the system, this property should be set.
 *
 * @version 1.45, 03/03/10

// This class cannot be instantiated

 * Get the content-transfer-encoding that should be applied
 * to the input stream of this datasource, to make it mailsafe. <p>
 * The algorithm used here is: <br>
 * If the primary type of this datasource is "text" and if all
 * the bytes in its input stream are US-ASCII, then the encoding
 * is "7bit". If more than half of the bytes are non-US-ASCII, then
 * the encoding is "base64". If less than half of the bytes are
 * non-US-ASCII, then the encoding is "quoted-printable".
 * If the primary type of this datasource is not "text", then if
 * all the bytes of its input stream are US-ASCII, the encoding
 * is "7bit". If there is even one non-US-ASCII character, the
 * encoding is "base64".
 *
 * @return the encoding. This is either "7bit",
 *         "quoted-printable" or "base64"

return "base64";
// what else ?!
// if not text, stop processing when we see non-ASCII
// Close the input stream

 * Same as <code>getEncoding(DataSource)</code> except that instead
 * of reading the data from an <code>InputStream</code> it uses the
 * <code>writeTo</code> method to examine the data. This is more
 * efficient in the common case of a <code>DataHandler</code>
 * created with an object and a MIME type (for example, a
 * "text/plain" String) because all the I/O is done in this
 * thread. In the case requiring an <code>InputStream</code> the
 * <code>DataHandler</code> uses a thread, a pair of pipe streams,
 * and the <code>writeTo</code> method to produce the data. <p>

 * Try to pick the most efficient means of determining the
 * encoding. If this DataHandler was created using a DataSource,
 * the getEncoding(DataSource) method is typically faster. If
 * the DataHandler was created with an object, this method is
 * much faster. To distinguish the two cases, we use a heuristic.
 * A DataHandler created with an object will always have a null name.
 * A DataHandler created with a DataSource will usually have a
 * non-null name.
 *
 * XXX - This is actually quite a disgusting hack, but it makes
 * a common case run over twice as fast.

return "base64";
// what else ?!
// Check all of the available bytes
// Check all of the available bytes, break out if we find
// at least one non-US-ASCII character
else // found at least one non-ASCII character, use b64

 * Decode the given input stream. The Input stream returned is
 * the decoded input stream. All the encodings defined in RFC 2045
 * are supported here. They include "base64", "quoted-printable",
 * "7bit", "8bit", and "binary". In addition, "uuencode" is also
 * supported.
 *
 * @param is input stream
 * @param encoding the encoding of the stream.
 * @return decoded input stream.

 * Wrap an encoder around the given output stream.
 * All the encodings defined in RFC 2045 are supported here.
 * They include "base64", "quoted-printable", "7bit", "8bit" and
 * "binary". In addition, "uuencode" is also supported.
 *
 * @param os output stream
 * @param encoding the encoding of the stream.
 * @return output stream that applies the
 *         specified encoding.

 * Wrap an encoder around the given output stream.
 * All the encodings defined in RFC 2045 are supported here.
 * They include "base64", "quoted-printable", "7bit", "8bit" and
 * "binary". In addition, "uuencode" is also supported.
 * The <code>filename</code> parameter is used with the "uuencode"
 * encoding and is included in the encoded output.
 *
 * @param os output stream
 * @param encoding the encoding of the stream.
 * @param filename name for the file being encoded (only used
 *         with uuencode)
 * @return output stream that applies the
 *         specified encoding.

 * Encode a RFC 822 "text" token into mail-safe form as per
 * RFC 2047. <p>
 * The given Unicode string is examined for non US-ASCII
 * characters. If the string contains only US-ASCII characters,
 * it is returned as-is.
 * If the string contains non US-ASCII
 * characters, it is first character-encoded using the platform's
 * default charset, then transfer-encoded using either the B or
 * Q encoding. The resulting bytes are then returned as a Unicode
 * string containing only ASCII characters. <p>
 * Note that this method should be used to encode only
 * "unstructured" RFC 822 headers. <p>
 * <blockquote><pre>
 * MimeBodyPart part = ...
 * String rawvalue = "FooBar Mailer, Japanese version 1.1";
 * try {
 *     // If we know for sure that rawvalue contains only US-ASCII
 *     // characters, we can skip the encoding part
 *     part.setHeader("X-mailer", MimeUtility.encodeText(rawvalue));
 * } catch (UnsupportedEncodingException e) {
 *     // encoding failure
 * } catch (MessagingException me) {
 *     // setHeader() failure
 * }
 * </pre></blockquote><p>
 *
 * @param text unicode string
 * @return Unicode string containing only US-ASCII characters
 * @exception UnsupportedEncodingException if the encoding fails

 * Encode a RFC 822 "text" token into mail-safe form as per
 * RFC 2047. <p>
 * The given Unicode string is examined for non US-ASCII
 * characters. If the string contains only US-ASCII characters,
 * it is returned as-is. If the string contains non US-ASCII
 * characters, it is first character-encoded using the specified
 * charset, then transfer-encoded using either the B or Q encoding.
 * The resulting bytes are then returned as a Unicode string
 * containing only ASCII characters. <p>
 * Note that this method should be used to encode only
 * "unstructured" RFC 822 headers.
 *
 * @param text the header value
 * @param charset the charset. If this parameter is null, the
 *         platform's default charset is used.
 * @param encoding the encoding to be used. Currently supported
 *         values are "B" and "Q".
 *         If this parameter is null, then
 *         the "Q" encoding is used if most of the characters to be
 *         encoded are in the ASCII charset, otherwise "B" encoding
 *         is used.
 * @return Unicode string containing only US-ASCII characters

 * Decode "unstructured" headers, that is, headers that are defined
 * as '*text' as per RFC 822. <p>
 * The string is decoded using the algorithm specified in
 * RFC 2047, Section 6.1.1. If the charset-conversion fails
 * for any sequence, an UnsupportedEncodingException is thrown.
 * If the String is not an RFC 2047 style encoded header, it is
 * returned as-is. <p>
 * <blockquote><pre>
 * MimeBodyPart part = ...
 * String rawvalue = null;
 * String value = null;
 * try {
 *     if ((rawvalue = part.getHeader("X-mailer")[0]) != null)
 *         value = MimeUtility.decodeText(rawvalue);
 * } catch (UnsupportedEncodingException e) {
 *     // charset conversion failure
 * } catch (MessagingException me) { }
 * </pre></blockquote><p>
 *
 * @param etext the possibly encoded value
 * @exception UnsupportedEncodingException if the charset
 *         conversion fails.

 * We look for sequences separated by "linear-white-space".
 * (as per RFC 2047, Section 6.1.1)
 * RFC 822 defines "linear-white-space" as SPACE | HT | CR | NL.

 * First, let's do a quick run through the string and check
 * whether the sequence "=?" exists at all. If none exists,
 * we know there are no encoded-words in here and we can just
 * return the string as-is, skipping the decoding below.
 * This handles the most common case of unencoded headers.

// Encoded words found. Start decoding ...
// If whitespace, append it to the whitespace buffer
(c == '\r') || (c == '\n'))
// Check if token is an 'encoded-word' ..
// Yes, this IS an 'encoded-word'.
// if the previous word was also encoded, we
// should ignore the collected whitespace. Else
// we include the whitespace as well.
// This is NOT an 'encoded-word'.
// possibly decode inner encoded words
// include collected whitespace ..

 * Encode a RFC 822 "word" token into mail-safe form as per
 * RFC 2047. <p>
 * The given Unicode string is examined for non US-ASCII
 * characters. If the string contains only US-ASCII characters,
 * it is returned as-is. If the string contains non US-ASCII
 * characters, it is first character-encoded using the platform's
 * default charset, then transfer-encoded using either the B or
 * Q encoding. The resulting bytes are then returned as a Unicode
 * string containing only ASCII characters. <p>
 * This method is meant to be used when creating RFC 822 "phrases".
 * The InternetAddress class, for example, uses this to encode
 * its 'phrase' component.
 *
 * @param text unicode string
 * @return Array of Unicode strings containing only US-ASCII
 *         characters.
 * @exception UnsupportedEncodingException if the encoding fails

 * Encode a RFC 822 "word" token into mail-safe form as per
 * RFC 2047. <p>
 * The given Unicode string is examined for non US-ASCII
 * characters. If the string contains only US-ASCII characters,
 * it is returned as-is. If the string contains non US-ASCII
 * characters, it is first character-encoded using the specified
 * charset, then transfer-encoded using either the B or Q encoding.
 * The resulting bytes are then returned as a Unicode string
 * containing only ASCII characters. <p>
 *
 * @param text unicode string
 * @param charset the MIME charset
 * @param encoding the encoding to be used. Currently supported
 *         values are "B" and "Q".
 *         If this parameter is null, then
 *         the "Q" encoding is used if most of the characters to be
 *         encoded are in the ASCII charset, otherwise "B" encoding
 *         is used.
 * @return Unicode string containing only US-ASCII characters
 * @exception UnsupportedEncodingException if the encoding fails

 * Encode the given string. The parameter 'encodingWord' should
 * be true if a RFC 822 "word" token is being encoded and false if a
 * RFC 822 "text" token is being encoded. This is because the
 * "Q" encoding defined in RFC 2047 has more restrictions when
 * encoding "word" tokens. (Sigh)

// If 'string' contains only US-ASCII characters, just
// return it.
// Else, apply the specified charset conversion.
}
else // MIME charset -> java charset
// If no transfer-encoding is specified, figure one out.
// As per RFC 2047, size of an encoded string should not
// exceed 75 bytes.
// 7 = size of "=?", '?', 'B'/'Q', '?', "?="
// First find out what the length of the encoded version of
// 'string' would be.
// If the length is greater than 'avail', split 'string'
// into two and recurse.
// length <= 'avail'. Encode the given string
try {
// do the encoding
// Now write out the encoded (all ASCII) bytes
if (!first) // not the first line of this sequence

 * The string is parsed using the rules in RFC 2047 for parsing
 * an "encoded-word". If the parse fails, a ParseException is
 * thrown. Otherwise, it is transfer-decoded, and then
 * charset-converted into Unicode. If the charset-conversion
 * fails, an UnsupportedEncodingException is thrown.<p>
 *
 * @param eword the possibly encoded value
 * @exception ParseException if the string is not an
 *         encoded-word as per RFC 2047.
 * @exception UnsupportedEncodingException if the charset
 *         conversion fails.

// get encoded-sequence
// Extract the bytes from word
// Get the appropriate decoder
// For b64 & q, size of decoded word <= size of word. So
// the decoded bytes must fit into the 'bytes' array. This
// is certainly more efficient than writing bytes into a
// ByteArrayOutputStream and then pulling out the byte[]
// count is set to the actual number of decoded bytes
// Finally, convert the decoded bytes into a String using
// the specified charset
// there's still more text in the string
// explicitly catch and rethrow this exception, otherwise
// the below IOException catch will swallow this up!
/* An unknown charset of the form ISO-XXX-XXX will cause
 * the JDK to throw an IllegalArgumentException. The
 * JDK attempts to create a classname using this string,
 * but valid classnames must not contain the character '-',
 * and this results in an IllegalArgumentException, rather than
 * the expected UnsupportedEncodingException. Yikes
 */

 * Look for encoded words within a word. The MIME spec doesn't
 * allow this, but many broken mailers, especially Japanese mailers,
 * produce such incorrect encodings.
// ignore it, just use the original string

 * A utility method to quote a word, if the word contains any
 * characters from the specified 'specials' list.<p>
 * The <code>HeaderTokenizer</code> class defines two special
 * sets of delimiters - MIME and RFC 822. <p>
 * This method is typically used during the generation of
 * RFC 822 and MIME header fields.
 *
 * @param word word to be quoted
 * @param specials the set of special characters
 * @return the possibly quoted word
 * @see javax.mail.internet.HeaderTokenizer#MIME
 * @see javax.mail.internet.HeaderTokenizer#RFC822

 * Look for any "bad" characters. Escape and
 * quote the entire string if necessary.

if (c == '"' || c == '\\' || c == '\r' || c == '\n') {
// need to escape them and then quote the whole string
; // do nothing, CR was already escaped
// These characters cause the string to be quoted

 * Fold a string at linear whitespace so that each line is no longer
 * than 76 characters, if possible. If there are more than 76
 * non-whitespace characters consecutively, the string is folded at
 * the first whitespace after that sequence. The parameter
 * <code>used</code> indicates how many characters have been used in
 * the current line; it is usually the length of the header name. <p>
 * Note that line breaks in the string aren't escaped; they probably
 * should be.
 *
 * @param used characters used in line so far
 * @param s the string to fold
 * @return the folded string

// Strip trailing spaces
if (c != ' ' && c != '\t')
// if the string fits now, just return it
// have to actually fold the string
if (c == ' ' || c == '\t')
// no space, use the whole thing

 * Unfold a folded header. Any line breaks that aren't escaped and
 * are followed by whitespace are removed.
 *
 * @param s the string to unfold
 * @return the unfolded string

// if next line starts with whitespace, skip all of it
// XXX - always has to be true?
if (i < l && ((c = s.charAt(i)) == ' ' || c == '\t')) {
    i++; // skip whitespace
    while (i < l && ((c = s.charAt(i)) == ' ' || c == '\t'))
// it's not a continuation line, just leave it in
// there's a backslash at "start - 1"
// strip it out, but leave in the line break

 * Return the first index of any of the characters in "any" in "s",
 * or -1 if none are found.
 *
 * This should be a method on String.

 * Convert a MIME charset name into a valid Java charset name. <p>
 *
 * @param charset the MIME charset name
 * @return the Java charset equivalent. If a suitable mapping is
 *         not available, the passed in charset is itself returned.

// no mapping table, or charset parameter is null

 * Convert a Java charset into its MIME charset name. <p>
 *
 * Note that a future version of JDK (post 1.2) might provide
 * this functionality, in which case, we may deprecate this
 * method.
 *
 * @param charset the JDK charset
 * @return the MIME charset equivalent. If a mapping
 *         is not possible, the passed in charset itself
 *         is returned.

// no mapping table or charset param is null

 * Get the default charset corresponding to the system's current
 * default locale. If the System property <code>mail.mime.charset</code>
 * is set, a system charset corresponding to this MIME charset will be
 * returned. <p>
 *
 * @return the default charset of the system's default locale,
 *         as a Java charset. (NOT a MIME charset)

 * If mail.mime.charset is set, it controls the default
 * Java charset as well.

 * Get the default MIME charset for this locale.

// Tables to map MIME charset names to Java names and vice versa.
// XXX - Should eventually use J2SE 1.4 java.nio.charset.Charset
// Use this class's classloader to load the mapping file
// XXX - we should use SecuritySupport, but it's in another package
// Load the JDK-to-MIME charset mapping table
// Load the MIME-to-JDK charset mapping table
// If we didn't load the tables, e.g., because we didn't have
// permission, load them manually. The entries here should be
// the same as the default javamail.charset.map.
break; // error in reading, stop
// ignore empty lines and comments
// A valid entry is of the form <key><separator><value>
// where, <separator> := SPACE | HT. Parse this

 * Check if the given string contains non US-ASCII characters.
 *
 * @return ALL_ASCII if all characters in the string
 *         belong to the US-ASCII charset. MOSTLY_ASCII
 *         if more than half of the available characters
 *         are US-ASCII characters. Else MOSTLY_NONASCII.

for (int i = 0; i < l; i++) {
 * Check if the given byte array contains non US-ASCII characters.
 *
 * @return ALL_ASCII if all characters in the string
 *         belong to the US-ASCII charset. MOSTLY_ASCII
 *         if more than half of the available characters
 *         are US-ASCII characters. Else MOSTLY_NONASCII.
 *
 * XXX - this method is no longer used

// The '&' operator automatically causes b[i] to be promoted
// to an int, and we mask out the higher bytes in the int
// so that the resulting value is not a negative integer.

 * Check if the given input stream contains non US-ASCII characters.
 * Up to <code>max</code> bytes are checked. If <code>max</code> is
 * set to <code>ALL</code>, then all the bytes available in this
 * input stream are checked. If <code>breakOnNonAscii</code> is true
 * the check terminates when the first non-US-ASCII character is
 * found and MOSTLY_NONASCII is returned. Else, the check continues
 * till <code>max</code> bytes or till the end of stream.
 *
 * @param is the input stream
 * @param max maximum bytes to check for. The special value
 *         ALL indicates that all the bytes in this input
 *         stream must be checked.
 * @param breakOnNonAscii if <code>true</code>, then terminate
 *         the check when the first non-US-ASCII character
 *         is found.
 * @return ALL_ASCII if all characters in the string
 *         belong to the US-ASCII charset. MOSTLY_ASCII
 *         if more than half of the available characters
 *         are US-ASCII characters. Else MOSTLY_NONASCII.

// The '&' operator automatically causes b[i] to
// be promoted to an int, and we mask out the higher
// bytes in the int so that the resulting value is
// not a negative integer.
if (b == '\r' || b == '\n')
// We have been told to break on the first non-ascii character.
// We haven't got any non-ascii character yet, but then we
// have not checked all of the available bytes either. So we
// cannot say for sure that this input stream is ALL_ASCII,
// and hence we must play safe and return MOSTLY_NONASCII

// If we're looking at non-text data, and we saw CR without LF
// or vice versa, consider this mostly non-ASCII so that it
// will be base64 encoded (since the quoted-printable encoder
// doesn't encode this case properly).

// if we've seen a long line, we degrade to mostly ascii

return b >= 0177 || (b < 040 && b != '\r' && b != '\n' && b != '\t');
 * An OutputStream that determines whether the data written to
 * it is all ASCII, mostly ASCII, or mostly non-ASCII.

if (b == '\r' || b == '\n')
 * Return ASCII-ness of data stream.

// If we're looking at non-text data, and we saw CR without LF
// or vice versa, consider this mostly non-ASCII so that it
// will be base64 encoded (since the quoted-printable encoder
// doesn't encode this case properly).

// if we've seen a long line, we degrade to mostly ascii
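
The ASCII-ness checks and the encoding choice described in the comments above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `AsciiClassifierSketch`, the method names `classify` and `encodingForText`, and the numeric values of the three constants are hypothetical. The sketch works on a byte array rather than a stream, and it leaves out the long-line and bare CR/LF handling mentioned in the comments. Only the `nonascii` predicate is taken directly from the listing.

```java
public class AsciiClassifierSketch {
    // Constant values are assumed for illustration only.
    static final int ALL_ASCII = 1;
    static final int MOSTLY_ASCII = 2;
    static final int MOSTLY_NONASCII = 3;

    // Predicate from the listing: DEL (0177) and above, or a control
    // character other than CR, LF and TAB, counts as non-ASCII.
    static boolean nonascii(int b) {
        return b >= 0177 || (b < 040 && b != '\r' && b != '\n' && b != '\t');
    }

    // Count ASCII and non-ASCII bytes and classify the data.
    static int classify(byte[] data) {
        int ascii = 0;
        int nonAscii = 0;
        for (byte value : data) {
            // Mask to 0..255 so that bytes >= 0x80 are not negative.
            if (nonascii(value & 0xff)) {
                nonAscii++;
            } else {
                ascii++;
            }
        }
        if (nonAscii == 0) {
            return ALL_ASCII;
        }
        return ascii > nonAscii ? MOSTLY_ASCII : MOSTLY_NONASCII;
    }

    // Encoding choice for a part whose primary type is "text".
    static String encodingForText(byte[] data) {
        switch (classify(data)) {
            case ALL_ASCII:
                return "7bit";
            case MOSTLY_ASCII:
                return "quoted-printable";
            default:
                return "base64";
        }
    }

    public static void main(String[] args) {
        System.out.println(encodingForText(new byte[] {'a', 'b', 'c'}));
        System.out.println(encodingForText(new byte[] {'a', 'b', 'c', (byte) 0xe9}));
        System.out.println(encodingForText(new byte[] {(byte) 0xff, (byte) 0xfe, 'a'}));
    }
}
```

For a part that is not of type "text", the comments above describe a simpler rule: any non-US-ASCII byte selects "base64", which is why the stream check can stop at the first such byte.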