AnalyzerGuru.java revision 1461
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * language governing permissions and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 * Copyright (c) 2005, 2012, Oracle and/or its affiliates. All rights reserved.
 * Manages and provides Analyzers as needed. Please see
 * this page for a great description of the purpose of the AnalyzerGuru.
 * Created on September 22, 2005
/** The default {@code FileAnalyzerFactory} instance. */
/** Map from file names to analyzer factories. */
/** Map from file extensions to analyzer factories. */
// @TODO: have a comparator
/** Map from magic strings to analyzer factories. */
 * List of matcher objects which can be used to determine which analyzer
/** List of all registered {@code FileAnalyzerFactory} instances. */
 * If you write your own analyzer, please register it here.
 * Register a {@code FileAnalyzerFactory} instance.
"name '" + name + "' used in multiple analyzers";
"suffix '" + suffix + "' used in multiple analyzers";
"magic '" + magic + "' used in multiple analyzers";
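The three messages above guard the file-name, suffix and magic maps against two factories claiming the same key. The following is a minimal sketch of that check under stated assumptions: `SimpleFactory` and `FactoryRegistry` are hypothetical stand-ins (not the real `FileAnalyzerFactory` API), and the sketch throws `IllegalStateException` where the actual code may use a different mechanism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FileAnalyzerFactory: only the lookup keys matter here.
final class SimpleFactory {
    final String id;
    final List<String> names;
    final List<String> suffixes;
    final List<String> magics;

    SimpleFactory(String id, List<String> names, List<String> suffixes, List<String> magics) {
        this.id = id;
        this.names = names;
        this.suffixes = suffixes;
        this.magics = magics;
    }
}

// Hypothetical registry holding the three lookup maps plus the list of all factories.
final class FactoryRegistry {
    final Map<String, SimpleFactory> fileNames = new HashMap<>();
    final Map<String, SimpleFactory> extensions = new HashMap<>();
    final Map<String, SimpleFactory> magics = new HashMap<>();
    final List<SimpleFactory> factories = new ArrayList<>();

    void register(SimpleFactory factory) {
        for (String name : factory.names) {
            putUnique(fileNames, "name", name, factory);
        }
        for (String suffix : factory.suffixes) {
            putUnique(extensions, "suffix", suffix, factory);
        }
        for (String magic : factory.magics) {
            putUnique(magics, "magic", magic, factory);
        }
        factories.add(factory);
    }

    // Fail fast if another factory already owns the key. Keys added earlier in the
    // same register() call are not rolled back; this is only a sketch.
    private static void putUnique(Map<String, SimpleFactory> map, String kind,
                                  String key, SimpleFactory factory) {
        SimpleFactory old = map.putIfAbsent(key, factory);
        if (old != null) {
            throw new IllegalStateException(kind + " '" + key + "' used in multiple analyzers");
        }
    }
}
```

Using an exception rather than an `assert` is a choice of this sketch: the duplicate check then runs regardless of whether the JVM was started with assertions enabled.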
 * Instruct the AnalyzerGuru to use a given analyzer for a given
 * @param extension the file extension to add
 * @param factory a factory which creates
 * the analyzer to use for the given extension
 * (if you pass null as the factory, you will disable
 * the analyzer used for that extension)
 * Get the default Analyzer.
 * @return a possibly cached instance of an analyzer.
 * Get an analyzer suited to analyze a file. This function will reuse
 * analyzers since they are costly.
 * @param in Input stream containing data to be analyzed
 * @param file Name of the file to be analyzed
 * @return An analyzer suited for that file content
 * @throws java.io.IOException If an error occurs while accessing the
 * data in the input stream.
 * Create a Lucene document and fill in the required fields
 * @param file The file to index
 * @param in The data to generate the index for
 * @param path Where the file is located (from source root)
 * @param fa analyzer to use to determine the genre of and analyze
 * @return The Lucene document to add to the index database
 * @throws java.io.IOException If an exception occurs while collecting the
// date = hr.getLastCommentDate() //RFE
 * Get the content type for a named file.
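A minimal sketch of the extension table described by the javadoc above, assuming factories can be represented by plain string ids: passing `null` removes the mapping (disabling the extension), and lookup tries the extension first, then falls back to a full file-name match, the order suggested by the "try full match" comment in the file-name lookup. `NameLookup`, `addFileName` and `find` are hypothetical names, not the AnalyzerGuru API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: factories are represented by plain String ids.
final class NameLookup {
    private final Map<String, String> extensions = new HashMap<>();
    private final Map<String, String> fileNames = new HashMap<>();

    // A null factory removes the mapping, disabling the analyzer for that extension.
    void addExtension(String extension, String factory) {
        if (factory == null) {
            extensions.remove(extension);
        } else {
            extensions.put(extension, factory);
        }
    }

    void addFileName(String name, String factory) {
        fileNames.put(name, factory);
    }

    // Try the extension first, then fall back to a full file-name match.
    // Returns null when neither map knows the file.
    String find(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            String byExtension = extensions.get(name.substring(dot + 1));
            if (byExtension != null) {
                return byExtension;
            }
        }
        return fileNames.get(name);
    }
}
```

Case sensitivity is left as-is here; whether extensions should be matched case-insensitively is a separate design decision the sketch does not make.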
 * @param in The input stream we want to get the content type for (if
 * we cannot determine the content type by the filename)
 * @param file The name of the file
 * @return The contentType suitable for printing to response.setContentType(), or null
 * if the factory was not found
 * @throws java.io.IOException If an error occurs while accessing the input
 * Write a browsable version of the file
 * @param factory The analyzer factory for this file type
 * @param in The input stream containing the data
 * @param out Where to write the result
 * @param defs definitions for the source file, if available
 * @param annotation Annotation information for the file
 * @param project Project the file belongs to
 * @throws java.io.IOException If an error occurs while creating the
// This is some kind of text file, so we need to expand tabs to
// spaces to match the project's tab settings.
 * Get the genre of a file
 * @param file The file to inspect
 * @return The genre suitable to decide how to display the file
 * Get the genre of a chunk of data
 * @param in A stream containing the data
 * @return The genre suitable to decide how to display the file
 * @throws java.io.IOException If an error occurs while getting the content
 * Get the genre for a named class (this is most likely an analyzer)
 * @param factory the analyzer factory to get the genre for
 * @return The genre of this class (null if not found)
 * Find a {@code FileAnalyzerFactory} with the specified class name. If one
 * doesn't exist, create one and register it.
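The xref comment above says text files have their tabs expanded to spaces to match the project's tab settings. One common way to do that, sketched here with a hypothetical `Tabs.expandTabs` helper, is to pad each tab out to the next multiple of the tab size and reset the column at line breaks; the actual expansion code in this file may differ.

```java
// Hypothetical helper: expands tabs to the next tab stop (a multiple of tabSize).
final class Tabs {
    static String expandTabs(String text, int tabSize) {
        if (tabSize <= 0) {
            return text; // assumption: a non-positive size means "leave tabs alone"
        }
        StringBuilder out = new StringBuilder(text.length());
        int column = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                // Pad with spaces up to the next tab stop.
                int spaces = tabSize - (column % tabSize);
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
                column += spaces;
            } else if (c == '\n' || c == '\r') {
                out.append(c);
                column = 0; // a new line starts at column 0
            } else {
                out.append(c);
                column++;
            }
        }
        return out.toString();
    }
}
```

Because tab stops depend on the current column, `"a\tb"` with a tab size of 4 becomes `a` plus three spaces plus `b`, not four spaces.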
 * @param factoryClassName name of the factory class
 * @return a file analyzer factory
 * @throws ClassNotFoundException if there is no class with that name
 * @throws ClassCastException if the class is not a subclass of {@code
 * @throws IllegalAccessException if the constructor cannot be accessed
 * @throws InstantiationException if the class cannot be instantiated
 * Find a {@code FileAnalyzerFactory} which is an instance of the specified
 * class. If one doesn't exist, create one and register it.
 * @param factoryClass the factory class
 * @return a file analyzer factory
 * @throws ClassCastException if the class is not a subclass of {@code
 * @throws IllegalAccessException if the constructor cannot be accessed
 * @throws InstantiationException if the class cannot be instantiated
 * Finds a suitable analyzer class for a file name. If the analyzer cannot
 * be determined by the file extension, try to look at the data in the
 * InputStream to find a suitable analyzer.
 * Use this if you just want to find the file type.
 * @param in The input stream containing the data
 * @param file The file name to get the analyzer for
 * @return the analyzer factory to use
 * @throws java.io.IOException If a problem occurs while reading the data
//TODO above is not that great, since if 2 analyzers share one extension
//then only the first one registered will own it
//it would be cool if above could return more analyzers and below would
//then decide between them ...
 * Finds a suitable analyzer class for a file name.
 * @param file The file name to get the analyzer for
 * @return the analyzer factory to use
// file doesn't have any of the extensions we know, try full match
 * Finds a suitable analyzer class for the data in this stream.
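The find-or-create behaviour documented for the two factory lookups (by class name and by class) can be sketched with plain reflection. `BaseFactory`, `DemoFactory` and `FactoryFinder` are hypothetical stand-ins. The sketch calls `getDeclaredConstructor().newInstance()` because `Class.newInstance()` is deprecated since Java 9; that path can also throw `NoSuchMethodException` and `InvocationTargetException`, so it declares the common supertype `ReflectiveOperationException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base type standing in for FileAnalyzerFactory.
class BaseFactory {
}

// Hypothetical concrete factory with a public no-argument constructor.
class DemoFactory extends BaseFactory {
    public DemoFactory() {
    }
}

final class FactoryFinder {
    private static final List<BaseFactory> registered = new ArrayList<>();

    // Return the registered instance of factoryClass, or create and register one.
    static BaseFactory findFactory(Class<?> factoryClass) throws ReflectiveOperationException {
        // asSubclass throws ClassCastException if the class is not a BaseFactory.
        Class<? extends BaseFactory> cls = factoryClass.asSubclass(BaseFactory.class);
        for (BaseFactory f : registered) {
            if (f.getClass() == cls) {
                return f;
            }
        }
        BaseFactory created = cls.getDeclaredConstructor().newInstance();
        registered.add(created);
        return created;
    }

    // Class.forName throws ClassNotFoundException for an unknown name.
    static BaseFactory findFactory(String className) throws ReflectiveOperationException {
        return findFactory(Class.forName(className));
    }
}
```

Looking up by name and then by class returns the same cached instance, so repeated lookups do not create duplicate factories.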
 * On success the current position in the given input stream is reset to the position
 * when this method was invoked.
 * @param in The stream containing the data to analyze
 * @return the analyzer factory to use
 * @throws java.io.IOException if an error occurs while reading data from
 * @see InputStream#mark(int)
/* Need at least 4 bytes to perform magic string matching. */
 * Finds a suitable analyzer class for a magic signature
 * @param signature the magic signature to look up
 * @return the analyzer factory to use
// XXX this assumes ISO-8859-1 encoding (and should work in most cases
// for US-ASCII, UTF-8 and other ISO-8859-* encodings, but not always),
// we should try to be smarter than this...
// See if text files have the magic sequence if we remove the
/** Byte-order markers. */
BOMS.put("UTF-8", new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
BOMS.put("UTF-16BE", new byte[] {(byte) 0xFE, (byte) 0xFF});
BOMS.put("UTF-16LE", new byte[] {(byte) 0xFF, (byte) 0xFE});
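The stream lookup documented earlier peeks at the first bytes, needs at least 4 of them for magic matching, decodes them as ISO-8859-1, and leaves the stream where it started (see `InputStream#mark(int)`). A minimal sketch of that pattern follows; `MagicSniffer`, `addMagic` and the prefix-match rule are hypothetical, and unlike the documented method this sketch resets the stream on every path, not only on success.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical magic-string sniffer using mark/reset.
final class MagicSniffer {
    private static final int MIN_MAGIC_BYTES = 4;
    private final Map<String, String> magics = new LinkedHashMap<>(); // magic -> factory id
    private final int peekSize;

    MagicSniffer(int peekSize) {
        if (peekSize < MIN_MAGIC_BYTES) {
            throw new IllegalArgumentException("peekSize must be at least " + MIN_MAGIC_BYTES);
        }
        this.peekSize = peekSize;
    }

    void addMagic(String magic, String factory) {
        magics.put(magic, factory);
    }

    String find(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] buf = new byte[peekSize];
        in.mark(peekSize);
        int len;
        try {
            len = in.readNBytes(buf, 0, peekSize); // 0 at end of stream, never -1
        } finally {
            in.reset(); // restore the position the caller handed us
        }
        if (len < MIN_MAGIC_BYTES) {
            return null; // too little data for magic matching
        }
        // ISO-8859-1 maps every byte to one char, so byte offsets equal char offsets.
        String signature = new String(buf, 0, len, StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, String> e : magics.entrySet()) {
            if (signature.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Streams without mark support can be wrapped in a `java.io.BufferedInputStream` before calling `find`.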
 * Strip away the byte-order marker from the string, if it has one.
 * @param sig a sequence of bytes from which to remove the BOM
 * @return a string without the byte-order marker, or <code>null</code> if
 * the string doesn't start with a BOM
// BOM matched beginning of signature
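A minimal sketch of the BOM stripping described above, reusing the three byte-order markers listed in the BOMS table. Decoding the remaining bytes with the charset named by the matching BOM is an assumption of this sketch; the javadoc only promises a string without the marker, or `null` when there is no BOM. `BomStripper` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the BOM table and the stripping helper.
final class BomStripper {
    private static final Map<String, byte[]> BOMS = new LinkedHashMap<>();

    static {
        BOMS.put("UTF-8", new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
        BOMS.put("UTF-16BE", new byte[] {(byte) 0xFE, (byte) 0xFF});
        BOMS.put("UTF-16LE", new byte[] {(byte) 0xFF, (byte) 0xFE});
    }

    // Returns the text after the BOM, or null if sig does not start with a known BOM.
    static String stripBOM(byte[] sig) {
        for (Map.Entry<String, byte[]> e : BOMS.entrySet()) {
            byte[] bom = e.getValue();
            if (sig.length >= bom.length && Arrays.equals(Arrays.copyOf(sig, bom.length), bom)) {
                // BOM matched beginning of signature: decode the rest with that charset.
                return new String(sig, bom.length, sig.length - bom.length,
                        Charset.forName(e.getKey()));
            }
        }
        return null;
    }
}
```

The stripped string can then be matched against the magic strings, which is how a text file that starts with a BOM can still be recognised by its magic sequence.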