AnalyzerGuru.java revision 460
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
* language governing permissions and limitations under the License.
* When distributing Covered Code, include this CDDL HEADER in each
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.
* Manages and provides Analyzers as needed. Please see
* this page for a great description of the purpose of the AnalyzerGuru.
* Created on September 22, 2005
/** The default {@code FileAnalyzerFactory} instance. */
/** Map from file extensions to analyzer factories. */
// @TODO: have a comparator
/** Map from magic strings to analyzer factories. */
* List of matcher objects which can be used to determine which analyzer
/** List of all registered {@code FileAnalyzerFactory} instances. */
* If you write your own analyzer please register it here
* Register a {@code FileAnalyzerFactory} instance.
"suffix '" +
suffix +
"' used in multiple analyzers";
"magic '" +
magic +
"' used in multiple analyzers";
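The two message fragments above indicate that registration is rejected when a suffix or magic string is already claimed by another factory. The following is a minimal sketch of that bookkeeping, assuming a simple in-memory registry. `Factory`, `Registry`, `register`, `addExtension`, `find` and the case-insensitive key normalization are hypothetical stand-ins, not the actual AnalyzerGuru API, and the use of `IllegalArgumentException` is likewise an assumption. The sketch also follows the documented rule that passing `null` for an extension disables it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for FileAnalyzerFactory: a name plus the file
// suffixes and magic strings it claims.
final class Factory {
    final String name;
    final List<String> suffixes;
    final List<String> magics;

    Factory(String name, List<String> suffixes, List<String> magics) {
        this.name = name;
        this.suffixes = suffixes;
        this.magics = magics;
    }
}

// Hypothetical registry modeled on the comments above, not the real AnalyzerGuru.
final class Registry {
    private final Map<String, Factory> extensions = new HashMap<>();
    private final Map<String, Factory> magics = new HashMap<>();
    private final List<Factory> factories = new ArrayList<>();

    // Register a factory; validate everything first so that a rejected
    // factory leaves both maps unchanged.
    void register(Factory f) {
        for (String suffix : f.suffixes) {
            if (extensions.containsKey(key(suffix))) {
                throw new IllegalArgumentException(
                        "suffix '" + suffix + "' used in multiple analyzers");
            }
        }
        for (String magic : f.magics) {
            if (magics.containsKey(magic)) {
                throw new IllegalArgumentException(
                        "magic '" + magic + "' used in multiple analyzers");
            }
        }
        for (String suffix : f.suffixes) {
            extensions.put(key(suffix), f);
        }
        for (String magic : f.magics) {
            magics.put(magic, f);
        }
        factories.add(f);
    }

    // Map an extension to a factory; passing null disables that extension.
    void addExtension(String extension, Factory f) {
        if (f == null) {
            extensions.remove(key(extension));
        } else {
            extensions.put(key(extension), f);
        }
    }

    // Look up a factory by the text after the last '.', or null if unknown.
    Factory find(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : extensions.get(key(fileName.substring(dot + 1)));
    }

    // Case-insensitive suffix keys are an assumption of this sketch.
    private static String key(String suffix) {
        return suffix.toUpperCase(Locale.ROOT);
    }
}
```

Validating all suffixes and magic strings before inserting any of them keeps a rejected factory from being half-registered.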
* Instruct the AnalyzerGuru to use a given analyzer for a given
* @param extension the file extension to add
* @param factory a factory which creates
* the analyzer to use for the given extension
* (if you pass null as the analyzer, you will disable
* the analyzer used for that extension)
* Get the default Analyzer.
* Get an analyzer suited to analyze a file. This function will reuse
* analyzers since they are costly.
* @param in Input stream containing data to be analyzed
* @param file Name of the file to be analyzed
* @return An analyzer suited for that file content
* @throws java.io.IOException If an error occurs while accessing the
* data in the input stream.
* Create a Lucene document and fill in the required fields
* @param file The file to index
* @param in The data to generate the index for
* @param path Where the file is located (from source root)
* @return The Lucene document to add to the index database
* @throws java.io.IOException If an exception occurs while collecting the
// date = hr.getLastCommentDate() //RFE
// Ignoring any errors while analyzing
* Get the content type for a named file.
* @param in The input stream we want to get the content type for (if
* we cannot determine the content type by the filename)
* @param file The name of the file
* @return The contentType suitable for printing to response.setContentType() or null
* if the factory was not found
* @throws java.io.IOException If an error occurs while accessing the input
* Write a browsable version of the file
* @param factory The analyzer factory for this filetype
* @param in The input stream containing the data
* @param out Where to write the result
* @param annotation Annotation information for the file
* @param project Project the file belongs to
* @throws java.io.IOException If an error occurs while creating the
* Get the genre of a file
* @param file The file to inspect
* @return The genre suitable to decide how to display the file
* Get the genre of a block of data
* @param in A stream containing the data
* @return The genre suitable to decide how to display the file
* @throws java.io.IOException If an error occurs while getting the content
* Get the genre for a named class (this is most likely an analyzer)
* @param factory the analyzer factory to get the genre for
* @return The genre of this class (null if not found)
* Find a {@code FileAnalyzerFactory} with the specified class name. If one
* doesn't exist, create one and register it.
* @param factoryClassName name of the factory class
* @return a file analyzer factory
* @throws ClassNotFoundException if there is no class with that name
* @throws ClassCastException if the class is not a subclass of {@code
* @throws IllegalAccessException if the constructor cannot be accessed
* @throws InstantiationException if the class cannot be instantiated
* Find a {@code FileAnalyzerFactory} which is an instance of the specified
* class. If one doesn't exist, create one and register it.
* @param factoryClass the factory class
* @return a file analyzer factory
* @throws ClassCastException if the class is not a subclass of {@code
* @throws IllegalAccessException if the constructor cannot be accessed
* @throws InstantiationException if the class cannot be instantiated
* Finds a suitable analyzer class for a file name. If the analyzer cannot
* be determined by the file extension, try to look at the data in the
* InputStream to find a suitable analyzer.
* Use this if you just want to find the file type.
* @param in The input stream containing the data
* @param file The file name to get the analyzer for
* @return the analyzer factory to use
* @throws java.io.IOException If a problem occurs while reading the data
* Finds a suitable analyzer class for a file name.
* @param file The file name to get the analyzer for
* @return the analyzer factory to use
// file doesn't have any of the extensions we know
* Finds a suitable analyzer class for the data in this stream
* @param in The stream containing the data to analyze
* @return the analyzer factory to use
* @throws java.io.IOException if an error occurs while reading data from
* Finds a suitable analyzer class for a magic signature
* @param signature the magic signature to look up
* @return the analyzer factory to use
* Get an analyzer by looking up the "magic signature"
* @param signature the signature to look up
* @return the analyzer factory to handle data with this signature
// See if text files have the magic sequence if we remove the
/** Byte-order markers. */
new String(
new char[] {0xEF, 0xBB, 0xBF}), // UTF-8 BOM
new String(new char[] {0xFE, 0xFF}), // UTF-16BE BOM
new String(new char[] {0xFF, 0xFE}), // UTF-16LE BOM
* Strip away the byte-order marker from the string, if it has one.
* @param str the string to remove the BOM from
* @return a string without the byte-order marker, or <code>null</code> if
* the string doesn't start with a BOM
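The byte-order markers are stored as strings with one `char` per byte (0xEF, 0xBB, ...). This matches a signature only if the leading bytes of the file were decoded one byte per char, for example as ISO-8859-1. That decoding is an assumption of this sketch. The code below follows the documented `stripBOM` contract, which returns `null` when there is no BOM. It also shows a lookup that retries a magic match after stripping the BOM, as the comment "See if text files have the magic sequence if we remove the..." suggests. `BomDemo`, `signature`, `findMagic`, `matchPrefix` and the prefix-matching magic table are hypothetical illustrations, not AnalyzerGuru's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class BomDemo {
    // Byte-order markers, one char per byte (as in the original table).
    private static final String[] BOMS = {
        new String(new char[] {0xEF, 0xBB, 0xBF}), // UTF-8 BOM
        new String(new char[] {0xFE, 0xFF}),       // UTF-16BE BOM
        new String(new char[] {0xFF, 0xFE}),       // UTF-16LE BOM
    };

    // Assumption: decode the file's leading bytes as ISO-8859-1 so each
    // byte becomes exactly one char with the same numeric value.
    static String signature(byte[] head) {
        return new String(head, StandardCharsets.ISO_8859_1);
    }

    // Returns str without its leading BOM, or null if str doesn't start with one.
    static String stripBOM(String str) {
        for (String bom : BOMS) {
            if (str.startsWith(bom)) {
                return str.substring(bom.length());
            }
        }
        return null;
    }

    // Hypothetical lookup: try the raw signature first, then retry once
    // with the BOM removed.
    static String findMagic(Map<String, String> magics, String signature) {
        String hit = matchPrefix(magics, signature);
        if (hit == null) {
            String stripped = stripBOM(signature);
            if (stripped != null) {
                hit = matchPrefix(magics, stripped);
            }
        }
        return hit;
    }

    // Returns the value of the first magic key that prefixes s, or null.
    private static String matchPrefix(Map<String, String> magics, String s) {
        for (Map.Entry<String, String> e : magics.entrySet()) {
            if (s.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Returning `null` rather than the unchanged string lets a caller tell "no BOM present" apart from "BOM removed", so the retry only happens when stripping actually changed the signature.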