AnalyzerGuru.java revision 1181
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * language governing permissions and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Manages and provides Analyzers as needed. Please see
 * this</a> page for a great description of the purpose of the AnalyzerGuru.
 * Created on September 22, 2005
/** The default {@code FileAnalyzerFactory} instance. */
/** Map from file names to analyzer factories. */
/** Map from file extensions to analyzer factories. */
// @TODO: have a comparator
/** Map from magic strings to analyzer factories. */
 * List of matcher objects which can be used to determine which analyzer
/** List of all registered {@code FileAnalyzerFactory} instances. */
 * If you write your own analyzer please register it here
 * Register a {@code FileAnalyzerFactory} instance.
"name '" + name + "' used in multiple analyzers";
 * Instruct the AnalyzerGuru to use a given analyzer for a given
 * @param extension the file-extension to add
 * @param factory a factory which creates
 * the analyzer to use for the given extension
 * (if you pass null as the analyzer, you will disable
 * the analyzer used for that extension)
 * Get the default Analyzer.
 * Get an analyzer suited to analyze a file. This function will reuse
 * analyzers since they are costly.
 * @param in Input stream containing data to be analyzed
 * @param file Name of the file to be analyzed
 * @return An analyzer suited for that file content
 * @throws java.io.IOException If an error occurs while accessing the
 * data in the input stream.
 * Create a Lucene document and fill in the required fields
 * @param file The file to index
 * @param in The data to generate the index for
 * @param path Where the file is located (from source root)
 * @return The Lucene document to add to the index database
 * @throws java.io.IOException If an exception occurs while collecting the
// date = hr.getLastCommentDate() //RFE
 * Get the content type for a named file.
 * @param in The input stream we want to get the content type for (if
 * we cannot determine the content type by the filename)
 * @param file The name of the file
 * @return The contentType suitable for printing to response.setContentType() or null
 * if the factory was not found
 * @throws java.io.IOException If an error occurs while accessing the input
 * Write a browsable version of the file
 * @param factory The analyzer factory for this filetype
 * @param in The input stream containing the data
 * @param out Where to write the result
 * @param defs definitions for the source file, if available
 * @param annotation Annotation information for the file
 * @param project Project the file belongs to
 * @throws java.io.IOException If an error occurs while creating the
// This is some kind of text file, so we need to expand tabs to
// spaces to match the project's tab settings.
 * Get the genre of a file
 * @param file The file to inspect
 * @return The genre suitable to decide how to display the file
 * Get the genre of a bulk of data
 * @param in A stream containing the data
 * @return The genre suitable to decide how to display the file
 * @throws java.io.IOException If an error occurs while getting the content
 * Get the genre for a named class (this is most likely an analyzer)
 * @param factory the analyzer factory to get the genre for
 * @return The genre of this class (null if not found)
 * Find a {@code FileAnalyzerFactory} with the specified class name. If one
 * doesn't exist, create one and register it.
 * @param factoryClassName name of the factory class
 * @return a file analyzer factory
 * @throws ClassNotFoundException if there is no class with that name
 * @throws ClassCastException if the class is not a subclass of {@code
 * @throws IllegalAccessException if the constructor cannot be accessed
 * @throws InstantiationException if the class cannot be instantiated
 * Find a {@code FileAnalyzerFactory} which is an instance of the specified
 * class. If one doesn't exist, create one and register it.
 * @param factoryClass the factory class
 * @return a file analyzer factory
 * @throws ClassCastException if the class is not a subclass of {@code
 * @throws IllegalAccessException if the constructor cannot be accessed
 * @throws InstantiationException if the class cannot be instantiated
 * Finds a suitable analyzer class for a file name. If the analyzer cannot
 * be determined by the file extension, try to look at the data in the
 * InputStream to find a suitable analyzer.
 * Use if you just want to find file type.
 * @param in The input stream containing the data
 * @param file The file name to get the analyzer for
 * @return the analyzer factory to use
 * @throws java.io.IOException If a problem occurs while reading the data
//TODO above is not that great, since if 2 analyzers share one extension
//then only the first one registered will own it
//it would be cool if above could return more analyzers and below would
//then decide between them ...
 * Finds a suitable analyzer class for a file name.
 * @param file The file name to get the analyzer for
 * @return the analyzer factory to use
// file doesn't have any of the extensions we know, try full match
 * Finds a suitable analyzer class for the data in this stream
 * @param in The stream containing the data to analyze
 * @return the analyzer factory to use
 * @throws java.io.IOException if an error occurs while reading data from
 * Finds a suitable analyzer class for a magic signature
 * @param signature the magic signature to look up
 * @return the analyzer factory to use
// XXX this assumes ISO-8859-1 encoding (and should work in most cases
// for US-ASCII, UTF-8 and other ISO-8859-* encodings, but not always),
// we should try to be smarter than this...
// See if text files have the magic sequence if we remove the
BOMS.put("UTF-8", new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
BOMS.put("UTF-16BE", new byte[] {(byte) 0xFE, (byte) 0xFF});
 * Strip away the byte-order marker from the string, if it has one.
 * @param sig a sequence of bytes from which to remove the BOM
 * @return a string without the byte-order marker, or <code>null</code> if
 * the string doesn't start with a BOM
// BOM matched beginning of signature
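
The BOM table and the strip-the-BOM comments above could fit together roughly as follows. This is a minimal sketch, not the original method body, which did not survive in this listing. The class name `BomSketch`, the method name `stripBom`, the use of a `LinkedHashMap`, and the choice to decode the remaining bytes with the charset named by the map key are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of the original AnalyzerGuru source. */
final class BomSketch {
    /** Map from charset name to its byte-order marker (the two entries seen in the listing). */
    private static final Map<String, byte[]> BOMS = new LinkedHashMap<>();

    static {
        BOMS.put("UTF-8", new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
        BOMS.put("UTF-16BE", new byte[] {(byte) 0xFE, (byte) 0xFF});
    }

    /**
     * Strip the byte-order marker from a signature, if it has one.
     *
     * @param sig a sequence of bytes from which to remove the BOM
     * @return the decoded text after the BOM, or null if sig has no known BOM
     */
    static String stripBom(byte[] sig) {
        for (Map.Entry<String, byte[]> entry : BOMS.entrySet()) {
            byte[] bom = entry.getValue();
            if (sig.length >= bom.length
                    && Arrays.equals(Arrays.copyOf(sig, bom.length), bom)) {
                // BOM matched beginning of signature; decode what follows it.
                // Assumption: the map key is a valid charset name for the payload.
                return new String(sig, bom.length, sig.length - bom.length,
                        Charset.forName(entry.getKey()));
            }
        }
        // No known BOM at the start of the signature.
        return null;
    }

    public static void main(String[] args) {
        byte[] sig = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        System.out.println(stripBom(sig));
    }
}
```

Returning `null` for the no-BOM case follows the `@return` comment above, so a caller can fall back to matching the raw signature when nothing was stripped.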