mod_speling.c revision b39ba1ea90cd1940dcd9e8d0f18c1ff02c187ac1
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.

/* mod_speling.c - by Alexei Kosut <akosut@organic.com> June, 1996
 * This module is transparent, and simple. It attempts to correct
 * misspellings of URLs that users might have entered, namely by checking
 * capitalizations. If it finds a match, it sends a redirect.
 * Sep-1999 Hugo Haas <hugo@w3.org>
 * o Added a CheckCaseOnly option to check only miscapitalized words.
 * 08-Aug-1997 <Martin.Kraemer@Mch.SNI.De>
 * o Upgraded module interface to apache_1.3a2-dev API (more NULL's in
 * o Integrated tcsh's "spelling correction" routine which allows one
 * Rewrote it to ignore case as well. This ought to catch the majority
 * of misspelled requests.
 * o Commented out the second pass where files' suffixes are stripped.
 * Given the better hit rate of the first pass, this rather ugly
 * (request index.html, receive index.db ?!?!) solution can be
 * o wrote a "kind of" html page for mod_speling
 * Activate it with "CheckSpelling On"

 * Create a configuration specific to this module for a server or directory
 * location, and fill it with the default settings.
 * The API says that in the absence of a merge function, the record for the
 * closest ancestor is used exclusively. That's what we want, so we don't
 * bother to have such a function.

 * Respond to a callback to create configuration record for a server or

 * Respond to a callback to create a config record for a specific directory.
 * Define the directives specific to this module. This structure is referenced
 * later by the 'module' structure.
"whether or not to fix only miscapitalized requests"),
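As a rough illustration of the single-typo classification that the spdist() comment describes, here is a minimal, self-contained sketch. It is not the module's code: the names `typo_kind`, `classify_typo`, and `equal_nocase` are hypothetical, and the enum values are illustrative labels rather than the module's actual return values. Like the routine the comment describes, it ignores case and accepts at most one real typo.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical labels for the kinds of single typo the comment lists. */
typedef enum {
    TYPO_CASE_ONLY,      /* names match when case is ignored (or exactly) */
    TYPO_TRANSPOSED,     /* two adjacent characters swapped */
    TYPO_SUBSTITUTED,    /* one character replaced by another */
    TYPO_EXTRA_CHAR,     /* request has one character too many */
    TYPO_MISSING_CHAR,   /* request has one character too few */
    TYPO_TOO_DIFFERENT   /* more than one typo: not correctable */
} typo_kind;

static int lower(char c)
{
    return tolower((unsigned char)c);
}

/* Case-insensitive string equality, written out to stay portable. */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && lower(*a) == lower(*b)) {
        a++;
        b++;
    }
    return lower(*a) == lower(*b);
}

/* Classify how the requested name differs from a directory entry name. */
static typo_kind classify_typo(const char *req, const char *name)
{
    assert(req != NULL && name != NULL);

    /* Skip the common prefix, ignoring case. */
    while (lower(*req) == lower(*name)) {
        if (*name == '\0') {
            return TYPO_CASE_ONLY;
        }
        req++;
        name++;
    }
    if (*req) {
        if (*name) {
            if (req[1] && name[1]
                && lower(req[0]) == lower(name[1])
                && lower(name[0]) == lower(req[1])
                && equal_nocase(req + 2, name + 2)) {
                return TYPO_TRANSPOSED;
            }
            if (equal_nocase(req + 1, name + 1)) {
                return TYPO_SUBSTITUTED;
            }
        }
        if (equal_nocase(req + 1, name)) {
            return TYPO_EXTRA_CHAR;
        }
    }
    if (*name && equal_nocase(req, name + 1)) {
        return TYPO_MISSING_CHAR;
    }
    return TYPO_TOO_DIFFERENT;
}
```

For example, a request for "idnex.html" compared against the entry "index.html" is classified as a transposition, while "about.html" against "index.html" is too different to correct.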
 * spdist() is taken from Kernighan & Pike,
 * _The_UNIX_Programming_Environment_
 * and adapted somewhat to correspond better to psychological reality.
 * (Note the changes to the return values)
 * According to Pollock and Zamora, CACM April 1984 (V. 27, No. 4),
 * page 363, the correct order for this is:
 * OMISSION = TRANSPOSITION > INSERTION > SUBSTITUTION
 * thus, it was exactly backwards in the old version. -- PWP
 * This routine was taken out of tcsh's spelling correction code
 * (tcsh-6.07.04) and re-converted to apache data types ("char" type
 * instead of tcsh's NLS'ed "Char"). Plus it now ignores the case
 * during comparisons, so is an "approximate strcasecmp()".
 * NOTE that it still allows only _one_ real "typo",
 * it does NOT try to correct multiple errors.

/* We only want to worry about GETs */
/* We've already got a file of some kind or another */
/* This is a sub request - don't mess with it */

 * The request should end up looking like this:
 * So we do this in steps. First break r->filename into two pieces
 * Don't do anything if the request doesn't contain a slash, or

/* good = /correct-file */
/* Check to see if the URL pieces add up */
/* Now open the directory and do ourselves a check... */
/* Oops, not a directory... */

 * If we end up with a "fixed" URL which is identical to the
 * requested one, we must have found a broken symlink or some such.
 * Do _not_ try to redirect this, it causes a loop!

 * miscapitalization errors are checked first (like, e.g., lower case
 * file, upper case request)

 * simple typing errors are checked next (like, e.g.,

 * The spdist() should have found the majority of the misspelled
 * requests. It is of questionable use to continue looking for
 * files with the same base name, but potentially of totally wrong
 * I would propose to not set the WANT_BASENAME_MATCH define.
 * 08-Aug-1997 <Martin.Kraemer@Mch.SNI.De>
 * However, Alexei replied giving some reasons to add it anyway:
 * > Oh, by the way, I remembered why having the
 * > extension-stripping-and-matching stuff is a good idea:
 * > If you're using MultiViews, and have a file named foobar.html,
 * > which you refer to as "foobar", and someone tried to access
 * > "Foobar", mod_speling won't find it, because it won't find
 * > anything matching that spelling. With the extension-munging,
 * > it would locate "foobar.html". Not perfect, but I ran into
 * > that problem when I first wrote the module.

 * Okay... we didn't find anything. Now we take out the hard-core
 * power tools. There are several cases here. Someone might have
 * entered a wrong extension (.htm instead of .html or vice
 * versa) or the document could be negotiated. At any rate, now
 * we just compare stuff before the first dot. If it matches, we
 * figure we got us a match. This can result in wrong things if
 * there are files of different content types but the same prefix
 * (e.g. foo.gif and foo.html). This code will pick the first one
 * it finds. Better than a Not Found, though.

/* Wow... we found us a misspelling. Construct a fixed url */

 * Conditions for immediate redirection:
 * a) the first candidate was not found by stripping the suffix
 * AND b) there exists only one candidate OR the best match is not
 * then return a redirection right away.
ref ?
"Fixed spelling: %s to %s from %s" :
"Fixed spelling: %s to %s",
 * Otherwise, a "[300] Multiple Choices" list with the variants is

/* Generate the response text. */
"The document name you requested (<code>";
"</code>) could not be found on this server.\n"
"However, we found documents with names similar "
"to the one you requested.<p>"
"Available documents:\n<ul>\n";
/* The format isn't very neat... */

 * when we have printed the "close matches" and there are
 * more "distant matches" (matched by stripping the suffix),
 * then we insert an additional separator text to suggest
 * that the user LOOK CLOSELY whether these are really the
"</ul>\nFurthermore, the following related "
"documents were found:\n<ul>\n";
/* If we know there was a referring page, add a note: */
"Please consider informing the owner of the "
"about the broken link.\n";
ref ?
"Spelling fix: %s: %d candidates from %s" :
"Spelling fix: %s: %d candidates",
NULL,
/* merge per-dir config */ NULL,
/* merge server config */
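The fallback pass described in the comments ("we just compare stuff before the first dot") can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `stem_length` and `same_stem` are hypothetical names, and the comparison is made case-insensitive because the quoted example expects a request for "Foobar" to locate "foobar.html".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the part of `name` before its first dot (whole name if none). */
static size_t stem_length(const char *name)
{
    const char *dot = strchr(name, '.');
    return dot ? (size_t)(dot - name) : strlen(name);
}

/* Return 1 if both names share the same stem, ignoring case, else 0. */
static int same_stem(const char *requested, const char *entry)
{
    size_t n, i;

    assert(requested != NULL && entry != NULL);

    n = stem_length(requested);
    if (n != stem_length(entry)) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)requested[i])
            != tolower((unsigned char)entry[i])) {
            return 0;
        }
    }
    return 1;
}
```

With this sketch, `same_stem("Foobar", "foobar.html")` is true, and so is `same_stem("foo.gif", "foo.html")`, which is exactly the ambiguity between files of different content types that the comment warns about.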