APACHE 2.x ROADMAP
==================
Last modified at [$Date$]


WORKS IN PROGRESS
-----------------

    * Source code should follow style guidelines.
      OK, we all agree pretty code is good. Probably best to clean this
      up by hand immediately upon branching a 2.1 tree.
      Status: Justin volunteers to hand-edit the entire source tree ;)

      Justin says:
        Recall when the release plan for 2.0 was written:
            Absolute Enforcement of an "Apache Style" for code.
        Watch this slip into 3.0.

      David says:
        The style guide needs to be reviewed before this can be done.
        http://httpd.apache.org/dev/styleguide.html
        The current file is dated April 20th 1998!

        OtherBill offers:
          It's survived since '98 because it's well done :-) Suggest we
          simply follow whatever is documented in styleguide.html as we
          branch the next tree. Really sort of straightforward: if you
          dislike a bit within that doc, bring it up on the dev@httpd
          list prior to the next branch.

      So Bill sums up ... let's get the code cleaned up in CVS head.
      Remember, it just takes cvs diff -b (that is, --ignore-space-change)
      to see the code changes and ignore that cruft. Get editing Justin :)

    * Revamp the input and output filter syntax to provide for ordering
      of filters created with the Set{Input|Output}Filter and the
      Add{Input|Output}Filter directives. A 'relative to filterx'
      syntax is definitely preferable.

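      A 'relative to filterx' syntax implies keeping each chain as an
      ordered list in which a new filter is placed before or after a
      named one. The C below is only a sketch: filter_entry,
      filter_placement and filter_chain_add() are hypothetical names,
      not part of the httpd filter API, and a real version would have
      to operate on ap_filter_t chains.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ordered filter chain; not the httpd ap_filter_t API. */
typedef struct filter_entry {
    const char *name;
    struct filter_entry *next;
} filter_entry;

typedef enum { FILTER_BEFORE, FILTER_AFTER } filter_placement;

/* Insert `name` before or after the filter called `anchor`.
 * If `anchor` is NULL or not in the chain, the filter is appended.
 * Returns 0 on success, -1 if allocation fails. */
int filter_chain_add(filter_entry **head, const char *name,
                     filter_placement where, const char *anchor)
{
    filter_entry **link = head;
    filter_entry *f;

    assert(head != NULL && name != NULL);
    f = malloc(sizeof(*f));
    if (f == NULL)
        return -1;
    f->name = name;

    /* Advance to the anchor (or to the end when there is no anchor). */
    while (*link != NULL
           && (anchor == NULL || strcmp((*link)->name, anchor) != 0))
        link = &(*link)->next;
    if (*link != NULL && where == FILTER_AFTER)
        link = &(*link)->next;

    f->next = *link;
    *link = f;
    return 0;
}

/* Render the chain as "a,b,c" into buf, truncating if it is too small. */
void filter_chain_format(const filter_entry *f, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (; f != NULL; f = f->next) {
        if (buf[0] != '\0')
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, f->name, len - strlen(buf) - 1);
    }
}

void filter_chain_free(filter_entry *f)
{
    while (f != NULL) {
        filter_entry *next = f->next;
        free(f);
        f = next;
    }
}
```

      Appending when the anchor is missing is just one possible policy;
      rejecting the directive at config time would arguably be friendlier.
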
    * Platforms that do not support fork (primarily Win32 and AS/400):
      Architect start-up code that avoids initializing all the modules
      in the parent process on these platforms.

      . Better yet, not only inform the startup of which phase it's in,
        but allow the parent 'process' to initialize shared memory, etc.,
        and create a module-by-module stream to pass to the child, so the
        parent can actually arbitrate the important stuff.

    * Replace stat [deferred open] with open/fstat in directory_walk.
      Justin, Ian, OtherBill all interested in this. Implies setting up
      the apr_file_t member in request_rec, having all modules use
      that file, and allowing the cleanup to close it [if it isn't a
      shared, cached file handle.]

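      The gain from open/fstat is that the file is looked up once and its
      metadata comes from the same descriptor that will later be read,
      so the two can never refer to different files, and the descriptor
      can be kept on the request instead of being reopened. A minimal
      POSIX sketch; open_and_stat() is a hypothetical helper, and httpd
      itself would use the APR equivalents (apr_file_open() followed by
      apr_file_info_get()).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only and describe it from the
 * open descriptor. Returns the descriptor (>= 0) on success, or -1 with
 * errno set. The caller owns the descriptor and must close() it. */
int open_and_stat(const char *path, struct stat *st)
{
    int fd;

    assert(path != NULL && st != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* fstat() reports on exactly the file we opened; a stat() followed
     * by a later open() could see two different files if the path is
     * replaced in between. */
    if (fstat(fd, st) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```
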
    * The Async Apache Server implemented in terms of APR.
      [Bill Stoddard's pet project.]
      Message-ID: <008301c17d42$9b446970$01000100@sashimi> (dev@apr)

        OtherBill notes that this can proceed in two parts:

           Async accept, setup, and tear-down of the request,
           e.g. dealing with the incoming request headers, prior to
           dispatching the request to a thread for processing.
           This doesn't need to wait for a 2.x/3.0 bump.

           Async delegation of the entire request processing chain.
           Too many handlers use stack storage and presume it is
           available for the life of the request, so a complete
           async implementation would need to wait for the 3.0 release.

        Brian notes that async writes will provide a bigger
        scalability win than async reads for most servers.
        We may want to try a hybrid sync-read/async-write MPM
        as a next step. This should be relatively easy to
        build: start with the current worker or leader/followers
        model, but hand off each response brigade to a "completion
        thread" that multiplexes writes on many connections, so
        that the worker thread doesn't have to wait around for
        the sendfile to complete.

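      The "completion thread" can be pictured as a loop that owns many
      pending responses and uses poll() to write to whichever connections
      are ready, instead of each worker blocking in write() or sendfile().
      A minimal single-threaded POSIX sketch, assuming non-blocking
      descriptors and in-memory buffers; pending_write and
      completion_loop() are hypothetical names, and a real MPM would
      also need the hand-off queue from the workers, sendfile support,
      and per-connection error handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical: one queued response waiting to be written. */
typedef struct {
    int fd;             /* non-blocking socket (or pipe) */
    const char *buf;    /* response bytes handed off by a worker */
    size_t len;         /* total bytes to write */
    size_t off;         /* bytes written so far */
} pending_write;

#define MAX_PENDING 64

/* Multiplex writes over all pending responses until every one is fully
 * written. Returns 0 on success, -1 on a poll() or write() error. */
int completion_loop(pending_write *pw, size_t n)
{
    struct pollfd pfd[MAX_PENDING];
    size_t idx[MAX_PENDING];

    assert(pw != NULL || n == 0);
    if (n > MAX_PENDING)
        return -1;

    for (;;) {
        nfds_t active = 0;

        /* Watch only the connections that still have data to send. */
        for (size_t i = 0; i < n; i++) {
            if (pw[i].off < pw[i].len) {
                pfd[active].fd = pw[i].fd;
                pfd[active].events = POLLOUT;
                pfd[active].revents = 0;
                idx[active++] = i;
            }
        }
        if (active == 0)
            return 0;                  /* everything has been sent */

        if (poll(pfd, active, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        for (nfds_t k = 0; k < active; k++) {
            pending_write *p = &pw[idx[k]];
            ssize_t w;

            if (pfd[k].revents & (POLLERR | POLLHUP | POLLNVAL))
                return -1;
            if (!(pfd[k].revents & POLLOUT))
                continue;
            /* Write as much as this connection accepts right now. */
            w = write(p->fd, p->buf + p->off, p->len - p->off);
            if (w < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                    continue;
                return -1;
            }
            p->off += (size_t)w;
        }
    }
}
```
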
    * Add a string "class" that combines a char* with a length
      and a reference count. This will help reduce the number
      of strlen and strdup operations during request processing.
      Tracking both the length and the allocated size will save us
      a ton of the reallocation we do today when manipulating strings.

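      One way to picture the proposed string "class" is a single
      allocation holding the length, the capacity, a reference count
      and the bytes. A minimal malloc()-based sketch; rstr and its
      functions are hypothetical (neither httpd nor APR provides this
      type), and a pool-based variant would tie the lifetime to an
      apr_pool_t instead of counting references.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string: the stored length replaces strlen(),
 * and sharing via rstr_ref() replaces strdup(). */
typedef struct {
    size_t len;      /* bytes in use, excluding the trailing NUL */
    size_t cap;      /* bytes available for data, excluding the NUL */
    unsigned refs;   /* number of owners sharing this string */
    char data[];     /* C99 flexible array member, NUL-terminated */
} rstr;

rstr *rstr_make(const char *s, size_t len, size_t cap)
{
    rstr *r;

    if (cap < len)
        cap = len;
    r = malloc(sizeof(*r) + cap + 1);
    if (r == NULL)
        return NULL;
    memcpy(r->data, s, len);
    r->data[len] = '\0';
    r->len = len;
    r->cap = cap;
    r->refs = 1;
    return r;
}

/* Share the string instead of copying it. */
rstr *rstr_ref(rstr *r)
{
    r->refs++;
    return r;
}

void rstr_unref(rstr *r)
{
    assert(r->refs > 0);
    if (--r->refs == 0)
        free(r);
}

/* Append in place when we are the only owner and there is spare
 * capacity, so no reallocation happens. Returns 0 on success, or -1 if
 * the caller must build a new string instead. */
int rstr_append(rstr *r, const char *s, size_t len)
{
    if (r->refs != 1 || r->len + len > r->cap)
        return -1;
    memcpy(r->data + r->len, s, len);
    r->len += len;
    r->data[r->len] = '\0';
    return 0;
}
```

      Refusing an in-place append once the string is shared keeps one
      owner's edit from showing up in another owner's copy.
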
        OtherBill asks whether this is really an APR issue rather than
        an HTTPD issue.

        Brian notes that the performance optimization work in 2.0
        has all but eliminated the original motivation for this
        idea. The httpd doesn't spend that much time in strlen
        calls any more.


MAKING APACHE REPOSITORY-AGNOSTIC
(or: remove knowledge of the filesystem)

[ 2002/10/01: discussion in progress on items below; this isn't
  planned yet ]

    * A dav_resource-like concept for an HTTP resource ("ap_resource").

    * r->filename, r->canonical_filename, and r->finfo need to
      disappear. All users need to use new APIs on the ap_resource
      object.

      (Backwards compat: today, when this occurs with mod_dav and a
      custom backend, the above items refer to the topmost directory
      mapped by a location, e.g. docroot.)

      Need to preserve a 'filename'-like string for mime-by-name
      sorts of operations. But this only needs to be the name itself
      and not a full path.

      Justin: Can we leverage the path info, or do we not trust the
              user?

      gstein: well, it isn't the "path info", but the actual URI of
              the resource. And of course we trust the user... that is
              the resource they requested.

              dav_resource->uri is the field you want. path_info might
              still exist, but that portion might be related to the
              CGI concept of "path translated" or some other further
              resolution.

              To continue, I would suggest that "path translated" and
              having *any* path info is Badness. It means that you did
              not fully resolve a resource for the given URI. The
              "abs_path" in a URI identifies a resource, and that
              should get fully resolved. None of this "resolve to
              <here> and then we have a magical second resolution
              (inside the CGI script)" or somesuch.

      Justin: Well, let's consider mod_mbox for a second. It is sort of
              a virtual filesystem in its own right, as it introduces
              its own notion of a URI space, but it is intrinsically
              tied to the filesystem to do the lookups. But, for the
              portion that isn't resolved on the file system, it has
              its own addressing scheme. Do we need the ability to
              layer resolution?

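      Following gstein's point that the resource's URI is the field to
      use, the 'filename'-like string for mime-by-name lookups could be
      just the last segment of that URI, with any query string dropped.
      A minimal sketch; resource_name_from_uri() is a hypothetical
      helper that assumes an already-decoded abs_path and does not
      handle percent-encoding or path parameters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: copy the last path segment of `uri` (e.g. "/docs/a.html"
 * -> "a.html") into `out`, dropping any "?query" part. Returns the length
 * of the name, or -1 if it does not fit in `outlen` bytes. An empty name
 * (URI ending in '/') means the resource is a collection. */
int resource_name_from_uri(const char *uri, char *out, size_t outlen)
{
    size_t end, start;

    assert(uri != NULL && out != NULL && outlen > 0);
    end = strcspn(uri, "?");           /* stop before the query */
    start = end;
    while (start > 0 && uri[start - 1] != '/')
        start--;
    if (end - start >= outlen)
        return -1;
    memcpy(out, uri + start, end - start);
    out[end - start] = '\0';
    return (int)(end - start);
}
```
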
    * The translate_name hook goes away.

      Wrowe altogether disagrees. translate_name today even operates
      on URIs ... this mechanism needs to be preserved.

    * The doc for map_to_storage is totally opaque to me. It has
      something to do with filesystems, but it also talks about
      security and per_dir_config and other stuff. I presume something
      needs to happen there; at the very least, better doc.

      Wrowe agrees and will write it up.

    * The directory_walk concept disappears. All configuration is
      tagged to Locations. The "mod_filesystem" module might have some
      internal concept of the same config appearing in multiple
      places, but that is handled internally rather than by Apache
      core.

      Wrowe suggests this is wrong: instead, it's private to filesystem
      requests, and is already invoked from map_to_storage, not the core
      handler. <Directory > and <Files > blocks are preserved as-is,
      but <Directory > sections become specific to the filesystem handler
      alone. Because alternate filesystem schemes could be loaded, this
      should be exposed, from the core, for other file-based stores to
      share. Consider an archive store where the layers become

        <Directory path> -> <Archive store> -> <File name>

      Justin: How do we map Directory entries to Locations?

    * The "Location tree" is an in-memory representation of the URL
      namespace. Nodes of the tree have configuration specific to that
      location in the namespace.

      Something like:

        typedef struct {
            const char *name;     /* name of this node relative to parent */
            struct ap_conf_vector_t *locn_config;
            apr_hash_t *children; /* NULL if no child configs */
        } ap_locn_node;

      The following config:

        <Location /server-status>
            SetHandler server-status
            Order deny,allow
            Deny from all
            Allow from 127.0.0.1
        </Location>

      creates a node with name=="server-status", and the node is a
      child of the "/" node. (Hmm: node->name is redundant with the
      hash key; maybe drop node->name.)

      In the config vector, mod_access has stored its Order, Deny, and
      Allow configs. mod_core has stored the SetHandler.

      During the Location walk, we merge the config vectors normally.

      Note that an Alias simply associates a filesystem path (in
      mod_filesystem) with that Location in the tree. Merging
      continues with child locations, but a merge is never done
      through filesystem locations. Config on a specific subdir needs
      to be mapped back into the corresponding point in the Location
      tree for proper merging.

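      The Location walk described above could be sketched as descending
      the tree one URI segment at a time, merging each node's config over
      its parent's. To stay self-contained, the sketch assumes a tiny
      config of just a handler name and an allow-from address (standing
      in for the per-module config vector) and a child array instead of
      apr_hash_t; locn_node, locn_conf, merge_conf() and location_walk()
      are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a merged config vector. */
typedef struct {
    const char *handler;      /* from SetHandler; NULL = not set here */
    const char *allow_from;   /* from Allow from; NULL = not set here */
} locn_conf;

/* Hypothetical Location tree node (children in an array, not a hash). */
typedef struct locn_node {
    const char *name;              /* segment relative to the parent */
    locn_conf conf;
    struct locn_node **children;   /* NULL-terminated array, or NULL */
} locn_node;

/* Child settings override the parent; unset fields are inherited. */
static locn_conf merge_conf(locn_conf base, locn_conf add)
{
    if (add.handler != NULL)
        base.handler = add.handler;
    if (add.allow_from != NULL)
        base.allow_from = add.allow_from;
    return base;
}

static const locn_node *find_child(const locn_node *n,
                                   const char *seg, size_t len)
{
    for (locn_node **c = n->children; c != NULL && *c != NULL; c++)
        if (strlen((*c)->name) == len && strncmp((*c)->name, seg, len) == 0)
            return *c;
    return NULL;
}

/* Walk from the root ("/") along the segments of `uri`, merging as we go.
 * The walk stops at the first segment with no node; the remainder is
 * left to whatever backend (e.g. mod_filesystem) is mapped there. */
locn_conf location_walk(const locn_node *root, const char *uri)
{
    const locn_node *n;
    const char *p;
    locn_conf conf;

    assert(root != NULL && uri != NULL);
    n = root;
    p = uri;
    conf = root->conf;
    while (*p != '\0') {
        size_t len;
        const locn_node *child;

        while (*p == '/')
            p++;
        len = strcspn(p, "/");
        if (len == 0)
            break;                     /* only slashes were left */
        child = find_child(n, p, len);
        if (child == NULL)
            break;                     /* rest of the URI is not in the tree */
        conf = merge_conf(conf, child->conf);
        n = child;
        p += len;
    }
    return conf;
}
```
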
    * Config is parsed into a tree, as we did for the 2.0 timeframe,
      but that tree is just a representation of the config (for
      multiple runs and for in-memory manipulation and usage). It is
      unrelated to the "Location tree".

    * Calls to apr_file_io functions generally need to be replaced
      with operations against the ap_resource. For example, rather
      than calling apr_dir_open/read/close(), a caller uses
      resource->repos->get_children() or somesuch.

      Note that things like mod_dir, mod_autoindex, and mod_negotiation
      need to be converted to use these mechanisms so that their
      functions will work on logical repositories rather than just
      filesystems.

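      The resource->repos->get_children() idea amounts to giving each
      repository a table of function pointers, so that a module like
      mod_autoindex can list children without knowing whether they come
      from a directory, a DAV backend, or an archive. A minimal sketch;
      resource, repository, mem_repos and list_resource() are
      hypothetical, trimmed-down names (not mod_dav's dav_resource), and
      the in-memory backend exists only to make the example runnable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct resource resource;

/* Hypothetical repository vtable: each backend supplies its own calls. */
typedef struct {
    const char *name;
    /* Call `fn` once per child of `res`; stop early if it returns nonzero.
     * Returns the number of children visited. */
    int (*get_children)(const resource *res,
                        int (*fn)(void *, const char *), void *baton);
} repository;

struct resource {
    const char *uri;
    const repository *repos;
    void *backend;            /* repository-private data */
};

/* A toy in-memory backend: children are a NULL-terminated name list. */
static int mem_get_children(const resource *res,
                            int (*fn)(void *, const char *), void *baton)
{
    const char *const *names = res->backend;
    int count = 0;

    for (; *names != NULL; names++) {
        count++;
        if (fn(baton, *names) != 0)
            break;
    }
    return count;
}

const repository mem_repos = { "memory", mem_get_children };

/* What an autoindex-like caller might do, with no apr_dir_* calls:
 * join the child names into `buf` as a comma-separated list. */
typedef struct { char *buf; size_t len; } listing;

static int add_to_listing(void *baton, const char *child)
{
    listing *l = baton;
    size_t used = strlen(l->buf);

    if (used + strlen(child) + 2 > l->len)
        return 1;                      /* buffer full: stop iterating */
    if (used > 0)
        strcat(l->buf, ",");
    strcat(l->buf, child);
    return 0;
}

int list_resource(const resource *res, char *buf, size_t len)
{
    listing l = { buf, len };

    assert(res != NULL && res->repos != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    return res->repos->get_children(res, add_to_listing, &l);
}
```
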
    * How do we handle CGI scripts? Especially when the resource may
      not be backed by a file? Ideally, we should be able to come up
      with some mechanism to allow CGIs to work in a
      repository-independent manner.

      - Writing the virtual data as a file and then executing it?
      - Can a shell be executed in a streamy manner? (Portably?)
      - Have an 'execute_resource' hook/func that allows the
        repository to choose its own manner, be it exec() or whatever.
        - Won't this approach lead to duplication of code? Helper fns?

      gstein: PHP, Perl, and Python scripts are nominally executed by
              a filter inserted by mod_php/perl/python. I'd suggest
              that shell/batch scripts are similar.

              But to ask further: what if it is an executable
              *program* rather than just a script? Do we yank that out
              of the repository, drop it onto the filesystem, and run
              it? eeewwwww...

              I'll vote -0.9 for CGIs as a filter. Keep 'em handlers.

      Justin: So, do we give up executing CGIs from virtual repositories?
              That seems like a sad tradeoff to make. I'd like to have
              my CGI scripts under DAV (SVN) control.

    * How do we handle overlaying of Location and Directory entries?
      Right now, we have a problem when /cgi-bin/ is ScriptAlias'd and
      mod_dav has control over /. Some people believe that /cgi-bin/
      shouldn't be under DAV control, while others do believe it
      should be. What's the right strategy?