APACHE 2.x ROADMAP
==================
Last modified at [$Date$]


WORKS IN PROGRESS
-----------------

    * Source code should follow style guidelines.
      OK, we all agree pretty code is good. Probably best to clean this
      up by hand immediately upon branching a 2.1 tree.
      Status: Justin volunteers to hand-edit the entire source tree ;)

      Justin says:
        Recall when the release plan for 2.0 was written:
            Absolute Enforcement of an "Apache Style" for code.
        Watch this slip into 3.0.

      David says:
        The style guide needs to be reviewed before this can be done.
        http://httpd.apache.org/dev/styleguide.html
        The current file is dated April 20th 1998!

      OtherBill offers:
        It's survived since '98 because it's well done :-) Suggest we
        simply follow whatever is documented in styleguide.html as we
        branch the next tree. Really sort of straightforward; if you
        dislike a bit within that doc, bring it up on the dev@httpd
        list prior to the next branch.

      So Bill sums up ... let's get the code cleaned up in CVS head.
      Remember, it just takes cvs diff -b (that is, --ignore-space-change)
      to see the code changes and ignore that cruft. Get editing, Justin :)

    * Replace stat [deferred open] with open/fstat in directory_walk.
      Justin, Ian, OtherBill all interested in this. Implies setting up
      the apr_file_t member in request_rec, having all modules use
      that file, and allowing the cleanup to close it [if it isn't a
      shared, cached file handle.]

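      A minimal sketch of the open-then-fstat idea, using plain POSIX
      calls as a stand-in for APR (in the real tree this would
      presumably be apr_file_open() followed by apr_file_info_get()).
      open_and_stat() is a hypothetical helper, not an existing httpd
      function. The point is only that the metadata comes from the
      descriptor that was actually opened, so the path is resolved
      once instead of twice:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open first, then fstat() the descriptor, so the
 * metadata describes exactly the file that was opened.
 * Returns the open descriptor, or -1 on failure. */
int open_and_stat(const char *path, struct stat *st)
{
    int fd;

    assert(path != NULL && st != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

      The caller would keep the descriptor for the life of the request
      and let the cleanup close it, unless the handle is a shared,
      cached one.
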
    * The Async Apache Server implemented in terms of APR.
      [Bill Stoddard's pet project.]
      Message-ID: <008301c17d42$9b446970$01000100@sashimi> (dev@apr)

      OtherBill notes that this can proceed in two parts...

         Async accept, setup, and tear-down of the request
           e.g. dealing with the incoming request headers, prior to
           dispatching the request to a thread for processing.
           This doesn't need to wait for a 2.x/3.0 bump.

         Async delegation of the entire request processing chain
           Too many handlers use stack storage and presume it is
           available for the life of the request, so a complete
           async implementation would need to wait for a 3.0 release.

      Brian notes that async writes will provide a bigger
      scalability win than async reads for most servers.
      We may want to try a hybrid sync-read/async-write MPM
      as a next step. This should be relatively easy to
      build: start with the current worker or leader/followers
      model, but hand off each response brigade to a "completion
      thread" that multiplexes writes on many connections, so
      that the worker thread doesn't have to wait around for
      the sendfile to complete.

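      A rough sketch of what one pass of such a completion thread
      might look like, assuming POSIX poll() and plain write() in place
      of APR, brigades, and sendfile. struct pending_write and
      completion_pass() are hypothetical names, and the thread-safe
      queue a worker would use to hand responses over is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_PENDING 64

/* One response handed off by a worker (stand-in for a brigade). */
struct pending_write {
    int fd;            /* client connection */
    const char *buf;   /* response bytes */
    size_t len;        /* total bytes to send */
    size_t off;        /* bytes already written */
    int failed;        /* set when the connection is unusable */
};

/* One pass of the completion loop: poll every unfinished connection for
 * writability and write to each one that is ready.
 * Returns the number of responses still unfinished, or -1 if poll fails. */
int completion_pass(struct pending_write *w, size_t n, int timeout_ms)
{
    struct pollfd pfd[MAX_PENDING];
    size_t idx[MAX_PENDING];
    size_t i, active = 0;
    int remaining = 0;

    assert(w != NULL && n <= MAX_PENDING);

    for (i = 0; i < n; i++) {
        if (!w[i].failed && w[i].off < w[i].len) {
            pfd[active].fd = w[i].fd;
            pfd[active].events = POLLOUT;
            pfd[active].revents = 0;
            idx[active++] = i;
        }
    }
    if (active == 0)
        return 0;
    if (poll(pfd, (nfds_t)active, timeout_ms) < 0)
        return -1;

    for (i = 0; i < active; i++) {
        struct pending_write *p = &w[idx[i]];

        if (pfd[i].revents & (POLLERR | POLLHUP | POLLNVAL)) {
            p->failed = 1;
        } else if (pfd[i].revents & POLLOUT) {
            ssize_t r = write(p->fd, p->buf + p->off, p->len - p->off);
            if (r > 0)
                p->off += (size_t)r;
            else if (r < 0 && errno != EAGAIN && errno != EINTR)
                p->failed = 1;
        }
        if (!p->failed && p->off < p->len)
            remaining++;
    }
    return remaining;
}
```

      The completion thread would call this in a loop; a worker is
      free to pick up the next request as soon as it has queued its
      response.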

MAKING APACHE REPOSITORY-AGNOSTIC
(or: remove knowledge of the filesystem)

[ 2002/10/01: discussion in progress on items below; this isn't
  planned yet ]

    * dav_resource concept for an HTTP resource ("ap_resource")

    * r->filename, r->canonical_filename, r->finfo need to
      disappear. All users need to use new APIs on the ap_resource
      object.

      (backwards compat: today, when this occurs with mod_dav and a
       custom backend, the above items refer to the topmost directory
       mapped by a location; e.g. docroot)

      Need to preserve a 'filename'-like string for mime-by-name
      sorts of operations. But this only needs to be the name itself
      and not a full path.

      Justin: Can we leverage the path info, or do we not trust the
              user?

      gstein: well, it isn't the "path info", but the actual URI of
              the resource. And of course we trust the user... that is
              the resource they requested.

              dav_resource->uri is the field you want. path_info might
              still exist, but that portion might be related to the
              CGI concept of "path translated" or some other further
              resolution.

              To continue, I would suggest that "path translated" and
              having *any* path info is Badness. It means that you did
              not fully resolve a resource for the given URI. The
              "abs_path" in a URI identifies a resource, and that
              should get fully resolved. None of this "resolve to
              <here> and then we have a magical second resolution
              (inside the CGI script)" or somesuch.

      Justin: Well, let's consider mod_mbox for a second. It is sort of
              a virtual filesystem in its own right - as it introduces
              its own notion of a URI space, but it is intrinsically
              tied to the filesystem to do the lookups. But, for the
              portion that isn't resolved on the file system, it has
              its own addressing scheme. Do we need the ability to
              layer resolution?

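      On the 'filename'-like string for mime-by-name operations noted
      above: a minimal sketch, assuming the resource URI path is all
      we have and only its last segment matters. resource_leaf_name()
      and leaf_extension() are hypothetical helpers; this only takes
      the last extension and does not attempt mod_mime's handling of
      multiple extensions:

```c
#include <assert.h>
#include <string.h>

/* Return the last segment of a URI path ("/a/b/index.html" -> "index.html").
 * The result points into uri_path; nothing is allocated. */
const char *resource_leaf_name(const char *uri_path)
{
    const char *slash;

    assert(uri_path != NULL);
    slash = strrchr(uri_path, '/');
    return slash != NULL ? slash + 1 : uri_path;
}

/* Return the text after the last dot of a leaf name, or "" if none.
 * A leading dot (".htaccess") marks a hidden name, not an extension. */
const char *leaf_extension(const char *leaf)
{
    const char *dot;

    assert(leaf != NULL);
    dot = strrchr(leaf, '.');
    return (dot != NULL && dot != leaf) ? dot + 1 : "";
}
```

      No full path and no filesystem lookup is involved, so the same
      helpers would work for a resource that lives in a non-file
      repository.
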
    * The translate_name hook goes away

      Wrowe altogether disagrees. translate_name today even operates
      on URIs ... this mechanism needs to be preserved.

    * The doc for map_to_storage is totally opaque to me. It has
      something to do with filesystems, but it also talks about
      security and per_dir_config and other stuff. I presume something
      needs to happen there -- at least better doc.

      Wrowe agrees and will write it up.

    * The directory_walk concept disappears. All configuration is
      tagged to Locations. The "mod_filesystem" module might have some
      internal concept of the same config appearing in multiple
      places, but that is handled internally rather than by Apache
      core.

      Wrowe suggests this is wrong; instead it's private to filesystem
      requests, and is already invoked from map_to_storage, not the core
      handler. <Directory > and <Files > blocks are preserved as-is,
      but <Directory > sections become specific to the filesystem handler
      alone. Because alternate filesystem schemes could be loaded, this
      should be exposed, from the core, for other file-based stores to
      share. Consider an archive store where the layers become
      <Directory path> -> <Archive store> -> <File name>

      Justin: How do we map Directory entries to Locations?

    * The "Location tree" is an in-memory representation of the URL
      namespace. Nodes of the tree have configuration specific to that
      location in the namespace.

      Something like:

      typedef struct {
          const char *name;  /* name of this node relative to parent */

          struct ap_conf_vector_t *locn_config;

          apr_hash_t *children; /* NULL if no child configs */
      } ap_locn_node;

      The following config:

      <Location /server-status>
          SetHandler server-status
          Order deny,allow
          Deny from all
          Allow from 127.0.0.1
      </Location>

      Creates a node with name=="server-status", and the node is a
      child of the "/" node. (hmm. node->name is redundant with the
      hash key; maybe drop node->name)

      In the config vector, mod_access has stored its Order, Deny, and
      Allow configs. mod_core has stored the SetHandler.

      During the Location walk, we merge the config vectors normally.

      Note that an Alias simply associates a filesystem path (in
      mod_filesystem) with that Location in the tree. Merging
      continues with child locations, but a merge is never done
      through filesystem locations. Config on a specific subdir needs
      to be mapped back into the corresponding point in the Location
      tree for proper merging.

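      A self-contained sketch of such a Location walk. To keep it
      compilable without httpd or APR it makes two loudly simplifying
      assumptions: a two-field locn_config stands in for
      ap_conf_vector_t, and a NULL-terminated child array stands in
      for the apr_hash_t. All names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a per-location config vector. */
typedef struct {
    const char *handler;   /* NULL means "inherit from parent" */
    int deny_all;          /* -1 means "inherit from parent" */
} locn_config;

typedef struct locn_node {
    const char *name;             /* name relative to parent */
    locn_config config;
    struct locn_node **children;  /* NULL if no child configs */
} locn_node;

/* Child settings override inherited ones; unset fields are inherited. */
static void merge_config(locn_config *base, const locn_config *add)
{
    if (add->handler != NULL)
        base->handler = add->handler;
    if (add->deny_all != -1)
        base->deny_all = add->deny_all;
}

static const locn_node *find_child(const locn_node *node,
                                   const char *seg, size_t len)
{
    size_t i;

    if (node->children == NULL)
        return NULL;
    for (i = 0; node->children[i] != NULL; i++) {
        const char *n = node->children[i]->name;
        if (strlen(n) == len && strncmp(n, seg, len) == 0)
            return node->children[i];
    }
    return NULL;
}

/* Walk the URL path from the root node, merging each matching node's
 * config over what was inherited; stop at the first unmatched segment. */
locn_config location_walk(const locn_node *root, const char *path)
{
    locn_config merged;
    const locn_node *node = root;

    assert(root != NULL && path != NULL);
    merged = root->config;

    while (*path != '\0') {
        size_t len;

        while (*path == '/')
            path++;                    /* skip separators */
        len = strcspn(path, "/");
        if (len == 0)
            break;
        node = find_child(node, path, len);
        if (node == NULL)
            break;                     /* no more specific config */
        merge_config(&merged, &node->config);
        path += len;
    }
    return merged;
}
```

      With the /server-status config above, a request for
      /server-status/anything would pick up the handler and access
      settings of that node, while any other URL keeps only what the
      "/" node provides.
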
    * Config is parsed into a tree, as we did for the 2.0 timeframe,
      but that tree is just a representation of the config (for
      multiple runs and for in-memory manipulation and usage). It is
      unrelated to the "Location tree".

    * Calls to apr_file_io functions generally need to be replaced
      with operations against the ap_resource. For example, rather
      than calling apr_dir_open/read/close(), a caller uses
      resource->repos->get_children() or somesuch.

      Note that things like mod_dir, mod_autoindex, and mod_negotiation
      need to be converted to use these mechanisms so that their
      functions will work on logical repositories rather than just
      filesystems.

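      One way the resource->repos->get_children() shape could look,
      as a sketch only. The struct layouts, the "memory" backend, and
      list_children() are all hypothetical; only the get_children name
      comes from the text above:

```c
#include <assert.h>
#include <stddef.h>

struct resource;

/* Hypothetical repository vtable: each backend fills in its own hooks. */
typedef struct {
    const char *name;
    /* Store up to max child names in out; return how many were stored. */
    size_t (*get_children)(const struct resource *res,
                           const char **out, size_t max);
} repos_hooks;

typedef struct resource {
    const char *uri;            /* URI of the resource */
    const repos_hooks *repos;   /* backend that owns it */
    void *info;                 /* backend-private data */
} resource;

/* Toy in-memory backend: info points at a NULL-terminated name list. */
static size_t mem_get_children(const resource *res,
                               const char **out, size_t max)
{
    const char *const *names = res->info;
    size_t n = 0;

    while (names != NULL && n < max && names[n] != NULL) {
        out[n] = names[n];
        n++;
    }
    return n;
}

const repos_hooks mem_repos = { "memory", mem_get_children };

/* What a mod_autoindex-like caller would do: ask the repository,
 * never the filesystem. */
size_t list_children(const resource *res, const char **out, size_t max)
{
    assert(res != NULL && res->repos != NULL && out != NULL);
    return res->repos->get_children(res, out, max);
}
```

      A filesystem backend would implement the same hook on top of
      apr_dir_open/read/close(), so callers would not need to know
      which kind of repository they are listing.
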
    * How do we handle CGI scripts? Especially when the resource may
      not be backed by a file? Ideally, we should be able to come up
      with some mechanism to allow CGIs to work in a
      repository-independent manner.

      - Writing the virtual data as a file and then executing it?
      - Can a shell be executed in a streamy manner? (Portably?)
      - Have an 'execute_resource' hook/func that allows the
        repository to choose its manner - be it exec() or whatever.
        - Won't this approach lead to duplication of code? Helper fns?

      gstein: PHP, Perl, and Python scripts are nominally executed by
              a filter inserted by mod_php/perl/python. I'd suggest
              that shell/batch scripts are similar.

              But to ask further: what if it is an executable
              *program* rather than just a script? Do we yank that out
              of the repository, drop it onto the filesystem, and run
              it? eeewwwww...

              I'll vote -0.9 for CGIs as a filter. Keep 'em handlers.

      Justin: So, do we give up executing CGIs from virtual repositories?
              That seems like a sad tradeoff to make. I'd like to have
              my CGI scripts under DAV (SVN) control.

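      For reference, the first option in the list above (write the
      virtual data to a file, then execute it) could be sketched as
      follows, assuming POSIX mkstemp() and fchmod(). This is only an
      illustration of that one option, not a recommendation;
      materialize_script() is a hypothetical helper and the exec step
      itself is omitted:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy a resource body that lives only in a repository into a private,
 * owner-executable temp file. path_template must end in "XXXXXX" and is
 * rewritten in place with the real name.
 * Returns 0 on success, -1 on error (no file is left behind). */
int materialize_script(const void *body, size_t len, char *path_template)
{
    const char *p = body;
    int fd;

    assert(body != NULL && path_template != NULL);

    fd = mkstemp(path_template);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            goto fail;
        p += n;
        len -= (size_t)n;
    }
    if (fchmod(fd, S_IRWXU) != 0)
        goto fail;
    if (close(fd) != 0) {
        unlink(path_template);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(path_template);
    return -1;
}
```

      The caller would exec the resulting path and unlink it when the
      request finishes. An 'execute_resource' hook could hide this
      behind the repository, so a filesystem backend can skip the copy
      entirely.
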
    * How do we handle overlaying of Location and Directory entries?
      Right now, we have a problem when /cgi-bin/ is ScriptAlias'd and
      mod_dav has control over /. Some people believe that /cgi-bin/
      shouldn't be under DAV control, while others do believe it
      should be. What's the right strategy?