85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsyncAPACHE 2.x ROADMAP
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync==================
ecd27a5c936b5767a6c08e21e204f2d4340ec926vboxsyncLast modified at [$Date$]
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsyncWORKS IN PROGRESS
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync-----------------
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync * Source code should follow style guidelines.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync OK, we all agree pretty code is good. Probably best to clean this
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync up by hand immediately upon branching a 2.1 tree.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Status: Justin volunteers to hand-edit the entire source tree ;)
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Justin says:
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Recall when the release plan for 2.0 was written:
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Absolute Enforcement of an "Apache Style" for code.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Watch this slip into 3.0.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync David says:
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync The style guide needs to be reviewed before this can be done.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync http://httpd.apache.org/dev/styleguide.html
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync The current file is dated April 20th 1998!
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync OtherBill offers:
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync It's survived since '98 because it's welldone :-) Suggest we
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync simply follow whatever is documented in styleguide.html as we
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync branch the next tree. Really sort of straightforward, if you
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync dislike a bit within that doc, bring it up on the dev@httpd
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync list prior to the next branch.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync So Bill sums up ... let's get the code cleaned up in CVS head.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Remember, it just takes cvs diff -b (that is, --ignore-space-change)
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync to see the code changes and ignore that cruft. Get editing Justin :)
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync * Replace stat [deferred open] with open/fstat in directory_walk.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Justin, Ian, OtherBill all interested in this. Implies setting up
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync the apr_file_t member in request_rec, and having all modules use
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync that file, and allow the cleanup to close it [if it isn't a shared,
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync cached file handle.]
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync * The Async Apache Server implemented in terms of APR.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync [Bill Stoddard's pet project.]
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Message-ID: <008301c17d42$9b446970$01000100@sashimi> (dev@apr)
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync OtherBill notes that this can proceed in two parts...
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync Async accept, setup, and tear-down of the request
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync e.g. dealing with the incoming request headers, prior to
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync dispatching the request to a thread for processing.
85fc7dbf8f928aea2a6fddde85a77950f69284ddvboxsync This doesn't need to wait for a 2.x/3.0 bump.
Async delegation of the entire request processing chain
Too many handlers use stack storage and presume it is
available for the life of the request, so a complete
async implementation would need to happen 3.0 release.
Brian notes that async writes will provide a bigger
scalability win than async reads for most servers.
We may want to try a hybrid sync-read/async-write MPM
as a next step. This should be relatively easy to
build: start with the current worker or leader/followers
model, but hand off each response brigade to a "completion
thread" that multiplexes writes on many connections, so
that the worker thread doesn't have to wait around for
the sendfile to complete.
MAKING APACHE REPOSITORY-AGNOSTIC
(or: remove knowledge of the filesystem)
[ 2002/10/01: discussion in progress on items below; this isn't
planned yet ]
* dav_resource concept for an HTTP resource ("ap_resource")
* r->filename, r->canonical_filename, r->finfo need to
disappear. All users need to use new APIs on the ap_resource
object.
(backwards compat: today, when this occurs with mod_dav and a
custom backend, the above items refer to the topmost directory
mapped by a location; e.g. docroot)
Need to preserve a 'filename'-like string for mime-by-name
sorts of operations. But this only needs to be the name itself
and not a full path.
Justin: Can we leverage the path info, or do we not trust the
user?
gstein: well, it isn't the "path info", but the actual URI of
the resource. And of course we trust the user... that is
the resource they requested.
dav_resource->uri is the field you want. path_info might
still exist, but that portion might be related to the
CGI concept of "path translated" or some other further
resolution.
To continue, I would suggest that "path translated" and
having *any* path info is Badness. It means that you did
not fully resolve a resource for the given URI. The
"abs_path" in a URI identifies a resource, and that
should get fully resolved. None of this "resolve to
<here> and then we have a magical second resolution
(inside the CGI script)" or somesuch.
Justin: Well, let's consider mod_mbox for a second. It is sort of
a virtual filesystem in its own right - as it introduces
it's own notion of a URI space, but it is intrinsically
tied to the filesystem to do the lookups. But, for the
portion that isn't resolved on the file system, it has
its own addressing scheme. Do we need the ability to
layer resolution?
* The translate_name hook goes away
Wrowe altogether disagrees. translate_name today even operates
on URIs ... this mechansim needs to be preserved.
* The doc for map_to_storage is totally opaque to me. It has
something to do with filesystems, but it also talks about
security and per_dir_config and other stuff. I presume something
needs to happen there -- at least better doc.
Wrowe agrees and will write it up.
* The directory_walk concept disappears. All configuration is
tagged to Locations. The "mod_filesystem" module might have some
internal concept of the same config appearing in multiple
places, but that is handled internally rather than by Apache
core.
Wrowe suggests this is wrong, instead it's private to filesystem
requests, and is already invoked from map_to_storage, not the core
handler. <Directory > and <Files > blocks are preserved as-is,
but <Directory > sections become specific to the filesystem handler
alone. Because alternate filesystem schemes could be loaded, this
should be exposed, from the core, for other file-based stores to
share. Consider an archive store where the layers become
<Directory path> -> <Archive store> -> <File name>
Justin: How do we map Directory entries to Locations?
* The "Location tree" is an in-memory representation of the URL
namespace. Nodes of the tree have configuration specific to that
location in the namespace.
Something like:
typedef struct {
const char *name; /* name of this node relative to parent */
struct ap_conf_vector_t *locn_config;
apr_hash_t *children; /* NULL if no child configs */
} ap_locn_node;
The following config:
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
Creates a node with name=="server_status", and the node is a
child of the "/" node. (hmm. node->name is redundant with the
hash key; maybe drop node->name)
In the config vector, mod_access has stored its Order, Deny, and
Allow configs. mod_core has stored the SetHandler.
During the Location walk, we merge the config vectors normally.
Note that an Alias simply associates a filesystem path (in
mod_filesystem) with that Location in the tree. Merging
continues with child locations, but a merge is never done
through filesystem locations. Config on a specific subdir needs
to be mapped back into the corresponding point in the Location
tree for proper merging.
* Config is parsed into a tree, as we did for the 2.0 timeframe,
but that tree is just a representation of the config (for
multiple runs and for in-memory manipulation and usage). It is
unrelated to the "Location tree".
* Calls to apr_file_io functions generally need to be replaced
with operations against the ap_resource. For example, rather
than calling apr_dir_open/read/close(), a caller uses
resource->repos->get_children() or somesuch.
Note that things like mod_dir, mod_autoindex, and mod_negotiation
need to be converted to use these mechanisms so that their
functions will work on logical repositories rather than just
filesystems.
* How do we handle CGI scripts? Especially when the resource may
not be backed by a file? Ideally, we should be able to come up
with some mechanism to allow CGIs to work in a
repository-independent manner.
- Writing the virtual data as a file and then executing it?
- Can a shell be executed in a streamy manner? (Portably?)
- Have an 'execute_resource' hook/func that allows the
repository to choose its manner - be it exec() or whatever.
- Won't this approach lead to duplication of code? Helper fns?
gstein: PHP, Perl, and Python scripts are nominally executed by
a filter inserted by mod_php/perl/python. I'd suggest
that shell/batch scripts are similar.
But to ask further: what if it is an executable
*program* rather than just a script? Do we yank that out
of the repository, drop it onto the filesystem, and run
it? eeewwwww...
I'll vote -0.9 for CGIs as a filter. Keep 'em handlers.
Justin: So, do we give up executing CGIs from virtual repositories?
That seems like a sad tradeoff to make. I'd like to have
my CGI scripts under DAV (SVN) control.
* How do we handle overlaying of Location and Directory entries?
Right now, we have a problem when /cgi-bin/ is ScriptAlias'd and
mod_dav has control over /. Some people believe that /cgi-bin/
shouldn't be under DAV control, while others do believe it
should be. What's the right strategy?