mod_unique_id.xml revision 53bae66d3dc14a667e14a451f7bc65a893dd450f
<!DOCTYPE modulesynopsis SYSTEM "/style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="/style/manual.en.xsl"?>
<description>Provides an environment variable with a unique
identifier for each request</description>
 <p>This module provides a magic token for each request which is
 guaranteed to be unique across "all" requests under very
 specific conditions. The unique identifier is even unique
 across multiple machines in a properly configured cluster of
 machines. The environment variable <code>UNIQUE_ID</code> is
 set to the identifier for each request. Unique identifiers are
 useful for various reasons which are beyond the scope of this
 document.</p>
 <p>First a brief recap of how the Apache server works on Unix
 machines. This feature currently isn't supported on Windows NT.
 On Unix machines, Apache creates several child processes, and
 each child handles requests one at a time. Each child can serve
 multiple requests in its lifetime. For the purpose of this
 discussion, the children don't share any data with each other.
 We'll refer to the children as httpd processes.</p>
 <p>Your website has one or more machines under your
 administrative control; together we'll call them a cluster of
 machines. Each machine may run multiple instances of Apache.
 All of these collectively are considered "the universe", and
 under certain assumptions we'll show that in this universe we
 can generate unique identifiers for each request without
 extensive communication between machines in the cluster.</p>
 <p>The machines in your cluster should satisfy the following
 requirements. (Even if you have only one machine, you should
 synchronize its clock with NTP.)</p>
 <ul>
 <li>The machines' times are synchronized via NTP or another
 network time protocol.</li>
 <li>The machines' hostnames all differ, such that the module
 can look up its own hostname and receive a different IP
 address on each machine in the cluster.</li>
 </ul>
 <p>As far as operating system assumptions go, we assume that
 pids (process ids) fit in 32 bits. If the operating system uses
 more than 32 bits for a pid, the fix is trivial but must be
 made in the code.</p>
 <p>Given those assumptions, at a single point in time we can
 distinguish any httpd process on any machine in the cluster from
 all other httpd processes. The machine's IP address and the pid
 of the httpd process are sufficient to do this. So in order to
 generate unique identifiers for requests, we need only
 distinguish between different points in time.</p>
 <p>To distinguish time we will use a Unix timestamp (seconds
 since January 1, 1970 UTC) and a 16-bit counter. The timestamp
 has only one-second granularity, so the counter is used to
 represent up to 65536 values during a single second. The
 quadruple <em>( ip_addr, pid, time_stamp, counter )</em> is
 sufficient to enumerate 65536 requests per second per httpd
 process. There are, however, issues with pid reuse over time,
 and the counter is used to alleviate them.</p>
 <p>When an httpd child is created, the counter is initialized
 with ( current microseconds divided by 10 ) modulo 65536 (this
 formula was chosen to eliminate some variance problems with the
 low-order bits of the microsecond timers on some systems). When
 a unique identifier is generated, the time stamp used is the
 time the request arrived at the web server. The counter is
 incremented every time an identifier is generated (and allowed
 to roll over).</p>
 <p>The kernel generates a pid for each process as it forks the
 process, and pids are allowed to roll over (they're 16 bits on
 many Unixes, but newer systems have expanded to 32 bits). So
 over time the same pid will be reused. However, unless it is
 reused within the same second, it does not destroy the
 uniqueness of our quadruple. That is, we assume the system does
 not spawn 65536 processes in a one-second interval (it may even
 be 32768 processes on some Unixes, but even this isn't likely
 to happen).</p>
 <p>Suppose that time repeats itself for some reason. That is,
 suppose that the system's clock is screwed up and it revisits a
 past time (or it is too far forward, is reset correctly, and
 then revisits the future time). In this case we can easily show
 that the same pid and time stamp pair can be reused. The choice
 of initializer for the counter is intended to help defeat this.
 Note that we really want a random number to initialize the
 counter, but there aren't any readily available sources of
 random numbers on most systems (<em>i.e.</em>, you can't use
 rand() because you need to seed the generator, and can't seed it
 with the time because time, at least at one-second resolution,
 has repeated itself). This is not a perfect defense.</p>
 <p>How good a defense is it? Suppose that one of your machines
 serves at most 500 requests per second (which is a very
 reasonable upper bound at this writing, because systems
 generally do more than just shovel out static files). To do
 that, it will require a number of children that depends on how
 many concurrent clients you have. But we'll be pessimistic and
 suppose that a single child is able to serve 500 requests per
 second. For a given starting counter value, 999 (about 1000) of
 the 65536 possible starting values for a second sequence of 500
 requests make the two sequences overlap. So there is roughly a
 1.5% (999/65536) chance that if time (at one-second resolution)
 repeats itself, this child will repeat a counter value, and
 uniqueness will be broken. This was a very pessimistic example,
 and with real-world values it's even less likely to occur. If
 your system is such that it's still likely to occur, then
 perhaps you should make the counter 32 bits (by editing the
 code).</p>
 <p>You may be concerned about the clock being "set back" when
 daylight saving time ends. However, this isn't an issue because
 the times used here are UTC, which "always" go forward. Note
 that x86-based Unixes may need proper configuration for this to
 be true: they should be configured to assume that the