<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "/style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="/style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->

<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<modulesynopsis metafile="mod_unique_id.xml.meta">

<name>mod_unique_id</name>
<description>Provides an environment variable with a unique
identifier for each request</description>
<status>Extension</status>
<sourcefile>mod_unique_id.c</sourcefile>
<identifier>unique_id_module</identifier>

<summary>

    <p>This module provides a magic token for each request that is
    guaranteed to be unique across "all" requests under very
    specific conditions. The unique identifier is even unique
    across multiple machines in a properly configured cluster of
    machines. The environment variable <code>UNIQUE_ID</code> is
    set to the identifier for each request. Unique identifiers are
    useful for various reasons that are beyond the scope of this
    document.</p>
</summary>

<section id="theory">
    <title>Theory</title>

    <p>First, a brief recap of how the Apache server works on Unix
    machines. (This feature currently isn't supported on Windows NT.)
    On Unix machines, Apache creates several child processes; with a
    non-threaded MPM, each child handles requests one at a time. Each
    child can serve multiple requests in its lifetime. For the purpose
    of this discussion, the children don't share any data with each
    other. We'll refer to the children as <dfn>httpd processes</dfn>.</p>

    <p>Your website has one or more machines under your
    administrative control; together we'll call them a cluster of
    machines. Each machine may run multiple instances of
    Apache. All of these collectively are considered "the
    universe", and with certain assumptions we'll show that in this
    universe we can generate unique identifiers for each request
    without extensive communication between machines in the
    cluster.</p>

    <p>The machines in your cluster should satisfy these
    requirements. (Even if you have only one machine, you should
    synchronize its clock with NTP.)</p>

    <ul>
      <li>The machines' times are synchronized via NTP or another
      network time protocol.</li>

      <li>The machines' hostnames all differ, such that the module
      can look up each hostname and receive a different IP address
      for each machine in the cluster.</li>
    </ul>

    <p>As far as operating system assumptions go, we assume that
    pids (process IDs) fit in 32 bits. If the operating system uses
    more than 32 bits for a pid, the fix is trivial but must be
    made in the code.</p>

    <p>Given those assumptions, at a single point in time we can
    distinguish any httpd process on any machine in the cluster from
    all other httpd processes. The machine's IP address and the pid
    of the httpd process are sufficient to do this. An httpd process
    can handle multiple requests simultaneously if you use a
    multi-threaded MPM. To identify threads, we use the thread
    index that Apache httpd uses internally. So in order to
    generate unique identifiers for requests, we need only
    distinguish between different points in time.</p>

    <p>To distinguish time we will use a Unix timestamp (seconds
    since January 1, 1970 UTC) and a 16-bit counter. The timestamp
    has only one-second granularity, so the counter is used to
    represent up to 65536 values during a single second. The
    quadruple <em>( ip_addr, pid, time_stamp, counter )</em> is
    sufficient to enumerate 65536 requests per second per httpd
    process. There are issues, however, with pid reuse over time, and
    the counter is used to alleviate them.</p>
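
    <p>To make the capacity claim concrete, here is a minimal Python sketch that issues identifiers for one httpd process within a single second. It is a model of the idea, not the module's C code. The IP address, pid, time stamp and starting counter are hypothetical values, and <code>ids_in_one_second</code> and <code>all_unique</code> are illustrative helper names. With a 16-bit counter, 65536 requests in the same second yield distinct quadruples. A 65537th request wraps the counter back to its starting value and collides.</p>

```python
# Illustrative model of the ( ip_addr, pid, time_stamp, counter ) quadruple
# for a single httpd process during one second; not taken from mod_unique_id.c.
COUNTER_MODULUS = 65536  # 16-bit counter


def ids_in_one_second(ip_addr, pid, time_stamp, start_counter, n_requests):
    """Return the quadruples issued to n_requests arriving in the same second."""
    ids = []
    counter = start_counter
    for _ in range(n_requests):
        ids.append((ip_addr, pid, time_stamp, counter))
        counter = (counter + 1) % COUNTER_MODULUS  # the counter rolls over
    return ids


def all_unique(ids):
    """True if no quadruple was issued twice."""
    return len(set(ids)) == len(ids)


if __name__ == "__main__":
    # Hypothetical values for one process on one machine.
    ip, pid, stamp, start = "192.0.2.10", 4242, 1000000000, 12345
    print(all_unique(ids_in_one_second(ip, pid, stamp, start, 65536)))  # True
    print(all_unique(ids_in_one_second(ip, pid, stamp, start, 65537)))  # False
```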

    <p>When an httpd child is created, the counter is initialized
    with (current microseconds divided by 10) modulo 65536 (this
    formula was chosen to eliminate some variance problems with the
    low-order bits of the microsecond timers on some systems). When
    a unique identifier is generated, the time stamp used is the
    time the request arrived at the web server. The counter is
    incremented every time an identifier is generated (and allowed
    to roll over).</p>
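
    <p>The counter life cycle described above can be sketched in a few lines of Python. This illustration makes one stated assumption: "current microseconds" is taken to mean the microseconds within the current second (0 to 999999). The helper names <code>initial_counter</code> and <code>next_counter</code> are hypothetical.</p>

```python
import time

COUNTER_MODULUS = 65536  # the counter is 16 bits wide


def initial_counter(microseconds):
    """Seed for a new child: (microseconds // 10) modulo 65536.

    Assumption: `microseconds` is the microsecond part of the current time
    (0 to 999999). Dividing by 10 drops the lowest-order part of the value,
    which the text above says has variance problems on some systems.
    """
    return (microseconds // 10) % COUNTER_MODULUS


def next_counter(counter):
    """Advance the counter once per generated identifier, rolling over."""
    return (counter + 1) % COUNTER_MODULUS


if __name__ == "__main__":
    usec = (time.time_ns() // 1000) % 1000000  # microseconds within this second
    counter = initial_counter(usec)
    print(counter, next_counter(counter))
```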

    <p>The kernel generates a pid for each process as it forks the
    process, and pids are allowed to roll over (they're 16 bits on
    many Unixes, but newer systems have expanded to 32 bits). So
    over time the same pid will be reused. However, unless it is
    reused within the same second, it does not destroy the
    uniqueness of our quadruple. That is, we assume the system does
    not spawn 65536 processes in a one-second interval (it may even
    be 32768 processes on some Unixes, but even this isn't likely
    to happen).</p>

    <p>Suppose that time repeats itself for some reason. That is,
    suppose that the system's clock is wrong and it revisits a
    past time (or it was set too far forward, is reset correctly, and
    then revisits those future times). In this case we can easily show
    that we can get pid and time stamp reuse. The choice of
    initializer for the counter is intended to help defeat this.
    Note that we really want a random number to initialize the
    counter, but most systems have no readily available source of
    random numbers (<em>i.e.</em>, you can't use <code>rand()</code>
    because you need to seed the generator, and you can't seed it with
    the time because time, at least at one-second resolution, has
    repeated itself). This is not a perfect defense.</p>

    <p>How good a defense is it? Suppose that one of your machines
    serves at most 500 requests per second (which is a very
    reasonable upper bound at this writing, because systems
    generally do more than just shovel out static files). The number
    of children required to do that depends on how many concurrent
    clients you have, but we'll be pessimistic and suppose that a
    single child is able to serve all 500 requests per second. Two
    sequences of 500 consecutive counter values overlap whenever
    their starting values are fewer than 500 apart, which is true for
    999 (roughly 1000) of the 65536 possible starting values. So there
    is a 1.5% chance that if time (at one-second resolution) repeats
    itself, this child will repeat a counter value, and uniqueness will
    be broken. This was a very pessimistic example, and with real-world
    values it's even less likely to occur. If your system is such that
    it's still likely to occur, then perhaps you should make the
    counter 32 bits (by editing the code).</p>
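
    <p>The arithmetic above can be checked with a short Python sketch. It counts how many relative starting offsets on the 65536-value counter circle make two runs of 500 consecutive counter values share at least one value. Like the pessimistic example, it assumes a single child serves every request. The function name <code>overlap_probability</code> is hypothetical.</p>

```python
COUNTER_MODULUS = 65536  # number of distinct 16-bit counter values


def overlap_probability(requests_per_second, modulus=COUNTER_MODULUS):
    """Return (hits, probability) that two runs of consecutive counter values
    collide when the second run starts at a uniformly random offset.

    Two runs of length n overlap when their starting values are fewer than
    n steps apart around the circle of counter values (in either direction).
    """
    n = requests_per_second
    hits = sum(1 for d in range(modulus) if n > min(d, modulus - d))
    return hits, hits / modulus


if __name__ == "__main__":
    hits, p = overlap_probability(500)
    print(hits, f"{p:.2%}")  # 999 1.52%
```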

    <p>You may be concerned about the clock being "set back" at the
    end of daylight saving time. However, this isn't an issue because
    the times used here are UTC, which "always" goes forward. Note
    that x86-based Unixes may need proper configuration for this to
    be true: they should be configured to assume that the
    motherboard clock is on UTC and compensate appropriately. In any
    case, if you're running NTP, your UTC time will be correct very
    shortly after a reboot.</p>

    <!-- FIXME: thread_index is unsigned int, so not always 32bit.-->
    <p>The <code>UNIQUE_ID</code> environment variable is
    constructed by encoding the 144-bit (32-bit IP address, 32-bit
    pid, 32-bit time stamp, 16-bit counter, 32-bit thread index)
    tuple using the alphabet <code>[A-Za-z0-9@-]</code> in a manner
    similar to MIME base64 encoding, producing 24 characters (each
    character encodes 6 bits). The MIME base64 alphabet is actually
    <code>[A-Za-z0-9+/]</code>; however, <code>+</code> and
    <code>/</code> need to be specially encoded in URLs, which makes
    them less desirable. All values are encoded in network byte order
    so that the encoding is comparable across architectures of
    different byte ordering. The actual ordering of the encoding is:
    time stamp, IP address, pid, counter. This ordering has a purpose,
    but it should be emphasized that applications should not dissect
    the encoding. Applications should treat the entire encoded
    <code>UNIQUE_ID</code> as an opaque token, which can be
    compared against other <code>UNIQUE_ID</code>s for equality
    only.</p>
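
    <p>The following Python sketch illustrates the encoding scheme described above; it is not the module's C implementation. It packs the fields in network byte order with <code>struct</code>. It then encodes them with <code>base64.b64encode</code>, passing <code>altchars=b"@-"</code> so that <code>+</code> and <code>/</code> become <code>@</code> and <code>-</code>. The text lists only time stamp, IP address, pid and counter, so placing the thread index last is an assumption of this sketch. <code>encode_unique_id</code> is a hypothetical name.</p>

```python
import base64
import ipaddress
import struct

# Standard base64 with '+' and '/' replaced by '@' and '-',
# i.e. the alphabet [A-Za-z0-9@-].
ALTCHARS = b"@-"

# Network byte order ("!"), no padding: time stamp, IP address, pid (32 bits
# each), counter (16 bits), thread index (32 bits; assumed to come last).
RECORD_FORMAT = "!IIIHI"  # 18 bytes = 144 bits


def encode_unique_id(time_stamp, ip_addr, pid, counter, thread_index):
    """Return a 24-character token; callers should treat it as opaque."""
    packed = struct.pack(
        RECORD_FORMAT,
        time_stamp,
        int(ipaddress.IPv4Address(ip_addr)),
        pid,
        counter,
        thread_index,
    )
    # 18 bytes is a multiple of 3, so base64 needs no '=' padding.
    return base64.b64encode(packed, altchars=ALTCHARS).decode("ascii")


if __name__ == "__main__":
    token = encode_unique_id(1000000000, "192.0.2.10", 4242, 12345, 0)
    print(token, len(token))
```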

    <p>The ordering was chosen such that it's possible to change
    the encoding in the future without worrying about collision
    with an existing database of <code>UNIQUE_ID</code>s. The new
    encodings should also keep the time stamp as the first element,
    and can otherwise use the same alphabet and bit length. Since
    the time stamps are essentially an increasing sequence, it's
    sufficient to have a <em>flag second</em> in which all machines
    in the cluster stop serving requests and stop using the old
    encoding format. Afterwards they can resume serving requests and
    begin issuing the new encodings.</p>

    <p>We believe this is a relatively portable solution to this
    problem. The identifiers generated have an essentially infinite
    lifetime because future identifiers can be made longer as
    required. Essentially no communication is required between
    machines in the cluster (only NTP synchronization is required,
    which is low overhead), and no communication between httpd
    processes is required (the communication is implicit in the pid
    value assigned by the kernel). In very specific situations the
    identifier can be shortened, but more information needs to be
    assumed (for example, the 32-bit IP address is overkill for any
    site, but there is no portable shorter replacement for it).</p>
</section>


</modulesynopsis>