mod_unique_id.html revision 25503838e438bb909e3ff880125732c7ed5e64ad
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<HTML>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<HEAD>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<TITLE>Apache module mod_unique_id</TITLE>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo</HEAD>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<BODY
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo BGCOLOR="#FFFFFF"
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo TEXT="#000000"
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo LINK="#0000FF"
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo VLINK="#000080"
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo ALINK="#FF0000"
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<!--#include virtual="header.html" -->
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<H1 ALIGN="CENTER">Module mod_unique_id</H1>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThis module provides a magic token for each request which is guaranteed
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooto be unique across "all" requests under very specific conditions.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe unique identifier is even unique across multiple machines in a
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooproperly configured cluster of machines. The environment variable
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<CODE>UNIQUE_ID</CODE> is set to the identifier for each request.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooUnique identifiers are useful for various reasons which are beyond the
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooscope of this document.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<H2>Theory</H2>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooFirst a brief recap of how the Apache server works on Unix machines.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThis feature currently isn't supported on Windows NT. On Unix machines,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooApache creates several children, the children process requests one at
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooa time. Each child can serve multiple requests in its lifetime. For the
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoopurpose of this discussion, the children don't share any data
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoowith each other. We'll refer to the children as httpd processes.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooYour website has one or more machines under your administrative control,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yootogether we'll call them a cluster of machines. Each machine can
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoopossibly run multiple instances of Apache. All of these collectively
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooare considered "the universe", and with certain assumptions we'll
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooshow that in this universe we can generate unique identifiers for each
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoorequest, without extensive communication between machines in the cluster.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe machines in your cluster should satisfy these requirements.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo(Even if you have only one machine you should synchronize its clock
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoowith NTP.)
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<UL>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<LI>The machines' times are synchronized via NTP or other network time
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo protocol.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<LI>The machines' hostnames all differ, such that the module can do a
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo hostname lookup on the hostname and receive a different IP address
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo for each machine in the cluster.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo</UL>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooAs far as operating system assumptions go, we assume that pids (process
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooids) fit in 32-bits. If the operating system uses more than 32-bits
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoofor a pid, the fix is trivial but must be performed in the code.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooGiven those assumptions, at a single point in time we can identify
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooany httpd process on any machine in the cluster from all other httpd
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooprocesses. The machine's IP address and the pid of the httpd process
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooare sufficient to do this. So in order to generate unique identifiers
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoofor requests we need only distinguish between different points in time.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooTo distinguish time we will use a Unix timestamp (seconds since January
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo1, 1970 UTC), and a 16-bit counter. The timestamp has only one second
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoogranularity, so the counter is used to represent up to 65536 values
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooduring a single second. The quadruple <EM>( ip_addr, pid, time_stamp,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoocounter )</EM> is sufficient to enumerate 65536 requests per second per
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoohttpd process. There are issues however with pid reuse over
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yootime, and the counter is used to alleviate this issue.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooWhen an httpd child is created, the counter is initialized with (
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoocurrent microseconds divided by 10 ) modulo 65536 (this formula was
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoochosen to eliminate some variance problems with the low order bits of
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoothe microsecond timers on some systems). When a unique identifier is
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoogenerated, the time stamp used is the time the request arrived at the
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooweb server. The counter is incremented every time an identifier is
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoogenerated (and allowed to roll over).
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe kernel generates a pid for each process as it forks the process, and
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoopids are allowed to roll over (they're 16-bits on many Unixes, but newer
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoosystems have expanded to 32-bits). So over time the same pid will be
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooreused. However unless it is reused within the same second, it does not
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoodestroy the uniqueness of our quadruple. That is, we assume the system
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoodoes not spawn 65536 processes in a one second interval (it may even be
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo32768 processes on some Unixes, but even this isn't likely to happen).
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooSuppose that time repeats itself for some reason. That is, suppose that
129881b9d52cbe45ed1e2529da980fa231c4e97eStéphane Graberthe system's clock is screwed up and it revisits a past time (or it is
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yootoo far forward, is reset correctly, and then revisits the future time).
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooIn this case we can easily show that we can get pid and time stamp reuse.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe choice of initializer for the counter is intended to help defeat this.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooNote that we really want a random number to initialize the counter,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoobut there aren't any readily available numbers on most systems (i.e. you
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoocan't use rand() because you need to seed the generator, and can't seed
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooit with the time because time, at least at one second resolution, has
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoorepeated itself). This is not a perfect defense.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooHow good a defense is it? Well suppose that one of your machines serves
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooat most 500 requests per second (which is a very reasonable upper bound
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooat this writing, because systems generally do more than just shovel out
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoostatic files). To do that it will require a number of children which
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoodepends on how many concurrent clients you have. But we'll be pessimistic
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooand suppose that a single child is able to serve 500 requests per second.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThere are 1000 possible starting counter values such that two sequences
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooof 500 requests overlap. So there is a 1.5% chance that if time (at one
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoosecond resolution) repeats itself this child will repeat a counter value,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooand uniqueness will be broken. This was a very pessimistic example,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooand with real world values it's even less likely to occur. If your
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoosystem is such that it's still likely to occur, then perhaps you should
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoomake the counter 32 bits (by editing the code).
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooYou may be concerned about the clock being "set back" during summer
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoodaylight savings. However this isn't an issue because the times used here
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooare UTC, which "always" go forward. Note that x86 based Unixes may need
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooproper configuration for this to be true -- they should be configured to
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooassume that the motherboard clock is on UTC and compensate appropriately.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooBut even still, if you're running NTP then your UTC time will be correct
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoovery shortly after reboot.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe <CODE>UNIQUE_ID</CODE> environment variable is constructed by
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooencoding the 112-bit (32-bit IP address, 32 bit pid, 32 bit time stamp,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo16 bit counter) quadruple using the alphabet <CODE>[A-Za-z0-9@-]</CODE>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooin a manner similar to MIME base64 encoding, producing 19 characters.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe MIME base64 alphabet is actually <CODE>[A-Za-z0-9+/]</CODE> however
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<CODE>+</CODE> and <CODE>/</CODE> need to be specially encoded in URLs,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoowhich makes them less desirable. All values are encoded in network
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoobyte ordering so that the encoding is comparable across architectures of
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoodifferent byte ordering. The actual ordering of the encoding is: time
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoostamp, IP address, pid, counter. This ordering has a purpose, but it
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooshould be emphasized that applications should not dissect the encoding.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooApplications should treat the entire encoded <CODE>UNIQUE_ID</CODE> as an
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooopaque token, which can be compared against other <CODE>UNIQUE_ID</CODE>s
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoofor equality only.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThe ordering was chosen such that it's possible to change the encoding
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooin the future without worrying about collision with an existing database
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooof <CODE>UNIQUE_ID</CODE>s. The new encodings should also keep the time
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoostamp as the first element, and can otherwise use the same alphabet and
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoobit length. Since the time stamps are essentially an increasing sequence,
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooit's sufficient to have a <EM>flag second</EM> in which all machines in the
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoocluster stop serving and request, and stop using the old encoding format.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooAfterwards they can resume requests and begin issuing the new encodings.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<P>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooThis we believe is a relatively portable solution to this problem. It can
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoobe extended to multithreaded systems like Windows NT, and can grow with
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoofuture needs. The identifiers generated have essentially an infinite
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoolife-time because future identifiers can be made longer as required.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae YooEssentially no communication is required between machines in the cluster
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo(only NTP synchronization is required, which is low overhead), and no
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoocommunication between httpd processes is required (the communication is
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooimplicit in the pid value assigned by the kernel). In very specific
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoosituations the identifier can be shortened, but more information needs
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yooto be assumed (for example the 32-bit IP address is overkill for any
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoosite, but there is no portable shorter replacement for it).
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<HR>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<H2>Directives</H2>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<CODE>mod_unique_id</CODE> has no directives.
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo<!--#include virtual="footer.html" -->
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo</BODY>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo</HTML>
2b371b262f7272266ff18cc2aff65176a2c16383Sungbae Yoo