CDDL HEADER START

The contents of this file are subject to the terms of the
Common Development and Distribution License (the "License").
You may not use this file except in compliance with the License.

You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
or http://www.opensolaris.org/os/licensing.
See the License for the specific language governing permissions
and limitations under the License.

When distributing Covered Code, include this CDDL HEADER in each
file and include the License file at usr/src/OPENSOLARIS.LICENSE.
If applicable, add the following below this CDDL HEADER, with the
fields enclosed by brackets "[]" replaced with your own identifying
information: Portions Copyright [yyyy] [name of copyright owner]

CDDL HEADER END

Copyright 2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.

Architectural Overview for the DHCP agent
Peter Memishian
ident "%Z%%M% %I% %E% SMI"

INTRODUCTION
============

The Solaris DHCP agent (dhcpagent) is a DHCP client implementation
compliant with RFCs 2131, 3315, and others. The major forces shaping
its design were:

  * Must be capable of managing multiple network interfaces.
  * Must consume little CPU, since it will always be running.
  * Must have a small memory footprint, since it will always be
    running.
  * Must not rely on any shared libraries outside of /lib, since
    it must run before all filesystems have been mounted.

When a DHCP agent implementation is only required to control a single
interface on a machine, the problem is expressed well as a simple
state machine, as shown in RFC2131. However, when a DHCP agent is
responsible for managing more than one interface at a time, the
problem becomes much more complicated.

This can be resolved using threads or with an event-driven model.
Given that DHCP's behavior can be expressed concisely as a state
machine, the event-driven model is the closest match.

While tried-and-true, that model is subtle and easy to get wrong.
Indeed, much of the agent's code is there to manage the complexity of
programming in an asynchronous event-driven paradigm.

THE BASICS
==========

The DHCP agent consists of roughly 30 source files, most with a
companion header file. While the largest source file is around 1700
lines, most are much shorter. The source files can largely be broken
up into three groups:

  * Source files that, along with their companion header files,
    define an abstract "object" that is used by other parts of
    the system. Examples include "packet.c", which along with
    "packet.h" provides a Packet object for use by the rest of
    the agent; and "async.c", which along with "async.h" defines
    an interface for managing asynchronous transactions within
    the agent.

  * Source files that implement a given state of the agent; for
    instance, there is a "request.c" which comprises all of
    the procedural "work" which must be done while in the
    REQUESTING state of the agent. By encapsulating states in
    files, it becomes easier to debug errors in the
    client/server protocol and adapt the agent to new
    constraints, since all the relevant code is in one place.

  * Source files that, along with their companion header files,
    encapsulate a given task or related set of tasks. The
    difference between this and the first group is that the
    interfaces exported from these files do not operate on
    an "object", but rather perform a specific task. Examples
    include "dlpi_io.c", which provides a useful interface
    to DLPI-related I/O operations.

OVERVIEW
========

Here we discuss the essential objects and subtle aspects of the
DHCP agent implementation. Note that there is of course much more
that is not discussed here, but after this overview you should be able
to fend for yourself in the source code.

For details on the DHCPv6 aspects of the design, and how this relates
to the implementation present in previous releases of Solaris, see the
README.v6 file.

Event Handlers and Timer Queues
-------------------------------

The most important object in the agent is the event handler, whose
interface is in libinetutil.h and whose implementation is in
libinetutil. The event handler is essentially an object-oriented
wrapper around poll(2): other components of the agent can register to
be called back when specific events on file descriptors happen -- for
instance, to wait for requests to arrive on its IPC socket, the agent
registers a callback function (accept_event()) that will be called
back whenever a new connection arrives on the file descriptor
associated with the IPC socket. When the agent initially begins in
main(), it registers a number of events with the event handler, and
then calls iu_handle_events(), which proceeds to wait for events to
happen -- this function does not return until the agent is shut down
via signal.

When the registered events occur, the callback functions are called
back, which in turn might lead to additional callbacks being
registered -- this is the classic event-driven model. (As an aside,
note that programming in an event-driven model means that callbacks
cannot block, or else the agent will become unresponsive.)
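
The registration-and-dispatch pattern described above can be sketched
as follows. This is an illustration only: the ex_* names are
hypothetical and are not the libinetutil interfaces, the table has a
fixed size, and each call handles a single round of poll(2) rather
than looping until shutdown.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define	EX_MAX_EVENTS	8	/* hypothetical fixed table size */

/* A callback gets the ready descriptor, its events, and its context. */
typedef void ex_callback_t(int fd, short revents, void *arg);

typedef struct ex_handler {
	struct pollfd	eh_fds[EX_MAX_EVENTS];
	ex_callback_t	*eh_cbs[EX_MAX_EVENTS];
	void		*eh_args[EX_MAX_EVENTS];
	nfds_t		eh_count;
} ex_handler_t;

static void
ex_init(ex_handler_t *eh)
{
	eh->eh_count = 0;
}

/* Register interest in `events' on `fd'; returns 0, or -1 if full. */
static int
ex_register(ex_handler_t *eh, int fd, short events, ex_callback_t *cb,
    void *arg)
{
	assert(cb != NULL);
	if (eh->eh_count == EX_MAX_EVENTS)
		return (-1);
	eh->eh_fds[eh->eh_count].fd = fd;
	eh->eh_fds[eh->eh_count].events = events;
	eh->eh_fds[eh->eh_count].revents = 0;
	eh->eh_cbs[eh->eh_count] = cb;
	eh->eh_args[eh->eh_count] = arg;
	eh->eh_count++;
	return (0);
}

/*
 * Wait up to `timeout_ms' for events and run one round of callbacks.
 * Returns the number of callbacks invoked, or -1 if poll(2) failed.
 * Events registered by a callback are first polled on the next round.
 */
static int
ex_dispatch_once(ex_handler_t *eh, int timeout_ms)
{
	nfds_t	i, n = eh->eh_count;
	int	called = 0;

	if (poll(eh->eh_fds, n, timeout_ms) == -1)
		return (-1);
	for (i = 0; i < n; i++) {
		if (eh->eh_fds[i].revents != 0) {
			eh->eh_cbs[i](eh->eh_fds[i].fd,
			    eh->eh_fds[i].revents, eh->eh_args[i]);
			called++;
		}
	}
	return (called);
}

/* Sample callback: consume one byte, then bump the counter in `arg'. */
static void
ex_count_cb(int fd, short revents, void *arg)
{
	char	c;

	(void) revents;
	if (read(fd, &c, 1) == 1)
		(*(int *)arg)++;
}
```

A caller would register ex_count_cb() on the read end of a pipe or
socket, passing a pointer to its own counter as the context, and then
call ex_dispatch_once() repeatedly. The callback returns promptly
because it reads only after poll(2) has reported the descriptor ready.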

A special kind of "event" is a timeout. Since there are many timers
which must be maintained for each DHCP-controlled interface (such as a
lease expiration timer, time-to-first-renewal (t1) timer, and so
forth), an object-oriented abstraction for timers called a "timer
queue" is provided, whose interface is in libinetutil.h with a
corresponding implementation in libinetutil. The timer queue allows
callback functions to be "scheduled" to run after a certain amount of
time has passed.

The event handler and timer queue objects work hand-in-hand: the event
handler is passed a pointer to a timer queue in iu_handle_events() --
from there, it can use the iu_earliest_timer() routine to find the
timer which will next fire, and use this to set its timeout value in
its call to poll(2). If poll(2) returns due to a timeout, the event
handler calls iu_expire_timers() to expire all timers that are due
(note that more than one may be due if, for example, multiple
timers were set to expire at the same time).
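
The interplay between the earliest timer and the expiry pass can be
sketched as follows. The ex_tq_* names, the fixed-size table, and the
use of whole seconds supplied by the caller are assumptions made for
illustration; this is not the libinetutil timer queue.

```c
#include <assert.h>
#include <stddef.h>

#define	EX_MAX_TIMERS	8	/* hypothetical fixed table size */
#define	EX_NO_TIMER	(-1L)	/* nothing is scheduled */

typedef void ex_timer_cb_t(void *arg);

typedef struct ex_timer {
	long		et_expire;	/* absolute expiry, in seconds */
	ex_timer_cb_t	*et_cb;
	void		*et_arg;
	int		et_active;
} ex_timer_t;

typedef struct ex_tq {
	ex_timer_t	tq_timers[EX_MAX_TIMERS];
} ex_tq_t;

static void
ex_tq_init(ex_tq_t *tq)
{
	int	i;

	for (i = 0; i < EX_MAX_TIMERS; i++)
		tq->tq_timers[i].et_active = 0;
}

/* Schedule cb(arg) `delay' seconds after `now'; returns a slot or -1. */
static int
ex_tq_schedule(ex_tq_t *tq, long now, long delay, ex_timer_cb_t *cb,
    void *arg)
{
	int	i;

	assert(cb != NULL);
	for (i = 0; i < EX_MAX_TIMERS; i++) {
		ex_timer_t *t = &tq->tq_timers[i];

		if (!t->et_active) {
			t->et_expire = now + delay;
			t->et_cb = cb;
			t->et_arg = arg;
			t->et_active = 1;
			return (i);
		}
	}
	return (-1);
}

/* Seconds until the next timer fires (0 if overdue), or EX_NO_TIMER. */
static long
ex_tq_earliest(const ex_tq_t *tq, long now)
{
	long	best = EX_NO_TIMER;
	int	i;

	for (i = 0; i < EX_MAX_TIMERS; i++) {
		const ex_timer_t *t = &tq->tq_timers[i];
		long left;

		if (!t->et_active)
			continue;
		left = (t->et_expire > now) ? t->et_expire - now : 0;
		if (best == EX_NO_TIMER || left < best)
			best = left;
	}
	return (best);
}

/* Run and remove every timer that is due; returns how many ran. */
static int
ex_tq_expire(ex_tq_t *tq, long now)
{
	int	i, ran = 0;

	for (i = 0; i < EX_MAX_TIMERS; i++) {
		ex_timer_t *t = &tq->tq_timers[i];

		if (t->et_active && t->et_expire <= now) {
			t->et_active = 0;	/* callback may reschedule */
			t->et_cb(t->et_arg);
			ran++;
		}
	}
	return (ran);
}

/* Sample callback: bump the counter passed as context. */
static void
ex_bump(void *arg)
{
	(*(int *)arg)++;
}
```

An event loop built on this sketch would convert the result of
ex_tq_earliest() to milliseconds for the poll(2) timeout (passing -1
through unchanged so that poll(2) blocks indefinitely), and would call
ex_tq_expire() whenever poll(2) times out.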

Although it is possible to instantiate more than one timer queue or
event handler object, it doesn't make a lot of sense -- these objects
are really "singletons". Accordingly, the agent has two global
variables, `eh' and `tq', which store pointers to the global event
handler and timer queue.

Network Interfaces
------------------

For each network interface managed by the agent, there is a set of
associated state that describes both its general properties (such as
the maximum MTU) and its connections to DHCP-related state (the
protocol state machines). This state is stored in a pair of
structures called `dhcp_pif_t' (the IP physical interface layer or
PIF) and `dhcp_lif_t' (the IP logical interface layer or LIF). Each
dhcp_pif_t represents a single physical interface, such as "hme0," for
a given IP protocol version (4 or 6), and has a list of dhcp_lif_t
structures representing the logical interfaces (such as "hme0:1") in
use by the agent.

This split is important because of differences between IPv4 and IPv6.
For IPv4, each DHCP state machine manages a single IP address and
associated configuration data. This corresponds to a single logical
interface, which must be specified by the user. For IPv6, however,
each DHCP state machine manages a group of addresses, and is
associated with a DUID value rather than with just an interface.

Thus, DHCPv6 behaves more like in.ndpd in its creation of "ADDRCONF"
interfaces. The agent automatically plumbs logical interfaces when
needed and removes them when the addresses expire.

The state for a given session is stored separately in `dhcp_smach_t'.
This state machine then points to the main LIF used for I/O, and to a
list of `dhcp_lease_t' structures representing individual leases, and
each of those points to a list of LIFs corresponding to the individual
addresses being managed.
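
The relationships among these structures can be sketched as follows.
The ex_* types and all field names are simplified, hypothetical
stand-ins chosen for illustration; the real definitions carry far more
state and may be laid out differently.

```c
#include <assert.h>
#include <stddef.h>

struct ex_lif;

/* One physical interface for one IP version, e.g. "hme0" for IPv4. */
typedef struct ex_pif {
	const char	*pif_name;
	int		pif_isv6;
	struct ex_lif	*pif_lifs;	/* logical interfaces in use */
} ex_pif_t;

/* One logical interface, e.g. "hme0:1". */
typedef struct ex_lif {
	const char	*lif_name;
	struct ex_pif	*lif_pif;	/* owning physical interface */
	struct ex_lif	*lif_pif_next;	/* next LIF on the PIF's list */
	struct ex_lif	*lif_lease_next; /* next LIF on the lease's list */
} ex_lif_t;

/* One lease: a list of LIFs, one per managed address. */
typedef struct ex_lease {
	struct ex_lif	*dl_lifs;
	struct ex_lease	*dl_next;
} ex_lease_t;

/* One state machine: the LIF used for I/O plus its leases. */
typedef struct ex_smach {
	struct ex_lif	*dsm_lif;
	struct ex_lease	*dsm_leases;
} ex_smach_t;

/* Count the addresses (LIFs) a state machine manages via its leases. */
static int
ex_smach_naddrs(const ex_smach_t *dsm)
{
	const ex_lease_t *dl;
	const ex_lif_t *lif;
	int	n = 0;

	assert(dsm != NULL);
	for (dl = dsm->dsm_leases; dl != NULL; dl = dl->dl_next) {
		for (lif = dl->dl_lifs; lif != NULL;
		    lif = lif->lif_lease_next)
			n++;
	}
	return (n);
}
```

In this sketch an IPv4 state machine would have a single lease holding
a single LIF, while a DHCPv6 state machine could hold several LIFs per
lease.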

One point that was glossed over in the preceding discussion of event
handlers and timer queues was context. Recall that the event-driven
nature of the agent requires that functions cannot block, lest they
starve out others and impact the observed responsiveness of the agent.
As an example, consider the process of extending a lease: the agent
must send a REQUEST packet and wait for an ACK or NAK packet in
response. This is done by sending a REQUEST and then returning to the
event handler that waits for an ACK or NAK packet to arrive on the
file descriptor associated with the interface. Note, however, that
when the ACK or NAK does arrive and the callback function is called
back, it must know which state machine this packet is for (it must get
back its context). This could be handled through an ad-hoc mapping of
file descriptors to state machines, but a cleaner approach is to have
the event handler's register function (iu_register_event()) take in an
opaque context pointer, which will then be passed back to the
callback. In the agent, the context pointer used depends on the
nature of the event: events on LIFs use the dhcp_lif_t pointer, events
on the state machine use dhcp_smach_t, and so on.

Note that there is nothing that guarantees the pointer passed into
iu_register_event() or iu_schedule_timer() will still be valid when
the callback is called back (for instance, the memory may have been
freed in the meantime). To solve this problem, all of the data
structures used in this way are reference counted. For more details
on how the reference count scheme is implemented, see the closing
comments in interface.h regarding memory management.
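
The general idea of holding a reference on behalf of a pending
callback can be sketched as follows. This is a generic
reference-counting illustration with hypothetical ex_* names, not the
scheme documented in interface.h; the `eo_freed' flag exists only so
that the example can observe when the memory is released.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ex_obj {
	unsigned int	eo_refs;	/* outstanding holds */
	int		*eo_freed;	/* demo only: set to 1 on free */
} ex_obj_t;

/* Create an object holding one reference for its creator. */
static ex_obj_t *
ex_obj_create(int *freed)
{
	ex_obj_t *o = malloc(sizeof (*o));

	if (o != NULL) {
		o->eo_refs = 1;
		o->eo_freed = freed;
	}
	return (o);
}

/* Take a hold, e.g. just before passing `o' as a callback context. */
static void
ex_obj_hold(ex_obj_t *o)
{
	o->eo_refs++;
}

/* Drop a hold; frees the object and returns 1 if it was the last. */
static int
ex_obj_release(ex_obj_t *o)
{
	assert(o->eo_refs > 0);
	if (--o->eo_refs > 0)
		return (0);
	if (o->eo_freed != NULL)
		*o->eo_freed = 1;
	free(o);
	return (1);
}
```

With this pattern, code that registers a callback takes a hold first,
and the callback releases it when it runs. If the owner has dropped
its own reference in the meantime, the memory remains valid until the
callback performs the final release.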

Transactions
------------

Many operations performed via DHCP must be performed in groups -- for
instance, acquiring a lease requires several steps: sending a
DISCOVER, collecting OFFERs, selecting an OFFER, sending a REQUEST,
and receiving an ACK, assuming everything goes well. Note, however,
that due to the event-driven model the agent operates in, these
operations are not inherently "grouped" -- instead, the agent sends a
DISCOVER, goes back into the main event loop, waits for events
(perhaps even requests on the IPC channel to begin acquiring a lease
on another state machine), eventually checks to see if an acceptable
OFFER has come in, and so forth. To some degree, the notion of the
state machine's current state (SELECTING, REQUESTING, etc.) helps
control the potential chaos of the event-driven model (for instance,
if while the agent is waiting for an OFFER on a given state machine,
an IPC event comes in requesting that the leases be RELEASED, the
agent knows to send back an error since the state machine must be in
at least the BOUND state before a RELEASE can be performed.)

However, states are not enough -- for instance, suppose that the agent
begins trying to renew a lease. This is done by sending a REQUEST
packet and waiting for an ACK or NAK, which might never come. If,
while waiting for the ACK or NAK, the user sends a request to renew
the lease as well, then if the agent were to send another REQUEST,
things could get quite complicated (and this is only the beginning of
this rathole). To protect against this, two objects exist:
`async_action' and `ipc_action'. These objects are related, but
independent of one another; the more essential object is the
`async_action', which we will discuss first.

In short, an `async_action' represents a pending transaction (aka
asynchronous action), of which each state machine can have at most
one. The `async_action' structure is embedded in the `dhcp_smach_t'
structure, which is fine since there can be at most one pending
transaction per state machine. Typical "asynchronous transactions"
are START, EXTEND, and INFORM, since each consists of a sequence of
packets that must be exchanged without interruption. Note that not all
DHCP operations are "asynchronous" -- for instance, a DHCPv4 RELEASE
operation is synchronous (not asynchronous) since after the RELEASE is
sent no reply is expected from the DHCP server, but DHCPv6 Release is
asynchronous, as all DHCPv6 messages are transactional. Some
operations, such as status query, are synchronous and do not affect
the system state, and thus do not require sequencing.

When the agent realizes it must perform an asynchronous transaction,
it calls async_start() to open the transaction. If one is already
pending, then the new transaction must fail (the details of failure
depend on how the transaction was initiated, which is described in
more detail later when the `ipc_action' object is discussed). If
there is no pending asynchronous transaction, the operation succeeds.
When the transaction is complete, either async_finish() or
async_cancel() must be called to complete or cancel the asynchronous
action on that state machine. If the transaction is unable to
complete within a certain amount of time (more on this later), a timer
should be used to cancel the operation.
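
The "at most one pending transaction per state machine" rule can be
sketched as follows. The ex_* names and the integer return values are
hypothetical simplifications; the agent's own routines in async.c take
different arguments and do more work.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	EX_CMD_NONE,		/* no transaction pending */
	EX_CMD_START,
	EX_CMD_EXTEND,
	EX_CMD_INFORM
} ex_cmd_t;

typedef struct ex_async {
	ex_cmd_t	as_cmd;		/* pending command, if any */
} ex_async_t;

/* The async record is embedded: one per state machine. */
typedef struct ex_smach {
	ex_async_t	dsm_async;
} ex_smach_t;

/* Open a transaction; returns 1 on success, 0 if one is pending. */
static int
ex_async_start(ex_smach_t *dsm, ex_cmd_t cmd)
{
	assert(dsm != NULL && cmd != EX_CMD_NONE);
	if (dsm->dsm_async.as_cmd != EX_CMD_NONE)
		return (0);
	dsm->dsm_async.as_cmd = cmd;
	return (1);
}

/* Mark the pending transaction as successfully completed. */
static void
ex_async_finish(ex_smach_t *dsm)
{
	assert(dsm->dsm_async.as_cmd != EX_CMD_NONE);
	dsm->dsm_async.as_cmd = EX_CMD_NONE;
}

/* Abandon the pending transaction, if any (e.g. from a timer). */
static void
ex_async_cancel(ex_smach_t *dsm)
{
	dsm->dsm_async.as_cmd = EX_CMD_NONE;
}
```

Embedding the record in the state machine means that no allocation is
needed to open a transaction, and checking for a pending transaction
is a single comparison.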

The notion of asynchronous transactions is complicated by the fact
that they may originate from both inside and outside of the agent.
For instance, a user initiates an asynchronous START transaction by
running `ifconfig hme0 dhcp start', but the agent will
internally need to perform asynchronous EXTEND transactions to extend
the lease before it expires. Note that user-initiated actions always
have priority over internal actions: the former will cancel the
latter, if necessary.

This leads us into the `ipc_action' object. An `ipc_action'
represents the IPC-related pieces of an asynchronous transaction that
was started as a result of a user request, as well as the `BUSY' state
of the administrative interface. Only IPC-generated asynchronous
transactions have a valid `ipc_action' object. Note that since there
can be at most one asynchronous action per state machine, there can
also be at most one `ipc_action' per state machine (this means it can
also conveniently be embedded inside the `dhcp_smach_t' structure).

One of the main purposes of the `ipc_action' object is to time out user
events. When the user specifies a timeout value as an argument to
ifconfig, the user is specifying an `ipc_action' timeout; in other
words, how long the user is willing to wait for the command to
complete. When this time expires, the ipc_action is terminated, as
well as the asynchronous operation.

The API provided for the `ipc_action' object is quite similar to the
one for the `async_action' object: when an IPC request comes in for an
operation requiring asynchronous operation, ipc_action_start() is
called. When the request completes, ipc_action_finish() is called.
If the user times out before the request completes, then
ipc_action_timeout() is called.

Packet Management
-----------------

Another complicated area is packet management: building, manipulating,
sending and receiving packets. These operations are all encapsulated
behind a dozen or so interfaces (see packet.h) that abstract the
unimportant details away from the rest of the agent code. In order to
send a DHCP packet, code first calls init_pkt(), which returns a
dhcp_pkt_t initialized suitably for transmission. Note that currently
init_pkt() returns a dhcp_pkt_t that is actually allocated as part of
the `dhcp_smach_t', but this may change in the future. After calling
init_pkt(), the add_pkt_opt*() functions are used to add options to
the DHCP packet. Finally, send_pkt() and send_pkt_v6() can be used to
transmit the packet to a given IP address.

The send_pkt() function is actually quite complicated; for one, it
must internally use either DLPI or sockets depending on the machine
state; for another, it handles the details of packet timeout and
retransmission. The last argument to send_pkt() is a pointer to a
"stop function." If this argument is passed as NULL, then the packet
will only be sent once (it won't be retransmitted). Otherwise, the
stop function will be called back before each retransmission. The
callback may alter dsm_send_timeout if necessary
to place a cap on the next timeout; this is done for DHCPv6 in
stop_init_reboot() in order to implement the CNF_MAX_RD constraint.
The return value from this function indicates whether to continue
retransmission or not, which allows the send_pkt() caller to control
the retransmission policy without making it have to deal with the
retransmission mechanism. See request.c for an example of this in
action.
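
The split between retransmission mechanism and retransmission policy
can be sketched as follows. Everything here is a hypothetical
simplification: the ex_* names, the callback signature, the doubling
backoff, and above all the loop, which stands in for what the agent
does with timers so that it never blocks.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ex_smach {
	unsigned int	dsm_send_timeout;	/* next timeout, in ms */
	unsigned int	dsm_sent;		/* packets sent so far */
} ex_smach_t;

/* Policy hook: returns nonzero to stop retransmitting. */
typedef int ex_stop_func_t(ex_smach_t *dsm, unsigned int n_sent);

/*
 * Mechanism: "send" once, then keep retransmitting with a doubling
 * timeout until the stop function says to stop.  A NULL stop function
 * means the packet is sent exactly once.  Returns the packet count.
 */
static unsigned int
ex_send_pkt(ex_smach_t *dsm, ex_stop_func_t *stop)
{
	assert(dsm != NULL);
	dsm->dsm_sent = 1;			/* initial transmission */
	if (stop == NULL)
		return (dsm->dsm_sent);
	for (;;) {
		dsm->dsm_send_timeout *= 2;	/* back off */
		if (stop(dsm, dsm->dsm_sent))	/* may cap the timeout */
			break;
		dsm->dsm_sent++;		/* retransmission */
	}
	return (dsm->dsm_sent);
}

/* Sample policy: cap the timeout at 8 seconds, stop after 4 packets. */
static int
ex_stop_after_four(ex_smach_t *dsm, unsigned int n_sent)
{
	if (dsm->dsm_send_timeout > 8000)
		dsm->dsm_send_timeout = 8000;
	return (n_sent >= 4);
}
```

The caller chooses the policy simply by choosing which stop function
to pass; the sending code does not need to know why retransmission
ends or how the timeout is bounded.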

The recv_pkt() function is simpler but still complicated by the fact
that one may want to receive several different types of packets at
once and in different ways (DLPI or sockets). The caller registers an
event handler on the file descriptor, and then calls recv_pkt() to
read in the packet along with meta information about the message (the
sender and interface identifier).

For IPv6, packet reception is done with a single socket, using
IPV6_PKTINFO to determine the actual destination address and receiving
interface. Packets are then matched against the state machines on the
given interface through the transaction ID.

The same facility exists for inbound IPv4 packets, but because there's
no IP_PKTINFO processing on output yet in Solaris, and because IPv4
still relies on DLPI, DHCP packets are handled on a per-LIF (when
bound) and per-PIF (when unbound) basis. Eventually, when IP_PKTINFO
is available for IPv4, the per-LIF sockets can go away. If it ever
becomes possible to send and receive IP packets without having an IP
address configured on an interface, then the DLPI streams can go as
well.

Time
----

The notion of time is an exceptionally subtle area. You will notice
five ways that time is represented in the source: as lease_t's,
uint32_t's, time_t's, hrtime_t's, and monosec_t's. Each of these
types serves a slightly different function.

The `lease_t' type is the simplest to understand; it is the unit of
time in the CD_{LEASE,T1,T2}_TIME options in a DHCP packet, as defined
by RFC2131. This is defined as a positive number of seconds (relative
to some fixed point in time) or the value `-1' (DHCP_PERM) which
represents infinity (i.e., a permanent lease). The lease_t should be
used either when dealing with actual DHCP packets that are sent on the
wire or for variables which follow the exact definition given in the
RFC.
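
The special handling that a permanent lease requires can be sketched
as follows. The ex_* names are hypothetical, and the 32-bit unsigned
representation is an assumption made for this example; the sketch
shows only why `-1' must be tested for before any arithmetic is done
on a lease time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ex_lease_t;			/* assumed width */
#define	EX_DHCP_PERM	((ex_lease_t)-1)	/* permanent lease */

/*
 * Compute when a lease that began at `start' (in seconds) expires.
 * Returns 1 and stores the expiry time, or returns 0 and stores
 * nothing if the lease is permanent and therefore never expires.
 */
static int
ex_lease_expiry(int64_t start, ex_lease_t lease, int64_t *expiryp)
{
	assert(expiryp != NULL);
	if (lease == EX_DHCP_PERM)
		return (0);
	*expiryp = start + (int64_t)lease;
	return (1);
}
```

Without the explicit test, a permanent lease would be treated as an
ordinary (if very long) finite lease and would eventually "expire".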
The `uint32_t' type is also used to represent a relative time in
seconds. However, here the value `-1' is not special and of course
this type is not tied to any definition given in RFC2131. Use this
for representing "offsets" from another point in time that are not
DHCP lease times.
The `time_t' type is the natural Unix type for representing time since
the epoch. Unfortunately, it is affected by stime(2) or adjtime(2)
and since the DHCP client is used during system installation (and thus
when time is typically being configured), the time_t cannot be used in
general to represent an absolute time since the epoch. For instance,
if a time_t were used to keep track of when a lease began, and then a
minute later stime(2) was called to adjust the system clock forward a
year, then the lease would appeared to have expired a year ago even
though it has only been a minute. For this reason, time_t's should
only be used either when wall time must be displayed (such as in
DHCP_STATUS ipc transaction) or when a time meaningful across reboots
must be obtained (such as when caching an ACK packet at system
shutdown).
The `hrtime_t' type returned from gethrtime() works around the
limitations of the time_t in that it is not affected by stime(2) or
adjtime(2), with the disadvantage that it represents time from some
arbitrary time in the past and in nanoseconds. The timer queue code
deals with hrtime_t's directly since that particular piece of code is
meant to be fairly independent of the rest of the DHCP client.
However, dealing with nanoseconds is error-prone when all the other
time types are in seconds. As a result, yet another time type, the
`monosec_t' was created to represent a monotonically increasing time
in seconds, and is really no more than (hrtime_t / NANOSEC). Note
that this unit is typically used where time_t's would've traditionally
been used. The function monosec() in util.c returns the current
monosec, and monosec_to_time() can convert a given monosec to wall
time, using the system's current notion of time.
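As a rough illustration of the relationship between these types, here
is a minimal sketch. It is not the code from util.c: the sketch_* names
are hypothetical, clock_gettime(CLOCK_MONOTONIC) is used as a portable
stand-in for gethrtime(), and a plain int64_t stands in for both
hrtime_t and monosec_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NANOSEC 1000000000LL	/* nanoseconds per second */

/* Stand-in for gethrtime(): nanoseconds since some arbitrary past time. */
int64_t
sketch_hrtime(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);	/* assumed to be available on this system */
	(void) rc;
	return ((int64_t)ts.tv_sec * SKETCH_NANOSEC + ts.tv_nsec);
}

/* Monotonic time in whole seconds: simply hrtime / NANOSEC. */
int64_t
sketch_monosec(void)
{
	return (sketch_hrtime() / SKETCH_NANOSEC);
}

/*
 * Convert a monotonic second count to wall time by measuring how far it
 * is from "now" on the monotonic clock and applying that offset to the
 * system's current notion of wall time.
 */
time_t
sketch_monosec_to_time(int64_t abs_monosec)
{
	return (time(NULL) - (time_t)(sketch_monosec() - abs_monosec));
}
```

Because the conversion is anchored to the current wall time, the result
changes if the system clock is adjusted between calls. That is the
intended behaviour when a value is only needed for display.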
One additional limitation of the `hrtime_t' and `monosec_t' types is
that they are unaware of the passage of time across checkpoint/resume
events (e.g., those generated by sys-suspend(1M)). For example, if
gethrtime() returns time T, and then the machine is suspended for 2
hours, and then gethrtime() is called again, the time returned is not
T + (2 * 60 * 60 * NANOSEC), but rather approximately still T.
To work around this (and other checkpoint/resume related problems),
when a system is resumed, the DHCP client makes the pessimistic
assumption that all finite leases have expired while the machine was
suspended and must be obtained again. This is known as "refreshing"
the leases, and is handled by refresh_smachs().
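The pessimistic policy can be sketched as follows. This is not the
implementation of refresh_smachs(); struct sketch_lease and
sketch_refresh_on_resume() are hypothetical, and the only fact assumed
from RFC 2131 is that a lease time of 0xffffffff means "infinite".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LEASE_INFINITE UINT32_MAX	/* (uint32_t)-1 */

/* Hypothetical per-lease record; the real client keeps far more state. */
struct sketch_lease {
	uint32_t duration;	/* lease time in seconds */
	bool needs_refresh;	/* set when the lease must be re-obtained */
};

/*
 * On resume, assume every finite lease expired while suspended and mark
 * it for refresh. Infinite leases cannot have expired and are skipped.
 * Returns the number of leases marked.
 */
size_t
sketch_refresh_on_resume(struct sketch_lease *leases, size_t n)
{
	size_t marked = 0;

	assert(leases != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (leases[i].duration == SKETCH_LEASE_INFINITE)
			continue;
		leases[i].needs_refresh = true;
		marked++;
	}
	return (marked);
}
```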
Note that it might seem that a more intelligent approach would be to
record the time(2) when the system is suspended, compare that against
the time(2) when the system is resumed, and use the delta between them
to decide which leases have expired. Sadly, this cannot be done since,
through at least Solaris 10, it is not possible for userland programs
to be notified of system suspend events.
Configuration
-------------
For the most part, the DHCP client only *retrieves* configuration data
from the DHCP server, leaving the configuration to scripts (such as
boot scripts), which themselves use dhcpinfo(1) to retrieve the data
from the DHCP client. This is desirable because it keeps the mechanism
of retrieving the configuration data decoupled from the policy of using
the data.
However, unless used in "inform" mode, the DHCP client *does*
configure each IP interface enough to allow it to communicate with
other hosts. Specifically, the DHCP client configures the interface's
IP address, netmask, and broadcast address using the information
provided by the server. Further, for IPv4 logical interface 0
(for example, "hme0"), any provided default routes are also configured.
For IPv6, only the IP addresses are set. The netmask (prefix) is then
set automatically by in.ndpd, and routes are discovered in the usual
way by router discovery or routing protocols. DHCPv6 doesn't set
routes.
Since logical interfaces cannot be specified as output interfaces in
the kernel forwarding table, and in most cases, logical interfaces
share a default route with their associated physical interface, the
DHCP client does not automatically add or remove default routes when
IPv4 leases are acquired or expired on logical interfaces.
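The default-route rule described in this section can be summarized as a
small predicate. This is only a sketch of the policy, not code from the
DHCP client: sketch_should_set_default_route() is a hypothetical name,
and it assumes the usual naming where "hme0:1" is a logical interface
on "hme0" and "hme0:0" is another spelling of logical interface 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether default routes from the server would be configured.
 * Only IPv4 leases on logical interface 0 qualify; DHCPv6 never sets
 * routes, and other logical interfaces share the physical interface's
 * default route.
 */
bool
sketch_should_set_default_route(const char *ifname, bool isv6)
{
	const char *colon;

	assert(ifname != NULL);
	if (isv6)
		return (false);

	colon = strchr(ifname, ':');
	if (colon == NULL)
		return (true);		/* e.g. "hme0" */
	return (strcmp(colon, ":0") == 0);	/* "hme0:0", assumed same */
}
```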
Event Scripting
---------------
The DHCP client supports user program invocations on DHCP events. The
supported events are BOUND, EXTEND, EXPIRE, DROP, RELEASE, and INFORM
for DHCPv4, and BOUND6, EXTEND6, EXPIRE6, DROP6, LOSS6, RELEASE6, and
INFORM6 for DHCPv6. The user program runs asynchronously to the DHCP
client so that the main event loop stays active to process other
events, including events triggered by the user program (for example,
when it invokes dhcpinfo).
The user program execution is part of the transaction of a DHCP command.
For example, if the user program is not enabled, the transaction of the
DHCP command START is considered over when an ACK is received and the
interface is configured successfully. If the user program is enabled,
it is invoked after the interface is configured successfully, and the
transaction is considered over only when the user program exits. The
event scripting implementation makes use of the asynchronous operations
discussed in the "Transactions" section.
An upper bound of 58 seconds is imposed on how long the user program
can run. If the user program does not exit after 55 seconds, the signal
SIGTERM is sent to it. If it still does not exit after an additional 3
seconds, the signal SIGKILL is sent to it. Since the event handler is
a wrapper around poll(), the DHCP client cannot directly observe the
completion of the user program. Instead, the DHCP client creates a
child "helper" process to synchronously monitor the user program (this
process is also used to send the aforementioned signals to the user program,
if necessary). The DHCP client and the helper process share a pipe
which is included in the set of poll descriptors monitored by the DHCP
client's event handler. When the user program exits, the helper process
passes the user program exit status to the DHCP client through the pipe,
informing the DHCP client that the user program has finished. When the
DHCP client is asked to shut down, it will wait for any running instances
of the user program to complete.
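The helper-process arrangement can be sketched as below. This is a
simplified illustration, not the client's script handling code: all
sketch_* names are hypothetical, the timeouts are parameters (the text
above gives 55 and 3 seconds), and the helper waits by polling
waitpid() every 100 ms, which may differ from the real mechanism.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wait up to `secs' seconds for pid; returns 1 if it was reaped. */
static int
sketch_wait_timeout(pid_t pid, int secs, int *status)
{
	struct timespec tick = { 0, 100000000 };	/* 100 ms */

	for (int i = 0; i < secs * 10; i++) {
		if (waitpid(pid, status, WNOHANG) == pid)
			return (1);
		(void) nanosleep(&tick, NULL);
	}
	return (waitpid(pid, status, WNOHANG) == pid);
}

/* Helper process body: run the program, enforce limits, report status. */
static void
sketch_helper(int wfd, char *const argv[], int term_secs, int kill_secs)
{
	int status = -1;
	pid_t prog = fork();

	if (prog == 0) {
		(void) close(wfd);	/* user program must not hold the pipe */
		(void) execvp(argv[0], argv);
		_exit(127);
	}
	if (prog > 0 && !sketch_wait_timeout(prog, term_secs, &status)) {
		(void) kill(prog, SIGTERM);
		if (!sketch_wait_timeout(prog, kill_secs, &status)) {
			(void) kill(prog, SIGKILL);
			(void) waitpid(prog, &status, 0);
		}
	}
	if (write(wfd, &status, sizeof (status)) != (ssize_t)sizeof (status))
		_exit(1);
	_exit(0);
}

/*
 * Client side: start the helper and return the pipe's read end, which
 * the caller adds to its poll() set. Returns -1 on failure.
 */
int
sketch_script_start(char *const argv[], int term_secs, int kill_secs,
    pid_t *helper)
{
	int fds[2];
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL && helper != NULL);
	if (pipe(fds) == -1)
		return (-1);
	if ((pid = fork()) == -1) {
		(void) close(fds[0]);
		(void) close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		(void) close(fds[0]);
		sketch_helper(fds[1], argv, term_secs, kill_secs);
	}
	(void) close(fds[1]);
	*helper = pid;
	return (fds[0]);
}

/* Called once poll() reports the pipe readable; returns wait status. */
int
sketch_script_done(int rfd, pid_t helper)
{
	int status = -1;

	if (read(rfd, &status, sizeof (status)) != (ssize_t)sizeof (status))
		status = -1;
	(void) close(rfd);
	(void) waitpid(helper, NULL, 0);	/* reap the helper itself */
	return (status);
}
```

In this sketch the event loop never blocks on the user program: it only
sees a readable descriptor once the helper has written the wait status,
which the caller can inspect with WIFEXITED() and WIFSIGNALED().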