
README

CDDL HEADER START

The contents of this file are subject to the terms of the
Common Development and Distribution License (the "License").
You may not use this file except in compliance with the License.

You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
or http://www.opensolaris.org/os/licensing.
See the License for the specific language governing permissions
and limitations under the License.

When distributing Covered Code, include this CDDL HEADER in each
file and include the License file at usr/src/OPENSOLARIS.LICENSE.
If applicable, add the following below this CDDL HEADER, with the
fields enclosed by brackets "[]" replaced with your own identifying
information: Portions Copyright [yyyy] [name of copyright owner]

CDDL HEADER END

Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

Architectural Overview for the DHCP agent
Peter Memishian
ident   "%Z%%M% %I% %E% SMI"

INTRODUCTION
============

The Solaris DHCP agent (dhcpagent) is a DHCP client implementation
compliant with RFCs 2131, 3315, and others.  The major forces shaping
its design were:

    * Must be capable of managing multiple network interfaces.
    * Must consume little CPU, since it will always be running.
    * Must have a small memory footprint, since it will always be
      running.
    * Must not rely on any shared libraries outside of /lib, since
      it must run before all filesystems have been mounted.

When a DHCP agent implementation is only required to control a single
interface on a machine, the problem is expressed well as a simple
state-machine, as shown in RFC2131.  However, when a DHCP agent is
responsible for managing more than one interface at a time, the
problem becomes much more complicated.

This can be resolved using threads or with an event-driven model.
Given that DHCP's behavior can be expressed concisely as a state
machine, the event-driven model is the closest match.

While tried-and-true, that model is subtle and easy to get wrong.
Indeed, much of the agent's code is there to manage the complexity of
programming in an asynchronous event-driven paradigm.

THE BASICS
==========

The DHCP agent consists of roughly 30 source files, most with a
companion header file.  While the largest source file is around 1700
lines, most are much shorter.  The source files can largely be broken
up into three groups:

    * Source files that, along with their companion header files,
      define an abstract "object" that is used by other parts of
      the system.  Examples include "packet.c", which along with
      "packet.h" provide a Packet object for use by the rest of
      the agent; and "async.c", which along with "async.h" defines
      an interface for managing asynchronous transactions within
      the agent.

    * Source files that implement a given state of the agent; for
      instance, there is a "request.c" which comprises all of
      the procedural "work" which must be done while in the
      REQUESTING state of the agent.  By encapsulating states in
      files, it becomes easier to debug errors in the
      client/server protocol and adapt the agent to new
      constraints, since all the relevant code is in one place.

    * Source files that, along with their companion header files,
      encapsulate a given task or related set of tasks.  The
      difference between this and the first group is that the
      interfaces exported from these files do not operate on
      an "object", but rather perform a specific task.  One
      example is "defaults.c", which provides a useful interface
      to /etc/default/dhcpagent file operations.

OVERVIEW
========

Here we discuss the essential objects and subtle aspects of the
DHCP agent implementation.  Note that there is of course much more
that is not discussed here, but after this overview you should be able
to fend for yourself in the source code.

For details on the DHCPv6 aspects of the design, and how this relates
to the implementation present in previous releases of Solaris, see the
README.v6 file.

Event Handlers and Timer Queues
-------------------------------

The most important object in the agent is the event handler, whose
interface is in libinetutil.h and whose implementation is in
libinetutil.  The event handler is essentially an object-oriented
wrapper around poll(2): other components of the agent can register to
be called back when specific events on file descriptors happen -- for
instance, to wait for requests to arrive on its IPC socket, the agent
registers a callback function (accept_event()) that will be called
back whenever a new connection arrives on the file descriptor
associated with the IPC socket.  When the agent initially begins in
main(), it registers a number of events with the event handler, and
then calls iu_handle_events(), which proceeds to wait for events to
happen -- this function does not return until the agent is shut down
via signal.

When the registered events occur, the callback functions are called
back, which in turn might lead to additional callbacks being
registered -- this is the classic event-driven model.  (As an aside,
note that programming in an event-driven model means that callbacks
cannot block, or else the agent will become unresponsive.)

A special kind of "event" is a timeout.  Since there are many timers
which must be maintained for each DHCP-controlled interface (such as a
lease expiration timer, time-to-first-renewal (t1) timer, and so
forth), an object-oriented abstraction to timers called a "timer
queue" is provided, whose interface is in libinetutil.h with a
corresponding implementation in libinetutil.  The timer queue allows
callback functions to be "scheduled" for callback after a certain
amount of time has passed.

The event handler and timer queue objects work hand-in-hand: the event
handler is passed a pointer to a timer queue in iu_handle_events() --
from there, it can use the iu_earliest_timer() routine to find the
timer which will next fire, and use this to set its timeout value in
its call to poll(2).  If poll(2) returns due to a timeout, the event
handler calls iu_expire_timers() to run the callbacks of all timers
that have expired (note that more than one may have expired if, for
example, multiple timers were set to expire at the same time).
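
The interplay described above can be sketched as follows.  This is a
minimal illustration under stated assumptions, not the libinetutil
code: the type and function names (sketch_tq_t, schedule_timer(),
earliest_timeout_ms(), expire_timers(), bump_counter()) and the
fixed-size array are invented for the example.  Only the meaning of
the poll(2) timeout argument (milliseconds, with -1 meaning "wait
indefinitely") is taken from the real interface.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define	MAX_TIMERS	8

typedef void timer_cb_t(void *arg);

typedef struct {
	int		in_use;
	long long	expire_ms;	/* absolute expiry, in milliseconds */
	timer_cb_t	*cb;
	void		*arg;
} sketch_timer_t;

typedef struct {
	sketch_timer_t	timers[MAX_TIMERS];
} sketch_tq_t;

/* Schedule cb(arg) for absolute time `when_ms'; -1 if the queue is full. */
int
schedule_timer(sketch_tq_t *tq, long long when_ms, timer_cb_t *cb, void *arg)
{
	assert(tq != NULL && cb != NULL);
	for (int i = 0; i < MAX_TIMERS; i++) {
		if (!tq->timers[i].in_use) {
			tq->timers[i] = (sketch_timer_t){ 1, when_ms, cb, arg };
			return (i);
		}
	}
	return (-1);
}

/*
 * Return the timeout to hand to poll(2): milliseconds until the earliest
 * timer fires, 0 if one is already due, or -1 (block) if none is set.
 */
int
earliest_timeout_ms(const sketch_tq_t *tq, long long now_ms)
{
	long long best = -1;

	for (int i = 0; i < MAX_TIMERS; i++) {
		const sketch_timer_t *t = &tq->timers[i];
		long long delta;

		if (!t->in_use)
			continue;
		delta = t->expire_ms > now_ms ? t->expire_ms - now_ms : 0;
		if (best == -1 || delta < best)
			best = delta;
	}
	if (best > INT_MAX)
		best = INT_MAX;
	return ((int)best);
}

/* Run and remove every timer due at `now_ms'; returns how many fired. */
int
expire_timers(sketch_tq_t *tq, long long now_ms)
{
	int fired = 0;

	for (int i = 0; i < MAX_TIMERS; i++) {
		sketch_timer_t *t = &tq->timers[i];

		if (t->in_use && t->expire_ms <= now_ms) {
			t->in_use = 0;	/* clear first: cb may reschedule */
			t->cb(t->arg);
			fired++;
		}
	}
	return (fired);
}

/* Example callback: bump the int counter that `arg' points to. */
void
bump_counter(void *arg)
{
	(*(int *)arg)++;
}
```

An event loop built on this would call earliest_timeout_ms() before
each poll(2) and expire_timers() whenever poll(2) returns 0.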

Although it is possible to instantiate more than one timer queue or
event handler object, it doesn't make a lot of sense -- these objects
are really "singletons".  Accordingly, the agent has two global
variables, `eh' and `tq', which store pointers to the global event
handler and timer queue.

Network Interfaces
------------------

For each network interface managed by the agent, there is a set of
associated state that describes both its general properties (such as
the maximum MTU) and its connections to DHCP-related state (the
protocol state machines).  This state is stored in a pair of
structures called `dhcp_pif_t' (the IP physical interface layer or
PIF) and `dhcp_lif_t' (the IP logical interface layer or LIF).  Each
dhcp_pif_t represents a single physical interface, such as "hme0," for
a given IP protocol version (4 or 6), and has a list of dhcp_lif_t
structures representing the logical interfaces (such as "hme0:1") in
use by the agent.

This split is important because of differences between IPv4 and IPv6.
For IPv4, each DHCP state machine manages a single IP address and
associated configuration data.  This corresponds to a single logical
interface, which must be specified by the user.  For IPv6, however,
each DHCP state machine manages a group of addresses, and is
associated with a DUID value rather than with just an interface.

Thus, DHCPv6 behaves more like in.ndpd in its creation of "ADDRCONF"
interfaces.  The agent automatically plumbs logical interfaces when
needed and removes them when the addresses expire.

The state for a given session is stored separately in `dhcp_smach_t'.
This state machine then points to the main LIF used for I/O, and to a
list of `dhcp_lease_t' structures representing individual leases, and
each of those points to a list of LIFs corresponding to the individual
addresses being managed.
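
The relationship between a state machine, its leases, and the LIFs
behind each lease can be pictured with the sketch below.  The
structures are deliberately stripped down and are not the agent's
real definitions; the sk_* type names, the field names, and
smach_addr_count() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_lif {
	const char	*lif_name;	/* e.g. "hme0:1" */
	struct sk_lif	*lif_next;
} sk_lif_t;

typedef struct sk_lease {
	sk_lif_t	*dl_lifs;	/* addresses under this lease */
	struct sk_lease	*dl_next;
} sk_lease_t;

typedef struct {
	sk_lif_t	*dsm_lif;	/* main LIF used for I/O */
	sk_lease_t	*dsm_leases;	/* list of leases */
} sk_smach_t;

/* Count the addresses (LIFs) managed by one state machine. */
size_t
smach_addr_count(const sk_smach_t *sm)
{
	size_t n = 0;

	assert(sm != NULL);
	for (const sk_lease_t *dl = sm->dsm_leases; dl != NULL;
	    dl = dl->dl_next) {
		for (const sk_lif_t *lif = dl->dl_lifs; lif != NULL;
		    lif = lif->lif_next)
			n++;
	}
	return (n);
}
```

In this picture, a DHCPv4 state machine would have a single lease
with a single LIF, while a DHCPv6 state machine may have several of
each.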

One point that was brushed over in the preceding discussion of event
handlers and timer queues was context.  Recall that the event-driven
nature of the agent requires that functions cannot block, lest they
starve out others and impact the observed responsiveness of the agent.
As an example, consider the process of extending a lease: the agent
must send a REQUEST packet and wait for an ACK or NAK packet in
response.  This is done by sending a REQUEST and then returning to the
event handler that waits for an ACK or NAK packet to arrive on the
file descriptor associated with the interface.  Note, however, that
when the ACK or NAK does arrive and the callback function is called
back, it must know which state machine this packet is for (it must get
back its context).  This could be handled through an ad-hoc mapping of
file descriptors to state machines, but a cleaner approach is to have
the event handler's register function (iu_register_event()) take in an
opaque context pointer, which will then be passed back to the
callback.  In the agent, the context pointer used depends on the
nature of the event: events on LIFs use the dhcp_lif_t pointer, events
on the state machine use dhcp_smach_t, and so on.
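
The context-pointer idea can be sketched with a toy event handler
built directly on poll(2).  This is an illustration only: the sk_*
names and signatures are hypothetical and do not reproduce the
libinetutil interface.  The point is that the pointer supplied at
registration time is handed back, untouched, to the callback.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define	SK_MAX_EVENTS	8

typedef void sk_event_cb_t(int fd, short revents, void *arg);

typedef struct {
	struct pollfd	pfds[SK_MAX_EVENTS];
	sk_event_cb_t	*cbs[SK_MAX_EVENTS];
	void		*args[SK_MAX_EVENTS];
	nfds_t		count;
} sk_eh_t;

/* Register cb(fd, revents, arg); returns a slot id, or -1 if full. */
int
sk_register_event(sk_eh_t *eh, int fd, short events, sk_event_cb_t *cb,
    void *arg)
{
	assert(eh != NULL && cb != NULL);
	if (eh->count == SK_MAX_EVENTS)
		return (-1);
	eh->pfds[eh->count] = (struct pollfd){ .fd = fd, .events = events };
	eh->cbs[eh->count] = cb;
	eh->args[eh->count] = arg;	/* opaque: never dereferenced here */
	return ((int)eh->count++);
}

/* One pass of the loop: wait up to timeout_ms, dispatch ready callbacks. */
int
sk_handle_once(sk_eh_t *eh, int timeout_ms)
{
	int n = poll(eh->pfds, eh->count, timeout_ms);

	for (nfds_t i = 0; n > 0 && i < eh->count; i++) {
		if (eh->pfds[i].revents != 0) {
			eh->cbs[i](eh->pfds[i].fd, eh->pfds[i].revents,
			    eh->args[i]);
		}
	}
	return (n);
}

/* Example callback: the context says whose packet counter to bump. */
void
sk_count_packet(int fd, short revents, void *arg)
{
	char c;

	(void) revents;
	if (read(fd, &c, 1) == 1)
		(*(int *)arg)++;
}
```

The same callback can serve any number of descriptors, because
everything it needs to know arrives through `arg'.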

Note that there is nothing that guarantees the pointer passed into
iu_register_event() or iu_schedule_timer() will still be valid when
the callback is called back (for instance, the memory may have been
freed in the meantime).  To solve this problem, all of the data
structures used in this way are reference counted.  For more details
on how the reference count scheme is implemented, see the closing
comments in interface.h regarding memory management.
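
As a rough sketch of the idea (not the scheme in interface.h, whose
details are not reproduced here), a hold is taken when a pointer is
handed to a timer or event, and dropped in the callback before the
object is used.  All names below (sketch_obj_t, obj_create(),
obj_hold(), obj_release(), timer_fired()) and the `removed' flag are
assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted agent structure. */
typedef struct {
	unsigned int	hold_count;
	int		removed;	/* set once unlinked from all lists */
} sketch_obj_t;

sketch_obj_t *
obj_create(void)
{
	sketch_obj_t *op = calloc(1, sizeof (*op));

	if (op != NULL)
		op->hold_count = 1;	/* reference owned by the creator */
	return (op);
}

void
obj_hold(sketch_obj_t *op)
{
	assert(op != NULL && op->hold_count > 0);
	op->hold_count++;
}

/* Drop one reference; returns 1 if this freed the object, else 0. */
int
obj_release(sketch_obj_t *op)
{
	assert(op != NULL && op->hold_count > 0);
	if (--op->hold_count == 0) {
		free(op);
		return (1);
	}
	return (0);
}

/*
 * Timer callback pattern: a hold was taken when the timer was scheduled,
 * so `arg' still points at valid memory here.  Drop that hold first; if it
 * was the last reference, or the object was removed in the meantime, there
 * is nothing left to do.  Returns 1 if the real work would proceed.
 */
int
timer_fired(void *arg)
{
	sketch_obj_t *op = arg;
	int removed = op->removed;	/* read before a possible free */

	if (obj_release(op) || removed)
		return (0);
	/* ... safe to use op here ... */
	return (1);
}
```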

Transactions
------------

Many operations performed via DHCP must be performed in groups -- for
instance, acquiring a lease requires several steps: sending a
DISCOVER, collecting OFFERs, selecting an OFFER, sending a REQUEST,
and receiving an ACK, assuming everything goes well.  Note however
that due to the event-driven model the agent operates in, these
operations are not inherently "grouped" -- instead, the agent sends a
DISCOVER, goes back into the main event loop, waits for events
(perhaps even requests on the IPC channel to begin acquiring a lease
on another state machine), eventually checks to see if an acceptable
OFFER has come in, and so forth.  To some degree, the notion of the
state machine's current state (SELECTING, REQUESTING, etc) helps
control the potential chaos of the event-driven model (for instance,
if while the agent is waiting for an OFFER on a given state machine,
an IPC event comes in requesting that the leases be RELEASED, the
agent knows to send back an error since the state machine must be in
at least the BOUND state before a RELEASE can be performed).

However, states are not enough -- for instance, suppose that the agent
begins trying to renew a lease.  This is done by sending a REQUEST
packet and waiting for an ACK or NAK, which might never come.  If,
while waiting for the ACK or NAK, the user sends a request to renew
the lease as well, then if the agent were to send another REQUEST,
things could get quite complicated (and this is only the beginning of
this rathole).  To protect against this, two objects exist:
`async_action' and `ipc_action'.  These objects are related, but
independent of one another; the more essential object is the
`async_action', which we will discuss first.

In short, an `async_action' represents a pending transaction (aka
asynchronous action), of which each state machine can have at most
one.  The `async_action' structure is embedded in the `dhcp_smach_t'
structure, which is fine since there can be at most one pending
transaction per state machine.  Typical "asynchronous transactions"
are START, EXTEND, and INFORM, since each consists of a sequence of
packets that must be done without interruption.  Note that not all
DHCP operations are "asynchronous" -- for instance, a DHCPv4 RELEASE
operation is synchronous (not asynchronous) since after the RELEASE is
sent no reply is expected from the DHCP server, but DHCPv6 Release is
asynchronous, as all DHCPv6 messages are transactional.  Some
operations, such as status query, are synchronous and do not affect
the system state, and thus do not require sequencing.

When the agent realizes it must perform an asynchronous transaction,
it calls async_start() to open the transaction.  If one is already
pending, then the new transaction must fail (the details of failure
depend on how the transaction was initiated, which is described in
more detail later when the `ipc_action' object is discussed).  If
there is no pending asynchronous transaction, the operation succeeds.

When the transaction is complete, either async_finish() or
async_cancel() must be called to complete or cancel the asynchronous
action on that state machine.  If the transaction is unable to
complete within a certain amount of time (more on this later), a timer
should be used to cancel the operation.

The notion of asynchronous transactions is complicated by the fact
that they may originate from both inside and outside of the agent.
For instance, a user initiates an asynchronous START transaction when
he performs an `ifconfig hme0 dhcp start', but the agent will
internally need to perform asynchronous EXTEND transactions to extend
the lease before it expires.  Note that user-initiated actions always
have priority over internal actions: the former will cancel the
latter, if necessary.
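
The open/finish/cancel protocol and the priority rule can be
sketched as below.  This is a simplified model rather than the code
in async.c: the sketch_* names, the enum, and the field names are
hypothetical, and the rule that a pending user-initiated action is
never displaced is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { CMD_NONE, CMD_START, CMD_EXTEND, CMD_INFORM } sketch_cmd_t;

typedef struct {
	sketch_cmd_t	as_cmd;		/* pending command, or CMD_NONE */
	bool		as_user;	/* true if user-initiated (via IPC) */
} sketch_async_t;

typedef struct {
	sketch_async_t	sm_async;	/* embedded: at most one per machine */
} sketch_smach_t;

bool sketch_async_cancel(sketch_smach_t *);

/* Open a transaction; fails if one is pending and cannot be displaced. */
bool
sketch_async_start(sketch_smach_t *sm, sketch_cmd_t cmd, bool user)
{
	assert(sm != NULL && cmd != CMD_NONE);
	if (sm->sm_async.as_cmd != CMD_NONE) {
		/* A user request may displace an internal one only. */
		if (!user || !sketch_async_cancel(sm))
			return (false);
	}
	sm->sm_async.as_cmd = cmd;
	sm->sm_async.as_user = user;
	return (true);
}

/* Close the pending transaction after it has run to completion. */
void
sketch_async_finish(sketch_smach_t *sm)
{
	assert(sm != NULL && sm->sm_async.as_cmd != CMD_NONE);
	sm->sm_async.as_cmd = CMD_NONE;
}

/* Cancel the pending action (assumed: never a user-initiated one). */
bool
sketch_async_cancel(sketch_smach_t *sm)
{
	assert(sm != NULL);
	if (sm->sm_async.as_cmd == CMD_NONE || sm->sm_async.as_user)
		return (false);
	sm->sm_async.as_cmd = CMD_NONE;
	return (true);
}
```

Embedding the action in the state machine structure makes the "at
most one" rule structural: there is simply nowhere to store a second
pending transaction.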

This leads us into the `ipc_action' object.  An `ipc_action'
represents the IPC-related pieces of an asynchronous transaction that
was started as a result of a user request, as well as the `BUSY' state
of the administrative interface.  Only IPC-generated asynchronous
transactions have a valid `ipc_action' object.  Note that since there
can be at most one asynchronous action per state machine, there can
also be at most one `ipc_action' per state machine (this means it can
also conveniently be embedded inside the `dhcp_smach_t' structure).

One of the main purposes of the `ipc_action' object is to time out user
events.  When the user specifies a timeout value as an argument to
ifconfig, he is specifying an `ipc_action' timeout; in other words,
how long he is willing to wait for the command to complete.  When this
time expires, the ipc_action is terminated, as well as the
asynchronous operation.

The API provided for the `ipc_action' object is quite similar to the
one for the `async_action' object: when an IPC request comes in for an
operation requiring asynchronous operation, ipc_action_start() is
called.  When the request completes, ipc_action_finish() is called.
If the user times out before the request completes, then
ipc_action_timeout() is called.

Packet Management
-----------------

Another complicated area is packet management: building, manipulating,
sending and receiving packets.  These operations are all encapsulated
behind a dozen or so interfaces (see packet.h) that abstract the
unimportant details away from the rest of the agent code.  In order to
send a DHCP packet, code first calls init_pkt(), which returns a
dhcp_pkt_t initialized suitably for transmission.  Note that currently
init_pkt() returns a dhcp_pkt_t that is actually allocated as part of
the `dhcp_smach_t', but this may change in the future.  After calling
init_pkt(), the add_pkt_opt*() functions are used to add options to
the DHCP packet.  Finally, send_pkt() and send_pkt_v6() can be used to
transmit the packet to a given IP address.

The send_pkt() function handles the details of packet timeout and
retransmission.  The last argument to send_pkt() is a pointer to a
"stop function."  If this argument is passed as NULL, then the packet
will only be sent once (it won't be retransmitted).  Otherwise, the
stop function will be called back before each retransmission.  The
callback may alter dsm_send_timeout if necessary
to place a cap on the next timeout; this is done for DHCPv6 in
stop_init_reboot() in order to implement the CNF_MAX_RD constraint.

The return value from this function indicates whether to continue
retransmission or not, which allows the send_pkt() caller to control
the retransmission policy without making it have to deal with the
retransmission mechanism.  See request.c for an example of this in
action.
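
The split between retransmission mechanism and policy can be
sketched as follows.  This is not the send_pkt() implementation: the
sketch_* names, the stop function signature, and the simple doubling
backoff are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true to stop retransmitting; n_sent counts packets sent so far. */
typedef bool stop_func_t(void *ctx, unsigned int n_sent);

typedef struct {
	unsigned int	timeout_ms;	/* wait before next retransmission */
	unsigned int	n_sent;
} sketch_xmit_t;

/* Initial transmission; the stop function is not consulted for it. */
void
sketch_send(sketch_xmit_t *xp, unsigned int first_timeout_ms)
{
	assert(xp != NULL);
	xp->n_sent = 1;
	xp->timeout_ms = first_timeout_ms;
}

/*
 * Called when the retransmission timer fires with no reply.  Returns true
 * if the packet was sent again, false if retransmission has ended.
 */
bool
sketch_retransmit(sketch_xmit_t *xp, stop_func_t *stop, void *ctx)
{
	assert(xp != NULL && xp->n_sent > 0);
	if (stop == NULL || stop(ctx, xp->n_sent))
		return (false);		/* NULL means "send only once" */
	xp->n_sent++;
	xp->timeout_ms *= 2;		/* assumed backoff: simple doubling */
	return (true);
}

/* Example policy: give up once the limit pointed to by ctx is reached. */
bool
stop_after_limit(void *ctx, unsigned int n_sent)
{
	return (n_sent >= *(unsigned int *)ctx);
}
```

The caller decides when to give up (here, after a fixed number of
packets) without ever touching the timer or backoff logic.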

The recv_pkt() function is simpler but still complicated by the fact
that one may want to receive several different types of packets at
once.  The caller registers an event handler on the file descriptor,
and then calls recv_pkt() to read in the packet along with meta
information about the message (the sender and interface identifier).

For IPv6, packet reception is done with a single socket, using
IPV6_PKTINFO to determine the actual destination address and receiving
interface.  Packets are then matched against the state machines on the
given interface through the transaction ID.

For IPv4, due to oddities in the DHCP specification (discussed in
PSARC/2007/571), a special IP_DHCPINIT_IF socket option must be used
to allow unicast DHCP traffic to be received on an interface during
lease acquisition.  Since the IP_DHCPINIT_IF socket option can only
enable one interface at a time, one socket must be used per interface.

Time
----

The notion of time is an exceptionally subtle area.  You will notice
five ways that time is represented in the source: as lease_t's,
uint32_t's, time_t's, hrtime_t's, and monosec_t's.  Each of these
types serves a slightly different function.

The `lease_t' type is the simplest to understand; it is the unit of
time in the CD_{LEASE,T1,T2}_TIME options in a DHCP packet, as defined
by RFC2131. This is defined as a positive number of seconds (relative
to some fixed point in time) or the value `-1' (DHCP_PERM) which
represents infinity (i.e., a permanent lease).  The lease_t should be
used either when dealing with actual DHCP packets that are sent on the
wire or for variables which follow the exact definition given in the
RFC.

The `uint32_t' type is also used to represent a relative time in
seconds.  However, here the value `-1' is not special and of course
this type is not tied to any definition given in RFC2131.  Use this
for representing "offsets" from another point in time that are not
DHCP lease times.
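
The practical consequence of the special `-1' value is that code
consuming a lease time must check for it before doing arithmetic.  A
minimal sketch, assuming a 32-bit unsigned representation and using
hypothetical names (sketch_lease_t, SKETCH_PERM, lease_end()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_lease_t;
#define	SKETCH_PERM	((sketch_lease_t)-1)	/* all ones: infinite lease */

/*
 * Compute the second at which a lease ends, given when it started.
 * Returns false for a permanent lease, in which case *endp is untouched
 * and no expiration timer should be scheduled.
 */
bool
lease_end(int64_t start_sec, sketch_lease_t lease, int64_t *endp)
{
	assert(endp != NULL);
	if (lease == SKETCH_PERM)
		return (false);
	*endp = start_sec + (int64_t)lease;
	return (true);
}
```

A plain uint32_t offset, by contrast, would be added unconditionally,
since for it the all-ones value is just a large number.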

The `time_t' type is the natural Unix type for representing time since
the epoch.  Unfortunately, it is affected by stime(2) or adjtime(2)
and since the DHCP client is used during system installation (and thus
when time is typically being configured), the time_t cannot be used in
general to represent an absolute time since the epoch.  For instance,
if a time_t were used to keep track of when a lease began, and then a
minute later stime(2) was called to adjust the system clock forward a
year, then the lease would appear to have expired a year ago even
though it has only been a minute.  For this reason, time_t's should
only be used either when wall time must be displayed (such as in a
DHCP_STATUS IPC transaction) or when a time meaningful across reboots
must be obtained (such as when caching an ACK packet at system
shutdown).

The `hrtime_t' type returned from gethrtime() works around the
limitations of the time_t in that it is not affected by stime(2) or
adjtime(2), with the disadvantage that it represents time from some
arbitrary time in the past and in nanoseconds.  The timer queue code
deals with hrtime_t's directly since that particular piece of code is
meant to be fairly independent of the rest of the DHCP client.

However, dealing with nanoseconds is error-prone when all the other
time types are in seconds.  As a result, yet another time type, the
`monosec_t' was created to represent a monotonically increasing time
in seconds, and is really no more than (hrtime_t / NANOSEC).  Note
that this unit is typically used where time_t's would've traditionally
been used.  The function monosec() in util.c returns the current
monosec, and monosec_to_time() can convert a given monosec to wall
time, using the system's current notion of time.
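
The relationship between these types can be sketched portably as
shown below.  Since gethrtime() is not available everywhere, the
sketch substitutes POSIX clock_gettime() with CLOCK_MONOTONIC; that
substitution, and all of the sketch_* names, are assumptions of the
example rather than the code in util.c.

```c
#define	_POSIX_C_SOURCE	200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define	SKETCH_NANOSEC	1000000000LL

typedef int64_t sketch_hrtime_t;	/* nanoseconds since arbitrary point */
typedef int64_t sketch_monosec_t;	/* seconds since the same point */

/* Stand-in for gethrtime(): monotonic nanoseconds, or -1 on failure. */
sketch_hrtime_t
sketch_gethrtime(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return (-1);
	return ((sketch_hrtime_t)ts.tv_sec * SKETCH_NANOSEC + ts.tv_nsec);
}

/* The current monotonic time, in whole seconds. */
sketch_monosec_t
sketch_monosec(void)
{
	return (sketch_gethrtime() / SKETCH_NANOSEC);
}

/*
 * Convert a monotonic timestamp to wall time using the offset between the
 * two clocks as of right now, so the result reflects the system's current
 * notion of time rather than the one in effect when it was recorded.
 */
time_t
sketch_monosec_to_time(sketch_monosec_t abs_monosec)
{
	assert(abs_monosec >= 0);
	return (time(NULL) - (time_t)(sketch_monosec() - abs_monosec));
}
```

Because the conversion is done at display time, a lease start
recorded before a clock adjustment still maps to a sensible wall
time afterwards.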

One additional limitation of the `hrtime_t' and `monosec_t' types is
that they are unaware of the passage of time across checkpoint/resume
events (e.g., those generated by sys-suspend(1M)).  For example, if
gethrtime() returns time T, and then the machine is suspended for 2
hours, and then gethrtime() is called again, the time returned is not
T + (2 * 60 * 60 * NANOSEC), but rather approximately still T.

To work around this (and other checkpoint/resume related problems),
when a system is resumed, the DHCP client makes the pessimistic
assumption that all finite leases have expired while the machine was
suspended and must be obtained again.  This is known as "refreshing"
the leases, and is handled by refresh_smachs().

Note that a more intelligent approach would appear to be to
record the time(2) when the system is suspended, compare that against
the time(2) when the system is resumed, and use the delta between them
to decide which leases have expired.  Sadly, this cannot be done since
through at least Solaris 10, it is not possible for userland programs
to be notified of system suspend events.

Configuration
-------------

For the most part, the DHCP client only *retrieves* configuration data
from the DHCP server, leaving the configuration to scripts (such as
boot scripts), which themselves use dhcpinfo(1) to retrieve the data
from the DHCP client.  This is desirable because it keeps the mechanism
of retrieving the configuration data decoupled from the policy of using
the data.

However, unless used in "inform" mode, the DHCP client *does*
configure each IP interface enough to allow it to communicate with
other hosts.  Specifically, the DHCP client configures the interface's
IP address, netmask, and broadcast address using the information
provided by the server.  Further, for IPv4 logical interface 0
("hme0"), any provided default routes are also configured.

For IPv6, only the IP addresses are set.  The netmask (prefix) is then
set automatically by in.ndpd, and routes are discovered in the usual
way by router discovery or routing protocols.  DHCPv6 doesn't set
routes.

Since logical interfaces cannot be specified as output interfaces in
the kernel forwarding table, and in most cases, logical interfaces
share a default route with their associated physical interface, the
DHCP client does not automatically add or remove default routes when
IPv4 leases are acquired or expired on logical interfaces.

Event Scripting
---------------

The DHCP client supports user program invocations on DHCP events.  The
supported events are BOUND, EXTEND, EXPIRE, DROP, RELEASE, and INFORM
for DHCPv4, and BOUND6, EXTEND6, EXPIRE6, DROP6, LOSS6, RELEASE6, and
INFORM6 for DHCPv6.  The user program runs asynchronously to the DHCP
client so that the main event loop stays active to process other
events, including events triggered by the user program (for example,
when it invokes dhcpinfo).

The user program execution is part of the transaction of a DHCP command.
For example, if the user program is not enabled, the transaction of the
DHCP command START is considered over when an ACK is received and the
interface is configured successfully.  If the user program is enabled,
it is invoked after the interface is configured successfully, and the
transaction is considered over only when the user program exits.  The
event scripting implementation makes use of the asynchronous operations
discussed in the "Transactions" section.

An upper bound of 58 seconds is imposed on how long the user program
can run. If the user program does not exit after 55 seconds, the signal
SIGTERM is sent to it. If it still does not exit after an additional 3
seconds, the signal SIGKILL is sent to it.  Since the event handler is
a wrapper around poll(), the DHCP client cannot directly observe the
completion of the user program.  Instead, the DHCP client creates a
child "helper" process to synchronously monitor the user program (this
process is also used to send the aforementioned signals to the process,
if necessary).  The DHCP client and the helper process share a pipe
which is included in the set of poll descriptors monitored by the DHCP
client's event handler.  When the user program exits, the helper process
passes the user program exit status to the DHCP client through the pipe,
informing the DHCP client that the user program has finished.  When the
DHCP client is asked to shut down, it will wait for any running instances
of the user program to complete.
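
The helper-process arrangement can be sketched as follows.  This is
a reduced illustration, not the agent's script handling code: the
function names are hypothetical, a plain C function stands in for
the exec'd user program, and the SIGTERM/SIGKILL timeouts are left
out.  What it does show is the key trick: a blocking wait in the
helper is turned into a readable pipe descriptor that a poll(2)
based loop can watch.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a helper that runs `prog' in a grandchild, waits for it
 * synchronously, and writes its wait status to a pipe.  Returns the
 * pipe's read end, or -1 on failure.
 */
int
spawn_helper(int (*prog)(void))
{
	int fds[2];
	pid_t helper;

	assert(prog != NULL);
	if (pipe(fds) == -1)
		return (-1);
	if ((helper = fork()) == -1) {
		(void) close(fds[0]);
		(void) close(fds[1]);
		return (-1);
	}
	if (helper == 0) {
		int status = -1;
		pid_t child;

		(void) close(fds[0]);
		if ((child = fork()) == 0)
			_exit(prog());
		if (child > 0)
			(void) waitpid(child, &status, 0); /* may block */
		if (write(fds[1], &status, sizeof (status)) !=
		    (ssize_t)sizeof (status))
			_exit(1);
		_exit(0);
	}
	(void) close(fds[1]);
	return (fds[0]);
}

/*
 * What the agent's callback would do once the pipe is readable: read the
 * status and reap the helper.  For brevity this sketch calls poll(2)
 * itself and assumes the helper is the caller's only child.
 */
int
collect_status(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int status = -1;

	if (poll(&pfd, 1, -1) == 1 &&
	    read(fd, &status, sizeof (status)) != (ssize_t)sizeof (status))
		status = -1;
	(void) close(fd);
	(void) wait(NULL);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

/* Example "user program". */
int
exit_with_seven(void)
{
	return (7);
}
```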


README.v6

CDDL HEADER START

The contents of this file are subject to the terms of the
Common Development and Distribution License (the "License").
You may not use this file except in compliance with the License.

You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
or http://www.opensolaris.org/os/licensing.
See the License for the specific language governing permissions
and limitations under the License.

When distributing Covered Code, include this CDDL HEADER in each
file and include the License file at usr/src/OPENSOLARIS.LICENSE.
If applicable, add the following below this CDDL HEADER, with the
fields enclosed by brackets "[]" replaced with your own identifying
information: Portions Copyright [yyyy] [name of copyright owner]

CDDL HEADER END

Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

ident   "%Z%%M% %I% %E% SMI"


**  PLEASE NOTE:
**
**  This document discusses aspects of the DHCPv4 client design that have
**  since changed (e.g., DLPI is no longer used).  However, since those
**  aspects affected the DHCPv6 design, the discussion has been left for
**  historical record.


DHCPv6 Client Low-Level Design

Introduction

  This project adds DHCPv6 client-side (not server) support to
  Solaris.  Future projects may add server-side support as well as
  enhance the basic capabilities added here.  These future projects
  are not discussed in detail in this document.

  This document assumes that the reader is familiar with the following
  other documents:

  - RFC 3315: the primary description of DHCPv6
  - RFCs 2131 and 2132: IPv4 DHCP
  - RFCs 2461 and 2462: IPv6 NDP and stateless autoconfiguration
  - RFC 3484: IPv6 default address selection
  - ifconfig(1M): Solaris IP interface configuration
  - in.ndpd(1M): Solaris IPv6 Neighbor and Router Discovery daemon
  - dhcpagent(1M): Solaris DHCP client
  - dhcpinfo(1): Solaris DHCP parameter utility
  - ndpd.conf(4): in.ndpd configuration file
  - netstat(1M): Solaris network status utility
  - snoop(1M): Solaris network packet capture and inspection
  - "DHCPv6 Client High-Level Design"

  Several terms from those documents (such as the DHCPv6 IA_NA and
  IAADDR options) are used without further explanation in this
  document; see the reference documents above for details.

  The overall plan is to enhance the existing Solaris dhcpagent so
  that it is able to process DHCPv6.  It would also have been possible
  to create a new, separate daemon process for this, or to integrate
  the feature into in.ndpd.  These alternatives, and the reason for
  the chosen design, are discussed in Appendix A.

  This document discusses the internal design issues involved in the
  protocol implementation, and with the associated components (such as
  in.ndpd, snoop, and the kernel's source address selection
  algorithm).  It does not discuss the details of the protocol itself,
  which are more than adequately described in the RFC, nor the
  individual lines of code, which will be in the code review.

  As a cross-reference, Appendix B has a summary of the components
  involved and the changes to each.


Background

  In order to discuss the design changes for DHCPv6, it's necessary
  first to talk about the current IPv4-only design, and the
  assumptions built into that design.

  The main data structure used in dhcpagent is the 'struct ifslist'.
  Each instance of this structure represents a Solaris logical IP
  interface under DHCP's control.  It also represents the shared state
  with the DHCP server that granted the address, the address itself,
  and copies of the negotiated options.

  There is one list in dhcpagent containing all of the IP interfaces
  that are under DHCP control.  IP interfaces not under DHCP control
  (for example, those that are statically addressed) are not included
  in this list, even when plumbed on the system.  These ifslist
  entries are chained like this:

  ifsheadp -> ifslist -> ifslist -> ifslist -> NULL
            net0      net0:1     net1

  Each ifslist entry contains the address, mask, lease information,
  interface name, hardware information, packets, protocol state, and
  timers.  The name of the logical IP interface under DHCP's control
  is also the name used in the administrative interfaces (dhcpinfo,
  ifconfig) and when logging events.

  Each entry holds open a DLPI stream and two sockets.  The DLPI
  stream is nulled-out with a filter when not in use, but still
  consumes system resources.  (Most significantly, it causes data
  copies in the driver layer that end up sapping performance.)

  The entry storage is managed by an insert/hold/release/remove model
  and reference counts.  In this model, insert_ifs() allocates a new
  ifslist entry and inserts it into the global list, with the global
  list holding a reference.  remove_ifs() removes it from the global
  list and drops that reference.  hold_ifs() and release_ifs() are
  used by data structures that refer to ifslist entries, such as timer
  entries, to make sure that the ifslist entry isn't freed until the
  timer has been dispatched or deleted.

  The design is single-threaded, so code that walks the global list
  needn't bother taking holds on the ifslist structure.  Only
  references that may be used at a different time (i.e., pointers
  stored in other data structures) need to be recorded.

  Packets are handled using PKT (struct dhcp; <netinet/dhcp.h>),
  PKT_LIST (struct dhcp_list; <dhcp_impl.h>), and dhcp_pkt_t (struct
  dhcp_pkt; "packet.h").  PKT is just the RFC 2131 DHCP packet
  structure, and has no additional information, such as packet length.
  PKT_LIST contains a PKT pointer, length, decoded option arrays, and
  linkage for putting the packet in a list.  Finally, dhcp_pkt_t has a
  PKT pointer and length values suitable for modifying the packet.

  Essentially, PKT_LIST is a wrapper for received packets, and
  dhcp_pkt_t is a wrapper for packets to be sent.

  The basic PKT structure is used in dhcpagent, inetboot, in.dhcpd,
  libdhcpagent, libwanboot, libdhcputil, and others.  PKT_LIST is used
  in a similar set of places, including the kernel NFS modules.
  dhcp_pkt_t is (as the header file implies) limited to dhcpagent.

  In addition to these structures, dhcpagent maintains a set of
  internal supporting abstractions.  Two key ones involved in this
  project are the "async operation" and the "IPC action."  An async
  operation encapsulates the actions needed for a given operation, so
  that if cancellation is needed, there's a single point where the
  associated resources can be freed.  An IPC action represents the
  user state related to the private interface used by ifconfig.


DHCPv6 Inherent Differences

  DHCPv6 naturally has some commonality with IPv4 DHCP, but also has
  some significant differences.

  Unlike IPv4 DHCP, DHCPv6 relies on link-local IP addresses to do its
  work.  This means that, on Solaris, the client doesn't need DLPI to
  perform any of the I/O; regular IP sockets will do the job.  It also
  means that, unlike IPv4 DHCP, DHCPv6 does not need to obtain a lease
  for the address used in its messages to the server.  The system
  provides the address automatically.

  IPv4 DHCP expects some messages from the server to be broadcast.
  DHCPv6 has no such mechanism; all messages from the server to the
  client are unicast.  In the case where the client and server aren't
  on the same subnet, a relay agent is used to get the unicast replies
  back to the client's link-local address.

  With IPv4 DHCP, a single address plus configuration options is
  leased with a given client ID and a single state machine instance,
  and the implementation binds that to a single IP logical interface
  specified by the user.  The lease has a "Lease Time," a required
  option, as well as two timers, called T1 (renew) and T2 (rebind),
  which are controlled by regular options.

  DHCPv6 uses a single client/server session to control the
  acquisition of configuration options and "identity associations"
  (IAs).  The identity associations, in turn, contain lists of
  addresses for the client to use and the T1/T2 timer values.  Each
  individual address has its own preferred and valid lifetime, with
  the address being marked "deprecated" at the end of the preferred
  interval, and removed at the end of the valid interval.
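
  The per-address lifetime rule above can be sketched as a small
  classifier.  This is only an illustration: the type and function
  names are hypothetical and do not come from dhcpagent.  It assumes
  lifetimes are expressed in seconds since the address was granted and
  that 0xffffffff means "infinite," as in RFC 3315.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define	LIFETIME_INFINITY	0xffffffffU

typedef enum {
	ADDR_PREFERRED,		/* usable for new communication */
	ADDR_DEPRECATED,	/* preferred lifetime over, still valid */
	ADDR_EXPIRED		/* valid lifetime over; remove the address */
} addr_state_t;

/*
 * Classify an address given the seconds elapsed since it was granted
 * and its preferred and valid lifetimes.
 */
addr_state_t
addr_state(uint32_t elapsed, uint32_t preferred, uint32_t valid)
{
	/* A preferred lifetime longer than the valid one is malformed. */
	assert(preferred <= valid);

	if (valid != LIFETIME_INFINITY && elapsed >= valid)
		return (ADDR_EXPIRED);
	if (preferred != LIFETIME_INFINITY && elapsed >= preferred)
		return (ADDR_DEPRECATED);
	return (ADDR_PREFERRED);
}
```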

  IPv4 DHCP leaves many of the retransmit decisions up to the client,
  and some things (such as RELEASE and DECLINE) are sent just once.
  Others (such as the REQUEST message used for renew and rebind) are
  dealt with by heuristics.  DHCPv6 treats each message to the server
  as a separate transaction, and resends each message using a common
  retransmission mechanism.  DHCPv6 also has separate messages for
  Renew, Rebind, and Confirm rather than reusing the Request
  mechanism.

  The set of options (which are used to convey configuration
  information) for each protocol are distinct.  Notably, two of the
  mistakes from IPv4 DHCP have been fixed: DHCPv6 doesn't carry a
  client name, and doesn't attempt to impersonate a routing protocol
  by setting a "default route."

  Another welcome change is the lack of a netmask/prefix length with
  DHCPv6.  Instead, the client uses the Router Advertisement prefixes
  to set the correct interface netmask.  This reduces the number of
  databases that need to be kept in sync.  (The equivalent mechanism
  in IPv4 would have been the use of ICMP Address Mask Request /
  Reply, but the BOOTP designers chose to embed it in the address
  assignment protocol itself.)

  Otherwise, DHCPv6 is similar to IPv4 DHCP.  The same overall
  renew/rebind and lease expiry strategy is used, although the state
  machine events must now take into account multiple IAs and the fact
  that each can cause RENEWING or REBINDING state independently.


DHCPv6 And Solaris

  The protocol distinctions above have several important implications.
  For the logical interfaces:

    - Because Solaris uses IP logical interfaces to configure
      addresses, we must have multiple IP logical interfaces per IA
      with IPv6.

    - Because we need to support multiple addresses (and thus multiple
      IP logical interfaces) per IA and multiple IAs per client/server
      session, the IP logical interface name isn't a unique name for
      the lease.

  As a result, IP logical interfaces will come and go with DHCPv6,
  just as happens with the existing stateless address
  autoconfiguration support in in.ndpd.  The logical interface names
  (visible in ifconfig) have no administrative significance.

  Fortunately, DHCPv6 does end up with one fixed name that can be used
  to identify a session.  Because DHCPv6 uses link local addresses for
  communication with the server, the name of the IP logical interface
  that has this link local address (normally the same as the IP
  physical interface) can be used as an identifier for dhcpinfo and
  logging purposes.


Dhcpagent Redesign Overview

  The redesign starts by refactoring the IP interface representation.
  Because we need to have multiple IP logical interfaces (LIFs) for a
  single identity association (IA), we should not store all of the
  DHCP state information along with the LIF information.

  For DHCPv6, we will need to keep LIFs on a single IP physical
  interface (PIF) together, so this is probably also a good time to
  reconsider the way dhcpagent represents physical interfaces.  The
  current design simply replicates the state (notably the DLPI stream,
  but also the hardware address and other bits) among all of the
  ifslist entries on the same physical interface.

  The new design creates two lists of dhcp_pif_t entries, one list for
  IPv4 and the other for IPv6.  Each dhcp_pif_t represents a PIF, with
  a list of dhcp_lif_t entries attached, each of which represents a
  LIF used by dhcpagent.  This structure mirrors the kernel's ill_t
  and ipif_t interface representations.

  Next, the lease-tracking needs to be refactored.  DHCPv6 is the
  functional superset in this case, as it has two lifetimes per
  address (LIF) and IA groupings with shared T1/T2 timers.  To
  represent these groupings, we will use a new dhcp_lease_t structure.
  IPv4 DHCP will have one such structure per state machine, while
  DHCPv6 will have a list.  (Note: the initial implementation will
  have only one lease per DHCPv6 state machine, because each state
  machine uses a single link-local address, a single DUID+IAID pair,
  and supports only Non-temporary Addresses [IA_NA option].  Future
  enhancements may use multiple leases per DHCPv6 state machine or
  support other IA types.)

  For all of these new structures, we will use the same insert/hold/
  release/remove model as with the original ifslist.

  Finally, the remaining items (and the bulk of the original ifslist
  members) are kept on a per-state-machine basis.  As this is no
  longer just an "interface," a new dhcp_smach_t structure will hold
  these, and the ifslist structure is gone.


Lease Representation

  For DHCPv6, we need to track multiple LIFs per lease (IA), but we
  also need multiple LIFs per PIF.  Rather than having two sets of
  list linkage for each LIF, we can observe that a LIF is on exactly
  one PIF and is a member of at most one lease, and then simplify: the
  lease structure will use a base pointer for the first LIF in the
  lease, and a count for the number of consecutive LIFs in the PIF's
  list of LIFs that belong to the lease.

  When removing a LIF from the system, we need to decrement the count
  of LIFs in the lease, and advance the base pointer if the LIF being
  removed is the first one.  Inserting a LIF means just moving it into
  this list and bumping the counter.

  When removing a lease from a state machine, we need to dispose of
  the LIFs referenced.  If the LIF being disposed is the main LIF for
  a state machine, then all that we can do is canonize the LIF
  (returning it to a default state); this represents the normal IPv4
  DHCP operation on lease expiry.  Otherwise, the lease is the owner
  of that LIF (it was created because of a DHCPv6 IA), and disposal
  means unplumbing the LIF from the actual system and removing the LIF
  entry from the PIF.
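
  A minimal sketch of the base-pointer-plus-count scheme follows.  The
  structures are simplified, hypothetical stand-ins for dhcp_pif_t,
  dhcp_lif_t, and dhcp_lease_t (the real ones carry much more state),
  and the choice to place a lease's first LIF at the head of the PIF
  list is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real structures. */
struct lease;

typedef struct lif {
	struct lif	*next;	/* next LIF on the same PIF */
	struct lease	*lease;	/* owning lease, or NULL */
} lif_t;

typedef struct lease {
	lif_t		*base;	/* first LIF belonging to this lease */
	unsigned int	count;	/* consecutive LIFs starting at base */
} lease_t;

typedef struct pif {
	lif_t		*lifs;	/* all LIFs on this PIF */
} pif_t;

/* Add a LIF to the end of the lease's run within the PIF's list. */
void
insert_lif(pif_t *pif, lease_t *lease, lif_t *lif)
{
	if (lease->count == 0) {
		/* First LIF of the lease: start a new run at the head. */
		lif->next = pif->lifs;
		pif->lifs = lif;
		lease->base = lif;
	} else {
		lif_t *last = lease->base;
		unsigned int i;

		for (i = 1; i < lease->count; i++)
			last = last->next;
		lif->next = last->next;
		last->next = lif;
	}
	lif->lease = lease;
	lease->count++;
}

/* Unlink a LIF from the PIF, fixing up the owning lease if any. */
void
remove_lif(pif_t *pif, lif_t *lif)
{
	lease_t *lease = lif->lease;
	lif_t **pp;

	if (lease != NULL) {
		assert(lease->count > 0);
		if (lease->base == lif)
			lease->base = lif->next;  /* advance base pointer */
		if (--lease->count == 0)
			lease->base = NULL;
		lif->lease = NULL;
	}
	for (pp = &pif->lifs; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == lif) {
			*pp = lif->next;
			lif->next = NULL;
			break;
		}
	}
}
```

  Because each LIF needs only one "next" pointer, the second set of
  list linkage is avoided at the cost of a short walk on insertion.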


Main Structure Linkage

  For IPv4 DHCP, the new linkage is straightforward.  Using the same
  system configuration example as in the initial design discussion:

          +- lease  +- lease       +- lease
          |  ^      |  ^           |  ^
          |  |      |  |           |  |
          \  smach  \  smach       \  smach
           \ ^|      \ ^|           \ ^|
            v|v       v|v            v|v
            lif ----> lif -> NULL     lif -> NULL
            net0      net0:1          net1
            ^                         ^
            |                         |
  v4root -> pif --------------------> pif -> NULL
            net0                      net1

  This diagram shows three separate state machines running (with
  backpointers omitted for clarity).  Each state machine has a single
  "main" LIF with which it's associated (and named).  Each also has a
  single lease structure that points back to the same LIF (count of
  1), because IPv4 DHCP controls a single address allocation per state
  machine.

  DHCPv6 is a bit more complex.  This shows DHCPv6 running on two
  interfaces (more or fewer interfaces are of course possible) and
  with multiple leases on the first interface, and each lease with
  multiple addresses (one with two addresses, the second with one).

            lease ----------------> lease -> NULL   lease -> NULL
            ^   \(2)                |(1)            ^   \ (1)
            |    \                  |               |    \
            smach \                 |               smach \
            ^ |    \                |               ^ |    \
            | v     v               v               | v     v
            lif --> lif --> lif --> lif --> NULL    lif --> lif -> NULL
            net0    net0:1  net0:4  net0:2          net1    net1:5
            ^                                       ^
            |                                       |
  v6root -> pif ----------------------------------> pif -> NULL
            net0                                    net1

  Note that there's intentionally no ordering based on name in the
  list of LIFs.  Instead, the contiguous LIF structures in that list
  represent the addresses in each lease.  The logical interfaces
  themselves are allocated and numbered by the system kernel, so they
  may not be sequential, and there may be gaps in the list if other
  entities (such as in.ndpd) are also configuring interfaces.

  Note also that with IPv4 DHCP, the lease points to the LIF that's
  also the main LIF for the state machine, because that's the IP
  interface that dhcpagent controls.  With DHCPv6, the lease (one per
  IA structure) points to a separate set of LIFs that are created just
  for the leased addresses (one per IA address in an IAADDR option).
  The state machine alone points to the main LIF.


Packet Structure Extensions

  Obviously, we need some DHCPv6 packet data structures and
  definitions.  A new <netinet/dhcp6.h> file will be introduced with
  the necessary #defines and structures.  The key structure there will
  be:

    struct dhcpv6_message {
        uint8_t     d6m_msg_type;
        uint8_t     d6m_transid_ho;
        uint16_t    d6m_transid_lo;
    };
    typedef struct dhcpv6_message   dhcpv6_message_t;

  This defines the usual (non-relay) DHCPv6 packet header, and is
  roughly equivalent to PKT for IPv4.
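
  The 24-bit transaction ID is split across two fields.  As a sketch
  of how a caller might read and write it, assuming d6m_transid_lo is
  kept in network byte order (the helper names below are hypothetical
  and are not the accessors defined by the header):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

struct dhcpv6_message {
	uint8_t		d6m_msg_type;
	uint8_t		d6m_transid_ho;
	uint16_t	d6m_transid_lo;
};

/* Hypothetical helper: return the 24-bit transaction ID in host order. */
uint32_t
d6m_get_transid(const struct dhcpv6_message *m)
{
	return (((uint32_t)m->d6m_transid_ho << 16) |
	    ntohs(m->d6m_transid_lo));
}

/* Hypothetical helper: store a 24-bit transaction ID in wire order. */
void
d6m_set_transid(struct dhcpv6_message *m, uint32_t xid)
{
	assert(xid <= 0xffffff);	/* only 24 bits are available */
	m->d6m_transid_ho = (uint8_t)(xid >> 16);
	m->d6m_transid_lo = htons((uint16_t)(xid & 0xffff));
}
```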

  Extending dhcp_pkt_t for DHCPv6 is straightforward, as it's used
  only within dhcpagent.  This structure will be amended to use a
  union for v4/v6 and include a boolean to flag which version is in
  use.

  For the PKT_LIST structure, things are more complex.  This defines
  both a queuing mechanism for received packets (typically OFFERs) and
  a set of packet decoding structures.  The decoding structures are
  highly specific to IPv4 DHCP -- they have no means to handle nested
  or repeated options (as used heavily in DHCPv6) and make use of the
  DHCP_OPT structure which is specific to IPv4 DHCP -- and are
  somewhat expensive in storage, due to the use of arrays indexed by
  option code number.

  Worse, this structure is used throughout the system, so changes to
  it need to be made carefully.  (For example, the existing 'pkt'
  member can't just be turned into a union.)

  For an initial prototype, since discarded, I created a new
  dhcp_plist_t structure to represent packet lists as used inside
  dhcpagent and made dhcp_pkt_t valid for use on input and output.
  The result is unsatisfying, though, as it results in code that
  manipulates far too many data structures in common cases; it's a sea
  of pointers to pointers.

  The better answer is to use PKT_LIST for both IPv4 and IPv6, adding
  the few new bits of metadata required to the end (receiving ifIndex,
  packet source/destination addresses), and staying within the overall
  existing design.

  For option parsing, dhcpv6_find_option() and dhcpv6_pkt_option()
  functions will be added to libdhcputil.  The former function will
  walk a DHCPv6 option list, and provide safe (bounds-checked) access
  to the options inside.  The function can be called recursively, so
  that option nesting can be handled fairly simply by nested loops,
  and can be called repeatedly to return each instance of a given
  option code number.  The latter function is just a convenience
  wrapper on dhcpv6_find_option() that starts with a PKT_LIST pointer
  and iterates over the top-level options with a given code number.

  There are two special considerations for the use of these library
  interfaces: there's no "pad" option for DHCPv6 or alignment
  requirements on option headers or contents, and nested options
  always follow a structure that has type-dependent length.  This
  means that code that handles options must all be written to deal
  with unaligned data, and suboption code must index the pointer past
  the type-dependent part.
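
  To illustrate both considerations, here is a minimal bounds-checked
  option walker.  It is a sketch only: find_opt() and its signature
  are hypothetical and are not the libdhcputil interface.  It relies
  on the DHCPv6 option layout of a 16-bit code followed by a 16-bit
  length, both in network byte order, and uses memcpy() for every
  multi-byte read because nothing is guaranteed to be aligned.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	OPT_HDR_LEN	4	/* 16-bit code plus 16-bit length */

/*
 * Hypothetical walker.  Search buf[0..len) for an option with the given
 * code.  If prev is NULL, start at the beginning; otherwise resume just
 * after prev, which must be a pointer returned by an earlier call on the
 * same buffer.  On success, return a pointer to the option header and
 * set *olen to the length of the option data.
 */
const uint8_t *
find_opt(const uint8_t *buf, size_t len, const uint8_t *prev,
    uint16_t code, size_t *olen)
{
	const uint8_t *p = buf;
	const uint8_t *end = buf + len;
	uint16_t c, l;

	assert(olen != NULL);
	if (prev != NULL) {
		/* prev was validated earlier, so this stays in bounds. */
		memcpy(&l, prev + 2, sizeof (l));
		p = prev + OPT_HDR_LEN + ntohs(l);
	}
	while ((size_t)(end - p) >= OPT_HDR_LEN) {
		size_t dlen;

		memcpy(&c, p, sizeof (c));	/* may be unaligned */
		memcpy(&l, p + 2, sizeof (l));
		dlen = ntohs(l);
		if (dlen > (size_t)(end - p) - OPT_HDR_LEN)
			return (NULL);		/* truncated option */
		if (ntohs(c) == code) {
			*olen = dlen;
			return (p);
		}
		p += OPT_HDR_LEN + dlen;
	}
	return (NULL);
}
```

  For a nested option, the caller invokes the walker again on the
  enclosing option's data after skipping the type-dependent part (for
  example, the 12 bytes of IAID, T1, and T2 at the start of IA_NA).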


Packet Construction

  Unlike DHCPv4, DHCPv6 places the transaction timer value in an
  option.  The existing code sets the current time value in
  send_pkt_internal(), which allows it to be updated in a
  straightforward way when doing retransmits.

  To make this work in a simple manner for DHCPv6, I added a
  remove_pkt_opt() function.  The update logic just does a remove and
  re-adds the option.  We could also just assume the presence of the
  option, find it, and modify in place, but the remove feature seems
  more general.

  DHCPv6 uses nesting options.  To make this work, two new utility
  functions are needed.  First, an add_pkt_subopt() function will take
  a pointer to an existing option and add an embedded option within
  it.  The packet length and existing option length are updated.  If
  that existing option isn't a top-level option, though, this means
  that the caller must update the lengths of all of the enclosing
  options up to the top level.  To do this, update_v6opt_len() will be
  added.  This is used in the special case of adding a Status Code
  option to an IAADDR option within an IA_NA top-level option.
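
  The length bookkeeping can be sketched as follows.  This is not the
  dhcpagent code: append_subopt() is a hypothetical function that
  combines both steps (appending the suboption and fixing up every
  enclosing option) in one call, and it assumes the enclosing options
  all end at the current tail of the packet.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	OPT_HDR_LEN	4	/* 16-bit code plus 16-bit length */

/* Read the (possibly unaligned) length field of an option. */
static uint16_t
get_opt_len(const uint8_t *opt)
{
	uint16_t l;

	memcpy(&l, opt + 2, sizeof (l));
	return (ntohs(l));
}

/* Write the (possibly unaligned) length field of an option. */
static void
set_opt_len(uint8_t *opt, uint16_t len)
{
	uint16_t l = htons(len);

	memcpy(opt + 2, &l, sizeof (l));
}

/*
 * Hypothetical: append a suboption at the tail of buf (current length
 * *lenp, capacity max) and grow each of the nencl enclosing options
 * listed in encl[].  Returns 0 on success, -1 if it does not fit.
 */
int
append_subopt(uint8_t *buf, size_t *lenp, size_t max, uint8_t **encl,
    size_t nencl, uint16_t code, const void *data, uint16_t dlen)
{
	size_t need = OPT_HDR_LEN + (size_t)dlen;
	uint8_t *opt = buf + *lenp;
	uint16_t c = htons(code);
	size_t i;

	assert(*lenp <= max);
	if (need > max - *lenp)
		return (-1);		/* no room in the packet */
	for (i = 0; i < nencl; i++) {
		if (get_opt_len(encl[i]) + need > UINT16_MAX)
			return (-1);	/* enclosing length would overflow */
	}
	memcpy(opt, &c, sizeof (c));
	set_opt_len(opt, dlen);
	if (dlen != 0)
		memcpy(opt + OPT_HDR_LEN, data, dlen);
	*lenp += need;
	/* Every enclosing option grows by the size of the new suboption. */
	for (i = 0; i < nencl; i++)
		set_opt_len(encl[i], (uint16_t)(get_opt_len(encl[i]) + need));
	return (0);
}
```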


Sockets and I/O Handling

  DHCPv6 doesn't need or use either a DLPI or a broadcast IP socket.
  Instead, a single unicast-bound IP socket on a link-local address
  would be the most that is needed.  This is roughly equivalent to
  if_sock_ip_fd in the existing design, but that existing socket is
  bound only after DHCP reaches BOUND state -- that is, when it
  switches away from DLPI.  We need something different.

  This, along with the excess of open file descriptors in an otherwise
  idle daemon and the potentially serious performance problems in
  leaving DLPI open at all times, argues for a larger redesign of the
  I/O logic in dhcpagent.

  The first thing that we can do is eliminate the need for the
  per-ifslist if_sock_fd.  This is used primarily for issuing ioctls
  to configure interfaces -- a task that would work as well with any
  open socket -- and is also registered to receive any ACK/NAK packets
  that may arrive via broadcast.  Both of these can be eliminated by
  creating a pair of global sockets (IPv4 and IPv6), bound and
  configured for ACK/NAK reception.  The only functional difference is
  that the list of running state machines must be scanned on reception
  to find the correct transaction ID, but the existing design
  effectively already goes to this effort because the kernel
  replicates received datagrams among all matching sockets, and each
  ifslist entry has a socket open.

  (The existing code for if_sock_fd makes oblique reference to unknown
  problems in the system that may prevent binding from working in some
  cases.  The reference dates back some seven years to the original
  DHCP implementation.  I've observed no such problems in extensive
  testing and if any do show up, they will be dealt with by fixing the
  underlying bugs.)

  This leads to an important simplification: it's no longer necessary
  to register, unregister, and re-register for packet reception while
  changing state -- register_acknak() and unregister_acknak() are
  gone.  Instead, we always receive, and we dispatch the packets as
  they arrive.  As a result, when receiving a DHCPv4 ACK or DHCPv6
  Reply when in BOUND state, we know it's a duplicate, and we can
  discard.

  The next part is in minimizing DLPI usage.  A DLPI stream is needed
  at most for each IPv4 PIF, and it's not needed when all of the
  DHCP instances on that PIF are bound.  In fact, the current
  implementation deals with this in configure_bound() by setting a
  "blackhole" packet filter.  The stream is left open.

  To simplify this, we will open at most one DLPI stream on a PIF, and
  use reference counts from the state machines to determine when the
  stream must be open and when it can be closed.  This mechanism will
  be centralized in a set_smach_state() function that changes the
  state and opens/closes the DLPI stream when needed.
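
  A rough sketch of the reference-counting idea is shown below.  The
  structures, the reduced state list, and the rule that every state
  other than BOUND needs DLPI are simplifying assumptions made for
  illustration; opening and closing the stream is represented by a
  flag rather than real DLPI calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced state list. */
typedef enum { ST_INIT, ST_SELECTING, ST_REQUESTING, ST_BOUND } state_t;

typedef struct pif {
	int	dlpi_refs;	/* state machines that need the stream */
	bool	dlpi_open;	/* stands in for an open DLPI stream */
} pif_t;

typedef struct smach {
	pif_t	*pif;
	state_t	state;
	bool	using_dlpi;	/* this state machine holds a reference */
} smach_t;

/* Assumption: only BOUND can do without DLPI. */
static bool
state_needs_dlpi(state_t s)
{
	return (s != ST_BOUND);
}

/* Change state, opening or closing the shared stream as needed. */
void
set_smach_state(smach_t *sm, state_t newstate)
{
	bool need = state_needs_dlpi(newstate);
	pif_t *pif = sm->pif;

	if (need && !sm->using_dlpi) {
		if (pif->dlpi_refs++ == 0)
			pif->dlpi_open = true;	/* first user: open */
	} else if (!need && sm->using_dlpi) {
		assert(pif->dlpi_refs > 0);
		if (--pif->dlpi_refs == 0)
			pif->dlpi_open = false;	/* last user: close */
	}
	sm->using_dlpi = need;
	sm->state = newstate;
}
```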

  This leads to another simplification.  The I/O logic in the existing
  dhcpagent makes use of the protocol state to select between DLPI and
  sockets.  Now that we keep track of this in a simpler manner, we no
  longer need to switch on state when sending a packet; just test the
  dsm_using_dlpi flag instead.

  Still another simplification is in the handling of DHCPv4 INFORM.
  The current code has separate logic in it for getting the interface
  state and address information.  This is no longer necessary, as the
  LIF mechanism keeps track of the interface state.  And since we have
  separate lease structures, and INFORM doesn't acquire a lease, we no
  longer have to be careful about canonizing the interface on
  shutdown.

  Although the default is to send all client messages to a well-known
  multicast address for servers and relays, DHCPv6 also has a
  mechanism that allows the client to send unicast messages to the
  server.  The operation of this mechanism is slightly complex.
  First, the server sends the client a unicast address via an option.
  We may use this address as the destination (rather than the
  well-known multicast address for local DHCPv6 servers and relays)
  only if we have a viable local source address.  This means using
  SIOCGDSTINFO each time we try to send unicast.  Next, the server may
  send back a special status code: UseMulticast.  If this is received,
  and if we were actually using unicast in our messages to the server,
  then we need to forget the unicast address, switch back to
  multicast, and resend our last message.

  Note that it's important to avoid the temptation to resend the last
  message every time UseMulticast is seen, and do it only once on
  switching back to multicast: otherwise, a potential feedback loop is
  created.
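
  The resend-once rule reduces to a guard on whether unicast was
  actually in use.  A minimal sketch, with a hypothetical structure
  and function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fragment of per-state-machine data. */
typedef struct smach {
	bool	have_unicast;	/* server supplied a usable unicast address */
} smach_t;

/*
 * Handle a UseMulticast status code.  Returns true if the caller should
 * resend its last message (now via multicast), false otherwise.
 */
bool
handle_use_multicast(smach_t *sm)
{
	assert(sm != NULL);
	if (!sm->have_unicast)
		return (false);	/* already multicasting: do not resend */
	sm->have_unicast = false;	/* forget the unicast address */
	return (true);			/* resend exactly once */
}
```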

  Because IP_PKTINFO (PSARC 2006/466) has integrated, we could go a
  step further by removing the need for any per-LIF sockets and just
  use the global sockets for all but DLPI.  However, in order to
  facilitate a Solaris 10 backport, this will be done separately as CR
  6509317.

  In the case of DHCPv6, we already have IPV6_PKTINFO, so we will pave
  the way for IPv4 by beginning to use this now, and thus have just
  a single socket (bound to "::") for all of DHCPv6.  Doing this
  requires switching from the old BSD4.2 -lsocket -lnsl to the
  standards-compliant -lxnet in order to use ancillary data.

  It may also be possible to remove the need for DLPI for IPv4, and
  incidentally simplify the code a fair amount, by adding a kernel
  option to allow transmission and reception of UDP packets over
  interfaces that are plumbed but not marked IFF_UP.  This is left for
  future work.


The State Machine

  Several parts of the existing state machine need additions to handle
  DHCPv6, which is a superset of DHCPv4.

  First, there are the RENEWING and REBINDING states.  For IPv4 DHCP,
  these states map one-to-one with a single address and single lease
  that's undergoing renewal.  It's a simple progression (on timeout)
  from BOUND, to RENEWING, to REBINDING and finally back to SELECTING
  to start over.  Each retransmit is done by simply rescheduling the
  T1 or T2 timer.

  For DHCPv6, things are somewhat more complex.  At any one time,
  there may be multiple IAs (leases) that are effectively in renewing
  or rebinding state, based on the T1/T2 timers for each IA, and many
  addresses that have expired.

  However, because all of the leases are related to a single server,
  and that server either responds to our requests or doesn't, we can
  simplify the states to be nearly identical to IPv4 DHCP.

  The revised definition for use with DHCPv6 is:

    - Transition from BOUND to RENEWING state when the first T1 timer
      (of any lease on the state machine) expires.  At this point, as
      an optimization, we should begin attempting to renew any IAs
      that are within REN_TIMEOUT (10 seconds) of reaching T1 as well.
      We may as well avoid sending an excess of packets.

    - When a T1 lease timer expires and we're in RENEWING or REBINDING
      state, just ignore it, because the transaction is already in
      progress.

    - At each retransmit timeout, we should check to see if there are
      more IAs that need to join in because they've passed point T1 as
      well, and, if so, add them.  This check isn't necessary at this
      time, because only a single IA_NA is possible with the initial
      design.

    - When we reach T2 on any IA and we're in BOUND or RENEWING state,
      enter REBINDING state.  At this point, we have a choice.  For
      those other IAs that are past T1 but not yet at T2, we could
      ignore them (sending only those that have passed point T2),
      continue to send separate Renew messages for them, or just
      include them in the Rebind message.  This isn't an issue that
      must be dealt with for this project, but the plan is to include
      them in the Rebind message.

    - When a T2 lease timer expires and we're in REBINDING state, just
      ignore it, as with the corresponding T1 timer.

    - As addresses reach the end of their preferred lifetimes, set the
      IFF_DEPRECATED flag.  As they reach the end of the valid
      lifetime, remove them from the system.  When an IA (lease)
      becomes empty, just remove it.  When there are no more leases
      left, return to SELECTING state to start over.
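
  The timer-driven transitions in the list above can be summarized in
  a small table-like function.  This is a sketch under simplifying
  assumptions: only the three states involved are modeled, the names
  are hypothetical, and times are plain seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	REN_TIMEOUT_SECS	10

/* Hypothetical names; only the states discussed here are modeled. */
typedef enum { DS_BOUND, DS_RENEWING, DS_REBINDING } dstate_t;
typedef enum { EV_T1_EXPIRED, EV_T2_EXPIRED } timer_ev_t;

/* Return the new state after a T1 or T2 timer fires on any lease. */
dstate_t
lease_timer_event(dstate_t cur, timer_ev_t ev)
{
	assert(cur == DS_BOUND || cur == DS_RENEWING || cur == DS_REBINDING);

	switch (ev) {
	case EV_T1_EXPIRED:
		/* Ignored if a renew or rebind is already in progress. */
		return (cur == DS_BOUND ? DS_RENEWING : cur);
	case EV_T2_EXPIRED:
		/* BOUND or RENEWING moves on; REBINDING ignores it. */
		return (DS_REBINDING);
	}
	return (cur);
}

/*
 * When starting a Renew at time now, should an IA whose T1 falls at
 * t1_at be included?  True if T1 has passed or is within REN_TIMEOUT.
 */
bool
ia_joins_renew(int64_t now, int64_t t1_at)
{
	return (t1_at <= now + REN_TIMEOUT_SECS);
}
```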

  Note that the RFC treats the IAs as separate entities when
  discussing the renew/rebind T1/T2 timers, but treats them as a unit
  when doing the initial negotiation.  This is, to say the least,
  confusing, especially so given that there's no reason to expect that
  after having failed to elicit any responses at all from the server
  on one IA, the server will suddenly start responding when we attempt
  to renew some other IA.  We rationalize this behavior by using a
  single renew/rebind state for the entire state machine (and thus
  client/server pair).

  There's a subtle timing difference here between DHCPv4 and DHCPv6.
  For DHCPv4, the client just sends packets more and more frequently
  (shorter timeouts) as the next state gets nearer.  DHCPv6 treats
  each as a transaction, using the same retransmit logic as for other
  messages.  The DHCPv6 method is a cleaner design, so we will change
  the DHCPv4 implementation to do the same, and compute the new timer
  values as part of stop_extending().

  Note that it would be possible to start the SELECTING state earlier
  than waiting for the last lease to expire, and thus avoid a loss of
  connectivity.  However, at this point, there are other servers on
  the network that have seen us attempting to Rebind for quite some
  time, and they have not responded.  The likelihood that there's a
  server that will ignore Rebind but then suddenly spring into action
  on a Solicit message seems low enough that the optimization won't be
  done now.  (Starting SELECTING state earlier may be done in the
  future, if it's found to be useful.)


Persistent State

  IPv4 DHCP has only minimal need for persistent state, beyond the
  configuration parameters.  The state is stored when "ifconfig dhcp
  drop" is run or the daemon receives SIGTERM, which is typically done
  only well after the system is booted and running.

  The daemon stores this state in /etc/dhcp, because it needs to be
  available when only the root file system has been mounted.

  Moreover, dhcpagent starts very early in the boot process.  It runs
  as part of svc:/network/physical:default, which runs well before
  root is mounted read/write:

     svc:/system/filesystem/root:default ->
        svc:/system/metainit:default ->
           svc:/system/identity:node ->
              svc:/network/physical:default
           svc:/network/iscsi_initiator:default ->
              svc:/network/physical:default

  and, of course, well before either /var or /usr is mounted.  This
  means that any persistent state must be kept in the root file
  system, and that if we write before shutdown, we have to cope
  gracefully with the root file system returning EROFS on write
  attempts.

  For DHCPv6, we need to try to keep our DUID and IAID values stable
  across reboots to fulfill the demands of RFC 3315.

  The DUID is either configured or automatically generated.  When
  configured, it comes from the /etc/default/dhcpagent file, and thus
  does not need to be saved by the daemon.  If automatically
  generated, there's exactly one of these created, and it will
  eventually be needed before /usr is mounted, if /usr is mounted over
  IPv6.  This means a new file in the root file system,
  /etc/dhcp/duid, will be used to hold the automatically generated
  DUID.

  The determination of whether to use a configured DUID or one saved
  in a file is made in get_smach_cid().  This function will
  encapsulate all of the DUID parsing and generation machinery for the
  rest of dhcpagent.

  If root is not writable at the point when dhcpagent starts, and our
  attempt to write the file fails with EROFS, we will set a timer to
  retry the operation every 60 seconds.  In the unlikely case
  that it just never succeeds or that we're rebooted before root
  becomes writable, then the impact will be that the daemon will wake
  up once a minute and, ultimately, we'll choose a different DUID on
  next start-up, and we'll thus lose our leases across a reboot.

  The IAID similarly must be kept stable if at all possible, but
  cannot be configured by the user.  To make these values stable,
  we will use two strategies.  First, the IAID value for a given
  interface (if not known) will just default to the IP ifIndex value,
  provided that there's no known saved IAID using that value.  Second,
  we will save off the IAID we choose in a single /etc/dhcp/iaid file,
  containing an array of entries indexed by logical interface name.
  Keeping it in a single file allows us to scan for used and unused
  IAID values when necessary.
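
  The two strategies might combine as in the sketch below.  The
  in-memory table stands in for the contents of /etc/dhcp/iaid; its
  layout, the names used, and the order in which unused values are
  searched are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	IFNAME_MAX	32	/* hypothetical name-length limit */

/* Hypothetical in-memory form of one saved entry. */
typedef struct iaid_entry {
	char		name[IFNAME_MAX];	/* logical interface name */
	uint32_t	iaid;
} iaid_entry_t;

/* Is this IAID value already recorded for some interface? */
static int
iaid_in_use(const iaid_entry_t *tab, size_t n, uint32_t iaid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].iaid == iaid)
			return (1);
	}
	return (0);
}

/* Pick the IAID for ifname, preferring a saved value, then ifindex. */
uint32_t
choose_iaid(const iaid_entry_t *tab, size_t n, const char *ifname,
    uint32_t ifindex)
{
	uint32_t iaid;
	size_t i;

	assert(ifname != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(tab[i].name, ifname) == 0)
			return (tab[i].iaid);	/* saved value wins */
	}
	if (!iaid_in_use(tab, n, ifindex))
		return (ifindex);		/* default to ifIndex */
	/* At most n values are taken, so this scan must terminate. */
	for (iaid = ifindex + 1; iaid_in_use(tab, n, iaid); iaid++)
		;
	return (iaid);
}
```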

  This mechanism depends on the interface name, and thus will need to
  be revisited when Clearview vanity naming and NWAM are available.

  Currently, the boot system (GRUB, OBP, the miniroot) does not
  support installing over IPv6.  This could change in the future, so
  one of the goals of the above stability plan is to support that
  event.

  When running in the miniroot on an x86 system, /etc/dhcp (and the
  rest of the root) is mounted on a read-only ramdisk.  In this case,
  writing to /etc/dhcp will just never work.  A possible solution
  would be to add a new privileged command in ifconfig that forces
  dhcpagent to write to an alternate location.  The initial install
  process could then do "ifconfig <x> dhcp write /a" to get the needed
  state written out to the newly-constructed system root.

  This part (the new write option) won't be implemented as part of
  this project, because it's not needed yet.


Router Advertisements

  IPv6 Router Advertisements perform two functions related to DHCPv6:

    - they specify whether and how to run DHCPv6 on a given interface.
    - they provide a list of the valid prefixes on an interface.

  For the first function, in.ndpd needs to use the same DHCP control
  interfaces that ifconfig uses, so that it can launch dhcpagent and
  trigger DHCPv6 when necessary.  Note that it never needs to shut
  down DHCPv6, as router advertisements can't do that.

  However, launching dhcpagent presents new problems.  As a part of
  the "Quagga SMF Modifications" project (PSARC 2006/552), in.ndpd in
  Nevada is now privilege-aware and runs with limited privileges,
  courtesy of SMF.  Dhcpagent, on the other hand, must run with all
  privileges.

  A simple work-around for this issue is to rip out the "privileges="
  clause from the method_credential for in.ndpd.  I've taken this
  direction initially, but the right longer-term answer seems to be
  converting dhcpagent into an SMF service.  This is quite a bit more
  complex, as it means turning the /sbin/dhcpagent command line
  interface into a utility that manipulates the service and passes the
  command line options via IPC extensions.

  Such a design also raises the question of whether dhcpagent itself
  ought to run with reduced privileges.  It could, but it still needs
  the ability to grant "all" (traditional UNIX root) privileges to the
  eventhook script, if present.  There seem to be few ways to do this,
  though it's a good area for research.

  The second function, prefix handling, is also subtle.  Unlike IPv4
  DHCP, DHCPv6 does not give the netmask or prefix length along with
  the leased address.  The client is on its own to determine the right
  netmask to use.  This is where the advertised prefixes come in:
  these must be used to finish the interface configuration.

  We will have the DHCPv6 client configure each interface with an
  all-ones (/128) netmask by default.  In.ndpd will be modified so
  that when it detects a new IFF_DHCPRUNNING IP logical interface, it
  checks for a known matching prefix, and sets the netmask as
  necessary.  If no matching prefix is known, it will send a new
  Router Solicitation message to try to find one.

  When in.ndpd learns of a new prefix from a Router Advertisement, it
  will scan all of the IFF_DHCPRUNNING IP logical interfaces on the
  same physical interface and set the netmasks when necessary.
  Dhcpagent, for its part, will ignore the netmask on IPv6 interfaces
  when checking for changes that would require it to "abandon" the
  interface.
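
  The prefix lookup can be sketched as follows.  This is not in.ndpd
  code: the structure and function names are hypothetical, and
  choosing the longest matching prefix when several match is an
  assumption.  When nothing matches, the result is the /128 default
  described above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of a prefix learned from a Router Advertisement. */
typedef struct prefix {
	struct in6_addr	addr;
	unsigned int	len;	/* prefix length in bits, 0..128 */
} prefix_t;

/* Does address a fall within prefix p? */
static int
prefix_match(const struct in6_addr *a, const prefix_t *p)
{
	unsigned int full = p->len / 8;	/* whole bytes to compare */
	unsigned int rem = p->len % 8;	/* leftover high-order bits */
	unsigned char mask;

	assert(p->len <= 128);
	if (memcmp(a->s6_addr, p->addr.s6_addr, full) != 0)
		return (0);
	if (rem == 0)
		return (1);
	mask = (unsigned char)(0xff << (8 - rem));
	return (((a->s6_addr[full] ^ p->addr.s6_addr[full]) & mask) == 0);
}

/* Return the prefix length to configure for a, or 128 if none match. */
unsigned int
choose_prefix_len(const struct in6_addr *a, const prefix_t *tab, size_t n)
{
	unsigned int best = 0;
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (prefix_match(a, &tab[i]) && (!found || tab[i].len > best)) {
			best = tab[i].len;
			found = 1;
		}
	}
	return (found ? best : 128);
}
```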

  Given the way that DHCPv6 and in.ndpd control both the horizontal
  and the vertical in plumbing and removing logical interfaces, and
  users do not, it might be worthwhile to consider roping off any
  direct user changes to IPv6 logical interfaces under control of
  in.ndpd or dhcpagent, and instead force users through a higher-level
  interface.  This won't be done as part of this project, however.


ARP Hardware Types

  There are multiple places within the DHCPv6 client where the mapping
  of DLPI MAC type to ARP Hardware Type is required:

  - When we are constructing an automatic, stable DUID for our own
    identity, we prefer to use a DUID-LLT if possible.  This is done
    by finding a link-layer interface, opening it, reading the MAC
    address and type, and translating in the make_stable_duid()
    function in libdhcpagent.

  - When we translate a user-configured DUID from
    /etc/default/dhcpagent into a binary representation, we may have
    to deal with a physical interface name.  In this case, we must
    open that interface and read the MAC address and type.

  - As part of the PIF data structure initialization, we need to read
    out the MAC type so that it can be used in the BOOTP/DHCPv4
    'htype' field.

  Ideally, these would all be provided by a single libdlpi
  implementation.  However, that project is on-going at this time and
  has not yet integrated.  For the time being, a dlpi_to_arp()
  translation function (taking dl_mac_type and returning an ARP
  Hardware Type number) will be placed in libdhcputil.

  This temporary function should be removed and this section of the
  code updated when the new libdlpi from Clearview integrates.
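
  As a rough illustration of the translation described above, a
  table-driven version might look like the sketch below.  The function
  name, the EX_* constants, and their numeric values are assumptions
  made for this example; the real values come from <sys/dlpi.h> and
  <net/if_arp.h> and should be checked there.  Returning 0 for an
  unknown MAC type is also just a choice made here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative constants only (assumed values).  A real implementation
 * would use DL_* from <sys/dlpi.h> and ARPHRD_* from <net/if_arp.h>.
 */
#define	EX_DL_CSMACD		0x00
#define	EX_DL_TPR		0x02
#define	EX_DL_ETHER		0x04
#define	EX_DL_IB		0x1a

#define	EX_ARPHRD_ETHER		1
#define	EX_ARPHRD_IEEE802	6
#define	EX_ARPHRD_IB		32

typedef struct {
	uint32_t	em_dlpi;	/* DLPI MAC type */
	uint16_t	em_arp;		/* ARP Hardware Type */
} ex_map_t;

static const ex_map_t ex_map[] = {
	{ EX_DL_CSMACD,	EX_ARPHRD_IEEE802 },
	{ EX_DL_TPR,	EX_ARPHRD_IEEE802 },
	{ EX_DL_ETHER,	EX_ARPHRD_ETHER },
	{ EX_DL_IB,	EX_ARPHRD_IB }
};

/*
 * Hypothetical stand-in for dlpi_to_arp(): translate a DLPI MAC type
 * into an ARP Hardware Type, or return 0 if the type is not known.
 */
uint16_t
dlpi_to_arp_sketch(uint32_t dl_mac_type)
{
	size_t i;

	for (i = 0; i < sizeof (ex_map) / sizeof (ex_map[0]); i++) {
		if (ex_map[i].em_dlpi == dl_mac_type)
			return (ex_map[i].em_arp);
	}
	return (0);
}
```

  All three callers listed above (DUID-LLT construction, user DUID
  parsing, and PIF initialization) could share one such table.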


Field Mappings

  Old (all in ifslist)  New
  next                  dhcp_smach_t.dsm_next
  prev                  dhcp_smach_t.dsm_prev
  if_hold_count         dhcp_smach_t.dsm_hold_count
  if_ia                 dhcp_smach_t.dsm_ia
  if_async              dhcp_smach_t.dsm_async
  if_state              dhcp_smach_t.dsm_state
  if_dflags             dhcp_smach_t.dsm_dflags
  if_name               dhcp_smach_t.dsm_name (see text)
  if_index              dhcp_pif_t.pif_index
  if_max                dhcp_lif_t.lif_max and dhcp_pif_t.pif_max
  if_min                (was unused; removed)
  if_opt                (was unused; removed)
  if_hwaddr             dhcp_pif_t.pif_hwaddr
  if_hwlen              dhcp_pif_t.pif_hwlen
  if_hwtype             dhcp_pif_t.pif_hwtype
  if_cid                dhcp_smach_t.dsm_cid
  if_cidlen             dhcp_smach_t.dsm_cidlen
  if_prl                dhcp_smach_t.dsm_prl
  if_prllen             dhcp_smach_t.dsm_prllen
  if_daddr              dhcp_pif_t.pif_daddr
  if_dlen               dhcp_pif_t.pif_dlen
  if_saplen             dhcp_pif_t.pif_saplen
  if_sap_before         dhcp_pif_t.pif_sap_before
  if_dlpi_fd            dhcp_pif_t.pif_dlpi_fd
  if_sock_fd            v4_sock_fd and v6_sock_fd (globals)
  if_sock_ip_fd         dhcp_lif_t.lif_sock_ip_fd
  if_timer              (see text)
  if_t1                 dhcp_lease_t.dl_t1
  if_t2                 dhcp_lease_t.dl_t2
  if_lease              dhcp_lif_t.lif_expire
  if_nrouters           dhcp_smach_t.dsm_nrouters
  if_routers            dhcp_smach_t.dsm_routers
  if_server             dhcp_smach_t.dsm_server
  if_addr               dhcp_lif_t.lif_v6addr
  if_netmask            dhcp_lif_t.lif_v6mask
  if_broadcast          dhcp_lif_t.lif_v6peer
  if_ack                dhcp_smach_t.dsm_ack
  if_orig_ack           dhcp_smach_t.dsm_orig_ack
  if_offer_wait         dhcp_smach_t.dsm_offer_wait
  if_offer_timer        dhcp_smach_t.dsm_offer_timer
  if_offer_id           dhcp_pif_t.pif_dlpi_id
  if_acknak_id          dhcp_lif_t.lif_acknak_id
  if_acknak_bcast_id    v4_acknak_bcast_id (global)
  if_neg_monosec        dhcp_smach_t.dsm_neg_monosec
  if_newstart_monosec   dhcp_smach_t.dsm_newstart_monosec
  if_curstart_monosec   dhcp_smach_t.dsm_curstart_monosec
  if_disc_secs          dhcp_smach_t.dsm_disc_secs
  if_reqhost            dhcp_smach_t.dsm_reqhost
  if_recv_pkt_list      dhcp_smach_t.dsm_recv_pkt_list
  if_sent               dhcp_smach_t.dsm_sent
  if_received           dhcp_smach_t.dsm_received
  if_bad_offers         dhcp_smach_t.dsm_bad_offers
  if_send_pkt           dhcp_smach_t.dsm_send_pkt
  if_send_timeout       dhcp_smach_t.dsm_send_timeout
  if_send_dest          dhcp_smach_t.dsm_send_dest
  if_send_stop_func     dhcp_smach_t.dsm_send_stop_func
  if_packet_sent        dhcp_smach_t.dsm_packet_sent
  if_retrans_timer      dhcp_smach_t.dsm_retrans_timer
  if_script_fd          dhcp_smach_t.dsm_script_fd
  if_script_pid         dhcp_smach_t.dsm_script_pid
  if_script_helper_pid  dhcp_smach_t.dsm_script_helper_pid
  if_script_event       dhcp_smach_t.dsm_script_event
  if_script_event_id    dhcp_smach_t.dsm_script_event_id
  if_callback_msg       dhcp_smach_t.dsm_callback_msg
  if_script_callback    dhcp_smach_t.dsm_script_callback

  Notes:

    - The dsm_name field currently just points to the lif_name on the
      controlling LIF.  This may need to be named differently in the
      future; perhaps when Zones are supported.

    - The timer mechanism will be refactored.  Rather than using the
      separate if_timer[] array to hold the timer IDs and
      if_{t1,t2,lease} to hold the relative timer values, we will
      gather this information into a dhcp_timer_t structure:

    dt_id       timer ID value
    dt_start    relative start time

  New fields not accounted for above:

  dhcp_pif_t.pif_next            linkage in global list of PIFs
  dhcp_pif_t.pif_prev            linkage in global list of PIFs
  dhcp_pif_t.pif_lifs            pointer to list of LIFs on this PIF
  dhcp_pif_t.pif_isv6            IPv6 flag
  dhcp_pif_t.pif_dlpi_count      number of state machines using DLPI
  dhcp_pif_t.pif_hold_count      reference count
  dhcp_pif_t.pif_name            name of physical interface
  dhcp_lif_t.lif_next            linkage in per-PIF list of LIFs
  dhcp_lif_t.lif_prev            linkage in per-PIF list of LIFs
  dhcp_lif_t.lif_pif             backpointer to parent PIF
  dhcp_lif_t.lif_smachs          pointer to list of state machines
  dhcp_lif_t.lif_lease           backpointer to lease holding LIF
  dhcp_lif_t.lif_flags           interface flags (IFF_*)
  dhcp_lif_t.lif_hold_count      reference count
  dhcp_lif_t.lif_dad_wait        waiting for DAD resolution flag
  dhcp_lif_t.lif_removed         removed from list flag
  dhcp_lif_t.lif_plumbed         plumbed by dhcpagent flag
  dhcp_lif_t.lif_expired         lease has expired flag
  dhcp_lif_t.lif_declined        reason to refuse this address (string)
  dhcp_lif_t.lif_iaid            unique and stable 32-bit identifier
  dhcp_lif_t.lif_iaid_id         timer for delayed /etc writes
  dhcp_lif_t.lif_preferred       preferred timer for v6; deprecate after
  dhcp_lif_t.lif_name            name of logical interface
  dhcp_smach_t.dsm_lif           controlling (main) LIF
  dhcp_smach_t.dsm_leases        pointer to list of leases
  dhcp_smach_t.dsm_lif_wait      number of LIFs waiting on DAD
  dhcp_smach_t.dsm_lif_down      number of LIFs that have failed
  dhcp_smach_t.dsm_using_dlpi    currently using DLPI flag
  dhcp_smach_t.dsm_send_tcenter  v4 central timer value; v6 MRT
  dhcp_lease_t.dl_next           linkage in per-state-machine list of leases
  dhcp_lease_t.dl_prev           linkage in per-state-machine list of leases
  dhcp_lease_t.dl_smach          back pointer to state machine
  dhcp_lease_t.dl_lifs           pointer to first LIF configured by lease
  dhcp_lease_t.dl_nlifs          number of configured consecutive LIFs
  dhcp_lease_t.dl_hold_count     reference counter
  dhcp_lease_t.dl_removed        removed from list flag
  dhcp_lease_t.dl_stale          lease was not updated by Renew/Rebind


Snoop

  The snoop changes are fairly straightforward.  As snoop just decodes
  the messages, and the message format is quite different between
  DHCPv4 and DHCPv6, a new module will be created to handle DHCPv6
  decoding, and will export an interpret_dhcpv6() function.

  The one bit of commonality between the two protocols is the use of
  ARP Hardware Type numbers, which are found in the underlying BOOTP
  message format for DHCPv4 and in the DUID-LL and DUID-LLT
  construction for DHCPv6.  To simplify this, the existing static
  show_htype() function in snoop_dhcp.c will be renamed to arp_htype()
  (to better reflect its functionality), updated with more modern
  hardware types, moved to snoop_arp.c (where it belongs), and made a
  public symbol within snoop.

  While I'm there, I'll update snoop_arp.c so that when it prints an
  ARP message in verbose mode, it uses arp_htype() to translate the
  ar_hrd value.
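
  A minimal sketch of what a shared hardware-type decoder could look
  like follows.  The function name and the exact strings are
  hypothetical and are not taken from snoop; the numeric values are the
  IANA-assigned ARP hardware types, and only a handful are shown.

```c
/*
 * Hypothetical stand-in for arp_htype(): map an ARP Hardware Type
 * number to a printable name.  The same routine can serve the BOOTP
 * 'htype' field, the ARP ar_hrd field, and DUID-LL/DUID-LLT decoding.
 */
const char *
arp_htype_sketch(unsigned int hrd)
{
	switch (hrd) {
	case 1:
		return ("Ethernet");
	case 6:
		return ("IEEE 802");
	case 7:
		return ("ARCNET");
	case 15:
		return ("Frame Relay");
	case 16:
		return ("ATM");
	case 32:
		return ("InfiniBand");
	default:
		return ("Unknown");
	}
}
```

  Returning a fixed string for unknown values keeps every caller's
  output code free of special cases.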

  The snoop updates also involve the addition of a new "dhcp6" keyword
  for filtering.  As a part of this, CR 6487534 will be fixed.


IPv6 Source Address Selection

  One of the customer requests for DHCPv6 is to be able to predict the
  address selection behavior in the presence of both stateful and
  stateless addresses on the same network.

  Solaris implements RFC 3484 address selection behavior.  In this
  scheme, the first seven rules implement some basic preferences for
  addresses, with Rule 8 being a deterministic tie breaker.

  Rule 8 relies on a special function, CommonPrefixLen, defined in the
  RFC, that compares leading bits of the address without regard to
  configured prefix length.  As Rule 1 eliminates equal addresses,
  this always picks a single address.

  This rule, though, allows for additional checks:

   Rule 8 may be superseded if the implementation has other means of
   choosing among source addresses.  For example, if the implementation
   somehow knows which source address will result in the "best"
   communications performance.

  We will thus split Rule 8 into three separate rules:

  - First, compare on configured prefix.  The interface with the
    longest configured prefix length that also matches the candidate
    address will be preferred.

  - Next, check the type of address.  Prefer statically configured
    addresses above all others.  Next, those from DHCPv6.  Next,
    stateless autoconfigured addresses.  Finally, temporary addresses.
    (Note that Rule 7 will take care of temporary address preferences,
    so that this rule doesn't actually need to look at them.)

  - Finally, run the check-all-bits (CommonPrefixLen) tie breaker.

  The result of this is that if there's a local address in the same
  configured prefix, then we'll prefer that over other addresses.  If
  there are multiple to choose from, then we'll pick static first, then
  DHCPv6, then stateless.  Finally, if there are still multiples, we'll
  use the "closest" address, bitwise.
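
  The three-way split of Rule 8 can be sketched as a pairwise
  comparison, shown below.  This is not the kernel code: the type and
  function names are invented for the example, and it assumes that a
  configured prefix "matches" when the destination falls inside the
  prefix configured on the candidate source address.  Temporary
  addresses are left out because Rule 7 has already dealt with them.

```c
#include <stdint.h>

/* Hypothetical address classes, most preferred first. */
typedef enum {
	EX_ADDR_STATIC = 0,
	EX_ADDR_DHCPV6,
	EX_ADDR_STATELESS
} ex_addr_type_t;

typedef struct {
	uint8_t		ec_addr[16];	/* candidate source address */
	unsigned int	ec_plen;	/* configured prefix length (bits) */
	ex_addr_type_t	ec_type;	/* how the address was configured */
} ex_cand_t;

/* CommonPrefixLen: number of leading bits shared by a and b (0-128). */
static unsigned int
ex_common_prefix_len(const uint8_t *a, const uint8_t *b)
{
	unsigned int i, bits = 0;

	for (i = 0; i < 16; i++) {
		uint8_t diff = a[i] ^ b[i];

		if (diff == 0) {
			bits += 8;
			continue;
		}
		while ((diff & 0x80) == 0) {
			bits++;
			diff <<= 1;
		}
		break;
	}
	return (bits);
}

/* Configured prefix length if dst is inside that prefix, else 0. */
static unsigned int
ex_prefix_score(const ex_cand_t *c, const uint8_t *dst)
{
	if (ex_common_prefix_len(c->ec_addr, dst) >= c->ec_plen)
		return (c->ec_plen);
	return (0);
}

/* Return whichever of a and b is the preferred source for dst. */
const ex_cand_t *
ex_prefer(const ex_cand_t *a, const ex_cand_t *b, const uint8_t *dst)
{
	unsigned int sa = ex_prefix_score(a, dst);
	unsigned int sb = ex_prefix_score(b, dst);

	if (sa != sb)			/* longest matching prefix */
		return (sa > sb ? a : b);
	if (a->ec_type != b->ec_type)	/* static, DHCPv6, stateless */
		return (a->ec_type < b->ec_type ? a : b);
	/* check-all-bits tie breaker */
	return (ex_common_prefix_len(a->ec_addr, dst) >=
	    ex_common_prefix_len(b->ec_addr, dst) ? a : b);
}
```

  With two candidates on the same /64 as the destination, one
  stateless and one from DHCPv6, ex_prefer() picks the DHCPv6 address;
  a static address on a different prefix loses to either of them.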

  This basic implementation scheme also addresses CR 6485164, so
  a fix for that will be included with this project.


Minor Improvements

  Various small problems with the system encountered during
  development will be fixed along with this project.  Some of these
  are:

  - List of ARPHRD_* types is a bit short; add some new ones.

  - List of IPPORT_* values is similarly sparse; add others in use by
    snoop.

  - dhcpmsg.h lacks PRINTFLIKE for dhcpmsg(); add it.

  - CR 6482163 causes excessive lint errors with libxnet; will fix.

  - libdhcpagent uses gettimeofday() for I/O timing, and this can
    drift on systems with NTP.  It should use a stable time source
    (gethrtime()) instead, and should return better error values.

  - Controlling debug mode in the daemon shouldn't require changing
    the command line arguments or jumping through special hoops.  I've
    added undocumented ".DEBUG_LEVEL=[0-3]" and ".VERBOSE=[01]"
    features to /etc/default/dhcpagent.

  - The various attributes of the IPC commands (requires privileges,
    creates a new session, valid with BOOTP, immediate reply) should
    be gathered together into one look-up table rather than scattered
    as hard-coded tests.

  - Remove the event unregistration from the command dispatch loop and
    get rid of the ipc_action_pending() botch.  We'll get a
    zero-length read any time the client goes away, and that will be
    enough to trigger termination.  This fix removes async_pending()
    and async_timeout() as well, and fixes CR 6487958 as a
    side-effect.

  - Throughout the dhcpagent code, there are private implementations
    of doubly-linked and singly-linked lists for each data type.
    These will all be removed and replaced with insque(3C) and
    remque(3C).
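
  For the last item, the general shape of a list built on insque(3C)
  and remque(3C) is sketched below.  The node type and helper names are
  made up for the example and are not the dhcpagent structures.  The
  one firm requirement is that the forward and backward pointers are
  the first two members of the element, in that order.

```c
/* Request the X/Open declarations of insque() and remque(). */
#define	_XOPEN_SOURCE	700
#include <search.h>
#include <stddef.h>

/* Hypothetical list element; the two link members must come first. */
typedef struct ex_node {
	struct ex_node	*en_next;
	struct ex_node	*en_prev;
	int		en_value;
} ex_node_t;

/*
 * Append node after tail and return the new tail.  Passing a NULL tail
 * starts a new linear list with node as its only element.
 */
ex_node_t *
ex_list_append(ex_node_t *tail, ex_node_t *node)
{
	insque(node, tail);
	return (node);
}

/* Unlink node from whatever list it is on and clear its links. */
void
ex_list_remove(ex_node_t *node)
{
	remque(node);
	node->en_next = node->en_prev = NULL;
}

/* Count the elements reachable from head. */
int
ex_list_length(const ex_node_t *head)
{
	int n = 0;

	for (; head != NULL; head = head->en_next)
		n++;
	return (n);
}
```

  Because the link members sit at the same offsets in every element
  type, one pair of library routines can replace each per-type list
  implementation.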


Testing

  The implementation was tested using the TAHI test suite for DHCPv6
  (www.tahi.org).  There are some peculiar aspects to this test suite,
  and these issues directed some of the design.  In particular:

  - If Renew/Rebind doesn't mention one of our leases, then we need to
    allow the message to be retransmitted.  Real servers are unlikely
    to do this.

  - We must look for a status code within IAADDR and within IA_NA, and
    handle the paradoxical case of "NoAddrsAvail."  That doesn't make
    sense, as a server with no addresses wouldn't use those options.
    That option makes more sense at the top level of the message.

  - If we get "UseMulticast" when we were already using multicast,
    then ignore the error code.  Sending another request would cause a
    loop.

  - TAHI uses "NoBinding" at the top level of the message.  This
    status code only makes sense within an IA, as it refers to the
    DUID:IAID binding, which doesn't exist outside an IA.  We must
    ignore such errors -- treat them as success.
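
  The status code quirks above can be collected into a single
  predicate, roughly as sketched here.  The numeric status codes are
  those assigned by RFC 3315; the enum and function names, and the idea
  of passing in where the status option was found, are assumptions made
  for this example.  The Renew/Rebind retransmission case is not
  covered.

```c
#include <stdbool.h>

/* DHCPv6 status codes as assigned by RFC 3315. */
enum {
	EX_STAT_SUCCESS = 0,
	EX_STAT_UNSPECFAIL = 1,
	EX_STAT_NOADDRSAVAIL = 2,
	EX_STAT_NOBINDING = 3,
	EX_STAT_NOTONLINK = 4,
	EX_STAT_USEMULTICAST = 5
};

/* Hypothetical: where in the message the Status Code option was found. */
typedef enum {
	EX_AT_MESSAGE,		/* top level of the message */
	EX_AT_IA_NA,		/* inside an IA_NA option */
	EX_AT_IAADDR		/* inside an IAADDR option */
} ex_where_t;

/*
 * Decide whether a status code should be treated as a failure.
 * using_mcast is true if the request was already sent by multicast.
 */
bool
ex_status_is_error(int code, ex_where_t where, bool using_mcast)
{
	if (code == EX_STAT_SUCCESS)
		return (false);

	/* NoBinding only makes sense inside an IA; ignore it elsewhere. */
	if (code == EX_STAT_NOBINDING && where == EX_AT_MESSAGE)
		return (false);

	/* Already multicasting; retrying would just loop. */
	if (code == EX_STAT_USEMULTICAST && using_mcast)
		return (false);

	/* Everything else, including NoAddrsAvail at any level, fails. */
	return (true);
}
```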


Interactions With Other Projects

  Clearview UV (vanity naming) will cause link names, and thus IP
  interface names, to become changeable over time.  This will break
  the IAID stability mechanism if UV is used for arbitrary renaming,
  rather than as just a DR enhancement.

  When this portion of Clearview integrates, this part of the DHCPv6
  design may need to be revisited.  (The solution will likely be
  handled at some higher layer, such as within Network Automagic.)

  Clearview is also contributing a new libdlpi that will work for
  dhcpagent, and is thus removing the private dlpi_io.[ch] functions
  from this daemon.  When that Clearview project integrates, the
  DHCPv6 project will need to adjust to the new interfaces, and remove
  or relocate the dlpi_to_arp() function.


Futures

  Zones currently cannot address any IP interfaces by way of DHCP.
  This project will not fix that problem, but the DUID/IAID could be
  used to help fix it in the future.

  In particular, the DUID allows the client to obtain separate sets of
  addresses and configuration parameters on a single interface, just
  like an IPv4 Client ID, but it includes a clean mechanism for vendor
  extensions.  If we associate the DUID with the zone identifier or
  name through an extension, then we have a really simple way of
  allocating per-zone addresses.

  Moreover, RFC 4361 describes a handy way of using DHCPv6 DUID/IAID
  values with IPv4 DHCP, which would quickly solve the problem of
  using DHCP for IPv4 address assignment in non-global zones as well.
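
  To make the RFC 4361 idea concrete: the IPv4 Client Identifier it
  describes is a type byte of 255, followed by the four-byte IAID,
  followed by the DUID.  The sketch below builds that byte string.  The
  function name is hypothetical, and writing the IAID in network byte
  order is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	EX_CID_TYPE_DUID	255	/* RFC 4361 client identifier type */

/*
 * Build an RFC 4361 style client identifier into buf: one type byte,
 * the 32-bit IAID, then the DUID.  Returns the number of bytes written,
 * or 0 if buf is too small.
 */
size_t
ex_build_client_id(uint8_t *buf, size_t buflen, uint32_t iaid,
    const uint8_t *duid, size_t duidlen)
{
	size_t need = 1 + 4 + duidlen;

	if (buflen < need)
		return (0);

	buf[0] = EX_CID_TYPE_DUID;
	buf[1] = (uint8_t)(iaid >> 24);
	buf[2] = (uint8_t)(iaid >> 16);
	buf[3] = (uint8_t)(iaid >> 8);
	buf[4] = (uint8_t)iaid;
	(void) memcpy(buf + 5, duid, duidlen);
	return (need);
}
```

  A per-zone DUID, as suggested above, would then yield a distinct
  client identifier for each zone on the same interface.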

  (One potential risk with this plan is that there may be server
  implementations that either do not implement the RFC correctly or
  otherwise mishandle the DUID.  This has apparently bitten some early
  adopters.)

  Implementing the FQDN option for DHCPv6 would, given the current
  libdhcputil design, require a new 'type' of entry for the inittab6
  file.  This is because the design does not allow for any simple
  means to "compose" a sequence of basic types together.  Thus,
  every type of option must either be a basic type, or an array of
  multiple instances of the same basic type.

  If we implement FQDN in the future, it may be useful to explore some
  means of allowing a given option instance to be a sequence of basic
  types.

  This project does not make the DNS resolver or any other subsystem
  use the data gathered by DHCPv6.  It just makes the data available
  through dhcpinfo(1).  Future projects should modify those services
  to use configuration data learned via DHCPv6.  (One of the reasons
  this is not being done now is that Network Automagic [NWAM] will
  likely be changing this area substantially in the very near future,
  and thus the effort would be largely wasted.)


Appendix A - Choice of Venue

  There are three logical places to implement DHCPv6:

    - in dhcpagent
    - in in.ndpd
    - in a new daemon (say, 'dhcp6agent')

  We need to access parameters via dhcpinfo, and should provide the
  same set of status and control features via ifconfig as are present
  for IPv4.  (For the latter, if we fail to do that, it will likely
  confuse users.  The expense for doing it is comparatively small, and
  it will be useful for testing, even though it should not be needed
  in normal operation.)

  If we implement somewhere other than dhcpagent, then we need to give
  that new daemon (in.ndpd or dhcp6agent) the same basic IPC features
  as dhcpagent already has.  This means either extracting those bits
  (async.c and ipc_action.c) into a shared library or just copying
  them.  Obviously, the former would be preferred, but as those bits
  depend on the rest of the dhcpagent infrastructure for timers and
  state handling, this means that the new process would have to look a
  lot like dhcpagent.

  Implementing DHCPv6 as part of in.ndpd is attractive, as it
  eliminates the confusion that the router discovery process for
  determining interface netmasks can cause, along with the need to do
  any signaling at all to bring DHCPv6 up.  However, the need to make
  in.ndpd more like dhcpagent is unattractive.

  Having a new dhcp6agent daemon seems to have little to recommend it,
  other than leaving the existing dhcpagent code untouched.  If we do
  that, then we end up with two implementations that do many similar
  things, and must be maintained in parallel.

  Thus, although it leads to some complexity in reworking the data
  structures to fit both protocols, on balance the simplest solution
  is to extend dhcpagent.


Appendix B - Cross-Reference

  in.ndpd

    - Start dhcpagent and issue "dhcp start" command via libdhcpagent
    - Parse StatefulAddrConf interface option from ndpd.conf
    - Watch for M and O bits to trigger DHCPv6
    - Handle "no routers found" case and start DHCPv6
    - Track prefixes and set prefix length on IFF_DHCPRUNNING aliases
    - Send new Router Solicitation when prefix unknown
    - Change privileges so that dhcpagent can be launched successfully

  libdhcputil

    - Parse new /etc/dhcp/inittab6 file
    - Handle new UNUMBER24, SNUMBER64, IPV6, DUID and DOMAIN types
    - Add DHCPv6 option iterators (dhcpv6_find_option and
      dhcpv6_pkt_option)
    - Add dlpi_to_arp function (temporary)

  libdhcpagent

    - Add stable DUID and IAID creation and storage support
      functions and add new dhcp_stable.h include file
    - Support new DECLINING and RELEASING states introduced by DHCPv6.
    - Update implementation so that it doesn't rely on gettimeofday()
      for I/O timeouts
    - Extend the hostconf functions to support DHCPv6, using a new
      ".dh6" file

  snoop

    - Add support for DHCPv6 packet decoding (all types)
    - Add "dhcp6" filter keyword
    - Fix known bugs in DHCP filtering

  ifconfig

    - Remove inet-only restriction on "dhcp" keyword

  netstat

    - Remove strange "-I list" feature.
    - Add support for DHCPv6 and iterating over IPv6 interfaces.

  ip

    - Add extensions to IPv6 source address selection to prefer DHCPv6
      addresses when all else is equal
    - Fix known bugs in source address selection (remaining from TX
      integration)

  other

    - Add ifindex and source/destination address into PKT_LIST.
    - Add more ARPHRD_* and IPPORT_* values.