ipnet.c revision 7b57f05abb8796d3c91c8d4d4c75dcafb5af6b69
/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

/*
 * The ipnet device defined here provides access to packets at the IP layer. To
 * provide access to packets at this layer it registers a callback function in
 * the ip module and when there are open instances of the device ip will pass
 * packets into the device. Packets from ip are passed on the input, output and
 * loopback paths. Internally the module returns to ip as soon as possible by
 * deferring processing using a taskq.
 *
 * Management of the devices in /dev/ipnet/ is handled by the devname
 * filesystem and use of the neti interfaces. This module registers for NIC
 * events using the neti framework so that when IP interfaces are brought up,
 * taken down etc. the ipnet module is notified and its view of the interfaces
 * configured on the system adjusted. On attach, the module gets an initial
 * view of the system again using the neti framework but as it has already
 * registered for IP interface events, it is still up-to-date with any changes.
 */
"ipnet",
/* mi_idname */
2048,
/* mi_hiwat */

/*
 * List to hold static view of ipnetif_t's on the system. This is needed to
 * avoid holding the lock protecting the avl tree of ipnetif's over the
 * callback into the dev filesystem.
 */

/*
 * Convenience enumerated type for ipnet_accept(). It describes the
 * properties of a given ipnet_addrp_t relative to a single ipnet_t
 * client stream. The values represent whether the address is ...
 */

/* Argument used for the ipnet_nicevent_taskq callback. */

"STREAMS ipnet driver",
/*
 * This structure contains the template data (names and type) that is
 * copied, in bulk, into the new kstats structure created by net_kstat_create.
 * No actual statistical information is stored in this instance of the
 * ipnet_kstats_t structure.
 */

/*
 * Walk the list of physical interfaces on the machine, for each
 * interface create a new ipnetif_t and add any addresses to it. We
 * need to do the walk twice, once for IPv4 and once for IPv6.
 *
 * The interfaces are destroyed as part of ipnet_stack_fini() for each
 * stack. Note that we cannot do this initialization in
 * ipnet_stack_init(), since ipnet_stack_init() cannot fail.
 */

/*
 * Standard module entry points.
 */

/*
 * We call ddi_taskq_create() with nthread == 1 to ensure in-order
 * delivery of packets to clients. Note that we need to create the
 * taskqs before calling netstack_register() since ipnet_stack_init()
 * registers callbacks that use 'em.
 */

/*
 * It is possible for an exclusive stack to be in the process of
 * shutting down here, and the netid and protocol lookups could fail
 */

/*
 * Create a local set of kstats for each zone.
 */
"ipnet",
"ipnet_stats",
"misc");
/*
 * This function is called on attach to build an initial view of the
 * interfaces on the system. It will be called once for IPv4 and once
 * for IPv6; although there is only one ipnet interface for both IPv4
 * and IPv6, there are separate address lists.
 */

/*
 * If ipnet_register_netihook() was unable to initialize this
 * stack's net_handle_t, then we cannot populate any interface
 * information. This usually happens when we attempted to
 * grab a net_handle_t as a stack was shutting down. We don't
 * want to fail the entire _init() operation because of a
 * stack shutdown (other stacks will continue to work just
 * fine), so we silently return success here.
 */

/*
 * Make sure we're not processing NIC events during the
 * population of our interfaces and address lists.
 */

/*
 * Skip addresses that aren't up. We'll add
 * them when we receive an NE_LIF_UP event.
 */

/* Don't add it if we already have it. */

/*
 * If the system is labeled, only the global zone is allowed to open
 */

/* We don't support open as a module */

/* This driver is self-cloning, we don't support re-open. */

/*
 * We need to hold ips_event_lock here as any NE_LIF_DOWN events need
 * to be processed after ipnet_if is set and the ipnet_t has been
 * inserted in the ips_str_list.
 */

/*
 * Only register our callback if we're the first open client; we call
 * unregister in close() for the last open client.
 */

/* Fallthrough, we don't support I_STR with DLIOCIPNETINFO. */

/*
 * Allocate a new mblk_t and put a dl_ipnetinfo_t in it.
 * The structure it copies the header information from,
 * hook_pkt_observe_t, is constructed using network byte
 * order in ipobs_hook(), so there is no conversion here.
 */

/* First check if the address is multicast or limited broadcast. */

/*
 * Walk the address list to see if the address belongs to our
 * interface or is one of our subnet broadcast addresses.
 */
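The two address checks described in the comments above (multicast or limited broadcast first, then a walk of the interface's address list) can be illustrated with a small user-space sketch for IPv4. This is not the driver's code: every `sketch_` and `SK_` name is hypothetical, addresses are assumed to be in host byte order, and a plain array stands in for the real address list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of an interface's IPv4 address list. */
typedef struct sketch_addr {
	uint32_t sa_addr;	/* interface address, host byte order */
	uint32_t sa_brdaddr;	/* subnet broadcast address, 0 if none */
} sketch_addr_t;

/* Hypothetical classification result; not the driver's enumerated type. */
typedef enum {
	SK_NONE,		/* no relation to the interface */
	SK_MCAST_BCAST,		/* multicast or limited broadcast */
	SK_MY_ADDR,		/* one of the interface's own addresses */
	SK_SUBNET_BCAST		/* one of the subnet broadcast addresses */
} sketch_class_t;

static sketch_class_t
sketch_classify_v4(uint32_t addr, const sketch_addr_t *list, size_t n)
{
	assert(list != NULL || n == 0);

	/* 224.0.0.0/4 is multicast; 255.255.255.255 is limited broadcast. */
	if ((addr & 0xf0000000u) == 0xe0000000u || addr == 0xffffffffu)
		return (SK_MCAST_BCAST);

	/* Otherwise walk the address list looking for a match. */
	for (size_t i = 0; i < n; i++) {
		if (list[i].sa_addr == addr)
			return (SK_MY_ADDR);
		/* An interface may have no broadcast address at all. */
		if (list[i].sa_brdaddr != 0 && list[i].sa_brdaddr == addr)
			return (SK_SUBNET_BCAST);
	}
	return (SK_NONE);
}
```

Checking the multicast and limited-broadcast cases first means the list walk is skipped entirely for that traffic; whether the real driver orders the work for that reason is an assumption.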
/*
 * If we're not in the global zone, then only look at
 */

/*
 * Verify if the packet contained in hdr should be passed up to the
 */

/*
 * If the packet's ifindex matches ours, or the packet's group ifindex
 * matches ours, it's on the interface we're observing. (Thus,
 * observing on the group ifindex matches all ifindexes in the group.)
 */

/*
 * Do not allow an ipnet stream to see packets that are not from or to
 * its zone. The exception is when zones are using the shared stack
 * model. In this case, streams in the global zone have visibility
 * into other shared-stack zones, and broadcast and multicast traffic
 * is visible by all zones in the stack.
 */

/*
 * If DL_PROMISC_SAP isn't enabled, then the bound SAP must match the
 */

/* If the destination address is ours, then accept the packet. */

/*
 * If DL_PROMISC_PHYS is enabled, then we can see all packets that are
 * sent or received on the interface we're observing, or packets that
 * have our source address (this allows us to see packets we send).
 */

/*
 * We accept multicast and broadcast packets transmitted or received
 * on the interface we're observing.
 */

/*
 * Verify if the packet contained in hdr should be passed up to the ipnet
 * client stream that's in IPNET_LOMODE.
 */

/*
 * ipnet_if is only NULL for IPNET_MINOR_LO devices.
 */

/*
 * An ipnet stream must not see packets that are not from/to its zone.
 */

/*
 * Create a new ipnetif_t and new minor node for it. If creation is
 * successful the new ipnetif_t is inserted into an avl_tree
 * containing ipnetif's for this stack instance.
 */

/*
 * Because ipnetif_create() can be called from a NIC event
 * callback, it should not block.
 */

/*
 * Now that the interface can be found by lookups back into ipnet,
 * allowing for sanity checking, call the BPF attach.
 */

/* Send a SIGHUP to all open streams associated with this ipnetif. */

/*
 * Now that the interface can't be found, do a BPF detach
 */

/*
 * Release the reference we implicitly held in ipnetif_create().
 */
/* Remove IPv4/v6 address lists from the ipnetif */

/*
 * Create an ipnetif_addr_t with the given logical interface id (lif)
 * and add it to the supplied ipnetif. The lif is the netinfo
 * representation of logical interface id, and we use this id to match
 * incoming netinfo events against our lists of addresses.
 */

/*
 * Try and get the broadcast address. Note that it's okay for
 * an interface to not have a broadcast address, so we don't
 * fail the entire operation if net_getlifaddr() fails here.
 */

/*
 * The zoneid stored in ipnetif_t needs to correspond to the actual
 * zone the address is being used in. This facilitates finding the
 * correct netstack_t pointer, amongst other things, later.
 */

/*
 * Note that we have one ipnetif for both IPv4 and IPv6, but we receive
 * separate NE_UNPLUMB events for IPv4 and IPv6. We remove the ipnetif
 * if both IPv4 and IPv6 interfaces have been unplumbed.
 */

/*
 * We must have missed a NE_LIF_DOWN event. Delete this
 * ifaddr and re-create it.
 */

/*
 * Make sure that open streams on this ipnetif are still allowed to
 */

/*
 * This callback from the NIC event framework dispatches a taskq as the event
 */

/* Do any of the addresses in addrlist belong to the supplied zoneid? */

/* Should the supplied ipnetif be visible from the supplied zoneid? */

/*
 * The global zone has visibility into all interfaces in the global
 * stack, and exclusive stack zones have visibility into all
 * interfaces in their stack.
 */

/*
 * Shared-stack zones only have visibility for interfaces that have
 * addresses in their zone.
 */

/*
 * Verify that any ipnet_t that has a reference to the supplied ipnetif should
 * still be allowed to have it open. A given ipnet_t may no longer be allowed
 * to have an ipnetif open if there are no longer any addresses that belong to
 * the ipnetif in the ipnet_t's non-global shared-stack zoneid. If that's the
 * case, send the ipnet_t an M_HANGUP.
 */

/*
 * On labeled systems, non-global zones shouldn't see anything
 */

/*
 * To register multiple hooks with the same callback function,
 * a unique name is needed.
 */
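The zone visibility rules stated in the comments above (global zone and exclusive-stack zones see every interface in their stack; shared-stack zones see only interfaces carrying one of their addresses) can be sketched as a stand-alone predicate. This is an illustration only: the `sk_` types and fields are hypothetical, the interface is assumed to already belong to the caller's stack, and labeled-system restrictions are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int sk_zoneid_t;		/* hypothetical zone id type */
#define	SK_GLOBAL_ZONEID	0	/* hypothetical id of the global zone */

/* Hypothetical, simplified view of an interface and its addresses. */
typedef struct sk_if {
	bool si_exclusive_stack;	/* interface lives in an exclusive stack */
	const sk_zoneid_t *si_addr_zones; /* owning zone of each address */
	size_t si_naddrs;		/* number of addresses */
} sk_if_t;

/* Do any of the interface's addresses belong to the supplied zone? */
static bool
sk_addrs_in_zone(const sk_if_t *ifp, sk_zoneid_t zoneid)
{
	for (size_t i = 0; i < ifp->si_naddrs; i++) {
		if (ifp->si_addr_zones[i] == zoneid)
			return (true);
	}
	return (false);
}

/* Should the supplied interface be visible from the supplied zone? */
static bool
sk_if_in_zone(const sk_if_t *ifp, sk_zoneid_t zoneid)
{
	assert(ifp != NULL);

	/* Global zone and exclusive-stack zones see the whole stack. */
	if (zoneid == SK_GLOBAL_ZONEID || ifp->si_exclusive_stack)
		return (true);

	/* Shared-stack zones need at least one address of their own. */
	return (sk_addrs_in_zone(ifp, zoneid));
}
```

Under these assumptions, the same predicate could be re-run whenever an address is removed to decide whether a stream that has the interface open should be hung up, as the comment about M_HANGUP describes.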
/* ******************************************************************** */
/* BPF Functions below */
/* ******************************************************************** */

/*
 * Convenience function to make mapping a zoneid to an ipnet_stack_t easy.
 */

/*
 * Rather than weave the complexity of what needs to be done for a BPF
 * device attach or detach into the code paths of where they're used,
 * it is presented here in a couple of simple functions, along with
 * when the clone structures can be free'd.
 */

/*
 * Set the functions to call back to when adding or removing an interface so
 * that BPF can keep its internal list of these up to date.
 */

/*
 * If we're setting a new attach function, call it for every
 * mac that has already been attached.
 */

/*
 * The call to ipnet_bpfattach() calls into bpf`bpfattach
 * which then wants to resolve the link name into a link id.
 * For ipnet, this results in a call back to
 * ipnet_get_linkid_byname which also needs to lock and walk
 * the AVL tree. Thus the call to ipnet_bpfattach needs to
 * be made without the avl_lock held.
 */

/*
 * The list of interfaces available via ipnet is private for each zone,
 * so the AVL tree of each zone must be searched for a given name, even
 * if all names are unique.
 */

/*
 * To find the linkid for a given name, it is necessary to know which zone
 * the interface name belongs to and to search the avl tree for that zone
 * as there is no master list of all interfaces and which zone they belong
 * to. It is assumed that the caller of this function is somehow already
 * working with the ipnet interfaces and hence the ips_event_lock is held.
 * When BPF calls into this function, it is doing so because of an event
 * in ipnet, and thus ipnet holds the ips_event_lock. Thus the datalink id
 * value returned has meaning without the need for grabbing a hold on the
 */

/*
 * Strictly speaking, there is no such thing as a "client" in ipnet, like
 * there is in mac. BPF only needs to have this because it is required as
 * part of interfacing correctly with mac. The reuse of the original
 * ipnetif_t as a client poses no danger, so long as it is done with its
 * own ref-count'd hold that is given up on close.
 */

/*
 * This is called from BPF when it needs to start receiving packets
 */

/*
 * The use of the ipnet_t structure here is somewhat lightweight when
 * compared to how it is used elsewhere but it already has all of the
 * right fields in it, so reuse here doesn't seem out of order. Its
 * primary purpose here is to provide the means to store pointers for
 * use when ipnet_promisc_remove() needs to be called.
 */

/*
 * This should never be called for the IPNET_MINOR_LO device as it is
 * never created via ipnetif_create.
 */

/*
 * To register multiple hooks with the same callback function,
 * a unique name is needed.
 */

/*
 * arg here comes from the ipnet_t allocated in ipnet_promisc_add.
 * An important field from that structure is "ipnet_data" that
 * contains the "data" pointer passed into ipnet_promisc_add: it needs
 * to be passed back to bpf when we call into ipnet_itap.
 * ipnet_itap is set by ipnet_set_bpfattach, which in turn is called
 */

/*
 * clone'd ipnetif_t's are created when a shared IP instance zone comes
 * to life and configures an IP address. The model that BPF uses is that
 * each interface must have a unique pointer and each interface must be
 * representative of what it can capture. They are limited to one DLT
 * per interface and one zone per interface. Thus every interface that
 * can be seen in a zone must be announced via an attach to bpf. For
 * shared instance zones, this means the ipnet driver needs to detect
 * when an address is added to an interface in a zone for the first
 * time (and also when the last address is removed.)
 */

/*
 * Called when BPF loads, the goal is to tell BPF about all of the interfaces
 * in use by zones that have a shared IP stack. These interfaces are stored
 * in the ips_avl_by_shared tree. Note that if there are 1000 bge0's in use
 * as bge0:1 through to bge0:1000, then this would be represented by a single