mpd_tables.c revision b6bc5f8f1738d5e35fb4beb2e2f2c3cfeb7f3406
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 * Global list of phyints, phyint instances, phyint groups and the anonymous
 * group; the latter is initialized in phyint_init().
 * Grouplist signature; initialized in phyint_init(). /* Initialize any per-file global state. Returns 0 on success, -1 on failure */ /* Return the phyint with the given name */ * Lookup a phyint in the group that has the same hardware address as `pi', or * NULL if there's none. If `online_only' is set, then only online phyints * are considered when matching. Otherwise, phyints that had been offlined * due to a duplicate hardware address will also be considered. * NOTE: even when online_only is B_FALSE, we ignore phyints * that are administratively offline (rather than offline * because they're dups); when they're brought back online, * they'll be flagged as dups if need be. * Respond to DLPI notifications. Currently, this only processes physical * address changes for the phyint passed via `arg' by onlining or offlining * If our hardware address hasn't changed, there's nothing to do. * Our old hardware address was a duplicate. If we'd been * offlined because of it, and our new hardware address is not * a duplicate, then bring us online. 
Otherwise, `oduppi' * must've been the one brought offline; bring it online. * Our new hardware address was a duplicate and we're not * yet flagged as a duplicate; bring us offline. * Initialize information about the underlying link for `pi', and set us * up to be notified about future changes. Returns _B_TRUE on success. errmsg =
"cannot get hardware address";
* Check if the link supports DLPI link state notifications. For * historical reasons, the actual changes are tracked through routing * sockets, so we immediately disable the notification upon success. * Enable notification of hardware address changes to keep pi_hwaddr * Close use of link on `pi'. * NOTE: we don't clear pi_notes here so that iflinkstate() can still * properly report the link state even when offline (which is possible * since we use IFF_RUNNING to track link state). /* Return the phyint instance with the given name and the given family */ * Insert the phyint in the linked list of all phyints. If the phyint belongs * to some group, insert it in the phyint group list. /* Insert the phyint at the head of the 'all phyints' list */ * Insert the phyint at the head of the 'phyint_group members' list * of the phyint group to which it belongs. /* Refresh the group state now that this phyint has been added */ /* Insert the phyint instance in the linked list of all phyint instances. */ * Insert the phyint at the head of the 'all phyint instances' list. * Create a new phyint with the given parameters. Also insert it into * the list of all phyints and the list of phyint group members by calling * Record the phyint values. * Initialize the link state. The link state is initialized to * up, so that if the link is down when IPMP starts monitoring * the interface, it will appear as though there has been a * transition from the link up to link down. This avoids * having to treat this situation as a special case. * Insert the phyint in the list of all phyints, and the * list of phyint group members * If the interface is offline, we set the state to PI_OFFLINE. * Otherwise, optimistically consider this interface running. Later * (in process_link_state_changes()), we will adjust this to match the * current state of the link. 
Further, if test addresses are * subsequently assigned, we will transition to PI_NOTARGETS and then * to either PI_RUNNING or PI_FAILED depending on the probe results. * Create a new phyint instance belonging to the phyint 'pi' and address * family 'af'. Also insert it into the list of all phyint instances by * calling phyint_inst_insert(). * Attach the phyint instance to the phyint. * Set the back pointers as well /* Insert the phyint instance in the list of all phyint instances. */ * Change the state of phyint `pi' to state `state'. * To simplify things, some callers always set a given state * regardless of the previous state of the phyint (e.g., setting * PI_RUNNING when it's already set). We shouldn't bother * generating an event or consuming a signature for these, since * the actual state of the interface is unchanged. * Note that `pi' has changed state. * Insert the phyint group in the linked list of all phyint groups * at the head of the list * Create a new phyint group called 'name'. * Normal groups always start in the PG_FAILED state since they * have no active interfaces. In contrast, anonymous groups are * heterogeneous and thus always PG_OK. * Change the state of the phyint group `pg' to state `state'. * To simplify things, some callers always set a given state * regardless of the previous state of the group (e.g., setting * PG_DEGRADED when it's already set). We shouldn't bother * generating an event or consuming a signature for these, since * the actual state of the group is unchanged. * We can never know with certainty that a group has * failed. It is possible that all known targets have * failed simultaneously, and new targets have come up * instead. If the targets are routers then router * discovery will kick in, and we will see the new routers * thru routing socket messages. But if the targets are * hosts, we have to discover it by multicast. So flush * all the host targets. The next probe will send out a * multicast echo request. 
If this is a group failure, we * will still not see any response, otherwise the group * will be repaired after we get NUM_PROBE_REPAIRS * consecutive unicast replies on any phyint. logerr(
"phyint_group_chstate: invalid group state %d; " * Create a new phyint instance and initialize it from the values supplied by * the kernel. Always check for ENXIO before logging any error, because the * interface could have vanished after completion of SIOCGLIFCONF. * pointer to the phyint instance on success * NULL on failure, e.g. if the phyint instance is not found in the kernel logdebug(
"phyint_inst_init_from_k(%s %s)\n",
/* Get the socket for doing ioctls */ * Get the interface flags. Ignore virtual interfaces, IPMP * meta-interfaces, point-to-point interfaces, and interfaces * that can't support multicast. * Get the ifindex for recording later in our tables, in case we need * to create a new phyint. " ioctl (get lifindex)");
* Get the phyint group name of this phyint, from the kernel. "ioctl (get group name)");
* If the phyint is not part of any group, pg_name is the * null string. If 'track_all_phyints' is false, there is no * need to create a phyint. * If the IFF_FAILED, IFF_INACTIVE, or IFF_OFFLINE flags are * set, reset them. These flags shouldn't be set if in.mpathd * isn't tracking the interface. * We need to create a new phyint instance. We may also need to * create the group if e.g. the SIOCGLIFCONF loop in initifs() found * an underlying interface before it found its IPMP meta-interface. * Note that we keep any created groups even if phyint_inst_from_k() * fails since a group's existence is not dependent on the ability of * in.mpathd to track the group's interfaces. logerr(
"phyint_inst_init_from_k: cannot create group " * Lookup the phyint. If the phyint does not exist create it. logerr(
"phyint_inst_init_from_k:" " unable to create phyint %s\n",
pi_name);
/* The phyint exists already. */ * Normally we should see consistent values for the IPv4 and * IPv6 instances, for phyint properties. If we don't, it * means things have changed underneath us, and we should * resync our tables with the kernel. Check whether the * interface index has changed. If so, it is most likely * the interface has been unplumbed and replumbed, * while we are yet to update our tables. Do it now. * If the group names seen by the IPv4 and IPv6 instances * are different, it is most likely the group name has * changed, while we are yet to update our tables. Do it now. * Create a new phyint instance, corresponding to the 'af' logerr(
"phyint_inst_init_from_k: unable to create" * If this phyint does not have a unique hardware address in its * group, offline it. (The change_pif_flags() implementation * requires that we defer this until after the phyint_instance * Bind pii_probe_sock to the address associated with pii_probe_logint. * This socket will be used for sending and receiving ICMP/ICMPv6 probes to * targets. Do the common part in this function, and complete the * initializations by calling the protocol specific functions * phyint_inst_v{4,6}_sockinit() respectively. * Return values: _B_TRUE/_B_FALSE for success or failure respectively. logdebug(
"phyint_inst_sockinit(%s %s)\n",
* If the socket is already bound, close pii_probe_sock * If the phyint is not part of a named group and track_all_phyints is logdebug(
"phyint_inst_sockinit: no group\n");
* Initialize the socket by calling the protocol specific function. * If it succeeds, add the socket to the poll list. /* Something failed, cleanup and return false */ * IPv6 specific part in initializing the pii_probe_sock. This socket is * Open a raw socket with ICMPv6 protocol. * Use IPV6_BOUND_IF to make sure that probes are sent and received on * the specified phyint only. Bind to the test address to ensure that * the responses are sent to the specified phyint. * Set the hopcount to 1 so that probe packets are not routed. * Disable multicast loopback. Set the receive filter to * receive only ICMPv6 echo replies. * Probes must not block in case of lower layer issues. (
char *)&
off,
sizeof (
off)) < 0) {
* Filter out so that we only receive ICMP echo replies /* Enable receipt of hoplimit */ /* Enable receipt of timestamp */ * IPv4 specific part in initializing the pii_probe_sock. This socket is * Open a raw socket with ICMPv4 protocol. * Use IP_BOUND_IF to make sure that probes are sent and received on * the specified phyint only. Bind to the test address to ensure that * the responses are sent to the specified phyint. * Set the ttl to 1 so that probe packets are not routed. * Disable multicast loopback. Enable receipt of timestamp. * Probes must not block in case of lower layer issues. (
char *)&
ttl,
sizeof (
ttl)) < 0) {
* Remove the phyint group from the list of 'all phyint groups' * The anonymous group always exists, even when empty. * The phyint group must be empty, and must not have any phyints. * The phyint group must be in the list of all phyint groups * Refresh the state of `pg' based on its current members. * Anonymous groups never change state. * If we're shutting down, skip logging messages since otherwise our * shutdown housecleaning will make us report that groups are unusable. * NOTE: We use pg_failmsg_printed rather than origstate since * otherwise at startup we'll log a "now usable" message when the * first usable phyint is added to an empty group. logerr(
"At least 1 IP interface (%s) in group %s is now " logerr(
"All IP interfaces in group %s are now unusable\n",
* Extract information from the kernel about the desired phyint. * Look only for properties of the phyint and not properties of logints. * Take appropriate action on the changes. * The phyint exists in the kernel and matches our knowledge * The phyint has vanished in the kernel. * The phyint's interface index has changed. * Ask the caller to delete and recreate the phyint. * Some ioctl error. Don't change anything. * The phyint has changed group. logdebug(
"phyint_inst_update_from_k(%s %s)\n",
* Get the ifindex from the kernel, for comparison with the " ioctl (get lifindex)");
* The index has changed. Most likely the interface has * been unplumbed and replumbed. Ask the caller to take " old index %d new index %d\n",
* Get the group name from the kernel, for comparison with * the value in our tables. " ioctl (get groupname)");
* If the phyint has changed group i.e. if the phyint group name * returned by the kernel is different, ask the caller to delete * and recreate the phyint in the right group /* Groupname has changed */ * Get the current phyint flags from the kernel, and determine what * flags have changed by comparing against our tables. Note that the * IFF_INACTIVE processing in initifs() relies on this call to ensure * that IFF_INACTIVE is really still set on the interface. * Make sure the IFF_FAILED flag is set if and only if we think * the interface should be failed. /* No change in phyint status */ * Delete the phyint. Remove it from the list of all phyints, and the * list of phyint group members. /* Both IPv4 and IPv6 phyint instances must have been deleted. */ * The phyint must belong to a group. /* The phyint must be in the list of all phyints */ /* Remove the phyint from the phyint group list */ /* Phyint is the 1st in the phyint group list */ /* Refresh the group state now that this phyint has been removed */ /* Remove the phyint from the global list of phyints */ /* Phyint is the 1st in the list */ * See if another phyint in the group had been offlined because * it was a dup of `pi' -- and if so, online it. * Offline phyint `pi' if at least `minred' usable interfaces remain in the * group. Returns an IPMP error code. * Verify that enough usable interfaces in the group would remain. * As a special case, if the group has failed, allow any non-offline * phyints to be offlined. * The interface is now offline, so stop probing it. Note that * if_mpadm(1M) will down the test addresses, after receiving a * success reply from us. The routing socket message will then make us * close the socket used for sending probes. But it is more logical * that an offlined interface must not be probed, even if it has test * NOTE: stop_probing() also sets PI_OFFLINE. 
* If we're offlining the phyint because it has a duplicate hardware * address, print a warning -- and leave the link open so that we can * be notified of hardware address changes that make it usable again. * Otherwise, close the link so that we won't prevent a detach. logerr(
"IP interface %s has a hardware address which is not " "unique in group %s; offlining\n",
pi->
pi_name,
* If this phyint was preventing another phyint with a duplicate * hardware address from being online, bring that one online now. * If this interface was active, try to activate another INACTIVE * interface in the group. * Undo a previous offline of `pi'. Returns an IPMP error code. * If necessary, reinitialize our link information and verify that its * hardware address is still unique across the group. logerr(
"IP interface %s now has a unique hardware address in " * While the interface was offline, it may have failed (e.g. the link * may have gone down). phyint_inst_check_for_failure() will have * already set pi_flags with IFF_FAILED, so we can use that to decide * whether the phyint should transition to running. Note that after * we transition to running, we will start sending probes again (if * test addresses are configured), which may also reveal that the * interface is in fact failed. /* calls phyint_chstate() */ * Give the requestor time to configure test addresses before * complaining that they're missing. * Delete (unlink and free), the phyint instance. * If the phyint instance has associated probe targets * Delete all the logints associated with this phyint * Close the socket used to send probes to targets from this phyint. * Phyint instance must be in the list of all phyint instances. * Remove phyint instance from the global list of phyint instances. /* Phyint is the 1st in the list */ * Reset the phyint instance pointer in the phyint. * If this is the last phyint instance (being deleted) on this * phyint, then delete the phyint. logdebug(
"pii->pi_phyint NULL can't print\n");
logdebug(
"\nPhyint instance: %s %s index %u state %x flags %llx " "time_ackproc %lld time_lost %u",
* Lookup a logint based on the logical interface name, on the given * Insert a logint at the head of the list of logints of the given * Create a new named logint, on the specified phyint instance. * Initialize the logint based on the data returned by the kernel. /* Get the socket for doing ioctls */ * Get the flags from the kernel. Also serves as a check whether * the logical still exists. If it doesn't exist, no need to proceed * any further. li_in_use will make the caller clean up the logint /* Interface may have vanished */ * Verified the logint exists. Now lookup the logint in our tables. * If it does not exist, create a new logint. * Pretend the interface does not exist * Update li->li_flags with the new flags, after saving the old * value. This is used later to check which flags have changed and * Get the address, prefix, prefix length and update the logint. * Check if anything has changed. If the logint used for the * test address has changed, take suitable action. /* Interface may have vanished */ /* Interface may have vanished */ * If this is the logint corresponding to the test address used for * sending probes, then if anything significant has changed we need to * determine the test address again. We ignore changes to the * IFF_FAILED and IFF_RUNNING flags since those happen as a matter of * Something significant that affects the test address * has changed. Redo the test address selection later on * in select_test_ifs(). For now do the cleanup and * set pii_probe_logint to NULL. /* Update the logint with the values obtained from the kernel. */ logerr(
"logint_init_from_k: IGNORED %s %s %s addr %s\n",
* Delete (unlink and free) a logint. logdebug(
"logint_delete(%s %s %s/%u)\n",
/* logint must be in the list of logints */ /* Remove the logint from the list of logints */ /* logint is the 1st in the list */ * If this logint is also being used for probing, then close the * associated socket, if it exists. * Fill in the sockaddr_storage pointed to by `ssp' with the IP address * represented by the [`af',`addr'] pair. Needed because in.mpathd internally * stores all addresses as in6_addrs, but we don't want to expose that. /* Lookup target on its address */ logdebug(
"target_lookup(%s %s): addr %s\n",
* Find and return the next active target, for the next probe. * If no active targets are available, return NULL. * Target must be in the list of targets for this phyint /* Return the next active target */ * Go to the next target. If we hit the end, * reset the ptr to the head * Bubble up the unused target to active * Bubble up the slow target to unused * Bubble up the dead target to slow * Select the best available target, that is not already TG_ACTIVE, * for the caller. The caller will determine whether it wants to * make the returned target TG_ACTIVE. * The selection order is as follows. * 1. pick a TG_UNUSED target, if it exists. * 2. else pick a TG_SLOW target that has recovered, if it exists * 3. else pick any TG_SLOW target, if it exists * 4. else pick a TG_DEAD target that has recovered, if it exists * 5. else pick any TG_DEAD target, if it exists * Promote the slow_recovered to unused * Promote the dead_recovered to slow * Some target was deleted. If we don't have even MIN_PROBE_TARGETS * that are active, pick the next best below. /* We are out of targets */ * Create a default target entry. * If the test address is not yet initialized, do not add * any target, since we cannot determine whether the target * belongs to the same subnet as the test address. * If there are multiple subnets associated with an interface, then * add the target to this phyint instance only if it belongs to the * same subnet as the test address. This assures us that we will * be able to reach this target through our routing table. * Prefer router over hosts. Using hosts is a * fallback mechanism, hence delete all host * Routers take precedence over hosts. If this * is a router list and we are trying to add a * host, just return. If this is a host list * and if we have sufficient targets, just return * If this is the first target, set 'pii_targets_are_routers' * The list of targets is either a list of hosts or a list of * routers, but not a mix. 
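The five-step selection order listed above can be sketched as a simple ranking over a target list. This is only an illustration under stated assumptions, not the in.mpathd code: `struct target`, its `recovered` flag, `target_rank()` and `select_best()` are hypothetical names; only the TG_* state names are taken from the comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, named after the TG_* states in the comments. */
enum tg_state { TG_ACTIVE, TG_UNUSED, TG_SLOW, TG_DEAD };

/* Hypothetical, simplified target record. */
struct target {
	enum tg_state	state;
	int		recovered;	/* nonzero if the target has recovered */
	struct target	*next;
};

/* Lower rank is better; -1 means "not a candidate" (already active). */
static int
target_rank(const struct target *tg)
{
	assert(tg != NULL);
	switch (tg->state) {
	case TG_UNUSED:
		return (0);			/* step 1 */
	case TG_SLOW:
		return (tg->recovered ? 1 : 2);	/* steps 2 and 3 */
	case TG_DEAD:
		return (tg->recovered ? 3 : 4);	/* steps 4 and 5 */
	default:
		return (-1);
	}
}

/* Return the best non-active target in the list, or NULL if none. */
struct target *
select_best(struct target *head)
{
	struct target *tg, *best = NULL;
	int best_rank = 5;

	for (tg = head; tg != NULL; tg = tg->next) {
		int rank = target_rank(tg);

		if (rank >= 0 && rank < best_rank) {
			best = tg;
			best_rank = rank;
		}
	}
	return (best);
}
```

As in the comments, the caller would decide whether to make the returned target active; the sketch does not change any state.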
* Change state to PI_RUNNING if this phyint instance is capable of * sending and receiving probes -- that is, if we know of at least 1 * target, and this phyint instance is probe-capable. For more * details, see the phyint state diagram in mpd_probe.c. * Add the target address named by `addr' to phyint instance `pii' if it does * not already exist. If the target is a router, `is_router' should be set to * If the target does not exist, create it; target_create() will set * tg_in_use to true. Even if it exists already, if it's a router * target and we'd previously learned of it through multicast, then we * need to recreate it as a router target. Otherwise, just set * tg_in_use to true so that init_router_targets() won't delete it. * Insert target at head of linked list of targets for the associated * Delete a target (unlink and free). * Target must be in the list of targets for this phyint * Reset all references to 'tg' in the probe information * Remove this target from the list of targets of this * Adjust the next target to probe, if it points to * the currently deleted target. * The number of active targets pii_ntargets == 0 iff * the next active target pii->pii_target_next == NULL /* At this point, we don't have any active targets. */ * Activate any TG_SLOW or TG_DEAD router targets, * since we don't have any other targets * If we still don't have any active targets, the list * must be really empty. There aren't even TG_SLOW or TG_DEAD * targets. Zero out the probe stats since it will not be * If there are no targets on both instances and the interface would * otherwise be considered PI_RUNNING, go back to PI_NOTARGETS state, * since we cannot probe this phyint any more. For more details, * Flush the target list of every phyint in the group, if the list * is a host target list. This is called if group failure is suspected. * If all targets have failed, multicast will subsequently discover new * targets. Else it is a group failure. 
* Note: This function is a no-op if the list is a router target list. * Delete all the targets. When the list becomes * empty, target_delete() will set pii->pii_targets * Delete all the targets. When the list becomes * empty, target_delete() will set pii->pii_targets * Reset all references to 'target' in the probe info, as this target is * being deleted. The pr_target field is guaranteed to be non-null if * pr_status is PR_UNACKED. So we change the pr_status to PR_LOST, so that * pr_target will not be accessed unconditionally. * Clear the probe statistics array. /* Reset the next probe index in the probe stats array */ "status %d rtt_sa %lld rtt_sd %lld crtt %d tg_in_use %d\n",
* Compare two prefixes that have the same prefix length. * Fails if the prefix length is unreasonable. /* Make the N leftmost bits one */ * Get the number of UP logints on phyint `pi'. * Get the phyint instance with the other (IPv4 / IPv6) protocol * Check whether a phyint is functioning. * Check whether a phyint is usable. * Post an EC_IPMP sysevent of subclass `subclass' and attributes `nvl'. * Before sending the event, it prepends the current version of the IPMP * sysevent API. Returns 0 on success, -1 on failure (in either case, * Initialize the event channel if we haven't already done so. logerr(
"cannot create event channel `%s': %s\n",
* Return the external IPMP state associated with phyint `pi'. * Return the external IPMP interface type associated with phyint `pi'. * Return the external IPMP link state associated with phyint `pi'. * Return the external IPMP probe state associated with phyint `pi'. * Return the external IPMP target mode associated with phyint instance `pii'. * Return the external IPMP flags associated with phyint `pi'. * Store the test address used on phyint instance `pii' in `ssp'. If there's * no test address, 0.0.0.0 is stored. * Return the external IPMP group state associated with phyint group `pg'. * Return the external IPMP probe state associated with probe `ps'. * Generate an ESC_IPMP_PROBE_STATE sysevent for the probe described by `pr' * on phyint instance `pii'. Returns 0 on success, -1 on failure. logperror(
"cannot create `interface change' event");
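The prefix comparison described earlier ("Compare two prefixes that have the same prefix length", "Make the N leftmost bits one") can be illustrated with a byte-wise comparison plus a mask for the trailing partial byte. This is a hedged sketch, not the file's own routine: `prefix_equal()` and `ADDR_BYTES` are hypothetical names, and addresses are assumed to be 16-byte arrays in network order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	ADDR_BYTES	16	/* assumed size of an in6_addr-style address */

/*
 * Return 1 if the first `prefix_len' bits of `p1' and `p2' match, else 0.
 * An unreasonable prefix length makes the comparison fail (returns 0).
 */
int
prefix_equal(const unsigned char *p1, const unsigned char *p2, int prefix_len)
{
	int full, rem;
	unsigned char mask;

	assert(p1 != NULL && p2 != NULL);
	if (prefix_len < 0 || prefix_len > ADDR_BYTES * 8)
		return (0);

	full = prefix_len / 8;		/* whole bytes covered by the prefix */
	rem = prefix_len % 8;		/* leftover bits in the next byte */

	if (memcmp(p1, p2, (size_t)full) != 0)
		return (0);
	if (rem == 0)
		return (1);

	/* Make the `rem' leftmost bits of the mask one. */
	mask = (unsigned char)(0xff << (8 - rem));
	return ((p1[full] & mask) == (p2[full] & mask));
}
```

For example, two addresses that differ only in the 18th bit compare equal at a prefix length of 17 and unequal at 18.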
logperror(
"cannot create `probe state' event");
* Generate an ESC_IPMP_GROUP_STATE sysevent for phyint group `pg'. * Returns 0 on success, -1 on failure. logperror(
"cannot create `group state change' event");
logperror(
"cannot create `group state change' event");
* Generate an ESC_IPMP_GROUP_CHANGE sysevent of type `op' for phyint group * `pg'. Returns 0 on success, -1 on failure. logperror(
"cannot create `group change' event");
logperror(
"cannot create `group change' event");
* Generate an ESC_IPMP_GROUP_MEMBER_CHANGE sysevent for phyint `pi' in * group `pg'. Returns 0 on success, -1 on failure. logperror(
"cannot create `group member change' event");
logperror(
"cannot create `group member change' event");
* Generate an ESC_IPMP_IF_CHANGE sysevent for phyint `pi' in group `pg'. * Returns 0 on success, -1 on failure. logperror(
"cannot create `interface change' event");
logperror(
"cannot create `interface change' event");
* Generate a signature for use. The signature is conceptually divided * into two pieces: a random 16-bit "generation number" and a 48-bit * monotonically increasing integer. The generation number protects * against stale updates to entities (e.g., IPMP groups) that have been * deleted and since recreated. * Store the information associated with group `grname' into a dynamically * allocated structure pointed to by `*grinfopp'. Returns an IPMP error code. * Tally up the number of interfaces, allocate an array to hold them, * and insert their names into the array. While we're at it, if any * interface is actually enabled to send probes, save the group fdt. * If this is the anonymous group, there's no other information to * collect (since there's no IPMP interface). * Grab some additional information about the group from the kernel. * (NOTE: since SIOCGLIFGROUPINFO does not look up by interface name, * we can use ifsock_v4 even for a V6-only group.) logperror(
"getgroupinfo: SIOCGLIFGROUPINFO");
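The signature layout described above (a random 16-bit generation number plus a 48-bit monotonically increasing integer) could be packed into one 64-bit value as sketched here. The bit placement (generation in the upper 16 bits), the starting counter value of 1, the use of rand(), and all `sig_*` names are assumptions made for illustration; the source comments do not specify them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define	SIG_COUNTER_BITS	48
#define	SIG_COUNTER_MASK	((UINT64_C(1) << SIG_COUNTER_BITS) - 1)

/* Build a signature from an explicit generation number; counter starts at 1. */
uint64_t
sig_make(uint16_t generation)
{
	return (((uint64_t)generation << SIG_COUNTER_BITS) | 1);
}

/* Build a signature with a pseudo-random generation number. */
uint64_t
sig_generate(void)
{
	return (sig_make((uint16_t)(rand() & 0xffff)));
}

/* Extract the 16-bit generation number. */
uint16_t
sig_generation(uint64_t sig)
{
	return ((uint16_t)(sig >> SIG_COUNTER_BITS));
}

/* Extract the 48-bit counter. */
uint64_t
sig_counter(uint64_t sig)
{
	return (sig & SIG_COUNTER_MASK);
}

/* Consume a signature: bump the counter, keep the generation. */
uint64_t
sig_next(uint64_t sig)
{
	assert(sig_counter(sig) < SIG_COUNTER_MASK);	/* no overflow */
	return (sig + 1);
}
```

With this layout, a consumer holding a signature from a deleted and recreated group would most likely see a different generation number and can discard its stale update.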
* Tally up the number of data addresses, allocate an array to hold * them, and insert their values into the array. * It's possible to have duplicate addresses (if some are * down). Weed the dups out to avoid confusing consumers. * (If groups start having tons of addresses, we'll need a * better algorithm here.) for (j = 0; j < i; j++) {
* Store the target information associated with phyint instance `pii' into a * dynamically allocated structure pointed to by `*targinfopp'. Returns an * Store the information associated with interface `ifname' into a dynamically * allocated structure pointed to by `*ifinfopp'. Returns an IPMP error code. * Store the current list of IPMP groups into a dynamically allocated * structure pointed to by `*grlistpp'. Returns an IPMP error code. * Tally up the number of groups, allocate an array to hold them, and * insert their names into the array. * Store the address information for `ssp' (in group `grname') into a * dynamically allocated structure pointed to by `*adinfopp'. Returns an IPMP * error code. (We'd call this function getaddrinfo(), but it would conflict * with getaddrinfo(3SOCKET)). * Walk through the data addresses, and find a match. Note that since * some of the addresses may be down, more than one may match. We * prefer an up address (if one exists). * Store a snapshot of the IPMP subsystem into a dynamically allocated * structure pointed to by `*snapp'. Returns an IPMP error code. * Add information for each group in the list, along with all of its * Add information for each configured phyint.
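The duplicate weeding mentioned in the group-information comments (duplicate data addresses are possible if some are down) can be sketched as a quadratic in-place pass, which matches the comments' remark that a better algorithm would only be needed for very large address counts. `struct addr` and `weed_dups()` are hypothetical stand-ins; the real code works on the daemon's own address structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size address record (e.g. an in6_addr-sized value). */
struct addr {
	unsigned char bytes[16];
};

/*
 * Remove duplicate addresses in place, preserving first-seen order.
 * Returns the number of unique addresses left at the front of `addrs'.
 */
size_t
weed_dups(struct addr *addrs, size_t n)
{
	size_t i, j, kept = 0;

	assert(addrs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* Compare against every address already kept. */
		for (j = 0; j < kept; j++) {
			if (memcmp(&addrs[i], &addrs[j],
			    sizeof (struct addr)) == 0)
				break;
		}
		if (j == kept)
			addrs[kept++] = addrs[i];
	}
	return (kept);
}
```

The cost is O(n^2) comparisons, which is reasonable for the handful of data addresses a group typically carries.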