/*
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file.  If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 */

/*
 * Copyright (c) 1995, 2011, Oracle and/or its affiliates. All rights reserved.
 */

/*
 * Just in case we're not in a build environment, make sure that
 * TEXT_DOMAIN gets set to something.
 */

/*
 * Metadevice diskset interfaces
 */

/*
 * This is not the first replica being added to the
 * diskset, so call with ADDSIDENMS_BCAST.  If this
 * is a traditional diskset, the bcast flag is ignored,
 * since traditional disksets don't use the rpc.mdcommd.
 */

/* Use rpc.mdcommd to add md side info from all nodes */

/*
 * If a reconfig cycle has been started, this node is stuck
 * in the return step until this command has completed.  If
 * mdcommd is suspended, ask send_message to fail (instead of
 * retrying) so that metaset can finish, allowing the
 * reconfig cycle to proceed.
 */

/*
 * Okay, we have a valid key.
 * Let's see if it is a hsp or not.
 * If it is a hsp, add it here.
 */

/*
 * The device reference count can be greater than 1 if
 * more than one softpart is configured on top of the
 * same device.  If this is the case, then we want to
 * increment the count to sync up with the other sides.
 */
	/* Nodes which are being added */

/* Put the new entries into the set */

/*
 * Get membershiplist from API routine.  If there's
 * an error, fail to create set and pass back error.
 */

/*
 * meta_set_addhosts has already verified that
 * this node list is in the membership list,
 * so set the ALIVE flag.  Since this is a new set, all hosts being
 * added are new to the set, so also set the ADD flag.
 */

/* Nodelist must be kept in ascending nodeid order. */

/* Nothing in list, just add it */
/* Add to head of list */
/* Search for place to add it */
/* Add before nd_curr */
/* Add to end of list */

/* Set master to be first node added */

/*
 * Creating mnset for first time.
 * Set master to be invalid until first drive is added.
 */

/* Create the set where needed */

/*
 * Create the set on each new node.  If the set already
 * exists, then the node list being created on each new node
 * is the current node list from before the new nodes
 * were added.  If the set doesn't exist, then the node
 * list being created on each new node is the entire node list.
 */

/*
 * Add the drive records to the new sets
 * and names for the new sides.
 */
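The nodelist comments above describe inserting into a singly linked list that must stay in ascending nodeid order, with the head, middle, and tail cases. A minimal sketch of that insertion, assuming a trimmed-down `mnnode_t` with only the two fields used here (the real node descriptor in the LVM headers carries much more state):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down node descriptor; the real structure is larger. */
typedef struct mnnode {
	int		nd_nodeid;
	struct mnnode	*nd_next;
} mnnode_t;

/* Insert nd into *headp, keeping the list in ascending nodeid order. */
static void
nodelist_insert(mnnode_t **headp, mnnode_t *nd)
{
	mnnode_t	*nd_prev = NULL;
	mnnode_t	*nd_curr = *headp;

	/* Search for place to add it; loop body never runs on empty list */
	while (nd_curr != NULL && nd_curr->nd_nodeid < nd->nd_nodeid) {
		nd_prev = nd_curr;
		nd_curr = nd_curr->nd_next;
	}
	/* Add before nd_curr (nd_curr == NULL means add to end of list) */
	nd->nd_next = nd_curr;
	if (nd_prev == NULL)
		*headp = nd;		/* Nothing in list, or add to head */
	else
		nd_prev->nd_next = nd;
}
```

Whatever order hosts are named on the command line, the resulting list is the same on every node, which is what lets all nodes agree on the nodelist.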
	/* Nodes which are being added */

/* Put the new entries into the set */

/* Create the set where needed */

/* Create the set on each new host */

/*
 * Add the drive records to the new sets
 * and names for the new sides.
 */
	/* Nodes which are being added */

/* We must be a member of the set we are creating */

/*
 * If auto_take, then we must be the only member of the set
 * that we are creating.
 */

/*
 * If we're part of SC3.0, we'll already have allocated the
 * set number, so we can skip the allocation algorithm used.
 * Set number is unique across traditional and MN disksets.
 */

/* Make sure this set name is not used on the other hosts */

/*
 * Lock the set on current set members.
 * Set locking is done much earlier for a MN diskset than for a traditional
 * diskset, since lock_set is used to protect against
 * other meta* commands running on the other nodes.
 * Don't issue the mdcommd SUSPEND command, since there is nothing
 * to suspend given that there currently is no set.
 */

/* Make sure we are blocking all signals */

/* Lock the set on new set members */

/* Now have the diskset locked, verify set number is still ok */

/* END CHECK CODE */

/* Lock the set on new set members */

/*
 * Mark the set record MD_SR_OK.
 *
 * On each added node, set the node record for that node
 * to OK.  Then set all node records for the newly added
 * nodes on all nodes to ok.
 *
 * By setting a node's own node record to ok first, even if
 * the node adding the hosts panics, the rest of the nodes can
 * determine the same node list during the choosing of the master
 * during reconfig.  So, the only nodes considered for mastership
 * are nodes that have both MD_MN_NODE_OK and MD_SR_OK set
 * on that node's rpc.metad.  If all nodes have MD_SR_OK set,
 * but no node has its own MD_MN_NODE_OK set, then the set will
 * be removed during reconfig, since a panic occurred during the
 * creation of the initial diskset.
 */
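The mastership rule described above reduces to a simple predicate on each node's own flags: a node is a candidate for master only when both its node record and the set record are marked OK by its own rpc.metad. A sketch of that check, with hypothetical flag values standing in for the real MD_MN_NODE_OK / MD_SR_OK constants from the LVM headers:

```c
#include <assert.h>

/* Hypothetical stand-ins for the real MD_MN_NODE_OK / MD_SR_OK bits. */
#define	NODE_OK		0x0001	/* node's own node record is marked OK */
#define	SR_OK		0x0002	/* set record on the node is marked OK */

/*
 * A node is a mastership candidate only when its own rpc.metad
 * reports both its node record and the set record as OK.  A node
 * that panicked between creating the set record and marking its
 * own node record OK is thereby excluded during reconfig.
 */
static int
master_candidate(unsigned int flags)
{
	return ((flags & NODE_OK) != 0 && (flags & SR_OK) != 0);
}
```

Ordering the two writes (own node record first) is what makes the predicate safe across a panic: if no node ever reached the NODE_OK state, no candidates exist and the half-created set is torn down during reconfig.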
/* All nodes are guaranteed to be ALIVE */

/* Something wrong, will pick this up in next loop */

/* Only changing my local cache of node list */

/* Set node record for added host to ok on that host */

/* Now set all node records on all nodes to be ok */

/* All nodes are guaranteed to be ALIVE */

/*
 * Set successfully created.
 *
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Send reinit command to mdcommd, which forces it to get a
 * fresh set description.  Then send resume.
 * Resume on class 0 will resume all classes.
 */

/* Class is ignored for REINIT */
		    "Unable to reinit rpc.mdcommd.\n"));
		    "Unable to resume rpc.mdcommd.\n"));
/* release signals back to what they were on entry */

/* all signals already blocked for MN diskset */

/* Make sure we are blocking all signals */

/*
 * On each added node (which is now each node to be deleted),
 * set the node record for that node to DEL.  Then set all
 * node records for the newly added (soon to be deleted) nodes
 * on all nodes to DEL.
 *
 * By setting a node's own node record to DEL first, even if
 * the node doing the rollback panics, the rest of the nodes can
 * determine the same node list during the choosing of the master
 * during reconfig.
 */

/* All nodes are guaranteed to be ALIVE */

/* Something wrong, will pick this up in next loop */

/* Only changing my local cache of node list */

/* Set node record for added host to DEL on that host */

/* Now set all node records on all nodes to be DEL */

/* All nodes are guaranteed to be ALIVE */

/* Mark set record on all hosts to be DELETED */

/* Don't test lock flag since guaranteed to be set if in rollback */

/* release signals back to what they were on entry */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 * All nodes should be alive in non-OHA mode.
 */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/* Make sure we own the set */

/* Lock the set on our side */

/* If we have drives */

/* Use rpc.mdcommd to add md side info from all nodes */

/*
 * If a reconfig cycle has been started, this node is stuck
 * in the return step until this command has completed.  If
 * mdcommd is suspended, ask send_message to fail (instead of
 * retrying) so that metaset can finish, allowing the
 * reconfig cycle to proceed.
 */

/*
 * The device reference count can be greater than 1 if
 * more than one softpart is configured on top of the
 * same device.  If this is the case, then we want to
 * decrement the count to zero so the entry can be removed.
 */

/* Skip empty slots */

/* If a MN diskset, set is already locked on all nodes via clnt_lock_set. */

/* Make sure we are blocking all signals */

/* Lock the set on current set members for traditional disksets. */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/* Mark the set record MD_SR_DEL */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 * All nodes should be alive in non-OHA mode.
 */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/* The set is OK to delete, make it so. */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 * All nodes should be alive in non-OHA mode.
 */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/*
 * Unlock the set on current set members
 * for traditional disksets.
 */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/*
 * A MN diskset has the clnt_locks held by meta_set_deletehosts, so
 * don't flush that data until meta_set_deletehosts has finished
 * with it.  meta_set_deletehosts will handle the flush of the data.
 */

/* release signals back to what they were on entry */

/* all signals already blocked for MN diskset */

/* Make sure we are blocking all signals */

/*
 * Unlock the set on current set members
 * for traditional disksets.
 */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/* release signals back to what they were on entry */

/*
 * A MN diskset has the clnt_locks held by meta_set_deletehosts, so
 * don't flush that data until meta_set_deletehosts has finished
 * with it.
 */
/*
 * meta_set_deletehosts will handle the flush of the data.
 * procsigs already called for MN diskset.
 * md_rb_sig_handling already called for traditional diskset.
 */

/* May need this to re-add sidenames on roll back. */

/*
 * Delete the db replica sides.
 * This is done before the next loop, so that
 * the db does not get unloaded before we are finished
 * deleting the sides.
 */

/* Skip hosts not being deleted */

/* Skip empty slots */

/* Skip hosts not being deleted */

/* Delete the names from the namespace */

/* Skip hosts not being deleted */

/* Skip empty slots */

/* Skip hosts not being deleted */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 * All nodes should be alive in non-OHA mode.
 */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/* release signals back to what they were on entry */

/* all signals already blocked for MN diskset */

/* Make sure we are blocking all signals */

/* See if we have to re-add the drives specified. */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Don't care if set record is MN or not */

/* Drive already added, skip to next node */

/*
 * Set record structure was allocated from RPC
 * routine getset, so this structure is only of
 * size md_set_record even if the MN flag is
 * set.  So, clear the flag so that the free
 * code doesn't attempt to free a structure
 * the size of md_mnset_record.
 */

/*
 * Set record structure was allocated from RPC routine
 * getset, so this structure is only of size
 * md_set_record even if the MN flag is set.  So,
 * clear the flag so that the free code doesn't
 * attempt to free a structure the size of
 * md_mnset_record.
 */

/*
 * This is not the first replica being added to the
 * diskset, so call with ADDSIDENMS_BCAST.  If this
 * is a traditional diskset, the bcast flag is ignored,
 * since traditional disksets don't use the rpc.mdcommd.
 */

/*
 * Add the device names for the new sides into the namespace,
 * on all hosts not being deleted.
 */

/* Find a node that is not being deleted */

/* Skip empty slots */

/* Find a node that is not being deleted */

/* Skip nodes not being deleted */

/* this side was just created, add the names */

/* Skip empty slots */

/* Skip nodes not being deleted */

/* this side was just created, add the names */

/* Skip empty slots */

/* release signals back to what they were on entry */

/* find the end of the link list */

/*
 * For a MN diskset, the sideno is not an index into
 * the array of nodes.  Hence getside_devinfo is
 * used instead of meta_getnextside_devinfo.
 */

/* decrement sideno, to look like the previous sideno */

/* Add to the end of the linked list */

/*
 * Exported Entry Points
 */

/*
 * Check the given disk set name for syntactic correctness.
 */

/*
 * Add host(s) to the multi-node diskset provided in sp.
 *	- create set if non-existent.
 */

/*
 * Check membershiplist first.  If there's
 * an error, fail to create set and pass back error.
 */

/* Verify that all nodes are in member list */

/*
 * If node in list isn't a member of the membership,
 * just return error.
 */

/*
 * Node list is needed later, but there is a lot of error
 * checking and possible failures between here and there, so
 * just re-get the list later if there are no errors.
 */

/*
 * Verify that the list of nodes being added contains no
 * duplicates.
 */

/*
 * Verify that each node being added thinks that its nodename
 * is the same as the nodename given.
 */
/*
 * If this node and another node were both attempting to
 * create the same setname at the same time, and the other
 * node has just created the set on this node, then sd would
 * be non-NULL, but sp->setno would be null (setno is filled
 * in by the create_set).  If this is true, then fail, since
 * the other node has already won this race.
 */

/* The auto_take behavior is inconsistent with multiple hosts. */

/* We already have the set. */

/* Make sure we own the set */

/*
 * The drive and node records are stored in the local mddbs of each
 * node in the diskset.  Each node's rpc.metad daemon reads in the set,
 * drive and node records from that node's local mddb and caches them
 * internally.  Any process needing diskset information contacts its
 * local rpc.metad to get this information.  Since each node in the
 * diskset is independently reading the set information from its local
 * mddb, the set, drive and node records in the local mddbs must stay
 * in-sync, so that all nodes have a consistent view of the diskset.
 *
 * For a multinode diskset, explicitly verify that all nodes in the
 * diskset are ALIVE (i.e. are in the API membership list).  Otherwise,
 * fail this operation, since all nodes must be ALIVE in order to add
 * the new node record to their local mddb.  If a panic of this node
 * leaves the local mddbs set, node and drive records out-of-sync, the
 * reconfig cycle will fix the local mddbs and force them back into
 * synchronization.
 */

/* Check if node is already in set. */

/* Is node already in set? */

/*
 * Lock the set on current set members.
 * Set locking is done much earlier for a MN diskset than for a traditional
 * diskset, since lock_set and SUSPEND are used to protect against
 * other meta* commands running on the other nodes.
 */
/* Make sure we are blocking all signals */

/* All nodes are guaranteed to be ALIVE */

/*
 * Lock out other meta* commands by suspending
 * class 1 messages across the diskset.
 */

/* Send suspend to nodes in nodelist before addhosts call */

/* All nodes are guaranteed to be ALIVE */

/* Lock the set on new set members */

/* Already verified to be alive */

/*
 * Perform the required checks for new hosts
 */

/* Make sure this set name is not used on the other hosts */

/* Keep on truck'n */

/* Get drive descriptors for the set */

/* END CHECK CODE */

/*
 * Create the set where needed
 */

/*
 * Send suspend to rpc.mdcommd on nodes where a set has been
 * created, since rpc.mdcommd must now be running on the remote nodes.
 *
 * Lock out other meta* commands by suspending
 * class 1 messages across the diskset.
 */

/*
 * Merge the new entries into the set with the existing sides.
 */

/*
 * Get membershiplist from API routine.  If there's
 * an error, fail to create set and pass back error.
 */

/* Nodelist must be kept in ascending nodeid order. */

/* Nothing in list, just add it */
/* Add to head of list */
/* Search for place to add it */
/* Add before nd_curr */
/* Add to end of list */

/* Node already verified to be in membership */

/* If we have drives */

/*
 * For all the hosts being added, create a sidename structure
 */

/* Skip nodes not being added */

/*
 * Add the new sidename for each drive to all the hosts
 */

/*
 * If a multi-node diskset, each host only stores
 * the side information for itself.  So, only send
 * side information to the new hosts, where each host
 * will add the appropriate side information to its
 * local mddb.
 */

/* Skip nodes not being added */

/* Add side info to new hosts */

/*
 * Add the device names for the new sides into the namespace
 * for all hosts being added.
 */
/*
 * This is adding the side names to the diskset's mddb, so add
 * sidenames for all of the new hosts.
 */

/* Skip nodes not being added */

/* this side was just created, add the names */

/*
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Start by suspending rpc.mdcommd (which drains it of all
 * messages), then change the nodelist, followed by a reinit
 * and resume.
 */

/* Send suspend_all to nodes in nodelist (existing + new) */

/* All nodes are guaranteed to be ALIVE */

/* Add the node(s) to each host that is currently in the set */

/* All nodes are guaranteed to be ALIVE */

/*
 * Mark the drives MD_DR_OK.
 */

/* All nodes are guaranteed to be ALIVE */

/* Add the mediator information to all hosts in the set. */

/* All nodes are guaranteed to be ALIVE */

/*
 * If a MN diskset and there are drives in the set,
 * set the master on the new nodes and
 * automatically join the new nodes into the set.
 */

/* Is current set STALE? */

/* Set master on newly added nodes */

/* Join newly added nodes to diskset and set OWN flag */

/*
 * Also set the ADD flag, since this flag
 * is already set in rpc.metad - it's
 * just not in the local copy.
 * Could flush the local cache and call
 * metaget_setdesc, but this just
 * adds time.  Since this node knows
 * the state of the node flags in
 * rpc.metad, just set the ADD
 * flag and save time.
 */

/* Send new node flag list to all Owner nodes */

/*
 * Will effectively set the OWN flag in the records kept
 * cached in rpc.metad.  The ADD flag would have
 * already been set by the call to clnt_addhosts.
 */

/*
 * Mark the set record MD_SR_OK
 */

/* All nodes are guaranteed to be ALIVE */

/*
 * On each newly added node, set the node record for that node
 * to OK.  Then set all node records for the newly added
 * nodes on all nodes to ok.
 *
 * By setting a node's own node record to ok first, even if
 * the node adding the hosts panics, the rest of the nodes can
 * determine the same node list during the choosing of the master
 * during reconfig.  So, the only nodes considered for mastership
 * are nodes that have both MD_MN_NODE_OK and MD_SR_OK set
 * on that node's rpc.metad.  If all nodes have MD_SR_OK set,
 * but no node has its own MD_MN_NODE_OK set, then the set will
 * be removed during reconfig, since a panic occurred during the
 * creation of the initial diskset.
 */

/* All nodes are guaranteed to be ALIVE */

/* Something wrong, will pick this up in next loop */

/* Only changing my local cache of node list */

/* Set node record for added host to ok on that host */

/* Now set all node records on all nodes to be ok */

/* All nodes are guaranteed to be ALIVE */

/*
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Send reinit command to mdcommd, which forces it to get a
 * fresh set description.  Then send resume.
 * Resume on class 0 will resume all classes, so we can skip
 * doing an explicit resume of class 1 (ignore suspend1_flag).
 *
 * Don't know if nodelist contains the nodes being added
 * or not, so do reinit to nodes not being added (by skipping
 * any nodes in the nodelist being added) and then do
 * reinit to nodes being added if remote_sets_created is 1.
 */

/* All nodes are guaranteed to be ALIVE */

/* Skip nodes being added - handled later */

/* Class is ignored for REINIT */
		    "Unable to reinit rpc.mdcommd.\n"));
/*
 * Send reinit to added nodes that had a set created, since
 * rpc.mdcommd is running on the nodes with a set.
 */
		    "Unable to reinit rpc.mdcommd.\n"));
/*
 * Unlock diskset by resuming messages across the diskset.
 * Just resume all classes so that resume is the same whether
 * just one class was locked or all classes were locked.
 *
 * Don't know if nodelist contains the nodes being added
 * or not, so do resume_all to nodes not being added (by
 * skipping any nodes in the nodelist being added) and then do
 * resume_all to nodes being added if remote_sets_created is 1.
 */

/* All nodes are guaranteed to be ALIVE */

/* Skip nodes being added - handled later */
		    "Unable to resume rpc.mdcommd.\n"));
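The rpc.mdcommd handling described in the comments above always follows one fixed order: suspend (which drains queued messages), change the nodelist, reinit (which forces a fresh set description), then resume. A sketch of that ordering, using stub functions that only record their call sequence; every name here is a hypothetical stand-in for the real RPC wrapper calls, which are not shown in this file's surviving text:

```c
#include <assert.h>

/* Steps of the nodelist-change protocol, in required order. */
enum { SUSPEND, SETNODELIST, REINIT, RESUME, NSTEPS };

static int log_steps[NSTEPS];
static int log_n;

/* Hypothetical stand-ins for the real mdcommd control RPCs. */
static void commd_suspend(void)	{ log_steps[log_n++] = SUSPEND; }
static void set_nodelist(void)	{ log_steps[log_n++] = SETNODELIST; }
static void commd_reinit(void)	{ log_steps[log_n++] = REINIT; }
static void commd_resume(void)	{ log_steps[log_n++] = RESUME; }

static void
notify_commd_nodelist_change(void)
{
	commd_suspend();	/* drains rpc.mdcommd of all messages */
	set_nodelist();		/* change the nodelist on the nodes */
	commd_reinit();		/* forces a fresh set description; class ignored */
	commd_resume();		/* resume on class 0 resumes all classes */
}
```

Draining before the nodelist changes is what prevents in-flight messages from being delivered against two different views of the set; the final resume on class 0 makes the unlock path identical whether one class or all classes had been suspended.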
/*
 * Send resume to added nodes that had a set created, since
 * rpc.mdcommd is running on the nodes with a set.
 */

/* Already verified to be alive */
		    "Unable to resume rpc.mdcommd.\n"));
/*
 * Start a resync thread on the newly added nodes
 * if the set is not stale.  Also start a thread to update the
 * abr state of all soft partitions.
 */
			    "Unable to start resync "
			    "Unable to start sp update "

/*
 * Don't know if nodelist contains the nodes being added
 * or not, so do clnt_unlock_set to nodes not being added (by
 * skipping any nodes in the nodelist being added) and then do
 * clnt_unlock_set to nodes being added.
 */

/* All nodes are guaranteed to be ALIVE */

/* Skip hosts we get in the next loop */

/* Already verified to be alive */

/* release signals back to what they were on entry */

/*
 * For each node being deleted, set the DEL flag and
 * reset the OK flag on that node first.
 * Until a node has turned off its own
 * rpc.metad's NODE_OK flag, that node could be
 * considered for master during a reconfig.
 */

/* All nodes are guaranteed to be ALIVE */

/* Something wrong, handle this in next loop */

/* Only changing my local cache of node list */

/* Set flags for del host to DEL on that host */

/* Reset master on newly added node */

/* Withdraw set on newly added node */

/*
 * Turn off the owner flag in nodes to be deleted
 * if there are drives in the set.
 * Also, turn off NODE_OK and turn on NODE_DEL
 * for nodes to be deleted.
 * These flags are used to set the node
 * record flags in all nodes in the set.
 */

/*
 * Now, reset owner and set delete flags for the deleted
 * nodes on all nodes.
 */

/*
 * On each node being deleted, set the set record
 * to be in DEL state.
 */

/* All nodes are guaranteed to be ALIVE */

/*
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Send reinit command to mdcommd, which forces it to get a
 * fresh set description.  Then send resume.
 * Nodelist contains all nodes (existing + added).
 */
/* All nodes are guaranteed to be ALIVE */

/* Send reinit to nodes in nodelist before addhosts call */

/*
 * Skip nodes being added if remote sets were not
 * created, since rpc.mdcommd may not be running
 * on the remote nodes.
 */

/* Class is ignored for REINIT */
		    "Unable to reinit rpc.mdcommd.\n"));

/* All nodes are guaranteed to be ALIVE */

/*
 * Skip nodes being added if remote sets were not
 * created, since rpc.mdcommd may not be running
 * on the remote nodes.
 */

/*
 * Resume all classes but class 1 so that the lock is held
 * against meta* commands.
 * Send resume_all_but_1 to nodes in nodelist
 * before addhosts call.
 */
		    "Unable to resume rpc.mdcommd.\n"));
/* Nodelist may or may not contain nodes being added. */

/* Skip nodes not being added */

/* Nodelist may or may not contain nodes being added. */

/* Skip nodes not being added */

/* delete the drive records */

/* delete the set record */

/* Don't test lock flag since guaranteed to be set if in rollback */

/* Nodelist may or may not contain nodes being added. */

/*
 * Unlock diskset by resuming messages across the diskset.
 * Just resume all classes so that resume is the same whether
 * just one class was locked or all classes were locked.
 */

/* All nodes are guaranteed to be ALIVE */

/*
 * Skip nodes being added, since remote sets
 * were either created and then deleted or
 * were never created.  Either way - rpc.mdcommd
 * may not be running on the remote node.
 */
		    "Unable to resume rpc.mdcommd.\n"));

/* All nodes are guaranteed to be ALIVE */

/* Skip hosts we get in the next loop */

/* release signals back to what they were on entry */

/*
 * Add host(s) to the traditional diskset provided in sp.
 *	- create set if non-existent.
 */

/* The auto_take behavior is inconsistent with multiple hosts. */

/* We already have the set. */

/* Make sure we own the set */

/*
 * Perform the required checks for new hosts
 */

/* Make sure this set name is not used on the other hosts */

/* Keep on truck'n */

/* Count the number of occupied slots */

/* Count occupied slots */

/* Make sure we have space to add the new sides */

/* Get drive descriptors for the set */

/* Setup the mediator record roll-back structure */

/* END CHECK CODE */

/* Lock the set on current set members */

/* Skip empty slots */

/* Lock the set on new set members */

/*
 * Add the new hosts to the existing set record on the existing hosts
 */

/* skip empty slots */

/* Merge the new entries into the set with the existing sides */

/* Skip full slots */

/* If we have drives */

/*
 * For all the hosts being added, create a sidename structure
 */

/* Skip empty slots */

/* Skip nodes not being added */

/*
 * Add the new sidename for each drive to the existing hosts
 */

/* Skip empty slots */

/* Skip nodes being added */

/* create the set on the new nodes, this adds the drives as well */

/*
 * Add the device entries for the new sides into the namespace.
 */

/* Skip empty slots */

/* Skip nodes not being added */

/*
 * Mark the drives MD_DR_OK.
 */
/* Skip empty slots */

/* Bring the mediator record up to date with the set record */

/* Inform the mediator hosts of the new node list */

/* Add the mediator information to all hosts in the set */

/* Skip empty slots */

/*
 * Mark the set record MD_SR_OK
 */

/* Skip empty slots */

/* Skip empty slots */

/* Skip hosts we get in the next loop */

/* Make sure we are blocking all signals */

/* Skip empty slots */

/* Skip nodes not being added */

/* delete the drive records */

/* delete the set record on the 'new' hosts */

/* Skip empty slots */

/* Skip nodes not being added */

/* Skip empty slots */

/* Skip empty slots */

/* Skip hosts we get in the next loop */

/* release signals back to what they were on entry */

/*
 * Add host(s) to the diskset provided in sp.
 *	- create set if non-existent.
 */

/*
 * Delete host(s) from the diskset provided in sp.
 *	- destroy set if last host in set is removed.
 */

/*
 * Verify that the list of nodes being deleted contains no
 * duplicates.
 */

/* Make sure we own the set */

/*
 * The drive and node records are stored in the local mddbs of each
 * node in the diskset.  Each node's rpc.metad daemon reads in the set,
 * drive and node records from that node's local mddb and caches them
 * internally.  Any process needing diskset information contacts its
 * local rpc.metad to get this information.  Since each node in the
 * diskset is independently reading the set information from its local
 * mddb, the set, drive and node records in the local mddbs must stay
 * in-sync, so that all nodes have a consistent view of the diskset.
 *
 * For a multinode diskset, explicitly verify that all nodes in the
 * diskset are ALIVE (i.e. are in the API membership list) if the
 * forceflag is FALSE.  (The case of forceflag being TRUE is handled
 * in the OHA check above.)
 */
/*
 * If forceflag is FALSE and a node in the diskset is not in
 * the membership list, then fail this operation, since all nodes must
 * be ALIVE in order to delete the node record from their local mddb.
 * If a panic of this node leaves the local mddbs set, node and drive
 * records out-of-sync, the reconfig cycle will fix the local mddbs
 * and force them back into synchronization.
 */

/*
 * Lock the set on current set members.
 * Set locking is done much earlier for a MN diskset than for a traditional
 * diskset, since lock_set and SUSPEND are used to protect against
 * other meta* commands running on the other nodes.
 */

/* Make sure we are blocking all signals */

/*
 * Lock out other meta* commands by suspending
 * class 1 messages across the diskset.
 */

/*
 * Count the number of nodes currently in the set.
 */

/* Count full slots */

/*
 * OHA mode == -f -h <hostname>
 * OHA is One Host Administration that occurs when the forceflag (-f)
 * is set and at least one host in the diskset isn't responding
 * to RPC requests.
 * When in OHA mode, a node cannot delete itself from a diskset.
 * When in OHA mode, a node can delete a list of nodes from a diskset
 * even if some of the nodes in the diskset are unresponsive.
 *
 * For a multinode diskset, only allow OHA mode when the nodes that
 * aren't responding in the diskset are not in the membership list
 * (i.e. nodes that aren't responding are not marked ALIVE).
 * Nodes that aren't in the membership list will be rejoining
 * the diskset through a reconfig cycle, and the local mddb set
 * and node records can be reconciled during the reconfig cycle.
 *
 * If a node isn't responding, but is still in the membership list,
 * fail the request, since the node may not be responding because
 * rpc.metad died and is restarting.  In this case, no reconfig
 * cycle will be started, so there's no way to recover if
 * the host delete operation was allowed.
 */
/*
 * NOTE: if nodes that weren't in the membership when the OHA host
 * delete occurred are now the only nodes in the membership list,
 * those nodes will see the old view of the diskset.  As soon as
 * a node re-enters the cluster that was present in the cluster
 * during the host deletion, the diskset will reflect the host
 * deletion on all nodes presently in the cluster.
 */

/*
 * If a node isn't ALIVE (in member list),
 * then allow a force-able delete in OHA mode.
 */

/*
 * Don't test for clnt_nullproc, since we already
 * tested the RPC connections by clnt_lock_set.
 */

/* Skip empty slots */

/*
 * If we timeout to at least one
 * client, then we can allow OHA mode;
 * otherwise, we are in normal mode.
 */

/*
 * Don't allow this for a MN diskset, since meta_set_destroy of 1 node
 * does NOT remove this node's node record from the other node's set
 * records in their local mddb.  This leaves a MN diskset in a very
 * inconsistent state.
 */

/* Can return since !MN diskset so nothing to unlock */

/*
 * In a multinode diskset, a node can only delete itself if this
 * is the last node in the set or if all nodes in
 * the set are being deleted.  The traditional diskset code
 * allows a node to delete itself (when there are other nodes
 * in the diskset) when using the force flag, but that code
 * path doesn't have the node remove itself from
 * the set node list on the other nodes.  Since this isn't
 * satisfactory for the multinode diskset, just don't
 * allow this operation.
 */

/*
 * In a multinode diskset, don't allow deletion of the master node unless
 * this is the only node left or unless all nodes are being
 * deleted, since there is no way to switch
 * master ownership (except via a cluster reconfig cycle).
 */

/* Deleting self w/o forceflg */

/*
 * Setup the mediator record roll-back structure for a trad diskset.
 */
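The OHA rule above boils down to a per-node check: a forced host delete on a multinode diskset is permitted only when every unresponsive node is also absent from the membership list. A sketch of that decision, using a hypothetical two-flag node state rather than the real membership and RPC structures:

```c
#include <assert.h>

/* Hypothetical per-node state; the real code consults the API
 * membership list and the results of the clnt_* RPC probes. */
typedef struct {
	int	responding;	/* node answered its RPC */
	int	in_membership;	/* node is in the membership list (ALIVE) */
} node_state_t;

/*
 * A forced (OHA) host delete is allowed only if every unresponsive
 * node is also out of the membership list.  An unresponsive node that
 * is still a member may just have a restarting rpc.metad: no reconfig
 * cycle would follow, so there would be no way to repair the mddbs.
 */
static int
oha_delete_allowed(const node_state_t *nodes, int nnodes)
{
	int i;

	for (i = 0; i < nnodes; i++) {
		if (!nodes[i].responding && nodes[i].in_membership)
			return (0);	/* fail the request */
	}
	return (1);
}
```

Nodes that fail this check are expected to rejoin through a reconfig cycle, which is the mechanism that reconciles their stale local mddb records with the surviving members' view.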
/*
 * For a MN diskset, the deletion of a host in the diskset
 * does not cause an update of the mediator record. If the
 * host deletion will cause the diskset to be removed (this is
 * the last host being removed or all hosts are being removed)
 * then the mediator record must have already been removed by the
 * user or this delete host operation will fail (a check for
 * this is done later in this routine).
 */

/* Bring the mediator record up to date with the set record */

/*
 * For traditional diskset:
 * Check to see if all the hosts we are trying to delete the set from
 * have a set "setname" that is the same as ours, i.e. - same name,
 * same time stamp, same genid. We only do this if forceflg is not
 * specified or we are in OHA mode.
 */

/* We skip this side */

/* Can't talk to the host; only allowed in OHA mode */

/*
 * We got an error we do not, or are not,
 * prepared to handle.
 */

/*
 * If we got here: both hosts are up; a host in
 * our set record does not have the set. So we
 * delete the host from our set and invalidate
 * its node_v entry.
 */

/*
 * If we delete a host, make sure the mediator
 * hosts are made aware of this.
 */

/*
 * If we can talk to the host, and they do not have the
 * exact set, then we disallow the operation.
 */

/*
 * Here we prune the node_v's that were invalidated above.
 * If we are left with no nodes, then we have
 * completed the operation.
 */

/* Inform the mediator hosts of the new node list */

/*
 * For multinode diskset:
 * If forceflag is FALSE then check to see if all the hosts we
 * are trying to delete the set from have a set "setname" that
 * is the same as ours, i.e. - same name, same time stamp, same genid.
 * If forceflag is TRUE, then we don't care if the hosts being
 * deleted have the same set information or not since the user is forcing
 * those hosts to be deleted.
 */
/* We skip this node since comparing against it */

/*
 * If we can talk to the host, and they do not have the
 * exact set, then we disallow the operation.
 */

/*
 * For traditional diskset:
 * Can't allow a user to delete their node (without deleting all nodes)
 * out of a set in OHA mode; it would leave a real mess.
 * This action was already failed above for a MN diskset.
 */

/* Can directly return since !MN diskset; nothing to unlock */

/* Get the drive descriptors for this set */

/*
 * We have been asked to delete all the hosts in the set, i.e. - delete
 * the entire set. This is only a valid operation if all drives have been
 * removed from the set.
 */

/*
 * If a mediator is currently associated with this set,
 * fail the deletion of the last host(s).
 */

/* Get timeout values in case we need to roll back */

/* We need this around for re-adding DB side names later. */

/*
 * Alloc nodeid list if drives are present in diskset.
 * The nodeid list is used to reset mirror owners if the
 * owner is a deleted node.
 */

/* Lock the set on current set members */

/* Skip empty slots */

/*
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Start by suspending rpc.mdcommd (which drains it of
 * all messages), then change the nodelist followed
 * by a reinit and resume.
 */

/*
 * Is current set STALE?
 * Need to know this if delete host fails and the node
 * is re-joined to the diskset.
 */

/*
 * For each node being deleted, set the DEL flag and
 * reset the OK flag on that node first.
 * Until a node has turned off its own
 * rpc.metad's NODE_OK flag, that node could be
 * considered for master during a reconfig.
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */
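The "exact set" check above compares three things: name, creation time stamp, and genid. A hedged stand-in for that comparison (the struct below is invented for illustration and is not the real md_set_record layout):

```c
#include <string.h>

/* Illustrative stand-in for the fields the "same set" check compares. */
struct set_desc {
	char		name[64];	/* set name */
	long		ctime_sec;	/* creation time stamp, seconds */
	long		ctime_usec;	/* creation time stamp, microseconds */
	unsigned long	genid;		/* generation id, bumped on change */
};

/* Two hosts hold the exact same set only if all three fields match. */
static int
same_set(const struct set_desc *a, const struct set_desc *b)
{
	return (strcmp(a->name, b->name) == 0 &&
	    a->ctime_sec == b->ctime_sec &&
	    a->ctime_usec == b->ctime_usec &&
	    a->genid == b->genid);
}
```

A matching name alone is not enough: the time stamp and genid distinguish a re-created set of the same name from the original.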
/* Something wrong, handle this in next loop */

/* If node_id_list is alloc'd, fill in for later use */

/* All nodes are guaranteed to be ALIVE unless OHA */

/* Only changing my local cache of node list */

/* Set flags for del host to DEL on that host */

/*
 * Turn off the owner flag in nodes to be deleted
 * if this node has been joined.
 * Also, turn off NODE_OK and turn on NODE_DEL
 * for nodes to be deleted.
 * These flags are used to set the node
 * record flags in all nodes in the set.
 * Only withdraw nodes that are joined.
 * Don't communicate with a non-ALIVE node if
 * in OHA - but set flags in the master list so
 * alive nodes are updated correctly.
 */

/*
 * Going to set locally cached node flags to rollback join
 * so in case of error, the rollback code knows which
 * nodes to re-join. rpc.metad ignores the RB_JOIN flag.
 */

/*
 * Be careful in the ordering of the following steps so that
 * recovery from a panic between the steps is viable.
 */

/*
 * Only reset master info in rpc.metad - don't reset
 * local cached info which will be used to set master info
 * back if failure (rollback).
 */

/* Reset master on deleted node */

/*
 * Now, reset owner and set delete flags for the
 * deleted nodes on all nodes.
 */

/* Skip non-ALIVE node if in OHA */

/*
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Send a reinit command to mdcommd which forces it to get
 * a fresh set description.
 */

/* Class is ignored for REINIT */

"Unable to reinit rpc.mdcommd.\n"));
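The flag updates described above (clear NODE_OK and the owner flag, raise NODE_DEL) are plain bit operations. The bit values below are invented for illustration; the real MD_MN_NODE_* constants live in the meta headers:

```c
/* Illustrative flag bits, not the real MD_MN_NODE_* values. */
#define NODE_OK		0x0001u	/* node record is healthy */
#define NODE_DEL	0x0002u	/* node is being deleted */
#define NODE_OWN	0x0004u	/* node has joined (owns) the set */

/* Mark a node as being deleted: drop OK and owner, raise DEL. */
static unsigned int
mark_node_deleted(unsigned int flags)
{
	flags &= ~(NODE_OK | NODE_OWN);
	flags |= NODE_DEL;
	return (flags);
}
```

Because the transition is idempotent, replaying it after a partial failure is harmless, which is what makes the panic-recovery ordering described above workable.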
"Unable to resume rpc.mdcommd.\n"));
/* Mark the set record MD_SR_DEL on the hosts we are deleting */

/*
 * If a MN diskset and in OHA mode, don't issue RPCs to nodes that
 * aren't alive.
 * If a MN diskset and not in OHA mode, then all nodes must respond
 * to RPC (be alive) or this routine will return failure.
 * If a traditional diskset, ignore all RPC failures if in OHA mode.
 */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Skip non-ALIVE node if in OHA mode */

/* All nodes should be alive in non-oha mode. */

/*
 * For traditional diskset, issue the RPC and
 * ignore RPC failure if in OHA mode.
 */

/* Delete the set on the hosts we are deleting */

/*
 * Failure during del_set_on_hosts would have recreated
 * the diskset on the remote hosts, but for multi-owner
 * disksets we need to set node flags properly and REINIT and
 * RESUME rpc.mdcommd, so just let the rollback code
 * handle it.
 */

/* Delete the host from sets on hosts not being deleted */

/* All nodes are guaranteed to be ALIVE unless in oha mode */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Skip nodes being deleted */

/* Skip empty slots */

/* Skip nodes being deleted */

/* We have drives */

/*
 * Delete the old sidename for each drive on all the hosts.
 * If a multi-node diskset, each host only stores
 * the side information for itself, so a multi-node
 * diskset doesn't delete the old sidename for
 * any other host.
 */

/*
 * If a MN diskset, reset owners of mirrors that are
 * owned by the deleted nodes.
 */

/* Skip empty slots */

/* Skip nodes being deleted */

/* All nodes guaranteed ALIVE unless in oha mode */

/*
 * If the mirror owner was set to a deleted node,
 * then each existing node resets the mirror owner.
 */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */
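The per-node RPC loops above keep skipping the same three kinds of slots: empty ones, nodes being deleted, and (in OHA mode) nodes not in the membership list. Collapsed into one hypothetical predicate, with an invented slot struct:

```c
/* Hypothetical per-slot state; the real node records carry more fields. */
struct node_slot {
	int	present;	/* slot occupied */
	int	deleting;	/* node is on the delete list */
	int	alive;		/* node is in the membership list */
};

/* Whether a per-node RPC loop should skip this slot. */
static int
skip_node(const struct node_slot *n, int oha_mode)
{
	if (!n->present)
		return (1);	/* skip empty slots */
	if (n->deleting)
		return (1);	/* skip nodes being deleted */
	if (oha_mode && !n->alive)
		return (1);	/* no reason to wait for RPC timeouts */
	return (0);
}
```

Factoring the test this way makes the asymmetry visible: the liveness check only applies in OHA mode, because outside OHA all nodes are guaranteed ALIVE.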
/* Skip nodes being deleted */

/*
 * If the mirror owner is a deleted node, reset
 * mirror owners to NULL. If an error occurs,
 * print a warning and continue. Don't fail
 * metaset because of a mirror owner reset
 * problem since the next node to grab the mirror
 * will resolve this issue. Before the next node
 * grabs the mirrors, metaset will show the deleted
 * node as owner, which is why an attempt to
 * reset the mirror owner is made.
 */

"Unable to reset mirror owner on"

/*
 * Bring the mediator record up to date with the set record for
 * a traditional diskset.
 */

/* Inform the mediator hosts of the new node list */

/*
 * For traditional diskset:
 * We are deleting ourselves out of the set and we have drives to
 * consider; so we need to halt the set, release the drives and
 * reset the timeout. **** THIS IS A ONE WAY TICKET, NO ROLL BACK
 * IS POSSIBLE AS SOON AS THE HALT SET COMPLETES, SO THIS IS DONE
 * WITH ALL SIGNALS BLOCKED AND LAST ****
 * This situation cannot occur in a MN diskset since a node can't
 * delete itself unless all nodes are being deleted, and a diskset
 * cannot contain any drives if all nodes are being deleted.
 * So, don't even test for this if a MN diskset.
 */

/* Make sure we are blocking all signals */

/* release signals back to what they were on entry */

/*
 * Unlock the diskset by resuming messages across the diskset.
 * Just resume all classes so that the resume is the same whether
 * just one class was locked or all classes were locked.
 */

/*
 * Skip nodes being deleted if the remote set
 * was deleted, since rpc.mdcommd may no longer
 * be running on the remote node.
 */

"Unable to resume rpc.mdcommd.\n"));
/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Skip empty slots */

/* release signals back to what they were on entry */

/* all signals already blocked for MN diskset */

/*
 * Send a reinit command to rpc.mdcommd which forces it to get
 * a fresh set description and resume all classes but class 0.
 * Don't send any commands to rpc.mdcommd if the set on that node
 * was deleted.
 */

/*
 * If the remote set was deleted, rpc.mdcommd
 * may no longer be running so send nothing to it.
 */

/* Class is ignored for REINIT */

"Unable to reinit rpc.mdcommd.\n"));
/*
 * If the remote set was deleted, rpc.mdcommd
 * may no longer be running so send nothing to it.
 */

"Unable to resume rpc.mdcommd.\n"));
/*
 * Lock out other meta* commands on nodes with the newly
 * re-created sets by suspending class 1 messages
 * across the diskset.
 */

/* Skip nodes not being deleted */

/* Suspend commd on nodes with re-created sets */

"Unable to suspend rpc.mdcommd.\n"));
/* See if we have to re-add the drives specified. */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Don't care if set record is MN or not */

/* Drive already added, skip to next node */

/*
 * The set record structure was allocated from the RPC
 * routine getset so this structure is only of
 * size md_set_record even if the MN flag is
 * set. So, clear the flag so that the free
 * code doesn't attempt to free a structure
 * the size of md_mnset_record.
 */

/*
 * The set record structure was allocated from the RPC routine
 * getset so this structure is only of size
 * md_set_record even if the MN flag is set. So,
 * clear the flag so that the free code doesn't
 * attempt to free a structure the size of
 * md_mnset_record.
 */

/*
 * This is not the first replica being added to the
 * diskset so call with ADDSIDENMS_BCAST. If this
 * is a traditional diskset, the bcast flag is ignored
 * since traditional disksets don't use the rpc.mdcommd.
 */

/*
 * Add the device names for the new sides into the namespace,
 * on all hosts not being deleted.
 */

/* Find a node that is not being deleted */

/* Skip empty slots */

/* Find a node that is not being deleted */

/* Skip nodes not being deleted */

/* this side was just created, add the names */

/* Skip empty slots */

/* Skip nodes not being deleted */

/* this side was just created, add the names */

/*
 * Add the new sidename for each drive to all the hosts.
 * Multi-node disksets only store the sidename for
 * that host, so there is nothing to re-add.
 */

/* Skip empty slots */

/* Skip nodes not being deleted */

/* Skip empty slots */

/* rollback the mediator record */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */
/* Record should be for a multi-node diskset */

/* Skip empty slots */

/* Record should be for a non-multi-node set */

/*
 * The set record structure was allocated from the RPC
 * routine getset so this structure is only of
 * size md_set_record even if the MN flag is
 * set. So, clear the flag so that the free
 * code doesn't attempt to free a structure
 * the size of md_mnset_record.
 */

/* Skip empty slots */

/* Sets MD_SR_OK on given nodes. */

/*
 * On each newly re-added node, set the node record for that
 * node to OK. Then set all node records for the newly added
 * nodes on all nodes to ok.
 * By setting a node's own node record to ok first, even if
 * the node re-adding the hosts panics, the rest of the nodes
 * can determine the same node list during the choosing of the
 * master during reconfig. So, the only nodes considered for
 * mastership are nodes that have both MD_MN_NODE_OK and
 * MD_SR_OK set on that node's rpc.metad. If all nodes have
 * MD_SR_OK set, but no node has its own MD_MN_NODE_OK set,
 * then the set will be removed during reconfig since a panic
 * occurred during the re-creation of the diskset that followed
 * the failed deletion.
 */

/*
 * Notify rpc.mdcommd on all nodes of a
 * nodelist change. Start by suspending
 * rpc.mdcommd (which drains it of all
 * messages), then change the nodelist
 * followed by a reinit and resume.
 */

"Unable to suspend "

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Something wrong, finish this in next loop */

/* Set master on re-joining node. */

/*
 * Re-join the set to the same state as
 * before - stale or non-stale.
 */
/* Only changing my local cache of node list */

/* Set record for host to ok on that host */

/* Now set all node records on all nodes to be ok */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/*
 * Notify rpc.mdcommd on all nodes of a nodelist change.
 * Send a reinit command to mdcommd which forces it to get
 * a fresh set description.
 */

/* Class is ignored for REINIT */

"Unable to reinit rpc.mdcommd.\n"));
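The mdcommd notification sequence used throughout this routine is strictly ordered: suspend (which drains messages), change the nodelist, reinit, then resume. A toy state machine enforcing that order (the real daemon protocol is richer than this sketch):

```c
/* Toy model of the commd notification order; not the real commd protocol. */
enum commd_step { CS_IDLE, CS_SUSPENDED, CS_REINITED, CS_RESUMED };

/* Advance only to the immediately following step; anything else fails. */
static int
commd_advance(enum commd_step *s, enum commd_step next)
{
	if ((int)next != (int)*s + 1)
		return (-1);
	*s = next;
	return (0);
}
```

Modeling the sequence this way makes the invariant explicit: a reinit before the suspend has drained in-flight messages, or a resume before the reinit, is an ordering bug rather than a transient failure.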
/*
 * Unlock the diskset by resuming messages across the diskset.
 * Just resume all classes so that the resume is the same whether
 * just one class was locked or all classes were locked.
 */

"Unable to resume rpc.mdcommd.\n"));
/*
 * Start a resync thread on the re-added nodes
 * if the set is not stale. Also start a thread to update the
 * ABR state of all soft partitions.
 */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

"Unable to start resync "

"Unable to start sp update "

/* Don't test lock flag since guaranteed to be set if in rollback */

/*
 * During OHA mode, don't issue RPCs to
 * non-alive nodes since there is no reason to
 * wait for RPC timeouts.
 */

/* Skip empty slots */

/* release signals back to what they were on entry */

/* Make sure we own the set */

/* Lock the set on our side */

/* enable auto_take but only if it is not already set */

/* verify that we're the only host in the set */

/* Disable SCSI reservations */

/* disable auto_take, if set, or error */

/* Enable SCSI reservations */
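Kicking off the resync and soft-partition update work on a separate thread can be sketched with plain pthreads; the worker body below is a placeholder, not the actual resync logic, and the real threads would run detached in the background rather than being joined:

```c
#include <pthread.h>
#include <stddef.h>

/* Placeholder worker standing in for the resync / sp-update threads. */
static void *
resync_worker(void *arg)
{
	int *done = arg;

	*done = 1;	/* the real thread would drive the resync here */
	return (NULL);
}

/*
 * Spawn the worker; joined immediately only to keep this sketch
 * deterministic.
 */
static int
start_resync_thread(int *done)
{
	pthread_t tid;

	if (pthread_create(&tid, NULL, resync_worker, done) != 0)
		return (-1);
	return (pthread_join(tid, NULL));
}
```

As the comments above note, a failure to start these threads only warrants a warning: the resync will be retried, so metaset itself need not fail.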