shadow_migrate.c revision 2
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 * When distributing Covered Code, include this CDDL HEADER in each
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]

 * Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

 * This file contains the infrastructure to migrate files and directories.

 * If this directory is part of a different filesystem, then stop the
 * traversal rather than wasting time traversing the subdirectory. The
 * implementation of 'f_fsid' leaves something to be desired, but since
 * this is just a suggestion, it's harmless if we're wrong.

/* XXX verify endianness */

 * If the size of the pending lists exceeds a reasonable size, then
 * bail out. While we try to keep the FID lists short, there are times
 * (such as when there are a large number of errors) when the lists
 * grow very large. If this is the case, then it's probably not worth
 * trying to load and resume the migration from this list, and we're
 * better off just loading the root directory and starting from

 * Iterate over the given pending FID list and add entries for each item in the

 * With two pending lists and the ability for entries to appear
 * multiple times in a pending list, we want to make sure we
 * don't add the same entry twice. For efficiency, we create a
 * hash based on FID and ignore those we've already seen.
 * Ideally, we'd like to avoid adding children if we've already
 * added a parent (which would visit the same child twice), but
 * this requires a more complicated data structure and should
 * hopefully be a rare occurrence.

 * If this is a relative path, it is the remote path and we
 * should turn it into a guess at the absolute path.

 * When we first start migration, we have the root directory
 * and its contents in the pending list. As a special case to
 * avoid looking at the entire hierarchy twice, we never add
 * the root directory to the pending list. If there is some
 * error that is keeping the root directory from being
 * migrated, we'll discover it when we process the pending

 * This function is responsible for adding the initial directories to the list.
 * In order to allow us to resume a previous migration, we make the assumption
 * that the filesystem is largely static, and the remote paths are likely the
 * same as the local ones. By this token, we can iterate over the pending list
 * and look up the remote path for those FIDs that are not yet migrated. As an
 * extra check, we also look at the vnode path information as a second source
 * of possible path information. If everything fails, then we fall back to
 * processing the FID list individually. While not ideal, it gets the job
 * done. This is done asynchronously to the open, when the first migration is
 * attempted. Because we don't want to block reading the FID list when mounted
 * in standby mode, we return an error if we're currently in standby mode.

/* shouldn't happen */

 * This function will go and load the pending FID list, if necessary. It
 * returns with the sh_lock held on success

"pending FID list is currently being loaded")));
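The comments above describe skipping FIDs that have already been seen by keeping a hash keyed on the FID. The following is only a minimal sketch of that idea, not the file's actual code: the names seen_set_t, seen_add(), seen_destroy(), FID_MAXLEN and SEEN_BUCKETS are hypothetical, and the FNV-1a hash is simply a convenient choice for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	FID_MAXLEN	64	/* hypothetical upper bound on FID length */
#define	SEEN_BUCKETS	256	/* hypothetical hash table size */

typedef struct seen_entry {
	struct seen_entry *next;
	size_t len;
	unsigned char fid[FID_MAXLEN];
} seen_entry_t;

typedef struct seen_set {
	seen_entry_t *buckets[SEEN_BUCKETS];
} seen_set_t;

/* FNV-1a over the raw FID bytes (an illustrative choice of hash). */
uint32_t
fid_hash(const unsigned char *fid, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= fid[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Returns 1 if the FID was newly recorded, 0 if it had already been
 * seen (the caller should skip it), or -1 if memory allocation failed.
 */
int
seen_add(seen_set_t *set, const void *fid, size_t len)
{
	uint32_t idx;
	seen_entry_t *e;

	assert(len <= FID_MAXLEN);
	idx = fid_hash(fid, len) % SEEN_BUCKETS;

	for (e = set->buckets[idx]; e != NULL; e = e->next) {
		if (e->len == len && memcmp(e->fid, fid, len) == 0)
			return (0);
	}

	if ((e = malloc(sizeof (*e))) == NULL)
		return (-1);
	e->len = len;
	memcpy(e->fid, fid, len);
	e->next = set->buckets[idx];
	set->buckets[idx] = e;
	return (1);
}

/* Free every entry and leave the set empty. */
void
seen_destroy(seen_set_t *set)
{
	for (size_t i = 0; i < SEEN_BUCKETS; i++) {
		seen_entry_t *e = set->buckets[i];

		while (e != NULL) {
			seen_entry_t *next = e->next;
			free(e);
			e = next;
		}
		set->buckets[i] = NULL;
	}
}
```

A caller walking both pending lists would call seen_add() for each FID and only enqueue the entry when it returns 1. As the comments note, this does not catch the case where a child is added after its parent; that would need a richer structure.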
 * This function is called during shadow_close() and is responsible for
 * removing all items from the work queue and freeing up any errors seen.

 * Record an error against the given path. We first check to see if it's a
 * known error, returning if it is. Otherwise, we create an entry in the error
 * list and record the relevant information.

 * Called when migration fails for a file or directory. In this case, we
 * consult the kernel to get the remote path for the object. If this fails,
 * then we assume it's a local error and don't record the failure. If it
 * succeeds, it indicates there was a problem with the remote side, and we do

 * Internal function to calculate priority within the pending queue. This is
 * based primarily on the directory depth, as we want to proceed
 * depth-first in order to minimize the size of our pending list. We also bias
 * towards the most recently accessed entries, under the assumption that they
 * are more likely to be accessed again.

 * We have only 64 bits of identifiers, and a complete timestamp could
 * potentially take up this entire value. Instead, we carve 16 high
 * order bits for the depth, and then squeeze the timestamp into the
 * remaining bits. This may lose some nanosecond accuracy, but this
 * won't make a significant difference in the overall functioning of

 * At this point the highest value represents the highest priority, but
 * priority queues are based on the lowest value being the highest
 * priority. We invert the value here to achieve this.

 * The actual migration is done through the SHADOW_IOC_MIGRATE ioctl().
 * Normally, all migration errors are converted into the generic EIO error so
 * as not to confuse consumers. For data reporting purposes, however, we want
 * to get the real error.

 * Migrate a directory.

 * Skip the .SUNWshadow private directory.
 * Skip .zfs if this is a ZFS filesystem and it's

 * This function processes one entry from the on-disk pending list. This
 * function can fail with ESHADOW_MIGRATE_DONE if there are no entries left to
 * process. This is called with the lock held.

 * This should never fail, but if it does just ignore the error and let
 * the client try again.

 * If the enqueue itself fails, we'll still be safe because of
 * the on-disk pending list. This can theoretically stomp on
 * the previous error, but the only way either operation can
 * fail is with ENOMEM.

 * Primary entry point for migrating a file or directory. The caller is
 * responsible for controlling how often this function is called and by how
 * many threads. This pulls an entry off the pending list, and processes it
 * This function can return ESHADOW_MIGRATE_BUSY if all possible threads are
 * busy processing data, or ESHADOW_MIGRATE_DONE if the filesystem is done

"all entries are actively being processed")));
 * Debugging tool to allow simulation of ESHADOW_MIGRATE_BUSY. The
 * delay is specified in milliseconds.

 * This indicates that the filesystem is mounted in standby
 * mode. If this is the case, return an error, which will
 * cause the consumer to retry at a later point (or move on to
 * other filesystems).

"filesystem currently mounted in standby mode")));
 * The above functions can only fail if there is a library error (such
 * as out-of-memory conditions). In this case we should put it back in
 * our queue. If there was an I/O error or kernel level problem, we'll
 * rely on the shadow pending queue to pick up the file later as part
 * of the cleanup phase. The exception is EINTR, where we know we
 * should retry the migration.

 * Returns true if this filesystem has finished being migrated.

 * Returns true if there are only files with persistent errors left to migrate.
 * These errors may still be fixed by the user, so consumers should use this
 * information to process entries less aggressively.

 * This is a debugging tool that allows applications to dump out the current
 * pending list or otherwise manipulate it. Because it's only for debugging
 * purposes, it can leave the pending list in an arbitrary invalid state if
 * something fails (i.e. memory allocation).

 * Cleanup after a completed shadow migration. This is identical to
 * shadow_cancel() except that it verifies that the migration is complete.

 * This is a debugging-only tool that makes it easier to simulate
 * ESHADOW_MIGRATE_BUSY by suspending shadow_migrate_one() before migrating the
 * file or directory. This should not be used by production software - if
 * there needs to be throttling done, it should be implemented by the caller
 * invoking shadow_migrate_one() on a less frequent basis. The delay is
 * specified in milliseconds.
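The priority comment earlier in this file describes packing the directory depth into the 16 high-order bits of a 64-bit value, squeezing the timestamp into the remaining 48 bits, and then inverting the result so that the lowest value is the highest priority. The sketch below illustrates one way to do that. The function name priority_sketch() and the exact packing (nanoseconds since the epoch shifted right by 16 bits) are assumptions made for illustration, not necessarily what the real function does.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define	DEPTH_BITS	16
#define	TIME_BITS	(64 - DEPTH_BITS)	/* 48 bits left for the time */

/*
 * Hypothetical priority calculation. Deeper entries and more recently
 * accessed entries produce a smaller return value, which a min-first
 * priority queue treats as higher priority.
 */
uint64_t
priority_sketch(uint32_t depth, const struct timespec *ts)
{
	uint64_t d, ns, value;

	assert(ts->tv_sec >= 0 && ts->tv_nsec >= 0);

	/* Clamp the depth so that it fits in 16 bits. */
	d = (depth > UINT16_MAX) ? UINT16_MAX : depth;

	/*
	 * Assumed packing: total nanoseconds, with the low 16 bits
	 * dropped. This is where some nanosecond accuracy is lost.
	 */
	ns = (uint64_t)ts->tv_sec * UINT64_C(1000000000) +
	    (uint64_t)ts->tv_nsec;
	value = (d << TIME_BITS) | (ns >> DEPTH_BITS);

	/* Invert: highest raw value becomes the lowest (best) priority. */
	return (~value);
}
```

Because the depth occupies the top bits, any difference in depth dominates any difference in timestamp, which matches the stated preference for depth-first processing. Two timestamps closer together than 65536 nanoseconds compare as equal under this packing.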