lxc_attach.c revision 9afe19d634946d50eab30e3b90cb5cebcde39eea
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

Execute the specified command - enter the container NAME\n\
  -n, --name=NAME   NAME for name of the container\n\
  -e, --elevated-privileges\n\
                    Use elevated privileges (capabilities, cgroup\n\
                    restrictions) instead of those of the container.\n\
                    WARNING: This may leak privileges into the container.\n\
  -a, --arch=ARCH   Use ARCH for program instead of container's own\n\
  -s, --namespaces=FLAGS\n\
                    Don't attach to all the namespaces of the container\n\
                    but just to the following OR'd list of flags:\n\
                    MOUNT, PID, UTSNAME, IPC, USER or NETWORK\n\
                    WARNING: Using -s implies -e, it may therefore\n\
                    leak privileges into the container. Use with care.\n\
  -R, --remount-sys-proc\n\
                    Remount /sys and /proc if not attaching to the\n\
                    mount namespace when using -s in order to properly\n\
                    reflect the correct namespace context. See the\n\
                    lxc-attach(1) manual page for details.\n",
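The `-s` option above takes an OR'd list of namespace names. As a minimal sketch of how such a list could be turned into a bitmask, the following assumes a `|` separator and exact upper-case names; `parse_ns_flags` and the `ns_names` table are hypothetical names used only for illustration and are not taken from this file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>

/* Hypothetical table: flag names from the help text mapped to the
 * kernel's CLONE_NEW* constants. */
static const struct {
    const char *name;
    int flag;
} ns_names[] = {
    { "MOUNT",   CLONE_NEWNS   },
    { "PID",     CLONE_NEWPID  },
    { "UTSNAME", CLONE_NEWUTS  },
    { "IPC",     CLONE_NEWIPC  },
    { "USER",    CLONE_NEWUSER },
    { "NETWORK", CLONE_NEWNET  },
};

/* Parse a '|'-separated list such as "MOUNT|PID" into a bitmask.
 * Returns 0 on success, -1 on an unknown name or if no flag was given. */
int parse_ns_flags(const char *spec, int *out)
{
    char buf[128];
    char *tok, *save = NULL;
    size_t len;
    int mask = 0;

    assert(spec != NULL && out != NULL);
    len = strlen(spec);
    if (len >= sizeof(buf))
        return -1;
    memcpy(buf, spec, len + 1);     /* strtok_r modifies its input */

    for (tok = strtok_r(buf, "|", &save); tok != NULL;
         tok = strtok_r(NULL, "|", &save)) {
        size_t i, n = sizeof(ns_names) / sizeof(ns_names[0]);

        for (i = 0; i < n; i++)
            if (strcmp(tok, ns_names[i].name) == 0)
                break;
        if (i == n)
            return -1;              /* unknown namespace name */
        mask |= ns_names[i].flag;
    }
    if (mask == 0)
        return -1;                  /* empty list */
    *out = mask;
    return 0;
}
```

A caller would pass the argument of `-s` as `spec` and, per the help text, treat a successful parse as implying `-e`.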
ERROR("failed to get the init pid");
ERROR("failed to get context of the init process, pid = %d", init_pid);
/* determine which namespaces the container was created with */
ERROR("failed to automatically determine the "
      "namespaces which the container unshared");
/* For the cgroup attaching logic to work in conjunction with pid and user
 * namespaces, we need to have the following hierarchy:
 *
 *   lxc-attach [process executed externally]
 *       | socketpair(cgroup_ipc_sockets)
 *       |                      | fork()         -> grandchild
 *       |                      |<------------------|----+
 *       |<---------------------|-----+             |
 *       | signal child ------->|                   |
 *       |                      | signal child ---->|
 *       | waitpid()            | waitpid()         | exec()
 *       |                      |<------------------| exit()
 *       |<---------------------| exit()
 *
 * The rationale is the following: The first parent is needed because after
 * setns() (mount + user namespace) we can't access the cgroup filesystem
 * to add the pid to the corresponding cgroup. Therefore, we need to do that
 * in a process executed on the host, so that's why we need to fork and wait
 * for it to have done some initialization (cgroups may restrict certain
 * operations so we have to do that in the end) and use IPC for signaling.
 *
 * Then in the child process we do the setns(). However, a process is never
 * really attached to a pid namespace (never changes its pid, doesn't appear
 * in the pid namespace /proc), only child processes of that process are
 * truly inside the new pid namespace. That's why we need to fork() again
 * after setns() before performing final initializations, then signal our
 * parent, which signals the primary process, which does cgroup adding,
 * which then signals to the grandchild that it can exec().
 */
SYSERROR("could not set up required IPC mechanism for attaching");
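The three-process hierarchy described in the comment above can be sketched as follows. This is a simplified illustration, not the code of this file: `setns()` and the cgroup step are left as comments because they need privileges, the grandchild calls `_exit()` where the real code would `exec()`, and the function names (`attach_hierarchy_demo`, `child`, `grandchild`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Grandchild: report readiness, then wait for the go-ahead. */
static void grandchild(int sync_fd, int exit_code)
{
    char c = 'r';

    if (write(sync_fd, &c, 1) != 1)     /* "I am initialized" */
        _exit(1);
    if (read(sync_fd, &c, 1) != 1)      /* wait until cgroup logic is done */
        _exit(1);
    _exit(exit_code);                   /* stands in for exec() */
}

/* Child: the real code would call setns() before this second fork(). */
static void child(int ipc_fd, int exit_code)
{
    int sync_fds[2], status;
    char c;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sync_fds) < 0)
        _exit(1);
    pid = fork();
    if (pid < 0)
        _exit(1);
    if (pid == 0) {
        close(sync_fds[0]);
        close(ipc_fd);
        grandchild(sync_fds[1], exit_code);
    }
    close(sync_fds[1]);

    if (read(sync_fds[0], &c, 1) != 1)                    /* grandchild ready */
        _exit(1);
    if (write(ipc_fd, &pid, sizeof(pid)) != sizeof(pid))  /* report its pid */
        _exit(1);
    if (read(ipc_fd, &c, 1) != 1)                         /* cgroup step done */
        _exit(1);
    if (write(sync_fds[0], &c, 1) != 1)                   /* release grandchild */
        _exit(1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        _exit(1);
    _exit(WEXITSTATUS(status));
}

/* First process: stays on the host side and performs the cgroup step.
 * Returns the grandchild's exit code, or -1 on failure. */
int attach_hierarchy_demo(int exit_code)
{
    int ipc[2], status;
    pid_t pid, attached;
    char c = 'k';

    assert(exit_code >= 0 && exit_code <= 255);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, ipc) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(ipc[0]);
        child(ipc[1], exit_code);
    }
    close(ipc[1]);

    if (read(ipc[0], &attached, sizeof(attached)) != sizeof(attached))
        goto fail;
    /* the real code would add `attached` to the container's cgroups here */
    if (write(ipc[0], &c, 1) != 1)
        goto fail;
    close(ipc[0]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
fail:
    close(ipc[0]);
    waitpid(pid, &status, 0);
    return -1;
}
```

Each level only talks to its direct neighbour over a socket pair, so a failure at any level closes a socket and unblocks the processes waiting on it.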
SYSERROR("failed to create first subprocess");
ERROR("failed to get pid of attached process to add to cgroup");
ERROR("failed to attach process to cgroup");
ERROR("failed to signal child that cgroup logic has finished");
/* at this point we are in the 'parent' process so we need to close the
 * socket reserved for the 'grandparent' process */
/* we need to attach before we fork since certain namespaces
 * (such as pid namespaces) only really affect children of the
 * current process and not the process itself */
ERROR("failed to enter the namespace");
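As a rough illustration of the "attach before fork" step, the sketch below opens a namespace file under `/proc/<pid>/ns/` and hands it to `setns()`. It assumes glibc's `setns()` wrapper is available; the helper names `ns_path` and `enter_ns` are hypothetical and not part of this file. The short names under `/proc/<pid>/ns/` are `mnt`, `pid`, `uts`, `ipc`, `user` and `net`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/ns/<name>".
 * Returns 0 on success, -1 if the path does not fit into buf. */
int ns_path(pid_t pid, const char *name, char *buf, size_t len)
{
    int n;

    assert(name != NULL && buf != NULL);
    n = snprintf(buf, len, "/proc/%ld/ns/%s", (long)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Join one namespace of `pid`. This needs sufficient privileges
 * (typically CAP_SYS_ADMIN). Returns 0 on success, -1 otherwise. */
int enter_ns(pid_t pid, const char *name)
{
    char path[64];
    int fd, ret;

    if (ns_path(pid, name, path, sizeof(path)) < 0)
        return -1;
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ret = setns(fd, 0);     /* 0: accept any namespace type */
    close(fd);
    return ret;
}
```

For the pid namespace, only processes forked after `enter_ns(init_pid, "pid")` succeeds end up inside it, which is why the call has to come before the `fork()`.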
WARN("could not change directory to '%s'", curdir);
/* hack: we need sync.h infrastructure - and that needs a handler
 * FIXME: perhaps we should also just use a very simple socketpair()
 * here? - like with the grandparent <-> parent communication? */
ERROR("failed to initialize synchronization socket");
/* wait until the child has done configuring itself before
 * we put it in a cgroup that potentially limits these */
/* ask grandparent to add child to cgroups, the grandparent will
 * itself check whether that's actually necessary */
ERROR("error using IPC to notify main process of pid to add to the cgroups of the container");
/* we need some mechanism to check whether the grandparent could
 * add us to the cgroups or not - so we await a dummy integer
 * on the same socket (that's why we don't use a pipe - we need
 * two-way communication). So if the parent fails and exits, that
 * will close the socket, which will cause a read of 0 bytes for
 * us, so we just terminate. If we read at least a byte, we don't
 * care about the contents... */
/* only print something if we can't assume the parent already
 * gave an error message, that will reduce confusion for the */
ERROR("failed to get notification that the child process was added to the container's cgroups");
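The "read of 0 bytes" check described above can be sketched like this. It is an illustration under assumptions, not this file's code: `wait_for_ack` and `ack_demo` are hypothetical names, and the demo plays the peer inside the same process by writing to, or simply closing, the other end of the socket pair.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait for the dummy integer. Returns 0 if at least one byte arrived,
 * -1 if the peer closed the socket (read of 0 bytes) or on error. */
int wait_for_ack(int fd)
{
    int dummy;
    ssize_t n;

    assert(fd >= 0);
    do {
        n = read(fd, &dummy, sizeof(dummy));
    } while (n < 0 && errno == EINTR);
    return n > 0 ? 0 : -1;
}

/* Hypothetical demo: the peer either sends the dummy value or just
 * closes its end, as a failing process would on exit. */
int ack_demo(int peer_sends)
{
    int fds[2], val = 0, ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    if (peer_sends && write(fds[1], &val, sizeof(val)) != sizeof(val)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* the peer goes away */
    ret = wait_for_ack(fds[0]);
    close(fds[0]);
    return ret;
}
```

Because the same socket carries data in both directions, the waiting side needs no separate error channel: a vanished peer and an explicit acknowledgement are distinguished by the return value of a single `read()`.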
/* we don't need that IPC interface anymore */
/* tell the child we are done initializing */
ERROR("failed switching apparmor profiles");
/* A description of the purpose of this functionality is
 * provided in the lxc-attach(1) manual page. We have to
 * remount here and not in the parent process, otherwise
 * /proc may not properly reflect the new pid namespace. */
ERROR(
"could not ensure correct architecture: %s",
ERROR("could not drop privileges");
/* tell parent we are done setting up the container and wait
 * until we have been put in the container's cgroup, if */
/* ignore errors, we will fall back to root in that case
 * (/proc was not mounted etc.) */
/* try to set the uid/gid combination */
/* this probably happens because of incompatible nss
 * implementations in host and container (remember, this
 * code is still using the host's glibc but our mount
 * namespace is in the container)
 * we may try to get the information by spawning a
 * [getent passwd uid] process and parsing the result */
/* executed if either no passwd entry or execvp fails,
 * we will fall back on /bin/sh as a default shell */
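The last two comments describe parsing the output of `getent passwd uid` and falling back to `/bin/sh`. A minimal sketch of that parsing step, assuming the usual seven-field passwd line format, could look like the following; `shell_from_passwd_line` and `DEFAULT_SHELL` are hypothetical names, and spawning `getent` itself is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEFAULT_SHELL "/bin/sh"

/* Copy the login shell (7th field) of a passwd line into out.
 * Falls back to DEFAULT_SHELL if the line is NULL or malformed, the
 * field is empty, or the shell path does not fit into out. */
void shell_from_passwd_line(const char *line, char *out, size_t outlen)
{
    const char *p = line;
    size_t len;
    int field;

    assert(out != NULL && outlen > 0);
    /* skip the six fields before the shell: name:passwd:uid:gid:gecos:dir */
    for (field = 0; p != NULL && field < 6; field++) {
        p = strchr(p, ':');
        if (p != NULL)
            p++;
    }
    len = p != NULL ? strcspn(p, ":\n") : 0;
    if (len == 0 || len >= outlen) {
        snprintf(out, outlen, "%s", DEFAULT_SHELL);
        return;
    }
    memcpy(out, p, len);
    out[len] = '\0';
}
```

The result can be passed to `execlp()` or a similar call; if that fails as well, the caller would retry with `/bin/sh` directly, as the comment above states.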