.. This document is formatted using reStructuredText, which is a markup
   syntax and parser component of Docutils for Python. An html version
   of this document can be generated using the following command:
   rst2html.py doc/parallel-linked-images.txt >doc/parallel-linked-images.html

======================
Parallel Linked Images
======================

:Author: Edward Pilatowicz
:Version: 0.1


Problems
========

Currently, linked image recursion is done serially and in stages. For
example, when we perform a "pkg update" on an image, we execute multiple
pkg.1 cli operations for each child image. The multiple pkg.1
invocations on a single child image correspond to the following
sequential stages of pkg.1 execution:

1) publisher check: sanity check child publisher configuration against
   parent publisher configuration.
2) planning: plan fmri and action changes.
3) preparation: download content needed to execute planned changes.
4) execution: execute planned changes.
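
Illustrated as a rough sketch (the stage names and the run_stage()
helper below are illustrative placeholders, not the actual pkg.1
interfaces or flags), the current recursion amounts to a strictly
serial, per-stage loop over the child images::

    STAGES = ["pubcheck", "plan", "prepare", "execute"]

    def run_stage(child_path, stage):
        """Spawn a separate pkg.1 process against the child image for
        this one stage (details elided; the real invocation differs)."""
        ...

    def update_children(child_paths):
        # Four separate pkg.1 invocations per child, each one
        # re-initializing client state, and never more than one child
        # being worked on at a time.
        for stage in STAGES:
            for path in child_paths:
                run_stage(path, stage)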

So to update an image with children, we invoke pkg.1 four times for each
child image. This architecture is inefficient for multiple reasons:

- we don't do any operations on child images in parallel

- when executing multiple pkg.1 invocations to perform a single
  operation on a child image, we are constantly throwing out and
  re-initializing lots of pkg.1 state.

To make matters worse, as we execute stages 3 and 4 on a child image the
pkg client also re-executes previous stages. For example, when we start
stage 4 (execution) we re-execute stages 2 and 3. So for each child we
update, we end up invoking stage 2 three times and stage 3 twice. This
leads to bugs like 18393 (where it seems that we download packages
twice). It also means that we have caching code buried within the
packaging system that attempts to cache internal state to disk in an
effort to speed up subsequent re-runs of previous stages.


Solutions
=========


Eliminate duplicate work
------------------------

We want to eliminate a lot of the duplicate work done when executing
packaging operations on children in stages. To do this we will update
the pkg client api to allow callers to:

- Save an image plan to disk.
- Load an image plan from disk.
- Execute a loaded plan without first "preparing" it. (This assumes
  that the caller has already "prepared" the plan in a previous
  invocation.)
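
As a rough sketch of how a staged caller such as client.py might use
these new capabilities (save_plan() and load_plan() are hypothetical
additions; the other api calls follow the existing api.gen_plan_*()
naming mentioned below, but the details are illustrative only)::

    def staged_operation(api_inst, stage, plan_file):
        # api_inst is assumed to be an already constructed pkg api
        # object for the image being operated on.
        if stage == "plan":
            for pd in api_inst.gen_plan_update():
                pass                          # planning happens here
            api_inst.save_plan(plan_file)     # hypothetical
        elif stage == "prepare":
            api_inst.load_plan(plan_file)     # hypothetical
            api_inst.prepare()                # download needed content
        elif stage == "execute":
            api_inst.load_plan(plan_file)     # hypothetical
            # Execute directly, without re-preparing: a previous
            # invocation already prepared this plan.
            api_inst.execute_plan()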

In addition to eliminating duplicated work during staged execution, this
will also allow us to stop caching intermediate state internally within
the package system. Instead, client.py will be enhanced to cache the
image plan, and it will be the only component that knows about
"staging".

To allow us to save and restore plans, all image plan data will be saved
within a PlanDescription object, and we will support serializing this
object into a json format. The json format for saved image plans is an
internal, unstable, and unversioned private interface. We will not
support saving an image plan to disk and then executing it later with a
different version of the packaging system on a different host. Also,
even though we will be adding data into the PlanDescription object, we
will not expose any new information about an image plan to api consumers
via the PlanDescription object.
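
A minimal sketch of the intended json round-trip (the fields shown and
the getstate()/fromstate()/save_plan()/load_plan() helpers are
hypothetical stand-ins; the real PlanDescription carries much more
state)::

    import json

    class PlanDescription(object):
        def __init__(self, op=None, fmri_changes=None):
            self.op = op
            self.fmri_changes = fmri_changes or []

        def getstate(self):
            # Unstable, unversioned private format: only usable by the
            # same packaging system version on the same host.
            return {"op": self.op, "fmri_changes": self.fmri_changes}

        @staticmethod
        def fromstate(state):
            return PlanDescription(state["op"], state["fmri_changes"])

    def save_plan(plan, path):
        with open(path, "w") as f:
            json.dump(plan.getstate(), f)

    def load_plan(path):
        with open(path) as f:
            return PlanDescription.fromstate(json.load(f))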

An added advantage of allowing api consumers to save an image plan to
disk is that it should help with our plans to have the api.gen_plan_*()
functions return PlanDescription objects for child images. A file
descriptor (or path) associated with a saved image plan would be one way
for child images to pass image plans back to their parent (which could
then load them and yield them as results to api.gen_plan_*()).
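
For example, the parent side of that hand-off might look roughly like
this (child_plan_paths and gen_child_plans() are hypothetical;
load_plan() is sketched above)::

    def gen_child_plans(child_plan_paths):
        # Each child hands back the path (or fd) of its saved plan;
        # the parent loads it and yields it to the api consumer.
        for path in child_plan_paths:
            yield load_plan(path)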


Update children in parallel
---------------------------

We want to enhance the package client so that it can update child images
in parallel.

Due to potential resource constraints (cpu, memory, and disk io) we
cannot entirely remove the ability to operate on child images serially.
Instead, we plan to allow for a concurrency setting that specifies how
many child images we are willing to update in parallel. By default, when
operating on child images we will use a concurrency setting of 1, which
maintains the current behavior of the packaging system. If a user wants
to specify a higher concurrency setting, they can use the "-C N" option
to subcommands that recurse (like "install", "update", etc.) or they can
set the environment variable "PKG_CONCURRENCY=N". (In both cases N is an
integer which specifies the desired concurrency level.)
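
A minimal sketch of the intended precedence (the helper name is
hypothetical; the "-C" option and the PKG_CONCURRENCY variable are as
proposed above)::

    import os

    def concurrency_level(cli_value=None):
        # The -C command line option wins; otherwise fall back to the
        # PKG_CONCURRENCY environment variable; the default of 1
        # preserves today's serial behavior.
        if cli_value is not None:
            return int(cli_value)
        return int(os.environ.get("PKG_CONCURRENCY", 1))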

Currently, pkg.1 worker subprocesses are invoked via the pkg.1 cli
interfaces. When switching to parallel execution, this will be changed
to use a json encoded rpc execution model. This richer interface is
needed to allow worker processes to pause and resume execution between
stages so that we can do multi-staged operations in a single process.

Unfortunately, the current implementation does not yet retain child
processes across different stages of execution. Instead, whenever we
start a new stage of execution, we spawn one process for each child
image, then make a remote procedure call into N images at once (where N
is our concurrency level). When an RPC returns, that child process
exits and we start a call for the next available child.
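
A simplified sketch of that per-stage dispatch (operate_on_child() is a
hypothetical stand-in for spawning a pkg.1 worker, making the rpc call
for this stage, and waiting for the worker to exit; a thread pool stands
in for the actual dispatch machinery)::

    from concurrent.futures import ThreadPoolExecutor

    def run_stage_on_children(children, stage, concurrency):
        def operate_on_child(child):
            # Spawn a worker for `child`, issue the rpc for `stage`,
            # and return once the worker process has exited.
            ...

        # At most `concurrency` children are in flight at once; as each
        # rpc returns, the next pending child is started.
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            list(pool.map(operate_on_child, children))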

Ultimately, we'd like to move to a model where we have a pool of N
worker processes, and those processes can operate on different images as
necessary. These processes would be persistent across all stages of
execution, and ideally, when moving from one stage to another these
processes could cache in memory the state for at least N child images so
that the processes could simply resume execution where they last left
off.

The client side of this rpc interface will live in a new module called
PkgRemote. The linked image subsystem will use the PkgRemote module to
initiate operations on child images. One PkgRemote instance will be
allocated for each child that we are operating on. Currently, this
PkgRemote module will only support the sync and update operations used
within linked images, but in the future it could easily be expanded to
support other remote pkg.1 operations so that we can support recursive
linked image operations (see 7140357). When PkgRemote invokes an
operation on a child image it will fork off a new pkg.1 worker process
as follows::

    pkg -R /path/to/linked/image remote --ctlfd=5

This new pkg.1 worker process will function as an rpc server to which
the client makes requests.
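
A rough sketch of how the client side might start such a worker and
hand it the control channel (socket.socketpair() stands in here for the
actual control pipe, and start_worker() is a hypothetical helper, not
the PkgRemote interface)::

    import socket
    import subprocess

    def start_worker(image_path):
        # Create the control channel and pass the server end to the
        # worker via fork/exec; --ctlfd tells the worker which fd to
        # talk to.
        client_sock, server_sock = socket.socketpair()
        proc = subprocess.Popen(
            ["pkg", "-R", image_path, "remote",
             "--ctlfd=%d" % server_sock.fileno()],
            pass_fds=[server_sock.fileno()])
        server_sock.close()
        return proc, client_sock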

Communication between the client and server will be done via json
encoded rpc requests sent over a pipe. The communication pipe is
created by the client, and its file descriptor is passed to the server
via fork/exec. The server is told about the pipe file descriptor via
the --ctlfd parameter. To avoid issues with blocking IO, all
communication via this pipe will be done by passing file descriptors.
For example, if the client wants to send an rpc request to the server,
it will write that rpc request into a temporary file and then send the
fd associated with the temporary file over the pipe. Any reply from the
server will be similarly serialized and then sent via a file descriptor
over the pipe. This should ensure that no matter the size of the
request or the response, we will not block when sending or receiving
requests via the pipe. (Currently, the limit of fds that can be queued
in a pipe is around 700. Given that our rpc model includes matched
requests and responses, it seems unlikely that we'd ever hit this
limit.)
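
A minimal sketch of the "serialize to a temporary file, pass only the
fd" scheme (a Unix-domain socketpair and Python 3.9's socket.send_fds()
stand in for fd passing over the actual control pipe; send_request()
and recv_request() are hypothetical helpers)::

    import json
    import os
    import socket
    import tempfile

    def send_request(sock, request):
        # Write the json encoded request to an unlinked temporary file
        # and pass only its file descriptor; the payload can be any
        # size without blocking the control channel itself.
        tmp = tempfile.TemporaryFile(mode="w+")
        json.dump(request, tmp)
        tmp.flush()
        tmp.seek(0)
        socket.send_fds(sock, [b"R"], [tmp.fileno()])

    def recv_request(sock):
        msg, fds, flags, addr = socket.recv_fds(sock, 1, 1)
        with os.fdopen(fds[0], "r") as f:
            return json.load(f)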

In the pkg.1 worker server process, we will have a simple json rpc
server that lives within client.py. This server will listen for
requests from the client and invoke client.py subcommand interfaces
(like update()). The client.py subcommand interfaces were chosen as the
target of these rpc calls for the following reasons:

- Least amount of encoding / decoding. Since these interfaces are
  invoked just after parsing user arguments, they mostly involve simple
  arguments (strings, integers, etc.) which have a direct json encoding.
  Additionally, the return values from these calls are simple return
  code integers, not objects, which means the results are also easy to
  encode. This means that we don't need lots of extra serialization /
  de-serialization logic (for things like api exceptions, etc.). See
  the sketch after this list.

- Output and exception handling. The client.py interfaces already
  handle exceptions and output for the client. This means that we don't
  have to create new output classes and build our own output and
  exception management code; instead we leverage the existing code.

- Future recursion support. Currently when recursing into child images
  we only execute "sync" and "update" operations. Eventually we want to
  support pkg.1 subcommand recursion into linked images (see 7140357)
  for many more operations. If we do this, the client.py interfaces
  provide a nice boundary since there will be an almost 1:1 mapping
  between parent and child subcommand operations.
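
A minimal sketch of what that server-side dispatch could look like (the
update()/sync_linked() stand-ins and the dispatch table below are
hypothetical; the real server would call the existing client.py
subcommand functions)::

    import json

    # Stand-ins for client.py subcommand interfaces: simple arguments
    # in, an integer exit status out.
    def update(verbose=False, **unused):
        return 0

    def sync_linked(verbose=False, **unused):
        return 0

    DISPATCH = {"update": update, "sync-linked": sync_linked}

    def handle_request(raw):
        # One json rpc request in, one integer exit status out; no
        # object graphs or api exceptions need to cross the boundary.
        req = json.loads(raw)
        func = DISPATCH[req["method"]]
        return json.dumps({"status": func(**req.get("params", {}))})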


Child process output and progress management
--------------------------------------------

Currently, since child execution happens serially, all child images have
direct access to standard out and display their progress directly there.
Once we start updating child images in parallel, this will no longer be
possible. Instead, all output from children will be logged to temporary
files and displayed by the parent when a child completes a given stage
of execution.

Additionally, since child images will no longer have access to standard
out, we will need a new mechanism to indicate progress while operating
on child images. To do this we will have a progress pipe between each
parent and child image. The child image will write one byte to this
pipe whenever one of the ProgressTracker *_progress() interfaces is
invoked. The parent process can read from this pipe to detect progress
within children and update its user visible progress tracker
accordingly.
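
A rough sketch of both ends of that progress pipe (the function names
and the tracker.tick() call are hypothetical; the real hooks would live
in the ProgressTracker code)::

    import os

    def child_report_progress(progress_wfd):
        # Called from the child's *_progress() hooks: one byte is
        # enough to signal "some progress happened".
        os.write(progress_wfd, b".")

    def parent_poll_progress(progress_rfd, tracker):
        # Non-blocking read; every byte received ticks the parent's
        # user visible progress tracker.
        os.set_blocking(progress_rfd, False)
        try:
            data = os.read(progress_rfd, 1024)
        except BlockingIOError:
            return
        for _ in data:
            tracker.tick()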