Self-Repairing Deployment Pipelines

Posted by Mark Burgess
August 5, 2014

Or what we should mean by Distributed Orchestration

Orchestrating complicated distributed processes is an unfamiliar aspect of computing that leads to all kinds of confusion. We are not taught how to do it in college, so we end up applying whatever methods we were taught, often in inappropriate ways. Promise theory paints a very simple picture of distributed orchestration. Rather than imagining that a central conductor (controller) somehow plays every instrument by remote magic wand, in an algorithmic fashion, promise theory says: let every player in an ensemble know their part, and leave them all to get on with it. The result is an emergent phenomenon. Their natural dependences on one another will make them all play together.

Over-thinking the storyline in a distributed process is the easiest way to get into a pickle. This is a key point: how we tell the story of a process and how it gets executed are two different things. Modern programming languages sometimes pretend they are the same, and sometimes separate the two entirely.

Scroll back to 1992, and the world was having all the same problems as today, in different wrapping. Then there was cron, pumping out scripts on hourly, daily and weekly schedules. It was used not only for running jobs on the system but also for basic configuration health checks. Cron scripts were like a wild horde of cats, each with a life of its own, hard to bring into any sense of order. In the early 90s the acme of orchestration was to sort out all your cron jobs to do the right thing at the right time. The scripts had to differ between machines too, because the flavours of Unix were quite different, and thus there was distributed complexity. Before CFEngine, people would devise devious ways of creating one cron file for each host and then pushing them out. This was considered to be orchestration in 1992.

One of the first use cases for CFEngine was to replace all of this with a single, uniform, model-oriented language and interface.
CFEngine was target oriented, because it had to be repeatable: this is convergence. In this article I explain why virtual environments and containers are basically this issue all over again.

Another tool of this epoch is make, for building software from dependencies. In 1994, Richard Stallman pointed out to me that CFEngine was very like make. Indeed, this ended up influencing the syntax of the language.

The Makefile was different: it was the opposite of a script. Instead of starting in a known state and pushing out a sequence of transitions from there, it focused on the end state and asked: how can I get to that desired end state? In math parlance, it was a change of boundary condition. This was an incredibly important idea, because it meant that, no matter what kind of a mess you were in, you would end up with the right outcome. This is far more important than knowing where you started from.

Makefiles did not offer much in the way of abstraction; you could substitute variables and make simple patterns, but this was sufficient for most tasks, because patterns are one of the most important mechanisms for dealing with complexity. Make was also a serial processor running on a single machine, not really suitable for today’s distributed execution requirements. The main concession to parallelism was the addition of “-j” to parallelize the building of dependencies.

What was really needed was a model-based approach where we could provide answers to the following questions: what, when, where, how and why.

So now we come to the world of today, where software is no longer shackled to a workstation or a server, but is potentially a small cog in a large system. More than that, it is a platform for commerce in the modern world. It’s not just developers and IT folks who care about having stuff built; it’s everyone who uses a service.
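The difference between a script and a desired-end-state rule, as described above for the Makefile, can be illustrated with a small Python sketch. It is only an illustration of the idea of convergence, not how make or CFEngine is implemented; the function name `ensure_line` and the file names are hypothetical.

```python
import os
import tempfile


def ensure_line(path, line):
    """Converge `path` to a state in which `line` is present.

    Hypothetical helper for illustration only. Returns True if a repair
    was made and False if the desired end state already held.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # end state already reached: nothing to do
    # Start on a fresh line if the existing content lacks a trailing newline.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(separator + line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.txt")
        # Whatever the starting state (here: no file at all), repeated
        # application settles on the same outcome and then stops changing it.
        for attempt in range(3):
            print(attempt, ensure_line(path, "desired line"))
```

The point of the sketch is that the function describes where we want to end up, not a sequence of steps from a known start, so it is safe to run it again and again.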
Many of the problems we are looking to solve can be couched in the model of a deployment of some kind. Whether it is in-house software (“devops”), purchased off-the-shelf software (say “desktop”) or even batch jobs in HPC clusters, all of these typically pass through a test phase before being deployed onto some infrastructure container, such as a server, a process group, or even an embedded device.

Alas, the technologies we’ve invented are still very primitive. If we look back at the history of logic, it grew out of the need to hit objects with projectiles in warfare. Ballistics was the cultural origin of mathematics and logic in the days of Newton and Boole. Even today, we basically still try to catapult data and instructions into remote hosts using remote copies and shells.

A script, then, is like a catapult: it takes us from one decision to the next in a scripted logic. Another name for this triggered branching process is a chain reaction (an explosion). A Makefile is the opposite: a convergent process, like something sliding easily down a drain. The branching logic in a script leads to multitudes of parallel alternative worlds. When we branch in git or other version control systems, we add to this complexity. In a convergent process we are integrating possible worlds into a consistent outcome. This is the enabler for continuous delivery.

So developers might feel as though they have their triggered deployments under control, but do they really? No matter, we can go from this… to this…

This picture illustrates for me the meaning of true automation. No one has to push a button to get a response. The response is proactive and distributed into the very fabric of the design, not bolted on like an add-on. The picture contrasts how we go from manual labour, to assisted manual labour, to a proper redesign of process.
Automation that still needs humans to operate it is not automation; it is a crane or a power-suit.

CFEngine’s model of promises is able to answer all of the questions what, when, where, how and why at a basic level, and has been carefully designed to have the kind of desired-end-state, self-healing properties of a drain. Every CFEngine promise is a controlled implosion that leaves a desired end-state.

Today, configuration promises have to be supported across many different scales, from the smallest containers like a user identity, to processes, process groups, virtual and physical machines, local networks, organizational namespaces and even globally spanning administrative domains. How do we do that? The simple answer is that we always do it “from within”, through autonomous agents that collaborate and take responsibility for keeping desired-end-state promises at all levels.

Traditionally, we think of the management of boxes: server boxes, rack boxes, routing boxes, etc. We can certainly put an agent inside every one of those processing entities…

But we also need to be able to address abstract containers, labelled by the properties we use in our models of intent, that is, of business purpose. These are things like: linux, smartos, webservers, storage devices, and so on. They describe the functional roles in a story about the business purpose of our system.

This brings up an important issue: how we tell stories. Despite what we are taught in software engineering, there is not only one version of reality when it comes to computer systems. There is the story:

  • Told by the designer
  • Told by the programmer
  • Told by the debugger
  • Told by the user…
  • etc

When building software, it’s all “if this then that, else while do…” That highly serialized (step by step) story lives only in the programmer’s mind. It is an artificial view of reality designed to perpetuate a view of determinism. It is not what the system does. Often when something goes wrong with a system, we start telling simplified stories that focus on “root causes”, which is again an attempt to condense the indeterminism of a system down to a simple deterministic fiction: a story we can live with.

So there is a serial story of transitions we go through when building systems, represented by commands and hands-on interaction. Then there is a story about how tasks are delegated to different parts of the system. What is important to understand is that we should not try to tie the story about what a system does too closely to how it does it. Doing so leads to inefficiencies and even problems. The goal of an orchestration tool is to paint a suitable story around purpose that makes a good compromise on these issues. Then we can leave the execution of the story to the individual characters in the play.

What CFEngine introduced was a kind of “make” for desired system state, which had types of target promises: not just files to be built or installed, but processes to run, services to maintain, and so on. It was a make not only for the binary data, but also for the runtime state. Moreover, it could span any number of machines and use the container abstractions to assign each part of the music to its player.

Embedded in this orchestration framework is still good old “make” and its rule-based evaluation.

But it is a make that has storytelling attributes, so that we can understand the purpose of the system independently of the algorithm used to build and maintain it.
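As a rough sketch of how container abstractions can assign each part of the music to its player, the Python below lets every agent work out its own part from a shared policy, using only what it knows about itself. The role names and task descriptions are borrowed from the example policy at the end of this post; the dictionary layout and the functions `classes_for` and `my_part` are hypothetical and are not CFEngine's implementation.

```python
# Shared policy: every agent receives the same score.
# Keys are contexts (classes); values are the promises guarded by them.
POLICY = {
    "parent": [
        "prepare continuous delivery source",
        "collect build products",
        "package for delivery",
    ],
    "containers": ["fetch from the parent"],
}

JOBS = ["container_1", "container_2", "container_3"]


def classes_for(hostname, jobs=JOBS):
    """Contexts an agent derives locally; assumes the hostname is itself a class."""
    classes = {hostname}
    if hostname in jobs:
        classes.add("containers")  # abstract group covering all job containers
    return classes


def my_part(hostname, policy=POLICY):
    """Select only the promises whose guarding context applies to this agent."""
    classes = classes_for(hostname)
    return [
        promise
        for context, promises in policy.items()
        if context in classes
        for promise in promises
    ]


if __name__ == "__main__":
    for host in ["parent", "container_2", "laptop"]:
        print(host, my_part(host))
```

No controller hands out the work here: the same policy goes everywhere, and each agent picks out the lines addressed to the roles it recognizes in itself.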
In addition, it has built-in tools that are tailored to the special transformations needed to build and maintain the full software stack. In make, these tools had to be borrowed from other sources.

Now that we have a more powerful make that can handle static and runtime state across any kind of physical or abstract region, we turn it into a distributed service by making the agent on each separate box aware of all the context it needs to figure out its role in the whole system. Then, as long as we have defined the desired outcome in simple target “buckets”, convergence and orchestration of that state can happen entirely autonomously, without any human intervention.

Host containers can even spawn guest containers to keep specialized environments separate. This doesn’t necessarily help execution, but it simplifies the telling of the story for human shepherds. So: how do we orchestrate a distributed process of assembling some software in an environment like this? It’s simple: give everyone their job description and let them get on with it.

The trick is to use policy as the musical score for the orchestration. Policy assigns roles and appropriate guidance to each potential player in an ensemble. It can even assign roles to abstract groups that don’t exist yet! The idea is very flexible. In this example, we coordinate some docker containers (children) inside a parent host and enable them to work together, sharing necessary information, but without trying to push them through a sequence of steps as a programmer would.

[Figure: schematic of the arrangement]

The parent container promises to make specialized children, each with a particular task. It makes its data available to them in their “in-boxes”, and agrees to collect any results from them from their “out-boxes”.
Using make semantics, it can check to see whether any changes have been made that need to be picked up and integrated into a final result.

There is a CFEngine agent in each of these containers. Each of the agents knows the part it has to play, and the result will now emerge from the design, without having to drive the story step by step from some central controller. In the example, I used docker containers. For interested docker fans, this is the entire docker file. It is the analogue of the cron file I showed above. The principle is the same: replace all of the burden of handling variations with a simple model-based automation, designed to handle abstractions. All of the modelling is in CFEngine policy, so that the story is told at an appropriate level of abstraction, rather than confusing a bricklayer version with an architect version.

The advantage of using this approach is that it can work even for partially connected devices, like mobile phones and pads, that are increasingly a vital part of the software chain. Because there is no need for a micro-managed, sequential, bricklaying approach to ensuring process completion, the emergent effect can happen independently inside every container, without different containers preventing one another from making progress by having to wait.

The ability to run on networking devices and incorporate them into the story is now a critical part of the story-telling. The abstractions that we are building are pushing all of the complexity down onto networking engineers. They have not yet benefitted from the abstractions that Open Source innovation used to transform server management. Now that is possible, and they become part of the single orchestrated storyline.
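The in-box and out-box arrangement between the parent and its children can be mimicked in a few lines of Python, to show why no central sequencing is needed. This is a toy model under stated assumptions: the in-memory dictionaries stand in for files, and `parent_step`, `child_step` and `converge` are hypothetical names, not part of CFEngine. Each agent simply repairs whatever it can see needs repairing, in whatever order it happens to run.

```python
import random

JOBS = ["container_1", "container_2", "container_3"]


def parent_step(state):
    """Parent keeps every in-box filled and collects any finished out-box."""
    changed = False
    for job in JOBS:
        feed = f"This is for {job}"
        if state["inbox"].get(job) != feed:
            state["inbox"][job] = feed
            changed = True
        result = state["outbox"].get(job)
        if result is not None and state["results"].get(job) != result:
            state["results"][job] = result
            changed = True
    return changed


def child_step(state, job):
    """Child signs whatever is in its in-box; if nothing is there yet, it waits."""
    feed = state["inbox"].get(job)
    if feed is None:
        return False  # not an error: try again on a later pass
    result = f"{feed} / signed by {job}"
    if state["outbox"].get(job) == result:
        return False
    state["outbox"][job] = result
    return True


def converge(seed):
    """Run all agents in a random order until a full pass changes nothing."""
    rng = random.Random(seed)
    state = {"inbox": {}, "outbox": {}, "results": {}}
    agents = [lambda: parent_step(state)]
    agents += [lambda j=job: child_step(state, j) for job in JOBS]
    changed = True
    while changed:
        rng.shuffle(agents)
        # Build the list first so that every agent runs on every pass.
        changed = any([agent() for agent in agents])
    return state["results"]


if __name__ == "__main__":
    print(converge(seed=1) == converge(seed=2))
```

Whatever order the agents run in, the collected results are the same once the system has settled; the order only affects how many passes it takes.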
The goal of distributed management as story-telling is to make comprehensible system documentation directly executable by distributed agents, so that there is only one version of events: the one that describes our intent, rather than an explosion of changes set off by a push-button trigger.

The idea of story-telling as a crucial part of human-machine design is part of a larger narrative. It starts with humans solving problems by trial and error. Then we learn better approaches, practice and rehearse. Once we have it down to a mechanical repetition, we are basically acting like machines, so why not get a machine to do it? Our first attempts at mechanizing are to make machines that imitate how humans work. Finally, we separate the human activity from a true form of automation, just like the storm drain design in the illustration above.

I wanted to show how to make an orchestrated outcome emerge from a description of the different roles in a process. This is an unfamiliar idea to many, because we are not taught emergent methods in college. But we need to be less afraid of emergence, because all of the things we rely on most in our lives actually work in this way (see my blog The Brain Horizon). When you don’t have to worry about how something works, you can focus on what it does. This is how we need to be thinking about complex, multi-scale processes.

Read more about these issues in my book: In Search of Certainty.
Watch the CFEngine for Docker automation demo.
The CFEngine example policy used is shown below.
It can also be downloaded here.

#################################
#
# Orchestration of containers
#
#################################

bundle common my
{
vars:
  "parent_address" string => "172.17.42.1";
  "name" string => "$(sys.uqhost)";
  "jobs" slist => { "container_1", "container_2", "container_3" };

classes:
  "containers" or => { @(jobs) };
}

#################################################################################

bundle agent main
{
methods:

  parent::
    "prepare continuous delivery source" comment => "The files where people make changes";
    "collect build products" comment => "Reap the harvest";
    "package for delivery" comment => "Tie it with a bow";

  containers::
    "fetch from the parent" comment => "pick up files and build something";

services:

  parent::
    "children" comment => "Spawn some fledglings";
}

###################################################################################
# Methods
###################################################################################

bundle agent method_prepare_continuous_delivery_source
{
files:
  # Make one task for each bud
  "/tmp/feeds/$(my.jobs)_feed.txt"
    create => "true",
    edit_line => append_if_no_line("This is for $(my.jobs) from $(sys.uqhost)");
}

###################################################################################

bundle agent method_collect_build_products
{
files:
  "/tmp/results/."
    create => "true";

  "/tmp/results/reply_from_$(my.jobs)"
    comment => "Pick up the latest on the pipeline",
    copy_from => secure_cp("/tmp/result_for_pickup.txt","$(sys.docker_guest_ip[$(my.jobs)])");
}

###################################################################################

bundle agent method_package_for_delivery
{
vars:
  "all_files" slist => { "/tmp/results/reply_from_container_1", "/tmp/results/reply_from_container_2" };

commands:
  "/bin/tar zcf /tmp/package.tgz /tmp/results"
    if => makerule("/tmp/package.tgz", "@(all_files)");
}

###################################################################################

bundle agent service_children(name,state)
{
guest_environments:

  parent.!cleanup::
    "$(my.jobs)"
      guest_details => stem_cell,
      guest_state => "create";

  cleanup::
    "$(my.jobs)"
      guest_details => stem_cell,
      guest_state => "delete";

reports:

  !cleanup::
    "CONTAINERS running";

  cleanup::
    "CONTAINERS stopped";
}

####################################################################################

body guest_details stem_cell
{
guest_type => "docker";
guest_image_name => "cf-stem-cell";
}

###################################################################################
# Containers
###################################################################################

bundle agent method_fetch_from_the_parent
{
services:
  "ssh" comment => "So we can log in and inspect the containers";
  "cfengine" comment => "Need the server running";

files:
  "/tmp/job_description.txt"
    comment => "Pick up the latest on the pipeline",
    copy_from => secure_cp("/tmp/feeds/$(my.name)_feed.txt","$(my.parent_address)");

  "/tmp/result_for_pickup.txt"
    create => "true",
    comment => "Do my part and deliver the result for pickup",
    edit_line => signature("/tmp/job_description.txt", "A postcard from sunny $(sys.ipv4) - Thanks, with love $(my.name)");
}

####################################################################################
# Misc - try to keep this all in one file
####################################################################################

body common control
{
bundlesequence => {"my", "main"};
}

##

bundle server access_control
{
access:
  "/tmp" admit => { "172.17.0.0/16" };
}

##

body server control
{
allowconnects => { "172.17.0.0/16" };
allowallconnects => { "172.17.0.0/16" };
trustkeysfrom => { "172.17.0.0/16" };
}

##

bundle agent service_ssh(a,b)
{
processes:
  "/usr/sbin/sshd" restart_class => "restart_ssh";

commands:
  "/usr/sbin/sshd" ifvarclass => "restart_ssh";
}

##

bundle agent service_cfengine(a,b)
{
processes:
  "cf-serverd" restart_class => "restart_cfengine";

commands:
  "/var/cfengine/bin/cf-serverd" ifvarclass => "restart_cfengine";
}

##

body copy_from secure_cp(from,server)
{
source => "$(from)";
servers => { "$(server)" };
compare => "digest";
encrypt => "true";
verify => "true";
trustkey => "true";
}

##

bundle edit_line append_if_no_line(str)
{
insert_lines:
  "$(str)" comment => "Append a line to the file if it doesn't already exist";
}

##

bundle edit_line signature(file,str)
{
insert_lines:
  "$(file)" insert_type => "file";
  "$(str)";
}

##

body contain in_dir(s)
{
chdir => "$(s)";
}
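The makerule guard in method_package_for_delivery applies make semantics to the packaging step: the tar command is only worth running when the package is missing or out of date relative to the collected results. A minimal Python sketch of that kind of check follows. The function name `needs_rebuild` is hypothetical, and treating a missing source as "do not build yet" is an assumption of this sketch rather than a statement about CFEngine's exact behaviour.

```python
import os


def needs_rebuild(target, sources):
    """Make-style staleness check based on file modification times.

    Hypothetical helper for illustration. Returns True when `target` is
    missing or older than any source. Assumption: if any source is missing,
    return False, since there is nothing complete to build from yet.
    """
    try:
        source_times = [os.path.getmtime(source) for source in sources]
    except OSError:
        return False  # a source does not exist (or cannot be read)
    if not os.path.exists(target):
        return True  # nothing built so far
    target_time = os.path.getmtime(target)
    return any(stamp > target_time for stamp in source_times)


if __name__ == "__main__":
    # Paths taken from the example policy; they need not exist on your machine.
    replies = [
        "/tmp/results/reply_from_container_1",
        "/tmp/results/reply_from_container_2",
    ]
    print(needs_rebuild("/tmp/package.tgz", replies))
```

Because the check looks only at the current state of the files, it can be evaluated on every agent run: when nothing has changed, nothing is rebuilt.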