Self-Repairing Deployment Pipelines
Or what we should mean by Distributed Orchestration

Orchestrating complicated distributed processes is an unfamiliar aspect of computing that leads to all kinds of confusion. We are not taught how to do it in college, so we end up trying to apply whatever methods we were taught, often in inappropriate ways. Promise theory paints a very simple picture of distributed orchestration. Rather than imagining that a central conductor (controller) somehow plays every instrument by remote magic wand, in an algorithmic fashion, promise theory says: let every player in the ensemble know their part, and leave them all to get on with it. The result is an emergent phenomenon. Their natural dependencies on one another will make them all play together.

Over-thinking the storyline of a distributed process is the easiest way to get into a pickle. This is a key point: how we tell the story of a process and how it gets executed are two different things. Modern programming languages sometimes pretend they are the same, and sometimes separate the two entirely.

Scroll back to 1992, and the world was having all the same problems as today, in different wrapping. Back then there was cron, pumping out scripts on hourly, daily and weekly schedules. This was used not only for running jobs on the system but for basic configuration health checks. Cron scripts were like a wild horde of cats, each with their own lives, hard to bring into any sense of order. In the early 90s the acme of orchestration was to sort out all your cron jobs to do the right thing at the right time. The scripts also had to differ from machine to machine, because the flavours of Unix were quite different – and thus there was distributed complexity. Before CFEngine, people would devise devious ways of creating one cronfile for each host and then pushing them out. This was considered orchestration in 1992. One of the first use cases for CFEngine was to replace all of this with a single, uniform, model-oriented language and interface. CFEngine was target-oriented, because it had to be repeatable: convergent. In this article I explain why virtual environments and containers are basically this issue all over again.

Another tool of that epoch was make, for building software from its dependencies. In 1994, Richard Stallman pointed out to me that CFEngine was very like make, and indeed this ended up influencing the syntax of the language. The Makefile was different: it was the opposite of a script. Instead of starting in a known state and pushing out a sequence of transitions from there, it focused on the end state and asked: how can I get to that desired end state? In math parlance, it was a change of boundary condition. This was an incredibly important idea, because it meant that – no matter what kind of a mess you were in – you would end up with the right outcome. That is far more important than knowing where you started from. Makefiles did not offer much in the way of abstraction; you could substitute variables and make simple patterns, but this was sufficient for most tasks, because patterns are one of the most important mechanisms for dealing with complexity. Likewise, make was a serial processor running on a single machine, not really suited to today's distributed execution requirements. Its main concession to parallelism was the "-j" option for building independent dependencies in parallel. What was really needed was a model-based approach that could answer the questions: what, when, where, how and why.
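To see the difference in miniature, here is a small sketch in Python (the file name and contents are only placeholders, not taken from any real system) contrasting a scripted transition, which assumes a known starting state, with a convergent operation that describes only the end state and repairs towards it:

    DESIRED_PATH = "/tmp/app.conf"         # hypothetical target file
    DESIRED_CONTENT = "port = 8080\n"      # hypothetical desired end state

    def scripted_step():
        # Script style: assume a known starting state and fire off a transition.
        # If the file already exists, or anything else has drifted, the story breaks.
        with open(DESIRED_PATH, "x") as f:     # "x" fails if the file already exists
            f.write(DESIRED_CONTENT)

    def convergent_step():
        # Make/CFEngine style: describe the end state and repair towards it.
        # Safe to run from any starting state, and a no-op once converged.
        current = None
        try:
            with open(DESIRED_PATH) as f:
                current = f.read()
        except FileNotFoundError:
            pass
        if current != DESIRED_CONTENT:
            with open(DESIRED_PATH, "w") as f:
                f.write(DESIRED_CONTENT)

    if __name__ == "__main__":
        convergent_step()      # running it a second time changes nothing

Run from any starting state, the convergent version ends in the same place, and running it again changes nothing: the boundary condition is the outcome, not the starting point.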
So now we come to the world of today, where software is no longer shackled to a workstation or a server, but is potentially a small cog in a large system. More than that, it is a platform for commerce in the modern world. It's not just developers and IT folks who care about having stuff built – it's everyone who uses a service.

Many of the problems we are looking to solve can be couched in the model of a deployment of some kind. Whether it is in-house software ("devops"), purchased off-the-shelf software (say, "desktop"), or even batch jobs in HPC clusters, all of these typically pass through a test phase before being deployed onto some infrastructure container, such as a server, a process group, or even an embedded device. Alas, the technologies we've invented for this are still very primitive.

If we look back at the history of logic, it grew out of the need to hit objects with projectiles in warfare. Ballistics was the cultural origin of mathematics and logic in the days of Newton and Boole. Even today, we basically still try to catapult data and instructions into remote hosts using remote copies and shells. A script is like a catapult that launches us from one decision to the next in a chain of triggered logic. Another name for this triggered branching process is a chain reaction (an explosion). A Makefile is the opposite: a convergent process, like something sliding easily down a drain.

The branching logic in a script leads to multitudes of parallel alternative worlds. When we branch in git or other version control systems, we add to this complexity. In a convergent process, we are integrating possible worlds into a consistent outcome. This is the enabler for continuous delivery. Developers might feel as though they have their triggered deployments under control, but do they really?

No matter, we can go from this… to this…

[Figures contrasting manual labour, assisted manual labour, and a properly redesigned, automated process.]

This picture illustrates for me the meaning of true automation. No one has to push a button to get a response. The response is proactive and distributed into the very fabric of the design – not bolted on as an afterthought. The picture contrasts how we go from manual labour, to assisted manual labour, to a proper redesign of the process. Automation that still needs humans to operate it is not automation; it is a crane or a power-suit.

CFEngine's model of promises is able to answer all of the questions – what, when, where, how and why – at a basic level, and it has been carefully designed to have the kind of desired-end-state, self-healing properties of a drain. Every CFEngine promise is a controlled implosion that leaves behind a desired end-state.

Today, configuration promises have to be supported across many different scales: from the smallest containers, like a user identity, to processes, process groups, virtual and physical machines, local networks, organizational namespaces, and even globally spanning administrative domains. How do we do that? The simple answer is that we always do it "from within" – through autonomous agents that collaborate and take responsibility for keeping desired-end-state promises at every level.

Traditionally, we think of managing boxes: server boxes, rack boxes, routing boxes, and so on. We can certainly put an agent inside every one of those processing entities… But we also need to be able to address abstract containers, labelled by the properties we use in our models of intent – business purpose. These are things like: linux, smartos, webservers, storage devices, and so on. They describe the functional roles in a story about the business purpose of our system.
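As a rough sketch of what keeping promises "from within" might look like – the class names, roles, paths and detection logic here are hypothetical illustrations, not CFEngine's actual implementation – each agent classifies its own context and then convergently keeps only the promises whose labels apply to it:

    import platform
    import socket

    def ensure_file(path, content):
        # A convergent promise: act only if the desired end state is not already kept.
        try:
            with open(path) as f:
                if f.read() == content:
                    return "kept"
        except FileNotFoundError:
            pass
        with open(path, "w") as f:
            f.write(content)
        return "repaired"

    def discover_classes():
        # The agent classifies its own context "from within": operating system,
        # plus an abstract functional role guessed from the hostname (hypothetical logic).
        classes = {platform.system().lower()}          # e.g. "linux"
        hostname = socket.gethostname()
        if hostname.startswith("web"):
            classes.add("webservers")
        if hostname.startswith("store"):
            classes.add("storage")
        return classes

    # Promises labelled by the abstract containers (functional roles) they apply to.
    PROMISES = [
        {"classes": {"linux"},      "keep": lambda: ensure_file("/tmp/motd", "welcome\n")},
        {"classes": {"webservers"}, "keep": lambda: ensure_file("/tmp/web.conf", "port 8080\n")},
    ]

    def run_agent():
        my_classes = discover_classes()
        for promise in PROMISES:
            if promise["classes"] <= my_classes:       # does this promise apply here?
                print(promise["keep"]())

    if __name__ == "__main__":
        run_agent()

Nothing pushes a configuration at the host from outside; the agent inside each container decides which promises its context makes applicable and keeps them, repeatedly and convergently, without anyone pressing a button.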
This brings up an important issue: how we tell stories. Despite what we are taught in software engineering, there is not only one version of reality when it comes to computer systems. There is the story: