Or what we should mean by Distributed Orchestration
Orchestrating complicated distributed processes is an unfamiliar aspect
of computing that leads to all kinds of confusion. We are not taught
how to do it in college, so we end up applying whatever methods we were
taught, often in inappropriate ways. Promise theory paints a very simple
picture of distributed orchestration. Rather than imagining that a
central conductor (controller) somehow plays every instrument by remote
magic wand, in an algorithmic fashion, promise theory says: let every
player in an ensemble know their part, and leave them all to get on with
it. The result is an emergent phenomenon. Their natural dependencies on
one another will make them all play together. Over-thinking the storyline in
a distributed process is the easiest way to get into a pickle. This is a
key point: how we tell the story of a process and how it gets executed
are two different things. Modern programming languages sometimes pretend
they are the same, and sometimes separate the two entirely.
Scroll back to 1992, and the world was having all the same problems
as today, in different wrapping. Back then there was cron, pumping out
scripts on hourly, daily and weekly schedules. This was used not only
for running jobs on the system but for basic configuration health
checks. Cron scripts were like a wild horde of cats, each with their own
lives, hard to bring into some sense of order. In the early 90s the acme
of orchestration was to sort out all your cron jobs to do the right
thing at the right time. The scripts had to be different on the machines
too because the flavours of Unix were quite different – and thus there
was distributed complexity. Before CFEngine people would devise devious
ways of creating one cronfile for each host and then pushing them out.
This was considered to be orchestration in 1992.
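To make the contrast concrete, here is a minimal sketch of the kind of per-host crontab people pushed around (the script names are hypothetical, not from any actual setup of the time):

# Hourly, daily and weekly "configuration health" jobs,
# duplicated and tweaked for every host and Unix flavour.
# Fields: minute hour day-of-month month day-of-week command
0 * * * *   /usr/local/sbin/check_disks.sh
30 2 * * *  /usr/local/sbin/clean_tmp.sh
0 4 * * 0   /usr/local/sbin/audit_passwd.sh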
One of the first use cases for CFEngine was to replace all of this with
a single, uniform, model-oriented language and interface. CFEngine was
target-oriented, because it had to be repeatable: convergence. In this article
I explain why virtual environments and containers are basically this
issue all over again.
Another tool of this epoch was make, for building software from
dependencies. In 1994, Richard Stallman pointed out to me that CFEngine
was very like make. Indeed, this ended up influencing the syntax of the
language.
The Makefile was different: it was the opposite of a script. Instead of
starting in a known state and pushing out a sequence of transitions from
there, it focused on the end state and asked: how can I get to that
desired end state? In mathematical parlance, it was a change of boundary
condition. This was an incredibly important idea, because it meant that
– no matter what kind of a mess you were in – you would end up with
the right outcome. This is far more important than knowing where you
started from.
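The idea is easy to see in make itself. As a minimal sketch (the file names are illustrative, not from this article), a Makefile declares the desired end state and its dependencies, and make works out which transitions are needed from whatever state you are actually in:

# We declare what the end state depends on, not a fixed
# sequence of commands from a known starting point.
report.pdf: report.tex figures/plot.eps
	pdflatex report.tex

figures/plot.eps: data.csv plot.gp
	gnuplot plot.gp

# Whatever the current mess, "make report.pdf" rebuilds only
# what is missing or out of date, converging on the target.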
Makefiles did not offer much in the way of abstraction; you could
substitute variables and make simple patterns, but this was sufficient
for most tasks, because patterns are one of the most important
mechanisms for dealing with complexity. Moreover, make was a serial
processor running on a single machine, not really suited to today's
distributed execution requirements. The main concession to parallelism
was the addition of “-j” to parallelize building of dependencies.
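For example, a single pattern rule (illustrative, not from the article) abstracts over every compilation of the same kind, and “-j” exploits the independence of the branches:

# One rule covers all .c -> .o builds; $< is the source, $@ the target.
CC = cc
%.o: %.c
	$(CC) -c $< -o $@

prog: main.o util.o
	$(CC) main.o util.o -o prog

# "make -j4 prog" builds independent prerequisites in parallel,
# but still only on this single machine.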
What was really needed was a model-based approach where we could provide
answers to the following questions: what, when, where, how and why.
So now we come to the world of today where software is no longer
shackled to a workstation or a server, but potentially a small cog in a
large system. And more than that - it is a platform for commerce in the
modern world. It’s not just developers and IT folks who care about
having stuff built - it’s everyone who uses a service.
Many of the problems we are looking to solve can be couched in the model
of a deployment of some kind. Whether it is in-house software
(“devops”), purchased off-the-shelf software (say “desktop”) or even
batch jobs in HPC clusters, all of these typically pass through a test
phase before being deployed onto some infrastructure container, such as
a server, process group, or even embedded device.
Alas, the technologies we’ve invented are still very primitive. If we
look back to the history of logic, it grew out of the need to hit
objects with projectiles in warfare. Ballistics was the cultural origin
of mathematics and logic in the days of Newton and Boole. Even today, we
basically still try to catapult data and instructions into remote hosts
using remote copies and shells.
So a script is like a catapult, launching us from one decision to the
next in a scripted logic. Another name for this triggered branching
process is a chain reaction (an explosion). A Makefile is the opposite:
a convergent process like something sliding easily down a drain. The
branching logic in a script leads to multitudes of parallel alternative
worlds. When we branch in git or version control systems we add to this
complexity. In a convergent process we are integrating possible worlds
into a consistent outcome. This is the enabler for continuous
delivery.
So developers might feel as though they have their triggered deployments
under control, but are they really? No matter, we can go from this…
To this … This picture illustrates for me the meaning of true
automation. No one has to push a button to get a response. The response
is proactive and distributed into the very fabric of the design – not
like an add-on. The picture contrasts how we go from manual labour to
assisted manual labour, to a proper redesign of process. Automation that
still needs humans to operate it is not automation; it is a crane or a
power-suit.
CFEngine’s model of promises is able to answer all of the questions
what, when, where, how and why, at a basic level and has been carefully
designed to have the kind of desired-end-state self-healing properties
of a drain. Every CFEngine promise is a controlled implosion that leaves
a desired end-state.
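To make that concrete, here is a minimal sketch of a convergent promise (the path and mode are illustrative, not taken from the demo policy below). However the file starts out, repeated runs of the agent end in, and then maintain, the same state:

body common control
{
 bundlesequence => { "example_convergence" };
}

body perms p644
{
 mode => "644";
}

bundle agent example_convergence
{
files:

  # Desired end state: the file exists with mode 644.
  # The agent repairs only the difference between actual and
  # desired state, so re-running it changes nothing further.
  "/etc/motd"
      create => "true",
      perms => p644;
}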
Today, configuration promises have to be supported across many different
scales, from the smallest containers like a user identity, to processes,
process groups, virtual and physical machines, local networks,
organizational namespaces and even globally spanning administrative
domains. How do we do that? The simple answer is that we always do it
“from within” – through autonomous agents that collaborate and take
responsibility for keeping desired-end-state promises at all levels.
Traditionally, we think of management of boxes: server boxes, rack
boxes, routing boxes, etc. We can certainly put an agent inside every
one of those processing entities…
But we also need to be able to address abstract containers, labelled by
the properties we use in our models of intent – business purpose. These
are things like: linux, smartos, webservers, storage devices, and so on.
They describe the functional roles in a story about the business purpose
of our system.
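In CFEngine these labels are just classes, so a promise can be scoped to a functional role rather than to a named box. A small sketch (the class name “webservers” and the commands are my illustrative assumptions), in the same idiom as the policy at the end of this article:

bundle agent roles
{
processes:

 webservers.linux::

  # Any host holding the "webservers" role keeps this promise;
  # the role can be defined before the hosts even exist.
  "apache2"
      restart_class => "restart_apache";

commands:

 webservers.linux::

  "/etc/init.d/apache2 restart"
      ifvarclass => "restart_apache";
}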
This brings up an important issue: how we tell stories. Despite what we
are taught in software engineering, there is not only one version of
reality when it comes to computer systems. There is the story:
- Told by the designer
- Told by the programmer
- Told by the debugger
- Told by the user…
- etc.
When building software, it is all “if this then that, else while do…”. That highly serialized (step-by-step) story lives only in the programmer’s mind. It is an artificial view of reality designed to perpetuate a view of determinism. It is not what the system does. Often, when something goes wrong with a system, we start telling simplified stories that focus on “root causes”, which is again an attempt to condense the indeterminism of a system down to a simple deterministic fiction – a story we can live with.

So there is a serial story of transitions that we go through when building systems, represented by commands and hands-on interaction. Then there is a story about how tasks are delegated to different parts of the system. What is important to understand is that we should not tie the story about what a system does too closely to how it does it. That leads to inefficiencies and even problems. The goal of an orchestration tool is to paint a suitable story around purpose that makes a good compromise on these issues. Then we can leave the execution of the story to the individual characters in the play.

What CFEngine introduced was a kind of “make” for desired system state, with types of target promises – not just files to be built or installed, but processes to run, services to maintain, and so on. It was a make not only for the binary data, but also for the runtime state. Moreover, it could span any number of machines and use container abstractions to assign each part of the music to its player. Embedded in this orchestration framework is still good old “make” and its rule-based evaluation. But it is a make with storytelling attributes, so that we can understand the purpose of the system independently of the algorithm used to build and maintain it. In addition, it has built-in tools tailored to the special transformations needed to build and maintain the full software stack – tools that had to be borrowed from external sources in make.

Now that we have a more powerful make that can handle static and runtime state across any kind of physical or abstract region, we turn it into a distributed service by making the agent on each separate box aware of all the context it needs to figure out its role in the whole system. Then, as long as we have defined the desired outcome as simple target “buckets”, convergence and orchestration of that state can happen entirely autonomously, without any human intervention. Host containers can even spawn guest containers to keep specialized environments separate. This doesn’t necessarily help execution, but it simplifies the telling of the story for the human shepherds.

So: how do we orchestrate a distributed process of assembling some software in an environment like this? It’s simple: give everyone their job description and let them get on with it. The trick is to use policy as the musical score for the orchestration. Policy assigns roles and appropriate guidance to each potential player in an ensemble. It can even assign roles to abstract groups that don’t exist yet! The idea is very flexible.

In this example, we coordinate some Docker containers (children) inside a parent host and enable them to work together, sharing necessary information, but without trying to push them into a sequence of steps like a programmer. A schematic of the arrangement:

Need to Insert Schematic here

The parent container promises to make specialized children, each with a particular task.
It makes its data available to them in their “in-boxes”, and agrees to collect any results from their “out-boxes”. Using make semantics, it can check whether any changes have been made that need to be picked up and integrated into a final result.

There is a CFEngine agent in each of these containers. Each agent knows the part it has to play, and the result now emerges from the design, without any need to drive the story step by step from some central controller.

In the example, I used Docker containers. For interested Docker fans, the Dockerfile is the analogue of the cron file I showed above. The principle is the same: replace all of the burden of handling variations with a simple, model-based automation, designed to handle abstractions. All of the modelling is in CFEngine policy, so that the story is told at an appropriate level of abstraction, rather than confusing a bricklayer’s version with an architect’s version.

The advantage of this approach is that it can work even for partially connected devices, like the mobile phones and pads that are increasingly a vital part of the software chain. Because there is no need for a micro-managed, sequential bricklaying approach to ensuring process completion, the emergent effect can happen independently inside every container, without different containers preventing one another from making progress by having to wait.

The ability to run on networking devices and incorporate them into the story is now a critical part of the story-telling. The abstractions we are building push all of the complexity down onto networking engineers, who have not yet benefitted from the abstractions that Open Source innovation used to transform server management. Now that is possible, and they too become part of the single orchestrated storyline.

The goal of distributed management as story-telling is to make comprehensible system documentation directly executable by distributed agents, so that there is only one version of events: the one that describes our intent, rather than an explosion of changes set off by a push-button trigger.

The idea of story-telling as a crucial part of human-machine design is part of a larger narrative. It starts with humans solving problems by trial and error. Then we learn better approaches, practise and rehearse. Once we have it down to a mechanical repetition, we are basically acting like machines, so why not get a machine to do it? Our first attempts at mechanization are machines that imitate how humans work. Finally, we separate the human activity from a true form of automation, just like the storm-drain design in the illustration above.

I wanted to show how to make an orchestrated outcome emerge from a description of the different roles in a process. This is an unfamiliar idea to many, because we are not taught emergent methods in college. But we need to be less afraid of emergence, because all of the things we rely on most in our lives actually work this way (see my blog The Brain Horizon). When you don’t have to worry about how something works, you can focus on what it does. This is how we need to be thinking about complex, multi-scale processes.

Read more about these issues in my book: In Search of Certainty

Watch the CFEngine for Docker automation demo

The CFEngine example policy used is shown below.
It can also be downloaded here.

#################################
#
# Orchestration of containers
#
#################################

bundle common my
{
vars:

  "parent_address" string => "172.17.42.1";
  "name" string => "$(sys.uqhost)";
  "jobs" slist => { "container_1", "container_2", "container_3" };

classes:

  "containers" or => { @(jobs) };
}

#################################################################################

bundle agent main
{
methods:

 parent::

  "prepare continuous delivery source"
      comment => "The files where people make changes";

  "collect build products"
      comment => "Reap the harvest";

  "package for delivery"
      comment => "Tie it with a bow";

 containers::

  "fetch from the parent"
      comment => "pick up files and build something";

services:

 parent::

  "children"
      comment => "Spawn some fledglings";
}

###################################################################################
# Methods
###################################################################################

bundle agent method_prepare_continuous_delivery_source
{
files:

  # Make one task for each bud

  "/tmp/feeds/$(my.jobs)_feed.txt"
      create => "true",
      edit_line => append_if_no_line("This is for $(my.jobs) from $(sys.uqhost)");
}

###################################################################################

bundle agent method_collect_build_products
{
files:

  "/tmp/results/."
      create => "true";

  "/tmp/results/reply_from_$(my.jobs)"
      comment => "Pick up the latest on the pipeline",
      copy_from => secure_cp("/tmp/result_for_pickup.txt", "$(sys.docker_guest_ip[$(my.jobs)])");
}

###################################################################################

bundle agent method_package_for_delivery
{
vars:

  "all_files" slist => { "/tmp/results/reply_from_container_1",
                         "/tmp/results/reply_from_container_2" };

commands:

  "/bin/tar zcf /tmp/package.tgz /tmp/results"
      if => makerule("/tmp/package.tgz", "@(all_files)");
}

###################################################################################

bundle agent service_children(name,state)
{
guest_environments:

 parent.!cleanup::

  "$(my.jobs)"
      guest_details => stem_cell,
      guest_state => "create";

 cleanup::

  "$(my.jobs)"
      guest_details => stem_cell,
      guest_state => "delete";

reports:

 !cleanup::
  "CONTAINERS running";

 cleanup::
  "CONTAINERS stopped";
}

####################################################################################

body guest_details stem_cell
{
 guest_type => "docker";
 guest_image_name => "cf-stem-cell";
}

###################################################################################
# Containers
###################################################################################

bundle agent method_fetch_from_the_parent
{
services:

  "ssh"
      comment => "So we can log in and inspect the containers";

  "cfengine"
      comment => "Need the server running";

files:

  "/tmp/job_description.txt"
      comment => "Pick up the latest on the pipeline",
      copy_from => secure_cp("/tmp/feeds/$(my.name)_feed.txt", "$(my.parent_address)");

  "/tmp/result_for_pickup.txt"
      create => "true",
      comment => "Do my part and deliver the result for pickup",
      edit_line => signature("/tmp/job_description.txt",
                             "A postcard from sunny $(sys.ipv4) - Thanks, with love $(my.name)");
}

####################################################################################
# Misc - try to keep this all in one file
####################################################################################

body common control
{
 bundlesequence => { "my", "main" };
}

##

bundle server access_control
{
access:

  "/tmp"
      admit => { "172.17.0.0/16" };
}

##

body server control
{
 allowconnects => { "172.17.0.0/16" };
 allowallconnects => { "172.17.0.0/16" };
 trustkeysfrom => { "172.17.0.0/16" };
}

##

bundle agent service_ssh(a,b)
{
processes:

  "/usr/sbin/sshd"
      restart_class => "restart_ssh";

commands:

  "/usr/sbin/sshd"
      ifvarclass => "restart_ssh";
}

##

bundle agent service_cfengine(a,b)
{
processes:

  "cf-serverd"
      restart_class => "restart_cfengine";

commands:

  "/var/cfengine/bin/cf-serverd"
      ifvarclass => "restart_cfengine";
}

##

body copy_from secure_cp(from,server)
{
 source => "$(from)";
 servers => { "$(server)" };
 compare => "digest";
 encrypt => "true";
 verify => "true";
 trustkey => "true";
}

##

bundle edit_line append_if_no_line(str)
{
insert_lines:

  "$(str)"
      comment => "Append a line to the file if it doesn't already exist";
}

##

bundle edit_line signature(file,str)
{
insert_lines:

  "$(file)"
      insert_type => "file";

  "$(str)";
}

##

body contain in_dir(s)
{
 chdir => "$(s)";
}
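A note on running it (my assumption; the post does not show the invocation, and the file name is hypothetical): the “parent” and “cleanup” classes are never defined inside the policy itself, so they would be set on the command line when invoking the agent, for example:

cf-agent -f ./orchestration.cf -D parent     # on the parent host: build feeds, spawn containers
cf-agent -f ./orchestration.cf -D cleanup    # tear the guest containers down again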