Or what we should mean by Distributed Orchestration
Orchestrating complicated distributed processes is an unfamiliar aspect
of computing that leads to all kinds of confusion. We are not taught
how to do it in college, so we end up trying to apply whatever methods we
were taught, often in inappropriate ways. Promise theory paints a very simple
picture of distributed orchestration. Rather than imagining that a
central conductor (controller) somehow plays every instrument by remote
magic wand, in an algorithmic fashion, promise theory says: let every
player in an ensemble know their part, and leave them all to get on with
it. The result is an emergent phenomenon. The natural dependencies on one
another will make them all play together. Over-thinking the storyline in
a distributed process is the easiest way to get into a pickle. This is a
key point: how we tell the story of a process and how it gets executed
are two different things. Modern programming languages sometimes pretend
they are the same, and sometimes separate the two entirely.
Scroll back to 1992, and the world was having all the same problems
as today, in different wrapping. Back then there was cron, pumping out
scripts on hourly, daily and weekly schedules. This was used not only
for running jobs on the system but for basic configuration health
checks. Cron scripts were like a wild horde of cats, each with their own
lives, hard to bring into some sense of order. In the early 90s the acme
of orchestration was to sort out all your cron jobs to do the right
thing at the right time. The scripts had to be different on each machine
too, because the flavours of Unix were quite different – and thus there
was distributed complexity. Before CFEngine, people would devise devious
ways of creating one cron file for each host and then pushing them out.
This was considered to be orchestration in 1992.
One of the first use cases for CFEngine was to replace all of this with
a single uniform model-oriented language/interface. CFEngine was
target-oriented, because it had to be repeatable: convergence. In this article
I explain why virtual environments and containers are basically this
issue all over again.
Another tool of this epoch was make, for building software from
dependencies. In 1994, Richard Stallman pointed out to me that CFEngine
was very like make. Indeed, this ended up influencing the syntax of the
language.
The Makefile was different: it was the opposite of a script. Instead of
starting in a known state and pushing out a sequence of transitions from
there, it focused on the end state and asked: how can I get to that
desired end state? In math parlance, it was a change of boundary
condition. This was an incredibly important idea, because it meant that
– no matter what kind of a mess you were in – you would end up with
the right outcome. This is far more important than knowing where you
started from.
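The contrast with a script can be sketched in a few lines of Python (a toy model of my own, not how make or CFEngine is implemented): a convergent operation is written against the desired end state, so running it from any starting point, or running it twice, lands in the same place.

```python
# A minimal sketch (my own illustration, not CFEngine itself) of the
# desired-end-state idea: the operation declares the outcome rather than
# the steps, so it converges to the same result from any starting state.

def converge(state, desired):
    """Repair only what differs from the desired end state; idempotent."""
    repairs = {}
    for key, want in desired.items():
        if state.get(key) != want:
            state[key] = want       # fix the divergence
            repairs[key] = want     # record what was repaired
    return repairs

# Whatever mess we start in, the end state is the same:
desired = {"package": "installed", "service": "running"}
messy = {"package": "removed", "service": "stopped"}

converge(messy, desired)                 # first run repairs both items
assert converge(messy, desired) == {}    # second run finds nothing to do
```

A script, by contrast, encodes the transitions from one assumed starting state, so it only works if that assumption holds.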
Makefiles did not offer much in the way of abstraction; you could
substitute variables and make simple patterns, but this was sufficient
for most tasks, because patterns are one of the most important
mechanisms for dealing with complexity. Likewise, make was a serial
processor running on a single machine, not really suitable for today’s
distributed execution requirements. The main concession to parallelism
was the addition of the “-j” option to parallelize the building of dependencies.
What was really needed was a model-based approach that could answer
the following questions: what, when, where, how and why.
So now we come to the world of today where software is no longer
shackled to a workstation or a server, but potentially a small cog in a
large system. And more than that - it is a platform for commerce in the
modern world. It’s not just developers and IT folks who care about
having stuff built - it’s everyone who uses a service.
Many of the problems we are looking to solve can be couched in the model
of a deployment of some kind. Whether it is in-house software
(“devops”), purchased off-the-shelf software (say “desktop”) or even
batch jobs in HPC clusters, all of these typically pass through a test
phase before being deployed onto some infrastructure container, such as
a server, process group, or even embedded device.
Alas, the technologies we’ve invented are still very primitive. If we
look back to the history of logic, it grew out of the need to hit
objects with projectiles in warfare. Ballistics was the cultural origin
of mathematics and logic in the days of Newton and Boole. Even today, we
basically still try to catapult data and instructions into remote hosts
using remote copies and shells.
A script, then, is like a catapult: it takes us from one decision to
the next in a scripted logic. Another name for this triggered branching
process is a chain reaction (an explosion). A Makefile is the opposite:
a convergent process like something sliding easily down a drain. The
branching logic in a script leads to multitudes of parallel alternative
worlds. When we branch in git or version control systems we add to this
complexity. In a convergent process we are integrating possible worlds
into a consistent outcome. This is the enabler for continuous
delivery.
So developers might feel as though they have their triggered deployments
under control, but are they really? No matter, we can go from this…
To this … This picture illustrates for me the meaning of true
automation. No one has to push a button to get a response. The response
is proactive and distributed into the very fabric of the design – not
like an add-on. The picture contrasts how we go from manual labour to
assisted manual labour, to a proper redesign of process. Automation that
still needs humans to operate it is not automation, it is a crane or a
power-suit.
CFEngine’s model of promises is able to answer all of the questions
what, when, where, how and why, at a basic level and has been carefully
designed to have the kind of desired-end-state self-healing properties
of a drain. Every CFEngine promise is a controlled implosion that leaves
a desired end-state.
Today, configuration promises have to be supported across many different
scales, from the smallest containers like a user identity, to processes,
process groups, virtual and physical machines, local networks,
organizational namespaces and even globally spanning administrative
domains. How do we do that? The simple answer is that we always do it
“from within” – through autonomous agents that collaborate and take
responsibility for keeping desired-end-state promises at all levels.
Traditionally, we think of management of boxes: server boxes, rack
boxes, routing boxes, etc. We can certainly put an agent inside every
one of those processing entities…
But we also need to be able to address abstract containers, labelled by
the properties we use in our models of intent – business purpose. These
are things like: linux, smartos, webservers, storage devices, and so on.
They describe the functional roles in a story about the business purpose
of our system.
This brings up an important issue: how we tell stories. Despite what we
are taught in software engineering, there is not only one version of
reality when it comes to computer systems. There is the story:
- Told by the designer
- Told by the programmer
- Told by the debugger
- Told by the user…
- etc.
When building software, it’s all “if this then that, else while do…”
That highly serialized (step by step) story only lives in the
programmer’s mind. It is an artificial view of reality designed to
perpetuate a view of determinism. It is not what the system does. Often
when something goes wrong with a system, we start telling simplified
stories that focus on “root causes”, which is again trying to condense
the indeterminism of a system down to a simple deterministic fiction –
a story we can live with. So there is a serial story of transitions we
go through when building systems that is represented by commands and
hands-on interaction. Then there is a story about how tasks are
delegated to different parts of the system. What is important to
understand is that we should not try to tie the story about what a
system does too closely to how it does it. This leads to inefficiencies
and even problems. The goal of an orchestration tool is to try to paint
a suitable story around purpose that makes a good compromise on these
issues. Then we can leave the execution of the story to the individual
characters in the play.
What CFEngine introduced was a kind of “make” for desired system state,
which had types of target promises - not just files to be built or
installed, but processes to run, services to maintain, and so on. It was
a make not only for the binary data, but also for the runtime state.
Moreover, it could span any number of machines and use the container
abstractions to assign each part of the music to its player.
Embedded in this orchestration framework is still good old “make” and
its rule based evaluation.
But it is a make that has storytelling attributes so that we can
understand the purpose of the system independently of the algorithm used
to build and maintain it.
In addition, it has built-in tools that are tailored to the special
transformations needed to build and maintain the full software
stack. In make, these tools had to be borrowed from other sources.
Now that we have a more powerful make that can handle static and runtime
state across any kind of physical or abstract region, we turn it into a
distributed service by making the agent on each separate box aware of
all the context it needs to figure out its role in the whole system.
Then, as long as we have defined the desired outcome as simple target
“buckets”, convergence and orchestration of that state can happen
entirely autonomously: without any human intervention.
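A toy Python sketch of what “from within” means (the class names, facts and policy table here are hypothetical illustrations, not CFEngine’s API): each agent classifies its own local context and takes the union of the promises addressed to its classes, with no controller handing out steps.

```python
# A hypothetical sketch of how an agent works out its own role "from
# within": classify the local context, then select every promise bundle
# addressed to a class this agent belongs to.

def classify(facts):
    """Derive membership classes from locally observable facts."""
    classes = {facts["os"]}
    if facts.get("hostname", "").startswith("web"):
        classes.add("webservers")
    if facts.get("docker_guest"):
        classes.add("containers")
    return classes

# Policy maps abstract classes to jobs; no host is named individually.
POLICY = {
    "webservers": ["deploy site", "monitor httpd"],
    "containers": ["fetch from the parent"],
    "linux":      ["keep ssh running"],
}

def my_jobs(facts):
    """Union of all promise bundles whose class this agent belongs to."""
    jobs = []
    for cls in sorted(classify(facts)):
        jobs += POLICY.get(cls, [])
    return jobs

assert my_jobs({"os": "linux", "hostname": "web1"}) == \
       ["keep ssh running", "deploy site", "monitor httpd"]
```

The same policy file can be shipped to every machine; each agent evaluates only the parts that apply to it, which is why no central sequencer is needed.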
Host containers can even spawn guest containers to keep specialized
environments separate. This doesn’t necessarily help execution, but it
simplifies the telling of the story for human shepherds. So: how do we
orchestrate a distributed process of assembling some software in an
environment like this? It’s simple: give everyone their job description
and let them get on with it.
The trick is to use policy as the musical score for the orchestration.
Policy assigns roles and appropriate guidance to each potential player
in an ensemble. It can even assign roles to abstract groups that don’t
exist yet! The idea is very flexible. In this example, we coordinate
some docker containers (children) inside a parent host and enable them
to work together, sharing necessary information, but not trying to push
them into a sequence of steps like a programmer. A schematic of the
arrangement:
The parent container promises to make specialized children each with a
particular task. It makes its data available to them in their
“in-boxes”, and agrees to collect any results from their
“out-boxes”. Using make semantics, it can check to see whether any
changes have been made that need to be picked up and integrated into a
final result.
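The “make semantics” of that pickup step can be sketched in Python (my own illustration, not the CFEngine implementation): the parent repackages its result only when some collected out-box file has changed since the last build, using modification times just as make does.

```python
# A small sketch of make-style change detection: rebuild the target only
# if it is missing, or if any of its source files is newer than it.

import os

def needs_rebuild(target, sources):
    """True if the target is missing, or any source is newer than it."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

Running such a check before the tar command reproduces the effect of the policy’s makerule guard below: work happens only when a dependency has drifted ahead of the product.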
There is a CFEngine agent in each of these containers. Each of the
agents knows the part it has to play, and the result will now emerge
from the design, without having to drive the story step by step from
some central controller. In the example, I used docker containers. For
interested docker fans, this is the entire Dockerfile. It is the
analogue of the cron file I showed above. The principle is the same:
replace all of the burden of handling variations with a simple model-based
automation, designed to handle abstractions. All of the modelling is in
CFEngine policy, so that the story is told at an appropriate level of
abstraction, rather than confusing a bricklayer’s version with an
architect’s version.
The advantage of using this approach is that it can work even for
partially connected devices like mobile phones and pads, which are
increasingly a vital part of the software chain. Because there is no
need for a micro-managed, sequential bricklaying approach to ensuring
process completion, the emergent effect can happen independently inside
every container, without different containers blocking one another’s
progress by having to wait.
The ability to run on networking devices and incorporate them into the
story is now a critical part of the story-telling. The abstractions that
we are building are pushing all of the complexity down onto networking
engineers. They have not yet benefitted from the abstractions that Open
Source innovation used to transform server management. Now that is
possible, and they become part of the single orchestrated storyline.
The goal of distributed management as story-telling is to make
comprehensible system documentation directly executable by distributed
agents so that there is only one version of events: the one that
describes our intent, rather than an explosion of changes set off by
the push of a button.
The idea of story telling as a crucial part of human-machine design is
part of a larger narrative. It starts with humans solving problems by
trial and error. Then we learn better approaches, practice and rehearse.
Once we have it down to a mechanical repetition, we are basically acting
like machines, so why not get a machine to do it? Our first attempts at
mechanizing are to make machines that imitate how humans work. Finally
we separate the human’s activity from a true form of automation, just
like the storm drain design in the illustration above.
I wanted to show how to make an orchestrated outcome emerge from a
description of the different roles in a process. This is an unfamiliar
idea to many, because we are not taught emergent methods in college. But
we need to be less afraid of emergence, because all of the things we rely
on most in our lives actually work in this way (see my blog The Brain
Horizon). When you don’t have to worry about how something works, you
can focus on what it does. This is how we need to be thinking about
complex, multi-scale processes. Read more about these issues in my book
In Search of Certainty, and watch the CFEngine for Docker automation demo.
The CFEngine example policy used is shown below. It can also be
downloaded here.
#################################
#
# Orchestration of containers
#
#################################

bundle common my
{
vars:

  "parent_address" string => "172.17.42.1";
  "name"           string => "$(sys.uqhost)";

  "jobs" slist => {
                  "container_1",
                  "container_2",
                  "container_3"
                  };

classes:

  "containers" or => { @(jobs) };
}

#################################################################################

bundle agent main
{
methods:

 parent::

  "prepare continuous delivery source" comment => "The files where people make changes";
  "collect build products"             comment => "Reap the harvest";
  "package for delivery"               comment => "Tie it with a bow";

 containers::

  "fetch from the parent" comment => "pick up files and build something";

services:

 parent::

  "children" comment => "Spawn some fledgings";
}

###################################################################################
# Methods
###################################################################################

bundle agent method_prepare_continuous_delivery_source
{
files:

  # Make one task for each bud

  "/tmp/feeds/$(my.jobs)_feed.txt"
      create => "true",
      edit_line => append_if_no_line("This is for $(my.jobs) from $(sys.uqhost)");
}

###################################################################################

bundle agent method_collect_build_products
{
files:

  "/tmp/results/."
      create => "true";

  "/tmp/results/reply_from_$(my.jobs)"
      comment => "Pick up the latest on the pipeline",
      copy_from => secure_cp("/tmp/result_for_pickup.txt","$(sys.docker_guest_ip[$(my.jobs)])");
}

###################################################################################

bundle agent method_package_for_delivery
{
vars:

  "all_files" slist => {
                       "/tmp/results/reply_from_container_1",
                       "/tmp/results/reply_from_container_2"
                       };

commands:

  "/bin/tar zcf /tmp/package.tgz /tmp/results"
      if => makerule("/tmp/package.tgz", "@(all_files)");
}

###################################################################################

bundle agent service_children(name,state)
{
guest_environments:

 parent.!cleanup::

  "$(my.jobs)"
      guest_details => stem_cell,
      guest_state => "create";

 cleanup::

  "$(my.jobs)"
      guest_details => stem_cell,
      guest_state => "delete";

reports:

 !cleanup::

  "CONTAINERS running";

 cleanup::

  "CONTAINERS stopped";
}

####################################################################################

body guest_details stem_cell
{
  guest_type => "docker";
  guest_image_name => "cf-stem-cell";
}

###################################################################################
# Containers
###################################################################################

bundle agent method_fetch_from_the_parent
{
services:

  "ssh"      comment => "So we can log in and inspect the containers";
  "cfengine" comment => "Need the server running";

files:

  "/tmp/job_description.txt"
      comment => "Pick up the latest on the pipeline",
      copy_from => secure_cp("/tmp/feeds/$(my.name)_feed.txt","$(my.parent_address)");

  "/tmp/result_for_pickup.txt"
      create => "true",
      comment => "Do my part and deliver the result for pickup",
      edit_line => signature("/tmp/job_description.txt", "A postcard from sunny $(sys.ipv4) - Thanks, with love $(my.name)");
}

####################################################################################
# Misc - try to keep this all in one file
####################################################################################

body common control
{
  bundlesequence => { "my", "main" };
}

##

bundle server access_control
{
access:

  "/tmp"
      admit => { "172.17.0.0/16" };
}

##

body server control
{
  allowconnects    => { "172.17.0.0/16" };
  allowallconnects => { "172.17.0.0/16" };
  trustkeysfrom    => { "172.17.0.0/16" };
}

##

bundle agent service_ssh(a,b)
{
processes:

  "/usr/sbin/sshd" restart_class => "restart_ssh";

commands:

  "/usr/sbin/sshd" ifvarclass => "restart_ssh";
}

##

bundle agent service_cfengine(a,b)
{
processes:

  "cf-serverd" restart_class => "restart_cfengine";

commands:

  "/var/cfengine/bin/cf-serverd" ifvarclass => "restart_cfengine";
}

##

body copy_from secure_cp(from,server)
{
  source   => "$(from)";
  servers  => { "$(server)" };
  compare  => "digest";
  encrypt  => "true";
  verify   => "true";
  trustkey => "true";
}

##

bundle edit_line append_if_no_line(str)
{
insert_lines:

  "$(str)"
      comment => "Append a line to the file if it doesn't already exist";
}

##

bundle edit_line signature(file,str)
{
insert_lines:

  "$(file)"
      insert_type => "file";

  "$(str)";
}

##

body contain in_dir(s)
{
  chdir => "$(s)";
}