Authored by Brian Bennett of Digital Elf
I’ve been working a lot with CFEngine newbies. CFEngine has been described as flour, eggs, milk and butter: all the ingredients needed to make a cake. Once new CFEngine users recognize, and become excited about, the possibilities that CFEngine provides, they are faced with the question “What next?”
Indeed, anybody can throw some flour, eggs, milk and butter into a bowl, mix and bake it. But will it taste good?
This is an exposé of how I have managed my CFEngine repository for more than eight years. This design was used to manage over 1,000 host instances.
This works best if you have an agile infrastructure. Use SmartOS, OpenStack, Amazon EC2 (Disclosure: I work for Amazon), CloudStack or similar.
The repository and version control
Firstly, place your CFEngine repository under revision control. I am highly partial to git. Get the Pro Git book (or download it) and read chapters 1, 2 and 3. This will make you a git power user. After you’re comfortable using git, read chapters 6 and 7. When you’re hungry for more, read the rest.
I symlink /var/cfengine/masterfiles to /cfengine/inputs. This contains all of my policy files.

I also create /cfengine/files for files that get copied to remote systems. This mostly contains my configuration files.

/cfengine/ is initialized as a git repository. Changes made to either inputs or files should be atomic. Adding something new for Apache? Any inputs and files involved should be checked in as a single commit. This makes reverting a change easier.
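Here is a minimal sketch of what such an Apache change might touch: a promise in inputs/apache2.cf that copies a configuration file kept under files/. It makes several assumptions. The policy hub must grant clients access to /cfengine/files through its cf-serverd access rules, which are not shown. The remote_cp and if_repaired helpers must come from cfengine_stdlib.cf. The client is assumed to use a Debian-style /etc/apache2 layout. The file paths and the apache2_conf_repaired class name are hypothetical.

```cf3
# inputs/apache2.cf (sketch). The source file it copies,
# files/apache2/apache2.conf, is committed in the same git commit.
bundle agent apache2
{
  files:
      # Hypothetical paths: Debian-style location on the client, and the
      # copy kept in /cfengine/files on the policy hub.
      "/etc/apache2/apache2.conf"
        copy_from => remote_cp("/cfengine/files/apache2/apache2.conf", "$(sys.policy_hub)"),
        classes   => if_repaired("apache2_conf_repaired");

  commands:
    # Reload Apache only when the configuration file was actually replaced.
    apache2_conf_repaired::
      "/usr/sbin/apache2ctl graceful";
}
```

The policy and the file it copies change together, so a single git revert of that commit rolls back both.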
Environments
I use four environments.
- Alpha
- Beta
- pre-Production
- Production
I also lied about initializing /cfengine/ as a git repository. I use a central repository server that contains only a bare git repository. The central repository has four branches.
- master
- beta
- preprod
- prod
Astute readers will notice there’s no alpha branch. I’ll get to that later.
beta is a full integration environment. Everything in beta is expected to work, yet not to be relied upon. That is to say, nothing known to be broken should move to beta. Beta will break, but don’t break it intentionally. If it’s half finished, keep it out of beta.
prod is the full production environment. Breaking this means losing money. Don’t break this. Prod is tagged daily. Rolling back is done by checking out the appropriate tag.
preprod is for final quality assurance testing. Preprod should be identical to prod except for changes to be imminently released to prod. Preprod can also be used for offline testing of the production environment without affecting capacity or availability. Preprod should be in your production network fabric.
master is the trunk. All code is initially merged here, then merged to the appropriate branches. No one should be allowed to merge directly to any branch other than master. The repository czar merges commits to the other branches.
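The daily prod tag could itself be a CFEngine promise. One possible sketch is a commands promise that runs only on the central repository server. The bare repository path /srv/git/cfengine.git, the prod-YYYYMMDD tag format and the tag_prod bundle name are all hypothetical. The strftime() function requires CFEngine 3.6 or later.

```cf3
# Hypothetical bundle, run only on the central repository server.
bundle agent tag_prod
{
  vars:
      # Today's date as YYYYMMDD, used to build the tag name.
      "today" string => strftime("localtime", "%Y%m%d", now());

  commands:
    # Hr00 and Min00_05 are CFEngine time classes. With the default
    # five-minute run schedule this fires about once, just after midnight.
    # git refuses to overwrite an existing tag, so an extra run in the
    # same window fails without changing anything.
    Hr00.Min00_05::
      "/usr/bin/git --git-dir=/srv/git/cfengine.git tag prod-$(today) prod";
}
```

Rolling back then means checking out the matching prod-YYYYMMDD tag on the policy server.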
A DevOps Workflow
This is why there’s no alpha branch.
Let’s assume that you’re going to be making a change to the configuration of Tomcat.
1. Spawn a new CFEngine instance, git clone the master branch of the CFEngine repository onto it, and bootstrap the server to itself.
2. Spawn as many instances as necessary for your application to work. This will be at least Tomcat instances, possibly including Apache and Postgres instances. Bootstrap all of them to your new CFEngine server instance.
3. Make all of your updates by editing only the CFEngine files in the cloned repository.
4. Code review.
5. When the feature is ready, push a single commit to master and merge it to beta.
6. Integration testing in beta.
7. Any changes that need to be made are done in the private instance set. When they’re ready, proceed from step 5.
8. When the feature is ready, merge from beta to preprod.
9. Final QA testing.
10. Any changes that need to be made are done in the private instance set. When they’re ready, proceed from step 5. (Yes, that means it goes through integration again.)
11. When the feature is ready, merge from preprod to prod.
Managing The CFEngine Repository: Layers
I use a layered approach. Each layer is contained within a single bundle.
- Meta - These are things that affect every host that runs CFEngine.
Layers that are based on intrinsic characteristics:
- Operating system families - windows, unix (anything Unix-like)
- Operating System - linux, solaris, bsd, darwin
- Distribution - debian, redhat, solaris11, omnios, freebsd, openbsd
- Distro Version sub-layer - debian_6, redhat_6, centos_6
Layers that are based on the role:
- Application - apache, postgresql, mysql, bind, tomcat (these are often named after packages)
- Application sub-layer - apache1_3, apache2, tomcat6, tomcat7
- Role - external_web, internal_web, proxy, smarthost
- Hostname - web_f7f274, web_4d06a8
Bundle/Layer Mapping
I generally keep one bundle per file and one file per layer. The default policy files that come with CFEngine are in what I consider the meta layer.
This is a subset of my policy files to give you an idea of the organization.
- unix.cf - bundle agent unix
- linux.cf - bundle agent linux
- debian.cf - bundle agent debian
- redhat.cf - bundle agent redhat
- solaris.cf - bundle agent solaris
- apache2.cf - bundle agent apache2
- bind9.cf - bundle agent bind9
- web_ext.cf - bundle agent web_ext (policy for public-facing web servers)
- dpkg.cf - bundle agent dpkg (package management common to Debian)
- rpm.cf - bundle agent rpm (package management common to Red Hat)
- ips.cf - bundle agent ips (package management common to the Image Packaging System, used by Solaris)
- digitalelf_stdlib.cf - Private library of bundles and bodies. This is similar in nature to cfengine_stdlib.cf, but I never change cfengine_stdlib.cf. I put things into my private library instead. When they are well tested, I open a pull request with cfengine/core to contribute them.
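The following is one example of the kind of thing that can live in a private library. This hypothetical body (hub_file is my own name, not part of cfengine_stdlib.cf) wraps the recurring "copy from /cfengine/files on the policy hub" pattern, so layer bundles only need to name a path relative to that directory. It assumes the hub's cf-serverd policy grants clients access to /cfengine/files.

```cf3
# digitalelf_stdlib.cf (sketch): hypothetical private body, not part of
# cfengine_stdlib.cf.
body copy_from hub_file(path)
{
      # Fetch from the policy hub this host was bootstrapped to.
      servers => { "$(sys.policy_hub)" };
      # The path parameter is taken relative to /cfengine/files on the hub.
      source  => "/cfengine/files/$(path)";
      # Copy only when the file contents differ (checksum comparison).
      compare => "digest";
}
```

A layer bundle would then write something like copy_from => hub_file("apache2/apache2.conf").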
Every promise is added to the lowest layer bundle where it applies (global being the lowest layer and hostname the highest). Thus, because all Unix-like systems treat /etc/resolv.conf alike, changes to /etc/resolv.conf go into the unix layer. The sysctl handling differs per operating system, so sysctl promises go into the linux and bsd bundles at the OS layer.
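The sketch below covers the two cases just described: resolver settings in the unix bundle and sysctl settings in the linux bundle. The nameserver addresses, search domain and sysctl value are made-up examples. The code assumes the resolvconf, set_variable_values and if_repaired helpers from cfengine_stdlib.cf.

```cf3
# unix.cf (sketch): every Unix-like system handles /etc/resolv.conf alike.
bundle agent unix
{
  vars:
      # Example addresses from the RFC 5737 documentation range.
      "nameservers" slist => { "192.0.2.53", "192.0.2.54" };

  files:
      "/etc/resolv.conf"
        edit_line => resolvconf("example.com", "@(unix.nameservers)");
}

# linux.cf (sketch): sysctl handling is Linux-specific, so it lives here.
bundle agent linux
{
  vars:
      # Array of sysctl keys and desired values (example value only).
      "sysctl[net.ipv4.ip_forward]" string => "0";

  files:
      "/etc/sysctl.conf"
        edit_line => set_variable_values("linux.sysctl"),
        classes   => if_repaired("sysctl_conf_repaired");

  commands:
    # Load the new values only when the file actually changed.
    sysctl_conf_repaired::
      "/sbin/sysctl -p";
}
```

A bsd bundle would carry its own version of the sysctl promise, with whatever file handling and reload command that platform needs. That divergence is why these promises sit at the OS layer rather than in unix.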
An external-facing web server, by nature of being a web server, must include Apache, as does an internal-facing web server, so each automatically pulls in apache2.

Likewise, canonical DNS servers and caching DNS servers both pull in bind9.
Dynamic bundlesequence
Because of the layered approach, the list of inputs and bundles to run is generated dynamically. Public web servers running on Debian Linux automatically select the web_ext, apache2, debian and linux bundles. I can serve the same web content on Solaris 11, and it will instead choose the web_ext, apache2 and solaris bundles.

I have a very large header in promises.cf to facilitate this. Here is an excerpt from my promises.cf, with additional commentary, showing how the bundlesequence is dynamically generated.
```cf3
bundle common classify {
  # This section classifies host instances into roles based on the hostname.
  # I use a completely virtualized infrastructure with hostnames made up of
  # a role-specific prefix and a hex string, separated by an underscore.
  # The hex string is the last 3 bytes of the MAC address of the lowest
  # numbered interface (e.g., eth0). Instances are created this way by my
  # provisioning system.
  classes:
    "dns_ns"        or => { classmatch("ns[0-9]*") };
    "dns_forwarder" or => { classmatch("dns_[0-9a-f]*") };
    "db_server"     or => { classmatch("db_[0-9a-f]*") };
    "gitlab"        or => { classmatch("gitlab_[0-9a-f]*") };
    "web_ext"       or => { classmatch("www_[0-9a-f]*") };
    "web_int"       or => { classmatch("web_[0-9a-f]*") };
    "xwiki"         or => { classmatch("xwiki_[0-9a-f]*") };

    # Roles choose application bundles
    "apache"     expression => "dpkg_repo|web_ext|web_int";
    "bind"       expression => "dns_ns|dns_forwarder";
    "postgresql" expression => "db_server";
    "tomcat"     expression => "xwiki|jira";
    "rails"      expression => "gitlab";

    # Roles and/or applications can be grouped
    "app_server" expression => "rails|tomcat";

    # Applications may also depend on other applications
    "sql_client" expression => "app_server";
    "ssl"        expression => "apache|tomcat|rails";
    "stunnel"    expression => "mysql";
}

bundle common g {
  # This section assigns bundles to application/role/grouping classes.
  # An array named "bundles" is created. Each key is named after a bundle;
  # the value of each key is the input file where that bundle can be found.
  vars:
    # These classes were defined by me in the classify bundle
    apache::
      "bundles[apache]"     string => "apache.cf";
    bind::
      "bundles[bind]"       string => "bind.cf";
    postgresql::
      "bundles[postgresql]" string => "postgresql.cf";
    ssl::
      "bundles[ssl]"        string => "ssl.cf";
    stunnel::
      "bundles[stunnel]"    string => "stunnel.cf";

    # These are hard classes determined by CFEngine. I don't need to
    # explicitly classify them.
    debian::
      "bundles[dpkg]"       string => "dpkg.cf";
      "bundles[debian]"     string => "debian.cf";
    centos::
      "bundles[rpm]"        string => "rpm.cf";
      "bundles[centos]"     string => "centos.cf";
    sunos_5_11::
      "bundles[ips]"        string => "ips.cf";
      "bundles[solaris]"    string => "solaris.cf";
    xen_dom0::
      "bundles[xen_dom0]"   string => "xen_dom0.cf";

    # Now the magic.
    # I create two slists, one named "sequence" and one named "inputs".
    # The "sequence" slist contains a list of bundle names.
    # The "inputs" slist contains a list of input files.
    any::
      "sequence" slist => getindices("bundles");
      "inputs"   slist => getvalues("bundles");
}

body common control {
  # The bundlesequence now includes those things which are common to all,
  # plus the contents of the slist "sequence" (which has been dynamically
  # generated), plus the unqualified hostname.
  bundlesequence => { "global", "main", "@{g.sequence}", "${sys.uqhost}" };

  # The inputs now include common libraries and main.cf, which will be run
  # by all systems, plus the contents of the slist "inputs" (which has been
  # dynamically generated), plus an input based on the unqualified hostname.
  inputs => { "cfengine_stdlib.cf", "digitalelf_stdlib.cf", "main.cf",
              "@{g.inputs}", "${sys.uqhost}.cf" };

  # Sometimes I need a specific configuration for a single host (e.g., one
  # of the dns_ns hosts will be the master and the rest will be slaves, so
  # the master needs special configuration). The following options allow
  # CFEngine to skip the hostname bundle/input if one does not exist (which
  # it usually doesn't).
  ignore_missing_bundles => "true";
  ignore_missing_inputs  => "true";

  version => "Community Promises.cf 1.0.0";
}
```
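The hostname bundle and input exist for hosts such as the DNS master mentioned in the comments above. Below is a minimal sketch of such a file. It assumes the master is named ns1 (hypothetical) and uses the Debian /etc/bind layout. It also assumes a master-only zone file kept under /cfengine/files, plus the remote_cp and if_repaired helpers from cfengine_stdlib.cf. It copies a file that bind.cf does not manage, so the two bundles do not fight over the same path on every run.

```cf3
# ns1.cf (sketch): loaded only on the host whose unqualified name is ns1.
# The bundle name must equal ${sys.uqhost} for the bundlesequence to find it.
bundle agent ns1
{
  files:
      # Hypothetical master-only zone file. named.conf on this host must
      # also declare example.com as a master zone (not shown).
      "/etc/bind/db.example.com"
        copy_from => remote_cp("/cfengine/files/bind/db.example.com", "$(sys.policy_hub)"),
        classes   => if_repaired("ns1_zone_repaired");

  commands:
    # Ask named to reload only when the zone file changed.
    ns1_zone_repaired::
      "/usr/sbin/rndc reload";
}
```

Every other dns_ns host has no matching file, so ignore_missing_bundles and ignore_missing_inputs let it skip this step silently.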
Notice that instances are automatically classified by their hostname. So if I need a new external web server, I provision a new instance with the name prefix www_ (I can also choose the OS at provisioning time). My provisioning system automatically assigns it a unique ID, creates the instance, installs the OS, installs CFEngine, bootstraps it to the CFEngine master server, runs CFEngine to apply the final configuration and finally adds the instance’s services to the appropriate load balancer entries.

I have package repository mirrors for all the platforms I run, so a newly provisioned host can be in production with a perfect configuration in as little as five minutes.
[1]: Pro Git book