CFEngine Music To Our Ears

May 30, 2013

Bruce Carleton is the organizer of the San Francisco CFEngine User Group. He has a deep understanding of system administration for distributed systems and of how to make it easy (and fun!) to use CFEngine for that. He sent a message to the user group email list that we want to share with you:

Re: [SF-Cfengine-Users-Group] Automation Adoption Challenges and Analogies

During the last SF CFEngine users group meeting, we had a good discussion about promoting adoption of CFEngine in our various environments. It can be a challenging thing to do, and I think it’s worth sharing some of my thoughts on the matter. This is an edited version of an email I sent out on the help-cfengine mailing list, so I apologize in advance for the resend to those who have dual subscriptions.

I think there are two key parts to the confusion new users of CFEngine experience. The first comes from the language itself. Rather than a more familiar programming paradigm such as imperative or object-oriented programming, CFEngine offers a high-level, domain-specific language for managing files, processes and other things on distributed systems.

The second part of the confusion comes from the problem domain itself. Rather than the familiar and consistent execution environment of disk, memory, registers and the processor(s) of a single system, there are many distributed systems, some of which may have high latency or intermittent connectivity. Mutual authentication is another significant challenge.

At the risk of being overly general, I’ll try to use music composition as an analogy for writing CFEngine code. Programmers from outside the large systems arena might be a bit like guitar players. They have become very familiar with their six strings and learned the neck of their instrument. When they compose music, that music may be written in the context of a single guitar player, playing in the known environment of their six strings.

Improvisation is easy to do, because there is little coordination necessary. The guitar player can make up an arrangement as they go, and if they are skillful, it will still sound great. Now let’s throw that guitar player into the role of composing music for a symphony orchestra.

An orchestra contains sections of string, brass, woodwind, and percussion instruments. Now the composer has to provide arrangements for each of its players, a more complicated task. Part of this is that the composer cannot concern themselves with exactly how the players will play their varied instruments. The players already know how and must be trusted to do so. The composer has to provide the music score, the specific arrangements for each section of the orchestra. Now let’s take that back to CFEngine.

When you write a CFEngine policy, it needs to provide instructions to a varied number of systems that have been classified in various ways, possibly by their operating system, subnet, application and various other characteristics. One very important part of this is using the high-level abilities of CFEngine or, said another way, just worrying about the music. When you deploy your policy (the music), the systems subject to the policy will already know how to execute it using cf-agent.
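
To make the idea of one score for many classified players a little more concrete, here is a minimal Python sketch. It is not CFEngine syntax and not how cf-agent is implemented; the `Host` type, the `classify` and `promises_for` helpers, the class names and the promise descriptions are all made up for illustration. The only point is that a single shared policy can be written once, with each part guarded by a class, and every host picks out the parts that apply to it.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical host description (not a CFEngine data structure)."""
    name: str
    os: str
    subnet: str
    applications: set = field(default_factory=set)


def classify(host):
    """Return the set of class names that describe this host."""
    classes = {"any", host.os, "subnet_" + host.subnet.replace(".", "_")}
    classes.update("app_" + app for app in host.applications)
    return classes


# The shared "score": each promise is guarded by the class it applies to.
# Guards and promise texts are illustrative assumptions.
POLICY = [
    ("any", "keep /etc/motd up to date"),
    ("linux", "keep sshd running"),
    ("app_web", "keep nginx running"),
    ("solaris", "keep ipfilter enabled"),
]


def promises_for(host):
    """Select the parts of the shared policy that apply to one host."""
    classes = classify(host)
    return [promise for guard, promise in POLICY if guard in classes]


web01 = Host("web01", "linux", "10.0.1", {"web"})
db01 = Host("db01", "solaris", "10.0.2")

print(web01.name, promises_for(web01))
print(db01.name, promises_for(db01))
```

The composer (policy author) only edits `POLICY`; how each promise is actually carried out is left to the player, which in the real world is the agent running on each host.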

Another thing to consider is that once the symphony has begun (production), it can be difficult to change the arrangement. Improvisation is difficult when using provisioning tools like Kickstart or the provisioning APIs at cloud providers. You must work out ways to safely re-provision large numbers of hosts. You are also exposed to failures of the control system for provisioning, which may need to be very complicated and possibly be shared with a large number of uncooperating users. That control system can become a single point of failure from the users’ perspective.

This is where CFEngine can shine. When skillfully deployed, it can provide a scalable and fault-tolerant way to change the music in the middle of the symphony. Improvisation becomes possible again, albeit at a slower pace. Now some might say that’s exactly what’s wrong with CFEngine and tools like it, though I don’t feel that way. You will have to judge for yourself, given your environment. Something that also merits mentioning is the current state of cluster management and the elusive single system image.

Creating a single system image is an admirable goal, but treating individual systems as “autonomous” is better in the systems management context. It is easier to back-fit onto existing servers, something that is important when trying to manage large numbers of systems in an existing infrastructure. It also scales better because individual servers do more for themselves. It helps to eliminate single points of failure in an infrastructure and turns total service failures into service degradations. Another thing is that systems should have some ability to fix themselves. Too much time is spent on monitoring and on alerting humans to respond, rather than on creating automation that simply fixes things when they stop working, then records the event for later analysis.
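
The "fix it, then record it" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `converge` function, the simulated `service` dictionary and the event record layout are hypothetical, and the outcome labels are simply borrowed from the way CFEngine reports promises as kept, repaired or not kept. A real agent would check an actual process or file rather than a dictionary.

```python
import datetime


def converge(name, is_healthy, repair, event_log):
    """Check one promise; repair only if needed and record what happened."""
    if is_healthy():
        return "kept"  # nothing to do, nothing to record
    repair()
    # Re-check instead of assuming the repair worked.
    outcome = "repaired" if is_healthy() else "not_kept"
    event_log.append({
        "promise": name,
        "outcome": outcome,
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return outcome


# Simulated service state standing in for a real process check (assumption).
service = {"running": False}
events = []


def check():
    return service["running"]


def start():
    service["running"] = True


first = converge("web service running", check, start, events)
second = converge("web service running", check, start, events)

print(first, second, len(events))
```

Running the same check twice is deliberate: the first pass repairs the service and logs one event for later analysis, and the second pass finds nothing to do. Nobody has to be paged for the first failure, which is the point of letting each system look after itself.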

So anyway, I’ll stop rambling now. If you have made it to the end of this post, thanks for following along. I hope you found it to be a valuable way of thinking about CFEngine. Comments are welcome.

Best,

--Bruce