Making IP (Layer 3) a First-Class Netizen in the Configuration World

Posted by Mahesh Kumar
July 3, 2014

Networking has become a utility in our daily lives, but it is still evolving as a platform for modern commerce. CFEngine has now taken networking beyond simple automation by bringing modern self-repairing, model-oriented approaches to WebScale management at the IP layer.

Network interface configuration is nothing new in the world of configuration management. CFEngine versions 1 and 2 were plugging this obvious hole in the out-of-the-box experience during the mid-1990s, before operating systems finally got their acts together. In 2007, CFEngine even explored Layer 3 WAN management (IP routing by any other name) as a member of the EMANICS network of excellence, but back then the market was not ready. The dominant networking vendors were still largely concerned with proprietary architectures and steeped in feature-based competition.

The more recent arrival of Linux as a network operating system has changed that. The adoption and development of the Linux kernel by several companies has pulled down the vendor wall and let in Open Source. Significantly for the industry, the growth fueled by Open Source has led to the birth of modern datacenter engineering and DevOps. Recently, with the discussion of Software Defined Networking (SDN) and the arrival of DevOps in the networking space, several tool providers have waded into the world of Layer 2 networking (bonds, bridges and VLANs), which is the comfort zone of most system administrators. The historically complex Layer 3 routing services, however, have thus far been neglected. To change this, CFEngine founder Mark Burgess has been working with Dinesh Dutt of Cumulus Networks on a model of Layer 3 networking based on CFEngine’s Promise Theory approach to scalable management.

SDN, by any other name

The promise of SDN was to make networking better support business value, i.e. applications, while bringing a more policy-based, version-controlled approach to management. This was the world that CFEngine brought to IT automation and management, now often called ‘Infrastructure as Code’. However, when faced with new challenges, the industry often beats a retreat to centralized management. The central controller models of OpenFlow and other virtualized implementations of SDN provide detailed programming models, but at the cost of re-centralizing control behind a complex API. Moreover, SDN APIs push the burden of implementation onto a new breed of programmer, who is expected to understand the complex historical meanderings of network management: an expensive and optimistic proposition. By contrast, our research into Promise Theory at CFEngine has aimed to leverage patterns and create desired end-state technology with self-repairing capability, the so-called ‘executable documentation’ model. By declaring simple promises and relying on CFEngine’s configuration engine, one can go beyond simple automation to bring vertical and horizontal resilience to network structures. CTO Mark Burgess says, “We did not want to just make an API to pass on the complexity of these systems to developers. We wanted to fundamentally simplify L3 configuration so that anyone could understand it. Much of the work went into distilling the complexity down to a manageable level, without sacrificing all-important decentralized aspects.”

From LAN to WAN and back

Layer 3 routing protocols are notoriously arcane inventions, evolved over many years against a changing backdrop of business needs, so it is not surprising that simplifying them has called for an innovative approach. Today, Wide Area Networking models are becoming the way to scale the fattening Local Area Networks in datacenters. Consider, for example, a simple 2x2 redundant routing block, equilibrated by OSPF (Figure 1).

Figure 1. 2x2 Redundant Routing Block.

With unnumbered interfaces, its configuration might look as simple as this:

bundle agent main()
{
  interfaces:
    "swp1"
          link_services =>  ospf_area("0");
    "swp2"
          link_services =>  ospf_area("0");
}

body routing_services control
{
  ospf_redistribute => { "kernel", "static" };
  # Map linux hostnames to a router_id
  hostname_1:: ospf_router_id => "1.1.1.1";
  hostname_2:: ospf_router_id => "2.2.2.2";
  hostname_3:: ospf_router_id => "3.3.3.3";
  hostname_4:: ospf_router_id => "4.4.4.4";
}

The ability to know the intended state, monitor the actual state, and use CFEngine’s machine learning to trace performance characteristics now drops out of the CFEngine framework for free. (Note that the bulk of the configuration lies in naming the boxes: a generic theme when managing namespaces, according to Promise Theory.) This pattern is supplemented by a basic template like this, which can be kept in a standard library for re-use:

body link_services ospf_area(area)
{
  ospf_area => "$(area)";
  ospf_authentication_digest => "ABCDEFGHIJK";
  ospf_link_type => "point-to-point";
  ospf_hello_interval => "5";
}

The purpose of this very generic code is clearly not to micromanage details through continuous tweaking, but rather to make the problem go away as a managed service. Currently, CFEngine has built its reference implementation as part of the development-friendly Cumulus Linux ecosystem.
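Because the template is an ordinary parameterized body, values that vary between sites do not have to be hardcoded in the shared library. The following is a minimal, hypothetical sketch. The names ospf_area_keyed and ospf_fabric are illustrative and do not come from the post. It assumes that the interfaces promise type and the ospf_* link_services attributes behave as in the examples above, and the key string is only a placeholder. The sketch passes the authentication digest in as a parameter, so the library body carries no secret of its own.

```cfengine
# Hypothetical variant of the ospf_area template: the digest key is a
# parameter, so the shared library body holds no secret of its own.
body link_services ospf_area_keyed(area, key)
{
  ospf_area => "$(area)";
  ospf_authentication_digest => "$(key)";
  ospf_link_type => "point-to-point";
  ospf_hello_interval => "5";
}

# Hypothetical bundle name; the key value below is only a placeholder.
bundle agent ospf_fabric
{
  vars:
      "ospf_key" string => "ABCDEFGHIJK";

  interfaces:
    "swp1"
          link_services => ospf_area_keyed("0", "$(ospf_key)");
    "swp2"
          link_services => ospf_area_keyed("0", "$(ospf_key)");
}
```

Both ends of an OSPF link must agree on the authentication key and hello interval, or no adjacency forms. That is one reason to keep such values as data supplied from a single place, rather than scattering them across templates.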

The house that Clos built

Large datacenters today are using non-blocking Fat Tree architectures to scale traffic levels horizontally with commodity hardware. These structures were pioneered by Charles Clos in the 1950s. Highly regular Clos networks are well suited to patterns of promises. Dynamic change for application management can be handled by protocols like iBGP, without bringing new virtualization into the picture. For example, repeating the approach for iBGP, one might imagine a similar data-driven pattern for a 2x5 tree (Figure 2):

Figure 2. 2x5 Tree Diagram.

bundle agent LeafSpine
{
vars:

 # Generate the interface lists used on the routers

 "spine"  slist => expandrange("swp[1-5]", "1"); # spine ports facing the 5 leaf switches
 "leaves" slist => expandrange("swp[1-2]", "1"); # leaf ports facing the 2 spine switches

 "net_adverts[leaf1]" slist => { "10.10.10.1/24", "10.10.20.1/24" };
 "net_adverts[leaf2]" slist => { "10.10.30.1/24", "2001:0DB9:0:f101::1/64" };
 "net_adverts[leaf3]" slist => { "192.168.1.0/24" };
 "net_adverts[leaf4]" slist => { "192.168.1.0/24" };
 "net_adverts[leaf5]" slist => { "192.168.1.0/24" };

 "router_id[spine1]" string => "2.0.0.1";
 "router_id[spine2]" string => "2.0.0.2";
 "router_id[leaf1]" string => "1.0.0.1";
 "router_id[leaf2]" string => "1.0.0.2";
 "router_id[leaf3]" string => "1.0.0.3";
 "router_id[leaf4]" string => "1.0.0.4";
 "router_id[leaf5]" string => "1.0.0.5";

interfaces:
 spine::
   "$(spine)"
       link_services =>  ibgp_reflector("server");
 leaves::
   "$(leaves)"
       link_services =>  ibgp_reflector("client");
}

With the data specified in associative arrays, two generic promise patterns are sufficient to configure the 2x5 tree.
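To see what the LeafSpine data expands to, one can report it instead of acting on it. The bundle below is a minimal, hypothetical sketch: the name show_leafspine_data is illustrative, and the bundle copies only part of the data from the post. It uses just the standard CFEngine functions expandrange() and getindices() together with reports promises, so running it with cf-agent changes nothing on the switch.

```cfengine
# Hypothetical helper bundle: it re-creates part of the LeafSpine data
# and only reports it, so no interface is configured.
bundle agent show_leafspine_data
{
  vars:
      # Same generators as LeafSpine: swp1..swp5 and swp1..swp2
      "spine"  slist => expandrange("swp[1-5]", "1");
      "leaves" slist => expandrange("swp[1-2]", "1");

      # A subset of the router_id array from LeafSpine
      "router_id[spine1]" string => "2.0.0.1";
      "router_id[leaf1]"  string => "1.0.0.1";

      # getindices() returns the array keys (order is not guaranteed)
      "nodes" slist => getindices("router_id");

  reports:
      # Each list reference iterates: one report line per element
      "spine-side interface $(spine)";
      "leaf-side interface $(leaves)";
      "$(nodes) has router_id $(router_id[$(nodes)])";
}
```

A reference to a list variable iterates implicitly, so each interface line is reported once per generated port. The same mechanism lets the single "$(spine)" promise in LeafSpine cover all five spine-facing ports.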

Model-based for WebScale

Model-based configuration is not just a “nice to have”; it is a “must have” feature of a scalable architecture. The three challenges of today’s software stack remain scale, complexity and knowledge management. Brute force can handle the first of these, but going forward, a model-based approach is the only plausible option. What CFEngine adds is its usual promise of self-healing, convergent end-state management and a knowledge-oriented approach. Simplifying datacenter operations, by decoupling fast and slow changes with a versioned set of promises, takes away several burdens that haunt network engineers. This is about enabling DevOps to work collaboratively across the application-network divide, and that kind of evolution in the management of network services is long overdue.