Using CFEngine inventory as Ansible inventory

October 7, 2021

CFEngine and Ansible are two complementary infrastructure management tools that both work with so-called inventories. The shared term can be confusing, though, because an Ansible Inventory and a CFEngine Inventory are defined and created in very different ways. In the most basic case, an Ansible Inventory is just a file listing hosts and groups of hosts, which Ansible manages when given that file. A CFEngine Inventory, on the other hand, is a database of information about all the hosts in the infrastructure managed by CFEngine, reported by the hosts themselves. In a more complex scenario, an Ansible Inventory can also contain a lot of information about the hosts, but that information has to be pulled from somewhere else and given to Ansible. The difference follows from how the two tools operate. With CFEngine, hosts talk to a CFEngine Hub, pull policy from it and report information back to it. With Ansible, policy is pushed to the hosts from one place, which must therefore have a list of all hosts available in advance, potentially with some extra information (parameters) about them.

So the information for an Ansible Inventory has to be pulled from somewhere and then given to Ansible, and that somewhere can of course be the CFEngine Inventory, which has all the information. With such data, Ansible can be used more efficiently by targeting only the hosts where some actions need to be performed. This is shown in the image below and explained further later in this post.

CFEngine Hosts and Classes

One of the key concepts in CFEngine is classes. Policy and CFEngine components can define classes, which are used for making decisions and some of which are also reported to the hub. The word class has, in this case, its mathematical meaning: a generalization of a set. Looking at individual hosts, classes classify the hosts on which they are defined, and for the policy they define a context (which is why they are sometimes called contexts). From the global view, however, classes actually define sets of hosts. [1] For example, the class linux used in a policy as a so-called class guard linux:: defines a required context for the promises that follow it. If the policy is evaluated in a linux context, i.e. in an environment where the class linux is defined, the promises that follow the class guard are evaluated; otherwise they are skipped. And the hosts where the class linux is defined form the set of, well, hosts running Linux.
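The set view of classes, including the set operations mentioned in the footnote on class expressions, can be illustrated with a short sketch. The host names and class memberships below are hypothetical, and Python sets merely stand in for the idea; this is not how CFEngine evaluates class expressions internally.

```python
# Hypothetical hosts and the classes defined on them (illustration only).
classes = {
    "linux": {"host-a", "host-b", "host-c"},
    "centos": {"host-a"},
    "ubuntu": {"host-b", "host-c"},
    "xen": {"host-a", "host-b"},
}
all_hosts = set().union(*classes.values())

# AND corresponds to intersection: hosts that run Ubuntu and are on Xen.
ubuntu_and_xen = classes["ubuntu"] & classes["xen"]

# OR corresponds to union: hosts that run CentOS or Ubuntu.
centos_or_ubuntu = classes["centos"] | classes["ubuntu"]

# NOT corresponds to the complement relative to all known hosts.
not_xen = all_hosts - classes["xen"]

print(sorted(ubuntu_and_xen))    # ['host-b']
print(sorted(centos_or_ubuntu))  # ['host-a', 'host-b', 'host-c']
print(sorted(not_xen))           # ['host-c']
```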

CFEngine Hosts by hard class API

Recent versions of CFEngine provide a new API to get hosts grouped by hard classes. Hard classes are special classes that CFEngine components define implicitly, based on what they discover about the environment. They reflect the OS the host is running, network parameters, CFEngine version, virtualization technology (if any), etc. With a simple API request, the user can get a JSON- or YAML-formatted file with all the hosts known to CFEngine, grouped by the hard classes defined on them.

The API request is supposed to be of the following form:

URI: https://hub.example.com/api/hosts/by-class
Method: GET
Parameters:
    context-include     Comma-delimited string of regular expressions.
    format              Output format. Default value is "json". Allowed values: "json", "yaml".
    withInventory       Include inventory data in the API response. Default value is
                        "false". Allowed values: "true", "false".

and with no parameters specified (and thus defaults being used), the response looks like this (with parts omitted for brevity):

{
  "cfengine": {
    "hosts": [
      "ip-172-31-21-241.eu-west-1.compute.internal",
      "ip-172-31-23-151.eu-west-1.compute.internal",
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-25-102.eu-west-1.compute.internal",
      "ip-172-31-22-152.eu-west-1.compute.internal"
    ]
  },
  "127_0_0_1": {
    "hosts": [
      "ip-172-31-22-152.eu-west-1.compute.internal",
      "ip-172-31-23-151.eu-west-1.compute.internal",
      "ip-172-31-25-102.eu-west-1.compute.internal",
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-21-241.eu-west-1.compute.internal"
    ]
  },
  "172_31_21_241": {
    "hosts": [
      "ip-172-31-21-241.eu-west-1.compute.internal"
    ]
  },
  "172_31_22_152": {
    "hosts": [
      "ip-172-31-22-152.eu-west-1.compute.internal"
    ]
  },
  "172_31_23_151": {
    "hosts": [
      "ip-172-31-23-151.eu-west-1.compute.internal"
    ]
  },
  "linux": {
    "hosts": [
      "ip-172-31-23-151.eu-west-1.compute.internal",
      "ip-172-31-21-241.eu-west-1.compute.internal",
      "ip-172-31-25-102.eu-west-1.compute.internal",
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-22-152.eu-west-1.compute.internal"
    ]
  },
  "centos": {
    "hosts": [
      "ip-172-31-21-241.eu-west-1.compute.internal",
      "ip-172-31-22-152.eu-west-1.compute.internal",
      "ip-172-31-25-102.eu-west-1.compute.internal"
    ]
  },
  "centos_7": {
    "hosts": [
      "ip-172-31-25-102.eu-west-1.compute.internal",
      "ip-172-31-22-152.eu-west-1.compute.internal",
      "ip-172-31-21-241.eu-west-1.compute.internal"
    ]
  },
  "ubuntu": {
    "hosts": [
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-23-151.eu-west-1.compute.internal"
    ]
  },
  "ubuntu_16": {
    "hosts": [
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-23-151.eu-west-1.compute.internal"
    ]
  },
  "ubuntu_16_04": {
    "hosts": [
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-23-151.eu-west-1.compute.internal"
    ]
  },
  "xen": {
    "hosts": [
      "ip-172-31-28-22.eu-west-1.compute.internal",
      "ip-172-31-25-102.eu-west-1.compute.internal",
      "ip-172-31-22-152.eu-west-1.compute.internal",
      "ip-172-31-23-151.eu-west-1.compute.internal",
      "ip-172-31-21-241.eu-west-1.compute.internal"
    ]
  }
}

The first part of the example above shows the group of hosts with the cfengine class defined. This special class is defined on all hosts managed by CFEngine, so this group can conveniently be used as an equivalent of what is commonly known as the all group. Then we can see some groups (classes) based on the IP addresses assigned to the hosts, the group of Linux hosts and multiple groups based on the operating system (GNU/Linux distribution) the hosts are running. Those groups nicely demonstrate a common pattern in how CFEngine defines hard classes: a specific class (e.g. ubuntu_16_04) is defined together with more generic classes (ubuntu_16 and ubuntu) that are constructed from it by removing the trailing parts separated by underscores. Last but not least, there's a group of hosts running in the Xen environment, which is the same as the cfengine group because all the hosts used to generate the above example were running in AWS.
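The naming pattern of specific and generic classes can be reproduced in a few lines. This is only a sketch of the pattern as described above; the helper name generic_classes is made up for illustration, it is not how CFEngine itself derives its classes, and not every class containing underscores follows the pattern (the IP address classes in the example do not appear to).

```python
def generic_classes(specific):
    """Return the class name followed by the names obtained by
    repeatedly dropping the last underscore-separated part."""
    parts = specific.split("_")
    return ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(generic_classes("ubuntu_16_04"))  # ['ubuntu_16_04', 'ubuntu_16', 'ubuntu']
print(generic_classes("centos_7"))      # ['centos_7', 'centos']
```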

CFEngine to Ansible

The above example response from the CFEngine hosts by class API shows the format of the response (in the JSON format, but the YAML format is just a direct equivalent). Each class/group has its top-level object with a hosts array of host names. This format is directly recognized by Ansible, and not by accident: the API response format was designed that way as part of the effort to make integration of CFEngine and Ansible easier.

Ansible inventory data formats

Ansible supports various inventory sources providing inventory data in various formats. The most common source is a simple file in JSON, YAML or INI format, but if given an executable, Ansible uses the script-plugin, which runs the executable and parses its output. Such an executable is supposed to output JSON in the format shown in the example above. Unfortunately, this format differs from the JSON/YAML format of an inventory file, even though the difference is very subtle. [2] The CFEngine hosts by class API was designed to be used in a script (usually just a simple curl invocation), so the format of the response follows the specification of the script-plugin, which means it doesn't work as an inventory file. Future versions of CFEngine will recognize an extra parameter inventoryFile=true/false to support scenarios where the response is saved into a file instead of being fed directly into one of the Ansible utilities.
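Until such a parameter is available, the difference described in the footnote could be bridged by a small conversion step. The following is a minimal sketch that relies only on what the footnote states (hosts becomes an object whose entries are objects too); the function name to_inventory_file and the sample data are hypothetical, and other details of the inventory file format, such as child groups or group variables, are not handled.

```python
import json


def to_inventory_file(script_output):
    """Convert script-plugin style data (hosts as a list of names)
    into the nested layout where hosts is an object of objects."""
    hostvars = script_output.get("_meta", {}).get("hostvars", {})
    inventory = {}
    for group, data in script_output.items():
        if group == "_meta":
            continue  # not a group, only carries per-host variables
        inventory[group] = {
            "hosts": {
                # Each host becomes an object holding its variables (if any).
                host: dict(hostvars.get(host, {}))
                for host in data.get("hosts", [])
            }
        }
    return inventory


# Hypothetical, shortened API response.
response = {
    "ubuntu": {"hosts": ["host-b", "host-c"]},
    "centos": {"hosts": ["host-a"]},
}
print(json.dumps(to_inventory_file(response), indent=2))
```

Whether the result is accepted as-is depends on the inventory plugin that reads the file, so it is worth checking it with ansible-inventory before relying on it.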

Example inventory script

As described above, the CFEngine hosts by class API returns inventory data suited for Ansible's inventory script-plugin which expects an executable to run. The most trivial script for such use is just:

#!/bin/bash

# Quote the credentials so that special characters in them survive word splitting
curl -u "${CFE_USER}:${CFE_PWD}" https://hub.example.com/api/hosts/by-class

with the CFE_USER and CFE_PWD environment variables set and exported with the export command. [3] With a script like the above saved as, let's say, get_cfe_inventory.sh, with the executable bit set (chmod u+x get_cfe_inventory.sh), the CFEngine Inventory data can be used with the Ansible utilities like this:

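# Flush the history collected so far to the history file
history -a
export CFE_USER="admin" CFE_PWD="testingCFEngine"

ansible -m setup -i ./get_cfe_inventory.sh debian

# Drop the password and the in-memory history that contains it
unset CFE_PWD
history -c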

where the last argument to the ansible command (debian) tells Ansible to only target hosts from the debian group as defined in the inventory data, i.e. as returned by the CFEngine hosts by class API.
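A slightly more flexible variant could build the request URL from the parameters documented earlier instead of hard-coding it. Below is a minimal sketch in Python, assuming the hub accepts HTTP basic authentication in the same way as the curl -u invocation above; build_url and fetch_inventory are hypothetical helper names, not part of CFEngine or Ansible.

```python
import base64
import os
import urllib.parse
import urllib.request

# Placeholder hub address used throughout this post.
BASE_URL = "https://hub.example.com/api/hosts/by-class"


def build_url(base_url=BASE_URL, context_include=(), with_inventory=False):
    """Append the documented query parameters to the hosts by class URL."""
    params = {}
    if context_include:
        # The API expects a comma-delimited string of regular expressions.
        params["context-include"] = ",".join(context_include)
    if with_inventory:
        params["withInventory"] = "true"
    query = urllib.parse.urlencode(params)
    return f"{base_url}?{query}" if query else base_url


def fetch_inventory(url):
    """Fetch the inventory text using credentials from the environment."""
    credentials = f"{os.environ['CFE_USER']}:{os.environ['CFE_PWD']}"
    token = base64.b64encode(credentials.encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


print(build_url(context_include=["ubuntu"], with_inventory=True))
# https://hub.example.com/api/hosts/by-class?context-include=ubuntu&withInventory=true
```

An executable inventory script built on this sketch would simply print the result of fetch_inventory(build_url(...)) to standard output.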

Of course, it's possible to run some tasks only on specific hosts with Ansible itself. However, adding a condition like when: ansible_facts['os_family'] == "Debian" to a task means that Ansible connects to all the hosts and only then skips the particular task(s) on the hosts that don't fulfill the condition. Selecting the debian group from the CFEngine data instead (in CFEngine, the debian class is defined on all hosts running Debian-like or Debian-derived GNU/Linux distributions) lets Ansible connect only to the relevant hosts. This speeds things up significantly if there are many hosts in the infrastructure, and it makes the full potential of CFEngine's hard classes available.

Host variables

As explained in the first part of this post, the CFEngine Inventory contains information that the hosts in the infrastructure managed by CFEngine report about themselves. An Ansible Inventory, on the other hand, may contain host-specific variables with host-specific values used in tasks and playbooks. These two mechanisms can be chained together, just like getting the groups of hosts from CFEngine and feeding them to Ansible. By adding the withInventory=true argument to the API query (i.e. at the end of the URL used in the curl command), the response is extended with a piece of data like this:

{
  "_meta": {
    "hostvars": {
      "ip-172-31-22-152.eu-west-1.compute.internal": {
        "CFEngine Inventory": {
          "OS": "CentOS 7",
          "Kernel": "linux",
          "Timezone": "UTC",
          "CPU model": "Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz",
          "Host name": "ip-172-31-22-152.eu-west-1.compute.internal",
          "Interfaces": "eth0",
          "BIOS vendor": "Xen",
          "CFEngine ID": "SHA=0854031414e708d8c647496b75580fb09972b6a108737af08fcaa03e43af7de6",
          "CPU sockets": "1",
          "System UUID": "35A626EC-2164-5893-1476-75B7BC9325D4",
          "Architecture": "x86_64",
          "BIOS version": "4.2.amazon",
          "Virtual host": "xen",
          "Disk free (%)": "91.00",
          "MAC addresses": "06:3e:2f:ff:99:cf",
          "IPv4 addresses": "172.31.22.152",
          "Kernel Release": "3.10.0-957.27.2.el7.x86_64",
          "Policy Servers": "172.31.21.241",
          "System version": "4.2.amazon",
          "Uptime minutes": "96",
          "Ports listening": "22, 25, 111, 5308",
          "CFEngine version": "3.18.0",
          "Memory size (MB)": "14875.92",
          "CPU logical cores": "4",
          "Policy Release Id": "79d0f8cfca33efdc0aea676048ec4b7b9889cc0a",
          "CPU physical cores": "2",
          "System manufacturer": "Xen",
          "System product name": "HVM domU",
          "Timezone GMT Offset": "+0000",
          "Physical memory (MB)": "15360",
          "System serial number": "ec26a635-6421-9358-1476-75b7bc9325d4",
          "Primary Policy Server": "172.31.21.241",
          "Allowed hosts for cf-runagent": "172.31.21.241",
          "Allowed users for cf-runagent": "root"
        }
      }
    }
  }
}

With this extension, host-specific variables are defined for all the hosts in all the groups that are part of the response. These so-called attributes are defined in the CFEngine policy, some pre-defined and some user-defined, and thus can be easily extended. Once again, the format is ready to use with Ansible, so Ansible tasks and playbooks can refer to these values using the special hostvars[] dictionary. This can be a nice extension of, or complement to, the Ansible Facts functionality.
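To get a feel for the structure of this data outside of Ansible, the following sketch selects hosts by one of the inventory attributes. The helper name hosts_matching and the sample values are hypothetical; the layout mirrors the _meta/hostvars example above. Note that the values are reported as strings, so numeric comparisons need a conversion first.

```python
def hosts_matching(response, attribute, predicate):
    """Return the sorted names of hosts whose CFEngine Inventory
    attribute exists and satisfies the given predicate."""
    hostvars = response.get("_meta", {}).get("hostvars", {})
    matches = []
    for host, variables in hostvars.items():
        value = variables.get("CFEngine Inventory", {}).get(attribute)
        if value is not None and predicate(value):
            matches.append(host)
    return sorted(matches)


# Hypothetical, shortened response with made-up hosts and values.
response = {
    "_meta": {
        "hostvars": {
            "host-a": {"CFEngine Inventory": {"OS": "CentOS 7", "Disk free (%)": "91.00"}},
            "host-b": {"CFEngine Inventory": {"OS": "Ubuntu 16", "Disk free (%)": "12.50"}},
        }
    }
}

# Convert the string value before comparing it numerically.
low_disk = hosts_matching(response, "Disk free (%)", lambda v: float(v) < 20)
print(low_disk)  # ['host-b']
```

Within a playbook, the corresponding lookup would presumably be an expression along the lines of hostvars[inventory_hostname]['CFEngine Inventory']['OS'].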

Conclusions

This post is another piece in the series of posts focused on integration of Ansible and CFEngine and using the best of what the two tools can provide. It shows how the CFEngine Inventory and Ansible Inventory, two seemingly conceptually different things, are actually very similar and how they can be used together to provide enhanced performance and better ease of use. Once again, the combination of Ansible for the deployment part of the infrastructure management and for the well-targeted real-time changes with CFEngine for the general long-term maintenance and infrastructure knowledge and overview proves to be very useful.


  1. Class expressions, which are logical expressions combining classes with the logical operators AND, OR and NOT, define sets that are the results of the respective set operations: intersection, union and complement.

  2. The hosts key must be an object, and the individual hosts need to be objects too, not just strings.

  3. Unfortunately, the Ansible script-plugin doesn't support passing through standard error output and standard input, so just having something like curl -u admin https://hub.example.com/api/hosts/by-class and letting curl prompt for the password is not possible.