Introducing the classfilterdata() policy function

We recently introduced a new policy function, classfilterdata(), which will be available in CFEngine 3.27, the next LTS release. If you can’t wait for the release, feel free to grab the latest non-LTS build of master from our nightly packages.

In this blog post, we’ll illustrate how the classfilterdata() policy function works. However, if you want a more real-world example, you should check out The agent is in - Episode 51 - Data-Driven Configuration with classfilterdata() by Jay Goldberg from Two Sigma.

The classfilterdata() policy function takes two or three arguments: a reference to a data container (or inline JSON), the type of data structure, and, optionally, the key or index of the class expression within the data structure.

Here is a tarball containing the files used in this blog post: classfilterdata.tar.gz.

Array of arrays

Let’s start off with the first data structure, "array_of_arrays", which is reminiscent of its precursor classfiltercsv(). The major difference is that classfilterdata() takes a data container as its first argument rather than the path to a CSV file. This gives you the flexibility to parse the data container from various formats using readjson(), readyaml(), readcsv(), readenvfile(), readdata(), etc.

Given the following JSON file containing the inventory of a company, let’s say you want to filter it based on the department.

array_of_arrays/my_data.json
[
  ["ID", "Name", "Description", "Quantity", "Department"],
  [1, "Laptop", "MacBook Pro", 13, "sales"],
  [2, "Phone", "The Fairphone (Gen. 6)", 5, "workshop"],
  [3, "Monitor", "DELL P2419HC", 7, "sales"],
  [4, "Printer", "HP Printer", 1, "marketing"]
]

In policy, you can parse the JSON file with the readjson() policy function and then filter it with classfilterdata(), here based on the sales class that we define. The index "4" tells the policy function which column holds the class expression. Counting from 0, that is the fifth column, Department.

array_of_arrays/my_policy.cf
bundle agent array_of_arrays
{
  classes:
    "sales";

  vars:
    "raw"
      data => readjson("$(this.promise_dirname)/my_data.json");
    "filtered"
      data => classfilterdata("@(raw)", "array_of_arrays", "4");

  reports:
    "Inventory for sales $(with)"
      with => storejson("@(filtered)");
}

In the end, we convert the data container back to a JSON string and print it in a reports promise. From the output, you can see that only entries from the sales department are present.

R: Inventory for sales [
  [
    1,
    "Laptop",
    "MacBook Pro",
    13,
    "sales"
  ],
  [
    3,
    "Monitor",
    "DELL P2419HC",
    7,
    "sales"
  ]
]
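
Because the first argument is a data container, the same filter should work on data parsed from another format. The following is a minimal sketch, assuming a hypothetical my_data.csv that holds the same rows as the JSON file above. The file name and bundle name are illustrative and not part of the tarball.

```cf3
bundle agent array_of_arrays_csv
{
  classes:
    "sales";

  vars:
    # Assumption: my_data.csv is a hypothetical CSV copy of my_data.json
    "raw"
      data => readcsv("$(this.promise_dirname)/my_data.csv");
    # Column 4 (counting from 0) holds the class expression
    "filtered"
      data => classfilterdata("@(raw)", "array_of_arrays", "4");

  reports:
    "Inventory for sales $(with)"
      with => storejson("@(filtered)");
}
```

Only the function that parses the file changes. The call to classfilterdata() stays the same as in the JSON example.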

Array of objects

The ability to support more complex data structures, such as "array_of_objects", is what makes classfilterdata() so much more powerful than its predecessor (i.e., classfiltercsv()). Imagine that you are the system administrator of developer machines, where you are responsible for configuring and installing the default editor for each developer. You’re handed a JSON file containing their usernames and preferred editor.

array_of_objects/my_data.json
[
  {
    "username": "larsewi",
    "editor": "vim"
  },
  {
    "username": "nickanderson",
    "editor": "emacs"
  },
  {
    "username": "olehermanse",
    "editor": "vim"
  },
  {
    "username": "craigcomstock",
    "editor": "ed"
  }
]

The policy for selecting the right editors based on the users (defined as classes) can be written as follows. Notice that we no longer use an index to specify the class expression. The key difference (pun intended) is that we need a key to access the child element rather than an index, because the ordering of key-value pairs in a JSON object is arbitrary. Hence, we specify the key of the desired class expression, which in our case is the username.

array_of_objects/my_policy.cf
bundle agent array_of_objects
{
  classes:
    "nickanderson";

  vars:
    "raw"
      data => readjson("$(this.promise_dirname)/my_data.json");
    "filtered"
      data => classfilterdata("@(raw)", "array_of_objects", "username");

  reports:
    "Required editors $(with)"
      with => storejson("@(filtered)");
}

In this example, the username is hardcoded, but it could, for example, be parsed from /etc/passwd. The output would be:

R: Required editors [
  {
    "editor": "emacs",
    "username": "nickanderson"
  }
]
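
To act on the filtered result rather than just print it, you can iterate over the entries that survived the filter. The sketch below is one possible way to do that, assuming the same my_data.json as above. The bundle name and the variable "i" are illustrative, and the reports promise stands in for whatever promise would actually install the editor.

```cf3
bundle agent array_of_objects_report
{
  classes:
    "nickanderson";

  vars:
    "raw"
      data => readjson("$(this.promise_dirname)/my_data.json");
    "filtered"
      data => classfilterdata("@(raw)", "array_of_objects", "username");
    # Indices of the entries that remain after filtering
    "i"
      slist => getindices("filtered");

  reports:
    # Evaluated once per remaining entry
    "$(filtered[$(i)][username]) prefers $(filtered[$(i)][editor])";
}
```

This produces one report per remaining entry, so with only the nickanderson class defined there is a single entry to act on.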

Object of arrays

So far, we’ve only seen arrays as the root element in the data structure. However, an object can also be used. Using an object as the root element comes with an extra perk. You can either specify the key or an index of the child element as the third argument, or you can simply omit it, causing the key of the child element itself to be the class expression. Let’s illustrate "object_of_arrays" with the following data:

object_of_arrays/my_data.json
{
  "debian": ["bison", "flex", "libacl1"],
  "debian.dev": ["libbison-dev", "libacl1-dev"],
  "redhat": ["byacc", "flex"],
  "redhat.dev": ["flex-devel"]
}

The drawback of using the key to the child element as the class expression is that you can never list the exact same class expression twice, since a key has to be unique.

object_of_arrays/my_policy.cf
bundle agent object_of_arrays
{
  vars:
    "raw"
      data => readjson("$(this.promise_dirname)/my_data.json");
    "filtered"
      data => classfilterdata("@(raw)", "object_of_arrays");

  reports:
    "Packages to be installed $(with)"
      with => storejson("@(filtered)");
}

If we run the policy on an Ubuntu machine, where the debian class is defined but the dev class is not, we would get the following output:

R: Packages to be installed {
  "debian": [
    "bison",
    "flex",
    "libacl1"
  ]
}
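
If the same class expression needs to appear more than once, you can pass an index as the third argument instead, as described above. The sketch below assumes hypothetical data in which element 0 of each child array is the class expression. The keys and the bundle name are made up for illustration.

```cf3
bundle agent object_of_arrays_indexed
{
  vars:
    # Hypothetical data: element 0 of each child array is the class expression
    "raw"
      data => parsejson('{
        "deb_build": ["debian", "bison", "flex"],
        "deb_acl":   ["debian", "libacl1"],
        "rpm_build": ["redhat", "byacc", "flex"]
      }');
    # The third argument is the index of the class expression in each child
    "filtered"
      data => classfilterdata("@(raw)", "object_of_arrays", "0");

  reports:
    "Package groups $(with)"
      with => storejson("@(filtered)");
}
```

Here two entries share the debian class expression, which is not possible when the key itself is the class expression.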

For more information on classes and class expressions, see the CFEngine documentation.

Object of objects

The last supported data structure is "object_of_objects". Here, too, you can specify a key in the child object, or omit it to use the key of the child object itself. Let’s demonstrate the former with the following YAML file:

# Files and permissions

/etc/chrony.conf:
  mode: 644
  owner: root
  group: root
  if: redhat

/etc/apt/apt.conf:
  mode: 644
  owner: root
  group: root
  if: debian|ubuntu

Now, let’s filter the data based on the value of the "if" key:

bundle agent object_of_objects
{
  vars:
    "raw"
      data => readyaml("$(this.promise_dirname)/my_data.yaml");
    "filtered"
      data => classfilterdata("@(raw)", "object_of_objects", "if");

  reports:
    "File permissions $(with)"
      with => storejson("@(filtered)");
}

Unfortunately, there is no policy function to convert data containers back to YAML, so we’ll use JSON instead.

R: File permissions {
  "/etc/apt/apt.conf": {
    "group": "root",
    "if": "debian|ubuntu",
    "mode": 644,
    "owner": "root"
  }
}
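
The other variant, omitting the third argument, could look like the sketch below. It assumes hypothetical data in which each top-level key is itself a class expression. The "package_manager" key and the bundle name are made up for illustration.

```cf3
bundle agent object_of_objects_by_key
{
  vars:
    # Hypothetical data: each top-level key is a class expression
    "raw"
      data => parsejson('{
        "redhat":        { "package_manager": "dnf" },
        "debian|ubuntu": { "package_manager": "apt" }
      }');
    # No third argument: the key of each child object is the class expression
    "filtered"
      data => classfilterdata("@(raw)", "object_of_objects");

  reports:
    "Package manager $(with)"
      with => storejson("@(filtered)");
}
```

As with "object_of_arrays", the trade-off is that each class expression can only appear once, since keys must be unique.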

Auto

If you’re lazy, or if you’re writing a more generic bundle, you can tell CFEngine to automatically detect the data structure by using "auto" instead of explicitly specifying the data structure. Furthermore, "auto" lets you mix the different supported data structures. Take this data as an example:

{
  "array": ["foo", "bar", "baz"],
  "object": { "0": "bogus", "1": "any", "2": "doofus" }
}

Here we have an object containing both an array and an object. Any of the “more specific” data structures from before would cause the policy function to fail. However, for "auto", the interpretation is less strict. Notice that the third argument must be something that can be interpreted both as a key in the child objects and as an index in the child arrays.

bundle agent auto
{
  classes:
    "bar";

  vars:
    "raw"
      data => readjson("$(this.promise_dirname)/my_data.json");
    "filtered"
      data => classfilterdata("@(raw)", "auto", "1");

  reports:
    "Filtered $(with)"
      with => storejson("@(filtered)");
}

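Both children pass the filter: element 1 of the array is "bar", which we defined as a class, and key "1" of the object is "any", which is always defined. The filtered output from the hybrid data structure would be as follows: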

R: Filtered {
  "array": [
    "foo",
    "bar",
    "baz"
  ],
  "object": {
    "0": "bogus",
    "1": "any",
    "2": "doofus"
  }
}

You are now an expert user of classfilterdata(). If you have any cool use cases for this policy function, or if you have any questions about CFEngine in general, let us know at GitHub Discussions. We are eager to hear from you.