We recently introduced a new policy function, classfilterdata(), which will be
available in the next LTS release of CFEngine, version 3.27. If you can’t wait
for the release, feel free to grab the latest non-LTS build of master from our
nightly packages.
In this blog post, we’ll illustrate how the classfilterdata() policy function
works. However, if you want a more real-world example, you should check out The
agent is in - Episode 51 - Data-Driven Configuration with classfilterdata() by
Jay Goldberg from Two Sigma.
The classfilterdata() policy function takes two or three arguments: a reference
to a data container or inline JSON, the type of data structure, and, optionally,
the key or index of the class expression within the data structure.
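As a quick illustration of the argument order, here is a minimal sketch that passes inline JSON directly as the first argument. The bundle name, the data, and the hardcoded class are made up for this example; the "array_of_arrays" type and the index are explained in the next section.

```cf3
bundle agent inline_json_example
{
  classes:
      # Hypothetical class, hardcoded for the example
      "sales";

  vars:
      # 1st argument: inline JSON instead of a reference like "@(raw)"
      # 2nd argument: the type of data structure
      # 3rd argument: index of the class expression in each child array
      "filtered"
        data => classfilterdata('[["Laptop", "sales"], ["Printer", "marketing"]]',
                                "array_of_arrays", "1");

  reports:
      "Filtered $(with)"
        with => storejson("@(filtered)");
}
```

Assuming inline JSON is treated the same way as a container reference, only the Laptop row should remain, since marketing is not a defined class here.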
Here is a tarball containing the files used in this blog post: classfilterdata.tar.gz.
Array of arrays
Let’s start off with the first data structure, "array_of_arrays", which is
reminiscent of its precursor, classfiltercsv(). The major difference is that
classfilterdata() takes a data container as its first argument rather than the
path to a CSV file. This gives you the flexibility to parse the data container
from various formats using readjson(), readyaml(), readcsv(), readenvfile(),
readdata(), etc.
Given the following JSON file containing the inventory of a company, let’s say you want to filter it based on the department.
[
  ["ID", "Name", "Description", "Quantity", "Department"],
  [1, "Laptop", "MacBook Pro", 13, "sales"],
  [2, "Phone", "The Fairphone (Gen. 6)", 5, "workshop"],
  [3, "Monitor", "DELL P2419HC", 7, "sales"],
  [4, "Printer", "HP Printer", 1, "marketing"]
]
With policy, you can simply parse the JSON file using the readjson() policy
function and then filter it with classfilterdata(), here based on the defined
sales class. The index "4" tells the policy function which column holds the
class expression: in our case the Department column, which is the fifth column
(index 4, counting from 0). Note that the header row is filtered out as well,
since Department is not a defined class.
bundle agent array_of_arrays
{
  classes:
      "sales";

  vars:
      "raw"
        data => readjson("$(this.promise_dirname)/my_data.json");
      "filtered"
        data => classfilterdata("@(raw)", "array_of_arrays", "4");

  reports:
      "Inventory for sales $(with)"
        with => storejson("@(filtered)");
}
In the end, we convert the data container back to a JSON string and print it in a reports promise. From the output, you can see that only entries from the sales department are present.
R: Inventory for sales [
  [
    1,
    "Laptop",
    "MacBook Pro",
    13,
    "sales"
  ],
  [
    3,
    "Monitor",
    "DELL P2419HC",
    7,
    "sales"
  ]
]
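Because the first argument is a data container, the same filter can in principle be applied to other input formats. The following sketch assumes a hypothetical file my_data.csv with the same columns as the JSON file above, parsed with readcsv(). Keep in mind that values read from CSV are strings, so the numeric columns would likely appear quoted in the result.

```cf3
bundle agent array_of_arrays_csv
{
  classes:
      "sales";

  vars:
      # my_data.csv is an assumed file name, not part of the tarball
      "raw"
        data => readcsv("$(this.promise_dirname)/my_data.csv");
      # Same structure type and column index as in the JSON example
      "filtered"
        data => classfilterdata("@(raw)", "array_of_arrays", "4");

  reports:
      "Inventory for sales $(with)"
        with => storejson("@(filtered)");
}
```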
Array of objects
The ability to support more complex data structures, such as
"array_of_objects", is what makes classfilterdata() so much more powerful than
its predecessor, classfiltercsv(). Imagine that you are the system
administrator of developer machines, where you are responsible for configuring
and installing the default editor for each developer. You’re handed a JSON file
containing their usernames and preferred editors.
[
  {
    "username": "larsewi",
    "editor": "vim"
  },
  {
    "username": "nickanderson",
    "editor": "emacs"
  },
  {
    "username": "olehermanse",
    "editor": "vim"
  },
  {
    "username": "craigcomstock",
    "editor": "ed"
  }
]
The policy for selecting the correct editors based on the users (defined as classes) can be written as follows. Notice that we no longer use an index to specify the class expression. The key difference (pun intended) is that we need a key to access the child element rather than an index, because the ordering of key-value pairs in a JSON object is arbitrary. Hence, we specify the key that holds the class expression, which in our case is username.
bundle agent array_of_objects
{
  classes:
      "nickanderson";

  vars:
      "raw"
        data => readjson("$(this.promise_dirname)/my_data.json");
      "filtered"
        data => classfilterdata("@(raw)", "array_of_objects", "username");

  reports:
      "Required editors $(with)"
        with => storejson("@(filtered)");
}
In this example, the username is hardcoded, but it could, for example, be parsed from /etc/passwd. The output would be:
R: Required editors [
  {
    "editor": "emacs",
    "username": "nickanderson"
  }
]
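In practice you would usually act on the filtered entries rather than print them as JSON. A minimal sketch of one way to do that is to iterate over the remaining indices with getindices(). The bundle name and the report text are made up for this example, and a real policy would replace the report with, for example, a packages promise.

```cf3
bundle agent array_of_objects_iterate
{
  classes:
      "nickanderson";

  vars:
      "raw"
        data => readjson("$(this.promise_dirname)/my_data.json");
      "filtered"
        data => classfilterdata("@(raw)", "array_of_objects", "username");
      # Indices of the entries that survived the filter
      "i"
        slist => getindices("filtered");

  reports:
      # One report per remaining entry
      "Editor for $(filtered[$(i)][username]): $(filtered[$(i)][editor])";
}
```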
Object of arrays
So far, we’ve only seen arrays as the root element in the data structure.
However, an object can also be used. Using an object as the root element comes
with an extra perk. You can either specify the key or an index of the child
element as the third argument, or you can simply omit it, causing the key of the
child element itself to be the class expression. Let’s illustrate
"object_of_arrays" with the following data:
{
  "debian": ["bison", "flex", "libacl1"],
  "debian.dev": ["libbison-dev", "libacl1-dev"],
  "redhat": ["byacc", "flex"],
  "redhat.dev": ["flex-devel"]
}
The drawback of using the key to the child element as the class expression is that you can never list the exact same class expression twice, since a key has to be unique.
bundle agent object_of_arrays
{
  vars:
      "raw"
        data => readjson("$(this.promise_dirname)/my_data.json");
      "filtered"
        data => classfilterdata("@(raw)", "object_of_arrays");

  reports:
      "Packages to be installed $(with)"
        with => storejson("@(filtered)");
}
If we run the policy on an Ubuntu machine, we would get the following output:
R: Packages to be installed {
  "debian": [
    "bison",
    "flex",
    "libacl1"
  ]
}
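Only the "debian" entry remains: Ubuntu hosts define the debian class, while "debian.dev" additionally requires a dev class, since "." is the AND operator in class expressions. For more information on classes and class expressions, see the CFEngine documentation.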
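The other option for "object_of_arrays", passing an index as the third argument, avoids the uniqueness limitation because the class expression then lives inside each child array. The sketch below uses made-up inline data, parsed with parsejson(), where index 0 of each child array holds the class expression and index 1 holds a hypothetical description.

```cf3
bundle agent object_of_arrays_index
{
  vars:
      # Hypothetical data: the same class expression appears twice,
      # which is not possible when the key is the class expression
      "raw"
        data => parsejson('{
          "bison": ["debian", "parser generator"],
          "flex":  ["debian", "lexer generator"],
          "byacc": ["redhat", "parser generator"]
        }');
      # "0" selects the first element of each child array
      "filtered"
        data => classfilterdata("@(raw)", "object_of_arrays", "0");

  reports:
      "Packages to be installed $(with)"
        with => storejson("@(filtered)");
}
```

On a Debian-based host, the bison and flex entries would be expected to remain.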
Object of objects
The last supported data structure is "object_of_objects". Here, too, you can
specify a key in the child object, or omit it to use the key of the child object
itself. Let’s demonstrate the former with the following YAML file:
# Files and permissions
/etc/chrony.conf:
  mode: 644
  owner: root
  group: root
  if: redhat
/etc/apt/apt.conf:
  mode: 644
  owner: root
  group: root
  if: debian|ubuntu
Now, let’s filter the data based on the value of the "if" key:
bundle agent object_of_objects
{
  vars:
      "raw"
        data => readyaml("$(this.promise_dirname)/my_data.yaml");
      "filtered"
        data => classfilterdata("@(raw)", "object_of_objects", "if");

  reports:
      "File permissions $(with)"
        with => storejson("@(filtered)");
}
Unfortunately, there is no policy function to convert data containers back to YAML, so we’ll use JSON instead.
R: File permissions {
  "/etc/apt/apt.conf": {
    "group": "root",
    "if": "debian|ubuntu",
    "mode": 644,
    "owner": "root"
  }
}
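The latter variant, omitting the third argument so that the key of each child object is itself the class expression, can be sketched as follows. The inline data and the package_manager key are made up for illustration.

```cf3
bundle agent object_of_objects_by_key
{
  vars:
      # Hypothetical data: each top-level key is a class expression
      "raw"
        data => parsejson('{
          "debian": { "package_manager": "apt" },
          "redhat": { "package_manager": "dnf" }
        }');
      # No third argument: the key of the child object is evaluated
      "filtered"
        data => classfilterdata("@(raw)", "object_of_objects");

  reports:
      "Settings for this host $(with)"
        with => storejson("@(filtered)");
}
```

As with "object_of_arrays", the trade-off is that each class expression can appear only once, because the keys of an object must be unique.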
Auto
If you’re lazy, or you’re, e.g., writing a more generic bundle, you can tell
CFEngine to automatically detect the data structure by using "auto" instead of
explicitly specifying the data structure. Furthermore, "auto" lets you mix the
different supported data structures. Take this data as an example:
{
  "array": ["foo", "bar", "baz"],
  "object": { "0": "bogus", "1": "any", "2": "doofus" }
}
Here we have an object containing both an array and an object. Any of the “more
specific” data structures from before would cause the policy function to fail.
With "auto", however, the interpretation is less strict. Notice that the third
argument must be interpretable both as a key in the child objects and as an
index in the child arrays, which is why the child object uses "0", "1", and "2"
as keys.
bundle agent auto
{
  classes:
      "bar";

  vars:
      "raw"
        data => readjson("$(this.promise_dirname)/my_data.json");
      "filtered"
        data => classfilterdata("@(raw)", "auto", "1");

  reports:
      "Filtered $(with)"
        with => storejson("@(filtered)");
}
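Both entries survive the filter: element 1 of the array is bar, which the policy defines as a class, and the value under key "1" in the object is any, which is always defined. The filtered output from the hybrid data structure is therefore as follows: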
R: Filtered {
  "array": [
    "foo",
    "bar",
    "baz"
  ],
  "object": {
    "0": "bogus",
    "1": "any",
    "2": "doofus"
  }
}
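For the generic-bundle use case mentioned above, one possible shape is a parameterized bundle that leaves both the file format and the data structure to auto-detection. This is only a sketch: the bundle names and parameters are made up, and it assumes that readdata() with the "auto" type picks the parser from the file extension.

```cf3
# Hypothetical reusable bundle: filters any supported file by key or index
bundle agent report_filtered(path, key)
{
  vars:
      # "auto" lets readdata() choose the parser based on the extension
      "raw"
        data => readdata("$(path)", "auto");
      # "auto" lets classfilterdata() detect the data structure
      "filtered"
        data => classfilterdata("@(raw)", "auto", "$(key)");

  reports:
      "Filtered $(path): $(with)"
        with => storejson("@(filtered)");
}

bundle agent main
{
  methods:
      "inventory"
        usebundle => report_filtered("$(this.promise_dirname)/my_data.json", "1");
}
```

The caller then only needs to know where the class expression lives in each child element, not how the file is structured.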
Share your thoughts
You are now an expert user of classfilterdata(). If you have any cool use cases
for this policy function, or you have any questions about CFEngine in general,
let us know at GitHub Discussions. We are eager to hear from you.