0. Why?
CFEngine Enterprise collects very detailed, real-time information about the configuration of your IT infrastructure. Splunk is an excellent enterprise search engine, engineered for speed, robustness, and scalability. In this guide you will learn how to access CFEngine’s logs, what information they capture, and how Splunk’s “rex” command and other neat tools make that information accessible, letting you see your CFEngine data in new ways.
We’ll use CFEngine Enterprise in these examples, but you can easily adapt them to CFEngine Community yourself, or look at https://cfengine.com/manuals_files/SpecialTopic_Reporting.pdf for ideas and inspiration.
1. Install
You need the Splunk server installed on an indexer machine. For small installations this is also the search server. We’ll call this machine “indexer01” in this guide. All the work in this section will be done on a forwarder machine called “this-host-name”, which sends all the log data to indexer01 but doesn’t do any searching.
Typically you’ll have the lightweight Splunk forwarder installed in /opt/splunkforwarder. You’ll probably have /opt/splunkforwarder/etc/system/local/inputs.conf looking like this:
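A minimal sketch follows; the `cfengine_log` sourcetype name is our own choice, and the monitored path is an assumption based on the log locations used later in this guide, so adjust both for your site:

```ini
# Monitor all CFEngine promise logs, wherever they live under the
# CFEngine work directory. The sourcetype name is an assumption.
[monitor:///var/cfengine/*/cf_*.log]
disabled = false
sourcetype = cfengine_log
```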
This is pretty simple so far. The /opt/splunkforwarder/etc/system/local/outputs.conf file is not much worse:
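A minimal sketch, assuming the indexer listens on 9997 (Splunk’s conventional receiving port); the group name is arbitrary:

```ini
# Send everything to indexer01. Port 9997 is an assumption;
# use whatever receiving port your indexer is configured with.
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = indexer01:9997
```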
Note that both inputs.conf and outputs.conf are very bare. You should consult with your local Splunk administrator if you suspect you may have the Splunk forwarder already installed.
2. Verify
The first search you will do checks that you’re finding CFEngine logs on a specific host:

host="this-host-name" source=/var/cfengine/*/cf_*.log
Limit it to the last 24 hours. You should see at least a couple of lines. If you don’t, check that the forwarder is working. The file /opt/splunkforwarder/var/log/splunk/splunkd.log may be useful for debugging. From now on we won’t specify the host name, so you’ll be searching across your whole environment.
See the wildcards “*”? Those ensure we’ll see logs that are in different directories too. Later searches will filter further to ensure only matching data is found, but using the source search parameter will at least speed up the search by limiting it to a few files instead of everything in the index. You could also set up a separate index for CFEngine logs, but that’s far beyond the scope of this guide.
3. Splunk away
We’ll start building a Splunk query and gradually expand it. The first task is field extraction. Let’s pull out the repair date. We’ll ignore it from now on, since Splunk captures the event arrival date, but we could use it to tag events more precisely. That should be done at the forwarder, though, and requires modifying inputs.conf and a few other files, so let’s skip that for now and just go with Splunk’s idea of event date.
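As a sketch, assuming the logs carry a ctime-style human-readable timestamp followed by “->” (the exact format varies by CFEngine version, so you may need to adjust the regex):

```
source=/var/cfengine/*/cf_repair.log "->"
| rex "^(?<repairdate>\w+ \w+ +\d+ [\d:]+ \d{4}) ->"
```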
Let’s take a look at the cf_notkept.log file, which will tell us what promises were not kept.
We can extract the useful data similarly to how we did it for repaired promises.
Here we grab the promise type and the reason the promise was broken. Can we find the top reasons or failing promise types by host? Sure!
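Here is a hedged sketch of such a search; the regex and field names are assumptions about the cf_notkept.log line format and will likely need tuning against your actual log lines:

```
source=/var/cfengine/*/cf_notkept.log "->"
| rex "-> (?<promisetype>\S+) promise not kept.*because: (?<reason>.+)$"
| search reason="*"
| top limit=10 reason, promisetype by host
```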
4. Details, details
We’ll look at command promises next. From cf_repair.log we’ll extract the command (“promiser”) and the “repairstatus” which is usually “succeeded”. We also grab the “promisetype” which tags the command. The base search is for “->” to cut out data we don’t need.
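A sketch of that search, with the regex written against an assumed line format (check a few real cf_repair.log lines and adjust):

```
source=/var/cfengine/*/cf_repair.log "->"
| rex "-> (?<promisetype>\S+): command '(?<promiser>[^']+)' (?<repairstatus>\w+)"
| search promiser="*"
```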
This search will tag the fields, which you can use for filtering. So for instance you could say “I only want promisetype=internal_promise” which happens to be for internal CFEngine use. Or you could chart the number of repaired promises by host, over time. It’s easy to explore from that point on.
Also see how we do search promiser="*" at the end? That ensures we only get results with the “promiser” field defined.
But what really happened in that magic “rex” command? It tells Splunk to extract fields from each line found. That’s actually pretty expensive, so you’re better off doing the field extraction at the forwarder, but let’s assume you’re not ready for that sort of commitment.
The regular expression syntax is PCRE (Perl-compatible), using named captures to tag each field. It’s fairly ugly and hard-coded in order to extract useful data out of each text line. This is unfortunately due to the default format of CFEngine’s logs, which is meant for human consumption.
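To make the named-capture idea concrete outside Splunk, here is the same technique in Python. The log line below is made up for illustration (the real cf_repair.log format may differ), and Python spells a named capture `(?P<name>...)` where Splunk’s rex accepts `(?<name>...)`:

```python
import re

# Hypothetical log line; the real cf_repair.log format may differ.
line = "Tue Apr 30 16:07:59 2013 -> Executed command '/bin/echo ok' succeeded"

# Named captures tag each field, just like Splunk's rex does.
pattern = re.compile(
    r"^(?P<repairdate>\w+ \w+ +\d+ [\d:]+ \d{4}) -> "
    r"Executed command '(?P<promiser>[^']+)' (?P<repairstatus>\w+)$"
)

m = pattern.match(line)
if m:
    print(m.group("promiser"))       # /bin/echo ok
    print(m.group("repairstatus"))   # succeeded
```

The same expensive matching happens for every event when you run rex at search time, which is why pushing extraction to the forwarder pays off on large indexes.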
Let’s look for other types of repaired promises. File permissions:
Here we extract the file name (“promiser”) and the old and new permissions.
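A hedged sketch, assuming a line format along the lines of “Object '...' had permission NNN, changed it to NNN”; verify against your logs:

```
source=/var/cfengine/*/cf_repair.log "->"
| rex "-> Object '(?<promiser>[^']+)' had permission (?<oldperm>\d+), changed it to (?<newperm>\d+)"
| search promiser="*"
```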
File deletions are captured similarly:
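A sketch under the same caveat, keying on an assumed “Deleted file” message:

```
source=/var/cfengine/*/cf_repair.log "Deleted"
| rex "-> Deleted file '?(?<promiser>[^' ]+)'?"
| search promiser="*"
```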
And let’s grab file copies too:
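Another sketch, assuming an “Updated file ... from source ...” message; the `copysource` field name is our own:

```
source=/var/cfengine/*/cf_repair.log "Updated"
| rex "-> Updated file '?(?<promiser>[^' ]+)'? from source '?(?<copysource>[^' ]+)'?"
| search promiser="*"
```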
Reports are easy:
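A sketch, assuming report output keeps CFEngine’s usual “R:” prefix when it lands in the log (check your own lines first):

```
source=/var/cfengine/*/cf_repair.log "R:"
| rex "R: (?<report>.+)$"
```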
Processes require two searches, one for “signalled ‘SIGNAL’” and the other for “Making a one-time restart promise”:
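Sketches of both, with the regexes written against assumed message formats; the `signal` and `pid` field names are our own:

```
source=/var/cfengine/*/cf_repair.log "signalled"
| rex "signalled '(?<signal>\w+)' to process (?<pid>\d+)"
```

```
source=/var/cfengine/*/cf_repair.log "Making a one-time restart promise"
| rex "restart promise for '?(?<promiser>[^' ]+)'?"
```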
5. Aggregation
You will probably want to take all these searches and make a report out of them. As I mentioned, these searches are awkward because the logs are intended for people. But if you need to, use the “append” command Splunk provides. Here’s an example:
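As a sketch, the two process searches from the previous section can be glued together with append and a subsearch (the regexes remain assumptions about the log format):

```
source=/var/cfengine/*/cf_repair.log "signalled"
| rex "signalled '(?<signal>\w+)' to process (?<pid>\d+)"
| append
    [search source=/var/cfengine/*/cf_repair.log "Making a one-time restart promise"
     | rex "restart promise for '?(?<promiser>[^' ]+)'?"]
```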
It’s not complicated, just tedious, and s-l-o-w (you have to wait for the two searches to complete in series, not in parallel). If you find yourself doing this kind of search aggregation, you should look into parsing the fields out at the forwarder and other Splunk tricks.