Posts by author: Vratislav Podzimek

Efficient data/file copying on modern Linux

Editing and copying large files or large numbers of files is slow. For a configuration management tool, it is probably one of the slowest things we do, apart from waiting for other programs to finish or waiting for network communication. In this blog post, we look at how to copy files, focusing on the most performant approaches available on modern Linux systems. We are working on implementing these techniques so that CFEngine and all your policy will copy files more efficiently.
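
The excerpt does not name the specific techniques, but one kernel facility available for this on modern Linux is copy_file_range(2), which copies data between two file descriptors without routing it through user-space buffers. A minimal Python sketch with made-up paths:

```python
import os

def copy_file(src_path, dst_path, chunk_size=64 * 1024 * 1024):
    """Copy a file with copy_file_range(2) (Linux 4.5+, Python 3.8+).

    The kernel moves the data directly between the two file descriptors,
    avoiding read()/write() round trips through user-space buffers.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            copied = os.copy_file_range(src.fileno(), dst.fileno(),
                                        min(chunk_size, remaining))
            if copied == 0:  # unexpected end of input
                break
            remaining -= copied

copy_file("/tmp/source.bin", "/tmp/destination.bin")
```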

May 15, 2024

libntech 1.0: now available to more projects

The license of our in-house C utility and compatibility library libntech was recently changed from GPLv3 to the Apache License Version 2.0, which makes the library suitable for more projects thanks to the more permissive license. While GPLv3 practically required any project using libntech to be licensed under GPLv3 as well, the Apache License v2.0 allows any open source as well as proprietary software to utilize our utility library, as long as the copyright attributions are kept.

October 12, 2023

Processes, forks and executions - part 2

This is the second blog post in a short series about processes on UNIX-like systems. It is a followup to the previous post which focused on basic definitions, creation of processes and relations between them. This time we analyze the semantics of two closely related system calls that play major roles in process creation and program execution: fork() and exec(). UNIX-based operating systems provide the fork() system call to create a clone of an existing process and the execve() system call to start executing a program in a process. Windows, on the other hand, provides the CreateProcess() function, which starts a given program in a newly created process. Why are UNIX-based systems doing things in a more complicated way? There are many reasons for that, some simply historical, as described in The Evolution of the Unix Time-sharing System:
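
To make the fork()/exec() split concrete, here is a minimal sketch (not taken from the post itself) that clones the current process and replaces the child's program image, using Python's thin wrappers around the same system calls:

```python
import os

pid = os.fork()                      # clone the calling process; returns twice
if pid == 0:
    # Child: replace this process image with a new program
    # (os.execvp is a thin wrapper around execve(2) with PATH lookup).
    try:
        os.execvp("ls", ["ls", "-l", "/tmp"])
    except OSError:
        os._exit(127)                # exec failed; exit without returning
else:
    # Parent: wait for the child and inspect its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        print("child exited with", os.WEXITSTATUS(status))
```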

July 28, 2022

Processes, forks and executions - part 1

While working on the integration of CFEngine Build into Mission Portal we came to the point where we needed to start executing separate tools from our recently added daemon - cf-reactor. Although it may seem like nothing special, knowing a bit about the specifics of process creation and program execution (and having fought some really hard-to-solve bugs in the past), we spent a lot of time and effort on this step. Now we want to share the story and the results of that effort. However, since understanding the reasons behind the work, together with how the implementation works, requires quite deep knowledge of how processes are created and programs are started on UNIX-like systems, we first start with a series of blog posts focused on this seemingly simple area. They cover the basics as well as some advanced topics in two parts:

July 26, 2022

Synchronize data between PostgreSQL and files

Databases are great for data processing and storage. However, in many cases it is better or easier to work with data in files on a file system; some tools cannot even access the data in any other way. When a database (DB) is created in a database management system (DBMS) using a file system as its data storage, it of course uses files on the given file system to store the data. But working with those files outside of the DBMS, even for read-only access to the data stored in the DB, is practically impossible. So what can be done if some setup requires data in files while, at the same time, the data processing and storage requires the use of a DB(MS)? The answer is synchronization between two storage places – a DB and files. It can either go from the DB to the files, where the files are then treated as read-only by the parties working with the data, or with modifications of the files being synchronized to the DB. In the former setup, the DB is the single source of truth – the data in the files may be out of sync, but the DB has the up-to-date version. In the latter setup, the DB provides a backup or alternative read-only access to the data that is primarily stored in the files, or the files provide an alternative write-only access to the DB. A two-way synchronization, and thus a combination of read and write access in both places, the DB and the files, should be avoided because it is very hard (one could even say impossible) to properly implement mechanisms ensuring data consistency, both between the two storage places and within each of them alone.
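
As a rough illustration of the first direction (the DB as the single source of truth), the following sketch dumps rows from a PostgreSQL table into files using psycopg2; the table and column names are made up for the example and are not from the post:

```python
import os
import psycopg2  # third-party PostgreSQL driver, assumed to be available

def sync_db_to_files(dsn, out_dir):
    """One-way sync: the DB is the single source of truth and the
    files are a read-only projection of its contents."""
    os.makedirs(out_dir, exist_ok=True)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Hypothetical table: documents(name TEXT PRIMARY KEY, content TEXT)
        cur.execute("SELECT name, content FROM documents")
        for name, content in cur:
            with open(os.path.join(out_dir, name), "w") as f:
                f.write(content)

sync_db_to_files("dbname=example user=example", "/srv/documents")
```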

April 6, 2022

Trigger arbitrary code from PostgreSQL

In this blog post we show how it is possible to run an arbitrary program or script, or execute arbitrary code, in reaction to changes and, more generally, events in a PostgreSQL database. Database management systems (DBMS) provide mechanisms for defining reactions to certain actions or, in other words, for defining that specific actions should trigger specific reactions. PostgreSQL, the DBMS used by CFEngine Enterprise, is no exception. These triggers can be used for ensuring consistency between tables when changes in one table should be reflected in another table, for recording information about actions, and many other things. PostgreSQL's Overview of Trigger Behavior describes the basics of triggers with the following sentences:
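
One common way to achieve this (an assumption about the approach, not a quote from the post) is a trigger that calls pg_notify(), paired with an external listener process that runs arbitrary code whenever a notification arrives. A minimal psycopg2 sketch with made-up table and channel names:

```python
import select
import psycopg2  # third-party PostgreSQL driver, assumed to be available

# SQL to run once against the database (PostgreSQL 11+ syntax), shown here
# for context only; the table 'items' and channel 'events' are hypothetical.
SETUP_SQL = """
CREATE OR REPLACE FUNCTION notify_insert() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('events', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_notify AFTER INSERT ON items
    FOR EACH ROW EXECUTE FUNCTION notify_insert();
"""

def listen_and_react(dsn):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("LISTEN events;")
    while True:
        # Block until the connection becomes readable, then collect notifications.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timeout, keep waiting
        conn.poll()
        while conn.notifies:
            notification = conn.notifies.pop(0)
            # Run arbitrary code here; the payload carries the new row's id.
            print("row inserted:", notification.payload)

listen_and_react("dbname=example user=example")
```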

March 31, 2022

CFEngine bootstrap with Ansible

CFEngine and Ansible are two complementary infrastructure management tools. Our analysis shows that they can be combined and used side by side, joining forces so that each handles the areas where it works best. Part of infrastructure management is host deployment, either when building a brand new infrastructure or when growing one by adding new hosts. This is something Ansible truly excels in, as it makes it very easy to run a sequence of steps on all hosts to initialize (deploy) them, and it only requires SSH access to the hosts and Python installed on them.

February 3, 2022

Static checking of CFEngine code

Software quality has been a topic and an area of interest since the dawn of software itself, and as software evolved, so did the techniques and approaches to assuring its high quality. Better computers providing more computing power, bigger storage and faster communication have allowed software developers to detect issues in their code sooner and faster. We went from getting a syntax error after two days of waiting for a box of punch cards to go through the queue of boxes and get loaded into a computer running a compiler, to getting such errors from a compiler in seconds or even in real time from the code editor. And we went from bugs being detected by actually seeing real bugs on punch cards with machine instructions to operating systems providing bug reports with coredumps, tracebacks and lots of information helping developers identify the problem, tests detecting problems before the code gets into production, and compilers and tooling detecting them before the code is even executed for the first time. We can afford to do things like fuzz testing, and we have enough computational power for compilers and special tools to analyze the code, check all possible paths through it, and much more. At the same time, software has become a part of almost everything we use or interact with every day, and with the incomparably greater amount of software potentially affecting our lives there is an incomparably greater amount of bugs that need to be detected and fixed, or at least handled gracefully, with some software being more critical than other software, and with bugs ranging from minor annoyances to losses of human lives. Many things have changed in this evolution, but one rule has always been key:

December 9, 2021

CVE-2021-38379 & CVE-2021-36756 - Exported report permissions and certificate checking in Federated Reporting

The CFEngine engineering team has recently discovered two security issues in the CFEngine Enterprise product:

CVE-2021-38379 - Publicly available exported reports
CVE-2021-36756 - Certificate not checked in Federated Reporting

While the latter one (CVE-2021-36756) only affects CFEngine Enterprise deployments using the Federated Reporting functionality, the former one (CVE-2021-38379) affects all deployments running all supported versions of CFEngine Enterprise (and many unsupported versions as well, 3.5 or newer, to be more precise). Both issues were discovered internally during development and testing, and we have no indications of these vulnerabilities being exploited or known outside of the development team.

October 27, 2021

Using CFEngine inventory as Ansible inventory

CFEngine and Ansible are two complementary infrastructure management tools that both work with so-called inventories. However, the common term can be quite confusing because an Ansible Inventory and a CFEngine Inventory are defined and created in very different ways. In the most basic case, an Ansible Inventory is just a file with a list of hosts and groups of hosts that Ansible then manages when fed the inventory file. A CFEngine Inventory, on the other hand, is a database of information about all the hosts in the infrastructure managed by CFEngine, reported by the hosts themselves. In a more complex scenario, an Ansible Inventory can also contain a lot of information about the hosts in the infrastructure, but that information needs to be pulled from somewhere else and given to Ansible. With CFEngine, hosts talk to a CFEngine Hub, pull policy from it and report information back to it. With Ansible, on the other hand, policy is pushed to the hosts from one place, which must therefore have a list of all hosts available in advance, potentially with some extra information (parameters) about the hosts.
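
To illustrate how the two notions can be bridged, Ansible accepts a dynamic inventory script that prints JSON in its documented format; in the sketch below the host data is hard-coded where a real script would query the CFEngine Hub (a sketch under assumptions, not the integration described in the post):

```python
#!/usr/bin/env python3
"""Minimal Ansible dynamic inventory script (sketch).

A real integration would fetch hosts and their attributes from the CFEngine
Hub; here the data is hard-coded so the example stays self-contained.
"""
import argparse
import json

def build_inventory():
    # Hypothetical hosts and attributes as a CFEngine Hub might report them.
    hostvars = {
        "web1.example.com": {"os": "ubuntu_22"},
        "db1.example.com": {"os": "rhel_9"},
    }
    return {
        "all": {"hosts": sorted(hostvars)},
        "_meta": {"hostvars": hostvars},
    }

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args()
    inventory = build_inventory()
    if args.host:
        # Per-host variables, for setups that call the script with --host <name>.
        print(json.dumps(inventory["_meta"]["hostvars"].get(args.host, {})))
    else:
        print(json.dumps(inventory, indent=2))
```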

October 7, 2021