Show posts tagged:
developers

Speeding up PostgreSQL ETL pipeline with the help of GODS

Problem to solve

When working on the new Federated Reporting feature for CFEngine, we had to collect data from multiple CFEngine hubs (feeders) on a single hub (superhub). CFEngine hubs store their data in PostgreSQL, so, more specifically, the problem was how to collect data from multiple PostgreSQL databases in one PostgreSQL database. With ~1 GiB of SQL data per feeder hub and, for example, 10 feeders connected to a superhub, the initial and trivial ETL (Extract, Transform, Load) pipeline, pg_dump | gzip | ssh | gunzip | psql, performed really poorly. The problem was in the last stage of the pipeline: importing the data with psql. Reading and writing 10 GiB of data of course takes a while, but we soon realized that I/O speed was not the bottleneck in this case.
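To make the shape of that naive pipeline concrete, here is a minimal sketch of one way it could be wired up, pulling a dump from a single feeder. The function name, the SSH destination and the database names are hypothetical and not taken from the post; only pg_dump, gzip, ssh, gunzip and psql come from the pipeline described above.

```shell
#!/bin/sh
# Stream a compressed dump from one feeder hub into a local database.
# $1: SSH destination of the feeder (hypothetical, e.g. root@feeder1)
# $2: name of the database to dump on the feeder (assumed)
# $3: name of the local database to import into (assumed)
import_feeder_dump() {
    feeder=$1
    src_db=$2
    dst_db=$3
    # Dump and compress on the feeder, then decompress and import locally.
    # The final psql stage is the part the post identifies as slow.
    ssh "$feeder" "pg_dump $src_db | gzip" | gunzip | psql --quiet "$dst_db"
}
```

It would be called as, for instance, `import_feeder_dump root@feeder1 feeder_db superhub_db`, once per feeder.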

September 30, 2019

Restricting CFEngine to one CPU core using Systemd

In some performance-critical situations, it makes sense to limit management software to a single CPU (core). We can do this using systemd and cgroups. CFEngine already provides systemd units on relevant platforms; we just need to tweak them. I’m using CFEngine Enterprise 3.12 on CentOS 7, but the steps should be very similar on other platforms/versions. This post is based on an excellent article from Red Hat: https://access.redhat.com/solutions/1445073

Using ps to check what CPU core is utilized

Listing all processes and their core

We can use ps to check the CPU core for the desired processes:
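A minimal sketch of both steps follows, assuming the procps version of ps (whose psr column reports the processor a process last ran on) and systemd's CPUAffinity= setting. The function names, the drop-in file name and the example unit name in the comments are assumptions for illustration, not necessarily what the post itself uses.

```shell
#!/bin/sh
# Print every process with its PID, the CPU core it last ran on (PSR),
# and its command name. Relies on procps ps, as shipped on CentOS 7.
list_process_cores() {
    ps -eo pid,psr,comm
}

# Write a systemd drop-in that pins a service to a single core.
# $1: drop-in directory, e.g. /etc/systemd/system/cf-execd.service.d
#     (the unit name here is an assumed example)
# $2: core number to pin to (defaults to 0)
write_affinity_dropin() {
    dropin_dir=$1
    core=${2:-0}
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nCPUAffinity=%s\n' "$core" > "$dropin_dir/cpuaffinity.conf"
}
```

Piping `list_process_cores` through `grep cf-` narrows the listing to CFEngine processes. After writing a drop-in, `systemctl daemon-reload` followed by a restart of the unit is needed for the new affinity to take effect.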

August 29, 2018