Measuring values extracted from a running log

Posted by Nick Anderson
August 13, 2019

Recently I wanted to start measuring the length of time it took for PostgreSQL to acquire a lock so that I could keep an eye on how it changes over time.

My PostgreSQL logs contain entries like the following that record the amount of time in ms it took to acquire a lock.

2019-06-11 18:49:39 GMT LOG:  process 10427 acquired AccessShareLock on relation 17949 of database 16384 after 1118.396 ms at character 269

Measurement promises store and track values. Values can be sampled from either a pipe (command output) or a file as defined by the promises stream_type. Values are stored according to the promises history_type. When a measured value is not expected to change frequently a history_type of scalar or static will result in the measurement being sampled less frequently and the single value with compressed statistics will be stored. A history_type of log results in the measured value being logged as an infinite time-series in $(sys.statedir)/<PromiseHandle>_measure.log. A history_type of weekly results in the storing of a two-dimensional time average over a weekly period. Measurements with history_type of weekly are automatically graphed in Mission Portal if they are collected.

Measurements in CFEngine have been around for a long time, but 3.10 brought custom measurements from the closed source enterprise code base to the open source core code base.

Let’s review this policy to extract and measure ms PostgreSQL had to wait before acquiring the lock.

bundle monitor measure_psql
{
     measurements:

       am_policy_server::

       "/var/log/postgresql.log"
         handle => "psql_lock_wait_before_acquisition",
         comment => "How long psql had to wait before it acquired a lock",
         stream_type => "file",
         match_value => psql_acquired_AccessShareLock,
         data_type => "int",
         units => "ms",
         history_type => "weekly",
         if => fileexists( "/var/log/postgresql.log" );
}

Measurement promises are evaluated by cf-monitord and must reside within monitor type bundles. Hosts that having the am_policy_server class defined will scan the postgresql.log file, extract the value to measure as defined by body match_value psql_acquired_AccessShareLock and store it in a two-dimensional time average if the file exists.

The values related to this measurement are stored in the mon bundle scope, with names based on the promise handle.

  • mon.value_psql_lock_wait_before_acquisition The last sampled value for the promise with handle psql_lock_wait_before_acquisition.
  • mon.av_psql_lock_wait_before_acquisition The average sampled value for the promise with handle psql_lock_wait_before_acquisition over the stored weekly history.
  • mon.dev_psql_lock_wait_before_acquisition The standard deviation sampled value for the promise with handle psql_lock_wait_before_acquisition over the stored weekly history.

Next, lets review body match_value psql_acquired_AccessShareLock where the details of how the sampled value is extracted from the log file reside.

body match_value psql_acquired_AccessShareLock
# @brief Average time to successfully acquire AccessShareLock from log file
{
   select_line_matching => ".*acquired AccessShareLock.*";
   extraction_regex => ".*after (\d+)";
   track_growing_file => "true";
   select_multiline_policy => "average"; # average, sun, first, last
}

select_line_matching takes an anchored regular expression matching lines that we should consider. extraction_regex is an unanchored regular expression that should contain a single back-reference for extracting the value that should be sampled. track_growing_file allows cf-monitord to remember the position from which it last read in the file so that the same log entries are not counted multiple times. Finally, during each sample there may be multiple log entries recording that a lock was acquired. We use select_multiline_policy to average the extracted values together.

Once the policy has been running I can see the last recorded, average, and standard deviation by running cf-promises --show-vars=psql.

[root@demohub ~]# cf-promises --show-vars=psql
Variable name                            Variable value        Meta tags
default:mon.av_psql_lock_wait_before_acquisition 1640.40       monitoring,source=environment
default:mon.dev_psql_lock_wait_before_acquisition 0.00         monitoring,source=environment
default:mon.value_psql_lock_wait_before_acquisition 5468.00    monitoring,source=environment

In Mission Portal of CFEngine Enterprise the measurements are graphed so they can be visualized over time.