Introduction

In the CFEngine Core team, we have recently been working on a fix for our WaitForCriticalSection() function. In short, the function checks a timestamp in a chunk of (lock) data stored in a local LMDB database and if the timestamp is too old, it writes a new chunk of (lock) data with the new timestamp. However, this used to be done in separate steps - read the data from the DB and close DB, check the data and potentially write the new data into the DB. So, there was a race condition because if multiple processes did the same steps at the same time, they could have read and checked the same timestamp value and then write their own data with their new timestamps one after another. On the high-level perspective that meant multiple processes could have entered the critical section at the same time.

To fix the issue, we needed a function that would read, check and potentially write data into an LMDB database in one transaction. That way, multiple processes trying to enter the critical section at the same time would do so one after another. To decide whether the current value should be overwritten with a new value, the new function, called OverwriteDB() (complementary to ReadDB() and WriteDB()), takes as one of its arguments a pointer to a function that gets the current value and returns true (overwrite) or false (pass). Everything was working fine with this new function. And then, in our CI, we started getting hit by a (SIG)BUS. Tests started failing and hanging on our Solaris SPARC64 and HP-UX IA64 machines and investigation revealed that cf-agent processes in the tests were sometimes killed with the SIGBUS signal when there were multiple such processes running at the same time. And fighting over the critical section, we figured.

Bad alignment

When we tried to debug the issue using truss (Solaris alternative of strace), we saw this:

16422:  open("/var/cfengine/state/cf_lock.lmdb", O_RDWR|O_CREAT, 0644) = 10
16422:  pread(10, "\0\0\0\0\0\0\0\b\0\0\0\0".., 92, 0)  = 92
16422:  pread(10, "\0\0\001\0\0\0\b\0\0\0\0".., 92, 8192) = 92
16422:  mmap(0x00000000, 104857600, PROT_READ, MAP_SHARED, 10, 0) = 0xF8000000
16422:  open("/var/cfengine/state/cf_lock.lmdb", O_WRONLY|O_DSYNC) = 11
16422:  fcntl(11, F_GETFD, 0x00000000)          = 0
16422:  fcntl(11, F_SETFD, 0x00000001)          = 0
16422:  fcntl(9, F_SETLK, 0xFFBFA63C)           = 0
16422:  close(8)                    = 0
16422:  time()                      = 1591194272
16422:  getpid()                    = 16422 [16343]
16422:  time()                      = 1591194272
16422:  time()                      = 1591194272
16422:  lwp_mutex_timedlock(0xFE860040, 0x00000000) = 0
16422:  time()                      = 1591194272
16422:      Incurred fault #5, FLTACCESS  %pc = 0xFF28D24C
16422:        siginfo: SIGBUS BUS_ADRALN addr=0xF800B426
16422:      Received signal #10, SIGBUS [caught]
16422:        siginfo: SIGBUS BUS_ADRALN addr=0xF800B426

So the process used mmap() to map a file to its memory address space starting at the address 0xF8000000 and then when it tried to access the address 0xF800B426 it was killed with the SIGBUS signal indicating that there was an invalid address alignment (BUS_ADRALN).

This was the first time we saw an issue like this. Memories from the university had some traces of alignment and alignment issues, but not too strong. And actually, it turns out that you will not get this error on x86 CPUs and some other architectures, at least not in a similar scenario. That was why everything worked on most of our machines except for Solaris 10 and 11 and HP-UX machines which use SPARC64 and IA64 CPUs, respectively.

Alignment error example

It is simple to write code that triggers this behavior:

#include <stdio.h>

int main() {
    int nums[2] = {1, 2};

    unsigned char *bytes = (unsigned char *) nums;

    bytes++;
    bytes++;
    int wrong = *((int *)bytes);
    printf ("wrong number: %d\n", wrong);

    return 0;
}

When compiled and run on an x86_64 system, it produces this output:

wrong number: 131072

but on our Solaris 10 SPARC64 machine, it’s not so lucky:

Bus Error (core dumped)

The problem is that when trying to load an integer from an address that is 2 bytes after where the nums array starts, there is an alignment problem. The nums array is properly aligned (otherwise it wouldn’t be accessible at all!) and thus its address incremented by 2 is not divisible by 4 (sizeof(int)). On x86_64, this is not a problem, the address is in the address space of the process and so the 4 bytes from the middle of the nums array are interpreted as an int (why 131072 is left as an exercise to the reader).

But wait! CFEngine code is (hopefully) not doing anything crazy and weird like that, right?! It’s not. However, that’s where mmap() comes into the game.

Alignment and mmap()

Let’s have a look at another, this time a bit longer, example:

#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>

struct Data {
    int num;
    char code[2];
};

int main () {
    char chars[2] = {'a', 'b'};
    struct Data data = { 1, { 'c', 'd' }};

    printf("chars (%p): %c, %c\n", chars, chars[0], chars[1]);
    printf("data  (%p): {%d, {%c, %c}\n", &data, data.num, data.code[0], data.code[1]);


    int fd = open("/tmp/file.dat", O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IWUSR);
    ssize_t written = write(fd, chars, sizeof(chars));
    if (written != sizeof(chars)) {
        printf("Failed to write 'chars': %m\n");
        return 1;
    }
    written = write(fd, &data, sizeof(data));
    if (written != sizeof(data)) {
        printf("Failed to write 'chars: %m'\n");
        return 1;
    }
    close(fd);

    fd = open("/tmp/file.dat", O_RDONLY);
    const size_t total_size = sizeof(chars) + sizeof(data);
    unsigned char buf[total_size];
    ssize_t n_read = read(fd, buf, total_size);
    if (n_read != total_size) {
        printf("Failed to read data back: %m\n");
        return 1;
    }
    close(fd);

    printf("File contents:");
    for (size_t i = 0; i < total_size; i++)
        printf(" %x", buf[i]);
    printf("\n");


    fd = open("/tmp/file.dat", O_RDONLY);
    struct stat sb = {0};
    fstat(fd, &sb);
    unsigned char *addr = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);

    char *chars_in_file = addr;
    struct Data *data_in_file = (struct Data *) (addr + sizeof(chars));

    printf("chars_in_file (%p): %c, %c\n", chars_in_file, chars_in_file[0], chars_in_file[1]);
    printf("data_in_file  (%p): {%d, {%c, %c}\n", data_in_file, data_in_file->num, data_in_file->code[0], data_in_file->code[1]);

    return 0;
}

The output on my x86_64 system is as one would expect:

chars (0x7ffd0f83a0de): a, b
data  (0x7ffd0f83a0d4): {1, {c, d}
File contents: 61 62 1 0 0 0 63 64 0 0
chars_in_file (0x7f0665784000): a, b
data_in_file  (0x7f0665784002): {1, {c, d}

and we can see multiple interesting things. chars and data are stored on stack which, of course, grows down (in the address space). So the address of data is lower than the address of chars. We can also check that the address of chars (0x7ffd0f83a0de) is divisible by 2, but not by 4. The address of data (0x7ffd0f83a0d4), however, is divisible by 4. Which is good because it is a struct Data variable which starts with an int that is 4 bytes big. The difference between the addresses is 10 so there is padding between the two variables on the stack (data is only 6 bytes big).

If we write both chars and data into a file one right after the other and then read the bytes back, we can see that there are the two char bytes for 'a' (61) and 'b' (62) followed by 4 bytes of the little-endian representation of int 1 followed by char bytes for 'b' and 'c' followed by two zero bytes. Where do these two zero bytes come from? They are padding, of course. The struct Data structure is 6 bytes big (4 bytes for the int num + 2 bytes for char code[2]) so it is padded with 2 zero bytes to be 8 bytes, which is a round (binary) number.

When the file is mapped to the address space of the process using mmap(), pointers to the particular offsets (0 and 2) of the address segment can be created to get access to the copies of chars and data in the file. This time, however, the copy of data (at address 0x7f0665784002) follows immediately after the copy of chars (at address 0x7f0665784000).

Any guesses what happens with the above code on SPARC64 Solaris 10 machine?

chars (ffbffbc0): a, b
data (ffbffbb8): {1, {c, d}
File contents: 61 62 0 0 0 1 63 64 0 0
chars_in_file (ff1e0000): a, b
Bus Error (core dumped)

(Also note that SPARC64 is big-endian, the 4B int 1 is stored as 0 0 0 1.)

And the output from truss sheds some more light:

Received signal #10, SIGBUS [default]
  siginfo: SIGBUS BUS_ADRALN addr=0xFF1E0002

Trying to access the copy of data in the file results in the process being killed with SIGBUS due to an invalid alignment. Of course, the address 0xFF1E0002 is not divisible by 4 and so trying to load the int in the struct attempts to do a non-aligned access.

Getting from under the (SIG)BUS

And the above is what was happening in our tests with the changes that where fixing the WaitForCriticalSection() function. Or, to be more precise, in the OverwriteDB() function, where the data from the mdb_get() function (getting a value from LMDB) was directly passed to the function making the decision about whether to overwrite the data with the new value or not. So while making the decision the function did an unaligned access to the data passed to it which was actually a pointer to a memory segment where the LMDB DB file was mapped before with mmap(). And of course, this didn’t happen always, because the position of the data depended on the contents of the DB file.

The fix was, as usually, almost trivial compared to the complexity of the investigation and debugging. All that was needed was to create a local copy of the data from the mapped segment using the memcpy() function which makes sure the copied data is accessed using aligned addresses. And the destination was a local variable on the stack, which is always properly aligned (by the compiler) for direct access.

Getting out from under a SIGBUS BUS_ADRALN on Solaris/HP-UX

Introduction

Bad alignment

Alignment error example

Alignment and mmap()

Getting from under the (SIG)BUS

Try CFEngine Enterprise for free