Processes, forks and executions

While working on the integration of CFEngine Build into Mission Portal, we came to the point where we needed to start executing separate tools from our recently added daemon, cf-reactor. It may seem like nothing special, but knowing a bit about the specifics of process creation and program execution (and having fought some really hard-to-solve bugs in the past), we spent a lot of time and effort on this step. Now we want to share the story and the results of that effort. However, understanding the reasons behind the work, and how the implementation works, requires fairly deep knowledge of how processes are created and programs are started on UNIX-like systems. So we first start with a series of blog posts focused on this seemingly simple area. They cover the basics as well as some advanced topics in two parts:

Process creation and relations between processes
The two-step mechanism of an external program execution

Let’s start with the first part right away; the second part is covered in a follow-up blog post.

Process creation and process relations

Operating systems work with entities called processes, where a process is an instance of a computer program. A program is a set of data and instructions that can be executed. So a program is a static, passive entity: it is stored in some memory and it only has its static data (including initial values of dynamic data). A process, on the other hand, is an active entity: it has a runtime state stored in its address space and CPU registers, and it executes instructions of the program (and dynamically added or generated instructions). The operating system running a process maintains information about it: its identifier, resource counters, address space boundaries, security aspects, open files, etc. While a process is a unique entity with a unique identifier, there can, of course, be multiple processes running the same program. And there can be multiple copies of a program stored in multiple places, but they are all the same program (the same set of data and instructions).

Process creation

Processes are created by operating systems, either because of decisions of the operating systems themselves or as reactions to requests (system calls) made by existing processes. A natural way of thinking is that a process is created when a program starts being executed. However, this is not generally true. On some operating systems, the UNIX-based ones, a new process is created when an existing process forks (clones/replicates) itself. That is, the operating system creates a clone of an existing process that requested it. The new process is really a clone of its parent process, and right after the fork both processes continue executing the instructions that follow the fork() system call. For a program to be started, some process needs to use the execve() system call (usually by means of one of the exec*() functions), which loads the given program and executes it in the process that made the syscall, replacing the program that process was running.
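The two steps described above can be sketched in a few lines of Python, whose os module wraps the same system calls. This is only a minimal illustration, not code from cf-reactor: the helper name run_program and the exit code 127 for a failed exec (a convention borrowed from shells) are our own choices for the example.

```python
import os
import sys


def run_program(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()                    # step 1: the OS clones the calling process
    if pid == 0:
        # Child: still running this Python program until exec replaces it.
        try:
            os.execvp(argv[0], argv)   # step 2: load and run the new program
        except OSError:
            os._exit(127)              # exec failed; exit without cleanup handlers
    # Parent: fork() returned the PID of the new child process.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # child was killed by a signal


if __name__ == "__main__":
    child_code = "import os; print('child PID', os.getpid())"
    code = run_program([sys.executable, "-c", child_code])
    print("parent PID", os.getpid(), "saw exit code", code)
```

Note that the child keeps its PID across the exec call: the process stays the same, only the program it runs changes.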

Process tree

With the way processes are created, they actually form a process tree where every process has one parent process and zero or more child processes. When a child process terminates, it becomes a so-called zombie process and its parent process is responsible for reaping it. To do that, the parent process calls one of the wait*() functions, which wait for a child process to terminate and return information about its run (in particular its exit status). The information about a process is maintained by the operating system until the process is reaped. And so until then it also shows up in process listings/tables/etc.
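To make the zombie state visible, here is a small sketch that assumes a Linux system with /proc mounted. The helper names process_state and zombie_demo are hypothetical, chosen only for this example. The parent uses waitid() with the WNOWAIT flag to learn that the child has terminated without reaping it yet, looks at the state the kernel reports, and only then reaps the child.

```python
import os


def process_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is parenthesised and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=7):
    """Fork a child that exits at once; return (state before reaping, exit code)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)            # child terminates immediately
    # Block until the child has terminated, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state = process_state(pid)         # the child is a zombie at this point
    _, status = os.waitpid(pid, 0)     # reap: the kernel can now forget the child
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    state, code = zombie_demo()
    print("state before reaping:", state)
    print("exit status:", code)
```

Once waitpid() has returned, the /proc entry for the child is gone, which matches the observation that a process shows up in listings only until it is reaped.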

If processes form a tree, what is the root node in the tree and how is this process created? And what if a parent process terminates and is reaped before one of its child processes terminates, breaking the path from the root process to the still-running child process(es)?

Special processes

The answer to both of the above questions is: Not all processes are born equal. As mentioned in the first paragraph of this post, every process has an identifier, the PID. PIDs are assigned to processes when they are created and, typically, they are something like sequential numbers of processes. So when there's a process with PID 12345, one can assume there have been at least 12345 processes started in the system since the system itself was started¹. There are systems with randomized PIDs and various mechanisms for assigning PIDs, but we can ignore those specific cases for now. It then becomes obvious that the root node of the process tree must be a process with PID 0. And that's exactly the case. However, for example on modern Linux systems, the process with PID 0 does not show up in any process table or listing.

As described above, processes on UNIX-based systems (incl. Linux) are created by cloning existing processes. But this is not 100% true, because for the very first process there is, of course, no process to clone it from! The first process, with PID 0, is actually a part of the operating system itself, one of the so-called kernel threads, kernel tasks or kernel processes (further referred to as kernel threads). It is usually called swapper or sched because it is responsible for paging (part of memory management), which is one of the first things necessary for running multiple processes.

The first, let's say, "normal" process is the process with PID 1 – the init process. It is, well…, the initial process and it is responsible for, well…, initializing the system by setting things up and, in particular, creating child processes and starting programs that themselves will create new processes and start other programs to, eventually, bring the operating system and components running on it into a ready-to-use state. One could expect that the init would be the root of the process tree. But looking at the first few lines of output from the ps command on a modern GNU/Linux system, we can see it is not the case:

$ ps -eo pid,ppid,cmd|head -n10
PID    PPID CMD
1       0 /usr/lib/systemd/systemd --switched-root --system --deserialize 31
2       0 [kthreadd]
3       2 [rcu_gp]
4       2 [rcu_par_gp]
5       2 [netns]
7       2 [kworker/0:0H-events_highpri]
9       2 [kworker/0:1H-events_highpri]
10       2 [mm_percpu_wq]
11       2 [rcu_tasks_kthread]

As we can see, there are processes with parent PID (ppid) equal to 0. So their parent process is the hidden very first process with PID 0. The init itself (systemd in this case) has ppid equal to 0 and then there is another such process, [kthreadd]. Its name is in square brackets and it suggests it has something to do with the kernel, threads, and acting as a daemon. In fact, it is again a kernel thread, a part of the operating system itself, not a normal process. And we can see other processes with names in square brackets, a convention used for marking kernel threads. Their ppid, however, is 2, so their parent process is kthreadd, which is thus the kernel thread responsible for creating other kernel threads. Whether the kernel code actually uses the same mechanisms for creating new kernel threads from kthreadd as when forking/cloning new processes is an exercise left to the reader². An interesting observation from the above example output is that the kernel threads have PIDs greater than that of the init process even though they are actually needed for the init process to run. But again, PIDs are assigned when processes are created, not when they start running a program. And the init process is guaranteed to always have PID equal to 1.
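The same parent links that ps prints can be followed from inside a program. The sketch below assumes Linux with /proc available and uses two hypothetical helpers of our own, parent_of and ancestry. It starts at the current process and keeps looking up the parent PID until it reaches a process whose parent is the hidden PID 0.

```python
import os


def parent_of(pid):
    """Return the parent PID of a process, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Fields after the parenthesised command name: state, ppid, ...
    return int(data.rsplit(")", 1)[1].split()[1])


def ancestry(pid):
    """Return [pid, parent, grandparent, ...] up to the last visible ancestor."""
    chain = [pid]
    while True:
        try:
            ppid = parent_of(chain[-1])
        except OSError:
            break                      # process vanished or /proc is restricted
        if ppid == 0:
            break                      # reached a process whose parent is PID 0
        chain.append(ppid)
    return chain


if __name__ == "__main__":
    # Each PID in the output is followed by the PID of its parent.
    print(" -> ".join(str(p) for p in ancestry(os.getpid())))
```

On a typical system the chain printed for an interactive session ends with PID 1, although inside containers or restricted environments the last visible ancestor may be a different process.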

The first few lines of the ps output only show the init process and kernel threads. Conveniently, ps sorts the processes based on their PID by default. So what can we see if we look a bit further down in the output? Of course there are more kernel threads. And then, the first "normal" processes show up – systemd-journald and systemd-udevd in our case. At this point it should be no surprise that they have parent PID equal to 1. Even further down there is the first process with ppid different from 1 and 2 – sedispatch – a child process of the auditd process.

...
764       2 [xfs-log/dm-4]
765       2 [xfs-cil/dm-4]
766       2 [xfsaild/dm-4]
870       1 /usr/lib/systemd/systemd-journald
892       1 /usr/lib/systemd/systemd-udevd
944       2 [ktpacpid]
950       2 [cfg80211]
951       2 [nv_queue]
952       2 [nv_queue]
953       1 /usr/sbin/dmeventd -f
959       2 [irq/141-iwlwifi]
...
1116       1 /usr/lib/systemd/systemd-oomd
1117       1 /usr/lib/systemd/systemd-resolved
1118       1 /sbin/auditd
1120    1118 /usr/sbin/sedispatch

The reaper(s)

As mentioned above in the Process tree section, when a process terminates, it becomes a zombie process and its parent process is responsible for reaping it. Until that happens, the operating system maintains the information about the process. And it is actually only the parent process that can reap the child process by using one of the wait*() functions.

But what if the parent process terminates before one or more of its child processes, which then become orphan processes? Well, then reaping such an orphan process is a job for the reaper (process), which becomes the new parent of the orphan process. Traditionally, the init process is the reaper and all orphan processes are assigned to it. Their parent PID is set to 1, the init process is notified (with a SIGCHLD signal) every time one of these processes terminates, and it can call one of the wait*() functions to reap the terminated process. Linux, however, since version 3.4 supports so-called subreapers that are described in the prctl(2) man page:

A subreaper fulfills the role of init(1) for its descendant processes. When a process becomes orphaned (i.e., its immediate parent terminates), then that process will be reparented to the nearest still living ancestor subreaper. Subsequently, calls to getppid(2) in the orphaned process will now return the PID of the subreaper process, and when the orphan terminates, it is the subreaper process that will receive a SIGCHLD signal and will be able to wait(2) on the process to discover its termination status.

If we use a trivial example of an orphaned process, we can see these mechanisms in action:

$ echo $$
333288
$ bash -c 'sleep 10 &'; ps -o pid,ppid,cmd
PID    PPID CMD
333288    5816 bash
496832    2536 sleep 10
496833  333288 ps -o pid,ppid,cmd

A bash process is started (from an interactive bash session with PID 333288, as the value of the special $$ variable shows) with a command that tells it to create a sleep 10 process in the background. This bash process then terminates immediately and so the sleep process it started becomes an orphan process. In the ps output we can see the process of the interactive bash session, the sleep 10 process and the ps process itself. And as it shows, the parent PID of the sleep 10 process is 2536. By using ps again:

$ ps -p 2536 -o pid,ppid,cmd
PID    PPID CMD
2536       1 /usr/lib/systemd/systemd --user

we can see that the process with PID 2536 is actually systemd! But this time, it is not running as the init process (with PID 1), but as a User Manager – a process that manages user services and many other things and that is a subreaper for orphaned processes started by the given user. And of course, if we run ps -p 496832 more than 10 seconds after the sleep 10 was started, it shows no results – the 496832 process no longer exists which means it was reaped by its (new) parent process.
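A process can also volunteer to be a subreaper itself. The following sketch is Linux-only and rests on a few assumptions that are ours, not part of the shell session above: it calls prctl() through ctypes, it uses the value 36 for PR_SET_CHILD_SUBREAPER as defined in linux/prctl.h, and the helper names become_subreaper and orphan_demo are hypothetical. The middle process exits right after forking, so the grandchild is orphaned and should be reparented to our process instead of init.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36            # value from <linux/prctl.h>


def become_subreaper():
    """Mark the calling process as a child subreaper (Linux 3.4 or newer)."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1, 0, 0, 0)]
    if libc.prctl(PR_SET_CHILD_SUBREAPER, *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def orphan_demo():
    """Return the parent PID an orphaned grandchild sees after reparenting."""
    become_subreaper()
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: start the grandchild, then terminate right away.
        os.close(read_fd)
        middle_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the original parent is gone.
            while os.getppid() == middle_pid:
                time.sleep(0.01)
            report = f"{os.getpid()} {os.getppid()}"
            os.write(write_fd, report.encode())
            os._exit(0)
        os._exit(0)
    os.close(write_fd)
    os.waitpid(middle, 0)              # reap the middle process
    orphan_pid, new_parent = map(int, os.read(read_fd, 64).split())
    os.close(read_fd)
    os.waitpid(orphan_pid, 0)          # as the subreaper, reap the adopted orphan
    return new_parent


if __name__ == "__main__":
    print("our PID:", os.getpid())
    print("orphan's new parent:", orphan_demo())
```

The final waitpid() call only works because the orphan became our child. Without the prctl() call, the orphan would belong to init or to another subreaper such as systemd --user, and waiting for it from our process would fail.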

Hopefully, this post helped shed some light on how processes work in UNIX-like systems, explained the meaning of PIDs and PPIDs, and familiarized you with terminology like process trees, parent and child processes, reaping, zombies, etc. In part 2, we go into more depth about what happens inside the process, with fork(), exec(), file descriptors, threads, locks, and more.

¹ The PIDs of the daemon processes that start during the boot process can actually be used to determine if the system uses a shell script based boot process or something else (like systemd or upstart) because shell scripts create many processes and so even the PIDs of the daemon processes that start early in the boot process are quite high.
² Hint: git grep 'pid_t kernel_thread(' in the Linux sources.
