While working on the integration of CFEngine Build into Mission Portal, we came to the point where
we needed to start executing separate tools from our recently added daemon, cf-reactor. Although
it may seem like nothing special, we spent a lot of time and effort on this step, because we know a
bit about the specifics of process creation and program execution (and have had to fight some
really hard-to-solve bugs in the past). Now we want to share the story and the results of the
effort. However, understanding the reasons behind the work, together with how the implementation
works, requires quite deep knowledge of how processes are created and programs are started on
UNIX-like systems. So we first start with a series of blog posts focused on this seemingly simple
area. They cover the basics as well as some advanced topics in two parts:
- Process creation and relations between processes
- The two-step mechanism of an external program execution
Let’s start with the first part right away; the second part is covered in a follow-up blog post.
Process creation and process relations
Operating systems work with entities called processes, where a process is an instance of a computer program. A program is a set of data and instructions that can be executed. So a program is a static (passive) entity: it is stored in some kind of memory and it only has its static data (including the initial values of dynamic data). A process, on the other hand, is an active entity: it has a runtime state stored in its address space and CPU registers, and it executes the instructions of the program (as well as dynamically added or generated instructions). The operating system running a process maintains information about the process: its identifier, resource counters, address space boundaries, security attributes, open files, etc. While a process is a unique entity with a unique identifier, there can, of course, be multiple processes running the same program. And there can be multiple copies of a program stored in multiple places, but they are all the same program (the same set of data and instructions).
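To see the difference between a program and a process in practice, here is a minimal Python sketch (Python's os and subprocess modules wrap the underlying system calls). It assumes Linux with /proc mounted and a sleep program on the PATH; /proc/<pid>/exe is a Linux-specific symlink to the program file a process is running. Two processes started from the same program get two different PIDs but point to the same program file:

```python
# Minimal sketch; assumes Linux with /proc mounted and "sleep" on the PATH.
import os
import subprocess

# Start two processes running the same program. Popen() only returns
# after the child has successfully exec'd "sleep" (or raises if it failed).
procs = [subprocess.Popen(["sleep", "1"]) for _ in range(2)]

# /proc/<pid>/exe is a symlink to the program file a process is running.
exes = {p.pid: os.readlink(f"/proc/{p.pid}/exe") for p in procs}
for pid, exe in exes.items():
    print(pid, exe)

for p in procs:
    p.wait()  # wait for both children to terminate and reap them
```

Each printed line should show a different PID next to the same path.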
Process creation
Processes are created by operating systems, either because of decisions of the operating systems
themselves or as reactions to requests (system calls) made by existing processes. A natural way of
thinking is that a process is created when a program starts being executed. However, this is not
generally true. On some operating systems, namely the UNIX-based ones, a new process is created when
an existing process forks (clones/replicates) itself, that is, when the operating system creates a
clone of the existing process that requested it. The new process really is a clone of its parent
process, and right after the fork both processes continue executing the instructions that followed
the fork() system call. For a program to be started, some process needs to use the execve() system
call (usually by means of one of the exec*() functions), which loads and executes the given program
in the process that made the syscall.
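The fork-then-exec pattern can be sketched in Python, whose os.fork(), os.execvp() and os.waitpid() wrap fork(), the exec*() family and waitpid(). The helper name run_program() is hypothetical, and the sketch assumes a POSIX system with sh on the PATH and Python 3.9+ (for os.waitstatus_to_exitcode()). The child connects its standard output to a pipe and then replaces its program; because exec does not create a new process, the PID that sh reports via $$ should match the PID that fork() returned to the parent:

```python
# Minimal sketch; assumes a POSIX system, "sh" on the PATH and Python 3.9+.
import os


def run_program(argv):
    """Run argv in a new process using fork(), execvp() and waitpid().

    Returns (child_pid, what_the_program_printed, exit_code).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    # Both processes continue from here: fork() returned 0 in the child
    # and the child's PID in the parent.
    if pid == 0:
        os.close(read_fd)
        os.dup2(write_fd, 1)  # the program's stdout goes into the pipe
        os.close(write_fd)
        try:
            # Replace the program running in this process; the PID stays the same.
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed
    os.close(write_fd)
    with os.fdopen(read_fd) as out:
        output = out.read()
    _, status = os.waitpid(pid, 0)  # wait for the child and reap it
    return pid, output.strip(), os.waitstatus_to_exitcode(status)


pid, printed, code = run_program(["sh", "-c", "echo $$; exit 3"])
print(f"forked PID {pid}, sh reports PID {printed}, exit code {code}")
```

If exec fails, the child calls os._exit() instead of returning, so it never goes on to run the parent's remaining code by accident.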
Process tree
Because of the way processes are created, they form a process tree where every process has one
parent process and zero or more child processes. When a child process terminates, it becomes a
so-called zombie process and its parent process is responsible for reaping it. To do that, the
parent process calls one of the wait*() functions, which waits for the child process to terminate
and obtains information about its run (in particular, its exit status). The operating system
maintains the information about a process until it is reaped, so until then the process also shows
up in process listings, tables, etc.
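A zombie can be observed directly. The following minimal Python sketch assumes Linux, where the field right after the parenthesized command name in /proc/<pid>/stat is the process state (Z for a zombie), and Python 3.9+; the helper names proc_state() and zombie_demo() are hypothetical. The child exits immediately, the parent checks its state before reaping it with os.waitpid(), and afterwards the /proc entry is gone:

```python
# Minimal sketch; assumes Linux with /proc mounted and Python 3.9+.
import os
import time


def proc_state(pid):
    """One-letter state of pid from /proc/<pid>/stat, e.g. 'R', 'S' or 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The state follows the ')' closing the command name and a space.
    return stat[stat.rindex(")") + 2]


def zombie_demo():
    """Return (state before reaping, exit code, /proc entry still present?)."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # the child terminates right away
    # Terminated but not yet reaped: wait until the kernel reports a zombie.
    while (state := proc_state(pid)) != "Z":
        time.sleep(0.01)
    _, status = os.waitpid(pid, 0)  # reap it and collect its exit status
    return state, os.waitstatus_to_exitcode(status), os.path.exists(f"/proc/{pid}")


print(zombie_demo())
```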
If processes form a tree, what is the root node of the tree and how is this process created? And what if a parent process terminates and is reaped before one of its child processes, breaking the path from the root process to the still-running child process(es)?
Special processes
The answer to both of the above questions is: not all processes are born equal. As mentioned at the
beginning of this post, every process has an identifier, the PID. PIDs are assigned to processes
when they are created and, typically, they are something like sequential numbers of processes. So
when there's a process with PID 12345, one can assume there have been at least 12345 processes
started in the system since the system itself was started[1]. There are systems with randomized PIDs
and various other mechanisms for assigning PIDs, but we can ignore those specific cases for now. It
then becomes obvious that the root node of the process tree must be a process with PID 0. And that's
exactly the case. However, on modern Linux systems, for example, the process with PID 0 does not show
up in any process table or listing. As described above, processes on UNIX-based systems (incl.
Linux) are created by cloning existing processes. But this is not 100 % true, because for the very
first process there is, of course, no process to clone it from! The first process, with PID 0, is
actually a part of the operating system itself, one of the so-called kernel threads, kernel tasks
or kernel processes (further referred to as kernel threads). It is usually called swapper or sched
because it was historically responsible for swapping (part of memory management), which is one of
the first things necessary for running multiple processes (on Linux, it nowadays serves as the idle
task).
The first, let's say, "normal" process is the process with PID 1, the init process. It is, well…,
the initial process and it is responsible for, well…, initializing the system by setting things up
and, in particular, creating child processes and starting programs that themselves create new
processes and start other programs to, eventually, bring the operating system and the components
running on it into a ready-to-use state. One could expect init to be the root of the process tree.
But looking at the first few lines of output from the ps command on a modern GNU/Linux system, we
can see that this is not the case:
$ ps -eo pid,ppid,cmd|head -n10
PID PPID CMD
1 0 /usr/lib/systemd/systemd --switched-root --system --deserialize 31
2 0 [kthreadd]
3 2 [rcu_gp]
4 2 [rcu_par_gp]
5 2 [netns]
7 2 [kworker/0:0H-events_highpri]
9 2 [kworker/0:1H-events_highpri]
10 2 [mm_percpu_wq]
11 2 [rcu_tasks_kthread]
As we can see, there are processes with parent PID (ppid) equal to 0, so their parent process is
the hidden very first process with PID 0. The init itself (systemd in this case) has ppid equal to 0
and then there is another such process, [kthreadd]. Its name is in square brackets and suggests it
has something to do with the kernel, threads, and acting as a daemon. In fact, it is again a kernel
thread, a part of the operating system itself, not a normal process. We can also see other
processes with names in square brackets, a convention used for marking kernel threads. Their ppid,
however, is 2, so their parent process is kthreadd, which is thus the kernel thread responsible for
creating other kernel threads. Whether the kernel code actually uses the same mechanisms for
creating new kernel threads from kthreadd as for forking/cloning new processes is an exercise left
to the reader[2]. An interesting observation from the above example output is that the kernel
threads have greater PIDs than the init process even though they are actually needed for the init
process to run. But again, PIDs are assigned when processes are created, not when they start running
a program. And the init process is guaranteed to always have PID 1.
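Where the ps output above comes from can be approximated with a minimal Python sketch that reads /proc directly. It is Linux-only, the function names are hypothetical, and it is a simplified stand-in for ps -eo pid,ppid,cmd rather than a replacement for it. Like ps, it shows a process whose command line is empty, such as a kernel thread, by its name in square brackets:

```python
# Minimal sketch; assumes Linux with /proc mounted. Not a full ps replacement.
import os


def read_stat(pid):
    """Return (comm, ppid) for pid, parsed from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # comm is in parentheses and may itself contain spaces or ')'.
    comm = stat[stat.index("(") + 1 : stat.rindex(")")]
    _state, ppid = stat[stat.rindex(")") + 2 :].split()[:2]
    return comm, int(ppid)


def read_cmd(pid, comm):
    """Arguments joined by spaces, or "[comm]" if the command line is empty."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        args = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]
    return " ".join(args) or f"[{comm}]"


def process_table():
    """(pid, ppid, cmd) for every visible process, sorted by PID."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        pid = int(entry)
        try:
            comm, ppid = read_stat(pid)
            rows.append((pid, ppid, read_cmd(pid, comm)))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited meanwhile or is not accessible
    return sorted(rows)


print(f"{'PID':>7} {'PPID':>7} CMD")
for pid, ppid, cmd in process_table()[:10]:
    print(f"{pid:>7} {ppid:>7} {cmd}")
```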
The first few lines of the ps output only show the init process and kernel threads. Conveniently,
ps sorts the processes based on their PID by default. So what can we see if we look a bit further
down in the output? Of course, there are more kernel threads. And then the first "normal" processes
show up, systemd-journald and systemd-udevd in our case. At this point it should be no surprise that
they have parent PID equal to 1. Even further down, there is the first process with ppid different
from 1 and 2: sedispatch, a child process of the auditd process.
...
764 2 [xfs-log/dm-4]
765 2 [xfs-cil/dm-4]
766 2 [xfsaild/dm-4]
870 1 /usr/lib/systemd/systemd-journald
892 1 /usr/lib/systemd/systemd-udevd
944 2 [ktpacpid]
950 2 [cfg80211]
951 2 [nv_queue]
952 2 [nv_queue]
953 1 /usr/sbin/dmeventd -f
959 2 [irq/141-iwlwifi]
...
1116 1 /usr/lib/systemd/systemd-oomd
1117 1 /usr/lib/systemd/systemd-resolved
1118 1 /sbin/auditd
1120 1118 /usr/sbin/sedispatch
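Following the ppid values upwards gives the path from a process to the root of the tree, for example from sedispatch through auditd to init. A minimal Python sketch, assuming Linux with /proc mounted and Python 3.8+ (the function names parent_of() and ancestry() are hypothetical):

```python
# Minimal sketch; assumes Linux with /proc mounted.
import os


def parent_of(pid):
    """Parent PID of pid, read from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # After the ')' that closes the command name come: state, ppid, ...
    return int(stat[stat.rindex(")") + 2 :].split()[1])


def ancestry(pid):
    """[pid, parent, grandparent, ...] up to the process whose parent is 0."""
    chain = [pid]
    while (ppid := parent_of(chain[-1])) != 0:
        chain.append(ppid)
    return chain


chain = ancestry(os.getpid())
print(" -> ".join(str(p) for p in chain))
```

Inside a container, the walk may end at a process other than PID 1, because a parent outside the container's PID namespace is reported as 0. The result is also only a snapshot: if an ancestor exits during the walk, reading its /proc entry fails.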
The reaper(s)
As mentioned above in the Process tree section, when a process terminates, it becomes a zombie
process and its parent process is responsible for reaping it. Until that happens, the operating
system maintains the information about the process. And only the parent process can reap the child
process, by using one of the wait*() functions.
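A minimal Python sketch of this restriction, assuming a POSIX system and Python 3.9+; try_reap() is a hypothetical helper around a non-blocking os.waitpid(). Calling it on a process that is not a child of the caller fails with ECHILD (ChildProcessError in Python), while the parent can reap its own child once that child has terminated:

```python
# Minimal sketch; assumes a POSIX system and Python 3.9+.
import os
import time


def try_reap(pid):
    """Non-blocking waitpid(): 'not our child', 'still running' or the exit code."""
    try:
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:  # ECHILD: pid is not a child of the caller
        return "not our child"
    if reaped_pid == 0:  # the child exists but has not terminated yet
        return "still running"
    return os.waitstatus_to_exitcode(status)


# PID 1 (init) is never a child of our process, so we cannot reap it.
print(try_reap(1))

read_fd, write_fd = os.pipe()
child = os.fork()
if child == 0:
    os.close(write_fd)
    os.read(read_fd, 1)  # block until the parent closes its end of the pipe
    os._exit(5)
os.close(read_fd)
first_attempt = try_reap(child)  # the child is still blocked in read()
os.close(write_fd)  # let the child continue and exit with status 5
while (result := try_reap(child)) == "still running":
    time.sleep(0.01)
print(first_attempt, result)
```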
But what if the parent process terminates before one or more of its child processes, which then
become orphan processes? Well, then reaping such an orphan process is the job of the reaper
(process), which becomes the new parent of the orphan process. Traditionally, the init process is
the reaper and all orphan processes are assigned to it: their parent PID is set to 1, the init
process is notified (with a SIGCHLD signal) every time one of them terminates, and it can call one
of the wait*() functions to reap the terminated process. Since version 3.4, however, Linux also
supports so-called subreapers, which are described in the prctl(2) man page:
A subreaper fulfills the role of init(1) for its descendant processes. When a process becomes orphaned (i.e., its immediate parent terminates), then that process will be reparented to the nearest still living ancestor subreaper. Subsequently, calls to getppid(2) in the orphaned process will now return the PID of the subreaper process, and when the orphan terminates, it is the subreaper process that will receive a SIGCHLD signal and will be able to wait(2) on the process to discover its termination status.
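The subreaper mechanism can be tried out with a minimal Python sketch. It is Linux-only (3.4 or newer), calls prctl(2) through ctypes because Python's os module has no prctl wrapper, takes the value of PR_SET_CHILD_SUBREAPER (36) from <linux/prctl.h>, and needs Python 3.9+; the function names are hypothetical. The process marks itself as a subreaper, forks a child that forks a grandchild and exits, and the orphaned grandchild reports its new parent PID:

```python
# Minimal sketch; assumes Linux >= 3.4, glibc-style libc and Python 3.9+.
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>

libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process, incl. libc


def become_subreaper():
    """Mark the calling process as a child subreaper via prctl(2)."""
    one, zero = ctypes.c_ulong(1), ctypes.c_ulong(0)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, one, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def orphan_and_adopt():
    """Orphan a grandchild; return (our PID, its new PPID, its exit code)."""
    become_subreaper()
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        child_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the kernel reparents us after the child exits.
            while os.getppid() == child_pid:
                time.sleep(0.01)
            os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
            os._exit(3)
        os._exit(0)  # the child exits, orphaning the grandchild
    os.close(write_fd)
    os.waitpid(child, 0)  # reap the child as usual
    data = b""
    while chunk := os.read(read_fd, 64):  # EOF once the grandchild has exited
        data += chunk
    os.close(read_fd)
    grandchild, new_parent = map(int, data.split())
    # The orphan is now our child, so we can (and should) reap it.
    _, status = os.waitpid(grandchild, 0)
    return os.getpid(), new_parent, os.waitstatus_to_exitcode(status)


me, new_parent, code = orphan_and_adopt()
print(f"subreaper {me}, orphan's new parent {new_parent}, orphan exit code {code}")
```

Without the prctl() call, the grandchild would instead be reparented to init or to another subreaper further up the tree.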
If we use a trivial example of an orphaned process, we can see these mechanisms in action:
$ echo $$
333288
$ bash -c 'sleep 10 &'; ps -o pid,ppid,cmd
PID PPID CMD
333288 5816 bash
496832 2536 sleep 10
496833 333288 ps -o pid,ppid,cmd
A bash process is started (from an interactive bash session with PID 333288, as the value of the
special $$ variable shows) with a command that tells it to create a sleep 10 process in the
background. This bash process then terminates immediately, so the sleep process it started becomes
an orphan process. In the ps output we can see the process of the interactive bash session, the
sleep 10 process, and the ps process itself. And as it shows, the parent PID of the sleep 10 process
is 2536. By using ps again:
$ ps -p 2536 -o pid,ppid,cmd
PID PPID CMD
2536 1 /usr/lib/systemd/systemd --user
we can see that the process with PID 2536 is actually systemd! But this time, it is not running as
the init process (with PID 1), but as a User Manager, a process that manages user services (among
many other things) and that is a subreaper for orphaned processes started by the given user. And of
course, if we run ps -p 496832 more than 10 seconds after the sleep 10 was started, it shows no
results: process 496832 no longer exists, which means it was reaped by its (new) parent process.
Hopefully, this post has shed some light on how processes work on UNIX-like systems and on the meaning of PIDs and PPIDs, and has familiarized you with terminology like process trees, parent and child processes, reaping, zombies, etc.
In the part 2 blog post, we go into more depth about what happens inside the process, with fork(),
exec(), file descriptors, threads, locks, and more.
[1] The PIDs of the daemon processes that start during the boot process can actually be used to determine whether the system uses a shell-script-based boot process or something else (like systemd or upstart): shell scripts create many processes, so even the PIDs of the daemon processes that start early in the boot process are quite high.
[2] Hint: git grep 'pid_t kernel_thread(' in the Linux sources.