Recently we introduced a new feature that lets you trigger agent runs and report collection from the Mission Portal UI.
This required our daemon, cf-execd, to behave a bit differently when periodic agent runs occur. Previously the daemon would create a new thread in which to run cf-agent, capture its output, wait for completion, and move on. We changed the behavior so that the daemon forks itself and then fork/execs cf-agent as before, with the forked cf-execd processing the agent run output. When the agent is finished, the forked cf-execd process is left as a zombie (defunct) process. The daemon wakes up every minute to see if it should do an agent run, and the next time the original cf-execd "wakes up", it cleans up that defunct forked cf-execd.
So now two cf-execd processes are expected around the time of each cf-agent run, whether it is a periodic run or one triggered from the Mission Portal UI.
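To make the fork-then-reap pattern concrete, here is a minimal Python sketch of it. This is not cf-execd's actual code (cf-execd is written in C). The helper names `spawn_executor` and `reap_finished` are hypothetical, and a trivial Python command stands in for cf-agent. The parent forks a child that runs the agent and handles its output, then returns without waiting. The exited child stays a zombie until the parent's next wake-up, when a non-blocking `waitpid` reaps it. It is POSIX-only, since `os.fork` is not available on Windows.

```python
import os
import subprocess
import sys
import time


def spawn_executor(agent_cmd):
    """Fork a child 'executor' that runs agent_cmd, handles its output, and exits.

    The parent returns immediately without waiting, so once the child exits
    it remains a zombie (defunct) until the parent reaps it.
    """
    sys.stdout.flush()  # avoid the child re-flushing the parent's buffered output
    pid = os.fork()
    if pid == 0:
        # Child: run the agent (subprocess does its own fork/exec) and capture output.
        code = 1
        try:
            result = subprocess.run(agent_cmd, capture_output=True, text=True)
            # A real executor would log or mail the output; here we just echo it.
            sys.stdout.write(result.stdout)
            sys.stdout.flush()
            code = 0 if result.returncode == 0 else 1
        finally:
            # os._exit never returns into the parent's code path.
            os._exit(code)
    return pid


def reap_finished():
    """Reap every exited child without blocking; return the reaped pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    # Stand-in for cf-agent: a tiny Python child process.
    child = spawn_executor([sys.executable, "-c", "print('agent run done')"])
    # The real daemon sleeps about a minute between wake-ups; shortened here.
    time.sleep(1)
    print("reaped:", reap_finished(), "expected:", child)
```

Between the child's exit and the parent's next wake-up, a process listing shows two processes with the daemon's name, one of them defunct. That is the window in which two cf-execd processes are expected.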
Watchdogs keep things tidy
Many system administrators wisely use watchdog tools to ensure that the correct number of processes is present for the services they implement. Many commercial and open source projects exist to monitor systems and take action when something is wrong.
If you use such a system to ensure that there is only one cf-execd process, you will need to adjust your setup to allow two processes if you are running version 3.18.0 or greater.
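As a rough illustration of the adjustment, here is a minimal Linux-only Python sketch of a process-count check that accepts one or two cf-execd processes. It is hypothetical and is not the Masterfiles Policy Framework watchdog or any particular monitoring product. It assumes the process name can be read from `/proc/<pid>/stat` (the field in parentheses). Zombie processes keep that entry until they are reaped, so the defunct forked cf-execd is counted. The `proc_root` parameter exists only so the scan can be pointed at a test directory.

```python
import os


def count_processes(name, proc_root="/proc"):
    """Count processes whose name (from /proc/<pid>/stat) equals `name`.

    Zombie (defunct) processes still have a stat entry, so an exited but
    not-yet-reaped forked cf-execd is included in the count.
    """
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return 0  # not a Linux-style /proc
    count = 0
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # process exited during the scan, or is unreadable
        # The name sits between the first "(" and the last ")".
        start, end = data.find("("), data.rfind(")")
        if start != -1 and end > start and data[start + 1:end] == name:
            count += 1
    return count


def check_cf_execd(count, min_allowed=1, max_allowed=2):
    """Classify a cf-execd process count against hypothetical watchdog thresholds."""
    if count < min_allowed:
        return "missing"
    if count > max_allowed:
        return "too many"
    return "ok"


if __name__ == "__main__":
    n = count_processes("cf-execd")
    print(n, check_cf_execd(n))
```

A check that previously used `max_allowed=1` would flag every agent run on 3.18.0 or later. Raising the upper bound to 2 still catches a missing daemon or a genuine pile-up of cf-execd processes.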
The Masterfiles Policy Framework includes a built-in watchdog, not enabled by default, which we had to adjust when this change was implemented.
This watchdog currently works on three platforms, Linux, AIX and Windows, each with different functionality. The Linux version is a crontab entry that ensures at least one cf-execd process is present. The Windows version enforces thresholds on time and on the number of cf-agent processes. Most similar to what many administrators may have configured is the AIX watchdog, which tracks several different problem cases, or pathologies, and takes action if needed.
Adjust your thresholds
So remember to adjust any monitoring software that expects only one cf-execd daemon so that it allows two.
And if you don’t have monitoring software already doing this work, consider using our built-in watchdog.