A popular theme in science fiction involves microscopic machines that build and repair other machines, clothing and even people. Scientists are now researching this very thing, and our future may be bright indeed. For system administration, nanotechnology is already here.
Software agents now exist that search computer systems and repair defects or build missing components. CFEngine is such an agent. It examines the system at a low level, including files, processes and network packets, and builds or repairs based on what it finds. The result is a more self-reliant system.
Consider the average web server. What criteria describe a healthy web server?
The Apache application package is installed.
The Apache configuration files’ ownership, permissions and content are correct.
The Apache process is running with a minimum number of children.
The SSL key file’s ownership, permissions and content are correct.
The files that Apache serves have the correct ownership, permissions and content.
The correct file systems are mounted.
File systems should have a minimum amount of free space.
The volume of inbound HTTP or HTTPS traffic should fall within a desired range.
System load should fall within an acceptable range.
The volume of non-HTTP and non-HTTPS traffic should fall within an acceptable range.
CPU temperature should fall within an acceptable range.
Log files are rotated regularly to avoid dangerous growth.
Temp files are purged regularly to avoid dangerous growth.
Other services such as FTP should not be running.
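Criteria like these can be written down as small, testable checks. The following is a minimal Python sketch of that idea, not CFEngine policy syntax. The function names (mode_is, free_fraction, check_health) are hypothetical and were chosen for illustration; only two of the criteria, file permissions and free space, are covered.

```python
import os
import shutil
import stat


def mode_is(path, expected_mode):
    """Return True if the permission bits of path equal expected_mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_health(checks):
    """Run each named check (a zero-argument callable) and return a name -> bool report."""
    return {name: bool(check()) for name, check in checks.items()}
```

A caller might pass check_health a dictionary such as {"config mode": lambda: mode_is(conf_path, 0o644), "free space": lambda: free_fraction("/var") >= 0.10}, where conf_path and the 10% threshold are assumptions for the example, and then act on any entry that comes back False.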
A historical approach to this problem would be one of passive monitoring and reactive humans. A monitoring system, often a heterogeneous collection of tools, keeps watch on the system and reports to a central office if anything is found to be outside of the norm. A human is tasked with examining and correcting the problem. How long does this take?
Our CFEngine nanobot is able to monitor these criteria and often take action without the need for a human. For example:
If file ownership or permissions are wrong, correct them.
If file contents are wrong, copy the correct file from a repository or rebuild the file on the fly.
If the number of Apache processes falls outside of the allowed range, restart Apache.
If the Apache package is not installed, install it.
If file systems are missing, attempt to mount them.
If file systems have less than the minimum amount of free space, delete certain files, switch the service over to another host or report the incident.
If HTTP traffic is outside of the desired range, restart Apache, increase the number of children, start another load-balanced host or report the incident.
If undesired services or processes are found running on the host, kill them and report the incident.
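The pattern behind these rules is "compare the actual state with the desired state, and act only when they differ." The following Python sketch shows that pattern for the file permissions rule alone. It illustrates the idea and is not how CFEngine itself is implemented or configured; ensure_mode and converge are hypothetical names.

```python
import os
import stat


def ensure_mode(path, desired_mode):
    """Bring path's permission bits to desired_mode.

    Returns True if a repair was made, False if the file was already correct.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == desired_mode:
        return False  # already in the desired state, nothing to do
    os.chmod(path, desired_mode)
    return True


def converge(desired):
    """Apply each (path, mode) pair and return the list of paths that were repaired."""
    repaired = []
    for path, mode in desired:
        if ensure_mode(path, mode):
            repaired.append(path)
    return repaired
```

Running converge a second time on the same list should repair nothing, because the first pass has already brought every file to its desired state. That property is what makes it safe to run such an agent repeatedly on a schedule.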
Our CFEngine nanobot is capable of taking corrective action immediately after detecting a problem. How long will this take compared to the traditional approach of alerting a human and waiting for intervention?
It may be a while yet before nanobots can scrub your clothes clean as you wear them, but having a nanobot autonomously repair your software system is possible now.