The Joint Australian Tsunami Warning Centre’s Earthquake Monitoring component at Geoscience Australia recently adopted CFEngine’s newest software release to radically simplify the management of its test and production real-time monitoring systems.
With so many earthquakes along the Indo-Australian plate, and the risk of equally catastrophic tsunami waves, tsunami warnings are a mission-critical service in the area. The recent magnitude 7.4 earthquake in Aceh, Indonesia, sparked fears of a tsunami in the recently hard-hit areas around the Indian Ocean, though major tsunamis usually result from much larger events.
Tsunami alerts have the potential to save lives and avert major damage to coastlines. Forewarned is forearmed. Although tsunami earthquake detection models can be remarkably accurate, testing of the automatic warning software is vital to avoid both false alarms and missed events[1].
Leading the systems development effort, software engineer Michael Potter says: “I based my decision to use CFEngine on a visit to another earthquake monitoring center in the United States, who were using the older CFEngine 2 with great success in managing their system configuration. After careful analysis of the capabilities of CFEngine 3, we gradually introduced it to manage our test, and eventually our production, environments. We make extensive use of CFEngine’s line editing capabilities, which are unique amongst configuration management systems.”
“We use CFEngine in the context of running operational production software systems, as opposed to system administration, and production software systems need a degree of automation in several of their components. CFEngine has allowed me to transform the way in which my team deploys configurations into our test and production environments. Previously there was a dependence on tarballs and ad hoc instructions to configure our environments. Now we deploy a small, standard set of identical packages to all our servers in a single release, and let CFEngine work out what components need to be applied on any given server, including restarting different processes as necessary.”
Change management
One effect of introducing CFEngine is that it became impossible to make lasting changes directly on the production system, because CFEngine always reverts such changes to the desired state. This prevented the ad hoc changes that were a weakness of earlier approaches. However, it also meant that every change had to pass through a single point of management, which reduced the ability of operators to make legitimate short-term changes.
“The solution that CFEngine allowed was to provide a controlled cross-platform interface to the operators which let them update a special control file. CFEngine itself would then read and adapt it for the production system, on the fly, with its line-editing capability. Hence we now get to have our cake and eat it too: CFEngine prevents anyone from making permanent configuration changes directly to production systems, but our operators still have the degree of control they need by letting CFEngine itself make the changes in a controlled manner.”
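The control-file pattern described above can be illustrated with a short Python sketch. This is not CFEngine policy syntax and not the team's actual implementation; it only mimics the idea of convergent line editing under stated assumptions. The `key = value` file format, the setting names, and the `ALLOWED_KEYS` whitelist are all hypothetical.

```python
# Hypothetical whitelist: the only settings operators may override.
ALLOWED_KEYS = {"alert_threshold", "station_timeout"}


def parse_control_file(text):
    """Read 'key = value' lines, skipping blanks, comments and unknown keys."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in ALLOWED_KEYS:
            overrides[key] = value.strip()
    return overrides


def set_line(lines, key, value):
    """Ensure exactly one 'key = value' line exists; leave other lines alone."""
    wanted = f"{key} = {value}"
    result = []
    found = False
    for line in lines:
        if "=" in line and line.split("=", 1)[0].strip() == key:
            if not found:
                result.append(wanted)  # replace the first match in place
                found = True
            # any later duplicate of the same key is dropped
        else:
            result.append(line)
    if not found:
        result.append(wanted)  # setting was absent, so append it
    return result


def converge(config_lines, control_text):
    """Apply the operators' permitted overrides to the production config."""
    lines = list(config_lines)
    for key, value in sorted(parse_control_file(control_text).items()):
        lines = set_line(lines, key, value)
    return lines
```

Running `converge` a second time on its own output changes nothing, which is the property that lets an agent reapply the desired state on every run. Requests for keys outside the whitelist are ignored, so operators get control only over the settings they are meant to touch.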
The team uses line editing heavily. A major benefit of this feature has been the automatic configuration of the test system. Previously the test systems did not fully replicate the production environments, so running something in the test environment did not provide the desired degree of confidence that it would also run in production. Using CFEngine, however, engineers have been able to develop a system where they deploy precisely the same configurations and files to the test environment that are destined for production. CFEngine adapts these along the way for the test environment, turning off the sending of critical alerts to the whole world in the case of a tsunami earthquake event. “We hope in future to let CFEngine manage a full, continuous integration suite, by stopping the system at certain times, removing and adding daily build packages, then running a test data suite.”
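The idea of shipping identical files everywhere and adapting them only on the test hosts can be sketched in Python as follows. Again, this is an illustration rather than CFEngine syntax or the team's real configuration: the `send_alerts` setting, the `key = value` format, and the environment names are assumptions made for the example.

```python
# Hypothetical override applied only on test hosts: never send real alerts.
TEST_OVERRIDES = {"send_alerts": "false"}


def adapt_for_environment(lines, environment):
    """Return the shipped config unchanged for production, adapted for test."""
    if environment != "test":
        return list(lines)

    adapted = []
    seen = set()
    for line in lines:
        key = line.split("=", 1)[0].strip() if "=" in line else None
        if key in TEST_OVERRIDES:
            # Rewrite the shipped value with the test-safe value.
            adapted.append(f"{key} = {TEST_OVERRIDES[key]}")
            seen.add(key)
        else:
            adapted.append(line)

    # If the shipped file omitted a setting, add the safe value explicitly.
    for key, value in TEST_OVERRIDES.items():
        if key not in seen:
            adapted.append(f"{key} = {value}")
    return adapted
```

Because the only difference between the two environments is a small, explicit set of overrides, everything else exercised in test is exactly what will later run in production.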
“Where do I see my future use of CFEngine heading? More automation, especially around testing, and more integration into our production systems – particularly around providing more controlled interfaces to the production system, as well as detecting and acting upon certain events or scenarios where some change is required to keep the system running smoothly.”