IT organizations around the world face constant pressure to do more, to do it faster and to do it with fewer resources. Because they are seen as cost centers, they are benchmarked against the performance of nimbler competitors and the public cloud providers.
IT organizations feel that pressure from several directions at once. The CFO compares the internal IT budget with the cost of public cloud offerings and demands comparable savings. The lines of business know they can get the latest applications and platforms within minutes in the public cloud, and they complain about the slowness of their internal IT department. And end users expect every IT service to be available all the time. Add the relentless, Moore's-law-like growth in the scale and complexity of IT infrastructure, and it is easy to understand why CIOs have the shortest tenure of all C-level executives.
Luckily, automation can come to the rescue for most IT organizations.
The business value produced by highly automated IT organizations is increasing exponentially. They deliver outstanding results on change frequency, IT service quality and operational costs. Some companies are now so good at automation that they can beat the economics of the public cloud, move their application workloads back to on-premises infrastructure and gain more granular control over their data, their policies and their compliance.
I have previously argued (Why Automation Leads to Greatly Reduced IT Costs) that IT organizations that do not become highly automated run a serious risk of being outsourced. Modern organizations will be managed by a new breed of System Administrators: the System Engineers (From System Administrator to System Engineer). Highly automated IT organizations will become much more productive than those that are not, and the gap grows exponentially over time as the infrastructure grows.
The value gap between manual and automated IT operations grows exponentially as the number of changes and the size of the infrastructure increase
Automation requires upfront investment. For very small environments (fewer than a hundred servers/VMs) there is not much value in automation, but as the number of changes and the size of the infrastructure grow, the value of automation increases exponentially. The effect is especially visible in operational metrics such as the number of support tickets, mean time to repair (MTTR), time to deploy new changes and the number of system engineers required: in a highly automated environment, all of these flatten out as the infrastructure grows.
Once all changes are version controlled and managed by CFEngine through patterns and models, it doesn't matter whether you are managing 5,000 or 50,000 servers. In the manual scenario, all of these metrics grow faster than linearly and quickly lead to an unsustainable combination of exploding headcount and poor IT performance.
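To illustrate why scale matters less once changes are expressed as a model, here is a minimal Python sketch of desired-state convergence. It is not how CFEngine works internally: `DESIRED_STATE`, `Host`, `converge` and `converge_fleet` are hypothetical names, and a real agent would inspect and repair actual system resources rather than a dictionary. The point it shows is that the model is written once, applying it to 5,000 or 50,000 hosts is the same operation repeated, and re-running it on compliant hosts changes nothing.

```python
from dataclasses import dataclass, field

# Hypothetical desired-state model, kept in version control.
# Keys and values are illustrative placeholders, not CFEngine policy syntax.
DESIRED_STATE = {
    "ntp_service": "running",
    "ssh_root_login": "disabled",
    "log_retention_days": "30",
}


@dataclass
class Host:
    name: str
    state: dict = field(default_factory=dict)  # observed configuration


def converge(host: Host, desired: dict) -> list:
    """Repair any drift on one host and return the keys that were changed.

    Only settings that differ from the model are touched, so a second run
    on an already compliant host changes nothing (idempotence).
    """
    repaired = []
    for key, value in desired.items():
        if host.state.get(key) != value:
            host.state[key] = value
            repaired.append(key)
    return repaired


def converge_fleet(hosts: list, desired: dict) -> int:
    """Apply the same model to every host; return the total number of repairs."""
    return sum(len(converge(host, desired)) for host in hosts)


if __name__ == "__main__":
    fleet = [Host(f"web{i:05d}") for i in range(5000)]
    fleet[0].state["ntp_service"] = "stopped"  # simulated drift on one host
    print(converge_fleet(fleet, DESIRED_STATE))  # 15000: every host starts non-compliant
    print(converge_fleet(fleet, DESIRED_STATE))  # 0: nothing left to repair
```

Adding servers only lengthens the loop; the model itself, which is what an engineer maintains, stays the same size.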
The automation train is leaving the station, and if you are responsible for IT operations in your organization, now is the time to embark on your automation journey.
Automation is a journey
Automating IT operations is a journey along several dimensions: the level of automation, which IT services are automated and how many of them. Ideally, all changes in IT operations should be managed by an automation solution and recorded in version control. Very few organizations enjoy this luxury today, but their number is growing. CFEngine has use cases where a single System Engineer manages thousands of servers, changes are deployed into production daily, and automatic remediation of configuration drift removes the need for manual intervention while ensuring a high quality of service.
For most IT organizations, automation nirvana seems unreachable. An important first step is to realize that automation is a process that requires buy-in from people. Start slowly, get buy-in and then accelerate as everyone begins to recognize the benefits.
Automation is a journey and it takes time!
Start by focusing on your customers and the IT services you provide. Automate the most mission-critical ones first. Once you see the results, you can move on to the less critical services. In the end, all IT services should be automated, because an outage anywhere in the stack can have cascading effects whose outcome is impossible to predict.