Organizations invest more in site reliability engineering but challenges persist


Businesses are investing more in site reliability engineering but are being held back by outdated and manual processes, according to a new report.

A study of 450 site reliability engineers carried out by software intelligence company Dynatrace finds 88 percent say there is now more understanding of the strategic importance of their role than there was three years ago.

In addition, 68 percent of SREs expect their role in security to become more central in the future, as organizations continue using third-party libraries, such as Log4j, for cloud-native application development.

However, almost all respondents (99 percent) encounter challenges when defining and creating service level objectives (SLOs) to evaluate service levels for applications and infrastructure. The most commonly cited challenges are too many data sources (64 percent), difficulty finding the most relevant metrics for a service (54 percent), and the inability of monitoring tools to easily define and track SLO performance (36 percent). Some 68 percent of SREs say siloed teams and multiple tools make it difficult to align on a single version of ‘the truth’ about service levels.

“Reliability has become a critical success factor in a world where every second of downtime leads to lost revenue, declining share prices, and lasting reputational damage,” says Bernd Greifeneder, founder and chief technology officer at Dynatrace. “This makes SRE central to driving faster digital transformation. Most organizations, however, remain relatively immature in their adoption of SRE practices. At a time when demand far outstrips the supply of skilled engineers, organizations should be doing everything in their power to amplify the efforts of these teams. Despite this, manual steps and unnecessary effort are a major distraction for SREs, which holds organizations back. SREs must define a ‘golden path,’ a set of steps development teams can take to navigate the complexity of cloud-native delivery, to overcome these barriers and fully unleash digital innovation.”

Among other findings, 71 percent of organizations are increasing the use of automation across every part of the lifecycle to reduce the workload for developers and SREs. Automation in SRE is primarily being used to resolve security vulnerabilities (61 percent) and application failures (57 percent), increase the speed of delivery (56 percent), and predict SLO violations before they occur (55 percent).

You can get the full report from the Dynatrace site.

Photo Credit: NicoElNino/Shutterstock

Author: Martha Meyer