Measuring to improve quality

August 10, 2022 Janet Gregory, Lisa Crispin

Quality is hard to define and hard to measure in a meaningful way. And there are many aspects of quality. Process quality is about how well a team creates and delivers their product. Product quality is what the customer cares about. It’s all important.

Organizations that continually improve their product and process quality need data. Business leadership wants data about how their teams are performing. They also want to know if their customers are happy. Measuring quality and performance is tricky. Metrics like bug counts or lines of code written are often meaningless and can be gamed. For example, we’ve seen teams with a rule that they cannot put code into production with critical- or high-severity bugs, so someone decides that the bugs are really medium severity because they have workarounds.

A trend we’ve seen in the past few years is organizations adopting the metrics that the Accelerate State of DevOps report by DORA identified as correlating with team performance. The Accelerate State of DevOps survey has provided academically rigorous research that gives us reliable data on the most effective ways to deliver software. Using the results and comparing them from year to year, Google’s DevOps Research and Assessment team (DORA) identified five key software delivery performance metrics to help teams continuously improve. These metrics mostly relate to process quality but can influence product quality as well.

Lead time for changes:

As defined in the Accelerate report, this is the amount of time that elapses from when a new change (code or configuration) is committed to the repository, to when it is successfully running in production.

Lead time reflects how fast teams can get feedback from production use. The survey also showed that tester-developer collaboration reduces lead time for changes. This metric reflects process quality – shortening the feedback loop. Continuous integration shortens this time. Continuous delivery and deployment shorten it even more.
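As a rough illustration of the definition above, here is a minimal Python sketch that computes lead time from commit and deployment timestamps. The function name, the sample data, and the choice of the median as the summary statistic are our assumptions for illustration; the Accelerate report does not prescribe this code.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(changes):
    """Return the median hours from commit to running in production.

    `changes` is a list of (committed_at, deployed_at) datetime pairs.
    Using the median is an assumption made for this sketch.
    """
    durations = [
        (deployed_at - committed_at).total_seconds() / 3600
        for committed_at, deployed_at in changes
    ]
    return median(durations)


# Hypothetical sample data: three changes and when they reached production.
changes = [
    (datetime(2022, 8, 1, 9, 0), datetime(2022, 8, 1, 13, 0)),
    (datetime(2022, 8, 2, 10, 0), datetime(2022, 8, 3, 10, 0)),
    (datetime(2022, 8, 4, 15, 0), datetime(2022, 8, 4, 17, 0)),
]

print(lead_time_hours(changes))
```

In practice, the timestamps would come from your version control system and deployment pipeline. A median keeps one unusually slow change from dominating the result.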

Deployment frequency:

Deployment frequency measures how frequently and consistently an organization deploys to production. Practicing continuous deployment means deploying small changes frequently to production. Elite-performing teams deploy multiple times per day.

In some business domains, for example, safety-critical domains like medical or transportation, customers don’t want frequent changes. Even with safe release strategies such as feature flagging, the perceived risk is too high. Organizations can still practice continuous delivery by keeping artifacts deploy-ready without deploying them to production. Focusing on finishing small changes with a quick cadence reduces risk.

While this measures an organization’s process quality, product quality is directly impacted by it. For example, if new code changes continually cause regression failures, the time to identify and fix them will affect deployment frequency, which can affect a customer’s perception of product quality.
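One way a team might track both how frequently and how consistently it deploys is sketched below in Python. The function name, the two summary numbers, and the sample dates are hypothetical choices for illustration, not definitions taken from the DORA report.

```python
from datetime import date


def deployment_frequency(deploy_dates, start, end):
    """Summarize deployments over the inclusive period start..end.

    Returns (deployments per calendar day, share of days with a deployment).
    The second number is a simple, assumed proxy for consistency.
    """
    days_in_period = (end - start).days + 1
    in_period = [d for d in deploy_dates if start <= d <= end]
    per_day = len(in_period) / days_in_period
    active_share = len(set(in_period)) / days_in_period
    return per_day, active_share


# Hypothetical sample data: one entry per production deployment.
deploy_dates = [
    date(2022, 8, 1), date(2022, 8, 1),
    date(2022, 8, 2),
    date(2022, 8, 4), date(2022, 8, 4), date(2022, 8, 4),
]

per_day, active_share = deployment_frequency(
    deploy_dates, date(2022, 8, 1), date(2022, 8, 5)
)
print(per_day, active_share)
```

Looking at the two numbers together helps distinguish a steady cadence from occasional bursts of deployments.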

Time to restore service:

Time to restore service is the average time a team or organization needs to restore service after a severe incident or a defect that impacts users, such as an unplanned outage. Longer times to restore service mean unhappy customers, who perceive poor product quality. For example, one of Canada’s main cell phone and internet providers had an outage in July 2022. Major businesses were unable to do their work. Individuals were inconvenienced too: Janet couldn’t use Google Maps to find a location. Customers of all kinds suffered. The reason given was a software update. It took the company over 12 hours to restore even the most basic cell service.

This metric can affect a wide range of organizational practices. The application or service needs appropriate telemetry so that failures are identified quickly. The code base must be easy to understand and update and be protected by a good safety net of automated regression tests. The deployment pipeline or delivery workflow must finish quickly to get fixes out to production. The team needs good working agreements to respond to production issues. Many different factors go into preventing customer pain.
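Since time to restore service is described above as an average, a minimal Python sketch of the calculation might look like this. The function name and the incident timestamps are made up for illustration and do not describe any real outage.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_restore_hours(incidents):
    """Return the average hours between an incident starting and service being restored.

    `incidents` is a list of (started_at, restored_at) datetime pairs.
    """
    durations = [
        (restored_at - started_at).total_seconds() / 3600
        for started_at, restored_at in incidents
    ]
    return mean(durations)


# Hypothetical sample data: three user-impacting incidents.
incidents = [
    (datetime(2022, 8, 3, 8, 0), datetime(2022, 8, 3, 20, 0)),
    (datetime(2022, 8, 10, 14, 0), datetime(2022, 8, 10, 15, 0)),
    (datetime(2022, 8, 20, 9, 30), datetime(2022, 8, 20, 11, 30)),
]

print(mean_time_to_restore_hours(incidents))
```

The start and restore times would typically come from your telemetry or incident tracking tool, which is one reason good telemetry matters for this metric.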

Change failure rate:

The change failure rate is the percentage of changes to production that result in degraded service, such as a service impairment or outage, and require remediation such as a hotfix or rollback. Teams that deploy frequently may have a higher number of failures, but if their change failure rate is low, their overall success is greater. For example, if they deploy 5 changes a day, that means 25 changes in a five-day week. If 5 of those changes fail, the rate is 20%. If a team deploys only once a week, and that change fails, they have a 100% failure rate.

This metric reflects both product and process quality. The synergy of combining change failure rate with time to restore service is powerful. Teams that spend a lot of time fixing problems have less time to devote to new features. All the leading development practices that help teams produce maintainable, testable, operable code, building quality in and testing effectively, lower the change failure rate.
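The change failure rate arithmetic can be checked with a few lines of Python. This is only a sketch using the deployment counts from the two example teams above; the function name is our own.

```python
def change_failure_rate(failed_changes, total_changes):
    """Return the percentage of production changes that needed remediation."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return 100 * failed_changes / total_changes


# A team deploying 5 changes a day over a five-day week, with 5 failures.
frequent_team = change_failure_rate(failed_changes=5, total_changes=25)

# A team deploying once a week, where that single change fails.
weekly_team = change_failure_rate(failed_changes=1, total_changes=1)

print(frequent_team, weekly_team)
```

The team that deploys frequently has more failures in absolute terms but a much lower failure rate.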

Reliability:

Reliability measures operational performance and reflects how well a team applies modern operational practices. It includes quality attributes such as availability, latency, performance, and scalability. The DORA State of DevOps survey asked respondents to rate their ability to meet or exceed their reliability targets. Teams see better outcomes when they prioritize operational performance. They achieve their service level objectives, they use automation appropriately, and they are prepared to respond to production problems quickly. The survey found that teams doing well with site reliability engineering practices performed better in other areas and had better business outcomes. These benefits apply to all levels of team performance.

Using the DORA metrics

Elite-performing teams have a competitive advantage, and the survey results show the number of teams in the elite category growing fast each year. Teams can use these metrics to get a baseline of their current performance and identify what they want to improve next. The data helps them see what’s holding them back and helps them measure experiments to overcome those constraints.

As with any kind of measurement, consider your context. Look at the big picture drawn by all five metrics. They can help with finding the right balance of speed and reliability. Use these metrics together with others to guide setting goals and designing small, iterative experiments to work towards them.

Sources:

2021 State of DevOps report: https://cloud.google.com/devops/state-of-devops

“How DORA metrics can measure and improve performance” by Ganesh Datta, DevOps.com, https://devops.com/how-dora-metrics-can-measure-and-improve-performance