Flaky tests

March 20, 2024     Janet Gregory, Lisa Crispin
Tags: automation, continuous integration, flaky tests · Category: Test Automation


We’re not sure if you started singing along with Marit, but we certainly did. This tweet captured the essence of what we think about flaky (sometimes spelled flakey) tests.

What are “flaky tests”?


To us, flaky tests are ones that pass when you run them locally, pass most of the time when run as part of an automated test suite in the CI pipeline, but sometimes fail for no understandable reason. Another telltale case: a test fails, then passes when rerun with no code changes. Your team is rolling the dice with every run of your automated test suite.

These tests add no value. We can’t trust them, and we don’t know if they are testing the right thing. They give people a false sense of security. Your team thinks you have a test that covers a particular functionality, but really you don’t.

Identifying and addressing flaky tests

The teams that Lisa worked with had a sophisticated system to identify flaky tests based on pass/fail rates. This visibility was helpful, and the reports helped teams see how big a problem they had. Still, analyzing the failures was a headache. Flaky tests got “quarantined” so the team could get a “green” build. Then, if the test passed locally, they put it back into the CI pipeline and waited for a failure. Once a test failed, finding the problem often meant hopping from one machine to another, trying to find the right log file with the right information. So many of us have felt this pain; it eats up time and kills our confidence for deploying changes to production.
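For a team using Python’s unittest, for example, a quarantined test might be marked with a skip and a visible reason, which keeps the build green without letting the problem disappear from view. The test name and ticket number here are invented for illustration:

```python
import unittest

class CheckoutTests(unittest.TestCase):
    # Quarantined: a skip with a visible reason (and a tracking ticket)
    # beats commenting the test out and forgetting it. FLAKY-142 is a
    # hypothetical ticket number.
    @unittest.skip("FLAKY-142: intermittent timeout in CI, under investigation")
    def test_order_confirmation_email(self):
        self.fail("intermittently times out on CI agents")
```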

These days, there are lots of tools to help your team identify flaky tests and prioritize which ones to investigate first. Modern continuous integration tools such as Semaphore CI and Circle CI gather data about inconsistent results across runs and have features that surface the tests causing the biggest problems, whether through statistical analysis or AI and LLM-based techniques. We asked Gemini about using AI and LLM-based tools to help identify flaky tests. It agreed that these can help, and that there are other ways to address flakiness. Our favorite part of the response:
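As a rough sketch of the statistical approach (our own illustration, not how any particular CI tool works), you can score each test by how often its outcome flips between consecutive runs: stable passes and stable failures score zero, while genuinely flaky tests bubble to the top.

```python
def flip_rate(history):
    """Fraction of consecutive run pairs whose outcome changed.
    history is a list of booleans: True = pass, False = fail."""
    if len(history) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return flips / (len(history) - 1)

def rank_flaky(results):
    """Order test names from most to least flaky by flip rate."""
    return sorted(results, key=lambda name: flip_rate(results[name]), reverse=True)

# Invented run history for illustration.
runs = {
    "test_checkout_total":  [True, True, True, True, True],    # stable pass
    "test_login_redirect":  [True, False, True, True, False],  # flaky
    "test_missing_feature": [False, False, False, False],      # stable fail
}
# rank_flaky(runs) puts test_login_redirect first: its result flips,
# while a consistently failing test is broken, not flaky.
```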

Combine AI's analytical power with human expertise. Developers and testers can use AI-generated reports to pinpoint flaky tests and work together to resolve the underlying issues.

Our advice

What’s our advice for dealing with flaky tests? If you can’t fix a flaky test so that it only fails for legitimate reasons, throw it away. (We know, it’s hard to delete something you spent time creating, so perhaps archive it somewhere for now.) If there is value in having a test for that particular check, write a new one. Give each test one clear purpose, so that if it fails, you know exactly what was being tested. Use good test and code design practices so that your test code is easy to understand and maintain. We’re fans of having coders and testers collaborate to automate tests, making the most of both coding and test design skills.

Flakiness is generally most common in end-to-end or “workflow” tests. Collaborate with your delivery teammates to see whether any of these can be split into smaller, less brittle tests. Are these tests checking business logic that could be verified at a lower level, such as the API level? One easy check: look for tests with more than one assert.
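To make that concrete, here is a hypothetical sketch (function and discount code are invented): business logic that an end-to-end checkout test was re-checking through the UI can instead be verified directly at a lower level, with one assert, and one purpose, per test.

```python
def apply_discount(total, code):
    """Pricing rule that a slow workflow test was exercising through the UI."""
    return total - 10.0 if code == "SAVE10" else total

def test_discount_applied():
    # One clear purpose: the SAVE10 code takes 10 off the total.
    assert apply_discount(100.0, "SAVE10") == 90.0

def test_unknown_code_ignored():
    # One clear purpose: an unrecognized code leaves the total unchanged.
    assert apply_discount(100.0, "BOGUS") == 100.0
```

If either test fails, you know exactly which rule broke, and no browser or environment timing is involved.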

Think back to the complicated process Lisa’s teammates went through to diagnose test failures. These flaky test failures are in the realm of “unknown unknowns”; we can’t predict all the failures. What they could do is log every event that occurs before, during, and after each test run in a central location. Then there would be only one place to research test failures, with all the data they might need easily accessible. Observability, the ability to ask questions of our systems to explore and solve totally unpredictable problems, applies to our automated tests as well.
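As one minimal sketch of that idea (our invention, not what Lisa’s teams actually built): wrap each test run so that every outcome lands in a single shared log, tagged with the test’s name, giving you one place to start when a failure needs investigating.

```python
import logging

def run_with_central_log(test_name, test_fn, log_path="test_events.log"):
    """Run one test, recording start/pass/fail events in a single shared log."""
    logger = logging.getLogger("test_events")
    if not logger.handlers:                      # configure the shared log once
        logger.addHandler(logging.FileHandler(log_path))
        logger.setLevel(logging.INFO)
    logger.info("START %s", test_name)
    try:
        test_fn()
        logger.info("PASS %s", test_name)
        return True
    except AssertionError as exc:
        logger.info("FAIL %s: %s", test_name, exc)
        return False
```

In a real setup, the same logger would also capture application events emitted while the test runs, so the test outcome and the system’s behavior sit side by side in one place.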

Finally, investing the time to address flaky tests is important because flaky tests are often indications of flaky production code! Lisa has often paired with a developer to fix flaky tests, only to find that the failure occurred because of a real bug that simply didn’t surface often, due to timing or some other factor. You might want to check out Gojko Adzic’s book Fifty Quick Ideas to Improve your Tests, and its section on how to deal with flaky tests.

Don’t gamble with your tests. If it’s flaky and you know it – investigate and fix the problem, whether it’s in the test code or the production code! Please don’t comment it out, or quarantine it and forget about it.