How Much Are Flaky Tests Costing You?

Flaky tests have been called “the most expensive lie in engineering.” They pass sometimes, fail others, with no code changes. Teams learn to ignore them, hit “retry,” and move on. The cost accumulates quietly—an invisible tax on engineering velocity.

The data

The numbers are consistent across organizations that have measured this:

  • Atlassian: 15% of Jira test failures are flaky. They estimate this wastes 150,000 engineering hours per year across their organization.
  • Google: almost 16% of their tests show some level of flakiness; those failures come from the tests themselves, not actual bugs.
  • Microsoft: 13% of test failures in their Windows codebase are flaky. A separate Microsoft study found that flaky tests, slow pipelines, and broken builds were the #1 cause of “bad days” for developers.

The pattern holds whether you’re a 10-person startup or a 10,000-person enterprise. If you haven’t measured your flaky rate, it’s probably somewhere in that 13-16% range.

Where the cost comes from

Direct CI costs

Every flaky failure triggers a rerun. Some teams configure automatic retries. Others have developers manually restart builds. Studies of large Travis CI datasets found that at least 2% of all builds are explicitly restarted due to suspected flakiness—developers hitting “rerun” rather than investigating. Either way, you’re paying for CI minutes that don’t produce useful signal.

A team running 20 CI builds per day at 15 minutes each, with a 15% flaky rate, wastes roughly 1,350 CI minutes per month just on reruns (about 600 builds × 15% × 15 minutes). At GitHub Actions rates ($0.006/min for Linux), that’s only $8/month. The CI cost is real but small.
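
For reference, here is that arithmetic as a small script; the builds-per-day, duration, flaky rate, and per-minute price are just the example numbers above, so swap in your own:

```python
# Back-of-the-envelope CI rerun cost, using the example numbers above.
builds_per_day = 20
build_minutes = 15
flaky_rate = 0.15          # share of builds that fail for flaky reasons
price_per_minute = 0.006   # per-minute Linux rate quoted above
days_per_month = 30

flaky_reruns = builds_per_day * days_per_month * flaky_rate   # ~90 reruns
wasted_minutes = flaky_reruns * build_minutes                 # ~1,350 minutes
print(f"Wasted CI minutes per month: {wasted_minutes:,.0f}")
print(f"CI spend on reruns per month: ${wasted_minutes * price_per_minute:,.2f}")
```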

Developer time

The bigger cost is human. When a build fails, someone has to look at it. Even if they quickly determine “oh, that’s just the flaky login test,” that’s a context switch. They’ve lost focus. Studies suggest context switches cost 15-25 minutes of productivity.

With the same numbers above (20 builds/day, 15% flaky rate), that’s 90 flaky failures per month. At 15 minutes per investigation, you’re looking at 22+ hours of developer time wasted monthly. At $75/hour, that’s $1,650/month—or nearly $20,000/year—for a single developer dealing with flaky tests.
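
The developer-time side of the same scenario, with the 15-minute investigation and $75/hour figures as stated assumptions:

```python
# Developer-time cost of flaky failures, same scenario as above.
flaky_failures_per_month = 20 * 30 * 0.15   # ~90 flaky failures
minutes_per_investigation = 15              # one context switch + triage
hourly_rate = 75                            # fully loaded $/hour

hours_lost = flaky_failures_per_month * minutes_per_investigation / 60   # ~22.5
monthly_cost = hours_lost * hourly_rate
print(f"Developer hours lost per month: {hours_lost:.1f}")
print(f"Monthly cost: ${monthly_cost:,.0f}   Annual: ${monthly_cost * 12:,.0f}")
```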

Scale that across a team and the numbers get uncomfortable.

Trust erosion

The hardest cost to quantify is cultural. When tests flake regularly, teams stop trusting them. Red builds become background noise. Developers merge anyway because “it’s probably just that test.”

As Google engineers put it: “It’s human nature to ignore alarms when there is a history of false signals.”

This erodes the entire value of your test suite. Real bugs slip through because the signal is lost in the noise. You’ve invested in testing infrastructure that no longer provides confidence.

Alert fatigue

Slack notifications for failed builds become noise. Teams mute channels or stop paying attention to CI status entirely. When a real failure happens, it gets lost in the flood of flaky notifications—or worse, someone assumes it’s flaky and ignores it.

Ownership decay

When tests fail constantly, nobody owns fixing them. The original author has moved on. The current team didn’t write the test and doesn’t understand what it’s checking. Tests get skipped “temporarily” and stay skipped for months. The coverage you thought you had quietly disappears.

CI pipelines often suffer from what economists call the “Tragedy of the Commons”—everyone uses them, nobody maintains them. Without clear ownership, flaky tests accumulate and infrastructure degrades. Someone jumps in only when it blocks a release, applies a quick fix, and moves on.

Calculate your cost

We built a calculator to help you estimate the real cost of flaky tests for your team:

Open Flaky Test Calculator

Input your CI runs, duration, and flaky rate to see monthly and annual costs broken down by CI spend and developer time.

What to do about it

Once you’ve quantified the problem, you can prioritize fixing it.

Identify the worst offenders. Track which tests flake most often. Fix the top 5 flakiest tests first—they’re likely causing most of your pain.
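
If you want a rough starting point before adopting tooling, a sketch like the following works with any CI that exports per-run test results; the record format here is hypothetical, so adapt the parsing to whatever your test reporter emits:

```python
from collections import defaultdict

# Hypothetical input: one record per test execution, e.g. parsed from JUnit XML.
# Each record: (commit_sha, test_name, passed)
runs = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),   # same commit, different outcome: flaky
    ("abc123", "test_checkout", True),
    ("def456", "test_login", True),
]

# Group outcomes per (commit, test); both pass and fail on one commit is a flip.
outcomes = defaultdict(set)
for sha, test, passed in runs:
    outcomes[(sha, test)].add(passed)

flips = defaultdict(int)
for (sha, test), results in outcomes.items():
    if len(results) > 1:   # saw both True and False for the same code
        flips[test] += 1

# Fix the top offenders first.
for test, count in sorted(flips.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{test}: flipped on {count} commit(s)")
```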

Quarantine known flaky tests. Don’t let them block merges while you work on fixes. Move them to a separate suite that runs but doesn’t gate deployments.
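
One common way to implement this, sketched here with pytest (any framework with test tags works similarly): mark known-flaky tests, exclude the mark from the merge-gating run, and run the quarantined tests in a separate, non-blocking job.

```python
# Quarantining with a custom pytest marker.
# Register the marker once (e.g. in pytest.ini):
#   [pytest]
#   markers =
#       quarantine: known-flaky tests, excluded from merge-gating runs
import pytest

@pytest.mark.quarantine   # known flaky; link the tracking issue here
def test_login_via_oauth():
    ...

def test_checkout_totals():
    ...

# Merge-gating CI job:  pytest -m "not quarantine"
# Non-blocking CI job:  pytest -m quarantine   (still runs and reports)
```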

Track stability over time. Measure your flaky rate monthly. Set a target. Celebrate when it goes down.
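
Measuring the rate is straightforward once you count how many failures were flaky; a minimal sketch, with made-up monthly counts standing in for your CI history:

```python
# Monthly flaky rate = flaky failures / total failures.
# Counts below are placeholders; pull real ones from your CI history.
monthly_counts = {
    "2024-01": {"failures": 120, "flaky": 30},
    "2024-02": {"failures": 110, "flaky": 22},
    "2024-03": {"failures": 100, "flaky": 14},
}
TARGET = 0.10  # e.g. aim for under 10% of failures being flaky

for month, c in monthly_counts.items():
    rate = c["flaky"] / c["failures"]
    status = "on target" if rate <= TARGET else "above target"
    print(f"{month}: {rate:.0%} flaky ({status})")
```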

Invest in better test infrastructure. Many flaky tests stem from timing issues, shared state, or network dependencies. Better test isolation prevents flakiness at the source.
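
For the timing case specifically, the usual fix is to replace fixed sleeps with an explicit wait on the condition the test actually cares about; a generic sketch, where worker_done() stands in for whatever asynchronous result your test checks:

```python
import time

# Flaky: assumes the background work always finishes within 100 ms.
#   time.sleep(0.1)
#   assert worker_done()

# Stable: poll for the condition with a deadline; returns as soon as it's true.
def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# In the test:
#   assert wait_for(worker_done), "worker did not finish within 5s"
```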

How Gaffer helps

Gaffer automatically detects flaky tests by tracking pass/fail patterns across runs. When a test flips between passing and failing without code changes, we flag it.

You can see which tests are flakiest, how often they flip, and how much CI time they’ve wasted. No manual tracking required.

Start Free