Failure Clustering: Debug Root Causes, Not Individual Tests

Five tests fail in CI. You investigate the first one — “Connection refused.” You investigate the second — same error. The third, fourth, fifth — all the same root cause. You just spent 30 minutes discovering what should have been obvious in 3 seconds: one service is down, and every test that depends on it failed.

What Is Failure Clustering?

Failure clustering groups test failures that share the same root cause into a single cluster. Instead of showing you 5 individual failures, it shows you 1 pattern affecting 5 tests.

The idea is simple: failures don’t happen in isolation. When a database connection drops, every test that queries the database fails. When an API endpoint changes, every test that calls that endpoint fails. Treating each failure independently wastes time.

Why Individual Test Failures Mislead

Most test reporting tools show a flat list of failures:

Test                         Status   Error
auth.test.ts > login         FAIL     Connection refused
auth.test.ts > signup        FAIL     Connection refused
api.test.ts > get users      FAIL     Connection refused
api.test.ts > create user    FAIL     ECONNREFUSED 127.0.0.1:5432
db.test.ts > migrations      FAIL     Connection refused

A developer sees 5 failures and might think 5 things broke. In reality, one thing broke — the database connection — and it cascaded to 5 tests. The fourth failure even has a slightly different error message, making it harder to spot the pattern manually.
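That near-miss is exactly what signature normalization handles. As a rough sketch (a toy heuristic, not Gaffer's actual rules — the function name and regexes are illustrative), a normalizer can map both spellings of the error to one signature:

```typescript
// Illustrative only: a toy error normalizer, not Gaffer's actual rules.
// "Connection refused" and Node's "ECONNREFUSED 127.0.0.1:5432" describe
// the same condition, so a clusterer can reduce both to one signature.
function normalizeError(message: string): string {
  return message
    // Strip IP:port pairs (variable parts) along with preceding whitespace.
    .replace(/\s*\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?/g, "")
    // Unify Node's errno spelling with the plain English message.
    .replace(/\bECONNREFUSED\b/g, "Connection refused")
    .trim()
    .toLowerCase();
}

normalizeError("Connection refused");           // → "connection refused"
normalizeError("ECONNREFUSED 127.0.0.1:5432");  // → "connection refused"
```

With both messages collapsing to the same signature, the fourth failure joins the same cluster as the other four instead of hiding the pattern.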

The cost of debugging individually

  • Time: 5-10 minutes investigating each failure before realizing they’re the same
  • False fixes: Developers sometimes “fix” individual tests (adding retries, mocking) when the real issue is upstream
  • Noise: A 5-failure CI report feels worse than a 1-pattern report, even though the same amount of work is needed
  • Context switching: Jumping between 5 test files when the fix is in one shared dependency

How Gaffer Clusters Failures

Gaffer analyzes error messages across all failing tests in a run and groups them by shared patterns:

Clusters: 1 pattern (5 tests)
"Connection refused" — 5 tests
auth.test.ts > login
auth.test.ts > signup
api.test.ts > get users
api.test.ts > create user
db.test.ts > migrations

The algorithm:

  1. Extract error signatures from each failing test — the error message, stripped of variable parts (timestamps, UUIDs, port numbers)
  2. Group by similarity — tests with matching signatures form a cluster
  3. Name the cluster using the most common error message in the group
  4. Sort by impact — clusters with more affected tests appear first
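The four steps above can be sketched roughly like this. This is a toy implementation, assuming failures are already parsed into test-name/error pairs; the type names and signature-stripping regexes are illustrative, not Gaffer's internals:

```typescript
// Hypothetical shapes for parsed failures and resulting clusters.
interface Failure { test: string; error: string }
interface Cluster { name: string; tests: string[] }

// Step 1: strip variable parts (timestamps, UUIDs, port numbers)
// so the same underlying error yields the same signature.
function signature(error: string): string {
  return error
    .replace(/\d{4}-\d{2}-\d{2}T[\d:.]+Z?/g, "<ts>")
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/:\d+\b/g, ":<port>")
    .toLowerCase();
}

function clusterFailures(failures: Failure[]): Cluster[] {
  // Step 2: group failures whose signatures match.
  const groups = new Map<string, Failure[]>();
  for (const f of failures) {
    const sig = signature(f.error);
    const list = groups.get(sig);
    if (list) list.push(f);
    else groups.set(sig, [f]);
  }
  return [...groups.values()]
    .map((group) => {
      // Step 3: name the cluster after its most common raw error message.
      const counts = new Map<string, number>();
      for (const f of group) counts.set(f.error, (counts.get(f.error) ?? 0) + 1);
      const name = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
      return { name, tests: group.map((f) => f.test) };
    })
    // Step 4: sort by impact — clusters with more affected tests first.
    .sort((a, b) => b.tests.length - a.tests.length);
}
```

Because the input is just parsed name/error pairs, the same routine works regardless of which framework produced the report.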

This works across frameworks. Whether your errors come from Playwright, Jest, pytest, or JUnit XML, the clustering applies to the parsed error output.

When Clustering Matters Most

Infrastructure failures

Database down, Redis unavailable, network partition — these cascade across your entire test suite. Without clustering, a single infrastructure blip looks like 50 unrelated failures.

API contract changes

Someone changes an API response format. Every test that parses that response fails with a similar error. Clustering shows you the one change that needs reverting or adapting.

Environment issues

Missing environment variables, wrong Node version, incompatible dependency — these affect multiple tests in the same way. Clustering reveals the environment problem instead of burying it in individual test errors.

Flaky infrastructure in CI

CI runners sometimes have transient issues — DNS resolution failures, disk space, rate limits. Clustering helps you distinguish “the runner had a bad day” from “our code has bugs.”

Failure Clustering vs. Flaky Test Detection

These are complementary features that answer different questions:

          Failure Clustering                       Flaky Test Detection
Question  "Why did these tests fail together?"     "Does this test sometimes fail without code changes?"
Scope     Within a single test run                 Across multiple runs over time
Action    Fix the shared root cause                Investigate test non-determinism
Example   5 tests fail with "Connection refused"   login test fails 40% of runs

Used together, you get a complete picture: clustering tells you what went wrong right now, and flaky detection tells you what’s been unreliable over time.

Getting Started

Failure clustering works automatically with any test report uploaded to Gaffer. No configuration required.

  1. Upload a test report — via CLI, API, or direct upload
  2. View clusters — on the test run detail page, clusters appear below the failure list
  3. Query via CLI — gaffer query failures shows clusters in your terminal
  4. Query via MCP — AI agents access clusters through get_test_run_details()

Stop debugging the same root cause five times

Gaffer clusters failures automatically. Free tier, no credit card.

Start Free