Failure Clustering: Debug Root Causes, Not Individual Tests

Five tests fail in CI. You investigate the first one — “Connection refused.” You investigate the second — same error. The third, fourth, fifth — all the same root cause. You just spent 30 minutes discovering what should have been obvious in 3 seconds: one service is down, and every test that depends on it failed.

What Is Failure Clustering?

Failure clustering groups test failures that share the same root cause into a single cluster. Instead of showing you 5 individual failures, it shows you 1 pattern affecting 5 tests.

The idea is simple: failures don’t happen in isolation. When a database connection drops, every test that queries the database fails. When an API endpoint changes, every test that calls that endpoint fails. Treating each failure independently wastes time.

Why Individual Test Failures Mislead

Most test reporting tools show a flat list of failures:

Test                         Status   Error
auth.test.ts > login         FAIL     Connection refused
auth.test.ts > signup        FAIL     Connection refused
api.test.ts > get users      FAIL     Connection refused
api.test.ts > create user    FAIL     ECONNREFUSED 127.0.0.1:5432
db.test.ts > migrations      FAIL     Connection refused

A developer sees 5 failures and might think 5 things broke. In reality, one thing broke — the database connection — and it cascaded to 5 tests. The fourth failure even has a slightly different error message, making it harder to spot the pattern manually.
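That near-miss is exactly what signature normalization handles. As a rough sketch (a toy heuristic, not Gaffer's actual rules — the function name and regexes are illustrative), a normalizer can map both spellings of the error to one signature:

```typescript
// Illustrative only: a toy error normalizer, not Gaffer's actual rules.
// "Connection refused" and Node's "ECONNREFUSED 127.0.0.1:5432" describe
// the same condition, so a clusterer can reduce both to one signature.
function normalizeError(message: string): string {
  return message
    // Strip IP:port pairs (variable parts) along with preceding whitespace.
    .replace(/\s*\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?/g, "")
    // Unify Node's errno spelling with the plain English message.
    .replace(/\bECONNREFUSED\b/g, "Connection refused")
    .trim()
    .toLowerCase();
}

normalizeError("Connection refused");           // → "connection refused"
normalizeError("ECONNREFUSED 127.0.0.1:5432");  // → "connection refused"
```

With both messages collapsing to the same signature, the fourth failure joins the same cluster as the other four instead of hiding the pattern.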

The cost of debugging individually

  • Time: 5-10 minutes investigating each failure before realizing they’re the same
  • False fixes: Developers sometimes “fix” individual tests (adding retries, mocking) when the real issue is upstream
  • Noise: A 5-failure CI report feels worse than a 1-pattern report, even though the same amount of work is needed
  • Context switching: Jumping between 5 test files when the fix is in one shared dependency

How Gaffer Clusters Failures

Gaffer analyzes error messages across all failing tests in a run and groups them by shared patterns:

Clusters: 1 pattern (5 tests)
"Connection refused" — 5 tests
auth.test.ts > login
auth.test.ts > signup
api.test.ts > get users
api.test.ts > create user
db.test.ts > migrations

The algorithm:

  1. Extract error signatures from each failing test — the error message, stripped of variable parts (timestamps, UUIDs, port numbers)
  2. Group by similarity — tests with matching signatures form a cluster
  3. Name the cluster using the most common error message in the group
  4. Sort by impact — clusters with more affected tests appear first
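The four steps above can be sketched roughly like this. This is a toy implementation, assuming failures are already parsed into test-name/error pairs; the type names and signature-stripping regexes are illustrative, not Gaffer's internals:

```typescript
// Hypothetical shapes for parsed failures and resulting clusters.
interface Failure { test: string; error: string }
interface Cluster { name: string; tests: string[] }

// Step 1: strip variable parts (timestamps, UUIDs, port numbers)
// so the same underlying error yields the same signature.
function signature(error: string): string {
  return error
    .replace(/\d{4}-\d{2}-\d{2}T[\d:.]+Z?/g, "<ts>")
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/:\d+\b/g, ":<port>")
    .toLowerCase();
}

function clusterFailures(failures: Failure[]): Cluster[] {
  // Step 2: group failures whose signatures match.
  const groups = new Map<string, Failure[]>();
  for (const f of failures) {
    const sig = signature(f.error);
    const list = groups.get(sig);
    if (list) list.push(f);
    else groups.set(sig, [f]);
  }
  return [...groups.values()]
    .map((group) => {
      // Step 3: name the cluster after its most common raw error message.
      const counts = new Map<string, number>();
      for (const f of group) counts.set(f.error, (counts.get(f.error) ?? 0) + 1);
      const name = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
      return { name, tests: group.map((f) => f.test) };
    })
    // Step 4: sort by impact — clusters with more affected tests first.
    .sort((a, b) => b.tests.length - a.tests.length);
}
```

Because the input is just parsed name/error pairs, the same routine works regardless of which framework produced the report.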

This works across frameworks. Whether your errors come from Playwright, Jest, pytest, or JUnit XML, the clustering applies to the parsed error output.

When Clustering Matters Most

Infrastructure failures

Database down, Redis unavailable, network partition — these cascade across your entire test suite. Without clustering, a single infrastructure blip looks like 50 unrelated failures.

API contract changes

Someone changes an API response format. Every test that parses that response fails with a similar error. Clustering shows you the one change that needs reverting or adapting.

Environment issues

Missing environment variables, wrong Node version, incompatible dependency — these affect multiple tests in the same way. Clustering reveals the environment problem instead of burying it in individual test errors.

Flaky infrastructure in CI

CI runners sometimes have transient issues — DNS resolution failures, disk space, rate limits. Clustering helps you distinguish “the runner had a bad day” from “our code has bugs.”

Failure Clustering vs. Flaky Test Detection

These are complementary features that answer different questions:

          Failure Clustering                       Flaky Test Detection
Question  "Why did these tests fail together?"     "Does this test sometimes fail without code changes?"
Scope     Within a single test run                 Across multiple runs over time
Action    Fix the shared root cause                Investigate test non-determinism
Example   5 tests fail with "Connection refused"   login test fails 40% of runs

Used together, you get a complete picture: clustering tells you what went wrong right now, and flaky detection tells you what’s been unreliable over time.

Getting Started

Failure clustering works automatically with any test report uploaded to Gaffer. No configuration required.

  1. Upload a test report — via CLI, API, or direct upload
  2. View clusters — on the test run detail page, clusters appear below the failure list
  3. Query via CLI — gaffer query failures shows clusters in your terminal
  4. Query via MCP — AI agents access clusters through get_test_run_details()

Stop debugging the same root cause five times

Gaffer clusters failures automatically. Free tier, no credit card.

Start Free