Five tests fail in CI. You investigate the first one — “Connection refused.” You investigate the second — same error. The third, fourth, fifth — all the same root cause. You just spent 30 minutes discovering what should have been obvious in 3 seconds: one service is down, and every test that depends on it failed.
## What Is Failure Clustering?
Failure clustering groups test failures that share the same root cause into a single cluster. Instead of showing you 5 individual failures, it shows you 1 pattern affecting 5 tests.
The idea is simple: failures don’t happen in isolation. When a database connection drops, every test that queries the database fails. When an API endpoint changes, every test that calls that endpoint fails. Treating each failure independently wastes time.
## Why Individual Test Failures Mislead
Most test reporting tools show a flat list of failures:
| Test | Status | Error |
|---|---|---|
| auth.test.ts > login | FAIL | Connection refused |
| auth.test.ts > signup | FAIL | Connection refused |
| api.test.ts > get users | FAIL | Connection refused |
| api.test.ts > create user | FAIL | ECONNREFUSED 127.0.0.1:5432 |
| db.test.ts > migrations | FAIL | Connection refused |
A developer sees 5 failures and might think 5 things broke. In reality, one thing broke — the database connection — and it cascaded to 5 tests. The fourth failure even has a slightly different error message, making it harder to spot the pattern manually.
### The cost of debugging individually
- Time: 5-10 minutes investigating each failure before realizing they’re the same
- False fixes: Developers sometimes “fix” individual tests (adding retries, mocking) when the real issue is upstream
- Noise: A 5-failure CI report feels worse than a 1-pattern report, even though the same amount of work is needed
- Context switching: Jumping between 5 test files when the fix is in one shared dependency
## How Gaffer Clusters Failures
Gaffer analyzes error messages across all failing tests in a run and groups them by shared patterns:
```
Clusters: 1 pattern (5 tests)

"Connection refused" — 5 tests
  auth.test.ts > login
  auth.test.ts > signup
  api.test.ts > get users
  api.test.ts > create user
  db.test.ts > migrations
```

The algorithm:
- Extract error signatures from each failing test — the error message, stripped of variable parts (timestamps, UUIDs, port numbers)
- Group by similarity — tests with matching signatures form a cluster
- Name the cluster using the most common error message in the group
- Sort by impact — clusters with more affected tests appear first
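To make the steps concrete, here is a minimal sketch in TypeScript. It is an illustration of the technique, not Gaffer's actual implementation: it groups by exact normalized signature, whereas real similarity matching can also merge near-matches (such as "Connection refused" and "ECONNREFUSED 127.0.0.1:5432" from the table above). The names `signatureOf` and `clusterFailures` are invented for this sketch.

```typescript
interface Failure {
  test: string;
  error: string;
}

interface Cluster {
  signature: string; // normalized error message shared by the group
  name: string;      // most common raw error message in the group
  tests: string[];
}

// Step 1: strip variable parts (UUIDs, timestamps, addresses, numbers)
// so cosmetically different messages produce the same signature.
function signatureOf(error: string): string {
  return error
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, "<uuid>")
    .replace(/\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*/g, "<timestamp>")
    .replace(/\b\d{1,3}(\.\d{1,3}){3}(:\d+)?\b/g, "<addr>")
    .replace(/\b\d+\b/g, "<n>")
    .trim();
}

function clusterFailures(failures: Failure[]): Cluster[] {
  // Step 2: group tests whose signatures match.
  const groups = new Map<string, Failure[]>();
  for (const f of failures) {
    const sig = signatureOf(f.error);
    const group = groups.get(sig) ?? [];
    group.push(f);
    groups.set(sig, group);
  }

  // Step 3: name each cluster after its most common raw message.
  const clusters: Cluster[] = [];
  for (const [signature, members] of groups) {
    const counts = new Map<string, number>();
    for (const m of members) counts.set(m.error, (counts.get(m.error) ?? 0) + 1);
    const name = Array.from(counts.entries()).sort((a, b) => b[1] - a[1])[0][0];
    clusters.push({ signature, name, tests: members.map((m) => m.test) });
  }

  // Step 4: sort by impact, largest cluster first.
  return clusters.sort((a, b) => b.tests.length - a.tests.length);
}

const clusters = clusterFailures([
  { test: "auth.test.ts > login", error: "Connection refused" },
  { test: "auth.test.ts > signup", error: "Connection refused" },
  { test: "api.test.ts > get users", error: "Connection refused" },
  { test: "db.test.ts > migrations", error: "Connection refused" },
  { test: "payments.test.ts > checkout", error: "Timeout after 30000 ms" },
  { test: "payments.test.ts > refund", error: "Timeout after 5000 ms" },
]);
for (const c of clusters) console.log(`${c.name}: ${c.tests.length} tests`);
```

The two timeout messages normalize to the same signature (`Timeout after <n> ms`), so six failures collapse into two clusters: one of four tests and one of two.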
This works across frameworks. Whether your errors come from Playwright, Jest, pytest, or JUnit XML, the clustering applies to the parsed error output.
## When Clustering Matters Most
### Infrastructure failures
Database down, Redis unavailable, network partition — these cascade across your entire test suite. Without clustering, a single infrastructure blip looks like 50 unrelated failures.
### API contract changes
Someone changes an API response format. Every test that parses that response fails with a similar error. Clustering shows you the one change that needs reverting or adapting.
### Environment issues
Missing environment variables, wrong Node version, incompatible dependency — these affect multiple tests in the same way. Clustering reveals the environment problem instead of burying it in individual test errors.
### Flaky infrastructure in CI
CI runners sometimes have transient issues — DNS resolution failures, disk space, rate limits. Clustering helps you distinguish “the runner had a bad day” from “our code has bugs.”
## Failure Clustering vs. Flaky Test Detection
These are complementary features that answer different questions:
| | Failure Clustering | Flaky Test Detection |
|---|---|---|
| Question | "Why did these tests fail together?" | "Does this test sometimes fail without code changes?" |
| Scope | Within a single test run | Across multiple runs over time |
| Action | Fix the shared root cause | Investigate test non-determinism |
| Example | 5 tests fail with "Connection refused" | login test fails 40% of runs |
Used together, you get a complete picture: clustering tells you what went wrong right now, and flaky detection tells you what’s been unreliable over time.
## Getting Started
Failure clustering works automatically with any test report uploaded to Gaffer. No configuration required.
- Upload a test report — via CLI, API, or direct upload
- View clusters — on the test run detail page, clusters appear below the failure list
- Query via CLI — `gaffer query failures` shows clusters in your terminal
- Query via MCP — AI agents access clusters through `get_test_run_details()`
## Stop debugging the same root cause five times
Gaffer clusters failures automatically. Free tier, no credit card.
Start Free