Your AI agent can't fix what it can't see
Give Claude Code, Cursor, or any MCP client structured access to your test history. Agents query failures and fix code — no copy-pasting terminal output.
AI agents parse unstructured terminal output and miss context
Your agent reads a wall of test output and tries to extract meaning. It misses patterns, misidentifies root causes, and wastes tokens on noise.
No memory of previous test runs
Each coding session starts fresh. The agent doesn't know which tests were flaky last week, which failures are recurring, or what the coverage trend looks like.
Agents can't distinguish flaky failures from real regressions
Without historical flip rates, your agent treats every failure as a bug to fix — including the ones that pass on re-run.
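To make "flip rate" concrete: Gaffer's exact formula isn't shown here, but a common definition is the fraction of consecutive runs where the outcome flipped (pass to fail or fail to pass). A minimal sketch under that assumption:

```typescript
// Sketch only — one plausible flip-rate definition, not Gaffer's documented formula.
type Outcome = "pass" | "fail";

function flipRate(runs: Outcome[]): number {
  if (runs.length < 2) return 0;
  let flips = 0;
  for (let i = 1; i < runs.length; i++) {
    // Count each adjacent pair of runs where the outcome changed.
    if (runs[i] !== runs[i - 1]) flips++;
  }
  return flips / (runs.length - 1);
}

// An alternating test is maximally flaky; a consistent failure is a real regression.
console.log(flipRate(["pass", "fail", "pass", "pass", "fail"])); // 0.75
console.log(flipRate(["fail", "fail", "fail", "fail"]));         // 0
```

A high flip rate signals flakiness worth quarantining; a flip rate of zero with recent failures signals a genuine regression the agent should fix.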
Structured data, not terminal output
// Agent queries Gaffer MCP server
gaffer.get_flaky_tests()

// Structured response — not terminal output
{
  "flaky_tests": [
    {
      "name": "src/auth.test.ts > login",
      "flip_rate": 0.4,
      "last_10_runs": ["pass", "fail", "pass", "pass", "fail"...]
    }
  ],
  "health_score": 87,
  "failure_clusters": [
    { "pattern": "Connection refused", "count": 3 }
  ]
}
Your agent gets structured JSON — not text to parse. It can reason about flip rates, cluster patterns, and coverage gaps.
What agents can do with Gaffer
Query flaky tests
Get a list of unreliable tests with flip rates, run history, and last failure. Agents prioritize fixes by impact.
Understand failure clusters
Failures grouped by root-cause pattern. The agent sees "3 tests fail with Connection refused" and fixes the shared cause instead of patching each test individually.
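How clustering like this can work, as a sketch (not Gaffer's implementation): normalize each failure message into a signature that strips volatile details, then group by that signature so one root cause surfaces as one cluster.

```typescript
// Sketch only — illustrative clustering, not Gaffer's actual algorithm.
interface Failure {
  test: string;
  message: string;
}

// Strip volatile details (numbers, hosts, ports) so that
// "Connection refused 127.0.0.1:5432" and "...:6379" share one signature.
function signature(message: string): string {
  return message.replace(/\d+(\.\d+)*(:\d+)?/g, "<n>").trim();
}

function clusterFailures(failures: Failure[]): Map<string, Failure[]> {
  const clusters = new Map<string, Failure[]>();
  for (const f of failures) {
    const key = signature(f.message);
    const group = clusters.get(key) ?? [];
    group.push(f);
    clusters.set(key, group);
  }
  return clusters;
}

const failures: Failure[] = [
  { test: "db.test.ts > connect",  message: "Connection refused 127.0.0.1:5432" },
  { test: "cache.test.ts > get",   message: "Connection refused 127.0.0.1:6379" },
  { test: "api.test.ts > health",  message: "Connection refused 10.0.0.2:8080" },
];

for (const [pattern, group] of clusterFailures(failures)) {
  console.log(`${group.length} tests fail with "${pattern}"`);
}
```

Three distinct test failures collapse into a single "Connection refused" cluster, which is exactly the shape of signal an agent needs to fix the shared cause once.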
Historical context across sessions
Agents remember what happened in previous runs. They know which tests regressed, which are improving, and where coverage is dropping.
Works with any MCP client
Claude Code, Cursor, Windsurf, or any tool that supports the Model Context Protocol. Install once, use everywhere.
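For illustration, MCP clients typically register servers under an `mcpServers` key in a JSON config file (e.g. `.mcp.json` for Claude Code). The command and args below are placeholders, not documented Gaffer values:

```json
{
  "mcpServers": {
    "gaffer": {
      "command": "gaffer",
      "args": ["mcp"]
    }
  }
}
```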