Your AI agent can't fix what it can't see

Give Claude Code, Cursor, or any MCP client structured access to your test history. Agents query failures and fix code — no copy-pasting terminal output.

AI agents parse unstructured terminal output and miss context

Your agent reads a wall of test output and tries to extract meaning. It misses patterns, misidentifies root causes, and wastes tokens on noise.

No memory of previous test runs

Each coding session starts fresh. The agent doesn't know which tests were flaky last week, which failures are recurring, or what the coverage trend looks like.

Agents can't distinguish flaky failures from real regressions

Without historical flip rates, your agent treats every failure as a bug to fix — including the ones that pass on re-run.
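To make "flip rate" concrete, here is a minimal sketch of how it could be computed from a test's run history. It assumes one possible definition: the share of consecutive run pairs whose outcome changed. The `Outcome` type, the `flipRate` and `looksFlaky` functions, and the 0.3 threshold are all hypothetical, and Gaffer's actual calculation may differ.

```typescript
// Assumed outcome labels; a real history may also include skipped or errored runs.
type Outcome = "pass" | "fail";

// Assumed definition: fraction of consecutive run pairs whose outcome changed.
function flipRate(runs: Outcome[]): number {
  if (runs.length < 2) return 0; // not enough history to observe a flip
  let flips = 0;
  for (let i = 1; i < runs.length; i++) {
    if (runs[i] !== runs[i - 1]) flips++;
  }
  return flips / (runs.length - 1);
}

// Hypothetical heuristic: treat a test as flaky once its flip rate reaches a threshold.
function looksFlaky(runs: Outcome[], threshold = 0.3): boolean {
  return flipRate(runs) >= threshold;
}
```

Under this definition, a test that fails four runs in a row has a flip rate of 0, which points to a real regression rather than flakiness.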


Structured data, not terminal output

// Agent queries Gaffer MCP server
gaffer.get_flaky_tests()

// Structured response — not terminal output
{
  "flaky_tests": [
    {
      "name": "src/auth.test.ts > login",
      "flip_rate": 0.4,
      "last_10_runs": ["pass","fail","pass","pass","fail"...]
    }
  ],
  "health_score": 87,
  "failure_clusters": [
    { "pattern": "Connection refused", "count": 3 }
  ]
}

Your agent gets structured JSON — not text to parse. It can reason about flip rates, cluster patterns, and coverage gaps.
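As a sketch of what "reasoning about flip rates" might look like on the client side, the snippet below types the example response and orders tests from most to least flaky. The field names come from the example above; the TypeScript types and the `flakiestFirst` helper are assumptions for illustration, not a published schema or API.

```typescript
// Field names mirror the example response; the types themselves are assumed.
interface FlakyTest {
  name: string;
  flip_rate: number;
  last_10_runs: ("pass" | "fail")[];
}

interface FailureCluster {
  pattern: string;
  count: number;
}

interface FlakyTestsResponse {
  flaky_tests: FlakyTest[];
  health_score: number;
  failure_clusters: FailureCluster[];
}

// Hypothetical helper: parse the JSON text and return test names, flakiest first.
function flakiestFirst(json: string): string[] {
  const res = JSON.parse(json) as FlakyTestsResponse;
  return res.flaky_tests
    .slice() // copy so the parsed response is not mutated by sort
    .sort((a, b) => b.flip_rate - a.flip_rate)
    .map((t) => t.name);
}
```

Because the response is already structured, the ordering is a sort on one numeric field rather than a pattern match over log text.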


What agents can do with Gaffer

Query flaky tests

Get a list of unreliable tests with flip rates, run history, and last failure. Agents prioritize fixes by impact.

Understand failure clusters

Failures are grouped by root-cause pattern. The agent sees "3 tests fail with Connection refused" and fixes the shared cause instead of patching each test individually.
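The grouping idea can be sketched in a few lines. This is an illustrative approach only: it buckets failures by the first known substring found in the error message. The `Failure` and `Cluster` types, the `clusterFailures` function, and the "unclassified" bucket are hypothetical; how Gaffer actually derives clusters is not described here.

```typescript
interface Failure {
  test: string;
  message: string;
}

interface Cluster {
  pattern: string;
  count: number;
  tests: string[];
}

// Hypothetical clustering: bucket each failure under the first pattern its message contains.
function clusterFailures(failures: Failure[], patterns: string[]): Cluster[] {
  const clusters = new Map<string, Cluster>();
  for (const f of failures) {
    const pattern = patterns.find((p) => f.message.includes(p)) ?? "unclassified";
    const cluster = clusters.get(pattern) ?? { pattern, count: 0, tests: [] };
    cluster.count++;
    cluster.tests.push(f.test);
    clusters.set(pattern, cluster);
  }
  // Largest cluster first, so the most widely shared cause is handled first.
  return Array.from(clusters.values()).sort((a, b) => b.count - a.count);
}
```

With three failures whose messages contain "Connection refused" and one unrelated timeout, the first cluster returned has a count of 3, which is the signal to look at the shared dependency before touching individual tests.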

Historical context across sessions

With run history stored outside the session, agents know what happened in previous runs: which tests regressed, which are improving, and where coverage is dropping.

Works with any MCP client

Claude Code, Cursor, Windsurf, or any tool that supports the Model Context Protocol. Install once, use everywhere.

Give your AI agent test memory

Install the MCP server in under a minute.

Playwright, Vitest, Pytest, Jest, JUnit, PHP, XML, JSON, .NET, RSpec