Playwright MCP with Claude Code: Setup & CI Guide

Installing Playwright MCP in Claude Code is a single command: claude mcp add playwright npx @playwright/mcp@latest. After that, Claude can drive a real browser, click through your app, and read the page back as an accessibility tree. This guide covers the install, the configuration that actually matters, how to run it in CI, and the part every other guide skips: keeping the results after the session ends. We run our own dashboard suite this way. In a representative two-week dogfood window it logged a 99.95% pass rate across 173 uploads, with 4 flaky tests and a health score of 95, all still queryable after each run.

What is the Playwright MCP server?

Playwright MCP is a Model Context Protocol server, published by Microsoft as @playwright/mcp, that exposes a real Chromium, Firefox, or WebKit browser as a set of tools an AI agent can call: navigate, click, fill a form, evaluate JavaScript, snapshot the accessibility tree, capture a screenshot.

The reason it works well with Claude Code is that the agent gets the page as structured data, not pixels. Claude calls browser_snapshot, gets back the accessibility tree, and reasons about roles and labels directly. No OCR-ing a button out of a screenshot, no guessing click coordinates that break when the layout shifts.

Playwright MCP vs the Playwright CLI: when to use which

npx playwright test runs pre-written specs against a headless browser and produces a report. Playwright MCP runs the browser interactively under the agent’s control, with no spec file: the agent decides each next action from the current page state.

	Playwright CLI (`npx playwright test`)	Playwright MCP
Driven by	Pre-written spec files	The agent, live
Output	A report (JUnit, HTML, JSON)	Tool-call results in the session
Best for	Your regression suite in CI	Self-QA, reproducing a bug, generating specs
Survives the run	Yes, as report files	No, unless you capture it

You need both. The CLI is your suite. The MCP is how the agent drives the app the way a QA engineer would, when it is checking a fix before pushing.

Prerequisites

You need Node.js 18 or newer, the Claude Code CLI, and the Playwright browser binaries. Everything below assumes those three are in place.

Node.js version requirements

@playwright/mcp requires Node 18+. Check with node -v. If claude mcp add succeeds but the server fails to start, check the Node version first: claude mcp add only records the launch command, so an unsupported runtime does not surface until the server is actually started.
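As a rough preflight, the Node 18+ requirement can be checked in a few lines before the MCP step runs. This is a minimal sketch, not part of any tool mentioned here: nodeMajor and meetsMinimumNode are hypothetical helper names, and the literal version strings stand in for process.version.

```typescript
// Hypothetical preflight helpers; not part of @playwright/mcp or Claude Code.

// Extract the major version from a string like "v18.19.0" or "20.11.1".
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) {
    throw new Error(`Unrecognized Node version: ${version}`);
  }
  return Number(match[1]);
}

// True when the runtime satisfies the minimum major version (18 per this guide).
function meetsMinimumNode(version: string, minMajor: number = 18): boolean {
  return nodeMajor(version) >= minMajor;
}

// Example with literal inputs; swap in process.version in a real script.
for (const v of ["v16.20.2", "v18.19.0", "v22.3.0"]) {
  console.log(`${v}: ${meetsMinimumNode(v) ? "ok" : "too old for @playwright/mcp"}`);
}
```

Failing fast on the version gives a clearer error than a server that silently never connects.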

Installing the Claude Code CLI

Install Claude Code from npm and confirm it runs:

npm install -g @anthropic-ai/claude-code
claude --version

The claude mcp subcommand registers MCP servers in your Claude Code config, so the next step depends on this being on your PATH.

How to set up Playwright MCP with Claude Code

Run claude mcp add playwright npx @playwright/mcp@latest. That registers the launch command; npx resolves @latest each time the server starts, so the version can move between sessions. Start a session, run /mcp, and you should see playwright connected with about two dozen tools.

One-command setup: claude mcp add playwright

claude mcp add playwright npx @playwright/mcp@latest

To lock a specific version instead of tracking @latest, replace <version> with the release you want:

claude mcp add playwright npx @playwright/mcp@<version>

With a pinned version, bump by removing the entry (claude mcp remove playwright) and adding it again with the newer version. The package moves quickly, so pinning is reasonable if you care about reproducibility.

Verifying the connection

In a Claude Code session, run /mcp. You want playwright in the list with status connected and the browser tools registered (browser_navigate, browser_click, browser_snapshot, and friends). If the server fails to start, the binaries are usually missing:

npx playwright install chromium

That is the same step a regular Playwright project needs. The MCP server uses the same browser cache, so if you already run Playwright locally, the browsers are there.

Configuration: snapshot mode, browser selection, storage state

Three configuration choices cover almost everything:

Snapshot mode is the default and the one you want. The agent reads the accessibility tree, which is token-cheap and deterministic. Vision mode exists but costs more tokens and is more brittle.
Browser selection defaults to Chromium. Pass --browser firefox or --browser webkit in the MCP launch args if you need to reproduce a browser-specific bug.
Storage state keeps the agent logged in. Generate it once with regular Playwright, then point the MCP server at it:

npx playwright codegen --save-storage=auth.json https://app.example.com/login

Add --storage-state auth.json to the MCP server’s launch args, and the agent starts every session already authenticated instead of walking through the login flow first.
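The browser and storage-state choices both end up as launch arguments, which can be sketched as a small builder. This is illustrative only: McpLaunchOptions and buildLaunchArgs are hypothetical names, and the sketch emits just the two flags named in this section.

```typescript
// Hypothetical options object; only the flags named in this guide are emitted.
interface McpLaunchOptions {
  browser?: "chromium" | "firefox" | "webkit";
  storageState?: string; // path to a file saved with --save-storage
}

// Build the command and arguments that would follow `claude mcp add playwright`.
function buildLaunchArgs(options: McpLaunchOptions = {}): string[] {
  const args: string[] = ["npx", "@playwright/mcp@latest"];
  // Chromium is the default per this guide, so only pass --browser for the others.
  if (options.browser !== undefined && options.browser !== "chromium") {
    args.push("--browser", options.browser);
  }
  if (options.storageState !== undefined) {
    args.push("--storage-state", options.storageState);
  }
  return args;
}

console.log(buildLaunchArgs({ browser: "firefox", storageState: "auth.json" }).join(" "));
```

When you pass server flags through claude mcp add, check claude mcp add --help for how your CLI version separates its own options from the server's.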

What can you do with Playwright MCP in Claude Code?

Once it is connected, you ask Claude to do browser work in plain language and it drives the browser through tool calls. The five workflows below are the ones that hold up in practice.

Self-QA during development

You made a change and want to know if you broke an obvious flow. Ask Claude to walk it:

Open http://localhost:5173/login, sign in with <test-user-email> / password, then verify the dashboard shows my four most recent test runs.

Claude calls browser_navigate, browser_snapshot to read the form, browser_type for the inputs, browser_click to submit, and browser_snapshot again to read the dashboard back. It catches regressions before CI runs. It does not catch anything you did not think to ask about.

Exploratory testing: asking Claude to find edge cases

Instead of scripting the checks, hand the agent a goal and let it probe:

Try to submit the signup form with invalid inputs. Report any case where the error message is missing, wrong, or where the form submits anyway.

The agent generates its own inputs and reads the results. This is genuinely useful for the boring permutations a human skips, with the caveat that it is non-deterministic. Treat what it finds as leads, not a passing suite.

Authenticated testing with storage state

With the auth.json from the configuration step, every session starts logged in. The agent can exercise account settings, billing, and other gated flows without walking through the login form first. Regenerate the storage state when the saved session expires.
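One way to notice that the saved state needs regenerating is to look at the cookie expiry times in the file. The sketch below assumes the storage-state shape Playwright writes, where each cookie carries an expires Unix timestamp in seconds and -1 marks a session cookie. expiredCookieNames is a hypothetical helper, and the sample data is made up.

```typescript
// Subset of the storage-state file shape written by Playwright: cookies carry
// an `expires` Unix timestamp in seconds, with -1 marking a session cookie.
interface StoredCookie {
  name: string;
  expires: number;
}
interface StorageState {
  cookies: StoredCookie[];
}

// Names of cookies whose expiry is at or before `nowSeconds`.
function expiredCookieNames(state: StorageState, nowSeconds: number): string[] {
  return state.cookies
    .filter((cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds)
    .map((cookie) => cookie.name);
}

// Inline sample standing in for the contents of auth.json (values are made up).
const sampleState: StorageState = JSON.parse(
  '{"cookies":[{"name":"session","expires":1700000000},{"name":"theme","expires":-1}]}'
);
const nowSeconds = Math.floor(Date.now() / 1000);
console.log(expiredCookieNames(sampleState, nowSeconds));
```

This only catches the obvious case. A server can invalidate a session whose cookie has not yet expired, so a failed login check in the browser remains the real signal.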

Debugging flaky tests

When a spec passes locally but flakes in CI, ask the agent to drive that exact flow live and watch what differs. The MCP gives it the live browser; the failure record gives it the target. We will wire the failure record in below, through Gaffer’s flaky test detection.

Generating test code from a live session

After the agent walks a flow once, ask it to write the spec:

Now write that as a Playwright test in tests/dashboard.spec.ts.

It translates the tool-call sequence into a real spec. This is the fastest path from “this flow works” to “this flow is covered,” but the generated file vanishes with the session unless you commit it. Memory is the recurring theme of this guide.

Running Playwright MCP in CI

In CI, the agent runs headless and the browser binaries must be present in the image. The MCP server controls a browser the same way it does locally, so the failures are the usual Playwright-in-Docker ones: missing system libraries, no display, and binaries that were never installed.

Headless mode and browser binaries in Docker

Headless is the default in a CI environment, so you usually do not toggle anything for that; pass --headless in the MCP launch args if you want it explicit. What you do need is the browsers plus their system dependencies in the image:

npx playwright install --with-deps chromium

--with-deps pulls the apt packages Chromium needs on a stripped-down CI image. Skip it and you get cryptic shared-library errors at launch, not at install.

Common CI failures and how to fix them

Symptom	Cause	Fix
`Executable doesn't exist`	Browser binaries not installed	`npx playwright install --with-deps`
`error while loading shared libraries`	Missing system deps	Add `--with-deps`, or use the official Playwright image
Server fails to start	Node < 18	Pin a Node 18+ step before the MCP step
Hangs forever	Waiting on a display	Confirm headless mode; do not force `--headed` in CI

When in doubt, base the job on mcr.microsoft.com/playwright, which ships the browsers and deps preinstalled.
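The two log-message rows of the table above can be turned into a small lookup that a CI wrapper runs over a failed step's output. This is a sketch under stated assumptions: the entries are copied from the table, the matching is a plain substring check, and KNOWN_FAILURES and diagnose are hypothetical names.

```typescript
// Symptom-to-fix pairs copied from the table above.
interface KnownFailure {
  symptom: string;
  cause: string;
  fix: string;
}

const KNOWN_FAILURES: KnownFailure[] = [
  {
    symptom: "Executable doesn't exist",
    cause: "Browser binaries not installed",
    fix: "npx playwright install --with-deps",
  },
  {
    symptom: "error while loading shared libraries",
    cause: "Missing system deps",
    fix: "Add --with-deps, or use the official Playwright image",
  },
];

// Return the first known failure whose symptom text appears in the log, if any.
function diagnose(logText: string): KnownFailure | undefined {
  return KNOWN_FAILURES.find((entry) => logText.indexOf(entry.symptom) !== -1);
}

// Illustrative input only; real log lines carry more context around the symptom.
const hit = diagnose("launch failed: Executable doesn't exist");
console.log(hit ? `${hit.cause} -> ${hit.fix}` : "no known symptom matched");
```

The hang and Node-version rows are left out because they do not produce a distinctive message to match on.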

Running browser QA on every branch automatically

The agentic version of CI runs the agent as a workflow step: it drives the browser, then runs the suite, then pushes. The key is that the browser session and the suite run both produce output you want to keep. The next section is about not throwing that output away.

Capturing and storing Playwright MCP test results

By default, everything a Playwright MCP session produces dies when the CI job exits: the browser run, the failures Claude found, and the test code it generated. To give the agent test memory, wrap your test command with gaffer test so each run is parsed and stored, then query past runs with gaffer query or the Gaffer MCP.

Why ephemeral CI output breaks the agentic loop

The loop is two-sided. Playwright MCP covers the present: this page, this run, this browser. The missing side is history: was this test flaky last week, is this failure new, did the suite already pass on main. Without that, every CI failure looks new. The agent re-investigates a flaky test it already triaged, re-applies a fix that already failed, and treats a 200-failure run as 200 problems instead of the three or four root causes it actually is.
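The point about 200 failures hiding three or four root causes can be made concrete by grouping failures on a normalized error message. The heuristic below is a deliberately crude sketch, not how Gaffer or Playwright cluster failures: Failure, signature, and groupBySignature are hypothetical names, and the sample messages are made up.

```typescript
// Hypothetical failure record; real reports carry more fields.
interface Failure {
  test: string;
  message: string;
}

// Reduce an error message to a coarse signature: first line only, with quoted
// strings and numbers masked so near-identical failures collapse together.
function signature(message: string): string {
  const firstLine = message.split("\n")[0] ?? "";
  return firstLine
    .replace(/"[^"]*"|'[^']*'/g, "<str>")
    .replace(/\d+/g, "<n>")
    .trim();
}

// Map each signature to the tests that failed with it.
function groupBySignature(failures: Failure[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const failure of failures) {
    const key = signature(failure.message);
    const tests = groups.get(key) ?? [];
    tests.push(failure.test);
    groups.set(key, tests);
  }
  return groups;
}

// Made-up failures for illustration.
const sampleFailures: Failure[] = [
  { test: "saves profile", message: "Timeout 30000ms exceeded waiting for 'button#save'" },
  { test: "opens next page", message: "Timeout 5000ms exceeded waiting for 'a.next'" },
  { test: "shows total", message: "Expected 4 rows but found 3" },
];
groupBySignature(sampleFailures).forEach((tests, key) => {
  console.log(`${tests.length} x ${key}`);
});
```

Grouping like this only works within one run. Deciding whether a group is new or recurring still needs the stored history described above.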

A pass rate, a flaky count, a health score: none of that exists if the per-run output is discarded at job exit. That history is the difference between a browser-automation demo and a loop the agent can reason against.

Wrapping MCP runs with gaffer test for persistent reporting

gaffer test wraps your existing test command, parses whatever report it produces (Playwright, JUnit, and more), and stores the result:

gaffer test -- npx playwright test --reporter=line,junit

In CI, the official Action does the upload after the suite runs. Use @v2 with gaffer_upload_token (the older gaffer_api_key input is deprecated):

- name: Install Playwright browsers
  run: npx playwright install --with-deps chromium

- name: Run Playwright tests
  run: npx playwright test --reporter=line,html,junit

- name: Upload results to Gaffer
  if: always()
  uses: gaffer-sh/gaffer-uploader@v2
  with:
    gaffer_upload_token: ${{ secrets.GAFFER_UPLOAD_TOKEN }}
    report_path: ./test-results
    commit_sha: ${{ github.sha }}
    branch: ${{ github.ref_name }}
    test_framework: playwright

if: always() matters. The runs you most want to keep are the failing ones, so gating the upload on success throws away exactly the data the agent needs next session. The same upload works from the CLI with gaffer upload ./test-results --token $GAFFER_UPLOAD_TOKEN --commit-sha $GITHUB_SHA --branch $GITHUB_REF_NAME --test-framework playwright.

Querying past sessions with gaffer query

gaffer query reads the stored history locally, no dashboard round-trip:

gaffer query flaky                  # flaky tests ranked by score
gaffer query history "login flow"   # pass/fail record for one test
gaffer query health                 # health score and trend
gaffer query runs --limit 20        # recent runs with counts

For the agent to ask the same questions mid-session, give your AI coding tools access to your test results by installing the Gaffer MCP alongside Playwright MCP. It exposes get_flaky_tests (tests above a flip-rate threshold over a window, default 30 days) and get_test_history (the pass/fail record for one test across runs), among others. The full tool reference is at /docs/mcp/.

The full agentic loop: Claude Code, Playwright MCP, Gaffer, CI

The complete loop has four legs: Claude drives the browser with Playwright MCP, runs the suite, CI uploads the results to Gaffer, and the next session reads that history back through the Gaffer MCP before deciding what to fix. Playwright MCP gives the agent eyes and hands; the result store gives it memory.

In practice that means the agent calls get_test_history before “fixing” a failing test: if it passed on main last week, the regression is recent and probably from code that just landed. It calls get_flaky_tests before opening a fix at all: a test that flipped four times in two weeks is a quarantine candidate, not a code change. Those decisions are unreachable from a CI log dump. They need stored history.
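The decision described above can be sketched as a function over one test's pass/fail record, oldest run first. This is an assumption-laden illustration, not Gaffer's scoring: countFlips and triage are hypothetical names, and the threshold of four flips simply mirrors the example in the text.

```typescript
type Verdict =
  | "no-history"
  | "quarantine-candidate"
  | "recent-regression"
  | "persistent-failure"
  | "passing";

// Count pass/fail transitions between consecutive runs (oldest first).
function countFlips(results: boolean[]): number {
  let flips = 0;
  for (let i = 1; i < results.length; i++) {
    if (results[i] !== results[i - 1]) flips++;
  }
  return flips;
}

// Classify a test from its history; true means the run passed.
function triage(results: boolean[], flipThreshold: number = 4): Verdict {
  if (results.length === 0) return "no-history";
  if (countFlips(results) >= flipThreshold) return "quarantine-candidate";
  if (results[results.length - 1] === true) return "passing";
  // Latest run failed: an earlier pass in the window suggests a recent change.
  const passedEarlier = results.slice(0, -1).some((passed) => passed);
  return passedEarlier ? "recent-regression" : "persistent-failure";
}

console.log(triage([true, false, true, false, true]));
console.log(triage([true, true, true, false]));
```

In a real loop, the boolean array would be built from whatever get_test_history returns. The shape of that response is not assumed here.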

This guide is the setup-and-storage half. The deeper walkthrough of that loop, with the two-MCP triage flow drawn out end to end, is in Playwright MCP + Claude Code: A Complete Test Loop.

Frequently asked questions

Can Claude Code use Playwright MCP?

Yes. Install it with claude mcp add playwright npx @playwright/mcp@latest, then run /mcp in a session to confirm it is connected. Claude can then drive a real browser through tool calls: navigate, click, fill forms, and read the page back as an accessibility tree.

What is the most useful MCP for Claude Code?

It depends on the task, but for testing work, Playwright MCP plus a result-history MCP is the pairing that holds up. Playwright MCP lets the agent drive the browser; a history MCP like Gaffer’s lets it ask whether a failure is new, flaky, or a known regression. One without the other leaves the agent either blind to the live app or blind to the suite’s past.

What is the difference between Playwright MCP and Claude in Chrome?

Playwright MCP controls a Playwright-managed browser (Chromium, Firefox, or WebKit) as MCP tools, reading the page as an accessibility tree, which makes it well suited to CI and headless runs. Claude in Chrome is a browser extension that operates inside your live Chrome session, oriented toward interactive use on real sites you are already logged into. Playwright MCP is the better fit for automated browser QA and CI; Claude in Chrome is the better fit for ad-hoc tasks in your own browser.

Do I need to install Playwright separately to use Playwright MCP?

You need the browser binaries, which @playwright/mcp uses from the same cache as the regular Playwright CLI. If the server fails to start, run npx playwright install chromium (add --with-deps in CI). You do not need a Playwright project or config file for interactive MCP use.

Where do Playwright MCP results go after the session ends?

Nowhere, by default. The browser run and any generated specs live only in that session. To keep them, wrap your suite with gaffer test or upload with the gaffer-sh/gaffer-uploader@v2 Action, then query the history with gaffer query or the Gaffer MCP.

Setup is one command. The work that pays off is storing each run so the agent’s next session starts with memory instead of a blank slate. The deeper loop is in A Complete Test Loop, and the full MCP tool reference is at /docs/mcp/.

Gaffer