Dogfooding Gaffer's MCP Server to Fix Slow Playwright Tests

I’ve been building an MCP server for Gaffer that lets you query test analytics from Claude (or any MCP-compatible tool). I already had some basic functionality in place (querying test results, grabbing project details, etc.), but it didn’t actually feel useful during Claude sessions. I decided to spend time improving that while giving Claude a simple task: “Use the Gaffer MCP to figure out why my playwright tests are slow.”

If the tools didn’t exist or weren’t up to snuff, we’d take a break, improve them, and come back to the task of speeding up my tests…or, spoiler alert, fixing the underlying code those tests covered. To be honest, if I couldn’t get the MCP to a point where it could help solve some real-world issues, I was probably going to scrap it.

The Starting Point

One of the first features I added to the MCP was get_slowest_tests. Here’s what Gaffer’s slowest Playwright tests looked like:

| Test | File | Avg | P95 |
| --- | --- | --- | --- |
| Skip to dashboard on invitation | auth/organization.spec.ts | 17.9s | 53.4s |
| Complete onboarding wizard | auth/onboarding.spec.ts | 17.2s | 31.6s |
| Invitations + access attempt | onboarding-access-request.spec.ts | 18.4s | 27.9s |
| Upload token CRUD | projects.spec.ts | 15.0s | 27.4s |
| API key CRUD | projects.spec.ts | 17.4s | 25.5s |

It came back with the 20 slowest tests ranked by P95 duration. That P95 of 53.4 seconds for a single test was rough. And the gap between average (17.9s) and P95 (53.4s) suggested something flaky was going on too. Claude was quick to notice a pattern with auth and onboarding tests dominating the list of slow tests.
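To make the average-versus-P95 gap concrete, here is a minimal sketch of how the two numbers can diverge. The helper name, the nearest-rank percentile method, and the sample durations are all assumptions for illustration; this is not how Gaffer computes its statistics.

```typescript
// Hypothetical helper: summarize one test's run durations (in seconds).
interface DurationSummary {
  avg: number
  p95: number
}

function summarizeDurations(durations: number[]): DurationSummary {
  if (durations.length === 0) {
    throw new Error('need at least one duration')
  }
  const sorted = [...durations].sort((a, b) => a - b)
  const avg = sorted.reduce((sum, d) => sum + d, 0) / sorted.length
  // Nearest-rank P95: the smallest value with at least 95% of runs at or below it.
  const rank = Math.ceil((95 * sorted.length) / 100)
  const p95 = sorted[rank - 1]
  return { avg, p95 }
}

// Made-up sample: 18 steady runs at 16s plus two slow outliers at 54s.
const runs: number[] = [...Array.from({ length: 18 }, () => 16), 54, 54]
const summary = summarizeDurations(runs)
console.log(summary) // avg is 19.8, p95 is 54
```

A couple of slow runs barely move the average but dominate the P95, which is why a wide gap between the two hints at intermittent slowness rather than a uniformly slow test.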

Playwright Test Specific Issues

With a list of the slowest tests, it was easy to start taking a look at commonalities and searching for patterns between them. Some patterns jumped out quickly, others were harder to spot.

Sequential API Setup

Tests were creating users, projects, and test runs one at a time:

// Before: Sequential - each call waits for the previous
const userA = await testDataApi.createUser()
const project = await testDataApi.createProject(userA.organizationId)
const testRun = await testDataApi.createTestRun(project.projectId)
const userB = await testDataApi.createUser()  // Could have run in parallel with userA!

With 4-5 sequential API calls before the test even starts, that’s 3-5 seconds of setup per test.

Redundant Navigation

Some tests were navigating to /dashboard first, then to the actual test URL:

// Before: Why are we going to dashboard first?
await page.goto('/dashboard', { waitUntil: 'hydration' })
await page.goto(reportUrl)  // This is where we actually need to be

The session was already established via cookies. The dashboard visit was doing nothing except forcing Playwright to wait twice for page loads.

Sequential Visibility Checks

Multiple tests were checking elements one at a time:

// Before: Each waits independently
await expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible()
await expect(page.getByRole('heading', { name: /organization/i })).toBeVisible()
await expect(page.getByRole('textbox', { name: /name/i })).toBeVisible()

These could all run in parallel since they’re checking independent elements…or in some cases, the tests really only needed one check. I’ve noticed this pattern with long-lived Playwright tests in the past: an engineer needs to verify that one more element is visible and, instead of adding the check to a single Promise.all, appends it to the bottom of the list. While this is mostly painless, the verifications can slowly build up and cause issues.

The Fixes

Parallelize User Creation

// After: Create independent users in parallel
const [userA, userB] = await Promise.all([
  testDataApi.createUser(),
  testDataApi.createUser(),
])

// Sequential only where there's a real dependency
const project = await testDataApi.createProject(userA.organizationId)
const testRun = await testDataApi.createTestRun(project.projectId)
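To see why this helps, here is a small self-contained timing sketch. The fakeCreateUser function and its 50 ms latency are stand-ins I made up; they are not Gaffer's testDataApi.

```typescript
// Stand-in for a test-data API call; the 50 ms latency is an arbitrary assumption.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function fakeCreateUser(name: string): Promise<{ name: string }> {
  await sleep(50)
  return { name }
}

// Measure how long an async piece of work takes, in milliseconds.
async function timeIt<T>(work: () => Promise<T>): Promise<{ result: T; ms: number }> {
  const start = Date.now()
  const result = await work()
  return { result, ms: Date.now() - start }
}

async function compareSetup(): Promise<{ sequentialMs: number; parallelMs: number }> {
  // Sequential: the second call does not start until the first resolves.
  const sequential = await timeIt(async () => {
    const a = await fakeCreateUser('a')
    const b = await fakeCreateUser('b')
    return [a, b]
  })
  // Parallel: both calls start immediately and are awaited together.
  const parallel = await timeIt(() =>
    Promise.all([fakeCreateUser('a'), fakeCreateUser('b')]),
  )
  return { sequentialMs: sequential.ms, parallelMs: parallel.ms }
}

compareSetup().then(({ sequentialMs, parallelMs }) => {
  console.log(`sequential: ${sequentialMs} ms, parallel: ${parallelMs} ms`)
})
```

The sequential version pays roughly the sum of the latencies, while the parallel version pays roughly the longest one. The saving only applies to calls with no data dependency between them, which is why createProject and createTestRun stay sequential above.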

Remove Redundant Navigation

// After: Go directly where you need to be
await page.goto(reportUrl, { waitUntil: 'hydration' })

Parallelize Assertions

// After: Check all at once
await Promise.all([
  expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible(),
  expect(page.getByRole('heading', { name: /organization/i })).toBeVisible(),
  expect(page.getByRole('textbox', { name: /name/i })).toBeVisible(),
])
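One caveat with Promise.all is that it rejects on the first failing assertion, so the other failures never show up in the report. A possible variation, sketched here with a hypothetical checkAll helper that is not part of Playwright, uses Promise.allSettled to wait for every check and list all the failures together:

```typescript
// Hypothetical helper: run independent checks in parallel and report every failure.
async function checkAll(checks: Array<Promise<unknown>>): Promise<void> {
  const results = await Promise.allSettled(checks)
  const failures = results
    .filter((r): r is PromiseRejectedResult => r.status === 'rejected')
    .map((r) => (r.reason instanceof Error ? r.reason.message : String(r.reason)))
  if (failures.length > 0) {
    throw new Error(`${failures.length} check(s) failed:\n${failures.join('\n')}`)
  }
}

// In a Playwright test, the call might look like this (same assertions as above):
// await checkAll([
//   expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible(),
//   expect(page.getByRole('heading', { name: /organization/i })).toBeVisible(),
//   expect(page.getByRole('textbox', { name: /name/i })).toBeVisible(),
// ])
```

The trade-off is that a failing run now waits for every assertion to settle instead of stopping at the first failure, so this is mostly worth it when seeing the full list of missing elements saves a debugging round trip.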

The results:

There was a modest improvement, but really nothing worth celebrating. I think there may have been a ~3s improvement per test. And honestly, even with ~5-10 test runs, I couldn’t be completely sure that some outlier run wouldn’t pop up and eat an extra three or four of my precious GitHub Actions minutes.

Gaffer Code Fixes

The real culprit was in Gaffer’s own code, and it took some digging to find. The auth middleware was waiting up to 10 seconds for auth state to load on every single navigation.
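For a sense of what that kind of wait looks like, here is a rough sketch of a polling guard with a timeout. The waitForSession name, the Session shape, and the polling approach are assumptions for illustration only; this is not Gaffer’s actual middleware.

```typescript
// Hypothetical sketch of a "wait for auth state" guard; not Gaffer's real middleware.
type Session = { userId: string } | null

async function waitForSession(
  getSession: () => Session,
  timeoutMs: number,
  pollMs = 10,
): Promise<Session> {
  const deadline = Date.now() + timeoutMs
  let session = getSession()
  // Keep polling until a session appears or the timeout elapses.
  while (session === null && Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs))
    session = getSession()
  }
  return session
}
```

If the session is already there on the first check, a guard like this returns without delay. If the session only shows up later, every navigation pays for the wait, and a test suite that navigates a lot pays for it many times over.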

The problem? Better Auth uses absolute URLs by default when making session requests. During SSR in Nuxt, this breaks cookie forwarding - the server-side fetch doesn’t include the user’s cookies because it’s treated as a cross-origin request.

// Before: Absolute URL breaks SSR cookie forwarding
const authClient = createAuthClient({
  baseURL: 'https://app.gaffer.sh/api/auth',  // Cross-origin during SSR!
  plugins: [...]
})

The result: session is null during SSR, then suddenly available after client hydration. The auth middleware had to wait for the client to load and make a fresh session request. The fix was to use relative URLs so cookies are forwarded properly:

// After: Relative URL preserves cookies during SSR
const authClient = createAuthClient({
  baseURL: '/api/auth',  // Same-origin, cookies forwarded
  plugins: [...]
})

This required some refactoring - moving the auth plugin from client-only to universal, updating the session hook to work during SSR, and adjusting the middleware timeout from 10s to 5s since auth now resolves immediately. (Related issue: better-auth/better-auth#5358)

Non-Blocking Data Fetches

A smaller but still useful optimization involved the dashboard, which was using blocking data fetches that delayed hydration:

// Before: Page waits for all API calls before rendering
const { data: projects } = await useFetch('/api/projects')

// After: Page renders immediately, data loads in background
const { data: projects, status } = useLazyFetch('/api/projects')

<!-- In the template: show skeletons while loading -->
<template v-if="status === 'pending'">
  <USkeleton class="h-9 w-16" />
</template>

Now the page hydrates immediately and tests pass faster. The actual data arrives moments later with skeleton placeholders in the meantime.

Results

After the SSR auth fix, I used get_test_history to check the damage:

| Test | Before | After | Change |
| --- | --- | --- | --- |
| Invitations + access attempt | ~23s | 8-14s | ~50% |
| Skip to dashboard on invitation (P95) | 53.4s | 35.9s | -33% |

The “Invitations + access attempt” test that was running 22-27s before is now consistently hitting 8-9 seconds in recent runs. That’s not a typo - the SSR auth problem was the real bottleneck.

Takeaways

The Playwright-specific fixes were fine. Parallelizing setup, removing redundant navigation, batching assertions - each saved maybe a second or two. Nothing to write home about.

The SSR auth fix was the actual win. It wasn’t visible in test code at all. The problem was buried in the auth layer, and the only reason I spotted it was because every auth test was slow. That pattern - “all tests in category X are slow” - is worth paying attention to. It usually means the problem isn’t the tests.

I’m glad I stuck with improving the MCP until it could actually help me solve a real problem. It would have been easy to call it “good enough” after the basic endpoints were working. But forcing myself to use it on a real task exposed the gaps - and now the MCP is actually useful.

