Test Reporting: What to Include and How to Automate It

Most guides on test reporting assume a QA lead assembling a release document for stakeholders: a hand-written defect log, a coverage table, a sign-off summary. For a developer running tests in CI, that picture is wrong. Your test report is already generated automatically on every push. The real problems are that it’s a local HTML file nobody else can open, it gets overwritten on the next build, and there’s no way to compare today’s run against last week’s. This guide covers what a test report should contain and the standard report types, then spends most of its time on the part that actually bites: getting reports off the CI runner and somewhere your team and your history can use them.

What is test reporting?

Test reporting is the practice of turning the raw output of a test run into a structured artifact that other people and other tools can read. A test runner prints pass/fail lines to a terminal. Test reporting captures that information in a durable format, records which tests ran and how they did, and makes the result available beyond the machine that produced it.

For a manual QA process, reporting is a person writing up what they observed. For an automated pipeline, the test framework emits the report itself: JUnit XML, a JSON file, or an HTML page, written to disk the moment the run finishes. The interesting work is no longer producing the report. It’s what happens to the file afterward.

Test reporting vs. test reports: the process and the artifact

These two terms get used interchangeably, but they describe different things.

A test report is the artifact: a single file or page describing one test run. Typical examples are results.xml, index.html, or a CTRF JSON file.

Test reporting is the process around that artifact: generating it, storing it, sharing it, and comparing it against previous runs. A team can produce a perfect test report on every build and still have no test reporting worth the name, because the files land in a CI artifact bucket nobody opens and vanish when the retention window closes.

The rest of this guide treats the report as a solved problem (your framework already writes one) and focuses on the reporting.

What should a test report include?

A useful test report answers four questions: what was tested, what happened, what broke, and where it ran. Whether a human or a CI pipeline produces it, those four are the load-bearing sections.

Project information and test scope

The header. What project, which version or commit, which branch, and what subset of tests this report covers (full suite, smoke tests, a single shard). In automated reports this is metadata you tag at upload time rather than prose you write: a commit SHA and branch name are enough to make any report traceable back to the exact code it ran against.
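As a rough illustration of tagging metadata at upload time, the sketch below reads the commit and branch from the environment. GITHUB_SHA and GITHUB_REF_NAME are the variables GitHub Actions sets; other CI providers use different names. TEST_SCOPE is a made-up variable for this example, not something any tool defines.

```python
import os


def report_metadata(env=os.environ):
    """Collect the tags that make a report traceable to the code it ran against."""
    return {
        # Set by GitHub Actions; adjust the names for other CI providers.
        "commit_sha": env.get("GITHUB_SHA", "unknown"),
        "branch": env.get("GITHUB_REF_NAME", "unknown"),
        # Hypothetical variable for this sketch, e.g. "smoke" or "shard-2/4".
        "scope": env.get("TEST_SCOPE", "full"),
    }


print(report_metadata({"GITHUB_SHA": "abc123", "GITHUB_REF_NAME": "main"}))
```

Passing the environment in as an argument keeps the function easy to exercise outside CI.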

Test summary (pass, fail, skip counts)

The numbers that matter at a glance: total tests, passed, failed, skipped, and the resulting pass rate. Most report formats put this at the top because it’s the first thing anyone looks at. A summary of 412 passed, 3 failed, 5 skipped tells you more in one line than a wall of green checkmarks.
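For a sense of how little it takes to derive that summary line, here is a minimal sketch that counts outcomes in a JUnit XML string using only the Python standard library. The sample XML is invented for the example, and computing the pass rate over executed tests (skips excluded) is one possible convention, not a standard.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files come from the test framework.
SAMPLE = """<testsuites>
  <testsuite name="auth" tests="4">
    <testcase classname="auth" name="logs in"/>
    <testcase classname="auth" name="logs out"/>
    <testcase classname="auth" name="rejects bad password">
      <failure message="expected 401, got 200">trace...</failure>
    </testcase>
    <testcase classname="auth" name="sso"><skipped/></testcase>
  </testsuite>
</testsuites>"""


def summarize(junit_xml):
    """Return total/passed/failed/skipped counts and a pass rate."""
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    for case in ET.fromstring(junit_xml).iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            counts["failed"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    executed = counts["passed"] + counts["failed"]
    # Assumption: skipped tests do not count toward the pass rate.
    pass_rate = round(100 * counts["passed"] / executed, 1) if executed else 0.0
    return {"total": sum(counts.values()), **counts, "pass_rate": pass_rate}


print(summarize(SAMPLE))
```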

Defect log

The list of failures, each with the test name, the assertion that failed, and the error or stack trace. In a manual QA report this is written by hand and cross-referenced to a tracker. In an automated report the framework produces it for free: every failed test carries its message and trace in the output. The job of the reporting layer is to make those failures searchable across runs, instead of visible only in the current one.
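To show what "for free" means in practice, the sketch below pulls a defect log out of a JUnit XML string: one entry per failed test, with its message and trace. The sample document and the `classname::name` naming are choices made for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample with one failing test.
SAMPLE = """<testsuites>
  <testsuite name="auth">
    <testcase classname="auth" name="logs in"/>
    <testcase classname="auth" name="rejects bad password">
      <failure message="expected 401, got 200">AssertionError at login.spec.ts:42</failure>
    </testcase>
  </testsuite>
</testsuites>"""


def defect_log(junit_xml):
    """List every failed or errored test with its message and trace."""
    entries = []
    for case in ET.fromstring(junit_xml).iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                entries.append({
                    "test": f'{case.get("classname", "")}::{case.get("name", "")}',
                    "kind": kind,
                    "message": node.get("message", ""),
                    "trace": (node.text or "").strip(),
                })
                break  # one entry per test case
    return entries


for entry in defect_log(SAMPLE):
    print(entry["test"], "-", entry["message"])
```

Storing these entries per run, keyed by test name, is what makes a failure searchable across runs rather than only in the current one.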

Test environment details

Where the tests ran: OS, runtime version, browser (for end-to-end suites), and any relevant configuration. This is the section people skip until a test passes locally and fails in CI, at which point it’s the only section that matters. Capturing it automatically from the CI environment beats documenting it by hand.
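A minimal sketch of automatic capture, assuming the snapshot is taken by a Python helper: the `platform` module supplies the OS details, and CI and RUNNER_OS are variables GitHub Actions sets. A JavaScript suite would record the Node version instead of the Python one; the field names here are illustrative.

```python
import os
import platform


def environment_details(env=os.environ):
    """Snapshot where the tests ran, without anyone writing it down."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        # This sketch records the Python runtime; swap in your own runtime.
        "python": platform.python_version(),
        # Set by GitHub Actions; other providers may differ.
        "ci": env.get("CI", "false"),
        "runner_os": env.get("RUNNER_OS", "unknown"),
    }


print(environment_details())
```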

Types of test reports

“Test report” covers several distinct artifacts depending on what was tested and who’s reading. The components above stay roughly constant; the emphasis shifts.

Functional test reports

The output of functional or end-to-end tests that exercise application behavior the way a user would. These lean heavily on the defect log and on attachments: screenshots, videos, and traces for each failure. The reader is usually a developer trying to reproduce a broken flow.

Regression test reports

Generated after a change to confirm nothing that used to work is now broken. The single most valuable thing a regression report can show is comparison: this test passed last release and fails now. That comparison is impossible from one report in isolation, which is why regression reporting depends on persistent history more than any other type.
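The comparison itself is simple once both runs are available. The sketch below assumes each run has been reduced to a mapping of test name to status; the test names and statuses are invented.

```python
def find_regressions(previous, current):
    """Tests that passed in the previous run and fail in the current one."""
    return sorted(
        name
        for name, status in current.items()
        if status == "failed" and previous.get(name) == "passed"
    )


last_release = {"login": "passed", "checkout": "passed", "search": "failed"}
this_build = {
    "login": "passed",
    "checkout": "failed",  # regression: passed last release
    "search": "failed",    # already failing, so not a regression
    "export": "failed",    # new test, no baseline to compare against
}

print(find_regressions(last_release, this_build))  # ['checkout']
```

The hard part is not this function but having `last_release` on hand at all, which is the persistence problem.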

Performance and load test reports

The output of performance and load testing. The emphasis moves from pass/fail to numbers: response times, throughput, percentiles, error rates under load. These reports are about trends over time, since a single load test result means little without a baseline to compare against.
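As a small illustration of comparing against a baseline, the sketch below computes a 95th-percentile latency with the standard library and flags a run that exceeds the baseline by more than a tolerance. The 10% tolerance and the sample data are arbitrary choices for the example.

```python
import statistics


def p95(samples_ms):
    """95th percentile of a list of response times in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def regressed(baseline_ms, current_ms, tolerance=0.10):
    """True when the current p95 exceeds the baseline p95 by more than tolerance."""
    return p95(current_ms) > p95(baseline_ms) * (1 + tolerance)


baseline = list(range(100, 200))       # invented samples: 100..199 ms
slower = [x * 1.5 for x in baseline]   # the same shape, 50% slower

print(regressed(baseline, baseline))
print(regressed(baseline, slower))
```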

CI/CD automated test reports

The report your pipeline generates on every push without anyone asking. This is the type most developers actually deal with day to day, and the one the rest of this guide is about. It’s produced automatically, it covers whatever ran in that build, and its central problem is volume: one report per push, per spec, per shard, most of which are never looked at and all of which expire.

Best practices for effective test reporting

The standard advice (be clear, be concise, include the right sections) applies to any document. The advice that’s specific to automated reporting comes down to three things.

Automate generation: don’t write reports by hand

If you’re copying numbers from a terminal into a document, stop. Every major test framework can emit a structured report directly. The standard interchange format is JUnit XML, which Playwright, Vitest, pytest, Cypress, and most others can produce with a single flag or config line; Jest needs the jest-junit reporter package. JSON formats like CTRF cover the same ground with a richer schema.

Hand-written reports are slow, error-prone, and stale the moment the next build runs. The only reports worth maintaining by hand are the human-facing release summaries that sit on top of automated data, and even those should pull their numbers from a generated source.

The first step in any reporting setup is making the framework write the file:

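# Playwright (the JUnit reporter prints to stdout unless an output file is set)
PLAYWRIGHT_JUNIT_OUTPUT_NAME=results.xml npx playwright test --reporter=junit

# Vitest (or set test.reporters: ['junit'] in vitest.config.ts)
npx vitest run --reporter=junit --outputFile=results.xml

# Jest (requires the jest-junit package; writes junit.xml by default)
npx jest --reporters=default --reporters=jest-junit

# pytest
pytest --junitxml=results.xml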

Make reports shareable and persistent, not local HTML files

A generated HTML report sits either on a CI runner that is wiped when the job ends or in an artifact store that deletes it after a fixed retention window. To open it, a teammate has to find the workflow run, click into the job, download a zip, unzip it, and open index.html in a browser. To compare two runs, they do that twice.

A report that lives at a stable URL removes all of that. The link works for anyone you send it to, including a non-technical stakeholder or a contractor without CI access, and it still works when an incident postmortem references a run from three months ago. The CI artifact is fine as storage for raw files. It’s a poor viewing surface.

Track trends across builds, not single runs in isolation

One report tells you whether this build passed. Fifty reports tell you that login.spec.ts has been failing one run in eight for three weeks while everyone re-ran the job and moved on. That signal only exists across runs, and you can’t see it by opening reports one at a time.

This is the return on automating the rest. Generating a report is plumbing. Knowing which test got flaky, which got slower since the last release, and which only fails on PR branches is the actual payoff, and it requires treating every upload as a row in a time series rather than a file you glance at and discard.
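To make the time-series idea concrete, here is a minimal sketch that scores each test by how often its outcome flips between consecutive runs. The flip-rate definition and the 0.2 threshold are choices made for this example; they are not how any particular tool computes flakiness.

```python
def flip_rate(outcomes):
    """Fraction of consecutive run pairs where the outcome changed."""
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    return flips / (len(outcomes) - 1)


def flaky_tests(history, threshold=0.2):
    """Names of tests whose flip rate meets the threshold, oldest run first."""
    return sorted(
        name for name, outcomes in history.items()
        if flip_rate(outcomes) >= threshold
    )


# Invented history: True means the test passed in that run.
history = {
    "login.spec.ts": [True, True, False, True, True, True, False, True],
    "cart.spec.ts": [True] * 8,     # stable
    "export.spec.ts": [False] * 8,  # consistently broken, not flaky
}

print(flaky_tests(history))  # ['login.spec.ts']
```

A test that always fails scores zero here, which is the point: it is broken rather than flaky, and a single report already shows that.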

Challenges in test reporting

Two problems show up in almost every CI pipeline once the report-generation step works.

Test report sprawl in CI pipelines

A sharded suite running on every push produces a lot of files. Four shards times every push times every branch is hundreds of report fragments a week, scattered across job artifacts, most expiring before anyone reads them. Parallel runs make it worse: each spec writes its own file, and the HTML viewer only renders one at a time, so you either merge fragments yourself or browse them piecemeal.

Sprawl isn’t a storage problem, it’s a findability problem. The report you want is almost always there. Locating it among hundreds of expiring zip files is the part that fails.
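If you do merge fragments yourself, the core of it is small. The sketch below combines JUnit XML strings from several shards under one `testsuites` root using the standard library. It does not recompute aggregate attributes such as totals or timings on the root, and the shard contents are invented.

```python
import xml.etree.ElementTree as ET


def merge_junit(fragments):
    """Combine several JUnit XML documents into one <testsuites> document."""
    merged = ET.Element("testsuites")
    for xml_text in fragments:
        root = ET.fromstring(xml_text)
        # A fragment may be a bare <testsuite> or a <testsuites> wrapper.
        suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
        merged.extend(suites)
    return ET.tostring(merged, encoding="unicode")


shard_1 = '<testsuites><testsuite name="shard-1"><testcase name="a"/></testsuite></testsuites>'
shard_2 = '<testsuite name="shard-2"><testcase name="b"/><testcase name="c"/></testsuite>'

print(merge_junit([shard_1, shard_2]))
```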

Reports that stakeholders can’t read

A raw JUnit XML file is unreadable to anyone who isn’t a CI system. An HTML report is better, but it’s still gated behind CI navigation that a designer, a PM, or a customer who filed a bug can’t be expected to wade through. The gap between “the data exists” and “the right person can see it” is where most reporting setups quietly fail. The reporting setup for QA teams covers the stakeholder-visibility side specifically.

Automated test report generation in CI/CD

Here’s the concrete version of the pipeline the best practices above describe.

What a modern test report pipeline looks like

Four steps, in order:

Generate. The framework writes a structured report (JUnit XML or CTRF JSON) as part of the test step.
Run on failure too. The reports you most need are from failed builds, so the upload step runs even when tests fail.
Send somewhere durable. The report goes to a hosted layer with a stable URL instead of (or in addition to) a CI artifact.
Compare. Each upload joins a history you can query across runs.

The first three steps are a few lines of CI config. Here’s a GitHub Actions job that generates a report and uploads it, even when tests fail, using the gaffer-sh/gaffer-uploader@v2 action:

name: Test

on:
  push:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

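      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --reporter=junit
        env:
          # the JUnit reporter prints to stdout unless an output file is set;
          # with this, results.xml is written even when tests fail
          PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml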

      - name: Upload report to Gaffer
        if: always()
        uses: gaffer-sh/gaffer-uploader@v2
        with:
          gaffer_upload_token: ${{ secrets.GAFFER_PROJECT_TOKEN }}
          report_path: ./results.xml
          commit_sha: ${{ github.sha }}
          branch: ${{ github.ref_name }}
          test_framework: playwright

The if: always() is doing real work. Without it, a failing test step makes GitHub Actions skip every later step, including the upload, and you lose the report from exactly the build you needed to see. The commit_sha and branch inputs become tags on the upload, which is what lets you filter runs by branch later.

If you’d rather not use the action (a different CI provider, a local run, a custom script), the same upload runs from the CLI:

gaffer upload ./results.xml \
  --token "$GAFFER_PROJECT_TOKEN" \
  --commit-sha "$GITHUB_SHA" \
  --branch "$GITHUB_REF_NAME" \
  --test-framework playwright

The --token flag accepts a gfr_… project token; if it is omitted, the CLI falls back to the GAFFER_PROJECT_TOKEN, GAFFER_UPLOAD_TOKEN, or GAFFER_TOKEN environment variable. The per-file ceiling is 5 GB (raise --max-file-size-mb to 5000 for large traces), and the CLI switches to multipart upload automatically above 90 MB.

That leaves the question of where the report goes. There are three common destinations, in increasing order of usefulness:

CI artifacts. Built in, zero setup, and the default for most teams. Fine as raw storage. No cross-run history, no viewing surface, and a hard retention limit.
Blob storage (S3 and similar). Gets you a permanent URL for the HTML file. Better than artifacts for sharing, but you’re still looking at one static file at a time with no pass-rate trends, flaky detection, or failure search.
A dedicated report-hosting layer. Ingests the report from CI and gives you a persistent URL plus the analytics the raw file can’t: history, flaky-test detection, and pass-rate trends, without writing any of that yourself.

Gaffer hosts test reports in that last category. It accepts JUnit XML, CTRF JSON, and several other formats, stores each run at a stable URL, and builds the cross-run history on top. The same data is queryable through its MCP server, so an agent (or you, in an editor) can ask about test health directly. get_project_health returns a health score, pass rate, flaky-test count, and a trend direction over a chosen window; get_test_history returns the per-run pass/fail timeline for one test with branch and commit for each entry; get_flaky_tests returns the tests flipping between pass and fail with a flip rate and flakiness score per test. The shape of a get_project_health response looks like this (values illustrative):

{
  "projectName": "gaffer",
  "healthScore": 92,
  "passRate": 98.4,
  "testRunCount": 137,
  "flakyTestCount": 4,
  "trend": "stable",
  "period": { "days": 30, "start": "...", "end": "..." }
}

That’s the difference between a report and reporting: the file tells you about one run, the history tells you whether the suite is getting better or worse.
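As one way to act on that history, the sketch below reads a response shaped like the get_project_health example and turns it into warnings. Only the field names come from the example response; the thresholds and the function are hypothetical, and how you obtain the JSON (MCP client, script, editor) is left out.

```python
import json

# Same illustrative values as the example response.
RESPONSE = """{
  "projectName": "gaffer",
  "healthScore": 92,
  "passRate": 98.4,
  "testRunCount": 137,
  "flakyTestCount": 4,
  "trend": "stable",
  "period": { "days": 30, "start": "...", "end": "..." }
}"""


def health_warnings(health, min_pass_rate=95.0, max_flaky=5):
    """Return human-readable warnings; thresholds are arbitrary for this sketch."""
    warnings = []
    if health["passRate"] < min_pass_rate:
        warnings.append(f'pass rate {health["passRate"]}% is below {min_pass_rate}%')
    if health["flakyTestCount"] > max_flaky:
        warnings.append(f'{health["flakyTestCount"]} flaky tests exceeds {max_flaky}')
    return warnings


print(health_warnings(json.loads(RESPONSE)))  # []
```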

Test reporting for small teams and open-source projects

Most of the writing on test reporting targets enterprise QA: defect-tracking workflows, release-readiness gates, sign-off documents. A three-person startup or an open-source project with a handful of maintainers doesn’t need any of that. It needs the report off the runner, at a URL a contributor or a bug reporter can open, with enough history to catch a test that’s been flaky for a month.

The constraint for small teams is usually cost and setup time, not features. Per-seat pricing on enterprise test platforms makes them a non-starter when half your contributors are occasional. Flat pricing with unlimited users fits the shape of a small or open project far better. The setup should be one CI step and one token, not a rollout.

If that’s your situation, the test reporting setup for small teams walks through the specifics. The short version is the same as the rest of this guide: stop hand-writing reports, let CI generate them, and put them somewhere your team and your history can actually use.
