Your production services have SLOs, dashboards, and PagerDuty alerts. Your test suite has a green or red badge in CI. OpenTelemetry closes that gap.
The Gap
Most engineering teams have spent years building their observability stack. Services emit traces. Infrastructure reports resource usage. Error rates trigger alerts at 3am.
Meanwhile, test health lives in CI logs. To know if your test suite is degrading, someone has to click through pipeline runs and eyeball it. There’s no trend line. No alert when the pass rate drops 5% over two weeks. No way to see that flaky test count doubled after a dependency upgrade.
This isn’t because teams don’t care about test health. It’s because test results have been trapped in CI — a separate system from the observability infrastructure that already exists.
Test Results Are Metrics
A test run produces time-series data:
- Pass rate — functionally an SLI for your codebase’s correctness
- Failure count — a reliability signal, same as error rate for a service
- Flaky test count — a measure of non-determinism in your system
- Coverage percentage — a proxy for how much of your code is exercised
These aren't one-off artifacts. Tracked across runs, they're time-series metrics, and they belong in the same systems that track request latency and error rates — not in a spreadsheet someone updates monthly for a sprint retro.
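To make that concrete, here's a minimal sketch of how these numbers fall out of one run's summary counts. The input dict and key names are illustrative, not Gaffer's actual schema:

```python
# Derive run-level metrics from raw test counts.
# The summary dict and its keys are illustrative, not Gaffer's schema.

def run_metrics(summary: dict) -> dict:
    total = summary["passed"] + summary["failed"] + summary["skipped"]
    executed = total - summary["skipped"]
    return {
        "tests.total": total,
        "tests.passed": summary["passed"],
        "tests.failed": summary["failed"],
        "tests.skipped": summary["skipped"],
        # Pass rate over executed (non-skipped) tests, as a percentage.
        "tests.pass_rate": (round(100 * summary["passed"] / executed, 2)
                            if executed else 0.0),
    }

metrics = run_metrics({"passed": 190, "failed": 6, "skipped": 4})
print(metrics["tests.pass_rate"])  # one data point in the pass-rate series
```

Each run contributes one data point per metric; the time series is what accumulates across runs.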
Why OpenTelemetry
You could write a custom script to POST test metrics to Datadog’s API. Teams do this. It works until the test runner output format changes, or you switch observability vendors, or someone on another team builds a slightly different version of the same script.
OpenTelemetry is the standard transport for observability data. It’s vendor-neutral by design. Export via OTLP and your metrics flow to Datadog, Grafana Cloud, New Relic, Honeycomb — whatever your team already uses. No vendor-specific instrumentation. No rewriting integrations when you switch backends.
Your observability stack already speaks OTLP. Your test metrics should too.
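As a rough illustration of what "speaking OTLP" means, here's the shape of an OTLP/HTTP JSON body for a single gauge data point, built with the Python stdlib. The metric name, value, and attributes are placeholders, and a real exporter would use an official OpenTelemetry SDK rather than hand-building the payload:

```python
import json
import time

def otlp_gauge_payload(name: str, value: float, attrs: dict) -> dict:
    """Build an OTLP/HTTP JSON body for one gauge data point (sketch)."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": k, "value": {"stringValue": v}}
                for k, v in attrs.items()
            ]},
            "scopeMetrics": [{
                "metrics": [{
                    "name": name,
                    "gauge": {"dataPoints": [{
                        "timeUnixNano": str(time.time_ns()),
                        "asDouble": value,
                    }]},
                }],
            }],
        }],
    }

payload = otlp_gauge_payload(
    "gaffer.tests.pass_rate", 96.9, {"gaffer.project.name": "api"}
)
# POSTing json.dumps(payload) to <collector>/v1/metrics with
# Content-Type: application/json delivers it to any OTLP-compatible backend.
print(list(payload))  # ['resourceMetrics']
```

The point of the standard: the receiving end could be a Collector, Datadog, or Grafana Cloud, and this payload doesn't change.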
What Gaffer Exports
When you enable OpenTelemetry export, Gaffer sends metrics after each test run is processed. Here’s what gets emitted:
| Metric | Type | What It Measures |
|---|---|---|
| gaffer.tests.pass_rate | Gauge (%) | Pass rate of the test run |
| gaffer.tests.total | Gauge | Total test count |
| gaffer.tests.passed | Gauge | Passed test count |
| gaffer.tests.failed | Gauge | Failed test count |
| gaffer.tests.skipped | Gauge | Skipped test count |
| gaffer.coverage.line_percent | Gauge (%) | Line coverage percentage |
| gaffer.coverage.branch_percent | Gauge (%) | Branch coverage percentage |
Every metric includes gaffer.project.id and gaffer.project.name as attributes. When your CI uploads include branch and commit metadata, gaffer.branch and gaffer.commit_sha are also attached. This means you can filter dashboards by project, compare branches, and correlate specific commits with metric changes.
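In practice, a dashboard's per-project or per-branch filter is just a predicate over those attributes. A toy sketch with invented data points (attribute keys match the ones above; the values are made up):

```python
# Invented sample of exported data points, for illustration only.
points = [
    {"metric": "gaffer.tests.pass_rate", "value": 97.2,
     "attrs": {"gaffer.project.name": "api", "gaffer.branch": "main"}},
    {"metric": "gaffer.tests.pass_rate", "value": 91.0,
     "attrs": {"gaffer.project.name": "api", "gaffer.branch": "feature/x"}},
    {"metric": "gaffer.tests.pass_rate", "value": 99.1,
     "attrs": {"gaffer.project.name": "web", "gaffer.branch": "main"}},
]

# "Filter by project" and "compare branches" reduce to selections like this:
main_api = [p["value"] for p in points
            if p["attrs"]["gaffer.project.name"] == "api"
            and p["attrs"]["gaffer.branch"] == "main"]
print(main_api)  # [97.2]
```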
Setting It Up
If you’re already uploading test results to Gaffer, this is configuration — not code.
- Go to Settings > Integrations > OpenTelemetry in the Gaffer dashboard
- Add a destination — built-in presets exist for Datadog and Grafana Cloud, plus a generic OTLP option for any compatible endpoint
- Enter your credentials (API key for Datadog, instance ID and API token for Grafana Cloud, or a custom OTLP endpoint URL)
- Select which branches and event types to export
- Click Test to verify the connection
No CI pipeline changes. No test runner plugins. Gaffer handles the OTLP export server-side when results are processed.
For Datadog-specific setup including a dashboard template you can import directly, see the Datadog integration guide.
Building a Test Health Dashboard
Once metrics are flowing, you build dashboards the same way you would for any other service.
In Datadog, create a timeseries widget querying gaffer.tests.pass_rate grouped by gaffer.project.name. Add a monitor that alerts when pass rate drops below 95% for two consecutive runs. Overlay it on the same dashboard where you track deployment frequency and change failure rate.
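The monitor condition above — "below 95% for two consecutive runs" — boils down to a simple predicate. The function name and defaults here are ours, not Datadog's:

```python
def should_alert(pass_rates, threshold=95.0, consecutive=2):
    """True when the most recent `consecutive` runs are all below `threshold`."""
    if len(pass_rates) < consecutive:
        return False
    return all(r < threshold for r in pass_rates[-consecutive:])

print(should_alert([98.0, 94.1, 93.5]))  # True: two runs in a row below 95
print(should_alert([98.0, 93.5, 96.0]))  # False: the latest run recovered
```

Requiring consecutive breaches keeps a single flaky run from paging anyone.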
In Grafana, add a Prometheus or OTLP data source pointing at your collector. A panel with gaffer.tests.failed as a time series, filtered by gaffer.branch="main", shows you exactly when failures started increasing. Set up a Grafana alert rule to notify your Slack channel when failed count exceeds a threshold.
These aren’t special dashboards. They sit alongside your existing infrastructure and application panels. Test health becomes another signal in the same place your team already looks every morning.
Correlating Test Health with Deployment Velocity
Combine test metrics with data you’re already collecting.
Plot gaffer.tests.pass_rate on the same graph as deployment frequency. If pass rate drops and deployments slow down simultaneously, you can see the cause-and-effect: a degrading test suite creates a bottleneck. Teams stop merging because CI is red, or they start ignoring failures — both of which compound.
Track gaffer.tests.failed against your change failure rate (CFR). A spike in test failures followed by a spike in production incidents suggests your test suite was catching real problems that got merged anyway.
OTLP Export in Test Reporting
Allure, ReportPortal, and BuildPulse are test reporting tools. They store results and display them in their own UI. If you want test metrics in your observability stack, you build a custom integration.
Gaffer exports test metrics via OTLP natively — no custom scripts or third-party plugins required. Your test results feed into the same dashboards, alerts, and correlation views as the rest of your operational data.
Try It
OpenTelemetry export is currently in beta and available on all plans. Configure it in Settings > Integrations > OpenTelemetry — it takes about two minutes if you have your observability credentials handy. For setup details, see the OpenTelemetry integration docs.