
CLI Reference

Install via Homebrew (macOS, Linux):

brew install gaffer-sh/tap/gaffer

Or via the install script (macOS, Linux):

curl -fsSL https://app.gaffer.sh/install.sh | sh

The install script places the gaffer binary in ~/.local/bin. Both methods support Linux (x86_64, aarch64) and macOS (Apple Silicon, Intel). For a guided walkthrough, see the Getting Started guide.

gaffer test

Run a test command and analyze results.

gaffer test -- npm test
gaffer test -- pytest -x
gaffer test -- go test ./...
gaffer test -- cargo test

| Flag | Env var | Description |
|---|---|---|
| --token <token> | GAFFER_TOKEN | API token for cloud sync |
| --report <path> / -r <path> | | Report file path(s) to parse (repeatable) |
| --root <dir> | | Project root directory (default: .) |
| --format <human\|json> | | Output format (default: human) |
| --show-errors | | Show full error messages, stack traces, and context files for failed tests |
| --compare <branch> | | Compare against the latest run on a branch (e.g. --compare=main) |
| --fail-on <mode> | | Override the exit code based on failure classification. --fail-on=new exits 0 when only pre-existing or flaky failures exist |
| --affected | | Derive the wrapped command from affected-tests (use with --files). The trailing -- <cmd> is ignored when set |
| --files <paths> | | Changed source files. Only meaningful with --affected (repeatable) |
| --no-graph | | With --affected, disable the import-graph strategy and fall back to naming + proximity heuristics |
| --no-cache | | With --affected, force an in-memory graph build instead of using .gaffer/graph.db |
| --on-empty <auto\|skip\|fail> | | With --affected, behavior when no tests are affected: auto (default) exits 0 only when all signals were available; skip always exits 0; fail always exits non-zero |
| --api-url <url> | GAFFER_API_URL | Override the API endpoint |

When you run gaffer test, it:

  1. Runs your command as a child process, passing stdout/stderr through
  2. Discovers report files via glob patterns (from config, or the defaults)
  3. Parses test results and coverage reports
  4. Computes the health score, flaky tests, failure clusters, and duration analysis
  5. Classifies each failure as new, pre_existing, flaky, or unknown (auto-compares against the default branch)
  6. Prints an enriched summary to stderr
  7. Syncs results to the cloud (if a token is configured)
  8. Exits with the child process’s exit code (or the code chosen by --fail-on)
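
Steps 1 and 8 amount to a pass-through wrapper. The following is a minimal Python sketch of that idea, not Gaffer's implementation. The function name run_wrapped is hypothetical. Mapping a signal death to 128 + the signal number is the usual shell convention; it is assumed here, not confirmed for Gaffer.

```python
import subprocess


def run_wrapped(cmd: list[str]) -> int:
    """Run cmd as a child process and return the exit code to propagate.

    stdout and stderr are not redirected, so the child's output passes
    straight through to the caller's terminal.
    """
    completed = subprocess.run(cmd)
    code = completed.returncode
    if code < 0:
        # On POSIX, a negative returncode means the child was killed by
        # signal number -code. Shells report that as 128 + signal number
        # (assumed convention here, e.g. SIGTERM (15) -> 143).
        return 128 - code
    return code
```

For example, run_wrapped(["npm", "test"]) returns npm's own exit code. The real CLI also parses reports between running the command and exiting.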

Example output:

gaffer 40 passed 2 failed 3 skipped 12.4s
Health: 87 (good) ^ Slow: p95 245.3ms
Flaky: 2 tests
  src/auth.test.ts > login — 40% flip rate (4/10 runs)
  src/api.test.ts > timeout handler — 20% flip rate (2/10 runs)
Clusters: 1 pattern (3 tests)
  "Connection refused" — 3 tests
New failures: 1
  src/billing.test.ts > charge card
Pre-existing: 1
  src/db.test.ts > connection timeout
Coverage: 78.5% lines (1234/1572)
Synced: 1 run uploaded

Compare the current run against a baseline branch:

gaffer test --compare=main -- npm test
vs main: 2 new failures, 1 fixed, 3 pre-existing pass rate -5.0% duration +1.2s
NEW src/auth.test.ts > login > OAuth redirect
NEW src/billing.test.ts > charge card
FIX src/api.test.ts > timeout handler

Every failure is automatically classified by comparing against your default branch:

| Classification | Meaning |
|---|---|
| new | Failed now, passed on the baseline branch. Likely caused by your changes. |
| pre_existing | Already failing on the baseline branch. Not your fault. |
| flaky | Known flaky test (high flip rate in historical data). |
| unknown | No baseline data available (first run, or no runs on the default branch). |

Classification runs automatically on every gaffer test invocation. The default branch is detected via git (falls back to main, then master).
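
A minimal Python sketch of the classification rules in the table follows. It is an illustration under stated assumptions, not Gaffer's code. The function name classify is hypothetical. So is the order of checks: known-flaky tests are reported as flaky first, and a failing test that did not exist on the baseline is treated as new.

```python
from typing import Optional


def classify(test_name: str,
             baseline: Optional[dict[str, str]],
             flaky_tests: set[str]) -> str:
    """Classify one failing test against the default-branch baseline.

    baseline maps test name -> "passed" or "failed" for the latest run on
    the default branch, or is None when no such run exists.
    """
    if test_name in flaky_tests:
        # Assumption: a known-flaky test is reported as flaky first.
        return "flaky"
    if baseline is None:
        # No baseline data (first run or no default-branch runs).
        return "unknown"
    if baseline.get(test_name) == "failed":
        return "pre_existing"
    # Passed on the baseline, or (assumption) did not exist there yet.
    return "new"
```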

Use --fail-on=new to exit 0 when only pre-existing or flaky failures exist:

gaffer test --fail-on=new -- npm test

This is useful in CI to avoid blocking PRs on failures that existed before your changes. If a failure is classified as unknown (no baseline), it’s treated as new for safety.

Signal exits (e.g. SIGTERM killing the test process) always propagate regardless of --fail-on.
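
As a rough model of these rules, the sketch below computes the --fail-on=new exit code from the child's exit code and the failure classifications. It is an illustration, not Gaffer's code, and the function name is hypothetical. One case is an assumption: a non-zero exit that produced no classified failures (for example a compile error) keeps the child's code.

```python
def exit_code_fail_on_new(child_code: int,
                          classifications: list[str],
                          killed_by_signal: bool = False) -> int:
    """Exit code for --fail-on=new, given each failure's classification."""
    if killed_by_signal:
        # Signal exits always propagate, regardless of --fail-on.
        return child_code
    if child_code == 0:
        return 0
    if not classifications:
        # Assumption: a non-zero exit with nothing to classify
        # (e.g. a compile error) keeps the child's code.
        return child_code
    # "unknown" has no baseline, so it is treated like "new" for safety.
    if any(c in ("new", "unknown") for c in classifications):
        return child_code
    # Only pre_existing and/or flaky failures remain.
    return 0
```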

Use --format=json to get machine-readable output on stdout:

gaffer test --format=json -- npm test | jq .health.score

The JSON output includes a classification object with each failure’s type:

gaffer test --format=json -- npm test | jq '.classification.classified_failures[] | {name, classification}'
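
If you post-process the JSON in a script rather than with jq, a sketch like the following reads the two fields used in the jq examples: health.score and classification.classified_failures[] with name and classification. Any other parts of the payload's shape are not assumed. The function name summarize is hypothetical. Treating unknown as blocking mirrors the --fail-on=new rule.

```python
import json


def summarize(raw: str) -> tuple[int, list[str]]:
    """Return (health score, names of failures that block a merge)."""
    report = json.loads(raw)
    score = report["health"]["score"]
    failures = report.get("classification", {}).get("classified_failures", [])
    # new and unknown failures are the ones --fail-on=new would not forgive.
    blocking = [f["name"] for f in failures
                if f["classification"] in ("new", "unknown")]
    return score, blocking
```

For example, you could pipe gaffer test --format=json -- npm test into a script that calls summarize(sys.stdin.read()).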

--affected collapses the agentic loop of calling gaffer affected-tests and then gaffer test into a single invocation. Pass the changed source files; Gaffer maps them to test files, scopes the runner to those files, and parses results as usual.

gaffer test --affected --files src/auth.ts src/api.ts

Use --on-empty=auto (default) to exit 0 only when all detection signals were available. When some signals were unavailable (degraded mode), auto exits non-zero so CI doesn’t silently green-light a partial run. Use --on-empty=skip to always exit 0 or --on-empty=fail to always exit non-zero.

The CLI currently reports coverage_history and failure_history as unavailable on every run (those signals require a future Gaffer-history connection), so auto will exit non-zero on any empty result today. If you want silent-skip on empty, pass --on-empty=skip explicitly.
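
A minimal Python model of the three --on-empty modes described above follows. The function name is hypothetical. The docs only say "non-zero", so the exact non-zero value (1 here) is an assumption.

```python
def on_empty_exit_code(mode: str, unavailable: list[str]) -> int:
    """Exit code when --affected finds no tests, per --on-empty mode."""
    if mode == "skip":
        return 0
    if mode == "fail":
        return 1  # assumed non-zero value
    if mode == "auto":
        # Exit 0 only if every detection signal was available; otherwise
        # the empty result may come from a degraded run, so fail loudly.
        return 0 if not unavailable else 1
    raise ValueError(f"unknown --on-empty mode: {mode!r}")
```

With today's CLI, unavailable always contains coverage_history and failure_history, so auto takes the non-zero branch on every empty result.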

gaffer affected-tests

Map changed source files to the test files that cover them. Returns the test files and a suggested run command.

gaffer affected-tests --files src/auth.ts src/api.ts

| Flag | Description |
|---|---|
| --files <paths> | Source files that changed (required, repeatable) |
| --root <dir> | Project root directory (default: .) |
| --format <human\|json> | Output format (default: json) |
| --pretty | Human-readable output to stderr. Equivalent to --format human |
| --no-graph | Disable the import-graph strategy and fall back to naming + proximity heuristics only. Faster on huge codebases, at the cost of missing indirect dependencies |
| --no-cache | Force an in-memory graph build instead of using .gaffer/graph.db. Useful for ephemeral CI runs and read-only filesystems |
| --print-cmd | Print only the bare run_command string to stdout. Exits 1 when no command is available, so scripts can detect the empty case |

| Strategy | How it matches |
|---|---|
| Naming convention | src/auth.ts finds src/auth.test.ts, src/auth.spec.ts, src/__tests__/auth.test.ts |
| Directory proximity | src/utils.ts finds test files in sibling __tests__/ or tests/ directories |
| Import graph | Reverse reachability over the static import graph: any test that imports a changed file, directly or transitively, is selected. The first call walks the whole project; later calls incrementally update only files whose mtime has changed. The graph is persisted to .gaffer/graph.db |
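
The naming-convention row can be read as "derive candidate test paths from the source path". The sketch below generates exactly the three candidates shown in the table and nothing more. Conventions for other ecosystems (for example Python or Go test names) are not covered, and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def naming_candidates(source: str) -> list[str]:
    """Candidate test files for a source file, per the table's examples."""
    p = PurePosixPath(source)
    stem, suffix, parent = p.stem, p.suffix, p.parent
    return [
        str(parent / f"{stem}.test{suffix}"),              # sibling .test file
        str(parent / f"{stem}.spec{suffix}"),              # sibling .spec file
        str(parent / "__tests__" / f"{stem}.test{suffix}"),  # __tests__ dir
    ]
```

A real matcher would then keep only the candidates that exist on disk.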

The import graph runs by default. Pass --no-graph to opt out. Results are deduplicated across strategies; the JSON payload reports which signals were attempted and which were unavailable so callers can detect degraded runs.
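
Reverse reachability means walking import edges backwards from the changed files and keeping every test file you reach. The breadth-first sketch below shows the idea on an in-memory dict. It is not Gaffer's implementation: the real graph lives in .gaffer/graph.db, and import parsing is out of scope here. The names and the is_test predicate are hypothetical.

```python
from collections import deque
from typing import Callable


def affected_by_imports(imports: dict[str, set[str]],
                        changed: list[str],
                        is_test: Callable[[str], bool]) -> list[str]:
    """Tests that import any changed file, directly or transitively.

    imports maps each file to the set of files it imports.
    """
    # Invert the edges: for each file, who imports it?
    importers: dict[str, set[str]] = {}
    for src, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(src)

    # Breadth-first search from the changed files along reversed edges.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for importer in importers.get(current, ()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return sorted(f for f in seen if is_test(f))
```

A change to a low-level module such as a database helper therefore selects every test whose import chain reaches it, which naming and proximity alone would miss.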

Example JSON output:

{
  "affected": [
    {
      "test_file": "src/auth.test.ts",
      "source_file": "src/auth.ts",
      "confidence": 0.97,
      "strategy": "naming_convention",
      "signals": [
        { "strategy": "naming_convention", "confidence": 0.9 },
        { "strategy": "import_graph", "confidence": 0.7 }
      ]
    },
    {
      "test_file": "src/__tests__/api.test.ts",
      "source_file": "src/api.ts",
      "confidence": 0.3,
      "strategy": "directory_proximity",
      "signals": [
        { "strategy": "directory_proximity", "confidence": 0.3 }
      ]
    }
  ],
  "run_command": "pnpm vitest src/auth.test.ts src/__tests__/api.test.ts",
  "framework": "vitest",
  "signals": {
    "attempted": ["naming_convention", "directory_proximity", "import_graph"],
    "unavailable": ["coverage_history", "failure_history"]
  }
}

Per-test fields: confidence is the noisy-OR combination across signals, and strategy is the highest-confidence individual signal (kept flat for legacy consumers). The signals array carries every signal that selected the test with its per-signal confidence.
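
Noisy-OR treats each signal as an independent chance of detecting the link, so the combined confidence is 1 - prod(1 - c_i). For the first entry in the payload above, that is 1 - (1 - 0.9)(1 - 0.7) = 1 - 0.03 = 0.97. The sketch below computes both per-test fields from a signals array. The function name is hypothetical.

```python
import math


def combine(signals: list[dict]) -> tuple[float, str]:
    """Return (noisy-OR confidence, strongest strategy) for one test."""
    # Probability that every signal "misses" the test.
    miss = math.prod(1.0 - s["confidence"] for s in signals)
    # The flat strategy field is the highest-confidence single signal.
    best = max(signals, key=lambda s: s["confidence"])
    return 1.0 - miss, best["strategy"]
```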

The run command auto-detects your framework and package manager (pnpm, yarn, bun, or npm from lock files). signals.unavailable lists detection sources that weren’t reachable on this run; when affected is empty and this list is non-empty, the result is degraded rather than confirmed-empty.
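
A caller can make the degraded-versus-empty distinction from the JSON alone. The sketch below uses only the affected and signals.unavailable fields shown above. The function name and the three status labels are hypothetical.

```python
import json


def classify_result(raw: str) -> str:
    """Label an affected-tests JSON payload as affected, degraded, or confirmed_empty."""
    payload = json.loads(raw)
    if payload.get("affected"):
        return "affected"
    # Empty, but some detection sources were unreachable: not trustworthy.
    if payload.get("signals", {}).get("unavailable"):
        return "degraded"
    return "confirmed_empty"
```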

The integrated gaffer test --affected flag is the simplest path. To pipe through affected-tests directly, use --print-cmd:

gaffer test -- $(gaffer affected-tests --files $(git diff --name-only main) --print-cmd)

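--print-cmd exits 1 when no command is available. In POSIX shells, though, a failing command substitution does not stop the outer command. If gaffer test must never run with an empty wrapped command, capture the command first:

cmd=$(gaffer affected-tests --files $(git diff --name-only main) --print-cmd) && gaffer test -- $cmd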

gaffer doctor

Diagnose common setup issues. Checks config, database, token validity, report discovery, framework detection, and CLI version.

gaffer doctor

Example output:

OK Config .gaffer/config.toml
OK Database .gaffer/data.db (48KB), has data
OK Token gaf_...x4f2 (valid, API reachable)
OK Reports 12 files match current patterns
OK Frameworks vitest (vitest.config.ts), playwright (playwright.config.ts)
OK Version gaffer 0.1.0

Useful as a first diagnostic step when tests fail to sync or report files aren’t detected. Each check outputs OK, WARN, or FAIL with actionable detail.

gaffer init

Interactive project setup.

gaffer init

Steps:

  1. Detects test frameworks (Vitest, Playwright, Jest, pytest, Go, RSpec, .NET, Cargo, PHPUnit, Mocha)
  2. Shows reporter setup instructions for each detected framework
  3. Optionally authenticates via browser (creates API token)
  4. Writes .gaffer/config.toml
  5. Adds .gaffer/ to .gitignore

gaffer query

Query local test intelligence without running tests. Output is JSON by default; use --pretty for human-readable output. AI agents can access the same data via the MCP server.

gaffer query health

Health score, trend, and label.

gaffer query health
gaffer query health --pretty
gaffer query health | jq .score

gaffer query flaky

Flaky tests ranked by composite score.

gaffer query flaky
gaffer query flaky | jq '.[].test_name'

gaffer query slowest

Top N slowest tests by duration.

gaffer query slowest
gaffer query slowest --limit 5

gaffer query runs

Recent test runs with pass/fail counts.

gaffer query runs
gaffer query runs --limit 5

gaffer query history

Pass/fail history for a specific test (the name is matched with SQL LIKE).

gaffer query history "login"
gaffer query history "auth > login" --limit 10

gaffer query failures

Search failures across runs by test name or error message.

gaffer query failures "timeout"
gaffer query failures "connection refused" --limit 10

gaffer sync

Force-sync pending uploads. Use it when a previous gaffer test run was interrupted before syncing, or to retry failed uploads. To upload reports without the CLI, see the Upload API.

gaffer sync
gaffer sync --token gaf_xxx

Configuration

Config file: .gaffer/config.toml (or gaffer.toml at project root)

[project]
token = "gaf_..."
api_url = "https://app.gaffer.sh"
[test]
report_patterns = [
"**/.gaffer/reports/**/*.xml",
"**/.gaffer/reports/**/*.json",
"**/junit*.xml",
"**/test-results/**/*.xml",
"**/test-reports/**/*.xml",
"**/target/nextest/**/*.xml",
"**/ctrf/**/*.json",
"**/ctrf-report.json",
"**/coverage/lcov.info",
"**/lcov.info",
]

Resolution order: CLI flags > environment variables > config file > defaults.
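
The precedence rule can be sketched as a single lookup function. This is an illustration, not Gaffer's code. The function name and the flattened config dict are hypothetical, and treating an empty environment variable as unset is an assumption.

```python
from collections.abc import Mapping
from typing import Optional


def resolve(cli_value: Optional[str],
            env: Mapping[str, str], env_var: str,
            config: Mapping[str, str], key: str,
            default: Optional[str] = None) -> Optional[str]:
    """Resolve one setting: CLI flag > environment variable > config file > default."""
    if cli_value is not None:
        return cli_value
    # Assumption: an empty environment variable counts as unset.
    env_value = env.get(env_var)
    if env_value:
        return env_value
    if key in config:
        return config[key]
    return default
```

For the token, a call would look like resolve(token_flag, os.environ, "GAFFER_TOKEN", config["project"], "token"), where token_flag and config are placeholders.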

Config discovery: Walks up from the working directory looking for .gaffer/config.toml or gaffer.toml. The directory containing the config becomes the project root.
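
The walk-up can be modeled with pathlib as shown below. This is a sketch of the described behavior, not Gaffer's code, and the function name is hypothetical. Which file wins when a directory holds both is not specified here; both cases return the same directory anyway.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Walk up from start; return the first directory holding a Gaffer config."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / ".gaffer" / "config.toml").is_file():
            return directory
        if (directory / "gaffer.toml").is_file():
            return directory
    return None  # no config found up to the filesystem root
```

Running from a nested package directory therefore still picks up the config at the repository root.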

| Variable | Purpose |
|---|---|
| GAFFER_TOKEN | API token for cloud sync (overridden by --token) |
| GAFFER_API_URL | API endpoint URL (overridden by --api-url) |

When no --report flag or report_patterns config is set, Gaffer auto-discovers:

  • **/.gaffer/reports/**/*.xml — Gaffer’s own report directory
  • **/.gaffer/reports/**/*.json — Gaffer’s own report directory
  • **/junit*.xml — JUnit XML reports
  • **/test-results/**/*.xml — Common test result directories
  • **/test-reports/**/*.xml — Common test report directories
  • **/target/nextest/**/*.xml — Cargo nextest JUnit output
  • **/ctrf/**/*.json — CTRF JSON reports
  • **/ctrf-report.json — Default CTRF output
  • **/coverage/lcov.info — Default coverage output
  • **/lcov.info — Root-level coverage
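
These are ordinary recursive glob patterns. The sketch below shows how they behave with pathlib relative to the project root. A coverage file under coverage/ matches two patterns, so the results are collected into a set. This naive version walks every directory. Whether Gaffer skips directories such as node_modules, or deduplicates exactly this way, is not stated here and is an assumption.

```python
from pathlib import Path

# The default patterns listed above.
DEFAULT_PATTERNS = (
    "**/.gaffer/reports/**/*.xml",
    "**/.gaffer/reports/**/*.json",
    "**/junit*.xml",
    "**/test-results/**/*.xml",
    "**/test-reports/**/*.xml",
    "**/target/nextest/**/*.xml",
    "**/ctrf/**/*.json",
    "**/ctrf-report.json",
    "**/coverage/lcov.info",
    "**/lcov.info",
)


def discover_reports(root: Path, patterns=DEFAULT_PATTERNS) -> list[Path]:
    """Return report files under root matching any pattern, without duplicates."""
    found: set[Path] = set()
    for pattern in patterns:
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)
```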