Agent QC is a portable draft standard for evidence-driven testing of Agent projects: runtimes, CLIs, SDKs, tool and MCP gateways, multi-channel agents, GUI/TUI/desktop/WebUI clients, browser automation, skills/plugins, background schedulers, eval suites, and distribution packages.
The standard applies to any Agent project where "the agent says it works" is not enough. A passing result must be backed by inspectable evidence: command logs, test reports, traces, screenshots, model/tool transcripts, qcloop verifier rounds, CI URLs, or human review records.
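
The evidence requirement can be illustrated with a minimal sketch. It assumes a hypothetical `Verdict` and `Evidence` shape and hypothetical identifiers for the evidence kinds listed above; none of these names come from the published schemas. Downgrading an unbacked pass to `needs-review` is also an assumption made for illustration, not a rule stated by the standard.

```python
from dataclasses import dataclass, field

# Evidence kinds named in the paragraph above; the identifiers are hypothetical.
EVIDENCE_KINDS = {
    "command_log", "test_report", "trace", "screenshot",
    "transcript", "qcloop_verifier_round", "ci_url", "human_review",
}


@dataclass
class Evidence:
    kind: str  # expected to be one of EVIDENCE_KINDS
    ref: str   # path or URL that a reviewer can open


@dataclass
class Verdict:
    case_id: str
    status: str  # e.g. "pass" or "fail"
    evidence: list = field(default_factory=list)


def enforce_evidence(verdict: Verdict) -> Verdict:
    """Return the verdict unchanged unless it is a pass with no usable evidence.

    Assumption: an unbacked pass is downgraded to "needs-review".
    """
    usable = [e for e in verdict.evidence if e.kind in EVIDENCE_KINDS and e.ref]
    if verdict.status == "pass" and not usable:
        return Verdict(verdict.case_id, "needs-review", verdict.evidence)
    return verdict
```

Here a claim such as `Verdict("case-1", "pass")` with an empty evidence list would not survive as a pass, while the same verdict carrying a CI URL or a test report reference would.
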
| Adjacent system | It owns | Agent QC owns |
|---|---|---|
| Test frameworks | Running tests, fixtures, reporters, coverage, traces. | Which evidence each project profile must produce. |
| Agent runtime | Sessions, turns, tools, permissions, tasks, background work. | Runtime-specific QC gates and behavior acceptance. |
| qcloop | Batch execution, worker/verifier/repair loop, attempts, qc rounds. | How repeated QC cases are shaped and judged. |
| CI/CD | Job orchestration, matrices, artifacts, releases, Pages. | Gate intent, result semantics, and report aggregation. |
| Evidence systems | Durable traces, provenance, review, replay, export. | Evidence refs required by QC verdicts. |
| Humans / LLM judges | Review of semantics, UX, safety, and output quality. | Rubric shape and verdict contract. |
- A project classification model for Agent products.
- A cross-project gate matrix from static checks to live provider tests and release smoke.
- A testing-technique taxonomy covering snapshots, smoke tests, black-box, white-box, gray-box, replay, chaos, and release install proof.
- Interaction surface rules for CLI streams, TUI, WebUI, desktop GUI, browser automation, channel UI, and eval UI.
- Best practices adapted from Agent UI runtime-backed projection and Agent Skills progressive disclosure.
- Portable evidence and performance/reliability contracts.
- Core objects: `qc_plan`, `qc_case`, `qc_gate`, `qc_run`, `qc_verdict`, `qc_evidence`, and `qc_report`.
- Evidence-driven verdict rules for pass, fail, blocked, exhausted, waived, and needs-review.
- qcloop integration for repeated independent QC cases.
- Deep case studies from Codex, Claude Code local snapshot, OpenClaw, Hermes Agent, and other Agent project shapes.
- Public JSON schemas and examples.
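
As a rough illustration of how the six verdict statuses might roll up into a report, the sketch below counts verdicts per status. The dictionary layout, the `summarize` function, and the rule that a run is green only when every verdict is `pass` or `waived` are assumptions made for this example; the published JSON schemas remain the authoritative definition of `qc_verdict` and `qc_report`.

```python
from collections import Counter

# The six verdict statuses named in the list above.
STATUSES = ("pass", "fail", "blocked", "exhausted", "waived", "needs-review")


def summarize(verdicts):
    """Count verdicts per status and derive a single green/red flag.

    Each verdict is a dict with at least a "status" key (hypothetical shape).
    """
    counts = Counter()
    for verdict in verdicts:
        status = verdict["status"]
        if status not in STATUSES:
            raise ValueError(f"unknown verdict status: {status!r}")
        counts[status] += 1

    # Assumption: an empty run is never green, and only pass/waived keep it green.
    green = bool(verdicts) and all(s in ("pass", "waived") for s in counts)
    return {"counts": {s: counts[s] for s in STATUSES}, "green": green}
```

Keeping `blocked`, `exhausted`, and `needs-review` as separate counts, rather than folding them into `fail`, preserves the distinction the verdict rules draw between a case that failed and one that could not be judged.
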
Agent QC starts by classifying the project instead of assuming one stack:
- `agent-runtime-cli`: CLI/runtime agents, tool execution, sandboxing, protocol state.
- `agent-sdk-api`: client SDKs, public APIs, generated contracts, fake servers.
- `agent-tool-mcp-gateway`: tool servers, MCP/ACP gateways, connector contracts.
- `multi-channel-agent-gateway`: Telegram/Discord/Slack/Matrix/webhook gateways, auth and secrets.
- `agent-ui-tui-desktop`: GUI, TUI, desktop shell, browser automation, screenshots.
- `agent-skills-plugins`: skills, plugins, manifests, discovery, package boundaries.
- `background-agent-scheduler`: cron, queues, workers, retries, concurrency, recovery.
- `agent-distribution-release`: install, package, Docker, release, supply-chain, cross-platform.
- `agent-evals-quality`: model behavior, task quality, rubrics, regressions, judges.

A real project usually combines several profiles.
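
A project that combines several profiles inherits the focus areas of each one. The sketch below shows one way to merge them. The profile IDs and focus areas are copied from the profile list above, while the `PROFILE_FOCUS` mapping and the `focus_areas` function are hypothetical names used only for illustration.

```python
# Profile IDs and focus areas come from the profile list above;
# the mapping layout and the function name are hypothetical.
PROFILE_FOCUS = {
    "agent-runtime-cli": ["CLI/runtime agents", "tool execution", "sandboxing", "protocol state"],
    "agent-sdk-api": ["client SDKs", "public APIs", "generated contracts", "fake servers"],
    "agent-tool-mcp-gateway": ["tool servers", "MCP/ACP gateways", "connector contracts"],
    "multi-channel-agent-gateway": ["Telegram/Discord/Slack/Matrix/webhook gateways", "auth and secrets"],
    "agent-ui-tui-desktop": ["GUI", "TUI", "desktop shell", "browser automation", "screenshots"],
    "agent-skills-plugins": ["skills", "plugins", "manifests", "discovery", "package boundaries"],
    "background-agent-scheduler": ["cron", "queues", "workers", "retries", "concurrency", "recovery"],
    "agent-distribution-release": ["install", "package", "Docker", "release", "supply-chain", "cross-platform"],
    "agent-evals-quality": ["model behavior", "task quality", "rubrics", "regressions", "judges"],
}


def focus_areas(profile_ids):
    """Merge the focus areas of several profiles, keeping first-seen order."""
    unknown = [p for p in profile_ids if p not in PROFILE_FOCUS]
    if unknown:
        raise ValueError(f"unknown profiles: {unknown}")

    merged = []
    for profile_id in profile_ids:
        for area in PROFILE_FOCUS[profile_id]:
            if area not in merged:  # skip areas already contributed by an earlier profile
                merged.append(area)
    return merged
```

For example, a CLI agent that also ships plugins could be classified as `["agent-runtime-cli", "agent-skills-plugins"]`, and the merged list would then include both `sandboxing` and `manifests`.
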
- Specification
- Quickstart
- Best practices
- Test techniques and compositions
- Project classification
- Gate matrix
- Interaction surface testing
- qcloop integration
- Evidence-driven verdicts
- Acceptance scenarios
- Evidence contract
- Performance and reliability metrics
- Flow and taxonomy
- Case-study patterns
- Star project testing systems
- Chinese specification (中文规范)
- `llms.txt`: concise navigation index for AI clients.
- `llms-full.txt`: concatenated current English documentation for model context.
- `llm.txt` and `llm-full.txt`: compatibility aliases.

npm install
npm run dev
npm run check:schemas
npm run build

The static site is generated at `docs/.vitepress/dist`.