fix(aok): three critical fixes from run #24824838890 critic review #28106

Draft
Copilot wants to merge 3 commits into main from copilot/compare-latest-runs-agents

Copilot AI commented Apr 23, 2026

Three critical bugs identified in the critic review of consolidated Agentic Optimization Kit run #24824838890. Each fix is a separate commit, clearly isolated by root cause.


Fix 1 — Percentage formatting (f2015a6)

Bug: The escalation issue body contained "875%" where "87.5%" was meant (7 failures out of 8 runs). The agent computed 87.5 correctly but dropped the decimal point, producing a nonsensical value.

Root cause: Phase 6 had no formatting instruction for failure rate percentages.

Fix: Added an explicit formula to the "Issue must:" line in Phase 6:

"divide failures by total runs and multiply by 100, then format to one decimal place (e.g., 7/8 = 87.5%, not 875%). Verify every percentage is in [0%, 100%]."

Also added a matching Guardrails bullet that applies the same rule to the discussion body.
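
The formatting rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic the prompt now spells out; the function name `format_failure_rate` is hypothetical and is not part of the workflow file.

```python
def format_failure_rate(failures: int, total_runs: int) -> str:
    """Format failures/total_runs as a percentage with one decimal place."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= failures <= total_runs:
        raise ValueError("failures must be between 0 and total_runs")
    # Divide first, then multiply by 100, as the Phase 6 instruction says.
    rate = failures / total_runs * 100
    # Range check mirroring "Verify every percentage is in [0%, 100%]".
    if not 0.0 <= rate <= 100.0:
        raise ValueError(f"percentage out of range: {rate}")
    return f"{rate:.1f}%"


print(format_failure_rate(7, 8))  # 87.5%
```

Rejecting `failures > total_runs` up front means a value such as 875% cannot be produced silently; it surfaces as an error instead.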


Fix 2 — Period date accuracy (85cfd7c)

Bug: The executive summary stated "Period: 2026-04-22 to 2026-04-23 (7-day snapshot)" — a 1-day span incorrectly labelled as a 7-day window.

Root cause: The prompt template used YYYY-MM-DD to YYYY-MM-DD placeholders with no authoritative data source. The agent inferred start/end from the earliest/latest run timestamps in the dataset rather than computing from the current date. The --start-date -7d flag used in the download step was never captured anywhere the agent could read it.
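
To see why inference from run timestamps goes wrong, consider a small Python sketch. The run dates below are illustrative values chosen to match the reported period; they are not taken from the actual dataset.

```python
from datetime import date

# Illustrative only: every run in the downloaded dataset happens to fall
# on the last two days of the 7-day download window.
run_dates = [date(2026, 4, 22), date(2026, 4, 23), date(2026, 4, 23)]

# Inferring the period from the data yields the span of the runs,
# not the span of the window that was requested.
inferred_start, inferred_end = min(run_dates), max(run_dates)
span_days = (inferred_end - inferred_start).days

print(f"{inferred_start} to {inferred_end}: {span_days}-day span")
```

The inferred span depends on when runs happened to occur, so it matches the requested window only by coincidence.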

Fix:

  1. At the top of the "Download Copilot workflow logs" step, compute PERIOD_START=$(date -u -d "7 days ago" +%Y-%m-%d) and PERIOD_END=$(date -u +%Y-%m-%d) and write them to /tmp/gh-aw/token-audit/period.env.
  2. Add period.env to the Data Inputs list so the agent knows the file exists.
  3. Update the discussion template line to: "PERIOD_START to PERIOD_END (7-day window) — read the exact start and end dates from /tmp/gh-aw/token-audit/period.env via cat; do not infer these dates from run timestamps".
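
The three steps above can be sketched in Python for illustration. The workflow itself uses the shell `date` commands shown in step 1; the `KEY=VALUE` file layout and the helper names here are assumptions, since the PR does not show the exact contents of period.env.

```python
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path


def compute_period(now: datetime, days: int = 7) -> tuple[str, str]:
    """Return (PERIOD_START, PERIOD_END) as UTC YYYY-MM-DD strings."""
    end = now.astimezone(timezone.utc).date()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


def write_period_env(path: Path, start: str, end: str) -> None:
    # Assumed layout: one KEY=VALUE pair per line.
    path.write_text(f"PERIOD_START={start}\nPERIOD_END={end}\n")


def read_period_env(path: Path) -> dict[str, str]:
    # Read the dates back from the file instead of inferring them from run data.
    lines = path.read_text().splitlines()
    return dict(line.split("=", 1) for line in lines if "=" in line)


now = datetime(2026, 4, 23, 14, 4, tzinfo=timezone.utc)  # fixed for the example
start, end = compute_period(now)
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / "period.env"
    write_period_env(env_path, start, end)
    period = read_period_env(env_path)

print(f"{period['PERIOD_START']} to {period['PERIOD_END']} (7-day window)")
# 2026-04-16 to 2026-04-23 (7-day window)
```

Computing the dates once and persisting them gives the agent a single authoritative source, so the label "7-day window" and the printed dates cannot drift apart.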

Fix 3 — Delivery status transparency (98750f4)

Bug: The agent's completion summary stated "Phase 6: Created escalation issue for Smoke Copilot (P1) and Architecture Diagram Generator (P2)" — but the safe_outputs job failed with a GitHub API rate limit (HTTP 403, retried 4×). No issue was ever posted. The agent reported false success.

Root cause: The safeoutputs MCP tools create_issue and create_discussion return success as soon as the action is queued, not when GitHub delivers it. Delivery happens in the downstream safe_outputs job, which runs after the agent session ends and can fail independently. Nothing in the prompt told the agent about this, so it treated the queue acknowledgement as proof of delivery.

Fix:

  1. Added a Delivery note block at the end of Phase 6 explaining the async queue model and instructing the agent to say "submitted for delivery via safe-outputs" instead of "created" in completion summaries.
  2. Added a matching Guardrails bullet to reinforce this across the full session.
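
The queue-then-deliver model can be illustrated with a toy Python sketch. Everything here is hypothetical: `SafeOutputsQueue`, `deliver`, and `rate_limited` are stand-ins written for this example and do not reflect the real safeoutputs MCP or safe_outputs job implementation.

```python
from dataclasses import dataclass, field


@dataclass
class QueuedAction:
    kind: str  # e.g. "create_issue"
    title: str
    delivered: bool = False


@dataclass
class SafeOutputsQueue:
    """Hypothetical stand-in for the queue the agent writes to."""
    items: list[QueuedAction] = field(default_factory=list)

    def create_issue(self, title: str) -> bool:
        # Success here means "queued", not "posted to GitHub".
        self.items.append(QueuedAction("create_issue", title))
        return True


def deliver(queue: SafeOutputsQueue, post) -> list[str]:
    """Hypothetical downstream job: runs later and may fail per item."""
    failures = []
    for action in queue.items:
        try:
            post(action)
            action.delivered = True
        except RuntimeError as exc:
            failures.append(f"{action.title}: {exc}")
    return failures


def rate_limited(action: QueuedAction) -> None:
    # Simulates the API rejecting the request.
    raise RuntimeError("HTTP 403: rate limit")


queue = SafeOutputsQueue()
ok = queue.create_issue("Escalation: Smoke Copilot (P1)")
# At this point the agent only knows the item was queued, so the honest wording is:
summary = "Phase 6: escalation issue submitted for delivery via safe-outputs"

failures = deliver(queue, rate_limited)  # happens after the agent session ends
print(ok, queue.items[0].delivered, failures)
```

In the sketch, `create_issue` returns `True` even though delivery later fails, which is why the completion summary should describe submission rather than creation.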

All three changes are in .github/workflows/agentic-optimization-kit.md only. The workflow compiles cleanly (0 errors, 0 warnings). Code review passed with no comments.


🤖 Smoke CI run #24839692815 completed successfully on 2026-04-23T14:04:11Z.

Generated by Smoke CI · ● 397.9K ·

Copilot AI and others added 3 commits April 23, 2026 13:53
…ypos

The escalation issue generated in run #24824838890 contained "875%"
where "87.5%" was meant (7/8 run failures). This happened because there
was no explicit instruction to format failure rates correctly.

Two guardrails added to agentic-optimization-kit.md:
1. In Phase 6 "Issue must:" line — explicit formula (failures/runs×100,
   1 decimal place, range-checked to [0%,100%]).
2. In the Guardrails section — a short restatement for the discussion.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/7e192d3c-abd2-4d17-8f65-21635650eb32

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Run #24824838890 published "Period: 2026-04-22 to 2026-04-23 (7-day
snapshot)" — a 1-day span labelled as 7 days. The agent inferred dates
from the run dataset timestamps instead of computing them arithmetically.

Changes in agentic-optimization-kit.md:
- Add date computation at top of "Download logs" step: writes
  PERIOD_START (today-7d) and PERIOD_END (today) to period.env
- Add period.env to the Data Inputs inventory so the agent knows
  the file exists
- Update discussion template from placeholder "YYYY-MM-DD to YYYY-MM-DD"
  to an explicit instruction: read dates from period.env via cat and do
  not infer them from run timestamps

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

…ess claims

Run #24824838890's agent concluded "Created escalation issue for Smoke
Copilot (P1)" — but the safe_outputs job failed with a GitHub API rate
limit, so no issue was ever posted. The agent saw the safeoutputs MCP
return success (it only queues the action), so it had no way to know
delivery had failed.

Changes in agentic-optimization-kit.md:
- Add a Delivery note block at the end of Phase 6: explains that
  create_issue/create_discussion via safeoutputs MCP only queues the
  item; the actual API call happens in the downstream safe_outputs job.
  Instructs the agent to say "submitted for delivery via safe-outputs"
  rather than "created" in completion summaries.
- Add a matching Guardrails bullet: "Safe-output delivery is
  asynchronous — use submitted for delivery instead of created."

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/7e192d3c-abd2-4d17-8f65-21635650eb32

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>