fix(aok): three critical fixes from run #24824838890 critic review #28106
Draft
…ypos

The escalation issue generated in run #24824838890 contained "875%" where "87.5%" was meant (7/8 run failures). This happened because there was no explicit instruction to format failure rates correctly.

Two guardrails added to agentic-optimization-kit.md:
1. In Phase 6 "Issue must:" line: explicit formula (failures/runs×100, 1 decimal place, range-checked to [0%,100%]).
2. In the Guardrails section: a short restatement for the discussion.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/7e192d3c-abd2-4d17-8f65-21635650eb32
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Run #24824838890 published "Period: 2026-04-22 to 2026-04-23 (7-day snapshot)", a 1-day span labelled as 7 days. The agent inferred dates from the run dataset timestamps instead of computing them arithmetically.

Changes in agentic-optimization-kit.md:
- Add date computation at top of "Download logs" step: writes PERIOD_START (today-7d) and PERIOD_END (today) to period.env
- Add period.env to the Data Inputs inventory so the agent knows the file exists
- Update discussion template from placeholder "YYYY-MM-DD to YYYY-MM-DD" to an explicit instruction: read dates from period.env via cat and do not infer them from run timestamps

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
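The date arithmetic described in this commit can be sketched in Python. This is an illustration only: the workflow itself uses shell `date`, and the function names `compute_period`, `write_period_env`, and `read_period_env` are hypothetical helpers, not part of the PR. The sketch assumes `period.env` holds plain `KEY=VALUE` lines.

```python
from datetime import date, datetime, timedelta, timezone
from pathlib import Path
from typing import Dict, Optional, Tuple


def compute_period(today: date, days: int = 7) -> Tuple[str, str]:
    """Return (start, end) as ISO dates, where start is `days` before `today`."""
    start = today - timedelta(days=days)
    return start.isoformat(), today.isoformat()


def write_period_env(path: Path, today: Optional[date] = None) -> None:
    """Write PERIOD_START and PERIOD_END as KEY=VALUE lines (assumed file layout)."""
    if today is None:
        # Use the UTC calendar date, mirroring `date -u` in the shell version.
        today = datetime.now(timezone.utc).date()
    start, end = compute_period(today)
    path.write_text(f"PERIOD_START={start}\nPERIOD_END={end}\n", encoding="utf-8")


def read_period_env(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines back into a dict, ignoring lines without '='."""
    pairs: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs
```

For a run on 2026-04-23, `compute_period(date(2026, 4, 23))` returns `("2026-04-16", "2026-04-23")`. The point of the design is that both dates derive from the current date and the window length, never from the timestamps of whichever runs happen to be in the dataset.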
…ess claims

Run #24824838890's agent concluded "Created escalation issue for Smoke Copilot (P1)", but the safe_outputs job failed with a GitHub API rate limit, so no issue was ever posted. The agent saw the safeoutputs MCP return success (it only queues the action), so it had no way to know delivery had failed.

Changes in agentic-optimization-kit.md:
- Add a Delivery note block at the end of Phase 6: explains that create_issue/create_discussion via safeoutputs MCP only queues the item; the actual API call happens in the downstream safe_outputs job. Instructs the agent to say "submitted for delivery via safe-outputs" rather than "created" in completion summaries.
- Add a matching Guardrails bullet stating that safe-output delivery is asynchronous and that "submitted for delivery" should be used instead of "created".

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/7e192d3c-abd2-4d17-8f65-21635650eb32
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot created this pull request from a session on behalf of mnkiefer on April 23, 2026 14:04.
Three critical bugs identified in the critic review of consolidated Agentic Optimization Kit run #24824838890. Each fix is a separate commit, clearly isolated by root cause.
Fix 1: Percentage formatting (`f2015a6`)

Bug: The escalation issue body contained "875%" where "87.5%" was meant (7 failures out of 8 runs). The agent computed 87.5 correctly but dropped the decimal point, producing a nonsensical value.

Root cause: Phase 6 had no formatting instruction for failure rate percentages.

Fix: Added an explicit formula to the "Issue must:" line in Phase 6: failures/runs×100, rounded to 1 decimal place and range-checked to [0%, 100%]. Also added a matching Guardrails bullet that applies the same rule to the discussion body.
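The formatting rule (failures/runs×100, one decimal place, range-checked to [0%, 100%]) can be sketched in Python as follows. The helper names `format_failure_rate` and `is_plausible_percentage` are hypothetical and serve only to illustrate the rule; the PR itself expresses it as prompt instructions, not code.

```python
def format_failure_rate(failures: int, runs: int) -> str:
    """Return failures/runs as a percentage string with one decimal place."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")
    rate = failures / runs * 100
    # Range check: a value outside [0, 100] signals a formatting or arithmetic slip.
    if not 0.0 <= rate <= 100.0:
        raise ValueError(f"failure rate out of range: {rate}")
    return f"{rate:.1f}%"


def is_plausible_percentage(text: str) -> bool:
    """Check that a string such as '87.5%' parses to a value within [0, 100]."""
    if not text.endswith("%"):
        return False
    try:
        value = float(text[:-1])
    except ValueError:
        return False
    return 0.0 <= value <= 100.0
```

With the numbers from the run, `format_failure_rate(7, 8)` yields `"87.5%"`, while `is_plausible_percentage("875%")` is `False`, which is the kind of check that would have caught the dropped decimal point.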
Fix 2: Period date accuracy (`85cfd7c`)

Bug: The executive summary stated "Period: 2026-04-22 to 2026-04-23 (7-day snapshot)", a 1-day span incorrectly labelled as a 7-day window.

Root cause: The prompt template used `YYYY-MM-DD to YYYY-MM-DD` placeholders with no authoritative data source. The agent inferred start/end from the earliest/latest run timestamps in the dataset rather than computing from the current date. The `--start-date -7d` flag used in the download step was never captured anywhere the agent could read it.

Fix:
- At the top of the "Download logs" step, compute `PERIOD_START=$(date -u -d "7 days ago" +%Y-%m-%d)` and `PERIOD_END=$(date -u +%Y-%m-%d)` and write them to `/tmp/gh-aw/token-audit/period.env`.
- Added `period.env` to the Data Inputs list so the agent knows the file exists.
- Updated the discussion template's period line to "PERIOD_START to PERIOD_END (7-day window)", with an instruction to read the exact start and end dates from `/tmp/gh-aw/token-audit/period.env` via `cat` and not to infer them from run timestamps.

Fix 3: Delivery status transparency (`98750f4`)

Bug: The agent's completion summary stated "Phase 6: Created escalation issue for Smoke Copilot (P1) and Architecture Diagram Generator (P2)", but the `safe_outputs` job failed with a GitHub API rate limit (HTTP 403, retried 4×). No issue was ever posted. The agent reported false success.

Root cause: The safeoutputs MCP `create_issue`/`create_discussion` returns success when it queues the action, not when GitHub delivers it. The actual delivery happens in the downstream `safe_outputs` job after the agent session ends and can fail independently. The agent had no awareness of this.

Fix:
- Added a Delivery note block at the end of Phase 6 explaining that `create_issue`/`create_discussion` via the safeoutputs MCP only queues the item, and that the actual API call happens in the downstream `safe_outputs` job. The note instructs the agent to say "submitted for delivery via safe-outputs" rather than "created" in completion summaries.
- Added a matching Guardrails bullet stating that safe-output delivery is asynchronous and that "submitted for delivery" should be used instead of "created".
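The distinction between queuing and delivery can be illustrated with a small Python sketch. Everything here is hypothetical: `QueuedOutput`, `completion_line`, and `claims_unverified_delivery` are illustrative names, not part of the safeoutputs MCP or of this PR, and the regular expression is only a rough heuristic for spotting "created" claims about issues or discussions in a summary.

```python
import re
from dataclasses import dataclass


@dataclass
class QueuedOutput:
    """An item handed to the queue; delivery status is unknown at this point."""
    kind: str   # e.g. "issue" or "discussion"
    title: str


def completion_line(item: QueuedOutput) -> str:
    """Word the summary so it claims submission, not confirmed creation."""
    return f"Submitted {item.kind} for delivery via safe-outputs: {item.title}"


# Heuristic: "created" followed, within the same sentence, by "issue" or "discussion".
_CREATED_CLAIM = re.compile(r"\bcreated\b[^.\n]*\b(?:issue|discussion)\b", re.IGNORECASE)


def claims_unverified_delivery(summary: str) -> bool:
    """Return True if the summary asserts that an issue or discussion was created."""
    return bool(_CREATED_CLAIM.search(summary))
```

Applied to the summary from the failing run, `claims_unverified_delivery("Phase 6: Created escalation issue for Smoke Copilot (P1)")` is `True`, whereas a line produced by `completion_line` passes the check because it only asserts that the item was submitted.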
All three changes are in `.github/workflows/agentic-optimization-kit.md` only. The workflow compiles cleanly (0 errors, 0 warnings). Code review passed with no comments.

🤖 Smoke CI run #24839692815 completed successfully on 2026-04-23T14:04:11Z.