Mitigate intermittent AWF startup failures when awf-api-proxy health check flaps#27909
Closed
Mitigate intermittent AWF startup failures when awf-api-proxy health check flaps#27909
awf-api-proxy health check flaps#27909Conversation
4 tasks
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/596920a6-3a87-4dfb-9f45-762e54d420ac Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/596920a6-3a87-4dfb-9f45-762e54d420ac Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/596920a6-3a87-4dfb-9f45-762e54d420ac Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix awf-api-proxy sidecar health check issue
Mitigate intermittent AWF startup failures when Apr 22, 2026
awf-api-proxy health check flaps
Collaborator
|
@lpcox could this be something else? |
Collaborator
|
@pelikhan looking into it |
Closed
This comment has been minimized.
This comment has been minimized.
This was referenced Apr 23, 2026
This comment has been minimized.
This comment has been minimized.
Contributor
|
Hey One thing to address before this is ready for review:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Main-branch agentic workflows were failing before agent activation because Docker compose exited when
awf-api-proxybecame unhealthy during startup. Failures were transient and cross-engine, with a consistent pre-inference crash signature.AWF startup resilience for api-proxy flaps
/usr/local/bin/awfwrapper that retries AWF startup when stderr contains:dependency failed to start: container awf-api-proxy is unhealthyAWF_API_PROXY_START_RETRIES(default1)AWF_API_PROXY_RETRY_DELAY_SECONDS(default5)Install-path split: wrapper vs real runtime
install_awf_binary.shnow installs the actual runtime to/usr/local/lib/awf/awf-real(bundle launcher or native binary)./usr/local/bin/awfis now the retrying launcher, preserving existing call sites while centralizing retry logic.Operational hardening
umask 077 && mktemp) when capturing AWF output for retry pattern detection.🤖 Smoke CI scheduled run — https://github.com/github/gh-aw/actions/runs/24812244277
🤖 Smoke CI scheduled run — https://github.com/github/gh-aw/actions/runs/24813191140
🤖 Smoke CI scheduled run — https://github.com/github/gh-aw/actions/runs/24818568683
🤖 Smoke CI scheduled run — https://github.com/github/gh-aw/actions/runs/24820507401