Skip to content

Commit 6dce746

Browse files
authored
[aw-compat] Fix Serena codemod output for nested engine config and stale gh-aw source pins (#28080)
1 parent 9cf106c commit 6dce746

4 files changed

Lines changed: 332 additions & 51 deletions

File tree

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
# ADR-28080: Multi-Path Serena Config Discovery and Structural YAML Mutation in Codemod
2+
3+
**Date**: 2026-04-23
4+
**Status**: Draft
5+
**Deciders**: pelikhan
6+
7+
---
8+
9+
## Part 1 — Narrative (Human-Friendly)
10+
11+
### Context
12+
13+
The `serena-tools-to-shared-import` codemod automates migration of inline Serena tool configuration to the shared `shared/mcp/serena.md` import. The original implementation only detected Serena config at the top-level `tools.serena` YAML path. However, workflows that specify a custom LLM engine can place the same config under `engine.tools.serena` instead. When such workflows are processed by the old codemod, the migration is silently skipped, leaving the workflow non-compilable after the shared import becomes mandatory. A second compounding issue: workflows pinned to a specific `github/gh-aw` commit SHA via `source:` reference an older version of the import chain that predates the required `with.languages` input, so even a successfully migrated workflow would fail validation at the pinned version.
14+
15+
### Decision
16+
17+
We will extend `findSerenaLanguagesForMigration` to probe both `tools.serena` (top-level) and `engine.tools.serena` (nested under an engine block), treating whichever location is populated as the migration source. We will fix YAML block insertion to scan forward to the end of the entire `engine` block before inserting the new `imports` entry, preserving sibling engine fields such as `model` and `id`. We will replace the simple top-level block removal with an indentation-aware `removeBlockIfEmpty` that avoids deleting engine blocks that retain meaningful sibling content. Finally, we will add `maybeUpdatePinnedSourceRef`, which rewrites any `source:` value pointing to `github/gh-aw` at a 40-character commit SHA to `@main` during the same migration pass, preventing stale-import validation failures.
18+
19+
### Alternatives Considered
20+
21+
#### Alternative 1: Separate Codemods per YAML Path
22+
23+
Define a second, independent codemod that handles `engine.tools.serena` exclusively, leaving the original `tools.serena` codemod unchanged. This keeps each codemod small and single-purpose, but requires both to be applied in the correct order and makes it easy for a caller to apply only one — leaving workflows in a half-migrated state that is harder to debug than no migration at all.
24+
25+
#### Alternative 2: Require Manual Migration for Engine-Scoped Serena Config
26+
27+
Document that `engine.tools.serena` requires manual migration and skip automation entirely. This is low risk for the codemod itself but shifts toil onto every workflow author whose workflow uses this pattern, and provides no protection against the `source:` pin problem. Given the volume of affected workflows in the repository, manual migration was not a viable path.
28+
29+
### Consequences
30+
31+
#### Positive
32+
- Workflows using `engine.tools.serena` are now migrated automatically, closing a silent failure mode.
33+
- The `@main` source pin rewrite prevents post-migration validation failures against stale upstream import chains that lack the now-required `with.languages` input.
34+
- Indentation-aware block removal preserves `engine` sibling fields (`id`, `model`) at the correct YAML depth, making the transformation structurally correct for the full range of engine block shapes.
35+
36+
#### Negative
37+
- `findSerenaLanguagesForMigration` now encodes a precedence rule (top-level `tools.serena` wins over `engine.tools.serena`); if a workflow has both, only the top-level value is used and the engine-scoped config is silently discarded.
38+
- The `maybeUpdatePinnedSourceRef` rewrite is scoped to `github/gh-aw` sources pinned by commit SHA — workflows pinned to forks, tags, or other repos are unaffected, requiring separate handling if the same pin-staleness problem arises there.
39+
- Indentation-aware YAML mutation increases the complexity of `removeBlockIfEmpty` and `hasNestedContent`, making edge cases harder to reason about without a property-based test suite.
40+
41+
#### Neutral
42+
- The change touches only `pkg/cli/codemod_serena_import.go`, keeping scope narrow and the diff reviewable in a single pass.
43+
- Regression tests added for both `engine.tools.serena` migration with sibling preservation and the `source:` SHA rewrite path.
44+
45+
---
46+
47+
## Part 2 — Normative Specification (RFC 2119)
48+
49+
> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).
50+
51+
### Serena Config Discovery
52+
53+
1. Implementations **MUST** detect Serena configuration at `tools.serena` (top-level) before checking `engine.tools.serena`.
54+
2. Implementations **MUST** fall back to `engine.tools.serena` when no `tools.serena` key is present in the frontmatter.
55+
3. Implementations **MUST NOT** attempt migration when neither location contains a non-empty `languages` list.
56+
4. Implementations **MUST NOT** merge languages from both paths; the first matching path **SHALL** be the sole source of the language list.
57+
58+
### YAML Block Insertion
59+
60+
1. When inserting the `imports` block adjacent to an `engine:` block, implementations **MUST** scan forward to the end of the full `engine` block (i.e., until the next top-level key or end-of-document) before computing the insertion point.
61+
2. Implementations **MUST NOT** insert the `imports` block on the line immediately following the `engine:` key, as this would interleave the new block with the engine block's nested content.
62+
63+
### Empty-Block Removal
64+
65+
1. Implementations **MUST** remove a YAML block (`tools`, `engine`) only when it contains no meaningful nested content after migration — where "meaningful" means a non-empty, non-comment child line indented deeper than the block key.
66+
2. Implementations **MUST** preserve the enclosing block (e.g., `engine:`) when sibling fields remain after the migrated sub-key (`tools`) is removed.
67+
3. Implementations **MUST NOT** remove a block based solely on indentation depth; the check **MUST** inspect actual content.
68+
69+
### Pinned Source Ref Rewrite
70+
71+
1. When a migration is applied and `source:` is present in frontmatter, implementations **MUST** rewrite the ref to `@main` if and only if: (a) the source repo is `github/gh-aw`, and (b) the current ref is exactly a 40-character hexadecimal commit SHA.
72+
2. Implementations **MUST NOT** rewrite `source:` values that point to a different repository, use a named branch, or use a tag.
73+
3. Implementations **SHOULD** perform the source ref rewrite in the same migration pass as the `tools.serena` removal to keep the workflow in a consistent state after a single codemod run.
74+
75+
### Conformance
76+
77+
An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.
78+
79+
---
80+
81+
*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24840177385) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*

pkg/cli/codemod_serena_import.go

Lines changed: 135 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -13,32 +13,17 @@ import (
1313
var serenaImportCodemodLog = logger.New("cli:codemod_serena_import")
1414

1515
// getSerenaToSharedImportCodemod creates a codemod that migrates removed tools.serena
16-
// configuration to an equivalent imports entry using shared/mcp/serena.md.
16+
// or engine.tools.serena configuration to an equivalent imports entry using
17+
// shared/mcp/serena.md, and may normalize a pinned source ref to @main.
1718
func getSerenaToSharedImportCodemod() Codemod {
1819
return Codemod{
1920
ID: "serena-tools-to-shared-import",
20-
Name: "Migrate tools.serena to shared Serena import",
21-
Description: "Removes 'tools.serena' and adds an equivalent 'imports' entry using shared/mcp/serena.md with languages.",
21+
Name: "Migrate tools.serena or engine.tools.serena to shared Serena import",
22+
Description: "Removes 'tools.serena' or 'engine.tools.serena', adds an equivalent 'imports' entry using shared/mcp/serena.md with languages, and may rewrite a pinned 'source:' ref to '@main'.",
2223
IntroducedIn: "1.0.0",
2324
Apply: func(content string, frontmatter map[string]any) (string, bool, error) {
24-
toolsAny, hasTools := frontmatter["tools"]
25-
if !hasTools {
26-
return content, false, nil
27-
}
28-
29-
toolsMap, ok := toolsAny.(map[string]any)
30-
if !ok {
31-
return content, false, nil
32-
}
33-
34-
serenaAny, hasSerena := toolsMap["serena"]
35-
if !hasSerena {
36-
return content, false, nil
37-
}
38-
39-
languages, ok := extractSerenaLanguages(serenaAny)
25+
languages, ok := findSerenaLanguagesForMigration(frontmatter)
4026
if !ok || len(languages) == 0 {
41-
serenaImportCodemodLog.Print("Found tools.serena but languages configuration is invalid or empty - skipping migration; verify tools.serena languages are set")
4227
return content, false, nil
4328
}
4429

@@ -50,7 +35,8 @@ func getSerenaToSharedImportCodemod() Codemod {
5035
return lines, false
5136
}
5237

53-
result = removeTopLevelBlockIfEmpty(result, "tools")
38+
result = removeBlockIfEmpty(result, "tools")
39+
result = removeBlockIfEmpty(result, "engine")
5440

5541
if alreadyImported {
5642
return result, true
@@ -59,6 +45,7 @@ func getSerenaToSharedImportCodemod() Codemod {
5945
return addSerenaImport(result, languages), true
6046
})
6147
if applied {
48+
newContent = maybeUpdatePinnedSourceRef(newContent, frontmatter)
6249
if alreadyImported {
6350
serenaImportCodemodLog.Print("Removed tools.serena (shared/mcp/serena.md import already present)")
6451
} else {
@@ -70,6 +57,52 @@ func getSerenaToSharedImportCodemod() Codemod {
7057
}
7158
}
7259

60+
func findSerenaLanguagesForMigration(frontmatter map[string]any) ([]string, bool) {
61+
toolsAny, hasTools := frontmatter["tools"]
62+
if hasTools {
63+
if toolsMap, ok := toolsAny.(map[string]any); ok {
64+
if serenaAny, hasSerena := toolsMap["serena"]; hasSerena {
65+
languages, ok := extractSerenaLanguages(serenaAny)
66+
if ok && len(languages) > 0 {
67+
return languages, true
68+
}
69+
}
70+
}
71+
}
72+
73+
engineAny, hasEngine := frontmatter["engine"]
74+
if !hasEngine {
75+
return nil, false
76+
}
77+
78+
engineMap, ok := engineAny.(map[string]any)
79+
if !ok {
80+
return nil, false
81+
}
82+
83+
engineToolsAny, hasEngineTools := engineMap["tools"]
84+
if !hasEngineTools {
85+
return nil, false
86+
}
87+
88+
engineToolsMap, ok := engineToolsAny.(map[string]any)
89+
if !ok {
90+
return nil, false
91+
}
92+
93+
serenaAny, hasSerena := engineToolsMap["serena"]
94+
if !hasSerena {
95+
return nil, false
96+
}
97+
98+
languages, ok := extractSerenaLanguages(serenaAny)
99+
if !ok || len(languages) == 0 {
100+
return nil, false
101+
}
102+
103+
return languages, true
104+
}
105+
73106
func extractSerenaLanguages(serenaAny any) ([]string, bool) {
74107
switch serena := serenaAny.(type) {
75108
case []string:
@@ -204,7 +237,13 @@ func addSerenaImport(lines []string, languages []string) []string {
204237
insertAt := 0
205238
for i, line := range lines {
206239
if isTopLevelKey(line) && strings.HasPrefix(strings.TrimSpace(line), "engine:") {
207-
insertAt = i + 1
240+
insertAt = len(lines)
241+
for j := i + 1; j < len(lines); j++ {
242+
if isTopLevelKey(lines[j]) {
243+
insertAt = j
244+
break
245+
}
246+
}
208247
break
209248
}
210249
}
@@ -228,42 +267,88 @@ func formatStringArrayInline(values []string) string {
228267
return "[" + strings.Join(quoted, ", ") + "]"
229268
}
230269

231-
func removeTopLevelBlockIfEmpty(lines []string, blockName string) []string {
232-
blockIdx := -1
233-
blockEnd := len(lines)
234-
for i, line := range lines {
235-
if isTopLevelKey(line) && strings.HasPrefix(strings.TrimSpace(line), blockName+":") {
236-
blockIdx = i
237-
for j := i + 1; j < len(lines); j++ {
238-
if isTopLevelKey(lines[j]) {
239-
blockEnd = j
240-
break
241-
}
242-
}
243-
break
270+
func removeBlockIfEmpty(lines []string, blockName string) []string {
271+
result := make([]string, 0, len(lines))
272+
for i := 0; i < len(lines); {
273+
line := lines[i]
274+
trimmed := strings.TrimSpace(line)
275+
if !strings.HasPrefix(trimmed, blockName+":") {
276+
result = append(result, line)
277+
i++
278+
continue
244279
}
245-
}
246280

247-
if blockIdx == -1 {
248-
return lines
281+
valuePart := strings.TrimSpace(strings.TrimPrefix(trimmed, blockName+":"))
282+
if valuePart != "" && !strings.HasPrefix(valuePart, "#") {
283+
result = append(result, line)
284+
i++
285+
continue
286+
}
287+
288+
hasMeaningfulNestedContent, blockEnd := hasNestedContent(lines, i+1, getIndentation(line))
289+
290+
if hasMeaningfulNestedContent {
291+
result = append(result, line)
292+
i++
293+
continue
294+
}
295+
296+
i = blockEnd
249297
}
250298

251-
hasMeaningfulNestedContent := false
252-
for _, line := range lines[blockIdx+1 : blockEnd] {
253-
trimmed := strings.TrimSpace(line)
254-
if trimmed == "" || strings.HasPrefix(trimmed, "#") {
299+
return result
300+
}
301+
302+
func hasNestedContent(lines []string, startIndex int, blockIndent string) (bool, int) {
303+
for i := startIndex; i < len(lines); i++ {
304+
nestedLine := lines[i]
305+
nestedTrimmed := strings.TrimSpace(nestedLine)
306+
if nestedTrimmed == "" {
307+
continue
308+
}
309+
310+
nestedIndent := getIndentation(nestedLine)
311+
if strings.HasPrefix(nestedTrimmed, "#") {
312+
if len(nestedIndent) <= len(blockIndent) {
313+
return false, i
314+
}
255315
continue
256316
}
257-
hasMeaningfulNestedContent = true
258-
break
317+
318+
if len(nestedIndent) <= len(blockIndent) && strings.Contains(nestedLine, ":") {
319+
return false, i
320+
}
321+
322+
return true, i
259323
}
260324

261-
if hasMeaningfulNestedContent {
262-
return lines
325+
return false, len(lines)
326+
}
327+
328+
func maybeUpdatePinnedSourceRef(content string, frontmatter map[string]any) string {
329+
sourceAny, hasSource := frontmatter["source"]
330+
if !hasSource {
331+
return content
263332
}
264333

265-
result := make([]string, 0, len(lines)-(blockEnd-blockIdx))
266-
result = append(result, lines[:blockIdx]...)
267-
result = append(result, lines[blockEnd:]...)
268-
return result
334+
source, ok := sourceAny.(string)
335+
if !ok || strings.TrimSpace(source) == "" {
336+
return content
337+
}
338+
339+
sourceSpec, err := parseSourceSpec(source)
340+
if err != nil {
341+
return content
342+
}
343+
344+
if sourceSpec.Repo != "github/gh-aw" || !IsCommitSHA(sourceSpec.Ref) {
345+
return content
346+
}
347+
348+
updatedSource := sourceSpec.Repo + "/" + sourceSpec.Path + "@main"
349+
updatedContent, err := UpdateFieldInFrontmatter(content, "source", updatedSource)
350+
if err != nil {
351+
return content
352+
}
353+
return updatedContent
269354
}

0 commit comments

Comments
 (0)