feat(supervisor): optional ndots override for runner pods #3441
Walkthrough
This pull request introduces a new DNS configuration feature for the supervisor. It adds documentation describing a pod DNS …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
apps/supervisor/src/env.ts (1)
132-133: Optional: warn when `KUBERNETES_POD_DNS_NDOTS` is set without enabling the override.
If an operator sets `KUBERNETES_POD_DNS_NDOTS` but forgets `KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED=true`, the value is silently ignored (the `dnsConfig` block is not emitted in `kubernetes.ts` at Line 324). Consider adding a `superRefine` check that raises an issue (or at least a startup log warning) when `KUBERNETES_POD_DNS_NDOTS` has been explicitly provided while the flag is off. This would mirror the existing `COMPUTE_SNAPSHOTS_ENABLED` cross-validation at Lines 261-274. Low priority; purely a UX/footgun prevention nit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/supervisor/src/env.ts` around lines 132 - 133, Add a cross-field validation to the env schema using zod's .superRefine so that when KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED is false but KUBERNETES_POD_DNS_NDOTS was explicitly provided (i.e., differs from its default) an issue is raised (or at minimum a startup warning logged); mirror the pattern used for COMPUTE_SNAPSHOTS_ENABLED cross-validation (the same file's existing check) and reference the KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED and KUBERNETES_POD_DNS_NDOTS symbols when implementing the check.
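The cross-field rule suggested in this nitpick could be sketched roughly as follows. This is a dependency-free illustration, not the repository's code: the review proposes placing the rule inside zod's `.superRefine`, whereas this sketch applies it to the raw environment so that it runs without zod. `checkNdotsOverride`, `RawEnv`, and `EnvIssue` are hypothetical names, and "explicitly provided" is approximated as "present and non-empty".

```typescript
// Hypothetical sketch of the suggested cross-field check (not the actual env.ts code).
type RawEnv = Record<string, string | undefined>;

type EnvIssue = {
  path: string[];
  message: string;
};

// Returns a list of issues; an empty list means the combination is acceptable.
function checkNdotsOverride(raw: RawEnv): EnvIssue[] {
  const issues: EnvIssue[] = [];

  // Assumption: the flag counts as enabled only for the literal string "true".
  const enabled = raw["KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED"] === "true";

  // Assumption: "explicitly provided" means present and non-empty in the raw env.
  const ndots = raw["KUBERNETES_POD_DNS_NDOTS"];
  const provided = ndots !== undefined && ndots !== "";

  if (provided && !enabled) {
    issues.push({
      path: ["KUBERNETES_POD_DNS_NDOTS"],
      message:
        "KUBERNETES_POD_DNS_NDOTS is set but KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED is not true; the value will be ignored",
    });
  }

  return issues;
}
```

Inside a zod schema, the same condition would presumably be reported through the refinement context instead of a returned array; whether to fail startup or only log a warning is left open by the review.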
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2ec86861-d032-462f-a7d7-cfa3b0cd56aa
📒 Files selected for processing (3)
.server-changes/supervisor-pod-dns-ndots.md
apps/supervisor/src/env.ts
apps/supervisor/src/workloadManager/kubernetes.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: typecheck / typecheck
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Add crumbs as you write code using `// @crumbs` comments or `// #region @crumbs` blocks. These are temporary debug instrumentation and must be stripped using `agentcrumbs strip` before merge.
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
apps/supervisor/src/workloadManager/**/*.{js,ts}
📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)
Container orchestration abstraction (Docker or Kubernetes) should be implemented in `src/workloadManager/`
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
**/*.ts{,x}
📄 CodeRabbit inference engine (CLAUDE.md)
Always import from `@trigger.dev/sdk` when writing Trigger.dev tasks. Never use `@trigger.dev/sdk/v3` or deprecated `client.defineJob`.
Files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
apps/supervisor/src/env.ts
📄 CodeRabbit inference engine (apps/supervisor/CLAUDE.md)
Environment configuration should be defined in `src/env.ts`
Files:
apps/supervisor/src/env.ts
🧠 Learnings (5)
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/workloadManager/**/*.{js,ts} : Container orchestration abstraction (Docker or Kubernetes) should be implemented in `src/workloadManager/`
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-27T11:45:41.240Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3114
File: apps/supervisor/src/workloadServer/index.ts:832-840
Timestamp: 2026-03-27T11:45:41.240Z
Learning: In `apps/supervisor/src/workloadManager/compute.ts` and the supervisor restore flow, `TRIGGER_METADATA_URL` does not need to be re-injected on VM restore because it is baked into the instance environment at creation time and the environment is preserved through snapshot/restore. The Kubernetes restore path follows the same pattern. Do not flag the absence of `TRIGGER_METADATA_URL` re-injection on restore as a bug.
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/supervisor/src/workloadManager/kubernetes.ts
apps/supervisor/src/env.ts
📚 Learning: 2026-03-02T12:42:47.652Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: Applies to apps/supervisor/src/env.ts : Environment configuration should be defined in `src/env.ts`
Applied to files:
apps/supervisor/src/env.ts
🔇 Additional comments (4)
apps/supervisor/src/env.ts (2)
125-133: LGTM: sensible defaults and bounds for the ndots override.
- `min(1).max(15)` aligns with glibc's `RES_MAXNDOTS=15`, and excluding `0` is a reasonable guardrail (ndots:0 disables the search list entirely).
- Default `2` preserves cross-namespace service resolution: a name like `svc.ns` (1 dot) still goes through the search list and matches `svc.ns.svc.cluster.local`.
- The override is opt-in via `KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED` (default `false`), so the change is backward compatible.
- Doc comment accurately describes the behavioral tradeoffs and pairs well with `.server-changes/supervisor-pod-dns-ndots.md`.
202-204: Cosmetic reformat: no semantic change.
The resulting error message string is identical to the previous `join(", ")` form.
apps/supervisor/src/workloadManager/kubernetes.ts (1)
324-330: LGTM: correct conditional `dnsConfig` injection.
- Shape matches `k8s.V1PodSpec.dnsConfig` with `options: V1PodDNSConfigOption[]` (the `value` field is a string, and the template literal `${env.KUBERNETES_POD_DNS_NDOTS}` produces one).
- `dnsPolicy` is intentionally left unset, so it stays `ClusterFirst`; per Kubernetes semantics, `dnsConfig.options` merges with the base `resolv.conf` and same-name options are overridden, so the pod-level `ndots` correctly takes precedence over the cluster default.
- Conditional spread keeps the generated spec unchanged when the flag is off: zero behavioral impact for existing deployments.
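The conditional-spread pattern described in this comment might look roughly like the sketch below. It is only an illustration under stated assumptions: the types are local stand-ins for the relevant slice of the Kubernetes client types, and `buildPodSpecSlice`, `DnsSettings`, and the `restartPolicy` value are hypothetical rather than taken from `kubernetes.ts`.

```typescript
// Local stand-ins for the relevant part of the pod spec shape (assumption:
// the real code uses the Kubernetes client's V1PodSpec types instead).
type PodDnsConfigOption = { name: string; value?: string };
type PodDnsConfig = { options?: PodDnsConfigOption[] };
type PodSpecSlice = {
  restartPolicy: string;
  dnsConfig?: PodDnsConfig;
};

// Hypothetical settings object; the PR reads the equivalent values from env.ts.
type DnsSettings = {
  ndotsOverrideEnabled: boolean;
  ndots: number;
};

function buildPodSpecSlice(settings: DnsSettings): PodSpecSlice {
  return {
    restartPolicy: "Never",
    // When the flag is off, an empty object is spread, so no dnsConfig key
    // is added and the generated spec is unchanged.
    ...(settings.ndotsOverrideEnabled
      ? {
          dnsConfig: {
            // The option value must be a string, hence the template literal.
            options: [{ name: "ndots", value: `${settings.ndots}` }],
          },
        }
      : {}),
  };
}
```

Spreading an empty object rather than assigning `dnsConfig: undefined` keeps the key out of the serialized spec entirely, which is what makes the flag-off case identical to the previous output.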
.server-changes/supervisor-pod-dns-ndots.md (1)
1-9: LGTM: docs accurately describe the feature.
Flag name, default value (`2`), configurability via `KUBERNETES_POD_DNS_NDOTS`, and the search-list-expansion rationale all match the implementation in `env.ts` and `kubernetes.ts`. The callout about code paths relying on search-list expansion for longer names is a useful operator warning.
Adds a `KUBERNETES_POD_DNS_NDOTS_OVERRIDE_ENABLED` flag (off by default) that overrides the cluster default and sets `dnsConfig.options.ndots` on runner pods (defaulting to 2, configurable via `KUBERNETES_POD_DNS_NDOTS`).
Kubernetes defaults pods to `ndots: 5`, so any name with fewer than 5 dots, including typical external domains like `api.example.com`, is first walked through every entry in the cluster search list (`<ns>.svc.cluster.local`, `svc.cluster.local`, `cluster.local`) before being tried as-is, turning one resolution into 4+ CoreDNS queries (×2 with A+AAAA).
Using a lower `ndots` value reduces DNS query amplification in the `cluster.local` zone.
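The query amplification described above can be illustrated with a small model of how a stub resolver orders its candidate names. This is a simplified sketch, not resolver code: it assumes glibc-style ordering (names with at least `ndots` dots are tried as-is first, then through the search list; musl is understood to skip the search list in that case), and it assumes the pod runs in a namespace called `default`. `candidateNames` and `lookupsUntilMatch` are hypothetical helper names.

```typescript
// Count the dots in a name, e.g. "api.example.com" has 2.
function countDots(name: string): number {
  return name.split(".").length - 1;
}

// Order in which a glibc-style resolver would try names (simplified model).
function candidateNames(name: string, searchList: string[], ndots: number): string[] {
  // A trailing dot marks the name as fully qualified: no search list expansion.
  if (name.endsWith(".")) return [name];
  const expanded = searchList.map((suffix) => `${name}.${suffix}`);
  return countDots(name) >= ndots ? [name, ...expanded] : [...expanded, name];
}

// Number of names tried up to and including the first one that resolves.
function lookupsUntilMatch(candidates: string[], resolvable: string): number {
  const index = candidates.indexOf(resolvable);
  return index === -1 ? candidates.length : index + 1;
}

// Assumption: the runner pod lives in the "default" namespace.
const search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"];

// External name with the Kubernetes default of ndots: 5.
const external5 = lookupsUntilMatch(
  candidateNames("api.example.com", search, 5),
  "api.example.com"
); // 4 names tried, each typically queried for both A and AAAA

// Same name with ndots: 2.
const external2 = lookupsUntilMatch(
  candidateNames("api.example.com", search, 2),
  "api.example.com"
); // 1 name tried

// Cross-namespace service name (1 dot) still expands with ndots: 2.
const crossNs2 = lookupsUntilMatch(
  candidateNames("svc.ns", search, 2),
  "svc.ns.svc.cluster.local"
); // 2 names tried
```

In this model, lowering `ndots` from 5 to 2 removes the three search-list lookups for `api.example.com`, while a one-dot name such as `svc.ns` still reaches `svc.ns.svc.cluster.local` through the search list.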