Commit ca39956

feat(webapp): add per-worker Node.js heap metrics (#3437)
## Summary

Adds direct V8 heap and process-memory gauges to the webapp's OpenTelemetry meter. The webapp already exports per-cluster-worker Node.js runtime metrics (event-loop lag / utilization, active handles, active requests, libuv threadpool size) via a custom meter under the `trigger.dev` scope. Heap and memory were missing; this PR adds them alongside, in the same observable-batch pattern.

## New gauges

| Metric | Source | Unit |
| --- | --- | --- |
| `nodejs.memory.heap.used` | `process.memoryUsage().heapUsed` | bytes |
| `nodejs.memory.heap.total` | `process.memoryUsage().heapTotal` | bytes |
| `nodejs.memory.heap.limit` | `v8.getHeapStatistics().heap_size_limit` | bytes |
| `nodejs.memory.external` | `process.memoryUsage().external` | bytes |
| `nodejs.memory.array_buffers` | `process.memoryUsage().arrayBuffers` | bytes |
| `nodejs.memory.rss` | `process.memoryUsage().rss` | bytes |

Gated by the existing `INTERNAL_OTEL_NODEJS_METRICS_ENABLED` flag, same as the adjacent event-loop / handle gauges. Zero overhead when disabled.

## Why

`@opentelemetry/host-metrics` publishes `process.memory.usage`, which is RSS only. RSS is the sum of the V8 heap, external memory (Buffers, etc.), native code, and thread stacks. Without a direct heap metric it is not possible to size the V8 heap cap (`--max-old-space-size`) from metrics alone, because RSS overstates the heap by the external + native footprint. A worker can have a 4 GB RSS with a 2.5 GB heap and 1.5 GB of buffers; the heap constrains `--max-old-space-size`, the buffers do not.

`nodejs.memory.heap.limit` also surfaces the configured `--max-old-space-size` (read from `v8.getHeapStatistics().heap_size_limit`), so operators can see the current limit in the same dashboard as actual usage rather than cross-referencing container environment variables.

## Risk

Minimal. Observable gauges are sampled at the configured metric-export interval. `v8.getHeapStatistics()` and `process.memoryUsage()` are each microsecond-level calls, and the six gauges are added to the same batch callback that already reads ~20 other Node.js runtime values per sample. Same registration pattern as the existing event-loop metrics in the file.

## Test plan

- [ ] Deploy and confirm the six new gauges appear at the configured exporter
- [ ] In cluster mode, confirm per-worker granularity (one series per cluster worker, tagged by `process.executable.name` / `service.instance.id`)
- [ ] Confirm `nodejs.memory.heap.limit` reports the configured `--max-old-space-size` value in bytes
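The RSS-vs-heap distinction above can be seen directly with Node builtins. A minimal hypothetical demo (not part of this PR): allocating Buffer memory grows RSS, `external`, and `arrayBuffers`, but barely touches the V8 heap that `--max-old-space-size` caps.

```javascript
// Hypothetical demo, not PR code: Buffer allocations live outside the V8 heap,
// so they show up in `external`/`arrayBuffers` (and RSS) but not `heapUsed`.
const before = process.memoryUsage();

const buf = Buffer.alloc(64 * 1024 * 1024); // 64 MiB of non-heap memory
buf.fill(1); // touch the pages so they become resident

const after = process.memoryUsage();
console.log({
  heapGrowthBytes: after.heapUsed - before.heapUsed, // small
  externalGrowthBytes: after.external - before.external, // ~64 MiB
  arrayBufferGrowthBytes: after.arrayBuffers - before.arrayBuffers, // ~64 MiB
});
```

This is exactly why sizing the heap cap from RSS alone over-provisions: the 64 MiB shows up in RSS but never counts against `--max-old-space-size`.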
Parent: 41434b5

2 files changed: 69 additions & 0 deletions

First changed file (6 additions & 0 deletions):
```diff
@@ -0,0 +1,6 @@
+---
+area: webapp
+type: improvement
+---
+
+Add per-worker Node.js heap metrics to the OTel meter — `nodejs.memory.heap.used`, `nodejs.memory.heap.total`, `nodejs.memory.heap.limit`, `nodejs.memory.external`, `nodejs.memory.array_buffers`, `nodejs.memory.rss`. Host-metrics only publishes RSS, which overstates V8 heap by the external + native footprint; these give direct heap visibility per cluster worker so `NODE_MAX_OLD_SPACE_SIZE` can be sized against observed heap peaks rather than RSS.
```
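The changeset's sizing guidance can be made concrete. A hypothetical helper (the function name and headroom default are illustrative, not from the repo) that turns an observed per-worker heap peak into a `NODE_MAX_OLD_SPACE_SIZE` value in MiB:

```javascript
// Illustrative only: derive a MiB heap cap from an observed heapUsed peak,
// using integer percent math to avoid floating-point surprises.
function suggestMaxOldSpaceMb(observedHeapPeakBytes, headroomPercent = 30) {
  const withHeadroom = (observedHeapPeakBytes * (100 + headroomPercent)) / 100;
  return Math.ceil(withHeadroom / (1024 * 1024));
}

// A 2.5 GiB observed heap peak with 30% headroom suggests a 3328 MiB cap,
// even if the worker's RSS is 4 GiB (buffers don't count against the cap).
console.log(suggestMaxOldSpaceMb(2.5 * 1024 * 1024 * 1024)); // → 3328
```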

apps/webapp/app/v3/tracer.server.ts

Lines changed: 63 additions & 0 deletions
```diff
@@ -38,6 +38,7 @@ import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
 import { PrismaInstrumentation } from "@prisma/instrumentation";
 import { HostMetrics } from "@opentelemetry/host-metrics";
 import { AwsInstrumentation as AwsSdkInstrumentation } from "@opentelemetry/instrumentation-aws-sdk";
+import v8 from "node:v8";
 import { awsEcsDetector, awsEc2Detector } from "@opentelemetry/resource-detector-aws";
 import {
   detectResources,
```
```diff
@@ -630,6 +631,39 @@ function configureNodejsMetrics({ meter }: { meter: Meter }) {
     unit: "1", // OpenTelemetry convention for ratios
   });
 
+  // V8 heap + process memory. `NODE_MAX_OLD_SPACE_SIZE` caps V8 old space
+  // (reflected in `heap.limit`), but doesn't cap external/arrayBuffers/native
+  // memory — which is why RSS can exceed the heap total. Tracking all of these
+  // per-worker lets us size `NODE_MAX_OLD_SPACE_SIZE` against observed heap
+  // peaks rather than RSS (which overstates heap by the external + native
+  // footprint). `host-metrics` already publishes `process.memory.usage`
+  // (RSS), but we duplicate it under `nodejs.memory.rss` so all the memory
+  // numbers land in the same scope and are queryable together.
+  const heapUsedGauge = meter.createObservableGauge("nodejs.memory.heap.used", {
+    description: "V8 heap actively in use after the last GC",
+    unit: "By",
+  });
+  const heapTotalGauge = meter.createObservableGauge("nodejs.memory.heap.total", {
+    description: "V8 heap reserved (young + old generations)",
+    unit: "By",
+  });
+  const heapLimitGauge = meter.createObservableGauge("nodejs.memory.heap.limit", {
+    description: "V8 heap size limit (configured via --max-old-space-size)",
+    unit: "By",
+  });
+  const externalMemoryGauge = meter.createObservableGauge("nodejs.memory.external", {
+    description: "Memory used by C++ objects bound to JS (Buffer, etc.)",
+    unit: "By",
+  });
+  const arrayBuffersGauge = meter.createObservableGauge("nodejs.memory.array_buffers", {
+    description: "Memory allocated for ArrayBuffers and SharedArrayBuffers",
+    unit: "By",
+  });
+  const rssGauge = meter.createObservableGauge("nodejs.memory.rss", {
+    description: "Resident set size — total physical memory held by the process",
+    unit: "By",
+  });
+
   // Get UV threadpool size (defaults to 4 if not set)
   const uvThreadpoolSize = parseInt(process.env.UV_THREADPOOL_SIZE || "4", 10);
```

```diff
@@ -683,10 +717,16 @@ function configureNodejsMetrics({ meter }: { meter: Meter }) {
       currentEventLoopUtilization,
       lastEventLoopUtilization
     );
+    // Rotate the baseline so the next collection reports per-interval
+    // utilization rather than the cumulative average from process start.
+    lastEventLoopUtilization = currentEventLoopUtilization;
 
     // diff.utilization is between 0 and 1 (fraction of time "active")
     const utilization = Number.isFinite(diff.utilization) ? diff.utilization : 0;
 
+    const mem = process.memoryUsage();
+    const heapStats = v8.getHeapStatistics();
+
     return {
       threadpoolSize: uvThreadpoolSize,
       handlesByType,
```
```diff
@@ -702,6 +742,14 @@ function configureNodejsMetrics({ meter }: { meter: Meter }) {
         p99: eventLoopLagP99?.values?.[0]?.value ?? 0,
         utilization,
       },
+      memory: {
+        heapUsed: mem.heapUsed,
+        heapTotal: mem.heapTotal,
+        heapLimit: heapStats.heap_size_limit,
+        external: mem.external,
+        arrayBuffers: mem.arrayBuffers,
+        rss: mem.rss,
+      },
     };
   }
```

```diff
@@ -714,6 +762,7 @@ function configureNodejsMetrics({ meter }: { meter: Meter }) {
       requestsByType,
       requestsTotal,
       eventLoop,
+      memory,
     } = await readNodeMetrics();
 
     // Observe UV threadpool size
```
```diff
@@ -739,6 +788,14 @@ function configureNodejsMetrics({ meter }: { meter: Meter }) {
       res.observe(eventLoopLagP90Gauge, eventLoop.p90);
       res.observe(eventLoopLagP99Gauge, eventLoop.p99);
       res.observe(eluGauge, eventLoop.utilization);
+
+      // Observe memory metrics (bytes)
+      res.observe(heapUsedGauge, memory.heapUsed);
+      res.observe(heapTotalGauge, memory.heapTotal);
+      res.observe(heapLimitGauge, memory.heapLimit);
+      res.observe(externalMemoryGauge, memory.external);
+      res.observe(arrayBuffersGauge, memory.arrayBuffers);
+      res.observe(rssGauge, memory.rss);
     },
     [
       uvThreadpoolSizeGauge,
```
```diff
@@ -753,6 +810,12 @@ function configureNodejsMetrics({ meter }: { meter: Meter }) {
       eventLoopLagP90Gauge,
       eventLoopLagP99Gauge,
       eluGauge,
+      heapUsedGauge,
+      heapTotalGauge,
+      heapLimitGauge,
+      externalMemoryGauge,
+      arrayBuffersGauge,
+      rssGauge,
     ]
   );
 }
```
