[Workers AI] Add NVIDIA Nemotron 3 Super model, changelog, release notes, and pricing (#28922)

mchenco · web-flow · commit 1baf1e1be361 · 2026-03-11T11:53:15.000-04:00
diff --git a/src/content/changelog/workers-ai/2026-03-11-nemotron-3-super-workers-ai.mdx b/src/content/changelog/workers-ai/2026-03-11-nemotron-3-super-workers-ai.mdx
@@ -0,0 +1,22 @@
+---
+title: "NVIDIA Nemotron 3 Super now available on Workers AI"
+description: A hybrid MoE model with 120B total parameters and 12B active, optimized for multi-agent and agentic AI workloads.
+products:
+  - workers-ai
+date: 2026-03-11
+---
+
+We're excited to partner with NVIDIA to bring [`@cf/nvidia/nemotron-3-120b-a12b`](/workers-ai/models/nemotron-3-120b-a12b/) to Workers AI. NVIDIA Nemotron 3 Super is a Mixture-of-Experts (MoE) model with a hybrid Mamba-transformer architecture, 120B total parameters, and 12B active parameters per forward pass.
+
+The model is optimized for running many collaborating agents per application. It delivers high accuracy for reasoning, tool calling, and instruction following across complex multi-step tasks.
+
+**Key capabilities:**
+
+- **Hybrid Mamba-transformer architecture** delivers over 50% higher token generation throughput compared to leading open models, reducing latency for real-world applications
+- **Tool calling** support for building AI agents that invoke tools across multiple conversation turns
+- **Multi-Token Prediction (MTP)** accelerates long-form text generation by predicting several future tokens in a single forward pass
+- **32,000-token context window** for retaining conversation history and plan states across multi-step agent workflows
+
+Use Nemotron 3 Super through the [Workers AI binding](/workers-ai/configuration/bindings/) (`env.AI.run()`), the REST API, or the [OpenAI-compatible endpoint](/workers-ai/configuration/open-ai-compatibility/).
+
+For more information, refer to the [Nemotron 3 Super model page](/workers-ai/models/nemotron-3-120b-a12b/).
diff --git a/src/content/docs/workers-ai/platform/pricing.mdx b/src/content/docs/workers-ai/platform/pricing.mdx
@@ -60,6 +60,7 @@ The Price in Tokens column is equivalent to the Price in Neurons column - the di
 | @cf/aisingapore/gemma-sea-lion-v4-27b-it     | $0.351 per M input tokens <br/> $0.555 per M output tokens | 31876 neurons per M input tokens <br/> 50488 neurons per M output tokens  |
 | @cf/ibm-granite/granite-4.0-h-micro          | $0.017 per M input tokens <br/> $0.112 per M output tokens | 1542 neurons per M input tokens <br/> 10158 neurons per M output tokens   |
 | @cf/zai-org/glm-4.7-flash                    | $0.060 per M input tokens <br/> $0.400 per M output tokens | 5500 neurons per M input tokens <br/> 36400 neurons per M output tokens   |
+| @cf/nvidia/nemotron-3-120b-a12b              | $0.500 per M input tokens <br/> $1.500 per M output tokens | 45455 neurons per M input tokens <br/> 136364 neurons per M output tokens |
 
 ## Embeddings model pricing
 
diff --git a/src/content/release-notes/workers-ai.yaml b/src/content/release-notes/workers-ai.yaml
@@ -3,6 +3,10 @@ link: "/workers-ai/changelog/"
 productName: Workers AI
 productLink: "/workers-ai/"
 entries:
+  - publish_date: "2026-03-11"
+    title: NVIDIA Nemotron 3 Super now available on Workers AI
+    description: |-
+      - [`@cf/nvidia/nemotron-3-120b-a12b`](/workers-ai/models/nemotron-3-120b-a12b/) is now available on Workers AI. A hybrid MoE model with 120B total parameters and 12B active, optimized for multi-agent and agentic AI workloads. Read the [changelog](/changelog/post/2026-03-11-nemotron-3-super-workers-ai/) to get started.
   - publish_date: "2026-03-06"
     title: Deepgram Nova-3 now supports 10 languages with regional variants
     description: |-
diff --git a/src/content/workers-ai-models/nemotron-3-120b-a12b.json b/src/content/workers-ai-models/nemotron-3-120b-a12b.json