Nemotron 3 Ultra Master Prompts (June 2026)
Optimized prompts for NVIDIA Nemotron 3 Ultra — the first open-weight 550B hybrid Mamba-MoE model. 55B active parameters, 1M context window, 89.1 MMLU. Datacenter-scale agentic reasoning.
📋 Prompt
/*
NEMOTRON 3 ULTRA MASTER PROMPT
VERSION: 1.0.0
CAPABILITIES: 550B Mamba-MoE, 55B Active, 1M Context, 89.1 MMLU
ARCHITECTURE: Hybrid Mamba–Transformer Mixture-of-Experts
*/

**Task:** [ANALYSIS | REASONING | GENERATION | AGENTIC]
**Context Size:** [ESTIMATED_TOKENS] tokens (max 1,000,000)
**Reasoning Depth:** [SHALLOW | MODERATE | DEEP | EXHAUSTIVE]

**Instructions:**
1. [PHASE_1: Decomposition/Setup]
2. [PHASE_2: Core Processing]
3. [PHASE_3: Synthesis/Verification]
4. [PHASE_4: Output Formatting]

**Output Format:** [MARKDOWN | JSON | TABLE | CODE]

**Quality Requirements:**
- [REQUIREMENT_1]
- [REQUIREMENT_2]

Nemotron 3 Ultra architecture notes:
- Mamba backbone excels at ultra-long sequences (use the full 1M context!)
- MoE with 55B active parameters: substantial reasoning depth per token
- Hybrid design: Mamba for long-range + Transformer experts for focused reasoning
- NVFP4 variant: ~5× throughput on Blackwell hardware

Strategy: Front-load complex tasks with explicit reasoning chains. The Mamba architecture processes linearly; structure your prompt to match.
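If you reuse the template often, the bracketed fields can be filled and checked in code before anything is sent to the model. A minimal Python sketch, assuming the prompt is sent as plain text; the `MasterPrompt` class and its field names are hypothetical, not part of any NVIDIA SDK, and it only enforces the options and the 1,000,000-token ceiling listed in the template above.

```python
from dataclasses import dataclass, field

# Allowed values copied from the bracketed options in the template.
TASKS = {"ANALYSIS", "REASONING", "GENERATION", "AGENTIC"}
DEPTHS = {"SHALLOW", "MODERATE", "DEEP", "EXHAUSTIVE"}
FORMATS = {"MARKDOWN", "JSON", "TABLE", "CODE"}
MAX_CONTEXT_TOKENS = 1_000_000  # context limit stated in the template


@dataclass
class MasterPrompt:
    """Hypothetical container for the template's placeholder fields."""
    task: str
    estimated_tokens: int
    depth: str
    phases: list[str]  # one entry per PHASE_n placeholder
    output_format: str
    requirements: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Validate the fields, then fill in the template as plain text."""
        if self.task not in TASKS:
            raise ValueError(f"unknown task: {self.task}")
        if self.depth not in DEPTHS:
            raise ValueError(f"unknown reasoning depth: {self.depth}")
        if self.output_format not in FORMATS:
            raise ValueError(f"unknown output format: {self.output_format}")
        if not 0 < self.estimated_tokens <= MAX_CONTEXT_TOKENS:
            raise ValueError("estimated_tokens must be between 1 and 1,000,000")
        lines = [
            f"**Task:** {self.task}",
            f"**Context Size:** {self.estimated_tokens:,} tokens (max {MAX_CONTEXT_TOKENS:,})",
            f"**Reasoning Depth:** {self.depth}",
            "**Instructions:**",
        ]
        lines += [f"{i}. {phase}" for i, phase in enumerate(self.phases, start=1)]
        lines.append(f"**Output Format:** {self.output_format}")
        lines.append("**Quality Requirements:**")
        lines += [f"- {req}" for req in self.requirements]
        return "\n".join(lines)


if __name__ == "__main__":
    prompt = MasterPrompt(
        task="ANALYSIS",
        estimated_tokens=250_000,
        depth="DEEP",
        phases=[
            "Split each contract into numbered clauses",
            "Compare clauses across all contracts",
            "Verify every conflict against the source text",
            "Emit the findings as a table",
        ],
        output_format="TABLE",
        requirements=["Cite the document and clause for every finding"],
    )
    print(prompt.render())
```

Validating the options locally catches typos such as `DEPP` before they reach the model; the token count is still your own estimate, not a tokenizer measurement.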
💡 Tips
- Nemotron 3 Ultra's 1M context window is among the largest available in open-weight models, so provide complete document sets, not summaries
- Use explicit multi-step reasoning chains — the 55B active MoE excels at structured decomposition
- The Mamba backbone reads the prompt in order and carries forward a fixed-size state, so front-load the most important context
- For enterprise document analysis, load all documents into one prompt and ask the model to cross-reference clauses across them
- NVFP4 variant on Blackwell hardware delivers ~5× throughput for production deployment
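Several of the tips above (complete documents rather than summaries, the task up front, all documents in one prompt for cross-referencing) come together when the input is assembled. A minimal sketch, assuming roughly 4 characters per token (a rough heuristic for English text; use the model's actual tokenizer for real counts) and a hypothetical `=== DOCUMENT: name ===` delimiter; `assemble_prompt` is illustrative, not an NVIDIA API.

```python
# Assumption: ~4 characters per token for English text. Real counts depend
# on the model's tokenizer, so treat this only as a coarse budget check.
CHARS_PER_TOKEN = 4
MAX_CONTEXT_TOKENS = 1_000_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate using ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def assemble_prompt(task: str, documents: dict[str, str],
                    reserve_for_output: int = 8_000) -> str:
    """Put the task first, then every document in full under a labeled header.

    Raises ValueError if the estimated prompt plus the reserved output budget
    would exceed the 1M-token context window.
    """
    parts = [f"TASK:\n{task}\n"]  # front-load the instructions
    for name, body in documents.items():  # keep the caller's document order
        parts.append(f"=== DOCUMENT: {name} ===\n{body}\n")
    prompt = "\n".join(parts)

    needed = estimate_tokens(prompt) + reserve_for_output
    if needed > MAX_CONTEXT_TOKENS:
        raise ValueError(
            f"~{needed:,} tokens exceeds the {MAX_CONTEXT_TOKENS:,}-token window"
        )
    return prompt
```

Passing documents in a dict keeps them in insertion order, so the caller decides which documents sit closest to the task statement.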
Nemotron 3 Ultra Prompt Guide
NVIDIA Nemotron 3 Ultra (released June 2026) is the first open-weight 550-billion-parameter hybrid Mamba–Mixture-of-Experts model. Its architecture combines Mamba’s linear-time sequence processing with Transformer-based expert modules.
Architecture
Input → [Mamba Backbone] → [MoE Router] → [Expert 1..N] → Output
- Mamba backbone: linear time, so a 1M-token context is practical
- MoE router: 55B active out of 550B total parameters
- Experts 1..N: sparse activation, ~10% of parameters active per token
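"Linear time" here means each Mamba-style layer updates a fixed-size state once per token, so work grows with sequence length L rather than with L² as in full self-attention. The toy Python sketch below illustrates an input-dependent (selective) recurrence on scalar inputs; the decay values, the sigmoid gate, and the averaged readout are assumptions chosen for readability, and it is not Nemotron's or Mamba's actual kernel.

```python
import math


def selective_scan(xs: list[float], state_size: int = 4) -> list[float]:
    """Toy diagonal state-space recurrence in the spirit of a selective scan.

    Each step reads only the previous state and the current input, so the
    loop runs once per token (linear in len(xs)) and the state h never grows.
    """
    h = [0.0] * state_size
    # Hypothetical per-channel decay bases in (0, 1); real models learn these.
    base_decay = [0.5 + 0.1 * i for i in range(state_size)]
    ys = []
    for x in xs:
        # Input-dependent step size (toy sigmoid gate), so each token controls
        # how much of the old state is kept.
        delta = 1.0 / (1.0 + math.exp(-x))
        for i in range(state_size):
            # base ** delta == exp(delta * ln(base)): a discretized decay.
            a = base_decay[i] ** delta
            h[i] = a * h[i] + (1.0 - a) * x  # blend old state with new input
        ys.append(sum(h) / state_size)  # scalar readout per token
    return ys


if __name__ == "__main__":
    print(selective_scan([1.0, 2.0, 3.0]))
```

Doubling the input length doubles the loop iterations, while the state `h` stays at `state_size` floats regardless of how many tokens have been read.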
Key Specifications
| Metric | Value |
|---|---|
| Total Parameters | 550B |
| Active Parameters | 55B (~10%) |
| Context Window | 1,000,000 tokens |
| MMLU Score | 89.1 |
| Architecture | Hybrid Mamba–Transformer MoE |
| License | Open weights (NVFP4 variant on Hugging Face) |
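Sparse MoE models typically reach a ratio like 55B active out of 550B total through top-k routing: a router scores every expert for each token, and only the highest-scoring few run. A toy sketch, assuming scalar tokens, a linear router, and k = 2; Nemotron 3 Ultra's real expert count, k, and router design are not specified here, so those choices are hypothetical. The last lines simply recompute the ~10% figure from the table.

```python
import math
from typing import Callable


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: float, router_weights: list[float],
              experts: list[Callable[[float], float]], k: int = 2) -> float:
    """Route one (scalar) token to its top-k experts and mix their outputs.

    Only the k selected experts are called; the rest stay idle, which is why
    a sparse MoE's active parameter count is a fraction of its total.
    """
    scores = [w * x for w in router_weights]  # toy linear router
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # renormalize over chosen experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))


# Share of parameters active per token, using the figures in the table.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.1, i.e. ~10%
```

Because the gate weights are renormalized over the chosen experts only, the output is a weighted average of exactly k expert outputs.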
Prompting Strategy
Nemotron 3 Ultra’s unique Mamba-MoE architecture requires different prompting than pure Transformer models:
- Front-load critical context: Mamba layers read the prompt in order and keep only a fixed-size state, so stating the task and key context early tells the model what to retain
- Use explicit reasoning chains: the 55B active MoE excels at structured multi-step decomposition
- Leverage the full 1M context: include entire document sets, codebases, or transcripts
- Structured output formats: request tables, JSON, or markdown with explicit section headers
- Agentic workflows: decompose complex goals into explicit reasoning steps t = 1, …, N
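The agentic-workflow pattern can be driven from code: run each reasoning step t = 1, …, N as its own model call and pass earlier results forward. A minimal sketch, assuming a synchronous `call_model(prompt) -> str` function (hypothetical; substitute your own inference client) and a hand-written step list; it does not use any NVIDIA API.

```python
from typing import Callable


def run_steps(goal: str, steps: list[str],
              call_model: Callable[[str], str]) -> list[str]:
    """Run reasoning steps t = 1..N in order, feeding earlier results forward.

    `call_model` is any function that sends a prompt to the model and returns
    its text reply (hypothetical placeholder for a real client).
    """
    results: list[str] = []
    for t, step in enumerate(steps, start=1):
        lines = [f"Goal: {goal}"]
        # Earlier step results give the model the context built so far.
        lines += [f"Step {i} result: {r}" for i, r in enumerate(results, start=1)]
        lines.append(f"Step {t} of {len(steps)}: {step}")
        lines.append("Answer only this step.")
        results.append(call_model("\n".join(lines)))
    return results


if __name__ == "__main__":
    # Stub model that echoes the last instruction line, for a dry run.
    def echo(prompt: str) -> str:
        return prompt.splitlines()[-2]

    for r in run_steps("Audit the vendor contracts",
                       ["List all termination clauses",
                        "Find conflicts between them",
                        "Summarize the conflicts as JSON"],
                       echo):
        print(r)
```

Keeping one call per step makes each intermediate result inspectable, at the cost of resending the accumulated history with every call.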
Performance Characteristics
- Strengths: Ultra-long context tasks, multi-document analysis, agentic reasoning, structured decomposition
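- Trade-off: Mamba layers summarize the sequence in a fixed-size recurrent state instead of comparing every pair of tokens as attention does; this is what makes them linear-time, but distant details must survive in that state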
- Deployment: NVFP4 quantization variant achieves ~5× throughput on NVIDIA Blackwell hardware