Nemotron 3 Ultra Master Prompts (June 2026)

📁 Llm 🤖 Nemotron-3-Ultra 📊 Advanced 📅 Jun 12, 2026

Optimized prompts for NVIDIA Nemotron 3 Ultra — the first open-weight 550B hybrid Mamba-MoE model. 55B active parameters, 1M context window, 89.1 MMLU. Datacenter-scale agentic reasoning.

📋 Prompt

/* NEMOTRON 3 ULTRA MASTER PROMPT
   VERSION: 1.0.0
   CAPABILITIES: 550B Mamba-MoE, 55B Active, 1M Context, 89.1 MMLU
   ARCHITECTURE: Hybrid Mamba–Transformer Mixture-of-Experts */

**Task:** [ANALYSIS | REASONING | GENERATION | AGENTIC]
**Context Size:** [ESTIMATED_TOKENS] tokens (max 1,000,000)
**Reasoning Depth:** [SHALLOW | MODERATE | DEEP | EXHAUSTIVE]

**Instructions:**
  1. [PHASE_1 — Decomposition/Setup]
  2. [PHASE_2 — Core Processing]
  3. [PHASE_3 — Synthesis/Verification]
  4. [PHASE_4 — Output Formatting]

**Output Format:** [MARKDOWN | JSON | TABLE | CODE]
**Quality Requirements:**
  - [REQUIREMENT_1]
  - [REQUIREMENT_2]

Nemotron 3 Ultra architecture notes:
- Mamba backbone excels at ultra-long sequences (use full 1M context!)
- MoE with 55B active parameters — substantial reasoning depth per token
- Hybrid design: Mamba for long-range + Transformer experts for focused reasoning
- NVFP4 variant: ~5× throughput on Blackwell hardware

Strategy: Front-load complex tasks with explicit reasoning chains.
The Mamba architecture processes linearly — structure your prompt to match.
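
The bracketed placeholders in the template can also be filled programmatically. The following is a minimal sketch, not part of the original prompt: `PromptSpec` and `render_prompt` are hypothetical names, and the only rules enforced are the ones the template itself states (the listed option values and the 1,000,000-token ceiling).

```python
from dataclasses import dataclass, field

MAX_CONTEXT_TOKENS = 1_000_000  # ceiling stated in the template

# Option values copied from the template's bracketed placeholders.
TASKS = {"ANALYSIS", "REASONING", "GENERATION", "AGENTIC"}
DEPTHS = {"SHALLOW", "MODERATE", "DEEP", "EXHAUSTIVE"}
FORMATS = {"MARKDOWN", "JSON", "TABLE", "CODE"}


@dataclass
class PromptSpec:
    """Hypothetical container for the values that fill the template."""
    task: str
    estimated_tokens: int
    depth: str
    phases: list
    output_format: str
    requirements: list = field(default_factory=list)


def render_prompt(spec: PromptSpec) -> str:
    """Fill the template header, rejecting values the template does not list."""
    if spec.task not in TASKS:
        raise ValueError(f"unknown task: {spec.task}")
    if spec.depth not in DEPTHS:
        raise ValueError(f"unknown reasoning depth: {spec.depth}")
    if spec.output_format not in FORMATS:
        raise ValueError(f"unknown output format: {spec.output_format}")
    if not 0 < spec.estimated_tokens <= MAX_CONTEXT_TOKENS:
        raise ValueError("estimated_tokens must be between 1 and 1,000,000")

    lines = [
        f"**Task:** {spec.task}",
        f"**Context Size:** {spec.estimated_tokens:,} tokens (max {MAX_CONTEXT_TOKENS:,})",
        f"**Reasoning Depth:** {spec.depth}",
        "",
        "**Instructions:**",
    ]
    lines += [f"  {i}. {phase}" for i, phase in enumerate(spec.phases, start=1)]
    lines += ["", f"**Output Format:** {spec.output_format}", "**Quality Requirements:**"]
    lines += [f"  - {req}" for req in spec.requirements]
    return "\n".join(lines)
```

Validating the option values before rendering catches typos such as `DEEPER` before the prompt is ever sent.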

💡 Tips

  • Nemotron 3 Ultra's 1M context window is the largest among open models — provide complete document sets, not summaries
  • Use explicit multi-step reasoning chains — the 55B active MoE excels at structured decomposition
  • Mamba backbone means the model processes sequentially — front-load the most important context
  • For enterprise document analysis, cross-reference clauses across all documents simultaneously
  • NVFP4 variant on Blackwell hardware delivers ~5× throughput for production deployment

Nemotron 3 Ultra Prompt Guide

NVIDIA Nemotron 3 Ultra (released June 2026) is the first open-weight hybrid Mamba–Mixture-of-Experts model at the 550-billion-parameter scale. Its architecture combines Mamba's linear-time sequence processing with Transformer-based expert modules.

Architecture

Input → [Mamba Backbone] → [MoE Router] → [Expert 1..N] → Output
         ↑ Linear time        ↑ 55B active        ↑ Sparse activation
         1M context OK        out of 550B total     ~10% active params
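
To make the "sparse activation" stage of the diagram concrete, here is a toy sketch of expert routing. It is an illustration only: the page does not say which routing rule Nemotron 3 Ultra uses, so top-k softmax gating (a common choice in MoE models) is assumed, and `route_top_k` and `moe_layer` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_scores, k=1):
    """Run only the selected experts; the remaining experts are never called."""
    return sum(weight * experts[i](x) for i, weight in route_top_k(router_scores, k))


# Ten toy experts, one selected per token: 1 of 10 active, mirroring the
# ~10% active-parameter ratio in the diagram. Expert i just multiplies by i.
experts = [lambda x, scale=scale: scale * x for scale in range(10)]
scores = [0.0] * 10
scores[3] = 5.0  # the router prefers expert 3 for this token
result = moe_layer(2.0, experts, scores, k=1)
```

With `k=1` only expert 3 runs, so `result` is that expert's output alone; the other nine experts contribute no compute for this token.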

Key Specifications

| Metric | Value |
| --- | --- |
| Total Parameters | 550B |
| Active Parameters | 55B (~10%) |
| Context Window | 1,000,000 tokens |
| MMLU Score | 89.1 |
| Architecture | Hybrid Mamba–Transformer MoE |
| License | Open weights (NVFP4 variant on Hugging Face) |
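
The ratios behind these figures are simple arithmetic. The sketch below reproduces the ~10% active fraction and gives a rough estimate of raw weight storage. It assumes about 4 bits per weight for NVFP4 and ignores scale factors, activations, and runtime buffers, so treat the storage numbers as lower bounds, not deployment requirements.

```python
TOTAL_PARAMS = 550e9   # from the specification table
ACTIVE_PARAMS = 55e9   # from the specification table


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used per token."""
    return active / total


def weight_gigabytes(params: float, bits_per_param: float) -> float:
    """Raw weight storage in GB (10^9 bytes), excluding scale factors and buffers."""
    return params * bits_per_param / 8 / 1e9


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
gb_4bit = weight_gigabytes(TOTAL_PARAMS, 4)    # assumed NVFP4 width
gb_16bit = weight_gigabytes(TOTAL_PARAMS, 16)  # 16-bit baseline for comparison

print(f"active fraction: {fraction:.0%}")   # 10%
print(f"4-bit weights:  {gb_4bit:.0f} GB")   # 275 GB
print(f"16-bit weights: {gb_16bit:.0f} GB")  # 1100 GB
```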

Prompting Strategy

Nemotron 3 Ultra's hybrid Mamba-MoE architecture calls for a different prompting approach than pure Transformer models:

  1. Front-load critical context: the Mamba layers read the prompt as a left-to-right scan, so state the task and the key material before the bulk documents
  2. Use explicit reasoning chains: the 55B active MoE excels at structured multi-step decomposition
  3. Leverage the full 1M context: include entire document sets, codebases, or transcripts
  4. Structured output formats: request tables, JSON, or markdown with explicit section headers
  5. Agentic workflows: decompose complex goals into numbered reasoning steps (t = 1..N)
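
Points 1 and 3 above can be combined into a small packing routine: sort documents by importance, add them in that order, and stop adding once the token budget is reached. This is a sketch under stated assumptions. `Doc`, `pack_context`, and the four-characters-per-token estimate are all hypothetical; real counts should come from the model's own tokenizer.

```python
from dataclasses import dataclass

MAX_CONTEXT_TOKENS = 1_000_000  # context window from the specification
CHARS_PER_TOKEN = 4             # rough heuristic, an assumption


@dataclass
class Doc:
    name: str
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def pack_context(docs, question, budget=MAX_CONTEXT_TOKENS):
    """Place documents in priority order and skip any that would exceed the budget.

    Returns the assembled prompt and the names of skipped documents.
    """
    used = estimate_tokens(question)  # reserve room for the question itself
    parts, skipped = [], []
    for doc in sorted(docs, key=lambda d: d.priority):
        cost = estimate_tokens(doc.text)
        if used + cost > budget:
            skipped.append(doc.name)
            continue
        parts.append(f"## {doc.name}\n{doc.text}")
        used += cost
    parts.append(f"## Question\n{question}")
    return "\n\n".join(parts), skipped
```

Returning the skipped names makes truncation visible, so an over-budget document set is not dropped silently.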

Performance Characteristics

  • Strengths: Ultra-long context tasks, multi-document analysis, agentic reasoning, structured decomposition
  • Trade-off: The Mamba layers carry earlier tokens forward in a fixed-size recurrent state instead of attending to every token directly, as pure attention does
  • Deployment: NVFP4 quantization variant achieves ~5× throughput on NVIDIA Blackwell hardware
