Cosmos3-Super Master Prompts (June 2026)

📁 Video-Generation, Multimodal 🤖 Cosmos3-Super 📊 Advanced 📅 Jun 12, 2026

Optimized prompts for NVIDIA Cosmos3-Super — a 64B physical-AI omnimodel that couples action trajectories with video+audio generation. World-model architecture for physics-aware content creation.

📋 Prompt

/* COSMOS3-SUPER MASTER PROMPT
   VERSION: 1.0.0
   CAPABILITIES: Physical-AI Video Gen, Action-Conditioned, Audio Sync
   ARCHITECTURE: 64B Omnimodel (32B Reasoner + 32B Generator) */

**Scene:** [SCENE_TYPE] — [DURATION]s at [FPS]fps
**World Physics:**
  - Gravity: [VALUE] m/s²
  - Atmosphere: [DENSITY], [WIND], [TEMPERATURE]
  - Materials present: [GLASS, METAL, CLOTH, WATER, ORGANIC]
**Camera:**
  - Path: [SHOT_1 → SHOT_2 → SHOT_3] with [TRANSITION_TYPE]
  - Lens: [FOCAL_LENGTH]mm, [APERTURE]
**Action Trajectory (keyframed):**
  t=[START]: [INITIAL_STATE]
  t=[MID]: [INTERMEDIATE_ACTION]
  t=[END]: [FINAL_STATE]
**Audio:**
  - Type: [PHYSICS_BASED | DESIGNED | MUSIC]
  - Sync: [ON_ACTION | CONTINUOUS | REACTIVE]
**Quality:** [PHOTOREAL | STYLIZED], temporal consistency HIGH

Cosmos3-Super differentiators:
- Physical simulation, not just pixel prediction
- Couples action → video → audio in one unified generation
- World-model architecture understands object permanence and physics
- OpenMDW 1.1 license on Hugging Face

💡 Tips

  • Cosmos3-Super is a world model, so specify physics parameters (gravity, materials, collisions) rather than visual descriptions alone
  • Action trajectories use keyframe syntax with precise timestamps for predictable output
  • Describe the camera path as a sequence of shots with durations rather than as free text
  • Audio is generated synchronously with video — specify audio events at the same timestamps as visual actions
  • For best temporal consistency, keep action sequences under 30 seconds
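
The keyframe and timing tips above can be checked mechanically before a prompt is sent. The following is a minimal sketch in Python that only builds and validates text; it does not call the model. The names `Keyframe`, `check_keyframes`, and `render_trajectory` are hypothetical, and the `[audio: ...]` suffix is an assumed notation for placing an audio event at the same timestamp as its action, since the template does not define one.

```python
from dataclasses import dataclass

# Soft limit taken from the tip about temporal consistency.
MAX_SEQUENCE_SECONDS = 30.0


@dataclass(frozen=True)
class Keyframe:
    t: float         # timestamp in seconds
    action: str      # visual action or state at that time
    audio: str = ""  # optional audio event at the same timestamp


def check_keyframes(frames: list[Keyframe]) -> list[str]:
    """Return warnings for a trajectory; an empty list means none were found."""
    warnings = []
    times = [frame.t for frame in frames]
    if times != sorted(times):
        warnings.append("keyframes are not in ascending time order")
    if len(set(times)) != len(times):
        warnings.append("two keyframes share a timestamp")
    if times and max(times) - min(times) >= MAX_SEQUENCE_SECONDS:
        warnings.append("sequence is not under 30 seconds")
    return warnings


def render_trajectory(frames: list[Keyframe]) -> str:
    """Render keyframes as t=TIMESTAMP lines, audio event on the same line."""
    lines = []
    for frame in frames:
        line = f"  t={frame.t:g}: {frame.action}"
        if frame.audio:
            # Assumed notation: the source does not define how audio events are written.
            line += f" [audio: {frame.audio}]"
        lines.append(line)
    return "\n".join(lines)


frames = [
    Keyframe(0, "glass rests on table edge", "quiet room tone"),
    Keyframe(1.5, "glass tips and falls"),
    Keyframe(2.1, "glass shatters on tile floor", "sharp shatter"),
]
print(check_keyframes(frames))
print(render_trajectory(frames))
```

The rendered lines can be pasted under the Action Trajectory heading of the template. The 30-second check is treated as a warning, not an error, because the tip describes a recommendation, not a hard limit.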

Cosmos3-Super Prompt Guide

NVIDIA Cosmos3-Super (released June 2026) is a 64-billion-parameter physical-AI omnimodel: a world-model architecture that combines a 32B reasoning module with a 32B generation module. Unlike traditional video generators, which predict pixels, Cosmos3-Super simulates physics and couples action trajectories with synchronized video and audio output.

Architecture

Action Trajectory → [32B Reasoner] → Physical State → [32B Generator] → Video + Audio
                                    ↑
                              World Knowledge

Prompting Strategy

Cosmos3-Super requires a fundamentally different prompting approach than diffusion-based video models (Sora, Runway, Kling):

  1. Define physics first — Gravity, material properties, atmospheric conditions
  2. Keyframe actions — Use t=TIMESTAMP syntax for action trajectories
  3. Camera as path — Describe camera movement as timed shot sequences
  4. Audio sync — Specify audio events at the same timestamps as visual actions
  5. World knowledge — The 32B reasoner understands real-world physics; describe outcomes, not pixel-level details
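
One way to apply the five steps above consistently is to fill the master prompt template from structured data. The sketch below is an illustration only: `Shot`, `ScenePrompt`, and `render` are hypothetical names, nothing here talks to the model, and writing each shot's duration in parentheses is an assumed notation for a timed shot sequence.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    name: str       # e.g. "wide" or "close-up"
    seconds: float  # how long the shot is held


@dataclass
class ScenePrompt:
    scene_type: str
    fps: int
    gravity: float                      # m/s², stated before anything visual
    atmosphere: str                     # density, wind, temperature
    materials: list[str]
    shots: list[Shot]
    transition: str
    focal_length_mm: int
    aperture: str
    keyframes: list[tuple[float, str]]  # (timestamp in seconds, action)
    audio_type: str = "PHYSICS_BASED"
    audio_sync: str = "ON_ACTION"
    quality: str = "PHOTOREAL"

    def render(self) -> str:
        """Fill the master prompt template in its original field order."""
        duration = sum(shot.seconds for shot in self.shots)
        path = " → ".join(f"{shot.name} ({shot.seconds:g}s)" for shot in self.shots)
        lines = [
            # \u2014 is the dash used in the template's Scene line.
            f"**Scene:** {self.scene_type} \u2014 {duration:g}s at {self.fps}fps",
            "**World Physics:**",
            f"  - Gravity: {self.gravity:g} m/s²",
            f"  - Atmosphere: {self.atmosphere}",
            f"  - Materials present: {', '.join(self.materials)}",
            "**Camera:**",
            f"  - Path: {path} with {self.transition}",
            f"  - Lens: {self.focal_length_mm}mm, {self.aperture}",
            "**Action Trajectory (keyframed):**",
        ]
        # Keyframes are sorted so the trajectory always reads in time order.
        lines += [f"  t={t:g}: {action}" for t, action in sorted(self.keyframes)]
        lines += [
            "**Audio:**",
            f"  - Type: {self.audio_type}",
            f"  - Sync: {self.audio_sync}",
            f"**Quality:** {self.quality}, temporal consistency HIGH",
        ]
        return "\n".join(lines)


prompt = ScenePrompt(
    scene_type="kitchen drop test",
    fps=24,
    gravity=9.81,
    atmosphere="sea-level air, no wind, 20 C",
    materials=["GLASS", "WATER"],
    shots=[Shot("wide", 2), Shot("close-up", 3)],
    transition="hard cut",
    focal_length_mm=35,
    aperture="f/2.8",
    keyframes=[(2.1, "glass shatters on tile"), (0, "glass rests on table edge")],
)
print(prompt.render())
```

Deriving the scene duration from the shot list keeps the camera path and the Scene line from drifting apart when shots are added or retimed.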

Comparison: Cosmos3 vs Traditional Video Gen

| Aspect | Cosmos3-Super | Traditional (Sora/Runway) |
| --- | --- | --- |
| Approach | Physics simulation | Pixel prediction |
| Actions | Keyframe trajectories | Descriptive text |
| Audio | Synchronized generation | Separate generation |
| Consistency | Temporal by design | Requires guidance |
| License | OpenMDW 1.1 (Hugging Face) | Proprietary |
