Cosmos3-Super Master Prompts (June 2026)

📁 Video-Generation Multimodal 🤖 Cosmos3-Super 📊 Advanced 📅 Jun 12, 2026

Optimized prompts for NVIDIA Cosmos3-Super, a 64B physical-AI omnimodel that couples action trajectories with video and audio generation. Its world-model architecture targets physics-aware content creation.

📋 Prompt

/* COSMOS3-SUPER MASTER PROMPT
   VERSION: 1.0.0
   CAPABILITIES: Physical-AI Video Gen, Action-Conditioned, Audio Sync
   ARCHITECTURE: 64B Omnimodel (32B Reasoner + 32B Generator) */

**Scene:** [SCENE_TYPE] — [DURATION]s at [FPS]fps
**World Physics:**
  - Gravity: [VALUE] m/s²
  - Atmosphere: [DENSITY], [WIND], [TEMPERATURE]
  - Materials present: [GLASS, METAL, CLOTH, WATER, ORGANIC]
**Camera:**
  - Path: [SHOT_1 → SHOT_2 → SHOT_3] with [TRANSITION_TYPE]
  - Lens: [FOCAL_LENGTH]mm, [APERTURE]
**Action Trajectory (keyframed):**
  t=[START]: [INITIAL_STATE]
  t=[MID]: [INTERMEDIATE_ACTION]
  t=[END]: [FINAL_STATE]
**Audio:**
  - Type: [PHYSICS_BASED | DESIGNED | MUSIC]
  - Sync: [ON_ACTION | CONTINUOUS | REACTIVE]
**Quality:** [PHOTOREAL | STYLIZED], temporal consistency HIGH

Cosmos3-Super differentiators:
- Physical simulation, not just pixel prediction
- Couples action → video → audio in one unified generation
- World-model architecture understands object permanence and physics
- OpenMDW 1.1 license on Hugging Face
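
The bracketed slots in the template are plain text, so they can be filled programmatically before the prompt is sent. The following is a minimal sketch, not an official tool: `fill_placeholders` and the sample values are hypothetical, and it only handles single-name slots such as `[VALUE]` or `[FOCAL_LENGTH]`. Choice slots such as `[PHYSICS_BASED | DESIGNED | MUSIC]` are left untouched.

```python
import re

# Matches single-name slots like [VALUE] or [FOCAL_LENGTH].
# Slots containing spaces, commas, pipes, or arrows are not matched.
PLACEHOLDER = re.compile(r"\[([A-Z_0-9]+)\]")


def fill_placeholders(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace [NAME] slots with values; return the text and any unfilled slot names."""
    missing: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in values:
            return str(values[name])
        missing.append(name)
        return match.group(0)  # leave unknown slots visible in the output

    return PLACEHOLDER.sub(substitute, template), missing


# Two lines copied from the template above; the values are illustrative only.
template = (
    "**World Physics:**\n"
    "  - Gravity: [VALUE] m/s²\n"
    "**Camera:**\n"
    "  - Lens: [FOCAL_LENGTH]mm, [APERTURE]\n"
)
filled, missing = fill_placeholders(template, {"VALUE": "9.81", "FOCAL_LENGTH": "35"})
print(filled)
print("unfilled slots:", missing)
```

Returning the list of unfilled slots makes it easy to refuse to submit a prompt that still contains a literal `[APERTURE]`.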

💡 Tips

- Cosmos3-Super is a world model: specify physics parameters (gravity, materials, collisions), not just visual descriptions.
- Action trajectories use keyframe syntax with precise timestamps for predictable output.
- Describe the camera path as a sequence of shots with durations, not as free text.
- Audio is generated synchronously with video, so specify audio events at the same timestamps as the visual actions.
- For best temporal consistency, keep action sequences under 30 seconds.
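
The timestamp-related tips can be checked mechanically before a prompt is submitted. This sketch assumes timestamps are given in seconds (the template does not state a unit) and uses hypothetical names (`Keyframe`, `render_trajectory`, `check_trajectory`). It renders keyframes in the `t=...:` form used by the template and flags audio events that do not line up with any keyframe.

```python
from dataclasses import dataclass

MAX_SEQUENCE_SECONDS = 30.0  # taken from the tip above; treated as a soft limit


@dataclass(frozen=True)
class Keyframe:
    t: float    # assumed to be seconds from the start of the clip
    state: str  # description of the action or state at time t


def render_trajectory(keyframes: list[Keyframe]) -> str:
    """Format keyframes as the 't=<time>: <state>' lines used in the template."""
    return "\n".join(f"  t={k.t:g}: {k.state}" for k in keyframes)


def check_trajectory(keyframes: list[Keyframe], audio_times: list[float]) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems: list[str] = []
    times = [k.t for k in keyframes]
    if times != sorted(times):
        problems.append("keyframes are not in chronological order")
    if times and max(times) >= MAX_SEQUENCE_SECONDS:
        problems.append(
            f"sequence reaches t={max(times):g}s; "
            f"the tip suggests staying under {MAX_SEQUENCE_SECONDS:g}s"
        )
    known = set(times)
    for t in audio_times:
        if t not in known:
            problems.append(f"audio event at t={t:g} has no matching keyframe")
    return problems


# Illustrative scene: a glass falling off a table.
shatter = [
    Keyframe(0.0, "glass rests at table edge"),
    Keyframe(1.5, "glass tips and falls"),
    Keyframe(2.1, "glass shatters on tile floor"),
]
print(render_trajectory(shatter))
print(check_trajectory(shatter, audio_times=[2.1, 2.5]))
```

Here the audio event at 2.5 is reported because no visual keyframe shares that timestamp; moving it to 2.1 (the impact) clears the check.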

Cosmos3-Super Prompt Guide

NVIDIA Cosmos3-Super (released June 2026) is a 64-billion-parameter physical-AI omnimodel: a world-model architecture that combines a 32B reasoning module with a 32B generation module. Unlike traditional video generators that predict pixels, Cosmos3-Super simulates physics and couples action trajectories with synchronized video and audio output.

Architecture

Action Trajectory → [32B Reasoner] → Physical State → [32B Generator] → Video + Audio
                                    ↑
                              World Knowledge

Prompting Strategy

Cosmos3-Super requires a fundamentally different prompting approach from diffusion-based video models (Sora, Runway, Kling):

1. Define physics first: gravity, material properties, atmospheric conditions.
2. Keyframe actions: use t=TIMESTAMP syntax for action trajectories.
3. Camera as path: describe camera movement as timed shot sequences.
4. Audio sync: specify audio events at the same timestamps as visual actions.
5. World knowledge: the 32B reasoner understands real-world physics, so describe outcomes, not pixel-level details.
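
As an illustration of the "camera as path" item, the sketch below builds the template's `Path:` line from a list of timed shots and checks that the shot durations add up to the scene duration. The helper name `camera_path_line` and the `(Ns)` duration notation are assumptions made for this example; the guide does not specify how durations should be written.

```python
def camera_path_line(
    shots: list[tuple[str, float]], transition: str, scene_duration: float
) -> str:
    """Build the template's 'Path:' line from (shot name, duration in seconds) pairs."""
    total = sum(duration for _, duration in shots)
    if abs(total - scene_duration) > 1e-9:
        raise ValueError(
            f"shot durations sum to {total:g}s but the scene is {scene_duration:g}s"
        )
    # The '(Ns)' suffix is an assumed notation for per-shot duration.
    path = " → ".join(f"{name} ({duration:g}s)" for name, duration in shots)
    return f"  - Path: {path} with {transition}"


# Illustrative 10-second scene split into three shots.
line = camera_path_line(
    [("wide establishing", 4), ("slow dolly-in", 3), ("close-up", 3)],
    transition="hard cut",
    scene_duration=10,
)
print(line)
```

Failing early on a duration mismatch keeps the camera path consistent with the `[DURATION]` given on the Scene line.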

Comparison: Cosmos3 vs Traditional Video Gen

| Aspect | Cosmos3-Super | Traditional (Sora/Runway) |
| --- | --- | --- |
| Approach | Physics simulation | Pixel prediction |
| Actions | Keyframe trajectories | Descriptive text |
| Audio | Synchronized generation | Separate generation |
| Consistency | Temporal by design | Requires guidance |
| License | OpenMDW 1.1 (Hugging Face) | Proprietary |
