Cosmos3-Super Master Prompts (June 2026)
Optimized prompts for NVIDIA Cosmos3-Super, a 64B physical-AI omnimodel that couples action trajectories with video and audio generation. Its world-model architecture targets physics-aware content creation.
📋 Prompt
/*
COSMOS3-SUPER MASTER PROMPT
VERSION: 1.0.0
CAPABILITIES: Physical-AI Video Gen, Action-Conditioned, Audio Sync
ARCHITECTURE: 64B Omnimodel (32B Reasoner + 32B Generator)
*/

**Scene:** [SCENE_TYPE] — [DURATION]s at [FPS]fps

**World Physics:**
- Gravity: [VALUE] m/s²
- Atmosphere: [DENSITY], [WIND], [TEMPERATURE]
- Materials present: [GLASS, METAL, CLOTH, WATER, ORGANIC]

**Camera:**
- Path: [SHOT_1 → SHOT_2 → SHOT_3] with [TRANSITION_TYPE]
- Lens: [FOCAL_LENGTH]mm, [APERTURE]

**Action Trajectory (keyframed):**
t=[START]: [INITIAL_STATE]
t=[MID]: [INTERMEDIATE_ACTION]
t=[END]: [FINAL_STATE]

**Audio:**
- Type: [PHYSICS_BASED | DESIGNED | MUSIC]
- Sync: [ON_ACTION | CONTINUOUS | REACTIVE]

**Quality:** [PHOTOREAL | STYLIZED], temporal consistency HIGH

Cosmos3-Super differentiators:
- Physical simulation, not just pixel prediction
- Couples action → video → audio in one unified generation
- World-model architecture understands object permanence and physics
- OpenMDW 1.1 license on Hugging Face
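The bracketed slots in the template above can be filled programmatically. What follows is a minimal Python sketch, assuming single-name slots such as `[FOCAL_LENGTH]`. The helper `fill_placeholders` is hypothetical and is not part of any Cosmos3-Super tooling. Choice lists such as `[PHOTOREAL | STYLIZED]` are left untouched on purpose, since those are meant to be resolved by hand.

```python
import re

# Matches single-name slots like [FOCAL_LENGTH]; choice lists such as
# [PHOTOREAL | STYLIZED] contain spaces or punctuation and are not matched.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace every [NAME] slot with values["NAME"]; fail loudly on gaps.

    Hypothetical helper for illustration only.
    """
    missing = sorted({n for n in PLACEHOLDER.findall(template) if n not in values})
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


lens_line = "- Lens: [FOCAL_LENGTH]mm, [APERTURE]"
print(fill_placeholders(lens_line, {"FOCAL_LENGTH": 35, "APERTURE": "f/2.8"}))
```

Raising on a missing value, rather than leaving the slot in place, keeps a half-filled template from being submitted by accident.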
💡 Tips
- Cosmos3-Super is a world model — use physics parameters (gravity, material, collision) not just visual descriptions
- Action trajectories use keyframe syntax with precise timestamps for predictable output
- Camera path is described as a sequence of shots with durations — not free-text
- Audio is generated synchronously with video — specify audio events at the same timestamps as visual actions
- For best temporal consistency, keep action sequences under 30 seconds
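Several of these tips can be checked mechanically before a prompt is submitted. The sketch below is one possible pre-flight check, not an official validator: the `Event` type and `lint_sequence` function are hypothetical names, timestamps are assumed to be in seconds, and the 30-second limit is taken directly from the last tip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float          # seconds from the start of the clip (assumed unit)
    description: str


def lint_sequence(actions, audio, max_seconds=30.0):
    """Return human-readable warnings; an empty list means no tip was violated."""
    warnings = []
    times = [e.t for e in actions]
    # Keyframes should be listed in strictly increasing time order.
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        warnings.append("action timestamps are not strictly increasing")
    # The tips suggest keeping action sequences under 30 seconds.
    if times and max(times) >= max_seconds:
        warnings.append(f"sequence reaches {max(times):g}s; keep it under {max_seconds:g}s")
    # Each audio event should share an exact timestamp with a visual action.
    action_times = set(times)
    for e in audio:
        if e.t not in action_times:
            warnings.append(f"audio event at t={e.t:g} has no matching action")
    return warnings


actions = [
    Event(0.0, "bottle rests on table edge"),
    Event(1.5, "bottle tips and falls"),
    Event(2.1, "bottle shatters on concrete"),
]
audio = [Event(2.1, "glass shatter"), Event(2.6, "shards settle")]
for warning in lint_sequence(actions, audio):
    print(warning)
```

Here the second audio event is flagged because no visual action is keyframed at 2.6 seconds. Timestamps are compared for exact equality, so audio and action events should be written with the same literal values.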
Cosmos3-Super Prompt Guide
NVIDIA Cosmos3-Super (released June 2026) is a 64-billion-parameter physical-AI omnimodel: a world-model architecture that combines a 32B reasoning module with a 32B generation module. Unlike traditional video generators that predict pixels, Cosmos3-Super simulates physics and couples action trajectories with synchronized video and audio output.
Architecture
Action Trajectory → [32B Reasoner] → Physical State → [32B Generator] → Video + Audio
↑ World Knowledge (feeds the 32B Reasoner)
Prompting Strategy
Cosmos3-Super requires a fundamentally different prompting approach than diffusion-based video models (Sora, Runway, Kling):
- Define physics first — Gravity, material properties, atmospheric conditions
- Keyframe actions — Use `t=TIMESTAMP` syntax for action trajectories
- Camera as path — Describe camera movement as timed shot sequences
- Audio sync — Specify audio events at the same timestamps as visual actions
- World knowledge — The 32B reasoner understands real-world physics; describe outcomes, not pixel-level details
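The keyframing and audio-sync points above can be illustrated with a small Python sketch that emits `t=TIMESTAMP` lines from a mapping of times to states. The function name `render_trajectory`, the use of seconds without a unit suffix, and the `(audio: ...)` annotation are all assumptions made for illustration; the source does not specify how audio cues should be written alongside keyframes.

```python
def render_trajectory(actions, audio):
    """Render keyframes as `t=TIMESTAMP: state` lines, pairing audio cues by time.

    `actions` and `audio` map timestamps (assumed seconds) to descriptions.
    """
    lines = ["**Action Trajectory (keyframed):**"]
    for t in sorted(actions):
        line = f"t={t:g}: {actions[t]}"
        if t in audio:
            # Assumed convention: attach the audio cue to the keyframe it accompanies.
            line += f" (audio: {audio[t]})"
        lines.append(line)
    return "\n".join(lines)


actions = {
    0: "ball held 2 m above a steel plate",
    0.64: "ball strikes the plate",
    1.2: "ball reaches the top of its first rebound",
}
audio = {0.64: "metallic impact"}
print(render_trajectory(actions, audio))
```

Keying both mappings by the same timestamp makes it hard to describe a sound that drifts away from the action it belongs to, which is the point of the audio-sync recommendation.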
Comparison: Cosmos3 vs Traditional Video Gen
| Aspect | Cosmos3-Super | Traditional (Sora/Runway) |
|---|---|---|
| Approach | Physics simulation | Pixel prediction |
| Actions | Keyframe trajectories | Descriptive text |
| Audio | Synchronized generation | Separate generation |
| Consistency | Temporal by design | Requires guidance |
| License | OpenMDW 1.1 (Hugging Face) | Proprietary |