To make videos with OpenAI Codex, you install the Pexo video skill, describe the video you want in plain language, and the agent returns a finished, multi-shot result. There is no editing software to learn, no prompt engineering, and no model selection. The workflow has four parts: install the skill, describe the video, let Pexo route each shot across models like Seedance 2.0, Kling 3.0, Veo 3.1, and Sora 2, then review and export. Codex has no built-in "make video" button; what it can produce depends on the skill you add. Codex itself, running OpenAI's GPT-5 generation models, plans and dispatches the request but does not generate video. The installed Pexo skill adds that capability, and because the same skill standard runs in Claude Code and OpenClaw, the workflow you learn here carries across agents.
What You Need
Two things, and you can be running in a few minutes:
| Requirement | What it is |
|---|---|
| OpenAI Codex | OpenAI's agentic coding tool, available via the Codex CLI, IDE extension, and app on macOS and Windows. It runs on OpenAI's GPT-5 generation models, including the newest GPT-5.6 family (Sol, Terra, Luna) now rolling out. A paid ChatGPT plan (Plus, Pro, Business, or Enterprise) gives you access. |
| A video generation skill | The capability that lets the agent generate video. This guide uses Pexo. Codex loads Agent Skills from a SKILL.md file, the same open standard used by Claude Code and OpenClaw. |
Codex does not generate video on its own. GPT-5.6 is a language and reasoning model, not a video model. The Pexo skill is what adds the video capability, and once it is installed, everything else happens inside the conversation.
Step 1: Install the Pexo Video Skill in Codex
Add the Pexo skill to your Codex agent. An Agent Skill in Codex is a directory containing a SKILL.md file, loaded from your skills path. The user-scope directory is ~/.agents/skills, a repository keeps skills in .agents/skills, and Codex also reads an admin path under /etc/codex/skills. To install Pexo:
- Sign in at pexo.ai and copy your API key from the account settings.
- Add the Pexo skill to your Codex skills directory. You can place the skill folder under ~/.agents/skills, or use the built-in installer to pull it down.
- Set your Pexo API key where the skill expects it, then start a fresh Codex session so it picks up the new skill.
# Create the user skills directory if it does not exist
mkdir -p ~/.agents/skills
# Inside Codex, the skill installer can fetch a skill by reference
$skill-installer <pexo-skill-reference>
# Browse what Codex has loaded
/skills
The Pexo skill ships its helper scripts plus a SKILL.md that tells Codex how to behave, including a rule to pass your request through faithfully rather than rewriting it. Codex loads skills automatically based on their description, so once the skill is connected and your API key is set, the agent is ready to generate.
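Since a skill is just a folder with a SKILL.md inside it, a quick check from the terminal can confirm the layout before you open a session. The sketch below is illustrative only: `check_skills` is a hypothetical helper name, not part of Codex or Pexo, and it assumes nothing beyond the user-scope path described above.

```shell
# check_skills [ROOT]
# Lists each folder under a skills root and reports whether it contains SKILL.md.
# ROOT defaults to the user-scope path; check_skills is a hypothetical helper.
check_skills() {
  root="${1:-$HOME/.agents/skills}"
  found=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue          # skip the literal pattern when nothing matches
    found=1
    name=$(basename "$dir")
    if [ -f "${dir}SKILL.md" ]; then
      echo "ok: $name"
    else
      echo "missing SKILL.md: $name"
    fi
  done
  [ "$found" -eq 1 ] || echo "no skill folders under $root"
}
```

Running `check_skills` with no argument inspects ~/.agents/skills; passing `.agents/skills` from a repository root checks the repository-scope path instead.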
Step 2: Describe the Video You Want
This is the whole interface: tell Codex what you want in plain English. You do not pick a model or write a technical prompt. Useful things to specify:
- What it is about: "a product video for these wireless headphones"
- Length and shots: "15 seconds, three shots"
- Mood: "cinematic and premium," "fast-paced for TikTok"
- Music: "ambient electronic," or let the agent choose
- Aspect ratio: "9:16 for Reels," "16:9 for YouTube"
A complete first request looks like this:
> Make a 15-second cinematic product video for these wireless headphones, three shots, a slow orbit on the first, premium feel, with ambient music. 9:16.
That is enough. Codex parses the request and hands it to the Pexo skill from there.
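If you draft briefs in a text file or generate them from a script, a rough self-check can catch the details that are easiest to forget. The following is a minimal sketch, not a Pexo feature: `lint_brief` is a hypothetical helper, and its patterns only recognize the three aspect ratios, a length in seconds, and a shot count phrased the way the example request above phrases them.

```shell
# lint_brief "TEXT"
# Reports which structural details a plain-language brief does not mention.
# Hypothetical helper; the patterns are deliberately simple and easy to extend.
lint_brief() {
  brief="$1"
  missing=""
  printf '%s\n' "$brief" | grep -Eq '9:16|16:9|1:1' \
    || missing="$missing aspect-ratio"
  printf '%s\n' "$brief" | grep -Eiq '[0-9]+[ -]seconds?' \
    || missing="$missing length"
  printf '%s\n' "$brief" | grep -Eiq '(one|two|three|four|five|[0-9]+) shots?' \
    || missing="$missing shot-count"
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "brief looks complete"
}
```

The example request above passes all three checks, while `lint_brief "Make a cool video"` prints `missing: aspect-ratio length shot-count`. Mood and music are left out of the check on purpose, since the agent can choose those for you.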
Step 3: Let Codex and Pexo Generate
Once you send the request, Codex dispatches it to Pexo, which runs the full production. Pexo writes a shot script, then auto-selects the best model for each shot from 10+ options. A product close-up might route to Kling 3.0, a motion scene to Seedance 2.0, a cinematic wide to Veo 3.1, and a narrative beat to Sora 2. It generates each shot, adds transitions, composes an original score, and mixes a three-layer soundtrack of voiceover, music, and Foley sound effects.
A 15-second, three-shot video takes roughly 8 to 10 minutes end to end. That is far faster than choosing models, writing per-model prompts, and assembling clips by hand. While it runs, Codex polls Pexo for progress and reports back. You do not need to do anything until the finished file returns.
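The polling that Codex does on your behalf follows a familiar pattern: ask for status, wait, and repeat until the job finishes or a limit is reached. The sketch below shows that pattern in generic form. It is not Pexo's API: `poll_until_done`, `fake_status`, and the `finished`/`failed` status strings are all assumptions made for illustration, and the stub simply pretends the job completes on the third check.

```shell
# poll_until_done STATUS_CMD INTERVAL_SECONDS MAX_TRIES
# Calls STATUS_CMD until it prints "finished" or "failed", or tries run out.
# The status strings are assumptions for this sketch, not Pexo's real values.
poll_until_done() {
  status_cmd="$1"; interval="$2"; max_tries="$3"
  try=1
  while [ "$try" -le "$max_tries" ]; do
    state=$($status_cmd)
    echo "try $try: $state"
    case "$state" in
      finished) return 0 ;;
      failed)   return 1 ;;
    esac
    sleep "$interval"
    try=$((try + 1))
  done
  echo "timed out after $max_tries tries"
  return 2
}

# Stand-in status command: reports "running" twice, then "finished".
# It counts calls in the file named by COUNTER_FILE (hypothetical variable).
fake_status() {
  n=$(cat "$COUNTER_FILE" 2>/dev/null)
  n=$(( ${n:-0} + 1 ))
  echo "$n" > "$COUNTER_FILE"
  if [ "$n" -ge 3 ]; then echo "finished"; else echo "running"; fi
}
```

With `COUNTER_FILE=$(mktemp)`, calling `poll_until_done fake_status 0 5` prints three tries and stops. For a job in the 8 to 10 minute range, an interval of 30 seconds with a cap of 30 tries would leave comfortable headroom.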
Step 4: Review and Iterate
When the video comes back, you review it and ask for changes the same way you described it, in plain language. There is no timeline to edit:
> Make the second shot slower, and swap the music for something more upbeat.
Codex passes the revision through to Pexo and returns an updated cut. Because the request stays conversational, you iterate by talking, not by re-rendering anything yourself. The agent keeps the brand look consistent across revisions within the same session.
Step 5: Export and Use It
The finished video comes back as a standard MP4, mastered and ready to post. Because the production is multi-format aware, you can ask for the same content in different aspect ratios. Pexo exports 9:16 for TikTok, Instagram Reels, and YouTube Shorts, 16:9 for standard YouTube, and 1:1 for feed posts, without regenerating from scratch. Download it and publish.
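When you check an exported file, it helps to know what pixel dimensions each ratio implies. The arithmetic below is a small sketch that assumes a 1080-pixel short side, which is a common convention and not a documented Pexo output size; `dims_for_ratio` is a hypothetical helper name.

```shell
# dims_for_ratio W:H [SHORT_SIDE]
# Prints WIDTHxHEIGHT for an aspect ratio, scaling from the shorter side.
# SHORT_SIDE defaults to 1080, an assumption made for this sketch.
dims_for_ratio() {
  ratio="$1"; short="${2:-1080}"
  w=${ratio%%:*}     # part before the colon
  h=${ratio##*:}     # part after the colon
  if [ "$w" -ge "$h" ]; then
    echo "$((short * w / h))x${short}"     # landscape or square
  else
    echo "${short}x$((short * h / w))"     # portrait
  fi
}

dims_for_ratio 9:16    # prints 1080x1920
dims_for_ratio 16:9    # prints 1920x1080
dims_for_ratio 1:1     # prints 1080x1080
```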
Tips for Better Results
A few habits produce noticeably better videos from the same skill:
- Describe the mood, not the model. "Premium and cinematic" or "fast-paced and playful" guides Pexo's routing better than naming a model. The routing layer is built to translate intent into the right model for each shot, so plain description outperforms technical instructions.
- Name the platform. Saying "for TikTok" or "for a YouTube pre-roll" sets the aspect ratio, length, and pacing conventions automatically, so you do not have to specify each one.
- Give 2 to 4 reference images for products. When accuracy matters for a specific product, logo, or packaging, hand over a few clear photos at 1080p or higher. The agent uses them to keep the product faithful across shots.
- Be explicit about shot count and rhythm. "Three shots, quick cuts" versus "one slow continuous move" produces very different edits. The more you specify the structure, the closer the first result lands.
- Generate variants in the same conversation. Asking for alternates, like "now a punchier 9-second version," in the same session keeps the brand look consistent and is faster than starting a new request.
Other Ways to Start: Image, URL, Script, Audio
Text is only one input. Pexo accepts five input types, so Codex can start the video from whatever you already have:
| Input | How you start | Example |
|---|---|---|
| Text | Describe the video | "a cinematic ad for my coffee brand" |
| Image | Hand over product photos | turn studio shots into a moving product video |
| URL | Paste a product page | the agent extracts images, copy, and price into an ad |
| Script | Provide a written script | the agent segments it into scenes |
| Audio | Supply a track or voiceover | the agent generates visuals to match |
Pexo also includes an image-studio that auto-routes across Midjourney, Flux, and Ideogram, so you can generate the stills first and then turn them into video in the same flow. Starting from product photos walks the same describe, generate, review path with images as the input.
Troubleshooting Common Issues
If something does not work on the first try, the cause is usually one of a few simple things:
| Symptom | Likely cause | Fix |
|---|---|---|
| Codex does not see the skill | Skill folder not in a loaded path, or session started before install | Confirm the folder sits under ~/.agents/skills, then run /skills in a fresh session |
| Generation fails to start | Missing or invalid Pexo API key | Re-copy the key from pexo.ai and set it where the SKILL.md expects |
| Result ignores your brief | Request rewritten or too vague | Restate mood, length, shot count, and aspect ratio explicitly in one message |
| Wrong aspect ratio | Ratio not specified | Add "9:16," "16:9," or "1:1" to the request and re-run |
| Want a different look | First cut close but not exact | Iterate in plain language: "slower second shot, warmer color" |
Scaling Up: From One Video to a Pipeline
Once a single video works, the same skill scales to batch production. You can generate dozens of ad variants from one set of inputs, or one video per SKU, all inside the conversation. Because Codex is a coding agent, you can wrap the skill in a script that feeds product data in and stages finished, multi-platform creatives out for review. The describe, generate, review loop stays the same. Only the volume changes.
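The "feed product data in" half of such a script can be as simple as turning rows of product data into one brief per SKU. The sketch below assumes a headerless CSV with three columns (SKU, product name, mood) and a fixed brief template; both the column layout and the `briefs_from_csv` name are hypothetical, and how each brief is handed to Codex is left out.

```shell
# briefs_from_csv
# Reads "sku,product,mood" rows on stdin (no header row, no quoted commas)
# and prints one plain-language brief per row, prefixed with the SKU.
# The column layout and the template wording are assumptions for this sketch.
briefs_from_csv() {
  while IFS=, read -r sku product mood; do
    [ -n "$sku" ] || continue            # skip blank lines
    printf '%s: Make a 15-second %s product video for %s, three shots. 9:16.\n' \
      "$sku" "$mood" "$product"
  done
}

# Example with two inline rows instead of a file:
printf 'HP-01,wireless headphones,cinematic\nSP-02,portable speaker,playful\n' \
  | briefs_from_csv
```

Keeping the template in one place means every SKU gets the same length, shot count, and aspect ratio, which makes the resulting batch easier to review side by side.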
Related reading
- How to Make Videos With Claude Code: A Step-by-Step Guide
- Best Video Generation Skills for Coding Agents
- How to Turn Photos into AI Video: Image-to-Video Guide
- How to Build an AI Video Ad Pipeline With a Coding Agent
Resources
| Resource | URL | What it is |
|---|---|---|
| Pexo | pexo.ai | The video skill used in this guide |
| Pexo Skills (GitHub) | github.com/pexoai/pexo-skills | Open-source skills for coding agents |
| OpenAI Codex | openai.com/codex | OpenAI's agentic coding tool |
| Codex Agent Skills docs | developers.openai.com/codex/skills | How Codex loads and runs skills |




