How to Make Videos With OpenAI Codex: A Step-by-Step Guide

Liora Adler · Last updated Jun 30, 2026
Summary

A step-by-step guide to making videos with OpenAI Codex using the Pexo skill: install it into the Codex skills directory, describe the video in plain language, and the agent returns a finished multi-shot video. Pexo auto-routes each shot across Seedance 2.0, Kling 3.0, Veo 3.1, and Sora 2, mixes three-layer audio, and exports 9:16, 16:9, or 1:1. The guide covers what you need; the exact install and prompt steps; image, URL, script, and audio inputs; troubleshooting; and an 11-question FAQ.

To make videos with OpenAI Codex, you install the Pexo video skill, describe the video you want in plain language, and the agent returns a finished, multi-shot video. There is no editing software, no prompt engineering, and no model selection: Pexo routes each shot across models like Seedance 2.0, Kling 3.0, Veo 3.1, and Sora 2, and you review and export the result. Codex has no built-in "make video" button, because video generation comes from the skill you add. Codex itself, running OpenAI's latest GPT-5 generation models, plans and dispatches the request but does not generate video on its own. The installed Pexo skill adds that capability, and the same skill standard runs in Claude Code and OpenClaw, so the workflow you learn here carries across agents.

What You Need

Two things, and you can be running in a few minutes:

Requirement | What it is
OpenAI Codex | OpenAI's agentic coding tool, available via the Codex CLI, IDE extension, and app on macOS and Windows. It runs on OpenAI's GPT-5 generation models, including the newest GPT-5.6 family (Sol, Terra, Luna) now rolling out. A paid ChatGPT plan (Plus, Pro, Business, or Enterprise) gives you access.
A video generation skill | The capability that lets the agent generate video. This guide uses Pexo. Codex loads Agent Skills from a SKILL.md file, the same open standard used by Claude Code and OpenClaw.

Codex does not generate video on its own. GPT-5.6 is a language and reasoning model, not a video model. The Pexo skill is what adds the video capability, and once it is installed, everything else happens inside the conversation.

Step 1: Install the Pexo Video Skill in Codex

Add the Pexo skill to your Codex agent. An Agent Skill in Codex is a directory containing a SKILL.md file, loaded from your skills path. The user-scope directory is ~/.agents/skills, a repository keeps skills in .agents/skills, and Codex also reads an admin path under /etc/codex/skills. To install Pexo:

  1. Sign in at pexo.ai and copy your API key from the account settings.
  2. Add the Pexo skill to your Codex skills directory. You can place the skill folder under ~/.agents/skills, or use the built-in installer to pull it down.
  3. Set your Pexo API key where the skill expects it, then start a fresh Codex session so it picks up the new skill.

# In your shell: create the user skills directory if it does not exist
mkdir -p ~/.agents/skills

# Inside a Codex session: the skill installer can fetch a skill by reference
$skill-installer <pexo-skill-reference>

# Inside a Codex session: browse what Codex has loaded
/skills

The Pexo skill ships its helper scripts plus a SKILL.md that tells Codex how to behave, including a rule to pass your request through faithfully rather than rewriting it. Codex loads skills automatically based on their description, so once the skill is connected and your API key is set, the agent is ready to generate.
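
If Codex does not list the skill, it can help to check the three skill locations named in this step from outside the agent. The following is a minimal sketch, not part of Pexo or Codex: the function name find_skills is made up for illustration, and it only confirms that a SKILL.md file exists in a skill folder, not that Codex has loaded it.

```python
from pathlib import Path

# Skill locations named in this guide: user scope, repository scope, admin scope.
SKILL_ROOTS = [
    Path.home() / ".agents" / "skills",
    Path(".agents") / "skills",
    Path("/etc/codex/skills"),
]


def find_skills(roots=SKILL_ROOTS):
    """Return the path of every SKILL.md found one level below the given roots."""
    manifests = []
    for root in roots:
        if not root.is_dir():
            continue  # a missing root is normal; not every scope is in use
        for child in sorted(root.iterdir()):
            manifest = child / "SKILL.md"
            if child.is_dir() and manifest.is_file():
                manifests.append(manifest)
    return manifests


if __name__ == "__main__":
    for manifest in find_skills():
        print(manifest)
```

An empty result suggests the skill folder is not in any of the three locations, which matches the first row of the troubleshooting table later in this guide.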

Step 2: Describe the Video You Want

This is the whole interface: tell Codex what you want in plain English. You do not pick a model or write a technical prompt. Useful things to specify:

  • What it is about: "a product video for these wireless headphones"
  • Length and shots: "15 seconds, three shots"
  • Mood: "cinematic and premium," "fast-paced for TikTok"
  • Music: "ambient electronic," or let the agent choose
  • Aspect ratio: "9:16 for Reels," "16:9 for YouTube"

A complete first request looks like this:

> Make a 15-second cinematic product video for these wireless headphones,
  three shots, a slow orbit on the first, premium feel, with ambient music. 9:16.

That is enough. Codex parses the request and hands it to the Pexo skill from there.
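
If you write many requests, it can help to keep the five details above in one place and assemble the sentence from them. This is a small sketch under stated assumptions: the VideoBrief class and its field names are hypothetical, and the output is ordinary plain-language text that you paste into Codex, not a Pexo API payload.

```python
from dataclasses import dataclass

ALLOWED_RATIOS = {"9:16", "16:9", "1:1"}  # the three export formats named in this guide


@dataclass
class VideoBrief:
    """Hypothetical container for the details a first request usually specifies."""
    subject: str
    seconds: int
    shots: int
    mood: str
    aspect_ratio: str
    music: str = ""  # empty means "let the agent choose"

    def to_prompt(self) -> str:
        if self.aspect_ratio not in ALLOWED_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        music = f"with {self.music} music" if self.music else "music of your choice"
        return (
            f"Make a {self.seconds}-second {self.mood} video for {self.subject}, "
            f"{self.shots} shots, {music}. {self.aspect_ratio}."
        )


if __name__ == "__main__":
    brief = VideoBrief("these wireless headphones", 15, 3, "cinematic", "9:16", "ambient")
    print(brief.to_prompt())
```

The ratio check catches the most common omission early, before any generation time is spent.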

Step 3: Let Codex and Pexo Generate

Once you send the request, Codex dispatches it to Pexo, which runs the full production. Pexo writes a shot script, then auto-selects the best model for each shot from 10+ options. A product close-up might route to Kling 3.0, a motion scene to Seedance 2.0, a cinematic wide to Veo 3.1, and a narrative beat to Sora 2. It generates each shot, adds transitions, composes an original score, and mixes a three-layer soundtrack of voiceover, music, and Foley sound effects.

A 15-second, three-shot video takes roughly 8 to 10 minutes end to end. That is far faster than choosing models, writing per-model prompts, and assembling clips by hand. While it runs, Codex polls Pexo for progress and reports back. You do not need to do anything until the finished file returns.
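
The polling that Codex does on your behalf follows a common pattern: check status, wait, repeat until the job finishes or a deadline passes. The sketch below illustrates that pattern only. The status dictionary and its "state" values are assumptions made for this example and do not describe Pexo's actual responses; get_status stands in for whatever call reports progress.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], dict],
    interval_s: float = 15.0,
    timeout_s: float = 900.0,  # 15 minutes, a margin above the 8 to 10 minute estimate
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status until it reports 'done', raising on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        state = status.get("state")  # hypothetical field: "running", "done", or "failed"
        if state == "done":
            return status
        if state == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(interval_s)
```

Passing sleep in as a parameter keeps the loop easy to exercise without real waiting.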

Step 4: Review and Iterate

When the video comes back, you review it and ask for changes the same way you described it, in plain language. There is no timeline to edit:

> Make the second shot slower, and swap the music for something more upbeat.

Codex passes the revision through to Pexo and returns an updated cut. Because the request stays conversational, you iterate by talking, not by re-rendering anything yourself. The agent keeps the brand look consistent across revisions within the same session.

Step 5: Export and Use It

The finished video comes back as a standard MP4, mastered and ready to post. Because the production is multi-format aware, you can ask for the same content in different aspect ratios. Pexo exports 9:16 for TikTok, Instagram Reels, and YouTube Shorts, 16:9 for standard YouTube, and 1:1 for feed posts, without regenerating from scratch. Download it and publish.
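
When you stage exports for several platforms, a small lookup keeps the ratio choice consistent. The platform-to-ratio pairs below come from this guide; the 1080-pixel short side is an assumption for illustration, not a documented Pexo output size, and frame_size is a hypothetical helper.

```python
from fractions import Fraction

# Platform to aspect ratio, as listed in this guide.
PLATFORM_RATIOS = {
    "tiktok": "9:16",
    "instagram reels": "9:16",
    "youtube shorts": "9:16",
    "youtube": "16:9",
    "feed": "1:1",
}


def frame_size(ratio: str, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) for a 'W:H' ratio, scaling the shorter side to short_side."""
    w, h = (int(part) for part in ratio.split(":"))
    scale = Fraction(short_side, min(w, h))
    # int() truncates if short_side is not a multiple of the smaller ratio term.
    return int(w * scale), int(h * scale)


if __name__ == "__main__":
    for platform, ratio in PLATFORM_RATIOS.items():
        print(platform, ratio, frame_size(ratio))
```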

Tips for Better Results

A few habits produce noticeably better videos from the same skill:

  • Describe the mood, not the model. "Premium and cinematic" or "fast-paced and playful" guides Pexo's routing better than naming a model. The routing layer is built to translate intent into the right model for each shot, so plain description outperforms technical instructions.
  • Name the platform. Saying "for TikTok" or "for a YouTube pre-roll" sets the aspect ratio, length, and pacing conventions automatically, so you do not have to specify each one.
  • Give 2 to 4 reference images for products. When accuracy matters (a specific product, logo, or packaging), hand over a few clear photos at 1080p or higher. The agent uses them to keep the product faithful across shots.
  • Be explicit about shot count and rhythm. "Three shots, quick cuts" versus "one slow continuous move" produces very different edits. The more you specify the structure, the closer the first result lands.
  • Generate variants in the same conversation. Asking for alternates, like "now a punchier 9-second version," in the same session keeps the brand look consistent and is faster than starting a new request.

Other Ways to Start: Image, URL, Script, Audio

Text is only one input. Pexo accepts five, so Codex can start the video from whatever you already have:

Input | How you start | Example
Text | Describe the video | "a cinematic ad for my coffee brand"
Image | Hand over product photos | turn studio shots into a moving product video
URL | Paste a product page | the agent extracts images, copy, and price into an ad
Script | Provide a written script | the agent segments it into scenes
Audio | Supply a track or voiceover | the agent generates visuals to match

Pexo also includes an image-studio that auto-routes across Midjourney, Flux, and Ideogram, so you can generate the stills first and then turn them into video in the same flow. Starting from product photos follows the same describe, generate, review path, with images as the input.
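
If you are sorting a folder of mixed material before handing it to the agent, a rough classifier can label each item with one of the five input types. This is a heuristic sketch for your own scripts, not how Pexo detects inputs: the extension lists and the classify_input name are assumptions chosen for the example.

```python
from pathlib import Path
from urllib.parse import urlparse

# Illustrative extension lists; extend them to match your own files.
IMAGE_EXT = {".jpg", ".jpeg", ".png", ".webp"}
AUDIO_EXT = {".mp3", ".wav", ".m4a"}
SCRIPT_EXT = {".txt", ".md"}


def classify_input(value: str) -> str:
    """Label a string as 'url', 'image', 'audio', 'script', or 'text'."""
    value = value.strip()
    parsed = urlparse(value)
    if parsed.scheme in {"http", "https"} and parsed.netloc:
        return "url"
    suffix = Path(value).suffix.lower()
    if suffix in IMAGE_EXT:
        return "image"
    if suffix in AUDIO_EXT:
        return "audio"
    if suffix in SCRIPT_EXT:
        return "script"
    return "text"  # anything else is treated as a plain-language description


if __name__ == "__main__":
    for item in ["https://example.com/products/headphones", "shots/front.png",
                 "a cinematic ad for my coffee brand"]:
        print(item, "->", classify_input(item))
```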

Troubleshooting Common Issues

If something does not work on the first try, the cause is usually one of a few simple things:

Symptom | Likely cause | Fix
Codex does not see the skill | Skill folder not in a loaded path, or session started before install | Confirm the folder sits under ~/.agents/skills, then run /skills in a fresh session
Generation fails to start | Missing or invalid Pexo API key | Re-copy the key from pexo.ai and set it where the SKILL.md expects
Result ignores your brief | Request rewritten or too vague | Restate mood, length, shot count, and aspect ratio explicitly in one message
Wrong aspect ratio | Ratio not specified | Add "9:16," "16:9," or "1:1" to the request and re-run
Want a different look | First cut close but not exact | Iterate in plain language: "slower second shot, warmer color"

Scaling Up: From One Video to a Pipeline

Once a single video works, the same skill scales to batch production. You can generate dozens of ad variants from one set of inputs, or one video per SKU, all inside the conversation. Because Codex is a coding agent, you can wrap the skill in a script that feeds product data in and stages finished, multi-platform creatives out for review. The describe, generate, review loop stays the same. Only the volume changes.
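
As one way to picture that wrapper script, the sketch below turns a product sheet into one plain-language request per SKU and aspect ratio. It is an illustration under assumptions: the CSV columns sku, name, and mood are hypothetical, and the script only prepares the requests. Sending them to Codex and collecting the results is left out because that part depends on your setup.

```python
import csv
import io

RATIOS = ("9:16", "16:9", "1:1")  # the three export formats named in this guide


def briefs_from_csv(csv_text: str, ratios=RATIOS) -> list[dict]:
    """Build one request per product row and aspect ratio."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for ratio in ratios:
            jobs.append({
                "sku": row["sku"],
                "aspect_ratio": ratio,
                "prompt": (
                    f"Make a 15-second {row['mood']} product video for "
                    f"{row['name']}, three shots. {ratio}."
                ),
            })
    return jobs


if __name__ == "__main__":
    sheet = "sku,name,mood\nH-1,wireless headphones,cinematic\n"
    for job in briefs_from_csv(sheet):
        print(job["sku"], job["aspect_ratio"], job["prompt"])
```

Keeping the SKU and ratio alongside each prompt makes it easier to match finished files back to products during review.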

Resources

Resource | URL | What it is
Pexo | pexo.ai | The video skill used in this guide
Pexo Skills (GitHub) | github.com/pexoai/pexo-skills | Open-source skills for coding agents
OpenAI Codex | openai.com/codex | OpenAI's agentic coding tool
Codex Agent Skills docs | developers.openai.com/codex/skills | How Codex loads and runs skills

Frequently Asked Questions (FAQ)

How do I make a video with OpenAI Codex?

Install a video generation skill such as Pexo, then describe the video you want in plain language inside the Codex conversation. Pexo writes a shot script, auto-selects models, generates the shots, adds transitions and music, and returns a finished MP4, typically in 8 to 10 minutes for a 15-second, three-shot video. Codex itself does not generate video. It plans the request and dispatches it to the installed skill. You do not pick a model or edit a timeline. You describe the result and review it.

Can OpenAI Codex generate video on its own?

No. Codex runs on OpenAI's GPT-5 generation models, including the newest GPT-5.6 family, which are language and reasoning models, not video models. Codex plans and dispatches the work, but the actual video is generated by the skill you install. With the Pexo skill added, Codex can hand off your request and return a finished, multi-shot video. Without a video skill, Codex can write code and automate a pipeline, but it cannot produce footage by itself.

How do I install the Pexo skill in Codex?

Sign in at pexo.ai and copy your API key, then add the Pexo skill folder to your Codex skills directory. The user-scope path is ~/.agents/skills, a repo keeps skills under .agents/skills, and Codex also reads an admin path. Set your API key where the SKILL.md expects it, then start a fresh Codex session and run /skills to confirm Pexo is loaded. Codex picks up skills automatically based on their description.

Which models does Codex use to generate the video?

With the Pexo skill, the agent auto-selects per shot from 10+ models including Seedance 2.0, Kling 3.0, Veo 3.1, Sora 2, and Runway Gen-4.5. It routes a product close-up to one model and a cinematic wide to another. You never name a model. The routing layer picks the best fit for each shot. This per-shot routing is what separates an AI video agent from a single-model generator, and it matters because the best model for a given kind of shot changes every few weeks.

Do I need to know how to edit video or write prompts?

No. The point of the skill-based workflow is that you describe the outcome in plain English (mood, length, shots, music), and the agent handles model selection, prompting, and assembly internally. There is no timeline editor and no per-model prompt syntax to learn. You iterate by asking for changes in words, like "make the first shot slower." This is true whether you run Pexo inside Codex, Claude Code, or OpenClaw.

How long does it take to make a video with Codex?

A 15-second, three-shot video with auto model selection and a mixed score takes about 8 to 10 minutes end to end. A single raw clip from one model returns faster, around 1 to 3 minutes, but it is unassembled and has no soundtrack. The time scales with length, shot count, and whether you want a finished cut versus a single clip.

Can I make a video from product photos or a URL instead of text?

Yes. Pexo accepts five input types: text, image, product URL, script, and audio. You can hand over product photos, paste a Shopify or Amazon URL so the agent extracts images and copy, provide a written script, or supply an audio track. Pexo also has an image-studio that routes across Midjourney, Flux, and Ideogram, so you can generate stills first and then turn them into video in the same conversation.

Does this work in Claude Code and OpenClaw, or only Codex?

It works across all three. Because Agent Skills is an open standard built on a SKILL.md file, the same Pexo skill runs in OpenAI Codex, Claude Code, and OpenClaw. The install location differs slightly per agent (Codex uses ~/.agents/skills, while Claude Code uses its own skills directory), but the workflow of describe, generate, review, export is identical.

Is the Pexo skill free, and do I need a credit card?

Pexo is free to start. New accounts include a free allowance so you can generate your first videos without paying. The only key setup is copying your Pexo key into the skill, so there are no separate API keys to juggle. Generation beyond the free allowance runs on Pexo credits, and cost scales with how much video you generate. Your Codex access comes from your existing paid ChatGPT plan.

Can I make many videos at once with Codex?

Yes. Once a single video works, the same skill scales to batch generation: multiple ad variants from one set of inputs, or one video per product, all inside a single conversation. Because Codex is a coding agent, you can also wrap the skill in a script so product data goes in and finished, multi-platform creatives come out staged for review. This is the basis of an automated AI video ad pipeline.

Can I make vertical videos for TikTok, Reels, or Shorts with Codex?

Yes. When you describe the video, specify the aspect ratio. Use 9:16 for TikTok, Instagram Reels, and YouTube Shorts, 16:9 for standard YouTube, or 1:1 for feed posts, and Pexo exports in that format. You can ask for the same video in several ratios at once, so one request can produce both a vertical social cut and a widescreen version without regenerating from scratch.
