To make videos with Claude Code, you install a video generation skill, describe the video you want in plain language, and let the agent generate the finished result: no editing software, no prompt engineering, no model selection. This guide walks through the full workflow using the Pexo skill, which returns a finished, multi-shot video from a single instruction. You install the skill, describe the video, let the agent route each shot across models like Seedance 2.0, Kling 3.0, Veo 3.1, and Sora 2, then review and export. If you first want to understand the different ways Claude Code can make video (code-rendered, AI-generated, or a single clip), see Can Claude Code Make Videos? The Three Ways, Compared. This guide is the hands-on version: the exact steps to get a finished video out of your agent.
What You Need
Two things, and you can be running in a few minutes:
| Requirement | What it is |
|---|---|
| Claude Code | Anthropic's terminal coding agent (requires a paid Claude plan). The same steps work in Codex and OpenClaw. |
| A video generation skill | The capability that lets the agent generate video. This guide uses Pexo; for the full list of options, see the best video generation skills for Claude Code. |
Claude Code does not generate video on its own — the skill is what adds the capability. Once it is installed, everything else happens inside the conversation.
Step 1: Install a Video Generation Skill
Add the Pexo skill to your agent: sign in at pexo.ai, copy your API key, and add the skill to your skills directory. Then confirm the agent can see it by listing the installed skills inside Claude Code:
> /skills
You should see "pexo" in the list.
The Pexo skill ships its helper scripts plus a SKILL.md that tells the agent how to behave — including a rule to pass your request through faithfully rather than rewriting it. Once the skill is connected and your API key is set, the agent is ready to generate. (Running the included diagnostic confirms the config, dependencies, and key are all valid before you start.)
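If you want a quick pre-flight check of your own alongside the skill's diagnostic, the Python sketch below verifies the basics. It assumes the skill lives at `~/.claude/skills/pexo` (Claude Code's personal skills location) and that the key is read from an environment variable named `PEXO_API_KEY`. Both are placeholders, so substitute whatever the skill's own setup instructions specify.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical values: adjust to wherever you installed the skill and to
# whatever variable name the skill's own docs specify for the API key.
SKILL_DIR = Path.home() / ".claude" / "skills" / "pexo"
API_KEY_VAR = "PEXO_API_KEY"


def check_skill_setup(
    skill_dir: Path, api_key_var: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return a list of problems found; an empty list means the basics look OK."""
    env = os.environ if env is None else env
    problems = []
    if not skill_dir.is_dir():
        problems.append(f"skill directory not found: {skill_dir}")
    elif not (skill_dir / "SKILL.md").is_file():
        problems.append(f"SKILL.md missing in {skill_dir}")
    if not env.get(api_key_var):
        problems.append(f"environment variable {api_key_var} is not set")
    return problems


if __name__ == "__main__":
    issues = check_skill_setup(SKILL_DIR, API_KEY_VAR)
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("skill directory, SKILL.md, and API key variable all found")
```

This only checks files and an environment variable; it does not validate the key against Pexo, which is what the included diagnostic is for.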
Step 2: Describe the Video You Want
This is the whole interface: tell the agent what you want in plain English. You do not pick a model or write a technical prompt. Useful things to specify:
- What it's about — "a product video for these wireless headphones"
- Length and shots — "15 seconds, three shots"
- Mood — "cinematic and premium," "fast-paced for TikTok"
- Music — "ambient electronic," or let the agent choose
- Aspect ratio — "9:16 for Reels," "16:9 for YouTube"
A complete first request looks like this:
> Make a 15-second cinematic product video for these wireless headphones: three shots, a slow orbit on the first, premium feel, with ambient music. 9:16.
That is enough. The agent takes it from there.
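The checklist above maps naturally onto a small structured template, which is handy if you generate requests from a script rather than typing them. `VideoRequest` is a hypothetical helper, not part of Pexo; it only assembles the same plain-language sentence you would otherwise type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRequest:
    subject: str                 # what it's about
    seconds: int                 # total length
    shots: int                   # number of shots
    mood: str                    # e.g. "cinematic and premium"
    aspect_ratio: str            # e.g. "9:16"
    music: Optional[str] = None  # None leaves the choice to the agent

    def to_prompt(self) -> str:
        """Render the fields as one plain-English request."""
        shot_word = "shot" if self.shots == 1 else "shots"
        music = f"with {self.music} music" if self.music else "music of your choice"
        return (
            f"Make {self.subject}: {self.seconds} seconds, {self.shots} {shot_word}, "
            f"{self.mood}, {music}. {self.aspect_ratio}."
        )


req = VideoRequest(
    "a product video for these wireless headphones", 15, 3,
    "cinematic and premium", "9:16", "ambient",
)
print(req.to_prompt())
```

The output is still natural language, so the agent receives it exactly as if you had typed it.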
Step 3: Let the Agent Generate
Once you send the request, the agent dispatches it to Pexo, which runs the full production: it writes a shot script, auto-selects the best model for each shot from 10+ options (a product close-up might route to Kling 3.0, a motion scene to Seedance 2.0, a cinematic wide to Veo 3.1), generates each shot, adds transitions, composes an original score, and mixes the audio.
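To make per-shot routing concrete, here is a toy version. The mapping reuses the examples above (close-up to Kling 3.0, motion to Seedance 2.0, cinematic wide to Veo 3.1); the shot categories, the fallback model, and `route_shots` itself are illustrative assumptions, not Pexo's actual routing logic, which chooses from 10+ models.

```python
# Illustrative only: categories and mapping are taken from the examples in
# the text, not from Pexo's real routing table.
ROUTES = {
    "product close-up": "Kling 3.0",
    "motion": "Seedance 2.0",
    "cinematic wide": "Veo 3.1",
}


def route_shots(shot_types: list[str], fallback: str = "Sora 2") -> list[tuple[str, str]]:
    """Pair each shot in a script with the model it would be sent to.

    Shot types missing from ROUTES go to `fallback` (a hypothetical default).
    """
    return [(shot, ROUTES.get(shot, fallback)) for shot in shot_types]


for shot, model in route_shots(["product close-up", "motion", "cinematic wide"]):
    print(f"{shot:>16} -> {model}")
```

A real router would classify shots from the script rather than take labels as input; the point is only that each shot is dispatched independently.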
A 15-second, three-shot video takes roughly 8–10 minutes end to end, which Pexo puts at about 73% faster than choosing models, writing per-model prompts, and assembling clips by hand. While it runs, the agent polls for progress and reports back; you do not need to do anything until it returns the finished file.
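The polling the agent does while a job runs follows a familiar pattern: check the status, sleep, and repeat until the job reaches a terminal state or a deadline passes. A minimal sketch, assuming a hypothetical `get_status` callable that returns strings such as "queued", "rendering", "done", or "failed"; Pexo's real status values and endpoints are not covered in this guide.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 15.0,
    timeout_s: float = 20 * 60,  # about double the 8-10 minute estimate
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns "done" or "failed", or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        print("status:", status)
        if status in ("done", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.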
Step 4: Review and Iterate
When the video comes back, you review it and ask for changes the same way you described it — in plain language. There is no timeline to edit:
> Make the second shot slower, and swap the music for something more upbeat.
The agent passes the revision through and returns an updated cut. Because the request stays conversational, you iterate by talking, not by re-rendering anything yourself.
Step 5: Export and Use It
The finished video comes back as a standard MP4, mastered and ready to post. Because the production is multi-format aware, you can ask for the same content in different aspect ratios — 9:16 for TikTok and Reels, 16:9 for YouTube, 1:1 for feed — without regenerating from scratch. Download it and publish.
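For reference, here is what those aspect ratios mean in pixels at a 1080-pixel short side, the usual 1080p framing for each format. The guide does not state Pexo's output resolution, so treat the 1080 default as an assumption.

```python
from fractions import Fraction


def frame_size(aspect: str, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) for a 'W:H' aspect ratio with the given short side."""
    w, h = (int(part) for part in aspect.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:  # landscape or square: height is the short side
        return round(short_side * ratio), short_side
    return short_side, round(short_side / ratio)  # portrait: width is the short side


for ar in ("9:16", "16:9", "1:1"):
    print(ar, frame_size(ar))
```

Using `Fraction` avoids floating-point rounding surprises for ratios like 16:9.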
Tips for Better Results
A few habits produce noticeably better videos from the same skill:
- Describe the mood, not the model. "Premium and cinematic" or "fast-paced and playful" guides the agent's routing better than naming a model, because the routing layer is built to translate intent into the right model for each shot.
- Name the platform. Saying "for TikTok" or "for a YouTube pre-roll" sets the aspect ratio, length, and pacing conventions automatically, so you do not have to specify each one.
- Give 2–4 reference images for products. When accuracy matters — a specific product, logo, or packaging — hand over a few clear photos at 1080p or higher; the agent uses them to keep the product faithful across shots.
- Be explicit about shot count and rhythm. "Three shots, quick cuts" versus "one slow continuous move" produces very different edits; the more you specify the structure, the closer the first result lands.
- Generate variants in the same conversation. Asking for alternates — "now a punchier 9-second version" — in the same session keeps the brand look consistent and is faster than starting a new request.
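If you collect reference photos in a script, the 2–4 image, 1080p guideline from the tips above is easy to check before you hand them over. This sketch takes (width, height) pairs and reads "1080p or higher" as a short side of at least 1080 pixels, which is an assumption; with Pillow you could get each pair from `Image.open(path).size`.

```python
def check_reference_images(
    sizes: list[tuple[int, int]], min_short_side: int = 1080
) -> list[str]:
    """Check (width, height) photo sizes against the 2-4 image, 1080p guideline."""
    problems = []
    if not 2 <= len(sizes) <= 4:
        problems.append(f"expected 2-4 images, got {len(sizes)}")
    for i, (w, h) in enumerate(sizes, start=1):
        if min(w, h) < min_short_side:
            problems.append(f"image {i} is {w}x{h}; short side below {min_short_side}px")
    return problems


print(check_reference_images([(1920, 1080), (3000, 2000)]))
```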
Other Ways to Start: Image, URL, Script, Audio
Text is only one input. Pexo accepts five, so the agent can start from whatever you already have:
| Input | How you start | Example |
|---|---|---|
| Text | Describe the video | "a cinematic ad for my coffee brand" |
| Image | Hand over product photos | turn studio shots into a moving product video — see the image-to-video guide |
| URL | Paste a product page | the agent extracts images, copy, and price into an ad |
| Script | Provide a written script | the agent segments it into scenes |
| Audio | Supply a track or voiceover | the agent generates visuals to match |
For a worked example starting from product photos, the image-to-video guide walks the same flow with images as the input.
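When requests come from a script rather than a chat, a small helper can guess which of the five input types a value is. This is a hypothetical heuristic based on URL schemes and file extensions; the extension lists are assumptions, and in a conversation you would simply tell the agent what you are handing it.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical extension lists; adjust to the formats you actually use.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}
SCRIPT_EXTS = {".txt", ".md"}


def classify_input(value: str) -> str:
    """Guess the input type: "url", "image", "audio", "script", or "text"."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    suffix = Path(value).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in AUDIO_EXTS:
        return "audio"
    if suffix in SCRIPT_EXTS:
        return "script"
    return "text"  # anything else is treated as a plain description


print(classify_input("https://shop.example.com/headphones"))
```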
Scaling Up: From One Video to a Pipeline
Once a single video works, the same skill scales to batch production — dozens of ad variants from one set of inputs, or a video per SKU — all inside the conversation. For the full automation pattern (product data in, finished ads out, staged for review), see how to build an AI video ad pipeline with Claude Code.
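At its core the batch pattern is a cross product of products and variants, each turned into one plain-language request. A minimal sketch, assuming hypothetical `Product` records and a `pending_review` status field for staging; the full pipeline is covered in the linked guide.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    name: str
    page_url: str


def build_batch(products: list[Product], variants: list[str]) -> list[dict]:
    """One request per (product, variant) pair, staged for human review."""
    return [
        {
            "sku": p.sku,
            "variant": v,
            "request": f"Make a {v} product video for {p.name} from {p.page_url}.",
            "status": "pending_review",
        }
        for p in products
        for v in variants
    ]


batch = build_batch(
    [Product("HP-01", "Aurora headphones", "https://shop.example.com/hp-01")],
    ["15-second 9:16 cinematic", "9-second punchy"],
)
for item in batch:
    print(item["sku"], "|", item["request"])
```

Keeping every item in a review state means nothing is published until someone has watched the cut.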
Related reading
- Can Claude Code Make Videos? The Three Ways, Compared
- Best Video Generation Skills for Claude Code Agents
- How to Turn Photos into AI Video with Claude Code: Image-to-Video Guide
- How to Build an AI Video Ad Pipeline with Claude Code
Resources
| Resource | URL | What it is |
|---|---|---|
| Pexo | pexo.ai | The video skill used in this guide |
| Pexo Skills (GitHub) | github.com/pexoai/pexo-skills | Open-source skills for coding agents |
| Best video skills for Claude Code | pexo.ai/blog | The full ranking of options |