
How to Make Videos With Claude Code: A Step-by-Step Guide

Finn · Last updated Jun 2, 2026
Summary

A step-by-step guide to making videos with Claude Code using the Pexo skill. Step 1: install a video generation skill and confirm the agent sees it. Step 2: describe the video in plain language (what it's about, length, shots, mood, music, aspect ratio). Step 3: the agent auto-selects the best model per shot (Seedance 2.0, Kling 3.0, Veo 3.1, Sora 2, Runway Gen-4), generates the shots, adds transitions, and mixes a score, which takes about 8–10 minutes for a 15-second, three-shot video. Step 4: review and iterate by asking for changes in words. Step 5: export, in several aspect ratios if needed. The guide also covers the five input types (text, image, URL, script, audio), tips for better results, and scaling one video into a batch pipeline; the same workflow runs in Claude Code, Codex, and OpenClaw. For the different approaches to making video (code-rendered vs AI-generated), it points to the companion guide "Can Claude Code Make Videos?"

To make videos with Claude Code, you install a video generation skill, describe the video you want in plain language, and let the agent generate a finished result: no editing software, no prompt engineering, no model selection. This guide walks through the full workflow end to end using the Pexo skill, which returns a finished, multi-shot video from a single instruction. You install the skill, describe the video, let the agent route across models like Seedance 2.0, Kling 3.0, Veo 3.1, and Sora 2, then review and export. If you first want to understand the different ways Claude Code can make video (code-rendered versus AI-generated versus a single clip), see "Can Claude Code Make Videos? The Three Ways, Compared." This guide is the hands-on version: the exact steps to get a finished video out of your agent.

What You Need

Two things, and you can be running in a few minutes:

Requirement | What it is
--- | ---
Claude Code | Anthropic's terminal coding agent (a paid Claude plan). The same steps work in Codex and OpenClaw.
A video generation skill | The capability that lets the agent generate video. This guide uses Pexo; for the full list of options, see "the best video generation skills for Claude Code."

Claude Code does not generate video on its own — the skill is what adds the capability. Once it is installed, everything else happens inside the conversation.

Step 1: Install a Video Generation Skill

Add the Pexo skill to your agent: sign in at pexo.ai, copy your API key, and add the skill to your skills directory (in Claude Code, ~/.claude/skills/ for personal skills or .claude/skills/ inside a project). Then confirm the agent can see it:

// Inside Claude Code, list installed skills
> /skills
// You should see "pexo" in the list

The Pexo skill ships its helper scripts plus a SKILL.md that tells the agent how to behave — including a rule to pass your request through faithfully rather than rewriting it. Once the skill is connected and your API key is set, the agent is ready to generate. (Running the included diagnostic confirms the config, dependencies, and key are all valid before you start.)
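If you want a quick pre-flight check of your own before running Pexo's diagnostic, the sketch below verifies the two things Step 1 depends on: a SKILL.md whose YAML frontmatter carries the `name` and `description` fields the Agent Skills format requires, and an API key in the environment. It is a minimal sketch under stated assumptions, not a replacement for the included diagnostic: the folder name `pexo` and the environment variable `PEXO_API_KEY` are hypothetical placeholders, since this guide does not say how the skill reads its key.

```python
import os
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Return the `key: value` pairs from a leading '---' ... '---' block.

    Deliberately minimal (no nesting, no quoting rules): enough to confirm
    that the `name` and `description` keys are present.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return {}  # never found the closing '---'


def check_skill(skill_dir: Path, key_var: str = "PEXO_API_KEY") -> list:
    """Return a list of problems; an empty list means the basic checks pass.

    key_var is a hypothetical variable name; substitute whatever the skill uses.
    """
    problems = []
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        problems.append(f"missing {skill_md}")
    else:
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        for required in ("name", "description"):
            if not meta.get(required):
                problems.append(f"SKILL.md frontmatter has no '{required}'")
    if not os.environ.get(key_var):
        problems.append(f"environment variable {key_var} is not set")
    return problems


if __name__ == "__main__":
    # Assumed location: a personal skill folder named "pexo".
    skill = Path.home() / ".claude" / "skills" / "pexo"
    problems = check_skill(skill)
    print("\n".join(problems) if problems else "skill folder looks OK")
```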

Step 2: Describe the Video You Want

This is the whole interface: tell the agent what you want in plain English. You do not pick a model or write a technical prompt. Useful things to specify:

  • What it's about — "a product video for these wireless headphones"
  • Length and shots — "15 seconds, three shots"
  • Mood — "cinematic and premium," "fast-paced for TikTok"
  • Music — "ambient electronic," or let the agent choose
  • Aspect ratio — "9:16 for Reels," "16:9 for YouTube"

A complete first request looks like this:

> Make a 15-second cinematic product video for these wireless headphones —
  three shots, a slow orbit on the first, premium feel, with ambient music. 9:16.

That is enough. The agent takes it from there.
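If you send many similar requests, it can help to keep the Step 2 checklist as structured fields and render them into a sentence. The sketch below assumes nothing about Pexo's API: the hypothetical `VideoBrief` class only assembles the plain-language text you would type to the agent, so the agent still receives an ordinary request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoBrief:
    """The Step 2 checklist as fields (a hypothetical helper, not a Pexo API)."""
    subject: str                  # what it's about
    seconds: int                  # total length
    shots: int                    # number of shots
    mood: str                     # e.g. "cinematic and premium"
    aspect: str                   # e.g. "9:16"
    music: Optional[str] = None   # None means "let the agent choose"

    def to_request(self) -> str:
        """Render the brief as the plain-language sentence you would send."""
        music = f"Music: {self.music}." if self.music else "Choose music that fits."
        return " ".join([
            f"Make a {self.seconds}-second {self.subject}.",
            f"{self.shots} shots, {self.mood}.",
            music,
            f"Aspect ratio {self.aspect}.",
        ])


if __name__ == "__main__":
    brief = VideoBrief(
        subject="product video for these wireless headphones",
        seconds=15,
        shots=3,
        mood="cinematic and premium",
        aspect="9:16",
        music="ambient electronic",
    )
    print(brief.to_request())
```

Leaving `music` unset produces "Choose music that fits.", which mirrors the "or let the agent choose" option in the checklist.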

Step 3: Let the Agent Generate

Once you send the request, the agent dispatches it to Pexo, which runs the full production: it writes a shot script, auto-selects the best model for each shot from 10+ options (a product close-up might route to Kling 3.0, a motion scene to Seedance 2.0, a cinematic wide to Veo 3.1), generates each shot, adds transitions, composes an original score, and mixes the audio.
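To make "auto-selects the best model for each shot" concrete, here is a toy sketch of its shape: a shot script becomes a list of shots, and each shot is mapped to a model. The lookup table contains only the three pairings named in the paragraph above, and the shot-type labels are hypothetical. Pexo's real routing layer chooses from 10+ models using logic this sketch does not reproduce.

```python
from dataclasses import dataclass

# Only the three pairings named in the text; illustrative, not Pexo's routing.
EXAMPLE_ROUTES = {
    "product_closeup": "Kling 3.0",
    "motion": "Seedance 2.0",
    "cinematic_wide": "Veo 3.1",
}


@dataclass(frozen=True)
class Shot:
    index: int        # position in the shot script
    kind: str         # hypothetical shot-type label
    description: str  # what happens in the shot


def route_shots(shots, routes=EXAMPLE_ROUTES):
    """Map each shot in the script to a model name, failing loudly on gaps."""
    plan = []
    for shot in shots:
        if shot.kind not in routes:
            raise ValueError(f"no route for shot type {shot.kind!r}")
        plan.append((shot.index, routes[shot.kind]))
    return plan


if __name__ == "__main__":
    script = [
        Shot(1, "product_closeup", "slow orbit around the headphones"),
        Shot(2, "motion", "runner moving through the city"),
        Shot(3, "cinematic_wide", "skyline at dusk"),
    ]
    for index, model in route_shots(script):
        print(f"shot {index}: {model}")
```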

A 15-second, three-shot video takes roughly 8–10 minutes end to end — about 73% faster than choosing models, writing per-model prompts, and assembling clips by hand. While it runs, the agent polls for progress and reports back; you do not need to do anything until it returns the finished file.
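The agent handles the waiting for you, but if you script generation yourself the pattern is a bounded polling loop. The sketch below assumes a hypothetical `get_status()` callable returning a dict with `state`, `progress`, and `error` keys; that shape is not a documented Pexo response. The 20-minute default timeout is also an assumption, chosen as roughly twice the 8–10 minute figure above. `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], dict],
    interval_s: float = 15.0,
    timeout_s: float = 20 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll get_status() until the job is done, has failed, or times out.

    Assumed (hypothetical) status shape:
    {"state": "running" | "done" | "failed", "progress": ..., "error": ...}
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        state = status.get("state")
        if state == "done":
            return status
        if state == "failed":
            raise RuntimeError(f"generation failed: {status.get('error', 'unknown error')}")
        if clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s:.0f} s")
        print(f"progress: {status.get('progress', '?')}")
        sleep(interval_s)
```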

Step 4: Review and Iterate

When the video comes back, you review it and ask for changes the same way you described it — in plain language. There is no timeline to edit:

> Make the second shot slower, and swap the music for something more upbeat.

The agent passes the revision through and returns an updated cut. Because the request stays conversational, you iterate by talking, not by re-rendering anything yourself.

Step 5: Export and Use It

The finished video comes back as a standard MP4, mastered and ready to post. Because the production is multi-format aware, you can ask for the same content in different aspect ratios — 9:16 for TikTok and Reels, 16:9 for YouTube, 1:1 for feed — without regenerating from scratch. Download it and publish.
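The three aspect ratios above map to concrete frame sizes once you fix the length of the shorter edge. The sketch below assumes a 1080-pixel short edge, a common delivery size but not a documented Pexo output spec, and uses the platform pairings listed in this guide.

```python
# Platform-to-ratio pairings as listed in this guide.
PLATFORM_RATIOS = {
    "tiktok": "9:16",
    "reels": "9:16",
    "youtube": "16:9",
    "feed": "1:1",
}


def frame_size(ratio: str, short_side: int = 1080) -> tuple:
    """Return (width, height) for a 'W:H' ratio whose shorter edge is short_side px."""
    w, h = (int(part) for part in ratio.split(":"))
    if w >= h:  # landscape or square
        return (short_side * w // h, short_side)
    return (short_side, short_side * h // w)  # portrait


if __name__ == "__main__":
    for platform, ratio in PLATFORM_RATIOS.items():
        width, height = frame_size(ratio)
        print(f"{platform:8} {ratio:5} -> {width}x{height}")
```

At a 1080-pixel short edge this gives 1080x1920 for 9:16, 1920x1080 for 16:9, and 1080x1080 for 1:1.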

Tips for Better Results

A few habits produce noticeably better videos from the same skill:

  • Describe the mood, not the model. "Premium and cinematic" or "fast-paced and playful" guides the agent's routing better than naming a model — the routing layer is built to translate intent into the right model for each shot, so plain description outperforms technical instructions.
  • Name the platform. Saying "for TikTok" or "for a YouTube pre-roll" sets the aspect ratio, length, and pacing conventions automatically, so you do not have to specify each one.
  • Give 2–4 reference images for products. When accuracy matters — a specific product, logo, or packaging — hand over a few clear photos at 1080p or higher; the agent uses them to keep the product faithful across shots.
  • Be explicit about shot count and rhythm. "Three shots, quick cuts" versus "one slow continuous move" produces very different edits; the more you specify the structure, the closer the first result lands.
  • Generate variants in the same conversation. Asking for alternates — "now a punchier 9-second version" — in the same session keeps the brand look consistent and is faster than starting a new request.
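The reference-image tip is easy to check before you hand photos over. The sketch below reads "1080p or higher" as "the shorter edge is at least 1080 pixels", which is an interpretation rather than a documented Pexo rule. It takes (width, height) tuples so it needs no imaging library. With Pillow installed, `Image.open(path).size` returns exactly such a tuple.

```python
def check_reference_images(sizes, min_short_side: int = 1080) -> list:
    """Check the 2-4 images, 1080p-or-higher guideline.

    `sizes` is a list of (width, height) pixel tuples; reading them from
    files is left to the caller.
    """
    problems = []
    if not 2 <= len(sizes) <= 4:
        problems.append(f"expected 2-4 reference images, got {len(sizes)}")
    for number, (width, height) in enumerate(sizes, start=1):
        if min(width, height) < min_short_side:
            problems.append(
                f"image {number} is {width}x{height}; shorter edge is under {min_short_side} px"
            )
    return problems
```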

Other Ways to Start: Image, URL, Script, Audio

Text is only one input. Pexo accepts five, so the agent can start from whatever you already have:

Input | How you start | Example
--- | --- | ---
Text | Describe the video | "a cinematic ad for my coffee brand"
Image | Hand over product photos | Turn studio shots into a moving product video (see the image-to-video guide)
URL | Paste a product page | The agent extracts images, copy, and price into an ad
Script | Provide a written script | The agent segments it into scenes
Audio | Supply a track or voiceover | The agent generates visuals to match

For a worked example starting from product photos, the image-to-video guide walks the same flow with images as the input.
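When you have a folder of mixed material, a small pre-sorting step can tell you which of the five input types each item is before you describe it to the agent. The sketch below is a hypothetical heuristic (URL scheme first, then file extension, otherwise plain text). The extension lists are assumptions for illustration, not a statement of which formats Pexo accepts, and the agent makes its own determination regardless.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed extension lists for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}
SCRIPT_EXTS = {".txt", ".md"}


def classify_input(value: str) -> str:
    """Guess which of the five input types a string refers to."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = Path(value).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in AUDIO_EXTS:
        return "audio"
    if suffix in SCRIPT_EXTS:
        return "script"
    return "text"  # anything else is treated as a plain description
```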

Scaling Up: From One Video to a Pipeline

Once a single video works, the same skill scales to batch production — dozens of ad variants from one set of inputs, or a video per SKU — all inside the conversation. For the full automation pattern (product data in, finished ads out, staged for review), see how to build an AI video ad pipeline with Claude Code.
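The fan-out at the heart of batch production is simple: one plain-language request per product per output format. The sketch below builds those requests, which you would then send to the agent in the conversation. The SKU fields `name` and `image_dir` are hypothetical, and this is not the pipeline from the linked guide, only the combinatorial step it starts from.

```python
from itertools import product


def batch_requests(skus, ratios, seconds: int = 15) -> list:
    """One plain-language request per (SKU, aspect ratio) pair.

    Each SKU is a dict with hypothetical keys "name" and "image_dir".
    """
    return [
        f"Make a {seconds}-second product video for {sku['name']} "
        f"using the photos in {sku['image_dir']}. {ratio}."
        for sku, ratio in product(skus, ratios)
    ]


if __name__ == "__main__":
    catalog = [
        {"name": "Aria Headphones", "image_dir": "img/aria"},
        {"name": "Nova Speaker", "image_dir": "img/nova"},
    ]
    for request in batch_requests(catalog, ["9:16", "16:9"]):
        print(request)
```

Two products in two formats yield four requests. Because every request reuses the same template, the brand look stays consistent across the batch.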

Resources

Resource | URL | What it is
--- | --- | ---
Pexo | pexo.ai | The video skill used in this guide
Pexo Skills (GitHub) | github.com/pexoai/pexo-skills | Open-source skills for coding agents
Best video skills for Claude Code | pexo.ai/blog | The full ranking of options

Frequently Asked Questions (FAQ)

How do I make a video with Claude Code?

Install a video generation skill (such as Pexo), then describe the video you want in plain language inside the conversation. The agent writes a shot script, auto-selects models, generates the shots, adds transitions and music, and returns a finished MP4 — typically in 8–10 minutes for a 15-second, three-shot video. You do not pick a model or edit a timeline; you describe the result and review it.

Do I need to know how to edit video or write prompts?

No. The point of the skill-based workflow is that you describe the outcome in plain English — mood, length, shots, music — and the agent handles model selection, prompting, and assembly internally. There is no timeline editor and no per-model prompt syntax to learn. You iterate by asking for changes in words, like "make the first shot slower."

How long does it take to make a video with Claude Code?

A 15-second, three-shot video with auto model selection and a mixed score takes about 8–10 minutes end to end. A single raw clip from one model returns faster (1–3 minutes) but is unassembled. The time scales with length, shot count, and whether you want a finished cut versus a single clip.

Which models does Claude Code use to generate the video?

With the Pexo skill, the agent auto-selects per shot from 10+ models including Seedance 2.0, Kling 3.0, Veo 3.1, Sora 2, and Runway Gen-4 — routing a product close-up to one model and a cinematic wide to another. You never name a model; the routing layer picks the best fit for each shot. This is what separates an AI video agent from a single-model generator.

Can I make a video from product photos or a URL instead of text?

Yes. Pexo accepts five input types: text, image, product URL, script, and audio. You can hand over product photos, paste a Shopify or Amazon URL (the agent extracts images and copy), provide a script, or supply an audio track. For a photo-based walkthrough, see the image-to-video guide linked above.

Does this work in Codex and OpenClaw, or only Claude Code?

It works across all three. Because Agent Skills is an open standard, the same Pexo skill runs in Claude Code, OpenAI Codex, and OpenClaw. The install location differs slightly per agent, but the workflow — describe, generate, review, export — is identical.

What is the difference between making video with Remotion versus a skill like Pexo?

Remotion has Claude Code write code that renders into a deterministic MP4 — motion graphics and animation, no AI footage. A skill like Pexo generates real AI video from a description and returns a finished, assembled film. Use Remotion for charts and branded animation; use Pexo for product, cinematic, or social footage. See "can Claude Code make videos" for the full comparison of approaches.

Can I make many videos at once with Claude Code?

Yes. Once a single video works, the same skill scales to batch generation — multiple ad variants from one set of inputs, or one video per product — all inside a single conversation. This is the basis of an automated ad pipeline, where product data goes in and finished, multi-platform creatives come out staged for review.

How much does it cost to make videos with Claude Code?

The skill itself is free to install; generation runs on Pexo credits (new accounts include a free allowance to try it). Beyond your Claude Code subscription, cost scales with how much video you generate. Code-rendered approaches like Remotion have no generation cost but produce animation rather than AI footage.
