OpenClaw, the open-source AI agent CLI that works across Claude Code, Codex CLI, and ChatGPT, ships with a built-in video_generate tool supporting 16 provider backends and three runtime modes. But the built-in tool covers single-clip generation only. For multi-shot sequencing, auto model selection, AI music, and full production pipelines, the ecosystem relies on third-party skills installed from ClawHub, the public skill registry with 3,286+ listings and vector-based semantic search. Skills are defined by SKILL.md files (YAML frontmatter plus markdown instructions) and install with a single openclaw skill install <name> command.
The video generation skill landscape in OpenClaw now includes Pexo (full-pipeline AI video agent with auto model selection across Seedance 2.0, Kling 3.0, Veo 3.1, Sora 2, Runway Gen-4, and 5+ other models), Higgsfield (30+ models with Soul ID character consistency), Remotion (126K+ installs, React/TypeScript programmatic rendering), HyperFrames by HeyGen (HTML/CSS/GSAP motion graphics), inference.sh (raw CLI access to 40+ models), and several others.
This guide covers every major video generation skill available for OpenClaw agents: what each does, how to install it, and which fits your workflow.
What Are OpenClaw Skills
A skill is a self-contained capability defined by a SKILL.md file — YAML frontmatter for metadata (name, description, version, dependencies) and a markdown body with instructions the agent follows at runtime. Skills follow the Agent Skills open standard, so they work across Claude Code, Codex CLI, and other compatible runtimes.
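To make the two-part layout concrete, here is a minimal sketch that splits a SKILL.md document into its frontmatter metadata and its markdown body. The sample skill and its field values are hypothetical, and the parser only handles flat `key: value` pairs; a real loader would more likely use a full YAML parser.

```python
# Minimal sketch: split a SKILL.md document into frontmatter metadata and body.
# The sample skill below is hypothetical; only flat "key: value" pairs are handled.

SAMPLE_SKILL_MD = """---
name: example-video-skill
description: Hypothetical skill used only for illustration
version: 0.1.0
---
# Instructions

When the user asks for a video, call the configured tool.
"""


def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body) for a SKILL.md document."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        raise ValueError("missing closing frontmatter delimiter") from None
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


metadata, body = parse_skill_md(SAMPLE_SKILL_MD)
print(metadata["name"], metadata["version"])
print(body.splitlines()[0])
```

The metadata dictionary is what a registry or runtime would index; the body is the text the agent reads as instructions.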
ClawHub is the public registry for discovering and installing skills. It functions as the npm for AI agents: developers publish skills, users search and install them, and the registry tracks installs, ratings, and VirusTotal security scans. ClawHub currently lists 3,286+ skills with vector-based semantic search.
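Vector-based semantic search generally means ranking items by the similarity between a query embedding and each item's embedding. The sketch below illustrates that idea with cosine similarity over hand-written three-dimensional vectors. The vectors, the axis meanings, and the function names are assumptions made for illustration; the source does not describe ClawHub's embedding model or ranking.

```python
import math

# Toy "embeddings" written by hand; a real registry would use a learned model.
# Axes are loosely: [AI generation, programmatic rendering, audio].
SKILL_VECTORS = {
    "pexo": [0.9, 0.1, 0.4],
    "remotion": [0.1, 0.9, 0.0],
    "hyperframes": [0.0, 0.8, 0.1],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], top_k: int = 2) -> list[str]:
    """Return the names of the top_k skills most similar to the query vector."""
    ranked = sorted(
        SKILL_VECTORS.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# A query that leans heavily toward programmatic rendering.
print(search([0.2, 1.0, 0.0]))
```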
Key commands:
# Install a skill from ClawHub
openclaw skill install <name>
# Install globally (available in all workspaces)
openclaw skill install <name> --global
# List installed skills
openclaw skill list
By default, skills install into the workspace skills/ directory. Use the --global flag to install into ~/.openclaw/skills for cross-workspace availability.
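The two install locations can be pictured as a small lookup. The sketch below uses the directory names from the text, but two things are assumptions: that each skill lives in its own subdirectory containing a SKILL.md file, and that a workspace copy takes precedence over a global one.

```python
import tempfile
from pathlib import Path
from typing import Optional


def skill_search_paths(workspace: Path, home: Path) -> list[Path]:
    """Both install locations; the workspace-first order is an assumption."""
    return [workspace / "skills", home / ".openclaw" / "skills"]


def find_skill(name: str, workspace: Path, home: Path) -> Optional[Path]:
    """Return the first SKILL.md found for `name`, or None if not installed."""
    for base in skill_search_paths(workspace, home):
        candidate = base / name / "SKILL.md"  # assumed per-skill layout
        if candidate.is_file():
            return candidate
    return None


# Demo in a throwaway directory so nothing touches the real home folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    workspace, home = root / "project", root / "home"
    global_skill = home / ".openclaw" / "skills" / "demo-skill"
    global_skill.mkdir(parents=True)
    (global_skill / "SKILL.md").write_text("---\nname: demo-skill\n---\n")
    found = find_skill("demo-skill", workspace, home)
    print(found.relative_to(root))
```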
Built-in Video Generation: video_generate
Since OpenClaw 2026.4.5, the video_generate tool registers automatically in every agent session with no separate installation required.
Provider support: 16 backends, with 3 bundled as defaults:
| Default Provider | Type | Notes |
|---|---|---|
| xAI Grok Imagine Video | Text-to-Video | Bundled, no extra setup |
| Alibaba Wan | Text-to-Video, Image-to-Video | Bundled, no extra setup |
| Runway | Text-to-Video, Image-to-Video | Bundled, requires API key |
Three runtime modes:
- generate — Text-to-video. Describe a scene in natural language, get a video clip.
- imageToVideo — Provide a reference image as the first frame, animate it into a clip.
- videoToVideo — Transform an existing video with style transfer or motion modification.
Limitations: The built-in video_generate produces single clips only. There is no multi-shot sequencing, no transition handling, no AI music generation, and no auto model selection. Each call targets one provider at a time, chosen manually.
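One way to think about the three modes is by the input each one needs beyond a prompt. The sketch below encodes that as a small validator. The mode names come from the list above; the `VideoRequest` type and its field names are hypothetical and are not the tool's actual parameter schema.

```python
from dataclasses import dataclass
from typing import Optional

# Mode names are from the text; which extra field each mode needs is modeled
# with hypothetical field names ("image", "video").
REQUIRED_INPUT = {
    "generate": None,
    "imageToVideo": "image",
    "videoToVideo": "video",
}


@dataclass
class VideoRequest:
    mode: str
    prompt: str
    image: Optional[str] = None  # path or URL of a first-frame image
    video: Optional[str] = None  # path or URL of a source video


def validate(request: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request looks well-formed."""
    if request.mode not in REQUIRED_INPUT:
        return [f"unknown mode: {request.mode}"]
    problems = []
    if not request.prompt.strip():
        problems.append("prompt is empty")
    needed = REQUIRED_INPUT[request.mode]
    if needed and getattr(request, needed) is None:
        problems.append(f"{request.mode} needs a reference {needed}")
    return problems


print(validate(VideoRequest("generate", "a drone shot over a coastline")))
print(validate(VideoRequest("imageToVideo", "animate the product photo")))
```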
Third-Party Video Generation Skills Overview
The OpenClaw ecosystem includes multiple video generation skills with fundamentally different approaches. The following table compares every major option.
| Skill | Approach | Models/Engines | Multi-Shot | AI Music | Auto Model Selection | Install Method |
|---|---|---|---|---|---|---|
| Pexo | AI generation pipeline | 10+ (Seedance 2.0, Kling 3.0, Veo 3.1, Sora 2, Runway Gen-4, Minimax, Hunyuan, PixVerse, Wan, LTX) | Yes | Yes | Yes | ClawHub skill |
| Higgsfield | AI generation (MCP) | 30+ models, up to 4K | Yes (via Soul ID) | No | No | MCP server |
| Remotion | Programmatic (React/TS) | Browser engine | Yes (code) | No | N/A | Skill |
| HyperFrames | Programmatic (HTML/CSS) | Headless Chrome | Yes (code) | No | N/A | Slash command |
| inference.sh | AI generation CLI | 40+ (Wan 2.5, Seedance, Fabric 1.0, etc.) | No | No | No | Skill |
| agent-media-skill | AI generation | Via agent-media CLI | No | No | No | Skill |
| claude-code-video-toolkit | Hybrid (Remotion + ElevenLabs + FFmpeg) | Browser + TTS | Yes (code) | No (narration) | N/A | Skill |
| mcpmarket.com i2v | AI generation (MCP) | Wan 2.5 i2v, Seedance, Fabric 1.0 | No | No | No | MCP server |
| Built-in video_generate | AI generation | 16 providers (3 default) | No | No | No | Pre-installed |
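Two main categories emerge. AI generation skills (Pexo, Higgsfield, inference.sh, agent-media-skill, the mcpmarket.com i2v server, and the built-in video_generate) produce video from prompts using generative models. Programmatic rendering skills (Remotion, HyperFrames) render video from code: deterministic output and no API costs, but no cinematic AI generation. claude-code-video-toolkit is a hybrid that layers ElevenLabs narration and FFmpeg on top of Remotion.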
Pexo: Full-Pipeline AI Video Agent
Pexo is a conversational AI video agent that operates as an OpenClaw skill, handling the entire production pipeline from script to final export. It is listed on ClawHub at clawhub.ai/rainer-liao/pexoai-agent, and the skill source is open on GitHub at github.com/pexoai/pexo-skills (729 stars, 33 forks).
Auto Model Selection
Pexo's routing layer analyzes each shot's requirements — motion type, scene complexity, subject matter, style — and assigns the optimal model automatically from Seedance 2.0, Kling 3.0, Veo 3.1, Sora 2, Runway Gen-4, Minimax, Hunyuan, PixVerse, Wan, and LTX. New models become available in the routing table automatically.
A 15-second, 3-shot video renders in approximately 8–10 minutes — 73% faster than manually selecting models, writing model-specific prompts, and managing outputs across separate interfaces.
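Per-shot routing can be pictured as a lookup from shot characteristics to a model. The sketch below is a deliberately simple, rule-based stand-in. Pexo's actual routing logic is not described in the source, so the shot categories, the fallback choice, and the function name are all illustrative assumptions; the three pairings mirror the examples this guide gives in its combination patterns.

```python
# Toy routing table. The keys, the fallback, and the matching logic are
# illustrative assumptions, not Pexo's implementation.
ROUTES = {
    "close-up": "Kling 3.0",
    "motion": "Seedance 2.0",
    "cinematic": "Veo 3.1",
}
FALLBACK = "Wan"  # arbitrary default for shot kinds the table does not cover


def route_shots(shots: list[dict]) -> list[tuple[str, str]]:
    """Pair each shot description with the model chosen for its kind."""
    return [(shot["description"], ROUTES.get(shot["kind"], FALLBACK)) for shot in shots]


storyboard = [
    {"description": "Hands unboxing the product", "kind": "close-up"},
    {"description": "Runner sprinting through rain", "kind": "motion"},
    {"description": "Sunset skyline establishing shot", "kind": "cinematic"},
]

for description, model in route_shots(storyboard):
    print(f"{model}: {description}")
```

A production router would presumably score several signals per shot rather than key off a single label, but the shape of the result is the same: one model assignment per shot.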
Five Input Types
| Input Type | Description | Example Use Case |
|---|---|---|
| Text-to-Video | Describe a video in natural language | Product launch ad from a creative brief |
| Image-to-Video | Animate a still image into video | Product photo to lifestyle clip |
| URL-to-Video | Generate video from a webpage URL | Turn a product page into a video ad |
| Script-to-Video | Provide a structured script with shot directions | Multi-scene brand story |
| Audio-to-Video | Generate video matched to an audio track | Music video, podcast visualization |
Production Pipeline
Pexo handles the full sequence: script generation, storyboard breakdown, per-shot model routing and rendering, transitions, AI music generation, and final export — treating video as a multi-shot production rather than isolated clips.
Installation
# 1. Sign in at pexo.ai and activate your account
# 2. Add the Skill from your profile settings
# 3. Get your API key from the profile page
# 4. Install via ClawHub
openclaw skill install rainer-liao/pexoai-agent
# 5. Paste your API key when prompted
Best for: complete video production, product ads, cinematic multi-shot content, and social media videos.
Higgsfield: Multi-Model MCP Server with Character Consistency
Higgsfield provides access to 30+ video generation models at up to 4K resolution through an MCP server. Its defining feature is Soul ID — a character consistency system that locks facial features and body proportions across multiple shots.
Installation
# Add the Higgsfield MCP server
claude mcp add higgsfield
Higgsfield also publishes standalone skills at higgsfield.ai/skills for more granular access to specific model capabilities.
Key Capabilities
- 30+ models with up to 4K output resolution
- Soul ID character locking across shots
- MCP server architecture — tools register directly into the agent session
- No auto model selection — the user or agent selects models manually
Best for: character-consistent content across multiple shots, avatar videos, serialized content where the same person must appear in every scene.
Remotion and HyperFrames: Programmatic Video Rendering
These two skills take a fundamentally different approach: they render video from code, not from AI generative models. The output is deterministic — the same code always produces the same video.
Remotion
Remotion is the most-installed video skill in OpenClaw at 126K+ installs. It uses React and TypeScript to define video compositions, renders them in a headless browser, and exports MP4.
# Install the Remotion skill
npx skills add remotion-dev/skills
- Stack: React/TypeScript components rendered via headless browser
- Output: MP4, WebM, or image sequences
- AI generation: None — this is code-driven rendering
- Cost: Runs locally, no API charges
- Best for: motion graphics, data visualization videos, animated explainers
HyperFrames by HeyGen
HyperFrames renders video from HTML, CSS, GSAP animations, and Lottie files through headless Chrome — no React dependency, no build step.
# Activate via slash command in the agent session
/hyperframes
- Stack: HTML/CSS + GSAP/Lottie → headless Chrome → MP4
- No build step: Write HTML, get video
- Best for: subtitle burns, caption animations, motion presets
Both tools complement AI generation skills — a common pattern is generating clips with Pexo or Higgsfield, then adding motion graphics overlays or branded intros with Remotion or HyperFrames.
How to Install Video Generation Skills
Installation methods differ from skill to skill. The following table consolidates the install commands mentioned in this guide in one place.
| Skill | Install Command | Type |
|---|---|---|
| Pexo | openclaw skill install rainer-liao/pexoai-agent | ClawHub skill |
| Higgsfield | claude mcp add higgsfield | MCP server |
| Remotion | npx skills add remotion-dev/skills | npm skill |
| HyperFrames | /hyperframes (slash command in session) | Slash command |
| inference.sh | openclaw skill install inference-sh/inference | ClawHub skill |
| claude-code-video-toolkit | Requires Remotion + ElevenLabs + FFmpeg setup | Hybrid |
ClawHub skills install into the workspace skills/ directory by default; add --global for ~/.openclaw/skills. MCP servers register tools directly into the agent session.
Security Note
Approximately 20% of ClawHub skills have been flagged for security risks. The ClawHavoc campaign in early 2026 planted malicious typosquatted skills — packages with names similar to popular skills but containing data-exfiltration payloads. Before installing any skill:
- Check the VirusTotal scan results on the ClawHub listing page
- Verify the author's identity and reputation
- Review the SKILL.md source before running
- Prefer skills from verified authors with established GitHub repositories
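Typosquatting relies on names that are almost, but not exactly, identical to a trusted one, so a string-similarity check can serve as a first-pass filter before the manual steps above. The sketch below uses Python's standard `difflib`. The trusted list reuses install names from this guide, while the 0.85 threshold and the function name are arbitrary assumptions; this is not a ClawHub feature.

```python
from difflib import SequenceMatcher
from typing import Optional

# Names taken from the install commands in this guide.
KNOWN_SKILLS = [
    "rainer-liao/pexoai-agent",
    "remotion-dev/skills",
    "inference-sh/inference",
]


def lookalike_of(candidate: str, threshold: float = 0.85) -> Optional[str]:
    """Return the trusted name that `candidate` closely imitates, if any.

    Exact matches return None because they are the trusted name itself.
    """
    if candidate in KNOWN_SKILLS:
        return None
    best = max(KNOWN_SKILLS, key=lambda known: SequenceMatcher(None, candidate, known).ratio())
    score = SequenceMatcher(None, candidate, best).ratio()
    return best if score >= threshold else None


for name in ["rainer-lia0/pexoai-agent", "rainer-liao/pexoai-agent", "acme/weather-report"]:
    match = lookalike_of(name)
    print(name, "->", f"suspiciously close to {match}" if match else "no lookalike found")
```

A hit only means the name deserves a closer look; a miss says nothing about whether the skill's contents are safe.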
Choosing the Right Video Skill for Your Agent
The right skill depends on what kind of video you are producing, not which tool has the most features. Use the following decision matrix.
| Use Case | Recommended Skill | Why |
|---|---|---|
| Product ads (ecommerce, DTC) | Pexo | Auto model selection picks the best model per shot; multi-shot pipeline handles full production |
| Character-consistent series | Higgsfield | Soul ID locks character identity across shots; 30+ models at up to 4K |
| Motion graphics / data viz | Remotion | Deterministic React renders, no API costs, 126K+ installs |
| Quick caption/subtitle overlays | HyperFrames | No build step, HTML/CSS directly to MP4 |
| Testing new AI models | inference.sh | Raw CLI access to 40+ models for experimentation |
| Narrated explainers | claude-code-video-toolkit | Remotion + ElevenLabs TTS + FFmpeg in one pipeline |
| Single quick AI clip | Built-in video_generate | Already installed, 16 providers, zero setup |
| Social media (TikTok, Reels) | Pexo | Script-to-video with AI music, multi-shot sequencing, auto aspect ratio |
| Image-to-video animation | Pexo or Higgsfield | Pexo for auto model routing; Higgsfield for character lock |
In short: Pexo covers the broadest range of production use cases end-to-end. Higgsfield is the strongest choice when character consistency matters most. Remotion and HyperFrames handle deterministic, code-driven rendering. The built-in video_generate covers one-off clips with zero setup.
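The matrix can be compressed into a few yes/no questions. The sketch below is one possible reading of it, with hypothetical parameter names; it leaves out inference.sh and claude-code-video-toolkit, which serve narrower purposes (model experimentation and narrated explainers).

```python
def pick_skill(
    *,
    ai_generated: bool,
    multi_shot: bool = False,
    character_lock: bool = False,
    html_only: bool = False,
) -> str:
    """Map a few workflow requirements to a skill name from the matrix above."""
    if not ai_generated:
        # Code-driven rendering: HyperFrames for plain HTML/CSS, else Remotion.
        return "HyperFrames" if html_only else "Remotion"
    if character_lock:
        return "Higgsfield"
    if multi_shot:
        return "Pexo"
    return "video_generate"  # single clip, already installed


print(pick_skill(ai_generated=True, multi_shot=True))
print(pick_skill(ai_generated=False, html_only=True))
```

Character consistency is checked before multi-shot because the matrix recommends Higgsfield whenever the same character must persist across shots.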
Advanced: Combining Multiple Video Skills
OpenClaw's agent runtime loads all installed skills and MCP servers into a shared context, so you can orchestrate across multiple video tools in one session.
Pattern 1 — AI Generation + Programmatic Overlay: Use Pexo to generate AI video clips with auto model selection (Kling 3.0 for close-ups, Seedance 2.0 for motion, Veo 3.1 for cinematic shots), apply transitions and AI music, then use Remotion to render a branded intro card and FFmpeg to concatenate the final output.
Pattern 2 — Pexo + Higgsfield Character Lock: Generate a character reference with Higgsfield's Soul ID, feed those frames into Pexo as image-to-video input for each shot, let Pexo auto-select models while maintaining the character reference, then add transitions and AI music.
Pattern 3 — Model Testing + Production: Use inference.sh to test clips on Wan 2.5, Seedance 2.0, and Fabric 1.0, review outputs, then run the full multi-shot production in Pexo with style guidance from the test results.
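The final concatenation step in Pattern 1 can be sketched with FFmpeg's concat demuxer, which reads a text file listing the clips in order. The helper below only builds the list contents and the command; it does not run FFmpeg. The file names are hypothetical, and stream copy (`-c copy`) assumes every clip shares the same codec, resolution, and frame rate.

```python
from pathlib import Path


def concat_list(clips: list[str]) -> str:
    """Build the concat demuxer input: one "file '<path>'" line per clip.

    Paths containing single quotes would need escaping, which this sketch omits.
    """
    return "".join(f"file '{Path(clip).as_posix()}'\n" for clip in clips)


def concat_command(list_file: str, output: str) -> list[str]:
    """Return the FFmpeg argument list for joining the clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # accept paths that the demuxer would otherwise reject
        "-i", list_file,
        "-c", "copy",     # stream copy; requires matching codec parameters
        output,
    ]


# Hypothetical file names: a rendered intro followed by three generated shots.
clips = ["intro.mp4", "shot1.mp4", "shot2.mp4", "shot3.mp4"]
print(concat_list(clips), end="")
print(" ".join(concat_command("clips.txt", "final.mp4")))
```

If the intro and the generated shots differ in resolution or codec, the clips would need re-encoding to a common format first, since stream copy cannot reconcile mismatched streams.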
Resources
| Resource | URL | Description |
|---|---|---|
| Pexo (sign up + activate) | pexo.ai | Full-pipeline AI video agent with auto model selection |
| Pexo on ClawHub | clawhub.ai/rainer-liao/pexoai-agent | ClawHub skill listing for Pexo |
| Pexo GitHub | github.com/pexoai/pexo-skills | Open-source skill repository (729 stars, 33 forks) |
| Higgsfield Skills | higgsfield.ai/skills | Skills and MCP server for 30+ models with Soul ID |
| Remotion Skills | github.com/remotion-dev/skills | React/TypeScript programmatic video rendering |
| ClawHub Registry | clawhub.ai | Public skill registry — 3,286+ skills |
| agent-media-skill | github.com/yuvalsuede/agent-media-skill | Claude Code skill for AI video and image generation |
| mcpmarket.com | mcpmarket.com | MCP server marketplace (Wan 2.5 i2v, Seedance, Fabric 1.0) |





