The best image generation skill for Claude Code depends on what you need most: the widest model selection, the lowest cost per image, a specific model such as FLUX, Midjourney, or Ideogram, or images that feed straight into video. There is no single winner. inference.sh exposes 50+ models through one CLI (FLUX Dev with LoRA, Gemini 3 Pro, Grok Imagine, and Seedream 4.5 among them) and reaches roughly $0.0001 per image with FLUX Klein 4B, so it leads on model count and price. The Flux Image Skill, built on Black Forest Labs' FLUX family, wins for FLUX-specific work, LoRA fine-tuning, and unrestricted generation. The Generate Image community skill routes FLUX.2 Pro and Gemini 3 Pro through OpenRouter, which suits teams already on that gateway, while an Image Generation MCP bundles Gemini, GPT, and FLUX into one server callable from both Claude Code and Claude Desktop. Pexo's image-studio skill fills a different slot: one-command access to premium models, including Midjourney, Flux, and Ideogram, with zero API-key setup, inside a media ecosystem that also turns those images into video. This guide defines the selection criteria, compares the image-gen skills on those criteria, and names the slot each one wins, so you can install the right tool for the job instead of chasing a single ranking.
What to Look For in an Image Generation Skill
Before naming "the best," it helps to know what actually separates one image generation skill for Claude Code from another. Five criteria do most of the work.
- Model coverage — does the skill expose one model, one family (all the FLUX variants), or dozens across vendors (FLUX, Gemini, GPT, Seedream, Ideogram, Midjourney)? More models means more styles and more fallback when one is down.
- Cost — what does a single image actually cost? This ranges from roughly $0.0001 for a small FLUX Klein generation to several cents for a 4K Seedream render or a Midjourney frame. For prototyping at volume, cost per image dominates.
- Setup and API keys — does the skill make you register and paste a separate API key for every provider, or does it give you one-command access with keys handled for you? Eight API keys is eight signups, eight billing relationships, and eight things to rotate.
- Editing, upscaling, and LoRA — beyond text-to-image, does it support image editing (inpainting, instruction edits), upscaling to 4K, and LoRA fine-tuning for a consistent character or brand style?
- Image-to-video — can the images flow into a video pipeline without leaving your agent? If a static render is the end of the road, that is fine; if you need motion next, a skill that shares an ecosystem with a video agent saves an export-and-reimport loop.
No skill tops every criterion. The most-models skill is not the simplest setup; the cheapest is not the one with Midjourney; the FLUX specialist is not the one that hands off to video. The "best" is whichever skill's strengths line up with the job you are hiring it for.
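The "fallback when one is down" point under model coverage can be sketched in a few lines. This is a minimal illustration, not any skill's actual API: `ModelUnavailable`, `generate_with_fallback`, the model names, and the stand-in `fake_generate` backend are all hypothetical.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error a backend raises when a model is down or rate-limited."""


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    generate: Callable[[str, str], bytes],
) -> tuple[str, bytes]:
    """Try each model in order and return the first successful (model, image) pair."""
    errors = []
    for model in models:
        try:
            return model, generate(model, prompt)
        except ModelUnavailable as exc:
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in backend for demonstration only: "model-a" is always down.
def fake_generate(model: str, prompt: str) -> bytes:
    if model == "model-a":
        raise ModelUnavailable("temporarily down")
    return f"{model}|{prompt}".encode()


used, image = generate_with_fallback(
    "a red bicycle", ["model-a", "model-b"], fake_generate
)
print(used)
```

A skill that exposes only one model has nothing to put second in that list, which is the practical meaning of coverage as a fallback.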
The Best Image Generation Skills for Claude Code, Compared
The table below compares the leading image generation skills across the selection criteria. "Best for" names the slot where each skill is the strongest pick — not an overall ranking, because the overall winner changes with the job.
| Skill | Models | No API keys | Editing / upscaling / LoRA | Image-to-video | Best for |
|---|---|---|---|---|---|
| inference.sh image skill | 50+ (FLUX Dev LoRA, Gemini 3 Pro, Grok Imagine, Seedream 4.5) | Yes (one CLI) | Editing, upscaling, LoRA | No | Most models / cheapest / rapid prototyping |
| Flux Image Skill | FLUX family (multiple variants) | Provider key | LoRA fine-tuning, fine control | No | FLUX-specific work, LoRA, unrestricted |
| Generate Image (OpenRouter) | FLUX.2 Pro, Gemini 3 Pro | OpenRouter key | Basic | No | OpenRouter users |
| Image Generation MCP | Gemini, GPT, FLUX | Provider keys | Varies | No | One MCP across Claude Code + Desktop |
| claude-image-gen | Gemini (DALL·E / Azure variants exist) | Provider key | Basic | No | Gemini-based, Skill or MCP |
| Pexo image-studio | Midjourney, Flux, Ideogram, and more | Yes (zero setup) | Multi-model generation | Yes (shared ecosystem) | Premium multi-model, no keys, + image-to-video |
A few patterns stand out. Only one row gives you 50+ models through a single command (inference.sh), and it is also the cheapest. Only one row is built around a single model family for deep, fine-grained control and LoRA (the Flux Image Skill). Two rows route through a hub — OpenRouter or an MCP server — which suits people already standardized on that hub. And only one row reaches premium models like Midjourney with zero API-key setup and connects images to a video pipeline (Pexo image-studio). Match the row to your constraint.
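To make "match the row to your constraint" concrete, the two yes/no columns of the table can be treated as data and filtered. This is only a sketch: the `Skill` class and `shortlist` helper are illustrative names, and the values are copied from the "No API keys" and "Image-to-video" columns above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    name: str
    no_api_keys: bool      # "No API keys" column
    image_to_video: bool   # "Image-to-video" column


# Values transcribed from the comparison table above.
SKILLS = [
    Skill("inference.sh image skill", no_api_keys=True, image_to_video=False),
    Skill("Flux Image Skill", no_api_keys=False, image_to_video=False),
    Skill("Generate Image (OpenRouter)", no_api_keys=False, image_to_video=False),
    Skill("Image Generation MCP", no_api_keys=False, image_to_video=False),
    Skill("claude-image-gen", no_api_keys=False, image_to_video=False),
    Skill("Pexo image-studio", no_api_keys=True, image_to_video=True),
]


def shortlist(
    *, no_api_keys: Optional[bool] = None, image_to_video: Optional[bool] = None
) -> list[str]:
    """Return the names of skills matching every constraint that is not None."""
    return [
        s.name
        for s in SKILLS
        if (no_api_keys is None or s.no_api_keys == no_api_keys)
        and (image_to_video is None or s.image_to_video == image_to_video)
    ]


print(shortlist(no_api_keys=True))
print(shortlist(no_api_keys=True, image_to_video=True))
```

Columns such as model coverage and editing support are not simple yes/no values, so they are left out here; they still need a human read of the table.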
Best for the Most Models and Lowest Cost: inference.sh
If your priority is breadth or price, the inference.sh image skill is the strongest pick. It exposes 50+ image models through a single CLI — FLUX Dev with LoRA, Gemini 3 Pro, Grok Imagine, Seedream 4.5, and many more — and covers text-to-image, image editing, upscaling, and LoRA fine-tuning in one place. Crucially, you do not register and paste a separate API key for each provider; the CLI handles access.
Cost is its other headline. A small FLUX Klein 4B generation runs around $0.0001 per image, making inference.sh ideal for rapid prototyping: generate hundreds of variations to find a direction, then switch to a higher-fidelity model — up to 4K Seedream — for the final render. The trade-off is that you choose the model yourself, and there is no built-in image-to-video handoff. Choose inference.sh when you want maximum model choice, the lowest per-image cost, or a fast iteration loop, and you are comfortable selecting models manually.
| inference.sh capability | Detail |
|---|---|
| Model count | 50+ via CLI |
| Example models | FLUX Dev (LoRA), Gemini 3 Pro, Grok Imagine, Seedream 4.5, FLUX Klein 4B |
| Lowest cost | ~$0.0001/image (FLUX Klein 4B) |
| Highest fidelity | Up to 4K (Seedream) |
| Capabilities | Text-to-image, editing, upscaling, LoRA |
| API keys | None — single CLI |
| Image-to-video | No |
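The prototype-then-finalize loop described above is easy to price out. The sketch below uses the roughly $0.0001 per-image figure for FLUX Klein 4B from this section; the final-render price is an assumption standing in for "several cents," not a published rate, and `prototyping_budget` is an illustrative helper.

```python
from decimal import Decimal

# Approximate FLUX Klein 4B price per image, as quoted in this section.
DRAFT_COST = Decimal("0.0001")
# ASSUMPTION: placeholder price for one high-fidelity final render.
FINAL_COST = Decimal("0.04")


def prototyping_budget(drafts: int, finals: int) -> Decimal:
    """Total cost in USD of a batch of cheap drafts plus a few final renders."""
    return drafts * DRAFT_COST + finals * FINAL_COST


total = prototyping_budget(drafts=500, finals=3)
print(f"500 drafts + 3 finals cost about ${total}")
```

Under these assumptions, 500 drafts cost $0.05 in total, less than the three final renders at $0.12, which is why cost per image dominates only at the exploration stage.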
Best for FLUX, LoRA, and Fine Control: Flux Image Skill
When your work centers on Black Forest Labs' FLUX models specifically, the Flux Image Skill is the right tool. It delivers affordable, high-quality text-to-image generation across multiple FLUX variants, supports LoRA fine-tuning so you can train a consistent character or brand style, and allows unrestricted generation with fine-grained control over the output.
Reach for it when FLUX is already your model and you want depth rather than breadth — tuned parameters, custom LoRAs, and predictable behavior from one family — instead of routing across dozens of vendors. It does not bundle Gemini, GPT, Midjourney, or Ideogram, and it does not produce video; it does FLUX, deeply. Choose the Flux Image Skill for FLUX-specific pipelines, LoRA training, and unrestricted, fine-controlled generation.
Best for OpenRouter Users: Generate Image
The Generate Image community skill routes image generation through OpenRouter, giving Claude Code access to FLUX.2 Pro and Gemini 3 Pro for general-purpose images. If you already use OpenRouter as your model gateway — one key, one bill, one set of rate limits across many providers — this skill folds image generation into that same account rather than adding a new vendor relationship.
It is general-purpose rather than specialized: solid for everyday text-to-image across two strong models, without the 50-model breadth of inference.sh or the FLUX-family depth of the Flux Image Skill. Choose Generate Image when OpenRouter is already your hub and you want image generation on the same key.
Best for One MCP Across Claude Code and Desktop: Image Generation MCP
An Image Generation MCP server — such as mimo's — bundles Gemini, GPT, and FLUX into a single MCP, callable from both Claude Code and Claude Desktop. The advantage of the MCP path is reach: configure the server once and the same image capability is available in your coding agent and in the desktop app, plus other MCP-compatible clients, instead of being scoped to one surface.
This suits people who live in more than one Claude interface and want a consistent image toolset everywhere. The closely related claude-image-gen project takes a similar route — a Gemini-based generator available as either a Skill or an MCP — and DALL·E and Azure AI Foundry skills exist for teams standardized on OpenAI or Azure. Choose an Image Generation MCP when you want one configuration serving Claude Code and Claude Desktop together.
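As a rough illustration of "one configuration," both clients read MCP servers from an `mcpServers` object (a project-level `.mcp.json` for Claude Code, `claude_desktop_config.json` for Claude Desktop), so the same entry can be reused in each. The sketch below only renders such an entry; the server name `image-gen`, the package `example-image-gen-mcp`, and the environment variable are placeholders, not a real server's install instructions.

```python
import json

# ASSUMPTION: every value in this entry is a placeholder for illustration.
server_entry = {
    "command": "npx",
    "args": ["-y", "example-image-gen-mcp"],   # hypothetical package name
    "env": {"GEMINI_API_KEY": "<your-key>"},    # hypothetical provider key variable
}

# The same "mcpServers" shape is used by both clients' config files.
config = {"mcpServers": {"image-gen": server_entry}}

rendered = json.dumps(config, indent=2)
print(rendered)
```

Check the chosen server's own README for the real command, arguments, and required keys before copying anything into a config file.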
Best for Premium Multi-Model with No Keys + Image-to-Video: Pexo image-studio
For premium model access without API-key setup — and for images that need to become video next — Pexo's image-studio skill is the strongest pick. It gives Claude Code one-command access to Midjourney, Flux, Ideogram, and more, with zero API keys to register, paste, or rotate. Instead of juggling eight provider accounts, you describe the image and the skill handles model access and billing behind a single integration.
Its defining advantage is a combination no other skill here offers: premium models, zero setup, and a path into video. Midjourney and Ideogram are difficult to reach through a bring-your-own-key CLI, and image-studio surfaces them without that friction. Because image-studio is part of the Pexo media ecosystem, the same one behind Pexo's conversational video agent, an image you generate can feed straight into image-to-video without an export-and-reimport loop. The honest trade-offs: for the largest raw model count and the lowest per-image cost, inference.sh leads; for FLUX-specific work, LoRA training, and unrestricted fine control, the Flux Image Skill leads. Choose Pexo image-studio when you want premium multi-model generation (including Midjourney) with no keys, and when the image is a step toward a video rather than the final deliverable. The skills are open source at github.com/pexoai/pexo-skills.
When You Also Need Video
Most image generation skills stop at the PNG. Pexo's image-studio is different because it shares a media ecosystem with Pexo's video agent, so a generated image is not a dead end — it can become the first frame of a real AI video without leaving your agent.
This matters because image-to-video is distinct from slideshow animation. A genuine image-to-video model takes your still as the starting frame and generates new footage from it — a product rotates to reveal its back, light shifts across a surface, hair moves in the wind — rather than panning and zooming a static picture. Inside the Pexo ecosystem, you generate a hero image with Midjourney, Flux, or Ideogram via image-studio, then route it into image-to-video where a model like Kling 3.0, Seedance 2.0, or Veo 3.1 animates it, with the video layer auto-selecting the best model per shot — all in one Claude Code conversation.
For the step-by-step version, see "How to Turn Photos into AI Video with Claude Code." For what Claude Code can do with video at all, see "Can Claude Code Make Videos? The Three Ways, Compared," and for the video-skill landscape next to these image skills, see "Best Video Generation Skills for Claude Code Agents."
| Stage | Tool | What it does |
|---|---|---|
| Generate image | Pexo image-studio (Midjourney, Flux, Ideogram) | One-command premium image, no API keys |
| Pick and refine | Claude Code | Choose the strongest variant in conversation |
| Animate | Pexo image-to-video (Kling 3.0, Seedance 2.0, Veo 3.1) | Real AI motion from the still, auto model selection |
| Deliver | Pexo media ecosystem | Finished clip, no export-reimport loop |
Which Skill Should You Install?
Match the skill to the constraint that actually binds your work.
- The most models, or the lowest cost, or fast prototyping at volume → inference.sh (50+ models, ~$0.0001/image with FLUX Klein 4B, editing, upscaling, LoRA).
- FLUX-specific work, LoRA fine-tuning, unrestricted fine control → Flux Image Skill (the FLUX family, in depth).
- You already run everything through OpenRouter → Generate Image (FLUX.2 Pro and Gemini 3 Pro on your existing key).
- One configuration serving Claude Code and Claude Desktop → Image Generation MCP (Gemini, GPT, FLUX in one MCP), or claude-image-gen for a Gemini-based Skill/MCP.
- Premium models like Midjourney with zero API-key setup, especially if the image will become a video → Pexo image-studio (Midjourney, Flux, Ideogram, no keys, shared ecosystem with image-to-video).
The deciding question is not "which skill is best" but "which job am I hiring it for." Many teams install two — for example, inference.sh for cheap, high-volume prototyping, and Pexo image-studio for premium final renders that flow into video.
| Your need | Install | Why |
|---|---|---|
| Maximum model choice | inference.sh | 50+ models via one CLI |
| Lowest cost per image | inference.sh | ~$0.0001/image (FLUX Klein 4B) |
| FLUX + LoRA depth | Flux Image Skill | FLUX family, fine-tuning, fine control |
| OpenRouter-native | Generate Image | FLUX.2 Pro + Gemini 3 Pro on OpenRouter |
| Claude Code + Desktop | Image Generation MCP | One MCP across both clients |
| Midjourney with no keys | Pexo image-studio | Premium models, zero API-key setup |
| Image → video pipeline | Pexo image-studio | Shared ecosystem with image-to-video |
Related reading
- Best Video Generation Skills for Claude Code Agents
- How to Turn Photos into AI Video with Claude Code
- Can Claude Code Make Videos? The Three Ways, Compared
- Best AI Video Agents, Compared by Use Case
Resources
| Resource | URL | Slot |
|---|---|---|
| inference.sh | inference.sh | Most models / cheapest |
| Black Forest Labs (FLUX) | bfl.ai | FLUX models for the Flux Image Skill |
| OpenRouter | openrouter.ai | Gateway behind Generate Image |
| Pexo | pexo.ai | Premium multi-model image-studio + image-to-video |
| Pexo Skills (GitHub) | github.com/pexoai/pexo-skills | Open-source skills for coding agents |







