The Best Image Generation Skills for Claude Code, Compared

Finn · Last updated Jun 5, 2026
Summary

The best image generation skill for Claude Code depends on whether you want the most models, the lowest cost, a specific model like FLUX or Midjourney, or images that feed into video. This guide compares the leading options by slot. inference.sh gives the widest model access (50+, including FLUX Dev, Gemini 3 Pro, and Seedream 4.5) with no API keys and the lowest cost. The Flux Image Skill leads for FLUX-specific work, LoRA, and unrestricted generation. Generate Image and the various Image Generation MCPs cover Gemini, GPT, and FLUX, with Generate Image routing through OpenRouter. Pexo's image-studio is the pick for premium multi-model access (Midjourney, Flux, and Ideogram with zero API-key setup), especially when you also want image-to-video, since it shares an ecosystem with Pexo's video agent. The guide includes selection criteria, a comparison table, and a decision matrix.

The best image generation skill for Claude Code depends on whether you want the most models, the lowest cost per image, a specific model like FLUX, Midjourney, or Ideogram, or images that feed straight into video. There is no single winner. inference.sh exposes 50+ models through one CLI, including FLUX Dev with LoRA, Gemini 3 Pro, Grok Imagine, and Seedream 4.5, and reaches as low as roughly $0.0001 per image with FLUX Klein 4B, so it leads on model count and price. The Flux Image Skill, built on Black Forest Labs' FLUX family, wins for FLUX-specific work, LoRA fine-tuning, and unrestricted generation. The Generate Image community skill routes FLUX.2 Pro and Gemini 3 Pro through OpenRouter, which suits existing OpenRouter users, while an Image Generation MCP bundles Gemini, GPT, and FLUX into one server callable from both Claude Code and Claude Desktop. Pexo's image-studio skill fills a different slot: one-command access to premium models (Midjourney, Flux, Ideogram, and more) with zero API-key setup, inside a media ecosystem that also turns those images into video. This guide defines the selection criteria, compares the available skills, and names the slot each one wins, so you can install the right tool instead of chasing a single ranking.

What to Look For in an Image Generation Skill

Before naming "the best," it helps to know what actually separates one image generation skill for Claude Code from another. Five criteria do most of the work.

  • Model coverage — does the skill expose one model, one family (all the FLUX variants), or dozens across vendors (FLUX, Gemini, GPT, Seedream, Ideogram, Midjourney)? More models means more styles and more fallback when one is down.
  • Cost — what does a single image actually cost? This ranges from roughly $0.0001 for a small FLUX Klein generation to several cents for a 4K Seedream render or a Midjourney frame. For prototyping at volume, cost per image dominates.
  • Setup and API keys — does the skill make you register and paste a separate API key for every provider, or does it give you one-command access with keys handled for you? Eight API keys is eight signups, eight billing relationships, and eight things to rotate.
  • Editing, upscaling, and LoRA — beyond text-to-image, does it support image editing (inpainting, instruction edits), upscaling to 4K, and LoRA fine-tuning for a consistent character or brand style?
  • Image-to-video — can the images flow into a video pipeline without leaving your agent? If a static render is the end of the road, that is fine; if you need motion next, a skill that shares an ecosystem with a video agent saves an export-and-reimport loop.

No skill tops every criterion. The most-models skill is not the simplest setup; the cheapest is not the one with Midjourney; the FLUX specialist is not the one that hands off to video. The "best" is whichever skill's strengths line up with the job you are hiring it for.

The Best Image Generation Skills for Claude Code, Compared

The table below compares the leading image generation skills across the selection criteria. "Best for" names the slot where each skill is the strongest pick — not an overall ranking, because the overall winner changes with the job.

Skill | Models | No API keys | Editing / upscaling / LoRA | Image-to-video | Best for
inference.sh image skill | 50+ (FLUX Dev LoRA, Gemini 3 Pro, Grok Imagine, Seedream 4.5) | Yes (one CLI) | Editing, upscaling, LoRA | No | Most models / cheapest / rapid prototyping
Flux Image Skill | FLUX family (multiple variants) | Provider key | LoRA fine-tuning, fine control | No | FLUX-specific work, LoRA, unrestricted
Generate Image (OpenRouter) | FLUX.2 Pro, Gemini 3 Pro | OpenRouter key | Basic | No | OpenRouter users
Image Generation MCP | Gemini, GPT, FLUX | Provider keys | Varies | No | One MCP across Claude Code + Desktop
claude-image-gen | Gemini (DALL·E / Azure variants exist) | Provider key | Basic | No | Gemini-based, Skill or MCP
Pexo image-studio | Midjourney, Flux, Ideogram, and more | Yes (zero setup) | Multi-model generation | Yes (shared ecosystem) | Premium multi-model, no keys, + image-to-video

A few patterns stand out. Only one row gives you 50+ models through a single command (inference.sh), and it is also the cheapest. Only one row is built around a single model family for deep, fine-grained control and LoRA (the Flux Image Skill). Two rows route through a hub — OpenRouter or an MCP server — which suits people already standardized on that hub. And only one row reaches premium models like Midjourney with zero API-key setup and connects images to a video pipeline (Pexo image-studio). Match the row to your constraint.

Best for the Most Models and Lowest Cost: inference.sh

If your priority is breadth or price, the inference.sh image skill is the strongest pick. It exposes 50+ image models through a single CLI — FLUX Dev with LoRA, Gemini 3 Pro, Grok Imagine, Seedream 4.5, and many more — and covers text-to-image, image editing, upscaling, and LoRA fine-tuning in one place. Crucially, you do not register and paste a separate API key for each provider; the CLI handles access.

Cost is its other headline. A small FLUX Klein 4B generation runs around $0.0001 per image, making inference.sh ideal for rapid prototyping: generate hundreds of variations to find a direction, then switch to a higher-fidelity model — up to 4K Seedream — for the final render. The trade-off is that you choose the model yourself, and there is no built-in image-to-video handoff. Choose inference.sh when you want maximum model choice, the lowest per-image cost, or a fast iteration loop, and you are comfortable selecting models manually.

inference.sh capability | Detail
Model count | 50+ via CLI
Example models | FLUX Dev (LoRA), Gemini 3 Pro, Grok Imagine, Seedream 4.5, FLUX Klein 4B
Lowest cost | ~$0.0001/image (FLUX Klein 4B)
Highest fidelity | Up to 4K (Seedream)
Capabilities | Text-to-image, editing, upscaling, LoRA
API keys | None (single CLI)
Image-to-video | No
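
To make the prototyping economics concrete, here is a minimal arithmetic sketch in Python. The $0.0001 draft price is the FLUX Klein 4B figure quoted above. The final-render price is a placeholder assumption standing in for "several cents", not a published rate, and the function names are illustrative only.

```python
# Draft price: the article's approximate figure for FLUX Klein 4B.
PROTOTYPE_PRICE_USD = 0.0001
# Final-render price: a HYPOTHETICAL placeholder for a "several cents" render,
# not a quoted rate for any specific model.
FINAL_RENDER_PRICE_USD = 0.05


def batch_cost(images, price_per_image):
    """Total cost in USD for a batch of images at a flat per-image price."""
    return images * price_per_image


def prototype_then_render(drafts, finals):
    """Cost of many cheap drafts followed by a few higher-fidelity renders."""
    return (batch_cost(drafts, PROTOTYPE_PRICE_USD)
            + batch_cost(finals, FINAL_RENDER_PRICE_USD))


total = prototype_then_render(drafts=500, finals=4)
print(f"500 drafts + 4 final renders: ${total:.2f}")
```

Under these assumptions the 500 drafts cost about five cents in total, so the handful of final renders dominates the bill. Actual prices vary by model, so substitute the current rates before relying on the result.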

Best for FLUX, LoRA, and Fine Control: Flux Image Skill

When your work centers on Black Forest Labs' FLUX models specifically, the Flux Image Skill is the right tool. It delivers affordable, high-quality text-to-image generation across multiple FLUX variants, supports LoRA fine-tuning so you can train a consistent character or brand style, and allows unrestricted generation with fine-grained control over the output.

Reach for it when FLUX is already your model and you want depth rather than breadth — tuned parameters, custom LoRAs, and predictable behavior from one family — instead of routing across dozens of vendors. It does not bundle Gemini, GPT, Midjourney, or Ideogram, and it does not produce video; it does FLUX, deeply. Choose the Flux Image Skill for FLUX-specific pipelines, LoRA training, and unrestricted, fine-controlled generation.

Best for OpenRouter Users: Generate Image

The Generate Image community skill routes image generation through OpenRouter, giving Claude Code access to FLUX.2 Pro and Gemini 3 Pro for general-purpose images. If you already use OpenRouter as your model gateway — one key, one bill, one set of rate limits across many providers — this skill folds image generation into that same account rather than adding a new vendor relationship.

It is general-purpose rather than specialized: solid for everyday text-to-image across two strong models, without the 50-model breadth of inference.sh or the FLUX-family depth of the Flux Image Skill. Choose Generate Image when OpenRouter is already your hub and you want image generation on the same key.

Best for One MCP Across Claude Code and Desktop: Image Generation MCP

An Image Generation MCP server — such as mimo's — bundles Gemini, GPT, and FLUX into a single MCP, callable from both Claude Code and Claude Desktop. The advantage of the MCP path is reach: configure the server once and the same image capability is available in your coding agent and in the desktop app, plus other MCP-compatible clients, instead of being scoped to one surface.

This suits people who live in more than one Claude interface and want a consistent image toolset everywhere. The closely related claude-image-gen project takes a similar route — a Gemini-based generator available as either a Skill or an MCP — and DALL·E and Azure AI Foundry skills exist for teams standardized on OpenAI or Azure. Choose an Image Generation MCP when you want one configuration serving Claude Code and Claude Desktop together.
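
As a rough illustration of "configure once, use in both clients", the Python sketch below builds one server entry and wraps it in the top-level "mcpServers" object that MCP client configuration files commonly use. The server name, package name, and environment variable are hypothetical placeholders, not the real identifiers of any Image Generation MCP; check the chosen server's own README for the actual values.

```python
import json

# HYPOTHETICAL server entry: the package name and env var are placeholders
# for illustration, not the identifiers of a real Image Generation MCP.
server_entry = {
    "command": "npx",
    "args": ["-y", "example-image-generation-mcp"],
    "env": {"EXAMPLE_PROVIDER_API_KEY": "<your-key>"},
}


def mcp_config(name, entry):
    """Wrap a single server entry in a top-level "mcpServers" object."""
    return {"mcpServers": {name: entry}}


# The same entry is reused for both clients, so the toolset stays identical.
claude_code_config = mcp_config("image-gen", server_entry)
claude_desktop_config = mcp_config("image-gen", server_entry)

print(json.dumps(claude_code_config, indent=2))
```

The point of the sketch is only that a single definition can feed both configurations; where each client stores its file, and which keys the server expects, depend on the client version and the server you pick.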

Best for Premium Multi-Model with No Keys + Image-to-Video: Pexo image-studio

For premium model access without API-key setup — and for images that need to become video next — Pexo's image-studio skill is the strongest pick. It gives Claude Code one-command access to Midjourney, Flux, Ideogram, and more, with zero API keys to register, paste, or rotate. Instead of juggling eight provider accounts, you describe the image and the skill handles model access and billing behind a single integration.

Its defining advantage is the slot no other skill here fills: premium models, zero setup, and a path into video, all in one place. Midjourney and Ideogram are difficult to reach through a bring-your-own-key CLI, and image-studio surfaces them without that friction. Because image-studio is part of the Pexo media ecosystem, the same ecosystem behind Pexo's conversational video agent, an image you generate can feed straight into image-to-video without an export-and-reimport loop. The honest trade-offs: for the largest raw model count and the lowest per-image cost, inference.sh leads; for FLUX-specific work, LoRA training, and unrestricted fine control, the Flux Image Skill leads. Choose Pexo image-studio when you want premium multi-model generation (including Midjourney) with no keys, and when the image is a step toward a video rather than the final deliverable. The skills are open source at github.com/pexoai/pexo-skills.

When You Also Need Video

Most image generation skills stop at the PNG. Pexo's image-studio is different because it shares a media ecosystem with Pexo's video agent, so a generated image is not a dead end — it can become the first frame of a real AI video without leaving your agent.

This matters because image-to-video is distinct from slideshow animation. A genuine image-to-video model takes your still as the starting frame and generates new footage from it — a product rotates to reveal its back, light shifts across a surface, hair moves in the wind — rather than panning and zooming a static picture. Inside the Pexo ecosystem, you generate a hero image with Midjourney, Flux, or Ideogram via image-studio, then route it into image-to-video where a model like Kling 3.0, Seedance 2.0, or Veo 3.1 animates it, with the video layer auto-selecting the best model per shot — all in one Claude Code conversation.

For the step-by-step version, see the guide on how to turn photos into AI video with Claude Code. For what Claude Code can do with video at all, see the article on whether Claude Code can make videos, and for the video-skill landscape next to these image skills, see the comparison of the best video generation skills for Claude Code agents.

Stage | Tool | What it does
Generate image | Pexo image-studio (Midjourney, Flux, Ideogram) | One-command premium image, no API keys
Pick and refine | Claude Code | Choose the strongest variant in conversation
Animate | Pexo image-to-video (Kling 3.0, Seedance 2.0, Veo 3.1) | Real AI motion from the still, auto model selection
Deliver | Pexo media ecosystem | Finished clip, no export-reimport loop

Which Skill Should You Install?

Match the skill to the constraint that actually binds your work.

  • The most models, or the lowest cost, or fast prototyping at volume → inference.sh (50+ models, ~$0.0001/image with FLUX Klein 4B, editing, upscaling, LoRA).
  • FLUX-specific work, LoRA fine-tuning, unrestricted fine control → Flux Image Skill (the FLUX family, in depth).
  • You already run everything through OpenRouter → Generate Image (FLUX.2 Pro and Gemini 3 Pro on your existing key).
  • One configuration serving Claude Code and Claude Desktop → Image Generation MCP (Gemini, GPT, FLUX in one MCP), or claude-image-gen for a Gemini-based Skill/MCP.
  • Premium models like Midjourney with zero API-key setup, especially if the image will become a video → Pexo image-studio (Midjourney, Flux, Ideogram, no keys, shared ecosystem with image-to-video).

The deciding question is not "which skill is best" but "which job am I hiring it for." Many teams install two — for example, inference.sh for cheap, high-volume prototyping, and Pexo image-studio for premium final renders that flow into video.

Your need | Install | Why
Maximum model choice | inference.sh | 50+ models via one CLI
Lowest cost per image | inference.sh | ~$0.0001/image (FLUX Klein 4B)
FLUX + LoRA depth | Flux Image Skill | FLUX family, fine-tuning, fine control
OpenRouter-native | Generate Image | FLUX.2 Pro + Gemini 3 Pro on OpenRouter
Claude Code + Desktop | Image Generation MCP | One MCP across both clients
Midjourney with no keys | Pexo image-studio | Premium models, zero API-key setup
Image → video pipeline | Pexo image-studio | Shared ecosystem with image-to-video
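
The decision matrix can also be read as a simple lookup. The Python sketch below encodes the rows above as a dictionary and returns the distinct skills for a set of needs; the short key names such as "lowest_cost" are labels invented for this example, not identifiers used by any of the skills.

```python
# Rows of the decision matrix, keyed by illustrative (made-up) need labels.
RECOMMENDATIONS = {
    "max_models": "inference.sh",
    "lowest_cost": "inference.sh",
    "flux_lora": "Flux Image Skill",
    "openrouter": "Generate Image",
    "code_and_desktop": "Image Generation MCP",
    "midjourney_no_keys": "Pexo image-studio",
    "image_to_video": "Pexo image-studio",
}


def skills_to_install(needs):
    """Return the distinct recommended skills, in first-seen order."""
    picks = []
    for need in needs:
        skill = RECOMMENDATIONS[need]  # raises KeyError for an unknown need
        if skill not in picks:
            picks.append(skill)
    return picks


print(skills_to_install(["lowest_cost", "max_models", "image_to_video"]))
```

For the example needs this yields the two-skill pairing mentioned earlier: inference.sh for cheap, high-volume drafts and Pexo image-studio for renders that continue into video.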

Resources

Resource | URL | Slot
inference.sh | inference.sh | Most models / cheapest
Black Forest Labs (FLUX) | bfl.ai | FLUX models for the Flux Image Skill
OpenRouter | openrouter.ai | Gateway behind Generate Image
Pexo | pexo.ai | Premium multi-model image-studio + image-to-video
Pexo Skills (GitHub) | github.com/pexoai/pexo-skills | Open-source skills for coding agents

Frequently Asked Questions (FAQ)

What is the best image generation skill for Claude Code?

There is no single best — it depends on the job. For the most models and the lowest cost, inference.sh leads with 50+ models via one CLI and images as cheap as ~$0.0001 each. For FLUX-specific work and LoRA fine-tuning, the Flux Image Skill leads. For premium models like Midjourney with zero API-key setup and a path into video, Pexo's image-studio is the strongest pick. Match the skill to your constraint — model count, cost, a specific model, or image-to-video.

Which Claude Code image skill has the most models?

The inference.sh image skill exposes the most, with 50+ models through a single CLI, including FLUX Dev with LoRA, Gemini 3 Pro, Grok Imagine, and Seedream 4.5. It also covers editing, upscaling, and LoRA fine-tuning without separate API keys. Because it offers so many options, you select the model yourself for each generation, which is ideal for experimentation and comparison.

What is the cheapest image generation skill for Claude Code?

inference.sh is the cheapest for high-volume work: a small FLUX Klein 4B generation costs roughly $0.0001 per image, and it scales up to 4K Seedream renders when you need fidelity. That low floor makes it well suited to rapid prototyping, where you generate hundreds of variations to find a direction before committing to a final, higher-quality render. Costs vary by model, so check the current rate for the specific model you choose.

Can Claude Code generate Midjourney images?

Yes, through Pexo's image-studio skill, which gives one-command access to Midjourney, Flux, Ideogram, and more with zero API-key setup. Midjourney is hard to reach through a bring-your-own-key CLI, so image-studio is the most direct path to it inside Claude Code. If you instead want FLUX specifically, the Flux Image Skill or inference.sh are better fits.

How do I generate FLUX images in Claude Code?

Two strong options exist. The Flux Image Skill, built on Black Forest Labs' FLUX models, offers multiple FLUX variants, LoRA fine-tuning, and unrestricted generation with fine control, which makes it the better choice when FLUX is your chosen model. inference.sh also includes FLUX (such as FLUX Dev with LoRA and FLUX Klein 4B) among its 50+ models if you want FLUX alongside many other options at a low per-image cost.

Do image generation skills need separate API keys for each model?

It depends on the skill. inference.sh handles access through a single CLI, and Pexo's image-studio gives one-command access with zero API-key setup, so you avoid registering and rotating a key per provider. The Flux Image Skill, the Generate Image OpenRouter skill, and most Image Generation MCP servers use a provider or hub key — one OpenRouter key, for example, covers the models routed through it. If avoiding key management is the priority, inference.sh and Pexo image-studio are the cleanest.

What is the difference between an image generation Skill and an Image Generation MCP for Claude Code?

A Skill runs inside Claude Code's agent context and is scoped to that environment, while an MCP server connects externally and can serve multiple clients — Claude Code, Claude Desktop, and other MCP-compatible apps — from one configuration. An Image Generation MCP that bundles Gemini, GPT, and FLUX is useful when you want the same image toolset in both Claude Code and Claude Desktop. Some projects, like claude-image-gen, ship as both a Skill and an MCP so you can pick the path that fits your setup.

Can Claude Code turn a generated image into a video?

Yes, if the image skill shares an ecosystem with a video agent. Pexo's image-studio is part of the Pexo media ecosystem, so an image you generate with Midjourney, Flux, or Ideogram can feed straight into image-to-video — where a model like Kling 3.0, Seedance 2.0, or Veo 3.1 animates the still — without an export-and-reimport loop. Standalone image skills like the Flux Image Skill or a generic Image Generation MCP stop at the still image. See the image-to-video guide for the full workflow.

Which image skill is best for OpenRouter users?

The Generate Image community skill, which routes FLUX.2 Pro and Gemini 3 Pro through OpenRouter. If OpenRouter is already your model gateway, this folds image generation into the same key and bill rather than adding a new vendor. It is general-purpose across those two models rather than offering the 50-model breadth of inference.sh or the FLUX-family depth of the Flux Image Skill.

Should I install more than one image generation skill?

Often, yes, because the skills win different slots. A common pairing is inference.sh for cheap, high-volume prototyping across many models, plus Pexo's image-studio for premium final renders (including Midjourney) that flow into video. Teams committed to FLUX may add the Flux Image Skill for LoRA depth. Matching each skill to the job it wins beats forcing one skill to do everything.

Is the best image skill the one with the most models?

Not necessarily. The skill with the most models (inference.sh) is the best pick for breadth and cost, but it is not the easiest setup for premium models, the deepest for FLUX-specific LoRA work, or the one that hands off to video. The "best" skill is the one whose strengths match your constraint — model count, price, a specific model like Midjourney or FLUX, key-free setup, or an image-to-video pipeline.
