Integrating AI image generation.
Categories, costs, and what to evaluate. For developers and product teams building image generation into an application.
Four categories of image-generation API.
First-party vendor APIs. The model owner exposes the model through their own API. OpenAI Images, Adobe Firefly, Stability, Ideogram, Black Forest Labs (Flux), Midjourney's recent API tier. The advantage: latest model versions land here first, support is direct, enterprise terms are negotiated with the model owner. The disadvantage: each vendor is a separate integration with separate billing.
Hosted aggregators. Replicate, Fal.ai, Hugging Face Inference Endpoints. A single API surface across many open-weight models (and increasingly some closed partner models). Useful when you need to evaluate or deploy multiple models without building separate integrations. The aggregator operates the hosted endpoint, so its terms of service apply on top of each model's licence.
Self-hosted inference. Run open-weight models on your own GPUs: Stable Diffusion (any variant) or Flux (the variants whose licences permit it), served through tools like Diffusers, ComfyUI, AUTOMATIC1111, or vLLM-style inference servers. Maximum control, lowest per-image cost at high volume, highest operational burden.
Cloud inference platforms. AWS Bedrock, Azure AI Foundry, Google Vertex AI. Image-generation models hosted by the cloud providers behind their existing IAM, billing, and compliance frameworks. Useful when the rest of your stack is already on that cloud and procurement prefers consolidated vendors.
Evaluation criteria for an image-generation API.
Cost per image at your volume.
Pricing may be per image, per megapixel, per step, or per second. Calculate against your actual expected usage; vendor headline pricing rarely tells the full story.
Latency (p50 / p95).
Production UX requires predictable response times. Some models return in 1-2 s, others take 10-30 s. Streaming progress updates help but don't replace fast inference.
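A minimal sketch of what to measure, assuming you record per-request latencies client-side rather than trusting vendor-reported numbers (the sample values are invented):

```python
import statistics

def latency_percentiles(latencies_ms):
    """Return (p50, p95) from a list of observed request latencies in ms."""
    # statistics.quantiles with n=100 yields the 1st..99th percentile cuts.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return cuts[49], cuts[94]  # p50, p95

# Hypothetical sample: most requests ~2 s, with a slow tail at 10-30 s.
sample = [1800, 2100, 1950, 2300, 12000, 2050, 28000, 2200, 1900, 2000]
p50, p95 = latency_percentiles(sample)
```

The p50/p95 gap is the number to watch: a model with a fast median but a long tail still needs a progress UI or a timeout-and-retry path.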
Rate limits and burst handling.
Headline rate limits and burst tolerance. Dedicated capacity tiers (provisioned throughput, dedicated endpoints) for production.
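When you do hit the limit, clients usually retry with exponential backoff and jitter. A generic sketch; the status codes and delays are illustrative, not any vendor's documented behaviour:

```python
import random
import time

def with_backoff(send, max_retries=5, base_delay=1.0):
    """Retry a request on 429/5xx with exponential backoff and full jitter.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute -- a stand-in for your HTTP client call.
    """
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code not in (429, 500, 502, 503):
            return resp
        # Full jitter: sleep between 0 and base * 2^attempt seconds.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return send()  # final attempt; let the caller handle the failure
```

Dedicated capacity tiers mostly exist so you never reach this code path at production volume.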
Moderation and content safety.
Vendor-side filters that reduce inappropriate output. Sometimes a separate API call, sometimes built into the generation endpoint. Evaluate them alongside your own moderation layer.
SDKs and ecosystem.
Official Python and JS clients. Type definitions. Observability hooks for tracing latency and costs.
Webhook delivery vs polling.
Long-running generations work better with webhooks than polling. Some APIs only offer polling; some require webhook receivers.
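Where only polling is offered, a bounded polling loop is the usual shape. A sketch assuming a hypothetical job-status callable and status vocabulary; substitute your vendor's actual endpoint and states:

```python
import time

def poll_until_done(get_status, timeout_s=120, interval_s=2.0):
    """Poll a job-status callable until the job reaches a terminal state.

    `get_status` stands in for the vendor's job-status endpoint; it is
    assumed to return a dict with at least a 'status' key.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = get_status()
        if job["status"] in ("succeeded", "failed", "canceled"):
            return job
        time.sleep(interval_s)
    raise TimeoutError("generation did not finish within the timeout")
```

The timeout matters as much as the interval: without it, a stuck job pins a worker indefinitely, which is exactly the failure mode webhooks avoid.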
Batch API availability.
Higher throughput at lower cost for bulk generation. OpenAI batch, Replicate's queue endpoints, etc.
Model update cadence.
When the vendor replaces or updates a model, your outputs change. Versioning matters: pinning to a specific model version protects your application from silent shifts.
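The pinning pattern can be sketched as follows; the model identifiers and tag format here are hypothetical, not any vendor's actual naming scheme:

```python
# A floating alias follows whatever the vendor currently serves;
# a pinned identifier keeps outputs stable until you choose to migrate.
FLOATING_MODEL = "vendor/image-model"            # moves silently
PINNED_MODEL = "vendor/image-model:2026-01-15"   # explicit version tag

def request_payload(prompt, pinned=True):
    """Build a generation request body (shape is illustrative)."""
    return {
        "model": PINNED_MODEL if pinned else FLOATING_MODEL,
        "prompt": prompt,
    }
```

Pin in production, float in a staging environment, and diff outputs between the two before migrating the pin.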
Cost at scale.
Per-image cost in 2026 typically falls into three ranges depending on model class and resolution. Lightweight models (small SD-class, distilled Flux variants) cost fractions of a cent at hosted-aggregator rates; high-resolution proprietary models can run to tens of cents per image at native quality. Self-hosting changes the equation entirely: you pay for GPU hours, electricity, and DevOps time, and at high utilisation the marginal per-image cost approaches the cost of electricity.
We don't publish vendor prices. Calculate your own from the vendor's pricing page with this template:
avg_resolution = ___ (in megapixels)
avg_steps = ___
vendor_pricing_unit = ___ (per_image | per_megapixel | per_step)
vendor_unit_cost = $___
if per_image: monthly_cost = images_per_day * 30 * vendor_unit_cost
if per_megapixel: monthly_cost = images_per_day * 30 * avg_resolution * vendor_unit_cost
if per_step: monthly_cost = images_per_day * 30 * avg_steps * vendor_unit_cost
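The template above as a runnable function; the example figures plugged in at the bottom are hypothetical, not any vendor's rates:

```python
def monthly_cost(images_per_day, unit_cost, pricing_unit,
                 avg_megapixels=1.0, avg_steps=30):
    """Estimate monthly spend from the vendor's pricing unit."""
    monthly_images = images_per_day * 30
    if pricing_unit == "per_image":
        return monthly_images * unit_cost
    if pricing_unit == "per_megapixel":
        return monthly_images * avg_megapixels * unit_cost
    if pricing_unit == "per_step":
        return monthly_images * avg_steps * unit_cost
    raise ValueError(f"unknown pricing unit: {pricing_unit}")

# Hypothetical: 2,000 images/day at $0.002 per megapixel, 1.05 MP average.
cost = monthly_cost(2000, 0.002, "per_megapixel", avg_megapixels=1.05)
```

Run it once per candidate vendor with the same usage inputs; the comparison is only meaningful when resolution and step count are held constant.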
The structural insight: at low volume (under a few thousand images per month) the choice barely matters. At medium volume (tens of thousands) the pricing structure matters more than the headline rate. At high volume (hundreds of thousands or more) the choice is usually between a negotiated enterprise contract and self-hosting; commodity API rates rarely scale economically.
Self-hosted cost modelling. A single A100 or H100 produces 0.5-2 images per second on a typical SDXL or Flux setup, depending on resolution and steps. Amortise the GPU rental or capital cost plus electricity over throughput; at high utilisation the per-image cost approaches the cost of electricity, on the order of $0.001-0.01 per image. DevOps overhead (roughly one engineer-day per week of maintenance) is the dominant operational cost.
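The amortisation above as a quick calculation; all figures are illustrative, so substitute your own measured throughput and GPU rate:

```python
def per_image_cost(gpu_hourly_usd, images_per_second, utilisation=0.7):
    """Amortise hourly GPU cost over sustained image throughput.

    gpu_hourly_usd: rental, or amortised capital plus electricity, per hour.
    images_per_second: measured throughput at your resolution and steps.
    utilisation: fraction of each hour the GPU is actually generating.
    """
    images_per_hour = images_per_second * 3600 * utilisation
    return gpu_hourly_usd / images_per_hour

# Hypothetical: $2.50/hr GPU, 1 image/s sustained, 70% utilisation.
cost = per_image_cost(2.50, 1.0, 0.7)
```

The utilisation term is the one teams get wrong: a GPU that sits idle overnight halves your effective utilisation and doubles the per-image figure.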
Moderation and content safety.
Every commercial deployment needs a moderation layer. Vendor-side safety filters reduce the rate of inappropriate outputs but do not eliminate them, and the vendor's threshold may differ from your application's. Build moderation into the request flow:
- Prompt-side moderation. Block clearly inappropriate prompts before they reach the generator. Cost-effective; catches the majority of misuse.
- Output-side moderation. Run the generated image through a classifier before returning to the user. Vendors expose moderation APIs (OpenAI Moderation, Google Cloud Vision SafeSearch, AWS Rekognition); open-source options include CLIP-based classifiers fine-tuned for safety.
- Audit logging. Log every prompt and the moderation decisions for compliance. Retention policy must consider both abuse investigation needs and privacy / data-minimisation requirements.
- Rate limiting per user. Prevent abuse of free or low-friction tiers. Per-user limits often catch problems faster than per-account limits.
Treat moderation as a layered defence rather than a single check. The vendor's safety filter, your prompt-side filter, your output-side classifier, and human review of flagged outputs each catch different failure modes.
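The layered flow can be sketched as a small pipeline in which every layer is pluggable; each callable here stands in for your own component or a vendor's moderation API:

```python
def moderate_request(prompt, generate, check_prompt, check_image, log):
    """Layered moderation sketch.

    check_prompt / check_image return True when content is allowed;
    log records every decision for the audit trail.
    """
    if not check_prompt(prompt):
        log(prompt, decision="blocked_prompt")
        return None
    image = generate(prompt)
    if not check_image(image):
        log(prompt, decision="blocked_output")
        return None
    log(prompt, decision="allowed")
    return image
```

Because each layer is a plain callable, you can swap a vendor filter for an in-house classifier, or add a human-review queue behind the output check, without touching the request flow.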
Directory of commonly-used APIs.
Not a ranking. Use the links to read the current docs and pricing on the vendor side.
| API | Category | Docs / Legal |
|---|---|---|
| OpenAI Images API | First-party closed | docs → legal → verified April 2026 |
| Google Vertex AI (Imagen) | First-party closed via cloud | docs → legal → verified April 2026 |
| Adobe Firefly API | First-party closed | docs → legal → verified April 2026 |
| Stability AI API | First-party (open-weight model) | docs → legal → verified April 2026 |
| Black Forest Labs (Flux) API | First-party (open-weight model) | docs → legal → verified April 2026 |
| Ideogram API | First-party closed | docs → legal → verified April 2026 |
| Replicate | Aggregator (many models) | docs → legal → verified April 2026 |
| Fal.ai | Aggregator (many models) | docs → legal → verified April 2026 |
| Hugging Face Inference Endpoints | Aggregator (any HF model) | docs → legal → verified April 2026 |
| AWS Bedrock | Cloud-hosted models | docs → legal → verified April 2026 |
| Azure AI Foundry | Cloud-hosted models | docs → legal → verified April 2026 |