
Start Your First Server

Deploy a diffusion model with one command:
hypergen serve stabilityai/stable-diffusion-xl-base-1.0
The server will start on http://localhost:8000. You’ll see:
INFO - Starting HyperGen server...
INFO - Model: stabilityai/stable-diffusion-xl-base-1.0
INFO - Device: cuda
INFO - Host: 0.0.0.0
INFO - Port: 8000
INFO - Initializing model worker...
INFO - Server ready!
The first run will download the model from HuggingFace, which may take several minutes.
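If you prefer to fetch the weights ahead of time, you can pre-download them with huggingface-cli (installed with the huggingface_hub package); HyperGen should then load them from the same local Hub cache:
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0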

Generate Images

Using OpenAI Python Client

Install the OpenAI client if you don’t have it:
pip install openai pillow
Then generate images:
from openai import OpenAI
import base64
from pathlib import Path

# Create client
client = OpenAI(
    api_key="not-needed",  # No auth by default
    base_url="http://localhost:8000/v1"
)

# Generate images
response = client.images.generate(
    model="sdxl",
    prompt="A cat holding a sign that says hello world",
    n=2,
    size="1024x1024",
    response_format="b64_json"
)

# Save images
for i, img_data in enumerate(response.data):
    img_bytes = base64.b64decode(img_data.b64_json)
    Path(f"output_{i}.png").write_bytes(img_bytes)
    print(f"Saved output_{i}.png")

Using cURL

Test with cURL:
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene mountain landscape",
    "n": 1,
    "size": "1024x1024"
  }'
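To save the image directly from the shell, request base64 output and decode it; a sketch assuming jq and GNU base64 are installed:
curl -s http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene mountain landscape",
    "n": 1,
    "size": "1024x1024",
    "response_format": "b64_json"
  }' | jq -r '.data[0].b64_json' | base64 --decode > landscape.png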

Using Requests

import requests
import base64

response = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={
        "prompt": "A sunset over the ocean",
        "n": 1,
        "size": "1024x1024",
        "response_format": "b64_json",
        "num_inference_steps": 30,
        "guidance_scale": 7.5
    }
)

data = response.json()

# Save first image
img_bytes = base64.b64decode(data["data"][0]["b64_json"])
with open("sunset.png", "wb") as f:
    f.write(img_bytes)
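In scripts, it helps to fail loudly on HTTP errors and to allow a generous timeout, since generation can take a while; a minimal variant of the request above:
import base64
import requests

response = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={
        "prompt": "A sunset over the ocean",
        "n": 1,
        "size": "1024x1024",
        "response_format": "b64_json"
    },
    timeout=300  # Generation can be slow, especially on the first request
)
response.raise_for_status()  # Surface 4xx/5xx errors instead of a confusing KeyError

for i, item in enumerate(response.json()["data"]):
    with open(f"sunset_{i}.png", "wb") as f:
        f.write(base64.b64decode(item["b64_json"]))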

Server Options

With Authentication

Secure your server with an API key:
hypergen serve stabilityai/stable-diffusion-xl-base-1.0 \
  --api-key your-secret-key-here
Then use it from your client:
client = OpenAI(
    api_key="your-secret-key-here",
    base_url="http://localhost:8000/v1"
)
Generate a secure API key with: openssl rand -hex 32
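With cURL, send the key as a Bearer token, which is what the OpenAI client does under the hood (assuming HyperGen follows the standard OpenAI Authorization header):
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key-here" \
  -d '{"prompt": "A serene mountain landscape", "n": 1}'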

Custom Port

Run on a different port:
hypergen serve stabilityai/stable-diffusion-xl-base-1.0 --port 8080
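Remember to point your client at the matching port:
client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8080/v1"  # Match the --port you chose
)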

Custom Data Type

Use bfloat16 for better quality on supported GPUs:
hypergen serve black-forest-labs/FLUX.1-dev --dtype bfloat16

With LoRA

Serve a model with a LoRA adapter:
hypergen serve stabilityai/stable-diffusion-xl-base-1.0 \
  --lora ./my_trained_lora

Common Use Cases

Serve SDXL

hypergen serve stabilityai/stable-diffusion-xl-base-1.0
Default settings work well for SDXL.

Serve FLUX.1

hypergen serve black-forest-labs/FLUX.1-dev \
  --dtype bfloat16 \
  --max-queue-size 50
Use bfloat16 for FLUX models.

Serve SD 1.5

hypergen serve runwayml/stable-diffusion-v1-5 \
  --port 8000
Smaller model, faster inference.

Serve with Custom Settings

hypergen serve stabilityai/stable-diffusion-xl-base-1.0 \
  --host 0.0.0.0 \
  --port 8000 \
  --api-key $(openssl rand -hex 32) \
  --dtype float16 \
  --max-queue-size 100 \
  --max-batch-size 4
Production-ready configuration.
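On the client side, avoid hard-coding the generated key; reading it from an environment variable is a common pattern (HYPERGEN_API_KEY is just an illustrative name, not one HyperGen defines):
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["HYPERGEN_API_KEY"],  # Hypothetical variable; set it to the key the server was started with
    base_url="http://localhost:8000/v1"
)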

Advanced Generation Parameters

Control Inference Steps

The OpenAI Python client only accepts the standard OpenAI parameters by name, so pass HyperGen-specific parameters such as num_inference_steps through extra_body, which merges them into the request JSON:
response = client.images.generate(
    model="sdxl",
    prompt="A beautiful landscape",
    extra_body={
        "num_inference_steps": 30  # Faster but lower quality; use 50-100 for higher quality
    }
)

Use Negative Prompts

response = client.images.generate(
    model="sdxl",
    prompt="A portrait of a person",
    extra_body={"negative_prompt": "blurry, low quality, distorted"}
)

Set Random Seed

For reproducible results:
response = client.images.generate(
    model="sdxl",
    prompt="A cat in a garden",
    extra_body={"seed": 42}  # Same seed = same image
)
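A quick way to sanity-check reproducibility is to generate twice with the same seed and compare the bytes; a sketch that assumes generation (and PNG encoding) is fully deterministic on your setup:
import base64
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")

def generate_bytes(seed: int) -> bytes:
    response = client.images.generate(
        model="sdxl",
        prompt="A cat in a garden",
        response_format="b64_json",
        extra_body={"seed": seed}
    )
    return base64.b64decode(response.data[0].b64_json)

# Same seed should yield byte-identical output
assert generate_bytes(42) == generate_bytes(42)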

Adjust Guidance Scale

Control adherence to prompt:
response = client.images.generate(
    model="sdxl",
    prompt="Abstract art",
    extra_body={"guidance_scale": 12.0}  # Higher = stricter prompt following
)

Generate Multiple Images

response = client.images.generate(
    model="sdxl",
    prompt="A robot playing guitar",
    n=4,  # Generate 4 variations
    response_format="b64_json"
)

# Save each image
for i, img_data in enumerate(response.data):
    img_bytes = base64.b64decode(img_data.b64_json)
    Path(f"variation_{i}.png").write_bytes(img_bytes)

Health Checks

Check Server Status

curl http://localhost:8000/health
Response:
{
  "status": "healthy",
  "model": "stabilityai/stable-diffusion-xl-base-1.0",
  "queue_size": 0,
  "device": "cuda"
}
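The health endpoint is useful for wait-for-ready logic in scripts and CI; a small polling helper based on the response shown above:
import time
import requests

def wait_until_healthy(url="http://localhost:8000/health", timeout=300.0):
    """Poll /health until the server reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).json().get("status") == "healthy":
                return
        except requests.RequestException:
            pass  # Server not accepting connections yet; keep polling
        time.sleep(2)
    raise TimeoutError(f"Server at {url} not healthy after {timeout}s")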

List Available Models

curl http://localhost:8000/v1/models
Response:
{
  "object": "list",
  "data": [
    {
      "id": "stabilityai/stable-diffusion-xl-base-1.0",
      "object": "model",
      "created": 1234567890,
      "owned_by": "hypergen"
    }
  ]
}
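Because the endpoint follows the OpenAI schema, the OpenAI client can list models as well:
for model in client.models.list():
    print(model.id)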

Complete Example Script

Save as generate.py:
#!/usr/bin/env python3
"""
Generate images using HyperGen server.

Usage:
    python generate.py "A cat holding a sign"
"""

import sys
import base64
from pathlib import Path
from openai import OpenAI

def generate_image(prompt: str, output: str = "output.png"):
    """Generate an image from a prompt."""
    client = OpenAI(
        api_key="not-needed",
        base_url="http://localhost:8000/v1"
    )

    print(f"Generating: {prompt}")

    response = client.images.generate(
        model="sdxl",
        prompt=prompt,
        n=1,
        size="1024x1024",
        response_format="b64_json",
        extra_body={  # HyperGen-specific parameters go through extra_body
            "num_inference_steps": 50,
            "guidance_scale": 7.5
        }
    )

    # Save image
    img_bytes = base64.b64decode(response.data[0].b64_json)
    Path(output).write_bytes(img_bytes)

    print(f"Saved to: {output}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python generate.py 'your prompt here'")
        sys.exit(1)

    prompt = sys.argv[1]
    generate_image(prompt)
Run it:
python generate.py "A serene Japanese garden with cherry blossoms"

Troubleshooting

Server won’t start

Issue: Port already in use.
Solution:
# Use a different port
hypergen serve model_id --port 8001
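You can also find out which process is holding the port (lsof ships with most Linux and macOS systems):
lsof -i :8000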

CUDA out of memory

Issue: Model too large for GPU.
Solutions:
  1. Use a smaller model:
    hypergen serve runwayml/stable-diffusion-v1-5
    
  2. Use float16:
    hypergen serve model_id --dtype float16
    
  3. Close other GPU applications

Slow generation

Issue: Generation takes too long.
Solutions:
  1. Reduce inference steps:
    "num_inference_steps": 30  # Instead of 50; via extra_body in the Python client, or directly in the request JSON
    
  2. Use a faster model:
    hypergen serve stabilityai/sdxl-turbo  # Optimized for speed
    

Connection refused

Issue: Can’t connect to server.
Checks:
  1. Is the server running?
    curl http://localhost:8000/health
    
  2. Is the port correct?
    base_url="http://localhost:8000/v1"  # Check port number
    
  3. Is the host correct?
    hypergen serve model_id --host 0.0.0.0  # Allow external connections
    

Next Steps