Overview
Check the health and status of the HyperGen server. This endpoint provides information about server status, loaded model, queue size, and device. It does not require authentication, making it ideal for monitoring and health checks.
Authentication
Not required - This endpoint is publicly accessible
Request
No request body or parameters required.
Response
- status - Server health status. Values: "healthy" (server is running and ready to process requests), "unhealthy" (server is experiencing issues; not currently implemented)
- model - The model identifier that was loaded at server startup. Example: "stabilityai/stable-diffusion-xl-base-1.0"
- queue_size - Current number of pending requests in the queue. Range: 0 to max_queue_size (default: 100). 0 means no pending requests (the server is idle); >0 means requests are waiting to be processed.
- device - Device the model is running on. Values: "cuda" (NVIDIA GPU), "cuda:0", "cuda:1", etc. (a specific GPU device), "cpu" (CPU), "mps" (Apple Silicon GPU)
Examples
Basic Health Check
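A minimal request sketch in Python using the requests package; the base URL is a placeholder, so substitute your deployment's host and port:

```python
import requests

# Placeholder base URL; point this at your HyperGen server.
BASE_URL = "http://localhost:8000"

# GET /health needs no body, parameters, or authentication.
response = requests.get(f"{BASE_URL}/health", timeout=5)
response.raise_for_status()
print(response.json())
```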
Response
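A hypothetical response body, returned with 200 OK. The status and queue_size fields are documented above; the exact key names used for the model and device fields are assumptions, so check your server's actual output:

```json
{
  "status": "healthy",
  "model": "stabilityai/stable-diffusion-xl-base-1.0",
  "queue_size": 0,
  "device": "cuda"
}
```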
Server Under Load
When the server has pending requests:
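A hypothetical response with a backlog, using the same assumed field names as above. Note that status stays "healthy" and the endpoint still returns 200 OK even when requests are queued:

```json
{
  "status": "healthy",
  "model": "stabilityai/stable-diffusion-xl-base-1.0",
  "queue_size": 12,
  "device": "cuda"
}
```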
Use Cases
Monitoring Script
Monitor server health and queue status:
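A sketch of such a script in Python, assuming the placeholder base URL and response keys from the examples above:

```python
import time

import requests

BASE_URL = "http://localhost:8000"  # placeholder; adjust to your deployment
POLL_INTERVAL = 30  # seconds; Best Practices below suggest 10-30 seconds

while True:
    try:
        health = requests.get(f"{BASE_URL}/health", timeout=5).json()
        print(
            f"status={health.get('status')} "
            f"queue_size={health.get('queue_size')} "
            f"device={health.get('device')}"
        )
    except requests.RequestException as exc:
        print(f"health check failed: {exc}")
    time.sleep(POLL_INTERVAL)
```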
Load Balancer Health Check
Use /health for load balancer health checks (e.g., AWS ALB, nginx): point the health check path at /health and treat an HTTP 200 response as healthy.
Kubernetes Liveness Probe
Point an HTTP GET liveness or readiness probe at /health; a 200 response keeps the pod receiving traffic.
Wait for Server Ready
Wait for server to be ready before sending requests:
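A sketch of a readiness wait loop, again using the placeholder base URL; it polls /health until the server reports "healthy" or a deadline passes:

```python
import time

import requests

BASE_URL = "http://localhost:8000"  # placeholder; adjust to your deployment


def wait_for_ready(timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll /health until status is "healthy" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            health = requests.get(f"{BASE_URL}/health", timeout=5).json()
            if health.get("status") == "healthy":
                return True
        except requests.RequestException:
            pass  # server not reachable yet; keep retrying
        time.sleep(interval)
    return False


if not wait_for_ready():
    raise SystemExit("server did not become ready in time")
print("server is ready")
```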
Queue Monitoring
Monitor queue size and alert when backlog grows:
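A sketch that raises an alert when the backlog crosses a threshold; the threshold and the send_alert hook are placeholders to wire into your own alerting system:

```python
import time

import requests

BASE_URL = "http://localhost:8000"  # placeholder; adjust to your deployment
QUEUE_ALERT_THRESHOLD = 20          # placeholder threshold


def send_alert(message: str) -> None:
    # Placeholder: forward to Slack, PagerDuty, email, etc.
    print(f"ALERT: {message}")


while True:
    try:
        health = requests.get(f"{BASE_URL}/health", timeout=5).json()
        queue_size = health.get("queue_size", 0)
        if queue_size > QUEUE_ALERT_THRESHOLD:
            send_alert(f"queue backlog is {queue_size} (threshold {QUEUE_ALERT_THRESHOLD})")
    except requests.RequestException as exc:
        send_alert(f"health check failed: {exc}")
    time.sleep(30)
```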
Automatic Scaling Decision
Use queue size to make scaling decisions:
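A sketch of a simple decision based on queue_size; the thresholds are illustrative, and the Auto-Scaling best practices below recommend averaging over a time window to avoid flapping:

```python
import requests

BASE_URL = "http://localhost:8000"  # placeholder; adjust to your deployment


def scaling_decision(scale_up_above: int = 20) -> str:
    """Return "up", "down", or "hold" based on the current queue size."""
    queue_size = requests.get(f"{BASE_URL}/health", timeout=5).json().get("queue_size", 0)
    if queue_size > scale_up_above:
        return "up"    # backlog is building; add capacity
    if queue_size == 0:
        return "down"  # idle; consider removing capacity
    return "hold"


print(scaling_decision())
```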
Metrics Collection
Prometheus Exporter Example
Export metrics for Prometheus monitoring:
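A sketch using the prometheus_client package; the metric names, exporter port, and scrape interval are placeholders:

```python
import time

import requests
from prometheus_client import Gauge, start_http_server

BASE_URL = "http://localhost:8000"  # placeholder; adjust to your deployment

# Placeholder metric names.
queue_size_gauge = Gauge("hypergen_queue_size", "Pending requests in the HyperGen queue")
healthy_gauge = Gauge("hypergen_healthy", "1 if /health reports healthy, 0 otherwise")

if __name__ == "__main__":
    start_http_server(9100)  # placeholder exporter port
    while True:
        try:
            health = requests.get(f"{BASE_URL}/health", timeout=5).json()
            queue_size_gauge.set(health.get("queue_size", 0))
            healthy_gauge.set(1 if health.get("status") == "healthy" else 0)
        except requests.RequestException:
            healthy_gauge.set(0)
        time.sleep(15)
```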
Response Status Codes
- 200 OK - Server is reachable and the health check succeeded
- 500 Internal Server Error - Server error (rare, as the endpoint is very simple)
The /health endpoint should always return 200 OK if the server is running, even if the queue is full or the server is under heavy load.
Best Practices
Monitoring
- Poll the /health endpoint every 10-30 seconds
- Monitor queue_size to detect backlog
- Alert when status is not "healthy"
- Track queue_size trends over time
Load Balancing
- Use /health for load balancer health checks
- Set appropriate timeout (5-10 seconds)
- Configure retry logic
- Don’t route traffic to instances with high queue_size
Auto-Scaling
- Scale up when queue_size consistently exceeds a threshold
- Scale down when queue_size is consistently 0
- Use the average queue size over a time window (e.g., 5 minutes)
- Avoid flapping by using hysteresis (see the sketch after this list)
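A sketch of such a hysteresis rule over a moving window of queue_size samples, extending the scaling example above; the window length and thresholds are placeholders:

```python
from collections import deque
from statistics import mean

WINDOW = deque(maxlen=10)   # e.g. 10 samples taken every 30s ~ a 5-minute window
SCALE_UP_AVG = 20           # scale up when the average backlog exceeds this
SCALE_DOWN_AVG = 1          # scale down only when the average is well below the up threshold


def record_and_decide(queue_size: int) -> str:
    """Add a sample and return "up", "down", or "hold" using hysteresis."""
    WINDOW.append(queue_size)
    if len(WINDOW) < WINDOW.maxlen:
        return "hold"  # not enough history yet
    avg = mean(WINDOW)
    if avg > SCALE_UP_AVG:
        return "up"
    if avg < SCALE_DOWN_AVG:
        return "down"
    return "hold"  # dead band between thresholds prevents flapping
```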
Deployment
- Wait for /health to return "healthy" before routing traffic
- Use in readiness probes for orchestration platforms
- Check /health before running integration tests
- Include in pre-deployment smoke tests
Troubleshooting
Server Not Responding
If the /health endpoint is not responding:
- Check if server is running: ps aux | grep hypergen
- Check server logs for errors
- Verify port is not blocked by firewall
- Ensure server started successfully (check for CUDA errors)
High Queue Size
If queue_size is consistently high:
- Generation is too slow (consider using SDXL Turbo)
- Too many concurrent requests
- Image sizes are too large
- Need to scale horizontally (add more servers)