Async Inference (V2)
Submit inference requests asynchronously and poll for results. Ideal for long-running models like video generation, image upscaling, and LLMs.
The V2 async API lets you submit a request, get a request_id immediately,
and poll for the result when it's ready. No long-lived HTTP connections.
Sync (v1) vs. async (v2). The v1 sync API (POST /v1/{slug}) returns the
output in a single blocking response and is in maintenance mode — fine for
fast models that finish in a few seconds. Use v2 async for anything that
can take longer than ~10 s (video, upscaling, LLMs), or whenever you want
explicit control over the request deadline. Prefer not to manage the polling
loop yourself? The Python SDK wraps all of this in a
single segmind.run() call.
Quick start
1. Submit a request
```bash
curl -X POST "https://api.segmind.com/v2/seedream-4.5" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a red rose on a wooden table, studio lighting",
    "aspect_ratio": "1:1",
    "seed": 123
  }'
```

Response:

```json
{
  "request_id": "2c7f59ea-13f1-402c-9353-915a2b5a2124",
  "status": "QUEUED",
  "poll_url": "https://api.segmind.com/v1/requests/2c7f59ea-...",
  "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
  "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-..."
}
```

| Field | Description |
|---|---|
| request_id | Unique identifier for this request |
| status | Always QUEUED on submit |
| poll_url | V1 poll endpoint (backward compatible) |
| status_url | Lightweight status check (no output payload) |
| response_url | Full result endpoint (output + metadata) |
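A minimal sketch of reading the submit response in Python, assuming the JSON body above has already been decoded into a dict (for example with `requests`' `Response.json()`). `SubmitResponse` and `parse_submit_response` are hypothetical client-side helpers, not part of the Segmind SDK; the QUEUED check simply mirrors the table's "Always QUEUED on submit".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitResponse:
    """Hypothetical container for the fields returned by POST /v2/{slug}."""
    request_id: str
    status: str
    poll_url: str
    status_url: str
    response_url: str


def parse_submit_response(body: dict) -> SubmitResponse:
    # The docs state submit always returns QUEUED; treat anything else as unexpected.
    if body.get("status") != "QUEUED":
        raise ValueError(f"unexpected submit status: {body.get('status')!r}")
    return SubmitResponse(
        request_id=body["request_id"],
        status=body["status"],
        poll_url=body["poll_url"],
        status_url=body["status_url"],
        response_url=body["response_url"],
    )


# Sample body copied from the response shown above.
sample = {
    "request_id": "2c7f59ea-13f1-402c-9353-915a2b5a2124",
    "status": "QUEUED",
    "poll_url": "https://api.segmind.com/v1/requests/2c7f59ea-...",
    "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
    "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-...",
}
submitted = parse_submit_response(sample)
print(submitted.status_url)  # poll this, not response_url
```

Keeping `status_url` and `response_url` separate makes it harder to accidentally poll the heavier full-result endpoint.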
2. Check status (lightweight)
Use status_url for efficient polling — it returns only status and metrics,
no output payload.
```bash
curl "https://api.segmind.com/v2/requests/2c7f59ea-.../status" \
  -H "x-api-key: YOUR_API_KEY"
```

While processing:

```json
{
  "status": "QUEUED",
  "request_id": "2c7f59ea-...",
  "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
  "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-...",
  "metrics": {}
}
```

When done:

```json
{
  "status": "COMPLETED",
  "request_id": "2c7f59ea-...",
  "metrics": { "inference_time": 13.06 }
}
```

3. Fetch the result
Once status is COMPLETED, fetch the full result from response_url.
```bash
curl "https://api.segmind.com/v2/requests/2c7f59ea-..." \
  -H "x-api-key: YOUR_API_KEY"
```

Image result:

```json
{
  "status": "COMPLETED",
  "images": [
    {
      "url": "https://segmind-inference-io.s3.amazonaws.com/e17ba-output.jpg",
      "content_type": "image/jpeg",
      "file_size": "868446"
    }
  ],
  "output": "https://segmind-inference-io.s3.amazonaws.com/e17ba-output.jpg",
  "seed": "123",
  "prompt": "a red rose on a wooden table, studio lighting",
  "timings": { "inference": 13.06 },
  "metrics": { "inference_time": 13.06 }
}
```

Response formats by modality
The result shape depends on what the model produces.
Image models
```json
{
  "status": "COMPLETED",
  "images": [{ "url": "...", "content_type": "image/jpeg", "file_size": "..." }],
  "output": "https://...",
  "seed": "123",
  "prompt": "...",
  "timings": { "inference": 13.06 },
  "metrics": { "inference_time": 13.06 }
}
```

Video models

```json
{
  "status": "COMPLETED",
  "video": {
    "url": "...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 5757619
  },
  "output": "https://..."
}
```

LLM / text models

```json
{
  "status": "COMPLETED",
  "output": "The generated text...",
  "reasoning": null,
  "partial": false,
  "error": null
}
```

The top-level output field is always present across all modalities for backward
compatibility.
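One way to consume these shapes in Python is to look for the modality-specific field first and fall back to `output`, which the docs guarantee for every modality. `primary_output` and `file_size_bytes` are hypothetical helpers; the sample dicts mirror the three examples above.

```python
def primary_output(result: dict):
    """Return the main output of a COMPLETED result (hypothetical helper).

    Image results: first image URL. Video results: video URL.
    Text results: the generated text held in the top-level output field.
    """
    if result.get("status") != "COMPLETED":
        raise ValueError(f"result not completed: {result.get('status')!r}")
    images = result.get("images")
    if images:
        return images[0]["url"]
    video = result.get("video")
    if video:
        return video["url"]
    # Fallback: output is present for every modality.
    return result["output"]


def file_size_bytes(media: dict) -> int:
    # The examples show file_size as a string for images and an int for video.
    return int(media["file_size"])


image_result = {
    "status": "COMPLETED",
    "images": [{"url": "https://example.com/img.jpg", "content_type": "image/jpeg", "file_size": "868446"}],
    "output": "https://example.com/img.jpg",
}
video_result = {
    "status": "COMPLETED",
    "video": {"url": "https://example.com/out.mp4", "content_type": "video/mp4",
              "file_name": "output.mp4", "file_size": 5757619},
    "output": "https://example.com/out.mp4",
}
text_result = {"status": "COMPLETED", "output": "The generated text...",
               "reasoning": None, "partial": False, "error": None}

print(primary_output(image_result), primary_output(video_result), primary_output(text_result))
```

The example URLs are placeholders. Checking the specific fields first lets image and video callers get metadata such as `content_type` from the same dict.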
Status values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting for a worker |
| PROCESSING | A worker has picked up the request |
| COMPLETED | Inference finished, result available |
| FAILED | Inference failed (see error field) |
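A small client-side enum makes the terminal states explicit. This is a hypothetical sketch, not an SDK type; a status string outside the table raises `ValueError` on lookup, which surfaces any new or unexpected value instead of silently polling forever.

```python
from enum import Enum


class RequestStatus(str, Enum):
    """Status values from the table above (hypothetical client-side enum)."""
    QUEUED = "QUEUED"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

    @property
    def is_terminal(self) -> bool:
        # Only COMPLETED and FAILED end polling; the others mean "keep waiting".
        return self in (RequestStatus.COMPLETED, RequestStatus.FAILED)


print(RequestStatus("PROCESSING").is_terminal)  # False
print(RequestStatus("COMPLETED").is_terminal)   # True
```

Because the enum also subclasses `str`, members compare equal to the raw strings in the JSON (`RequestStatus.FAILED == "FAILED"`).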
Polling guidance
Poll the status_url until status is COMPLETED or FAILED, then fetch the
full body from response_url.
- Interval: default to 1 s between polls. For known-slow models (long video, long-running LLMs) back off to 5–10 s to cut request volume.
- Timeout: use an overall deadline of ≤ 600 s for most models. Raise it for slow video models, or avoid polling entirely with webhooks for fire-and-forget jobs.
- Use the lightweight status_url (not response_url) while polling; it skips the output payload.
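The guidance above can be sketched as a polling function with a starting interval, a cap, and an overall deadline. The geometric backoff (multiply the wait by `backoff` each round) is one assumed way to move from 1 s toward the 5 to 10 s range; the API does not prescribe it. `poll_until_done` is hypothetical, and `fetch_status` is assumed to GET the status_url and return decoded JSON. `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 1.0,
    max_interval_s: float = 10.0,
    backoff: float = 1.5,
    deadline_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until it reports COMPLETED or FAILED.

    The wait starts at interval_s and grows by `backoff` up to max_interval_s.
    Raises TimeoutError if the next wait would pass deadline_s.
    """
    start = clock()
    wait = interval_s
    while True:
        body = fetch_status()
        if body.get("status") in ("COMPLETED", "FAILED"):
            return body
        if clock() - start + wait > deadline_s:
            raise TimeoutError(f"request not finished after {deadline_s} s")
        sleep(wait)
        wait = min(wait * backoff, max_interval_s)
```

With `requests`, a call might look like `poll_until_done(lambda: requests.get(status_url, headers={"x-api-key": API_KEY}, timeout=30).json())`. For a known-slow video model, pass a larger `interval_s` and `deadline_s` rather than changing the loop.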
Idempotency
Each POST /v2/{slug} creates a new request_id, so a submit is not idempotent.
Re-POST only when the submit itself returned a 5xx; the retry gets a fresh
request_id. Never re-POST after a successful submit, because that starts a
second billable job. Polling requests (GET) are always safe to retry.
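A minimal sketch of the retry rule above, assuming a hypothetical `post` callable that performs POST /v2/{slug} and returns `(http_status, decoded_json)`. Only 5xx responses are retried, as the docs recommend; a 2xx ends the loop at once, so a successful submit is never re-sent. The linear delay between attempts is an assumption, not an API requirement.

```python
import time
from typing import Callable, Tuple


def submit_with_retry(
    post: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit once, re-POSTing only when the submit returned a 5xx."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        code, body = post()
        if 200 <= code < 300:
            # Success: return immediately; re-POSTing would start a second job.
            return body
        if code < 500 or attempt == max_attempts:
            # The docs only recommend retrying 5xx; raise on anything else
            # and on the final failed attempt.
            raise RuntimeError(f"submit failed with HTTP {code}: {body}")
        sleep(delay_s * attempt)
```

Polling calls do not need this guard, since GET requests can be repeated freely.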
Result expiry
Request status and results are stored for 1 hour after submission. After
that, the status key expires and polling any endpoint will return HTTP 404.
Fetch your results within this window.
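If you store request IDs for later retrieval, you can track the 1-hour window yourself. This sketch assumes you record the submit time on the client; the helper names are hypothetical, and because client and server clocks can differ, it is sensible to fetch well before the computed deadline.

```python
from datetime import datetime, timedelta, timezone

RESULT_TTL = timedelta(hours=1)  # retention window stated above


def fetch_deadline(submitted_at: datetime) -> datetime:
    """Latest time the result can still be fetched (hypothetical helper)."""
    return submitted_at + RESULT_TTL


def seconds_left(submitted_at: datetime, now: datetime) -> float:
    """Seconds remaining before polling returns HTTP 404 (never negative)."""
    return max(0.0, (fetch_deadline(submitted_at) - now).total_seconds())


submitted_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(seconds_left(submitted_at, submitted_at + timedelta(minutes=45)))  # 900.0
print(seconds_left(submitted_at, submitted_at + timedelta(hours=2)))     # 0.0
```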
Error handling
Failed requests return HTTP 422 on V2 endpoints:
```json
{
  "status": "FAILED",
  "error": "Prompt is Mandatory and must be string",
  "metrics": {}
}
```

Not found returns HTTP 404:

```json
{
  "error": "Request 00000000-... not found"
}
```

Endpoints summary
| Endpoint | Method | Description |
|---|---|---|
/v2/{model} | POST | Submit async request |
/v2/requests/{id}/status | GET | Lightweight status + metrics |
/v2/requests/{id} | GET | Full result (when COMPLETED) |
/v1/requests/{id} | GET | Legacy poll (status + output combined) |
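A thin sketch that builds the V2 URLs from this table and maps the responses described under Error handling to exceptions. All function and exception names are hypothetical; connect `check_v2_response` to your HTTP client (for example `requests`' `Response.status_code` and `Response.json()`).

```python
BASE = "https://api.segmind.com"


class RequestFailed(Exception):
    """Inference failed: HTTP 422 with status FAILED."""


class RequestNotFound(Exception):
    """Unknown or expired request_id: HTTP 404."""


def submit_url(model: str) -> str:
    return f"{BASE}/v2/{model}"


def status_url(request_id: str) -> str:
    return f"{BASE}/v2/requests/{request_id}/status"


def result_url(request_id: str) -> str:
    return f"{BASE}/v2/requests/{request_id}"


def check_v2_response(http_status: int, body: dict) -> dict:
    """Return body on 2xx; raise a typed exception for documented errors."""
    if http_status == 404:
        raise RequestNotFound(body.get("error", "request not found"))
    if http_status == 422:
        raise RequestFailed(body.get("error", "inference failed"))
    if not 200 <= http_status < 300:
        raise RuntimeError(f"unexpected HTTP {http_status}: {body}")
    return body
```

Separating `RequestNotFound` from `RequestFailed` lets callers tell an expired result (past the 1-hour window) apart from a model error.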
Full Python example
The Python SDK does the submit-and-poll loop for you —
result = segmind.run("seedream-4.5", prompt="..."). The raw example below
shows what happens under the hood if you'd rather call the API directly.
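```python
import time

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.segmind.com"
HEADERS = {"x-api-key": API_KEY}
POLL_INTERVAL_S = 1  # default interval from the polling guidance
DEADLINE_S = 600     # overall deadline suitable for most models

# Submit
resp = requests.post(
    f"{BASE}/v2/seedream-4.5",
    headers=HEADERS,
    json={"prompt": "a beautiful sunset", "aspect_ratio": "16:9"},
    timeout=30,
)
resp.raise_for_status()
request_id = resp.json()["request_id"]
print(f"Submitted: {request_id}")

# Poll the lightweight status endpoint until COMPLETED/FAILED or the deadline
deadline = time.monotonic() + DEADLINE_S
while True:
    poll = requests.get(
        f"{BASE}/v2/requests/{request_id}/status",
        headers=HEADERS,
        timeout=30,
    )
    if poll.status_code == 404:
        raise RuntimeError(f"Request {request_id} not found (unknown or expired)")
    status = poll.json()  # a FAILED request arrives as HTTP 422 with a JSON body
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Request {request_id} not done after {DEADLINE_S} s")
    time.sleep(POLL_INTERVAL_S)

# Fetch the full result from response_url
if status["status"] == "COMPLETED":
    result = requests.get(
        f"{BASE}/v2/requests/{request_id}",
        headers=HEADERS,
        timeout=30,
    ).json()
    print(f"Image URL: {result['images'][0]['url']}")
    print(f"Inference time: {result['timings']['inference']}s")
else:
    print(f"Request failed: {status.get('error')}")
```

See also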
- Python SDK: segmind.run() / submit_async() instead of hand-rolled polling.
- Webhooks: get notified when a result is ready.
- Rate limits and API error codes.