
Async Inference (V2)

Submit inference requests asynchronously and poll for results. Ideal for long-running models like video generation, image upscaling, and LLMs.

The V2 async API lets you submit a request, get a request_id immediately, and poll for the result when it's ready. No long-lived HTTP connections.

Sync (v1) vs. async (v2). The v1 sync API (POST /v1/{slug}) returns the output in a single blocking response and is in maintenance mode — fine for fast models that finish in a few seconds. Use v2 async for anything that can take longer than ~10 s (video, upscaling, LLMs), or whenever you want explicit control over the request deadline. Prefer not to manage the polling loop yourself? The Python SDK wraps all of this in a single segmind.run() call.

Quick start

1. Submit a request

curl -X POST "https://api.segmind.com/v2/seedream-4.5" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a red rose on a wooden table, studio lighting",
    "aspect_ratio": "1:1",
    "seed": 123
  }'

Response:

{
  "request_id": "2c7f59ea-13f1-402c-9353-915a2b5a2124",
  "status": "QUEUED",
  "poll_url": "https://api.segmind.com/v1/requests/2c7f59ea-...",
  "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
  "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-..."
}

| Field | Description |
| --- | --- |
| request_id | Unique identifier for this request |
| status | Always QUEUED on submit |
| poll_url | V1 poll endpoint (backward compatible) |
| status_url | Lightweight status check (no output payload) |
| response_url | Full result endpoint (output + metadata) |
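
If you call the API directly, it can help to validate the submit response before you start polling. The sketch below is one possible way to do that; the SubmitResponse class and parse_submit_response helper are hypothetical names introduced for illustration and are not part of the Segmind SDK. It assumes the body has already been decoded from JSON into a dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubmitResponse:
    """Hypothetical container for the fields documented in the table above."""
    request_id: str
    status: str
    status_url: str
    response_url: str
    poll_url: Optional[str] = None  # legacy V1 endpoint, treated as optional here


def parse_submit_response(body: dict) -> SubmitResponse:
    """Check that the fields needed for polling are present, then wrap them."""
    required = ("request_id", "status", "status_url", "response_url")
    missing = [key for key in required if key not in body]
    if missing:
        raise ValueError(f"submit response is missing fields: {missing}")
    return SubmitResponse(
        request_id=body["request_id"],
        status=body["status"],
        status_url=body["status_url"],
        response_url=body["response_url"],
        poll_url=body.get("poll_url"),
    )
```

Failing early on a malformed body keeps the polling code free of KeyError handling.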

2. Check status (lightweight)

Use status_url for efficient polling — it returns only status and metrics, no output payload.

curl "https://api.segmind.com/v2/requests/2c7f59ea-.../status" \
  -H "x-api-key: YOUR_API_KEY"

While processing:

{
  "status": "QUEUED",
  "request_id": "2c7f59ea-...",
  "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
  "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-...",
  "metrics": {}
}

When done:

{
  "status": "COMPLETED",
  "request_id": "2c7f59ea-...",
  "metrics": { "inference_time": 13.06 }
}

3. Fetch the result

Once status is COMPLETED, fetch the full result from response_url.

curl "https://api.segmind.com/v2/requests/2c7f59ea-..." \
  -H "x-api-key: YOUR_API_KEY"

Image result:

{
  "status": "COMPLETED",
  "images": [
    {
      "url": "https://segmind-inference-io.s3.amazonaws.com/e17ba-output.jpg",
      "content_type": "image/jpeg",
      "file_size": "868446"
    }
  ],
  "output": "https://segmind-inference-io.s3.amazonaws.com/e17ba-output.jpg",
  "seed": "123",
  "prompt": "a red rose on a wooden table, studio lighting",
  "timings": { "inference": 13.06 },
  "metrics": { "inference_time": 13.06 }
}

Response formats by modality

The result shape depends on what the model produces.

Image models

{
  "status": "COMPLETED",
  "images": [{ "url": "...", "content_type": "image/jpeg", "file_size": "..." }],
  "output": "https://...",
  "seed": "123",
  "prompt": "...",
  "timings": { "inference": 13.06 },
  "metrics": { "inference_time": 13.06 }
}

Video models

{
  "status": "COMPLETED",
  "video": {
    "url": "...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 5757619
  },
  "output": "https://..."
}

LLM / text models

{
  "status": "COMPLETED",
  "output": "The generated text...",
  "reasoning": null,
  "partial": false,
  "error": null
}

The output field is always present across all modalities for backward compatibility.
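
Client code that handles several model types may want to normalize these shapes. The following is a minimal sketch, assuming the modality can be inferred from which key is present ("images", "video", or neither); that inference rule and the extract_output name are assumptions made for this example, not documented API behavior.

```python
from typing import List, Tuple


def extract_output(result: dict) -> Tuple[str, List[str]]:
    """Return (modality, outputs) for a COMPLETED result body.

    Assumption: image results carry an "images" list, video results carry a
    "video" object, and anything else is treated as text via "output".
    """
    if result.get("status") != "COMPLETED":
        raise ValueError(f"result is not COMPLETED: {result.get('status')!r}")
    if "images" in result:
        return "image", [image["url"] for image in result["images"]]
    if "video" in result:
        return "video", [result["video"]["url"]]
    return "text", [result["output"]]
```

Code that only needs a single URL or string can skip this and read output directly, since that field is present for every modality.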

Status values

| Status | Description |
| --- | --- |
| QUEUED | Request accepted, waiting for a worker |
| PROCESSING | A worker has picked up the request |
| COMPLETED | Inference finished, result available |
| FAILED | Inference failed (see error field) |

Polling guidance

Poll the status_url until status is COMPLETED or FAILED, then fetch the full body from response_url.

  • Interval: default to 1 s between polls. For known-slow models (long video, long-running LLMs) back off to 5–10 s to cut request volume.
  • Timeout: use an overall deadline of ≤ 600 s for most models. Raise it for slow video models, or avoid polling entirely with webhooks for fire-and-forget jobs.
  • Use the lightweight status_url (not response_url) while polling — it skips the output payload.
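
The guidance above can be sketched as a small polling helper. This is an illustrative sketch rather than SDK code: poll_until_done is a hypothetical name, and fetch_status is any zero-argument callable you supply that returns the decoded JSON from status_url. The sleep and clock parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,    # seconds between polls; raise to 5-10 for slow models
    timeout: float = 600.0,   # overall deadline in seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the status is terminal or the deadline passes."""
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        # Stop if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"request did not finish within {timeout} s")
        sleep(interval)
```

With the requests library, fetch_status could be lambda: requests.get(status_url, headers={"x-api-key": API_KEY}, timeout=30).json(). Once the helper returns a COMPLETED body, fetch the full output from response_url.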

Idempotency

Submitting is not idempotent: each POST /v2/{slug} creates a new request_id. Retry a submit only when it fails with a 5xx; the retry gets a fresh request_id. Never re-POST after a successful submit, because that starts a second billable job. Polling (GET) is always safe to retry.
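
One way to encode this rule is a submit wrapper that retries only on 5xx responses. The sketch below is an assumption-laden illustration: submit_with_retry is a hypothetical helper, post is a callable you provide that performs the POST and returns (http_status, decoded_body), and the attempt count and linear backoff are arbitrary choices rather than documented recommendations.

```python
import time
from typing import Callable, Optional, Tuple


def submit_with_retry(
    post: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry the submit on 5xx only; never re-POST after a success."""
    last_status: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        last_status, body = post()
        if last_status < 400:
            return body  # accepted: a request_id exists, so stop submitting
        if last_status < 500:
            # Client errors (bad input, bad key) will not improve on retry.
            raise RuntimeError(f"submit rejected with HTTP {last_status}: {body}")
        if attempt < max_attempts:
            sleep(backoff * attempt)  # simple linear backoff between 5xx retries
    raise RuntimeError(f"submit failed after {max_attempts} attempts (last HTTP {last_status})")
```

The wrapper returns as soon as one submit succeeds, so it cannot create a second job for the same call.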

Result expiry

Request status and results are stored for 1 hour after submission. After that, the request record expires and every endpoint returns HTTP 404 for that request_id. Fetch your results within this window.
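
A client can track the expiry window itself. The sketch below assumes you record the submission time locally, since the response examples on this page do not show an expiry timestamp; seconds_until_expiry and RESULT_TTL are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESULT_TTL = timedelta(hours=1)  # retention window stated above


def seconds_until_expiry(submitted_at: datetime, now: Optional[datetime] = None) -> float:
    """Seconds left before the result expires, or 0.0 if it already has.

    submitted_at is a timezone-aware timestamp recorded by the client at submit.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = submitted_at + RESULT_TTL - now
    return max(0.0, remaining.total_seconds())
```

Local clocks and the server's clock can differ, so leave a safety margin rather than fetching at the last second.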

Error handling

Failed requests return HTTP 422 on V2 endpoints:

{
  "status": "FAILED",
  "error": "Prompt is Mandatory and must be string",
  "metrics": {}
}

Not found returns HTTP 404:

{
  "error": "Request 00000000-... not found"
}
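
These two cases can be mapped onto exceptions so calling code does not have to inspect status codes everywhere. This is a minimal sketch: the exception classes and check_response are hypothetical names, and the function takes the HTTP status code and decoded body as plain arguments so it does not depend on a particular HTTP library.

```python
class RequestFailed(Exception):
    """Inference failed (HTTP 422 or a FAILED status in the body)."""


class RequestNotFound(Exception):
    """Unknown or expired request_id (HTTP 404)."""


def check_response(status_code: int, body: dict) -> dict:
    """Raise for the documented error cases, otherwise return the body unchanged."""
    if status_code == 404:
        raise RequestNotFound(body.get("error", "request not found"))
    if status_code == 422 or body.get("status") == "FAILED":
        raise RequestFailed(body.get("error", "inference failed"))
    return body
```

Since FAILED is reported with HTTP 422, calling raise_for_status() from the requests library on a poll response would raise before you can read the error field; checking the body as above keeps the message.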

Endpoints summary

| Endpoint | Method | Description |
| --- | --- | --- |
| /v2/{model} | POST | Submit async request |
| /v2/requests/{id}/status | GET | Lightweight status + metrics |
| /v2/requests/{id} | GET | Full result (when COMPLETED) |
| /v1/requests/{id} | GET | Legacy poll (status + output combined) |

Full Python example

The Python SDK does the submit-and-poll loop for you — result = segmind.run("seedream-4.5", prompt="..."). The raw example below shows what happens under the hood if you'd rather call the API directly.

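import time

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.segmind.com"
HEADERS = {"x-api-key": API_KEY}
POLL_INTERVAL = 1  # seconds between status checks (see Polling guidance)
DEADLINE = 600     # overall time budget in seconds

# Submit: returns immediately with a request_id
resp = requests.post(
    f"{BASE}/v2/seedream-4.5",
    headers=HEADERS,
    json={"prompt": "a beautiful sunset", "aspect_ratio": "16:9"},
    timeout=30,
)
resp.raise_for_status()  # stop here if the submit itself was rejected
request_id = resp.json()["request_id"]
print(f"Submitted: {request_id}")

# Poll the lightweight status endpoint until a terminal state or the deadline
start = time.monotonic()
while True:
    status = requests.get(
        f"{BASE}/v2/requests/{request_id}/status",
        headers=HEADERS,
        timeout=30,
    ).json()

    if status.get("status") in ("COMPLETED", "FAILED"):
        break
    if time.monotonic() - start > DEADLINE:
        raise TimeoutError(f"Request {request_id} did not finish within {DEADLINE}s")
    time.sleep(POLL_INTERVAL)

# Fetch the full result, or report the failure
if status["status"] == "COMPLETED":
    result = requests.get(
        f"{BASE}/v2/requests/{request_id}",
        headers=HEADERS,
        timeout=30,
    ).json()
    print(f"Image URL: {result['images'][0]['url']}")
    print(f"Inference time: {result['timings']['inference']}s")
else:
    print(f"Request failed: {status.get('error')}")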
