Async Inference (V2)

Submit inference requests asynchronously and poll for results. Ideal for long-running models like video generation, image upscaling, and LLMs.

The V2 async API lets you submit a request, get a request_id immediately, and poll for the result when it's ready. No long-lived HTTP connections.

Sync (v1) vs. async (v2). The v1 sync API (POST /v1/{slug}) returns the output in a single blocking response and is in maintenance mode — fine for fast models that finish in a few seconds. Use v2 async for anything that can take longer than ~10 s (video, upscaling, LLMs), or whenever you want explicit control over the request deadline. Prefer not to manage the polling loop yourself? The Python SDK wraps all of this in a single segmind.run() call.

Quick start

1. Submit a request

curl -X POST "https://api.segmind.com/v2/seedream-4.5" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a red rose on a wooden table, studio lighting",
    "aspect_ratio": "1:1",
    "seed": 123
  }'

Response:

{
  "request_id": "2c7f59ea-13f1-402c-9353-915a2b5a2124",
  "status": "QUEUED",
  "poll_url": "https://api.segmind.com/v1/requests/2c7f59ea-...",
  "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
  "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-..."
}

Field	Description
`request_id`	Unique identifier for this request
`status`	Always `QUEUED` on submit
`poll_url`	V1 poll endpoint (backward compatible)
`status_url`	Lightweight status check (no output payload)
`response_url`	Full result endpoint (output + metadata)
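
As an illustration of consuming the submit response, here is a minimal Python sketch that validates the fields listed above before polling begins. The `SubmitResponse` class and `parse_submit_response` helper are hypothetical names introduced for this example; they are not part of the API or the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitResponse:
    """Hypothetical container for the fields returned on submit."""
    request_id: str
    status: str
    poll_url: str
    status_url: str
    response_url: str


def parse_submit_response(body: dict) -> SubmitResponse:
    """Build a SubmitResponse from the decoded JSON body of POST /v2/{slug}."""
    required = ("request_id", "status_url", "response_url")
    missing = [key for key in required if key not in body]
    if missing:
        raise ValueError(f"submit response is missing fields: {missing}")
    return SubmitResponse(
        request_id=body["request_id"],
        status=body.get("status", "QUEUED"),  # documented as always QUEUED on submit
        poll_url=body.get("poll_url", ""),    # legacy V1 endpoint, optional here
        status_url=body["status_url"],
        response_url=body["response_url"],
    )


# The sample body from this section, with its elided URLs kept as shown.
example = {
    "request_id": "2c7f59ea-13f1-402c-9353-915a2b5a2124",
    "status": "QUEUED",
    "poll_url": "https://api.segmind.com/v1/requests/2c7f59ea-...",
    "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
    "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-...",
}
submitted = parse_submit_response(example)
```

Failing fast on a malformed body keeps a bad submit from turning into a confusing error later in the polling loop.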

2. Check status (lightweight)

Use status_url for efficient polling — it returns only status and metrics, no output payload.

curl "https://api.segmind.com/v2/requests/2c7f59ea-.../status" \
  -H "x-api-key: YOUR_API_KEY"

While processing:

{
  "status": "QUEUED",
  "request_id": "2c7f59ea-...",
  "status_url": "https://api.segmind.com/v2/requests/2c7f59ea-.../status",
  "response_url": "https://api.segmind.com/v2/requests/2c7f59ea-...",
  "metrics": {}
}

When done:

{
  "status": "COMPLETED",
  "request_id": "2c7f59ea-...",
  "metrics": {
    "cost": 0.04,
    "inference_time": 13.06,
    "queue_time": 0.3,
    "total_time": 13.36,
    "remaining_credits": 92.86
  }
}

See Metrics for what each field means.

3. Fetch the result

Once status is COMPLETED, fetch the full result from response_url.

curl "https://api.segmind.com/v2/requests/2c7f59ea-..." \
  -H "x-api-key: YOUR_API_KEY"

Image result:

{
  "status": "COMPLETED",
  "images": [
    {
      "url": "https://segmind-inference-io.s3.amazonaws.com/e17ba-output.jpg",
      "content_type": "image/jpeg",
      "file_size": "868446"
    }
  ],
  "output": "https://segmind-inference-io.s3.amazonaws.com/e17ba-output.jpg",
  "seed": "123",
  "prompt": "a red rose on a wooden table, studio lighting",
  "timings": { "inference": 13.06 },
  "metrics": {
    "cost": 0.04,
    "inference_time": 13.06,
    "queue_time": 0.3,
    "total_time": 13.36,
    "remaining_credits": 92.86
  }
}

Metrics

Status (status_url) and result (response_url) responses carry a metrics object once a request reaches COMPLETED and, where available, FAILED. Fields are best-effort: a field is omitted when it does not apply (e.g. cost and remaining_credits on an unbilled or failed request).

Field	Unit	Description
`cost`	credits	Credits charged for this inference. Matches the `X-Cost` header returned by the v1 sync API.
`remaining_credits`	credits	Your account balance after this inference was billed.
`inference_time`	seconds	Time the model spent processing (excludes time spent queued).
`queue_time`	seconds	Time the request waited in the queue before a worker picked it up.
`total_time`	seconds	End-to-end time, `queue_time + inference_time`.
`retry_count`	count	Number of times the request was retried by a worker. Present only when a retry occurred.

cost lets you track per-request spend on the async API, the same way the X-Cost response header works on the v1 sync API.
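
Because metrics fields are best-effort, client code should not assume any particular key is present. The sketch below shows one way to log a metrics object and total the spend across requests. `summarize_metrics` and `total_spend` are hypothetical helper names, and treating a missing cost as zero is an assumption made for this example.

```python
def summarize_metrics(metrics: dict) -> str:
    """Format a metrics object for logging, skipping fields that were omitted."""
    parts = []
    if "cost" in metrics:
        parts.append(f"cost={metrics['cost']} credits")
    if "remaining_credits" in metrics:
        parts.append(f"balance={metrics['remaining_credits']} credits")
    for key in ("queue_time", "inference_time", "total_time"):
        if key in metrics:
            parts.append(f"{key}={metrics[key]}s")
    if "retry_count" in metrics:
        parts.append(f"retries={metrics['retry_count']}")
    return ", ".join(parts) if parts else "no metrics yet"


def total_spend(metrics_list: list) -> float:
    """Sum the cost over many requests; a missing cost counts as 0 (assumption)."""
    return sum(m.get("cost", 0.0) for m in metrics_list)


# Sample metrics taken from the COMPLETED and FAILED examples in this document.
completed = {
    "cost": 0.04,
    "inference_time": 13.06,
    "queue_time": 0.3,
    "total_time": 13.36,
    "remaining_credits": 92.86,
}
failed = {"inference_time": 0.007, "queue_time": 0.23, "total_time": 0.24}

print(summarize_metrics(completed))
print(summarize_metrics(failed))
print(summarize_metrics({}))
```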

Response formats by modality

The result shape depends on what the model produces.

Image models

{
  "status": "COMPLETED",
  "images": [{ "url": "...", "content_type": "image/jpeg", "file_size": "..." }],
  "output": "https://...",
  "seed": "123",
  "prompt": "...",
  "timings": { "inference": 13.06 },
  "metrics": {
    "cost": 0.04,
    "inference_time": 13.06,
    "queue_time": 0.3,
    "total_time": 13.36,
    "remaining_credits": 92.86
  }
}

Video models

{
  "status": "COMPLETED",
  "video": {
    "url": "...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 5757619
  },
  "output": "https://..."
}

LLM / text models

{
  "status": "COMPLETED",
  "output": "The generated text...",
  "reasoning": null,
  "partial": false,
  "error": null
}

The output field is always present across all modalities for backward compatibility.
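
A client that calls several kinds of models may want one code path for all result shapes. The following sketch infers the modality from which keys are present. That detection rule is an assumption drawn from the sample bodies above rather than a documented contract, and `describe_result` is a hypothetical helper name.

```python
from typing import Tuple


def describe_result(result: dict) -> Tuple[str, str]:
    """Return (modality, value) for a COMPLETED result body."""
    if result.get("status") != "COMPLETED":
        raise ValueError(f"result is not COMPLETED: {result.get('status')}")
    if result.get("images"):
        # Image models: a list of image objects, each with a url.
        return "image", result["images"][0]["url"]
    if result.get("video"):
        # Video models: a single video object with a url.
        return "video", result["video"]["url"]
    # LLM / text models: the generated text lives in output.
    return "text", result["output"]


# Placeholder bodies; the example.invalid URLs are made up for illustration.
image_body = {
    "status": "COMPLETED",
    "images": [{"url": "https://example.invalid/a.jpg", "content_type": "image/jpeg"}],
    "output": "https://example.invalid/a.jpg",
}
video_body = {
    "status": "COMPLETED",
    "video": {"url": "https://example.invalid/a.mp4", "content_type": "video/mp4"},
    "output": "https://example.invalid/a.mp4",
}
text_body = {"status": "COMPLETED", "output": "The generated text...", "error": None}

for body in (image_body, video_body, text_body):
    print(describe_result(body))
```

Code that only needs the primary artifact can skip the branching and read output directly, since that field is present for every modality.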

Status values

Status	Description
`QUEUED`	Request accepted, waiting for a worker
`PROCESSING`	A worker has picked up the request
`COMPLETED`	Inference finished, result available
`FAILED`	Inference failed (see `error` field)

Polling guidance

Poll the status_url until status is COMPLETED or FAILED, then fetch the full body from response_url.

Interval: default to 1 s between polls. For known-slow models (long video, long-running LLMs) back off to 5–10 s to cut request volume.
Timeout: use an overall deadline of ≤ 600 s for most models. Raise it for slow video models, or avoid polling entirely with webhooks for fire-and-forget jobs.
Use the lightweight status_url (not response_url) while polling — it skips the output payload.
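
The guidance above can be sketched as a small polling helper with a fixed interval and an overall deadline. `poll_until_done` is a hypothetical name. The status fetch is passed in as a callable so the loop stays independent of any HTTP library, and the `sleep` and `clock` parameters exist only to make the loop testable. The defaults mirror the 1 s interval and 600 s deadline suggested above.

```python
import time
from typing import Callable

TERMINAL = ("COMPLETED", "FAILED")


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    deadline: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until it reports a terminal status or the deadline passes."""
    start = clock()
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL:
            return body
        # Stop if the next poll would land after the overall deadline.
        if clock() - start + interval > deadline:
            raise TimeoutError(f"no terminal status within {deadline} s")
        sleep(interval)


# Demo with canned responses in place of real GETs against status_url.
canned = iter([{"status": "QUEUED"}, {"status": "PROCESSING"}, {"status": "COMPLETED"}])
slept = []
final = poll_until_done(lambda: next(canned), interval=1.0, sleep=slept.append)
print(final["status"], slept)
```

In real use, fetch_status would wrap a GET on status_url with the x-api-key header, for example `lambda: requests.get(status_url, headers=headers, timeout=30).json()`. For known-slow models, pass a larger interval and deadline.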

Idempotency

Each POST /v2/{slug} creates a new request_id, so submits are not idempotent. If a submit returns a 5xx, retry it (you'll get a fresh request_id). Never re-POST after a successful submit: that starts a second billable job. Polling (GET) is always safe to retry.
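
One way to encode this rule is a submit wrapper that retries only on 5xx and stops as soon as a submit is accepted. This is a sketch under assumptions: `submit_with_retry`, the attempt limit, and the linear backoff are illustrative choices, not documented API behavior. The `post` callable is expected to return an (HTTP status code, decoded body) pair.

```python
import time
from typing import Callable, Tuple


def submit_with_retry(
    post: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry a submit only while it fails with a 5xx; return the first accepted body."""
    for attempt in range(1, max_attempts + 1):
        code, body = post()
        if code < 500:
            if code >= 400:
                # Client errors will not succeed on retry, so surface them.
                raise ValueError(f"submit rejected with HTTP {code}: {body}")
            return body  # accepted: never re-POST after this point
        if attempt < max_attempts:
            sleep(backoff * attempt)  # linear backoff between 5xx attempts
    raise RuntimeError(f"submit still failing with 5xx after {max_attempts} attempts")


# Demo with canned outcomes: two server errors, then an accepted submit.
outcomes = iter([(503, {}), (500, {}), (200, {"request_id": "abc", "status": "QUEUED"})])
waits = []
accepted = submit_with_retry(lambda: next(outcomes), sleep=waits.append)
print(accepted["request_id"], waits)
```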

Result expiry

Request status and results are stored for 1 hour after submission. After that, the status key expires and polling any endpoint will return HTTP 404. Fetch your results within this window.
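
If jobs are submitted in one place and collected in another, it can help to record the submit time and check how much of the retention window remains. The helper below is a hypothetical sketch. It assumes the client's own clock at submit time is a close enough proxy for the server-side timestamp.

```python
RESULT_TTL_SECONDS = 3600  # "1 hour after submission", per the text above


def seconds_until_expiry(submitted_at: float, now: float) -> float:
    """Seconds left before the result expires; 0.0 once the window has closed."""
    return max(0.0, submitted_at + RESULT_TTL_SECONDS - now)


# Example: a request submitted at t=1000 s, checked 10 minutes later.
print(seconds_until_expiry(1000.0, 1600.0))
```

In practice submitted_at would come from time.time() taken right after the submit call returns.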

Error handling

Failed requests return HTTP 422 on V2 endpoints. Timing metrics may still be present, but billing fields (cost, remaining_credits) are omitted since nothing was charged:

{
  "status": "FAILED",
  "error": "Prompt is Mandatory and must be string",
  "metrics": { "inference_time": 0.007, "queue_time": 0.23, "total_time": 0.24 }
}

Not found returns HTTP 404:

{
  "error": "Request 00000000-... not found"
}
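
The two error cases above can be mapped to distinct exceptions so callers can tell a failed inference from an expired or unknown request. The exception classes and `raise_for_v2_error` are hypothetical names introduced for this sketch. The function takes the HTTP status code and the decoded JSON body of a V2 response.

```python
class InferenceFailed(Exception):
    """The request reached FAILED (HTTP 422 on V2 endpoints)."""


class RequestNotFound(Exception):
    """The request id is unknown or its result has expired (HTTP 404)."""


def raise_for_v2_error(http_status: int, body: dict) -> dict:
    """Return body unchanged on success; raise a specific exception otherwise."""
    if http_status == 404:
        raise RequestNotFound(body.get("error", "request not found"))
    if http_status == 422 or body.get("status") == "FAILED":
        raise InferenceFailed(body.get("error", "inference failed"))
    return body


# Demo using the FAILED sample body from this section.
failed_body = {
    "status": "FAILED",
    "error": "Prompt is Mandatory and must be string",
    "metrics": {"inference_time": 0.007, "queue_time": 0.23, "total_time": 0.24},
}
try:
    raise_for_v2_error(422, failed_body)
except InferenceFailed as exc:
    print(f"inference failed: {exc}")
```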

Endpoints summary

Endpoint	Method	Description
`/v2/{slug}`	POST	Submit async request
`/v2/requests/{id}/status`	GET	Lightweight status + metrics
`/v2/requests/{id}`	GET	Full result (when `COMPLETED`)
`/v1/requests/{id}`	GET	Legacy poll (status + output combined)
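
For clients that build URLs from a stored request_id rather than keeping the URLs returned on submit, the table above translates into a few small helpers. The function names are hypothetical; the paths and base URL are the ones used throughout this page.

```python
BASE = "https://api.segmind.com"


def submit_url(slug: str) -> str:
    """POST target for submitting an async request to a model."""
    return f"{BASE}/v2/{slug}"


def status_url(request_id: str) -> str:
    """GET target for the lightweight status + metrics check."""
    return f"{BASE}/v2/requests/{request_id}/status"


def result_url(request_id: str) -> str:
    """GET target for the full result once the request is COMPLETED."""
    return f"{BASE}/v2/requests/{request_id}"


def legacy_poll_url(request_id: str) -> str:
    """GET target for the legacy V1 poll (status + output combined)."""
    return f"{BASE}/v1/requests/{request_id}"


print(submit_url("seedream-4.5"))
print(status_url("2c7f59ea-13f1-402c-9353-915a2b5a2124"))
```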

Full Python example

The Python SDK does the submit-and-poll loop for you — result = segmind.run("seedream-4.5", prompt="..."). The raw example below shows what happens under the hood if you'd rather call the API directly.

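import requests
import time

API_KEY = "YOUR_API_KEY"
BASE = "https://api.segmind.com"
HEADERS = {"x-api-key": API_KEY}
POLL_INTERVAL = 1  # seconds between polls (see Polling guidance)
DEADLINE = 600     # overall polling timeout in seconds

# Submit: returns a request_id immediately
resp = requests.post(
    f"{BASE}/v2/seedream-4.5",
    headers=HEADERS,
    json={"prompt": "a beautiful sunset", "aspect_ratio": "16:9"},
    timeout=30,
)
resp.raise_for_status()  # stop here if the submit itself was rejected
request_id = resp.json()["request_id"]
print(f"Submitted: {request_id}")

# Poll the lightweight status endpoint until a terminal status or the deadline
start = time.monotonic()
while True:
    status = requests.get(
        f"{BASE}/v2/requests/{request_id}/status",
        headers=HEADERS,
        timeout=30,
    ).json()

    if status.get("status") in ("COMPLETED", "FAILED"):
        break
    if time.monotonic() - start > DEADLINE:
        raise TimeoutError(f"Request {request_id} did not finish within {DEADLINE}s")
    time.sleep(POLL_INTERVAL)

# Fetch the full result, or report the failure
if status["status"] == "COMPLETED":
    result = requests.get(
        f"{BASE}/v2/requests/{request_id}",
        headers=HEADERS,
        timeout=30,
    ).json()
    print(f"Image URL: {result['images'][0]['url']}")
    print(f"Inference time: {result['timings']['inference']}s")
else:
    print(f"Request failed: {status.get('error')}")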