Python SDK
Run any Segmind model from Python. Async-by-default run(), explicit sync run_sync(), submit/poll handles for long jobs, and a normalized LLM chat() — all over the v2 async gateway.
The Segmind Python SDK is a thin client over the AI Gateway.
As of 1.1.0 it is async-by-default: run() submits to the
v2 async API, polls for you, and
returns the finished result. A blocking v1 call is still available as
run_sync().
Install
pip install "segmind>=1.1.0"

1.1.0 is a breaking release. run() is now async (v2). The old
blocking behaviour moved to run_sync(), and run_async() was removed
— call run() instead. If you're upgrading from 1.0.x, see
Migrating from 1.0.x.
Set your API key once via the environment:
export SEGMIND_API_KEY="YOUR_API_KEY"

The SDK reads SEGMIND_API_KEY automatically. Grab a key from the
API Keys page.
Sync vs. async — which to use?
- run() (async, default): submits to v2, polls, and returns the result. Best for anything that can take more than a few seconds, and the right default for almost all models. Has a built-in 600 s deadline.
- run_sync() (sync): a single blocking v1 HTTP call. Fine for fast models (< 60 s) when you want the raw bytes back immediately. v1 is in maintenance mode.
- submit_async() (handle): submit now, poll later with your own deadline and cadence. Use this for long video/LLM jobs that may exceed 600 s, for fire-and-forget patterns, or to fan out many requests in parallel.
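To make the fan-out case concrete, here is a minimal sketch. It relies only on what this page documents: submit_async(slug, **params) returns a handle immediately, and that handle exposes .wait(timeout=..., interval=...). The helper name fan_out and its submit parameter are hypothetical, not part of the SDK; the submit function is passed in so the sketch depends on nothing beyond those two calls.

```python
from typing import Any, Callable, Iterable, List


def fan_out(
    submit: Callable[..., Any],
    slug: str,
    prompts: Iterable[str],
    timeout: float = 600.0,
    interval: float = 1.0,
) -> List[Any]:
    """Submit one job per prompt, then collect the results in prompt order.

    `submit` is assumed to behave like segmind.submit_async: it returns
    right away with a handle that has .wait(timeout=..., interval=...).
    """
    # Submit everything first so all jobs are in flight at the same time.
    jobs = [submit(slug, prompt=p) for p in prompts]
    # Then block on each handle in turn; the other jobs keep running meanwhile.
    return [job.wait(timeout=timeout, interval=interval) for job in jobs]
```

With the SDK installed, a call would look like fan_out(segmind.submit_async, "seedream-4.5", ["a red rose", "a blue iris"]). The timeout applies to each handle separately, so the worst-case total wait can exceed a single timeout.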
Quick start
import segmind

# Submit to v2, poll until COMPLETED, return the result body.
result = segmind.run("seedream-4.5", prompt="a red rose on a wooden table")
print(result["output"])  # URL to the generated image

import segmind

# Single blocking v1 call; returns a raw httpx.Response.
response = segmind.run_sync("seedream-4.5", prompt="a red rose on a wooden table")
with open("rose.jpg", "wb") as f:
    f.write(response.content)

API
segmind.run(slug, **params) -> dict
Async by default. Submits to POST /v2/{slug}, polls until the request reaches
COMPLETED, and returns the final response body as a dict. Polls every
1 s with a 600 s deadline.
import segmind

result = segmind.run("seedream-4.5", prompt="a sunset over the ocean")
print(result["output"])

run() forwards every keyword to the model body, so it does not
accept timeout or interval arguments (a model could legitimately have a
param named timeout). For a custom deadline or poll cadence, use
submit_async() +
job.wait(timeout=…, interval=…).
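The submit_async() + job.wait() combination described above can be wrapped once and reused. The sketch below is one way to do that; run_with_deadline and its submit parameter are hypothetical names, not SDK functions. Model inputs are passed as a mapping rather than as keywords, so a model input that happens to be called timeout cannot collide with the polling timeout.

```python
from typing import Any, Callable, Mapping


def run_with_deadline(
    submit: Callable[..., Any],
    slug: str,
    params: Mapping[str, Any],
    *,
    timeout: float = 900.0,
    interval: float = 5.0,
) -> Any:
    """Behave like run(), but with a caller-chosen deadline and poll cadence.

    `submit` is assumed to behave like segmind.submit_async. Everything in
    `params` goes to the model body; `timeout` and `interval` only control
    how long and how often the handle is polled.
    """
    job = submit(slug, **params)
    return job.wait(timeout=timeout, interval=interval)
```

With the SDK installed, usage would be along the lines of run_with_deadline(segmind.submit_async, "seedance-2.0", {"prompt": "a sunset timelapse"}, timeout=900, interval=5).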
segmind.run_sync(slug, **params)
Synchronous, single blocking call to POST /v1/{slug}. Returns the raw
httpx.Response — use .content for binary output (images, audio) or
.json() for structured responses.
import segmind

response = segmind.run_sync("sdxl1.0-newreality-lightning", prompt="a cyberpunk city")
with open("city.jpg", "wb") as f:
    f.write(response.content)

segmind.submit_async(slug, **params) -> AsyncJob
Submits a v2 request and returns an AsyncJob handle immediately,
without waiting for the result. Use it to control the poll deadline/cadence, to
keep the request_id, or to run other work while the job processes.
import segmind

job = segmind.submit_async("seedance-2.0", prompt="a sunset timelapse")
print(job.request_id)  # e.g. "2c7f59ea-13f1-402c-9353-915a2b5a2124"
result = job.wait(timeout=600)  # block until done

AsyncJob
The handle returned by submit_async().
| Attribute / method | Returns | Description |
|---|---|---|
| .request_id | str | Unique id for the submitted request. |
| .status_url | str | Lightweight status endpoint (no payload). |
| .response_url | str | Full result endpoint. |
| .status() | dict | Current status body without blocking. status is one of QUEUED, PROCESSING, COMPLETED, FAILED. |
| .result() | dict | Final response body. Only meaningful once COMPLETED. |
| .wait(timeout=600.0, interval=1.0) | dict | Block until a terminal state and return the result. Raises InferenceFailed / InferenceTimeout. |
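If you poll by hand with .status() and .result(), it helps to bound the loop and to back off between polls. The following is a sketch under stated assumptions, not the SDK's own polling logic: poll_until_done and all of its parameters are hypothetical, and it raises the built-in RuntimeError and TimeoutError rather than the SDK exceptions, whose constructors this page does not document.

```python
import time
from typing import Any, Callable


def poll_until_done(
    job: Any,
    deadline_s: float = 600.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll job.status() with exponential backoff until a terminal state."""
    start = clock()
    delay = first_delay_s
    while True:
        state = job.status()["status"]
        if state == "COMPLETED":
            return job.result()
        if state == "FAILED":
            raise RuntimeError(f"job {job.request_id} failed")
        if clock() - start >= deadline_s:
            raise TimeoutError(
                f"job {job.request_id} still {state} after {deadline_s} s"
            )
        sleep(delay)
        # Double the delay after each poll, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.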
import time
import segmind

job = segmind.submit_async("seedance-2.0", prompt="a sunset timelapse")

# Manual polling, if you want it:
while job.status()["status"] not in ("COMPLETED", "FAILED"):
    time.sleep(2)
# The loop also exits on FAILED; result() is only meaningful once COMPLETED.
result = job.result()

Error handling
run() and AsyncJob.wait() raise on failure:
| Exception | Raised when | Attributes |
|---|---|---|
| InferenceFailed | The request reached FAILED. | .detail (server error string), .status_body (raw status payload) |
| InferenceTimeout | wait() exceeded its timeout before a terminal state. The job may still be running server-side; re-fetch to recover. | .request_id, .elapsed_s |
| SegmindError | Base class: transport/auth errors (401/404/5xx). | .status, .detail |
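Because a timed-out job may still be running server-side, one recovery pattern is to keep the handle and wait on it again. The sketch below assumes that calling .wait() a second time on the same handle simply resumes polling, which this page does not confirm. The name wait_with_retries is hypothetical, and the timeout exception type is passed in (for example InferenceTimeout) so the sketch does not depend on the SDK being importable.

```python
from typing import Any, Optional, Type


def wait_with_retries(
    job: Any,
    timeout_exc: Type[Exception],
    attempts: int = 3,
    timeout: float = 600.0,
    interval: float = 1.0,
) -> Any:
    """Call job.wait() up to `attempts` times, retrying only on timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return job.wait(timeout=timeout, interval=interval)
        except timeout_exc as exc:
            # Assumption: the job is still alive server-side, so waiting
            # again on the same handle is worthwhile.
            last_exc = exc
    # Every attempt timed out: surface the most recent timeout.
    raise last_exc
```

Any other exception, such as InferenceFailed, propagates on the first attempt, since retrying a job that has already failed would not help.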
import segmind
from segmind import InferenceFailed, InferenceTimeout

try:
    result = segmind.run("seedream-4.5", prompt="a red rose")
except InferenceFailed as e:
    print(f"Model failed: {e.detail}")
except InferenceTimeout as e:
    print(f"Still running after {e.elapsed_s:.0f}s (request {e.request_id})")

Worked examples
Fast image model
For a quick image model the defaults are all you need — run() typically
returns after a single poll.
import segmind

result = segmind.run("seedream-4.5", prompt="a red rose on a wooden table, studio lighting")
print(result["output"])  # https://segmind-inference-io.s3.amazonaws.com/...-output.jpg

Slow video model
Video generation can run for several minutes. Because run() caps the wait at
600 s, use submit_async() and pass a larger timeout (and a slower
interval to reduce poll traffic):
import segmind
from segmind import InferenceFailed, InferenceTimeout

job = segmind.submit_async("seedance-2.0", prompt="a sunset timelapse over the ocean")
print(f"Submitted {job.request_id}")
try:
    result = job.wait(timeout=900, interval=5)  # up to 15 min, poll every 5 s
    print(result["output"])  # URL to the generated video
except InferenceTimeout as e:
    print(f"Not done after {e.elapsed_s:.0f}s; poll {job.status_url} later")
except InferenceFailed as e:
    print(f"Failed: {e.detail}")

For truly long-running or fire-and-forget jobs, register a webhook instead of polling; you'll be notified when the result is ready.
LLM chat
For text/LLM models, chat() returns a normalized ChatResponse with a
provider-agnostic .text. It's async-by-default like run(), with
chat_sync() and submit_chat() counterparts.
import segmind

reply = segmind.chat("gpt-5.5", prompt="Write a haiku about the sea")
print(reply.text)

ChatResponse exposes .text, .json(), .tool_calls, .usage, and .raw.
Streaming is not supported by the gateway. See the
v2 async reference for
the underlying response shape.
Migrating from 1.0.x
1.1.0 flips the default verb to async. Update calls as follows:
| 1.0.x | 1.1.0 |
|---|---|
| segmind.run(slug, ...) (sync) | segmind.run_sync(slug, ...) |
| segmind.run_async(slug, ...) | segmind.run(slug, ...) |
| — | segmind.submit_async(slug, ...) (unchanged) |
run_async() was removed with no alias — importing or calling it raises an
error. There is no behavioural change to submit_async().
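If the same code has to run against both 1.0.x and 1.1.0 during a rollout, a small shim can pick the blocking entry point at runtime. This is a sketch resting on one assumption: 1.0.x has no run_sync attribute, which the table implies but does not state outright. The name blocking_call is hypothetical.

```python
from typing import Any, Callable


def blocking_call(sdk: Any) -> Callable[..., Any]:
    """Return the blocking entry point for whichever SDK version is loaded.

    Assumption: run_sync exists only from 1.1.0 onward. When it is
    missing, run() is taken to be the old blocking 1.0.x call.
    """
    return getattr(sdk, "run_sync", None) or sdk.run
```

Usage would be blocking_call(segmind)("seedream-4.5", prompt="a red rose"). The shim only selects the function; it does not reconcile return types, so check what each version hands back before relying on it.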
See also
- API Reference · Async Inference (V2) — the HTTP endpoints the SDK calls.
- Webhooks — get notified instead of polling.
- Authentication — API keys and headers.