Python SDK

Run any Segmind model from Python: an async-by-default run() over the v2 async gateway, an explicit blocking run_sync() over v1, submit/poll handles for long jobs, and a normalized LLM chat().

The Segmind Python SDK is a thin client over the AI Gateway. As of 1.1.0 it is async-by-default: run() submits to the v2 async API, polls for you, and returns the finished result. A blocking v1 call is still available as run_sync().

Install

pip install "segmind>=1.1.0"

1.1.0 is a breaking release. run() is now async (v2). The old blocking behaviour moved to run_sync(), and run_async() was removed — call run() instead. If you're upgrading from 1.0.x, see Migrating from 1.0.x.

Set your API key once via the environment:

export SEGMIND_API_KEY="YOUR_API_KEY"

The SDK reads SEGMIND_API_KEY automatically. Grab a key from the API Keys page.

Sync vs. async — which to use?

run() (async, default) — submits to v2, polls, and returns the result. Best for anything that can take more than a few seconds, and the right default for almost all models. Has a built-in 600 s deadline.
run_sync() (sync) — a single blocking v1 HTTP call. Fine for fast models (< 60 s) when you want the raw bytes back immediately. v1 is in maintenance mode.
submit_async() (handle) — submit now, poll later with your own deadline and cadence. Use this for long video/LLM jobs that may exceed 600 s, for fire-and-forget patterns, or to fan out many requests in parallel.
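The fan-out pattern mentioned for submit_async() could look roughly like the sketch below. fan_out is a hypothetical helper, not part of the SDK; it takes the submit function as an argument and assumes only what this page documents: that the handle exposes wait(timeout=..., interval=...).

```python
from typing import Any, Callable, Dict, Iterable, List


def fan_out(
    submit: Callable[..., Any],
    slug: str,
    prompts: Iterable[str],
    timeout: float = 600.0,
    interval: float = 1.0,
) -> List[Dict[str, Any]]:
    """Submit one job per prompt, then wait on each handle in submission order.

    `submit` is expected to behave like segmind.submit_async (hypothetical
    wiring): it returns a handle with a .wait(timeout=..., interval=...) method.
    """
    # Submit everything first so all jobs can be in flight at the same time.
    jobs = [submit(slug, prompt=prompt) for prompt in prompts]
    # Then block on each handle. The timeout applies per job, not to the batch,
    # and any exception raised by wait() propagates to the caller.
    return [job.wait(timeout=timeout, interval=interval) for job in jobs]
```

With the SDK installed, this would presumably be called as fan_out(segmind.submit_async, "seedream-4.5", ["a red rose", "a blue tulip"]). Results come back in the same order as the prompts.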

Quick start

import segmind

# Submit to v2, poll until COMPLETED, return the result body.
result = segmind.run("seedream-4.5", prompt="a red rose on a wooden table")
print(result["output"])  # URL to the generated image

import segmind

# Single blocking v1 call — returns a raw httpx.Response.
response = segmind.run_sync("seedream-4.5", prompt="a red rose on a wooden table")
with open("rose.jpg", "wb") as f:
    f.write(response.content)

API

`segmind.run(slug, **params) -> dict`

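Async by default. Submits to POST /v2/{slug}, polls every 1 s until the request reaches a terminal state, and returns the final response body as a dict once it is COMPLETED. Raises InferenceFailed if the request reaches FAILED and InferenceTimeout if the 600 s deadline passes first.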

result = segmind.run("seedream-4.5", prompt="a sunset over the ocean")
print(result["output"])

run() forwards every keyword to the model body, so it does not accept timeout or interval arguments (a model could legitimately have a param named timeout). For a custom deadline or poll cadence, use submit_async() + job.wait(timeout=…, interval=…).
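To illustrate why the deadline lives on wait() rather than on run(), here is a minimal sketch of a wrapper that keeps model parameters and polling options apart. run_with_deadline, wait_timeout, and poll_interval are hypothetical names chosen for this example; only submit-then-wait(timeout=..., interval=...) comes from this page.

```python
from typing import Any, Callable, Dict


def run_with_deadline(
    submit: Callable[..., Any],
    slug: str,
    params: Dict[str, Any],
    *,
    wait_timeout: float = 600.0,
    poll_interval: float = 1.0,
) -> Dict[str, Any]:
    """Submit `params` untouched, then wait with a caller-chosen deadline."""
    # Everything in `params` goes to the model body, including a key that
    # happens to be called "timeout".
    job = submit(slug, **params)
    # The polling deadline and cadence are passed separately, so they can
    # never collide with a model parameter of the same name.
    return job.wait(timeout=wait_timeout, interval=poll_interval)
```

Passing segmind.submit_async as submit, a call such as run_with_deadline(segmind.submit_async, slug, {"prompt": "...", "timeout": 30}, wait_timeout=900) would send timeout=30 to the model while polling for up to 900 s.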

`segmind.run_sync(slug, **params)`

Synchronous, single blocking call to POST /v1/{slug}. Returns the raw httpx.Response — use .content for binary output (images, audio) or .json() for structured responses.

response = segmind.run_sync("sdxl1.0-newreality-lightning", prompt="a cyberpunk city")
with open("city.jpg", "wb") as f:
    f.write(response.content)
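Since run_sync() hands back the raw response, the caller decides between .content and .json(). One possible way to make that choice is sketched below. handle_sync_response is a hypothetical helper, and it assumes the gateway sets a Content-Type header that distinguishes JSON from binary output, which this page does not confirm.

```python
from pathlib import Path
from typing import Any, Union


def handle_sync_response(response: Any, out_path: Union[str, Path]) -> Any:
    """Return parsed JSON for structured responses, else save the bytes.

    `response` is expected to look like an httpx.Response: it needs
    .headers, .content and .json(). Assumes the Content-Type header is
    reliable (an assumption, not documented behaviour).
    """
    content_type = response.headers.get("content-type", "")
    if content_type.startswith("application/json"):
        # Structured output: hand back the decoded body.
        return response.json()
    # Binary output (image, audio): write the raw bytes and return the path.
    path = Path(out_path)
    path.write_bytes(response.content)
    return path
```

A call might look like handle_sync_response(segmind.run_sync(slug, prompt="..."), "city.jpg"). HTTP error handling is left to whatever the SDK already raises.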

`segmind.submit_async(slug, **params) -> AsyncJob`

Submits a v2 request and returns an AsyncJob handle immediately, without waiting for the result. Use it to control the poll deadline/cadence, to keep the request_id, or to run other work while the job processes.

job = segmind.submit_async("seedance-2.0", prompt="a sunset timelapse")
print(job.request_id)          # e.g. "2c7f59ea-13f1-402c-9353-915a2b5a2124"
result = job.wait(timeout=600) # block until done

`AsyncJob`

The handle returned by submit_async().

Attribute / method	Returns	Description
`.request_id`	`str`	Unique id for the submitted request.
`.status_url`	`str`	Lightweight status endpoint (no payload).
`.response_url`	`str`	Full result endpoint.
`.status()`	`dict`	Current status body without blocking. `status` is one of `QUEUED`, `PROCESSING`, `COMPLETED`, `FAILED`.
`.result()`	`dict`	Final response body. Only meaningful once `COMPLETED`.
`.wait(timeout=600.0, interval=1.0)`	`dict`	Block until a terminal state and return the result. Raises `InferenceFailed` / `InferenceTimeout`.

import time
import segmind

job = segmind.submit_async("seedance-2.0", prompt="a sunset timelapse")

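# Manual polling, if you want it:
status = job.status()["status"]
while status not in ("COMPLETED", "FAILED"):
    time.sleep(2)
    status = job.status()["status"]

# result() is only meaningful once the job is COMPLETED.
if status == "COMPLETED":
    result = job.result()
else:
    print(f"Job {job.request_id} failed")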
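A manual loop like the one above has no deadline of its own. The sketch below adds one. poll_until_terminal is a hypothetical helper that relies only on the documented status() method and its four status values; the sleep and clock parameters are injectable purely to make the helper easy to test.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = ("COMPLETED", "FAILED")


def poll_until_terminal(
    job: Any,
    deadline_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll job.status() until COMPLETED or FAILED; raise TimeoutError otherwise."""
    started = clock()
    while True:
        status = job.status()["status"]
        if status in TERMINAL_STATES:
            return status
        # Give up once the deadline has passed; the job itself is not cancelled.
        if clock() - started >= deadline_s:
            raise TimeoutError(f"job still {status} after {deadline_s:.0f}s")
        sleep(interval_s)
```

The caller would then fetch job.result() only when the returned status is "COMPLETED". The built-in TimeoutError is used here instead of the SDK's InferenceTimeout so the sketch stays self-contained.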

Error handling

run() and AsyncJob.wait() raise on failure:

Exception	Raised when	Attributes
`InferenceFailed`	The request reached `FAILED`.	`.detail` (server error string), `.status_body` (raw status payload)
`InferenceTimeout`	`wait()` exceeded its `timeout` before a terminal state. The job may still be running server-side, so keep the `request_id` and check its status again later to recover the result.	`.request_id`, `.elapsed_s`
`SegmindError`	Base class — transport/auth errors (401/404/5xx).	`.status`, `.detail`

import segmind
from segmind import InferenceFailed, InferenceTimeout

try:
    result = segmind.run("seedream-4.5", prompt="a red rose")
except InferenceFailed as e:
    print(f"Model failed: {e.detail}")
except InferenceTimeout as e:
    print(f"Still running after {e.elapsed_s:.0f}s — request {e.request_id}")
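Because a timed-out job may still finish server-side, one way to recover is simply to wait again on the same handle. The sketch below assumes that an AsyncJob stays usable after wait() times out, which this page implies but does not state. wait_with_retries is a hypothetical helper, and the timeout exception class is passed in so the example runs without the SDK.

```python
from typing import Any, Dict, Type


def wait_with_retries(
    job: Any,
    timeout_error: Type[Exception],
    attempts: int = 3,
    timeout: float = 600.0,
    interval: float = 5.0,
) -> Dict[str, Any]:
    """Call job.wait() up to `attempts` times, retrying only on timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_attempt = attempts - 1
    for attempt in range(attempts):
        try:
            return job.wait(timeout=timeout, interval=interval)
        except timeout_error:
            # Any other exception (for example a failed job) is not caught here.
            if attempt == last_attempt:
                raise  # out of retries: surface the last timeout
    raise RuntimeError("unreachable")
```

With the SDK, the call would presumably be wait_with_retries(job, InferenceTimeout). Failures are deliberately not retried, since a job that reached FAILED will not change state.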

Worked examples

Fast image model

For a quick image model the defaults are all you need — run() typically returns after a single poll.

import segmind

result = segmind.run("seedream-4.5", prompt="a red rose on a wooden table, studio lighting")
print(result["output"])  # https://segmind-inference-io.s3.amazonaws.com/...-output.jpg

Slow video model

Video generation can run for several minutes. Because run() caps the wait at 600 s, use submit_async() and pass a larger timeout (and a slower interval to reduce poll traffic):

import segmind
from segmind import InferenceFailed, InferenceTimeout

job = segmind.submit_async("seedance-2.0", prompt="a sunset timelapse over the ocean")
print(f"Submitted {job.request_id}")

try:
    result = job.wait(timeout=900, interval=5)  # up to 15 min, poll every 5s
    print(result["output"])  # URL to the generated video
except InferenceTimeout as e:
    print(f"Not done after {e.elapsed_s:.0f}s — poll {job.status_url} later")
except InferenceFailed as e:
    print(f"Failed: {e.detail}")

For truly long-running or fire-and-forget jobs, register a webhook instead of polling — you'll be notified when the result is ready.

LLM chat

For text/LLM models, chat() returns a normalized ChatResponse with a provider-agnostic .text. It's async-by-default like run(), with chat_sync() and submit_chat() counterparts.

import segmind

reply = segmind.chat("gpt-5.5", prompt="Write a haiku about the sea")
print(reply.text)

ChatResponse exposes .text, .json(), .tool_calls, .usage, and .raw. Streaming is not supported by the gateway. See the v2 async reference for the underlying response shape.

Migrating from 1.0.x

1.1.0 flips the default verb to async. Update calls as follows:

1.0.x	1.1.0
`segmind.run(slug, ...)` (sync)	`segmind.run_sync(slug, ...)`
`segmind.run_async(slug, ...)`	`segmind.run(slug, ...)`
`segmind.submit_async(slug, ...)`	`segmind.submit_async(slug, ...)` (unchanged)

run_async() was removed with no alias, so any import of it or call to it fails (in Python, typically with an ImportError or AttributeError). There is no behavioural change to submit_async().
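Code that has to run against both 1.0.x and 1.1.0 during a migration could branch on the installed version. The following is a minimal sketch, assuming plain numeric version strings such as "1.0.4" or "1.1.0"; parse_major_minor and run_is_async are hypothetical helpers, not SDK functions.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a dotted version string like '1.1.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def run_is_async(version: str) -> bool:
    """True when run() is the async v2 call, i.e. for 1.1.0 and later."""
    return parse_major_minor(version) >= (1, 1)
```

The installed version can be read with the standard library call importlib.metadata.version("segmind") (Python 3.8+). A caller on an older release would then use run_async() where newer code uses run().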