Optimize your Stable Diffusion Models for deployment
Segmind's Stable Diffusion optimize API helps optimize the performance of your SD 1.4 and 1.5 models for production. The API compiles the model into a GPU-accelerated format, which enables faster inference, better CUDA core utilization, and a smaller memory footprint.
Users can send their checkpoint, LoRA, and PyTorch files (.ckpt, LoRA, .bin) for optimization and select the GPU and optimization parameters; Segmind will convert them into highly optimized .plan or .so files for production deployment on NVIDIA GPUs. Below is how to use the API to optimize your models.
Optimize model
Use this endpoint to submit an optimization request for a HuggingFace model. The dynamic field controls whether fixed image dimensions need to be specified.
POST /optimize/v1
Request body
# Model name in HuggingFace
model: str
# Set to false if image dimensions need to be specified.
# Default: True
dynamic: bool
# If both height and width are set to zero, dynamic dimensions are used.
# Min/Max: 256-1024
height: int
width: int
# GPU to optimize for; Choose from: ['A10','A100']
gpu: str
# Model type; Choose from ['INPAINT', 'IMAGETTOIMAGE', 'TEXTTOIMAGE']
type: str
Response
Response:
# Job URL for polling. Eg: "https://api.segmind.com/job/<job_id>"
job_link: str
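As a quick illustration, the sketch below submits an optimization request with Python's requests library and reads the returned job_link. The full endpoint URL (https://api.segmind.com/optimize/v1) and the JSON encoding of the request body are assumptions based on the path and field list above; the HuggingFace model name is only a placeholder.
import requests

# Assumed full URL for the POST /optimize/v1 path documented above
endpoint_url = "https://api.segmind.com/optimize/v1"

payload = {
    "model": "runwayml/stable-diffusion-v1-5",  # HuggingFace model name (placeholder)
    "dynamic": False,   # fixed image dimensions
    "height": 512,
    "width": 512,
    "gpu": "A10",
    "type": "TEXTTOIMAGE",
}

response = requests.post(
    endpoint_url,
    json=payload,       # JSON request body is an assumption
    headers={"X-Segmind-Access-Token": "<YOUR ACCESS TOKEN>"},
).json()

# Job URL for polling, as described in the Response block above
job_link = response["job_link"]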
Job status
This endpoint provides the job status for the optimization request.
GET /status/v1/<job_id>/
Response
Response:
# Status of the job
# Can be any of ["PENDING", "PROCESSING", "COMPLETED", "FAILED"]status: str# Will be empty string till job status is "COMPLETED"
download_link: str
Example
Python
import requests
from time import sleep

job_link = "https://api.segmind.com/status/v1/311c158d"
# Add an initial delay of 30 mins if processing synchronously
# sleep(30 * 60)

status = "PENDING"
while status not in ["COMPLETED", "FAILED"]:
    response = requests.get(
        job_link,
        headers={"X-Segmind-Access-Token": "<YOUR ACCESS TOKEN>"},
    ).json()
    status = response["status"]
    # Poll at a 10s interval
    sleep(10)

if status == "COMPLETED":
    download_link = response["download_link"]
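Once the status is "COMPLETED", the optimized artifact can be fetched from download_link. A minimal sketch, assuming the link is a directly downloadable URL (the exact download mechanism is not specified above); the output filename is illustrative.
# Continues from the polling snippet above, once status == "COMPLETED"
# Assumes download_link can be fetched directly with a plain GET
artifact = requests.get(download_link)
with open("optimized_model.plan", "wb") as f:  # illustrative filename
    f.write(artifact.content)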
List Jobs API
Use this endpoint to get a list of all optimization requests.
GET /jobs/v1/
Request Params
Allows filtering by the following fields:
# Model name in HuggingFace
model: str
# GPU Type: ["A10" or "A100"]
gpu: str
# Model type. One of ["INPAINT", "IMAGETTOIMAGE", "TEXTTOIMAGE"]
model_type: str
Response
Response:
# count of objects
count: int
# List of jobs
data: list(object {
    job_id: str,
    status: str,
    created_at: str
})
Example
Python
import requests

endpoint_url = "https://api.segmind.com/jobs/v1/"

response = requests.get(
    endpoint_url,
    headers={"X-Segmind-Access-Token": "<YOUR ACCESS TOKEN>"},
).json()

for job in response["data"]:
    # do something with each job, e.g.:
    print(job["job_id"], job["status"])