Getting started

Follow the instructions below to create your own inference endpoint in seconds.

Step 1: Access the Dedicated Endpoint Creation Page

  1. Log into your Segmind account.

  2. From the left sidebar of the console, click on Endpoints.

  3. Click on New Endpoint to begin creating a dedicated endpoint.

Step 2: Choose Your Model

  1. In the Create a new dedicated endpoint section, you will see a Choose your model dropdown.

  2. Select the model you wish to use. You can find models under Public Models or Your Models. For this example, we'll select the Simple Vector Flux model.

Step 3: Configure the Endpoint

  1. In the Configuration section, fill in the following details:

    • Custom Endpoint URL: Enter a unique name for your endpoint, e.g., endpoint1.

    • Instance Type: Choose your preferred GPU type (e.g., L40, H100, A40, A100).

    • Active GPU: Specify the number of active GPUs you want to use (e.g., 2).

    • Passive GPU: Enter the number of passive GPUs (e.g., 4).

  2. Select the Scale Type:

    • Choose between Queue Delay or Request Count.

    • If you select Queue Delay, specify the Queue Delay (s) and Execution Timeout (s).

    Queue Delay -

    Queue Delay is the time, in seconds, that a request waits in the queue before a new active machine is spun up. It helps manage workload during peak times by adding capacity once requests have waited too long.

    Request Count -

    Request Count is the number of requests that can be in the queue at a given time. Once the queue reaches this threshold, a new machine is spun up. This controls the load on the system, preventing overload and keeping performance stable.
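The difference between the two scale types can be sketched in a few lines of Python. This is only an illustration of the decision described above, not Segmind's actual autoscaler. The names `ScalePolicy` and `should_scale_up` are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    """Hypothetical stand-in for the Scale Type settings in the form."""
    scale_type: str   # "queue_delay" or "request_count"
    threshold: float  # seconds for queue_delay, number of requests for request_count


def should_scale_up(policy: ScalePolicy, queued_requests: int, oldest_wait_s: float) -> bool:
    """Return True if a new machine should be spun up under the given policy.

    queued_requests: number of requests currently waiting in the queue.
    oldest_wait_s:   how long the longest-waiting request has been queued, in seconds.
    """
    if policy.scale_type == "queue_delay":
        # Scale when a request has waited at least the configured delay.
        return oldest_wait_s >= policy.threshold
    if policy.scale_type == "request_count":
        # Scale when the queue holds at least the configured number of requests.
        return queued_requests >= policy.threshold
    raise ValueError(f"unknown scale type: {policy.scale_type!r}")


if __name__ == "__main__":
    by_delay = ScalePolicy(scale_type="queue_delay", threshold=5)
    by_count = ScalePolicy(scale_type="request_count", threshold=10)
    # Two queued requests, the oldest waiting 6 seconds.
    print(should_scale_up(by_delay, queued_requests=2, oldest_wait_s=6.0))
    print(should_scale_up(by_count, queued_requests=2, oldest_wait_s=6.0))
```

With the same queue state, the two policies can disagree. Queue Delay reacts to how long requests have waited, while Request Count reacts to how many are waiting.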

Step 4: Review Instance Type

  1. Review the instance type information on the right-hand side to confirm your selections. It displays the GPU type, CPU count, RAM, and pricing details.

  2. Ensure the settings align with your requirements.

Step 5: Launch the Instance

  1. Once you have completed the configuration and reviewed your choices, click on the Launch Instance button at the bottom to create the endpoint.

Step 6: Manage Your Endpoints

  1. After launching, you will be redirected to the Dedicated Endpoints page.

  2. Here, you can view and manage your created endpoints. You will see options to start, stop, or delete endpoints as needed.

  3. Click on the Usage button to view your endpoint's usage at different levels of granularity.
