Key Concepts

Generative Model

A generative model is a type of machine learning model trained to generate new data, such as text, images, audio, or video, based on the patterns and relationships it has learned from a training dataset. To see all public models on Segmind, browse the models page and filter by model type.

PixelFlow GUI

PixelFlow is a node-based tool that gives developers and creators the ability to access a myriad of open-source models and seamlessly string them together to create highly tailored AI workflows.

PixelFlow APIs

Workflows created in the PixelFlow GUI can be converted into APIs for simpler calling from your application. You mark the inputs and outputs of a workflow to convert it into a PixelFlow API.
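Once a workflow is exposed as an API, calling it is an ordinary authenticated HTTP request. The sketch below is a hypothetical Python example: the endpoint URL, header name, and input field names are assumptions for illustration, not the actual PixelFlow API contract.

```python
import json

# Hypothetical endpoint; substitute your workflow's real URL from the dashboard.
PIXELFLOW_URL = "https://api.segmind.com/workflows/<workflow-id>"

def build_workflow_request(api_key, inputs):
    """Assemble headers and a JSON body for a PixelFlow workflow call.

    `inputs` maps the input names you marked on the workflow to their values.
    """
    headers = {
        "x-api-key": api_key,          # assumed auth header name
        "Content-Type": "application/json",
    }
    return headers, json.dumps(inputs)

# Sending it would then look like (requires the `requests` package):
#   headers, body = build_workflow_request(API_KEY, {"prompt": "a red bicycle"})
#   resp = requests.post(PIXELFLOW_URL, headers=headers, data=body)
```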


Playgrounds

Playgrounds allow you to use a single model via its model page. This is a simple form interface for entering inputs and viewing outputs without using an API key.


Fine-tuning

Segmind supports fine-tuning the SDXL model. Fine-tuning refers to the process of taking a pre-trained Stable Diffusion model, a powerful text-to-image generative AI model, and further training it on a specific dataset to specialize it for a particular domain or task.


Prompts

Prompts are textual inputs or instructions provided to generative AI models like text-to-image, image-to-image, and language models. For text-to-image models, prompts describe the desired visual content, guiding the generation of images. In image-to-image models, prompts specify the desired modifications or transformations to an existing image. Language models use prompts as context or starting points for generating human-like text.

Effective prompting is crucial, as the quality and specificity of prompts significantly influence the model's output. Prompt engineering involves carefully crafting prompts to elicit the desired responses, often requiring experimentation and iterative refinement for optimal results.
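As a small illustration of iterative refinement, a text-to-image prompt can be built up by layering detail onto a base subject. The helper below is purely illustrative; the phrasing conventions are assumptions, not model requirements.

```python
def refine_prompt(subject, style=None, details=None):
    """Build a text-to-image prompt by layering specificity onto a base subject."""
    parts = [subject]
    if details:
        parts.append(details)
    if style:
        parts.append(f"in the style of {style}")
    return ", ".join(parts)

# A vague prompt vs. an iteratively refined one:
vague = refine_prompt("a house")
specific = refine_prompt(
    "a house",
    style="watercolor",
    details="on a cliff at sunset, warm lighting",
)
```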


Tokens

Tokens are the basic units of text that large language models (LLMs) process and generate, typically subword units like words, punctuation, or word pieces. LLMs are charged per token because tokenizing inputs, generating outputs, and the overall computational cost scale with the number of tokens.

Longer inputs/outputs and larger models require processing more tokens, consuming more resources. Charging per token allows providers to account for the variable computational demands based on the input/output length and model size for each request. It creates a scalable pricing model tied to the actual resources consumed for running these complex models.
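The per-token billing described above can be sketched with a toy estimate. Note that the whitespace split below is a crude stand-in for a real subword tokenizer (e.g. BPE), and the price is an invented illustrative figure, not a Segmind rate.

```python
def rough_token_count(text):
    # Crude stand-in for a real subword tokenizer: real tokenizers split
    # on subword units, so actual counts will differ.
    return len(text.split())

def estimate_cost(prompt, completion, price_per_1k_tokens=0.002):
    # price_per_1k_tokens is an illustrative figure, not an actual rate.
    total = rough_token_count(prompt) + rough_token_count(completion)
    return total / 1000 * price_per_1k_tokens
```

A longer prompt or completion raises the token count, and therefore the cost, linearly.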


Weights

Weights refer to the numerical parameters of the deep neural network that encode the mapping between text prompts and generated images. LoRA (Low-Rank Adaptation) and other fine-tuning techniques aim to specialize or adapt these weights for specific domains or tasks without modifying the entire pre-trained model. LoRA adds a small set of task-specific weights on top of the base model weights during fine-tuning, allowing efficient customization while preserving the general knowledge from the original training.

Other techniques like prompt tuning or full fine-tuning update more weights for increased specialization. Ultimately, these methods optimize a subset of weights to encode new knowledge while leveraging the robustness of the pre-trained weights, enabling efficient domain adaptation of large generative models.
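A minimal sketch of the LoRA idea in NumPy, assuming a single dense layer: the frozen weight matrix W is left untouched, and only the small factors A and B are trained, so the low-rank update B @ A has rank at most r. Dimensions and scaling here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4                   # layer dimensions and low rank r << d

W = rng.normal(size=(d, k))           # frozen pre-trained weights
A = rng.normal(size=(r, k)) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                  # zero-initialised, so the update starts at 0

alpha = 8.0                           # scaling hyperparameter
delta = (alpha / r) * (B @ A)         # task-specific update, rank <= r
W_adapted = W + delta                 # effective weights at inference time
```

Only A and B (2 * r * d values here) are updated during fine-tuning, instead of all d * k entries of W, which is what makes the adaptation cheap to train and store.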

Base model

A base model for a weight file refers to the initial, pre-trained model that serves as the foundation for further fine-tuning or adaptation through techniques like LoRA or prompt tuning.


Modality

Modality refers to the different types or forms of data that a system can process. Common modalities include text, image, audio, and video. Segmind supports the following model types:

- Text to Image
- Image to Image
- LLM
- vLLM

Text-to-Image models can create visual images from textual descriptions or prompts. These models understand the semantic meaning of the text and generate corresponding images. Popular examples include SSD-1B, SDXL, SD1.5.

Image-to-Image models can create, edit, or manipulate images based on input images and additional guidance or prompts. These models can be used for tasks such as image inpainting, super-resolution, style transfer, and image generation. Examples include Stable Diffusion Inpainting and Outpainting, IP-Adapters, and ControlNets.

An LLM is trained on vast amounts of textual data to understand and generate human-like text. These models can be used for various natural language processing tasks, such as language translation, text summarization, question answering, and text generation. Popular examples include Llama series of models from Meta, GPT from OpenAI, and Claude from Anthropic.

A vLLM (vision LLM) is a type of LLM designed to work with both textual and visual data. These models are trained on multimodal datasets containing text and images, allowing them to understand and generate text based on visual inputs and vice versa. Examples include CLIP from OpenAI and LLaVA.
