Roles and Personas¶

Before diving into the details of the API, descriptions of the personas these APIs were designed for will help convey the thought process of the API design.

Inference Platform Admin¶

The Inference Platform Admin creates and manages the infrastructure necessary to run LLM workloads, including handling Ops for:

Hardware
Model Server
Base Model
Resource Allocation for Workloads
Gateway configuration
etc

Inference Workload Owner¶

An Inference Workload Owner persona owns and manages one or many Generative AI Workloads (LLM focused currently). This includes:

Defining criticality
Managing fine-tunes
LoRA Adapters
System Prompts
Prompt Cache
etc.
Managing rollout of adapters