Inference Model

Alpha since v0.1.0

The InferenceModel resource is alpha and may have breaking changes in future releases of the API.

Background

An InferenceModel allows the Inference Workload Owner to define:

  • Which Model/LoRA adapter(s) to consume.
  • Mapping from a client-facing model name to the target model name(s) in the InferencePool.
  • Traffic splitting between adapters in the same InferencePool, so that new LoRA adapter versions can be rolled out easily (see the example manifest after this list).
  • Criticality of the requests to the InferenceModel.
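For illustration, a minimal InferenceModel manifest might look like the sketch below. The model name, adapter names, weights, pool name, and API group/version shown here are placeholders for illustration only and should be checked against the API version installed in your cluster.

```yaml
# Hypothetical example: field names and apiVersion may differ in your release.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: inferencemodel-sample
spec:
  # Client-facing model name that requests will use.
  modelName: tweet-summary
  # Criticality of requests sent to this model.
  criticality: Critical
  # The InferencePool that serves this model.
  poolRef:
    name: vllm-llama2-7b-pool
  # Split traffic between two LoRA adapter versions in the pool.
  targetModels:
    - name: tweet-summary-0
      weight: 50
    - name: tweet-summary-1
      weight: 50
```

In this sketch, requests addressed to the client-facing name are split evenly between the two adapter versions, which is how a new LoRA adapter version could be rolled out gradually.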

Spec

The full spec of the InferenceModel is defined here.