# Inference Model
**Alpha since v0.1.0:** The InferenceModel resource is alpha and may have breaking changes in future releases of the API.
## Background
An InferenceModel allows the Inference Workload Owner to define:

- Which Model/LoRA adapter(s) to consume.
- A mapping from a client-facing model name to the target model name(s) in the InferencePool.
- Traffic splitting between adapters in the same InferencePool, so that new LoRA adapter versions can be rolled out gradually (see the example manifest after this list).
- The Criticality of requests to the InferenceModel.
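
As a sketch of how these fields fit together, the manifest below maps a client-facing model name to two LoRA adapter versions and splits traffic between them while marking the workload as critical. The API group/version, pool name, and adapter names are illustrative assumptions; check them against the spec linked in the Spec section below.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2  # assumed alpha API version; verify against your installed CRD
kind: InferenceModel
metadata:
  name: tweet-summary
spec:
  # Client-facing model name that requests use in the OpenAI-style "model" field.
  modelName: tweet-summary
  # Requests for this model are treated as critical and are not shed under load.
  criticality: Critical
  # The InferencePool whose endpoints serve this model.
  poolRef:
    name: my-pool             # hypothetical pool name
  # Split traffic between two LoRA adapter versions to roll out v2 gradually.
  targetModels:
  - name: tweet-summary-v1    # hypothetical adapter names
    weight: 90
  - name: tweet-summary-v2
    weight: 10
```

Adjusting the `weight` values over successive updates shifts traffic toward the new adapter version without changing the client-facing model name.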
## Spec
The full spec of the InferenceModel is defined here.