kaito-project/kaito

Daily Information Dashboard · 2026-03-02
Category: Open Source Projects
Source: github_search
Score: 0
Published: 2026-03-02T01:32:01Z

AI Summary

KAITO released v0.9.0 (2026-02-26). As a Kubernetes Operator, it automates large-model inference/fine-tuning and GPU node provisioning, and the new release covers all vLLM-supported models, lowering the barrier to deploying and scaling AI services in a cluster.
#GitHub #repo #OpenSource #Kubernetes #Operator #vLLM #RAG

Content Excerpt

Kubernetes AI Toolchain Operator (KAITO)


| What is NEW! |
| --- |
| ALL vLLM-supported models can now be run in KAITO; check the latest release. |
| Latest release: Feb 26th, 2026, KAITO v0.9.0. |
| First release: Nov 15th, 2023, KAITO v0.1.0. |

KAITO is an operator that automates the AI/ML model inference or tuning workload in a Kubernetes cluster.
The target models are popular open-source large models such as phi-4 and llama.
Compared to most mainstream model deployment methodologies built on top of virtual machine infrastructures, KAITO offers the following key differentiators:
- Provides an OpenAI-compatible server to perform inference calls.
- Provides preset configurations to avoid adjusting workload parameters based on GPU hardware.
- Supports popular open-source inference runtimes: vLLM and transformers.
- Auto-provisions GPU nodes based on model requirements.
- Autoscales the inference workload based on service monitoring metrics.
- Leverages local NVMe as the primary storage for model weight files.
- Supports the Gateway API Inference Extension.

With KAITO, the workflow of onboarding large AI inference models in Kubernetes is greatly simplified.
Architecture

KAITO follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. Users manage a workspace custom resource that describes the GPU requirements and the inference or tuning specification. KAITO controllers automate the deployment by reconciling the workspace custom resource.
<div align="left">
 <img src="website/static/img/arch.png" width=80% title="KAITO architecture" alt="KAITO architecture">
</div>

The figure above presents an overview of the KAITO architecture. Its major components are:
- **Workspace controller**: Reconciles the workspace custom resource, creates NodeClaim (explained below) custom resources to trigger node auto-provisioning, and creates the inference or tuning workload (Deployment, StatefulSet, or Job) based on the model preset configurations.
- **Node provisioner controller**: Named *gpu-provisioner* in the gpu-provisioner helm chart. It uses the NodeClaim CRD, which originated from Karpenter, to interact with the workspace controller, and it integrates with Azure Resource Manager REST APIs to add new GPU nodes to the AKS or AKS Arc cluster.

Note: The *gpu-provisioner* is an open-source component. It can be replaced by other controllers that support the Karpenter-core APIs.
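
To make the pattern concrete, a minimal workspace manifest might look like the sketch below. The layout follows the workspace API described above (apiVersion assumed to be kaito.sh/v1beta1); the instance type and labels are illustrative, not prescribed values.

```yaml
apiVersion: kaito.sh/v1beta1   # assumed version
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  # GPU SKU the node provisioner should create; illustrative Azure type.
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  # A preset selects the model plus GPU-appropriate runtime parameters.
  preset:
    name: phi-3.5-mini-instruct
```

Applying this resource is all the user does; the workspace controller then creates the NodeClaim and the inference workload on its own.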

**NEW!** Starting with v0.5.0, KAITO ships a new operator, **RAGEngine**, which streamlines the process of managing a Retrieval-Augmented Generation (RAG) service.
<div align="left">
 <img src="website/static/img/ragarch.png" width=80% title="KAITO RAGEngine architecture" alt="KAITO RAGEngine architecture">
</div>

As illustrated in the figure above, the **RAGEngine controller** reconciles the ragengine custom resource and creates a RAGService deployment. The RAGService provides the following capabilities:
- **Orchestration**: uses LlamaIndex as the orchestrator.
- **Embedding**: supports both local and remote embedding services for embedding queries and documents into the vector database.
- **Vector database**: supports a built-in FAISS in-memory vector database; remote vector database support will be added soon.
- **Backend inference**: supports any OpenAI-compatible inference service.
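
A ragengine custom resource might then look roughly like the following sketch. The field names (`compute`, `embedding`, `inferenceService`) and the apiVersion are assumptions inferred from the capability list above, not a verbatim copy of the API; consult the service API document before relying on them.

```yaml
apiVersion: kaito.sh/v1alpha1   # assumed version
kind: RAGEngine
metadata:
  name: ragengine-example
spec:
  compute:
    # Node for the RAGService and local embedding; illustrative SKU.
    instanceType: "Standard_NC12s_v3"
    labelSelector:
      matchLabels:
        apps: ragengine-example
  embedding:
    local:
      # Hypothetical local embedding model choice.
      modelID: "BAAI/bge-small-en-v1.5"
  inferenceService:
    # Any OpenAI-compatible backend, e.g. a KAITO workspace service.
    url: "http://workspace-phi-3-5-mini/v1/completions"
```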

The details of the service APIs can be found in this document.
Installation
**Workspace**: Please check the installation guidance for deployment using Helm and for deployment using Terraform.
**RAGEngine**: Please check the RAGEngine installation guidance.
Workspace quick start

After installing KAITO, one can try the following commands to start a phi-3.5-mini-instruct inference service.
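
The commands themselves are not reproduced in this excerpt; a representative first step, assuming a manifest shaped like the workspace sketch in the Architecture section, would be:

```bash
# Apply a workspace manifest requesting the phi-3.5-mini-instruct preset.
# The file name is illustrative; the repository ships example manifests.
kubectl apply -f workspace-phi-3-5-mini.yaml
```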

The workspace status can be tracked by running the following command. When the STATE column becomes Ready, the model has been deployed successfully.
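
For example, assuming the workspace name from the sketch above (`-w` keeps watching until the state changes):

```bash
# Wait until the STATE column reports Ready.
kubectl get workspace workspace-phi-3-5-mini -w
```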

Next, one can find the inference service's cluster IP and use a temporary curl pod to test the service endpoint in the cluster.
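
A hedged sketch of that test, assuming the service shares the workspace's name and exposes the OpenAI-compatible chat completions path mentioned in the feature list:

```bash
# Resolve the inference service's cluster IP.
export CLUSTERIP=$(kubectl get svc workspace-phi-3-5-mini \
  -o jsonpath='{.spec.clusterIP}')

# Call the OpenAI-compatible endpoint from a temporary curl pod.
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
  curl -s -X POST "http://$CLUSTERIP/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "phi-3.5-mini-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```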
Usage

The detailed usage of KAITO-supported models can be found **HERE**. If users want to deploy their own containerized models, they can provide a pod template in the inference field of the workspace custom resource (see the API definitions for details); a sketch follows the note below.
Note: Currently the controller does **NOT** handle automatic model upgrades. It only creates inference workloads based on the preset configurations if the workloads do not exist.
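
For the custom-model path, a workspace might embed a pod template roughly as follows. The `inference.template` field and the pod spec layout are assumptions drawn from the API description above, and the image is hypothetical:

```yaml
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-custom-model
resource:
  instanceType: "Standard_NC24ads_A100_v4"  # illustrative GPU SKU
  labelSelector:
    matchLabels:
      apps: custom-model
inference:
  # User-supplied pod template instead of a preset
  # (field name assumed; see the workspace API definitions).
  template:
    spec:
      containers:
        - name: custom-model
          image: registry.example.com/my-model-server:latest  # hypothetical
          ports:
            - containerPort: 80
```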

The number of supported models in KAITO is growing! Please check this document to see how to add a new supported model. Refer to the tuning document, inference document, RAGEngine document, and FAQ for more information.
Contributing

This project welcomes contributions and suggestions. Contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit CLAs for CNCF.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the CLAs for CNCF; please electronically sign the CLA via
https://easycla.lfx.linuxfoundation.org. If you encounter issues, you can submit a ticket to the
Linux Foundation ID group through the Linux Foundation Support website.
Get Involved!
Visit the #KAITO channel in CNCF Slack to discuss features in development and proposals.
We host a weekly…