Claude Skill
atopos31/llmio
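llmio is a lightweight, Go-based LLM API load-balancing gateway with a React + TypeScript admin UI. It supports OpenAI, Claude, Gemini, DeepSeek, and other providers, and offers intelligent routing, failover, and rate limiting.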
Overview
Repository info
Install this Skill
docker run -d \
Registry info
git clone https://github.com/atopos31/llmio.git
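Project overview
llmio is a lightweight, high-performance LLM API load-balancing gateway written in Go, with an admin console built in React and TypeScript. It supports multiple AI providers, including OpenAI, Claude, Gemini, and DeepSeek, and provides intelligent request routing, failover, and rate limiting for production-grade AI applications.
Unified LLM gateway with weighted load balancing, observability & cost tracking.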
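Key points
- Multi-provider support: OpenAI, Claude, Gemini, DeepSeek, and others
- Intelligent load balancing with automatic failover
- Rate limiting and request queueing
- Lightweight and easy to deploy
- Go backend with a React + TypeScript admin UI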
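The load-balancing and failover points above can be illustrated with a minimal sketch of weighted random selection. This is not llmio's implementation (the gateway itself is written in Go): the provider table, `pick_weighted`, and `call_with_failover` are hypothetical names chosen for illustration, and weights are assumed to be positive integers.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical provider entry: (name, weight). Not llmio's actual schema.
Provider = Tuple[str, int]


def pick_weighted(providers: Sequence[Provider], rng: random.Random) -> str:
    """Pick one provider name with probability proportional to its weight."""
    names = [name for name, _ in providers]
    weights = [weight for _, weight in providers]
    return rng.choices(names, weights=weights, k=1)[0]


def call_with_failover(
    providers: Sequence[Provider],
    send: Callable[[str], str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Try providers in weighted-random order, dropping each one that fails."""
    rng = rng or random.Random()
    remaining: List[Provider] = list(providers)
    errors: List[Tuple[str, Exception]] = []
    while remaining:
        name = pick_weighted(remaining, rng)
        try:
            return name, send(name)
        except Exception as exc:  # a real gateway would classify errors here
            errors.append((name, exc))
            remaining = [p for p in remaining if p[0] != name]
    raise RuntimeError(f"all providers failed: {errors}")
```

A failed provider is removed from the candidate set before the next draw, so the remaining weights are renormalized implicitly. A priority-by-weight variant would instead sort candidates by weight and try them in order.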
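Use cases
- Distributing API calls across multiple LLM providers
- Building resilient AI applications with automatic failover
- Managing API rate limits and costs in production
- Centralizing AI gateway configuration for a team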
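For the cost-management use case, the README excerpt says per-request cost is derived from configurable per-million-token prices and from input, cached, and output token counts. The sketch below shows one way such a calculation could work. The `Price` fields and `request_cost` function are hypothetical, and it assumes cached tokens are a subset of input tokens billed at a separate cached rate, which the source does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Prices per one million tokens in a single currency (hypothetical fields)."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


def request_cost(price: Price, input_tokens: int, cached_tokens: int,
                 output_tokens: int) -> float:
    """Return the cost of one request.

    Assumption: cached_tokens is counted inside input_tokens and is billed
    at the cached rate instead of the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * price.input_per_m
        + cached_tokens * price.cached_per_m
        + output_tokens * price.output_per_m
    )
    return total / 1_000_000
```

Keeping prices per million tokens matches how the excerpt describes the configuration, and dividing once at the end avoids accumulating rounding error across the three terms.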
README excerpt
# LLMIO

English | [中文](README_cn.md)

LLMIO is a Go-based LLM load-balancing gateway that provides a unified REST API, weighted scheduling, observability, and a modern admin UI for LLM clients (openclaw / claude code / codex / gemini cli / cherry studio / open webui). It helps you integrate OpenAI, Anthropic, Gemini, and other model capabilities in a single service.

**QQ group: 1083599685**

## Architecture

## Features

- **Unified API**: Compatible with OpenAI Chat Completions, OpenAI Responses, Gemini Native, and Anthropic Messages. Supports both streaming and non-streaming passthrough.
- **Weighted scheduling**: `balancers/` provides two strategies (random by weight / priority by weight). You can route based on tool calling, structured output, and multimodal capability.
- **Admin Web UI**: React + TypeScript + Tailwind + Vite console for providers, models, associations, logs, and metrics.
- **Rate limiting & failure handling**: Built-in rate-limit fallback and provider connectivity checks for fault isolation.
- **Local persistence**: Pure Go SQLite (`db/llmio.db`) for config and request logs, ready to use out of the box.
- **Session tracking**: Pass `session_id` in any request body (works with `extra_body` in OpenAI SDK) to tag logs with a session identifier. Filter and search by `session_id` in the admin UI or via `GET /api/logs?session_id=`.
- **Observability**: Every request is recorded with TraceID, latency breakdown (proxy / first-chunk / completion time), TPS, token usage (input / cached / output), and optional full IO logging. Per-request cost is calculated from configurable per-million-token prices (CNY / USD) and shown in the log detail view alongside provider and model metadata.

## Deployment

### Docker Compose (