Claude Skill
atopos31/llmio
llmio is a lightweight Go-based LLM API load-balancing gateway supporting OpenAI, Claude, Gemini, DeepSeek, and more. It provides weighted routing, failover, and rate limiting.
Overview
Repository
Install this Skill
git clone https://github.com/atopos31/llmio.git
Summary
llmio is a lightweight LLM API load-balancing gateway written in Go, with an admin web UI built in React and TypeScript. It supports multiple AI providers including OpenAI, Claude, Gemini, DeepSeek, and others, offering weighted request routing, fallback, and rate limiting for production AI applications.
Key features
- Multi-provider support: OpenAI, Claude, Gemini, DeepSeek, and more
- Intelligent load balancing with automatic failover
- Rate limiting and request queuing
- Lightweight and easy to deploy
- Go-based gateway with a React + TypeScript admin console
Use cases
- Distributing API calls across multiple LLM providers
- Building resilient AI applications with automatic fallback
- Managing API rate limits and costs in production
- Centralizing AI gateway configuration for teams
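On the cost-management use case: the README excerpt says per-request cost is calculated from configurable per-million-token prices applied to input, cached, and output token counts. The arithmetic can be sketched as follows. The `Price` and `Usage` types and the `requestCost` function are hypothetical, and the sketch assumes the three token counts are disjoint (cached tokens are not also counted as input), which the excerpt does not confirm.

```go
package main

import "fmt"

// Price holds hypothetical per-million-token prices in a single currency
// (the README mentions CNY and USD).
type Price struct {
	InputPerM  float64
	CachedPerM float64
	OutputPerM float64
}

// Usage holds token counts for one request. Assumption: the three counts
// are disjoint, so cached tokens are not included in Input.
type Usage struct {
	Input  int
	Cached int
	Output int
}

// requestCost multiplies each token count by its per-million price and
// scales the sum down by one million.
func requestCost(u Usage, p Price) float64 {
	const perMillion = 1_000_000.0
	total := float64(u.Input)*p.InputPerM +
		float64(u.Cached)*p.CachedPerM +
		float64(u.Output)*p.OutputPerM
	return total / perMillion
}

func main() {
	// Illustrative prices only; not real provider pricing.
	price := Price{InputPerM: 2.0, CachedPerM: 0.5, OutputPerM: 8.0}
	usage := Usage{Input: 1000, Cached: 500, Output: 200}
	fmt.Printf("cost: %.6f\n", requestCost(usage, price))
}
```

Keeping prices per million tokens matches how most providers publish them, so the configured values can be copied from a price sheet without conversion.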
README excerpt
# LLMIO

English | [中文](README_cn.md)

LLMIO is a Go-based LLM load‑balancing gateway that provides a unified REST API, weighted scheduling, observability, and a modern admin UI for LLM clients (openclaw / claude code / codex / gemini cli / cherry studio / open webui). It helps you integrate OpenAI, Anthropic, Gemini, and other model capabilities in a single service.

**QQ group: 1083599685**

## Architecture

## Features

- **Unified API**: Compatible with OpenAI Chat Completions, OpenAI Responses, Gemini Native, and Anthropic Messages. Supports both streaming and non‑streaming passthrough.
- **Weighted scheduling**: `balancers/` provides two strategies (random by weight / priority by weight). You can route based on tool calling, structured output, and multimodal capability.
- **Admin Web UI**: React + TypeScript + Tailwind + Vite console for providers, models, associations, logs, and metrics.
- **Rate limiting & failure handling**: Built‑in rate‑limit fallback and provider connectivity checks for fault isolation.
- **Local persistence**: Pure Go SQLite (`db/llmio.db`) for config and request logs, ready to use out of the box.
- **Session tracking**: Pass `session_id` in any request body (works with `extra_body` in OpenAI SDK) to tag logs with a session identifier. Filter and search by `session_id` in the admin UI or via `GET /api/logs?session_id=`.
- **Observability**: Every request is recorded with TraceID, latency breakdown (proxy / first-chunk / completion time), TPS, token usage (input / cached / output), and optional full IO logging. Per-request cost is calculated from configurable per-million-token prices (CNY / USD) and shown in the log detail view alongside provider and model metadata.

## Deployment

### Docker Compose (