Claude Skill
jjang-ai/vmlx
vMLX is an MLX-based framework powering MLX Studio with continuous batch, prefix cache, paged attention, KV cache quantization, and vision-language support. Compatible with OpenAI & Anthropic APIs...
Overview
Repository
Install this Skill
uv tool install vmlxRegistry
uv tool install vmlxpip install vmlxpip install vmlx[image]git clone https://github.com/jjang-ai/vmlx.gitnpm install && npm run build
Summary
vMLX is an advanced MLX-based framework that powers MLX Studio with features like continuous batch processing, prefix caching, paged attention, KV cache quantization, and vision-language support. It also provides image generation/editing capabilities and compatibility with OpenAI and Anthropic APIs, making it a versatile tool for local LLM deployment on Apple Silicon.
vMLX - JANG_Q的基地 - 连续批处理、前缀、分页、KV缓存量化、视觉语言 - 驱动MLX Studio。图像生成/编辑、OpenAI/Anth
Key features
- Continuous batch processing for efficient inference
- Prefix caching and paged attention optimization
- KV cache quantization for reduced memory usage
- Vision-language (VL) support for multimodal tasks
- Image generation and editing capabilities
- Compatible with OpenAI and Anthropic APIs
Use cases
- Running large language models locally on MacBook with MLX
- Building multimodal applications with vision-language models
- Optimizing inference with KV cache reuse and compression
- Deploying MCP servers for AI agent workflows
- Generating and editing images via MLX Studio
README excerpt
<p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/jjang-ai/vmlx/main/assets/logo-wide-dark.png"> <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/jjang-ai/vmlx/main/assets/logo-wide-light.png"> <img alt="vMLX" src="https://raw.githubusercontent.com/jjang-ai/vmlx/main/assets/logo-wide-light.png" width="400"> </picture> </p> <h3 align="center">MLX Inference Server for Apple Silicon</h3> <p align="center"> Self-hosted inference server for LLMs, VLMs, and image generation on Apple Silicon.<br> OpenAI + Anthropic + Ollama compatible HTTP API. Self-hosted; no third-party API keys required.<br> Native MTP artifact detection and family-specific cache policy gates keep speculative/cache settings explicit and model-safe. </p> <p align="center"> <em>Looking for a native Swift macOS app or Swift inference engine? See <a href="https://osaurus.ai">osaurus.ai</a>.</em> </p> <p align="center"> <a href="https://pypi.org/project/vmlx/"><img src="https://img.shields.io/pypi/v/vmlx?color=%234B8BBE&label=PyPI&logo=python&logoColor=white" alt="PyPI" /></a> <a href="https://github.com/jjang-ai/vmlx/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-green?logo=apache" alt="License" /></a> <a href="https://github.com/jjang-ai/vmlx"><img src="https://img.shields.io/github/stars/jjang-ai/vmlx?style=social" alt="Stars" /></a> <img src="https://img.shields.io/badge/Apple_Silicon-M1%2FM2%2FM3%2FM4-black?logo=apple" alt="Apple Silicon" /> <img src="https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white" alt="Python" /> <img src="https://img.shields.io/badge/Electron-28-47848F?logo=electron&logoColor=white" alt=