Claude Skill

jjang-ai/vmlx

vMLX is an MLX-based framework powering MLX Studio with continuous batch, prefix cache, paged attention, KV cache quantization, and vision-language support. Compatible with OpenAI & Anthropic APIs...

Overview

Stars687
Forks71
LanguagePython
Last pushed2026-06-17
Last synced2026-06-17
View on GitHub

Repository

Ownerjjang-ai
Repositoryvmlx
Full namejjang-ai/vmlx
Repo ID1,160,596,966

Install this Skill

uv tool install vmlx

Registry

Typemcp_server
Quality score85/100
Verificationreadme_parsed
Last verified2026-06-08
Platforms
ClaudeMCPOpenClaw
Capabilities
pdfmemoryimagevideoterminalanthropic-apikvcache-compressionkvcache-optimizationkvcache-reusellm
Detected files
README.mddocspyproject.tomltests
Config keys
URL
Install methods
  • uv tool install vmlx
  • pip install vmlx
  • pip install vmlx[image]
  • git clone https://github.com/jjang-ai/vmlx.git
  • npm install && npm run build

Summary

vMLX is an advanced MLX-based framework that powers MLX Studio with features like continuous batch processing, prefix caching, paged attention, KV cache quantization, and vision-language support. It also provides image generation/editing capabilities and compatibility with OpenAI and Anthropic APIs, making it a versatile tool for local LLM deployment on Apple Silicon.

Chinese description

vMLX - JANG_Q的基地 - 连续批处理、前缀、分页、KV缓存量化、视觉语言 - 驱动MLX Studio。图像生成/编辑、OpenAI/Anth

Key features

  • Continuous batch processing for efficient inference
  • Prefix caching and paged attention optimization
  • KV cache quantization for reduced memory usage
  • Vision-language (VL) support for multimodal tasks
  • Image generation and editing capabilities
  • Compatible with OpenAI and Anthropic APIs

Use cases

  • Running large language models locally on MacBook with MLX
  • Building multimodal applications with vision-language models
  • Optimizing inference with KV cache reuse and compression
  • Deploying MCP servers for AI agent workflows
  • Generating and editing images via MLX Studio

README excerpt

<p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/jjang-ai/vmlx/main/assets/logo-wide-dark.png"> <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/jjang-ai/vmlx/main/assets/logo-wide-light.png"> <img alt="vMLX" src="https://raw.githubusercontent.com/jjang-ai/vmlx/main/assets/logo-wide-light.png" width="400"> </picture> </p> <h3 align="center">MLX Inference Server for Apple Silicon</h3> <p align="center"> Self-hosted inference server for LLMs, VLMs, and image generation on Apple Silicon.<br> OpenAI + Anthropic + Ollama compatible HTTP API. Self-hosted; no third-party API keys required.<br> Native MTP artifact detection and family-specific cache policy gates keep speculative/cache settings explicit and model-safe. </p> <p align="center"> <em>Looking for a native Swift macOS app or Swift inference engine? See <a href="https://osaurus.ai">osaurus.ai</a>.</em> </p> <p align="center"> <a href="https://pypi.org/project/vmlx/"><img src="https://img.shields.io/pypi/v/vmlx?color=%234B8BBE&label=PyPI&logo=python&logoColor=white" alt="PyPI" /></a> <a href="https://github.com/jjang-ai/vmlx/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-green?logo=apache" alt="License" /></a> <a href="https://github.com/jjang-ai/vmlx"><img src="https://img.shields.io/github/stars/jjang-ai/vmlx?style=social" alt="Stars" /></a> <img src="https://img.shields.io/badge/Apple_Silicon-M1%2FM2%2FM3%2FM4-black?logo=apple" alt="Apple Silicon" /> <img src="https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white" alt="Python" /> <img src="https://img.shields.io/badge/Electron-28-47848F?logo=electron&logoColor=white" alt=

Topics

Explore more

Data from GitHub. Synced on 2026-06-17