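What is vmlx?

vmlx is an AI tool written in Python. The listing describes it as an open-source MCP tool: "vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1". ⭐512 · Python. The main application scenario listed is MCP configuration.

How do I install vmlx and get started?

Visit the vmlx GitHub repository or the official website and follow the README to install the dependencies and run it. vmlx is a Python project, so a working Python 3 environment is the baseline; check the README for the exact version it requires.

Is vmlx free? What license does it use?

vmlx is free and released as open source under the Apache-2.0 license, which permits anyone to use, modify, and distribute it at no cost.

Who is vmlx suitable for?

The listing rates vmlx as beginner-friendly and quick to pick up without a deep technical background. It also suits experienced developers and AI engineers who want to customize it in depth.

How active is the vmlx community, and how well is the project maintained?

vmlx has 512 stars on GitHub, is under active development, and has a growing community.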


Capability tags

🤖 Agent 🔄 Workflow 🌐 Translation 💻 CLI 🔗 REST API 🧬 Embedding 🖼 Vision 🔊 TTS 🎙 STT 🧠 Claude

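AI tool

vmlx MCP tool

Built with Python · free and open source, deployed locally, data stays fully under your control

English name: vmlx

⭐ 512 Stars 🍴 62 Forks 💻 Python 📄 Apache-2.0 🏷 AI score 8.2

Tags: model compression, KV cache optimization, MLX framework, MCP tool, GPU memory optimization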

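✦ Recommended by AI Skill Hub

After a curated evaluation, AI Skill Hub rates the vmlx MCP tool "Highly recommended". It scores well on feature completeness, community activity, and ease of use, with an AI score of 8.2, and suits users with some technical background.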

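📚 Deep dive

The vmlx MCP tool is an open-source, Python-based project with 512 stars on GitHub, covering model compression, KV cache optimization, the MLX framework, and MCP tooling. The main advantage of open-source tools is transparent code: you can audit it for security and adapt or extend it to your own needs.

**Why use an open-source tool instead of a commercial SaaS?**
For individual developers and privacy-sensitive users, a locally deployed open-source tool keeps data on the machine and outside any third-party provider's data policy. Open-source tools also usually have no usage caps or monthly fees: install once and keep using it, so for high-frequency workloads the total cost of ownership (TCO) is typically lower than that of a subscription product.

**Installation and environment setup**
The vmlx MCP tool depends on a Python runtime. Manage Python versions with pyenv to avoid polluting the global environment. New users should first create a virtual environment (python -m venv venv && source venv/bin/activate) and install dependencies inside it; if something goes wrong, delete the virtual environment and start over without affecting the system.

**Community and maintenance**
GitHub Issues and Discussions are the fastest channels for help. Before asking, check the closed issues, since most common problems already have answers there. When reporting a bug, include the output of pip list, the full error stack trace, and a minimal reproducible example; this noticeably speeds up maintainer response. AI Skill Hub will keep tracking vmlx releases and flag important feature changes.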

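📋 Tool overview

The vmlx MCP tool is an open-source project written in Python that focuses on model compression, KV cache optimization, and the MLX framework. As a GitHub project it has community support and ongoing releases, its code is open to audit, and it supports local deployment to protect data privacy. It can be used by individuals or integrated into an enterprise workflow.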

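Item	Value
GitHub Stars	⭐ 512
Forks	🍴 62
Language	Python
Supported platforms	macOS on Apple Silicon (per the README)
Maintenance status	Actively maintained, community-driven
License	Apache-2.0
AI score	8.2
Tool type	AI tool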

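📖 Documentation

The content below was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original sources" at the bottom of the page.

📌 Key features

Free and open source, with local deployment that keeps data fully under your control
An active open-source community on GitHub with ongoing updates
Detailed documentation and usage examples, friendly to newcomers
Custom configuration to fit different environments
Usable as a building block inside an existing stack or as a base for further development

🎯 Main use cases

Running locally to protect data privacy and meet compliance requirements
Custom integration into existing systems to extend the stack's capabilities
Serving as an open-source base component for commercial derivative work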

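The installation commands below were generated automatically from the project's language and type; the official README takes precedence.

Installation commands

# Option 1: pip install
# (on macOS 14+, bare pip fails with "externally-managed-environment"; use option 2)
pip install vmlx

# Option 2: install into a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install vmlx

# Option 3: install from source (latest features)
git clone https://github.com/jjang-ai/vmlx
cd vmlx
pip install -e .

# Verify the installation
pip show vmlx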

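📋 Installation steps

Visit the GitHub repository page
Install the dependencies as described in the README
Complete the initial configuration for your system
Start from the official examples or documentation
If you hit a problem, search GitHub Issues for an answer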

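The usage examples below were compiled by AI Skill Hub and adjusted to the commands documented in the README.

Common commands / code examples

# Command-line help
vmlx --help

# Start the inference server (command documented in the README)
vmlx serve mlx-community/Qwen3-8B-4bit

# Inspect model metadata and config
vmlx info mlx-community/Qwen3-8B-4bit

# From Python, the README talks to the running server over HTTP
# (see "Use with OpenAI SDK" below) rather than through an in-process
# vmlx.process() call, which the README does not document.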

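The configuration example below uses options documented in the README's "Server Options" section; adjust the values for your setup.

Configuration example

# Start the server with an explicit port, log level, and optional API key
vmlx serve mlx-community/Qwen3-8B-4bit \
  --port 8000 \
  --log-level INFO \
  --api-key sk-your-key

# Reuse KV states for repeated prompts and persist the cache to SSD
vmlx serve mlx-community/Qwen3-8B-4bit \
  --enable-prefix-cache \
  --enable-disk-cache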

📑 README deep dive · documentation completeness 74/100

The content below was parsed directly from the GitHub README, preserving its code blocks, tables, and lists.

Introduction

MLX Inference Server for Apple Silicon

Self-hosted inference server for LLMs, VLMs, and image generation on Apple Silicon. OpenAI + Anthropic + Ollama compatible HTTP API. Self-hosted; no third-party API keys required. Native MTP artifact detection and family-specific cache policy gates keep speculative/cache settings explicit and model-safe.

Looking for a native Swift macOS app or Swift inference engine? See osaurus.ai (https://osaurus.ai).


---

JANG 2-bit destroys MLX 4-bit on MiniMax M2.5:

Quantization	MMLU (200q)	Size
JANG_2L (2-bit)	74%	89 GB
MLX 4-bit	26.5%	120 GB
MLX 3-bit	24.5%	93 GB
MLX 2-bit	25%	68 GB

Adaptive mixed-precision keeps critical layers at higher precision. Scores at jangq.ai. Models at JANGQ-AI.
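To read the benchmark figures above side by side, the short sketch below expresses each row's size relative to MLX 4-bit. The numbers are copied from the README table; the `rows` structure and the `relative_size` helper are illustrative names, not part of vMLX.

```python
# Figures copied from the README table: (name, MMLU %, size in GB).
rows = [
    ("JANG_2L (2-bit)", 74.0, 89),
    ("MLX 4-bit", 26.5, 120),
    ("MLX 3-bit", 24.5, 93),
    ("MLX 2-bit", 25.0, 68),
]


def relative_size(size_gb, baseline_gb):
    """Size as a fraction of the baseline (1.0 means the same size)."""
    return size_gb / baseline_gb


# Use the MLX 4-bit row as the baseline for the comparison.
baseline = {name: size for name, _, size in rows}["MLX 4-bit"]

for name, mmlu, size in rows:
    ratio = relative_size(size, baseline)
    print(f"{name:16s} MMLU {mmlu:5.1f}%  {size:4d} GB  {ratio:.2f}x MLX 4-bit size")
```

MMLU questions have four answer options, so the three MLX rows at roughly 25% sit close to random guessing on this 200-question sample, while the JANG_2L row does not.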


[Screenshots: chat with any MLX model (thinking mode, streaming, syntax highlighting); agentic chat with full coding capabilities (tool use, structured output)]

---

Features

Optional Dependencies

# Quote extras so zsh (the macOS default shell) does not treat [] as a glob
pip install vmlx                # Core: text LLMs, VLMs, embeddings, reranking
pip install "vmlx[image]"       # + Image generation (mflux)
pip install "vmlx[jang]"        # + JANG quantization tools
pip install "vmlx[dev]"         # + Development/testing tools
pip install "vmlx[image,jang]"  # Multiple extras

---

Install from PyPI

Published on PyPI as vmlx -- install and run in one command:

```bash
# Or: pip in a virtual environment
python3 -m venv ~/.vmlx-env && source ~/.vmlx-env/bin/activate
pip install vmlx
vmlx serve mlx-community/Qwen3-8B-4bit
```

Note: On macOS 14+, bare pip install fails with "externally-managed-environment". Use uv, pipx, or a venv.

The vMLX inference server is now running at http://0.0.0.0:8000 with an OpenAI + Anthropic compatible API. Works with any model from mlx-community -- thousands of models ready to go.

Quickstart

curl Examples

Chat completion (streaming)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Explain quantum computing in 3 sentences."}],
    "stream": true,
    "temperature": 0.7
  }'
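With `"stream": true`, OpenAI-style servers answer with server-sent events: one `data: {...}` line per chunk and a final `data: [DONE]`. The sketch below assumes vMLX follows that format, which the README's compatibility claim suggests but does not spell out; `iter_stream_text` and the sample lines are hypothetical, written by hand rather than captured from a server.

```python
import json


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample lines (not real vMLX output).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Qubits "}}]}',
    'data: {"choices": [{"delta": {"content": "superpose."}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```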

Chat completion with thinking mode

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Solve: what is 23 * 47?"}],
    "enable_thinking": true,
    "stream": true
  }'

Tool calling

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }]
  }'
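A tool-calling request is only half of the loop: the client still has to run the function the model asks for and send the result back. The sketch below shows that client-side step under the assumption that vMLX returns tool calls in the OpenAI message shape (`tool_calls` with JSON-encoded `arguments`). The `get_weather` stub, the `TOOLS` registry, and the sample message are hypothetical.

```python
import json


def get_weather(location):
    """Stand-in tool: a real one would query a weather service."""
    return {"location": location, "forecast": "unknown (stub)"}


# Registry mapping tool names from the request to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each tool call in an assistant message and build 'tool' replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = TOOLS[fn["name"]]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(fn["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(handler(**args)),
        })
    return replies


# Hand-written example of an assistant message (not real server output).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
    }],
}
print(run_tool_calls(assistant_message))
```

The returned messages would be appended to the conversation, after the assistant message, in a follow-up `/v1/chat/completions` request so the model can phrase the final answer.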

Anthropic Messages API

curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: not-needed" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Embeddings

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
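Embeddings are usually compared with cosine similarity. The sketch below assumes the endpoint returns the OpenAI response shape (`data[i].embedding`); the three-dimensional vectors are made up for readability, and both helper names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def embedding_vectors(response):
    """Pull vectors out of an OpenAI-style embeddings response body."""
    return [item["embedding"] for item in response["data"]]


# Tiny made-up response; real embeddings have hundreds of dimensions.
fake_response = {"data": [
    {"index": 0, "embedding": [1.0, 0.0, 1.0]},
    {"index": 1, "embedding": [1.0, 0.0, 0.0]},
]}
v0, v1 = embedding_vectors(fake_response)
print(round(cosine_similarity(v0, v1), 4))
```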

Text-to-speech

curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Hello, welcome to vMLX!",
    "voice": "af_heart"
  }' --output speech.wav

Speech-to-text

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=whisper

Image generation

curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "schnell",
    "prompt": "A mountain landscape at sunset",
    "size": "1024x1024"
  }'

Reranking

curl http://localhost:8000/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "query": "What is machine learning?",
    "documents": [
      "ML is a subset of AI",
      "The weather is sunny today",
      "Neural networks learn from data"
    ]
  }'

Cache stats

curl http://localhost:8000/v1/cache/stats

Health check

curl http://localhost:8000/health
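Model loading can take a while, so scripts that start the server often poll the health endpoint before sending requests. The sketch below does this with the standard library only; it assumes that `/health` answers HTTP 200 once the server is ready, which the README does not state explicitly, and both function names are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:8000/health", timeout=2.0):
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or HTTP error status


def wait_until_healthy(check=is_healthy, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll `check` until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy()` after launching `vmlx serve` blocks for at most about 30 seconds with the defaults. The `check` and `sleep` parameters exist so the loop can be tested without a running server.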

---

Guidelines

Run the full test suite before submitting PRs
Follow existing code style and patterns
Include tests for new features
Update documentation for user-facing changes

---

Recommended: uv (fast, no venv hassle)

brew install uv
uv tool install vmlx
vmlx serve mlx-community/Qwen3-8B-4bit

Configuration

Server Options

vmlx serve <model> [options]

Option	Description
`--host 0.0.0.0`	Bind address (default: 0.0.0.0)
`--port 8000`	Port (default: 8000)
`--api-key sk-your-key`	Optional API key authentication
`--continuous-batching`	Enable concurrent request handling
`--enable-prefix-cache`	Reuse KV states for repeated prompts
`--use-paged-cache`	Block-based KV cache with dedup
`--kv-cache-quantization q8`	Quantize cache: q4 or q8
`--enable-disk-cache`	Persist cache to SSD
`--enable-jit`	JIT Metal kernel compilation
`--tool-call-parser auto`	Auto-detect tool call format
`--reasoning-parser auto`	Auto-detect thinking format
`--log-level INFO`	Logging: DEBUG, INFO, WARNING, ERROR
`--max-model-len 8192`	Max context length
`--speculative-model <model>`	Draft model for speculative decoding
`--enable-pld`	Prompt Lookup Decoding: no draft model, best for code/JSON/schemas
`--distributed`	Enable multi-Mac pipeline parallelism
`--cluster-secret <secret>`	Shared auth secret for workers
`--distributed-mode pipeline`	pipeline (default) or tensor (coming soon)
`--worker-nodes ip:port,...`	Manual worker IPs (overrides auto-discovery)
`--cors-origins "*"`	CORS allowed origins
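When the server is launched from a script, it can help to assemble the command from a dictionary instead of a long shell line. The sketch below uses only flag names listed above; `build_serve_command` is a hypothetical helper, and whether every combination of flags is accepted together is not something the README confirms.

```python
import shlex


def build_serve_command(model, options):
    """Turn a dict of options into an argv list for `vmlx serve`.

    True becomes a bare flag, False/None is skipped, and any other value
    is passed as `--flag value`.
    """
    argv = ["vmlx", "serve", model]
    for flag, value in options.items():
        if value is None or value is False:
            continue
        argv.append(f"--{flag}")
        if value is not True:
            argv.append(str(value))
    return argv


cmd = build_serve_command(
    "mlx-community/Qwen3-8B-4bit",
    {
        "port": 8000,
        "continuous-batching": True,
        "enable-prefix-cache": True,
        "kv-cache-quantization": "q8",
        "enable-disk-cache": False,  # skipped because it is False
        "log-level": "INFO",
    },
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.Popen(cmd)`, which avoids shell quoting issues with values such as `--cors-origins "*"`.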

Quantization Options

vmlx convert <model> [options]

Option	Description
`--bits 4`	Uniform quantization bits: 2, 3, 4, 6, 8
`--group-size 64`	Quantization group size (default: 64)
`--output ./output-dir`	Output directory
`--jang-profile JANG_3M`	JANG mixed-precision profile
`--calibration-method activations`	Activation-aware calibration
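To get a feel for how `--bits` and `--group-size` affect output size, the sketch below gives a rough weight-only estimate for uniform group quantization. It assumes each group stores 32 extra bits of metadata (for example a 16-bit scale plus a 16-bit bias); that overhead, the function name, and the 8-billion-parameter example are assumptions for illustration, not figures from the README, and JANG mixed-precision profiles will not follow this formula.

```python
def estimated_size_gb(num_params, bits, group_size=64, overhead_bits_per_group=32):
    """Rough weight-only size estimate for uniform group quantization.

    Assumes every group of `group_size` weights also stores
    `overhead_bits_per_group` bits of metadata. Embeddings or layers
    left unquantized are ignored.
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 8-billion-parameter model at each supported bit width.
for bits in (2, 3, 4, 6, 8):
    print(bits, "bits ->", round(estimated_size_gb(8e9, bits), 2), "GB")
```

Under these assumptions a group size of 64 adds 0.5 bits per weight, so "4-bit" works out to about 4.5 bits per weight in practice.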

Image Generation & Editing Options

```bash
pip install "vmlx[image]"
```

Audio Options

TTS and STT require the mlx-audio package:

```bash
pip install mlx-audio
```

Use with OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Use with Anthropic SDK

import anthropic

client = anthropic.Anthropic(base_url="http://localhost:8000/v1", api_key="not-needed")
message = client.messages.create(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Generation API

curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "schnell",
    "prompt": "A cat astronaut floating in space with Earth in the background",
    "size": "1024x1024",
    "n": 1
  }'

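```python
# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.images.generate(
    model="schnell",
    prompt="A cat astronaut floating in space",
    size="1024x1024",
    n=1,
)
```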

Editing API


API Reference

API Gateway

The desktop app runs an API Gateway on a single port (default 8080) that routes requests to all loaded models by name. Run multiple models simultaneously and access them all through one URL.

```bash
# Works with Ollama CLI too
OLLAMA_HOST=http://localhost:8080 ollama run Qwen3.5-122B
```

The gateway supports OpenAI, Anthropic, and Ollama wire formats. Configure the port in the API tab.
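Because the gateway routes by model name, a client only needs one base URL and a different `model` value per request. The sketch below builds such a request without sending it; the `/v1/chat/completions` path on port 8080 is an assumption based on the gateway's stated OpenAI compatibility, and `gateway_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def gateway_chat_request(model, prompt, base_url="http://localhost:8080"):
    """Build (but do not send) a chat request addressed to one loaded model."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = gateway_chat_request("Qwen3.5-122B", "Hello!")
print(req.get_method(), req.full_url)
# To send it, call urllib.request.urlopen(req) while the desktop app is running.
```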

Endpoints

OpenAI / Anthropic

Method	Path	Description
`POST`	`/v1/chat/completions`	OpenAI Chat Completions API (streaming + non-streaming)
`POST`	`/v1/messages`	Anthropic Messages API
`POST`	`/v1/responses`	OpenAI Responses API
`POST`	`/v1/completions`	Text completions
`POST`	`/v1/images/generations`	Image generation
`POST`	`/v1/images/edits`	Image editing (Qwen Image Edit)
`POST`	`/v1/embeddings`	Text embeddings
`POST`	`/v1/rerank`	Document reranking
`POST`	`/v1/audio/transcriptions`	Speech-to-text (Whisper)
`POST`	`/v1/audio/speech`	Text-to-speech (Kokoro)
`GET`	`/v1/models`	List loaded models
`GET`	`/v1/cache/stats`	Cache statistics
`GET`	`/health`	Server health check

Ollama

Method	Path	Description
`POST`	`/api/chat`	Chat completion (NDJSON streaming)
`POST`	`/api/generate`	Text generation (NDJSON streaming)
`GET`	`/api/tags`	List loaded models
`POST`	`/api/show`	Model details
`POST`	`/api/embeddings`	Generate embeddings
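The Ollama-style endpoints stream NDJSON: one JSON object per line, with `done: true` on the last one. The sketch below assembles a chat reply from such lines, assuming vMLX mirrors Ollama's `/api/chat` chunk shape (`message.content` plus `done`); the helper name and sample lines are hand-written for illustration.

```python
import json


def collect_ollama_chat(lines):
    """Concatenate message content from Ollama-style NDJSON chat lines."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between objects
        event = json.loads(raw)
        parts.append(event.get("message", {}).get("content", ""))
        if event.get("done"):
            break  # final chunk of the stream
    return "".join(parts)


# Hand-written sample stream (not real server output).
sample = [
    '{"model": "local", "message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"model": "local", "message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"model": "local", "message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_ollama_chat(sample))
```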

CLI Commands

vmlx serve <model>              # Start inference server
vmlx convert <model> --bits 4   # MLX uniform quantization
vmlx convert <model> -j JANG_3M # JANG adaptive quantization
vmlx info <model>               # Model metadata and config
vmlx doctor <model>             # Run diagnostics
vmlx bench <model>              # Performance benchmarks
vmlx-worker --secret <secret>   # Start distributed worker node

---

Image editing API

```bash
curl http://localhost:8000/v1/images/edits \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-image-edit",
    "prompt": "Change the background to sunset",
    "image": "<base64-encoded image>",
    "size": "1024x1024",
    "strength": 0.8
  }'
```

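🎯 aiskill88 AI review · Grade A · 2026-05-20

An innovative KV cache compression approach that addresses the memory bottleneck in MLX inference. The persistent cache design is distinctive, the codebase is active, and the tool has high value for production use.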

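📚 Practical guide (long-tail questions)

Who it is for

AI engineers who need Claude / Cursor to operate local tools
Agent developers building multi-agent collaboration systems
Teams building enterprise knowledge bases / RAG retrieval applications
Cross-border business and multilingual content operations teams
Developers building voice AI products

Best practices

When configuring an MCP server, prefer the stdio transport with JSON-RPC and avoid exposing it to the public internet
For local deployment, prefer quantized MLX-format models (for example from mlx-community), which save memory while keeping responses fast
Dry-run agent tasks to validate the tool-call chain before enabling autonomous execution

Common mistakes

Committing API keys to a git repository (keep them in .env and add it to .gitignore)
Mistyped MCP config paths or insufficient permissions; changes only take effect after restarting Claude Desktop
Running out of memory (OOM): reduce the context length first, or switch to a smaller quantized model
Python dependency conflicts: isolate environments with venv / uv

Deployment options

CLI: install with pip install or uv tool install, then call it from the command line
Local deployment: the listing suggests 8 GB of memory as a minimum and 16 GB+ for GPU workloads
Cloud hosting: the listing names PaaS platforms such as Vercel / Railway / Fly.io, but the README targets Apple Silicon, so check hardware support first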


👥 Intended audience

AI enthusiasts, researchers and students, developers and engineers, technical founders


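⚖️ Pros and cons

✅ Pros

+ Apache-2.0 license, free for commercial use
+ Fully open source, no licensing fees
+ Local deployment, data stays fully under your control
+ Community support: problems can be searched and asked about

⚠️ Cons

− Installation and initial configuration may require some technical background
− Feature completeness is usually behind mature commercial products
− Support depends mainly on the open-source community, so response times vary

⚠️ Usage notes

AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.

Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.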


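❓ Frequently asked questions (FAQ)

Which models does vMLX support?

It mainly supports large language models compatible with the MLX framework, with specific optimizations for popular open-source LLMs.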


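💡 AI Skill Hub review

The vmlx MCP tool has a complete core feature set and high quality. For AI enthusiasts, it is worth adding to a personal toolbox. Try it in a non-production environment first, then roll it out gradually.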


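🌐 Original information

Original name	`vmlx`
Original description	Open-source MCP tool: vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1. ⭐512 · Python
Topics	model compression, KV cache optimization, MLX framework, MCP tool, GPU memory optimization
GitHub	https://github.com/jjang-ai/vmlx
License	Apache-2.0
Language	Python

🔗 Original sources

🐙 GitHub repository: https://github.com/jjang-ai/vmlx
🌐 Official website: https://vmlx.net

Listed: 2026-05-18 · Updated: 2026-05-19 · License: Apache-2.0 · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.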