Built for Apple Silicon

Run LLMs natively on your Mac

Native Zig inference server with a macOS menu bar app. OpenAI-compatible API. No Python. Just fast.

Download View on GitHub
macOS 14+
M1 / M2 / M3 / M4
MIT License
MLX Claw app — chat interface with model browser and server controls
37 tokens/sec decode
220 tokens/sec prefill
0 Python dependencies

Everything you need to run local AI

A complete inference stack from server to UI, built from scratch in Zig and Swift.

Native Performance

Written in Zig with direct MLX-C bindings. No Python runtime, no overhead. The KV cache is reused across requests for instant multi-turn conversations.

OpenAI-Compatible API

Drop-in replacement. Chat completions, streaming, tool calling, embeddings, logprobs. Works with any OpenAI client library.
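Because the endpoints mirror the OpenAI schema, existing clients only need a new base URL. A minimal sketch, assuming the server is listening on port 8080 as in the quick-start below; the API key value is a placeholder, since a local server has no account to authenticate against:

```shell
# Point any OpenAI-compatible client at the local server.
# These are the standard environment variables the official
# OpenAI SDKs read; the key value is an arbitrary placeholder.
export OPENAI_BASE_URL="http://localhost:8080/v1"
export OPENAI_API_KEY="local"

echo "clients will now talk to $OPENAI_BASE_URL"
```

With these set, an unmodified OpenAI client library should route its requests to mlx-serve instead of api.openai.com.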

Built-in Agent

7 built-in tools: shell, file read/write/edit, search, web browse, web search. Extend with prompt-based skills — just drop a markdown file.

Menu Bar App

Native macOS app lives in your menu bar. Download models from HuggingFace with resumable transfers. Chat, browse, and manage from one place.

Streaming & Tool Calling

Real-time SSE streaming with automatic tool call detection. The model can call functions, get results, and continue reasoning — all in one request.
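A tool-calling request can be sketched as follows. The `get_weather` function and its schema are illustrative examples, not part of the server; the `tools` shape follows the standard OpenAI function-calling format:

```shell
# Write a chat-completions body with one illustrative tool definition.
cat > /tmp/tool_request.json <<'EOF'
{
  "stream": true,
  "messages": [
    {"role": "user", "content": "What is the weather in Lisbon?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}
EOF

# Sanity-check the payload, then send it (assumes mlx-serve on :8080):
python3 -m json.tool /tmp/tool_request.json > /dev/null && echo "payload ok"
# curl http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" -d @/tmp/tool_request.json
```

If the model decides to call the function, the SSE stream carries tool-call deltas instead of plain text, which the server detects and dispatches automatically.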

Extensible Skills

Teach the agent new capabilities by dropping markdown files in a folder. No code needed — just describe the workflow and the agent follows it.
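As a sketch of the idea, a skill file might look like the following. The folder location and the file's internal structure are assumptions for illustration, not documented specifics:

```shell
# Create a prompt-based skill as a plain markdown file.
# SKILLS_DIR stands in for whatever folder the agent watches;
# the path default here is an assumption.
SKILLS_DIR="${SKILLS_DIR:-$HOME/.mlx-serve/skills}"
mkdir -p "$SKILLS_DIR"

cat > "$SKILLS_DIR/summarize-repo.md" <<'EOF'
# Summarize a repository

When asked to summarize a repo:
1. Run `git log --oneline -20` with the shell tool to see recent history.
2. Read the README with the file-read tool.
3. Produce a three-paragraph summary of purpose, structure, and activity.
EOF

echo "skill installed: $SKILLS_DIR/summarize-repo.md"
```

The workflow is described entirely in prose; the agent reads the file and follows the steps with its built-in tools.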

Up and running in seconds

Terminal
# Download a release or build from source
git clone https://github.com/ddalcu/mlx-serve
cd mlx-serve
zig build -Doptimize=ReleaseFast

# Start the server
./zig-out/bin/mlx-serve \
  --model ~/models/gemma-4-4b-it-4bit \
  --serve --port 8080
Use the API
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
MLX Claw
Desktop app for macOS
Get

Run the latest open models

Supports quantized MLX-format models. Download directly from HuggingFace in the app.

Gemma 4

Google · 4B, 12B, 27B

Qwen 3.5

Alibaba · MoE · 4B, 14B, 32B

Llama 3

Meta · 8B, 70B

Mistral

Mistral AI · 7B, 8x7B